Hadoop, why I need it

I graduated in computer science with a specialization in Artificial Intelligence; however, during my studies I was never taught how to use the most common frameworks and environments. When I started looking for my first job, I immediately went for junior machine learning and data analysis positions. I was sure I was qualified for them, but the truth is that I’m probably not (yet). Most of these positions are strongly tied to so-called Industry 4.0, a new way of managing industrial machines with IoT and Big Data technologies. This is when my attention turned to a couple of names I had maybe heard only once or twice before: Hadoop and Spark.

Big Data is not necessarily related to machine learning, but you usually end up running prediction and learning algorithms on it. Big Data analysis is almost never feasible on a single machine and instead requires many machines working together, and this is where Hadoop makes its entrance. Continue reading “Hadoop, why I need it”
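To give an idea of the programming model Hadoop is built around, here is a minimal sketch of MapReduce word counting in plain Python. It only simulates the map, shuffle and reduce phases inside a single process; it does not use any Hadoop API, and the input chunks are just stand-ins for data blocks that would live on different machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    # Emit a (word, 1) pair for every word in this chunk of text.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    # Group all emitted values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Combine all the values emitted for a single key.
    return key, sum(values)


def word_count(chunks):
    # In a cluster each chunk would be mapped on a different node; here it is sequential.
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


chunks = ["big data needs many machines", "many machines share big data"]
counts = word_count(chunks)
print(counts)
```

The point of the model is that map and reduce only ever see their own slice of the data, which is what lets a framework spread them across many machines.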

I graduated, what now?

Last month I reached two goals: I finally graduated in computer science and I passed the Chinese HSK (汉语水平考试, Chinese Proficiency Test) level 2 exam.
During the Christmas holidays I had the chance to visit the Emirates, and now I’m rather relaxed.
The tallest tower in Dubai, which I visited after graduating
Since then I haven’t really been doing much. Of course, I’m checking job aggregator websites (Indeed, Monster…) as well as job-oriented social networks like LinkedIn, and I’m considering the many job offers I receive through them. Continue reading “I graduated, what now?”

When Subclipse plugin freezes

For the last few weeks (months…) I have been working on my thesis, which I’m going to present this fall to get my master’s degree in Computer Science.
Maybe in the future I’ll explain what it is all about, but for now I just want to share a really annoying problem that cost me more than an hour. I searched the web for a solution and did not find one. Maybe it is so simple that no one ever posted it… Continue reading “When Subclipse plugin freezes”

More on common Chinese words: adjectives, verbs, nouns

In my last two blog posts (which were also my first two) I showed how I retrieved hanzi frequencies from an online news website. I also showed you how to produce appealing HTML output for the processed data with the Jinja2 Python package.
Now I want to go deeper and extract more useful information. So far I have worked on single hanzi, that is, characters that are not necessarily words in the Chinese language.

What I’m going to do is apply the same frequency analysis to whole Chinese words: adjectives, verbs, nouns…

Continue reading “More on common Chinese words: adjectives, verbs, nouns”
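A minimal sketch of the idea, assuming the text has already been split into words and tagged with parts of speech. Chinese is written without spaces between words, so a real pipeline needs a word segmenter for that step; the tagged pairs and the `frequency_by_pos` helper below are hand-written for illustration, not the output of any particular tool.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-tagged sample: (word, part of speech).
tagged = [
    ("我们", "pron"), ("学习", "verb"), ("中文", "noun"),
    ("中文", "noun"), ("很", "adv"), ("难", "adj"),
    ("我们", "pron"), ("喜欢", "verb"), ("学习", "verb"),
]


def frequency_by_pos(pairs):
    # One Counter per part of speech, counting whole words rather than single hanzi.
    counters = defaultdict(Counter)
    for word, pos in pairs:
        counters[pos][word] += 1
    return counters


freq = frequency_by_pos(tagged)
for pos in ("noun", "verb", "adj"):
    print(pos, freq[pos].most_common(3))
```

Keeping a separate counter per part of speech makes it easy to ask for “the most common verbs” directly instead of filtering one big list afterwards.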

The most common Hanzi in Chinese language (part 2)

Hello again,

last time we saw how I retrieve, store and process Hanzi data to build my (for now basic) statistics on the most common Chinese characters.
Today I won’t go much further: I’ll just talk about how I presented the collected data in a clearer and more detailed way.

I’ll show you a basic use of Jinja2, a Python package for working with templates.

I’m quite new to this, since I usually don’t work on the web presentation layer, and the only other template engine I have worked with is JSF for Java (and I didn’t like it much…)

I found Jinja2 really pleasant and easy to use. Oh… and the funny thing is that “Jinja” (神社 in Japanese) means “Shinto shrine”, and Jinja2’s logo is precisely a stylized Shinto shrine. Not Chinese, but still fun to notice the oriental “touch”. Continue reading “The most common Hanzi in Chinese language (part 2)”
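As a taste of basic Jinja2 usage, here is a minimal sketch that renders a frequency table to HTML. The template, the `rows` variable name and the counts are made up for illustration; `Environment`, `from_string`, `render` and the `loop.index` loop variable are standard Jinja2 features.

```python
from jinja2 import Environment

# autoescape=True escapes any HTML special characters found in the data.
env = Environment(autoescape=True)
template = env.from_string(
    "<table>\n"
    "{% for hanzi, count in rows %}"
    "<tr><td>{{ loop.index }}</td><td>{{ hanzi }}</td><td>{{ count }}</td></tr>\n"
    "{% endfor %}"
    "</table>"
)

# Hypothetical frequency data: (character, occurrences), already sorted.
rows = [("的", 120), ("是", 85), ("在", 60)]
html = template.render(rows=rows)
print(html)
```

The template only describes the layout, while the Python side only prepares the data, which is what makes changing the presentation later cheap.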

The most common Hanzi in Chinese language

The Work

I’m going to introduce a project related to two of my fields of interest: the Chinese language and programming.
It is a study of how frequently the most common Chinese Hanzi (characters) appear in the modern language. I’m interested in this topic because I’m currently studying Chinese and I was wondering which words and characters I should become confident with as soon as possible.
When we study a language, we often aim to pass certain exams (HSK in my case), and each exam usually comes with a set of words and grammar points to know. This is especially true for Chinese, given its lexical characteristics, and vocabulary lists exist for each HSK level (I suggest HSK Academy).

Similar projects already exist, but my plans are broader: I’m going to gather a lot of data, and I want to do it properly so that I can reuse it for other purposes. Continue reading “The most common Hanzi in Chinese language”
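The core of such a study is a character count. A minimal sketch, assuming the text has already been downloaded: keep only characters from the CJK Unified Ideographs block (U+4E00 to U+9FFF), count them, and measure how much of the text the top n characters cover. The sample sentence and the `coverage` helper are illustrative, not the code behind the post.

```python
from collections import Counter


def is_hanzi(ch):
    # Basic CJK Unified Ideographs block only; extension blocks are ignored here.
    return "\u4e00" <= ch <= "\u9fff"


def hanzi_frequency(text):
    # Punctuation, Latin letters and digits are skipped by the filter.
    return Counter(ch for ch in text if is_hanzi(ch))


def coverage(counter, n):
    # Share of all hanzi occurrences accounted for by the n most common characters.
    total = sum(counter.values())
    top = sum(count for _, count in counter.most_common(n))
    return top / total if total else 0.0


text = "我是学生。你是老师。我们都是中国人！"
freq = hanzi_frequency(text)
print(freq.most_common(3))
print(f"{coverage(freq, 3):.0%}")
```

On a real corpus, computing coverage for increasing n shows how many characters are needed to read a given share of the text, which is exactly the “what should I learn first” question.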