More on common Chinese words: adjectives, verbs, nouns

In the first two blog posts I showed how I retrieved hanzi frequencies from an online news website. I also showed how to produce appealing HTML output for the processed data with the Jinja2 Python package.
Now I want to go deeper and extract more useful information. Up to now I have worked on single Hanzi, that is, on characters that are not necessarily words in the Chinese language.

What I’m going to do is to apply the same frequency analysis to entire Chinese words: adjectives, verbs, nouns…
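
To make the difference between counting characters and counting words concrete, here is a minimal sketch using only the standard library. It assumes the text has already been split into words; how that segmentation is done is not covered here, and the sample list and the `is_hanzi` helper are hypothetical names made up for illustration.

```python
from collections import Counter


def is_hanzi(ch):
    """Return True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


# Hypothetical sample, assumed to be already segmented into words.
words = ["中国", "经济", "发展", "，", "中国", "人民"]

# Character-level counts: every single hanzi is counted on its own.
char_counts = Counter(ch for word in words for ch in word if is_hanzi(ch))

# Word-level counts: keep only tokens made entirely of hanzi,
# which drops punctuation such as the full-width comma.
word_counts = Counter(w for w in words if all(is_hanzi(ch) for ch in w))

print(char_counts.most_common())
print(word_counts.most_common())
```

The same `Counter` does both jobs; only the unit being counted changes, which is why the word-level analysis depends so much on how the text is segmented.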


The most common Hanzi in Chinese language (part 2)

Hello again,

Last time we saw how I retrieve, store and process Hanzi data to build my (for now basic) statistics on the most common Chinese characters.
Today I won’t go much further: I’ll just talk about how I managed to present the data I collected in a clearer and more detailed way.

I’ll show you some basic usage of Jinja2, a Python package for working with templates.
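
As a rough idea of what basic Jinja2 usage can look like, here is a minimal sketch that renders a small HTML table from a template string. It is not the template used for the actual statistics page: the `rows` data and the table layout are made-up assumptions, and it needs the third-party `jinja2` package installed.

```python
from jinja2 import Template

# Hypothetical sample data: (character, count) pairs, not real results.
rows = [("的", 120), ("是", 80), ("中", 45)]

# A template is plain text with {{ ... }} placeholders for values
# and {% ... %} blocks for control flow such as loops.
template = Template(
    "<table>\n"
    "{% for hanzi, count in rows %}"
    "  <tr><td>{{ hanzi }}</td><td>{{ count }}</td></tr>\n"
    "{% endfor %}"
    "</table>"
)

# render() fills the placeholders with the keyword arguments passed in.
html = template.render(rows=rows)
print(html)
```

For anything bigger than a snippet, the template would normally live in its own file and be loaded through a `jinja2.Environment` with a `FileSystemLoader`, which keeps the HTML separate from the Python code.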

I’m quite new to this, since I usually don’t work on the web presentation layer and the only other template engine I have worked with is JSF for Java (and I didn’t like it much…).

I found Jinja2 really pleasant and really easy to use. Oh… and the funny thing is that “Jinja” (神社 in Japanese) means “Shinto shrine”, and Jinja2’s logo is precisely a stylized Shinto shrine. Not Chinese, but it is still fun to notice the oriental “touch”.

The most common Hanzi in Chinese language

The Work

I’m going to introduce a piece of work that relates to two of my fields of interest: the Chinese language and programming.
It is a study of how frequently the most common Chinese Hanzi (characters) appear in the modern language. I’m interested in this topic because I’m currently studying Chinese and I was wondering which words and characters I should become confident with as soon as possible.
When we study a language, we often aim to pass certain exams (HSK in my case), and each language exam usually has a set of words and grammar points to know. This is especially true for Chinese because of its lexical characteristics, and vocabulary lists exist for each HSK level (I suggest HSK Academy).

Similar works already exist, but my plans are wider: I’m going to gather a lot of data, and I want to do it properly so that I’ll be able to use it for other purposes.