Data sources

The most important part of this software is the data packaged with it. Some people are doing a great job creating and compiling information about the Japanese language, which is invaluable for us students.

Kanji and vocabulary

This package uses the EDICT, KANJIDIC and KRADFILE files. These files are the property of the Electronic Dictionary Research and Development Group at Monash University, and are used in conformance with the Group's licence.

The information about kanji comes from the KANJIDIC file. A subset of 1926 kanji was taken from the file.

The vocabulary used in this tool was taken from EDICT, the Japanese-English Electronic DICTionary file. A large subset of that dictionary is used (around 90,000 words), stored in the EDICT format.
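For reference, an entry in the EDICT format is a single line of the form KANJI [KANA] /gloss/gloss/.../, or KANA /gloss/.../ when the word has no kanji form. The following is a minimal sketch of reading one such line in Python. The names EdictEntry and parse_edict_line are hypothetical and are not taken from this package, which may read the file differently.

```python
import re
from typing import List, NamedTuple, Optional


class EdictEntry(NamedTuple):
    """One dictionary entry (hypothetical container, for illustration only)."""
    headword: str            # kanji form, or the kana form if there is no kanji
    reading: Optional[str]   # kana reading in brackets, if present
    glosses: List[str]       # English glosses, still carrying markers like "(n)"


# headword, optional "[reading]", then the slash-delimited gloss field
_EDICT_LINE = re.compile(r"^(\S+)\s+(?:\[([^\]]+)\]\s+)?/(.*)/\s*$")


def parse_edict_line(line: str) -> Optional[EdictEntry]:
    """Parse a single EDICT-format line; return None if it does not match."""
    match = _EDICT_LINE.match(line)
    if match is None:
        return None
    headword, reading, gloss_field = match.groups()
    # Glosses are separated by "/"; drop empty pieces.
    glosses = [gloss for gloss in gloss_field.split("/") if gloss]
    return EdictEntry(headword, reading, glosses)


if __name__ == "__main__":
    print(parse_edict_line("日本語 [にほんご] /(n) Japanese (language)/"))
```

The sketch keeps part-of-speech markers such as (n) inside the gloss text instead of splitting them out, since how the tool treats them is not described here.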

Radical information for kanji comes from the KRADFILE file; this information has been used since version 0.4.1 for computing kanji similarity.
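To illustrate how radical data can feed a similarity measure, here is a minimal sketch in Python. It assumes the KRADFILE layout of one kanji per line followed by " : " and a space-separated list of its radicals, with comment lines starting with "#". The measure shown (Jaccard overlap of radical sets) is only one plausible choice and is not necessarily the one this tool uses. The function names are hypothetical, and the sample lines are illustrative rather than copied from the file.

```python
from typing import Dict, FrozenSet, Iterable


def parse_kradfile(lines: Iterable[str]) -> Dict[str, FrozenSet[str]]:
    """Map each kanji to its set of radicals (assumed 'kanji : r1 r2 ...' lines)."""
    radicals_by_kanji = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        kanji, separator, radicals = line.partition(" : ")
        if not separator:
            continue  # not a 'kanji : radicals' line
        radicals_by_kanji[kanji] = frozenset(radicals.split())
    return radicals_by_kanji


def radical_similarity(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard overlap of two radical sets: shared radicals / all radicals."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Illustrative lines only, not verbatim KRADFILE content.
    sample = ["# comment", "明 : 日 月", "晴 : 日 月 青", "木 : 木"]
    table = parse_kradfile(sample)
    print(radical_similarity(table["明"], table["晴"]))
```

A set-based measure like this ignores where each radical sits inside the kanji, so it is a rough proxy for visual similarity at best.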

Kanji Stroke Order Diagrams

Stroke Order Diagram (SOD) pictures and animations (SODA) are taken from the SODER project, hosted on the Kanji Cafe website. The third release of the project is used in this software. This release contains SODs for 1513 kanji.

Additionally, the SODs are used as raw data from which this software package determines the correct stroke order programmatically.

The SODs are used in conformance with the SOD and SODA License Agreement.

The JLPT Study Page

Peter van der Woude has compiled kanji and vocabulary lists for the different levels of the JLPT. The organization of kanji used in this tool is taken from his website, www.jlptstudy.com. The lists are up to date as of the (last?) JLPT requirements update in 2002.

Kanji similarity models

Lars Yencken and Timothy Baldwin's paper on similarity models for kanji has been very useful for introducing similarity methods into this tool. For more information, see:

Yencken, Lars and Baldwin, Tim: Modelling the orthographic neighbourhood for Japanese Kanji, in Proceedings of ICCPOL 2006, Singapore (2006)