Latest revision as of 23:18, 9 October 2018
Paper
I'm still working on this page, but our paper is here. I (Daan van den Berg) welcome all feedback you might have. Look me up in the UvA directory, on LinkedIn or on Facebook.
Japanese Kanji Characters
The whole idea was quite simple actually, born from the language enthusiasm of three programmers. But Japanese is not an easy language of choice, as it features a 60,000-piece character set named Kanji. Although you 'only' need to learn about 2,000 of them to read the language, this is still a formidable exercise in itself. There is some systematicity though, as many Kanji are composed of one or more of 252 components, and some combinations are more common than others. Individual Kanji characters are said to carry meaning in a word-like manner. Akin to how compound words are built up ("swordfish", "snowball", "fireplace"), Kanji often derive meaning from the combination of their components, and are often explained as such in textbooks for Japanese children and foreign students of the language.
Kanjis & Small-Worlds
The network of connected Kanji turned out to be a small-world network, which means it has a high clustering coefficient, a low average path length and a low connection density. At the time, computational linguists had already found a large number of small-world networks in different languages on various levels, but this network of Kanji sharing components was not yet one of them.
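As a rough illustration of the other two small-world measures named above, the sketch below builds a tiny network and computes its connection density and average path length. Everything in it is an assumption made for the example: the labels `k1` to `k5` and `a` to `d` are placeholders rather than real Kanji or components, and the rule "two Kanji are connected if they share at least one component" is one plausible reading of "Kanji sharing components", not necessarily the exact rule used in the paper.

```python
from collections import deque
from itertools import combinations

# Hypothetical placeholder data: kanji label -> set of component labels.
TOY_KANJI = {
    "k1": {"a", "b"},
    "k2": {"b", "c"},
    "k3": {"c", "d"},
    "k4": {"d"},
    "k5": {"a"},
}


def build_network(kanji_components):
    """Connect two kanji when they share at least one component (assumed rule)."""
    adjacency = {kanji: set() for kanji in kanji_components}
    for first, second in combinations(kanji_components, 2):
        if kanji_components[first] & kanji_components[second]:
            adjacency[first].add(second)
            adjacency[second].add(first)
    return adjacency


def density(adjacency):
    """Fraction of all possible edges that are present (needs >= 2 vertices)."""
    vertex_count = len(adjacency)
    edge_count = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    return edge_count / (vertex_count * (vertex_count - 1) / 2)


def average_path_length(adjacency):
    """Mean shortest-path length over all connected vertex pairs, via BFS."""
    total_distance = 0
    pair_count = 0
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if neighbour not in distance:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
        total_distance += sum(distance.values())
        pair_count += len(distance) - 1  # reachable vertices other than source
    return total_distance / pair_count


network = build_network(TOY_KANJI)
print(density(network), average_path_length(network))
```

The toy network is a simple chain of five vertices, so it is far too small to say anything about small-worldness. On a real network, both numbers would be read alongside the clustering coefficient.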
Clustering Coefficient & Average Path Length (optional)
First, a brief look at what makes a small-world network a small-world network. Small-world networks are sparse: they have relatively few edges. They also have a high clustering coefficient. The clustering coefficient of a vertex is the fraction of possible edges between its neighbours that actually exist. As an example, look at the figure. Vertex C has four neighbours, and between these four there are three edges out of a possible six, so the clustering coefficient of C is 0.5. Similarly, the clustering coefficient of B is 1, and that of F is 0.33 if we disregard the two neighbours it must have outside the figure. If we don't disregard those, F has five neighbours with either one or two connections between them, out of a possible ten, so its clustering coefficient would be either 0.1 or 0.2. The clustering coefficient of the network is simply the average over all its vertices.
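The definition above can be sketched in a few lines of Python. The edge list below is a made-up toy graph, not the figure from this page. It is only chosen so that its vertex C also has four neighbours with three of the six possible edges between them. The function names are likewise just illustrative.

```python
from itertools import combinations

# Hypothetical toy graph (not the page's figure), given as undirected edges.
TOY_EDGES = [
    ("C", "A"), ("C", "B"), ("C", "D"), ("C", "E"),
    ("A", "B"), ("A", "D"), ("D", "E"),
]


def build_adjacency(edges):
    """Turn an undirected edge list into a vertex -> neighbour-set mapping."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def local_clustering(adjacency, vertex):
    """Existing edges between a vertex's neighbours, divided by possible ones."""
    neighbours = adjacency[vertex]
    k = len(neighbours)
    if k < 2:
        return 0.0  # convention: no neighbour pairs means no clustering
    possible = k * (k - 1) // 2
    actual = sum(1 for a, b in combinations(neighbours, 2) if b in adjacency[a])
    return actual / possible


def network_clustering(adjacency):
    """Average of the local clustering coefficients over all vertices."""
    return sum(local_clustering(adjacency, v) for v in adjacency) / len(adjacency)


toy = build_adjacency(TOY_EDGES)
print(local_clustering(toy, "C"))  # 3 of 6 possible edges between C's neighbours
print(network_clustering(toy))
```

Returning 0 for vertices with fewer than two neighbours is a common convention, but it is a choice. Some treatments leave such vertices out of the average instead.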
Gelb's Hypothesis: from Pictures to Sounds
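Many language networks have small-world properties, but for Japanese Kanji there's a little more to it. An old conjecture by Ignacy Jay Gelb (1907-1985) states that all written languages go through a phase transition from being picture-based to being sound-based. The original idea is quite coarse and the exact trajectory might differ from language to language, but the conjecture is alluring where it concerns clustering in Japanese Kanji.

Kanji characters are often explained as symbols of 'compound meaning', built up from individual meaningful components. But these explanations seem anecdotal rather than scientific. Some studies have related the components of Kanji to sounds rather than to meaning. In fact, a few convincing examples exist of Kanji that share a component and a pronunciation, but not a meaning. What's more, an ancient character dictionary named Shuowen Jiezi, containing 9353 Kanji, adopts a 540-piece component index. Compared with the 252 components mentioned above, this shows that the number of components must have dropped through time, by deletion, merger, transformation or replacement of individual components.

These reduction operations could account for the correspondence in components and pronunciation on the one hand, and the lack of correspondence in meaning on the other. As such, they could be the driving mechanism behind the Gelbian phase transition.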
Kanjis & Clusters: A Gelbian Phase Transition
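So it seems that the successive disappearance of visual features accounts for the clustering found in the Kanji network. Modern-day Kanji characters are systematically structured, and nowhere near the elaborate pictures they were around 2,000 years ago.

Simultaneously, there is some evidence that certain components correlate closely with a Kanji's pronunciation. Studies by Townsend (2011) and Toyoda, Fardius and Kano (2013) have yielded large sets of components that "can be used by students to guess their pronunciation". For instance, many of the characters containing the 'middle' component have the pronunciation "chuu", and many Kanji with the 'orders' component have the pronunciation "rei". Yet many of the Kanji sharing such a 'speech component' seem a world apart when it comes to their meaning. Why do the Kanji for mushroom, wise, bell and actor all have an 'orders' component? Is there really a shared meaning in these characters? Or have the components actually taken on something of an alphabetic function?

We think they have, and as such we regard small-worldness in Japanese Kanji characters as a byproduct of a language going through a Gelbian phase transition. Many ancient writing systems around the world began as series of pictures, but most of them have either transformed into an alphabetic system over time or disappeared. Japanese, a rarity in linguistics, is still in the midst of this transformation. Complex network analysis provides the tools to quantify it, and to explain why written Japanese is as it is today.

Figure: Kanji evolution through time. Notice how visual similarity has increased, especially between 'horse' and 'fish'. Adapted from www.tofugu.com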
Best Paper Award
Yes, we won the elusive Best Paper Award from IARIA's Data Analytics Conference.
The Team
More
Maybe later.