How it works!

Romtrans provides a general-purpose package for script-to-script
conversion, specifically from Roman to non-Roman scripts such
as Devanagari, Arabic, and Malayalam. It uses ID3, a machine
learning algorithm for classification tasks.

Very simply, ID3 builds a decision tree from a fixed set
of examples. The resulting tree is then used to classify
future samples. A training set in the form of a dictionary
is provided to the ID3 algorithm, which produces a decision
tree in the form of If-Else code.
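
To make the idea concrete, here is a minimal sketch of ID3 in Python. It is not the Romtrans implementation: the attribute names ("cur", "next"), the toy training examples, and the nested-dict tree layout are all assumptions chosen for illustration. At each node it picks the attribute with the highest information gain and recurses on the resulting subsets.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the examples on `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Build a tree as nested dicts; leaves are plain class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority                      # pure node, or nothing left to split on
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    node = {"attr": best, "branches": {}, "default": majority}
    remaining = [a for a in attrs if a != best]
    for value in sorted({row[best] for row in rows}):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        node["branches"][value] = id3([r for r, _ in pairs],
                                      [l for _, l in pairs], remaining)
    return node

def classify(tree, row):
    """Walk the tree; unseen attribute values fall back to the node's majority class."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree

# Hypothetical training set: current letter and next letter -> phoneme.
examples = [
    ({"cur": "t", "next": "h"}, "th"), ({"cur": "t", "next": "a"}, "t"),
    ({"cur": "t", "next": "i"}, "t"),  ({"cur": "d", "next": "h"}, "dh"),
    ({"cur": "d", "next": "a"}, "d"),  ({"cur": "d", "next": "i"}, "d"),
]
rows = [features for features, _ in examples]
labels = [label for _, label in examples]
tree = id3(rows, labels, ["cur", "next"])

print(classify(tree, {"cur": "t", "next": "h"}))  # th
```

The nested dict returned by id3 corresponds directly to nested If-Else statements: each "attr" node is a condition and each branch is one arm.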

A decision tree is a representation of a decision procedure
for determining the class (classification) of a given instance.
Such a classification is required because the Roman script
has only 26 characters, whereas scripts such as Devanagari
have many more characters to represent various sounds.

The user must understand that script transliteration is not
translation. Rather, script transliteration is the conversion
of letters from one script to another without translating
the underlying words.

Using this decision tree, the input text in Roman script is
mapped to its corresponding phonemes, written in 7-bit ASCII
characters (the ISO 646 character set). For Roman to Hindi,
we have used the ITRANS convention (with some modifications)
for representing phonemes.
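
As a rough illustration of this step, the sketch below applies a hand-written If-Else function to each letter of a Roman word together with its neighbours. The rules, the function names (context_windows, classify_letter, roman_to_phonemes), and the phoneme symbols are hypothetical stand-ins; the rules Romtrans actually uses are learned from its training dictionary and are not reproduced here.

```python
def context_windows(word, pad="_"):
    """Return (previous, current, next) letter triples for each position in `word`."""
    padded = pad + word.lower() + pad
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]

def classify_letter(prev, cur, nxt):
    """Hand-written stand-in for generated If-Else code (hypothetical rules)."""
    if cur == "t":
        if nxt == "h":
            return "th"        # 't' followed by 'h' is one aspirated phoneme
        else:
            return "t"
    elif cur == "h":
        if prev == "t":
            return ""          # already absorbed into the preceding "th"
        else:
            return "h"
    else:
        return cur             # fall through: the letter is its own phoneme

def roman_to_phonemes(word):
    """Classify every letter in context and drop the empty results."""
    results = (classify_letter(*window) for window in context_windows(word))
    return [p for p in results if p]

print(roman_to_phonemes("thali"))  # ['th', 'a', 'l', 'i']
```

The point of the context window is that the same Roman letter can belong to different phonemes depending on its neighbours, which is why a per-letter lookup table alone is not enough.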

These phonemes are then mapped to Unicode characters of the
target language, which is the final output.
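
For Devanagari, this last step could look roughly like the sketch below. The tables cover only a handful of ITRANS-style phonemes, and the function name phonemes_to_devanagari is an assumption for illustration, not part of Romtrans. The code points are the standard Unicode Devanagari ones. A consonant carries an inherent "a", so a following vowel is written as a dependent sign (matra), and a consonant with no following vowel gets a virama.

```python
VIRAMA = "\u094D"  # suppresses the inherent 'a' of the preceding consonant

# Small illustrative subset of consonants: phoneme -> Devanagari letter.
CONSONANTS = {
    "k": "\u0915", "t": "\u0924", "th": "\u0925", "n": "\u0928",
    "m": "\u092E", "l": "\u0932", "s": "\u0938", "h": "\u0939",
}

# Vowels: phoneme -> (independent letter, dependent sign used after a consonant).
VOWELS = {
    "a":  ("\u0905", ""),          # inherent vowel: no sign needed
    "aa": ("\u0906", "\u093E"),
    "i":  ("\u0907", "\u093F"),
    "ii": ("\u0908", "\u0940"),
    "u":  ("\u0909", "\u0941"),
    "e":  ("\u090F", "\u0947"),
    "o":  ("\u0913", "\u094B"),
}

def phonemes_to_devanagari(phonemes):
    """Map a list of ASCII phonemes to a Devanagari string."""
    out = []
    after_consonant = False
    for p in phonemes:
        if p in CONSONANTS:
            if after_consonant:
                out.append(VIRAMA)         # consonant cluster: join with virama
            out.append(CONSONANTS[p])
            after_consonant = True
        elif p in VOWELS:
            independent, sign = VOWELS[p]
            out.append(sign if after_consonant else independent)
            after_consonant = False
        else:
            raise ValueError(f"no mapping for phoneme {p!r}")
    if after_consonant:
        out.append(VIRAMA)                 # word ends in a bare consonant
    return "".join(out)

print(phonemes_to_devanagari(["n", "a", "m", "a", "s", "t", "e"]))
```

Calling it with ["k", "a", "m", "a", "l", "a"] yields the three letters U+0915, U+092E, U+0932, since each "a" is absorbed as the inherent vowel of the consonant before it.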