The Multilingual Editor
Introduction
The purpose of the Multilingual Editor is to allow easy preparation of text in all the Indian languages so that many different applications can use the text. An important aspect of text prepared with the Editor is its representation, which is suited to easy and effective linguistic processing. The Editor provides a uniform user interface across all the languages and scripts and supports a number of flexible data entry schemes.
The Editor package also includes utilities to convert this representation into formats compatible with other applications. Text prepared with the Editor can be taken into Word (or similar applications) to produce very high quality printed documents. The main idea behind the design of the Editor is "One program for all of India", and the program lives up to it by supporting Urdu as well, which is included in the list of national languages.
The version of the Editor described here is meant for use on Microsoft Windows based systems. When the software was designed in the 1990s, a version was also made for Linux systems. Unfortunately, that development could not be continued, so the Linux version is no longer distributed.
Basic Features of the Editor
1. Flexible data entry
The recommended data entry method can be mastered in just a few hours. Seen below is the phonetic mapping scheme standardized at IIT Madras, illustrated in the Devanagari script. The mapping accommodates about 58 basic vowels and consonants across eleven languages, and the mapping shown covers aksharas from all the languages.
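A phonetic keymap of this kind can be sketched as a greedy longest-match parser over Roman keystrokes. The table here is a small hypothetical subset written for illustration; it is not the IIT Madras mapping, and the key spellings (`kh`, `aa`, `ii`, and so on) are assumptions. Consonants typed back to back are joined with a virama to form a conjunct, and a vowel typed after a consonant becomes a dependent vowel sign. The output is Unicode Devanagari, which is not necessarily what the Editor uses internally.

```python
# Hypothetical subset of a phonetic Roman-to-Devanagari keymap.
# Illustrative only: this is NOT the IIT Madras mapping table.
CONSONANTS = {
    "k": "क", "kh": "ख", "g": "ग", "gh": "घ", "c": "च", "j": "ज",
    "t": "त", "th": "थ", "d": "द", "dh": "ध", "n": "न", "p": "प",
    "b": "ब", "bh": "भ", "m": "म", "y": "य", "r": "र", "l": "ल",
    "v": "व", "s": "स", "h": "ह",
}
# Vowel key -> (independent vowel letter, dependent vowel sign).
# The inherent vowel "a" has no sign after a consonant.
VOWELS = {
    "a": ("अ", ""), "aa": ("आ", "\u093e"), "i": ("इ", "\u093f"),
    "ii": ("ई", "\u0940"), "u": ("उ", "\u0941"), "uu": ("ऊ", "\u0942"),
    "e": ("ए", "\u0947"), "ai": ("ऐ", "\u0948"), "o": ("ओ", "\u094b"),
    "au": ("औ", "\u094c"),
}
VIRAMA = "\u094d"  # joins consonants into a conjunct / kills the inherent vowel
MAX_KEY = max(len(key) for key in (*CONSONANTS, *VOWELS))


def transliterate(text: str) -> str:
    """Convert Roman keystrokes to Devanagari using greedy longest match."""
    out = []
    after_consonant = False  # True while the last unit is a bare consonant
    i = 0
    while i < len(text):
        for size in range(MAX_KEY, 0, -1):  # try the longest key first
            key = text[i:i + size]
            if key in CONSONANTS:
                if after_consonant:
                    out.append(VIRAMA)  # consonant cluster -> conjunct
                out.append(CONSONANTS[key])
                after_consonant = True
                break
            if key in VOWELS:
                independent, sign = VOWELS[key]
                out.append(sign if after_consonant else independent)
                after_consonant = False
                break
        else:
            # No key matched: close any open cluster and copy the character.
            if after_consonant:
                out.append(VIRAMA)
            out.append(text[i])
            after_consonant = False
            size = 1
        i += size
    if after_consonant:
        out.append(VIRAMA)  # a trailing bare consonant keeps its virama
    return "".join(out)
```

For example, `transliterate("namaste")` yields "नमस्ते": the keys `s` and `t` arrive back to back, so a virama is inserted between them and the pair renders as a conjunct.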
2. Edit large files
The Editor can handle large text files, typically 20,000 lines or more, in any of the scripts.
3. Dynamic selection of the script
The Editor is truly multilingual and allows free mixing of all the scripts, even on a single line. English text can always be typed in along with the Indian scripts. See the illustration at the beginning, where the selection of languages is shown.
4. On-screen transliteration
Text entered in one script can be converted on screen into another script. In the screen shot shown below, the first line, entered in Devanagari, has been duplicated using the copy feature, and each copy has been changed to a script of choice. Transliteration relies on the phonetic nature of the languages of India: the Editor transliterates aksharas correctly across all the scripts by using phonetically equivalent aksharas, so aksharas not present in a language can still be shown through their phonetic equivalents.
The screen image below shows Devanagari transliterated into Gurmukhi and Malayalam. Modern Gurmukhi may well not write the conjunct in the form shown. The fourth line is in Sinhalese, and the fifth line is its transliteration into Devanagari.
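With present-day Unicode text, a similar effect can be sketched using the fact that the Unicode blocks for nine Indic scripts (Devanagari through Malayalam) follow a shared ISCII-derived layout, so the same offset within each block usually denotes the phonetically corresponding letter. This is a substituted technique, not the Editor's own method. It does not cover Sinhala, whose block does not follow that layout, and where the target block has no corresponding letter it simply keeps the source character instead of choosing a phonetic equivalent as the Editor does. The names `BLOCK_START` and `transliterate_script` are hypothetical.

```python
import unicodedata

# Start of each Unicode block that follows the shared ISCII-derived layout.
BLOCK_START = {
    "devanagari": 0x0900, "bengali": 0x0980, "gurmukhi": 0x0A00,
    "gujarati": 0x0A80, "oriya": 0x0B00, "tamil": 0x0B80,
    "telugu": 0x0C00, "kannada": 0x0C80, "malayalam": 0x0D00,
}


def transliterate_script(text: str, source: str, target: str) -> str:
    """Map each source-script character to the same block offset in the target script."""
    src, dst = BLOCK_START[source], BLOCK_START[target]
    out = []
    for ch in text:
        offset = ord(ch) - src
        if 0 <= offset < 0x80:
            candidate = chr(dst + offset)
            # Keep the source character if the target block leaves that slot unassigned.
            out.append(candidate if unicodedata.name(candidate, None) else ch)
        else:
            out.append(ch)  # characters outside the source block pass through
    return "".join(out)
```

For example, `transliterate_script("नमस्ते", "devanagari", "malayalam")` gives "നമസ്തേ". How the resulting conjunct is drawn still depends on the font, which matches the caveat above about Gurmukhi.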
5. Cut, copy and paste into other applications
Text prepared using the Editor may be pasted into applications such as Microsoft Word, WordPad, Instant Messenger, Outlook Express and many others. In essence, the IITM Editor enables many Windows applications to work with all the Indian languages, so one need not look for a version of Word in Indian languages, with its limited features for handling the Indian scripts. Seen below are examples of cut and paste. In one case, text from the Editor is copied into Word, where it can be formatted further. A more interesting application is seen where text from the Editor is copied onto the composer window of Outlook Express: email in Indian languages is just a click away from the Editor.
The Editor also supports Find/Replace with strings in the local languages, as the screen image given below illustrates. The keystrokes are echoed in Roman, and the text string itself is displayed in a separate window. The language selection allows strings to be entered in a specific language.
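One way such a Find/Replace could respect the akshara is to match search strings on whole aksharas rather than on individual code points. The sketch below assumes Unicode Devanagari text and a deliberately simplified akshara pattern (consonants joined by viramas, an optional vowel sign or final virama, and an optional candrabindu, anusvara or visarga); nukta forms and other scripts are not handled, and the function names are hypothetical. Under this assumption, a search for त does not match the त inside the conjunct in स्ते, because it is not a separate akshara there.

```python
import re

# Simplified Devanagari akshara pattern (an assumption for this sketch):
# consonants joined by virama, then an optional vowel sign or final virama,
# then an optional candrabindu/anusvara/visarga; or an independent vowel;
# or any other single character.
AKSHARA = re.compile(
    r"(?:[\u0915-\u0939]\u094d)*[\u0915-\u0939][\u093e-\u094d]?[\u0901-\u0903]?"
    r"|[\u0904-\u0914][\u0901-\u0903]?"
    r"|.",
    re.DOTALL,
)


def aksharas(text: str) -> list[str]:
    """Split Devanagari text into akshara units."""
    return AKSHARA.findall(text)


def replace_aksharas(text: str, old: str, new: str) -> str:
    """Replace `old` with `new` only where `old` matches a run of whole aksharas."""
    units, target = aksharas(text), aksharas(old)
    n = len(target)
    out, i = [], 0
    while i < len(units):
        if n and units[i:i + n] == target:
            out.append(new)
            i += n
        else:
            out.append(units[i])
            i += 1
    return "".join(out)
```

For example, `replace_aksharas("नमस्ते", "स्ते", "स्कार")` returns "नमस्कार", while `replace_aksharas("नमस्ते", "त", "द")` leaves the text unchanged; a plain `"नमस्ते".replace("त", "द")` would instead alter the conjunct and produce "नमस्दे".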
6. Support for more than 10,000 aksharas across all the Indian scripts
The Editor supports correct data entry for a large number of conjuncts (Samyuktaksharas) across the different languages. Approximately 800 conjuncts are recognized by the Editor, and each of these may combine with any one of up to 16 vowels, which yields the figure above. The data entry scheme also permits new conjuncts to be typed in, consistent with the rules of the writing system of each script.
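The total quoted for this feature follows from a simple upper-bound count using the numbers given above (basic consonants and independent vowels, which would add a few more, are left out):

```latex
N_{\text{conjunct forms}} \lesssim \underbrace{800}_{\text{conjuncts}} \times \underbrace{16}_{\text{vowels}} = 12\,800 > 10\,000
```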
In each script, up to 13 punctuation marks and 10 numerals (in their respective scripts) are supported. Traditionally, Indian scripts have used few punctuation marks, if any. However, current requirements for publishing text in Indian languages presuppose the availability of most of the Roman punctuation symbols.
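In Unicode, each of the nine Indic blocks from Devanagari to Malayalam places its ten digits at offsets 0x66 to 0x6F from the start of the block, so converting numerals into a given script reduces to a character translation table. The sketch below assumes Unicode text (the Editor's own encoding is not described here); `to_native` is a hypothetical helper, and punctuation is not covered.

```python
# Start of each Indic block; its digits 0-9 sit at block_start + 0x66 .. 0x6F.
BLOCK_START = {
    "devanagari": 0x0900, "bengali": 0x0980, "gurmukhi": 0x0A00,
    "gujarati": 0x0A80, "oriya": 0x0B00, "tamil": 0x0B80,
    "telugu": 0x0C00, "kannada": 0x0C80, "malayalam": 0x0D00,
}


def to_native(text: str, script: str) -> str:
    """Replace ASCII digits 0-9 with the digits of the given script."""
    base = BLOCK_START[script] + 0x66
    table = str.maketrans("0123456789", "".join(chr(base + d) for d in range(10)))
    return text.translate(table)
```

For example, `to_native("2024", "devanagari")` returns "२०२४". Going the other way needs no table in Python, since `int()` accepts any Unicode decimal digits, so `int("२०२४")` is 2024.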
Data entry also allows Vedic accent marks to be typed in the Devanagari and Grantha scripts. Samavedic accent marks are also supported for Grantha, the script used in South India for writing Sanskrit.
7. New scripts
The design of the Editor allows new scripts to be introduced without difficulty. The basic principle of the design rests on the concept of the akshara: the internal representation is the equivalent of the akshara (i.e., a sound) rather than of its written form. Hence the akshara can be displayed in any script through lookup tables. The Multilingual Editor also accommodates new fonts for any script, and the tools for introducing new fonts are included in the IITM package. However, the wide variation among fonts designed for Indian scripts makes it virtually impossible to guarantee that all the aksharas will be rendered properly. The set of fonts recommended by IIT Madras fulfills the requirements for correct rendering of all the aksharas in all the languages, and the Editor package includes these fonts.
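The principle of a script-neutral internal representation rendered through per-script lookup tables can be sketched as follows. The akshara codes (`"ka"`, `"ma"`, ...), the tiny tables, and the `EQUIVALENTS` fallback are all hypothetical and illustrative. The actual Editor maps its internal codes to glyphs in specific fonts, whereas this sketch maps them to Unicode strings.

```python
# Hypothetical script-neutral akshara codes mapped to per-script text.
# These tables are tiny illustrative subsets, not the Editor's actual tables.
TABLES = {
    "devanagari": {"ka": "क", "kha": "ख", "ma": "म", "la": "ल"},
    "tamil": {"ka": "க", "ma": "ம", "la": "ல"},  # Tamil has no aspirated "kha"
}

# Assumed phonetic fallbacks for aksharas that a script does not have.
EQUIVALENTS = {"kha": "ka"}


def render(codes: list[str], script: str) -> str:
    """Display a sequence of internal akshara codes in the chosen script."""
    table = TABLES[script]
    out = []
    for code in codes:
        if code not in table:
            # Substitute a phonetically close akshara when one is defined.
            code = EQUIVALENTS.get(code, code)
        out.append(table.get(code, "?"))  # "?" marks an akshara with no mapping
    return "".join(out)


def add_script(name: str, table: dict[str, str]) -> None:
    """A new script needs only a new lookup table; stored text is unchanged."""
    TABLES[name] = table
```

For example, `render(["ka", "ma", "la"], "tamil")` returns "கமல", and `render(["kha"], "tamil")` falls back to "க" because the Tamil table has no entry for `kha`. After `add_script("malayalam", {"ka": "ക", "ma": "മ", "la": "ല"})`, the same stored codes render as "കമല" without any change to the text itself.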