Using the Multilingual Editor
The multilingual editor
is used for preparing text documents in different Indian languages. The
Editor is an application written for use under Linux as well as Microsoft
Windows (98/Me/2000/XP). It provides a graphical user interface, as seen
in the screen shot.
As in other applications with drop-down menus, the editor features
menus for file operations, editing and selecting the script. The currently
selected script and the mode of input are displayed at the bottom of the
window along with the line number where the cursor is located. It must
be kept in mind that the application is a text editor and therefore word
processing features are almost completely absent. Yet, the text prepared
by the editor can be taken to a word processor (e.g., AbiWord or Microsoft
Word) and formatted as desired.
Features of the editor
which are useful for multilingual text preparation are discussed below
along with details of use.
Opening Files
A new file
can be opened by clicking 'New' in the 'File' option in the main menu.
The name for the file will be required only when you save the file.
An already prepared
file can be opened by first selecting 'Open' in the 'File' option of
the main menu and then selecting the file by clicking on the filename when
a new window displaying the available .llf files comes up (or typing the
file name in the text area beside 'File Name' and clicking the 'Open' button).
This is the standard feature seen in many Microsoft Windows applications.
The names
of some recently opened files can be seen on the menu displayed when the
'File' option is clicked and they can be opened just by clicking on the
file name. This is a standard feature seen in many Windows applications.
In the present Editor,
the file names will have to be specified in English. In future versions,
it is envisaged that the user will be able to name the files in local languages.
Selecting a Language
While opening
an already existing file, the file is checked for the information about
the language in which it was prepared. (Each file features a header that
contains self-identifying information about the file.) In case the header
is absent, the file is opened in the Default language (which has been set
to SANSKRIT). It is possible that a text file conforming to the coding
scheme of IIT Madras is prepared by another text processing application.
In such a case, the pure text may not have a header.
The current version
of the Editor will give an error message about the missing header but will
allow you to open it if you say yes. It is possible to fool the Editor
by taking an arbitrary file and naming it with a .llf extension. The Editor
will open the file but you will end up editing a local language text string
that may have no meaning for you, just as what you may find in editing
a binary file with a text Editor.
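The fallback behaviour described above can be sketched in a few lines of Python. The real layout of the .llf header is not described in this document, so the one-line `LLF-LANG:` tag used here is purely hypothetical and serves only to illustrate the logic: read the language from the header if one is present, otherwise fall back to the default language.

```python
from typing import Tuple

DEFAULT_LANGUAGE = "SANSKRIT"    # default language stated in the text above
HYPOTHETICAL_TAG = b"LLF-LANG:"  # invented for this sketch; NOT the real .llf header


def detect_language(data: bytes) -> Tuple[str, bool]:
    """Return (language, header_found) for the raw bytes of a file."""
    first_line, _, _ = data.partition(b"\n")
    if first_line.startswith(HYPOTHETICAL_TAG):
        name = first_line[len(HYPOTHETICAL_TAG):].decode("ascii").strip()
        return name.upper(), True
    # No header found: use the default language, much as the Editor does
    # after warning the user about the missing header.
    return DEFAULT_LANGUAGE, False
```

A caller could use the second value of the result to decide whether to show the "missing header" warning before opening the file.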
During the editing
process, a new language may be selected by clicking on the 'Language' option
on the main menu and then clicking on the required language. If you
have added a new language, choose the 'Other Languages' option in
the menu and then choose the language entry against which you added
your language in the IITMfcEd.ini file. The screen shot below shows this.
On Screen Transliteration
It is possible
to change a portion of text seen in one script into another dynamically.
Just select the text using the mouse and open the language menu. Choose
the language and the selected text will immediately change to the newly
specified script. Please note that this powerful on screen transliteration
facility will work properly even for a single akshara. However,
it may be difficult to identify exactly where an akshara begins and where
it ends if it includes a few zero width glyphs (as in the case of a Matra).
So we recommend that you transliterate whole words and do a bit of editing
if necessary. Transliteration will be possible only if the selected text
is in one single script. If English letters are included in the selected
text, transliteration will be effected on that text as well, leading to
strange results!
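The idea behind transliteration can be illustrated with a small Python sketch. It does not use the Editor's own coding scheme, which is not reproduced here. As a stand-in, it relies on the fact that the Unicode blocks for the major Indian scripts are 128 code points wide and laid out in parallel, so a character can be moved to another script by changing the block base. The function and table names are assumptions made for this example.

```python
# Base code point of each 128-wide Unicode block (values from the Unicode standard).
SCRIPT_BASE = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text: str, source: str, target: str) -> str:
    """Move every character of the source script to the same slot in the target block."""
    src, dst = SCRIPT_BASE[source], SCRIPT_BASE[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            out.append(chr(cp - src + dst))
        else:
            # Unlike the Editor, this sketch leaves characters from other
            # scripts (for example English letters) untouched.
            out.append(ch)
    return "".join(out)
```

Not every position is assigned in every block (Tamil, for instance, has fewer consonants), so a real converter needs per-script exceptions. This is consistent with the advice above to transliterate whole words and then edit where necessary.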
Fonts used by the Editor
The current
version of the Editor has basic support for all the Indian Languages/Scripts.
The practical use of the Editor is however restricted only to those languages
for which the TrueType fonts and the associated .tab files are available.
As of May 2001, all the languages/scripts of India including Urdu are supported.
In the list below, the names in the second column refer to the names of
the TrueType fonts required for displaying the script corresponding to
the indicated language. Where other fonts may also be used, appropriate
mention is made in the third column.
Language   | Required font | Other usable fonts
Sanskrit   | iitmsans      | Sanskrit 1.2, Sanskrit98, Xdvng
Hindi      | iitmhind      | Xdvng
Tamil      | iitmtam       | Adhawin, Iweb-kambar, tamnet99_fonts
Telugu     | iitmtel       | Pothana
Kannada    | iitmkann      | LangscapeKnd-Padmini
Malayalam  | iitmmal       | ltml_manoj, kerala
Oriya      | iitmoriya     | -
Bengali    | iitmbeng      | ItxBeng
Gujarati   | iitmguj       | ItxGuj
Gurmukhi   | iitmpunj      | -
Diacritics | iitmipa       | -
Urdu       | iitmurdu      | -
Basic Editing Operations
The text
can be edited by moving the cursor using the arrow keys and the Page up
and Page down keys, and keying in the desired text. While editing, the
cursor moves one akshara at a time. That is, once you have typed a
full character, you cannot delete only a part of that character. This is
because the character is obtained by combining different keystrokes and
all of them are assembled into a single character. This gives you a facility
to add vowels to consonants to form a full character even in the middle
of the text. This feature is consistent with the observation that the internal
representation of text is in syllable form.
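To make the notion of an akshara concrete, here is a minimal Python sketch that splits Devanagari text into syllable-sized units and deletes the last one as a whole. It works on Unicode text and a simplified rule (a consonant following a virama joins the previous unit, and vowel signs and other marks attach to it), which is an assumption for illustration and not the Editor's internal representation.

```python
import unicodedata

VIRAMA = "\u094D"  # Devanagari virama (halant); this sketch handles Devanagari only


def split_aksharas(text: str) -> list:
    """Split text into akshara-like units using a simplified rule."""
    clusters = []
    for ch in text:
        category = unicodedata.category(ch)
        is_mark = category.startswith("M")  # vowel signs, virama, anusvara, ...
        joins_conjunct = (
            category == "Lo" and bool(clusters) and clusters[-1].endswith(VIRAMA)
        )
        if clusters and (is_mark or joins_conjunct):
            clusters[-1] += ch      # extend the current akshara
        else:
            clusters.append(ch)     # start a new akshara
    return clusters


def delete_last_akshara(text: str) -> str:
    """Remove the final akshara as one unit, as the cursor behaviour above implies."""
    return "".join(split_aksharas(text)[:-1])
```

With this rule, a conjunct together with its vowel sign is treated as a single unit, so deleting it removes all of its component keystrokes at once.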
Whenever you move
on to a different line, the cursor may move to a place not directly above
or below the old position and in case there is no text on that line, then
the language is set to the default language. In case there is some text
on that line, then the language is set to the language of that text. A
general recommendation is to type in no more than fifty characters per
line though many more can be typed in. The language can be set by selecting
the required language from the 'Language' option in the main menu. The other
editing options provided are Cut and Paste, Search and Replace, and inserting
text from other files.
Mixing many Languages
A line of
text in the Editor can have many languages, which can be selected from
the 'Language' option in the main menu. A general recommendation is to
use around two or three languages per file. The languages can be selected
in any order. The file having multiple languages is saved in a manner where
the languages are preserved when it is reopened in the Editor or browser.
You may save a file in a different language but the change will apply only
to the default language used in the entered text. Please see the section
on "saving as".
Cut and Paste Options
This option
for cutting and pasting of text is useful for generating text that has
many repeated parts. The desired text to be replicated can be selected
by dragging the mouse on it (keeping the mouse button down as the mouse
is moved). The Select All option in the Edit menu of the main menu can be used
to select the whole file. Selecting the Copy option in the Edit menu
copies the selected text into a buffer in memory. After placing
the cursor at the desired target location, the Paste option in the Edit
menu can be used to replicate the text at that location (any number of
times).
The only limitation of this method is that whole lines of text must be cut
and pasted at a time; this is currently a design limitation. Another file can
be inserted using the Insert file option in the Edit menu. The cut/copy and
paste operation can also be used to take text into other Windows applications
such as Word, Excel, Outlook Express (email composer window) or even the
Microsoft instant messenger. This is a very useful feature of the Editor. The
copy and paste operation combined with the on-screen transliteration feature
can save hours of work in preparing multilingual documents where the same
text is seen in different scripts.
Search and Replace Options
When this
option is selected, a new window appears on the screen with three fields.
The first is for the language specification and the second is for the input
string. The third field shows the string in the local language as letters
are typed into the second field.
The string to be searched
should be typed in the text area where the keystrokes are echoed in Roman.
However, this string is dynamically transformed into the chosen script
and is displayed in the text area below the former. The 'Find' button
should then be clicked to locate the text. 'Reset' will clear the search string.
The search can be repeated through the entire file by pressing the F3 key
or by clicking on the 'Find Next' option in the Edit menu of the main menu.
It must be borne in
mind that this somewhat different approach to inputting the search string
is caused by some restrictions imposed by the design of the Editor, which
relies on Microsoft Foundation Classes. It turns out that there is no easy
way to accept a string in local language for the find option. Hence the
input string is really presented as an ASCII string, which however is processed
to display the equivalent local language characters. Since the input has
to conform to ASCII conventions, use of the Ctrl key while forming conjuncts
will give some problems, as the control key will be interpreted differently.
To avoid this, the user should use the "^" key to indicate combinations
while entering strings with conjuncts. Please note that the use of the
caret key will cause some problems for Vedic accents where the caret key
has been assigned a specific function!
Text can
be searched and replaced using the Replace option in the Edit menu of the
main menu. Select a language for the text to be searched and replaced.
Type the text to be searched in the text area in English; it is dynamically
transformed into the selected language in the text area below. Then type
the transliterated replacement text in English in the 'Replace with' text area. Click
on the Replace button to replace the first occurrence of the text or on
the "Replace All" button to replace the text in the entire file. This option
is useful for replacing wrongly keyed in text by the correct one at a later
point of time.
The
search and replace operation is not yet supported in the Linux version
(Jan. 2003).
Inserting Text
Text can be inserted
at a point in a file by the Cut and Paste option or by simply moving the
cursor to that point and keying in the text. This option also allows the
user to add text from another file. This can be a very useful option in
association with the Saving and Saving as option. A file can be typed in
a language and can be saved as a file in another language using the Save
As option. These two files can be concatenated into one file by using the
Insert file option in the Edit menu. Using this option an important file
can be converted into any of the languages provided, and concatenated into
one file having many languages in a matter of a few minutes! The sample Vande.llf
file was prepared this way. The same file could also have been prepared
using the on-screen transliteration feature. In the current version of
the Editor, new data from another file may be inserted at any selected
line and not merely at the end.
Please note that cut/paste and insert file operations work with whole lines
only. This is a limitation in the current implementation of the editor.
Saving and "Saving As" Options
An interesting
feature of the IITM Editor is that the data input during the editing process
is retained in the memory of the computer in two different formats simultaneously.
Hence the file may be saved in either of the formats. The first of these
is the .llf format in which the characters of different Indian languages
are stored as 16 bit codes. The .llf format is universal in the sense that
it is a language independent representation of the text, which allows automatic
transliteration across different languages. The .llf format is also recognized
by IITM software running on other computer systems such as UNIX machines,
DOS machines and the Macintosh.
The .llf format is
a compact format where each character, be it a vowel, consonant, conjunct
or combination, occupies two bytes. This format is a BINARY format and
when the Editor saves the text in this format, it produces a binary file
consisting of 16 bit codes. The Editor attaches a header to the file when
it is saved. This header consists of specific Multilingual information
relating to the contents of the file.
The name of the file
may be specified by typing it in, when the window for specifying the file
name appears. By default, the Editor will save a file as "untitiled.llf"
if no file name is specified.
The .llf format is
very useful for saving the text if further processing of the entered text
is required, e.g., indexing the words, generating concordances, or other
linguistic processing. The IITM Local Language library may be used to write
applications that work with 16 bit codes.
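As a rough illustration of working with 16 bit codes, the Python sketch below converts between a list of codes and the corresponding two-bytes-per-character binary data, and counts how often each code occurs. It does not use the IITM Local Language library. The byte order (little-endian here) is an assumption, the code values are arbitrary, and the file header is assumed to have been removed already.

```python
import struct
from collections import Counter


def codes_to_bytes(codes):
    """Pack 16 bit codes into binary data, two bytes per code (little-endian assumed)."""
    return struct.pack("<%dH" % len(codes), *codes)


def bytes_to_codes(payload):
    """Unpack binary data (header already stripped) into a list of 16 bit codes."""
    if len(payload) % 2:
        raise ValueError("payload must contain a whole number of 16 bit codes")
    return list(struct.unpack("<%dH" % (len(payload) // 2), payload))


def code_frequencies(payload):
    """Count how often each 16 bit code occurs, a first step towards indexing."""
    return Counter(bytes_to_codes(payload))
```

Because every character occupies exactly two bytes, the n-th character of the text always starts at byte offset 2n of the data, which keeps this kind of processing simple.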
The second format
is known as the Rich Text Format (rtf), a standard used by Microsoft to
produce documents which may be easily imported into other application software
(such as WordPerfect, Microsoft Word or Wordpad). The rich text format
consists of purely ASCII text incorporating mechanisms to denote formatting
information.
Text saved in Rich
Text Format may be easily imported into applications such as Wordpad, Microsoft
Word etc., thus permitting global formatting of the entered text using
the features of these applications. Rich Text format can also be easily
translated into the HTML format useful for generating web documents. To
view a .rtf file generated by the IITM Editor on other systems, the corresponding
fonts must be available in the second machine.
The text saved in
the Rich Text format may be directly printed from any application that
can handle the format. This is also the preferred way of inserting text
in Indian languages into other documents, say, those prepared using Word or
WordPerfect.
It must be
remembered that the file saved by the Editor in .rtf format cannot be opened
again by the Editor. The Editor will open files saved in the .llf format
only. The Rich Text format embeds the information relating to the specific
fonts, which must be used in viewing the text. Thus while the .rtf format
is truly portable, one must also install the corresponding
font(s) on the system used for viewing.
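The way an .rtf file names the font needed for viewing can be seen in a minimal Python sketch. It builds a small Rich Text document whose font table names one font and whose body is plain ASCII, with bytes outside the printable ASCII range written as `\'hh` escapes. The sketch assumes the text is a sequence of 8-bit glyph codes for the named font; it does not reproduce the exact output of the Editor.

```python
def make_rtf(glyph_bytes: bytes, font_name: str) -> str:
    """Build a minimal RTF document that displays glyph_bytes in font_name."""
    body = []
    for b in glyph_bytes:
        ch = chr(b)
        if ch in "\\{}":
            body.append("\\" + ch)         # escape RTF control characters
        elif 32 <= b < 127:
            body.append(ch)                # printable ASCII is written as is
        elif b == 10:
            body.append("\\par\n")         # newline becomes a paragraph break
        else:
            body.append("\\'%02x" % b)     # other bytes become \'hh hex escapes
    return (
        "{\\rtf1\\ansi\\deff0"
        "{\\fonttbl{\\f0\\fnil " + font_name + ";}}"
        "\\f0 " + "".join(body) + "}"
    )
```

For example, `make_rtf(data, "iitmsans")` produces a document that asks the viewing application for the iitmsans font. If that font is not installed, the application substitutes another font and the text will not be displayed in the intended script.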
When you select the
"Save as" option, you can choose both the language in which the file is to
be saved and the format. Note that
a new file has to be first saved using the 'Save' option, before it can
be saved using the 'Save as' option.
Saving the file in a different language/script
The Editor works with
a universal representation for the characters in all the Indian Languages.
Thus when data is entered using one script say Malayalam, the text can
be saved so as to identify it as a document in, say Bengali. This means
that when you open the saved file it would come up with Bengali as the
script.
What would be the
use for such a feature?
Often when preparing
multilingual documents, the same text (typically a couplet from the Gita
or a poem of Tagore) may have to be reproduced in different scripts for
people to read them in their own mother tongue. In such situations, one
need not enter the same text again and again in different languages. (Remember
when we say text in Indian languages we mean text that is phonetically
presented). Once the text is prepared in the base language, a copy of it
can be opened by the Editor and saved in another language. This new file,
if appended to the first one, will produce a new document with the same
text in two different languages. The insert text option of the Editor will
be useful for appending files or inserting text from one file in the middle
of the document being edited.
Note: The assumption
that there is a common phonetic base across all the Indian languages is
the basis for this feature. While it may be thought that this should permit
any text to be entered in any language/script, one must keep in mind that
during data entry, the input is limited to the characters of the specific
language chosen. Characters found in other languages but not the one in
use in the Editor, cannot be input. Thus the universal representation is
useful for uniform display of the phonetic information. Characters specific
to one language can be input only in that language.
Vande.llf, included
with the Editor package, is an example of a file prepared by the Editor,
which displays the same text in different languages.
Output formats generated by the Editor
The local language text keyed in can be output as a .llf (Local Language Format)
file as well as a .rtf (Rich Text Format) file. The details of these formats
are described in the "Saving and Saving as" topic. The text displayed on
screen is in a format compatible with clipboard based cut/copy and paste
applications; this is the .rtf format.
Printing
The text being edited
may be printed using the print option in the File menu. The print preview
may be selected to get an idea of the appearance of the printed page. In
practice, it would be easier to copy the text into a word processor and
print the same after formatting the text to suit one's requirements.
If multiple printers
are installed in the system, the appropriate printer may be selected. Some
flexibility is available in orienting the page(s) to be printed.
The
Editor does not permit formatting of the text as in some word processors.
However, if the text was saved in the rich text format, a program such
as Wordpad or AbiWord may be used to effect the formatting prior to the
printing. The application accepting the .rtf file may be used to change
the appearance of the page by selecting the fonts, sizes, left or right
alignment etc.