History of the IITM Project
It would be of interest
to know something about the Systems Development Laboratory in the department
of Computer Science and Engineering at IIT Madras. This lab has distinguished
itself as a unique student-managed laboratory in the Institute. Started
around the time the 8080 CPU was introduced along with a small development
kit known as the SDK80, the lab was christened the hardware lab
of the department! Early projects in the lab centered around building small
systems for educational use, specifically to allow students to learn about
the underlying principles of computer systems. Though IIT Madras had at
that time one of the state-of-the-art machines (370/155), one could never
look inside the machine, much less put an oscilloscope probe inside
to look at waveforms! One of the earliest projects in the lab was the
design and implementation of a digital module for drawing Bezier curves
(please see the image in the acknowledgment page), the idea being the possibility
of generating characters of Indian scripts.
It took a few years
before bit-slice processors were available in the country. The earlier
8080 based hardware was modified to work with the AMD2900 series and a
simple system was demonstrated in 1982. The paper
"An approach to character generation using cubic splines" presented at
the IEEE Consumer Electronics conference in 1983 won the second place outstanding
paper award and paved the way for deeper interest among the students in working
towards systems for displaying Indian scripts.
The availability of
the PC in India made a big difference with the possibility of running the
curve generation algorithm in software, almost at the same speed that the
AMD2900 was providing. The students of the lab had by now mastered the
art of drawing curves and generating characters but the lab had virtually
no PC since the systems being built in the lab were 68000 based Unix machines
with graphical support. This was the period known for the six versus the
eight fight (the superiority of the 68000 CPU over the 8086!). The earliest
of the PCs which were procured for the lab were XTs and in 1988, the first
attempt at computing with Indian scripts was made by designing and implementing
an interpreter for a BASIC-like language written in Tamil or Telugu. The
characters would not be displayed through fonts but drawn on the screen
using curves. Independence from the vagaries of fonts was of great importance
to the students who could use a simple incremental algorithm to stroke
the characters independent of the script. At the
same time, the need to represent syllables rather than shapes was recognized
and the system used a sixteen bit code for the syllables of Tamil and Telugu.
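The text above does not say which incremental algorithm the students used
to stroke the curves. As an illustration only, the sketch below uses forward
differencing, one well-known incremental method for cubic Bezier curves: after
a small setup, each new point costs three additions per axis. The function
name and the tuple representation of points are assumptions made for this example.

```python
def bezier_forward_difference(p0, p1, p2, p3, steps):
    """Return steps + 1 points on a cubic Bezier curve.

    p0..p3 are (x, y) control points. This is an illustrative sketch,
    not the routine used in the IITM software.
    """
    h = 1.0 / steps
    state = []
    for axis in range(2):
        a, b, c, d = p0[axis], p1[axis], p2[axis], p3[axis]
        # Power-basis coefficients: B(t) = c3*t^3 + c2*t^2 + c1*t + c0
        c3 = -a + 3 * b - 3 * c + d
        c2 = 3 * a - 6 * b + 3 * c
        c1 = -3 * a + 3 * b
        c0 = a
        # Initial value and first, second and third forward differences at t = 0
        d1 = c3 * h ** 3 + c2 * h ** 2 + c1 * h
        d2 = 6 * c3 * h ** 3 + 2 * c2 * h ** 2
        d3 = 6 * c3 * h ** 3
        state.append([c0, d1, d2, d3])

    points = [(state[0][0], state[1][0])]
    for _ in range(steps):
        for s in state:
            s[0] += s[1]  # advance the value
            s[1] += s[2]  # advance the first difference
            s[2] += s[3]  # advance the second difference
        points.append((state[0][0], state[1][0]))
    return points
```

Because the inner loop uses only additions, a scheme of this kind suits slow
hardware, and it is independent of the script being drawn: only the control
points change from one character to another.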
The approach taken
was not unlike the Metafont approach suggested by Prof. Knuth but
the rendering was done in IIT's own way. TeX was a favourite with many
academically motivated developers but unfortunately interaction was out
of the question. In the IIT experiment, the representation of characters using
sixteen bits was done in a manner which made it possible to quickly identify
the strokes needed to generate the character. As many as four different
shapes, each made up of up to 16 curves, could combine together to generate
a composite shape for a syllable. The result was that one could get the
system to provide an interactive user interface so that computing with
Indian scripts could be made possible. Though a simple BASIC-like interpreter
(written using Turbo-C) was demonstrated, the real need was not a programming
environment but one in which applications would be available to users for
data preparation and processing.
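The actual bit layout of the sixteen bit code and the format of the stroke
tables are not described here. The sketch below only illustrates the general
idea of going from a syllable code to the curves that draw it, with the limits
of four shapes per syllable and sixteen curves per shape taken from the text.
All table contents, shape names and code values are hypothetical.

```python
MAX_SHAPES_PER_SYLLABLE = 4   # limit stated in the text
MAX_CURVES_PER_SHAPE = 16     # limit stated in the text

# Hypothetical shape table: each shape is a list of cubic curves,
# each curve a tuple of four (x, y) control points.
SHAPES = {
    "base_1": [((0, 0), (0, 4), (4, 4), (4, 0))],
    "sign_1": [((4, 0), (5, 2), (6, 2), (7, 0)),
               ((7, 0), (7, 1), (7, 2), (7, 3))],
}

# Hypothetical code table: 16-bit syllable code -> component shape names.
SYLLABLES = {
    0x0100: ["base_1"],
    0x0101: ["base_1", "sign_1"],
}


def curves_for(code):
    """Return every curve needed to draw the syllable with this code."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("syllable code must fit in sixteen bits")
    names = SYLLABLES[code]
    if len(names) > MAX_SHAPES_PER_SYLLABLE:
        raise ValueError("too many shapes for one syllable")
    curves = []
    for name in names:
        shape = SHAPES[name]
        if len(shape) > MAX_CURVES_PER_SHAPE:
            raise ValueError("too many curves in shape " + name)
        curves.extend(shape)
    return curves
```

A direct lookup of this kind is what would let an interactive program find
the strokes for a syllable quickly, since no analysis of the text is needed
at display time.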
It was in the light
of this requirement that the present project was conceived. The very first
application developed using the curve drawing approach to displaying text,
along with the syllable level internal representation was a screen editor
under DOS which worked on a set of application calls to provide input and
output functions to a user application. These functions were much like
the getch() and putch() functions of C but allowed us to input strings
in Indian scripts. These functions constituted what the students called
the local language library. They had these basic functions named lgetch()
and lputch() to indicate the local approach to input and output. Using
the library of functions, one could write a variety of applications, which
would work uniformly across the languages of India. In 1993, a gopher client
was built and it established the feasibility of developing many useful
applications supporting user interfaces in Indian languages by modifying
the character processing routines of standard applications to work with
two byte codes. The students also demonstrated a client to work with Oracle
and allow queries to be effected in Indian scripts and the results displayed.
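The names lgetch() and lputch() come from the text above, but their
signatures are not given. The sketch below is one possible shape for such a
pair, written in Python for illustration rather than in C. It assumes each
syllable is a two byte code stored high byte first; both the byte order and
the stream argument are assumptions.

```python
import io
import struct


def lputch(stream, code):
    """Write one sixteen bit syllable code to a binary stream (assumed big-endian)."""
    stream.write(struct.pack(">H", code))


def lgetch(stream):
    """Read one sixteen bit syllable code, or return None at end of input."""
    raw = stream.read(2)
    if len(raw) < 2:
        return None
    return struct.unpack(">H", raw)[0]


def read_all(stream):
    """Collect every syllable code remaining in the stream."""
    codes = []
    code = lgetch(stream)
    while code is not None:
        codes.append(code)
        code = lgetch(stream)
    return codes


# Usage: write two hypothetical codes to a buffer and read them back.
buffer = io.BytesIO()
for value in (0x0100, 0x0101):
    lputch(buffer, value)
buffer.seek(0)
round_trip = read_all(buffer)
```

An application written against such a pair handles one unit per syllable,
which is why changing only the character processing routines of an existing
program could be enough to make it work with Indian scripts.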
Around this time (1993),
there were quite a few Operating Systems in use but DOS was yielding to
Windows 3.x and Unix. This gave the students an opportunity to port
the system to several machines including the Macintosh. The lab was
fortunate to gain the friendship of Prof. Frank Starmer of Duke University,
who spent a sabbatical year at IIT Madras, and Prof. Sankara Rao of the
University of North Dakota who graciously agreed to provide space on their
systems and allow the IITM software to be made available via ftp to others.
This was a time when IIT Madras had just one 9600 baud line to the net
catering just to email services.
By this time, over
fifteen students had contributed to the development of the software and
many new applications were being envisaged. During 1994-96, the library
functions were standardized as was the global set of aksharas across the
Indian languages and a line editor (led), a viewer (lb) and a printing
utility were developed for as many as six platforms and these were subsequently
distributed from the Duke web site (http://taylor.mc.duke.edu/~krishnan/).
Subsequently, the
lab was fortunate to get a 486 machine and could host a small web server
whose purpose was to serve on-line lessons to learn Sanskrit. The multilingual
Editor "led" was enhanced with additional features and was used to prepare
the text for the lessons. This service won much appreciation from the
user community.
The on-line Sanskrit
lessons established something very important. That Indian language text
could be easily displayed on the web without having to install special
software was perhaps the most important observation. That the IITM software
had the best features to perform linguistic processing as well, was another
important observation. The lab's web server, which was known as
http://sdlcfsn.cs.iitm.ernet.in/, was
named to honour Prof. Charles Frank Starmer who had done much to help the
lab gain visibility on the net. Most persons who have come to know about
the IITM software actually got the details from the Duke site, which later
moved to the Medical University of South Carolina, along with Frank Starmer.
For historical reasons, the early pages with characters drawn through curves
were maintained at that time at
http://www.musc.edu/~krishnan
During the summer
of 1997, Prof. Raj Reddy of CMU, who had visited IIT Madras, saw the development
and immediately recognized the strength of the syllable level coding. He
felt strongly that it was time for the lab to start working with fonts
so that the standardization that was being effected on the net could be
honoured by the IITM software. Though the lab was fully aware of the vagaries
of fonts and specifically the chaotic situation in respect of Indian language
fonts, the students were convinced that the software would indeed gain
strength by providing output formats consistent with the support provided
by the newer systems, specifically Win95.
The newer versions
of the editor and related software, which work well on Win95 systems, actually
utilize the full complement of glyphs supported in TrueType fonts. However,
for the text to be rendered properly on other systems, compatibility with
ISO-8859-1 has been forced. Also it must be stated that the lab continued
to recommend the syllable level coding for the aksharas though elsewhere
in the world, Unicode was being recommended. Microsoft and other developers
continue to provide support for Indian languages primarily through a language
enabling process rather than a language localization process.
Our stand
on Unicode is reflected in the observation "Unicode for Indic Scripts requires
the Application Programmer to understand how a syllable should be rendered.
Application Programmers are thus expected to thoroughly comprehend the
Orthography of the script for the language. Being a variable length code,
Unicode is not easily amenable to linguistic text processing".
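The variable length point in the observation above can be seen with a single
Devanagari syllable. The sketch below uses only Python's standard library.
The choice of syllable is ours, made for illustration; the code point names
come from the Unicode character database.

```python
import unicodedata


def describe(text):
    """List each code point in the text with its Unicode character name."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch)) for ch in text]


# One written syllable (kshi): KA + VIRAMA + SSA + VOWEL SIGN I
syllable = "\u0915\u094D\u0937\u093F"

code_point_count = len(syllable)             # 4 code points for one syllable
utf8_byte_count = len(syllable.encode("utf-8"))  # 3 bytes per code point here

for code_point, name in describe(syllable):
    print(code_point, name)
```

A single syllable here occupies four code points, while a simple syllable
such as a bare consonant occupies one. A program that counts, sorts or
indexes syllables must therefore first group code points into syllables,
whereas a fixed sixteen bit syllable code gives one unit per syllable.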
1997 also brought
in an important development in the lab, that of synthesizing the sounds
of the aksharas. The syllable level coding made it possible for the students
to experiment with different synthesis schemes. Systems such as the Festival
speech synthesis system or the Klatt synthesizer were initially used by
the IITM software to directly go from akshara to sound but the rendering
of the phonemes was not satisfactory, being limited to either American
or British speech.
The MBROLA system was an ideal choice and it would not
be an exaggeration if it is said here that the very first continuous text
to speech application in Indian languages was demonstrated in just three
days after a version of MBROLA for Win95 was downloaded. A bit of experimentation
with the different data bases allowed the students to finalize the choice
on the Swedish data base. Today (July 2001), there is indeed a Hindi data
base for use with MBROLA but it is somewhat inadequate when it comes to
generating conjuncts. The recently added Telugu Data Base also supports
a very restricted set of diphones (August 2002).
Clearly the lab has
to work with other groups to develop meaningful data bases for all the
different Indian languages. The absence of proper recording resources has
hampered this activity but hopefully we will be able to work with other
groups.
Enhancing the Indian
language applications with speech is very easily accomplished in the IITM
software when new applications are developed. The speech enhanced multilingual
editor was one of the first applications developed at the lab for the benefit
of the visually handicapped persons in India. This application allows a
visually handicapped person to master data entry in Indian languages and thus
prepare himself/herself for higher education and meaningful employment.
At the same time, the syllable level coding matched the requirements of
Bharati Braille and preparation of documents in Braille could be easily
accomplished.
Senior citizens of
Chennai, who had read about the software through newspaper articles, had
proposed that volunteer groups be formed to promote the use of the
software for the benefit of the disabled and underprivileged. Thus was
born Vidya Vrikshah, the volunteer organization in Chennai, which now stands
as a fine example of a group of volunteers who have actually demonstrated that
IT does indeed hold much promise for literacy and education in the country,
if approached through the mother tongue.
The organization conducts monthly
training programs to train visually handicapped persons in the use of computers.
This program, given free of charge, has attracted several hundred people
from different parts of the country to come and get trained in the use of computers.
Recently, the group of experts from different organizations
for the disabled, who attended the INTEND 2001 conference,
wholeheartedly endorsed the IITM software
for large-scale use within the country.
Here
are some additional details relating to the project.
During the past fifteen years,
approximately sixty students have contributed to the development of the
software. The majority of the students were undergraduates who took up the
work out of a conviction that something meaningful can be achieved. Though
in many cases, the work related to their undergraduate project, their involvement
was deeper since they spent more than one year in the lab, getting trained
first before continuing the development.
The IITM software project has been unique in many respects within the IIT system.
It is the first project of its kind run entirely by the students over a
period of a decade. It is the very first project ever in the IIT system
where a product directly usable by the people of the country has been designed,
built and delivered to the people.
The project, by design, has not been funded by any Government or private organizations.
IIT Madras is the only Institution among the IITs which has refrained from
requesting funds from the ministry for technology development in Indian
languages. This has given the lab the freedom to make the software available
free of charge to the people. It has also given the students an opportunity
to work on socially relevant problems and provide workable solutions as
opposed to developing prototypes which would require further work to make
them usable.
It is very clear that
commercialization of any product, especially software, would render it
unreachable to the section of the community that should truly get access
to it. The students of the lab are quite convinced about this. The free
distribution of the software by IIT Madras cannot be likened to the free
distribution of software developed at many academic institutions of the
world. The purpose is to make available to someone the basic means for
gaining literacy. It is a different question of course if people choose
to ignore the software. We do hope that this will not happen.
A related issue is that the
development at the lab has not been publicized or made known through academic
channels. The only channel of information transfer has been the lab's web
server which carries the on-line Sanskrit lessons. Also, the question of
why the lab has not worked with other groups in the country needs to be
addressed. The answer is easy to provide. It is not unusual for heavily
funded academic projects to gain national visibility. The destiny of what
results from a project is often decided by the funding agency and not the
group which develops the technology. The question of working with
other groups just did not arise because almost all of them run funded projects
and would find it difficult to freely share the information. This is an
important ideological difference that must be reckoned with. This would explain
the conspicuous absence of IIT Madras' name when IT in Indian languages
gets discussed at a national level.
On a philosophical note, it
is not the technology that matters. It is how the technology is actually
used by the people that really counts. The IITM project has consciously
addressed the latter issue, while most projects have concentrated on the
former.
Resources for continuing the development were frequently added by students themselves
during their visits to the lab. In many cases, the students of the
lab who had pursued graduate study in the U.S. would bring back books, peripherals
and other development software and thus bless the continued development
of the project. This is in contrast with the normal IIT approach to resource
generation through sponsored research. The end result of the approach taken
by the lab is extremely satisfying: a socially relevant problem receives
attention and a solution as well.