Indian Institute of Technology, Madras
In most world languages, the basic alphabet is found to consist of not more than 50 or 60 letters and there are enough keys on the ASCII keyboard (including the shift) to accommodate all of them. The character sets of the Indian languages, made up of basic consonants and vowels which number about 55 or so, can therefore be assigned keys on the keyboard without much difficulty and can be mapped using only the Roman letters and the numerals.
The IIT Madras system is very flexible in that it permits any mapping between the Indian language letters and the keys on the ASCII keyboard. However, two specific assignments are meaningful: one, the familiar Indian language typewriter-based mapping, and two, a phonetically based mapping. The former permits those familiar with local language typewriters to type naturally, and the latter helps those familiar with English to type Indian language texts. Clearly, the data entry method using the Indian language typewriter keyboard differs from language to language, but it allows fast typing for those already adept at that keyboard.
For the sake of uniformity, the IITM system recommends the phonetically assigned keyboard mapping for data entry across all the Indian languages. More information on this follows. It is useful to remember here that the current keyboard assignment maps a superset of the basic consonants and vowels in typical use across all the Indian languages. Hence characters which are unique to some languages also have a place in the common mapping.
Here is an inline image of the keyboard mapping for local language characters.
This image relates to Sanskrit. Similar image files are available for all the languages as Postscript files. Click on the link below to view the actual mappings for different languages. If your web browser invokes ghostscript appropriately, then the files will be seen on your screen. Otherwise, you will have to download the files and print them on a Postscript printer.
While typing, the ALT key on the keyboard (by now standard on
all computer keyboards) is used to indicate that the character key pressed
simultaneously with the ALT key is to be handled specially. This special
handling usually means that a conjunct is being formed. For example, if
k and r are pressed one after the other, the system would accept them as two
different consonants and will display (ka, ra).
However if after pressing k, the r key is pressed along with the ALT key,
then the system would accept this as a conjunct formed by ka and ra
and will display (ka + ra).
A consonant-vowel combination is quite frequent and therefore the system has
been designed to treat a vowel entered after a consonant, as a part of the
consonant. Thus entering i after k will result in (ki).
There are situations where a vowel has to stand apart and not combine with
a previously entered consonant. In such cases, entering the vowel along
with the ALT key pressed will produce the consonant followed by a distinct
vowel. Thus k followed by ALT i results in (ka, i).
There are several instances in Gujarati, Sanskrit and Tamil where a vowel may
follow a consonant but should stand out by itself. To cater to such situations,
the ALT key can be used.
As may be inferred from the foregoing, the keyboard entry is managed through a state machine in the software which keeps track of the keys already entered. It is this state machine that gives power to the system which allows very natural data entry using the standard ASCII keyboard.
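The rules described above can be sketched as a small state machine. This is only an illustration, not the IITM implementation: the key sets, the function name process_keys, the romanized output, and the choice to reject an invalid ALT combination with an error are all assumptions made for the example. Only the ALT rules and the three-consonant limit come from the text.

```python
# Hypothetical key sets; the real mapping covers a superset of the
# consonants and vowels used across the Indian languages.
CONSONANTS = set("kgcjtdnpbmyrlvsh")
VOWELS = set("aiueo")
MAX_CONJUNCT = 3  # conjuncts are limited to three consonants


def render(syllable):
    """Romanized form of a syllable; 'a' is the inherent vowel."""
    if syllable["cons"]:
        return "".join(syllable["cons"]) + (syllable["vowel"] or "a")
    return syllable["vowel"]


def process_keys(keys):
    """Turn (char, alt_pressed) pairs into a list of romanized syllables."""
    syllables = []  # each entry: {"cons": [...], "vowel": str or None}
    for ch, alt in keys:
        last = syllables[-1] if syllables else None
        # A syllable is "open" while it has consonants but no explicit vowel.
        is_open = bool(last and last["cons"] and last["vowel"] is None)
        if ch in CONSONANTS:
            if not alt:
                # Plain consonant: always starts a new syllable.
                syllables.append({"cons": [ch], "vowel": None})
            elif not is_open:
                raise ValueError("ALT+%s has no consonant to join" % ch)
            elif len(last["cons"]) >= MAX_CONJUNCT:
                raise ValueError("conjunct would exceed three consonants")
            else:
                # ALT + consonant: extend the conjunct being formed.
                last["cons"].append(ch)
        elif ch in VOWELS:
            if is_open and not alt:
                # Plain vowel after a consonant becomes part of it.
                last["vowel"] = ch
            else:
                # ALT + vowel (or a vowel with nothing to attach to)
                # stands by itself.
                syllables.append({"cons": [], "vowel": ch})
        else:
            raise ValueError("unmapped key: %r" % ch)
    return [render(s) for s in syllables]


print(process_keys([("k", False), ("r", False)]))  # ['ka', 'ra']
print(process_keys([("k", False), ("r", True)]))   # ['kra']
print(process_keys([("k", False), ("i", False)]))  # ['ki']
print(process_keys([("k", False), ("i", True)]))   # ['ka', 'i']
```

The only state the sketch needs is the syllable currently being built, which is why keeping track of the keys already entered is sufficient to decide how each new key is handled.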
The following points may be kept in mind however.
While several hundreds of conjunct characters may be formed theoretically, the set used in practice is usually 500 or so. The keyboard entry is therefore programmed to accept only valid conjuncts. While the superset of supported conjuncts is more than adequate for all practical uses, some earlier writing systems (around 1000 AD) used combinations of four or more consonants in one letter. The present system does not cater to all conceivable conjuncts and restricts conjuncts to a maximum of three consonants. The document test.llf may be viewed to get an idea of the set of conjuncts supported for any language.
The document sanskbd.llf may be viewed to get more information about the keyboard mappings in Sanskrit.
Note : You will require "lb" for viewing the above.
Yet some tricks can be employed to display these characters. The superset of characters includes special consonants which have no meaning directly but can be used to generate interesting character shapes. There are four such consonants which may be used in combination with others to generate special character shapes or symbols that may accompany a regular consonant. Let us look at some typical situations where these will be of help.
Though there is no formal notation for Indian classical music, teachers employ special symbols to represent the variants of a note, the pitch to be used in singing a note etc. The key assigned for generating music symbols is language-dependent.
Follow the link to view the document explaining this in greater detail. You will need to update your language support files for Tamil to view this document correctly, since some of the music symbols were not correctly included in the files distributed with the archives.
Sample document about Music related symbols.
From time immemorial, the Vedas have been taught via the oral tradition. In recent times, interest in studying and understanding the Vedas has led to a notation that is useful for learning the chanting procedures.
The Vedic symbols primarily consist of different Anuswaras, Visargas and chanting accents called Anudatta, Swarita, Campa etc. Of these, the Anuswaras and Visargas are written following a character and can therefore be included in the text by just entering them using the assigned key combinations. On the other hand, the accent symbols Anudatta, Swarita etc. are written alone or below a character and hence require very special handling. In the IITM software, these are handled through special codes (distinguished by the most significant bit of the 16-bit code) which result in the proper positioning of the accents during display.
These special codes are similar in function to the Escape codes used with standard ASCII streams. However they are not part of the character set and cannot therefore be handled by the library. Yet there is a provision in the system to alter the process of normal display of a character by adding some attributes and the accent marks are handled through these attribute-interpreting functions which may be called by the applications. It will therefore be necessary to make use of these attribute functions while preparing the text. This may be accomplished by assigning some of the special function keys on the keyboard to the attributes to be assigned to the previously entered character. The line editor "mled" does not incorporate such a feature now. This will however be included in the next version of the editor which will support direct screen editing.
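As a rough illustration of how a code with the most significant bit set could be told apart from an ordinary character code, consider the sketch below. Only the use of the top bit of a 16-bit code comes from the text. The function names, the idea that an attribute code follows the character it modifies in the stream, and the sample code values are assumptions made for the example and do not describe the actual IITM library.

```python
ATTRIBUTE_FLAG = 0x8000  # most significant bit of a 16-bit code


def is_attribute_code(code):
    """True if the 16-bit code has its most significant bit set."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("not a 16-bit code: %r" % code)
    return bool(code & ATTRIBUTE_FLAG)


def attach_attributes(codes):
    """Pair each character code with the attribute codes that follow it.

    Assumption: an attribute code applies to the character before it.
    The flag bit is stripped from the stored attribute value.
    """
    result = []
    for code in codes:
        if is_attribute_code(code):
            if not result:
                raise ValueError("attribute code with no preceding character")
            result[-1][1].append(code & ~ATTRIBUTE_FLAG & 0xFFFF)
        else:
            result.append((code, []))
    return result


# Made-up code values: 0x0101 and 0x0202 stand for characters,
# 0x8001 for an accent attribute applied to the first of them.
print(attach_attributes([0x0101, 0x8001, 0x0202]))
```

A display routine could then draw each character normally and pass its attribute list to whatever function positions the accent.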
The browser program "lb" was written to interpret the attribute codes and so one will be able to view text prepared with Vedic symbols and accents. The link below will take you to a sample document containing Vedic accent marks.
Sample document with Vedic accent marks
In these two languages, one sees specialized symbols for nasalization of the vowel sounds. In Hindi, the Bindu and the Chandrabindu symbols are frequently used. In Punjabi, one sees similar symbols for nasalization, including the Tipe mark. The Punjabi symbol 'adda' is used for doubling the succeeding consonant.
Also, these two languages have characters which are of Persian influence. There are seven such cases involving ka, kha, ga, pha, ja, ta and ddha. It is normal practice to display these characters through the use of a "dot" beneath the character. While entering such characters in either Hindi or Punjabi, the Q key is useful. The required character is obtained by first entering the basic consonant (ka, pha etc.) and following it with ALT Q.
In Sanskrit, the half character shape is often used while forming conjunct characters. The half character shape applies to most of the consonants, though there are exceptions such as nga, ha, da etc. For many of the consonants, the half character shape can be generated by first entering the special consonant key Q followed by ALT (cons) where cons is the consonant whose half shape is required. One use for these half shapes is while forming unusual conjuncts which are not supported by the system directly. Also in teaching the script, it would be helpful to display these half shapes.
The Matras or Vowel symbols can be generated by entering Q followed by the corresponding Vowel.
The symbols known as Avagraha and Jihvamulya are commonly seen in Sanskrit texts (also some manuscripts in other languages). Special keys have been assigned for these two symbols as well. The key { corresponds to the Avagraha while } represents the Jihvamulya.
The coding scheme of the IIT Madras system has imposed a restriction on the number of conjuncts which can be formed with any basic consonant and presently this is set at 31. What this means is that the system will not be able to handle (using the 16 bit representation) more than 31 conjunct letters formed with any of the basic consonants. This is not a real restriction in practice since any conjunct can always be written by successively writing the combining characters in their generic forms (the form without any vowel signs).
The inline images show two examples.
The first form uses three 16 bit characters to represent the conjunct while the second uses only one. Both the representations are valid though the second form is preferred as it corresponds to a single syllable.
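The choice between the two representations can be sketched as a table lookup with a fallback. This is a hypothetical model only: the class ConjunctTable, its methods, the symbolic tuples standing in for 16-bit codes, and the assumption that a conjunct is filed under its first consonant are all inventions for illustration. The limit of 31 per basic consonant and the fallback to generic forms are the only parts taken from the text.

```python
MAX_CONJUNCTS_PER_BASE = 31  # limit imposed by the coding scheme


class ConjunctTable:
    """Hypothetical registry of conjuncts, grouped by base consonant."""

    def __init__(self):
        self._slots = {}  # base consonant -> {consonant tuple: slot number}

    def register(self, consonants):
        """Give a conjunct a slot under its first consonant (an assumption)."""
        key = tuple(consonants)
        slots = self._slots.setdefault(key[0], {})
        if key in slots:
            return slots[key]
        if len(slots) >= MAX_CONJUNCTS_PER_BASE:
            raise OverflowError("no free conjunct slot for %r" % key[0])
        slots[key] = len(slots) + 1
        return slots[key]

    def encode(self, consonants):
        """Return one symbolic code if registered, else the generic forms."""
        key = tuple(consonants)
        slot = self._slots.get(key[0], {}).get(key)
        if slot is not None:
            return [("conjunct", key[0], slot)]       # single character
        return [("generic", c) for c in key]          # one per consonant


table = ConjunctTable()
table.register(["k", "r"])
print(table.encode(["k", "r"]))       # one code for the whole conjunct
print(table.encode(["s", "t", "r"]))  # three codes, one per generic form
```

Because the fallback always succeeds, running out of slots for a base consonant limits only the compact form, not what can be written.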
Traditionally, in Sanskrit and other languages, most consonants are known to have had fewer than 31 conjuncts in use. It also does not make sense to form conjuncts arbitrarily, so the limit of 31 is not really a restriction except, not surprisingly, for the letter 'ra'.
It is found that there are close to 60 conjuncts based on the consonant 'ra'. To accommodate this, the IIT Madras system has split the conjuncts with 'ra' into two groups. The first corresponds to conjuncts formed with 'ra' and the consonants 'ka' through 'da'. The second group consists of all the conjuncts formed with 'ra' and the consonants 'dha' through the rest. Thus 'ra' has been assigned two independent mappings on the keyboard (r as well as R). During normal keyboard entry, both will result in the basic 'ra' but, while forming conjuncts with other letters, the 'ra' from r should be used for combinations with 'ka' through 'da' and the 'ra' from R should be used for the rest.
The inline image below illustrates this.
This arrangement continues to preserve the sorting order of the
conjuncts.
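The rule for choosing between r and R can be expressed as a lookup on the position of the other consonant in the traditional consonant order. The sketch below is an illustration under assumptions: the romanized names in VARNAMALA, the function name ra_key_for, and the reading of 'da' as the dental da (so that the first group ends just before dental dha) are not taken from the IITM software.

```python
# Traditional consonant order with assumed romanized names;
# "tta" through "nna" stand for the retroflex series.
VARNAMALA = [
    "ka", "kha", "ga", "gha", "nga",
    "ca", "cha", "ja", "jha", "nya",
    "tta", "ttha", "dda", "ddha", "nna",
    "ta", "tha", "da", "dha", "na",
    "pa", "pha", "ba", "bha", "ma",
    "ya", "ra", "la", "va",
    "sha", "ssa", "sa", "ha",
]

# Assumption: the first group ends at dental "da".
LAST_OF_FIRST_GROUP = VARNAMALA.index("da")


def ra_key_for(consonant):
    """Return 'r' or 'R' for a conjunct of ra with the given consonant."""
    position = VARNAMALA.index(consonant)  # ValueError if unknown
    return "r" if position <= LAST_OF_FIRST_GROUP else "R"


for name in ("ka", "da", "dha", "ma"):
    print(name, ra_key_for(name))
```

Splitting at a single point in the consonant order, rather than by an arbitrary list, is what lets the two groups keep the conjuncts in sorted order.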