Indian Institute of Technology, Madras

KEYBOARD ENTRY OF LOCAL LANGUAGE CHARACTERS


General Introduction to Keyboard entry

The standard ASCII keyboard (the familiar QWERTY or typewriter keyboard) is a part of any computer-user interface. It would make little sense to design a separate keyboard to support data entry in Indian languages (though specific keyboards for Japanese have been built). Such keyboards, even if designed, would be difficult to incorporate into the different computers in use today. The IITM system therefore chose to use the standard ASCII keyboard itself, since it is available with all systems and is certain to include the letters of the Roman alphabet. These keyboards also support the ALT key, which may be used to specify an alternate interpretation for the key pressed along with it. The ALT key is thus well suited to implementing state machines that interpret successive key presses appropriately.

In most world languages, the basic alphabet consists of not more than 50 or 60 letters, and there are enough keys on the ASCII keyboard (including the shifted positions) to accommodate all of them. The character sets of the Indian languages, made up of basic consonants and vowels which number about 55, can therefore be assigned keys on the keyboard without much difficulty and can be mapped using only the Roman letters and the numerals.

The IIT Madras system is very flexible in that it permits any mapping between the Indian language letters and the keys on the ASCII keyboard. However, two specific assignments are meaningful: one, the familiar Indian language typewriter based mapping, and two, a phonetically based mapping. The former permits those familiar with local language typewriters to type naturally, and the latter helps those familiar with English to type Indian language texts. Clearly, the data entry method using the Indian language typewriter keyboard differs from language to language, but it allows fast typing for those already adept at that keyboard.

For the sake of uniformity, the IITM system recommends the phonetically assigned keyboard mapping for data entry across all the Indian languages. More information on this follows. It is useful to remember here that the current keyboard assignment maps a superset of the basic consonants and vowels in typical use across all the Indian languages. Hence characters which are unique to some languages also have a place in the common mapping.

Here is an inline image of the keyboard mapping for local language characters.

This image relates to Sanskrit. Similar image files are available for all the languages as Postscript files. Click on the link below to view the actual mappings for different languages. If your web browser invokes ghostscript appropriately, then the files will be seen on your screen. Otherwise, you will have to download the files and print them on a Postscript printer.

Key maps for other languages

Entering Consonant Vowel combinations as well as Conjuncts using the Keyboard

Even though the number of different characters used in many of the Indian languages runs into thousands, all of them are built around a much smaller set comprising the basic consonants and vowels. As seen earlier, this set consists of about 43 consonants and 16 vowels, just about what can be mapped on to an ASCII keyboard.

While typing, the ALT key on the keyboard (by now standard on all computer keyboards) is used to indicate that the character key pressed simultaneously with the ALT key is to be handled specially. This special handling usually means that a conjunct is being formed. For example, if k and r are pressed one after the other, the system will accept them as two different consonants and will display them as two separate letters (ka, ra).

However, if after pressing k the r key is pressed along with the ALT key, the system will accept this as a conjunct formed by ka and ra and will display (ka + ra).

A consonant-vowel combination is quite frequent, and therefore the system has been designed to treat a vowel entered after a consonant as a part of the consonant. Thus entering i after k will result in (ki).

There are situations where a vowel has to stand apart and not combine with a previously entered consonant. Several such instances occur in Gujarati, Sanskrit and Tamil. In these cases, entering the vowel with the ALT key pressed will produce the consonant followed by a distinct vowel. Thus k followed by ALT i results in (ka, i).

As may be inferred from the foregoing, the keyboard entry is managed through a state machine in the software which keeps track of the keys already entered. It is this state machine that gives power to the system which allows very natural data entry using the standard ASCII keyboard.
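The ALT key rules described above can be sketched as a small state machine. This is only an illustrative sketch, not the IITM software: the function names, the tiny key sets, and the Roman-letter rendering (with an inherent "a" on a bare consonant) are assumptions made for the example.

```python
CONSONANTS = set("kr")  # hypothetical subset of the consonant keys
VOWELS = set("ai")      # hypothetical subset of the vowel keys


def compose(keys):
    """Group (key, alt_pressed) pairs into syllables.

    Each syllable is a list [consonants, vowel]; vowel is None until set.
    """
    syllables = []
    for key, alt in keys:
        last = syllables[-1] if syllables else None
        # A syllable is "open" while it has consonants but no vowel yet.
        is_open = last is not None and bool(last[0]) and last[1] is None
        if key in CONSONANTS:
            if alt and is_open:
                last[0].append(key)              # ALT + consonant: extend the conjunct
            else:
                syllables.append([[key], None])  # plain consonant: a new letter
        elif key in VOWELS:
            if not alt and is_open:
                last[1] = key                    # plain vowel: joins the consonant
            else:
                syllables.append([[], key])      # ALT + vowel: stands apart
        else:
            raise ValueError(f"unmapped key: {key!r}")
    return syllables


def romanize(syllables):
    """Render syllables in Roman letters; a bare consonant carries inherent 'a'."""
    return ["".join(cons) + (vowel or "a") for cons, vowel in syllables]
```

For instance, romanize(compose([("k", False), ("r", True)])) gives the single conjunct syllable, while the same keys without ALT give two separate letters.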

The following points may be kept in mind however.

While several hundreds of conjunct characters may be formed theoretically, the set used in practice is usually 500 or so. The keyboard entry is therefore programmed to accept only valid conjuncts. While the superset of supported conjuncts is more than adequate for all practical uses, some earlier writing systems (around 1000 AD) used combinations of four or even more consonants in one letter. The present system does not cater to all conceivable conjuncts and restricts conjuncts to a maximum of three consonants. The document test.llf may be viewed to get an idea of the set of conjuncts supported for any language.

The document sanskbd.llf may be viewed to get more information about the keyboard mappings in Sanskrit.

Note : You will require "lb" for viewing the above.
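The rule that only valid conjuncts of at most three consonants are accepted can be sketched as follows. The sample set of supported conjuncts and the function name are hypothetical; the actual supported set is language specific and is the one shown in test.llf.

```python
MAX_CONJUNCT_CONSONANTS = 3  # limit stated in the text

# Hypothetical sample only; the real supported set is language specific.
SUPPORTED_CONJUNCTS = {
    ("k", "r"),
    ("s", "t", "r"),
}


def accept_conjunct(consonants):
    """Return True if the consonant sequence is an accepted conjunct."""
    sequence = tuple(consonants)
    # A conjunct needs at least two consonants and may have at most three.
    if not 2 <= len(sequence) <= MAX_CONJUNCT_CONSONANTS:
        return False
    # Even a short sequence is rejected unless it is in the supported set.
    return sequence in SUPPORTED_CONJUNCTS
```

Checking the length before the set lookup means a sequence of four or more consonants is rejected even if someone were to add it to the table by mistake.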

Using "lb" to check out the key mappings

"lb" was designed to accept many of the commands from the keyboard. Therefore one can check the key mappings by entering characters in response to say, the prompt issued by lb when a switch in the language is sought by the user. Since the state machine is in operation during keyboard entry, all the supported characters can be entered. Users may also get a feel for how the software allows for editing the characters as well using the arrow and backspace keys. That the system handles variable width characters effectively will be seen during keyboard entry.

Note on Vowels, special symbols and rarely used Conjuncts

Sanskrit (as well as many other Indian languages) traditionally includes two somewhat rarely used vowels. These are the long ru and lu. These have their associated matras as well and, for the sake of completeness, should find a place in the set of vowels. The IIT Madras system has restricted the total number of vowels to 16 and the above two are not included. It will therefore not be possible to show these two vowels or their combinations with consonants. Likewise, there are some rarely used conjuncts, such as the often quoted combination of ra, tha, sa, na and ya, which cannot be directly handled by the software.

Yet some tricks can be employed to display these characters. The superset of characters includes special consonants which have no meaning directly but can be used to generate interesting character shapes. There are four such consonants which may be used in combination with others to generate special character shapes or symbols that may accompany a regular consonant. Let us look at some typical situations where these will be of help.

Special treatment of the letter 'ra'

The letter ra occupies an important place in the Devanagari Script, for it forms conjunct consonants with almost all the other consonants and even with many two consonant conjuncts.

The coding scheme of the IIT Madras system has imposed a restriction on the number of conjuncts which can be formed with any basic consonant and presently this is set at 31. What this means is that the system will not be able to handle (using the 16 bit representation) more than 31 conjunct letters formed with any of the basic consonants. This is not a real restriction in practice since any conjunct can always be written by successively writing the combining characters in their generic forms (the form without any vowel signs).

The inline images show two examples.

The first form uses three 16 bit characters to represent the conjunct while the second uses only one. Both the representations are valid though the second form is preferred as it corresponds to a single syllable.

Traditionally, in Sanskrit and other languages, most consonants are known to have had fewer than 31 conjuncts in use. It also makes no sense to form conjuncts arbitrarily, so the limit of 31 is not really a restriction except, not surprisingly, for the letter 'ra'.
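One way a limit of 31 conjuncts per consonant could arise in a 16-bit code is a 5-bit conjunct field with one value reserved for the plain consonant. The sketch below illustrates that idea only. The field widths, their order, and the function names are assumptions made for the example and are not the documented IITM encoding.

```python
# Hypothetical field layout (NOT the documented IITM format):
#   bits 9..14  consonant index (6 bits, room for the 43 basic consonants)
#   bits 4..8   conjunct index  (5 bits: 0 = plain consonant, 1..31 = conjuncts)
#   bits 0..3   vowel index     (4 bits, room for the 16 vowels)
CONJUNCT_BITS = 5
MAX_CONJUNCTS = (1 << CONJUNCT_BITS) - 1  # 31


def pack(consonant, conjunct, vowel):
    """Pack the three indices into one integer that fits in 16 bits."""
    if not 0 <= consonant < 64:
        raise ValueError("consonant index out of range")
    if not 0 <= conjunct <= MAX_CONJUNCTS:
        raise ValueError("at most 31 conjuncts per basic consonant")
    if not 0 <= vowel < 16:
        raise ValueError("vowel index out of range")
    return (consonant << 9) | (conjunct << 4) | vowel


def unpack(code):
    """Recover (consonant, conjunct, vowel) from a packed code."""
    return code >> 9, (code >> 4) & MAX_CONJUNCTS, code & 0xF
```

Under this assumed layout a 32nd conjunct simply has no slot, which is why such a conjunct would have to be written as a sequence of separate characters instead.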

It is found that there are close to 60 conjuncts based on the consonant 'ra'. To accommodate this, the IIT Madras system has split the conjuncts with 'ra' into two groups. The first corresponds to conjuncts formed with 'ra' and consonants 'ka' through 'da'. The second group consists of all the conjuncts formed with 'ra' and the consonants from 'dha' onwards. Thus 'ra' has been assigned two independent mappings on the keyboard (r as well as R). During normal keyboard entry, both will result in the basic 'ra' but, while forming conjuncts with other letters, the 'ra' from r should be used for combinations with 'ka' through 'da' and the 'ra' from R should be used for the rest.

The inline image below illustrates this.

This arrangement continues to preserve the sorting order of the conjuncts.
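The choice between r and R when forming a conjunct with 'ra' can be sketched as a lookup on the consonant's position in the alphabet. The sketch assumes the standard Devanagari consonant order in a common romanization and takes 'da' to mean the dental da; the IITM superset of about 43 consonants may order or name its letters differently, and the function name is hypothetical.

```python
# Standard Devanagari consonant order (an assumption; capital letters
# mark the retroflex series, "Sha" the retroflex sibilant).
VARNAMALA = [
    "ka", "kha", "ga", "gha", "nga",
    "ca", "cha", "ja", "jha", "nya",
    "Ta", "Tha", "Da", "Dha", "Na",
    "ta", "tha", "da", "dha", "na",
    "pa", "pha", "ba", "bha", "ma",
    "ya", "ra", "la", "va",
    "sha", "Sha", "sa", "ha",
]

# Assumed boundary: the first group ends at the dental "da".
LAST_IN_FIRST_GROUP = VARNAMALA.index("da")


def ra_key_for(consonant):
    """Return the key ('r' or 'R') to use for 'ra' in a conjunct with consonant."""
    position = VARNAMALA.index(consonant)  # raises ValueError if unknown
    return "r" if position <= LAST_IN_FIRST_GROUP else "R"
```

Because the split follows alphabetical position, each group covers a contiguous run of consonants, which is consistent with the sorting order being preserved.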