Jan 9, 2010

Palm Leaves to OSX; problems of Pali Fonts

Pali was originally a spoken language only, and was not committed to writing until several hundred years after the Buddha's time. In India during the Buddha's own lifetime, writing was a fairly recent technological innovation and was used only for practical purposes such as commercial and diplomatic messages. It was still considered improper to use such a vulgar medium for religious texts.

So, Pali has no written alphabet of its own. The language has, by one count, 32 consonant and 8 vowel sounds. The consonants are organized logically in a grid according to how they are sounded: whether or not they are aspirated, and where the tongue is placed in the mouth. This is very different from the Roman alphabet used in English and other Western European languages, but it is a system widely used in South and South-East Asian alphabets. (Some readers may be familiar with a similar system adopted by J.R.R. Tolkien for his imaginary Elvish languages. Tolkien was, after all, a linguist.)

In the traditional Theravada countries, Pali is easily rendered into the local alphabets, and there are Sinhala, Thai and Burmese editions of the Tipitaka. Pali was not rendered into Roman script until the nineteenth century, when German and English scholars began to take an interest in the scriptures of Theravada Buddhism. A problem arose immediately: the Roman alphabet does not have enough letters to render each Pali sound.

This was solved in two ways. First, the aspirated versions of several consonants were rendered by adding an "h." Thus bh, kh, etc. each represent only one letter in Pali; the "dh" in "Buddha" is a single letter, not two. This is a reasonable compromise and only confuses those unfamiliar with Pali orthography, which is why we see common misspellings such as "Bhudda."
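
To make the digraph rule concrete, here is a small Python sketch (my own illustration, not any standard tool) that splits a romanized Pali word into Pali letters, counting each consonant-plus-"h" aspirate as one letter. It assumes the ten aspirates listed in the code and matches them greedily; it is a toy, not a full orthographic parser.

```python
import unicodedata

# The ten Pali aspirates, each written in Roman script as consonant + "h".
# Assumption: "ṭh" and "ḍh" use the precomposed letters ṭ and ḍ.
ASPIRATES = {"kh", "gh", "ch", "jh", "ṭh", "ḍh", "th", "dh", "ph", "bh"}


def pali_letters(word: str) -> list[str]:
    """Split a romanized Pali word into Pali letters (aspirate digraphs count as one)."""
    # Normalize so letters like "ṭ" are single code points, then lowercase.
    word = unicodedata.normalize("NFC", word).lower()
    letters = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ASPIRATES:
            # A consonant followed by "h" is one aspirated letter.
            letters.append(pair)
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


print(pali_letters("bhikkhu"))  # ['bh', 'i', 'k', 'kh', 'u']
print(pali_letters("Dhamma"))   # ['dh', 'a', 'm', 'm', 'a']
```

Greedy matching is the simplest choice here: whenever the next two characters form an aspirate, they are taken together.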

The other method adopted was the addition of diacritical marks. Pali vowels are relatively simple; there are five basic vowel sounds, which occur as either "long" or "short." The length of a vowel does not change its basic sound, only the time it is held, and it matters mostly for the metre of verse. The long vowels are indicated by a macron (a dash) over the letter: ā ī ū

Several of the consonants have a "retroflex" version, a sound not familiar to English speakers. It is made by curling the tongue back in the mouth, and it is indicated by a dot placed under the letter: ḍ ṭ. There is also a special version of n, pronounced like the "ny" in English "canyon," which is indicated by a tilde (a mark like a sine-wave) over the n, as in Spanish: ñ

This leaves one very special sound in Pali to be rendered: the "pure nasal," or in Pali the "niggahita," which nasalizes the preceding vowel. It is not really a sound on its own, but roughly it is like a terminal "ng" as in English "ring." There is a lot of typographical confusion over this letter in Roman Pali. Nowadays it is most commonly indicated by an "m" with a dot underneath, but in many older books one will see a funny "n" with a curly tail, or an "m" with a dot over it, or even an "n" with a dot over it: ṃ ŋ ṁ ṅ
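
As it happens, Unicode models most of these letters the same way the printers did: a base Roman letter plus a mark. A quick check with Python's standard `unicodedata` module (an illustration only) decomposes each letter into its parts. The "n" with a curly tail (ŋ) has no decomposition, because Unicode treats it as a letter in its own right.

```python
import unicodedata


def parts(letter: str) -> list[str]:
    """Return the Unicode names of a letter's base character and combining marks."""
    # NFD (canonical decomposition) splits a precomposed letter apart.
    return [unicodedata.name(c) for c in unicodedata.normalize("NFD", letter)]


for letter in ["ā", "ī", "ū", "ḍ", "ṭ", "ñ", "ṃ", "ŋ"]:
    print(letter, " + ".join(parts(letter)))
# First line printed: ā LATIN SMALL LETTER A + COMBINING MACRON
```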

When books were still printed with moveable type, special letters had to be cast for the diacriticals. If a page with Pali words was produced on a typewriter, the marks had to be added by hand.

This was the case for the early pioneering editions of the Pali texts produced in Roman fonts by the Pali Text Society. That august body still prints from photo-engraved plates of those original editions; since minor faults in the plates cannot be corrected, its books usually come with a longish insert of "errata."

The original Roman Pali was produced by painstaking scholarship, comparing word by word the Sinhala, Siamese and Burmese versions; footnotes indicated any variation between the three. This, of course, was done at a time when computer technology was no more than a twinkle in Sir Charles Babbage's eye.

Fast forward to the 1980s and the dawn of the modern computer age, with its promise of a paperless office, expanded leisure time and easy-to-use Pali fonts. (Not so much, on any of those counts.) My own first computer was a Commodore 128. For the information of the younger set, this was a primitive device with no hard drive, a black-and-white low-res monitor and all of 128 kilobytes of RAM. The word "font" was not yet known outside professional typographical circles. The word processor had one bit-mapped typeface for general use, but it did come with an alternate for typing in French, which included the accented vowels of that language.

I needed to be able to produce Pali letters so I copied the French type-set, hacked the machine-code for the bit-mapped letters and put the most common Pali diacritical letters in place of the French accented vowels. I was able to type Pali because the poor Commodore thought it was speaking French. (Oddly, I miss that machine.)

Come the nineties (remember them?), the computer revolution shifted into second or third gear. I started using a Mac (System 6) and cobbled together my own PostScript Pali fonts using a programme called Fontographer. After something called the Internets became a wildly popular fad, more and more Pali fonts became available.

The problem now was one sadly familiar to computer users in those days: a lack of standards. Each font had its own unique keymap, so a document produced using MyNorman would not print properly in LeedsBitPali, and conversions required a lot of tedious search-and-replace. Worse, fonts and keymaps did not translate well across platforms. Changes in software eventually made my own Pali fonts obsolete, and sometimes cumbersome workarounds had to be employed. I once produced a Pali chanting book using Word macros; the resulting file was huge, and it slowed the computer to a crawl just scrolling through the pages.
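
Those conversions boiled down to a character-by-character lookup table. Here is a minimal Python sketch of the idea, converting text typed in a hypothetical legacy Pali font (one that, like my Commodore hack, redrew accented Latin characters as Pali letters) into Unicode. The table is invented for illustration; it is not the real keymap of MyNorman, LeedsBitPali or any other font.

```python
# HYPOTHETICAL legacy keymap: the accented Latin-1 character that an old
# Pali font drew as each Pali letter. Invented for illustration only.
LEGACY_TO_UNICODE = {
    "é": "ā",
    "í": "ī",
    "ú": "ū",
    "ð": "ḍ",
    "þ": "ṭ",
    "õ": "ṇ",
    "µ": "ṃ",
}

# Build the translation table once; every key is a single character.
TABLE = str.maketrans(LEGACY_TO_UNICODE)


def legacy_to_unicode(text: str) -> str:
    """Convert text typed in the hypothetical legacy font to Unicode Pali."""
    # translate() maps each character exactly once, so output is never re-mapped.
    return text.translate(TABLE)


print(legacy_to_unicode("viññéõa"))  # viññāṇa
```

Doing it in a single `str.translate` pass, rather than a chain of search-and-replace operations, means a character that has already been converted can't be caught again by a later replacement.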

But now we are at the dawn of a new era. Finally, it is getting easy to use not only Pali but almost any alphabet, thanks to Unicode. This is an expansion of the old ASCII idea: each character has a unique, universally agreed code, usually written in hexadecimal. The original ASCII standard defined only 128 characters, and even its 8-bit extensions topped out at 256. Unicode, by adding a few more digits, has room for over a million (1,114,112 code points, from U+0000 to U+10FFFF). Of course, not every font will have every known character, but as long as developers adhere to the standard (hah! I'm talking to you, Bill Gates) we should be guaranteed that every font which has a Pali retroflex "d" will have it in the same place, i.e. use the same hexadecimal code.
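
For the curious, Python's standard `unicodedata` module will report the code point Unicode assigns to each Pali letter. This small sketch (not tied to any particular font) prints them in the usual U+ hexadecimal notation. It also composes first, so an "m" followed by a separate combining dot below comes out as the same single code point as a precomposed ṃ.

```python
import unicodedata


def code_point(letter: str) -> str:
    """Return a letter's code point in U+XXXX notation."""
    # Compose first, so a base letter + combining mark becomes one code point.
    letter = unicodedata.normalize("NFC", letter)
    return f"U+{ord(letter):04X}"


for letter in ["ā", "ī", "ū", "ḍ", "ṭ", "ṇ", "ñ", "ṃ"]:
    nfc = unicodedata.normalize("NFC", letter)
    print(code_point(nfc), nfc, unicodedata.name(nfc))

print(code_point("m\u0323"))  # U+1E43, the same as a precomposed ṃ
```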

So documents produced in Unicode Helvetica on a Mac should be readable in Unicode Arial on a Windows box, and everything should display properly in a browser window. Let's see if it works. I'm going to type the Pali word for "consciousness," which uses several diacriticals; how does it display in your browser window?
viññāṇa

This was especially easy for me to do in Mac OS X using a freeware application called Ukelele, which lets me define my own custom keymap. So I have a home-made Pali keymap which puts, for example, the long a under option-a. Because of Unicode, this doesn't matter at the other end: the hex code for the letter remains unchanged! When the Unicode standard becomes truly universal, we'll finally have reached the same ease of use for Pali letters as scratching on palm leaves.
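
Here's a toy Python illustration of why the keymap doesn't matter at the other end. The two keymaps and their key names are hypothetical (this is not Ukelele's file format): mine puts long a under option-a, someone else's puts it elsewhere. The text that gets saved, and its UTF-8 bytes, still come out identical.

```python
# Two HYPOTHETICAL keymaps; the key names are made up for illustration.
MY_KEYMAP = {"option-a": "ā", "option-n": "ñ"}
OTHER_KEYMAP = {"option-shift-1": "ā", "option-shift-2": "ñ"}


def type_keys(keymap: dict[str, str], keys: list[str]) -> str:
    """Simulate typing: mapped keys produce Pali letters, others type themselves."""
    return "".join(keymap.get(k, k) for k in keys)


mine = type_keys(MY_KEYMAP, ["p", "option-a", "l", "i"])
theirs = type_keys(OTHER_KEYMAP, ["p", "option-shift-1", "l", "i"])

# Different keystrokes, same characters, so the same bytes on disk.
print(mine == theirs)        # True
print(mine.encode("utf-8"))  # b'p\xc4\x81li'
```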

Paperless office and expanded leisure time coming next...