Orthographies & Scripts
An orthography is a standardised writing system for a specific language. This includes a prescribed system of characters, script, spelling and punctuation. Some groups of languages share orthographic features e.g. Czech, Slovak and Slovene have very similar orthographies, but most languages have some features unique to them e.g. Spanish uses the inverted question and exclamation marks and German capitalises all nouns.

In many alphabets letters may have two or more distinct forms. Many writing systems draw a distinction between:
lower case - smaller letters whose height may vary, and
upper case - larger letters, all the same height, also called capital letters. Some typefaces also provide small capitals (small caps), capital letterforms about the same height as lower case letters, which are used alongside full capitals e.g. in some Bibles "LORD" is printed in small caps to represent the Tetragrammaton.

The Greek and Latin alphabets were originally written only in majuscule (upper case). Minuscule (lower case) writing developed gradually, until both became dual alphabets containing both forms.
Capital letters have evolved many functions but they are typically used to:
mark the start of a sentence,
indicate proper nouns (or, in German, all nouns),
make abbreviations e.g. MAT,
indicate emphasis.

In many alphabets some letters have different forms depending upon where they appear in a word. In Greek, Hebrew and Arabic some letters have distinct medial and final forms e.g. sigma in Greek is written σ within a word and ς at the end. In Devanagari many letters have distinct initial and medial forms.
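Positional forms and case interact when text is processed by computer. As a small illustration (not a description of how any MAT program works), Python's built-in string methods follow the Unicode case mapping rules, so lower-casing a Greek word yields a final sigma in the last position, and upper-casing German "ß" produces two letters. The variable names are illustrative only.

```python
# Case mapping is not always one character to one character.

# Greek "road" in capitals: OMICRON, DELTA, OMICRON, SIGMA.
word = "\u039f\u0394\u039f\u03a3"

# str.lower() applies the Unicode final-sigma rule: a capital sigma at the
# end of a word becomes final sigma (U+03C2), not medial sigma (U+03C3).
lowered = word.lower()
ends_with_final_sigma = lowered.endswith("\u03c2")

# A capital sigma at the start of a word becomes the ordinary small sigma.
starts_with_medial_sigma = "\u03a3\u039f\u03a6\u039f\u03a3".lower().startswith("\u03c3")

# German sharp s has no single-letter capital in ordinary case mapping,
# so upper-casing makes the word one character longer.
shouted = "stra\u00dfe".upper()

print(lowered, ends_with_final_sigma, starts_with_medial_sigma, shouted)
```

Because of cases like these, comparing words by converting one character at a time can give wrong answers; converting whole strings is safer.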

Some languages contain characters that combine to write a single orthographic unit, representing one sound. These may be physically combined into a ligature, such as œ or æ, which have largely fallen out of use in English, or they may be a sequence of separate characters that functions as a single unit. In many languages a digraph functions as a "letter" in its own right. Common digraphs are
ch e.g. in Czech, Spanish and Welsh
ij e.g. in Dutch
ll e.g. in Spanish and Welsh
ng e.g. in Welsh and many African languages

Most alphabets have a sort order that determines how words are arranged in dictionaries or lists. Languages which use the same alphabet may not use the same sort order. In many languages which use distinct digraphs these may function as independent "letters" for the purposes of sorting e.g. in Czech and Welsh, and traditionally in Spanish, "ch" serves as a single unit and words beginning with "ch" have their own section in a dictionary.
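The effect of a digraph on sorting can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the alphabet is a simplified Czech-style one with no diacritics, in which "ch" sorts after "h", and the names ALPHABET, split_units and sort_key are hypothetical, not taken from any existing program.

```python
# Simplified Czech-style alphabet (diacritics omitted); "ch" follows "h".
ALPHABET = ["a", "b", "c", "d", "e", "f", "g", "h", "ch", "i", "j", "k", "l",
            "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
RANK = {unit: position for position, unit in enumerate(ALPHABET)}
DIGRAPHS = {unit for unit in ALPHABET if len(unit) == 2}


def split_units(word):
    """Split a word into sorting units, keeping each digraph together."""
    word = word.lower()
    units = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            units.append(pair)
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def sort_key(word):
    """Map a word to a list of ranks; unknown characters sort last."""
    return [RANK.get(unit, len(ALPHABET)) for unit in split_units(word)]


words = ["chata", "cena", "hrad", "cukr", "jablko"]
plain_order = sorted(words)                 # compares character by character
digraph_order = sorted(words, key=sort_key) # treats "ch" as one letter
print(plain_order)
print(digraph_order)
```

With a plain character sort "chata" falls between "cena" and "cukr"; with the digraph-aware key it moves to its own place after "hrad".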

Today most European languages use a similar system of punctuation. Its origins are often traced to Aristophanes of Byzantium, working around 200 BC, but punctuation was rarely used in ancient manuscripts e.g. most early Greek and Hebrew manuscripts had virtually no punctuation. Lack of punctuation can lead to ambiguities. To solve this problem ben Asher, a Masorete, devised a very thorough system of cantillation marks for the Hebrew Bible text, which help the reader to understand the precise relationship between words.

Not all languages use spacing to separate words. Thai is a present-day example, and prior to the third century AD many Greek manuscripts ran all the words together without spaces. It is usually clear where one word ends and the next begins, but occasionally there is scope for confusion.
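The scope for confusion can be shown with a small sketch. Assuming a tiny, made-up English word list (LEXICON) and a hypothetical helper called segmentations, the Python below lists every way an unspaced string can be divided into known words. Real word segmentation, for Thai or for ancient manuscripts, needs far more than a word list; this only illustrates why ambiguity arises.

```python
def segmentations(text, lexicon):
    """Return every way of splitting text into words found in lexicon."""
    if not text:
        return [[]]          # one way to split nothing: no words at all
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Keep this word and try to split whatever remains.
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


# A deliberately tiny, illustrative word list.
LEXICON = {"god", "is", "now", "here", "nowhere"}

readings = [" ".join(words) for words in segmentations("godisnowhere", LEXICON)]
for reading in readings:
    print(reading)
```

The same run of letters yields two readings with opposite meanings, which is exactly the kind of ambiguity that word spacing removes.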

English uses the Roman script with upper and lower case letters. Diacritics are rarely used except in foreign loan words. Old and Middle English used some additional letters which have now fallen out of use:
eth (ð), now replaced by "th" but still used in Icelandic
thorn (þ), now replaced by "th" or incorrectly by "y" in phrases like "Ye Olde Shoppe"
wynn (ƿ), with a sound corresponding to modern English [w]
yogh (ȝ), now usually replaced by "gh" or "ch" in words like "night" and "loch"

It can be useful for someone working with a particular language to have an orthographic definition of that language available. Such a definition may not be static e.g. the Spanish Royal Academy decided that "ll" is no longer considered a separate "letter" for sorting purposes. MAT maintains orthographic tables so that MAT programs can run with the correct orthographic definition for the language they are working with. These are .tab tables consisting of upper and lower case characters, sort order, preferred font and punctuation.
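The layout of the .tab files is not described here, so the following Python is only a hypothetical in-memory model of the four kinds of information listed above (case pairs, sort order, preferred font, punctuation). The class name Orthography, its fields and the font name are assumptions made for illustration; the Welsh alphabet is used as sample data.

```python
from dataclasses import dataclass


@dataclass
class Orthography:
    """Hypothetical model of an orthographic definition (not the .tab format)."""
    language: str
    case_pairs: list        # (upper, lower) pairs, listed in sort order
    punctuation: str
    preferred_font: str = ""

    def sort_rank(self):
        """Map each lower case letter (or digraph) to its sort position."""
        return {lower: i for i, (_, lower) in enumerate(self.case_pairs)}

    def to_upper(self, unit):
        """Return the upper case form of one letter or digraph, if known."""
        for upper, lower in self.case_pairs:
            if lower == unit:
                return upper
        return unit


# The Welsh alphabet, with digraphs counted as single letters.
WELSH_LETTERS = ("a b c ch d dd e f ff g ng h i j l ll m n o p ph r rh "
                 "s t th u w y").split()

welsh = Orthography(
    language="Welsh",
    case_pairs=[(unit.capitalize(), unit) for unit in WELSH_LETTERS],
    punctuation=".,;:?!'\"()",
    preferred_font="ExampleFont",   # placeholder, not a real recommendation
)

rank = welsh.sort_rank()
print(len(welsh.case_pairs), rank["ng"], rank["h"], welsh.to_upper("ll"))
```

Keeping the letters in a single ordered list means that a change such as the Spanish decision about "ll" only requires editing the data, not the program.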


When a language is written down it is represented by a script. Some languages have their own ancient scripts e.g. Hebrew and Greek, whilst others have adopted the scripts of other languages e.g. English uses the Latin characters introduced into Britain by the Romans. Russian uses the Cyrillic script, traditionally attributed to St Cyril and St Methodius, which is based mainly on Greek characters with some additional letters from other sources. Some languages can be represented in a number of scripts e.g. Yiddish has been written with Latin or Hebrew characters, and Swahili has been written with Latin or Arabic characters.

Sometimes governments change the script a language uses in line with their political or cultural aspirations e.g. Kemal Atatürk switched Turkish from Arabic script to a modified Latin alphabet as part of his attempts to Westernise the country. In the nineteenth century Romanian switched from Cyrillic to Roman script, and after the Second World War, when Moldavia was made a Soviet republic, the Romanian spoken there was switched from Latin script back to Cyrillic. Stalin tried to convert many languages in the Soviet sphere e.g. Mongolian, Tajik and Kazakh, to a uniform Cyrillic script, and more recently attempts have been made to switch Azeri and Uzbek from Cyrillic to Roman characters modelled on Turkish orthography.

In Europe the only native scripts which are still in use are Greek, Roman and Cyrillic. Greek was based on Phoenician but introduced vowels, and this idea was later adopted by Latin and Cyrillic (which is based on Greek).
Many Asian languages developed their own scripts and these continue in long established traditions. In Africa there are also some native scripts e.g. the hieroglyphic, hieratic, demotic and later Coptic scripts of Egypt (the last based on Greek) and the Ethiopic script used for Amharic in Ethiopia. The Ethiopic script is also used for a number of other languages in the Ethiopian region. Arabic script has long been used in northern Africa, although the Tuareg tribesmen preserve a writing system called Tifinagh.
Most of the sub-Saharan African languages outside the Ethiopian region use modern alphabets based upon Roman characters.

Many languages existed for centuries without a written form, and many of them were first analysed and written down by people working in Bible translation. When organisations such as SIL are involved in developing a new writing system they usually follow the Latin alphabet, often supplementing it with characters from the International Phonetic Alphabet (IPA) or with diacritics.

English is written in Roman script horizontally from left to right. Although this may seem natural to English speakers, it is not an automatic feature of all scripts. Some scripts run left-to-right horizontally e.g. Roman and Greek, whilst others run right-to-left horizontally e.g. Hebrew and Arabic. One early style of Greek writing, called "boustrophedon", ran in opposite directions on alternate lines. In Asia many scripts are traditionally written vertically, usually running top-to-bottom e.g. Japanese, Korean and Mongolian.
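Horizontal direction is recorded for each character in the Unicode character database, which Python exposes through the standard unicodedata module. The sketch below, with a hypothetical helper named dominant_direction, simply counts strongly left-to-right and strongly right-to-left characters in a string. It is a rough illustration rather than the full Unicode bidirectional algorithm, and it says nothing about vertical writing, which is a matter of layout and not a property of individual characters.

```python
import unicodedata


def dominant_direction(text):
    """Guess the horizontal direction of text from its strong characters."""
    left_to_right = 0
    right_to_left = 0
    for char in text:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class == "L":              # e.g. Latin, Greek, Cyrillic letters
            left_to_right += 1
        elif bidi_class in ("R", "AL"):    # e.g. Hebrew (R), Arabic (AL) letters
            right_to_left += 1
    if left_to_right == right_to_left:
        return "undetermined"
    if left_to_right > right_to_left:
        return "left-to-right"
    return "right-to-left"


samples = {
    "English": "alphabet",
    "Greek": "\u03b1\u03b2\u03b3",             # alpha, beta, gamma
    "Hebrew": "\u05e9\u05dc\u05d5\u05dd",      # shalom
    "Arabic": "\u0633\u0644\u0627\u0645",      # salam
    "Digits": "123",
}
directions = {name: dominant_direction(text) for name, text in samples.items()}
for name, direction in directions.items():
    print(name, direction)
```

Digits and punctuation carry no strong direction of their own, which is why the last sample is reported as undetermined.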