Linguistic Computing Overview
THEORY
MATHEMATICS AND LINGUISTICS
BFBS LC work on the assumption that language can be analysed mathematically, and that all the phenomena of language can be rendered logically and rationally. The team work to express these logical explanations of linguistic processes in mathematical notation so that they can then be used in computer programs.
Machine translation is the term used when a computer program translates text from one language into another. Many existing systems have been developed to translate between a fixed set of languages (usually a single language pair), and they work best when they operate with a restricted vocabulary.
To write a conventional machine translation program, a developer needs to know all the rules that govern how the languages work. These rules are then written into the logic of the program code, and the program follows them to produce a translation. Such programs are expensive to develop because they require the time and expertise of skilled linguists and computer specialists.
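To make the rule-writing burden concrete, here is a minimal sketch of a hand-coded translator. The tiny English-to-French lexicon, the word classes and the single reordering rule are hypothetical illustrations, not taken from any real system; the point is that every word and every rule has to be supplied by a person.

```python
# Toy rule-based translator: every word and every rule is written by hand.
# The English-to-French fragment below is illustrative only.

LEXICON = {"the": "le", "black": "noir", "cat": "chat", "sleeps": "dort"}
ADJECTIVES = {"black"}
NOUNS = {"cat"}


def reorder(words: list[str]) -> list[str]:
    """Rule: ADJECTIVE NOUN becomes NOUN ADJECTIVE (as for most French adjectives)."""
    out = list(words)
    i = 0
    while i < len(out) - 1:
        if out[i] in ADJECTIVES and out[i + 1] in NOUNS:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # move past the swapped pair
        else:
            i += 1
    return out


def translate(sentence: str) -> str:
    """Apply the word-order rule, then substitute words from the lexicon."""
    words = reorder(sentence.lower().split())
    # Words missing from the lexicon are flagged instead of guessed.
    return " ".join(LEXICON.get(w, f"<{w}?>") for w in words)


print(translate("The black cat sleeps"))  # le chat noir dort
```

Any input outside the hand-written lexicon and rules, such as "the black dog sleeps", comes out as `le noir <dog?> dort`: both the missing word and the missed reordering have to be fixed by a person, which is where the development cost lies.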
Machine translation works well where:
- there is a large enough commercial market to make it profitable, and
- the languages involved are well described by linguists.
Unfortunately, neither condition holds for most Bible translation projects, which is why the BFBS LC team do not do machine translation of this type.
THE BFBS LC APPROACH
The team do not write machine translation programs of this kind because they work with many languages, and the languages they work with are often not fully analysed. Instead, we start with no rules, because we probably do not know the language concerned. What we do have is a large text and its vocabulary, usually a Bible.
The machine's task is to analyse that text and, from that analysis, derive the set of rules that governs how the language works. Once the rules have been identified, the machine can learn from them and apply them to further analysis of the language.
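As a toy illustration of deriving a rule from raw text alone, the sketch below proposes candidate suffixes by checking whether stripping a word ending leaves another word that also occurs in the text. This is a simple, well-known heuristic for unsupervised morphology, swapped in here only to make the idea concrete; it is not the BFBS LC team's method, and the function name, the whitespace tokenisation and the `max_len`/`min_count` thresholds are all assumptions.

```python
from collections import Counter


def candidate_suffixes(text: str, max_len: int = 3, min_count: int = 2) -> Counter:
    """Propose suffixes whose removal leaves another word attested in the text.

    Hypothetical heuristic: a suffix gains one vote each time stripping it
    from a word yields a stem (two or more letters) that also occurs on its
    own. Only suffixes with at least `min_count` votes are kept.
    """
    vocab = set(text.lower().split())  # assumes whitespace-separated words
    counts: Counter = Counter()
    for word in vocab:
        for k in range(1, max_len + 1):
            stem, suffix = word[:-k], word[-k:]
            if len(stem) >= 2 and stem in vocab:
                counts[suffix] += 1
    return Counter({s: c for s, c in counts.items() if c >= min_count})


sample = "walk walks walked talk talks talked jump jumps"
print(candidate_suffixes(sample))  # Counter({'s': 3, 'ed': 2})
```

Nothing about English is built in: the "-s" and "-ed" endings emerge purely from the distribution of forms in the sample, which is the sense in which the rule set is derived from the text rather than supplied in advance.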
One advantage of this approach is that it greatly reduces the need for involvement from speakers of the language. Because it is machine based, it is entirely consistent (unlike a human speaker), and it can be applied to minority languages whose communities cannot afford to finance large projects. It also harnesses the power and patience of modern computers for the task of analysis and produces objective analyses of highly complex structures.
Like a child learning a language, the learning machine must be exposed to the target language. In the restricted context in which we work, this means having access to a text, usually biblical. The text may of course be incomplete, and every learning process, human or artificial, shares the same limitation: the wider the exposure to the target data, the more coherent and complete the analysis will be. It follows that an adequate analysis of the target language must be derived before any attempt is made to generate a translation.
One of the problems is that anyone, human or computer, learning something as complex as a natural language may make mistakes. The role of the LC team is to develop systems that are not only heuristic but ultimately self-validating, able to correct and refine their own results as the analysis proceeds.
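One simple way to picture "self-validating" is a loop that re-checks every inferred rule against all the text seen so far and retracts any rule whose support has fallen too low. The sketch below does this for suffix rules; the functions, the support measure and the 0.5 threshold are hypothetical choices made for illustration, not a description of the team's systems.

```python
def propose(vocab: set[str], max_len: int = 3) -> set[str]:
    """Propose any suffix whose removal leaves a stem attested in the vocabulary."""
    proposals = set()
    for word in vocab:
        for k in range(1, max_len + 1):
            stem = word[:-k]
            if len(stem) >= 2 and stem in vocab:
                proposals.add(word[-k:])
    return proposals


def support(vocab: set[str], suffix: str) -> float:
    """Fraction of words ending in `suffix` whose stem is itself attested."""
    endings = [w for w in vocab if w.endswith(suffix) and len(w) - len(suffix) >= 2]
    if not endings:
        return 0.0
    hits = sum(1 for w in endings if w[: -len(suffix)] in vocab)
    return hits / len(endings)


def analyse_incrementally(chunks: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Grow the vocabulary chunk by chunk, re-validating every rule each time."""
    vocab: set[str] = set()
    rules: set[str] = set()
    history = []
    for chunk in chunks:
        vocab |= set(chunk.lower().split())
        rules |= propose(vocab)
        # Self-check: old and new rules alike must still be supported by all
        # text seen so far; any rule that falls below the threshold is dropped.
        rules = {s for s in rules if support(vocab, s) >= threshold}
        history.append(sorted(rules))
    return history


chunks = ["hand hands handy", "happy sorry body copy lands land"]
print(analyse_incrementally(chunks))  # [['s', 'y'], ['s']]
```

After the first chunk the "-y" rule looks sound ("hand" / "handy"), but the second chunk adds four words ending in "y" with no attested stem, so its support drops to 1/5 and it is retracted, while "-s" is confirmed again by "land" / "lands". More text both corrects earlier mistakes and strengthens the rules that survive.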