Marking syllable boundaries in text is considered a prerequisite for developing automatic hyphenation systems. The MAT team is working with colleagues from the United Bible Societies (UBS) and the Canadian Bible Society (CBS) to develop systems that will set hyphenation points for vernacular languages without the need for an extensive lexicon or a predefined rule set.
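As a rough illustration of the last step, the sketch below assumes that a syllable parser has already reported the syllable boundaries of a word. It then filters those boundaries into permitted hyphenation points. Several details are assumptions made for illustration and are not part of the MAT team's system:

- the minimum fragment lengths (at least two letters before a break and three after, matching plain TeX's \lefthyphenmin and \righthyphenmin settings);
- the input boundaries;
- the class and method names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: filters syllable boundaries into permitted hyphenation points. */
class HyphenationFilter {

    /**
     * Keeps each boundary b (the index where a new syllable starts) that leaves at
     * least leftMin characters before the break and rightMin characters after it.
     * leftMin and rightMin are assumed parameters, not taken from the paper.
     */
    static List<Integer> hyphenationPoints(String word, List<Integer> boundaries,
                                           int leftMin, int rightMin) {
        List<Integer> points = new ArrayList<>();
        for (int b : boundaries) {
            if (b >= leftMin && word.length() - b >= rightMin) {
                points.add(b);
            }
        }
        return points;
    }

    /** Returns the word with a hyphen at each permitted point (points in ascending order). */
    static String render(String word, List<Integer> points) {
        StringBuilder sb = new StringBuilder();
        int prev = 0;
        for (int p : points) {
            sb.append(word, prev, p).append('-');
            prev = p;
        }
        return sb.append(word.substring(prev)).toString();
    }

    public static void main(String[] args) {
        // Assumed parser output: "hyphenation" = hy|phen|a|tion, so syllables start at 2, 6 and 7.
        List<Integer> points = hyphenationPoints("hyphenation", List.of(2, 6, 7), 2, 3);
        System.out.println(render("hyphenation", points)); // hy-phen-a-tion
        // Assumed parser output: "about" = a|bout; the break after one letter fails leftMin = 2.
        System.out.println(render("about", hyphenationPoints("about", List.of(1), 2, 3))); // about
    }
}
```

Keeping the filter separate from the syllable parser means that the fragment-length limits can be tuned per language or per typesetting context without touching the syllabification itself.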


To assess the validity of the assumption that hyphenation should be based primarily on syllable boundaries, the MAT team have constructed a syllable parser. This work has allowed some initial conclusions to be drawn about hyphenation boundaries, including the definition of a three-layer model for setting hyphenation points. A detailed description of the process that was developed can be found in the paper:

First Experiments in Automatic Hyphenation, J D Riding, 20th May 2005
Automatic Hyphenation document (PDF, 229 kB)
Cite this work:
Riding, J. D. (2005). First Experiments in Automatic Hyphenation. Technical report (Riding2005), British & Foreign Bible Society.

This paper describes first attempts at automatically identifying syllables within a text. It seeks to mark syllable boundaries in the expectation that these may prove useful in setting hyphenation points for the text. This work follows discussions between the author, Brad Olson, Ron Rother and Nathan Miles at SIL Dallas on 2nd May 2005. The basic premise is that a syllable consists of an onset, a nucleus and a coda. If these components can be identified, either from the text directly or by consulting a speaker of the language, then it ought to be possible to build a machine that processes a text stream, identifies onsets, nuclei and codas, and thereby locates syllable boundaries. The first part of the paper describes an initial attempt to identify syllable components within an English text. The second part describes a process for identifying syllable boundaries based on these identifications. This work is highly preliminary and has been tested only on English, although every attempt has been made to avoid language-specific processing. It is hoped that it might form the basis for a more comprehensive and thoroughly language-independent system in the future.
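The premise can be illustrated with a short Java sketch. It is not the code from the paper, and it rests on several assumptions:

- letters are classified as vowels or consonants from a supplied list;
- each maximal run of vowels is treated as a nucleus;
- permissible onsets are harvested "from the text directly", taken as the word-initial consonant clusters found in a sample of text;
- each consonant cluster between two nuclei is split using the maximal onset principle.

The maximal onset principle is a standard phonological heuristic, swapped in here for illustration. It gives the longest permissible onset to the following syllable and leaves the remainder as the coda of the preceding one. The class and method names are hypothetical, and spelling is only a crude proxy for sound in English.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical onset/nucleus/coda syllable splitter (illustrative sketch only).
 * Assumptions: vowels come from a supplied letter list, legal onsets are the
 * word-initial consonant clusters seen in a sample text, and intervocalic
 * clusters are split by the maximal onset principle.
 */
class SyllableSketch {

    private final Set<Character> vowels = new HashSet<>();
    private final Set<String> onsets = new HashSet<>();

    SyllableSketch(String vowelLetters, List<String> corpusWords) {
        for (char c : vowelLetters.toCharArray()) vowels.add(Character.toLowerCase(c));
        onsets.add("");  // an empty onset is always allowed
        // Harvest onsets: the consonants before the first vowel of each sample word.
        for (String w : corpusWords) {
            int i = 0;
            while (i < w.length() && !isVowel(w.charAt(i))) i++;
            if (i < w.length()) onsets.add(w.substring(0, i).toLowerCase());
        }
    }

    private boolean isVowel(char c) {
        return vowels.contains(Character.toLowerCase(c));
    }

    /** Splits a word into syllables. */
    List<String> syllabify(String word) {
        // 1. Nuclei: maximal runs of vowels, stored as [start, end) index pairs.
        List<int[]> nuclei = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            if (isVowel(word.charAt(i))) {
                int start = i;
                while (i < word.length() && isVowel(word.charAt(i))) i++;
                nuclei.add(new int[] {start, i});
            } else {
                i++;
            }
        }
        List<String> syllables = new ArrayList<>();
        if (nuclei.size() < 2) {  // zero or one nucleus: no internal boundary
            syllables.add(word);
            return syllables;
        }
        // 2. Split each consonant cluster between two nuclei: the longest legal
        //    onset joins the next syllable, the rest is the previous syllable's coda.
        int syllableStart = 0;
        for (int n = 0; n + 1 < nuclei.size(); n++) {
            int clusterStart = nuclei.get(n)[1];
            int clusterEnd = nuclei.get(n + 1)[0];
            int boundary = clusterEnd;  // the empty onset always matches
            for (int b = clusterStart; b <= clusterEnd; b++) {
                if (onsets.contains(word.substring(b, clusterEnd).toLowerCase())) {
                    boundary = b;  // smallest b gives the longest legal onset
                    break;
                }
            }
            syllables.add(word.substring(syllableStart, boundary));
            syllableStart = boundary;
        }
        syllables.add(word.substring(syllableStart));
        return syllables;
    }

    public static void main(String[] args) {
        List<String> sample = List.of("string", "plan", "train", "so", "an");
        SyllableSketch s = new SyllableSketch("aeiou", sample);
        System.out.println(s.syllabify("constrain"));  // [con, strain]
        System.out.println(s.syllabify("explain"));    // [ex, plain]
    }
}
```

Because the vowel list and the text sample are the only language-specific inputs, the same approach could in principle be tried on another language. Silent letters, digraphs and vowel sequences that span two syllables would still need further handling.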

The core methods used to construct the syllable parser are listed in the file:
Java Syllable Finder (PDF, 60 kB)


