Encoding diacritics for search purposes

James K. Tauber (jtauber@entmp.org)
Wed, 14 Aug 1996 20:46:15 +0800 (WST)

I'm trying to put together a system for separating diacritic placement
from transliteration of the actual letters of a Greek word to enable
searches with or without diacritics (as well as helping with sorting).

I seem to recall Gramcord developed a system whereby the diacritic
placement is encoded as a number attached at the end of the
transliteration. This is exactly the sort of system I had in mind. As I'd
hate to re-invent the wheel, could anyone outline the Gramcord system for
me?

Failing that, I will reinvent the wheel. What would the requirements be?
First, let's assign a number to the following accented word types:

enclitic 0
proclitic 1
oxytone 2
paroxytone 3
proparoxytone 4
perispomenon 5
properispomenon 6

Is this enough to characterize all words standing alone? Is the
interaction between accents in adjacent words regular enough that the
accents manifested can always be predicted from the underlying accents
encoded using the numbers above?

Breathing can be done with:

no breathing 0
smooth breathing 1
rough breathing 2

Diaeresis and iota subscripts can be done with some kind of ternary
(base-3) number system, as each vowel will have neither (0), a
diaeresis (1) or an iota subscript (2). Each vowel would be a digit in the
ternary number, which could then be written in decimal.

So XRH/|ZH|, where both vowels take an iota subscript, would be 22 in
ternary = (2 x 3 + 2) = 8.
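
A minimal sketch of that ternary packing in Python, for anyone who wants
to try it. The function names are illustrative only and not part of any
existing system; the digit values are the ones given above.

```python
# Per-vowel marks as ternary digits (values taken from the post).
NEITHER, DIAERESIS, IOTA_SUBSCRIPT = 0, 1, 2


def marks_to_decimal(marks):
    """Treat one digit per vowel, left to right, as a base-3 number."""
    value = 0
    for digit in marks:
        if digit not in (NEITHER, DIAERESIS, IOTA_SUBSCRIPT):
            raise ValueError(f"not a ternary digit: {digit!r}")
        value = value * 3 + digit
    return value


def decimal_to_marks(value, vowel_count):
    """Recover the per-vowel digits; the vowel count must be supplied."""
    digits = []
    for _ in range(vowel_count):
        value, digit = divmod(value, 3)
        digits.append(digit)
    if value:
        raise ValueError("value does not fit in the given number of vowels")
    return digits[::-1]


# XRH/|ZH|: two vowels, each with an iota subscript.
print(marks_to_decimal([IOTA_SUBSCRIPT, IOTA_SUBSCRIPT]))  # 8
```

Decoding needs the vowel count because leading zero digits disappear in
the decimal form; the count can be read off the transliteration itself.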

Overall, in this system XRH/|ZH| (paroxytone = 3, no breathing = 0,
diaeresis/iota subscript code = 8) would be written

XRHZH308
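
As a rough sketch of how the whole code could be assembled, here is some
Python. The digit order (accent, breathing, then the diaeresis/iota
subscript number) is inferred from the XRHZH308 example; the names
encode and search_key and the breathing labels are assumptions made for
illustration.

```python
# Accent and breathing codes as listed in the post.
ACCENT = {
    "enclitic": 0, "proclitic": 1, "oxytone": 2, "paroxytone": 3,
    "proparoxytone": 4, "perispomenon": 5, "properispomenon": 6,
}
BREATHING = {"none": 0, "smooth": 1, "rough": 2}  # labels are assumed


def encode(letters, accent, breathing, vowel_marks):
    """Append accent digit, breathing digit and mark number to the letters.

    vowel_marks holds one digit per vowel, left to right:
    0 = neither, 1 = diaeresis, 2 = iota subscript.
    """
    marks_value = 0
    for digit in vowel_marks:
        marks_value = marks_value * 3 + digit  # base-3 to integer
    return f"{letters}{ACCENT[accent]}{BREATHING[breathing]}{marks_value}"


def search_key(encoded):
    """Strip the trailing digits to get a diacritic-free search key.

    Assumes the transliteration itself never ends in a digit.
    """
    return encoded.rstrip("0123456789")


word = encode("XRHZH", "paroxytone", "none", [2, 2])
print(word)              # XRHZH308
print(search_key(word))  # XRHZH
```

The accent and breathing codes are always one digit each, so the
remaining digits are the mark number, which can exceed one digit for
words of three or more vowels (up to 26 for three vowels).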

Failing the revelation of Gramcord's system, what do people think of my
(admittedly off-the-cuff) system outlined above?

James K. Tauber <jtauber@entmp.org> http://www.entmp.org/people/jtauber
Associate Director, Electronic New Testament Manuscripts Project