MorphBHS misunderstanding

Dale M. Wheeler (dalemw@teleport.com)
Sat, 02 Nov 1996 11:08:43 -0800

Dear List-Members:

There seems to be a rather severe misunderstanding abroad
related to a note I had posted (which is now changed) on
the GRAMCORD Web site concerning the MorphBHS and MorphLXX,
which was in response to repeated comments and questions I
received from folks about the accuracy of these databases
(let me hasten to add that this part of the note originated
with me, not with anyone at the GRAMCORD Institute). It seems
that some have read my comments to mean that there is an ERROR
rate in these Morph texts of *25%* !! Nothing could be further
from the truth, both with respect to these texts and to my note.

I tried to be VERY careful with my words in this note, but I
have discovered upon re-reading it that I was not as
careful--or as clear--as I should have been. The note used to
read:

************************************************************
Dr. Wheeler has determined that approximately 25% of the lemmas
in the MorphBHS database have some sort of problem associated
with them. Some of these errors effect only a very small number
of occurrences in the database while others effect dozens.
Dr Wheeler cautions all users of the database (no matter what
software they are using) to proofread carefully their results
if they are planning on using such software output in serious
research papers, theses, or dissertations.
**************************************************************

In the first sentence I carefully chose the word PROBLEM not
ERROR or MISTAKE or any other such term...unfortunately I
goofed in the second sentence and used the word ERROR. I wish
to offer a world-wide apology to ALL those who have labored on
these Morph texts; their work on such an incredibly difficult
task is astounding in its accuracy. My estimate (and that's all
it is at this point, since I'm only 1/2 way through the approx.
9000 separate Hebrew lemmas) is that the ERROR rate is
considerably below *5%*, and my FEELING is that it is probably
as low as *2%* !! I'd say that is pretty remarkable. However,
I think all would nevertheless agree with my caveat: if
you are going to use research based on these texts for your
dissertation, you'd better hand-check it; you may just
have landed in the 2%.

My statement that (if you change the word ERROR to PROBLEM):
"...approximately 25% of the lemmas in the MorphBHS database
have some sort of problem associated with them. Some of these
**errors-->>problems** effect only a very small number
of occurrences in the database while others effect dozens."
means what it says (though the 25% seems to be going down as
I go along...where I started, towards the end of the alphabet
[for reasons I won't take time to explain here] seems to have
had a higher rate of problems than the beginning of the
alphabet [I'm in Cheth right now]; thus in toto we may be
below the 20% mark). A certain percentage of the total of
some 9000 lemmas have SOME SORT OF PROBLEM...WHICH AFFECTS
SOME OF THE OCCURRENCES. This does not mean that 25% of
the lemmas are all messed up, but that perhaps 1 or 2 entries
for each of that 25% have a PROBLEM, NOT AN ERROR. Occasionally
(and this is the exception, rather than the rule) one of
these lemmas will have wide-spread PROBLEMS (as outlined
below). Thus it's a very small percent of the 25% which is
the PROBLEM and needs correction. As I said, my guess is that
the total PROBLEM rate is around 2%, and the actual ERROR
rate MUCH lower than that.
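
To make the difference between a per-lemma rate and a per-occurrence
rate concrete, here is a minimal Python sketch. The four lemmas and
their counts are invented purely for illustration and are NOT taken
from the MorphBHS; the function names are hypothetical as well.

```python
# Toy data, invented for illustration only (NOT real MorphBHS counts):
# lemma -> (total occurrences, occurrences with some problem)
toy_db = {
    "lemma_a": (100, 0),
    "lemma_b": (50, 1),
    "lemma_c": (40, 0),
    "lemma_d": (10, 0),
}

def lemma_problem_share(db):
    """Fraction of lemmas that have at least one problematic occurrence."""
    flagged = sum(1 for _, bad in db.values() if bad > 0)
    return flagged / len(db)

def occurrence_problem_share(db):
    """Fraction of all occurrences that are problematic."""
    total = sum(n for n, _ in db.values())
    bad = sum(b for _, b in db.values())
    return bad / total

print(lemma_problem_share(toy_db))       # 0.25
print(occurrence_problem_share(toy_db))  # 0.005
```

In this toy case one lemma in four is flagged, yet only one
occurrence in two hundred is affected, which is the kind of gap
between the two figures that the paragraph above describes.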

Let me explain as well what the difference is between a PROBLEM
and an ERROR. PROBLEMS are things like: sometimes the MorphBHS
follows Even-Shoshan (which is the documented basis for the
lemmatization) and sometimes they follow Koehler-Baumgartner.
This shows up, e.g., in the choices related to whether we should
cite the undocumented singular lemma of a word which only
occurs in the plural in the OT or whether the plural should be
cited as the lemma. I certainly wouldn't call any of those
choices ERRORS, but they certainly can be PROBLEMS for users,
esp. computer users where consistency is absolutely crucial.
Occasionally the same word is cited with both a plural and a
singular lemma. Again, I wouldn't call that an ERROR, but it
is PROBLEMATIC, esp. for users who might not recognize the
way lemmatization is done in Concordances and Lexicons and thus
would not look for both forms. Another example relates to compound names
which sometimes are lemmatized with a maqqef and sometimes
without one. NOT an ERROR in my estimation, but a PROBLEM.
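
As a rough illustration of why such inconsistencies matter to a
computer search, here is a minimal Python sketch of a lookup that
tolerates them. Everything in it is an assumption made for the
example: the record layout, the placeholder references, the
transliterated lemma strings, the use of '-' to stand for a maqqef,
and the alias table pairing a plural citation form with a singular
one. It is not how any actual MorphBHS software works.

```python
# Hypothetical alias table: each group lists citation forms that a user
# wants treated as one lemma (e.g. plural vs. singular citation form).
ALIAS_GROUPS = [
    {"PNYM", "PNH"},
]

def normalize(lemma):
    """Drop the maqqef (written '-' in this made-up scheme) and spaces."""
    return lemma.replace("-", "").replace(" ", "")

def build_alias_map(groups):
    """Map every normalized form in a group to one canonical form."""
    canon = {}
    for group in groups:
        forms = sorted(normalize(f) for f in group)
        for form in forms:
            canon[form] = forms[0]
    return canon

def search(records, query, groups=ALIAS_GROUPS):
    """Return references whose lemma matches the query after normalizing."""
    canon = build_alias_map(groups)

    def key(lemma):
        n = normalize(lemma)
        return canon.get(n, n)

    target = key(query)
    return [ref for ref, lemma in records if key(lemma) == target]

# Placeholder records (reference, lemma); not real database entries.
records = [
    ("ref1", "PNYM"),
    ("ref2", "PNH"),
    ("ref3", "BYT-)L"),
    ("ref4", "BYT)L"),
    ("ref5", "BR)"),
]

print(search(records, "PNH"))    # ['ref1', 'ref2']
print(search(records, "BYT)L"))  # ['ref3', 'ref4']
```

An exact string comparison would return only one reference for each
of these two queries. The alias table has to be supplied by someone
who already knows where the tools diverge, which is exactly the
knowledge a beginner lacks.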

Some of the PROBLEMS are unavoidable at this stage in the
refinement of the database, for example, when E-S and KB and
BDB disagree about which verses go with which lemma, like
XLL, XYL, XWL, XLH; or when the tools are internally
inconsistent themselves in their lemmatization procedure--
which isn't a big problem in a book, but wreaks havoc in
a computer implementation. I hope in the NEXT pass through the
text to provide alternates at those places where the major
tools/commentaries disagree. But for the time being, hand-
checking is highly recommended for publication-quality
work (which is no more than any of us would do anyway when
getting references out of a lexicon or concordance; an
electronic concordance is no different).

The aforementioned "PROBLEMS" will come as no surprise to
those who use the Concordances and Lexicons on a regular
basis, but they probably will catch beginners unawares. Our goal
is to create a database that catches no one unawares, since
a computer implementation of a database is different from
the same data in a book.

The above comments apply IN PRINCIPLE to the MorphLXX,
though I cannot estimate PROBLEM rates since our revision
approach is different from that being used at this stage
for the MorphBHS.

Again, let me apologize to those who have preceded me in
this endeavor to create Morphological versions of the OT
texts for any misunderstanding created by my note. I have
nothing but the highest praise for their sacrificial labors
through the years in creating these texts. The editing work
I'm doing is insignificant in comparison with the Herculean
tasks they have performed in bringing these texts into
existence in the first place. In future generations I suspect
that virtually all serious research will be based on these
texts, and that scholars will rise up and call blessed the names
of Whitacker, Parunak, Kraft, Tov, Groves, et al.

Humbled-ly yours...

***********************************************************************
Dale M. Wheeler, Th.D.
Research Professor in Biblical Languages Multnomah Bible College
8435 NE Glisan Street Portland, OR 97220
Voice: 503-251-6416 FAX:503-254-1268 E-Mail: dalemw@teleport.com
***********************************************************************