
Word Alignment Results



A while back I asked how to get electronic versions of the Bible in English for a project in statistical word alignment.  I got an overwhelming response and was able to obtain just the texts I needed.  Some of you wanted to know the results, and though a full explanation would need the whole dissertation, here are some points.

The intent of the project was to find a way of aligning 4 versions of the Syriac Gospels, Syriac being a dialect of Aramaic.  Word alignment means that the versions are visually in line with each other, word for word, though some words may be aligned with a "null" word.  I ended up using an algorithm developed by Dagan, Church and Gale from AT&T.  It uses statistical analysis and word proximity to determine which words are most likely to correspond to each other.  For the Syriac, 37% of the words were aligned exactly, and over 60% were within an offset of 4.
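For anyone wondering how figures like "aligned exactly" and "within an offset of 4" can be computed, here is a minimal sketch.  It is not the code from the dissertation.  It assumes each word has a hand-checked reference position and a position chosen by the aligner, and that "offset" means the absolute difference between the two.  The function name and the toy data are made up for illustration, and words aligned with the "null" word are not handled.

```python
from typing import Sequence


def offset_accuracy(predicted: Sequence[int],
                    reference: Sequence[int],
                    max_offset: int = 0) -> float:
    """Fraction of words whose predicted position lies within
    max_offset positions of the reference position."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        return 0.0
    hits = sum(1 for p, r in zip(predicted, reference)
               if abs(p - r) <= max_offset)
    return hits / len(predicted)


# Toy data, invented for illustration only (not results from the project).
reference = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
predicted = [0, 1, 4, 3, 9, 5, 2, 7, 14, 9]

exact = offset_accuracy(predicted, reference)                # offset of 0
near = offset_accuracy(predicted, reference, max_offset=4)   # offset <= 4
print(f"exact: {exact:.0%}, within 4: {near:.0%}")
```

The same function gives both figures; only the max_offset argument changes.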

For the English, only a disappointing 19.5% of the words were aligned exactly, though still over 50% were within an offset of 4.  Because the algorithm, which I implemented in C++, took such a long time to run, I had to use a small subset of the Gospels, the Sermon on the Mount, so there was not a large amount of data for estimating the parameters of the model.  Just those 3 chapters took between 20 hours and 5 days on a 486 PC, depending on some initial conditions.  With more computer horsepower I might have gotten better results, because I believe the algorithm is sound.  Also, since all the published algorithms deal with only two languages and I had 4 versions, I had to figure out a way to do a multiple alignment.  I did, but it needs quite a bit of refinement.

I'll also post a list of the sources for English versions of on-line Biblical texts, with the names and comments of those who sent them to me.  Thank you very much to all who responded.  Again, I'm not subscribed here, so you can contact me personally if you want.  I'll only be at this email address until the end of Sept though, since my course is ending.

Thanks again,
Larry Piano

-------------------------------------------------------------------------
Source List for Electronic Bibles:

1. James K. Tauber, Undergraduate Student
Centre for Linguistics, UWA, Australia
E-mail: jtauber@tartarus.uwa.edu.au
WWW:    ftp://tartarus.uwa.edu.au/pub/jtauber/main.html
``Perplexed but not despairing'' - Paul (2 Cor 4.8)

The directory /pub/bible on ftp.spss.com has a number of English 
translations of the Bible (including AV, Weymouth's New Testament and 
Young's Literal New Testament).

Comment: Richest find so far: see Bible.List, spurgeon directory, rwp (Robertson Word Pictures in the Bible), web (Noah Webster's Bible), wnt (Weymouth's New Testament), ylt (Young's Literal New Testament).

2. Henry S. Thompson - eucorp@cogsci.ed.ac.uk

ftp to ftp.cogsci.ed.ac.uk, cd eci, binary, get acts.txt.z, get
palm-sunday.txt.z

3. "DDDJ" <DDDJ@aol.com>

ABS=American Bible Society 1865 Broadway New York NY 10023 212-408-1499
Both Mac and PC versions available.

4. Bob Kraft - University of Pennsylvania - kraft@ccat.sas.upenn.edu

You might find it useful to contact Doug DeLacey (del2@phx.cam.ac.uk) at 
Tyndale House there in Cambridge regarding electronic biblical materials. Lots 
of things are available from various sources, including the networks. Doug 
might have some texts on hand. You might also contact the Oxford Text Archive
 (Lou Burnard; archive@vax.ox.ac.uk, I think). We also have some stuff here.

5. Maurice A. O'Sullivan  [Bray,Ireland] mauros@iol.ie

ftp to mailbase.ac.uk; cd /pubs/list/religion-all; get software.text (contains
 mostly commercial packages)

6. NAME:  Swedish Bible (Bibeln eller den Heliga Skrift)

Platform: DOS, Windows, Macintosh, UNIX

Requirements: FTP, Gopher or WWW access.

Description: On-line version of the Swedish Bible electronically produced by 
Project Runeberg at Linkoping University in Sweden.  Contains ASCII text 
of both the Old and New Testaments taken from the official translation 
published in 1917.

Price: Free.

Available from:  Individual directories and files are available via 
gopher from gopher.lysator.liu.se (path=project-runeberg/txt/bibeln).  
Via WWW at http://www.lysator.liu.se:7500/runeberg/Main.html.  Via FTP 
from ftp.lysator.liu.se as /pub/runeberg/korrektur/bibeln/

Notes: Software to search and retrieve from the Swedish Bible is under 
development.  Contact Per Cederqvist (ceder@lysator.liu.se) for further 
information.  Details about the Runeberg project may be obtained from 
Lars Aronsson (aronsson@lysator.liu.se).
====


7. Philip Sutherland Ross - psr@psrsite.demon.co.uk (Bible aid sales)

I sell a package called Bible Works for Windows which runs in Windows 3.1
and has KJV, ASV and RSV, with several versions to follow.  It also has
a fully parsed version of the Greek New Testament and several other features.
This 'Student Bundle' costs 99

A 'Research Bundle' which includes the Hebrew Old Testament and the Septuagint
is also available at 199.

8. David Housholder - 73423.2015@CompuServe.COM Marietta, Georgia, USA

Your beginning point for discovering electronic versions of the Bible could 
well be via Durham. Send a message to 
        mailbase@mailbase.ac.uk
with the message
        send religion-all software.txt
You will receive the "Software for Theologians" compilation done by Mr. Fraser
at Durham. It is an excellent beginning point for your work.
You can also contact Mr. Fraser as
        m.a.fraser@durham.ac.uk

9. Trent Riggs - tariggs@icaen.uiowa.edu

     Do you have access to Gopher or Mosaic (gopher preferably)?
     If you do, do a Veronica search on Bible.  If you don't, do an
     Archie search for bible.  The only archie server near you that I know
     of is archie.doc.ic.ac.uk (146.169.11.3).  Login as 'archie' and
     type 'prog bible'.  It will default to a substring search (rather than
     exact text).  There's online help if you get confused.  I know the 
     On-Line Bible is good (has a few different versions & languages built
     in!)  That's like 20Meg or something, though, for everything. (DOS)
     If you have the space, look for it.

10. Tim Finney - finney@csuvax1.murdoch.edu.au 

On-line Bible has these things. It is available by anonymous ftp from 
white sands university archive.

To get it, do an ftp to wuarchive.wustl.edu and get to the directory with 
bible in it. I think this is in either the doc or pub directory. Then 
choose mac or ibm. I've had trouble getting the mac version to download 
properly. If you have trouble there is information in the manuals about 
how to obtain the program and versions by conventional mail.