Re: comparing files

From: Randy Leedy (rleedy@bju.edu)
Date: Fri Jun 18 1999 - 08:54:42 EDT


I've been doing quite a lot of the sort of file comparison requested
in this thread. Here's a link to a collection of utility programs that
contains, among many others, the ones needed. I hope my explanations
are reasonably accurate; one of the more expert computer buffs among
us might be able to point out flaws or offer corrections.

These are DOS programs, so one must be familiar with DOS syntax to
make good use of them.

http://atnetsend.ne.mediaone.net/~paquette/WinProgramming.html#textutils

The download is a large .ZIP file. The three programs needed for this
job are SORT, UNIQ, and COMM; they can be extracted individually. To
work backwards, COMM compares two text-only files line by line and
produces output in three columns: the first contains the lines unique
to the first file, the second contains the lines unique to the second
file, and the third contains the lines common to both. The syntax to
send the output to a new text file is

    COMM [-123] file1 file2 > newfile

The switches [-123] specify which columns to suppress in the output.
For example,

    comm -12 mtvocab.txt lkvocab.txt > mtlkcomm.txt

suppresses columns 1 and 2 and so gives you just the third column, the
lines common to the two files.
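
To make COMM's three-column logic concrete, here is a minimal Python
sketch of the same kind of comparison. The function name `comm` and its
`suppress` parameter (a string such as "12" standing in for the -12
switch) are hypothetical, and the sketch assumes both inputs are already
sorted and free of duplicates. It compares lines using Python's own
ordering, which may not match the collation a DOS SORT uses, so the
inputs should be sorted the same way they are compared. As COMM usually
does, it indents column 2 by one tab and column 3 by two, leaving out
the tabs that belong to suppressed columns.

```python
def comm(lines1, lines2, suppress=""):
    """Hypothetical sketch of COMM's three-column comparison.

    Assumes both lists are sorted (in Python's ordering) and free of
    duplicates. `suppress` stands in for the -1/-2/-3 switches, e.g. "12".
    """
    shown = {col: str(col) not in suppress for col in (1, 2, 3)}
    out = []

    def emit(col, line):
        if shown[col]:
            # One tab for each visible column to the left of this one.
            indent = sum(shown[c] for c in range(1, col))
            out.append("\t" * indent + line)

    i = j = 0
    # Walk both lists at once, always advancing past the smaller line;
    # this only works because both inputs are sorted.
    while i < len(lines1) and j < len(lines2):
        a, b = lines1[i], lines2[j]
        if a < b:
            emit(1, a)  # only in the first file
            i += 1
        elif b < a:
            emit(2, b)  # only in the second file
            j += 1
        else:
            emit(3, a)  # in both files
            i += 1
            j += 1
    # Whatever is left over belongs to one file only.
    for a in lines1[i:]:
        emit(1, a)
    for b in lines2[j:]:
        emit(2, b)
    return out


mt = ["agape", "logos", "pistis"]
lk = ["charis", "logos", "pistis"]
print(comm(mt, lk, suppress="12"))  # ['logos', 'pistis']
```

The single forward walk is why the input order matters: the loop
decides a line is missing from the other file as soon as it passes it,
which is only safe when both files are in the same sorted order.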

To work reliably, files must be pre-sorted, with no duplicate lines
in either. That's where SORT and UNIQ come in.
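
The reason SORT has to come before UNIQ is that UNIQ, in its default
mode, removes a line only when it repeats the line immediately before
it. The hypothetical `uniq` function below is a small Python sketch of
that behaviour (not the program's actual source), showing that a
duplicate survives unless the list is sorted first.

```python
def uniq(lines):
    """Sketch of UNIQ's default behaviour: drop a line only if it is
    identical to the line directly before it."""
    out = []
    for line in lines:
        if not out or line != out[-1]:
            out.append(line)
    return out


words = ["logos", "agape", "logos"]
print(uniq(words))          # ['logos', 'agape', 'logos'] (duplicate survives)
print(uniq(sorted(words)))  # ['agape', 'logos']
```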

The whole process, starting with the Word files, goes like this. First
make sure you have one word per line with no other data on the line,
including spaces or punctuation, and save the files as text only with
line breaks. Then sort each file into a new file [sort file1 > file2].
Run UNIQ on the sorted file to remove any duplicate lines [uniq file2 >
file3]. At this point the resulting files are ready to plug into the
COMM command as shown above.
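
The preparation steps can also be sketched in Python, assuming the
Word text has already been saved as plain text. The names
`prepare_word_list`, `write_word_list`, and `STRIP_CHARS` are
hypothetical. Which characters count as punctuation is an assumption:
with a legacy Greek font, some ASCII symbols may stand for accents or
breathings and must not be stripped. Writing the file as UTF-8 is also
an assumption; DOS tools may expect a different code page.

```python
# Hypothetical set of characters to strip from the edges of each word.
# Adjust it if your Greek font uses ASCII symbols for accents or breathings.
STRIP_CHARS = ".,;:!?"


def prepare_word_list(text, strip_chars=STRIP_CHARS):
    """Split text into words, strip edge punctuation, and return the
    words sorted with duplicates removed (like SORT followed by UNIQ)."""
    words = (token.strip(strip_chars) for token in text.split())
    return sorted({w for w in words if w})


def write_word_list(path, words):
    """Save one word per line as plain text, ready for COMM.
    UTF-8 is an assumed encoding."""
    with open(path, "w", encoding="utf-8") as f:
        for w in words:
            f.write(w + "\n")


print(prepare_word_list("logos, agape; logos. pistis"))
# ['agape', 'logos', 'pistis']
```

Collecting the words into a set before sorting does the work of UNIQ
directly, since a set cannot hold the same word twice.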

The results can be pulled back into Word, the font changed back to
Greek, and you've got a readable list again.

With just a little imagination, a lot of this can be greatly
simplified with a DOS batch file.

I have not had good success with Word's "compare" function because it
does not work on a line-by-line basis.

BibleWorks for Windows has a built-in function to do this sort of
thing. It has a word frequency generator that will generate vocabulary
lists, based on part of speech if desired, from any passage or version
of Scripture (though only versions with embedded lemma information
will yield lists of dictionary forms rather than inflected forms). One
of the options within this program is to combine two passages using
AND, OR, or NOT.
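
The AND, OR, and NOT options amount to set operations on two word
lists. Here is a minimal Python sketch of those operations; the
`combine` function is hypothetical and shows an assumed meaning for
each operator, not how BibleWorks implements it. Under that reading,
AND matches COMM's third column (comm -12) and NOT matches its first
column (comm -23).

```python
def combine(list_a, list_b, op):
    """Combine two vocabulary lists with AND, OR, or NOT
    (assumed semantics for illustration only)."""
    a, b = set(list_a), set(list_b)
    if op == "AND":
        result = a & b  # words found in both passages
    elif op == "OR":
        result = a | b  # words found in either passage
    elif op == "NOT":
        result = a - b  # words in the first passage but not the second
    else:
        raise ValueError(f"unknown operator: {op}")
    return sorted(result)


mt = ["agape", "logos", "pistis"]
lk = ["charis", "logos", "pistis"]
print(combine(mt, lk, "AND"))  # ['logos', 'pistis']
print(combine(mt, lk, "NOT"))  # ['agape']
```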

If someone knows of an even simpler way to compare files
line by line, I'd like to know about it myself.

Blessings! (Acts 3:26)

Randy Leedy
RLeedy@bju.edu

---
B-Greek home page: http://sunsite.unc.edu/bgreek


This archive was generated by hypermail 2.1.4 : Sat Apr 20 2002 - 15:40:31 EDT