The cover of the above ledger is not all that exciting to look at. But within that cover is the first of the hard-earned rewards from the Ledger Challenge. This is the cover of Ledger 7, the first of the 20 ledgers finished during those exciting two weeks for which we received consensus data. The consensus text has been reviewed and published in the Huntington Digital Library. At 400 pages, with some 460 telegrams, the ledger took us a while to go through. Its contents are varied, covering the period from late May to early July 1863, and include various (sometimes conflicting) reports from Vicksburg, potential traitors in Indiana, and the start of Gettysburg.
Our volunteers worked hard on all 20 ledgers. We want to say, again, thank you for that effort. Ledger 7 shows that the effort is bearing fruit. We are already hard at work on the next ledger, and the others are going through the consensus processing now.
As we publish these ledgers, we will link to them on the Results page of the Decoding the Civil War site.
Decoding the Civil War has just finished its two-week transcription challenge. Our original goal was simple: complete 10 ledgers. Well, we reached that goal in the first six days. Deciding to ask our volunteers for a little more, we added 10 more ledgers. The challenge became 20 ledgers in two weeks. We can happily say that we have met that goal as well.
That is correct—all of our wonderful volunteers have completed an incredible 20 ledgers! The ledgers are:
mssEC_01; mssEC_02; mssEC_04; mssEC_05; mssEC_06; mssEC_07; mssEC_08; mssEC_09; mssEC_10; mssEC_11; mssEC_12; mssEC_15; mssEC_17; mssEC_20; mssEC_21; mssEC_22; mssEC_25; mssEC_33; mssEC_34; mssEC_35.
That is a total of 9,998 classifications, an average of 714 classifications a day, far exceeding our goal of 425 classifications per day! We also added 727 volunteers. Welcome to all of you! You and our veteran volunteers have helped make this a very successful challenge.
The researchers now have their hands full reviewing the consensus data and getting it transferred into the Huntington Digital Library. Keep checking our Results page to see new ledgers added.
So it is time to strike up the band, and order extra rations for all our volunteers! We have 30,962 classifications left. That is still quite a bit, but remember that we have completed almost 10,000 in the last two weeks, and 87,150 classifications since the project started last June.
We ask you to keep your enthusiasm up and those fingers flying. Let’s try to finish them by June 30th!
Huzzah! Huzzah! Huzzah!
Providing an accurate transcription of the Thomas T. Eckert Papers is one of the primary goals of Decoding the Civil War. It’s why we applied to the National Historical Publications and Records Commission (NHPRC) for a grant, and it’s why our thousands of volunteers have put so much time and effort into this project. With more than 4,000 subjects, or pages, now retired, the folks at Zooniverse have begun the process of establishing the consensus transcriptions, and we are excited and pleased with the results.
Each page, whether in the telegram ledgers or the codebooks, is seen by multiple people. The pages that have been “retired” are those that have been seen and classified by a sufficient number of people. To find the consensus transcription, an algorithm is run that compares every word and finds the most frequently used one. This doesn’t guarantee that the transcription is accurate, and we are allowing for corrections in the future, but it gives us a version of the text that most people agree on.
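To make the idea concrete, here is a minimal sketch of a word-by-word majority vote. It is not the Zooniverse code: the function name `consensus_line` is ours, the volunteer variants are made up for illustration, and it assumes the transcriptions of a line are already aligned so that the same position always means the same word on the page.

```python
from collections import Counter


def consensus_line(transcriptions):
    """Return (word, votes) pairs: the most common word at each position.

    Assumption: the transcriptions are already aligned word for word.
    A real pipeline would need an alignment step before voting.
    """
    tokenized = [text.split() for text in transcriptions]
    longest = max(len(words) for words in tokenized)
    result = []
    for i in range(longest):
        # Count how many transcribers wrote each spelling at position i.
        votes = Counter(words[i] for words in tokenized if i < len(words))
        word, count = votes.most_common(1)[0]
        result.append((word, count))
    return result


# Hypothetical volunteer transcriptions of a single line.
volunteers = [
    "they had better not be sent",
    "they had better not be sent",
    "they had bitter not be sent",
    "they had better not be scent",
]
print(consensus_line(volunteers))
```

The vote count kept alongside each word is the same kind of number that appears under each word in the response breakdown.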
Let’s look, for example, at this telegram from the top of page 8, ledger 1:
The consensus lines and box are fairly straightforward – they were made by averaging the locations drawn by everyone who classified this page. As you can see, there are some quirks, such as the weirdly short top line, which we have seen consistently throughout the reviewed data: some people underlined the entire top line, while others split it into two smaller lines. The lack of underlining with the second line of the telegram is a bit harder to parse out, but most people transcribed “sent”, so the effect on the transcription was minimal. The box comes from that last step of “boxing” the telegram. It will prove very useful in Phase 2 when we parse out individual telegrams, as we have done in the example above.
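As a rough illustration of what averaging the drawn locations means for the box, here is a small sketch. The `(x, y, width, height)` layout and the pixel values are assumptions for the example, not the project's actual data format, and the sketch does nothing about the quirk described above, where some people split one line into two.

```python
from statistics import mean


def consensus_box(boxes):
    """Average a list of (x, y, width, height) boxes field by field."""
    xs, ys, widths, heights = zip(*boxes)
    return (mean(xs), mean(ys), mean(widths), mean(heights))


# Hypothetical boxes drawn by three volunteers around the same telegram.
drawn = [(100, 50, 400, 120), (104, 48, 396, 124), (96, 52, 404, 116)]
print(consensus_box(drawn))
```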
So, what does the consensus transcription look like for this telegram? Without further ado...
Louisville 4′ Recd Feb Feb 4 ’62
Col Colburn asst adjt General Ocean
they had better not be sent
I may want them soon if
they are ready for service Alvord
Huzzah! With the exception of that duplicated “Feb”, this seems to be spot on.
Here’s a closer look at the breakdown of the responses:
The numbers underneath each word indicate the number of people who transcribed it that way. The Zooniverse team uses these numbers to calculate the reliability of each line and each page. This message comes from a page with a reliability value of 0.8658, an excellent value on a scale from 0.0 to 1.0. We are currently working on determining an acceptable base level, or floor, of reliability. Pages whose reliability is lower than that base level will have to be reviewed further, or placed back into the transcription queue.
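We do not spell out the Zooniverse formula here, so the following is only one plausible way such a score could be built from the per-word counts: take the fraction of transcribers who agreed with each consensus word, average those fractions over a line, then average the lines over a page. The function names, the vote counts, and the floor of 0.75 are all assumptions made for illustration.

```python
from statistics import mean


def line_reliability(votes, n_transcribers):
    """Mean fraction of transcribers who agreed with each consensus word."""
    return mean(v / n_transcribers for v in votes)


def page_reliability(lines, n_transcribers):
    """Mean of the per-line reliabilities on a page."""
    return mean(line_reliability(votes, n_transcribers) for votes in lines)


def needs_review(reliability, floor):
    """True if a page falls below the chosen reliability floor."""
    return reliability < floor


# Hypothetical vote counts for a two-line page seen by four people.
page = [[4, 4, 3, 4, 4, 3], [4, 2, 4, 4]]
score = page_reliability(page, n_transcribers=4)
print(round(score, 4), needs_review(score, floor=0.75))
```

Whatever the real formula is, the review step works the same way: compare each page's score with the floor and send the pages below it back for another look.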
Once we have an acceptably reliable consensus transcription for a ledger, we will load that transcription into the Huntington Digital Library, so that researchers can start using the materials in a keyword-searchable form. At the same time we will be loading individual telegrams into Phase 2 of Decoding the Civil War, in which volunteers will tag metadata such as sender, recipient, times sent and received, and more. The fruits of the volunteer labor of “boxing” the telegram come into play here with the consensus box helping us determine the correct location on the page of the telegram.
We are incredibly excited about our progress so far, and can’t wait to share more of our findings in the near future!
It can be hard to wrap the mind around the idea of 16,000 telegrams. Even when told these are all in 35 ledgers, there are so many variables involved (coded, plaintext, mixed, etc.) that it can cause one’s mind to spin. So, in order to help understand date ranges within the ledgers, which ledgers have encoded messages, which ledgers are mixed, and which ledgers had sent vs. received messages, we created this chart. In our next phase of the project we will be asking our volunteers to add metadata to the transcribed messages. Years will be important, but some of the messages provide no year, just the month and day, so this chart will help volunteers complete that task.
But that is in the future, so why release the chart now? We decided to do so in light of some comments on the Project Talk boards. With this chart, our volunteers can look at the Huntington ID for an image. Clicking on the little “i” at the bottom of the image will show the “hdl_id”, e.g. mssEC_08_046. Breaking down the ID tells us that the page shown in that image is from ledger 08 and is the 46th consecutive image (note that this is not the page number; we scanned the whole ledger, covers included). With that information one can look at the chart and determine that the messages in ledger 08 on that page were received at the Washington telegraph office between May 1863 and January 1864, and that they were not in code.
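For anyone who would rather do the breakdown with a script, here is a small sketch. The ID pattern is inferred from the single example mssEC_08_046, the function and dictionary names are our own, and the lookup table holds only the ledger 08 row described above; the rest of the chart is not reproduced.

```python
import re

# Pattern inferred from the example ID "mssEC_08_046" (an assumption).
HDL_ID = re.compile(r"^mssEC_(\d+)_(\d+)$")

# Hypothetical stand-in for the chart, holding only the ledger 08 row.
CHART = {
    8: {
        "direction": "received",
        "office": "Washington",
        "dates": "May 1863 to January 1864",
        "coded": False,
    },
}


def parse_hdl_id(hdl_id):
    """Split an hdl_id into (ledger number, consecutive image number)."""
    match = HDL_ID.match(hdl_id)
    if match is None:
        raise ValueError(f"unrecognized hdl_id: {hdl_id!r}")
    ledger, image = match.groups()
    return int(ledger), int(image)


ledger, image = parse_hdl_id("mssEC_08_046")
print(ledger, image, CHART.get(ledger))
```

Remember that the second number is the image count, not the page number, since the covers were scanned too.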
Hopefully this will provide some further insight into the collection.