On June 21, 2016, we embarked on Decoding the Civil War, an ambitious project to transcribe 12,921 handwritten pages from 35 ledgers. These ledgers were from the archive of Thomas T. Eckert, head of the Washington office of the United States Military Telegraph, Assistant Quartermaster of the U.S. Army, and Assistant Superintendent of the U.S. Military Telegraph.
The 35 ledgers record the telegrams sent and received by the War Department during the American Civil War. About one-third of the recorded messages in the ledgers were written in code, and another third may never have been published in The War of the Rebellion: A Compilation of the Official Records of the Union and Confederate Armies (often simply abbreviated to OR). There are also more than 100 communiques from Lincoln himself.
As of this morning, November 15, 2017, we can announce that all the ledgers have been transcribed by our Volunteer Corps! All the difficult work of parsing sentences, figuring out 19th-century hand and spelling, and even dealing with difficult-to-read, smeared pages has been completed in under 17 months. An incredible achievement. It would have taken 2 staff members working nearly full time 3 to 4 years to reach this goal!
We have begun to publish those ledgers on the Huntington Digital Library; 16 are already available for researchers. As we work through the huge amount of data provided by our Volunteer Corps, the remaining volumes will be published. We are excited to provide this new trove of American Civil War material to the Public, Researchers, and Students. The work completed has already been used to create educational materials that will be published shortly.
We began this project hoping to be finished in six months to a year. We learned quickly that it would take longer, that the task was full of challenges, and that tools needed to be built to overcome some of those challenges. Through it all our loyal volunteers stayed with Decoding the Civil War. As we move forward with the next phase of our project we will call again on those volunteers to help.
Right now, those volunteers can rest. All who have worked on Decoding the Civil War deserve thanks.
Strike up the Band! A lively polka is in order!
Thank you also to the National Historical Publications and Records Commission (NHPRC), which funded a two-year grant for this project, and to our institutions, which have provided support throughout: The Huntington Library, Art Collections, and Botanical Gardens; North Carolina State University; and Zooniverse with its spectacular team at the University of Minnesota. Finally, a thank you to Daniel W. Stowell, whose advice, support and enthusiasm have been fundamental since the beginning.
After a year of hard work by our volunteers on Decoding the Civil War, Phase 1, we are about ready to launch Phase 2, the marking of metadata within specific telegrams. There are two work flows to this task. The first work flow, Code Words, is marking the arbitraries, or code words, for those messages in code. These coded telegrams will then be fed into Phase 3, the final decoding of the telegrams. Having the marked arbitraries should make the process of decoding much faster, possibly aided by computer algorithms.
The second work flow, Metadata, is a little more ambitious and complex, as we are asking our volunteers to work with individual telegrams, identifying specific metadata such as the sender, recipient, date sent, time received, etc. We are asking for metadata for a total of 10 fields; most telegrams have only a few; rarely do they have all 10. What we wish to accomplish is a way to provide simple metadata that will enable researchers to find all the telegrams to, say, Secretary of War Edwin M. Stanton, no matter whether they are in Ledger 2, or 6, or 22.
But did not Phase 1 enable full-text searching? Yes it did, and it is wonderful, but the transcriptions are accurate to the text as written in the ledger. Keeping with Stanton, if you typed in “Stanton” in the search box, you would get those pages where “Stanton” matches the search. But what if the telegram begins or ends with “EMS” or “Stantin” or “the Secretary of War”? The full-text search would ignore those pages. Furthermore, such a search looks at the whole message and returns results for any mention of “Stanton,” including other people named Stanton or places named Stanton. What if you want to look for Stanton only as the recipient? A search in a specific metadata field for “recipient” would enable that search and give you the correct results.
To aid in that search we will take the metadata tagged by the volunteers in Phase 2 and standardize the terms. So, continuing with Stanton, if a telegram names “EMS” and that name is tagged as a sender or recipient, we will be able to take the consensus term and edit it to the standardized form of “Stanton, Edwin M. (Edwin McMasters), 1814-1869.” Once all the telegrams are tagged and the fields edited, if you do a specific search for “Recipient” as “Stanton, Edwin M. (Edwin McMasters), 1814-1869,” you will only get those telegrams to Stanton, not from or about him, and you will have those whether they are sent to him as “EMS,” “Stanton,” or “Stantin.”
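For readers who like to see the idea in code, here is a minimal sketch of the difference between a full-text search and a search on a standardized metadata field. Everything in it is an assumption made for illustration: the sample records are invented placeholders (not real telegrams), and the names AUTHORITY, standardize, full_text_search, and search_field are hypothetical, not part of any Huntington or Zooniverse tool.

```python
# Standardized heading used in the post's example.
STANTON = "Stanton, Edwin M. (Edwin McMasters), 1814-1869"

# Hypothetical lookup table: variant forms (lowercased) -> standardized heading.
AUTHORITY = {
    "ems": STANTON,
    "stanton": STANTON,
    "stantin": STANTON,
    "the secretary of war": STANTON,
}


def standardize(name):
    """Map a tagged name to its standardized form, or return it unchanged."""
    key = " ".join(name.lower().split())  # normalize case and spacing
    return AUTHORITY.get(key, name)


# Invented placeholder records, not transcriptions of real telegrams.
telegrams = [
    {"id": 1, "sender": "Alvord", "recipient": "EMS", "text": "Report follows"},
    {"id": 2, "sender": "Stanton", "recipient": "Eckert", "text": "Signed Stanton"},
    {"id": 3, "sender": "Eckert", "recipient": "Stantin", "text": "To Stantin"},
    {"id": 4, "sender": "Eckert", "recipient": "Colburn", "text": "Forward to Stanton at once"},
]


def full_text_search(records, term):
    """Return ids of records whose text contains the term, in any role."""
    return [r["id"] for r in records if term.lower() in r["text"].lower()]


def search_field(records, field, heading):
    """Return ids of records whose given field standardizes to the heading."""
    return [r["id"] for r in records if standardize(r.get(field, "")) == heading]


print(full_text_search(telegrams, "Stanton"))       # records that merely mention Stanton
print(search_field(telegrams, "recipient", STANTON))  # records addressed to Stanton
```

In this toy data the full-text search finds the telegram signed by Stanton and the one that merely mentions him, but misses the two addressed to “EMS” and “Stantin.” The field search returns exactly those two.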
The tagging of individual telegrams in the Phase 2 Metadata work flow will eventually enable specific searches to be done across the almost 16,000 telegrams. It will enable users to look for individuals or places or dates in specific fields. And the tagging of code words (arbitraries) in the Code Words work flow will help round out this project with the final decoding of encoded telegrams. An incredibly useful archive has been made available in Phase 1 of Decoding the Civil War. Help us leverage and categorize that hard-earned knowledge in Phase 2 to aid in the discovery of the American Civil War.
By Daniel Stowell, Decoding the Civil War Independent Researcher.
One of the most exciting aspects of digital humanities projects is the ability to make connections among materials that would not have been possible earlier. One citizen researcher, Linda Dodge, recently brought to our attention that a telegram from Decoding the Civil War connects to a receipt held at the Abraham Lincoln Presidential Library and Museum and digitized through Chronicling Illinois.
The telegram from Mary Lincoln to Thomas Eckert reads:
640 P[M] March 1 6
Will please deliver this message
to A Williamson
To A Williamson
your letter of 27th Gen Spinner
has list that will suffice to
settle without bills. Have it
done at once. Send receipt
Alexander Williamson (1814-1903) became tutor to Willie and Tad Lincoln when Mary Lincoln hired him in September 1861. In March 1863, Abraham Lincoln obtained a clerk position for Williamson in the Second Auditor’s Office of the Treasury Department. Williamson remained a friend of the Lincoln family and assisted Mary Lincoln in her financial difficulties following the President’s death.
Francis E. Spinner (1802-1890) was Treasurer of the United States from 1861 to 1875. He also assisted Mary Lincoln in settling her husband’s estate and in obtaining a pension. On February 13, 1866, Mary Lincoln’s friend and New York businessman Norman S. Bentley sent Spinner a list of ten merchants to whom she was indebted; this document is also a part of Chronicling Illinois. It is possible that Williamson was assisting Spinner in settling the former First Lady’s accounts and that the following receipt was for his services in that effort.
The receipt reads:
17th March 1866 Received from General
Spinner, U.S. Treasurer the sum of Ten
dollars ($10.) on account of Mrs. M. Lincoln
These two documents, held in institutions 1,600 miles apart, each tell a part of the story of Mary Lincoln’s efforts to settle the estate of her murdered husband and to obtain a pension to support herself and her youngest son. Now, thanks to digital projects like Decoding the Civil War and Chronicling Illinois, researchers can access both documents and others that are a part of this tragic story.
Chronicling Illinois is a digital archive project that the writer, Decoding the Civil War researcher Daniel W. Stowell, organized and implemented.
As a follow-up to Mario’s post on Monday, I just wanted to share that we have officially passed 100,000 classifications for the project! Well done transcribers! The end (of phase 1) may not quite be nigh, but it is in sight.
Today, April 17th, 2017, marks the first day of a two-week challenge, a challenge for not only our current volunteers but for all who would like to join in. The goal is a simple one: complete 10 ledgers in Decoding the Civil War between April 17 and May 1.
Our volunteers have been doing yeoman’s work turning out 200 classifications a day (a classification is equal to a page of transcription). However, we have fallen behind where we had hoped to be at this stage of our project. Thus, the challenge and the selection of 10 ledgers. We need roughly 425 classifications per day, a bit more than double our current number. We can do this and it will help get us back toward the long-term goal of having the majority of ledgers finished by June.
But why accept the challenge? The canard is often repeated that libraries and archives are dead, or if not dead, then they are simply morgues for outdated material. Our work, all our work, has demonstrated that active collaboration, research, and discovery are vital. Remember that the work that is being done on Decoding the Civil War brings together resources from four institutions — The Huntington Library, Art Collections, and Botanical Gardens; the Papers of Abraham Lincoln at the Abraham Lincoln Presidential Library and Museum; North Carolina State University; and the Zooniverse with its team at the University of Minnesota — and the hard work of over 3,000 volunteers. There is also the generous backing of the National Historical Publications and Records Commission (NHPRC). The collaboration of these groups has brought back to life telegrams from the Civil War, presenting the United States Civil War to the world in a continuous stream, not neatly packaged and organized.
Finally, Decoding the Civil War has created new and exciting paths of research—paths that have been cleared by the hard work of the citizen archivists, who have generously volunteered countless hours to this collaborative project. A hearty Thank You to them!
Starting today, let us see what new paths can be carved and cleared. To keep track of our progress we will be resetting the statistics page to reflect only the ten ledgers in the challenge. The numbers will not be set to zero as some work on the chosen ledgers has already been done. Rather, the numbers can be used as a baseline to mark progress going forward. And we, as well as you, will be able to see the number of classifications per day clearly. Come back to our blog daily to see updates and new posts.
Go to our Decoding the Civil War project website, register as a new volunteer, or dive in!
Let us continue to prove that our work is vital! Take up the challenge: 10 ledgers in 2 weeks!
It’s always nice to have your hard work recognized, so we are very excited to be nominated for a 2016 Digital Humanities Award in the category Best Use of Digital Humanities – Public Engagement! It’s a pleasant development, though not entirely surprising considering the enthusiasm that Decoding the Civil War volunteers have shown on the talk boards, on this blog, and on Twitter.
Our transcribers’ commitment to the project has helped us retire 43% of the 12,921 pages of telegrams and codebooks from the Thomas T. Eckert papers. Due to their effort, we have already been able to share full transcriptions of two of the ledgers (ledgers 3 and 24), with more coming soon! The raw consensus data has let our research team pinpoint telegrams that are being incorporated into new educational materials. Gems have also been found by our volunteers, as can be seen in some of the posts on this blog.
This has been a team effort and a great collaboration. We are happy and grateful for the nomination, but the early success of this project has been a great reward already delivered. So thanks to all our volunteers, our research team, and all those who have supported us!
Now vote! Voting ends February 25, 2017!
Providing an accurate transcription of the Thomas T. Eckert Papers is one of the primary goals of Decoding the Civil War. It’s why we applied to the National Historical Publications and Records Commission (NHPRC) for a grant, and it’s why our thousands of volunteers have put so much time and effort into this project. With more than 4,000 subjects, or pages, now retired, the folks at Zooniverse have begun the process of establishing the consensus transcriptions, and we are excited and pleased with the results.
Each page, whether in the telegram ledgers or the codebooks, is seen by multiple people. The pages that have been “retired” are those that have been seen and classified by a sufficient number of people. To find the consensus transcription, an algorithm is run that compares every word and finds the most frequently used one. This doesn’t guarantee that the transcription is accurate, and we are allowing for corrections in the future, but it gives us a version of the text that most people agree on.
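A minimal sketch of that word-by-word vote is below. It is not the Zooniverse code. It assumes the volunteers' versions of a line have already been aligned so that each has the same number of words (a real pipeline has to handle missing and extra words), and the four sample responses are invented for illustration. The function name consensus_line is hypothetical.

```python
from collections import Counter


def consensus_line(transcriptions):
    """Pick the most frequent word at each position.

    transcriptions: list of word lists, assumed aligned and of equal length.
    Returns a list of (word, votes) pairs.
    """
    result = []
    for words_at_position in zip(*transcriptions):
        word, votes = Counter(words_at_position).most_common(1)[0]
        result.append((word, votes))
    return result


# Invented sample responses for a single line of text.
responses = [
    ["they", "had", "better", "not", "be", "sent"],
    ["they", "had", "better", "not", "be", "sent"],
    ["they", "had", "letter", "not", "be", "sent"],
    ["they", "have", "better", "not", "be", "sent"],
]

for word, votes in consensus_line(responses):
    print(word, votes)
```

Here two volunteers each disagree on one word, but the majority reading wins at every position.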
Let’s look, for example, at this telegram from the top of page 8, ledger 1:
The consensus lines and box are fairly straightforward – they were made by averaging the locations drawn by everyone who classified this page. As you can see, there are some quirks, such as the weirdly short top line, which we have seen consistently throughout the reviewed data: some people underlined the entire top line, while others split it into two smaller lines. The lack of underlining with the second line of the telegram is a bit harder to parse out, but most people transcribed “sent”, so the effect on the transcription was minimal. The box comes from that last step of “boxing” the telegram. It will prove very useful in Phase 2 when we parse out individual telegrams, as we have done in the example above.
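The averaging of drawn locations can be sketched as follows. This is an illustration only: it assumes each underline is stored as a tuple (x1, y1, x2, y2) in pixels and that the marks have already been grouped as belonging to the same line of text. The coordinates are invented, and average_line is a hypothetical name.

```python
def average_line(marks):
    """Average the endpoints of several (x1, y1, x2, y2) underlines."""
    n = len(marks)
    return tuple(sum(mark[i] for mark in marks) / n for i in range(4))


# Invented underlines drawn by three volunteers for the same line of text.
marks = [
    (10, 100, 410, 102),
    (12, 98, 400, 100),
    (8, 102, 420, 104),
]

print(average_line(marks))
```

A simple average like this also shows where quirks can come from: if some volunteers underline a whole line and others split it into two shorter lines, the marks no longer describe the same thing, and the averaged result can come out oddly short.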
So, what does the consensus transcription look like for this telegram?! Without further ado….
Louisville 4′ Recd Feb Feb 4 ’62
Col Colburn asst adjt General Ocean
they had better not be sent
I may want them soon if
they are ready for service Alvord
Huzzah! With the exception of that duplicated “Feb”, this seems to be spot on.
Here’s a closer look at the breakdown of the responses:
The numbers underneath each word indicate the number of people who transcribed it that way. The Zooniverse team uses these numbers to calculate the reliability of each line and each page. This message comes from a page with a reliability value of .8658, an excellent value on the scale from .0 to 1.0. We are currently working on determining what is an acceptable base level, or floor, of reliability. Pages whose reliability is lower than that base level will have to be reviewed further, or placed back into the transcription queue.
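To make the idea of a reliability score concrete, here is one simple way such a number could be computed from the vote counts. This is an assumption for illustration, not the formula the Zooniverse team uses: it takes the share of transcribers who agreed with each consensus word, averages those shares over a line, and then averages the lines over a page. The function names and the floor of 0.9 are hypothetical.

```python
def line_reliability(votes, n_transcribers):
    """Mean share of transcribers agreeing with each consensus word."""
    return sum(v / n_transcribers for v in votes) / len(votes)


def page_reliability(lines, n_transcribers):
    """Mean of the line reliabilities on a page."""
    return sum(line_reliability(v, n_transcribers) for v in lines) / len(lines)


def needs_review(reliability, floor):
    """True if a page falls below the chosen floor and should be reviewed."""
    return reliability < floor


# Invented vote counts for a two-line page seen by four transcribers.
page_votes = [
    [4, 4],  # everyone agreed on both words
    [2, 4],  # only half agreed on the first word
]

score = page_reliability(page_votes, 4)
print(score, needs_review(score, floor=0.9))
```

With a score on the scale from 0 to 1, choosing the floor is the real decision: set it too high and good pages go back into the queue, too low and shaky transcriptions slip through.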
Once we have an acceptably reliable consensus transcription for a ledger, we will load that transcription into the Huntington Digital Library, so that researchers can start using the materials in a keyword-searchable form. At the same time we will be loading individual telegrams into Phase 2 of Decoding the Civil War, in which volunteers will tag metadata such as sender, recipient, times sent and received, and more. The fruits of the volunteer labor of “boxing” the telegram come into play here with the consensus box helping us determine the correct location on the page of the telegram.
We are incredibly excited about our progress so far, and can’t wait to share more of our findings in the near future!