
Chronicling America API


Ed Summers, whose digital life is at http://inkdroid.org/ and who is a software developer on the Library of Congress’s Chronicling America project, noticed my “Chronicling America” blog entry of the other day, where I talked about the predictable URLs I found on the site.

He pointed out that Chronicling America has a published API (application programming interface), with documentation that explains how to access the site's content. The API page is at: http://chroniclingamerica.loc.gov/about/api/

The API facilitates the following functions:

  • Search: results are returned in HTML, JSON, or Atom. HTML allows simple human reading of a web page, JSON (JavaScript Object Notation) lets a web page or script manipulate the returned data, and Atom provides a feed that can be read in a feed reader such as Google Reader or Bloglines.
  • Link: to “titles, issues, editions, and pages” using “LCCNs, dates, issue numbers, edition numbers, and page sequence numbers.” Using some of the examples on the site, you can quickly predict and test potential URLs, then use and share them. Once you understand the rules, you can also generate URLs out of a database.
  • Linked Data: using published, standard ontologies, you can use the Chronicling America database to get at related content on the “semantic web” where that content is similarly tagged. With RDF/OWL (Resource Description Framework / Web Ontology Language) technologies, this content can be delivered to users in new and creative ways.
  • Aggregations: Chronicling America has assembled collections of related items (such as the JPEG 2000, PDF, and OCR text of the same newspaper page) using a technology called OAI-ORE (Open Archives Initiative Object Reuse and Exchange).
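
To make the Search and Link items concrete, here is a minimal Python sketch of generating URLs from database-style rows. The path layout (/lccn/<LCCN>/<date>/ed-<n>/seq-<n>/) and the search parameter names (andtext, format) are my assumptions, based on the kind of examples the API page gives; the function names and sample values are purely illustrative, so check the API page before relying on any of it.

```python
from urllib.parse import urlencode

BASE = "http://chroniclingamerica.loc.gov"


def page_url(lccn, date, edition=1, sequence=1):
    """Build a link to one newspaper page.

    Assumed path layout: /lccn/<LCCN>/<YYYY-MM-DD>/ed-<n>/seq-<n>/
    """
    return f"{BASE}/lccn/{lccn}/{date}/ed-{edition}/seq-{sequence}/"


def search_url(text, fmt="json"):
    """Build a page-search link.

    'andtext' and 'format' are assumed parameter names, not confirmed here.
    """
    query = urlencode({"andtext": text, "format": fmt})
    return f"{BASE}/search/pages/results/?{query}"


# Rows as they might come out of a database:
# (LCCN, issue date, edition number, page sequence number).
# These values are placeholders for illustration.
rows = [
    ("sn86069873", "1900-01-05", 1, 2),
    ("sn86069873", "1900-01-06", 1, 1),
]

for row in rows:
    print(page_url(*row))

print(search_url("thomas"))
```

Keeping the rules in two small functions means that if the real URL conventions differ from what is assumed here, there is only one place to correct them.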

I am amazed by the scope of this project, as well as how openly the content is being made available. Here’s a brief snippet from their API page about the scope of Chronicling America:

There are more than a million digitized newspaper pages in Chronicling America. These pages span several decades and many U.S. states and territories. New batches of data come in from partner institutions throughout the year and are added to the site regularly.

The openness of the content, with such a rich, published API, means that this content is ripe for re-purposing, and the site itself can teach you how to get to its own content. Just as I noticed the predictable URLs, the folks at Chronicling America write:

Details about these interfaces are below. In case you want to dive right in, though, we use HTML link conventions to advertise the availability of these views. If you are a software developer or researcher or anyone else who might be interested in programmatic access to the data in Chronicling America, we encourage you to look around the site, “view source” often, and follow where the different links take you to get started.

I intend to do just that. What an exciting and powerful resource.
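
For anyone who wants to automate the “view source” step, here is a rough Python sketch that lists the alternate views a page advertises. I am assuming that “HTML link conventions” means <link rel="alternate"> elements in the page head; the sample HTML below is made up for illustration and is not copied from the site, and the class and function names are my own.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, or be missing.
        rels = (attr.get("rel") or "").lower().split()
        if "alternate" in rels and attr.get("href"):
            self.alternates.append((attr.get("type"), attr["href"]))


def find_alternates(html):
    """Return the alternate views advertised in an HTML document."""
    finder = AlternateLinkFinder()
    finder.feed(html)
    return finder.alternates


# Hypothetical page source; the LCCN and paths are placeholders.
SAMPLE_HTML = """
<html><head>
<title>Example page</title>
<link rel="stylesheet" href="/static/site.css">
<link rel="alternate" type="application/rdf+xml" href="/lccn/sn00000000.rdf">
<link rel="alternate" type="application/json" href="/lccn/sn00000000.json">
</head><body></body></html>
"""

for media_type, href in find_alternates(SAMPLE_HTML):
    print(media_type, href)
```

To try it against a live page, the HTML could be fetched with urllib.request.urlopen and passed to find_alternates as a decoded string.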

 