RootsTech 2011: Day 3

Internet Archive

Brewster Kahle, founder of the Internet Archive, gave an incredible keynote address this morning.

His non-profit has been digitizing and providing on the Internet all kinds of media. As he said, “We are in the business of giving information away.” He briefly mentioned “born digital” data, but focused his discussion on the data we all have in shoeboxes, what he called the “canonical box ‘o stuff.”

The Internet Archive has 23 scanning centers in 6 countries. For example, they have digitized documents from the Leo Baeck Institute, and did so while removing private information via remote curation over the web.

Mr. Kahle also discussed their digitization of video content (8mm, Super8, 16mm, video tape). He pointed out that some of this kind of conversion is available in the consumer market, for about $200 / hour. Higher grade (HD-quality) transfers are also available, but are much more expensive.

Specifically in the genealogical field, Mr. Kahle said that the Internet Archive is involved in creating a free genealogical library – partnering with FamilySearch and the Allen County Library. Recently, the Internet Archive completed digitizing the 1790-1930 Census and making it available for free. They are now working on digitizing passenger records. Soon, they will be announcing a partnership with libraries that will allow for 80,000 e-books to be “loaned” from the library to patrons who are in the library.

For me, this was all powerful, transformative information. But I was most interested in Mr. Kahle’s discussion of print-on-demand digital bookmobiles, which can provide books as people need them, at a very low cost. (One example was that Alice in Wonderland costs about $1 to print and bind.) According to Mr. Kahle, a Harvard study has shown that it takes a library $3 to loan a book, so $1 to give a book away should be a reasonable price. This is being used to provide printed books free in India, Egypt, and Uganda.

One of the most moving portions of the discussion was the fact that the Internet Archive has doubled, to more than 1 million, the number of books available to the blind and text-disabled in the DAISY format for automated readers.

A key issue for any archive, Mr. Kahle pointed out, is institutional responsibility: how long, and at what level, can a company or any other institution be trusted to store information? He told us not to trust that Flickr, Google, or even his non-profit would be around, or make the right decisions when it counted. So his recommendation is never to keep just one copy in one institution. He said that the Library in Alexandria burned, yes, but it already had lost many of the important texts that it had gathered because of institutional neglect: “the new guys didn’t like the old stuff around.”

In 2002, the Internet Archive handed 200 TB of their data to the Library of Alexandria, which reciprocated with their collection of digitized Arabic materials. These kinds of large-scale swap agreements are critical to the redundancy needed to ensure that we do not have another loss like the one at Alexandria: books by Aristotle, the other plays of Euripides … At this point, the whole Internet Archive is stored in three locations: San Francisco, Alexandria, and Amsterdam. Mr. Kahle acknowledged that an earthquake zone, the Middle East, and a flood plain were perhaps not the best choices, but they were not planning on stopping there.

For us, as genealogists, Mr. Kahle poses the following questions, which should make us think hard about the responsibility we have to take care of our data and documents:

  • Can we learn the stories of our ancestors?
  • Will our descendants know our story?

The RootsTech conference was a great success. More than 3,000 attendees were there, making it one of the biggest, if not the biggest, genealogy gatherings in the US. Next year, the second RootsTech conference will be held at the Salt Palace in Salt Lake City, Utah, from 2-4 February. I plan to be there.