RootsTech 2011: Towards a New Genealogical Data Model


On Saturday at the RootsTech conference in Salt Lake City, there was an open discussion session on genealogical data standards. For years there has been a heated debate about a new data model that could replace GEDCOM. A new standard would address GEDCOM’s gaps (for example, the inability to store evidentiary analysis within the data model) and would be a living, dynamic standard, unlike GEDCOM, which has been static since 1996.

In the first hour, the discussion identified several issues with the current data model:

  • Data in Proprietary Formats: Because of gaps in GEDCOM, and the lack of a standards body to address them, most software vendors developed their own proprietary extensions, which limited the ability to share data.
  • Lack of Persistent URLs (PURLs)
  • Unstructured Text
  • Tag & Link Issues
  • Inconsistent Search Experience
  • Data Versioning (Diff/Merge)
  • Inability to Transfer Rich Data (rich media)
  • Inability to do Cross-Repository Search
  • Documentation (in other words, capturing the source of a genealogical statement, the ability to provide …)
  • Key as Seen (Representation): How do we normalize data while preserving the original “as-keyed” version?
  • Static Data Interchange

After the first hour, which was devoted to creating this list, we were to vote on groups of technical or feature issues and choose one or two to discuss. For me, the biggest issue was not any of these technical issues; it was the lack of a governance model. Because no one was responsible for maintaining GEDCOM, it did not change with the times and died as a standard. People saw gaps and addressed them in proprietary ways, since there was no way to get issues addressed within the standard.

I got up and suggested that we talk about how to build a working governance model, rather than about the issues that such a model would help us solve. For more than a decade, people have been lamenting the lack of a standards body to adjudicate issues, develop a common standard, and submit it for public review. At the same time, people have pointed out the feature gaps and proposed ways to address them. For the feature-gap discussion to have an effect, however, we need a venue for these discussions that is actually designed to maintain a working standard. Lack of governance, not lack of technology, is the issue. We voted, and changed the direction of the meeting to discuss governance.

It was at about this time that Tom Creighton, the CTO of FamilySearch, got up and announced that FamilySearch was nearly ready to release a new proposed data model. This changed the meeting immediately. Instead of an open discussion, it became more like a press conference, with Tom fielding questions about what they had done, when the work would be shared, and so on. He was not able to divulge much at this point.

Key portions of the new proposed standard are based on the GenTech genealogical data model owned by the National Genealogical Society (full disclosure: I am on the Board of the NGS). FamilySearch management has not yet decided whether to make the new proposed data model public and free, but the question is being discussed. This means that no date can be set for the launch of the new standard, as it could remain the intellectual property of FamilySearch and unavailable outside of FamilySearch. (Mr. Creighton said that they had told several software vendors that they were developing a new standard, but had not given any of them more detail than that.)

This is an exciting development at the intersection of genealogy and technology. If FamilySearch decides to share its work, if a governance body can be identified or set up, and, finally, if that governance body has the trust of the genealogical community, including:

  • the major desk­top and mobile appli­ca­tion developers
  • the major web databases
  • the NGS
  • NEHGS (New England Historic Genealogical Society)
  • FGS (the Federation of Genealogical Societies)
  • BCG (the Board for Certification of Genealogists)
  • APG (the Association of Professional Genealogists)

we could be near the start of a much richer technology environment. A new data model that addresses GEDCOM’s issues, and that is upgraded and changed through a community governance model, could lead to an integrated set of independently developed software tools. Those tools would allow people to represent their research better than they can with GEDCOM, and to share their data or move it from one vendor’s product to another more easily.

It sounds a little like Shangri-La as I write it here, but we are talking about the incredible potential that would be unleashed if most software vendors no longer had to fix (or ignore) issues with the current data model independently, and could instead focus on the next new way to access and work with genealogical data.

Update, 17 February 2011: A summary of the meeting discussed here has been posted on the FamilySearch wiki: https://wiki.familysearch.org/en/Genealogical_Data_Standards_(RootsTech_Session)

 