My Day at DPLA – Part 2

Digital Public Library of America sticker

The second half of the day was dedicated to the beta sprint presentations, which laid out several component pieces that could be reviewed for incorporation into the DPLA project, whatever it may become.  A beta sprint is a software development technique for quickly building working models of software to demonstrate that they actually work.  The sprinters work ruthlessly to push out a piece of working code as fast as possible.  The DPLA Secretariat received 39 submissions and selected six major ones for a long presentation and three shorter ones for a “lightning round” presentation.  All of this was unpaid volunteer work by major institutions and college students.  Yeah.

The first presentation was from the Library of Congress, the National Archives and the Smithsonian.  The Smithsonian created an intermediary metadata layer that sat over the digital collections of all three institutions and mapped their common fields into a unified search function.  MAJOR.

The second presentation was from the Digital Library Federation, IMLS and DCC.  They created an actual live model that integrated data sets from a few hundred small cultural heritage institutions around the entire country.  This raised a question about curation of the content: who gets included and who gets excluded.  No clear answer to that at this stage.

The third presentation was for a product called ExtraMuros.  OMG, it is mind-poppingly cool.  It lets you search across multiple document types, including full-text books and photo and video collections, both in partner institutions and on the web via sites like Flickr and YouTube.  BUT it ALSO lets you play with the content: create new collections and new documents, and enhance existing documents by overlaying and integrating multimedia resources into a text.  I was blown away.

The next presentation was a consolidated government documents interface from the University of Minnesota, HathiTrust and the CIC.  It was primarily a mapping and data-scrubbing layer that would create greater access to historical government documents, which are notoriously difficult to navigate.  Interestingly, the GPO was not involved, nor were they interested.  As a former GPO employee, I was a little surprised, because they have an army of catalogers pumping out records every day.  Who knows.

Then the folks from Athens, the one in Greece, presented a product called MINT.  MINT is a metadata mapping tool that lets you create the connections between the items in your data sets and everyone else’s data sets.  They also discussed a minimum viable record standard that data must meet to be discoverable using their system.  Looked easy.
Finally came two coordinated products called LibraryCloud and Shelf Life.  LibraryCloud is exactly what it sounds like: a cloud data server for library content that backs up local data and serves it up for you.  Shelf Life was much like an OPAC interface that let you interact with all the different types of virtual objects in the DPLA catalog through visual shelf arrangements, and it incorporated a lot of social media elements such as public reviews, comments, tagging and ranking.  I wasn’t totally sold on the look of it, but that’s obviously something that can be changed.

Then there was the lightning round.  First up was Bookworm, which combined the N-Gram viewer with library metadata to create a more powerful search results system.  There was a hilarious moment where the undergraduate math student explaining how to use the product said “Social Sciences is ‘H’ for some reason,” and the entire room burst into laughter.  Silly undergraduates, not understanding the Library of Congress Classification System.  It was good, and made great use of variable data visualization techniques.  Next was a method for creating profiles of the cultural institutions and the content that they share with the DPLA.  Meh.  The final one was a project called WikiCite, which would create a citation index of digital information and cache the links that are cited as sources in wiki articles.

After this we broke for afternoon tea and had a chance to go and explore some of the poster sessions.  I mostly just hung around to see if I knew anyone else, but I didn’t really see anyone I hadn’t already run into.  It was a conference of maybe 300 people, so you saw a lot of the same people over and over again.

The final panel of the day was a report-back from the six work streams, with their mission statements and where they were headed.  I’m just going to list the work streams and their mission statements, so I can move on to future thoughts.

  • Audience: Create a Digital Public Library of America that is a trusted first platform for knowledge online and is universally accessible, participatory, and compelling for all.
  • Content and Scope: Facilitate the discovery and exposure of digital heritage content for permanent, open, public access for the enhancement of knowledge and community.
  • Financial: Explore and develop mechanisms to generate ongoing support for the DPLA. Generating recurring demand is implicit in this statement.
  • Governance: Develop a system of decision making and management for the DPLA.
  • Legal: Illuminate legal issues and, where feasible, provide information and options for addressing legal issues for America’s libraries as they go digital.
  • Information Technology: Establish the technical and normative principles of the technological framework that will best support the DPLA’s aims.

As you can see from this, it’s all a little vague, and that’s good at this stage, because they’re still defining the future of the project.  But they’ve also got a very aggressive schedule and a deadline of 18 months to a deliverable product.


So, that kind of wrapped it up there at the end and I was left with a ton of questions, all of which will have to wait for answers.

What kind of product is this going to be?

Who’s going to be using it and what are their needs?

How can the public library use this resource and promote its use with their user base?

How can libraries and cultural institutions become contributors to this project as well as users?

Will the general public be able to create content, share it with the DPLA and be able to expect longevity and access?

Will the DPLA advocate for copyright reform to increase digital access, and actually be able to compete with the stakeholders?

Can the federal government or local governments or public/private partnerships create an Internet Corps of Engineers to enhance access?

Will this product start to change average people’s minds about copyrights and accessibility of content?

Would the DPLA start to challenge the publishing industry to end EULAs and DRM on eBooks to increase digital adoption?

Are we just going to stop with the United States or will we push this toward a global digital culture revolution?  With the U.S. and Europe on board this digital train, South America, Africa and Asia ought to be close behind.

I’m going to end with the vision of the starship library that I wrote about last month.  This is how we get there: by partnering together to make the entire cultural heritage of the world universally accessible, downloadable, remixable, and free.  With this level of access and this collective urge to make things available, we will get to that point.  And when we finally reach another world, we can start building a new collection, with the unified wisdom of our entire planet behind us.

I am so ready to take that big step.

I’m going to edit this to add one very important thing.  This project is going to revolutionize the web for one very simple reason.  Metadata.  We have been living in a world where brute-force, raw searching yields millions of useless hits.  The value of a service like the DPLA is that it is in fact curated by librarians, archivists, and museum curators, as well as by the public, who volunteer their efforts to make it relevant.  This is the hybrid of the old school library catalog and the new school wiki pages, where we have expert metadata people working round the clock to make things accessible, and average people dedicating their personal knowledge and time to make that metadata even more relevant.  This is going to fundamentally change how we use the web, because I will guarantee you that website owners are going to want to get in on this somehow.  And that means that they are going to have to generate metadata for their work to make it accessible and relevant to the collection, and then the users of those sites are going to curate the hell out of them.  Is that Web 3.0?  2.5?  I don’t know, but it’s a radical shift in an excitingly old/new way.


My Day at DPLA – Part 1

Today I had the pleasure of attending the Digital Public Library of America plenary session at the National Archives and Records Administration.  It was one of those moments where you see something and you instantly know that this is going to be huge.  The heaviest hitters in library science and digital access were there in full force, all of them throwing their support behind this new coordinated initiative that, if successful, will revolutionize digital access not only in the United States but around the world.  And I’m not just saying that; I really, really believe that this is going to be an utterly transformative movement in the world of internet culture.

Let me get all the name dropping out of the way.  Harvard, Stanford, The Internet Archive, Wikipedia, Public Knowledge, The National Archives and Records Administration, The Library of Congress, The Smithsonian, The National Endowment for the Humanities, The Institute of Museum and Library Services, The American Library Association, State Library of Texas, The Sloan Foundation, The Arcadia Foundation, The Gates Foundation… Carl Malamud, Brewster Kahle, Bob Darnton, Susan Hildreth, Maureen Sullivan… This thing was HUGE.  Something on this scale had never been attempted, or even been possible, before, and they brought everyone to the table, including interested parties from a similar project called Europeana.  The director of the British Library just happened to stop by, too.  The other fascinating aspect of this was the participation of rank and file librarians (like myself) and library school students.  They are really making an effort to spread the word and reach out to get the kind of feedback that they need to develop a service that’s going to transform society.

And one small but interesting detail: the entire conference was illustrated simultaneously by two different live artists.  It was like watching RSA Animate live!  All I could see of them was their pixie-like heads and their colored pens zooming along, but these ladies were incredible.  They were able to summarize hours and hours of presentations into cool wall-sized graphics.  I’ve never seen anything like it done right in front of me.  I want these ladies at every meeting I ever have.

I’m going to try and reconstruct the day from my tweets.  Hopefully it won’t be too mangled.

Up first was a welcoming prologue from the National Archivist David Ferriero, who turned it over to James Leach from the NEH.  Leach talked about C.P. Snow’s concept of the Two Cultures, the sciences and the humanities, and how today’s culture is merging those two fields via projects like this.  His driving note was that we need to develop an “infrastructure of ideas.”  This was immediately followed by a generous donation of $2.5 million toward the project from the Alfred P. Sloan Foundation, which was then followed by an equally generous matching $2.5 million from the Arcadia Foundation.  Yeah.  That was just announced, almost randomly, in front of everyone there.  The speaker from Arcadia talked about how digitization projects to date have been haphazard boutique projects, with a little money here and a little money there to make a small thing accessible online.  His call to action was to develop the big box version of that, going from the boutique to the Wal-Mart phase.  Everyone kind of gasped and chuckled.

The money bomb was followed by reports from some big players in the digitization movement here in Washington.  The Library of Congress has scanned and made available 28 million of the 148 million items in its collections, and is itching to get the rest out there, much of it public domain books, with priority scanning for American history titles.  IMLS was looking for projects that it could directly fund to help advance the DPLA movement.  The resounding statement here was collaborate and conquer.  NARA spoke about its mandate to make available a trove of some 400 million declassified documents by 2013, all of which are pending review by the relevant agencies.  The National Archivist wants to digitize absolutely every piece of paper in the archive and make it freely available.  BOLD.  In the follow-up questions, someone asked whether they were considering the difference between making something accessible and making something discoverable.  Ferriero said that “if it’s not online it doesn’t exist.”  I’ll come back to that in a minute.  This was followed by a lot of talk about massive amounts of metadata, as well as accessibility for blind users and others.  Lynne Brindley, the director of the British Library, stood up and mentioned that they have opened up all of their metadata under a Creative Commons Zero (CC0) license.  A director from the Smithsonian also chimed in, stating that they have 137 million items that they want to make available as well, most of them natural history specimens.

Now let me take a moment to talk about metadata.  Many of you who read this blog already know what that is, but for those of you who don’t, let me try to explain it in plain English.  When you go to the library and use its online catalog to search for a book, that catalog is built from a database containing about 80 to 100 fields of information about the book: the title, author, and subject, plus really obscure things like the height of the book, its language, and its illustrator (if it has one).  I could go on and on and on.  Anyhow, that data is data about the properties of the book.  We call that metadata.  Now, books aren’t the only things that have metadata.  Everything does!  Pictures online have metadata, items in museums have metadata, archives are loaded with metadata.  The crazy thing is that wildly different standards have arisen in different industries, and all of that information is often readable only by systems specifically designed for that particular format.  That’s one of the major hurdles in a project like this that wants to combine the forces of libraries, museums, archives and user-generated content.  It’s a metadata nightmare!  But they are thinking about this, and in a major way.  More on metadata in the beta sprints.
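For the technically curious, here is a toy illustration of why mismatched standards are such a headache, and roughly what a mapping layer has to do about it. It’s a minimal Python sketch under my own assumptions. The museum field names, the CROSSWALKS table, and the to_common and search functions are all made up for illustration. Only the MARC tags 245 (title) and 100 (personal author name) are real. This is not how the DPLA or any of the beta sprint projects actually work.

```python
# Toy metadata "crosswalk" (hypothetical, for illustration only):
# each source names its fields differently, so we map them onto one
# shared set of fields before searching across all of them.

# Library-style record keyed by MARC tags (245 = title, 100 = personal author name).
library_record = {"245": "Leaves of Grass", "100": "Whitman, Walt"}

# Museum-style record using made-up field names.
museum_record = {"objectTitle": "Fur Traders Descending the Missouri",
                 "maker": "George Caleb Bingham"}

# For each source: which of its fields feeds which shared field.
CROSSWALKS = {
    "library": {"245": "title", "100": "creator"},
    "museum": {"objectTitle": "title", "maker": "creator"},
}

def to_common(record, source):
    """Rename a record's fields to the shared schema, dropping unmapped fields."""
    mapping = CROSSWALKS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}

def search(records, term):
    """Case-insensitive search over the shared 'title' and 'creator' fields."""
    term = term.lower()
    return [r for r in records
            if any(term in r.get(f, "").lower() for f in ("title", "creator"))]

unified = [to_common(library_record, "library"), to_common(museum_record, "museum")]
print(search(unified, "whitman"))
```

The key piece is the crosswalk table. Instead of teaching the search function every standard out there, each source gets mapped once onto a shared set of fields. That is roughly the job the metadata mapping tools in the beta sprints were taking on, at a vastly bigger scale.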

Bob Darnton from Harvard wrapped up that session with a very inspirational vision that this is not just a project for America, but a project that is international in scope via partnerships with similar cultural heritage projects like Europeana.  It’s easy to see that coming via open metadata standards between DPLA and Europeana.  In fact they plan to do a digital exhibit on the history of European migration to the United States as one of their earliest partnership projects.

The next panel consisted of many of the visionary people behind the DPLA movement.  The first was John Palfrey from Harvard.  His vision of this system was not one unique repository, but rather an access point that coordinates online access to the digital treasures in the care of local institutions.  He reinforced that the metadata itself needs to be open to everyone, and that the code that powers the DPLA should be made available for local customization projects, like a SourceForge for libraries.  He concluded with a hilarious idea about creating “scannebagos” to go out to little towns, scan their documents and get them online.  Peggy Rudd from the State Library of Texas pushed the idea of making the DPLA so useful that it would spawn its own verb, à la Googling: DPLAing.  Doesn’t have the same ring, but I like this vision of saying “I’m going to check The Library for it.”  Brewster Kahle spoke about three simple ideas to build a digital America.  The thing is, we are already living in a digital America, and the services that we create today are what will drive the future of digital access online.  His three points were to make everything in the public domain freely available, to make orphaned works available to lend, and to buy digital copies of new works and lend them.  Straightforward, and it covers everything.  Amanda French from the Center for History and New Media gave the most poetic speech about the vision of the DPLA.  She began by reading an aubade by John Donne and talked about how we are clinging to our love of books as the sun rises on a digital era.  Her conclusion was that finding the balance between the digital products we absolutely need and the necessity of the physical space of the library would lead us to a gleeful rendez-vous with the soul of the library.  Carl Malamud was the final speaker, and his was a call to action.  
He sounded a rallying cry to create a new public works program of digitizing our nation’s heritage. “Deploy the Internet Corps of Engineers!”   It was astounding.

It was in this last panel’s question-and-answer session that we revisited the sentiment “if it’s not online, it doesn’t exist.”  Several other people, Kahle and Malamud I believe, echoed that sentiment.  An audience member questioned this, asking “Doesn’t this denigrate the physical work?  Won’t people decide not to go to that museum if they’ve already seen the entire collection online?”  Amanda French chimed in and restated it: “If it’s not online, people don’t know it exists.”  Making content freely available increases its value by exposing and promoting it.  How would anyone know that a museum in Iowa has a Caravaggio painting?  Knowing that, a person might plan a trip to Des Moines, so open access could even increase tourism.

It was at this point that we went to lunch.  I had a great conversation with some lawyers from Public Knowledge and Berkeley about the Authors Guild v. HathiTrust lawsuit and the ridiculousness of it.  It was great and the food was awesome.

I’m going to take a break in the narrative here and post the second half of the day with all of the technical details and visionary work as well as my questions and dreams in the next post.