Showing posts with label digital preservation. Show all posts

Who Will Save America's Vanishing Songs?

1859 drawing of a phonautograph, the first device capable of recording sound. You can listen to some of the first recorded sounds here. Illustration from Wikimedia Commons.
NPR recently ran an interesting piece, Who Will Save America's Vanishing Songs? The story was inspired by a paper from the Library of Congress' National Recording Preservation Board [huge PDF warning].

Both pieces report that in some ways it is the most recent musical tracks that are the most endangered. "Older recordings actually have better prospects to survive another 150 years than recordings made last week using digital technologies," according to the report. Many smaller bands only release their music digitally, sometimes via a MySpace page or similar site, with no thought to digital preservation. Modern copyright laws have become so restrictive that "Were copyright law followed to the letter . . . it would brand virtually all audio preservation as illegal." And don't even get us started on the multiple digital tracks, alternate versions, and bonus tracks that make up modern music releases.

The report is not optimistic about preserving older media either. "Public institutions, libraries, and archives hold an estimated 46 million recordings," the report tells us, yet "degree programs to train professional audio archivists are nonexistent."

I found the report a discouraging read. Popular music is one of the best sources that we have for getting into the mindset of people in the past. It is an invaluable source for social history and a great teaching tool (see this post on teaching with digitized music from Edison cylinders).

What is to be done? "This study will be followed by publication of a national plan developed on the basis of the recommendations of task forces convened to discuss the findings presented here," we are told. Apparently it cannot happen soon enough.

A Post for Jerry Handfield



Jerry Handfield, the Washington State Archivist and one of my bosses, loves these Team Digital Preservation videos. DigitalPreservationEurope has created a series of these cartoons to help explain basic concepts of digital preservation. There are a half-dozen videos in the series so far; you can see the rest here.

Google's Book Search: A Disaster for Scholars?

Your humble Northwest History blogger is sometimes accused of being a Google fanboy. A fair cop. But you know who is not a Google fanboy? Geoffrey Nunberg, that is who. Over at the Chronicle of Higher Education Nunberg has a witty jeremiad, Google's Book Search: A Disaster for Scholars.

Nunberg's beef is with Google's sloppy and commercially driven metadata schemes. He demonstrates that even with such a basic item as date of publication, Google Books very frequently gets it wrong. This in turn often corrupts search results: "A search on 'Internet' in books published before 1950 produces 527 results; 'Medicare' for the same period gets almost 1,600." By comparing Google's data to that found in the catalogues of the contributing libraries Nunberg shows that these errors do in fact belong to Google, not to their partners.
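For readers who want to see the shape of that comparison, here is a minimal sketch in Python. It assumes you already have, for each book, the publication year from Google's metadata and the year from the contributing library's catalogue. The records and function names below are made up for illustration and are not Nunberg's data or method.

```python
# Hypothetical records: (title, year in Google Books metadata, year in library catalogue).
# These values are invented for illustration only.
RECORDS = [
    ("Book A", 1899, 1999),
    ("Book B", 1920, 1920),
    ("Book C", 1905, 2005),
]


def find_date_mismatches(records):
    """Return the records whose two publication years disagree."""
    return [r for r in records if r[1] != r[2]]


def false_hits_before(records, cutoff):
    """Return titles that a 'published before cutoff' search would wrongly include.

    A title is a false hit when the Google year falls before the cutoff
    but the catalogue year does not.
    """
    return [title for title, google_year, catalogue_year in records
            if google_year < cutoff and catalogue_year >= cutoff]


mismatches = find_date_mismatches(RECORDS)
wrong = false_hits_before(RECORDS, 1950)
```

The second function is the one that matters for a date-limited search: it picks out only the errors that would actually push a later book into the results.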


Nunberg also whacks Google for the classification errors where books are placed in the wrong categories: "H.L. Mencken's The American Language is classified as Family & Relationships. A French edition of Hamlet and a Japanese edition of Madame Bovary are both classified as Antiques and Collectibles . . . An edition of Moby Dick is labeled Computers; The Cat Lover's Book of Fascinating Facts falls under Technology & Engineering."

Worst of all to Nunberg is Google's adoption of the Book Industry Standards and Communications categories for Google Books, which he describes as a modern commercial invention used to sell books, rather than a scholarly system of classification like the Library of Congress subject headings: "For example the BISAC Juvenile Nonfiction subject heading has almost 300 subheadings, like New Baby, Skateboarding, and Deer, Moose, and Caribou. By contrast the Poetry subject heading has just 20 subheadings. That means that Bambi and Bullwinkle get a full shelf to themselves, while Leopardi, Schiller, and Verlaine have to scrunch together in the single subheading reserved for Poetry/Continental European. In short, Google has taken a group of the world's great research collections and returned them in the form of a suburban-mall bookstore."

I think that Nunberg has a number of good points--points he gathers together to form a molehill, from which he conjures up a mountain. Google's metadata may be everything he says (and I think he is probably right) but how great a problem is that really? This scholar at least uses Google Books either 1) to locate a digital copy of a book I already know about, or 2) via a string of search terms. In the first case, it is not relevant to me that Google has classified Adventures of Huckleberry Finn under "wild plants" or whatever. I know perfectly well what it is, and just wanted to find a quote I remember.

In the second case, I might search for mentions of the Columbia River in books published before 1860. And suppose a faulty date in Google's database brings me to something written after 1860. So what? Surely when I click on the link and find myself reading Sherman Alexie instead of Lewis and Clark, I will notice the fact. (Actually I just did the search and on the first 10 pages of results I don't see any errors at all. Take that, Nunberg.)

So for which scholars exactly is Google Book Search a "disaster?" Nunberg cites "linguists and assorted wordinistas" who are "adrenalized" at the thought of data mining to "track the way happiness replaced felicity in the 17th century, quantify the rise and fall of propaganda or industrial democracy over the course of the 20th century, or pluck out all the Victorian novels that contain the phrase 'gentle reader.'" But who does this? OK, I know that people do it, but most data mining of this type has always struck me as more of a parlour trick than actual scholarship.

The other thing Nunberg ignores is that metadata is not that hard to fix. Google already provides a "feedback" button on every virtual page so readers can report unreadable or missing pages. If we howl loud enough we could easily see similar feedback mechanisms on the "More book information" page so we could correct names and dates and categories.

Nunberg is absolutely correct to recognize the monumental importance to scholars of the Google Book Search project. It is vital that scholars take a critical stance that will push Google to improve the project and make it even more useful. His article is a valuable push in that direction.

UPDATE 9/3/09: Reader Ed points out that Geoff Nunberg also posted a nicely illustrated version of his article on the blog Language Log, and got a brief response in the comments from Jon Orwant, who manages the metadata at Google Books.

Wax Cylinder Recordings Online

Belfer Cylinders Digital Connection - Syracuse University Library: "The Belfer Cylinders Digital Connection provides online access to digital audio files of cylinders in the Belfer Audio Laboratory and Archive. Belfer’s cylinder collection includes over 22,000 cylinders, 12,000 of which are unique titles. The goal of this digitization project is to provide 6,000 audio files by 2010."

This is a really nicely done site--clean and easy to search and navigate, with abundant metadata. And you may download the songs as MP3s! They should have a podcast or two of songs from the collection.

There is no northwest content that I could find at the site but there are a lot of songs on historical topics. A keyword search for "war" brings up fascinating WW1 nuggets such as the instrumental Battle of the Marne, the rousing Are we downhearted - no! and the touching My Bugler boy. And don't miss the Scottish Wit and Humor category--which reminds me of Robert Darnton's oft-quoted point that "When you realize that you are not getting something—a joke, a proverb, a ceremony—that is particularly meaningful to the natives, you can see where to grasp a foreign system of meaning in order to unravel it."

[Via Metafilter. For additional wax cylinder recordings online see The Cylinder Preservation and Digitization Project at UCSB.]


Are We Losing Our History in the Digital Age?

Time for a scare-report! British Library warns of 'black hole' in history if websites and digital files are not preserved: "Historians face a ‘black hole’ of lost information if we do not preserve websites and other digital records, the head of the British Library warned today. Chief executive Lynne Brindley said our cultural heritage is at risk as the internet evolves and technologies become obsolete."

Well, maybe. The article underestimates the efforts already underway to preserve at least some digital records. There is the Internet Archive (Wikipedia article) which maintains a huge cache of expired webpages. (The Wayback Machine is invaluable for recovering information when you hit an expired link.)
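If you would rather check for an archived copy from a script than by hand, here is a small sketch in Python. It assumes the Internet Archive's public Wayback availability endpoint at archive.org/wayback/available and the JSON shape it returns as I understand it. The sample response below is an illustrative stand-in, not a real lookup, and the function names are my own.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for the Wayback Machine availability lookup.
API = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL for a page, optionally near a YYYYMMDD timestamp."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return API + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the archived URL from a JSON response, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


# Illustrative sample of the assumed response shape (not fetched from the network).
SAMPLE = (
    '{"archived_snapshots": {"closest": {"available": true, '
    '"url": "http://web.archive.org/web/20090101000000/http://example.com/", '
    '"timestamp": "20090101000000", "status": "200"}}}'
)

query = availability_query("example.com", "20090101")
snapshot = closest_snapshot(SAMPLE)
```

The sketch deliberately stops short of making the network request; you can pass the query URL to whatever HTTP client you prefer and hand the response body to closest_snapshot.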

And of course there is the magnificent Washington State Digital Archives, my employer. We preserve the websites of former Washington governors Mike Lowry and Gary Locke among others.

The other problem with the "black hole" argument is that it compares the spotty preservation of digital records to an imaginary paper past where every record was lovingly archived and preserved in climate controlled isolation. But every historian learns soon enough that huge chunks of our historical record are missing. Twain's articles in the Territorial Enterprise are gone, burned up with the rest of the archives in an 1875 fire. A 1973 fire in Saint Louis destroyed 16-18 million military personnel files dating back to 1912. The Library at Alexandria was burned.

And yet we have histories of all these times and people. I would dearly love to be able to read all of Twain's articles as a fledgling journalist--but the handful that survive, Twain's own accounts of his Nevada years, and other primary sources from the period give us a pretty clear idea of what was happening in Virginia City and to Twain in those years. Future historians will find records enough for writing their histories of the early 2000s.

[Burning paper image from Flickr user The Shifted Librarian and used via a Creative Commons license. I added the wise-ass text using Picasa 3. This story is also being discussed over at Metafilter.]