Beth and I attended SAA in Chicago over Labor Day weekend. Very interesting conference, as usual. And the Fairmont Hotel was just a couple of blocks from the Art Institute, Lake Michigan, and Millennium Park. An enjoyable setting, for this displaced Midwesterner!
Lots of good stuff on the unofficial conference wiki site.
Open Source Software Solutions for Collection Management and Web Delivery
Susan Hamburger from Penn State was the first presenter. She described the decision-making process by which the Penn State library chose a management and delivery system for their EAD finding aids. The library wanted a system that would automate the generation of EAD documents in their homegrown Oracle database and would provide a federated search tool for finding aids. Developing a prioritized list of criteria was an important step in evaluating several potential systems (Archon, Archivists’ Toolkit, CONTENTdm v. 4.2, DLXS v. 12, XTF). After considering all the data (cf. http://www.personal.psu.edu/sxh36/appendixa.htm), they chose CONTENTdm, which has performed well, with some tweaking.
The second speaker, from Mount St. Mary’s University, described their Archives’ DSpace project, in which self-deposit and self-cataloging by authorized faculty help to preserve and make accessible the University’s scholarly output. This system has several advantages: users archive their own material, cataloging is done by subject experts, and material is searchable and downloadable by users (when permitted). There are also some challenges: extensive tech support is necessary in the beginning, there is a long learning curve for many faculty, and copyright issues can be a problem.
The final presenter was from the Hoover Institution library and archives at Stanford. A few years ago they received the archives of the Firing Line television program and wanted to make the video available on the Internet. Their IT staff, however, was unwilling/unable to help with this project. So they sought support from graduate student interns from the Computer Science department (fairly plentiful at Stanford). The student interns, who ended up being exclusively Thai, developed a MySQL database with a web-based form for data entry. This homegrown system has worked well for the most part, but there are some drawbacks to relying exclusively on student workers (e.g., in 2003 the server was down for the entire summer, because all of the students who knew how to fix it were home in Thailand!). The Hoover Institution is currently seeking funding for a permanent IT position.
How Controlled is Your Vocabulary: Experience from the Digital Field
Very interesting session. The first participant was an archivist from Purdue, who talked about their Amelia Earhart digital collection and the use of controlled vocabulary in CONTENTdm. The Earhart collection is indexed with terms from the Thesaurus for Graphic Materials (TGM) and LCNAF, with supplemental LCSH where more precise terms are needed. Problems arose when users questioned the LC subject term used for Earhart’s plane (“Electra (turboprop transports)”), which was apparently not accurate, but was the closest thing the indexers could find in the LC authority file. In the end, they created a more accurate but not LC-approved heading (“Lockheed Electra”). [An audience member asked why they didn’t submit this new heading to LC, which I thought was a valid question!]
Controlled vocabularies for many digital projects end up as a combination of local headings, LCSH, and various thesauri. This provides more precise access points, which is especially important for visual materials since they cannot be transcribed or OCR’d. But use of local headings can lead to issues, such as the need for cross-references in library catalogs (a weakness of CONTENTdm is that it doesn’t allow cross-references in public view). The Purdue speaker urged libraries to document local headings and suggested that we need to include the derivation of each index term in the metadata, so that users will know where it came from, and that we provide users with access to the thesauri.
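To make the idea of recording each term's derivation a bit more concrete, here is a minimal Python sketch. The IndexTerm class, the source labels, and the record layout are my own assumptions for illustration, not the actual schema used by Purdue or by the software; the two headings are the ones mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexTerm:
    label: str   # the heading as shown to users
    source: str  # vocabulary the heading came from, e.g. "lcsh", "tgm", or "local"


def terms_by_source(terms):
    """Group headings by the vocabulary they were drawn from."""
    grouped = defaultdict(list)
    for term in terms:
        grouped[term.source].append(term.label)
    return dict(grouped)


# Hypothetical record for one photograph of Earhart's plane.
record_terms = [
    IndexTerm("Electra (turboprop transports)", "lcsh"),
    IndexTerm("Lockheed Electra", "local"),
]

grouped = terms_by_source(record_terms)
# Local headings are the ones that need to be documented for users.
print(grouped.get("local", []))
```

Keeping the source next to each term means a list of undocumented local headings can be pulled out at any time, and a public display could show users which thesaurus to consult.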
Next up was Sheila McAllister from the Digital Library of Georgia, who discussed name authority control in the GALILEO system. She recommended NACO training for metadata creators [good idea], but admitted that this wouldn’t be feasible for many projects. Sheila also emphasized the need for archival name authority records to include context information (see discussion of EAC below). DLG developed NAME, a web-based form for name authorities (developed in parallel with EAC).
DLG is starting to ingest NAME records as part of its thesaurus. Sheila gave us a preview of their current project, the Civil Rights Digital Library, which is making extensive use of NAME records. For example, the NAME thesaurus allows metadata creators to choose an authoritative form of a name for display in the “people browse” list, while still indexing all forms of the name. And place names in the Civil Rights DL are being associated with geographic coordinates in a Google maps interface.
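The idea of displaying one authoritative form while indexing every form of a name can be sketched in a few lines of Python. This is only an illustration under my own assumptions: NameRecord, build_index, lookup, and the sample name are hypothetical and are not taken from DLG's NAME form or data.

```python
from dataclasses import dataclass, field


@dataclass
class NameRecord:
    display_form: str                              # authoritative form shown in a browse list
    variants: list = field(default_factory=list)   # other forms that should still be searchable


def build_index(records):
    """Map every known form of a name (normalized) to its authority record."""
    index = {}
    for rec in records:
        for form in [rec.display_form, *rec.variants]:
            index[form.strip().lower()] = rec
    return index


def lookup(index, name):
    """Return the authoritative display form for any indexed form, or None."""
    rec = index.get(name.strip().lower())
    return rec.display_form if rec else None


# Made-up sample record, for illustration only.
records = [NameRecord("Doe, Jane, 1900-1980", ["Jane Doe", "Doe, J."])]
index = build_index(records)
print(lookup(index, "jane doe"))
```

A search on any variant lands on the same record, while the browse list only ever shows the one form chosen by the metadata creator.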
In the future, DLG plans partnerships with other institutions to add more name authority records, and they will start ingesting biographical/historical notes from EAD finding aids (as NC is already doing in the NCBHIO project; see below).
Finally Seth Shaw from Duke University Archives gave a presentation on folksonomies, which are organic taxonomies in which the lexicon results from descriptive activities of a user community. Seth used Flickr and LibraryThing as examples of tagging, which is the most common (but not the only) form of folksonomy.
The important thing to remember about folksonomies is that the tags are intended for personal retrieval or retrieval by a particular sub-community. Thus many terms are unique to the specific individual or community and may result in ambiguity for a larger user community.
Theoretically, as specific terms become popular in a given folksonomy, the more-used terms become standard and result in eventual consolidation into a de facto taxonomy. However, actual statistics don’t bear this out—variant and redundant terms survive and thrive in most folksonomies. Some outside intervention is necessary if a folksonomy is to evolve into something resembling a thesaurus. Seth suggested three ways that this could occur:
- Parallel descriptions, in which controlled subject headings are provided by catalogers and uncontrolled tagging is done by users
- Merging of codified and colloquial terms, in which some authority terms are recommended for use by taggers (for example, a suggested term might be provided as a user enters text)
- Recommended social terminology, in which tags which reach a threshold of use are flagged for inclusion in an official thesaurus
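The second and third approaches in the list above lend themselves to a small Python sketch. Everything here is an assumption made for illustration: the function names, the threshold of 3, and the sample terms and tags are mine, not something Seth presented.

```python
from collections import Counter


def suggest_terms(prefix, authority_terms, limit=5):
    """Return authority terms that start with what the tagger has typed so far."""
    typed = prefix.strip().lower()
    if not typed:
        return []
    matches = sorted(t for t in authority_terms if t.lower().startswith(typed))
    return matches[:limit]


def flag_candidates(tag_events, authority_terms, threshold=3):
    """Return user tags used at least `threshold` times that are not yet authority terms."""
    counts = Counter(tag.strip().lower() for tag in tag_events)
    known = {t.lower() for t in authority_terms}
    return sorted(tag for tag, n in counts.items() if n >= threshold and tag not in known)


# Made-up authority list and tagging activity.
authority_terms = ["Aeronautics", "Air pilots", "Airplanes"]
tag_events = [
    "lockheed electra", "Lockheed Electra", "plane",
    "lockheed electra", "airplanes", "airplanes", "airplanes",
]

print(suggest_terms("air", authority_terms))
print(flag_candidates(tag_events, authority_terms))
```

Tags are compared case-insensitively, so spelling variants of the same tag count together; a real system would need a human to review flagged tags before they entered the thesaurus.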
When using folksonomies for archival digital collections, one needs to consider
- What is being described, and at what level?
- Who is allowed to participate in tagging? Is some subject expertise required?
- Can we achieve critical mass—i.e. are enough users contributing tags for the resulting taxonomy to be meaningful?
An audience member made a good point about this last consideration, observing that users of Flickr, LibraryThing, YouTube, etc. have a vested interest in subject tagging because the material is their own. Users might be less likely to spend time tagging other digital archives. Seth agreed that this was an issue, and suggested that we need a tool with which researchers could describe cross-institutional holdings; those descriptions could then be harvested for their own and others’ use (e.g. OAIster). If folksonomy activity could spread beyond individual institutions, it might attract a larger user base.
Archivists’ Toolkit Demo
Archivists’ Toolkit is an open source collections management system and relational database designed specifically for archives. It was developed by a coalition of universities who were unhappy with the inability of commercial ILSs to handle archival acquisition and processing procedures. The demo was interesting. Rather amazing to see a system that doesn’t have to be tweaked (or pummeled) to serve basic archival needs. The drawback, as with all open source stuff, is that it requires a lot of local tech support. Apparently there’s a very active user group. No public access component at this point, but they’re encouraging the user group to apply for grant funding to develop one.
Archivists’ Toolkit was getting loads of publicity at the conference. I don’t know if it’s officially endorsed by SAA, but they were certainly trying to generate buzz.
Description Section business meeting
The usual committee reports, mercifully brief, with the full reports available on the website.
Yet another plug for Archivists’ Toolkit!
Finally, the interesting bit: a panel presentation on Contextual Information Innovations in Archival Description, which highlighted the new beta standard EAC (Encoded Archival Context; see http://www.iath.virginia.edu/eac/). EAC is basically a way of standardizing the contextual information which is vital for making sense of smaller units (folders, items) in archival or manuscript collections.
In this session, Kathy Wisser gave a presentation on the development and structure of EAC, and Peter Hymas (State Library of NC) debuted the NCBHIO project, which is the beginning of a union repository of EAC records for NC archival collections. NC is one of the first states to undertake this type of initiative, and our Digital Forsyth biographical database should fit nicely into the project.
Rethinking Access and Descriptive Practice
This wasn’t the strongest session of the conference. In fact, I thought the most interesting thing about it was the speaker/audience demographics. All of the panelists were well under 40 (the final paper was presented by the author’s coworker, as the author herself was at Burning Man). Over half of the audience was probably 50+. And the surprising thing was, the Gen X and Y panelists seemed far more alarmed by the effects of technology on archival practice than the audience was. Perhaps because the speakers’ presentations were based mostly on professional research literature, whereas the audience was drawing on years of practical experience of seeing technologies develop and be absorbed into the library/archival mainstream?
The presenters talked mostly about archivists’ responses to our patrons’ new information-seeking behavior. An important topic…but the audience of experienced archivists/librarians seemed underwhelmed by the speakers’ shocking revelation that college students use Google more than they use the library catalog. [Is this really a bad thing? Doesn’t Google serve a useful purpose that library catalogs don’t, and aren’t meant to?]
The presenters in this session also made much of the supposed estrangement between archivists and librarians. This, I have to say, is starting to sound like a tired cliche. Maybe it was true ten or twenty years ago, but now many institutions expect their archivists to have the MLIS degree, and many librarians are working with archival materials. I’m not sure it’s accurate to state, as one speaker did, that the library catalog is a “metaphor for archivists’ alienation from design and delivery tools.” It’s certainly true that most ILSs are terrible at providing access to archival collections, but my experience (and that of several audience commentators) is that archivists have been eager to make what use they can of them, to bring attention to their collections.
Not to sound too negative about the session—there were several good points made. Such as:
- Unique materials are becoming more important to libraries, since they’re what everyone wants to digitize; archivists, as curators of these unique materials, should use their new “clout” to influence the design of next generation ILS so that these systems will become better delivery tools for archival materials.
- To do this, archivists need to do a better job of educating themselves about design and delivery tools.
- We need to be aware of the changing ways in which people seek information: we need to make collections accessible to standard search engines, not hide them in the deep web; provide network-level cataloging and service so that users will have one-stop searching (e.g. WorldCat); and remember that users want discovery tools that are consistent and comprehensive.
- When creating digital collections, remember that:
  - Users want more than metadata; we need to provide EAC-type context information to make collections meaningful.
  - Accurate keyword searching is useful, but not a substitute for traditional subject analysis and controlled vocabulary.
  - Minimal processing at least makes collections accessible to users; leaving unprocessed collections inaccessible for a lengthy period of time is unacceptable.