On Tuesday, I arrived in rainy Chicago and headed straight for the Hotel Palomar for the AIMS Project (“Born-Digital Collections: An Inter-Institutional Model for Stewardship”) workshop regarding born-digital archival material in collecting repositories. The free workshop, called “CREW: Collecting Repositories and E-Records Workshop,” included archivists and technologists from around the world to discuss issues related to collection development, accessioning, appraisal, arrangement and description, and discovery and access of born-digital archival materials.
The workshop program started with Glynn Edwards of Stanford and Gretchen Gueguen of UVa, who discussed collection development of born-digital records. The speakers suggested that both collection development policies and donor agreements should have clear language about born-digital material, including asking donors to contribute metadata to electronic records from his/her collection. The challenge, they note, is in collaboratively developing sound guidelines and policies to help archivists/curators make decisions about what to acquire. A group discussion about talking to donors about their personal digital lives and creating a “digital will,” both of which help provide important information about an individual’s work, communication, and history of using technologies.
Kevin Glick and Mark Matienzo from Yale and Seth Shaw from Duke discussed accessioning, the process through which a repository gains control over records and gathers information that informs other functions in the archival workflow. While many of the procedures for accessioning born-digital material is the same for analog material, the speakers distinguished accessioning the records from accessioning the media themselves (ie the Word document versus the floppy disk on which it is saved). Mark described his process of “re-accessioning” material through a forensic (or bit-level) disk imaging process, whereby he write-protected accessioned files to protect data from manipulation. He used FTK imager to create a media log with unique identifiers and physical/logical characteristics of the media, followed by BagIt to create packages with high level info about accessions. Seth discussed Duke’s DataAccessioner program, which he created as an easy way for archivists to migrate and identify data from disks. A group discussion asked: what level of control is necessary for collections containing electronic records at your institution? and, what are the most common barriers to accessioning electronic records, and how would they show up? Our table agreed that barriers include staffing (skills and time); being able to read media; software AND hardware; storage limits; and greater need for students/interns.
Simon Wilson from Hull, Peter Chan from Stanford, and Gabriela Redwine from the Harry Ransom Center at UT Austin discussed arrangement and description. They questioned whether archivists can appraise digital material without knowing content therein, which conflicts with the high-level, minimal processing emphasized in our field in the past few years. Another major issue is with volume: space is cheap, but does that mean archivists shouldn’t appraise? It isn’t practical to describe every item, but how will archivists know what is sensitive or restricted? Hypatia provides an easy-to-use interface that allows drag-and-drop for easy intellectual organization of e-records, as well as the ability to add rights and permissions information. Peter Chan described a complex method for using a combination of AccessData FTK in combination with TransitSolution and Oxygen to compare checksums, find duplicate records, and do a “pattern search” for sensitive terms and numbers (such as social security numbers). Gabi Redwine explored her work with a hybrid collection (analog and digital records) where she learned that descriptive standards should be a learning process for staff, not students or volunteers. Her finding aids for the collection included hyperlinks to electronic content and she advocated for disk imaging. The group discussion following this session was intense! The hotbed topic was: are professional skills of appraisal, arrangement, description still relevant for born digital materials? Our group agreed that appraisal and description remain important; however, we were strongly divided about whether archivists will need to contribute to arrangement of e-records. I believe that arrangement becomes less important as things become more searchable, as argued in David Weinberger’s Everything is Miscellaneous. Arrangement emerged before the digital realm as a way for archivists and librarians to contextualize and organize material based on topics/subjects; however, with better description, users can create their own ways of organizing e-records!
Finally, Gretchen Gueguen (UVa) and Erin O’Meara of UNC Chapel Hill discussed discovery and access. Our goals as archivists include to preserve original format and order as much as possible, and apply restrictions as necessary, while balancing this with our mission to make things accessible and available. Gretchen suggested the idea of Google Books’ “snippet” idea as a way to provide access without compromising privacy or restrictions on sensitive material. Her models for access for digital material include: in-person versus not; authenticated versus not; physical versus online access; and dynamic versus static. Erin described her use of Curator’s Workbenchwithin FOXML and Solr to control access permissions and assign restrictions and roles to e-records. Another group discussion included chewy scenarios for dealing with born-digital materials; my table had to consider: “you are at a large public academic research library; director brings several CDROMs, Zip disks and floppy disks of famous (secretive) professor from campus; they are backup files created over the years; office has more paper files; professor and his laptop are missing; no one can give further details on files; write 1 page plan for preserving/describing files; working institutional repository exists.” With no donor agreement and an understanding that the faculty member was very private, we couldn’t go ahead with full access of the material.
At the end of the day, I left with a much better grasp of how I see myself as an archivist dealing with born-digital material (primarily those on optical and disk media). It seems that item-level description works best for born-digital while aggregate description works best for analog materials. Digital records are dealt with best through collaboratively-created policies and procedures for acquiring, processing, and describing them. Great stuff!