Thursday, 31 March 2016

Preserving Digital Research Notes

Because mine is a historical research project, a large share of the records I produce are research notes taken while studying multiple sources. And since my workflow is almost entirely digital, most of those records are born-digital. When I’m conducting research, I create a Word document and copy out passages from my sources word for word. I also keep a folder of the digital journal articles I’ve cited, all as PDFs, since that is the file type in which they are published online. I typically brainstorm ideas and a thesis with pen and paper, but I then create another Word document for these notes so that I can access them while writing the paper on my computer. Due to the nature of my topic, I have limited access to the primary source documents I would want to use. Instead, I can only look at secondary sources that discuss those primary sources. Another type of record I could potentially gather would be scanned images of Lollard Bibles and Lollard-created records, if they have been scanned and the holding institution allows public access to the images. These would most likely be available as JPEGs, and I would keep a folder of them on my computer to refer to.

For this week’s blog post, I felt it was best to go back to my notes for the course INF 2122, “Digital Preservation and Curation,” in order to approach preserving DOC, PDF, and JPEG files. The first step in assessing how best to preserve digital records is to determine the significant properties of the object: the content (the information conveyed, e.g. text, image, programming code), the context (background information on its creation, e.g. creator, custodian), the structure (the arrangement of the component parts, e.g. pagination), the behaviour (essential functionality, e.g. hypertext links, updating calculations), and the appearance (how the content looks, e.g. font and size, page layout, colour). Together these make up the “essence” of a digital record, which is what preservation aims to keep intact.

The National Archives of the UK publishes a very comprehensive guide to preserving digital records, accessible to the public. It has also developed several tools for digital preservation, including PRONOM and DROID. PRONOM is an online registry of information about data file formats and their supporting software, with details on over 1,000 different digital file formats. DROID is a tool that scans a computer’s hard drive and identifies each file’s format, either through its file extension or its internal signature, by matching it against entries in PRONOM. Some file formats have greater longevity than others because of file format obsolescence. For example, it’s extremely difficult to open a WordPerfect file these days unless you migrate the file to another format, in which case you could lose essential behaviour, appearance, or content, or you emulate an environment in which the original software can still run and render the file. In general, from what I have gathered from the Digital Preservation and Curation class, it is best practice to convert DOC files to PDF and JPEG files to TIFF, as these formats have greater longevity and are not as prone to format obsolescence or bit rot. I would also keep multiple copies of my files in differing formats, retaining the original DOC and JPEG, in multiple places, including a cloud service like Dropbox and an external hard drive.
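To make the idea of an "internal signature" more concrete, here is a minimal Python sketch of my own. It is not how DROID is implemented and it does not consult PRONOM; it only checks the first few bytes of a file against a handful of widely documented signatures for the formats discussed above. The second half sketches one way to confirm that a backup copy is still bit-for-bit identical to the original by comparing SHA-256 checksums. The function names (identify_by_signature, sha256_of, copies_match) are hypothetical names chosen for illustration.

```python
import hashlib

# Leading byte signatures ("magic numbers") for the formats in this post.
# This short list is an illustrative assumption, not PRONOM's registry.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (e.g. legacy Word DOC)"),
]


def identify_by_signature(path):
    """Return a format label based on the file's first bytes, or None."""
    with open(path, "rb") as handle:
        head = handle.read(16)
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return None


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original, copy):
    """True if two files have identical contents (same checksum)."""
    return sha256_of(original) == sha256_of(copy)
```

Checking the signature rather than the extension matters because an extension can be renamed or lost, while the leading bytes stay with the file. Recording a checksum when a copy is made, and recomputing it later, is one simple way to notice if a copy on the external drive or in the cloud has silently changed.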


Sources:

Wilson, Andrew. “Significant Properties of Digital Objects.” JISC Significant Properties Workshop, British Library, London, UK, 7 April 2008.

“Selecting File Formats for Long-Term Preservation.” The National Archives, August 2008, http://www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf
