Thursday, 31 March 2016

Formatted Memory: What I Learned About Digital Preservation

       Manojlovich (2011) mentioned two types of content that could be digitally preserved: digitized analogue content and born-digital content such as web sites and databases. The preservation of the latter will be more difficult because it has no physical counterpart.

Kirchhoff (2008), in her article Digital Preservation: Challenges and Implementation, discussed near-term access protection and long-term content protection, whose steps can be placed along a continuum as shown in the diagram below. She also mentioned that the key goals of digital preservation (DP) are to maintain “usability, authenticity, discoverability, and accessibility” of digital resources.




Both “backup” and “access system redundancy” are regarded as near-term access protection. The unreliability of backup over a long period of time is easy to understand. Access system redundancy means that copies of the entire system exist from the very beginning. However, it is only an improved version of backup, as neither can accommodate developments in technology or data formats.

Byte replication sounds better, but it does not solve the problem of file format incompatibility either. However, it allows the storage of the digital content in diverse geographical locations and no specialized software is needed to access the content. In that sense, it achieves more goals than the former methods.
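To make the idea concrete, here is a minimal sketch of byte replication, assuming ordinary folders stand in for geographically separate storage sites. The function name `replicate` and the folder names in the usage note are my own illustration, not something taken from Kirchhoff.

```python
import filecmp
import shutil
from pathlib import Path


def replicate(source: Path, replica_dirs: list[Path]) -> list[Path]:
    """Copy `source` into each replica directory and confirm a byte-for-byte match.

    Hypothetical helper: each directory is assumed to represent a separate site.
    """
    copies = []
    for directory in replica_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copies the contents plus basic metadata
        # shallow=False forces a comparison of the actual file contents
        if not filecmp.cmp(source, target, shallow=False):
            raise IOError(f"Replica at {target} does not match the source")
        copies.append(target)
    return copies
```

Calling something like `replicate(Path("thesis.pdf"), [Path("site_a"), Path("site_b")])` would leave an identical copy in each folder. Note that the copies are exact bytes only: nothing here helps if the format itself can no longer be opened, which is the limitation described above.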

Digital Preservation is the most reliable measure to ensure the longevity of digital materials. However, it needs organizational efforts, as well as government policy support.

There are also technological methods to address the challenges of DP, according to Kirchhoff (2008). These include migration (producing different formats of digital content for future access) and emulation (developing software that mimics earlier hardware and software).

Right now the biggest threat to digital content is the development of technology itself. Future technology is expected to be so different from today's that it may no longer be able to read the formats we use now. This incompatibility already happened once with floppy disks, and it could very well happen again. Therefore, the best practice might be to store content in as many formats as possible. For example, Memorial University stored data in a variety of formats going back to 1977. They include: “Access Databases, Paper Files (14 filing cabinets), Excel Spreadsheets, Progeny files, Cyrillic files, Slides of various testing images, JPEGs of various testing images, Powerpoint presentations, and mostly importantly, Researcher’s memory” (Manojlovich, 2011).

As for my research materials, it seems the best I can do for them now is to make backups in both digital and physical formats.



References

Kirchhoff, A. J. (2008). Digital Preservation: Challenges and Implementation. Learned Publishing, 21, 285-294. doi: 10.1087/095315108X356716

Manojlovich, S. (2011). Digital Preservation Best Practices [Power Point Slides]. Retrieved from http://www.slideshare.net/bpauwels/digital-presentation-best-practices-lessons-learned-from-across-the-pond

Documentable research

As I was considering the 'documentability' and preservation of research, I thought back to James' post from a few weeks ago about the difficulty of Internet archiving. How can this be accomplished when the groups you're studying form and dissolve easily, shutting down their resources simply by not paying for their domain?

From the beginning of the term, my research interest has changed quite a bit. Instead of the communication between traditional scholars, I began thinking about how amateur scholars communicate. In particular, amateur scientists (or citizen scientists) captured my attention.

Source: "The Return of Amateur Science" by Mark Frauenfelder


Before amateur became a sort of dirty word, it was considered a noble pursuit. Indeed, amateur science continues to contribute in important ways to STEM fields. Just as I was doing research on this topic, this article popped up on my Twitter feed:

Space rock smashing into Jupiter captured by amateur astronomers

A bright spot appears in the second-last frame of the timelapse posted by Irish amateur astronomer John Mckeon.
Source: CBC News

The missing piece of information-seeking behavior in science is how amateurs find the information that they need to conduct experiments or develop theories. One of the difficulties lies in the precarious nature of citizen science groups, associations, and organizations. Although important research comes out of citizen science, the lack of formal organizations makes it difficult to track information-seeking behavior. One concern that I have in this research is figuring out how to reach people who participate in these activities and make sure to have a varied enough sample group.

What exactly is 'un-documentable' about information-seeking behavior in citizen science? It may be the fact that the research process is difficult to define and is often not recorded. The one article I found about the information-seeking of amateur scientists, like the many about professional scientists, depended on surveys. By using a method such as interviews, it may become easier to capture some of the process of amateur science.


Preserving Digital Research Notes

Because mine is a historical research project, a large share of the records I produce are research notes that I have taken after studying multiple sources. And since my workflow is almost entirely digital, a large portion of my records are born-digital. When I’m conducting research, I create a Word document and copy out passages from my sources verbatim. I also have a folder of the digital journal articles I’ve cited, all in PDF, since that is the file type uploaded online. I typically brainstorm ideas and a thesis with paper and pen, but I will create another Word document for these notes so that I can access them while writing the paper on my computer. Due to the nature of my topic, I’m limited in terms of access to the primary source documents that I would want to use. Instead, I can only look at secondary sources that discuss the primary sources I would want. Another type of record I could potentially gather would be scanned images of Lollard Bibles and Lollard-created records, IF they have been scanned and the institution allows public access to the images. These would most likely be uploaded as JPEGs, and I would have a folder of them on my computer to refer to.

For this week’s blog post, I felt it was best to go back to my notes for the course INF 2122, “Digital Preservation and Curation”, in order to approach preserving DOC, PDF, and JPEG files. The first thing to do when assessing how best to preserve digital records is to determine the significant properties of the object, which can be identified as the content (the information conveyed, e.g. text, image, programming code), the context (background information on its creation, e.g. creator, custodian), the structure (the arrangement of the component parts, e.g. pagination), the behaviour (essential functionality, e.g. hypertext links, updating calculations), and the appearance (how the content appears, e.g. font and size, page layout, colour). These all relate to the “essence” of a digital record, which is the main component being preserved.

The National Archives of the UK has a very comprehensive, publicly accessible guide for preserving digital records. It has also developed multiple tools for digital preservation, including PRONOM and DROID. PRONOM is an online registry of information about data file formats and their supporting software, with details on over 1,000 different file formats. DROID is a tool that scans a computer’s hard drive and identifies each file, either through its file extension or its internal signature, by matching it against entries in PRONOM. Some file formats have greater longevity than others because of file format obsolescence. For example, it’s extremely difficult to open a WordPerfect file these days unless you migrate the file to another format, in which case you could lose essential behaviour, appearance, or content, or you emulate an environment in which the file can be rendered in its original form. In general, from what I have gathered in the Digital Preservation and Curation class, it is best practice to convert .DOC files into .PDF and JPEG files into .TIFF, as these formats have greater longevity and are less prone to format obsolescence or bit rot. I would also keep multiple copies of my files in differing formats, including the original DOC and JPEG, in multiple places such as a cloud service like Dropbox and an external hard drive.
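To show what identifying a file by its internal signature rather than its extension can look like, here is a rough sketch. It is not how DROID itself is implemented and it does not consult PRONOM; the tiny signature table and the function name `identify_by_signature` are my own assumptions, covering only the formats I mentioned above.

```python
from pathlib import Path

# A few well-known leading byte sequences ("magic numbers").
# Illustrative only, and far smaller than a real format registry.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (e.g. legacy .doc)"),
]


def identify_by_signature(path: Path) -> str:
    """Hypothetical helper: guess a format from the first bytes of a file."""
    with path.open("rb") as handle:
        header = handle.read(16)  # long enough for every signature above
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"
```

The point of checking the bytes is that an extension can be wrong or missing: a PDF that was accidentally renamed to `notes.doc` would still be reported as PDF by this check.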


Sources:
Andrew Wilson (2008). “Significant Properties of Digital Objects.” JISC Significant Properties Workshop, British Library, London, UK. April 7, 2008.



“Selecting File Formats for Long-Term Preservation.” The National Archives, August 2008, http://www.nationalarchives.gov.uk/documents/selecting-file-formats.pdf 

Preserving Digital Content



Parts of this week's question, about documenting what I'm studying, had actually already occurred to me when we were talking about fieldwork in week 5. To document archives' use of certain types of social media, web archiving is the only real answer, and I already covered many of the problems with that in week 5.

In looking to preserve digital content into the future, there are still a few other concerns which I did not talk about all that much in that earlier post. File format plays a big part in preserving digital content. Concerns like disclosure (how much documentation is available for the format), how widely it has been adopted, and whether it has other dependencies, such as requiring proprietary software, all play roles in determining a format's longevity. This is not an exhaustive list. Making sure that the information a researcher wishes to preserve is in a format which will still be accessible years into the future is an extraordinarily important concern. No format is completely free from the risk of obsolescence, but PDF, as an example, does not appear likely to become hard to access in the future (Pearson and Webb, 2008). Planning for future migration or emulation of all content which a researcher wishes to keep available will be absolutely vital to properly preserving it, as will routinely monitoring the environment for files of that type and looking into migration options. As with all preservation, proper risk assessments will help make sure that nothing is being forgotten, at least as far as this is actually possible.

To actually answer how I personally would keep track of the materials I am talking about, I would look at current developments in web archiving and do everything else I practically could to save the content. Mostly, this would mean keeping multiple backup copies of everything I produce, and ideally checking them regularly for bit loss or other forms of corruption. I'd do everything I could to use web archiving software to gather as many of the postings related to the topic as possible, store the resulting records as safely as I could in as open and widely adopted a format as I can find, and plan to migrate them to another format before the one I started with becomes obsolete. I do hope that the technology involved improves in the future and solves these problems for me, but ultimately I would have to accept that, as I talked about in week 5, there really is only so much that can be done to properly preserve online content at this time. No matter how much effort I devote to this during my research, a great deal of what I am researching will one day simply disappear without any real trace.
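Checking backups for bit loss can be as simple as recording a checksum for every file once and comparing against it later. The following is a minimal sketch of that idea, assuming SHA-256 as the checksum and a single backup folder; the helper names `build_manifest` and `find_damaged` are hypothetical and not taken from any of the sources below.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> dict[str, str]:
    """Record a checksum for every file under `folder`, keyed by relative path."""
    return {
        str(item.relative_to(folder)): sha256_of(item)
        for item in sorted(folder.rglob("*"))
        if item.is_file()
    }


def find_damaged(folder: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose contents no longer match the manifest."""
    damaged = []
    for name, expected in manifest.items():
        target = folder / name
        if not target.is_file() or sha256_of(target) != expected:
            damaged.append(name)
    return damaged
```

In practice I would save the manifest alongside the backup when it is first made, then rerun `find_damaged` every so often and restore any flagged file from one of the other copies.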


Sources:
Brown, A. (2013). Practical Digital Preservation: a how-to guide for organizations of any size. London: Facet Publishing.
Giaretta, D. (2011). Advanced digital preservation. New York: Springer.
Kasioumis, N., Banos, V., & Kalb, H. (2013). Towards building a blog preservation platform. World Wide Web, 17(4), 799–825. http://doi.org/10.1007/s11280-013-0234-4
Pearson, D., & Webb, C. (2008). Defining File Format Obsolescence: A Risky Journey. International Journal of Digital Curation, 3(1), 89–106. http://doi.org/10.2218/ijdc.v3i1.44
Webber, J. (n.d.). What is still on the web after 10 years of archiving? - UK Web Archive blog. Retrieved March 3, 2016, from http://britishlibrary.typepad.co.uk/webarchive/2014/10/what-is-still-on-the-web-after-10-years-of-archiving-.html

Preserving My Research Materials... Now I'm Nervous!

I always like how the blogging questions have me thinking about things I would never have thought of… Although, as a “researcher”, I am almost ashamed to say that I never thought about how to ensure that my research materials are preserved! Anyway, let’s use this blog post to make up for lost time and think about how I can make sure that my research materials survive time (and technology).

Well, first of all, most of my research materials are not born-digital. I am working on 18th century manuscripts, which I hope are well preserved at the University of Ottawa’s Archives and Special Collections. Secondly, I like to print all articles relevant to my subject or photocopy parts of relevant books I find (SORRY environment!! I print double-sided though…) and write notes directly on the printed pages. I know this is probably not the most efficient way of working, and certainly not the most environmentally friendly, but I have tried many different methods over the years and I still keep coming back to this one. So right now, unless there is a fire in my apartment or my files are lost in a move, anyone should be able to access and read my research materials for years to come.

However, everything I write about my research (excluding notes), such as scholarly articles, conference presentations, grant proposals and even a thesis or thesis proposal, is written on a computer, and I have no idea how I could make sure these survive time and technological changes. I know about the various technologies available for digital data preservation, but I am taking the “Managing Organizational Records I” course this semester (the required introductory records management course for the ARM concentration), and after a lot of discussion about the subject in class, we still (including the prof!) do not have a clear answer to that question.

I am really hoping my fellow bloggers’ posts or comments on my post will help me with that, since this blogging question has really startled me!!!


Wednesday, 30 March 2016

Preserving Oral Histories: A Glance at Best Practices

Ideally, my research would be carried out within a cultural institution such as the AGO or the National Gallery of Canada. Since my purpose is to aid in the development of Knowledge Management protocols for capturing and preserving the tacit knowledge of art librarians and archivists using the methods of oral history, I would assume that whatever documents and interview recordings I create will be housed and maintained by that institution. This subject is slightly more complex since KM practices for knowledge retention, talent acquisition and succession planning typically fall under the responsibility of Human Resources. So establishing a partnership with HR for producing, using, and maintaining the results of my research would be beneficial.

With regards to the types of records my project will generate, these would namely be in the form of a variety of documents and recordings: my prior research, observation notes, interview transcripts, interview voice/video recordings, data analysis documents and final protocols/procedures developed from the research. This is a lot of information to organize and preserve. Of course, the Library and Archives might only want the interview-related materials while HR might wish to keep only information pertinent to policy development. Regardless, the data I collect would be stored both digitally and physically. Data can begin to be coded and categorized from the point of the literature search even prior to conducting any interviews (Knight, 2002, p. 189). Using software such as f4/f5transkript or NVivo would also be valuable for the analysis and storage aspects of the project.

In North America, there is not one particular prescribed Best Practice for conducting, representing and preserving oral histories online, since there are currently many varying approaches to these practices within the field. For the interview process, the Oral History Association (OHA) outlines post-interview Best Practices for archivists. One stipulation is that any information deemed relevant for use by future users including photographs, documents and other records, should be collected. The relationship of these supplementary materials to the existing interviews should be clearly stated and made available (“Principles for Oral History and Best Practices for Oral History”). In this case then, I would ensure that all of the materials gathered for my research would be available within the institution.

With regards to the digital media component of oral history recordings, the Oral History Association stresses the importance of storing, processing, refreshing and accessing the media according to the archival standards of the chosen format. Where possible, media formats should be cross-platform and non-proprietary (“Principles for Oral History”). Obsolescence must also be considered when storing and preserving the chosen media. Simply because a format is popular on the market does not mean it is immune to the challenges of obsolescence and modification. We know this all too well. In addition, institutions should continue to monitor these best practices for future migration and preservation procedures. By placing large amounts of data on hard drives or in a networked environment, users will be able to batch convert media formats. Alternatively, automated format migration is also becoming increasingly affordable when dealing with large quantities of data (Boyd, 2010). The Collaborative Digitization Program Digital Audio Working Group (CDP) highly recommends that organizations maintain multiple copies in separate locations as a fail-safe against the failure or destruction of the digital media. This may include hard drives, optical discs, magnetic data tapes or cloud computing. Proactive migration to new media as it becomes available is essential for sustainability purposes. It is suggested that this migration take place at least every five years (Digital Audio Best Practices: Version 2.1).

While it is great that there are endless resources on best practices for preserving, migrating, storing and hosting oral history projects, the act of preservation largely depends upon those responsible for it. A possible option might be to make these oral histories available online by creating a website to host the content, as Toronto-based projects such as the Harbord Village Oral History Project, General Eclectic, and [murmur] Toronto have done. This would involve an entirely separate preservation plan, but would be in keeping with the best practices referred to above. In my case, I might not be in a position to control the preservation of my materials within the institution, digital or otherwise. Eventually, my methods and the information I collected might also become obsolete as the knowledge ecology of the library changes. Even so, I do believe the material would be representative of a library’s history, regardless of its initial intent for knowledge capture and succession planning.


References:

Boyd, Douglas. “Achieving The Promise Of Oral History In A Digital Age.” In The Oxford Handbook Of Oral History, n.p. Donald A. Ritchie. Oxford: Oxford University Press, 2010. http://www.oxfordhandbooks.com.myaccess.library.utoronto.ca/view/10.1093/oxfordhb/9780195339550.001.0001/oxfordhb-9780195339550-e-21.

Collaborative Digitization Program Digital Audio Working Group. Digital Audio Best Practices: Version 2.1. Minneapolis: Collaborative Digitization Program, 2006. http://sustainableheritagenetwork.org/content/digital-audio-best-practices-version-21.

Knight, P.T. (2002). Small Scale Research: Pragmatic Inquiry in Social Science and the Caring Professions. Thousand Oaks, CA: Sage.


Oral History Association. “Principles for Oral History and Best Practices for Oral History,” 2009. http://www.oralhistory.org/about/principles-and-practices/#best.