A Guide to Digitization
Digitization typically takes the form of images. The advantage of a JPEG file is not only spatial and distributive (many more pages of text can be made available to many more people) but also functional: the text becomes searchable by means of OCR (Optical Character Recognition). But even OCR is bound by material constraints. Poor lighting makes for bad optics, and thus an error-filled transcription. Marlene Mancoff takes the materiality warning a few steps further, urging us not to overestimate our digitized documents’ content value (an e-book reads differently from a book) or their immateriality (an e-book has a magnetic field!). Paul Conway adds another proviso about digitized distortion. Photographers’ editorial privilege frees them to tamper with everything from color to cropping. Video can compensate for some of these distortions, for example by capturing the object from different angles. But ultimately, DH requires practitioners to couple digitized recordings with footnotes (metadata or otherwise) that account for what the isolated image, audio, or video cannot.
We should thus digitize in as many dimensions as possible. In the case of a book, this means the thickness of the binding in addition to the number of pages. One could take this mandate even further, by hitching said book to a luggage scale to gauge its weight or recording the sound it makes in a car window. But as Melissa Terra reminds us, storage space is finite. If there wasn’t enough funding to prevent DH data rot in 2006, it’s unlikely that there will be today.
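One modest way to record these extra dimensions without storing much more data is a small sidecar record kept next to each image. The sketch below is only an illustration under my own assumptions: the class name, the field names, the file name, and every value in it are hypothetical placeholders, not drawn from any cataloguing standard or from the authors cited here.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class DigitizationRecord:
    """Hypothetical sidecar record for what a page image alone cannot show."""
    image_file: str
    page_count: Optional[int] = None
    binding_thickness_mm: Optional[float] = None
    weight_g: Optional[float] = None
    lighting_notes: str = ""
    # Editorial interventions by the photographer, such as cropping or color changes
    alterations: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Leave out unmeasured or empty fields so the record stays small
        data = {k: v for k, v in asdict(self).items() if v not in (None, "", [])}
        return json.dumps(data, indent=2, sort_keys=True)


# All values below are invented placeholders for illustration only
record = DigitizationRecord(
    image_file="example_page_001.jpg",
    page_count=312,
    binding_thickness_mm=28.5,
    alterations=["cropped margins", "color balanced"],
)
print(record.to_json())
```

A few lines of text per object cost almost nothing in storage, which matters given how finite that storage is; the harder question of which dimensions deserve a field remains an editorial judgment.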
Conway’s and Mancoff’s essays remind us to keep potential distortions and alterations in mind as we read a digitized text. But aside from hand-wringing about material loss and representation, we should also monitor the attitudes that we bring to electronic media. I have yet to come across a study comparing how many pages students will read when assigned a book versus a PDF, but I’m confident that there’s a difference. A book on the shelf retains value in a way that a digitized document, provided everyone can access it, does not.