Among Digitized Manuscripts

My book is published

Among Digitized Manuscripts is published in hardback and electronic open access. It also has an online appendix.

Manuscripts have been digitized in very large numbers. Technology has become easier and easier to learn. Right here, right now, is the time to capitalize on these opportunities and accept ‘digital manuscript studies’ as a normal, indeed necessary, part of our work. This book is a great start for that!

Read it now for free

General Summary

In about 350 pages I introduce you to manuscripts in their digitized form.

Central to the first part of the book is a proposal to speak of the ‘digital materiality’ of digitized manuscripts. Too often we approach them as a neutral window onto the material artifact, but digitization is not a neutral (let alone enhancing) process! With ten notions (size of collection, online availability, ability to download, portal, viewer, page numbers, resolution, color balance, lighting, and cut) I propose a way of describing digitized manuscripts that does justice to both the digital data that constitute the actual, digitized manuscript, and the digital context in which this is stored or offered. To guide and give foundation to this work, I include a chapter reflecting on the differences between material manuscripts, printed publications, and digital documents.

I apply these notions to twenty repositories around the world, showing the variety we can encounter. In this book I am not so much speaking about how to digitize but more about what to do with all the digitized holdings. In my opinion, the model of the lone scholar we are so used to in the Humanities remains viable. Team projects in the so-called digital humanities often produce meagre results, yet require large sums of money. What can we do, then, as a single student, scholar, or librarian, applying open source technology to freely obtained digital files? A lot, honestly, and I introduce a whole suite of skills, a rich toolbox, to do with digitized manuscripts whatever you want. I encapsulate this in the archetype of a Centaur: a person whose head is firmly in the humanities, asking research questions relevant to their field and letting their erudition be their guiding principle, but whose feet are formed by digital methods, allowing them to move around in ways and at paces never before seen.

I show the importance of redrawing glyphs in vector format, in order to do paleographic investigations and later be able to reuse them in digital editions or catalogs. I show how, then, to make a catalog - describing a workflow from a dusty backroom of an archive to a polished website on the internet. When your needs become more and more specific and even specialized software does not cut it anymore, it is time to write code yourself. The book crescendos into an introduction to programming in Python, using so-called computer vision to automatically detect and analyze features in images such as the shape of a codex or ownership statements. Yes - you too can do that, for free, on your own computer.

Of course I give ample space to discussing how to do the philological work of editing texts, or, as it is called on computers, how to encode them. My main aim is to get readers to stop using Microsoft Word as their one and only tool. I think what helps is emphasizing the different stages of a workflow, which will shift our attention away from the final, publishable product. By proposing the following workflow: Digitizing → Transcribing ⇄ Analyzing → Publishing, I emphasize the modular nature of working on computers: once we transcribe (i.e. encode) the text we can use it for multiple purposes, of which an edition is only one. To get the most out of our work it is imperative that we know and use standards. Especially important ones are IIIF for images, Unicode for transcribing, and TEI for marking up. But they are not the only options. I am especially critical of TEI, which is likely too complicated and too unwieldy for most purposes. I include a short guide to different workflows depending on different aims.
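
To make the modular idea concrete, here is a minimal sketch in Python of one transcription serving two purposes. It is an illustration under stated assumptions, not the workflow from the book: the two sample lines, the function names, and the choice of NFC normalization are all mine, and the XML it produces is only a TEI-style fragment (a paragraph with line-beginning markers), not a complete or validated TEI document.

```python
import unicodedata
import xml.etree.ElementTree as ET
from collections import Counter

# Sample transcription, one string per manuscript line (illustrative text only)
transcription = ["in principio erat verbum", "et verbum erat apud deum"]


def normalize(line):
    """Trim whitespace and bring the line to Unicode NFC, so that visually
    identical characters are stored as identical code points."""
    return unicodedata.normalize("NFC", line.strip())


lines = [normalize(line) for line in transcription]

# Purpose 1 (Analyzing): count how often each word occurs
word_counts = Counter(word for line in lines for word in line.split())


# Purpose 2 (Publishing): wrap the same lines in a TEI-style paragraph,
# marking the start of each manuscript line with an <lb n="..."/> element
def to_tei_fragment(lines):
    paragraph = ET.Element("p")
    for number, line in enumerate(lines, start=1):
        line_break = ET.SubElement(paragraph, "lb", n=str(number))
        line_break.tail = line  # the text that follows the line-beginning marker
    return ET.tostring(paragraph, encoding="unicode")


fragment = to_tei_fragment(lines)
print(word_counts.most_common(2))
print(fragment)
```

The transcription is typed once; the word count and the marked-up fragment are both derived from it, and neither depends on the other.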

In the conclusion I once more make the case for emancipating digital methods within every field of the humanities, taking down the ‘digital’ in ‘digital humanities’ and simply incorporating the tools and methods into our normal, everyday workflow. We are at a point in time at which many of you will wonder: is this all worth it? Is this all necessary? I answer with a forceful yes, because what seems like a super-power right now, by which you can distinguish yourself from your peers, will in five years become the norm, and in ten years you will be perceived as being behind if you do not know your way around a computer.

The low-hanging fruit is plentiful, assuring a fast and high return on investment. With this book I have endeavored to help you on your way and to point out that low-hanging fruit. Both the novice and the more advanced user of computers will find use in this book. I hope you are as excited about it as I am!

Chapter by chapter summary

1. Manuscript, Print, and Digital World

Chapter 1 is a theory-heavy chapter that provides a framework to see how the manifestations of a manuscript, print publication, and digital document relate to each other. In the chapter, I introduce the concepts ‘manuscript world,’ ‘print world,’ and ‘digital world.’ I discuss how our work can be explained through the different relationships between these worlds. The manuscript world is a realm in which participants use and produce texts by writing them with ink by hand, on parchment or paper. The print world is a world understood through engaging with texts machine-printed on mass-produced paper. The digital world, finally, is created by typing on a computer keyboard and reading back text on an electronic screen. When we edit, we base our work on artifacts from the manuscript world. We work, meanwhile, on a computer, thereby working in the digital world. Our final product, however, is often a printed book, part of the print world.

2. The Digital Materiality of Digitized Manuscripts

Chapter 2 is the core of the conceptual part of the book. I discuss the perception scholars have of digitized manuscripts. They consider them ‘larger than life,’ that is, they emphasize the ability to zoom in and make visible tiny details invisible to the naked eye. I discuss how this perception rests on larger trends of thinking about mechanical reproduction and digital surrogates. In essence, I see scholars use digitized manuscripts as though they are a window through which one can look at the physical manuscript. As such, it is unsurprising to see scholars cite the physical manuscript when in fact they made use of a digital surrogate. I then proceed to criticize this attitude. In fact, I tear it to shreds. This attitude rests largely on ignoring the ‘digital materiality’ of digital photos, which in turn stems from our lack of a vocabulary to describe its important aspects. I introduce ten aspects to evaluate a digitized manuscript and its repository: 1) size of the collection; 2) online availability; 3) ability to download; 4) the portal; 5) the viewer; 6) indication of page numbers; 7) image resolution; 8) color balance; 9) lighting; and 10) how the image is cut.
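
For readers who like to keep such descriptions in a structured form, the ten aspects could be recorded along the following lines. This is a sketch under my own assumptions: the class name, field names, and field types are hypothetical, and the example values describe an invented repository, not one evaluated in the book.

```python
from dataclasses import dataclass, asdict


@dataclass
class DigitizedManuscriptDescription:
    """One record per repository, with one field per aspect (names are hypothetical)."""
    collection_size: int          # 1) size of the collection
    available_online: bool        # 2) online availability
    downloadable: bool            # 3) ability to download
    portal: str                   # 4) the portal
    viewer: str                   # 5) the viewer
    page_numbers_indicated: bool  # 6) indication of page numbers
    resolution: str               # 7) image resolution
    color_balance: str            # 8) color balance
    lighting: str                 # 9) lighting
    cut: str                      # 10) how the image is cut


# Invented placeholder values, for illustration only
example = DigitizedManuscriptDescription(
    collection_size=1200,
    available_online=True,
    downloadable=False,
    portal="searchable by shelf mark only",
    viewer="zoomable, one page at a time",
    page_numbers_indicated=True,
    resolution="sufficient to read marginal notes",
    color_balance="color target visible in each image",
    lighting="even, no glare",
    cut="edges of the page visible",
)

for aspect, value in asdict(example).items():
    print(f"{aspect}: {value}")
```

Keeping the same ten fields for every repository makes it easy to compare repositories side by side.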

3. Digitized Manuscripts and Their Repositories

Chapter 3 takes these aspects and applies them to twenty repositories that were chosen to give a representative picture of the state of digitization of Islamic manuscripts worldwide. As many of these libraries also host manuscripts of other disciplines, readers from beyond Islamic Studies should still find this of ample interest. The result is that quality and usability vary wildly. Not all manuscripts are downloadable, which is worrisome, as are the often ambiguous legal restrictions. On the whole, digitization seems to be firing on all cylinders, which is promising. I end with a SWOT analysis to speculate on the future of these repositories.

4. Paleography: Between Erudition and Computation

Chapter 4 has two topics, both connected to paleography. First, I discuss the sprawling field of big-budget team projects related to digital paleography and I notably discuss when such big projects work well and when they do not at all seem to deliver what they promise. Next, I provide a practical example of how a tablet and free drawing software can be used to do simple yet effective paleographic work. This part of the chapter is an extended, more formal, and more in-depth version of my article “Mysterious Symbols in Islamic Philosophy.” That article discusses three glyphs that appear in a text by twelfth-century philosopher Suhrawardī. Suhrawardī says that only the initiate will understand how these symbols represent the essence of his philosophy. By (literally) drawing from a number of medieval manuscripts and combining different versions of the glyphs, I come to the interpretation that the symbols are constructed from Arabic letters.

5. Philology: Standards for Digital Editing

Chapter 5 takes on the concept of ‘digital edition.’ I draw on the extensive literature available, and emphasize that a digital edition as a general concept, and TEI as a specific building block of it, is not to be universally adopted, but is a solution for specific cases. One such case is the editing of commentaries from the post-classical period. Their intertextuality is multidimensional to such a degree that it can hardly be conveyed by conventional print techniques. Thus we stand to benefit, in preparing our analysis and in publishing our results, from a digital environment in which different layers (i.e. authors) can be turned on and off. In the chapter I describe the technology behind it. I further point out the merits of a good digital edition: it gives back control to readers to make decisions for themselves, while clearly also adding editorial value of your own. Lastly, I show how setting up your computer just right can be very helpful. For example, I demonstrate how to create your own keyboard layout so that when you press a key on your keyboard, a character (or multiple characters) of your choice is displayed.
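
The idea of layers that can be turned on and off can be sketched in a few lines of Python. This is only an illustration of the principle, not the technology described in the chapter: the Segment class, the layer names, and the sample sentences are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    layer: str  # which author this stretch of text belongs to (hypothetical labels)
    text: str


def render(segments, visible_layers):
    """Join, in document order, only the segments whose layer is switched on."""
    return " ".join(s.text for s in segments if s.layer in visible_layers)


# Invented sample: a base text interleaved with a commentary and a gloss
segments = [
    Segment("base text", "Every body is composed of parts."),
    Segment("commentary", "That is, of matter and form."),
    Segment("gloss", "Some copies omit this remark."),
]

print(render(segments, {"base text"}))
print(render(segments, {"base text", "commentary"}))
```

Because every segment carries its layer label, the same encoded text can be shown as the bare base text, as base text plus commentary, or with every layer visible, and the reader decides which.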

6. Cataloging: From a Dusty Backroom to the World Wide Web

Chapter 6 explains why knowing web development is a great asset for students and scholars in the Humanities, and why catalogs should be among the first things to be turned into digital assets. I explain the entire process of creating an online catalog of a hitherto uncataloged collection, from how computers helped me in my fieldwork to how I created an interactive website to make the catalog available to anyone else.

7. Codicology: Automated Analysis Using Python and OpenCV

Chapter 7 provides a counterexample for those who argue that codicology requires the physical manuscript. This chapter takes on a notably technical character, as I explain how I used the programming language Python and the library OpenCV to analyze the covers of several thousand digitized manuscripts. The method is automated image recognition; the aim is to say meaningful things about the shape of the codex, focusing on one particular aspect of Islamic manuscripts. Preliminary results are explained, but the chapter revolves more around introducing programming in general and Python in particular.

8. A Digital Orientalist

The conclusion brings together some issues arising in our everyday workflow, as well as legal issues surrounding our work. I make a case for the usefulness of self-sufficiency: a scholar in the humanities is much better off establishing their own digital repository and implementing automated solutions themselves, rather than doing this on a project basis made possible through grants. Of course, I end with an eye towards the future.

9. Postscript. Among Digitized Manuscripts

Working in manuscript studies is an extraordinary grace. In my own field, Ignati Kratchkovsky (Игнатий Крачковский) wrote beautifully about this in his Among Arabic Manuscripts, a memoir of his experience with manuscripts throughout his life (1883–1951). He writes about what tactile interaction can do, saying for example that “Many are the hands through which it passed in Africa, Asia and Europe before it came to rest on the shelves of the Manuscript Department.” He speaks of the highs “when some discovery will gleam like a tiny spark,” and of the lows “bringing me often to the verge of despair and making me doubt my ability.” In his book, he works out what that feeling of attraction is by giving little anecdotes, or snapshots, of his interaction with manuscripts, libraries, and the people in and around them. Each snapshot sets a certain tone and gives one aspect of the multi-faceted experience of working with manuscripts. Kratchkovsky does this masterfully, especially by connecting stories in unexpected ways. What seemed an unimportant detail in one snapshot takes a central role in another. I think it is important to say something about what that experience is when working with digitized manuscripts. To do that, I wish to leave you with some of my own stories, inspired in style and content by Kratchkovsky.

10. Online appendix

Over at GitHub.com/Among you can find all kinds of resources which will be updated from time to time. This way, the book lives on.

Successful workshops and talks

I organized a full-day workshop “From manuscript to text analysis” at DH2019, the global conference on digital humanities, in collaboration with Wido van Peursen (ETCBC, VU Amsterdam), Mladen Popović (Qumran Institute, RU Groningen), and Pierre van Hecke (Research Unit Biblical Studies, KU Leuven). It was a great success. Read about it here: https://github.com/ancient-data/dh2019/wiki
For the LUCIS Summer School “Philology & Manuscripts from the Muslim World” I provided a two-hour introduction to the possibilities of digital manuscript studies.
For NISIS, the Dutch graduate school for Islamic and Middle East Studies, I developed a three-part workshop called ‘Behind the looking-glass’ as a general introduction to DH. The first part consists of homework. The second part is a lecture. The third part is a full-day workshop. This workshop takes an immersion approach: we actually start programming in Python. In the morning we show how to make visualizations. In the afternoon, two parallel sessions cover extracting text from scanned editions in PDFs and scraping text from websites such as YouTube and Twitter. Forty students attended from all over Europe and the Middle East.

Additionally, I have spoken about this topic at conferences in Amsterdam, Utrecht, Leiden, The Hague, Oxford, London, Vienna, Berlin, Frankfurt, Durham, Providence, New Haven, and Washington DC.