Symposium Webcast: Distant Reading & the Islamic Archive (October 2015)

On October 16, 2015, the Digital Islamic Humanities Program at Brown University held its third annual scholarly gathering, a symposium on the subject “Distant Reading & the Islamic Archive.”

Paper abstracts are available here, and some photos of the event are posted below. The symposium was recorded in its entirety and may be accessed at the links following the photo gallery.

Photographs (by Rythum Vinoben; see his website for more photos)



Session 1

  • Elias Muhanna (Brown University), Introduction and welcoming remarks
  • David Vishanoff (University of Oklahoma), “A Customizable Exaptive ‘Xap’ for Charting Currents of Islamic Discourse across Multiple Bibliographic and Full Text Datasets”
  • Peter Verkinderen (Universität Hamburg), “Which Muḥammad? Computer-Based Tools for the Identification of Moving Elites in the Early Islamic Empire”

Session 2

  • Alexander Magidow (Univ. of Rhode Island) & Yonatan Belinkov (MIT), “Digital Philology and the History of Written Arabic”
  • Elias Muhanna (Brown University), “Modeling Mannerism in Classical Arabic Poetry”

Session 3 

  • Karen Pinto (Boise State University), “MIME and Other Digital Experimentations with Medieval Islamic Maps”
  • Seyed Mohammad Bagher Sajadi (Qazvin Islamic Azad University) and Mohammad Sadegh Rasooli (Columbia University), “Automatic Proper Names Extraction from Old Islamic Literature”
  • Maxim Romanov (Universität Leipzig), “al-Ḏahabī’s Monster: Dissecting a 50-Volume Arabic Chronicle-cum-Biographical Collection from the 14th Century CE”

Session 4

  • Nir Shafir (UCLA), “Distant Reading the Material and Bibliographic Record of the Early Modern Islamic Archive”
  • Eric van Lit (Yale Univ.), “A Digital Approach for Production and Transmission of Knowledge in Islamic Intellectual History”
  • Taimoor Shahid (Univ. of Chicago), “Mobile Ethics: Travel and Cosmopolitanism in the Islamic Archive”

CFP: Courts and Judicial Procedure in Early Islamic Law


Professor Intisar Rabb (Harvard Law School, Director of the Islamic Legal Studies Program, creator of SHARIAsource) is convening a conference at Harvard next year (May 6, 2016) on courts and judicial procedure in Islamic law. The SHARIAsource project recently received a $425,000 grant from the MacArthur Foundation, and I imagine that the project may be unveiled publicly around the time of the conference.

Here’s the Call for Papers.


Textual Corpora Workshop 2014: A Review


On October 17-18, 2014, the Digital Islamic Humanities Project at Brown University organized a workshop on “Textual Corpora” in the Digital Scholarship Lab at Rockefeller Library. We had around forty participants from universities and institutions around the world, and we spent a couple of days engaged in fruitful discussions and hands-on tutorial sessions. One of the participants, Dr. Sarah Pearce (NYU), wrote up a review of the event for her blog, Meshalim. She has kindly allowed us to reprint it here.


Textual Corpora and the Digital Islamic Humanities (day 1) | by S. J. Pearce

Normally I would not be totally comfortable posting a conference report like this because it’s basically reproducing others’ intellectual work, presented orally and possibly provisionally, in written and disseminated form. However, because video of the event is going to be posted online along with all of the PowerPoint slides, these presentations were made with an awareness that they were going to be disseminated online and so a brief digest does not strike me as a problem. With that said, what I am writing here represents the work of others, which I will cite appropriately.

The workshop convener, Elias Muhanna, began by introducing what he called “digital tools with little to no learning curve.” These included several text databases, as well as the aggregate dictionary page al-Mawrid Reader (which is apparently totally and completely in violation of every copyright law on the books). Then there were sources for collections of unsearchable PDFs (one of which is pretty much the only item on this list that isn’t violating copyright law in some way) and sources for searchable digital libraries of classical Arabic texts, among them al-Jāmiʿ al-Kabīr, a database that has the special feature of mostly not functioning and mostly not being installable. Various databases used by computational linguists are the best bets for modernists looking for things.

With respect to all of these, the question of how the texts are entered is a bit of a mystery. Some are rekeyed from editions, some are scanned as PDFs, and some are OCR-scanned; and even though OCR can be up to 99% accurate, that still translates into a typo every hundred characters, which is not ideal. Regardless of the technology used to upload texts to these databases, copyright was raised repeatedly as an issue surrounding the use of these tools, and the current state of play appears to be somewhere between the Wild West and don’t-ask-don’t-tell.
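
The arithmetic behind that caveat is easy to make concrete. A minimal sketch (the page and volume sizes are illustrative assumptions, not measurements of any particular OCR engine):

```python
# Expected number of character errors at a given OCR accuracy rate.
# The page and volume sizes below are illustrative assumptions.

def expected_errors(n_chars: int, accuracy: float) -> int:
    """Expected count of misrecognized characters."""
    return round(n_chars * (1 - accuracy))

# A 300-page volume at roughly 1,000 characters per page:
volume_chars = 300 * 1000
print(expected_errors(volume_chars, 0.99))   # 99% accurate -> 3000 typos
print(expected_errors(volume_chars, 0.999))  # 99.9% accurate -> 300 typos
```

Even a tenfold improvement in accuracy still leaves hundreds of errors per volume, which is why the method of entry matters so much for search reliability.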

A few sample searches were run to demonstrate what these tools might be used for: occurrences of the phrase allahu a’lam to gauge epistemological humility (I’m not totally sure about the reliability of the one as a gauge of the other, but never mind), and an Arabic proverb I did not previously know about mongoose farts (fasā baynahum al-ẓaribān), to illustrate a search to determine how a saying might be used: whether purely for grammatical or sociologically illustrative purposes (as this one apparently is), or whether it occurs within a discourse.


These text collections were a segue into Maxim Romanov’s presentation on the difference between text collections and text corpora and the desiderata for the creation of the latter.

Text collections are what already exist. They are characterized by the following traits:

  • They reproduce books (they are technically databases, but they don’t function as databases)
  • The book/source is divided into meaningless units of data, such as “pages”
  • Limited and ideologically biased (Shamela is open, but its BOK format is obscure)
  • Not customizable (users cannot add content)
  • Limited search options
  • Search results are impossible to handle (you have to build your own system on top of the library system)
  • No advanced way of analyzing results (no graphing, no mapping)
  • No ability to update metadata

Textual corpora are what we need to be creating. They are characterized by the following traits:

  • Adapted for research purposes (open organization format)
  • Source is divided into meaningful units of data (such as “biographies” for a biographical collection, “events” for chronicles, “hadith reports” for hadith collections)
  • Open and fully customizable
  • Complex searches (with filtering options)
  • Results can be saved for later processing (multiple versions, annotations, links)
  • Visualizations of results
  • Easy to update metadata
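
The difference is easy to see in data terms. A hypothetical sketch of one “meaningful unit” of a corpus, with its metadata attached (all field names and values are invented for illustration):

```python
# One hypothetical corpus unit: a "biography" from a biographical
# collection, carried with its own metadata rather than a bare page number.
biography = {
    "work": "Ta'rikh al-Islam",
    "unit_type": "biography",
    "subject": "Muhammad b. Idris al-Shafi'i",
    "death_year_ah": 204,
    "tags": ["jurist", "Egypt"],
    "text": "...",  # the unit's full text would go here
}

# Because units are meaningful, searches can filter on scholarly criteria
# rather than page numbers: e.g., all jurists who died before 300 AH.
corpus = [biography]
jurists = [b for b in corpus
           if "jurist" in b["tags"] and b["death_year_ah"] < 300]
print(len(jurists))  # 1
```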


Elli Mylonas gave an introduction to the idea of textual markup, which was the piece that was the most general and most theoretical of the day. She raised a number of interesting issues.

One was the question of how archivable digital data can be, and she made the case that XML files are not quite as good as acid-free paper in a box, but are basically the digital equivalent. XML is a standard language, and it is text-based and therefore should be readable on future technologies, whatever they might be.

She then made the case that text markup is a form of textual interpretation. When somebody asked a question predicated on his being okay with the status quo, in which some people do programming and some people analyze texts, she replied that marking up a text in XML really forces you to think more carefully about both the structure and the content of the text; it’s not an either-or proposition. This is not a case of science trying to impose itself upon the humanities (ahem, quantum medievalism) but rather to supplement them methodologically.

One important distinction is between markup and markdown. The latter is a more descriptive, plain-text rendering of, well, text, that allows it to be more easily exported into a variety of schemas. (I think?) Markdown is less rule-bound, more abstract, and more idiosyncratic, which means it is less labor-intensive but potentially less widely useful in the absence of a really robust tagging scheme.
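
The contrast can be made concrete in a few lines of Python (both snippets are invented; the XML element names loosely imitate TEI conventions and are not from any real project):

```python
import xml.etree.ElementTree as ET

# Markdown: lightweight, plain-text, minimally rule-bound.
markdown_version = "## The Biography of al-Shafi'i\n\nBorn in *Gaza* in 150 AH..."

# XML markup: explicit, rule-bound, machine-checkable structure.
xml_version = """
<div type="biography">
  <head>The Biography of al-Shafi'i</head>
  <p>Born in <placeName>Gaza</placeName> in <date when="0767">150 AH</date>...</p>
</div>
"""

root = ET.fromstring(xml_version)
# Because the structure is explicit, a program can ask structural questions:
places = [el.text for el in root.iter("placeName")]
print(places)  # ['Gaza']
```

The markdown version is faster to write; the XML version encodes interpretive decisions (this is a place name, this is a date) that a machine can then query.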

She showed a few examples of marked-up texts, including the Shelley-Godwin Archive, which has Mary Shelley’s notebooks marked up to show where her handwriting occurs and where her husband Percy’s does, as a way of trying to put to rest the question of who really wrote Frankenstein; a Brown project on the epigraphic inscriptions of the Levant that, she told us, provokes an argument between every new graduate-student worker and the PI over how to classify the religious affiliation of the inscriptions (see? interpretation!); and the Old Bailey Online, in which you can search court records by crime and punishment.

The one difficulty for Islamic Studies is that XML comes out of the European printing and typesetting tradition and is therefore not natively suited to Arabic and other right-to-left languages.


Maxim Romanov then gave a practical introduction to one element of text markup, namely regular expressions, a way of creating customized searches within digitized text. He pointed to two websites with some basic instructions and options for practice.

One example of a regular expression is this: if I wanted to find all the possible transliterations of قذافي (the surname of the deposed Libyan dictator) in a searchable text corpus, I would type [QGK]a(dh?)+a{1,2}f(i|y) as my search term. This would look for any word that began with Q, G, or K, then had an a, then a d possibly followed by an h (with possible repetition of that combination), one or two a’s, an f, and then either an i or a y. There was much practicing and many exercises, and that’s really all I have to say about that. (Except that this cartoon and this one suddenly make a lot more sense.)
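
The pattern can be tried directly in Python, exactly as given above:

```python
import re

# The workshop's example pattern for transliterations of قذافي.
pattern = re.compile(r"[QGK]a(dh?)+a{1,2}f(i|y)")

for name in ["Qadhafi", "Gaddafi", "Kadafi", "Qadhdhaafi", "Smith"]:
    print(name, bool(pattern.search(name)))
# Qadhafi, Gaddafi, Kadafi, and Qadhdhaafi all match; Smith does not.
```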


Textual Corpora & the Digital Islamic Humanities (day 2) | by S.J. Pearce


Following up on the Qaddafi-hunt by regular expression of day 1 of the workshop on digital Islamic humanities, here is Maxim Romanov, demonstrating a regular expression to search for terms that describe years within a text corpus that hasn’t been subjected to Buckwalter transliteration but is rather in the original Arabic script.
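
I did not note down his exact expression, but a pattern in the same spirit might look for the word سنة (“year”) followed by digits in either Western or Arabic-Indic form (a sketch, not Romanov’s actual expression):

```python
import re

# Match "sana" (سنة) followed by a 1-4 digit year in Western (0-9)
# or Arabic-Indic (٠-٩) digits. Illustrative only.
year_pattern = re.compile(r"سنة\s+([0-9\u0660-\u0669]{1,4})")

text = "وتوفي في سنة ٢٠٤ من الهجرة"
match = year_pattern.search(text)
print(match.group(1))       # ٢٠٤
print(int(match.group(1)))  # 204 -- Python's int() reads Arabic-Indic digits
```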

Three major topics were covered on day 2.

Scripting. Maxim Romanov gave a basic overview of, and introduction to, scripting and the automation of repetitive tasks, such as downloading thousands of files from the web, converting text formats, and conducting complex searches (with regular expressions).

The preferred scripting language amongst this crowd of presenters was Python, in no small measure because it is named after Monty Python, but also because it is very straightforward. Maxim illustrated some of the possibilities with Python by walking us through one of his research questions, which was about the chronological coverage of certain historical sources; in other words, how much attention do certain time periods get versus others? He demonstrated the methods he used for capturing date information from a really large amount of text by automating specific queries with a script, and then processing the data so it could be output as an easily readable graph. Conference organizer Elias Muhanna emphasized that this was an example of how digital and computational methodologies are not replacements for analysis but rather demand quite a lot of good, old-fashioned philological hard-nosedness, while offering different tools for exploring and expressing it. This is a way of simply speeding up and scaling up what we are already doing.
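
Romanov’s own pipeline is not reproduced here, but the general shape of such a script is easy to sketch (the toy text and the transliterated date formula are invented for illustration):

```python
import re
from collections import Counter

# Toy chronicle text with hijri death dates, in transliteration.
text = """
tuwuffiya fi sana 204 ... wa-mata fi sana 310 ...
wa-fi sana 385 ... thumma fi sana 205 ...
"""

# Capture the year that follows the word "sana" ("year").
years = [int(y) for y in re.findall(r"sana (\d{1,4})", text)]

# Bucket by century AH to see which periods get the most coverage.
coverage = Counter(year // 100 + 1 for year in years)
for century, count in sorted(coverage.items()):
    print(f"century {century} AH: {count} dates")
# century 3 AH: 2 dates
# century 4 AH: 2 dates
```

The counts, plotted, become exactly the kind of “chronological coverage” graph described above.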

We then had a brief presentation from one of the researchers on the Early Islamic Empire at Work project, who showed us how his team is creating search tools for their corpus. These tools will be made publicly available in December as the Jedli toolbox, which will include various types of color-codeable, checklist- and keyword-based searching. One of the major takeaways from this presentation was that by being able to edit open-source code, it’s possible to build upon existing work to make tools do specifically what any given researcher wants them to.

This raised the question of citation, which, judging by the comments made in response to my question, seems to be a total Wild West. One of the participants with quite a lot of programming experience said that citing someone else’s code would be like citing a recipe when you make dinner for friends, while other participants and presenters said that if you were using something really extraordinary from somebody else’s project, you might mention it. However, Elli Mylonas disagreed, arguing that correct citation of existing work is one of the ways the digital humanities can gain traction within the academy as legitimate scholarship that counts at moments like tenure review, rather than languishing in the same manner as the catalogues and indices that we all rely upon but don’t view as having been built by proper “scholars.” I would tend to think she’s right.

Timelines. Then Elli Mylonas introduced us to various timeline programs. As on day 1, her presentation was really grounded in the theory, the wherefores, and the big issues behind DH. She started out with the assertion that “timelines lie”; that is, any kind of timeline looks objective but is, in fact, encoding a historical argument made by the researcher who compiled and presented it. (I think this actually has an interesting parallel with narrative, footnoteless or minimally footnoted writing such as A Mediterranean Society (which has loads of footnotes but leaves a lot out, too), which in effect encodes a massive amount of historical argumentation within something that simply reads as text.)

Important things to look for in choosing a timeline program are: the ability to represent spans of time rather than just single points, the exportability of data, and the ability of the program to handle negative dates (again, encoding an argument about notions of temporality and the potentiality of time). A free timeline-generating app is Timeline JS, which works with Google spreadsheets; that is the one we tested out as a group. We also looked at Tiki-Toki, which is gorgeous but requires a paid subscription. (It is definitely worth looking into whether one’s institution has an institutional subscription.)
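
Timeline JS can also be fed JSON directly instead of a spreadsheet. A minimal sketch of its event structure (the field names follow the TimelineJS 3 JSON format as I understand it; consult the official documentation before relying on them):

```python
import json

# A minimal TimelineJS-style timeline; all dates and captions illustrative.
timeline = {
    "title": {"text": {"headline": "A Sketch Timeline"}},
    "events": [
        {
            "start_date": {"year": "750"},
            "end_date": {"year": "1258"},  # a span of time, not just a point
            "text": {"headline": "Abbasid Caliphate"},
        },
    ],
}

print(json.dumps(timeline, indent=2))
```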

Maxim Romanov suggested that this might be a useful tool for something like revisiting the chronology in Marshall Hodgson’s Venture of Islam.

Finally, we looked at Orbis, Stanford’s geospatial model of the Roman world, which models travel through the Roman empire in terms of time and cost. This is a feasible project because of the wealth of data and the relative uniformity of roads, resources, and prices within the Roman empire; it would have to be modified to deal with most of Islamicate history. (Which brings to mind the question of the extent to which Genizah sources, as a fairly coherent(ish) corpus, can be used to extrapolate for the rest of the Islamic world rather than just the Jewish communities within it; if so, they might be a feasible data set for this kind of processing. Really not my problem, though.) This was a perfect segue into the final topic of the day.

Geographic information systems. This piece was presented by Bruce Boucek, who is a social sciences librarian at Brown trained as a cartographer. He gave an overview of data sources and potential questions and problems, and then Maxim Romanov gave a final demonstration about how geographic imaging can be used to interrogate medieval geographic descriptions and maps.

Image courtesy of S.J. Pearce

Image courtesy of S.J. Pearce

By aligning the latitude and longitude information from a modern map with the cities marked on a medieval one (or simply by making the coordinates conform on a less contemporary modern map of unknown projection or questionable scale) and observing the distortion of the medieval map when it was made to conform to the modern one, we began to see what kind of view of the world the mapmaker, in this case al-Muqaddasī, held. What was he making closer or farther away than it really was? What kind of schematic does that yield?
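
The alignment step can be sketched numerically. Three control points shared by both maps determine an affine transform exactly; applying that transform to a further medieval point and measuring how far it lands from the city’s true modern position quantifies the distortion (all coordinates below are invented for illustration):

```python
# Fit an affine transform (x', y') = (a*x + b*y + c, d*x + e*y + f)
# from three control points, then measure distortion at a fourth point.
# All coordinates are invented for illustration.

def solve3(m, v):
    """Solve a 3x3 linear system m @ x = v by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
              - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
              + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    solution = []
    for j in range(3):
        mj = [row[:] for row in m]
        for i in range(3):
            mj[i][j] = v[i]
        solution.append(det(mj) / d)
    return solution

# Control points: (x, y) on the medieval map -> (x', y') on the modern map.
medieval = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
modern = [(30.0, 30.0), (31.0, 30.2), (29.8, 31.0)]

m = [[x, y, 1.0] for x, y in medieval]
a, b, c = solve3(m, [p[0] for p in modern])   # coefficients for x'
d_, e, f = solve3(m, [p[1] for p in modern])  # coefficients for y'

def transform(x, y):
    return (a * x + b * y + c, d_ * x + e * y + f)

# A fourth city: where the fitted transform predicts it vs. where it is.
predicted = transform(5.0, 5.0)
actual = (30.6, 30.4)
distortion = ((predicted[0] - actual[0]) ** 2
            + (predicted[1] - actual[1]) ** 2) ** 0.5
print(round(distortion, 3))  # 0.283
```

The size of that residual, measured point by point, is exactly the “distortion” being read off visually in the demonstration.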

And that’s that. Video and a link library should be up online at the workshop web site, and one of my colleagues storified all of the tweets from the conference. I’ll probably write another post or two in the coming week reflecting on how I might start using some of these tools and methods as I finish up the book and start work on a second project.

Time-Stamped Program for 2013 DH Conference

I’m very grateful to Maxim Romanov for putting together this very helpful program complete with timestamps for each presentation of our 2013 conference. Click the links below to watch each day of presentations in its entirety, and navigate to the talk or discussion section you’re interested in via the slider.

Thursday, October 24, 2013 – Day One

  • Introduction
    • 0:00:00 | Beshara Doumani (Brown U): Opening Remarks
    • 0:00:45 | Elias Muhanna (Brown U): Opening Remarks
  • Digital Ethnography | chair: Beshara Doumani (Brown)
    • 0:13:30 | Beshara Doumani (Brown U): Introduction
    • 0:14:25 | Peter McMurray (Harvard), “Berlin Islam as Acoustic Ecology: An Ethnography in Sound”
    • 0:41:00 | Nadia Yaqub (UNC), “Working with Indigenous Digital Humanities Projects: The Case of the Mukhayyam al-Sumud al-Usturi Tal al-Za‘tar Facebook Group”
    • 1:09:20 | Discussion
  • Manuscript Visualization and Digitization | chair: Elias Muhanna (Brown)
    • 1:33:05 | Elias Muhanna (Brown): Introduction
    • 1:34:50 | Alex Brey (Bryn Mawr), “Quantifying the Qur’an”
    • 1:54:50 | David Hollenberg (Univ. of Oregon), “Preserving Islamic Manuscripts Under Erasure: The Yemeni Manuscript Digitization Initiative”
    • 2:18:15 | Discussion
  • Text Mining | chair: Beatrice Gruendler (Yale)
    • 2:45:00 | Beatrice Gruendler (Yale): Introduction
    • 2:47:40 | Maxim Romanov (Tufts), “[Toward] Abstract Models for Islamic History”
    • 3:14:50 | Guy Burak (NYU library), “Comparing Canons: Examining Two 17th-century Fatawa Collections from the Ottoman Lands”
    • 3:35:20 | Kirill Dmitriev (St. Andrews), “Arab Cultural Semantics in Transition”
    • 3:51:50 | Discussion
  • Databases | chair: Elli Mylonas (Brown)
    • 4:37:30 | Elli Mylonas (Brown): Introduction
    • 4:37:55 | Sebastian Günther (Göttingen), “A Database & Handbook of Classical Islamic Pedagogy”
    • 5:09:45 | Will Hanley (FSU), “Prosop: A Social Networking Tool for the Past”
    • 5:31:00 | Discussion
  • Mapping | chair: Sheila Bonde (Brown)
    • 6:03:50 | Sheila Bonde (Brown): Introduction
    • 6:05:00 | Till Grallert (Freie Univ. Berlin), “Mapping the Urban Landscape through News Reports: Damascus and its Hinterlands in Late Ottoman Times”
    • 6:26:50 | Meredith Quinn (Harvard), “The Geography of Readership in Early Modern Istanbul”
    • -:–:– | Discussion (not available)

Friday, October 25, 2013 – Day Two

  • Digitization and E-Publication | chair: Ian Straughn (Brown)
    • 0:00:00 | Ian Straughn (Brown): Introduction
    • 0:02:00 | Dagmar Riedel (Columbia Univ.), “Manuscripts and Printed Books in Arabic Script in the Age of the E-Book: The Challenges of Digitization”
    • 0:31:50 | Chip Rossetti (Managing Editor, LAL), “Al-Kindi on the Kindle: The Library of Arabic Literature and the Challenges of Publishing Bilingual Arabic-English Books”
    • 0:54:50 | Discussion
  • Disciplinary and Theoretical Considerations | chair: Elias Muhanna (Brown)
    • 1:21:10 | Elias Muhanna (Brown): Introduction
    • 1:21:55 | Afsaneh Najmabadi (Harvard), “Making (Up) an Archive: What Could Writing History Look Like in a Digital Age?”
    • 1:55:25 | Travis Zadeh (Haverford), “Uncertainty and the Archive: Reflections on Medieval Arabic and Persian Book Culture in the Digital Age”
    • 2:22:10 | Discussion
  • Keynote address
    • 2:43:55 | Elias Muhanna (Brown): Introduction
    • 2:47:00 | Dwight Reynolds (UCSB), “From Basmati Rice to the Bani Hilal: Digital Archives and Public Humanities”
    • 3:30:00 | Discussion

Watson Institute Write-Up About 2013 Islamic DH Conference

Detail of a page (c. 1580) from Minassian Collection, a database of Persian, Mughal, and Indian miniature paintings at Brown's Center for Digital Scholarship.


Here’s a great Watson Institute write-up about our 2013 conference, by Samuel Adler-Bell. Thanks to Sarah Baldwin-Beneich. 

Digital Humanities and Middle East Studies

New Methodologies for Old Texts Raise Eyebrows

Last month “The Digital Humanities and Islamic and Middle East Studies” conference at the Watson Institute brought together scholars from a range of disciplines to examine the effect of new digital archiving and research technologies on the study of Islamic and Middle Eastern history and literature. And it might never have happened if it weren’t for renowned Islamic historian Michael Cook’s eyebrow.

In 2004, conference organizer Elias Muhanna was a graduate student in Professor Cook’s famously difficult history methodologies seminar at Princeton. Muhanna, who is now assistant professor of comparative literature and Middle East studies at Brown, was spending, as he put it, hours upon hours in the “depths of Princeton’s Firestone Library poring over 19th century editions of long forgotten compendia by minor authors in godforsaken locations of the medieval Islamic world” and often still failing to find the answers to Cook’s arcane historical puzzles. The course was, in Muhanna’s words, a “trial by fire,” a sink or swim tutorial in the esoteric methods of deep archival research.

But at a certain point in the middle of the semester, something changed. Muhanna and his colleagues began finding the answers to Cook’s questions, but in unexpected places, locating references to Cook’s citations in works that he had not consulted. It was at this point, Muhanna says, that Cook raised his eyebrow suspiciously. Cook’s students had discovered the utility of the enormous textual databases of classical Islamic sources that had, in 2004, just been made available online. “We had found the answer,” says Muhanna, “using a kind of search and capture method … and not in the very tortuous way he was hoping to make us get it.”

This experience, says Muhanna, was one of the impetuses for last month’s conference, which is part of a larger research initiative hosted by the Middle East Studies program. Michael Cook’s raised eyebrow represents an ambivalence at the core of the digital humanities, perhaps especially as they relate to the study of the Islamic world. Digital archives, text-searchable databases, computational analyses: these innovations have reshaped the methodological landscape and opened a door to new and exciting research. But for scholars of Islamic history and literature who have come to see the long, tortuous work of archival research as commensurate with the discipline itself, the digital humanities have been met with more than a few raised eyebrows.

“I felt that there was something tremendous to be gained by this technology,” Muhanna said in his opening remarks, “but there was also something probably tremendous that was in danger of being lost.”

Although this ambivalence may have inaugurated the conference, the vast majority of work presented by attending scholars attested, unambiguously, to the rich new world of research questions provoked by combining digital innovations with Islamic and Middle East studies scholarship. For example, for her project on “The Geography of Readership in Early Modern Istanbul,” Harvard historian Meredith Quinn compiled a database of probate inventories from 17th century Istanbul, paying special attention to those that listed books among the possessions of the deceased. Using quantitative analysis, she worked to identify correlations among book ownership, gender, class, and occupation. She then integrated that data with a map of the city to identify the more “bookish” neighborhoods of 17th century Istanbul.

Projects like Quinn’s, which elegantly combine archival sources with digital mapping and quantitative analysis, are so natural, grounded in good research, and plainly productive of new scholarly knowledge and questions, that any resistance from the digital humanities skeptics seems misguided: purist methodological traditionalism. Or worse, the resentful Luddism of a generation of scholars who “had to do it the hard way, so why don’t you?” On the other hand, one can more easily understand humanists chafing a little at the title of Bryn Mawr graduate student Alex Brey’s algorithm-dependent presentation on “Quantifying the Qur’an,” despite the fact that it addressed core issues of humanist concern, such as book history and scribal practices.

Professor Muhanna notes that scholars in the humanities might occasionally have a romantic, emotional, or religious reticence about converting a sacred text into points of data to yield historical knowledge. “There’s an understandable resistance to construing the tremendously complex object of one’s research, whether that’s a literary or a religious text, as basically a corpus of data. It has the association that it becomes just ones and zeroes.” And there is even more resistance to “the idea that we can somehow perform complicated analytical operations that might replace the very careful, painstaking work of interpretation.”

A self-critical debate over the proper scope of the digital humanities popped up at various moments throughout the conference. “Is this a new paradigm?” Muhanna asked, “Does digital, data-driven scholarship tell us anything qualitatively new? Or does it just give us these tremendous tools to confirm what we already intuitively know, and that we had already arrived at through old-fashioned interpretative scholarship?”

At some point during one of these self-reflective flare-ups, one scholar remarked, somewhat pugnaciously, “So we’re historians with computers. That’s enough for me!”

Muhanna’s conclusion is somewhat more nuanced: “The best way to think about it is that we’re just dealing with different sets of questions. And that one set doesn’t invalidate the other set.”

– Sam Adler-Bell