Text Analysis Tools for Early Modern Literature: The Case of Margaret Cavendish | Newberry

John Shanahan

Robin Burke

Research Methods Workshop for Early-Career Graduate Students
Friday, March 3, 2017

9 am to 5 pm

Room B-82

John Shanahan and Robin Burke, both of DePaul University
The application deadline has passed.
The works of Margaret Cavendish, in both physical and digital form, will serve as the subject matter for this introduction to digital humanities tools and methods. Cavendish’s work has been digitized in the EEBO and Chadwyck-Healey databases, but to our knowledge no automated text analysis of it has yet been undertaken.

In this workshop, participants will engage with Cavendish’s work as an example of the forms of preservation and research access that stretch from early modern physical codices in the Newberry collection, to microfilm turned database (Early English Books Online), to searchable TEI-encoded websites (Chadwyck-Healey and EEBO-TCP). While nearly all of Cavendish’s works will be brought to bear in discussion, the text analysis will focus on a handful of them: in particular, her 1662 and 1668 volumes of plays and the two editions of Observations upon Experimental Philosophy, from 1666 and 1668.

The workshop will have three components:

  • Text Preparation. We will spend the first hour examining copies of Cavendish’s works from the Newberry’s collection, then move to studying her work at scale in digital form. While examining the first editions in a ‘show and tell’ format, we will briefly discuss the work involved in moving her texts from physical to digital form for processing (OCR software, markup), on the way to our exercises in cleaning up text for analysis. To save time we will supply already-cleaned files of some of Cavendish’s texts, but students will also get experience doing the prep work themselves with her works (or others).
  • “Distant reading.” This part of the workshop centers on hands-on work with automated text analysis tools, including Voyant and RStudio. (Both are free; we will ask students to install RStudio on their laptops before the workshop, and Voyant runs in any browser.) After a brief look at the use of n-grams for research more generally, students will learn to use text analysis packages and write new queries in R for quantitative comparison of textual data. Our queries and measures will include: word statistics for Cavendish’s texts as individual works, as a corpus, and vis-à-vis EEBO-TCP and other larger data sets; part-of-speech tagging and entity extraction; sentence length; detection of text reuse and phrase duplication; context and proximity searches; and lexical density.
  • Network Analysis. In the final session of the workshop, we will examine the uses of network analysis for exploring relationships in text. Students will consider different ways of defining nodes, edges, and associated attributes in literary works, and different techniques for extracting such data. Students will also learn about metrics that can be applied to networks, including centrality measures. We will provide extracted character relationship data for some of Cavendish’s plays, and students will be guided in using the Gephi network visualization tool (also a free, open-source package) to create visualizations of these works and to explore a wide range of visualization parameters, including layout, color, size, attribute mappings, and computed metrics.
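
For readers who want a feel for the measures named in the list above before the session, here is a minimal sketch in Python rather than R, Voyant, or Gephi, so it is not the workshop's own code. It computes word counts, a type-token ratio, a rough lexical density, mean sentence length, repeated bigrams, and normalized degree centrality. Everything in it is an assumption made for illustration: the function names, the very short function-word list, the naive sentence splitter, the placeholder sample sentence, and the characters A, B, and C are hypothetical and are not drawn from Cavendish's texts or from the workshop data.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption);
# a real study would use a fuller, period-appropriate list.
FUNCTION_WORDS = {
    "the", "a", "an", "and", "or", "but", "of", "in", "to", "is",
    "it", "that", "for", "with", "as", "by", "be", "are", "was", "not",
}


def tokenize(text):
    """Lowercase the text and return alphabetic words (inner apostrophes kept)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def split_sentences(text):
    """Naive split after ., ! or ?; early modern punctuation needs more care."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def text_measures(text):
    """Compute a few simple word statistics for one text."""
    tokens = tokenize(text)
    sentences = split_sentences(text)
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    bigram_counts = Counter(ngrams(tokens, 2))
    return {
        "tokens": len(tokens),
        "types": len(set(tokens)),
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
        # Lexical density here: share of tokens that are not function words.
        "lexical_density": len(content) / len(tokens) if tokens else 0.0,
        "mean_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
        "top_words": Counter(tokens).most_common(5),
        # Bigrams seen more than once: a crude signal of phrase duplication.
        "repeated_bigrams": [g for g, c in bigram_counts.items() if c > 1],
    }


def degree_centrality(edges):
    """Normalized degree centrality for an undirected edge list."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    return {
        node: (len(adj) / (n - 1) if n > 1 else 0.0)
        for node, adj in neighbors.items()
    }


# Placeholder sentence, not a quotation from any source.
SAMPLE = "The cat sat. The cat ran away! Did the cat sleep?"
# Hypothetical characters: A shares a scene with B and with C.
EDGES = [("A", "B"), ("A", "C")]

if __name__ == "__main__":
    for name, value in text_measures(SAMPLE).items():
        print(name, value)
    print(degree_centrality(EDGES))
```

Running the same measures over two texts, or over one text and a larger reference corpus, is what makes quantitative comparison possible. An edge list like the one above is also the kind of node-and-edge data that a network tool such as Gephi can lay out and color by a computed metric.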

John Shanahan has published several articles and book chapters on Margaret Cavendish, most recently “Natural Magic in The Convent of Pleasure” (2014), and he directs DePaul’s graduate certificate program in digital humanities. He will co-teach this workshop with Robin Burke, a specialist in data mining, artificial intelligence, and recommender systems. The two are frequent collaborators on digital humanities research.

We encourage participants to arrive a day early or stay over on Saturday to pursue research in the Reading Rooms. Participants may also be interested in attending the March 4 Eighteenth Century Seminar with Sean Silver.

Preliminary schedule

8:30 - 9

Coffee and Introductions

9 - 9:30

Obtain reader cards; library orientation

9:30 - 10:30

Rare books session with Cavendish’s works. Discussion of OCR issues.

10:30 - 12:30

Text preparation; “cleaning” text files for processing

12:30 - 1:30

Catered lunch

1:30 - 3

Analysis of prepared files of Cavendish (and EEBO comparators) with Voyant and RStudio

3 - 4:30

Visualization and Network Analysis (Gephi)

4:30 - 5

Open workshop time for exploration and next steps

Prerequisites: No background with programming is required. Laptops are strongly recommended.

Eligibility: This workshop is open to graduate students in a terminal master’s program and those who have not yet completed comprehensive exams in a PhD program, at Newberry Center for Renaissance Studies consortium member institutions. We encourage students to apply from disciplines as varied as the literatures of English and other languages, Religious Studies, Medieval or Renaissance Studies, Art History, and History, among others.

Travel funding: Faculty and graduate students of Center for Renaissance Studies consortium institutions may be eligible to apply for travel funds to attend CRS programs or to do research at the Newberry. Each member university sets its own policies and deadlines; contact your Representative Council member in advance for details.

Cost and Registration Information 

Enrollment is limited, by competitive application, with priority given to students from Center for Renaissance Studies consortium institutions. Fees are waived for consortium students.