Baquaqua : The Afro-Diasporic Text Corpus Project: Corpora

Transcribed and Annotated Texts by African-American Authors

Current Corpora

The following files are the most recent editions of the corpus and contain poetry from the 18th century. These files have not yet been proofread and are provided here "as is." Final proofread corpora will be added as they are completed. Corpus Edition: 20210728

Annotation Schema

To facilitate analysis, we are encoding texts with part-of-speech (POS) tagging, which assigns basic grammatical information to each word. Each token is written as the word, an underscore, and its tag. For example, in the annotation sample below, MAECENAS_NP1 means "MAECENAS-proper-name-singular," beneath_II means "beneath-general-preposition," and myrtle_NN1 means "myrtle-singular-common-noun."

The scheme is called the Constituent Likelihood Automatic Word-tagging System, seventh standard (CLAWS 7).  The complete list of annotations can be found here: CLAWS C7.

The CLAWS system can generate POS tags automatically at the project website HERE.

We are using this scheme at present because the tagging can be done automatically. The long-term goal of this project is to annotate the texts with an XML (eXtensible Markup Language) schema that follows the specifications of the Text Encoding Initiative (TEI). Such a scheme would make the tagging much more expressive by combining the CLAWS 7 tags with headwords (hw) and other data about meter, rhyme, and more. The XML can be extended to capture data about as many aspects of the text as possible, which will facilitate more comprehensive and robust scholarship. An example of the first two lines of "To Maecenas" can be seen below.


Here is an example of a text annotated with part-of-speech tags using CLAWS C7, from Wheatley's TO MÆCENAS:

MÆCENAS, you, beneath the myrtle shade,
Read o'er what poets sung, and shepherds played.
What felt those poets, but you feel the same?
Does not your soul possess the sacred flame?
Their noble strains your equal genius shares
In softer language, and diviner airs.
While Homer paints, lo! circumfused in air,
Celestial Gods in mortal forms appear;
Swift as they move, hear each recess rebound,
Heav'n quakes, earth trembles, and the shores resound.
MAECENAS_NP1 ,_, you_PPY ,_, beneath_II the_AT myrtle_NN1 shade_NN1 ,_, 
Read_VVN o'er_II what_DDQ poets_NN2 sung_VVN ,_, and_CC shepherds_NN2 played_VVD ._. 
What_DDQ felt_VVD those_DD2 poets_NN2 ,_, but_CCB you_PPY feel_VV0 the_AT same_DA ?_? 
Does_VDZ not_XX your_APPGE soul_NN1 possess_VVI the_AT sacred_JJ flame_NN1 ?_? 
Their_APPGE noble_NN1 strains_VVZ your_APPGE equal_JJ genius_NN1 shares_NN2 
In_II softer_JJR language_NN1 ,_, and_CC diviner_JJR airs_NN2 ._. 
While_CS Homer_NP1 paints_VVZ ,_, lo_UH !_! circumfused_VVD in_II air_NN1 ,_, 
Celestial_JJ Gods_NN2 in_II mortal_JJ forms_NN2 appear_VV0 ;_; 
Swift_NP1 as_CSA they_PPHS2 move_VV0 ,_, hear_VV0 each_DD1 recess_NN1 rebound_VVI ,_, 
Heav'n_NN1 quakes_VVZ ,_, earth_NN1 trembles_VVZ ,_, and_CC the_AT shores_NN2 resound_VV0 ._. 
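
The word_TAG format shown above is straightforward to read programmatically. The following is a minimal sketch in Python, not part of the project's own tooling; the function name parse_tagged_line is hypothetical, and it assumes that tokens are separated by whitespace and that the last underscore in each token separates the word from its tag.

```python
from collections import Counter


def parse_tagged_line(line):
    """Split a 'word_TAG word_TAG ...' line into (word, tag) pairs.

    Assumption: the tag follows the LAST underscore in each token, so a
    punctuation token such as ',_,' yields (',', ',').
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("_")
        if not sep:
            # No underscore found: keep the token as an untagged word.
            word, tag = token, None
        pairs.append((word, tag))
    return pairs


# First line of the tagged sample above.
sample = "MAECENAS_NP1 ,_, you_PPY ,_, beneath_II the_AT myrtle_NN1 shade_NN1 ,_,"
pairs = parse_tagged_line(sample)

# Collect singular common nouns (tag NN1) and count how often each tag occurs.
nouns = [word for word, tag in pairs if tag == "NN1"]
tag_counts = Counter(tag for _, tag in pairs)

print(pairs[:3])  # [('MAECENAS', 'NP1'), (',', ','), ('you', 'PPY')]
print(nouns)      # ['myrtle', 'shade']
```

Because these files have not yet been proofread, any counts derived this way reflect the automatic tagger's output rather than corrected annotations.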

XML tagged text:

<div title="To Maecenas">
<lg rhyme="aa" met="-+|-+|-+|-+|-+">
<l n="1">
<w c7="NP1" hw="maecenas" pos="NP">MÆCENAS</w>
<w c7="PNP" hw="you" pos="PRON">you</w>
<w c7="PRP" hw="beneath" pos="PREP">beneath</w>
<w c7="AT0" hw="the" pos="ART">the</w>
<w c7="NN1" hw="myrtle" pos="SUBST">myrtle</w>
<w c7="NN1" hw="shade" pos="SUBST">shade</w>
</l>
<l n="2">
<w c7="VVB" hw="read" pos="VERB">read</w>
<w c7="PRP" hw="over" pos="PREP">o'er</w>
<w c7="DTQ" hw="what" pos="PRON">what</w>
<w c7="NN2" hw="poet" pos="SUBST">poets</w>
<w c7="VVN" hw="sing" pos="VERB">sung</w>
<w c7="CJC" hw="and" pos="CONJ">and</w>
<w c7="NN2" hw="shepherd" pos="SUBST">shepherds</w>
<w c7="VVN" hw="play" pos="VERB">play'd</w>
</l>
</lg>
</div>
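
As a rough illustration of how tagged words might be turned into elements like those in the XML sample, here is a hedged sketch using Python's standard xml.etree.ElementTree module. It is not the project's conversion tool. The function name build_line and the headwords lookup are hypothetical, the element and attribute names (l, w, n, c7, hw) are taken from the sample, and lowercasing the word is only a placeholder for a real headword lookup.

```python
import xml.etree.ElementTree as ET


def build_line(n, tagged_words, headwords=None):
    """Build an <l> element containing one <w> element per tagged word.

    Assumptions: punctuation tokens are skipped, and a word's headword
    defaults to its lowercase form unless the (hypothetical) headwords
    dictionary supplies one.
    """
    headwords = headwords or {}
    line = ET.Element("l", {"n": str(n)})
    for word, tag in tagged_words:
        if not word[0].isalnum():
            continue  # skip punctuation such as ',' or '.'
        hw = headwords.get(word, word.lower())
        w = ET.SubElement(line, "w", {"c7": tag, "hw": hw})
        w.text = word
    return line


# (word, tag) pairs for the first line of the CLAWS-tagged sample.
tagged = [
    ("MAECENAS", "NP1"), (",", ","), ("you", "PPY"), (",", ","),
    ("beneath", "II"), ("the", "AT"), ("myrtle", "NN1"),
    ("shade", "NN1"), (",", ","),
]
line1 = build_line(1, tagged)

# Serialize the element to a string for inspection.
print(ET.tostring(line1, encoding="unicode"))
```

Supplying a dictionary such as {"o'er": "over"} as the headwords argument would let contracted or archaic forms map to a modern headword, as the sample does for o'er.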

