Reconstructing the Population History of Puerto Rico by

Dr. Juan Martinez Cruzado- Biology and Genetics

by Means of mtDNA Phylogeographic Analysis



The haplogroup identities of 800 mtDNAs randomly and systematically selected to be representative of the population of Puerto Rico were determined by restriction fragment length polymorphism (RFLP), revealing maternal ancestries in this highly mixed population of 61.3% Amerindian, 27.2% sub-Saharan African, and 11.5% West Eurasian. West Eurasian frequencies were low in all 28 municipalities sampled, and displayed no geographic patterns. Thus, a statistically significant negative correlation was observed between the Amerindian  and African frequencies of the municipalities. In addition, a statistically highly significant geographic pattern was observed for Amerindian and African mtDNAs. In a scenario in which Amerindian mtDNAs prevailed on either side of longitude 66°16  West, Amerindian mtDNAs were more frequent west of longitude 66°16  West than east of it, and the opposite was true for African mtDNAs. Haplogroup A had the highest frequency among Amerindian samples (52.4%), suggesting its predominance among the native Taı’nos. Principal component analysis showed that the sub-Saharan African fraction had a strong affinity to West Africans. In addition, the magnitudes of the Senegambian and Gulf of Guinea components in Puerto Rico were between those of Cape Verde and Sa˜o Tome’. Furthermore, the West Eurasian component did not conform to European haplogroup frequencies. HVR-I sequences of haplogroup U samples revealed a strong North African influence among West Eurasian mtDNAs and a new sub- Saharan African clade. Am J Phys Anthropol 128: 131–155, 2005.  

2005 Wiley-Liss, Inc.

Recent technical advances have facilitated the discovery of genetic polymorphisms in the human population, many of which are useful as markers for prehistoric migrations that gave rise to continental and regional populations. Continental-population histories were reconstructed using Y-chromosome markers, which are paternally inherited (Hurles et al., 1998; Rosser et al., 2000; Bamshad et al., 2001; Hammer et al., 2001; Karafet et al., 2001; Kayser et al., 2001; Malaspina et al., 2001; Underhill et al., 2001; Bortolini et al., 2002, 2003; Cruciani et al., 2002; Lell et al., 2002; Pereira et al., 2002; Semino et al., 2002; Zerjal et al., 2002, 2003; Zegura et al., 2004), and mtDNA markers, which are inherited maternally (Merriwether and Ferrell, 1996; Comas et al., 1998; Starikovskaya et al., 1998; Richards et al., 2000, 2002; Forster et al., 2001; Kaestle and Smith, 2001; Malhi et al., 2001; Torroni et al., 2001a,b; Keyeux et al., 2002;  Oota et al., 2002; Salas et al., 2002; Schurr and Wallace, 2002; Yao et al., 2002a,b; Kong et al., 2003), usually finding remarkable differences in sex migration histories. In this study, we developed a hierarchical strategy that makes use of haplogroup-defining mtDNA restriction markers to identify maternal biological ancestries in a sample set randomly and systematically selected to be representative of the Puerto Rico population, a mixed Caribbean population of three principal components: Amerindian, Sub-Saharan African, and West Eurasian. With some notable exceptions, most haplogroups are regarded as continent- specific. Thus, determining the haplogroup to which a mtDNA belongs usually identifies the mtDNA biological ancestry. The HVR-I sequence was used when biological ancestry could not be determined through restriction marker analysis.

The biological ancestries of a mixed people have implications in their population genetics and thus in public health. In terms of mtDNA ancestry, studies of European and North American populations related particular West Eurasian haplogroups to higher frequencies of some diseases such as Alzheimer’s (Hutchin and Cortopassi, 1995), Leber hereditary optic neuropathy (Johns and Berman, 1991; Brown et al., 1997; Hofmann et al., 1997; Lamminen et al., 1997; Torroni et al., 1997; Howell et al., 2003), Wolfram syndrome and sudden infant death syndrome (Hoffman et al., 1997), and some conditions such as asthenozoospermia and nonasthenozoospermia (Ruiz-Pesini et al., 2000). Further, it was shown that the 10394  Dde   I state plays a protective role  against Parkinson’s disease, and that its effect is stronger when it is combined with other polymorphisms that are specific to haplogroups J and K (van der Walt et al., 2003). Thus, the characterization of the mtDNA pool of any population may be instrumental in determining risk factors for various diseases and conditions. 

In addition, biological ancestries imply human migration routes that shed light on the possible origins of introduced fauna and flora, including agricultural varieties. Moreover, biological ancestries play a fundamental role in population history which, as one of the main categories of cultural history, is essential to explain the social systems and behavioral  guidelines that rule all aspects of social life. Population history considers population growth in relation to geographic regions, biological ancestries, and admixture, and thus plays a central role in the cultural development of a people (Ferna’ndez-Me’ndez, 1970). 

It is estimated that from 60,000 – 600,000Arawak-speaking Taı好o Indians lived in Puerto Rico when it was discovered for the Europeans by Christopher Columbus in 1493 (Abbad, 1959; Ferna ’ndez-Me’ndez, 1970). Traditional history tells us that they were decimated by war, hunger, disease, and emigration, such that they had totally disappeared by the end of the 16th century. The vast majority of Spanish settlers were single men, and mixing with Indian women commenced fully upon colonization in 1506. The Spanish Crown took measures to increment the number of “white” people on the island, including ordering “white” Christian female slaves to be sent to Puerto Rico in 1512. However, the 1530 census reported that only 57 of the 369 “white” men on the island were married to “white” women. Such  Whites” were a minority. The census reported 335 “black” female slaves and 1,168 “black” male slaves, and a total of 1,148 Indians, both genders included (Brau, 1904). By this time, the base of the Puerto Rican economy was shifting from gold mine exploitation to sugar cultivation. African slaves became the cornerstone of the sugar industry.

Traditional history includes abundant evidence of the widely dispersed geographic origins of the sub- Saharan African peoples who were brought to the Americas, spanning Cape Verde on the northwest edge of sub-Saharan Africa to Mozambique and the island of Madagascar in the southeast. The arrival in Puerto Rico of people from various African regions can be confirmed by traditional festivals and other activities held in the names of African gods and by the use of words that can be found only in particular African regions. However, the lack of a classification system for slaves by tribe or even by geographic region during the Atlantic slave trade leaves doubts concerning the relative contribution of the different continental regions (A’ lvarez-Nazario, 1974).

Slaves were first brought to Puerto Rico in 1508 by its conquistador, Juan Ponce de Leo’n. These were residents of the Iberian Peninsula, many of North African, Senegambian, or Guinean origin (A’ lvarez- Nazario, 1974); others were Greek, Slavic, or Turkish (Thomas, 1997), and others Jewish (Dı’az- Soler, 2000). The capture of sub-Saharan Africans with the goal of providing Spanish and Portuguese colonies in the Americas with a labor force, first in the search for gold and later in sugar plantations, started in 1518 (Dı’az-Soler, 2000). Up to the beginning of the second half of the 16th century, almost all slaves originated in Senegambia and Guinea (Alegrı’a, 1985). The island of Sa˜o Tome’, with slaves acquired mainly from the Gulf of Guinea, was an important supplier thereafter. Throughout the 16th century and with the exception of one in the west coast, all 13 sugar mills in Puerto Rico were east of the La Plata River, which streams along longitude 66°16  West (Gelpı’-Baı’z, 2000).

The Portuguese were the legal source of African slaves until 1640, at which time Spain suspended all contracts in retaliation for the revolution that removed their Spanish rulers. The resulting shortage of slave labor provoked the collapse of the sugar industry, starting a period of subsistence economy that lasted for a century and a half until the Crown suspended all taxes and source restrictions on the slave trade in 1789. The poor state of the economy hindered the importation of slaves, and the tax collected upon their sale made the illegal trade their main source. The illegal slave trade was circuitous in that the main slave sources were the Dutch colony of Curacao and the English colony of Jamaica, in that order. Slaves brought from the Gold Coast (Ghana) were the most common in these  lonies at the time. The illegal Puerto Rican harbors were located on the west and south coasts, where most of the island population lived (A’ lvarez-Nazario, 1974). The only legal harbor was far away in San Juan, the capital, and few legal immigrants made it to Puerto Rico during these times. 

The importation of slaves increased dramatically as a consequence of the land and tax reforms of the last decades of the 18th century, and approximately two-thirds of all slaves ever brought to Puerto Rico arrived from that point in time until the abolition of slavery in 1873 (A’ lvarez-Nazario, 1974). By then, the African harbors most used by slave traders extended from the Gold Coast to Angola (Thomas, 1997).

This new wave of slaves found Puerto Rico mainly inhabited by criollos, Puerto Rico  Natives who were the product of centuries of admixture and generations living under a subsistence economy with little or no Spanish government intervention (Ferna¡¦ndez-Me¡¦ndez, 2000).

The Spanish Empire started to crumble at the beginning of the 19th century, and an 1815 royal decree permitted the settlement in Puerto Rico of foreign Catholics with their wealth and slaves. Thus, wealthy “white” refugees and other immigrants from Europe and the Americas made it to Puerto Rico in great numbers, stimulating the economy by developing the sugar industry in the coastal plains and the coffee and tobacco industries in the mountains. International treaties banned importation of slaves directly from Africa north of the Equator in 1817 and south of it in 1820. However, enforcement of the treaties was neffective south of the Equator, where the Portuguese had bountiful slave factories. Thus, the illegal Angolan trade became substantial in the 19th century. Larger sources of Africans were probably the West Indies, because trade within the Caribbean was not banned and because escapees arriving to Puerto Rico were granted freedom. In this respect, migrations from the then Danishruled island of Saint Thomas, which acquired its slaves mainly from the Gold Coast to the Slave Coast in the Bight of Benin (Thomas, 1997), were a major source (A’ lvarez-Nazario, 1974). 

Our results conform to most accounts of traditional history, but not at all with the extermination of the Taı’no people as early as the 16th century, thus showing that  population genetics has a lot to offer studies on Caribbean population history. It is important to note that neglected people rarely contribute to traditional history, and a great part of the cultural development of the Puerto Rican people occurred in the “darkness” of history, far away from the capital, as did the illegal trade that kept their subsistence economy alive. 

In the interest of greater clarity, we often refer to Puerto Ricans carrying mtDNAs of Amerindian, African, or West Eurasian origin by such terms as Amerindians, Africans, and West Eurasians. However, it is important to keep in mind that we are referring to a thoroughly mixed population composed of people of a single culture and whose phenotypes do not predict individual mtDNA ancestries. 



A random sample of 872 housing units representative of the island of Puerto Rico was selected using a sampling frame developed by the Center for Applied Social Research (University of Puerto Rico at Mayagu‥ ez) for survey research in Puerto Rico, based on the 1990 Census of Population and Housing. Excluding the island municipalities of Vieques and Culebra from the sampling frame, 28 of the 76 municipalities in Puerto Rico were selected (Fig. 1), as per the following description. The eight most populated municipalities were selected with probability equal to one. Each was assigned a number of housing units proportional to its estimated population size, based on a total of 872 housing units for the entire island. To select the remaining 20 municipalities, the remainder of the island was divided into five geographical regions. Four municipalities from each region were selected at random with a probability proportional to estimated population size, while stratifying by estimated population size. They were assigned an equal number of housing units, oportional to the estimated population size of the geographic region they represented.  

Thirty percent of the census tracts within each municipality were selected at random, with probability proportional to estimated population size. Having established an estimated number of housing units for each census tract based on the number of housing units for the municipality and the relative population sizes of the selected census tracts, census blocks were selected within them so that each would contribute an expected eight households to the sample. On the field, housing units were chosen by systematic sampling, with a preestablished random starting point for each block. This means that the actual number of housing units obtained from each block could be greater or smaller than initially expected, depending on how the number of housing units in it had changed since 1990. An adult was selected at random from each housing unit. Participation in the project was agreed to by appropriate informed consent.

DNA manipulation

Sample collection and DNA extraction were performed as in Martı’nez-Cruzado et al. (2001). Thereafter, a 200- l aliquot from each 500-l sample was purified, using the QIAamp DNA Mini Kit (Qiagen). To each aliquot, 36 l of 60 mM Tris-HCl (pH 8.0), 60 mM Na  EDTA (pH 8.0), 0.6 M NaCl, 0.24 mM DTT, and 12% SDS were added, followed by 250 l of buffer AL and 250 l of 100% ethanol. The aliquots were vortexed thoroughly, transferred to a spin column, and spun at 8,000 rpm for 1 min. The filter was  washed by adding 300 l of buffer AW1, spinning at 8,000 rpm for 1 min, adding 300 l of buffer AW2, and spinning at 14,000 rpm for 5 min. DNA was eluted from the filter into two 100- l aliquots. The eluate aliquots were kept at 80°C as backups until the end of the study.

Except for the cycling conditions (see below) and that 1.5 U of Taq  DNA polymerase were used in each amplification reaction, the DNA amplification, restriction digestion, and agarose gel electrophoresis procedures were performed as in Martı’nez-Cruzado et al. (2001). The amplification reactions were usually subjected to one cycle of 2.5 min at 94°C, 32 cycles of 30 sec at 94°C, 1 min at 54°C, and 70 sec at 72°C, and one cycle of 10 min at 72°C. Primer annealing was achieved at 52°C to amplify the diagnostic site for  croparagroup L, and at 56°C to amplify the sites diagnostic for haplogroups G and L3d. 


Haplogroup identification strategy and quality-control estimates

Studies involving high-resolution restriction analysis (Ballinger et al., 1992; Torroni et al., 1992, 1993a,b, 1994a–d, 1996, 1997; Chen et al., 1995, 2000), analyses of the complete sequence of mitochondrial chromosomes (Kong et al., 2003; Reidla et al., 2003), or complete (Herrnstadt et al., 2002) or partial (Silva et al., 2002) sequences of their coding region showed that all haplogroups are virtually monomorphic for the 10,394  DdeI and 10,397 Alu  I sites, with the exception of haplogroup K. Thus, the a priori determination of the state of these sites quickly reduces the number of candidate haplogroups to which an unknown mtDNA may belong. Because these sites are close to each other, the 10394 DdeI/10397 Alu I motif (hereafter referred to athe motif ) can be easily determined from a single amplicon.

Thus, each mtDNA sample was first tested for its motif. Depending on the result, each sample was then tested for the markers diagnostic for all haplogroups known to share its motif. The haplogroups, their motifs, their defining markers, and the primers used are shown in Table 1. Haplogroups that are defined by two or more markers invariably share at least one of them with some other haplogroup. Thus, tests on unshared haplogroup markers were performed only when the samples showed the shared ones. The two markers that define haplogroup L1b were tested on all (/ ) motif samples, as each by itself defines another (/ ) motif haplogroup. L is a macroparagroup, a large group of mtDNAs including several haplogroups and other paraphyletic mtDNAs (Chen et al., 1995; Salas et al., 2002). Among others, it includes haplogroup L2 (here further subdivided into L2a and L2* to pool subhaplogroups L2b, L2c, and L2d) and subhaplogroups L1b and L1c. All other L  plogroups and paraphyletic mtDNAs were included in paragroup L0 (Mishmar et al., 2003).


The testing of markers for all haplogroups within each motif group served as a quality-control measure, as it allowed us to detect false positives. In the few instances in which the mtDNA tested positive for no haplogroup-defining markers, its identity was determined by the sequence of its HVR-I and confirmed by restriction analysis. Thus, false negatives were also detected, and the likelihood of an y error involving false haplogroup positives, false haplogroup negatives, or motif group misdiagnoses could be estimated experimentally. Such estimates were used to calculate the probabilities of any number of samples being misdiagnosed. Because all tests were performed independently, the likelihood that any two errors were committed in analyzing the same sample could be calculated based on the multiplicative rule of probability.  

 Amplicons to be sequenced were purified using the High Pure PCR Product Purification Kit (Roche Applied Science), as instructed by the manufacturer. Automated sequencing was performed at the New Jersey Medical School Molecular Resource Facility (University of Medicine and Dentistry of New Jersey), using an Applied Biosystems (ABI) model 3100 capillary sequencer after cycle sequencing with Dye Terminator mix version 2.0. 

Biological ancestry determination and data analysis

Biological ancestries were inferred from haplogroup identity. Because only nine women of Asian ancestry were reported living in Puerto Rico in 1899 (Sanger et al., 1900), mtDNAs of haplogroups belonging to both the New World and Asia were assumed to be of Amerindian origin unless participant interviews revealed otherwise. 

Data analysis was performed using the program SPSS 10.0.5 for Windows. To determine whether variation in participation rates or changes in population size occurring since 1990 in the sampled municipalities would lead to biased estimates of the parameters, we devised a weighting scheme. Through these weights, the number of samples provided by each municipality was adjusted so that it would be equal to the number expected by applying the original sampling proportions to the final sample size. The weights for municipality samples (W m) were a function of the sampling proportion of the municipality (P ), the final total obtained sample size (n), and the number of samples provided by the municipality (n ), so that m  Pm n/nm.

A triangular graphic of ancestry distribution among municipalities was constructed using MATLAB. A projected plane representing a linear function of form w f(X, Y, Z), in which plotted population dots were defined as the end of vectors with form w  Xi Yj  Zk, where X, Y, and Z represented Amerindian, African, and West Eurasian frequencies, respectively,   produced. The sum of X, Y, and Z was equal to one. Their magnitudes were a function of the 30¢X and 60¢X angles. Vectors i, j, and k were their respective unit vectors in the positive directions of the coordinate axes x, y, and z.  

To illustrate the geographic distribution of Amerindian mtDNA frequencies, municipalities were listed in order according to such frequencies and divided into 12 categories by creating a new category every time that the difference between two municipalities was 1.6% or more. Divisions were drawn halfway between the frequencies of such municipalities.


Principal component (PC) analyses were performed using the POPSTR program of Henry Harp Harpending (University of Utah). They were based on population haplogroup frequencies, and included only populations with 17 samples or more. Sub- Saharan African mtDNAs were classified as follows. Macroparagroup L was divided into haplogroup L2 (further subdivided into L2a and L2*), subhaplogroups L1b and L1c, and paragroup L0 to pool all other haplogroups and paraphyletic mtDNAs within the macroparagroup. Paragroup L3A (Salas et al., 2002) was divided into L3b, L3d, L3e, L3f, L3g, and L3*. We designated U5b2 as a sub-Saharan African clade with the HVR-I sequence 16189-16192-16270- 16320. Taken from one source were Shona (n 17), Tongas (20), Shangaan (22), Chopi (27), Chwabo (20), Lomwe (20), Makonde (19), Makhwa (20), Ndau (19), Nyungwe (20), Nyanja (20), Ronga (21), Sena (21), and Tswa (19) from Mozambique (Salas et al., 2002), Brazil (65) (Alves-Silva et al., 2000), Bubi (45), Sa˜o Tome’ (49) (Mateu et al., 1997), Mandenka (118) (Graven et al., 1995), Serer (23), a group of other Senegalese tribes (48), a pool of Mauritanian and West Saharan tribes (24) (Rando et al., 1998), Tuareg (22), Yoruba (33), Hausa (20), Fulbe (60), Turkana (37),  malia (27), Kikuyu (22) (Watson et al., 1997), Nubia (46) (Krings et al., 1999), Khwe (31) (Chen et al., 2000), and the southeastern islands of the Cape Verde Archipelago (169) (Brehm et al., 2002). From two sources were Biaka (34) and Mbuti (35) Pygmies (Chen et al., 1995; Watson et al., 1997), Wolof (66) (Chen et al., 1995; Rando et al., 1998), and !Kung (62) (Watson et al., 1997; Chen et al., 2000). For West Eurasians, mtDNAs were classified as belonging to H, V, HV, (pre-HV)1, J, T, I, W, X, M, N, R, K, U*, U2, U5*, U5(a b), U6, and U(others) to pool the remaining clades (U1, U3, U4, and U7). Populations were obtained from Rando et al. (1998) (23 Moroccan non-Berbers and 58 Moroccan Berbers), Brakez et al. (2001) (37 Moroccan  Souss Valley inhabitants), and Richards et al. (2000). This last group of authors compiled data from several authors concerning 13 populations from North Africa and the Near East, as well as several populations from Europe. They classified the European populations into 10  ographic regions, and we observe those same classifications here. Amerindian populations  ere divided into 12 geographic regions and “Others.” These were three from eastern North America (Mohawk (123) (Merriwether and Ferrell, 1996) and Ojibwa from Manitoulin Island (33) and northern Ontario (28) (Scozzari et al., 1997)), five from the Great Plains (Cheyenne/Arapaho (35), Sisseton/Wapheton Sioux (45), Turtle Mountain Chippewa (28) and Wisconsin Chippewa (62) (Malhi et al., 2001), and Siouan (34) (Lorenz and Smith, 1996)), six from the North American Southeast (Choctaw (27) (Lorenz and Smith,  1996), Creek (39) and Seminole (40) (Weiss and Smith, 2003), Oklahoma Muskoke (70) (Merriwether and Ferrell, 1996), and Oklahoma Red Cross Cherokee (19) and Stillwell Cherokee (37) (Malhi et al., 2001)), 15 from the North American Southwest (Akimal O’odham (43), Apache (38), Delta Yuman (23), Navajo (64), North Paiute/Shoshoni (94), Pai Yuman (27), River Yuman (22), Tauno O’odham (37), Zuni (26) (Malhi et al., 2003), California Penutian (17),  vasupai/Hualapai/Yavapai/Mojave (18), Jemez (36), Pima (37), Quechuan/Cocopa (23), and asho (28) (Lorenz and Smith, 1996)), four from Mesoamerica (Maya (26), Mixtec (29),  ahua/Cora (32) (Lorenz and Smith, 1996), and North Central Mexico (199) (Green et al., 2000)), eight from eastern Central America (Bribri-Cabecar (24) (Torroni et al., 1993a), Embera’ (Panama’ ) (44), Wounan (31) (Kolman and Bermingham, 1997), Guatuso (20), Teribe (20) (Torroni et al., 1994d), Huetar (27) (Santos et al., 1994), Kuna (63) (Batista et al., 1995), and Ngo‥be’ (46) (Kolman et al., 1995)), 14 from western Colombia and Ecuador including the Andes (Cayapa (94) (Rickards et al., 1999), Chimila (34), Guambiano (23), Guane-Butaregua (33), Ijka-Arhuaco (40), Kogui (30), Paez (31), Tule-Cuna (29), Waunana (30), Yuco-Yukpa (88) (Keyeux et al., 2002), Embera’ (Colombia) (41), Ingano (52), Wayuu (59), and Zenu (69) (Mesa et al., 2000; Keyeux et al., 2002)), nine from Colombia east of the Andes (Coreguaje (19), Curripaco (17), Guahibo-Sikuani (23), Guayabero (24), Huitoto (22), Murui-Muinane (18),  Nukak (20), Piaroa (18) (Keyeux et al., 2002), and Tucano (71) (Mesa et al., 2000; Keyeux et al., 2002)), seven from the Amazon (Bele’n (Brazil) (81) (Batista dos Santos et al., 1999),  razilian North (26) (Alves-Silva et al., 2000), Gaviao (27), Xavante (25), Zoro’ (30) (Ward et al., 1996), Ticuna (28) (Torroni et al., 1993a), and Yanomami (97) (Merriwether and Ferrell, 1996)), nine from the Peruvian, Bolivian, and Chilean highlands around Lake Titicaca  Atacamen˜ o (50) (Merriwether et al., 1995), Chimane (40), Ignaciano (21), Mosete’n (19), Movima (22), Trinitario (33), Yuracare’ (27) (Bert et al., 2001), Aymara (98), and Quechua (51) (Merriwether and Ferrell, 1996; Bert et al.,  2001)), six from northern Argentina (Mataco  from the provinces of Chaco (28), Formosa (44), and Salta (55), Pilaga (40), and Toba from the provinces of Chaco (28) and Formosa (26) (Demarchi et al., 2001)), and five from southern South America (Huilliche (89) (Merriwether and Ferrell, 1996), Mapuche- Argentina (50) (Bailliet et al., 1994), Mapuche- Chile (156) (Merriwether et al., 1995; Moraga et al., 2000), Pehuenche (204) (Merriwether and Ferrell, 1996; Moraga et al., 2000), and Yaghan (21) (Moraga  et al., 2000)). Two “Other” populations were Bella Coola (36) (Lorenz and Smith, 1996) and Brazilian Southeast (33) (Alves-Silva et al., 2000).

Haplogroup diversity for the Amerindian mtDNAs was calculated using the method of Tajima (1989), h  [1  i ]n/(n  1), where xi  is the frequency of each haplogroup and n is the sample size. 

RESULTS- Response rate

All selected housing units were identified between August 6, 1999–March 19, 2000. Based on the 1990 Census of Population and Housing, a total of 872 housing units was selected. This translated into 1,067 because of housing growth through the decade. Eighty-one housing units were uninhabited. From the 986 remaining housing units, 876 selected individuals were contacted. Exactly 800 of these agreed to participate, for a response rate of 81.1% based on the 986 selected individuals. The sampling procedure results for each municipality and region are detailed in Table 2. 

Haplogroup identification data quality

The haplogroup identification strategy described above allowed the detection of misdiagnoses of both motif and haplogroup-defining marker identities, and thus an estimation of the probability that any misdiagnoses may have gone undetected. The largest margin of error lies within the (/ The haplogroup identification strategy described above allowed the detection of misdiagnoses of both motif and haplogroup-defining marker identities, and thus an estimation of the probability that any misdiagnoses may have gone undetected. The largest margin of error lies within the (/ ) motif group. Initially, all (/ ) samples were tested, among others, for the 3592 HpaI and 2349 Dpn II markers but not for markers 9070 TaqI and 16389 Hin fI, which are necessary to discriminate L1c and L2,  respectively, from all other mtDNAs within L(Table 1). Thus, the samples belonging to L1b  (3592 HpaI/2349 DpnII) and L3e (3592 Hpa I/ 2349 Dpn II) were quickly identified, while  the samples with the 3592 HpaI/2349 Dpn  II profile had to be subjected to a  second round of tests for ) motif group. Initially, all (/ ) samples were tested, among  others, for the 3592 HpaI and 2349 Dpn II markers but not for markers 9070 TaqI and 16389 Hin fI, which are necessary to discriminate L1c and L2, respectively, from all other mtDNAs within L (Table 1). Thus, the samples belonging to L1b (3592 HpaI/2349 DpnII) and L3e (3592 Hpa I/ 2349 Dpn II) were quickly identified, while the samples with the 3592 HpaI/2349 Dpn II profile had to be subjected to a second round of tests for markers 9070 TaqI and 16389 Hin fI. HVR-I sequencing of those samples with no haplogroup-defining markers showed that the 3592 Hpa I motif of one of the 79 samples with the 3592 HpaI/2349 Dpn II profile initially went undetected. This gave us an experimental estimate of 1/79 for the frequency with which the 3592 Hpa I motif went undetected. Using such a frequency and a base of 49 L1b samples, we calculated a probability of 53.6% that none of the 38 samples identified as belonging to haplogroup L3e ( 3592 HpaI/2349  Dpn II) may actually belong to L1b (3592 HpaI/2349 Dpn II). Using bases of 50, 51, and 52 L1b samples, we calculated probabilities of 33.9%, 10.9%, and 2.4% that one, two, or three samples identified as L3e actually belong to L1b.

There are two other scenarios by which misdiagnoses could occur. One is the combination of a misdiagnosis of the sample motif group with a false positive for a haplogroup-defining marker. The other is the occurrence of both a false negative and a false positive for haplogroup-defining markers with the same sample. Based on the detection of 11 motif misdiagnoses (three samples misdiagnosed as( /), seven as (/), and one as (/)), six false positives (one each for the markers corresponding to A, D, HV, L, L3b, and J/T), 11 false  egatives (three each for the markers of A and J/T, two for that of C, 

and one each for those of HV, L and L3b), the number of samples belonging to each motif group (377 ( /), 233 (/), and 190 (/)), and the number of samples belonging to each haplogroup (Table 3), we estimate that the probability that no misdiagnoses were made under either of these two scenarios is 86.0%, and that the probability that two or more misdiagnoses were made is insignificant. 

Haplogroup identities

Table 3 shows the distribution by municipality of all haplogroups found, their frequencies, and their biological origin. Only six of the 800 samples were confirmed, through HVR-I  quencing, as having 10394 DdeI/10397 Alu I motifs different from those corresponding to their haplogroups (Table 1). Specifically, we found two L3e, one L2*, one L2a, and  one L1c sample to have ( /) instead of (/) motifs. In addition, one haplogroup C sample had a (/) motif instead of the (/) expected. 


On six occasions, samples were confirmed as having more than one haplogroup-defining marker. The 8616 Dpn II marker that characterizes haplogroup L3d was found in one L0 and one L1c sample. Furthermore, two L1b samples had the 4216  Nla III marker that characterizes haplogroups J and T. These were all regarded as belonging to macroparagroup L because of the known stability of the TABLE 2. Sampling procedure results categorized by region Region Municipality Number of housing units Uninhabited housing units Inhabited using units Agreed to participate Selected, not contacted Declined to participate Metro  ecibo 33 3 30 26 (86.7%) 2 (6.7%) 2 (6.7%) Bayamo¡¦n 60 7 53 43 (81.1%) 6 (11.3%) 4 (7.5%) Caguas 40 3 37 30 (81.1%) 7 (18.9%) 0 Carolina 51 3 48 39 (81.3%) 4 (8.3%) 5 (10.4%)  Guaynabo 23 2 21 16 (76.2%) 5 (23.8%) 0 Mayagu¡L ez 33 3 30 26 (86.7%) 0 4 (13.3%) Ponce 37 3 34 27  9.4%) 0 7 (20.6%) San Juan 118 5 113 78 (69.0%) 22 (19.5%) 13 (11.5%) Subtotal 395 29 366 285  7.9%) 46 (12.6%) 35 (9.6%) North Florida 39 6 33 29 (87.9%) 4 (12.1%) 0  Toa Baja 28 1 27 22  81.5%) 1 (3.7%) 4 (14.8%) Vega Alta 50 3 47 38 (80.9%) 4 (8.5%) 5 (10.6%) Vega Baja 41 6 35 25  1.4%) 7 (20.0%) 3 (8.6%) Subtotal 158 14 142 114 (80.3%) 16 (11.3%) 12 (8.5%) East Humacao 72 4   51 (75.0%) 11 (16.2%) 6 (8.8%) Loı’za 46 1 45 37 (82.2%) 4 (8.9%) 4 (8.9%) Patillas 26 2 24 21  7.5%) 1 (4.2%) 2 (8.3%) San Lorenzo 43 3 40 31 (77.5%) 7 (17.5%) 2 (5.0%) Subtotal 187 10 177 140 (79.1%) 23 (13.0%) 14 (7.9%) South Guayanilla 24 6 18 17 (94.4%) 0 1 (5.6%) Juana Dı’az 23 1 22 1 9 (86.4%) 2 (9.1%) 1 (4.5%) Pen˜ uelas 13 4 9 9 (100%) 0 0 Yauco 27 2 25 22 (88.0%) 0 3 (12.0%) Subtotal 87 13 74 67 (90.5%) 2 (2.7%) 5 (6.8%) West Aguadilla 26 2 24 23 (95.8%) 1 (4.2%) 0 Hormigueros 33 1 32 28 (87.5%) 2 (6.3%) 2 (6.3%) Moca 27 2 25 23 (92.0%) 0 2 (8.0%) San Sebastia’n 29 2 27 23 (85.2%) 1 (3.7%) 3 (11.1%) Subtotal 115 7 108 97 (89.8%) 4 (3.7%) 7 (6.5%)  Central Barranquitas 38 2 36 30 (83.3%) 5 (13.9%) 1 (2.8%) Cayey 31 1 30 22 (73.3%) 7 (23.3%) 1 (3.3%) Corozal 29 1 28 23 (82.1%) 4 (14.3%) 1 (3.6%) Jayuya 27 2 25 22 (88.0%) 3 (12.0%) 0 Subtotal 125 6 119 97 (81.5%) 19 (16.0%) 3 (2.5%) Total Total 1,067 81 986 800 (81.1%) 110  1.2%) 76 (7.7%)

L-defining 3592 Hpa I marker. The true haplogroup identities of the remaining two samples were determined from their HVR-I sequences. One haplogroup H sample had the 9-bp deletion between the tRNA Lys  and COII genes that characterizes haplogroup B, and one haplogroup C sample had the 7598 Hha I mutation that characterizes Asian haplogroup E. Their respective HVR-I sequences were 16093-16362 and 16221-16223-16261-16298-16325- 16327. Thus, the first lacked the transitions at positions 16189 and 16217 that characterize  haplogroup B (Ginther et al., 1993; Horai et al., 1993), and  the second possessed the  ogroup C-specific transitions at positions 16298 and 16327 as well as the Amerindian-specific  transition at 16325 (Torroni et al., 1993b). 

Samples that did not test positive for any haplogroup-defining marker were identified by  sequencing their HVR-I as well as specific sites in their coding regions. Nine (/ ) mtDNAs  were classified as L3* for having transitions at sites 10873 and 12705. The HVR-I sequences of two ( /) mtDNAs not having transitions at 10873 or 12705 were 16288-16311 and 16126-16189-16362. The first (/) mtDNA had a transition at site 11719 but not at 16223, and was thus classified as belonging to R. The transitions at sites 16126 and 16362 showed that the second (/) mtDNA belonged to JT or (pre-HV)1 (Macaulay et al., 1999). The absence of a  transition at 11719 showed that it belonged to (pre-HV)1 (Richards et al., 2000). Finally, the HVR-I sequence of one (/) sample that did not exhibit any haplogroup-defining marker was 16086-16183- 16189-16223-16278-16298-16325-16327. Thus, it contained the 16223, 16298,  16325, and 16327 transitions specific for Native American haplogroup C, and transitions  6183, 16189, 16223, and 16278, which are found in most haplogroup X mtDNAs (Brown et al.,  1998). However, it possessed the 10397 Alu I motif specific of macrohaplogroup M, to which haplogroup C but not haplogroup X belongs. This motif was shown to be very stable  Kivisild et al., 2002; Kong et al., 2003), and we thus regarded this mtDNA as belonging to  aplogroup C. One haplogroup C mtDNA lacking the 13262 Alu I marker was previously described from the Amazonian Makiritare (Torroni et al., 1993a), and (/) mtDNAs lacking defining markers for haplogroups C and D seem to be common in Colombia (Keyeux et al., 2002; Rodas et al., 2003). No mtDNAs belonging to haplogroups E, F, G, I, M, N, JT, W, or X were found in our set of 800 samples.

Haplogroup U subdivisions

Among all haplogroups found here, U is the only one that was reported in significant numbers in more than one continental region (Torroni et al., 1996). It was thus necessary to study such mtDNAs in more detail to identify their biological origin. The HVR-I sequence of the 27 samples belonging to haplogroup U segregates them into 10 types (Table 4). Although haplogroup U ismostly regarded as aWest Eurasian haplogroup, it is apparent that nine of these samples originate from sub-Saharan Africa. All share the same sequence type, which has not been found in Europe or the Near East despite the thousands of samples from these areas for which HVR-I was sequenced (Alves-Silva et al., 2000; Richards et al., 2000; Finnila‥ et al., 2001; Malyarchuk et al., 2002). However, it was found in one out of 60 Fulbe sequences (Watson et al., 1997), and in one of 38 and 23 Wolof and Serer sequences, espectively (Rando et al., 1998). We classify it as a member of clade U5b* because of its 16189, 16192, and 16270 motif (Richards et al., 2000). Its distinction is the addition of a transition at position 16320. We designate it as clade U5b2 to represent a sub-Saharan African clade with a transition at 16320 as its signature.

Eleven samples seem to originate from North Africa and the Canary Islands. Two samples sharing the same sequence exhibit the 16163 motif, which is diagnostic for the Native Canarian-specific clade U6b (Rando et al., 1999). Nine samples segregate into three North African sequence types. The most common type (16224-16270), comprising seven samples, may correspond to the 16093-16224-16270 type of apparently North African ancestry that was found in two Canarian Islands (Pinto et al., 1996; Rando et al., 1999), because our sequencing reactions did not extend to the left of the 16154 site in these samples. No other mtDNAs have been found with the 16093- 16224-16270 or the 16224-16270 sequence types elsewhere. Of the two remaining North African sequence types, one (16224-16261-16270) may have derived directly from the most common type, as it differs from it at only one site. The remaining one has been found mainly in North Africa, but also in the Near East and sub-Saharan Africa. Its highest frequency was reported in the Berber-speaking ozabites of northern Algeria: 10 out of 85 samples (Coˆrte-Real et al., 1996). Other populations with lower frequencies are Moroccan Berbers and non- Berbers (Pinto et al., 1996; Rando et al., 1998), Egyptians (Krings et al., 1999), Syrians (Richards et al., 2000), and some East and West African tribes (Watson et al., 1997). It was also found in two of 54 samples from Portugal (Coˆrte-Real et al., 1996), but we believe its presence in the Iberian Peninsula is due to migrations related to the slave trade.

Two samples share the motif 16189-16362. They likely belong to the U2 clade, which is characterized by the 16051 motif (Kivisild et al., 1999; Macaulay et al., 1999), a site to which our sequencing reaction did not extend. However, most West Eurasian U2 mtDNAs, but not other haplogroup U clades, present substitutions at positions 16129 and 16362. Since these samples do not present motifs that would classify them under any other clade, but possess the 16362 transition, they likely belong to clade U2. Clade U2 is virtually absent in North Africa and is found in the Near East at somewhat higher frequencies than in Europe. However, the precise sequence type is found at a higher frequency in the Iberian Peninsula than in any Near Eastern population except the Kurdish (Richards et al., 2000). PC analysis does not assign the Puerto Rican West Eurasian population a decisively higher affinity to the Kurdish or the European Mediterranean Western Region population (see below). Thus, we can only conclude that these samples should originate either in the Iberian Peninsula or the Near East.

The remaining four sequence types, encompassing only five samples, are likely of European origin. One differs from the CRS only by a transition at position 16192. Although the exact sequence type has not been reported elsewhere, it is regarded as European in origin because of the instability of the 16192 site in haplogroup U mtDNAs and the fairly high frequency of the otherwise resulting sequence type (CRS) in the Iberian Peninsula. The remaining three European sequence types are either particular U5b* types common only throughout Europe or belong to subclade U5a1a, which evolved in Europe (Richards et al., 2000).

L3* subdivisions

The HVR-I sequences of the nine (/ ) samples for which no haplogroup-specific markers were found are shown in Table 5. They segregate into eight sequence types sharing the 16223-16311 motif. Three of the sequence types possess the 16209 transition diagnostic of the L3f clade and the 16292 transition of subclade L3f1 (Salas et al., 2002). Only these three sequences showed a 1-bp deletion in the 5-bp T-stretch that runs from 15940–15944 in the CRS. This deletion is not the result of errors in the CRS (Andrews et al., 1999); it may play a significant role in RNA translation efficiency, as it makes the T*C arm loop of the tRNA hr only two nucleotides long, and may become a useful phylogenetic marker for group L3* clades or to further subdivide subclade L3f1.

Another sequence type contains the 16293T-16355-16362 motif of clade L3g. The four remaining sequence types encompass five samples and cannot be grouped into any L3* clade. These thus remain classified as L3*. However, four of these five samples seem to be not too distantly related, as they all share the 16256A transversion. Two of them also share transitions at positions 16129 and 16362.  

Geographic distribution of mtDNAs by biological ancestry

Little change is observed when biological ancestry frequencies are corrected by sample weight. Frequencies and 95% confidence intervals of 61.0 3.4% Amerindian, 27.5 3.1% African, 11.4  2.2% West Eurasian, and 0.1  0.2% Asian (Table 3) are corrected to 61.3 3.4% Amerindian, 27.2  3.1% African, 11.5  2.2% West Eurasian, and 0.0% Asian (Table 6). 

Amerindian mtDNAs are the most common in all  municipalities except Loı’za, where African mtDNAs are more frequent, and Cayey, where the population is equally divided into African and Amerindian mtDNAs. Amerindian mtDNA frequencies are 50% or higher in all unicipalities except Loı’za, San Juan, and Carolina (Table 3).

In addition, West Eurasian frequencies are low in all municipalities (0–17.9%). Thus, in a triangular graph with axes representing biological ancestries, ancestry frequencies cluster close to the vertex where the Amerindian frequency equals one, and scatter next to the side defined by zero West Eurasian frequency, toward the vertex where African frequency equals one (Fig. 2). A negative Pearson correlation ( 9.19) between African and  Amerindian frequencies is observed that is significant at the  0.01 level (two-tailed test). That is, the biological ancestry frequency of municipalities can be virtually described by stating only their African or Amerindian frequencies.

Figure 3 divides the 28 sampled municipalities into 12 categories according to their Amerindian mtDNA frequencies, and divides Puerto Rico by longitude 66°16 West, as 12 of the 13 sugar mills that worked throughout the 16th century were built east of it. It can be observed that the three municipalities with the lowest Amerindian frequencies are next to each other in San Juan and further east. Further, all 11 municipalities east of longitude 66¢X16  West are among the 14 municipalities with the lowest Amerindian frequencies. There is a highly significant deviation from the null hypothesis that frequencies for all ancestries are the same east and west of longitude 66¢X16  West (Pearson 2  43.70, df 2, 0.001). 2  tests also show highly significant deviations from null hypotheses of equal frequencies on each side of longitude 66°16 West for Amerindian (Pearson2  41.72, df  1, P  0.001) and African (Pearson 2  34.40, df  1, P  0.001) mtDNAs. African mtDNAs are more frequent in the east than in the west; the reverse is true for Amerindian  mtDNAs. No significant difference is found for West Eurasian mtDNAs. 

Interestingly, the geographic distribution by biological ancestry does not fit expectations based on traditional history that place Amerindians fleeing to the mountains and African slaves working in sugar plantations on the coasts. The three municipalities with the highest Amerindian frequencies are coastal (Fig. 3), and 2  tests show that Amerindian frequencies in noncoastal municipalities are not significantly higher than those in coastal ones, and that African frequencies are not significantly higher in coastal than noncoastal municipalities.