“I’m totally confused about citing online images,” he wrote. “Why do some cite ARK numbers, some cite paths or waypoints, and some cite neither?”
In the 20 years since we published our first guide to citing historical records,1 we’ve learned something well: If one person is confused about something and musters the courage to ask, there are thousands more who won’t ask. So, instead of answering our perplexed friend privately, we’re doing a public tutorial here.
Citing online digital images is complicated because of three things:
- We aren’t just citing an image. Yes, that’s the main thing we’re interested in, but we actually have multiple things to cite.
- Providers deliver their data in different ways. That affects how records have to be identified.
- Even the same provider will organize different collections in different ways. Those differences can change how we have to identify the collections—both our formats and our wording.
When you cite an online image copy of an original record book or original file (whether it's provided by FamilySearch, Ancestry, or any other entity that did not create the original record), you have at least two different things to identify. Each is a separate part of the same “citation sentence.” We refer to each part as a “layer” within the citation, and each layer is separated by a semicolon.
- In Layer 1, you cite the original that you are eyeballing as a digital image. You cite it the same way you would if you were using the original or microfilm of the original.
- In Layer 2, you cite the database and website that provided the images—prefaced by explanatory words such as "imaged at ... ." This layer is essential, even when you are citing common records such as censuses that can be accessed through various providers. Each provider has used its own image-enhancement processes. What you see in an image at one provider is not necessarily the same as what you will see in the "same" image at another provider.2
You might also have a Layer 3 in which you report whatever the provider gives as the source of its source. How to handle that issue is not part of this QuickLesson. We have already addressed it elsewhere.3
Citing the items in Layer 1 has its own complications, depending on the type of record we have found. EE’s chapters 3–11 discuss all that, with separate chapters for different types of historical records (censuses, courthouse property records, military files, etc.).
How to cite the providers in Layer 2—the website, its database, and the exact location of the image of interest—is the purpose of this tutorial.
With some image databases, the providers are super helpful. In addition to the image, they give us a search engine in which we can type our query. Then they deliver results in the form of (a) a list of hits or (b) a “typed” page with abstracts4 or extracts5 from the document that show details about our item of interest. We generically refer to the detail in both types of hits as a “database entry.”
Data provided by a database can be cited easily. The basic rule is this:
- A website, being a "standalone" publication, is cited like a book.
- A database at a website is cited like a chapter in a book.
- Database entries are cited in the field in which we cite the specific page of a book.
Following this rule gives us a basic pattern for citing online databases:
“Name of Database in Quotation Marks,” type of database material, Title of Website in Italics (URL=Place of Publication : date of publication or access), exact item of interest.
EE’s QuickSheet: Citing Ancestry Databases & Images (2d ed., 2017) provides this example for citing a database entry:
“World War I Draft Registration Cards, 1917–1918,” database with images, Ancestry (http://www.ancestry.com : accessed 1 January 2017), database entry for Clovis Julian, born 26 July 1887, New Orleans.
Some databases with images also have a simple structure that can be cited with this same pattern. All we have to do is change the description of our item of interest:
“World War I Draft Registration Cards, 1917–1918,” database with images, Ancestry (http://www.ancestry.com : accessed 1 January 2017), imaged card for Clovis Julian, no. 120, New Orleans Draft Board 13.
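The basic pattern above can be sketched as a small template. This is only an illustration, not a tool EE provides: the function name `format_database_citation` and its parameter names are hypothetical, and plain text cannot show the italics that the website title should carry.

```python
def format_database_citation(database, material_type, website, url, date_note, item):
    """Fill the basic pattern: "Database," type, Website (URL : date), item.

    All names here are illustrative assumptions, not part of any EE tool.
    """
    return f"“{database},” {material_type}, {website} ({url} : {date_note}), {item}."


# The database entry example from the Ancestry QuickSheet, rebuilt from its parts.
entry = format_database_citation(
    database="World War I Draft Registration Cards, 1917–1918",
    material_type="database with images",
    website="Ancestry",
    url="http://www.ancestry.com",
    date_note="accessed 1 January 2017",
    item="database entry for Clovis Julian, born 26 July 1887, New Orleans",
)
print(entry)
```

Changing only the `item` argument (for example, to the imaged card description) reproduces the second example, which is the point of the pattern: the first five fields stay fixed for a given database.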
However, things are changing—especially with the mega-providers.
ARK & PAL Citations
Over the past few years, FamilySearch has used two kinds of "persistent identifiers," ARK and PAL:6
“Oklahoma County Marriage Records,” database with images, FamilySearch (https://familysearch.org/ark:/61903/3:1:9Q97-Y3Q7-Z6Y?owc=waypoints&cc=1709399 : accessed 1 January 2017) …
You can actually see the term "ark" or "pal" in the URL, so you know you are dealing with a persistent identifier that (a) will take you directly to the page of interest and (b) optimistically will not develop link-rot.
However, three problems exist here:
- No identifier is ever really permanent. For example, the PALs used by FamilySearch for several years are being replaced by ARKs. In that changeover, part of the URL is also changed. In the example used above (an example EE has used since 2012), three sections of the URL have now changed. The new URL is https://familysearch.org/ark:/61903/1:1:MVXB-3VQ.7
- People make typos. Even when cutting-and-pasting, we often lose a digit or add one—thereby making a URL unworkable.
- In addition to citing the exact URL (a non-intuitive jumble of numbers and letters), we may also need to add information about the database’s organizational scheme so that (a) we can relocate the image when either of the first two problems occurs and (b) we can better understand the type of record we're working with.
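For readers who keep their citations in a spreadsheet or research database, a quick check like the following can flag which stored URLs rely on a persistent identifier. It is a minimal sketch that assumes the marker appears in the URL as "/ark:/" or "/pal:/"; the function name `identifier_kind` is hypothetical.

```python
import re


def identifier_kind(url):
    """Return 'ARK', 'PAL', or None, based on a marker in the URL.

    Assumption: the identifier shows up as '/ark:/' or '/pal:/' in the URL.
    """
    match = re.search(r"/(ark|pal):/", url, flags=re.IGNORECASE)
    return match.group(1).upper() if match else None


print(identifier_kind("https://familysearch.org/ark:/61903/1:1:MVXB-3VQ"))
print(identifier_kind("http://www.ancestry.com"))
```

A result of `None` does not mean the URL is unusable. It only signals that the citation will depend on a path and waypoints rather than on a persistent identifier.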
ARK + Path Citations
When we add the data that Point 3 calls for, we describe this as adding a “path” with “waypoints” to lead us to the image of the exact record we have cited in Layer 1. Sometimes the path is very short. Let's take the Oklahoma example, above, and place the path and its waypoints in their appropriate position: the last field of the citation, where we normally put our "item of interest."
“Oklahoma County Marriage Records,” database with images, FamilySearch (https://familysearch.org/ark:/61903/3:1:9Q97-Y3Q7-Z6Y?owc=waypoints&cc=1709399 : accessed 1 January 2017) > 1313685 (004532716) > image 479 of 711.
You will notice that each waypoint is introduced by a “greater than” sign (>), which tells us “the unit that came before this sign is greater than the unit that comes after this sign.” The two waypoints in this example are
- 1313685 (004532716), which identifies the roll of Family History Library (FHL) film that FamilySearch has imaged and (in parentheses) the new digitization project number.
- Image 479 of 711, which tells us where on that roll of film we can find the image of the record we will have cited in Layer 1.
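The waypoint convention described above lends itself to a short sketch: waypoints are listed from the largest unit to the smallest and joined with the "greater than" sign. The helper names `format_path` and `parse_image_waypoint` are hypothetical, and the "image N of M" wording is assumed to match what the provider displays.

```python
import re


def format_path(waypoints):
    """Join waypoints, largest unit first, with the 'greater than' sign."""
    return " > ".join(w.strip() for w in waypoints)


def parse_image_waypoint(waypoint):
    """Return (image_number, total_images) from text like 'image 479 of 711'."""
    match = re.fullmatch(r"image (\d+) of (\d+)", waypoint.strip(), flags=re.IGNORECASE)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


path = format_path(["1313685 (004532716)", "image 479 of 711"])
print(path)  # 1313685 (004532716) > image 479 of 711
print(parse_image_waypoint("Image 479 of 711"))  # (479, 711)
```

Keeping the waypoints as a list until the last moment makes it easy to add or drop a level when a provider reorganizes a collection.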
Now let's complicate things.
Path Citations without ARK or PAL
Many collections digitized online at FamilySearch (and now Ancestry, as well) are organized in a more complex fashion that creates numerous waypoints to cite, each of them nested inside of others.
EE 7.18 (3d ed. rev., 2017) shows us how to cite a set of registers from a church in St. Libory, Illinois.
The second layer of that citation, where we identify the provider and its database, looks like this:
…; accessed as “Illinois, Diocese of Belleville, Catholic Parish Records, 1729–1956,” browsable images, FamilySearch (https://familysearch.org/search/collection/1388122 : 1 April 2015), path: St. Clair County > St. Libory > St. Liborious > 1849–1862 Baptisms, First Communion, Confirmations > image 33 of 68.
Two elements of this layer differ from the format in the Oklahoma example.
- The type of database has changed. It's no longer "database with images." Now it's "browsable images." This term is used for a set of records with no indexing and no neatly typed abstract or extract of the record. We have to browse the images to find what we need.
- The URL pattern has changed. This is not an ARK or PAL citation. Because we are browsing a collection, instead of going straight to the image we want, the URL leads to the landing page for that collection. At FamilySearch, when we select a collection and go to its landing page, the URL typically includes the word "collection" followed by a slash and a collection number. We use this URL, optimistically assuming that the collection number will never change.
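As a rough illustration of the landing-page pattern just described, the collection number can be pulled out of such a URL with a simple pattern match. This sketch assumes the URL contains the word "collection" followed by a slash and digits, as in the St. Libory example; the function name `collection_number` is hypothetical.

```python
import re


def collection_number(url):
    """Return the digits that follow 'collection/' in a landing-page URL, or None."""
    match = re.search(r"/collection/(\d+)", url)
    return match.group(1) if match else None


print(collection_number("https://familysearch.org/search/collection/1388122"))  # 1388122
```

Recording the collection number separately gives us one more way to relocate the collection if the rest of the URL is ever restructured.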
You will also note that the path in this St. Libory citation is much longer. To get to the specific image in this database, we have five waypoints to identify, that is, five menu items through which we drill down to the image of interest. We cite these from the biggest to the smallest:
- St. Clair County: a link with a menu from which we choose →
- St. Libory: a link with a menu from which we choose →
- St. Liborious: a link with a menu from which we choose →
- 1849–1862 Baptisms, First Communion, Confirmations: actual images from which we choose →
- Image 33 of 68.
When we combine this citation to the provider (our Layer 2) with the Layer 1 citation of the actual source, we end up with this layered citation:
St. Liborious Church (St. Libory, Illinois), “Liber Baptismalis ab anno 1849 die 30 Murt. Usque ad initium anni 1863,” unnumbered pages, unnumbered entries in chronological order, “Elisabetham Aberle,” baptism, 12 November 1857; accessed as “Illinois, Diocese of Belleville, Catholic Parish Records, 1729–1956,” browsable images, FamilySearch (https://familysearch.org/search/collection/1388122 : 1 April 2015), path: St. Clair County > St. Libory > St. Liborious > 1849–1862 Baptisms, First Communion, Confirmations > image 33 of 68.
In this example, the semicolon marks the boundary between the two layers. From this you can easily see a critical point:
- All information that identifies the original record appears in Layer 1.
- All information that identifies the website and its database appears in Layer 2.
- Never, ever, should details from one layer be mixed into the other layer. For example, the original volume has pages. It does not have image numbers. Image numbers are a unit of the image database. That image number should not appear in the part of the citation in which we identify the original register. Nor should our discussion of pages (something we do find in original registers) appear in Layer 2 where we are discussing elements of the database.
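The layering rule can also be sketched in a few lines: build each layer separately, then join the layers with a semicolon. The function names `join_layers` and `mentions_image_number` are hypothetical, and the image-number check is only a crude heuristic for one kind of mixing, not a full validation of a citation.

```python
import re


def join_layers(*layers):
    """Join citation layers with '; ' and end the citation sentence with a period."""
    parts = [layer.strip().rstrip(";").rstrip() for layer in layers if layer.strip()]
    citation = "; ".join(parts)
    return citation if citation.endswith(".") else citation + "."


def mentions_image_number(text):
    """Heuristic: does the text contain a database unit such as 'image 33 of 68'?"""
    return re.search(r"\bimage \d+ of \d+\b", text, flags=re.IGNORECASE) is not None


layer1 = (
    "St. Liborious Church (St. Libory, Illinois), “Liber Baptismalis ab anno 1849 "
    "die 30 Murt. Usque ad initium anni 1863,” unnumbered pages, unnumbered entries "
    "in chronological order, “Elisabetham Aberle,” baptism, 12 November 1857"
)
layer2 = (
    "accessed as “Illinois, Diocese of Belleville, Catholic Parish Records, 1729–1956,” "
    "browsable images, FamilySearch (https://familysearch.org/search/collection/1388122 "
    ": 1 April 2015), path: St. Clair County > St. Libory > St. Liborious > "
    "1849–1862 Baptisms, First Communion, Confirmations > image 33 of 68"
)

# Image numbers belong to the database, so they should not appear in Layer 1.
if mentions_image_number(layer1):
    raise ValueError("Layer 1 should describe the original record, not the image database")

citation = join_layers(layer1, layer2)
print(citation)
```

Building the layers as separate strings is the design choice that matters here: it forces us to decide, for every detail, whether it describes the original record or the provider's database.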
Some website providers such as Ancestry are not currently using ARKs, PALs, or numbered collections. Even so, they may have complicated databases that require us to cite the path and its waypoints. EE’s QuickSheet: Citing Ancestry Databases & Images (2d ed., 2017) provides this example for creating a Layer 2 when citing a ship’s crew list:
…; imaged in “California Passenger and Crew Lists, 1882–1959,” database with images, Ancestry (http://www.ancestry.com : accessed 1 January 2017) > M1416 – San Francisco, 1905–1954 > 012 > image 656 of 1016.
The actual URL Ancestry generates for the image is this:
Citing all that would be foolish.8 Yes, we could use a site such as Bitly to shorten the citation. Or, more simply, we could just cite the short URL to Ancestry's home page: http://www.ancestry. There we can query for the database, after which we have three waypoints to follow:
- M1416 – San Francisco, 1905–1954 (the publication number and name of the National Archives microfilm that Ancestry has imaged)
- 012 (the specific roll of film)
- Image 656 of 1016 (the specific image for the specific page of the specific document we're interested in)
When we add this Layer 2 to our citation of the original record (Layer 1), the result is this:
“Alien Crew List,” S.S. Arrino (Lota, Chile, to San Francisco), arriving 7 October 1913, p. 11, C. S. Dendy; imaged in “California Passenger and Crew Lists, 1882–1959,” database with images, Ancestry (http://www.ancestry.com : accessed 1 January 2017) > M1416 – San Francisco, 1905–1954 > 012 > image 656 of 1016.
In the twenty years since we published our first guide to citing historical resources, technology has made it so much easier to access the records that are our lifeblood. At the same time, it has also complicated the one part of research that makes most people cringe: citation.
Source identification is now more complicated. What was once just tedious is now often baffling. Yet, the very fact that historical records are being processed in so many different ways—ways that can change their content, their legibility, and their dependability—makes it all the more critical that our citations identify precisely what we have used.
The old concept that "We cite sources so others know where we found our stuff" isn't good enough for reliable researchers. We know that the real reason for investing this effort is to ensure that we use the most reliable sources possible. We accomplish that by carefully studying what it is we are using and recording enough details that we can accurately evaluate reliability any time we have conflicting information.
The tedium of citing materials as random as historical records will likely always exist. But the task does not have to be baffling. If we take the time to learn basic patterns and to understand why variations exist, we can construct our own source identifications even when EE or one of its QuickSheets is not at hand.
1. Elizabeth Shown Mills, Evidence! Citation & Analysis for the Family Historian (Baltimore: GPC, 1997).
2. If you aren't convinced of this point yet, see "It's What You Don't See," Ancestry Magazine 27 (May/June 2009): 34–35; archived at Google Books (http://bit.ly/2uLMrJr). This article features images of census pages from Manchester, England, 1861—showing before-and-after photo enhancement by Ancestry. What was utterly unreadable became quite readable under a combination of infrared, ultraviolet, fluorescent, and incandescent lighting and a skillfully modified camera.
3. For a tutorial on how to handle the source-of-our-source element of a citation, see "The Source of the Source, of Course, of Course," blog posting, Evidence Explained: Historical Analysis, Citation & Source Usage (https://www.evidenceexplained.com/quicktips/source-source-course-course : posted 6 March 2015).
4. Abstracts, in a notetaking context, are condensed versions of a record that preserve all important details in original sequence. An abstract may contain verbatim extracts from the record, in which case those exactly copied words are placed in quotation marks.
5. Extracts are portions of text quoted verbatim out of a record; they always should be enclosed in quotation marks when integrated into an abstract or other piece of writing. (Most online databases will extract data without quotation marks.) Unlike a transcript, an extract does not represent the complete record; but it is more precise than an abstract. For a tutorial on the differences between various other types of derivatives and research notes, see "QuickLesson 10: Original Records, Image Copies, and Derivatives," Evidence Explained (https://www.evidenceexplained.com/content/quicklesson-10-original-records-image-copies-and-derivatives : posted 28 July 2012).
6. The digital world (particularly within institutions) has developed a number of "persistent identifiers" other than ARKs (Archival Resource Keys) and PALs (Persistent Archival Locators). We also find DOIs (Digital Object Identifiers), PURLs (Persistent Uniform Resource Locators), URNs (Uniform Resource Names), XRIs (Extensible Resource Identifiers), and the Handle System. This QuickLesson focuses on the two systems used by the largest providers of digitized historical records.
7. Amid changeovers of this type, researchers immediately wonder whether they need to go back through their own research databases or notes to revise each citation that is affected. FamilySearch at this point reports that the PAL links will remain "permanently" operable.
8. EE user Robert Laurens offers other suggestions for parsing and shortening these long Ancestry URLs; see his responses to the thread "Using Permalinks in Citations," begun by K Britanik in EE's Citation Issues Forum, under dates of 9–13 March and 20 April 2017.
Posted 30 July 2017