YaleMakes/data-and-utilities

 
 


Digital Collections Public Domain Item Data and Tools

Did you know that nearly one-third of the items in our Digital Collections are in the public domain -- that is, they have been designated as having no known U.S. copyright restrictions? This means that everyone has the freedom to enjoy and reuse these materials in almost limitless ways. To help you explore, visualize, and repurpose these items, we've gathered all of their metadata into a single data release. (Based on feedback from this release, we'll be considering regular update possibilities, but at this time the data is a snapshot of our data from 12/30/15. See the NYPL Digital Collections Metadata API for updated information and for data about the non-public domain portions of our Digital Collections.)

This dataset is organized by Items and Collections in both CSV and JSON formats. Our descriptive metadata is normally stored in the MODS schema (which is what you'll find in our Digital Collections API), but for this release we've simplified and flattened the metadata structure for CSV to make it easier to navigate with spreadsheet tools. The JSON versions include a bit more metadata, including URIs for many names and subjects and links to the full-size images comprising each item.

NYPL has been digitizing collections since 1999, so our metadata reflects an evolution of standards, practices, and workflows. We are actively refining our metadata creation and quality control processes and exploring ways to improve the consistency and accuracy of our legacy metadata, but in the meantime, you may find some idiosyncrasies and curiosities in our data. If you'd like to bring certain issues to our attention, we welcome your feedback through our Digital Collections feedback form.

Items

Items are distinct intellectual or bibliographic entities in Digital Collections. They can be photographs, full books or illustrations from books, journals, letters, pamphlets, skull fragments, cuneiform tablets, and much more. Most items belong to collections, but there are some (usually books) that stand alone. Items are made up of one or more images (or "captures").

Below are the metadata fields you'll find in the CSV and JSON files that describe our public domain items. In the CSV file, arrays of strings are represented as a single pipe-delimited string, like "Wollrabe, Amalie | Helmerding, Carl, 1822-1899".
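If you work with the CSV files in code, you will usually want to turn those pipe-delimited cells back into lists. Below is a minimal Python sketch that assumes the separator is a pipe, possibly surrounded by spaces, as in the example above. `split_multi` is a hypothetical helper name, and the inline one-row sample stands in for a real file.

```python
import csv
import io


def split_multi(value: str) -> list[str]:
    """Split a pipe-delimited CSV cell into a list of trimmed strings.

    Empty cells become an empty list, mirroring the empty arrays in the JSON files.
    Assumes no individual value contains a literal pipe character.
    """
    if not value or not value.strip():
        return []
    return [part.strip() for part in value.split("|") if part.strip()]


# Hypothetical one-row sample using column names from the table below.
sample = io.StringIO(
    "UUID,Title,Contributor\n"
    '17159270-c556-012f-af61-58d385a7bc34,Example,"Wollrabe, Amalie | Helmerding, Carl, 1822-1899"\n'
)

for row in csv.DictReader(sample):
    contributors = split_multi(row["Contributor"])
    print(contributors)  # ['Wollrabe, Amalie', 'Helmerding, Carl, 1822-1899']
```

The csv module handles the commas inside the quoted cell, so only the pipes need to be split by hand.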

| Field | Description | CSV | JSON | Value |
|---|---|---|---|---|
| UUID | The unique identifier for the item. This is always present. | UUID | UUID | string |
| Database ID | The item's identifier from our database. This is always present. | Database ID | databaseID | integer |
| Title | The primary title of the item. This should always be present. | Title | title | string |
| Alternative title | Any additional or alternative titles for the item. | Alternative Title | alternativeTitle | array of strings |
| Contributor | A list of people or organizations who contributed to the creation of the item. An item may have zero or more contributors. | Contributor | contributor | array of strings (CSV) / array of objects (JSON) |
| Contributor name (JSON) | The name of the contributor. | - | contributor.contributorName | string |
| Contributor type (JSON) | The type of contributor name. This is usually 'personal' for individuals or 'corporate' for groups or organizations. | - | contributor.contributorType | string |
| Contributor role (JSON) | The role(s) the contributor played in the creation of the item, if available. | - | contributor.contributorRole | array of strings |
| Contributor URI (JSON) | The VIAF URI for the contributor, if available. | - | contributor.contributorURI | string |
| Date | The date the item was originally created or published. Zero or more dates may be recorded. Many dates are encoded in YYYY-MM-DD format, but others are in free text format, like "ca. 1890", "1760-1770?" or "1920s". | Date | date | array of strings |
| Date start | The earliest date recorded for the item. This could be the earliest of multiple single dates or the start date of a date range. | Date Start | dateStart | string |
| Date end | The latest date recorded for the item. This could be the latest of multiple single dates or the end date of a date range. | Date End | dateEnd | string |
| Language | The language of the item. An item may have zero or more languages. | Language | language | array of strings |
| Description | A summary or description of the contents of the item, if available. | Description | description | string |
| Note | A list of notes associated with the item. In the CSV, each note in the list is prefaced by a label denoting the note type, like "General Note:". | Note | note | array of strings (CSV) / array of objects (JSON) |
| Note type (JSON) | The type of note. | - | note.type | string |
| Note text (JSON) | The text of the note. | - | note.text | string |
| Topical subject | A list of topical subjects. Most terms are taken from LCSH or LCTGM. Complex subject headings are usually broken down into individual subjects. | Subject Topical | subjectTopical | array of strings (CSV) / array of objects (JSON) |
| Topical subject text (JSON) | The text of the topical subject. | - | subjectTopical.text | string |
| Topical subject URI (JSON) | A URI for the topical subject, if available. | - | subjectTopical.URI | string |
| Name subject | A list of people or organizations described or depicted in the contents of the item. Most terms come from the LC Name Authority File. | Subject Name | subjectName | array of strings (CSV) / array of objects (JSON) |
| Name subject text (JSON) | The name of the subject. | - | subjectName.text | string |
| Name subject URI (JSON) | A URI for the name subject, if available. | - | subjectName.URI | string |
| Geographic subject | A list of places described or depicted in the contents of the item. Most terms come from LCNAF and LCSH. | Subject Geographic | subjectGeographic | array of strings (CSV) / array of objects (JSON) |
| Geographic subject text (JSON) | The name of the place. | - | subjectGeographic.text | string |
| Geographic subject URI (JSON) | A URI for the geographic subject, if available. | - | subjectGeographic.URI | string |
| Temporal subject | A list of time periods related to the contents of the item. Many terms come from LCSH. | Subject Temporal | subjectTemporal | array of strings (CSV) / array of objects (JSON) |
| Temporal subject text (JSON) | The text of the time period. | - | subjectTemporal.text | string |
| Temporal subject URI (JSON) | A URI for the temporal subject, if available. | - | subjectTemporal.URI | string |
| Title subject | A list of titles described or depicted in the contents of the item. Most terms come from LCNAF. | Subject Title | subjectTitle | array of strings (CSV) / array of objects (JSON) |
| Title subject text (JSON) | The text of the title. | - | subjectTitle.text | string |
| Title subject URI (JSON) | A URI for the title subject, if available. | - | subjectTitle.URI | string |
| Type of resource | A list of broad resource types categorizing the content of the resource. Terms are drawn from the following list: text, still image, moving image, cartographic, notated music, sound recording, three dimensional object, mixed material. | Resource Type | resourceType | array of strings |
| Genre | A list of terms that describe the nature of the content or function of the resource at a greater level of specificity than Type of Resource. This field is currently very uncontrolled, with some terms representing physical form and many items without genre terms at all. Most terms are taken from LCTGM, with some coming from AAT, LCSH, and LCGFT. | Genre | genre | array of strings (CSV) / array of objects (JSON) |
| Genre text (JSON) | The text of the genre. | - | genre.text | string |
| Genre URI (JSON) | The URI of the genre, if available. | - | genre.URI | string |
| Identifier - Bnumber | The catalog identifier, if the item is represented in the NYPL catalog. | Identifier BNumber | identifierBNumber | string |
| Identifier - Accession number | The accession number of the item, if available. | Identifier Accession Number | identifierAccessionNumber | string |
| Identifier - Call number | The call number of the physical item, if available. | Identifier Call Number | identifierCallNumber | string |
| Identifier - ISBN | The ISBN of the item, if available. | Identifier ISBN | identifierISBN | string |
| Identifier - ISSN | The ISSN of the item, if available. | Identifier ISSN | identifierISSN | string |
| Identifier - Interview ID | The interview ID of interview items, if available. | Identifier Interview ID | identifierInterviewID | string |
| Identifier - Postcard ID | The publisher series number of the postcard item, if available. | Identifier Postcard ID | identifierPostcardID | string |
| Identifier - LCCN | The LCCN of the item, if available. | Identifier LCCN | identifierLCCN | string |
| Identifier - OCLC/RLIN | The OCLC or RLIN number of the item, if available. | Identifier OCLC/RLIN | identifierOCLCRLIN | string |
| Physical description - Extent | The number and dimensions of the physical item, if available. | Physical Description Extent | physicalDescriptionExtent | array of strings |
| Physical description - Form | A list of terms describing the physical format or medium of the item. | Physical Description Form | physicalDescriptionForm | array of strings |
| Publisher | The publisher of the item content. | Publisher | publisher | array of strings |
| Place of publication | The place(s) where the item was created or published. | Place Of Publication | placeOfPublication | array of strings |
| Collection UUID | The UUID of the item's parent collection, if applicable. | Collection UUID | collectionUUID | string |
| Container UUID | The UUID of the item's immediate parent container, if applicable. Use this identifier to find an item's parent collection metadata in the Collections data. | Container UUID | containerUUID | string |
| Collection title | The title of the item's parent collection, if applicable. | Collection Title | collectionTitle | string |
| Container title | The title of the item's immediate parent container, if applicable. | ContainerTitle | containerTitle | string |
| Parent hierarchy | The hierarchy of the item's direct ancestors, from collection to item. | Parent Hierarchy | parentHierarchy | string |
| Number of captures | The number of images comprising the item. | Number of Captures | numberOfCaptures | integer |
| First image (CSV) | A link to the full-size image of the item's first capture. | First Image | - | string |
| Captures (JSON) | A list of links to the full-size JPEG images for an item's captures. | - | captures | array of strings |
| Digital Collections URL | A link to the item in Digital Collections. | Digital Collections URL | digitalCollectionsURL | string |
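Because `dateStart` and `dateEnd` mix `YYYY-MM-DD` strings, bare years, and free text (the collection example below even stores them as numbers), filtering by date usually means extracting a year first. Here is a rough Python sketch that assumes the first four-digit run in a value is the year, so "ca. 1890" becomes 1890 and "1920s" becomes 1920. `start_year` and `in_range` are hypothetical helpers, and the sample items are made up.

```python
import re
from typing import Optional, Union

# First run of four digits, assumed to be a year.
YEAR = re.compile(r"\d{4}")


def start_year(value: Union[str, int, None]) -> Optional[int]:
    """Return the first four-digit year found in a date value, or None."""
    if value is None:
        return None
    if isinstance(value, int):
        return value
    match = YEAR.search(value)
    return int(match.group()) if match else None


def in_range(item: dict, first: int, last: int) -> bool:
    """True if the item's dateStart year falls within [first, last]."""
    year = start_year(item.get("dateStart"))
    return year is not None and first <= year <= last


# Hypothetical sample items.
items = [
    {"title": "A", "dateStart": "1890-05-01"},
    {"title": "B", "dateStart": "ca. 1760"},
    {"title": "C", "dateStart": None},
]
print([i["title"] for i in items if in_range(i, 1800, 1899)])  # ['A']
```

Items with no usable date are excluded rather than guessed at. You may want to report them separately.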

Example item (JSON version):

{
  "UUID": "17159270-c556-012f-af61-58d385a7bc34",
  "databaseID": 3384249,
  "title": "Norway and Sweden, 1895 [part 1].",
  "alternativeTitle": [],
  "contributor": [
    {
      "contributor": "Vinkhuijzen, Hendrik Jacobus",
      "contributorType": "personal",
      "contributorRole": [
        "Collector"
      ],
      "contributorURI": null
    }
  ],
  "date": [],
  "dateStart": null,
  "dateEnd": null,
  "language": [
    "English"
  ],
  "description": [],
  "note": [],
  "subjectTopical": [
    {
      "text": "Military uniforms",
      "URI": "http://id.loc.gov/authorities/subjects/sh85139693"
    },
    {
      "text": "History",
      "URI": "http://id.loc.gov/authorities/subjects/sh85061212"
    }
  ],
  "subjectName": [],
  "subjectGeographic": [],
  "subjectTemporal": [],
  "subjectTitle": [],
  "resourceType": [
    "still image"
  ],
  "genre": [],
  "identifierBNumber": null,
  "identifierAccessionNumber": null,
  "identifierCallNumber": null,
  "identifierISBN": null,
  "identifierISSN": null,
  "identifierInterviewID": null,
  "identifierPostcardID": null,
  "identifierLCCN": null,
  "identifierOCLCRLIN": null,
  "physicalDescriptionExtent": [],
  "physicalDescriptionForm": [],
  "publisher": [],
  "placeOfPublication": [],
  "collectionUUID": "51894d20-c52f-012f-657d-58d385a7bc34",
  "containerUUID": "11eb5c20-c556-012f-f8bb-58d385a7bc34",
  "collectionTitle": "Prints and drawings collected by H.J. Vinkhuijzen",
  "containerTitle": "Norway and Sweden, 1895 [part 1].",
  "parentHierarchy": "Prints and drawings collected by H.J. Vinkhuijzen / Norway and Sweden. / Norway and Sweden, 1895 [part 1].",
  "numberOfCaptures": 1,
  "captures": [
    "http://images.nypl.org/index.php?id=437185&t=g"
  ],
  "digitalCollectionsURL": "http://digitalcollections.nypl.org/items/17159270-c556-012f-af61-58d385a7bc34"
}
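Here is a short Python sketch that reads the nested JSON fields from a trimmed copy of the example item above. Splitting `parentHierarchy` on " / " is an assumption based on this single example. A title that itself contains " / " would be split incorrectly.

```python
import json

# Trimmed copy of the example item above.
item = json.loads("""
{
  "UUID": "17159270-c556-012f-af61-58d385a7bc34",
  "title": "Norway and Sweden, 1895 [part 1].",
  "subjectTopical": [
    {"text": "Military uniforms", "URI": "http://id.loc.gov/authorities/subjects/sh85139693"},
    {"text": "History", "URI": "http://id.loc.gov/authorities/subjects/sh85061212"}
  ],
  "parentHierarchy": "Prints and drawings collected by H.J. Vinkhuijzen / Norway and Sweden. / Norway and Sweden, 1895 [part 1].",
  "captures": ["http://images.nypl.org/index.php?id=437185&t=g"]
}
""")

# Map each topical subject to its URI (None when no URI was recorded).
subjects = {s["text"]: s.get("URI") for s in item["subjectTopical"]}

# Break the hierarchy into levels, from collection down to the item.
levels = item["parentHierarchy"].split(" / ")

print(subjects["History"])   # http://id.loc.gov/authorities/subjects/sh85061212
print(levels[0])             # Prints and drawings collected by H.J. Vinkhuijzen
print(len(item["captures"])) # 1
```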

Collections

Collections in Digital Collections usually represent physical collections at NYPL. These can be the personal papers of an individual or organization, like the United States Sanitary Commission Records, artifacts belonging to a prolific collector, like the Thomas Addis Emmet Collection, items collected around a certain subject or genre, like Prints Depicting Dance or Maps of North America, or works of art the library holds of a particular artist, like Berenice Abbott's Changing New York. Sometimes a "collection" can be a book or object that has its own distinct intellectual items described further within, like Apartment Houses of the Metropolis.

Metadata included in the collections files follows the same format as for items, with a few exceptions. Instead of 'Number of captures', 'First image', and 'Captures', the collections data includes the following field:

| Field | Description | CSV | JSON | Value |
|---|---|---|---|---|
| Number of items | The number of public domain items contained in the collection. | Number of Items | numberOfItems | integer |

The collections data also omits 'Parent hierarchy', 'Collection UUID', 'Container UUID', 'Collection title', and 'Container title'. Each collection's own UUID and title are given in 'UUID' and 'Title'.
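Items and collections can be joined through an item's collection UUID and a collection's own UUID. The minimal Python sketch below assumes both files have already been loaded as lists of dicts using the JSON field names. The tiny inline lists are hypothetical stand-ins for real data. It also counts items per collection and compares the result with `numberOfItems`.

```python
from collections import Counter

# Hypothetical stand-ins for the loaded collections and items files.
collections = [
    {"UUID": "c1", "title": "Collection one", "numberOfItems": 2},
    {"UUID": "c2", "title": "Collection two", "numberOfItems": 1},
]
items = [
    {"UUID": "i1", "collectionUUID": "c1"},
    {"UUID": "i2", "collectionUUID": "c1"},
    {"UUID": "i3", "collectionUUID": None},  # a stand-alone item
]

# Index collections by UUID so each item lookup is a dict access.
by_uuid = {c["UUID"]: c for c in collections}

# Count items per parent collection, skipping stand-alone items.
counts = Counter(i["collectionUUID"] for i in items if i.get("collectionUUID"))

for uuid, coll in by_uuid.items():
    found = counts.get(uuid, 0)
    status = "ok" if found == coll["numberOfItems"] else "mismatch"
    print(f'{coll["title"]}: {found} of {coll["numberOfItems"]} ({status})')
# Collection one: 2 of 2 (ok)
# Collection two: 0 of 1 (mismatch)
```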

Example collection (JSON version):

{
  "UUID": "954eecd0-c5bf-012f-9413-58d385a7bc34",
  "databaseID": 25812,
  "title": "Samuel J. Tilden papers, 1794-1886, bulk (1835-1876).",
  "alternativeTitle": [],
  "contributor": [
    {
      "contributor": "Tilden, Samuel J. (Samuel Jones) (1814-1886)",
      "contributorType": "personal",
      "contributorRole": [
        "Author"
      ],
      "contributorURI": "http://viaf.org/viaf/28125745"
    }
  ],
  "date": [
    1794
  ],
  "dateStart": 1794,
  "dateEnd": 1886,
  "language": [],
  "description": [],
  "note": [
    {
      "type": "ownership",
      "text": "1903 Tilden, Samuel J. - Estate & Trust Gift and purchase"
    },
    {
      "type": "biographical/historical",
      "text": "Samuel Jones Tilden (1814-1886) was an attorney, prominentDemocrat, governor of New York in 1874-1875, and U.S. presidential candidate in 1876."
    },
    {
      "type": "content",
      "text": "The Tilden papers are comprised of correspondence, political and legal files, financial documents, writings, speeches, and personal papers documenting the political and legal career of Samuel J. Tilden. Material dates from 1785 - 1929 (bulk 1832 - 1886)."
    },
    {
      "type": "ownership",
      "text": "MSS 86M75"
    }
  ],
  "subjectTopical": [],
  "subjectName": [
    {
      "text": "New York Public Library",
      "URI": null
    },
    {
      "text": "Tammany Hall",
      "URI": null
    },
    {
      "text": "Tilden, Samuel J. (Samuel Jones), 1814-1886",
      "URI": null
    }
  ],
  "subjectGeographic": [],
  "subjectTemporal": [],
  "subjectTitle": [],
  "resourceType": [
    "mixed material"
  ],
  "genre": [
    {
      "text": "Documents",
      "URI": "http://id.loc.gov/vocabulary/graphicMaterials/tgm003185"
    },
    {
      "text": "Correspondence",
      "URI": "http://id.loc.gov/vocabulary/graphicMaterials/tgm002590"
    },
    {
      "text": "personal papers",
      "URI": ""
    }
  ],
  "identifierBNumber": "b11652246",
  "identifierAccessionNumber": null,
  "identifierCallNumber": "MssCol 2993",
  "identifierISBN": null,
  "identifierISSN": null,
  "identifierInterviewID": null,
  "identifierPostcardID": null,
  "identifierLCCN": null,
  "identifierOCLCRLIN": "NYPW92-A241",
  "physicalDescriptionExtent": [
    "49.4 linear feet (99 boxes, 13 v.)"
  ],
  "physicalDescriptionForm": [],
  "publisher": [],
  "placeOfPublication": [],
  "numberOfItems": 1561,
  "digitalCollectionsURL": "http://digitalcollections.nypl.org/collections/954eecd0-c5bf-012f-9413-58d385a7bc34"
}
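In the JSON files, each note keeps its type in a separate field, whereas the CSV prefixes each note with a label. The Python sketch below groups the example collection's notes by type. The note texts are abbreviated with "..." for brevity.

```python
from collections import defaultdict

# The "note" array from the example collection above (long texts abbreviated).
notes = [
    {"type": "ownership", "text": "1903 Tilden, Samuel J. - Estate & Trust Gift and purchase"},
    {"type": "biographical/historical", "text": "Samuel Jones Tilden (1814-1886) was an attorney, ..."},
    {"type": "content", "text": "The Tilden papers are comprised of correspondence, ..."},
    {"type": "ownership", "text": "MSS 86M75"},
]

# Collect note texts under their type, preserving the original order.
by_type = defaultdict(list)
for note in notes:
    by_type[note["type"]].append(note["text"])

print(by_type["ownership"])
# ['1903 Tilden, Samuel J. - Estate & Trust Gift and purchase', 'MSS 86M75']
```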

Attribution

NYPL's bibliographic metadata records provided via this repository are distributed under a Creative Commons CC0 1.0 Universal Public Domain Dedication ("CC0 1.0 Dedication").

Code Examples

We've included a few example scripts and utilities to help you get started digging into the data made available in this repository.

Pull Requests and Issues

Are you doing cool things with our public domain items? Have a script or utility you'd like to share? We welcome your pull requests for code examples that can help others access, reuse, and remix our data and images.

While we appreciate your support in cleaning up our legacy data, we are not able to accept pull requests to our CSV and JSON datasets. We are actively working to improve the consistency and completeness of our data within Digital Collections, and we rely on you to help point out factual inaccuracies. If you would like to contribute corrections to our data, please send us your feedback through the form at Digital Collections or by email to [DigitalCollections@nypl.org](mailto:DigitalCollections@nypl.org).

If you have suggestions for how we can improve this data documentation, please let us know in Issues.


About the NYPL Public Domain Release

On January 6, 2016, The New York Public Library enhanced access to public domain items in Digital Collections so that everyone has the freedom to enjoy and reuse these materials in almost limitless ways. For all such items the Library now makes it possible to download the highest resolution images available directly from the Digital Collections website.

That means more than 187,000 items free to use without restriction! But we know that 180K of anything is a lot to get your head around — so as a way to introduce you to these collections and inspire new works, NYPL Labs developed a suite of projects and tools to help you explore the vast collections and dive deep into specific ones.

Go forth, reuse, and let us know what you made with the #nyplremix hashtag! For more information:

About

Snapshot of Item and Collection data for public domain materials in NYPL Digital Collections, as part of NYPL's January 2016 public domain release.
