Data Export Best Practice Documentation
On http://swisscollections.ch, you can export your search results as a CSV file or as a data package. This documentation shows how to use the Metadata package (ZIP) option.
Quickstart
Please unzip the downloaded data package before you proceed.
Your data package contains title metadata in the following formats:
CSV
JSON Lines
MARCXML
Please note the swisscollections terms of use.
If your search results contain digitized titles, you can download image, text and PDF files in a second step. There are two ways to do this:
use your own script to iterate over CSV or JSON Lines files containing download links
use DownThemAll! (recommended) or a similar download manager browser plugin → DownThemAll! is available for Firefox, Chrome and Edge
For the second option, please open the file START_HERE.html in your browser and follow the instructions.
Detailed documentation
Your downloaded swisscollections data package contains two main folders:
data
ignore
You can ignore the ignore folder for now. All metadata files are inside the data folder.
Title metadata
You can find your title metadata in the formats CSV, JSON Lines and MARCXML inside the data folder:
data
csv
metadata
media_file_links
jsonl
marcxml
The jsonl, marcxml, csv/metadata and csv/media_file_links folders may contain multiple files with chunks of 200 title records each. In order to process the metadata of your whole result list, you could, for instance, write a Python script that iterates over all the files inside the jsonl folder.
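As a minimal sketch of such a script, the following Python function reads every chunk file in the jsonl folder and yields one dictionary per title record. It assumes that the chunk files carry the extension .jsonl and are UTF-8 encoded; the function name iter_title_records is only illustrative.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_title_records(jsonl_dir: Path) -> Iterator[dict]:
    """Yield one dict per title record from every .jsonl file in jsonl_dir."""
    # Assumption: chunk files end in ".jsonl". Sorting keeps the order stable.
    for path in sorted(jsonl_dir.glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # tolerate blank lines
                    yield json.loads(line)
```

For example, sum(1 for _ in iter_title_records(Path("data/jsonl"))) would count the title records of the whole result list, regardless of how many chunk files there are.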
For an overview of all possible title metadata elements in the csv/metadata/ and jsonl/ files, see the underlying template. Please note: If a metadata element, e.g. "digital_platform", is absent from a given title record, the JSON key does not appear in the respective .jsonl line. The only exception to this rule is "page_download_links", which is always present and contains an empty list if there are no download links at all.
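In practice, this means that optional elements are best read with a lookup that tolerates missing keys, while "page_download_links" can be accessed directly. The sketch below illustrates this on records that have already been parsed into Python dictionaries. The function name and the sample values are hypothetical, and the structure of the individual entries inside "page_download_links" is not assumed.

```python
from typing import Iterable


def summarize_records(records: Iterable[dict]) -> dict:
    """Count records that name a digital platform and records with download links."""
    total = with_platform = with_links = 0
    for record in records:
        total += 1
        # Optional element: the key is missing entirely when there is no value,
        # so use .get() instead of record["digital_platform"].
        if record.get("digital_platform") is not None:
            with_platform += 1
        # Always present; an empty list means there are no download links.
        if record["page_download_links"]:
            with_links += 1
    return {"total": total, "with_platform": with_platform, "with_links": with_links}


# Hypothetical sample records, for illustration only.
sample = [
    {"digital_platform": "e-rara", "page_download_links": ["placeholder"]},
    {"page_download_links": []},
]
summary = summarize_records(sample)
```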
Terms of use
Please note the swisscollections terms of use. The following title metadata elements inside the data/csv/metadata/ and data/jsonl/ files contain important additional information:
"metadata_copyright"
→ contains the copyright of the metadata, usually a CC0 or other CC license. This element is present only in some records. It contains a sentence like “The catalog data is available for further use under a CC0 license.”
"digital_reproduction_copyright"
→ contains the copyright for the digital object according to the licensing information from e-rara or e-manuscripta. Example: "https://creativecommons.org/publicdomain/mark/1.0/"
"digital_reproduction_rights_owner"
→ contains the rights owner from e-rara or e-manuscripta. Example: "Universitätsbibliothek Basel"
You can find the above title metadata elements in the files inside csv/metadata/ as well as inside jsonl/. See also our metadata JSON template.
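To get an overview of the rights statements in a result list before reusing the data, one option is to tally the values of these three elements. The following is only a sketch: it works on records already parsed from the jsonl/ files, converts each value to a string for counting, and uses the hypothetical label "(absent)" for records in which an element is missing.

```python
from collections import Counter
from typing import Iterable

RIGHTS_KEYS = (
    "metadata_copyright",
    "digital_reproduction_copyright",
    "digital_reproduction_rights_owner",
)


def tally_rights(records: Iterable[dict]) -> dict:
    """Count how often each value occurs for each rights-related element."""
    counts = {key: Counter() for key in RIGHTS_KEYS}
    for record in records:
        for key in RIGHTS_KEYS:
            # Elements may be missing from a record, so fall back to a label.
            counts[key][str(record.get(key, "(absent)"))] += 1
    return counts
```

Calling tally_rights(records)["digital_reproduction_copyright"].most_common() then lists the license values in the package, most frequent first.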
Media file download
If your search results contain digitized titles, your swisscollections data package enables you to download image and text files as well as PDFs in a second step.
Disclaimer
swisscollections is a meta catalogue and does not serve image or text files. You can find download links inside your swisscollections data package, but the media files are downloaded directly from the platforms e-rara.ch and e-manuscripta.ch. swisscollections cannot guarantee that every single download is successful.
Download manager
In order to download digitized image and text pages or PDFs, the easiest way is to open the file START_HERE.html in your browser and follow the instructions.
You will be guided to web pages containing download links for image files, PDFs and, where available, text files. (Audio and video files are currently not available.) With the help of a download manager, you can easily download the image, text and PDF files. We recommend the browser extension DownThemAll!, which is available for Firefox, Chrome and Edge. It is a Recommended Firefox extension and does not include any user tracking.
Set up the download manager DownThemAll!
Go to DownThemAll! and get the extension for your browser.
Pin the extension to your browser’s toolbar. For instructions on how to do this in Firefox, please see the FAQ.
Configure DownThemAll!
In Chrome and Edge, you first need to grant the extension the permission to access local file URLs. This is currently not necessary in Firefox.
Chrome: Allow access to local files
You can now Download image and text files / PDFs.
Your own script
You may also write your own download script which iterates over the files inside the jsonl/ or csv/media_file_links/ folders and makes requests to the media file URLs contained therein.
The common data element between the CSV files in csv/metadata/ and csv/media_file_links/ is the column identifier_swisscollections.
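As an illustration of how the shared column can be used, the sketch below groups the rows from csv/media_file_links/ under the matching title row from csv/metadata/. Only the column name identifier_swisscollections is taken from this documentation. The .csv extension, comma delimiter and UTF-8 encoding are assumptions, and the function names are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path

ID_COLUMN = "identifier_swisscollections"


def read_csv_rows(folder: Path) -> list:
    """Read all rows (as dicts) from every .csv chunk file in a folder."""
    rows = []
    for path in sorted(folder.glob("*.csv")):
        # "utf-8-sig" also accepts files that start with a byte order mark.
        with path.open(newline="", encoding="utf-8-sig") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def join_metadata_with_links(csv_dir: Path) -> dict:
    """Map each swisscollections identifier to its metadata row and link rows."""
    links = defaultdict(list)
    for row in read_csv_rows(csv_dir / "media_file_links"):
        links[row[ID_COLUMN]].append(row)
    return {
        row[ID_COLUMN]: {"metadata": row, "links": links.get(row[ID_COLUMN], [])}
        for row in read_csv_rows(csv_dir / "metadata")
    }
```

A download script could then loop over join_metadata_with_links(Path("data/csv")) and request the URLs found in the link rows of each title, for example to name target folders after fields from the metadata row.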
Recommended workflow for the data download
On http://swisscollections.ch, download your search results metadata by clicking the Export button. Please choose the Metadata package (ZIP) option. Save the zipped folder inside your download directory and extract it there. Please make sure that the extracted folder (e.g. zwingli_2024-06-03-11h41) directly contains the subfolders data and ignore, and not another subfolder with the same name (e.g. zwingli_2024-06-03-11h41).
Open the file START_HERE.html in your browser. On this page, you can find a link to reproduce your query. To download the media files, follow the instructions.
Starting from Choose Media Format, you can open various download pages in order to download image and text files, as well as PDFs.
Images may be downloaded as jpeg files in three different sizes:
small (128px width)
medium (504px width)
large (original size)
OCR text, if available, may be downloaded in the following formats:
Plain text (.txt)
ALTO XML
Manuscript transcriptions, if available, may be downloaded in the following formats:
HTML
Markdown
You can also download one PDF file per title with the help of the PDF download page.
Download image and text files / PDFs
Open a page with download links, e.g. Download Files: jpeg small
Click on the DownThemAll! icon in your browser’s toolbar and choose the option DownThemAll!
Choose the filter All files
Define the following mask:
*title*/*text*
Click the Download button
Image and text files are saved inside the subfolders data/images/ and data/text/, respectively. During the download, one subfolder per title is created.
PDF files are saved inside the subfolder data/pdf/.
The download contains one file per digitized page. The file name is composed as follows: {Beginning of the title}_{swisscollections ID}_{page label}_{page order}.{file extension}
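If you want to process the downloaded files further, the file name can be split back into its parts. The following sketch splits from the right, so underscores inside the beginning of the title are kept. It assumes that the swisscollections ID, the page label and the page order themselves contain no underscore, which this documentation does not guarantee. The names PageFile and parse_page_filename are hypothetical.

```python
from pathlib import Path
from typing import NamedTuple


class PageFile(NamedTuple):
    title_start: str
    swisscollections_id: str
    page_label: str
    page_order: str
    extension: str


def parse_page_filename(name: str) -> PageFile:
    """Split '{title}_{ID}_{page label}_{page order}.{ext}' into its parts."""
    path = Path(name)
    # Split at the last three underscores only; the title may contain more.
    parts = path.stem.rsplit("_", 3)
    if len(parts) != 4:
        raise ValueError(f"Unexpected file name: {name!r}")
    return PageFile(*parts, path.suffix.lstrip("."))


# Hypothetical file name, for illustration only.
page = parse_page_filename("Some_title_991234_12r_0023.jpg")
```

Here page.title_start is "Some_title" and page.page_order is "0023". Sorting the files of one title by page_order should restore the page sequence, provided the page order values are zero-padded or are compared as integers.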
For further configuration options of the browser extension DownThemAll!, please consult the official documentation.
FAQ
Contact
For further questions or feedback please contact us at info@swisscollections.ch.