Data Export Best Practice Documentation

On http://swisscollections.ch, you can export your search results as a CSV file or as a data package. This documentation shows how to use the Metadata package (ZIP) option.

Figure: Export your search results as a data package

Quickstart

Please unzip the downloaded data package before you proceed.

Your data package contains title metadata in the following formats:

  • CSV

  • JSON Lines

  • MARCXML

Please note the swisscollections terms of use.

If your search results contain digitized titles, you can download image files, text files and PDFs in a second step. You have two options for this:

  • use your own script to iterate over CSV or JSON Lines files containing download links

  • use DownThemAll! (recommended) or a similar download manager browser extension; DownThemAll! is available for Firefox, Chrome and Edge

For the second option, please open the file START_HERE.html in your browser and follow the instructions.

Detailed documentation

Your downloaded swisscollections data package contains two main folders:

  • data

  • ignore

You can ignore the ignore folder for now. All metadata files are inside the data folder.

Title metadata

You can find your title metadata in CSV, JSON Lines and MARCXML format inside the data folder:

  • data

    • csv

      • metadata

      • media_file_links

    • jsonl

    • marcxml

The jsonl, marcxml, csv/metadata and csv/media_file_links folders may contain multiple files, each holding a chunk of up to 200 title records. To process the metadata of your whole result list, you could, for instance, write a Python script that iterates over all files inside the jsonl folder.
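A minimal sketch of such a script follows. It assumes that you pass it the path of your extracted package folder and that the chunk files inside data/jsonl/ end in .jsonl (the exact file names are an assumption); each non-empty line is parsed as one JSON title record, following the JSON Lines convention. The function name iter_title_records is hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_title_records(package_dir: str) -> Iterator[dict]:
    """Yield every title record from all chunk files in data/jsonl/.

    Assumption: each chunk file has the .jsonl extension and holds
    one JSON object per line.
    """
    jsonl_dir = Path(package_dir) / "data" / "jsonl"
    # sorted() processes the chunks in a stable, predictable order
    for chunk_file in sorted(jsonl_dir.glob("*.jsonl")):
        with chunk_file.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines, e.g. a trailing newline
                    yield json.loads(line)
```

For example, len(list(iter_title_records("zwingli_2024-06-03-11h41"))) gives the number of title records in the whole package. Using a generator means that only one record at a time has to be held in memory, even for large result lists.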

For an overview of all possible title metadata elements in the csv/metadata/ and jsonl/ files, see the underlying template. Please note: if a metadata element, e.g. "digital_platform", is absent from a given title record, its JSON key does not appear in the respective .jsonl line. The only exception to this rule is "page_download_links", which is always present and contains an empty list if there are no download links.
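In a script, this rule means that optional elements should be read with a default instead of direct key access, while "page_download_links" can be accessed directly. The following Python sketch illustrates this for one parsed record; the function name describe_record and the output format are hypothetical, and the value types of the elements are assumptions.

```python
def describe_record(record: dict) -> str:
    """Summarise one title record without failing on absent keys."""
    # Optional elements such as "digital_platform" may be missing
    # entirely, so use dict.get() with a default instead of record[...].
    platform = record.get("digital_platform", "no digital platform")
    # "page_download_links" is always present; an empty list means
    # that there is nothing to download for this title.
    n_links = len(record["page_download_links"])
    return f"{platform}: {n_links} download link(s)"
```

A record without "digital_platform" and with "page_download_links": [] yields "no digital platform: 0 download link(s)" instead of raising a KeyError.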

Terms of use

Please note the swisscollections terms of use. The following title metadata elements inside the data/csv/metadata/ and data/jsonl/ files contain important additional information:

  • "metadata_copyright" → contains the copyright of the metadata, usually a CC0 or CC license. This element is not present in all title records. It contains a sentence such as “The catalog data is available for further use under a CC0 license.”

  • "digital_reproduction_copyright" → contains the copyright for the digital object according to the licensing information from e-rara or e-manuscripta. Example: "https://creativecommons.org/publicdomain/mark/1.0/"

  • "digital_reproduction_rights_owner" → contains the rights owner from e-rara or e-manuscripta. Example: "Universitätsbibliothek Basel"

You can find the above title metadata elements in the files inside csv/metadata/ as well as inside jsonl/. See also our metadata JSON template.
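To get an overview of the licensing situation of your whole result list, you could count how often each value of "digital_reproduction_copyright" occurs. The Python sketch below does this for any iterable of parsed JSON records. The function name is hypothetical, and it assumes that the value is either a single string or a list of strings; records without the element are counted as "(not present)".

```python
from collections import Counter
from typing import Iterable


def count_reproduction_licenses(records: Iterable[dict]) -> Counter:
    """Count how often each "digital_reproduction_copyright" value occurs."""
    counts: Counter = Counter()
    for record in records:
        # The element may be absent from a record; count those separately.
        value = record.get("digital_reproduction_copyright", "(not present)")
        # Assumption: the value is a string or a list of strings,
        # so normalise it to a list before counting.
        values = value if isinstance(value, list) else [value]
        counts.update(str(v) for v in values)
    return counts
```

You can feed it json.loads(line) for every non-empty line of the files in data/jsonl/ and then check which titles carry a license other than, for example, the Public Domain Mark before reusing their images.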

Media file download

If your search results contain digitized titles, your swisscollections data package enables you to download image and text files as well as PDFs in a second step.

Disclaimer

swisscollections is a meta catalogue and does not serve image or text files. You can find download links inside your swisscollections data package, but the media files are downloaded directly from the platforms e-rara.ch and e-manuscripta.ch. swisscollections cannot guarantee that every single download is successful.

Download manager

In order to download digitized image and text pages or PDFs, the easiest way is to open the file START_HERE.html in your browser and follow the instructions.

You will be guided to web pages containing download links for image files, possibly text files, and PDFs (audio and video files are currently not available). With the help of a download manager, you can easily download these files. We recommend the browser extension DownThemAll!, which is available for Firefox, Chrome and Edge. It is a Recommended Extension for Firefox and does not include any user tracking.

Set up the download manager DownThemAll!

https://youtu.be/qQvvoxiohA8?feature=shared&t=71
  • Go to DownThemAll! and get the extension for your browser.

  • Pin the extension to your browser’s toolbar. For instructions on how to do this in Firefox, please see the FAQ.

  • Configure DownThemAll!

    • In Chrome and Edge, you first need to grant the extension permission to access local file URLs. This is currently not necessary in Firefox.

      Figure: Allowing access to local files in Chrome

    • You can now proceed to the section Download image and text files / PDFs.

Your own script

You may also write your own download script which iterates over the files inside the jsonl/ or csv/media_file_links/ folders and makes requests to the media file URLs contained therein.
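A minimal Python sketch of such a script, using only the standard library, is shown below. It rests on several assumptions that you should check against your own package: the chunk files end in .jsonl, the record ID is stored under the key "identifier_swisscollections", and every entry of "page_download_links" is a plain URL string. The function names and the file naming scheme (one folder per record, running page numbers, extension derived from the Content-Type header) are hypothetical. Because the files are served by e-rara.ch and e-manuscripta.ch, the script pauses between requests and logs failed downloads instead of aborting.

```python
import json
import mimetypes
import time
import urllib.request
from pathlib import Path
from typing import Dict, Iterator, Tuple


def iter_download_links(package_dir: str) -> Iterator[Tuple[str, str]]:
    """Yield (record ID, URL) pairs from all files in data/jsonl/.

    Assumptions: chunk files end in .jsonl, the ID key is
    "identifier_swisscollections", and links are plain URL strings.
    """
    jsonl_dir = Path(package_dir) / "data" / "jsonl"
    for chunk_file in sorted(jsonl_dir.glob("*.jsonl")):
        with chunk_file.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                record = json.loads(line)
                record_id = str(record.get("identifier_swisscollections", "unknown"))
                # Always present; an empty list simply yields nothing.
                for url in record["page_download_links"]:
                    yield record_id, url


def download_all(package_dir: str, target_dir: str, delay_s: float = 1.0) -> int:
    """Download each link into target_dir/<record ID>/; return the success count."""
    succeeded = 0
    pages: Dict[str, int] = {}
    for record_id, url in iter_download_links(package_dir):
        pages[record_id] = pages.get(record_id, 0) + 1
        out_dir = Path(target_dir) / record_id
        out_dir.mkdir(parents=True, exist_ok=True)
        # Hypothetical naming scheme: a running page number per record.
        out_file = out_dir / f"{pages[record_id]:04d}"
        try:
            _, headers = urllib.request.urlretrieve(url, out_file)
        except (OSError, ValueError) as err:
            # URLError/HTTPError are OSError subclasses; single downloads
            # may fail, so log the URL and continue with the next one.
            print(f"failed: {url} ({err})")
        else:
            # Derive a file extension from the Content-Type of the response.
            ext = mimetypes.guess_extension(headers.get_content_type()) or ""
            if ext:
                out_file.replace(out_file.with_suffix(ext))
            succeeded += 1
        time.sleep(delay_s)  # pause between requests to spare the servers
    return succeeded
```

For example, download_all("zwingli_2024-06-03-11h41", "my_downloads") downloads all linked files one after the other. Sequential requests with a delay are slower than parallel ones but put less load on the media platforms.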

The column identifier_swisscollections is the common data element that links the CSV files in csv/metadata/ and csv/media_file_links/.
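The following Python sketch joins the two sets of CSV files on this column using the standard csv module, which keeps all values as text. It assumes comma-separated files with a header row, the .csv extension and UTF-8 encoding (utf-8-sig also tolerates a leading byte-order mark); the function names are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterator, List, Tuple


def load_links_by_id(package_dir: str) -> Dict[str, List[dict]]:
    """Group all rows of csv/media_file_links/ by identifier_swisscollections."""
    links: Dict[str, List[dict]] = defaultdict(list)
    folder = Path(package_dir) / "data" / "csv" / "media_file_links"
    for csv_file in sorted(folder.glob("*.csv")):
        with csv_file.open(encoding="utf-8-sig", newline="") as f:
            for row in csv.DictReader(f):
                links[row["identifier_swisscollections"]].append(row)
    return links


def join_metadata_with_links(package_dir: str) -> Iterator[Tuple[dict, List[dict]]]:
    """Yield (metadata row, matching link rows) for every title."""
    links = load_links_by_id(package_dir)
    folder = Path(package_dir) / "data" / "csv" / "metadata"
    for csv_file in sorted(folder.glob("*.csv")):
        with csv_file.open(encoding="utf-8-sig", newline="") as f:
            for row in csv.DictReader(f):
                # Titles without digitized pages get an empty list.
                yield row, links.get(row["identifier_swisscollections"], [])
```

Loading the link rows into a dictionary first means each metadata row is matched with a single lookup instead of a scan over all link files.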

Recommended workflow for the data download

On http://swisscollections.ch, download the metadata of your search results by clicking the Export button and choosing the Metadata package (ZIP) option. Save the zipped folder in your download directory and extract it there. Please make sure that the extracted folder (e.g. zwingli_2024-06-03-11h41) directly contains the subfolders data and ignore, and not another subfolder with the same name (e.g. zwingli_2024-06-03-11h41).
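If you process many packages with your own scripts, a small Python check can detect the nested layout described above. This sketch assumes that you pass it the path of the extracted folder; the function name find_package_root is hypothetical. It returns the folder that directly contains data and ignore, and raises an error if neither the folder nor a same-named subfolder does.

```python
from pathlib import Path


def find_package_root(extracted_dir: str) -> Path:
    """Return the folder that directly contains data/ and ignore/.

    If the archive was extracted into an extra folder of the same name
    (e.g. zwingli_.../zwingli_.../data), the nested folder is returned.
    """
    root = Path(extracted_dir)
    if (root / "data").is_dir() and (root / "ignore").is_dir():
        return root
    nested = root / root.name
    if (nested / "data").is_dir() and (nested / "ignore").is_dir():
        return nested
    raise FileNotFoundError(f"no data/ and ignore/ folders found in {root}")
```

If the function returns the nested folder, move its contents one level up so that the extracted folder directly contains data and ignore, as recommended above.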

Open the file START_HERE.html in your browser. On this page, you can find a link to reproduce your query. To download the media files, follow the instructions.

Starting from Choose Media Format, you can open various download pages in order to download image and text files, as well as PDFs.

Images may be downloaded as jpeg files in three different sizes:

  • small (128px width)

  • medium (504px width)

  • large (original size)

OCR text, if available, may be downloaded in the following formats:

  • Plain text (.txt)

  • ALTO XML

Manuscript transcriptions, if available, may be downloaded in the following formats:

  • HTML

  • Markdown

You can also download one PDF file per title with the help of the PDF download page.

Download image and text files / PDFs

  • Open a page with download links, e.g. Download Files: jpeg small

  • Click on the DownThemAll! icon in your browser’s toolbar and choose the option DownThemAll!

  • Choose the filter All files

  • Define the following mask:

    *title*/*text*

  • Click the Download button

Image and text files are saved inside the subfolder data/images/ or data/text/. During the download, one subfolder per title is created.

PDF files are saved inside the subfolder data/pdf/.

The download contains one file per digitized page. The file name is composed as follows: {Beginning of the title}_{swisscollections ID}_{page label}_{page order}.{file extension}
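If you want to process the downloaded page files further, for example to sort them by page order, the file name pattern above can be split back into its parts. The Python sketch below splits from the right, so underscores in the beginning of the title are tolerated. It assumes that the swisscollections ID, the page label and the page order contain no underscores and that the page order is numeric; the names PageFile and parse_page_filename are hypothetical, and the sketch is meant for the per-page image and text files.

```python
from pathlib import Path
from typing import NamedTuple


class PageFile(NamedTuple):
    title_start: str
    swisscollections_id: str
    page_label: str
    page_order: int
    extension: str


def parse_page_filename(filename: str) -> PageFile:
    """Split {title}_{ID}_{page label}_{page order}.{ext} into its parts.

    Raises ValueError if the name has fewer than four underscore-separated
    parts or if the page order is not a number.
    """
    path = Path(filename)
    # rsplit with maxsplit=3 keeps any underscores inside the title start
    title_start, sc_id, label, order = path.stem.rsplit("_", 3)
    return PageFile(title_start, sc_id, label, int(order), path.suffix.lstrip("."))
```

For a made-up file name such as Von_der_Taufe_991170_[1r]_3.jpg, this yields the title start Von_der_Taufe, the ID 991170, the page label [1r] and the page order 3. Sorting by page_order rather than by file name avoids placing page 10 before page 2.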

For further configuration options of the browser extension DownThemAll!, please consult the official documentation.

FAQ

If you open the CSV files in Excel, please start from a blank Excel workbook and import the CSV file via Data > From Text/CSV. After selecting the CSV file in your file explorer, set the character encoding to UTF-8, click Transform Data and set the data type of all columns to text. If you open a CSV file directly in Excel, some values may be corrupted.

Instead of starting with START_HERE.html, you can go directly to the downloads overview by opening ignore/html/choose-media-format.html in your browser. You can also open a specific download page directly, e.g. ignore/html/url-pdf.html.

You can define the following mask inside DownThemAll!:

*text*

This will download your media files directly into your Downloads folder. You can specify a subfolder in the DownThemAll! downloads selection menu.


After adding the DownThemAll! extension to Firefox, you can pin it to the toolbar. Please click the jigsaw puzzle icon in the toolbar. Then right-click on DownThemAll! and select Pin to Toolbar.

Please see https://support.mozilla.org/en-US/kb/extensions-button for more information.

Inside the DownThemAll! downloads selection menu, please define the following mask:

*title*/*text*

Your media files should now be downloaded to a subfolder of data/ inside your data package folder, e.g. data/text/plain/<title>/.

Contact

For further questions or feedback, please contact us at info@swisscollections.ch.