swissbib has two options to import structured metadata and both are actually in use:
The basic (primary) import path goes through the data preparation module and is best suited for any library metadata that is not unique.
The additional (secondary) import path goes directly into the search engine's index and is well suited for unique materials that do not need to be deduplicated (archival collections and the like).
Primary import path
During the import the data is analysed and transformed to Pica+, the internal format of the swissbib data preparation engine CBS (Central Bibliographic System). At this stage some of the data is excluded because it is not useful for swissbib.
The conversion to the internal format, as well as other steps, can be defined per source in order to get the most out of the individual data delivered by the local library networks.
A lot of the conversion steps are shared between the different networks.
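The combination of shared and source-specific conversion steps can be pictured as a small pipeline. The following is only a minimal sketch of that idea and not the actual CBS configuration: the step functions, the source names and the flat record layout are all hypothetical, and no real Pica+ conversion is performed.

```python
from typing import Callable, Dict, List

# Assumption: a record is a flat dict of field name -> value (not real Pica+).
Record = Dict[str, str]
Step = Callable[[Record], Record]


def strip_whitespace(record: Record) -> Record:
    """Shared step: trim surrounding whitespace from every value."""
    return {key: value.strip() for key, value in record.items()}


def drop_empty_fields(record: Record) -> Record:
    """Shared step: remove fields that carry no content."""
    return {key: value for key, value in record.items() if value}


def tag_as_poster(record: Record) -> Record:
    """Hypothetical source-specific step for a posters collection."""
    return {**record, "material": "poster"}


# Steps applied to every source.
SHARED_STEPS: List[Step] = [strip_whitespace, drop_empty_fields]

# Hypothetical source names; each source adds its own steps to the shared ones.
SOURCE_STEPS: Dict[str, List[Step]] = {
    "posters": [tag_as_poster],
    "library_network_a": [],
}


def convert(record: Record, source: str) -> Record:
    """Run the shared steps first, then the steps defined for this source."""
    for step in SHARED_STEPS + SOURCE_STEPS.get(source, []):
        record = step(record)
    return record
```

Keeping the shared steps in one list means that a fix to a common step reaches every source, while the per-source list stays short and only holds what is really specific to one delivery.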
Data sources imported through the primary import path
all library data (18 sources)
the Swiss posters collection (1 source)
the institutional repositories (6 sources)
the e-periodica collection (1 source)
the e-codices collection (1 source)
data from publishers for national licences (1 source, 3 publishers)
Criteria for excluding data
In order to deliver useful data to the user, some classes of records should be excluded from swissbib during import. This has no influence on the local systems, which, in contrast to swissbib, have to divide their focus between librarians and library users.
A record is useful for a library user if it
contains valuable bibliographic information that is as controlled as possible
leads to items or electronic resources that can generally be accessed or borrowed
leads to records with accessible items or resources
In order to achieve this, swissbib applies general criteria for the exclusion of data. They are mainly centred on two topics:
data quality and richness (minimal requirements have to be met)
usefulness of the information provided
General exclusion criteria
The rules for exclusion are guided by the principle that a user should not find records that
have no items or resources attached
are not linkable to records with item or holding information
are only of use for libraries' internal administration
contain no usable title information
Record types to be excluded
Acquisition records: These records are normally basic, uncontrolled and not linked to items directly useful to the user. A second and no less important factor is that these records hinder the deduplication process. Although some of them are of good quality, there is no way to determine this reliably. They are excluded until they are completed and the attached items are available. If a library is heavily dependent on acquisition records in swissbib, those records are imported but are excluded from the deduplication process.
Dummy records: These records are generally used as auxiliary records for requesting uncatalogued items. Some of them contain basic information of usable quality, but most of them contain neither complete author nor complete title information. They are of no use to the user while searching and cannot be deduplicated by software. If a library is heavily dependent on them and their quality is minimally acceptable, it has to be checked whether these records could be included in swissbib. At the moment this is not the case, as most of the bigger libraries are starting or running recataloguing projects.
Records which do not match bibliographic requirements: These records cannot be deduplicated and are therefore excluded. If they exist in larger quantities in a library network, other solutions have to be found. Records are excluded if the title description (245) is missing OR the entry fields (100/110/111/700/711) contain wildcard characters instead of names OR the control field 008 is corrupt or missing.
Records without link and item: Some of the bibliographic databases contain (mostly) analytical records that are neither linked (or linkable) to other records nor have an item or URL attached. These records are of no use to the user, who can neither find out which library holds the material nor reach any item that would provide additional information.
Records that are for internal use only: Records or items that exist only to state that an item is lost, removed from the collection (five years ago), cancelled, or similar are not included in swissbib. Furthermore, a library can choose not to include records in swissbib by marking them locally with a code (“noswissbib”) or by requesting the exclusion of records matching specific criteria, which can be implemented by swissbib.
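The formal checks named above (missing 245, wildcards in the entry fields, missing or corrupt 008, local “noswissbib” code) can be sketched as a single function that returns the reasons for excluding a record. This is an illustration under stated assumptions, not the rules as implemented in CBS: the dict-based record layout and the `local_codes` key are hypothetical, `*` and `?` are assumed to be the wildcard characters, and a 008 field is treated as corrupt when it does not have the 40 character positions that MARC 21 defines for bibliographic records.

```python
from typing import Dict, List

# Assumption: a record maps MARC tags to lists of field values, plus a
# hypothetical "local_codes" list for codes set by the owning library.
Record = Dict[str, List[str]]

ENTRY_TAGS = ("100", "110", "111", "700", "711")
WILDCARDS = ("*", "?")  # assumption: which characters count as wildcards
LENGTH_008 = 40  # MARC 21 bibliographic 008 has 40 character positions


def exclusion_reasons(record: Record) -> List[str]:
    """Return every reason for excluding the record; an empty list means keep."""
    reasons: List[str] = []

    # Title description (245) must be present and non-empty.
    if not any(value.strip() for value in record.get("245", [])):
        reasons.append("missing title (245)")

    # Entry fields must contain names, not wildcard placeholders.
    for tag in ENTRY_TAGS:
        values = record.get(tag, [])
        if any(w in value for value in values for w in WILDCARDS):
            reasons.append(f"wildcard in entry field {tag}")

    # Control field 008 must occur exactly once with the expected length.
    control = record.get("008", [])
    if len(control) != 1 or len(control[0]) != LENGTH_008:
        reasons.append("missing or corrupt 008")

    # The owning library can opt a record out with a local code.
    if "noswissbib" in record.get("local_codes", []):
        reasons.append("marked noswissbib by the library")

    return reasons
```

Returning all reasons instead of stopping at the first one makes it easier to report back to a library network why a batch of records was left out.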
Secondary import path
The secondary import path is used to index full text, authority data and structured metadata that describes unique data.
Full text of abstracts and indexes
swissbib uses full-text indexing to enrich the library metadata with abstracts and indexes that have been scanned by the Swiss libraries. Unfortunately not all institutions share their content openly.
Alternative forms of names, titles, places and subject headings from the Gemeinsame Normdatei (GND) are indexed if a heading is present in the bibliographic data.
For unique data (mainly archival data) it makes little sense to transform it and mix it with library data in the “data preparation module”, because of the nature of this data. It is a lot easier to index it directly. To smooth the process, the data is converted into a MARC field structure that can be indexed with the same definitions as the library data.
Other transformation scenarios are possible, but it is not clear what would be gained by them.
Currently this applies to the following data:
Swiss National Library: digitized content of the Swiss Literary Archives (SLA)
Swiss National Library: digitized content of the Federal Archives of Historic Monuments (FAHM)
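The conversion of archival descriptions into a MARC field structure, as described for the secondary import path, might look roughly like the sketch below. The input keys and the mapping table are hypothetical and chosen only for illustration; the tags used (245 $a for the title, 100 $a for a personal name, 856 $u for a URL) are standard MARC 21 fields, but the mapping actually used by swissbib is not documented here.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from archival input keys to (MARC tag, subfield code).
MAPPING: Dict[str, Tuple[str, str]] = {
    "title": ("245", "a"),
    "creator": ("100", "a"),
    "url": ("856", "u"),
}


def archival_to_marc_fields(item: Dict[str, str]) -> Dict[str, List[Dict[str, str]]]:
    """Map a flat archival description onto MARC-style tags and subfields.

    Keys without a mapping are ignored; empty values produce no field.
    """
    fields: Dict[str, List[Dict[str, str]]] = {}
    for key, (tag, code) in MAPPING.items():
        value = item.get(key, "").strip()
        if value:
            fields.setdefault(tag, []).append({code: value})
    return fields
```

Because the output uses the same tags as the library data, it can in principle be passed to the same index definitions without going through deduplication.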