koala is a service that, combined with an existing data repository, forms a long-term archival solution. It is not designed to face end users; it is meant to be used as a backend service in a service-oriented architecture. Technical metadata must be gathered and basic validation performed before data is ingested into koala.
Example use case
The GWDG is a cooperation partner of the Deutsche Nationalbibliothek (DNB) in the field of digital long-term archiving. The DNB's task is to collect, permanently archive, comprehensively document, and bibliographically record all German and German-language publications, including digital records.
Workflow: The producer of assets (for example, a publishing company) delivers books in a digital format to the DNB. The DNB performs basic validation and extracts technical metadata, then creates a Submission Information Package (SIP). A SIP is an archive file, such as a zip or tar file, that contains content files and descriptive metadata.
The SIP is sent to koala for accounting and additional checks. Afterwards, the SIP is saved natively into IBM Spectrum Protect (SP). SP offers multiple storage backends, such as tape, filesystem, and object storage. Depending on the currently configured storage rules, SP replicates the data to these backends at different sites to ensure long-term data integrity.
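To make the packaging step more concrete, the following is a minimal sketch of how a producer-side tool might assemble a SIP as a tar.gz archive using only the Python standard library. The function names `build_sip` and `list_sip`, the `content/` directory, the `metadata.json` file name, and the checksum manifest are all assumptions made for illustration; the layout that koala or the DNB actually expects is not described here.

```python
import hashlib
import io
import json
import tarfile


def build_sip(content_files: dict[str, bytes], descriptive_metadata: dict) -> bytes:
    """Pack content files and a metadata document into a tar.gz archive.

    The archive layout (content/ plus metadata.json) is hypothetical.
    """
    # Record a SHA-256 checksum per content file so a receiver could verify it.
    manifest = {
        "descriptive": descriptive_metadata,
        "checksums": {
            name: hashlib.sha256(data).hexdigest()
            for name, data in content_files.items()
        },
    }
    members = {f"content/{name}": data for name, data in content_files.items()}
    members["metadata.json"] = json.dumps(manifest, indent=2).encode("utf-8")

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        # Add members in sorted order so the archive listing is predictable.
        for name in sorted(members):
            info = tarfile.TarInfo(name=name)
            info.size = len(members[name])
            archive.addfile(info, io.BytesIO(members[name]))
    return buffer.getvalue()


def list_sip(sip: bytes) -> list[str]:
    """Return the member names of a tar.gz SIP, in archive order."""
    with tarfile.open(fileobj=io.BytesIO(sip), mode="r:gz") as archive:
        return archive.getnames()
```

For example, `build_sip({"book.pdf": pdf_bytes}, {"title": "Example"})` returns the archive as bytes, which a client could then hand to whatever transfer mechanism the deployment uses.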
- Administrative interface for observing the running koala system and performing administrative tasks. See: adm
- Fast and scalable: Can be deployed on multiple hosts. See: Deployment
- Easy deployment and upgrade with docker images. See: Installation
- DIAS interface compatible. See: api
- REST interface. See: api
- Simple blob storage
- Built upon proven open source components. See: Architecture
- Supports multiple archival storage backends: Tape archival with IBM Spectrum Protect (formerly Tivoli Storage Manager), filesystem.
- Supports multiple SIP archive formats: zip, tar, tar.gz, 7z.
- Secured connections: TLS for Web UI and API; SFTP for ingesting SIPs
- Multiple authentication methods: Local DB, LDAP, Single Sign-On with OpenID Connect. See: Authentication
- Administrative web interface for: live statistics, administrative changes, auditing. See: Administration
- DROID signature file importer
- Applications export Prometheus metrics such as requests/sec, ingest duration, and ingest errors
- Scans SIPs for viruses with ClamAV
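As an illustration of what supporting several SIP archive formats involves, the sketch below tells zip, tar, tar.gz, and 7z apart by their well-known file signatures. It is not koala's implementation; the function names `detect_sip_format` and `_looks_like_tar` are hypothetical, and a real service would likely also validate the archive contents rather than rely on magic bytes alone.

```python
import gzip

ZIP_MAGICS = (b"PK\x03\x04", b"PK\x05\x06")  # regular and empty zip archives
SEVEN_ZIP_MAGIC = b"7z\xbc\xaf\x27\x1c"
GZIP_MAGIC = b"\x1f\x8b"


def _looks_like_tar(data: bytes) -> bool:
    # POSIX and GNU tar headers carry the string "ustar" at byte offset 257.
    return data[257:262] == b"ustar"


def detect_sip_format(data: bytes) -> str:
    """Guess the archive format of a SIP from its leading bytes."""
    if data.startswith(ZIP_MAGICS):
        return "zip"
    if data.startswith(SEVEN_ZIP_MAGIC):
        return "7z"
    if data.startswith(GZIP_MAGIC):
        # A gzip stream only counts as a SIP here if it wraps a tar archive.
        try:
            inner = gzip.decompress(data)
        except (OSError, EOFError):
            return "unknown"
        return "tar.gz" if _looks_like_tar(inner) else "unknown"
    if _looks_like_tar(data):
        return "tar"
    return "unknown"
```

Decompressing the whole gzip stream keeps the sketch short; for large SIPs, reading only the first 512-byte tar header from a streaming decompressor would be the more economical choice.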
There are two main actors using the koala system:
An API client is a remote application that uses the REST and SFTP interfaces to ingest and retrieve assets.
A human administrator controls the system through the provided web interface. The administrator can monitor the running system, which includes reviewing the status of the running applications and getting live information about the growing size of the archival storage system.
- Braunschweig, Björn, Digitale Langzeitarchivierung für die Deutsche Nationalbibliothek, GWDG Nachrichten 9-10/2015, p. 16 ff.
- Braunschweig, Björn, koala – neue Plattform zur Digitalen Langzeitarchivierung für die Deutsche Nationalbibliothek, GWDG Nachrichten 7/2018, p. 15 ff.
- GWDG koala project page: gwdg.de/web/guest/research-education/projects/koala