API

References

Overview

koala provides two kinds of interfaces. The older one is called DIAS and is required by the DNB. The newer one is a REST interface whose endpoints start with /api.

Some REST endpoints require basic authentication, a simple authentication scheme built into the HTTP protocol. The client sends each HTTP request with an Authorization header that contains the word Basic, followed by a space and the base64-encoded string username:password.
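
As a minimal sketch, the header value can be built in Python as follows. The credentials admin / password are placeholders for illustration, not real koala accounts, and the function name is an assumption.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the value of the Authorization header for HTTP basic auth."""
    # Join the credentials with a colon and base64-encode the UTF-8 bytes.
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


print(basic_auth_header("admin", "password"))
# prints "Basic YWRtaW46cGFzc3dvcmQ="
```

Note that the string must not end with a newline before encoding, otherwise the resulting token differs.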

SIP package format

The SIP is a self-describing archive that includes technical metadata and digital assets. Technical metadata must be provided as a file named mets.xml in the root directory. Digital content must be located in a subfolder called content/. See SIP Interface Spec 4.2, DIAS-METS format. Sample mets.xml

Submit SIP

SFTP is used to transport the SIP from the producer to the koala upload directory. The following procedure must be followed:

  1. Use a UNIQUE temporary filename during the transfer to prevent the scheduler from issuing a load request on an incomplete file. E.g. <unix-timestamp>.tmp or <guid>.tmp
  2. Transfer the SIP
  3. Upon finishing the transfer, rename the temporary file to a file with a unique name and extension. Only files with a supported format will be processed. E.g. mv <unix-timestamp>.tmp <unix-timestamp>.tar

Info

Currently supported SIP formats are: tar, tar.gz, zip
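
The upload procedure above can be sketched in Python as follows. This is an illustration under assumptions: the sftp argument is any object offering put(localpath, remotepath) and rename(oldpath, newpath), as paramiko's SFTPClient does, and the function names and the use of a UUID as unique name are hypothetical choices.

```python
import os
import uuid

# Supported SIP formats; ".tar.gz" is checked before ".tar".
SUPPORTED_EXTENSIONS = (".tar.gz", ".tar", ".zip")


def sip_extension(filename: str) -> str:
    """Return the supported SIP extension of filename, or raise ValueError."""
    for ext in SUPPORTED_EXTENSIONS:
        if filename.endswith(ext):
            return ext
    raise ValueError(f"unsupported SIP format: {filename}")


def upload_sip(sftp, local_path: str, upload_dir: str) -> str:
    """Upload a SIP under a temporary name, then rename it to its final name."""
    ext = sip_extension(os.path.basename(local_path))
    unique = uuid.uuid4().hex
    tmp_name = f"{upload_dir}/{unique}.tmp"
    final_name = f"{upload_dir}/{unique}{ext}"
    # Steps 1 and 2: transfer under a unique .tmp name so the scheduler
    # does not pick up an incomplete file.
    sftp.put(local_path, tmp_name)
    # Step 3: rename to a unique name with a supported extension.
    sftp.rename(tmp_name, final_name)
    return final_name
```

The returned name is the unique SIP name that the client can later use to track the ingest.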

Metadata Update

Purpose:

Provides a mechanism to update the metadata of an existing asset. It uses the same mechanism as described in Submit SIP, except that the given SIP does not contain any content data. The ingest process:

  • validates that the technical metadata regarding the content files has not changed, so that metadata and content cannot get out of sync.
  • overrides the metadata file in archival storage
  • updates the mdqi search index

Info

Not in original spec. Extension in koala only.

SIP

Request storage rules

Purpose:

Storage rules determine which kinds of storage media are available for the AIP.

Request:

GET /AccessManager/AccessManager?cmd=storagerules

Response:

200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<STORAGERULES>
  <Item StorageRule="Depot" />
  <Item StorageRule="Digitisation" />
  <Item StorageRule="Hosting" />
  <Item StorageRule="Optical" />
  <Item StorageRule="Scientific" />
  <Item StorageRule="Tape" />
</STORAGERULES>  

Warning

Not implemented. Multiple storage backends selectable per SIP were never needed.

Request file types

Purpose:

To obtain the correct values for describing file types in the SIP data package, the Producer can retrieve a list of file type IDs and file properties. Note that the use of unsupported (or unknown) file types is permitted. However, these files may not open correctly and are not subject to preservation processing.

Request:

GET /AccessManager/AccessManager?cmd=filetypes

Response:

200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<FILETYPES>
    <Item fileTypeID="urn:diasid:fty:kopal:0200507050000000000001" fileType="Adobe Acrobat Document" fileTypeVersion="1.2" fileTypeStatus="active" mimeType="application/pdf" fileExtension=".pdf"/>
    <Item fileTypeID="urn:diasid:fty:kopal:0200507050000000000002" fileType="Adobe Acrobat Document" fileTypeVersion="1.3" fileTypeStatus="active" mimeType="application/pdf" fileExtension=".pdf"/>
</FILETYPES>    

Request:

GET /api/filetypes

Response:

200 OK
Content-Type: application/json; charset=utf-8
[
  {"id": "urn:diasid:fty:kopal:0200507050000000000001", "name": "Adobe Acrobat Document", "version": "1.2", "status": "active", "mimetype": "application/pdf", "extensions": [".pdf"]},
  {"id": "urn:diasid:fty:kopal:0200507050000000000001", "name": "JPEG Image", "version": "1.02", "status": "active", "mimetype": "image/jpeg", "extensions": [".jpg", ".jpeg"]}
]

Request transport parameters

Purpose:

The default way to transport the SIP data package to koala is the SFTP protocol. This operation returns the SFTP connection details.

Request:

GET /AccessManager/AccessManager?cmd=FTPInfo&username=<user>&password=<pw>

<user> user with api permission
<pw> password

Response:

200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<FTPINFO>
    <FTPInfoItem FTP_Password="pw" SIP.preloadarea="/data/prddias/preload/SIP/DDBDIASMETS10" 
    FTP_Server="host:port" FTP_User="user"/>
</FTPINFO>

Request:

GET /api/ftpinfo
Authorization: Basic YWRtaW46cGFzc3dvcmQ=

Response:

200 OK
Content-Type: application/json; charset=utf-8
{
  "host": "koala.gwdg.de:22",
  "user": "user",
  "password": "pw",
  "uploadarea": "/data/upload"
}
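
A client needs the host and port separately to open the SFTP connection. The following sketch parses a JSON body shaped like the response above; the function name and the fallback to port 22 are assumptions.

```python
import json


def parse_ftpinfo(body: str, default_port: int = 22) -> dict:
    """Split the "host" field of an ftpinfo response into host and port."""
    info = json.loads(body)
    # The host field has the form "<hostname>:<port>".
    host, sep, port = info["host"].rpartition(":")
    if not sep:
        # No port given: keep the whole field as host name.
        host, port = info["host"], default_port
    return {
        "host": host,
        "port": int(port),
        "user": info["user"],
        "password": info["password"],
        "uploadarea": info["uploadarea"],
    }
```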

Note

koala supports SFTP authentication with ssh public/private key pairs as well.

Request SIP ingest

Purpose:

Provides an operation via which a producer application can retrieve a unique ticket number identifying an instance of the SIP Ingest process and the associated preload area into which the SIP can be uploaded.

Request:

GET /IngestStatus/Ingest?SipName=<SIPName>

<SIPName>   Name of SIP. E.g: 1345743.tar

Response:

200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<MetadataBlock type="busy" version="1.0">
    <Description>Ingest Status</Description>
    <Metadata>
        <SIP_Filename>123.tar</SIP_Filename>
        <SIPIngestTicketNumber>123.tar</SIPIngestTicketNumber>
        <SIP.preloadarea>/data/prddias/preload/SIP/DDBDIASMETS10</SIP.preloadarea>
        <FTP_Password>pw</FTP_Password>
        <FTP_Server>server:port</FTP_Server>
        <FTP_User>user</FTP_User>
    </Metadata>
</MetadataBlock>

Note

Implemented for compatibility reasons only. <SIPIngestTicketNumber> always reflects the specified SipName=<SIPName>. Because every SIP must have a UNIQUE name, each SIP can be tracked by that name via the "Request ingest status" method described next.

Request ingest status

Purpose:

Provides an operation that can be used by producer applications to retrieve status and error information related to a SIP.

Request:

GET /IngestStatus/Ingest?TicketId=<TicketId>

<TicketId>  SIP name with extension. e.g. 1427326550312.tar

Response:

200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<MetadataBlock type="busy" version="1.0">
    <Description>Ingest Status</Description>
    <Metadata>
        <StartTimestamp>2016-02-23 08:40:52</StartTimestamp>
        <SourceID>loader@820c49d0e6ef</SourceID>
        <State>busy</State>
        <Substate>extract</Substate>
        <SupplementaryInformation/>
        <Timestamp>2016-02-23 08:40:57</Timestamp>
    </Metadata>
</MetadataBlock>

<State> valid values: uploading, busy, done, error

<Substate> corresponds to the current processing step of the loader application. It can contain an arbitrary value until it ends in one of the following states:

  • done: SIP is successfully ingested.
  • error: An error occurred while ingesting the SIP. Additional information is available in the <SupplementaryInformation> element.
  • fatal: A fatal error occurred which stops the corresponding loader.

Request:

GET /api/ingeststatus/<SIP>

<SIP>   Name of SIP. E.g: 13457543.tar

Response:

200 OK
Content-Type: application/json; charset=utf-8
{
  "create_ts": "2018-06-28T09:11:45",
  "modify_ts": "2018-06-28T09:11:45",
  "originator": "loader@24d737fc147f",
  "status": "done",
  "substate": "done",
  "suppl_info": null
}

Info

Special care must be taken in case a loader stops without setting the appropriate ticket state. The client application must therefore define a timeout after which it considers the ingest transaction as failed, e.g. 1 h for a SIP smaller than 50 GB.
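
A client-side timeout of this kind can be sketched as a polling loop. The fetch_status callable is hypothetical: it is assumed to perform the GET /api/ingeststatus/<SIP> request and return the decoded JSON object. The poll interval and the treatment of substate "fatal" as final are assumptions as well.

```python
import time
from typing import Callable

FINAL_STATES = ("done", "error")


def wait_for_ingest(
    fetch_status: Callable[[], dict],
    timeout_s: float = 3600.0,
    poll_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll the ingest status until it is final or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        ticket = fetch_status()
        # Stop on a final state, or when the loader reported a fatal error.
        if ticket.get("status") in FINAL_STATES or ticket.get("substate") == "fatal":
            return ticket
        if clock() >= deadline:
            # The loader may have stopped without updating the ticket.
            raise TimeoutError("ingest did not finish within the timeout")
        sleep(poll_interval_s)
```

The sleep and clock parameters exist only so that the loop can be exercised without real waiting.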

Administration

The basic response to requests to koala will be an HTTP response message carrying an XML response message as body. The XML response message has the root element <MetadataBlock>, which contains a mandatory <header> element and either a <result> or an <error> element, depending on whether an operation completed successfully or not.

Note: A 'result' for an operation may be either positive (the intended operation completed successfully and had the desired outcome) or negative (the intended operation completed successfully but did not have the desired outcome). An 'error' is included only if an operation could not be completed.

The <header> element of the XML response message contains the following mandatory child elements:
<version> the version of the interface operation protocol
<request> the URL of the request
<datetime> the date/time stamp for the response message formatted according to ISO 8601:1988, i.e. YYYY + '-' + MM + '-' + DD + 'T' + HH + ':' + MM + ':' + SS + 'Z'
<originator> the originator of the request, i.e. the DIAS user organization that did the request

The <result> element of the XML response message contains the XML formatted result of the requested operation, as specified for each individual operation later in this document. The <result> element may be empty.

The <result> element has an enumerated attribute 'resultcode' that indicates the success of the operation. The following list specifies the possible general values for resultcode. Each operation may specify additional result codes to replace 'notOK'. The result code has no default value, to prevent ambiguous representations like <result/>.

  • OK Operation completed successfully and had the desired outcome
  • notOK Operation completed successfully, but did not have the desired outcome. No operation specific resultcode was provided

The <error> element of the XML response message contains the following elements:
<errortype> the type of error that occurred (mandatory)
<errorcode> the error code, unique within the type for each error (mandatory)
<errormessage> the message that describes the error (mandatory)
<errordetail> any details that can be provided about the error (optional)

ErrorCodes:

  • 1: General system error
  • 2: System temporarily unavailable
  • 3: Unsupported operation
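
The response structure described above can be read with Python's standard XML parser. This is a sketch only: the function name and the shape of the returned dictionary are assumptions, and the element names are taken from the description in this section.

```python
import xml.etree.ElementTree as ET


def parse_admin_response(xml_body: bytes) -> dict:
    """Extract header, result code and error details from a <MetadataBlock>."""
    root = ET.fromstring(xml_body)
    parsed = {"header": {}, "resultcode": None, "error": None}

    header = root.find("header")
    if header is not None:
        # <version>, <request>, <datetime> and <originator>
        parsed["header"] = {c.tag: (c.text or "").strip() for c in header}

    result = root.find("result")
    if result is not None:
        parsed["resultcode"] = result.get("resultcode")

    error = root.find("error")
    if error is not None:
        # <errortype>, <errorcode>, <errormessage>, optional <errordetail>
        parsed["error"] = {c.tag: (c.text or "").strip() for c in error}
    return parsed
```

A caller would first check whether error is set and only then interpret resultcode, since a response carries either a <result> or an <error> element.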

Request schema list

Purpose: Enables the retrieval of a list of all XML schema definitions that exist in the system. XML schemas are used to enforce the structural validity of the metadata XML file.

Request:

GET /dias/administration?request=getschemalist

Response:

200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<MetadataBlock>
    <header>
        <version>1.0</version>
        <request>getschemalist</request>
        <datetime>2018-07-16 08:00:24.577404</datetime>
        <originator>DNB</originator>
    </header>
    <result resultcode="OK">
        <xmlSchemas>
            <xmlSchema>
                <xmlSchemaID>urn:diasid:xsd:kop01:0000000000000000000001</xmlSchemaID>
                <xmlSchemaName>fits.xsd</xmlSchemaName>
                <xmlSchemaDescription/>
                <xmlSchemaURI>
                http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd
                </xmlSchemaURI>
                <xmlSchemaVersion>1</xmlSchemaVersion>
            </xmlSchema>
            ...
        </xmlSchemas>
    </result>
</MetadataBlock>

Request a specific schema

Purpose: Enables the retrieval of a specific xsd schema file.

Request:

GET /dias/administration?request=getschema&schemaID=<schemaID>

<schemaID> e.g. urn:diasid:xsd:kop01:0000000000000000000009

Response:

200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" targetNamespace="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" elementFormDefault="qualified" attributeFormDefault="unqualified">
    <xs:element name="fits">
    <xs:complexType>
    ...
    </xs:complexType>
    </xs:element>
</xs:schema>

Deletion of assets and AIPs

Purpose:

Provides an operation that can be used to either delete an Asset or delete an AIP. Deletion of assets which have derived assets is not supported.

Request:

GET /dias/administration?internalassetid=<internalassetid>&request=<deleterequest>&username=<user>&password=<pw>

<user> user with api permission
<pw> password

<internalassetid>   The internal asset id of the asset that the delete is requested for.
<deleterequest>     The delete function that is requested: deleteAsset or deleteAIP.
  • deleteAsset: Remove data from data management and archival storage
  • deleteAIP: Remove data from archival storage and mark deletion in data management.

Response:

200 OK
Content-Type: application/xml; charset=utf-8
<MetadataBlock>
    <header>
        <version>1.0</version>
        <request></request>
        <datetime>2016-02-23 11:27:00</datetime>
        <originator>DNB</originator>
    </header>
    <result resultcode='OK'>
    </result>
</MetadataBlock>

Request:

DELETE /api/asset/internal/<internalassetid>
Authorization: Basic <authstring>
{
  "type": "deleteasset"
}

Response:

200 OK
Content-Type: application/json; charset=utf-8

Checking the consistency of stored assets

Purpose:

Provides an operation that can be used to check the consistency of an asset by recalculating the checksums and comparing the values with the stored values in Data Management.

Request:

GET /dias/administration?internalassetid=<internalassetid>&request=checkConsistency

<internalassetid>   the internal asset id of the asset that the consistency check is requested for.

Response:

200 OK
Content-Type: application/xml; charset=utf-8
<MetadataBlock>
    <header>
        <version>1.0</version>
        <request></request>
        <datetime>2016-02-23 11:27:00</datetime>
        <originator>DNB</originator>
    </header>
    <result resultcode='ok'>
    </result>
</MetadataBlock>

Result codes attached to the checkConsistency operation:

  • ok: The asset or AIP consistency checks did not return any consistency errors
  • error: A consistency error occurred. Further information can be found in the result element.
  • fatal: Indicates a fatal state which caused the application to shutdown.

Request:

POST /api/checkconsistency/<internalassetid>
Authorization: Basic <authstring>

Response:

201 CREATED
Content-Type: application/json; charset=utf-8
Location: /api/retrievestatus/<sessionid>

Modified AIPs

Purpose:

Provides an operation to get an overview of all AIPs that have been created or deleted within a specified timeframe. This allows systems with a dependency on DIAS to discover changes.

Request:

GET /dias/administration?modification=<modification_type>
&from=<start>&to=<end>&request=getModifiedAIPs 

<modification_type>  created, aipdeleted, assetdeleted
<start> lower boundary of the status retrieval timeframe (inclusive), e.g. 2005-01-01T00:00:00
<end> upper boundary of the status retrieval timeframe (exclusive), e.g. 2006-12-31T23:59:59
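
The request URL with its time window can be assembled as in the following sketch. The base URL is a placeholder supplied by the caller and the function name is hypothetical. The timestamps are formatted like the examples above.

```python
from datetime import datetime
from urllib.parse import urlencode

MODIFICATION_TYPES = ("created", "aipdeleted", "assetdeleted")


def modified_aips_url(base_url: str, modification: str,
                      start: datetime, end: datetime) -> str:
    """Build the getModifiedAIPs URL for the window [start, end)."""
    if modification not in MODIFICATION_TYPES:
        raise ValueError(f"unknown modification type: {modification}")
    query = urlencode({
        "modification": modification,
        "from": start.strftime("%Y-%m-%dT%H:%M:%S"),  # inclusive
        "to": end.strftime("%Y-%m-%dT%H:%M:%S"),      # exclusive
        "request": "getModifiedAIPs",
    })
    return f"{base_url.rstrip('/')}/dias/administration?{query}"
```

Because the upper boundary is exclusive, consecutive windows can reuse the previous end value as the next start value without reporting an AIP twice.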

Response:

200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<MetadataBlock>
    <header>
        <version>1.0</version>
        <request>getmodifiedaips</request>
        <datetime>2016-04-04 12:35:20.567000</datetime>
        <originator>DNB</originator>
    </header>
    <result resultcode="OK">
        <modifiedAIP>
            <internalAssetID>9dac534d-38d7-41d4-9b6f-876dee236581</internalAssetID>
            <modificationTimeStamp>2016-04-04 14:26:28</modificationTimeStamp>
        </modifiedAIP>
    </result>
</MetadataBlock>

DIP

Request technical metadata

Purpose:

Request the technical metadata by the external asset identifier (lmerObject.persistentidentifier in SIP interface) or internal asset identifier.

One external asset identifier can refer to multiple internal assets when the digital object is transformed over time. The request can therefore be extended with the parameter 'list', which takes the values 'true' or 'false'. If list is 'true', metadata is returned for all assets that share the specified external asset identifier. If it is 'false' (the default), only the metadata of the latest asset with the specified external asset identifier is returned.

Request:

GET /dias/Retriever?ExternalAssetID=<ExternalAssetID>&RespFmt=<RespFmt>
&Request=Metadata&list=<list> 

GET /dias/Retriever?InternalAssetID=<InternalAssetID>&RespFmt=<RespFmt>
&Request=Metadata 

<RespFmt>           Currently only 'xml' is supported.
<list>              true or false

Response:

200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
   <DIASMETSRESPONSE type="dip" version="1.0">
       <DESCRIPTION>metadata response</DESCRIPTION>
       <REQUEST>
            <EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
       </REQUEST>
       <DIP>
           <METADATABLOCK>
               <METADATA>
                   <INTERNALASSETID>0e2592bd-3d37-40a2-a6a2-5360ee7016b7</INTERNALASSETID>
                   <EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
                   <ASSETVERSION>1</ASSETVERSION>
                   <ASSETDESCRIPTION/>
                   <AIPISDELETED>0</AIPISDELETED>
                   <AIPDELETIONTIMESTAMP>None</AIPDELETIONTIMESTAMP>
                   <ASSETCREATIONTIMESTAMP>2016-02-23 09:54:50</ASSETCREATIONTIMESTAMP>
                   <ASSETINGESTTIMESTAMP>2016-02-23 09:54:50</ASSETINGESTTIMESTAMP>
                   <OLDOBJECTIDENTIFIER/>
                   <OLDVERSION/>
                   <SIZE>21577686</SIZE>
                   <DIPRETRIEVEWAITTIME unit="sec"/>
               </METADATA>
           </METADATABLOCK>
           <ASSETBLOCK>
               <ASSET>
                   <INTERNALASSETID>0e2592bd-3d37-40a2-a6a2-5360ee7016b7</INTERNALASSETID>
                   <URI/>
               </ASSET>
           </ASSETBLOCK>
       </DIP>
   </DIASMETSRESPONSE>

Request:

GET /api/asset/<by>/<assetid>?list=true
Authorization: Basic <authstring>

<by>   external or internal

Response:

200 OK
Content-Type: application/json; charset=utf-8
{
  "internalassetid": "da12f687-67f2-452d-ab55-7c2ce517c9c4",
  "externalassetid": "c1c55e93-80cb-4639-af2f-653bf9ddb68f",
  "assetversion": "1",
  "aipisdeleted": "false",
  "size": "5456222",
  ...
}

Request complete technical metadata by external or internal asset identifier

Purpose:

Request complete metadata: mets.xml

Request:

GET /dias/Retriever?InternalAssetID=<InternalAssetID>&Respfmt=<RespFmt>
&Request=FullMetadata 
Authorization: Basic <authstring>

GET /dias/Retriever?ExternalAssetID=<ExternalAssetID>&Respfmt=<RespFmt>
&Request=FullMetadata 
Authorization: Basic <authstring>

<RespFmt>           Currently only 'xml' is supported.

Response:

200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<DIASMETSRESPONSE type="dip" version="1.0">
    <DESCRIPTION>fullmetadata response</DESCRIPTION>
    <REQUEST>
        <EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
    </REQUEST>
    <DIP>
        <METADATABLOCK>
            <METADATA>
                <INTERNALASSETID>0e2592bd-3d37-40a2-a6a2-5360ee7016b7</INTERNALASSETID>
                <EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
                <ASSETVERSION>1</ASSETVERSION>
                <ASSETDESCRIPTION/>
                <AIPISDELETED>0</AIPISDELETED>
                <AIPDELETIONTIMESTAMP>None</AIPDELETIONTIMESTAMP>
                <ASSETCREATIONTIMESTAMP>2016-02-23 09:54:50</ASSETCREATIONTIMESTAMP>
                <ASSETINGESTTIMESTAMP>2016-02-23 09:54:50</ASSETINGESTTIMESTAMP>
                <OLDOBJECTIDENTIFIER/>
                <OLDVERSION/>
                <SIZE>21577686</SIZE>
                <DIPRETRIEVEWAITTIME unit="sec"/>
            </METADATA>
        </METADATABLOCK>
        <ASSETBLOCK>
            <ASSET>
                <INTERNALASSETID>0e2592bd-3d37-40a2-a6a2-5360ee7016b7</INTERNALASSETID>
                <URI>/download/7aaff403-8ce9-439b-ae43-1e1ad51b76b6/unpacked/mets.xml</URI>
            </ASSET>
        </ASSETBLOCK>
    </DIP>
</DIASMETSRESPONSE>

The response is returned as a link to the file containing the full technical metadata. The technical metadata is placed in unpacked form on an HTTP-accessible location in the koala system.

The response will contain DIAS-METS formatted technical metadata with all information that was present in the input technical metadata, with the exception of data elements that are updated with current values.

Request:

POST /api/retrieve/<by>/<assetid>
Authorization: Basic <authstring>
{
  "type": "fullmetadata"
}

Response:

201 CREATED
Content-Type: application/json; charset=utf-8
Location: /api/retrievestatus/<sessionid>

Request a specific asset

Purpose:

Request a specific asset.

Request:

GET /dias/Retriever?InternalAssetID=<InternalAssetID>&Respfmt=<RespFmt>&DipFmt=<format>
&Request=Asset

<InternalAssetID>   Internal Asset Id
<RespFmt>           Currently only 'xml' is supported.
<format>            archive format ('zip', 'tar', 'unpacked'). If not specified the DIP will be unpacked.

Response:

On successful retrieve:

200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<DIASMETSRESPONSE type="dip" version="1.0">
    <DESCRIPTION>asset response</DESCRIPTION>
    <REQUEST>
        <EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
    </REQUEST>
    <DIP>
        <METADATABLOCK>
            <METADATA>
                <INTERNALASSETID>0e2592bd-3d37-40a2-a6a2-5360ee7016b7</INTERNALASSETID>
                <EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
                <ASSETVERSION>1</ASSETVERSION>
                <ASSETDESCRIPTION />
                <AIPISDELETED>0</AIPISDELETED>
                <AIPDELETIONTIMESTAMP>None</AIPDELETIONTIMESTAMP>
                <ASSETCREATIONTIMESTAMP>2016-02-23 09:54:50</ASSETCREATIONTIMESTAMP>
                <ASSETINGESTTIMESTAMP>2016-02-23 09:54:50</ASSETINGESTTIMESTAMP>
                <OLDOBJECTIDENTIFIER></OLDOBJECTIDENTIFIER>
                <OLDVERSION></OLDVERSION>
                <SIZE>21577686</SIZE>
                <DIPRETRIEVEWAITTIME unit="sec"></DIPRETRIEVEWAITTIME>
            </METADATA>
        </METADATABLOCK>
        <ASSETBLOCK>
          <ASSET>
                <INTERNALASSETID>0e2592bd-3d37-40a2-a6a2-5360ee7016b7</INTERNALASSETID>
                <URI>http://koala.gwdg.de/download/c16e7f14-ba59-4ea3-a5ca-6d5ac7c1f646/unpacked</URI>
            </ASSET>
        </ASSETBLOCK>
    </DIP>
</DIASMETSRESPONSE>

On server busy:

200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<DIASMETSRESPONSE type="busy" version="1.0">
    <DESCRIPTION>asset response</DESCRIPTION>
    <REQUEST>
        <EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
    </REQUEST>
    <DIP>
        <METADATABLOCK>
            <METADATA>
                <INTERNALASSETID>0e2592bd-3d37-40a2-a6a2-5360ee7016b7</INTERNALASSETID>
                <EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
                <ASSETVERSION>1</ASSETVERSION>
                <ASSETDESCRIPTION />
                <AIPISDELETED>0</AIPISDELETED>
                <AIPDELETIONTIMESTAMP>None</AIPDELETIONTIMESTAMP>
                <ASSETCREATIONTIMESTAMP>2016-02-23 09:54:50</ASSETCREATIONTIMESTAMP>
                <ASSETINGESTTIMESTAMP>2016-02-23 09:54:50</ASSETINGESTTIMESTAMP>
                <OLDOBJECTIDENTIFIER></OLDOBJECTIDENTIFIER>
                <OLDVERSION></OLDVERSION>
                <SIZE>21577686</SIZE>
                <DIPRETRIEVEWAITTIME unit="sec"></DIPRETRIEVEWAITTIME>
            </METADATA>
        </METADATABLOCK>
        <ASSETBLOCK />
    </DIP>
</DIASMETSRESPONSE>
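
A client can tell the two responses above apart by the type attribute of the root element and by the presence of a <URI> element. The following sketch shows one way to read both; the function name and the returned dictionary are assumptions.

```python
import xml.etree.ElementTree as ET


def parse_dip_response(xml_body: bytes) -> dict:
    """Read the response type, internal asset id and download URIs."""
    root = ET.fromstring(xml_body)
    # Collect every non-empty <URI>; a busy response has none.
    uris = [el.text.strip() for el in root.iter("URI")
            if el.text and el.text.strip()]
    return {
        "type": root.get("type"),  # "dip" or "busy"
        "internal_asset_id": root.findtext(".//METADATA/INTERNALASSETID"),
        "uris": uris,
    }
```

When type is "busy" the list of URIs is empty and the client would repeat the request later.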

The response is returned as a link to the DIP-file containing the asset. The technical metadata and the asset are placed in unpacked form on an HTTP-accessible location in the koala system.

The DIP data package that is returned for an Asset Retrieve Request will contain DIAS-METS formatted technical metadata with all information that was present in the input technical metadata, with the exception of data elements that have to be updated.

The following data elements will be updated with current values from Data Management in the DIAS-METS formatted technical metadata:

  • mets.OBJID must be set with the internalAssetID;
  • mets.amdSec.techMD.mdWrap.xmlData.lmerObject.objectIdentifier must be set with the internalAssetID;
  • mets.metsHdr.CREATEDATE must be set with the current system-date/time;
  • mets.metsHdr.agent.ROLE must be set to 'ARCHIVIST';
  • mets.metsHdr.agent.TYPE must be set to 'ORGANISATION';
  • mets.metsHdr.agent.name must be set to the requesting organisation

If the AIP of the requested asset has been deleted, koala sets the lmerObject.status element to 'deleted' in the DIAS-METS technical metadata included in the DIP data package returned for a successful Asset Retrieve Request.

In the same case, koala also removes all mets.fileSec.fileGrp.file elements from the mets.fileSec.fileGrp element identified by 'ASSET' in that technical metadata.

The physical files in the DIP data package will be present in the same relative location from the Asset root as they were in the input SIP data package.

The DIP-archive is the response to a request with the DipFmt parameter. It is made available on an HTTP-accessible location in the koala system and is placed in a directory that is unique to the request. This directory has two subdirectories: 'packed' and 'unpacked'. In the packed directory the DIP-archive is named Dip-<ResponseType>-<unique number>.<archive extension>, where ResponseType is the type of response requested (asset or metadata) and unique number is the Internal Asset Identifier.

The Archival Information Package (AIP) with the asset is retrieved from archival storage, and the asset is extracted and stored in unpacked format in the unpacked directory. Then the asset is packed into the requested archive format and stored in the packed directory. The DIP-archive is made available on an HTTP-accessible location in the koala system. Depending on the request, the DIP-archive contains metadata and/or the asset. When the asset is included, its content is the same as when it was ingested. This means the additional files and a descriptive file are included as well.

Request:

POST /api/retrieve/<by>/<assetid>
Authorization: Basic <authstring>
{
  "type": <type>,
  "format": <format>,
  "use_cache": <use_cache>,
  "callback_url": <callback_url>
}

<type> required, possible values: asset, fullmetadata
<format> optional, possible values: unpacked, zip, tar
<use_cache> optional, possible values: false, true
<callback_url> optional, koala sends an HTTP POST request to the specified URL after successfully building the DIP.

Response:

201 CREATED
Content-Type: application/json; charset=utf-8
Location: /api/retrievestatus/<session_id>
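
The request above can be prepared with Python's standard library as sketched below. The function only builds the request object and does not send it. The base URL, the function name and the Content-Type header are assumptions, as is leaving colons in the asset id unescaped; optional fields are omitted from the body when they are not set.

```python
import json
import urllib.request
from urllib.parse import quote


def build_retrieve_request(base_url, by, asset_id, auth_header,
                           dip_type="asset", dip_format=None,
                           use_cache=None, callback_url=None):
    """Prepare a POST /api/retrieve/<by>/<assetid> request."""
    if by not in ("internal", "external"):
        raise ValueError("by must be 'internal' or 'external'")
    body = {"type": dip_type}  # required: asset or fullmetadata
    if dip_format is not None:
        body["format"] = dip_format  # unpacked, zip or tar
    if use_cache is not None:
        body["use_cache"] = use_cache
    if callback_url is not None:
        body["callback_url"] = callback_url
    url = f"{base_url.rstrip('/')}/api/retrieve/{by}/{quote(asset_id, safe=':')}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Authorization": auth_header,
                 "Content-Type": "application/json"},
    )
```

The prepared request can then be passed to urllib.request.urlopen, and the session id is taken from the Location header of the 201 response.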

Request multiple assets

Purpose:

Retrieve multiple assets in a more efficient way. It is better to pass a list of packages to be retrieved directly to the API. The TSM can then internally set the retrieval order to minimize the number of tape changes and spooling of the tapes.

Request by internal_asset_id:

POST /api/retrieveall
Authorization: Basic <authstring>
{
  "internal_asset_ids": ["71f3b7c1-e0d7-48de-a75b-098498697da4", "4f086b5d-7827-4d0b-9adb-e21a25a30f48"],
  "callback_url": <callback_url>
}

<callback_url> optional, koala sends an HTTP POST request to the specified URL after successfully retrieving all assets.

Request by external_asset_id:

If migrations exist, only the latest asset will be retrieved.

POST /api/retrieveall
Authorization: Basic <authstring>
{
  "external_asset_ids": ["71f3b7c1-e0d7-48de-a75b-098498697da4", "4f086b5d-7827-4d0b-9adb-e21a25a30f48"],
  "callback_url": <callback_url>
}

<callback_url> optional, koala sends an HTTP POST request to the specified URL after successfully retrieving all assets.

Response:

201 CREATED
Content-Type: application/json; charset=utf-8
Location: /api/retrievestatus/<session_id>

Response on asset not found:

400 BAD REQUEST
Content-Type: application/json; charset=utf-8
{
  "msg": "Some assets could not be found",
  "ids": ["71f3b7c1-e0d7-48de-a75b-098498697da4", "4f086b5d-7827-4d0b-9adb-e21a25a30f48"]
}

Response when the download area does not have sufficient space:

400 BAD REQUEST
Content-Type: application/json; charset=utf-8
{
  "msg": "need approx. 1.1TB but only 101GB free"
}

Query retrieve status

Purpose:

Queries the status of a retrieve operation. Applies to REST endpoints that return a <session_id>.

Request:

GET /api/retrievestatus/<session_id>

Response for fullmetadata:

{
  "status": "ready",
  "internalassetid": "da12f687-67f2-452d-ab55-7c2ce517c9c4",
  "externalassetid": "c1c55e93-80cb-4639-af2f-653bf9ddb68f",
  "uri": "https://koala.gwdg.de/download/7aaff403-8ce9-439b-ae43-1e1ad51b76b6/unpacked/mets.xml"
}

Response for packed asset:

{
  "status": "ready",
  "internalassetid": "da12f687-67f2-452d-ab55-7c2ce517c9c4",
  "externalassetid": "c1c55e93-80cb-4639-af2f-653bf9ddb68f",
  "uri": "https://koala.gwdg.de/download/7aaff403-8ce9-439b-ae43-1e1ad51b76b6/packed/dip-asset.zip"
}

Response for unpacked asset:

{
  "status": "ready",
  "internalassetid": "da12f687-67f2-452d-ab55-7c2ce517c9c4",
  "externalassetid": "c1c55e93-80cb-4639-af2f-653bf9ddb68f",
  "uri": "https://koala.gwdg.de/download/7aaff403-8ce9-439b-ae43-1e1ad51b76b6/unpacked"
}

Response for retrieve_all:

{
  "status": "ready",
  "external_asset_id": null,
  "internal_asset_id": null,
  "suppl_info": "4 of 4",
  "uri": "https://koala.gwdg.de/download/e3ffd339-9eba-4de4-8643-28b5b507eb5e"
}

suppl_info reflects the retrieve progress.

Assets are grouped by internal_asset_id. Example:

https://koala.gwdg.de/download/e3ffd339-9eba-4de4-8643-28b5b507eb5e/47c53955-e7b6-48e5-b046-005fc9d5683a/aip.zip
https://koala.gwdg.de/download/e3ffd339-9eba-4de4-8643-28b5b507eb5e/47c53955-e7b6-48e5-b046-005fc9d5683a/mets.xml

https://koala.gwdg.de/download/e3ffd339-9eba-4de4-8643-28b5b507eb5e/cc8d6f41-e76e-49a4-a448-186a67526781/aip.zip
https://koala.gwdg.de/download/e3ffd339-9eba-4de4-8643-28b5b507eb5e/cc8d6f41-e76e-49a4-a448-186a67526781/mets.xml
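
Given the uri of a retrieve_all status response and the requested internal asset ids, the download locations can be derived as in this sketch. It assumes that every asset directory contains the files aip.zip and mets.xml, as in the example above; the function name is hypothetical.

```python
def retrieve_all_urls(download_uri: str, internal_asset_ids: list) -> dict:
    """Map each internal asset id to the URLs of its aip.zip and mets.xml."""
    base = download_uri.rstrip("/")
    return {
        asset_id: {
            "aip": f"{base}/{asset_id}/aip.zip",
            "mets": f"{base}/{asset_id}/mets.xml",
        }
        for asset_id in internal_asset_ids
    }
```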

MDQI

Every metadata file ingested into koala is stored in the BaseX database. BaseX is a native XML database and supports standardized query languages such as XPath and XQuery. See: http://docs.basex.org/wiki/Getting_Started

REST Query URL

POST https://koala.gwdg.de:8984/rest
User-Agent: <agent>
Authorization: Basic <basicauth>
Host: 134.76.20.34:8984
Content-Length: 165
<query xmlns="http://basex.org/rest">
    <text>
        <![CDATA[ 
            enter query here 
        ]]>
    </text>
</query>
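
The request body can also be generated programmatically. The sketch below builds the <query> document with Python's standard library; instead of a CDATA section it relies on XML escaping of the query text, which is equivalent for an XML parser. The function name is an assumption.

```python
import xml.etree.ElementTree as ET

BASEX_REST_NS = "http://basex.org/rest"


def build_query_body(xquery: str) -> bytes:
    """Wrap an XQuery expression in a BaseX REST <query> document."""
    # The xmlns attribute places <query> and <text> in the REST namespace.
    query = ET.Element("query", {"xmlns": BASEX_REST_NS})
    text = ET.SubElement(query, "text")
    # Characters such as < and & are escaped during serialization.
    text.text = xquery
    return ET.tostring(query, encoding="utf-8")


body = build_query_body("count(for $db in db:list() return collection($db))")
```

The resulting bytes are sent as the body of the POST request, and their length gives the Content-Length header.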

Sample Queries:

Get external asset ids of all assets that have a minimum of one file with mimetype: application/pdf

declare namespace metsns = "http://www.loc.gov/METS/";
declare namespace lmerObject = "http://www.ddb.de/LMERobject";

for $db in db:list()
  for $doc in collection($db)
    let $pid := $doc/koala/metsns:mets/metsns:amdSec/metsns:techMD/metsns:mdWrap/metsns:xmlData/lmerObject:persistentIdentifier
    let $f := $doc//metsns:mets/metsns:fileSec/metsns:fileGrp/metsns:file[@MIMETYPE = "application/pdf"]
    (: keep only assets that contain at least one PDF file :)
    where exists($f)
    return $pid/text()

Count all files (not assets!)

declare default element namespace "http://www.loc.gov/METS/";
count(for $db in db:list()
    return collection($db)//file)

Get internal asset ids of all assets that have a minimum of one file with mimetype: application/pdf

declare default element namespace "http://www.loc.gov/METS/";
declare namespace lmerObject = "http://www.ddb.de/LMERobject";

for $db in db:list()
    for $doc in db:open($db)
        (: OBJID is an attribute of the mets element, not of the document node :)
        let $objid := $doc//mets/@OBJID
        let $file := $doc//file
        where $file/@MIMETYPE = 'application/pdf'
        return <OBJID>{data($objid)}</OBJID>

Search a specific asset by external asset id

declare namespace mets = "http://www.loc.gov/METS/";
declare namespace lmerObject = "http://www.ddb.de/LMERobject";

for $db in db:list()
  for $doc in collection($db)
    where $doc/koala/mets:mets/mets:amdSec/mets:techMD/mets:mdWrap/mets:xmlData/lmerObject:persistentIdentifier/text() eq 'urn:nbn:de:101:1-2008591923'
    return $doc

Search a specific asset by internal asset id

(for $db in db:list()
  for $doc in collection($db)
      where $doc//*:internal_asset_id = 'urn:diasid:ast:kop01:0201504171131522770000'
      return $doc)[1]

Get the number of files where fits was used in a specific version

declare namespace mets =  "http://www.loc.gov/METS/"; 
declare namespace lmerObject = "http://www.ddb.de/LMERobject";
declare namespace lmerFile = "http://www.ddb.de/LMERfile";
declare namespace fits = "http://hul.harvard.edu/ois/xml/ns/fits/fits_output";

count(for $db in db:list()
    for $doc in collection($db)
    where $doc//fits:identity[@toolversion = '0.6.1']
    return $doc//mets:file)

Get first 10 files where bitrate in fits metadata is set to a specific value

declare default element namespace "http://www.loc.gov/METS/";
declare namespace lmerObject = "http://www.ddb.de/LMERobject";
declare namespace lmerFile = "http://www.ddb.de/LMERfile";
declare namespace fits = "http://hul.harvard.edu/ois/xml/ns/fits/fits_output";

let $a := for $db in db:list()
  for $doc in collection($db)
    let $techmd := $doc//amdSec/techMD
    let $bitrate := $techmd/mdWrap/xmlData/lmerFile:xmlData/fits:fits/fits:metadata/fits:audio/fits:bitRate
    let $fileid := $techmd/@ID
    where $bitrate = 128000
    return $doc//file[@ADMID=$fileid]

 for $b in subsequence($a, 1, 10)
 return $b

Get all files for a given external_asset_id

declare namespace mets = "http://www.loc.gov/METS/"; 
declare namespace lmerObject = "http://www.ddb.de/LMERobject";
declare namespace xlink = "http://www.w3.org/1999/xlink";

for $db in db:list()
  for $doc in collection($db)
    let $file := $doc//mets:file
      where $doc/koala/aip/external_asset_id = "urn:de:sample:myurn"
      return $doc//mets:fileGrp

Directly access text index

count(
 for $db in db:list()
 return db:text($db, "urn:diasid:fty:kopal:0000000000000000000000")
)

Count the number of files which have an unknown filetype.

declare namespace lmerFile = "http://www.ddb.de/LMERfile";

count(
  (# db:enforceindex #) {
  for $db in db:list()
  for $doc in db:open($db)/*
  where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000'
  return $doc
})

Simple blob storage

Archive and retrieve binary objects via simple HTTP REST calls. HTTP basic authentication is required for every request.

Upload

An uploaded object is addressable by its internal_asset_id.

user@host ~: curl -i -X POST --data-binary @"myfile" --user user:pw https://koala.gwdg.de/api/sbs
HTTP/1.1 100 Continue

HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 61

{"internal_asset_id":"63b54de9-dc2b-4bff-be28-e2ad92d02b5c"}

Download

Retrieving a large object can take some time. Therefore an asynchronous retrieve job must be triggered.

user@host ~: curl -i --user user:pw -X POST -H "Content-Type: application/json" -d '{"internal_asset_id":"63b54de9-dc2b-4bff-be28-e2ad92d02b5c"}' https://koala.gwdg.de/api/sbsr
HTTP/1.1 202 ACCEPTED
Location: https://koala.gwdg.de/api/sbsr/39bffece-3cb8-419a-bb58-432d061483c8

If the object is not yet available:

user@host ~: curl -i --user user:pw https://koala.gwdg.de/api/sbsr/39bffece-3cb8-419a-bb58-432d061483c8
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8

{
    "status": "retrieve",
    "originator": "retriever@6146de9de49f",
    "internal_asset_id": "c546a567-724a-46ed-8ffb-19ab078ffc7b"
}

If the object is available for download:

user@host ~: curl -i --user user:pw https://koala.gwdg.de/api/sbsr/39bffece-3cb8-419a-bb58-432d061483c8
HTTP/1.1 303 SEE OTHER
Location: https://koala.gwdg.de/download/39bffece-3cb8-419a-bb58-432d061483c8/content

Save the object to a local file:

user@host ~: curl --user user:pw -o mybigfile https://koala.gwdg.de/download/39bffece-3cb8-419a-bb58-432d061483c8/content
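
The polling step of the download procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: wait_for_download_url and the fetch callable are hypothetical names, and fetch stands for any HTTP client call that returns the status code and headers without following redirects.

```python
import time
from typing import Callable, Mapping, Tuple

# fetch(url) -> (status_code, headers); supplied by the caller, must NOT follow redirects.
Fetch = Callable[[str], Tuple[int, Mapping[str, str]]]


def wait_for_download_url(fetch: Fetch, job_url: str,
                          interval: float = 5.0, max_polls: int = 60) -> str:
    """Poll a retrieve job until it redirects to the download location."""
    for _ in range(max_polls):
        status, headers = fetch(job_url)
        if status == 303:
            # Job finished: Location points at the downloadable object.
            return headers["Location"]
        if status != 200:
            raise RuntimeError(f"unexpected status {status} for {job_url}")
        # 200 means the job is still in progress; wait before asking again.
        time.sleep(interval)
    raise TimeoutError(f"{job_url} not ready after {max_polls} polls")
```

Passing the HTTP call in as a parameter keeps the loop independent of any particular client library; the interval and poll limit are arbitrary defaults, not values prescribed by koala.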

Delete

Delete object from database and archival storage.

user@host ~: curl -X DELETE -i --user user:pw https://koala.gwdg.de/api/sbs/39bffece-3cb8-419a-bb58-432d061483c8
HTTP/1.1 200 OK

{"msg":"ok"}

Miscellaneous

Ready for ingest

Purpose:

Can be used by connected systems to decide if koala is currently ready to process SIPs.

Request:

GET /api/check_ingest_ability 

Response:

200 OK
Content-Type: application/json; charset=utf-8
{
  "ready": true,
  "uploadarea_usage": {"free": 68058800128, "total": 73550618624, "used": 2244472832}
}
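
A connected system might evaluate this response roughly as in the following Python sketch. The function name can_submit and the additional free-space comparison are assumptions for illustration; only the ready flag and the uploadarea_usage fields come from the response above.

```python
import json


def can_submit(response_body: str, sip_size_bytes: int) -> bool:
    """Decide whether a SIP of the given size should be uploaded now (hypothetical helper)."""
    status = json.loads(response_body)
    free_bytes = status["uploadarea_usage"]["free"]
    # Assumption: besides the "ready" flag, the SIP should also fit into the
    # free space reported for the upload area.
    return bool(status["ready"]) and sip_size_bytes <= free_bytes
```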

Health

Purpose:

Requests the health status of the middleware components. For each check, koala tries to create a new connection to the respective backend.

Request:

GET /api/health 

Response:

200 OK
Content-Type: application/json; charset=utf-8
{
    "health": [
        {
          "name": "database", 
          "value": "ok"
        }, 
        {
          "name": "archival storage", 
          "value": "ok"
        }, 
        {
          "name": "cache", 
          "value": "ok"
        }, 
        {
          "name": "message queue", 
          "value": "ok"
        }, 
        {
          "name": "mdqi", 
          "value": "ok"
        }
    ], 
    "last_check": "2018-01-24 09:14:51"
}
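
For monitoring purposes the response can be reduced to the list of components that need attention. The following Python sketch assumes that any value other than "ok" signals a problem, since "ok" is the only value shown above; failing_components is a hypothetical helper name.

```python
import json
from typing import List


def failing_components(response_body: str) -> List[str]:
    """Return the names of all components whose health value is not "ok"."""
    report = json.loads(response_body)
    # Assumption: every value other than "ok" is treated as a failure.
    return [entry["name"] for entry in report["health"] if entry["value"] != "ok"]
```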

App status

Purpose:

Requests which koala apps are running at the moment and their current processing status.

Request:

GET /api/stats/apps 

Response:

200 OK
Content-Type: application/json; charset=utf-8
{
  "apps": [
    {
      "name": "loader@2c608e84f918", 
      "processing": "done", 
      "progress": "100", 
      "status": "running", 
      "uptime": "00:17:09", 
      "version": "c341d48-267-2018-01-24"
    }, 
    {
      "name": "loader@43b2fc39f0db", 
      "processing": "done", 
      "progress": "100", 
      "status": "running", 
      "uptime": "00:17:09", 
      "version": "c341d48-267-2018-01-24"
    }, 
    {
      "name": "loader@b0bfe15a26ea", 
      "processing": "done", 
      "progress": "100", 
      "status": "running", 
      "uptime": "00:17:11", 
      "version": "c341d48-267-2018-01-24"
    }, 
    {
      "name": "loader@b3a356a302f5", 
      "processing": "done", 
      "progress": "100", 
      "status": "running", 
      "uptime": "00:17:09", 
      "version": "c341d48-267-2018-01-24"
    }, 
    {
      "name": "purger@36c076ad6740", 
      "processing": "sleeping", 
      "progress": "0", 
      "status": "running", 
      "uptime": "00:17:11", 
      "version": "c341d48-267-2018-01-24"
    }, 
    {
      "name": "retriever@a05eb49313d4", 
      "processing": "", 
      "progress": "0", 
      "status": "running", 
      "uptime": "00:17:11", 
      "version": "c341d48-267-2018-01-24"
    }, 
    {
      "name": "scheduler@142e6754c68c", 
      "processing": "sleeping", 
      "progress": "0", 
      "status": "running", 
      "uptime": "00:17:11", 
      "version": "c341d48-267-2018-01-24"
    }, 
    {
      "name": "stats@b4e4e1dbcf6a", 
      "processing": "sleeping", 
      "progress": "0", 
      "status": "running", 
      "uptime": "00:17:11", 
      "version": "c341d48-267-2018-01-24"
    }
  ]
}
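
The app list can be condensed into a count per app type, for example to verify that the expected number of loaders is running. This Python sketch assumes that the part of the name before the @ identifies the app type; running_apps_by_role is a hypothetical helper name.

```python
import json
from collections import Counter


def running_apps_by_role(response_body: str) -> Counter:
    """Count apps with status "running", grouped by the name part before '@'."""
    apps = json.loads(response_body)["apps"]
    return Counter(
        app["name"].split("@", 1)[0]  # assumption: "<type>@<instance id>"
        for app in apps
        if app["status"] == "running"
    )
```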

Disk status

Purpose:

Requests disk usage.

Request:

GET /api/stats/disk 

Response:

200 OK
Content-Type: application/json; charset=utf-8
{
  "disk_download": {
    "free": 45697851392, 
    "total": 52710469632, 
    "used": 6995841024
  }, 
  "disk_error": {
    "free": 22852845568, 
    "total": 26288123904, 
    "used": 3418501120
  }, 
  "disk_preload": {
    "free": 51225415680, 
    "total": 52710469632, 
    "used": 1468276736
  }, 
  "disk_work": {
    "free": 26223419392, 
    "total": 26288123904, 
    "used": 47927296
  }
}
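
Since the response reports raw byte counts, a client usually has to derive the fill level itself. A minimal Python sketch, with usage_percent as a hypothetical helper name:

```python
import json
from typing import Dict


def usage_percent(response_body: str) -> Dict[str, float]:
    """Map each disk area to its used share of the total size, in percent."""
    disks = json.loads(response_body)
    return {
        name: round(100.0 * values["used"] / values["total"], 1)
        for name, values in disks.items()
    }
```

Note that free and used do not necessarily add up to total, so the share is computed from used and total only.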

Ingest statistics

Purpose:

Requests time-based ingest statistics.

Request:

GET /api/stats/asset/<resolution> 

<resolution>    hour/day/week/month/year, required

Response:

200 OK
Content-Type: application/json; charset=utf-8
{
  "asset": {
    "average": "11.09 MB", 
    "count": "2,956,898", 
    "sum": "31.28 TB"
  }, 
  "data": [
    {
      "x": "2018-01-24 08:30", 
      "y1": "3", 
      "y2": "19168649"
    }, 
    {
      "x": "2018-01-24 08:31", 
      "y1": "23", 
      "y2": "169189475"
    }
  ]
}
x: timespan
y1: number of assets ingested
y2: number of bytes
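
The y1 and y2 values are delivered as strings, so they have to be converted before any arithmetic. The following Python sketch sums both series over all returned timespans; ingest_totals is a hypothetical helper name.

```python
import json
from typing import Tuple


def ingest_totals(response_body: str) -> Tuple[int, int]:
    """Return (number of assets, number of bytes) summed over all data points."""
    points = json.loads(response_body)["data"]
    # y1 and y2 arrive as strings and are converted to integers here.
    assets = sum(int(point["y1"]) for point in points)
    size_bytes = sum(int(point["y2"]) for point in points)
    return assets, size_bytes
```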

Refresh cache

Purpose:

Invalidates all cached data and triggers a recalculation of all statistics.

Warning

This can take some time and generates high load on the database.

Request:

GET /api/refreshcache 

Response:

200 OK
Content-Type: application/json; charset=utf-8
{
  "result": "ok"
}

Show cache sync status

Purpose:

Compares the number of stored assets in cache and db.

Request:

GET /api/showcachesync 

Response:

200 OK
Content-Type: application/json; charset=utf-8
{
  "cache_count": 654899, 
  "db_count": 654899, 
  "diff": 0, 
  "result": "ok"
}

Limitations

  • L1: Schemas used to validate the metadata file during ingest are limited to a maximum file size of 16 MB, the limit of MySQL's MEDIUMBLOB data type.
  • L2: mets.xml metadata files are limited to a maximum file size of 0.5 x the RAM of the loader host.