API¶
References¶
- SIP Interface Spec: kopal_DIAS_SIP_Interface_Specification.pdf
- DIP Interface Spec: kopal_DIAS_DIP_Interface_Specification.pdf
- UOF Spec: kopal_Universelles_Objektformat.pdf
Overview¶
koala provides two kinds of interfaces. The older one is called DIAS and is required by the DNB. The newer one is a REST interface.
Some REST endpoints require basic authentication, a simple authentication scheme built into the HTTP protocol. The client sends HTTP requests with an Authorization header that contains the word Basic followed by a space and the base64-encoded string username:password.
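As a minimal sketch of the scheme described above, the header value can be built with the Python standard library. The function name `basic_auth_header` and the credentials are illustrative and not part of koala.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the Authorization header value for HTTP basic authentication."""
    # Join the credentials with a colon and base64-encode the resulting bytes.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    # Example credentials only; prints "Basic YWRtaW46cGFzc3dvcmQ="
    print(basic_auth_header("admin", "password"))
```

Note that the encoded string must not contain a trailing newline, which is easy to introduce when encoding on the command line with `echo` instead of `echo -n`.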
SIP package format¶
The SIP is a self-describing archive which includes technical metadata and digital assets. Technical metadata must be specified as a file named mets.xml in the root directory. Digital content must be located in a subfolder called content/. See SIP Interface Spec 4.2 DIAS-METS format.
Sample mets.xml
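The layout above can be sketched with Python's `tarfile` module. This is an illustration only: the helper name `build_sip` is hypothetical, and it assumes that "root directory" means the top level of the archive. The DIAS-METS content of mets.xml itself is outside the scope of this sketch.

```python
import tarfile
from pathlib import Path


def build_sip(mets_path: str, content_dir: str, sip_path: str) -> list:
    """Pack mets.xml and all content files into a tar SIP; return the member names."""
    with tarfile.open(sip_path, "w") as tar:
        # Technical metadata goes to the archive root under the fixed name mets.xml.
        tar.add(mets_path, arcname="mets.xml")
        # Every regular file below content_dir is stored below content/.
        for file in sorted(Path(content_dir).rglob("*")):
            if file.is_file():
                rel = file.relative_to(content_dir).as_posix()
                tar.add(str(file), arcname=f"content/{rel}")
    # Re-open the archive to report what was actually written.
    with tarfile.open(sip_path) as tar:
        return tar.getnames()
```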
Submit SIP¶
SFTP is used to transport the SIP from the producer to the koala upload directory. The following procedure must be followed:
- Use a UNIQUE temporary filename during the transfer to prevent the scheduler from issuing a load request on an incomplete file, e.g. <unix-timestamp>.tmp or <guid>.tmp
- Transfer the SIP
- Upon finishing the transfer, rename the temporary file to a file with a unique name and a supported extension. Only files with a supported format will be processed, e.g. mv <unix-timestamp>.tmp <unix-timestamp>.tar
Info
Currently supported SIP formats are: tar, tar.gz, zip
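The upload procedure can be sketched as follows. The helper `upload_sip` is hypothetical; it expects any SFTP client object that offers `put(localpath, remotepath)` and `rename(oldpath, newpath)`, as paramiko's `SFTPClient` does. Using a nanosecond timestamp as the unique name is an assumption of this sketch; a GUID works just as well.

```python
import time

SUPPORTED_FORMATS = ("tar", "tar.gz", "zip")


def upload_sip(sftp, local_path: str, upload_dir: str, extension: str = "tar") -> str:
    """Upload a SIP under a temporary name, then rename it to its final unique name."""
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported SIP format: {extension}")
    stem = str(time.time_ns())  # assumed to be unique per producer
    tmp_name = f"{upload_dir}/{stem}.tmp"
    final_name = f"{upload_dir}/{stem}.{extension}"
    # While the file carries the .tmp name, no load request is issued for it.
    sftp.put(local_path, tmp_name)
    # The rename marks the transfer as complete and makes the SIP eligible for ingest.
    sftp.rename(tmp_name, final_name)
    return final_name
```

The returned name (without the directory) is the SIP name that is later used to query the ingest status.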
Metadata Update¶
Purpose:
Provides a mechanism to update the metadata of an already existing asset. It uses the same mechanism as described in Submit SIP, except that the given SIP does not contain any content data. The ingest process:
- validates that the technical metadata regarding content files has not changed, so that metadata and stored content do not get out of sync.
- overrides the metadata file in archival storage
- updates the mdqi search index
Info
Not in original spec. Extension in koala only.
SIP¶
Request storage rules¶
Purpose:
Storage Rules determine which kinds of medium exist to store the AIP.
Request:
GET /AccessManager/AccessManager?cmd=storagerules
Response:
200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<STORAGERULES>
<Item StorageRule="Depot" />
<Item StorageRule="Digitisation" />
<Item StorageRule="Hosting" />
<Item StorageRule="Optical" />
<Item StorageRule="Scientific" />
<Item StorageRule="Tape" />
</STORAGERULES>
Warning
Not implemented. Multiple storage backends selectable per SIP were never needed.
Request file types¶
Purpose:
To obtain the correct values for describing file types in the SIP data package, the producer can retrieve a list of file type IDs and file properties. Note that the use of unsupported (or unknown) file types is permitted. However, these files may not open correctly and are not subject to preservation processing.
Request:
GET /AccessManager/AccessManager?cmd=filetypes
Response:
200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<FILETYPES>
<Item fileTypeID="urn:diasid:fty:kopal:0200507050000000000001" fileType="Adobe Acrobat Document" fileTypeVersion="1.2" fileTypeStatus="active" mimeType="application/pdf" fileExtension=".pdf"/>
<Item fileTypeID="urn:diasid:fty:kopal:0200507050000000000002" fileType="Adobe Acrobat Document" fileTypeVersion="1.3" fileTypeStatus="active" mimeType="application/pdf" fileExtension=".pdf"/>
</FILETYPES>
Request:
GET /api/filetypes
Response:
200 OK
Content-Type: application/json; charset=utf-8
[
{"id": "urn:diasid:fty:kopal:0200507050000000000001", "name": "Adobe Acrobat Document", "version": "1.2", "status": "active", "mimetype": "application/pdf", "extensions": [".pdf"]},
{"id": "urn:diasid:fty:kopal:0200507050000000000001", "name": "JPEG Image", "version": "1.02", "status": "active", "mimetype": "image/jpeg", "extensions": [".jpg", ".jpeg"]}
]
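A producer will typically look up the file type ID for a given MIME type and version in the JSON list. The sketch below assumes the field names shown in the JSON response above; the function name and the sample list are illustrative.

```python
from typing import Optional

# Sample data modelled on the responses above (two PDF versions).
SAMPLE_FILETYPES = [
    {"id": "urn:diasid:fty:kopal:0200507050000000000001", "name": "Adobe Acrobat Document",
     "version": "1.2", "status": "active", "mimetype": "application/pdf", "extensions": [".pdf"]},
    {"id": "urn:diasid:fty:kopal:0200507050000000000002", "name": "Adobe Acrobat Document",
     "version": "1.3", "status": "active", "mimetype": "application/pdf", "extensions": [".pdf"]},
]


def find_filetype_id(filetypes: list, mimetype: str, version: str) -> Optional[str]:
    """Return the ID of the active file type matching mimetype and version, or None."""
    for item in filetypes:
        if (item["mimetype"], item["version"], item["status"]) == (mimetype, version, "active"):
            return item["id"]
    # Unknown file types are permitted, so a missing match is not an error.
    return None
```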
Request transport parameters¶
Purpose:
The default way to transport the SIP data package to koala is the SFTP protocol. This operation returns the SFTP connection information.
Request:
GET /AccessManager/AccessManager?cmd=FTPInfo&username=<user>&password=<pw>
<user> user with api permission
<pw> password
Response:
200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<FTPINFO>
<FTPInfoItem FTP_Password="pw" SIP.preloadarea="/data/prddias/preload/SIP/DDBDIASMETS10"
FTP_Server="host:port" FTP_User="user"/>
</FTPINFO>
Request:
GET /api/ftpinfo
Authorization: Basic YWRtaW46cGFzc3dvcmQ=
Response:
200 OK
Content-Type: application/json; charset=utf-8
{
"host": "koala.gwdg.de:22",
"user": "user",
"password": "pw",
"uploadarea": "/data/upload"
}
Note
koala supports SFTP authentication with ssh public/private key pairs as well.
Request SIP ingest¶
Purpose:
Provides an operation via which a producer application can retrieve a unique ticket number identifying an instance of the SIP Ingest process and the associated preload area into which the SIP can be uploaded.
Request:
GET /IngestStatus/Ingest?SipName=<SIPName>
<SIPName> Name of SIP. E.g: 1345743.tar
Response:
200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<MetadataBlock type="busy" version="1.0">
<Description>Ingest Status</Description>
<Metadata>
<SIP_Filename>123.tar</SIP_Filename>
<SIPIngestTicketNumber>123.tar</SIPIngestTicketNumber>
<SIP.preloadarea>/data/prddias/preload/SIP/DDBDIASMETS10</SIP.preloadarea>
<FTP_Password>pw</FTP_Password>
<FTP_Server>server:port</FTP_Server>
<FTP_User>user</FTP_User>
</Metadata>
</MetadataBlock>
Note
Implemented for compatibility reasons only. <SIPIngestTicketNumber> always reflects the specified SipName=<SIPName>. Because every SIP must have a UNIQUE name, the SIP name is sufficient to keep track of a SIP via the "Request ingest status" method.
Request ingest status¶
Purpose:
Provides an operation that can be used by producer applications to retrieve status and error information related to a SIP.
Request:
GET /IngestStatus/Ingest?TicketId=<TicketId>
<TicketId> SIP name with extension. e.g. 1427326550312.tar
Response:
200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<MetadataBlock type="busy" version="1.0">
<Description>Ingest Status</Description>
<Metadata>
<StartTimestamp>2016-02-23 08:40:52</StartTimestamp>
<SourceID>loader@820c49d0e6ef</SourceID>
<State>busy</State>
<Substate>extract</Substate>
<SupplementaryInformation/>
<Timestamp>2016-02-23 08:40:57</Timestamp>
</Metadata>
</MetadataBlock>
<State> valid values: uploading, busy, done, error
<Substate> corresponds to the current processing step of the loader application. It can contain an arbitrary value until it ends in one of the following states:
- done: SIP is successfully ingested.
- error: An error occurred while ingesting the SIP. Additional information is available in the <SupplementaryInformation> element.
- fatal: A fatal error occurred which stops the corresponding loader.
Request:
GET /api/ingeststatus/<SIP>
<SIP> Name of SIP. E.g: 13457543.tar
Response:
200 OK
Content-Type: application/json; charset=utf-8
{
"create_ts": "2018-06-28T09:11:45",
"modify_ts": "2018-06-28T09:11:45",
"originator": "loader@24d737fc147f",
"status": "done",
"substate": "done",
"suppl_info": null
}
Info
Special care must be taken in case a loader stops without setting the appropriate ticket state. Therefore the client application must specify a timeout after which it considers the ingest transaction as failed. E.g. 1h for a SIP less than 50 GB.
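A client-side timeout of this kind can be sketched as a polling loop. The function `wait_for_ingest` is hypothetical; `fetch_status` stands for any callable that returns the parsed JSON of GET /api/ingeststatus/<SIP> as a dict. The polling interval and the injectable `sleep` and `clock` parameters are assumptions made to keep the sketch testable.

```python
import time
from typing import Callable


def wait_for_ingest(fetch_status: Callable[[], dict], timeout_s: float,
                    interval_s: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the ingest reaches a final state or the timeout expires.

    Returns "done", "error", "fatal" or "timeout".
    """
    deadline = clock() + timeout_s
    while True:
        info = fetch_status()
        # A fatal substate means the loader stopped; no further progress is expected.
        if info.get("substate") == "fatal":
            return "fatal"
        if info.get("status") in ("done", "error"):
            return info["status"]
        # The loader may have died without updating the ticket, so give up eventually.
        if clock() >= deadline:
            return "timeout"
        sleep(interval_s)
```

A caller would treat "timeout" like a failed ingest transaction, e.g. `wait_for_ingest(fetch, timeout_s=3600)` for a SIP smaller than 50 GB.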
Administration¶
The basic response to requests to koala will be an HTTP response message carrying an XML response message as body. The XML response message has the root element <MetadataBlock>, which contains a mandatory <header> element and either a <result> or an <error> element, depending on whether an operation completed successfully or not.
Note: A 'result' for an operation may either be positive (the intended operation completed successfully and had the desired outcome) or negative (the intended operation completed successfully but did not have the desired outcome). An 'error' is included only if an operation could not be completed.
The <header> element of the XML response message contains the following mandatory child elements:
<version> the version of the interface operation protocol
<request> the URL of the request
<datetime> the date/time stamp for the response message formatted according to ISO 8601:1988, i.e. YYYY + '-' +
MM + '-' + DD + 'T' + HH + ':' + MM + ':' + SS+ 'Z'
<originator> the originator of the request, i.e. the DIAS user organization that did the request
The <result> element of the XML response message contains the XML formatted result of the requested operation, as specified for each individual operation later in this document. The <result> element may be empty.
The <result> element has an enumerated attribute 'resultcode' that indicates the success of the operation. The following list specifies the possible general values for resultcode. Each operation may specify additional result codes to replace 'notOK'. The result code has no default value, to prevent ambiguous representations like <result/>.
- OK Operation completed successfully and had the desired outcome
- notOK Operation completed successfully, but did not have the desired outcome. No operation specific resultcode was provided
The <error> element of the XML response message contains the following elements:
<errortype> the type of error that occurred (mandatory)
<errorcode> the error code, unique within the type for each error (mandatory)
<errormessage> the message that describes the error (mandatory)
<errordetail> any details that can be provided about the error (optional)
ErrorCodes:
- 1: General system error
- 2: System temporarily unavailable
- 3: Unsupported operation
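The response envelope described above can be evaluated with a few lines of Python. This is a sketch under assumptions: the function name and the returned dict layout are illustrative, and the result code is compared case-insensitively because the examples in this document use both 'OK' and 'ok'.

```python
import xml.etree.ElementTree as ET


def parse_admin_response(xml_text: str) -> dict:
    """Split a <MetadataBlock> response into either a result or an error."""
    root = ET.fromstring(xml_text)
    error = root.find("error")
    if error is not None:
        # Collect errortype, errorcode, errormessage and the optional errordetail.
        details = {child.tag: (child.text or "").strip() for child in error}
        return {"ok": False, "error": details}
    result = root.find("result")
    if result is None:
        raise ValueError("response contains neither <result> nor <error>")
    code = result.get("resultcode")
    return {"ok": code is not None and code.lower() == "ok",
            "resultcode": code,
            "result": result}
```

A negative result (for example 'notOK') yields `ok == False` without an "error" key, which keeps the distinction between a completed operation with an undesired outcome and an operation that could not be completed.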
Request schema list¶
Purpose: Enables the retrieval of a list of all XML schema definitions that exist in the system. XML schemas are used to enforce the structural validity of the metadata XML file.
Request:
GET /dias/administration?request=getschemalist
Response:
200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<MetadataBlock>
<header>
<version>1.0</version>
<request>getschemalist</request>
<datetime>2018-07-16 08:00:24.577404</datetime>
<originator>DNB</originator>
</header>
<result resultcode="OK">
<xmlSchemas>
<xmlSchema>
<xmlSchemaID>urn:diasid:xsd:kop01:0000000000000000000001</xmlSchemaID>
<xmlSchemaName>fits.xsd</xmlSchemaName>
<xmlSchemaDescription/>
<xmlSchemaURI>
http://hul.harvard.edu/ois/xml/xsd/fits/fits_output.xsd
</xmlSchemaURI>
<xmlSchemaVersion>1</xmlSchemaVersion>
</xmlSchema>
...
</xmlSchemas>
</result>
</MetadataBlock>
Request a specific schema¶
Purpose: Enables the retrieval of a specific xsd schema file.
Request:
GET /dias/administration?request=getschema&schemaID=<schemaID>
<schemaID> e.g. urn:diasid:xsd:kop01:0000000000000000000009
Response:
200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" targetNamespace="http://hul.harvard.edu/ois/xml/ns/fits/fits_output" elementFormDefault="qualified" attributeFormDefault="unqualified">
<xs:element name="fits">
<xs:complexType>
...
</xs:complexType>
</xs:element>
</xs:schema>
Deletion of assets and AIPs¶
Purpose:
Provides an operation that can be used to either delete an Asset or delete an AIP. Deletion of assets which have derived assets is not supported.
Request:
GET /dias/administration?internalassetid=<internalassetid>&request=<deleterequest>&username=<user>&password=<pw>
<user> user with api permission
<pw> password
<internalassetid> The internal asset id of the asset that the delete is requested for.
<deleterequest> The delete function that is requested: deleteAsset or deleteAIP.
- deleteAsset: Remove data from data management and archival storage
- deleteAIP: Remove data from archival storage and mark deletion in data management.
Response:
200 OK
Content-Type: application/xml; charset=utf-8
<MetadataBlock>
<header>
<version>1.0</version>
<request></request>
<datetime>2016-02-23 11:27:00</datetime>
<originator>DNB</originator>
</header>
<result resultcode='OK'>
</result>
</MetadataBlock>
Request:
DELETE /api/asset/internal/<internalassetid>
Authorization: Basic <authstring>
{
"type": "deleteasset"
}
Response:
200 OK
Content-Type: application/json; charset=utf-8
Checking the consistency of stored assets¶
Purpose:
Provides an operation that can be used to check the consistency of an asset by recalculating the checksums and comparing the values with the stored values in Data Management.
Request:
GET /dias/administration?internalassetid=<internalassetid>&request=checkConsistency
<internalassetid> the internal asset id of the asset that the consistency check is requested for.
Response:
200 OK
Content-Type: application/xml; charset=utf-8
<MetadataBlock>
<header>
<version>1.0</version>
<request></request>
<datetime>2016-02-23 11:27:00</datetime>
<originator>DNB</originator>
</header>
<result resultcode='ok'>
</result>
</MetadataBlock>
Result codes attached to the checkConsistency operation:
- ok: The asset or AIP consistency checks did not return any consistency errors
- error: A consistency error occurred. Further information can be found in the result element.
- fatal: Indicates a fatal state which caused the application to shut down.
Request:
POST /api/checkconsistency/<internalassetid>
Authorization: Basic <authstring>
Response:
201 CREATED
Content-Type: application/json; charset=utf-8
Location: /api/retrievestatus/<sessionid>
Modified AIPs¶
Purpose:
Provides an operation to get an overview of all AIPs that have been created or deleted within a specified timeframe. This allows systems with a dependency on DIAS to discover changes.
Request:
GET /dias/administration?modification=<modification_type>
&from=<start>&to=<end>&request=getModifiedAIPs
<modification_type> created, aipdeleted, assetdeleted
<start> lower boundary of the status retrieval timeframe (inclusive), eg: 2005-01-01T00:00:00
<end> upper boundary of the status retrieval timeframe (exclusive), eg: 2006-12-31T23:59:59
Response:
200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<MetadataBlock>
<header>
<version>1.0</version>
<request>getmodifiedaips</request>
<datetime>2016-04-04 12:35:20.567000</datetime>
<originator>DNB</originator>
</header>
<result resultcode="OK">
<modifiedAIP>
<internalAssetID>9dac534d-38d7-41d4-9b6f-876dee236581</internalAssetID>
<modificationTimeStamp>2016-04-04 14:26:28</modificationTimeStamp>
</modifiedAIP>
</result>
</MetadataBlock>
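A dependent system could build the request URL and read the response as sketched below. The helper names and the base URL are hypothetical; the parameter names and element names are taken from the request and response shown above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

MODIFICATION_TYPES = ("created", "aipdeleted", "assetdeleted")


def modified_aips_url(base_url: str, modification: str, start: str, end: str) -> str:
    """Build the getModifiedAIPs URL; start is inclusive, end is exclusive."""
    if modification not in MODIFICATION_TYPES:
        raise ValueError(f"unknown modification type: {modification}")
    # urlencode percent-encodes the colons in the timestamps.
    query = urlencode({"modification": modification, "from": start,
                       "to": end, "request": "getModifiedAIPs"})
    return f"{base_url}/dias/administration?{query}"


def parse_modified_aips(xml_text: str) -> list:
    """Return (internalAssetID, modificationTimeStamp) pairs from the response."""
    root = ET.fromstring(xml_text)
    return [(aip.findtext("internalAssetID"), aip.findtext("modificationTimeStamp"))
            for aip in root.iter("modifiedAIP")]
```

Because the upper boundary is exclusive, consecutive queries can reuse the previous `end` as the next `start` without reporting an AIP twice.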
DIP¶
Request technical metadata¶
Purpose:
Request the technical metadata by the external asset identifier (lmerObject.persistentidentifier in SIP interface) or internal asset identifier.
One external asset identifier can refer to multiple internal assets when the digital object is transformed over time. The metadata of all related internal assets can be returned as part of this request. The request can be extended with the parameter 'list', which can have the values 'true' or 'false'. If 'list' is set to 'true', metadata is returned for all assets in the collection of assets with the specified external asset identifier. If the value is set to 'false' (the default), only the metadata of the latest asset with the specified external asset identifier is returned.
Request:
GET /dias/Retriever?ExternalAssetID=<ExternalAssetID>&RespFmt=<RespFmt>
&Request=Metadata&list=<list>
GET /dias/Retriever?InternalAssetID=<InternalAssetID>&RespFmt=<RespFmt>
&Request=Metadata
<RespFmt> Currently only 'xml' is supported.
<list> true or false
Response:
200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<DIASMETSRESPONSE type="dip" version="1.0">
<DESCRIPTION>metadata response</DESCRIPTION>
<REQUEST>
<EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
</REQUEST>
<DIP>
<METADATABLOCK>
<METADATA>
<INTERNALASSETID>0e2592bd-3d37-40a2-a6a2-5360ee7016b7</INTERNALASSETID>
<EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
<ASSETVERSION>1</ASSETVERSION>
<ASSETDESCRIPTION/>
<AIPISDELETED>0</AIPISDELETED>
<AIPDELETIONTIMESTAMP>None</AIPDELETIONTIMESTAMP>
<ASSETCREATIONTIMESTAMP>2016-02-23 09:54:50</ASSETCREATIONTIMESTAMP>
<ASSETINGESTTIMESTAMP>2016-02-23 09:54:50</ASSETINGESTTIMESTAMP>
<OLDOBJECTIDENTIFIER/>
<OLDVERSION/>
<SIZE>21577686</SIZE>
<DIPRETRIEVEWAITTIME unit="sec"/>
</METADATA>
</METADATABLOCK>
<ASSETBLOCK>
<ASSET>
<INTERNALASSETID>0e2592bd-3d37-40a2-a6a2-5360ee7016b7</INTERNALASSETID>
<URI/>
</ASSET>
</ASSETBLOCK>
</DIP>
</DIASMETSRESPONSE>
Request:
GET /api/asset/<by>/<assetid>?list=true
Authorization: Basic <authstring>
<by> external or internal
Response:
200 OK
Content-Type: application/json; charset=utf-8
{
"internalassetid": "da12f687-67f2-452d-ab55-7c2ce517c9c4",
"externalassetid": "c1c55e93-80cb-4639-af2f-653bf9ddb68f",
"assetversion": "1",
"aipisdeleted": "false",
"size": "5456222",
...
}
Request complete technical metadata by external or internal asset identifier¶
Purpose:
Request complete metadata: mets.xml
Request:
GET /dias/Retriever?InternalAssetID=<InternalAssetID>&Respfmt=<RespFmt>
&Request=FullMetadata
Authorization: Basic <authstring>
GET /dias/Retriever?ExternalAssetID=<ExternalAssetID>&Respfmt=<RespFmt>
&Request=FullMetadata
Authorization: Basic <authstring>
<RespFmt> Currently only 'xml' is supported.
Response:
200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<DIASMETSRESPONSE type="dip" version="1.0">
<DESCRIPTION>fullmetadata response</DESCRIPTION>
<REQUEST>
<EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
</REQUEST>
<DIP>
<METADATABLOCK>
<METADATA>
<INTERNALASSETID>0e2592bd-3d37-40a2-a6a2-5360ee7016b7</INTERNALASSETID>
<EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
<ASSETVERSION>1</ASSETVERSION>
<ASSETDESCRIPTION/>
<AIPISDELETED>0</AIPISDELETED>
<AIPDELETIONTIMESTAMP>None</AIPDELETIONTIMESTAMP>
<ASSETCREATIONTIMESTAMP>2016-02-23 09:54:50</ASSETCREATIONTIMESTAMP>
<ASSETINGESTTIMESTAMP>2016-02-23 09:54:50</ASSETINGESTTIMESTAMP>
<OLDOBJECTIDENTIFIER/>
<OLDVERSION/>
<SIZE>21577686</SIZE>
<DIPRETRIEVEWAITTIME unit="sec"/>
</METADATA>
</METADATABLOCK>
<ASSETBLOCK>
<ASSET>
<INTERNALASSETID>0e2592bd-3d37-40a2-a6a2-5360ee7016b7</INTERNALASSETID>
<URI>/download/7aaff403-8ce9-439b-ae43-1e1ad51b76b6/unpacked/mets.xml</URI>
</ASSET>
</ASSETBLOCK>
</DIP>
</DIASMETSRESPONSE>
The response is returned as a link to the file containing the full technical metadata. The technical metadata is placed in an unpacked form on a HTTP accessible location in the koala system.
The response will contain DIAS-METS formatted technical metadata with all information that was present in the input technical metadata with the exception of data elements that will be actualised.
Request:
POST /api/retrieve/<by>/<assetid>
Authorization: Basic <authstring>
{
"type": "fullmetadata"
}
Response:
201 CREATED
Content-Type: application/json; charset=utf-8
Location: /api/retrievestatus/<sessionid>
Request a specific asset¶
Purpose:
Request a specific asset.
Request:
GET /dias/Retriever?InternalAssetID=<InternalAssetID>&Respfmt=<RespFmt>&DipFmt=<format>
&Request=Asset
<InternalAssetID> Internal Asset Id
<RespFmt> Currently only 'xml' is supported.
<format> archive format ('zip', 'tar', 'unpacked'). If not specified the DIP will be unpacked.
Response:
On successful retrieve:
200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<DIASMETSRESPONSE type="dip" version="1.0">
<DESCRIPTION>asset response</DESCRIPTION>
<REQUEST>
<EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
</REQUEST>
<DIP>
<METADATABLOCK>
<METADATA>
<INTERNALASSETID>0e2592bd-3d37-40a2-a6a2-5360ee7016b7</INTERNALASSETID>
<EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
<ASSETVERSION>1</ASSETVERSION>
<ASSETDESCRIPTION />
<AIPISDELETED>0</AIPISDELETED>
<AIPDELETIONTIMESTAMP>None</AIPDELETIONTIMESTAMP>
<ASSETCREATIONTIMESTAMP>2016-02-23 09:54:50</ASSETCREATIONTIMESTAMP>
<ASSETINGESTTIMESTAMP>2016-02-23 09:54:50</ASSETINGESTTIMESTAMP>
<OLDOBJECTIDENTIFIER></OLDOBJECTIDENTIFIER>
<OLDVERSION></OLDVERSION>
<SIZE>21577686</SIZE>
<DIPRETRIEVEWAITTIME unit="sec"></DIPRETRIEVEWAITTIME>
</METADATA>
</METADATABLOCK>
<ASSETBLOCK>
<ASSET>
<INTERNALASSETID>0e2592bd-3d37-40a2-a6a2-5360ee7016b7</INTERNALASSETID>
<URI>http://koala.gwdg.de/download/c16e7f14-ba59-4ea3-a5ca-6d5ac7c1f646/unpacked</URI>
</ASSET>
</ASSETBLOCK>
</DIP>
</DIASMETSRESPONSE>
On server busy:
200 OK
Content-Type: application/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<DIASMETSRESPONSE type="busy" version="1.0">
<DESCRIPTION>asset response</DESCRIPTION>
<REQUEST>
<EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
</REQUEST>
<DIP>
<METADATABLOCK>
<METADATA>
<INTERNALASSETID>0e2592bd-3d37-40a2-a6a2-5360ee7016b7</INTERNALASSETID>
<EXTERNALASSETID>urn:nbn:de:101:1-2016021016844</EXTERNALASSETID>
<ASSETVERSION>1</ASSETVERSION>
<ASSETDESCRIPTION />
<AIPISDELETED>0</AIPISDELETED>
<AIPDELETIONTIMESTAMP>None</AIPDELETIONTIMESTAMP>
<ASSETCREATIONTIMESTAMP>2016-02-23 09:54:50</ASSETCREATIONTIMESTAMP>
<ASSETINGESTTIMESTAMP>2016-02-23 09:54:50</ASSETINGESTTIMESTAMP>
<OLDOBJECTIDENTIFIER></OLDOBJECTIDENTIFIER>
<OLDVERSION></OLDVERSION>
<SIZE>21577686</SIZE>
<DIPRETRIEVEWAITTIME unit="sec"></DIPRETRIEVEWAITTIME>
</METADATA>
</METADATABLOCK>
<ASSETBLOCK />
</DIP>
</DIASMETSRESPONSE>
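Since both outcomes are delivered with status 200, a client has to inspect the body to tell them apart. The following sketch assumes that the `type` attribute of the root element ("dip" or "busy") is the distinguishing marker, as in the two examples above; the function name is illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def asset_uri(xml_text: str) -> Optional[str]:
    """Return the download URI of a DIP response, or None if the server is busy."""
    root = ET.fromstring(xml_text)
    if root.get("type") == "busy":
        # The asset block is empty in this case; the request has to be repeated later.
        return None
    uri = root.findtext("DIP/ASSETBLOCK/ASSET/URI")
    return uri.strip() if uri else None
```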
The response is returned as a link to the DIP file containing the asset. The technical metadata and the asset are placed in unpacked form at an HTTP-accessible location in the koala system.
The DIP data package that is returned for an Asset Retrieve Request will contain DIAS-METS formatted technical metadata with all information that was present in the input technical metadata, with the exception of data elements that have to be actualised (updated).
The following data elements will be actualised with current values from Data Management in the DIASMETS formatted technical metadata:
- mets.OBJID must be set with the internalAssetID;
- mets.amdSec.techMD.mdWrap.xmlData.lmerObject.objectIdentifier must be set with the internalAssetID;
- mets.metsHdr.CREATEDATE must be set with the current system-date/time;
- mets.metsHdr.agent.ROLE must be set to 'ARCHIVIST';
- mets.metsHdr.agent.TYPE must be set to 'ORGANISATION';
- mets.metsHdr.agent.name must be set to the requesting organisation
koala will update the lmerObject.status-element to 'deleted' in the technical metadata in DIASMETS format that is included in the DIP data package that is returned for a successful Asset Retrieve Request in case the AIP of the requested Asset has been deleted.
koala will remove all mets.fileSec.fileGrp.file-elements in the mets.fileSec.fileGrp-element identified by 'ASSET' in the technical metadata in DIAS-METS format that is included in the DIP data package that is returned for a successful Asset Retrieve Request in case the AIP of the requested Asset has been deleted.
The physical files in the DIP data package will be present in the same relative location from the Asset root as they were in the input SIP data package.
The DIP-archive is the response on a request with the DipFmt parameter. A DIP-archive is made available on a HTTP
accessible location in the koala system. The DIP-archive is placed in a unique directory for the request. In this
directory there are two subdirectories: the 'packed' and the 'unpacked' directory. In the packed directory the
DIP-archive is named like: Dip-
The Archival Information Package (AIP) with the asset is retrieved from archival storage, and the asset is extracted and stored in unpacked format in the unpacked directory. Then the asset is packed into the requested archive format and stored in the packed directory. The DIP-archive is made available at an HTTP-accessible location in the koala system. Depending on the request, the DIP-archive contains metadata and/or the asset. When the asset is included, the content of the asset is the same as when it was ingested. This means the additional files and a descriptive file are included as well.
Request:
POST /api/retrieve/<by>/<assetid>
Authorization: Basic <authstring>
{
"type": <type>,
"format": <format>,
"use_cache": <use_cache>,
"callback_url": <callback_url>
}
<type> required, possible values: asset, fullmetadata
<format> optional, possible values: unpacked, zip, tar
<use_cache> optional, possible values: false, true
<callback_url> optional, sends http post request to specified url after successfully building DIP.
Response:
201 CREATED
Content-Type: application/json; charset=utf-8
Location: /api/retrievestatus/<session_id>
Callback request example:
POST https://www.external.org/events/retrieve
Content-Type: application/json; charset=utf-8
{
"create_ts": "2021-09-28 10:15:40",
"modify_ts": "2021-09-28 10:15:40",
"originator": " loader@8e47783cdcfd",
"uri": "https://koala.gwdg.de/download/5e6d4cf0-0dfb-44e2-8d2d-8ef58bd33a58/unpacked",
"internal_asset_id": "d53e34c7-e1bb-4867-9a36-95df1b7adcca",
"external_asset_id": "71f3b7c1-e0d7-48de-a75b-098498697da4"
}
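The request body for POST /api/retrieve/<by>/<assetid> can be assembled and validated as sketched here. The function name is hypothetical, and sending `use_cache` as a JSON boolean is an assumption; the accepted values are those listed for the request parameters above.

```python
import json
from typing import Optional


def retrieve_body(type_: str, format_: Optional[str] = None,
                  use_cache: Optional[bool] = None,
                  callback_url: Optional[str] = None) -> str:
    """Build the JSON body of a retrieve request, omitting unset optional fields."""
    if type_ not in ("asset", "fullmetadata"):
        raise ValueError(f"invalid type: {type_}")
    if format_ is not None and format_ not in ("unpacked", "zip", "tar"):
        raise ValueError(f"invalid format: {format_}")
    body = {"type": type_}
    if format_ is not None:
        body["format"] = format_
    if use_cache is not None:
        body["use_cache"] = use_cache  # assumed to be a JSON boolean
    if callback_url is not None:
        body["callback_url"] = callback_url
    return json.dumps(body)
```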
Request multiple assets¶
Purpose:
Retrieve multiple assets in a more efficient way. It is better to pass a list of packages to be retrieved directly to the API. The TSM can then internally set the retrieval order to minimize the number of tape changes and spooling of the tapes.
Request by internal_asset_id:
POST /api/retrieveall
Authorization: Basic <authstring>
{
"internal_asset_ids": ["71f3b7c1-e0d7-48de-a75b-098498697da4", "4f086b5d-7827-4d0b-9adb-e21a25a30f48"],
"callback_url": <callback_url>
}
<callback_url> optional, sends http post request to specified url after successfully retrieving all assets.
Request by external_asset_id:
If migrations exist, only the latest asset will be retrieved.
POST /api/retrieveall
Authorization: Basic <authstring>
{
"external_asset_ids": ["71f3b7c1-e0d7-48de-a75b-098498697da4", "4f086b5d-7827-4d0b-9adb-e21a25a30f48"],
"callback_url": <callback_url>
}
<callback_url> optional, sends http post request to specified url after successfully retrieving all assets.
Response:
201 CREATED
Content-Type: application/json; charset=utf-8
Location: /api/retrievestatus/<session_id>
Response on asset not found:
400 BAD REQUEST
Content-Type: application/json; charset=utf-8
{
"msg": "Some assets could not be found",
"ids": ["71f3b7c1-e0d7-48de-a75b-098498697da4", "4f086b5d-7827-4d0b-9adb-e21a25a30f48"]
}
Response when the download area does not have sufficient space:
400 BAD REQUEST
Content-Type: application/json; charset=utf-8
{
"msg": "need approx. 1.1TB but only 101GB free"
}
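A bulk retrieve request could be prepared with the standard library as follows. The sketch only constructs the request object and does not send it. The function name and base URL are hypothetical, and setting `Content-Type: application/json` on the request is an assumption.

```python
import json
import urllib.request
from typing import Iterable, Optional


def retrieveall_request(base_url: str, auth_header: str, ids: Iterable[str],
                        by: str = "internal",
                        callback_url: Optional[str] = None) -> urllib.request.Request:
    """Prepare POST /api/retrieveall for a list of internal or external asset IDs."""
    if by not in ("internal", "external"):
        raise ValueError("by must be 'internal' or 'external'")
    # The key is internal_asset_ids or external_asset_ids, depending on the ID kind.
    body = {f"{by}_asset_ids": list(ids)}
    if callback_url is not None:
        body["callback_url"] = callback_url
    return urllib.request.Request(
        f"{base_url}/api/retrieveall",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": auth_header, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would send it; a 400 response then carries one of the JSON error bodies shown above.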
Query retrieve status¶
Purpose:
To query the status of a retrieve operation. Applies to REST endpoints that return a <session_id>.
Request:
GET /api/retrievestatus/<session_id>
Response for fullmetadata:
{
"status": "ready",
"internalassetid": "da12f687-67f2-452d-ab55-7c2ce517c9c4",
"externalassetid": "c1c55e93-80cb-4639-af2f-653bf9ddb68f",
"uri": "https://koala.gwdg.de/download/7aaff403-8ce9-439b-ae43-1e1ad51b76b6/unpacked/mets.xml"
}
Response for packed asset:
{
"status": "ready",
"internalassetid": "da12f687-67f2-452d-ab55-7c2ce517c9c4",
"externalassetid": "c1c55e93-80cb-4639-af2f-653bf9ddb68f",
"uri": "https://koala.gwdg.de/download/7aaff403-8ce9-439b-ae43-1e1ad51b76b6/packed/dip-asset.zip"
}
Response for unpacked asset:
{
"status": "ready",
"internalassetid": "da12f687-67f2-452d-ab55-7c2ce517c9c4",
"externalassetid": "c1c55e93-80cb-4639-af2f-653bf9ddb68f",
"uri": "https://koala.gwdg.de/download/7aaff403-8ce9-439b-ae43-1e1ad51b76b6/unpacked"
}
Response for retrieve_all:
{
"status":"ready",
"external_asset_id":null,
"internal_asset_id":null,
"suppl_info":"4 of 4", # reflects retrieve progress
"uri":"https://koala.gwdg.de/download/e3ffd339-9eba-4de4-8643-28b5b507eb5e"}
Assets are grouped by internal_asset_id. Example:
https://koala.gwdg.de/download/e3ffd339-9eba-4de4-8643-28b5b507eb5e/47c53955-e7b6-48e5-b046-005fc9d5683a/aip.zip
https://koala.gwdg.de/download/e3ffd339-9eba-4de4-8643-28b5b507eb5e/47c53955-e7b6-48e5-b046-005fc9d5683a/mets.xml
https://koala.gwdg.de/download/e3ffd339-9eba-4de4-8643-28b5b507eb5e/cc8d6f41-e76e-49a4-a448-186a67526781/aip.zip
https://koala.gwdg.de/download/e3ffd339-9eba-4de4-8643-28b5b507eb5e/cc8d6f41-e76e-49a4-a448-186a67526781/mets.xml
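Given the `uri` of a finished retrieve_all session and the requested internal asset IDs, the individual download links can be derived as sketched below. The function name is illustrative, and the file names aip.zip and mets.xml are taken from the example above; whether every asset always provides exactly these two files is an assumption.

```python
def retrieve_all_urls(base_uri: str, internal_asset_ids: list) -> list:
    """List the expected download URLs, grouped by internal asset ID."""
    root = base_uri.rstrip("/")
    urls = []
    for asset_id in internal_asset_ids:
        # Each asset gets its own directory below the session directory.
        for name in ("aip.zip", "mets.xml"):
            urls.append(f"{root}/{asset_id}/{name}")
    return urls
```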
MDQI¶
Every metadata file ingested into koala is stored in the BaseX database. BaseX is a native XML database and provides support for standardized query languages such as XPath and XQuery. See: http://docs.basex.org/wiki/Getting_Started
REST Query URL
POST https://koala.gwdg.de:8984/rest
User-Agent: <agent>
Authorization: Basic <basicauth>
Host: 134.76.20.34:8984
Content-Length: 165
<query xmlns="http://basex.org/rest">
<text>
<![CDATA[
enter query here
]]>
</text>
</query>
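The request body shown above can be generated from an XQuery string as in this sketch. The function name is hypothetical. The only subtlety is that the sequence `]]>` must not appear inside a CDATA section, so it is split across two sections.

```python
def basex_query_body(xquery: str) -> str:
    """Wrap an XQuery expression in the <query> document expected by the REST endpoint."""
    # "]]>" would terminate the CDATA section early, so split it into two sections.
    safe = xquery.replace("]]>", "]]]]><![CDATA[>")
    return ('<query xmlns="http://basex.org/rest"><text>'
            f"<![CDATA[{safe}]]>"
            "</text></query>")
```

The resulting string is sent as the body of the POST request, together with the Authorization header.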
Sample Queries:
Get external asset ids of all assets that have a minimum of one file with mimetype: application/pdf
declare namespace metsns = "http://www.loc.gov/METS/";
declare namespace lmerObject = "http://www.ddb.de/LMERobject";
for $db in db:list()
for $doc in collection($db)
let $pid := $doc/koala/metsns:mets/metsns:amdSec/metsns:techMD/metsns:mdWrap/metsns:xmlData/lmerObject:persistentIdentifier
let $f := $doc//metsns:mets/metsns:fileSec/metsns:fileGrp/metsns:file[@MIMETYPE = "application/pdf"]
where exists($f)
return $pid/text()
Count all files (not assets!)
declare default element namespace "http://www.loc.gov/METS/";
count(for $db in db:list()
return collection($db)//file)
Get internal asset ids of all assets that have a minimum of one file with mimetype: application/pdf
declare default element namespace "http://www.loc.gov/METS/";
declare namespace lmerObject = "http://www.ddb.de/LMERobject";
for $db in db:list()
for $doc in db:open($db)
let $objid := $doc//mets/@OBJID
let $file := $doc//file
where $file/@MIMETYPE = 'application/pdf'
return <OBJID>{data($objid)}</OBJID>
Search a specific asset by external asset id
declare namespace metsns = "http://www.loc.gov/METS/";
declare namespace lmerObject = "http://www.ddb.de/LMERobject";
for $db in db:list()
for $doc in collection($db)
where $doc/koala/metsns:mets/metsns:amdSec/metsns:techMD/metsns:mdWrap/metsns:xmlData/lmerObject:persistentIdentifier/text() eq 'urn:nbn:de:101:1-2008591923'
return $doc
Search a specific asset by internal asset id
(for $db in db:list()
for $doc in collection($db)
where $doc//*:internal_asset_id = 'urn:diasid:ast:kop01:0201504171131522770000'
return $doc)[1]
Get the number of files where fits was used in a specific version
declare namespace mets = "http://www.loc.gov/METS/";
declare namespace lmerObject = "http://www.ddb.de/LMERobject";
declare namespace lmerFile = "http://www.ddb.de/LMERfile";
declare namespace fits = "http://hul.harvard.edu/ois/xml/ns/fits/fits_output";
count(for $db in db:list()
for $doc in collection($db)
where $doc//fits:identity[@toolversion = '0.6.1']
return $doc//mets:file)
Get first 10 files where bitrate in fits metadata is set to a specific value
declare default element namespace "http://www.loc.gov/METS/";
declare namespace lmerObject = "http://www.ddb.de/LMERobject";
declare namespace lmerFile = "http://www.ddb.de/LMERfile";
declare namespace fits = "http://hul.harvard.edu/ois/xml/ns/fits/fits_output";
let $a := for $db in db:list()
for $doc in collection($db)
let $techmd := $doc//amdSec/techMD
let $bitrate := $techmd/mdWrap/xmlData/lmerFile:xmlData/fits:fits/fits:metadata/fits:audio/fits:bitRate
let $fileid := $techmd/@ID
where $bitrate = 128000
return $doc//file[@ADMID=$fileid]
for $b in subsequence($a, 1, 10)
return $b
Get all files for a given external_asset_id
declare namespace mets = "http://www.loc.gov/METS/";
declare namespace lmerObject = "http://www.ddb.de/LMERobject";
declare namespace xlink = "http://www.w3.org/1999/xlink";
for $db in db:list()
for $doc in collection($db)
let $file := $doc//mets:file
where $doc/koala/aip/external_asset_id = "urn:de:sample:myurn"
return $doc//mets:fileGrp
Directly access text index
count(
for $db in db:list()
return db:text($db, "urn:diasid:fty:kopal:0000000000000000000000")
)
Count the number of files which have an unknown filetype.
declare namespace lmerFile = "http://www.ddb.de/LMERfile";
count(
(# db:enforceindex #) {
for $db in db:list()
for $doc in db:open($db)/*
where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000'
return $doc
})
Simple blob storage¶
Archive and retrieve binary objects via simple HTTP REST calls. HTTP basic authentication is required for every request.
Upload¶
An uploaded object is addressable by its internal_asset_id.
user@host ~: curl -X POST --data-binary @"myfile" --user user:pw https://koala.gwdg.de/api/sbs
HTTP/1.1 100 Continue
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 61
{"internal_asset_id":"63b54de9-dc2b-4bff-be28-e2ad92d02b5c"}
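The same upload can be scripted. The sketch below uses only the Python standard library and mirrors the curl call above; the function names are hypothetical, and the URL and credentials are the placeholders from the example. No `Content-Type` is set explicitly, so `urllib` falls back to `application/x-www-form-urlencoded`, which is also what curl sends for `--data-binary`. Whether the service requires a particular content type is not stated here.

```python
import base64
import json
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Return the Authorization header value for HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_upload_request(url: str, user: str, password: str,
                         payload: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST request that uploads one object."""
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Authorization": basic_auth_header(user, password)},
    )


def upload(url: str, user: str, password: str, path: str) -> str:
    """Upload the file at `path` and return the internal_asset_id of the reply."""
    with open(path, "rb") as handle:
        request = build_upload_request(url, user, password, handle.read())
    with urllib.request.urlopen(request) as response:
        return json.load(response)["internal_asset_id"]
```

A call such as `upload("https://koala.gwdg.de/api/sbs", "user", "pw", "myfile")` would then return the id to keep for later retrieval. Note that this sketch reads the whole file into memory, which is only reasonable for small objects.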
Download¶
Retrieving a large object can take some time. Therefore an asynchronous retrieve job must be triggered.
user@host ~: curl -i --user user:pw -X POST -H "Content-Type: application/json" -d '{"internal_asset_id":"63b54de9-dc2b-4bff-be28-e2ad92d02b5c"}' https://koala.gwdg.de/api/sbsr
HTTP/1.1 202 ACCEPTED
Location: https://koala.gwdg.de/api/sbsr/39bffece-3cb8-419a-bb58-432d061483c8
If the object is not yet available:
user@host ~: curl -i --user user:pw https://koala.gwdg.de/api/sbsr/39bffece-3cb8-419a-bb58-432d061483c8
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
{
"status": "retrieve",
"originator": "retriever@6146de9de49f",
"internal_asset_id": "c546a567-724a-46ed-8ffb-19ab078ffc7b"
}
If the object is available for download:
user@host ~: curl -i --user user:pw https://koala.gwdg.de/api/sbsr/39bffece-3cb8-419a-bb58-432d061483c8
HTTP/1.1 303 SEE OTHER
Location: https://koala.gwdg.de/download/39bffece-3cb8-419a-bb58-432d061483c8/content
Content-Type: application/json; charset=utf-8
{
"status": "ready",
"originator": "retriever@6146de9de49f",
"internal_asset_id": "c546a567-724a-46ed-8ffb-19ab078ffc7b"
}
Save the object to a local file:
user@host ~: curl --user user:pw -o mybigfile https://koala.gwdg.de/download/39bffece-3cb8-419a-bb58-432d061483c8/content
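The polling part of this flow can be sketched as follows. The code assumes, based only on the sample responses above, that a job which is still running answers `200` and a finished job answers `303` with a `Location` header. The function names, the polling interval and the attempt limit are hypothetical choices, not values prescribed by koala. `http.client` is used because it does not follow redirects on its own, so the `303` stays visible to the caller.

```python
import http.client
import json
import time
from urllib.parse import urlsplit


def fetch_job(job_url: str, auth_header: str):
    """GET the job URL once without following redirects.

    Returns (status_code, location_header_or_None, parsed_json_or_None).
    """
    parts = urlsplit(job_url)
    connection = http.client.HTTPSConnection(parts.netloc)
    try:
        connection.request("GET", parts.path,
                           headers={"Authorization": auth_header})
        response = connection.getresponse()
        raw = response.read()
        body = json.loads(raw) if raw else None
        return response.status, response.getheader("Location"), body
    finally:
        connection.close()


def wait_for_download_url(fetch, interval_seconds=5.0, max_attempts=60):
    """Poll until the job answers 303 and return the Location to download from.

    `fetch` is any zero-argument callable returning (status, location, body).
    """
    for _ in range(max_attempts):
        status, location, _body = fetch()
        if status == 303 and location:
            return location
        if status != 200:
            raise RuntimeError(f"unexpected status {status} while polling")
        time.sleep(interval_seconds)
    raise TimeoutError("retrieve job did not become ready in time")
```

Typical use would be `wait_for_download_url(lambda: fetch_job(job_url, auth_header))`, where `job_url` is the `Location` returned when the retrieve job was created. Passing the fetch step in as a callable keeps the loop easy to exercise without a network connection.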
Delete¶
Delete object from database and archival storage.
user@host ~: curl -X DELETE -i --user user:pw https://koala.gwdg.de/api/sbs/63b54de9-dc2b-4bff-be28-e2ad92d02b5c
HTTP/1.1 200 OK
Miscellaneous¶
Ready for ingest¶
Purpose:
Can be used by connected systems to decide if koala is currently ready to process SIPs.
Request:
GET /api/check_ingest_ability
Response:
200 OK
Content-Type: application/json; charset=utf-8
{
"ready": true,
"uploadarea_usage": {"free": 68058800128, "total": 73550618624, "used": 2244472832}
}
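A connected system might evaluate this response as in the sketch below. The function name is hypothetical, and the extra check that the upload area has enough free space for the SIP is an assumption added for illustration; the endpoint itself only reports the `ready` flag and the usage numbers.

```python
def ready_for_sip(check_response: dict, sip_size_bytes: int) -> bool:
    """Decide whether a SIP of the given size should be submitted now."""
    # A missing or false "ready" flag always means: do not submit.
    if not check_response.get("ready", False):
        return False
    # Assumption: also require that the SIP fits into the free upload space.
    free = check_response.get("uploadarea_usage", {}).get("free", 0)
    return free >= sip_size_bytes


if __name__ == "__main__":
    sample = {
        "ready": True,
        "uploadarea_usage": {"free": 68058800128, "total": 73550618624,
                             "used": 2244472832},
    }
    print(ready_for_sip(sample, 5 * 1024**3))
```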
Health¶
Purpose:
Requests the health status of the middleware components. For each check, koala tries to create a new connection to the respective backend.
Request:
GET /api/health
Response:
200 OK
Content-Type: application/json; charset=utf-8
{
"health": [
{
"name": "database",
"value": "ok"
},
{
"name": "archival storage",
"value": "ok"
},
{
"name": "cache",
"value": "ok"
},
{
"name": "message queue",
"value": "ok"
},
{
"name": "mdqi",
"value": "ok"
}
],
"last_check": "2018-01-24 09:14:51"
}
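For monitoring, the response can be reduced to the list of components that are not healthy. The sketch below assumes that any value other than `"ok"` indicates a problem; only `"ok"` appears in the sample above, so the other possible values are not known here. The function name is hypothetical.

```python
def failing_components(health_response: dict) -> list:
    """Return the names of all components whose value is not "ok"."""
    return [
        entry["name"]
        for entry in health_response.get("health", [])
        if entry.get("value") != "ok"
    ]


if __name__ == "__main__":
    sample = {
        "health": [
            {"name": "database", "value": "ok"},
            {"name": "cache", "value": "ok"},
        ],
        "last_check": "2018-01-24 09:14:51",
    }
    print(failing_components(sample))
```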
App status¶
Purpose:
Requests which koala apps are running at the moment and their current processing status.
Request:
GET /api/stats/apps
Response:
200 OK
Content-Type: application/json; charset=utf-8
{
"apps": [
{
"name": "loader@2c608e84f918",
"processing": "done",
"progress": "100",
"status": "running",
"uptime": "00:17:09",
"version": "c341d48-267-2018-01-24"
},
{
"name": "loader@43b2fc39f0db",
"processing": "done",
"progress": "100",
"status": "running",
"uptime": "00:17:09",
"version": "c341d48-267-2018-01-24"
},
{
"name": "loader@b0bfe15a26ea",
"processing": "done",
"progress": "100",
"status": "running",
"uptime": "00:17:11",
"version": "c341d48-267-2018-01-24"
},
{
"name": "loader@b3a356a302f5",
"processing": "done",
"progress": "100",
"status": "running",
"uptime": "00:17:09",
"version": "c341d48-267-2018-01-24"
},
{
"name": "purger@36c076ad6740",
"processing": "sleeping",
"progress": "0",
"status": "running",
"uptime": "00:17:11",
"version": "c341d48-267-2018-01-24"
},
{
"name": "retriever@a05eb49313d4",
"processing": "",
"progress": "0",
"status": "running",
"uptime": "00:17:11",
"version": "c341d48-267-2018-01-24"
},
{
"name": "scheduler@142e6754c68c",
"processing": "sleeping",
"progress": "0",
"status": "running",
"uptime": "00:17:11",
"version": "c341d48-267-2018-01-24"
},
{
"name": "stats@b4e4e1dbcf6a",
"processing": "sleeping",
"progress": "0",
"status": "running",
"uptime": "00:17:11",
"version": "c341d48-267-2018-01-24"
}
]
}
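The app names in the sample follow the pattern `<role>@<host id>`. Assuming that pattern holds in general, a client could count the running instances per role as sketched here; the function name is hypothetical.

```python
from collections import Counter


def running_apps_by_role(apps_response: dict) -> dict:
    """Count apps with status "running", grouped by the part before "@"."""
    roles = Counter()
    for app in apps_response.get("apps", []):
        if app.get("status") == "running":
            role = app["name"].split("@", 1)[0]
            roles[role] += 1
    return dict(roles)


if __name__ == "__main__":
    sample = {"apps": [
        {"name": "loader@2c608e84f918", "status": "running"},
        {"name": "loader@43b2fc39f0db", "status": "running"},
        {"name": "purger@36c076ad6740", "status": "running"},
    ]}
    print(running_apps_by_role(sample))
```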
Disk status¶
Purpose:
Requests disk usage.
Request:
GET /api/stats/disk
Response:
200 OK
Content-Type: application/json; charset=utf-8
{
"disk_download": {
"free": 45697851392,
"total": 52710469632,
"used": 6995841024
},
"disk_error": {
"free": 22852845568,
"total": 26288123904,
"used": 3418501120
},
"disk_preload": {
"free": 51225415680,
"total": 52710469632,
"used": 1468276736
},
"disk_work": {
"free": 26223419392,
"total": 26288123904,
"used": 47927296
}
}
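The values are given in bytes. A small sketch of how a client might turn them into usage percentages and flag nearly full areas follows; the function names and the 90 percent threshold are hypothetical. Note that `free` and `used` do not add up to `total` in the sample, so the percentage is computed from `used` and `total` only.

```python
def usage_percent(area: dict) -> float:
    """Return the used share of one disk area in percent."""
    return 100.0 * area["used"] / area["total"]


def areas_above(disk_response: dict, threshold_percent: float = 90.0) -> list:
    """Return the names of all areas whose usage exceeds the threshold."""
    return sorted(
        name
        for name, area in disk_response.items()
        if usage_percent(area) > threshold_percent
    )


if __name__ == "__main__":
    sample = {
        "disk_download": {"free": 45697851392, "total": 52710469632,
                          "used": 6995841024},
        "disk_work": {"free": 26223419392, "total": 26288123904,
                      "used": 47927296},
    }
    for name, area in sample.items():
        print(f"{name}: {usage_percent(area):.2f} %")
```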
Ingest statistics¶
Purpose:
Requests time-based ingest statistics.
Request:
GET /api/stats/asset/<resolution>
<resolution> hour/day/week/month/year, required
Response:
200 OK
Content-Type: application/json; charset=utf-8
{
"asset": {
"average": "11.09 MB",
"count": "2,956,898",
"sum": "31.28 TB"
},
"data": [
{
"x": "2018-01-24 08:30",
"y1": "3",
"y2": "19168649"
},
{
"x": "2018-01-24 08:31",
"y1": "23",
"y2": "169189475"
}
]
}
x: timespan
y1: number of assets ingested
y2: number of bytes
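All numbers in this response are delivered as strings, and the asset count contains thousands separators. The sketch below shows one way to convert them before further processing; the function name is hypothetical, and the formats are inferred from the sample above only.

```python
def parse_ingest_stats(stats: dict):
    """Convert the string values of an ingest statistics response.

    Returns (total_asset_count, points), where each point is a tuple
    (timespan, assets_ingested, bytes_ingested).
    """
    total_assets = int(stats["asset"]["count"].replace(",", ""))
    points = [
        (point["x"], int(point["y1"]), int(point["y2"]))
        for point in stats["data"]
    ]
    return total_assets, points


if __name__ == "__main__":
    sample = {
        "asset": {"average": "11.09 MB", "count": "2,956,898",
                  "sum": "31.28 TB"},
        "data": [
            {"x": "2018-01-24 08:30", "y1": "3", "y2": "19168649"},
            {"x": "2018-01-24 08:31", "y1": "23", "y2": "169189475"},
        ],
    }
    total, points = parse_ingest_stats(sample)
    print(total, sum(p[1] for p in points), sum(p[2] for p in points))
```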
Refresh cache¶
Purpose:
Invalidates all cached data and triggers a recalculation of all statistics.
Warning
This can take some time and generates high load on the database.
Request:
GET /api/refreshcache
Response:
200 OK
Content-Type: application/json; charset=utf-8
{
"result": "ok"
}
Show cache sync status¶
Purpose:
Compares the number of stored assets in cache and db.
Request:
GET /api/showcachesync
Response:
200 OK
Content-Type: application/json; charset=utf-8
{
"cache_count": 654899,
"db_count": 654899,
"diff": 0,
"result": "ok"
}
Limitations¶
- L1: Schemas used to validate the metadata file during ingest are limited to a maximum file size of 16 MB, the limit of MySQL's MEDIUMBLOB data type.
- L2: mets.xml metadata files are limited to a maximum file size of 0.5 x the RAM of the loader host.
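A producer could check both limits before submitting, as in the sketch below. The function names are hypothetical. The MEDIUMBLOB maximum of 2^24 - 1 bytes is MySQL's documented limit; whether koala enforces exactly this boundary, and how the loader's RAM size would be obtained, are assumptions of the sketch.

```python
# Maximum length of a MySQL MEDIUMBLOB value: 2**24 - 1 = 16,777,215 bytes.
MEDIUMBLOB_MAX_BYTES = 2**24 - 1


def schema_fits(schema_size_bytes: int) -> bool:
    """L1: a validation schema must fit into a MEDIUMBLOB column."""
    return schema_size_bytes <= MEDIUMBLOB_MAX_BYTES


def mets_fits(mets_size_bytes: int, loader_ram_bytes: int) -> bool:
    """L2: mets.xml may be at most half the RAM of the loader host."""
    return mets_size_bytes <= 0.5 * loader_ram_bytes


if __name__ == "__main__":
    print(schema_fits(2 * 1024**2))
    print(mets_fits(3 * 1024**3, 8 * 1024**3))
```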