Installation and Deployment

Installation

Clone repository and edit config files

root@koala:~# git clone https://gitlab.gwdg.de/bbrauns/koala.git
root@koala:~# cd koala/install/compose-koala
root@koala:/root/koala/install/compose-koala# ls -F
common.env  dias-koala/  docker-compose.yml  dsm.sys  koala.key  koala.pem  mysql_dump.sql
  • Edit the config files common.env and dsm.sys
  • Ensure that the required paths exist, for example /dias/workarea
  • Create an SSL certificate with its private key
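
The path check from the list above can be scripted. The following is a minimal sketch, not part of koala itself: the helper name missing_dirs is made up for illustration, and /dias/workarea is the only path taken from this document.

```python
from pathlib import Path


def missing_dirs(paths):
    """Return the entries of `paths` that are not existing directories."""
    return [p for p in paths if not Path(p).is_dir()]


if __name__ == "__main__":
    # /dias/workarea is the example path from the text; extend the list with
    # whatever paths your common.env actually references.
    required = ["/dias/workarea"]
    for path in missing_dirs(required):
        print(f"missing directory: {path}")
```

Running the script before starting the stack prints one line per directory that still has to be created.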

Log in to the Docker registry

root@koala:~# docker login docker.gitlab.gwdg.de/bbrauns/koala
Username: user@gwdg.de
Password:
Login Succeeded

Start up the stack

root@koala:/root/koala/install/compose-koala# export NR_OF_CPUS=$(grep -c processor /proc/cpuinfo)
root@koala:/root/koala/install/compose-koala# docker-compose up -d --scale loader=$NR_OF_CPUS
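
For automation, the same start-up call can be assembled in Python. This is only a sketch under the assumption that one loader per logical CPU is wanted, as in the shell commands above; the function name compose_up_command is hypothetical.

```python
import os


def compose_up_command(loader_count=None):
    """Build the docker-compose call that scales `loader` to one per CPU."""
    if loader_count is None:
        # os.cpu_count() can return None if the count is unknown; use 1 then.
        loader_count = os.cpu_count() or 1
    if loader_count < 1:
        raise ValueError("loader_count must be at least 1")
    return ["docker-compose", "up", "-d", "--scale", f"loader={loader_count}"]


if __name__ == "__main__":
    # Print instead of executing, so the command can be reviewed first.
    print(" ".join(compose_up_command()))
```

The returned list can be passed to subprocess.run from within the compose-koala directory once the printed command looks right.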

Deployment

The koala stack is deployed as Docker containers orchestrated with docker-compose. A sample docker-compose.yml can be found in the install directory. A basic deployment consists of the following application containers: web, scheduler, purger, loader, retriever, stats. Depending on load, additional loader and retriever containers can be spawned.

Minimum requirements:

  • x86_64 Linux
  • Docker >= 17.03.0-ce
  • 2 CPUs
  • 4 GB RAM
  • HDD System 20 GB
  • HDD DB 50 GB
  • HDD Workarea 200 GB
  • HDD Downloadarea 200 GB
  • HDD Uploadarea 200 GB

Note

The size estimation is based on SIPs < 50 GB. Larger SIPs require appropriately dimensioned upload and work areas.
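
A host can be compared against these minimums before installation. The sketch below is an illustration only: apart from /dias/workarea, the mount points are assumed placeholders, and the RAM, system and DB disk checks are left out for brevity.

```python
import os
import shutil

GB = 10 ** 9

# Minimum sizes from the list above, in GB. Only /dias/workarea appears in
# this document; the other two mount points are assumed placeholders.
MIN_DISK_GB = {
    "/dias/workarea": 200,
    "/dias/uploadarea": 200,
    "/dias/downloadarea": 200,
}
MIN_CPUS = 2


def shortfalls(cpus, disk_gb, min_cpus=MIN_CPUS, min_disk_gb=MIN_DISK_GB):
    """Return human-readable problems; an empty list means all checks pass."""
    problems = []
    if cpus < min_cpus:
        problems.append(f"CPU: {cpus} < {min_cpus}")
    for path, needed in min_disk_gb.items():
        have = disk_gb.get(path)
        if have is None:
            problems.append(f"{path}: not found")
        elif have < needed:
            problems.append(f"{path}: {have:.0f} GB < {needed} GB")
    return problems


def measure_disks(paths):
    """Return the total size in GB of the filesystem behind each existing path."""
    sizes = {}
    for path in paths:
        if os.path.isdir(path):
            sizes[path] = shutil.disk_usage(path).total / GB
    return sizes


if __name__ == "__main__":
    found = shortfalls(os.cpu_count() or 1, measure_disks(MIN_DISK_GB))
    print("\n".join(found) if found else "minimum requirements met")
```

For SIPs larger than 50 GB, raise the values in MIN_DISK_GB accordingly.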

Typical

Hardware

              koala-test                       koala-prod
Description   Test system                      Production system
Applications  Database, MQ, Cache, Koala apps  Database, MQ, Cache, Koala apps
CPU           2                                4
RAM           8 GB                             6 GB
HDD           1 TB                             256 GB
SSD           -                                50 GB (uploadarea), 100 GB (workarea)
Platform      VM                               VM

Cluster

Variant A

Applicable if loader processing time is the bottleneck.

  • shared storage for koala-1 and koala-2
  • web application only on main instance
  • middleware only on main instance
  • ingests get distributed across two nodes
  • koala-2 does not need a public IP

[Figure: cluster-a]

Variant B

Applicable if transferring the files with SFTP is the bottleneck.

  • retriever and web application only on main instance
  • middleware only on main instance
  • ingests get distributed across two nodes
  • retrieves only on main instance
  • no shared storage needed
  • koala-2 needs a public IP
  • purger app per host needed
  • client applications must query /api/ftpinfo to find out which hostname to use for uploading files
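
The /api/ftpinfo lookup mentioned in the last item could be done from a client roughly as follows. This is a sketch under assumptions: the base URL is a placeholder, HTTP basic auth is assumed to be the active scheme (depending on DEFAULT_HTTP_BASIC_AUTH_REQUIRED and DEFAULT_API_QUERY_AUTH_REQUIRED it may not be), and the response format is not described in this document, so the body is returned unparsed.

```python
import base64
import urllib.request
from urllib.parse import urljoin


def ftpinfo_request(base_url, user=None, password=None):
    """Build a GET request for /api/ftpinfo, optionally with HTTP basic auth."""
    url = urljoin(base_url.rstrip("/") + "/", "api/ftpinfo")
    request = urllib.request.Request(url, method="GET")
    if user is not None and password is not None:
        raw = f"{user}:{password}".encode("utf-8")
        token = base64.b64encode(raw).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def fetch_ftpinfo(base_url, user=None, password=None, timeout=10):
    """Return the raw response body; its format is not specified here."""
    request = ftpinfo_request(base_url, user, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # koala.example.org is a placeholder hostname, not a real installation.
    print(ftpinfo_request("https://koala.example.org").full_url)
```

A client would call fetch_ftpinfo before each upload and use the hostname reported there instead of a hard-coded SFTP server.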

[Figure: cluster-b]

Variant C

Applicable in the cases of variants A and B, and additionally when multiple retrieves run simultaneously or the middleware is under high read/write load.

  • same as B
  • koala-1 and koala-2 http and sftp interfaces can be used interchangeably
  • koala-db as separate host for middleware
  • koala-db does not need a public IP

[Figure: cluster-c]

Software

Frontend

[Figure: frontend]

Backend

[Figure: backend]

Configuration

The applications make use of environment variables for configuration settings.

Name Type Description
CACHE_HOST string Redis hostname or IP
DATABASE_CONNECTION_STRING string Database connection string
DEFAULT_APP_NAME string Default application name
DEFAULT_HTTP_BASIC_AUTH_REQUIRED bool Use HTTP basic auth to access API endpoints
DEFAULT_API_QUERY_AUTH_REQUIRED string Require passwords to be passed via URL (see DIAS spec)
DEFAULT_HEARTBEAT_INTERVAL_IN_SEC int Application heartbeat interval
DEFAULT_SEND_MAIL_ON_ERROR bool Send email on ingest error or crash
DEFAULT_SEND_MAIL_ON_SUCCESS bool Send email on successful ingest
DEFAULT_SMTP_FROM string Email sender
DEFAULT_SMTP_HOST string Email relay hostname
DEFAULT_SMTP_TO string Email recipient(s)
DEFAULT_WORKAREA_BASE_PATH string Path to work area
DEFAULT_KOALA_NAME string Name of the koala installation. Shown in the web UI
DEFAULT_SEND_MAIL_ON_NO_INGESTS bool Send mail if no new packages were ingested within a defined period
DEFAULT_NO_INGESTS_MAX_TIMESPAN_IN_MINUTES int Send mail after this many minutes without new packages
DEFAULT_NO_INGESTS_EMAIL_TO string Mail recipient(s) if no new packages were ingested
FTPINFO_PASSWORD string Password for SFTP user
FTPINFO_PRELOAD_AREA string Path to upload area in SFTP container
FTPINFO_SERVER string Hostname or IP of the SFTP upload server
FTPINFO_SSHKEY string SSH public key which is used to allow connections via SFTP instead of username+password
FTPINFO_USER string SFTP username
KOALA_MYSQL_HOST string MySQL hostname
LOADER_AIP_MAX_FILE_COUNT string Maximum allowed file count in an AIP
LOADER_ERRORAREA string Path to error area
LOADER_SIP_MAX_FILENAME_LENGTH string If specified, limits the maximum filename length in a SIP
LOADER_USE_7ZIP bool Use installed 7zip package for packing and unpacking instead of internal implementation
LOADER_USE_BLOCKSIZE int Blocksize in bytes to use for file hashing, optional
LOADER_7ZIP_PATH string Path to 7zip binary
LTP_IMPLEMENTATION string Archival storage backend type
LTP_NFSHSM_AIP_FOLDER string Path to the archival base folder, used if LTP_IMPLEMENTATION=hsm
LTP_AIP_BASE_NAME string Filename of the AIP (e.g. aip). Basename and format specify the resulting AIP.
LTP_AIP_FORMAT string Format of the AIP (e.g. zip)
LTP_TSM_FILESPACE string Use a different TSM filespace name. Default: kopal
LTP_OPTIMIZE_FOR_ZIP bool Use the updated SIP in the uploadarea as AIP. This saves an I/O-intensive repacking operation, but after an ingest error the SIP cannot be re-ingested because its metadata file has been updated.
MDQI_INDEX_WHILE_INGEST bool Save metadata in MDQI during ingest transaction
MDQI_BASEX_URL string BaseX REST endpoint address
MDQI_BASEX_USER string BaseX username
MDQI_BASEX_PW string BaseX password
MQ_HOST string RabbitMQ hostname or IP
MQ_PORT int RabbitMQ protocol port
MQ_PW string RabbitMQ password
MQ_USER string RabbitMQ username
MQ_API_PORT string RabbitMQ REST endpoint and port
MYSQL_ROOT_PASSWORD string MySQL root password
MYSQL_USER string MySQL application user
MYSQL_PASSWORD string MySQL application password
MYSQL_DATABASE string MySQL database
PURGER_HIGH_WATERMARK_IN_PERCENTAGE string Percentage of used disk space at which the purger starts reclaiming space
PURGER_INTERVAL_IN_SEC int Purger check interval
PURGER_LOW_WATERMARK_IN_PERCENTAGE string Percentage of used disk space at which the purger stops reclaiming space
PURGER_RETENTION_TIME_IN_MINUTES int Minimum time interval the retrieved assets stay in download area
RABBITMQ_DEFAULT_PASS string RabbitMQ password
RABBITMQ_DEFAULT_USER string RabbitMQ user
RETRIEVER_DIP_FILENAME string DIP filename
RETRIEVER_DISABLE_DIP_AGENT_OVERWRITE bool Do not overwrite metsHdr/agent values in DIP mets.xml
RETRIEVER_DOWNLOADAREA string Download area
RETRIEVER_RETRIEVE_QUEUE string Retriever queue
RETRIEVER_RETRIEVE_ROUTING_KEY string Retriever routing key
RETRIEVER_WEBSERVER_RELATIVE_ALIAS string URI segment which causes the web server to rewrite and list disk contents
RETRIEVER_FRONTEND string Can be set to form a complete URL with protocol + hostname instead of a relative URL
SCHEDULER_INGEST_QUEUE string Ingest queue name
SCHEDULER_INGEST_ROUTING_KEY string Ingest routing key
SCHEDULER_INTERVAL_IN_SEC int Interval the scheduler checks for new SIPs
SCHEDULER_PRELOADAREA string Upload area path
SCHEDULER_SIP_FILE_PATTERN string Regex pattern which is used to recognize new SIPs
SCHEDULER_SCHEDULED_FILE_EXTENSION string File extension the scheduler uses to mark queued SIPs
STATS_INTERVAL_IN_SEC int Statistics refresh interval in seconds
WEB_SECRET_KEY string Secret key for sessions
WEB_LDAP_ENABLE bool Enable/Disable LDAP auth
WEB_LDAP_HOST string LDAP server hostname
WEB_LDAP_PORT int LDAP port
WEB_LDAP_USE_SSL bool Enable/Disable SSL
WEB_LDAP_ADMIN_DN string DN of admin user
WEB_LDAP_ADMIN_PW string Password of admin user
WEB_LDAP_BASE_DN string Base DN for user search
WEB_LDAP_FILTER string Filter for user search
WEB_SSO_ENABLED bool Enable/Disable SSO auth
WEB_SSO_ISSUER string OpenID Connect issuer
WEB_SSO_SCOPE string OpenID Connect scope
WEB_SSO_CLIENT_ID string OpenID Connect client ID
WEB_SSO_CLIENT_SECRET string OpenID Connect client secret
WEB_SSO_PROTO string Protocol of the client app (http or https)
WEB_SSO_AUTHORIZATION_ENDPOINT string OpenID Connect authorization endpoint URL
WEB_SSO_TOKEN_ENDPOINT string OpenID Connect token endpoint URL
WEB_SSO_USERINFO_ENDPOINT string OpenID Connect userinfo endpoint URL
WEB_SSO_JWKS_URI string OpenID Connect JWKS endpoint URL
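
Since every setting arrives as an environment variable, and therefore as text, an application has to convert it to the type given in the table. The following is a minimal sketch of such a conversion, not koala's actual implementation; the helper names and the accepted boolean spellings are assumptions.

```python
import os

# Assumed spellings for bool settings; koala may accept a different set.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}


def env_str(name, default=None, environ=os.environ):
    """Return the variable as a string, or `default` if unset or empty."""
    value = environ.get(name)
    return default if value is None or value == "" else value


def env_int(name, default=None, environ=os.environ):
    """Return the variable as an int, or `default` if unset or empty."""
    raw = env_str(name, environ=environ)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


def env_bool(name, default=None, environ=os.environ):
    """Return the variable as a bool, or `default` if unset or empty."""
    raw = env_str(name, environ=environ)
    if raw is None:
        return default
    lowered = raw.strip().lower()
    if lowered in _TRUE:
        return True
    if lowered in _FALSE:
        return False
    raise ValueError(f"{name} must be a boolean, got {raw!r}")


if __name__ == "__main__":
    settings = {
        "mq_host": env_str("MQ_HOST"),
        "mq_port": env_int("MQ_PORT"),
        "send_mail_on_error": env_bool("DEFAULT_SEND_MAIL_ON_ERROR", False),
    }
    print(settings)
```

Failing with a message that names the variable makes a typo in common.env visible at start-up rather than later during an ingest.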