Ingestion
From S3
To ingest data from S3 you need to specify a host to connect to; there are also further optional settings:
spec:
  clusterConfig:
    ingestion:
      s3connection:
        host: yourhost.com (1)
        port: 80 # optional (2)
        credentials: # optional (3)
          ...
1 | The S3 host, not optional |
2 | Port, optional, defaults to 80 |
3 | Credentials to use. Since these might be bucket-dependent, they can instead be given in the ingestion job. How to specify credentials is explained below. |
You can specify a connection/bucket for ingestion, for deep storage, or for both, but Druid only supports a single S3 connection under the hood. If two connections are specified, they must be identical. This is easiest to guarantee when a dedicated S3Connection resource is used - defined as a dedicated object rather than inline. TLS for S3 is not yet supported.
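A dedicated connection object could look like the following sketch. The resource name my-s3-connection is an example, and the field names are assumptions based on the Stackable S3 resource conventions; consult the Stackable documentation on S3 resources for the exact schema:

```yaml
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: my-s3-connection  # example name
spec:
  host: yourhost.com
  port: 80
  credentials:
    secretClass: s3-credentials-class
```

The Druid cluster would then point at this object by name instead of defining the connection inline, so ingestion and deep storage are certain to share the same connection.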
S3 credentials
Regardless of whether a connection is specified inline or as a separate object, the credentials are always specified in the same way. You will need a Secret containing the access key ID and secret access key, a SecretClass, and a reference to this SecretClass wherever you want to specify the credentials.

The Secret:
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  labels:
    secrets.stackable.tech/class: s3-credentials-class (1)
stringData:
  accessKey: YOUR_VALID_ACCESS_KEY_ID_HERE
  secretKey: YOUR_SECRET_ACCESS_KEY_THAT_BELONGS_TO_THE_KEY_ID_HERE
1 | This label connects the Secret to the SecretClass . |
The SecretClass:
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: s3-credentials-class
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}
Referencing it:
...
credentials:
  secretClass: s3-credentials-class
...
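Putting the pieces together, the reference could sit inline in the ingestion connection like this (a sketch assembled from the fields shown above):

```yaml
spec:
  clusterConfig:
    ingestion:
      s3connection:
        host: yourhost.com
        port: 80
        credentials:
          secretClass: s3-credentials-class
```

At runtime, the SecretClass resolves to the Secret labeled with secrets.stackable.tech/class: s3-credentials-class, so rotating credentials only requires updating that Secret.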
Adding external files, e.g. for ingestion
Since Druid actively runs ingestion tasks, extra files may need to be made available to the processes.
These could for example be client certificates used to connect to a Kafka cluster, or a keytab to obtain a Kerberos ticket.
To make such files available, the operator allows specifying extra volumes that are added to all Pods deployed for this cluster.
spec:
  clusterConfig:
    extraVolumes:
      - name: google-service-account
        secret:
          secretName: google-service-account
All volumes specified in this section are made available under /stackable/userdata/{volumename}.
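For illustration, a Kafka ingestion spec could then reference a truststore mounted this way. This is a hypothetical sketch in Druid's native JSON spec format: the volume name kafka-tls, the file truststore.jks, the topic, and the bootstrap server are all assumptions, while ioConfig and consumerProperties are standard fields of Druid's Kafka indexing service:

```json
{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "topic": "my-topic",
      "consumerProperties": {
        "bootstrap.servers": "kafka:9093",
        "security.protocol": "SSL",
        "ssl.truststore.location": "/stackable/userdata/kafka-tls/truststore.jks"
      }
    }
  }
}
```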