Ingestion
From S3
To ingest data from S3 you need to specify a host to connect to; there are also further optional settings:
spec:
  clusterConfig:
    ingestion:
      s3connection:
        host: yourhost.com (1)
        port: 80 # optional (2)
        credentials: # optional (3)
          ...
1 | The S3 host, not optional |
2 | Port, optional, defaults to 80 |
3 | Credentials to use. Since these might be bucket-dependent, they can instead be given in the ingestion job. How to specify credentials is explained below. |
You can specify a connection/bucket for ingestion, for deep storage, or for both, but Druid only supports a single S3 connection under the hood. If two connections are specified, they must be identical. This is easiest to guarantee when a dedicated S3Connection resource is used - defined as a dedicated object rather than inline. TLS for S3 is not yet supported.
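A dedicated connection object could look like the following sketch. The resource name my-s3-connection is an example, and the field names are assumptions based on the Stackable S3 resource conventions; consult the Stackable documentation on S3 resources for the exact schema:

```yaml
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: my-s3-connection  # example name
spec:
  host: yourhost.com
  port: 80
  credentials:
    secretClass: s3-credentials-class
```

The Druid cluster would then point at this object by name instead of defining the connection inline, so ingestion and deep storage are certain to share the same connection.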
S3 credentials
Regardless of whether a connection is specified inline or as a separate object, the credentials are always specified in the same way. You will need a Secret containing the access key ID and secret access key, a SecretClass, and a reference to this SecretClass wherever you want to specify the credentials.

The Secret:
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  labels:
    secrets.stackable.tech/class: s3-credentials-class (1)
stringData:
  accessKey: YOUR_VALID_ACCESS_KEY_ID_HERE
  secretKey: YOUR_SECRET_ACCESS_KEY_THAT_BELONGS_TO_THE_KEY_ID_HERE
1 | This label connects the Secret to the SecretClass . |
The SecretClass:
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: s3-credentials-class
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}
Referencing it:
...
credentials:
  secretClass: s3-credentials-class
...
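Putting the pieces together, the reference could sit inline in the ingestion connection like this (a sketch assembled from the fields shown above):

```yaml
spec:
  clusterConfig:
    ingestion:
      s3connection:
        host: yourhost.com
        port: 80
        credentials:
          secretClass: s3-credentials-class
```

At runtime, the SecretClass resolves to the Secret labeled with secrets.stackable.tech/class: s3-credentials-class, so rotating credentials only requires updating that Secret.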
Adding external files, e.g. for ingestion
Since Druid actively runs ingestion tasks, extra files may need to be made available to the processes.
These could for example be client certificates used to connect to a Kafka cluster, or a keytab to obtain a Kerberos ticket.
To make such files available, the operator allows specifying extra volumes that are added to all Pods deployed for this cluster.
spec:
  clusterConfig:
    extraVolumes:
      - name: google-service-account
        secret:
          secretName: google-service-account
All volumes specified in this section are made available under /stackable/userdata/{volumename}.
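For illustration, a Kafka ingestion spec could then reference a truststore mounted this way. This is a hypothetical sketch in Druid's native JSON spec format: the volume name kafka-tls, the file truststore.jks, the topic, and the bootstrap server are all assumptions, while ioConfig and consumerProperties are standard fields of Druid's Kafka indexing service:

```json
{
  "type": "kafka",
  "spec": {
    "ioConfig": {
      "topic": "my-topic",
      "consumerProperties": {
        "bootstrap.servers": "kafka:9093",
        "security.protocol": "SSL",
        "ssl.truststore.location": "/stackable/userdata/kafka-tls/truststore.jks"
      }
    }
  }
}
```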