|
| 1 | += S3 resources |
| 2 | + |
| 3 | +// -------------- Intro ---------------- |
| 4 | + |
| 5 | +Many of the tools on the Stackable platform integrate with S3 storage in some way. |
| 6 | +For example, Druid can xref:druid::usage.adoc#_s3_for_ingestion[ingest data from S3] and xref:druid::usage.adoc#_s3_deep_storage[use S3 as a backend for deep storage], while Spark can use an xref:spark-k8s::usage.adoc#_s3_bucket_specification[S3 bucket] to store application files and data. |
| 7 | + |
| 8 | +== S3Connection and S3Bucket |
| 9 | +// introducing the objects |
| 10 | + |
| 11 | +Stackable uses _S3Connection_ and _S3Bucket_ objects to configure access to S3 storage. |
| 12 | +// s3 connection |
| 13 | +An S3Connection object contains information such as the host name of the S3 server, its port, TLS parameters and access credentials. |
| 14 | +// s3 bucket |
| 15 | +An S3Bucket contains the name of the bucket and a reference to an S3Connection, the connection to the server where the bucket is located. An S3Connection can be referenced by multiple buckets. |
| 16 | + |
| 17 | +Here's an example of a simple S3Connection object and an S3Bucket referencing that connection: |
| 18 | + |
| 19 | +[source,yaml] |
| 20 | +---- |
| 21 | +--- |
| 22 | +apiVersion: s3.stackable.tech/v1alpha1 |
| 23 | +kind: S3Connection |
| 24 | +metadata: |
| 25 | +  name: my-connection-resource |
| 26 | +spec: |
| 27 | +  host: s3.example.com |
| 28 | +  port: 4242 |
| 29 | +--- |
| 30 | +apiVersion: s3.stackable.tech/v1alpha1 |
| 31 | +kind: S3Bucket |
| 32 | +metadata: |
| 33 | +  name: my-bucket-resource |
| 34 | +spec: |
| 35 | +  bucketName: my-example-bucket |
| 36 | +  connection: |
| 37 | +    reference: my-connection-resource |
| 38 | +---- |
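|  | + |
|  | +As a minimal sketch of connection reuse, a second bucket could point at the same connection. The names `my-other-bucket-resource` and `my-other-example-bucket` are hypothetical and only serve as illustration: |
|  | + |
|  | +[source,yaml] |
|  | +---- |
|  | +--- |
|  | +apiVersion: s3.stackable.tech/v1alpha1 |
|  | +kind: S3Bucket |
|  | +metadata: |
|  | +  name: my-other-bucket-resource # hypothetical name |
|  | +spec: |
|  | +  bucketName: my-other-example-bucket # hypothetical bucket |
|  | +  connection: |
|  | +    reference: my-connection-resource # same S3Connection as the first bucket |
|  | +---- |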
| 39 | + |
| 40 | +== Object Reference Structure |
| 41 | +// ---------- Referencing ------------- |
| 42 | + |
| 43 | +S3Bucket objects reference S3Connection objects. Both types of objects can be referenced by other resources. For example, in a DruidCluster you can specify an S3Bucket for deep storage and an S3Connection for data ingestion. |
| 44 | +S3Connection objects can be defined as stand-alone objects or inlined into a bucket object. Similarly, a bucket can be defined as a stand-alone object or inlined into an enclosing object. |
| 45 | + |
| 46 | +[excalidraw,s3-cluster-bucket-connection-reference,svg,width=70%] |
| 47 | +---- |
| 48 | +include::partial$diagrams/S3ResourceOverview.excalidraw[] |
| 49 | +---- |
| 50 | + |
| 51 | +The diagram above shows three examples of how the objects can be |
| 52 | + structured. |
| 53 | +// Option 1 |
| 54 | +In option 1 all objects are separate from each other. This provides maximum reusability because the same connection or bucket object can be referenced by multiple resources. It also allows for separation of concerns across team members: cluster administrators can define S3 connection objects that developers reference in their applications. |
| 55 | +// Option 2 |
| 56 | +In option 2 the bucket is inlined in the cluster definition. This makes sense if you have a dedicated bucket for a specific purpose that is only used by this one cluster instance of a single product. |
| 57 | +// Option 3 |
| 58 | +Option 3 shows all S3 objects inlined in a DruidCluster resource. This is a very convenient way to quickly test something since the entire configuration is encapsulated in a single but potentially large manifest. |
| 59 | + |
| 60 | +=== Examples |
| 61 | + |
| 62 | +The following examples use a DruidCluster resource to illustrate the concept. |
| 63 | + |
| 64 | +[source,yaml] |
| 65 | +---- |
| 66 | + |
| 67 | +apiVersion: druid.stackable.tech/v1alpha1 |
| 68 | +kind: DruidCluster |
| 69 | +metadata: |
| 70 | +  name: my-druid-cluster |
| 71 | +spec: |
| 72 | +  deepStorage: |
| 73 | +    # to be defined ... |
| 74 | +  # more spec here ... |
| 75 | +---- |
| 76 | + |
| 77 | +==== Inline definition |
| 78 | + |
| 79 | +The inline definition is option 3 in the first diagram above. |
| 80 | + |
| 81 | +[excalidraw,s3-cluster-bucket-connection-reference,svg,width=70%] |
| 82 | +---- |
| 83 | +include::partial$diagrams/S3ResourcesInlined.excalidraw[] |
| 84 | +---- |
| 85 | + |
| 86 | +This variant has the advantage that everything is defined in a single file, right where it is going to be used: |
| 87 | + |
| 88 | +[source,yaml] |
| 89 | +---- |
| 90 | + |
| 91 | +apiVersion: druid.stackable.tech/v1alpha1 |
| 92 | +kind: DruidCluster |
| 93 | +metadata: |
| 94 | +  name: my-druid-cluster |
| 95 | +spec: |
| 96 | +  deepStorage: |
| 97 | +    s3: |
| 98 | +      inline: # <1> |
| 99 | +        bucketName: my-bucket |
| 100 | +        connection: |
| 101 | +          inline: # <2> |
| 102 | +            host: test-minio |
| 103 | +            port: 9000 |
| 104 | +  # more spec here ... |
| 105 | +---- |
| 106 | +<1> The inline definition of the bucket. The bucket definition contains `bucketName` and `connection`. |
| 107 | +<2> The inline definition of the connection. It contains the `host` and `port`. |
| 108 | + |
| 109 | + |
| 110 | +==== Stand-alone resources |
| 111 | + |
| 112 | +Data pipelines often use multiple buckets, and the same bucket is often used by different applications, so stand-alone resource definitions that can be referenced from multiple objects make sense. |
| 113 | + |
| 114 | +[excalidraw,s3-cluster-bucket-connection-reference,svg,width=70%] |
| 115 | +---- |
| 116 | +include::partial$diagrams/S3ResourcesByReference.excalidraw[] |
| 117 | +---- |
| 118 | + |
| 119 | +The DruidCluster references the S3Bucket, which in turn references the S3Connection. First the definition of the S3Connection: |
| 120 | + |
| 121 | +[source,yaml] |
| 122 | +---- |
| 123 | +--- |
| 124 | +apiVersion: s3.stackable.tech/v1alpha1 |
| 125 | +kind: S3Connection |
| 126 | +metadata: |
| 127 | +  name: my-connection-resource |
| 128 | +spec: |
| 129 | +  host: s3.example.com |
| 130 | +  port: 4242 |
| 131 | +---- |
| 132 | + |
| 133 | +Then the bucket, which references the connection: |
| 134 | + |
| 135 | + |
| 136 | +[source,yaml] |
| 137 | +---- |
| 138 | +--- |
| 139 | +apiVersion: s3.stackable.tech/v1alpha1 |
| 140 | +kind: S3Bucket |
| 141 | +metadata: |
| 142 | +  name: my-bucket-resource |
| 143 | +spec: |
| 144 | +  bucketName: my-example-bucket |
| 145 | +  connection: |
| 146 | +    reference: my-connection-resource |
| 147 | +---- |
| 148 | + |
| 149 | +You can then use this bucket, for example in Druid, as deep storage: |
| 150 | + |
| 151 | +[source,yaml] |
| 152 | +---- |
| 153 | + |
| 154 | +apiVersion: druid.stackable.tech/v1alpha1 |
| 155 | +kind: DruidCluster |
| 156 | +metadata: |
| 157 | +  name: my-druid-cluster |
| 158 | +spec: |
| 159 | +  deepStorage: |
| 160 | +    s3: |
| 161 | +      reference: my-bucket-resource |
| 162 | +  # more spec here ... |
| 163 | +---- |
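|  | + |
|  | +Option 2 from the first diagram, an inlined bucket that references a stand-alone connection, is not spelled out above. Assuming the `inline` and `reference` keys can be combined in the way the other examples suggest, a sketch might look like this: |
|  | + |
|  | +[source,yaml] |
|  | +---- |
|  | +apiVersion: druid.stackable.tech/v1alpha1 |
|  | +kind: DruidCluster |
|  | +metadata: |
|  | +  name: my-druid-cluster |
|  | +spec: |
|  | +  deepStorage: |
|  | +    s3: |
|  | +      inline: # the bucket is defined inline (assumed combination) |
|  | +        bucketName: my-bucket |
|  | +        connection: |
|  | +          reference: my-connection-resource # stand-alone S3Connection |
|  | +  # more spec here ... |
|  | +---- |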
| 164 | + |
| 165 | +== Credentials |
| 166 | + |
| 167 | + |
| 168 | +Regardless of whether a connection is specified inline or as a separate object, the credentials are always specified in the same way. You need a `Secret` containing the access key ID and secret access key, a `SecretClass`, and a reference to this `SecretClass` wherever you want to specify the credentials. |
| 169 | + |
| 170 | +The `Secret`: |
| 171 | + |
| 172 | +[source,yaml] |
| 173 | +---- |
| 174 | +apiVersion: v1 |
| 175 | +kind: Secret |
| 176 | +metadata: |
| 177 | +  name: s3-credentials |
| 178 | +  labels: |
| 179 | +    secrets.stackable.tech/class: s3-credentials-class # <1> |
| 180 | +stringData: |
| 181 | +  accessKey: YOUR_VALID_ACCESS_KEY_ID_HERE |
| 182 | +  secretKey: YOUR_SECRET_ACCESS_KEY_THAT_BELONGS_TO_THE_KEY_ID_HERE |
| 183 | +---- |
| 184 | + |
| 185 | +<1> This label connects the `Secret` to the `SecretClass`. |
| 186 | + |
| 187 | +The `SecretClass`: |
| 188 | + |
| 189 | +[source,yaml] |
| 190 | +---- |
| 191 | +apiVersion: secrets.stackable.tech/v1alpha1 |
| 192 | +kind: SecretClass |
| 193 | +metadata: |
| 194 | +  name: s3-credentials-class |
| 195 | +spec: |
| 196 | +  backend: |
| 197 | +    k8sSearch: |
| 198 | +      searchNamespace: |
| 199 | +        pod: {} |
| 200 | +---- |
| 201 | + |
| 202 | +Referencing it: |
| 203 | + |
| 204 | +[source,yaml] |
| 205 | +---- |
| 206 | +... |
| 207 | +credentials: |
| 208 | +  secretClass: s3-credentials-class |
| 209 | +... |
| 210 | +---- |
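|  | + |
|  | +Putting the pieces together, a stand-alone S3Connection carrying these credentials might look like the following sketch. It assumes that the `credentials` key sits directly under `spec`, which the fragment above does not show, so check the S3 resources reference for the exact placement: |
|  | + |
|  | +[source,yaml] |
|  | +---- |
|  | +apiVersion: s3.stackable.tech/v1alpha1 |
|  | +kind: S3Connection |
|  | +metadata: |
|  | +  name: my-connection-resource |
|  | +spec: |
|  | +  host: s3.example.com |
|  | +  port: 4242 |
|  | +  credentials: # assumed position: directly under spec |
|  | +    secretClass: s3-credentials-class |
|  | +---- |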
| 211 | + |
| 212 | +== What's next |
| 213 | + |
| 214 | +- Find details about the options of the S3 resource in the xref:reference:s3.adoc[S3 resources reference]. |