Commit 3c0478e

fhennig and razvan authored
S3 resource documentation (#200)
* Added file and navigation structure
* Added some stuff
* Added new diagram and more text
* Added S3 reference stub
* Some more s3 reference info
* Diagrams and example section updated
* Updated 'whats next' section
* More reference info
* minor changes on the reference info
* typos fixed
* more text!
* updated package-lock
* Fixed broken symbolic links
* reverted changes
* Update modules/concepts/pages/s3.adoc Co-authored-by: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com>
* Update modules/concepts/pages/s3.adoc Co-authored-by: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com>
* minor changes
* Update modules/concepts/pages/s3.adoc Co-authored-by: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com>
* Update modules/concepts/pages/s3.adoc Co-authored-by: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com>
* changed dedicated to stand-alone
* Update modules/concepts/pages/s3.adoc Co-authored-by: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com>

Co-authored-by: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com>
1 parent ca4bcaa commit 3c0478e

12 files changed: +1839 -0 lines changed

antora-playbook.yml

Lines changed: 1 addition & 0 deletions
@@ -84,6 +84,7 @@ ui:
 antora:
   extensions:
     - '@antora/lunr-extension'
+    - asciidoctor-kroki

 asciidoc:
   attributes:

antora.yml

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@ version: master
 title: Stackable Documentation
 nav:
   - modules/ROOT/nav.adoc
+  - modules/concepts/nav.adoc
   - modules/tutorials/nav.adoc
+  - modules/reference/nav.adoc
   - modules/operators/nav.adoc
   - modules/contributor/nav.adoc

local-antora-playbook.yml

Lines changed: 2 additions & 0 deletions
@@ -85,10 +85,12 @@ ui:
 antora:
   extensions:
     - '@antora/lunr-extension'
+    - asciidoctor-kroki

 asciidoc:
   attributes:
     base-repo: https://github.com/stackabletech
     plantuml-server-url: http://www.plantuml.com/plantuml
+    kroki-fetch-diagram: true
   extensions:
     - asciidoctor-kroki

modules/concepts/nav.adoc

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
* Concepts
** xref:s3.adoc[]

modules/concepts/pages/s3.adoc

Lines changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
= S3 resources

// -------------- Intro ----------------

Many of the tools on the Stackable platform integrate with S3 storage in some way.
For example, Druid can xref:druid::usage.adoc#_s3_for_ingestion[ingest data from S3] and xref:druid::usage.adoc#_s3_deep_storage[use S3 as a backend for deep storage], and Spark can use an xref:spark-k8s::usage.adoc#_s3_bucket_specification[S3 bucket] to store application files and data.

== S3Connection and S3Bucket
// introducing the objects

Stackable uses _S3Connection_ and _S3Bucket_ objects to configure access to S3 storage.
// s3 connection
An S3Connection object contains information such as the host name of the S3 server, its port, TLS parameters and access credentials.
// s3 bucket
An S3Bucket contains the name of the bucket and a reference to an S3Connection, the connection to the server where the bucket is located. An S3Connection can be referenced by multiple buckets.

Here's an example of a simple S3Connection object and an S3Bucket referencing that connection:

[source,yaml]
----
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: my-connection-resource
spec:
  host: s3.example.com
  port: 4242
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Bucket
metadata:
  name: my-bucket-resource
spec:
  bucketName: my-example-bucket
  connection:
    reference: my-connection-resource
----
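
Because an S3Connection can be referenced by multiple buckets, a second bucket on the same server only needs its own S3Bucket object. The following is a minimal sketch, not taken from the original page: the resource name `my-other-bucket-resource` and the bucket name `my-other-example-bucket` are hypothetical, while `my-connection-resource` is the connection from the example above.

[source,yaml]
----
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Bucket
metadata:
  name: my-other-bucket-resource # hypothetical resource name
spec:
  bucketName: my-other-example-bucket # hypothetical bucket name
  connection:
    reference: my-connection-resource # same connection as my-bucket-resource
----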

== Object Reference Structure
// ---------- Referencing -------------

S3Bucket objects reference S3Connection objects. Both types of objects can be referenced by other resources. For example, in a DruidCluster you can specify an S3Bucket for deep storage and an S3Connection for data ingestion.
S3Connection objects can be defined as stand-alone objects or inlined into a bucket object. Similarly, a bucket can be defined as a stand-alone object or inlined into an enclosing object.

[excalidraw,s3-cluster-bucket-connection-reference,svg,width=70%]
----
include::partial$diagrams/S3ResourceOverview.excalidraw[]
----

The diagram above shows three examples of how the objects can be structured.
// Option 1
In option 1, all objects are separate from each other. This provides maximum reusability because the same connection or bucket object can be referenced by multiple resources. It also allows for separation of concerns across team members: cluster administrators can define S3Connection objects that developers reference in their applications.
// Option 2
In option 2, the bucket is inlined in the cluster definition. This makes sense if the bucket is dedicated to a specific purpose and is used only by this one cluster instance of a single product.
// Option 3
Option 3 shows all S3 objects inlined in a DruidCluster resource. This is a convenient way to quickly test something, since the entire configuration is encapsulated in a single, though potentially large, manifest.
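
Option 2 might look like the following sketch. It assumes that an inlined bucket accepts the same `connection.reference` field as a stand-alone S3Bucket, and that a stand-alone S3Connection named `my-connection-resource` exists; the bucket name `my-dedicated-bucket` is hypothetical.

[source,yaml]
----
apiVersion: druid.stackable.tech/v1alpha1
kind: DruidCluster
metadata:
  name: my-druid-cluster
spec:
  deepStorage:
    s3:
      inline: # the bucket is defined inside the cluster resource
        bucketName: my-dedicated-bucket # hypothetical bucket name
        connection:
          reference: my-connection-resource # assumed stand-alone S3Connection
  # more spec here ...
----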

=== Examples

The following examples illustrate the concept using a DruidCluster resource.

[source,yaml]
----

apiVersion: druid.stackable.tech/v1alpha1
kind: DruidCluster
metadata:
  name: my-druid-cluster
spec:
  deepStorage:
    # to be defined ...
  # more spec here ...
----

==== Inline definition

The inline definition is option 3 in the figure above.

[excalidraw,s3-cluster-bucket-connection-reference,svg,width=70%]
----
include::partial$diagrams/S3ResourcesInlined.excalidraw[]
----

This variant has the advantage that everything is defined in a single file, right where it is going to be used:

[source,yaml]
----

apiVersion: druid.stackable.tech/v1alpha1
kind: DruidCluster
metadata:
  name: my-druid-cluster
spec:
  deepStorage:
    s3:
      inline: # <1>
        bucketName: my-bucket
        connection:
          inline: # <2>
            host: test-minio
            port: 9000
  # more spec here ...
----
<1> The inline definition of the bucket. The bucket definition contains `bucketName` and `connection`.
<2> The inline definition of the connection. It contains the `host` and `port`.

==== Stand-alone resources

Multiple buckets are often used across a data pipeline, and the same bucket is often used by different applications, so stand-alone resource definitions that can be referenced from multiple objects make sense.

[excalidraw,s3-cluster-bucket-connection-reference,svg,width=70%]
----
include::partial$diagrams/S3ResourcesByReference.excalidraw[]
----

The DruidCluster references the S3Bucket, which in turn references the S3Connection. First the definition of the S3Connection:

[source,yaml]
----
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: my-connection-resource
spec:
  host: s3.example.com
  port: 4242
----

Then the bucket, which references the connection:

[source,yaml]
----
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Bucket
metadata:
  name: my-bucket-resource
spec:
  bucketName: my-example-bucket
  connection:
    reference: my-connection-resource
----

You can then use this bucket, for example in Druid, as deep storage:

[source,yaml]
----

apiVersion: druid.stackable.tech/v1alpha1
kind: DruidCluster
metadata:
  name: my-druid-cluster
spec:
  deepStorage:
    s3:
      reference: my-bucket-resource
  # more spec here ...
----

== Credentials

Regardless of whether a connection is specified inline or as a separate object, the credentials are always specified in the same way. You need a `Secret` containing the access key ID and secret access key, a `SecretClass`, and a reference to this `SecretClass` wherever you want to specify the credentials.

The `Secret`:

[source,yaml]
----
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials
  labels:
    secrets.stackable.tech/class: s3-credentials-class # <1>
stringData:
  accessKey: YOUR_VALID_ACCESS_KEY_ID_HERE
  secretKey: YOUR_SECRET_ACCESS_KEY_THAT_BELONGS_TO_THE_KEY_ID_HERE
----

<1> This label connects the `Secret` to the `SecretClass`.

The `SecretClass`:

[source,yaml]
----
apiVersion: secrets.stackable.tech/v1alpha1
kind: SecretClass
metadata:
  name: s3-credentials-class
spec:
  backend:
    k8sSearch:
      searchNamespace:
        pod: {}
----

Referencing it:

[source,yaml]
----
...
credentials:
  secretClass: s3-credentials-class
...
----
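
To show the reference in context, the sketch below places it in a stand-alone S3Connection. The position of `credentials` directly under `spec` is an assumption, based on the statement that an S3Connection holds the access credentials; check the S3 resources reference for the exact layout.

[source,yaml]
----
---
apiVersion: s3.stackable.tech/v1alpha1
kind: S3Connection
metadata:
  name: my-connection-resource
spec:
  host: s3.example.com
  port: 4242
  credentials: # assumed to sit directly under spec
    secretClass: s3-credentials-class # the SecretClass defined above
----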

== What's next

- Find details about the options of the S3 resource in the xref:reference:s3.adoc[S3 resources reference].
