Merged
23 commits
* 9373f99 refactored service dicsovery text (Jun 16, 2022)
* ca6ba65 Elaborated the example (Jun 16, 2022)
* ffafef7 Added link to ZNode (Jun 16, 2022)
* 8408dc6 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 20, 2022)
* 6e62361 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 20, 2022)
* d95a87f Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 20, 2022)
* b26319a more/different text (Jun 20, 2022)
* 14116d3 Merge branch 'main' into service-discovery-refactoring (fhennig, Jun 23, 2022)
* 7834dca New text (Jun 23, 2022)
* 9dc8734 Merge branch 'service-discovery-refactoring' of github.com:stackablet… (Jun 23, 2022)
* 2af33e8 Minor changes (Jun 23, 2022)
* 58b1700 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* be6b4b8 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* 8c52c34 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* 5b2ce08 Update modules/contributor/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* c8df4dc Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* 3fbb2e5 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* fa22a41 Changes (Jun 23, 2022)
* f339938 Added link to druid (Jun 23, 2022)
* 3414f01 small change (Jun 23, 2022)
* 3625c37 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* 80a0fe6 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* 9d41ce3 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
3 changes: 2 additions & 1 deletion modules/concepts/nav.adoc
Original file line number Diff line number Diff line change
@@ -1,2 +1,3 @@
* Concepts
* xref:concepts:index.adoc[]
** xref:service_discovery.adoc[]
** xref:s3.adoc[]
3 changes: 3 additions & 0 deletions modules/concepts/pages/index.adoc
@@ -0,0 +1,3 @@
= Concepts

Read this section of the documentation to gain a deeper understanding of the bigger picture and the architectural design of the platform.
2 changes: 1 addition & 1 deletion modules/concepts/pages/s3.adoc
@@ -41,7 +41,7 @@ spec:
// ---------- Referencing -------------

S3Bucket(s) reference S3Connection(s) objects. Both types of objects can be referenced by other resources. For example in a DruidCluster you can specify a bucket for deep storage and an S3Connection for data ingestion.
S3 connection objects can be defined in a standalone fashion or they can be inlined into a bucket object. Similarly, a bucket can be defined in a standalone object or inlined into an enclosing object.
S3Connection objects can be defined in a standalone fashion or they can be inlined into a bucket object. Similarly, a bucket can be defined in a standalone object or inlined into an enclosing object.

[excalidraw,s3-cluster-bucket-connection-reference,svg,width=70%]
----
122 changes: 122 additions & 0 deletions modules/concepts/pages/service_discovery.adoc
@@ -0,0 +1,122 @@
= Service discovery ConfigMap

// Abstract
Stackable operators provide a _service discovery ConfigMap_ for each product instance that is deployed. This ConfigMap has the same name as the product instance and contains information about how to connect to the instance. The ConfigMap is used by other operators to connect products together, and can also be used by you, the user, to connect external software to Stackable-operated software.

== Motivation

Products on the Stackable platform can, and in some cases must, be connected with each other to run correctly. Some products are fundamental to the platform, while others depend on them. For example, a NiFi cluster requires a ZooKeeper connection to run in distributed mode. Other products can optionally be connected with each other for better data flow. For example, Trino does not store query data itself; instead it interfaces with other applications to get access to it.

To connect NiFi to ZooKeeper, NiFi needs to know at which host and port it can find the ZooKeeper instance. However, the exact address is not known in advance. To enable a connection from NiFi to ZooKeeper purely based on the name of the ZooKeeper cluster, the discovery ConfigMap is used.

With the ConfigMap, the name of the ZooKeeper cluster is enough to establish a connection: the ConfigMap has the same name as the cluster and contains all the information needed to connect to the ZooKeeper cluster.

=== Example

For a ZookeeperCluster named `simple-zk`:

[source,yaml]
----
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk
spec:
  ...
----

The ZooKeeper operator reads the resource and creates the necessary Pods and Services to get the instance running. It is aware of the interfaces and connections that may be consumed by other products, and it also knows all the details of the actual running processes. It then creates the discovery ConfigMap:

[source,yaml]
----
apiVersion: v1
kind: ConfigMap
metadata:
  name: simple-zk
data:
  ZOOKEEPER: simple-zk-server-default-0.simple-zk-server-default.default.svc.cluster.local:2181,simple-zk-server-default-1.simple-zk-server-default.default.svc.cluster.local:2181
----

The information needed to connect can be a string like above, for example a JDBC connect string: `jdbc:postgresql://localhost:12345`. But a ConfigMap can also contain multiple configuration files which can then be mounted into a client Pod. This is the case for xref:hdfs::discovery.adoc[HDFS], where the `core-site.xml` and `hdfs-site.xml` files are put into the discovery ConfigMap.
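As an illustration, a client Pod could mount such a discovery ConfigMap as a volume so that the configuration files appear on its filesystem. This is a minimal sketch; the Pod name, container image and mount path are placeholder assumptions, while `simple-hdfs` stands for the discovery ConfigMap of an HDFS instance with that name:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: hdfs-client  # hypothetical client Pod
spec:
  containers:
    - name: client
      image: my-hdfs-client:latest  # placeholder image
      volumeMounts:
        - name: hdfs-discovery
          mountPath: /etc/hadoop/conf  # core-site.xml and hdfs-site.xml appear here
  volumes:
    - name: hdfs-discovery
      configMap:
        name: simple-hdfs  # the discovery ConfigMap of the HDFS instance
----

Mounting the ConfigMap this way means the client picks up exactly the files the operator published, without any connection details being hardcoded in the Pod spec.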

== Usage of the service discovery ConfigMap

The ConfigMap is used by Stackable operators to connect products together, but can also be used by the user to retrieve connection information to connect to product instances. The operators consume only the ConfigMap, so it is also possible to create a ConfigMap by hand for a product instance that is not operated by a Stackable operator. These different usage scenarios are explained below.

=== Service discovery within Stackable

Stackable operators use the discovery ConfigMap to automatically connect to service dependencies. For example, HBase requires HDFS to run. Consider an HdfsCluster named `simple-hdfs`, defined as follows:

[source,yaml]
----
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs
spec:
  ...
----

The HDFS instance is then referenced by name in the HBase cluster spec, in the field `hdfsConfigMapName`:

[source,yaml]
----
apiVersion: hbase.stackable.tech/v1alpha1
kind: HbaseCluster
metadata:
  name: simple-hbase
spec:
  hdfsConfigMapName: simple-hdfs
  ...
----

This is a common pattern across the platform. For example, the DruidCluster spec contains a field `zookeeperConfigMapName` and the TrinoCluster spec contains a field `hiveConfigMapName`, to connect Druid to ZooKeeper and Trino to Hive respectively.
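A sketch of this pattern for Druid, assuming the ZookeeperCluster named `simple-zk` from the example above; the cluster name is illustrative and all spec fields other than `zookeeperConfigMapName` are abbreviated:

[source,yaml]
----
apiVersion: druid.stackable.tech/v1alpha1
kind: DruidCluster
metadata:
  name: simple-druid
spec:
  zookeeperConfigMapName: simple-zk  # name of the ZooKeeper discovery ConfigMap
  ...
----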

=== Service discovery from outside Stackable

You can connect your own products to Stackable-operated product instances. How exactly you do this depends heavily on the application you want to connect.

In general, use the name of the product instance to retrieve the ConfigMap and use the information in it to connect your own service. You can find links to the product-specific documentation pages below in the <<whats-next>> section.
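For connection strings, one common approach is to inject the relevant ConfigMap key into your own Pod as an environment variable. This is a hedged sketch: the Pod name, image and variable name are assumptions, while `simple-zk` and the `ZOOKEEPER` key come from the ZooKeeper example above:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: my-app  # hypothetical external application
spec:
  containers:
    - name: app
      image: my-app:latest  # placeholder image
      env:
        - name: ZOOKEEPER_HOSTS  # your application reads this variable
          valueFrom:
            configMapKeyRef:
              name: simple-zk  # discovery ConfigMap of the ZooKeeper cluster
              key: ZOOKEEPER   # connection string key from the discovery ConfigMap
----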

=== Discovering services outside Stackable

It is not uncommon to already have some core software running in your stack, such as HDFS. If you want to use HBase with the Stackable operator, you can still connect it to your existing HDFS instance; you will have to create the discovery ConfigMap for that HDFS yourself. Looking at xref:hdfs::discovery.adoc[the discovery documentation for HDFS], you can see that the discovery ConfigMap for HDFS contains the `core-site.xml` and `hdfs-site.xml` files.

The ConfigMap should look something like this:

[source,yaml]
----
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-already-existing-hdfs
data:
  core-site.xml: |
    <here should be your core-site.xml file contents>
  hdfs-site.xml: |
    <here should be your hdfs-site.xml file contents>
----

In the HBase cluster spec that you use with the Stackable HBase operator, you can then reference `my-already-existing-hdfs`, and the operator will use your manually created ConfigMap to configure HBase to use your HDFS instance:

[source,yaml]
----
apiVersion: hbase.stackable.tech/v1alpha1
kind: HbaseCluster
metadata:
  name: simple-hbase
spec:
  hdfsConfigMapName: my-already-existing-hdfs
  ...
----

[#whats-next]
== Further reading

Consult the discovery ConfigMap documentation for specific products:

* xref:druid::discovery.adoc[Apache Druid]
* xref:hdfs::discovery.adoc[Apache Hadoop HDFS]
* xref:hive::discovery.adoc[Apache Hive]
* xref:kafka::discovery.adoc[Apache Kafka]
* xref:opa::discovery.adoc[OPA]
* xref:zookeeper::discovery.adoc[Apache ZooKeeper]
29 changes: 3 additions & 26 deletions modules/contributor/pages/service_discovery.adoc
@@ -1,34 +1,11 @@
:source-highlighter: highlight.js
:highlightjs-languages: rust

= Service Discovery
= Service discovery implementation guidelines

== Introduction
For a conceptual overview of service discovery, consult the xref:concepts:service_discovery.adoc[service discovery concept page].

Several products deployed by the Stackable platform depend on other (Stackable) products. This could be a product that requires an external database, high availability support or synchronization.

In order to programmatically resolve this dependency, the Stackable platform uses _service discovery_. A Stackable operator is aware of interfaces and connections that have to be exposed and may be consumed by other operators to configure their products. These interfaces or connections are usually referred to as _connection string_.

As a real world example, the Stackable Operator for Apache Kafka has to configure Kafka brokers with an Apache ZooKeeper connection string in order to store and share information about e.g. Kafka topics. This connection string is provided by the Stackable Operator for Apache ZooKeeper, which is aware of all the pods and services related to ZooKeeper.

== Examples for connection strings

- JDBC SQL connection strings: `jdbc:postgresql://localhost:12345`
- thrift protocol: `thrift://localhost:12345`
- spark protocol: `spark://master:7077`
- REST API: `\http://localhost:8080`
- HDFS: `hdfs://localhost:12345`
- ZooKeeper ZNode: `host1:2181,host2:2181/my-chroot`

== Concepts of Service Discovery

=== Architecture

The Operator that provides service discovery writes a `ConfigMap` with all necessary information about its exposed services. Each service has its own entry in the `ConfigMap` as can be seen with the `ZOOKEEPER` entry below:

image::service_discovery_arch.png[Service Discovery]

=== Best practices
== Best practices

==== Exposing config maps for service discovery

2 changes: 1 addition & 1 deletion supplemental-ui/partials/navbar.hbs
@@ -1,6 +1,6 @@
<a class="navbar-sub-item" href="{{{ relativize "/home/index.html" }}}">Home</a>
<a class="navbar-sub-item" href="{{{ relativize "/home/getting_started.html" }}}">Getting Started</a>
<a class="navbar-sub-item" href="{{{ relativize "/home/concepts/s3.html" }}}">Concepts</a>
<a class="navbar-sub-item" href="{{{ relativize "/home/concepts/index.html" }}}">Concepts</a>
<a class="navbar-sub-item" href="{{{ relativize "/home/tutorials/end-to-end_data_pipeline_example.html" }}}">Tutorials</a>
<a class="navbar-sub-item" href="{{{ relativize "/stackablectl/stable/index.html" }}}">stackablectl</a>
<div class="navbar-sub-item drop-down">