Merged
23 commits
* 9373f99 refactored service dicsovery text (Jun 16, 2022)
* ca6ba65 Elaborated the example (Jun 16, 2022)
* ffafef7 Added link to ZNode (Jun 16, 2022)
* 8408dc6 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 20, 2022)
* 6e62361 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 20, 2022)
* d95a87f Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 20, 2022)
* b26319a more/different text (Jun 20, 2022)
* 14116d3 Merge branch 'main' into service-discovery-refactoring (fhennig, Jun 23, 2022)
* 7834dca New text (Jun 23, 2022)
* 9dc8734 Merge branch 'service-discovery-refactoring' of github.com:stackablet… (Jun 23, 2022)
* 2af33e8 Minor changes (Jun 23, 2022)
* 58b1700 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* be6b4b8 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* 8c52c34 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* 5b2ce08 Update modules/contributor/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* c8df4dc Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* 3fbb2e5 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* fa22a41 Changes (Jun 23, 2022)
* f339938 Added link to druid (Jun 23, 2022)
* 3414f01 small change (Jun 23, 2022)
* 3625c37 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* 80a0fe6 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
* 9d41ce3 Update modules/concepts/pages/service_discovery.adoc (fhennig, Jun 23, 2022)
3 changes: 2 additions & 1 deletion modules/concepts/nav.adoc
Original file line number Diff line number Diff line change
@@ -1,2 +1,3 @@
* Concepts
* xref:concepts:index.adoc[]
** xref:service_discovery.adoc[]
** xref:s3.adoc[]
3 changes: 3 additions & 0 deletions modules/concepts/pages/index.adoc
@@ -0,0 +1,3 @@
= Concepts

Read this section of the documentation to gain a deeper understanding of the bigger picture and the architectural design of the platform.
2 changes: 1 addition & 1 deletion modules/concepts/pages/s3.adoc
@@ -41,7 +41,7 @@ spec:
// ---------- Referencing -------------

S3Bucket(s) reference S3Connection(s) objects. Both types of objects can be referenced by other resources. For example in a DruidCluster you can specify a bucket for deep storage and an S3Connection for data ingestion.
S3 connection objects can be defined in a standalone fashion or they can be inlined into a bucket object. Similarly, a bucket can be defined in a standalone object or inlined into an enclosing object.
S3Connection objects can be defined in a standalone fashion or they can be inlined into a bucket object. Similarly, a bucket can be defined in a standalone object or inlined into an enclosing object.

[excalidraw,s3-cluster-bucket-connection-reference,svg,width=70%]
----
122 changes: 122 additions & 0 deletions modules/concepts/pages/service_discovery.adoc
@@ -0,0 +1,122 @@
= Service discovery ConfigMap

// Abstract
Stackable operators provide a _service discovery ConfigMap_ for each product instance that is deployed. This ConfigMap has the same name as the product instance and contains information about how to connect to the instance. The ConfigMap is used by other operators to connect products together, and can also be used by you, the user, to connect external software to Stackable-operated software.

== Motivation

Products on the Stackable platform can, and in some cases must, be connected with each other to run correctly. Some products are fundamental to the platform, while others depend on them. For example, a NiFi cluster requires a ZooKeeper connection to run in distributed mode. Other products can optionally be connected with each other for better data flow. For example, Trino does not store query data itself; instead it interfaces with other applications to get access to it.

To connect NiFi to ZooKeeper, NiFi needs to know at which host and port it can find the ZooKeeper instance. However, the exact address is not known in advance. To enable a connection from NiFi to ZooKeeper purely based on the name of the ZooKeeper cluster, the discovery ConfigMap is used.

With the ConfigMap, the name of the ZooKeeper cluster is enough to establish a connection: the ConfigMap has the same name as the cluster and contains all the information needed to connect to the ZooKeeper cluster.

=== Example

For a ZookeeperCluster named `simple-zk`:

[source,yaml]
----
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk
spec:
  ...
----

The ZooKeeper operator reads the resource and creates the necessary Pods and Services to get the instance running. It is aware of the interfaces and connections that may be consumed by other products, and it also knows all the details of the actual running processes. It then creates the discovery ConfigMap:

[source,yaml]
----
apiVersion: v1
kind: ConfigMap
metadata:
  name: simple-zk
data:
  ZOOKEEPER: simple-zk-server-default-0.simple-zk-server-default.default.svc.cluster.local:2181,simple-zk-server-default-1.simple-zk-server-default.default.svc.cluster.local:2181
----

The information needed to connect can be a string like above, for example a JDBC connect string: `jdbc:postgresql://localhost:12345`. But a ConfigMap can also contain multiple configuration files which can then be mounted into a client Pod. This is the case for xref:hdfs::discovery.adoc[HDFS], where the `core-site.xml` and `hdfs-site.xml` files are put into the discovery ConfigMap.
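As an illustration, a client Pod could mount such a discovery ConfigMap as a volume so that the configuration files appear on its filesystem. This is a minimal sketch; the Pod name, container image and mount path are placeholder assumptions, while `simple-hdfs` stands for the discovery ConfigMap of an HDFS instance with that name:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: hdfs-client  # hypothetical client Pod
spec:
  containers:
    - name: client
      image: my-hdfs-client:latest  # placeholder image
      volumeMounts:
        - name: hdfs-discovery
          mountPath: /etc/hadoop/conf  # core-site.xml and hdfs-site.xml appear here
  volumes:
    - name: hdfs-discovery
      configMap:
        name: simple-hdfs  # the discovery ConfigMap of the HDFS instance
----

Mounting the ConfigMap this way means the client picks up exactly the files the operator published, without any connection details being hardcoded in the Pod spec.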

== Usage of the service discovery ConfigMap

The ConfigMap is used by Stackable operators to connect products together, but can also be used by the user to retrieve connection information to connect to product instances. The operators consume only the ConfigMap, so it is also possible to create a ConfigMap by hand for a product instance that is not operated by a Stackable operator. These different usage scenarios are explained below.

=== Service discovery within Stackable

Stackable operators use the discovery ConfigMap to automatically connect to service dependencies. For example, HBase requires HDFS to run. Consider an HdfsCluster named `simple-hdfs`, defined as follows:

[source,yaml]
----
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs
spec:
  ...
----

The HDFS instance is then referenced by name in the HBase cluster spec, in the field `hdfsConfigMapName`:

[source,yaml]
----
apiVersion: hbase.stackable.tech/v1alpha1
kind: HbaseCluster
metadata:
  name: simple-hbase
spec:
  hdfsConfigMapName: simple-hdfs
  ...
----

This is a common pattern across the platform. For example, the DruidCluster spec contains a field `zookeeperConfigMapName` and the TrinoCluster spec contains a field `hiveConfigMapName`, to connect Druid to ZooKeeper and Trino to Hive respectively.
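A sketch of this pattern for Druid, assuming the ZookeeperCluster named `simple-zk` from the example above; the cluster name is illustrative and all spec fields other than `zookeeperConfigMapName` are abbreviated:

[source,yaml]
----
apiVersion: druid.stackable.tech/v1alpha1
kind: DruidCluster
metadata:
  name: simple-druid
spec:
  zookeeperConfigMapName: simple-zk  # name of the ZooKeeper discovery ConfigMap
  ...
----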

=== Service discovery from outside Stackable

You can connect your own products to Stackable-operated product instances. How exactly you do this depends heavily on the application you want to connect.

In general, use the name of the product instance to retrieve the ConfigMap and use the information in it to connect your own service. You can find links to the product-specific documentation pages below in the <<whats-next>> section.
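For connection strings, one common approach is to inject the relevant ConfigMap key into your own Pod as an environment variable. This is a hedged sketch: the Pod name, image and variable name are assumptions, while `simple-zk` and the `ZOOKEEPER` key come from the ZooKeeper example above:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: my-app  # hypothetical external application
spec:
  containers:
    - name: app
      image: my-app:latest  # placeholder image
      env:
        - name: ZOOKEEPER_HOSTS  # your application reads this variable
          valueFrom:
            configMapKeyRef:
              name: simple-zk  # discovery ConfigMap of the ZooKeeper cluster
              key: ZOOKEEPER   # connection string key from the discovery ConfigMap
----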

=== Discovering services outside Stackable

It is not uncommon to already have some core software running in your stack, such as HDFS. If you want to use HBase with the Stackable operator, you can still connect it to your existing HDFS instance; you will have to create the discovery ConfigMap for that HDFS yourself. Looking at xref:hdfs::discovery.adoc[the discovery documentation for HDFS], you can see that the discovery ConfigMap for HDFS contains the `core-site.xml` and `hdfs-site.xml` files.

The ConfigMap should look something like this:

[source,yaml]
----
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-already-existing-hdfs
data:
  core-site.xml: |
    <here should be your core-site.xml file contents>
  hdfs-site.xml: |
    <here should be your hdfs-site.xml file contents>
----

In the HBase cluster spec that you use with the Stackable HBase operator, you can then reference `my-already-existing-hdfs`, and the operator will use your manually created ConfigMap to configure HBase to use your HDFS instance:

[source,yaml]
----
apiVersion: hbase.stackable.tech/v1alpha1
kind: HbaseCluster
metadata:
  name: simple-hbase
spec:
  hdfsConfigMapName: my-already-existing-hdfs
  ...
----

[#whats-next]
== Further reading

Consult the discovery ConfigMap documentation for specific products:

* xref:druid::discovery.adoc[Apache Druid]
* xref:hdfs::discovery.adoc[Apache Hadoop HDFS]
* xref:hive::discovery.adoc[Apache Hive]
* xref:kafka::discovery.adoc[Apache Kafka]
* xref:opa::discovery.adoc[OPA]
* xref:zookeeper::discovery.adoc[Apache ZooKeeper]
29 changes: 3 additions & 26 deletions modules/contributor/pages/service_discovery.adoc
@@ -1,34 +1,11 @@
:source-highlighter: highlight.js
:highlightjs-languages: rust

= Service Discovery
= Service discovery implementation guidelines

== Introduction
For a conceptual overview of service discovery, consult the xref:concepts:service_discovery.adoc[service discovery concept page].

Several products deployed by the Stackable platform depend on other (Stackable) products. This could be a product that requires an external database, high availability support or synchronization.

In order to programmatically resolve this dependency, the Stackable platform uses _service discovery_. A Stackable operator is aware of interfaces and connections that have to be exposed and may be consumed by other operators to configure their products. These interfaces or connections are usually referred to as _connection string_.

As a real world example, the Stackable Operator for Apache Kafka has to configure Kafka brokers with an Apache ZooKeeper connection string in order to store and share information about e.g. Kafka topics. This connection string is provided by the Stackable Operator for Apache ZooKeeper, which is aware of all the pods and services related to ZooKeeper.

== Examples for connection strings

- JDBC SQL connection strings: `jdbc:postgresql://localhost:12345`
- thrift protocol: `thrift://localhost:12345`
- spark protocol: `spark://master:7077`
- REST API: `\http://localhost:8080`
- HDFS: `hdfs://localhost:12345`
- ZooKeeper ZNode: `host1:2181,host2:2181/my-chroot`

== Concepts of Service Discovery

=== Architecture

The Operator that provides service discovery writes a `ConfigMap` with all necessary information about its exposed services. Each service has its own entry in the `ConfigMap` as can be seen with the `ZOOKEEPER` entry below:

image::service_discovery_arch.png[Service Discovery]

=== Best practices
== Best practices

==== Exposing config maps for service discovery

2 changes: 1 addition & 1 deletion supplemental-ui/partials/navbar.hbs
@@ -1,6 +1,6 @@
<a class="navbar-sub-item" href="{{{ relativize "/home/index.html" }}}">Home</a>
<a class="navbar-sub-item" href="{{{ relativize "/home/getting_started.html" }}}">Getting Started</a>
<a class="navbar-sub-item" href="{{{ relativize "/home/concepts/s3.html" }}}">Concepts</a>
<a class="navbar-sub-item" href="{{{ relativize "/home/concepts/index.html" }}}">Concepts</a>
<a class="navbar-sub-item" href="{{{ relativize "/home/tutorials/end-to-end_data_pipeline_example.html" }}}">Tutorials</a>
<a class="navbar-sub-item" href="{{{ relativize "/stackablectl/stable/index.html" }}}">stackablectl</a>
<div class="navbar-sub-item drop-down">