What is OLAP traversal in a graph database?
OLAP stands for Online Analytical Processing. OLAP traversal is one way to traverse a graph database in parallel, as batch operations.
JanusGraph OLAP traversal uses distributed graph processing through TinkerPop's Gremlin plugins for Apache Hadoop and Apache Spark.
For more information on this topic, please refer to the link below:
JanusGraph with TinkerPop’s Hadoop-Gremlin - JanusGraph
The Problem
We had a working JanusGraph 0.5.2 setup in which we could insert and query (OLTP) data as needed. We were exploring JanusGraph OLAP traversal for some reporting and analytical requirements. However, when we followed the instructions in the JanusGraph documentation, we could not connect to the SSL-enabled Cassandra when traversing the graph in OLAP mode through Gremlin queries. The Cassandra database accepted only SSL connections, so clients needed a truststore to connect. OLTP queries, the regular way of working with JanusGraph, worked fine and in line with the official documentation.
Below is the working OLTP configuration, janusgraph-cql-oltp.properties:
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=cassandra.cassandra.svc.cluster.local
storage.username=cassandra
storage.password=cassandra123
storage.cql.keyspace=janusgraph
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
storage.lock.wait-time = 60000
storage.cql.ssl.enabled=true
storage.cql.ssl.truststore.location=/etc/config/tls/truststore
storage.cql.ssl.truststore.password=secretpasswd
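To double-check a file like this before loading it in the Gremlin console, a small Python helper can parse the `key=value` lines and confirm that the SSL entries are present. This is a minimal sketch of our own, not part of JanusGraph: `parse_properties` and `OLTP_SSL_KEYS` are hypothetical names, and the parser handles only simple `key=value` lines and `#`/`!` comments, not the full Java .properties syntax (line continuations, escapes, `:` separators).

```python
def parse_properties(text):
    """Parse simple Java-style key=value lines; skips blanks and comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without '=' (this sketch does not support ':' separators)
            props[key.strip()] = value.strip()
    return props

# SSL keys the OLTP configuration above relies on (hypothetical check list)
OLTP_SSL_KEYS = (
    "storage.cql.ssl.enabled",
    "storage.cql.ssl.truststore.location",
    "storage.cql.ssl.truststore.password",
)

oltp = parse_properties("""
storage.backend=cql
storage.cql.ssl.enabled=true
storage.cql.ssl.truststore.location=/etc/config/tls/truststore
storage.cql.ssl.truststore.password=secretpasswd
cache.db-cache-size = 0.5
""")

missing = [k for k in OLTP_SSL_KEYS if k not in oltp]
print("missing:", missing)          # missing: []
print(oltp["cache.db-cache-size"])  # 0.5
```

Whitespace around `=` is stripped, so entries such as `cache.db-cache-size = 0.5` parse the same way as the compact ones.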
When we loaded this configuration in the Gremlin console and ran a simple query, we got the expected results.
Below is the OLAP configuration that failed to connect to the SSL-enabled Cassandra:
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
#
# JanusGraph Cassandra InputFormat configuration
#
# These properties define the connection properties that were used while writing data to JanusGraph.
janusgraphmr.ioformat.conf.storage.backend=cql
# This specifies the hostname & port for the Cassandra data store.
janusgraphmr.ioformat.conf.storage.hostname=cassandra.cassandra.svc.cluster.local
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.username=cassandra
janusgraphmr.ioformat.conf.storage.password=cassandra123
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph
janusgraphmr.ioformat.conf.storage.lock.wait-time = 60000
janusgraphmr.ioformat.conf.storage.cql.ssl.enabled=true
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.location=/etc/config/tls/truststore
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.password=cassandra123
janusgraphmr.ioformat.conf.storage.ssl.enabled=true
janusgraphmr.ioformat.conf.storage.ssl.truststore.location=/etc/config/tls/truststore
janusgraphmr.ioformat.conf.storage.ssl.truststore.password=cassandra123
janusgraphmr.ioformat.conf.storage.cql.read-consistency-level=ONE
storage.lock.wait-time = 60000
storage.cql.ssl.enabled=true
storage.cql.ssl.client-authentication-enabled=true
storage.cql.ssl.truststore.location=/etc/config/tls/truststore
storage.cql.ssl.truststore.password=cassandra123
janusgraphmr.ioformat.conf.cache.db-cache = true
janusgraphmr.ioformat.conf.cache.db-cache-clean-wait = 20
janusgraphmr.ioformat.conf.cache.db-cache-time = 180000
janusgraphmr.ioformat.conf.cache.db-cache-size = 0.5
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.widerows=true
#
# SparkGraphComputer Configuration
#
spark.master=local[*]
spark.executor.memory=1g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
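This file mixes three kinds of keys: Gremlin/Hadoop graph settings, JanusGraph storage settings under the `janusgraphmr.ioformat.conf.` prefix, and bare `storage.*` and `cassandra.input.*` lines. As we understand it, JanusGraph's Hadoop input formats read their storage settings from the prefixed keys, so the unprefixed `storage.cql.ssl.*` entries are likely ignored by CqlInputFormat. The sketch below (plain Python; `split_by_prefix` is a hypothetical helper, not a JanusGraph API) strips that prefix to show the JanusGraph-side view of the settings and lists the keys that fall outside it.

```python
PREFIX = "janusgraphmr.ioformat.conf."

def split_by_prefix(props, prefix=PREFIX):
    """Return (settings with the prefix stripped, sorted keys without the prefix)."""
    inner, other = {}, []
    for key, value in props.items():
        if key.startswith(prefix):
            inner[key[len(prefix):]] = value
        else:
            other.append(key)
    return inner, sorted(other)

# A few entries taken from the failing OLAP configuration above
olap = {
    "gremlin.graph": "org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph",
    "janusgraphmr.ioformat.conf.storage.backend": "cql",
    "janusgraphmr.ioformat.conf.storage.cql.ssl.enabled": "true",
    "storage.cql.ssl.enabled": "true",
    "cassandra.input.widerows": "true",
}

inner, other = split_by_prefix(olap)
print(inner)
# {'storage.backend': 'cql', 'storage.cql.ssl.enabled': 'true'}
print(other)
# ['cassandra.input.widerows', 'gremlin.graph', 'storage.cql.ssl.enabled']
```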
When we loaded the graph object in the Gremlin console, we could see that the properties were loaded correctly. But when we traversed the graph as described in the documentation, we got a Cassandra connection error related to the SSL configuration.
gremlin> graph=HadoopGraph.open('/janusgraph-full-0.5.2/conf/olap.properties')
==>hadoopgraph[cqlinputformat->nulloutputformat]
gremlin> g=graph.traversal().withComputer(SparkGraphComputer)
==>graphtraversalsource[hadoopgraph[cqlinputformat->nulloutputformat], sparkgraphcomputer]
gremlin> graph.configuration() // I can see all the properties from the file loaded here
gremlin> g.V().limit(1)
07:34:44 WARN  org.apache.tinkerpop.gremlin.spark.process.computer.SparkGraphComputer  - class org.apache.hadoop.mapreduce.lib.output.NullOutputFormat does not implement PersistResultGraphAware and thus, persistence options are unknown -- assuming all options are possible
com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: cassandra.cassandra.svc.cluster.local/10.0.165.158:9042 (com.datastax.driver.core.exceptions.TransportException: [cassandra.cassandra.svc.cluster.local/10.0.165.158:9042] Connection has been closed))
Type ':help' or ':h' for help.
The Cassandra logs confirmed that a connection was attempted but rejected for SSL reasons. Below are the logs from the Cassandra instance:
INFO  [epollEventLoopGroup-2-4] 2023-05-02 07:34:58,809 Message.java:826 - Unexpected exception during request; channel = [id: 0xeb0e017f, L:/10.12.0.224:9042 ! R:/10.12.0.135:60316]
io.netty.handler.ssl.NotSslRecordException: not an SSL/TLS record: 0400000001000000500003000b43514c5f56455253494f4e0005332e302e30000e4452495645525f56455253494f4e0005332e392e30000b4452495645525f4e414d4500144461746153746178204a61766120447269766572
    at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1057) ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:411) [netty-all-4.0.44.Final.jar:4.0.44.Final]
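The hex payload in the NotSslRecordException is worth decoding, because it shows what the client actually sent. The sketch below parses it using the CQL native protocol v4 frame layout (a 9-byte header of version, flags, 2-byte stream id, opcode and 4-byte body length; a STARTUP body is a map of 2-byte-length-prefixed strings). It is our own illustration, not Cassandra or driver code, and `decode_frame` only understands the STARTUP message seen here. The result is a plain-text protocol-v4 STARTUP request from DataStax Java Driver 3.9.0, meaning the driver inside the input format was not using SSL at all; a TLS connection would instead begin with a handshake record (first byte 0x16).

```python
import struct

# Payload copied from the Cassandra log above, split at field boundaries
PAYLOAD = (
    "040000000100000050"                            # frame header (9 bytes)
    "0003"                                          # number of map entries
    "000b43514c5f56455253494f4e"                    # "CQL_VERSION"
    "0005332e302e30"                                # "3.0.0"
    "000e4452495645525f56455253494f4e"              # "DRIVER_VERSION"
    "0005332e392e30"                                # "3.9.0"
    "000b4452495645525f4e414d45"                    # "DRIVER_NAME"
    "00144461746153746178204a61766120447269766572"  # "DataStax Java Driver"
)

OPCODES = {0x01: "STARTUP"}  # only the opcode needed for this payload

def read_string(buf, pos):
    """CQL [string]: 2-byte big-endian length followed by UTF-8 bytes."""
    (n,) = struct.unpack_from(">H", buf, pos)
    pos += 2
    return buf[pos:pos + n].decode("utf-8"), pos + n

def decode_frame(data):
    """Decode a CQL v3+ frame header and a STARTUP string-map body."""
    version, flags, stream, opcode, length = struct.unpack_from(">BBhBi", data, 0)
    body = data[9:9 + length]
    (count,) = struct.unpack_from(">H", body, 0)
    pos, options = 2, {}
    for _ in range(count):
        key, pos = read_string(body, pos)
        value, pos = read_string(body, pos)
        options[key] = value
    return {
        "direction": "response" if version & 0x80 else "request",
        "protocol_version": version & 0x7F,
        "opcode": OPCODES.get(opcode, hex(opcode)),
        "options": options,
    }

frame = decode_frame(bytes.fromhex(PAYLOAD))
print(frame["direction"], frame["protocol_version"], frame["opcode"])  # request 4 STARTUP
print(frame["options"])
# {'CQL_VERSION': '3.0.0', 'DRIVER_VERSION': '3.9.0', 'DRIVER_NAME': 'DataStax Java Driver'}
```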
Finally found the missing piece
After trying several combinations for passing the SSL settings to the connection configuration, we still could not establish a connection with Cassandra and execute an OLAP query.
We posted the question on Stack Overflow, the Discord channel and Google Groups, hoping for help from the community. A member of the Discord community finally replied with a solution that worked; the Discord channel for JanusGraph and Gremlin users is quite active. The configuration parameters needed for the SSL connection are not mentioned in the documentation. They are defined in the code, and the reference is below. Note, however, that these parameters work with the newer JanusGraph versions; we verified them with 0.6.0 and 1.0.0-rc2.
We added the following entries to the OLAP connection configuration:
cassandra.input.native.ssl.trust.store.password=cassandra123
The final, updated OLAP traversal configuration looks like this:
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphReader=org.janusgraph.hadoop.formats.cql.CqlInputFormat
gremlin.hadoop.graphWriter=org.apache.hadoop.mapreduce.lib.output.NullOutputFormat
gremlin.hadoop.jarsInDistributedCache=true
gremlin.hadoop.inputLocation=none
gremlin.hadoop.outputLocation=output
gremlin.spark.persistContext=true
janusgraphmr.ioformat.conf.storage.backend=cql
janusgraphmr.ioformat.conf.storage.hostname=cassandra-headless.cassandra.svc.cluster.local
janusgraphmr.ioformat.conf.storage.port=9042
janusgraphmr.ioformat.conf.storage.username=cassandra
janusgraphmr.ioformat.conf.storage.password=cassa@2@2!
janusgraphmr.ioformat.conf.storage.cql.keyspace=janusgraph
janusgraphmr.ioformat.conf.storage.cql.read-consistency-level=ONE
janusgraphmr.ioformat.conf.storage.cql.ssl.enabled=true
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.location=/tmp/security/truststore
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.password=cassandra123
storage.cql.read-consistency-level=ONE
janusgraphmr.ioformat.conf.cache.db-cache = true
janusgraphmr.ioformat.conf.cache.db-cache-clean-wait = 20
janusgraphmr.ioformat.conf.cache.db-cache-time = 180000
janusgraphmr.ioformat.conf.cache.db-cache-size = 0.5
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.native.keep.alive=true
cassandra.input.native.ssl.trust.store.path=/tmp/security/truststore
cassandra.input.native.ssl.trust.store.password=cassa@2@2!
storage.cql.protocol-version=V4
spark.master=local[*]
spark.executor.memory=3g
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=org.janusgraph.hadoop.serialize.JanusGraphKryoRegistrator
spark.cassandra.input.fetch.size_in_rows=500
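The practical difference between the failing and the working file comes down to the `cassandra.input.native.ssl.trust.store.*` entries. A small pre-flight check like the one below could catch their absence before a Spark job is launched. It is a minimal sketch under our own assumptions: `missing_ssl_keys` and `REQUIRED_OLAP_SSL_KEYS` are hypothetical names, not part of JanusGraph or TinkerPop, and the key list is simply taken from the working configuration above, so it may not be exhaustive.

```python
# Keys taken from the working OLAP configuration above (may not be exhaustive)
REQUIRED_OLAP_SSL_KEYS = (
    "janusgraphmr.ioformat.conf.storage.cql.ssl.enabled",
    "cassandra.input.native.ssl.trust.store.path",
    "cassandra.input.native.ssl.trust.store.password",
)

def parse_properties(text):
    """Parse simple key=value lines; skips blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            key, sep, value = line.partition("=")
            if sep:
                props[key.strip()] = value.strip()
    return props

def missing_ssl_keys(text):
    """Return the required SSL keys that the given properties text lacks."""
    props = parse_properties(text)
    return [k for k in REQUIRED_OLAP_SSL_KEYS if k not in props]

# Excerpts from the failing and the working OLAP configurations
failing = """
janusgraphmr.ioformat.conf.storage.cql.ssl.enabled=true
janusgraphmr.ioformat.conf.storage.cql.ssl.truststore.location=/etc/config/tls/truststore
storage.cql.ssl.enabled=true
"""

working = """
janusgraphmr.ioformat.conf.storage.cql.ssl.enabled=true
cassandra.input.native.ssl.trust.store.path=/tmp/security/truststore
cassandra.input.native.ssl.trust.store.password=cassa@2@2!
"""

print(missing_ssl_keys(failing))
# ['cassandra.input.native.ssl.trust.store.path', 'cassandra.input.native.ssl.trust.store.password']
print(missing_ssl_keys(working))  # []
```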
With the above configuration we were able to traverse the graph using OLAP traversal and achieve our objective.