Titan DB with Cassandra and Elasticsearch Setup
What is Titan DB?
Titan DB is a scalable, transactional graph database. It is optimized for storing and querying graph data with complex graph traversals while serving thousands of concurrent users.
It can use Cassandra, HBase, or Oracle Berkeley DB as its backend storage.
Titan DB Features:
- Elastic and linear scalability for a growing data and user base
- It uses data monitoring and replication techniques to provide fault tolerance
- It supports multiple data centers for high data availability
- It supports graph data analytics through integration with other technologies
- It supports full-text and other advanced search with the help of Elasticsearch, Solr, or Lucene
- It supports ACID and eventual consistency
Use the steps below to install and set up Titan with Cassandra.
Step 1:
Download Apache Cassandra: Apache Cassandra
Download Titan DB: titan-db
Both files are downloaded in zip format and need to be extracted.
Step 2:
Extract both downloads to one location, for example /home/ramakavanan/titan-db and /home/ramakavanan/cassandra.
Step 3:
Set up Cassandra:
If you installed Cassandra from an rpm or deb package, it already has the correct permissions to write to the log and data storage directories. Otherwise, we must set the permissions manually.
The Cassandra folder contains a conf folder (/home/ramakavanan/cassandra/conf), which holds the cassandra.yaml file. This file configures Cassandra's data, commit log, and cache directories, so we can change them based on our needs. Here we need to make sure that these directories exist and are writable:
data_file_directories (/home/ramakavanan/cassandra/data)
commitlog_directory (/home/ramakavanan/cassandra/commitlog)
saved_caches_directory (/home/ramakavanan/cassandra/saved_caches)
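The directory check above can be sketched in a few lines of Python. This is only an illustration, not part of Cassandra: the helper name check_writable_dirs is hypothetical, and the paths are the example locations used in this post.

```python
import os


def check_writable_dirs(paths):
    """Return {path: problem} for every path that is missing or not writable.

    Hypothetical helper for illustration; Cassandra does not ship it.
    """
    problems = {}
    for path in paths:
        if not os.path.isdir(path):
            problems[path] = "missing"
        elif not os.access(path, os.W_OK | os.X_OK):
            # Creating files in a directory needs write and execute permission.
            problems[path] = "not writable"
    return problems


if __name__ == "__main__":
    # Example locations from this post; adjust to your own cassandra.yaml.
    cassandra_home = "/home/ramakavanan/cassandra"
    dirs = [os.path.join(cassandra_home, name)
            for name in ("data", "commitlog", "saved_caches")]
    for path, problem in check_writable_dirs(dirs).items():
        print(f"{path}: {problem}")
```

If the script prints nothing, all three directories exist and are writable by the current user. Run it as the same user that will start Cassandra.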
By default, Cassandra writes its logs to ${cassandra.logdir}/logs, which here means /home/ramakavanan/cassandra/logs. Make sure this directory exists and is writable, or change this line in conf/log4j-server.properties:
log4j.appender.R.File=/var/log/cassandra/logs/system.log
Note that in Cassandra 2.1+, the logger in use is logback, so change the logging directory in your conf/logback.xml file instead, for example:
<file>/var/log/cassandra/system.log</file>
JVM-level settings such as heap size can be set in conf/cassandra-env.sh.
Start Cassandra:
Start up Cassandra by invoking bin/cassandra -f from the command line. The service should start in the foreground and log gratuitously to the console. Assuming you don't see messages with scary words like "error", or "fatal", or anything that looks like a Java stack trace, then everything should be working.
Press Control-C to stop Cassandra.
Step 4: Setup Titan-DB
We can set up Cassandra with Titan in two ways.
1. By default, Titan ships with a built-in Cassandra (cassandrathrift) database. Its start script runs Cassandra, checks it with nodetool, and starts Elasticsearch (the index storage backend) automatically.
Using the default Titan configuration, start Titan by invoking bin/titan.sh start from the command line. It forks Cassandra, runs nodetool, and starts the Elasticsearch server and the Gremlin Server:
Forking Cassandra...
Running `nodetool statusthrift`. OK (returned exit status 0 and printed string "running").
Forking Elasticsearch...
Connecting to Elasticsearch (127.0.0.1:9300).... OK (connected to 127.0.0.1:9300).
Forking Gremlin-Server...
Connecting to Gremlin-Server (127.0.0.1:8182).... OK (connected to 127.0.0.1:8182).
Run gremlin.sh to connect.
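If the startup output scrolls by too fast, we can repeat the same connectivity checks ourselves. The following is a minimal sketch, assuming the default addresses shown in the output above (127.0.0.1:9300 for Elasticsearch and 127.0.0.1:8182 for Gremlin-Server). The function name port_is_open is hypothetical and not part of Titan.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the titan.sh output; change them if you changed the defaults.
    for name, port in (("Elasticsearch", 9300), ("Gremlin-Server", 8182)):
        state = "reachable" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{name} (127.0.0.1:{port}): {state}")
```

A reachable port only shows that something is listening there. It does not prove that the service behind it is healthy.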
Then we can open the Gremlin console by invoking bin/gremlin.sh.
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: aurelius.titan
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
11:05:19 INFO org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph - HADOOP_GREMLIN_LIBS is set to: /var/lib/titan/titan-1.0.0-hadoop1/lib
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.tinkergraph
gremlin>
Note: Run the above commands from the Titan root folder.
2. The second way to configure Cassandra with Titan:
Go to the Titan conf folder, find the file titan-cassandra-es.properties (conf/titan-cassandra-es.properties), and edit the following properties:
gremlin.graph=com.thinkaurelius.titan.core.TitanFactory
# Other values: cassandrathrift, astyanax (synonym: cassandra), embeddedcassandra, inmemory
storage.backend=cassandra
storage.hostname=127.0.0.1
# Index backend
index.search.backend=elasticsearch
index.search.directory=/tmp/ramakavanan/es
index.search.elasticsearch.local-mode=true
index.search.elasticsearch.client-only=false
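Before starting anything, it can help to confirm that the edited file really contains the storage and index settings. The sketch below is a simplified reader for key=value lines and is not a full Java properties parser (it ignores line continuations, ":" separators, and escapes). The names parse_properties, missing_keys, and REQUIRED are hypothetical, and the list of required keys is our own choice based on the properties shown above.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that have no '=' at all
            props[key.strip()] = value.strip()
    return props


# Assumption: the keys this post relies on; Titan itself may require others.
REQUIRED = ("storage.backend", "storage.hostname", "index.search.backend")


def missing_keys(props, required=REQUIRED):
    """Return the required keys that are absent from props, in order."""
    return [key for key in required if key not in props]


if __name__ == "__main__":
    sample = """
    gremlin.graph=com.thinkaurelius.titan.core.TitanFactory
    storage.backend=cassandra
    storage.hostname=127.0.0.1
    # Index backend
    index.search.backend=elasticsearch
    """
    props = parse_properties(sample)
    print("missing:", missing_keys(props))
```

To check the real file, read conf/titan-cassandra-es.properties with open(...).read() and pass the text to parse_properties.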
Then start Cassandra:
Go to the bin folder under the Cassandra root path and execute the cassandra -f command.
Then check the Thrift server with nodetool:
In the same Cassandra bin folder, execute the nodetool statusthrift command. It reports whether the Thrift server is running. If it is not running, start it with nodetool enablethrift.
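The same check can be scripted. The sketch below mirrors what the titan.sh output reports for this step (exit status 0 and the printed string "running"). The function names are hypothetical, and the default nodetool path is an assumption based on running from the Cassandra root folder.

```python
import subprocess


def thrift_is_running(exit_status, output):
    """Interpret the result of `nodetool statusthrift`.

    Requires exit status 0 and a line that is exactly "running", so that
    a "not running" line is not mistaken for success.
    """
    lines = [line.strip() for line in output.splitlines()]
    return exit_status == 0 and "running" in lines


def check_thrift(nodetool="bin/nodetool"):
    """Run nodetool statusthrift; the nodetool path is an assumption."""
    result = subprocess.run([nodetool, "statusthrift"],
                            capture_output=True, text=True)
    return thrift_is_running(result.returncode, result.stdout)
```

Calling check_thrift() from the Cassandra root folder returns True or False. When it returns False, run nodetool enablethrift and check again.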
Then start the Elasticsearch server if you require it.
Note: You don't need to start the Titan DB server.
Check whether everything is working using the Gremlin console.
Now open the Gremlin console from Titan DB: go to the Titan DB root path and execute the command bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: aurelius.titan
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
11:05:19 INFO org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph - HADOOP_GREMLIN_LIBS is set to: /var/lib/titan/titan-1.0.0-hadoop1/lib
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.tinkergraph
gremlin>
Then open the Titan properties file to connect to the graph database using the following command:
gremlin> g = TitanFactory.open('conf/titan-cassandra-es.properties')
==>standardtitangraph[cassandra:[127.0.0.1]]
The commands below, run in the Gremlin console, show how to create vertices and an edge in Titan DB:
gremlin> g1 = g.traversal()
==>graphtraversalsource[standardtitangraph[cassandra:[127.0.0.1]], standard]
gremlin> g1.V()
gremlin> g1.V().count()
==>0
Creating vertices:
gremlin> v1 = g.addVertex(T.label, "person", "name", "rama", "age", 25)
gremlin> v2 = g.addVertex(T.label, "software", "name", "lop", "lang", "java")
Create an edge between the two vertices created above:
gremlin> v1.addEdge("created", v2, "weight", 63)
==>e[2rl-360-4r9-38g][4104-created->4192]
gremlin> g1.V().has('name','rama').values('name')
==>rama
gremlin> g1.V().has('name','rama')
==>v[4104]
Note: We can format the Gremlin server output as JSON or XML.
Checking in Cassandra:
To check whether these changes have reached Cassandra, open the Cassandra logs and look through them. They show keys and values for the vertices and edges we created in Titan through the Gremlin console.
We can also check through the keyspace created in Cassandra. To see the keyspaces created in Cassandra, execute the following commands in the Cassandra console (cqlsh) and look at the output.
cqlsh:titan> describe keyspaces;
titan  system_auth  mykeyspace  system_traces  system_schema  system  system_distributed
cqlsh:> use titan;
cqlsh:titan>
cqlsh:titan> describe tables;
cqlsh:titan> select * from titan_ids;
 key                | column1                                                                                              | value
--------------------+------------------------------------------------------------------------------------------------------+-------
 0x0000000000000003 | 0xfffffffffffec77f000535233381db083766303030313031373330372d6b6e6f6c6475732d566f7374726f2d3335353832 | 0x
 0x6000000000000003 | 0xfffffffffffec77f0005352337d4aac83766303030313031373330372d6b6e6f6c6475732d566f7374726f2d3335353832 | 0x
 0x6000000000000000 | 0xffffffffffffd8ef0005352337cf82783766303030313031373330372d6b6e6f6c6475732d566f7374726f2d3335353832 | 0x
 0x0000000000000004 | 0xffffffffffffff9b00053523337cc2583766303030313031373330372d6b6e6f6c6475732d566f7374726f2d3335353832 | 0x
 0x0000000000000004 | 0xffffffffffffffcd0005352333779a083766303030313031373330372d6b6e6f6c6475732d566f7374726f2d3335353832 | 0x
 0x0800000000000000 | 0xffffffffffffd8ef000535233387b7083766303030313031373330372d6b6e6f6c6475732d566f7374726f2d3335353832 | 0x
 0x0800000000000003 | 0xfffffffffffec77f00053523338cd7883766303030313031373330372d6b6e6f6c6475732d566f7374726f2d3335353832 | 0x
Note: Along with these technologies, we can use a Redis server cache to make our application faster.