Neo4j - Triplestores vs. native graph DBs on fast queries
I'm researching native graph databases and triple stores (RDF stores) for our use. We're focused on MarkLogic for the triple store, and Neo4j, and maybe OrientDB, for the native graph DB.
Part of the question below lays out the context: I'm investigating the major distinction between these two types of DBs. For the first part, I'm looking for verification, i.e. whether I'm missing anything in the picture.
In the second part, Part B, I'm looking for answers on how each DB handles what I'm outlining in Part A.
Part A:
As far as I know so far, the major distinction is this: triple stores store relationships, or rather edges, based on the relationship itself. So it's like a "bag" of edges, each with specific, designed attributes on them that reflect the semantics of the relationship. Native graph DBs, on the other hand, store the graph structure: nodes and the links on them, along with the attributes you'd define on these nodes and links.
I think the following two set out the two extremes, for a fair view of these two. They are only the two extremes; I'm pretty sure there are DBs out there doing more than either one of these extremes.
1.) Bag of edges (triple store): overall, each subject-predicate-object triple, like (sourceNode, edge, destNode), is stored as a single record, forming a triple store entry. The triple store is indexed on each of these three columns, so when you need the list of people who have friends that live in Australia, you (or rather, the triple store engine) get all the "friends" relationships and, among them, search for the ones that have a source or dest node where the node is a person and has the property "lives in Australia".
2.) Native graph: nodes with labels and properties, and links in between. In order to find people "who have friends that live in Australia", you first find the nodes labeled "person", then search the relationship list (which is a linked list (?)) of each node, and go from there. That's two searches, one on nodes and a second on the relationships of each node, as opposed to one search on the relationships (triples) of triple stores.
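To make the two extremes concrete, here is a minimal Python sketch of both lookups. It is only an illustration of the access patterns described above, not how MarkLogic, Neo4j or OrientDB actually store data. All names, predicates and data are made up, and "friend" is assumed to be a directed edge.

```python
from collections import defaultdict

# Hypothetical toy data: names and predicates are made up for illustration.
triples = [
    ("alice", "friend", "bob"),
    ("carol", "friend", "dave"),
    ("bob", "livesIn", "Australia"),
    ("dave", "livesIn", "Canada"),
]

# 1) Bag of edges: index the triples by predicate so edges are reachable directly.
by_predicate = defaultdict(list)
for s, p, o in triples:
    by_predicate[p].append((s, o))

def friends_in_australia_triples():
    """One search over the 'friend' edges, filtered by a 'livesIn' lookup."""
    in_australia = {s for s, o in by_predicate["livesIn"] if o == "Australia"}
    return sorted({s for s, o in by_predicate["friend"] if o in in_australia})

# 2) Native graph: each node carries its label, properties and its own edge list.
nodes = {
    "alice": {"label": "Person", "props": {}, "out": [("friend", "bob")]},
    "bob": {"label": "Person", "props": {"livesIn": "Australia"}, "out": []},
    "carol": {"label": "Person", "props": {}, "out": [("friend", "dave")]},
    "dave": {"label": "Person", "props": {"livesIn": "Canada"}, "out": []},
}

def friends_in_australia_graph():
    """Two searches: first over nodes by label, then over each node's edges."""
    result = set()
    for name, node in nodes.items():
        if node["label"] != "Person":
            continue
        for rel, target in node["out"]:
            if rel == "friend" and nodes[target]["props"].get("livesIn") == "Australia":
                result.add(name)
    return sorted(result)
```

Both functions return the same people; the difference is only whether the search starts from the edges or from the nodes.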
One thing I kept seeing on blogs so far about the pros and cons of triple stores vs. native graph DBs is that triple stores score on queries because of indexing: relationships can be accessed directly. In a native graph DB, relationships are accessed through the nodes they are incident to. (I'm aware that, by the same token, native graph DBs have the advantage of retaining the graph structure, so that graph algorithms and solutions can be implemented more easily and run faster.)
However, the lack of indexing does not have to be a shortcoming of a native graph DB if it allows indexing of nodes and/or relationships based on properties and/or on labels.
If it allows labeling of nodes and indexes on labels, the developer can take a subgraph of the overall graph and go from there. Such a query on a restricted domain would be faster.
If it allows labeling of relationships, queries "revolving around" relationships, like the "list of people who have friends that live in Australia" above, can execute faster, because the query won't traverse the links via the nodes and the properties of the nodes; instead, it will access the links directly.
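What I have in mind is roughly the following Python sketch: a node-centric store with two secondary indexes built on top of it, one on node labels and one on relationship types. The store layout, the index names and the data are all hypothetical; this is not the API of any of the products mentioned.

```python
from collections import defaultdict

# Hypothetical node-centric store: node id -> labels, properties, outgoing edges.
graph = {
    1: {"labels": {"Person"}, "props": {"name": "Alice"}, "out": [("FRIEND", 2)]},
    2: {"labels": {"Person"}, "props": {"name": "Bob", "country": "Australia"}, "out": []},
    3: {"labels": {"Person"}, "props": {"name": "Carol"}, "out": [("WORKS_AT", 4)]},
    4: {"labels": {"Company"}, "props": {"name": "Acme"}, "out": []},
}

# Secondary indexes built once over the node-centric structure.
label_index = defaultdict(set)      # label -> node ids
rel_type_index = defaultdict(list)  # relationship type -> (source id, target id)
for node_id, node in graph.items():
    for label in node["labels"]:
        label_index[label].add(node_id)
    for rel_type, target in node["out"]:
        rel_type_index[rel_type].append((node_id, target))

def people_with_friends_in(country):
    """Start from the FRIEND edges directly instead of walking every node."""
    people = label_index["Person"]
    return sorted({
        graph[src]["props"]["name"]
        for src, dst in rel_type_index["FRIEND"]
        if src in people and graph[dst]["props"].get("country") == country
    })
```

With such indexes the query touches only the FRIEND edges, which is the same starting point a triple store would use.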
I was wondering how much of this MarkLogic, Neo4j, and OrientDB are doing?
I skimmed through Chapter 6 of this book on Neo4j, and haven't seen a direct search on an index of edges (relationships). Have I missed anything?
If I did miss it, and Neo4j has such indexing on edges, how come triple stores have the major advantage of fast queries over native graph DBs?
TIA.
//----------------------
Edit:
Note: I've seen "Graph DBs vs. document DBs vs. triplestores" among other useful discussions.
For Part A, the differences between a triple store and a graph store: the differences aren't really in what is stored, but more in how they are intended to be queried.
A graph store aims to answer graph queries. These are things that include questions about the structure of the graph. This includes the minimum distance between two points (e.g. route planning), perhaps with conditional evaluation (e.g. avoid motorways/highways, or I'm driving a caravan at a limit of 50mph), perhaps including returning a calculated value (e.g. distance/time taken, best route steps). They also include finding similar subgraphs, and various other graph-type queries.
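As a rough illustration of this kind of query, here is a small Python sketch of a shortest-route search (Dijkstra's algorithm) with a conditional "avoid this road type" filter. The road network, the `shortest_route` function and its `avoid` parameter are all made up for the example, and the edges are treated as one-way to keep it short.

```python
import heapq

# Hypothetical road network: node -> list of (neighbour, distance_km, road_type).
roads = {
    "A": [("B", 5, "motorway"), ("C", 4, "local")],
    "B": [("D", 5, "motorway")],
    "C": [("D", 9, "local")],
    "D": [],
}

def shortest_route(graph, start, goal, avoid=frozenset()):
    """Dijkstra's algorithm; edges whose road type is in `avoid` are skipped."""
    queue = [(0, start, [start])]  # (distance so far, node, route steps)
    best = {}                      # node -> shortest distance already settled
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path      # calculated value plus the route steps
        if node in best and best[node] <= dist:
            continue
        best[node] = dist
        for neighbour, length, road_type in graph[node]:
            if road_type in avoid:
                continue
            heapq.heappush(queue, (dist + length, neighbour, path + [neighbour]))
    return None                    # no route satisfies the condition

print(shortest_route(roads, "A", "D"))                      # (10, ['A', 'B', 'D'])
print(shortest_route(roads, "A", "D", avoid={"motorway"}))  # (13, ['A', 'C', 'D'])
```

Note how the query depends on the shape of the whole network rather than on a fixed number of hops.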
A triple store aims to return information matching a subject or subjects. E.g. "find me all people who know other people who are members of an organisation of type drug gang, and return their personal profile information". In this query the bounds of the network you are querying are known (person -> person -> organisation -> org type), and you are returning a set of information (all 'person' assertions). This is a triple query.
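A minimal Python sketch of such a bounded query over a set of triples is below. The data, the predicate names and the `match` helper are hypothetical; a real triple store would express this as a SPARQL pattern rather than nested loops.

```python
# Hypothetical triples; names and predicates are made up for illustration.
triples = {
    ("ann", "knows", "ben"),
    ("ben", "memberOf", "org1"),
    ("org1", "orgType", "drug gang"),
    ("cat", "knows", "dan"),
    ("dan", "memberOf", "org2"),
    ("org2", "orgType", "chess club"),
    ("ann", "age", "34"),
    ("ann", "city", "Leeds"),
    ("cat", "age", "29"),
}

def match(s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def profiles_of_people_who_know_members_of(org_type):
    """Fixed pattern: person -> person -> organisation -> org type."""
    profiles = {}
    for org, _, _ in match(p="orgType", o=org_type):
        for member, _, _ in match(p="memberOf", o=org):
            for person, _, _ in match(p="knows", o=member):
                # Return every assertion held about the matching person.
                profiles[person] = sorted((p, o) for _, p, o in match(s=person))
    return profiles
```

The number of hops is fixed by the pattern, so the engine never has to explore the graph beyond those three joins.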
Because of the nature of the above two query types you see different physical architectures. Neo4j and other graph stores adopt an 'all information on each node' approach, with multiple nodes (servers) being used to scale out query load. The other nodes each contain a 100% copy of the data.
A triple store on the other hand (pure plays, or hybrid NoSQL databases like MarkLogic and OrientDB) is architected to split data into partitions/shards across multiple servers. This allows linear scalability on commodity hardware, rather than a large amount of data requiring a large piece of tin. The downside of course is that if some of the data lies across multiple servers, you take a local network hit to complete a complex 'graph style' query.
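The trade-off can be sketched in a few lines of Python. This is a toy model only: placing triples by a hash of the subject, the shard count and the data are all assumptions for illustration, not a description of how MarkLogic or OrientDB assign data to servers.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 3  # hypothetical cluster size

def shard_for(subject):
    """Assumed placement rule: a stable hash of the subject picks the shard."""
    return zlib.crc32(subject.encode("utf-8")) % NUM_SHARDS

# Hypothetical triples, spread across the shards by subject.
triples = [
    ("ann", "knows", "ben"),
    ("ben", "memberOf", "org1"),
    ("org1", "orgType", "drug gang"),
]
shards = defaultdict(list)
for triple in triples:
    shards[shard_for(triple[0])].append(triple)

def shards_touched(start, predicates):
    """Follow a chain of predicates from `start`, recording each shard read."""
    touched = []
    frontier = {start}
    for predicate in predicates:
        next_frontier = set()
        for subject in frontier:
            shard = shard_for(subject)
            touched.append(shard)  # one read, possibly on another server
            for s, p, o in shards[shard]:
                if s == subject and p == predicate:
                    next_frontier.add(o)
        frontier = next_frontier
    return touched, frontier
```

Every hop may land on a different server, which is where the network hit for multi-hop 'graph style' queries comes from; a fully replicated store would answer all hops locally.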
This isn't to say that graph stores cannot store triples (they do) or that triple stores cannot carry out graph queries (they can, but you may have to construct them yourself) - it's just that they're built for different query types.
I have a query console example of graph queries across large datasets in MarkLogic's triple store, for example, that run in a few seconds rather than the usual milliseconds for 'normal' triple store queries.
There are also open standards around triple stores, led by the World Wide Web Consortium (W3C). These standards include RDF and SPARQL, and associated standards. Using open standards of course avoids vendor lock-in to one product. MarkLogic Server and AllegroGraph are both open standards compliant in this regard.
The downside of the W3C standards is that RDF has no concept of 'assertions about a relationship' - i.e. it does not allow storing of properties on the relationships themselves. Graph stores like Neo4j allow this. You can model this by having a relationship type of 'thing' in a triple store, but it isn't as nice a mental model to work in.
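To show what that workaround looks like, here is a small Python sketch comparing the two shapes. The field names, the `friendship1` identifier and the `since` property are made up for the example.

```python
# Property graph style: the edge itself carries properties.
edge = {"from": "alice", "type": "FRIEND", "to": "bob", "props": {"since": 2010}}

# Triple style: the relationship becomes a 'thing' with its own identifier,
# and its properties are ordinary triples about that thing.
triples = [
    ("friendship1", "type", "Friendship"),
    ("friendship1", "from", "alice"),
    ("friendship1", "to", "bob"),
    ("friendship1", "since", 2010),
]

def friendship_since(triples, a, b):
    """Find the Friendship 'thing' linking a to b and read its 'since' value."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {})[p] = o
    for props in by_subject.values():
        if (props.get("type") == "Friendship"
                and props.get("from") == a
                and props.get("to") == b):
            return props.get("since")
    return None
```

One edge with one property turns into four triples and an extra hop in every query, which is why it feels less natural to work with.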
Where you have both documents and triples, a hybrid NoSQL database that natively supports indexing and querying of both is useful. MarkLogic Server and OrientDB both provide this. MarkLogic Server allows you to execute a structural (has/doesn't have element), field (exact match), range (less than, greater than), geospatial (point within area, e.g. arbitrary polygon), bi-temporal (need more room to explain...) and semantic query in one hit against the same record. If you need to cover both, you may want to look there.
At the risk of plugging my own work, I have published two books on the subject - NoSQL For Dummies (retail, 400 page version), and State of NoSQL 2016 (Kindle only) - that give the background you need. I've also blogged on related subjects at https://adamfowler.org/blog/ . Hope this helps.