It really does show you a "different reality". It has different effects on different people, I assume. For me it was quite eye-opening, in that it made me stop smoking cigarettes, for example. But for others it leads to those strange thoughts of living in the Matrix or something
Sorry, we didn't make that clear on the website. In short, we build and generate a REST API for you, so you can focus on building the frontend side. Thanks, I appreciate you taking the time to write.
Well, first off: A graph database is typically considered a type of NoSQL database. Second off, a lot of graph databases use a SQL database such as PostgreSQL as the storage engine.
What really distinguishes graph databases is the query language. There are a lot of these out there: SPARQL (for RDF), Datalog, Cypher, and Gremlin. These are typically optimized for modeling, and making it easy to query against, data with a high degree of interconnectedness. So, taking an RDBMS as the baseline, and assuming that by NoSQL you meant something like a column or document store that offers poorer support for ad-hoc queries than a SQL database, a graph database would be moving in the opposite direction.
Sort of. There's technically not anything a graph database can do that can't be expressed in modern (i.e., since the early 2000s for most, or 2018 if MySQL is your jam) SQL. But sometimes it can take a fair bit of effort to do so. If you find yourself frequently getting lost in a quagmire of complex joins and recursive CTEs, a graph DB can be a real boon for the maintainability of your data layer.
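For a concrete sense of what "recursive CTEs" means here, a minimal sketch in Python with sqlite3 (the table and data are invented for illustration): finding everything reachable from a node, where the depth isn't known in advance.

```python
import sqlite3

# A toy edge table standing in for whatever your schema actually is.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edges (src TEXT, dst TEXT);
    INSERT INTO edges VALUES
        ('a', 'b'), ('b', 'c'), ('c', 'd'), ('b', 'e');
""")

# All nodes reachable from 'a' -- the recursion depth is unbounded,
# which is exactly where plain joins stop being enough.
rows = conn.execute("""
    WITH RECURSIVE reachable(node) AS (
        SELECT 'a'
        UNION
        SELECT e.dst FROM edges e JOIN reachable r ON e.src = r.node
    )
    SELECT node FROM reachable;
""").fetchall()

print(sorted(n for (n,) in rows))  # ['a', 'b', 'c', 'd', 'e']
```

This works fine at small scale; the "quagmire" starts when several of these stack up with joins across many tables.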
I'm not so sure that many graph databases use a relational database as the data store. Some use Linear Algebra representations of the graph. Some use key-value stores. Some are proprietary implementations that we'll never know exactly how the data is represented under the covers.
RecallGraph at first glance looks a bit like TerminusDB, which was recently featured on HN [0]. In TerminusDB, data is stored like code in git, and you can time travel and do branch, merge, squash, rollback, diff, blame, etc. But TerminusDB is a semantic graph database based on OWL schemas, which stores data as RDF, and querying delivers JSON-LD. I will certainly give RecallGraph a closer look.
A Graph in the computer science world is a type of data model. There are many problems that are easier to solve with that kind of data model. For example:
What is the shortest path between two locations on a map? (i.e., every time you ask Google for directions)
What is the single point of failure in this network?
How am I connected to pbg on LinkedIn?
How are these financial crimes connected?
Who is the biggest "influencer" in my Facebook network?
How do diseases spread?
How do forest fires spread?
I want to make a phone call, how does it get routed (with old telephone switch technology) across the country?
I have 10 rooms, 30 speakers, and 1000 attendees in my conference. How do I arrange the speakers and conference rooms for an optimal conference schedule?
I have a bunch of pilots who speak different languages and are qualified to fly a variety of aircraft; how do I maximize the number of planes in the air at any one time?
How do I send my garbage trucks out to collect the garbage and use the least amount of fuel?
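The first question in that list is the classic one. A toy version with networkx (node names and weights are invented for illustration), where the cheapest route is not the one with the fewest hops:

```python
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("home", "highway", 5),
    ("home", "backroad", 2),
    ("highway", "office", 1),
    ("backroad", "office", 9),
])

# Dijkstra picks home -> highway -> office (cost 6) over the
# seemingly shorter backroad route (cost 11).
path = nx.shortest_path(G, "home", "office", weight="weight")
print(path)  # ['home', 'highway', 'office']
```

Most of the other questions in the list reduce to similar standard graph algorithms (centrality, max-flow, graph coloring, matching).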
I get graphs and perhaps some graph algos (maximal flow, etc.), but I've never used a graph DB. Is this really how it works? Cos what you're describing sounds more like some kind of generic-optimiser-in-a-box.
I'd say it's more about query language, and what types of queries the DB is optimized for. Graph DBs come with query languages that let you directly ask questions like "starting from node $foo, select all nodes and edges that lead to node $bar, but only for paths consisting of edges with property $xyz > 42".
An example from the Neo4j documentation:
    MATCH p = (charlie:Person)-[* { blocked: false }]-(martin:Person)
    WHERE charlie.name = 'Charlie Sheen' AND martin.name = 'Martin Sheen'
    RETURN p
which, per documentation, "returns the paths between 'Charlie Sheen' and 'Martin Sheen' where all relationships have the blocked property set to false".
Graph DBs are designed for modelling your data as nodes with properties, connected by directed edges (which can also carry properties) to other nodes; they're also internally optimized for doing such queries.
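A rough networkx analogue of that Cypher query, for readers who don't know Cypher: filter to unblocked edges, then enumerate paths between the two people. The graph here is a tiny invented stand-in, not the actual Neo4j example dataset.

```python
import networkx as nx

G = nx.Graph()
G.add_edge("Charlie Sheen", "Apocalypse Now", blocked=False)
G.add_edge("Martin Sheen", "Apocalypse Now", blocked=False)
G.add_edge("Charlie Sheen", "Martin Sheen", blocked=True)

# Keep only edges whose 'blocked' property is False, mirroring
# the [* { blocked: false }] pattern in the Cypher query.
unblocked = nx.subgraph_view(
    G, filter_edge=lambda u, v: not G[u][v]["blocked"]
)
paths = list(nx.all_simple_paths(unblocked, "Charlie Sheen", "Martin Sheen"))
print(paths)  # only the path through 'Apocalypse Now' survives
```

The difference is that a graph DB evaluates this kind of pattern inside the storage engine, rather than after loading everything into application memory.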
What you're usually gaining with a graph database: a query language designed primarily for operating on graphs, and a design tuned to be very efficient at some subset of queries one might want to make against graphs. They're typically gonna be worse than other database types at other types of queries or data-fetching generally, and sometimes even for certain kinds of graph-focused queries, so watch your ass. Consider: if it were possible to tune on-disk and in-memory data structures for excellent performance with all query types, every database would do it. Graph databases make lots of trade-offs, typically, so make sure you mostly need to do the thing they're fast at.
If you don't have dense (many edges per node) and very large graphs and a need to do various things with them that could basically qualify as a form of path-finding, then you probably don't need/want a graph database.
If you need great performance at things graph databases are good at but also great performance at things PostgreSQL (or whatever) are good at, you can always run both and, say, use queries against your graph DB to inform what you fetch out of your SQL DB. This is less than ideal in a lot of ways (you now have a distributed system even if you didn't otherwise want/need one) but, especially if your graph DB data can be derived from your SQL DB so you don't have to worry too much about it getting screwed up, can make sense if you really need to do both things. I think this is how a lot of big players use them: recommendation engines and such querying against a graph DB, but e.g. invoices or inventory somewhere a little more general and robust.
[EDIT] to rebut a post downthread, just "my data model has lots of joins" is not a strong indication, per se, that you should use a graph DB. If your schema or some important and large (data-size wise) part of it consists of a couple tables and a lot of expensive recursive queries over those searching for things without much idea of how deep the recursion will go in advance, then you might want a graph DB.
[EDIT EDIT] Nb. graph DB companies may market to you that they are a suitable or even superior replacement for other types of database for most any purpose. Don't believe them. At all. They are trying to mislead you to make more sales (think: early MongoDB). Do your own research.
Very interesting. I work with TerminusDB and we've been thinking a lot about how to apply a revision control semantic graph db to ML tasks. The whole MLOps process is fragmented and we think a collaborative revision control (like git but for data) that allows all of the parts to work together (data engineer, data scientist, ML engineer) could be very useful.
I hadn't heard of either TerminusDB or MLOps, so thanks for sharing!
A "git for data" like you describe seems, intuitively (though it should be well defined), like a technology that would be very useful for many things: from safely versioning knowledge à la MediaWiki, to versioning business data in DBs and making it seamless for the whole human pipeline (data engineer, data scientist, ML engineer).
Actually, I have a startup idea that would require something similar yet different: I would need both version control for user data AND guaranteed immutability of what users have written. It would allow users to trust that the server cannot modify their data.
For such a use case, the first thing that comes to mind is a blockchain, but the technology feels too limiting.
The only offering I'm aware of as a general SQL DB is https://aws.amazon.com/qldb/
BTW, "git but for data" is an idea that has a lot of competing implementations; it would be nice for your landing page to explain what differentiates you from e.g. ->
https://news.ycombinator.com/item?id=22731928
Anyway, I wish you good luck with this fun and probably useful project!
I'd say... anytime you are about to shove the square peg in the round hole...
More concretely, observe the complexity of nested sets [1] ... So if you need to represent trees, hierarchies or, well, graphs in your data, it might be reasonable to use a graph database instead of a relational one.
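To make the nested-set point concrete, a minimal sketch in Python with sqlite3 (the tree and lft/rgt values are invented for illustration): each node stores left/right bounds, and "all descendants of X" becomes a range query.

```python
import sqlite3

# Tree: root -> (a -> a1, b), encoded as nested-set bounds.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tree (name TEXT, lft INT, rgt INT);
    INSERT INTO tree VALUES
        ('root', 1, 8),
        ('a',    2, 5),
        ('a1',   3, 4),
        ('b',    6, 7);
""")

# Descendants of 'a' are the nodes whose bounds nest inside a's (2, 5).
rows = conn.execute("""
    SELECT c.name FROM tree c, tree p
    WHERE p.name = 'a' AND c.lft > p.lft AND c.rgt < p.rgt
""").fetchall()
print([n for (n,) in rows])  # ['a1']
```

Reads are cheap, but inserting or moving a node means renumbering lft/rgt across much of the table, which is the complexity being pointed at.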
It's useful to separate graph-shaped problems from graph-db-shaped problems. From what we (Graphistry) see when working with folks here:
1. Graph-shaped, and generally fine without a graph DB:
* You / your app wants to run some graph algorithms, it fits in CPU/GPU memory, you have the data elsewhere, and it's easily stitched into a graph. We regularly do 1000-1B nodes/edges on one GPU node. SQL/CSV/Parquet/Splunk/Spark query -> node+edge table -> ... . Ex: Correlating user journeys, mapping host/network IT/security log activity, analyzing bots, ... .
* You want to visually explore ^^^^ as graphs/relationships/correlations (where we often come in for Graphistry)
Having to manage 2 systems of record for some data to get some algorithmic/usability benefits is terrible, so often I recommend your regular DB + on-the-fly graph compute like ^^^^ .
The upcoming security session of LearnRAPIDS.com will walk through some of this.
2. Graph search + graph enrichment, esp. on heterogeneous data or on > 1B nodes/edges.
2a. Graph query languages provide genericity not seen in normal SQL/NoSQL.
Ex: An analyst or an ML algorithm wants to get a 360 on all data associated with some value, maybe a couple hops out. There may be many types of data available. In SQL/NoSQL land, you need to know all the ways to pivot ahead of time (Users.id -> Customers.user_id --phone--> Calls.phone), and pray that the Join queries don't tank the system either as one-off queries or in throughput scenarios.
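A sketch of that "know the pivots ahead of time" problem, with sqlite3 in Python. The schemas and data are invented for illustration; the point is that each hop (Users.id -> Customers.user_id, then phone -> Calls.phone) has to be spelled out as an explicit join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INT, name TEXT);
    CREATE TABLE customers (user_id INT, phone TEXT);
    CREATE TABLE calls (phone TEXT, duration INT);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO customers VALUES (1, '555-0100');
    INSERT INTO calls VALUES ('555-0100', 42);
""")

# A two-hop "pivot" written out by hand; every new data type means
# another join path someone had to anticipate.
rows = conn.execute("""
    SELECT u.name, c.duration
    FROM users u
    JOIN customers cu ON cu.user_id = u.id
    JOIN calls c ON c.phone = cu.phone
    WHERE u.name = 'alice'
""").fetchall()
print(rows)  # [('alice', 42)]
```

In a graph DB, the analogous query is "everything within two hops of alice", with no join path enumerated in advance.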
2b. Graph DB impls can efficiently run certain search queries other DBs cannot. When your searches have extra fun patterns, like "between user A and user B, find all paths", and "Process A talks to Process B, which creates File C, which ...", this can be a big deal.
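The "between A and B, find all paths" pattern can be sketched with networkx on an invented process/file graph (names are made up for illustration):

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([
    ("proc_a", "proc_b"),
    ("proc_b", "file_c"),
    ("proc_a", "file_c"),
])

# Both the direct edge and the route through proc_b are found.
paths = sorted(nx.all_simple_paths(G, "proc_a", "file_c"))
print(paths)  # [['proc_a', 'file_c'], ['proc_a', 'proc_b', 'file_c']]
```

In-memory this is trivial; graph DBs matter when the same enumeration has to run over billions of edges with predicates on each hop.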
Growth in # of tables or # of rows both makes these more important.
* DB management can be good for auth & locked schema reasons even early on; part of why we did Neo4j early for ProjectDomino.org
* When working set sizes do start hitting say 100M or 1B, you may have a variety of queries where you don't want the overhead of starting from scratch for everything (#1), esp. in a multi-user/service arch.
* Likewise, when data grows to multi-node & write-heavy, you may want it always on. An ephemeral system can be good (no state!), but if writes are needed too and you don't want 2 systems, a graph DB may be a good fit.
We get involved in all 3 categories of graph projects, and I'm happy to help.
I figure what the author is talking about is often my problem. If I want to learn a new concept, I have to know the meta around that concept, to the point it distracts me from the actual concept I wanted to learn in the first place. It annoys me, and I don't know why it happens every single time.
I don’t think we need programmers. I think for this task we’d need neuroscientists, philosophers, psychologists, mathematicians and engineers. Also, I don’t think we need more developers working on that problem. We need the right people with the right motivation and cognitive abilities to solve the problem of Artificial General Intelligence.
There is the narrating self and the experiencing self. The narrating self makes up stories from the information that the experiencing self feeds to it. The important thing being that the experiencing self cannot judge, it just... experiences. It takes in every bit of sensory information and lets the narrating self make up the story it thinks matches the experience one goes through.
Now, imagine the sensory overload you get when stuck on YouTube, showing those reviews, unboxing videos, whatever to you - or more precisely, to your experiencing self. At some point your narrating self will make up the story and then you end up paying 15k for camera equipment... of course, it’s your choice, right?