Home > big data, nosql, sql > A Beginner’s Guide to NoSQL

A Beginner’s Guide to NoSQL

What is it going to be? MySQL, MS-SQL, Oracle or PostgreSQL? After all, nothing can be as amazing as a good old RDBMS that employs SQL to manage the data.

Well, allow me to introduce to you an entirely unique and unconventional Database model – NoSQL. Just like every other fine article out there, we too shall begin ith…eh….disclaimers!

NoSQL stands for not-only-SQL. The idea here is not to oppose SQL, but instead provide an alternative in terms of storage of data. Yet, for the obvious reason that most users are well versed with SQL, many NoSQL databases strive to provide an SQLlike query interface.

Why NoSQL?  

That’s a valid question, indeed. Well, here are the reasons:

  • Managing Large Chunks of Data: NoSQL databases can easily handle numerous read/write cycles, several users and amounts of data ranging in petabytes.
  • Schema? Nah, not needed: Most NoSQL databases are devoid of schema and  therefore very flexible. They provide great choices when it comes to constructing a schema and foster easy mapping of objects into them. Terms such as normalization and complex joins are, well, not needed!
  • Programmer-friendly: NoSQL databases provide simple APIs in every major programming language and therefore there    is no need for complex ORM frameworks. And just in case APIs are not available for a particular programming language, data can still be accessed over HTTP via a simple RESTful API, using XML and/or JSON.
  • Availability: Most distributed NoSQL databases provide easy replication of data and failure of one node does not affect the availability of data in a major way.
  • Scalability: NoSQL databases do not require a dedicated high performance server. Actually, they can easily be run on a cluster of commodity hardware and scaling out is just as simple as adding a new node.
  • Low Latency: Unless you are running a cluster of a trillion data servers (or something like that, give or take a few million  of           them),  NoSQL can help you achieve extremely low latency. Of course, latency in itself depends on the amount of data that can be successfully loaded into memory.

Triple stores save data in the form of subject-predicate-object with the predicate being the linking factor between subject and object. As such, Triple Scores too are variants of network databases. For instance, let’s say “Jonny Nitro reads Data Center Magazine.” In this case, Jonny Nitro is the subject, while Data Center Magazine is the object, and the term ‘reads’ acts as the predicate linking the subject with the object. Quite obviously, mapping such semantic queries into SQL will prove difficult, and therefore NoSQL offers a viable alternative. Some of the major implementations of Triple Stores are Sesame, Jena, Virtuoso, AllegroGraph, etc.

SQL ideology 

Basically, NoSQL drops the traditional SQL ideology in favor of CAP Theorem or Brewer’s Theorem, formulated by Eric Brewer in 2000. the theorem talks about three basic principles of Consistency, Availability and Partition Tolerance (abbreviated as CAP), adding that a distributed database can at the most satisfy only two of these. NoSQL databases implement the theorem by employing Eventual Consistency, which is a morerelaxed form of consistency that performs the task over a sufficient period of time. This in turn improves availability and scalability to a great extent.This paradigm is often termed as BASE  – implying Basically Available, Soft state, Eventual Consistency.

NoSQL Data Models

Some of the major and most prominent differentiations among NoSQL databases are as follows:

  1. Document Stores
  2. Hierarchical
  3. Network
  4. Column-oriented
  5. Object-oriented
  6. Key-value Stores
  7. Triple Stores

Document stores

Gone are the days when data organization used to be as minimal as simple rows and columns. Today, data is more often than not represented in the form of XML or JSON (we’re talking about the Web, basically). The reason for favoring XML or JSON     is because both of them are extremelyportable, compact and standardized. Bluntly put, it makes little sense to map XML or JSON documents into a relational model. Instead, a wiser decision would be to utilize the document stores already available. Why? Again, simply because NoSQL databases are schema-less, and there existsno predefined for an XML or JSON document and as a result, each document is independent of the other. The database can be employed in CRM, web-related data, real-time data, etc. Some of the most well known implementation models are MongoDB, CouchDB and RavenDB. In fact, MongoDB has been used by websites such as bit.ly and Sourceforge.

Hierarchical Databases 

These databases store data in the form of hierarchical relevance, that is, tree or parent-child relationship. In terms of relational models, this can be termed as 1:N relationship. Basically, geospatial databases can be used in a hierarchical form to store location information which is essentially hierarchical, though algorithms may vary. Geotagging and geolocation are in vogue of late. It is in such uses that a geospatial database becomes very relevant, and can be used  in Geographical Information System.     Major   examples of the same include PostGIS, Oracle Spatial, etc. Also, some of the most well known implementations of hierarchical databases are the Windows Registry by Microsoft and the IMS Database by IBM.

Graph Network Databases

Graph databases are the most popular form of network database that are used to store data that can be represented in the form of a Graph. Basically, data stored by graph databases can grow exponentially and thus, graph databases are  ideal for storing data that changes frequently. Cutting the theoretical part, graph database has perhaps the most awesome example in the likes of FlockDB, developed by Twitter to implement a graph of who follows whom. FlockDB uses the Gizzard Framework to query a database up to 10,000 times per second. A general technique to query a graph is to begin from an arbitrary or specified start node and follow it by traversing the graph in a depth-first or breadth-first fashion, as per the relationships that obey the given criterion. Most graph databases allow the developer to use simple APIs for accomplishing the task. For instance, you can make queries such as: “Does Jonny Nitro read Data Center Magazine?” Some of the most popular graph databases include, apart from FlockDB, HyperGraphDB and Neo4j.

Column-oriented Databases

Column-oriented databases came into existence after Google’s research paper on its BigTable distributed storage system, which is used internally along with the Google file system. Some of the popular implementations are Hadoop Hbase, Apache Cassandra, HyperTable, etc.

Such databases are implemented more like three-dimensional arrays, the first dimension being the row identifier, the second being a combination of column family plus column identifier and the third being the timestamp. Column-oriented databases are employed by Facebook, Reddit, Digg, etc.

Object-oriented Databases

Whether or not object-oriented databases are purely NoSQL databases is debatable, yet they are more often than not considered to be so because such databases too depart from traditional RDBMS based data models. Such databases allow the storage of data in the form of objects, thereby making it highly transparent. Some of the most popular ones include db4o, NEO, Versant, etc. Object-oriented databases are generally used in research purposes or web-scale production.

Key-value stores

Key-value stores are (arguably) based on Amazon’s Dynamo Research Paper and Distributed hash Tables. Such data models are extremely simplifiedand generally contain only one set of global key value pairs with each value having a unique key associated to it. The database, therefore, is highly scalable and  does not store data relationally. Some popular implementations include Project Voldemort (open-sourced by LinkedIn), Redis, Tokyo Cabinet, etc.

Triple stores

Triple stores save data in the form of subject-predicate-object with the predicate being the linking factor between subject and object. As such, Triple Scores too are variants of network databases. For instance, let’s say “Jonny Nitro reads Data Center Magazine.” In this case, Jonny Nitro is the subject, while Data Center Magazine is the object, and the term ‘reads’ acts as the predicate linking the subject with the object. Quite obviously, mapping such semantic queries into SQL will prove difficult, and therefore NoSQL offers a viable alternative. Some of the major implementations of Triple Stores are Sesame, Jena, Virtuoso, AllegroGraph, etc.

Summary

So, what now? Well, you’ve just been introduced to NoSQL. However, does this mean that you should make the switch to it from SQL? Perhaps. Or perhaps not. The answer varies from situation to situation. If you find SQL queries way too much to cope with, chances are you’ll find NoSQL equally difficult. However, if you’re looking for a more flexible      alternative and do not mind getting your hands dirty, you should definitely give NoSQL a spin! The choice, obviously, is yours! Happy data managing to you!

What NoSQL databases are present today?

  • Cassandra
    • Data Model: Columnfamily
    • Query API: Thrift
  • CouchDB
    • Data Model: Document
    • Query API: map/reduce views
  • HBase
    • Data Model: Columnfamily
    • Query API: Thrift, REST
  • MongoDB
    • Data Model: Document
    • Query API: Cursor
  • Neo4j
    • Data Model: Graph
    • Query API: Graph
  • Redis
    • Data Model: Collection
    • Query API: Collection
  • Riak
    • Data Model: Document
    • Query API: Nested hashes
  • Scalaris
    • Data Model: Key/value
    • Query API: get/put
  • Tokyo Cabinet
    • Data Model: Key/value
    • Query API: get/put
  • Voldemort
    • Data Model: Key/value
    • Query API: get/put
Advertisements
Categories: big data, nosql, sql Tags: ,
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: