
Moving from Oracle DB to Cassandra NoSQL: the right choice for up to 20 times better performance

How digital data is stored and managed is more crucial than ever for any type of business. Nevertheless, it is still not clear which technologies and tools best ensure that data is used and exploited to the fullest. In this article, we show the main differences between traditional relational DBs and NoSQL DBs, the advantages of the latter and, more specifically, the extraordinary performance of the non-relational database Apache Cassandra.

RELATIONAL AND NON-RELATIONAL DATABASES: DIFFERENCES

Let’s start by saying that the main difference between a traditional SQL DB, such as the one developed by Oracle, and a NoSQL DB, such as Apache Cassandra, lies in their technical structure and in the language used to create and execute queries.

A relational or SQL database (SQL stands for Structured Query Language, the language used to write queries) is built on the concept of a relation. Relations, or tables, are organized into cells grouped in rows and columns. The definition of the tables, their columns and the relationships between them is called the schema: in a relational DB, the schema must be defined before any data is added.
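
To make the schema-first rule concrete, here is a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in for a relational DB such as Oracle (the customers table and its columns are hypothetical). The table has to be declared before any row is inserted, and a row carrying an undeclared column is rejected.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for a relational DB.
conn = sqlite3.connect(":memory:")

# Step 1: the schema (table, columns, types, key) is declared up front.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)

# Step 2: only rows that fit the declared schema can be inserted.
conn.execute("INSERT INTO customers (id, name, city) VALUES (?, ?, ?)", (1, "Ada", "Turin"))

# A row with a column the schema does not declare is refused by the database.
try:
    conn.execute(
        "INSERT INTO customers (id, name, email) VALUES (?, ?, ?)",
        (2, "Bob", "bob@example.com"),
    )
except sqlite3.OperationalError as exc:
    print("rejected:", exc)
    rejected = True
else:
    rejected = False

rows = conn.execute("SELECT id, name, city FROM customers").fetchall()
print(rows)
```

Adding the email column would first require changing the schema (an ALTER TABLE), which is exactly the up-front structuring the relational model asks for.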

The term “NoSQL” (Not Only SQL), by contrast, indicates that this type of database does not necessarily use the SQL language, and therefore offers greater general freedom. The flow of operations is not constrained by a rigid, predefined schema.

Created to meet the need for flexibility, NoSQL DBs can have different technical structures; there are three main ones. A NoSQL DB can be document-oriented, representing data in object-like structures, or of the “key/value” type, with a structure made of dictionaries and maps: each value, which contains the information, is stored together with a key that is used to retrieve it quickly. A third structure is the graph, based on a network of nodes connected by arcs (edges).
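
The three structures can be pictured with plain Python data structures. This is only a conceptual sketch with hypothetical sample data, not the API of any particular NoSQL product.

```python
# 1. Key/value: an opaque value retrieved through a unique key.
kv_store = {
    "session:42": "user=ada;expires=2030-01-01",
}

# 2. Document: nested, object-like records; documents need not share fields.
documents = [
    {"_id": 1, "name": "Ada", "tags": ["admin"], "address": {"city": "Turin"}},
    {"_id": 2, "name": "Bob"},  # no tags and no address: still a valid document
]

# 3. Graph: nodes connected by arcs, stored here as an adjacency list.
graph = {
    "Ada": ["Bob", "Carla"],
    "Bob": ["Carla"],
    "Carla": [],
}


def neighbours_of_neighbours(g, start):
    """Nodes reachable from start in exactly two hops (a typical graph query)."""
    return {second for first in g[start] for second in g[first]}


print(kv_store["session:42"])
print([d.get("address", {}).get("city") for d in documents])
print(neighbours_of_neighbours(graph, "Ada"))
```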

SQL DB and NoSQL DB comparison

Source: Medium

There are also “hybrid” NoSQL DBs, such as Apache Cassandra.

Cassandra is a wide-column store and, as such, is essentially a hybrid between a key/value system and a tabular system. Its data model is a partitioned row store with tunable consistency. Consistency refers to how up to date and synchronized a row of a Cassandra table is across all of its replicas; tunable consistency extends this concept by letting the client choose, for each read or write operation, the consistency level it wants to achieve. Thanks to tunable consistency, you can choose a “strong” or an “eventual” consistency level according to your needs. This is very important in a distributed system, where multiple copies of the same data set reside on different physical servers. You can also mix the two types, for example using “strong” consistency within a local data center (where latency may be around 1 ms) and “eventual” consistency toward a remote data center (where latency could be 100 ms or more).
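
A common way to reason about tunable consistency is the overlap rule: if the number of replicas that must answer a read (R) plus the number that must acknowledge a write (W) is greater than the replication factor (RF), every read contacts at least one replica holding the latest write. The sketch below is a simplified, single-data-center model covering only the ONE, QUORUM and ALL levels; it computes the rule and does not talk to a real cluster.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replica acknowledgements a request must collect at a given level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # a strict majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level: {level}")


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """Strong consistency when read and write replica sets must overlap: R + W > RF."""
    r = replicas_required(read_level, rf)
    w = replicas_required(write_level, rf)
    return r + w > rf


RF = 3  # assumed replication factor for the example
for read_cl, write_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_cl, write_cl, is_strongly_consistent(read_cl, write_cl, RF))
```

ONE/ONE is the fast, eventually consistent choice; QUORUM/QUORUM and ONE/ALL both satisfy R + W > RF, trading some latency for strong consistency.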

Going back to the structure of Cassandra, rows are organized into tables. The first component of a table’s primary key is the partition key; within a partition, rows are grouped (clustered) by the remaining columns of the key. Other columns can be indexed separately from the primary key.
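
The role of the partition key and of the clustering columns can be sketched with a toy in-memory table. The readings table, its columns and the ToyTable class are hypothetical, and the CQL definition appears only as a comment: the partition key decides which group a row belongs to, and the clustering column keeps rows ordered inside that group.

```python
from collections import defaultdict

# Toy model of a table that, in CQL, could be declared as:
#   CREATE TABLE readings (sensor_id text, ts int, value double,
#                          PRIMARY KEY ((sensor_id), ts));
# sensor_id = partition key, ts = clustering column.


class ToyTable:
    def __init__(self):
        # partition key -> {clustering key -> row}
        self.partitions = defaultdict(dict)

    def insert(self, sensor_id, ts, value):
        # Writing the same full primary key again replaces the row (upsert).
        self.partitions[sensor_id][ts] = {"value": value}

    def select_range(self, sensor_id, ts_from, ts_to):
        # Only one partition is read; rows come back in clustering order.
        # (Cassandra keeps them sorted on disk; this toy sorts at read time.)
        part = self.partitions.get(sensor_id, {})
        return [(ts, part[ts]["value"]) for ts in sorted(part) if ts_from <= ts <= ts_to]


t = ToyTable()
t.insert("s1", 20, 21.5)
t.insert("s1", 10, 20.0)
t.insert("s2", 15, 18.0)
t.insert("s1", 10, 20.4)  # same (sensor_id, ts): replaces the previous value
print(t.select_range("s1", 0, 100))
```

Writing the same full primary key twice replaces the row rather than failing, which mirrors the upsert behaviour of Cassandra writes.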

Tables can be created, dropped, and altered at runtime without blocking updates or the execution of queries.

In Cassandra, denormalization is encouraged through features such as collections.

Denormalization optimizes the read performance of the DB by adding redundant copies of data or by grouping data together.

Denormalization is therefore used to overcome an inefficiency of relational systems: today, how quickly the DB answers a given query, and read performance in general, is often more important than how the data itself is organized.
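
The trade-off behind denormalization can be sketched in a few lines: a normalized layout stores the author name once and joins at read time, while a denormalized, query-oriented layout copies it into every row so that the read becomes a single lookup. The data and function names below are hypothetical.

```python
users = {1: {"name": "Ada"}}
posts = []          # normalized: one row per post, author referenced by id
posts_by_user = {}  # denormalized: rows grouped per user, author name copied in


def write_post(user_id, post_id, title):
    # The write path does the extra work: base data plus the query-specific copy.
    posts.append({"post_id": post_id, "user_id": user_id, "title": title})
    posts_by_user.setdefault(user_id, []).append(
        {"post_id": post_id, "title": title, "author_name": users[user_id]["name"]}
    )


def read_normalized(user_id):
    # Scan the posts and join each one with the users data.
    return [(p["title"], users[p["user_id"]]["name"]) for p in posts if p["user_id"] == user_id]


def read_denormalized(user_id):
    # A single lookup by key: no scan, no join.
    return [(r["title"], r["author_name"]) for r in posts_by_user.get(user_id, [])]


write_post(1, 100, "Hello")
write_post(1, 101, "Cassandra notes")
print(read_denormalized(1))
```

The price is paid on writes and updates: if the user changes name, every copied row has to be updated as well.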

A column family (called a “table”) resembles a table in a relational DB. Column families contain rows and columns. Each row is uniquely identified by a row key. Each row has multiple columns, each characterized by a name, a value, and a timestamp. Unlike a table in a relational database, different rows in the same column family do not necessarily share the same set of columns, and a column can be added to one or more rows at any time.

Each key in Cassandra maps to a value that is an object. The values of each key are columns, and the columns are grouped into column families; in this way, each key identifies a row with a variable number of elements. These column families can be considered tables. A table in Cassandra is a distributed multi-dimensional map indexed by a key. Furthermore, applications can specify the sort order of columns within a super column or within a simple column family.
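
The “distributed multi-dimensional map” can be pictured as a nested map: row key → column name → (value, timestamp). The sketch below is a single-process toy with hypothetical keys. When two writes hit the same cell, the one with the higher timestamp wins, which is how Cassandra reconciles concurrent updates (tie-breaking is simplified here).

```python
import time

# Toy column family: row key -> column name -> (value, write timestamp).
# Rows do not have to share the same set of columns.
column_family = {}


def put(row_key, column, value, timestamp=None):
    # Default timestamp in microseconds, similar in spirit to Cassandra write timestamps.
    ts = timestamp if timestamp is not None else time.time_ns() // 1000
    row = column_family.setdefault(row_key, {})
    current = row.get(column)
    # Last write wins: keep the cell with the highest timestamp
    # (on an exact tie this toy keeps the newer arrival).
    if current is None or ts >= current[1]:
        row[column] = (value, ts)


def get(row_key, column):
    cell = column_family.get(row_key, {}).get(column)
    return None if cell is None else cell[0]


put("user:ada", "email", "ada@example.com", timestamp=100)
put("user:ada", "email", "old@example.com", timestamp=50)  # older write: ignored
put("user:bob", "phone", "555-0100", timestamp=120)        # a different set of columns
print(get("user:ada", "email"), sorted(column_family["user:bob"]))
```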

Timeline table

Source: Slideshare

The second substantial difference between NoSQL and SQL databases lies in how data is stored and managed.

In a relational DB, data must be structured and organized before it is imported.

By contrast, in a NoSQL DB, data does not necessarily have to be structured before being imported. NoSQL DBs remove the strict constraints of the relational model: there are many ways to store and manage data. With a NoSQL DB you can build rich structures with an unlimited number of entries or, conversely, very simple but extremely effective solutions. The information is no longer stored in rows listed in tables, but in objects of different kinds that are not necessarily structured.

NoSQL DBs are easy to scale both vertically (like relational DBs) and horizontally: compared with relational DBs, non-relational ones make it easier to build clusters of several machines, ensuring a better and more even distribution of data and, at the same time, high reliability.
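
Horizontal scaling in Cassandra relies on spreading partitions across nodes through a token ring (consistent hashing). The toy ring below is an illustration only: it uses MD5 instead of Cassandra’s default Murmur3 partitioner, and the node names and number of virtual nodes per server are hypothetical. It shows that adding a node moves only a fraction of the keys, and only to the new node.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Stand-in hash for illustration; Cassandra's default partitioner is Murmur3.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, vnodes=8):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (token, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns several points (virtual nodes) on the ring.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (token(f"{node}#{i}"), node))

    def replicas(self, key, rf=3):
        """Walk clockwise from the key's token and collect rf distinct nodes."""
        rf = min(rf, len({n for _, n in self.ring}))
        tokens = [t for t, _ in self.ring]
        i = bisect.bisect_right(tokens, token(key)) % len(self.ring)
        chosen = []
        while len(chosen) < rf:
            node = self.ring[i][1]
            if node not in chosen:
                chosen.append(node)
            i = (i + 1) % len(self.ring)
        return chosen


ring = Ring(["A", "B", "C"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.replicas(k, rf=1)[0] for k in keys}
ring.add_node("D")  # scale out: no other node is reconfigured
after = {k: ring.replicas(k, rf=1)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to D: {all(after[k] == 'D' for k in moved)}")
```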

WHY MOVE FROM A SQL TO A NoSQL DB

Many of the most important and influential technologies on the market today, all useful for improving business, produce massive amounts of unstructured data: IoT sensors, social network data, photos, videos, location information, online activity, web applications, metrics and much more. Considering that many companies have built their fortunes by analyzing and understanding this type of data, it is hard to believe that a SQL DB alone is still enough today. NoSQL DBs were created precisely to meet the need for flexibility and scalability, features that SQL DBs lack.

The advantages of making this important change are many and substantial. Although migrating data to a NoSQL DB may seem, at first glance, a complex and expensive activity, today’s technologies make it possible to complete it quickly and in the simplest possible way. Above all, focus on the benefits you will get once you have migrated, in terms of speed and performance.

There is so much information circulating on the internet today, in so many forms and sizes, that having to structure every piece of data so that it can be imported into a relational DB hardly seems a realistic option.
Let’s look at some technical advantages of using a NoSQL database.

  • Simple computational operations: since they do not perform aggregations on data at query time, non-relational DBs avoid the associated computational load.
  • No schema: NoSQL databases are schemaless, so there is no need to define a schema in advance. You can enrich applications with new data and information, freely definable within the NoSQL DB, without risking data integrity. Non-relational databases, unlike SQL ones, are therefore suitable for quickly adding new data types and for storing semi-structured or unstructured data.
  • Scalability: data aggregation and the absence of a predefined schema make it possible to scale horizontally without difficulty and without operational risk.
  • Faster time to market: NoSQL databases guarantee speed of operation and analysis, so you can process data in the shortest time and act promptly on your business strategies.
  • Easy access: NoSQL databases offer another important advantage, especially for app developers: ease of access. Relational databases fit awkwardly with applications written in object-oriented languages such as Java, PHP, and Python. NoSQL databases often let you bypass this problem through APIs, executing queries without needing to know SQL.

NoSQL SUCCESS STORIES: BIG COMPANIES AND CASSANDRA

Giants like Facebook, Netflix and Twitter use NoSQL databases to manage their data. They made this choice to ensure fast execution even when processing many terabytes of data, horizontal scalability, high availability, and the ability to store large amounts of unstructured data without having to define a fixed schema.

Among the big companies that have moved to NoSQL (and specifically to Apache Cassandra) is Facebook, which originally developed this DB and used it to power search within its messaging inbox. Digg, a major social news site, has used Cassandra since 2010. Twitter, another famous social network, moved to Cassandra to meet the need to run operations across different clusters and servers. Another giant of our time, Netflix, uses Cassandra to better manage its subscribers’ data.

Specifically, Netflix said it chose Cassandra because this type of database can scale horizontally and dynamically by adding servers, without stopping services. The absence of scalability limits and of architectural limits on data size, row and column counts, etc., together with excellent performance, especially in write throughput, led Netflix to choose this technology for its business.

According to Netflix, some of Cassandra’s most attractive features are its flexible consistency and replication models. Applications can choose, per call, the consistency level to use for reads and writes (one replica, a quorum, or all replicas). This, together with the configurable replication factor and the special support for choosing which cluster nodes act as replicas, makes Cassandra particularly suitable for cross-datacenter and cross-region deployments. In fact, a single global Cassandra cluster can simultaneously serve applications and asynchronously replicate data across multiple geographic locations. According to Netflix, Cassandra is thus the best technology for cross-regional distribution and resizing without single points of failure.
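
The per-call choice of consistency matters most across data centers. The sketch below assumes two hypothetical data centers with three replicas each and checks whether a write at a given level can complete with acknowledgements from the local data center alone. This is why a level such as LOCAL_QUORUM keeps latency low, while QUORUM and EACH_QUORUM have to wait on the remote site.

```python
def quorum(n: int) -> int:
    return n // 2 + 1


def must_wait_for_remote(level: str, replication: dict, local_dc: str) -> bool:
    """True if a write cannot complete with acknowledgements from local_dc alone."""
    local = replication[local_dc]
    total = sum(replication.values())
    if level in ("ONE", "LOCAL_QUORUM"):
        return False  # satisfied by (a majority of) local replicas
    if level == "QUORUM":
        return quorum(total) > local  # majority of ALL replicas, across DCs
    if level in ("EACH_QUORUM", "ALL"):
        return len(replication) > 1  # needs replicas in every DC
    raise ValueError(f"unsupported level: {level}")


# Hypothetical per-DC replication, as a multi-DC keyspace might be configured.
replication = {"eu-dc": 3, "us-dc": 3}
for level in ["ONE", "LOCAL_QUORUM", "QUORUM", "EACH_QUORUM", "ALL"]:
    print(level, must_wait_for_remote(level, replication, "eu-dc"))
```

With six replicas in total, QUORUM needs four acknowledgements, more than the three available locally, so it always crosses the link between data centers.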

PERFORMANCE TESTS ON CASSANDRA NoSQL

Isaac chose to build its own platform on the NoSQL database Apache Cassandra after an in-depth survey of the best non-relational technologies on the market today. The choice was motivated by this database’s unique characteristics, including:

1. High availability: Cassandra has no single point of failure, and all nodes in the cluster are identical. With a replication factor greater than one, losing a single node does not prevent read or write operations.
2. Write scalability: with its “multi-master” model, Cassandra can accept writes on any node. If the workload increases, you simply add a node (without service interruption) to spread the load.
3. Query language support: unlike many other NoSQL databases, Cassandra supports CQL, a language very similar to SQL, which makes data analysis easier.
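
The high-availability point can be checked with simple arithmetic. In this sketch (hypothetical node names, a replication factor of 3, and only the ONE, QUORUM and ALL levels), a request on a partition succeeds if enough of its replicas are still alive.

```python
def quorum(n: int) -> int:
    return n // 2 + 1


def can_serve(level: str, replicas: list, down: set) -> bool:
    """True if enough replicas of the partition are alive for the requested level."""
    live = [r for r in replicas if r not in down]
    needed = {"ONE": 1, "QUORUM": quorum(len(replicas)), "ALL": len(replicas)}[level]
    return len(live) >= needed


replicas = ["node1", "node2", "node3"]  # the 3 replicas of one partition (RF = 3)
for down in [set(), {"node2"}, {"node2", "node3"}]:
    print(sorted(down), {cl: can_serve(cl, replicas, down) for cl in ["ONE", "QUORUM", "ALL"]})
```

With one node down, ONE and QUORUM requests still succeed and only ALL fails; with two nodes down, only ONE remains available.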

More details are available in analysis reports on the performance differences between Cassandra, Couchbase, HBase and MongoDB.