SQL Vs No-Sql
--
· Pointers to choose Sql vs No-Sql
∘ Data structure
∘ Ability to query data
∘ Scaling
∘ The convergence of SQL and NoSQL
∘ Database options
∘ Conclusion
· WHEN TO PICK A NoSQL DATABASE
· SQL databases: Pros and cons
∘ Pros
∘ Cons
· Examples of SQL databases
· NoSQL databases: Pros and cons
∘ Pros
∘ Cons
· Examples of NoSQL databases
· SQL vs. NoSQL: When to use each
· When to use SQL
· When to use NoSQL
· Conclusion and next steps
Follow this link for all System design articles
Pointers to choose Sql vs No-Sql
Structure of the data
The first and primary factor in making the SQL vs. NoSQL decision is what your data looks like.
If your data is primarily structured, a SQL database is likely the right choice.
A SQL database is a great fit for transaction-oriented systems such as customer relationship management tools, accounting software, and e-commerce platforms. Each row in a SQL database is a distinct entity (e.g. a customer), and each column is an attribute that describes that entity (e.g. address, job title, item purchased, etc.). Because of these distinct, structured relationships between rows and columns in a table.
SQL databases are best when you need ACID compliance. ACID stands for:
- Atomicity — each transaction either succeeds completely or is fully rolled back.
- Consistency — data written to a database must be valid according to all defined rules.
- Isolation — When transactions are run concurrently, they do not contend with each other, and act as if they were being run sequentially.
- Durability — Once a transaction has been committed to the database, it is considered permanent, even in the event of a system failure.
ACID compliance protects the integrity of your data by defining exactly what a transaction is and how it interacts with your database. It avoids database tables from becoming out-of-sync, which is super important for financial transactions.
ACID compliance guarantees validity of transactions even in the face of errors, technology failures, disastrous events, and more.
If your data is very structured and ACID compliance is a must, SQL is a great choice.
On the other hand, if your data requirements aren’t clear or if your data is unstructured, NoSQL may be your best bet.
Data is unstructured —
The data you store in a NoSQL database does not need a predefined schema like you do for a SQL database. Rather, the data can be column stores, document-oriented, graph-based, or key-value pairs.
This provides much more flexibility and less upfront planning when managing your database.
With NoSQL, you can:
- Create documents without carefully defining their structure upfront
- Add fields to your database without changing the fields of existing documents
- Store documents that have their own unique structure
- Have multiple databases with different structures and syntax
A NoSQL database is a much better fit to store data like article content, social media posts, sensor data, and other types of unstructured data that won’t fit neatly into a table. NoSQL databases were built with flexibility and scalability in mind, and follows the BASE consistency model/CAP theorem which means:
- Basic Availability
→ The database appears to work most of the time — This means that while the database guarantees the availability of the data, the database may fail to obtain the requested data or the data may be in a changing or inconsistent state. - Soft state
→ Stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time — The state of the database can be changing over time. - Eventual consistency
→ Stores exhibit consistency at some later point (e.g., lazily at read time) — The database will eventually become consistent, and data will propagate everywhere at some point in the future.
The BASE model was built for maximum flexibility. But there actually are some NoSQL databases that are ACID compliant. You can read more about this in the “The convergence of SQL and NoSQL” section.
The structure of your data is the most important factor in deciding whether to use a SQL or NoSQL database, so put a lot of thought into this before making a decision.
Ability to query data
The next factor to consider is
- how often you’ll query your data,
- how quickly you need to run queries, and
- who will be responsible for running these queries.
Because your data is nicely structured and organized, it is very efficient to query your data with a SQL database.
SQL is a popular programming language that has been around for over 45 years, so it’s extremely mature and well-known. It efficiently executes queries and retrieves and edits data quickly. It’s very lightweight and declarative, and thus is easy to learn. Therefore, queries can be run by less technical staff like business analysts and marketers.
A NoSQL database provides a ton of flexibility in the types of data that you can store, but because of the potentially large differences in data structures, querying isn’t as efficient as with a SQL database.
When NoSQL database technology was being built, developers focused on scalability and flexibility, not query efficiency (not always true though)
So in order to run NoSQL queries, you will have to perform extra processing on the data. Depending on the NoSQL database you’re using, you may have to implement some level of MapReduce. Many developers build querying functionality into the application layer, instead of worrying about it in the database layer. There have been some attempts at standardizing NoSQL querying, such as XQuery or JSONiq, but these tools haven’t been widely adopted.
Querying NoSQL databases typically requires developers or data scientists, which will be more costly and less efficient.
How often will you query your data, and who will run these queries? The answers to these questions will impact your SQL or NoSQL decision.
Scaling
SQL and NoSQL databases scale differently, so you’ll have to think about how your data set will grow in the future.
SQL databases scale vertically, meaning you’ll need to increase the capacity of a single server (increasing CPU, RAM, or SSD) to scale your database. SQL databases were designed to run on a single server to maintain the integrity of the data, so they’re not easy to scale. If vertical scaling is not possible, we have to scale SQL database horizontally, which then will need additional efforts to persist, query, maintain data.
NoSQL databases scale horizontally, meaning you can add more servers to power your growing database. This is a huge advantage that NoSQL has over SQL.
The ability of NoSQL databases to horizontally scale has to do with the lack of structure of the data. Because NoSQL requires much less structure than SQL, each stored object is pretty much self-contained and independent. Thus objects can be easily stored on multiple servers without having to be linked. This is not the case for SQL, where each table row and column needs to be related.
An analogy for vertical and horizontal scaling is a wedding cake. With SQL, you can feed more people by adding more layers to the wedding cake. With NoSQL, you can just make a bunch of wedding cupcakes. Yum.
As your business grows, so will your database. So make sure you consider your scaling needs.
The convergence of SQL and NoSQL
Both SQL and NoSQL databases have their pros and cons. As such, there has been a movement to take the best characteristics of both types of databases and integrate them so users can realize the best of both worlds.
For instance, MySQL, the most popular open-source relational database, offers MySQL Document Store. This provides the structure of a MySQL database combined with the flexibility and high availability of NoSQL without having to implement a separate NoSQL database.
MongoDB, one of the most popular NoSQL databases, offers multi-document ACID transactions.
AWS’ managed NoSQL database, DynamoDB, also provides ACID-compliant transaction functionality.
And with the easy database setup that cloud service providers offer, you have the ability to use both SQL and NoSQL databases in your cloud data architecture to meet your data storage needs.
Now you have much more flexibility regardless of whether you choose a SQL or NoSQL database, and there are sure to be more flexible options in the future.
Database options
Regardless of whether you go with a SQL or NoSQL database (or both!), there are plenty of options to choose from.
On-premise SQL database offerings include:
- MySQL — as mentioned prior, the most popular open-source relational database
- Microsoft SQL server — Microsoft’s enterprise version of SQL
- PostgreSQL — and enterprise-level, open-source database focused on extensibility
- Oracle — full-service (and expensive) SQL option
- MariaDB — an enhanced version of MySQL, built by MySQL’s original developers
- And many more
The major cloud service platforms have their own SQL options:
- AWS has:
→ RDS, their standard cloud SQL database
→ Aurora, which focuses on increased throughput and scalability - Microsoft Azure has:
→ Azure SQL Database, their managed database-as-a-service
→ Azure Database for MySQL, PostgreSQL, and MariaDB - Google Cloud Platform (GCP) has:
→ Cloud SQL, which you can use for MySQL and PostgreSQL
→ Cloud Spanner, which combines elements of SQL and NoSQL
On-premise NoSQL database options include:
- MongoDB — by far the most popular NoSQL database
- Redis — an open source, distributed, in-memory key-value database that is super fast
- Cassandra — free, open-source NoSQL database created by Facebook that focuses on scalability and high availability
- Many others
Cloud service providers offer plenty of NoSQL options as well:
- AWS has:
→ DynamoDB, its managed NoSQL database
→ DocumentDB, a fast, scalable, highly-available MongoDB-compatible database - Microsoft Azure offers:
→ CosmosDB, its globally distributed, multi-model database - Google Cloud has:
→ Bigtable, its NoSQL wide-column database service
→ Cloud Datastore, its NoSQL document database service
→ Cloud Firestore, a cloud-native NoSQL document database that helps store and query app data
There is no shortage of database options to choose from!
Conclusion
There are plenty of decisions to be made when thinking about your cloud data storage. One of the most important decisions is whether to go with a SQL or NoSQL database as your primary database, and whether you may need both to meet your needs.
You’ll need to think about what your data looks like, how you’ll query your data, and the scalability you’ll need in the future.
SQL databases provide great benefits for transactional data whose structure doesn’t change frequently (or at all) and where data integrity is paramount. It’s also best for fast analytical queries.
NoSQL databases provide much more flexibility and scalability, which lends itself to rapid development and iteration.
WHEN TO PICK A NoSQL DATABASE
Here are a few reasons why you’d want to pick a NoSQL database:
- Handling A Large Number Of Read Write Operations
→ Look towards NoSQL databases when you need to scale fast. NoSQL databases fit best in these scenarios. Since they have the ability to add nodes on the fly, they can handle more concurrent traffic and large amounts of data with minimal latency. - Running data analytics
→ NoSQL databases also fit best for data analytics use cases, where we have to deal with an influx of massive amounts of data. - Popular NoSQL databases:
→ MongoDB, Redis, Cassandra, HBASE - data sets are extremely large
- paradoxically the things you are tracking are quite small (like player stats)
- much data will never be queried or referenced
- you need nested data
- you need extremely fast in memory data
- you want globally distributed data
- your schema is flexible and changing
- you want to run different versions of your application against different evolving schema in the same database(s)
- you need 1000+ databases
- you have simple data requirements (key/value)
- Your application requires super-low latency.
- You only need to serialize and deserialize data (JSON, XML, YAML, etc.).
SQL databases: Pros and cons
Pros
- Reduced data storage footprint due to normalization and other optimization opportunities. Often results in better performance and more efficient use of resources.
- Strong and well-understood data integrity semantics through ACID (Atomicity, Consistency, Isolation, Durability).
- Standard access to data via SQL.
- Generally more flexible query support capable of handling a broader range of workloads. SQL abstracts over the underlying implementation and allows the engine to optimize queries to fit their on-disk representation.
Cons
- Rigid data models that require careful up-front design to ensure adequate performance and resist evolution — changing a schema will often include downtime
- Scaling horizontally is challenging — either completely unsupported, supported in an ad-hoc way, or only supported on relatively immature technologies
- Non-distributed engines are generally a “single point of failure” that must be mitigated by replication and failover techniques; no illusion of infinite scalability
Examples of SQL databases
NoSQL databases: Pros and cons
Summarizing the pros and cons of NoSQL is challenging for just this reason. The space has been well-explored and the range of options available is enormous. Some general pros and cons — which may not all apply to all NoSQL stores — include the following:
Pros
- Scalable and highly available — many NoSQL databases are generally designed to support seamless, online horizontal scalability without significant single points of failure.
- Flexible data models — most non-relational systems do not require developers to make up-front commitments to data models; what schemas do exist can often be changed on the fly.
- High performance — by limiting the range of what the database can do (for example, by relaxing durability guarantees) many NoSQL systems are able to achieve extremely high levels of performance.
- High-level data abstractions — moving beyond the “value in a cell” data model, NoSQL systems can provide high-level APIs for powerful data structures. Redis, for example, includes a native-sorted set abstraction
Cons
- Vague interpretations of ACID constraints — despite widespread claims of ACID support for NoSQL systems, the interpretation of ACID is often made so broad that not much can be gleaned about the semantics of the database in question. For example, what does “isolation” mean without transactions?
- Distributed systems have distributed systems problems. While not unique to NoSQL systems, it’s the norm, rather than the exception, for developers programming against NoSQL to deeply understand, e.g., CAP Theorem and its interpretation by the database in question.
- Lack of flexibility in access patterns — the relational/SQL abstraction gives the database engine broad powers to optimize queries for the underlying data; without that abstraction, the on-disk representation of data leaks in to the application’s queries and leaves no room for the engine to optimize.
Examples of NoSQL databases
SQL vs. NoSQL: When to use each
When to use SQL
When you have relational data, this is the natural fit, of course. But you may be asking yourself how you identify the “mythical” natural fit. Well, when you look at your data, do you see distinct entities with well-defined relationships with one another that must be strictly enforced and/or navigable? If so, we have match!
When your focus is on data integrity, relying on a tried and true relational databases is a good bet. When you want flexible access to your data, the relational model and SQL allow for much greater support of ad-hoc queries. Additionally, databases like PostgreSQL have added excellent support for NoSQL-style workloads with features like native JSON data types. If you don’t need the scale-out capabilities of NoSQL data stores, they can be a good fit for some non-relational workloads as well. This makes them a great Swiss army knife when you have some relational data and some unstructured data, but don’t want to buy the complexity of working with different types of data stores.
While many people look toward NoSQL for simplicity, it’s important to understand the implications of those data stores when building your application. While it is true that they are easy to get started with, it is critical to understand the implications of write consistency (or lack thereof), eventual consistency, and impacts of sharding on how you plan to access the data in the future. Relational databases can be simpler to build a reliable application on as they free you from worrying about such concerns.
When to use NoSQL
NoSQL is appealing when you have highly flexible data models or very specific needs that don’t fit into the relational model. If you are taking in a lot of unstructured data, a document database like MongoDB or CouchDB can be a nice fit. If you need very fast access to key-value data but can live without strong integrity guarantees, Redis is a great fit. Complex or flexible search across a lot of data? Elasticsearch is a great fit.
NoSQL data stores tend to be highly scalable, and scaling out is a core tenet of many of these systems. Built-in sharding makes scaling reads and writes out much easier than doing so with a relational database. Relatedly, NoSQL systems can often meet very high availability requirements. Databases like Cassandra have no single points of failure and your applications can trivially react to underlying failures of individual members.
Conclusion and next steps
Choosing or recommending a database is a nontrivial exercise, even for database experts. The SQL vs. NoSQL division is a useful rubric for helping inform that decision, but ultimately, there’s no substitute for thinking hard about the data needs of your application and the tradeoffs you’re willing to accept to achieve performance or uptime goals.
If nothing else, the fact that NoSQL exists is a boon for systems — it’s an excuse to explore the design space and find the sweet spots that solve real-world application problems. Even so, in 2020, there are innumerable reasons to keep choosing SQL.