A Student's Guide to Software Engineering Tools & Techniques

Introduction to NoSQL

Author(s): Ang Ze Yu

Reviewer(s): Neil Brian, James Pang, Daryl Tan, Yash Chowdhary

Basic knowledge of relational databases is assumed. If you are new to them, give the SQL article a read first!

What is NoSQL?

Non-Structured Query Language (NoSQL) is a broad family of query technologies used to store and retrieve data in a non-tabular format. Other common interpretations of the name include 'not only SQL', 'non-relational' and 'no SQL'.

For example, here is one such data item - a book in the catalogue of an e-commerce website, represented in JSON (JavaScript Object Notation), a commonly used data format that is simple and both human- and machine-readable.

{
  type: "book",
  price: 20,
  popularity: 9.7,
}
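
The snippet above uses the relaxed object-literal style found throughout this article; strict JSON additionally requires double-quoted keys and does not allow trailing commas. As a minimal sketch in plain JavaScript (the variable names are illustrative only), the same book can be converted to strict JSON text and back:

```javascript
// A book document as a plain JavaScript object (the name `book` is illustrative).
const book = { type: "book", price: 20, popularity: 9.7 };

// Serialize to strict JSON text: keys become double-quoted strings.
const jsonText = JSON.stringify(book);
console.log(jsonText); // {"type":"book","price":20,"popularity":9.7}

// Parse the text back into an object and read a field from it.
const parsed = JSON.parse(jsonText);
console.log(parsed.price); // 20
```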

To start, let's jump into the most common type of NoSQL database: the document database.
MongoDB is one commonly used example.

  • Other types of NoSQL databases, such as key-value databases and graph databases, can be viewed as extensions or reductions of document-based databases.
  • Some of these types can even be used together!
  • In contrast to tables and table entries in relational databases, document databases comprise multiple collections, which in turn consist of multiple documents.

    In a simplified e-commerce website for example, you may have the following collections:

    • customers - storing the account details of customers, their purchase histories, etc.
    • items - a collection of all items available for purchase (which are documents)
    • admin - a collection storing admin account details
    • ...

    In this case, the items collection which contains the catalogue of purchase items may be structured like so:

    [
      {
        type: "book",
        title: "about pandas",
        price: 20,
        popularity: 9.7,
        author: "panda1",
        ...
      },
      {
        type: "grocery",
        name: "cheese",
        brand: "panda",
      },
      ...
    ]
    

    Note that while many NoSQL databases provide a JSON interface to interact with the data, the underlying storage implementation may be different for performance reasons.
    For example, MongoDB stores documents in BSON, a binary-encoded serialization of JSON-like documents.

    Key Characteristics of NoSQL Databases

    1. Powerful and Simple CRUD Operations

    Most document databases are queried in a simple, intuitive, object-oriented manner using JSON-like queries.

    Let's get back to the above example of an e-commerce website. To register a new user account, an insert operation in MongoDB might look like so:

    db.customers.insertOne({
      username: "panda",
      password: hashedPassword,
      email: "panda@pandas.com"
    })
    

    As in relational databases, queries in NoSQL databases often support more specific and powerful variants, and can be even more succinct in some cases.

    For example, to find items with a price of less than 30 and a popularity of more than 8, sorted by price, you would make the following query in MongoDB. Note how the query intuitively mirrors the structure of a typical item in the items collection.

    db.items.find({
      price: {
        $lt: 30
      },
      popularity: {
        $gt: 8
      }
    }).sort({
      price: 1 // Here 1 means ascending order
    })
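
To make the semantics of that query concrete without a running database, here is a minimal sketch in plain JavaScript. The `items` array is a hypothetical in-memory stand-in for the collection, and the sample documents are invented for illustration; this is not how MongoDB evaluates queries internally.

```javascript
// Hypothetical in-memory stand-in for the items collection (sample data is invented).
const items = [
  { title: "about pandas", price: 20, popularity: 9.7 },
  { title: "bamboo cookbook", price: 45, popularity: 9.1 },
  { title: "panda stickers", price: 5, popularity: 8.4 },
  { title: "plain notebook", price: 10, popularity: 6.0 },
];

const matches = items
  // Same conditions as { price: { $lt: 30 }, popularity: { $gt: 8 } }.
  .filter((item) => item.price < 30 && item.popularity > 8)
  // Ascending by price, like sort({ price: 1 }).
  .sort((a, b) => a.price - b.price);

console.log(matches.map((item) => item.title));
```

Only the stickers and the panda book satisfy both conditions, and the cheaper item comes first.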
    

    2. Schema-Less Data

    Another key characteristic of most NoSQL databases is that they are schema-less. In document databases for example, this means that each individual document has no restriction on what keys it must have, the number of keys, the type of values and so on.

    Note that the second item in the items collection earlier intentionally omits several fields that the first item has.
    All purchase items, regardless of their types and their fields, can still be contained in a single collection. This is an example of how a schema-less architecture can greatly simplify the organisation of data.

    Documents can even contain other documents, arrays, and depending on the implementation, likely anything the database can serialize and deserialize.

    At the same time, NoSQL databases usually also provide some form of optional schema validation: a way to enforce some structure on data, and on the corresponding operations on that data.

    For example, in the customers collection, where the fields of a customer are unlikely to change, it can be especially helpful to enforce a strict schema on documents. This would prevent an unsuspecting programmer from, say, deleting a customer's password, which would be rather undesirable.

    // Example schema validation options in MongoDB
    $jsonSchema: {
      bsonType: "object",
      // Every customer document must contain these fields
      required: [ "username", "password", "email" ],
      properties: {
        ...
      }
    }
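
To illustrate what a list of required fields buys you, here is a minimal sketch in plain JavaScript. The helper `missingRequiredFields` is hypothetical and written only for this example; in MongoDB itself the check is performed by the database when a `$jsonSchema` is supplied as a collection's validator.

```javascript
// Hypothetical helper: returns the required fields that a document lacks.
function missingRequiredFields(doc, required) {
  return required.filter(
    (field) => !Object.prototype.hasOwnProperty.call(doc, field)
  );
}

const required = ["username", "password", "email"];

// Sample documents, invented for illustration.
const complete = { username: "panda", password: "<hashed>", email: "panda@pandas.com" };
const broken = { username: "panda", email: "panda@pandas.com" };

console.log(missingRequiredFields(complete, required)); // empty: document is acceptable
console.log(missingRequiredFields(broken, required));   // lists "password" as missing
```

A write that would leave a customer without a password can then be rejected before it reaches the collection.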
    

    3. Straightforward Expression of Relations in Data

    The world is full of relations. For example, a patient is related to her disease record, just as a customer is related to their shopping cart.

    Sometimes, the objects on both sides of the relation can contain substantial amounts of information, and may be impossible to store as a single field in one or the other document.

    Hence, simple relations such as one-to-one relations (where each item is related to only one other item) and one-to-many relations (where an item can be related to many other items, each of which is related back to only that one item) are often expressed in document databases as embedded documents, which the schema-less nature of NoSQL databases makes possible.

    For example, for a customer and their shopping cart, we may have the following:

    {
      username: "panda",
      cart: {
        totalPrice: 100,
        cartItems: [ ... ],
        discountCode: "panda"
      },
      email: "panda@pandas.com",
      ...
    }
    

    In the case of more complicated many-to-many relationships (where each item can be related to many other items, and these other items can in turn be related to many items on the other end), relations are commonly stored using references, to avoid duplication of data.

    For example, items in an e-commerce website are related to many customers through their carts. In these carts, it is much more space-efficient to store references to the items than the item documents themselves.

    In this example, the uniquely generated _id field for each item document in the items collection could be one such reference:

    {
      type: "book",
      price: 20,
      popularity: 9.7,
      _id: "9d1793bd491349n913847n93d"
    }
    

    In the user's cart, we would simply store these _id references, which are used to look up the item documents in the items collection later:

    cart: {
      totalPrice: 100,
      cartItems: [
        "9d1793bd491349n913847n93d",
        "9d1793bd491349n913847njh8",
      ],
      discountCode: "panda"
    }
    
    For this reason, many NoSQL database solutions (e.g. MongoDB) implement a unique id field for each document by default.
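
The lookup step can be sketched in plain JavaScript as follows. The `itemsById` map is a hypothetical in-memory stand-in for the items collection, and the document behind the second _id is invented for illustration.

```javascript
// Hypothetical stand-in for the items collection, indexed by _id.
// The second document is invented; the article only shows the first.
const itemsById = new Map([
  ["9d1793bd491349n913847n93d", { type: "book", price: 20, popularity: 9.7 }],
  ["9d1793bd491349n913847njh8", { type: "grocery", name: "cheese", brand: "panda" }],
]);

const cart = {
  totalPrice: 100,
  cartItems: ["9d1793bd491349n913847n93d", "9d1793bd491349n913847njh8"],
  discountCode: "panda",
};

// Replace each stored reference with the document it points to.
const resolvedItems = cart.cartItems.map((id) => itemsById.get(id));

console.log(resolvedItems.map((item) => item.type));
```

Each item document is stored once, however many carts refer to it. Against a real MongoDB instance, a query on `_id` using the `$in` operator plays a similar role.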

    Why NoSQL?


    1. Highly Suited for Iterated Development

    Although less mature than relational databases, NoSQL databases were designed to solve many of the emerging challenges in databases today.

    One of the most consequential impacts of NoSQL has been enabling faster iterated development. Given the highly flexible relational structure of NoSQL databases and the schema-less format of their documents, developers can adapt the database more quickly to changing customer and business requirements.

    In contrast, tables in relational databases require a predefined schema, which can be rather difficult to change later on without introducing side effects.

    2. Easy Horizontal Scaling

    Another key benefit of NoSQL databases is their ability to scale horizontally (distributing the workload across multiple servers) without giving up many of their key features.

    This is largely due to the schema-less architecture of such databases, which allows data to be split across multiple servers easily and efficiently.

    For example, take the following collection of items with a title:

    [
      {
        title: "Apple",
        ...
      },
      {
        title: "Orange",
        ...
      },
      ...
    ]
    

    Assuming we don't have relations from items to themselves inside these documents, we can split the collection across multiple servers, with each server holding a subset of the documents.

    As a result, the database access workload can be distributed evenly across multiple servers.

    As a business grows, it is crucial that its databases can scale to meet greater consumer and business demands.

    Vertical scaling (increasing the processing power of a single machine) can only go so far before that machine hits its limit.
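
One possible way to split such a collection is by key range. The sketch below, in plain JavaScript, assumes a made-up rule (titles starting with A to M go to one server, the rest to another) and uses arrays to stand in for servers; it illustrates the idea only and is not MongoDB's actual sharding mechanism. The extra sample titles are invented.

```javascript
// Hypothetical rule: titles starting with A-M go to shard 0, all others to shard 1.
function shardFor(title) {
  const first = title[0].toUpperCase();
  return first <= "M" ? 0 : 1;
}

// Sample documents; "Mango" and "Pear" are invented for illustration.
const items = [
  { title: "Apple" },
  { title: "Orange" },
  { title: "Mango" },
  { title: "Pear" },
];

// Each inner array stands in for one server's share of the collection.
const shards = [[], []];
for (const item of items) {
  shards[shardFor(item.title)].push(item);
}

console.log(shards.map((shard) => shard.map((item) => item.title)));
```

Because the rule depends only on the title, a lookup by title needs to contact just one of the servers.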

    3. Widespread Adoption


    Most popular database technologies, as ranked by db-engines.com

    While still trailing relational databases in popularity, NoSQL databases have been booming over the past couple of years (Amazon, for instance, uses a proprietary NoSQL database!), as their benefits become increasingly applicable to today's requirements.

    This bodes well for the maturity and development of this evolving technology, and your potential use cases for it.


    Caveats of NoSQL


    1. Lack of Standardisation

    From both a user and an implementation standpoint, NoSQL databases vary greatly from one solution to another. This can incur extra development costs when a project needs to migrate to another solution, or when new developers are introduced to the project.

    This is in stark contrast to relational databases, which mainly use SQL (Structured Query Language), whose syntax is mostly standardised across its different implementations (e.g. PostgreSQL, MySQL, etc.).

    2. Not Suited for Complex Relational Queries

    While NoSQL databases certainly allow for more flexibility in structuring relations, most complex queries (e.g. joins for many-to-many relations) usually involve structured data that can be easily represented in tabular formats.

    In such instances, queries are often more performant in SQL equivalents.


    How to Get Started with NoSQL?

    There are many NoSQL variants out there, as mentioned earlier. For starters, it may be wise to go with the most common solution, MongoDB.

    Setup

    You could follow the MongoDB documentation to set up a local instance of MongoDB.

    Thereafter, you should use a local mongo shell to get familiar with MongoDB syntax.
    Follow the shell's setup instructions to connect it to the MongoDB instance you configured earlier.

    Online playgrounds

    There are also many online playgrounds that allow you to experiment with MongoDB queries without setting up a local database instance and shell.

    However, to put what you've learnt into practice (building an application) later, I highly recommend getting your feet wet with the shell and a local MongoDB instance first, since you will need them when setting up your application drivers later on!

    Basics

    Here are some great resources on MongoDB:

    • Key Components of MongoDB Architecture: a quick refresher of key terminology in MongoDB.
    • Data-flair is a great starting point on the administration of your local MongoDB instance (creating databases, collections, etc.). It also provides a higher-level overview of each topic than the MongoDB documentation.
    • The MongoDB documentation can be overwhelming, but it is also a great starting point to learn and test features of MongoDB, and is the de facto reference for it.

    To guide you through your journey, here are the essentials that you should go through on the above sites in order.

    1. Basic database administration
    2. CRUD operations
    3. Data aggregation

    Practice

    After learning these core features and getting familiar with the syntax, you could try your hand at building a simple project to get a good feel for NoSQL in an actual backend.

    Depending on the backend language you are using, you should browse through the MongoDB driver documentation for the appropriate language, learn how to connect to your MongoDB instance from your application, and put the features you learnt above to use.

    Which driver should I try first?
    • Syntax for the different drivers will inevitably vary slightly from language to language. However, the core concepts stay the same.
    • If you want something familiar and you know Node.js, I highly recommend getting started with the Node.js driver, whose syntax is very close to the shell syntax.

    Advanced

    If you're interested in learning more about MongoDB, I recommend going through the following topics in order: indexes, schema validation, sharding (horizontal scaling) and replica sets (redundancy).

    Otherwise, you could check out some other popular NoSQL databases, which can even be complementary to MongoDB.

    • Redis - An in-memory key-value database used for caching purposes. Key-value databases are a simpler variant of document databases, where data is accessed through keys and stored in corresponding values, which can be of many formats.
    • Neo4j - A graph database. In such databases, data is represented by a graph: values are stored in the graph's nodes, while relations between these nodes are represented by the graph's edges.