hierarchical Archives - Lets Code Them Up!

Most applications are built by layering one model over another. Data model at each layer abstracts the complexity of layers below it. This abstraction allows different group of engineers at different layers to work together effectively.

The best known model is SQL based on relational model. Relational model were introduced first in 1970s. Their adoption increased in 80s and 90s. Since then they have been dominant in the software world. Document models and hierarchical models came but were not able to replace the relational model.

Next came NoSQL (Not only SQL) databases came into picture. They have gained popularity because –

They scale well
dynamic schema rather than restrictive relational schema
support wide variety of operations
open source

Even then as different applications have different requirements, both SQL and NoSQL databases/data stores could be used alongside. (polyglot persistence).

The Object-Relational Mismatch

Impedance mismatch is the difference between the objects in the object oriented programming languages and the database tables.

When it comes to SQL, it had high impedance mismatch. There is need for layer to translate between the application layers and storage layers, adding overhead to application development.

When it comes to NoSQL, it is said to have low impedance mismatch when compared to SQL.

Consider the example of linkedin profile. It has many sections like Education, Location, Past experiences, Summary, languages, skills, etc. When storing this information in SQL, there will be multiple tables connected with each other through foreign keys. There could be one table to store users, another to store their location, another to store their education and so on. The application developer would have to code the complex queries/joins to store/retrieve data from database, with complex logic to convert application model to database model and vice versa.

While in document based model, the entire resume could be store as JSON structure. Developers would only need to take care of encoding/decoding that json. No need for joins/complex queries here.

Many-to-One and Many-to-Many relationships

There are fields in databases which could either be given a plain text value or could be given a ID. An ID will be internal to the database, wherever that value is needed we could use that ID. This method avoids duplication. If the value changes, there is no need to change the corresponding data in different places across tables. This is called normalization and often used in relational databases.

Thinking of example, if we want to store states where people live given in the resume, we could either store the states with their original human given names like California, Michigan, etc. Or we can store their abbriviations, like CA, MI, etc. If we use the abbreviations then we would need to store those somewhere in database and point to them through our users table. This is creating a many to one relationships. There could be many people living in one state.

The above kind of normalization is possible in relational databases as they support joins. However, not possible in document databases as there is no support for joins. So developer would need to emulate joins through application code. Thus transferring logic to application side.

Even if at beginning, document based model is used as there is small amount of join duplication logic needed, with the addition of new features there could be more complex relationships added between models and thus increasing load on application developer.

Are Document Databases Repeating History?

The problem of joins and many-to-many relationships was present when the databases came into existence (eg. hierarchical model used by first database IMS). And at that time, relational models and network models were suggested as solutions.

Network model was like hierarchical model, tree structure, only difference was that one node can have multiple parent. The links between the records were like pointers in programming language. To find a record, one would need to go through all the records one by one through these pointers, called access path. The main drawback was that as one record has multiple parents, programmers would need to keep track of all the multiple paths. This made querying and updating data inflexible.

Relational model on the other hand, stores data as a relation or table. No complex tree structure, no access paths, simple queries to get all data or some data based on some conditions. Query optimizer automatically decides which part of query to execute first and which indexes to use. Once a query optimizer is written for a database, multiple application developers can then make use of it. If it’s not written, then also manual queries were easy to write.

Which data model leads to simpler application code?

It depends upon the data needed for the application. If there are many-to-many relationships needed, joins needed, then best to use relational model as that would reduce the application code. If the data is like a document, less nested structure, then best to use document based models.

Schema flexibility

Document databases are schema-on-read, while relational databases are schema-on-write. Schema-on-read means that the database as such don’t enforce any schema, but upon reading, the application code assumes a structure while reading the data from the database. Because schema is enforced at time of reading, document databases provide more flexibility.

Because there is no schema enforced at the time of writing, any changes needed for any field in document database is simple, one just needs to start writing the new fields and let application code handle the old field if the document has it. On the other hand, for relational database, one would need to migrate the data in entire table if one needs to change a field.

Data locality for queries

Suppose you need to render the entire document on a webpage, then it’s best to use document database as they provide the whole document upon querying. In case of relational databases, one would need to do complex queries and joins to get the same data, which is lot of useless operations.

However if only a part of document is needed, then using document database is useless as they provide the whole document on querying. So it depends on the usage as well, whether one wants to use which type of database.

Most relational databases offer locality driven features, example google’s spanner, oracle’s multi table cluster tables, etc.

Convergence of document and relational databases

Most relational database like PostgreSQL, MtSql support XML/JSON, querying inside XML/JSON documents, etc, which allows same support as while using document based databases. Similarly there are document databases like MongoDB that provide joins. It seems like both relational and document models are becoming similar, which is good as it provides more flexibility.

Thanks for stopping by! Hope this gives you a brief overview in to data models. Eager to hear your thoughts and chat, please leave comments below and we can discuss.

Lets Code Them Up!

Tag: hierarchical

DDIA | Chapter 2 | Data Models and Query Languages | Relational vs Document Model