Why Postgres

backend
database
software
tools
technology
software-architecture

SQL is the backbone of any database system. However there are many variants of SQL. This decision post contains the reasons for using PostgreSQL, which is a powerful and feature-full variant of SQL.

Author

Kristiane Beicher

Published

February 26, 2024

Context and Problem Statement

Building databases are best done through by using formal database systems. Most scientific research makes use of or uses databases that are relational rather than unstructured, and we believe that the user base for Seedcase will likely want or be familiar with a relational database structure. The type of database systems that are relational are called Structured Query Language (SQL). There are a large number of different SQL variants available, so we need to decide which one to use.

We’re only look at the top three open source relational databases (as defined by DB-Engines in November 2022): MySQL, PostgreSQL, and SQLite.

As we are planning to use container technology to run the database it is not as important which operating systems the database will run on. Having said that, of the three systems that we are looking at, MySQL and PostgreSQL will run on multiple operating systems (e.g. Linux, Mac OS, and Windows), and SQLite is a classic serverless application.

A side-by-side comparison on DB-Engines was used to compile some of the comparison below.

Decision Drivers

One of the most important functions of Seedcase is to handle data, and the most efficient and flexible ways of doing this is to store it in a database. We need a SQL database system that fits our needs best.

Considered Options

MySQL

MySQL was first released in 1995 and is maintained by Oracle Corp. It is an open source platform with the option to deploy either as a local server solution or cloud based. The implementation languages are C and C++, and it runs of a variety of operating systems. The system allows access through standard technologies (ADO.NET, JDBC, ODBC, and native APIs).

Benefits

  • At present the second most popular database both open source and overall with good support and a large community.

  • Traditional database system with a recognisable format which should be easy to manipulate and work with for the advanced Seedcase user.

  • Support for both XML and JSON formats, both reading and writing.

  • There are a number of ways for MySQL to interact with Apache Parquet files.

Drawbacks

  • MySQL is run by Oracle which is a commercial entity. There is always a risk that the company decides to reverse the open source concept and go move to a solution with a free light version and full payable version. In the case of MySQL, it is very unlikely as the software is very well established and the user user community quite large.

  • There is currently no option in MySQL to store data in a columnar (rather than row-based) table.

PostgreSQL

PostgreSQL was first released in 1989 from UC Berkeley and is maintained by the PostgreSQL Development Group. It is an open source platform with the option to deploy either as a local server solution or cloud based. The implementation language is C, and it runs of a variety of operating systems. The system allows access through standard technologies (ADO.NET, JDBC, ODBC, a native C library, and streaming APIs).

Benefits

  • At present, it is the fourth most popular database overall, and the second most popular open source database. There is a thriving community with a lot of engaging users delivering support.

  • Traditional database system with a recognisable format which should be easy to manipulate and work with for the advanced Seedcase user.

  • Support for both XML and JSON formats, both reading and writing.

  • There are scripts that will allow for PostgreSQL to interact with Apache Parquet files.

  • It is possible to create columnar based tables directly in PostgreSQL.

Drawbacks

  • We don’t see any major drawbacks.

SQLite

First released in 2000, SQLite is slightly different to the two systems described above, as it is an embedded serverless database primarily maintained by an international team of programmers (see About SQLite). It is an open source platform with the option to deploy either locally or in the cloud. The implementation language is C, and it is platform independent. The system allows access through standard technologies (ADO.NET, JDBC, and ODBC).

Benefits

  • Support for both XML and JSON formats, both reading and writing.

  • Easy to set up and implement, works well with R and other languages.

  • There is always a risk that an open source community will break apart and leave a product unsupported, but the risk here looks minimal. The explicitly stated intention from the core developers of SQLite is to support the product until at least 2050.

Drawbacks

  • SQLite is not fully ACID (atomicity, consistency, isolation, and durability) compliant, which is always a risk when working with larger data sets.

  • The database is designed primarily as a tool to sit underneath applications running in single user mode. This means that the database does not as a standard support multi-user work.

  • As the database is serverless it is quite possible that the target audience for the Seedcase project will struggle to work with the database in the instances where local development is needed.

Decision Outcome

We’ve decided to work with PostgreSQL as our backend database as it fulfills all our needs and is a very popular open source tool. MySQL would be the other obvious choice, the application does everything that Seedcase needs, but the user community for PostgreSQL seems to be a bit more active. SQLite is quite popular within the application developer community, but it doesn’t have a reliable multi-user functionality, so it may be an uphill battle to get it to do the things we are hoping to do with Seedcase.

Consequences

The main consequence of our choice is the limiting factor in who can work on the project. Anyone wanting to work on the database part will need to have an understanding of for instance psql, which is the command line tool to work with Postgres. There are not many differences between Postgres and the other large database systems, but there is always some differences in the version of SQL they use, and which terms are used.