A High-Level Overview of SQL DB Engine
I've been dabbling in SQL for more than four years now, although not consistently. Recently, over the last four months, I've been tackling Leetcode SQL problems and have become quite proficient at it. I've handled a lot of problems involving various joins, window functions, and even dabbled in stored procedures. But like many others who work with SQL, I don't have a deep understanding of how it works behind the scenes.
Lately, I've developed a sudden interest in SQL and databases in general. I started exploring and reading some research papers by PhD candidates on the subject. It got me thinking that I really want to grasp SQL at a deeper level.
So, I've decided to dive into understanding it better. This guide can serve as a perfect overview to quickly grasp a high-level understanding of what's going on under the hood.
Also, bullet points? New obsession! Great stuff! Easy, and straightforward.
SQL & RDBMS
- SQL (Structured Query Language) is a language designed for communicating with relational databases.
- It aims to handle large volumes of data, particularly in scenarios with numerous simultaneous data writes and transactions. SQL is a standardized language, providing fundamental functionalities.
- The interaction between an SQL client and a database enables various operations such as data creation, deletion, storage, updating, extraction, and user permission management.
- The overall framework for managing this interaction is known as the relational database management system (RDBMS).
How Data Comes?

- Data is collected and organized through various methodologies, but the basic process remains consistent.
- Initially, data is received by a server, often from platforms like Apache or Nginx.
- This data is then processed into tables and stored in a data warehouse server suitable for SQL usage.
- Data is typically converted into a format compatible with the database (e.g., JSON) before storage, avoiding direct writing to the database.
- The data warehouse serves as the central database engine, enabling connection and communication with SQL clients.
- When data retrieval is requested, the data warehouse forwards the SQL query to an application server.
- The application server processes the query and returns the results to a web server.
- Finally, the web server presents the data in a user-friendly format, usually as SQL data tables.
How the Data is Processed?

- The database engine, known as the RDBMS, facilitates communication between SQL clients and the database.
- SQL queries undergo various stages of processing within the database engine.
- These stages typically include compiling (parsing and binding), optimizing, and executing the query.
- The compilation process involves translating the SQL statement into bytecode.
- A virtual machine within the database engine evaluates this bytecode.
- Various SQL database engines implement their own compiler and virtual machine.
- For instance, SQLite, written in C programming language, has its own tokenizer, parser, interpreter, and B-tree headers and structures.
- This processing mechanism is akin to how low-level programming languages like GCC operate.
The stages of processing:
Compiling (Parsing):- Tokenizes the SQL statement into individual words with valid verbiage and clauses.
- Validates the statement against the system's catalog.
- Checks for the existence of databases, tables, and columns mentioned in the query.
- Verifies if the user has the necessary privileges to execute the SQL query.
- Generates a query plan, representing the steps required to execute the statement.
- This plan is typically compiled into byte code, serving as a binary representation.
- Transforms the SQL statement into a command-line shell program.
- Optimizes the query plan by selecting the best algorithms for operations like searching and sorting.
- Utilizes features like Query Optimizer or Relational Engine for efficient execution.
- Results in a prepared SQL statement ready for execution.
- The RDBMS runs the SQL statement by executing the optimized query plan.
The process of compiling SQL statements involves more steps than optimizing and executing due to the nature of each task. Parsing or tokenizing a SQL statement is quick and doesn't require database access. Optimizing a query plan can be CPU-intensive, especially for complex queries, as the optimizer explores various execution strategies. Spending time on optimization is preferred over inefficient query execution, as optimization bears a lower opportunity cost. Reusing the same optimized query plan for similar schemas further enhances efficiency.
The Engines: Storage, Relational, and Execution Engine
- The SQL Database Engine consists primarily of two core components: the Storage Engine and the Query Optimizer (Relational Engine).
- The Storage Engine is responsible for managing data operations such as creation, reading, and updating, ensuring data integrity during disk-to-memory transactions.
- It facilitates maintaining data integrity through mechanisms like rollback journals and write-ahead logs.
- SQL clients come equipped with a Query Optimizer, a valuable feature that enhances performance.
- The Query Optimizer selects the most efficient algorithm based on the written query, akin to choosing algorithms in programming based on time complexity.
- It generates an access plan or query plan using a cost-based optimization approach.
- The Query Optimizer evaluates multiple potential query access plans, estimates their costs, and selects the one with the lowest cost.
- It doesn't exhaustively examine every possible plan but instead balances the cost of finding potential plans with the cost of the plans themselves.
- Once the best plan is chosen, it is passed to the Execution Engine for executing the SQL statement.
How Data Looks AS Tables?

- SQL is typically implemented using low-level programming languages like C.
- The SQL database engine processes incoming data into tables comprising columns and records.
- This transformation is facilitated by B-trees, which are self-balancing trees with multiple branches.
- B-tree structures enable data organization into tables, with keys serving as pointers to other keys within the structure.
Summary
- The SQL database engine is indispensable in nearly any database setup, constructed using low-level programming languages.
- It follows distinct stages for compiling and executing SQL queries, ensuring efficient data processing.
- Algorithm selection is optimized by comparing alternatives, enhancing query performance.
- Tables are structured using binary trees, enabling organized data storage and retrieval.
- Various platforms can connect to a SQL database engine, utilizing different codes while maintaining connectivity.
- SQL facilitates data management with well-defined schemas, automatic algorithm selection, and index maintenance.
- It streamlines performance enhancements without requiring extensive recoding or retesting efforts.
As stated already, this is all a very high-level overview, but it still helps. It's a nice introduction. However, I would like to dive deeper into B-Trees, Query Optimization, etc. As I do so, I'll attempt to simplify it all in another post. For now, this works.