Understanding SQL


Featured image

The Structured Query Language (SQL) is everywhere. All major applications, all the systems that deal with the ‘Relational Database Management Systems’ (RDMS) beg for deep knowledge on SQL. So, I set out to learn deep concepts of SQL and document my learning.

SQL execution sequence

SQL statements includes a variety of clauses to select, join, trim and slice data as per our needs. Each of these clauses have a execution sequence, and order. Let’s take a look at it.

SELECT DISTINCT <TOP_specification> <select_list>
FROM <left_table>
<join_type> JOIN <right_table>
ON <join_condition>
WHERE <where_condition>
GROUP BY <group_by_list>
HAVING <having_condition>
ORDER BY <order_by_list>

In SQL statements, the primary clause that is processed in the FROM clause. This is the way of statement telling SQL language to look for the table first and then do whatever that is asked to do later. So, FROM clause always executes first.

Here’s the sequence of the 10 clauses.

1. FROM clause
2. ON clause
3. OUTER clause
4. WHERE clause
5. GROUP BY clause
6. HAVING clause
7. SELECT clause
8. DISTINCT clause
9. ORDER BY clause
10. TOP clause

Understanding the execution order helps optimizing the query performance easy.

Set Operations

Imagine you have two tables that have equivalent columns, and you want to perform set operations on the table outputs.

Union

In SQL the UNION clause combines the results of two SQL queries into a single table of all matching rows. The two queries must result in the same number of columns and compatible data types in order to unite. Any duplicate records are automatically removed unless UNION ALL is used.

SELECT * FROM table1
UNION
SELECT * FROM table2;

Union All

To include the duplicate records, UNION ALL should be used. Note that the order of the records is not preserved. To do so, use ORDER BY clause

SELECT * FROM table1
UNION ALL
SELECT * FROM table2;

Intersect

The INTERSECT operation takes the output of two queries and returns the records that appear in both the result sets. Consider it as the overlapping records. Just like UNION, the INTERSECT operator removes the duplicates. To include duplicates, use INTERSET ALL operator.

SELECT *
FROM   Orders
WHERE  Quantity BETWEEN 1 AND 100

INTERSECT

SELECT *
FROM   Orders
WHERE  Quantity BETWEEN 50 AND 200;

Intersect All

The INTERSECT ALL operation includes the duplicates.

Minus /Except

The SQL EXCEPT operator takes the distinct rows of one query and returns the rows that do not appear in a second result set. It’s like performing A-B, removing B from A. The EXCEPT ALL operator does not remove duplicates, but if a row appears X times in the first query and Y times in the second, it will appear max(X - Y, 0) times in the result set.

SELECT *
FROM   Orders
WHERE  Quantity BETWEEN 1 AND 100

EXCEPT

SELECT *
FROM   Orders
WHERE  Quantity BETWEEN 50 AND 75;

Limiting result set record count

If you are expecting the SQL query to return a large result set and want to limit the records, consider adding below snippets to your code. NOTE - Remove the snippets during production deployment.

ROW_NUMBER()

The ROW_NUMBER() OVER may be used for a simple table on the returned rows, e.g. to return no more than ten rows:

SELECT * FROM
( SELECT
    ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS row_number,
    columns
  FROM tablename
) AS foo
WHERE row_number <= 10;

NOTE - ROW_NUMBER can be non-deterministic: if sort_key is not unique, each time you run the query it is possible to get different row numbers assigned to any rows where sort_key is the same. When sort_key is unique, each row will always get a unique row number.

RANK()

The RANK() OVER window function acts like ROW_NUMBER, but may return more or less than n rows in case of tie conditions, e.g. to return the top-10 youngest persons:

SELECT * FROM (
  SELECT
    RANK() OVER (ORDER BY age ASC) AS ranking,
    person_id,
    person_name,
    age
  FROM person
) AS foo
WHERE ranking <= 10;

The above code could return more than ten rows, e.g. if there are two people of the same age, it could return eleven rows.

FETCH FIRST

The FETCH FIRST clause can be used to limit the output result set. NOTE - This is supported by limited no. of databases.

SELECT * FROM T 
FETCH FIRST 10 ROWS ONLY;

LIMIT Clause

The LIMIT clause can also server a simialr purpose as FETCH FIRST, but again, it is supported by limited databases.

SELECT * FROM TABLE1
LIMIT 10;