Understanding SQL
The Structured Query Language (SQL) is everywhere. All major applications, all the systems that deal with the ‘Relational Database Management Systems’ (RDMS) beg for deep knowledge on SQL. So, I set out to learn deep concepts of SQL and document my learning.
SQL execution sequence
SQL statements includes a variety of clauses to select, join, trim and slice data as per our needs. Each of these clauses
have a execution sequence, and order. Let’s take a look at it.
SELECT DISTINCT <TOP_specification> <select_list>
FROM <left_table>
<join_type> JOIN <right_table>
ON <join_condition>
WHERE <where_condition>
GROUP BY <group_by_list>
HAVING <having_condition>
ORDER BY <order_by_list>
In SQL statements, the primary clause
that is processed in the FROM
clause. This is the way of statement telling SQL language to look for the table first and then do whatever that is asked to do later. So, FROM clause always executes first.
Here’s the sequence of the 10 clauses.
1. FROM clause
2. ON clause
3. OUTER clause
4. WHERE clause
5. GROUP BY clause
6. HAVING clause
7. SELECT clause
8. DISTINCT clause
9. ORDER BY clause
10. TOP clause
Understanding the execution order helps optimizing the query performance easy.
Set Operations
Imagine you have two tables that have equivalent columns, and you want to perform set operations
on the table outputs.
Union
In SQL the UNION
clause combines the results of two SQL queries into a single table of all matching rows. The two queries must result in the same number of columns and compatible data types in order to unite. Any duplicate records are automatically removed unless UNION ALL
is used.
SELECT * FROM table1
UNION
SELECT * FROM table2;
Union All
To include the duplicate records, UNION ALL
should be used. Note that the order of the records is not preserved. To do so, use ORDER BY
clause
SELECT * FROM table1
UNION ALL
SELECT * FROM table2;
Intersect
The INTERSECT
operation takes the output of two queries and returns the records that appear in both the result sets. Consider it as the overlapping records. Just like UNION
, the INTERSECT
operator removes the duplicates. To include duplicates, use INTERSET ALL
operator.
SELECT *
FROM Orders
WHERE Quantity BETWEEN 1 AND 100
INTERSECT
SELECT *
FROM Orders
WHERE Quantity BETWEEN 50 AND 200;
Intersect All
The INTERSECT ALL
operation includes the duplicates.
Minus /Except
The SQL EXCEPT
operator takes the distinct rows of one query and returns the rows that do not appear in a second result set. It’s like performing A-B, removing B from A. The EXCEPT ALL
operator does not remove duplicates, but if a row appears X times in the first query and Y times in the second, it will appear max(X - Y, 0)
times in the result set.
SELECT *
FROM Orders
WHERE Quantity BETWEEN 1 AND 100
EXCEPT
SELECT *
FROM Orders
WHERE Quantity BETWEEN 50 AND 75;
Limiting result set record count
If you are expecting the SQL query to return a large result set and want to limit the records, consider adding below snippets to your code. NOTE - Remove the snippets during production deployment.
ROW_NUMBER()
The ROW_NUMBER() OVER
may be used for a simple table on the returned rows, e.g. to return no more than ten rows:
SELECT * FROM
( SELECT
ROW_NUMBER() OVER (ORDER BY sort_key ASC) AS row_number,
columns
FROM tablename
) AS foo
WHERE row_number <= 10;
NOTE - ROW_NUMBER can be non-deterministic: if sort_key is not unique, each time you run the query it is possible to get different row numbers assigned to any rows where sort_key is the same. When sort_key is unique, each row will always get a unique row number.
RANK()
The RANK()
OVER window function acts like ROW_NUMBER
, but may return more or less than n rows in case of tie conditions, e.g. to return the top-10 youngest persons:
SELECT * FROM (
SELECT
RANK() OVER (ORDER BY age ASC) AS ranking,
person_id,
person_name,
age
FROM person
) AS foo
WHERE ranking <= 10;
The above code could return more than ten rows, e.g. if there are two people of the same age, it could return eleven rows.
FETCH FIRST
The FETCH FIRST
clause can be used to limit the output result set. NOTE - This is supported by limited no. of databases.
SELECT * FROM T
FETCH FIRST 10 ROWS ONLY;
LIMIT Clause
The LIMIT
clause can also server a simialr purpose as FETCH FIRST
, but again, it is supported by limited databases.
SELECT * FROM TABLE1
LIMIT 10;