Advanced SQL: Window Functions, CTEs , Views & Indexes
This article captures notes from watching the video tutorials from this YouTube channel which I highly recommend to learn SQL.
WINDOW Functions
Aggregate functions, Rank functions, and Value Functions
Window functions are functions that operate by creating virtual “windows” 🪟 within the table to achieve an operation on each window.
Aggregate Functions
Common Aggregate Window Functions
The following are the most commonly used SQL Aggregate Window Functions:
COUNT()
counts the number of rows in a specified column across a defined window.SUM()
computes the sum of values within a specified column across a defined window.AVG()
calculates the average of a selected group of values across a defined window.MIN()
retrieves the lowest value from a particular column across a defined window.MAX()
fetches the highest value from a specific column across a defined window.FIRST_VALUE()
returns the first value in a designated column across a defined window.LAST_VALUE()
provides the last value in a given column across a defined window (can be repalced by the FIRST_VALUE with order by DESC).
COUNT () to identify duplicates
Ranking Functions
Value Functions
Think of LEAD()
as a function that lets you peek into the future 🔮, and LAG()
as a way to glance into the past ⏪.
Time-series analysis
To do year over year (YoY) analysis or month over month (MoM) analysis , the LEAD (next) and LAD (previous) functions are used. Another use case is to create a 7-day moving average metric, or a 28-day rolling count metric
Customer retention analysis
check how many days are for a customer to do its next order
Lowest and highest analysis nm
You can use the FIRST_VALUE by making the order DESC and this way there is no need to define the frame as its done in the LAST_VALUE example.
Subqueries vs CTE (Common Expression Tables)
subqueries and CTEs both are temporary tables used by the main query but the main difference is that the subquery is used only once while CTE can be used multiple times and for different purposes in the main query.
VIEWS vs CTEs
View syntax
The view is as any other data, the view is an abstraction and behind it there is a query that goes to the table to query the results.
This is a practical example to creates a view of a table that joins multiple tales and the end-user can directly query the view.
Security use case of the VIEW done for different end-users access
Views for multiple languages
Views can also be used for multiple languages. A VIEW which will be consumed by German users, the columns of it are translated to German.
VIEWS as virtual Data Marts in DWH
the views can reflect the status of the data warehouse
In the DWH we are going to do multiple steps to prepare the data for complex reporting and analysis and all this information is stored in the DWH as a physical table. It will be very complex to connect the DWH directly to a reporting tool e.g. PowerBI but instead the data can be split into multiple subsets like departments e.g. sales, finance.
The best practice for data marts is to use views instead of tables.
INDEXES
An INDEX is a data structure that provides a quick access to data (the rows) optimizing the speed of the query. It is like a guide in the database.
Clustered / Non clustered Index
Indexes Best Practices
- Naming convention of the index is “idx_tableName_columnName”