SQL Best Practices
Overview
Writing efficient and readable SQL queries is crucial for database performance and maintainability. Adhering to Best Practices can greatly enhance the efficiency, readability, and scalability of your SQL code. Below are several key SQL Best Practices , along with examples, to guide you in writing optimal SQL queries.
1. SQL syntax
- Best Practice
Write clean, readable SQL by properly formatting your queries. Use consistent indentation, and meaningful aliases, and avoid unnecessary complexity.
Snowflake Optimization: This will maintain the consistency of the query text and utilize the query result when available, thereby minimizing computational costs.
- Good Example
SELECT
customer_id,
first_name,
last_name
FROM
customers
WHERE
active = 1
ORDER BY
last_name;
- Bad Example
SELECT customer_id, first_name, last_name FROM customers WHERE active = 1 ORDER BY last_name;
2. Select column list over SELECT *
-
Best Practice
Specify the columns you need in the SELECT statement instead of using SELECT *. This improves performance and readability.
Snowflake Optimization: This approach allows Snowflake to scan only the necessary data, reducing computational load and potentially cost.
- Good Example
SELECT
customer_id,
first_name,
last_name
FROM
customers;
- Bad Example
SELECT * FROM customers;
3. CTE over sub-query
-
Best Practice
Use Common Table Expressions (CTEs) to make complex queries more readable and maintainable instead of using sub-queries.
-
Good Example
WITH OrderTotals AS (
SELECT
customer_id,
SUM(order_amount) AS total_amount
FROM
orders
GROUP BY
customer_id
)
SELECT
c.customer_id,
c.first_name,
ot.total_amount
FROM
customers c
JOIN
OrderTotals ot ON c.customer_id = ot.customer_id;
- Bad Example
SELECT
c.customer_id,
c.first_name,
(
SELECT
SUM(o.order_amount)
FROM
orders o
WHERE
o.customer_id = c.customer_id
) AS total_amount
FROM
customers c;
4. Early filter where possible
-
Best Practice
Filter records as early as possible in the query to reduce the number of rows processed by subsequent operations.
Snowflake Optimization: This approach allows Snowflake to scan only the necessary data, reducing computational load and potentially cost.
- Good Example
WITH FilteredOrders AS (
SELECT
order_id,
customer_id,
order_date
FROM
orders
WHERE
order_date >= '2023-01-01'
)
SELECT
c.customer_id,
c.first_name,
fo.order_id,
fo.order_date
FROM
customers c
JOIN
FilteredOrders fo ON c.customer_id = fo.customer_id;
- Bad Example
SELECT
c.customer_id,
c.first_name,
o.order_id,
o.order_date
FROM
customers c
JOIN
orders o ON c.customer_id = o.customer_id
WHERE
o.order_date >= '2023-01-01';
5. WHERE clause instead of HAVING
-
Best Practice
Use the WHERE clause to filter records before aggregation, and use HAVING only to filter records after aggregation.
-
Good Example
SELECT
customer_id,
COUNT(order_id) as order_count
FROM
orders
WHERE
order_date >= '2023-01-01'
GROUP BY
customer_id
HAVING
COUNT(order_id) > 5;
- Bad Example
SELECT
customer_id,
COUNT(order_id) as order_count
FROM
orders
GROUP BY
customer_id
HAVING
order_date >= '2023-01-01'
AND
COUNT(order_id) > 5;
6. INNER JOIN over LEFT/RIGHT JOIN
-
Best Practice
Use INNER JOIN when you need only the matched records from both tables. LEFT JOIN or RIGHT JOIN should be used only when you need all records from one table and matched records from the other.
-
Good Example
SELECT
c.customer_id,
c.first_name,
o.order_id
FROM
customers c
INNER JOIN
orders o ON c.customer_id = o.customer_id;
- Bad Example
SELECT
c.customer_id,
c.first_name,
o.order_id
FROM
customers c
LEFT JOIN
orders o ON c.customer_id = o.customer_id;
7. Join on index or integer column
-
Best Practice
Join tables on indexed or integer columns to improve performance. Avoid joining on string columns or non-indexed columns.
Snowflake Optimization: Joining on an index or integer column in Snowflake reduce compute costs by speeding up query execution, optimizing storage usage, improving clustering efficiency, and enabling effective partitions pruning.
- Good Example
SELECT
o.order_id,
o.order_date,
c.customer_name
FROM
orders o
JOIN
customers c ON o.customer_id = c.customer_id;
- Bad Example
SELECT
o.order_id,
o.order_date,
c.customer_name
FROM
orders o
JOIN
customers c ON o.customer_email = c.customer_email;
8. Use EXISTS() instead of COUNT()
-
Best Practice
Use EXISTS for checking the existence of rows, which is generally faster than using COUNT(*).
Snowflake Optimization: This reduces Snowflake compute costs by quickly identifying the presence of records without scanning the entire dataset, thus improving query performance and efficiency.
- Good Example
IF EXISTS (SELECT 1 FROM customers WHERE customer_id = 1)
BEGIN
PRINT 'Customer exists';
END;
- Bad Example
IF (SELECT COUNT(*) FROM customers WHERE customer_id = 1) > 0
BEGIN
PRINT 'Customer exists';
END;
9. Use comments for complex logic
-
Best Practice
Add comments to your SQL code to explain complex logic or any non-obvious parts of the query.
-
Example
- Calculate the total order amount for each customer
WITH OrderTotals AS (
SELECT
customer_id,
SUM(order_amount) AS total_amount
FROM
orders
GROUP BY
customer_id
)
-- Select customers along with their total order amount
SELECT
c.customer_id,
c.first_name,
ot.total_amount
FROM
customers c
JOIN
OrderTotals ot ON c.customer_id = ot.customer_id;
Snowflake specific query performance and compute cost optimization tips:
Optimizing query performance and managing compute costs effectively in Snowflake can significantly enhance the efficiency of your data operations. Below are some best practices to help you achieve these goals:
1. Use LIMIT Clause to Restrict Result Volume
-
Best Practice
When working with large datasets, it’s beneficial to use the LIMIT clause to restrict the number of rows returned by a query. This not only speeds up query execution but also reduces the amount of data transferred, leading to lower compute costs.
-
Example
-- Retrieve only the first 100 rows from the large_table
SELECT * FROM large_table
WHERE condition = 'value'
LIMIT 100;
2. Use temp table creation to materialize a complex query output
-
Best Practice For complex queries involving multiple joins, aggregations, or sub-queries, creating a temporary table to store intermediate results can improve performance. This approach allows you to break down complex queries into simpler, more manageable steps and reuse the intermediate results efficiently.
-
Example
-- Create a temporary table to store the intermediate results
CREATE TEMPORARY TABLE temp_results AS
SELECT column1, SUM(column2) AS total
FROM large_table
WHERE condition = 'value'
GROUP BY column1;
-- Use the temporary table in subsequent queries
SELECT column1, total
FROM temp_results
WHERE total > 100;
3. Use Snowsight to filter columns and get insights from already executed queries
-
Best Practice
Snowsight, Snowflake’s built-in analytics interface, provides powerful tools for analyzing query performance and getting insights from executed queries. By using Snowsight, you can easily filter columns, view execution details, and optimize your queries based on past performance.
-
Example
-
Filter Columns
“If you’ve already retrieved the data and wish to further filter columns, utilize Snowsight. This eliminates the need to execute an additional query.”
-
-
Data Insights
Use Snowsight to gain insights from previously retrieved columns. This strategy will help decrease the cost associated with executing additional queries.
-
Table Sample
Utilize the TABLESAMPLE/SAMPLE function when you need to extract a data sample from the table for analysis. This method enhances the probability of identifying data gaps beyond the set limit ‘n’.