Steps in SQL Query Optimization

Alen IBRIC
Jun 7, 2024

Generally speaking, SQL optimization aims to minimize the number of steps, or “database touches,” that a query involves, which reduces both processing time and wait time. There are many small SQL tricks and best practices to keep in mind. The following list is by no means exhaustive or universally applicable, but these guidelines are useful in a large number of situations.

1. Begin by identifying the most expensive queries so that optimization effort goes where it matters most. SQL performance optimization is an ongoing process: there is always room for improvement and always more code to optimize, and constant monitoring and maintenance are necessary, which can make the work feel never-ending. For this reason, it is important to isolate high-impact SQL statements, meaning those that execute most frequently and generate the most database activity and I/O (reads and writes). Improving these statements yields the greatest returns, so targeting them maximizes the performance gained per unit of effort.
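To make the prioritization concrete, here is a minimal Python sketch that ranks statements by the total time they consume (number of calls times duration) rather than by the duration of a single run. It assumes you already have (statement, duration in milliseconds) samples from some source, such as a slow-query log or a per-statement statistics view; the `rank_queries` helper and the sample data are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable


def rank_queries(samples: Iterable[tuple[str, float]], top_n: int = 5) -> list[tuple[str, int, float]]:
    """Aggregate (statement, duration_ms) samples and rank by total time spent.

    Returns (statement, calls, total_ms) tuples, most expensive first.
    """
    calls: dict[str, int] = defaultdict(int)
    total_ms: dict[str, float] = defaultdict(float)
    for statement, duration_ms in samples:
        calls[statement] += 1
        total_ms[statement] += duration_ms
    ranked = sorted(total_ms, key=total_ms.get, reverse=True)
    return [(s, calls[s], total_ms[s]) for s in ranked[:top_n]]


# Hypothetical samples: a cheap lookup that runs constantly and a slow
# report that runs rarely.
samples = [("SELECT name FROM users WHERE id = ?", 2.0)] * 500
samples += [("SELECT region, SUM(total) FROM orders GROUP BY region", 400.0)] * 2

for statement, n, ms in rank_queries(samples):
    print(f"{ms:8.1f} ms  {n:4d} calls  {statement}")
```

In this made-up data the 2 ms lookup outranks the 400 ms report, because 500 calls add up to more total time. Weighting by frequency is what separates "slow" statements from "expensive" ones.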

2. Whenever possible, minimize the amount of data that needs to be scanned in an operation. Many SQL statements cause the database to perform a full table scan, which requires significantly more I/O and slows the query down by searching far more rows than necessary. To optimize data retrieval, it is recommended to:

  • Add indexes to tables when queries typically need less than 5% of the rows (a rule of thumb, not a hard limit), except for relatively small tables, which can often be scanned in their entirety more efficiently regardless of how much data is needed.
  • Avoid SELECT * unless you actually need every column. It retrieves and transfers columns the query never uses, and it can prevent the database from answering the query from an index alone.
  • Use filters in WHERE clauses to limit the size of the data set.
  • Similarly, select only the columns the query needs. This matters especially in column-oriented systems, where every additional column means reading additional storage.
  • Remove unnecessary tables from SQL statements. Sometimes developers forget to remove JOINs that were used for query testing and do not serve the final query. While this may be harmless during the testing phase, JOINs on tables that do not contribute to data retrieval can significantly increase processing time.
  • Use EXISTS in subqueries when you only need to know whether a matching row exists. EXISTS lets the database stop searching as soon as it finds a match, instead of counting or retrieving every matching row.
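The effect of an index, of naming only the needed columns, and of EXISTS can be observed with SQLite's EXPLAIN QUERY PLAN through Python's built-in sqlite3 module. This is a minimal illustration under assumptions: the `orders` and `customers` tables and the `idx_orders_customer` index are hypothetical, and the exact plan wording varies between SQLite versions.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                 [(i, f"customer {i}") for i in range(1000)])
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 1000, i * 1.5) for i in range(10_000)])

# Filter with WHERE and name only the columns needed (no SELECT *).
lookup = "SELECT id, total FROM orders WHERE customer_id = ?"
before = query_plan(conn, lookup, (42,))  # no index yet: full table scan

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, lookup, (42,))   # the index lets SQLite seek matching rows

# EXISTS only asks whether at least one matching order exists per customer.
with_orders = """
    SELECT c.name FROM customers AS c
    WHERE EXISTS (SELECT 1 FROM orders AS o WHERE o.customer_id = c.id)
"""
names = conn.execute(with_orders).fetchall()

print(before)
print(after)
print(len(names))
```

Before the index exists, the plan reports a SCAN of `orders`; afterwards it reports a SEARCH using `idx_orders_customer`. Other databases expose the same information through their own EXPLAIN commands, with different output formats.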

3. Avoid indexing tables that undergo frequent UPDATE or INSERT operations, because every index must be maintained on each write, which slows down data entry. Similarly, consider dropping indexes before large batch updates or inserts and recreating them after the batch completes, or simply avoid indexing tables that frequently receive batch inserts.
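The drop, load, and rebuild pattern can be sketched with Python's sqlite3 module. This assumes a hypothetical `events` table with a secondary index `idx_events_user`; whether dropping and recreating an index actually pays off depends on the engine and the batch size, so measure before adopting it.

```python
import sqlite3


def bulk_load(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> None:
    """Drop a secondary index, bulk-insert rows, then rebuild the index.

    Runs as one explicit transaction, so a failure rolls back the inserts
    and the index drop together (SQLite supports transactional DDL).
    Assumes the connection was opened with isolation_level=None.
    """
    conn.execute("BEGIN")
    try:
        conn.execute("DROP INDEX IF EXISTS idx_events_user")
        conn.executemany("INSERT INTO events (user_id, payload) VALUES (?, ?)", rows)
        conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
bulk_load(conn, [(i % 100, f"event {i}") for i in range(10_000)])
```

Building the index once over the finished data avoids updating it row by row during the load.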

4. Avoid mixing data types, and do not convert numbers to strings. Such conversions add work on every row and, when applied to an indexed column, can prevent the database from using the index, which slows queries down.
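A small SQLite sketch of what a number-to-string conversion costs. The table and index names are hypothetical, and SQLite's exact plan wording depends on the version. Wrapping the indexed column in `CAST(... AS TEXT)` turns an index search into a full scan. Strings also sort differently from numbers.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# Converting the indexed number column to text on every row hides it from the index.
as_text = query_plan(
    conn, "SELECT id, total FROM orders WHERE CAST(customer_id AS TEXT) = ?", ("42",)
)
# Comparing a number column with a number lets SQLite search the index.
as_number = query_plan(conn, "SELECT id, total FROM orders WHERE customer_id = ?", (42,))

# Numbers stored as strings also compare differently: text order puts '10' before '9'.
text_cmp, numeric_cmp = conn.execute("SELECT '10' < '9', 10 < 9").fetchone()

print(as_text)
print(as_number)
print(text_cmp, numeric_cmp)
```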

5. In some scenarios, it is simpler to create a new field than to perform a calculation in a JOIN or WHERE clause. The new field holds the precomputed value, which the statement selects instead of computing it on the fly. Of course, this requires permission to modify the data sets, which should not be an issue for a DBA or another IT administrator.
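A minimal SQLite sketch of a precomputed field. It assumes a hypothetical `orders` table where the filter needs `net_cents + tax_cents`. The sum is stored once in a new `gross_cents` column, and that column is indexed. Both queries return the same rows. After this change, the application (or a trigger) is responsible for keeping the new column in sync on later writes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, net_cents INTEGER, tax_cents INTEGER)")
conn.executemany(
    "INSERT INTO orders (net_cents, tax_cents) VALUES (?, ?)",
    [(n * 100, n * 20) for n in range(1, 201)],
)

# Before: the sum is recomputed for every row in the WHERE clause.
on_the_fly = conn.execute(
    "SELECT id FROM orders WHERE net_cents + tax_cents > ? ORDER BY id", (20_000,)
).fetchall()

# After: store the value once in a new field and index it.
conn.execute("ALTER TABLE orders ADD COLUMN gross_cents INTEGER")
conn.execute("UPDATE orders SET gross_cents = net_cents + tax_cents")
conn.execute("CREATE INDEX idx_orders_gross ON orders (gross_cents)")
precomputed = conn.execute(
    "SELECT id FROM orders WHERE gross_cents > ? ORDER BY id", (20_000,)
).fetchall()
conn.commit()

print(len(on_the_fly), len(precomputed))
```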

6. More broadly, align your SQL statements with your data sets. In practice, this means reviewing each statement to make sure it is written to match the underlying data structure (its tables, keys, and indexes) so the data can be accessed efficiently.

7. Use stored procedures instead of individual statements. A stored procedure is a stored set of statements that reduces the execution cost of repetitive queries. For example, if an application runs the same read query repeatedly, that query can make up a significant portion of database activity. Many databases compile a stored procedure's execution plan once and reuse it on later calls rather than re-optimizing it each time, so a stored procedure can help the query execute quickly and predictably.
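SQLite, reached here through Python's sqlite3 module, has no stored procedures. This sketch therefore swaps in a related but weaker technique: one parameterized statement whose SQL text never changes. The sqlite3 module keeps a per-connection cache of compiled statements keyed by SQL text (sized by the `cached_statements` argument), so repeated calls skip re-preparing the statement. The `orders` table, its `week` column, and `weekly_total` are hypothetical.

```python
import sqlite3

# Hypothetical report query, written once with a placeholder so the
# same statement text is reused on every call.
WEEKLY_TOTAL_SQL = "SELECT COALESCE(SUM(total), 0.0) FROM orders WHERE week = ?"


def weekly_total(conn: sqlite3.Connection, week: int) -> float:
    # Identical SQL text lets the sqlite3 module serve the compiled
    # statement from its cache instead of preparing it again.
    return conn.execute(WEEKLY_TOTAL_SQL, (week,)).fetchone()[0]


conn = sqlite3.connect(":memory:", cached_statements=128)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, week INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (week, total) VALUES (?, ?)",
    [(w, 10.0) for w in range(1, 5) for _ in range(3)],
)

print([weekly_total(conn, w) for w in range(1, 6)])
```

Unlike a real stored procedure, this keeps the logic in the application and only saves statement preparation; server databases with stored procedures can also cache the plan on the server side.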

8. Use global temporary tables (GTTs) whenever possible to simplify complex aggregation queries. Breaking a work-intensive subquery into steps, with the intermediate results stored in a GTT, can significantly improve database performance.
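A minimal sketch of the idea in SQLite, with one caveat: SQLite's `CREATE TEMP TABLE` is private to a single connection in both definition and data. In Oracle-style GTTs the definition is shared and only the rows are session-private. The expensive aggregation is materialized once, and follow-up queries read the small intermediate table. The table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(1, 50.0), (1, 70.0), (2, 20.0), (3, 200.0)],
)

# Step 1: materialize the expensive aggregation once.
conn.execute("""
    CREATE TEMP TABLE customer_totals AS
    SELECT customer_id, SUM(total) AS lifetime_total
    FROM orders
    GROUP BY customer_id
""")

# Step 2: later queries read the small intermediate table instead of
# repeating the GROUP BY as a nested subquery.
avg_total = conn.execute("SELECT AVG(lifetime_total) FROM customer_totals").fetchone()[0]
big_spenders = conn.execute(
    "SELECT customer_id FROM customer_totals WHERE lifetime_total > ? ORDER BY customer_id",
    (avg_total,),
).fetchall()

print(big_spenders)
```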

9. Use hints. Some DBMSs document a set of optimizer hints to assist application designers and DBAs. Their purpose is to let administrators and developers “alter execution plans” and “force different approaches.” This allows someone tuning a statement to take control from the optimizer in situations where they know more about the data than the optimizer does. In those cases, they can make sure their chosen plan is used, instead of an access path the optimizer might pick that performs worse.
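Hint syntax is product-specific: Oracle uses `/*+ ... */` comments and MySQL has index hints such as `FORCE INDEX`. SQLite's closest equivalent is the `INDEXED BY` / `NOT INDEXED` clause, which pins the index choice. SQLite's documentation frames it as a guard against unexpected plan changes rather than a tuning tool. This is a minimal sketch with hypothetical table and index names.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.execute("CREATE INDEX idx_orders_status ON orders (status)")

template = "SELECT id FROM orders {clause} WHERE customer_id = ? AND status = ?"
params = (42, "open")

# Require a specific index, overriding whatever the planner would choose.
forced = query_plan(conn, template.format(clause="INDEXED BY idx_orders_status"), params)
# Forbid index use entirely, forcing a full table scan.
no_index = query_plan(conn, template.format(clause="NOT INDEXED"), params)

print(forced)
print(no_index)
```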

10. Finally, make optimization routine. Performance optimization in SQL requires regular maintenance to prevent degradation of database performance over time as both data sets and RDBMS software evolve. With that in mind, it is necessary to commit to regular database normalization and defragmentation.
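A routine maintenance job might look like the following SQLite sketch. `ANALYZE` refreshes the statistics the query planner relies on, and `VACUUM` rewrites the database file compactly, which is SQLite's form of defragmentation. Other systems have their own commands. The `run_maintenance` helper and the `logs` table are hypothetical.

```python
import os
import sqlite3
import tempfile


def run_maintenance(conn: sqlite3.Connection) -> None:
    """Refresh optimizer statistics, then rebuild the file without free pages."""
    conn.execute("ANALYZE")  # update statistics used by the query planner
    conn.execute("VACUUM")   # rewrite the database compactly (defragment)


def pragma(conn: sqlite3.Connection, name: str) -> int:
    return conn.execute(f"PRAGMA {name}").fetchone()[0]


path = os.path.join(tempfile.mkdtemp(), "maintenance.db")
# Autocommit mode: VACUUM cannot run inside an open transaction.
conn = sqlite3.connect(path, isolation_level=None)
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("BEGIN")
conn.executemany("INSERT INTO logs (body) VALUES (?)", [("x" * 500,) for _ in range(2000)])
conn.execute("COMMIT")
conn.execute("DELETE FROM logs WHERE id <= 1500")  # leaves many pages unused

free_before, pages_before = pragma(conn, "freelist_count"), pragma(conn, "page_count")
run_maintenance(conn)
free_after, pages_after = pragma(conn, "freelist_count"), pragma(conn, "page_count")

print(free_before, pages_before, free_after, pages_after)
```

Scheduling a job like this (for example, during off-peak hours) is one way to make maintenance routine rather than reactive.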
