SQL Query optimization using Statistics in OutSystems

John Alvin Salamat
May 18, 2022

SQL Sandbox has been part of every project I’ve worked on. It’s my go-to tool for viewing data and building complex queries. When judging query performance, we rely heavily on how long the query takes to execute, but sometimes that alone isn’t enough to say we have an “optimized” query. A query written in the development environment might not adapt to the data growth in production, so further visibility is needed. That’s what led me to explore exposing the SQL messages, where some important statistics can be viewed.

Fortunately, I have access to the brains behind SQL Sandbox and was allowed to contribute to the project. It was a nostalgic experience to see C# code again and play around with the different SQL classes, and here’s the result:

I added the capability to execute a statement and show the resulting SQL messages. By adding SET STATISTICS IO ON, I can generate detailed information about the amount of disk activity generated by a T-SQL statement.
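Here’s a minimal sketch of how this looks in practice (the table name and query are placeholders for illustration, not the actual OutSystems physical names):

    SET STATISTICS IO ON;

    SELECT COUNT(*)
    FROM Customer
    WHERE Name LIKE 'A%';

    SET STATISTICS IO OFF;

Once the statement runs, the messages output includes one line per table the statement touched, reporting the following attributes: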

Scan count
Number of seeks or scans started after reaching the leaf level in any direction to retrieve all the values to construct the final dataset for the output.

Scan count is 0 if the index used is a unique index or clustered index on a primary key and you’re seeking only one value. For example, WHERE Primary_Key_Column = <value>.

Scan count is 1 when you’re searching for one value using an index on a non-primary key column. This process is done to check for duplicate values for the key value that you’re searching for. For example, WHERE Index_Key_Column = <value>.

Scan count is N when N is the number of different seeks or scans started towards the left or right side at the leaf level after locating a key value using the index key.
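As a quick sketch of the first two cases (the Customer table, its clustered primary key Id, and the non-unique index on Email are assumptions for illustration):

    -- Typically reports Scan count 0: single-value seek on the clustered primary key
    SELECT * FROM Customer WHERE Id = 42;

    -- Typically reports Scan count 1: single-value seek on a non-unique index,
    -- which still has to check for duplicates of the key value
    SELECT * FROM Customer WHERE Email = 'someone@example.com';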

logical reads
Number of pages read from the data cache.

physical reads
Number of pages read from disk.

read-ahead reads
Number of pages placed into the cache for the query.

lob logical reads
Number of pages read from the data cache. Applies to Text fields with a length greater than 2000.

lob physical reads
Number of pages read from disk. Applies to Text fields with a length greater than 2000 and to Binary fields.

lob read-ahead reads
Number of pages placed into the cache for the query. Applies to Text fields with a length greater than 2000.

Worktable
Quite often SQL Server has to perform logical operations for specific queries, and to perform these logical operations it builds a worktable. Worktables are built in tempdb and are dropped automatically when they are no longer needed.

Original source: https://docs.microsoft.com/en-us/sql/t-sql/statements/set-statistics-io-transact-sql?view=sql-server-2017. Some descriptions were adapted to OutSystems terminology.

Impact of filtering with HAVING versus WHERE clauses on IO statistics

Let’s say we have the following scenario: we want to count the number of Customers per State, filtering the States by their first letter.

  • You can join the CountryState entity to the Customer entity, group by the CountryState name, and use a HAVING clause for state names starting with the letter G. Then, just count the number of rows by state name.
  • Alternatively, you can use a WHERE clause to filter the joined rows from the CountryState and Customer entities, and then count the number of rows by state name (both approaches are sketched below).
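Roughly, the two queries could look like this (a sketch only; the entity, attribute, and foreign key names are assumptions based on the OutSystems sample data, and the physical table names in your environment will differ):

    -- Option 1: group everything first, then filter the groups with HAVING
    SELECT cs.Name, COUNT(c.Id) AS CustomerCount
    FROM CountryState cs
    INNER JOIN Customer c ON c.CountryStateId = cs.Id
    GROUP BY cs.Name
    HAVING cs.Name LIKE 'G%';

    -- Option 2: filter the rows first with WHERE, then group
    SELECT cs.Name, COUNT(c.Id) AS CustomerCount
    FROM CountryState cs
    INNER JOIN Customer c ON c.CountryStateId = cs.Id
    WHERE cs.Name LIKE 'G%'
    GROUP BY cs.Name;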

Both of these queries return the same result.

But because the first query uses HAVING, SQL Server has to create a Worktable in order to get the filtered results. The goal is to have a query with fewer reads and to avoid the unnecessary creation of a Worktable. This leads to a leaner query that, in turn, delivers better performance.

I’d also like to share an additional tool I use to get a better view of the statistics for analysis: https://statisticsparser.com/.
