Search This Blog

SQL Server: Approaching Database Server Performance Issues (GOOD)


Step 1:
When you work as DBA, many people will approach you with a complaint like "Application is taking ages to load the data on a page,could you please check something going wrong with database server?" There might be hundred of other reason for slowness of the page.It might be a Problem with application server,network issues,really a bad implementation or problem with database server due to generation of huge report /job running at that moment. What ever be the issue, database gets the blame first. Then it is our responsibility to cross check the server state. 

Let us discuss how we can approach this issue. I use following script to diagnose  the issues. The first script which I will run against server is given below:

SELECT 
parent_node_id                  AS Node_Id,
COUNT(*)                        AS [No.of CPU In the NUMA],
SUM(COUNT(*)) OVER()            AS [Total No. of CPU],
SUM(runnable_tasks_count )      AS [Runnable Task Count], 
SUM(pending_disk_io_count)                    AS [Pending disk I/O count],
SUM(work_queue_count)          AS  [Work queue count]
FROM sys.dm_os_schedulers WHERE status='VISIBLE ONLINE' GROUP BY parent_node_id
 
This will give following information. 
  • Number of records in the output will be equal to number of NUMA nodes (if it is fetching only one record , it is not a NUMA supported server)
  • Node_id : NUMA node id . Can be mapped into the later scripts.
  • No.of CPU in the NUMA : Total number of CPU assigned to the specific NUMA node or the number of schedulers.
  • Total No. of CPU : Total No. of CPU available in the server.If you have set the affinity mask, total number of CPU assigned to this instance.
  • Runnable Task Count: Number of workers, with tasks assigned to them, that are waiting to be scheduled on the runnable queue. Is not nullable. In short number of request in runnable queue.To understand more about Runnable queue , read my earlier post.
  • Pending disk I/O count : Number of pending I/Os that are waiting to be completed. Each scheduler has a list of pending I/Os that are checked to determine whether they have been completed every time there is a context switch. The count is incremented when the request is inserted. This count is decremented when the request is completed.
  • Work queue count: Number of tasks in the pending queue. These tasks are waiting for a worker to pick them up.
I have scheduled this scrip to store the output this query in a table for two days in the interval of 10 minutes. That will give baseline data about what is normal in your environment. In my environment people will start complaining once the Runnabable Task Count of  most of the nodes goes beyond 10 consistently. In normal scenario, the value of Runnabable Task Count will be always below 10 on each node and never seen a value greater than 0 for work queue count field.This will give a picture of current state of the system.If the output of this step is normal, we are safe to an extent, the slow response might be issue which might be beyond our control or a blocking and slow response is only for a couple of screens(sessions) not for entire system. 

This is the first step in my way of diagnosing.

Step 2:
The next step (Step2)  in my way of diagnosing is to check the session that are waiting of any resources. Below script will help us. This query required a function as prerequisite,  which will help us to display the SQL server agent job name if the session started by SQL server agent.

/*****************************************************************************************
           PREREQUISITE FUNCTION
******************************************************************************************/

USE MASTER
GO 
CREATE FUNCTION ConvertStringToBinary  @hexstring  VARCHAR(100)
)  RETURNS BINARY(34)  AS
BEGIN

   RETURN
(SELECT CAST('' AS XML).value('xs:hexBinary( substring(sql:variable("@hexstring"), sql:column("t.pos")) )''varbinary(max)')
   
FROM (SELECT CASE SUBSTRING(@hexstring12WHEN '0x' THEN ELSE ENDAS t(pos))  

END
/***************************************************************************************
STEP 2: List the session which are currently waiting for resource
****************************************************************************************/

SELECT node.parent_node_id AS Node_id,
es.HOST_NAME,
es.Login_name,
CASE WHEN es.program_name LIKE '%SQLAgent - TSQL JobStep%' THEN
         
(
          
SELECT 'SQL AGENT JOB: '+name FROM msdb..sysjobs WHERE job_id=
          MASTER
.DBO.ConvertStringToBinary(LTRIM(RTRIM((SUBSTRING(es.program_name,CHARINDEX('(job',es.program_name,0)+4,35)))))
          )
    
ELSE es.program_name END  AS [Program Name] ,

DB_NAME(er.database_idAS DatabaseName,
er.session_id
wt.blocking_session_id,
wt.wait_duration_ms,
wt.wait_type,
wt.NoThread ,
er.command,
er.status,
er.wait_resource,
er.open_transaction_count,
er.cpu_time,
er.total_elapsed_time AS ElapsedTime_ms,
er.percent_complete ,
er.reads,
er.writes,
er.logical_reads,
wlgrp.name AS ResoursePool              ,
SUBSTRING   (sqltxt.TEXT,(er.statement_start_offset/2) + 1,     
            ((
CASE WHEN er.statement_end_offset -1     
             
THEN LEN(CONVERT(NVARCHAR(MAX), sqltxt.TEXT)) * 2     
             
ELSE er.statement_end_offset     
            
END er.statement_start_offset)/2) + 1AS [Individual Query]

sqltxt.TEXT AS [Batch Query]                
FROM (SELECT session_idSUM(wait_duration_msAS 
wait_duration_ms,wait_type,blocking_session_id,COUNT(*) AS NoThread 
FROM  SYS.DM_OS_WAITING_TASKS  GROUP BY session_idwait_type,blocking_session_idwt 
INNER JOIN SYS.DM_EXEC_REQUESTS  er ON wt.session_id=er.session_id INNER JOIN SYS.DM_EXEC_SESSIONS es ONes.session_ider.session_id
INNER JOIN SYS.DM_RESOURCE_GOVERNOR_WORKLOAD_GROUPS wlgrp ON wlgrp.group_id=er.group_id          
INNER JOIN  (SELECT  os.parent_node_id ,task_address FROM SYS.DM_OS_SCHEDULERS  OS INNER JOIN 
SYS.DM_OS_WORKERS  OSW ON OS.scheduler_address=OSW.scheduler_address 
WHERE os.status='VISIBLE ONLINE' GROUP BY os.parent_node_id ,task_address node   
ON node.task_address=er.task_address
CROSS APPLY SYS.DM_EXEC_SQL_TEXT(er.sql_handleAS sqltxt
WHERE sql_handle IS NOT NULL AND wt.wait_type NOT IN ('WAITFOR','BROKER_RECEIVE_WAITFOR')
GO

The Description of the columns in the result are given below. 


Column NameDescription
Node IdNUMA node id . Can be mapped to the node id of the scheduler query.
Host_NameName of the computer from the connection is originated.
Login NameLogin used in the session to connect the database server
Program NameName of the program/application using this session. You can set the application name in the connection string. If this session is part of SQL server agent job, it will show the job name 
Database NameCurrent database  of the session
Session IdThe session id
Blocking Session idSession id blocking statement 
wait_duration_msTotal wait time for this wait type, in milliseconds. This time is inclusive of signal wait time 
wait_typeName of the wait type like SLEEP_TASK,CXPACKET etc
No of ThreadNo of threads running on this session. If the session is in parallel execution
CommandIdentifies the current type of command that is being processed like Select,insert,update,delete etc
StatusStatus of the request. This can be of the following: Background,Running,Runnable,Sleeping and Suspended
Wait Resource Resource for which the request is currently waiting
Open Transaction countNumber of transaction opened in this session
Cpu TimeCPU time in milliseconds that is used by the request.
Total Elapsed TimeTotal time elapsed in milliseconds since the request arrived
Percent_CompletePercent of work completed for certain operations like backup,restore
rollback etc.
ReadsNumber of reads performed by this request.
WritesNumber of writes performed by this request.
logical_readsNumber of logical reads performed by this request.
ResoursePoolName of of Resource Governor Pool
Individual Querycurrent statement of the batch running on this session.
Batch QueryCurrent batch (procedure/set of sql statement) running on this session.


If there is a session with very long wait_duration_ms and  not blocked by any other session and  not going away from the list in the subsequent execution of the same query, I will look into the program name,host name,login name and the statement that is running which will give me an idea about the session.Based on all these information, I might decide to kill that session and look into the implementation of that SQL batch. If the session is blocked, I will look into the blocking session using a different script which I will share later.(Refer this post)

Step 3:

The next step (Step 3)  is to list all session which are currently running on the server. I use below query to do that.

/***************************************************************************************
STEP 3: List the session which are currently waiting/running
****************************************************************************************/
SELECT node.parent_node_idAS Node_id,

es.HOST_NAME,
es.login_name,
CASE WHEN es.program_name LIKE '%SQLAgent - TSQL JobStep%' THEN
(SELECT 'SQL AGENT JOB: '+name FROM msdb..sysjobs WHERE job_id=ADMIN.DBO.ConvertStringToBinary(LTRIM(RTRIM((SUBSTRING(es.program_name,CHARINDEX('(job',es.program_name,0)+4,35)))))
)
ELSE es.program_name END  AS program_name ,

DB_NAME(er.database_idAS DatabaseName,
er.session_id
wt.blocking_session_id,
wt.wait_duration_ms,
wt.wait_type,
wt.NoThread ,
er.command,
er.status,
er.wait_resource,
er.open_transaction_count,
er.cpu_time,
er.total_elapsed_time AS ElapsedTime_ms,
er.percent_complete ,
er.reads,er.writes,er.logical_reads,
wlgrp.name AS ResoursePool              ,
SUBSTRING (sqltxt.TEXT,(er.statement_start_offset/2) + 1,                
((CASE WHEN er.statement_end_offset -1                
THEN LEN(CONVERT(NVARCHAR(MAX), sqltxt.TEXT)) * 2                
ELSE er.statement_end_offset                
END er.statement_start_offset)/2) + 1AS [Individual Query],
sqltxt.TEXT AS [Batch Query]                
FROM 
SYS.DM_EXEC_REQUESTS  er INNER JOIN SYS.DM_EXEC_SESSIONS es ON es.session_ider.session_id
INNER JOIN SYS.DM_RESOURCE_GOVERNOR_WORKLOAD_GROUPS wlgrp ON wlgrp.group_id=er.group_id          
INNER JOIN  (SELECT  os.parent_node_id ,task_address FROM SYS.DM_OS_SCHEDULERS  OS 
INNER JOIN SYS.DM_OS_WORKERS  OSW ON OS.scheduler_address=OSW.scheduler_address
WHERE os.status='VISIBLE ONLINE' GROUP BY os.parent_node_id ,task_address node ONnode.task_address=er.task_address
LEFT JOIN 
(SELECT session_idSUM(wait_duration_msAS 
wait_duration_ms,wait_type,blocking_session_id,COUNT(*) AS NoThread 
FROM  SYS.DM_OS_WAITING_TASKS  GROUP BY session_idwait_type,blocking_session_idwt 
ON wt.session_id=er.session_id
CROSS apply SYS.DM_EXEC_SQL_TEXT(er.sql_handleAS sqltxt
WHERE sql_handle IS NOT NULL AND ISNULL(wt.wait_type ,'') NOT IN 
('WAITFOR','BROKER_RECEIVE_WAITFOR')
ORDER BY er.total_elapsed_time DESC

GO
 

The columns are same as we discussed in step 2 . I used to analyse the sessions with  more total_elapsed_time and take appropriate actions like killing the session and look into the implementation. In most of the scenario (where server was running perfectly but all off sudden it become standstill) , I will be able fix the issue by following these steps.  In the next part let us discuss  about blocking session and session with open transaction which is not active.


Step 4:
In this post let us see how to list the blocking sessions with required information.  

******************************************************************************************/
CREATE FUNCTION [dbo].dba_GetStatementForSpid
(  
   @spid SMALLINT  
)  
RETURNS NVARCHAR(4000)  
BEGIN  
   DECLARE @SqlHandle BINARY(20)  
   DECLARE @SqlText NVARCHAR(4000)  
   SELECT @SqlHandle sql_handle   
      FROM sys.sysprocesses WITH (nolockWHERE   spid @spid  
   SELECT @SqlText [text] FROM   
      sys.dm_exec_sql_text(@SqlHandle)  
   RETURN @SqlText  
END  
GO

/*****************************************************************************************
STEP 4: List the current blocking session information
****************************************************************************************/


SELECT

es.session_id,
es.HOST_NAME,
DB_NAME(database_idAS DatabaseName
CASE WHEN es.program_name LIKE '%SQLAgent - TSQL JobStep%' THEN (SELECT 'SQL AGENT JOB: '+name FROM msdb..sysjobsWHEREjob_id=MASTER.DBO.ConvertStringToBinary (LTRIM(RTRIM((SUBSTRING(es.program_name,CHARINDEX('(job',es.program_name,0)+4,35)))))
ELSE es.program_name END  AS program_name ,
es.login_name ,
bes.session_id AS Blocking_session_id,
MASTER.DBO.dba_GetStatementForSpid(es.session_idAS [Statement],
bes.HOST_NAME AS Blocking_hostname,
CASE WHEN Bes.program_name LIKE '%SQLAgent - TSQL JobStep%' THEN
(SELECT 'SQL AGENT JOB: '+name FROM msdb..sysjobs WHERE job_id=
MASTER.DBO.ConvertStringToBinary 
(LTRIM(RTRIM((SUBSTRING(Bes.program_name,CHARINDEX('(job',es.program_name,0)+4,35))))))
ELSE Bes.program_name END  AS Blocking_program_name,
bes.login_name AS Blocking_login_name,
  MASTER.DBO.dba_GetStatementForSpid(bes.session_id AS [Blocking Statement]
FROM sys.dm_exec_requests S 
INNER JOIN sys.dm_exec_sessions  es ON es.session_id=s.session_id
INNER JOIN sys.dm_exec_sessions  bes ON bes.session_id=s.blocking_session_id


This script will list the blocked and blocking statement information and will be helpful to troubleshoot. Below script will help us to identify the sessions which have an open transaction but not active . That is, a sessions with an open transaction but not running any statement in the last 30 seconds.

/*****************************************************************************************
STEP 4: List the Open session with transaction which is not active
****************************************************************************************/

SELECT es.session_id
es.login_name
es.HOST_NAME
DB_NAME(SP.dbidAS DatabaseName,
sp.lastwaittype,
est.TEXT,cn.last_read
cn.last_writeCASE WHEN es.program_name LIKE '%SQLAgent - TSQL JobStep%' THEN(SELECT 'SQL AGENT JOB: '+name FROM msdb..sysjobsWHERE job_id=MASTER.DBO.ConvertStringToBinary(LTRIM(RTRIM((SUBSTRING(es.program_name,CHARINDEX('(job',es.program_name,0)+4,35)))))
)
ELSE es.program_name END  AS program_name 

FROM sys.dm_exec_sessions es
INNER JOIN sys.dm_tran_session_transactions st ON es.session_id st.session_id                INNER JOINsys.dm_exec_connections cn ON es.session_id cn.session_id 
INNER JOIN sys.sysprocesses SP ON SP.spid=es.session_id                 
LEFT OUTER JOIN sys.dm_exec_requests er  ON st.session_id er.session_id   
AND er.session_id IS NULL          
CROSS 
APPLY sys.dm_exec_sql_text(cn.most_recent_sql_handleest                

WHERE (DATEDIFF(SS,cn.last_read,GETDATE())+DATEDIFF(SS,cn.last_write,GETDATE()))>30
AND lastwaittype NOT IN ('BROKER_RECEIVE_WAITFOR' ,'WAITFOR')                                  GO 

Step 5:

In this post let us discuss about listing execution stats from the plan cache.

The below query will help us to list the CPU intensive queries from your  plan cache.

/*****************************************************************************************
List heavy query based on CPU/IO. Change the order by clause appropriately
******************************************************************************************/

SELECT TOP 20
DB_NAME(qt.dbidAS DatabaseName
,DATEDIFF(MI,creation_time,GETDATE()) AS [Age of the Plan(Minutes)]
,last_execution_time AS [Last Execution Time]
,qs.execution_count AS [Total Execution Count]
,CAST((qs.total_elapsed_time) / 1000000.0 AS DECIMAL(28,2)) [Total Elapsed Time(s)]
,CAST((qs.total_elapsed_time ) / 1000000.0qs.execution_count AS DECIMAL(282)) AS [Average Execution time(s)]
,CAST((qs.total_worker_time) / 1000000.0 AS DECIMAL(28,2)) AS [Total CPU time (s)]
,CAST(qs.total_worker_time 100.0 qs.total_elapsed_time AS DECIMAL(28,2)) AS [% CPU]
,CAST((qs.total_elapsed_time qs.total_worker_time)* 100.0 /qs.total_elapsed_time AS DECIMAL(282)) AS [% Waiting]
,CAST((qs.total_worker_time) / 1000000.0qs.execution_count AS DECIMAL(282)) AS [CPU time average (s)]
,CAST((qs.total_physical_reads) / qs.execution_count AS DECIMAL(282)) AS [Avg Physical Read]
,CAST((qs.total_logical_reads) / qs.execution_count AS DECIMAL(282))  AS [Avg Logical Reads]
,CAST((qs.total_logical_writes) / qs.execution_count AS DECIMAL(282)) AS [Avg Logical Writes]
,max_physical_reads
,max_logical_reads
,max_logical_writes
SUBSTRING (qt.TEXT,(qs.statement_start_offset/2) + 1,((CASE WHEN qs.statement_end_offset -1
   THEN LEN(CONVERT(NVARCHAR(MAX), qt.TEXT)) * 2
   
ELSE qs.statement_end_offset
   
END qs.statement_start_offset)/2) + 1AS [Individual Query]

qt.TEXT AS [Batch Statement]
qp.query_plan
FROM SYS.DM_EXEC_QUERY_STATS qs
CROSS APPLY SYS.DM_EXEC_SQL_TEXT(qs.sql_handleAS qt
CROSS APPLY SYS.DM_EXEC_QUERY_PLAN(qs.plan_handleqp
WHERE qs.total_elapsed_time 0
ORDER BY 
[Total CPU time (s)] 
--[Avg Physical Read]
--[Avg Logical Reads]
--[Avg Logical Writes]
--[Total Elapsed Time(s)]
--[Total Execution Count]
DESC 

Let us discuss the output columns.
Column NameDescription
DatabaseNameDatabase Context of the execution plan.
Age of the Plan(Minutes)Age of the plan in the plan cache in minutes.
Last Execution TimeDate and Time of the last execution of this plan.
Total Execution CountTotal number of execution after the last compilation
Or Total number of execution in the last [Age of the Plan(Minutes)]
(since it was last compiled)

Total Elapsed Time(s)
Total time (second) took to execute this plan [Total Execution Count] times
Average 
Execution time(s)
Average Time(seconds) took for single execution of this plan
Total CPU time (s)Total CPU time(seconds) took to  execute this plan [Total Execution Count] times
% CPUPercentage of CPU time compared to  [Total Elapsed Time(s)]

% Waiting 
Percentage of waiting time(wait for resource)  compared to  [Total Elapsed Time(s)]

CPU time 
average (s)
Average CPU time (Seconds) used for single exection

Avg Physical Read
Average number of Physical read for single execution

Avg Logical Reads
Average number of Logical read for single execution

Avg Logical Writes
Average number of Logical writes for single execution

max_physical_reads
Maximum number of physical reads that this plan has ever performed during a single execution.

max_logical_reads
Maximum number of logical reads that this plan has ever performed during a single execution.

max_logical_writes
Maximum number of logical writes that this plan has ever performed during a single execution.

Individual Query
Part of Batch Statement 

Batch Statement
Batch Query

query_plan
XML execution . On clicking this we can see the graphical execution plan

I used to  analyse  the first five to ten records ( in each category by changing the order by clause) to see  procedures implementation. Most of the time I used to find some thing wrong with the procedure  like unnecessary temp table usage, distinct clause , cursors, table joining with out proper joining condition,no proper indexing etc. The other issue usually happen is, enormous calls to database for single procedure (CPU cost and execution time might less). This might be a wrong implementation, which should be fixed by discussing with development team. Usually this can be fixed by adding some types of caching  in application.Some time calls to the database is only to check if there is any change in the result data. Some thing like if there is new records in a table, it should be processed immediately. To achieve this, application might be querying the table to find the unprocessed record multiple time in a second, This can be fixed by implementing some kind of asynchronous call to the application by the application which inserts the data into this table or by implementing the notification event using SqlDependency available in the .Net frame work.