Recently an issue attracted my attention where we needed to tame the runaway SQL jobs, where the step in the job to import a file from the FTP source at times would run up to 2 days (which would normally complete in ~2 minutes).
Well at times ftp downloads by scripts are affected by lots of factors out of our hands, network congestion, provider or what not.
So the only way to make it work was quite simple, kill the job and start again and in the next run every thing would be absolutely fine.
But this cant be done manually some one sitting and looking at each job run time.
So I devised a way in the below script which when run at frequency intervals of 30 seconds to a minute would
monitor job (or step) current run duration and compare it with the meta data in a table. If the duration exceeds the meta-table entry the job is killed and email is send out for notification.
This has proved us very valuable, and hence thought would prove very helpful for any one in a similar situation, till SQL server jobs doesn’t come with a default max run time functionality in new releases
The script in the job step is as follows:
IF OBJECT_ID('tempdb..##KillJobList') IS NOT NULL DROP TABLE ##KillJobList; SELECT AJ.job_name , AJ.start_execution_date , AJ.current_executed_step_id , AJ.step_name , AJ.RunDurationMinutes , AJ.RunDurationSeconds, JM.max_duration_sec , JM.step_number, AJ.CurrentStepRunDurationMinutes , AJ.CurrentStepRunDurationSeconds, CASE WHEN JM.step_name IS NOT NULL THEN --Check for the job step duration CASE WHEN JM.step_name = AJ.step_name THEN --When the current executing job matches the defination CASE WHEN JM.max_duration_sec < AJ.CurrentStepRunDurationSeconds THEN 1 ELSE 0 END --if the job manager max duration is less than current execution time, kill job by flag 1 ELSE 0 END ELSE CASE WHEN JM.max_duration_sec < AJ.RunDurationSeconds THEN 1 ELSE 0 END END KillJobFlag INTO ##KillJobList FROM dbo.vwActiveRunningJob AJ INNER JOIN dbo.JobManager JM ON JM.job_name = AJ.job_name WHERE JM.duration_check = 1; DECLARE @sqlstmt VARCHAR(2000); DECLARE db_cursor CURSOR FOR SELECT job_name FROM ##KillJobList WHERE KillJobFlag = 1 OPEN db_cursor FETCH NEXT FROM db_cursor INTO @sqlstmt WHILE @@FETCH_STATUS = 0 BEGIN --SELECT @sqlstmt EXEC msdb.dbo.sp_stop_job @job_name = @sqlstmt FETCH NEXT FROM db_cursor INTO @sqlstmt END CLOSE db_cursor DEALLOCATE db_cursor IF EXISTS (SELECT * FROM ##KillJobList WHERE KillJobFlag = 1) BEGIN EXEC msdb.dbo.sp_send_dbmail @profile_name='--Mail Profile Name--', @recipients = 'firstname.lastname@example.org', @subject = '[Job Force Killed] Auto Job Manager', @body = 'The attached file contains the list of job that were killed due to exceeding there max run time defined', @query = 'SELECT * FROM ##KillJobList WHERE KillJobFlag = 1', @body_format='HTML', @attach_query_result_as_file = 1, @query_attachment_filename = 'job_auto_killed.txt' END EXEC msdb.dbo.sp_purge_jobhistory @job_name = 'Auto Job Manager' --This is necessary as this job will run very frequently and we dont need its every minute history to balloon our logs
The view (dbo.vwActiveRunningJob) mentioned in the above script is as follows:
CREATE VIEW [dbo].[vwActiveRunningJob] AS SELECT ja.job_id, j.name AS job_name, ja.start_execution_date, ja.last_executed_step_date, ISNULL(last_executed_step_id,0)+1 AS current_executed_step_id, js.step_name, DATEDIFF(MINUTE, ja.start_execution_date, GETDATE()) AS RunDurationMinutes, DATEDIFF(SECOND, ja.start_execution_date, GETDATE()) AS RunDurationSeconds, DATEDIFF(MINUTE, ISNULL(ja.last_executed_step_date,ja.start_execution_date), GETDATE()) AS CurrentStepRunDurationMinutes, DATEDIFF(SECOND, ISNULL(ja.last_executed_step_date,ja.start_execution_date), GETDATE()) AS CurrentStepRunDurationSeconds FROM msdb.dbo.sysjobactivity ja LEFT JOIN msdb.dbo.sysjobhistory jh ON ja.job_history_id = jh.instance_id JOIN msdb.dbo.sysjobs j ON ja.job_id = j.job_id JOIN msdb.dbo.sysjobsteps js ON ja.job_id = js.job_id AND ISNULL(ja.last_executed_step_id,0)+1 = js.step_id WHERE ja.session_id = (SELECT TOP 1 session_id FROM msdb.dbo.syssessions ORDER BY agent_start_date DESC) AND start_execution_date IS NOT NULL AND stop_execution_date IS NULL GO
And the meta data table which governs the duration’s allowed for job/steps is as follows:
SELECT job_id, job_name, step_number, step_name, max_duration_sec, duration_check FROM dbo.JobManager;
I hope the above helps to control your environment runaway task. And other application where this technique can be used in processing the cube partitions, where the processing is hung due to other jobs processing the cube partitions.