Post-load data cleanup

Feb 8, 2011 at 9:58 PM

Hey guys,

I´ve been facing the issue when loading a PSSDIAG output that the "post-load-data cleanup" (which comes after "Indexing tables") never ends, it remains there, If I shut the readtrace down, and let SQL Nexus to finish the loading, I go then to read trace main, but no data is available (and the DB is like 1.5 GB). This is the last part of the read trace log:

02/08/11 17:01:10.881 [0X000020EC] Reached the end of the trace; found [TRACE_STOP_EVENT].
02/08/11 17:01:10.903 [0X000020EC] Reads completed
02/08/11 17:01:10.904 [0X000020EC] Shutting down the worker thread message queues.
02/08/11 17:01:10.910 [0X000020EC] Waiting for the worker threads to complete final actions.
02/08/11 17:01:12.247 [0X000020EC] Performing general cleanup actions.
02/08/11 17:01:12.247 [0X000020EC] Reducing cached memory.
02/08/11 17:01:16.788 [0X000020EC] Total Events Processed: 16919446
02/08/11 17:01:16.788 [0X000020EC]  Total Events Filtered: 1499212
02/08/11 17:01:16.788 [0X000020EC] Parse errors: 0
02/08/11 17:01:17.082 [0X000020EC] Table ReadTrace.tblUniqueBatches: loaded ~322 rows
02/08/11 17:01:17.114 [0X000020EC] Table ReadTrace.tblUniqueStatements: loaded ~946 rows
02/08/11 17:01:17.147 [0X000020EC] Table ReadTrace.tblUniquePlans: loaded ~589 rows
02/08/11 17:01:17.174 [0X000020EC] Table ReadTrace.tblUniquePlanRows: loaded ~4138 rows
02/08/11 17:01:17.288 [0X000020EC] Table ReadTrace.tblBatches: loaded ~631626 rows
02/08/11 17:01:17.469 [0X000020EC] Table ReadTrace.tblStatements: loaded ~3896015 rows
02/08/11 17:01:17.675 [0X000020EC] Table ReadTrace.tblPlans: loaded ~1908102 rows
02/08/11 17:01:17.832 [0X000020EC] Table ReadTrace.tblPlanRows: loaded ~13973350 rows
02/08/11 17:01:17.874 [0X000020EC] Table ReadTrace.tblInterestingEvents: loaded ~51756 rows
02/08/11 17:01:17.886 [0X000020EC] Table ReadTrace.tblConnections: loaded ~5111 rows
02/08/11 17:01:17.889 [0X000020EC] WARNING: One or more warning conditions exist that may affect the quality of the analysis data.  See sqlnexus.ReadTrace.tblWarnings table and the ReadTrace log for complete details.
02/08/11 17:01:17.908 [0X000020EC] Indexing tables...
02/08/11 17:05:26.032 [0X000020EC] Doing post-load data cleanup...
02/08/11 18:52:46.567 [0X00002C88] Do you wish to shutdown (Y/N or Q to dump stats)?
02/08/11 18:52:52.378 [0X00002C88] Ignoring CTRL command
02/08/11 18:53:02.621 [0X00002B58] Do you wish to shutdown (Y/N or Q to dump stats)?
02/08/11 18:53:04.493 [0X00002B58] Signaling shutdown
02/08/11 18:53:04.494 [0X00002B58] ***************************************************************
02/08/11 18:53:04.494 [0X00002B58] WARNING: [SPID: 0] Disabling aggregation operations.
02/08/11 18:53:04.494 [0X00002B58] ***************************************************************
02/08/11 18:53:04.494 [0X00002B58] Canceling active query...
02/08/11 18:53:05.119 [0X000020EC] Initial HRESULT: 0x80040E4E
02/08/11 18:53:05.119 [0X000020EC] Command [ exec ReadTrace.usp_PerformPostLoadFixups ]
02/08/11 18:53:05.131 [0X000020EC]
02/08/11 18:53:05.132 [0X000020EC] Dumping error record 1 of 1:
02/08/11 18:53:05.132 [0X000020EC] --------------------------------------------------------------------------
02/08/11 18:53:05.133 [0X000020EC]    Description       = Operation canceled
02/08/11 18:53:05.133 [0X000020EC]    Source            = Microsoft SQL Server Native Client 10.0
02/08/11 18:53:05.134 [0X000020EC]    ERRORINFO.hrError = 0x80040e4e
02/08/11 18:53:05.134 [0X000020EC]    ERRORINFO.dwMinor = 0
02/08/11 18:53:05.134 [0X000020EC]    ERRORINFO.clsid   = {8F4A6B68-4F36-4E3C-BE81-BC7CA4E9C45C}
02/08/11 18:53:05.135 [0X000020EC]    SQL State         = HY008
02/08/11 18:53:05.135 [0X000020EC]    Native Error      = 0
02/08/11 18:53:05.136 [0X000020EC] ERROR: Performance analysis failed to complete all necessary post-load processing.
02/08/11 18:53:05.139 [0X000020EC] *******************************************************************************
* ReadTrace encountered one or more ERRORS. An error condition typically      *
* stops processing early and the ReadTrace output may be unusable.            *
* Review the log file for details.                                            *
*******************************************************************************
02/08/11 18:53:05.143 [0X000020EC] ***** ReadTrace exit code: -1024

 

It does not say that much, only that I just shuted it down...

Any hint will be apreciated!

Thanks a lot!

Sebastian.

Coordinator
Feb 9, 2011 at 11:50 AM

Post-data cleanup can take time based on the type of events that have been captured in your profiler trace. Eg. If you have captured Stmt Starting event and not Stmt Completed event or vice versa, then the post load process which does the aggregation for the reports that can be viewed using SQL Nexus/Reporter will take time. You could try importing it into an instance which has more CPU/RAM resources.

Feb 10, 2011 at 3:36 PM

I understand about the events captured, but that is not the issue, the events I capture are always the same and somehow now it is taking like forever to finish the pos-load data cleanup, in fact I waited for 10 hours for it to finish and it did not... The hardware I use I think it is enough, plenty of disk space, dual core CPU and 8GB of RAM... There must be something else.

Thanks anyways!

Coordinator
Feb 10, 2011 at 4:51 PM

We have seen in the past that post cleanup jobs take a long time due to missing of associated events for the started and completed for a query batch.

Have you checked if there are IO bottlenecks on the instance or if the RML utils thread is being blocked due to some reason? If not, can you obtain the latest version of the RML Utilities and check if the same behavior occurs. To do this send an email to sstlbugs@microsoft.com

Feb 10, 2011 at 5:51 PM

No blocking at all, but seen the SOS_SCHEDULER_YIELD wait type, so checked with the following statement:

SELECT scheduler_id, current_tasks_count, runnable_tasks_count
FROM sys.dm_os_schedulers
WHERE scheduler_id < 255

The result:

scheduler_id current_tasks_count runnable_tasks_count
------------ ------------------- --------------------
0                         9                   1
1                         10                  0

Only one runnable task, and is not even changing over time... And I can see processor activity, which means it is working....

The RML version I got is the 9.02.0013... I do not know if it is the last one, I´ll check that out.