An Example of Trace Facility Use

AIX Versions 3.2 and 4 Performance Tuning Guide

An Example of Trace Facility Use

Obtaining a Sample Trace File

Trace data accumulates rapidly. We want to bracket the data collection as closely around the area of interest as possible. One technique for doing this is to issue several commands on the same command line. For example:

$ trace -a -k "20e,20f" -o ./trc_raw ; cp ../bin/track /tmp/junk ; trcstop

captures the execution of the cp command. We have used two features of the trace command. The -k "20e,20f" option suppresses the collection of events from the lockl and unlockl functions. These calls are numerous and add volume to the report without adding understanding at the level we're interested in. The -o ./trc_raw option causes the raw trace output file to be written in our local directory.

Note: This example is more educational if the input file is not already cached in system memory. Choose as the source file any file that is about 50KB and has not been touched recently.

Formatting the Sample Trace

We use the following form of the trcrpt command for our report:

$ trcrpt -O "exec=on,pid=on" trc_raw > cp.rpt

This reports both the fully qualified name of the file that is execed and the process ID that is assigned to it.

A quick look at the report file shows us that there are numerous VMM page assign and delete events in the trace, like the following sequence:

1B1 ksh            8525          0.003109888       0.162816                   VMM page delete:      V.S=00
00.150E ppage=1F7F
                                                                                  delete_in_progress proce
ss_private working_storage

1B0 ksh            8525          0.003141376       0.031488                   VMM page assign:      V.S=00
00.2F33 ppage=1F7F
                                                                                  delete_in_progress process_private working_storage

We are not interested in this level of VMM activity detail at the moment, so we reformat the trace with:

$ trcrpt -k "1b0,1b1" -O "exec=on,pid=on" trc_raw > cp.rpt2

The -k "1b0,1b1" option suppresses the unwanted VMM events in the formatted output. It saves us from having to retrace the workload to suppress unwanted events. We could have used the -k function of trcrpt instead of that of the trace command to suppress the lockl and unlockl events, if we had believed that we might need to look at the lock activity at some point. If we had been interested in only a small set of events, we could have specified -d "hookid1,hookid2" to produce a report with only those events. Since the hook ID is the left-most column of the report, you can quickly compile a list of hooks to include or exclude.

A comprehensive list of Trace hook IDs is defined in /usr/include/sys/trchkid.h.

Reading a Trace Report

The header of the trace report tells you when and where the trace was taken, as well as the command that was used to produce it:

Fri Nov 19 12:12:49 1993
System: AIX ptool Node: 3
Machine: 000168281000
Internet Address: 00000000 0.0.0.0
trace -ak 20e 20f -o -o ./trc_raw

The body of the report, if displayed in a small enough font, looks as follows:

ID  PROCESS NAME   PID           ELAPSED_SEC     DELTA_MSEC   APPL    SYSCALL KERNEL  INTERRUPT
101 ksh            8525          0.005833472       0.107008           kfork
101 ksh            7214          0.012820224       0.031744           execve
134 cp             7214          0.014451456       0.030464           exec cp ../bin/trk/junk

In cp.rpt you can see the following phenomena:

The fork, exec, and page fault activities of the cp process
The opening of the input file for reading and the creation of the /tmp/junk file
The successive read/write system calls to accomplish the copy
The process cp becoming blocked while waiting for I/O completion, and the wait process being dispatched
How logical-volume requests are translated to physical-volume requests
The files are mapped rather than buffered in traditional kernel buffers, and the read accesses cause page faults that must be resolved by the Virtual Memory Manager.
The Virtual Memory Manager senses sequential access and begins to prefetch the file pages.
The size of the prefetch becomes larger as sequential access continues.
When possible, the disk device driver coalesces multiple file requests into one I/O request to the drive.

The trace output looks a little overwhelming at first. This is a good example to use as a learning aid. If you can discern the activities described, you are well on your way to being able to use the trace facility to diagnose system-performance problems.

Filtering of the Trace Report

The full detail of the trace data may not be required. You can choose specific events of interest to be shown. For example, it is sometimes useful to find the number of times a certain event occurred. To answer the question "How many opens occurred in the copy example?" first find the event ID for the open system call. This can be done as follows:

$ trcrpt -j | grep -i open

You should be able to see that event ID 15b is the open event. Now, process the data from the copy example as follows:

$ trcrpt -d 15b -O "exec=on" trc_raw

The report is written to standard output, and you can determine the number of open subroutines that occurred. If you want to see only the open subroutines that were performed by the cp process, run the report command again using the following:

$ trcrpt -d 15b -p cp -O "exec=on" trc_raw