Trace data accumulates rapidly. We want to bracket the data collection as closely around the area of interest as possible. One technique for doing this is to issue several commands on the same command line. For example:
$ trace -a -k "20e,20f" -o ./trc_raw ; cp ../bin/track /tmp/junk ; trcstop
captures the execution of the cp command. We have used two features of the trace command. The -k "20e,20f" option suppresses the collection of events from the lockl and unlockl functions. These calls are numerous and add volume to the report without adding understanding at the level we're interested in. The -o ./trc_raw option causes the raw trace output file to be written in our local directory.
Note: This example is more educational if the input file is not already cached in system memory. Choose as the source file any file that is about 50KB and has not been touched recently.
We use the following form of the trcrpt command for our report:
$ trcrpt -O "exec=on,pid=on" trc_raw > cp.rpt
This reports both the fully qualified name of the file that is execed and the process ID that is assigned to it.
A quick look at the report file shows us that there are numerous VMM page assign and delete events in the trace, like the following sequence:
1B1 ksh 8525 0.003109888 0.162816 VMM page delete: V.S=00 00.150E ppage=1F7F delete_in_progress proce ss_private working_storage
1B0 ksh 8525 0.003141376 0.031488 VMM page assign: V.S=00 00.2F33 ppage=1F7F delete_in_progress process_private working_storage
We are not interested in this level of VMM activity detail at the moment, so we reformat the trace with:
$ trcrpt -k "1b0,1b1" -O "exec=on,pid=on" trc_raw > cp.rpt2
The -k "1b0,1b1" option suppresses the unwanted VMM events in the formatted output. It saves us from having to retrace the workload to suppress unwanted events. We could have used the -k function of trcrpt instead of that of the trace command to suppress the lockl and unlockl events, if we had believed that we might need to look at the lock activity at some point. If we had been interested in only a small set of events, we could have specified -d "hookid1,hookid2" to produce a report with only those events. Since the hook ID is the left-most column of the report, you can quickly compile a list of hooks to include or exclude.
A comprehensive list of Trace hook IDs is defined in /usr/include/sys/trchkid.h.
The header of the trace report tells you when and where the trace was taken, as well as the command that was used to produce it:
Fri Nov 19 12:12:49 1993 System: AIX ptool Node: 3 Machine: 000168281000 Internet Address: 00000000 0.0.0.0 trace -ak 20e 20f -o -o ./trc_raw
The body of the report, if displayed in a small enough font, looks as follows:
ID PROCESS NAME PID ELAPSED_SEC DELTA_MSEC APPL SYSCALL KERNEL INTERRUPT 101 ksh 8525 0.005833472 0.107008 kfork 101 ksh 7214 0.012820224 0.031744 execve 134 cp 7214 0.014451456 0.030464 exec cp ../bin/trk/junk
In cp.rpt you can see the following phenomena:
The trace output looks a little overwhelming at first. This is a good example to use as a learning aid. If you can discern the activities described, you are well on your way to being able to use the trace facility to diagnose system-performance problems.
The full detail of the trace data may not be required. You can choose specific events of interest to be shown. For example, it is sometimes useful to find the number of times a certain event occurred. To answer the question "How many opens occurred in the copy example?" first find the event ID for the open system call. This can be done as follows:
$ trcrpt -j | grep -i open
You should be able to see that event ID 15b is the open event. Now, process the data from the copy example as follows:
$ trcrpt -d 15b -O "exec=on" trc_raw
The report is written to standard output, and you can determine the number of open subroutines that occurred. If you want to see only the open subroutines that were performed by the cp process, run the report command again using the following:
$ trcrpt -d 15b -p cp -O "exec=on" trc_raw