[ Previous | Next | Contents | Glossary | Home | Search ]
AIX Versions 3.2 and 4 Performance Tuning Guide

Adding New Trace Events

The operating system is shipped instrumented with key events. The user need only activate trace to capture the flow of events from the operating system. Application developers may want to instrument their application code during development for tuning purposes. This provides them with insight into how their applications are interacting with the system.

To add a trace event you have to design the trace records generated by your program in accordance with trace interface conventions. You then add trace-hook macros to the program at the appropriate locations. Traces can then be taken via any of the standard ways of invoking and controlling trace (commands, subcommands, or subroutine calls). To use the trcrpt program to format your traces, you need to add stanzas describing each new trace record and its formatting requirements to the trace format file.

Possible Forms of a Trace Event Record

A trace event can take several forms. An event consists of a hook word, optional data words, and an optional time stamp, as shown in the figure "Format of a Trace Event Record". A four-bit type is defined for each form the event record can take. The type field is imposed by the recording routine so that the report facility can always skip from event to event when processing the data, even if the formatting rules in the trace format file are incorrect or missing for that event.

An event record should be as short as possible. Many system events use only the hook word and time stamp. There is another event type that is mentioned but should seldom be used because it is less efficient and is intrusive. It is a long format that allows the user to record a variable length of data. In this long form, the 16-bit data field of the hook word is converted to a length field that describes the length of the event record.

Trace Channels

The trace facility can accommodate up to eight simultaneous channels of trace-hook activity, which are numbered 0-7. Channel 0 is always used for system events, but application events can also use it. The other seven channels, called generic channels, can be used for tracing application-program activity.

When trace is started, channel 0 is used by default. A trace -n command (where n is the channel number) starts trace to a generic channel. Use of the generic channels has some limitations.

Macros for Recording Trace Events

There is a macro to record each possible type of event record. The macros are defined in /usr/include/sys/trcmacros.h. The event IDs are defined in /usr/include/sys/trchkid.h. These two files should be included by any program that is recording trace events.

The macros to record events on channel 0 with a time stamp are:

TRCHKL0T(hw)

TRCHKL1T(hw,D1)

TRCHKL2T(hw,D1,D2)

TRCHKL3T(hw,D1,D2,D3)

TRCHKL4T(hw,D1,D2,D3,D4)

TRCHKL5T(hw,D1,D2,D3,D4,D5)

Similarly, to record events on channel 0 without a time stamp, use:

TRCHKL0(hw)

TRCHKL1(hw,D1)

TRCHKL2(hw,D1,D2)

TRCHKL3(hw,D1,D2,D3)

TRCHKL4(hw,D1,D2,D3,D4)

TRCHKL5(hw,D1,D2,D3,D4,D5)

The type field of the trace event record is set to the value that corresponds to the macro used, regardless of the value of those 4 bits in the hw parameter.

There are only two macros to record events to one of the generic channels (1-7). These are as follows:

TRCGEN(ch,hw,D1,len,buf)

TRCGENT(ch,hw,D1,len,buf)

These macros record in the event stream specified by the channel parameter (ch) a hook word (hw), a data word (D1) and len bytes from the user's data segment beginning at the location specified by buf.

Use of Event IDs

The event ID in a trace record identifies that record as belonging to a particular class of records. The event ID is the basis on which the trace mechanism records or ignores trace hooks, as well as the basis on which the trcrpt command includes or excludes trace records in the formatted report.

Event IDs are 12 bits (three hexadecimal digits) for a possible 4096 IDs. Event IDs that are permanently left in and shipped with code are permanently assigned by IBM to avoid duplication. To allow users to define events in their environments or during development, the range of event IDs from hex 010 through hex 0FF has been set aside for temporary use. Users can freely use IDs in this range in their own environment (that is, any set of systems within which the users are prepared to ensure that the same event ID is not used ambiguously).

Attention: It is important that users who make use of this event range do not let the code leave their environment. If you ship code instrumented with temporary hook IDs to an environment in which you do not control the use of IDs, you risk collision with other programs that already use the same IDs in that environment.

Event IDs should be conserved because there are so few of them, but they can be extended by using the 16-bit Data Field. This yields a possible 65536 distinguishable events for every formal hook ID. The only reason to have a unique ID is that an ID is the level at which collection and report filtering are available in the trace facility.

A user-added event can be formatted by the trcrpt command if there is a stanza for the event in the specified trace format file. The trace format file is an editable ASCII file--see "Syntax of Stanzas in the Trace Format File".

Examples of Coding and Formatting Events

The following example shows the use of trace events to time the execution of a program loop:

#include <sys/trcctl.h>
#include <sys/trcmacros.h>
#include <sys/trchkid.h>
char *ctl_file = "/dev/systrctl";
int ctlfd;
int i;
main()
{
  printf("configuring trace collection \n");
  if (trcstart("-ad")){
    perror("trcstart");
    exit(1);
  }
  printf("opening the trace device  \n");
  if((ctlfd = open(ctl_file,0))<0){
    perror(ctl_file);
    exit(1);
  }
  printf("turning  trace on \n");
  if(ioctl(ctlfd,TRCON,0)){
    perror("TRCON");
    exit(1);
  }    
  for(i=1;i<11;i++){
    TRCHKL1T(HKWD_USER1,i);
   
    /* The code being measured goes here. The interval */
    /* between occurrences of HKWD_USER1 in the trace  */
    /* file is the total time for one iteration.       */
  }
  printf("turning trace off\n");
  if(ioctl(ctlfd,TRCSTOP,0)){
    perror("TRCOFF");
    exit(1);
  }
   
  printf("stopping the trace daemon \n");
  if (trcstop(0)){
    perror("trcstop");
    exit(1);
  }
   
  exit(0);
}

When you compile the sample program, you need to link to the librts.a library as follows:

$ xlc -O3 sample.c -o sample -l rts

HKWD_USER1 is event ID 010 hexadecimal (you can verify this by looking at /usr/include/sys/trchkid.h). The report facility does not know how to format the HKWD_USER1 event, unless rules are provided in the trace format file. The following example of a stanza for HKWD_USER1 could be used.

# User event HKWD_USER1 Formatting Rules Stanza
# An example that will format the event usage of the sample program
010 1.0 L=APPL "USER EVENT - HKWD_USER1" O2.0      \n \
               "The # of loop iterations =" U4     \n \ 
               "The elapsed time of the last loop = " \
                endtimer(0x010,0x010) starttimer(0x010,0x010)

When entering the example stanza, do not modify the master format file /etc/trcfmt, but instead make a copy and keep it in your own directory (assume you name it mytrcfmt). When you run the sample program, the raw event data is captured in the default log file since no other log file was specified to the trcstart subroutine. You probably want to filter the output report to get only your events. To do this, run the trcrpt command as follows:

trcrpt -d 010 -t mytrcfmt -O "exec=on" > sample.rpt

You can browse sample.rpt to see the result.

Syntax for Stanzas in the Trace Format File

The intent of the trace format file is to provide rules for presentation and display of the expected data for each event ID. This allows new events to be formatted without changing the report facility. Rules for new events are simply added to the format file. The syntax of the rules provides flexibility in the presentation of the data.

The figure "Syntax of a Stanza in a Trace Format File" illustrates the syntax for a given event. A trace format stanza can be as long as required to describe the rules for any particular event. The stanza can be continued to the next line by terminating the present line with a '\' character. The fields are:

event_id Each stanza begins with the three-digit hexadecimal event ID that the stanza describes, followed by a space.
V.R Describes the version (V) and release (R) in which the event was first assigned. Any integers will work for V and R, and users may want to keep their own tracking mechanism.
L= Specifies text indentation level. The text description of an event can begin at various indentation levels. This improves the readability of the report output. The indentation levels correspond to the level at which the system is executing. The recognized levels are application level (APPL), a transitioning system call (SVC), kernel level (KERN), and interrupt (INT).
event_label Specifies an ASCII text string that describes the overall use of the event ID. This is used by the -j option of the trcrpt command to provide a listing of events and their first-level description. The event_label also appears in the formatted output for the event unless the event_label starts with an @ character.
\n The event stanza describes how to parse, label, and present the data contained in an event record. The \n (newline) function can be imbedded in the event stanza to force presentation of the data to a new line. This allows the presentation of the data for an event to be several lines long.
\t Inserts a tab at the point where it is encountered in parsing the description. This is similar to the way the \n function inserts new lines. Spacing can also be inserted by spaces in the data_label or match_label fields.
starttimer(timerID), endtimer(timerID)
                          The timerID is a unique identifier that associates a particular starttimer with a later endtimer that has the same identifier. By (unenforced) convention, the timerID is of the form:

ID of starting event, ID of ending event

When the report facility encounters a starttimer directive while parsing an event, it associates the starting event's time with the specified timerID. When an endtimer with the same timerID is encountered, the report facility shows the delta time (in brackets) that elapsed between the starting event and ending event. The begin- and end-system-call events make use of this capability. On the return from a system-call event, a delta time indicates how long the system call took.

data_descriptor Describes how the data should be consumed, labeled, and presented by the report facility. The syntax of the data_descriptor field is expanded in the second part of the figure "Syntax of a Stanza in a Trace Format File". The various fields of the data_descriptor are described as follows:
format The user can think of the report facility as having a pointer into the data portion of an event. This data pointer is initialized to point to the beginning of the event data (the 16-bit data field in the hook word). The format field describes how much data the report facility should consume from this point and how the data should be considered. For example, a format field of Bm.n tells the report facility to consume m bytes and n bits of data and to consider it as binary data. (The possible format fields are described in following sections.) If the format field is not followed by a comma, the report facility outputs the consumed data in the format specified. If, however, the format field is followed by a comma, it signifies that the data is not to be displayed but instead compared against the following match_values. The data descriptor associated with the matching match_value is then applied to the remainder of the data.
data_label The data_label is an ASCII string that can optionally precede the output of data consumed by the format field.
match_value The match_value is data of the same format described by the preceding format fields. Several match_values typically follow a format field that is being matched. The successive match fields are separated by commas. The last match value is not followed by a comma. A \* is used as a pattern-matching character to match anything. A pattern-matching character is frequently used as the last match_value field to specify default rules if the preceding match_values field did not occur.
match_label The match_label is an ASCII string that will be output for the corresponding match.

All of the possible format fields are described in the comments of the /etc/trcfmt file. A brief introduction to the possibilities is provided here:

Format Field Descriptions
Am.n Specifies that m bytes of data should be consumed as ASCII text and that the text should be displayed in an output field that is n characters wide. The data pointer is moved m bytes.
S1, S2, S4 Specifies left-justified string. The length of the field is defined as 1 byte (S1), 2 bytes (S2), or 4 bytes (S4). The data pointer is moved accordingly.
Bm.n Specifies binary data of m bytes and n bits. The data pointer is moved accordingly.
Xm Specifies hexadecimal data of m bytes. The data pointer is moved accordingly.
D2, D4 Specifies signed decimal format. Data length of 2 (D2) bytes or 4 (D4) bytes is consumed.
U2, U4 Specifies unsigned decimal format. Data length of 2 or 4 bytes is consumed.
F4, F8 Specifies floating point of 4 or 8 bytes.
Gm.n Specifies that the data pointer should be positioned m bytes and n bits into the data.
Om.n Omits, from the current location of the data pointer, the next m bytes and n bits.
Rm Reverses the data pointer m bytes.

Some macros are provided that can be used as format fields to quickly access data. For example:

$HD, $D1, $D2, $D3, $D4, $D5 Access the 16-bit data field of the hook word and data words 1 through 5 of the event record, respectively. The data pointer is not moved. The data accessed by a macro is hexadecimal by default. A macro can be cast to a different data type (X, D, U, B) by using a "%" character followed by the new format code. For example:
$D1%B2.3

This macro causes data word 1 to be accessed but to be considered as 2 bytes and 3 bits of binary data.

The comments in the /etc/trcfmt file describe other format and macro possibilities and describe how a user can define additional macros.


[ Previous | Next | Contents | Glossary | Home | Search ]