Performance Toolbox Version 1.2 and 2 for AIX: Guide and Reference

Chapter 11. Monitoring Remote Systems

This chapter provides information about monitoring remote systems.

Remote Monitoring Overview

The Performance Toolbox for AIX Agent component is a collection of programs that make it possible for a host to act as a provider of performance statistics across a network or locally. The key program is the daemon xmservd. This chapter, "Recording Performance Data on Remote Systems", and "SNMP Multiplex (SMUX) Interface" describe the features of xmservd. "Data Reduction and Alarms with filtd" describes the other important daemon in the Agent component, filtd. Finally, "System Performance Measurement Interface API" describes the local API provided with the Agent.

The remainder of this chapter first explains important features of the System Performance Measurement Interface (SPMI), which is the mechanism that provides statistics to xmservd, and then explains in detail how monitoring of remote systems is made possible. For that discussion, we have adopted the term data-supplier host to describe a host that supplies statistics to another host across a network, while a host receiving the statistics over the network, processing, and displaying them is called a data-consumer host.

The System Performance Measurement Interface

Monitoring statistics supplied by xmservd is made possible through an API called System Performance Measurement Interface (SPMI). Through the SPMI, an application can access statistics available on the local system. This is done by defining sets of statistics (statsets). Observations are taken for all the statistics of a statset at the same time. The concept of statsets is key to understanding how statistics are monitored. It is explained in the section entitled "Statsets".

The SPMI makes extensive use of shared memory. Similarly, any dynamic data-supplier programs that extend the set of provided statistics use shared memory to export their data. The xmservd daemon (and properly written dynamic data-supplier programs) allocates and frees shared memory segments when starting and terminating. Some important things to know about the use of shared memory are explained in "Shared Memory Types" and subsequent sections.

Statsets

Each system provides a range of statistics, some of which are fixed while others, such as process statistics, come and go over time. Most monitoring tasks involve the monitoring of more than one of the statistics provided by a system. In the simplest possible way to access the statistics, the requestor would ask for observations for each statistic and would issue a series of requests to get multiple statistics. This would create a number of inconveniences:

All of these inconveniences are eliminated by the definition of statsets as implemented in the SPMI. Statsets represent views of the entire data repository of statistics and are implemented as data structures that are used to keep track of delta values (difference between the latest observation and the previous one) for statistics. The only way an application program can read observations is by defining a statset and then requesting a reading for all the statistics in the statset. Because statsets are defined to the SPMI, which permits access to local statistics, all statistics in a statset must come from the same system.

The concept of statsets applies throughout Performance Toolbox for AIX. The program xmperf uses statsets to define instruments. There's always a one-to-one relationship between an xmperf instrument and a statset. Similarly, every right side column of a 3dmon graph corresponds to a statset.

Statsets are also closely related to the data packets that carry observations over the network. The xmservd daemon supplies data across the network in the form of data packets that correspond to statsets. Each data packet contains a time stamp that shows when a set of observations was taken and the elapsed time since the previous observation. It then contains two fields for each of the statistics in the statset. The first gives the delta value. The second contains the actual observation value.
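
The relationship between a statset and its data packets can be sketched as follows. This is an illustration only, not the SPMI or xmservd implementation: the class names, field names, and statistic names are hypothetical, and the real packet layout is not described in this chapter. The sketch shows only that all statistics of a statset are read together and that each one is reported as a (delta, actual) pair.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DataPacket:
    """One observation of a whole statset (hypothetical layout)."""
    timestamp: float                    # when the observations were taken
    elapsed: float                      # time since the previous observation
    values: List[Tuple[float, float]]   # (delta, actual) per statistic


@dataclass
class StatSet:
    """Hypothetical statset: an ordered list of statistic names."""
    names: List[str]
    _last: Dict[str, float] = field(default_factory=dict)
    _last_time: Optional[float] = None

    def observe(self, now: float, readings: Dict[str, float]) -> DataPacket:
        # All statistics in the set are read at the same instant.
        pairs = []
        for name in self.names:
            actual = readings[name]
            # The first observation has no predecessor, so its delta is 0.
            delta = actual - self._last.get(name, actual)
            pairs.append((delta, actual))
            self._last[name] = actual
        elapsed = 0.0 if self._last_time is None else now - self._last_time
        self._last_time = now
        return DataPacket(now, elapsed, pairs)
```

Calling observe twice on the same StatSet yields a second packet whose deltas are the differences between the two readings.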

In recording files, value records are used to carry observations. They have the same contents as the data packets and maintain the concept of statsets. When recordings are played back with xmperf, the statsets are used to define instruments. When a recording file is analyzed by azizo, statsets are not important but they are preserved when writing a filtered recording file, if requested.

Shared Memory Types

Two types of shared memory are used by the daemon and dynamic data-supplier programs. The first is called common shared memory and is memory that all running dynamic data-supplier and local data-consumer programs (data-consumer programs that do not use the Remote Statistics Interface API) share with the daemon. The second type is allocated in one copy for each dynamic data-supplier program and is supposed to be deallocated and removed by that dynamic data-supplier program when the program exits. This second type of shared memory is called DDS shared memory.

Common Shared Memory

The common shared memory is allocated by the SPMI library on behalf of whichever local data-consumer or data-supplier program (including xmservd) starts first. Each additional such program detects the common shared memory and uses the allocated segment. A counter in the common shared memory segment is incremented by one for each starting data-supplier or local data-consumer program and is decremented by one whenever one of the running programs terminates. When the counter reaches zero, the common shared memory segment is released.

Properly written data-supplier and local data-consumer programs issue a subroutine call when they terminate. This call detaches from the common shared memory segment and decrements the counter. To be properly written, the programs must detect various signals, which indicate that the program (process) is about to terminate. When one of the signals is received the program must issue the subroutine call. The call must also be issued when the program terminates normally.

Most signals can be detected by a program, but some cannot. If one of the undetectable signals causes the program (process) to terminate, the subroutine call is not issued. As a result, the common shared memory segment is never released, because the counter never reaches zero. If this happens, you must release the common shared memory manually, as described in "Releasing Shared Memory Manually".

To avoid the situation, never kill a data-supplier or local data-consumer program with the option -9. That would terminate the program with a SIGKILL signal, which is not detectable.
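
The effect of the counter can be illustrated with a toy model. The class and function below are hypothetical and are not the SPMI data structures; they only mimic the counting rule described above, in which a program killed with SIGKILL never decrements the counter.

```python
class CommonSegment:
    """Toy model of the attach counter; not the real SPMI structure."""

    def __init__(self):
        self.allocated = False
        self.users = 0

    def attach(self):
        # The first program to start allocates the segment.
        if not self.allocated:
            self.allocated = True
        self.users += 1

    def detach(self):
        # Issued by the termination call of a properly written program.
        self.users -= 1
        if self.users == 0:
            self.allocated = False  # segment released


def still_allocated(exits):
    """Start one program per entry, then end each one as described.

    An entry of "clean" detaches; any other entry (for example "kill -9")
    models a termination the program could not detect.
    """
    segment = CommonSegment()
    for _ in exits:
        segment.attach()
    for how in exits:
        if how == "clean":
            segment.detach()
    return segment.allocated
```

With two clean exits the segment is released; if one of the two programs is killed with -9, the counter stays at one and the segment remains allocated.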

DDS Shared Memory

A dynamic data-supplier program exports its data through a private shared memory area, called the DDS shared memory area, which is allocated by the SPMI API. If the memory area exists when the dynamic data-supplier program starts, the program terminates. This ensures that the same dynamic data-supplier program is not running in multiple copies. It also places the responsibility for releasing the DDS shared memory segment whenever the dynamic data-supplier program terminates on the program itself.

Properly designed dynamic data-supplier programs detect the signals that cause termination and issue a subroutine call to release shared memory. A single subroutine call is used to release DDS shared memory and disassociate the program from the common shared memory.

If a dynamic data-supplier program is terminated in a way that cannot be detected from the program itself, the DDS shared memory is not released and subsequent attempts to start the dynamic data-supplier program fail. If this happens, you must release the DDS shared memory manually, as described in the section entitled "Releasing Shared Memory Manually".

To avoid the situation, never kill a dynamic data-supplier program with the option -9. That would terminate the program with a SIGKILL signal, which is not detectable.

Releasing Shared Memory Manually

In situations where one or more data-supplier or local data-consumer programs have terminated in such a way that their shared memory allocations have not been released, the shared memory segments should be released from the command line before attempting to restart the programs. It is recommended that all data-supplier and local data-consumer programs, including xmservd, are killed before you attempt to release shared memory. Clearing of all shared memory segments could be done through the following steps.

  1. Identify all data-supplier and local data-consumer programs that are running. Use the ps command and your knowledge of the programs in use on your system to locate all of them. For each of the running data-supplier or local data-consumer programs, note their process IDs.
  2. Kill all processes associated with data-supplier and local data-consumer programs without using a command line flag.
  3. Verify that all data-supplier and local data-consumer processes have been killed. If not, use the kill -9 command to kill them.
  4. List the shared memory segments in use with the command ipcs -m. This produces a list like the following:
    IPC status from /dev/mem as of 
    Fri Dec 31 07:54:44 CST 1993 
    T  ID       KEY        MODE       OWNER  GROUP 
    Shared Memory: 
    m      0  0x0d050296 --rw-------   root system
    m  20481  0x5806188b --rw-rw-rw- nchris system
    m  28674  0x780502ea --rw-rw-rw-   root system
    m  12292  0x780502e3 --rw-rw-rw-   root system
    m  20485  0x780502d1 --rw-rw-rw-   root system
  5. Identify all shared memory segments with a KEY that begins with "0x78". All shared memory allocated by data-supplier and local data-consumer programs has this key. Now use the ipcrm command to remove the shared memory segments, specifying as command arguments the IDs of the segments you want to remove. To remove all three data-supplier and local data-consumer segments listed above, your command would be:
    ipcrm -m 28674 -m 12292 -m 20485
  6. Restart the dynamic data-supplier and data-consumer programs as required. To start xmservd and any dynamic data-supplier programs started by it, simply execute the command xmpeek.
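
Steps 4 and 5 can be sketched as a small helper that reads the text printed by ipcs -m and builds the matching ipcrm command. This is a minimal sketch: the function names are hypothetical, the column layout is assumed to match the sample listing above, and the helper only builds the command string without running it.

```python
def toolbox_segment_ids(ipcs_output, key_prefix="0x78"):
    """Return the IDs of shared memory segments whose KEY starts with key_prefix."""
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Segment lines look like: m <ID> <KEY> <MODE> <OWNER> <GROUP>
        if len(fields) >= 3 and fields[0] == "m" and fields[2].startswith(key_prefix):
            ids.append(fields[1])
    return ids


def ipcrm_command(ids):
    """Build one ipcrm command that removes all listed segments."""
    if not ids:
        return ""
    return "ipcrm " + " ".join("-m " + segment_id for segment_id in ids)


SAMPLE = """\
IPC status from /dev/mem as of
Fri Dec 31 07:54:44 CST 1993
T  ID       KEY        MODE       OWNER  GROUP
Shared Memory:
m      0  0x0d050296 --rw-------   root system
m  20481  0x5806188b --rw-rw-rw- nchris system
m  28674  0x780502ea --rw-rw-rw-   root system
m  12292  0x780502e3 --rw-rw-rw-   root system
m  20485  0x780502d1 --rw-rw-rw-   root system
"""

if __name__ == "__main__":
    print(ipcrm_command(toolbox_segment_ids(SAMPLE)))
```

Review the generated command before running it, because removing a segment that is still in use by a running program disrupts that program.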

The xmservd Command Line

The xmservd daemon is always started from inetd. Therefore, command line options must be specified on the line defining xmservd to inetd in the file /etc/inetd.conf. The general format of the command line is:

xmservd [-v] [-b UDP_buffer_size] [-i min_remote_interval] [-l remove_consumer_timeout] [-m supplier_timeout] [-p trace_level] [-s max_logfile_size] [-t keep_alive_limit] [-x xmservd_execution_priority]

All command line options are optional. The options are:

v
Verbose. Causes parsing information for the xmservd recording configuration file to be written to the xmservd log file.
b
Defines the size of the buffer used by the daemon to send and receive UDP packets. The buffer size must be specified in bytes and can be from 4,096 to 16,384 bytes. The buffer size determines the maximum number of data values that can be sent in one data_feed packet. The default buffer size is 4096 bytes, which allows for up to 124 data values in one packet.
i
Defines the minimum interval, in milliseconds, at which data feeds can be sent. Default is 500 milliseconds. A value between 100 and 5,000 milliseconds can be specified. Any value specified is rounded to a multiple of 100 milliseconds. Whichever minimum remote interval is specified causes all requests for data feeds to be rounded to a multiple of this value. See further details in section "Rounding of Sampling Interval".
l
(Lowercase L). Sets the time_to_live after feeding of statistics data has ceased as described in section "Life and Death of xmservd". Must be followed by a number of minutes. A value of 0 (zero) minutes causes the daemon to stay alive forever. The default time_to_live is 15 minutes.

This value is also used to control when to remove inactive data consumers as described in "Removing Inactive Data Consumers".

m
When a dynamic data-supplier is active, this value sets the number of seconds of inactivity from the DDS before the SPMI assumes the DDS is dead. When the timeout value is exceeded, the SiShGoAway flag is set in the shared memory area and the SPMI disconnects from the area. If this flag is not given, the timeout period is set to 90 seconds.
The size of the timeout period is kept in the SPMI common shared memory area. The value stored is the maximum value requested by any data consumer program, including xmservd.
p
Sets the trace level, which determines the types of events written to the log file /etc/perf/xmservd.log1 or /etc/perf/xmservd.log2. Must be followed by a digit from 0 to 9, with 9 being the most detailed trace level. Default trace level is 0 (zero), which disables tracing and logging of events but logs error messages.
s
Specifies the approximate maximum size of the log files. At least every time_to_live minutes, the daemon checks whether the currently active log file is bigger than max_logfile_size. If so, the current log file is closed and logging continues to the alternate log file, which is first reset to zero length. The two log files are /etc/perf/xmservd.log1 and /etc/perf/xmservd.log2. Default maximum file size is 100,000 bytes. You cannot make max_logfile_size smaller than 5,000 or larger than 10,000,000 bytes.
t
Sets the keep_alive_limit described in section "Life and Death of xmservd" . Must be followed by a number of seconds from 60 to 900 (1 to 15 minutes). Default is 300 seconds (5 minutes).
x
Sets the execution priority of xmservd. Use this option if the default execution priority of xmservd is unsuitable in your environment. Generally, the daemon should be given as high execution priority as possible (a smaller number gives a higher execution priority).
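
The numeric ranges given for the options above can be collected into a simple check, which may be convenient before editing /etc/inetd.conf. This is a sketch only: the function and table names are hypothetical, xmservd performs its own validation, and the rounding of -i to a multiple of 100 milliseconds is not modeled here.

```python
# Documented ranges, keyed by option letter (values taken from the list above).
LIMITS = {
    "b": (4096, 16384),        # UDP buffer size, bytes
    "i": (100, 5000),          # minimum remote interval, milliseconds
    "p": (0, 9),               # trace level
    "s": (5000, 10_000_000),   # maximum log file size, bytes
    "t": (60, 900),            # keep_alive_limit, seconds
}


def out_of_range(options):
    """Return {flag: value} for every option outside its documented range."""
    bad = {}
    for flag, value in options.items():
        if flag in LIMITS:
            low, high = LIMITS[flag]
            if not low <= value <= high:
                bad[flag] = value
    return bad
```

Options without a documented numeric range (-v, -l, -m, -x) are ignored by this check.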

Rounding of Sampling Interval

As explained under the -i command line argument, all sampling intervals requested by remote data-consumer programs are rounded to the effective minimum sampling interval of xmservd. This can cause unintended rounding of sampling intervals as shown in the table "Rounding of Sampling Interval by xmservd". This rounding can be eliminated by always using 100 milliseconds as the minimum sampling interval. However, if you use 100 milliseconds, and remote data-consumer programs use a wide variety of sampling intervals, then the overhead of xmservd increases because it has to set its interval timer to do processing more frequently. Generally, the minimum sampling interval should be set to as large a value as possible, preferably 1000 milliseconds or more.

The following example illustrates rounding of sampling interval for different minimum sampling intervals and various requested sampling intervals:

minimum remote interval  requested interval  resulting interval
-----------------------  ------------------  ------------------
        200                     500             600
        200                     3,000           3,000
        200                     1,000           1,000
      
        300                     500             600
        300                     3,000           3,000
        300                     1,000           900
        
        400                     500             400
        400                     3,000           3,200
        400                     1,000           1,200
       
        500                     500             500
        500                     3,000           3,000
        500                     1,000           1,000
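
The values in the table are consistent with rounding the requested interval to the nearest multiple of the minimum remote interval, with exact halves rounded up. The function below is a sketch of that rule, inferred from the table rather than taken from the xmservd source; the function name is hypothetical, and the treatment of requests smaller than half the minimum (raised to one full minimum interval) is an assumption.

```python
def round_interval(requested_ms, minimum_ms):
    """Round requested_ms to the nearest multiple of minimum_ms (halves round up)."""
    multiples = (requested_ms + minimum_ms // 2) // minimum_ms
    # Assumption: a request can never be rounded down to zero.
    return max(1, multiples) * minimum_ms


if __name__ == "__main__":
    for minimum in (200, 300, 400, 500):
        for requested in (500, 3000, 1000):
            print(minimum, requested, round_interval(requested, minimum))
```

For example, a request for 1,000 milliseconds against a minimum of 300 gives 900, and the same request against a minimum of 400 gives 1,200.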

The xmservd Interface

The xmservd daemon is designed to be started from the inetd "super daemon." Even when you start the daemon manually it reschedules itself via inetd and lets the manually started process die. The following sections describe how xmservd starts, terminates, and keeps track of data-consumer programs.

Life and Death of xmservd

The xmservd daemon must be configured as an inetd daemon to run properly. If you do start the daemon manually, it attempts to reschedule itself by invoking the program xmpeek and then exits. This causes xmservd to be rescheduled via inetd. The line defining the daemon in /etc/inetd.conf must specify the "wait" option to prevent inetd from starting more than one copy of the daemon at a time. The file /etc/inetd.conf is prepared during the installation of the Agent component.

If you want the daemon to be started automatically as part of the boot process, you can add the following two lines at the very end of the file /etc/rc.tcpip:

/usr/bin/sleep 10   
/usr/bin/xmpeek

The first line is necessary only when you intend to use the xmservd/SMUX interface to export statistics to the local SNMP agent.

Note: The xmservd/SMUX interface is only available on RS/6000 Agents.

"SNMP Multiplex (SMUX) Interface" describes the xmservd/SMUX interface. The line with the sleep command makes sure the start of the snmpd daemon is completed before xmservd starts. The second line uses the program xmpeek (described later in this chapter) to kick off the xmservd daemon.

The xmservd daemon is started by inetd immediately after a UDP datagram is received on its port. Note that the daemon is not scheduled by a request through the SMUX interface from the local SNMP agent. This is because the SNMP agent uses a different port number. Unless xmservd ends abnormally or is killed, it continues to run as long as any data-consumer needs its input or a connection to the SNMP agent is established and alive. When no data-consumer needs its input and either no connection was established through the SMUX interface or any such connection is terminated, the daemon hangs around for time_to_live minutes as specified with the -l (lowercase L) command line argument to xmservd. The default number of time_to_live minutes is 15.

In some environments, it may take some time for a system's xmservd daemon to respond to invitations from remote data-consumers. This can be because the network route is long or the network congested; it may be because all memory on the system is in use so pages must be paged out before xmservd can be loaded; it may be because the xmservd executable is loaded off a server, as on diskless systems. In any of these cases, the data-consumer program may not receive a response from the system in time. Most remote data-consumer programs in Performance Toolbox for AIX have ways to extend the time they wait for responses. See the command lines for each data-consumer program for specifics.

Whenever a connection to the SNMP agent through the SMUX interface is active, or whenever xmservd is configured to record performance data to a file (see "Recording Performance Data on Remote Systems" ) the daemon does not time out and die even when there are no data-consumers to supply. In these situations, the time_to_live limit is used only to determine when to look for inactive remote consumers that can be deleted from the tables in xmservd.

Signals Understood by xmservd

Like many other daemons, xmservd interprets the receipt of the signal SIGHUP (kill -1) as a request to refresh itself. It does this by spawning another copy of itself via inetd and killing itself. When this happens, the spawned copy of xmservd is initially unaware of any data consumers that may have been using the copy of xmservd that received the signal. Consequently, all data-consumer programs must request a resynchronizing with the spawned daemon to continue their monitoring.

The other signal recognized by xmservd is SIGINT (kill -2), which causes the daemon to dump any MIB data it has to a file as described in the section "Interaction Between xmservd and SNMP".

Removing Inactive Data Consumers

When a data-consumer program such as xmperf uses broadcasts to contact data-supplier hosts, most likely the monitor defines instruments (each of which causes xmservd to create a statset) with only a few of the daemons that respond. Consequently, most daemons have been contacted by many data consumers but supply statistics to only a few. This causes the host tables in the daemon to swell and, in the case of large installations, can induce unnecessary load on the daemon. To cope with this, the daemon attempts to get rid of data consumers that appear not to be interested in its service.

The time_to_live parameter is used to check for inactive partners. A data consumer is removed from the daemon's tables if either of the following conditions is true:

  1. No packet was received from the data consumer for twice the time_to_live period and no statsets were defined for the data consumer.
  2. No packet was received from the data consumer for eight times the time_to_live period and none of the defined statsets are feeding data to the data consumer.

A data consumer that is subscribing to except_rec messages is treated as if it had a statset defined with the daemon.
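
The two removal conditions can be written as a single predicate. This is a sketch of the rules as stated above, not the daemon's code: the function and parameter names are hypothetical, and treating "for twice the time_to_live period" as "at least twice" is an assumption.

```python
def should_remove(idle_minutes, time_to_live, statsets_defined,
                  statsets_feeding, wants_exceptions=False):
    """Decide whether a data consumer is dropped from the daemon's tables."""
    # A consumer subscribing to except_rec messages counts as having a statset.
    if wants_exceptions:
        statsets_defined = max(statsets_defined, 1)
    # Condition 1: silent for twice time_to_live and no statsets defined.
    if idle_minutes >= 2 * time_to_live and statsets_defined == 0:
        return True
    # Condition 2: silent for eight times time_to_live and nothing being fed.
    if idle_minutes >= 8 * time_to_live and statsets_feeding == 0:
        return True
    return False
```

With the default time_to_live of 15 minutes, a consumer with no statsets is removed after 30 silent minutes, and one with defined but inactive statsets after 120 silent minutes.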

Checking that Data Consumers are Alive

Once xmservd is running and supplying input to one or more data consumers, it must make sure that the data consumers are still alive and needing its input. If not, it would be a waste of system resources to continue sending statistics across the network. The daemon uses a keep_alive_limit to determine when it's time to check that data-consumer hosts are still alive. The alive limit is reset whenever the user makes changes to the remote monitoring configuration from the data-consumer host, but not when data is fed to the data consumer.

When the keep_alive_limit is reached, xmservd sends a message of type still_alive to the data consumer. The data-consumer program has keep_alive_limit seconds to respond. If a response is not received after keep_alive_limit seconds, the daemon sends another still_alive message and waits another keep_alive_limit seconds. If there's still no response, the daemon assumes the data consumer to be dead or no longer interested and stops sending statistics to it. The default keep_alive_limit is 300 seconds (five minutes); it can be set with the -t command line argument to xmservd.
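
The timing of this exchange can be sketched as follows. The function is hypothetical and only adds up the waiting periods described above, measured in seconds from the moment the alive limit was last reset.

```python
def feed_stop_time(keep_alive_limit, answers_first=False, answers_second=False):
    """Return the second at which feeding stops, or None if it continues."""
    first_probe = keep_alive_limit                  # limit reached: send still_alive
    if answers_first:
        return None
    second_probe = first_probe + keep_alive_limit   # no reply: send still_alive again
    if answers_second:
        return None
    return second_probe + keep_alive_limit          # still no reply: stop feeding
```

With the default keep_alive_limit of 300 seconds, a data consumer that never answers stops receiving statistics 900 seconds after the last reset.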

Handling Exceptions

Through the program filtd described in "Data Reduction and Alarms with filtd", you can define exception conditions that can cause one or more actions to be taken. One such action is the execution of a command on the host where the daemon runs; another is the sending of an exception message. The message type except_rec is used for the latter.

Each exception message contains:

  1. The host name of the host sending the exception message.
  2. The time when the exception was detected.
  3. The severity of the exception, a number between 0 and 10.
  4. The minimum number of minutes between two exception messages from a given exception definition.
  5. A symbolic name describing the exception.
  6. A more verbose description of the exception.
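
The six items above can be pictured as a record. The class below is a hypothetical in-memory representation for illustration; the field names and types are assumptions, and it says nothing about how an except_rec packet is laid out on the wire.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExceptionMessage:
    """Illustrative container for the fields of one exception message."""
    host: str                  # host sending the exception message
    detected_at: float         # time when the exception was detected
    severity: int              # a number between 0 and 10
    min_minutes_between: int   # minimum minutes between two messages
    name: str                  # symbolic name of the exception
    description: str           # more verbose description

    def __post_init__(self):
        if not 0 <= self.severity <= 10:
            raise ValueError("severity must be between 0 and 10")
```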

The xmservd daemon sends exception messages to every host known to it that has declared that it wants to receive them. The RSiOpen and RSiInvite subroutine calls of the API are used by the data-consumer application to declare whether it wants to receive exception messages.

The program exmon is especially designed to monitor exception messages. It allows its user to specify which hosts to monitor for exceptions and displays a window with a matrix that shows which hosts generated exceptions, what types were generated, and how many of each type. This program is described in "Monitoring Exceptions with exmon".

Currently, xmperf does not request exception messages unless you set the X resource GetExceptions to true or use the -x command line argument. If you have requested exceptions this way and one is received by xmperf, it is sent to the xmperf main window where it appears as a text message. No other action is taken by xmperf.

Session Recovery by xmservd

If the xmservd daemon dies or is killed while one or more data consumers have statsets defined with it, the daemon attempts to record the connections in the file /etc/perf/xmservd.state. If this file exists when xmservd later is restarted, a message of type i_am_back is sent to each of the data-consumer hosts recorded in the file. The file is then erased.

If the programs acting as data consumers are capable of doing a resynchronizing, the interrupted monitoring can resume swiftly and without requiring manual intervention. The xmperf and 3dmon programs can and do resynchronize all active monitors for a host whenever an i_am_back message is received from that host.

The xmquery Network Protocol

We have already mentioned several types of messages (packets) that flow between data-supplier hosts and data-consumer hosts. Message types are organized in four groups as follows:

Configuration Messages
        create_stat_set    Type = 01
        del_set_stat       Type = 02
        first_cx           Type = 03
        first_stat         Type = 04
        instantiate        Type = 05
        next_cx            Type = 06
        next_stat          Type = 07
        path_add_set_stat  Type = 08
        path_get_cx        Type = 09
        path_get_stat      Type = 10
        stat_get_path      Type = 11
       
Data Feed and Feed Control Messages
        begin_feeding      Type = 31
        change_feeding     Type = 32
        end_feeding        Type = 33
        data_feed          Type = 34
        going_down         Type = 35
     
Session Control Messages
        are_you_there      Type = 51
        still_alive        Type = 52
        i_am_back          Type = 53
        except_rec         Type = 54
           
Status Messages
        send_status        Type = 81
        host_status        Type = 82
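
For programs that need to refer to these message types by name, the table can be transcribed as an enumeration. This is a sketch: the class and function names are hypothetical, and the type numbers are assumed to be decimal values exactly as printed above.

```python
from enum import IntEnum


class MessageType(IntEnum):
    # Configuration messages
    create_stat_set = 1
    del_set_stat = 2
    first_cx = 3
    first_stat = 4
    instantiate = 5
    next_cx = 6
    next_stat = 7
    path_add_set_stat = 8
    path_get_cx = 9
    path_get_stat = 10
    stat_get_path = 11
    # Data feed and feed control messages
    begin_feeding = 31
    change_feeding = 32
    end_feeding = 33
    data_feed = 34
    going_down = 35
    # Session control messages
    are_you_there = 51
    still_alive = 52
    i_am_back = 53
    except_rec = 54
    # Status messages
    send_status = 81
    host_status = 82


GROUPS = [
    (range(1, 12), "configuration"),
    (range(31, 36), "data feed and feed control"),
    (range(51, 55), "session control"),
    (range(81, 83), "status"),
]


def group_of(msg_type):
    """Return the name of the group a message type belongs to."""
    for codes, name in GROUPS:
        if int(msg_type) in codes:
            return name
    raise ValueError("unknown message type: %r" % (msg_type,))
```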

Configuration Messages

All the configuration messages are specific to the negotiation between the data consumer and the data supplier about what statistics should be sent by the data supplier. We shall not go into detail about these message types but only note that all such messages require a response and that they are all initiated by the data consumer.

Data Feed and Data Feed Control Messages

Once the negotiation of what data to supply is completed, the data-supplier host's xmservd maintains a set of information about the statistics to supply. A separate set is kept for each data-consumer program. No feeding of data is started until a begin_feeding message is received from the data-consumer program. The begin_feeding message includes information about the frequency of data feeds and causes xmservd to start feeding data at that frequency, using data_feed packets.

Data feed to a data consumer continues until that data consumer sends an end_feeding message or until the data consumer no longer responds to still_alive messages. At that time data feeding stops.

The frequency of data feeds can be changed by the data-consumer program by sending the change_feeding message. This message is sent whenever the user changes the interval property of an xmperf instrument.

The final message type in this group is going_down. This message is sent by xmperf and the other remote data-consumer programs in Performance Toolbox for AIX whenever they terminate in an orderly manner and whenever any other program written to the RSi API (see "The Remote Statistics Interface API") issues the RSiClose call. The message is sent to all data-supplier hosts that the data-consumer program knows about (or the host RSiClose is issued against) and causes the daemons on the data-supplier hosts to erase all information about the terminating data-consumer program.

Session Control Messages

We have already mentioned two of the session control message types in previous sections. To recap, are_you_there is sent from a data consumer to provoke potential data-supplier hosts to identify themselves. The still_alive message is the only message type that is initiated by xmservd without input from a data consumer. It prompts remote monitors to respond and thus prove that they are still alive.

The third session control message is the i_am_back message, which is always the response to the first message xmservd receives from a data consumer.

Resynchronizing in xmperf

When an i_am_back message is received by a data-consumer host's xmperf program, it responds by marking the configuration tables for the data-supplier host as void. This is because the data-supplier host's xmservd daemon has obviously restarted, which means that earlier negotiations about statsets are now invalidated.

If an i_am_back message is received from a remote supplier while an instrument for that supplier is active, a renegotiation for that instrument is started immediately. If other remote instruments for the supplier are defined to the data-consumer host, renegotiation for those instruments is delayed until the time each instrument is activated.

Renegotiation is not started unless xmperf on the data-consumer host takes action. It is quite possible that a data-supplier host is rebooted and its xmservd daemon therefore goes quietly away. The data consumer no longer receives data, and the remote instrument(s) stop playing. Currently, no facility detects this situation, but a menu option allows the user to "resynchronize" with a data supplier. When this option is chosen, an are_you_there message is sent from xmperf. If the data-supplier daemon is running or can be started, it responds with an i_am_back message and renegotiation starts.

Status Messages and the xmpeek Program

If a large number of data-consumer programs are each monitoring several statistics from a single data-supplier host, the sheer number of requests that must be processed can result in more load on the data-supplier host than is feasible.

Two features allow you to control the daemon on any host you are responsible for. The first one is a facility to display the status of a daemon, as described in this section. The other is the ability to control the access to the xmservd daemon as described in "Limiting Access to Data-Supplier".

Because the xmservd daemon runs in the background and may start and stop as required, special action is needed to determine the status of the daemon. Such action is implemented through the two message types send_status and host_status. The first can be sent to any xmservd daemon, which then responds by returning the message with total counts for the daemon's activity, followed by a message of type host_status for each data consumer it knows.

A program called xmpeek is supplied as part of the Performance Toolbox for AIX. This program allows you to ask any host about the status of its xmservd daemon. The command line is simple:

xmpeek [-a|-l] [hostname]

Both flags are optional. The -l flag (lowercase L) is explained in "Using the xmpeek Program to Print Available Statistics". If the flag -a is specified, one line is listed for each data consumer known by the daemon. If omitted, only data consumers that currently have instruments (statsets) defined with the daemon are listed.

If a host name is specified, the daemon on the named host is asked. If no host name is specified, the daemon on the local host is asked. The following is an example of the output from the xmpeek program:

Statistics for xmservd daemon on *** birte ***
Instruments currently defined:    1
Instruments currently active:     1
Remote monitors currently known:  2
--Instruments--- Values  Packets
Defined  Active  Active  Sent    Internet Address  Port  Hostname 
------- ------- ------- ------- ----------------   ----  --------  
1        1       16      3,344   129.49.115.208    3885  xtra

Output from xmpeek can take two forms.

The first form is a line that informs you that the xmservd daemon is not feeding any data-consumer programs. This form is used if no statsets are defined with the daemon and no command flags are supplied.

The second form includes at least as much as is shown in the Sample Output from xmpeek, except that the single detail line for the data consumer on host xtra is shown only if the -a flag is used or if the data consumer has at least one instrument (statset) defined with the daemon. Note that xmpeek itself appears as a data consumer because it uses the RSi API to contact the daemon. Therefore, the output always shows at least one known monitor.

In the fixed output, first the name of the host where the daemon is running is shown. Then follows three lines giving the totals for current status of the daemon. In the above example, you can see that only one instrument is defined and that it's active. You can also see that two data consumers are known by the daemon, but that only one of them has an instrument defined with the daemon in birte. Obviously, this output was produced without the -a flag.

An example of more activity is shown in the following example output from xmpeek. The output is produced with the command:

xmpeek -a birte 

Notice that some detail lines show zero instruments defined. Such lines indicate that an are_you_there message was received from the data consumer but that no statsets were ever defined or that any previously defined statsets were erased.

Statistics for xmservd daemon on *** birte ***
   Instruments currently defined:    16   
   Instruments currently active:     14   
   Remote monitors currently known:   6
--Instruments--- Values  Packets
Defined Active  Active     Sent    Internet Address Port  Hostname
------- ------- ------- ---------- ---------------- ----  --------
   8       8       35    10,232    129.49.115.203  4184   birte  
   6       4       28     8,322    129.49.246.14   3211   umbra  
   0       0        0         0    129.49.115.208  3861   xtra  
   1       1       16     3,332    129.49.246.14   3219   umbra  
   0       0        0         0    129.49.115.203  4209   birte
   1       1       16       422    129.49.115.208  3874   xtra
------- ------- ------- ----------  
  16      14       95    22,308

Notice that the same host name may appear more than once. This is because every running copy of xmperf and every other active data-consumer program is counted and treated as a separate data consumer, each identified by the port number used for UDP packets as shown in the xmpeek output.

The second detail line in the Sample Output from xmpeek shows that one particular monitor on host umbra has six instruments defined but only four active. This would happen if a remote xmperf console has been opened but is now closed. When you close an xmperf console, it stays in the Monitor menu of the xmperf main window and the definition of the instruments of that console remains in the tables of the data-supplier daemon but the instruments are not active.
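The detail lines described above lend themselves to simple post-processing. The following is a minimal sketch in Python, not part of the Performance Toolbox, that assumes the column layout shown in the sample output (four counts, an Internet address, a port, and a host name). The names Consumer, parse_consumers, and idle_instruments are hypothetical. The sketch identifies each data consumer by its address and port and reports consumers that have more instruments defined than active.

```python
import re
from typing import List, NamedTuple


class Consumer(NamedTuple):
    defined: int
    active: int
    values: int
    packets: int
    address: str
    port: int
    hostname: str


# Assumed layout of a detail line: four counts, a dotted address,
# a UDP port, and a host name. Counts may contain thousands separators.
DETAIL = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([\d,]+)\s+([\d,]+)\s+"
    r"(\d+\.\d+\.\d+\.\d+)\s+(\d+)\s+(\S+)\s*$"
)


def parse_consumers(text: str) -> List[Consumer]:
    """Return one Consumer per detail line; other lines are skipped."""
    consumers = []
    for line in text.splitlines():
        m = DETAIL.match(line)
        if m is None:
            continue  # heading, rule, or totals line
        d, a, v, p, addr, port, host = m.groups()
        consumers.append(Consumer(
            int(d), int(a),
            int(v.replace(",", "")), int(p.replace(",", "")),
            addr, int(port), host,
        ))
    return consumers


def idle_instruments(consumers: List[Consumer]) -> List[Consumer]:
    """Consumers with instruments that are defined but not active."""
    return [c for c in consumers if c.defined > c.active]


SAMPLE = """\
   8       8       35    10,232    129.49.115.203  4184   birte
   6       4       28     8,322    129.49.246.14   3211   umbra
   0       0        0         0    129.49.115.208  3861   xtra
   1       1       16     3,332    129.49.246.14   3219   umbra
   0       0        0         0    129.49.115.203  4209   birte
   1       1       16       422    129.49.115.208  3874   xtra
------- ------- ------- ----------
  16      14       95    22,308
"""

if __name__ == "__main__":
    for c in idle_instruments(parse_consumers(SAMPLE)):
        print(c.hostname, c.port, c.defined - c.active, "inactive")
```

Because host names repeat, the sketch keys on the address and port pair rather than on the host name alone.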

Instrument Status in xmperf

If the data-consumer program is xmperf, there are only three ways an instrument can be erased from the tables in the xmservd daemon after it is defined. They are:

  1. You can erase an instrument in a remote console or in an instantiated remote skeleton console.
  2. You can erase a remote console or an instantiated remote skeleton console.
  3. The daemon takes the initiative to erase its information about an instrument after it has detected that the data consumer that defined the instrument is no longer active.

In most cases, the last situation occurs because the data consumer has been killed (as opposed to shut down in an orderly way). As the daemon detects that the instruments of the data consumer are no longer active, it deletes them one at a time. When the last instrument of a data consumer is deleted from the tables in xmservd, all information about the remote monitor is deleted too, and the monitor no longer shows up in the output from xmpeek.

Using the xmpeek Program to Print Available Statistics

If the xmpeek program is invoked with the -l flag (lowercase L) it lists all the available statistics of the remote host given on the command line, or the local host if no host name is given. The list of statistics is sent to standard output, which permits you to redirect it to a file or pipe it into another command. The following figure shows a partial listing of statistics on an HP 9000/7255:

/hp2/CPU/                  Central processor statistics
/hp2/CPU/gluser            System-wide time executing in user mode (percent)
/hp2/CPU/glkern            System-wide time executing in kernel mode (percent)
/hp2/CPU/glwait            System-wide time waiting for IO (percent)
/hp2/CPU/glidle            System-wide time CPU is idle (percent)
/hp2/CPU/glnice            System-wide time CPU is running w/nice priority (%)
 . . .
/hp2/CPU/cpu0/             Statistics for processor #0
/hp2/CPU/cpu0/user         Time executing in user mode (percent)
/hp2/CPU/cpu0/kern         Time executing in kernel mode (percent)
/hp2/CPU/cpu0/wait         Time waiting for IO (percent)
/hp2/CPU/cpu0/idle         Time CPU is idle (percent)
/hp2/CPU/cpu0/nice         Time CPU is running code with nice priority
 . . .
/hp2/Mem/                  Memory Statistics
/hp2/Mem/Real/             Physical memory statistics
/hp2/Mem/Real/size         Size of physical memory (4K pages)
/hp2/Mem/Real/numfrb       Number of pages on free list
/hp2/Mem/Real/%free        % memory which is free
/hp2/Mem/Real/totreal      Total real memory (Kbytes)
/hp2/Mem/Real/actreal      Active real memory (Kbytes)
/hp2/Mem/Virt/             Virtual memory management statistics
/hp2/Mem/Virt/pagein       4K pages read by VMM
/hp2/Mem/Virt/pageout      4K pages written by VMM
/hp2/Mem/Virt/zerofill     Page faults satisfied by zero-filling memory frames
/hp2/Mem/Virt/pagexct      Total page faults
 . . .

When a host's statistics include contexts that may exist in multiple instantiations and such instantiations are volatile, the list does not break all such contexts down into their components. Rather, only the first instance of the context is broken down and all further instances are listed with five dots appended to the statistics path name. The following example shows this. The process identified by 514~wait (actually a pseudo process) is fully broken down. All other processes are merely listed with their identifier because they would all break down to the same base statistics as the wait process.

/birte/Proc/                  Process statistics
/birte/Proc/pswitch           Process context switches
/birte/Proc/runque            Average count of processes waiting for the CPU
/birte/Proc/runocc            Number of samplings of runque
/birte/Proc/swpque            Average count of processes waiting to be paged in
/birte/Proc/swpocc            Number of samplings of swpque
/birte/Proc/ksched            Number of kernel process creations
/birte/Proc/kexit             Number of kernel process exits
/birte/Proc/514~wait/         Process wait  (514) %cpu 54.6, PgSp: 0.0mb, uid:
/birte/Proc/514~wait/pri      Process priority
/birte/Proc/514~wait/wtype    Process wait status
/birte/Proc/514~wait/majflt   Process page faults involving IO
/birte/Proc/514~wait/minflt   Process page faults not involving IO
/birte/Proc/514~wait/cpums    CPU time in milliseconds in interval
/birte/Proc/514~wait/cpuacc   CPU time in milliseconds in life of process
/birte/Proc/514~wait/cpupct   CPU time in percent in interval
/birte/Proc/514~wait/usercpu  Process CPU use in user mode (percent)
/birte/Proc/514~wait/kerncpu  Process CPU use in kernel mode (percent)
/birte/Proc/514~wait/workmem  Physical memory used by process private data (4K)
/birte/Proc/514~wait/codemem  Physical memory used by process code (4K pages)
/birte/Proc/514~wait/pagsp    Page space used by process private data (4K page
/birte/Proc/514~wait/nsignals Signals received by process
/birte/Proc/514~wait/nvcsw    Voluntary context switches by process
/birte/Proc/514~wait/tsize    Code size (bytes)
/birte/Proc/514~wait/maxrss   Maximum code+data resident set size (4K pages)
/birte/Proc/12002~x/.....
/birte/Proc/13207~xlock/.....
/birte/Proc/771~netw/.....
/birte/Proc/1~init/.....
/birte/Proc/5723~trapgend/.....
/birte/Proc/0~/.....
/birte/Proc/15339~aixterm/.....
/birte/Proc/2823~syncd/.....
/birte/Proc/13047~xmservd/.....
/birte/Proc/15593~aixterm/.....
 . . .
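Because the listing goes to standard output, it can be piped into a small filter. The following Python sketch is one possible filter, written under the assumption that path names contain no white space and that a description may wrap onto a following line that does not begin with a slash. The names Entry and parse_listing are hypothetical. It classifies each line as a context (path ends in a slash), a statistic, or a collapsed instance (path ends in five dots).

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    path: str
    kind: str  # "context", "statistic", or "collapsed"
    description: str


def parse_listing(text: str) -> List[Entry]:
    """Classify the lines of a statistics listing."""
    entries: List[Entry] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line == ". . .":
            continue  # blank line or elision marker
        if not line.startswith("/"):
            # Assumed to be the wrapped tail of the previous description.
            if entries:
                last = entries[-1]
                joined = (last.description + " " + line).strip()
                entries[-1] = last._replace(description=joined)
            continue
        parts = line.split(None, 1)
        path = parts[0]
        desc = parts[1] if len(parts) > 1 else ""
        if path.endswith("/....."):
            # Further instance of a volatile context, not broken down.
            entries.append(Entry(path[:-5], "collapsed", desc))
        elif path.endswith("/"):
            entries.append(Entry(path, "context", desc))
        else:
            entries.append(Entry(path, "statistic", desc))
    return entries


SAMPLE = """\
/birte/Proc/                  Process statistics
/birte/Proc/runque            Average count of processes waiting for the
CPU
/birte/Proc/514~wait/         Process wait  (514) %cpu 54.6,
/birte/Proc/514~wait/pri      Process priority
/birte/Proc/12002~x/.....
/birte/Proc/1~init/.....
 . . .
"""

if __name__ == "__main__":
    for e in parse_listing(SAMPLE):
        print(e.kind.ljust(10), e.path)
```

A collapsed entry can be expanded by substituting its path for the path of the first, fully listed instance, since all instances break down to the same base statistics.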

Protocol Version Control

Because the Performance Toolbox for AIX can be expanded in the future, it is likely that changes to messages or network protocol will be introduced. For this reason, the message types are_you_there, i_am_back, and send_status carry information about the xmquery protocol level they are using.

In case of a difference in protocol version, data-consumer programs do not attempt to negotiate with the data-supplier host. This does not prevent the data supplier from negotiating with, and supplying data to, other remote monitors at the same protocol level as itself.

Limiting Access to Data Suppliers

Access to the xmservd daemon can be limited by supplying stanzas in the configuration file /etc/perf/xmservd.res (or /usr/lpp/perfagent/xmservd.res if the file /etc/perf/xmservd.res does not exist). The three stanzas follow. Note that the colon is part of the stanza. The stanza must begin in column one of a line. There may be more than one line for each stanza type, but in the case of the max: stanza, the last instance overrides any earlier ones.

only:
When this stanza type is used, access to the daemon is restricted to hosts that are named after the stanza. Hostnames are specified separated by blanks, tabs or commas. Access from any host that is not specified in an only: line is rejected at the time an are_you_there message is received.

Make sure you understand this: If one or more only: lines are specified, only hosts specified in such lines get through to the data retrieval functions of the daemon.

always:
When this stanza type is used, access to the daemon is always granted to hosts that are named after the stanza. Hostnames are specified separated by blanks, tabs or commas. The idea is to make sure that persons who need to do remote monitoring from their hosts can indeed get through, even if the number of active data consumers exceeds the limit established.

However, if an only: stanza is also specified and the host is not named in any only: line, access is denied before the always: stanza is checked. Consequently, if you use the always: stanza, you must either refrain from using the only: stanza or make sure that all hosts named in the always: lines are also named in the only: lines.

max:
This stanza must be followed by the number of simultaneous data consumers that are allowed to define statsets with the daemon at any one time. Data consumers running from hosts named in always: lines are not counted when the daemon checks whether the maximum is exceeded.

Access is denied at the time a statset is defined, which normally is when a remote console is opened from the data-consumer host.

If no max: line is found, the maximum number of data consumers defaults to 16.

The following shows a sample xmservd configuration file. Two only: lines define a total of nine hosts that can access the xmservd daemon. No other host is allowed to request statistics from the daemon on the host with this configuration file.

Two always: lines name two hosts from which remote monitoring should always be allowed. Finally, a maximum of three data consumers at a time are permitted to have statsets defined. Note that each copy of xmperf and the other remote data-consumer programs of Performance Toolbox for AIX count as one data consumer, no matter on which host they run.

only:   srv1   srv2   birte  snavs  xtra  jones  chris
only:   savanna rhumba  
always: birte  
always: chris  
max: 3
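The interaction of the three stanzas can be easier to follow as code. The following Python sketch models the rules as described above. It is an illustration only and does not reproduce the daemon's implementation. The names AccessRules, parse_rules, may_connect, and may_define_statset are hypothetical, and the count passed to may_define_statset is assumed to be the number of other data consumers that currently hold statsets, excluding those on always: hosts.

```python
import re
from dataclasses import dataclass, field
from typing import Set

DEFAULT_MAX = 16  # default stated in the text when no max: line is found


@dataclass
class AccessRules:
    only: Set[str] = field(default_factory=set)
    always: Set[str] = field(default_factory=set)
    max_consumers: int = DEFAULT_MAX
    has_only: bool = False


def parse_rules(text: str) -> AccessRules:
    """Collect only:, always:, and max: stanzas from configuration text."""
    rules = AccessRules()
    for line in text.splitlines():
        # A stanza must begin in column one, so lines are not stripped.
        for key in ("only:", "always:", "max:"):
            if not line.startswith(key):
                continue
            # Host names are separated by blanks, tabs, or commas.
            words = [w for w in re.split(r"[ \t,]+", line[len(key):]) if w]
            if key == "only:":
                rules.has_only = True
                rules.only.update(words)
            elif key == "always:":
                rules.always.update(words)
            elif words:
                rules.max_consumers = int(words[0])  # last max: line wins
            break
    return rules


def may_connect(rules: AccessRules, host: str) -> bool:
    """Check applied when an are_you_there message is received."""
    return not rules.has_only or host in rules.only


def may_define_statset(rules: AccessRules, host: str, counted: int) -> bool:
    """Check applied when a data consumer defines a statset."""
    if not may_connect(rules, host):
        return False  # only: is checked before always:
    if host in rules.always:
        return True  # always: hosts bypass the max: limit
    return counted < rules.max_consumers


SAMPLE = """\
only:   srv1   srv2   birte  snavs  xtra  jones  chris
only:   savanna rhumba
always: birte
always: chris
max: 3
"""

if __name__ == "__main__":
    rules = parse_rules(SAMPLE)
    for host in ("birte", "xtra", "umbra"):
        print(host, may_connect(rules, host), may_define_statset(rules, host, 3))
```

With the sample file and three counted consumers already active, birte is still admitted, xtra is refused when it tries to define a statset, and umbra is rejected at the are_you_there stage.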


Starting Dynamic Data-Supplier Programs

The xmservd daemon supplies statistics to data consumers. Such statistics may be maintained and updated internally by xmservd itself through the SPMI API or may be exported by xmservd to data consumers on behalf of other suppliers of statistics. Programs that provide xmservd with statistics in this way are called dynamic data-supplier (DDS) programs. They are written to the application programming interface of the System Performance Measurement Interface (see the "System Performance Measurement Interface API" ).

Before a DDS can start supplying statistics to xmservd, the DDS must register with xmservd. Before it can do this, it must be started. DDS programs can be started manually or by any other process when their presence is required, but some dynamic data suppliers may always be required to start when xmservd starts. To facilitate this, the xmservd configuration file /etc/perf/xmservd.res (or /usr/lpp/perfagent/xmservd.res if the file /etc/perf/xmservd.res does not exist) has a special type of stanza to identify DDS programs that must be started by xmservd whenever xmservd starts. The stanza can occur as many times as you have DDS programs to start, each line describing one DDS program. The stanza is:

supplier:
The stanza must be followed by at least one byte of white space and the full path name of the executable dynamic data-supplier program as shown in the following example:
supplier: /usr/samples/perfagent/server/SpmiSupl    
supplier: /u/jensen/mysuppl -x -k 100
supplier: /usr/bin/filtd -p5  

The example contains three stanzas. The mysuppl program, apparently, takes command line arguments, as does the filtd daemon. The example also shows how these command line arguments can be put into the file.
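To check which programs a configuration file asks xmservd to start, the supplier: lines can be extracted with a few lines of Python. This is a sketch under assumptions: the text does not say how xmservd splits the command line, so shell-like splitting with shlex is assumed here, as is the requirement that the stanza begin in column one. The function name supplier_commands is hypothetical.

```python
import shlex
from typing import List

STANZA = "supplier:"


def supplier_commands(text: str) -> List[List[str]]:
    """Return the argument vector of each supplier: stanza."""
    commands = []
    for line in text.splitlines():
        if not line.startswith(STANZA):
            continue
        rest = line[len(STANZA):]
        if not rest[:1].isspace():
            continue  # at least one byte of white space must follow
        argv = shlex.split(rest)  # assumption: shell-like word splitting
        if argv and argv[0].startswith("/"):
            commands.append(argv)  # full path name of the executable
    return commands


SAMPLE = """\
supplier: /usr/samples/perfagent/server/SpmiSupl
supplier: /u/jensen/mysuppl -x -k 100
supplier: /usr/bin/filtd -p5
"""

if __name__ == "__main__":
    for argv in supplier_commands(SAMPLE):
        print(argv[0], argv[1:])
```

Lines that lack the white space after the colon or that do not give a full path name are skipped rather than reported, which keeps the sketch short. A stricter checker might flag them instead.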

Adjusting Socket Buffer Pool

If you use the Performance Toolbox for AIX in a network where a large number of hosts are running the xmservd daemon, you may have to increase the maximum size of the socket buffer pool on data consumer hosts to reduce the probability of UDP packets being dropped.

If you notice that xmperf does not see all the hosts that run xmservd, UDP packets are probably being dropped. Use the no command to increase the socket buffer pool to between four and eight times the default. For example:

no -o sb_max=262144

On AIX Version 3.2, if packets still seem to be dropped, use the netstat -m command to display the "requests for memory denied." If this number grows as you refresh the host list, use the no command to increase the "lowclust" option like this:

no -o lowclust=50

To make sure the values are increased each time your host boots, add the above commands to the file /etc/rc.tcpip.
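Watching whether the denied count grows can be automated. The following Python sketch compares two captures of netstat -m output. It assumes that the counter appears on a line of the form "N requests for memory denied", which is inferred from the wording above and may differ between releases. The function names denied_requests and is_growing are hypothetical.

```python
import re
from typing import Optional

# Assumed line format: "<count> requests for memory denied"
DENIED = re.compile(r"(\d+)\s+requests for memory denied")


def denied_requests(netstat_m_output: str) -> Optional[int]:
    """Return the denied-request count, or None if the line is absent."""
    m = DENIED.search(netstat_m_output)
    return int(m.group(1)) if m else None


def is_growing(before: str, after: str) -> bool:
    """True if the count increased between two captures."""
    first, second = denied_requests(before), denied_requests(after)
    return first is not None and second is not None and second > first


if __name__ == "__main__":
    # Hypothetical captures taken before and after refreshing the host list.
    before = "3 requests for memory denied\n"
    after = "7 requests for memory denied\n"
    print(is_growing(before, after))
```

In practice the two captures would be taken immediately before and after the host list is refreshed.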

