This chapter provides information about monitoring remote systems.
The Performance Toolbox for AIX Agent component is a collection of programs that make it possible for a host to act as a provider of performance statistics across a network or locally. The key program is the daemon xmservd. This chapter, "Recording Performance Data on Remote Systems", and "SNMP Multiplex (SMUX) Interface" describe the features of xmservd. "Data Reduction and Alarms with filtd" describes the other important daemon in the Agent component, filtd. Finally, "System Performance Measurement Interface API" describes the local API provided with the Agent.
The remainder of this chapter first explains important features of the System Performance Measurement Interface (SPMI), which is the mechanism that provides statistics to xmservd, and then explains in detail how monitoring of remote systems is made possible. For that discussion, we have adopted the term data-supplier host to describe a host that supplies statistics to another host across a network, while a host receiving the statistics over the network, processing, and displaying them is called a data-consumer host.
Monitoring statistics supplied by xmservd is made possible through an API called System Performance Measurement Interface (SPMI). Through the SPMI, an application can access statistics available on the local system. This is done by defining sets of statistics (statsets). Observations are taken for all the statistics of a statset at the same time. The concept of statsets is key to understanding how statistics are monitored. It is explained in the section entitled "Statsets".
The SPMI makes extensive use of shared memory. Similarly, any dynamic data-supplier programs that extend the set of provided statistics use shared memory to export their data. The xmservd daemon (and properly written dynamic data-supplier programs) allocates and frees shared memory segments when starting and terminating. Some important things to know about the use of shared memory are explained in "Shared Memory Types" and subsequent sections.
Each system provides a range of statistics, some of which are fixed while others, such as process statistics, come and go over time. Most monitoring tasks involve the monitoring of more than one of the statistics provided by a system. In the simplest possible way to access the statistics, the requestor would ask for observations for each statistic and would issue a series of requests to get multiple statistics. This would create a number of inconveniences:
All of these inconveniences are eliminated by the definition of statsets as implemented in the SPMI. Statsets represent views of the entire data repository of statistics and are implemented as data structures that are used to keep track of delta values (difference between the latest observation and the previous one) for statistics. The only way an application program can read observations is by defining a statset and then requesting a reading for all the statistics in the statset. Because statsets are defined to the SPMI, which permits access to local statistics, all statistics in a statset must come from the same system.
The concept of statsets applies throughout Performance Toolbox for AIX. The program xmperf uses statsets to define instruments. There's always a one-to-one relationship between an xmperf instrument and a statset. Similarly, every right side column of a 3dmon graph corresponds to a statset.
Statsets are also closely related to the data packets that carry observations over the network. The xmservd daemon supplies data across the network in the form of data packets that correspond to statsets. Each data packet contains a time stamp that shows when a set of observations was taken and the elapsed time since the previous observation. It then contains two fields for each of the statistics in the statset. The first gives the delta value. The second contains the actual observation value.
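The relationship between a statset and the data packets built from it can be sketched as follows. This is a minimal illustration in Python, not the SPMI or xmservd implementation: the class names `StatSet` and `DataPacket`, the field names, and the choice to report a zero delta on the first reading are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DataPacket:
    timestamp: float                   # when the set of observations was taken
    elapsed: float                     # time since the previous observation
    values: List[Tuple[float, float]]  # (delta, actual value) per statistic


class StatSet:
    """Hypothetical stand-in for a statset: it remembers the previous reading."""

    def __init__(self, names: List[str]) -> None:
        self.names = names
        self._prev: Optional[Dict[str, float]] = None
        self._prev_time: Optional[float] = None

    def read(self, now: float, sample: Dict[str, float]) -> DataPacket:
        # All statistics of the statset are observed together, at one instant.
        prev = self._prev if self._prev is not None else sample
        elapsed = 0.0 if self._prev_time is None else now - self._prev_time
        values = [(sample[n] - prev[n], sample[n]) for n in self.names]
        self._prev, self._prev_time = dict(sample), now
        return DataPacket(now, elapsed, values)


statset = StatSet(["pagein", "pageout"])
statset.read(10.0, {"pagein": 100, "pageout": 40})
packet = statset.read(12.0, {"pagein": 130, "pageout": 45})
print(packet.elapsed, packet.values)
```

Because the previous reading is kept per statset, two consumers that define separate statsets over the same statistic each get deltas relative to their own last observation.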
In recording files, value records are used to carry observations. They have the same contents as the data packets and maintain the concept of statsets. When recordings are played back with xmperf, the statsets are used to define instruments. When a recording file is analyzed by azizo, statsets are not important but they are preserved when writing a filtered recording file, if requested.
Two types of shared memory are used by the daemon and dynamic data-supplier programs. The first is called common shared memory and is memory that all running dynamic data-supplier and local data-consumer programs (data-consumer programs that do not use the Remote Statistics Interface API) share with the daemon. The second type is allocated in one copy for each dynamic data-supplier program and is supposed to be deallocated and removed by that dynamic data-supplier program when the program exits. This second type of shared memory is called DDS shared memory.
The common shared memory is allocated by the SPMI library on behalf of whichever local data-consumer or data-supplier program (including xmservd) starts first. Each additional such program detects the common shared memory and uses the allocated segment. A counter in the common shared memory segment is incremented by one for each starting data-supplier or local data-consumer program and is decremented by one whenever one of the running programs terminates. When the counter reaches zero, the common shared memory segment is released.
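The counter logic can be modeled in a few lines. The sketch below is a toy model in Python that touches no real shared memory; the class and method names are invented for illustration and do not correspond to SPMI library calls.

```python
class CommonSharedMemory:
    """Toy model of the attach counter; no real shared memory is involved."""

    def __init__(self) -> None:
        self.allocated = False
        self.counter = 0

    def attach(self) -> None:
        # The first program to start allocates the segment;
        # later programs detect it and reuse it.
        if not self.allocated:
            self.allocated = True
        self.counter += 1

    def detach(self) -> None:
        # Issued by a program that terminates in an orderly way.
        if self.counter == 0:
            raise RuntimeError("detach without a matching attach")
        self.counter -= 1
        if self.counter == 0:
            self.allocated = False  # segment is released


segment = CommonSharedMemory()
segment.attach()   # for example, xmservd starts
segment.attach()   # a local data-consumer program starts
segment.detach()   # the data consumer exits cleanly
print(segment.allocated, segment.counter)
```

If a program dies without calling `detach`, the counter stays above zero after every other program has exited, which is the condition under which the segment has to be released manually.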
Properly written data-supplier and local data-consumer programs issue a subroutine call when they terminate. This call detaches from the common shared memory segment and decrements the counter. To be properly written, the programs must detect various signals, which indicate that the program (process) is about to terminate. When one of the signals is received the program must issue the subroutine call. The call must also be issued when the program terminates normally.
Most signals can be detected by a program, but some cannot. If one of the undetectable signals causes the program (process) to terminate, the subroutine call is not issued. As a result, the common shared memory segment is never released, because the counter never reaches zero. If this happens, you must release the common shared memory manually, as described in "Releasing Shared Memory Manually".
To avoid the situation, never kill a data-supplier or local data-consumer program with the option -9. That would terminate the program with a SIGKILL signal, which is not detectable.
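The pattern of releasing shared memory both on normal termination and on catchable signals can be sketched as follows. The example is in Python rather than the C used by the actual API, and `release` is a hypothetical placeholder for whatever subroutine call detaches the program from shared memory; the set of signals handled is also an assumption.

```python
import atexit
import signal
import sys
from typing import Callable

# Signals a program can catch. SIGKILL is deliberately absent: it can be
# neither caught nor ignored, so no cleanup code runs when it is delivered.
CATCHABLE = (signal.SIGHUP, signal.SIGINT, signal.SIGTERM)


def once(release: Callable[[], None]) -> Callable[[], None]:
    """Wrap release so that it runs at most one time."""
    done = False

    def wrapper() -> None:
        nonlocal done
        if not done:
            done = True
            release()

    return wrapper


def make_handler(release_once: Callable[[], None]):
    def handler(signum, frame):
        release_once()           # detach before the process goes away
        sys.exit(128 + signum)   # then terminate as requested

    return handler


def install_cleanup(release: Callable[[], None]) -> None:
    """Arrange for release() on normal exit and on catchable signals."""
    release_once = once(release)
    atexit.register(release_once)            # normal termination
    handler = make_handler(release_once)
    for sig in CATCHABLE:
        signal.signal(sig, handler)          # termination by signal


if __name__ == "__main__":
    install_cleanup(lambda: print("shared memory released"))
```

The `once` wrapper is there because a signal handler that calls `sys.exit` also triggers the `atexit` hook; without it the release would be attempted twice.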
A dynamic data-supplier program exports its data through a private shared memory area, called the DDS shared memory area, which is allocated by the SPMI API. If the memory area exists when the dynamic data-supplier program starts, the program terminates. This ensures that the same dynamic data-supplier program is not running in multiple copies. It also places the responsibility for releasing the DDS shared memory segment whenever the dynamic data-supplier program terminates on the program itself.
Properly designed dynamic data-supplier programs detect the signals that cause termination and issue a subroutine call to release shared memory. A single subroutine call releases the DDS shared memory and disassociates the program from the common shared memory.
If a dynamic data-supplier program is terminated in a way that cannot be detected from the program itself, the DDS shared memory is not released and subsequent attempts to start the dynamic data-supplier program fail. If this happens, you must release the DDS shared memory manually, as described in the section entitled "Releasing Shared Memory Manually".
To avoid the situation, never kill a dynamic data-supplier program with the option -9. That would terminate the program with a SIGKILL signal, which is not detectable.
In situations where one or more data-supplier or local data-consumer programs have terminated in such a way that their shared memory allocations have not been released, the shared memory segments should be released from the command line before attempting to restart the programs. It is recommended that all data-supplier and local data-consumer programs, including xmservd, are killed before you attempt to release shared memory. Clearing of all shared memory segments could be done through the following steps.
IPC status from /dev/mem as of Fri Dec 31 07:54:44 CST 1993
T     ID     KEY        MODE        OWNER    GROUP
Shared Memory:
m      0 0x0d050296 --rw-------     root   system
m  20481 0x5806188b --rw-rw-rw-   nchris   system
m  28674 0x780502ea --rw-rw-rw-     root   system
m  12292 0x780502e3 --rw-rw-rw-     root   system
m  20485 0x780502d1 --rw-rw-rw-     root   system
ipcrm -m 28674 -m 12292 -m 20485
The xmservd daemon is always started from inetd. Therefore, command line options must be specified on the line defining xmservd to inetd in the file /etc/inetd.conf. The general format of the command line is:
xmservd [-v] [-b UDP_buffer_size] [-i min_remote_interval] [-l remove_consumer_timeout] [-m supplier_timeout] [-p trace_level] [-s max_logfile_size] [-t keep_alive_limit] [-x xmservd_execution_priority]
All command line options are optional. The options are:
This value is also used to control when to remove inactive data-consumers as described in "Removing Inactive Data-Consumers".
As explained under the -i command line argument, all sampling intervals requested by remote data-consumer programs are rounded to the effective minimum sampling interval of xmservd. This can cause unintended rounding of sampling intervals as shown in "Rounding of Sampling Interval by xmservd". This rounding can be eliminated by always using 100 milliseconds as the minimum sampling interval. However, if you use 100 milliseconds, and remote data-consumer programs use a wide variety of sampling intervals, then the overhead of xmservd increases because it has to set its interval timer to do processing more frequently. Generally, the minimum sampling interval should be set to as large a value as possible, preferably 1000 milliseconds or more.
The following example illustrates rounding of sampling interval for different minimum sampling intervals and various requested sampling intervals:
minimum remote interval    requested interval    resulting interval
-----------------------    ------------------    ------------------
          200                       500                   600
          200                     3,000                 3,000
          200                     1,000                 1,000
          300                       500                   600
          300                     3,000                 3,000
          300                     1,000                   900
          400                       500                   400
          400                     3,000                 3,200
          400                     1,000                 1,200
          500                       500                   500
          500                     3,000                 3,000
          500                     1,000                 1,000
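Every row of the table is consistent with rounding the requested interval to the nearest multiple of the minimum interval, with exact halves rounded up. The following Python sketch reproduces the table under that assumption; the rule is inferred from the listed values, and the function name and the floor of one multiple are choices made for the example, not documented xmservd behavior.

```python
def round_interval(requested_ms: int, minimum_ms: int) -> int:
    """Round requested_ms to the nearest multiple of minimum_ms (halves up)."""
    # Integer form of floor(requested / minimum + 0.5).
    multiples = (2 * requested_ms + minimum_ms) // (2 * minimum_ms)
    # Assumption: never go below one multiple of the minimum interval.
    return max(1, multiples) * minimum_ms


for minimum in (200, 300, 400, 500):
    for requested in (500, 3000, 1000):
        print(minimum, requested, round_interval(requested, minimum))
```

With a 100 millisecond minimum, any requested interval that is a multiple of 100 comes back unchanged, which is why that setting eliminates the rounding at the cost of more frequent timer processing.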
The xmservd daemon is designed to be started from the inetd "super daemon." Even when you start the daemon manually it reschedules itself via inetd and lets the manually started process die. The following sections describe how xmservd starts, terminates, and keeps track of data-consumer programs.
The xmservd daemon must be configured as an inetd daemon to run properly. If you do start the daemon manually, it attempts to reschedule itself by invoking the program xmpeek and then exits. This causes xmservd to be rescheduled via inetd. The line defining the daemon in /etc/inetd.conf must specify the "wait" option to prevent inetd from starting more than one copy of the daemon at a time. The file /etc/inetd.conf is prepared during the installation of the Agent component.
If you want the daemon to be started automatically as part of the boot process, you can add the following two lines at the very end of the file /etc/rc.tcpip:
/usr/bin/sleep 10
/usr/bin/xmpeek
The first line is necessary only when you intend to use the xmservd/SMUX interface to export statistics to the local SNMP agent.
Note: The xmservd/SMUX interface is only available on RS/6000 Agents.
"SNMP Multiplex (SMUX) Interface" describes the xmservd/SMUX interface. The line with the sleep command makes sure the start of the snmpd daemon is completed before xmservd starts. The second line uses the program xmpeek (described later in this chapter) to kick off the xmservd daemon.
The xmservd daemon is started by inetd immediately after a UDP datagram is received on its port. Note that the daemon is not scheduled by a request through the SMUX interface from the local SNMP agent. This is because the SNMP agent uses a different port number. Unless xmservd ends abnormally or is killed, it continues to run as long as any data-consumer needs its input or a connection to the SNMP agent is established and alive. When no data-consumer needs its input and either no connection was established through the SMUX interface or any such connection is terminated, the daemon hangs around for time_to_live minutes as specified with the -l (lowercase L) command line argument to xmservd. The default number of time_to_live minutes is 15.
In some environments, it may take some time for a system's xmservd daemon to respond to invitations from remote data-consumers. This can be because the network route is long or the network congested; it may be because all memory on the system is in use so pages must be paged out before xmservd can be loaded; it may be because the xmservd executable is loaded off a server, as on diskless systems. In any of these cases, the data-consumer program may not receive a response from the system in time. Most remote data-consumer programs in Performance Toolbox for AIX have ways to extend the time they wait for responses. See the command lines for each data-consumer program for specifics.
Whenever a connection to the SNMP agent through the SMUX interface is active, or whenever xmservd is configured to record performance data to a file (see "Recording Performance Data on Remote Systems"), the daemon does not time out and die even when there are no data-consumers to supply. In these situations, the time_to_live limit is used only to determine when to look for inactive remote consumers that can be deleted from the tables in xmservd.
Like many other daemons, xmservd interprets the receipt of the signal SIGHUP (kill -1) as a request to refresh itself. It does this by spawning another copy of itself via inetd and then killing itself. When this happens, the spawned copy of xmservd is initially unaware of any data consumers that may have been using the copy of xmservd that received the signal. Consequently, all data-consumer programs must request a resynchronization with the spawned daemon to continue their monitoring.
The other signal recognized by xmservd is SIGINT (kill -2), which causes the daemon to dump any MIB data it has to a file, as described in the section "Interaction Between xmservd and SNMP".
When a data-consumer program such as xmperf uses broadcasts to contact data-supplier hosts, most likely the monitor defines instruments (each of which causes xmservd to create a statset) with only a few of the daemons that respond. Consequently, most daemons have been contacted by many data consumers but supply statistics to only a few. This causes the host tables in the daemon to swell and, in the case of large installations, can induce unnecessary load on the daemon. To cope with this, the daemon attempts to get rid of data consumers that appear not to be interested in its service.
The time_to_live parameter is used to check for inactive partners. A data consumer is removed from the daemon's tables if either of the following conditions is true:
A data consumer that is subscribing to except_rec messages is treated as if it had a statset defined with the daemon.
Once xmservd is running and supplying input to one or more data consumers, it must make sure that the data consumers are still alive and needing its input. If not, it would be a waste of system resources to continue sending statistics across the network. The daemon uses a keep_alive_limit to determine when it's time to check that data-consumer hosts are still alive. The alive limit is reset whenever the user makes changes to the remote monitoring configuration from the data-consumer host, but not when data is fed to the data consumer.
When the keep_alive_limit is reached, xmservd sends a message of type still_alive to the data consumer. The data-consumer program has keep_alive_limit seconds to respond. If a response is not received after keep_alive_limit seconds, the daemon sends another still_alive message and waits another keep_alive_limit seconds. If there's still no response, the daemon assumes the data consumer to be dead or no longer interested and stops sending statistics to it. The default keep_alive_limit is 300 seconds (five minutes); it can be set with the -t command line argument to xmservd.
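The probing sequence described above amounts to a small timer-driven state machine. The Python sketch below models it with explicit timestamps instead of real timers; the class and method names are invented, and treating a response to still_alive as a reset of the limit is an assumption.

```python
from typing import Optional


class KeepAlive:
    """Toy model of keep-alive tracking for one data consumer."""

    def __init__(self, limit: float = 300.0, now: float = 0.0) -> None:
        self.limit = limit            # keep_alive_limit in seconds (-t option)
        self.deadline = now + limit
        self.probes_sent = 0
        self.alive = True

    def reset(self, now: float) -> None:
        # The consumer changed its monitoring configuration or (assumed)
        # answered a still_alive message. Data feeds do not reset the limit.
        self.deadline = now + self.limit
        self.probes_sent = 0

    def tick(self, now: float) -> Optional[str]:
        """Return 'still_alive' when a probe is due, 'drop' when giving up."""
        if not self.alive or now < self.deadline:
            return None
        if self.probes_sent < 2:
            self.probes_sent += 1
            self.deadline = now + self.limit
            return "still_alive"
        self.alive = False            # two probes unanswered: stop feeding
        return "drop"


consumer = KeepAlive()
for t in (300, 600, 900):
    print(t, consumer.tick(t))
```

With the default limit of 300 seconds, an unresponsive consumer in this model is probed at 300 and 600 seconds and dropped at 900 seconds after its last configuration change.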
Through the program filtd described in "Data Reduction and Alarms with filtd", you can define exception conditions that can cause one or more actions to be taken. One such action is the execution of a command on the host where the daemon runs; another is the sending of an exception message. The message type except_rec is used for the latter.
The contents of each exception message are:
The xmservd daemon sends exceptions to all hosts it knows that have declared that they want to receive exception messages. The RSiOpen and RSiInvite subroutine calls of the API are used by the data-consumer application to declare whether it wants to receive exception messages.
The program exmon is especially designed to monitor exception messages. It allows its user to specify which hosts to monitor for exceptions and displays a window with a matrix that shows which hosts generated exceptions, what types were generated, and how many of each type. This program is described in "Monitoring Exceptions with exmon".
Currently, xmperf does not request exception messages unless you set the X resource GetExceptions to true or use the -x command line argument. If you have requested exceptions this way and one is received by xmperf, it is sent to the xmperf main window where it appears as a text message. No other action is taken by xmperf.
If the xmservd daemon dies or is killed while one or more data consumers have statsets defined with it, the daemon attempts to record the connections in the file /etc/perf/xmservd.state. If this file exists when xmservd later is restarted, a message of type i_am_back is sent to each of the data-consumer hosts recorded in the file. The file is then erased.
If the programs acting as data consumers are capable of doing a resynchronizing, the interrupted monitoring can resume swiftly and without requiring manual intervention. The xmperf and 3dmon programs can and do resynchronize all active monitors for a host whenever an i_am_back message is received from that host.
We have already mentioned several types of messages (packets) that flow between data-supplier hosts and data-consumer hosts. Message types are organized in four groups as follows:
Configuration Messages
    create_stat_set      Type = 01
    del_set_stat         Type = 02
    first_cx             Type = 03
    first_stat           Type = 04
    instantiate          Type = 05
    next_cx              Type = 06
    next_stat            Type = 07
    path_add_set_stat    Type = 08
    path_get_cx          Type = 09
    path_get_stat        Type = 10
    stat_get_path        Type = 11
Data Feed and Feed Control Messages
    begin_feeding        Type = 31
    change_feeding       Type = 32
    end_feeding          Type = 33
    data_feed            Type = 34
    going_down           Type = 35
Session Control Messages
    are_you_there        Type = 51
    still_alive          Type = 52
    i_am_back            Type = 53
    except_rec           Type = 54
Status Messages
    send_status          Type = 81
    host_status          Type = 82
All the configuration messages are specific to the negotiation between the data consumer and the data supplier about what statistics should be sent by the data supplier. We do not go into detail about these message types; note only that all such messages require a response and that all of them are initiated by the data consumer.
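For programs that need to refer to these message types, the numbering lends itself to an enumeration in which the group can be derived from the type code. The following Python sketch uses the names and codes from the list; the `MsgType` class, the `GROUPS` table, and the `group_of` helper are illustrative names, not part of any Performance Toolbox API.

```python
from enum import IntEnum


class MsgType(IntEnum):
    # Configuration messages
    create_stat_set = 1
    del_set_stat = 2
    first_cx = 3
    first_stat = 4
    instantiate = 5
    next_cx = 6
    next_stat = 7
    path_add_set_stat = 8
    path_get_cx = 9
    path_get_stat = 10
    stat_get_path = 11
    # Data feed and feed control messages
    begin_feeding = 31
    change_feeding = 32
    end_feeding = 33
    data_feed = 34
    going_down = 35
    # Session control messages
    are_you_there = 51
    still_alive = 52
    i_am_back = 53
    except_rec = 54
    # Status messages
    send_status = 81
    host_status = 82


# Each group occupies its own range of type codes.
GROUPS = {
    "configuration": range(1, 12),
    "data feed and feed control": range(31, 36),
    "session control": range(51, 55),
    "status": range(81, 83),
}


def group_of(msg: MsgType) -> str:
    for name, codes in GROUPS.items():
        if int(msg) in codes:
            return name
    raise ValueError(f"unknown message type {int(msg)}")


print(MsgType(34).name, "->", group_of(MsgType(34)))
```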
Once the negotiation of what data to supply is completed, the data-supplier host's xmservd maintains a set of information about the statistics to supply. A separate set is kept for each data-consumer program. No feeding of data is started until a begin_feeding message is received from the data-consumer program. The begin_feeding message includes information about the frequency of data feeds and causes xmservd to start feeding data at that frequency, using data_feed packets.
Data feed to a data consumer continues until that data consumer sends an end_feeding message or until the data consumer no longer responds to still_alive messages. At that time data feeding stops.
The frequency of data feeds can be changed by the data-consumer program by sending the change_feeding message. This message is sent whenever the user changes the interval property of an xmperf instrument.
The final message type in this group is going_down. This message is sent by xmperf and the other remote data-consumer programs in Performance Toolbox for AIX whenever they terminate in an orderly fashion and whenever any other program written to the RSi API (see "The Remote Statistics Interface API") issues the RSiClose call. The message is sent to all data-supplier hosts that the data-consumer program knows about (or the host RSiClose is issued against) and causes the daemons on the data-supplier hosts to erase all information about the terminating data-consumer program.
We have already mentioned two of the session control message types in previous sections. To recap, are_you_there is sent from a data consumer to provoke potential data-supplier hosts to identify themselves. The still_alive message is the only message type that is initiated by xmservd without input from a data consumer. It prompts remote monitors to respond and thus prove that they are still alive.
The third session control message is the i_am_back message, which is always the response to the first message xmservd receives from a data consumer.
When an i_am_back message is received by a data-consumer host's xmperf program, it responds by marking the configuration tables for the data-supplier host as void. This is because the data-supplier host's xmservd daemon has obviously restarted, which means that earlier negotiations about statsets are now invalidated.
If an i_am_back message is received from a remote supplier while an instrument for that supplier is active, a renegotiation for that instrument is started immediately. If other remote instruments for the supplier are defined to the data-consumer host, renegotiation for those instruments is delayed until the time each instrument is activated.
Renegotiation is not started unless xmperf on the data-consumer host takes action. It is quite possible that a data-supplier host is rebooted and its xmservd daemon therefore goes quietly away. The data consumer no longer receives data, and the remote instruments stop playing. Currently, no facility detects this situation, but a menu option allows the user to "resynchronize" with a data supplier. When this option is chosen, an are_you_there message is sent from xmperf. If the data-supplier daemon is running or can be started, it responds with an i_am_back message and renegotiation starts.
If a large number of data-consumer programs each is monitoring several statistics from one single data-supplier host, the sheer number of requests that must be processed can result in more load on the data-supplier host than is feasible.
Two features allow you to control the daemon on any host you are responsible for. The first one is a facility to display the status of a daemon, as described in this section. The other is the ability to control the access to the xmservd daemon as described in "Limiting Access to Data-Supplier".
Because the xmservd daemon runs in the background and may start and stop as required, special action is needed to determine the status of the daemon. Such action is implemented through the two message types send_status and host_status. The first can be sent to any xmservd daemon, which then responds by returning the message with total counts for the daemon's activity, followed by a message of type host_status for each data consumer it knows.
A program called xmpeek is supplied as part of the Performance Toolbox for AIX. This program allows you to ask any host about the status of its xmservd daemon. The command line is simple:
Both flags are optional. The -l flag (lowercase L) is explained in "Using the xmpeek Program to Print Available Statistics". If the flag -a is specified, one line is listed for each data consumer known by the daemon. If omitted, only data consumers that currently have instruments (statsets) defined with the daemon are listed.
If a host name is specified, the daemon on the named host is asked. If no host name is specified, the daemon on the local host is asked. The following is an example of the output from the xmpeek program:
Statistics for xmservd daemon on *** birte ***
Instruments currently defined:      1
Instruments currently active:       1
Remote monitors currently known:    2

--Instruments---   Values  Packets
Defined   Active   Active     Sent  Internet Address  Port  Hostname
-------  -------  -------  -------  ----------------  ----  --------
      1        1       16    3,344  129.49.115.208    3885  xtra
Output from xmpeek can take two forms.
The first form is a line that informs you that the xmservd daemon is not feeding any data-consumer programs. This form is used if no statsets are defined with the daemon and no command flags are supplied.
The second form includes at least as much as is shown in the Sample Output from xmpeek, except that the single detail line for the data consumer on host xtra is shown only if the -a flag is used or the data consumer has at least one instrument (statset) defined with the daemon. Note that xmpeek itself appears as a data consumer because it uses the RSi API to contact the daemon. Therefore, the output always shows at least one known monitor.
In the fixed output, the name of the host where the daemon is running is shown first. Then follow three lines giving the totals for the current status of the daemon. In the above example, you can see that only one instrument is defined and that it's active. You can also see that two data consumers are known by the daemon, but that only one of them has an instrument defined with the daemon on birte. Obviously, this output was produced without the -a flag.
An example of more activity is shown in the following example output from xmpeek. The output is produced with the command:
xmpeek -a birte
Notice that some detail lines show zero instruments defined. Such lines indicate that an are_you_there message was received from the data consumer but that no statsets were ever defined or that any previously defined statsets were erased.
Statistics for xmservd daemon on *** birte ***
Instruments currently defined:     16
Instruments currently active:      14
Remote monitors currently known:    6

--Instruments---   Values     Packets
Defined   Active   Active        Sent  Internet Address  Port  Hostname
-------  -------  -------  ----------  ----------------  ----  --------
      8        8       35      10,232  129.49.115.203    4184  birte
      6        4       28       8,322  129.49.246.14     3211  umbra
      0        0        0           0  129.49.115.208    3861  xtra
      1        1       16       3,332  129.49.246.14     3219  umbra
      0        0        0           0  129.49.115.203    4209  birte
      1        1       16         422  129.49.115.208    3874  xtra
-------  -------  -------  ----------
     16       14       95      22,308
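Because the detail lines follow a fixed column order, the output is easy to post-process. The Python sketch below parses the detail lines of the sample and recomputes the totals; the regular expression and the `Consumer` tuple are assumptions based only on the sample shown here, not on a documented output format.

```python
import re
from typing import List, NamedTuple


class Consumer(NamedTuple):
    defined: int
    active: int
    values: int
    packets: int
    address: str
    port: int
    hostname: str


# Four numeric columns, an IPv4 address, a port, and a host name.
DETAIL = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+([\d,]+)\s+"
    r"(\d+\.\d+\.\d+\.\d+)\s+(\d+)\s+(\S+)\s*$"
)

SAMPLE = """\
      8        8       35      10,232  129.49.115.203    4184  birte
      6        4       28       8,322  129.49.246.14     3211  umbra
      0        0        0           0  129.49.115.208    3861  xtra
      1        1       16       3,332  129.49.246.14     3219  umbra
      0        0        0           0  129.49.115.203    4209  birte
      1        1       16         422  129.49.115.208    3874  xtra
-------  -------  -------  ----------
     16       14       95      22,308
"""


def parse_details(text: str) -> List[Consumer]:
    consumers = []
    for line in text.splitlines():
        m = DETAIL.match(line)
        if m:  # separator and totals lines do not match and are skipped
            d, a, v, p, addr, port, host = m.groups()
            consumers.append(
                Consumer(int(d), int(a), int(v), int(p.replace(",", "")),
                         addr, int(port), host)
            )
    return consumers


consumers = parse_details(SAMPLE)
print(len(consumers), sum(c.defined for c in consumers),
      sum(c.packets for c in consumers))
```

Keying on the (address, port) pair rather than the host name keeps separate data-consumer programs on the same host apart.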
Notice that the same host name may appear more than once. This is because every running copy of xmperf and every other active data-consumer program is counted and treated as a separate data consumer, each identified by the port number used for UDP packets as shown in the xmpeek output.
The second detail line in the Sample Output from xmpeek shows that one particular monitor on host umbra has six instruments defined but only four active. This would happen if a remote xmperf console has been opened but is now closed. When you close an xmperf console, it stays in the Monitor menu of the xmperf main window and the definition of the instruments of that console remains in the tables of the data-supplier daemon but the instruments are not active.
If the data-consumer program is xmperf, there are only three ways an instrument can be erased from the tables in the xmservd daemon after it is defined. They are:
In most cases, the latter situation occurs because the data consumer has been killed (as opposed to shut down in an orderly fashion). As the daemon detects that the instruments of the data-consumer hosts are no longer active, it deletes them one at a time. When the last instrument of a data consumer is deleted from the tables in xmservd, all information about the remote monitor is deleted too, and the monitor no longer shows up in the output from xmpeek.
If the xmpeek program is invoked with the -l flag (lowercase L) it lists all the available statistics of the remote host given on the command line, or the local host if no host name is given. The list of statistics is sent to standard output, which permits you to redirect it to a file or pipe it into another command. The following figure shows a partial listing of statistics on an HP 9000/7255:
/hp2/CPU/                 Central processor statistics
/hp2/CPU/gluser           System-wide time executing in user mode (percent)
/hp2/CPU/glkern           System-wide time executing in kernel mode (percent)
/hp2/CPU/glwait           System-wide time waiting for IO (percent)
/hp2/CPU/glidle           System-wide time CPU is idle (percent)
/hp2/CPU/glnice           System-wide time CPU is running w/nice priority (%)
. . .
/hp2/CPU/cpu0/            Statistics for processor #0
/hp2/CPU/cpu0/user        Time executing in user mode (percent)
/hp2/CPU/cpu0/kern        Time executing in kernel mode (percent)
/hp2/CPU/cpu0/wait        Time waiting for IO (percent)
/hp2/CPU/cpu0/idle        Time CPU is idle (percent)
/hp2/CPU/cpu0/nice        Time CPU is running code with nice priority
. . .
/hp2/Mem/                 Memory Statistics
/hp2/Mem/Real/            Physical memory statistics
/hp2/Mem/Real/size        Size of physical memory (4K pages)
/hp2/Mem/Real/numfrb      Number of pages on free list
/hp2/Mem/Real/%free       % memory which is free
/hp2/Mem/Real/totreal     Total real memory (Kbytes)
/hp2/Mem/Real/actreal     Active real memory (Kbytes)
/hp2/Mem/Virt/            Virtual memory management statistics
/hp2/Mem/Virt/pagein      4K pages read by VMM
/hp2/Mem/Virt/pageout     4K pages written by VMM
/hp2/Mem/Virt/zerofill    Page faults satisfied by zero-filling memory frames
/hp2/Mem/Virt/pagexct     Total page faults
. . .
When a host's statistics include contexts that may exist in multiple instantiations and such instantiations are volatile, the list does not break all such contexts down in their components. Rather, only the first instance of the context is broken down and all further instances are listed with five dots appended to the statistics path name. The following example shows this. The process identified by 514~wait (actually a pseudo process) is fully broken down. All other processes are merely listed with their identifier since they would all break down to the same base statistics as the wait process.
/birte/Proc/                     Process statistics
/birte/Proc/pswitch              Process context switches
/birte/Proc/runque               Average count of processes waiting for the CPU
/birte/Proc/runocc               Number of samplings of runque
/birte/Proc/swpque               Average count of processes waiting to be paged in
/birte/Proc/swpocc               Number of samplings of swpque
/birte/Proc/ksched               Number of kernel process creations
/birte/Proc/kexit                Number of kernel process exits
/birte/Proc/514~wait/            Process wait (514) %cpu 54.6, PgSp: 0.0mb, uid:
/birte/Proc/514~wait/pri         Process priority
/birte/Proc/514~wait/wtype       Process wait status
/birte/Proc/514~wait/majflt      Process page faults involving IO
/birte/Proc/514~wait/minflt      Process page faults not involving IO
/birte/Proc/514~wait/cpums       CPU time in milliseconds in interval
/birte/Proc/514~wait/cpuacc      CPU time in milliseconds in life of process
/birte/Proc/514~wait/cpupct      CPU time in percent in interval
/birte/Proc/514~wait/usercpu     Process CPU use in user mode (percent)
/birte/Proc/514~wait/kerncpu     Process CPU use in kernel mode (percent)
/birte/Proc/514~wait/workmem     Physical memory used by process private data (4K)
/birte/Proc/514~wait/codemem     Physical memory used by process code (4K pages)
/birte/Proc/514~wait/pagsp       Page space used by process private data (4K page
/birte/Proc/514~wait/nsignals    Signals received by process
/birte/Proc/514~wait/nvcsw       Voluntary context switches by process
/birte/Proc/514~wait/tsize       Code size (bytes)
/birte/Proc/514~wait/maxrss      Maximum code+data resident set size (4K pages)
/birte/Proc/12002~x/.....
/birte/Proc/13207~xlock/.....
/birte/Proc/771~netw/.....
/birte/Proc/1~init/.....
/birte/Proc/5723~trapgend/.....
/birte/Proc/0~/.....
/birte/Proc/15339~aixterm/.....
/birte/Proc/2823~syncd/.....
/birte/Proc/13047~xmservd/.....
/birte/Proc/15593~aixterm/.....
. . .
Because the Performance Toolbox for AIX can be expanded in the future, it is likely that changes to messages or network protocol will be introduced. For this reason, the message types are_you_there, i_am_back, and send_status carry information about the xmquery protocol level they are using.
In case of a difference in protocol version, data-consumer programs do not attempt to negotiate with the data-supplier host. This does not prevent the data supplier from negotiating with, and supplying data to, other remote monitors at the same protocol level as itself.
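The protocol-level rule can be sketched in Python as follows. This is an illustration only: the Message class, its field names, and the level value 3 are hypothetical and do not reflect the actual xmquery packet layout, which is not described here.

```python
from dataclasses import dataclass

# Hypothetical protocol level of this data consumer (illustrative value).
OWN_PROTOCOL_LEVEL = 3

# Message types that, according to the text, carry the protocol level.
LEVEL_CARRYING_TYPES = {"are_you_there", "i_am_back", "send_status"}


@dataclass
class Message:
    msg_type: str        # for example "i_am_back"
    protocol_level: int  # level announced by the sender
    host: str            # name of the sending host


def same_protocol_level(msg, own_level=OWN_PROTOCOL_LEVEL):
    """Return True only if the sender uses our protocol level.

    A data consumer would go on to negotiate only when this returns
    True; on a mismatch it simply does not attempt negotiation.
    """
    if msg.msg_type not in LEVEL_CARRYING_TYPES:
        raise ValueError(f"{msg.msg_type} carries no protocol level")
    return msg.protocol_level == own_level


if __name__ == "__main__":
    reply = Message("i_am_back", 2, "birte")
    if not same_protocol_level(reply):
        print(f"skipping {reply.host}: protocol level {reply.protocol_level}")
```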
Access to the xmservd daemon can be limited by supplying stanzas in the configuration file /etc/perf/xmservd.res (or /usr/lpp/perfagent/xmservd.res if the file /etc/perf/xmservd.res does not exist). The three stanza types are only:, always:, and max:. Note that the colon is part of the stanza. The stanza must begin in column one of a line. There may be more than one line for each stanza type, but in the case of the max: stanza, the last instance overrides any earlier one.
Make sure you understand this: If one or more only: lines are specified, only hosts specified in such lines get through to the data retrieval functions of the daemon.
However, if an only: stanza is also specified and the host is not named in any only: line, access is denied even before the always: stanza can be checked. Consequently, if you use the always: stanza, you must either refrain from using the only: stanza or make sure that all hosts named in the always: lines are also named in the only: lines.
Access is denied at the time a statset is defined, which normally is when a remote console is opened from the data-consumer host.
If no max: line is found, the maximum number of data consumers defaults to 16.
The following shows a sample xmservd configuration file. Two only: lines define a total of nine hosts that can access the xmservd daemon. No other host is allowed to request statistics from the daemon on the host with this configuration file.
Two always: lines name two hosts from which remote monitoring should always be allowed. Finally, a maximum of three data consumers at a time are permitted to have statsets defined. Note that each copy of xmperf and of the other remote data-consumer programs of Performance Toolbox for AIX counts as one data consumer, no matter on which host it runs.
only: srv1 srv2 birte snavs xtra jones chris
only: savanna rhumba
always: birte
always: chris
max: 3
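The interplay of the three stanza types can be sketched in Python. This is not the code xmservd uses; the function names are hypothetical. The sketch also assumes that a host named in an always: line is admitted even when the max: limit has been reached, which is an interpretation of "should always be allowed" rather than something stated above.

```python
SAMPLE = """\
only: srv1 srv2 birte snavs xtra jones chris
only: savanna rhumba
always: birte
always: chris
max: 3
"""


def parse_access_rules(text):
    """Collect only:, always:, and max: stanzas from configuration text."""
    only, always = set(), set()
    max_consumers = 16  # default when no max: line is found
    for line in text.splitlines():
        # Stanzas must begin in column one, so leading space is not stripped.
        if line.startswith("only:"):
            only.update(line[len("only:"):].split())
        elif line.startswith("always:"):
            always.update(line[len("always:"):].split())
        elif line.startswith("max:"):
            fields = line[len("max:"):].split()
            if fields:
                max_consumers = int(fields[0])  # the last max: line wins
    return only, always, max_consumers


def access_allowed(host, active_consumers, only, always, max_consumers):
    """Decide whether host may define a statset."""
    if only and host not in only:
        return False  # denied before always: is even checked
    if host in always:
        return True   # assumption: always: hosts bypass the max: limit
    return active_consumers < max_consumers


if __name__ == "__main__":
    rules = parse_access_rules(SAMPLE)
    for host in ("birte", "savanna", "elsewhere"):
        print(host, access_allowed(host, 3, *rules))
```

With three data consumers already active, the sketch admits birte (named in both only: and always:), refuses savanna because the limit is reached, and refuses any host not named in an only: line.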
The xmservd daemon supplies statistics to data consumers. Such statistics may be maintained and updated internally by xmservd itself through the SPMI API, or they may be passed on by xmservd to data consumers on behalf of other programs that produce statistics. Programs that provide xmservd with statistics in this way are called dynamic data-supplier (DDS) programs. They are written to the application programming interface of the System Performance Measurement Interface (see "System Performance Measurement Interface API").
Before a DDS can start supplying statistics to xmservd, the DDS must register with xmservd. Before it can do this, it must be started. DDS programs can be started manually or by any other process when their presence is required, but some dynamic data suppliers may always be required to start when xmservd starts. To facilitate this, the xmservd configuration file /etc/perf/xmservd.res (or /usr/lpp/perfagent/xmservd.res if the file /etc/perf/xmservd.res does not exist) has a special type of stanza to identify DDS programs that must be started by xmservd whenever xmservd starts. The stanza can occur as many times as you have DDS programs to start, each line describing one DDS program. The stanza is:
supplier: /usr/samples/perfagent/server/SpmiSupl
supplier: /u/jensen/mysuppl -x -k 100
supplier: /usr/bin/filtd -p5
The example contains three stanzas as follows:
In this example, the mysuppl program takes command line arguments, as does the filtd daemon. The example also shows how these command line arguments can be put into the file.
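As an illustration of how supplier: lines map to programs and their arguments, the following Python sketch splits each stanza into an argument list. The function name parse_suppliers is hypothetical, and the use of shell-style word splitting (shlex.split) is an assumption; how xmservd itself tokenizes the line is not specified here.

```python
import shlex

SUPPLIER_SAMPLE = """\
supplier: /usr/samples/perfagent/server/SpmiSupl
supplier: /u/jensen/mysuppl -x -k 100
supplier: /usr/bin/filtd -p5
"""


def parse_suppliers(text):
    """Return one argument list per supplier: stanza.

    The first element of each list is the program path; the remaining
    elements are its command line arguments.
    """
    commands = []
    for line in text.splitlines():
        # Stanzas must begin in column one of a line.
        if line.startswith("supplier:"):
            argv = shlex.split(line[len("supplier:"):])
            if argv:
                commands.append(argv)
    return commands


if __name__ == "__main__":
    for argv in parse_suppliers(SUPPLIER_SAMPLE):
        print(argv[0], "with arguments", argv[1:])
```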
If you use the Performance Toolbox for AIX in a network where a large number of hosts are running the xmservd daemon, you may have to increase the maximum size of the socket buffer pool on data consumer hosts to reduce the probability of UDP packets being dropped.
If you notice that xmperf does not see all the hosts that run xmservd, UDP packets are probably being dropped. Use the no command to increase the socket buffer pool to between four and eight times the default. For example:
no -o sb_max=262144
On AIX Version 3.2, if packets still seem to be dropped, use the netstat -m command to display the number of "requests for memory denied." If this number grows as you refresh the host list, use the no command to increase the lowclust option like this:
no -o lowclust=50
To make sure the values are increased each time your host boots, add the above commands to the file /etc/rc.tcpip.
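Adding the commands to /etc/rc.tcpip more than once is harmless but untidy. The Python sketch below shows one way to work out which tuning lines are still missing from the file. It is only an illustration: the function name missing_tuning_lines is hypothetical, the sketch works on text passed to it rather than touching the real file, and the values are the ones used in the examples above.

```python
def missing_tuning_lines(rc_text, settings):
    """Return the 'no -o' commands that rc_text does not yet contain.

    rc_text  -- current contents of the startup file, as a string
    settings -- mapping of option name to the desired value
    """
    existing = {line.strip() for line in rc_text.splitlines()}
    wanted = [f"no -o {name}={value}" for name, value in settings.items()]
    return [command for command in wanted if command not in existing]


if __name__ == "__main__":
    current = "#!/bin/ksh\nno -o sb_max=262144\n"
    desired = {"sb_max": 262144, "lowclust": 50}
    for command in missing_tuning_lines(current, desired):
        print(command)
```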