AIX Version 4.3 System Management Guide: Communications and Networks

Troubleshooting NIS-Related Problems

The approach to troubleshooting a Network Information Service (NIS) problem depends on whether the problem is at the NIS client or the NIS server.

Identifying NIS Client Problems

The most common NIS client problems occur at the following times:

Using rsh

When an AIX machine has two interfaces that are both given the same name, gethostbyname lookups for the rsh command fail if NIS is in use, because AIX NIS returns only the first address found rather than both. This is an implementation limitation imposed by the New Database Manager (NDBM) and by performance considerations. The error message is:

0826-825: there is a host address that does not match

When Commands Hang

The most common problem at an NIS client node is a command that hangs. Sometimes a command appears to hang even though the system seems fine and other commands run. In such a case, a message like the following may appear at the console:

NIS: server not responding for domain <wigwam>. Still trying

This error message indicates that the ypbind daemon on the local machine is unable to communicate with the ypserv daemon in the wigwam domain. This happens when the systems that run the ypserv daemon have failed. It can also occur if the network or the NIS server machine is so overloaded that the ypserv daemon cannot get a response back to your ypbind daemon within the time-out period.

Under these circumstances, all the other NIS clients on your network show the same or similar problems. The condition is usually temporary. The messages go away when the NIS server machine reboots and the ypserv daemon restarts, or else when the load on the NIS server and the network decreases.

If the ypbind daemon is communicating with the ypserv daemon and the NIS server is not overloaded, work through the following checks on the NIS server:

Look for the ypserv and ypbind processes. If the server's ypbind daemon is not running, start it using the instructions in "Starting and Stopping the NIS Daemons".

If a ypserv process is running, issue the ypwhich command on the NIS server machine. If this command returns no answer, the ypserv daemon is probably hung and should be restarted. Stop and restart the ypserv daemon by following the instructions in "Starting and Stopping the NIS Daemons".
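The process check described above can be sketched as a small shell helper. This is only an illustration: the function name daemon_running is hypothetical, and it reads ps -ef output on standard input so that it can be tried on a machine without NIS. It matches the base name of any field from the command column onward, so it can also match a command argument of the same name.

```shell
# Hypothetical helper: succeed if the named daemon appears in `ps -ef`
# output supplied on standard input.
daemon_running() {
    awk -v name="$1" '
        {
            # Fields 8 and later hold the command and its arguments.
            for (i = 8; i <= NF; i++) {
                n = split($i, parts, "/")
                if (parts[n] == name) found = 1
            }
        }
        END { exit (found ? 0 : 1) }
    '
}
```

On the NIS server, one might run ps -ef | daemon_running ypserv and, if the function succeeds, follow up with the ypwhich command as described above.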

When NIS Service Is Unavailable

When other machines on the network appear to have no problems, but NIS service becomes unavailable on your system, a variety of symptoms can show up:

When symptoms like these occur, issue the ls -l command on a directory containing files owned by many users, including users not in the local machine's /etc/passwd file. Use the following format:

ls -l

If the ls -l command shows numbers rather than names for file owners that are not in the local machine's /etc/passwd file, NIS service is not working.

These symptoms usually indicate that your ypbind daemon is not running. You can use the ps -ef command to check for it. If you do not find a ypbind daemon, start it by following the instructions in "Starting and Stopping the NIS Daemons".
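The ls -l check described above can be sketched as a filter over the command's output. The function name numeric_owners is hypothetical. It prints only the last word of each matching line, so file names that contain spaces are shown truncated.

```shell
# Hypothetical helper: read `ls -l` output on standard input and print the
# names of files whose owner column (field 3) is a bare numeric user ID.
numeric_owners() {
    awk '$3 ~ /^[0-9]+$/ { print $NF }'
}
```

For example, ls -l | numeric_owners run in a shared directory lists the files whose owners could not be resolved to names. An empty result suggests that owner names are being resolved.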

When the ypbind Daemon Becomes Inoperable

If the ypbind daemon repeatedly crashes immediately after it is started, you should look for a problem in some other part of the system.

When the ypwhich Command Is Inconsistent

When you use the ypwhich command several times at the same client node, the response varies because the status of the NIS server changes. The status changes are normal.

The binding of an NIS client to an NIS server changes over time on a busy network or when the NIS servers are busy. Whenever possible, the system stabilizes so that all clients get acceptable response time from the NIS servers. The source of an NIS service is not important, because an NIS server machine often gets its own NIS services from another NIS server on the network.

Identifying NIS Server Problems

The most common NIS server problems occur at the following times:

When Different Versions of an NIS Map Exist

Because NIS works by propagating maps among servers, you can sometimes find different versions of a map on different servers in the network. This is normal if it is temporary, and abnormal if it persists.

Normal update is prevented when an NIS server or a router between NIS servers is down during a map transfer attempt. When all the NIS servers and all the routers between them are up and running, the ypxfr command should succeed.

If a particular slave server has problems updating a map, you can log in to that server and run the ypxfr command interactively. If this command fails, an error message returns to tell you why, so that you can fix the problem. If the command succeeds, but you want to check it regardless, create a log file to enable logging of messages by typing the following:

cd /var/yp
touch ypxfr.log

This saves all output from the ypxfr command. The output looks much like what the ypxfr command creates when it is run interactively, but each line in the log file is time stamped. The time stamp tells when the ypxfr command began its work. It is normal to see unusual orderings in the time stamps. If copies of the ypxfr command ran simultaneously but their work took differing amounts of time, the summary status lines may be written to the log file in an order that differs from the order in which the commands were invoked.

Any pattern of intermittent failure shows up in the log. After you fix the problem, turn off logging by removing the log file. If you forget to remove the log file, it grows without limit.
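Turning logging on and off as described above can be sketched with two small helpers. The function names and the YP_DIR variable are hypothetical. The variable defaults to the /var/yp directory used in the text and can be pointed elsewhere to try the helpers safely.

```shell
# Directory that holds the log file; /var/yp is the location used in the text.
YP_DIR=${YP_DIR:-/var/yp}

# Hypothetical helper: create the log file so that ypxfr output is saved.
start_ypxfr_log() {
    touch "$YP_DIR/ypxfr.log"
}

# Hypothetical helper: print the log size in bytes, then remove the file
# so that it cannot grow without limit.
stop_ypxfr_log() {
    if [ -f "$YP_DIR/ypxfr.log" ]; then
        wc -c < "$YP_DIR/ypxfr.log" | tr -d ' '
        rm -f "$YP_DIR/ypxfr.log"
    fi
}
```

Calling stop_ypxfr_log once the problem is fixed removes the file, which is the step that is easy to forget.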

While you are logged in to the NIS slave server, inspect the system crontab entries and the ypxfr shell scripts they invoke.

Make sure that the NIS slave server is in the ypservers map. If not, the yppush command will not notify the slave server when a new copy of a map exists.
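One way to check membership in the ypservers map is to list its keys with ypcat -k and look for the slave server's host name. The sketch below assumes that each output line begins with the server name. The function name in_ypservers is hypothetical.

```shell
# Hypothetical helper: succeed if the given host name is a key in
# `ypcat -k ypservers` output supplied on standard input.
in_ypservers() {
    awk -v host="$1" '$1 == host { found = 1 } END { exit (found ? 0 : 1) }'
}
```

On the slave server, one might run ypcat -k ypservers | in_ypservers "$(hostname)". A failure suggests that the server is missing from the map, provided that the map stores host names in the same form that the hostname command reports.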

When the ypserv Daemon Becomes Inoperable

When the ypserv process repeatedly crashes immediately after starting and does not stay up despite repeated restarts, the debugging process is similar to that described for ypbind crashes. First, check for the portmap daemon:

ps -ef | grep portmap

If you do not find the portmap daemon, reboot the server. If there is a portmap daemon, type:

rpcinfo -p speed

where speed is the hostname of the NIS server.

The output includes four entries that represent the ypserv daemon. The port numbers on your machine will differ from those shown here:

program vers proto   port  service
100004     2   udp   1027  ypserv
100004     2   tcp   1024  ypserv
100004     1   udp   1027  ypserv
100004     1   tcp   1024  ypserv

If these entries do not exist, the ypserv daemon is unable to register its services. Reboot the machine. If the ypserv entries are present, but they change each time you try to restart the /usr/lib/netsvc/yp/ypserv daemon, reboot the machine again.
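The registration check above can be sketched as a filter that counts the ypserv entries in rpcinfo -p output. It relies on the program number 100004 shown in the listing. The function name count_ypserv_entries is hypothetical.

```shell
# Hypothetical helper: read `rpcinfo -p` output on standard input and print
# how many entries are registered under program number 100004 (ypserv).
count_ypserv_entries() {
    awk '$1 == 100004 { n++ } END { print n + 0 }'
}
```

For example, rpcinfo -p speed | count_ypserv_entries should print 4 when all of the entries listed above are registered, and 0 when the ypserv daemon has not registered its services.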

