Hangs

Note that if an NFS server doesn't respond to a client's request, the request should be retried by the NFS client. Taking tcpdump for hangs should be the best way to determine if it is a client's fault or the server's fault. If you don't see anything in the tcpdump, then it is a client's failure to retry. If the client is retrying but the server isn't responding, then it is likely the server's issue. If they both talk to each other same thing over and over again, then it is time to consult the protocol to see why that is happening and whose fault it is.

Since this is a hang and you want to see what is happening in its full glory, capturing everything at the NFS client would be ideal assuming that the server might be serving other NFS clients. The following tcpdump options while executing it from NFS client are recommented, use your best judgement though:

tcpdump -i <iface> -s0 -w <pcapfile-name> host <NFS-server-ip-address>

Performance

Performance should be analysed by performance counters. This is beyond the scope of tmcpdump. Tcpdump sometimes gives a good indication of performance issues by RTT statistics. The following pcap file shows high WRITE (with FILE_SYNC mode) average response time (78ms), most likely due to the backend file system being slow (with large files!). It most likely caused other operations to be slow as well:

$ tshark -q -zrpc,srt,100003,3  -r slow-writes.pcap

=======================================================
NFS Version 3 SRT Statistics:
Filter:
Procedure        Calls    Min SRT    Max SRT    Avg SRT
GETATTR           2549   0.000056   6.688285   0.222500
SETATTR             79   0.000100   0.186188   0.008062
ACCESS             541   0.014839   5.270222   0.285889
READ                 2   0.132805   2.016926   1.074866
WRITE             9188   0.000125   7.358130   0.784342
READDIRPLUS       1821   0.000078   1.096325   0.042413
FSSTAT            1210   0.000055   7.479788   0.322355
COMMIT               1   0.022565   0.022565   0.022565
=======================================================

Filter NFS port traffic

Port 2049 is used for NFSv3 and NFSv4. Note that MOUNT, NLM, NSM protocols used in NFSv3 environments use random ports.

The following filter will display port 2049 (aka NFS) traffic only:

$ tshark -Ynfs  -r nfs.pcap
7          0 192.168.200.16 -> 192.168.200.184 NFS 4302 V3 WRITE Call, FH: 0xdb3717db Offset: 20603228160 Len: 4096 FILE_SYNC
10          0 192.168.102.121 -> 192.168.102.183 NFS 210 V3 FSSTAT Call, FH: 0x6eb2ec3f
14          0 192.168.200.27 -> 192.168.200.184 NFS 4310 V3 WRITE Call, FH: 0x50a17930 Offset: 2528481280 Len: 4096 FILE_SYNC
27          0 192.168.101.41 -> 192.168.101.183 NFS 186 V3 GETATTR Call, FH: 0xc4cfb703
31          0 192.168.200.16 -> 192.168.200.184 NFS 4302 V3 WRITE Call, FH: 0xc36dcd2e Offset: 21383094272 Len: 4096 FILE_SYNC
35          0 192.168.200.11 -> 192.168.200.184 NFS 24790 V3 WRITE Call, FH: 0x4c45ddae Offset: 2464600064 Len: 24576 FILE_SYNC
45          0 192.168.200.25 -> 192.168.200.184 NFS 8406 V3 WRITE Call, FH: 0xc0d7fbd3 Offset: 2523549696 Len: 8192 FILE_SYNC
51          0 192.168.200.11 -> 192.168.200.184 NFS 3822 V3 WRITE Call, FH: 0x424ed19f Offset: 41099276288 Len: 53248 FILE_SYNC
59          0 192.168.102.183 -> 192.168.102.121 NFS 230 V3 FSSTAT Reply (Call In 10)
60          0 192.168.102.121 -> 192.168.102.183 NFS 210 V3 FSSTAT Call, FH: 0x68a15344

Capturing NFSv3 byte range lock traffic

NFSv3 environments use NLM protocol. Find the NLM port number using "rcpinfo -p <NFS-server>" and search for nlockmgr port number. Then use the tcpdump "port <port>" option to capture only the NLM traffic. For example, tcp NLM port number is 46859 in the example below:

# rpcinfo -p vm2
  program vers proto   port  service
   100000    4   tcp    111  portmapper
   100000    3   tcp    111  portmapper
   100000    2   tcp    111  portmapper
   100000    4   udp    111  portmapper
   100000    3   udp    111  portmapper
   100000    2   udp    111  portmapper
   100024    1   udp  55665  status
   100024    1   tcp  49741  status
   100003    3   udp   2049  nfs
   100003    3   tcp   2049  nfs
   100003    4   udp   2049  nfs
   100003    4   tcp   2049  nfs
   100005    1   udp  60952  mountd
   100005    1   tcp  45982  mountd
   100005    3   udp  60952  mountd
   100005    3   tcp  45982  mountd
   100021    4   udp  33489  nlockmgr
   100021    4   tcp  46859  nlockmgr
   100011    1   udp  50620  rquotad
   100011    1   tcp  33899  rquotad
   100011    2   udp  50620  rquotad
   100011    2   tcp  33899  rquotad

To capture traffic going to or from port 46859, you can do:

tcpdump -i <iface> -s0 -w <pcapfile-name> "tcp port 46859 and host <NFS-server-ip-address>"

Published

Category

tcpdump

Tags

Contact