If NFS clients become very slow, or applications hang while accessing files on an NFS mount, the NFS server is likely not responding, or not responding quickly enough. Because an NFS server interacts with several other software components (authentication, the back-end file system, and so on), those components may contribute to slowness, but a hang is more likely to originate in the NFS server itself. The following data capture steps help determine whether the observed problem is a performance issue or a hang.

  • Collect the following data periodically on every CES node. Each command writes to standard output, so append its output to files of your choice. Where shown below, run the "date" command first to add a timestamp; the remaining commands print their own timestamps. Run these commands at intervals of 5 minutes or less, and start the collection before the problem is observed.

    1. Extract GPFS VFS statistics and reset them for the next collection:

      mmfsadm vfsstats show && mmfsadm vfsstats reset
      
    2. Collect Ganesha stats:

      ganesha_stats fast # For request stats
      ganesha_stats iov3 # For NFSv3 read/write stats
      ganesha_stats iov4 # For NFSv4 read/write stats
      
    3. Count the file descriptors opened by the Ganesha server (replace <PID-of-Ganesha-process> with the actual PID of the Ganesha process):

      sh -c 'date && ls /proc/<PID-of-Ganesha-process>/fd | wc -l'
      
    4. List the 10 processes using the most memory, sorted by resident set size (the extra line is the ps header):

      sh -c 'date && ps aux --sort -rss | head -n 11'
      
    5. Show free memory on the system (in MiB):

      sh -c 'date && free -m'
      
  • When the problem is observed on a CES node, collect the following information on that node:

    1. Collect netstat output:

      sh -c 'date && netstat -an'
      
    2. Collect the kernel stack of every NFS-Ganesha thread (reading these files requires root):

      sh -c 'date && for i in /proc/<PID-of-Ganesha-process>/task/*; do echo "=== $i ==="; cat "$i/stack"; done'
      
    3. Enable Ganesha tracing (messages go to /var/log/ganesha.log):

      ganesha_mgr set_log COMPONENT_ALL FULL_DEBUG
      
    4. Enable GPFS tracing:

      Set the vnode trace class to at least level 4.
      
    5. Collect kernel thread stacks:

      mmdumpkthreads
      
    6. After enabling Ganesha and GPFS tracing, capture network traffic with tcpdump on both the client and the server for 5 minutes. Always write the capture in pcap format by passing the -w option to tcpdump.

    7. Collect a core dump by sending the SIGABRT signal to the Ganesha process. First make sure your CES nodes are set up to collect NFS-Ganesha core dumps; see "Setup to take ganesha coredumps" for details.
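The collection steps above can be scripted so that nothing is missed under pressure. The following is a minimal POSIX shell sketch, not an official tool: the function names collect_once and capture_problem, the output directory /var/tmp/nfs-debug, the --loop argument, the 300-second interval (taken from the 5-minute guidance) and the use of timeout(1) with "tcpdump -i any" to bound the capture are all assumptions made for illustration. The Ganesha PID is passed in explicitly, as in the manual steps. The GPFS tracing step is only marked with a comment because its exact command is not given here, tcpdump on the client must still be run separately, and the SIGABRT core dump is deliberately left as a manual step.

```shell
#!/bin/sh
# Hypothetical helper (not an official IBM tool) automating the CES NFS
# data collection steps. All file and function names are illustrative.

# collect_once OUTDIR [GANESHA_PID]
# Appends one round of the periodic statistics to files under OUTDIR.
collect_once() {
    outdir=$1
    gpid=$2
    mkdir -p "$outdir"
    # GPFS VFS statistics; reset only if "show" succeeded.
    { date; mmfsadm vfsstats show && mmfsadm vfsstats reset; } >> "$outdir/vfsstats.log" 2>&1
    # Ganesha request stats plus NFSv3/NFSv4 read/write stats.
    { date; ganesha_stats fast; ganesha_stats iov3; ganesha_stats iov4; } >> "$outdir/ganesha_stats.log" 2>&1
    # Open file descriptors of the Ganesha process, if a valid PID was given.
    if [ -n "$gpid" ] && [ -d "/proc/$gpid/fd" ]; then
        { date; ls "/proc/$gpid/fd" | wc -l; } >> "$outdir/ganesha_fds.log" 2>&1
    fi
    # ps header plus the 10 processes with the largest resident set size.
    { date; ps aux --sort -rss | head -n 11; } >> "$outdir/top_mem.log" 2>&1
    # System memory in MiB.
    { date; free -m; } >> "$outdir/free.log" 2>&1
}

# capture_problem OUTDIR GANESHA_PID [TCPDUMP_SECONDS]
# Gathers the problem-time data on the affected CES node (server side only).
capture_problem() {
    outdir=$1
    gpid=$2
    secs=${3:-300}
    mkdir -p "$outdir"
    { date; netstat -an; } > "$outdir/netstat.log" 2>&1
    # Kernel stack of every Ganesha thread (reading .../stack needs root).
    {
        date
        for t in "/proc/$gpid"/task/*; do
            echo "=== $t ==="
            cat "$t/stack"
        done
    } > "$outdir/ganesha_stacks.log" 2>&1
    # Raise Ganesha logging; messages go to /var/log/ganesha.log.
    ganesha_mgr set_log COMPONENT_ALL FULL_DEBUG >> "$outdir/ganesha_mgr.log" 2>&1
    # Enable GPFS tracing here (site-specific command, not scripted in this sketch).
    { date; mmdumpkthreads; } > "$outdir/kthreads.log" 2>&1
    # Time-bounded packet capture written in pcap format (-w).
    timeout "$secs" tcpdump -i any -w "$outdir/server.pcap" >> "$outdir/tcpdump.log" 2>&1
}

# Example: ./nfs_debug.sh --loop <PID-of-Ganesha-process>
# collects the periodic data every 300 seconds until interrupted.
if [ "${1:-}" = "--loop" ]; then
    while :; do
        collect_once /var/tmp/nfs-debug "${2:-}"
        sleep 300
    done
fi
```

Taking the PID as an argument mirrors the manual steps and avoids guessing the daemon's process name, which may differ between installations. Run capture_problem as root so that the per-thread stack files are readable.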

