* Mountpoints disappearing from namespace unexpectedly.
@ 2016-09-05 16:45 Oleg Drokin
  2016-09-20  1:44 ` Revalidate failure leads to unmount (was: Mountpoints disappearing from namespace unexpectedly.) Oleg Drokin
  0 siblings, 1 reply; 11+ messages in thread
From: Oleg Drokin @ 2016-09-05 16:45 UTC (permalink / raw)
  To: <linux-fsdevel@vger.kernel.org>

Hello!

   I am seeing a strange phenomenon here that I have not been able to completely figure
   out and perhaps it might ring some bells for somebody else.

   I first noticed this in 4.6-rc testing in early June, but I just hit it in a similar
   way in 4.8-rc5.

   Basically I have a test script that does a bunch of stuff in a limited namespace,
   run in three related namespaces (the backend is the same, the mountpoints are separate).

   When a process (a process group or something) is killed, sometimes one of the
   mountpoints disappears from the namespace completely, even though the scripts
   themselves do not unmount anything.

   No traces of the mountpoint remain anywhere in /proc (including /proc/*/mounts), so it
   does not seem to be in any private namespace of any of the processes either.

   The filesystems are a locally mounted ext4 (loopback-backed) plus two NFS mounts
   (the same ext4 re-exported).
   In the past it was always the ext4 mountpoint that was dropping, but today I got
   one of the NFS ones.

   Sequence looks like this:
+ mount /tmp/loop /mnt/lustre -o loop
+ mkdir /mnt/lustre/racer
mkdir: cannot create directory '/mnt/lustre/racer': File exists
+ service nfs-server start
Redirecting to /bin/systemctl start  nfs-server.service
+ mount localhost:/mnt/lustre /mnt/nfs -t nfs -o nolock
+ mount localhost:/ /mnt/nfs2 -t nfs4
+ DURATION=3600
+ sh racer.sh /mnt/nfs/racer
+ DURATION=3600
+ sh racer.sh /mnt/nfs2/racer
+ wait %1 %2 %3
+ DURATION=3600
+ sh racer.sh /mnt/lustre/racer
Running racer.sh for 3600 seconds. CTRL-C to exit
Running racer.sh for 3600 seconds. CTRL-C to exit
Running racer.sh for 3600 seconds. CTRL-C to exit
./file_exec.sh: line 12: 216042 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 229086 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 230134 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 235154 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
./file_exec.sh: line 12: 270951 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
racer cleanup
racer cleanup
racer cleanup
sleeping 5 sec ...
sleeping 5 sec ...
sleeping 5 sec ...
file_create.sh: no process found
file_create.sh: no process found
dir_create.sh: no process found
file_create.sh: no process found
dir_create.sh: no process found
file_rm.sh: no process found
dir_create.sh: no process found
file_rm.sh: no process found
file_rename.sh: no process found
file_rm.sh: no process found
file_rename.sh: no process found
file_link.sh: no process found
file_rename.sh: no process found
file_link.sh: no process found
file_symlink.sh: no process found
file_link.sh: no process found
file_symlink.sh: no process found
file_list.sh: no process found
file_list.sh: no process found
file_symlink.sh: no process found
file_concat.sh: no process found
file_concat.sh: no process found
file_list.sh: no process found
file_exec.sh: no process found
file_concat.sh: no process found
file_exec.sh: no process found
file_chown.sh: no process found
file_exec.sh: no process found
file_chown.sh: no process found
file_chmod.sh: no process found
file_chown.sh: no process found
file_chmod.sh: no process found
file_mknod.sh: no process found
file_chmod.sh: no process found
file_mknod.sh: no process found
file_truncate.sh: no process found
file_mknod.sh: no process found
file_delxattr.sh: no process found
file_truncate.sh: no process found
file_truncate.sh: no process found
file_getxattr.sh: no process found
file_delxattr.sh: no process found
file_delxattr.sh: no process found
file_setxattr.sh: no process found
there should be NO racer processes:
file_getxattr.sh: no process found
file_getxattr.sh: no process found
file_setxattr.sh: no process found
there should be NO racer processes:
file_setxattr.sh: no process found
there should be NO racer processes:
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
df: /mnt/nfs/racer: No such file or directory
Filesystem     1K-blocks  Used Available Use% Mounted on
/dev/loop0        999320 46376    884132   5% /mnt/lustre
We survived racer.sh for 3600 seconds.
Filesystem     1K-blocks  Used Available Use% Mounted on
localhost:/       999424 46080    884224   5% /mnt/nfs2
We survived racer.sh for 3600 seconds.
+ umount /mnt/nfs
umount: /mnt/nfs: not mounted
+ exit 5

  Now you can see that in the middle of all that, /mnt/nfs suddenly disappeared.

  The racer scripts are at
  http://git.whamcloud.com/fs/lustre-release.git/tree/refs/heads/master:/lustre/tests/racer
  There are absolutely no unmounts in there.

  In the past I was able to just run the three racers in parallel, wait ~10 minutes,
  then kill all three of them, and with significant probability the ext4 mountpoint
  would disappear.

  Any idea on how to better pinpoint this?

  Thanks.

Bye,
    Oleg


* Revalidate failure leads to unmount (was: Mountpoints disappearing from namespace unexpectedly.)
  2016-09-05 16:45 Mountpoints disappearing from namespace unexpectedly Oleg Drokin
@ 2016-09-20  1:44 ` Oleg Drokin
  2016-12-06  1:39   ` Revalidate failure leads to unmount Oleg Drokin
  0 siblings, 1 reply; 11+ messages in thread
From: Oleg Drokin @ 2016-09-20  1:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Trond Myklebust, <linux-fsdevel@vger.kernel.org>,
	List Linux NFS Mailing

Hello!

   I think I have found an interesting condition for filesystems that have a
   revalidate op, and I am not quite sure this is really what we want.

   Basically it all started with mountpoints randomly getting unmounted during
   testing that I could not quite explain (see my quoted message at the end).

   Now I have finally caught the culprit: it is lookup_dcache() calling d_invalidate(),
   which in turn detaches all mountpoints in the entire subtree, like this:

Breakpoint 1, umount_tree (mnt=<optimized out>, how=<optimized out>)
    at /home/green/bk/linux-test/fs/namespace.c:1441
1441                                    umount_mnt(p);
(gdb) bt
#0  umount_tree (mnt=<optimized out>, how=<optimized out>)
    at /home/green/bk/linux-test/fs/namespace.c:1441
#1  0xffffffff8129ec82 in __detach_mounts (dentry=<optimized out>)
    at /home/green/bk/linux-test/fs/namespace.c:1572
#2  0xffffffff8129359e in detach_mounts (dentry=<optimized out>)
    at /home/green/bk/linux-test/fs/mount.h:100
#3  d_invalidate (dentry=0xffff8800ab38feb0)
    at /home/green/bk/linux-test/fs/dcache.c:1534
#4  0xffffffff8128122c in lookup_dcache (name=<optimized out>,
    dir=<optimized out>, flags=1536)
    at /home/green/bk/linux-test/fs/namei.c:1485
#5  0xffffffff81281d92 in __lookup_hash (name=0xffff88005c1a3eb8, 
    base=0xffff8800a8609eb0, flags=1536)
    at /home/green/bk/linux-test/fs/namei.c:1522
#6  0xffffffff81288196 in filename_create (dfd=<optimized out>, 
    name=0xffff88006d3e7000, path=0xffff88005c1a3f08, 
    lookup_flags=<optimized out>) at /home/green/bk/linux-test/fs/namei.c:3604
#7  0xffffffff812891f1 in user_path_create (lookup_flags=<optimized out>, 
    path=<optimized out>, pathname=<optimized out>, dfd=<optimized out>)
    at /home/green/bk/linux-test/fs/namei.c:3661
#8  SYSC_mkdirat (mode=511, pathname=<optimized out>, dfd=<optimized out>)
    at /home/green/bk/linux-test/fs/namei.c:3793
#9  SyS_mkdirat (mode=<optimized out>, pathname=<optimized out>,
    dfd=<optimized out>) at /home/green/bk/linux-test/fs/namei.c:3785
#10 SYSC_mkdir (mode=<optimized out>, pathname=<optimized out>)
    at /home/green/bk/linux-test/fs/namei.c:3812
#11 SyS_mkdir (pathname=-2115143072, mode=<optimized out>)
    at /home/green/bk/linux-test/fs/namei.c:3810
#12 0xffffffff8189f03c in entry_SYSCALL_64_fastpath ()
    at /home/green/bk/linux-test/arch/x86/entry/entry_64.S:207
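
   For reference, here is a condensed paraphrase (not the verbatim source) of the
   lookup_dcache() logic of that era; the point is that a bare 0 from
   ->d_revalidate() goes straight to d_invalidate():

/* Condensed paraphrase of fs/namei.c:lookup_dcache(), circa 4.8; not verbatim. */
static struct dentry *lookup_dcache_sketch(const struct qstr *name,
					   struct dentry *dir,
					   unsigned int flags)
{
	struct dentry *dentry = d_lookup(dir, name);

	if (dentry && (dentry->d_flags & DCACHE_OP_REVALIDATE)) {
		int error = dentry->d_op->d_revalidate(dentry, flags);

		if (error <= 0) {
			if (!error)
				d_invalidate(dentry);	/* 0: unhash the subtree, detach mounts */
			dput(dentry);
			return ERR_PTR(error);		/* negative errors just propagate */
		}
	}
	return dentry;
}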

   While I imagine the original idea was "cannot revalidate? Nuke the whole
   tree from orbit", the different reasons *why* we might be unable to revalidate
   were not considered.  In my case, killing a bunch of scripts at just the right
   time, while they are in the middle of revalidating some path component that has
   mountpoints below it, gets the whole thing nuked (somewhat) unexpectedly,
   because the nfs/sunrpc code notices the signal and returns ERESTARTSYS in the
   middle of the lookup.
   (This could even be exploitable in some setups, I imagine, since it allows an
   unprivileged user to unmount anything mounted on top of nfs.)

   It's even worse for Lustre, for example, because Lustre never tries to actually
   re-lookup anything anymore (that brought a bunch of complexities with it, so we
   were glad we could get rid of it) and just returns, whether the name is valid
   or not, hoping for a retry the next time around.

   So this brings up the question:
   Is revalidate really required to go to great lengths to avoid returning 0
   unless the underlying name has really, truly changed?  My reading of the
   documentation does not seem to match this, as the whole LOOKUP_REVAL logic
   would then be more or less redundant.

   Or is totally nuking the whole underlying tree a little bit over the top, and
   could it be replaced with something less drastic?  After all, a subsequent
   re-lookup could restore the dentries, but unmounts are not really reversible.
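
   (For reference, the nuking itself is roughly this; a rough paraphrase of
   fs/dcache.c:d_invalidate() of that era, not the verbatim source, with the
   d_walk() callbacks collapsed into a made-up walk_subtree() stand-in:)

/* Rough paraphrase of d_invalidate(); walk_subtree() below is only a stand-in
 * for the real d_walk() + detach_and_collect()/check_and_drop() callbacks. */
static void walk_subtree(struct dentry *root, struct dentry **mountpoint);

void d_invalidate_sketch(struct dentry *dentry)
{
	if (d_unhashed(dentry))
		return;			/* already dropped, nothing to do */
	if (!dentry->d_inode) {
		d_drop(dentry);		/* negative dentry: just unhash it */
		return;
	}
	for (;;) {
		struct dentry *mountpoint = NULL;

		/* shrink the subtree and remember any mountpoint found in it */
		walk_subtree(dentry, &mountpoint);
		if (!mountpoint)
			break;
		detach_mounts(mountpoint);	/* -> __detach_mounts() -> umount_tree() */
		dput(mountpoint);
	}
}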

   Thanks.

Bye,
    Oleg
On Sep 5, 2016, at 12:45 PM, Oleg Drokin wrote:

> Hello!
> 
>   I am seeing a strange phenomenon here that I have not been able to completely figure
>   out and perhaps it might ring some bells for somebody else.
> 
>   I first noticed this in 4.6-rc testing in early June, but just hit it in a similar
>   way in 4.8-rc5
> 
>   Basically I have a test script that does a bunch of stuff in a limited namespace
>   in three related namespaces (backend is the same, mountpoints are separate).
> 
>   When a process (a process group or something) is killed, sometimes one of the
>   mountpoints disappears from the namespace completely, even though the scripts
>   themselves do not unmount anything.
> 
>   No traces of the mountpoint anywhere in /proc (including /proc/*/mounts), so it's not
>   in any private namespaces of any of the processes either it seems.
> 
>   The filesystems are a locally mounted ext4 (loopback-backed) + 2 nfs
>   (of the ext4 reexported).
>   In the past it was always ext4 that was dropping, but today I got one of the nfs
>   ones.
> 
>   Sequence looks like this:
> + mount /tmp/loop /mnt/lustre -o loop
> + mkdir /mnt/lustre/racer
> mkdir: cannot create directory '/mnt/lustre/racer': File exists
> + service nfs-server start
> Redirecting to /bin/systemctl start  nfs-server.service
> + mount localhost:/mnt/lustre /mnt/nfs -t nfs -o nolock
> + mount localhost:/ /mnt/nfs2 -t nfs4
> + DURATION=3600
> + sh racer.sh /mnt/nfs/racer
> + DURATION=3600
> + sh racer.sh /mnt/nfs2/racer
> + wait %1 %2 %3
> + DURATION=3600
> + sh racer.sh /mnt/lustre/racer
> Running racer.sh for 3600 seconds. CTRL-C to exit
> Running racer.sh for 3600 seconds. CTRL-C to exit
> Running racer.sh for 3600 seconds. CTRL-C to exit
> ./file_exec.sh: line 12: 216042 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
> ./file_exec.sh: line 12: 229086 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
> ./file_exec.sh: line 12: 230134 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
> ./file_exec.sh: line 12: 235154 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
> ./file_exec.sh: line 12: 270951 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
> racer cleanup
> racer cleanup
> racer cleanup
> sleeping 5 sec ...
> sleeping 5 sec ...
> sleeping 5 sec ...
> file_create.sh: no process found
> file_create.sh: no process found
> dir_create.sh: no process found
> file_create.sh: no process found
> dir_create.sh: no process found
> file_rm.sh: no process found
> dir_create.sh: no process found
> file_rm.sh: no process found
> file_rename.sh: no process found
> file_rm.sh: no process found
> file_rename.sh: no process found
> file_link.sh: no process found
> file_rename.sh: no process found
> file_link.sh: no process found
> file_symlink.sh: no process found
> file_link.sh: no process found
> file_symlink.sh: no process found
> file_list.sh: no process found
> file_list.sh: no process found
> file_symlink.sh: no process found
> file_concat.sh: no process found
> file_concat.sh: no process found
> file_list.sh: no process found
> file_exec.sh: no process found
> file_concat.sh: no process found
> file_exec.sh: no process found
> file_chown.sh: no process found
> file_exec.sh: no process found
> file_chown.sh: no process found
> file_chmod.sh: no process found
> file_chown.sh: no process found
> file_chmod.sh: no process found
> file_mknod.sh: no process found
> file_chmod.sh: no process found
> file_mknod.sh: no process found
> file_truncate.sh: no process found
> file_mknod.sh: no process found
> file_delxattr.sh: no process found
> file_truncate.sh: no process found
> file_truncate.sh: no process found
> file_getxattr.sh: no process found
> file_delxattr.sh: no process found
> file_delxattr.sh: no process found
> file_setxattr.sh: no process found
> there should be NO racer processes:
> file_getxattr.sh: no process found
> file_getxattr.sh: no process found
> file_setxattr.sh: no process found
> there should be NO racer processes:
> file_setxattr.sh: no process found
> there should be NO racer processes:
> USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> df: /mnt/nfs/racer: No such file or directory
> Filesystem     1K-blocks  Used Available Use% Mounted on
> /dev/loop0        999320 46376    884132   5% /mnt/lustre
> We survived racer.sh for 3600 seconds.
> Filesystem     1K-blocks  Used Available Use% Mounted on
> localhost:/       999424 46080    884224   5% /mnt/nfs2
> We survived racer.sh for 3600 seconds.
> + umount /mnt/nfs
> umount: /mnt/nfs: not mounted
> + exit 5
> 
>  Now you see in the middle of that /mnt/nfs suddenly disappeared.
> 
>  The racer scripts are at
>  http://git.whamcloud.com/fs/lustre-release.git/tree/refs/heads/master:/lustre/tests/racer
>  There's absolutely no unmounts in there.
> 
>  In the past I was just able to do the three racers in parallel, wait ~10 minutes and
>  then kill all three of them and with significant probability the ext4 mountpoint would
>  disappear.
> 
>  Any idea on how to better pinpoint this?
> 
>  Thanks.
> 
> Bye,
>    Oleg



* Re: Revalidate failure leads to unmount
  2016-09-20  1:44 ` Revalidate failure leads to unmount (was: Mountpoints disappearing from namespace unexpectedly.) Oleg Drokin
@ 2016-12-06  1:39   ` Oleg Drokin
  2016-12-06  2:00     ` Al Viro
  0 siblings, 1 reply; 11+ messages in thread
From: Oleg Drokin @ 2016-12-06  1:39 UTC (permalink / raw)
  To: <linux-fsdevel@vger.kernel.org>
  Cc: Al Viro, Trond Myklebust, List Linux NFS Mailing

This is still happening in 4.9-rc8, and I still think this is kind of wrong.
Is there a deeper reason why behavior like this is OK?

On Sep 19, 2016, at 9:44 PM, Oleg Drokin wrote:

> Hello!
> 
>   I think I have found an interesting condition for filesystems that have a
>   revalidate op and I am not quite sure this is really what we want?
> 
>   Basically it all started with mountpoints randomly getting unmounted during
>   testing that I could not quite explain (see my quoted message at the end).
> 
>   Now I finally caught the culprit and it's lookup_dcache calling d_invalidate
>   that in turn detaches all mountpoints on the entire subtree like this:
> 
> Breakpoint 1, umount_tree (mnt=<optimized out>, how=<optimized out>)
>    at /home/green/bk/linux-test/fs/namespace.c:1441
> 1441                                    umount_mnt(p);
> (gdb) bt
> #0  umount_tree (mnt=<optimized out>, how=<optimized out>)
>    at /home/green/bk/linux-test/fs/namespace.c:1441
> #1  0xffffffff8129ec82 in __detach_mounts (dentry=<optimized out>)
>    at /home/green/bk/linux-test/fs/namespace.c:1572
> #2  0xffffffff8129359e in detach_mounts (dentry=<optimized out>)
>    at /home/green/bk/linux-test/fs/mount.h:100
> #3  d_invalidate (dentry=0xffff8800ab38feb0)
>    at /home/green/bk/linux-test/fs/dcache.c:1534
> #4  0xffffffff8128122c in lookup_dcache (name=<optimized out>,
>    dir=<optimized out>, flags=1536)
>    at /home/green/bk/linux-test/fs/namei.c:1485
> #5  0xffffffff81281d92 in __lookup_hash (name=0xffff88005c1a3eb8, 
>    base=0xffff8800a8609eb0, flags=1536)
>    at /home/green/bk/linux-test/fs/namei.c:1522
> #6  0xffffffff81288196 in filename_create (dfd=<optimized out>, 
>    name=0xffff88006d3e7000, path=0xffff88005c1a3f08, 
>    lookup_flags=<optimized out>) at /home/green/bk/linux-test/fs/namei.c:3604
> #7  0xffffffff812891f1 in user_path_create (lookup_flags=<optimized out>, 
>    path=<optimized out>, pathname=<optimized out>, dfd=<optimized out>)
>    at /home/green/bk/linux-test/fs/namei.c:3661
> #8  SYSC_mkdirat (mode=511, pathname=<optimized out>, dfd=<optimized out>)
>    at /home/green/bk/linux-test/fs/namei.c:3793
> #9  SyS_mkdirat (mode=<optimized out>, pathname=<optimized out>,
>    dfd=<optimized out>) at /home/green/bk/linux-test/fs/namei.c:3785
> #10 SYSC_mkdir (mode=<optimized out>, pathname=<optimized out>)
>    at /home/green/bk/linux-test/fs/namei.c:3812
> #11 SyS_mkdir (pathname=-2115143072, mode=<optimized out>)
>    at /home/green/bk/linux-test/fs/namei.c:3810
> #12 0xffffffff8189f03c in entry_SYSCALL_64_fastpath ()
>    at /home/green/bk/linux-test/arch/x86/entry/entry_64.S:207
> 
>   While I imagine the original idea was "cannot revalidate? Nuke the whole
>   tree from orbit", cases for "Why cannot we revalidate" were not considered.
>   In my case it appears that killing a bunch of scripts just at the right time
>   as they are in the middle of revalidating of some path component that has
>   mountpoints below it, the whole thing gets nuked (somewhat) unexpectedly because
>   nfs/sunrpc code notices the signal and returns ERESTARTSYS in the middle of lookup.
>   (This could be even exploitable in some setups I imagine, since it allows an
>   unprivileged user to unmount anything mounted on top of nfs).
> 
>   It's even worse for Lustre, for example, because Lustre never tries to actually
>   re-lookup anything anymore (because that brought a bunch of complexities around
>   so we were glad we could get rid of it) and just returns whether the name is
>   valid or not, hoping for a retry the next time around.
> 
>   So this brings up the question:
>   Is revalidate really required to go to great lengths to avoid returning 0
>   unless the underlying name has really-really changed? My reading
>   of documentation does not seem to match this as the whole LOOKUP_REVAL logic
>   is then redundant more or less?
> 
>   Or is totally nuking the whole underlying tree a little bit over the top and
>   could be replaced with something less drastic, after all following re-lookup
>   could restore the dentries, but unmounts are not really reversible.
> 
>   Thanks.
> 
> Bye,
>    Oleg
> On Sep 5, 2016, at 12:45 PM, Oleg Drokin wrote:
> 
>> Hello!
>> 
>>  I am seeing a strange phenomenon here that I have not been able to completely figure
>>  out and perhaps it might ring some bells for somebody else.
>> 
>>  I first noticed this in 4.6-rc testing in early June, but just hit it in a similar
>>  way in 4.8-rc5
>> 
>>  Basically I have a test script that does a bunch of stuff in a limited namespace
>>  in three related namespaces (backend is the same, mountpoints are separate).
>> 
>>  When a process (a process group or something) is killed, sometimes one of the
>>  mountpoints disappears from the namespace completely, even though the scripts
>>  themselves do not unmount anything.
>> 
>>  No traces of the mountpoint anywhere in /proc (including /proc/*/mounts), so it's not
>>  in any private namespaces of any of the processes either it seems.
>> 
>>  The filesystems are a locally mounted ext4 (loopback-backed) + 2 nfs
>>  (of the ext4 reexported).
>>  In the past it was always ext4 that was dropping, but today I got one of the nfs
>>  ones.
>> 
>>  Sequence looks like this:
>> + mount /tmp/loop /mnt/lustre -o loop
>> + mkdir /mnt/lustre/racer
>> mkdir: cannot create directory '/mnt/lustre/racer': File exists
>> + service nfs-server start
>> Redirecting to /bin/systemctl start  nfs-server.service
>> + mount localhost:/mnt/lustre /mnt/nfs -t nfs -o nolock
>> + mount localhost:/ /mnt/nfs2 -t nfs4
>> + DURATION=3600
>> + sh racer.sh /mnt/nfs/racer
>> + DURATION=3600
>> + sh racer.sh /mnt/nfs2/racer
>> + wait %1 %2 %3
>> + DURATION=3600
>> + sh racer.sh /mnt/lustre/racer
>> Running racer.sh for 3600 seconds. CTRL-C to exit
>> Running racer.sh for 3600 seconds. CTRL-C to exit
>> Running racer.sh for 3600 seconds. CTRL-C to exit
>> ./file_exec.sh: line 12: 216042 Bus error               $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
>> ./file_exec.sh: line 12: 229086 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
>> ./file_exec.sh: line 12: 230134 Segmentation fault      $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
>> ./file_exec.sh: line 12: 235154 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
>> ./file_exec.sh: line 12: 270951 Segmentation fault      (core dumped) $DIR/$file 0.$((RANDOM % 5 + 1)) 2> /dev/null
>> racer cleanup
>> racer cleanup
>> racer cleanup
>> sleeping 5 sec ...
>> sleeping 5 sec ...
>> sleeping 5 sec ...
>> file_create.sh: no process found
>> file_create.sh: no process found
>> dir_create.sh: no process found
>> file_create.sh: no process found
>> dir_create.sh: no process found
>> file_rm.sh: no process found
>> dir_create.sh: no process found
>> file_rm.sh: no process found
>> file_rename.sh: no process found
>> file_rm.sh: no process found
>> file_rename.sh: no process found
>> file_link.sh: no process found
>> file_rename.sh: no process found
>> file_link.sh: no process found
>> file_symlink.sh: no process found
>> file_link.sh: no process found
>> file_symlink.sh: no process found
>> file_list.sh: no process found
>> file_list.sh: no process found
>> file_symlink.sh: no process found
>> file_concat.sh: no process found
>> file_concat.sh: no process found
>> file_list.sh: no process found
>> file_exec.sh: no process found
>> file_concat.sh: no process found
>> file_exec.sh: no process found
>> file_chown.sh: no process found
>> file_exec.sh: no process found
>> file_chown.sh: no process found
>> file_chmod.sh: no process found
>> file_chown.sh: no process found
>> file_chmod.sh: no process found
>> file_mknod.sh: no process found
>> file_chmod.sh: no process found
>> file_mknod.sh: no process found
>> file_truncate.sh: no process found
>> file_mknod.sh: no process found
>> file_delxattr.sh: no process found
>> file_truncate.sh: no process found
>> file_truncate.sh: no process found
>> file_getxattr.sh: no process found
>> file_delxattr.sh: no process found
>> file_delxattr.sh: no process found
>> file_setxattr.sh: no process found
>> there should be NO racer processes:
>> file_getxattr.sh: no process found
>> file_getxattr.sh: no process found
>> file_setxattr.sh: no process found
>> there should be NO racer processes:
>> file_setxattr.sh: no process found
>> there should be NO racer processes:
>> USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>> USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>> USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>> df: /mnt/nfs/racer: No such file or directory
>> Filesystem     1K-blocks  Used Available Use% Mounted on
>> /dev/loop0        999320 46376    884132   5% /mnt/lustre
>> We survived racer.sh for 3600 seconds.
>> Filesystem     1K-blocks  Used Available Use% Mounted on
>> localhost:/       999424 46080    884224   5% /mnt/nfs2
>> We survived racer.sh for 3600 seconds.
>> + umount /mnt/nfs
>> umount: /mnt/nfs: not mounted
>> + exit 5
>> 
>> Now you see in the middle of that /mnt/nfs suddenly disappeared.
>> 
>> The racer scripts are at
>> http://git.whamcloud.com/fs/lustre-release.git/tree/refs/heads/master:/lustre/tests/racer
>> There's absolutely no unmounts in there.
>> 
>> In the past I was just able to do the three racers in parallel, wait ~10 minutes and
>> then kill all three of them and with significant probability the ext4 mountpoint would
>> disappear.
>> 
>> Any idea on how to better pinpoint this?
>> 
>> Thanks.
>> 
>> Bye,
>>   Oleg
> 



* Re: Revalidate failure leads to unmount
  2016-12-06  1:39   ` Revalidate failure leads to unmount Oleg Drokin
@ 2016-12-06  2:00     ` Al Viro
  2016-12-06  2:03       ` Al Viro
  2016-12-06  2:22       ` Oleg Drokin
  0 siblings, 2 replies; 11+ messages in thread
From: Al Viro @ 2016-12-06  2:00 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: <linux-fsdevel@vger.kernel.org>,
	Trond Myklebust, List Linux NFS Mailing, Eric W. Biederman

On Mon, Dec 05, 2016 at 08:39:15PM -0500, Oleg Drokin wrote:
> >   Basically it all started with mountpoints randomly getting unmounted during
> >   testing that I could not quite explain (see my quoted message at the end).
> > 
> >   Now I finally caught the culprit and it's lookup_dcache calling d_invalidate
> >   that in turn detaches all mountpoints on the entire subtree like this:

Yes, it does.

> >   While I imagine the original idea was "cannot revalidate? Nuke the whole
> >   tree from orbit", cases for "Why cannot we revalidate" were not considered.

What would you do instead?

> >   So this brings up the question:
> >   Is revalidate really required to go to great lengths to avoid returning 0
> >   unless the underlying name has really-really changed? My reading
> >   of documentation does not seem to match this as the whole LOOKUP_REVAL logic
> >   is then redundant more or less?

LOOKUP_REVAL is about avoiding false _positives_ on revalidation - i.e. if
you have several layers of actually stale entries in the dcache and only notice
when you try to do a lookup in the last one, with the server telling you to fuck
off, your only hope is to apply full-strength revalidation from the very
beginning.  Again, the problem it tries to avoid is an over-optimistic fs
assuming that directories are valid without asking the server.
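
For reference (paraphrased from the fs/namei.c of that era, not verbatim), the
usual retry ladder looks like this, and LOOKUP_REVAL only enters the picture
once an ESTALE has already been seen:

/* Paraphrase of the retry pattern in e.g. filename_lookup(); not verbatim. */
static int lookup_retry_ladder(struct nameidata *nd, unsigned int flags,
			       struct path *path)
{
	int retval = path_lookupat(nd, flags | LOOKUP_RCU, path);

	if (unlikely(retval == -ECHILD))	/* RCU walk had to bail out */
		retval = path_lookupat(nd, flags, path);
	if (unlikely(retval == -ESTALE))	/* stale dcache: full-strength pass */
		retval = path_lookupat(nd, flags | LOOKUP_REVAL, path);
	return retval;
}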

> >   Or is totally nuking the whole underlying tree a little bit over the top and
> >   could be replaced with something less drastic, after all following re-lookup
> >   could restore the dentries, but unmounts are not really reversible.

Like what?  Seriously, what would you do in such a situation?  Leave the
damn thing unreachable (and thus impossible to unmount)?  Suppose the
/mnt/foo really had been removed (along with everything under it) on
the server.  You had something mounted on /mnt/foo/bar/baz; what should
the kernel do?


* Re: Revalidate failure leads to unmount
  2016-12-06  2:00     ` Al Viro
@ 2016-12-06  2:03       ` Al Viro
  2016-12-06  2:22       ` Oleg Drokin
  1 sibling, 0 replies; 11+ messages in thread
From: Al Viro @ 2016-12-06  2:03 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: <linux-fsdevel@vger.kernel.org>,
	Trond Myklebust, List Linux NFS Mailing, Eric W. Biederman

[gyah - Eric's old address used by mistake; resent with the right address]

On Mon, Dec 05, 2016 at 08:39:15PM -0500, Oleg Drokin wrote:
> >   Basically it all started with mountpoints randomly getting unmounted during
> >   testing that I could not quite explain (see my quoted message at the end).
> > 
> >   Now I finally caught the culprit and it's lookup_dcache calling d_invalidate
> >   that in turn detaches all mountpoints on the entire subtree like this:

Yes, it does.

> >   While I imagine the original idea was "cannot revalidate? Nuke the whole
> >   tree from orbit", cases for "Why cannot we revalidate" were not considered.

What would you do instead?

> >   So this brings up the question:
> >   Is revalidate really required to go to great lengths to avoid returning 0
> >   unless the underlying name has really-really changed? My reading
> >   of documentation does not seem to match this as the whole LOOKUP_REVAL logic
> >   is then redundant more or less?

LOOKUP_REVAL is about avoiding false _positives_ on revalidation - i.e. if
you have several layers of actually stale entries in the dcache and only notice
when you try to do a lookup in the last one, with the server telling you to fuck
off, your only hope is to apply full-strength revalidation from the very
beginning.  Again, the problem it tries to avoid is an over-optimistic fs
assuming that directories are valid without asking the server.

> >   Or is totally nuking the whole underlying tree a little bit over the top and
> >   could be replaced with something less drastic, after all following re-lookup
> >   could restore the dentries, but unmounts are not really reversible.

Like what?  Seriously, what would you do in such a situation?  Leave the
damn thing unreachable (and thus impossible to unmount)?  Suppose the
/mnt/foo really had been removed (along with everything under it) on
the server.  You had something mounted on /mnt/foo/bar/baz; what should
the kernel do?


* Re: Revalidate failure leads to unmount
  2016-12-06  2:00     ` Al Viro
  2016-12-06  2:03       ` Al Viro
@ 2016-12-06  2:22       ` Oleg Drokin
  2016-12-06  5:02         ` Al Viro
  1 sibling, 1 reply; 11+ messages in thread
From: Oleg Drokin @ 2016-12-06  2:22 UTC (permalink / raw)
  To: Al Viro
  Cc: <linux-fsdevel@vger.kernel.org>,
	Trond Myklebust, List Linux NFS Mailing, Eric W. Biederman


On Dec 5, 2016, at 9:00 PM, Al Viro wrote:

> On Mon, Dec 05, 2016 at 08:39:15PM -0500, Oleg Drokin wrote:
>>>  Basically it all started with mountpoints randomly getting unmounted during
>>>  testing that I could not quite explain (see my quoted message at the end).
>>> 
>>>  Now I finally caught the culprit and it's lookup_dcache calling d_invalidate
>>>  that in turn detaches all mountpoints on the entire subtree like this:
> 
> Yes, it does.
> 
>>>  While I imagine the original idea was "cannot revalidate? Nuke the whole
>>>  tree from orbit", cases for "Why cannot we revalidate" were not considered.
> 
> What would you do instead?

Retry? Not always, of course, but if it was EINTR, why not?
Sure, it needs some more logic to actually propagate those codes, or perhaps
revalidate itself needs to be smarter and not fail in such cases.
Or is this something that you think should be handled wholly within the
filesystem, and as such in this case it's just an nfs bug?
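
Something like this is what I have in mind on the filesystem side; a purely
hypothetical sketch (not actual nfs or Lustre code), with talk_to_server() as
a made-up placeholder:

/* Hypothetical ->d_revalidate() sketch; talk_to_server() is a made-up helper.
 * A signal or transient failure becomes a negative error, which lookup_dcache()
 * propagates without calling d_invalidate(); only a definite "this name is
 * gone" answer returns 0. */
static int talk_to_server(struct dentry *dentry);	/* placeholder, not real */

static int example_d_revalidate(struct dentry *dentry, unsigned int flags)
{
	int err;

	if (flags & LOOKUP_RCU)
		return -ECHILD;			/* punt to ref-walk */

	err = talk_to_server(dentry);		/* hypothetical */
	if (err == -ERESTARTSYS || err == -EINTR || err == -EAGAIN)
		return err;			/* "don't know, ask me later" */
	if (err)
		return 0;			/* really stale: invalidate away */
	return 1;				/* still valid */
}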

>>>  So this brings up the question:
>>>  Is revalidate really required to go to great lengths to avoid returning 0
>>>  unless the underlying name has really-really changed? My reading
>>>  of documentation does not seem to match this as the whole LOOKUP_REVAL logic
>>>  is then redundant more or less?
> 
> LOOKUP_REVAL is about avoiding false _positives_ on revalidation - i.e. if
> you have several layers of actually stale entries in dcache and notice only
> when you try to do lookup in the last one, with server telling you to fuck
> off, your only hope is to apply full-strength revalidation from the very
> beginning.  Again, the problem it tries to avoid is over-optimistic fs
> assuming that directories are valid without asking the server.

Right, but in this case it's not the server telling us off; at least, we do not
know that at that point.

>>>  Or is totally nuking the whole underlying tree a little bit over the top and
>>>  could be replaced with something less drastic, after all following re-lookup
>>>  could restore the dentries, but unmounts are not really reversible.
> 
> Like what?  Seriously, what would you do in such situation?  Leave the
> damn thing unreachable (and thus impossible to unmount)?  Suppose the
> /mnt/foo really had been removed (along with everything under it) on
> the server.  You had something mounted on /mnt/foo/bar/baz; what should
> the kernel do?

Well, if *I* ended up in this situation, I'd probably just recreate the missing
path and then do the umount (ESTALE galore?) ;)
(Of course there are other, less sane approaches, like pinning the whole path until
the unmount happens, but that's likely rife with a lot of other gotchas.  There is
a limited version of this already: if I have a /mnt/foo mountpoint and I delete
/mnt/foo on the server, nobody would notice, because we pin the foo part already
and all accesses go to the filesystem mounted on top.)
But sure, when stuff is really missing, unmounting the subtrees looks like a very
sensible thing to do.
It's just that I suspect revalidate for a network filesystem is more than just
"valid" and "invalid"; there's a third option of "I don't know, ask me later"
(because the server is busy, down for a moment or whatever), and there's
at least some value in being able to interrupt a process that's stuck on a network
mountpoint without killing the whole thing mounted under it, no?



* Re: Revalidate failure leads to unmount
  2016-12-06  2:22       ` Oleg Drokin
@ 2016-12-06  5:02         ` Al Viro
  2016-12-06  5:45           ` Oleg Drokin
  0 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2016-12-06  5:02 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: <linux-fsdevel@vger.kernel.org>,
	Trond Myklebust, List Linux NFS Mailing, Eric W. Biederman

On Mon, Dec 05, 2016 at 09:22:47PM -0500, Oleg Drokin wrote:

> Retry? Not always, of course, but if it was EINTR, why not?
> Sure, it needs some more logic to actually propagate those codes, or perhaps
> revalidate itself needs to be smarter not to fail for such cases?
> Or is this something that you think should be wholly within filesystem
> and as such in this case it's just an nfs bug?

Umm...  Might be doable, but then there's a nasty question - what if that
happens from umount(2) itself?
 
> > Like what?  Seriously, what would you do in such situation?  Leave the
> > damn thing unreachable (and thus impossible to unmount)?  Suppose the
> > /mnt/foo really had been removed (along with everything under it) on
> > the server.  You had something mounted on /mnt/foo/bar/baz; what should
> > the kernel do?
> 
> Well, if *I* ended up in this situation, I'd probably just recreate the missing
> > path and then do the umount (ESTALE galore?) ;)
> > (of course there are other less sane approaches like pinning the whole path until
> unmount happens, but that's likely rife with a lot of other gotchas, but
> there's a limited version of this already - if I have /mnt/foo mountpoint
> and I delete /mnt/foo on the server, nobody would notice because we pin
> the foo part already and all accesses go to the filesystem mounted on top).

Try it...

> But sure, when stuff is really missing, unmounting the subtrees looks like a very
> sensible thing to do.
> It's just I suspect revalidate for a network filesystem is more than just
> "valid" and "invalid", there's a third option of "I don't know, ask me later"
> (because the server is busy, down for a moment or whatever) and there's
> at least some value in being able to interrupt a process that's stuck on a network
> mountpoint without killing the whole thing under it, no?

It's actually even more interesting - some form of delaying invalidation
might very well be a good thing, *if* we had a way to unhash the sucker
and have it fall through into lookup.  With invalidation happening only
if lookup has returned something other than the object we'd just unhashed.
Then e.g. NFS could bail out in all cases when it would have to talk to
server and let the regular lookups do the work.  However, right now that
only works for directories - for regular files we just get a new alias and
that's it.  If something had been bound on top of the old one, we would lose
it.  And turning that check into "new dentry is an alias of what we'd
unhashed" is a bad idea - it's already been hashed by us, so we'd have
a window when dcache lookup would've picked that new alias.

In that respect irregularities in Lustre become very interesting.  What if
we taught d_splice_alias() to look for _exact_ unhashed alias (same parent,
same name) in case of non-directories and did "rehash and return that
alias, dropping inode reference" if one has been found?  Could we get rid
of the weird dcache games in Lustre that way?
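
Very roughly (a hand-waving sketch of the idea, not a real patch; locking, the
directory case and disconnected aliases are all glossed over):

/* Sketch only: for non-directories, try to find an exact unhashed alias
 * (same parent, same name), rehash it and drop the new inode reference. */
struct dentry *d_splice_alias_idea(struct inode *inode, struct dentry *dentry)
{
	struct dentry *alias;

	if (!inode || S_ISDIR(inode->i_mode))
		return d_splice_alias(inode, dentry);	/* unchanged */

	spin_lock(&inode->i_lock);
	hlist_for_each_entry(alias, &inode->i_dentry, d_u.d_alias) {
		if (alias->d_parent != dentry->d_parent || !d_unhashed(alias))
			continue;
		if (alias->d_name.len != dentry->d_name.len ||
		    memcmp(alias->d_name.name, dentry->d_name.name,
			   alias->d_name.len))
			continue;
		dget(alias);
		spin_unlock(&inode->i_lock);
		d_rehash(alias);	/* reuse the exact old alias... */
		iput(inode);		/* ...and drop the inode reference */
		return alias;
	}
	spin_unlock(&inode->i_lock);
	return d_splice_alias(inode, dentry);
}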


* Re: Revalidate failure leads to unmount
  2016-12-06  5:02         ` Al Viro
@ 2016-12-06  5:45           ` Oleg Drokin
  2016-12-06  6:17             ` Al Viro
  0 siblings, 1 reply; 11+ messages in thread
From: Oleg Drokin @ 2016-12-06  5:45 UTC (permalink / raw)
  To: Al Viro
  Cc: <linux-fsdevel@vger.kernel.org>,
	Trond Myklebust, List Linux NFS Mailing, Eric W. Biederman


On Dec 6, 2016, at 12:02 AM, Al Viro wrote:

> On Mon, Dec 05, 2016 at 09:22:47PM -0500, Oleg Drokin wrote:
> 
>> Retry? Not always, of course, but if it was EINTR, why not?
>> Sure, it needs some more logic to actually propagate those codes, or perhaps
>> revalidate itself needs to be smarter not to fail for such cases?
>> Or is this something that you think should be wholly within filesystem
>> and as such in this case it's just an nfs bug?
> 
> Umm...  Might be doable, but then there's a nasty question - what if that
> happens from umount(2) itself?

The EINTR? umount fails, presumably, and you can try it again later.

>>> Like what?  Seriously, what would you do in such situation?  Leave the
>>> damn thing unreachable (and thus impossible to unmount)?  Suppose the
>>> /mnt/foo really had been removed (along with everything under it) on
>>> the server.  You had something mounted on /mnt/foo/bar/baz; what should
>>> the kernel do?
>> 
>> Well, if *I* ended up in this situation, I'd probably just recreate the missing
>> path and then do the umount (ESTALE galore?) ;)
>> (of course there are other less sane approaches like pinning the whole path until
>> unmount happens, but that's likely rife with a lot of other gotchas, but
>> there's a limited version of this already - if I have /mnt/foo mountpoint
>> and I delete /mnt/foo on the server, nobody would notice because we pin
>> the foo part already and all accesses go to the filesystem mounted on top).
> Try it...

Hm, it does unmount it due to the lookup failure,
though I was always under the impression that the mountpoint dentry is pinned.
Perhaps it was in some old version, or that's just my imagination.

>> But sure, when stuff is really missing, unmounting the subtrees looks like a very
>> sensible thing to do.
>> It's just I suspect revalidate for a network filesystem is more than just
>> "valid" and "invalid", there's a third option of "I don't know, ask me later"
>> (because the server is busy, down for a moment or whatever) and there's
>> at least some value in being able to interrupt a process that's stuck on a network
>> mountpoint without killing the whole thing under it, no?
> 
> It's actually even more interesting - some form of delaying invalidation
> might very well be a good thing, *if* we had a way to unhash the sucker
> and have it fall through into lookup.  With invalidation happening only
> if lookup has returned something other than the object we'd just unhashed.
> Then e.g. NFS could bail out in all cases when it would have to talk to
> server and let the regular lookups do the work.  However, right now that
> only works for directories - for regular files we just get a new alias and
> that's it.  If something had been bound on top of the old one, we would lose
> it.  And turning that check into "new dentry is an alias of what we'd
> unhashed" is a bad idea - it's already been hashed by us, so we'd have
> a window when dcache lookup would've picked that new alias.
> 
> In that respect irregularities in Lustre become very interesting.  What if
> we taught d_splice_alias() to look for _exact_ unhashed alias (same parent,
> same name) in case of non-directories and did "rehash and return that
> alias, dropping inode reference" if one has been found?  Could we get rid
> of the weird dcache games in Lustre that way?

Well, certainly, if d_splice_alias worked like that, so that even a non-directory
dentry would find an alias (not necessarily even an unhashed one) for that same
inode and use that instead, it would make ll_splice_alias/ll_find_alias unnecessary.

We still retain the weird d_compare() that rejects otherwise perfectly valid aliases
if the lock guarding them is gone, triggering a re-lookup (and necessitating the
above logic to pick up the just-rejected alias again once we have the lock again).



* Re: Revalidate failure leads to unmount
  2016-12-06  5:45           ` Oleg Drokin
@ 2016-12-06  6:17             ` Al Viro
  2016-12-06  6:46               ` Oleg Drokin
  2016-12-08  5:01               ` Oleg Drokin
  0 siblings, 2 replies; 11+ messages in thread
From: Al Viro @ 2016-12-06  6:17 UTC (permalink / raw)
  To: Oleg Drokin
  Cc: <linux-fsdevel@vger.kernel.org>,
	Trond Myklebust, List Linux NFS Mailing, Eric W. Biederman

On Tue, Dec 06, 2016 at 12:45:11AM -0500, Oleg Drokin wrote:

> Well, certainly if d_splice_alias was working like that so that even non-directory
> dentry would find an alias (not necessarily unhashed even) for that same inode and use that instead, that would make ll_splice_alias/ll_find_alias unnecessary.
> 
> We still retain the weird d_compare() that rejects otherwise perfectly valid aliases
> if the lock guarding them is gone, triggering relookup (and necessitating the
> above logic to pick up the just-rejected alias again now that we have the lock again).

Why not have ->d_revalidate() kick them, instead?  _IF_ we have a way to
do unhash-and-trigger-lookup that way, do you really need those games with
->d_compare()?


* Re: Revalidate failure leads to unmount
  2016-12-06  6:17             ` Al Viro
@ 2016-12-06  6:46               ` Oleg Drokin
  2016-12-08  5:01               ` Oleg Drokin
  1 sibling, 0 replies; 11+ messages in thread
From: Oleg Drokin @ 2016-12-06  6:46 UTC (permalink / raw)
  To: Al Viro
  Cc: <linux-fsdevel@vger.kernel.org>,
	Trond Myklebust, List Linux NFS Mailing, Eric W. Biederman


On Dec 6, 2016, at 1:17 AM, Al Viro wrote:

> On Tue, Dec 06, 2016 at 12:45:11AM -0500, Oleg Drokin wrote:
> 
>> Well, certainly if d_splice_alias was working like that so that even non-directory
>> dentry would find an alias (not necessarily unhashed even) for that same inode and use that instead, that would make ll_splice_alias/ll_find_alias unnecessary.
>> 
>> We still retain the weird d_compare() that rejects otherwise perfectly valid aliases
>> if the lock guarding them is gone, triggering relookup (and necessitating the
>> above logic to pick up the just-rejected alias again now that we have the lock again).
> 
> Why not have ->d_revalidate() kick them, instead?  _IF_ we have a way to
> do unhash-and-trigger-lookup that way, do you really need those games with
> ->d_compare()?

The comment there says:
 * This avoids a race where ll_lookup_it() instantiates a dentry, but we get
 * an AST before calling d_revalidate_it().  The dentry still exists (marked
 * INVALID) so d_lookup() matches it, but we have no lock on it (so
 * lock_match() fails) and we spin around real_lookup().

and indeed I seem to have a memory of some code that did a lookup followed
by a revalidate (though, checking the commit history, that was a long time ago).
I checked, and apparently lookup_real() no longer does any revalidation, and
neither does its caller.

So, assuming a lookup is no longer followed by a mandatory revalidate, we can
probably just move the lock check to the start of revalidate; I'll try that tomorrow.




* Re: Revalidate failure leads to unmount
  2016-12-06  6:17             ` Al Viro
  2016-12-06  6:46               ` Oleg Drokin
@ 2016-12-08  5:01               ` Oleg Drokin
  1 sibling, 0 replies; 11+ messages in thread
From: Oleg Drokin @ 2016-12-08  5:01 UTC (permalink / raw)
  To: Al Viro
  Cc: linux-fsdevel, Trond Myklebust, List Linux NFS Mailing,
	Eric W. Biederman


On Dec 6, 2016, at 1:17 AM, Al Viro wrote:

> On Tue, Dec 06, 2016 at 12:45:11AM -0500, Oleg Drokin wrote:
> 
>> Well, certainly if d_splice_alias was working like that so that even non-directory
>> dentry would find an alias (not necessarily unhashed even) for that same inode and use that instead, that would make ll_splice_alias/ll_find_alias unnecessary.
>> 
>> We still retain the weird d_compare() that rejects otherwise perfectly valid aliases
>> if the lock guarding them is gone, triggering relookup (and necessitating the
>> above logic to pick up the just-rejected alias again now that we have the lock again).
> 
> Why not have ->d_revalidate() kick them, instead?  _IF_ we have a way to
> do unhash-and-trigger-lookup that way, do you really need those games with
> ->d_compare()?

OK, so this does appear to work in mainline, but not in the well-known vendor kernels,
which still live in the past, with d_invalidate() returning non-zero and
lookup_dcache() dutifully ignoring the error code from revalidate.

Anyway, I can submit the patch doing away with ll_dcompare, but I still
need the kludge of always returning 1 when revalidating mountpoints,
otherwise they would get unmounted constantly, I guess.

Is this something you'd like to carry along with the rest of the (yet-to-be-written)
patches that only unmount stuff on lookup failure/a different lookup result, as we
discussed before, or should I shoot this to Greg right away?

The patch pretty much amounts to this now:
diff --git a/drivers/staging/lustre/lustre/llite/dcache.c b/drivers/staging/lustre/lustre/llite/dcache.c
index 0e45d8f..f532167 100644
--- a/drivers/staging/lustre/lustre/llite/dcache.c
+++ b/drivers/staging/lustre/lustre/llite/dcache.c
@@ -69,38 +69,6 @@ static void ll_release(struct dentry *de)
 	call_rcu(&lld->lld_rcu_head, free_dentry_data);
 }
 
-/* Compare if two dentries are the same.  Don't match if the existing dentry
- * is marked invalid.  Returns 1 if different, 0 if the same.
- *
- * This avoids a race where ll_lookup_it() instantiates a dentry, but we get
- * an AST before calling d_revalidate_it().  The dentry still exists (marked
- * INVALID) so d_lookup() matches it, but we have no lock on it (so
- * lock_match() fails) and we spin around real_lookup().
- */
-static int ll_dcompare(const struct dentry *dentry,
-		       unsigned int len, const char *str,
-		       const struct qstr *name)
-{
-	if (len != name->len)
-		return 1;
-
-	if (memcmp(str, name->name, len))
-		return 1;
-
-	CDEBUG(D_DENTRY, "found name %.*s(%p) flags %#x refc %d\n",
-	       name->len, name->name, dentry, dentry->d_flags,
-	       d_count(dentry));
-
-	/* mountpoint is always valid */
-	if (d_mountpoint((struct dentry *)dentry))
-		return 0;
-
-	if (d_lustre_invalid(dentry))
-		return 1;
-
-	return 0;
-}
-
 /**
  * Called when last reference to a dentry is dropped and dcache wants to know
  * whether or not it should cache it:
@@ -255,6 +223,15 @@ static int ll_revalidate_dentry(struct dentry *dentry,
 {
 	struct inode *dir = d_inode(dentry->d_parent);
 
+	/* mountpoint is always valid */
+	if (d_mountpoint((struct dentry *)dentry))
+		return 1;
+
+	/* No lock? Bail out */
+	if (d_lustre_invalid(dentry))
+		return 0;
+
+
 	/* If this is intermediate component path lookup and we were able to get
 	 * to this dentry, then its lock has not been revoked and the
 	 * path component is valid.
@@ -303,5 +280,4 @@ const struct dentry_operations ll_d_ops = {
 	.d_revalidate = ll_revalidate_nd,
 	.d_release = ll_release,
 	.d_delete  = ll_ddelete,
-	.d_compare = ll_dcompare,
 };


