From: Alan Jenkins <alan.christopher.jenkins@gmail.com>
To: David Howells <dhowells@redhat.com>, viro@zeniv.linux.org.uk
Cc: torvalds@linux-foundation.org, ebiederm@xmission.com,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
mszeredi@redhat.com
Subject: Re: [PATCH 03/34] teach move_mount(2) to work with OPEN_TREE_CLONE [ver #12]
Date: Sat, 20 Oct 2018 12:06:32 +0100
Message-ID: <209e8c35-d26e-0a29-84d7-b8b1d0ecbebc@gmail.com>
In-Reply-To: <29902.1539988579@warthog.procyon.org.uk>
[-- Attachment #1: Type: text/plain, Size: 6483 bytes --]
On 19/10/2018 23:36, David Howells wrote:
> Alan Jenkins <alan.christopher.jenkins@gmail.com> wrote:
>
>> # open_tree_clone 3</mnt 3 sh
>> # cd /proc/self/fd/3
>> # mount --move . /mnt
>> [ 41.747831] mnt_flags=1020 umount=0
>> # cd /
>> # umount /mnt
>> umount: /mnt: target is busy
>>
>> ^ a newly introduced bug? I do not remember having this problem before.
> The reason EBUSY is returned is because propagate_mount_busy() is called by
> do_umount() with refcnt == 2, but mnt_count == 3:
>
> umount-3577 M=f8898a34 u=3 0x555 sp=__x64_sys_umount+0x12/0x15
>
> the trace line being added here:
>
>	if (!propagate_mount_busy(mnt, 2)) {
>		if (!list_empty(&mnt->mnt_list))
>			umount_tree(mnt, UMOUNT_PROPAGATE|UMOUNT_SYNC);
>		retval = 0;
>	} else {
>		trace_mnt_count(mnt, mnt->mnt_id,
>				atomic_read(&mnt->mnt_count),
>				0x555, __builtin_return_address(0));
>	}
>
> The busy evaluation is a result of this check:
>
> if (!list_empty(&mnt->mnt_mounts) || do_refcount_check(mnt, refcnt))
>
> in propagate_mount_busy().
>
>
> The problem apparently being that mnt_count counts both refs from mountings
> and refs from other sources, such as file descriptors or pathwalk.
>
> David
Sorry for wasting your time on the EBUSY. The EBUSY error is not new;
it is correct, and I was doing the wrong thing. I cannot "umount /mnt"
while I still have an FD that points inside /mnt.
I was trying to provide a clearer overview, but it was still too
sloppy to follow. Below is a second attempt at rephrasing it (with my
mistake about EBUSY removed). This all seems consistent with what Al
just said, so if you got the picture from Al's message, you can ignore
this one :-).
~
The patch series [ver #12] has a problem. OPEN_TREE_CLONE creates an
open file, marked with FMODE_NEED_UNMOUNT for cleanup. Users are
expected to move_mount() directly from that file.
However, it is also possible to use openat() on the open file, which
gives you a second open file. This raises questions about the cleanup
handling, because the second open file is *not* marked
FMODE_NEED_UNMOUNT. What happens if we clean up the first open file and
then move_mount() from the second one? And what happens if you consume
the second open file using move_mount(), and then clean up the first
open file?
When I test the patch series [ver #12], it seems I can trigger the same
bug in either case. The two reproducers use the same commands, but in
a different order.
"close-then-mount"
# open_tree_clone 3</mnt 3 sh
# cd /proc/self/fd/3
# exec 3<&- # close FD 3
# mount --move . /mnt && cd /
# umount -l /mnt
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [umount:1472]
...
RIP: 0010:pin_kill+0x128/0x140
...
Call Trace:
pin_kill+0x5a/0x140
? finish_wait+0x80/0x80
group_pin_kill+0x1a/0x30
namespace_unlock+0x6f/0x80
ksys_umount+0x220/0x420
__x64_sys_umount+0x12/0x20
do_syscall_64+0x5b/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
"mount-then-close"
# open_tree_clone 3</mnt 3 sh
# cd /proc/self/fd/3
# mount --move . /mnt && cd /
# umount -l /mnt
# exec 3<&- # close FD 3
watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [sh:1423]
...
RIP: 0010:pin_kill+0x128/0x140
...
Call Trace:
? finish_wait+0x80/0x80
group_pin_kill+0x1a/0x30
namespace_unlock+0x6f/0x80
__fput+0x239/0x240
task_work_run+0x84/0xa0
exit_to_usermode_loop+0xb4/0xc0
do_syscall_64+0x14d/0x160
entry_SYSCALL_64_after_hwframe+0x44/0xa9
When I debug the kernel and reproduce "close-then-mount", I can see
something is wrong even before the last command. The mount command
attaches a mount that is still marked MNT_UMOUNT into the mount
namespace. This contradicts a comment in the predicate function
disconnect_mount():
	/* Because the reference counting rules change when mounts are
	 * unmounted and connected, umounted mounts may not be
	 * connected to mounted mounts.
	 */
	if (!(mnt->mnt_parent->mnt.mnt_flags & MNT_UMOUNT))
		return true;
We could ask whether there is a procedure to safely clear MNT_UMOUNT on
a detached tree, but we have no specific reason to. You suggested a
one-line diff to reject the problematic mount command in
"close-then-mount":
@@ -2469,7 +2469,7 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
 	if (old->mnt_ns && !attached)
 		goto out1;
 
-	if (old->mnt.mnt_flags & MNT_LOCKED)
+	if (old->mnt.mnt_flags & (MNT_LOCKED | MNT_UMOUNT))
 		goto out1;
 
 	if (old_path->dentry != old_path->mnt->mnt_root)
It sounds plausible, and it worked as suggested. But it feels
incomplete. If my two reproducer sequences are really symmetric, we
need to fix the code path in move_mount() *and* the code path in
close(). I asked if we can add this on top:
@@ -1763,7 +1763,7 @@ void dissolve_on_fput(struct vfsmount *mnt)
 {
 	namespace_lock();
 	lock_mount_hash();
-	if (!real_mount(mnt)->mnt_ns) {
+	if (!real_mount(mnt)->mnt_ns && !(mnt->mnt_flags & MNT_UMOUNT)) {
 		mntget(mnt);
 		umount_tree(real_mount(mnt), UMOUNT_CONNECTED);
 	}
(To apply without whitespace damage, see the attachment.) I have now
tested this, and it seems to let "mount-then-close" complete; there is
no immediate soft lockup or error message.
You mentioned that when you tested, you could get a GPF in fsnotify
instead, depending on the timing of the commands. I have been entering
my commands one at a time, and I have not seen the GPF so far.
You posted an analysis of a GPF, where you showed the reference count
was clearly one less than it should have been. You narrowed this down
to a step where you connected an unmounted mount (MNT_UMOUNT) to a
mounted mount. So your analysis is consistent with the comment in
disconnect_mount(), which says 1) you're not allowed to do that, and
2) the reason is the different reference-counting rules. AFAICT, the
GPF you analyzed would be prevented by the fix in do_move_mount(),
which checks for MNT_UMOUNT.
I have been trying to understand MNT_UMOUNT by reading the patch series
that added it. Now I'm getting the impression the different
ref-counting rules pre-date MNT_UMOUNT. I *think* the added check in
dissolve_on_fput() makes things right, but I don't understand enough to
be sure.
Alan
[-- Attachment #2: MNT_UMOUNT.diff --]
[-- Type: text/x-patch, Size: 711 bytes --]
diff --git a/fs/namespace.c b/fs/namespace.c
index 4dfe7e23b7ee..e8d61d5f581d 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1763,7 +1763,7 @@ void dissolve_on_fput(struct vfsmount *mnt)
 {
 	namespace_lock();
 	lock_mount_hash();
-	if (!real_mount(mnt)->mnt_ns) {
+	if (!real_mount(mnt)->mnt_ns && !(mnt->mnt_flags & MNT_UMOUNT)) {
 		mntget(mnt);
 		umount_tree(real_mount(mnt), UMOUNT_CONNECTED);
 	}
@@ -2469,7 +2469,7 @@ static int do_move_mount(struct path *old_path, struct path *new_path)
 	if (old->mnt_ns && !attached)
 		goto out1;
 
-	if (old->mnt.mnt_flags & MNT_LOCKED)
+	if (old->mnt.mnt_flags & (MNT_LOCKED | MNT_UMOUNT))
 		goto out1;
 
 	if (old_path->dentry != old_path->mnt->mnt_root)