From: Tycho Andersen <tycho@tycho.pizza>
To: Eric Biederman <ebiederm@xmission.com>,
Christian Brauner <brauner@kernel.org>,
Miklos Szeredi <miklos@szeredi.hu>
Cc: fuse-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: strange interaction between fuse + pidns
Date: Thu, 23 Jun 2022 11:21:25 -0600 [thread overview]
Message-ID: <YrShFXRLtRt6T/j+@risky> (raw)
Hi all,
I'm seeing some weird interactions with fuse and the pid namespace. I have a
small reproducer here: https://github.com/tych0/kernel-utils/tree/master/fuse2
fuse has the concept of "forcing" a request, which means (among other
things) that it does an unkillable wait in request_wait_answer(). fuse
flushes files when they are closed with this unkillable wait:
$ sudo cat /proc/1544574/stack
[<0>] request_wait_answer+0x12f/0x210
[<0>] fuse_simple_request+0x109/0x2c0
[<0>] fuse_flush+0x16f/0x1b0
[<0>] filp_close+0x27/0x70
[<0>] put_files_struct+0x6b/0xc0
[<0>] do_exit+0x360/0xb80
[<0>] do_group_exit+0x3a/0xa0
[<0>] get_signal+0x140/0x870
[<0>] arch_do_signal_or_restart+0xae/0x7c0
[<0>] exit_to_user_mode_prepare+0x10f/0x1c0
[<0>] syscall_exit_to_user_mode+0x26/0x40
[<0>] do_syscall_64+0x46/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Generally, this is OK, since the fuse_dev_release() -> fuse_abort_conn()
wakes up this code when a fuse dev goes away (i.e. a fuse daemon is killed
or unmounted or whatever). However, there's a problem when the fuse daemon
itself spawns a thread that does a flush: since the thread has a copy of
the fd table with an fd pointing to the same fuse device, the reference
count isn't decremented to zero in fuse_dev_release(), and the task hangs
forever.
Tasks can be aborted via fusectl's abort file, so all is not lost. However,
this does wreak havoc in containers which mounted a fuse filesystem with
this state. If the init pid exits (or crashes), the kernel tries to clean
up the pidns:
$ sudo cat /proc/1528591/stack
[<0>] do_wait+0x156/0x2f0
[<0>] kernel_wait4+0x8d/0x140
[<0>] zap_pid_ns_processes+0x104/0x180
[<0>] do_exit+0xa41/0xb80
[<0>] do_group_exit+0x3a/0xa0
[<0>] __x64_sys_exit_group+0x14/0x20
[<0>] do_syscall_64+0x37/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
but hangs forever. This unkillable wait seems unfortunate, so I tried the
obvious thing of changing it to a killable wait:
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 0e537e580dc1..c604dfcaec26 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -297,7 +297,6 @@ void fuse_request_end(struct fuse_req *req)
spin_unlock(&fiq->lock);
}
WARN_ON(test_bit(FR_PENDING, &req->flags));
- WARN_ON(test_bit(FR_SENT, &req->flags));
if (test_bit(FR_BACKGROUND, &req->flags)) {
spin_lock(&fc->bg_lock);
clear_bit(FR_BACKGROUND, &req->flags);
@@ -381,30 +380,33 @@ static void request_wait_answer(struct fuse_req *req)
queue_interrupt(req);
}
- if (!test_bit(FR_FORCE, &req->flags)) {
- /* Only fatal signals may interrupt this */
- err = wait_event_killable(req->waitq,
- test_bit(FR_FINISHED, &req->flags));
- if (!err)
- return;
+ /* Only fatal signals may interrupt this */
+ err = wait_event_killable(req->waitq,
+ test_bit(FR_FINISHED, &req->flags));
+ if (!err)
+ return;
- spin_lock(&fiq->lock);
- /* Request is not yet in userspace, bail out */
- if (test_bit(FR_PENDING, &req->flags)) {
- list_del(&req->list);
- spin_unlock(&fiq->lock);
- __fuse_put_request(req);
- req->out.h.error = -EINTR;
- return;
- }
+ spin_lock(&fiq->lock);
+ /* Request is not yet in userspace, bail out */
+ if (test_bit(FR_PENDING, &req->flags)) {
+ list_del(&req->list);
spin_unlock(&fiq->lock);
+ __fuse_put_request(req);
+ req->out.h.error = -EINTR;
+ return;
}
+ spin_unlock(&fiq->lock);
/*
- * Either request is already in userspace, or it was forced.
- * Wait it out.
+ * Womp womp. We sent a request to userspace and now we're getting
+ * killed.
*/
- wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
+ set_bit(FR_INTERRUPTED, &req->flags);
+ /* matches barrier in fuse_dev_do_read() */
+ smp_mb__after_atomic();
+ /* request *must* be FR_SENT here, because we ignored FR_PENDING before */
+ WARN_ON(!test_bit(FR_SENT, &req->flags));
+ queue_interrupt(req);
}
static void __fuse_request_send(struct fuse_req *req)
avaialble as a full patch here:
https://github.com/tych0/linux/commit/81b9ff4c8c1af24f6544945da808dbf69a1293f7
but now things are even weirder. Tasks are stuck at the killable wait, but with
a SIGKILL pending for the thread group.
root@(none):/# cat /proc/187/stack
[<0>] fuse_simple_request+0x8d9/0x10f0 [fuse]
[<0>] fuse_flush+0x42f/0x630 [fuse]
[<0>] filp_close+0x96/0x120
[<0>] put_files_struct+0x15c/0x2c0
[<0>] do_exit+0xa00/0x2450
[<0>] do_group_exit+0xb2/0x2a0
[<0>] __x64_sys_exit_group+0x35/0x40
[<0>] do_syscall_64+0x40/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
root@(none):/# cat /proc/187/status
Name: main
Umask: 0022
State: S (sleeping)
Tgid: 187
Ngid: 0
Pid: 187
PPid: 185
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 0
Groups:
NStgid: 187 3
NSpid: 187 3
NSpgid: 171 0
NSsid: 160 0
Threads: 1
SigQ: 0/6706
SigPnd: 0000000000000000
ShdPnd: 0000000000000100
SigBlk: 0000000000000000
SigIgn: 0000000180004002
SigCgt: 0000000000000000
CapInh: 0000000000000000
CapPrm: 000001ffffffffff
CapEff: 000001ffffffffff
CapBnd: 000001ffffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Seccomp_filters: 0
Speculation_Store_Bypass: thread vulnerable
SpeculationIndirectBranch: conditional enabled
Cpus_allowed: f
Cpus_allowed_list: 0-3
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 6
nonvoluntary_ctxt_switches: 1
Any ideas what's going on here? It also seems I'm not the first person to
wonder about this:
https://sourceforge.net/p/fuse/mailman/fuse-devel/thread/CAMp4zn9zTA_A2GJiYo5AD9V5HpeXbzzMP%3DnF0WtwbxRbV3koNA%40mail.gmail.com/#msg36598753
Thanks,
Tycho
next reply other threads:[~2022-06-23 18:17 UTC|newest]
Thread overview: 61+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-23 17:21 Tycho Andersen [this message]
2022-06-23 21:55 ` strange interaction between fuse + pidns Vivek Goyal
2022-06-23 23:41 ` Tycho Andersen
2022-06-24 17:36 ` Vivek Goyal
2022-07-11 10:35 ` Miklos Szeredi
2022-07-11 13:59 ` Miklos Szeredi
2022-07-11 20:25 ` Tycho Andersen
2022-07-11 21:37 ` Eric W. Biederman
2022-07-11 22:53 ` Tycho Andersen
2022-07-11 23:06 ` Eric W. Biederman
2022-07-12 13:43 ` Tycho Andersen
2022-07-12 14:34 ` Eric W. Biederman
2022-07-12 15:14 ` Tycho Andersen
2022-07-13 17:53 ` [PATCH] sched: __fatal_signal_pending() should also check PF_EXITING Tycho Andersen
2022-07-20 15:03 ` Serge E. Hallyn
2022-07-20 20:58 ` Tycho Andersen
2022-07-21 1:54 ` Serge E. Hallyn
2022-07-27 15:44 ` Tycho Andersen
2022-07-27 16:32 ` Eric W. Biederman
2022-07-27 17:55 ` Tycho Andersen
2022-07-28 18:48 ` Eric W. Biederman
2022-07-27 17:55 ` Oleg Nesterov
2022-07-27 18:18 ` Tycho Andersen
2022-07-27 19:19 ` Oleg Nesterov
2022-07-27 19:40 ` Tycho Andersen
2022-07-28 9:12 ` Oleg Nesterov
2022-07-28 21:20 ` Tycho Andersen
2022-07-29 5:04 ` Eric W. Biederman
2022-07-29 13:50 ` Tycho Andersen
2022-07-29 16:15 ` Eric W. Biederman
2022-07-29 16:48 ` Tycho Andersen
2022-07-29 17:40 ` [RFC][PATCH] fuse: In fuse_flush only wait if someone wants the return code Eric W. Biederman
2022-07-29 20:47 ` Oleg Nesterov
2022-07-30 0:15 ` Al Viro
2022-07-30 5:10 ` [RFC][PATCH v2] " Eric W. Biederman
2022-08-01 15:16 ` Tycho Andersen
2022-08-02 12:50 ` Miklos Szeredi
2022-08-15 13:59 ` Tycho Andersen
2022-08-15 17:55 ` Serge E. Hallyn
2022-09-01 14:06 ` [PATCH] " Tycho Andersen
2022-09-19 15:03 ` Tycho Andersen
2022-09-20 18:02 ` Serge E. Hallyn
2022-09-26 14:17 ` Tycho Andersen
2022-09-27 9:46 ` Miklos Szeredi
2022-09-29 14:05 ` [fuse-devel] " Stef Bon
2022-09-29 16:39 ` [PATCH v2] " Tycho Andersen
2022-09-30 13:35 ` Miklos Szeredi
2022-09-30 14:01 ` Tycho Andersen
2022-09-30 14:41 ` Miklos Szeredi
2022-09-30 16:09 ` Tycho Andersen
2022-10-26 9:01 ` Miklos Szeredi
2022-11-14 16:02 ` [PATCH v3] " Tycho Andersen
2022-11-28 15:00 ` Tycho Andersen
2022-12-08 14:26 ` Miklos Szeredi
2022-12-08 17:49 ` Tycho Andersen
2022-12-19 19:16 ` Tycho Andersen
2023-01-03 14:51 ` Tycho Andersen
2023-01-05 15:15 ` Serge E. Hallyn
2023-01-26 14:12 ` Miklos Szeredi
2022-09-30 19:47 ` [PATCH] " Serge E. Hallyn
2022-09-19 15:46 ` [RFC][PATCH v2] " Eric W. Biederman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YrShFXRLtRt6T/j+@risky \
--to=tycho@tycho.pizza \
--cc=brauner@kernel.org \
--cc=ebiederm@xmission.com \
--cc=fuse-devel@lists.sourceforge.net \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=miklos@szeredi.hu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).