Re: [PATCH v2] 9p/client: don't assume signal_pending() clears on recalc_sigpending()

From: Dominique Martinet <asmadeus@codewreck.org>
To: Jens Axboe <axboe@kernel.dk>
Cc: Christian Schoenebeck <linux_oss@crudebyte.com>,
	Eric Van Hensbergen <ericvh@kernel.org>,
	netdev <netdev@vger.kernel.org>, Jakub Kicinski <kuba@kernel.org>,
	Pengfei Xu <pengfei.xu@intel.com>,
	v9fs-developer@lists.sourceforge.net
Subject: Re: [PATCH v2] 9p/client: don't assume signal_pending() clears on recalc_sigpending()
Date: Tue, 7 Feb 2023 07:29:53 +0900	[thread overview]
Message-ID: <Y+F/YSjhcQax1bMm@codewreck.org> (raw)
In-Reply-To: <00a0809e-7b47-c43c-3a13-a84cd692f514@kernel.dk>

Jens Axboe wrote on Mon, Feb 06, 2023 at 02:56:57PM -0700:
> Sure, if you set it again when done, then it will probably work just
> fine. But you need to treat TIF_NOTIFY_SIGNAL and TIF_SIGPENDING
> separately. An attempt at that at the end of this email, totally
> untested, and I'm not certain it's a good idea at all (see below). Is
> there a reason why we can't exit and get the task_work processed
> instead? That'd be greatly preferable.

No good reason aside of "it's not ready", but in the current code things
will probably get weird.
I actually misremembered the tag lookup for trans_fd, since we're not
freeing the tag yet the lookup will work and the connexion might not be
dropped (just reading into a buffer then freeing it in the cb without
any further processing), but even my refcounting works better than I
thought you'll end up with the IO being replayed while the server is
still processing the first one.
This is unlikely, but for example this could happen: 

- first write [0;1MB]
- write is interrupted before server handled it
   - write replayed and handled, userspace continues to...
   - second write [1MB-4k;1MB]
- first write handle by server, overwriting the second write

And who doesn't enjoy a silent corruption for breakfast?

> > Hm, schedule_delayed_work on the last fput, ok.
> > I was wondering what it had to do with the current 9p thread, but since
> > it's not scheduled on a particular cpu it can pick another cpu to wake
> > up, that makes sense -- although conceptually it feels rather bad to
> > interrupt a remote IO because of a local task that can be done later;
> > e.g. between having the fput wait a bit, or cancel a slow operation like
> > a 1MB write, I'd rather make the fput wait.
> > Do you know why that signal/interrupt is needed in the first place?
> 
> It's needed if the task is currently sleeping in the kernel, to abort a
> sleeping loop. The task_work may contain actions that will result in the
> sleep loop being satisfied and hence ending, which means it needs to be
> processed. That's my worry with the check-and-clear, then reset state
> solution.

I see, sleeping loop might not wake up until the signal is handled, but
it won't handle it if we don't get out.
Not bailing out on sigkill is bad enough but that's possibly much worse
indeed... And that also means the busy loop isn't any better, I was
wondering how it was noticed if it was just a few busy checks but in
that case just temporarily clearing the flag won't get out either,
that's not even a workaround.

I assume that also explains why it wants that task, and cannot just run
from the idle context-- it's not just any worker task, it's in the
process context? (sorry for using you as a rubber duck...)

> > I'll setup some uring IO on 9p and see if I can produce these.
> 
> I'm attaching a test case. I don't think it's particularly useful, but
> it does nicely demonstrate the infinite loop that 9p gets into if
> there's task_work pending.

Thanks, that helps!
I might not have time until weekend but I'll definitely look at it.

-- 
Dominique