netdev.vger.kernel.org archive mirror
* [PATCH v2] 9p/client: don't assume signal_pending() clears on recalc_sigpending()
@ 2023-02-03 16:04 Jens Axboe
  2023-02-05 10:02 ` Dominique Martinet
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2023-02-03 16:04 UTC (permalink / raw)
  To: netdev, Jakub Kicinski; +Cc: Pengfei Xu

signal_pending() really means that an exit to userspace is required to
clear the condition, as it could be either an actual signal, or it could
be TWA_SIGNAL based task_work that needs processing. The 9p client
does a recalc_sigpending() to take care of the former, but that still
leaves TWA_SIGNAL task_work. The result is that if we do have TWA_SIGNAL
task_work pending, then we'll sit in a tight loop spinning as
signal_pending() remains true even after recalc_sigpending().

Move the signal_pending() logic into a helper that deals with both, and
return -ERESTARTSYS if the reason for signal_pending() being true is
that we have task_work to process.

Link: https://lore.kernel.org/lkml/Y9TgUupO5C39V%2FDW@xpf.sh.intel.com/
Reported-and-tested-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

---

v2: don't rely on task_work_run(), rather just punt with -ERESTARTSYS at
    that point. For one, we don't want to export task_work_run(), it's
    in-kernel only. And secondly, we need to ensure we have a sane state
    before running task_work. The latter did look fine before, but this
    should be saner. Tested that this also fixes the report for me.
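
For illustration only (not part of this patch; hypothetical demo_* names),
this is roughly how TWA_SIGNAL task_work gets queued, and why
recalc_sigpending() alone doesn't help:

#include <linux/sched/signal.h>
#include <linux/task_work.h>

static void demo_twork_fn(struct callback_head *head)
{
	/* normally runs on the task's way back to userspace */
}

static void demo_queue_twork(struct task_struct *task)
{
	static struct callback_head demo_twork;

	init_task_work(&demo_twork, demo_twork_fn);

	/* TWA_SIGNAL sets TIF_NOTIFY_SIGNAL and wakes the task */
	if (task_work_add(task, &demo_twork, TWA_SIGNAL))
		return;		/* task is already exiting */

	/*
	 * signal_pending() is now true and stays true: recalc_sigpending()
	 * only recomputes TIF_SIGPENDING and never touches
	 * TIF_NOTIFY_SIGNAL. Only running the task_work (normally on the
	 * way back to userspace) clears it.
	 */
	WARN_ON_ONCE(!signal_pending(task));
}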

diff --git a/net/9p/client.c b/net/9p/client.c
index 622ec6a586ee..9caa66cbd5b7 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -652,6 +652,25 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 	return ERR_PTR(err);
 }
 
+static int p9_sigpending(int *sigpending)
+{
+	*sigpending = 0;
+
+	if (!signal_pending(current))
+		return 0;
+
+	/*
+	 * If we have a TIF_NOTIFY_SIGNAL pending, abort to get it
+	 * processed.
+	 */
+	if (test_thread_flag(TIF_NOTIFY_SIGNAL))
+		return -ERESTARTSYS;
+
+	*sigpending = 1;
+	clear_thread_flag(TIF_SIGPENDING);
+	return 0;
+}
+
 /**
  * p9_client_rpc - issue a request and wait for a response
  * @c: client session
@@ -687,12 +706,9 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	req->tc.zc = false;
 	req->rc.zc = false;
 
-	if (signal_pending(current)) {
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
-	} else {
-		sigpending = 0;
-	}
+	err = p9_sigpending(&sigpending);
+	if (err)
+		goto reterr;
 
 	err = c->trans_mod->request(c, req);
 	if (err < 0) {
@@ -789,12 +805,9 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 	req->tc.zc = true;
 	req->rc.zc = true;
 
-	if (signal_pending(current)) {
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
-	} else {
-		sigpending = 0;
-	}
+	err = p9_sigpending(&sigpending);
+	if (err)
+		goto reterr;
 
 	err = c->trans_mod->zc_request(c, req, uidata, uodata,
 				       inlen, olen, in_hdrlen);

-- 
Jens Axboe



* Re: [PATCH v2] 9p/client: don't assume signal_pending() clears on recalc_sigpending()
  2023-02-03 16:04 [PATCH v2] 9p/client: don't assume signal_pending() clears on recalc_sigpending() Jens Axboe
@ 2023-02-05 10:02 ` Dominique Martinet
  2023-02-06 20:19   ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Dominique Martinet @ 2023-02-05 10:02 UTC (permalink / raw)
  To: Jens Axboe, Christian Schoenebeck, Eric Van Hensbergen
  Cc: netdev, Jakub Kicinski, Pengfei Xu, v9fs-developer

meta-comment: 9p is usually handled separately from netdev, I just saw
this by chance when Simon replied to v1 -- please cc
v9fs-developer@lists.sourceforge.net for v3 if there is one
(well, it's a bit of a weird tree because patches are sometimes taken
through -net...)

Also added Christian (virtio 9p) and Eric (second maintainer) to the To:
list for attention.


Jens Axboe wrote on Fri, Feb 03, 2023 at 09:04:28AM -0700:
> signal_pending() really means that an exit to userspace is required to
> clear the condition, as it could be either an actual signal, or it could
> be TWA_SIGNAL based task_work that needs processing. The 9p client
> does a recalc_sigpending() to take care of the former, but that still
> leaves TWA_SIGNAL task_work. The result is that if we do have TWA_SIGNAL
> task_work pending, then we'll sit in a tight loop spinning as
> signal_pending() remains true even after recalc_sigpending().
> 
> Move the signal_pending() logic into a helper that deals with both, and
> return -ERESTARTSYS if the reason for signal_pending() being true is
> that we have task_work to process.
> Link: https://lore.kernel.org/lkml/Y9TgUupO5C39V%2FDW@xpf.sh.intel.com/
> Reported-and-tested-by: Pengfei Xu <pengfei.xu@intel.com>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
> v2: don't rely on task_work_run(), rather just punt with -ERESTARTSYS at
>     that point. For one, we don't want to export task_work_run(), it's
>     in-kernel only. And secondly, we need to ensure we have a sane state
>     before running task_work. The latter did look fine before, but this
>     should be saner. Tested that this also fixes the report for me.

Hmm, just bailing out here is a can of worms -- when we get the reply
from the server, depending on the transport, hell might break loose (zc
requests in particular on virtio will probably just access the memory
anyway... fd will consider it got a bogus reply and close the
connection, which is a lesser evil but still not appropriate)

We really need to get rid of that retry loop in the first place, and req
refcounting I added a couple of years ago was a first step towards async
flush which will help with that, but the async flush code had a bug I
never found time to work out so it never made it and we need an
immediate fix.

... Just looking at code out loud, sorry for rambling: actually that
signal handling in virtio is already out of p9_virtio_zc_request() so
the pages are already unpinned by the time we do that flush, and I guess
it's not much worse -- refcounting will make it "mostly work" exactly as
it does now, as in the pages won't be freed until we actually get the
reply, so the pages can get moved underneath virtio which is bad but is
the same as right now, and I guess it's a net improvement?


I agree with your assessment that we can't use task_work_run(), I assume
it's also quite bad to just clear the flag?
I'm not familiar with this task_work at all; in which cases does it happen?
Would you be able to share an easy reproducer so that I/someone can try
on various transports?

If it's "rare enough" I'd say sacrificing the connection might make more
sense than a busy loop, but if it's becoming common I think we'll need
to spend some more time thinking about it...
It might be less effort to dig out my async flush commits if this becomes
too complicated, but I wish I could say I have time for it...

Thanks!

> 
> diff --git a/net/9p/client.c b/net/9p/client.c
> index 622ec6a586ee..9caa66cbd5b7 100644
> --- a/net/9p/client.c
> +++ b/net/9p/client.c
> @@ -652,6 +652,25 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
>  	return ERR_PTR(err);
>  }
>  
> +static int p9_sigpending(int *sigpending)
> +{
> +	*sigpending = 0;
> +
> +	if (!signal_pending(current))
> +		return 0;
> +
> +	/*
> +	 * If we have a TIF_NOTIFY_SIGNAL pending, abort to get it
> +	 * processed.
> +	 */
> +	if (test_thread_flag(TIF_NOTIFY_SIGNAL))
> +		return -ERESTARTSYS;
> +
> +	*sigpending = 1;
> +	clear_thread_flag(TIF_SIGPENDING);
> +	return 0;
> +}
> +
>  /**
>   * p9_client_rpc - issue a request and wait for a response
>   * @c: client session
> @@ -687,12 +706,9 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
>  	req->tc.zc = false;
>  	req->rc.zc = false;
>  
> -	if (signal_pending(current)) {
> -		sigpending = 1;
> -		clear_thread_flag(TIF_SIGPENDING);
> -	} else {
> -		sigpending = 0;
> -	}
> +	err = p9_sigpending(&sigpending);
> +	if (err)
> +		goto reterr;
>  
>  	err = c->trans_mod->request(c, req);
>  	if (err < 0) {
> @@ -789,12 +805,9 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
>  	req->tc.zc = true;
>  	req->rc.zc = true;
>  
> -	if (signal_pending(current)) {
> -		sigpending = 1;
> -		clear_thread_flag(TIF_SIGPENDING);
> -	} else {
> -		sigpending = 0;
> -	}
> +	err = p9_sigpending(&sigpending);
> +	if (err)
> +		goto reterr;
>  
>  	err = c->trans_mod->zc_request(c, req, uidata, uodata,
>  				       inlen, olen, in_hdrlen);
> 


* Re: [PATCH v2] 9p/client: don't assume signal_pending() clears on recalc_sigpending()
  2023-02-05 10:02 ` Dominique Martinet
@ 2023-02-06 20:19   ` Jens Axboe
  2023-02-06 21:42     ` Dominique Martinet
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2023-02-06 20:19 UTC (permalink / raw)
  To: Dominique Martinet, Christian Schoenebeck, Eric Van Hensbergen
  Cc: netdev, Jakub Kicinski, Pengfei Xu, v9fs-developer

On 2/5/23 3:02 AM, Dominique Martinet wrote:
> meta-comment: 9p is usually handled separately from netdev, I just saw
> this by chance when Simon replied to v1 -- please cc
> v9fs-developer@lists.sourceforge.net for v3 if there is one
> (well, it's a bit of a weird tree because patches are sometimes taken
> through -net...)
> 
> Also added Christian (virtio 9p) and Eric (second maintainer) to the To:
> list for attention.

Thanks! I can send out a v3, but let's get the discussion sorted first.
The only change I want to make is the comment format, which apparently is
different in net/ than in most other spots in the kernel.
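
In case it's not obvious what I mean, the difference is just this (going
by the netdev convention as I understand it):

	/* net/ style: text starts on the opening line
	 * and continues like this
	 */

	/*
	 * Style used in most of the rest of the kernel, and what the
	 * comment in p9_sigpending() currently uses.
	 */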

> Jens Axboe wrote on Fri, Feb 03, 2023 at 09:04:28AM -0700:
>> signal_pending() really means that an exit to userspace is required to
>> clear the condition, as it could be either an actual signal, or it could
>> be TWA_SIGNAL based task_work that needs processing. The 9p client
>> does a recalc_sigpending() to take care of the former, but that still
>> leaves TWA_SIGNAL task_work. The result is that if we do have TWA_SIGNAL
>> task_work pending, then we'll sit in a tight loop spinning as
>> signal_pending() remains true even after recalc_sigpending().
>>
>> Move the signal_pending() logic into a helper that deals with both, and
>> return -ERESTARTSYS if the reason for signal_pending() being true is
>> that we have task_work to process.
>> Link: https://lore.kernel.org/lkml/Y9TgUupO5C39V%2FDW@xpf.sh.intel.com/
>> Reported-and-tested-by: Pengfei Xu <pengfei.xu@intel.com>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>> v2: don't rely on task_work_run(), rather just punt with -ERESTARTSYS at
>>     that point. For one, we don't want to export task_work_run(), it's
>>     in-kernel only. And secondly, we need to ensure we have a sane state
>>     before running task_work. The latter did look fine before, but this
>>     should be saner. Tested that this also fixes the report for me.
> 
> Hmm, just bailing out here is a can of worms -- when we get the reply
> from the server, depending on the transport, hell might break loose (zc
> requests in particular on virtio will probably just access the memory
> anyway... fd will consider it got a bogus reply and close the
> connection, which is a lesser evil but still not appropriate)
> 
> We really need to get rid of that retry loop in the first place, and req
> refcounting I added a couple of years ago was a first step towards async
> flush which will help with that, but the async flush code had a bug I
> never found time to work out so it never made it and we need an
> immediate fix.
> 
> ... Just looking at code out loud, sorry for rambling: actually that
> signal handling in virtio is already out of p9_virtio_zc_request() so
> the pages are already unpinned by the time we do that flush, and I guess
> it's not much worse -- refcounting will make it "mostly work" exactly as
> it does now, as in the pages won't be freed until we actually get the
> reply, so the pages can get moved underneath virtio which is bad but is
> the same as right now, and I guess it's a net improvement?
> 
> 
> I agree with your assessment that we can't use task_work_run(), I assume
> it's also quite bad to just clear the flag?
> I'm not familiar with this task_work at all; in which cases does it happen?
> Would you be able to share an easy reproducer so that I/someone can try
> on various transports?

You can't just clear the flag without also running the task_work. Hence
it either needs to be done right there, or leave it pending and let the
exit to userspace take care of it.

> If it's "rare enough" I'd say sacrificing the connection might make more
> sense than a busy loop, but if it's becoming common I think we'll need
> to spend some more time thinking about it...
> It might be less effort to dig out my async flush commits if this becomes
> too complicated, but I wish I could say I have time for it...

It can be a number of different things - eg fput() will do it. The
particular case that I came across was io_uring, which will use
TWA_SIGNAL based task_work for retry operations (and other things). If
you use io_uring, and depending on how you setup the ring, it can be
quite common or will never happen. Dropping the connection because
task_work is pending is not a viable solution, I'm afraid.
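
For illustration, a rough liburing sketch of uring IO on a 9p mount
(hypothetical /mnt/9p path, plain ring with default flags); as described
above, retries and other deferred work for the ring get queued to the
submitting task as TWA_SIGNAL task_work:

#include <fcntl.h>
#include <stdio.h>
#include <liburing.h>

int main(void)
{
	struct io_uring ring;
	struct io_uring_sqe *sqe;
	struct io_uring_cqe *cqe;
	char buf[4096];
	int fd;

	if (io_uring_queue_init(8, &ring, 0))
		return 1;

	fd = open("/mnt/9p/file", O_RDONLY);	/* file on a 9p mount */
	if (fd < 0)
		return 1;

	sqe = io_uring_get_sqe(&ring);
	io_uring_prep_read(sqe, fd, buf, sizeof(buf), 0);
	io_uring_submit(&ring);

	/*
	 * Task_work queued against this task while the 9p client sits in
	 * its flush/retry loop keeps signal_pending() true, which is the
	 * spin described above.
	 */
	if (!io_uring_wait_cqe(&ring, &cqe)) {
		printf("read: %d\n", cqe->res);
		io_uring_cqe_seen(&ring, cqe);
	}
	io_uring_queue_exit(&ring);
	return 0;
}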

-- 
Jens Axboe



* Re: [PATCH v2] 9p/client: don't assume signal_pending() clears on recalc_sigpending()
  2023-02-06 20:19   ` Jens Axboe
@ 2023-02-06 21:42     ` Dominique Martinet
  2023-02-06 21:56       ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Dominique Martinet @ 2023-02-06 21:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christian Schoenebeck, Eric Van Hensbergen, netdev,
	Jakub Kicinski, Pengfei Xu, v9fs-developer

Jens Axboe wrote on Mon, Feb 06, 2023 at 01:19:24PM -0700:
> > I agree with your assessment that we can't use task_work_run(), I assume
> > it's also quite bad to just clear the flag?
> > I'm not familiar with this task_work at all; in which cases does it happen?
> > Would you be able to share an easy reproducer so that I/someone can try
> > on various transports?
> 
> You can't just clear the flag without also running the task_work. Hence
> it either needs to be done right there, or leave it pending and let the
> exit to userspace take care of it.

Sorry I didn't develop that idea; the signal path resets the pending
signal when we're done, I assumed we could also reset the TWA_SIGNAL
flag when we're done flushing. That might take a while though, so it's
far from optimal.

> > If it's "rare enough" I'd say sacrificing the connection might make more
> > sense than a busy loop, but if it's becoming common I think we'll need
> > to spend some more time thinking about it...
> > It might be less effort to dig out my async flush commits if this becomes
> > too complicated, but I wish I could say I have time for it...
> 
> It can be a number of different things - eg fput() will do it.

Hm, schedule_delayed_work on the last fput, ok.
I was wondering what it had to do with the current 9p thread, but since
it's not scheduled on a particular cpu it can pick another cpu to wake
up, that makes sense -- although conceptually it feels rather bad to
interrupt a remote IO because of a local task that can be done later;
e.g. between having the fput wait a bit, or cancel a slow operation like
a 1MB write, I'd rather make the fput wait.
Do you know why that signal/interrupt is needed in the first place?

> The particular case that I came across was io_uring, which will use
> TWA_SIGNAL based task_work for retry operations (and other things). If
> you use io_uring, and depending on how you setup the ring, it can be
> quite common or will never happen. Dropping the connection because
> task_work is pending is not a viable solution, I'm afraid.

Thanks for confirming that it's perfectly normal, let's not drop
connections :)

My preferred approach is still to try and restore the async flush code,
but that will take a while -- it's not something that'll work right away
and I want some tests so it won't be ready for this merge window.
If we can have some sort of workaround until then it'll probably be for
the best, but I don't have any other idea (than temporarily clearing the
flag) at this point.

I'll setup some uring IO on 9p and see if I can produce these.

-- 
Dominique


* Re: [PATCH v2] 9p/client: don't assume signal_pending() clears on recalc_sigpending()
  2023-02-06 21:42     ` Dominique Martinet
@ 2023-02-06 21:56       ` Jens Axboe
  2023-02-06 21:58         ` Jens Axboe
  2023-02-06 22:29         ` Dominique Martinet
  0 siblings, 2 replies; 8+ messages in thread
From: Jens Axboe @ 2023-02-06 21:56 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Christian Schoenebeck, Eric Van Hensbergen, netdev,
	Jakub Kicinski, Pengfei Xu, v9fs-developer

[-- Attachment #1: Type: text/plain, Size: 3531 bytes --]

On 2/6/23 2:42 PM, Dominique Martinet wrote:
> Jens Axboe wrote on Mon, Feb 06, 2023 at 01:19:24PM -0700:
>>> I agree with your assessment that we can't use task_work_run(), I assume
>>> it's also quite bad to just clear the flag?
>>> I'm not familiar with this task_work at all; in which cases does it happen?
>>> Would you be able to share an easy reproducer so that I/someone can try
>>> on various transports?
>>
>> You can't just clear the flag without also running the task_work. Hence
>> it either needs to be done right there, or leave it pending and let the
>> exit to userspace take care of it.
> 
> Sorry I didn't develop that idea; the signal path resets the pending
> signal when we're done, I assumed we could also reset the TWA_SIGNAL
> flag when we're done flushing. That might take a while though, so it's
> far from optimal.

Sure, if you set it again when done, then it will probably work just
fine. But you need to treat TIF_NOTIFY_SIGNAL and TIF_SIGPENDING
separately. An attempt at that at the end of this email, totally
untested, and I'm not certain it's a good idea at all (see below). Is
there a reason why we can't exit and get the task_work processed
instead? That'd be greatly preferable.

>>> If it's "rare enough" I'd say sacrificing the connection might make more
>>> sense than a busy loop, but if it's becoming common I think we'll need
>>> to spend some more time thinking about it...
>>> It might be less effort to dig out my async flush commits if this becomes
>>> too complicated, but I wish I could say I have time for it...
>>
>> It can be a number of different things - eg fput() will do it.
> 
> Hm, schedule_delayed_work on the last fput, ok.
> I was wondering what it had to do with the current 9p thread, but since
> it's not scheduled on a particular cpu it can pick another cpu to wake
> up, that makes sense -- although conceptually it feels rather bad to
> interrupt a remote IO because of a local task that can be done later;
> e.g. between having the fput wait a bit, or cancel a slow operation like
> a 1MB write, I'd rather make the fput wait.
> Do you know why that signal/interrupt is needed in the first place?

It's needed if the task is currently sleeping in the kernel, to abort a
sleeping loop. The task_work may contain actions that will result in the
sleep loop being satisfied and hence ending, which means it needs to be
processed. That's my worry with the check-and-clear, then reset state
solution.

>> The particular case that I came across was io_uring, which will use
>> TWA_SIGNAL based task_work for retry operations (and other things). If
>> you use io_uring, and depending on how you setup the ring, it can be
>> quite common or will never happen. Dropping the connection because
>> task_work is pending is not a viable solution, I'm afraid.
> 
> Thanks for confirming that it's perfectly normal, let's not drop
> connections :)
> 
> My preferred approach is still to try and restore the async flush code,
> but that will take a while -- it's not something that'll work right away
> and I want some tests so it won't be ready for this merge window.
> If we can have some sort of workaround until then it'll probably be for
> the best, but I don't have any other idea (than temporarily clearing the
> flag) at this point.
> 
> I'll setup some uring IO on 9p and see if I can produce these.

I'm attaching a test case. I don't think it's particularly useful, but
it does nicely demonstrate the infinite loop that 9p gets into if
there's task_work pending.

-- 
Jens Axboe

[-- Attachment #2: repro.c --]
[-- Type: text/x-csrc, Size: 19895 bytes --]

// autogenerated by syzkaller (https://github.com/google/syzkaller)

#define _GNU_SOURCE

#include <dirent.h>
#include <endian.h>
#include <errno.h>
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/mount.h>
#include <sys/prctl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#include <linux/loop.h>

#ifndef __NR_io_uring_setup
#define __NR_io_uring_setup 425
#endif
#ifndef __NR_memfd_create
#define __NR_memfd_create 319
#endif

static unsigned long long procid;

static __thread int clone_ongoing;
static __thread int skip_segv;
static __thread jmp_buf segv_env;

static void segv_handler(int sig, siginfo_t* info, void* ctx)
{
  if (__atomic_load_n(&clone_ongoing, __ATOMIC_RELAXED) != 0) {
    exit(sig);
  }
  uintptr_t addr = (uintptr_t)info->si_addr;
  const uintptr_t prog_start = 1 << 20;
  const uintptr_t prog_end = 100 << 20;
  int skip = __atomic_load_n(&skip_segv, __ATOMIC_RELAXED) != 0;
  int valid = addr < prog_start || addr > prog_end;
  if (skip && valid) {
    _longjmp(segv_env, 1);
  }
  exit(sig);
}

static void install_segv_handler(void)
{
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = SIG_IGN;
  syscall(SYS_rt_sigaction, 0x20, &sa, NULL, 8);
  syscall(SYS_rt_sigaction, 0x21, &sa, NULL, 8);
  memset(&sa, 0, sizeof(sa));
  sa.sa_sigaction = segv_handler;
  sa.sa_flags = SA_NODEFER | SA_SIGINFO;
  sigaction(SIGSEGV, &sa, NULL);
  sigaction(SIGBUS, &sa, NULL);
}

#define NONFAILING(...)                                                        \
  ({                                                                           \
    int ok = 1;                                                                \
    __atomic_fetch_add(&skip_segv, 1, __ATOMIC_SEQ_CST);                       \
    if (_setjmp(segv_env) == 0) {                                              \
      __VA_ARGS__;                                                             \
    } else                                                                     \
      ok = 0;                                                                  \
    __atomic_fetch_sub(&skip_segv, 1, __ATOMIC_SEQ_CST);                       \
    ok;                                                                        \
  })

static void sleep_ms(uint64_t ms)
{
  usleep(ms * 1000);
}

static uint64_t current_time_ms(void)
{
  struct timespec ts;
  if (clock_gettime(CLOCK_MONOTONIC, &ts))
    exit(1);
  return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

static bool write_file(const char* file, const char* what, ...)
{
  char buf[1024];
  va_list args;
  va_start(args, what);
  vsnprintf(buf, sizeof(buf), what, args);
  va_end(args);
  buf[sizeof(buf) - 1] = 0;
  int len = strlen(buf);
  int fd = open(file, O_WRONLY | O_CLOEXEC);
  if (fd == -1)
    return false;
  if (write(fd, buf, len) != len) {
    int err = errno;
    close(fd);
    errno = err;
    return false;
  }
  close(fd);
  return true;
}

//% This code is derived from puff.{c,h}, found in the zlib development. The
//% original files come with the following copyright notice:

//% Copyright (C) 2002-2013 Mark Adler, all rights reserved
//% version 2.3, 21 Jan 2013
//% This software is provided 'as-is', without any express or implied
//% warranty.  In no event will the author be held liable for any damages
//% arising from the use of this software.
//% Permission is granted to anyone to use this software for any purpose,
//% including commercial applications, and to alter it and redistribute it
//% freely, subject to the following restrictions:
//% 1. The origin of this software must not be misrepresented; you must not
//%    claim that you wrote the original software. If you use this software
//%    in a product, an acknowledgment in the product documentation would be
//%    appreciated but is not required.
//% 2. Altered source versions must be plainly marked as such, and must not be
//%    misrepresented as being the original software.
//% 3. This notice may not be removed or altered from any source distribution.
//% Mark Adler    madler@alumni.caltech.edu

//% BEGIN CODE DERIVED FROM puff.{c,h}

#define MAXBITS 15
#define MAXLCODES 286
#define MAXDCODES 30
#define MAXCODES (MAXLCODES + MAXDCODES)
#define FIXLCODES 288

struct puff_state {
  unsigned char* out;
  unsigned long outlen;
  unsigned long outcnt;
  const unsigned char* in;
  unsigned long inlen;
  unsigned long incnt;
  int bitbuf;
  int bitcnt;
  jmp_buf env;
};
static int puff_bits(struct puff_state* s, int need)
{
  long val = s->bitbuf;
  while (s->bitcnt < need) {
    if (s->incnt == s->inlen)
      longjmp(s->env, 1);
    val |= (long)(s->in[s->incnt++]) << s->bitcnt;
    s->bitcnt += 8;
  }
  s->bitbuf = (int)(val >> need);
  s->bitcnt -= need;
  return (int)(val & ((1L << need) - 1));
}
static int puff_stored(struct puff_state* s)
{
  s->bitbuf = 0;
  s->bitcnt = 0;
  if (s->incnt + 4 > s->inlen)
    return 2;
  unsigned len = s->in[s->incnt++];
  len |= s->in[s->incnt++] << 8;
  if (s->in[s->incnt++] != (~len & 0xff) ||
      s->in[s->incnt++] != ((~len >> 8) & 0xff))
    return -2;
  if (s->incnt + len > s->inlen)
    return 2;
  if (s->outcnt + len > s->outlen)
    return 1;
  for (; len--; s->outcnt++, s->incnt++) {
    if (s->in[s->incnt])
      s->out[s->outcnt] = s->in[s->incnt];
  }
  return 0;
}
struct puff_huffman {
  short* count;
  short* symbol;
};
static int puff_decode(struct puff_state* s, const struct puff_huffman* h)
{
  int first = 0;
  int index = 0;
  int bitbuf = s->bitbuf;
  int left = s->bitcnt;
  int code = first = index = 0;
  int len = 1;
  short* next = h->count + 1;
  while (1) {
    while (left--) {
      code |= bitbuf & 1;
      bitbuf >>= 1;
      int count = *next++;
      if (code - count < first) {
        s->bitbuf = bitbuf;
        s->bitcnt = (s->bitcnt - len) & 7;
        return h->symbol[index + (code - first)];
      }
      index += count;
      first += count;
      first <<= 1;
      code <<= 1;
      len++;
    }
    left = (MAXBITS + 1) - len;
    if (left == 0)
      break;
    if (s->incnt == s->inlen)
      longjmp(s->env, 1);
    bitbuf = s->in[s->incnt++];
    if (left > 8)
      left = 8;
  }
  return -10;
}
static int puff_construct(struct puff_huffman* h, const short* length, int n)
{
  int len;
  for (len = 0; len <= MAXBITS; len++)
    h->count[len] = 0;
  int symbol;
  for (symbol = 0; symbol < n; symbol++)
    (h->count[length[symbol]])++;
  if (h->count[0] == n)
    return 0;
  int left = 1;
  for (len = 1; len <= MAXBITS; len++) {
    left <<= 1;
    left -= h->count[len];
    if (left < 0)
      return left;
  }
  short offs[MAXBITS + 1];
  offs[1] = 0;
  for (len = 1; len < MAXBITS; len++)
    offs[len + 1] = offs[len] + h->count[len];
  for (symbol = 0; symbol < n; symbol++)
    if (length[symbol] != 0)
      h->symbol[offs[length[symbol]]++] = symbol;
  return left;
}
static int puff_codes(struct puff_state* s, const struct puff_huffman* lencode,
                      const struct puff_huffman* distcode)
{
  static const short lens[29] = {3,  4,  5,  6,   7,   8,   9,   10,  11, 13,
                                 15, 17, 19, 23,  27,  31,  35,  43,  51, 59,
                                 67, 83, 99, 115, 131, 163, 195, 227, 258};
  static const short lext[29] = {0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2,
                                 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0};
  static const short dists[30] = {
      1,    2,    3,    4,    5,    7,    9,    13,    17,    25,
      33,   49,   65,   97,   129,  193,  257,  385,   513,   769,
      1025, 1537, 2049, 3073, 4097, 6145, 8193, 12289, 16385, 24577};
  static const short dext[30] = {0, 0, 0,  0,  1,  1,  2,  2,  3,  3,
                                 4, 4, 5,  5,  6,  6,  7,  7,  8,  8,
                                 9, 9, 10, 10, 11, 11, 12, 12, 13, 13};
  int symbol;
  do {
    symbol = puff_decode(s, lencode);
    if (symbol < 0)
      return symbol;
    if (symbol < 256) {
      if (s->outcnt == s->outlen)
        return 1;
      if (symbol)
        s->out[s->outcnt] = symbol;
      s->outcnt++;
    } else if (symbol > 256) {
      symbol -= 257;
      if (symbol >= 29)
        return -10;
      int len = lens[symbol] + puff_bits(s, lext[symbol]);
      symbol = puff_decode(s, distcode);
      if (symbol < 0)
        return symbol;
      unsigned dist = dists[symbol] + puff_bits(s, dext[symbol]);
      if (dist > s->outcnt)
        return -11;
      if (s->outcnt + len > s->outlen)
        return 1;
      while (len--) {
        if (dist <= s->outcnt && s->out[s->outcnt - dist])
          s->out[s->outcnt] = s->out[s->outcnt - dist];
        s->outcnt++;
      }
    }
  } while (symbol != 256);
  return 0;
}
static int puff_fixed(struct puff_state* s)
{
  static int virgin = 1;
  static short lencnt[MAXBITS + 1], lensym[FIXLCODES];
  static short distcnt[MAXBITS + 1], distsym[MAXDCODES];
  static struct puff_huffman lencode, distcode;
  if (virgin) {
    lencode.count = lencnt;
    lencode.symbol = lensym;
    distcode.count = distcnt;
    distcode.symbol = distsym;
    short lengths[FIXLCODES];
    int symbol;
    for (symbol = 0; symbol < 144; symbol++)
      lengths[symbol] = 8;
    for (; symbol < 256; symbol++)
      lengths[symbol] = 9;
    for (; symbol < 280; symbol++)
      lengths[symbol] = 7;
    for (; symbol < FIXLCODES; symbol++)
      lengths[symbol] = 8;
    puff_construct(&lencode, lengths, FIXLCODES);
    for (symbol = 0; symbol < MAXDCODES; symbol++)
      lengths[symbol] = 5;
    puff_construct(&distcode, lengths, MAXDCODES);
    virgin = 0;
  }
  return puff_codes(s, &lencode, &distcode);
}
static int puff_dynamic(struct puff_state* s)
{
  static const short order[19] = {16, 17, 18, 0, 8,  7, 9,  6, 10, 5,
                                  11, 4,  12, 3, 13, 2, 14, 1, 15};
  int nlen = puff_bits(s, 5) + 257;
  int ndist = puff_bits(s, 5) + 1;
  int ncode = puff_bits(s, 4) + 4;
  if (nlen > MAXLCODES || ndist > MAXDCODES)
    return -3;
  short lengths[MAXCODES];
  int index;
  for (index = 0; index < ncode; index++)
    lengths[order[index]] = puff_bits(s, 3);
  for (; index < 19; index++)
    lengths[order[index]] = 0;
  short lencnt[MAXBITS + 1], lensym[MAXLCODES];
  struct puff_huffman lencode = {lencnt, lensym};
  int err = puff_construct(&lencode, lengths, 19);
  if (err != 0)
    return -4;
  index = 0;
  while (index < nlen + ndist) {
    int symbol;
    int len;
    symbol = puff_decode(s, &lencode);
    if (symbol < 0)
      return symbol;
    if (symbol < 16)
      lengths[index++] = symbol;
    else {
      len = 0;
      if (symbol == 16) {
        if (index == 0)
          return -5;
        len = lengths[index - 1];
        symbol = 3 + puff_bits(s, 2);
      } else if (symbol == 17)
        symbol = 3 + puff_bits(s, 3);
      else
        symbol = 11 + puff_bits(s, 7);
      if (index + symbol > nlen + ndist)
        return -6;
      while (symbol--)
        lengths[index++] = len;
    }
  }
  if (lengths[256] == 0)
    return -9;
  err = puff_construct(&lencode, lengths, nlen);
  if (err && (err < 0 || nlen != lencode.count[0] + lencode.count[1]))
    return -7;
  short distcnt[MAXBITS + 1], distsym[MAXDCODES];
  struct puff_huffman distcode = {distcnt, distsym};
  err = puff_construct(&distcode, lengths + nlen, ndist);
  if (err && (err < 0 || ndist != distcode.count[0] + distcode.count[1]))
    return -8;
  return puff_codes(s, &lencode, &distcode);
}
static int puff(unsigned char* dest, unsigned long* destlen,
                const unsigned char* source, unsigned long sourcelen)
{
  struct puff_state s = {
      .out = dest,
      .outlen = *destlen,
      .outcnt = 0,
      .in = source,
      .inlen = sourcelen,
      .incnt = 0,
      .bitbuf = 0,
      .bitcnt = 0,
  };
  int err;
  if (setjmp(s.env) != 0)
    err = 2;
  else {
    int last;
    do {
      last = puff_bits(&s, 1);
      int type = puff_bits(&s, 2);
      err = type == 0 ? puff_stored(&s)
                      : (type == 1 ? puff_fixed(&s)
                                   : (type == 2 ? puff_dynamic(&s) : -1));
      if (err != 0)
        break;
    } while (!last);
  }
  *destlen = s.outcnt;
  return err;
}

//% END CODE DERIVED FROM puff.{c,h}

#define ZLIB_HEADER_WIDTH 2

static int puff_zlib_to_file(const unsigned char* source,
                             unsigned long sourcelen, int dest_fd)
{
  if (sourcelen < ZLIB_HEADER_WIDTH)
    return 0;
  source += ZLIB_HEADER_WIDTH;
  sourcelen -= ZLIB_HEADER_WIDTH;
  const unsigned long max_destlen = 132 << 20;
  void* ret = mmap(0, max_destlen, PROT_WRITE | PROT_READ,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
  if (ret == MAP_FAILED)
    return -1;
  unsigned char* dest = (unsigned char*)ret;
  unsigned long destlen = max_destlen;
  int err = puff(dest, &destlen, source, sourcelen);
  if (err) {
    munmap(dest, max_destlen);
    errno = -err;
    return -1;
  }
  if (write(dest_fd, dest, destlen) != (ssize_t)destlen) {
    munmap(dest, max_destlen);
    return -1;
  }
  return munmap(dest, destlen);
}

static int setup_loop_device(unsigned char* data, unsigned long size,
                             const char* loopname, int* loopfd_p)
{
  int err = 0, loopfd = -1;
  int memfd = syscall(__NR_memfd_create, "syzkaller", 0);
  if (memfd == -1) {
    err = errno;
    goto error;
  }
  if (puff_zlib_to_file(data, size, memfd)) {
    err = errno;
    goto error_close_memfd;
  }
  loopfd = open(loopname, O_RDWR);
  if (loopfd == -1) {
    err = errno;
    goto error_close_memfd;
  }
  if (ioctl(loopfd, LOOP_SET_FD, memfd)) {
    if (errno != EBUSY) {
      err = errno;
      goto error_close_loop;
    }
    ioctl(loopfd, LOOP_CLR_FD, 0);
    usleep(1000);
    if (ioctl(loopfd, LOOP_SET_FD, memfd)) {
      err = errno;
      goto error_close_loop;
    }
  }
  close(memfd);
  *loopfd_p = loopfd;
  return 0;

error_close_loop:
  close(loopfd);
error_close_memfd:
  close(memfd);
error:
  errno = err;
  return -1;
}

static long syz_mount_image(volatile long fsarg, volatile long dir,
                            volatile long flags, volatile long optsarg,
                            volatile long change_dir,
                            volatile unsigned long size, volatile long image)
{
  unsigned char* data = (unsigned char*)image;
  int res = -1, err = 0, loopfd = -1, need_loop_device = !!size;
  char* mount_opts = (char*)optsarg;
  char* target = (char*)dir;
  char* fs = (char*)fsarg;
  char* source = NULL;
  char loopname[64];
  if (need_loop_device) {
    memset(loopname, 0, sizeof(loopname));
    snprintf(loopname, sizeof(loopname), "/dev/loop%llu", procid);
    if (setup_loop_device(data, size, loopname, &loopfd) == -1)
      return -1;
    source = loopname;
  }
  mkdir(target, 0777);
  char opts[256];
  memset(opts, 0, sizeof(opts));
  if (strlen(mount_opts) > (sizeof(opts) - 32)) {
  }
  strncpy(opts, mount_opts, sizeof(opts) - 32);
  if (strcmp(fs, "iso9660") == 0) {
    flags |= MS_RDONLY;
  } else if (strncmp(fs, "ext", 3) == 0) {
    bool has_remount_ro = false;
    char* remount_ro_start = strstr(opts, "errors=remount-ro");
    if (remount_ro_start != NULL) {
      char after = *(remount_ro_start + strlen("errors=remount-ro"));
      char before = remount_ro_start == opts ? '\0' : *(remount_ro_start - 1);
      has_remount_ro = ((before == '\0' || before == ',') &&
                        (after == '\0' || after == ','));
    }
    if (strstr(opts, "errors=panic") || !has_remount_ro)
      strcat(opts, ",errors=continue");
  } else if (strcmp(fs, "xfs") == 0) {
    strcat(opts, ",nouuid");
  }
  res = mount(source, target, fs, flags, opts);
  if (res == -1) {
    err = errno;
    goto error_clear_loop;
  }
  res = open(target, O_RDONLY | O_DIRECTORY);
  if (res == -1) {
    err = errno;
    goto error_clear_loop;
  }
  if (change_dir) {
    res = chdir(target);
    if (res == -1) {
      err = errno;
    }
  }

error_clear_loop:
  if (need_loop_device) {
    ioctl(loopfd, LOOP_CLR_FD, 0);
    close(loopfd);
  }
  errno = err;
  return res;
}

static void kill_and_wait(int pid, int* status)
{
  kill(-pid, SIGKILL);
  kill(pid, SIGKILL);
  for (int i = 0; i < 100; i++) {
    if (waitpid(-1, status, WNOHANG | __WALL) == pid)
      return;
    usleep(1000);
  }
  DIR* dir = opendir("/sys/fs/fuse/connections");
  if (dir) {
    for (;;) {
      struct dirent* ent = readdir(dir);
      if (!ent)
        break;
      if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
        continue;
      char abort[300];
      snprintf(abort, sizeof(abort), "/sys/fs/fuse/connections/%s/abort",
               ent->d_name);
      int fd = open(abort, O_WRONLY);
      if (fd == -1) {
        continue;
      }
      if (write(fd, abort, 1) < 0) {
      }
      close(fd);
    }
    closedir(dir);
  } else {
  }
  while (waitpid(-1, status, __WALL) != pid) {
  }
}

static void reset_loop()
{
  char buf[64];
  snprintf(buf, sizeof(buf), "/dev/loop%llu", procid);
  int loopfd = open(buf, O_RDWR);
  if (loopfd != -1) {
    ioctl(loopfd, LOOP_CLR_FD, 0);
    close(loopfd);
  }
}

static void setup_test()
{
  prctl(PR_SET_PDEATHSIG, SIGKILL, 0, 0, 0);
  setpgrp();
  write_file("/proc/self/oom_score_adj", "1000");
}

static void execute_one(void);

#define WAIT_FLAGS __WALL

static void loop(void)
{
  int iter = 0;
  for (;; iter++) {
    reset_loop();
    int pid = fork();
    if (pid < 0)
      exit(1);
    if (pid == 0) {
      setup_test();
      execute_one();
      exit(0);
    }
    int status = 0;
    uint64_t start = current_time_ms();
    for (;;) {
      if (waitpid(-1, &status, WNOHANG | WAIT_FLAGS) == pid)
        break;
      sleep_ms(1);
      if (current_time_ms() - start < 5000)
        continue;
      kill_and_wait(pid, &status);
      break;
    }
  }
}

uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};

void execute_one(void)
{
  intptr_t res = 0;
  res = syscall(__NR_pipe2, 0x20000100ul, 0ul);
  if (res != -1) {
    NONFAILING(r[0] = *(uint32_t*)0x20000100);
    NONFAILING(r[1] = *(uint32_t*)0x20000104);
  }
  NONFAILING(memcpy((void*)0x20002080, "./file0\000", 8));
  NONFAILING(syz_mount_image(0, 0x20002080, 0, 0, 0, 0, 0));
  NONFAILING(memcpy((void*)0x200000c0,
                    "\x15\x06\x00\x00\x65\xff\xff\x00\x00\x00\x00\x08\x00\x39"
                    "\x50\x32\x30\x30\x30\x2e\x75",
                    21));
  syscall(__NR_write, r[1], 0x200000c0ul, 0x15ul);
  NONFAILING(*(uint32_t*)0x200001c4 = 0);
  NONFAILING(*(uint32_t*)0x200001c8 = 0x10);
  NONFAILING(*(uint32_t*)0x200001cc = 0);
  NONFAILING(*(uint32_t*)0x200001d0 = 0);
  NONFAILING(*(uint32_t*)0x200001d8 = -1);
  NONFAILING(memset((void*)0x200001dc, 0, 12));
  res = syscall(__NR_io_uring_setup, 0x5ec9, 0x200001c0ul);
  if (res != -1)
    r[2] = res;
  syscall(__NR_close, r[2]);
  NONFAILING(memcpy((void*)0x20000080, "./file0\000", 8));
  NONFAILING(memcpy((void*)0x20000040, "9p\000", 3));
  NONFAILING(memcpy((void*)0x20000140, "trans=fd,rfdno=", 15));
  NONFAILING(sprintf((char*)0x2000014f, "0x%016llx", (long long)r[0]));
  NONFAILING(memcpy((void*)0x20000161, ",wfdno=", 7));
  NONFAILING(sprintf((char*)0x20000168, "0x%016llx", (long long)r[1]));
  syscall(__NR_mount, 0ul, 0x20000080ul, 0x20000040ul, 0ul, 0x20000140ul);
}
int main(void)
{
  syscall(__NR_mmap, 0x1ffff000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
  syscall(__NR_mmap, 0x20000000ul, 0x1000000ul, 7ul, 0x32ul, -1, 0ul);
  syscall(__NR_mmap, 0x21000000ul, 0x1000ul, 0ul, 0x32ul, -1, 0ul);
  install_segv_handler();
  loop();
  return 0;
}


* Re: [PATCH v2] 9p/client: don't assume signal_pending() clears on recalc_sigpending()
  2023-02-06 21:56       ` Jens Axboe
@ 2023-02-06 21:58         ` Jens Axboe
  2023-02-06 22:29         ` Dominique Martinet
  1 sibling, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2023-02-06 21:58 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Christian Schoenebeck, Eric Van Hensbergen, netdev,
	Jakub Kicinski, Pengfei Xu, v9fs-developer

>> Sorry I didn't develop that idea; the signal path resets the pending
>> signal when we're done, I assumed we could also reset the TWA_SIGNAL
>> flag when we're done flushing. That might take a while though, so it's
>> far from optimal.
> 
> Sure, if you set it again when done, then it will probably work just
> fine. But you need to treat TIF_NOTIFY_SIGNAL and TIF_SIGPENDING
> separately. An attempt at that at the end of this email, totally
> untested, and I'm not certain it's a good idea at all (see below). Is
> there a reason why we can't exit and get the task_work processed
> instead? That'd be greatly preferable.
Forgot to include it, but as mentioned, I don't think it's a sane idea...


diff --git a/net/9p/client.c b/net/9p/client.c
index 622ec6a586ee..e4ff2773e00b 100644
--- a/net/9p/client.c
+++ b/net/9p/client.c
@@ -652,6 +652,33 @@ static struct p9_req_t *p9_client_prepare_req(struct p9_client *c,
 	return ERR_PTR(err);
 }
 
+static void p9_clear_sigpending(int *sigpending, int *notifypending)
+{
+	if (signal_pending(current)) {
+		*sigpending = test_thread_flag(TIF_SIGPENDING);
+		if (*sigpending)
+			clear_thread_flag(TIF_SIGPENDING);
+		*notifypending = test_thread_flag(TIF_NOTIFY_SIGNAL);
+		if (*notifypending)
+			clear_thread_flag(TIF_NOTIFY_SIGNAL);
+	} else {
+		*sigpending = *notifypending = 0;
+	}
+}
+
+static void p9_reset_sigpending(int sigpending, int notifypending)
+{
+	unsigned long flags;
+
+	if (sigpending) {
+		spin_lock_irqsave(&current->sighand->siglock, flags);
+		recalc_sigpending();
+		spin_unlock_irqrestore(&current->sighand->siglock, flags);
+	}
+	if (notifypending)
+		set_tsk_thread_flag(current, TIF_NOTIFY_SIGNAL);
+}
+
 /**
  * p9_client_rpc - issue a request and wait for a response
  * @c: client session
@@ -665,8 +692,7 @@ static struct p9_req_t *
 p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 {
 	va_list ap;
-	int sigpending, err;
-	unsigned long flags;
+	int sigpending, notifypending, err;
 	struct p9_req_t *req;
 	/* Passing zero for tsize/rsize to p9_client_prepare_req() tells it to
 	 * auto determine an appropriate (small) request/response size
@@ -687,12 +713,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	req->tc.zc = false;
 	req->rc.zc = false;
 
-	if (signal_pending(current)) {
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
-	} else {
-		sigpending = 0;
-	}
+	p9_clear_sigpending(&sigpending, &notifypending);
 
 	err = c->trans_mod->request(c, req);
 	if (err < 0) {
@@ -714,8 +735,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 
 	if (err == -ERESTARTSYS && c->status == Connected &&
 	    type == P9_TFLUSH) {
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
+		p9_clear_sigpending(&sigpending, &notifypending);
 		goto again;
 	}
 
@@ -725,8 +745,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 	}
 	if (err == -ERESTARTSYS && c->status == Connected) {
 		p9_debug(P9_DEBUG_MUX, "flushing\n");
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
+		p9_clear_sigpending(&sigpending, &notifypending);
 
 		if (c->trans_mod->cancel(c, req))
 			p9_client_flush(c, req);
@@ -736,11 +755,7 @@ p9_client_rpc(struct p9_client *c, int8_t type, const char *fmt, ...)
 			err = 0;
 	}
 recalc_sigpending:
-	if (sigpending) {
-		spin_lock_irqsave(&current->sighand->siglock, flags);
-		recalc_sigpending();
-		spin_unlock_irqrestore(&current->sighand->siglock, flags);
-	}
+	p9_reset_sigpending(sigpending, notifypending);
 	if (err < 0)
 		goto reterr;
 
@@ -773,8 +788,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 					 const char *fmt, ...)
 {
 	va_list ap;
-	int sigpending, err;
-	unsigned long flags;
+	int sigpending, notifypending, err;
 	struct p9_req_t *req;
 
 	va_start(ap, fmt);
@@ -789,12 +803,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 	req->tc.zc = true;
 	req->rc.zc = true;
 
-	if (signal_pending(current)) {
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
-	} else {
-		sigpending = 0;
-	}
+	p9_clear_sigpending(&sigpending, &notifypending);
 
 	err = c->trans_mod->zc_request(c, req, uidata, uodata,
 				       inlen, olen, in_hdrlen);
@@ -810,8 +819,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 	}
 	if (err == -ERESTARTSYS && c->status == Connected) {
 		p9_debug(P9_DEBUG_MUX, "flushing\n");
-		sigpending = 1;
-		clear_thread_flag(TIF_SIGPENDING);
+		p9_clear_sigpending(&sigpending, &notifypending);
 
 		if (c->trans_mod->cancel(c, req))
 			p9_client_flush(c, req);
@@ -821,11 +829,7 @@ static struct p9_req_t *p9_client_zc_rpc(struct p9_client *c, int8_t type,
 			err = 0;
 	}
 recalc_sigpending:
-	if (sigpending) {
-		spin_lock_irqsave(&current->sighand->siglock, flags);
-		recalc_sigpending();
-		spin_unlock_irqrestore(&current->sighand->siglock, flags);
-	}
+	p9_reset_sigpending(sigpending, notifypending);
 	if (err < 0)
 		goto reterr;
 

-- 
Jens Axboe




* Re: [PATCH v2] 9p/client: don't assume signal_pending() clears on recalc_sigpending()
  2023-02-06 21:56       ` Jens Axboe
  2023-02-06 21:58         ` Jens Axboe
@ 2023-02-06 22:29         ` Dominique Martinet
  2023-02-06 22:56           ` Jens Axboe
  1 sibling, 1 reply; 8+ messages in thread
From: Dominique Martinet @ 2023-02-06 22:29 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christian Schoenebeck, Eric Van Hensbergen, netdev,
	Jakub Kicinski, Pengfei Xu, v9fs-developer

Jens Axboe wrote on Mon, Feb 06, 2023 at 02:56:57PM -0700:
> Sure, if you set it again when done, then it will probably work just
> fine. But you need to treat TIF_NOTIFY_SIGNAL and TIF_SIGPENDING
> separately. An attempt at that at the end of this email, totally
> untested, and I'm not certain it's a good idea at all (see below). Is
> there a reason why we can't exit and get the task_work processed
> instead? That'd be greatly preferable.

No good reason aside from "it's not ready", but in the current code things
will probably get weird.
I actually misremembered the tag lookup for trans_fd: since we're not
freeing the tag yet, the lookup will work and the connection might not be
dropped (just reading into a buffer then freeing it in the cb without
any further processing), but even if my refcounting works better than I
thought, you'll end up with the IO being replayed while the server is
still processing the first one.
This is unlikely, but for example this could happen: 

- first write [0;1MB]
- write is interrupted before server handled it
   - write replayed and handled, userspace continues to...
   - second write [1MB-4k;1MB]
- first write handled by the server, overwriting the second write

And who doesn't enjoy a silent corruption for breakfast?


> > Hm, schedule_delayed_work on the last fput, ok.
> > I was wondering what it had to do with the current 9p thread, but since
> > it's not scheduled on a particular cpu it can pick another cpu to wake
> > up, that makes sense -- although conceptually it feels rather bad to
> > interrupt a remote IO because of a local task that can be done later;
> > e.g. between having the fput wait a bit, or cancel a slow operation like
> > a 1MB write, I'd rather make the fput wait.
> > Do you know why that signal/interrupt is needed in the first place?
> 
> It's needed if the task is currently sleeping in the kernel, to abort a
> sleeping loop. The task_work may contain actions that will result in the
> sleep loop being satisfied and hence ending, which means it needs to be
> processed. That's my worry with the check-and-clear, then reset state
> solution.

I see, sleeping loop might not wake up until the signal is handled, but
it won't handle it if we don't get out.
Not bailing out on sigkill is bad enough but that's possibly much worse
indeed... And that also means the busy loop isn't any better, I was
wondering how it was noticed if it was just a few busy checks but in
that case just temporarily clearing the flag won't get out either,
that's not even a workaround.

I assume that also explains why it wants that task, and cannot just run
from the idle context-- it's not just any worker task, it's in the
process context? (sorry for using you as a rubber duck...)

> > I'll setup some uring IO on 9p and see if I can produce these.
> 
> I'm attaching a test case. I don't think it's particularly useful, but
> it does nicely demonstrate the infinite loop that 9p gets into if
> there's task_work pending.

Thanks, that helps!
I might not have time until weekend but I'll definitely look at it.

-- 
Dominique


* Re: [PATCH v2] 9p/client: don't assume signal_pending() clears on recalc_sigpending()
  2023-02-06 22:29         ` Dominique Martinet
@ 2023-02-06 22:56           ` Jens Axboe
  0 siblings, 0 replies; 8+ messages in thread
From: Jens Axboe @ 2023-02-06 22:56 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Christian Schoenebeck, Eric Van Hensbergen, netdev,
	Jakub Kicinski, Pengfei Xu, v9fs-developer

On 2/6/23 3:29 PM, Dominique Martinet wrote:
>>> Hm, schedule_delayed_work on the last fput, ok.
>>> I was wondering what it had to do with the current 9p thread, but since
>>> it's not scheduled on a particular cpu it can pick another cpu to wake
>>> up, that makes sense -- although conceptually it feels rather bad to
>>> interrupt a remote IO because of a local task that can be done later;
>>> e.g. between having the fput wait a bit, or cancel a slow operation like
>>> a 1MB write, I'd rather make the fput wait.
>>> Do you know why that signal/interrupt is needed in the first place?
>>
>> It's needed if the task is currently sleeping in the kernel, to abort a
>> sleeping loop. The task_work may contain actions that will result in the
>> sleep loop being satisfied and hence ending, which means it needs to be
>> processed. That's my worry with the check-and-clear, then reset state
>> solution.
> 
> I see, sleeping loop might not wake up until the signal is handled, but
> it won't handle it if we don't get out.

Exactly

> Not bailing out on sigkill is bad enough but that's possibly much worse
> indeed... And that also means the busy loop isn't any better, I was
> wondering how it was noticed if it was just a few busy checks but in
> that case just temporarily clearing the flag won't get out either,
> that's not even a workaround.
> 
> I assume that also explains why it wants that task, and cannot just run
> from the idle context -- it's not just any worker task, it's in the
> process context? (sorry for using you as a rubber duck...)

Right, it needs to run in the context of the right task. So we can't
just punt it out-of-line to something else, which would obviously also
solve that dependency loop.

>>> I'll setup some uring IO on 9p and see if I can produce these.
>>
>> I'm attaching a test case. I don't think it's particularly useful, but
>> it does nicely demonstrate the infinite loop that 9p gets into if
>> there's task_work pending.
> 
> Thanks, that helps!
> I might not have time until weekend but I'll definitely look at it.

Sounds good, thanks! I'll consider my patch abandoned and wait for what
you have.

-- 
Jens Axboe


