linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v2] 9p/trans_fd: perform read/write with TIF_SIGPENDING set
       [not found]     ` <69253379.JACLdFHAbQ@silver>
@ 2022-09-01 22:25       ` Tetsuo Handa
  2022-09-03 23:39         ` Dominique Martinet
  0 siblings, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2022-09-01 22:25 UTC (permalink / raw)
  To: Christian Schoenebeck, Eric Van Hensbergen, Latchesar Ionkov,
	Dominique Martinet
  Cc: syzbot, v9fs-developer, syzkaller-bugs, netdev, linux-fsdevel

On 2022/09/02 0:23, Christian Schoenebeck wrote:
> So the intention in this alternative approach is to allow user space apps  
> still being able to perform blocking I/O, while at the same time making the 
> kernel thread interruptible to fix this hung task issue, correct?

Making the kernel thread "non-blocking" (rather than "interruptible") in order
not to be blocked on I/O on pipes.

Since kernel threads by default do not receive signals, being "interruptible"
or "killable" does not help (except for silencing khungtaskd warning). Being
"non-blocking" like I/O on sockets helps.

>> --- a/net/9p/trans_fd.c
>> +++ b/net/9p/trans_fd.c
>> @@ -256,11 +256,13 @@ static int p9_fd_read(struct p9_client *client, void
>> *v, int len) if (!ts)
>>  		return -EREMOTEIO;
>>
>> -	if (!(ts->rd->f_flags & O_NONBLOCK))
>> -		p9_debug(P9_DEBUG_ERROR, "blocking read ...\n");
>> -
>>  	pos = ts->rd->f_pos;
>> +	/* Force non-blocking read() even without O_NONBLOCK. */
>> +	set_thread_flag(TIF_SIGPENDING);
>>  	ret = kernel_read(ts->rd, v, len, &pos);
>> +	spin_lock_irq(&current->sighand->siglock);
>> +	recalc_sigpending();
>> +	spin_unlock_irq(&current->sighand->siglock);
> 
> Is the recalc_sigpending() block here actually needed? The TIF_SIGPENDING flag 
> is already cleared by net/9p/client.c, no?

This is actually needed.

The thread which processes this function is a kernel workqueue thread which
is supposed to process other functions (which might call "interruptible"
functions even if signals are not received by default).

The thread which currently clearing the TIF_SIGPENDING flag is a user process
(which are calling "killable" functions from syscall context but effectively
"uninterruptible" due to clearing the TIF_SIGPENDING flag and retrying).
By the way, clearing the TIF_SIGPENDING flag before retrying "killable" functions
(like p9_client_rpc() does) is very bad and needs to be avoided...


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] 9p/trans_fd: perform read/write with TIF_SIGPENDING set
  2022-09-01 22:25       ` [PATCH v2] 9p/trans_fd: perform read/write with TIF_SIGPENDING set Tetsuo Handa
@ 2022-09-03 23:39         ` Dominique Martinet
  2022-09-04  0:27           ` Tetsuo Handa
  0 siblings, 1 reply; 5+ messages in thread
From: Dominique Martinet @ 2022-09-03 23:39 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Christian Schoenebeck, Eric Van Hensbergen, Latchesar Ionkov,
	syzbot, v9fs-developer, syzkaller-bugs, netdev, linux-fsdevel

Thanks for the patch and sorry for the slow reply

v1 vs v2: my take is that I think v1 is easier to understand, and if you
pass a fd to be used as kernel end for 9p you shouldn't also be messing
with it so it's fair game to make it O_NONBLOCK -- we're reading and
writing to these things, the fds shouldn't be used by the caller after
the mount syscall.

Is there any reason you spent time working on v2, or is that just
theorical for not messing with userland fd ?

unless there's any reason I'll try to find time to test v1 and queue it
for 6.1

Tetsuo Handa wrote on Fri, Sep 02, 2022 at 07:25:30AM +0900:
> On 2022/09/02 0:23, Christian Schoenebeck wrote:
> > So the intention in this alternative approach is to allow user space apps  
> > still being able to perform blocking I/O, while at the same time making the 
> > kernel thread interruptible to fix this hung task issue, correct?
> 
> Making the kernel thread "non-blocking" (rather than "interruptible") in order
> not to be blocked on I/O on pipes.
> 
> Since kernel threads by default do not receive signals, being "interruptible"
> or "killable" does not help (except for silencing khungtaskd warning). Being
> "non-blocking" like I/O on sockets helps.

I'm still not 100% sure I understand the root of the deadlock, but I can
agree the worker thread shouldn't block.

We seem to check for EAGAIN where kernel_read/write end up being called
and there's a poll for scheduling so it -should- work, but I assume this
hasn't been tested much and might take a bit of time to test.


> The thread which currently clearing the TIF_SIGPENDING flag is a user process
> (which are calling "killable" functions from syscall context but effectively
> "uninterruptible" due to clearing the TIF_SIGPENDING flag and retrying).
> By the way, clearing the TIF_SIGPENDING flag before retrying "killable" functions
> (like p9_client_rpc() does) is very bad and needs to be avoided...

Yes, I really wish we could make this go away.
I started work to make the cancel (flush) asynchronous, but it
introduced a regression I never had (and still don't have) time to
figure out... If you have motivation to take over, the patches I sent
are here:
https://lore.kernel.org/all/20181217110111.GB17466@nautica/T/
(unfortunately some refactoring happened and they no longer apply, but
the logic should be mostly sane)


Thanks,
--
Dominique

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] 9p/trans_fd: perform read/write with TIF_SIGPENDING set
  2022-09-03 23:39         ` Dominique Martinet
@ 2022-09-04  0:27           ` Tetsuo Handa
  2022-10-07  1:40             ` Dominique Martinet
  0 siblings, 1 reply; 5+ messages in thread
From: Tetsuo Handa @ 2022-09-04  0:27 UTC (permalink / raw)
  To: Dominique Martinet
  Cc: Christian Schoenebeck, Eric Van Hensbergen, Latchesar Ionkov,
	syzbot, v9fs-developer, syzkaller-bugs, netdev, linux-fsdevel

On 2022/09/04 8:39, Dominique Martinet wrote:
> Is there any reason you spent time working on v2, or is that just
> theorical for not messing with userland fd ?

Just theoretical for not messing with userland fd, for programs generated
by fuzzers might use fds passed to the mount() syscall. I imagined that
syzbot again reports this problem when it started playing with fcntl().

For robustness, not messing with userland fd is the better. ;-)

> 
> unless there's any reason I'll try to find time to test v1 and queue it
> for 6.1

OK.

> We seem to check for EAGAIN where kernel_read/write end up being called
> and there's a poll for scheduling so it -should- work, but I assume this
> hasn't been tested much and might take a bit of time to test.

Right. Since the I/O in kernel side is poll based multiplexing, forcing
non-blocking I/O -should- work. (But I can't test e.g. changes in CPU time
usage because I don't have environment to test. I assume that poll based
multiplexing saves us from doing busy looping.)

We are currently checking for ERESTARTSYS and EAGAIN. The former is for
non-socket fds which do not have O_NONBLOCK flag, and the latter is for
socket fds which have O_NONBLOCK flag. If we enforce O_NONBLOCK flag,
the former will become redundant. I think we can remove the former check
after you tested that setting O_NONBLOCK flag on non-socket fds does not break.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] 9p/trans_fd: perform read/write with TIF_SIGPENDING set
  2022-09-04  0:27           ` Tetsuo Handa
@ 2022-10-07  1:40             ` Dominique Martinet
  2022-10-07 11:52               ` Tetsuo Handa
  0 siblings, 1 reply; 5+ messages in thread
From: Dominique Martinet @ 2022-10-07  1:40 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: Christian Schoenebeck, Eric Van Hensbergen, Latchesar Ionkov,
	syzbot, v9fs-developer, syzkaller-bugs, netdev, linux-fsdevel

Tetsuo Handa wrote on Sun, Sep 04, 2022 at 09:27:22AM +0900:
> On 2022/09/04 8:39, Dominique Martinet wrote:
> > Is there any reason you spent time working on v2, or is that just
> > theorical for not messing with userland fd ?
> 
> Just theoretical for not messing with userland fd, for programs generated
> by fuzzers might use fds passed to the mount() syscall. I imagined that
> syzbot again reports this problem when it started playing with fcntl().
> 
> For robustness, not messing with userland fd is the better. ;-)

By the way digging this back made me think about this a bit again.
My opinion hasn't really changed that if you want to shoot yourself in
the foot I don't think we're crossing any priviledge boundary here, but
we could probably prevent it by saying the mount call with close that fd
and somehow steal it? (drop the fget, close_fd after get_file perhaps?)

That should address your concern about robustess and syzbot will no
longer be able to play with fcntl on "our" end of the pipe. I think it's
fair to say that once you pass it to the kernel all bets are off, so
closing it for the userspace application could make sense, and the mount
already survives when short processes do the mount call and immediately
exit so it's not like we need that fd to be open...


What do you think?

(either way would be for 6.2, the patch is already good enough as is for
me)
--
Dominique

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v2] 9p/trans_fd: perform read/write with TIF_SIGPENDING set
  2022-10-07  1:40             ` Dominique Martinet
@ 2022-10-07 11:52               ` Tetsuo Handa
  0 siblings, 0 replies; 5+ messages in thread
From: Tetsuo Handa @ 2022-10-07 11:52 UTC (permalink / raw)
  To: Dominique Martinet, Alexander Viro
  Cc: Christian Schoenebeck, Eric Van Hensbergen, Latchesar Ionkov,
	syzbot, v9fs-developer, syzkaller-bugs, netdev, linux-fsdevel

On 2022/10/07 10:40, Dominique Martinet wrote:
> Tetsuo Handa wrote on Sun, Sep 04, 2022 at 09:27:22AM +0900:
>> On 2022/09/04 8:39, Dominique Martinet wrote:
>>> Is there any reason you spent time working on v2, or is that just
>>> theorical for not messing with userland fd ?
>>
>> Just theoretical for not messing with userland fd, for programs generated
>> by fuzzers might use fds passed to the mount() syscall. I imagined that
>> syzbot again reports this problem when it started playing with fcntl().
>>
>> For robustness, not messing with userland fd is the better. ;-)
> 
> By the way digging this back made me think about this a bit again.
> My opinion hasn't really changed that if you want to shoot yourself in
> the foot I don't think we're crossing any priviledge boundary here, but
> we could probably prevent it by saying the mount call with close that fd
> and somehow steal it? (drop the fget, close_fd after get_file perhaps?)
> 
> That should address your concern about robustess and syzbot will no
> longer be able to play with fcntl on "our" end of the pipe. I think it's
> fair to say that once you pass it to the kernel all bets are off, so
> closing it for the userspace application could make sense, and the mount
> already survives when short processes do the mount call and immediately
> exit so it's not like we need that fd to be open...
> 
> 
> What do you think?

I found that pipe is using alloc_file_clone() which allocates "struct file"
instead of just incrementing "struct file"->f_count.

Then, can we add EXPORT_SYMBOL_GPL(alloc_file_clone) to fs/file_table.c and
use it like

  struct file *f;

  ts->rd = fget(rfd);
  if (!ts->rd)
    goto out_free_ts;
  if (!(ts->rd->f_mode & FMODE_READ))
    goto out_put_rd;
  f = alloc_file_clone(ts->rd, ts->rd->f_flags | O_NONBLOCK, ts->rd->f_op);
  if (IS_ERR(f))
    goto out_put_rd;
  fput(ts->rd);
  ts->rd = f;

  ts->wr = fget(wfd);
  if (!ts->wr)
    goto out_put_rd;
  if (!(ts->wr->f_mode & FMODE_WRITE))
    goto out_put_wr;
  f = alloc_file_clone(ts->wr, ts->wr->f_flags | O_NONBLOCK, ts->wr->f_op);
  if (IS_ERR(f))
    goto out_put_wr;
  fput(ts->wr);
  ts->wr = f;

 from p9_fd_open() for cloning "struct file" with O_NONBLOCK flag added?
Just an idea. I don't know whether alloc_file_clone() arguments are correct...


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-10-07 11:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <00000000000039af4d05915a9f56@google.com>
     [not found] ` <345de429-a88b-7097-d177-adecf9fed342@I-love.SAKURA.ne.jp>
     [not found]   ` <4293faaf-8279-77e2-8b1a-aff765416980@I-love.SAKURA.ne.jp>
     [not found]     ` <69253379.JACLdFHAbQ@silver>
2022-09-01 22:25       ` [PATCH v2] 9p/trans_fd: perform read/write with TIF_SIGPENDING set Tetsuo Handa
2022-09-03 23:39         ` Dominique Martinet
2022-09-04  0:27           ` Tetsuo Handa
2022-10-07  1:40             ` Dominique Martinet
2022-10-07 11:52               ` Tetsuo Handa

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).