From: Marco Elver <elver@google.com>
To: Pavel Begunkov <asml.silence@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>,
	syzbot <syzbot+73554e2258b7b8bf0bbf@syzkaller.appspotmail.com>,
	io-uring@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	Dmitry Vyukov <dvyukov@google.com>
Subject: Re: [syzbot] KCSAN: data-race in __io_uring_cancel / io_uring_try_cancel_requests
Date: Thu, 27 May 2021 11:32:34 +0200
Message-ID: <YK9nMgamPsr9YsoY@elver.google.com>
In-Reply-To: <5cf2250a-c580-4dbf-5997-e987c7b71086@gmail.com>

On Wed, May 26, 2021 at 09:31PM +0100, Pavel Begunkov wrote:
> On 5/26/21 5:36 PM, Marco Elver wrote:
> > On Wed, 26 May 2021 at 18:29, Pavel Begunkov <asml.silence@gmail.com> wrote:
> >> On 5/26/21 4:52 PM, Marco Elver wrote:
> >>> Due to some moving around of code, the patch lost the actual fix (using
> >>> atomically read io_wq) -- so here it is again ... hopefully as intended.
> >>> :-)
> >>
> >> "fortify" damn it... It was synchronised with &ctx->uring_lock
> >> before, see io_uring_try_cancel_iowq() and io_uring_del_tctx_node(),
> >> so should not clear before *del_tctx_node()
> > 
> > Ah, so if I understand right, the property stated by the comment in
> > io_uring_try_cancel_iowq() was broken, and your patch below would fix
> > that, right?
> 
> "io_uring: fortify tctx/io_wq cleanup" broke it and the diff
> should fix it.
> 
> >> The fix should just move it after this sync point. Will you send
> >> it out as a patch?
> > 
> > Do you mean your move of the write to io_wq goes on top of the patch I
> > proposed? (If so, please also leave your Signed-off-by so I can squash
> > it.)
> 
> No, only my diff, but you hinted on what has happened, so I would
> prefer you to take care of patching. If you want of course.
> 
> To be entirely fair, assuming that aligned ptr
> reads can't be torn, I don't see any _real_ problem. But surely
> the report is very helpful and the current state is too wonky, so
> should be patched.

As the C code is currently written, the pointer is read twice, and that
double read is a real problem: the value can change between the two
reads. The compiler might of course optimize it into a single read into
a register, but nothing guarantees that.
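
A minimal userspace sketch of the double-read concern, assuming a
check-then-use pattern on the pointer. The struct names, function names,
and the READ_ONCE_SKETCH() macro are hypothetical stand-ins (the macro
only imitates the kernel's READ_ONCE() and relies on the GCC/Clang
__typeof__ extension); this is not the actual io_uring code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: userspace imitation of the kernel's READ_ONCE(). */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-ins for struct io_wq and the task context. */
struct wq_sketch { int cancelled; };
struct tctx_sketch { struct wq_sketch *io_wq; };

/*
 * Racy variant: two plain loads of tctx->io_wq. A concurrent writer
 * could clear the pointer between the check and the use.
 */
static int cancel_double_read(struct tctx_sketch *tctx)
{
	if (!tctx->io_wq)               /* first load */
		return 0;
	tctx->io_wq->cancelled = 1;     /* second load */
	return 1;
}

/*
 * Single marked load: the check and the use operate on the same
 * snapshot, regardless of what the compiler would have done.
 */
static int cancel_single_read(struct tctx_sketch *tctx)
{
	struct wq_sketch *wq = READ_ONCE_SKETCH(tctx->io_wq);

	if (!wq)
		return 0;
	wq->cancelled = 1;
	return 1;
}
```

The snapshot only makes the check and the use agree with each other; it
does not by itself keep the pointed-to object alive, which has to be
guaranteed separately.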

Tangent: I avoid reasoning in terms of compiler optimizations where
I can. :-) It's a slippery slope if the code in question isn't tolerant
of data races by design (examples are stats counting or other
heuristics; that's certainly not the case here). Therefore, my wish is
that we resolve as many data races as we can (and mark intentional ones
appropriately), so that we're left with only the interesting cases like
this one. (More background if you're interested:
https://lwn.net/Articles/816850/)
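
To illustrate what "tolerant to data races by design" can look like,
here is a small userspace sketch of a statistics counter. In kernel code
such accesses would typically be marked with READ_ONCE()/WRITE_ONCE() or
data_race(); the sketch swaps in C11 relaxed atomics as the closest
userspace analogue. The counter and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stats counter: approximate values are acceptable. */
static atomic_ulong hits_sketch;

/* Marked increment: no ordering is required, only atomicity. */
static void stat_inc(void)
{
	atomic_fetch_add_explicit(&hits_sketch, 1, memory_order_relaxed);
}

/* Marked read: may be slightly stale, which the design tolerates. */
static unsigned long stat_read(void)
{
	return atomic_load_explicit(&hits_sketch, memory_order_relaxed);
}
```

The marking documents that the concurrency is intentional, so a race
detector's remaining reports point at accesses nobody thought about.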

The problem here, however, has a nicer resolution as you suggested.

> TL;DR;
> The synchronisation goes as this: it's usually used by the owner
> task, and the owner task deletes it, so is mostly naturally
> synchronised. An exception is a worker (not only) that accesses
> it for cancellation purpose, but it uses it only under ->uring_lock,
> so if removal is also taking the lock it should be fine. see
> io_uring_del_tctx_node() locking.

Did you mean io_uring_del_task_file()? There is no
io_uring_del_tctx_node().
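
The locking scheme described in the quoted TL;DR can be sketched in
userspace roughly as follows, assuming a pthread mutex in place of
->uring_lock and a single pointer in place of the tctx node list. All
struct and function names are hypothetical; the point is only the
ordering: unlink under the lock first, clear io_wq afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's definitions. */
struct wq_sketch { int cancel_calls; };
struct tctx_sketch { struct wq_sketch *io_wq; };

struct ctx_sketch {
	pthread_mutex_t uring_lock;   /* stands in for ctx->uring_lock */
	struct tctx_sketch *node;     /* stands in for the tctx node list */
};

/* Worker side: only reaches io_wq through the node, under the lock. */
static int worker_try_cancel(struct ctx_sketch *ctx)
{
	int cancelled = 0;

	pthread_mutex_lock(&ctx->uring_lock);
	if (ctx->node && ctx->node->io_wq) {
		ctx->node->io_wq->cancel_calls++;
		cancelled = 1;
	}
	pthread_mutex_unlock(&ctx->uring_lock);
	return cancelled;
}

/*
 * Owner side: unlinking the node under the lock is the sync point.
 * Once it is done, no worker can find tctx any more, so io_wq can be
 * cleared with a plain store.
 */
static void owner_cleanup(struct ctx_sketch *ctx, struct tctx_sketch *tctx)
{
	pthread_mutex_lock(&ctx->uring_lock);
	if (ctx->node == tctx)
		ctx->node = NULL;
	pthread_mutex_unlock(&ctx->uring_lock);

	tctx->io_wq = NULL;           /* after the sync point */
}
```

Clearing io_wq before the unlink would reintroduce the window in which a
worker holding the lock still sees the node but races with the store.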

> > So if I understand right, we do in fact have 2 problems:
> > 1. the data race as I noted in my patch, and
> 
> Yes, and it deals with it
> 
> > 2. the fact that io_wq does not live long enough.
> 
> Nope, io_wq outlives them fine. 

I've sent:
https://lkml.kernel.org/r/20210527092547.2656514-1-elver@google.com

Thanks,
-- Marco
