linux-block.vger.kernel.org archive mirror
From: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
To: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@infradead.org>
Cc: linux-block@vger.kernel.org, io-uring@vger.kernel.org,
	MATTHEW_WILCOX <matthew.wilcox@oracle.com>
Subject: Re: [PATCH 1/1] block: Manage bio references so the bio persists until necessary
Date: Mon, 24 Feb 2020 15:32:45 -0800
Message-ID: <8111469e-713d-88d3-7f12-55e90edaf52b@oracle.com>
In-Reply-To: <46bf2ea0-7677-44af-8e23-45a10710ca3d@kernel.dk>

On 2/4/2020 12:59 PM, Jens Axboe wrote:
> On 2/4/20 12:51 AM, Christoph Hellwig wrote:
>> On Mon, Feb 03, 2020 at 01:07:48PM -0800, Bijan Mottahedeh wrote:
>>> My concern is with the code below for the single bio async case:
>>>
>>>                             qc = submit_bio(bio);
>>>
>>>                             if (polled)
>>>                                     WRITE_ONCE(iocb->ki_cookie, qc);
>>>
>>> The bio/dio can be freed before the cookie is written which is what I'm
>>> seeing, and I thought this may lead to a scenario where that iocb request
>>> could be completed, freed, reallocated, and resubmitted in io_uring layer;
>>> i.e., I thought the cookie could be written into the wrong iocb.
>> I think we do have a potential use after free of the iocb here.
>> But taking a bio reference isn't going to help with that, as the iocb
>> and bio/dio life times are unrelated.
>>
>> I vaguely remember having that discussion with Jens a while ago, and
>> tried to pass a pointer to the qc to submit_bio so that we can set
>> it at submission time, but he came up with a reason why that might not
>> be required.  I'd have to dig out all notes unless Jens remembers
>> better.
> Don't remember that either, so I'd have to dig out emails! But looking
> at it now, for the async case with io_uring, the iocb is embedded in the
> io_kiocb from io_uring. We hold two references to the io_kiocb, one for
> submit and one for completion. Hence even if the bio completes
> immediately and someone else finds the completion before the application
> doing this submit, we still hold the submission reference to the
> io_kiocb. Hence I don't really see how we can end up with a
> use-after-free situation here.
>
> IIRC, Bijan had traces showing this can happen, KASAN complaining about
> it. Which makes me think that I'm missing a case here, though I don't
> immediately see what it is.
>
> Bijan, could you post your trace again? I can't seem to find it.
>

I think the problem may be in the nvme driver's handling of multiple 
pollers sharing the same CQ, due to the fact that nvme_poll() drops 
cq_poll_lock before completing the CQEs found with nvme_process_cq():

nvme_poll()
{
     ...
     spin_lock(&nvmeq->cq_poll_lock);
     found = nvme_process_cq(nvmeq, &start, &end, -1);
     spin_unlock(&nvmeq->cq_poll_lock);

     nvme_complete_cqes(nvmeq, start, end);
     ...
}

Furthermore, nvme_process_cq() rings the CQ doorbell after collecting 
the CQEs but before processing them:

static inline int nvme_process_cq(struct nvme_queue *nvmeq, u16 *start,
                                  u16 *end, unsigned int tag)
{
     ...
     while (nvme_cqe_pending(nvmeq)) {
         ...
         nvme_update_cq_head(nvmeq);
     }
     ...
         nvme_ring_cq_doorbell(nvmeq);
     return found;
}

Each poller effectively tells the controller that the CQ is empty when
it rings the CQ doorbell. This is ok if there is only one poller, but
with many of them I think enough tags can be freed and reissued that
the CQ could be overrun.

In one specific example:

- Poller 1 finds a CQ full of entries in nvme_process_cq()
- While Poller 1 processes its CQEs, more pollers find CQE ranges to
  process: Pollers 2-4 start processing additional non-overlapping
  CQE ranges
- Poller 5 finds a CQE range that overlaps with Poller 1's

CQ size 1024

Poller          1   2    3    4    5
CQ start index  10  9    214  401  708
CQ end index    9   214  401  708  77
CQ start phase  1   0    0    0    0
CQ end phase    0   0    0    0    1

Poller 1 finds that the CQ phase has flipped when processing CQE 821,
and indeed the phase has flipped because of Poller 5.  If I interpret
this data correctly, it suggests that Pollers 1 and 5 overlap.

After that I start seeing errors.

A simpler theoretical example with two threads suggested by Matthew Wilcox:

Thread 1 submits enough I/O to fill the CQ
Thread 1 then processes two CQEs, two block layer tags become available.
Thread 1 is preempted by thread 2.
Thread 2 submits two I/Os.
Thread 2 processes the two CQEs which it owns.
Thread 2 submits two more I/Os.
Those CQEs overwrite the next two CQEs that will be processed by thread 1.

Two of thread 1's IOs will not receive a completion.  Two of
thread 2's IOs will receive two completions.

Just as a workaround, I held cq_poll_lock while completing the CQEs and see no errors.

Does that make sense?

Thanks.

--bijan


