From: Chris Leech <cleech@redhat.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org
Subject: Re: nvme-tcp: io_work NULL pointer when racing with queue stop
Date: Thu, 27 Jan 2022 19:25:30 -0800
Message-ID: <CAPnfmXJceXFDAUXND--8UiUJeyzL-8NUAdgYcws=CodEEOF6Zg@mail.gmail.com>
In-Reply-To: <d33f5615-0e40-2e6e-76a1-b9450fa9ec6c@grimberg.me>

Thanks Sagi, this looks promising.

It might also fit with a new backtrace I was just looking at from the
same testing, where nvme_tcp_submit_async_event hit a NULL
ctrl->async_req.pdu, which I can only see happening if it was racing
with nvme_tcp_error_recovery_work.

I'll get this into some testing here at Red Hat and let you know the results.

- Chris

On Thu, Jan 27, 2022 at 3:05 PM Sagi Grimberg <sagi@grimberg.me> wrote:
>
>
> >> Thank you for the following detailed description. I'm going to go back
> >> to my crash report and take another look at this one.
> >
> > No worries Chris, perhaps I can assist.
> >
> > Is the dmesg log prior to the BUG available? Does it tell us anything
> > to what was going on leading to this?
> >
> > Any more information about the test case? (load + controller reset)
> > Is the reset in a loop? Any more info about the load?
> > Any other 'interference' during the test?
> > How reproducible is this?
> > Is this Linux nvmet as the controller?
> > How many queues does the controller have? (it will help me understand
> > how easy it is to reproduce on a vm setup)
>
> I took another look at the code and I think I see how io_work may be
> triggered after a socket was released. The issue might be the
> .submit_async_event callback from the core.
>
> When we start a reset, the first thing we do is stop the pending
> work elements that may trigger I/O by calling nvme_stop_ctrl, and
> then we continue to tear down the I/O queues and then the admin
> queue (in nvme_tcp_teardown_ctrl).
>
> So the sequence is:
>          nvme_stop_ctrl(ctrl);
>          nvme_tcp_teardown_ctrl(ctrl, false);
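>
> Annotated, that reset work looks roughly like this (paraphrasing
> nvme_tcp_reset_ctrl_work from v5.16 tcp.c; the comments are mine):
> --
> static void nvme_tcp_reset_ctrl_work(struct work_struct *work)
> {
>         struct nvme_ctrl *ctrl =
>                 container_of(work, struct nvme_ctrl, reset_work);
>
>         /* flushes async_event_work once, among other things */
>         nvme_stop_ctrl(ctrl);
>         /*
>          * window: the socket is still live here, so a late AEN from
>          * the controller can re-queue async_event_work
>          */
>         nvme_tcp_teardown_ctrl(ctrl, false);    /* releases the sockets */
>         ...
> }
> --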
>
> However, there is a window, after nvme_stop_ctrl but before we
> tear down the admin queue, in which the controller sends an AEN
> that is processed by the host. That processing includes automatically
> submitting another AER, which in turn calls the driver's
> .submit_async_event (instead of the normal .queue_rq, as AERs don't
> have timeouts).
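>
> That automatic resubmission lives in the core; from memory (v5.16
> core.c, roughly):
> --
> static void nvme_async_event_work(struct work_struct *work)
> {
>         struct nvme_ctrl *ctrl =
>                 container_of(work, struct nvme_ctrl, async_event_work);
>
>         ...
>         /* calls into the transport with no ctrl/queue state check */
>         ctrl->ops->submit_async_event(ctrl);
> }
> --
> and nvme_complete_async_event() re-queues this work for every AEN it
> processes, so a late AEN re-arms it even after nvme_stop_ctrl has
> flushed it once.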
>
> In nvme_tcp_submit_async_event we do not check that the controller
> or queue state is ready to accept a new submission, like we do in
> .queue_rq, so we blindly prepare the AER cmd, queue it, and schedule
> io_work, but at this point I don't see what guarantees that the
> queue (e.g. the socket) has not been released.
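>
> Compare with the readiness check that .queue_rq does perform
> (roughly, from nvme_tcp_queue_rq in v5.16):
> --
>         bool queue_ready = test_bit(NVME_TCP_Q_LIVE, &queue->flags);
>
>         if (!nvme_check_ready(&queue->ctrl->ctrl, rq, queue_ready))
>                 return nvme_fail_nonready_command(&queue->ctrl->ctrl, rq);
> --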
>
> Unless I'm missing something, this flow will trigger a use-after-free
> when io_work attempts to access the socket.
>
> I see we also don't flush the async_event_work in the error recovery
> flow, which we probably should do so we can avoid such a race.
>
> I think that the below patch should address the issue:
> --
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 96725c3f1e77..bf380ca0e0d1 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -2097,6 +2097,7 @@ static void nvme_tcp_error_recovery_work(struct work_struct *work)
>
>          nvme_auth_stop(ctrl);
>          nvme_stop_keep_alive(ctrl);
> +       flush_work(&ctrl->async_event_work);
>          nvme_tcp_teardown_io_queues(ctrl, false);
>          /* unquiesce to fail fast pending requests */
>          nvme_start_queues(ctrl);
> @@ -2212,6 +2213,10 @@ static void nvme_tcp_submit_async_event(struct nvme_ctrl *arg)
>          struct nvme_tcp_cmd_pdu *pdu = ctrl->async_req.pdu;
>          struct nvme_command *cmd = &pdu->cmd;
>          u8 hdgst = nvme_tcp_hdgst_len(queue);
> +       bool queue_ready = test_bit(NVME_TCP_Q_LIVE, &queue->flags);
> +
> +       if (ctrl->ctrl.state != NVME_CTRL_LIVE || !queue_ready)
> +               return;
>
>          memset(pdu, 0, sizeof(*pdu));
>          pdu->hdr.type = nvme_tcp_cmd;
> --
>
> Chris, can you take this for some testing?
>


