From: Anton Eidelman <anton.eidelman@gmail.com>
To: Sagi Grimberg <sagi@grimberg.me>,
	linux-nvme@lists.infradead.org, hch@lst.de, kbusch@kernel.org,
	axboe@fb.com
Subject: Re: [PATCH] nvme/mpath: fix hang when disk goes live over reconnect
Date: Tue, 19 Oct 2021 09:13:09 -0600
Message-ID: <20211019151309.GA51473@anton-latitude>
In-Reply-To: <368499da-c117-e8b7-7b1b-46894e1e0b48@grimberg.me>

Looking into making nvme_start_ctrl() capable of returning an error.
This is a bit more complicated IMHO, because in most transports
the ctrl is already in LIVE state when nvme_start_ctrl() is invoked,
so we need to bail out carefully.
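
To illustrate the concern, here is a minimal userspace sketch, not kernel
code. All toy_* names are hypothetical, the failure is injected by the
caller, and the choice of a RESETTING-like state on the error path is an
assumption made only for this example. It shows that once the ctrl is LIVE,
the error path has to leave LIVE before the queues are torn down:

```c
#include <stdbool.h>
#include <errno.h>

/* Toy model only: these names are hypothetical, not the kernel's. */
enum toy_state { TOY_CONNECTING, TOY_LIVE, TOY_RESETTING };

struct toy_ctrl {
	enum toy_state state;
	bool io_queues_up;
};

/* Stand-in for a start routine that can fail; 'err' is injected. */
static int toy_start_ctrl(struct toy_ctrl *ctrl, int err)
{
	(void)ctrl;
	return err;
}

static int toy_setup_ctrl(struct toy_ctrl *ctrl, int start_err)
{
	int ret;

	ctrl->io_queues_up = true;
	ctrl->state = TOY_LIVE;	/* ctrl goes LIVE before it is started */

	ret = toy_start_ctrl(ctrl, start_err);
	if (ret)
		goto undo_live;

	return 0;

undo_live:
	/* Leave LIVE first, so nothing sees a live ctrl without queues. */
	ctrl->state = TOY_RESETTING;
	ctrl->io_queues_up = false;
	return ret;
}
```

Calling toy_setup_ctrl() with a non-zero start_err returns that error with
the ctrl out of LIVE and its queues down; with start_err == 0 the ctrl
stays LIVE.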

On Tue, Oct 19, 2021 at 05:37:30PM +0300, Sagi Grimberg wrote:
> 
> 
> On 10/5/21 3:38 PM, Sagi Grimberg wrote:
> > 
> > > > How do we proceed with this fix?
> > > 
> > > Please resend with the suggested updates.
> > > 
> > > > I believe error propagation here is not wanted because:
> > > > 1) A failure to fetch or parse the ANA log should not be considered
> > > >     an error in ctrl initialization.
> > > 
> > > > We must handle errors that are due to a controller failing to initialize.
> > > 
> > > > 2) Such error will not cause problems in teardown.
> > > > 3) The same failure is possible in ANA work and we do not take
> > > >     any action in such case.
> > > 
> > > Failing in a workqueue is different from failing in the initialization
> > > path.
> > > 
> > > > 4) Adding support for failure to nvme_start_ctrl() adds complexity
> > > >     and does not look useful due to the above 1-3.
> > 
> > The feedback here is that this patch changes the functionality:
> > before, we failed initialization, and now we don't.
> > 
> > What is the issue with propagating the error and then modifying
> > the call-sites? Shouldn't it be simple enough to do?
> > -- 
> > diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
> > index aa14ad963d91..d86623b90eea 100644
> > --- a/drivers/nvme/host/fc.c
> > +++ b/drivers/nvme/host/fc.c
> > @@ -3158,8 +3158,11 @@ nvme_fc_create_association(struct nvme_fc_ctrl
> > *ctrl)
> > 
> >          ctrl->ctrl.nr_reconnects = 0;
> > 
> > -       if (changed)
> > -               nvme_start_ctrl(&ctrl->ctrl);
> > +       if (changed) {
> > +               ret = nvme_start_ctrl(&ctrl->ctrl);
> > +               if (ret)
> > +                       goto out_term_aen_ops;
> > +       }
> > 
> >          return 0;       /* Success */
> > 
> > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > index b82492cd7503..59bfdd72a51a 100644
> > --- a/drivers/nvme/host/pci.c
> > +++ b/drivers/nvme/host/pci.c
> > @@ -2827,7 +2827,10 @@ static void nvme_reset_work(struct work_struct
> > *work)
> >                          &nvme_pci_attr_group))
> >                  dev->attrs_added = true;
> > 
> > -       nvme_start_ctrl(&dev->ctrl);
> > +       ret = nvme_start_ctrl(&dev->ctrl);
> > +       if (ret)
> > +               goto out_unlock;
> > +
> >          return;
> > 
> >    out_unlock:
> > diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
> > index 042c594bc57e..9e9e34a012e7 100644
> > --- a/drivers/nvme/host/rdma.c
> > +++ b/drivers/nvme/host/rdma.c
> > @@ -1141,7 +1141,10 @@ static int nvme_rdma_setup_ctrl(struct
> > nvme_rdma_ctrl *ctrl, bool new)
> >                  goto destroy_io;
> >          }
> > 
> > -       nvme_start_ctrl(&ctrl->ctrl);
> > +       ret = nvme_start_ctrl(&ctrl->ctrl);
> > +       if (ret)
> > +               goto destroy_io;
> > +
> >          return 0;
> > 
> >   destroy_io:
> > diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> > index 10b164f02b5d..22199281ad17 100644
> > --- a/drivers/nvme/host/tcp.c
> > +++ b/drivers/nvme/host/tcp.c
> > @@ -923,6 +923,11 @@ static void nvme_tcp_fail_request(struct
> > nvme_tcp_request *req)
> >          nvme_tcp_end_request(blk_mq_rq_from_pdu(req),
> > NVME_SC_HOST_PATH_ERROR);
> >   }
> > @@ -2045,7 +2052,10 @@ static int nvme_tcp_setup_ctrl(struct nvme_ctrl
> > *ctrl, bool new)
> >                  goto destroy_io;
> >          }
> > 
> > -       nvme_start_ctrl(ctrl);
> > +       ret = nvme_start_ctrl(ctrl);
> > +       if (ret)
> > +               goto destroy_io;
> > +
> >          return 0;
> > 
> >   destroy_io:
> > diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
> > index 0285ccc7541f..3fa89650c0d1 100644
> > --- a/drivers/nvme/target/loop.c
> > +++ b/drivers/nvme/target/loop.c
> > @@ -490,7 +490,9 @@ static void nvme_loop_reset_ctrl_work(struct
> > work_struct *work)
> >          if (!nvme_change_ctrl_state(&ctrl->ctrl, NVME_CTRL_LIVE))
> >                  WARN_ON_ONCE(1);
> > 
> > -       nvme_start_ctrl(&ctrl->ctrl);
> > +       ret = nvme_start_ctrl(&ctrl->ctrl);
> > +       if (ret)
> > +               goto out_destroy_io;
> > 
> >          return;
> > -- 
> 
> So Anton, can this suggestion work? I think Hannes reported the same
> issue that this patch addresses.


