All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jason Wang <jasowang@redhat.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: vishal.l.verma@intel.com, "Jiang, Dave" <dave.jiang@intel.com>,
	ira.weiny@intel.com,  nvdimm@lists.linux.dev,
	mst <mst@redhat.com>
Subject: Re: [PATCH 2/2] virtio_pmem: set device ready in probe()
Date: Wed, 22 Jun 2022 11:34:20 +0800	[thread overview]
Message-ID: <CACGkMEuh+RPeAgE=uqe206iA9WbegO4J0ruid_i3j2Uuv5F19g@mail.gmail.com> (raw)
In-Reply-To: <62b2486b8fafe_8920729455@dwillia2-xfh.notmuch>

On Wed, Jun 22, 2022 at 6:38 AM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Jason Wang wrote:
> > The NVDIMM region could be available before the virtio_device_ready()
> > that is called by virtio_dev_probe(). This means the driver tries to
> > use device before DRIVER_OK which violates the spec, fixing this by
> > set device ready before the nvdimm_pmem_region_create().
>
> Can you clarify the failure path. What race is virtio_device_ready()
> losing?

So it's something like this:

1) virtio_device_ready() will set DRIVER_OK to the device.
2) virtio spec disallow device to process a virtqueue without DRIVER_OK to set

But the nd_region is available to user after nd_region_create(), and a
flush could be issued before virtio_device_ready().

This means the hypervisor gets a kick on the virtqueue before
DRIVER_OK. The hypervisor should choose not to respond to that kick
according to the spec. This will result in infinite wait in
virtio_pmem_flush().

Fortunately, qemu doesn't check DRIVER_OK and can process the
virtqueue without DRIVER_OK (which is kind of a spec violation), so we
survive for the past few years. But there's no guarantee it can work
for other hypervisor.

So we need to set DRIVER_OK before nd_region_create() to make sure flush works.

>
> >
> > Note that this means the virtio_pmem_host_ack() could be triggered
> > before the creation of the nd region, this is safe since the
> > virtio_pmem_host_ack() since pmem_lock has been initialized and we
> > check if we've added any buffer before trying to proceed.
>
> I got a little bit lost with the usage of "we" here. Can you clarify
> which function / context is making which guarantee?

By "we" I meant the callback for the req_vq that is virtio_pmem_host_ack().

If we do virtio_device_ready() before nd_region_create(). A buggy or
malicious hypervisor can raise the notification before
nd_region_create(). We need to make sure virtio_pmem_host_ack() can
survive from this. And since we've checked whether we've submitted any
request before, so in the case where guest memory is protected from
the host, we're safe here.

>
> >
> > Fixes 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver")
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  drivers/nvdimm/virtio_pmem.c | 12 ++++++++++++
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > index 48f8327d0431..173f2f5adaea 100644
> > --- a/drivers/nvdimm/virtio_pmem.c
> > +++ b/drivers/nvdimm/virtio_pmem.c
> > @@ -84,6 +84,17 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
> >       ndr_desc.provider_data = vdev;
> >       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> >       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > +     /*
> > +      * The NVDIMM region could be available before the
> > +      * virtio_device_ready() that is called by
> > +      * virtio_dev_probe(), so we set device ready here.
> > +      *
> > +      * The callback - virtio_pmem_host_ack() is safe to be called
> > +      * before the nvdimm_pmem_region_create() since the pmem_lock
> > +      * has been initialized and legality of a used buffer is
> > +      * validated before moving forward.
>
> This comment feels like changelog material.

I had this in the changelog:

> > Note that this means the virtio_pmem_host_ack() could be triggered
> > before the creation of the nd region, this is safe since the
> > virtio_pmem_host_ack() since pmem_lock has been initialized and we
> > check if we've added any buffer before trying to proceed.

> Just document why
> virtio_device_ready() must be called before device_add() of the
> nd_region.

This comment wants to explain the side effect of having
virtio_device_ready() before nvdimm_pmem_region_create() and why we
can survive from that.

Thanks

>
> > +      */
> > +     virtio_device_ready(vdev);
> >       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> >       if (!nd_region) {
> >               dev_err(&vdev->dev, "failed to create nvdimm region\n");
> > @@ -92,6 +103,7 @@ static int virtio_pmem_probe(struct virtio_device *vdev)
> >       }
> >       return 0;
> >  out_nd:
> > +     virtio_reset_device(vdev);
> >       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> >  out_vq:
> >       vdev->config->del_vqs(vdev);
> > --
> > 2.25.1
> >
>


  reply	other threads:[~2022-06-22  3:34 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-20  8:15 [PATCH 1/2] virtio_pmem: initialize provider_data through nd_region_desc Jason Wang
2022-06-20  8:15 ` [PATCH 2/2] virtio_pmem: set device ready in probe() Jason Wang
2022-06-20  8:32   ` Michael S. Tsirkin
2022-06-20  8:39     ` Jason Wang
2022-06-20  8:53       ` Michael S. Tsirkin
2022-06-21 12:34   ` Pankaj Gupta
2022-06-21 22:38   ` Dan Williams
2022-06-22  3:34     ` Jason Wang [this message]
2022-06-22  6:29     ` Michael S. Tsirkin
2022-06-22  7:24       ` Jason Wang
2022-06-22 12:31         ` Michael S. Tsirkin
2022-06-23  1:29           ` Jason Wang
2022-06-23  3:57             ` Jason Wang
2022-06-24  6:44               ` Michael S. Tsirkin
2022-06-20  8:36 ` [PATCH 1/2] virtio_pmem: initialize provider_data through nd_region_desc Michael S. Tsirkin
2022-06-20  8:36 ` Jason Wang
2022-06-21 12:44 ` Pankaj Gupta
2022-06-22  3:35   ` Jason Wang
2022-06-21 22:34 ` Dan Williams
2022-06-22  3:22   ` Jason Wang
2022-06-24  6:46     ` Michael S. Tsirkin
2022-06-27  2:31       ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACGkMEuh+RPeAgE=uqe206iA9WbegO4J0ruid_i3j2Uuv5F19g@mail.gmail.com' \
    --to=jasowang@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=mst@redhat.com \
    --cc=nvdimm@lists.linux.dev \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.