qemu-devel.nongnu.org archive mirror
From: Lukasz Maniak <lukasz.maniak@linux.intel.com>
To: Klaus Jensen <its@irrelevant.dk>
Cc: qemu-block@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>,
	"Łukasz Gieryk" <lukasz.gieryk@linux.intel.com>,
	qemu-devel@nongnu.org, "Keith Busch" <kbusch@kernel.org>
Subject: Re: [PATCH 05/15] hw/nvme: Add support for SR-IOV
Date: Thu, 21 Oct 2021 16:33:13 +0200	[thread overview]
Message-ID: <20211021143313.GA3331@lmaniak-dev.igk.intel.com> (raw)
In-Reply-To: <YXBpA7ydMl9//wZ1@apples.localdomain>

On Wed, Oct 20, 2021 at 09:07:47PM +0200, Klaus Jensen wrote:
> On Oct  7 18:23, Lukasz Maniak wrote:
> > This patch implements initial support for Single Root I/O Virtualization
> > on an NVMe device.
> > 
> > Essentially, it allows defining the maximum number of virtual functions
> > supported by the NVMe controller via the sriov_max_vfs parameter.
> > 
> > Passing a non-zero value to sriov_max_vfs triggers reporting of SR-IOV
> > capability by a physical controller and ARI capability by both the
> > physical and virtual function devices.
> > 
> > NVMe controllers created via virtual functions currently mirror the
> > physical controller in full, which may not be the desired behavior;
> > some consideration is needed on how to limit the capabilities of
> > the VF.
> > 
> > An NVMe subsystem is required for the use of SR-IOV.
> > 
> > Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com>
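For reference, the new parameter can be exercised with an invocation
along these lines (device IDs, serial, and VF count below are
illustrative, not taken from the patch):

```shell
# Illustrative example -- IDs and values are made up.
# An NVMe subsystem is required; the VFs become visible to the guest
# once its PF driver enables them through the SR-IOV capability.
qemu-system-x86_64 \
    -device nvme-subsys,id=nvme-subsys-0 \
    -device nvme,serial=deadbeef,subsys=nvme-subsys-0,sriov_max_vfs=4
```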
> > ---
> >  hw/nvme/ctrl.c           | 74 ++++++++++++++++++++++++++++++++++++++--
> >  hw/nvme/nvme.h           |  1 +
> >  include/hw/pci/pci_ids.h |  1 +
> >  3 files changed, 73 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
> > index 6a571d18cf..ad79ff0c00 100644
> > --- a/hw/nvme/ctrl.c
> > +++ b/hw/nvme/ctrl.c
> > @@ -35,6 +35,7 @@
> >   *              mdts=<N[optional]>,vsl=<N[optional]>, \
> >   *              zoned.zasl=<N[optional]>, \
> >   *              zoned.auto_transition=<on|off[optional]>, \
> > + *              sriov_max_vfs=<N[optional]> \
> >   *              subsys=<subsys_id>
> >   *      -device nvme-ns,drive=<drive_id>,bus=<bus_name>,nsid=<nsid>,\
> >   *              zoned=<true|false[optional]>, \
> > @@ -106,6 +107,12 @@
> >   *   transitioned to zone state closed for resource management purposes.
> >   *   Defaults to 'on'.
> >   *
> > + * - `sriov_max_vfs`
> > + *   Indicates the maximum number of PCIe virtual functions supported
> > + *   by the controller. The default value is 0. Specifying a non-zero value
> > + *   enables reporting of both SR-IOV and ARI capabilities by the NVMe device.
> > + *   Virtual function controllers will not report SR-IOV capability.
> > + *
> >   * nvme namespace device parameters
> >   * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >   * - `shared`
> > @@ -160,6 +167,7 @@
> >  #include "sysemu/block-backend.h"
> >  #include "sysemu/hostmem.h"
> >  #include "hw/pci/msix.h"
> > +#include "hw/pci/pcie_sriov.h"
> >  #include "migration/vmstate.h"
> >  
> >  #include "nvme.h"
> > @@ -175,6 +183,9 @@
> >  #define NVME_TEMPERATURE_CRITICAL 0x175
> >  #define NVME_NUM_FW_SLOTS 1
> >  #define NVME_DEFAULT_MAX_ZA_SIZE (128 * KiB)
> > +#define NVME_MAX_VFS 127
> > +#define NVME_VF_OFFSET 0x1
> > +#define NVME_VF_STRIDE 1
> >  
> >  #define NVME_GUEST_ERR(trace, fmt, ...) \
> >      do { \
> > @@ -5583,6 +5594,10 @@ static void nvme_ctrl_reset(NvmeCtrl *n)
> >          g_free(event);
> >      }
> >  
> > +    if (!pci_is_vf(&n->parent_obj) && n->params.sriov_max_vfs) {
> > +        pcie_sriov_pf_disable_vfs(&n->parent_obj);
> > +    }
> > +
> >      n->aer_queued = 0;
> >      n->outstanding_aers = 0;
> >      n->qs_created = false;
> > @@ -6264,6 +6279,19 @@ static void nvme_check_constraints(NvmeCtrl *n, Error **errp)
> >          error_setg(errp, "vsl must be non-zero");
> >          return;
> >      }
> > +
> > +    if (params->sriov_max_vfs) {
> > +        if (!n->subsys) {
> > +            error_setg(errp, "subsystem is required for the use of SR-IOV");
> > +            return;
> > +        }
> > +
> > +        if (params->sriov_max_vfs > NVME_MAX_VFS) {
> > +            error_setg(errp, "sriov_max_vfs must be between 0 and %d",
> > +                       NVME_MAX_VFS);
> > +            return;
> > +        }
> > +    }
> >  }
> >  
> >  static void nvme_init_state(NvmeCtrl *n)
> > @@ -6321,6 +6349,20 @@ static void nvme_init_pmr(NvmeCtrl *n, PCIDevice *pci_dev)
> >      memory_region_set_enabled(&n->pmr.dev->mr, false);
> >  }
> >  
> > +static void nvme_init_sriov(NvmeCtrl *n, PCIDevice *pci_dev, uint16_t offset,
> > +                            uint64_t bar_size)
> > +{
> > +    uint16_t vf_dev_id = n->params.use_intel_id ?
> > +                         PCI_DEVICE_ID_INTEL_NVME : PCI_DEVICE_ID_REDHAT_NVME;
> > +
> > +    pcie_sriov_pf_init(pci_dev, offset, "nvme", vf_dev_id,
> > +                       n->params.sriov_max_vfs, n->params.sriov_max_vfs,
> > +                       NVME_VF_OFFSET, NVME_VF_STRIDE, NULL);
> 
> Did you consider adding a new device for the virtual function device,
> "nvmevf"?
> 
> Down the road it might help with the variations in capabilities that you
> describe.

Hi Klaus,

A separate nvmevf device was actually the first approach I tried.
In practice, it amounted to copying the nvme device functions just to
make a few changes that can instead be covered with conditionals.

As for limiting VF capabilities, the problem comes down to cleanly
restricting the command set supported by the VF controller. Using a
separate nvmevf device for that purpose sounds like overkill.

Concerning the command-set restriction itself, an actual real device
would reduce the VF's ability to use namespace attachment, namespace
management, virtualization enhancements, and the corresponding identify
commands. However, since implementing secure virtualization in QEMU
would be complex and is not required, it can be skipped for now.
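To illustrate the kind of restriction meant here, a rough standalone
sketch (the FakeCtrl type, the helper name, and the gating policy are
hypothetical and not from this series; the opcode values match the NVMe
spec):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch, not code from this patch: models gating the
 * admin command set on a VF controller, as a real device would. */

#define NVME_ADM_CMD_IDENTIFY   0x06
#define NVME_ADM_CMD_NS_MGMT    0x0d
#define NVME_ADM_CMD_NS_ATTACH  0x15
#define NVME_ADM_CMD_VIRT_MGMT  0x1c

typedef struct {
    bool is_vf; /* stand-in for pci_is_vf(&n->parent_obj) */
} FakeCtrl;

/* Return true if this controller may execute the given admin opcode.
 * Privileged, subsystem-wide commands stay PF-only. */
static bool nvme_admin_cmd_permitted(const FakeCtrl *n, uint8_t opcode)
{
    if (!n->is_vf) {
        return true;
    }
    switch (opcode) {
    case NVME_ADM_CMD_NS_MGMT:
    case NVME_ADM_CMD_NS_ATTACH:
    case NVME_ADM_CMD_VIRT_MGMT:
        return false;
    default:
        return true;
    }
}
```

The admin dispatch path could then fail such commands with an Invalid
Command Opcode status when issued on a VF.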

Kind regards,
Lukasz

