All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] vfio/pci: Improve extended capability comments, skip masked caps
@ 2017-02-21 21:43 Alex Williamson
  2017-02-22  3:08 ` Peter Xu
  0 siblings, 1 reply; 5+ messages in thread
From: Alex Williamson @ 2017-02-21 21:43 UTC (permalink / raw)
  To: qemu-devel; +Cc: alex.williamson, Peter Xu, Michael S. Tsirkin

Since commit 4bb571d857d9 ("pci/pcie: don't assume cap id 0 is
reserved") removes the internal use of extended capability ID 0, the
comment here becomes invalid.  However, peeling back the onion, the
code is still correct and we still can't seed the capability chain
with ID 0, unless we want to muck with using the version number to
force the header to be non-zero, which is much uglier to deal with.
The comment also now covers some of the subtleties of using cap ID 0,
such as transparently indicating absence of capabilities if none are
added.  This doesn't detract from the correctness of the referenced
commit as vfio in the kernel also uses capability ID zero to mask
capabilties.  In fact, we should skip zero capabilities precisely
because the kernel might also expose such a capability at the head
position and re-introduce the problem.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
---
 hw/vfio/pci.c |   31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index f2ba9b6cfafc..03a3d0154976 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1880,16 +1880,26 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
     /*
      * Extended capabilities are chained with each pointing to the next, so we
      * can drop anything other than the head of the chain simply by modifying
-     * the previous next pointer.  For the head of the chain, we can modify the
-     * capability ID to something that cannot match a valid capability.  ID
-     * 0 is reserved for this since absence of capabilities is indicated by
-     * 0 for the ID, version, AND next pointer.  However, pcie_add_capability()
-     * uses ID 0 as reserved for list management and will incorrectly match and
-     * assert if we attempt to pre-load the head of the chain with this ID.
-     * Use ID 0xFFFF temporarily since it is also seems to be reserved in
-     * part for identifying absence of capabilities in a root complex register
-     * block.  If the ID still exists after adding capabilities, switch back to
-     * zero.  We'll mark this entire first dword as emulated for this purpose.
+     * the previous next pointer.  Seed the head of the chain here such that
+     * we can simply skip any capabilities we want to drop below, regardless
+     * of their position in the chain.  If this stub capability still exists
+     * after we add the capabilities we want to expose, update the capability
+     * ID to zero.  Note that we cannot seed with the capability header being
+     * zero as this conflicts with definition of an absent capability chain
+     * and prevents capabilities beyond the head of the list from being added.
+     * By replacing the dummy capability ID with zero after walking the device
+     * chain, we also transparently mark extended capabilities as absent if
+     * no capabilities were added.  Note that the PCIe spec defines an absence
+     * of extended capabilities to be determined by a value of zero for the
+     * capability ID, version, AND next pointer.  A non-zero next pointer
+     * should be sufficient to indicate additional capabilities are present,
+     * which will occur if we call pcie_add_capability() below.  The entire
+     * first dword is emulated to support this.
+     *
+     * NB. The kernel side does similar masking, so be prepared that our
+     * view of the device may also contain a capability ID zero in the head
+     * of the chain.  Skip it for the same reason that we cannot seed the
+     * chain with a zero capability.
      */
     pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
                  PCI_EXT_CAP(0xFFFF, 0, 0));
@@ -1915,6 +1925,7 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
                                    PCI_EXT_CAP_NEXT_MASK);
 
         switch (cap_id) {
+        case 0: /* kernel masked capability */
         case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
         case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
             trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [PATCH] vfio/pci: Improve extended capability comments, skip masked caps
  2017-02-21 21:43 [Qemu-devel] [PATCH] vfio/pci: Improve extended capability comments, skip masked caps Alex Williamson
@ 2017-02-22  3:08 ` Peter Xu
  2017-02-22  3:54   ` Alex Williamson
  2017-02-22 11:48   ` Jintack Lim
  0 siblings, 2 replies; 5+ messages in thread
From: Peter Xu @ 2017-02-22  3:08 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, Michael S. Tsirkin, Jintack Lim

[cc Jintack]

On Tue, Feb 21, 2017 at 02:43:03PM -0700, Alex Williamson wrote:
> Since commit 4bb571d857d9 ("pci/pcie: don't assume cap id 0 is
> reserved") removes the internal use of extended capability ID 0, the
> comment here becomes invalid.  However, peeling back the onion, the
> code is still correct and we still can't seed the capability chain
> with ID 0, unless we want to muck with using the version number to
> force the header to be non-zero, which is much uglier to deal with.
> The comment also now covers some of the subtleties of using cap ID 0,
> such as transparently indicating absence of capabilities if none are
> added.  This doesn't detract from the correctness of the referenced
> commit as vfio in the kernel also uses capability ID zero to mask
> capabilties.  In fact, we should skip zero capabilities precisely
> because the kernel might also expose such a capability at the head
> position and re-introduce the problem.
> 
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> ---
>  hw/vfio/pci.c |   31 +++++++++++++++++++++----------
>  1 file changed, 21 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index f2ba9b6cfafc..03a3d0154976 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1880,16 +1880,26 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>      /*
>       * Extended capabilities are chained with each pointing to the next, so we
>       * can drop anything other than the head of the chain simply by modifying
> -     * the previous next pointer.  For the head of the chain, we can modify the
> -     * capability ID to something that cannot match a valid capability.  ID
> -     * 0 is reserved for this since absence of capabilities is indicated by
> -     * 0 for the ID, version, AND next pointer.  However, pcie_add_capability()
> -     * uses ID 0 as reserved for list management and will incorrectly match and
> -     * assert if we attempt to pre-load the head of the chain with this ID.
> -     * Use ID 0xFFFF temporarily since it is also seems to be reserved in
> -     * part for identifying absence of capabilities in a root complex register
> -     * block.  If the ID still exists after adding capabilities, switch back to
> -     * zero.  We'll mark this entire first dword as emulated for this purpose.
> +     * the previous next pointer.  Seed the head of the chain here such that
> +     * we can simply skip any capabilities we want to drop below, regardless
> +     * of their position in the chain.  If this stub capability still exists
> +     * after we add the capabilities we want to expose, update the capability
> +     * ID to zero.  Note that we cannot seed with the capability header being
> +     * zero as this conflicts with definition of an absent capability chain
> +     * and prevents capabilities beyond the head of the list from being added.
> +     * By replacing the dummy capability ID with zero after walking the device
> +     * chain, we also transparently mark extended capabilities as absent if
> +     * no capabilities were added.  Note that the PCIe spec defines an absence
> +     * of extended capabilities to be determined by a value of zero for the
> +     * capability ID, version, AND next pointer.  A non-zero next pointer
> +     * should be sufficient to indicate additional capabilities are present,
> +     * which will occur if we call pcie_add_capability() below.  The entire
> +     * first dword is emulated to support this.
> +     *
> +     * NB. The kernel side does similar masking, so be prepared that our
> +     * view of the device may also contain a capability ID zero in the head
> +     * of the chain.  Skip it for the same reason that we cannot seed the
> +     * chain with a zero capability.
>       */
>      pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
>                   PCI_EXT_CAP(0xFFFF, 0, 0));
> @@ -1915,6 +1925,7 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>                                     PCI_EXT_CAP_NEXT_MASK);
>  
>          switch (cap_id) {
> +        case 0: /* kernel masked capability */
>          case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
>          case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
>              trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);
> 

Reviewed-by: Peter Xu <peterx@redhat.com>

Since this bug is originally reported by Jintack, maybe we can also
add:

Reported-by: Jintack Lim <jintack@cs.columbia.edu>

Jintack, if you want to test it and provide your tested-by, it would
be nice as well. ;)

Actually I just found that the bug still exist after Michael's fix (I
thought it was fixed). So we definitely need this patch or equivalent.
However, I would still slightly prefer removing the wrapping hack
since after all we need to touch it (and I do feel like that's error
prone...). So, Alex, do you like below one instead?

--------8<----------
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 332f41d..6942c1d 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1877,25 +1877,6 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
      */
     config = g_memdup(pdev->config, vdev->config_size);

-    /*
-     * Extended capabilities are chained with each pointing to the next, so we
-     * can drop anything other than the head of the chain simply by modifying
-     * the previous next pointer.  For the head of the chain, we can modify the
-     * capability ID to something that cannot match a valid capability.  ID
-     * 0 is reserved for this since absence of capabilities is indicated by
-     * 0 for the ID, version, AND next pointer.  However, pcie_add_capability()
-     * uses ID 0 as reserved for list management and will incorrectly match and
-     * assert if we attempt to pre-load the head of the chain with this ID.
-     * Use ID 0xFFFF temporarily since it is also seems to be reserved in
-     * part for identifying absence of capabilities in a root complex register
-     * block.  If the ID still exists after adding capabilities, switch back to
-     * zero.  We'll mark this entire first dword as emulated for this purpose.
-     */
-    pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
-                 PCI_EXT_CAP(0xFFFF, 0, 0));
-    pci_set_long(pdev->wmask + PCI_CONFIG_SPACE_SIZE, 0);
-    pci_set_long(vdev->emulated_config_bits + PCI_CONFIG_SPACE_SIZE, ~0);
-
     for (next = PCI_CONFIG_SPACE_SIZE; next;
          next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) {
         header = pci_get_long(config + next);
@@ -1910,24 +1891,33 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
          */
         size = vfio_ext_cap_max_size(config, next);

-        /* Use emulated next pointer to allow dropping extended caps */
-        pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
-                                   PCI_EXT_CAP_NEXT_MASK);
+        /* Use emulated header to allow dropping extended caps */
+        pci_set_long(vdev->emulated_config_bits + next, 0xffffffffULL);

         switch (cap_id) {
         case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
         case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
+        case PCI_EXT_CAP_ID_VC:
+            /*
+             * For dropped capabilities, we keep their slot but
+             * replace them with a header containing cap_id=0 &&
+             * cap_ver=1. We do this reservation mostly to make sure
+             * the head ecap (at offset 0x100) will always be there.
+             * Anyway it won't hurt if we keep the rest of the dropped
+             * ones as well.
+             *
+             * Here we use non-zero cap_ver because we want to "mark"
+             * this ecap as "available" - from PCIe spec (chap 7.9.1),
+             * it marked out that cap_id=cap_ver=next=0 means empty
+             * ecap, and we really don't want it to be an "empty" slot
+             * especially for the head ecap (we need a head, always!).
+             */
+            pcie_add_capability(pdev, 0, 1, next, size);
             trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);
             break;
         default:
             pcie_add_capability(pdev, cap_id, cap_ver, next, size);
         }
-
-    }
-
-    /* Cleanup chain head ID if necessary */
-    if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) {
-        pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
     }

     g_free(config);
------->8--------

The new patch will keep all the dropped ecaps (so we may see more than
one cap_id=0x0000 field), which I don't know whether would be a
drawback. OTOH, it provides benefits like:

- we can remove the wrapping hack, so the code is much readable and
  less error prone imho

- we can avoid using the assumption that 0xffff cap_id is reserved

I can live with both patches though. :-)

Thanks!

-- peterx

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [PATCH] vfio/pci: Improve extended capability comments, skip masked caps
  2017-02-22  3:08 ` Peter Xu
@ 2017-02-22  3:54   ` Alex Williamson
  2017-02-22  4:10     ` Peter Xu
  2017-02-22 11:48   ` Jintack Lim
  1 sibling, 1 reply; 5+ messages in thread
From: Alex Williamson @ 2017-02-22  3:54 UTC (permalink / raw)
  To: Peter Xu; +Cc: qemu-devel, Michael S. Tsirkin, Jintack Lim

On Wed, 22 Feb 2017 11:08:51 +0800
Peter Xu <peterx@redhat.com> wrote:

> [cc Jintack]
> 
> On Tue, Feb 21, 2017 at 02:43:03PM -0700, Alex Williamson wrote:
> > Since commit 4bb571d857d9 ("pci/pcie: don't assume cap id 0 is
> > reserved") removes the internal use of extended capability ID 0, the
> > comment here becomes invalid.  However, peeling back the onion, the
> > code is still correct and we still can't seed the capability chain
> > with ID 0, unless we want to muck with using the version number to
> > force the header to be non-zero, which is much uglier to deal with.
> > The comment also now covers some of the subtleties of using cap ID 0,
> > such as transparently indicating absence of capabilities if none are
> > added.  This doesn't detract from the correctness of the referenced
> > commit as vfio in the kernel also uses capability ID zero to mask
> > capabilties.  In fact, we should skip zero capabilities precisely
> > because the kernel might also expose such a capability at the head
> > position and re-introduce the problem.
> > 
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  hw/vfio/pci.c |   31 +++++++++++++++++++++----------
> >  1 file changed, 21 insertions(+), 10 deletions(-)
> > 
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index f2ba9b6cfafc..03a3d0154976 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -1880,16 +1880,26 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> >      /*
> >       * Extended capabilities are chained with each pointing to the next, so we
> >       * can drop anything other than the head of the chain simply by modifying
> > -     * the previous next pointer.  For the head of the chain, we can modify the
> > -     * capability ID to something that cannot match a valid capability.  ID
> > -     * 0 is reserved for this since absence of capabilities is indicated by
> > -     * 0 for the ID, version, AND next pointer.  However, pcie_add_capability()
> > -     * uses ID 0 as reserved for list management and will incorrectly match and
> > -     * assert if we attempt to pre-load the head of the chain with this ID.
> > -     * Use ID 0xFFFF temporarily since it is also seems to be reserved in
> > -     * part for identifying absence of capabilities in a root complex register
> > -     * block.  If the ID still exists after adding capabilities, switch back to
> > -     * zero.  We'll mark this entire first dword as emulated for this purpose.
> > +     * the previous next pointer.  Seed the head of the chain here such that
> > +     * we can simply skip any capabilities we want to drop below, regardless
> > +     * of their position in the chain.  If this stub capability still exists
> > +     * after we add the capabilities we want to expose, update the capability
> > +     * ID to zero.  Note that we cannot seed with the capability header being
> > +     * zero as this conflicts with definition of an absent capability chain
> > +     * and prevents capabilities beyond the head of the list from being added.
> > +     * By replacing the dummy capability ID with zero after walking the device
> > +     * chain, we also transparently mark extended capabilities as absent if
> > +     * no capabilities were added.  Note that the PCIe spec defines an absence
> > +     * of extended capabilities to be determined by a value of zero for the
> > +     * capability ID, version, AND next pointer.  A non-zero next pointer
> > +     * should be sufficient to indicate additional capabilities are present,
> > +     * which will occur if we call pcie_add_capability() below.  The entire
> > +     * first dword is emulated to support this.
> > +     *
> > +     * NB. The kernel side does similar masking, so be prepared that our
> > +     * view of the device may also contain a capability ID zero in the head
> > +     * of the chain.  Skip it for the same reason that we cannot seed the
> > +     * chain with a zero capability.
> >       */
> >      pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
> >                   PCI_EXT_CAP(0xFFFF, 0, 0));
> > @@ -1915,6 +1925,7 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> >                                     PCI_EXT_CAP_NEXT_MASK);
> >  
> >          switch (cap_id) {
> > +        case 0: /* kernel masked capability */
> >          case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
> >          case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
> >              trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);
> >   
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>
> 
> Since this bug is originally reported by Jintack, maybe we can also
> add:
> 
> Reported-by: Jintack Lim <jintack@cs.columbia.edu>
> 
> Jintack, if you want to test it and provide your tested-by, it would
> be nice as well. ;)
> 
> Actually I just found that the bug still exist after Michael's fix (I
> thought it was fixed). So we definitely need this patch or equivalent.
> However, I would still slightly prefer removing the wrapping hack
> since after all we need to touch it (and I do feel like that's error
> prone...). So, Alex, do you like below one instead?
> 
> --------8<----------
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 332f41d..6942c1d 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1877,25 +1877,6 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>       */
>      config = g_memdup(pdev->config, vdev->config_size);
> 
> -    /*
> -     * Extended capabilities are chained with each pointing to the next, so we
> -     * can drop anything other than the head of the chain simply by modifying
> -     * the previous next pointer.  For the head of the chain, we can modify the
> -     * capability ID to something that cannot match a valid capability.  ID
> -     * 0 is reserved for this since absence of capabilities is indicated by
> -     * 0 for the ID, version, AND next pointer.  However, pcie_add_capability()
> -     * uses ID 0 as reserved for list management and will incorrectly match and
> -     * assert if we attempt to pre-load the head of the chain with this ID.
> -     * Use ID 0xFFFF temporarily since it is also seems to be reserved in
> -     * part for identifying absence of capabilities in a root complex register
> -     * block.  If the ID still exists after adding capabilities, switch back to
> -     * zero.  We'll mark this entire first dword as emulated for this purpose.
> -     */
> -    pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
> -                 PCI_EXT_CAP(0xFFFF, 0, 0));
> -    pci_set_long(pdev->wmask + PCI_CONFIG_SPACE_SIZE, 0);
> -    pci_set_long(vdev->emulated_config_bits + PCI_CONFIG_SPACE_SIZE, ~0);
> -
>      for (next = PCI_CONFIG_SPACE_SIZE; next;
>           next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) {
>          header = pci_get_long(config + next);
> @@ -1910,24 +1891,33 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>           */
>          size = vfio_ext_cap_max_size(config, next);
> 
> -        /* Use emulated next pointer to allow dropping extended caps */
> -        pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
> -                                   PCI_EXT_CAP_NEXT_MASK);
> +        /* Use emulated header to allow dropping extended caps */
> +        pci_set_long(vdev->emulated_config_bits + next, 0xffffffffULL);
> 
>          switch (cap_id) {
>          case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
>          case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function virtualization */
> +        case PCI_EXT_CAP_ID_VC:
> +            /*
> +             * For dropped capabilities, we keep their slot but
> +             * replace them with a header containing cap_id=0 &&
> +             * cap_ver=1. We do this reservation mostly to make sure
> +             * the head ecap (at offset 0x100) will always be there.
> +             * Anyway it won't hurt if we keep the rest of the dropped
> +             * ones as well.
> +             *
> +             * Here we use non-zero cap_ver because we want to "mark"
> +             * this ecap as "available" - from PCIe spec (chap 7.9.1),
> +             * it marked out that cap_id=cap_ver=next=0 means empty
> +             * ecap, and we really don't want it to be an "empty" slot
> +             * especially for the head ecap (we need a head, always!).
> +             */
> +            pcie_add_capability(pdev, 0, 1, next, size);
>              trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id, next);
>              break;
>          default:
>              pcie_add_capability(pdev, cap_id, cap_ver, next, size);
>          }
> -
> -    }
> -
> -    /* Cleanup chain head ID if necessary */
> -    if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) {
> -        pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
>      }
> 
>      g_free(config);
> ------->8--------  
> 
> The new patch will keep all the dropped ecaps (so we may see more than
> one cap_id=0x0000 field), which I don't know whether would be a
> drawback. OTOH, it provides benefits like:
> 
> - we can remove the wrapping hack, so the code is much readable and
>   less error prone imho
> 
> - we can avoid using the assumption that 0xffff cap_id is reserved
> 
> I can live with both patches though. :-)

I prefer the existing code.  I don't really see why you consider it a
hack.  I think it's pretty elegant that we can ignore the header
through the course of iterating through the capabilities, that we drop
other masked capabilities out of the chain, and that we can so easily
and transparently insert a zero ID at the end to serve the dual purpose
of replacing the temporary ID and nullifying the list if nothing was
added.  The 0xffff capability ID is a perfectly safe assumption, not
only are we ridiculously far from allocating that ID, but it's arguably
a reserved value due to its use in the root complex register block.  I
also don't see any evidence that it's error prone, the entire point is
that we can arbitrarily skip capability IDs in the body of the loop and
the result is a correct, minimal capability chain.  OTOH, leaving
masked capabilities in the chain with an arbitrary version number seems
messy.

The real question is why are you sneaking the virtual channel
capability into the list of masked capabilities?  Thanks,

Alex

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [PATCH] vfio/pci: Improve extended capability comments, skip masked caps
  2017-02-22  3:54   ` Alex Williamson
@ 2017-02-22  4:10     ` Peter Xu
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Xu @ 2017-02-22  4:10 UTC (permalink / raw)
  To: Alex Williamson; +Cc: qemu-devel, Michael S. Tsirkin, Jintack Lim

On Tue, Feb 21, 2017 at 08:54:31PM -0700, Alex Williamson wrote:

[...]

> I prefer the existing code.  I don't really see why you consider it a
> hack.  I think it's pretty elegant that we can ignore the header
> through the course of iterating through the capabilities, that we drop
> other masked capabilities out of the chain, and that we can so easily
> and transparently insert a zero ID at the end to serve the dual purpose
> of replacing the temporary ID and nullifying the list if nothing was
> added.  The 0xffff capability ID is a perfectly safe assumption, not
> only are we ridiculously far from allocating that ID, but it's arguably
> a reserved value due to its use in the root complex register block.  I
> also don't see any evidence that it's error prone, the entire point is
> that we can arbitrarily skip capability IDs in the body of the loop and
> the result is a correct, minimal capability chain.  OTOH, leaving
> masked capabilities in the chain with an arbitrary version number seems
> messy.

I see. Then please also pick this one:

Tested-by: Peter Xu <peterx@redhat.com>

> 
> The real question is why are you sneaking the virtual channel
> capability into the list of masked capabilities?  Thanks,

Oooops. I should remove that line. It's for my testing purpose (I need
to "fake" a device that with 0x100 masked to test my patch, while my
SD card reader did has this VC cap at 0x100 :-). Since we now have a
choice already, please just ignore that line along with the whole
patch. ;)

Thanks,

-- peterx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] [PATCH] vfio/pci: Improve extended capability comments, skip masked caps
  2017-02-22  3:08 ` Peter Xu
  2017-02-22  3:54   ` Alex Williamson
@ 2017-02-22 11:48   ` Jintack Lim
  1 sibling, 0 replies; 5+ messages in thread
From: Jintack Lim @ 2017-02-22 11:48 UTC (permalink / raw)
  To: Peter Xu; +Cc: Alex Williamson, QEMU Devel Mailing List, Michael S. Tsirkin

On Tue, Feb 21, 2017 at 10:08 PM, Peter Xu <peterx@redhat.com> wrote:

> [cc Jintack]
>
> On Tue, Feb 21, 2017 at 02:43:03PM -0700, Alex Williamson wrote:
> > Since commit 4bb571d857d9 ("pci/pcie: don't assume cap id 0 is
> > reserved") removes the internal use of extended capability ID 0, the
> > comment here becomes invalid.  However, peeling back the onion, the
> > code is still correct and we still can't seed the capability chain
> > with ID 0, unless we want to muck with using the version number to
> > force the header to be non-zero, which is much uglier to deal with.
> > The comment also now covers some of the subtleties of using cap ID 0,
> > such as transparently indicating absence of capabilities if none are
> > added.  This doesn't detract from the correctness of the referenced
> > commit as vfio in the kernel also uses capability ID zero to mask
> > capabilties.  In fact, we should skip zero capabilities precisely
> > because the kernel might also expose such a capability at the head
> > position and re-introduce the problem.
> >
> > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Peter Xu <peterx@redhat.com>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  hw/vfio/pci.c |   31 +++++++++++++++++++++----------
> >  1 file changed, 21 insertions(+), 10 deletions(-)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index f2ba9b6cfafc..03a3d0154976 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -1880,16 +1880,26 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> >      /*
> >       * Extended capabilities are chained with each pointing to the
> next, so we
> >       * can drop anything other than the head of the chain simply by
> modifying
> > -     * the previous next pointer.  For the head of the chain, we can
> modify the
> > -     * capability ID to something that cannot match a valid
> capability.  ID
> > -     * 0 is reserved for this since absence of capabilities is
> indicated by
> > -     * 0 for the ID, version, AND next pointer.  However,
> pcie_add_capability()
> > -     * uses ID 0 as reserved for list management and will incorrectly
> match and
> > -     * assert if we attempt to pre-load the head of the chain with this
> ID.
> > -     * Use ID 0xFFFF temporarily since it is also seems to be reserved
> in
> > -     * part for identifying absence of capabilities in a root complex
> register
> > -     * block.  If the ID still exists after adding capabilities, switch
> back to
> > -     * zero.  We'll mark this entire first dword as emulated for this
> purpose.
> > +     * the previous next pointer.  Seed the head of the chain here such
> that
> > +     * we can simply skip any capabilities we want to drop below,
> regardless
> > +     * of their position in the chain.  If this stub capability still
> exists
> > +     * after we add the capabilities we want to expose, update the
> capability
> > +     * ID to zero.  Note that we cannot seed with the capability header
> being
> > +     * zero as this conflicts with definition of an absent capability
> chain
> > +     * and prevents capabilities beyond the head of the list from being
> added.
> > +     * By replacing the dummy capability ID with zero after walking the
> device
> > +     * chain, we also transparently mark extended capabilities as
> absent if
> > +     * no capabilities were added.  Note that the PCIe spec defines an
> absence
> > +     * of extended capabilities to be determined by a value of zero for
> the
> > +     * capability ID, version, AND next pointer.  A non-zero next
> pointer
> > +     * should be sufficient to indicate additional capabilities are
> present,
> > +     * which will occur if we call pcie_add_capability() below.  The
> entire
> > +     * first dword is emulated to support this.
> > +     *
> > +     * NB. The kernel side does similar masking, so be prepared that our
> > +     * view of the device may also contain a capability ID zero in the
> head
> > +     * of the chain.  Skip it for the same reason that we cannot seed
> the
> > +     * chain with a zero capability.
> >       */
> >      pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
> >                   PCI_EXT_CAP(0xFFFF, 0, 0));
> > @@ -1915,6 +1925,7 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
> >                                     PCI_EXT_CAP_NEXT_MASK);
> >
> >          switch (cap_id) {
> > +        case 0: /* kernel masked capability */
> >          case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
> >          case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function
> virtualization */
> >              trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name,
> cap_id, next);
> >
>
> Reviewed-by: Peter Xu <peterx@redhat.com>
>
> Since this bug is originally reported by Jintack, maybe we can also
> add:
>
> Reported-by: Jintack Lim <jintack@cs.columbia.edu>
>
> Jintack, if you want to test it and provide your tested-by, it would
> be nice as well. ;)
>

I believe this patch is to fix the error I got before.
qemu-system-x86_64: hw/pci/pcie.c:686: pcie_add_capability: Assertion
`prev >= 0x100' failed.

If so,
Tested-by: Jintack Lim <jintack@cs.columbia.edu>


> Actually I just found that the bug still exist after Michael's fix (I
> thought it was fixed). So we definitely need this patch or equivalent.
> However, I would still slightly prefer removing the wrapping hack
> since after all we need to touch it (and I do feel like that's error
> prone...). So, Alex, do you like below one instead?
>
> --------8<----------
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 332f41d..6942c1d 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1877,25 +1877,6 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>       */
>      config = g_memdup(pdev->config, vdev->config_size);
>
> -    /*
> -     * Extended capabilities are chained with each pointing to the next,
> so we
> -     * can drop anything other than the head of the chain simply by
> modifying
> -     * the previous next pointer.  For the head of the chain, we can
> modify the
> -     * capability ID to something that cannot match a valid capability.
> ID
> -     * 0 is reserved for this since absence of capabilities is indicated
> by
> -     * 0 for the ID, version, AND next pointer.  However,
> pcie_add_capability()
> -     * uses ID 0 as reserved for list management and will incorrectly
> match and
> -     * assert if we attempt to pre-load the head of the chain with this
> ID.
> -     * Use ID 0xFFFF temporarily since it is also seems to be reserved in
> -     * part for identifying absence of capabilities in a root complex
> register
> -     * block.  If the ID still exists after adding capabilities, switch
> back to
> -     * zero.  We'll mark this entire first dword as emulated for this
> purpose.
> -     */
> -    pci_set_long(pdev->config + PCI_CONFIG_SPACE_SIZE,
> -                 PCI_EXT_CAP(0xFFFF, 0, 0));
> -    pci_set_long(pdev->wmask + PCI_CONFIG_SPACE_SIZE, 0);
> -    pci_set_long(vdev->emulated_config_bits + PCI_CONFIG_SPACE_SIZE, ~0);
> -
>      for (next = PCI_CONFIG_SPACE_SIZE; next;
>           next = PCI_EXT_CAP_NEXT(pci_get_long(config + next))) {
>          header = pci_get_long(config + next);
> @@ -1910,24 +1891,33 @@ static void vfio_add_ext_cap(VFIOPCIDevice *vdev)
>           */
>          size = vfio_ext_cap_max_size(config, next);
>
> -        /* Use emulated next pointer to allow dropping extended caps */
> -        pci_long_test_and_set_mask(vdev->emulated_config_bits + next,
> -                                   PCI_EXT_CAP_NEXT_MASK);
> +        /* Use emulated header to allow dropping extended caps */
> +        pci_set_long(vdev->emulated_config_bits + next, 0xffffffffULL);
>
>          switch (cap_id) {
>          case PCI_EXT_CAP_ID_SRIOV: /* Read-only VF BARs confuse OVMF */
>          case PCI_EXT_CAP_ID_ARI: /* XXX Needs next function
> virtualization */
> +        case PCI_EXT_CAP_ID_VC:
> +            /*
> +             * For dropped capabilities, we keep their slot but
> +             * replace them with a header containing cap_id=0 &&
> +             * cap_ver=1. We do this reservation mostly to make sure
> +             * the head ecap (at offset 0x100) will always be there.
> +             * Anyway it won't hurt if we keep the rest of the dropped
> +             * ones as well.
> +             *
> +             * Here we use non-zero cap_ver because we want to "mark"
> +             * this ecap as "available" - from PCIe spec (chap 7.9.1),
> +             * it marked out that cap_id=cap_ver=next=0 means empty
> +             * ecap, and we really don't want it to be an "empty" slot
> +             * especially for the head ecap (we need a head, always!).
> +             */
> +            pcie_add_capability(pdev, 0, 1, next, size);
>              trace_vfio_add_ext_cap_dropped(vdev->vbasedev.name, cap_id,
> next);
>              break;
>          default:
>              pcie_add_capability(pdev, cap_id, cap_ver, next, size);
>          }
> -
> -    }
> -
> -    /* Cleanup chain head ID if necessary */
> -    if (pci_get_word(pdev->config + PCI_CONFIG_SPACE_SIZE) == 0xFFFF) {
> -        pci_set_word(pdev->config + PCI_CONFIG_SPACE_SIZE, 0);
>      }
>
>      g_free(config);
> ------->8--------
>
> The new patch will keep all the dropped ecaps (so we may see more than
> one cap_id=0x0000 field), which I don't know whether would be a
> drawback. OTOH, it provides benefits like:
>
> - we can remove the wrapping hack, so the code is much readable and
>   less error prone imho
>
> - we can avoid using the assumption that 0xffff cap_id is reserved
>
> I can live with both patches though. :-)
>
> Thanks!
>
> -- peterx
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-02-22 11:49 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-21 21:43 [Qemu-devel] [PATCH] vfio/pci: Improve extended capability comments, skip masked caps Alex Williamson
2017-02-22  3:08 ` Peter Xu
2017-02-22  3:54   ` Alex Williamson
2017-02-22  4:10     ` Peter Xu
2017-02-22 11:48   ` Jintack Lim

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.