* [PATCH v2 0/2] vpci/msix: fix PBA accesses
@ 2022-02-25 15:39 Roger Pau Monne
  2022-02-25 15:39 ` [PATCH v2 1/2] vpci/msix: reduce indentation in msix_write PBA handling Roger Pau Monne
  2022-02-25 15:39 ` [PATCH v2 2/2] vpci/msix: fix PBA accesses Roger Pau Monne
  0 siblings, 2 replies; 12+ messages in thread
From: Roger Pau Monne @ 2022-02-25 15:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Alex Olson, Roger Pau Monne

Hello,

First patch is just a preparatory non-functional change to reduce
indentation. Second patch is the one that actually fixes access to the
PBA.

Thanks, Roger.

Roger Pau Monne (2):
  vpci/msix: reduce indentation in msix_write PBA handling
  vpci/msix: fix PBA accesses

 xen/drivers/vpci/msix.c | 78 ++++++++++++++++++++++++++++++++---------
 xen/drivers/vpci/vpci.c |  2 ++
 xen/include/xen/vpci.h  |  2 ++
 3 files changed, 66 insertions(+), 16 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] vpci/msix: reduce indentation in msix_write PBA handling
  2022-02-25 15:39 [PATCH v2 0/2] vpci/msix: fix PBA accesses Roger Pau Monne
@ 2022-02-25 15:39 ` Roger Pau Monne
  2022-02-28  8:01   ` Jan Beulich
  2022-02-25 15:39 ` [PATCH v2 2/2] vpci/msix: fix PBA accesses Roger Pau Monne
  1 sibling, 1 reply; 12+ messages in thread
From: Roger Pau Monne @ 2022-02-25 15:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Alex Olson, Roger Pau Monne

No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/drivers/vpci/msix.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index 2ab4079412..a1fa7a5f13 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -275,23 +275,24 @@ static int cf_check msix_write(
 
     if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
     {
-        /* Ignore writes to PBA for DomUs, it's behavior is undefined. */
-        if ( is_hardware_domain(d) )
+
+        if ( !is_hardware_domain(d) )
+            /* Ignore writes to PBA for DomUs, it's behavior is undefined. */
+            return X86EMUL_OKAY;
+
+        switch ( len )
         {
-            switch ( len )
-            {
-            case 4:
-                writel(data, addr);
-                break;
+        case 4:
+            writel(data, addr);
+            break;
 
-            case 8:
-                writeq(data, addr);
-                break;
+        case 8:
+            writeq(data, addr);
+            break;
 
-            default:
-                ASSERT_UNREACHABLE();
-                break;
-            }
+        default:
+            ASSERT_UNREACHABLE();
+            break;
         }
 
         return X86EMUL_OKAY;
-- 
2.34.1




* [PATCH v2 2/2] vpci/msix: fix PBA accesses
  2022-02-25 15:39 [PATCH v2 0/2] vpci/msix: fix PBA accesses Roger Pau Monne
  2022-02-25 15:39 ` [PATCH v2 1/2] vpci/msix: reduce indentation in msix_write PBA handling Roger Pau Monne
@ 2022-02-25 15:39 ` Roger Pau Monne
  2022-02-25 17:57   ` Alex Olson
  1 sibling, 1 reply; 12+ messages in thread
From: Roger Pau Monne @ 2022-02-25 15:39 UTC (permalink / raw)
  To: xen-devel; +Cc: Alex Olson, Roger Pau Monne, Jan Beulich

Map the PBA in order to access it from the MSI-X read and write
handlers. Note that previously the handlers would pass the physical
host address into the {read,write}{l,q} handlers, which is wrong as
those expect a linear address.

Map the PBA using ioremap when the first access is performed. Note
that 32bit arches might want to abstract the call to ioremap into a
vPCI arch handler, so they can use a fixmap range to map the PBA.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Alex Olson <this.is.a0lson@gmail.com>
---
Changes since v1:
 - Also handle writes.

I don't seem to have a box with a driver that will try to access the
PBA, so I would consider this specific code path only build tested. At
least it doesn't seem to regress the current state of vPCI.
---
 xen/drivers/vpci/msix.c | 55 +++++++++++++++++++++++++++++++++++++----
 xen/drivers/vpci/vpci.c |  2 ++
 xen/include/xen/vpci.h  |  2 ++
 3 files changed, 54 insertions(+), 5 deletions(-)

diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index a1fa7a5f13..9fbc111ecc 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -198,8 +198,13 @@ static int cf_check msix_read(
     if ( !access_allowed(msix->pdev, addr, len) )
         return X86EMUL_OKAY;
 
+    spin_lock(&msix->pdev->vpci->lock);
     if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
     {
+        struct vpci *vpci = msix->pdev->vpci;
+        paddr_t base = vmsix_table_addr(vpci, VPCI_MSIX_PBA);
+        unsigned int idx = addr - base;
+
         /*
          * Access to PBA.
          *
@@ -207,25 +212,43 @@ static int cf_check msix_read(
          * guest address space. If this changes the address will need to be
          * translated.
          */
+
+        if ( !msix->pba )
+        {
+            msix->pba = ioremap(base, vmsix_table_size(vpci, VPCI_MSIX_PBA));
+            if ( !msix->pba )
+            {
+                /*
+                 * If unable to map the PBA return all 1s (all pending): it's
+                 * likely better to trigger spurious events than drop them.
+                 */
+                spin_unlock(&vpci->lock);
+                gprintk(XENLOG_WARNING,
+                        "%pp: unable to map MSI-X PBA, report all pending\n",
+                        msix->pdev);
+                return X86EMUL_OKAY;
+           }
+        }
+
         switch ( len )
         {
         case 4:
-            *data = readl(addr);
+            *data = readl(msix->pba + idx);
             break;
 
         case 8:
-            *data = readq(addr);
+            *data = readq(msix->pba + idx);
             break;
 
         default:
             ASSERT_UNREACHABLE();
             break;
         }
+        spin_unlock(&vpci->lock);
 
         return X86EMUL_OKAY;
     }
 
-    spin_lock(&msix->pdev->vpci->lock);
     entry = get_entry(msix, addr);
     offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
 
@@ -273,27 +296,49 @@ static int cf_check msix_write(
     if ( !access_allowed(msix->pdev, addr, len) )
         return X86EMUL_OKAY;
 
+    spin_lock(&msix->pdev->vpci->lock);
     if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
     {
+        struct vpci *vpci = msix->pdev->vpci;
+        paddr_t base = vmsix_table_addr(vpci, VPCI_MSIX_PBA);
+        unsigned int idx = addr - base;
 
         if ( !is_hardware_domain(d) )
+        {
             /* Ignore writes to PBA for DomUs, it's behavior is undefined. */
+            spin_unlock(&vpci->lock);
             return X86EMUL_OKAY;
+        }
+
+        if ( !msix->pba )
+        {
+            msix->pba = ioremap(base, vmsix_table_size(vpci, VPCI_MSIX_PBA));
+            if ( !msix->pba )
+            {
+                /* Unable to map the PBA, ignore write. */
+                spin_unlock(&vpci->lock);
+                gprintk(XENLOG_WARNING,
+                        "%pp: unable to map MSI-X PBA, write ignored\n",
+                        msix->pdev);
+                return X86EMUL_OKAY;
+           }
+        }
 
         switch ( len )
         {
         case 4:
-            writel(data, addr);
+            writel(data, msix->pba + idx);
             break;
 
         case 8:
-            writeq(data, addr);
+            writeq(data, msix->pba + idx);
             break;
 
         default:
             ASSERT_UNREACHABLE();
             break;
         }
+        spin_unlock(&vpci->lock);
 
         return X86EMUL_OKAY;
     }
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index f3b32d66cb..9fb3c05b2b 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -51,6 +51,8 @@ void vpci_remove_device(struct pci_dev *pdev)
         xfree(r);
     }
     spin_unlock(&pdev->vpci->lock);
+    if ( pdev->vpci->msix && pdev->vpci->msix->pba )
+        iounmap(pdev->vpci->msix->pba);
     xfree(pdev->vpci->msix);
     xfree(pdev->vpci->msi);
     xfree(pdev->vpci);
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index bcad1516ae..c399b101ee 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -127,6 +127,8 @@ struct vpci {
         bool enabled         : 1;
         /* Masked? */
         bool masked          : 1;
+        /* PBA map */
+        void *pba;
         /* Entries. */
         struct vpci_msix_entry {
             uint64_t addr;
-- 
2.34.1




* Re: [PATCH v2 2/2] vpci/msix: fix PBA accesses
  2022-02-25 15:39 ` [PATCH v2 2/2] vpci/msix: fix PBA accesses Roger Pau Monne
@ 2022-02-25 17:57   ` Alex Olson
  2022-02-26 10:05     ` [PATCH v2.1 " Roger Pau Monne
  2022-02-26 10:10     ` [PATCH v2 " Roger Pau Monné
  0 siblings, 2 replies; 12+ messages in thread
From: Alex Olson @ 2022-02-25 17:57 UTC (permalink / raw)
  To: Roger Pau Monne, xen-devel; +Cc: Jan Beulich

I think there is an issue in the spin_lock handling of patch 2 for the
"msix_write" function as it results in the lock being taken a second time while
held (hangs). 

The lock taken before checking "VMSIX_ADDR_IN_RANGE" isn't unlocked for the
non-PBA case, and a second lock is attempted just before the call to
get_entry() later in the same function.  It looks like either the added lock
should be moved inside the PBA case or the lock before get_entry() should be
removed.
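The reported deadlock can be reduced to a locking skeleton. The sketch below
is a toy reduction of the v2 control flow (invented names, a flag in place of
a real spinlock, not the actual Xen code); a real non-recursive spinlock
would hang where the toy lock merely records the second acquisition:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy non-recursive "spinlock" that records a double acquisition instead
 * of hanging, to make the bug observable.  Illustrative only.
 */
struct toy_lock {
    bool held;
    bool double_locked;
};

static void toy_lock_take(struct toy_lock *l)
{
    if (l->held)
        l->double_locked = true;    /* a real spinlock would hang here */
    else
        l->held = true;
}

static void toy_lock_drop(struct toy_lock *l)
{
    l->held = false;
}

/* Locking skeleton of msix_write() as posted in v2. */
static void msix_write_v2(struct toy_lock *l, bool addr_in_pba)
{
    toy_lock_take(l);               /* v2 added this unconditional lock */

    if (addr_in_pba)
    {
        /* ... PBA handling; every exit path unlocks ... */
        toy_lock_drop(l);
        return;
    }

    /* non-PBA path: lock still held here */
    toy_lock_take(l);               /* pre-existing lock before get_entry() */
    /* ... entry = get_entry(msix, addr); ... */
    toy_lock_drop(l);
}
```

Either moving the unconditional acquisition into the PBA branch or dropping
the later one removes the double lock.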


On my server, upon loading the ioatdma driver, it now successfully attempts a
PBA write (which no longer crashes the system), but I'm not sure I have a way
to fully exercise it...

I also see a different (related) issue in which modify_bars is called on a
virtual function seemingly before the BAR addresses are initialized/known; I
will start a different thread for that topic.

Regards,

-Alex


On Fri, 2022-02-25 at 16:39 +0100, Roger Pau Monne wrote:
> Map the PBA in order to access it from the MSI-X read and write
> handlers. Note that previously the handlers would pass the physical
> host address into the {read,write}{l,q} handlers, which is wrong as
> those expect a linear address.
> 
> Map the PBA using ioremap when the first access is performed. Note
> that 32bit arches might want to abstract the call to ioremap into a
> vPCI arch handler, so they can use a fixmap range to map the PBA.
> 
> Reported-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Alex Olson <this.is.a0lson@gmail.com>
> ---
> Changes since v1:
>  - Also handle writes.
> 
> I don't seem to have a box with a driver that will try to access the
> PBA, so I would consider this specific code path only build tested. At
> least it doesn't seem to regress the current state of vPCI.
> ---
>  xen/drivers/vpci/msix.c | 55 +++++++++++++++++++++++++++++++++++++----
>  xen/drivers/vpci/vpci.c |  2 ++
>  xen/include/xen/vpci.h  |  2 ++
>  3 files changed, 54 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
> index a1fa7a5f13..9fbc111ecc 100644
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -198,8 +198,13 @@ static int cf_check msix_read(
>      if ( !access_allowed(msix->pdev, addr, len) )
>          return X86EMUL_OKAY;
>  
> +    spin_lock(&msix->pdev->vpci->lock);
>      if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
>      {
> +        struct vpci *vpci = msix->pdev->vpci;
> +        paddr_t base = vmsix_table_addr(vpci, VPCI_MSIX_PBA);
> +        unsigned int idx = addr - base;
> +
>          /*
>           * Access to PBA.
>           *
> @@ -207,25 +212,43 @@ static int cf_check msix_read(
>           * guest address space. If this changes the address will need to be
>           * translated.
>           */
> +
> +        if ( !msix->pba )
> +        {
> +            msix->pba = ioremap(base, vmsix_table_size(vpci, VPCI_MSIX_PBA));
> +            if ( !msix->pba )
> +            {
> +                /*
> +                 * If unable to map the PBA return all 1s (all pending): it's
> +                 * likely better to trigger spurious events than drop them.
> +                 */
> +                spin_unlock(&vpci->lock);
> +                gprintk(XENLOG_WARNING,
> +                        "%pp: unable to map MSI-X PBA, report all pending\n",
> +                        msix->pdev);
> +                return X86EMUL_OKAY;
> +           }
> +        }
> +
>          switch ( len )
>          {
>          case 4:
> -            *data = readl(addr);
> +            *data = readl(msix->pba + idx);
>              break;
>  
>          case 8:
> -            *data = readq(addr);
> +            *data = readq(msix->pba + idx);
>              break;
>  
>          default:
>              ASSERT_UNREACHABLE();
>              break;
>          }
> +        spin_unlock(&vpci->lock);
>  
>          return X86EMUL_OKAY;
>      }
>  
> -    spin_lock(&msix->pdev->vpci->lock);
>      entry = get_entry(msix, addr);
>      offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
>  
> @@ -273,27 +296,49 @@ static int cf_check msix_write(
>      if ( !access_allowed(msix->pdev, addr, len) )
>          return X86EMUL_OKAY;
>  
> +    spin_lock(&msix->pdev->vpci->lock);
>      if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
>      {
> +        struct vpci *vpci = msix->pdev->vpci;
> +        paddr_t base = vmsix_table_addr(vpci, VPCI_MSIX_PBA);
> +        unsigned int idx = addr - base;
>  
>          if ( !is_hardware_domain(d) )
> +        {
>              /* Ignore writes to PBA for DomUs, it's behavior is undefined. */
> +            spin_unlock(&vpci->lock);
>              return X86EMUL_OKAY;
> +        }
> +
> +        if ( !msix->pba )
> +        {
> +            msix->pba = ioremap(base, vmsix_table_size(vpci, VPCI_MSIX_PBA));
> +            if ( !msix->pba )
> +            {
> +                /* Unable to map the PBA, ignore write. */
> +                spin_unlock(&vpci->lock);
> +                gprintk(XENLOG_WARNING,
> +                        "%pp: unable to map MSI-X PBA, write ignored\n",
> +                        msix->pdev);
> +                return X86EMUL_OKAY;
> +           }
> +        }
>  
>          switch ( len )
>          {
>          case 4:
> -            writel(data, addr);
> +            writel(data, msix->pba + idx);
>              break;
>  
>          case 8:
> -            writeq(data, addr);
> +            writeq(data, msix->pba + idx);
>              break;
>  
>          default:
>              ASSERT_UNREACHABLE();
>              break;
>          }
> +        spin_unlock(&vpci->lock);
>  
>          return X86EMUL_OKAY;
>      }
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index f3b32d66cb..9fb3c05b2b 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -51,6 +51,8 @@ void vpci_remove_device(struct pci_dev *pdev)
>          xfree(r);
>      }
>      spin_unlock(&pdev->vpci->lock);
> +    if ( pdev->vpci->msix && pdev->vpci->msix->pba )
> +        iounmap(pdev->vpci->msix->pba);
>      xfree(pdev->vpci->msix);
>      xfree(pdev->vpci->msi);
>      xfree(pdev->vpci);
> diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
> index bcad1516ae..c399b101ee 100644
> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -127,6 +127,8 @@ struct vpci {
>          bool enabled         : 1;
>          /* Masked? */
>          bool masked          : 1;
> +        /* PBA map */
> +        void *pba;
>          /* Entries. */
>          struct vpci_msix_entry {
>              uint64_t addr;




* [PATCH v2.1 2/2] vpci/msix: fix PBA accesses
  2022-02-25 17:57   ` Alex Olson
@ 2022-02-26 10:05     ` Roger Pau Monne
  2022-03-01  8:46       ` Jan Beulich
  2022-02-26 10:10     ` [PATCH v2 " Roger Pau Monné
  1 sibling, 1 reply; 12+ messages in thread
From: Roger Pau Monne @ 2022-02-26 10:05 UTC (permalink / raw)
  To: xen-devel; +Cc: Alex Olson, Roger Pau Monne, Jan Beulich

Map the PBA in order to access it from the MSI-X read and write
handlers. Note that previously the handlers would pass the physical
host address into the {read,write}{l,q} handlers, which is wrong as
those expect a linear address.

Map the PBA using ioremap when the first access is performed. Note
that 32bit arches might want to abstract the call to ioremap into a
vPCI arch handler, so they can use a fixmap range to map the PBA.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Alex Olson <this.is.a0lson@gmail.com>
---
Changes since v1:
 - Also handle writes.

I don't seem to have a box with a driver that will try to access the
PBA, so I would consider this specific code path only build tested. At
least it doesn't seem to regress the current state of vPCI.
---
 xen/drivers/vpci/msix.c | 56 ++++++++++++++++++++++++++++++++++++-----
 xen/drivers/vpci/vpci.c |  2 ++
 xen/include/xen/vpci.h  |  2 ++
 3 files changed, 54 insertions(+), 6 deletions(-)

diff --git a/xen/drivers/vpci/msix.c b/xen/drivers/vpci/msix.c
index a1fa7a5f13..4775f88e1f 100644
--- a/xen/drivers/vpci/msix.c
+++ b/xen/drivers/vpci/msix.c
@@ -198,8 +198,13 @@ static int cf_check msix_read(
     if ( !access_allowed(msix->pdev, addr, len) )
         return X86EMUL_OKAY;
 
+    spin_lock(&msix->pdev->vpci->lock);
     if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
     {
+        struct vpci *vpci = msix->pdev->vpci;
+        paddr_t base = vmsix_table_addr(vpci, VPCI_MSIX_PBA);
+        unsigned int idx = addr - base;
+
         /*
          * Access to PBA.
          *
@@ -207,25 +212,43 @@ static int cf_check msix_read(
          * guest address space. If this changes the address will need to be
          * translated.
          */
+
+        if ( !msix->pba )
+        {
+            msix->pba = ioremap(base, vmsix_table_size(vpci, VPCI_MSIX_PBA));
+            if ( !msix->pba )
+            {
+                /*
+                 * If unable to map the PBA return all 1s (all pending): it's
+                 * likely better to trigger spurious events than drop them.
+                 */
+                spin_unlock(&vpci->lock);
+                gprintk(XENLOG_WARNING,
+                        "%pp: unable to map MSI-X PBA, report all pending\n",
+                        msix->pdev);
+                return X86EMUL_OKAY;
+           }
+        }
+
         switch ( len )
         {
         case 4:
-            *data = readl(addr);
+            *data = readl(msix->pba + idx);
             break;
 
         case 8:
-            *data = readq(addr);
+            *data = readq(msix->pba + idx);
             break;
 
         default:
             ASSERT_UNREACHABLE();
             break;
         }
+        spin_unlock(&vpci->lock);
 
         return X86EMUL_OKAY;
     }
 
-    spin_lock(&msix->pdev->vpci->lock);
     entry = get_entry(msix, addr);
     offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
 
@@ -273,32 +296,53 @@ static int cf_check msix_write(
     if ( !access_allowed(msix->pdev, addr, len) )
         return X86EMUL_OKAY;
 
+    spin_lock(&msix->pdev->vpci->lock);
     if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
     {
+        struct vpci *vpci = msix->pdev->vpci;
+        paddr_t base = vmsix_table_addr(vpci, VPCI_MSIX_PBA);
+        unsigned int idx = addr - base;
 
         if ( !is_hardware_domain(d) )
+        {
             /* Ignore writes to PBA for DomUs, it's behavior is undefined. */
+            spin_unlock(&vpci->lock);
             return X86EMUL_OKAY;
+        }
+
+        if ( !msix->pba )
+        {
+            msix->pba = ioremap(base, vmsix_table_size(vpci, VPCI_MSIX_PBA));
+            if ( !msix->pba )
+            {
+                /* Unable to map the PBA, ignore write. */
+                spin_unlock(&vpci->lock);
+                gprintk(XENLOG_WARNING,
+                        "%pp: unable to map MSI-X PBA, write ignored\n",
+                        msix->pdev);
+                return X86EMUL_OKAY;
+           }
+        }
 
         switch ( len )
         {
         case 4:
-            writel(data, addr);
+            writel(data, msix->pba + idx);
             break;
 
         case 8:
-            writeq(data, addr);
+            writeq(data, msix->pba + idx);
             break;
 
         default:
             ASSERT_UNREACHABLE();
             break;
         }
+        spin_unlock(&vpci->lock);
 
         return X86EMUL_OKAY;
     }
 
-    spin_lock(&msix->pdev->vpci->lock);
     entry = get_entry(msix, addr);
     offset = addr & (PCI_MSIX_ENTRY_SIZE - 1);
 
diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
index f3b32d66cb..9fb3c05b2b 100644
--- a/xen/drivers/vpci/vpci.c
+++ b/xen/drivers/vpci/vpci.c
@@ -51,6 +51,8 @@ void vpci_remove_device(struct pci_dev *pdev)
         xfree(r);
     }
     spin_unlock(&pdev->vpci->lock);
+    if ( pdev->vpci->msix && pdev->vpci->msix->pba )
+        iounmap(pdev->vpci->msix->pba);
     xfree(pdev->vpci->msix);
     xfree(pdev->vpci->msi);
     xfree(pdev->vpci);
diff --git a/xen/include/xen/vpci.h b/xen/include/xen/vpci.h
index bcad1516ae..c399b101ee 100644
--- a/xen/include/xen/vpci.h
+++ b/xen/include/xen/vpci.h
@@ -127,6 +127,8 @@ struct vpci {
         bool enabled         : 1;
         /* Masked? */
         bool masked          : 1;
+        /* PBA map */
+        void *pba;
         /* Entries. */
         struct vpci_msix_entry {
             uint64_t addr;
-- 
2.34.1




* Re: [PATCH v2 2/2] vpci/msix: fix PBA accesses
  2022-02-25 17:57   ` Alex Olson
  2022-02-26 10:05     ` [PATCH v2.1 " Roger Pau Monne
@ 2022-02-26 10:10     ` Roger Pau Monné
  2022-02-28 18:19       ` Alex Olson
  1 sibling, 1 reply; 12+ messages in thread
From: Roger Pau Monné @ 2022-02-26 10:10 UTC (permalink / raw)
  To: Alex Olson; +Cc: xen-devel, Jan Beulich

On Fri, Feb 25, 2022 at 11:57:05AM -0600, Alex Olson wrote:
> I think there is an issue in the spin_lock handling of patch 2 for the
> "msix_write" function as it results in the lock being taken a second time while
> held (hangs). 
> 
> The lock taken before checking "VMSIX_ADDR_IN_RANGE" isn't unlocked for the non-
> PBA case and a second lock is attempted just before the call to get_entry()
> later in the same function.  It looks like either the added lock should either
> be moved inside the PBA case or the lock before get_entry() should be removed.

Sorry, I was in a rush to send this before leaving yesterday and didn't
refresh the commit before generating the patch; v2.1 should be fixed.

Could you provide a 'Tested-by' if it works for you?

> 
> On my server, upon loading the ioatdma driver, it now successfully attempts an
> PBA write (which now doesn't crash the system), but I'm not sure I have a way to
> fully exercise it...

Urg, that's weird: the PBA should be read-only according to the spec.
Writes to the PBA have undefined behavior.
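For reference, the only access the spec defines for the PBA is reading the
per-vector pending bits. A minimal illustrative helper (invented name, not
Xen code; in vPCI the base would be the mapped PBA):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Per the PCI spec, the PBA is an array of QWORDs holding one pending
 * bit per MSI-X vector.  Reading a bit is the only defined operation;
 * writes (as ioatdma performs) are undefined behavior.
 */
static bool msix_vector_pending(const volatile uint64_t *pba,
                                unsigned int vector)
{
    return (pba[vector / 64] >> (vector % 64)) & 1;
}
```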

> 
> I also see a different (related) issue in which modify_bars is called on a
> virtual function seemingly before the BAR addresses are initialized/known and
> will start a different thread for that topic.

SR-IOV is not supported on PVH dom0 yet, so that's not going to work.
I posted a series in 2018 to enable it, but sadly have not had time to
work on it since:

https://lore.kernel.org/xen-devel/20180717094830.54806-1-roger.pau@citrix.com/

It's likely not going to apply cleanly, and there are a lot of comments
to be fixed up there.

Thanks, Roger.



* Re: [PATCH v2 1/2] vpci/msix: reduce indentation in msix_write PBA handling
  2022-02-25 15:39 ` [PATCH v2 1/2] vpci/msix: reduce indentation in msix_write PBA handling Roger Pau Monne
@ 2022-02-28  8:01   ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2022-02-28  8:01 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Alex Olson, xen-devel

On 25.02.2022 16:39, Roger Pau Monne wrote:
> No functional change.
> 
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Acked-by: Jan Beulich <jbeulich@suse.com>




* Re: [PATCH v2 2/2] vpci/msix: fix PBA accesses
  2022-02-26 10:10     ` [PATCH v2 " Roger Pau Monné
@ 2022-02-28 18:19       ` Alex Olson
  2022-03-01  7:43         ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Alex Olson @ 2022-02-28 18:19 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: xen-devel, Jan Beulich

Hi Roger,

The revised patch looks good.  The PBA writes seen during ioatdma driver
initialization (self-test) complete successfully and the driver doesn't
complain (I see two interrupts per ioatdma device).  The driver has a
self-test feature which appears to exercise MSI-X interrupts and contains
code which appears to cause a PBA write.

Feel free to add "Tested-by: Alex.Olson@starlab.io" to your patchset.

Thanks also for the pointer to your 2018 work on SR-IOV, I'll give it a try. 


FYI, with this patch, I was seeing msix_read() and msix_write() being called
during the driver's self-test on physical address 0xfbc01800, corresponding to
the beginning of the PBA (lspci excerpt below):


02:00.0 System peripheral: Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 0
...
        Region 0: Memory at fbc06000 (64-bit, non-prefetchable) [size=8K]
...
        Capabilities: [ac] MSI-X: Enable+ Count=1 Masked-
                Vector table: BAR=0 offset=00001000
                PBA: BAR=0 offset=00001800
...
        Kernel modules: ioatdma



The functions involved on the Linux kernel side are:

ioat_probe()
 -> ioat3_dma_self_test()
  -> ioat_dma_self_test()
   -> ioat_free_chan_resources()
    ->  ioat_reset_hw()

drivers/dma/ioat/dma.c:   ioat_reset_hw()
...
    ioat_dma->msixpba = readq(ioat_dma->reg_base + 0x1800);
...
    writeq(ioat_dma->msixpba, ioat_dma->reg_base + 0x1800);


Thanks,

-Alex

On Sat, 2022-02-26 at 11:10 +0100, Roger Pau Monné wrote:
> On Fri, Feb 25, 2022 at 11:57:05AM -0600, Alex Olson wrote:
> > I think there is an issue in the spin_lock handling of patch 2 for the
> > "msix_write" function as it results in the lock being taken a second time
> > while held (hangs).
> >
> > The lock taken before checking "VMSIX_ADDR_IN_RANGE" isn't unlocked for
> > the non-PBA case and a second lock is attempted just before the call to
> > get_entry() later in the same function.  It looks like either the added
> > lock should either be moved inside the PBA case or the lock before
> > get_entry() should be removed.
> 
> Sorry, was in a rush to send this before leaving yesterday and didn't
> refresh the commit before generating the patch, v2.1 should be fixed.
> 
> Could you provide a 'Tested-by' if it work for you?
> 
> > On my server, upon loading the ioatdma driver, it now successfully attempts
> > an PBA write (which now doesn't crash the system), but I'm not sure I have
> > a way to fully exercise it...
> 
> Urg, that's weird, PBA should be read-only only according to the spec.
> Writes to PBA have undefined behavior.
> 
> > I also see a different (related) issue in which modify_bars is called on a
> > virtual function seemingly before the BAR addresses are initialized/known
> > and will start a different thread for that topic.
> 
> SR-IOV is not supported on PVH dom0 yet, so that's not going to work.
> I've posted a series in 2018 to enable it, but sadly had no time to
> work on it anymore:
> 
> https://lore.kernel.org/xen-devel/20180717094830.54806-1-roger.pau@citrix.com/
> 
> It's likely not going to apply cleanly, and there's a lot of comments
> to be fixed up there.
> 
> Thanks, Roger.




* Re: [PATCH v2 2/2] vpci/msix: fix PBA accesses
  2022-02-28 18:19       ` Alex Olson
@ 2022-03-01  7:43         ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2022-03-01  7:43 UTC (permalink / raw)
  To: Alex Olson; +Cc: xen-devel, Roger Pau Monné

On 28.02.2022 19:19, Alex Olson wrote:
> FYI, with this patch,  I was seeing  msix_read() and msix_write() being called
> during the driver's self-test on physical address 0xfbc01800, corresponding to
> the beginning of the PBA (lspci excerpt below):
> 
> 
> 02:00.0 System peripheral: Intel Corporation Xeon Processor D Family QuickData Technology Register DMA Channel 0
> ...
>         Region 0: Memory at fbc06000 (64-bit, non-prefetchable) [size=8K]
> ...
>         Capabilities: [ac] MSI-X: Enable+ Count=1 Masked-
>                 Vector table: BAR=0 offset=00001000
>                 PBA: BAR=0 offset=00001800
> ...
>         Kernel modules: ioatdma
> 
> 
> 
> The functions involved on the Linux kernel side are:
> 
> ioat_probe()
>  -> ioat3_dma_self_test()
>   -> ioat_dma_self_test()
>    -> ioat_free_chan_resources()
>     ->  ioat_reset_hw()
> 
> drivers/dma/ioat/dma.c:   ioat_reset_hw()
> ...
>     ioat_dma->msixpba = readq(ioat_dma->reg_base + 0x1800);
> ...
>     writeq(ioat_dma->msixpba, ioat_dma->reg_base + 0x1800);

Wow, a clear and apparently intentional violation of the PCI spec. There was
a workaround for a reset issue introduced by commit 8a52b9ff1154, which was
then revised by c997e30e7f65 to take the present shape. However, both commits
claim this only affects certain Atoms, albeit the latter less explicitly by
having "CB3.3" in the subject. Yet you're seeing this on a Xeon D ...

Jan




* Re: [PATCH v2.1 2/2] vpci/msix: fix PBA accesses
  2022-02-26 10:05     ` [PATCH v2.1 " Roger Pau Monne
@ 2022-03-01  8:46       ` Jan Beulich
  2022-03-01  9:08         ` Roger Pau Monné
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Beulich @ 2022-03-01  8:46 UTC (permalink / raw)
  To: Roger Pau Monne; +Cc: Alex Olson, xen-devel

On 26.02.2022 11:05, Roger Pau Monne wrote:
> --- a/xen/drivers/vpci/msix.c
> +++ b/xen/drivers/vpci/msix.c
> @@ -198,8 +198,13 @@ static int cf_check msix_read(
>      if ( !access_allowed(msix->pdev, addr, len) )
>          return X86EMUL_OKAY;
>  
> +    spin_lock(&msix->pdev->vpci->lock);
>      if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
>      {
> +        struct vpci *vpci = msix->pdev->vpci;
> +        paddr_t base = vmsix_table_addr(vpci, VPCI_MSIX_PBA);
> +        unsigned int idx = addr - base;
> +
>          /*
>           * Access to PBA.
>           *
> @@ -207,25 +212,43 @@ static int cf_check msix_read(
>           * guest address space. If this changes the address will need to be
>           * translated.
>           */
> +
> +        if ( !msix->pba )
> +        {
> +            msix->pba = ioremap(base, vmsix_table_size(vpci, VPCI_MSIX_PBA));
> +            if ( !msix->pba )
> +            {
> +                /*
> +                 * If unable to map the PBA return all 1s (all pending): it's
> +                 * likely better to trigger spurious events than drop them.
> +                 */
> +                spin_unlock(&vpci->lock);
> +                gprintk(XENLOG_WARNING,
> +                        "%pp: unable to map MSI-X PBA, report all pending\n",
> +                        msix->pdev);
> +                return X86EMUL_OKAY;

Hmm, this may report more set bits than there are vectors. Which is
probably fine, but the comment may want adjusting a little to make
clear this is understood and meant to be that way.

> +           }
> +        }

Imo it would make sense to limit the locked region to around just this
check-and-map logic. There's no need for ...

>          switch ( len )
>          {
>          case 4:
> -            *data = readl(addr);
> +            *data = readl(msix->pba + idx);
>              break;
>  
>          case 8:
> -            *data = readq(addr);
> +            *data = readq(msix->pba + idx);
>              break;
>  
>          default:
>              ASSERT_UNREACHABLE();
>              break;
>          }
> +        spin_unlock(&vpci->lock);

... the actual access to happen under lock, as you remove the mapping
only when the device is being removed. I'm inclined to suggest making
a helper function, which does an unlocked check, then the ioremap(),
then takes the lock and re-checks whether the field's still NULL, and
either installs the mapping or (after unlocking) iounmap()s it.

> --- a/xen/include/xen/vpci.h
> +++ b/xen/include/xen/vpci.h
> @@ -127,6 +127,8 @@ struct vpci {
>          bool enabled         : 1;
>          /* Masked? */
>          bool masked          : 1;
> +        /* PBA map */
> +        void *pba;

Here (and elsewhere as/if applicable) you want to add __iomem, even
if this is merely for documentation purposes right now.

I think you did mention this elsewhere: Don't we also need to deal
with accesses to MMIO covered by the same BAR / page, but falling
outside of the MSI-X table and PBA?

Jan




* Re: [PATCH v2.1 2/2] vpci/msix: fix PBA accesses
  2022-03-01  8:46       ` Jan Beulich
@ 2022-03-01  9:08         ` Roger Pau Monné
  2022-03-01 10:32           ` Jan Beulich
  0 siblings, 1 reply; 12+ messages in thread
From: Roger Pau Monné @ 2022-03-01  9:08 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Alex Olson, xen-devel

On Tue, Mar 01, 2022 at 09:46:13AM +0100, Jan Beulich wrote:
> On 26.02.2022 11:05, Roger Pau Monne wrote:
> > --- a/xen/drivers/vpci/msix.c
> > +++ b/xen/drivers/vpci/msix.c
> > @@ -198,8 +198,13 @@ static int cf_check msix_read(
> >      if ( !access_allowed(msix->pdev, addr, len) )
> >          return X86EMUL_OKAY;
> >  
> > +    spin_lock(&msix->pdev->vpci->lock);
> >      if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
> >      {
> > +        struct vpci *vpci = msix->pdev->vpci;
> > +        paddr_t base = vmsix_table_addr(vpci, VPCI_MSIX_PBA);
> > +        unsigned int idx = addr - base;
> > +
> >          /*
> >           * Access to PBA.
> >           *
> > @@ -207,25 +212,43 @@ static int cf_check msix_read(
> >           * guest address space. If this changes the address will need to be
> >           * translated.
> >           */
> > +
> > +        if ( !msix->pba )
> > +        {
> > +            msix->pba = ioremap(base, vmsix_table_size(vpci, VPCI_MSIX_PBA));
> > +            if ( !msix->pba )
> > +            {
> > +                /*
> > +                 * If unable to map the PBA return all 1s (all pending): it's
> > +                 * likely better to trigger spurious events than drop them.
> > +                 */
> > +                spin_unlock(&vpci->lock);
> > +                gprintk(XENLOG_WARNING,
> > +                        "%pp: unable to map MSI-X PBA, report all pending\n",
> > +                        msix->pdev);
> > +                return X86EMUL_OKAY;
> 
> Hmm, this may report more set bits than there are vectors. Which is
> probably fine, but the comment may want adjusting a little to make
> clear this is understood and meant to be that way.

Yes, it could return more bits than vectors, but that area is also
part of the PBA (as the end is aligned to 8 bytes). I will adjust the
comment.

> > +           }
> > +        }
> 
> Imo it would make sense to limit the locked region to around just this
> check-and-map logic. There's no need for ...
> 
> >          switch ( len )
> >          {
> >          case 4:
> > -            *data = readl(addr);
> > +            *data = readl(msix->pba + idx);
> >              break;
> >  
> >          case 8:
> > -            *data = readq(addr);
> > +            *data = readq(msix->pba + idx);
> >              break;
> >  
> >          default:
> >              ASSERT_UNREACHABLE();
> >              break;
> >          }
> > +        spin_unlock(&vpci->lock);
> 
> ... the actual access to happen under lock, as you remove the mapping
> only when the device is being removed. I'm inclined to suggest making
> a helper function, which does an unlocked check, then the ioremap(),
> then takes the lock and re-checks whether the field's still NULL, and
> either installs the mapping or (after unlocking) iounmap()s it.

I'm fine with dropping the lock earlier, but I'm not sure there's much
point in placing this in a separate helper, as it's the mapping of at
most 2 pages (the PBA is at most 2048 bits, i.e. 256 bytes, in size,
and 64-bit aligned).

I guess you are suggesting this in preparation for adding support for
accesses to the non-PBA area falling into the same page(s)?

> > --- a/xen/include/xen/vpci.h
> > +++ b/xen/include/xen/vpci.h
> > @@ -127,6 +127,8 @@ struct vpci {
> >          bool enabled         : 1;
> >          /* Masked? */
> >          bool masked          : 1;
> > +        /* PBA map */
> > +        void *pba;
> 
> Here (and elsewhere as/if applicable) you want to add __iomem, even
> if this is merely for documentation purposes right now.

Will add.

> I think you did mention this elsewhere: Don't we also need to deal
> with accesses to MMIO covered by the same BAR / page, but falling
> outside of the MSI-X table and PBA?

Yes, I did mention it in a reply to Alex:

https://lore.kernel.org/xen-devel/Yhj58BIIN2p4bYJ8@Air-de-Roger/

So far we seem to have gotten away without needing it, but I might as
well try to implement it; it shouldn't be too complicated.

Thanks, Roger.



* Re: [PATCH v2.1 2/2] vpci/msix: fix PBA accesses
  2022-03-01  9:08         ` Roger Pau Monné
@ 2022-03-01 10:32           ` Jan Beulich
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Beulich @ 2022-03-01 10:32 UTC (permalink / raw)
  To: Roger Pau Monné; +Cc: Alex Olson, xen-devel

On 01.03.2022 10:08, Roger Pau Monné wrote:
> On Tue, Mar 01, 2022 at 09:46:13AM +0100, Jan Beulich wrote:
>> On 26.02.2022 11:05, Roger Pau Monne wrote:
>>> --- a/xen/drivers/vpci/msix.c
>>> +++ b/xen/drivers/vpci/msix.c
>>> @@ -198,8 +198,13 @@ static int cf_check msix_read(
>>>      if ( !access_allowed(msix->pdev, addr, len) )
>>>          return X86EMUL_OKAY;
>>>  
>>> +    spin_lock(&msix->pdev->vpci->lock);
>>>      if ( VMSIX_ADDR_IN_RANGE(addr, msix->pdev->vpci, VPCI_MSIX_PBA) )
>>>      {
>>> +        struct vpci *vpci = msix->pdev->vpci;
>>> +        paddr_t base = vmsix_table_addr(vpci, VPCI_MSIX_PBA);
>>> +        unsigned int idx = addr - base;
>>> +
>>>          /*
>>>           * Access to PBA.
>>>           *
>>> @@ -207,25 +212,43 @@ static int cf_check msix_read(
>>>           * guest address space. If this changes the address will need to be
>>>           * translated.
>>>           */
>>> +
>>> +        if ( !msix->pba )
>>> +        {
>>> +            msix->pba = ioremap(base, vmsix_table_size(vpci, VPCI_MSIX_PBA));
>>> +            if ( !msix->pba )
>>> +            {
>>> +                /*
>>> +                 * If unable to map the PBA return all 1s (all pending): it's
>>> +                 * likely better to trigger spurious events than drop them.
>>> +                 */
>>> +                spin_unlock(&vpci->lock);
>>> +                gprintk(XENLOG_WARNING,
>>> +                        "%pp: unable to map MSI-X PBA, report all pending\n",
>>> +                        msix->pdev);
>>> +                return X86EMUL_OKAY;
>>
>> Hmm, this may report more set bits than there are vectors. Which is
>> probably fine, but the comment may want adjusting a little to make
>> clear this is understood and meant to be that way.
> 
> Yes, it could return more bits than vectors, but that area is also
> part of the PBA (as the end is aligned to 8 bytes). I will adjust the
> comment.
> 
>>> +           }
>>> +        }
>>
>> Imo it would make sense to limit the locked region to around just this
>> check-and-map logic. There's no need for ...
>>
>>>          switch ( len )
>>>          {
>>>          case 4:
>>> -            *data = readl(addr);
>>> +            *data = readl(msix->pba + idx);
>>>              break;
>>>  
>>>          case 8:
>>> -            *data = readq(addr);
>>> +            *data = readq(msix->pba + idx);
>>>              break;
>>>  
>>>          default:
>>>              ASSERT_UNREACHABLE();
>>>              break;
>>>          }
>>> +        spin_unlock(&vpci->lock);
>>
>> ... the actual access to happen under lock, as you remove the mapping
>> only when the device is being removed. I'm inclined to suggest making
>> a helper function, which does an unlocked check, then the ioremap(),
>> then takes the lock and re-checks whether the field's still NULL, and
>> either installs the mapping or (after unlocking) iounmap()s it.
> 
> I'm fine with dropping the lock earlier, but I'm not sure there's much
> point in placing this in a separate helper, as it's the mapping of at
> most 2 pages (PBA is 2048 bytes in size, 64bit aligned).
> 
> I guess you are suggesting this in preparation for adding support to
> access the non PBA area falling into the same page(s)?

Not just. The write path wants to use the same logic, and with it
becoming a little more involved I think it would be better to have it
in just one place.

Jan




end of thread, other threads:[~2022-03-01 10:32 UTC | newest]

Thread overview: 12+ messages
2022-02-25 15:39 [PATCH v2 0/2] vpci/msix: fix PBA acceses Roger Pau Monne
2022-02-25 15:39 ` [PATCH v2 1/2] vpci/msix: reduce indentation in msix_write PBA handling Roger Pau Monne
2022-02-28  8:01   ` Jan Beulich
2022-02-25 15:39 ` [PATCH v2 2/2] vpci/msix: fix PBA accesses Roger Pau Monne
2022-02-25 17:57   ` Alex Olson
2022-02-26 10:05     ` [PATCH v2.1 " Roger Pau Monne
2022-03-01  8:46       ` Jan Beulich
2022-03-01  9:08         ` Roger Pau Monné
2022-03-01 10:32           ` Jan Beulich
2022-02-26 10:10     ` [PATCH v2 " Roger Pau Monné
2022-02-28 18:19       ` Alex Olson
2022-03-01  7:43         ` Jan Beulich
