All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
@ 2020-07-23 15:13 Matthew Rosato
  2020-07-23 15:13 ` [RFC PATCH] s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci Matthew Rosato
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Matthew Rosato @ 2020-07-23 15:13 UTC (permalink / raw)
  To: alex.williamson, pmorel, schnelle
  Cc: david, cohuck, qemu-devel, pasic, borntraeger, qemu-s390x, rth

I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
and block MMIO access on disabled memory' vfio-pci via qemu on s390x
fails spectacularly, with errors in qemu like:

qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: Input/output error

From read to bar 0 originating out of hw/s390x/s390-pci-inst.c:zpci_read_bar().

So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
a bit of tracing, we seem to be triggering the new trap in
__vfio_pci_memory_enabled().  Sure enough, if I just force this function to
return 'true' as a test case, things work again.
The included patch attempts to enforce the setting, which restores everything
to working order but also triggers vfio_bar_restore() in the process....  So
this isn't the right answer, more of a proof-of-concept.

@Alex: Any guidance on what needs to happen to make qemu-s390x happy with this
recent kernel change?

@Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn?
I note that my host device lspci output looks like:

0000:00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]

But the device is not marked as is_virtfn..  Otherwise, Alex's fix
from htps://lkml.org/lkml/2020/6/25/628 should cover the case.



Matthew Rosato (1):
  s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci

 hw/s390x/s390-pci-inst.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 14+ messages in thread

* [RFC PATCH] s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci
  2020-07-23 15:13 [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement Matthew Rosato
@ 2020-07-23 15:13 ` Matthew Rosato
  2020-07-23 16:29 ` [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement Alex Williamson
  2020-07-24  9:46 ` Niklas Schnelle
  2 siblings, 0 replies; 14+ messages in thread
From: Matthew Rosato @ 2020-07-23 15:13 UTC (permalink / raw)
  To: alex.williamson, pmorel, schnelle
  Cc: david, cohuck, qemu-devel, pasic, borntraeger, qemu-s390x, rth

After kernel commit abafbc55 'vfio-pci: Invalidate mmaps and
block MMIO access on disabled memory', I'm seeing qemu errors
on s390x as vfio-pci devices attempt to read bar 0 because
PCI_COMMAND_MEMORY is not reliably set.  Ensure the value
stays ON while the device is in-use.

Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
 hw/s390x/s390-pci-inst.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/hw/s390x/s390-pci-inst.c b/hw/s390x/s390-pci-inst.c
index 2f7a7d7..74b8796 100644
--- a/hw/s390x/s390-pci-inst.c
+++ b/hw/s390x/s390-pci-inst.c
@@ -555,6 +555,16 @@ int pcistg_service_call(S390CPU *cpu, uint8_t r1, uint8_t r2, uintptr_t ra)
         }
         /* len = 1,2,4 so we do not need to test */
         zpci_endian_swap(&data, len);
+
+        /*
+         * vfio-pci requires PCI_COMMAND_MEMORY.  Ensure that it
+         * stays on for an active device.
+         */
+        if ((pbdev->fh & FH_SHM_VFIO) && (offset == PCI_COMMAND) &&
+            (len == 2) && (data != 0)) {
+            data |= PCI_COMMAND_MEMORY;
+        }
+
         pci_host_config_write_common(pbdev->pdev, offset,
                                      pci_config_size(pbdev->pdev),
                                      data, len);
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
  2020-07-23 15:13 [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement Matthew Rosato
  2020-07-23 15:13 ` [RFC PATCH] s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci Matthew Rosato
@ 2020-07-23 16:29 ` Alex Williamson
  2020-07-23 18:12   ` Matthew Rosato
  2020-07-27 15:40   ` Pierre Morel
  2020-07-24  9:46 ` Niklas Schnelle
  2 siblings, 2 replies; 14+ messages in thread
From: Alex Williamson @ 2020-07-23 16:29 UTC (permalink / raw)
  To: Matthew Rosato
  Cc: schnelle, pmorel, david, cohuck, qemu-devel, pasic, borntraeger,
	qemu-s390x, rth

On Thu, 23 Jul 2020 11:13:55 -0400
Matthew Rosato <mjrosato@linux.ibm.com> wrote:

> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
> fails spectacularly, with errors in qemu like:
> 
> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: Input/output error
> 
> From read to bar 0 originating out of hw/s390x/s390-pci-inst.c:zpci_read_bar().
> 
> So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
> a bit of tracing, we seem to be triggering the new trap in
> __vfio_pci_memory_enabled().  Sure enough, if I just force this function to
> return 'true' as a test case, things work again.
> The included patch attempts to enforce the setting, which restores everything
> to working order but also triggers vfio_bar_restore() in the process....  So
> this isn't the right answer, more of a proof-of-concept.
> 
> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with this
> recent kernel change?

Bummer!  I won't claim to understand s390 PCI, but if we have a VF
exposed to the "host" (ie. the first level where vfio-pci is being
used), but we can't tell that it's a VF, how do we know whether the
memory bit in the command register is unimplemented because it's a VF
or unimplemented because the device doesn't support MMIO?  How are the
device ID, vendor ID, and BAR registers virtualized to the host?  Could
the memory enable bit also be emulated by that virtualization, much
like vfio-pci does for userspace?  If the other registers are
virtualized, but these command register bits are left unimplemented, do
we need code to deduce that we have a VF based on the existence of MMIO
BARs, but lack of memory enable bit?  Thanks,

Alex

> @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn?
> I note that my host device lspci output looks like:
> 
> 0000:00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
> 
> But the device is not marked as is_virtfn..  Otherwise, Alex's fix
> from htps://lkml.org/lkml/2020/6/25/628 should cover the case.
> 
> 
> 
> Matthew Rosato (1):
>   s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci
> 
>  hw/s390x/s390-pci-inst.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
  2020-07-23 16:29 ` [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement Alex Williamson
@ 2020-07-23 18:12   ` Matthew Rosato
  2020-07-27 15:40   ` Pierre Morel
  1 sibling, 0 replies; 14+ messages in thread
From: Matthew Rosato @ 2020-07-23 18:12 UTC (permalink / raw)
  To: Alex Williamson
  Cc: schnelle, pmorel, david, cohuck, qemu-devel, pasic, borntraeger,
	qemu-s390x, rth

On 7/23/20 12:29 PM, Alex Williamson wrote:
> On Thu, 23 Jul 2020 11:13:55 -0400
> Matthew Rosato <mjrosato@linux.ibm.com> wrote:
> 
>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
>> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
>> fails spectacularly, with errors in qemu like:
>>
>> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: Input/output error
>>
>>  From read to bar 0 originating out of hw/s390x/s390-pci-inst.c:zpci_read_bar().
>>
>> So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
>> a bit of tracing, we seem to be triggering the new trap in
>> __vfio_pci_memory_enabled().  Sure enough, if I just force this function to
>> return 'true' as a test case, things work again.
>> The included patch attempts to enforce the setting, which restores everything
>> to working order but also triggers vfio_bar_restore() in the process....  So
>> this isn't the right answer, more of a proof-of-concept.
>>
>> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with this
>> recent kernel change?
> 
> Bummer!  I won't claim to understand s390 PCI, but if we have a VF
> exposed to the "host" (ie. the first level where vfio-pci is being
> used), but we can't tell that it's a VF, how do we know whether the
> memory bit in the command register is unimplemented because it's a VF
> or unimplemented because the device doesn't support MMIO?  How are the
> device ID, vendor ID, and BAR registers virtualized to the host?  Could

On s390 this info is all advertised/accessed via special instructions 
(my PoC qemu patch was intercepting one of these sorts of instructions 
from the guest and modifying it).

> the memory enable bit also be emulated by that virtualization, much
> like vfio-pci does for userspace?  If the other registers are
> virtualized, but these command register bits are left unimplemented, do
> we need code to deduce that we have a VF based on the existence of MMIO
> BARs, but lack of memory enable bit?  Thanks,

Yeah, I'm thinking we might need something to this effect for s390 at 
least.  But I'm curious if Niklas/Pierre have any add'l thoughts about 
that since they touched the virtfn stuff recently for s390 PCI.

> 
> Alex
> 
>> @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn?
>> I note that my host device lspci output looks like:
>>
>> 0000:00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
>>
>> But the device is not marked as is_virtfn..  Otherwise, Alex's fix
>> from htps://lkml.org/lkml/2020/6/25/628 should cover the case.
>>
>>
>>
>> Matthew Rosato (1):
>>    s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci
>>
>>   hw/s390x/s390-pci-inst.c | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
  2020-07-23 15:13 [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement Matthew Rosato
  2020-07-23 15:13 ` [RFC PATCH] s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci Matthew Rosato
  2020-07-23 16:29 ` [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement Alex Williamson
@ 2020-07-24  9:46 ` Niklas Schnelle
  2020-07-24  9:53   ` Niklas Schnelle
  2020-07-24 14:15   ` Niklas Schnelle
  2 siblings, 2 replies; 14+ messages in thread
From: Niklas Schnelle @ 2020-07-24  9:46 UTC (permalink / raw)
  To: Matthew Rosato, alex.williamson, pmorel
  Cc: david, cohuck, qemu-devel, pasic, borntraeger, qemu-s390x, rth



On 7/23/20 5:13 PM, Matthew Rosato wrote:
> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
> fails spectacularly, with errors in qemu like:
> 
> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: Input/output error
> 
> From read to bar 0 originating out of hw/s390x/s390-pci-inst.c:zpci_read_bar().
> 
> So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
> a bit of tracing, we seem to be triggering the new trap in
> __vfio_pci_memory_enabled().  Sure enough, if I just force this function to
> return 'true' as a test case, things work again.
> The included patch attempts to enforce the setting, which restores everything
> to working order but also triggers vfio_bar_restore() in the process....  So
> this isn't the right answer, more of a proof-of-concept.
> 
> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with this
> recent kernel change?
> 
> @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn?
> I note that my host device lspci output looks like:
> 
> 0000:00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
> 
> But the device is not marked as is_virtfn..  Otherwise, Alex's fix
> from htps://lkml.org/lkml/2020/6/25/628 should cover the case.
With commit e5794cf1a270 ("s390/pci: create links between PFs and VFs") I introduced
the is_physfn field to struct zpci_dev which gets set through the
CLP Query PCI Function. Also with that commit this being 0 will set
is_virtfn to 1.
Interestingly looking at s390-pci-inst.c in QEMU I'd think that
on QEMU this should already be 0 and thus is_virtfn should be set
with Linux >5.8-rc1 and the missing case is actually for passing through
a PF where it would wrongly be 0 too. 
Note: If the Linux instance does not see the
parent PF however the only way I know to test if it is a VF from userspace
is checking if /sys/bus/pci/devices/<dev>/vfn is non-zero which is platform
specific and currently wrongly set 0 on QEMU for VFs.
If the PF is known the mentioned commit will also create the
/sys/bus/pci/devices/<dev>/physfn symlink as on other platforms.
> Matthew Rosato (1):
>   s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci
> 
>  hw/s390x/s390-pci-inst.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
  2020-07-24  9:46 ` Niklas Schnelle
@ 2020-07-24  9:53   ` Niklas Schnelle
  2020-07-24 14:15   ` Niklas Schnelle
  1 sibling, 0 replies; 14+ messages in thread
From: Niklas Schnelle @ 2020-07-24  9:53 UTC (permalink / raw)
  To: Matthew Rosato, alex.williamson, pmorel
  Cc: david, cohuck, qemu-devel, pasic, borntraeger, qemu-s390x, rth



On 7/24/20 11:46 AM, Niklas Schnelle wrote:
> 
> 
> On 7/23/20 5:13 PM, Matthew Rosato wrote:
>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
>> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
>> fails spectacularly, with errors in qemu like:
>>
>> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: Input/output error
>>
>> From read to bar 0 originating out of hw/s390x/s390-pci-inst.c:zpci_read_bar().
>>
>> So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
>> a bit of tracing, we seem to be triggering the new trap in
>> __vfio_pci_memory_enabled().  Sure enough, if I just force this function to
>> return 'true' as a test case, things work again.
>> The included patch attempts to enforce the setting, which restores everything
>> to working order but also triggers vfio_bar_restore() in the process....  So
>> this isn't the right answer, more of a proof-of-concept.
>>
>> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with this
>> recent kernel change?
>>
>> @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn?
>> I note that my host device lspci output looks like:
>>
>> 0000:00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
>>
>> But the device is not marked as is_virtfn..  Otherwise, Alex's fix
>> from htps://lkml.org/lkml/2020/6/25/628 should cover the case.
> With commit e5794cf1a270 ("s390/pci: create links between PFs and VFs") I introduced
> the is_physfn field to struct zpci_dev which gets set through the
> CLP Query PCI Function. Also with that commit this being 0 will set
> is_virtfn to 1.
> Interestingly looking at s390-pci-inst.c in QEMU I'd think that
> on QEMU this should already be 0 and thus is_virtfn should be set
> with Linux >5.8-rc1 and the missing case is actually for passing through
> a PF where it would wrongly be 0 too. 
> Note: If the Linux instance does not see the
> parent PF however the only way I know to test if it is a VF from userspace
> is checking if /sys/bus/pci/devices/<dev>/vfn is non-zero which is platform
> specific and currently wrongly set 0 on QEMU for VFs.
> If the PF is known the mentioned commit will also create the
> /sys/bus/pci/devices/<dev>/physfn symlink as on other platforms.
Arghh, sorry the problem is of course that is_virtfn is not set in
the host. I thought it should be but testing this the is_physfn
bit is actually non-zero for RoCEs on z/VM and LPAR. I will
try figuring out why that is, I guess I should have used the
vfn field instead but I thought is_physfn would be more explicit :-(
>> Matthew Rosato (1):
>>   s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci
>>
>>  hw/s390x/s390-pci-inst.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
  2020-07-24  9:46 ` Niklas Schnelle
  2020-07-24  9:53   ` Niklas Schnelle
@ 2020-07-24 14:15   ` Niklas Schnelle
  1 sibling, 0 replies; 14+ messages in thread
From: Niklas Schnelle @ 2020-07-24 14:15 UTC (permalink / raw)
  To: Matthew Rosato, alex.williamson, pmorel
  Cc: david, cohuck, qemu-devel, pasic, borntraeger, qemu-s390x, rth



On 7/24/20 11:46 AM, Niklas Schnelle wrote:
> 
> 
> On 7/23/20 5:13 PM, Matthew Rosato wrote:
>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
>> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
>> fails spectacularly, with errors in qemu like:
>>
>> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: Input/output error
>>
>> From read to bar 0 originating out of hw/s390x/s390-pci-inst.c:zpci_read_bar().
>>
>> So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
>> a bit of tracing, we seem to be triggering the new trap in
>> __vfio_pci_memory_enabled().  Sure enough, if I just force this function to
>> return 'true' as a test case, things work again.
>> The included patch attempts to enforce the setting, which restores everything
>> to working order but also triggers vfio_bar_restore() in the process....  So
>> this isn't the right answer, more of a proof-of-concept.
>>
>> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with this
>> recent kernel change?
>>
>> @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn?
>> I note that my host device lspci output looks like:
>>
>> 0000:00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
>>
>> But the device is not marked as is_virtfn..  Otherwise, Alex's fix
>> from htps://lkml.org/lkml/2020/6/25/628 should cover the case.
> With commit e5794cf1a270 ("s390/pci: create links between PFs and VFs") I introduced
> the is_physfn field to struct zpci_dev which gets set through the
> CLP Query PCI Function. Also with that commit this being 0 will set
> is_virtfn to 1.
> Interestingly looking at s390-pci-inst.c in QEMU I'd think that
> on QEMU this should already be 0 and thus is_virtfn should be set
> with Linux >5.8-rc1 and the missing case is actually for passing through
> a PF where it would wrongly be 0 too. 
> Note: If the Linux instance does not see the
> parent PF however the only way I know to test if it is a VF from userspace
> is checking if /sys/bus/pci/devices/<dev>/vfn is non-zero which is platform
> specific and currently wrongly set 0 on QEMU for VFs.
> If the PF is known the mentioned commit will also create the
> /sys/bus/pci/devices/<dev>/physfn symlink as on other platforms.
Ok I've been looking into this more and is_physfn is correctly set
to 0 on RoCE VFs, there is just a Bug in zPCI preventing the
is_virtfn from beeing set when the device is found in
CLP List PCI devices (equivalent too boot time bus probing
on other architectures) instead of beeing hot plugged later
which is the case for sriov_numvfs.
I'm working on a fix for a bug with PF/VF linking anyway so
should get that fixed then.
>> Matthew Rosato (1):
>>   s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci
>>
>>  hw/s390x/s390-pci-inst.c | 10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
  2020-07-23 16:29 ` [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement Alex Williamson
  2020-07-23 18:12   ` Matthew Rosato
@ 2020-07-27 15:40   ` Pierre Morel
  2020-07-27 16:37     ` Matthew Rosato
                       ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: Pierre Morel @ 2020-07-27 15:40 UTC (permalink / raw)
  To: Alex Williamson, Matthew Rosato
  Cc: schnelle, david, cohuck, qemu-devel, pasic, borntraeger, qemu-s390x, rth



On 2020-07-23 18:29, Alex Williamson wrote:
> On Thu, 23 Jul 2020 11:13:55 -0400
> Matthew Rosato <mjrosato@linux.ibm.com> wrote:
> 
>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
>> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
>> fails spectacularly, with errors in qemu like:
>>
>> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: Input/output error
>>
>>  From read to bar 0 originating out of hw/s390x/s390-pci-inst.c:zpci_read_bar().
>>
>> So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
>> a bit of tracing, we seem to be triggering the new trap in
>> __vfio_pci_memory_enabled().  Sure enough, if I just force this function to
>> return 'true' as a test case, things work again.
>> The included patch attempts to enforce the setting, which restores everything
>> to working order but also triggers vfio_bar_restore() in the process....  So
>> this isn't the right answer, more of a proof-of-concept.
>>
>> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with this
>> recent kernel change?
> 
> Bummer!  I won't claim to understand s390 PCI, but if we have a VF
> exposed to the "host" (ie. the first level where vfio-pci is being
> used), but we can't tell that it's a VF, how do we know whether the
> memory bit in the command register is unimplemented because it's a VF
> or unimplemented because the device doesn't support MMIO?  How are the
> device ID, vendor ID, and BAR registers virtualized to the host?  Could
> the memory enable bit also be emulated by that virtualization, much
> like vfio-pci does for userspace?  If the other registers are
> virtualized, but these command register bits are left unimplemented, do
> we need code to deduce that we have a VF based on the existence of MMIO
> BARs, but lack of memory enable bit?  Thanks,
> 
> Alex

Alex, Matt,

in s390 we have the possibility to assign a virtual function to a 
logical partition as function 0.
In this case it can not be treated as a virtual function but must be 
treated as a physical function.
This is currently working very well.
However, these functions do not set PCI_COMMAND_MEMORY as we need.

Shouldn't we fix this inside the kernel, to keep older QMEU working?

Then would it be OK to add a new bit/boolean inside the 
pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during 
enumeration and test inside __vfio_pci_memory_enabled() to return true?

In the enumeration we have the possibility to know if the function is a 
HW/Firmware virtual function on devfn 0 or if it is created by SRIOV.

It seems an easy fix without side effects.

What do you think?


> 
>> @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn?
>> I note that my host device lspci output looks like:
>>
>> 0000:00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
>>
>> But the device is not marked as is_virtfn..  Otherwise, Alex's fix
>> from htps://lkml.org/lkml/2020/6/25/628 should cover the case.
>>

I do not think we can fix this problem by forcing the is_virtfn bit.
AFAIU, our HW virtual function works a lot like a physical function.

>>
>>
>> Matthew Rosato (1):
>>    s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci
>>
>>   hw/s390x/s390-pci-inst.c | 10 ++++++++++
>>   1 file changed, 10 insertions(+)
>>
> 
> 

Regards,
Pierre

-- 
Pierre Morel
IBM Lab Boeblingen


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
  2020-07-27 15:40   ` Pierre Morel
@ 2020-07-27 16:37     ` Matthew Rosato
  2020-07-27 16:47     ` Alex Williamson
  2020-07-28  8:59     ` Niklas Schnelle
  2 siblings, 0 replies; 14+ messages in thread
From: Matthew Rosato @ 2020-07-27 16:37 UTC (permalink / raw)
  To: Pierre Morel, Alex Williamson
  Cc: schnelle, david, cohuck, qemu-devel, pasic, borntraeger, qemu-s390x, rth

On 7/27/20 11:40 AM, Pierre Morel wrote:
> 
> 
> On 2020-07-23 18:29, Alex Williamson wrote:
>> On Thu, 23 Jul 2020 11:13:55 -0400
>> Matthew Rosato <mjrosato@linux.ibm.com> wrote:
>>
>>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
>>> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
>>> fails spectacularly, with errors in qemu like:
>>>
>>> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) 
>>> failed: Input/output error
>>>
>>>  From read to bar 0 originating out of 
>>> hw/s390x/s390-pci-inst.c:zpci_read_bar().
>>>
>>> So, I'm trying to figure out how to get vfio-pci happy again on 
>>> s390x.  From
>>> a bit of tracing, we seem to be triggering the new trap in
>>> __vfio_pci_memory_enabled().  Sure enough, if I just force this 
>>> function to
>>> return 'true' as a test case, things work again.
>>> The included patch attempts to enforce the setting, which restores 
>>> everything
>>> to working order but also triggers vfio_bar_restore() in the 
>>> process....  So
>>> this isn't the right answer, more of a proof-of-concept.
>>>
>>> @Alex: Any guidance on what needs to happen to make qemu-s390x happy 
>>> with this
>>> recent kernel change?
>>
>> Bummer!  I won't claim to understand s390 PCI, but if we have a VF
>> exposed to the "host" (ie. the first level where vfio-pci is being
>> used), but we can't tell that it's a VF, how do we know whether the
>> memory bit in the command register is unimplemented because it's a VF
>> or unimplemented because the device doesn't support MMIO?  How are the
>> device ID, vendor ID, and BAR registers virtualized to the host?  Could
>> the memory enable bit also be emulated by that virtualization, much
>> like vfio-pci does for userspace?  If the other registers are
>> virtualized, but these command register bits are left unimplemented, do
>> we need code to deduce that we have a VF based on the existence of MMIO
>> BARs, but lack of memory enable bit?  Thanks,
>>
>> Alex
> 
> Alex, Matt,
> 
> in s390 we have the possibility to assign a virtual function to a 
> logical partition as function 0.
> In this case it can not be treated as a virtual function but must be 
> treated as a physical function.
> This is currently working very well.
> However, these functions do not set PCI_COMMAND_MEMORY as we need.
> 

Thanks for the explanation.

> Shouldn't we fix this inside the kernel, to keep older QMEU working?

Yes, ideally.

> 
> Then would it be OK to add a new bit/boolean inside the 
> pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during 
> enumeration and test inside __vfio_pci_memory_enabled() to return true?
> 
> In the enumeration we have the possibility to know if the function is a 
> HW/Firmware virtual function on devfn 0 or if it is created by SRIOV.
> 
> It seems an easy fix without side effects.
> 
> What do you think?
> 

 From my perspective at least, this would resolve the issue and is 
in-line with the sort of suggestion Alex floated the other day of "do
we need code to deduce that we have a VF based on the existence of MMIO
BARs, but lack of memory enable bit?" -- it just sounds like a different 
way of coming to the conclusion.  Effectively we need a way to say: 'for 
the purposes of memory_enable detection, treat this thing like a virtual 
function because it is one, even though it doesn't always act like one'

> 
>>
>>> @Nilkas/@Pierre: I wonder if this might be related to host device 
>>> is_virtfn?
>>> I note that my host device lspci output looks like:
>>>
>>> 0000:00:00.0 Ethernet controller: Mellanox Technologies MT27710 
>>> Family [ConnectX-4 Lx Virtual Function]
>>>
>>> But the device is not marked as is_virtfn..  Otherwise, Alex's fix
>>> from htps://lkml.org/lkml/2020/6/25/628 should cover the case.
>>>
> 
> I do not think we can fix this problem by forcing the is_virtfn bit.
> AFAIU, our HW virtual function works a lot like a physical function.
> 
>>>
>>>
>>> Matthew Rosato (1):
>>>    s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci
>>>
>>>   hw/s390x/s390-pci-inst.c | 10 ++++++++++
>>>   1 file changed, 10 insertions(+)
>>>
>>
>>
> 
> Regards,
> Pierre
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
  2020-07-27 15:40   ` Pierre Morel
  2020-07-27 16:37     ` Matthew Rosato
@ 2020-07-27 16:47     ` Alex Williamson
  2020-07-28  9:33       ` Niklas Schnelle
  2020-07-28  8:59     ` Niklas Schnelle
  2 siblings, 1 reply; 14+ messages in thread
From: Alex Williamson @ 2020-07-27 16:47 UTC (permalink / raw)
  To: Pierre Morel
  Cc: schnelle, Matthew Rosato, david, cohuck, qemu-devel, pasic,
	borntraeger, qemu-s390x, rth

On Mon, 27 Jul 2020 17:40:39 +0200
Pierre Morel <pmorel@linux.ibm.com> wrote:

> On 2020-07-23 18:29, Alex Williamson wrote:
> > On Thu, 23 Jul 2020 11:13:55 -0400
> > Matthew Rosato <mjrosato@linux.ibm.com> wrote:
> >   
> >> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
> >> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
> >> fails spectacularly, with errors in qemu like:
> >>
> >> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: Input/output error
> >>
> >>  From read to bar 0 originating out of hw/s390x/s390-pci-inst.c:zpci_read_bar().
> >>
> >> So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
> >> a bit of tracing, we seem to be triggering the new trap in
> >> __vfio_pci_memory_enabled().  Sure enough, if I just force this function to
> >> return 'true' as a test case, things work again.
> >> The included patch attempts to enforce the setting, which restores everything
> >> to working order but also triggers vfio_bar_restore() in the process....  So
> >> this isn't the right answer, more of a proof-of-concept.
> >>
> >> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with this
> >> recent kernel change?  
> > 
> > Bummer!  I won't claim to understand s390 PCI, but if we have a VF
> > exposed to the "host" (ie. the first level where vfio-pci is being
> > used), but we can't tell that it's a VF, how do we know whether the
> > memory bit in the command register is unimplemented because it's a VF
> > or unimplemented because the device doesn't support MMIO?  How are the
> > device ID, vendor ID, and BAR registers virtualized to the host?  Could
> > the memory enable bit also be emulated by that virtualization, much
> > like vfio-pci does for userspace?  If the other registers are
> > virtualized, but these command register bits are left unimplemented, do
> > we need code to deduce that we have a VF based on the existence of MMIO
> > BARs, but lack of memory enable bit?  Thanks,
> > 
> > Alex  
> 
> Alex, Matt,
> 
> in s390 we have the possibility to assign a virtual function to a 
> logical partition as function 0.
> In this case it can not be treated as a virtual function but must be 
> treated as a physical function.
> This is currently working very well.
> However, these functions do not set PCI_COMMAND_MEMORY as we need.

Where is the vendor and device ID virtualization done for these
devices, we can't have a PF with IDs 0000:0000.

> Shouldn't we fix this inside the kernel, to keep older QMEU working?
> 
> Then would it be OK to add a new bit/boolean inside the 
> pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during 
> enumeration and test inside __vfio_pci_memory_enabled() to return true?

Probably each instance of is_virtfn in vfio-pci should be looked at to
see if it applies to s390.  If we're going to recognize this as a VF,
I'd rather we complete the emulation that the lower level hypervisor
has missed.  If we can enable all the is_virtfn code on s390, then we
should probably cache is_virtfn on the vfio_pci_device object and allow
s390 a place to set it once at probe or enable time.

> In the enumeration we have the possibility to know if the function is a 
> HW/Firmware virtual function on devfn 0 or if it is created by SRIOV.
> 
> It seems an easy fix without side effects.
> 
> What do you think?

It sure seems preferable to recognize that it is a VF in the kernel
than to require userspace to have arch specific hacks.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
  2020-07-27 15:40   ` Pierre Morel
  2020-07-27 16:37     ` Matthew Rosato
  2020-07-27 16:47     ` Alex Williamson
@ 2020-07-28  8:59     ` Niklas Schnelle
  2 siblings, 0 replies; 14+ messages in thread
From: Niklas Schnelle @ 2020-07-28  8:59 UTC (permalink / raw)
  To: Pierre Morel, Alex Williamson, Matthew Rosato
  Cc: david, cohuck, qemu-devel, pasic, borntraeger, qemu-s390x, rth



On 7/27/20 5:40 PM, Pierre Morel wrote:
> 
> 
> On 2020-07-23 18:29, Alex Williamson wrote:
>> On Thu, 23 Jul 2020 11:13:55 -0400
>> Matthew Rosato <mjrosato@linux.ibm.com> wrote:
>>
>>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
>>> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
>>> fails spectacularly, with errors in qemu like:
>>>
>>> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: Input/output error
>>>
>>>  From read to bar 0 originating out of hw/s390x/s390-pci-inst.c:zpci_read_bar().
>>>
>>> So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
>>> a bit of tracing, we seem to be triggering the new trap in
>>> __vfio_pci_memory_enabled().  Sure enough, if I just force this function to
>>> return 'true' as a test case, things work again.
>>> The included patch attempts to enforce the setting, which restores everything
>>> to working order but also triggers vfio_bar_restore() in the process....  So
>>> this isn't the right answer, more of a proof-of-concept.
>>>
>>> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with this
>>> recent kernel change?
>>
>> Bummer!  I won't claim to understand s390 PCI, but if we have a VF
>> exposed to the "host" (ie. the first level where vfio-pci is being
>> used), but we can't tell that it's a VF, how do we know whether the
>> memory bit in the command register is unimplemented because it's a VF
>> or unimplemented because the device doesn't support MMIO?  How are the
>> device ID, vendor ID, and BAR registers virtualized to the host?  Could
>> the memory enable bit also be emulated by that virtualization, much
>> like vfio-pci does for userspace?  If the other registers are
>> virtualized, but these command register bits are left unimplemented, do
>> we need code to deduce that we have a VF based on the existence of MMIO
>> BARs, but lack of memory enable bit?  Thanks,
>>
>> Alex
> 
> Alex, Matt,
> 
> in s390 we have the possibility to assign a virtual function to a logical partition as function 0.
> In this case it can not be treated as a virtual function but must be treated as a physical function.
> This is currently working very well.
Can you explain why it must be treated as a physical function and must not have is_virtfn set?
I'm currently reworking my fix for PF/VF linking not happening for all ways to attach a
VF and in that I intend to set is_virtfn = 1 also for VFs that are not linked with a PF
including those attached to an LPAR.

So far I really can not see a reason why that should not work since I was wrong before
and Firmware does tell us that these are indeed VFs (zdev->is_physfn == 0).
AFAIK on nearly all platforms guests will often have a VF as function zero on a bus
because that is what I expect to happen if you pass it through as a PCI function.
So unless I'm missing something, that just makes LPAR look more like a QEMU guest on
another platform which is very likely much more well tested than treating a VF as
a PF as we have been doing.
> However, these functions do not set PCI_COMMAND_MEMORY as we need.
> 
> Shouldn't we fix this inside the kernel, to keep older QMEU working?
> 
> Then would it be OK to add a new bit/boolean inside the pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during enumeration and test inside __vfio_pci_memory_enabled() to return true?
This does not make sense to me, as I wrote above it's totally normal for VMs to see VFs detached
from the PF as they are passed-through to a QEMU guest so IMHO that's already covered by the meaning
of is_virtfn.
> 
> In the enumeration we have the possibility to know if the function is a HW/Firmware virtual function on devfn 0 or if it is created by SRIOV.
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
  2020-07-27 16:47     ` Alex Williamson
@ 2020-07-28  9:33       ` Niklas Schnelle
  2020-07-28 12:52         ` Alex Williamson
  0 siblings, 1 reply; 14+ messages in thread
From: Niklas Schnelle @ 2020-07-28  9:33 UTC (permalink / raw)
  To: Alex Williamson, Pierre Morel
  Cc: Matthew Rosato, david, cohuck, qemu-devel, pasic, borntraeger,
	qemu-s390x, rth



On 7/27/20 6:47 PM, Alex Williamson wrote:
> On Mon, 27 Jul 2020 17:40:39 +0200
> Pierre Morel <pmorel@linux.ibm.com> wrote:
> 
>> On 2020-07-23 18:29, Alex Williamson wrote:
>>> On Thu, 23 Jul 2020 11:13:55 -0400
>>> Matthew Rosato <mjrosato@linux.ibm.com> wrote:
>>>   
>>>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
>>>> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
>>>> fails spectacularly, with errors in qemu like:
... snip ...
>>
>> Alex, Matt,
>>
>> in s390 we have the possibility to assign a virtual function to a 
>> logical partition as function 0.
>> In this case it can not be treated as a virtual function but must be 
>> treated as a physical function.
>> This is currently working very well.
>> However, these functions do not set PCI_COMMAND_MEMORY as we need.
> 
> Where is the vendor and device ID virtualization done for these
> devices, we can't have a PF with IDs 0000:0000.
Pierre doesn't mean the Device/Vendor IDs he means it has devfn == 0
so it is the mandatory function zero on it's PCI bus, where until recently
we always had only one function per bus but with the recent multi-function
support it can act more like on other platforms with several PCI functions
sharing the same Bus e.g. a PF and the VFs created through sriov_numvfs.
That's why I'm saying that having devfn == 0 should not be very special for a VF
passed to a VM and I really don't see where it would not act like a VF passed
from any other Hypervisor.

The only really tricky part in my opinion is where during the "probing"
we do set is_virtfn so it happens both for VFs passed-through from z/VM
or LPAR and VFs created through sriov_numvfs which unlike on other platforms
are also scanned by Firmware (pdev->no_vf_scan disables the Linux side scanning).
With the fix I'm currently testing I had to do this in pcibios_enable_device()
because I also create sysfs links between VFs and their parent PFs and those
need the sysfs entries to be already created, which makes the more apropriately
sound pcibios_bus_add_device() too early.
> 
>> Shouldn't we fix this inside the kernel, to keep older QMEU working?
>>
>> Then would it be OK to add a new bit/boolean inside the 
>> pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during 
>> enumeration and test inside __vfio_pci_memory_enabled() to return true?
> 
> Probably each instance of is_virtfn in vfio-pci should be looked at to
> see if it applies to s390.  If we're going to recognize this as a VF,
> I'd rather we complete the emulation that the lower level hypervisor
> has missed.  If we can enable all the is_virtfn code on s390, then we
> should probably cache is_virtfn on the vfio_pci_device object and allow
> s390 a place to set it once at probe or enable time.
> 
>> In the enumeration we have the possibility to know if the function is a 
>> HW/Firmware virtual function on devfn 0 or if it is created by SRIOV.
>>
>> It seems an easy fix without side effects.
>>
>> What do you think?
> 
> It sure seems preferable to recognize that it is a VF in the kernel
> than to require userspace to have arch specific hacks.  Thanks,
> 
> Alex
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
  2020-07-28  9:33       ` Niklas Schnelle
@ 2020-07-28 12:52         ` Alex Williamson
  2020-07-28 14:13           ` Niklas Schnelle
  0 siblings, 1 reply; 14+ messages in thread
From: Alex Williamson @ 2020-07-28 12:52 UTC (permalink / raw)
  To: Niklas Schnelle
  Cc: Pierre Morel, Matthew Rosato, david, cohuck, qemu-devel, pasic,
	borntraeger, qemu-s390x, rth

On Tue, 28 Jul 2020 11:33:55 +0200
Niklas Schnelle <schnelle@linux.ibm.com> wrote:

> On 7/27/20 6:47 PM, Alex Williamson wrote:
> > On Mon, 27 Jul 2020 17:40:39 +0200
> > Pierre Morel <pmorel@linux.ibm.com> wrote:
> >   
> >> On 2020-07-23 18:29, Alex Williamson wrote:  
> >>> On Thu, 23 Jul 2020 11:13:55 -0400
> >>> Matthew Rosato <mjrosato@linux.ibm.com> wrote:
> >>>     
> >>>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
> >>>> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
> >>>> fails spectacularly, with errors in qemu like:  
> ... snip ...
> >>
> >> Alex, Matt,
> >>
> >> in s390 we have the possibility to assign a virtual function to a 
> >> logical partition as function 0.
> >> In this case it can not be treated as a virtual function but must be 
> >> treated as a physical function.
> >> This is currently working very well.
> >> However, these functions do not set PCI_COMMAND_MEMORY as we need.  
> > 
> > Where is the vendor and device ID virtualization done for these
> > devices, we can't have a PF with IDs 0000:0000.  
> Pierre doesn't mean the Device/Vendor IDs he means it has devfn == 0
> so it is the mandatory function zero on it's PCI bus, where until recently
> we always had only one function per bus but with the recent multi-function
> support it can act more like on other platforms with several PCI functions
> sharing the same Bus e.g. a PF and the VFs created through sriov_numvfs.
> That's why I'm saying that having devfn == 0 should not be very special for a VF
> passed to a VM and I really don't see where it would not act like a VF passed
> from any other Hypervisor.

My question is relative to other registers on VFs that are not
implemented in hardware, not the device address.  In addition to the
memory bit of the command register, SR-IOV VFs do not implement the
vendor and device IDs, these are to be virtualized from the values
provided in the PF SR-IOV capability.  It wouldn't make much sense to
expose a device with no PCI vendor or device ID, so I assume the z/VM
hypervisor is virtualizing these, but not the memory bit.
 
> The only really tricky part in my opinion is where during the "probing"
> we do set is_virtfn so it happens both for VFs passed-through from z/VM
> or LPAR and VFs created through sriov_numvfs which unlike on other platforms
> are also scanned by Firmware (pdev->no_vf_scan disables the Linux side scanning).
> With the fix I'm currently testing I had to do this in pcibios_enable_device()
> because I also create sysfs links between VFs and their parent PFs and those
> need the sysfs entries to be already created, which makes the more apropriately
> sound pcibios_bus_add_device() too early.


I don't think it would be wise to set is_virtfn without a physfn being
present in the OS view.  I believe pci_dev.is_virtfn implies
pci_dev.physfn points to the PF device.  Thanks,

Alex

> >> Shouldn't we fix this inside the kernel, to keep older QMEU working?
> >>
> >> Then would it be OK to add a new bit/boolean inside the 
> >> pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during 
> >> enumeration and test inside __vfio_pci_memory_enabled() to return true?  
> > 
> > Probably each instance of is_virtfn in vfio-pci should be looked at to
> > see if it applies to s390.  If we're going to recognize this as a VF,
> > I'd rather we complete the emulation that the lower level hypervisor
> > has missed.  If we can enable all the is_virtfn code on s390, then we
> > should probably cache is_virtfn on the vfio_pci_device object and allow
> > s390 a place to set it once at probe or enable time.
> >   
> >> In the enumeration we have the possibility to know if the function is a 
> >> HW/Firmware virtual function on devfn 0 or if it is created by SRIOV.
> >>
> >> It seems an easy fix without side effects.
> >>
> >> What do you think?  
> > 
> > It sure seems preferable to recognize that it is a VF in the kernel
> > than to require userspace to have arch specific hacks.  Thanks,
> > 
> > Alex
> >   
> 



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
  2020-07-28 12:52         ` Alex Williamson
@ 2020-07-28 14:13           ` Niklas Schnelle
  0 siblings, 0 replies; 14+ messages in thread
From: Niklas Schnelle @ 2020-07-28 14:13 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Pierre Morel, Matthew Rosato, david, cohuck, qemu-devel, pasic,
	borntraeger, qemu-s390x, rth



On 7/28/20 2:52 PM, Alex Williamson wrote:
> On Tue, 28 Jul 2020 11:33:55 +0200
> Niklas Schnelle <schnelle@linux.ibm.com> wrote:
> 
>> On 7/27/20 6:47 PM, Alex Williamson wrote:
>>> On Mon, 27 Jul 2020 17:40:39 +0200
>>> Pierre Morel <pmorel@linux.ibm.com> wrote:
>>>   
>>>> On 2020-07-23 18:29, Alex Williamson wrote:  
>>>>> On Thu, 23 Jul 2020 11:13:55 -0400
>>>>> Matthew Rosato <mjrosato@linux.ibm.com> wrote:
>>>>>     
>>>>>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
>>>>>> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
>>>>>> fails spectacularly, with errors in qemu like:  
>> ... snip ...
>>>>
>>>> Alex, Matt,
>>>>
>>>> in s390 we have the possibility to assign a virtual function to a 
>>>> logical partition as function 0.
>>>> In this case it can not be treated as a virtual function but must be 
>>>> treated as a physical function.
>>>> This is currently working very well.
>>>> However, these functions do not set PCI_COMMAND_MEMORY as we need.  
>>>
>>> Where is the vendor and device ID virtualization done for these
>>> devices, we can't have a PF with IDs 0000:0000.  
>> Pierre doesn't mean the Device/Vendor IDs he means it has devfn == 0
>> so it is the mandatory function zero on it's PCI bus, where until recently
>> we always had only one function per bus but with the recent multi-function
>> support it can act more like on other platforms with several PCI functions
>> sharing the same Bus e.g. a PF and the VFs created through sriov_numvfs.
>> That's why I'm saying that having devfn == 0 should not be very special for a VF
>> passed to a VM and I really don't see where it would not act like a VF passed
>> from any other Hypervisor.
> 
> My question is relative to other registers on VFs that are not
> implemented in hardware, not the device address.  In addition to the
> memory bit of the command register, SR-IOV VFs do not implement the
> vendor and device IDs, these are to be virtualized from the values
> provided in the PF SR-IOV capability.  It wouldn't make much sense to
> expose a device with no PCI vendor or device ID, so I assume the z/VM
> hypervisor is virtualizing these, but not the memory bit.
Ahh, yes I see. On Z these are actually already virtualized at the LPAR
level as part of the firmware based scanning I mentioned that is the
reason for pdev->no_vf_scan. This is true even for VFs created
through sriov_numvfs in the host which is why I did not realize these
don't come from hardware, even though looking at drivers/pci/iov.c it's
pretty obvious and I did touch that code before. Sorry for the confusion.
>  
>> The only really tricky part in my opinion is where during the "probing"
>> we do set is_virtfn so it happens both for VFs passed-through from z/VM
>> or LPAR and VFs created through sriov_numvfs which unlike on other platforms
>> are also scanned by Firmware (pdev->no_vf_scan disables the Linux side scanning).
>> With the fix I'm currently testing I had to do this in pcibios_enable_device()
>> because I also create sysfs links between VFs and their parent PFs and those
>> need the sysfs entries to be already created, which makes the more apropriately
>> sound pcibios_bus_add_device() too early.
> 
> 
> I don't think it would be wise to set is_virtfn without a physfn being
> present in the OS view.  I believe pci_dev.is_virtfn implies
> pci_dev.physfn points to the PF device.  Thanks> 
> Alex
Thank you a lot for that hint, you're absolutely right, while the
drivers do work with is_virtfn == 1 && physffn == NULL
vfio-pci gets very confused. And sorry Pierre for doubting
your is_virtfn_detached idea, this thing really is confusing.
> 
>>>> Shouldn't we fix this inside the kernel, to keep older QMEU working?
>>>>
>>>> Then would it be OK to add a new bit/boolean inside the 
>>>> pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during 
>>>> enumeration and test inside __vfio_pci_memory_enabled() to return true?  
>>>
>>> Probably each instance of is_virtfn in vfio-pci should be looked at to
>>> see if it applies to s390.  If we're going to recognize this as a VF,
>>> I'd rather we complete the emulation that the lower level hypervisor
>>> has missed.  If we can enable all the is_virtfn code on s390, then we
>>> should probably cache is_virtfn on the vfio_pci_device object and allow
>>> s390 a place to set it once at probe or enable time.
>>>   
>>>> In the enumeration we have the possibility to know if the function is a 
>>>> HW/Firmware virtual function on devfn 0 or if it is created by SRIOV.
>>>>
>>>> It seems an easy fix without side effects.
>>>>
>>>> What do you think?  
>>>
>>> It sure seems preferable to recognize that it is a VF in the kernel
>>> than to require userspace to have arch specific hacks.  Thanks,
>>>
>>> Alex
>>>   
>>
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-07-28 14:15 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-23 15:13 [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement Matthew Rosato
2020-07-23 15:13 ` [RFC PATCH] s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci Matthew Rosato
2020-07-23 16:29 ` [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement Alex Williamson
2020-07-23 18:12   ` Matthew Rosato
2020-07-27 15:40   ` Pierre Morel
2020-07-27 16:37     ` Matthew Rosato
2020-07-27 16:47     ` Alex Williamson
2020-07-28  9:33       ` Niklas Schnelle
2020-07-28 12:52         ` Alex Williamson
2020-07-28 14:13           ` Niklas Schnelle
2020-07-28  8:59     ` Niklas Schnelle
2020-07-24  9:46 ` Niklas Schnelle
2020-07-24  9:53   ` Niklas Schnelle
2020-07-24 14:15   ` Niklas Schnelle

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.