All of lore.kernel.org
* [Qemu-devel] The results of lspci are inconsistent between vfio reset pci devices and reset devices by sysfs interface
@ 2018-10-09 12:11 Wuzongyong (Euler Dept)
  2018-10-09 15:08 ` Alex Williamson
  0 siblings, 1 reply; 5+ messages in thread
From: Wuzongyong (Euler Dept) @ 2018-10-09 12:11 UTC
  To: qemu-devel
  Cc: Alex Williamson, libvir-list, Chenhaiwu (Euler), Wanzongshun (Vincent)

Hi,

I start a virtual machine with the following command line:
    /usr/libexec/qemu-kvm --enable-kvm -smp 8 -m 8192 -device vfio-pci,host=0000:81:00.0

Then I use gdb to pause the QEMU process before it enters the main_loop function.
At this point, lspci shows that the regions are disabled:
    81:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
        Subsystem: NVIDIA Corporation Device 118f
        Physical Slot: 0-6
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Interrupt: pin A routed to IRQ 35
        NUMA node: 1
        Region 0: Memory at c8000000 (32-bit, non-prefetchable) [disabled] [size=16M]
        Region 1: Memory at 27800000000 (64-bit, prefetchable) [disabled] [size=16G]
        Region 3: Memory at 27c00000000 (64-bit, prefetchable) [disabled] [size=32M]

But after running
    echo 1 > /sys/bus/pci/devices/0000:81:00.0/reset
lspci shows that the regions are *not* disabled:
    81:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
        Subsystem: Huawei Technologies Co., Ltd. Device 2061
        Physical Slot: 0-6
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 7
        NUMA node: 1
        Region 0: Memory at c8000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 27800000000 (64-bit, prefetchable) [size=16G]
        Region 3: Memory at 27c00000000 (64-bit, prefetchable) [size=32M]
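To make the comparison concrete, the two "Control:" lines can be mapped back to raw Command register values. A minimal C sketch, assuming the standard Command register bit layout (the macro values match Linux <linux/pci_regs.h> but are redefined locally so the block compiles on its own); format_command() is a hypothetical helper, not part of lspci:

```c
#include <stdint.h>
#include <stdio.h>

/* Command register bits (same values as Linux <linux/pci_regs.h>). */
#define PCI_COMMAND_IO           0x001
#define PCI_COMMAND_MEMORY       0x002
#define PCI_COMMAND_MASTER       0x004
#define PCI_COMMAND_SPECIAL      0x008
#define PCI_COMMAND_INVALIDATE   0x010
#define PCI_COMMAND_VGA_PALETTE  0x020
#define PCI_COMMAND_PARITY       0x040
#define PCI_COMMAND_WAIT         0x080
#define PCI_COMMAND_SERR         0x100
#define PCI_COMMAND_FAST_BACK    0x200
#define PCI_COMMAND_INTX_DISABLE 0x400

/* Flag names in the order lspci prints them on its "Control:" line. */
static const struct {
    uint16_t bit;
    const char *name;
} cmd_flags[] = {
    { PCI_COMMAND_IO, "I/O" },           { PCI_COMMAND_MEMORY, "Mem" },
    { PCI_COMMAND_MASTER, "BusMaster" }, { PCI_COMMAND_SPECIAL, "SpecCycle" },
    { PCI_COMMAND_INVALIDATE, "MemWINV" },
    { PCI_COMMAND_VGA_PALETTE, "VGASnoop" },
    { PCI_COMMAND_PARITY, "ParErr" },    { PCI_COMMAND_WAIT, "Stepping" },
    { PCI_COMMAND_SERR, "SERR" },        { PCI_COMMAND_FAST_BACK, "FastB2B" },
    { PCI_COMMAND_INTX_DISABLE, "DisINTx" },
};

/* Format a Command register value like lspci's "Control:" line.
 * The output is always NUL-terminated (possibly truncated). */
static void format_command(uint16_t cmd, char *out, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof(cmd_flags) / sizeof(cmd_flags[0]); i++) {
        int n = snprintf(out + used, len - used, "%s%s%c",
                         i ? " " : "", cmd_flags[i].name,
                         (cmd & cmd_flags[i].bit) ? '+' : '-');
        if (n < 0 || (size_t)n >= len - used)
            break; /* buffer full: keep the truncated, terminated text */
        used += (size_t)n;
    }
}
```

With these definitions, 0x0400 (only INTx Disable set) reproduces the first Control: line and 0x0147 (I/O, Memory, Bus Master, Parity Error Response and SERR# set) reproduces the second. The [disabled] tags on the memory regions follow from the Memory bit alone.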

AFAIK, QEMU performs vfio_pci_reset through the following call stack:
    QEMU:
        vfio_pci_reset
            ioctl(vdev->vbasedev.fd, VFIO_DEVICE_RESET)
    Kernel:
        vfio_pci_ioctl
            pci_try_reset_function
                __pci_reset_function_locked
                    pci_parent_bus_reset
                        pci_reset_bridge_secondary_bus

and writing 1 to the sysfs reset interface goes through this path:
    Kernel:
        reset_store
            pci_reset_function
                __pci_reset_function_locked
                    pci_parent_bus_reset
                        pci_reset_bridge_secondary_bus

So the two methods seem to be the same, and I am confused about why the results are inconsistent.
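The sysfs trigger used above (writing 1 to /sys/bus/pci/devices/<BDF>/reset) can also be issued from a program. A minimal C sketch, not taken from the kernel or QEMU: sysfs_reset_path() and sysfs_reset() are hypothetical helper names, the caller is assumed to have root privileges, and the device address uses the domain:bus:device.function form seen in this thread:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Build "/sys/bus/pci/devices/<bdf>/reset" into buf.
 * Returns 0 on success, -1 if buf is too small. */
static int sysfs_reset_path(char *buf, size_t len, const char *bdf)
{
    int n = snprintf(buf, len, "/sys/bus/pci/devices/%s/reset", bdf);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Equivalent of `echo 1 > /sys/bus/pci/devices/<bdf>/reset`.
 * Returns 0 on success or a negative errno value on failure. */
static int sysfs_reset(const char *bdf)
{
    char path[128];
    int fd, err;
    ssize_t w;

    if (sysfs_reset_path(path, sizeof(path), bdf) < 0)
        return -ENAMETOOLONG;

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    w = write(fd, "1", 1);
    /* Capture errno before close() can overwrite it. */
    err = (w == 1) ? 0 : (w < 0 ? -errno : -EIO);
    close(fd);
    return err;
}
```

For the device in this thread the call would be sysfs_reset("0000:81:00.0"); a non-existent address simply fails with -ENOENT.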

Thanks,
Zongyong Wu

* Re: [Qemu-devel] The results of lspci are inconsistent between vfio reset pci devices and reset devices by sysfs interface
  2018-10-09 12:11 [Qemu-devel] The results of lspci are inconsistent between vfio reset pci devices and reset devices by sysfs interface Wuzongyong (Euler Dept)
@ 2018-10-09 15:08 ` Alex Williamson
  2018-10-10  1:26   ` Wuzongyong (Euler Dept)
  2018-10-10  1:47   ` Wuzongyong (Euler Dept)
  0 siblings, 2 replies; 5+ messages in thread
From: Alex Williamson @ 2018-10-09 15:08 UTC
  To: Wuzongyong (Euler Dept)
  Cc: qemu-devel, libvir-list, Chenhaiwu (Euler), Wanzongshun (Vincent)

On Tue, 9 Oct 2018 12:11:29 +0000
"Wuzongyong (Euler Dept)" <cordius.wu@huawei.com> wrote:

> So seem that these two methods are same actually, I am confused why the results are inconsistent.

Maybe there's a misunderstanding here: the kernel PCI reset functions
save and restore config space around the reset.  The intention of the
reset is to re-init the internal state of the device while preserving
(via save+restore) the config space.  The BARs being disabled is simply
a matter of the Memory bit in the Command register being unset (note
Mem-).  Whether this is indicative of some issue depends on whether the
state before reset matches the state after reset, not that the states
after two different paths of triggering a reset are identical.
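A toy C model of the save/restore behaviour described above. It is not kernel code: struct toy_dev and the helpers are hypothetical, only the Command register is tracked, and the "disable while resetting" step (clear everything except INTx Disable) is a simplifying assumption. The point it illustrates is that whatever Command value was present before the reset, Mem+ or Mem-, is what is visible afterwards, while the device's internal state is cleared:

```c
#include <stdint.h>

#define PCI_COMMAND_IO           0x001
#define PCI_COMMAND_MEMORY       0x002
#define PCI_COMMAND_MASTER       0x004
#define PCI_COMMAND_INTX_DISABLE 0x400

/* Hypothetical toy device: the Command register plus opaque internal
 * state that a reset is supposed to clear. */
struct toy_dev {
    uint16_t command;
    uint32_t internal_state;
};

/* The reset itself: everything returns to power-on defaults (zero). */
static void toy_hw_reset(struct toy_dev *d)
{
    d->command = 0;
    d->internal_state = 0;
}

/* Save config space, disable the device (only INTx Disable left set),
 * reset it, then restore the saved config space. */
static void toy_reset_function(struct toy_dev *d)
{
    uint16_t saved = d->command;

    d->command = PCI_COMMAND_INTX_DISABLE;
    toy_hw_reset(d);
    d->command = saved;
}
```

So comparing the Command register before and after one reset is meaningful; comparing the results of two resets that started from different Command values is not.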

vfio-pci will hand off the device to the user (QEMU) disabled, so the
states in the first example make sense to me.  In the second case, it's
not clear what the starting state is for the device.  Was this reset
performed from the starting point of the first case or is the device in
some arbitrary, unknown state prior to reset?  Thanks,

Alex

* Re: [Qemu-devel] The results of lspci are inconsistent between vfio reset pci devices and reset devices by sysfs interface
  2018-10-09 15:08 ` Alex Williamson
@ 2018-10-10  1:26   ` Wuzongyong (Euler Dept)
  2018-10-10  1:47   ` Wuzongyong (Euler Dept)
  1 sibling, 0 replies; 5+ messages in thread
From: Wuzongyong (Euler Dept) @ 2018-10-10  1:26 UTC
  To: Alex Williamson
  Cc: qemu-devel, libvir-list, Chenhaiwu (Euler), Wanzongshun (Vincent)

> Maybe there's a misunderstanding here, the kernel PCI reset functions save
> and restore config space around the reset.  The intention of the reset is
> to re-init the internal state of the device while preserving (via
> save+restore) the config space.  The BARs being disabled is simply a
> matter of the Memory bit in the Command register being unset (note Mem-).
> Whether this is indicative of some issue depends on whether the state
> before reset matches the state after reset, not that the states after two
> different paths of triggering a reset are identical.
> 
> vfio-pci will hand off the device to the user (QEMU) disabled, so the
> states in the first example make sense to me.  In the second case, it's
> not clear what the starting state is for the device.  Was this reset
> performed from the starting point of the first case or is the device in
> some arbitrary, unknown state prior to reset?  Thanks,
> 
> Alex
In the second case, the reset was performed from the starting point of the first case.
IOW, I think the states before the two resets are identical. The only difference I can think of
is that the QEMU process performs the reset twice: once when vfio opens the device's fd, and
once more as I mentioned above.

Thanks,
Wu Zongyong

* Re: [Qemu-devel] The results of lspci are inconsistent between vfio reset pci devices and reset devices by sysfs interface
  2018-10-09 15:08 ` Alex Williamson
  2018-10-10  1:26   ` Wuzongyong (Euler Dept)
@ 2018-10-10  1:47   ` Wuzongyong (Euler Dept)
  2018-10-10  3:18     ` Alex Williamson
  1 sibling, 1 reply; 5+ messages in thread
From: Wuzongyong (Euler Dept) @ 2018-10-10  1:47 UTC
  To: Alex Williamson
  Cc: qemu-devel, libvir-list, Chenhaiwu (Euler), Wanzongshun (Vincent)

> In the second case, the reset was performed from the starting point of the
> first case.
> IOW, the states before the two cases are identical, I think. The only
> difference I can think of is the qemu process will perform twice reset,
> one occurs when vfio open the device' fd and the other one occurs as I
> mentioned above.
> 
> Thanks,
> Wu Zongyong

You're right. The initial states are not identical.
I found the following code in vfio_pci_pre_reset() in QEMU:
    /*
     * Stop any ongoing DMA by disconnecting I/O, MMIO, and bus master.
     * Also put INTx Disable in known state.
     */
    cmd = vfio_pci_read_config(pdev, PCI_COMMAND, 2);
    cmd &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
             PCI_COMMAND_INTX_DISABLE);
    vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);

So the two resets behave differently.
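The difference can be shown with a toy C model. Assumptions: the kernel function reset is reduced to its visible effect on the Command register (the saved value is restored), and the function names are hypothetical; only the mask is copied from the vfio_pci_pre_reset() snippet. Starting from a Command value with Mem+ (0x0147 is used as an example), the sysfs path keeps Mem+ while the QEMU path ends with Mem-:

```c
#include <stdint.h>

#define PCI_COMMAND_IO           0x001
#define PCI_COMMAND_MEMORY       0x002
#define PCI_COMMAND_MASTER       0x004
#define PCI_COMMAND_INTX_DISABLE 0x400

/* Kernel function reset, reduced to its visible effect on the Command
 * register: the value saved before the reset is restored afterwards. */
static uint16_t kernel_reset(uint16_t cmd)
{
    return cmd;
}

/* The same mask the quoted vfio_pci_pre_reset() code applies before
 * QEMU issues VFIO_DEVICE_RESET. */
static uint16_t qemu_pre_reset(uint16_t cmd)
{
    return cmd & (uint16_t)~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
                             PCI_COMMAND_MASTER | PCI_COMMAND_INTX_DISABLE);
}

/* sysfs reset: the kernel saves whatever is currently set. */
static uint16_t reset_via_sysfs(uint16_t cmd)
{
    return kernel_reset(cmd);
}

/* QEMU reset: clear decode and bus master first, then reset. */
static uint16_t reset_via_qemu(uint16_t cmd)
{
    return kernel_reset(qemu_pre_reset(cmd));
}
```

In this model reset_via_qemu(0x0147) is 0x0140: the I/O, Memory and Bus Master bits are gone before the kernel ever saves the register.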

So I wonder whether this operation is necessary here.
Could I set the Memory bit in the Command register in vfio_pci_post_reset()?
I want to write to the device's regions after the reset.
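For illustration, the usual pattern for touching a BAR after a reset is to set the Memory bit only for the duration of the access and then restore the saved Command value. A minimal C sketch on a simulated device: struct fake_dev and the cfg_read16()/cfg_write16()/mmio_write32() helpers are hypothetical stand-ins, not QEMU's vfio_pci_read_config()/vfio_pci_write_config(). Only the Command register offset (0x04) and the Memory bit (0x2) are standard values:

```c
#include <stdint.h>

#define PCI_COMMAND        0x04   /* Command register offset */
#define PCI_COMMAND_MEMORY 0x002  /* Memory Space Enable */

/* Hypothetical simulated device: config space plus a tiny MMIO BAR
 * that only decodes when Memory Space Enable is set. */
struct fake_dev {
    uint8_t  config[256];
    uint32_t bar0[4];
};

static uint16_t cfg_read16(const struct fake_dev *d, unsigned off)
{
    return (uint16_t)(d->config[off] | (d->config[off + 1] << 8));
}

static void cfg_write16(struct fake_dev *d, unsigned off, uint16_t v)
{
    d->config[off] = (uint8_t)(v & 0xff);
    d->config[off + 1] = (uint8_t)(v >> 8);
}

/* MMIO write that is dropped unless memory decode is enabled.
 * Returns 0 if the write reached the device, -1 otherwise. */
static int mmio_write32(struct fake_dev *d, unsigned idx, uint32_t v)
{
    if (idx >= 4 || !(cfg_read16(d, PCI_COMMAND) & PCI_COMMAND_MEMORY))
        return -1;
    d->bar0[idx] = v;
    return 0;
}

/* Enable memory decode just long enough to program the device, then
 * restore the Command register so the device still looks disabled. */
static int program_after_reset(struct fake_dev *d, uint32_t value)
{
    uint16_t saved = cfg_read16(d, PCI_COMMAND);
    int rc;

    cfg_write16(d, PCI_COMMAND, saved | PCI_COMMAND_MEMORY);
    rc = mmio_write32(d, 0, value);
    cfg_write16(d, PCI_COMMAND, saved);
    return rc;
}
```

Restoring the saved value keeps the reset state unchanged from the guest's point of view, unlike leaving the Memory bit set permanently in vfio_pci_post_reset().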

Thanks,
Wu Zongyong

* Re: [Qemu-devel] The results of lspci are inconsistent between vfio reset pci devices and reset devices by sysfs interface
  2018-10-10  1:47   ` Wuzongyong (Euler Dept)
@ 2018-10-10  3:18     ` Alex Williamson
  0 siblings, 0 replies; 5+ messages in thread
From: Alex Williamson @ 2018-10-10  3:18 UTC
  To: Wuzongyong (Euler Dept)
  Cc: qemu-devel, libvir-list, Chenhaiwu (Euler), Wanzongshun (Vincent)

On Wed, 10 Oct 2018 01:47:10 +0000
"Wuzongyong (Euler Dept)" <cordius.wu@huawei.com> wrote:
> 
> You're right. The initial states are not identical.
> I found the function vfio_pci_pre_reset in qemu.
>     /*
>      * Stop any ongoing DMA by disconnecting I/O, MMIO, and bus master.
>      * Also put INTx Disable in known state.
>      */
>     cmd = vfio_pci_read_config(pdev, PCI_COMMAND, 2);
>     cmd &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER |
>              PCI_COMMAND_INTX_DISABLE);
>     vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
> 
> So the behaviors between the two reset are inconsistent.
> 
> Then I wonder whether the operation is necessary here?
> Could I enable the Memory bit in the Command register in vfio_pci_post_reset,
> because I want to write regions of PCI devices after reset.

One reset is done by the kernel to try to put the device into a known
clean state before allowing the user access to it; the other is done by
QEMU as part of the initial machine reset.  I suppose we could special
case the initial machine reset, but it seems perhaps risky and
unnecessary.

QEMU is the driver here; it can certainly enable MMIO on the device, and
there are some examples in the QEMU code where MMIO is enabled to
interact with the device; see vfio_radeon_reset(), for example.  The
device driver in the guest or the VM firmware should be the one to
enable the device for VM usage, though.  QEMU should provide the device
to the VM in a power-on default state, or as close as we can reasonably
get to that.  Thanks,

Alex
