* [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0 @ 2023-03-12 7:54 Huang Rui 2023-03-12 7:54 ` [RFC XEN PATCH 1/6] x86/pvh: report ACPI VFCT table to dom0 if present Huang Rui ` (7 more replies) 0 siblings, 8 replies; 75+ messages in thread From: Huang Rui @ 2023-03-12 7:54 UTC (permalink / raw) To: Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel Cc: Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Huang Rui Hi all, In the graphics world, 3D applications and games run on top of open graphics libraries such as OpenGL and Vulkan. Mesa is the Linux implementation of OpenGL and Vulkan for multiple hardware platforms, and these libraries need GPU hardware acceleration. In the virtualization world, virtio-gpu and passthrough GPU are two GPU virtualization technologies. Currently, Xen only supports OpenGL (virgl: https://docs.mesa3d.org/drivers/virgl.html) for virtio-gpu, and GPU passthrough based on PV dom0, on the x86 platform. Today, we would like to introduce Vulkan (venus: https://docs.mesa3d.org/drivers/venus.html) and OpenGL-on-Vulkan (zink: https://docs.mesa3d.org/drivers/zink.html) support for VirtIO GPU on Xen. These functions are already supported on KVM, but so far they are not supported on Xen. We also introduce PCIe (GPU) passthrough based on PVH dom0 for the AMD x86 platform. This work requires changes across multiple repositories: kernel, Xen, QEMU, Mesa, and virglrenderer. 
Please check the branches below: Kernel: https://git.kernel.org/pub/scm/linux/kernel/git/rui/linux.git/log/?h=upstream-fox-xen Xen: https://gitlab.com/huangrui123/xen/-/commits/upstream-for-xen QEMU: https://gitlab.com/huangrui123/qemu/-/commits/upstream-for-xen Mesa: https://gitlab.freedesktop.org/rui/mesa/-/commits/upstream-for-xen Virglrenderer: https://gitlab.freedesktop.org/rui/virglrenderer/-/commits/upstream-for-xen In the Xen part, we mainly add PCIe passthrough support on PVH dom0. QEMU is used to pass the GPU device through to the guest HVM domU, and the main work is to route the interrupt using the GSI, vector, and pirq. Below are screenshots of these functions; please take a look. Venus: https://drive.google.com/file/d/1_lPq6DMwHu1JQv7LUUVRx31dBj0HJYcL/view?usp=share_link Zink: https://drive.google.com/file/d/1FxLmKu6X7uJOxx1ZzwOm1yA6IL5WMGzd/view?usp=share_link Passthrough GPU: https://drive.google.com/file/d/17onr5gvDK8KM_LniHTSQEI2hGJZlI09L/view?usp=share_link We are writing documentation on the Xen wiki that describes how to verify these functions, and will update this series with it in a future version. 
Thanks, Ray Chen Jiqian (5): vpci: accept BAR writes if dom0 is PVH x86/pvh: shouldn't check pirq flag when map pirq in PVH x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call tools/libs/call: add linux os call to get gsi from irq tools/libs/light: pci: translate irq to gsi Roger Pau Monne (1): x86/pvh: report ACPI VFCT table to dom0 if present tools/include/xen-sys/Linux/privcmd.h | 7 +++++++ tools/include/xencall.h | 2 ++ tools/include/xenctrl.h | 2 ++ tools/libs/call/core.c | 5 +++++ tools/libs/call/libxencall.map | 2 ++ tools/libs/call/linux.c | 14 ++++++++++++++ tools/libs/call/private.h | 9 +++++++++ tools/libs/ctrl/xc_physdev.c | 4 ++++ tools/libs/light/libxl_pci.c | 1 + xen/arch/x86/hvm/dom0_build.c | 1 + xen/arch/x86/hvm/hypercall.c | 3 +-- xen/drivers/vpci/header.c | 2 +- xen/include/acpi/actbl3.h | 1 + 13 files changed, 50 insertions(+), 3 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 75+ messages in thread
* [RFC XEN PATCH 1/6] x86/pvh: report ACPI VFCT table to dom0 if present 2023-03-12 7:54 [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0 Huang Rui @ 2023-03-12 7:54 ` Huang Rui 2023-03-13 11:55 ` Andrew Cooper 2023-03-12 7:54 ` [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH Huang Rui ` (6 subsequent siblings) 7 siblings, 1 reply; 75+ messages in thread From: Huang Rui @ 2023-03-12 7:54 UTC (permalink / raw) To: Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel Cc: Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Huang Rui, Henry Wang From: Roger Pau Monne <roger.pau@citrix.com> The VFCT ACPI table is used by AMD GPUs to expose the vbios ROM image from the firmware instead of doing it on the PCI ROM on the physical device. As such, this needs to be available for PVH dom0 to access, or else the GPU won't work. Reported-by: Huang Rui <ray.huang@amd.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-and-Tested-by: Huang Rui <ray.huang@amd.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com> Signed-off-by: Huang Rui <ray.huang@amd.com> --- xen/arch/x86/hvm/dom0_build.c | 1 + xen/include/acpi/actbl3.h | 1 + 2 files changed, 2 insertions(+) diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c index 3ac6b7b423..d44de7f2b2 100644 --- a/xen/arch/x86/hvm/dom0_build.c +++ b/xen/arch/x86/hvm/dom0_build.c @@ -892,6 +892,7 @@ static bool __init pvh_acpi_table_allowed(const char *sig, ACPI_SIG_DSDT, ACPI_SIG_FADT, ACPI_SIG_FACS, ACPI_SIG_PSDT, ACPI_SIG_SSDT, ACPI_SIG_SBST, ACPI_SIG_MCFG, ACPI_SIG_SLIC, ACPI_SIG_MSDM, ACPI_SIG_WDAT, ACPI_SIG_FPDT, ACPI_SIG_S3PT, + ACPI_SIG_VFCT, }; unsigned int i; diff --git a/xen/include/acpi/actbl3.h b/xen/include/acpi/actbl3.h index 0a6778421f..6858d3e60f 100644 --- a/xen/include/acpi/actbl3.h +++ b/xen/include/acpi/actbl3.h @@ -79,6 +79,7 @@ #define ACPI_SIG_MATR "MATR" /* 
Memory Address Translation Table */ #define ACPI_SIG_MSDM "MSDM" /* Microsoft Data Management Table */ #define ACPI_SIG_WPBT "WPBT" /* Windows Platform Binary Table */ +#define ACPI_SIG_VFCT "VFCT" /* AMD Video BIOS */ /* * All tables must be byte-packed to match the ACPI specification, since -- 2.25.1 ^ permalink raw reply related [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 1/6] x86/pvh: report ACPI VFCT table to dom0 if present 2023-03-12 7:54 ` [RFC XEN PATCH 1/6] x86/pvh: report ACPI VFCT table to dom0 if present Huang Rui @ 2023-03-13 11:55 ` Andrew Cooper 2023-03-13 12:21 ` Roger Pau Monné 0 siblings, 1 reply; 75+ messages in thread From: Andrew Cooper @ 2023-03-13 11:55 UTC (permalink / raw) To: Huang Rui, Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel Cc: Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Henry Wang On 12/03/2023 7:54 am, Huang Rui wrote: > From: Roger Pau Monne <roger.pau@citrix.com> > > The VFCT ACPI table is used by AMD GPUs to expose the vbios ROM image > from the firmware instead of doing it on the PCI ROM on the physical > device. > > As such, this needs to be available for PVH dom0 to access, or else > the GPU won't work. > > Reported-by: Huang Rui <ray.huang@amd.com> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > Acked-and-Tested-by: Huang Rui <ray.huang@amd.com> > Release-acked-by: Henry Wang <Henry.Wang@arm.com> > Signed-off-by: Huang Rui <ray.huang@amd.com> Huh... Despite the release ack, this didn't get committed for 4.17. Sorry for the oversight. I've queued this now. ~Andrew ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 1/6] x86/pvh: report ACPI VFCT table to dom0 if present 2023-03-13 11:55 ` Andrew Cooper @ 2023-03-13 12:21 ` Roger Pau Monné 2023-03-13 12:27 ` Andrew Cooper 0 siblings, 1 reply; 75+ messages in thread From: Roger Pau Monné @ 2023-03-13 12:21 UTC (permalink / raw) To: Andrew Cooper Cc: Huang Rui, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Henry Wang On Mon, Mar 13, 2023 at 11:55:56AM +0000, Andrew Cooper wrote: > On 12/03/2023 7:54 am, Huang Rui wrote: > > From: Roger Pau Monne <roger.pau@citrix.com> > > > > The VFCT ACPI table is used by AMD GPUs to expose the vbios ROM image > > from the firmware instead of doing it on the PCI ROM on the physical > > device. > > > > As such, this needs to be available for PVH dom0 to access, or else > > the GPU won't work. > > > > Reported-by: Huang Rui <ray.huang@amd.com> > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > Acked-and-Tested-by: Huang Rui <ray.huang@amd.com> > > Release-acked-by: Henry Wang <Henry.Wang@arm.com> > > Signed-off-by: Huang Rui <ray.huang@amd.com> > > Huh... Despite the release ack, this didn't get committed for 4.17. There was a pending query from Jan as to where was this table signature documented or at least registered, as it's not in the ACPI spec or any related files. I don't oppose to the change, as it's already used by Linux, so I think it's impossible for the table signature to be reused, even if not properly documented (it would cause havoc). It's however not ideal to set this kind of precedents. Thanks, Roger. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 1/6] x86/pvh: report ACPI VFCT table to dom0 if present 2023-03-13 12:21 ` Roger Pau Monné @ 2023-03-13 12:27 ` Andrew Cooper 2023-03-21 6:26 ` Huang Rui 0 siblings, 1 reply; 75+ messages in thread From: Andrew Cooper @ 2023-03-13 12:27 UTC (permalink / raw) To: Roger Pau Monné Cc: Huang Rui, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Henry Wang On 13/03/2023 12:21 pm, Roger Pau Monné wrote: > On Mon, Mar 13, 2023 at 11:55:56AM +0000, Andrew Cooper wrote: >> On 12/03/2023 7:54 am, Huang Rui wrote: >>> From: Roger Pau Monne <roger.pau@citrix.com> >>> >>> The VFCT ACPI table is used by AMD GPUs to expose the vbios ROM image >>> from the firmware instead of doing it on the PCI ROM on the physical >>> device. >>> >>> As such, this needs to be available for PVH dom0 to access, or else >>> the GPU won't work. >>> >>> Reported-by: Huang Rui <ray.huang@amd.com> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >>> Acked-and-Tested-by: Huang Rui <ray.huang@amd.com> >>> Release-acked-by: Henry Wang <Henry.Wang@arm.com> >>> Signed-off-by: Huang Rui <ray.huang@amd.com> >> Huh... Despite the release ack, this didn't get committed for 4.17. > There was a pending query from Jan as to where was this table > signature documented or at least registered, as it's not in the ACPI > spec or any related files. > > I don't oppose to the change, as it's already used by Linux, so I > think it's impossible for the table signature to be reused, even if > not properly documented (it would cause havoc). > > It's however not ideal to set this kind of precedents. It's not great, but this exists in real systems, for several generations it seems. Making things work for users trumps any idealistic beliefs about firmware actually conforming to spec. ~Andrew ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 1/6] x86/pvh: report ACPI VFCT table to dom0 if present 2023-03-13 12:27 ` Andrew Cooper @ 2023-03-21 6:26 ` Huang Rui 0 siblings, 0 replies; 75+ messages in thread From: Huang Rui @ 2023-03-21 6:26 UTC (permalink / raw) To: Andrew Cooper Cc: Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel, Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Henry Wang On Mon, Mar 13, 2023 at 08:27:02PM +0800, Andrew Cooper wrote: > On 13/03/2023 12:21 pm, Roger Pau Monné wrote: > > On Mon, Mar 13, 2023 at 11:55:56AM +0000, Andrew Cooper wrote: > >> On 12/03/2023 7:54 am, Huang Rui wrote: > >>> From: Roger Pau Monne <roger.pau@citrix.com> > >>> > >>> The VFCT ACPI table is used by AMD GPUs to expose the vbios ROM image > >>> from the firmware instead of doing it on the PCI ROM on the physical > >>> device. > >>> > >>> As such, this needs to be available for PVH dom0 to access, or else > >>> the GPU won't work. > >>> > >>> Reported-by: Huang Rui <ray.huang@amd.com> > >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > >>> Acked-and-Tested-by: Huang Rui <ray.huang@amd.com> > >>> Release-acked-by: Henry Wang <Henry.Wang@arm.com> > >>> Signed-off-by: Huang Rui <ray.huang@amd.com> > >> Huh... Despite the release ack, this didn't get committed for 4.17. > > There was a pending query from Jan as to where was this table > > signature documented or at least registered, as it's not in the ACPI > > spec or any related files. > > > > I don't oppose to the change, as it's already used by Linux, so I > > think it's impossible for the table signature to be reused, even if > > not properly documented (it would cause havoc). > > > > It's however not ideal to set this kind of precedents. > > It's not great, but this exists in real systems, for several generations > it seems. 
> > Making things work for users trumps any idealistic beliefs about > firmware actually conforming to spec. > Thanks Andrew for understanding! These tables have been present for more than 10 years on all AMD GPU platforms. Thanks, Ray ^ permalink raw reply [flat|nested] 75+ messages in thread
* [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-12 7:54 [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0 Huang Rui 2023-03-12 7:54 ` [RFC XEN PATCH 1/6] x86/pvh: report ACPI VFCT table to dom0 if present Huang Rui @ 2023-03-12 7:54 ` Huang Rui 2023-03-13 7:23 ` Christian König 2023-03-14 16:02 ` Jan Beulich 2023-03-12 7:54 ` [RFC XEN PATCH 3/6] x86/pvh: shouldn't check pirq flag when map pirq in PVH Huang Rui ` (5 subsequent siblings) 7 siblings, 2 replies; 75+ messages in thread From: Huang Rui @ 2023-03-12 7:54 UTC (permalink / raw) To: Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel Cc: Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Huang Rui From: Chen Jiqian <Jiqian.Chen@amd.com> When dom0 is PVH and we want to pass a GPU through to a guest, we should allow BAR writes even though the BAR is mapped. Otherwise, the values of the BARs are not initialized when the guest first starts. Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> --- xen/drivers/vpci/header.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c index ec2e978a4e..918d11fbce 100644 --- a/xen/drivers/vpci/header.c +++ b/xen/drivers/vpci/header.c @@ -392,7 +392,7 @@ static void cf_check bar_write( * Xen only cares whether the BAR is mapped into the p2m, so allow BAR * writes as long as the BAR is not mapped into the p2m. */ - if ( bar->enabled ) + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) { /* If the value written is the current one avoid printing a warning. */ if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) ) -- 2.25.1 ^ permalink raw reply related [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-12 7:54 ` [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH Huang Rui @ 2023-03-13 7:23 ` Christian König 2023-03-13 7:26 ` Christian König 2023-03-14 16:02 ` Jan Beulich 1 sibling, 1 reply; 75+ messages in thread From: Christian König @ 2023-03-13 7:23 UTC (permalink / raw) To: Huang Rui, Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel Cc: Alex Deucher, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian Am 12.03.23 um 08:54 schrieb Huang Rui: > From: Chen Jiqian <Jiqian.Chen@amd.com> > > When dom0 is PVH and we want to passthrough gpu to guest, > we should allow BAR writes even through BAR is mapped. If > not, the value of BARs are not initialized when guest firstly > start. > > Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > Signed-off-by: Huang Rui <ray.huang@amd.com> > --- > xen/drivers/vpci/header.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c > index ec2e978a4e..918d11fbce 100644 > --- a/xen/drivers/vpci/header.c > +++ b/xen/drivers/vpci/header.c > @@ -392,7 +392,7 @@ static void cf_check bar_write( > * Xen only cares whether the BAR is mapped into the p2m, so allow BAR > * writes as long as the BAR is not mapped into the p2m. > */ > - if ( bar->enabled ) > + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) Checkpath.pl gives here: ERROR: space prohibited after that open parenthesis '(' #115: FILE: xen/drivers/vpci/header.c:395: + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) Christian. > { > /* If the value written is the current one avoid printing a warning. */ > if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) ) ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-13 7:23 ` Christian König @ 2023-03-13 7:26 ` Christian König 2023-03-13 8:46 ` Jan Beulich ` (2 more replies) 0 siblings, 3 replies; 75+ messages in thread From: Christian König @ 2023-03-13 7:26 UTC (permalink / raw) To: Huang Rui, Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel Cc: Alex Deucher, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian Am 13.03.23 um 08:23 schrieb Christian König: > > > Am 12.03.23 um 08:54 schrieb Huang Rui: >> From: Chen Jiqian <Jiqian.Chen@amd.com> >> >> When dom0 is PVH and we want to passthrough gpu to guest, >> we should allow BAR writes even through BAR is mapped. If >> not, the value of BARs are not initialized when guest firstly >> start. >> >> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> >> Signed-off-by: Huang Rui <ray.huang@amd.com> >> --- >> xen/drivers/vpci/header.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c >> index ec2e978a4e..918d11fbce 100644 >> --- a/xen/drivers/vpci/header.c >> +++ b/xen/drivers/vpci/header.c >> @@ -392,7 +392,7 @@ static void cf_check bar_write( >> * Xen only cares whether the BAR is mapped into the p2m, so >> allow BAR >> * writes as long as the BAR is not mapped into the p2m. >> */ >> - if ( bar->enabled ) >> + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & >> PCI_COMMAND_MEMORY ) > > Checkpath.pl gives here: > > ERROR: space prohibited after that open parenthesis '(' > #115: FILE: xen/drivers/vpci/header.c:395: > + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) But I should probably mention that I'm not 100% sure if this code base uses kernel coding style! Christian. > > Christian. > > >> { >> /* If the value written is the current one avoid printing a >> warning. */ >> if ( val != (uint32_t)(bar->addr >> (hi ? 
32 : 0)) ) > ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-13 7:26 ` Christian König @ 2023-03-13 8:46 ` Jan Beulich 2023-03-13 8:55 ` Huang Rui 2023-03-14 23:42 ` Stefano Stabellini 2 siblings, 0 replies; 75+ messages in thread From: Jan Beulich @ 2023-03-13 8:46 UTC (permalink / raw) To: Christian König Cc: Alex Deucher, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Huang Rui, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On 13.03.2023 08:26, Christian König wrote: > Am 13.03.23 um 08:23 schrieb Christian König: >> Am 12.03.23 um 08:54 schrieb Huang Rui: >>> --- a/xen/drivers/vpci/header.c >>> +++ b/xen/drivers/vpci/header.c >>> @@ -392,7 +392,7 @@ static void cf_check bar_write( >>> * Xen only cares whether the BAR is mapped into the p2m, so >>> allow BAR >>> * writes as long as the BAR is not mapped into the p2m. >>> */ >>> - if ( bar->enabled ) >>> + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & >>> PCI_COMMAND_MEMORY ) >> >> Checkpath.pl gives here: >> >> ERROR: space prohibited after that open parenthesis '(' >> #115: FILE: xen/drivers/vpci/header.c:395: >> + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) > > But I should probably mention that I'm not 100% sure if this code base > uses kernel coding style! It doesn't - see ./CODING_STYLE. Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-13 7:26 ` Christian König 2023-03-13 8:46 ` Jan Beulich @ 2023-03-13 8:55 ` Huang Rui 2023-03-14 23:42 ` Stefano Stabellini 2 siblings, 0 replies; 75+ messages in thread From: Huang Rui @ 2023-03-13 8:55 UTC (permalink / raw) To: Koenig, Christian Cc: Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel, Deucher, Alexander, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian On Mon, Mar 13, 2023 at 03:26:09PM +0800, Koenig, Christian wrote: > Am 13.03.23 um 08:23 schrieb Christian König: > > > > > > Am 12.03.23 um 08:54 schrieb Huang Rui: > >> From: Chen Jiqian <Jiqian.Chen@amd.com> > >> > >> When dom0 is PVH and we want to passthrough gpu to guest, > >> we should allow BAR writes even through BAR is mapped. If > >> not, the value of BARs are not initialized when guest firstly > >> start. > >> > >> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > >> Signed-off-by: Huang Rui <ray.huang@amd.com> > >> --- > >> xen/drivers/vpci/header.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c > >> index ec2e978a4e..918d11fbce 100644 > >> --- a/xen/drivers/vpci/header.c > >> +++ b/xen/drivers/vpci/header.c > >> @@ -392,7 +392,7 @@ static void cf_check bar_write( > >> * Xen only cares whether the BAR is mapped into the p2m, so > >> allow BAR > >> * writes as long as the BAR is not mapped into the p2m. > >> */ > >> - if ( bar->enabled ) > >> + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & > >> PCI_COMMAND_MEMORY ) > > > > Checkpath.pl gives here: > > > > ERROR: space prohibited after that open parenthesis '(' > > #115: FILE: xen/drivers/vpci/header.c:395: > > + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) > > But I should probably mention that I'm not 100% sure if this code base > uses kernel coding style! 
> I noticed that Xen's coding style is actually different from the Linux kernel's. Thanks, Ray ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-13 7:26 ` Christian König 2023-03-13 8:46 ` Jan Beulich 2023-03-13 8:55 ` Huang Rui @ 2023-03-14 23:42 ` Stefano Stabellini 2 siblings, 0 replies; 75+ messages in thread From: Stefano Stabellini @ 2023-03-14 23:42 UTC (permalink / raw) To: Christian König Cc: Huang Rui, Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel, Alex Deucher, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian [-- Attachment #1: Type: text/plain, Size: 1759 bytes --] On Mon, 13 Mar 2023, Christian König wrote: > Am 13.03.23 um 08:23 schrieb Christian König: > > Am 12.03.23 um 08:54 schrieb Huang Rui: > > > From: Chen Jiqian <Jiqian.Chen@amd.com> > > > > > > When dom0 is PVH and we want to passthrough gpu to guest, > > > we should allow BAR writes even through BAR is mapped. If > > > not, the value of BARs are not initialized when guest firstly > > > start. > > > > > > Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > > > Signed-off-by: Huang Rui <ray.huang@amd.com> > > > --- > > > xen/drivers/vpci/header.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c > > > index ec2e978a4e..918d11fbce 100644 > > > --- a/xen/drivers/vpci/header.c > > > +++ b/xen/drivers/vpci/header.c > > > @@ -392,7 +392,7 @@ static void cf_check bar_write( > > > * Xen only cares whether the BAR is mapped into the p2m, so allow > > > BAR > > > * writes as long as the BAR is not mapped into the p2m. 
> > > */ > > > - if ( bar->enabled ) > > > + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) > > > > Checkpath.pl gives here: > > > > ERROR: space prohibited after that open parenthesis '(' > > #115: FILE: xen/drivers/vpci/header.c:395: > > + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) > > But I should probably mention that I'm not 100% sure if this code base uses > kernel coding style! Hi Christian, Thanks for taking a look at these patches! For better or for worse Xen follows a different coding style from the Linux kernel (see CODING_STYLE under xen.git). In Xen we use: if ( rc != 0 ) { return rc; } ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-12 7:54 ` [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH Huang Rui 2023-03-13 7:23 ` Christian König @ 2023-03-14 16:02 ` Jan Beulich 2023-03-21 9:36 ` Huang Rui 1 sibling, 1 reply; 75+ messages in thread From: Jan Beulich @ 2023-03-14 16:02 UTC (permalink / raw) To: Huang Rui Cc: Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On 12.03.2023 08:54, Huang Rui wrote: > From: Chen Jiqian <Jiqian.Chen@amd.com> > > When dom0 is PVH and we want to passthrough gpu to guest, > we should allow BAR writes even through BAR is mapped. If > not, the value of BARs are not initialized when guest firstly > start. From this it doesn't become clear why a GPU would be special in this regard, or what (if any) prior bug there was. Are you suggesting ... > --- a/xen/drivers/vpci/header.c > +++ b/xen/drivers/vpci/header.c > @@ -392,7 +392,7 @@ static void cf_check bar_write( > * Xen only cares whether the BAR is mapped into the p2m, so allow BAR > * writes as long as the BAR is not mapped into the p2m. > */ > - if ( bar->enabled ) > + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) > { > /* If the value written is the current one avoid printing a warning. */ > if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) ) ... bar->enabled doesn't properly reflect the necessary state? It generally shouldn't be necessary to look at the physical device's state here. Furthermore when you make a change in a case like this, the accompanying comment also needs updating (which might have clarified what, if anything, has been wrong). Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-14 16:02 ` Jan Beulich @ 2023-03-21 9:36 ` Huang Rui 2023-03-21 9:41 ` Jan Beulich 0 siblings, 1 reply; 75+ messages in thread From: Huang Rui @ 2023-03-21 9:36 UTC (permalink / raw) To: Jan Beulich Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel Hi Jan, On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote: > On 12.03.2023 08:54, Huang Rui wrote: > > From: Chen Jiqian <Jiqian.Chen@amd.com> > > > > When dom0 is PVH and we want to passthrough gpu to guest, > > we should allow BAR writes even through BAR is mapped. If > > not, the value of BARs are not initialized when guest firstly > > start. > > From this it doesn't become clear why a GPU would be special in this > regard, or what (if any) prior bug there was. Are you suggesting ... > You're right. This is in fact a buggy we encountered while we start the guest domU. > > --- a/xen/drivers/vpci/header.c > > +++ b/xen/drivers/vpci/header.c > > @@ -392,7 +392,7 @@ static void cf_check bar_write( > > * Xen only cares whether the BAR is mapped into the p2m, so allow BAR > > * writes as long as the BAR is not mapped into the p2m. > > */ > > - if ( bar->enabled ) > > + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) > > { > > /* If the value written is the current one avoid printing a warning. */ > > if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) ) > > ... bar->enabled doesn't properly reflect the necessary state? It > generally shouldn't be necessary to look at the physical device's > state here. > > Furthermore when you make a change in a case like this, the > accompanying comment also needs updating (which might have clarified > what, if anything, has been wrong). 
> The problem is that when we start the domU for the first time, the enable flag is already set while the passthrough device still wants to write the real PCIe BAR on the host. And yes, it's a temporary workaround; we should figure out the root cause. Thanks, Ray ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-21 9:36 ` Huang Rui @ 2023-03-21 9:41 ` Jan Beulich 2023-03-21 10:14 ` Huang Rui 0 siblings, 1 reply; 75+ messages in thread From: Jan Beulich @ 2023-03-21 9:41 UTC (permalink / raw) To: Huang Rui Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On 21.03.2023 10:36, Huang Rui wrote: > On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote: >> On 12.03.2023 08:54, Huang Rui wrote: >>> --- a/xen/drivers/vpci/header.c >>> +++ b/xen/drivers/vpci/header.c >>> @@ -392,7 +392,7 @@ static void cf_check bar_write( >>> * Xen only cares whether the BAR is mapped into the p2m, so allow BAR >>> * writes as long as the BAR is not mapped into the p2m. >>> */ >>> - if ( bar->enabled ) >>> + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) >>> { >>> /* If the value written is the current one avoid printing a warning. */ >>> if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) ) >> >> ... bar->enabled doesn't properly reflect the necessary state? It >> generally shouldn't be necessary to look at the physical device's >> state here. >> >> Furthermore when you make a change in a case like this, the >> accompanying comment also needs updating (which might have clarified >> what, if anything, has been wrong). >> > > That is the problem that we start domU at the first time, the enable flag > will be set while the passthrough device would like to write the real pcie > bar on the host. A pass-through device (i.e. one already owned by a DomU) should never be allowed to write to the real BAR. But it's not clear whether I'm not misinterpreting what you said ... > And yes, it's temporary workaround, we should figure out > the root cause. Right, that's the only way to approach this, imo. Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-21 9:41 ` Jan Beulich @ 2023-03-21 10:14 ` Huang Rui 2023-03-21 10:20 ` Jan Beulich 0 siblings, 1 reply; 75+ messages in thread From: Huang Rui @ 2023-03-21 10:14 UTC (permalink / raw) To: Jan Beulich Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On Tue, Mar 21, 2023 at 05:41:57PM +0800, Jan Beulich wrote: > On 21.03.2023 10:36, Huang Rui wrote: > > On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote: > >> On 12.03.2023 08:54, Huang Rui wrote: > >>> --- a/xen/drivers/vpci/header.c > >>> +++ b/xen/drivers/vpci/header.c > >>> @@ -392,7 +392,7 @@ static void cf_check bar_write( > >>> * Xen only cares whether the BAR is mapped into the p2m, so allow BAR > >>> * writes as long as the BAR is not mapped into the p2m. > >>> */ > >>> - if ( bar->enabled ) > >>> + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) > >>> { > >>> /* If the value written is the current one avoid printing a warning. */ > >>> if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) ) > >> > >> ... bar->enabled doesn't properly reflect the necessary state? It > >> generally shouldn't be necessary to look at the physical device's > >> state here. > >> > >> Furthermore when you make a change in a case like this, the > >> accompanying comment also needs updating (which might have clarified > >> what, if anything, has been wrong). > >> > > > > That is the problem that we start domU at the first time, the enable flag > > will be set while the passthrough device would like to write the real pcie > > bar on the host. > > A pass-through device (i.e. one already owned by a DomU) should never > be allowed to write to the real BAR. But it's not clear whether I'm not > misinterpreting what you said ... > OK. Thanks to clarify this. 
May I know how a passthrough device is supposed to modify a PCI BAR correctly on Xen? Thanks, Ray > > And yes, it's a temporary workaround; we should figure out > > the root cause. > > Right, that's the only way to approach this, imo. > > Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-21 10:14 ` Huang Rui @ 2023-03-21 10:20 ` Jan Beulich 2023-03-21 11:49 ` Huang Rui 0 siblings, 1 reply; 75+ messages in thread From: Jan Beulich @ 2023-03-21 10:20 UTC (permalink / raw) To: Huang Rui Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On 21.03.2023 11:14, Huang Rui wrote: > On Tue, Mar 21, 2023 at 05:41:57PM +0800, Jan Beulich wrote: >> On 21.03.2023 10:36, Huang Rui wrote: >>> On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote: >>>> On 12.03.2023 08:54, Huang Rui wrote: >>>>> --- a/xen/drivers/vpci/header.c >>>>> +++ b/xen/drivers/vpci/header.c >>>>> @@ -392,7 +392,7 @@ static void cf_check bar_write( >>>>> * Xen only cares whether the BAR is mapped into the p2m, so allow BAR >>>>> * writes as long as the BAR is not mapped into the p2m. >>>>> */ >>>>> - if ( bar->enabled ) >>>>> + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) >>>>> { >>>>> /* If the value written is the current one avoid printing a warning. */ >>>>> if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) ) >>>> >>>> ... bar->enabled doesn't properly reflect the necessary state? It >>>> generally shouldn't be necessary to look at the physical device's >>>> state here. >>>> >>>> Furthermore when you make a change in a case like this, the >>>> accompanying comment also needs updating (which might have clarified >>>> what, if anything, has been wrong). >>>> >>> >>> That is the problem that we start domU at the first time, the enable flag >>> will be set while the passthrough device would like to write the real pcie >>> bar on the host. >> >> A pass-through device (i.e. one already owned by a DomU) should never >> be allowed to write to the real BAR. But it's not clear whether I'm not >> misinterpreting what you said ... >> > > OK. 
Thanks for clarifying this. May I know how a passthrough device is supposed > to modify a PCI BAR correctly on Xen? A pass-through device may write to the virtual BAR, changing where in its own memory space the MMIO range appears. But it cannot (and may not) alter where in host memory space the (physical) MMIO range appears. Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-21 10:20 ` Jan Beulich @ 2023-03-21 11:49 ` Huang Rui 2023-03-21 12:20 ` Roger Pau Monné 2023-03-21 12:27 ` Jan Beulich 0 siblings, 2 replies; 75+ messages in thread From: Huang Rui @ 2023-03-21 11:49 UTC (permalink / raw) To: Jan Beulich Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On Tue, Mar 21, 2023 at 06:20:03PM +0800, Jan Beulich wrote: > On 21.03.2023 11:14, Huang Rui wrote: > > On Tue, Mar 21, 2023 at 05:41:57PM +0800, Jan Beulich wrote: > >> On 21.03.2023 10:36, Huang Rui wrote: > >>> On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote: > >>>> On 12.03.2023 08:54, Huang Rui wrote: > >>>>> --- a/xen/drivers/vpci/header.c > >>>>> +++ b/xen/drivers/vpci/header.c > >>>>> @@ -392,7 +392,7 @@ static void cf_check bar_write( > >>>>> * Xen only cares whether the BAR is mapped into the p2m, so allow BAR > >>>>> * writes as long as the BAR is not mapped into the p2m. > >>>>> */ > >>>>> - if ( bar->enabled ) > >>>>> + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) > >>>>> { > >>>>> /* If the value written is the current one avoid printing a warning. */ > >>>>> if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) ) > >>>> > >>>> ... bar->enabled doesn't properly reflect the necessary state? It > >>>> generally shouldn't be necessary to look at the physical device's > >>>> state here. > >>>> > >>>> Furthermore when you make a change in a case like this, the > >>>> accompanying comment also needs updating (which might have clarified > >>>> what, if anything, has been wrong). > >>>> > >>> > >>> That is the problem that we start domU at the first time, the enable flag > >>> will be set while the passthrough device would like to write the real pcie > >>> bar on the host. > >> > >> A pass-through device (i.e. 
one already owned by a DomU) should never > >> be allowed to write to the real BAR. But it's not clear whether I'm not > >> misinterpreting what you said ... > >> > > > > OK. Thanks to clarify this. May I know how does a passthrough device modify > > pci bar with correct behavior on Xen? > > A pass-through device may write to the virtual BAR, changing where in its > own memory space the MMIO range appears. But it cannot (and may not) alter > where in host memory space the (physical) MMIO range appears. > Thanks, but we found that if dom0 is a PV domain, the passthrough device will reach this function and write the real BAR. Thanks, Ray ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-21 11:49 ` Huang Rui @ 2023-03-21 12:20 ` Roger Pau Monné 2023-03-21 12:25 ` Jan Beulich 2023-03-21 12:59 ` Huang Rui 2023-03-21 12:27 ` Jan Beulich 1 sibling, 2 replies; 75+ messages in thread From: Roger Pau Monné @ 2023-03-21 12:20 UTC (permalink / raw) To: Huang Rui Cc: Jan Beulich, Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Stefano Stabellini, Anthony PERARD, xen-devel On Tue, Mar 21, 2023 at 07:49:26PM +0800, Huang Rui wrote: > On Tue, Mar 21, 2023 at 06:20:03PM +0800, Jan Beulich wrote: > > On 21.03.2023 11:14, Huang Rui wrote: > > > On Tue, Mar 21, 2023 at 05:41:57PM +0800, Jan Beulich wrote: > > >> On 21.03.2023 10:36, Huang Rui wrote: > > >>> On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote: > > >>>> On 12.03.2023 08:54, Huang Rui wrote: > > >>>>> --- a/xen/drivers/vpci/header.c > > >>>>> +++ b/xen/drivers/vpci/header.c > > >>>>> @@ -392,7 +392,7 @@ static void cf_check bar_write( > > >>>>> * Xen only cares whether the BAR is mapped into the p2m, so allow BAR > > >>>>> * writes as long as the BAR is not mapped into the p2m. > > >>>>> */ > > >>>>> - if ( bar->enabled ) > > >>>>> + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) > > >>>>> { > > >>>>> /* If the value written is the current one avoid printing a warning. */ > > >>>>> if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) ) > > >>>> > > >>>> ... bar->enabled doesn't properly reflect the necessary state? It > > >>>> generally shouldn't be necessary to look at the physical device's > > >>>> state here. > > >>>> > > >>>> Furthermore when you make a change in a case like this, the > > >>>> accompanying comment also needs updating (which might have clarified > > >>>> what, if anything, has been wrong). 
> > >>>> > > >>> > > >>> That is the problem that we start domU at the first time, the enable flag > > >>> will be set while the passthrough device would like to write the real pcie > > >>> bar on the host. > > >> > > >> A pass-through device (i.e. one already owned by a DomU) should never > > >> be allowed to write to the real BAR. But it's not clear whether I'm not > > >> misinterpreting what you said ... > > >> > > > > > > OK. Thanks to clarify this. May I know how does a passthrough device modify > > > pci bar with correct behavior on Xen? > > > > A pass-through device may write to the virtual BAR, changing where in its > > own memory space the MMIO range appears. But it cannot (and may not) alter > > where in host memory space the (physical) MMIO range appears. > > > > Thanks, but we found if dom0 is PV domain, the passthrough device will > access this function to write the real bar. I'm very confused now, are you trying to use vPCI with HVM domains? As I understood it you are attempting to enable PCI passthrough for HVM guests from a PVH dom0, but now you say your dom0 is PV? Thanks, Roger. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-21 12:20 ` Roger Pau Monné @ 2023-03-21 12:25 ` Jan Beulich 2023-03-21 12:59 ` Huang Rui 1 sibling, 0 replies; 75+ messages in thread From: Jan Beulich @ 2023-03-21 12:25 UTC (permalink / raw) To: Roger Pau Monné Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Stefano Stabellini, Anthony PERARD, xen-devel, Huang Rui On 21.03.2023 13:20, Roger Pau Monné wrote: > On Tue, Mar 21, 2023 at 07:49:26PM +0800, Huang Rui wrote: >> On Tue, Mar 21, 2023 at 06:20:03PM +0800, Jan Beulich wrote: >>> On 21.03.2023 11:14, Huang Rui wrote: >>>> On Tue, Mar 21, 2023 at 05:41:57PM +0800, Jan Beulich wrote: >>>>> On 21.03.2023 10:36, Huang Rui wrote: >>>>>> On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote: >>>>>>> On 12.03.2023 08:54, Huang Rui wrote: >>>>>>>> --- a/xen/drivers/vpci/header.c >>>>>>>> +++ b/xen/drivers/vpci/header.c >>>>>>>> @@ -392,7 +392,7 @@ static void cf_check bar_write( >>>>>>>> * Xen only cares whether the BAR is mapped into the p2m, so allow BAR >>>>>>>> * writes as long as the BAR is not mapped into the p2m. >>>>>>>> */ >>>>>>>> - if ( bar->enabled ) >>>>>>>> + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) >>>>>>>> { >>>>>>>> /* If the value written is the current one avoid printing a warning. */ >>>>>>>> if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) ) >>>>>>> >>>>>>> ... bar->enabled doesn't properly reflect the necessary state? It >>>>>>> generally shouldn't be necessary to look at the physical device's >>>>>>> state here. >>>>>>> >>>>>>> Furthermore when you make a change in a case like this, the >>>>>>> accompanying comment also needs updating (which might have clarified >>>>>>> what, if anything, has been wrong). 
>>>>>>> >>>>>> >>>>>> That is the problem that we start domU at the first time, the enable flag >>>>>> will be set while the passthrough device would like to write the real pcie >>>>>> bar on the host. >>>>> >>>>> A pass-through device (i.e. one already owned by a DomU) should never >>>>> be allowed to write to the real BAR. But it's not clear whether I'm not >>>>> misinterpreting what you said ... >>>>> >>>> >>>> OK. Thanks to clarify this. May I know how does a passthrough device modify >>>> pci bar with correct behavior on Xen? >>> >>> A pass-through device may write to the virtual BAR, changing where in its >>> own memory space the MMIO range appears. But it cannot (and may not) alter >>> where in host memory space the (physical) MMIO range appears. >>> >> >> Thanks, but we found if dom0 is PV domain, the passthrough device will >> access this function to write the real bar. > > I'm very confused now, are you trying to use vPCI with HVM domains? > > As I understood it you are attempting to enable PCI passthrough for > HVM guests from a PVH dom0, but now you say your dom0 is PV? I didn't read it like this. Instead my way of understanding the reply is that they try to mimic on PVH Dom0 what they observe on PV Dom0. Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-21 12:20 ` Roger Pau Monné 2023-03-21 12:25 ` Jan Beulich @ 2023-03-21 12:59 ` Huang Rui 1 sibling, 0 replies; 75+ messages in thread From: Huang Rui @ 2023-03-21 12:59 UTC (permalink / raw) To: Roger Pau Monné Cc: Jan Beulich, Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Stefano Stabellini, Anthony PERARD, xen-devel On Tue, Mar 21, 2023 at 08:20:53PM +0800, Roger Pau Monné wrote: > On Tue, Mar 21, 2023 at 07:49:26PM +0800, Huang Rui wrote: > > On Tue, Mar 21, 2023 at 06:20:03PM +0800, Jan Beulich wrote: > > > On 21.03.2023 11:14, Huang Rui wrote: > > > > On Tue, Mar 21, 2023 at 05:41:57PM +0800, Jan Beulich wrote: > > > >> On 21.03.2023 10:36, Huang Rui wrote: > > > >>> On Wed, Mar 15, 2023 at 12:02:35AM +0800, Jan Beulich wrote: > > > >>>> On 12.03.2023 08:54, Huang Rui wrote: > > > >>>>> --- a/xen/drivers/vpci/header.c > > > >>>>> +++ b/xen/drivers/vpci/header.c > > > >>>>> @@ -392,7 +392,7 @@ static void cf_check bar_write( > > > >>>>> * Xen only cares whether the BAR is mapped into the p2m, so allow BAR > > > >>>>> * writes as long as the BAR is not mapped into the p2m. > > > >>>>> */ > > > >>>>> - if ( bar->enabled ) > > > >>>>> + if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) > > > >>>>> { > > > >>>>> /* If the value written is the current one avoid printing a warning. */ > > > >>>>> if ( val != (uint32_t)(bar->addr >> (hi ? 32 : 0)) ) > > > >>>> > > > >>>> ... bar->enabled doesn't properly reflect the necessary state? It > > > >>>> generally shouldn't be necessary to look at the physical device's > > > >>>> state here. > > > >>>> > > > >>>> Furthermore when you make a change in a case like this, the > > > >>>> accompanying comment also needs updating (which might have clarified > > > >>>> what, if anything, has been wrong). 
> > > >>>> > > >>> > > >>> That is the problem that we start domU at the first time, the enable flag > > >>> will be set while the passthrough device would like to write the real pcie > > >>> bar on the host. > > >> > > >> A pass-through device (i.e. one already owned by a DomU) should never > > >> be allowed to write to the real BAR. But it's not clear whether I'm not > > >> misinterpreting what you said ... > > >> > > > > > > OK. Thanks to clarify this. May I know how does a passthrough device modify > > > pci bar with correct behavior on Xen? > > > > A pass-through device may write to the virtual BAR, changing where in its > > own memory space the MMIO range appears. But it cannot (and may not) alter > > where in host memory space the (physical) MMIO range appears. > > > > Thanks, but we found if dom0 is PV domain, the passthrough device will > access this function to write the real bar. > > I'm very confused now, are you trying to use vPCI with HVM domains? We are using QEMU for passthrough at this moment. > > As I understood it you are attempting to enable PCI passthrough for > HVM guests from a PVH dom0, but now you say your dom0 is PV? > Ah, sorry for the confusion; you're right. I am using PVH dom0 + HVM domU. But we are comparing it against the passthrough function with PV dom0 + HVM domU as a reference. Thanks, Ray ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-21 11:49 ` Huang Rui 2023-03-21 12:20 ` Roger Pau Monné @ 2023-03-21 12:27 ` Jan Beulich 2023-03-21 13:03 ` Huang Rui 1 sibling, 1 reply; 75+ messages in thread From: Jan Beulich @ 2023-03-21 12:27 UTC (permalink / raw) To: Huang Rui Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On 21.03.2023 12:49, Huang Rui wrote: > Thanks, but we found if dom0 is PV domain, the passthrough device will > access this function to write the real bar. Can you please be quite a bit more detailed about this? The specific code paths taken (in upstream software) to result in such would be of interest. Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-21 12:27 ` Jan Beulich @ 2023-03-21 13:03 ` Huang Rui 2023-03-22 7:28 ` Huang Rui 0 siblings, 1 reply; 75+ messages in thread From: Huang Rui @ 2023-03-21 13:03 UTC (permalink / raw) To: Jan Beulich Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On Tue, Mar 21, 2023 at 08:27:21PM +0800, Jan Beulich wrote: > On 21.03.2023 12:49, Huang Rui wrote: > > Thanks, but we found if dom0 is PV domain, the passthrough device will > > access this function to write the real bar. > > Can you please be quite a bit more detailed about this? The specific code > paths taken (in upstream software) to result in such would of of interest. > Yes, please wait a moment; let me capture a trace dump on my side. Thanks, Ray ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-21 13:03 ` Huang Rui @ 2023-03-22 7:28 ` Huang Rui 2023-03-22 7:45 ` Jan Beulich 2023-03-22 9:34 ` Roger Pau Monné 0 siblings, 2 replies; 75+ messages in thread From: Huang Rui @ 2023-03-22 7:28 UTC (permalink / raw) To: Jan Beulich Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On Tue, Mar 21, 2023 at 09:03:58PM +0800, Huang Rui wrote: > On Tue, Mar 21, 2023 at 08:27:21PM +0800, Jan Beulich wrote: > > On 21.03.2023 12:49, Huang Rui wrote: > > > Thanks, but we found if dom0 is PV domain, the passthrough device will > > > access this function to write the real bar. > > > > Can you please be quite a bit more detailed about this? The specific code > > paths taken (in upstream software) to result in such would of of interest. > > > > yes, please wait for a moment. let me capture a trace dump in my side. > Sorry, we were wrong: with a Xen PV dom0, bar_write() is not called, so please ignore the information above. While Xen initializes on PVH dom0, it adds all PCI devices on the real bus, including 0000:03:00.0 (VGA device: GPU) and 0000:03:00.1 (audio device). The audio device is another function of the same PCIe device, but we don't use it here, so we remove it afterwards. 
Please see below xl dmesg: (XEN) PCI add device 0000:03:00.0 (XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0 (XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0 (XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0 (XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0 (XEN) PCI add device 0000:03:00.1 (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 (XEN) PCI add device 0000:04:00.0 ... (XEN) PCI add device 0000:07:00.7 (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0010058 unimplemented (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0011020 unimplemented (XEN) PCI remove device 0000:03:00.1 We run below script to remove audio echo -n "1" > /sys/bus/pci/devices/0000:03:00.1/remove (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029b unimplemented (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029a unimplemented Then we will run "xl pci-assignable-add 03:00.0" to assign GPU as passthrough. At this moment, the real bar is trying to be written. 
(XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1 (XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1 (XEN) Xen WARN at drivers/vpci/header.c:408 (XEN) ----[ Xen-4.18-unstable x86_64 debug=y Not tainted ]---- (XEN) CPU: 8 (XEN) RIP: e008:[<ffff82d040263cb9>] drivers/vpci/header.c#bar_write+0xc0/0x1ce (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor (d0v7) (XEN) rax: ffff8303fc36d06c rbx: ffff8303f90468b0 rcx: 0000000000000010 (XEN) rdx: 0000000000000002 rsi: ffff8303fc36a020 rdi: ffff8303fc36a018 (XEN) rbp: ffff8303fc367c18 rsp: ffff8303fc367be8 r8: 0000000000000001 (XEN) r9: ffff8303fc36a010 r10: 0000000000000001 r11: 0000000000000001 (XEN) r12: 00000000d0700000 r13: ffff8303fc6d9230 r14: ffff8303fc6d9270 (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000003506e0 (XEN) cr3: 00000003fc3c4000 cr2: 00007f180f6371e8 (XEN) fsb: 00007fce655edbc0 gsb: ffff88822f3c0000 gss: 0000000000000000 (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 (XEN) Xen code around <ffff82d040263cb9> (drivers/vpci/header.c#bar_write+0xc0/0x1ce): (XEN) b6 53 14 f6 c2 02 74 02 <0f> 0b 48 8b 03 45 84 ff 0f 85 ec 00 00 00 48 b9 (XEN) Xen stack trace from rsp=ffff8303fc367be8: (XEN) 00000024fc367bf8 ffff8303f9046a50 0000000000000000 0000000000000004 (XEN) 0000000000000004 0000000000000024 ffff8303fc367ca0 ffff82d040263683 (XEN) 00000300fc367ca0 d070000003003501 00000024d0700000 ffff8303fc6d9230 (XEN) 0000000000000000 0000000000000000 0000002400000004 0000000000000000 (XEN) 0000000000000000 0000000000000000 0000000000000004 00000000d0700000 (XEN) 0000000000000024 0000000000000000 ffff82d040404bc0 ffff8303fc367cd0 (XEN) ffff82d0402c60a8 0000030000000001 ffff8303fc367d88 0000000000000000 (XEN) ffff8303fc610800 ffff8303fc367d30 ffff82d0402c54da ffff8303fc367ce0 (XEN) ffff8303fc367fff 0000000000000004 ffff830300000004 00000000d0700000 (XEN) ffff8303fc610800 ffff8303fc367d88 0000000000000001 0000000000000000 (XEN) 0000000000000000 ffff8303fc367d58 
ffff82d0402c5570 0000000000000004 (XEN) ffff8304065ea000 ffff8303fc367e28 ffff8303fc367dd0 ffff82d0402b5357 (XEN) 0000000000000cfc ffff8303fc621000 0000000000000000 0000000000000000 (XEN) 0000000000000cfc 00000000d0700000 0000000400000001 0001000000000000 (XEN) 0000000000000004 0000000000000004 0000000000000000 ffff8303fc367e44 (XEN) ffff8304065ea000 ffff8303fc367e10 ffff82d0402b56d6 0000000000000000 (XEN) ffff8303fc367e44 0000000000000004 0000000000000cfc ffff8304065e6000 (XEN) 0000000000000000 ffff8303fc367e30 ffff82d0402b6bcc ffff8303fc367e44 (XEN) 0000000000000001 ffff8303fc367e70 ffff82d0402c5e80 d070000040203490 (XEN) 000000000000007b ffff8303fc367ef8 ffff8304065e6000 ffff8304065ea000 (XEN) Xen call trace: (XEN) [<ffff82d040263cb9>] R drivers/vpci/header.c#bar_write+0xc0/0x1ce (XEN) [<ffff82d040263683>] F vpci_write+0x123/0x26c (XEN) [<ffff82d0402c60a8>] F arch/x86/hvm/io.c#vpci_portio_write+0xa0/0xa7 (XEN) [<ffff82d0402c54da>] F hvm_process_io_intercept+0x203/0x26f (XEN) [<ffff82d0402c5570>] F hvm_io_intercept+0x2a/0x4c (XEN) [<ffff82d0402b5357>] F arch/x86/hvm/emulate.c#hvmemul_do_io+0x29b/0x5eb (XEN) [<ffff82d0402b56d6>] F arch/x86/hvm/emulate.c#hvmemul_do_io_buffer+0x2f/0x6a (XEN) [<ffff82d0402b6bcc>] F hvmemul_do_pio_buffer+0x33/0x35 (XEN) [<ffff82d0402c5e80>] F handle_pio+0x70/0x1b7 (XEN) [<ffff82d04029dc7f>] F svm_vmexit_handler+0x10ba/0x18aa (XEN) [<ffff82d0402034e5>] F svm_stgi_label+0x8/0x18 (XEN) (XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1 (XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1 Thanks, Ray ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-22 7:28 ` Huang Rui @ 2023-03-22 7:45 ` Jan Beulich 2023-03-22 9:34 ` Roger Pau Monné 1 sibling, 0 replies; 75+ messages in thread From: Jan Beulich @ 2023-03-22 7:45 UTC (permalink / raw) To: Huang Rui Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On 22.03.2023 08:28, Huang Rui wrote: > On Tue, Mar 21, 2023 at 09:03:58PM +0800, Huang Rui wrote: >> On Tue, Mar 21, 2023 at 08:27:21PM +0800, Jan Beulich wrote: >>> On 21.03.2023 12:49, Huang Rui wrote: >>>> Thanks, but we found if dom0 is PV domain, the passthrough device will >>>> access this function to write the real bar. >>> >>> Can you please be quite a bit more detailed about this? The specific code >>> paths taken (in upstream software) to result in such would of of interest. >>> >> >> yes, please wait for a moment. let me capture a trace dump in my side. >> > > Sorry, we are wrong that if Xen PV dom0, bar_write() won't be called, > please ignore above information. > > While xen is on initialization on PVH dom0, it will add all PCI devices in > the real bus including 0000:03:00.0 (VGA device: GPU) and 0000:03:00.1 > (Audio device). > > Audio is another function in the pcie device, but we won't use it here. So > we will remove it after that. 
> > Please see below xl dmesg: > > (XEN) PCI add device 0000:03:00.0 > (XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0 > (XEN) PCI add device 0000:03:00.1 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) PCI add device 0000:04:00.0 > > ... > > (XEN) PCI add device 0000:07:00.7 > (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0010058 unimplemented > (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0011020 unimplemented > (XEN) PCI remove device 0000:03:00.1 > > We run below script to remove audio > > echo -n "1" > /sys/bus/pci/devices/0000:03:00.1/remove Why would you do that? Aiui this is a preparatory step to hot-unplug the device, which surely you don't mean to do. (But this is largely unrelated to the issue at hand; I'm merely curious.) 
> (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029b unimplemented > (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029a unimplemented > > Then we will run "xl pci-assignable-add 03:00.0" to assign GPU as > passthrough. At this moment, the real bar is trying to be written. How do you conclude it's the "real" BAR? And where is this attempt coming from? We refuse BAR updates for enabled BARs for a reason, so possibly there's code elsewhere which needs adjusting. > (XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1 > (XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1 > (XEN) Xen WARN at drivers/vpci/header.c:408 None of these exist in upstream code. Therefore, for the output you supply to be meaningful, we also need to know what code changes you made (which then tells us by how much line numbers have shifted, and what e.g. the WARN_ON() condition is - it clearly isn't tied to bar->enabled being true alone, or else there would have been a 2nd instance at the bottom, unless of course you've stripped that). 
Jan > (XEN) ----[ Xen-4.18-unstable x86_64 debug=y Not tainted ]---- > (XEN) CPU: 8 > (XEN) RIP: e008:[<ffff82d040263cb9>] drivers/vpci/header.c#bar_write+0xc0/0x1ce > (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor (d0v7) > (XEN) rax: ffff8303fc36d06c rbx: ffff8303f90468b0 rcx: 0000000000000010 > (XEN) rdx: 0000000000000002 rsi: ffff8303fc36a020 rdi: ffff8303fc36a018 > (XEN) rbp: ffff8303fc367c18 rsp: ffff8303fc367be8 r8: 0000000000000001 > (XEN) r9: ffff8303fc36a010 r10: 0000000000000001 r11: 0000000000000001 > (XEN) r12: 00000000d0700000 r13: ffff8303fc6d9230 r14: ffff8303fc6d9270 > (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000003506e0 > (XEN) cr3: 00000003fc3c4000 cr2: 00007f180f6371e8 > (XEN) fsb: 00007fce655edbc0 gsb: ffff88822f3c0000 gss: 0000000000000000 > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 > (XEN) Xen code around <ffff82d040263cb9> (drivers/vpci/header.c#bar_write+0xc0/0x1ce): > (XEN) b6 53 14 f6 c2 02 74 02 <0f> 0b 48 8b 03 45 84 ff 0f 85 ec 00 00 00 48 b9 > (XEN) Xen stack trace from rsp=ffff8303fc367be8: > (XEN) 00000024fc367bf8 ffff8303f9046a50 0000000000000000 0000000000000004 > (XEN) 0000000000000004 0000000000000024 ffff8303fc367ca0 ffff82d040263683 > (XEN) 00000300fc367ca0 d070000003003501 00000024d0700000 ffff8303fc6d9230 > (XEN) 0000000000000000 0000000000000000 0000002400000004 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000004 00000000d0700000 > (XEN) 0000000000000024 0000000000000000 ffff82d040404bc0 ffff8303fc367cd0 > (XEN) ffff82d0402c60a8 0000030000000001 ffff8303fc367d88 0000000000000000 > (XEN) ffff8303fc610800 ffff8303fc367d30 ffff82d0402c54da ffff8303fc367ce0 > (XEN) ffff8303fc367fff 0000000000000004 ffff830300000004 00000000d0700000 > (XEN) ffff8303fc610800 ffff8303fc367d88 0000000000000001 0000000000000000 > (XEN) 0000000000000000 ffff8303fc367d58 ffff82d0402c5570 0000000000000004 > (XEN) ffff8304065ea000 ffff8303fc367e28 ffff8303fc367dd0 ffff82d0402b5357 > (XEN) 
0000000000000cfc ffff8303fc621000 0000000000000000 0000000000000000 > (XEN) 0000000000000cfc 00000000d0700000 0000000400000001 0001000000000000 > (XEN) 0000000000000004 0000000000000004 0000000000000000 ffff8303fc367e44 > (XEN) ffff8304065ea000 ffff8303fc367e10 ffff82d0402b56d6 0000000000000000 > (XEN) ffff8303fc367e44 0000000000000004 0000000000000cfc ffff8304065e6000 > (XEN) 0000000000000000 ffff8303fc367e30 ffff82d0402b6bcc ffff8303fc367e44 > (XEN) 0000000000000001 ffff8303fc367e70 ffff82d0402c5e80 d070000040203490 > (XEN) 000000000000007b ffff8303fc367ef8 ffff8304065e6000 ffff8304065ea000 > (XEN) Xen call trace: > (XEN) [<ffff82d040263cb9>] R drivers/vpci/header.c#bar_write+0xc0/0x1ce > (XEN) [<ffff82d040263683>] F vpci_write+0x123/0x26c > (XEN) [<ffff82d0402c60a8>] F arch/x86/hvm/io.c#vpci_portio_write+0xa0/0xa7 > (XEN) [<ffff82d0402c54da>] F hvm_process_io_intercept+0x203/0x26f > (XEN) [<ffff82d0402c5570>] F hvm_io_intercept+0x2a/0x4c > (XEN) [<ffff82d0402b5357>] F arch/x86/hvm/emulate.c#hvmemul_do_io+0x29b/0x5eb > (XEN) [<ffff82d0402b56d6>] F arch/x86/hvm/emulate.c#hvmemul_do_io_buffer+0x2f/0x6a > (XEN) [<ffff82d0402b6bcc>] F hvmemul_do_pio_buffer+0x33/0x35 > (XEN) [<ffff82d0402c5e80>] F handle_pio+0x70/0x1b7 > (XEN) [<ffff82d04029dc7f>] F svm_vmexit_handler+0x10ba/0x18aa > (XEN) [<ffff82d0402034e5>] F svm_stgi_label+0x8/0x18 > (XEN) > (XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1 > (XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1 > > Thanks, > Ray ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-22 7:28 ` Huang Rui 2023-03-22 7:45 ` Jan Beulich @ 2023-03-22 9:34 ` Roger Pau Monné 2023-03-22 12:33 ` Huang Rui 1 sibling, 1 reply; 75+ messages in thread From: Roger Pau Monné @ 2023-03-22 9:34 UTC (permalink / raw) To: Huang Rui Cc: Jan Beulich, Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Stefano Stabellini, Anthony PERARD, xen-devel On Wed, Mar 22, 2023 at 03:28:58PM +0800, Huang Rui wrote: > On Tue, Mar 21, 2023 at 09:03:58PM +0800, Huang Rui wrote: > > On Tue, Mar 21, 2023 at 08:27:21PM +0800, Jan Beulich wrote: > > > On 21.03.2023 12:49, Huang Rui wrote: > > > > Thanks, but we found if dom0 is PV domain, the passthrough device will > > > > access this function to write the real bar. > > > > > > Can you please be quite a bit more detailed about this? The specific code > > > paths taken (in upstream software) to result in such would of of interest. > > > > > > > yes, please wait for a moment. let me capture a trace dump in my side. > > > > Sorry, we are wrong that if Xen PV dom0, bar_write() won't be called, > please ignore above information. > > While xen is on initialization on PVH dom0, it will add all PCI devices in > the real bus including 0000:03:00.0 (VGA device: GPU) and 0000:03:00.1 > (Audio device). > > Audio is another function in the pcie device, but we won't use it here. So > we will remove it after that. 
> > Please see below xl dmesg: > > (XEN) PCI add device 0000:03:00.0 > (XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0 > (XEN) PCI add device 0000:03:00.1 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > (XEN) PCI add device 0000:04:00.0 > > ... > > (XEN) PCI add device 0000:07:00.7 > (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0010058 unimplemented > (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0011020 unimplemented > (XEN) PCI remove device 0000:03:00.1 > > We run below script to remove audio > > echo -n "1" > /sys/bus/pci/devices/0000:03:00.1/remove > > (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029b unimplemented > (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029a unimplemented > > Then we will run "xl pci-assignable-add 03:00.0" to assign GPU as > passthrough. 
At this moment, the real bar is trying to be written. > > (XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1 > (XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1 > (XEN) Xen WARN at drivers/vpci/header.c:408 > (XEN) ----[ Xen-4.18-unstable x86_64 debug=y Not tainted ]---- > (XEN) CPU: 8 > (XEN) RIP: e008:[<ffff82d040263cb9>] drivers/vpci/header.c#bar_write+0xc0/0x1ce > (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor (d0v7) > (XEN) rax: ffff8303fc36d06c rbx: ffff8303f90468b0 rcx: 0000000000000010 > (XEN) rdx: 0000000000000002 rsi: ffff8303fc36a020 rdi: ffff8303fc36a018 > (XEN) rbp: ffff8303fc367c18 rsp: ffff8303fc367be8 r8: 0000000000000001 > (XEN) r9: ffff8303fc36a010 r10: 0000000000000001 r11: 0000000000000001 > (XEN) r12: 00000000d0700000 r13: ffff8303fc6d9230 r14: ffff8303fc6d9270 > (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000003506e0 > (XEN) cr3: 00000003fc3c4000 cr2: 00007f180f6371e8 > (XEN) fsb: 00007fce655edbc0 gsb: ffff88822f3c0000 gss: 0000000000000000 > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 > (XEN) Xen code around <ffff82d040263cb9> (drivers/vpci/header.c#bar_write+0xc0/0x1ce): > (XEN) b6 53 14 f6 c2 02 74 02 <0f> 0b 48 8b 03 45 84 ff 0f 85 ec 00 00 00 48 b9 > (XEN) Xen stack trace from rsp=ffff8303fc367be8: > (XEN) 00000024fc367bf8 ffff8303f9046a50 0000000000000000 0000000000000004 > (XEN) 0000000000000004 0000000000000024 ffff8303fc367ca0 ffff82d040263683 > (XEN) 00000300fc367ca0 d070000003003501 00000024d0700000 ffff8303fc6d9230 > (XEN) 0000000000000000 0000000000000000 0000002400000004 0000000000000000 > (XEN) 0000000000000000 0000000000000000 0000000000000004 00000000d0700000 > (XEN) 0000000000000024 0000000000000000 ffff82d040404bc0 ffff8303fc367cd0 > (XEN) ffff82d0402c60a8 0000030000000001 ffff8303fc367d88 0000000000000000 > (XEN) ffff8303fc610800 ffff8303fc367d30 ffff82d0402c54da ffff8303fc367ce0 > (XEN) ffff8303fc367fff 0000000000000004 ffff830300000004 00000000d0700000 > (XEN) 
ffff8303fc610800 ffff8303fc367d88 0000000000000001 0000000000000000 > (XEN) 0000000000000000 ffff8303fc367d58 ffff82d0402c5570 0000000000000004 > (XEN) ffff8304065ea000 ffff8303fc367e28 ffff8303fc367dd0 ffff82d0402b5357 > (XEN) 0000000000000cfc ffff8303fc621000 0000000000000000 0000000000000000 > (XEN) 0000000000000cfc 00000000d0700000 0000000400000001 0001000000000000 > (XEN) 0000000000000004 0000000000000004 0000000000000000 ffff8303fc367e44 > (XEN) ffff8304065ea000 ffff8303fc367e10 ffff82d0402b56d6 0000000000000000 > (XEN) ffff8303fc367e44 0000000000000004 0000000000000cfc ffff8304065e6000 > (XEN) 0000000000000000 ffff8303fc367e30 ffff82d0402b6bcc ffff8303fc367e44 > (XEN) 0000000000000001 ffff8303fc367e70 ffff82d0402c5e80 d070000040203490 > (XEN) 000000000000007b ffff8303fc367ef8 ffff8304065e6000 ffff8304065ea000 > (XEN) Xen call trace: > (XEN) [<ffff82d040263cb9>] R drivers/vpci/header.c#bar_write+0xc0/0x1ce > (XEN) [<ffff82d040263683>] F vpci_write+0x123/0x26c > (XEN) [<ffff82d0402c60a8>] F arch/x86/hvm/io.c#vpci_portio_write+0xa0/0xa7 > (XEN) [<ffff82d0402c54da>] F hvm_process_io_intercept+0x203/0x26f > (XEN) [<ffff82d0402c5570>] F hvm_io_intercept+0x2a/0x4c > (XEN) [<ffff82d0402b5357>] F arch/x86/hvm/emulate.c#hvmemul_do_io+0x29b/0x5eb > (XEN) [<ffff82d0402b56d6>] F arch/x86/hvm/emulate.c#hvmemul_do_io_buffer+0x2f/0x6a > (XEN) [<ffff82d0402b6bcc>] F hvmemul_do_pio_buffer+0x33/0x35 > (XEN) [<ffff82d0402c5e80>] F handle_pio+0x70/0x1b7 > (XEN) [<ffff82d04029dc7f>] F svm_vmexit_handler+0x10ba/0x18aa > (XEN) [<ffff82d0402034e5>] F svm_stgi_label+0x8/0x18 > (XEN) > (XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1 > (XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1 As said by Jan, it's hard to figure out where are the printks placed without a diff of your changes. So far the above seems to be expected, as we currently don't handle BAR register writes with memory decoding enabled. 
Given the change proposed in this patch, can you check whether `bar->enabled == true` while the PCI command register has the memory-decoding bit unset? If so, it would mean that Xen's state got out of sync with the hardware state, and we would need to figure out where that happened. Is there any backdoor in the AMD GPU that allows disabling memory decoding without using the PCI command register? Regards, Roger. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-22 9:34 ` Roger Pau Monné @ 2023-03-22 12:33 ` Huang Rui 2023-03-22 12:48 ` Jan Beulich 0 siblings, 1 reply; 75+ messages in thread From: Huang Rui @ 2023-03-22 12:33 UTC (permalink / raw) To: Roger Pau Monné, Jan Beulich Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Stefano Stabellini, Anthony PERARD, xen-devel On Wed, Mar 22, 2023 at 05:34:41PM +0800, Roger Pau Monné wrote: > On Wed, Mar 22, 2023 at 03:28:58PM +0800, Huang Rui wrote: > > On Tue, Mar 21, 2023 at 09:03:58PM +0800, Huang Rui wrote: > > > On Tue, Mar 21, 2023 at 08:27:21PM +0800, Jan Beulich wrote: > > > > On 21.03.2023 12:49, Huang Rui wrote: > > > > > Thanks, but we found if dom0 is PV domain, the passthrough device will > > > > > access this function to write the real bar. > > > > > > > > Can you please be quite a bit more detailed about this? The specific code > > > > paths taken (in upstream software) to result in such would of of interest. > > > > > > > > > > yes, please wait for a moment. let me capture a trace dump in my side. > > > > > > > Sorry, we are wrong that if Xen PV dom0, bar_write() won't be called, > > please ignore above information. > > > > While xen is on initialization on PVH dom0, it will add all PCI devices in > > the real bus including 0000:03:00.0 (VGA device: GPU) and 0000:03:00.1 > > (Audio device). > > > > Audio is another function in the pcie device, but we won't use it here. So > > we will remove it after that. 
> > > > Please see below xl dmesg: > > > > (XEN) PCI add device 0000:03:00.0 > > (XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 391 0000:03:00.1 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 406 0000:03:00.1 bar->enabled 0 > > (XEN) PCI add device 0000:03:00.1 > > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 391 0000:04:00.0 bar->enabled 0 > > (XEN) d0v0 bar_write Ray line 406 0000:04:00.0 bar->enabled 0 > > (XEN) PCI add device 0000:04:00.0 > > > > ... 
> > > > (XEN) PCI add device 0000:07:00.7 > > (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0010058 unimplemented > > (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc0011020 unimplemented > > (XEN) PCI remove device 0000:03:00.1 > > > > We run below script to remove audio > > > > echo -n "1" > /sys/bus/pci/devices/0000:03:00.1/remove > > > > (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029b unimplemented > > (XEN) arch/x86/hvm/svm/svm.c:2017:d0v0 RDMSR 0xc001029a unimplemented > > > > Then we will run "xl pci-assignable-add 03:00.0" to assign GPU as > > passthrough. At this moment, the real bar is trying to be written. > > > > (XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1 > > (XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1 > > (XEN) Xen WARN at drivers/vpci/header.c:408 > > (XEN) ----[ Xen-4.18-unstable x86_64 debug=y Not tainted ]---- > > (XEN) CPU: 8 > > (XEN) RIP: e008:[<ffff82d040263cb9>] drivers/vpci/header.c#bar_write+0xc0/0x1ce > > (XEN) RFLAGS: 0000000000010202 CONTEXT: hypervisor (d0v7) > > (XEN) rax: ffff8303fc36d06c rbx: ffff8303f90468b0 rcx: 0000000000000010 > > (XEN) rdx: 0000000000000002 rsi: ffff8303fc36a020 rdi: ffff8303fc36a018 > > (XEN) rbp: ffff8303fc367c18 rsp: ffff8303fc367be8 r8: 0000000000000001 > > (XEN) r9: ffff8303fc36a010 r10: 0000000000000001 r11: 0000000000000001 > > (XEN) r12: 00000000d0700000 r13: ffff8303fc6d9230 r14: ffff8303fc6d9270 > > (XEN) r15: 0000000000000000 cr0: 0000000080050033 cr4: 00000000003506e0 > > (XEN) cr3: 00000003fc3c4000 cr2: 00007f180f6371e8 > > (XEN) fsb: 00007fce655edbc0 gsb: ffff88822f3c0000 gss: 0000000000000000 > > (XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: e008 > > (XEN) Xen code around <ffff82d040263cb9> (drivers/vpci/header.c#bar_write+0xc0/0x1ce): > > (XEN) b6 53 14 f6 c2 02 74 02 <0f> 0b 48 8b 03 45 84 ff 0f 85 ec 00 00 00 48 b9 > > (XEN) Xen stack trace from rsp=ffff8303fc367be8: > > (XEN) 00000024fc367bf8 ffff8303f9046a50 0000000000000000 
0000000000000004 > > (XEN) 0000000000000004 0000000000000024 ffff8303fc367ca0 ffff82d040263683 > > (XEN) 00000300fc367ca0 d070000003003501 00000024d0700000 ffff8303fc6d9230 > > (XEN) 0000000000000000 0000000000000000 0000002400000004 0000000000000000 > > (XEN) 0000000000000000 0000000000000000 0000000000000004 00000000d0700000 > > (XEN) 0000000000000024 0000000000000000 ffff82d040404bc0 ffff8303fc367cd0 > > (XEN) ffff82d0402c60a8 0000030000000001 ffff8303fc367d88 0000000000000000 > > (XEN) ffff8303fc610800 ffff8303fc367d30 ffff82d0402c54da ffff8303fc367ce0 > > (XEN) ffff8303fc367fff 0000000000000004 ffff830300000004 00000000d0700000 > > (XEN) ffff8303fc610800 ffff8303fc367d88 0000000000000001 0000000000000000 > > (XEN) 0000000000000000 ffff8303fc367d58 ffff82d0402c5570 0000000000000004 > > (XEN) ffff8304065ea000 ffff8303fc367e28 ffff8303fc367dd0 ffff82d0402b5357 > > (XEN) 0000000000000cfc ffff8303fc621000 0000000000000000 0000000000000000 > > (XEN) 0000000000000cfc 00000000d0700000 0000000400000001 0001000000000000 > > (XEN) 0000000000000004 0000000000000004 0000000000000000 ffff8303fc367e44 > > (XEN) ffff8304065ea000 ffff8303fc367e10 ffff82d0402b56d6 0000000000000000 > > (XEN) ffff8303fc367e44 0000000000000004 0000000000000cfc ffff8304065e6000 > > (XEN) 0000000000000000 ffff8303fc367e30 ffff82d0402b6bcc ffff8303fc367e44 > > (XEN) 0000000000000001 ffff8303fc367e70 ffff82d0402c5e80 d070000040203490 > > (XEN) 000000000000007b ffff8303fc367ef8 ffff8304065e6000 ffff8304065ea000 > > (XEN) Xen call trace: > > (XEN) [<ffff82d040263cb9>] R drivers/vpci/header.c#bar_write+0xc0/0x1ce > > (XEN) [<ffff82d040263683>] F vpci_write+0x123/0x26c > > (XEN) [<ffff82d0402c60a8>] F arch/x86/hvm/io.c#vpci_portio_write+0xa0/0xa7 > > (XEN) [<ffff82d0402c54da>] F hvm_process_io_intercept+0x203/0x26f > > (XEN) [<ffff82d0402c5570>] F hvm_io_intercept+0x2a/0x4c > > (XEN) [<ffff82d0402b5357>] F arch/x86/hvm/emulate.c#hvmemul_do_io+0x29b/0x5eb > > (XEN) [<ffff82d0402b56d6>] F 
arch/x86/hvm/emulate.c#hvmemul_do_io_buffer+0x2f/0x6a > > (XEN) [<ffff82d0402b6bcc>] F hvmemul_do_pio_buffer+0x33/0x35 > > (XEN) [<ffff82d0402c5e80>] F handle_pio+0x70/0x1b7 > > (XEN) [<ffff82d04029dc7f>] F svm_vmexit_handler+0x10ba/0x18aa > > (XEN) [<ffff82d0402034e5>] F svm_stgi_label+0x8/0x18 > > (XEN) > > (XEN) d0v7 bar_write Ray line 391 0000:03:00.0 bar->enabled 1 > > (XEN) d0v7 bar_write Ray line 406 0000:03:00.0 bar->enabled 1 > Hi Jan, Roger, > As said by Jan, it's hard to figure out where are the printks placed without a > diff of your changes. I attached the diff with my debug prints below, and I want to figure out why bar_write() is called when we use pci-assignable-add to assign a passthrough device on PVH dom0. diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c index 918d11fbce..35447aff2a 100644 --- a/xen/drivers/vpci/header.c +++ b/xen/drivers/vpci/header.c @@ -388,12 +388,14 @@ static void cf_check bar_write( else val &= PCI_BASE_ADDRESS_MEM_MASK; + gprintk(XENLOG_WARNING, "%s Ray line %d %pp bar->enabled %d\n", __func__, __LINE__, &pdev->sbdf , bar->enabled); /* * Xen only cares whether the BAR is mapped into the p2m, so allow BAR * writes as long as the BAR is not mapped into the p2m. */ if ( pci_conf_read16(pdev->sbdf, PCI_COMMAND) & PCI_COMMAND_MEMORY ) { + gprintk(XENLOG_WARNING, "%s Ray line %d %pp bar->enabled %d\n", __func__, __LINE__, &pdev->sbdf , bar->enabled); /* If the value written is the current one avoid printing a warning. */ if ( val != (uint32_t)(bar->addr >> (hi ?
32 : 0)) ) gprintk(XENLOG_WARNING, @@ -401,7 +403,9 @@ static void cf_check bar_write( &pdev->sbdf, bar - pdev->vpci->header.bars + hi); return; } - + gprintk(XENLOG_WARNING, "%s Ray line %d %pp bar->enabled %d\n", __func__, __LINE__, &pdev->sbdf , bar->enabled); + if (bar->enabled) + WARN_ON(1); /* * Update the cached address, so that when memory decoding is enabled > > So far the above seems to be expected, as we currently don't handle BAR > register writes with memory decoding enabled. > > Given the change proposed in this patch, can you check whether `bar->enabled == > true` but the PCI command register has the memory decoding bit unset? I traced that when we do pci-assignable-add, we follow the call chain below to bind the passthrough device. pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign() Then the kernel xen-pciback driver wants to add the virtual configuration space. In this phase, bar_write() in the Xen hypervisor is called. I still need a bit more time to figure out the exact reason. May I know where the xen-pciback driver would trigger an hvm_io_intercept into the Xen hypervisor? [ 309.719049] xen_pciback: wants to seize 0000:03:00.0 [ 462.911251] pciback 0000:03:00.0: xen_pciback: probing... [ 462.911256] pciback 0000:03:00.0: xen_pciback: seizing device [ 462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc [ 462.911261] pciback 0000:03:00.0: xen_pciback: initializing...
[ 462.911263] pciback 0000:03:00.0: xen_pciback: initializing config [ 462.911265] pciback 0000:03:00.0: xen-pciback: initializing virtual configuration space [ 462.911268] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x00 [ 462.911271] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x02 [ 462.911284] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x04 [ 462.911286] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3c [ 462.911289] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3d [ 462.911291] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0c [ 462.911294] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0d [ 462.911296] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0f [ 462.911301] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x10 [ 462.911306] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x14 [ 462.911309] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x18 [ 462.911313] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x1c [ 462.911317] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x20 [ 462.911321] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x24 [ 462.911325] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x30 [ 462.911358] pciback 0000:03:00.0: Found capability 0x1 at 0x50 [ 462.911361] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x50 [ 462.911363] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x52 [ 462.911368] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x54 [ 462.911371] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x56 [ 462.911373] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x57 [ 462.911386] pciback 0000:03:00.0: Found capability 0x5 at 0xa0 [ 462.911388] pciback 0000:03:00.0: xen-pciback: 
added config field at offset 0xa0 [ 462.911391] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa2 [ 462.911405] pciback 0000:03:00.0: xen_pciback: enabling device [ 462.911412] pciback 0000:03:00.0: enabling device (0006 -> 0007) [ 462.911658] Already setup the GSI :28 [ 462.911668] Already map the GSI :28 and IRQ: 115 [ 462.911684] pciback 0000:03:00.0: xen_pciback: save state of device [ 462.912154] pciback 0000:03:00.0: xen_pciback: resetting (FLR, D3, etc) the device [ 463.954998] pciback 0000:03:00.0: xen_pciback: reset device > > If so it would mean Xen state got out-of-sync with the hardware state, and we > would need to figure out where it happened. Is there any backdoor in the AMD > GPU that allows to disable memory decoding without using the PCI command > register? > I don't think we have any backdoor. Thanks, Ray ^ permalink raw reply related [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-22 12:33 ` Huang Rui @ 2023-03-22 12:48 ` Jan Beulich 2023-03-23 10:26 ` Huang Rui 2023-03-23 10:43 ` Roger Pau Monné 0 siblings, 2 replies; 75+ messages in thread From: Jan Beulich @ 2023-03-22 12:48 UTC (permalink / raw) To: Huang Rui Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Stefano Stabellini, Anthony PERARD, xen-devel, Roger Pau Monné On 22.03.2023 13:33, Huang Rui wrote: > I traced that while we do pci-assignable-add, we will follow below trace to > bind the passthrough device. > > pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign() > > Then kernel xen-pciback driver want to add virtual configuration spaces. In > this phase, the bar_write() in xen hypervisor will be called. I still need > a bit more time to figure the exact reason. May I know where the > xen-pciback driver would trigger a hvm_io_intercept to xen hypervisor? Any config space access would. And I might guess ... > [ 309.719049] xen_pciback: wants to seize 0000:03:00.0 > [ 462.911251] pciback 0000:03:00.0: xen_pciback: probing... > [ 462.911256] pciback 0000:03:00.0: xen_pciback: seizing device > [ 462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc > [ 462.911261] pciback 0000:03:00.0: xen_pciback: initializing... 
> [...] > [ 462.911684] pciback 0000:03:00.0: xen_pciback: save state of device > [ 462.912154] pciback 0000:03:00.0: xen_pciback: resetting (FLR, D3, etc) the device > [ 463.954998] pciback 0000:03:00.0: xen_pciback: reset device ... it is actually the reset here, saving and then restoring config space. If e.g. that restore was done "blindly" (i.e. simply writing fields low to high), then memory decode would be re-enabled before the BARs are written. Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-22 12:48 ` Jan Beulich @ 2023-03-23 10:26 ` Huang Rui 2023-03-23 14:16 ` Jan Beulich 2023-03-23 10:43 ` Roger Pau Monné 1 sibling, 1 reply; 75+ messages in thread From: Huang Rui @ 2023-03-23 10:26 UTC (permalink / raw) To: Jan Beulich Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Stefano Stabellini, Anthony PERARD, xen-devel, Roger Pau Monné On Wed, Mar 22, 2023 at 08:48:30PM +0800, Jan Beulich wrote: > On 22.03.2023 13:33, Huang Rui wrote: > > I traced that while we do pci-assignable-add, we will follow below trace to > > bind the passthrough device. > > > > pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign() > > > > Then kernel xen-pciback driver want to add virtual configuration spaces. In > > this phase, the bar_write() in xen hypervisor will be called. I still need > > a bit more time to figure the exact reason. May I know where the > > xen-pciback driver would trigger a hvm_io_intercept to xen hypervisor? > > Any config space access would. And I might guess ... > > > [ 309.719049] xen_pciback: wants to seize 0000:03:00.0 > > [ 462.911251] pciback 0000:03:00.0: xen_pciback: probing... > > [ 462.911256] pciback 0000:03:00.0: xen_pciback: seizing device > > [ 462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc > > [ 462.911261] pciback 0000:03:00.0: xen_pciback: initializing... 
> > [...] > > [ 462.911684] pciback 0000:03:00.0: xen_pciback: save state of device > > [ 462.912154] pciback 0000:03:00.0: xen_pciback: resetting (FLR, D3, etc) the device > > [ 463.954998] pciback 0000:03:00.0: xen_pciback: reset device > > ... it is actually the reset here, saving and then restoring config space. > If e.g. that restore was done "blindly" (i.e. simply writing fields low to > high), then memory decode would be re-enabled before the BARs are written. > Yes, we have confirmed that the problem occurs while the xen-pciback driver initializes the passthrough device via pcistub_init_device() -> pci_restore_state() -> pci_restore_config_space() -> pci_restore_config_space_range() -> pci_restore_config_dword() -> pci_write_config_dword(): the PCI config write traps into bar_write() in Xen, and because bar->enabled is set, the write is not actually performed. May I know whether this behavior (the restore) is expected? Or should the device not be reset? Thanks, Ray ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-23 10:26 ` Huang Rui @ 2023-03-23 14:16 ` Jan Beulich 0 siblings, 0 replies; 75+ messages in thread From: Jan Beulich @ 2023-03-23 14:16 UTC (permalink / raw) To: Huang Rui Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Stefano Stabellini, Anthony PERARD, xen-devel, Roger Pau Monné On 23.03.2023 11:26, Huang Rui wrote: > On Wed, Mar 22, 2023 at 08:48:30PM +0800, Jan Beulich wrote: >> On 22.03.2023 13:33, Huang Rui wrote: >>> I traced that while we do pci-assignable-add, we will follow below trace to >>> bind the passthrough device. >>> >>> pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign() >>> >>> Then kernel xen-pciback driver want to add virtual configuration spaces. In >>> this phase, the bar_write() in xen hypervisor will be called. I still need >>> a bit more time to figure the exact reason. May I know where the >>> xen-pciback driver would trigger a hvm_io_intercept to xen hypervisor? >> >> Any config space access would. And I might guess ... >> >>> [ 309.719049] xen_pciback: wants to seize 0000:03:00.0 >>> [ 462.911251] pciback 0000:03:00.0: xen_pciback: probing... >>> [ 462.911256] pciback 0000:03:00.0: xen_pciback: seizing device >>> [ 462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc >>> [ 462.911261] pciback 0000:03:00.0: xen_pciback: initializing... 
>>> [ 462.911263] pciback 0000:03:00.0: xen_pciback: initializing config >>> [ 462.911265] pciback 0000:03:00.0: xen-pciback: initializing virtual configuration space >>> [ 462.911268] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x00 >>> [ 462.911271] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x02 >>> [ 462.911284] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x04 >>> [ 462.911286] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3c >>> [ 462.911289] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x3d >>> [ 462.911291] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0c >>> [ 462.911294] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0d >>> [ 462.911296] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x0f >>> [ 462.911301] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x10 >>> [ 462.911306] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x14 >>> [ 462.911309] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x18 >>> [ 462.911313] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x1c >>> [ 462.911317] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x20 >>> [ 462.911321] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x24 >>> [ 462.911325] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x30 >>> [ 462.911358] pciback 0000:03:00.0: Found capability 0x1 at 0x50 >>> [ 462.911361] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x50 >>> [ 462.911363] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x52 >>> [ 462.911368] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x54 >>> [ 462.911371] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x56 >>> [ 462.911373] pciback 0000:03:00.0: xen-pciback: added config field at offset 0x57 >>> [ 462.911386] 
pciback 0000:03:00.0: Found capability 0x5 at 0xa0 >>> [ 462.911388] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa0 >>> [ 462.911391] pciback 0000:03:00.0: xen-pciback: added config field at offset 0xa2 >>> [ 462.911405] pciback 0000:03:00.0: xen_pciback: enabling device >>> [ 462.911412] pciback 0000:03:00.0: enabling device (0006 -> 0007) >>> [ 462.911658] Already setup the GSI :28 >>> [ 462.911668] Already map the GSI :28 and IRQ: 115 >>> [ 462.911684] pciback 0000:03:00.0: xen_pciback: save state of device >>> [ 462.912154] pciback 0000:03:00.0: xen_pciback: resetting (FLR, D3, etc) the device >>> [ 463.954998] pciback 0000:03:00.0: xen_pciback: reset device >> >> ... it is actually the reset here, saving and then restoring config space. >> If e.g. that restore was done "blindly" (i.e. simply writing fields low to >> high), then memory decode would be re-enabled before the BARs are written. >> > > Yes, we confirm the problem is while the xen-pciback driver initializes > passthrough device with pcistub_init_device() -> pci_restore_state() -> > pci_restore_config_space() -> pci_restore_config_space_range() -> > pci_restore_config_dword() -> pci_write_config_dword(), the pci config > write will trigger io interrupt to bar_write() in the xen, then bar->enable > is set, the write is not actually allowed. > > May I know whether this behavior (restore) is expected? Or it should not > reset the device. The reset is expected. To expand slightly on Roger's reply: The reset we're unaware of has likely indeed brought bar->enable and command register state out of sync. For everything else see Roger's response. Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
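The restore-ordering hazard Jan describes can be sketched with a small model (illustrative Python, not Xen or Linux code; the register offsets are the standard PCI config-space ones, everything else is invented for the sketch). Restoring a saved snapshot "blindly" from low offsets to high writes the command register (0x04) — re-enabling memory decode — before the BARs (0x10 and up) have been restored; restoring from high offsets down closes that window:

```python
PCI_COMMAND = 0x04   # command register; bit 1 = memory decode enable
PCI_BAR0 = 0x10      # first BAR
MEM_DECODE = 0x2

def restore(saved, order):
    """Replay a saved config-space snapshot into a blanked device and
    report whether memory decode was ever on while a BAR still held a
    stale (not-yet-restored) value."""
    dev = {off: 0 for off in saved}
    hazard = False
    for off in order:
        dev[off] = saved[off]
        if dev[PCI_COMMAND] & MEM_DECODE and dev[PCI_BAR0] != saved[PCI_BAR0]:
            hazard = True
    return hazard

saved = {PCI_COMMAND: MEM_DECODE, PCI_BAR0: 0xE0000000}

low_to_high = sorted(saved)                 # "blind" restore: command before BARs
high_to_low = sorted(saved, reverse=True)   # BARs first, command register last

print(restore(saved, low_to_high))   # True  -> decode enabled with a stale BAR
print(restore(saved, high_to_low))   # False -> safe ordering
```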
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-22 12:48 ` Jan Beulich 2023-03-23 10:26 ` Huang Rui @ 2023-03-23 10:43 ` Roger Pau Monné 2023-03-23 13:34 ` Huang Rui 1 sibling, 1 reply; 75+ messages in thread From: Roger Pau Monné @ 2023-03-23 10:43 UTC (permalink / raw) To: Jan Beulich Cc: Huang Rui, Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Stefano Stabellini, Anthony PERARD, xen-devel On Wed, Mar 22, 2023 at 01:48:30PM +0100, Jan Beulich wrote: > On 22.03.2023 13:33, Huang Rui wrote: > > I traced that while we do pci-assignable-add, we will follow below trace to > > bind the passthrough device. > > > > pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign() > > > > Then kernel xen-pciback driver want to add virtual configuration spaces. In > > this phase, the bar_write() in xen hypervisor will be called. I still need > > a bit more time to figure the exact reason. May I know where the > > xen-pciback driver would trigger a hvm_io_intercept to xen hypervisor? > > Any config space access would. And I might guess ... > > > [ 309.719049] xen_pciback: wants to seize 0000:03:00.0 > > [ 462.911251] pciback 0000:03:00.0: xen_pciback: probing... > > [ 462.911256] pciback 0000:03:00.0: xen_pciback: seizing device > > [ 462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc > > [ 462.911261] pciback 0000:03:00.0: xen_pciback: initializing... 
> > [... xen-pciback probe/config log snipped; quoted in full in the reply above ...] > ... it is actually the reset here, saving and then restoring config space. > If e.g. that restore was done "blindly" (i.e. simply writing fields low to > high), then memory decode would be re-enabled before the BARs are written. The problem is also that we don't tell vPCI that the device has been reset, so the current cached state in pdev->vpci is all out of date with the real device state. I didn't hit this on my test because the device I was using had no reset support. I don't think it's feasible for Xen to detect all the possible reset methods dom0 might use, as some of those are device specific, for example. We would have to introduce a new hypercall that clears all vPCI device state, PHYSDEVOP_pci_device_reset for example. This will involve adding proper cleanup functions, as the current code in vpci_remove_device() only deals with allocated memory (because so far devices were not deassigned), but we now also need to make sure MSI(-X) interrupts are torn down and freed, and will also require removing any mappings of BARs into the dom0 physmap. Thanks, Roger. ^ permalink raw reply [flat|nested] 75+ messages in thread
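Roger's point about stale cached state can be sketched with a toy model (illustrative Python, not Xen code; the class and method names are invented). A device reset performed behind the hypervisor's back leaves the vPCI cache disagreeing with the hardware until an explicit reset notification resynchronizes it — which is what a PHYSDEVOP_pci_device_reset-style hypercall would provide:

```python
class Device:
    """Toy physical device: one BAR value and a memory-decode flag."""
    def __init__(self):
        self.bar = 0xE0000000
        self.mem_decode = True

    def flr(self):
        """A function-level reset clears config state behind Xen's back."""
        self.bar = 0
        self.mem_decode = False

class VPci:
    """Toy vPCI layer caching what it believes the device state to be."""
    def __init__(self, dev):
        self.dev = dev
        self.enabled = dev.mem_decode   # loosely analogous to bar->enabled

    def reset_hook(self):
        """What a reset-notification hypercall would do: drop the cached
        state and re-read the real device."""
        self.enabled = self.dev.mem_decode

dev = Device()
vpci = VPci(dev)

dev.flr()                               # dom0 resets the device; Xen not told
print(vpci.enabled == dev.mem_decode)   # False: the cache is stale

vpci.reset_hook()                       # hypothetical reset notification
print(vpci.enabled == dev.mem_decode)   # True: back in sync
```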
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-23 10:43 ` Roger Pau Monné @ 2023-03-23 13:34 ` Huang Rui 2023-03-23 16:23 ` Roger Pau Monné 0 siblings, 1 reply; 75+ messages in thread From: Huang Rui @ 2023-03-23 13:34 UTC (permalink / raw) To: Roger Pau Monné Cc: Jan Beulich, Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Stefano Stabellini, Anthony PERARD, xen-devel On Thu, Mar 23, 2023 at 06:43:53PM +0800, Roger Pau Monné wrote: > On Wed, Mar 22, 2023 at 01:48:30PM +0100, Jan Beulich wrote: > > On 22.03.2023 13:33, Huang Rui wrote: > > > I traced that while we do pci-assignable-add, we will follow below trace to > > > bind the passthrough device. > > > > > > pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign() > > > > > > Then kernel xen-pciback driver want to add virtual configuration spaces. In > > > this phase, the bar_write() in xen hypervisor will be called. I still need > > > a bit more time to figure the exact reason. May I know where the > > > xen-pciback driver would trigger a hvm_io_intercept to xen hypervisor? > > > > Any config space access would. And I might guess ... > > > > > [ 309.719049] xen_pciback: wants to seize 0000:03:00.0 > > > [ 462.911251] pciback 0000:03:00.0: xen_pciback: probing... > > > [ 462.911256] pciback 0000:03:00.0: xen_pciback: seizing device > > > [ 462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc > > > [ 462.911261] pciback 0000:03:00.0: xen_pciback: initializing... 
> > > [... xen-pciback probe/config log snipped; quoted in full earlier in the thread ...] > > > > ... it is actually the reset here, saving and then restoring config space. > > If e.g. that restore was done "blindly" (i.e. simply writing fields low to > > high), then memory decode would be re-enabled before the BARs are written. > > The problem is also that we don't tell vPCI that the device has been > reset, so the current cached state in pdev->vpci is all out of date > with the real device state. > > I didn't hit this on my test because the device I was using had no > reset support. > > I don't think it's feasible for Xen to detect all the possible reset > methods dom0 might use, as some of those are device specific for > example. OK. > > We would have to introduce a new hypercall that clears all vPCI device > state, PHYSDEVOP_pci_device_reset for example. This will involve > adding proper cleanup functions, as the current code in > vpci_remove_device() only deals with allocated memory (because so far > devices were not deassigned) but we now also need to make sure > MSI(-X) interrupts are torn down and freed, and will also require > removing any mappings of BARs into the dom0 physmap. > Thanks for the suggestion.
Let me implement the new PHYSDEVOP_pci_device_reset in the next version instead of the current workaround. MSI(-X) interrupts don't work on our platform; I haven't figured out the root cause yet. Could you please elaborate on where we would need to remove mappings of BARs from the dom0 physmap? Thanks, Ray ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-23 13:34 ` Huang Rui @ 2023-03-23 16:23 ` Roger Pau Monné 2023-03-24 4:37 ` Huang Rui 0 siblings, 1 reply; 75+ messages in thread From: Roger Pau Monné @ 2023-03-23 16:23 UTC (permalink / raw) To: Huang Rui Cc: Jan Beulich, Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Stefano Stabellini, Anthony PERARD, xen-devel On Thu, Mar 23, 2023 at 09:34:40PM +0800, Huang Rui wrote: > On Thu, Mar 23, 2023 at 06:43:53PM +0800, Roger Pau Monné wrote: > > On Wed, Mar 22, 2023 at 01:48:30PM +0100, Jan Beulich wrote: > > > On 22.03.2023 13:33, Huang Rui wrote: > > > > I traced that while we do pci-assignable-add, we will follow below trace to > > > > bind the passthrough device. > > > > > > > > pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign() > > > > > > > > Then kernel xen-pciback driver want to add virtual configuration spaces. In > > > > this phase, the bar_write() in xen hypervisor will be called. I still need > > > > a bit more time to figure the exact reason. May I know where the > > > > xen-pciback driver would trigger a hvm_io_intercept to xen hypervisor? > > > > > > Any config space access would. And I might guess ... > > > > > > > [ 309.719049] xen_pciback: wants to seize 0000:03:00.0 > > > > [ 462.911251] pciback 0000:03:00.0: xen_pciback: probing... > > > > [ 462.911256] pciback 0000:03:00.0: xen_pciback: seizing device > > > > [ 462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc > > > > [ 462.911261] pciback 0000:03:00.0: xen_pciback: initializing... 
> > > > [... xen-pciback probe/config log snipped; quoted in full earlier in the thread ...] > > > ... it is actually the reset here, saving and then restoring config space. > > > If e.g. that restore was done "blindly" (i.e. simply writing fields low to > > > high), then memory decode would be re-enabled before the BARs are written. > > The problem is also that we don't tell vPCI that the device has been > > reset, so the current cached state in pdev->vpci is all out of date > > with the real device state. > > I didn't hit this on my test because the device I was using had no > > reset support. > > I don't think it's feasible for Xen to detect all the possible reset > > methods dom0 might use, as some of those are device specific for > > example. > OK. > > We would have to introduce a new hypercall that clears all vPCI device > > state, PHYSDEVOP_pci_device_reset for example. This will involve > > adding proper cleanup functions, as the current code in > > vpci_remove_device() only deals with allocated memory (because so far > > devices were not deassigned) but we now also need to make sure > > MSI(-X) interrupts are torn down and freed, and will also require > > removing any mappings of BARs into the dom0 physmap.
> > > > Thanks for the suggestion. Let me implement the new PHYSDEVOP_pci_device_reset > in the next version instead of the current workaround. > > MSI(-X) interrupts don't work on our platform; I haven't figured out the > root cause yet. Do MSI-X interrupts work when the device is in use by dom0 (both PV and PVH)? > > Could you please elaborate on where we would need to remove > any mappings of BARs from the dom0 physmap? I think you can just use `modify_bars(pdev, 0, 0)`, as that will effectively remove any BARs from the memory map. That should also take care of preemption, so you should be good to go. Regards, Roger. ^ permalink raw reply [flat|nested] 75+ messages in thread
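The effect of the suggested `modify_bars(pdev, 0, 0)` call can be sketched with a toy model (illustrative Python, not the real vPCI code; the frame-map representation and the BAR base/size are hypothetical). The idea mirrored here is only that passing a zeroed map argument removes every BAR range from the dom0 physmap:

```python
def modify_bars(physmap, bars, map_bars):
    """Toy analogue of vPCI's modify_bars(): with map_bars falsy it
    removes every BAR range from the physmap; otherwise it maps them."""
    for base, size in bars:
        frames = range(base >> 12, (base + size) >> 12)
        if map_bars:
            # PVH dom0 BARs are identity-mapped in this sketch
            physmap.update({f: f for f in frames})
        else:
            for f in frames:
                physmap.pop(f, None)
    return physmap

bars = [(0xE0000000, 0x10000)]           # one 64 KiB BAR (hypothetical)
physmap = modify_bars({}, bars, True)
print(len(physmap))                      # 16 -> frames mapped

modify_bars(physmap, bars, False)        # the modify_bars(pdev, 0, 0) case
print(len(physmap))                      # 0 -> BARs gone from the memory map
```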
* Re: [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH 2023-03-23 16:23 ` Roger Pau Monné @ 2023-03-24 4:37 ` Huang Rui 0 siblings, 0 replies; 75+ messages in thread From: Huang Rui @ 2023-03-24 4:37 UTC (permalink / raw) To: Roger Pau Monné Cc: Jan Beulich, Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Stefano Stabellini, Anthony PERARD, xen-devel On Fri, Mar 24, 2023 at 12:23:39AM +0800, Roger Pau Monné wrote: > On Thu, Mar 23, 2023 at 09:34:40PM +0800, Huang Rui wrote: > > On Thu, Mar 23, 2023 at 06:43:53PM +0800, Roger Pau Monné wrote: > > > On Wed, Mar 22, 2023 at 01:48:30PM +0100, Jan Beulich wrote: > > > > On 22.03.2023 13:33, Huang Rui wrote: > > > > > I traced that while we do pci-assignable-add, we will follow below trace to > > > > > bind the passthrough device. > > > > > > > > > > pciassignable_add()->libxl_device_pci_assignable_add()->libxl__device_pci_assignable_add()->pciback_dev_assign() > > > > > > > > > > Then kernel xen-pciback driver want to add virtual configuration spaces. In > > > > > this phase, the bar_write() in xen hypervisor will be called. I still need > > > > > a bit more time to figure the exact reason. May I know where the > > > > > xen-pciback driver would trigger a hvm_io_intercept to xen hypervisor? > > > > > > > > Any config space access would. And I might guess ... > > > > > > > > > [ 309.719049] xen_pciback: wants to seize 0000:03:00.0 > > > > > [ 462.911251] pciback 0000:03:00.0: xen_pciback: probing... > > > > > [ 462.911256] pciback 0000:03:00.0: xen_pciback: seizing device > > > > > [ 462.911257] pciback 0000:03:00.0: xen_pciback: pcistub_device_alloc > > > > > [ 462.911261] pciback 0000:03:00.0: xen_pciback: initializing... 
> > > > > [... xen-pciback probe/config log snipped; quoted in full earlier in the thread ...] > > > > ... it is actually the reset here, saving and then restoring config space. > > > > If e.g. that restore was done "blindly" (i.e. simply writing fields low to > > > > high), then memory decode would be re-enabled before the BARs are written. > > > The problem is also that we don't tell vPCI that the device has been > > > reset, so the current cached state in pdev->vpci is all out of date > > > with the real device state. > > > I didn't hit this on my test because the device I was using had no > > > reset support. > > > I don't think it's feasible for Xen to detect all the possible reset > > > methods dom0 might use, as some of those are device specific for > > > example. > > OK. > > > > We would have to introduce a new hypercall that clears all vPCI device > > > state, PHYSDEVOP_pci_device_reset for example.
This will involve > > > adding proper cleanup functions, as the current code in > > > vpci_remove_device() only deals with allocated memory (because so far > > > devices were not deassigned) but we now also need to make sure > > > MSI(-X) interrupts are torn down and freed, and will also require > > > removing any mappings of BARs into the dom0 physmap. > > > > > > Thanks for the suggestion. Let me implement the new PHYSDEVOP_pci_device_reset > > in the next version instead of the current workaround. > > > > MSI(-X) interrupts don't work on our platform; I haven't figured out the > > root cause yet. > > Do MSI-X interrupts work when the device is in use by dom0 (both PV > and PVH)? Yes, dom0 works well. But they don't work for passthrough devices in domU, with either PV or PVH. So I would like to implement GSI support first, then continue investigating the MSI(-X) issues. > > > Could you please elaborate on where we would need to remove > > any mappings of BARs from the dom0 physmap? > > I think you can just use `modify_bars(pdev, 0, 0)`, as that will > effectively remove any BARs from the memory map. That should also > take care of preemption, so you should be good to go. > Thanks, Ray ^ permalink raw reply [flat|nested] 75+ messages in thread
* [RFC XEN PATCH 3/6] x86/pvh: shouldn't check pirq flag when map pirq in PVH 2023-03-12 7:54 [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0 Huang Rui 2023-03-12 7:54 ` [RFC XEN PATCH 1/6] x86/pvh: report ACPI VFCT table to dom0 if present Huang Rui 2023-03-12 7:54 ` [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH Huang Rui @ 2023-03-12 7:54 ` Huang Rui 2023-03-14 16:27 ` Jan Beulich 2023-03-15 15:57 ` Roger Pau Monné 2023-03-12 7:54 ` [RFC XEN PATCH 4/6] x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call Huang Rui ` (4 subsequent siblings) 7 siblings, 2 replies; 75+ messages in thread From: Huang Rui @ 2023-03-12 7:54 UTC (permalink / raw) To: Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel Cc: Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Huang Rui From: Chen Jiqian <Jiqian.Chen@amd.com> PVH is also an HVM-type domain, but PVH doesn't have the X86_EMU_USE_PIRQ flag. So, when dom0 is PVH and calls PHYSDEVOP_map_pirq, it will fail at the has_pirq() check. Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> --- xen/arch/x86/hvm/hypercall.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index 405d0a95af..16a2f5c0b3 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -89,8 +89,6 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) case PHYSDEVOP_eoi: case PHYSDEVOP_irq_status_query: case PHYSDEVOP_get_free_pirq: - if ( !has_pirq(currd) ) - return -ENOSYS; break; case PHYSDEVOP_pci_mmcfg_reserved: -- 2.25.1 ^ permalink raw reply related [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 3/6] x86/pvh: shouldn't check pirq flag when map pirq in PVH 2023-03-12 7:54 ` [RFC XEN PATCH 3/6] x86/pvh: shouldn't check pirq flag when map pirq in PVH Huang Rui @ 2023-03-14 16:27 ` Jan Beulich 2023-03-15 15:57 ` Roger Pau Monné 1 sibling, 0 replies; 75+ messages in thread From: Jan Beulich @ 2023-03-14 16:27 UTC (permalink / raw) To: Huang Rui Cc: Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On 12.03.2023 08:54, Huang Rui wrote: > From: Chen Jiqian <Jiqian.Chen@amd.com> > > PVH is also hvm type domain, but PVH hasn't X86_EMU_USE_PIRQ > flag. So, when dom0 is PVH and call PHYSDEVOP_map_pirq, it > will fail at check has_pirq(); > > Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > Signed-off-by: Huang Rui <ray.huang@amd.com> Please see b96b50004804 ("x86: remove XENFEAT_hvm_pirqs for PVHv2 guests"), which clearly says that these sub-ops shouldn't be used by PVH domains. Plus if you're after just one sub-op (assuming that indeed needs making available for a yet to be supplied reason), why ... > --- a/xen/arch/x86/hvm/hypercall.c > +++ b/xen/arch/x86/hvm/hypercall.c > @@ -89,8 +89,6 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) > case PHYSDEVOP_eoi: > case PHYSDEVOP_irq_status_query: > case PHYSDEVOP_get_free_pirq: > - if ( !has_pirq(currd) ) > - return -ENOSYS; > break; ... do you enable several more by simply dropping code altogether? Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
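Jan's objection can be sketched with a toy dispatcher (illustrative Python, not the real hvm_physdev_op(); the narrower gating shown is one hypothetical alternative, not something the thread settled on): deleting the has_pirq() check opens all five sub-ops at once, whereas exempting only the sub-op actually needed keeps the rest gated:

```python
PIRQ_OPS = {"map_pirq", "unmap_pirq", "eoi", "irq_status_query", "get_free_pirq"}

def hvm_physdev_op(cmd, has_pirq, is_hw_domain):
    """Toy dispatcher. Dropping the has_pirq gate entirely (as the patch
    does) would allow every op in PIRQ_OPS; the hypothetical narrower
    change below exempts only map_pirq, and only for a hardware domain."""
    if cmd in PIRQ_OPS:
        if not has_pirq and not (is_hw_domain and cmd == "map_pirq"):
            return -1   # stands in for -ENOSYS in the real code
    return 0

print(hvm_physdev_op("map_pirq", False, True))       # 0  -> allowed for PVH dom0
print(hvm_physdev_op("get_free_pirq", False, True))  # -1 -> still gated
```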
* Re: [RFC XEN PATCH 3/6] x86/pvh: shouldn't check pirq flag when map pirq in PVH 2023-03-12 7:54 ` [RFC XEN PATCH 3/6] x86/pvh: shouldn't check pirq flag when map pirq in PVH Huang Rui 2023-03-14 16:27 ` Jan Beulich @ 2023-03-15 15:57 ` Roger Pau Monné 2023-03-16 0:22 ` Stefano Stabellini 2023-03-21 10:09 ` Huang Rui 1 sibling, 2 replies; 75+ messages in thread From: Roger Pau Monné @ 2023-03-15 15:57 UTC (permalink / raw) To: Huang Rui Cc: Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian On Sun, Mar 12, 2023 at 03:54:52PM +0800, Huang Rui wrote: > From: Chen Jiqian <Jiqian.Chen@amd.com> > > PVH is also hvm type domain, but PVH hasn't X86_EMU_USE_PIRQ > flag. So, when dom0 is PVH and call PHYSDEVOP_map_pirq, it > will fail at check has_pirq(); > > Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > Signed-off-by: Huang Rui <ray.huang@amd.com> > --- > xen/arch/x86/hvm/hypercall.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c > index 405d0a95af..16a2f5c0b3 100644 > --- a/xen/arch/x86/hvm/hypercall.c > +++ b/xen/arch/x86/hvm/hypercall.c > @@ -89,8 +89,6 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) > case PHYSDEVOP_eoi: > case PHYSDEVOP_irq_status_query: > case PHYSDEVOP_get_free_pirq: > - if ( !has_pirq(currd) ) > - return -ENOSYS; Since I've taken a look at the Linux side of this, it seems like you need PHYSDEVOP_map_pirq and PHYSDEVOP_setup_gsi, but the latter is not in this list because it has never been available to HVM-type guests. I would like to better understand the usage by PVH dom0 for GSI passthrough before deciding on what to do here. IIRC QEMU also uses PHYSDEVOP_{un,}map_pirq in order to allocate MSI(-X) interrupts. Thanks, Roger. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 3/6] x86/pvh: shouldn't check pirq flag when map pirq in PVH 2023-03-15 15:57 ` Roger Pau Monné @ 2023-03-16 0:22 ` Stefano Stabellini 2023-03-21 10:09 ` Huang Rui 1 sibling, 0 replies; 75+ messages in thread From: Stefano Stabellini @ 2023-03-16 0:22 UTC (permalink / raw) To: Roger Pau Monné Cc: Huang Rui, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian On Wed, 15 Mar 2023, Roger Pau Monné wrote: > On Sun, Mar 12, 2023 at 03:54:52PM +0800, Huang Rui wrote: > > From: Chen Jiqian <Jiqian.Chen@amd.com> > > > > PVH is also hvm type domain, but PVH hasn't X86_EMU_USE_PIRQ > > flag. So, when dom0 is PVH and call PHYSDEVOP_map_pirq, it > > will fail at check has_pirq(); > > > > Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > > Signed-off-by: Huang Rui <ray.huang@amd.com> > > --- > > xen/arch/x86/hvm/hypercall.c | 2 -- > > 1 file changed, 2 deletions(-) > > > > diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c > > index 405d0a95af..16a2f5c0b3 100644 > > --- a/xen/arch/x86/hvm/hypercall.c > > +++ b/xen/arch/x86/hvm/hypercall.c > > @@ -89,8 +89,6 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) > > case PHYSDEVOP_eoi: > > case PHYSDEVOP_irq_status_query: > > case PHYSDEVOP_get_free_pirq: > > - if ( !has_pirq(currd) ) > > - return -ENOSYS; > > Since I've taken a look at the Linux side of this, it seems like you > need PHYSDEVOP_map_pirq and PHYSDEVOP_setup_gsi, but the later is not > in this list because has never been available to HVM type guests. > > I would like to better understand the usage by PVH dom0 for GSI > passthrough before deciding on what to do here. IIRC QEMU also uses > PHYSDEVOP_{un,}map_pirq in order to allocate MSI(-X) interrupts.
I'll let Ray reply here, but I think you are right: PHYSDEVOP_{un,}map_pirq are needed so that QEMU can run in PVH Dom0 to support HVM guests. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 3/6] x86/pvh: shouldn't check pirq flag when map pirq in PVH 2023-03-15 15:57 ` Roger Pau Monné 2023-03-16 0:22 ` Stefano Stabellini @ 2023-03-21 10:09 ` Huang Rui 2023-03-21 10:17 ` Jan Beulich 1 sibling, 1 reply; 75+ messages in thread From: Huang Rui @ 2023-03-21 10:09 UTC (permalink / raw) To: Roger Pau Monné Cc: Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel, Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian On Wed, Mar 15, 2023 at 11:57:45PM +0800, Roger Pau Monné wrote: > On Sun, Mar 12, 2023 at 03:54:52PM +0800, Huang Rui wrote: > > From: Chen Jiqian <Jiqian.Chen@amd.com> > > > > PVH is also hvm type domain, but PVH hasn't X86_EMU_USE_PIRQ > > flag. So, when dom0 is PVH and call PHYSDEVOP_map_pirq, it > > will fail at check has_pirq(); > > > > Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > > Signed-off-by: Huang Rui <ray.huang@amd.com> > > --- > > xen/arch/x86/hvm/hypercall.c | 2 -- > > 1 file changed, 2 deletions(-) > > > > diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c > > index 405d0a95af..16a2f5c0b3 100644 > > --- a/xen/arch/x86/hvm/hypercall.c > > +++ b/xen/arch/x86/hvm/hypercall.c > > @@ -89,8 +89,6 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) > > case PHYSDEVOP_eoi: > > case PHYSDEVOP_irq_status_query: > > case PHYSDEVOP_get_free_pirq: > > - if ( !has_pirq(currd) ) > > - return -ENOSYS; > > Since I've taken a look at the Linux side of this, it seems like you > need PHYSDEVOP_map_pirq and PHYSDEVOP_setup_gsi, but the later is not > in this list because has never been available to HVM type guests. Do you mean HVM guest only support MSI(-X)? > > I would like to better understand the usage by PVH dom0 for GSI > passthrough before deciding on what to do here. IIRC QEMU also uses > PHYSDEVOP_{un,}map_pirq in order to allocate MSI(-X) interrupts. 
> The MSI(-X) interrupt doesn't work for the passthrough device in domU even when dom0 is a PV domain. It seems to be a common problem; I remember Christian encountered a similar issue as well. So we fall back to using the GSI interrupt instead. Thanks, Ray ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 3/6] x86/pvh: shouldn't check pirq flag when map pirq in PVH 2023-03-21 10:09 ` Huang Rui @ 2023-03-21 10:17 ` Jan Beulich 0 siblings, 0 replies; 75+ messages in thread From: Jan Beulich @ 2023-03-21 10:17 UTC (permalink / raw) To: Huang Rui Cc: Stefano Stabellini, Anthony PERARD, xen-devel, Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Roger Pau Monné On 21.03.2023 11:09, Huang Rui wrote: > On Wed, Mar 15, 2023 at 11:57:45PM +0800, Roger Pau Monné wrote: >> On Sun, Mar 12, 2023 at 03:54:52PM +0800, Huang Rui wrote: >>> From: Chen Jiqian <Jiqian.Chen@amd.com> >>> >>> PVH is also hvm type domain, but PVH hasn't X86_EMU_USE_PIRQ >>> flag. So, when dom0 is PVH and call PHYSDEVOP_map_pirq, it >>> will fail at check has_pirq(); >>> >>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> >>> Signed-off-by: Huang Rui <ray.huang@amd.com> >>> --- >>> xen/arch/x86/hvm/hypercall.c | 2 -- >>> 1 file changed, 2 deletions(-) >>> >>> diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c >>> index 405d0a95af..16a2f5c0b3 100644 >>> --- a/xen/arch/x86/hvm/hypercall.c >>> +++ b/xen/arch/x86/hvm/hypercall.c >>> @@ -89,8 +89,6 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) >>> case PHYSDEVOP_eoi: >>> case PHYSDEVOP_irq_status_query: >>> case PHYSDEVOP_get_free_pirq: >>> - if ( !has_pirq(currd) ) >>> - return -ENOSYS; >> >> Since I've taken a look at the Linux side of this, it seems like you >> need PHYSDEVOP_map_pirq and PHYSDEVOP_setup_gsi, but the later is not >> in this list because has never been available to HVM type guests. > > Do you mean HVM guest only support MSI(-X)? I don't think that was meant. Instead, as per discussion elsewhere, we may need to make PHYSDEVOP_setup_gsi available to PVH Dom0. (DomU-s wouldn't be allowed to use this sub-op, so the statement Roger made simply doesn't apply to "HVM guest".) 
>> I would like to better understand the usage by PVH dom0 for GSI >> passthrough before deciding on what to do here. IIRC QEMU also uses >> PHYSDEVOP_{un,}map_pirq in order to allocate MSI(-X) interrupts. >> > > The MSI(-X) interrupt doesn't work for the passthrough device in domU > even when dom0 is a PV domain. It seems to be a common problem; I remember Christian > encountered a similar issue as well. So we fall back to using the GSI > interrupt instead. Looks like this wants figuring out properly as well then. MSI(-X) generally works for pass-through devices, from all I know. Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* [RFC XEN PATCH 4/6] x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call 2023-03-12 7:54 [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0 Huang Rui ` (2 preceding siblings ...) 2023-03-12 7:54 ` [RFC XEN PATCH 3/6] x86/pvh: shouldn't check pirq flag when map pirq in PVH Huang Rui @ 2023-03-12 7:54 ` Huang Rui 2023-03-14 16:30 ` Jan Beulich 2023-03-12 7:54 ` [RFC XEN PATCH 5/6] tools/libs/call: add linux os call to get gsi from irq Huang Rui ` (3 subsequent siblings) 7 siblings, 1 reply; 75+ messages in thread From: Huang Rui @ 2023-03-12 7:54 UTC (permalink / raw) To: Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel Cc: Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Huang Rui From: Chen Jiqian <Jiqian.Chen@amd.com> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> --- xen/arch/x86/hvm/hypercall.c | 1 + 1 file changed, 1 insertion(+) diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index 16a2f5c0b3..fce786618c 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -89,6 +89,7 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) case PHYSDEVOP_eoi: case PHYSDEVOP_irq_status_query: case PHYSDEVOP_get_free_pirq: + case PHYSDEVOP_setup_gsi: break; case PHYSDEVOP_pci_mmcfg_reserved: -- 2.25.1 ^ permalink raw reply related [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 4/6] x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call 2023-03-12 7:54 ` [RFC XEN PATCH 4/6] x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call Huang Rui @ 2023-03-14 16:30 ` Jan Beulich 2023-03-15 17:01 ` Andrew Cooper 2023-03-21 12:22 ` Huang Rui 0 siblings, 2 replies; 75+ messages in thread From: Jan Beulich @ 2023-03-14 16:30 UTC (permalink / raw) To: Huang Rui Cc: Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On 12.03.2023 08:54, Huang Rui wrote: > From: Chen Jiqian <Jiqian.Chen@amd.com> An empty description won't do here. First of all you need to address the Why? As already hinted at in the reply to the earlier patch, it looks like you're breaking the intended IRQ model for PVH. Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 4/6] x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call 2023-03-14 16:30 ` Jan Beulich @ 2023-03-15 17:01 ` Andrew Cooper 2023-03-16 0:26 ` Stefano Stabellini ` (2 more replies) 2023-03-21 12:22 ` Huang Rui 1 sibling, 3 replies; 75+ messages in thread From: Andrew Cooper @ 2023-03-15 17:01 UTC (permalink / raw) To: Jan Beulich, Huang Rui Cc: Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On 14/03/2023 4:30 pm, Jan Beulich wrote: > On 12.03.2023 08:54, Huang Rui wrote: >> From: Chen Jiqian <Jiqian.Chen@amd.com> > An empty description won't do here. First of all you need to address the Why? > As already hinted at in the reply to the earlier patch, it looks like you're > breaking the intended IRQ model for PVH. I think this is rather unfair. Until you can point to the document which describes how IRQs are intended to work in PVH, I'd say this series is pretty damn good attempt to make something that functions, in the absence of any guidance. ~Andrew P.S. If it isn't obvious, this is a giant hint that something should be written down... ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 4/6] x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call 2023-03-15 17:01 ` Andrew Cooper @ 2023-03-16 0:26 ` Stefano Stabellini 2023-03-16 0:39 ` Stefano Stabellini 2023-03-16 8:51 ` Jan Beulich 2023-03-16 7:05 ` Jan Beulich 2023-03-21 12:42 ` Huang Rui 2 siblings, 2 replies; 75+ messages in thread From: Stefano Stabellini @ 2023-03-16 0:26 UTC (permalink / raw) To: Andrew Cooper Cc: Jan Beulich, Huang Rui, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On Wed, 15 Mar 2023, Andrew Cooper wrote: > On 14/03/2023 4:30 pm, Jan Beulich wrote: > > On 12.03.2023 08:54, Huang Rui wrote: > >> From: Chen Jiqian <Jiqian.Chen@amd.com> > > An empty description won't do here. First of all you need to address the Why? > > As already hinted at in the reply to the earlier patch, it looks like you're > > breaking the intended IRQ model for PVH. > > I think this is rather unfair. > > Until you can point to the document which describes how IRQs are > intended to work in PVH, I'd say this series is pretty damn good attempt > to make something that functions, in the absence of any guidance. And to make things more confusing those calls are not needed for PVH itself, those calls are needed so that we can run QEMU to support regular HVM guests on PVH Dom0 (I'll let Ray confirm.) So technically, this is not breaking the PVH IRQ model. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 4/6] x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call 2023-03-16 0:26 ` Stefano Stabellini @ 2023-03-16 0:39 ` Stefano Stabellini 2023-03-16 8:51 ` Jan Beulich 1 sibling, 0 replies; 75+ messages in thread From: Stefano Stabellini @ 2023-03-16 0:39 UTC (permalink / raw) To: Stefano Stabellini Cc: Andrew Cooper, Jan Beulich, Huang Rui, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Roger Pau Monné, Anthony PERARD, xen-devel On Wed, 15 Mar 2023, Stefano Stabellini wrote: > On Wed, 15 Mar 2023, Andrew Cooper wrote: > > On 14/03/2023 4:30 pm, Jan Beulich wrote: > > > On 12.03.2023 08:54, Huang Rui wrote: > > >> From: Chen Jiqian <Jiqian.Chen@amd.com> > > > An empty description won't do here. First of all you need to address the Why? > > > As already hinted at in the reply to the earlier patch, it looks like you're > > > breaking the intended IRQ model for PVH. > > > > I think this is rather unfair. > > > > Until you can point to the document which describes how IRQs are > > intended to work in PVH, I'd say this series is pretty damn good attempt > > to make something that functions, in the absence of any guidance. > > And to make things more confusing those calls are not needed for PVH > itself, those calls are needed so that we can run QEMU to support > regular HVM guests on PVH Dom0 (I'll let Ray confirm.) > > So technically, this is not breaking the PVH IRQ model. To add more info: QEMU (hw/xen/xen_pt.c) calls xc_physdev_map_pirq and xc_domain_bind_pt_pci_irq. Note that xc_domain_bind_pt_pci_irq is the key hypercall here and it takes a pirq as parameter. That is why QEMU calls xc_physdev_map_pirq, so that we can get the pirq and use the pirq as parameter for xc_domain_bind_pt_pci_irq. We need to get the above to work also with Dom0 PVH. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 4/6] x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call 2023-03-16 0:26 ` Stefano Stabellini 2023-03-16 0:39 ` Stefano Stabellini @ 2023-03-16 8:51 ` Jan Beulich 2023-03-16 9:18 ` Roger Pau Monné 1 sibling, 1 reply; 75+ messages in thread From: Jan Beulich @ 2023-03-16 8:51 UTC (permalink / raw) To: Stefano Stabellini Cc: Huang Rui, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Roger Pau Monné, Anthony PERARD, xen-devel, Andrew Cooper On 16.03.2023 01:26, Stefano Stabellini wrote: > On Wed, 15 Mar 2023, Andrew Cooper wrote: >> On 14/03/2023 4:30 pm, Jan Beulich wrote: >>> On 12.03.2023 08:54, Huang Rui wrote: >>>> From: Chen Jiqian <Jiqian.Chen@amd.com> >>> An empty description won't do here. First of all you need to address the Why? >>> As already hinted at in the reply to the earlier patch, it looks like you're >>> breaking the intended IRQ model for PVH. >> >> I think this is rather unfair. >> >> Until you can point to the document which describes how IRQs are >> intended to work in PVH, I'd say this series is pretty damn good attempt >> to make something that functions, in the absence of any guidance. > > And to make things more confusing those calls are not needed for PVH > itself, those calls are needed so that we can run QEMU to support > regular HVM guests on PVH Dom0 (I'll let Ray confirm.) Ah, but that wasn't said anywhere, was it? In which case ... > So technically, this is not breaking the PVH IRQ model. ... I of course agree here. But then I guess we may want to reject attempts for a domain to do any of this to itself. Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 4/6] x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call 2023-03-16 8:51 ` Jan Beulich @ 2023-03-16 9:18 ` Roger Pau Monné 0 siblings, 0 replies; 75+ messages in thread From: Roger Pau Monné @ 2023-03-16 9:18 UTC (permalink / raw) To: Jan Beulich Cc: Stefano Stabellini, Huang Rui, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Anthony PERARD, xen-devel, Andrew Cooper On Thu, Mar 16, 2023 at 09:51:20AM +0100, Jan Beulich wrote: > On 16.03.2023 01:26, Stefano Stabellini wrote: > > On Wed, 15 Mar 2023, Andrew Cooper wrote: > >> On 14/03/2023 4:30 pm, Jan Beulich wrote: > >>> On 12.03.2023 08:54, Huang Rui wrote: > >>>> From: Chen Jiqian <Jiqian.Chen@amd.com> > >>> An empty description won't do here. First of all you need to address the Why? > >>> As already hinted at in the reply to the earlier patch, it looks like you're > >>> breaking the intended IRQ model for PVH. > >> > >> I think this is rather unfair. > >> > >> Until you can point to the document which describes how IRQs are > >> intended to work in PVH, I'd say this series is pretty damn good attempt > >> to make something that functions, in the absence of any guidance. > > > > And to make things more confusing those calls are not needed for PVH > > itself, those calls are needed so that we can run QEMU to support > > regular HVM guests on PVH Dom0 (I'll let Ray confirm.) > > Ah, but that wasn't said anywhere, was it? In which case ... > > > So technically, this is not breaking the PVH IRQ model. > > ... I of course agree here. But then I guess we may want to reject > attempts for a domain to do any of this to itself. For PCI passthrough we strictly need the PHYSDEVOP_{un,}map_pirq because that's the only way QEMU currently has to allocate MSI(-X) vectors from physical devices in order to assign to guests. 
We could see about moving those to DM ops maybe in the future, as I think it would be clearer, but that shouldn't block the work here. If we start allowing PVH domains to use PIRQs, we must enforce that PIRQs cannot be mapped to event channels, IOW, block EVTCHNOP_bind_pirq. Thanks, Roger. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 4/6] x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call 2023-03-15 17:01 ` Andrew Cooper 2023-03-16 0:26 ` Stefano Stabellini @ 2023-03-16 7:05 ` Jan Beulich 2023-03-21 12:42 ` Huang Rui 2 siblings, 0 replies; 75+ messages in thread From: Jan Beulich @ 2023-03-16 7:05 UTC (permalink / raw) To: Andrew Cooper Cc: Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel, Huang Rui On 15.03.2023 18:01, Andrew Cooper wrote: > On 14/03/2023 4:30 pm, Jan Beulich wrote: >> On 12.03.2023 08:54, Huang Rui wrote: >>> From: Chen Jiqian <Jiqian.Chen@amd.com> >> An empty description won't do here. First of all you need to address the Why? >> As already hinted at in the reply to the earlier patch, it looks like you're >> breaking the intended IRQ model for PVH. > > I think this is rather unfair. > > Until you can point to the document which describes how IRQs are > intended to work in PVH, I'd say this series is pretty damn good attempt > to make something that functions, in the absence of any guidance. Are you advocating for patches which don't explain why they make a certain change? Even in the absence of any documentation, the code itself can be taken as reference, and hence it can be pointed out that either something was wrong before, or something needs extending in a certain way to make some use case work which can't be mode work by other means. In the case of this series, without knowing the "Why?" for the various changes, it is also impossible to suggest alternative approaches. Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 4/6] x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call 2023-03-15 17:01 ` Andrew Cooper 2023-03-16 0:26 ` Stefano Stabellini 2023-03-16 7:05 ` Jan Beulich @ 2023-03-21 12:42 ` Huang Rui 2 siblings, 0 replies; 75+ messages in thread From: Huang Rui @ 2023-03-21 12:42 UTC (permalink / raw) To: Andrew Cooper Cc: Jan Beulich, Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On Thu, Mar 16, 2023 at 01:01:52AM +0800, Andrew Cooper wrote: > On 14/03/2023 4:30 pm, Jan Beulich wrote: > > On 12.03.2023 08:54, Huang Rui wrote: > >> From: Chen Jiqian <Jiqian.Chen@amd.com> > > An empty description won't do here. First of all you need to address the Why? > > As already hinted at in the reply to the earlier patch, it looks like you're > > breaking the intended IRQ model for PVH. > > I think this is rather unfair. > > Until you can point to the document which describes how IRQs are > intended to work in PVH, I'd say this series is pretty damn good attempt > to make something that functions, in the absence of any guidance. > Thank you, Andrew! This is the first time we submit Xen patches, any comments are warm for us. :-) Thanks, Ray ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 4/6] x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call 2023-03-14 16:30 ` Jan Beulich 2023-03-15 17:01 ` Andrew Cooper @ 2023-03-21 12:22 ` Huang Rui 1 sibling, 0 replies; 75+ messages in thread From: Huang Rui @ 2023-03-21 12:22 UTC (permalink / raw) To: Jan Beulich Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On Wed, Mar 15, 2023 at 12:30:21AM +0800, Jan Beulich wrote: > On 12.03.2023 08:54, Huang Rui wrote: > > From: Chen Jiqian <Jiqian.Chen@amd.com> > > An empty description won't do here. First of all you need to address the Why? > As already hinted at in the reply to the earlier patch, it looks like you're > breaking the intended IRQ model for PVH. > Sorry, I used a wrong patch without commit message. Will fix in next version. Thanks, Ray ^ permalink raw reply [flat|nested] 75+ messages in thread
* [RFC XEN PATCH 5/6] tools/libs/call: add linux os call to get gsi from irq 2023-03-12 7:54 [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0 Huang Rui ` (3 preceding siblings ...) 2023-03-12 7:54 ` [RFC XEN PATCH 4/6] x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call Huang Rui @ 2023-03-12 7:54 ` Huang Rui 2023-03-14 16:36 ` Jan Beulich 2023-03-12 7:54 ` [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi Huang Rui ` (2 subsequent siblings) 7 siblings, 1 reply; 75+ messages in thread From: Huang Rui @ 2023-03-12 7:54 UTC (permalink / raw) To: Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel Cc: Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Huang Rui From: Chen Jiqian <Jiqian.Chen@amd.com> When passing through a GPU to a guest, userspace can only get the irq instead of the gsi. But it should pass the gsi to the guest, so that the guest can receive the interrupt signal. So, provide a function to get the gsi. 
Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> --- tools/include/xen-sys/Linux/privcmd.h | 7 +++++++ tools/include/xencall.h | 2 ++ tools/include/xenctrl.h | 2 ++ tools/libs/call/core.c | 5 +++++ tools/libs/call/libxencall.map | 2 ++ tools/libs/call/linux.c | 14 ++++++++++++++ tools/libs/call/private.h | 9 +++++++++ tools/libs/ctrl/xc_physdev.c | 4 ++++ 8 files changed, 45 insertions(+) diff --git a/tools/include/xen-sys/Linux/privcmd.h b/tools/include/xen-sys/Linux/privcmd.h index bc60e8fd55..d72e785b5d 100644 --- a/tools/include/xen-sys/Linux/privcmd.h +++ b/tools/include/xen-sys/Linux/privcmd.h @@ -95,6 +95,11 @@ typedef struct privcmd_mmap_resource { __u64 addr; } privcmd_mmap_resource_t; +typedef struct privcmd_gsi_from_irq { + __u32 irq; + __u32 gsi; +} privcmd_gsi_from_irq_t; + /* * @cmd: IOCTL_PRIVCMD_HYPERCALL * @arg: &privcmd_hypercall_t @@ -114,6 +119,8 @@ typedef struct privcmd_mmap_resource { _IOC(_IOC_NONE, 'P', 6, sizeof(domid_t)) #define IOCTL_PRIVCMD_MMAP_RESOURCE \ _IOC(_IOC_NONE, 'P', 7, sizeof(privcmd_mmap_resource_t)) +#define IOCTL_PRIVCMD_GSI_FROM_IRQ \ + _IOC(_IOC_NONE, 'P', 8, sizeof(privcmd_gsi_from_irq_t)) #define IOCTL_PRIVCMD_UNIMPLEMENTED \ _IOC(_IOC_NONE, 'P', 0xFF, 0) diff --git a/tools/include/xencall.h b/tools/include/xencall.h index fc95ed0fe5..962cb45e1f 100644 --- a/tools/include/xencall.h +++ b/tools/include/xencall.h @@ -113,6 +113,8 @@ int xencall5(xencall_handle *xcall, unsigned int op, uint64_t arg1, uint64_t arg2, uint64_t arg3, uint64_t arg4, uint64_t arg5); +int xen_oscall_gsi_from_irq(xencall_handle *xcall, int irq); + /* Variant(s) of the above, as needed, returning "long" instead of "int". 
*/ long xencall2L(xencall_handle *xcall, unsigned int op, uint64_t arg1, uint64_t arg2); diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h index 23037874d3..3918be9e53 100644 --- a/tools/include/xenctrl.h +++ b/tools/include/xenctrl.h @@ -1652,6 +1652,8 @@ int xc_physdev_unmap_pirq(xc_interface *xch, uint32_t domid, int pirq); +int xc_physdev_gsi_from_irq(xc_interface *xch, int irq); + /* * LOGGING AND ERROR REPORTING */ diff --git a/tools/libs/call/core.c b/tools/libs/call/core.c index 02c4f8e1ae..6f79f3babd 100644 --- a/tools/libs/call/core.c +++ b/tools/libs/call/core.c @@ -173,6 +173,11 @@ int xencall5(xencall_handle *xcall, unsigned int op, return osdep_hypercall(xcall, &call); } +int xen_oscall_gsi_from_irq(xencall_handle *xcall, int irq) +{ + return osdep_oscall(xcall, irq); +} + /* * Local variables: * mode: C diff --git a/tools/libs/call/libxencall.map b/tools/libs/call/libxencall.map index d18a3174e9..6cde8eda05 100644 --- a/tools/libs/call/libxencall.map +++ b/tools/libs/call/libxencall.map @@ -10,6 +10,8 @@ VERS_1.0 { xencall4; xencall5; + xen_oscall_gsi_from_irq; + xencall_alloc_buffer; xencall_free_buffer; xencall_alloc_buffer_pages; diff --git a/tools/libs/call/linux.c b/tools/libs/call/linux.c index 6d588e6bea..5267bceabf 100644 --- a/tools/libs/call/linux.c +++ b/tools/libs/call/linux.c @@ -85,6 +85,20 @@ long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall) return ioctl(xcall->fd, IOCTL_PRIVCMD_HYPERCALL, hypercall); } +long osdep_oscall(xencall_handle *xcall, int irq) +{ + privcmd_gsi_from_irq_t gsi_irq = { + .irq = irq, + .gsi = -1, + }; + + if (ioctl(xcall->fd, IOCTL_PRIVCMD_GSI_FROM_IRQ, &gsi_irq)) { + return gsi_irq.irq; + } + + return gsi_irq.gsi; +} + static void *alloc_pages_bufdev(xencall_handle *xcall, size_t npages) { void *p; diff --git a/tools/libs/call/private.h b/tools/libs/call/private.h index 9c3aa432ef..01a1f5076a 100644 --- a/tools/libs/call/private.h +++ b/tools/libs/call/private.h @@ -57,6 
+57,15 @@ int osdep_xencall_close(xencall_handle *xcall); long osdep_hypercall(xencall_handle *xcall, privcmd_hypercall_t *hypercall); +#if defined(__linux__) +long osdep_oscall(xencall_handle *xcall, int irq); +#else +static inline long osdep_oscall(xencall_handle *xcall, int irq) +{ + return irq; +} +#endif + void *osdep_alloc_pages(xencall_handle *xcall, size_t nr_pages); void osdep_free_pages(xencall_handle *xcall, void *p, size_t nr_pages); diff --git a/tools/libs/ctrl/xc_physdev.c b/tools/libs/ctrl/xc_physdev.c index 460a8e779c..4d3b138ebd 100644 --- a/tools/libs/ctrl/xc_physdev.c +++ b/tools/libs/ctrl/xc_physdev.c @@ -111,3 +111,7 @@ int xc_physdev_unmap_pirq(xc_interface *xch, return rc; } +int xc_physdev_gsi_from_irq(xc_interface *xch, int irq) +{ + return xen_oscall_gsi_from_irq(xch->xcall, irq); +} -- 2.25.1 ^ permalink raw reply related [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 5/6] tools/libs/call: add linux os call to get gsi from irq 2023-03-12 7:54 ` [RFC XEN PATCH 5/6] tools/libs/call: add linux os call to get gsi from irq Huang Rui @ 2023-03-14 16:36 ` Jan Beulich 0 siblings, 0 replies; 75+ messages in thread From: Jan Beulich @ 2023-03-14 16:36 UTC (permalink / raw) To: Huang Rui Cc: Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On 12.03.2023 08:54, Huang Rui wrote: > From: Chen Jiqian <Jiqian.Chen@amd.com> > > When passing through a GPU to a guest, userspace can only get the irq > instead of the gsi. But it should pass the gsi to the guest, so that > the guest can receive the interrupt signal. So, provide a function to get > the gsi. > > Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > Signed-off-by: Huang Rui <ray.huang@amd.com> > --- > tools/include/xen-sys/Linux/privcmd.h | 7 +++++++ Assuming this information needs obtaining in the first place (which I doubt), I don't think privcmd is the right vehicle to get at it. Can one obtain such mapping information on baremetal Linux? If so, that would want re-using in the same or a similar way. If not, there would need to be a very good reason why the information is needed when running on top of Xen. Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-12 7:54 [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0 Huang Rui ` (4 preceding siblings ...) 2023-03-12 7:54 ` [RFC XEN PATCH 5/6] tools/libs/call: add linux os call to get gsi from irq Huang Rui @ 2023-03-12 7:54 ` Huang Rui 2023-03-14 16:39 ` Jan Beulich 2023-03-15 16:35 ` Roger Pau Monné 2023-03-13 7:24 ` [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0 Christian König 2023-03-20 16:22 ` Huang Rui 7 siblings, 2 replies; 75+ messages in thread From: Huang Rui @ 2023-03-12 7:54 UTC (permalink / raw) To: Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel Cc: Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Huang Rui From: Chen Jiqian <Jiqian.Chen@amd.com> Use new xc_physdev_gsi_from_irq to get the GSI number Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> Signed-off-by: Huang Rui <ray.huang@amd.com> --- tools/libs/light/libxl_pci.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c index f4c4f17545..47cf2799bf 100644 --- a/tools/libs/light/libxl_pci.c +++ b/tools/libs/light/libxl_pci.c @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc, goto out_no_irq; } if ((fscanf(f, "%u", &irq) == 1) && irq) { + irq = xc_physdev_gsi_from_irq(ctx->xch, irq); r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq); if (r < 0) { LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)", -- 2.25.1 ^ permalink raw reply related [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-12 7:54 ` [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi Huang Rui @ 2023-03-14 16:39 ` Jan Beulich 2023-03-15 16:35 ` Roger Pau Monné 1 sibling, 0 replies; 75+ messages in thread From: Jan Beulich @ 2023-03-14 16:39 UTC (permalink / raw) To: Huang Rui Cc: Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Roger Pau Monné, Stefano Stabellini, Anthony PERARD, xen-devel On 12.03.2023 08:54, Huang Rui wrote: > From: Chen Jiqian <Jiqian.Chen@amd.com> > > Use new xc_physdev_gsi_from_irq to get the GSI number Apart from again the "Why?", ... > --- a/tools/libs/light/libxl_pci.c > +++ b/tools/libs/light/libxl_pci.c > @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc, > goto out_no_irq; > } > if ((fscanf(f, "%u", &irq) == 1) && irq) { > + irq = xc_physdev_gsi_from_irq(ctx->xch, irq); > r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq); > if (r < 0) { > LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)", ... aren't you breaking existing use cases this way? Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-12 7:54 ` [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi Huang Rui 2023-03-14 16:39 ` Jan Beulich @ 2023-03-15 16:35 ` Roger Pau Monné 2023-03-16 0:44 ` Stefano Stabellini 1 sibling, 1 reply; 75+ messages in thread From: Roger Pau Monné @ 2023-03-15 16:35 UTC (permalink / raw) To: Huang Rui Cc: Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote: > From: Chen Jiqian <Jiqian.Chen@amd.com> > > Use new xc_physdev_gsi_from_irq to get the GSI number > > Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > Signed-off-by: Huang Rui <ray.huang@amd.com> > --- > tools/libs/light/libxl_pci.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c > index f4c4f17545..47cf2799bf 100644 > --- a/tools/libs/light/libxl_pci.c > +++ b/tools/libs/light/libxl_pci.c > @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc, > goto out_no_irq; > } > if ((fscanf(f, "%u", &irq) == 1) && irq) { > + irq = xc_physdev_gsi_from_irq(ctx->xch, irq); This is just a shot in the dark, because I don't really have enough context to understand what's going on here, but see below. I've taken a look at this on my box, and it seems like on dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not very consistent. If devices are in use by a driver the irq sysfs node reports either the GSI irq or the MSI IRQ (in case a single MSI interrupt is setup). 
It seems like pciback in Linux does something to report the correct value: root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq 74 root@lcy2-dt107:~# xl pci-assignable-add 00:14.0 root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq 16 As you can see, making the device assignable changed the value reported by the irq node to be the GSI instead of the MSI IRQ, I would think you are missing something similar in the PVH setup (some pciback magic)? Albeit I have no idea why you would need to translate from IRQ to GSI in the way you do in this and related patches, because I'm missing the context. Regards, Roger. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-15 16:35 ` Roger Pau Monné @ 2023-03-16 0:44 ` Stefano Stabellini 2023-03-16 8:54 ` Roger Pau Monné 2023-03-16 8:55 ` Jan Beulich 0 siblings, 2 replies; 75+ messages in thread From: Stefano Stabellini @ 2023-03-16 0:44 UTC (permalink / raw) To: Roger Pau Monné Cc: Huang Rui, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian [-- Attachment #1: Type: text/plain, Size: 2774 bytes --] On Wed, 15 Mar 2023, Roger Pau Monné wrote: > On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote: > > From: Chen Jiqian <Jiqian.Chen@amd.com> > > > > Use new xc_physdev_gsi_from_irq to get the GSI number > > > > Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > > Signed-off-by: Huang Rui <ray.huang@amd.com> > > --- > > tools/libs/light/libxl_pci.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c > > index f4c4f17545..47cf2799bf 100644 > > --- a/tools/libs/light/libxl_pci.c > > +++ b/tools/libs/light/libxl_pci.c > > @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc, > > goto out_no_irq; > > } > > if ((fscanf(f, "%u", &irq) == 1) && irq) { > > + irq = xc_physdev_gsi_from_irq(ctx->xch, irq); > > This is just a shot in the dark, because I don't really have enough > context to understand what's going on here, but see below. > > I've taken a look at this on my box, and it seems like on > dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not > very consistent. > > If devices are in use by a driver the irq sysfs node reports either > the GSI irq or the MSI IRQ (in case a single MSI interrupt is > setup). 
> > It seems like pciback in Linux does something to report the correct > value: > > root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > 74 > root@lcy2-dt107:~# xl pci-assignable-add 00:14.0 > root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > 16 > > As you can see, making the device assignable changed the value > reported by the irq node to be the GSI instead of the MSI IRQ, I would > think you are missing something similar in the PVH setup (some pciback > magic)? > > Albeit I have no idea why you would need to translate from IRQ to GSI > in the way you do in this and related patches, because I'm missing the > context. As I mention in another email, also keep in mind that we need QEMU to work and QEMU calls: 1) xc_physdev_map_pirq (this is also called from libxl) 2) xc_domain_bind_pt_pci_irq In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not the IRQ. If you look at the implementation of xc_physdev_map_pirq, you'll see the type is "MAP_PIRQ_TYPE_GSI" and also see the check in Xen xen/arch/x86/irq.c:allocate_and_map_gsi_pirq: if ( index < 0 || index >= nr_irqs_gsi ) { dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id, index); return -EINVAL; } nr_irqs_gsi < 112, and the check will fail. So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need to discover the GSI number corresponding to the IRQ number. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-16 0:44 ` Stefano Stabellini @ 2023-03-16 8:54 ` Roger Pau Monné 2023-03-16 8:55 ` Jan Beulich 1 sibling, 0 replies; 75+ messages in thread From: Roger Pau Monné @ 2023-03-16 8:54 UTC (permalink / raw) To: Stefano Stabellini Cc: Huang Rui, Jan Beulich, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian On Wed, Mar 15, 2023 at 05:44:12PM -0700, Stefano Stabellini wrote: > On Wed, 15 Mar 2023, Roger Pau Monné wrote: > > On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote: > > > From: Chen Jiqian <Jiqian.Chen@amd.com> > > > > > > Use new xc_physdev_gsi_from_irq to get the GSI number > > > > > > Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > > > Signed-off-by: Huang Rui <ray.huang@amd.com> > > > --- > > > tools/libs/light/libxl_pci.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c > > > index f4c4f17545..47cf2799bf 100644 > > > --- a/tools/libs/light/libxl_pci.c > > > +++ b/tools/libs/light/libxl_pci.c > > > @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc, > > > goto out_no_irq; > > > } > > > if ((fscanf(f, "%u", &irq) == 1) && irq) { > > > + irq = xc_physdev_gsi_from_irq(ctx->xch, irq); > > > > This is just a shot in the dark, because I don't really have enough > > context to understand what's going on here, but see below. > > > > I've taken a look at this on my box, and it seems like on > > dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not > > very consistent. > > > > If devices are in use by a driver the irq sysfs node reports either > > the GSI irq or the MSI IRQ (in case a single MSI interrupt is > > setup). 
> > > > It seems like pciback in Linux does something to report the correct > > value: > > > > root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > > 74 > > root@lcy2-dt107:~# xl pci-assignable-add 00:14.0 > > root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > > 16 > > > > As you can see, making the device assignable changed the value > > reported by the irq node to be the GSI instead of the MSI IRQ, I would > > think you are missing something similar in the PVH setup (some pciback > > magic)? > > > > Albeit I have no idea why you would need to translate from IRQ to GSI > > in the way you do in this and related patches, because I'm missing the > > context. > > As I mention in another email, also keep in mind that we need QEMU to > work and QEMU calls: > 1) xc_physdev_map_pirq (this is also called from libxl) > 2) xc_domain_bind_pt_pci_irq Those would be fine, and don't need any translation since it's QEMU that creates and maps the MSI(-X) interrupts, so it knows the PIRQ without requiring any translation because it has been allocated by QEMU itself. GSI is kind of special because it's a fixed (legacy) interrupt mapped to an IO-APIC pin and assigned to the device by the firmware. The setup in that case gets done by the toolstack (libxl) because the mapping is immutable for the lifetime of the domain. > In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ > in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not > the IRQ. I think the real question here is why in this scenario IRQ != GSI for GSI interrupts. 
On one of my systems when booted as PVH dom0 with pci=nomsi I get from /proc/interrupts: 8: 0 0 0 0 0 0 0 IO-APIC 8-edge rtc0 9: 1 0 0 0 0 0 0 IO-APIC 9-fasteoi acpi 16: 0 0 8373 0 0 0 0 IO-APIC 16-fasteoi i801_smbus, xhci-hcd:usb1, ahci[0000:00:17.0] 17: 0 0 0 542 0 0 0 IO-APIC 17-fasteoi eth0 24: 4112 0 0 0 0 0 0 xen-percpu -virq timer0 25: 352 0 0 0 0 0 0 xen-percpu -ipi resched0 26: 6635 0 0 0 0 0 0 xen-percpu -ipi callfunc0 So GSI == IRQ, and non-GSI interrupts start past the last GSI, which is 23 on this system because it has a single IO-APIC with 24 pins. We need to figure out what causes GSIs to be mapped to IRQs != GSI on your system, and then we can decide how to fix this. I would expect it could be fixed so that IRQ == GSI (like it is on PV dom0), and none of this translation to be necessary. Can you paste the output of /proc/interrupts on that system that has a GSI not identity mapped to an IRQ? > If you look at the implementation of xc_physdev_map_pirq, > you'll the type is "MAP_PIRQ_TYPE_GSI" and also see the check in Xen > xen/arch/x86/irq.c:allocate_and_map_gsi_pirq: > > if ( index < 0 || index >= nr_irqs_gsi ) > { > dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id, > index); > return -EINVAL; > } > > nr_irqs_gsi < 112, and the check will fail. > > So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need > to discover the GSI number corresponding to the IRQ number. Right, see above, I think the real problem is that IRQ != GSI on your Linux dom0 for some reason. Thanks, Roger. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-16 0:44 ` Stefano Stabellini 2023-03-16 8:54 ` Roger Pau Monné @ 2023-03-16 8:55 ` Jan Beulich 2023-03-16 9:27 ` Roger Pau Monné 1 sibling, 1 reply; 75+ messages in thread From: Jan Beulich @ 2023-03-16 8:55 UTC (permalink / raw) To: Stefano Stabellini, Roger Pau Monné Cc: Huang Rui, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian On 16.03.2023 01:44, Stefano Stabellini wrote: > On Wed, 15 Mar 2023, Roger Pau Monné wrote: >> On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote: >>> From: Chen Jiqian <Jiqian.Chen@amd.com> >>> >>> Use new xc_physdev_gsi_from_irq to get the GSI number >>> >>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> >>> Signed-off-by: Huang Rui <ray.huang@amd.com> >>> --- >>> tools/libs/light/libxl_pci.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c >>> index f4c4f17545..47cf2799bf 100644 >>> --- a/tools/libs/light/libxl_pci.c >>> +++ b/tools/libs/light/libxl_pci.c >>> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc, >>> goto out_no_irq; >>> } >>> if ((fscanf(f, "%u", &irq) == 1) && irq) { >>> + irq = xc_physdev_gsi_from_irq(ctx->xch, irq); >> >> This is just a shot in the dark, because I don't really have enough >> context to understand what's going on here, but see below. >> >> I've taken a look at this on my box, and it seems like on >> dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not >> very consistent. >> >> If devices are in use by a driver the irq sysfs node reports either >> the GSI irq or the MSI IRQ (in case a single MSI interrupt is >> setup). 
>> >> It seems like pciback in Linux does something to report the correct >> value: >> >> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq >> 74 >> root@lcy2-dt107:~# xl pci-assignable-add 00:14.0 >> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq >> 16 >> >> As you can see, making the device assignable changed the value >> reported by the irq node to be the GSI instead of the MSI IRQ, I would >> think you are missing something similar in the PVH setup (some pciback >> magic)? >> >> Albeit I have no idea why you would need to translate from IRQ to GSI >> in the way you do in this and related patches, because I'm missing the >> context. > > As I mention in another email, also keep in mind that we need QEMU to > work and QEMU calls: > 1) xc_physdev_map_pirq (this is also called from libxl) > 2) xc_domain_bind_pt_pci_irq > > > In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ > in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not > the IRQ. If you look at the implementation of xc_physdev_map_pirq, > you'll the type is "MAP_PIRQ_TYPE_GSI" and also see the check in Xen > xen/arch/x86/irq.c:allocate_and_map_gsi_pirq: > > if ( index < 0 || index >= nr_irqs_gsi ) > { > dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id, > index); > return -EINVAL; > } > > nr_irqs_gsi < 112, and the check will fail. > > So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need > to discover the GSI number corresponding to the IRQ number. That's one possible approach. Another could be (making a lot of assumptions) that a PVH Dom0 would pass in the IRQ it knows for this interrupt and Xen then translates that to GSI, knowing that PVH doesn't have (host) GSIs exposed to it. Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-16 8:55 ` Jan Beulich @ 2023-03-16 9:27 ` Roger Pau Monné 2023-03-16 9:42 ` Jan Beulich 0 siblings, 1 reply; 75+ messages in thread From: Roger Pau Monné @ 2023-03-16 9:27 UTC (permalink / raw) To: Jan Beulich Cc: Stefano Stabellini, Huang Rui, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian On Thu, Mar 16, 2023 at 09:55:03AM +0100, Jan Beulich wrote: > On 16.03.2023 01:44, Stefano Stabellini wrote: > > On Wed, 15 Mar 2023, Roger Pau Monné wrote: > >> On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote: > >>> From: Chen Jiqian <Jiqian.Chen@amd.com> > >>> > >>> Use new xc_physdev_gsi_from_irq to get the GSI number > >>> > >>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > >>> Signed-off-by: Huang Rui <ray.huang@amd.com> > >>> --- > >>> tools/libs/light/libxl_pci.c | 1 + > >>> 1 file changed, 1 insertion(+) > >>> > >>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c > >>> index f4c4f17545..47cf2799bf 100644 > >>> --- a/tools/libs/light/libxl_pci.c > >>> +++ b/tools/libs/light/libxl_pci.c > >>> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc, > >>> goto out_no_irq; > >>> } > >>> if ((fscanf(f, "%u", &irq) == 1) && irq) { > >>> + irq = xc_physdev_gsi_from_irq(ctx->xch, irq); > >> > >> This is just a shot in the dark, because I don't really have enough > >> context to understand what's going on here, but see below. > >> > >> I've taken a look at this on my box, and it seems like on > >> dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not > >> very consistent. > >> > >> If devices are in use by a driver the irq sysfs node reports either > >> the GSI irq or the MSI IRQ (in case a single MSI interrupt is > >> setup). 
> >> > >> It seems like pciback in Linux does something to report the correct > >> value: > >> > >> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > >> 74 > >> root@lcy2-dt107:~# xl pci-assignable-add 00:14.0 > >> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > >> 16 > >> > >> As you can see, making the device assignable changed the value > >> reported by the irq node to be the GSI instead of the MSI IRQ, I would > >> think you are missing something similar in the PVH setup (some pciback > >> magic)? > >> > >> Albeit I have no idea why you would need to translate from IRQ to GSI > >> in the way you do in this and related patches, because I'm missing the > >> context. > > > > As I mention in another email, also keep in mind that we need QEMU to > > work and QEMU calls: > > 1) xc_physdev_map_pirq (this is also called from libxl) > > 2) xc_domain_bind_pt_pci_irq > > > > > > In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ > > in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not > > the IRQ. If you look at the implementation of xc_physdev_map_pirq, > > you'll the type is "MAP_PIRQ_TYPE_GSI" and also see the check in Xen > > xen/arch/x86/irq.c:allocate_and_map_gsi_pirq: > > > > if ( index < 0 || index >= nr_irqs_gsi ) > > { > > dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id, > > index); > > return -EINVAL; > > } > > > > nr_irqs_gsi < 112, and the check will fail. > > > > So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need > > to discover the GSI number corresponding to the IRQ number. > > That's one possible approach. Another could be (making a lot of assumptions) > that a PVH Dom0 would pass in the IRQ it knows for this interrupt and Xen > then translates that to GSI, knowing that PVH doesn't have (host) GSIs > exposed to it. I don't think Xen can translate a Linux IRQ to a GSI, as that's a Linux abstraction Xen has no part in. 
The GSIs exposed to a PVH dom0 are the native (host) ones, as we create an emulated IO-APIC topology that mimics the physical one. The question here is why Linux ends up with an IRQ != GSI, as it's my understanding that on Linux GSIs are always identity mapped to IRQs, and the IRQ space up to the last possible GSI is explicitly reserved for this purpose. Thanks, Roger. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-16 9:27 ` Roger Pau Monné @ 2023-03-16 9:42 ` Jan Beulich 2023-03-16 23:19 ` Stefano Stabellini 0 siblings, 1 reply; 75+ messages in thread From: Jan Beulich @ 2023-03-16 9:42 UTC (permalink / raw) To: Roger Pau Monné Cc: Stefano Stabellini, Huang Rui, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian On 16.03.2023 10:27, Roger Pau Monné wrote: > On Thu, Mar 16, 2023 at 09:55:03AM +0100, Jan Beulich wrote: >> On 16.03.2023 01:44, Stefano Stabellini wrote: >>> On Wed, 15 Mar 2023, Roger Pau Monné wrote: >>>> On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote: >>>>> From: Chen Jiqian <Jiqian.Chen@amd.com> >>>>> >>>>> Use new xc_physdev_gsi_from_irq to get the GSI number >>>>> >>>>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> >>>>> Signed-off-by: Huang Rui <ray.huang@amd.com> >>>>> --- >>>>> tools/libs/light/libxl_pci.c | 1 + >>>>> 1 file changed, 1 insertion(+) >>>>> >>>>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c >>>>> index f4c4f17545..47cf2799bf 100644 >>>>> --- a/tools/libs/light/libxl_pci.c >>>>> +++ b/tools/libs/light/libxl_pci.c >>>>> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc, >>>>> goto out_no_irq; >>>>> } >>>>> if ((fscanf(f, "%u", &irq) == 1) && irq) { >>>>> + irq = xc_physdev_gsi_from_irq(ctx->xch, irq); >>>> >>>> This is just a shot in the dark, because I don't really have enough >>>> context to understand what's going on here, but see below. >>>> >>>> I've taken a look at this on my box, and it seems like on >>>> dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not >>>> very consistent. >>>> >>>> If devices are in use by a driver the irq sysfs node reports either >>>> the GSI irq or the MSI IRQ (in case a single MSI interrupt is >>>> setup). 
>>>> >>>> It seems like pciback in Linux does something to report the correct >>>> value: >>>> >>>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq >>>> 74 >>>> root@lcy2-dt107:~# xl pci-assignable-add 00:14.0 >>>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq >>>> 16 >>>> >>>> As you can see, making the device assignable changed the value >>>> reported by the irq node to be the GSI instead of the MSI IRQ, I would >>>> think you are missing something similar in the PVH setup (some pciback >>>> magic)? >>>> >>>> Albeit I have no idea why you would need to translate from IRQ to GSI >>>> in the way you do in this and related patches, because I'm missing the >>>> context. >>> >>> As I mention in another email, also keep in mind that we need QEMU to >>> work and QEMU calls: >>> 1) xc_physdev_map_pirq (this is also called from libxl) >>> 2) xc_domain_bind_pt_pci_irq >>> >>> >>> In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ >>> in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not >>> the IRQ. If you look at the implementation of xc_physdev_map_pirq, >>> you'll the type is "MAP_PIRQ_TYPE_GSI" and also see the check in Xen >>> xen/arch/x86/irq.c:allocate_and_map_gsi_pirq: >>> >>> if ( index < 0 || index >= nr_irqs_gsi ) >>> { >>> dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id, >>> index); >>> return -EINVAL; >>> } >>> >>> nr_irqs_gsi < 112, and the check will fail. >>> >>> So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need >>> to discover the GSI number corresponding to the IRQ number. >> >> That's one possible approach. Another could be (making a lot of assumptions) >> that a PVH Dom0 would pass in the IRQ it knows for this interrupt and Xen >> then translates that to GSI, knowing that PVH doesn't have (host) GSIs >> exposed to it. > > I don't think Xen can translate a Linux IRQ to a GSI, as that's a > Linux abstraction Xen has no part in. 
Well, I was talking about whatever Dom0 and Xen use to communicate. I.e. if at all I might have meant pIRQ, but now that you mention ... > The GSIs exposed to a PVH dom0 are the native (host) ones, as we > create an emulated IO-APIC topology that mimics the physical one. > > Question here is why Linux ends up with a IRQ != GSI, as it's my > understanding on Linux GSIs will always be identity mapped to IRQs, and > the IRQ space up to the last possible GSI is explicitly reserved for > this purpose. ... this I guess pIRQ was a PV-only concept, and it really ought to be GSI in the PVH case. So yes, it then all boils down to that Linux- internal question. Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-16 9:42 ` Jan Beulich @ 2023-03-16 23:19 ` Stefano Stabellini 2023-03-17 8:39 ` Jan Beulich 0 siblings, 1 reply; 75+ messages in thread From: Stefano Stabellini @ 2023-03-16 23:19 UTC (permalink / raw) To: Jan Beulich Cc: Roger Pau Monné, Stefano Stabellini, Huang Rui, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian [-- Attachment #1: Type: text/plain, Size: 5735 bytes --] On Thu, 16 Mar 2023, Jan Beulich wrote: > On 16.03.2023 10:27, Roger Pau Monné wrote: > > On Thu, Mar 16, 2023 at 09:55:03AM +0100, Jan Beulich wrote: > >> On 16.03.2023 01:44, Stefano Stabellini wrote: > >>> On Wed, 15 Mar 2023, Roger Pau Monné wrote: > >>>> On Sun, Mar 12, 2023 at 03:54:55PM +0800, Huang Rui wrote: > >>>>> From: Chen Jiqian <Jiqian.Chen@amd.com> > >>>>> > >>>>> Use new xc_physdev_gsi_from_irq to get the GSI number > >>>>> > >>>>> Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com> > >>>>> Signed-off-by: Huang Rui <ray.huang@amd.com> > >>>>> --- > >>>>> tools/libs/light/libxl_pci.c | 1 + > >>>>> 1 file changed, 1 insertion(+) > >>>>> > >>>>> diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c > >>>>> index f4c4f17545..47cf2799bf 100644 > >>>>> --- a/tools/libs/light/libxl_pci.c > >>>>> +++ b/tools/libs/light/libxl_pci.c > >>>>> @@ -1486,6 +1486,7 @@ static void pci_add_dm_done(libxl__egc *egc, > >>>>> goto out_no_irq; > >>>>> } > >>>>> if ((fscanf(f, "%u", &irq) == 1) && irq) { > >>>>> + irq = xc_physdev_gsi_from_irq(ctx->xch, irq); > >>>> > >>>> This is just a shot in the dark, because I don't really have enough > >>>> context to understand what's going on here, but see below. > >>>> > >>>> I've taken a look at this on my box, and it seems like on > >>>> dom0 the value returned by /sys/bus/pci/devices/SBDF/irq is not > >>>> very consistent. 
> >>>> > >>>> If devices are in use by a driver the irq sysfs node reports either > >>>> the GSI irq or the MSI IRQ (in case a single MSI interrupt is > >>>> setup). > >>>> > >>>> It seems like pciback in Linux does something to report the correct > >>>> value: > >>>> > >>>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > >>>> 74 > >>>> root@lcy2-dt107:~# xl pci-assignable-add 00:14.0 > >>>> root@lcy2-dt107:~# cat /sys/bus/pci/devices/0000\:00\:14.0/irq > >>>> 16 > >>>> > >>>> As you can see, making the device assignable changed the value > >>>> reported by the irq node to be the GSI instead of the MSI IRQ, I would > >>>> think you are missing something similar in the PVH setup (some pciback > >>>> magic)? > >>>> > >>>> Albeit I have no idea why you would need to translate from IRQ to GSI > >>>> in the way you do in this and related patches, because I'm missing the > >>>> context. > >>> > >>> As I mention in another email, also keep in mind that we need QEMU to > >>> work and QEMU calls: > >>> 1) xc_physdev_map_pirq (this is also called from libxl) > >>> 2) xc_domain_bind_pt_pci_irq > >>> > >>> > >>> In this case IRQ != GSI (IRQ == 112, GSI == 28). Sysfs returns the IRQ > >>> in Linux (112), but actually xc_physdev_map_pirq expects the GSI, not > >>> the IRQ. If you look at the implementation of xc_physdev_map_pirq, > >>> you'll the type is "MAP_PIRQ_TYPE_GSI" and also see the check in Xen > >>> xen/arch/x86/irq.c:allocate_and_map_gsi_pirq: > >>> > >>> if ( index < 0 || index >= nr_irqs_gsi ) > >>> { > >>> dprintk(XENLOG_G_ERR, "dom%d: map invalid irq %d\n", d->domain_id, > >>> index); > >>> return -EINVAL; > >>> } > >>> > >>> nr_irqs_gsi < 112, and the check will fail. > >>> > >>> So we need to pass the GSI to xc_physdev_map_pirq. To do that, we need > >>> to discover the GSI number corresponding to the IRQ number. > >> > >> That's one possible approach. 
Another could be (making a lot of assumptions) > >> that a PVH Dom0 would pass in the IRQ it knows for this interrupt and Xen > >> then translates that to GSI, knowing that PVH doesn't have (host) GSIs > >> exposed to it. > > > > I don't think Xen can translate a Linux IRQ to a GSI, as that's a > > Linux abstraction Xen has no part in. > > Well, I was talking about whatever Dom0 and Xen use to communicate. I.e. > if at all I might have meant pIRQ, but now that you mention ... > > > The GSIs exposed to a PVH dom0 are the native (host) ones, as we > > create an emulated IO-APIC topology that mimics the physical one. > > > > Question here is why Linux ends up with a IRQ != GSI, as it's my > > understanding on Linux GSIs will always be identity mapped to IRQs, and > > the IRQ space up to the last possible GSI is explicitly reserved for > > this purpose. > > ... this I guess pIRQ was a PV-only concept, and it really ought to be > GSI in the PVH case. So yes, it then all boils down to that Linux- > internal question. Excellent question but we'll have to wait for Ray as he is the one with access to the hardware. But I have this data I can share in the meantime: [ 1.260378] IRQ to pin mappings: [ 1.260387] IRQ1 -> 0:1 [ 1.260395] IRQ2 -> 0:2 [ 1.260403] IRQ3 -> 0:3 [ 1.260410] IRQ4 -> 0:4 [ 1.260418] IRQ5 -> 0:5 [ 1.260425] IRQ6 -> 0:6 [ 1.260432] IRQ7 -> 0:7 [ 1.260440] IRQ8 -> 0:8 [ 1.260447] IRQ9 -> 0:9 [ 1.260455] IRQ10 -> 0:10 [ 1.260462] IRQ11 -> 0:11 [ 1.260470] IRQ12 -> 0:12 [ 1.260478] IRQ13 -> 0:13 [ 1.260485] IRQ14 -> 0:14 [ 1.260493] IRQ15 -> 0:15 [ 1.260505] IRQ106 -> 1:8 [ 1.260513] IRQ112 -> 1:4 [ 1.260521] IRQ116 -> 1:13 [ 1.260529] IRQ117 -> 1:14 [ 1.260537] IRQ118 -> 1:15 [ 1.260544] .................................... done. 
And I think Ray traced the point in Linux where Linux gives us an IRQ == 112 (which is the one causing issues): __acpi_register_gsi-> acpi_register_gsi_ioapic-> mp_map_gsi_to_irq-> mp_map_pin_to_irq-> __irq_resolve_mapping() if (likely(data)) { desc = irq_data_to_desc(data); if (irq) *irq = data->irq; /* this IRQ is 112, IO-APIC-34 domain */ } ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-16 23:19 ` Stefano Stabellini @ 2023-03-17 8:39 ` Jan Beulich 2023-03-17 9:51 ` Roger Pau Monné 0 siblings, 1 reply; 75+ messages in thread From: Jan Beulich @ 2023-03-17 8:39 UTC (permalink / raw) To: Stefano Stabellini Cc: Roger Pau Monné, Huang Rui, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian On 17.03.2023 00:19, Stefano Stabellini wrote: > On Thu, 16 Mar 2023, Jan Beulich wrote: >> So yes, it then all boils down to that Linux- >> internal question. > > Excellent question but we'll have to wait for Ray as he is the one with > access to the hardware. But I have this data I can share in the > meantime: > > [ 1.260378] IRQ to pin mappings: > [ 1.260387] IRQ1 -> 0:1 > [ 1.260395] IRQ2 -> 0:2 > [ 1.260403] IRQ3 -> 0:3 > [ 1.260410] IRQ4 -> 0:4 > [ 1.260418] IRQ5 -> 0:5 > [ 1.260425] IRQ6 -> 0:6 > [ 1.260432] IRQ7 -> 0:7 > [ 1.260440] IRQ8 -> 0:8 > [ 1.260447] IRQ9 -> 0:9 > [ 1.260455] IRQ10 -> 0:10 > [ 1.260462] IRQ11 -> 0:11 > [ 1.260470] IRQ12 -> 0:12 > [ 1.260478] IRQ13 -> 0:13 > [ 1.260485] IRQ14 -> 0:14 > [ 1.260493] IRQ15 -> 0:15 > [ 1.260505] IRQ106 -> 1:8 > [ 1.260513] IRQ112 -> 1:4 > [ 1.260521] IRQ116 -> 1:13 > [ 1.260529] IRQ117 -> 1:14 > [ 1.260537] IRQ118 -> 1:15 > [ 1.260544] .................................... done. And what does Linux think are IRQs 16 ... 105? Have you compared with Linux running baremetal on the same hardware? Jan > And I think Ray traced the point in Linux where Linux gives us an IRQ == > 112 (which is the one causing issues): > > __acpi_register_gsi-> > acpi_register_gsi_ioapic-> > mp_map_gsi_to_irq-> > mp_map_pin_to_irq-> > __irq_resolve_mapping() > > if (likely(data)) { > desc = irq_data_to_desc(data); > if (irq) > *irq = data->irq; > /* this IRQ is 112, IO-APIC-34 domain */ > } ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-17 8:39 ` Jan Beulich @ 2023-03-17 9:51 ` Roger Pau Monné 2023-03-17 18:15 ` Stefano Stabellini 0 siblings, 1 reply; 75+ messages in thread From: Roger Pau Monné @ 2023-03-17 9:51 UTC (permalink / raw) To: Jan Beulich Cc: Stefano Stabellini, Huang Rui, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: > On 17.03.2023 00:19, Stefano Stabellini wrote: > > On Thu, 16 Mar 2023, Jan Beulich wrote: > >> So yes, it then all boils down to that Linux- > >> internal question. > > > > Excellent question but we'll have to wait for Ray as he is the one with > > access to the hardware. But I have this data I can share in the > > meantime: > > > > [ 1.260378] IRQ to pin mappings: > > [ 1.260387] IRQ1 -> 0:1 > > [ 1.260395] IRQ2 -> 0:2 > > [ 1.260403] IRQ3 -> 0:3 > > [ 1.260410] IRQ4 -> 0:4 > > [ 1.260418] IRQ5 -> 0:5 > > [ 1.260425] IRQ6 -> 0:6 > > [ 1.260432] IRQ7 -> 0:7 > > [ 1.260440] IRQ8 -> 0:8 > > [ 1.260447] IRQ9 -> 0:9 > > [ 1.260455] IRQ10 -> 0:10 > > [ 1.260462] IRQ11 -> 0:11 > > [ 1.260470] IRQ12 -> 0:12 > > [ 1.260478] IRQ13 -> 0:13 > > [ 1.260485] IRQ14 -> 0:14 > > [ 1.260493] IRQ15 -> 0:15 > > [ 1.260505] IRQ106 -> 1:8 > > [ 1.260513] IRQ112 -> 1:4 > > [ 1.260521] IRQ116 -> 1:13 > > [ 1.260529] IRQ117 -> 1:14 > > [ 1.260537] IRQ118 -> 1:15 > > [ 1.260544] .................................... done. > > And what does Linux think are IRQs 16 ... 105? Have you compared with > Linux running baremetal on the same hardware? 
So I have some emails from Ray from the time he was looking into this, and on Linux dom0 PVH dmesg there is: [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23 [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55 So it seems the vIO-APIC data provided by Xen to dom0 is at least consistent. > > And I think Ray traced the point in Linux where Linux gives us an IRQ == > > 112 (which is the one causing issues): > > > > __acpi_register_gsi-> > > acpi_register_gsi_ioapic-> > > mp_map_gsi_to_irq-> > > mp_map_pin_to_irq-> > > __irq_resolve_mapping() > > > > if (likely(data)) { > > desc = irq_data_to_desc(data); > > if (irq) > > *irq = data->irq; > > /* this IRQ is 112, IO-APIC-34 domain */ > > } Could this all be a result of patch 4/5 in the Linux series ("[RFC PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different __acpi_register_gsi hook is installed for PVH in order to set up GSIs using PHYSDEV ops instead of doing it natively from the IO-APIC? FWIW, the introduced function in that patch (acpi_register_gsi_xen_pvh()) seems to unconditionally call acpi_register_gsi_ioapic() without checking if the GSI is already registered, which might lead to multiple IRQs being allocated for the same underlying GSI? As I commented there, I think that approach is wrong. If the GSI has not been mapped in Xen (because dom0 hasn't unmasked the respective IO-APIC pin) we should add some logic in the toolstack to map it before attempting to bind. Thanks, Roger. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-17 9:51 ` Roger Pau Monné @ 2023-03-17 18:15 ` Stefano Stabellini 2023-03-17 19:48 ` Roger Pau Monné 0 siblings, 1 reply; 75+ messages in thread From: Stefano Stabellini @ 2023-03-17 18:15 UTC (permalink / raw) To: Roger Pau Monné Cc: Jan Beulich, Stefano Stabellini, Huang Rui, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian [-- Attachment #1: Type: text/plain, Size: 3609 bytes --] On Fri, 17 Mar 2023, Roger Pau Monné wrote: > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: > > On 17.03.2023 00:19, Stefano Stabellini wrote: > > > On Thu, 16 Mar 2023, Jan Beulich wrote: > > >> So yes, it then all boils down to that Linux- > > >> internal question. > > > > > > Excellent question but we'll have to wait for Ray as he is the one with > > > access to the hardware. But I have this data I can share in the > > > meantime: > > > > > > [ 1.260378] IRQ to pin mappings: > > > [ 1.260387] IRQ1 -> 0:1 > > > [ 1.260395] IRQ2 -> 0:2 > > > [ 1.260403] IRQ3 -> 0:3 > > > [ 1.260410] IRQ4 -> 0:4 > > > [ 1.260418] IRQ5 -> 0:5 > > > [ 1.260425] IRQ6 -> 0:6 > > > [ 1.260432] IRQ7 -> 0:7 > > > [ 1.260440] IRQ8 -> 0:8 > > > [ 1.260447] IRQ9 -> 0:9 > > > [ 1.260455] IRQ10 -> 0:10 > > > [ 1.260462] IRQ11 -> 0:11 > > > [ 1.260470] IRQ12 -> 0:12 > > > [ 1.260478] IRQ13 -> 0:13 > > > [ 1.260485] IRQ14 -> 0:14 > > > [ 1.260493] IRQ15 -> 0:15 > > > [ 1.260505] IRQ106 -> 1:8 > > > [ 1.260513] IRQ112 -> 1:4 > > > [ 1.260521] IRQ116 -> 1:13 > > > [ 1.260529] IRQ117 -> 1:14 > > > [ 1.260537] IRQ118 -> 1:15 > > > [ 1.260544] .................................... done. > > > > And what does Linux think are IRQs 16 ... 105? Have you compared with > > Linux running baremetal on the same hardware? 
> > So I have some emails from Ray from he time he was looking into this, > and on Linux dom0 PVH dmesg there is: > > [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23 > [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55 > > So it seems the vIO-APIC data provided by Xen to dom0 is at least > consistent. > > > > And I think Ray traced the point in Linux where Linux gives us an IRQ == > > > 112 (which is the one causing issues): > > > > > > __acpi_register_gsi-> > > > acpi_register_gsi_ioapic-> > > > mp_map_gsi_to_irq-> > > > mp_map_pin_to_irq-> > > > __irq_resolve_mapping() > > > > > > if (likely(data)) { > > > desc = irq_data_to_desc(data); > > > if (irq) > > > *irq = data->irq; > > > /* this IRQ is 112, IO-APIC-34 domain */ > > > } > > > Could this all be a result of patch 4/5 in the Linux series ("[RFC > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different > __acpi_register_gsi hook is installed for PVH in order to setup GSIs > using PHYSDEV ops instead of doing it natively from the IO-APIC? > > FWIW, the introduced function in that patch > (acpi_register_gsi_xen_pvh()) seems to unconditionally call > acpi_register_gsi_ioapic() without checking if the GSI is already > registered, which might lead to multiple IRQs being allocated for the > same underlying GSI? I understand this point and I think it needs investigating. > As I commented there, I think that approach is wrong. If the GSI has > not been mapped in Xen (because dom0 hasn't unmasked the respective > IO-APIC pin) we should add some logic in the toolstack to map it > before attempting to bind. But this statement confuses me. The toolstack doesn't get involved in IRQ setup for PCI devices for HVM guests? Keep in mind that this is a regular HVM guest creation on PVH Dom0, so normally the IRQ setup is done by QEMU, and QEMU already calls xc_physdev_map_pirq and xc_domain_bind_pt_pci_irq. 
So I don't follow your statement about "the toolstack to map it before attempting to bind". ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-17 18:15 ` Stefano Stabellini @ 2023-03-17 19:48 ` Roger Pau Monné 2023-03-17 20:55 ` Stefano Stabellini 0 siblings, 1 reply; 75+ messages in thread From: Roger Pau Monné @ 2023-03-17 19:48 UTC (permalink / raw) To: Stefano Stabellini Cc: Jan Beulich, Huang Rui, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote: > On Fri, 17 Mar 2023, Roger Pau Monné wrote: > > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: > > > On 17.03.2023 00:19, Stefano Stabellini wrote: > > > > On Thu, 16 Mar 2023, Jan Beulich wrote: > > > >> So yes, it then all boils down to that Linux- > > > >> internal question. > > > > > > > > Excellent question but we'll have to wait for Ray as he is the one with > > > > access to the hardware. But I have this data I can share in the > > > > meantime: > > > > > > > > [ 1.260378] IRQ to pin mappings: > > > > [ 1.260387] IRQ1 -> 0:1 > > > > [ 1.260395] IRQ2 -> 0:2 > > > > [ 1.260403] IRQ3 -> 0:3 > > > > [ 1.260410] IRQ4 -> 0:4 > > > > [ 1.260418] IRQ5 -> 0:5 > > > > [ 1.260425] IRQ6 -> 0:6 > > > > [ 1.260432] IRQ7 -> 0:7 > > > > [ 1.260440] IRQ8 -> 0:8 > > > > [ 1.260447] IRQ9 -> 0:9 > > > > [ 1.260455] IRQ10 -> 0:10 > > > > [ 1.260462] IRQ11 -> 0:11 > > > > [ 1.260470] IRQ12 -> 0:12 > > > > [ 1.260478] IRQ13 -> 0:13 > > > > [ 1.260485] IRQ14 -> 0:14 > > > > [ 1.260493] IRQ15 -> 0:15 > > > > [ 1.260505] IRQ106 -> 1:8 > > > > [ 1.260513] IRQ112 -> 1:4 > > > > [ 1.260521] IRQ116 -> 1:13 > > > > [ 1.260529] IRQ117 -> 1:14 > > > > [ 1.260537] IRQ118 -> 1:15 > > > > [ 1.260544] .................................... done. > > > > > > And what does Linux think are IRQs 16 ... 105? Have you compared with > > > Linux running baremetal on the same hardware? 
> > > > So I have some emails from Ray from he time he was looking into this, > > and on Linux dom0 PVH dmesg there is: > > > > [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23 > > [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55 > > > > So it seems the vIO-APIC data provided by Xen to dom0 is at least > > consistent. > > > > > > And I think Ray traced the point in Linux where Linux gives us an IRQ == > > > > 112 (which is the one causing issues): > > > > > > > > __acpi_register_gsi-> > > > > acpi_register_gsi_ioapic-> > > > > mp_map_gsi_to_irq-> > > > > mp_map_pin_to_irq-> > > > > __irq_resolve_mapping() > > > > > > > > if (likely(data)) { > > > > desc = irq_data_to_desc(data); > > > > if (irq) > > > > *irq = data->irq; > > > > /* this IRQ is 112, IO-APIC-34 domain */ > > > > } > > > > > > Could this all be a result of patch 4/5 in the Linux series ("[RFC > > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different > > __acpi_register_gsi hook is installed for PVH in order to setup GSIs > > using PHYSDEV ops instead of doing it natively from the IO-APIC? > > > > FWIW, the introduced function in that patch > > (acpi_register_gsi_xen_pvh()) seems to unconditionally call > > acpi_register_gsi_ioapic() without checking if the GSI is already > > registered, which might lead to multiple IRQs being allocated for the > > same underlying GSI? > > I understand this point and I think it needs investigating. > > > > As I commented there, I think that approach is wrong. If the GSI has > > not been mapped in Xen (because dom0 hasn't unmasked the respective > > IO-APIC pin) we should add some logic in the toolstack to map it > > before attempting to bind. > > But this statement confuses me. The toolstack doesn't get involved in > IRQ setup for PCI devices for HVM guests? It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call to xc_physdev_map_pirq(). 
I'm not sure whether that's a remnant that could be removed (maybe for qemu-trad only?) or it's also required by QEMU upstream, I would have to investigate more. It's my understanding it's in pci_add_dm_done() where Ray was getting the mismatched IRQ vs GSI number. Thanks, Roger. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-17 19:48 ` Roger Pau Monné @ 2023-03-17 20:55 ` Stefano Stabellini 2023-03-20 15:16 ` Roger Pau Monné 2023-07-31 16:40 ` Chen, Jiqian 0 siblings, 2 replies; 75+ messages in thread From: Stefano Stabellini @ 2023-03-17 20:55 UTC (permalink / raw) To: Roger Pau Monné Cc: Stefano Stabellini, Jan Beulich, Huang Rui, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian [-- Attachment #1: Type: text/plain, Size: 4744 bytes --] On Fri, 17 Mar 2023, Roger Pau Monné wrote: > On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote: > > On Fri, 17 Mar 2023, Roger Pau Monné wrote: > > > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: > > > > On 17.03.2023 00:19, Stefano Stabellini wrote: > > > > > On Thu, 16 Mar 2023, Jan Beulich wrote: > > > > >> So yes, it then all boils down to that Linux- > > > > >> internal question. > > > > > > > > > > Excellent question but we'll have to wait for Ray as he is the one with > > > > > access to the hardware. 
But I have this data I can share in the > > > > > meantime: > > > > > > > > > > [ 1.260378] IRQ to pin mappings: > > > > > [ 1.260387] IRQ1 -> 0:1 > > > > > [ 1.260395] IRQ2 -> 0:2 > > > > > [ 1.260403] IRQ3 -> 0:3 > > > > > [ 1.260410] IRQ4 -> 0:4 > > > > > [ 1.260418] IRQ5 -> 0:5 > > > > > [ 1.260425] IRQ6 -> 0:6 > > > > > [ 1.260432] IRQ7 -> 0:7 > > > > > [ 1.260440] IRQ8 -> 0:8 > > > > > [ 1.260447] IRQ9 -> 0:9 > > > > > [ 1.260455] IRQ10 -> 0:10 > > > > > [ 1.260462] IRQ11 -> 0:11 > > > > > [ 1.260470] IRQ12 -> 0:12 > > > > > [ 1.260478] IRQ13 -> 0:13 > > > > > [ 1.260485] IRQ14 -> 0:14 > > > > > [ 1.260493] IRQ15 -> 0:15 > > > > > [ 1.260505] IRQ106 -> 1:8 > > > > > [ 1.260513] IRQ112 -> 1:4 > > > > > [ 1.260521] IRQ116 -> 1:13 > > > > > [ 1.260529] IRQ117 -> 1:14 > > > > > [ 1.260537] IRQ118 -> 1:15 > > > > > [ 1.260544] .................................... done. > > > > > > > > And what does Linux think are IRQs 16 ... 105? Have you compared with > > > > Linux running baremetal on the same hardware? > > > > > > So I have some emails from Ray from he time he was looking into this, > > > and on Linux dom0 PVH dmesg there is: > > > > > > [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23 > > > [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55 > > > > > > So it seems the vIO-APIC data provided by Xen to dom0 is at least > > > consistent. 
> > > > > > > > And I think Ray traced the point in Linux where Linux gives us an IRQ == > > > > > 112 (which is the one causing issues): > > > > > > > > > > __acpi_register_gsi-> > > > > > acpi_register_gsi_ioapic-> > > > > > mp_map_gsi_to_irq-> > > > > > mp_map_pin_to_irq-> > > > > > __irq_resolve_mapping() > > > > > > > > > > if (likely(data)) { > > > > > desc = irq_data_to_desc(data); > > > > > if (irq) > > > > > *irq = data->irq; > > > > > /* this IRQ is 112, IO-APIC-34 domain */ > > > > > } > > > > > > > > > Could this all be a result of patch 4/5 in the Linux series ("[RFC > > > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different > > > __acpi_register_gsi hook is installed for PVH in order to setup GSIs > > > using PHYSDEV ops instead of doing it natively from the IO-APIC? > > > > > > FWIW, the introduced function in that patch > > > (acpi_register_gsi_xen_pvh()) seems to unconditionally call > > > acpi_register_gsi_ioapic() without checking if the GSI is already > > > registered, which might lead to multiple IRQs being allocated for the > > > same underlying GSI? > > > > I understand this point and I think it needs investigating. > > > > > > > As I commented there, I think that approach is wrong. If the GSI has > > > not been mapped in Xen (because dom0 hasn't unmasked the respective > > > IO-APIC pin) we should add some logic in the toolstack to map it > > > before attempting to bind. > > > > But this statement confuses me. The toolstack doesn't get involved in > > IRQ setup for PCI devices for HVM guests? > > It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call > to xc_physdev_map_pirq(). I'm not sure whether that's a remnant that > cold be removed (maybe for qemu-trad only?) or it's also required by > QEMU upstream, I would have to investigate more. You are right. I am not certain, but it seems like a mistake in the toolstack to me. In theory, pci_add_dm_done should only be needed for PV guests, not for HVM guests. 
I am not sure. But I can see the call to xc_physdev_map_pirq you were referring to now. > It's my understanding it's in pci_add_dm_done() where Ray was getting > the mismatched IRQ vs GSI number. I think the mismatch was actually caused by the xc_physdev_map_pirq call from QEMU, which makes sense because in any case it should happen before the same call done by pci_add_dm_done (pci_add_dm_done is called after sending the pci passthrough QMP command to QEMU). So the first to hit the IRQ!=GSI problem would be QEMU. ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-17 20:55 ` Stefano Stabellini @ 2023-03-20 15:16 ` Roger Pau Monné 2023-03-20 15:29 ` Jan Beulich 2023-07-31 16:40 ` Chen, Jiqian 1 sibling, 1 reply; 75+ messages in thread From: Roger Pau Monné @ 2023-03-20 15:16 UTC (permalink / raw) To: Stefano Stabellini Cc: Jan Beulich, Huang Rui, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian On Fri, Mar 17, 2023 at 01:55:08PM -0700, Stefano Stabellini wrote: > On Fri, 17 Mar 2023, Roger Pau Monné wrote: > > On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote: > > > On Fri, 17 Mar 2023, Roger Pau Monné wrote: > > > > On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: > > > > > On 17.03.2023 00:19, Stefano Stabellini wrote: > > > > > > On Thu, 16 Mar 2023, Jan Beulich wrote: > > > > > >> So yes, it then all boils down to that Linux- > > > > > >> internal question. > > > > > > > > > > > > Excellent question but we'll have to wait for Ray as he is the one with > > > > > > access to the hardware. 
But I have this data I can share in the > > > > > > meantime: > > > > > > > > > > > > [ 1.260378] IRQ to pin mappings: > > > > > > [ 1.260387] IRQ1 -> 0:1 > > > > > > [ 1.260395] IRQ2 -> 0:2 > > > > > > [ 1.260403] IRQ3 -> 0:3 > > > > > > [ 1.260410] IRQ4 -> 0:4 > > > > > > [ 1.260418] IRQ5 -> 0:5 > > > > > > [ 1.260425] IRQ6 -> 0:6 > > > > > > [ 1.260432] IRQ7 -> 0:7 > > > > > > [ 1.260440] IRQ8 -> 0:8 > > > > > > [ 1.260447] IRQ9 -> 0:9 > > > > > > [ 1.260455] IRQ10 -> 0:10 > > > > > > [ 1.260462] IRQ11 -> 0:11 > > > > > > [ 1.260470] IRQ12 -> 0:12 > > > > > > [ 1.260478] IRQ13 -> 0:13 > > > > > > [ 1.260485] IRQ14 -> 0:14 > > > > > > [ 1.260493] IRQ15 -> 0:15 > > > > > > [ 1.260505] IRQ106 -> 1:8 > > > > > > [ 1.260513] IRQ112 -> 1:4 > > > > > > [ 1.260521] IRQ116 -> 1:13 > > > > > > [ 1.260529] IRQ117 -> 1:14 > > > > > > [ 1.260537] IRQ118 -> 1:15 > > > > > > [ 1.260544] .................................... done. > > > > > > > > > > And what does Linux think are IRQs 16 ... 105? Have you compared with > > > > > Linux running baremetal on the same hardware? > > > > > > > > So I have some emails from Ray from he time he was looking into this, > > > > and on Linux dom0 PVH dmesg there is: > > > > > > > > [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23 > > > > [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55 > > > > > > > > So it seems the vIO-APIC data provided by Xen to dom0 is at least > > > > consistent. 
> > > > > > > > > > And I think Ray traced the point in Linux where Linux gives us an IRQ == > > > > > > 112 (which is the one causing issues): > > > > > > > > > > > > __acpi_register_gsi-> > > > > > > acpi_register_gsi_ioapic-> > > > > > > mp_map_gsi_to_irq-> > > > > > > mp_map_pin_to_irq-> > > > > > > __irq_resolve_mapping() > > > > > > > > > > > > if (likely(data)) { > > > > > > desc = irq_data_to_desc(data); > > > > > > if (irq) > > > > > > *irq = data->irq; > > > > > > /* this IRQ is 112, IO-APIC-34 domain */ > > > > > > } > > > > > > > > > > > > Could this all be a result of patch 4/5 in the Linux series ("[RFC > > > > PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different > > > > __acpi_register_gsi hook is installed for PVH in order to setup GSIs > > > > using PHYSDEV ops instead of doing it natively from the IO-APIC? > > > > > > > > FWIW, the introduced function in that patch > > > > (acpi_register_gsi_xen_pvh()) seems to unconditionally call > > > > acpi_register_gsi_ioapic() without checking if the GSI is already > > > > registered, which might lead to multiple IRQs being allocated for the > > > > same underlying GSI? > > > > > > I understand this point and I think it needs investigating. > > > > > > > > > > As I commented there, I think that approach is wrong. If the GSI has > > > > not been mapped in Xen (because dom0 hasn't unmasked the respective > > > > IO-APIC pin) we should add some logic in the toolstack to map it > > > > before attempting to bind. > > > > > > But this statement confuses me. The toolstack doesn't get involved in > > > IRQ setup for PCI devices for HVM guests? > > > > It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call > > to xc_physdev_map_pirq(). I'm not sure whether that's a remnant that > > cold be removed (maybe for qemu-trad only?) or it's also required by > > QEMU upstream, I would have to investigate more. > > You are right. 
I am not certain, but it seems like a mistake in the > toolstack to me. In theory, pci_add_dm_done should only be needed for PV > guests, not for HVM guests. I am not sure. But I can see the call to > xc_physdev_map_pirq you were referring to now. > > > > It's my understanding it's in pci_add_dm_done() where Ray was getting > > the mismatched IRQ vs GSI number. > > I think the mismatch was actually caused by the xc_physdev_map_pirq call > from QEMU, which makes sense because in any case it should happen before > the same call done by pci_add_dm_done (pci_add_dm_done is called after > sending the pci passthrough QMP command to QEMU). So the first to hit > the IRQ!=GSI problem would be QEMU. I've been thinking about this a bit, and I think one of the possible issues with the current handling of GSIs in a PVH dom0 is that GSIs don't get registered until/unless they are unmasked. I could see this as a problem when doing passthrough: it's possible for a GSI (iow: vIO-APIC pin) to never get unmasked on dom0, because the device driver(s) are using MSI(-X) interrupts instead. However, the IO-APIC pin must be configured for it to be able to be mapped into a domU. A possible solution is to propagate the vIO-APIC pin configuration trigger/polarity when dom0 writes the low part of the redirection table entry. The patch below enables the usage of PHYSDEVOP_{un,}map_pirq from PVH domains (I need to assert this is secure even for domUs) and also propagates the vIO-APIC pin trigger/polarity mode on writes to the low part of the RTE. 
Such propagation leads to the following interrupt setup in Xen: IRQ: 0 vec:f0 IO-APIC-edge status=000 aff:{0}/{0} arch/x86/time.c#timer_interrupt() IRQ: 1 vec:38 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound IRQ: 2 vec:a8 IO-APIC-edge status=000 aff:{0-7}/{0-7} no_action() IRQ: 3 vec:f1 IO-APIC-edge status=000 aff:{0-7}/{0-7} drivers/char/ns16550.c#ns16550_interrupt() IRQ: 4 vec:40 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound IRQ: 5 vec:48 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound IRQ: 6 vec:50 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound IRQ: 7 vec:58 IO-APIC-edge status=006 aff:{0-7}/{0} mapped, unbound IRQ: 8 vec:60 IO-APIC-edge status=010 aff:{0}/{0} in-flight=0 d0: 8(-M-) IRQ: 9 vec:68 IO-APIC-edge status=010 aff:{0}/{0} in-flight=0 d0: 9(-M-) IRQ: 10 vec:70 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound IRQ: 11 vec:78 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound IRQ: 12 vec:88 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound IRQ: 13 vec:90 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound IRQ: 14 vec:98 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound IRQ: 15 vec:a0 IO-APIC-edge status=002 aff:{0-7}/{0} mapped, unbound IRQ: 16 vec:b0 IO-APIC-edge status=010 aff:{1}/{0-7} in-flight=0 d0: 16(-M-) IRQ: 17 vec:b8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound IRQ: 18 vec:c0 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound IRQ: 19 vec:c8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound IRQ: 20 vec:d0 IO-APIC-edge status=010 aff:{1}/{0-7} in-flight=0 d0: 20(-M-) IRQ: 21 vec:d8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound IRQ: 22 vec:e0 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound IRQ: 23 vec:e8 IO-APIC-edge status=002 aff:{0-7}/{0-7} mapped, unbound Note how now all GSIs on my box are setup, even when not bound to dom0 anymore. 
The output without this patch looks like: IRQ: 0 vec:f0 IO-APIC-edge status=000 aff:{0}/{0} arch/x86/time.c#timer_interrupt() IRQ: 1 vec:38 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound IRQ: 3 vec:f1 IO-APIC-edge status=000 aff:{0-7}/{0-7} drivers/char/ns16550.c#ns16550_interrupt() IRQ: 4 vec:40 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound IRQ: 5 vec:48 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound IRQ: 6 vec:50 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound IRQ: 7 vec:58 IO-APIC-edge status=006 aff:{0}/{0} mapped, unbound IRQ: 8 vec:d0 IO-APIC-edge status=010 aff:{6}/{0-7} in-flight=0 d0: 8(-M-) IRQ: 9 vec:a8 IO-APIC-level status=010 aff:{2}/{0-7} in-flight=0 d0: 9(-M-) IRQ: 10 vec:70 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound IRQ: 11 vec:78 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound IRQ: 12 vec:88 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound IRQ: 13 vec:90 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound IRQ: 14 vec:98 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound IRQ: 15 vec:a0 IO-APIC-edge status=002 aff:{0}/{0} mapped, unbound IRQ: 16 vec:e0 IO-APIC-level status=010 aff:{6}/{0-7} in-flight=0 d0: 16(-M-),d1: 16(-M-) IRQ: 20 vec:d8 IO-APIC-level status=010 aff:{6}/{0-7} in-flight=0 d0: 20(-M-) Legacy IRQs (below 16) are always registered. With the patch above I seem to be able to do PCI passthrough to an HVM domU from a PVH dom0. Regards, Roger. 
--- diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index 405d0a95af..cc53a3bd12 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -86,6 +86,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) { case PHYSDEVOP_map_pirq: case PHYSDEVOP_unmap_pirq: + break; + case PHYSDEVOP_eoi: case PHYSDEVOP_irq_status_query: case PHYSDEVOP_get_free_pirq: diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c index 41e3c4d5e4..50e23a093c 100644 --- a/xen/arch/x86/hvm/vioapic.c +++ b/xen/arch/x86/hvm/vioapic.c @@ -180,9 +180,7 @@ static int vioapic_hwdom_map_gsi(unsigned int gsi, unsigned int trig, /* Interrupt has been unmasked, bind it now. */ ret = mp_register_gsi(gsi, trig, pol); - if ( ret == -EEXIST ) - return 0; - if ( ret ) + if ( ret && ret != -EEXIST ) { gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n", gsi, ret); @@ -244,12 +242,18 @@ static void vioapic_write_redirent( } else { + int ret; + unmasked = ent.fields.mask; /* Remote IRR and Delivery Status are read-only. */ ent.bits = ((ent.bits >> 32) << 32) | val; ent.fields.delivery_status = 0; ent.fields.remote_irr = pent->fields.remote_irr; unmasked = unmasked && !ent.fields.mask; + ret = mp_register_gsi(gsi, ent.fields.trig_mode, ent.fields.polarity); + if ( ret && ret != -EEXIST ) + gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n", + gsi, ret); } *pent = ent; ^ permalink raw reply related [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-20 15:16 ` Roger Pau Monné @ 2023-03-20 15:29 ` Jan Beulich 2023-03-20 16:50 ` Roger Pau Monné 0 siblings, 1 reply; 75+ messages in thread From: Jan Beulich @ 2023-03-20 15:29 UTC (permalink / raw) To: Roger Pau Monné Cc: Huang Rui, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Stefano Stabellini On 20.03.2023 16:16, Roger Pau Monné wrote: > @@ -244,12 +242,18 @@ static void vioapic_write_redirent( > } > else > { > + int ret; > + > unmasked = ent.fields.mask; > /* Remote IRR and Delivery Status are read-only. */ > ent.bits = ((ent.bits >> 32) << 32) | val; > ent.fields.delivery_status = 0; > ent.fields.remote_irr = pent->fields.remote_irr; > unmasked = unmasked && !ent.fields.mask; > + ret = mp_register_gsi(gsi, ent.fields.trig_mode, ent.fields.polarity); > + if ( ret && ret != -EEXIST ) > + gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n", > + gsi, ret); > } I assume this is only meant to be experimental, as I'm missing confinement to Dom0 here. I also question this when the mask bit as set, as in that case neither the trigger mode bit nor the polarity one can be relied upon. At which point it would look to me as if it was necessary for Dom0 to use a hypercall instead (which naturally would then be PHYSDEVOP_setup_gsi). Jan ^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-20 15:29 ` Jan Beulich @ 2023-03-20 16:50 ` Roger Pau Monné 0 siblings, 0 replies; 75+ messages in thread From: Roger Pau Monné @ 2023-03-20 16:50 UTC (permalink / raw) To: Jan Beulich Cc: Huang Rui, Anthony PERARD, xen-devel, Alex Deucher, Christian König, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian, Stefano Stabellini On Mon, Mar 20, 2023 at 04:29:25PM +0100, Jan Beulich wrote: > On 20.03.2023 16:16, Roger Pau Monné wrote: > > @@ -244,12 +242,18 @@ static void vioapic_write_redirent( > > } > > else > > { > > + int ret; > > + > > unmasked = ent.fields.mask; > > /* Remote IRR and Delivery Status are read-only. */ > > ent.bits = ((ent.bits >> 32) << 32) | val; > > ent.fields.delivery_status = 0; > > ent.fields.remote_irr = pent->fields.remote_irr; > > unmasked = unmasked && !ent.fields.mask; > > + ret = mp_register_gsi(gsi, ent.fields.trig_mode, ent.fields.polarity); > > + if ( ret && ret != -EEXIST ) > > + gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n", > > + gsi, ret); > > } > > I assume this is only meant to be experimental, as I'm missing confinement > to Dom0 here. Indeed. I've attached a fixed version below, let's make sure this doesn't influence testing. > I also question this when the mask bit as set, as in that > case neither the trigger mode bit nor the polarity one can be relied upon. > At which point it would look to me as if it was necessary for Dom0 to use > a hypercall instead (which naturally would then be PHYSDEVOP_setup_gsi). AFAICT Linux does correctly set the trigger/polarity even when the pins are masked, so this should be safe as a proof of concept. Let's first figure out whether the issue is really with the lack of setup of the IO-APIC pins. At the end without input from Ray this is just a wild guess. Regards, Roger. 
---- diff --git a/xen/arch/x86/hvm/hypercall.c b/xen/arch/x86/hvm/hypercall.c index 405d0a95af..cc53a3bd12 100644 --- a/xen/arch/x86/hvm/hypercall.c +++ b/xen/arch/x86/hvm/hypercall.c @@ -86,6 +86,8 @@ long hvm_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) { case PHYSDEVOP_map_pirq: case PHYSDEVOP_unmap_pirq: + break; + case PHYSDEVOP_eoi: case PHYSDEVOP_irq_status_query: case PHYSDEVOP_get_free_pirq: diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c index 41e3c4d5e4..64f7b5bcc5 100644 --- a/xen/arch/x86/hvm/vioapic.c +++ b/xen/arch/x86/hvm/vioapic.c @@ -180,9 +180,7 @@ static int vioapic_hwdom_map_gsi(unsigned int gsi, unsigned int trig, /* Interrupt has been unmasked, bind it now. */ ret = mp_register_gsi(gsi, trig, pol); - if ( ret == -EEXIST ) - return 0; - if ( ret ) + if ( ret && ret != -EEXIST ) { gprintk(XENLOG_WARNING, "vioapic: error registering GSI %u: %d\n", gsi, ret); @@ -250,6 +248,16 @@ static void vioapic_write_redirent( ent.fields.delivery_status = 0; ent.fields.remote_irr = pent->fields.remote_irr; unmasked = unmasked && !ent.fields.mask; + if ( is_hardware_domain(d) ) + { + int ret = mp_register_gsi(gsi, ent.fields.trig_mode, + ent.fields.polarity); + + if ( ret && ret != -EEXIST ) + gprintk(XENLOG_WARNING, + "vioapic: error registering GSI %u: %d\n", + gsi, ret); + } } *pent = ent; ^ permalink raw reply related [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-03-17 20:55 ` Stefano Stabellini 2023-03-20 15:16 ` Roger Pau Monné @ 2023-07-31 16:40 ` Chen, Jiqian 2023-08-23 8:57 ` Roger Pau Monné 1 sibling, 1 reply; 75+ messages in thread From: Chen, Jiqian @ 2023-07-31 16:40 UTC (permalink / raw) To: Stefano Stabellini, Roger Pau Monné, Jan Beulich Cc: Huang, Ray, Anthony PERARD, xen-devel, Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian Hi, On 2023/3/18 04:55, Stefano Stabellini wrote: > On Fri, 17 Mar 2023, Roger Pau Monné wrote: >> On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote: >>> On Fri, 17 Mar 2023, Roger Pau Monné wrote: >>>> On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: >>>>> On 17.03.2023 00:19, Stefano Stabellini wrote: >>>>>> On Thu, 16 Mar 2023, Jan Beulich wrote: >>>>>>> So yes, it then all boils down to that Linux- >>>>>>> internal question. >>>>>> >>>>>> Excellent question but we'll have to wait for Ray as he is the one with >>>>>> access to the hardware. But I have this data I can share in the >>>>>> meantime: >>>>>> >>>>>> [ 1.260378] IRQ to pin mappings: >>>>>> [ 1.260387] IRQ1 -> 0:1 >>>>>> [ 1.260395] IRQ2 -> 0:2 >>>>>> [ 1.260403] IRQ3 -> 0:3 >>>>>> [ 1.260410] IRQ4 -> 0:4 >>>>>> [ 1.260418] IRQ5 -> 0:5 >>>>>> [ 1.260425] IRQ6 -> 0:6 >>>>>> [ 1.260432] IRQ7 -> 0:7 >>>>>> [ 1.260440] IRQ8 -> 0:8 >>>>>> [ 1.260447] IRQ9 -> 0:9 >>>>>> [ 1.260455] IRQ10 -> 0:10 >>>>>> [ 1.260462] IRQ11 -> 0:11 >>>>>> [ 1.260470] IRQ12 -> 0:12 >>>>>> [ 1.260478] IRQ13 -> 0:13 >>>>>> [ 1.260485] IRQ14 -> 0:14 >>>>>> [ 1.260493] IRQ15 -> 0:15 >>>>>> [ 1.260505] IRQ106 -> 1:8 >>>>>> [ 1.260513] IRQ112 -> 1:4 >>>>>> [ 1.260521] IRQ116 -> 1:13 >>>>>> [ 1.260529] IRQ117 -> 1:14 >>>>>> [ 1.260537] IRQ118 -> 1:15 >>>>>> [ 1.260544] .................................... done. >>>>> >>>>> And what does Linux think are IRQs 16 ... 105? 
Have you compared with >>>>> Linux running baremetal on the same hardware? >>>> >>>> So I have some emails from Ray from he time he was looking into this, >>>> and on Linux dom0 PVH dmesg there is: >>>> >>>> [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23 >>>> [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55 >>>> >>>> So it seems the vIO-APIC data provided by Xen to dom0 is at least >>>> consistent. >>>> >>>>>> And I think Ray traced the point in Linux where Linux gives us an IRQ == >>>>>> 112 (which is the one causing issues): >>>>>> >>>>>> __acpi_register_gsi-> >>>>>> acpi_register_gsi_ioapic-> >>>>>> mp_map_gsi_to_irq-> >>>>>> mp_map_pin_to_irq-> >>>>>> __irq_resolve_mapping() >>>>>> >>>>>> if (likely(data)) { >>>>>> desc = irq_data_to_desc(data); >>>>>> if (irq) >>>>>> *irq = data->irq; >>>>>> /* this IRQ is 112, IO-APIC-34 domain */ >>>>>> } >>>> >>>> >>>> Could this all be a result of patch 4/5 in the Linux series ("[RFC >>>> PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different >>>> __acpi_register_gsi hook is installed for PVH in order to setup GSIs >>>> using PHYSDEV ops instead of doing it natively from the IO-APIC? >>>> >>>> FWIW, the introduced function in that patch >>>> (acpi_register_gsi_xen_pvh()) seems to unconditionally call >>>> acpi_register_gsi_ioapic() without checking if the GSI is already >>>> registered, which might lead to multiple IRQs being allocated for the >>>> same underlying GSI? >>> >>> I understand this point and I think it needs investigating. >>> >>> >>>> As I commented there, I think that approach is wrong. If the GSI has >>>> not been mapped in Xen (because dom0 hasn't unmasked the respective >>>> IO-APIC pin) we should add some logic in the toolstack to map it >>>> before attempting to bind. >>> >>> But this statement confuses me. The toolstack doesn't get involved in >>> IRQ setup for PCI devices for HVM guests? 
>> >> It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call >> to xc_physdev_map_pirq(). I'm not sure whether that's a remnant that >> could be removed (maybe for qemu-trad only?) or it's also required by >> QEMU upstream, I would have to investigate more. > > You are right. I am not certain, but it seems like a mistake in the > toolstack to me. In theory, pci_add_dm_done should only be needed for PV > guests, not for HVM guests. I am not sure. But I can see the call to > xc_physdev_map_pirq you were referring to now. > > >> It's my understanding it's in pci_add_dm_done() where Ray was getting >> the mismatched IRQ vs GSI number. > > I think the mismatch was actually caused by the xc_physdev_map_pirq call > from QEMU, which makes sense because in any case it should happen before > the same call done by pci_add_dm_done (pci_add_dm_done is called after > sending the pci passthrough QMP command to QEMU). So the first to hit > the IRQ!=GSI problem would be QEMU.

Sorry for the late reply, and thank you all for the review. I realize your questions mainly focus on the following points:

1. Why is the irq not equal to the gsi?
2. Why do I translate between irq and gsi?
3. Why do I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()?
4. Why do I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()?

Please allow me to give a summary response first; I look forward to your comments.

1. Why is the irq not equal to the gsi?

As far as I know, the irq is dynamically allocated when the gsi is registered, so the two are not necessarily equal. When I run "sudo xl pci-assignable-add 03:00.0" to assign a passthrough device (taking the dGPU in my environment, whose gsi is 28, as an example).
It will call into acpi_register_gsi_ioapic() to get the irq; the call stack is:

acpi_register_gsi_ioapic
    mp_map_gsi_to_irq
        mp_map_pin_to_irq
            irq_find_mapping (if the gsi has been mapped to an irq before, the corresponding irq is returned here)
            alloc_irq_from_domain
                __irq_domain_alloc_irqs
                    irq_domain_alloc_descs
                        __irq_alloc_descs

If you add some printks like below:

---------------------------------------------------------------------------------------------------------------------------------------------
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index a868b76cd3d4..970fd461be7a 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1067,6 +1067,8 @@ static int mp_map_pin_to_irq(u32 gsi, int idx, int ioapic, int pin,
 		}
 	}
 	mutex_unlock(&ioapic_mutex);
+	printk("cjq_debug mp_map_pin_to_irq gsi: %u, irq: %d, idx: %d, ioapic: %d, pin: %d\n",
+	       gsi, irq, idx, ioapic, pin);
 	return irq;
 }
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 5db0230aa6b5..4e9613abbe96 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -786,6 +786,8 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
 	start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
 					   from, cnt, 0);
 	ret = -EEXIST;
+	printk("cjq_debug __irq_alloc_descs irq: %d, from: %u, cnt: %u, node: %d, start: %d, nr_irqs: %d\n",
+	       irq, from, cnt, node, start, nr_irqs);
 	if (irq >= 0 && start != irq)
 		goto unlock;
---------------------------------------------------------------------------------------------------------------------------------------------

You will get this output on PVH dom0:

[ 0.181560] cjq_debug __irq_alloc_descs irq: 1, from: 1, cnt: 1, node: -1, start: 1, nr_irqs: 1096
[ 0.181639] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
[ 0.181641] cjq_debug __irq_alloc_descs irq: 2, from: 2, cnt: 1, node: -1, start: 2, nr_irqs: 1096
[ 0.181682] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 2, idx: 0,
ioapic: 0, pin: 2 [ 0.181683] cjq_debug __irq_alloc_descs irq: 3, from: 3, cnt: 1, node: -1, start: 3, nr_irqs: 1096 [ 0.181715] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3 [ 0.181716] cjq_debug __irq_alloc_descs irq: 4, from: 4, cnt: 1, node: -1, start: 4, nr_irqs: 1096 [ 0.181751] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4 [ 0.181752] cjq_debug __irq_alloc_descs irq: 5, from: 5, cnt: 1, node: -1, start: 5, nr_irqs: 1096 [ 0.181783] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5 [ 0.181784] cjq_debug __irq_alloc_descs irq: 6, from: 6, cnt: 1, node: -1, start: 6, nr_irqs: 1096 [ 0.181813] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6 [ 0.181814] cjq_debug __irq_alloc_descs irq: 7, from: 7, cnt: 1, node: -1, start: 7, nr_irqs: 1096 [ 0.181856] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7 [ 0.181857] cjq_debug __irq_alloc_descs irq: 8, from: 8, cnt: 1, node: -1, start: 8, nr_irqs: 1096 [ 0.181888] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 [ 0.181889] cjq_debug __irq_alloc_descs irq: 9, from: 9, cnt: 1, node: -1, start: 9, nr_irqs: 1096 [ 0.181918] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 [ 0.181919] cjq_debug __irq_alloc_descs irq: 10, from: 10, cnt: 1, node: -1, start: 10, nr_irqs: 1096 [ 0.181950] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10 [ 0.181951] cjq_debug __irq_alloc_descs irq: 11, from: 11, cnt: 1, node: -1, start: 11, nr_irqs: 1096 [ 0.181977] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11 [ 0.181979] cjq_debug __irq_alloc_descs irq: 12, from: 12, cnt: 1, node: -1, start: 12, nr_irqs: 1096 [ 0.182006] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12 [ 0.182007] cjq_debug __irq_alloc_descs irq: 13, from: 13, cnt: 1, node: -1, start: 13, nr_irqs: 1096 [ 0.182034] cjq_debug mp_map_pin_to_irq gsi: 13, 
irq: 13, idx: 12, ioapic: 0, pin: 13 [ 0.182035] cjq_debug __irq_alloc_descs irq: 14, from: 14, cnt: 1, node: -1, start: 14, nr_irqs: 1096 [ 0.182066] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14 [ 0.182067] cjq_debug __irq_alloc_descs irq: 15, from: 15, cnt: 1, node: -1, start: 15, nr_irqs: 1096 [ 0.182095] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096 [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096 [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096 [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096 [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096 [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096 [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096 [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096 [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 
24, cnt: 1, node: -1, start: 39, nr_irqs: 1096 [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096 [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096 [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096 [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096 [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096 [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096 [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096 [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096 [ 0.198199] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096 [ 0.198416] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096 [ 0.198460] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096 [ 0.198489] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096 [ 0.198523] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096 [ 0.201315] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096 [ 0.202174] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096 [ 0.202225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096 [ 0.202259] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096 [ 0.202291] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096 [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096 [ 0.205239] 
cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096 [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096 [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096 [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096 [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 63, nr_irqs: 1096 [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 64, nr_irqs: 1096 [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096 [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 66, nr_irqs: 1096 [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 67, nr_irqs: 1096 [ 0.210169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 68, nr_irqs: 1096 [ 0.210322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 69, nr_irqs: 1096 [ 0.210370] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 70, nr_irqs: 1096 [ 0.210403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 71, nr_irqs: 1096 [ 0.210436] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 72, nr_irqs: 1096 [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 73, nr_irqs: 1096 [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 74, nr_irqs: 1096 [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 75, nr_irqs: 1096 [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 76, nr_irqs: 1096 [ 0.214151] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 77, nr_irqs: 1096 [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 
-1, start: 78, nr_irqs: 1096 [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 79, nr_irqs: 1096 [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 80, nr_irqs: 1096 [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 81, nr_irqs: 1096 [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096 [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096 [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096 [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096 [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096 [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096 [ 0.222215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096 [ 0.222366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096 [ 0.222410] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 90, nr_irqs: 1096 [ 0.222447] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 91, nr_irqs: 1096 [ 0.222478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 92, nr_irqs: 1096 [ 0.225490] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 93, nr_irqs: 1096 [ 0.226225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 94, nr_irqs: 1096 [ 0.226268] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 95, nr_irqs: 1096 [ 0.226300] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 96, nr_irqs: 1096 [ 0.226329] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 97, nr_irqs: 1096 [ 0.229057] cjq_debug 
__irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 98, nr_irqs: 1096 [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 99, nr_irqs: 1096 [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 100, nr_irqs: 1096 [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 101, nr_irqs: 1096 [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 102, nr_irqs: 1096 [ 0.232399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 103, nr_irqs: 1096 [ 0.248854] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 104, nr_irqs: 1096 [ 0.250609] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 105, nr_irqs: 1096 [ 0.372343] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 [ 0.720950] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 [ 0.721052] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 [ 1.254825] cjq_debug mp_map_pin_to_irq gsi: 7, irq: -16, idx: 7, ioapic: 0, pin: 7 [ 1.333081] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 [ 1.375882] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 106, nr_irqs: 1096 [ 1.375951] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 106, idx: -1, ioapic: 1, pin: 8 [ 1.376072] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096 [ 1.376121] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13 [ 1.472551] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13 [ 1.472697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096 [ 1.472751] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14 [ 1.484290] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14 [ 1.768163] cjq_debug __irq_alloc_descs 
irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096 [ 1.768627] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 108, nr_irqs: 1096 [ 1.769059] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 109, nr_irqs: 1096 [ 1.769694] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 110, nr_irqs: 1096 [ 1.770169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 111, nr_irqs: 1096 [ 1.770697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 112, nr_irqs: 1096 [ 1.770738] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4 [ 1.770789] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 113, nr_irqs: 1096 [ 1.771230] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4 [ 1.771278] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 114, nr_irqs: 1096 [ 2.127884] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 115, nr_irqs: 1096 [ 3.207419] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 116, nr_irqs: 1096 [ 3.207730] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13 [ 3.208120] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 117, nr_irqs: 1096 [ 3.208475] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12 [ 3.208478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 118, nr_irqs: 1096 [ 3.208861] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13 [ 3.208933] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 119, nr_irqs: 1096 [ 3.209127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 120, nr_irqs: 1096 [ 3.209383] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 121, nr_irqs: 1096 [ 3.209863] cjq_debug __irq_alloc_descs irq: -1, from: 24, 
cnt: 1, node: -1, start: 122, nr_irqs: 1096 [ 3.211439] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 123, nr_irqs: 1096 [ 3.211833] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 124, nr_irqs: 1096 [ 3.212873] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 125, nr_irqs: 1096 [ 3.243514] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 126, nr_irqs: 1096 [ 3.243689] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 126, idx: -1, ioapic: 1, pin: 14 [ 3.244293] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 127, nr_irqs: 1096 [ 3.244534] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 128, nr_irqs: 1096 [ 3.244714] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 129, nr_irqs: 1096 [ 3.244911] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 130, nr_irqs: 1096 [ 3.245096] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 131, nr_irqs: 1096 [ 3.245633] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 132, nr_irqs: 1096 [ 3.247890] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 133, nr_irqs: 1096 [ 3.248192] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 134, nr_irqs: 1096 [ 3.271093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 135, nr_irqs: 1096 [ 3.307045] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 136, nr_irqs: 1096 [ 3.307162] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 136, idx: -1, ioapic: 1, pin: 24 [ 3.307223] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096 [ 3.331183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096 [ 3.331295] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 138, nr_irqs: 1096 [ 3.331366] cjq_debug 
__irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 139, nr_irqs: 1096 [ 3.331438] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 140, nr_irqs: 1096 [ 3.331511] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 141, nr_irqs: 1096 [ 3.331579] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 142, nr_irqs: 1096 [ 3.331646] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 143, nr_irqs: 1096 [ 3.331713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 144, nr_irqs: 1096 [ 3.331780] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 145, nr_irqs: 1096 [ 3.331846] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 146, nr_irqs: 1096 [ 3.331913] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 147, nr_irqs: 1096 [ 3.331984] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 148, nr_irqs: 1096 [ 3.332051] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 149, nr_irqs: 1096 [ 3.332118] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 150, nr_irqs: 1096 [ 3.332183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 151, nr_irqs: 1096 [ 3.332252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 152, nr_irqs: 1096 [ 3.332319] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 153, nr_irqs: 1096 [ 8.010370] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13 [ 9.545439] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12 [ 9.545713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 154, nr_irqs: 1096 [ 9.546034] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 155, nr_irqs: 1096 [ 9.687796] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 156, nr_irqs: 1096 [ 
9.687979] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15 [ 9.688057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 157, nr_irqs: 1096 [ 9.921038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 158, nr_irqs: 1096 [ 9.921210] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 158, idx: -1, ioapic: 1, pin: 5 [ 9.921403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 159, nr_irqs: 1096 [ 9.926373] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15 [ 9.926747] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 160, nr_irqs: 1096 [ 9.928201] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12 [ 9.928488] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 161, nr_irqs: 1096 [ 10.653915] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 162, nr_irqs: 1096 [ 10.656257] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 163, nr_irqs: 1096

You can see that the irq is not always allocated based on the value of the gsi; allocation follows a first-requested, first-served principle, e.g. gsi 32 gets irq 106 while gsi 28 gets irq 112. Moreover, acpi_register_gsi_ioapic() is not the only function that calls into __irq_alloc_descs(); other functions call it as well, some even earlier. The output above matches what happens on bare metal. So we can conclude that irq != gsi.
For comparison, here is the corresponding output on native Linux: [ 0.105053] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 [ 0.105061] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 0, idx: 0, ioapic: 0, pin: 2 [ 0.105069] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3 [ 0.105078] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4 [ 0.105086] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5 [ 0.105094] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6 [ 0.105103] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7 [ 0.105111] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 [ 0.105119] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 [ 0.105127] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10 [ 0.105136] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11 [ 0.105144] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12 [ 0.105152] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 [ 0.105160] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14 [ 0.105169] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15 [ 0.398134] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 [ 1.169293] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 [ 1.169394] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 [ 1.323132] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7 [ 1.345425] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 [ 1.375502] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096 [ 1.375575] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 24, idx: -1, ioapic: 1, pin: 8 [ 1.375661] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 [ 1.375705]
cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13 [ 1.442277] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13 [ 1.442393] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 [ 1.442450] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14 [ 1.453893] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14 [ 1.456127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 [ 1.734065] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096 [ 1.734165] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096 [ 1.734253] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096 [ 1.734344] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096 [ 1.734426] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096 [ 1.734512] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096 [ 1.734597] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096 [ 1.734643] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096 [ 1.734687] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096 [ 1.734728] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096 [ 1.735017] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096 [ 1.735252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096 [ 1.735467] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096 [ 1.735799] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096 [ 1.736024] cjq_debug 
__irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096 [ 1.736364] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096 [ 1.736406] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4 [ 1.736434] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096 [ 1.736701] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4 [ 1.736724] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096 [ 3.037123] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096 [ 3.037313] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13 [ 3.037515] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13 [ 3.037738] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096 [ 3.037959] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096 [ 3.038073] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096 [ 3.038154] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096 [ 3.038179] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12 [ 3.038277] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096 [ 3.038399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096 [ 3.038525] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096 [ 3.038657] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096 [ 3.038852] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096 [ 3.052377] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096 [ 3.052479] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 54, idx: 
-1, ioapic: 1, pin: 14 [ 3.052730] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096 [ 3.052840] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096 [ 3.052918] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096 [ 3.052987] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096 [ 3.053069] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096 [ 3.053139] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096 [ 3.053201] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096 [ 3.053260] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096 [ 3.089128] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 63, nr_irqs: 1096 [ 3.089310] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 63, idx: -1, ioapic: 1, pin: 24 [ 3.089376] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096 [ 3.103435] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096 [ 3.114190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096 [ 3.114346] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 66, nr_irqs: 1096 [ 3.121215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 67, nr_irqs: 1096 [ 3.121350] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 68, nr_irqs: 1096 [ 3.121479] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 69, nr_irqs: 1096 [ 3.121612] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 70, nr_irqs: 1096 [ 3.121726] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 71, nr_irqs: 1096 [ 3.121841] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 
1, node: 0, start: 72, nr_irqs: 1096 [ 3.121955] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 73, nr_irqs: 1096 [ 3.122025] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 74, nr_irqs: 1096 [ 3.122093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 75, nr_irqs: 1096 [ 3.122148] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 76, nr_irqs: 1096 [ 3.122203] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 77, nr_irqs: 1096 [ 3.122265] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 78, nr_irqs: 1096 [ 3.122322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 79, nr_irqs: 1096 [ 3.122378] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 80, nr_irqs: 1096 [ 3.122433] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 81, nr_irqs: 1096 [ 7.838753] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13 [ 9.619174] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12 [ 9.619556] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096 [ 9.622038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096 [ 9.634900] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096 [ 9.635316] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15 [ 9.635405] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096 [ 10.006686] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096 [ 10.006823] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 86, idx: -1, ioapic: 1, pin: 5 [ 10.007009] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096 [ 10.008723] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15 [ 
10.009853] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096 [ 10.010786] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12 [ 10.010858] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096 2. Why do I translate between irq and gsi? As question 1 shows, irq != gsi. And I found that, in QEMU, (pci_qdev_realize->xen_pt_realize->xen_host_pci_device_get->xen_host_pci_get_hex_value) obtains the irq number, but pci_qdev_realize->xen_pt_realize->xc_physdev_map_pirq later requires the gsi to be passed in; it calls into Xen's physdev_map_pirq->allocate_and_map_gsi_pirq to allocate a pirq for the gsi, and that is where the error occurred. Not only that, the callback function pci_add_dm_done->xc_physdev_map_pirq also needs the gsi. So, I added a function to translate irq to gsi for QEMU. I didn't find a similar function in the existing Linux code, and I think only "QEMU passthrough for Xen" needs this translation, so I added it into privcmd. If you know any similar functions or a more suitable place, please feel free to tell me. 3. Why do I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()? Because if you want to map a gsi for domU, it must have a mapping in dom0 first. See the toolstack code path: pci_add_dm_done -> xc_physdev_map_pirq -> xc_domain_irq_permission -> XEN_DOMCTL_irq_permission -> pirq_access_permitted. xc_physdev_map_pirq gets the pirq mapped from the gsi, and xc_domain_irq_permission uses that pirq to call into Xen. If we don't do PHYSDEVOP_map_pirq for passthrough devices on PVH dom0, pirq_access_permitted finds a NULL irq in dom0 and fails. So, I added PHYSDEVOP_map_pirq for PVH dom0. But I think it is only necessary for passthrough devices, not for all devices that call __acpi_register_gsi. In the next version of the patch, I will restrict PHYSDEVOP_map_pirq to passthrough devices only. 4.
Why do I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()? As Roger commented, the gsi of the passthrough device is never unmasked and registered (I added printks in vioapic_hwdom_map_gsi() and found that it is never called for the dGPU with gsi 28 in my environment). So, I called PHYSDEVOP_setup_gsi to register the gsi. But I agree with Roger's and Jan's opinion that it is wrong to do PHYSDEVOP_setup_gsi for all devices. So, in the next version of the patch, I will likewise restrict PHYSDEVOP_setup_gsi to passthrough devices only. -- Best regards, Jiqian Chen. ^ permalink raw reply related [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-07-31 16:40 ` Chen, Jiqian @ 2023-08-23 8:57 ` Roger Pau Monné 2023-08-31 8:56 ` Chen, Jiqian 0 siblings, 1 reply; 75+ messages in thread From: Roger Pau Monné @ 2023-08-23 8:57 UTC (permalink / raw) To: Chen, Jiqian Cc: Stefano Stabellini, Jan Beulich, Huang, Ray, Anthony PERARD, xen-devel, Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia On Mon, Jul 31, 2023 at 04:40:35PM +0000, Chen, Jiqian wrote: > Hi, > > On 2023/3/18 04:55, Stefano Stabellini wrote: > > On Fri, 17 Mar 2023, Roger Pau Monné wrote: > >> On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote: > >>> On Fri, 17 Mar 2023, Roger Pau Monné wrote: > >>>> On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote: > >>>>> On 17.03.2023 00:19, Stefano Stabellini wrote: > >>>>>> On Thu, 16 Mar 2023, Jan Beulich wrote: > >>>>>>> So yes, it then all boils down to that Linux- > >>>>>>> internal question. > >>>>>> > >>>>>> Excellent question but we'll have to wait for Ray as he is the one with > >>>>>> access to the hardware. But I have this data I can share in the > >>>>>> meantime: > >>>>>> > >>>>>> [ 1.260378] IRQ to pin mappings: > >>>>>> [ 1.260387] IRQ1 -> 0:1 > >>>>>> [ 1.260395] IRQ2 -> 0:2 > >>>>>> [ 1.260403] IRQ3 -> 0:3 > >>>>>> [ 1.260410] IRQ4 -> 0:4 > >>>>>> [ 1.260418] IRQ5 -> 0:5 > >>>>>> [ 1.260425] IRQ6 -> 0:6 > >>>>>> [ 1.260432] IRQ7 -> 0:7 > >>>>>> [ 1.260440] IRQ8 -> 0:8 > >>>>>> [ 1.260447] IRQ9 -> 0:9 > >>>>>> [ 1.260455] IRQ10 -> 0:10 > >>>>>> [ 1.260462] IRQ11 -> 0:11 > >>>>>> [ 1.260470] IRQ12 -> 0:12 > >>>>>> [ 1.260478] IRQ13 -> 0:13 > >>>>>> [ 1.260485] IRQ14 -> 0:14 > >>>>>> [ 1.260493] IRQ15 -> 0:15 > >>>>>> [ 1.260505] IRQ106 -> 1:8 > >>>>>> [ 1.260513] IRQ112 -> 1:4 > >>>>>> [ 1.260521] IRQ116 -> 1:13 > >>>>>> [ 1.260529] IRQ117 -> 1:14 > >>>>>> [ 1.260537] IRQ118 -> 1:15 > >>>>>> [ 1.260544] .................................... 
done. > >>>>> > >>>>> And what does Linux think are IRQs 16 ... 105? Have you compared with > >>>>> Linux running baremetal on the same hardware? > >>>> > >>>> So I have some emails from Ray from the time he was looking into this, > >>>> and on Linux dom0 PVH dmesg there is: > >>>> > >>>> [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23 > >>>> [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55 > >>>> > >>>> So it seems the vIO-APIC data provided by Xen to dom0 is at least > >>>> consistent. > >>>> > >>>>>> And I think Ray traced the point in Linux where Linux gives us an IRQ == > >>>>>> 112 (which is the one causing issues): > >>>>>> > >>>>>> __acpi_register_gsi-> > >>>>>> acpi_register_gsi_ioapic-> > >>>>>> mp_map_gsi_to_irq-> > >>>>>> mp_map_pin_to_irq-> > >>>>>> __irq_resolve_mapping() > >>>>>> > >>>>>> if (likely(data)) { > >>>>>> desc = irq_data_to_desc(data); > >>>>>> if (irq) > >>>>>> *irq = data->irq; > >>>>>> /* this IRQ is 112, IO-APIC-34 domain */ > >>>>>> } > >>>> > >>>> > >>>> Could this all be a result of patch 4/5 in the Linux series ("[RFC > >>>> PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different > >>>> __acpi_register_gsi hook is installed for PVH in order to setup GSIs > >>>> using PHYSDEV ops instead of doing it natively from the IO-APIC? > >>>> > >>>> FWIW, the introduced function in that patch > >>>> (acpi_register_gsi_xen_pvh()) seems to unconditionally call > >>>> acpi_register_gsi_ioapic() without checking if the GSI is already > >>>> registered, which might lead to multiple IRQs being allocated for the > >>>> same underlying GSI? > >>> > >>> I understand this point and I think it needs investigating. > >>> > >>> > >>>> As I commented there, I think that approach is wrong. If the GSI has > >>>> not been mapped in Xen (because dom0 hasn't unmasked the respective > >>>> IO-APIC pin) we should add some logic in the toolstack to map it > >>>> before attempting to bind.
> >>> > >>> But this statement confuses me. The toolstack doesn't get involved in > >>> IRQ setup for PCI devices for HVM guests? > >> > >> It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call > >> to xc_physdev_map_pirq(). I'm not sure whether that's a remnant that > >> could be removed (maybe for qemu-trad only?) or it's also required by > >> QEMU upstream, I would have to investigate more. > > > > You are right. I am not certain, but it seems like a mistake in the > > toolstack to me. In theory, pci_add_dm_done should only be needed for PV > > guests, not for HVM guests. I am not sure. But I can see the call to > > xc_physdev_map_pirq you were referring to now. > > > > > >> It's my understanding it's in pci_add_dm_done() where Ray was getting > >> the mismatched IRQ vs GSI number. > > > > I think the mismatch was actually caused by the xc_physdev_map_pirq call > > from QEMU, which makes sense because in any case it should happen before > > the same call done by pci_add_dm_done (pci_add_dm_done is called after > > sending the pci passthrough QMP command to QEMU). So the first to hit > > the IRQ!=GSI problem would be QEMU. > > > Sorry for the late reply, and thank you all for the review. I realize that your questions mainly focus on the following points: 1. Why is irq not equal to gsi? 2. Why do I translate between irq and gsi? 3. Why do I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()? 4. Why do I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()? > Please forgive me for making a summary response first. I am looking forward to your comments. Sorry, it's been a while since that conversation, so my recollection is vague. One of the questions was why acpi_register_gsi_xen_pvh() is needed. I think the patch that introduced it on Linux didn't have much of a commit description. > 1. Why is irq not equal to gsi? > As far as I know, the irq is dynamically allocated for a gsi; they are not necessarily equal.
> When I run "sudo xl pci-assignable-add 03:00.0" to assign a passthrough device (taking the dGPU in my environment, whose gsi is 28, as an example), it calls into acpi_register_gsi_ioapic to get the irq; the callstack is: > acpi_register_gsi_ioapic > mp_map_gsi_to_irq > mp_map_pin_to_irq > irq_find_mapping (if the gsi has been mapped to an irq before, the corresponding irq is returned here) > alloc_irq_from_domain > __irq_domain_alloc_irqs > irq_domain_alloc_descs > __irq_alloc_descs Won't you perform double GSI registrations with Xen if both acpi_register_gsi_ioapic() and acpi_register_gsi_xen_pvh() are used? > > If you add some printks like below: > --------------------------------------------------------------------------------------------------------------------------------------------- > diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c > index a868b76cd3d4..970fd461be7a 100644 > --- a/arch/x86/kernel/apic/io_apic.c > +++ b/arch/x86/kernel/apic/io_apic.c > @@ -1067,6 +1067,8 @@ static int mp_map_pin_to_irq(u32 gsi, int idx, int ioapic, int pin, > } > } > mutex_unlock(&ioapic_mutex); > + printk("cjq_debug mp_map_pin_to_irq gsi: %u, irq: %d, idx: %d, ioapic: %d, pin: %d\n", > + gsi, irq, idx, ioapic, pin); > > return irq; > } > diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c > index 5db0230aa6b5..4e9613abbe96 100644 > --- a/kernel/irq/irqdesc.c > +++ b/kernel/irq/irqdesc.c > @@ -786,6 +786,8 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node, > start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS, > from, cnt, 0); > ret = -EEXIST; > + printk("cjq_debug __irq_alloc_descs irq: %d, from: %u, cnt: %u, node: %d, start: %d, nr_irqs: %d\n", > + irq, from, cnt, node, start, nr_irqs); > if (irq >=0 && start != irq) > goto unlock; > --------------------------------------------------------------------------------------------------------------------------------------------- > You will get this output on PVH dom0: > 
> [ 0.181560] cjq_debug __irq_alloc_descs irq: 1, from: 1, cnt: 1, node: -1, start: 1, nr_irqs: 1096 > [ 0.181639] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 > [ 0.181641] cjq_debug __irq_alloc_descs irq: 2, from: 2, cnt: 1, node: -1, start: 2, nr_irqs: 1096 > [ 0.181682] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 2, idx: 0, ioapic: 0, pin: 2 > [ 0.181683] cjq_debug __irq_alloc_descs irq: 3, from: 3, cnt: 1, node: -1, start: 3, nr_irqs: 1096 > [ 0.181715] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3 > [ 0.181716] cjq_debug __irq_alloc_descs irq: 4, from: 4, cnt: 1, node: -1, start: 4, nr_irqs: 1096 > [ 0.181751] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4 > [ 0.181752] cjq_debug __irq_alloc_descs irq: 5, from: 5, cnt: 1, node: -1, start: 5, nr_irqs: 1096 > [ 0.181783] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5 > [ 0.181784] cjq_debug __irq_alloc_descs irq: 6, from: 6, cnt: 1, node: -1, start: 6, nr_irqs: 1096 > [ 0.181813] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6 > [ 0.181814] cjq_debug __irq_alloc_descs irq: 7, from: 7, cnt: 1, node: -1, start: 7, nr_irqs: 1096 > [ 0.181856] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7 > [ 0.181857] cjq_debug __irq_alloc_descs irq: 8, from: 8, cnt: 1, node: -1, start: 8, nr_irqs: 1096 > [ 0.181888] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 > [ 0.181889] cjq_debug __irq_alloc_descs irq: 9, from: 9, cnt: 1, node: -1, start: 9, nr_irqs: 1096 > [ 0.181918] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 > [ 0.181919] cjq_debug __irq_alloc_descs irq: 10, from: 10, cnt: 1, node: -1, start: 10, nr_irqs: 1096 > [ 0.181950] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10 > [ 0.181951] cjq_debug __irq_alloc_descs irq: 11, from: 11, cnt: 1, node: -1, start: 11, nr_irqs: 1096 > [ 0.181977] cjq_debug 
mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11 > [ 0.181979] cjq_debug __irq_alloc_descs irq: 12, from: 12, cnt: 1, node: -1, start: 12, nr_irqs: 1096 > [ 0.182006] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12 > [ 0.182007] cjq_debug __irq_alloc_descs irq: 13, from: 13, cnt: 1, node: -1, start: 13, nr_irqs: 1096 > [ 0.182034] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 > [ 0.182035] cjq_debug __irq_alloc_descs irq: 14, from: 14, cnt: 1, node: -1, start: 14, nr_irqs: 1096 > [ 0.182066] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14 > [ 0.182067] cjq_debug __irq_alloc_descs irq: 15, from: 15, cnt: 1, node: -1, start: 15, nr_irqs: 1096 > [ 0.182095] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096 > [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096 > [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096 > [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096 > [ 0.188491] cjq_debug 
__irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096 > [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096 > [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096 > [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096 > [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096 > [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096 > [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096 > [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096 > [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096 > [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096 > [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096 > [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096 > [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096 > [ 0.198199] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096 > [ 0.198416] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096 > [ 0.198460] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096 > [ 0.198489] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096 > [ 0.198523] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096 > [ 0.201315] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096 > [ 0.202174] cjq_debug __irq_alloc_descs irq: 
-1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096 > [ 0.202225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096 > [ 0.202259] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096 > [ 0.202291] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096 > [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096 > [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096 > [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096 > [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096 > [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096 > [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 63, nr_irqs: 1096 > [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 64, nr_irqs: 1096 > [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096 > [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 66, nr_irqs: 1096 > [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 67, nr_irqs: 1096 > [ 0.210169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 68, nr_irqs: 1096 > [ 0.210322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 69, nr_irqs: 1096 > [ 0.210370] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 70, nr_irqs: 1096 > [ 0.210403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 71, nr_irqs: 1096 > [ 0.210436] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 72, nr_irqs: 1096 > [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, 
node: -1, start: 73, nr_irqs: 1096 > [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 74, nr_irqs: 1096 > [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 75, nr_irqs: 1096 > [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 76, nr_irqs: 1096 > [ 0.214151] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 77, nr_irqs: 1096 > [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 78, nr_irqs: 1096 > [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 79, nr_irqs: 1096 > [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 80, nr_irqs: 1096 > [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 81, nr_irqs: 1096 > [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096 > [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096 > [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096 > [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096 > [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096 > [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096 > [ 0.222215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096 > [ 0.222366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096 > [ 0.222410] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 90, nr_irqs: 1096 > [ 0.222447] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 91, nr_irqs: 1096 > [ 0.222478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 92, 
nr_irqs: 1096 > [ 0.225490] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 93, nr_irqs: 1096 > [ 0.226225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 94, nr_irqs: 1096 > [ 0.226268] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 95, nr_irqs: 1096 > [ 0.226300] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 96, nr_irqs: 1096 > [ 0.226329] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 97, nr_irqs: 1096 > [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 98, nr_irqs: 1096 > [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 99, nr_irqs: 1096 > [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 100, nr_irqs: 1096 > [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 101, nr_irqs: 1096 > [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 102, nr_irqs: 1096 > [ 0.232399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 103, nr_irqs: 1096 > [ 0.248854] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 104, nr_irqs: 1096 > [ 0.250609] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 105, nr_irqs: 1096 > [ 0.372343] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 > [ 0.720950] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 > [ 0.721052] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 > [ 1.254825] cjq_debug mp_map_pin_to_irq gsi: 7, irq: -16, idx: 7, ioapic: 0, pin: 7 > [ 1.333081] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 > [ 1.375882] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 106, nr_irqs: 1096 > [ 1.375951] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 106, idx: -1, ioapic: 1, pin: 8 > [ 
1.376072] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096 > [ 1.376121] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13 > [ 1.472551] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13 > [ 1.472697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096 > [ 1.472751] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14 > [ 1.484290] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14 > [ 1.768163] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096 > [ 1.768627] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 108, nr_irqs: 1096 > [ 1.769059] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 109, nr_irqs: 1096 > [ 1.769694] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 110, nr_irqs: 1096 > [ 1.770169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 111, nr_irqs: 1096 > [ 1.770697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 112, nr_irqs: 1096 > [ 1.770738] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4 > [ 1.770789] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 113, nr_irqs: 1096 > [ 1.771230] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4 > [ 1.771278] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 114, nr_irqs: 1096 > [ 2.127884] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 115, nr_irqs: 1096 > [ 3.207419] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 116, nr_irqs: 1096 > [ 3.207730] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13 > [ 3.208120] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 117, nr_irqs: 1096 > [ 3.208475] 
cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12 > [ 3.208478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 118, nr_irqs: 1096 > [ 3.208861] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13 > [ 3.208933] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 119, nr_irqs: 1096 > [ 3.209127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 120, nr_irqs: 1096 > [ 3.209383] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 121, nr_irqs: 1096 > [ 3.209863] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 122, nr_irqs: 1096 > [ 3.211439] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 123, nr_irqs: 1096 > [ 3.211833] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 124, nr_irqs: 1096 > [ 3.212873] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 125, nr_irqs: 1096 > [ 3.243514] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 126, nr_irqs: 1096 > [ 3.243689] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 126, idx: -1, ioapic: 1, pin: 14 > [ 3.244293] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 127, nr_irqs: 1096 > [ 3.244534] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 128, nr_irqs: 1096 > [ 3.244714] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 129, nr_irqs: 1096 > [ 3.244911] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 130, nr_irqs: 1096 > [ 3.245096] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 131, nr_irqs: 1096 > [ 3.245633] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 132, nr_irqs: 1096 > [ 3.247890] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 133, nr_irqs: 1096 > [ 3.248192] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 
1, node: -1, start: 134, nr_irqs: 1096 > [ 3.271093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 135, nr_irqs: 1096 > [ 3.307045] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 136, nr_irqs: 1096 > [ 3.307162] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 136, idx: -1, ioapic: 1, pin: 24 > [ 3.307223] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096 > [ 3.331183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096 > [ 3.331295] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 138, nr_irqs: 1096 > [ 3.331366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 139, nr_irqs: 1096 > [ 3.331438] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 140, nr_irqs: 1096 > [ 3.331511] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 141, nr_irqs: 1096 > [ 3.331579] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 142, nr_irqs: 1096 > [ 3.331646] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 143, nr_irqs: 1096 > [ 3.331713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 144, nr_irqs: 1096 > [ 3.331780] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 145, nr_irqs: 1096 > [ 3.331846] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 146, nr_irqs: 1096 > [ 3.331913] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 147, nr_irqs: 1096 > [ 3.331984] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 148, nr_irqs: 1096 > [ 3.332051] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 149, nr_irqs: 1096 > [ 3.332118] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 150, nr_irqs: 1096 > [ 3.332183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 151, nr_irqs: 
1096 > [ 3.332252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 152, nr_irqs: 1096 > [ 3.332319] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 153, nr_irqs: 1096 > [ 8.010370] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13 > [ 9.545439] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12 > [ 9.545713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 154, nr_irqs: 1096 > [ 9.546034] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 155, nr_irqs: 1096 > [ 9.687796] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 156, nr_irqs: 1096 > [ 9.687979] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15 > [ 9.688057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 157, nr_irqs: 1096 > [ 9.921038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 158, nr_irqs: 1096 > [ 9.921210] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 158, idx: -1, ioapic: 1, pin: 5 > [ 9.921403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 159, nr_irqs: 1096 > [ 9.926373] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15 > [ 9.926747] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 160, nr_irqs: 1096 > [ 9.928201] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12 > [ 9.928488] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 161, nr_irqs: 1096 > [ 10.653915] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 162, nr_irqs: 1096 > [ 10.656257] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 163, nr_irqs: 1096 > > You can find that the allocation of the irq is not always based on the value of the gsi. It follows a first-requested, first-served principle, e.g. gsi 32 gets irq 106 but gsi 28 gets irq 112. 
And not only acpi_register_gsi_ioapic() calls into __irq_alloc_descs; other functions call it too, even earlier. > The output above behaves like bare metal, so we can conclude irq != gsi. See the output below from native Linux: It does seem weird to me that it does identity map legacy IRQs (<16), but then for GSI >= 16 it starts assigning IRQs in the 100 range. What uses the IRQ range [24, 105]? Also IIRC on a PV dom0 GSIs are identity mapped to IRQs on Linux? Or maybe that's just a side effect of GSIs being identity mapped into PIRQs by Xen? > [ 0.105053] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 > [ 0.105061] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 0, idx: 0, ioapic: 0, pin: 2 > [ 0.105069] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3 > [ 0.105078] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4 > [ 0.105086] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5 > [ 0.105094] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6 > [ 0.105103] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7 > [ 0.105111] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 > [ 0.105119] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 > [ 0.105127] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10 > [ 0.105136] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11 > [ 0.105144] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12 > [ 0.105152] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 > [ 0.105160] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14 > [ 0.105169] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15 > [ 0.398134] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 > [ 1.169293] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 > [ 
1.169394] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 > [ 1.323132] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7 > [ 1.345425] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 > [ 1.375502] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096 > [ 1.375575] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 24, idx: -1, ioapic: 1, pin: 8 > [ 1.375661] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 > [ 1.375705] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13 > [ 1.442277] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13 > [ 1.442393] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 > [ 1.442450] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14 > [ 1.453893] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14 > [ 1.456127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 > [ 1.734065] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096 > [ 1.734165] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096 > [ 1.734253] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096 > [ 1.734344] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096 > [ 1.734426] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096 > [ 1.734512] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096 > [ 1.734597] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096 > [ 1.734643] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096 > [ 1.734687] cjq_debug __irq_alloc_descs irq: -1, from: 24, 
cnt: 1, node: -1, start: 34, nr_irqs: 1096 > [ 1.734728] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096 > [ 1.735017] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096 > [ 1.735252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096 > [ 1.735467] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096 > [ 1.735799] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096 > [ 1.736024] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096 > [ 1.736364] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096 > [ 1.736406] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4 > [ 1.736434] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096 > [ 1.736701] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4 > [ 1.736724] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096 > [ 3.037123] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096 > [ 3.037313] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13 > [ 3.037515] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13 > [ 3.037738] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096 > [ 3.037959] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096 > [ 3.038073] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096 > [ 3.038154] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096 > [ 3.038179] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12 > [ 3.038277] cjq_debug __irq_alloc_descs irq: -1, from: 24, 
cnt: 1, node: -1, start: 49, nr_irqs: 1096 > [ 3.038399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096 > [ 3.038525] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096 > [ 3.038657] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096 > [ 3.038852] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096 > [ 3.052377] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096 > [ 3.052479] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 54, idx: -1, ioapic: 1, pin: 14 > [ 3.052730] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096 > [ 3.052840] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096 > [ 3.052918] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096 > [ 3.052987] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096 > [ 3.053069] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096 > [ 3.053139] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096 > [ 3.053201] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096 > [ 3.053260] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096 > [ 3.089128] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 63, nr_irqs: 1096 > [ 3.089310] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 63, idx: -1, ioapic: 1, pin: 24 > [ 3.089376] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096 > [ 3.103435] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096 > [ 3.114190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096 > [ 3.114346] 
cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 66, nr_irqs: 1096 > [ 3.121215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 67, nr_irqs: 1096 > [ 3.121350] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 68, nr_irqs: 1096 > [ 3.121479] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 69, nr_irqs: 1096 > [ 3.121612] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 70, nr_irqs: 1096 > [ 3.121726] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 71, nr_irqs: 1096 > [ 3.121841] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 72, nr_irqs: 1096 > [ 3.121955] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 73, nr_irqs: 1096 > [ 3.122025] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 74, nr_irqs: 1096 > [ 3.122093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 75, nr_irqs: 1096 > [ 3.122148] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 76, nr_irqs: 1096 > [ 3.122203] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 77, nr_irqs: 1096 > [ 3.122265] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 78, nr_irqs: 1096 > [ 3.122322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 79, nr_irqs: 1096 > [ 3.122378] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 80, nr_irqs: 1096 > [ 3.122433] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 81, nr_irqs: 1096 > [ 7.838753] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13 > [ 9.619174] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12 > [ 9.619556] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096 > [ 9.622038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 
83, nr_irqs: 1096
> [ 9.634900] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096
> [ 9.635316] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15
> [ 9.635405] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096
> [ 10.006686] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096
> [ 10.006823] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 86, idx: -1, ioapic: 1, pin: 5
> [ 10.007009] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096
> [ 10.008723] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15
> [ 10.009853] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096
> [ 10.010786] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
> [ 10.010858] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096
>
> 2. Why do I translate between irq and gsi?
>
> After answering question 1, we know that irq != gsi. And I found that, in QEMU, (pci_qdev_realize->xen_pt_realize->xen_host_pci_device_get->xen_host_pci_get_hex_value) gets the irq number, but later, pci_qdev_realize->xen_pt_realize->xc_physdev_map_pirq requires us to pass in the gsi, so there is quite a difference.
For some reason on a PV dom0 xen_host_pci_get_hex_value will return the IRQ that's identity mapped to the GSI. Is that because a PV dom0 will use acpi_register_gsi_xen() instead of acpi_register_gsi_ioapic()?
> It will call into Xen physdev_map_pirq-> allocate_and_map_gsi_pirq to allocate a pirq for the gsi, and that is where the error occurred.
> Not only that, the callback function pci_add_dm_done-> xc_physdev_map_pirq also needs the gsi.
>
> So, I added a function to translate the irq to the gsi for QEMU.
> > And I didn't find similar functions in the existing Linux code, and I think only "QEMU passthrough for Xen" needs this translation, so I added it into privcmd. If you know any other similar functions or more suitable places, please feel free to tell me.
> > 3. Why do I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()?
> > Because if you want to map a gsi for a domU, it must already have a mapping in dom0. See the QEMU code:
> pci_add_dm_done
> xc_physdev_map_pirq
> xc_domain_irq_permission
> XEN_DOMCTL_irq_permission
> pirq_access_permitted
> xc_physdev_map_pirq will get the pirq which was mapped from the gsi, and xc_domain_irq_permission will use the pirq and call into Xen. If we don't do PHYSDEVOP_map_pirq for passthrough devices on PVH dom0, then pirq_access_permitted will get a NULL irq from dom0 and fail.
I'm not sure of this specific case, but we shouldn't attempt to fit the exact same PCI passthrough workflow that a PV dom0 uses into a PVH dom0. IOW: it might make sense to diverge some paths in order to avoid importing PV-specific concepts into PVH without a reason.
> So, I added PHYSDEVOP_map_pirq for PVH dom0. But I think it is only necessary for passthrough devices to do that, instead of all devices which call __acpi_register_gsi. In the next version of the patch, I will restrict PHYSDEVOP_map_pirq to passthrough devices only.
> > 4. Why do I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()?
> > As Roger commented, the gsi of the passthrough device is not unmasked and registered (I added printk calls in vioapic_hwdom_map_gsi(), and I found that it is never called for the dGPU with gsi 28 in my environment).
> So, I called PHYSDEVOP_setup_gsi to register the gsi.
> But I agree with Roger's and Jan's opinion that it is wrong to do PHYSDEVOP_setup_gsi for all devices.
> So, in the next version of the patch, I will also restrict PHYSDEVOP_setup_gsi to passthrough devices only.
Right, given how long it's been since the last series, I think we need a new series posted in order to see how this looks now. Thanks, Roger.
* Re: [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi 2023-08-23 8:57 ` Roger Pau Monné @ 2023-08-31 8:56 ` Chen, Jiqian
From: Chen, Jiqian @ 2023-08-31 8:56 UTC
To: Roger Pau Monné
Cc: Stefano Stabellini, Jan Beulich, Huang, Ray, Anthony PERARD, xen-devel, Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian

Thanks Roger, we will send a new series after the Xen 4.18 release freeze.

On 2023/8/23 16:57, Roger Pau Monné wrote:
> On Mon, Jul 31, 2023 at 04:40:35PM +0000, Chen, Jiqian wrote:
>> Hi,
>>
>> On 2023/3/18 04:55, Stefano Stabellini wrote:
>>> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
>>>> On Fri, Mar 17, 2023 at 11:15:37AM -0700, Stefano Stabellini wrote:
>>>>> On Fri, 17 Mar 2023, Roger Pau Monné wrote:
>>>>>> On Fri, Mar 17, 2023 at 09:39:52AM +0100, Jan Beulich wrote:
>>>>>>> On 17.03.2023 00:19, Stefano Stabellini wrote:
>>>>>>>> On Thu, 16 Mar 2023, Jan Beulich wrote:
>>>>>>>>> So yes, it then all boils down to that Linux-internal question.
>>>>>>>>
>>>>>>>> Excellent question but we'll have to wait for Ray as he is the one with access to the hardware.
But I have this data I can share in the meantime:
>>>>>>>>
>>>>>>>> [ 1.260378] IRQ to pin mappings:
>>>>>>>> [ 1.260387] IRQ1 -> 0:1
>>>>>>>> [ 1.260395] IRQ2 -> 0:2
>>>>>>>> [ 1.260403] IRQ3 -> 0:3
>>>>>>>> [ 1.260410] IRQ4 -> 0:4
>>>>>>>> [ 1.260418] IRQ5 -> 0:5
>>>>>>>> [ 1.260425] IRQ6 -> 0:6
>>>>>>>> [ 1.260432] IRQ7 -> 0:7
>>>>>>>> [ 1.260440] IRQ8 -> 0:8
>>>>>>>> [ 1.260447] IRQ9 -> 0:9
>>>>>>>> [ 1.260455] IRQ10 -> 0:10
>>>>>>>> [ 1.260462] IRQ11 -> 0:11
>>>>>>>> [ 1.260470] IRQ12 -> 0:12
>>>>>>>> [ 1.260478] IRQ13 -> 0:13
>>>>>>>> [ 1.260485] IRQ14 -> 0:14
>>>>>>>> [ 1.260493] IRQ15 -> 0:15
>>>>>>>> [ 1.260505] IRQ106 -> 1:8
>>>>>>>> [ 1.260513] IRQ112 -> 1:4
>>>>>>>> [ 1.260521] IRQ116 -> 1:13
>>>>>>>> [ 1.260529] IRQ117 -> 1:14
>>>>>>>> [ 1.260537] IRQ118 -> 1:15
>>>>>>>> [ 1.260544] .................................... done.
>>>>>>>
>>>>>>> And what does Linux think are IRQs 16 ... 105? Have you compared with Linux running baremetal on the same hardware?
>>>>>>
>>>>>> So I have some emails from Ray from the time he was looking into this, and on Linux dom0 PVH dmesg there is:
>>>>>>
>>>>>> [ 0.065063] IOAPIC[0]: apic_id 33, version 17, address 0xfec00000, GSI 0-23
>>>>>> [ 0.065096] IOAPIC[1]: apic_id 34, version 17, address 0xfec01000, GSI 24-55
>>>>>>
>>>>>> So it seems the vIO-APIC data provided by Xen to dom0 is at least consistent.
>>>>>>
>>>>>>>> And I think Ray traced the point in Linux where Linux gives us an IRQ == 112 (which is the one causing issues):
>>>>>>>>
>>>>>>>> __acpi_register_gsi->
>>>>>>>> acpi_register_gsi_ioapic->
>>>>>>>> mp_map_gsi_to_irq->
>>>>>>>> mp_map_pin_to_irq->
>>>>>>>> __irq_resolve_mapping()
>>>>>>>>
>>>>>>>> if (likely(data)) {
>>>>>>>> desc = irq_data_to_desc(data);
>>>>>>>> if (irq)
>>>>>>>> *irq = data->irq;
>>>>>>>> /* this IRQ is 112, IO-APIC-34 domain */
>>>>>>>> }
>>>>>>
>>>>>> Could this all be a result of patch 4/5 in the Linux series ("[RFC PATCH 4/5] x86/xen: acpi registers gsi for xen pvh"), where a different __acpi_register_gsi hook is installed for PVH in order to setup GSIs using PHYSDEV ops instead of doing it natively from the IO-APIC?
>>>>>>
>>>>>> FWIW, the introduced function in that patch (acpi_register_gsi_xen_pvh()) seems to unconditionally call acpi_register_gsi_ioapic() without checking if the GSI is already registered, which might lead to multiple IRQs being allocated for the same underlying GSI?
>>>>>
>>>>> I understand this point and I think it needs investigating.
>>>>>
>>>>>> As I commented there, I think that approach is wrong. If the GSI has not been mapped in Xen (because dom0 hasn't unmasked the respective IO-APIC pin) we should add some logic in the toolstack to map it before attempting to bind.
>>>>>
>>>>> But this statement confuses me. The toolstack doesn't get involved in IRQ setup for PCI devices for HVM guests?
>>>>
>>>> It does for GSI interrupts AFAICT, see pci_add_dm_done() and the call to xc_physdev_map_pirq(). I'm not sure whether that's a remnant that could be removed (maybe for qemu-trad only?) or it's also required by QEMU upstream, I would have to investigate more.
>>>
>>> You are right. I am not certain, but it seems like a mistake in the toolstack to me.
In theory, pci_add_dm_done should only be needed for PV guests, not for HVM guests. I am not sure. But I can see the call to xc_physdev_map_pirq you were referring to now.
>>>
>>>> It's my understanding it's in pci_add_dm_done() where Ray was getting the mismatched IRQ vs GSI number.
>>>
>>> I think the mismatch was actually caused by the xc_physdev_map_pirq call from QEMU, which makes sense because in any case it should happen before the same call done by pci_add_dm_done (pci_add_dm_done is called after sending the pci passthrough QMP command to QEMU). So the first to hit the IRQ != GSI problem would be QEMU.
>>
>> Sorry for replying to you so late, and thank you all for the review. I realized that your questions mainly focus on the following points:
>> 1. Why is irq not equal to gsi?
>> 2. Why do I translate between irq and gsi?
>> 3. Why do I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()?
>> 4. Why do I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()?
>> Please forgive me for making a summary response first; I am looking forward to your comments.
>
> Sorry, it's been a bit since that conversation, so my recollection is vague.
>
> One of the questions was why acpi_register_gsi_xen_pvh() is needed. I think the patch that introduced it on Linux didn't have much of a commit description.
PVH and bare metal both use acpi_register_gsi_ioapic to allocate an irq for a gsi. I added the function acpi_register_gsi_xen_pvh to replace acpi_register_gsi_ioapic for PVH, so that I can do something special for PVH, like map_pirq, setup_gsi, etc.
>
>> 1. Why is irq not equal to gsi?
>> As far as I know, the irq is dynamically allocated for the gsi; they are not necessarily equal.
>> When I run "sudo xl pci-assignable-add 03:00.0" to assign a passthrough device (taking the dGPU in my environment as an example, whose gsi is 28).
It will call into acpi_register_gsi_ioapic to get the irq; the call stack is:
>> acpi_register_gsi_ioapic
>> mp_map_gsi_to_irq
>> mp_map_pin_to_irq
>> irq_find_mapping (if the gsi has been mapped to an irq before, it returns the corresponding irq here)
>> alloc_irq_from_domain
>> __irq_domain_alloc_irqs
>> irq_domain_alloc_descs
>> __irq_alloc_descs
>
> Won't you perform double GSI registrations with Xen if both acpi_register_gsi_ioapic() and acpi_register_gsi_xen_pvh() are used?
In the original PVH code, __acpi_register_gsi is set to acpi_register_gsi_ioapic in the call stack start_kernel->setup_arch->acpi_boot_init->acpi_process_madt->acpi_set_irq_model_ioapic. In my code, I use acpi_register_gsi_xen_pvh to replace acpi_register_gsi_ioapic in the call stack start_kernel-> init_IRQ-> xen_init_IRQ-> pci_xen_pvh_init. So acpi_register_gsi_ioapic will be called only once.
>
>> If you add some printk statements like below:
>> ---------------------------------------------------------------------------------------------------------------------------------------------
>> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
>> index a868b76cd3d4..970fd461be7a 100644
>> --- a/arch/x86/kernel/apic/io_apic.c
>> +++ b/arch/x86/kernel/apic/io_apic.c
>> @@ -1067,6 +1067,8 @@ static int mp_map_pin_to_irq(u32 gsi, int idx, int ioapic, int pin,
>> }
>> }
>> mutex_unlock(&ioapic_mutex);
>> + printk("cjq_debug mp_map_pin_to_irq gsi: %u, irq: %d, idx: %d, ioapic: %d, pin: %d\n",
>> + gsi, irq, idx, ioapic, pin);
>>
>> return irq;
>> }
>> diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
>> index 5db0230aa6b5..4e9613abbe96 100644
>> --- a/kernel/irq/irqdesc.c
>> +++ b/kernel/irq/irqdesc.c
>> @@ -786,6 +786,8 @@ __irq_alloc_descs(int irq, unsigned int from, unsigned int cnt, int node,
>> start = bitmap_find_next_zero_area(allocated_irqs, IRQ_BITMAP_BITS,
>> from, cnt, 0);
>> ret = -EEXIST;
>> + printk("cjq_debug __irq_alloc_descs irq: %d, from: %u, cnt: %u, node: %d, start: %d, 
nr_irqs: %d\n", >> + irq, from, cnt, node, start, nr_irqs); >> if (irq >=0 && start != irq) >> goto unlock; >> --------------------------------------------------------------------------------------------------------------------------------------------- >> You will get output on PVH dom0: >> >> [ 0.181560] cjq_debug __irq_alloc_descs irq: 1, from: 1, cnt: 1, node: -1, start: 1, nr_irqs: 1096 >> [ 0.181639] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1 >> [ 0.181641] cjq_debug __irq_alloc_descs irq: 2, from: 2, cnt: 1, node: -1, start: 2, nr_irqs: 1096 >> [ 0.181682] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 2, idx: 0, ioapic: 0, pin: 2 >> [ 0.181683] cjq_debug __irq_alloc_descs irq: 3, from: 3, cnt: 1, node: -1, start: 3, nr_irqs: 1096 >> [ 0.181715] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3 >> [ 0.181716] cjq_debug __irq_alloc_descs irq: 4, from: 4, cnt: 1, node: -1, start: 4, nr_irqs: 1096 >> [ 0.181751] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4 >> [ 0.181752] cjq_debug __irq_alloc_descs irq: 5, from: 5, cnt: 1, node: -1, start: 5, nr_irqs: 1096 >> [ 0.181783] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5 >> [ 0.181784] cjq_debug __irq_alloc_descs irq: 6, from: 6, cnt: 1, node: -1, start: 6, nr_irqs: 1096 >> [ 0.181813] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6 >> [ 0.181814] cjq_debug __irq_alloc_descs irq: 7, from: 7, cnt: 1, node: -1, start: 7, nr_irqs: 1096 >> [ 0.181856] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7 >> [ 0.181857] cjq_debug __irq_alloc_descs irq: 8, from: 8, cnt: 1, node: -1, start: 8, nr_irqs: 1096 >> [ 0.181888] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8 >> [ 0.181889] cjq_debug __irq_alloc_descs irq: 9, from: 9, cnt: 1, node: -1, start: 9, nr_irqs: 1096 >> [ 0.181918] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9 >> [ 0.181919] cjq_debug 
__irq_alloc_descs irq: 10, from: 10, cnt: 1, node: -1, start: 10, nr_irqs: 1096 >> [ 0.181950] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10 >> [ 0.181951] cjq_debug __irq_alloc_descs irq: 11, from: 11, cnt: 1, node: -1, start: 11, nr_irqs: 1096 >> [ 0.181977] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11 >> [ 0.181979] cjq_debug __irq_alloc_descs irq: 12, from: 12, cnt: 1, node: -1, start: 12, nr_irqs: 1096 >> [ 0.182006] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12 >> [ 0.182007] cjq_debug __irq_alloc_descs irq: 13, from: 13, cnt: 1, node: -1, start: 13, nr_irqs: 1096 >> [ 0.182034] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13 >> [ 0.182035] cjq_debug __irq_alloc_descs irq: 14, from: 14, cnt: 1, node: -1, start: 14, nr_irqs: 1096 >> [ 0.182066] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14 >> [ 0.182067] cjq_debug __irq_alloc_descs irq: 15, from: 15, cnt: 1, node: -1, start: 15, nr_irqs: 1096 >> [ 0.182095] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096 >> [ 0.186111] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096 >> [ 0.186111] 
cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096 >> [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096 >> [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096 >> [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096 >> [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096 >> [ 0.188491] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096 >> [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096 >> [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096 >> [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096 >> [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096 >> [ 0.192282] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096 >> [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096 >> [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096 >> [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096 >> [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096 >> [ 0.196208] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096 >> [ 0.198199] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096 >> [ 0.198416] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096 >> [ 0.198460] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096 >> [ 0.198489] 
cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096 >> [ 0.198523] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096 >> [ 0.201315] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096 >> [ 0.202174] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096 >> [ 0.202225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096 >> [ 0.202259] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096 >> [ 0.202291] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096 >> [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096 >> [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096 >> [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096 >> [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096 >> [ 0.205239] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096 >> [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 63, nr_irqs: 1096 >> [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 64, nr_irqs: 1096 >> [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096 >> [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 66, nr_irqs: 1096 >> [ 0.208653] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 67, nr_irqs: 1096 >> [ 0.210169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 68, nr_irqs: 1096 >> [ 0.210322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 69, nr_irqs: 1096 >> [ 0.210370] 
cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 70, nr_irqs: 1096
>> [ 0.210403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 71, nr_irqs: 1096
>> [ 0.210436] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 72, nr_irqs: 1096
>> [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 73, nr_irqs: 1096
>> [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 74, nr_irqs: 1096
>> [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 75, nr_irqs: 1096
>> [ 0.213190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 76, nr_irqs: 1096
>> [ 0.214151] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 77, nr_irqs: 1096
>> [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 78, nr_irqs: 1096
>> [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 79, nr_irqs: 1096
>> [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 80, nr_irqs: 1096
>> [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 81, nr_irqs: 1096
>> [ 0.217075] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096
>> [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096
>> [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096
>> [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096
>> [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096
>> [ 0.220389] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096
>> [ 0.222215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096
>> [ 0.222366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096
>> [ 0.222410] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 90, nr_irqs: 1096
>> [ 0.222447] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 91, nr_irqs: 1096
>> [ 0.222478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 92, nr_irqs: 1096
>> [ 0.225490] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 93, nr_irqs: 1096
>> [ 0.226225] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 94, nr_irqs: 1096
>> [ 0.226268] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 95, nr_irqs: 1096
>> [ 0.226300] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 96, nr_irqs: 1096
>> [ 0.226329] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 97, nr_irqs: 1096
>> [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 98, nr_irqs: 1096
>> [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 99, nr_irqs: 1096
>> [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 100, nr_irqs: 1096
>> [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 101, nr_irqs: 1096
>> [ 0.229057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 102, nr_irqs: 1096
>> [ 0.232399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 103, nr_irqs: 1096
>> [ 0.248854] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 104, nr_irqs: 1096
>> [ 0.250609] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 105, nr_irqs: 1096
>> [ 0.372343] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
>> [ 0.720950] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
>> [ 0.721052] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
>> [ 1.254825] cjq_debug mp_map_pin_to_irq gsi: 7, irq: -16, idx: 7, ioapic: 0, pin: 7
>> [ 1.333081] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
>> [ 1.375882] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 106, nr_irqs: 1096
>> [ 1.375951] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 106, idx: -1, ioapic: 1, pin: 8
>> [ 1.376072] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096
>> [ 1.376121] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13
>> [ 1.472551] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 107, idx: -1, ioapic: 1, pin: 13
>> [ 1.472697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096
>> [ 1.472751] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14
>> [ 1.484290] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 107, idx: -1, ioapic: 1, pin: 14
>> [ 1.768163] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 107, nr_irqs: 1096
>> [ 1.768627] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 108, nr_irqs: 1096
>> [ 1.769059] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 109, nr_irqs: 1096
>> [ 1.769694] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 110, nr_irqs: 1096
>> [ 1.770169] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 111, nr_irqs: 1096
>> [ 1.770697] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 112, nr_irqs: 1096
>> [ 1.770738] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4
>> [ 1.770789] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 113, nr_irqs: 1096
>> [ 1.771230] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 112, idx: -1, ioapic: 1, pin: 4
>> [ 1.771278] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 114, nr_irqs: 1096
>> [ 2.127884] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 115, nr_irqs: 1096
>> [ 3.207419] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 116, nr_irqs: 1096
>> [ 3.207730] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13
>> [ 3.208120] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 117, nr_irqs: 1096
>> [ 3.208475] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12
>> [ 3.208478] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 118, nr_irqs: 1096
>> [ 3.208861] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13
>> [ 3.208933] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 119, nr_irqs: 1096
>> [ 3.209127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 120, nr_irqs: 1096
>> [ 3.209383] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 121, nr_irqs: 1096
>> [ 3.209863] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 122, nr_irqs: 1096
>> [ 3.211439] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 123, nr_irqs: 1096
>> [ 3.211833] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 124, nr_irqs: 1096
>> [ 3.212873] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 125, nr_irqs: 1096
>> [ 3.243514] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 126, nr_irqs: 1096
>> [ 3.243689] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 126, idx: -1, ioapic: 1, pin: 14
>> [ 3.244293] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 127, nr_irqs: 1096
>> [ 3.244534] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 128, nr_irqs: 1096
>> [ 3.244714] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 129, nr_irqs: 1096
>> [ 3.244911] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 130, nr_irqs: 1096
>> [ 3.245096] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 131, nr_irqs: 1096
>> [ 3.245633] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 132, nr_irqs: 1096
>> [ 3.247890] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 133, nr_irqs: 1096
>> [ 3.248192] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 134, nr_irqs: 1096
>> [ 3.271093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 135, nr_irqs: 1096
>> [ 3.307045] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 136, nr_irqs: 1096
>> [ 3.307162] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 136, idx: -1, ioapic: 1, pin: 24
>> [ 3.307223] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096
>> [ 3.331183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 137, nr_irqs: 1096
>> [ 3.331295] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 138, nr_irqs: 1096
>> [ 3.331366] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 139, nr_irqs: 1096
>> [ 3.331438] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 140, nr_irqs: 1096
>> [ 3.331511] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 141, nr_irqs: 1096
>> [ 3.331579] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 142, nr_irqs: 1096
>> [ 3.331646] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 143, nr_irqs: 1096
>> [ 3.331713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 144, nr_irqs: 1096
>> [ 3.331780] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 145, nr_irqs: 1096
>> [ 3.331846] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 146, nr_irqs: 1096
>> [ 3.331913] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 147, nr_irqs: 1096
>> [ 3.331984] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 148, nr_irqs: 1096
>> [ 3.332051] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 149, nr_irqs: 1096
>> [ 3.332118] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 150, nr_irqs: 1096
>> [ 3.332183] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 151, nr_irqs: 1096
>> [ 3.332252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 152, nr_irqs: 1096
>> [ 3.332319] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 153, nr_irqs: 1096
>> [ 8.010370] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 116, idx: -1, ioapic: 1, pin: 13
>> [ 9.545439] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12
>> [ 9.545713] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 154, nr_irqs: 1096
>> [ 9.546034] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 155, nr_irqs: 1096
>> [ 9.687796] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 156, nr_irqs: 1096
>> [ 9.687979] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15
>> [ 9.688057] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 157, nr_irqs: 1096
>> [ 9.921038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 158, nr_irqs: 1096
>> [ 9.921210] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 158, idx: -1, ioapic: 1, pin: 5
>> [ 9.921403] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 159, nr_irqs: 1096
>> [ 9.926373] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 156, idx: -1, ioapic: 1, pin: 15
>> [ 9.926747] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 160, nr_irqs: 1096
>> [ 9.928201] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 117, idx: -1, ioapic: 1, pin: 12
>> [ 9.928488] cjq_debug
__irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 161, nr_irqs: 1096
>> [ 10.653915] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 162, nr_irqs: 1096
>> [ 10.656257] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 163, nr_irqs: 1096
>>
>> You can see that the allocation of an irq is not always based on the value of the gsi. It follows a first-come, first-served principle: for example, gsi 32 gets irq 106 while gsi 28 gets irq 112. And acpi_register_gsi_ioapic() is not the only caller of __irq_alloc_descs(); other functions call it as well, even earlier.
>> The output above is like bare metal, so we can conclude that irq != gsi. See the output below on Linux:
>
> It does seem weird to me that it does identity map legacy IRQs (<16),
> but then for GSI >= 16 it starts assigning IRQs in the 100 range.
>
> What uses the IRQ range [24, 105]?

They are allocated to IPIs, MSIs, and event channels, which call __irq_alloc_descs() before the PCI devices do. For example, see the call stack of one IPI:

kernel_init
  kernel_init_freeable
    smp_prepare_cpus
      smp_ops.smp_prepare_cpus (xen_hvm_smp_prepare_cpus)
        xen_smp_intr_init
          bind_ipi_to_irqhandler
            bind_ipi_to_irq
              xen_allocate_irq_dynamic
                __irq_alloc_descs

> Also IIRC on a PV dom0 GSIs are identity mapped to IRQs on Linux? Or
> maybe that's just a side effect of GSIs being identity mapped into
> PIRQs by Xen?

PV is different. Although IPIs are also set up before the PCI devices, they do not occupy irqs 24~56. In a PV dom0, setup_IO_APIC() is not called during start_kernel(), so the variable "ioapic_initialized" in arch_dynirq_lower_bound() is never set; gsi_top (whose value is 56) is returned instead, and dynamic irq allocation starts from 56. (PVH and bare metal do initialize "ioapic_initialized", so arch_dynirq_lower_bound() returns ioapic_dynirq_base, whose value is 24.)

What's more, when PV allocates an irq for a PCI device, it calls acpi_register_gsi_xen() -> irq_alloc_desc_at() -> __irq_alloc_descs(). irq_alloc_desc_at() passes the gsi into __irq_alloc_descs() (PVH and bare metal pass -1), so in __irq_alloc_descs() the variable "from" equals the gsi. The gsi is between 24~56, and the irqs 24~56 have not been occupied before, so it returns an irq equal to the gsi.

>
>> [ 0.105053] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
>> [ 0.105061] cjq_debug mp_map_pin_to_irq gsi: 2, irq: 0, idx: 0, ioapic: 0, pin: 2
>> [ 0.105069] cjq_debug mp_map_pin_to_irq gsi: 3, irq: 3, idx: 3, ioapic: 0, pin: 3
>> [ 0.105078] cjq_debug mp_map_pin_to_irq gsi: 4, irq: 4, idx: 4, ioapic: 0, pin: 4
>> [ 0.105086] cjq_debug mp_map_pin_to_irq gsi: 5, irq: 5, idx: 5, ioapic: 0, pin: 5
>> [ 0.105094] cjq_debug mp_map_pin_to_irq gsi: 6, irq: 6, idx: 6, ioapic: 0, pin: 6
>> [ 0.105103] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx: 7, ioapic: 0, pin: 7
>> [ 0.105111] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
>> [ 0.105119] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
>> [ 0.105127] cjq_debug mp_map_pin_to_irq gsi: 10, irq: 10, idx: 9, ioapic: 0, pin: 10
>> [ 0.105136] cjq_debug mp_map_pin_to_irq gsi: 11, irq: 11, idx: 10, ioapic: 0, pin: 11
>> [ 0.105144] cjq_debug mp_map_pin_to_irq gsi: 12, irq: 12, idx: 11, ioapic: 0, pin: 12
>> [ 0.105152] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
>> [ 0.105160] cjq_debug mp_map_pin_to_irq gsi: 14, irq: 14, idx: 13, ioapic: 0, pin: 14
>> [ 0.105169] cjq_debug mp_map_pin_to_irq gsi: 15, irq: 15, idx: 14, ioapic: 0, pin: 15
>> [ 0.398134] cjq_debug mp_map_pin_to_irq gsi: 9, irq: 9, idx: 1, ioapic: 0, pin: 9
>> [ 1.169293] cjq_debug mp_map_pin_to_irq gsi: 8, irq: 8, idx: 8, ioapic: 0, pin: 8
>> [ 1.169394] cjq_debug mp_map_pin_to_irq gsi: 13, irq: 13, idx: 12, ioapic: 0, pin: 13
>> [ 1.323132] cjq_debug mp_map_pin_to_irq gsi: 7, irq: 7, idx:
7, ioapic: 0, pin: 7
>> [ 1.345425] cjq_debug mp_map_pin_to_irq gsi: 1, irq: 1, idx: 2, ioapic: 0, pin: 1
>> [ 1.375502] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 24, nr_irqs: 1096
>> [ 1.375575] cjq_debug mp_map_pin_to_irq gsi: 32, irq: 24, idx: -1, ioapic: 1, pin: 8
>> [ 1.375661] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
>> [ 1.375705] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13
>> [ 1.442277] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 25, idx: -1, ioapic: 1, pin: 13
>> [ 1.442393] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
>> [ 1.442450] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14
>> [ 1.453893] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 25, idx: -1, ioapic: 1, pin: 14
>> [ 1.456127] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 25, nr_irqs: 1096
>> [ 1.734065] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 26, nr_irqs: 1096
>> [ 1.734165] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 27, nr_irqs: 1096
>> [ 1.734253] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 28, nr_irqs: 1096
>> [ 1.734344] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 29, nr_irqs: 1096
>> [ 1.734426] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 30, nr_irqs: 1096
>> [ 1.734512] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 31, nr_irqs: 1096
>> [ 1.734597] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 32, nr_irqs: 1096
>> [ 1.734643] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 33, nr_irqs: 1096
>> [ 1.734687] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 34, nr_irqs: 1096
>> [ 1.734728] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 35, nr_irqs: 1096
>> [ 1.735017] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 36, nr_irqs: 1096
>> [ 1.735252] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 37, nr_irqs: 1096
>> [ 1.735467] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 38, nr_irqs: 1096
>> [ 1.735799] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 39, nr_irqs: 1096
>> [ 1.736024] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 40, nr_irqs: 1096
>> [ 1.736364] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 41, nr_irqs: 1096
>> [ 1.736406] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4
>> [ 1.736434] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 42, nr_irqs: 1096
>> [ 1.736701] cjq_debug mp_map_pin_to_irq gsi: 28, irq: 41, idx: -1, ioapic: 1, pin: 4
>> [ 1.736724] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 43, nr_irqs: 1096
>> [ 3.037123] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 44, nr_irqs: 1096
>> [ 3.037313] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13
>> [ 3.037515] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13
>> [ 3.037738] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 45, nr_irqs: 1096
>> [ 3.037959] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 46, nr_irqs: 1096
>> [ 3.038073] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 47, nr_irqs: 1096
>> [ 3.038154] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 48, nr_irqs: 1096
>> [ 3.038179] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
>> [ 3.038277] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 49, nr_irqs: 1096
>> [ 3.038399] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 50, nr_irqs: 1096
>> [ 3.038525] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 51, nr_irqs: 1096
>> [ 3.038657] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 52, nr_irqs: 1096
>> [ 3.038852] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 53, nr_irqs: 1096
>> [ 3.052377] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 54, nr_irqs: 1096
>> [ 3.052479] cjq_debug mp_map_pin_to_irq gsi: 38, irq: 54, idx: -1, ioapic: 1, pin: 14
>> [ 3.052730] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 55, nr_irqs: 1096
>> [ 3.052840] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 56, nr_irqs: 1096
>> [ 3.052918] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 57, nr_irqs: 1096
>> [ 3.052987] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 58, nr_irqs: 1096
>> [ 3.053069] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 59, nr_irqs: 1096
>> [ 3.053139] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 60, nr_irqs: 1096
>> [ 3.053201] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 61, nr_irqs: 1096
>> [ 3.053260] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 62, nr_irqs: 1096
>> [ 3.089128] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 63, nr_irqs: 1096
>> [ 3.089310] cjq_debug mp_map_pin_to_irq gsi: 48, irq: 63, idx: -1, ioapic: 1, pin: 24
>> [ 3.089376] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096
>> [ 3.103435] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 65, nr_irqs: 1096
>> [ 3.114190] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 64, nr_irqs: 1096
>> [ 3.114346] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 66, nr_irqs: 1096
>> [ 3.121215] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 67, nr_irqs: 1096
>> [ 3.121350] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 68, nr_irqs: 1096
>> [ 3.121479] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 69, nr_irqs: 1096
>> [ 3.121612] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 70, nr_irqs: 1096
>> [ 3.121726] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 71, nr_irqs: 1096
>> [ 3.121841] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 72, nr_irqs: 1096
>> [ 3.121955] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 73, nr_irqs: 1096
>> [ 3.122025] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 74, nr_irqs: 1096
>> [ 3.122093] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 75, nr_irqs: 1096
>> [ 3.122148] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 76, nr_irqs: 1096
>> [ 3.122203] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 77, nr_irqs: 1096
>> [ 3.122265] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 78, nr_irqs: 1096
>> [ 3.122322] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 79, nr_irqs: 1096
>> [ 3.122378] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 80, nr_irqs: 1096
>> [ 3.122433] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: 0, start: 81, nr_irqs: 1096
>> [ 7.838753] cjq_debug mp_map_pin_to_irq gsi: 37, irq: 44, idx: -1, ioapic: 1, pin: 13
>> [ 9.619174] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
>> [ 9.619556] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 82, nr_irqs: 1096
>> [ 9.622038] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 83, nr_irqs: 1096
>> [ 9.634900] cjq_debug __irq_alloc_descs
irq: -1, from: 24, cnt: 1, node: -1, start: 84, nr_irqs: 1096
>> [ 9.635316] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15
>> [ 9.635405] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 85, nr_irqs: 1096
>> [ 10.006686] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 86, nr_irqs: 1096
>> [ 10.006823] cjq_debug mp_map_pin_to_irq gsi: 29, irq: 86, idx: -1, ioapic: 1, pin: 5
>> [ 10.007009] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 87, nr_irqs: 1096
>> [ 10.008723] cjq_debug mp_map_pin_to_irq gsi: 39, irq: 84, idx: -1, ioapic: 1, pin: 15
>> [ 10.009853] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 88, nr_irqs: 1096
>> [ 10.010786] cjq_debug mp_map_pin_to_irq gsi: 36, irq: 47, idx: -1, ioapic: 1, pin: 12
>> [ 10.010858] cjq_debug __irq_alloc_descs irq: -1, from: 24, cnt: 1, node: -1, start: 89, nr_irqs: 1096
>>
>> 2. Why do I do the translation between irq and gsi?
>>
>> After answering question 1, we know that irq != gsi. And I found that in QEMU, pci_qdev_realize() -> xen_pt_realize() -> xen_host_pci_device_get() -> xen_host_pci_get_hex_value() gets the irq number, but later pci_qdev_realize() -> xen_pt_realize() -> xc_physdev_map_pirq() requires us to pass in the gsi,
>
> So that's quite a difference. For some reason on a PV dom0
> xen_host_pci_get_hex_value will return the IRQ that's identity mapped
> to the GSI.
>
> Is that because a PV dom0 will use acpi_register_gsi_xen() instead of
> acpi_register_gsi_ioapic()?

Not exactly. PV gets the irq from /sys/bus/pci/devices/xxxx:xx:xx.x/irq (see xen_pt_realize() -> xen_host_pci_device_get() -> xen_host_pci_get_dec_value() -> xen_host_pci_get_value() -> open()), and it treats that irq as the gsi.

>
>> it will call into Xen's physdev_map_pirq() -> allocate_and_map_gsi_pirq() to allocate a pirq for the gsi, and that is where the error occurred.
>> Not only that, the callback path pci_add_dm_done() -> xc_physdev_map_pirq() also needs the gsi.
>>
>> So, I added a function to translate the irq to the gsi for QEMU to use.
>>
>> And I didn't find similar functions in the existing Linux code, and I think only "QEMU passthrough for Xen" needs this translation, so I added it into privcmd. If you know any other similar functions or more suitable places, please feel free to tell me.
>>
>> 3. Why do I call PHYSDEVOP_map_pirq in acpi_register_gsi_xen_pvh()?
>>
>> Because if you want to map a gsi for a domU, it must already have a mapping in dom0. See the code path:
>> pci_add_dm_done
>>     xc_physdev_map_pirq
>>     xc_domain_irq_permission
>>         XEN_DOMCTL_irq_permission
>>             pirq_access_permitted
>> xc_physdev_map_pirq() gets the pirq mapped from the gsi, and xc_domain_irq_permission() uses that pirq to call into Xen. If we don't do PHYSDEVOP_map_pirq for passthrough devices on a PVH dom0, then pirq_access_permitted() finds a NULL irq in dom0 and fails.
>
> I'm not sure of this specific case, but we shouldn't attempt to fit
> the same exact PCI pass through workflow that a PV dom0 uses into a
> PVH dom0. IOW: it might make sense to diverge some paths in order to
> avoid importing PV specific concepts into PVH without a reason.

Yes, I agree with you. I have also tried another method to solve this problem; I think we can discuss it in the new series.

>
>> So, I added PHYSDEVOP_map_pirq for PVH dom0. But I think it is only necessary for passthrough devices to do that, instead of all devices which call __acpi_register_gsi. In the next version of the patch, I will restrict PHYSDEVOP_map_pirq to passthrough devices only.
>>
>> 4. Why do I call PHYSDEVOP_setup_gsi in acpi_register_gsi_xen_pvh()?
>>
>> As Roger commented, the gsi of a passthrough device is never unmasked and registered (I added printks in vioapic_hwdom_map_gsi() and found that it is never called for the dGPU with gsi 28 in my environment).
>> So, I called PHYSDEVOP_setup_gsi to register the gsi.
>> But I agree with Roger's and Jan's opinion: it is wrong to do PHYSDEVOP_setup_gsi for all devices.
>> So, in the next version of the patch, I will also restrict PHYSDEVOP_setup_gsi to passthrough devices only.
>
> Right, given how long it's been since the last series, I think we need
> a new series posted in order to see how this looks now.

Agreed, I am looking forward to your comments on the new series.

>
> Thanks, Roger.

--
Best regards,
Jiqian Chen.

^ permalink raw reply [flat|nested] 75+ messages in thread
* Re: [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0
  2023-03-12 7:54 [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0 Huang Rui
  ` (5 preceding siblings ...)
  2023-03-12 7:54 ` [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi Huang Rui
@ 2023-03-13 7:24 ` Christian König
  2023-03-21 10:26 ` Huang Rui
  2023-03-20 16:22 ` Huang Rui
  7 siblings, 1 reply; 75+ messages in thread
From: Christian König @ 2023-03-13 7:24 UTC (permalink / raw)
To: Huang Rui, Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel
Cc: Alex Deucher, Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian

Hi Ray,

one nit comment on the style; apart from that it looks technically correct. But I'm *really* not an expert on all that stuff.

Regards,
Christian.

Am 12.03.23 um 08:54 schrieb Huang Rui:
> Hi all,
>
> [...]
>
> Thanks,
> Ray
>
> Chen Jiqian (5):
>    vpci: accept BAR writes if dom0 is PVH
>    x86/pvh: shouldn't check pirq flag when map pirq in PVH
>    x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call
>    tools/libs/call: add linux os call to get gsi from irq
>    tools/libs/light: pci: translate irq to gsi
>
> Roger Pau Monne (1):
>    x86/pvh: report ACPI VFCT table to dom0 if present
>
>  tools/include/xen-sys/Linux/privcmd.h | 7 +++++++
>  tools/include/xencall.h               | 2 ++
>  tools/include/xenctrl.h               | 2 ++
>  tools/libs/call/core.c                | 5 +++++
>  tools/libs/call/libxencall.map        | 2 ++
>  tools/libs/call/linux.c               | 14 ++++++++++++++
>  tools/libs/call/private.h             | 9 +++++++++
>  tools/libs/ctrl/xc_physdev.c          | 4 ++++
>  tools/libs/light/libxl_pci.c          | 1 +
>  xen/arch/x86/hvm/dom0_build.c         | 1 +
>  xen/arch/x86/hvm/hypercall.c          | 3 +--
>  xen/drivers/vpci/header.c             | 2 +-
>  xen/include/acpi/actbl3.h             | 1 +
>  13 files changed, 50 insertions(+), 3 deletions(-)
* Re: [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0
  2023-03-13 7:24 ` [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0 Christian König
@ 2023-03-21 10:26 ` Huang Rui
  0 siblings, 0 replies; 75+ messages in thread
From: Huang Rui @ 2023-03-21 10:26 UTC (permalink / raw)
To: Koenig, Christian
Cc: Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, xen-devel, Deucher, Alexander, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian

On Mon, Mar 13, 2023 at 03:24:55PM +0800, Koenig, Christian wrote:
> Hi Ray,
>
> one nit comment on the style; apart from that it looks technically correct.
>
> But I'm *really* not an expert on all that stuff.

Christian, thanks anyway. :-)

Thanks,
Ray

> Regards,
> Christian.
>
> Am 12.03.23 um 08:54 schrieb Huang Rui:
> > [...]
* Re: [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0
2023-03-20 16:22 ` Huang Rui
From: Huang Rui @ 2023-03-20 16:22 UTC (permalink / raw)
To: Roger Pau Monné, Jan Beulich, Stefano Stabellini, Anthony PERARD, Andrew Cooper, xen-devel
Cc: Deucher, Alexander, Koenig, Christian, Hildebrand, Stewart, Xenia Ragiadakou, Huang, Honglei1, Zhang, Julia, Chen, Jiqian

Hi Jan, Roger, Stefano, Andrew,

Sorry for the late response; I was fully occupied by another problem last week. I will reply to each mail one by one tomorrow. Thanks for your patience. :-)

Thanks,
Ray

On Sun, Mar 12, 2023 at 03:54:49PM +0800, Huang, Ray wrote:
> Hi all,
>
> In the graphics world, 3D applications and games run on top of open
> graphics libraries such as OpenGL and Vulkan. Mesa is the Linux
> implementation of OpenGL and Vulkan for multiple hardware platforms,
> and these libraries rely on GPU hardware acceleration. In the
> virtualization world, virtio-gpu and passthrough-gpu are two of the
> GPU virtualization technologies.
>
> Currently, Xen only supports OpenGL (virgl:
> https://docs.mesa3d.org/drivers/virgl.html) for virtio-gpu, and GPU
> passthrough based on a PV dom0, on the x86 platform. Today, we would
> like to introduce Vulkan (venus: https://docs.mesa3d.org/drivers/venus.html)
> and OpenGL-on-Vulkan (zink: https://docs.mesa3d.org/drivers/zink.html)
> support for VirtIO GPU on Xen. These functions are already supported
> on KVM, but so far they are not supported on Xen. We also introduce
> PCIe (GPU) passthrough based on a PVH dom0 for the AMD x86 platform.
>
> This work requires changes across multiple repositories: kernel, Xen,
> QEMU, Mesa, and virglrenderer. Please check the branches below:
>
> Kernel: https://git.kernel.org/pub/scm/linux/kernel/git/rui/linux.git/log/?h=upstream-fox-xen
> Xen: https://gitlab.com/huangrui123/xen/-/commits/upstream-for-xen
> QEMU: https://gitlab.com/huangrui123/qemu/-/commits/upstream-for-xen
> Mesa: https://gitlab.freedesktop.org/rui/mesa/-/commits/upstream-for-xen
> Virglrenderer: https://gitlab.freedesktop.org/rui/virglrenderer/-/commits/upstream-for-xen
>
> On the Xen side, we mainly add PCIe passthrough support on PVH dom0.
> QEMU is used to pass the GPU device through to the guest HVM domU, and
> the main work is to route the device interrupt using the GSI, vector,
> and pirq.
>
> Below are screenshots of these functions, please take a look.
>
> Venus:
> https://drive.google.com/file/d/1_lPq6DMwHu1JQv7LUUVRx31dBj0HJYcL/view?usp=share_link
>
> Zink:
> https://drive.google.com/file/d/1FxLmKu6X7uJOxx1ZzwOm1yA6IL5WMGzd/view?usp=share_link
>
> Passthrough GPU:
> https://drive.google.com/file/d/17onr5gvDK8KM_LniHTSQEI2hGJZlI09L/view?usp=share_link
>
> We are working on documentation in the Xen wiki describing how to
> verify these functions, and will update it in a future version.
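As a concrete illustration of the tools-side plumbing, the libxl patch ("tools/libs/light: pci: translate irq to gsi") starts from the device's Linux IRQ, which can be read from the device's sysfs `irq` attribute; the new xencall then converts that IRQ to a GSI before the mapping is set up. The minimal sketch below covers only the sysfs-reading step under stated assumptions — the example path and the helper name are illustrative, and the actual irq→gsi privcmd ioctl added by the series is not reproduced here.

```c
#include <stdio.h>

/*
 * Illustrative sketch only (not code from the series): read a decimal
 * legacy IRQ number from a sysfs-style attribute file such as
 * /sys/bus/pci/devices/0000:03:00.0/irq (path is an example).
 * The series then translates this IRQ into a GSI via a new privcmd
 * ioctl, which is not shown here.
 *
 * Returns the IRQ on success, -1 on any error.
 */
static int read_irq_attr(const char *path)
{
    FILE *f = fopen(path, "r");
    int irq = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &irq) != 1)
        irq = -1;            /* attribute present but not a number */
    fclose(f);
    return irq;
}
```

A caller would pass the passthrough device's sysfs path and hand the resulting IRQ to the irq→gsi translation before issuing the pirq mapping.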
>
> Thanks,
> Ray
>
> Chen Jiqian (5):
>   vpci: accept BAR writes if dom0 is PVH
>   x86/pvh: shouldn't check pirq flag when map pirq in PVH
>   x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call
>   tools/libs/call: add linux os call to get gsi from irq
>   tools/libs/light: pci: translate irq to gsi
>
> Roger Pau Monne (1):
>   x86/pvh: report ACPI VFCT table to dom0 if present
>
>  tools/include/xen-sys/Linux/privcmd.h |  7 +++++++
>  tools/include/xencall.h               |  2 ++
>  tools/include/xenctrl.h               |  2 ++
>  tools/libs/call/core.c                |  5 +++++
>  tools/libs/call/libxencall.map        |  2 ++
>  tools/libs/call/linux.c               | 14 ++++++++++++++
>  tools/libs/call/private.h             |  9 +++++++++
>  tools/libs/ctrl/xc_physdev.c          |  4 ++++
>  tools/libs/light/libxl_pci.c          |  1 +
>  xen/arch/x86/hvm/dom0_build.c         |  1 +
>  xen/arch/x86/hvm/hypercall.c          |  3 +--
>  xen/drivers/vpci/header.c             |  2 +-
>  xen/include/acpi/actbl3.h             |  1 +
>  13 files changed, 50 insertions(+), 3 deletions(-)
>
> --
> 2.25.1
end of thread, other threads: [~2023-08-31 8:56 UTC | newest]

Thread overview: 75+ messages — links below jump to the message on this page:

2023-03-12  7:54 [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0 Huang Rui
2023-03-12  7:54 ` [RFC XEN PATCH 1/6] x86/pvh: report ACPI VFCT table to dom0 if present Huang Rui
2023-03-13 11:55   ` Andrew Cooper
2023-03-13 12:21     ` Roger Pau Monné
2023-03-13 12:27       ` Andrew Cooper
2023-03-21  6:26   ` Huang Rui
2023-03-12  7:54 ` [RFC XEN PATCH 2/6] vpci: accept BAR writes if dom0 is PVH Huang Rui
2023-03-13  7:23   ` Christian König
2023-03-13  7:26     ` Christian König
2023-03-13  8:46       ` Jan Beulich
2023-03-13  8:55     ` Huang Rui
2023-03-14 23:42       ` Stefano Stabellini
2023-03-14 16:02   ` Jan Beulich
2023-03-21  9:36     ` Huang Rui
2023-03-21  9:41       ` Jan Beulich
2023-03-21 10:14         ` Huang Rui
2023-03-21 10:20           ` Jan Beulich
2023-03-21 11:49             ` Huang Rui
2023-03-21 12:20               ` Roger Pau Monné
2023-03-21 12:25                 ` Jan Beulich
2023-03-21 12:59                   ` Huang Rui
2023-03-21 12:27               ` Jan Beulich
2023-03-21 13:03                 ` Huang Rui
2023-03-22  7:28                   ` Huang Rui
2023-03-22  7:45                     ` Jan Beulich
2023-03-22  9:34                     ` Roger Pau Monné
2023-03-22 12:33                       ` Huang Rui
2023-03-22 12:48                         ` Jan Beulich
2023-03-23 10:26                           ` Huang Rui
2023-03-23 14:16                             ` Jan Beulich
2023-03-23 10:43                         ` Roger Pau Monné
2023-03-23 13:34                           ` Huang Rui
2023-03-23 16:23                             ` Roger Pau Monné
2023-03-24  4:37                               ` Huang Rui
2023-03-12  7:54 ` [RFC XEN PATCH 3/6] x86/pvh: shouldn't check pirq flag when map pirq in PVH Huang Rui
2023-03-14 16:27   ` Jan Beulich
2023-03-15 15:57     ` Roger Pau Monné
2023-03-16  0:22       ` Stefano Stabellini
2023-03-21 10:09         ` Huang Rui
2023-03-21 10:17           ` Jan Beulich
2023-03-12  7:54 ` [RFC XEN PATCH 4/6] x86/pvh: PVH dom0 also need PHYSDEVOP_setup_gsi call Huang Rui
2023-03-14 16:30   ` Jan Beulich
2023-03-15 17:01     ` Andrew Cooper
2023-03-16  0:26       ` Stefano Stabellini
2023-03-16  0:39         ` Stefano Stabellini
2023-03-16  8:51           ` Jan Beulich
2023-03-16  9:18             ` Roger Pau Monné
2023-03-16  7:05         ` Jan Beulich
2023-03-21 12:42           ` Huang Rui
2023-03-21 12:22       ` Huang Rui
2023-03-12  7:54 ` [RFC XEN PATCH 5/6] tools/libs/call: add linux os call to get gsi from irq Huang Rui
2023-03-14 16:36   ` Jan Beulich
2023-03-12  7:54 ` [RFC XEN PATCH 6/6] tools/libs/light: pci: translate irq to gsi Huang Rui
2023-03-14 16:39   ` Jan Beulich
2023-03-15 16:35     ` Roger Pau Monné
2023-03-16  0:44       ` Stefano Stabellini
2023-03-16  8:54         ` Roger Pau Monné
2023-03-16  8:55           ` Jan Beulich
2023-03-16  9:27             ` Roger Pau Monné
2023-03-16  9:42               ` Jan Beulich
2023-03-16 23:19                 ` Stefano Stabellini
2023-03-17  8:39                   ` Jan Beulich
2023-03-17  9:51                     ` Roger Pau Monné
2023-03-17 18:15                       ` Stefano Stabellini
2023-03-17 19:48                         ` Roger Pau Monné
2023-03-17 20:55                           ` Stefano Stabellini
2023-03-20 15:16                             ` Roger Pau Monné
2023-03-20 15:29                               ` Jan Beulich
2023-03-20 16:50                                 ` Roger Pau Monné
2023-07-31 16:40                                   ` Chen, Jiqian
2023-08-23  8:57                                     ` Roger Pau Monné
2023-08-31  8:56                                       ` Chen, Jiqian
2023-03-13  7:24 ` [RFC XEN PATCH 0/6] Introduce VirtIO GPU and Passthrough GPU support on Xen PVH dom0 Christian König
2023-03-21 10:26   ` Huang Rui
2023-03-20 16:22 ` Huang Rui