xen-devel.lists.xenproject.org archive mirror
* [RFC PATCH 0/5] Add Xen PVH dom0 support for GPU
@ 2023-03-12 12:01 Huang Rui
  2023-03-12 12:01 ` [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh Huang Rui
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Huang Rui @ 2023-03-12 12:01 UTC (permalink / raw)
  To: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Boris Ostrovsky, Roger Pau Monné,
	xen-devel, linux-kernel, dri-devel, amd-gfx
  Cc: Alex Deucher, Christian König, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian,
	Huang Rui

Hi all,

Currently, we are working to add VirtIO GPU and passthrough GPU support on
Xen. We expect to use HVM for domU and PVH for dom0. The x86 PVH dom0
support needs a few modifications on our APU platform. These features
require support from multiple software components, including the kernel,
Xen, QEMU, Mesa, and virglrenderer. Please see the patch series for Xen and
QEMU below.

Xen: https://lists.xenproject.org/archives/html/xen-devel/2023-03/msg00714.html
QEMU: https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg03972.html

The kernel part mainly adds the PVH dom0 support:

1) Enable Xen PVH dom0 for AMDGPU

Please check patches 1 to 3, which enable Xen PVH dom0 for amdgpu. We would
like to use the hardware IOMMU instead of swiotlb for buffer copies, while
PV dom0 only supports swiotlb.

There are still some workarounds in the kernel that need to be dug out, like
the one below:
https://git.kernel.org/pub/scm/linux/kernel/git/rui/linux.git/commit/?h=upstream-fox-xen&id=9bee65dd3498dfc6aad283d22ff641198b5c91ed

2) Add PCIe Passthrough (GPU) on Xen PVH dom0

Please check patches 4 to 5, which implement the acpi_register_gsi_xen_pvh()
API to register the GSI for a guest domU, and add a new privcmd ioctl to
look up the GSI from the IRQ.
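
For reference, below is a minimal sketch (untested, error handling trimmed)
of how a privileged user-space component such as the toolstack could use the
new ioctl. The /dev/xen/privcmd node and the uapi header name are the usual
privcmd conventions; everything else is just illustration.

/* Translate a Linux IRQ number into its GSI via the new
 * IOCTL_PRIVCMD_GSI_FROM_IRQ (patch 5).  A negative return value means
 * no GSI is recorded for this IRQ. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <xen/privcmd.h>	/* struct privcmd_gsi_from_irq */

static int gsi_from_irq(unsigned int irq)
{
	struct privcmd_gsi_from_irq arg = { .irq = irq };
	int fd = open("/dev/xen/privcmd", O_RDWR | O_CLOEXEC);

	if (fd < 0)
		return -1;

	if (ioctl(fd, IOCTL_PRIVCMD_GSI_FROM_IRQ, &arg) < 0) {
		close(fd);
		return -1;
	}

	close(fd);
	return (int)arg.gsi;
}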

Below are screenshots of these features, please take a look.

Passthrough GPU: https://drive.google.com/file/d/17onr5gvDK8KM_LniHTSQEI2hGJZlI09L/view?usp=share_link
Venus: https://drive.google.com/file/d/1_lPq6DMwHu1JQv7LUUVRx31dBj0HJYcL/view?usp=share_link
Zink: https://drive.google.com/file/d/1FxLmKu6X7uJOxx1ZzwOm1yA6IL5WMGzd/view?usp=share_link

Repositories
Kernel: https://git.kernel.org/pub/scm/linux/kernel/git/rui/linux.git/log/?h=upstream-fox-xen
Xen: https://gitlab.com/huangrui123/xen/-/commits/upstream-for-xen
QEMU: https://gitlab.com/huangrui123/qemu/-/commits/upstream-for-xen
Mesa: https://gitlab.freedesktop.org/rui/mesa/-/commits/upstream-for-xen
Virglrenderer: https://gitlab.freedesktop.org/rui/virglrenderer/-/commits/upstream-for-xen

We are writing the documentation on the Xen wiki page, and will update it
in a future version.

Thanks,
Ray

Chen Jiqian (2):
  x86/xen: acpi registers gsi for xen pvh
  xen/privcmd: add IOCTL_PRIVCMD_GSI_FROM_IRQ

Huang Rui (3):
  x86/xen: disable swiotlb for xen pvh
  xen/grants: update initialization order of xen grant table
  drm/amdgpu: set passthrough mode for xen pvh/hvm

 arch/x86/include/asm/apic.h              |  7 ++++
 arch/x86/include/asm/xen/pci.h           |  5 +++
 arch/x86/kernel/acpi/boot.c              |  2 +-
 arch/x86/kernel/pci-dma.c                |  8 ++++-
 arch/x86/pci/xen.c                       | 43 ++++++++++++++++++++++++
 arch/x86/xen/grant-table.c               |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c |  3 +-
 drivers/xen/events/events_base.c         | 39 +++++++++++++++++++++
 drivers/xen/grant-table.c                |  2 +-
 drivers/xen/privcmd.c                    | 20 +++++++++++
 include/uapi/xen/privcmd.h               |  7 ++++
 include/xen/events.h                     |  5 +++
 12 files changed, 138 insertions(+), 5 deletions(-)

-- 
2.25.1




* [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-12 12:01 [RFC PATCH 0/5] Add Xen PVH dom0 support for GPU Huang Rui
@ 2023-03-12 12:01 ` Huang Rui
  2023-03-13  8:56   ` Jan Beulich
  2023-03-16 16:28   ` Roger Pau Monné
  2023-03-12 12:01 ` [RFC PATCH 2/5] xen/grants: update initialization order of xen grant table Huang Rui
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 27+ messages in thread
From: Huang Rui @ 2023-03-12 12:01 UTC (permalink / raw)
  To: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Boris Ostrovsky, Roger Pau Monné,
	xen-devel, linux-kernel, dri-devel, amd-gfx
  Cc: Alex Deucher, Christian König, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian,
	Huang Rui

Xen PVH is a paravirtualized mode that takes advantage of hardware
virtualization support when possible. It will use the hardware IOMMU
support instead of xen-swiotlb, so disable swiotlb if the current domain is
Xen PVH.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 arch/x86/kernel/pci-dma.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index 30bbe4abb5d6..f5c73dd18f2a 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -74,6 +74,12 @@ static inline void __init pci_swiotlb_detect(void)
 #ifdef CONFIG_SWIOTLB_XEN
 static void __init pci_xen_swiotlb_init(void)
 {
+	/* Xen PVH domain won't use swiotlb */
+	if (xen_pvh_domain()) {
+		x86_swiotlb_enable = false;
+		return;
+	}
+
 	if (!xen_initial_domain() && !x86_swiotlb_enable)
 		return;
 	x86_swiotlb_enable = true;
@@ -86,7 +92,7 @@ static void __init pci_xen_swiotlb_init(void)
 
 int pci_xen_swiotlb_init_late(void)
 {
-	if (dma_ops == &xen_swiotlb_dma_ops)
+	if (xen_pvh_domain() || dma_ops == &xen_swiotlb_dma_ops)
 		return 0;
 
 	/* we can work with the default swiotlb */
-- 
2.25.1




* [RFC PATCH 2/5] xen/grants: update initialization order of xen grant table
  2023-03-12 12:01 [RFC PATCH 0/5] Add Xen PVH dom0 support for GPU Huang Rui
  2023-03-12 12:01 ` [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh Huang Rui
@ 2023-03-12 12:01 ` Huang Rui
  2023-03-15 12:31   ` Roger Pau Monné
  2023-03-12 12:01 ` [RFC PATCH 3/5] drm/amdgpu: set passthrough mode for xen pvh/hvm Huang Rui
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 27+ messages in thread
From: Huang Rui @ 2023-03-12 12:01 UTC (permalink / raw)
  To: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Boris Ostrovsky, Roger Pau Monné,
	xen-devel, linux-kernel, dri-devel, amd-gfx
  Cc: Alex Deucher, Christian König, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian,
	Huang Rui

The xen grant table is initialized before the PCI resources are parsed,
so xen_alloc_unpopulated_pages() ends up using a range from the PCI
window because Linux hasn't parsed the PCI information yet.

So modify the initialization order to make sure the real PCI resources
are parsed beforehand.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 arch/x86/xen/grant-table.c | 2 +-
 drivers/xen/grant-table.c  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
index 1e681bf62561..64a04d1e70f5 100644
--- a/arch/x86/xen/grant-table.c
+++ b/arch/x86/xen/grant-table.c
@@ -165,5 +165,5 @@ static int __init xen_pvh_gnttab_setup(void)
 }
 /* Call it _before_ __gnttab_init as we need to initialize the
  * xen_auto_xlat_grant_frames first. */
-core_initcall(xen_pvh_gnttab_setup);
+fs_initcall_sync(xen_pvh_gnttab_setup);
 #endif
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index e1ec725c2819..6382112f3473 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -1680,4 +1680,4 @@ static int __gnttab_init(void)
 }
 /* Starts after core_initcall so that xen_pvh_gnttab_setup can be called
  * beforehand to initialize xen_auto_xlat_grant_frames. */
-core_initcall_sync(__gnttab_init);
+rootfs_initcall(__gnttab_init);
-- 
2.25.1




* [RFC PATCH 3/5] drm/amdgpu: set passthrough mode for xen pvh/hvm
  2023-03-12 12:01 [RFC PATCH 0/5] Add Xen PVH dom0 support for GPU Huang Rui
  2023-03-12 12:01 ` [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh Huang Rui
  2023-03-12 12:01 ` [RFC PATCH 2/5] xen/grants: update initialization order of xen grant table Huang Rui
@ 2023-03-12 12:01 ` Huang Rui
  2023-03-15 12:42   ` Roger Pau Monné
  2023-03-12 12:01 ` [RFC PATCH 4/5] x86/xen: acpi registers gsi for xen pvh Huang Rui
  2023-03-12 12:01 ` [RFC PATCH 5/5] xen/privcmd: add IOCTL_PRIVCMD_GSI_FROM_IRQ Huang Rui
  4 siblings, 1 reply; 27+ messages in thread
From: Huang Rui @ 2023-03-12 12:01 UTC (permalink / raw)
  To: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Boris Ostrovsky, Roger Pau Monné,
	xen-devel, linux-kernel, dri-devel, amd-gfx
  Cc: Alex Deucher, Christian König, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian,
	Huang Rui

There is a second-stage translation between the guest machine address
and the host machine address in Xen PVH/HVM. The PCI BAR addresses seen by
the Xen guest kernel are not translated at the second stage on Xen PVH/HVM,
so they are not the real physical addresses that the hardware wants to see,
and we therefore need to set passthrough mode for Xen PVH/HVM as well.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index f2e2cbaa7fde..7b4369eba19d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -743,7 +743,8 @@ void amdgpu_detect_virtualization(struct amdgpu_device *adev)
 
 	if (!reg) {
 		/* passthrough mode exclus sriov mod */
-		if (is_virtual_machine() && !xen_initial_domain())
+		if (is_virtual_machine() &&
+		    !(xen_initial_domain() && xen_pv_domain()))
 			adev->virt.caps |= AMDGPU_PASSTHROUGH_MODE;
 	}
 
-- 
2.25.1




* [RFC PATCH 4/5] x86/xen: acpi registers gsi for xen pvh
  2023-03-12 12:01 [RFC PATCH 0/5] Add Xen PVH dom0 support for GPU Huang Rui
                   ` (2 preceding siblings ...)
  2023-03-12 12:01 ` [RFC PATCH 3/5] drm/amdgpu: set passthrough mode for xen pvh/hvm Huang Rui
@ 2023-03-12 12:01 ` Huang Rui
  2023-03-15 14:00   ` Roger Pau Monné
  2023-03-12 12:01 ` [RFC PATCH 5/5] xen/privcmd: add IOCTL_PRIVCMD_GSI_FROM_IRQ Huang Rui
  4 siblings, 1 reply; 27+ messages in thread
From: Huang Rui @ 2023-03-12 12:01 UTC (permalink / raw)
  To: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Boris Ostrovsky, Roger Pau Monné,
	xen-devel, linux-kernel, dri-devel, amd-gfx
  Cc: Alex Deucher, Christian König, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian,
	Huang Rui

From: Chen Jiqian <Jiqian.Chen@amd.com>

Add acpi_register_gsi_xen_pvh() to register the GSI for PVH mode.
In addition to calling acpi_register_gsi_ioapic(), it also sets up
a mapping between the GSI and the vector on the hypervisor side, so
that when the dGPU raises an interrupt, the hypervisor can correctly
find which guest domain should process the interrupt by its vector.

Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 arch/x86/include/asm/apic.h      |  7 ++++++
 arch/x86/include/asm/xen/pci.h   |  5 ++++
 arch/x86/kernel/acpi/boot.c      |  2 +-
 arch/x86/pci/xen.c               | 39 ++++++++++++++++++++++++++++++++
 drivers/xen/events/events_base.c |  2 ++
 5 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 3415321c8240..f3bc5de1f1d4 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -179,6 +179,8 @@ extern bool apic_needs_pit(void);
 
 extern void apic_send_IPI_allbutself(unsigned int vector);
 
+extern int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
+				    int trigger, int polarity);
 #else /* !CONFIG_X86_LOCAL_APIC */
 static inline void lapic_shutdown(void) { }
 #define local_apic_timer_c2_ok		1
@@ -193,6 +195,11 @@ static inline void apic_intr_mode_init(void) { }
 static inline void lapic_assign_system_vectors(void) { }
 static inline void lapic_assign_legacy_vector(unsigned int i, bool r) { }
 static inline bool apic_needs_pit(void) { return true; }
+static inline int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
+				    int trigger, int polarity)
+{
+	return (int)gsi;
+}
 #endif /* !CONFIG_X86_LOCAL_APIC */
 
 #ifdef CONFIG_X86_X2APIC
diff --git a/arch/x86/include/asm/xen/pci.h b/arch/x86/include/asm/xen/pci.h
index 9015b888edd6..aa8ded61fc2d 100644
--- a/arch/x86/include/asm/xen/pci.h
+++ b/arch/x86/include/asm/xen/pci.h
@@ -5,6 +5,7 @@
 #if defined(CONFIG_PCI_XEN)
 extern int __init pci_xen_init(void);
 extern int __init pci_xen_hvm_init(void);
+extern int __init pci_xen_pvh_init(void);
 #define pci_xen 1
 #else
 #define pci_xen 0
@@ -13,6 +14,10 @@ static inline int pci_xen_hvm_init(void)
 {
 	return -1;
 }
+static inline int pci_xen_pvh_init(void)
+{
+	return -1;
+}
 #endif
 #ifdef CONFIG_XEN_PV_DOM0
 int __init pci_xen_initial_domain(void);
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 907cc98b1938..25ec48dd897e 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -718,7 +718,7 @@ static int acpi_register_gsi_pic(struct device *dev, u32 gsi,
 }
 
 #ifdef CONFIG_X86_LOCAL_APIC
-static int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
+int acpi_register_gsi_ioapic(struct device *dev, u32 gsi,
 				    int trigger, int polarity)
 {
 	int irq = gsi;
diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index b94f727251b6..43b8b6d7147b 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -114,6 +114,38 @@ static int acpi_register_gsi_xen_hvm(struct device *dev, u32 gsi,
 				 false /* no mapping of GSI to PIRQ */);
 }
 
+static int acpi_register_gsi_xen_pvh(struct device *dev, u32 gsi,
+				    int trigger, int polarity)
+{
+	int irq;
+	int rc;
+	struct physdev_map_pirq map_irq;
+	struct physdev_setup_gsi setup_gsi;
+
+	irq = acpi_register_gsi_ioapic(dev, gsi, trigger, polarity);
+
+	map_irq.domid = DOMID_SELF;
+	map_irq.type = MAP_PIRQ_TYPE_GSI;
+	map_irq.index = gsi;
+	map_irq.pirq = gsi;
+
+	rc = HYPERVISOR_physdev_op(PHYSDEVOP_map_pirq, &map_irq);
+	if (rc)
+		printk(KERN_ERR "xen map GSI: %u failed %d\n", gsi, rc);
+
+	setup_gsi.gsi = gsi;
+	setup_gsi.triggering = (trigger == ACPI_EDGE_SENSITIVE ? 0 : 1);
+	setup_gsi.polarity = (polarity == ACPI_ACTIVE_HIGH ? 0 : 1);
+
+	rc = HYPERVISOR_physdev_op(PHYSDEVOP_setup_gsi, &setup_gsi);
+	if (rc == -EEXIST)
+		printk(KERN_INFO "Already setup the GSI :%u\n", gsi);
+	else if (rc)
+		printk(KERN_ERR "Failed to setup GSI :%u, err_code:%d\n", gsi, rc);
+
+	return irq;
+}
+
 #ifdef CONFIG_XEN_PV_DOM0
 static int xen_register_gsi(u32 gsi, int triggering, int polarity)
 {
@@ -554,6 +586,13 @@ int __init pci_xen_hvm_init(void)
 	return 0;
 }
 
+int __init pci_xen_pvh_init(void)
+{
+	__acpi_register_gsi = acpi_register_gsi_xen_pvh;
+	__acpi_unregister_gsi = NULL;
+	return 0;
+}
+
 #ifdef CONFIG_XEN_PV_DOM0
 int __init pci_xen_initial_domain(void)
 {
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index c443f04aaad7..48dff0ed9acd 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -2317,6 +2317,8 @@ void __init xen_init_IRQ(void)
 	xen_init_setup_upcall_vector();
 	xen_alloc_callback_vector();
 
+	if (xen_pvh_domain())
+		pci_xen_pvh_init();
 
 	if (xen_hvm_domain()) {
 		native_init_IRQ();
-- 
2.25.1




* [RFC PATCH 5/5] xen/privcmd: add IOCTL_PRIVCMD_GSI_FROM_IRQ
  2023-03-12 12:01 [RFC PATCH 0/5] Add Xen PVH dom0 support for GPU Huang Rui
                   ` (3 preceding siblings ...)
  2023-03-12 12:01 ` [RFC PATCH 4/5] x86/xen: acpi registers gsi for xen pvh Huang Rui
@ 2023-03-12 12:01 ` Huang Rui
  2023-03-15 14:26   ` Roger Pau Monné
  4 siblings, 1 reply; 27+ messages in thread
From: Huang Rui @ 2023-03-12 12:01 UTC (permalink / raw)
  To: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Boris Ostrovsky, Roger Pau Monné,
	xen-devel, linux-kernel, dri-devel, amd-gfx
  Cc: Alex Deucher, Christian König, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian,
	Huang Rui

From: Chen Jiqian <Jiqian.Chen@amd.com>

When the hypervisor gets an interrupt, it needs the interrupt's
GSI number instead of the IRQ number. The GSI number is unique
in Xen, but the IRQ number is only unique within one domain.
So we need to record the relationship between IRQ and GSI when
dom0 initializes the PCI devices, and provide the ioctl
IOCTL_PRIVCMD_GSI_FROM_IRQ to translate an IRQ into a GSI, so
that the PIRQ can be mapped successfully on the hypervisor side.

Signed-off-by: Chen Jiqian <Jiqian.Chen@amd.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 arch/x86/pci/xen.c               |  4 ++++
 drivers/xen/events/events_base.c | 37 ++++++++++++++++++++++++++++++++
 drivers/xen/privcmd.c            | 20 +++++++++++++++++
 include/uapi/xen/privcmd.h       |  7 ++++++
 include/xen/events.h             |  5 +++++
 5 files changed, 73 insertions(+)

diff --git a/arch/x86/pci/xen.c b/arch/x86/pci/xen.c
index 43b8b6d7147b..3237961c7640 100644
--- a/arch/x86/pci/xen.c
+++ b/arch/x86/pci/xen.c
@@ -143,6 +143,10 @@ static int acpi_register_gsi_xen_pvh(struct device *dev, u32 gsi,
 	else if (rc)
 		printk(KERN_ERR "Failed to setup GSI :%u, err_code:%d\n", gsi, rc);
 
+	rc = xen_pvh_add_gsi_irq_map(gsi, irq);
+	if (rc == -EEXIST)
+		printk(KERN_INFO "Already map the GSI :%u and IRQ: %d\n", gsi, irq);
+
 	return irq;
 }
 
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 48dff0ed9acd..39a57fed2de3 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -967,6 +967,43 @@ int xen_irq_from_gsi(unsigned gsi)
 }
 EXPORT_SYMBOL_GPL(xen_irq_from_gsi);
 
+int xen_gsi_from_irq(unsigned irq)
+{
+	struct irq_info *info;
+
+	list_for_each_entry(info, &xen_irq_list_head, list) {
+		if (info->type != IRQT_PIRQ)
+			continue;
+
+		if (info->irq == irq)
+			return info->u.pirq.gsi;
+	}
+
+	return -1;
+}
+EXPORT_SYMBOL_GPL(xen_gsi_from_irq);
+
+int xen_pvh_add_gsi_irq_map(unsigned gsi, unsigned irq)
+{
+	int tmp_irq;
+	struct irq_info *info;
+
+	tmp_irq = xen_irq_from_gsi(gsi);
+	if (tmp_irq != -1)
+		return -EEXIST;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (info == NULL)
+		panic("Unable to allocate metadata for GSI%d\n", gsi);
+
+	info->type = IRQT_PIRQ;
+	info->irq = irq;
+	info->u.pirq.gsi = gsi;
+	list_add_tail(&info->list, &xen_irq_list_head);
+
+	return 0;
+}
+
 static void __unbind_from_irq(unsigned int irq)
 {
 	evtchn_port_t evtchn = evtchn_from_irq(irq);
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index e88e8f6f0a33..830e84451731 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -37,6 +37,7 @@
 #include <xen/page.h>
 #include <xen/xen-ops.h>
 #include <xen/balloon.h>
+#include <xen/events.h>
 
 #include "privcmd.h"
 
@@ -833,6 +834,21 @@ static long privcmd_ioctl_mmap_resource(struct file *file,
 	return rc;
 }
 
+static long privcmd_ioctl_gsi_from_irq(struct file *file, void __user *udata)
+{
+	struct privcmd_gsi_from_irq kdata;
+
+	if (copy_from_user(&kdata, udata, sizeof(kdata)))
+		return -EFAULT;
+
+	kdata.gsi = xen_gsi_from_irq(kdata.irq);
+
+	if (copy_to_user(udata, &kdata, sizeof(kdata)))
+		return -EFAULT;
+
+	return 0;
+}
+
 static long privcmd_ioctl(struct file *file,
 			  unsigned int cmd, unsigned long data)
 {
@@ -868,6 +884,10 @@ static long privcmd_ioctl(struct file *file,
 		ret = privcmd_ioctl_mmap_resource(file, udata);
 		break;
 
+	case IOCTL_PRIVCMD_GSI_FROM_IRQ:
+		ret = privcmd_ioctl_gsi_from_irq(file, udata);
+		break;
+
 	default:
 		break;
 	}
diff --git a/include/uapi/xen/privcmd.h b/include/uapi/xen/privcmd.h
index d2029556083e..55fe748bbfd7 100644
--- a/include/uapi/xen/privcmd.h
+++ b/include/uapi/xen/privcmd.h
@@ -98,6 +98,11 @@ struct privcmd_mmap_resource {
 	__u64 addr;
 };
 
+struct privcmd_gsi_from_irq {
+	__u32 irq;
+	__u32 gsi;
+};
+
 /*
  * @cmd: IOCTL_PRIVCMD_HYPERCALL
  * @arg: &privcmd_hypercall_t
@@ -125,5 +130,7 @@ struct privcmd_mmap_resource {
 	_IOC(_IOC_NONE, 'P', 6, sizeof(domid_t))
 #define IOCTL_PRIVCMD_MMAP_RESOURCE				\
 	_IOC(_IOC_NONE, 'P', 7, sizeof(struct privcmd_mmap_resource))
+#define IOCTL_PRIVCMD_GSI_FROM_IRQ				\
+	_IOC(_IOC_NONE, 'P', 8, sizeof(struct privcmd_gsi_from_irq))
 
 #endif /* __LINUX_PUBLIC_PRIVCMD_H__ */
diff --git a/include/xen/events.h b/include/xen/events.h
index 344081e71584..8377d8dfaa71 100644
--- a/include/xen/events.h
+++ b/include/xen/events.h
@@ -133,6 +133,11 @@ int xen_pirq_from_irq(unsigned irq);
 /* Return the irq allocated to the gsi */
 int xen_irq_from_gsi(unsigned gsi);
 
+/* Return the gsi from irq */
+int xen_gsi_from_irq(unsigned irq);
+
+int xen_pvh_add_gsi_irq_map(unsigned gsi, unsigned irq);
+
 /* Determine whether to ignore this IRQ if it is passed to a guest. */
 int xen_test_irq_shared(int irq);
 
-- 
2.25.1




* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-12 12:01 ` [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh Huang Rui
@ 2023-03-13  8:56   ` Jan Beulich
  2023-03-15  0:52     ` Stefano Stabellini
  2023-03-16 16:28   ` Roger Pau Monné
  1 sibling, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2023-03-13  8:56 UTC (permalink / raw)
  To: Huang Rui
  Cc: Alex Deucher, Christian König, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian,
	Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Boris Ostrovsky, Roger Pau Monné,
	xen-devel, linux-kernel, dri-devel, amd-gfx

On 12.03.2023 13:01, Huang Rui wrote:
> Xen PVH is the paravirtualized mode and takes advantage of hardware
> virtualization support when possible. It will using the hardware IOMMU
> support instead of xen-swiotlb, so disable swiotlb if current domain is
> Xen PVH.

But the kernel has no way (yet) to drive the IOMMU, so how can it get
away without resorting to swiotlb in certain cases (like I/O to an
address-restricted device)?

Jan



* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-13  8:56   ` Jan Beulich
@ 2023-03-15  0:52     ` Stefano Stabellini
  2023-03-15  4:14       ` Huang Rui
  2023-03-15  6:49       ` Jan Beulich
  0 siblings, 2 replies; 27+ messages in thread
From: Stefano Stabellini @ 2023-03-15  0:52 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Huang Rui, Alex Deucher, Christian König,
	Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang,
	Chen Jiqian, Juergen Gross, Stefano Stabellini,
	Oleksandr Tyshchenko, Boris Ostrovsky, Roger Pau Monné,
	xen-devel, linux-kernel, dri-devel, amd-gfx

On Mon, 13 Mar 2023, Jan Beulich wrote:
> On 12.03.2023 13:01, Huang Rui wrote:
> > Xen PVH is the paravirtualized mode and takes advantage of hardware
> > virtualization support when possible. It will using the hardware IOMMU
> > support instead of xen-swiotlb, so disable swiotlb if current domain is
> > Xen PVH.
> 
> But the kernel has no way (yet) to drive the IOMMU, so how can it get
> away without resorting to swiotlb in certain cases (like I/O to an
> address-restricted device)?

I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
so we can use guest physical addresses instead of machine addresses for
DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
(see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
case is XENFEAT_not_direct_mapped).
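
For reference, the detection done on the ARM side is roughly the following
(simplified from include/xen/arm/swiotlb-xen.h, so treat it as a sketch):

static inline int xen_swiotlb_detect(void)
{
	if (!xen_domain())
		return 0;
	if (xen_feature(XENFEAT_direct_mapped))
		return 1;
	/* Legacy case: no direct-map feature flag exposed by Xen. */
	if (!xen_feature(XENFEAT_not_direct_mapped) && xen_initial_domain())
		return 1;
	return 0;
}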

Juergen, what do you think? Would you rather make xen_swiotlb_detect
common between ARM and x86?



* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-15  0:52     ` Stefano Stabellini
@ 2023-03-15  4:14       ` Huang Rui
  2023-03-15  6:52         ` Jan Beulich
  2023-03-15  6:49       ` Jan Beulich
  1 sibling, 1 reply; 27+ messages in thread
From: Huang Rui @ 2023-03-15  4:14 UTC (permalink / raw)
  To: Jan Beulich, Stefano Stabellini, Koenig, Christian
  Cc: Juergen Gross, Huang, Honglei1, amd-gfx, dri-devel, linux-kernel,
	Hildebrand, Stewart, Oleksandr Tyshchenko, Chen, Jiqian,
	Xenia Ragiadakou, Deucher, Alexander, xen-devel, Boris Ostrovsky,
	Zhang, Julia, Roger Pau Monné

On Wed, Mar 15, 2023 at 08:52:30AM +0800, Stefano Stabellini wrote:
> On Mon, 13 Mar 2023, Jan Beulich wrote:
> > On 12.03.2023 13:01, Huang Rui wrote:
> > > Xen PVH is the paravirtualized mode and takes advantage of hardware
> > > virtualization support when possible. It will using the hardware IOMMU
> > > support instead of xen-swiotlb, so disable swiotlb if current domain is
> > > Xen PVH.
> > 
> > But the kernel has no way (yet) to drive the IOMMU, so how can it get
> > away without resorting to swiotlb in certain cases (like I/O to an
> > address-restricted device)?
> 
> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
> need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
> so we can use guest physical addresses instead of machine addresses for
> DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
> case is XENFEAT_not_direct_mapped).

Hi Jan, sorry for the late reply. We are using the native kernel amdgpu and
ttm drivers on Dom0. amdgpu/ttm would like to use the IOMMU to allocate
coherent buffers for userptr, which maps user space memory for GPU access;
however, swiotlb doesn't support this. In other words, with swiotlb we can
only handle the buffer page by page.

Thanks,
Ray

> 
> Jurgen, what do you think? Would you rather make xen_swiotlb_detect
> common between ARM and x86?



* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-15  0:52     ` Stefano Stabellini
  2023-03-15  4:14       ` Huang Rui
@ 2023-03-15  6:49       ` Jan Beulich
  2023-03-15 23:25         ` Stefano Stabellini
  1 sibling, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2023-03-15  6:49 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Huang Rui, Alex Deucher, Christian König,
	Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang,
	Chen Jiqian, Juergen Gross, Oleksandr Tyshchenko,
	Boris Ostrovsky, Roger Pau Monné,
	xen-devel, linux-kernel, dri-devel, amd-gfx

On 15.03.2023 01:52, Stefano Stabellini wrote:
> On Mon, 13 Mar 2023, Jan Beulich wrote:
>> On 12.03.2023 13:01, Huang Rui wrote:
>>> Xen PVH is the paravirtualized mode and takes advantage of hardware
>>> virtualization support when possible. It will using the hardware IOMMU
>>> support instead of xen-swiotlb, so disable swiotlb if current domain is
>>> Xen PVH.
>>
>> But the kernel has no way (yet) to drive the IOMMU, so how can it get
>> away without resorting to swiotlb in certain cases (like I/O to an
>> address-restricted device)?
> 
> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
> need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
> so we can use guest physical addresses instead of machine addresses for
> DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
> case is XENFEAT_not_direct_mapped).

But how does Xen using an IOMMU help with, as said, address-restricted
devices? They may still need e.g. a 32-bit address to be programmed in,
and if the kernel has memory beyond the 4G boundary not all I/O buffers
may fulfill this requirement.

Jan



* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-15  4:14       ` Huang Rui
@ 2023-03-15  6:52         ` Jan Beulich
  0 siblings, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2023-03-15  6:52 UTC (permalink / raw)
  To: Huang Rui
  Cc: Juergen Gross, Huang, Honglei1, amd-gfx, dri-devel, linux-kernel,
	Hildebrand, Stewart, Oleksandr Tyshchenko, Chen, Jiqian,
	Xenia Ragiadakou, Deucher, Alexander, xen-devel, Boris Ostrovsky,
	Zhang, Julia, Roger Pau Monné,
	Stefano Stabellini, Koenig, Christian

On 15.03.2023 05:14, Huang Rui wrote:
> On Wed, Mar 15, 2023 at 08:52:30AM +0800, Stefano Stabellini wrote:
>> On Mon, 13 Mar 2023, Jan Beulich wrote:
>>> On 12.03.2023 13:01, Huang Rui wrote:
>>>> Xen PVH is the paravirtualized mode and takes advantage of hardware
>>>> virtualization support when possible. It will using the hardware IOMMU
>>>> support instead of xen-swiotlb, so disable swiotlb if current domain is
>>>> Xen PVH.
>>>
>>> But the kernel has no way (yet) to drive the IOMMU, so how can it get
>>> away without resorting to swiotlb in certain cases (like I/O to an
>>> address-restricted device)?
>>
>> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
>> need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
>> so we can use guest physical addresses instead of machine addresses for
>> DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
>> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
>> case is XENFEAT_not_direct_mapped).
> 
> Hi Jan, sorry to late reply. We are using the native kernel amdgpu and ttm
> driver on Dom0, amdgpu/ttm would like to use IOMMU to allocate coherent
> buffers for userptr that map the user space memory to gpu access, however,
> swiotlb doesn't support this. In other words, with swiotlb, we only can
> handle the buffer page by page.

But how does outright disabling swiotlb help with this? There still wouldn't
be an IOMMU that your kernel has control over. Looks like you want something
like pvIOMMU, but that work was never completed. And even then the swiotlb
may continue to be needed for other purposes.

Jan



* Re: [RFC PATCH 2/5] xen/grants: update initialization order of xen grant table
  2023-03-12 12:01 ` [RFC PATCH 2/5] xen/grants: update initialization order of xen grant table Huang Rui
@ 2023-03-15 12:31   ` Roger Pau Monné
  0 siblings, 0 replies; 27+ messages in thread
From: Roger Pau Monné @ 2023-03-15 12:31 UTC (permalink / raw)
  To: Huang Rui
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Boris Ostrovsky, xen-devel, linux-kernel, dri-devel, amd-gfx,
	Alex Deucher, Christian König, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian

On Sun, Mar 12, 2023 at 08:01:54PM +0800, Huang Rui wrote:
> The xen grant table will be initialied before parsing the PCI resources,
> so xen_alloc_unpopulated_pages() ends up using a range from the PCI
> window because Linux hasn't parsed the PCI information yet.
> 
> So modify the initialization order to make sure the real PCI resources
> are parsed before.

Has this been tested on a domU to make sure the late grant table init
doesn't interfere with PV devices getting set up?

> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  arch/x86/xen/grant-table.c | 2 +-
>  drivers/xen/grant-table.c  | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/xen/grant-table.c b/arch/x86/xen/grant-table.c
> index 1e681bf62561..64a04d1e70f5 100644
> --- a/arch/x86/xen/grant-table.c
> +++ b/arch/x86/xen/grant-table.c
> @@ -165,5 +165,5 @@ static int __init xen_pvh_gnttab_setup(void)
>  }
>  /* Call it _before_ __gnttab_init as we need to initialize the
>   * xen_auto_xlat_grant_frames first. */
> -core_initcall(xen_pvh_gnttab_setup);
> +fs_initcall_sync(xen_pvh_gnttab_setup);
>  #endif
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index e1ec725c2819..6382112f3473 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -1680,4 +1680,4 @@ static int __gnttab_init(void)
>  }
>  /* Starts after core_initcall so that xen_pvh_gnttab_setup can be called
>   * beforehand to initialize xen_auto_xlat_grant_frames. */

The comment needs to be updated, but I was wondering whether it wouldn't be
best to simply call xen_pvh_gnttab_setup() from __gnttab_init() itself
when running as a PVH guest?
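
Something along these lines, as a rough sketch only (it assumes
xen_pvh_gnttab_setup() is made callable from the common grant-table code,
e.g. by making it non-static and declaring it in a shared header):

static int __gnttab_init(void)
{
	int ret;

	if (!xen_domain())
		return -ENODEV;

	/* Do the PVH auto-translated frame setup here instead of relying
	 * on initcall ordering. */
	if (xen_pvh_domain()) {
		ret = xen_pvh_gnttab_setup();
		if (ret)
			return ret;
	}

	return gnttab_init();
}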

Thanks, Roger.



* Re: [RFC PATCH 3/5] drm/amdgpu: set passthrough mode for xen pvh/hvm
  2023-03-12 12:01 ` [RFC PATCH 3/5] drm/amdgpu: set passthrough mode for xen pvh/hvm Huang Rui
@ 2023-03-15 12:42   ` Roger Pau Monné
  0 siblings, 0 replies; 27+ messages in thread
From: Roger Pau Monné @ 2023-03-15 12:42 UTC (permalink / raw)
  To: Huang Rui
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Boris Ostrovsky, xen-devel, linux-kernel, dri-devel, amd-gfx,
	Alex Deucher, Christian König, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian

On Sun, Mar 12, 2023 at 08:01:55PM +0800, Huang Rui wrote:
> There is an second stage translation between the guest machine address
> and host machine address in Xen PVH/HVM. The PCI bar address in the xen
> guest kernel are not translated at the second stage on Xen PVH/HVM, so

I'm confused by the sentence above, do you think it could be reworded
or expanded to clarify?

PCI BAR addresses are not in the guest kernel, but rather in the
physical memory layout made available to the guest.

Also, I'm unsure why xen_initial_domain() needs to be used in the
conditional below: all PV domains handle addresses the same, so if
it's not needed for a PV dom0 it's likely not needed for a PV domU
either.  Albeit it would help to know more about what
AMDGPU_PASSTHROUGH_MODE implies.

Thanks, Roger.



* Re: [RFC PATCH 4/5] x86/xen: acpi registers gsi for xen pvh
  2023-03-12 12:01 ` [RFC PATCH 4/5] x86/xen: acpi registers gsi for xen pvh Huang Rui
@ 2023-03-15 14:00   ` Roger Pau Monné
  0 siblings, 0 replies; 27+ messages in thread
From: Roger Pau Monné @ 2023-03-15 14:00 UTC (permalink / raw)
  To: Huang Rui
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Boris Ostrovsky, xen-devel, linux-kernel, dri-devel, amd-gfx,
	Alex Deucher, Christian König, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian

On Sun, Mar 12, 2023 at 08:01:56PM +0800, Huang Rui wrote:
> From: Chen Jiqian <Jiqian.Chen@amd.com>
> 
> Add acpi_register_gsi_xen_pvh() to register gsi for PVH mode.
> In addition to call acpi_register_gsi_ioapic(), it also setup
> a map between gsi and vector in hypervisor side. So that,
> when dgpu create an interrupt, hypervisor can correctly find
> which guest domain to process interrupt by vector.

The term 'dgpu' needs clarifying or replacing with a more generic
name.

Also, I would like to be able to get away from requiring dom0 to
register the GSIs in this way.  If you take a look at Xen, there's
code in the emulated IO-APIC available to dom0 that already does this
registering (see vioapic_hwdom_map_gsi() in Xen).

I think the problem here is that the GSI used by the device you want
to pass through has never had its pin unmasked in the IO-APIC, and
hence hasn't been registered.

It would be helpful if you could state in the commit message what
issue you are trying to solve by doing this registering here. I assume
it is done in order to map the IRQ to a PIRQ, so that later calls by the
toolstack to bind it succeed.

Would it be possible instead to perform the call to PHYSDEVOP_map_pirq
in the toolstack itself if the PIRQ cannot be found?
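
As a rough illustration of what I mean (toolstack side, untested;
xc_physdev_map_pirq() is the existing libxc wrapper around
PHYSDEVOP_map_pirq, while xch, domid and gsi are assumed to already be in
scope):

	int pirq = gsi;	/* request an identity GSI -> PIRQ mapping */
	int rc = xc_physdev_map_pirq(xch, domid, gsi, &pirq);

	if (rc)
		fprintf(stderr, "failed to map GSI %u to a PIRQ: %d\n",
			gsi, rc);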

Thanks, Roger.



* Re: [RFC PATCH 5/5] xen/privcmd: add IOCTL_PRIVCMD_GSI_FROM_IRQ
  2023-03-12 12:01 ` [RFC PATCH 5/5] xen/privcmd: add IOCTL_PRIVCMD_GSI_FROM_IRQ Huang Rui
@ 2023-03-15 14:26   ` Roger Pau Monné
  0 siblings, 0 replies; 27+ messages in thread
From: Roger Pau Monné @ 2023-03-15 14:26 UTC (permalink / raw)
  To: Huang Rui
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Boris Ostrovsky, xen-devel, linux-kernel, dri-devel, amd-gfx,
	Alex Deucher, Christian König, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian

On Sun, Mar 12, 2023 at 08:01:57PM +0800, Huang Rui wrote:
> From: Chen Jiqian <Jiqian.Chen@amd.com>
> 
> When hypervisor get an interrupt, it needs interrupt's
> gsi number instead of irq number. Gsi number is unique
> in xen, but irq number is only unique in one domain.
> So, we need to record the relationship between irq and
> gsi when dom0 initialized pci devices, and provide syscall
> IOCTL_PRIVCMD_GSI_FROM_IRQ to translate irq to gsi. So
> that, we can map pirq successfully in hypervisor side.

GSI is not only unique in Xen, it's an ACPI-provided value that's
unique in the platform. The text above makes it look like the GSI is some
kind of internal Xen reference to an interrupt, but it's not.

How does a PV domain deal with this? I would assume there Linux will
also end up with IRQ != GSI, and hence will need some kind of
translation?

Thanks, Roger.



* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-15  6:49       ` Jan Beulich
@ 2023-03-15 23:25         ` Stefano Stabellini
  2023-03-16  7:50           ` Jan Beulich
  0 siblings, 1 reply; 27+ messages in thread
From: Stefano Stabellini @ 2023-03-15 23:25 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Huang Rui, Alex Deucher,
	Christian König, Stewart Hildebrand, Xenia Ragiadakou,
	Honglei Huang, Julia Zhang, Chen Jiqian, Juergen Gross,
	Oleksandr Tyshchenko, Boris Ostrovsky, Roger Pau Monné,
	xen-devel, linux-kernel, dri-devel, amd-gfx

On Wed, 15 Mar 2023, Jan Beulich wrote:
> On 15.03.2023 01:52, Stefano Stabellini wrote:
> > On Mon, 13 Mar 2023, Jan Beulich wrote:
> >> On 12.03.2023 13:01, Huang Rui wrote:
> >>> Xen PVH is the paravirtualized mode and takes advantage of hardware
> >>> virtualization support when possible. It will using the hardware IOMMU
> >>> support instead of xen-swiotlb, so disable swiotlb if current domain is
> >>> Xen PVH.
> >>
> >> But the kernel has no way (yet) to drive the IOMMU, so how can it get
> >> away without resorting to swiotlb in certain cases (like I/O to an
> >> address-restricted device)?
> > 
> > I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
> > need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
> > so we can use guest physical addresses instead of machine addresses for
> > DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
> > (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
> > case is XENFEAT_not_direct_mapped).
> 
> But how does Xen using an IOMMU help with, as said, address-restricted
> devices? They may still need e.g. a 32-bit address to be programmed in,
> and if the kernel has memory beyond the 4G boundary not all I/O buffers
> may fulfill this requirement.

In short, it is going to work as long as Linux has guest physical
addresses (not machine addresses, those could be anything) lower than
4GB.

If the address-restricted device does DMA via an IOMMU, then the device
gets programmed by Linux using its guest physical addresses (not machine
addresses).

The 32-bit restriction would be applied by Linux to its choice of guest
physical address to use to program the device, the same way it does on
native. The device would be fine as it always uses Linux-provided <4GB
addresses. After the IOMMU translation (pagetable setup by Xen), we
could get any address, including >4GB addresses, and that is expected to
work.
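
As a sketch of that flow from the driver's point of view (generic DMA API
calls only; pdev, size, buf and dma_handle are just placeholders here):

	/* The driver declares its 32-bit limitation exactly as on native. */
	dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));

	/* Linux then picks a guest physical address below 4GB ... */
	buf = dma_alloc_coherent(&pdev->dev, size, &dma_handle, GFP_KERNEL);

	/* ... and Xen's IOMMU pagetables map that gfn to whatever machine
	 * address actually backs it, which may be above 4GB. */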



* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-15 23:25         ` Stefano Stabellini
@ 2023-03-16  7:50           ` Jan Beulich
  2023-03-16 13:45             ` Alex Deucher
  0 siblings, 1 reply; 27+ messages in thread
From: Jan Beulich @ 2023-03-16  7:50 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Huang Rui, Alex Deucher, Christian König,
	Stewart Hildebrand, Xenia Ragiadakou, Honglei Huang, Julia Zhang,
	Chen Jiqian, Juergen Gross, Oleksandr Tyshchenko,
	Boris Ostrovsky, Roger Pau Monné,
	xen-devel, linux-kernel, dri-devel, amd-gfx

On 16.03.2023 00:25, Stefano Stabellini wrote:
> On Wed, 15 Mar 2023, Jan Beulich wrote:
>> On 15.03.2023 01:52, Stefano Stabellini wrote:
>>> On Mon, 13 Mar 2023, Jan Beulich wrote:
>>>> On 12.03.2023 13:01, Huang Rui wrote:
>>>>> Xen PVH is the paravirtualized mode and takes advantage of hardware
>>>>> virtualization support when possible. It will using the hardware IOMMU
>>>>> support instead of xen-swiotlb, so disable swiotlb if current domain is
>>>>> Xen PVH.
>>>>
>>>> But the kernel has no way (yet) to drive the IOMMU, so how can it get
>>>> away without resorting to swiotlb in certain cases (like I/O to an
>>>> address-restricted device)?
>>>
>>> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
>>> need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
>>> so we can use guest physical addresses instead of machine addresses for
>>> DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
>>> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
>>> case is XENFEAT_not_direct_mapped).
>>
>> But how does Xen using an IOMMU help with, as said, address-restricted
>> devices? They may still need e.g. a 32-bit address to be programmed in,
>> and if the kernel has memory beyond the 4G boundary not all I/O buffers
>> may fulfill this requirement.
> 
> In short, it is going to work as long as Linux has guest physical
> addresses (not machine addresses, those could be anything) lower than
> 4GB.
> 
> If the address-restricted device does DMA via an IOMMU, then the device
> gets programmed by Linux using its guest physical addresses (not machine
> addresses).
> 
> The 32-bit restriction would be applied by Linux to its choice of guest
> physical address to use to program the device, the same way it does on
> native. The device would be fine as it always uses Linux-provided <4GB
> addresses. After the IOMMU translation (pagetable setup by Xen), we
> could get any address, including >4GB addresses, and that is expected to
> work.

I understand that's the "normal" way of working. But whatever the swiotlb
is used for in baremetal Linux, that would similarly require its use in
PVH (or HVM) aiui. So unconditionally disabling it in PVH would look to
me like an incomplete attempt to disable its use altogether on x86. What
difference of PVH vs baremetal am I missing here?

Jan



* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-16  7:50           ` Jan Beulich
@ 2023-03-16 13:45             ` Alex Deucher
  2023-03-16 13:48               ` Juergen Gross
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Deucher @ 2023-03-16 13:45 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Stefano Stabellini, Juergen Gross, Honglei Huang, amd-gfx,
	dri-devel, linux-kernel, Stewart Hildebrand,
	Oleksandr Tyshchenko, Huang Rui, Chen Jiqian, Xenia Ragiadakou,
	Alex Deucher, xen-devel, Boris Ostrovsky, Julia Zhang,
	Christian König, Roger Pau Monné

On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 16.03.2023 00:25, Stefano Stabellini wrote:
> > On Wed, 15 Mar 2023, Jan Beulich wrote:
> >> On 15.03.2023 01:52, Stefano Stabellini wrote:
> >>> On Mon, 13 Mar 2023, Jan Beulich wrote:
> >>>> On 12.03.2023 13:01, Huang Rui wrote:
> >>>>> Xen PVH is the paravirtualized mode and takes advantage of hardware
> >>>>> virtualization support when possible. It will using the hardware IOMMU
> >>>>> support instead of xen-swiotlb, so disable swiotlb if current domain is
> >>>>> Xen PVH.
> >>>>
> >>>> But the kernel has no way (yet) to drive the IOMMU, so how can it get
> >>>> away without resorting to swiotlb in certain cases (like I/O to an
> >>>> address-restricted device)?
> >>>
> >>> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
> >>> need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
> >>> so we can use guest physical addresses instead of machine addresses for
> >>> DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
> >>> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
> >>> case is XENFEAT_not_direct_mapped).
> >>
> >> But how does Xen using an IOMMU help with, as said, address-restricted
> >> devices? They may still need e.g. a 32-bit address to be programmed in,
> >> and if the kernel has memory beyond the 4G boundary not all I/O buffers
> >> may fulfill this requirement.
> >
> > In short, it is going to work as long as Linux has guest physical
> > addresses (not machine addresses, those could be anything) lower than
> > 4GB.
> >
> > If the address-restricted device does DMA via an IOMMU, then the device
> > gets programmed by Linux using its guest physical addresses (not machine
> > addresses).
> >
> > The 32-bit restriction would be applied by Linux to its choice of guest
> > physical address to use to program the device, the same way it does on
> > native. The device would be fine as it always uses Linux-provided <4GB
> > addresses. After the IOMMU translation (pagetable setup by Xen), we
> > could get any address, including >4GB addresses, and that is expected to
> > work.
>
> I understand that's the "normal" way of working. But whatever the swiotlb
> is used for in baremetal Linux, that would similarly require its use in
> PVH (or HVM) aiui. So unconditionally disabling it in PVH would look to
> me like an incomplete attempt to disable its use altogether on x86. What
> difference of PVH vs baremetal am I missing here?

swiotlb is not usable for GPUs even on bare metal.  They often have
hundreds of megs or even gigs of memory mapped on the device at any
given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
the chip family).

Alex



* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-16 13:45             ` Alex Deucher
@ 2023-03-16 13:48               ` Juergen Gross
  2023-03-16 13:53                 ` Alex Deucher
  0 siblings, 1 reply; 27+ messages in thread
From: Juergen Gross @ 2023-03-16 13:48 UTC (permalink / raw)
  To: Alex Deucher, Jan Beulich
  Cc: Stefano Stabellini, Honglei Huang, amd-gfx, dri-devel,
	linux-kernel, Stewart Hildebrand, Oleksandr Tyshchenko,
	Huang Rui, Chen Jiqian, Xenia Ragiadakou, Alex Deucher,
	xen-devel, Boris Ostrovsky, Julia Zhang, Christian König,
	Roger Pau Monné



On 16.03.23 14:45, Alex Deucher wrote:
> On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote:
>>
>> On 16.03.2023 00:25, Stefano Stabellini wrote:
>>> On Wed, 15 Mar 2023, Jan Beulich wrote:
>>>> On 15.03.2023 01:52, Stefano Stabellini wrote:
>>>>> On Mon, 13 Mar 2023, Jan Beulich wrote:
>>>>>> On 12.03.2023 13:01, Huang Rui wrote:
>>>>>>> Xen PVH is the paravirtualized mode and takes advantage of hardware
>>>>>>> virtualization support when possible. It will using the hardware IOMMU
>>>>>>> support instead of xen-swiotlb, so disable swiotlb if current domain is
>>>>>>> Xen PVH.
>>>>>>
>>>>>> But the kernel has no way (yet) to drive the IOMMU, so how can it get
>>>>>> away without resorting to swiotlb in certain cases (like I/O to an
>>>>>> address-restricted device)?
>>>>>
>>>>> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
>>>>> need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
>>>>> so we can use guest physical addresses instead of machine addresses for
>>>>> DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
>>>>> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
>>>>> case is XENFEAT_not_direct_mapped).
>>>>
>>>> But how does Xen using an IOMMU help with, as said, address-restricted
>>>> devices? They may still need e.g. a 32-bit address to be programmed in,
>>>> and if the kernel has memory beyond the 4G boundary not all I/O buffers
>>>> may fulfill this requirement.
>>>
>>> In short, it is going to work as long as Linux has guest physical
>>> addresses (not machine addresses, those could be anything) lower than
>>> 4GB.
>>>
>>> If the address-restricted device does DMA via an IOMMU, then the device
>>> gets programmed by Linux using its guest physical addresses (not machine
>>> addresses).
>>>
>>> The 32-bit restriction would be applied by Linux to its choice of guest
>>> physical address to use to program the device, the same way it does on
>>> native. The device would be fine as it always uses Linux-provided <4GB
>>> addresses. After the IOMMU translation (pagetable setup by Xen), we
>>> could get any address, including >4GB addresses, and that is expected to
>>> work.
>>
>> I understand that's the "normal" way of working. But whatever the swiotlb
>> is used for in baremetal Linux, that would similarly require its use in
>> PVH (or HVM) aiui. So unconditionally disabling it in PVH would look to
>> me like an incomplete attempt to disable its use altogether on x86. What
>> difference of PVH vs baremetal am I missing here?
> 
> swiotlb is not usable for GPUs even on bare metal.  They often have
> hundreds or megs or even gigs of memory mapped on the device at any
> given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
> the chip family).

But the swiotlb isn't per device, but system global.


Juergen




* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-16 13:48               ` Juergen Gross
@ 2023-03-16 13:53                 ` Alex Deucher
  2023-03-16 13:58                   ` Jan Beulich
  2023-03-16 14:20                   ` Juergen Gross
  0 siblings, 2 replies; 27+ messages in thread
From: Alex Deucher @ 2023-03-16 13:53 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Jan Beulich, Stefano Stabellini, Honglei Huang, amd-gfx,
	dri-devel, linux-kernel, Stewart Hildebrand,
	Oleksandr Tyshchenko, Huang Rui, Chen Jiqian, Xenia Ragiadakou,
	Alex Deucher, xen-devel, Boris Ostrovsky, Julia Zhang,
	Christian König, Roger Pau Monné

On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@suse.com> wrote:
>
> On 16.03.23 14:45, Alex Deucher wrote:
> > On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote:
> >>
> >> On 16.03.2023 00:25, Stefano Stabellini wrote:
> >>> On Wed, 15 Mar 2023, Jan Beulich wrote:
> >>>> On 15.03.2023 01:52, Stefano Stabellini wrote:
> >>>>> On Mon, 13 Mar 2023, Jan Beulich wrote:
> >>>>>> On 12.03.2023 13:01, Huang Rui wrote:
> >>>>>>> Xen PVH is the paravirtualized mode and takes advantage of hardware
> >>>>>>> virtualization support when possible. It will using the hardware IOMMU
> >>>>>>> support instead of xen-swiotlb, so disable swiotlb if current domain is
> >>>>>>> Xen PVH.
> >>>>>>
> >>>>>> But the kernel has no way (yet) to drive the IOMMU, so how can it get
> >>>>>> away without resorting to swiotlb in certain cases (like I/O to an
> >>>>>> address-restricted device)?
> >>>>>
> >>>>> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
> >>>>> need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
> >>>>> so we can use guest physical addresses instead of machine addresses for
> >>>>> DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
> >>>>> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
> >>>>> case is XENFEAT_not_direct_mapped).
> >>>>
> >>>> But how does Xen using an IOMMU help with, as said, address-restricted
> >>>> devices? They may still need e.g. a 32-bit address to be programmed in,
> >>>> and if the kernel has memory beyond the 4G boundary not all I/O buffers
> >>>> may fulfill this requirement.
> >>>
> >>> In short, it is going to work as long as Linux has guest physical
> >>> addresses (not machine addresses, those could be anything) lower than
> >>> 4GB.
> >>>
> >>> If the address-restricted device does DMA via an IOMMU, then the device
> >>> gets programmed by Linux using its guest physical addresses (not machine
> >>> addresses).
> >>>
> >>> The 32-bit restriction would be applied by Linux to its choice of guest
> >>> physical address to use to program the device, the same way it does on
> >>> native. The device would be fine as it always uses Linux-provided <4GB
> >>> addresses. After the IOMMU translation (pagetable setup by Xen), we
> >>> could get any address, including >4GB addresses, and that is expected to
> >>> work.
> >>
> >> I understand that's the "normal" way of working. But whatever the swiotlb
> >> is used for in baremetal Linux, that would similarly require its use in
> >> PVH (or HVM) aiui. So unconditionally disabling it in PVH would look to
> >> me like an incomplete attempt to disable its use altogether on x86. What
> >> difference of PVH vs baremetal am I missing here?
> >
> > swiotlb is not usable for GPUs even on bare metal.  They often have
> > hundreds or megs or even gigs of memory mapped on the device at any
> > given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
> > the chip family).
>
> But the swiotlb isn't per device, but system global.

Sure, but if the swiotlb is in use, then you can't really use the GPU.
So you get to pick one.

Alex



* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-16 13:53                 ` Alex Deucher
@ 2023-03-16 13:58                   ` Jan Beulich
  2023-03-16 14:20                   ` Juergen Gross
  1 sibling, 0 replies; 27+ messages in thread
From: Jan Beulich @ 2023-03-16 13:58 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Stefano Stabellini, Honglei Huang, amd-gfx, dri-devel,
	linux-kernel, Stewart Hildebrand, Oleksandr Tyshchenko,
	Huang Rui, Chen Jiqian, Xenia Ragiadakou, Alex Deucher,
	xen-devel, Boris Ostrovsky, Julia Zhang, Christian König,
	Roger Pau Monné,
	Juergen Gross

On 16.03.2023 14:53, Alex Deucher wrote:
> On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@suse.com> wrote:
>>
>> On 16.03.23 14:45, Alex Deucher wrote:
>>> On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 16.03.2023 00:25, Stefano Stabellini wrote:
>>>>> On Wed, 15 Mar 2023, Jan Beulich wrote:
>>>>>> On 15.03.2023 01:52, Stefano Stabellini wrote:
>>>>>>> On Mon, 13 Mar 2023, Jan Beulich wrote:
>>>>>>>> On 12.03.2023 13:01, Huang Rui wrote:
>>>>>>>>> Xen PVH is the paravirtualized mode and takes advantage of hardware
>>>>>>>>> virtualization support when possible. It will using the hardware IOMMU
>>>>>>>>> support instead of xen-swiotlb, so disable swiotlb if current domain is
>>>>>>>>> Xen PVH.
>>>>>>>>
>>>>>>>> But the kernel has no way (yet) to drive the IOMMU, so how can it get
>>>>>>>> away without resorting to swiotlb in certain cases (like I/O to an
>>>>>>>> address-restricted device)?
>>>>>>>
>>>>>>> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
>>>>>>> need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
>>>>>>> so we can use guest physical addresses instead of machine addresses for
>>>>>>> DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
>>>>>>> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
>>>>>>> case is XENFEAT_not_direct_mapped).
>>>>>>
>>>>>> But how does Xen using an IOMMU help with, as said, address-restricted
>>>>>> devices? They may still need e.g. a 32-bit address to be programmed in,
>>>>>> and if the kernel has memory beyond the 4G boundary not all I/O buffers
>>>>>> may fulfill this requirement.
>>>>>
>>>>> In short, it is going to work as long as Linux has guest physical
>>>>> addresses (not machine addresses, those could be anything) lower than
>>>>> 4GB.
>>>>>
>>>>> If the address-restricted device does DMA via an IOMMU, then the device
>>>>> gets programmed by Linux using its guest physical addresses (not machine
>>>>> addresses).
>>>>>
>>>>> The 32-bit restriction would be applied by Linux to its choice of guest
>>>>> physical address to use to program the device, the same way it does on
>>>>> native. The device would be fine as it always uses Linux-provided <4GB
>>>>> addresses. After the IOMMU translation (pagetable setup by Xen), we
>>>>> could get any address, including >4GB addresses, and that is expected to
>>>>> work.
>>>>
>>>> I understand that's the "normal" way of working. But whatever the swiotlb
>>>> is used for in baremetal Linux, that would similarly require its use in
>>>> PVH (or HVM) aiui. So unconditionally disabling it in PVH would look to
>>>> me like an incomplete attempt to disable its use altogether on x86. What
>>>> difference of PVH vs baremetal am I missing here?
>>>
>>> swiotlb is not usable for GPUs even on bare metal.  They often have
>>> hundreds or megs or even gigs of memory mapped on the device at any
>>> given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
>>> the chip family).
>>
>> But the swiotlb isn't per device, but system global.
> 
> Sure, but if the swiotlb is in use, then you can't really use the GPU.
> So you get to pick one.

Yet that "pick one" then can't be an unconditional disable in the source code.
If there's no way to avoid swiotlb on a per-device basis, then users will need
to be told to arrange for this via command line option when they want to use
the GPU is certain ways.
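
(For illustration only -- assuming the generic swiotlb= parameter documented
in Documentation/admin-guide/kernel-parameters.txt, nothing Xen-specific --
such an arrangement could look like booting dom0 with:

	swiotlb=noforce

i.e. the administrator, rather than the kernel source, decides that no bounce
buffers are wanted on that particular system.)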

Jan


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-16 13:53                 ` Alex Deucher
  2023-03-16 13:58                   ` Jan Beulich
@ 2023-03-16 14:20                   ` Juergen Gross
  2023-03-16 23:09                     ` Stefano Stabellini
  1 sibling, 1 reply; 27+ messages in thread
From: Juergen Gross @ 2023-03-16 14:20 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Jan Beulich, Stefano Stabellini, Honglei Huang, amd-gfx,
	dri-devel, linux-kernel, Stewart Hildebrand,
	Oleksandr Tyshchenko, Huang Rui, Chen Jiqian, Xenia Ragiadakou,
	Alex Deucher, xen-devel, Boris Ostrovsky, Julia Zhang,
	Christian König, Roger Pau Monné


[-- Attachment #1.1.1: Type: text/plain, Size: 3747 bytes --]

On 16.03.23 14:53, Alex Deucher wrote:
> On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@suse.com> wrote:
>>
>> On 16.03.23 14:45, Alex Deucher wrote:
>>> On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>
>>>> On 16.03.2023 00:25, Stefano Stabellini wrote:
>>>>> On Wed, 15 Mar 2023, Jan Beulich wrote:
>>>>>> On 15.03.2023 01:52, Stefano Stabellini wrote:
>>>>>>> On Mon, 13 Mar 2023, Jan Beulich wrote:
>>>>>>>> On 12.03.2023 13:01, Huang Rui wrote:
>>>>>>>>> Xen PVH is the paravirtualized mode and takes advantage of hardware
>>>>>>>>> virtualization support when possible. It will using the hardware IOMMU
>>>>>>>>> support instead of xen-swiotlb, so disable swiotlb if current domain is
>>>>>>>>> Xen PVH.
>>>>>>>>
>>>>>>>> But the kernel has no way (yet) to drive the IOMMU, so how can it get
>>>>>>>> away without resorting to swiotlb in certain cases (like I/O to an
>>>>>>>> address-restricted device)?
>>>>>>>
>>>>>>> I think Ray meant that, thanks to the IOMMU setup by Xen, there is no
>>>>>>> need for swiotlb-xen in Dom0. Address translations are done by the IOMMU
>>>>>>> so we can use guest physical addresses instead of machine addresses for
>>>>>>> DMA. This is a similar case to Dom0 on ARM when the IOMMU is available
>>>>>>> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the corresponding
>>>>>>> case is XENFEAT_not_direct_mapped).
>>>>>>
>>>>>> But how does Xen using an IOMMU help with, as said, address-restricted
>>>>>> devices? They may still need e.g. a 32-bit address to be programmed in,
>>>>>> and if the kernel has memory beyond the 4G boundary not all I/O buffers
>>>>>> may fulfill this requirement.
>>>>>
>>>>> In short, it is going to work as long as Linux has guest physical
>>>>> addresses (not machine addresses, those could be anything) lower than
>>>>> 4GB.
>>>>>
>>>>> If the address-restricted device does DMA via an IOMMU, then the device
>>>>> gets programmed by Linux using its guest physical addresses (not machine
>>>>> addresses).
>>>>>
>>>>> The 32-bit restriction would be applied by Linux to its choice of guest
>>>>> physical address to use to program the device, the same way it does on
>>>>> native. The device would be fine as it always uses Linux-provided <4GB
>>>>> addresses. After the IOMMU translation (pagetable setup by Xen), we
>>>>> could get any address, including >4GB addresses, and that is expected to
>>>>> work.
>>>>
>>>> I understand that's the "normal" way of working. But whatever the swiotlb
>>>> is used for in baremetal Linux, that would similarly require its use in
>>>> PVH (or HVM) aiui. So unconditionally disabling it in PVH would look to
>>>> me like an incomplete attempt to disable its use altogether on x86. What
>>>> difference of PVH vs baremetal am I missing here?
>>>
>>> swiotlb is not usable for GPUs even on bare metal.  They often have
>>> hundreds or megs or even gigs of memory mapped on the device at any
>>> given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
>>> the chip family).
>>
>> But the swiotlb isn't per device, but system global.
> 
> Sure, but if the swiotlb is in use, then you can't really use the GPU.
> So you get to pick one.

The swiotlb is used only for buffers which are not within the DMA mask of a
device (see dma_direct_map_page()). So an AMD GPU supporting a 44-bit DMA mask
won't use the swiotlb unless you have a buffer above a guest physical address of
16TB (so basically never).
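
(For readers following along, here is a simplified sketch of that decision.
It is a paraphrase, not the literal kernel code -- the real path also handles
forced bouncing, P2PDMA and cache syncs, and helper names can differ between
kernel versions:

	/* Simplified illustration of the direct-mapping path. */
	static dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
					      unsigned long offset, size_t size,
					      enum dma_data_direction dir,
					      unsigned long attrs)
	{
		phys_addr_t phys = page_to_phys(page) + offset;
		dma_addr_t dma_addr = phys_to_dma(dev, phys);

		/* Buffer reachable within the device's DMA mask: map it directly. */
		if (dma_capable(dev, dma_addr, size, true))
			return dma_addr;

		/* Otherwise bounce through the swiotlb, if one was set up. */
		if (is_swiotlb_active(dev))
			return swiotlb_map(dev, phys, size, dir, attrs);

		return DMA_MAPPING_ERROR;
	}

So the bounce buffer only ever comes into play for buffers the device's DMA
mask cannot reach.)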

Disabling the swiotlb in such a guest would OTOH mean that a device with only a
32-bit DMA mask passed through to this guest couldn't work with buffers
above 4GB.

I don't think this is acceptable.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3149 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-12 12:01 ` [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh Huang Rui
  2023-03-13  8:56   ` Jan Beulich
@ 2023-03-16 16:28   ` Roger Pau Monné
  1 sibling, 0 replies; 27+ messages in thread
From: Roger Pau Monné @ 2023-03-16 16:28 UTC (permalink / raw)
  To: Huang Rui
  Cc: Juergen Gross, Stefano Stabellini, Oleksandr Tyshchenko,
	Boris Ostrovsky, xen-devel, linux-kernel, dri-devel, amd-gfx,
	Alex Deucher, Christian König, Stewart Hildebrand,
	Xenia Ragiadakou, Honglei Huang, Julia Zhang, Chen Jiqian

On Sun, Mar 12, 2023 at 08:01:53PM +0800, Huang Rui wrote:
> Xen PVH is the paravirtualized mode and takes advantage of hardware
> virtualization support when possible. It will using the hardware IOMMU
> support instead of xen-swiotlb, so disable swiotlb if current domain is
> Xen PVH.
> 
> Signed-off-by: Huang Rui <ray.huang@amd.com>
> ---
>  arch/x86/kernel/pci-dma.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index 30bbe4abb5d6..f5c73dd18f2a 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -74,6 +74,12 @@ static inline void __init pci_swiotlb_detect(void)
>  #ifdef CONFIG_SWIOTLB_XEN
>  static void __init pci_xen_swiotlb_init(void)
>  {
> +	/* Xen PVH domain won't use swiotlb */
> +	if (xen_pvh_domain()) {
> +		x86_swiotlb_enable = false;
> +		return;
> +	}

I'm very confused by this: pci_xen_swiotlb_init() is only called for
PV domains; see the only caller in pci_iommu_alloc().  So this is just
dead code.
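
(For context, the caller looks roughly like this -- paraphrased from memory of
arch/x86/kernel/pci-dma.c, so the exact shape may differ between kernel
versions:

	void __init pci_iommu_alloc(void)
	{
		if (xen_pv_domain()) {
			pci_xen_swiotlb_init();	/* only reached for PV domains */
			return;
		}
		pci_swiotlb_detect();
		gart_iommu_hole_init();
		amd_iommu_detect();
		detect_intel_iommu();
		swiotlb_init(x86_swiotlb_enable, x86_swiotlb_flags);
	}

For PVH and HVM domains the hunk above is therefore never executed.)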

> +
>  	if (!xen_initial_domain() && !x86_swiotlb_enable)
>  		return;
>  	x86_swiotlb_enable = true;
> @@ -86,7 +92,7 @@ static void __init pci_xen_swiotlb_init(void)
>  
>  int pci_xen_swiotlb_init_late(void)
>  {
> -	if (dma_ops == &xen_swiotlb_dma_ops)
> +	if (xen_pvh_domain() || dma_ops == &xen_swiotlb_dma_ops)

Same here: this function is only called by
pcifront_connect_and_init_dma(), and pcifront should never attach on a
PVH domain, hence it's also dead code.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-16 14:20                   ` Juergen Gross
@ 2023-03-16 23:09                     ` Stefano Stabellini
  2023-03-17 10:19                       ` Roger Pau Monné
  2023-03-17 14:45                       ` Alex Deucher
  0 siblings, 2 replies; 27+ messages in thread
From: Stefano Stabellini @ 2023-03-16 23:09 UTC (permalink / raw)
  To: Juergen Gross
  Cc: Alex Deucher, Jan Beulich, Stefano Stabellini, Honglei Huang,
	amd-gfx, dri-devel, linux-kernel, Stewart Hildebrand,
	Oleksandr Tyshchenko, Huang Rui, Chen Jiqian, Xenia Ragiadakou,
	Alex Deucher, xen-devel, Boris Ostrovsky, Julia Zhang,
	Christian König, Roger Pau Monné

[-- Attachment #1: Type: text/plain, Size: 5258 bytes --]

On Thu, 16 Mar 2023, Juergen Gross wrote:
> On 16.03.23 14:53, Alex Deucher wrote:
> > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@suse.com> wrote:
> > > 
> > > On 16.03.23 14:45, Alex Deucher wrote:
> > > > On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote:
> > > > > 
> > > > > On 16.03.2023 00:25, Stefano Stabellini wrote:
> > > > > > On Wed, 15 Mar 2023, Jan Beulich wrote:
> > > > > > > On 15.03.2023 01:52, Stefano Stabellini wrote:
> > > > > > > > On Mon, 13 Mar 2023, Jan Beulich wrote:
> > > > > > > > > On 12.03.2023 13:01, Huang Rui wrote:
> > > > > > > > > > Xen PVH is the paravirtualized mode and takes advantage of
> > > > > > > > > > hardware
> > > > > > > > > > virtualization support when possible. It will using the
> > > > > > > > > > hardware IOMMU
> > > > > > > > > > support instead of xen-swiotlb, so disable swiotlb if
> > > > > > > > > > current domain is
> > > > > > > > > > Xen PVH.
> > > > > > > > > 
> > > > > > > > > But the kernel has no way (yet) to drive the IOMMU, so how can
> > > > > > > > > it get
> > > > > > > > > away without resorting to swiotlb in certain cases (like I/O
> > > > > > > > > to an
> > > > > > > > > address-restricted device)?
> > > > > > > > 
> > > > > > > > I think Ray meant that, thanks to the IOMMU setup by Xen, there
> > > > > > > > is no
> > > > > > > > need for swiotlb-xen in Dom0. Address translations are done by
> > > > > > > > the IOMMU
> > > > > > > > so we can use guest physical addresses instead of machine
> > > > > > > > addresses for
> > > > > > > > DMA. This is a similar case to Dom0 on ARM when the IOMMU is
> > > > > > > > available
> > > > > > > > (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the
> > > > > > > > corresponding
> > > > > > > > case is XENFEAT_not_direct_mapped).
> > > > > > > 
> > > > > > > But how does Xen using an IOMMU help with, as said,
> > > > > > > address-restricted
> > > > > > > devices? They may still need e.g. a 32-bit address to be
> > > > > > > programmed in,
> > > > > > > and if the kernel has memory beyond the 4G boundary not all I/O
> > > > > > > buffers
> > > > > > > may fulfill this requirement.
> > > > > > 
> > > > > > In short, it is going to work as long as Linux has guest physical
> > > > > > addresses (not machine addresses, those could be anything) lower
> > > > > > than
> > > > > > 4GB.
> > > > > > 
> > > > > > If the address-restricted device does DMA via an IOMMU, then the
> > > > > > device
> > > > > > gets programmed by Linux using its guest physical addresses (not
> > > > > > machine
> > > > > > addresses).
> > > > > > 
> > > > > > The 32-bit restriction would be applied by Linux to its choice of
> > > > > > guest
> > > > > > physical address to use to program the device, the same way it does
> > > > > > on
> > > > > > native. The device would be fine as it always uses Linux-provided
> > > > > > <4GB
> > > > > > addresses. After the IOMMU translation (pagetable setup by Xen), we
> > > > > > could get any address, including >4GB addresses, and that is
> > > > > > expected to
> > > > > > work.
> > > > > 
> > > > > I understand that's the "normal" way of working. But whatever the
> > > > > swiotlb
> > > > > is used for in baremetal Linux, that would similarly require its use
> > > > > in
> > > > > PVH (or HVM) aiui. So unconditionally disabling it in PVH would look
> > > > > to
> > > > > me like an incomplete attempt to disable its use altogether on x86.
> > > > > What
> > > > > difference of PVH vs baremetal am I missing here?
> > > > 
> > > > swiotlb is not usable for GPUs even on bare metal.  They often have
> > > > hundreds or megs or even gigs of memory mapped on the device at any
> > > > given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
> > > > the chip family).
> > > 
> > > But the swiotlb isn't per device, but system global.
> > 
> > Sure, but if the swiotlb is in use, then you can't really use the GPU.
> > So you get to pick one.
> 
> The swiotlb is used only for buffers which are not within the DMA mask of a
> device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA mask
> won't use the swiotlb unless you have a buffer above guest physical address of
> 16TB (so basically never).
> 
> Disabling swiotlb in such a guest would OTOH mean, that a device with only
> 32 bit DMA mask passed through to this guest couldn't work with buffers
> above 4GB.
> 
> I don't think this is acceptable.

From the point of view of the Xen subsystem in Linux, the only thing we need to
do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not
the global swiotlb) on PVH, because it is not needed anyway.

I think we should leave the global "swiotlb" setting alone. The global
swiotlb is not relevant to Xen anyway, and surely baremetal Linux has to
have a way to deal with swiotlb/GPU incompatibilities.

We just have to avoid making things worse on Xen, and for that we only
need to avoid unconditionally enabling swiotlb-xen. If the Xen subsystem
doesn't enable swiotlb_xen/swiotlb, and no other subsystem enables
swiotlb, then we have a good Linux configuration capable of handling the
GPU properly.
GPU properly.

Alex, please correct me if I am wrong. How is x86_swiotlb_enable set to
false on native (non-Xen) x86?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-16 23:09                     ` Stefano Stabellini
@ 2023-03-17 10:19                       ` Roger Pau Monné
  2023-03-17 14:45                       ` Alex Deucher
  1 sibling, 0 replies; 27+ messages in thread
From: Roger Pau Monné @ 2023-03-17 10:19 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Juergen Gross, Alex Deucher, Jan Beulich, Honglei Huang, amd-gfx,
	dri-devel, linux-kernel, Stewart Hildebrand,
	Oleksandr Tyshchenko, Huang Rui, Chen Jiqian, Xenia Ragiadakou,
	Alex Deucher, xen-devel, Boris Ostrovsky, Julia Zhang,
	Christian König

On Thu, Mar 16, 2023 at 04:09:44PM -0700, Stefano Stabellini wrote:
> On Thu, 16 Mar 2023, Juergen Gross wrote:
> > On 16.03.23 14:53, Alex Deucher wrote:
> > > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@suse.com> wrote:
> > > > 
> > > > On 16.03.23 14:45, Alex Deucher wrote:
> > > > > On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote:
> > > > > > 
> > > > > > On 16.03.2023 00:25, Stefano Stabellini wrote:
> > > > > > > On Wed, 15 Mar 2023, Jan Beulich wrote:
> > > > > > > > On 15.03.2023 01:52, Stefano Stabellini wrote:
> > > > > > > > > On Mon, 13 Mar 2023, Jan Beulich wrote:
> > > > > > > > > > On 12.03.2023 13:01, Huang Rui wrote:
> > > > > > > > > > > Xen PVH is the paravirtualized mode and takes advantage of
> > > > > > > > > > > hardware
> > > > > > > > > > > virtualization support when possible. It will using the
> > > > > > > > > > > hardware IOMMU
> > > > > > > > > > > support instead of xen-swiotlb, so disable swiotlb if
> > > > > > > > > > > current domain is
> > > > > > > > > > > Xen PVH.
> > > > > > > > > > 
> > > > > > > > > > But the kernel has no way (yet) to drive the IOMMU, so how can
> > > > > > > > > > it get
> > > > > > > > > > away without resorting to swiotlb in certain cases (like I/O
> > > > > > > > > > to an
> > > > > > > > > > address-restricted device)?
> > > > > > > > > 
> > > > > > > > > I think Ray meant that, thanks to the IOMMU setup by Xen, there
> > > > > > > > > is no
> > > > > > > > > need for swiotlb-xen in Dom0. Address translations are done by
> > > > > > > > > the IOMMU
> > > > > > > > > so we can use guest physical addresses instead of machine
> > > > > > > > > addresses for
> > > > > > > > > DMA. This is a similar case to Dom0 on ARM when the IOMMU is
> > > > > > > > > available
> > > > > > > > > (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the
> > > > > > > > > corresponding
> > > > > > > > > case is XENFEAT_not_direct_mapped).
> > > > > > > > 
> > > > > > > > But how does Xen using an IOMMU help with, as said,
> > > > > > > > address-restricted
> > > > > > > > devices? They may still need e.g. a 32-bit address to be
> > > > > > > > programmed in,
> > > > > > > > and if the kernel has memory beyond the 4G boundary not all I/O
> > > > > > > > buffers
> > > > > > > > may fulfill this requirement.
> > > > > > > 
> > > > > > > In short, it is going to work as long as Linux has guest physical
> > > > > > > addresses (not machine addresses, those could be anything) lower
> > > > > > > than
> > > > > > > 4GB.
> > > > > > > 
> > > > > > > If the address-restricted device does DMA via an IOMMU, then the
> > > > > > > device
> > > > > > > gets programmed by Linux using its guest physical addresses (not
> > > > > > > machine
> > > > > > > addresses).
> > > > > > > 
> > > > > > > The 32-bit restriction would be applied by Linux to its choice of
> > > > > > > guest
> > > > > > > physical address to use to program the device, the same way it does
> > > > > > > on
> > > > > > > native. The device would be fine as it always uses Linux-provided
> > > > > > > <4GB
> > > > > > > addresses. After the IOMMU translation (pagetable setup by Xen), we
> > > > > > > could get any address, including >4GB addresses, and that is
> > > > > > > expected to
> > > > > > > work.
> > > > > > 
> > > > > > I understand that's the "normal" way of working. But whatever the
> > > > > > swiotlb
> > > > > > is used for in baremetal Linux, that would similarly require its use
> > > > > > in
> > > > > > PVH (or HVM) aiui. So unconditionally disabling it in PVH would look
> > > > > > to
> > > > > > me like an incomplete attempt to disable its use altogether on x86.
> > > > > > What
> > > > > > difference of PVH vs baremetal am I missing here?
> > > > > 
> > > > > swiotlb is not usable for GPUs even on bare metal.  They often have
> > > > > hundreds or megs or even gigs of memory mapped on the device at any
> > > > > given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
> > > > > the chip family).
> > > > 
> > > > But the swiotlb isn't per device, but system global.
> > > 
> > > Sure, but if the swiotlb is in use, then you can't really use the GPU.
> > > So you get to pick one.
> > 
> > The swiotlb is used only for buffers which are not within the DMA mask of a
> > device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA mask
> > won't use the swiotlb unless you have a buffer above guest physical address of
> > 16TB (so basically never).
> > 
> > Disabling swiotlb in such a guest would OTOH mean, that a device with only
> > 32 bit DMA mask passed through to this guest couldn't work with buffers
> > above 4GB.
> > 
> > I don't think this is acceptable.
> 
> From the Xen subsystem in Linux point of view, the only thing we need to
> do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not
> the global swiotlb) on PVH because it is not needed anyway.

But this is already the case on PVH: swiotlb_xen won't be enabled.
swiotlb_xen is only enabled for PV domains; other domain types don't
enable it under any circumstance on x86.

> I think we should leave the global "swiotlb" setting alone. The global
> swiotlb is not relevant to Xen anyway, and surely baremetal Linux has to
> have a way to deal with swiotlb/GPU incompatibilities.
> 
> We just have to avoid making things worse on Xen, and for that we just
> need to avoid unconditionally enabling swiotlb-xen. If the Xen subsystem
> doesn't enable swiotlb_xen/swiotlb, and no other subsystem enables
> swiotlb, then we have a good Linux configuration capable of handling the
> GPU properly.

Given that this patch is basically a non-functional change (because
the modified functions are only called for PV domains), I think we all
agree that swiotlb_xen should never be used on PVH, and native swiotlb
might be required depending on the DMA address restrictions of the
devices on the system.  So no change required.

Thanks, Roger.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-16 23:09                     ` Stefano Stabellini
  2023-03-17 10:19                       ` Roger Pau Monné
@ 2023-03-17 14:45                       ` Alex Deucher
  2023-03-21 18:55                         ` Christian König
  1 sibling, 1 reply; 27+ messages in thread
From: Alex Deucher @ 2023-03-17 14:45 UTC (permalink / raw)
  To: Stefano Stabellini
  Cc: Juergen Gross, Jan Beulich, Honglei Huang, amd-gfx, dri-devel,
	linux-kernel, Stewart Hildebrand, Oleksandr Tyshchenko,
	Huang Rui, Chen Jiqian, Xenia Ragiadakou, Alex Deucher,
	xen-devel, Boris Ostrovsky, Julia Zhang, Christian König,
	Roger Pau Monné

On Thu, Mar 16, 2023 at 7:09 PM Stefano Stabellini
<sstabellini@kernel.org> wrote:
>
> On Thu, 16 Mar 2023, Juergen Gross wrote:
> > On 16.03.23 14:53, Alex Deucher wrote:
> > > On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@suse.com> wrote:
> > > >
> > > > On 16.03.23 14:45, Alex Deucher wrote:
> > > > > On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote:
> > > > > >
> > > > > > On 16.03.2023 00:25, Stefano Stabellini wrote:
> > > > > > > On Wed, 15 Mar 2023, Jan Beulich wrote:
> > > > > > > > On 15.03.2023 01:52, Stefano Stabellini wrote:
> > > > > > > > > On Mon, 13 Mar 2023, Jan Beulich wrote:
> > > > > > > > > > On 12.03.2023 13:01, Huang Rui wrote:
> > > > > > > > > > > Xen PVH is the paravirtualized mode and takes advantage of
> > > > > > > > > > > hardware
> > > > > > > > > > > virtualization support when possible. It will using the
> > > > > > > > > > > hardware IOMMU
> > > > > > > > > > > support instead of xen-swiotlb, so disable swiotlb if
> > > > > > > > > > > current domain is
> > > > > > > > > > > Xen PVH.
> > > > > > > > > >
> > > > > > > > > > But the kernel has no way (yet) to drive the IOMMU, so how can
> > > > > > > > > > it get
> > > > > > > > > > away without resorting to swiotlb in certain cases (like I/O
> > > > > > > > > > to an
> > > > > > > > > > address-restricted device)?
> > > > > > > > >
> > > > > > > > > I think Ray meant that, thanks to the IOMMU setup by Xen, there
> > > > > > > > > is no
> > > > > > > > > need for swiotlb-xen in Dom0. Address translations are done by
> > > > > > > > > the IOMMU
> > > > > > > > > so we can use guest physical addresses instead of machine
> > > > > > > > > addresses for
> > > > > > > > > DMA. This is a similar case to Dom0 on ARM when the IOMMU is
> > > > > > > > > available
> > > > > > > > > (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the
> > > > > > > > > corresponding
> > > > > > > > > case is XENFEAT_not_direct_mapped).
> > > > > > > >
> > > > > > > > But how does Xen using an IOMMU help with, as said,
> > > > > > > > address-restricted
> > > > > > > > devices? They may still need e.g. a 32-bit address to be
> > > > > > > > programmed in,
> > > > > > > > and if the kernel has memory beyond the 4G boundary not all I/O
> > > > > > > > buffers
> > > > > > > > may fulfill this requirement.
> > > > > > >
> > > > > > > In short, it is going to work as long as Linux has guest physical
> > > > > > > addresses (not machine addresses, those could be anything) lower
> > > > > > > than
> > > > > > > 4GB.
> > > > > > >
> > > > > > > If the address-restricted device does DMA via an IOMMU, then the
> > > > > > > device
> > > > > > > gets programmed by Linux using its guest physical addresses (not
> > > > > > > machine
> > > > > > > addresses).
> > > > > > >
> > > > > > > The 32-bit restriction would be applied by Linux to its choice of
> > > > > > > guest
> > > > > > > physical address to use to program the device, the same way it does
> > > > > > > on
> > > > > > > native. The device would be fine as it always uses Linux-provided
> > > > > > > <4GB
> > > > > > > addresses. After the IOMMU translation (pagetable setup by Xen), we
> > > > > > > could get any address, including >4GB addresses, and that is
> > > > > > > expected to
> > > > > > > work.
> > > > > >
> > > > > > I understand that's the "normal" way of working. But whatever the
> > > > > > swiotlb
> > > > > > is used for in baremetal Linux, that would similarly require its use
> > > > > > in
> > > > > > PVH (or HVM) aiui. So unconditionally disabling it in PVH would look
> > > > > > to
> > > > > > me like an incomplete attempt to disable its use altogether on x86.
> > > > > > What
> > > > > > difference of PVH vs baremetal am I missing here?
> > > > >
> > > > > swiotlb is not usable for GPUs even on bare metal.  They often have
> > > > > hundreds or megs or even gigs of memory mapped on the device at any
> > > > > given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
> > > > > the chip family).
> > > >
> > > > But the swiotlb isn't per device, but system global.
> > >
> > > Sure, but if the swiotlb is in use, then you can't really use the GPU.
> > > So you get to pick one.
> >
> > The swiotlb is used only for buffers which are not within the DMA mask of a
> > device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA mask
> > won't use the swiotlb unless you have a buffer above guest physical address of
> > 16TB (so basically never).
> >
> > Disabling swiotlb in such a guest would OTOH mean, that a device with only
> > 32 bit DMA mask passed through to this guest couldn't work with buffers
> > above 4GB.
> >
> > I don't think this is acceptable.
>
> From the Xen subsystem in Linux point of view, the only thing we need to
> do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not
> the global swiotlb) on PVH because it is not needed anyway.
>
> I think we should leave the global "swiotlb" setting alone. The global
> swiotlb is not relevant to Xen anyway, and surely baremetal Linux has to
> have a way to deal with swiotlb/GPU incompatibilities.
>
> We just have to avoid making things worse on Xen, and for that we just
> need to avoid unconditionally enabling swiotlb-xen. If the Xen subsystem
> doesn't enable swiotlb_xen/swiotlb, and no other subsystem enables
> swiotlb, then we have a good Linux configuration capable of handling the
> GPU properly.
>
> Alex, please correct me if I am wrong. How is x86_swiotlb_enable set to
> false on native (non-Xen) x86?

In most cases we have an IOMMU enabled, and IIRC TTM has slightly
different behavior for memory allocation depending on whether swiotlb
would be needed or not.
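
(As a rough pointer for the question above: x86_swiotlb_enable defaults to
false on native x86 and is only turned on when bounce buffering can actually
be needed. The following is a hedged paraphrase of pci_swiotlb_detect() in
arch/x86/kernel/pci-dma.c -- the exact conditions vary by kernel version, so
treat it as a sketch:

	static void __init pci_swiotlb_detect(void)
	{
		/*
		 * Bounce buffers are only needed if there is RAM that a
		 * 32-bit DMA mask cannot reach (and iommu=off wasn't set).
		 */
		if (!no_iommu && max_possible_pfn > MAX_DMA32_PFN)
			x86_swiotlb_enable = true;

		/*
		 * Memory encryption forces bouncing for devices that cannot
		 * DMA to encrypted memory.
		 */
		if (cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT))
			x86_swiotlb_enable = true;
		if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
			x86_swiotlb_enable = true;
			x86_swiotlb_flags |= SWIOTLB_FORCE;
		}
	}

So on a machine without memory encryption and with an IOMMU handling the
address restrictions, the global swiotlb simply never gets forced on.)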

Alex


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh
  2023-03-17 14:45                       ` Alex Deucher
@ 2023-03-21 18:55                         ` Christian König
  0 siblings, 0 replies; 27+ messages in thread
From: Christian König @ 2023-03-21 18:55 UTC (permalink / raw)
  To: Alex Deucher, Stefano Stabellini
  Cc: Juergen Gross, Jan Beulich, Honglei Huang, amd-gfx, dri-devel,
	linux-kernel, Stewart Hildebrand, Oleksandr Tyshchenko,
	Huang Rui, Chen Jiqian, Xenia Ragiadakou, Alex Deucher,
	xen-devel, Boris Ostrovsky, Julia Zhang, Roger Pau Monné

Am 17.03.23 um 15:45 schrieb Alex Deucher:
> On Thu, Mar 16, 2023 at 7:09 PM Stefano Stabellini
> <sstabellini@kernel.org> wrote:
>> On Thu, 16 Mar 2023, Juergen Gross wrote:
>>> On 16.03.23 14:53, Alex Deucher wrote:
>>>> On Thu, Mar 16, 2023 at 9:48 AM Juergen Gross <jgross@suse.com> wrote:
>>>>> On 16.03.23 14:45, Alex Deucher wrote:
>>>>>> On Thu, Mar 16, 2023 at 3:50 AM Jan Beulich <jbeulich@suse.com> wrote:
>>>>>>> On 16.03.2023 00:25, Stefano Stabellini wrote:
>>>>>>>> On Wed, 15 Mar 2023, Jan Beulich wrote:
>>>>>>>>> On 15.03.2023 01:52, Stefano Stabellini wrote:
>>>>>>>>>> On Mon, 13 Mar 2023, Jan Beulich wrote:
>>>>>>>>>>> On 12.03.2023 13:01, Huang Rui wrote:
>>>>>>>>>>>> Xen PVH is the paravirtualized mode and takes advantage of
>>>>>>>>>>>> hardware
>>>>>>>>>>>> virtualization support when possible. It will using the
>>>>>>>>>>>> hardware IOMMU
>>>>>>>>>>>> support instead of xen-swiotlb, so disable swiotlb if
>>>>>>>>>>>> current domain is
>>>>>>>>>>>> Xen PVH.
>>>>>>>>>>> But the kernel has no way (yet) to drive the IOMMU, so how can
>>>>>>>>>>> it get
>>>>>>>>>>> away without resorting to swiotlb in certain cases (like I/O
>>>>>>>>>>> to an
>>>>>>>>>>> address-restricted device)?
>>>>>>>>>> I think Ray meant that, thanks to the IOMMU setup by Xen, there
>>>>>>>>>> is no
>>>>>>>>>> need for swiotlb-xen in Dom0. Address translations are done by
>>>>>>>>>> the IOMMU
>>>>>>>>>> so we can use guest physical addresses instead of machine
>>>>>>>>>> addresses for
>>>>>>>>>> DMA. This is a similar case to Dom0 on ARM when the IOMMU is
>>>>>>>>>> available
>>>>>>>>>> (see include/xen/arm/swiotlb-xen.h:xen_swiotlb_detect, the
>>>>>>>>>> corresponding
>>>>>>>>>> case is XENFEAT_not_direct_mapped).
>>>>>>>>> But how does Xen using an IOMMU help with, as said,
>>>>>>>>> address-restricted
>>>>>>>>> devices? They may still need e.g. a 32-bit address to be
>>>>>>>>> programmed in,
>>>>>>>>> and if the kernel has memory beyond the 4G boundary not all I/O
>>>>>>>>> buffers
>>>>>>>>> may fulfill this requirement.
>>>>>>>> In short, it is going to work as long as Linux has guest physical
>>>>>>>> addresses (not machine addresses, those could be anything) lower
>>>>>>>> than
>>>>>>>> 4GB.
>>>>>>>>
>>>>>>>> If the address-restricted device does DMA via an IOMMU, then the
>>>>>>>> device
>>>>>>>> gets programmed by Linux using its guest physical addresses (not
>>>>>>>> machine
>>>>>>>> addresses).
>>>>>>>>
>>>>>>>> The 32-bit restriction would be applied by Linux to its choice of
>>>>>>>> guest
>>>>>>>> physical address to use to program the device, the same way it does
>>>>>>>> on
>>>>>>>> native. The device would be fine as it always uses Linux-provided
>>>>>>>> <4GB
>>>>>>>> addresses. After the IOMMU translation (pagetable setup by Xen), we
>>>>>>>> could get any address, including >4GB addresses, and that is
>>>>>>>> expected to
>>>>>>>> work.
>>>>>>> I understand that's the "normal" way of working. But whatever the
>>>>>>> swiotlb
>>>>>>> is used for in baremetal Linux, that would similarly require its use
>>>>>>> in
>>>>>>> PVH (or HVM) aiui. So unconditionally disabling it in PVH would look
>>>>>>> to
>>>>>>> me like an incomplete attempt to disable its use altogether on x86.
>>>>>>> What
>>>>>>> difference of PVH vs baremetal am I missing here?
>>>>>> swiotlb is not usable for GPUs even on bare metal.  They often have
>>>>>> hundreds or megs or even gigs of memory mapped on the device at any
>>>>>> given time.  Also, AMD GPUs support 44-48 bit DMA masks (depending on
>>>>>> the chip family).
>>>>> But the swiotlb isn't per device, but system global.
>>>> Sure, but if the swiotlb is in use, then you can't really use the GPU.
>>>> So you get to pick one.
>>> The swiotlb is used only for buffers which are not within the DMA mask of a
>>> device (see dma_direct_map_page()). So an AMD GPU supporting a 44 bit DMA mask
>>> won't use the swiotlb unless you have a buffer above guest physical address of
>>> 16TB (so basically never).
>>>
>>> Disabling swiotlb in such a guest would OTOH mean, that a device with only
>>> 32 bit DMA mask passed through to this guest couldn't work with buffers
>>> above 4GB.
>>>
>>> I don't think this is acceptable.
>>  From the Xen subsystem in Linux point of view, the only thing we need to
>> do is to make sure *not* to enable swiotlb_xen (yes "swiotlb_xen", not
>> the global swiotlb) on PVH because it is not needed anyway.
>>
>> I think we should leave the global "swiotlb" setting alone. The global
>> swiotlb is not relevant to Xen anyway, and surely baremetal Linux has to
>> have a way to deal with swiotlb/GPU incompatibilities.
>>
>> We just have to avoid making things worse on Xen, and for that we just
>> need to avoid unconditionally enabling swiotlb-xen. If the Xen subsystem
>> doesn't enable swiotlb_xen/swiotlb, and no other subsystem enables
>> swiotlb, then we have a good Linux configuration capable of handling the
>> GPU properly.
>>
>> Alex, please correct me if I am wrong. How is x86_swiotlb_enable set to
>> false on native (non-Xen) x86?
> In most cases we have an IOMMU enabled and IIRC, TTM has slightly
> different behavior for memory allocation depending on whether swiotlb
> would be needed or not.

Well "slightly different" is an understatement. We need to disable quite 
a bunch of features to make swiotlb work with GPUs.

In particular, userptr and inter-device sharing won't work any more.

Regards,
Christian.

>
> Alex



^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2023-03-21 18:55 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-12 12:01 [RFC PATCH 0/5] Add Xen PVH dom0 support for GPU Huang Rui
2023-03-12 12:01 ` [RFC PATCH 1/5] x86/xen: disable swiotlb for xen pvh Huang Rui
2023-03-13  8:56   ` Jan Beulich
2023-03-15  0:52     ` Stefano Stabellini
2023-03-15  4:14       ` Huang Rui
2023-03-15  6:52         ` Jan Beulich
2023-03-15  6:49       ` Jan Beulich
2023-03-15 23:25         ` Stefano Stabellini
2023-03-16  7:50           ` Jan Beulich
2023-03-16 13:45             ` Alex Deucher
2023-03-16 13:48               ` Juergen Gross
2023-03-16 13:53                 ` Alex Deucher
2023-03-16 13:58                   ` Jan Beulich
2023-03-16 14:20                   ` Juergen Gross
2023-03-16 23:09                     ` Stefano Stabellini
2023-03-17 10:19                       ` Roger Pau Monné
2023-03-17 14:45                       ` Alex Deucher
2023-03-21 18:55                         ` Christian König
2023-03-16 16:28   ` Roger Pau Monné
2023-03-12 12:01 ` [RFC PATCH 2/5] xen/grants: update initialization order of xen grant table Huang Rui
2023-03-15 12:31   ` Roger Pau Monné
2023-03-12 12:01 ` [RFC PATCH 3/5] drm/amdgpu: set passthrough mode for xen pvh/hvm Huang Rui
2023-03-15 12:42   ` Roger Pau Monné
2023-03-12 12:01 ` [RFC PATCH 4/5] x86/xen: acpi registers gsi for xen pvh Huang Rui
2023-03-15 14:00   ` Roger Pau Monné
2023-03-12 12:01 ` [RFC PATCH 5/5] xen/privcmd: add IOCTL_PRIVCMD_GSI_FROM_IRQ Huang Rui
2023-03-15 14:26   ` Roger Pau Monné

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).