* [PATCH RFC] drm: add func to better detect wether swiotlb is needed
@ 2019-02-15 21:29 Michael D Labriola
2019-02-16 19:00 ` Koenig, Christian
[not found] ` <1550266182-4061-1-git-send-email-michael.d.labriola-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 2 replies; 4+ messages in thread
From: Michael D Labriola @ 2019-02-15 21:29 UTC (permalink / raw)
To: dri-devel, Alex Deucher, Christian Koenig, Chunming Zhou,
amd-gfx, Monk Liu
Cc: Juergen Gross, Konrad Rzeszutek Wilk, Andrew Cooper,
Michael D Labriola, xen-devel, Christoph Hellwig, Paul Durrant,
Jan Beulich
This commit fixes DRM failures on Xen PV systems that were introduced in
v4.17 by the following commits:
82626363 drm: add func to get max iomem address v2
fd5fd480 drm/amdgpu: only enable swiotlb alloc when need v2
1bc3d3cc drm/radeon: only enable swiotlb path when need v2
The introduction of ->need_swiotlb to the ttm_dma_populate() conditionals
in the radeon and amdgpu device drivers causes Gnome to immediately crash
on Xen PV systems, returning the user to the login screen. The following
kernel errors get logged:
[ 28.554259] radeon_dp_aux_transfer_native: 200 callbacks suppressed
[ 31.219821] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[ 31.220030] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
[ 31.226109] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[ 31.226300] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
[ 31.300734] gnome-shell[1935]: segfault at 88 ip 00007f39151cd904 sp 00007ffc97611ad8 error 4 in libmutter-cogl.so[7f3915178000+aa000]
[ 31.300745] Code: 5f c3 0f 1f 40 00 48 8b 47 78 48 8b 40 40 ff e0 66 0f 1f 44 00 00 48 8b 47 78 48 8b 40 48 ff e0 66 0f 1f 44 00 00 48 8b 47 78 <48> 8b 80 88 00 00 00 ff e0 0f 1f 00 48 8b 47 78 48 8b 40 68 ff e0
[ 38.193302] radeon_dp_aux_transfer_native: 116 callbacks suppressed
[ 40.009317] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[ 40.009488] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
[ 40.015114] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[ 40.015297] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
[ 40.028302] gnome-shell[2431]: segfault at 2dadf40 ip 0000000002dadf40 sp 00007ffcd24ea5f8 error 15
[ 40.028306] Code: 20 6e 31 00 00 00 00 00 00 00 00 37 e3 3d 2d 7f 00 00 80 f4 e6 3d 2d 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 c1 00 00 00 00 00 00 00 80 e1 d2 03 00 00
This commit renames drm_get_max_iomem() to drm_need_swiotlb(), adds a
xen_pv_domain() check to it, and moves the bit shifting comparison that
always follows its usage into the function (simplifying the drm driver
code).
---
drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +-
drivers/gpu/drm/drm_memory.c | 19 ++++++++++++++++---
drivers/gpu/drm/radeon/radeon_device.c | 2 +-
include/drm/drm_cache.h | 2 +-
6 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 910c4ce..6bc0266 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -1029,7 +1029,7 @@ static int gmc_v7_0_sw_init(void *handle)
pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
pr_warn("amdgpu: No coherent DMA available\n");
}
- adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
+ adev->need_swiotlb = drm_need_swiotlb(dma_bits);
r = gmc_v7_0_init_microcode(adev);
if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 747c068..8638adf 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1155,7 +1155,7 @@ static int gmc_v8_0_sw_init(void *handle)
pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
pr_warn("amdgpu: No coherent DMA available\n");
}
- adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
+ adev->need_swiotlb = drm_need_swiotlb(dma_bits);
r = gmc_v8_0_init_microcode(adev);
if (r) {
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index f35d7a5..4f67093 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -989,7 +989,7 @@ static int gmc_v9_0_sw_init(void *handle)
pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
printk(KERN_WARNING "amdgpu: No coherent DMA available.\n");
}
- adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
+ adev->need_swiotlb = drm_need_swiotlb(dma_bits);
if (adev->asic_type == CHIP_VEGA20) {
r = gfxhub_v1_1_get_xgmi_info(adev);
diff --git a/drivers/gpu/drm/drm_memory.c b/drivers/gpu/drm/drm_memory.c
index d69e4fc..6af59a6 100644
--- a/drivers/gpu/drm/drm_memory.c
+++ b/drivers/gpu/drm/drm_memory.c
@@ -35,6 +35,7 @@
#include <linux/highmem.h>
#include <linux/export.h>
+#include <xen/xen.h>
#include <drm/drmP.h>
#include "drm_legacy.h"
@@ -150,15 +151,27 @@ void drm_legacy_ioremapfree(struct drm_local_map *map, struct drm_device *dev)
}
EXPORT_SYMBOL(drm_legacy_ioremapfree);
-u64 drm_get_max_iomem(void)
+bool drm_need_swiotlb(int dma_bits)
{
struct resource *tmp;
resource_size_t max_iomem = 0;
+ /*
+ * Xen paravirtual hosts require swiotlb regardless of requested dma
+ * transfer size.
+ *
+ * NOTE: Really, what it requires is use of the dma_alloc_coherent
+ * allocator used in ttm_dma_populate() instead of
+ * ttm_populate_and_map_pages(), which bounce buffers so much in
+ * Xen it leads to swiotlb buffer exhaustion.
+ */
+ if (xen_pv_domain())
+ return true;
+
for (tmp = iomem_resource.child; tmp; tmp = tmp->sibling) {
max_iomem = max(max_iomem, tmp->end);
}
- return max_iomem;
+ return max_iomem > ((u64)1 << dma_bits);
}
-EXPORT_SYMBOL(drm_get_max_iomem);
+EXPORT_SYMBOL(drm_need_swiotlb);
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 59c8a66..a1d3c62 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1387,7 +1387,7 @@ int radeon_device_init(struct radeon_device *rdev,
pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(32));
pr_warn("radeon: No coherent DMA available\n");
}
- rdev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
+ rdev->need_swiotlb = drm_need_swiotlb(dma_bits);
/* Registers mapping */
/* TODO: block userspace mapping of io register */
diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
index bfe1639..633eaaf 100644
--- a/include/drm/drm_cache.h
+++ b/include/drm/drm_cache.h
@@ -38,7 +38,7 @@
void drm_clflush_pages(struct page *pages[], unsigned long num_pages);
void drm_clflush_sg(struct sg_table *st);
void drm_clflush_virt_range(void *addr, unsigned long length);
-u64 drm_get_max_iomem(void);
+bool drm_need_swiotlb(int dma_bits);
static inline bool drm_arch_can_wc_memory(void)
--
1.8.3.1
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH RFC] drm: add func to better detect wether swiotlb is needed
[not found] ` <1550266182-4061-1-git-send-email-michael.d.labriola-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2019-02-16 19:00 ` Koenig, Christian
2019-02-19 15:22 ` Michael Labriola
0 siblings, 1 reply; 4+ messages in thread
From: Koenig, Christian @ 2019-02-16 19:00 UTC (permalink / raw)
To: Michael D Labriola, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
Deucher, Alexander, Zhou, David(ChunMing),
amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Liu, Monk
Cc: Juergen Gross, Konrad Rzeszutek Wilk, Andrew Cooper,
xen-devel-GuqFBffKawuEi8DpZVb4nw, Christoph Hellwig,
Paul Durrant, Jan Beulich
Am 15.02.19 um 22:29 schrieb Michael D Labriola:
> This commit fixes DRM failures on Xen PV systems that were introduced in
> v4.17 by the following commits:
>
> 82626363 drm: add func to get max iomem address v2
> fd5fd480 drm/amdgpu: only enable swiotlb alloc when need v2
> 1bc3d3cc drm/radeon: only enable swiotlb path when need v2
>
> The introduction of ->need_swiotlb to the ttm_dma_populate() conditionals
> in the radeon and amdgpu device drivers causes Gnome to immediately crash
> on Xen PV systems, returning the user to the login screen. The following
> kernel errors get logged:
>
> [ 28.554259] radeon_dp_aux_transfer_native: 200 callbacks suppressed
> [ 31.219821] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> [ 31.220030] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> [ 31.226109] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> [ 31.226300] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> [ 31.300734] gnome-shell[1935]: segfault at 88 ip 00007f39151cd904 sp 00007ffc97611ad8 error 4 in libmutter-cogl.so[7f3915178000+aa000]
> [ 31.300745] Code: 5f c3 0f 1f 40 00 48 8b 47 78 48 8b 40 40 ff e0 66 0f 1f 44 00 00 48 8b 47 78 48 8b 40 48 ff e0 66 0f 1f 44 00 00 48 8b 47 78 <48> 8b 80 88 00 00 00 ff e0 0f 1f 00 48 8b 47 78 48 8b 40 68 ff e0
> [ 38.193302] radeon_dp_aux_transfer_native: 116 callbacks suppressed
> [ 40.009317] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> [ 40.009488] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> [ 40.015114] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> [ 40.015297] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> [ 40.028302] gnome-shell[2431]: segfault at 2dadf40 ip 0000000002dadf40 sp 00007ffcd24ea5f8 error 15
> [ 40.028306] Code: 20 6e 31 00 00 00 00 00 00 00 00 37 e3 3d 2d 7f 00 00 80 f4 e6 3d 2d 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 c1 00 00 00 00 00 00 00 80 e1 d2 03 00 00
>
> This commit renames drm_get_max_iomem() to drm_need_swiotlb(), adds a
> xen_pv_domain() check to it, and moves the bit shifting comparison that
> always follows its usage into the function (simplifying the drm driver
> code).
You signed of by line is missing. Apart from that it looks like a good
fix to me.
But the question is why does Xen needs dma_alloc_coherent() ?
Using dma_alloc_coherent() is really just a fallback path and we need to
disable a bunch of userspace features when going down that route.
Regards,
Christian.
> ---
> drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +-
> drivers/gpu/drm/drm_memory.c | 19 ++++++++++++++++---
> drivers/gpu/drm/radeon/radeon_device.c | 2 +-
> include/drm/drm_cache.h | 2 +-
> 6 files changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> index 910c4ce..6bc0266 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> @@ -1029,7 +1029,7 @@ static int gmc_v7_0_sw_init(void *handle)
> pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
> pr_warn("amdgpu: No coherent DMA available\n");
> }
> - adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
> + adev->need_swiotlb = drm_need_swiotlb(dma_bits);
>
> r = gmc_v7_0_init_microcode(adev);
> if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> index 747c068..8638adf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> @@ -1155,7 +1155,7 @@ static int gmc_v8_0_sw_init(void *handle)
> pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
> pr_warn("amdgpu: No coherent DMA available\n");
> }
> - adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
> + adev->need_swiotlb = drm_need_swiotlb(dma_bits);
>
> r = gmc_v8_0_init_microcode(adev);
> if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index f35d7a5..4f67093 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -989,7 +989,7 @@ static int gmc_v9_0_sw_init(void *handle)
> pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
> printk(KERN_WARNING "amdgpu: No coherent DMA available.\n");
> }
> - adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
> + adev->need_swiotlb = drm_need_swiotlb(dma_bits);
>
> if (adev->asic_type == CHIP_VEGA20) {
> r = gfxhub_v1_1_get_xgmi_info(adev);
> diff --git a/drivers/gpu/drm/drm_memory.c b/drivers/gpu/drm/drm_memory.c
> index d69e4fc..6af59a6 100644
> --- a/drivers/gpu/drm/drm_memory.c
> +++ b/drivers/gpu/drm/drm_memory.c
> @@ -35,6 +35,7 @@
>
> #include <linux/highmem.h>
> #include <linux/export.h>
> +#include <xen/xen.h>
> #include <drm/drmP.h>
> #include "drm_legacy.h"
>
> @@ -150,15 +151,27 @@ void drm_legacy_ioremapfree(struct drm_local_map *map, struct drm_device *dev)
> }
> EXPORT_SYMBOL(drm_legacy_ioremapfree);
>
> -u64 drm_get_max_iomem(void)
> +bool drm_need_swiotlb(int dma_bits)
> {
> struct resource *tmp;
> resource_size_t max_iomem = 0;
>
> + /*
> + * Xen paravirtual hosts require swiotlb regardless of requested dma
> + * transfer size.
> + *
> + * NOTE: Really, what it requires is use of the dma_alloc_coherent
> + * allocator used in ttm_dma_populate() instead of
> + * ttm_populate_and_map_pages(), which bounce buffers so much in
> + * Xen it leads to swiotlb buffer exhaustion.
> + */
> + if (xen_pv_domain())
> + return true;
> +
> for (tmp = iomem_resource.child; tmp; tmp = tmp->sibling) {
> max_iomem = max(max_iomem, tmp->end);
> }
>
> - return max_iomem;
> + return max_iomem > ((u64)1 << dma_bits);
> }
> -EXPORT_SYMBOL(drm_get_max_iomem);
> +EXPORT_SYMBOL(drm_need_swiotlb);
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index 59c8a66..a1d3c62 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -1387,7 +1387,7 @@ int radeon_device_init(struct radeon_device *rdev,
> pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(32));
> pr_warn("radeon: No coherent DMA available\n");
> }
> - rdev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
> + rdev->need_swiotlb = drm_need_swiotlb(dma_bits);
>
> /* Registers mapping */
> /* TODO: block userspace mapping of io register */
> diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
> index bfe1639..633eaaf 100644
> --- a/include/drm/drm_cache.h
> +++ b/include/drm/drm_cache.h
> @@ -38,7 +38,7 @@
> void drm_clflush_pages(struct page *pages[], unsigned long num_pages);
> void drm_clflush_sg(struct sg_table *st);
> void drm_clflush_virt_range(void *addr, unsigned long length);
> -u64 drm_get_max_iomem(void);
> +bool drm_need_swiotlb(int dma_bits);
>
>
> static inline bool drm_arch_can_wc_memory(void)
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH RFC] drm: add func to better detect wether swiotlb is needed
2019-02-15 21:29 [PATCH RFC] drm: add func to better detect wether swiotlb is needed Michael D Labriola
@ 2019-02-16 19:00 ` Koenig, Christian
[not found] ` <1550266182-4061-1-git-send-email-michael.d.labriola-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
1 sibling, 0 replies; 4+ messages in thread
From: Koenig, Christian @ 2019-02-16 19:00 UTC (permalink / raw)
To: Michael D Labriola, dri-devel, Deucher, Alexander, Zhou,
David(ChunMing),
amd-gfx, Liu, Monk
Cc: Juergen Gross, Konrad Rzeszutek Wilk, Andrew Cooper, xen-devel,
Christoph Hellwig, Paul Durrant, Jan Beulich
Am 15.02.19 um 22:29 schrieb Michael D Labriola:
> This commit fixes DRM failures on Xen PV systems that were introduced in
> v4.17 by the following commits:
>
> 82626363 drm: add func to get max iomem address v2
> fd5fd480 drm/amdgpu: only enable swiotlb alloc when need v2
> 1bc3d3cc drm/radeon: only enable swiotlb path when need v2
>
> The introduction of ->need_swiotlb to the ttm_dma_populate() conditionals
> in the radeon and amdgpu device drivers causes Gnome to immediately crash
> on Xen PV systems, returning the user to the login screen. The following
> kernel errors get logged:
>
> [ 28.554259] radeon_dp_aux_transfer_native: 200 callbacks suppressed
> [ 31.219821] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> [ 31.220030] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> [ 31.226109] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> [ 31.226300] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> [ 31.300734] gnome-shell[1935]: segfault at 88 ip 00007f39151cd904 sp 00007ffc97611ad8 error 4 in libmutter-cogl.so[7f3915178000+aa000]
> [ 31.300745] Code: 5f c3 0f 1f 40 00 48 8b 47 78 48 8b 40 40 ff e0 66 0f 1f 44 00 00 48 8b 47 78 48 8b 40 48 ff e0 66 0f 1f 44 00 00 48 8b 47 78 <48> 8b 80 88 00 00 00 ff e0 0f 1f 00 48 8b 47 78 48 8b 40 68 ff e0
> [ 38.193302] radeon_dp_aux_transfer_native: 116 callbacks suppressed
> [ 40.009317] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> [ 40.009488] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> [ 40.015114] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> [ 40.015297] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> [ 40.028302] gnome-shell[2431]: segfault at 2dadf40 ip 0000000002dadf40 sp 00007ffcd24ea5f8 error 15
> [ 40.028306] Code: 20 6e 31 00 00 00 00 00 00 00 00 37 e3 3d 2d 7f 00 00 80 f4 e6 3d 2d 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 c1 00 00 00 00 00 00 00 80 e1 d2 03 00 00
>
> This commit renames drm_get_max_iomem() to drm_need_swiotlb(), adds a
> xen_pv_domain() check to it, and moves the bit shifting comparison that
> always follows its usage into the function (simplifying the drm driver
> code).
You signed of by line is missing. Apart from that it looks like a good
fix to me.
But the question is why does Xen needs dma_alloc_coherent() ?
Using dma_alloc_coherent() is really just a fallback path and we need to
disable a bunch of userspace features when going down that route.
Regards,
Christian.
> ---
> drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 2 +-
> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 2 +-
> drivers/gpu/drm/drm_memory.c | 19 ++++++++++++++++---
> drivers/gpu/drm/radeon/radeon_device.c | 2 +-
> include/drm/drm_cache.h | 2 +-
> 6 files changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> index 910c4ce..6bc0266 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
> @@ -1029,7 +1029,7 @@ static int gmc_v7_0_sw_init(void *handle)
> pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
> pr_warn("amdgpu: No coherent DMA available\n");
> }
> - adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
> + adev->need_swiotlb = drm_need_swiotlb(dma_bits);
>
> r = gmc_v7_0_init_microcode(adev);
> if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> index 747c068..8638adf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
> @@ -1155,7 +1155,7 @@ static int gmc_v8_0_sw_init(void *handle)
> pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
> pr_warn("amdgpu: No coherent DMA available\n");
> }
> - adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
> + adev->need_swiotlb = drm_need_swiotlb(dma_bits);
>
> r = gmc_v8_0_init_microcode(adev);
> if (r) {
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index f35d7a5..4f67093 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -989,7 +989,7 @@ static int gmc_v9_0_sw_init(void *handle)
> pci_set_consistent_dma_mask(adev->pdev, DMA_BIT_MASK(32));
> printk(KERN_WARNING "amdgpu: No coherent DMA available.\n");
> }
> - adev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
> + adev->need_swiotlb = drm_need_swiotlb(dma_bits);
>
> if (adev->asic_type == CHIP_VEGA20) {
> r = gfxhub_v1_1_get_xgmi_info(adev);
> diff --git a/drivers/gpu/drm/drm_memory.c b/drivers/gpu/drm/drm_memory.c
> index d69e4fc..6af59a6 100644
> --- a/drivers/gpu/drm/drm_memory.c
> +++ b/drivers/gpu/drm/drm_memory.c
> @@ -35,6 +35,7 @@
>
> #include <linux/highmem.h>
> #include <linux/export.h>
> +#include <xen/xen.h>
> #include <drm/drmP.h>
> #include "drm_legacy.h"
>
> @@ -150,15 +151,27 @@ void drm_legacy_ioremapfree(struct drm_local_map *map, struct drm_device *dev)
> }
> EXPORT_SYMBOL(drm_legacy_ioremapfree);
>
> -u64 drm_get_max_iomem(void)
> +bool drm_need_swiotlb(int dma_bits)
> {
> struct resource *tmp;
> resource_size_t max_iomem = 0;
>
> + /*
> + * Xen paravirtual hosts require swiotlb regardless of requested dma
> + * transfer size.
> + *
> + * NOTE: Really, what it requires is use of the dma_alloc_coherent
> + * allocator used in ttm_dma_populate() instead of
> + * ttm_populate_and_map_pages(), which bounce buffers so much in
> + * Xen it leads to swiotlb buffer exhaustion.
> + */
> + if (xen_pv_domain())
> + return true;
> +
> for (tmp = iomem_resource.child; tmp; tmp = tmp->sibling) {
> max_iomem = max(max_iomem, tmp->end);
> }
>
> - return max_iomem;
> + return max_iomem > ((u64)1 << dma_bits);
> }
> -EXPORT_SYMBOL(drm_get_max_iomem);
> +EXPORT_SYMBOL(drm_need_swiotlb);
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index 59c8a66..a1d3c62 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -1387,7 +1387,7 @@ int radeon_device_init(struct radeon_device *rdev,
> pci_set_consistent_dma_mask(rdev->pdev, DMA_BIT_MASK(32));
> pr_warn("radeon: No coherent DMA available\n");
> }
> - rdev->need_swiotlb = drm_get_max_iomem() > ((u64)1 << dma_bits);
> + rdev->need_swiotlb = drm_need_swiotlb(dma_bits);
>
> /* Registers mapping */
> /* TODO: block userspace mapping of io register */
> diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
> index bfe1639..633eaaf 100644
> --- a/include/drm/drm_cache.h
> +++ b/include/drm/drm_cache.h
> @@ -38,7 +38,7 @@
> void drm_clflush_pages(struct page *pages[], unsigned long num_pages);
> void drm_clflush_sg(struct sg_table *st);
> void drm_clflush_virt_range(void *addr, unsigned long length);
> -u64 drm_get_max_iomem(void);
> +bool drm_need_swiotlb(int dma_bits);
>
>
> static inline bool drm_arch_can_wc_memory(void)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH RFC] drm: add func to better detect wether swiotlb is needed
2019-02-16 19:00 ` Koenig, Christian
@ 2019-02-19 15:22 ` Michael Labriola
0 siblings, 0 replies; 4+ messages in thread
From: Michael Labriola @ 2019-02-19 15:22 UTC (permalink / raw)
To: Koenig, Christian
Cc: Juergen Gross, Zhou, David(ChunMing),
Konrad Rzeszutek Wilk, Andrew Cooper, amd-gfx, xen-devel,
Christoph Hellwig, Paul Durrant, dri-devel, Jan Beulich, Deucher,
Alexander, Liu, Monk
On Sat, Feb 16, 2019 at 2:00 PM Koenig, Christian
<Christian.Koenig@amd.com> wrote:
>
> Am 15.02.19 um 22:29 schrieb Michael D Labriola:
> > This commit fixes DRM failures on Xen PV systems that were introduced in
> > v4.17 by the following commits:
> >
> > 82626363 drm: add func to get max iomem address v2
> > fd5fd480 drm/amdgpu: only enable swiotlb alloc when need v2
> > 1bc3d3cc drm/radeon: only enable swiotlb path when need v2
> >
> > The introduction of ->need_swiotlb to the ttm_dma_populate() conditionals
> > in the radeon and amdgpu device drivers causes Gnome to immediately crash
> > on Xen PV systems, returning the user to the login screen. The following
> > kernel errors get logged:
> >
> > [ 28.554259] radeon_dp_aux_transfer_native: 200 callbacks suppressed
> > [ 31.219821] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> > [ 31.220030] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> > [ 31.226109] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> > [ 31.226300] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> > [ 31.300734] gnome-shell[1935]: segfault at 88 ip 00007f39151cd904 sp 00007ffc97611ad8 error 4 in libmutter-cogl.so[7f3915178000+aa000]
> > [ 31.300745] Code: 5f c3 0f 1f 40 00 48 8b 47 78 48 8b 40 40 ff e0 66 0f 1f 44 00 00 48 8b 47 78 48 8b 40 48 ff e0 66 0f 1f 44 00 00 48 8b 47 78 <48> 8b 80 88 00 00 00 ff e0 0f 1f 00 48 8b 47 78 48 8b 40 68 ff e0
> > [ 38.193302] radeon_dp_aux_transfer_native: 116 callbacks suppressed
> > [ 40.009317] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> > [ 40.009488] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> > [ 40.015114] radeon 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
> > [ 40.015297] [drm:radeon_gem_object_create [radeon]] *ERROR* Failed to allocate GEM object (16384000, 2, 4096, -14)
> > [ 40.028302] gnome-shell[2431]: segfault at 2dadf40 ip 0000000002dadf40 sp 00007ffcd24ea5f8 error 15
> > [ 40.028306] Code: 20 6e 31 00 00 00 00 00 00 00 00 37 e3 3d 2d 7f 00 00 80 f4 e6 3d 2d 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 c1 00 00 00 00 00 00 00 80 e1 d2 03 00 00
> >
> > This commit renames drm_get_max_iomem() to drm_need_swiotlb(), adds a
> > xen_pv_domain() check to it, and moves the bit shifting comparison that
> > always follows its usage into the function (simplifying the drm driver
> > code).
>
> You signed of by line is missing. Apart from that it looks like a good
> fix to me.
Oops, I can resend w/ signoff.
> But the question is why does Xen needs dma_alloc_coherent() ?
I'm not sure I understand, even after discussing on xen-devel. What I
was told by Juergen (on CC) is:
"The thing which is different between Xen PV guests and most others
(all others(?), now that Lguest and UML have been dropped) is that
what Linux thinks of as PFN $N isn't necessarily adjacent to PFN $N+1
in system physical address space.
Therefore, code which has a buffer spanning a page boundary can't just
convert a pointer to the buffer into a physical address, and hand that
address to a device. You generally end up with either memory
corruption (DMA hitting the wrong page allocated to the guest), or an
IOMMU fault (DMA hitting a pages which isn't allocated to the guest)."
And Christoph (also on CC) pointed out:
"ttm_populate_and_map_pages uses dma_map_page to map GPU memory,
and from what I can tell from your report potentially lots of it.
So this does "properly" use the DMA API for some amount of "properly".
The problem here is that ttm_populate_and_map_pages first allocates
memory and then maps it in a way where it bounce buffers a lot,
leading to a swiotlb buffer exaustion, as seen in your report.
ttm_dma_populate also sort of "properly" uses the DMA API in that it
uses the dma_alloc_coherent allocator. The benefit of that allocator is
that is always returns addressable memory without exhausing the swiotlb
buffer. The dowside of ttm_dma_populate / dma_alloc_coherent is that
on architectures where PCIe is not cache coherent it pointlessly
up other resources, as coherent DMA memory can be a very finite resource
there.
So for a short term fix forcing the dma_alloc_coherent route on
Xen/x86 is the right thing. On Xen/arm and Xen/arm64 is might already
be problemeatic due to the explanation above unfortunately.
The real fix is to finally get broadly available non-coherent device
memory allocator into mainlaine and with ttm_populate_and_map_pages
to use it. Note that for non-coherent devices it seems like
ttm_populate_and_map_pages also seems to miss proper ownership
management but that is another issue.
My proposal for such an allocator here:
https://lwn.net/Articles/774429/"
>
> Using dma_alloc_coherent() is really just a fallback path and we need to
> disable a bunch of userspace features when going down that route.
So, for the short-term fix, I'll resubmit my patch w/ appropriate sign-off.
I'm all for making it work even better on Xen after this band-aid is
applied. I can help test out proposed patches on my systems if that
helps. Thanks!
-Mike
--
Michael D Labriola
21 Rip Van Winkle Cir
Warwick, RI 02886
401-316-9844 (cell)
401-848-8871 (work)
401-234-1306 (home)
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2019-02-19 15:22 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-15 21:29 [PATCH RFC] drm: add func to better detect wether swiotlb is needed Michael D Labriola
2019-02-16 19:00 ` Koenig, Christian
[not found] ` <1550266182-4061-1-git-send-email-michael.d.labriola-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-02-16 19:00 ` Koenig, Christian
2019-02-19 15:22 ` Michael Labriola
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.