* [PATCH] drm/i915: Fix a bug calling sleep function in atomic context
@ 2019-11-13 23:25 ` Bruce Chang
0 siblings, 0 replies; 11+ messages in thread
From: Bruce Chang @ 2019-11-13 23:25 UTC (permalink / raw)
To: intel-gfx
Below is the call trace when this issue is hit:
<3> [113.316247] BUG: sleeping function called from invalid context at mm/page_alloc.c:4653
<3> [113.318190] in_atomic(): 1, irqs_disabled(): 0, pid: 678, name: debugfs_test
<4> [113.319900] no locks held by debugfs_test/678.
<3> [113.321002] Preemption disabled at:
<4> [113.321130] [<ffffffffa02506d4>] i915_error_object_create+0x494/0x610 [i915]
<4> [113.327259] Call Trace:
<4> [113.327871] dump_stack+0x67/0x9b
<4> [113.328683] ___might_sleep+0x167/0x250
<4> [113.329618] __alloc_pages_nodemask+0x26b/0x1110
<4> [113.330731] ? ___slab_alloc.constprop.34+0x21c/0x380
<4> [113.331943] ? ___slab_alloc.constprop.34+0x21c/0x380
<4> [113.333169] ? __slab_alloc.isra.28.constprop.33+0x4d/0x70
<4> [113.334614] pool_alloc.constprop.19+0x14/0x60 [i915]
<4> [113.335951] compress_page+0x7c/0x100 [i915]
<4> [113.337110] i915_error_object_create+0x4bd/0x610 [i915]
<4> [113.338515] i915_capture_gpu_state+0x384/0x1680 [i915]
<4> [113.339771] ? __lock_acquire+0x4ac/0x1e90
<4> [113.340785] ? _raw_spin_lock_irqsave_nested+0x1/0x50
<4> [113.342127] i915_gpu_info_open+0x44/0x70 [i915]
<4> [113.343243] full_proxy_open+0x139/0x1b0
<4> [113.344196] ? open_proxy_open+0xc0/0xc0
<4> [113.345149] do_dentry_open+0x1ce/0x3a0
<4> [113.346084] path_openat+0x4c9/0xac0
<4> [113.346967] do_filp_open+0x96/0x110
<4> [113.347848] ? __alloc_fd+0xe0/0x1f0
<4> [113.348736] ? do_sys_open+0x1b8/0x250
<4> [113.349647] do_sys_open+0x1b8/0x250
<4> [113.350526] do_syscall_64+0x55/0x1c0
<4> [113.351418] entry_SYSCALL_64_after_hwframe+0x49/0xbe
After io_mapping_map_atomic_wc()/kmap_atomic(), the kernel is in atomic context,
but compress_page() then calls pool_alloc() with the GFP_KERNEL flag, which can
potentially sleep.
To fix this issue, we can either
1) avoid entering atomic context, i.e., use the non-atomic variants of these
functions, such as io_mapping_map_wc()/kmap(),
or
2) make compress_page() safe to run in atomic context.
To follow the current design of not running compression in atomic context,
this patch implements option 1) above.
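The invariant at stake can be sketched in plain C. This is a userspace model, not i915 code: `preempt_disabled`, the `*_stub` helpers, and `gfp_kernel_alloc_legal()` are hypothetical stand-ins for the kernel APIs. The point it illustrates is that kmap_atomic() disables preemption, so a GFP_KERNEL allocation between the map and unmap is invalid, while plain kmap() leaves the task free to sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Models preempt_count() != 0, i.e. "in atomic context". */
static bool preempt_disabled;

/* kmap_atomic()/kunmap_atomic() stand-ins: they disable/enable preemption. */
static void kmap_atomic_stub(void)   { preempt_disabled = true;  }
static void kunmap_atomic_stub(void) { preempt_disabled = false; }

/* kmap()/kunmap() stand-ins: no effect on preemption. */
static void kmap_stub(void)   { }
static void kunmap_stub(void) { }

/*
 * Models pool_alloc(..., GFP_KERNEL): such an allocation may sleep, so it
 * is only legal outside atomic context.  This is the condition that
 * ___might_sleep() enforces in the trace above.
 */
static bool gfp_kernel_alloc_legal(void)
{
	return !preempt_disabled;
}

/* Old (buggy) flow: atomic map, then a possibly-sleeping allocation. */
static bool old_flow_legal(void)
{
	kmap_atomic_stub();
	bool ok = gfp_kernel_alloc_legal();  /* false: BUG in the real kernel */
	kunmap_atomic_stub();
	return ok;
}

/* Fixed flow: a non-atomic map keeps the allocation legal. */
static bool new_flow_legal(void)
{
	kmap_stub();
	bool ok = gfp_kernel_alloc_legal();
	kunmap_stub();
	return ok;
}
```

The model makes the shape of both fix options visible: option 1) changes the map/unmap calls (as this patch does); option 2) would instead have to change the allocation inside the compress step.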
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Fixes: 895d8ebeaa924 ("drm/i915: error capture with no ggtt slot")
---
drivers/gpu/drm/i915/i915_gpu_error.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 1f2f266f26af..7118ecb7f144 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1007,67 +1007,67 @@ i915_error_object_create(struct drm_i915_private *i915,
 	compress->wc = i915_gem_object_is_lmem(vma->obj) ||
 		       drm_mm_node_allocated(&ggtt->error_capture);
 
 	ret = -EINVAL;
 	if (drm_mm_node_allocated(&ggtt->error_capture)) {
 		void __iomem *s;
 		dma_addr_t dma;
 
 		for_each_sgt_daddr(dma, iter, vma->pages) {
 			ggtt->vm.insert_page(&ggtt->vm, dma, slot,
 					     I915_CACHE_NONE, 0);
 
 			s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
 			ret = compress_page(compress, (void __force *)s, dst);
 			io_mapping_unmap(s);
 			if (ret)
 				break;
 		}
 	} else if (i915_gem_object_is_lmem(vma->obj)) {
 		struct intel_memory_region *mem = vma->obj->mm.region;
 		dma_addr_t dma;
 
 		for_each_sgt_daddr(dma, iter, vma->pages) {
 			void __iomem *s;
 
-			s = io_mapping_map_atomic_wc(&mem->iomap, dma);
+			s = io_mapping_map_wc(&mem->iomap, dma, PAGE_SIZE);
 			ret = compress_page(compress, (void __force *)s, dst);
-			io_mapping_unmap_atomic(s);
+			io_mapping_unmap(s);
 			if (ret)
 				break;
 		}
 	} else {
 		struct page *page;
 
 		for_each_sgt_page(page, iter, vma->pages) {
 			void *s;
 
 			drm_clflush_pages(&page, 1);
 
-			s = kmap_atomic(page);
+			s = kmap(page);
 			ret = compress_page(compress, s, dst);
-			kunmap_atomic(s);
+			kunmap(s);
 
 			drm_clflush_pages(&page, 1);
 
 			if (ret)
 				break;
 		}
 	}
 
 	if (ret || compress_flush(compress, dst)) {
 		while (dst->page_count--)
 			pool_free(&compress->pool, dst->pages[dst->page_count]);
 		kfree(dst);
 		dst = NULL;
 	}
 	compress_finish(compress);
 
 	return dst;
 }
 
 /*
  * Generate a semi-unique error code. The code is not meant to have meaning, The
  * code's only purpose is to try to prevent false duplicated bug reports by
  * grossly estimating a GPU error state.
  *
  * TODO Ideally, hashing the batchbuffer would be a very nice way to determine
2.24.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
* ✗ Fi.CI.BAT: failure for drm/i915: Fix a bug calling sleep function in atomic context (rev4)
@ 2019-11-14 3:44 ` Patchwork
0 siblings, 0 replies; 11+ messages in thread
From: Patchwork @ 2019-11-14 3:44 UTC (permalink / raw)
To: Bruce Chang; +Cc: intel-gfx
== Series Details ==
Series: drm/i915: Fix a bug calling sleep function in atomic context (rev4)
URL : https://patchwork.freedesktop.org/series/69385/
State : failure
== Summary ==
Applying: drm/i915: Fix a bug calling sleep function in atomic context
Using index info to reconstruct a base tree...
M drivers/gpu/drm/i915/i915_gpu_error.c
Falling back to patching base and 3-way merge...
No changes -- Patch already applied.
* [PATCH] drm/i915: Fix a bug calling sleep function in atomic context
@ 2019-11-13 23:11 Bruce Chang
2019-11-13 23:20 ` Chris Wilson
2019-11-18 15:59 ` Jani Nikula
0 siblings, 2 replies; 11+ messages in thread
From: Bruce Chang @ 2019-11-13 23:11 UTC (permalink / raw)
To: intel-gfx
below is the call trace when this issue is hit
<3> [113.316247] BUG: sleeping function called from invalid context at mm/page_alloc.c:4653
<3> [113.318190] in_atomic(): 1, irqs_disabled(): 0, pid: 678, name: debugfs_test
<4> [113.319900] no locks held by debugfs_test/678.
<3> [113.321002] Preemption disabled at:
<4> [113.321130] [<ffffffffa02506d4>] i915_error_object_create+0x494/0x610 [i915]
<4> [113.327259] Call Trace:
<4> [113.327871] dump_stack+0x67/0x9b
<4> [113.328683] ___might_sleep+0x167/0x250
<4> [113.329618] __alloc_pages_nodemask+0x26b/0x1110
<4> [113.330731] ? ___slab_alloc.constprop.34+0x21c/0x380
<4> [113.331943] ? ___slab_alloc.constprop.34+0x21c/0x380
<4> [113.333169] ? __slab_alloc.isra.28.constprop.33+0x4d/0x70
<4> [113.334614] pool_alloc.constprop.19+0x14/0x60 [i915]
<4> [113.335951] compress_page+0x7c/0x100 [i915]
<4> [113.337110] i915_error_object_create+0x4bd/0x610 [i915]
<4> [113.338515] i915_capture_gpu_state+0x384/0x1680 [i915]
<4> [113.339771] ? __lock_acquire+0x4ac/0x1e90
<4> [113.340785] ? _raw_spin_lock_irqsave_nested+0x1/0x50
<4> [113.342127] i915_gpu_info_open+0x44/0x70 [i915]
<4> [113.343243] full_proxy_open+0x139/0x1b0
<4> [113.344196] ? open_proxy_open+0xc0/0xc0
<4> [113.345149] do_dentry_open+0x1ce/0x3a0
<4> [113.346084] path_openat+0x4c9/0xac0
<4> [113.346967] do_filp_open+0x96/0x110
<4> [113.347848] ? __alloc_fd+0xe0/0x1f0
<4> [113.348736] ? do_sys_open+0x1b8/0x250
<4> [113.349647] do_sys_open+0x1b8/0x250
<4> [113.350526] do_syscall_64+0x55/0x1c0
<4> [113.351418] entry_SYSCALL_64_after_hwframe+0x49/0xbe
After the io_mapping_map_atomic_wc/kmap_atomic, the kernel enters atomic context
but after that, compress_page calls pool_alloc with GFP_KERNEL flag which can
potentially go to sleep. When the kernel is in atomic context, sleeping is not
allowed. This is why this bug got triggered.
In order to fix this issue, we either
1) not enter into atomic context, i.e., to use non atomic version of
functions like io_mapping_map_wc/kmap,
or
2) make compress_page run in atomic context.
But it is not a good idea to run slow compression in atomic context, so,
1) above is preferred solution which is the implementation of this patch.
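As a hedged illustration of option 1), the shape of the fixed per-page loop can be modeled in userspace C. All names here (`map_page_wc_stub`, `compress_page_stub`, `capture_pages`) are stand-ins, not the real i915 functions: each page is mapped non-atomically, compressed (which may allocate, and hence may sleep), then unmapped.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum { NPAGES = 4, PAGE_SZ = 64 };

static int preempt_disabled;  /* models atomic context; never set in the fixed flow */

/* io_mapping_map_wc()/kmap() stand-in: non-atomic, so no preempt_disable(). */
static char *map_page_wc_stub(char *page)
{
	return page;
}

static void unmap_stub(char *s)
{
	(void)s;
}

/* compress_page() stand-in: allocates, as pool_alloc(..., GFP_KERNEL) would. */
static int compress_page_stub(const char *src, char *dst)
{
	assert(!preempt_disabled);    /* a sleeping allocation must not run atomically */
	char *tmp = malloc(PAGE_SZ);  /* models the allocation that triggered the BUG */
	if (!tmp)
		return -1;
	memcpy(tmp, src, PAGE_SZ);    /* "compression" reduced to a plain copy */
	memcpy(dst, tmp, PAGE_SZ);
	free(tmp);
	return 0;
}

/* Shape of the fixed loop in i915_error_object_create(): map, compress, unmap. */
static int capture_pages(char pages[][PAGE_SZ], char out[][PAGE_SZ], int n)
{
	for (int i = 0; i < n; i++) {
		char *s = map_page_wc_stub(pages[i]);
		int ret = compress_page_stub(s, out[i]);
		unmap_stub(s);
		if (ret)
			return ret;
	}
	return 0;
}
```

Because the map stub never disables preemption, the allocation inside the compress step stays legal for every page, which is exactly what the patch achieves by switching to the non-atomic mapping functions.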
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Fixes: 895d8ebeaa924 ("drm/i915: error capture with no ggtt slot")
---
drivers/gpu/drm/i915/i915_gpu_error.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 1f2f266f26af..7118ecb7f144 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1007,67 +1007,67 @@ i915_error_object_create(struct drm_i915_private *i915,
 	compress->wc = i915_gem_object_is_lmem(vma->obj) ||
 		       drm_mm_node_allocated(&ggtt->error_capture);
 
 	ret = -EINVAL;
 	if (drm_mm_node_allocated(&ggtt->error_capture)) {
 		void __iomem *s;
 		dma_addr_t dma;
 
 		for_each_sgt_daddr(dma, iter, vma->pages) {
 			ggtt->vm.insert_page(&ggtt->vm, dma, slot,
 					     I915_CACHE_NONE, 0);
 
 			s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
 			ret = compress_page(compress, (void __force *)s, dst);
 			io_mapping_unmap(s);
 			if (ret)
 				break;
 		}
 	} else if (i915_gem_object_is_lmem(vma->obj)) {
 		struct intel_memory_region *mem = vma->obj->mm.region;
 		dma_addr_t dma;
 
 		for_each_sgt_daddr(dma, iter, vma->pages) {
 			void __iomem *s;
 
-			s = io_mapping_map_atomic_wc(&mem->iomap, dma);
+			s = io_mapping_map_wc(&mem->iomap, dma, PAGE_SIZE);
 			ret = compress_page(compress, (void __force *)s, dst);
-			io_mapping_unmap_atomic(s);
+			io_mapping_unmap(s);
 			if (ret)
 				break;
 		}
 	} else {
 		struct page *page;
 
 		for_each_sgt_page(page, iter, vma->pages) {
 			void *s;
 
 			drm_clflush_pages(&page, 1);
 
-			s = kmap_atomic(page);
+			s = kmap(page);
 			ret = compress_page(compress, s, dst);
-			kunmap_atomic(s);
+			kunmap(s);
 
 			drm_clflush_pages(&page, 1);
 
 			if (ret)
 				break;
 		}
 	}
 
 	if (ret || compress_flush(compress, dst)) {
 		while (dst->page_count--)
 			pool_free(&compress->pool, dst->pages[dst->page_count]);
 		kfree(dst);
 		dst = NULL;
 	}
 	compress_finish(compress);
 
 	return dst;
 }
 
 /*
  * Generate a semi-unique error code. The code is not meant to have meaning, The
  * code's only purpose is to try to prevent false duplicated bug reports by
  * grossly estimating a GPU error state.
  *
  * TODO Ideally, hashing the batchbuffer would be a very nice way to determine
--
2.24.0
* Re: [PATCH] drm/i915: Fix a bug calling sleep function in atomic context
2019-11-13 23:11 [PATCH] drm/i915: Fix a bug calling sleep function in atomic context Bruce Chang
@ 2019-11-13 23:20 ` Chris Wilson
2019-11-18 15:59 ` Jani Nikula
1 sibling, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2019-11-13 23:20 UTC (permalink / raw)
To: Bruce Chang, intel-gfx
Quoting Bruce Chang (2019-11-13 23:11:04)
> below is the call trace when this issue is hit
>
> <3> [113.316247] BUG: sleeping function called from invalid context at mm/page_alloc.c:4653
> <3> [113.318190] in_atomic(): 1, irqs_disabled(): 0, pid: 678, name: debugfs_test
> <4> [113.319900] no locks held by debugfs_test/678.
> <3> [113.321002] Preemption disabled at:
> <4> [113.321130] [<ffffffffa02506d4>] i915_error_object_create+0x494/0x610 [i915]
> <4> [113.327259] Call Trace:
> <4> [113.327871] dump_stack+0x67/0x9b
> <4> [113.328683] ___might_sleep+0x167/0x250
> <4> [113.329618] __alloc_pages_nodemask+0x26b/0x1110
> <4> [113.330731] ? ___slab_alloc.constprop.34+0x21c/0x380
> <4> [113.331943] ? ___slab_alloc.constprop.34+0x21c/0x380
> <4> [113.333169] ? __slab_alloc.isra.28.constprop.33+0x4d/0x70
> <4> [113.334614] pool_alloc.constprop.19+0x14/0x60 [i915]
> <4> [113.335951] compress_page+0x7c/0x100 [i915]
> <4> [113.337110] i915_error_object_create+0x4bd/0x610 [i915]
> <4> [113.338515] i915_capture_gpu_state+0x384/0x1680 [i915]
> <4> [113.339771] ? __lock_acquire+0x4ac/0x1e90
> <4> [113.340785] ? _raw_spin_lock_irqsave_nested+0x1/0x50
> <4> [113.342127] i915_gpu_info_open+0x44/0x70 [i915]
> <4> [113.343243] full_proxy_open+0x139/0x1b0
> <4> [113.344196] ? open_proxy_open+0xc0/0xc0
> <4> [113.345149] do_dentry_open+0x1ce/0x3a0
> <4> [113.346084] path_openat+0x4c9/0xac0
> <4> [113.346967] do_filp_open+0x96/0x110
> <4> [113.347848] ? __alloc_fd+0xe0/0x1f0
> <4> [113.348736] ? do_sys_open+0x1b8/0x250
> <4> [113.349647] do_sys_open+0x1b8/0x250
> <4> [113.350526] do_syscall_64+0x55/0x1c0
> <4> [113.351418] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> After the io_mapping_map_atomic_wc/kmap_atomic, the kernel enters atomic context
> but after that, compress_page calls pool_alloc with GFP_KERNEL flag which can
> potentially go to sleep. When the kernel is in atomic context, sleeping is not
> allowed. This is why this bug got triggered.
The last 2 sentences are redundant.
> In order to fix this issue, we either
> 1) not enter into atomic context, i.e., to use non atomic version of
> functions like io_mapping_map_wc/kmap,
> or
> 2) make compress_page run in atomic context.
>
> But it is not a good idea to run slow compression in atomic context, so,
> 1) above is preferred solution which is the implementation of this patch.
Reasonable, though we have had to do capture inside atomic context before and
may have to again. (Dropping the atomicity is a recent change that has
attracted a surprising amount of controversy.)
> Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
> Reviewed-by: Brian Welty <brian.welty@intel.com>
> Fixes: 895d8ebeaa924 ("drm/i915: error capture with no ggtt slot")
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
* Re: [PATCH] drm/i915: Fix a bug calling sleep function in atomic context
2019-11-13 23:11 [PATCH] drm/i915: Fix a bug calling sleep function in atomic context Bruce Chang
2019-11-13 23:20 ` Chris Wilson
@ 2019-11-18 15:59 ` Jani Nikula
1 sibling, 0 replies; 11+ messages in thread
From: Jani Nikula @ 2019-11-18 15:59 UTC (permalink / raw)
To: Bruce Chang, intel-gfx
On Wed, 13 Nov 2019, Bruce Chang <yu.bruce.chang@intel.com> wrote:
> below is the call trace when this issue is hit
>
> <3> [113.316247] BUG: sleeping function called from invalid context at mm/page_alloc.c:4653
> <3> [113.318190] in_atomic(): 1, irqs_disabled(): 0, pid: 678, name: debugfs_test
> <4> [113.319900] no locks held by debugfs_test/678.
> <3> [113.321002] Preemption disabled at:
> <4> [113.321130] [<ffffffffa02506d4>] i915_error_object_create+0x494/0x610 [i915]
> <4> [113.327259] Call Trace:
> <4> [113.327871] dump_stack+0x67/0x9b
> <4> [113.328683] ___might_sleep+0x167/0x250
> <4> [113.329618] __alloc_pages_nodemask+0x26b/0x1110
> <4> [113.330731] ? ___slab_alloc.constprop.34+0x21c/0x380
> <4> [113.331943] ? ___slab_alloc.constprop.34+0x21c/0x380
> <4> [113.333169] ? __slab_alloc.isra.28.constprop.33+0x4d/0x70
> <4> [113.334614] pool_alloc.constprop.19+0x14/0x60 [i915]
> <4> [113.335951] compress_page+0x7c/0x100 [i915]
> <4> [113.337110] i915_error_object_create+0x4bd/0x610 [i915]
> <4> [113.338515] i915_capture_gpu_state+0x384/0x1680 [i915]
> <4> [113.339771] ? __lock_acquire+0x4ac/0x1e90
> <4> [113.340785] ? _raw_spin_lock_irqsave_nested+0x1/0x50
> <4> [113.342127] i915_gpu_info_open+0x44/0x70 [i915]
> <4> [113.343243] full_proxy_open+0x139/0x1b0
> <4> [113.344196] ? open_proxy_open+0xc0/0xc0
> <4> [113.345149] do_dentry_open+0x1ce/0x3a0
> <4> [113.346084] path_openat+0x4c9/0xac0
> <4> [113.346967] do_filp_open+0x96/0x110
> <4> [113.347848] ? __alloc_fd+0xe0/0x1f0
> <4> [113.348736] ? do_sys_open+0x1b8/0x250
> <4> [113.349647] do_sys_open+0x1b8/0x250
> <4> [113.350526] do_syscall_64+0x55/0x1c0
> <4> [113.351418] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> After the io_mapping_map_atomic_wc/kmap_atomic, the kernel enters atomic context
> but after that, compress_page calls pool_alloc with GFP_KERNEL flag which can
> potentially go to sleep. When the kernel is in atomic context, sleeping is not
> allowed. This is why this bug got triggered.
>
> In order to fix this issue, we either
> 1) not enter into atomic context, i.e., to use non atomic version of
> functions like io_mapping_map_wc/kmap,
> or
> 2) make compress_page run in atomic context.
>
> But it is not a good idea to run slow compression in atomic context, so,
> 1) above is preferred solution which is the implementation of this patch.
>
> Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
> Reviewed-by: Brian Welty <brian.welty@intel.com>
> Fixes: 895d8ebeaa924 ("drm/i915: error capture with no ggtt slot")
> ---
> drivers/gpu/drm/i915/i915_gpu_error.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 1f2f266f26af..7118ecb7f144 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1007,67 +1007,67 @@ i915_error_object_create(struct drm_i915_private *i915,
> compress->wc = i915_gem_object_is_lmem(vma->obj) ||
> drm_mm_node_allocated(&ggtt->error_capture);
>
> ret = -EINVAL;
> if (drm_mm_node_allocated(&ggtt->error_capture)) {
> void __iomem *s;
> dma_addr_t dma;
>
> for_each_sgt_daddr(dma, iter, vma->pages) {
> ggtt->vm.insert_page(&ggtt->vm, dma, slot,
> I915_CACHE_NONE, 0);
>
> s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
> ret = compress_page(compress, (void __force *)s, dst);
> io_mapping_unmap(s);
> if (ret)
> break;
> }
> } else if (i915_gem_object_is_lmem(vma->obj)) {
> struct intel_memory_region *mem = vma->obj->mm.region;
> dma_addr_t dma;
>
> for_each_sgt_daddr(dma, iter, vma->pages) {
> void __iomem *s;
>
> - s = io_mapping_map_atomic_wc(&mem->iomap, dma);
> + s = io_mapping_map_wc(&mem->iomap, dma, PAGE_SIZE);
> ret = compress_page(compress, (void __force *)s, dst);
> - io_mapping_unmap_atomic(s);
> + io_mapping_unmap(s);
> if (ret)
> break;
> }
> } else {
> struct page *page;
>
> for_each_sgt_page(page, iter, vma->pages) {
> void *s;
>
> drm_clflush_pages(&page, 1);
>
> - s = kmap_atomic(page);
> + s = kmap(page);
> ret = compress_page(compress, s, dst);
> - kunmap_atomic(s);
> + kunmap(s);
>
> drm_clflush_pages(&page, 1);
>
> if (ret)
> break;
> }
> }
>
> if (ret || compress_flush(compress, dst)) {
> while (dst->page_count--)
> pool_free(&compress->pool, dst->pages[dst->page_count]);
> kfree(dst);
> dst = NULL;
> }
> compress_finish(compress);
>
> return dst;
> }
>
> /*
> * Generate a semi-unique error code. The code is not meant to have meaning, The
> * code's only purpose is to try to prevent false duplicated bug reports by
> * grossly estimating a GPU error state.
> *
> * TODO Ideally, hashing the batchbuffer would be a very nice way to determine
For future reference, please just use the default patch context size.
BR,
Jani.
--
Jani Nikula, Intel Open Source Graphics Center
* [PATCH] drm/i915: Fix a bug calling sleep function in atomic context
@ 2019-11-13 19:52 Bruce Chang
2019-11-13 20:05 ` Chris Wilson
0 siblings, 1 reply; 11+ messages in thread
From: Bruce Chang @ 2019-11-13 19:52 UTC (permalink / raw)
To: intel-gfx
There are quite a few reports regarding "BUG: sleeping function called from
invalid context at mm/page_alloc.c"
Basically after the io_mapping_map_atomic_wc/kmap_atomic, it enters atomic
context, but compress_page cannot be called in atomic context as it will
call pool_alloc with GFP_KERNEL flag which can go to sleep. This is why
the bug got reported.
So, change these to the non-atomic versions instead.
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Fixes: 895d8ebeaa924 ("drm/i915: error capture with no ggtt slot")
---
drivers/gpu/drm/i915/i915_gpu_error.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 1f2f266f26af..7118ecb7f144 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1007,67 +1007,67 @@ i915_error_object_create(struct drm_i915_private *i915,
 	compress->wc = i915_gem_object_is_lmem(vma->obj) ||
 		       drm_mm_node_allocated(&ggtt->error_capture);
 
 	ret = -EINVAL;
 	if (drm_mm_node_allocated(&ggtt->error_capture)) {
 		void __iomem *s;
 		dma_addr_t dma;
 
 		for_each_sgt_daddr(dma, iter, vma->pages) {
 			ggtt->vm.insert_page(&ggtt->vm, dma, slot,
 					     I915_CACHE_NONE, 0);
 
 			s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
 			ret = compress_page(compress, (void __force *)s, dst);
 			io_mapping_unmap(s);
 			if (ret)
 				break;
 		}
 	} else if (i915_gem_object_is_lmem(vma->obj)) {
 		struct intel_memory_region *mem = vma->obj->mm.region;
 		dma_addr_t dma;
 
 		for_each_sgt_daddr(dma, iter, vma->pages) {
 			void __iomem *s;
 
-			s = io_mapping_map_atomic_wc(&mem->iomap, dma);
+			s = io_mapping_map_wc(&mem->iomap, dma, PAGE_SIZE);
 			ret = compress_page(compress, (void __force *)s, dst);
-			io_mapping_unmap_atomic(s);
+			io_mapping_unmap(s);
 			if (ret)
 				break;
 		}
 	} else {
 		struct page *page;
 
 		for_each_sgt_page(page, iter, vma->pages) {
 			void *s;
 
 			drm_clflush_pages(&page, 1);
 
-			s = kmap_atomic(page);
+			s = kmap(page);
 			ret = compress_page(compress, s, dst);
-			kunmap_atomic(s);
+			kunmap(s);
 
 			drm_clflush_pages(&page, 1);
 
 			if (ret)
 				break;
 		}
 	}
 
 	if (ret || compress_flush(compress, dst)) {
 		while (dst->page_count--)
 			pool_free(&compress->pool, dst->pages[dst->page_count]);
 		kfree(dst);
 		dst = NULL;
 	}
 	compress_finish(compress);
 
 	return dst;
 }
 
 /*
  * Generate a semi-unique error code. The code is not meant to have meaning, The
  * code's only purpose is to try to prevent false duplicated bug reports by
  * grossly estimating a GPU error state.
  *
  * TODO Ideally, hashing the batchbuffer would be a very nice way to determine
--
2.24.0
* Re: [PATCH] drm/i915: Fix a bug calling sleep function in atomic context
2019-11-13 19:52 Bruce Chang
@ 2019-11-13 20:05 ` Chris Wilson
0 siblings, 0 replies; 11+ messages in thread
From: Chris Wilson @ 2019-11-13 20:05 UTC (permalink / raw)
To: Bruce Chang, intel-gfx
Quoting Bruce Chang (2019-11-13 19:52:44)
> There are quite a few reports regarding "BUG: sleeping function called from
> invalid context at mm/page_alloc.c"
>
> Basically after the io_mapping_map_atomic_wc/kmap_atomic, it enters atomic
> context, but compress_page cannot be called in atomic context as it will
> call pool_alloc with GFP_KERNEL flag which can go to sleep. This is why
> the bug got reported.
Just a trimmed stack trace showing the bug will do fine, as the distance
to might_sleep_if() is short.
Then all you need is a quick description of why that is a problem,
and why you chose to fix it as you did. The latter is so that we can
assess whether you've considered the alternatives; in this case the fix is
trivial, although the reason why GFP_KERNEL works for us here is not.
-Chris
* [PATCH] drm/i915: Fix a bug calling sleep function in atomic context
@ 2019-11-13 0:28 Bruce Chang
2019-11-13 19:32 ` Brian Welty
0 siblings, 1 reply; 11+ messages in thread
From: Bruce Chang @ 2019-11-13 0:28 UTC (permalink / raw)
To: intel-gfx
There are quite a few reports regarding "BUG: sleeping function called from
invalid context at mm/page_alloc.c"
Basically after the io_mapping_map_atomic_wc/kmap_atomic, it enters atomic
context, but compress_page cannot be called in atomic context as it will
call pool_alloc with GFP_KERNEL flag which can go to sleep. This is why
the bug got reported.
So, changed to non atomic version instead.
Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
---
drivers/gpu/drm/i915/i915_gpu_error.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 1f2f266f26af..7118ecb7f144 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1007,67 +1007,67 @@ i915_error_object_create(struct drm_i915_private *i915,
 	compress->wc = i915_gem_object_is_lmem(vma->obj) ||
 		       drm_mm_node_allocated(&ggtt->error_capture);
 
 	ret = -EINVAL;
 	if (drm_mm_node_allocated(&ggtt->error_capture)) {
 		void __iomem *s;
 		dma_addr_t dma;
 
 		for_each_sgt_daddr(dma, iter, vma->pages) {
 			ggtt->vm.insert_page(&ggtt->vm, dma, slot,
 					     I915_CACHE_NONE, 0);
 
 			s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
 			ret = compress_page(compress, (void __force *)s, dst);
 			io_mapping_unmap(s);
 			if (ret)
 				break;
 		}
 	} else if (i915_gem_object_is_lmem(vma->obj)) {
 		struct intel_memory_region *mem = vma->obj->mm.region;
 		dma_addr_t dma;
 
 		for_each_sgt_daddr(dma, iter, vma->pages) {
 			void __iomem *s;
 
-			s = io_mapping_map_atomic_wc(&mem->iomap, dma);
+			s = io_mapping_map_wc(&mem->iomap, dma, PAGE_SIZE);
 			ret = compress_page(compress, (void __force *)s, dst);
-			io_mapping_unmap_atomic(s);
+			io_mapping_unmap(s);
 			if (ret)
 				break;
 		}
 	} else {
 		struct page *page;
 
 		for_each_sgt_page(page, iter, vma->pages) {
 			void *s;
 
 			drm_clflush_pages(&page, 1);
 
-			s = kmap_atomic(page);
+			s = kmap(page);
 			ret = compress_page(compress, s, dst);
-			kunmap_atomic(s);
+			kunmap(s);
 
 			drm_clflush_pages(&page, 1);
 
 			if (ret)
 				break;
 		}
 	}
 
 	if (ret || compress_flush(compress, dst)) {
 		while (dst->page_count--)
 			pool_free(&compress->pool, dst->pages[dst->page_count]);
 		kfree(dst);
 		dst = NULL;
 	}
 	compress_finish(compress);
 
 	return dst;
 }
 
 /*
  * Generate a semi-unique error code. The code is not meant to have meaning, The
  * code's only purpose is to try to prevent false duplicated bug reports by
  * grossly estimating a GPU error state.
  *
  * TODO Ideally, hashing the batchbuffer would be a very nice way to determine
--
2.24.0
* Re: [PATCH] drm/i915: Fix a bug calling sleep function in atomic context
2019-11-13 0:28 Bruce Chang
@ 2019-11-13 19:32 ` Brian Welty
0 siblings, 0 replies; 11+ messages in thread
From: Brian Welty @ 2019-11-13 19:32 UTC (permalink / raw)
To: Bruce Chang, intel-gfx
On 11/12/2019 4:28 PM, Bruce Chang wrote:
> There are quite a few reports regarding "BUG: sleeping function called from
> invalid context at mm/page_alloc.c"
>
> Basically after the io_mapping_map_atomic_wc/kmap_atomic, it enters atomic
> context, but compress_page cannot be called in atomic context as it will
> call pool_alloc with GFP_KERNEL flag which can go to sleep. This is why
> the bug got reported.
>
> So, changed to non atomic version instead.
The atomic functions were recently added, so it seems worth a note that
you are fixing that patch by adding:
Fixes: 895d8ebeaa924 ("drm/i915: error capture with no ggtt slot")
And your fix here looks correct to me, so:
Reviewed-by: Brian Welty <brian.welty@intel.com>
>
> Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com>
> ---
> drivers/gpu/drm/i915/i915_gpu_error.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 1f2f266f26af..7118ecb7f144 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1007,67 +1007,67 @@ i915_error_object_create(struct drm_i915_private *i915,
> compress->wc = i915_gem_object_is_lmem(vma->obj) ||
> drm_mm_node_allocated(&ggtt->error_capture);
>
> ret = -EINVAL;
> if (drm_mm_node_allocated(&ggtt->error_capture)) {
> void __iomem *s;
> dma_addr_t dma;
>
> for_each_sgt_daddr(dma, iter, vma->pages) {
> ggtt->vm.insert_page(&ggtt->vm, dma, slot,
> I915_CACHE_NONE, 0);
>
> s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);
> ret = compress_page(compress, (void __force *)s, dst);
> io_mapping_unmap(s);
> if (ret)
> break;
> }
> } else if (i915_gem_object_is_lmem(vma->obj)) {
> struct intel_memory_region *mem = vma->obj->mm.region;
> dma_addr_t dma;
>
> for_each_sgt_daddr(dma, iter, vma->pages) {
> void __iomem *s;
>
> - s = io_mapping_map_atomic_wc(&mem->iomap, dma);
> + s = io_mapping_map_wc(&mem->iomap, dma, PAGE_SIZE);
> ret = compress_page(compress, (void __force *)s, dst);
> - io_mapping_unmap_atomic(s);
> + io_mapping_unmap(s);
> if (ret)
> break;
> }
> } else {
> struct page *page;
>
> for_each_sgt_page(page, iter, vma->pages) {
> void *s;
>
> drm_clflush_pages(&page, 1);
>
> - s = kmap_atomic(page);
> + s = kmap(page);
> ret = compress_page(compress, s, dst);
> - kunmap_atomic(s);
> + kunmap(s);
>
> drm_clflush_pages(&page, 1);
>
> if (ret)
> break;
> }
> }
>
> if (ret || compress_flush(compress, dst)) {
> while (dst->page_count--)
> pool_free(&compress->pool, dst->pages[dst->page_count]);
> kfree(dst);
> dst = NULL;
> }
> compress_finish(compress);
>
> return dst;
> }
>
> /*
> * Generate a semi-unique error code. The code is not meant to have meaning, The
> * code's only purpose is to try to prevent false duplicated bug reports by
> * grossly estimating a GPU error state.
> *
> * TODO Ideally, hashing the batchbuffer would be a very nice way to determine
>