* [PATCH] mm/memory.c: do_fault: avoid usage of stale vm_area_struct
From: Jan Stancek @ 2019-03-02 15:11 UTC
To: linux-mm, akpm, willy, peterz, riel, mhocko, ying.huang,
    jrdr.linux, jglisse, aneesh.kumar, david, aarcange, raquini,
    rientjes, kirill, mgorman, jstancek
Cc: linux-kernel

LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8.
This is a stress test, where one thread mmaps/writes/munmaps memory area
and other thread is trying to read from it:

  CPU: 0 PID: 2611 Comm: mmap1 Not tainted 5.0.0-rc8+ #51
  Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
  Krnl PSW : 0404e00180000000 00000000001ac8d8 (__lock_acquire+0x7/0x7a8)
  Call Trace:
  ([<0000000000000000>] (null))
   [<00000000001adae4>] lock_acquire+0xec/0x258
   [<000000000080d1ac>] _raw_spin_lock_bh+0x5c/0x98
   [<000000000012a780>] page_table_free+0x48/0x1a8
   [<00000000002f6e54>] do_fault+0xdc/0x670
   [<00000000002fadae>] __handle_mm_fault+0x416/0x5f0
   [<00000000002fb138>] handle_mm_fault+0x1b0/0x320
   [<00000000001248cc>] do_dat_exception+0x19c/0x2c8
   [<000000000080e5ee>] pgm_check_handler+0x19e/0x200

page_table_free() is called with NULL mm parameter, but because
"0" is a valid address on s390 (see S390_lowcore), it keeps
going until it eventually crashes in lockdep's lock_acquire.
This crash is reproducible at least since 4.14.

Problem is that "vmf->vma" used in do_fault() can become stale.
Because mmap_sem may be released, other threads can come in,
call munmap() and cause "vma" be returned to kmem cache, and
get zeroed/re-initialized and re-used:

  handle_mm_fault                         |
    __handle_mm_fault                     |
      do_fault                            |
        vma = vmf->vma                    |
        do_read_fault                     |
          __do_fault                      |
            vma->vm_ops->fault(vmf);      |
              mmap_sem is released        |
                                          |
                                          | do_munmap()
                                          |   remove_vma_list()
                                          |     remove_vma()
                                          |       vm_area_free()
                                          |         # vma is released
                                          | ...
                                          | # same vma is allocated
                                          | # from kmem cache
                                          | do_mmap()
                                          |   vm_area_alloc()
                                          |     memset(vma, 0, ...)
                                          |
      pte_free(vma->vm_mm, ...);          |
        page_table_free                   |
          spin_lock_bh(&mm->context.lock);|
            <crash>                       |

This patch pins mm_struct and stores its value, to avoid using
potentially stale "vma" when calling pte_free().

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest06/mmap1.c

Signed-off-by: Jan Stancek <jstancek@redhat.com>
---
 mm/memory.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index e11ca9dd823f..1287ee9acbdc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3517,12 +3517,17 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
  * but allow concurrent faults).
  * The mmap_sem may have been released depending on flags and our
  * return value. See filemap_fault() and __lock_page_or_retry().
+ * If mmap_sem is released, vma may become invalid (for example
+ * by other thread calling munmap()).
  */
 static vm_fault_t do_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
+	struct mm_struct *vm_mm = READ_ONCE(vma->vm_mm);
 	vm_fault_t ret;
 
+	mmgrab(vm_mm);
+
 	/*
 	 * The VMA was not fully populated on mmap() or missing VM_DONTEXPAND
 	 */
@@ -3561,9 +3566,12 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
 
 	/* preallocated pagetable is unused: free it */
 	if (vmf->prealloc_pte) {
-		pte_free(vma->vm_mm, vmf->prealloc_pte);
+		pte_free(vm_mm, vmf->prealloc_pte);
 		vmf->prealloc_pte = NULL;
 	}
+
+	mmdrop(vm_mm);
+
 	return ret;
 }
 
-- 
1.8.3.1

* Re: [PATCH] mm/memory.c: do_fault: avoid usage of stale vm_area_struct
From: Matthew Wilcox @ 2019-03-02 17:10 UTC
To: Jan Stancek
Cc: linux-mm, akpm, peterz, riel, mhocko, ying.huang, jrdr.linux,
    jglisse, aneesh.kumar, david, aarcange, raquini, rientjes, kirill,
    mgorman, linux-kernel

On Sat, Mar 02, 2019 at 04:11:26PM +0100, Jan Stancek wrote:
> Problem is that "vmf->vma" used in do_fault() can become stale.
> Because mmap_sem may be released, other threads can come in,
> call munmap() and cause "vma" be returned to kmem cache, and
> get zeroed/re-initialized and re-used:

> This patch pins mm_struct and stores its value, to avoid using
> potentially stale "vma" when calling pte_free().

OK, we need to cache the mm_struct, but why do we need the extra atomic op?
There's surely no way the mm can be freed while the thread is in the middle
of handling a fault.

ie I would drop these lines:

> +	mmgrab(vm_mm);
> +
...
> +
> +	mmdrop(vm_mm);
> +

* Re: [PATCH] mm/memory.c: do_fault: avoid usage of stale vm_area_struct
From: Jan Stancek @ 2019-03-02 18:00 UTC
To: Matthew Wilcox
Cc: linux-mm, akpm, peterz, riel, mhocko, ying huang, jrdr linux,
    jglisse, aneesh kumar, david, aarcange, raquini, rientjes, kirill,
    mgorman, linux-kernel

----- Original Message -----
> On Sat, Mar 02, 2019 at 04:11:26PM +0100, Jan Stancek wrote:
> > Problem is that "vmf->vma" used in do_fault() can become stale.
> > Because mmap_sem may be released, other threads can come in,
> > call munmap() and cause "vma" be returned to kmem cache, and
> > get zeroed/re-initialized and re-used:
>
> > This patch pins mm_struct and stores its value, to avoid using
> > potentially stale "vma" when calling pte_free().
>
> OK, we need to cache the mm_struct, but why do we need the extra atomic op?
> There's surely no way the mm can be freed while the thread is in the middle
> of handling a fault.

You're right, I was needlessly paranoid.

>
> ie I would drop these lines:

I'll send v2.

Thanks,
Jan

>
> > +	mmgrab(vm_mm);
> > +
> ...
> > +
> > +	mmdrop(vm_mm);
> > +
>

* [PATCH v2] mm/memory.c: do_fault: avoid usage of stale vm_area_struct
From: Jan Stancek @ 2019-03-02 18:19 UTC
To: linux-mm, akpm, willy, peterz, riel, mhocko, ying.huang,
    jrdr.linux, jglisse, aneesh.kumar, david, aarcange, raquini,
    rientjes, kirill, mgorman, jstancek
Cc: linux-kernel

LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8.
This is a stress test, where one thread mmaps/writes/munmaps memory area
and other thread is trying to read from it:

  CPU: 0 PID: 2611 Comm: mmap1 Not tainted 5.0.0-rc8+ #51
  Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
  Krnl PSW : 0404e00180000000 00000000001ac8d8 (__lock_acquire+0x7/0x7a8)
  Call Trace:
  ([<0000000000000000>] (null))
   [<00000000001adae4>] lock_acquire+0xec/0x258
   [<000000000080d1ac>] _raw_spin_lock_bh+0x5c/0x98
   [<000000000012a780>] page_table_free+0x48/0x1a8
   [<00000000002f6e54>] do_fault+0xdc/0x670
   [<00000000002fadae>] __handle_mm_fault+0x416/0x5f0
   [<00000000002fb138>] handle_mm_fault+0x1b0/0x320
   [<00000000001248cc>] do_dat_exception+0x19c/0x2c8
   [<000000000080e5ee>] pgm_check_handler+0x19e/0x200

page_table_free() is called with NULL mm parameter, but because
"0" is a valid address on s390 (see S390_lowcore), it keeps
going until it eventually crashes in lockdep's lock_acquire.
This crash is reproducible at least since 4.14.

Problem is that "vmf->vma" used in do_fault() can become stale.
Because mmap_sem may be released, other threads can come in,
call munmap() and cause "vma" be returned to kmem cache, and
get zeroed/re-initialized and re-used:

  handle_mm_fault                         |
    __handle_mm_fault                     |
      do_fault                            |
        vma = vmf->vma                    |
        do_read_fault                     |
          __do_fault                      |
            vma->vm_ops->fault(vmf);      |
              mmap_sem is released        |
                                          |
                                          | do_munmap()
                                          |   remove_vma_list()
                                          |     remove_vma()
                                          |       vm_area_free()
                                          |         # vma is released
                                          | ...
                                          | # same vma is allocated
                                          | # from kmem cache
                                          | do_mmap()
                                          |   vm_area_alloc()
                                          |     memset(vma, 0, ...)
                                          |
      pte_free(vma->vm_mm, ...);          |
        page_table_free                   |
          spin_lock_bh(&mm->context.lock);|
            <crash>                       |

Cache mm_struct to avoid using potentially stale "vma".

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest06/mmap1.c

Signed-off-by: Jan Stancek <jstancek@redhat.com>
---
 mm/memory.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index e11ca9dd823f..6c1afc1ece50 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3517,10 +3517,13 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
  * but allow concurrent faults).
  * The mmap_sem may have been released depending on flags and our
  * return value. See filemap_fault() and __lock_page_or_retry().
+ * If mmap_sem is released, vma may become invalid (for example
+ * by other thread calling munmap()).
  */
 static vm_fault_t do_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
+	struct mm_struct *vm_mm = READ_ONCE(vma->vm_mm);
 	vm_fault_t ret;
 
 	/*
@@ -3561,7 +3564,7 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
 
 	/* preallocated pagetable is unused: free it */
 	if (vmf->prealloc_pte) {
-		pte_free(vma->vm_mm, vmf->prealloc_pte);
+		pte_free(vm_mm, vmf->prealloc_pte);
 		vmf->prealloc_pte = NULL;
 	}
 	return ret;
-- 
1.8.3.1

* Re: [PATCH v2] mm/memory.c: do_fault: avoid usage of stale vm_area_struct
From: Peter Zijlstra @ 2019-03-02 18:45 UTC
To: Jan Stancek
Cc: linux-mm, akpm, willy, riel, mhocko, ying.huang, jrdr.linux,
    jglisse, aneesh.kumar, david, aarcange, raquini, rientjes, kirill,
    mgorman, linux-kernel

On Sat, Mar 02, 2019 at 07:19:39PM +0100, Jan Stancek wrote:
>  static vm_fault_t do_fault(struct vm_fault *vmf)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
> +	struct mm_struct *vm_mm = READ_ONCE(vma->vm_mm);

Would this not need a corresponding WRITE_ONCE() in vma_init() ?

* Re: [PATCH v2] mm/memory.c: do_fault: avoid usage of stale vm_area_struct
From: Andrea Arcangeli @ 2019-03-02 18:51 UTC
To: Jan Stancek
Cc: linux-mm, akpm, willy, peterz, riel, mhocko, ying.huang,
    jrdr.linux, jglisse, aneesh.kumar, david, raquini, rientjes, kirill,
    mgorman, linux-kernel

Hello Jan,

On Sat, Mar 02, 2019 at 07:19:39PM +0100, Jan Stancek wrote:
> +	struct mm_struct *vm_mm = READ_ONCE(vma->vm_mm);

The vma->vm_mm cannot change under gcc there, so no need of
READ_ONCE. The release of mmap_sem has release semantics so the
vma->vm_mm access cannot be reordered after up_read(mmap_sem) either.

Other than the above detail:

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

Thanks,
Andrea

* Re: [PATCH v2] mm/memory.c: do_fault: avoid usage of stale vm_area_struct
From: Jan Stancek @ 2019-03-03 7:27 UTC
To: Andrea Arcangeli, peterz
Cc: linux-mm, akpm, willy, riel, mhocko, ying huang, jrdr linux,
    jglisse, aneesh kumar, david, raquini, rientjes, kirill, mgorman,
    linux-kernel

----- Original Message -----
> Hello Jan,
>
> On Sat, Mar 02, 2019 at 07:19:39PM +0100, Jan Stancek wrote:
> > +	struct mm_struct *vm_mm = READ_ONCE(vma->vm_mm);
>
> The vma->vm_mm cannot change under gcc there, so no need of
> READ_ONCE. The release of mmap_sem has release semantics so the
> vma->vm_mm access cannot be reordered after up_read(mmap_sem) either.
>
> Other than the above detail:
>
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

Thank you for review, I dropped READ_ONCE and sent v3 with your
Reviewed-by included. I also successfully re-ran tests over-night.

> Would this not need a corresponding WRITE_ONCE() in vma_init() ?

There's at least 2 context switches between, so I think it wouldn't matter.
My concern was gcc optimizing out vm_mm, and vma->vm_mm access happening only
after do_read_fault().

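The READ_ONCE() exchange above is about whether the compiler may defer or repeat the load of vma->vm_mm. As a rough user-space illustration, assuming GCC or Clang (for __typeof__), a volatile cast forces a single load at the point where it is written. READ_ONCE_SKETCH, struct mm, struct vma and snapshot_mm() are hypothetical names for this sketch; the macro is a simplified stand-in and not the kernel's definition.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in, NOT the kernel macro: one volatile load of x. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical user-space stand-ins for mm_struct and vm_area_struct. */
struct mm { int id; };
struct vma { struct mm *vm_mm; };

/* Snapshot vm_mm at a fixed point; later stores to the vma do not affect it. */
struct mm *snapshot_mm(struct vma *vma)
{
	struct mm *vm_mm = READ_ONCE_SKETCH(vma->vm_mm);

	vma->vm_mm = NULL;	/* stands in for a later re-initialisation */
	return vm_mm;
}
```

As the replies above note, the review concluded that a plain load is sufficient in do_fault(), so this only illustrates what the macro would have guaranteed.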
* [PATCH v3] mm/memory.c: do_fault: avoid usage of stale vm_area_struct
From: Jan Stancek @ 2019-03-03 7:28 UTC
To: linux-mm, akpm, willy, peterz, riel, mhocko, ying.huang,
    jrdr.linux, jglisse, aneesh.kumar, david, aarcange, raquini,
    rientjes, kirill, mgorman, jstancek
Cc: linux-kernel

LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8.
This is a stress test, where one thread mmaps/writes/munmaps memory area
and other thread is trying to read from it:

  CPU: 0 PID: 2611 Comm: mmap1 Not tainted 5.0.0-rc8+ #51
  Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
  Krnl PSW : 0404e00180000000 00000000001ac8d8 (__lock_acquire+0x7/0x7a8)
  Call Trace:
  ([<0000000000000000>] (null))
   [<00000000001adae4>] lock_acquire+0xec/0x258
   [<000000000080d1ac>] _raw_spin_lock_bh+0x5c/0x98
   [<000000000012a780>] page_table_free+0x48/0x1a8
   [<00000000002f6e54>] do_fault+0xdc/0x670
   [<00000000002fadae>] __handle_mm_fault+0x416/0x5f0
   [<00000000002fb138>] handle_mm_fault+0x1b0/0x320
   [<00000000001248cc>] do_dat_exception+0x19c/0x2c8
   [<000000000080e5ee>] pgm_check_handler+0x19e/0x200

page_table_free() is called with NULL mm parameter, but because
"0" is a valid address on s390 (see S390_lowcore), it keeps
going until it eventually crashes in lockdep's lock_acquire.
This crash is reproducible at least since 4.14.

Problem is that "vmf->vma" used in do_fault() can become stale.
Because mmap_sem may be released, other threads can come in,
call munmap() and cause "vma" be returned to kmem cache, and
get zeroed/re-initialized and re-used:

  handle_mm_fault                         |
    __handle_mm_fault                     |
      do_fault                            |
        vma = vmf->vma                    |
        do_read_fault                     |
          __do_fault                      |
            vma->vm_ops->fault(vmf);      |
              mmap_sem is released        |
                                          |
                                          | do_munmap()
                                          |   remove_vma_list()
                                          |     remove_vma()
                                          |       vm_area_free()
                                          |         # vma is released
                                          | ...
                                          | # same vma is allocated
                                          | # from kmem cache
                                          | do_mmap()
                                          |   vm_area_alloc()
                                          |     memset(vma, 0, ...)
                                          |
      pte_free(vma->vm_mm, ...);          |
        page_table_free                   |
          spin_lock_bh(&mm->context.lock);|
            <crash>                       |

Cache mm_struct to avoid using potentially stale "vma".

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest06/mmap1.c

Signed-off-by: Jan Stancek <jstancek@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
---
 mm/memory.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index e11ca9dd823f..e8d69ade5acc 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3517,10 +3517,13 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
  * but allow concurrent faults).
  * The mmap_sem may have been released depending on flags and our
  * return value. See filemap_fault() and __lock_page_or_retry().
+ * If mmap_sem is released, vma may become invalid (for example
+ * by other thread calling munmap()).
  */
 static vm_fault_t do_fault(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
+	struct mm_struct *vm_mm = vma->vm_mm;
 	vm_fault_t ret;
 
 	/*
@@ -3561,7 +3564,7 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
 
 	/* preallocated pagetable is unused: free it */
 	if (vmf->prealloc_pte) {
-		pte_free(vma->vm_mm, vmf->prealloc_pte);
+		pte_free(vm_mm, vmf->prealloc_pte);
 		vmf->prealloc_pte = NULL;
 	}
 	return ret;
-- 
1.8.3.1

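The stale-pointer pattern and the fix described in the patch can be mimicked in user space. The following is a minimal single-threaded sketch, not kernel code: struct mm, struct vma, recycle_vma(), fault_reread() and fault_cached() are hypothetical stand-ins, and the racing munmap()/mmap() is simulated by zeroing an object whose memory stays allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for mm_struct and vm_area_struct. */
struct mm { int id; };
struct vma { struct mm *vm_mm; };

/*
 * Simulates what another thread can do once mmap_sem is dropped:
 * the vma goes back to the cache and is zeroed when it is re-allocated.
 * The object stays valid here, so the sketch has no undefined behaviour.
 */
static void recycle_vma(struct vma *vma)
{
	*vma = (struct vma){ 0 };	/* plays the role of memset(vma, 0, ...) */
}

/* Old behaviour: dereference the vma again after the lock was dropped. */
struct mm *fault_reread(struct vma *vma)
{
	recycle_vma(vma);	/* stands in for ->fault() dropping mmap_sem */
	return vma->vm_mm;	/* reads the zeroed field: NULL */
}

/* Patched behaviour: read vm_mm once, before the vma can be recycled. */
struct mm *fault_cached(struct vma *vma)
{
	struct mm *vm_mm = vma->vm_mm;

	recycle_vma(vma);
	return vm_mm;		/* still the mm the fault started with */
}
```

With a vma that initially points at some mm, fault_reread() hands back NULL, which is what page_table_free() received in the trace, while fault_cached() still returns the original mm.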
* Re: [PATCH v3] mm/memory.c: do_fault: avoid usage of stale vm_area_struct
From: Matthew Wilcox @ 2019-03-03 10:36 UTC
To: Jan Stancek
Cc: linux-mm, akpm, peterz, riel, mhocko, ying.huang, jrdr.linux,
    jglisse, aneesh.kumar, david, aarcange, raquini, rientjes, kirill,
    mgorman, linux-kernel

On Sun, Mar 03, 2019 at 08:28:04AM +0100, Jan Stancek wrote:
> Cache mm_struct to avoid using potentially stale "vma".
>
> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest06/mmap1.c
>
> Signed-off-by: Jan Stancek <jstancek@redhat.com>
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

Reviewed-by: Matthew Wilcox <willy@infradead.org>

* Re: [PATCH v3] mm/memory.c: do_fault: avoid usage of stale vm_area_struct
From: Rafael Aquini @ 2019-03-04 0:13 UTC
To: Jan Stancek
Cc: linux-mm, akpm, willy, peterz, riel, mhocko, ying.huang,
    jrdr.linux, jglisse, aneesh.kumar, david, aarcange, raquini,
    rientjes, kirill, mgorman, linux-kernel

On Sun, Mar 03, 2019 at 08:28:04AM +0100, Jan Stancek wrote:
> LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8.
> This is a stress test, where one thread mmaps/writes/munmaps memory area
> and other thread is trying to read from it:
>
>   CPU: 0 PID: 2611 Comm: mmap1 Not tainted 5.0.0-rc8+ #51
>   Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
>   Krnl PSW : 0404e00180000000 00000000001ac8d8 (__lock_acquire+0x7/0x7a8)
>   Call Trace:
>   ([<0000000000000000>] (null))
>    [<00000000001adae4>] lock_acquire+0xec/0x258
>    [<000000000080d1ac>] _raw_spin_lock_bh+0x5c/0x98
>    [<000000000012a780>] page_table_free+0x48/0x1a8
>    [<00000000002f6e54>] do_fault+0xdc/0x670
>    [<00000000002fadae>] __handle_mm_fault+0x416/0x5f0
>    [<00000000002fb138>] handle_mm_fault+0x1b0/0x320
>    [<00000000001248cc>] do_dat_exception+0x19c/0x2c8
>    [<000000000080e5ee>] pgm_check_handler+0x19e/0x200
>
> page_table_free() is called with NULL mm parameter, but because
> "0" is a valid address on s390 (see S390_lowcore), it keeps
> going until it eventually crashes in lockdep's lock_acquire.
> This crash is reproducible at least since 4.14.
>
> Problem is that "vmf->vma" used in do_fault() can become stale.
> Because mmap_sem may be released, other threads can come in,
> call munmap() and cause "vma" be returned to kmem cache, and
> get zeroed/re-initialized and re-used:
>
>   handle_mm_fault                         |
>     __handle_mm_fault                     |
>       do_fault                            |
>         vma = vmf->vma                    |
>         do_read_fault                     |
>           __do_fault                      |
>             vma->vm_ops->fault(vmf);      |
>               mmap_sem is released        |
>                                           |
>                                           | do_munmap()
>                                           |   remove_vma_list()
>                                           |     remove_vma()
>                                           |       vm_area_free()
>                                           |         # vma is released
>                                           | ...
>                                           | # same vma is allocated
>                                           | # from kmem cache
>                                           | do_mmap()
>                                           |   vm_area_alloc()
>                                           |     memset(vma, 0, ...)
>                                           |
>       pte_free(vma->vm_mm, ...);          |
>         page_table_free                   |
>           spin_lock_bh(&mm->context.lock);|
>             <crash>                       |
>
> Cache mm_struct to avoid using potentially stale "vma".
>
> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest06/mmap1.c
>
> Signed-off-by: Jan Stancek <jstancek@redhat.com>
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
> ---
>  mm/memory.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index e11ca9dd823f..e8d69ade5acc 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3517,10 +3517,13 @@ static vm_fault_t do_shared_fault(struct vm_fault *vmf)
>   * but allow concurrent faults).
>   * The mmap_sem may have been released depending on flags and our
>   * return value. See filemap_fault() and __lock_page_or_retry().
> + * If mmap_sem is released, vma may become invalid (for example
> + * by other thread calling munmap()).
>   */
>  static vm_fault_t do_fault(struct vm_fault *vmf)
>  {
>  	struct vm_area_struct *vma = vmf->vma;
> +	struct mm_struct *vm_mm = vma->vm_mm;
>  	vm_fault_t ret;
>
>  	/*
> @@ -3561,7 +3564,7 @@ static vm_fault_t do_fault(struct vm_fault *vmf)
>
>  	/* preallocated pagetable is unused: free it */
>  	if (vmf->prealloc_pte) {
> -		pte_free(vma->vm_mm, vmf->prealloc_pte);
> +		pte_free(vm_mm, vmf->prealloc_pte);
>  		vmf->prealloc_pte = NULL;
>  	}
>  	return ret;
> --
> 1.8.3.1
>

Acked-by: Rafael Aquini <aquini@redhat.com>

* Re: [PATCH v3] mm/memory.c: do_fault: avoid usage of stale vm_area_struct
From: Minchan Kim @ 2019-03-04 8:10 UTC
To: Jan Stancek
Cc: linux-mm, akpm, willy, peterz, riel, mhocko, ying.huang,
    jrdr.linux, jglisse, aneesh.kumar, david, aarcange, raquini,
    rientjes, kirill, mgorman, linux-kernel

On Sun, Mar 03, 2019 at 08:28:04AM +0100, Jan Stancek wrote:
> LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8.
> This is a stress test, where one thread mmaps/writes/munmaps memory area
> and other thread is trying to read from it:
>
>   CPU: 0 PID: 2611 Comm: mmap1 Not tainted 5.0.0-rc8+ #51
>   Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
>   Krnl PSW : 0404e00180000000 00000000001ac8d8 (__lock_acquire+0x7/0x7a8)
>   Call Trace:
>   ([<0000000000000000>] (null))
>    [<00000000001adae4>] lock_acquire+0xec/0x258
>    [<000000000080d1ac>] _raw_spin_lock_bh+0x5c/0x98
>    [<000000000012a780>] page_table_free+0x48/0x1a8
>    [<00000000002f6e54>] do_fault+0xdc/0x670
>    [<00000000002fadae>] __handle_mm_fault+0x416/0x5f0
>    [<00000000002fb138>] handle_mm_fault+0x1b0/0x320
>    [<00000000001248cc>] do_dat_exception+0x19c/0x2c8
>    [<000000000080e5ee>] pgm_check_handler+0x19e/0x200
>
> page_table_free() is called with NULL mm parameter, but because
> "0" is a valid address on s390 (see S390_lowcore), it keeps
> going until it eventually crashes in lockdep's lock_acquire.
> This crash is reproducible at least since 4.14.
>
> Problem is that "vmf->vma" used in do_fault() can become stale.
> Because mmap_sem may be released, other threads can come in,
> call munmap() and cause "vma" be returned to kmem cache, and
> get zeroed/re-initialized and re-used:
>
>   handle_mm_fault                         |
>     __handle_mm_fault                     |
>       do_fault                            |
>         vma = vmf->vma                    |
>         do_read_fault                     |
>           __do_fault                      |
>             vma->vm_ops->fault(vmf);      |
>               mmap_sem is released        |
>                                           |
>                                           | do_munmap()
>                                           |   remove_vma_list()
>                                           |     remove_vma()
>                                           |       vm_area_free()
>                                           |         # vma is released
>                                           | ...
>                                           | # same vma is allocated
>                                           | # from kmem cache
>                                           | do_mmap()
>                                           |   vm_area_alloc()
>                                           |     memset(vma, 0, ...)
>                                           |
>       pte_free(vma->vm_mm, ...);          |
>         page_table_free                   |
>           spin_lock_bh(&mm->context.lock);|
>             <crash>                       |
>
> Cache mm_struct to avoid using potentially stale "vma".
>
> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest06/mmap1.c
>
> Signed-off-by: Jan Stancek <jstancek@redhat.com>
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

Reviewed-by: Minchan Kim <minchan@kernel.org>

Isn't it -stable material?

* Re: [PATCH v3] mm/memory.c: do_fault: avoid usage of stale vm_area_struct
From: Kirill A. Shutemov @ 2019-03-04 8:19 UTC
To: Jan Stancek
Cc: linux-mm, akpm, willy, peterz, riel, mhocko, ying.huang,
    jrdr.linux, jglisse, aneesh.kumar, david, aarcange, raquini,
    rientjes, mgorman, linux-kernel

On Sun, Mar 03, 2019 at 08:28:04AM +0100, Jan Stancek wrote:
> LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8.
> This is a stress test, where one thread mmaps/writes/munmaps memory area
> and other thread is trying to read from it:
>
>   CPU: 0 PID: 2611 Comm: mmap1 Not tainted 5.0.0-rc8+ #51
>   Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
>   Krnl PSW : 0404e00180000000 00000000001ac8d8 (__lock_acquire+0x7/0x7a8)
>   Call Trace:
>   ([<0000000000000000>] (null))
>    [<00000000001adae4>] lock_acquire+0xec/0x258
>    [<000000000080d1ac>] _raw_spin_lock_bh+0x5c/0x98
>    [<000000000012a780>] page_table_free+0x48/0x1a8
>    [<00000000002f6e54>] do_fault+0xdc/0x670
>    [<00000000002fadae>] __handle_mm_fault+0x416/0x5f0
>    [<00000000002fb138>] handle_mm_fault+0x1b0/0x320
>    [<00000000001248cc>] do_dat_exception+0x19c/0x2c8
>    [<000000000080e5ee>] pgm_check_handler+0x19e/0x200
>
> page_table_free() is called with NULL mm parameter, but because
> "0" is a valid address on s390 (see S390_lowcore), it keeps
> going until it eventually crashes in lockdep's lock_acquire.
> This crash is reproducible at least since 4.14.
>
> Problem is that "vmf->vma" used in do_fault() can become stale.
> Because mmap_sem may be released, other threads can come in,
> call munmap() and cause "vma" be returned to kmem cache, and
> get zeroed/re-initialized and re-used:
>
>   handle_mm_fault                         |
>     __handle_mm_fault                     |
>       do_fault                            |
>         vma = vmf->vma                    |
>         do_read_fault                     |
>           __do_fault                      |
>             vma->vm_ops->fault(vmf);      |
>               mmap_sem is released        |
>                                           |
>                                           | do_munmap()
>                                           |   remove_vma_list()
>                                           |     remove_vma()
>                                           |       vm_area_free()
>                                           |         # vma is released
>                                           | ...
>                                           | # same vma is allocated
>                                           | # from kmem cache
>                                           | do_mmap()
>                                           |   vm_area_alloc()
>                                           |     memset(vma, 0, ...)
>                                           |
>       pte_free(vma->vm_mm, ...);          |
>         page_table_free                   |
>           spin_lock_bh(&mm->context.lock);|
>             <crash>                       |
>
> Cache mm_struct to avoid using potentially stale "vma".
>
> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest06/mmap1.c
>
> Signed-off-by: Jan Stancek <jstancek@redhat.com>
> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov