* [PATCH] mm: fix locking order in mm_take_all_locks()
@ 2016-01-11 11:05 Kirill A. Shutemov
From: Kirill A. Shutemov @ 2016-01-11 11:05 UTC (permalink / raw)
To: Andrew Morton, Dmitry Vyukov
Cc: linux-mm, Kirill A. Shutemov, Michal Hocko, Peter Zijlstra,
Andrea Arcangeli
Dmitry Vyukov has reported[1] a possible deadlock (triggered by his
syzkaller fuzzer):
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&hugetlbfs_i_mmap_rwsem_key);
                               lock(&mapping->i_mmap_rwsem);
                               lock(&hugetlbfs_i_mmap_rwsem_key);
  lock(&mapping->i_mmap_rwsem);
Both traces point to mm_take_all_locks() as a source of the problem.
It doesn't take care of the ordering of hugetlbfs_i_mmap_rwsem_key (aka
mapping->i_mmap_rwsem for hugetlb mappings) vs. i_mmap_rwsem.
huge_pmd_share() does memory allocation under hugetlbfs_i_mmap_rwsem_key
and the allocator can take i_mmap_rwsem if it hits reclaim. So we need to
take i_mmap_rwsem from all hugetlb VMAs before taking i_mmap_rwsem from
the rest of the VMAs.
The patch also documents locking order for hugetlbfs_i_mmap_rwsem_key.
[1] http://lkml.kernel.org/r/CACT4Y+Zu95tBs-0EvdiAKzUOsb4tczRRfCRTpLr4bg_OP9HuVg@mail.gmail.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
---
fs/hugetlbfs/inode.c | 2 +-
mm/mmap.c | 25 ++++++++++++++++++++-----
mm/rmap.c | 31 ++++++++++++++++---------------
3 files changed, 37 insertions(+), 21 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 47789292a582..bbc333b01ca3 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -708,7 +708,7 @@ static struct inode *hugetlbfs_get_root(struct super_block *sb,
/*
* Hugetlbfs is not reclaimable; therefore its i_mmap_rwsem will never
* be taken from reclaim -- unlike regular filesystems. This needs an
- * annotation because huge_pmd_share() does an allocation under
+ * annotation because huge_pmd_share() does an allocation under hugetlb's
* i_mmap_rwsem.
*/
static struct lock_class_key hugetlbfs_i_mmap_rwsem_key;
diff --git a/mm/mmap.c b/mm/mmap.c
index b3f00b616b81..84b12624ceb0 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3184,10 +3184,16 @@ static void vm_lock_mapping(struct mm_struct *mm, struct address_space *mapping)
* mapping->flags avoid to take the same lock twice, if more than one
* vma in this mm is backed by the same anon_vma or address_space.
*
- * We can take all the locks in random order because the VM code
- * taking i_mmap_rwsem or anon_vma->rwsem outside the mmap_sem never
- * takes more than one of them in a row. Secondly we're protected
- * against a concurrent mm_take_all_locks() by the mm_all_locks_mutex.
+ * We take locks in the following order, according to the comment at
+ * the beginning of mm/rmap.c:
+ * - all hugetlbfs_i_mmap_rwsem_key locks (aka mapping->i_mmap_rwsem for
+ *   hugetlb mappings);
+ * - all i_mmap_rwsem locks;
+ * - all anon_vma->rwsem locks.
+ *
+ * We can take all locks within these types randomly because the VM code
+ * doesn't nest them and we're protected against parallel
+ * mm_take_all_locks() by mm_all_locks_mutex.
*
* mm_take_all_locks() and mm_drop_all_locks are expensive operations
* that may have to take thousand of locks.
@@ -3206,7 +3212,16 @@ int mm_take_all_locks(struct mm_struct *mm)
for (vma = mm->mmap; vma; vma = vma->vm_next) {
if (signal_pending(current))
goto out_unlock;
- if (vma->vm_file && vma->vm_file->f_mapping)
+		if (vma->vm_file && vma->vm_file->f_mapping &&
+				is_vm_hugetlb_page(vma))
+			vm_lock_mapping(mm, vma->vm_file->f_mapping);
+	}
+
+	for (vma = mm->mmap; vma; vma = vma->vm_next) {
+		if (signal_pending(current))
+			goto out_unlock;
+		if (vma->vm_file && vma->vm_file->f_mapping &&
+				!is_vm_hugetlb_page(vma))
vm_lock_mapping(mm, vma->vm_file->f_mapping);
}
diff --git a/mm/rmap.c b/mm/rmap.c
index 68af2e32f7ed..79f3bf047f38 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -23,21 +23,22 @@
* inode->i_mutex (while writing or truncating, not reading or faulting)
* mm->mmap_sem
* page->flags PG_locked (lock_page)
- * mapping->i_mmap_rwsem
- * anon_vma->rwsem
- * mm->page_table_lock or pte_lock
- * zone->lru_lock (in mark_page_accessed, isolate_lru_page)
- * swap_lock (in swap_duplicate, swap_info_get)
- * mmlist_lock (in mmput, drain_mmlist and others)
- * mapping->private_lock (in __set_page_dirty_buffers)
- * mem_cgroup_{begin,end}_page_stat (memcg->move_lock)
- * mapping->tree_lock (widely used)
- * inode->i_lock (in set_page_dirty's __mark_inode_dirty)
- * bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty)
- * sb_lock (within inode_lock in fs/fs-writeback.c)
- * mapping->tree_lock (widely used, in set_page_dirty,
- * in arch-dependent flush_dcache_mmap_lock,
- * within bdi.wb->list_lock in __sync_single_inode)
+ * hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share)
+ * mapping->i_mmap_rwsem
+ * anon_vma->rwsem
+ * mm->page_table_lock or pte_lock
+ * zone->lru_lock (in mark_page_accessed, isolate_lru_page)
+ * swap_lock (in swap_duplicate, swap_info_get)
+ * mmlist_lock (in mmput, drain_mmlist and others)
+ * mapping->private_lock (in __set_page_dirty_buffers)
+ * mem_cgroup_{begin,end}_page_stat (memcg->move_lock)
+ * mapping->tree_lock (widely used)
+ * inode->i_lock (in set_page_dirty's __mark_inode_dirty)
+ * bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty)
+ * sb_lock (within inode_lock in fs/fs-writeback.c)
+ * mapping->tree_lock (widely used, in set_page_dirty,
+ * in arch-dependent flush_dcache_mmap_lock,
+ * within bdi.wb->list_lock in __sync_single_inode)
*
* anon_vma->rwsem,mapping->i_mutex (memory_failure, collect_procs_anon)
* ->tasklist_lock
--
2.6.4
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
* Re: [PATCH] mm: fix locking order in mm_take_all_locks()
From: Michal Hocko @ 2016-01-12 14:45 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrew Morton, Dmitry Vyukov, linux-mm, Peter Zijlstra, Andrea Arcangeli
On Mon 11-01-16 14:05:28, Kirill A. Shutemov wrote:
> Dmitry Vyukov has reported[1] possible deadlock (triggered by his syzkaller
> fuzzer):
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&hugetlbfs_i_mmap_rwsem_key);
>                                lock(&mapping->i_mmap_rwsem);
>                                lock(&hugetlbfs_i_mmap_rwsem_key);
>   lock(&mapping->i_mmap_rwsem);
>
> Both traces point to mm_take_all_locks() as a source of the problem.
> It doesn't take care of the ordering of hugetlbfs_i_mmap_rwsem_key (aka
> mapping->i_mmap_rwsem for hugetlb mappings) vs. i_mmap_rwsem.
Hmm, but huge_pmd_share is called with mmap_sem held, no? At least my
current cscope claims that huge_pte_alloc is called from
copy_hugetlb_page_range and hugetlb_fault, both of which should be called
with mmap_sem held for write (via dup_mmap) resp. read (via page fault
resp. gup), while mm_take_all_locks expects mmap_sem for write as well.
> huge_pmd_share() does memory allocation under hugetlbfs_i_mmap_rwsem_key
> and the allocator can take i_mmap_rwsem if it hits reclaim. So we need to
> take i_mmap_rwsem from all hugetlb VMAs before taking i_mmap_rwsem from
> the rest of the VMAs.
>
> The patch also documents locking order for hugetlbfs_i_mmap_rwsem_key.
The documentation part alone makes sense but I fail to see how this can
solve any deadlock in the current code.
> [1] http://lkml.kernel.org/r/CACT4Y+Zu95tBs-0EvdiAKzUOsb4tczRRfCRTpLr4bg_OP9HuVg@mail.gmail.com
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reported-by: Dmitry Vyukov <dvyukov@google.com>
> Cc: Michal Hocko <mhocko@suse.cz>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
--
Michal Hocko
SUSE Labs
* Re: [PATCH] mm: fix locking order in mm_take_all_locks()
From: Kirill A. Shutemov @ 2016-01-12 14:52 UTC (permalink / raw)
To: Michal Hocko
Cc: Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, linux-mm,
Peter Zijlstra, Andrea Arcangeli
On Tue, Jan 12, 2016 at 03:45:21PM +0100, Michal Hocko wrote:
> On Mon 11-01-16 14:05:28, Kirill A. Shutemov wrote:
> > Dmitry Vyukov has reported[1] possible deadlock (triggered by his syzkaller
> > fuzzer):
> >
> > Possible unsafe locking scenario:
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(&hugetlbfs_i_mmap_rwsem_key);
> >                                lock(&mapping->i_mmap_rwsem);
> >                                lock(&hugetlbfs_i_mmap_rwsem_key);
> >   lock(&mapping->i_mmap_rwsem);
> >
> > Both traces point to mm_take_all_locks() as a source of the problem.
> > It doesn't take care of the ordering of hugetlbfs_i_mmap_rwsem_key (aka
> > mapping->i_mmap_rwsem for hugetlb mappings) vs. i_mmap_rwsem.
>
> Hmm, but huge_pmd_share is called with mmap_sem held no?
Why does it matter?
Both mappings can be mapped to different processes, so mmap_sem is no good
here.
--
Kirill A. Shutemov
* Re: [PATCH] mm: fix locking order in mm_take_all_locks()
From: Michal Hocko @ 2016-01-12 15:20 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Kirill A. Shutemov, Andrew Morton, Dmitry Vyukov, linux-mm,
Peter Zijlstra, Andrea Arcangeli
On Tue 12-01-16 16:52:19, Kirill A. Shutemov wrote:
> On Tue, Jan 12, 2016 at 03:45:21PM +0100, Michal Hocko wrote:
> > On Mon 11-01-16 14:05:28, Kirill A. Shutemov wrote:
> > > Dmitry Vyukov has reported[1] possible deadlock (triggered by his syzkaller
> > > fuzzer):
> > >
> > > Possible unsafe locking scenario:
> > >
> > >        CPU0                    CPU1
> > >        ----                    ----
> > >   lock(&hugetlbfs_i_mmap_rwsem_key);
> > >                                lock(&mapping->i_mmap_rwsem);
> > >                                lock(&hugetlbfs_i_mmap_rwsem_key);
> > >   lock(&mapping->i_mmap_rwsem);
> > >
> > > Both traces point to mm_take_all_locks() as a source of the problem.
> > > It doesn't take care of the ordering of hugetlbfs_i_mmap_rwsem_key (aka
> > > mapping->i_mmap_rwsem for hugetlb mappings) vs. i_mmap_rwsem.
> >
> > Hmm, but huge_pmd_share is called with mmap_sem held no?
>
> Why does it matter?
>
> Both mappings can be mapped to different processes, so mmap_sem is no good
> here.
You are right! Then it really makes a difference.
Feel free to add
Reviewed-by: Michal Hocko <mhocko@suse.com>
--
Michal Hocko
SUSE Labs