linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem
@ 2017-08-07 14:09 Artem Savkov
  2017-08-15 19:01 ` Paul E. McKenney
  2017-08-16 14:07 ` Thomas Gleixner
  0 siblings, 2 replies; 15+ messages in thread
From: Artem Savkov @ 2017-08-07 14:09 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Paul E. McKenney, linux-kernel

Hello,

After commit fc8dffd "cpu/hotplug: Convert hotplug locking to percpu rwsem"
the following lockdep splat started showing up on some systems while running
ltp's madvise06 test (right after first dirty_pages call [1]).

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c#L136

[21002.630252] ======================================================
[21002.637148] WARNING: possible circular locking dependency detected
[21002.644045] 4.13.0-rc3-next-20170807 #12 Not tainted
[21002.649583] ------------------------------------------------------
[21002.656492] a.out/4771 is trying to acquire lock:
[21002.661742]  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140
[21002.672629] 
[21002.672629] but task is already holding lock:
[21002.679137]  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
[21002.688371] 
[21002.688371] which lock already depends on the new lock.
[21002.688371] 
[21002.697505] 
[21002.697505] the existing dependency chain (in reverse order) is:
[21002.705856] 
[21002.705856] -> #3 (&mm->mmap_sem){++++++}:
[21002.712080]        lock_acquire+0xc9/0x230
[21002.716661]        __might_fault+0x70/0xa0
[21002.721241]        _copy_to_user+0x23/0x70
[21002.725814]        filldir+0xa7/0x110
[21002.729988]        xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
[21002.736840]        xfs_readdir+0x1fa/0x2c0 [xfs]
[21002.742042]        xfs_file_readdir+0x30/0x40 [xfs]
[21002.747485]        iterate_dir+0x17a/0x1a0
[21002.752057]        SyS_getdents+0xb0/0x160
[21002.756638]        entry_SYSCALL_64_fastpath+0x1f/0xbe
[21002.762371] 
[21002.762371] -> #2 (&type->i_mutex_dir_key#3){++++++}:
[21002.769661]        lock_acquire+0xc9/0x230
[21002.774239]        down_read+0x51/0xb0
[21002.778429]        lookup_slow+0xde/0x210
[21002.782903]        walk_component+0x160/0x250
[21002.787765]        link_path_walk+0x1a6/0x610
[21002.792625]        path_openat+0xe4/0xd50
[21002.797100]        do_filp_open+0x91/0x100
[21002.801673]        file_open_name+0xf5/0x130
[21002.806429]        filp_open+0x33/0x50
[21002.810620]        kernel_read_file_from_path+0x39/0x80
[21002.816459]        _request_firmware+0x39f/0x880
[21002.821610]        request_firmware_direct+0x37/0x50
[21002.827151]        request_microcode_fw+0x64/0xe0
[21002.832401]        reload_store+0xf7/0x180
[21002.836974]        dev_attr_store+0x18/0x30
[21002.841641]        sysfs_kf_write+0x44/0x60
[21002.846318]        kernfs_fop_write+0x113/0x1a0
[21002.851374]        __vfs_write+0x37/0x170
[21002.855849]        vfs_write+0xc7/0x1c0
[21002.860128]        SyS_write+0x58/0xc0
[21002.864313]        do_syscall_64+0x6c/0x1f0
[21002.868973]        return_from_SYSCALL_64+0x0/0x7a
[21002.874317] 
[21002.874317] -> #1 (microcode_mutex){+.+.+.}:
[21002.880748]        lock_acquire+0xc9/0x230
[21002.885322]        __mutex_lock+0x88/0x960
[21002.889894]        mutex_lock_nested+0x1b/0x20
[21002.894854]        microcode_init+0xbb/0x208
[21002.899617]        do_one_initcall+0x51/0x1a9
[21002.904481]        kernel_init_freeable+0x208/0x2a7
[21002.909922]        kernel_init+0xe/0x104
[21002.914298]        ret_from_fork+0x2a/0x40
[21002.918867] 
[21002.918867] -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
[21002.926058]        __lock_acquire+0x153c/0x1550
[21002.931112]        lock_acquire+0xc9/0x230
[21002.935688]        cpus_read_lock+0x4b/0x90
[21002.940353]        drain_all_stock.part.35+0x18/0x140
[21002.945987]        try_charge+0x3ab/0x6e0
[21002.950460]        mem_cgroup_try_charge+0x7f/0x2c0
[21002.955902]        shmem_getpage_gfp+0x25f/0x1050
[21002.961149]        shmem_fault+0x96/0x200
[21002.965621]        __do_fault+0x1e/0xa0
[21002.969905]        __handle_mm_fault+0x9c3/0xe00
[21002.975056]        handle_mm_fault+0x16e/0x380
[21002.980013]        __do_page_fault+0x24a/0x530
[21002.984968]        do_page_fault+0x30/0x80
[21002.989537]        page_fault+0x28/0x30
[21002.993812] 
[21002.993812] other info that might help us debug this:
[21002.993812] 
[21003.002744] Chain exists of:
[21003.002744]   cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem
[21003.002744] 
[21003.016238]  Possible unsafe locking scenario:
[21003.016238] 
[21003.022843]        CPU0                    CPU1
[21003.027896]        ----                    ----
[21003.032948]   lock(&mm->mmap_sem);
[21003.036741]                                lock(&type->i_mutex_dir_key#3);
[21003.044419]                                lock(&mm->mmap_sem);
[21003.051025]   lock(cpu_hotplug_lock.rw_sem);
[21003.055788] 
[21003.055788]  *** DEADLOCK ***
[21003.055788] 
[21003.062393] 2 locks held by a.out/4771:
[21003.066675]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
[21003.076391]  #1:  (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0
[21003.086198] 
[21003.086198] stack backtrace:
[21003.091059] CPU: 6 PID: 4771 Comm: a.out Not tainted 4.13.0-rc3-next-20170807 #12
[21003.099409] Hardware name: Dell Inc. PowerEdge M520/0DW6GX, BIOS 2.4.2 02/03/2015
[21003.107766] Call Trace:
[21003.110495]  dump_stack+0x85/0xc9
[21003.114190]  print_circular_bug+0x1f9/0x207
[21003.118854]  __lock_acquire+0x153c/0x1550
[21003.123327]  lock_acquire+0xc9/0x230
[21003.127313]  ? drain_all_stock.part.35+0x18/0x140
[21003.132563]  cpus_read_lock+0x4b/0x90
[21003.136652]  ? drain_all_stock.part.35+0x18/0x140
[21003.141900]  drain_all_stock.part.35+0x18/0x140
[21003.146954]  try_charge+0x3ab/0x6e0
[21003.150846]  mem_cgroup_try_charge+0x7f/0x2c0
[21003.155705]  shmem_getpage_gfp+0x25f/0x1050
[21003.160374]  shmem_fault+0x96/0x200
[21003.164263]  ? __lock_acquire+0x2fb/0x1550
[21003.168832]  ? __lock_acquire+0x2fb/0x1550
[21003.173402]  __do_fault+0x1e/0xa0
[21003.177097]  __handle_mm_fault+0x9c3/0xe00
[21003.181669]  handle_mm_fault+0x16e/0x380
[21003.186045]  ? handle_mm_fault+0x49/0x380
[21003.190518]  __do_page_fault+0x24a/0x530
[21003.194895]  do_page_fault+0x30/0x80
[21003.198883]  page_fault+0x28/0x30
[21003.202593] RIP: 0033:0x400886
[21003.205998] RSP: 002b:00007fff81d84d20 EFLAGS: 00010206
[21003.211827] RAX: 00007fc763bb7000 RBX: 0000000000000000 RCX: 0000000000001000
[21003.219789] RDX: 0000000006362000 RSI: 0000000019000000 RDI: 00007fc75d855000
[21003.227751] RBP: 00007fff81d84d50 R08: ffffffffffffffff R09: 0000000000000000
[21003.235713] R10: 00007fff81d84a30 R11: 00007fc7769445d0 R12: 0000000000400750
[21003.243681] R13: 00007fff81d84f70 R14: 0000000000000000 R15: 0000000000000000

-- 
Regards,
  Artem

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem
  2017-08-07 14:09 possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem Artem Savkov
@ 2017-08-15 19:01 ` Paul E. McKenney
  2017-08-16 13:39   ` Laurent Dufour
  2017-08-16 14:07 ` Thomas Gleixner
  1 sibling, 1 reply; 15+ messages in thread
From: Paul E. McKenney @ 2017-08-15 19:01 UTC (permalink / raw)
  To: Artem Savkov; +Cc: Thomas Gleixner, linux-kernel, ldufour

On Mon, Aug 07, 2017 at 04:09:47PM +0200, Artem Savkov wrote:
> Hello,
> 
> After commit fc8dffd "cpu/hotplug: Convert hotplug locking to percpu rwsem"
> the following lockdep splat started showing up on some systems while running
> ltp's madvise06 test (right after first dirty_pages call [1]).

Hello, Artem,

Have you tried running this with Laurent Dufour's speculative page-fault
patch set?  https://lwn.net/Articles/730160/

							Thanx, Paul

> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c#L136
> 
> [21002.630252] ======================================================
> [21002.637148] WARNING: possible circular locking dependency detected
> [21002.644045] 4.13.0-rc3-next-20170807 #12 Not tainted
> [21002.649583] ------------------------------------------------------
> [21002.656492] a.out/4771 is trying to acquire lock:
> [21002.661742]  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140
> [21002.672629] 
> [21002.672629] but task is already holding lock:
> [21002.679137]  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
> [21002.688371] 
> [21002.688371] which lock already depends on the new lock.
> [21002.688371] 
> [21002.697505] 
> [21002.697505] the existing dependency chain (in reverse order) is:
> [21002.705856] 
> [21002.705856] -> #3 (&mm->mmap_sem){++++++}:
> [21002.712080]        lock_acquire+0xc9/0x230
> [21002.716661]        __might_fault+0x70/0xa0
> [21002.721241]        _copy_to_user+0x23/0x70
> [21002.725814]        filldir+0xa7/0x110
> [21002.729988]        xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
> [21002.736840]        xfs_readdir+0x1fa/0x2c0 [xfs]
> [21002.742042]        xfs_file_readdir+0x30/0x40 [xfs]
> [21002.747485]        iterate_dir+0x17a/0x1a0
> [21002.752057]        SyS_getdents+0xb0/0x160
> [21002.756638]        entry_SYSCALL_64_fastpath+0x1f/0xbe
> [21002.762371] 
> [21002.762371] -> #2 (&type->i_mutex_dir_key#3){++++++}:
> [21002.769661]        lock_acquire+0xc9/0x230
> [21002.774239]        down_read+0x51/0xb0
> [21002.778429]        lookup_slow+0xde/0x210
> [21002.782903]        walk_component+0x160/0x250
> [21002.787765]        link_path_walk+0x1a6/0x610
> [21002.792625]        path_openat+0xe4/0xd50
> [21002.797100]        do_filp_open+0x91/0x100
> [21002.801673]        file_open_name+0xf5/0x130
> [21002.806429]        filp_open+0x33/0x50
> [21002.810620]        kernel_read_file_from_path+0x39/0x80
> [21002.816459]        _request_firmware+0x39f/0x880
> [21002.821610]        request_firmware_direct+0x37/0x50
> [21002.827151]        request_microcode_fw+0x64/0xe0
> [21002.832401]        reload_store+0xf7/0x180
> [21002.836974]        dev_attr_store+0x18/0x30
> [21002.841641]        sysfs_kf_write+0x44/0x60
> [21002.846318]        kernfs_fop_write+0x113/0x1a0
> [21002.851374]        __vfs_write+0x37/0x170
> [21002.855849]        vfs_write+0xc7/0x1c0
> [21002.860128]        SyS_write+0x58/0xc0
> [21002.864313]        do_syscall_64+0x6c/0x1f0
> [21002.868973]        return_from_SYSCALL_64+0x0/0x7a
> [21002.874317] 
> [21002.874317] -> #1 (microcode_mutex){+.+.+.}:
> [21002.880748]        lock_acquire+0xc9/0x230
> [21002.885322]        __mutex_lock+0x88/0x960
> [21002.889894]        mutex_lock_nested+0x1b/0x20
> [21002.894854]        microcode_init+0xbb/0x208
> [21002.899617]        do_one_initcall+0x51/0x1a9
> [21002.904481]        kernel_init_freeable+0x208/0x2a7
> [21002.909922]        kernel_init+0xe/0x104
> [21002.914298]        ret_from_fork+0x2a/0x40
> [21002.918867] 
> [21002.918867] -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
> [21002.926058]        __lock_acquire+0x153c/0x1550
> [21002.931112]        lock_acquire+0xc9/0x230
> [21002.935688]        cpus_read_lock+0x4b/0x90
> [21002.940353]        drain_all_stock.part.35+0x18/0x140
> [21002.945987]        try_charge+0x3ab/0x6e0
> [21002.950460]        mem_cgroup_try_charge+0x7f/0x2c0
> [21002.955902]        shmem_getpage_gfp+0x25f/0x1050
> [21002.961149]        shmem_fault+0x96/0x200
> [21002.965621]        __do_fault+0x1e/0xa0
> [21002.969905]        __handle_mm_fault+0x9c3/0xe00
> [21002.975056]        handle_mm_fault+0x16e/0x380
> [21002.980013]        __do_page_fault+0x24a/0x530
> [21002.984968]        do_page_fault+0x30/0x80
> [21002.989537]        page_fault+0x28/0x30
> [21002.993812] 
> [21002.993812] other info that might help us debug this:
> [21002.993812] 
> [21003.002744] Chain exists of:
> [21003.002744]   cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem
> [21003.002744] 
> [21003.016238]  Possible unsafe locking scenario:
> [21003.016238] 
> [21003.022843]        CPU0                    CPU1
> [21003.027896]        ----                    ----
> [21003.032948]   lock(&mm->mmap_sem);
> [21003.036741]                                lock(&type->i_mutex_dir_key#3);
> [21003.044419]                                lock(&mm->mmap_sem);
> [21003.051025]   lock(cpu_hotplug_lock.rw_sem);
> [21003.055788] 
> [21003.055788]  *** DEADLOCK ***
> [21003.055788] 
> [21003.062393] 2 locks held by a.out/4771:
> [21003.066675]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
> [21003.076391]  #1:  (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0
> [21003.086198] 
> [21003.086198] stack backtrace:
> [21003.091059] CPU: 6 PID: 4771 Comm: a.out Not tainted 4.13.0-rc3-next-20170807 #12
> [21003.099409] Hardware name: Dell Inc. PowerEdge M520/0DW6GX, BIOS 2.4.2 02/03/2015
> [21003.107766] Call Trace:
> [21003.110495]  dump_stack+0x85/0xc9
> [21003.114190]  print_circular_bug+0x1f9/0x207
> [21003.118854]  __lock_acquire+0x153c/0x1550
> [21003.123327]  lock_acquire+0xc9/0x230
> [21003.127313]  ? drain_all_stock.part.35+0x18/0x140
> [21003.132563]  cpus_read_lock+0x4b/0x90
> [21003.136652]  ? drain_all_stock.part.35+0x18/0x140
> [21003.141900]  drain_all_stock.part.35+0x18/0x140
> [21003.146954]  try_charge+0x3ab/0x6e0
> [21003.150846]  mem_cgroup_try_charge+0x7f/0x2c0
> [21003.155705]  shmem_getpage_gfp+0x25f/0x1050
> [21003.160374]  shmem_fault+0x96/0x200
> [21003.164263]  ? __lock_acquire+0x2fb/0x1550
> [21003.168832]  ? __lock_acquire+0x2fb/0x1550
> [21003.173402]  __do_fault+0x1e/0xa0
> [21003.177097]  __handle_mm_fault+0x9c3/0xe00
> [21003.181669]  handle_mm_fault+0x16e/0x380
> [21003.186045]  ? handle_mm_fault+0x49/0x380
> [21003.190518]  __do_page_fault+0x24a/0x530
> [21003.194895]  do_page_fault+0x30/0x80
> [21003.198883]  page_fault+0x28/0x30
> [21003.202593] RIP: 0033:0x400886
> [21003.205998] RSP: 002b:00007fff81d84d20 EFLAGS: 00010206
> [21003.211827] RAX: 00007fc763bb7000 RBX: 0000000000000000 RCX: 0000000000001000
> [21003.219789] RDX: 0000000006362000 RSI: 0000000019000000 RDI: 00007fc75d855000
> [21003.227751] RBP: 00007fff81d84d50 R08: ffffffffffffffff R09: 0000000000000000
> [21003.235713] R10: 00007fff81d84a30 R11: 00007fc7769445d0 R12: 0000000000400750
> [21003.243681] R13: 00007fff81d84f70 R14: 0000000000000000 R15: 0000000000000000
> 
> -- 
> Regards,
>   Artem
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem
  2017-08-15 19:01 ` Paul E. McKenney
@ 2017-08-16 13:39   ` Laurent Dufour
  2017-08-16 15:36     ` Artem Savkov
  0 siblings, 1 reply; 15+ messages in thread
From: Laurent Dufour @ 2017-08-16 13:39 UTC (permalink / raw)
  To: paulmck, Artem Savkov; +Cc: Thomas Gleixner, linux-kernel

On 15/08/2017 21:01, Paul E. McKenney wrote:
> On Mon, Aug 07, 2017 at 04:09:47PM +0200, Artem Savkov wrote:
>> Hello,
>>
>> After commit fc8dffd "cpu/hotplug: Convert hotplug locking to percpu rwsem"
>> the following lockdep splat started showing up on some systems while running
>> ltp's madvise06 test (right after first dirty_pages call [1]).
> 
> Hello, Artem,
> 
> Have you tried running this with Laurent Dufour's speculative page-fault
> patch set?  https://lwn.net/Articles/730160/

Hello Artem, Hello Paul,

This would be a good idea to give it a try, but I don't think this will
help here as the speculative page fault handler is aborted if vma->ops is
set as it is the case in the following stack trace of the CPU #0.

This being said, the speculative page fault handler may also failed due to
VMA's changes occurring in our back. In such a case, the legacy page fault
handler is then tried and such a lock dependency is expected to raise again.

Cheers,
Laurent.

> 
>> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c#L136
>>
>> [21002.630252] ======================================================
>> [21002.637148] WARNING: possible circular locking dependency detected
>> [21002.644045] 4.13.0-rc3-next-20170807 #12 Not tainted
>> [21002.649583] ------------------------------------------------------
>> [21002.656492] a.out/4771 is trying to acquire lock:
>> [21002.661742]  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140
>> [21002.672629] 
>> [21002.672629] but task is already holding lock:
>> [21002.679137]  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
>> [21002.688371] 
>> [21002.688371] which lock already depends on the new lock.
>> [21002.688371] 
>> [21002.697505] 
>> [21002.697505] the existing dependency chain (in reverse order) is:
>> [21002.705856] 
>> [21002.705856] -> #3 (&mm->mmap_sem){++++++}:
>> [21002.712080]        lock_acquire+0xc9/0x230
>> [21002.716661]        __might_fault+0x70/0xa0
>> [21002.721241]        _copy_to_user+0x23/0x70
>> [21002.725814]        filldir+0xa7/0x110
>> [21002.729988]        xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
>> [21002.736840]        xfs_readdir+0x1fa/0x2c0 [xfs]
>> [21002.742042]        xfs_file_readdir+0x30/0x40 [xfs]
>> [21002.747485]        iterate_dir+0x17a/0x1a0
>> [21002.752057]        SyS_getdents+0xb0/0x160
>> [21002.756638]        entry_SYSCALL_64_fastpath+0x1f/0xbe
>> [21002.762371] 
>> [21002.762371] -> #2 (&type->i_mutex_dir_key#3){++++++}:
>> [21002.769661]        lock_acquire+0xc9/0x230
>> [21002.774239]        down_read+0x51/0xb0
>> [21002.778429]        lookup_slow+0xde/0x210
>> [21002.782903]        walk_component+0x160/0x250
>> [21002.787765]        link_path_walk+0x1a6/0x610
>> [21002.792625]        path_openat+0xe4/0xd50
>> [21002.797100]        do_filp_open+0x91/0x100
>> [21002.801673]        file_open_name+0xf5/0x130
>> [21002.806429]        filp_open+0x33/0x50
>> [21002.810620]        kernel_read_file_from_path+0x39/0x80
>> [21002.816459]        _request_firmware+0x39f/0x880
>> [21002.821610]        request_firmware_direct+0x37/0x50
>> [21002.827151]        request_microcode_fw+0x64/0xe0
>> [21002.832401]        reload_store+0xf7/0x180
>> [21002.836974]        dev_attr_store+0x18/0x30
>> [21002.841641]        sysfs_kf_write+0x44/0x60
>> [21002.846318]        kernfs_fop_write+0x113/0x1a0
>> [21002.851374]        __vfs_write+0x37/0x170
>> [21002.855849]        vfs_write+0xc7/0x1c0
>> [21002.860128]        SyS_write+0x58/0xc0
>> [21002.864313]        do_syscall_64+0x6c/0x1f0
>> [21002.868973]        return_from_SYSCALL_64+0x0/0x7a
>> [21002.874317] 
>> [21002.874317] -> #1 (microcode_mutex){+.+.+.}:
>> [21002.880748]        lock_acquire+0xc9/0x230
>> [21002.885322]        __mutex_lock+0x88/0x960
>> [21002.889894]        mutex_lock_nested+0x1b/0x20
>> [21002.894854]        microcode_init+0xbb/0x208
>> [21002.899617]        do_one_initcall+0x51/0x1a9
>> [21002.904481]        kernel_init_freeable+0x208/0x2a7
>> [21002.909922]        kernel_init+0xe/0x104
>> [21002.914298]        ret_from_fork+0x2a/0x40
>> [21002.918867] 
>> [21002.918867] -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
>> [21002.926058]        __lock_acquire+0x153c/0x1550
>> [21002.931112]        lock_acquire+0xc9/0x230
>> [21002.935688]        cpus_read_lock+0x4b/0x90
>> [21002.940353]        drain_all_stock.part.35+0x18/0x140
>> [21002.945987]        try_charge+0x3ab/0x6e0
>> [21002.950460]        mem_cgroup_try_charge+0x7f/0x2c0
>> [21002.955902]        shmem_getpage_gfp+0x25f/0x1050
>> [21002.961149]        shmem_fault+0x96/0x200
>> [21002.965621]        __do_fault+0x1e/0xa0
>> [21002.969905]        __handle_mm_fault+0x9c3/0xe00
>> [21002.975056]        handle_mm_fault+0x16e/0x380
>> [21002.980013]        __do_page_fault+0x24a/0x530
>> [21002.984968]        do_page_fault+0x30/0x80
>> [21002.989537]        page_fault+0x28/0x30
>> [21002.993812] 
>> [21002.993812] other info that might help us debug this:
>> [21002.993812] 
>> [21003.002744] Chain exists of:
>> [21003.002744]   cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem
>> [21003.002744] 
>> [21003.016238]  Possible unsafe locking scenario:
>> [21003.016238] 
>> [21003.022843]        CPU0                    CPU1
>> [21003.027896]        ----                    ----
>> [21003.032948]   lock(&mm->mmap_sem);
>> [21003.036741]                                lock(&type->i_mutex_dir_key#3);
>> [21003.044419]                                lock(&mm->mmap_sem);
>> [21003.051025]   lock(cpu_hotplug_lock.rw_sem);
>> [21003.055788] 
>> [21003.055788]  *** DEADLOCK ***
>> [21003.055788] 
>> [21003.062393] 2 locks held by a.out/4771:
>> [21003.066675]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
>> [21003.076391]  #1:  (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0
>> [21003.086198] 
>> [21003.086198] stack backtrace:
>> [21003.091059] CPU: 6 PID: 4771 Comm: a.out Not tainted 4.13.0-rc3-next-20170807 #12
>> [21003.099409] Hardware name: Dell Inc. PowerEdge M520/0DW6GX, BIOS 2.4.2 02/03/2015
>> [21003.107766] Call Trace:
>> [21003.110495]  dump_stack+0x85/0xc9
>> [21003.114190]  print_circular_bug+0x1f9/0x207
>> [21003.118854]  __lock_acquire+0x153c/0x1550
>> [21003.123327]  lock_acquire+0xc9/0x230
>> [21003.127313]  ? drain_all_stock.part.35+0x18/0x140
>> [21003.132563]  cpus_read_lock+0x4b/0x90
>> [21003.136652]  ? drain_all_stock.part.35+0x18/0x140
>> [21003.141900]  drain_all_stock.part.35+0x18/0x140
>> [21003.146954]  try_charge+0x3ab/0x6e0
>> [21003.150846]  mem_cgroup_try_charge+0x7f/0x2c0
>> [21003.155705]  shmem_getpage_gfp+0x25f/0x1050
>> [21003.160374]  shmem_fault+0x96/0x200
>> [21003.164263]  ? __lock_acquire+0x2fb/0x1550
>> [21003.168832]  ? __lock_acquire+0x2fb/0x1550
>> [21003.173402]  __do_fault+0x1e/0xa0
>> [21003.177097]  __handle_mm_fault+0x9c3/0xe00
>> [21003.181669]  handle_mm_fault+0x16e/0x380
>> [21003.186045]  ? handle_mm_fault+0x49/0x380
>> [21003.190518]  __do_page_fault+0x24a/0x530
>> [21003.194895]  do_page_fault+0x30/0x80
>> [21003.198883]  page_fault+0x28/0x30
>> [21003.202593] RIP: 0033:0x400886
>> [21003.205998] RSP: 002b:00007fff81d84d20 EFLAGS: 00010206
>> [21003.211827] RAX: 00007fc763bb7000 RBX: 0000000000000000 RCX: 0000000000001000
>> [21003.219789] RDX: 0000000006362000 RSI: 0000000019000000 RDI: 00007fc75d855000
>> [21003.227751] RBP: 00007fff81d84d50 R08: ffffffffffffffff R09: 0000000000000000
>> [21003.235713] R10: 00007fff81d84a30 R11: 00007fc7769445d0 R12: 0000000000400750
>> [21003.243681] R13: 00007fff81d84f70 R14: 0000000000000000 R15: 0000000000000000
>>
>> -- 
>> Regards,
>>   Artem
>>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem
  2017-08-07 14:09 possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem Artem Savkov
  2017-08-15 19:01 ` Paul E. McKenney
@ 2017-08-16 14:07 ` Thomas Gleixner
  2017-08-30 14:15   ` Michal Hocko
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2017-08-16 14:07 UTC (permalink / raw)
  To: Artem Savkov
  Cc: Paul E. McKenney, LKML, Michal Hocko, Johannes Weiner, linux-mm

On Mon, 7 Aug 2017, Artem Savkov wrote:

+Cc mm folks ...

> Hello,
> 
> After commit fc8dffd "cpu/hotplug: Convert hotplug locking to percpu rwsem"
> the following lockdep splat started showing up on some systems while running
> ltp's madvise06 test (right after first dirty_pages call [1]).
>
> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c#L136
> 
> [21002.630252] ======================================================
> [21002.637148] WARNING: possible circular locking dependency detected
> [21002.644045] 4.13.0-rc3-next-20170807 #12 Not tainted
> [21002.649583] ------------------------------------------------------
> [21002.656492] a.out/4771 is trying to acquire lock:
> [21002.661742]  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140
> [21002.672629] 
> [21002.672629] but task is already holding lock:
> [21002.679137]  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
> [21002.688371] 
> [21002.688371] which lock already depends on the new lock.
> [21002.688371] 
> [21002.697505] 
> [21002.697505] the existing dependency chain (in reverse order) is:
> [21002.705856] 
> [21002.705856] -> #3 (&mm->mmap_sem){++++++}:
> [21002.712080]        lock_acquire+0xc9/0x230
> [21002.716661]        __might_fault+0x70/0xa0
> [21002.721241]        _copy_to_user+0x23/0x70
> [21002.725814]        filldir+0xa7/0x110
> [21002.729988]        xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
> [21002.736840]        xfs_readdir+0x1fa/0x2c0 [xfs]
> [21002.742042]        xfs_file_readdir+0x30/0x40 [xfs]
> [21002.747485]        iterate_dir+0x17a/0x1a0
> [21002.752057]        SyS_getdents+0xb0/0x160
> [21002.756638]        entry_SYSCALL_64_fastpath+0x1f/0xbe
> [21002.762371] 
> [21002.762371] -> #2 (&type->i_mutex_dir_key#3){++++++}:
> [21002.769661]        lock_acquire+0xc9/0x230
> [21002.774239]        down_read+0x51/0xb0
> [21002.778429]        lookup_slow+0xde/0x210
> [21002.782903]        walk_component+0x160/0x250
> [21002.787765]        link_path_walk+0x1a6/0x610
> [21002.792625]        path_openat+0xe4/0xd50
> [21002.797100]        do_filp_open+0x91/0x100
> [21002.801673]        file_open_name+0xf5/0x130
> [21002.806429]        filp_open+0x33/0x50
> [21002.810620]        kernel_read_file_from_path+0x39/0x80
> [21002.816459]        _request_firmware+0x39f/0x880
> [21002.821610]        request_firmware_direct+0x37/0x50
> [21002.827151]        request_microcode_fw+0x64/0xe0
> [21002.832401]        reload_store+0xf7/0x180
> [21002.836974]        dev_attr_store+0x18/0x30
> [21002.841641]        sysfs_kf_write+0x44/0x60
> [21002.846318]        kernfs_fop_write+0x113/0x1a0
> [21002.851374]        __vfs_write+0x37/0x170
> [21002.855849]        vfs_write+0xc7/0x1c0
> [21002.860128]        SyS_write+0x58/0xc0
> [21002.864313]        do_syscall_64+0x6c/0x1f0
> [21002.868973]        return_from_SYSCALL_64+0x0/0x7a
> [21002.874317] 
> [21002.874317] -> #1 (microcode_mutex){+.+.+.}:
> [21002.880748]        lock_acquire+0xc9/0x230
> [21002.885322]        __mutex_lock+0x88/0x960
> [21002.889894]        mutex_lock_nested+0x1b/0x20
> [21002.894854]        microcode_init+0xbb/0x208
> [21002.899617]        do_one_initcall+0x51/0x1a9
> [21002.904481]        kernel_init_freeable+0x208/0x2a7
> [21002.909922]        kernel_init+0xe/0x104
> [21002.914298]        ret_from_fork+0x2a/0x40
> [21002.918867] 
> [21002.918867] -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
> [21002.926058]        __lock_acquire+0x153c/0x1550
> [21002.931112]        lock_acquire+0xc9/0x230
> [21002.935688]        cpus_read_lock+0x4b/0x90
> [21002.940353]        drain_all_stock.part.35+0x18/0x140
> [21002.945987]        try_charge+0x3ab/0x6e0
> [21002.950460]        mem_cgroup_try_charge+0x7f/0x2c0
> [21002.955902]        shmem_getpage_gfp+0x25f/0x1050
> [21002.961149]        shmem_fault+0x96/0x200
> [21002.965621]        __do_fault+0x1e/0xa0
> [21002.969905]        __handle_mm_fault+0x9c3/0xe00
> [21002.975056]        handle_mm_fault+0x16e/0x380
> [21002.980013]        __do_page_fault+0x24a/0x530
> [21002.984968]        do_page_fault+0x30/0x80
> [21002.989537]        page_fault+0x28/0x30
> [21002.993812] 
> [21002.993812] other info that might help us debug this:
> [21002.993812] 
> [21003.002744] Chain exists of:
> [21003.002744]   cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem
> [21003.002744] 
> [21003.016238]  Possible unsafe locking scenario:
> [21003.016238] 
> [21003.022843]        CPU0                    CPU1
> [21003.027896]        ----                    ----
> [21003.032948]   lock(&mm->mmap_sem);
> [21003.036741]                                lock(&type->i_mutex_dir_key#3);
> [21003.044419]                                lock(&mm->mmap_sem);
> [21003.051025]   lock(cpu_hotplug_lock.rw_sem);
> [21003.055788] 
> [21003.055788]  *** DEADLOCK ***
> [21003.055788] 
> [21003.062393] 2 locks held by a.out/4771:
> [21003.066675]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
> [21003.076391]  #1:  (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0
> [21003.086198] 
> [21003.086198] stack backtrace:
> [21003.091059] CPU: 6 PID: 4771 Comm: a.out Not tainted 4.13.0-rc3-next-20170807 #12
> [21003.099409] Hardware name: Dell Inc. PowerEdge M520/0DW6GX, BIOS 2.4.2 02/03/2015
> [21003.107766] Call Trace:
> [21003.110495]  dump_stack+0x85/0xc9
> [21003.114190]  print_circular_bug+0x1f9/0x207
> [21003.118854]  __lock_acquire+0x153c/0x1550
> [21003.123327]  lock_acquire+0xc9/0x230
> [21003.127313]  ? drain_all_stock.part.35+0x18/0x140
> [21003.132563]  cpus_read_lock+0x4b/0x90
> [21003.136652]  ? drain_all_stock.part.35+0x18/0x140
> [21003.141900]  drain_all_stock.part.35+0x18/0x140
> [21003.146954]  try_charge+0x3ab/0x6e0
> [21003.150846]  mem_cgroup_try_charge+0x7f/0x2c0
> [21003.155705]  shmem_getpage_gfp+0x25f/0x1050
> [21003.160374]  shmem_fault+0x96/0x200
> [21003.164263]  ? __lock_acquire+0x2fb/0x1550
> [21003.168832]  ? __lock_acquire+0x2fb/0x1550
> [21003.173402]  __do_fault+0x1e/0xa0
> [21003.177097]  __handle_mm_fault+0x9c3/0xe00
> [21003.181669]  handle_mm_fault+0x16e/0x380
> [21003.186045]  ? handle_mm_fault+0x49/0x380
> [21003.190518]  __do_page_fault+0x24a/0x530
> [21003.194895]  do_page_fault+0x30/0x80
> [21003.198883]  page_fault+0x28/0x30
> [21003.202593] RIP: 0033:0x400886
> [21003.205998] RSP: 002b:00007fff81d84d20 EFLAGS: 00010206
> [21003.211827] RAX: 00007fc763bb7000 RBX: 0000000000000000 RCX: 0000000000001000
> [21003.219789] RDX: 0000000006362000 RSI: 0000000019000000 RDI: 00007fc75d855000
> [21003.227751] RBP: 00007fff81d84d50 R08: ffffffffffffffff R09: 0000000000000000
> [21003.235713] R10: 00007fff81d84a30 R11: 00007fc7769445d0 R12: 0000000000400750
> [21003.243681] R13: 00007fff81d84f70 R14: 0000000000000000 R15: 0000000000000000
> 
> -- 
> Regards,
>   Artem
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem
  2017-08-16 13:39   ` Laurent Dufour
@ 2017-08-16 15:36     ` Artem Savkov
  2017-08-16 15:40       ` Laurent Dufour
  0 siblings, 1 reply; 15+ messages in thread
From: Artem Savkov @ 2017-08-16 15:36 UTC (permalink / raw)
  To: Laurent Dufour; +Cc: paulmck, Thomas Gleixner, linux-kernel

On Wed, Aug 16, 2017 at 03:39:14PM +0200, Laurent Dufour wrote:
> On 15/08/2017 21:01, Paul E. McKenney wrote:
> > On Mon, Aug 07, 2017 at 04:09:47PM +0200, Artem Savkov wrote:
> >> Hello,
> >>
> >> After commit fc8dffd "cpu/hotplug: Convert hotplug locking to percpu rwsem"
> >> the following lockdep splat started showing up on some systems while running
> >> ltp's madvise06 test (right after first dirty_pages call [1]).
> > 
> > Hello, Artem,
> > 
> > Have you tried running this with Laurent Dufour's speculative page-fault
> > patch set?  https://lwn.net/Articles/730160/
> 
> Hello Artem, Hello Paul,
> 
> This would be a good idea to give it a try, but I don't think this will
> help here as the speculative page fault handler is aborted if vma->ops is
> set as it is the case in the following stack trace of the CPU #0.
> 
> This being said, the speculative page fault handler may also failed due to
> VMA's changes occurring in our back. In such a case, the legacy page fault
> handler is then tried and such a lock dependency is expected to raise again.

I've tried with the patch set on top of rc5 and the warning is intact
except for cpu_hotplug_lock path having an extra call to
handle_pte_fault() between handle_mm_fault() and __do_fault().

[   32.036924] -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
[   32.037628]        __lock_acquire+0x153c/0x1550
[   32.038231]        lock_acquire+0xc9/0x230
[   32.038740]        cpus_read_lock+0x4b/0x90
[   32.039292]        drain_all_stock.part.33+0x18/0x140
[   32.039815]        try_charge+0x3ab/0x6e0
[   32.040273]        mem_cgroup_try_charge+0x82/0x330
[   32.040845]        shmem_getpage_gfp+0x268/0x1050
[   32.041421]        shmem_fault+0x96/0x200
[   32.041955]        __do_fault+0x1e/0xa0
[   32.042354]        handle_pte_fault+0x4bd/0x980
[   32.042923]        __handle_mm_fault+0x21b/0x520
[   32.043497]        handle_mm_fault+0x16e/0x380
[   32.044070]        __do_page_fault+0x3fe/0x5a0
[   32.044532]        do_page_fault+0x30/0x80
[   32.045004]        page_fault+0x28/0x30


> 
> Cheers,
> Laurent.
> 
> > 
> >> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c#L136
> >>
> >> [21002.630252] ======================================================
> >> [21002.637148] WARNING: possible circular locking dependency detected
> >> [21002.644045] 4.13.0-rc3-next-20170807 #12 Not tainted
> >> [21002.649583] ------------------------------------------------------
> >> [21002.656492] a.out/4771 is trying to acquire lock:
> >> [21002.661742]  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140
> >> [21002.672629] 
> >> [21002.672629] but task is already holding lock:
> >> [21002.679137]  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
> >> [21002.688371] 
> >> [21002.688371] which lock already depends on the new lock.
> >> [21002.688371] 
> >> [21002.697505] 
> >> [21002.697505] the existing dependency chain (in reverse order) is:
> >> [21002.705856] 
> >> [21002.705856] -> #3 (&mm->mmap_sem){++++++}:
> >> [21002.712080]        lock_acquire+0xc9/0x230
> >> [21002.716661]        __might_fault+0x70/0xa0
> >> [21002.721241]        _copy_to_user+0x23/0x70
> >> [21002.725814]        filldir+0xa7/0x110
> >> [21002.729988]        xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
> >> [21002.736840]        xfs_readdir+0x1fa/0x2c0 [xfs]
> >> [21002.742042]        xfs_file_readdir+0x30/0x40 [xfs]
> >> [21002.747485]        iterate_dir+0x17a/0x1a0
> >> [21002.752057]        SyS_getdents+0xb0/0x160
> >> [21002.756638]        entry_SYSCALL_64_fastpath+0x1f/0xbe
> >> [21002.762371] 
> >> [21002.762371] -> #2 (&type->i_mutex_dir_key#3){++++++}:
> >> [21002.769661]        lock_acquire+0xc9/0x230
> >> [21002.774239]        down_read+0x51/0xb0
> >> [21002.778429]        lookup_slow+0xde/0x210
> >> [21002.782903]        walk_component+0x160/0x250
> >> [21002.787765]        link_path_walk+0x1a6/0x610
> >> [21002.792625]        path_openat+0xe4/0xd50
> >> [21002.797100]        do_filp_open+0x91/0x100
> >> [21002.801673]        file_open_name+0xf5/0x130
> >> [21002.806429]        filp_open+0x33/0x50
> >> [21002.810620]        kernel_read_file_from_path+0x39/0x80
> >> [21002.816459]        _request_firmware+0x39f/0x880
> >> [21002.821610]        request_firmware_direct+0x37/0x50
> >> [21002.827151]        request_microcode_fw+0x64/0xe0
> >> [21002.832401]        reload_store+0xf7/0x180
> >> [21002.836974]        dev_attr_store+0x18/0x30
> >> [21002.841641]        sysfs_kf_write+0x44/0x60
> >> [21002.846318]        kernfs_fop_write+0x113/0x1a0
> >> [21002.851374]        __vfs_write+0x37/0x170
> >> [21002.855849]        vfs_write+0xc7/0x1c0
> >> [21002.860128]        SyS_write+0x58/0xc0
> >> [21002.864313]        do_syscall_64+0x6c/0x1f0
> >> [21002.868973]        return_from_SYSCALL_64+0x0/0x7a
> >> [21002.874317] 
> >> [21002.874317] -> #1 (microcode_mutex){+.+.+.}:
> >> [21002.880748]        lock_acquire+0xc9/0x230
> >> [21002.885322]        __mutex_lock+0x88/0x960
> >> [21002.889894]        mutex_lock_nested+0x1b/0x20
> >> [21002.894854]        microcode_init+0xbb/0x208
> >> [21002.899617]        do_one_initcall+0x51/0x1a9
> >> [21002.904481]        kernel_init_freeable+0x208/0x2a7
> >> [21002.909922]        kernel_init+0xe/0x104
> >> [21002.914298]        ret_from_fork+0x2a/0x40
> >> [21002.918867] 
> >> [21002.918867] -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
> >> [21002.926058]        __lock_acquire+0x153c/0x1550
> >> [21002.931112]        lock_acquire+0xc9/0x230
> >> [21002.935688]        cpus_read_lock+0x4b/0x90
> >> [21002.940353]        drain_all_stock.part.35+0x18/0x140
> >> [21002.945987]        try_charge+0x3ab/0x6e0
> >> [21002.950460]        mem_cgroup_try_charge+0x7f/0x2c0
> >> [21002.955902]        shmem_getpage_gfp+0x25f/0x1050
> >> [21002.961149]        shmem_fault+0x96/0x200
> >> [21002.965621]        __do_fault+0x1e/0xa0
> >> [21002.969905]        __handle_mm_fault+0x9c3/0xe00
> >> [21002.975056]        handle_mm_fault+0x16e/0x380
> >> [21002.980013]        __do_page_fault+0x24a/0x530
> >> [21002.984968]        do_page_fault+0x30/0x80
> >> [21002.989537]        page_fault+0x28/0x30
> >> [21002.993812] 
> >> [21002.993812] other info that might help us debug this:
> >> [21002.993812] 
> >> [21003.002744] Chain exists of:
> >> [21003.002744]   cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem
> >> [21003.002744] 
> >> [21003.016238]  Possible unsafe locking scenario:
> >> [21003.016238] 
> >> [21003.022843]        CPU0                    CPU1
> >> [21003.027896]        ----                    ----
> >> [21003.032948]   lock(&mm->mmap_sem);
> >> [21003.036741]                                lock(&type->i_mutex_dir_key#3);
> >> [21003.044419]                                lock(&mm->mmap_sem);
> >> [21003.051025]   lock(cpu_hotplug_lock.rw_sem);
> >> [21003.055788] 
> >> [21003.055788]  *** DEADLOCK ***
> >> [21003.055788] 
> >> [21003.062393] 2 locks held by a.out/4771:
> >> [21003.066675]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
> >> [21003.076391]  #1:  (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0
> >> [21003.086198] 
> >> [21003.086198] stack backtrace:
> >> [21003.091059] CPU: 6 PID: 4771 Comm: a.out Not tainted 4.13.0-rc3-next-20170807 #12
> >> [21003.099409] Hardware name: Dell Inc. PowerEdge M520/0DW6GX, BIOS 2.4.2 02/03/2015
> >> [21003.107766] Call Trace:
> >> [21003.110495]  dump_stack+0x85/0xc9
> >> [21003.114190]  print_circular_bug+0x1f9/0x207
> >> [21003.118854]  __lock_acquire+0x153c/0x1550
> >> [21003.123327]  lock_acquire+0xc9/0x230
> >> [21003.127313]  ? drain_all_stock.part.35+0x18/0x140
> >> [21003.132563]  cpus_read_lock+0x4b/0x90
> >> [21003.136652]  ? drain_all_stock.part.35+0x18/0x140
> >> [21003.141900]  drain_all_stock.part.35+0x18/0x140
> >> [21003.146954]  try_charge+0x3ab/0x6e0
> >> [21003.150846]  mem_cgroup_try_charge+0x7f/0x2c0
> >> [21003.155705]  shmem_getpage_gfp+0x25f/0x1050
> >> [21003.160374]  shmem_fault+0x96/0x200
> >> [21003.164263]  ? __lock_acquire+0x2fb/0x1550
> >> [21003.168832]  ? __lock_acquire+0x2fb/0x1550
> >> [21003.173402]  __do_fault+0x1e/0xa0
> >> [21003.177097]  __handle_mm_fault+0x9c3/0xe00
> >> [21003.181669]  handle_mm_fault+0x16e/0x380
> >> [21003.186045]  ? handle_mm_fault+0x49/0x380
> >> [21003.190518]  __do_page_fault+0x24a/0x530
> >> [21003.194895]  do_page_fault+0x30/0x80
> >> [21003.198883]  page_fault+0x28/0x30
> >> [21003.202593] RIP: 0033:0x400886
> >> [21003.205998] RSP: 002b:00007fff81d84d20 EFLAGS: 00010206
> >> [21003.211827] RAX: 00007fc763bb7000 RBX: 0000000000000000 RCX: 0000000000001000
> >> [21003.219789] RDX: 0000000006362000 RSI: 0000000019000000 RDI: 00007fc75d855000
> >> [21003.227751] RBP: 00007fff81d84d50 R08: ffffffffffffffff R09: 0000000000000000
> >> [21003.235713] R10: 00007fff81d84a30 R11: 00007fc7769445d0 R12: 0000000000400750
> >> [21003.243681] R13: 00007fff81d84f70 R14: 0000000000000000 R15: 0000000000000000
> >>
> >> -- 
> >> Regards,
> >>   Artem
> >>
> 

-- 
Regards,
  Artem

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem
  2017-08-16 15:36     ` Artem Savkov
@ 2017-08-16 15:40       ` Laurent Dufour
  0 siblings, 0 replies; 15+ messages in thread
From: Laurent Dufour @ 2017-08-16 15:40 UTC (permalink / raw)
  To: Artem Savkov; +Cc: paulmck, Thomas Gleixner, linux-kernel

On 16/08/2017 17:36, Artem Savkov wrote:
> On Wed, Aug 16, 2017 at 03:39:14PM +0200, Laurent Dufour wrote:
>> On 15/08/2017 21:01, Paul E. McKenney wrote:
>>> On Mon, Aug 07, 2017 at 04:09:47PM +0200, Artem Savkov wrote:
>>>> Hello,
>>>>
>>>> After commit fc8dffd "cpu/hotplug: Convert hotplug locking to percpu rwsem"
>>>> the following lockdep splat started showing up on some systems while running
>>>> ltp's madvise06 test (right after first dirty_pages call [1]).
>>>
>>> Hello, Artem,
>>>
>>> Have you tried running this with Laurent Dufour's speculative page-fault
>>> patch set?  https://lwn.net/Articles/730160/
>>
>> Hello Artem, Hello Paul,
>>
>> This would be a good idea to give it a try, but I don't think this will
>> help here as the speculative page fault handler is aborted if vma->ops is
>> set as it is the case in the following stack trace of the CPU #0.
>>
>> This being said, the speculative page fault handler may also failed due to
>> VMA's changes occurring in our back. In such a case, the legacy page fault
>> handler is then tried and such a lock dependency is expected to raise again.
> 
> I've tried with the patch set on top of rc5 and the warning is intact
> except for cpu_hotplug_lock path having an extra call to
> handle_pte_fault() between handle_mm_fault() and __do_fault().

This is because now handle_pte_fault() is no more inlined in
handle_mm_fault() as it may also be called by handle_speculative_fault().

> 
> [   32.036924] -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
> [   32.037628]        __lock_acquire+0x153c/0x1550
> [   32.038231]        lock_acquire+0xc9/0x230
> [   32.038740]        cpus_read_lock+0x4b/0x90
> [   32.039292]        drain_all_stock.part.33+0x18/0x140
> [   32.039815]        try_charge+0x3ab/0x6e0
> [   32.040273]        mem_cgroup_try_charge+0x82/0x330
> [   32.040845]        shmem_getpage_gfp+0x268/0x1050
> [   32.041421]        shmem_fault+0x96/0x200
> [   32.041955]        __do_fault+0x1e/0xa0
> [   32.042354]        handle_pte_fault+0x4bd/0x980
> [   32.042923]        __handle_mm_fault+0x21b/0x520
> [   32.043497]        handle_mm_fault+0x16e/0x380
> [   32.044070]        __do_page_fault+0x3fe/0x5a0
> [   32.044532]        do_page_fault+0x30/0x80
> [   32.045004]        page_fault+0x28/0x30
> 
> 
>>
>> Cheers,
>> Laurent.
>>
>>>
>>>> [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c#L136
>>>>
>>>> [21002.630252] ======================================================
>>>> [21002.637148] WARNING: possible circular locking dependency detected
>>>> [21002.644045] 4.13.0-rc3-next-20170807 #12 Not tainted
>>>> [21002.649583] ------------------------------------------------------
>>>> [21002.656492] a.out/4771 is trying to acquire lock:
>>>> [21002.661742]  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140
>>>> [21002.672629] 
>>>> [21002.672629] but task is already holding lock:
>>>> [21002.679137]  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
>>>> [21002.688371] 
>>>> [21002.688371] which lock already depends on the new lock.
>>>> [21002.688371] 
>>>> [21002.697505] 
>>>> [21002.697505] the existing dependency chain (in reverse order) is:
>>>> [21002.705856] 
>>>> [21002.705856] -> #3 (&mm->mmap_sem){++++++}:
>>>> [21002.712080]        lock_acquire+0xc9/0x230
>>>> [21002.716661]        __might_fault+0x70/0xa0
>>>> [21002.721241]        _copy_to_user+0x23/0x70
>>>> [21002.725814]        filldir+0xa7/0x110
>>>> [21002.729988]        xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
>>>> [21002.736840]        xfs_readdir+0x1fa/0x2c0 [xfs]
>>>> [21002.742042]        xfs_file_readdir+0x30/0x40 [xfs]
>>>> [21002.747485]        iterate_dir+0x17a/0x1a0
>>>> [21002.752057]        SyS_getdents+0xb0/0x160
>>>> [21002.756638]        entry_SYSCALL_64_fastpath+0x1f/0xbe
>>>> [21002.762371] 
>>>> [21002.762371] -> #2 (&type->i_mutex_dir_key#3){++++++}:
>>>> [21002.769661]        lock_acquire+0xc9/0x230
>>>> [21002.774239]        down_read+0x51/0xb0
>>>> [21002.778429]        lookup_slow+0xde/0x210
>>>> [21002.782903]        walk_component+0x160/0x250
>>>> [21002.787765]        link_path_walk+0x1a6/0x610
>>>> [21002.792625]        path_openat+0xe4/0xd50
>>>> [21002.797100]        do_filp_open+0x91/0x100
>>>> [21002.801673]        file_open_name+0xf5/0x130
>>>> [21002.806429]        filp_open+0x33/0x50
>>>> [21002.810620]        kernel_read_file_from_path+0x39/0x80
>>>> [21002.816459]        _request_firmware+0x39f/0x880
>>>> [21002.821610]        request_firmware_direct+0x37/0x50
>>>> [21002.827151]        request_microcode_fw+0x64/0xe0
>>>> [21002.832401]        reload_store+0xf7/0x180
>>>> [21002.836974]        dev_attr_store+0x18/0x30
>>>> [21002.841641]        sysfs_kf_write+0x44/0x60
>>>> [21002.846318]        kernfs_fop_write+0x113/0x1a0
>>>> [21002.851374]        __vfs_write+0x37/0x170
>>>> [21002.855849]        vfs_write+0xc7/0x1c0
>>>> [21002.860128]        SyS_write+0x58/0xc0
>>>> [21002.864313]        do_syscall_64+0x6c/0x1f0
>>>> [21002.868973]        return_from_SYSCALL_64+0x0/0x7a
>>>> [21002.874317] 
>>>> [21002.874317] -> #1 (microcode_mutex){+.+.+.}:
>>>> [21002.880748]        lock_acquire+0xc9/0x230
>>>> [21002.885322]        __mutex_lock+0x88/0x960
>>>> [21002.889894]        mutex_lock_nested+0x1b/0x20
>>>> [21002.894854]        microcode_init+0xbb/0x208
>>>> [21002.899617]        do_one_initcall+0x51/0x1a9
>>>> [21002.904481]        kernel_init_freeable+0x208/0x2a7
>>>> [21002.909922]        kernel_init+0xe/0x104
>>>> [21002.914298]        ret_from_fork+0x2a/0x40
>>>> [21002.918867] 
>>>> [21002.918867] -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
>>>> [21002.926058]        __lock_acquire+0x153c/0x1550
>>>> [21002.931112]        lock_acquire+0xc9/0x230
>>>> [21002.935688]        cpus_read_lock+0x4b/0x90
>>>> [21002.940353]        drain_all_stock.part.35+0x18/0x140
>>>> [21002.945987]        try_charge+0x3ab/0x6e0
>>>> [21002.950460]        mem_cgroup_try_charge+0x7f/0x2c0
>>>> [21002.955902]        shmem_getpage_gfp+0x25f/0x1050
>>>> [21002.961149]        shmem_fault+0x96/0x200
>>>> [21002.965621]        __do_fault+0x1e/0xa0
>>>> [21002.969905]        __handle_mm_fault+0x9c3/0xe00
>>>> [21002.975056]        handle_mm_fault+0x16e/0x380
>>>> [21002.980013]        __do_page_fault+0x24a/0x530
>>>> [21002.984968]        do_page_fault+0x30/0x80
>>>> [21002.989537]        page_fault+0x28/0x30
>>>> [21002.993812] 
>>>> [21002.993812] other info that might help us debug this:
>>>> [21002.993812] 
>>>> [21003.002744] Chain exists of:
>>>> [21003.002744]   cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem
>>>> [21003.002744] 
>>>> [21003.016238]  Possible unsafe locking scenario:
>>>> [21003.016238] 
>>>> [21003.022843]        CPU0                    CPU1
>>>> [21003.027896]        ----                    ----
>>>> [21003.032948]   lock(&mm->mmap_sem);
>>>> [21003.036741]                                lock(&type->i_mutex_dir_key#3);
>>>> [21003.044419]                                lock(&mm->mmap_sem);
>>>> [21003.051025]   lock(cpu_hotplug_lock.rw_sem);
>>>> [21003.055788] 
>>>> [21003.055788]  *** DEADLOCK ***
>>>> [21003.055788] 
>>>> [21003.062393] 2 locks held by a.out/4771:
>>>> [21003.066675]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
>>>> [21003.076391]  #1:  (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0
>>>> [21003.086198] 
>>>> [21003.086198] stack backtrace:
>>>> [21003.091059] CPU: 6 PID: 4771 Comm: a.out Not tainted 4.13.0-rc3-next-20170807 #12
>>>> [21003.099409] Hardware name: Dell Inc. PowerEdge M520/0DW6GX, BIOS 2.4.2 02/03/2015
>>>> [21003.107766] Call Trace:
>>>> [21003.110495]  dump_stack+0x85/0xc9
>>>> [21003.114190]  print_circular_bug+0x1f9/0x207
>>>> [21003.118854]  __lock_acquire+0x153c/0x1550
>>>> [21003.123327]  lock_acquire+0xc9/0x230
>>>> [21003.127313]  ? drain_all_stock.part.35+0x18/0x140
>>>> [21003.132563]  cpus_read_lock+0x4b/0x90
>>>> [21003.136652]  ? drain_all_stock.part.35+0x18/0x140
>>>> [21003.141900]  drain_all_stock.part.35+0x18/0x140
>>>> [21003.146954]  try_charge+0x3ab/0x6e0
>>>> [21003.150846]  mem_cgroup_try_charge+0x7f/0x2c0
>>>> [21003.155705]  shmem_getpage_gfp+0x25f/0x1050
>>>> [21003.160374]  shmem_fault+0x96/0x200
>>>> [21003.164263]  ? __lock_acquire+0x2fb/0x1550
>>>> [21003.168832]  ? __lock_acquire+0x2fb/0x1550
>>>> [21003.173402]  __do_fault+0x1e/0xa0
>>>> [21003.177097]  __handle_mm_fault+0x9c3/0xe00
>>>> [21003.181669]  handle_mm_fault+0x16e/0x380
>>>> [21003.186045]  ? handle_mm_fault+0x49/0x380
>>>> [21003.190518]  __do_page_fault+0x24a/0x530
>>>> [21003.194895]  do_page_fault+0x30/0x80
>>>> [21003.198883]  page_fault+0x28/0x30
>>>> [21003.202593] RIP: 0033:0x400886
>>>> [21003.205998] RSP: 002b:00007fff81d84d20 EFLAGS: 00010206
>>>> [21003.211827] RAX: 00007fc763bb7000 RBX: 0000000000000000 RCX: 0000000000001000
>>>> [21003.219789] RDX: 0000000006362000 RSI: 0000000019000000 RDI: 00007fc75d855000
>>>> [21003.227751] RBP: 00007fff81d84d50 R08: ffffffffffffffff R09: 0000000000000000
>>>> [21003.235713] R10: 00007fff81d84a30 R11: 00007fc7769445d0 R12: 0000000000400750
>>>> [21003.243681] R13: 00007fff81d84f70 R14: 0000000000000000 R15: 0000000000000000
>>>>
>>>> -- 
>>>> Regards,
>>>>   Artem
>>>>
>>
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem
  2017-08-16 14:07 ` Thomas Gleixner
@ 2017-08-30 14:15   ` Michal Hocko
  2017-08-30 15:43     ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2017-08-30 14:15 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Artem Savkov, Paul E. McKenney, LKML, Johannes Weiner, linux-mm

On Wed 16-08-17 16:07:21, Thomas Gleixner wrote:
> On Mon, 7 Aug 2017, Artem Savkov wrote:
> 
> +Cc mm folks ...

Ups, this has fallen through cracks

> > Hello,
> > 
> > After commit fc8dffd "cpu/hotplug: Convert hotplug locking to percpu rwsem"
> > the following lockdep splat started showing up on some systems while running
> > ltp's madvise06 test (right after first dirty_pages call [1]).
> >
> > [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c#L136
> > 
> > [21002.630252] ======================================================
> > [21002.637148] WARNING: possible circular locking dependency detected
> > [21002.644045] 4.13.0-rc3-next-20170807 #12 Not tainted
> > [21002.649583] ------------------------------------------------------
> > [21002.656492] a.out/4771 is trying to acquire lock:
> > [21002.661742]  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140
> > [21002.672629] 
> > [21002.672629] but task is already holding lock:
> > [21002.679137]  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
[...]
> > [21002.993812] other info that might help us debug this:
> > [21002.993812] 
> > [21003.002744] Chain exists of:
> > [21003.002744]   cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem
> > [21003.002744] 
> > [21003.016238]  Possible unsafe locking scenario:
> > [21003.016238] 
> > [21003.022843]        CPU0                    CPU1
> > [21003.027896]        ----                    ----
> > [21003.032948]   lock(&mm->mmap_sem);
> > [21003.036741]                                lock(&type->i_mutex_dir_key#3);
> > [21003.044419]                                lock(&mm->mmap_sem);
> > [21003.051025]   lock(cpu_hotplug_lock.rw_sem);

OK, this smells like the same thing we had to address for
drain_all_pages by a459eeb7b852 ("mm, page_alloc: do not depend on cpu
hotplug locks inside the allocator"). try_charge might be deep in the
call path so taking cpu_hotplug_lock just calls for troubles.

I have of course forgot all the subtle details about drain_all_pages but
re-reading the changelog it seems that we can get along with droping
{get,put}_online_cpus in because drain_local_stock (which is called from
the WQ context as well) is disabling irqs and _always_ operates on the
local cpu stock. So we cannot possibly race with the memory hotplug
AFAICS.

So what do you think about the following patch?
---
>From fefd7533a95050e63b22206b76259fe098e7991e Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 30 Aug 2017 16:09:01 +0200
Subject: [PATCH] mm, memcg: remove hotplug locking from try_charge

The following lockde splat has been noticed during LTP testing

[21002.630252] ======================================================
[21002.637148] WARNING: possible circular locking dependency detected
[21002.644045] 4.13.0-rc3-next-20170807 #12 Not tainted
[21002.649583] ------------------------------------------------------
[21002.656492] a.out/4771 is trying to acquire lock:
[21002.661742]  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140
[21002.672629]
[21002.672629] but task is already holding lock:
[21002.679137]  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
[21002.688371]
[21002.688371] which lock already depends on the new lock.
[21002.688371]
[21002.697505]
[21002.697505] the existing dependency chain (in reverse order) is:
[21002.705856]
[21002.705856] -> #3 (&mm->mmap_sem){++++++}:
[21002.712080]        lock_acquire+0xc9/0x230
[21002.716661]        __might_fault+0x70/0xa0
[21002.721241]        _copy_to_user+0x23/0x70
[21002.725814]        filldir+0xa7/0x110
[21002.729988]        xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
[21002.736840]        xfs_readdir+0x1fa/0x2c0 [xfs]
[21002.742042]        xfs_file_readdir+0x30/0x40 [xfs]
[21002.747485]        iterate_dir+0x17a/0x1a0
[21002.752057]        SyS_getdents+0xb0/0x160
[21002.756638]        entry_SYSCALL_64_fastpath+0x1f/0xbe
[21002.762371]
[21002.762371] -> #2 (&type->i_mutex_dir_key#3){++++++}:
[21002.769661]        lock_acquire+0xc9/0x230
[21002.774239]        down_read+0x51/0xb0
[21002.778429]        lookup_slow+0xde/0x210
[21002.782903]        walk_component+0x160/0x250
[21002.787765]        link_path_walk+0x1a6/0x610
[21002.792625]        path_openat+0xe4/0xd50
[21002.797100]        do_filp_open+0x91/0x100
[21002.801673]        file_open_name+0xf5/0x130
[21002.806429]        filp_open+0x33/0x50
[21002.810620]        kernel_read_file_from_path+0x39/0x80
[21002.816459]        _request_firmware+0x39f/0x880
[21002.821610]        request_firmware_direct+0x37/0x50
[21002.827151]        request_microcode_fw+0x64/0xe0
[21002.832401]        reload_store+0xf7/0x180
[21002.836974]        dev_attr_store+0x18/0x30
[21002.841641]        sysfs_kf_write+0x44/0x60
[21002.846318]        kernfs_fop_write+0x113/0x1a0
[21002.851374]        __vfs_write+0x37/0x170
[21002.855849]        vfs_write+0xc7/0x1c0
[21002.860128]        SyS_write+0x58/0xc0
[21002.864313]        do_syscall_64+0x6c/0x1f0
[21002.868973]        return_from_SYSCALL_64+0x0/0x7a
[21002.874317]
[21002.874317] -> #1 (microcode_mutex){+.+.+.}:
[21002.880748]        lock_acquire+0xc9/0x230
[21002.885322]        __mutex_lock+0x88/0x960
[21002.889894]        mutex_lock_nested+0x1b/0x20
[21002.894854]        microcode_init+0xbb/0x208
[21002.899617]        do_one_initcall+0x51/0x1a9
[21002.904481]        kernel_init_freeable+0x208/0x2a7
[21002.909922]        kernel_init+0xe/0x104
[21002.914298]        ret_from_fork+0x2a/0x40
[21002.918867]
[21002.918867] -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
[21002.926058]        __lock_acquire+0x153c/0x1550
[21002.931112]        lock_acquire+0xc9/0x230
[21002.935688]        cpus_read_lock+0x4b/0x90
[21002.940353]        drain_all_stock.part.35+0x18/0x140
[21002.945987]        try_charge+0x3ab/0x6e0
[21002.950460]        mem_cgroup_try_charge+0x7f/0x2c0
[21002.955902]        shmem_getpage_gfp+0x25f/0x1050
[21002.961149]        shmem_fault+0x96/0x200
[21002.965621]        __do_fault+0x1e/0xa0
[21002.969905]        __handle_mm_fault+0x9c3/0xe00
[21002.975056]        handle_mm_fault+0x16e/0x380
[21002.980013]        __do_page_fault+0x24a/0x530
[21002.984968]        do_page_fault+0x30/0x80
[21002.989537]        page_fault+0x28/0x30
[21002.993812]
[21002.993812] other info that might help us debug this:
[21002.993812]
[21003.002744] Chain exists of:
[21003.002744]   cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem
[21003.002744]
[21003.016238]  Possible unsafe locking scenario:
[21003.016238]
[21003.022843]        CPU0                    CPU1
[21003.027896]        ----                    ----
[21003.032948]   lock(&mm->mmap_sem);
[21003.036741]                                lock(&type->i_mutex_dir_key#3);
[21003.044419]                                lock(&mm->mmap_sem);
[21003.051025]   lock(cpu_hotplug_lock.rw_sem);
[21003.055788]
[21003.055788]  *** DEADLOCK ***
[21003.055788]
[21003.062393] 2 locks held by a.out/4771:
[21003.066675]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
[21003.076391]  #1:  (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0

The problem is very similar to the one fixed by a459eeb7b852 ("mm,
page_alloc: do not depend on cpu hotplug locks inside the allocator").
We are taking hotplug locks while we can be sitting on top of basically
arbitrary locks. This just calls for problems.

We can get rid of {get,put}_online_cpus, fortunately. We do not have to
be worried about races with memory hotplug because drain_local_stock,
which is called from both the WQ draining and the memory hotplug
contexts, is always operating on the local cpu stock with IRQs disabled.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memcontrol.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b9cf3cf4a3d0..6a97e827d282 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1765,6 +1765,10 @@ static void drain_local_stock(struct work_struct *dummy)
 	struct memcg_stock_pcp *stock;
 	unsigned long flags;
 
+	/*
+	 * The only protection from memory hotplug vs. drain_stock races is
+	 * that we always operate on local CPU stock here with IRQ disabled
+	 */
 	local_irq_save(flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
@@ -1807,7 +1811,6 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 	if (!mutex_trylock(&percpu_charge_mutex))
 		return;
 	/* Notify other cpus that system-wide "drain" is running */
-	get_online_cpus();
 	curcpu = get_cpu();
 	for_each_online_cpu(cpu) {
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
@@ -1826,7 +1829,6 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 		}
 	}
 	put_cpu();
-	put_online_cpus();
 	mutex_unlock(&percpu_charge_mutex);
 }
 
-- 
2.13.2

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem
  2017-08-30 14:15   ` Michal Hocko
@ 2017-08-30 15:43     ` Michal Hocko
  2017-08-31 11:10       ` Artem Savkov
  2017-09-04 14:03       ` Michal Hocko
  0 siblings, 2 replies; 15+ messages in thread
From: Michal Hocko @ 2017-08-30 15:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Artem Savkov, Paul E. McKenney, LKML, Johannes Weiner, linux-mm

The previous patch is insufficient. drain_all_stock can still race with
the memory offline callback and the underlying memcg disappear. So we
need to be more careful and pin the css on the memcg. This patch
instead...
---
>From 70a5acf9bbe76d183e81a1a6b57dd5b9edc677c6 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Wed, 30 Aug 2017 16:09:01 +0200
Subject: [PATCH] mm, memcg: remove hotplug locking from try_charge

The following lockde splat has been noticed during LTP testing

[21002.630252] ======================================================
[21002.637148] WARNING: possible circular locking dependency detected
[21002.644045] 4.13.0-rc3-next-20170807 #12 Not tainted
[21002.649583] ------------------------------------------------------
[21002.656492] a.out/4771 is trying to acquire lock:
[21002.661742]  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140
[21002.672629]
[21002.672629] but task is already holding lock:
[21002.679137]  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
[21002.688371]
[21002.688371] which lock already depends on the new lock.
[21002.688371]
[21002.697505]
[21002.697505] the existing dependency chain (in reverse order) is:
[21002.705856]
[21002.705856] -> #3 (&mm->mmap_sem){++++++}:
[21002.712080]        lock_acquire+0xc9/0x230
[21002.716661]        __might_fault+0x70/0xa0
[21002.721241]        _copy_to_user+0x23/0x70
[21002.725814]        filldir+0xa7/0x110
[21002.729988]        xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
[21002.736840]        xfs_readdir+0x1fa/0x2c0 [xfs]
[21002.742042]        xfs_file_readdir+0x30/0x40 [xfs]
[21002.747485]        iterate_dir+0x17a/0x1a0
[21002.752057]        SyS_getdents+0xb0/0x160
[21002.756638]        entry_SYSCALL_64_fastpath+0x1f/0xbe
[21002.762371]
[21002.762371] -> #2 (&type->i_mutex_dir_key#3){++++++}:
[21002.769661]        lock_acquire+0xc9/0x230
[21002.774239]        down_read+0x51/0xb0
[21002.778429]        lookup_slow+0xde/0x210
[21002.782903]        walk_component+0x160/0x250
[21002.787765]        link_path_walk+0x1a6/0x610
[21002.792625]        path_openat+0xe4/0xd50
[21002.797100]        do_filp_open+0x91/0x100
[21002.801673]        file_open_name+0xf5/0x130
[21002.806429]        filp_open+0x33/0x50
[21002.810620]        kernel_read_file_from_path+0x39/0x80
[21002.816459]        _request_firmware+0x39f/0x880
[21002.821610]        request_firmware_direct+0x37/0x50
[21002.827151]        request_microcode_fw+0x64/0xe0
[21002.832401]        reload_store+0xf7/0x180
[21002.836974]        dev_attr_store+0x18/0x30
[21002.841641]        sysfs_kf_write+0x44/0x60
[21002.846318]        kernfs_fop_write+0x113/0x1a0
[21002.851374]        __vfs_write+0x37/0x170
[21002.855849]        vfs_write+0xc7/0x1c0
[21002.860128]        SyS_write+0x58/0xc0
[21002.864313]        do_syscall_64+0x6c/0x1f0
[21002.868973]        return_from_SYSCALL_64+0x0/0x7a
[21002.874317]
[21002.874317] -> #1 (microcode_mutex){+.+.+.}:
[21002.880748]        lock_acquire+0xc9/0x230
[21002.885322]        __mutex_lock+0x88/0x960
[21002.889894]        mutex_lock_nested+0x1b/0x20
[21002.894854]        microcode_init+0xbb/0x208
[21002.899617]        do_one_initcall+0x51/0x1a9
[21002.904481]        kernel_init_freeable+0x208/0x2a7
[21002.909922]        kernel_init+0xe/0x104
[21002.914298]        ret_from_fork+0x2a/0x40
[21002.918867]
[21002.918867] -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
[21002.926058]        __lock_acquire+0x153c/0x1550
[21002.931112]        lock_acquire+0xc9/0x230
[21002.935688]        cpus_read_lock+0x4b/0x90
[21002.940353]        drain_all_stock.part.35+0x18/0x140
[21002.945987]        try_charge+0x3ab/0x6e0
[21002.950460]        mem_cgroup_try_charge+0x7f/0x2c0
[21002.955902]        shmem_getpage_gfp+0x25f/0x1050
[21002.961149]        shmem_fault+0x96/0x200
[21002.965621]        __do_fault+0x1e/0xa0
[21002.969905]        __handle_mm_fault+0x9c3/0xe00
[21002.975056]        handle_mm_fault+0x16e/0x380
[21002.980013]        __do_page_fault+0x24a/0x530
[21002.984968]        do_page_fault+0x30/0x80
[21002.989537]        page_fault+0x28/0x30
[21002.993812]
[21002.993812] other info that might help us debug this:
[21002.993812]
[21003.002744] Chain exists of:
[21003.002744]   cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem
[21003.002744]
[21003.016238]  Possible unsafe locking scenario:
[21003.016238]
[21003.022843]        CPU0                    CPU1
[21003.027896]        ----                    ----
[21003.032948]   lock(&mm->mmap_sem);
[21003.036741]                                lock(&type->i_mutex_dir_key#3);
[21003.044419]                                lock(&mm->mmap_sem);
[21003.051025]   lock(cpu_hotplug_lock.rw_sem);
[21003.055788]
[21003.055788]  *** DEADLOCK ***
[21003.055788]
[21003.062393] 2 locks held by a.out/4771:
[21003.066675]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
[21003.076391]  #1:  (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0

The problem is very similar to the one fixed by a459eeb7b852 ("mm,
page_alloc: do not depend on cpu hotplug locks inside the allocator").
We are taking hotplug locks while we can be sitting on top of basically
arbitrary locks. This just calls for problems.

We can get rid of {get,put}_online_cpus, fortunately. We do not have to
be worried about races with memory hotplug because drain_local_stock,
which is called from both the WQ draining and the memory hotplug
contexts, is always operating on the local cpu stock with IRQs disabled.

The only thing to be careful about is that the target memcg doesn't
vanish while we are still in drain_all_stock so take a reference on
it.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 mm/memcontrol.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b9cf3cf4a3d0..5c70f47abb3d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1765,6 +1765,10 @@ static void drain_local_stock(struct work_struct *dummy)
 	struct memcg_stock_pcp *stock;
 	unsigned long flags;
 
+	/*
+	 * The only protection from memory hotplug vs. drain_stock races is
+	 * that we always operate on local CPU stock here with IRQ disabled
+	 */
 	local_irq_save(flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
@@ -1807,26 +1811,27 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 	if (!mutex_trylock(&percpu_charge_mutex))
 		return;
 	/* Notify other cpus that system-wide "drain" is running */
-	get_online_cpus();
 	curcpu = get_cpu();
 	for_each_online_cpu(cpu) {
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
 		struct mem_cgroup *memcg;
 
 		memcg = stock->cached;
-		if (!memcg || !stock->nr_pages)
+		if (!memcg || !stock->nr_pages || !css_tryget(&memcg->css))
 			continue;
-		if (!mem_cgroup_is_descendant(memcg, root_memcg))
+		if (!mem_cgroup_is_descendant(memcg, root_memcg)) {
+			css_put(&memcg->css);
 			continue;
+		}
 		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
 			if (cpu == curcpu)
 				drain_local_stock(&stock->work);
 			else
 				schedule_work_on(cpu, &stock->work);
 		}
+		css_put(&memcg->css);
 	}
 	put_cpu();
-	put_online_cpus();
 	mutex_unlock(&percpu_charge_mutex);
 }
 
-- 
2.13.2

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem
  2017-08-30 15:43     ` Michal Hocko
@ 2017-08-31 11:10       ` Artem Savkov
  2017-08-31 12:09         ` Michal Hocko
  2017-09-04 14:03       ` Michal Hocko
  1 sibling, 1 reply; 15+ messages in thread
From: Artem Savkov @ 2017-08-31 11:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Thomas Gleixner, Paul E. McKenney, LKML, Johannes Weiner, linux-mm

Hi Michal,

On Wed, Aug 30, 2017 at 05:43:15PM +0200, Michal Hocko wrote:
> The previous patch is insufficient. drain_all_stock can still race with
> the memory offline callback and the underlying memcg disappear. So we
> need to be more careful and pin the css on the memcg. This patch
> instead...

Tried this on top of rc7 and it does fix the splat for me.

> ---
> From 70a5acf9bbe76d183e81a1a6b57dd5b9edc677c6 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Wed, 30 Aug 2017 16:09:01 +0200
> Subject: [PATCH] mm, memcg: remove hotplug locking from try_charge
> 
> The following lockde splat has been noticed during LTP testing
> 
> [21002.630252] ======================================================
> [21002.637148] WARNING: possible circular locking dependency detected
> [21002.644045] 4.13.0-rc3-next-20170807 #12 Not tainted
> [21002.649583] ------------------------------------------------------
> [21002.656492] a.out/4771 is trying to acquire lock:
> [21002.661742]  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140
> [21002.672629]
> [21002.672629] but task is already holding lock:
> [21002.679137]  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
> [21002.688371]
> [21002.688371] which lock already depends on the new lock.
> [21002.688371]
> [21002.697505]
> [21002.697505] the existing dependency chain (in reverse order) is:
> [21002.705856]
> [21002.705856] -> #3 (&mm->mmap_sem){++++++}:
> [21002.712080]        lock_acquire+0xc9/0x230
> [21002.716661]        __might_fault+0x70/0xa0
> [21002.721241]        _copy_to_user+0x23/0x70
> [21002.725814]        filldir+0xa7/0x110
> [21002.729988]        xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
> [21002.736840]        xfs_readdir+0x1fa/0x2c0 [xfs]
> [21002.742042]        xfs_file_readdir+0x30/0x40 [xfs]
> [21002.747485]        iterate_dir+0x17a/0x1a0
> [21002.752057]        SyS_getdents+0xb0/0x160
> [21002.756638]        entry_SYSCALL_64_fastpath+0x1f/0xbe
> [21002.762371]
> [21002.762371] -> #2 (&type->i_mutex_dir_key#3){++++++}:
> [21002.769661]        lock_acquire+0xc9/0x230
> [21002.774239]        down_read+0x51/0xb0
> [21002.778429]        lookup_slow+0xde/0x210
> [21002.782903]        walk_component+0x160/0x250
> [21002.787765]        link_path_walk+0x1a6/0x610
> [21002.792625]        path_openat+0xe4/0xd50
> [21002.797100]        do_filp_open+0x91/0x100
> [21002.801673]        file_open_name+0xf5/0x130
> [21002.806429]        filp_open+0x33/0x50
> [21002.810620]        kernel_read_file_from_path+0x39/0x80
> [21002.816459]        _request_firmware+0x39f/0x880
> [21002.821610]        request_firmware_direct+0x37/0x50
> [21002.827151]        request_microcode_fw+0x64/0xe0
> [21002.832401]        reload_store+0xf7/0x180
> [21002.836974]        dev_attr_store+0x18/0x30
> [21002.841641]        sysfs_kf_write+0x44/0x60
> [21002.846318]        kernfs_fop_write+0x113/0x1a0
> [21002.851374]        __vfs_write+0x37/0x170
> [21002.855849]        vfs_write+0xc7/0x1c0
> [21002.860128]        SyS_write+0x58/0xc0
> [21002.864313]        do_syscall_64+0x6c/0x1f0
> [21002.868973]        return_from_SYSCALL_64+0x0/0x7a
> [21002.874317]
> [21002.874317] -> #1 (microcode_mutex){+.+.+.}:
> [21002.880748]        lock_acquire+0xc9/0x230
> [21002.885322]        __mutex_lock+0x88/0x960
> [21002.889894]        mutex_lock_nested+0x1b/0x20
> [21002.894854]        microcode_init+0xbb/0x208
> [21002.899617]        do_one_initcall+0x51/0x1a9
> [21002.904481]        kernel_init_freeable+0x208/0x2a7
> [21002.909922]        kernel_init+0xe/0x104
> [21002.914298]        ret_from_fork+0x2a/0x40
> [21002.918867]
> [21002.918867] -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
> [21002.926058]        __lock_acquire+0x153c/0x1550
> [21002.931112]        lock_acquire+0xc9/0x230
> [21002.935688]        cpus_read_lock+0x4b/0x90
> [21002.940353]        drain_all_stock.part.35+0x18/0x140
> [21002.945987]        try_charge+0x3ab/0x6e0
> [21002.950460]        mem_cgroup_try_charge+0x7f/0x2c0
> [21002.955902]        shmem_getpage_gfp+0x25f/0x1050
> [21002.961149]        shmem_fault+0x96/0x200
> [21002.965621]        __do_fault+0x1e/0xa0
> [21002.969905]        __handle_mm_fault+0x9c3/0xe00
> [21002.975056]        handle_mm_fault+0x16e/0x380
> [21002.980013]        __do_page_fault+0x24a/0x530
> [21002.984968]        do_page_fault+0x30/0x80
> [21002.989537]        page_fault+0x28/0x30
> [21002.993812]
> [21002.993812] other info that might help us debug this:
> [21002.993812]
> [21003.002744] Chain exists of:
> [21003.002744]   cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem
> [21003.002744]
> [21003.016238]  Possible unsafe locking scenario:
> [21003.016238]
> [21003.022843]        CPU0                    CPU1
> [21003.027896]        ----                    ----
> [21003.032948]   lock(&mm->mmap_sem);
> [21003.036741]                                lock(&type->i_mutex_dir_key#3);
> [21003.044419]                                lock(&mm->mmap_sem);
> [21003.051025]   lock(cpu_hotplug_lock.rw_sem);
> [21003.055788]
> [21003.055788]  *** DEADLOCK ***
> [21003.055788]
> [21003.062393] 2 locks held by a.out/4771:
> [21003.066675]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
> [21003.076391]  #1:  (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0
> 
> The problem is very similar to the one fixed by a459eeb7b852 ("mm,
> page_alloc: do not depend on cpu hotplug locks inside the allocator").
> We are taking hotplug locks while we can be sitting on top of basically
> arbitrary locks. This just calls for problems.
> 
> We can get rid of {get,put}_online_cpus, fortunately. We do not have to
> be worried about races with memory hotplug because drain_local_stock,
> which is called from both the WQ draining and the memory hotplug
> contexts, is always operating on the local cpu stock with IRQs disabled.
> 
> The only thing to be careful about is that the target memcg doesn't
> vanish while we are still in drain_all_stock so take a reference on
> it.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/memcontrol.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b9cf3cf4a3d0..5c70f47abb3d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1765,6 +1765,10 @@ static void drain_local_stock(struct work_struct *dummy)
>  	struct memcg_stock_pcp *stock;
>  	unsigned long flags;
>  
> +	/*
> +	 * The only protection from memory hotplug vs. drain_stock races is
> +	 * that we always operate on local CPU stock here with IRQ disabled
> +	 */
>  	local_irq_save(flags);
>  
>  	stock = this_cpu_ptr(&memcg_stock);
> @@ -1807,26 +1811,27 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
>  	if (!mutex_trylock(&percpu_charge_mutex))
>  		return;
>  	/* Notify other cpus that system-wide "drain" is running */
> -	get_online_cpus();
>  	curcpu = get_cpu();
>  	for_each_online_cpu(cpu) {
>  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
>  		struct mem_cgroup *memcg;
>  
>  		memcg = stock->cached;
> -		if (!memcg || !stock->nr_pages)
> +		if (!memcg || !stock->nr_pages || !css_tryget(&memcg->css))
>  			continue;
> -		if (!mem_cgroup_is_descendant(memcg, root_memcg))
> +		if (!mem_cgroup_is_descendant(memcg, root_memcg)) {
> +			css_put(&memcg->css);
>  			continue;
> +		}
>  		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
>  			if (cpu == curcpu)
>  				drain_local_stock(&stock->work);
>  			else
>  				schedule_work_on(cpu, &stock->work);
>  		}
> +		css_put(&memcg->css);
>  	}
>  	put_cpu();
> -	put_online_cpus();
>  	mutex_unlock(&percpu_charge_mutex);
>  }
>  
> -- 
> 2.13.2
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Regards,
  Artem

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem
  2017-08-31 11:10       ` Artem Savkov
@ 2017-08-31 12:09         ` Michal Hocko
  2017-08-31 12:19           ` Artem Savkov
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2017-08-31 12:09 UTC (permalink / raw)
  To: Artem Savkov
  Cc: Thomas Gleixner, Paul E. McKenney, LKML, Johannes Weiner, linux-mm

On Thu 31-08-17 13:10:06, Artem Savkov wrote:
> Hi Michal,
> 
> On Wed, Aug 30, 2017 at 05:43:15PM +0200, Michal Hocko wrote:
> > The previous patch is insufficient. drain_all_stock can still race with
> > the memory offline callback and the underlying memcg disappear. So we
> > need to be more careful and pin the css on the memcg. This patch
> > instead...
> 
> Tried this on top of rc7 and it does fix the splat for me.

Thanks for testing! Can I assume your Tested-by?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem
  2017-08-31 12:09         ` Michal Hocko
@ 2017-08-31 12:19           ` Artem Savkov
  0 siblings, 0 replies; 15+ messages in thread
From: Artem Savkov @ 2017-08-31 12:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Thomas Gleixner, Paul E. McKenney, LKML, Johannes Weiner, linux-mm

On Thu, Aug 31, 2017 at 02:09:51PM +0200, Michal Hocko wrote:
> On Thu 31-08-17 13:10:06, Artem Savkov wrote:
> > Hi Michal,
> > 
> > On Wed, Aug 30, 2017 at 05:43:15PM +0200, Michal Hocko wrote:
> > > The previous patch is insufficient. drain_all_stock can still race with
> > > the memory offline callback and the underlying memcg disappear. So we
> > > need to be more careful and pin the css on the memcg. This patch
> > > instead...
> > 
> > Tried this on top of rc7 and it does fix the splat for me.
> 
> Thanks for testing! Can I assume your Tested-by?

Didn't test much more than the case that was causing it, but yes.

Reported-and-tested-by: Artem Savkov <asavkov@redhat.com>

-- 
Regards,
  Artem

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem
  2017-08-30 15:43     ` Michal Hocko
  2017-08-31 11:10       ` Artem Savkov
@ 2017-09-04 14:03       ` Michal Hocko
  2017-09-05  8:19         ` Thomas Gleixner
  1 sibling, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2017-09-04 14:03 UTC (permalink / raw)
  To: Thomas Gleixner, Johannes Weiner
  Cc: Artem Savkov, Paul E. McKenney, LKML, linux-mm

Thomas, Johannes,
could you double check my thinking here? I will repost the patch to
Andrew if you are OK with this.

On Wed 30-08-17 17:43:15, Michal Hocko wrote:
> The previous patch is insufficient. drain_all_stock can still race with
> the memory offline callback and the underlying memcg disappear. So we
> need to be more careful and pin the css on the memcg. This patch
> instead...
> ---
> From 70a5acf9bbe76d183e81a1a6b57dd5b9edc677c6 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Wed, 30 Aug 2017 16:09:01 +0200
> Subject: [PATCH] mm, memcg: remove hotplug locking from try_charge
> 
> The following lockde splat has been noticed during LTP testing
> 
> [21002.630252] ======================================================
> [21002.637148] WARNING: possible circular locking dependency detected
> [21002.644045] 4.13.0-rc3-next-20170807 #12 Not tainted
> [21002.649583] ------------------------------------------------------
> [21002.656492] a.out/4771 is trying to acquire lock:
> [21002.661742]  (cpu_hotplug_lock.rw_sem){++++++}, at: [<ffffffff812b4668>] drain_all_stock.part.35+0x18/0x140
> [21002.672629]
> [21002.672629] but task is already holding lock:
> [21002.679137]  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
> [21002.688371]
> [21002.688371] which lock already depends on the new lock.
> [21002.688371]
> [21002.697505]
> [21002.697505] the existing dependency chain (in reverse order) is:
> [21002.705856]
> [21002.705856] -> #3 (&mm->mmap_sem){++++++}:
> [21002.712080]        lock_acquire+0xc9/0x230
> [21002.716661]        __might_fault+0x70/0xa0
> [21002.721241]        _copy_to_user+0x23/0x70
> [21002.725814]        filldir+0xa7/0x110
> [21002.729988]        xfs_dir2_sf_getdents.isra.10+0x20c/0x2c0 [xfs]
> [21002.736840]        xfs_readdir+0x1fa/0x2c0 [xfs]
> [21002.742042]        xfs_file_readdir+0x30/0x40 [xfs]
> [21002.747485]        iterate_dir+0x17a/0x1a0
> [21002.752057]        SyS_getdents+0xb0/0x160
> [21002.756638]        entry_SYSCALL_64_fastpath+0x1f/0xbe
> [21002.762371]
> [21002.762371] -> #2 (&type->i_mutex_dir_key#3){++++++}:
> [21002.769661]        lock_acquire+0xc9/0x230
> [21002.774239]        down_read+0x51/0xb0
> [21002.778429]        lookup_slow+0xde/0x210
> [21002.782903]        walk_component+0x160/0x250
> [21002.787765]        link_path_walk+0x1a6/0x610
> [21002.792625]        path_openat+0xe4/0xd50
> [21002.797100]        do_filp_open+0x91/0x100
> [21002.801673]        file_open_name+0xf5/0x130
> [21002.806429]        filp_open+0x33/0x50
> [21002.810620]        kernel_read_file_from_path+0x39/0x80
> [21002.816459]        _request_firmware+0x39f/0x880
> [21002.821610]        request_firmware_direct+0x37/0x50
> [21002.827151]        request_microcode_fw+0x64/0xe0
> [21002.832401]        reload_store+0xf7/0x180
> [21002.836974]        dev_attr_store+0x18/0x30
> [21002.841641]        sysfs_kf_write+0x44/0x60
> [21002.846318]        kernfs_fop_write+0x113/0x1a0
> [21002.851374]        __vfs_write+0x37/0x170
> [21002.855849]        vfs_write+0xc7/0x1c0
> [21002.860128]        SyS_write+0x58/0xc0
> [21002.864313]        do_syscall_64+0x6c/0x1f0
> [21002.868973]        return_from_SYSCALL_64+0x0/0x7a
> [21002.874317]
> [21002.874317] -> #1 (microcode_mutex){+.+.+.}:
> [21002.880748]        lock_acquire+0xc9/0x230
> [21002.885322]        __mutex_lock+0x88/0x960
> [21002.889894]        mutex_lock_nested+0x1b/0x20
> [21002.894854]        microcode_init+0xbb/0x208
> [21002.899617]        do_one_initcall+0x51/0x1a9
> [21002.904481]        kernel_init_freeable+0x208/0x2a7
> [21002.909922]        kernel_init+0xe/0x104
> [21002.914298]        ret_from_fork+0x2a/0x40
> [21002.918867]
> [21002.918867] -> #0 (cpu_hotplug_lock.rw_sem){++++++}:
> [21002.926058]        __lock_acquire+0x153c/0x1550
> [21002.931112]        lock_acquire+0xc9/0x230
> [21002.935688]        cpus_read_lock+0x4b/0x90
> [21002.940353]        drain_all_stock.part.35+0x18/0x140
> [21002.945987]        try_charge+0x3ab/0x6e0
> [21002.950460]        mem_cgroup_try_charge+0x7f/0x2c0
> [21002.955902]        shmem_getpage_gfp+0x25f/0x1050
> [21002.961149]        shmem_fault+0x96/0x200
> [21002.965621]        __do_fault+0x1e/0xa0
> [21002.969905]        __handle_mm_fault+0x9c3/0xe00
> [21002.975056]        handle_mm_fault+0x16e/0x380
> [21002.980013]        __do_page_fault+0x24a/0x530
> [21002.984968]        do_page_fault+0x30/0x80
> [21002.989537]        page_fault+0x28/0x30
> [21002.993812]
> [21002.993812] other info that might help us debug this:
> [21002.993812]
> [21003.002744] Chain exists of:
> [21003.002744]   cpu_hotplug_lock.rw_sem --> &type->i_mutex_dir_key#3 --> &mm->mmap_sem
> [21003.002744]
> [21003.016238]  Possible unsafe locking scenario:
> [21003.016238]
> [21003.022843]        CPU0                    CPU1
> [21003.027896]        ----                    ----
> [21003.032948]   lock(&mm->mmap_sem);
> [21003.036741]                                lock(&type->i_mutex_dir_key#3);
> [21003.044419]                                lock(&mm->mmap_sem);
> [21003.051025]   lock(cpu_hotplug_lock.rw_sem);
> [21003.055788]
> [21003.055788]  *** DEADLOCK ***
> [21003.055788]
> [21003.062393] 2 locks held by a.out/4771:
> [21003.066675]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8106eb35>] __do_page_fault+0x175/0x530
> [21003.076391]  #1:  (percpu_charge_mutex){+.+...}, at: [<ffffffff812b4c97>] try_charge+0x397/0x6e0
> 
> The problem is very similar to the one fixed by a459eeb7b852 ("mm,
> page_alloc: do not depend on cpu hotplug locks inside the allocator").
> We are taking hotplug locks while we can be sitting on top of basically
> arbitrary locks. This just calls for problems.
> 
> We can get rid of {get,put}_online_cpus, fortunately. We do not have to
> be worried about races with memory hotplug because drain_local_stock,
> which is called from both the WQ draining and the memory hotplug
> contexts, is always operating on the local cpu stock with IRQs disabled.
> 
> The only thing to be careful about is that the target memcg doesn't
> vanish while we are still in drain_all_stock so take a reference on
> it.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  mm/memcontrol.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b9cf3cf4a3d0..5c70f47abb3d 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1765,6 +1765,10 @@ static void drain_local_stock(struct work_struct *dummy)
>  	struct memcg_stock_pcp *stock;
>  	unsigned long flags;
>  
> +	/*
> +	 * The only protection from memory hotplug vs. drain_stock races is
> +	 * that we always operate on local CPU stock here with IRQ disabled
> +	 */
>  	local_irq_save(flags);
>  
>  	stock = this_cpu_ptr(&memcg_stock);
> @@ -1807,26 +1811,27 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
>  	if (!mutex_trylock(&percpu_charge_mutex))
>  		return;
>  	/* Notify other cpus that system-wide "drain" is running */
> -	get_online_cpus();
>  	curcpu = get_cpu();
>  	for_each_online_cpu(cpu) {
>  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
>  		struct mem_cgroup *memcg;
>  
>  		memcg = stock->cached;
> -		if (!memcg || !stock->nr_pages)
> +		if (!memcg || !stock->nr_pages || !css_tryget(&memcg->css))
>  			continue;
> -		if (!mem_cgroup_is_descendant(memcg, root_memcg))
> +		if (!mem_cgroup_is_descendant(memcg, root_memcg)) {
> +			css_put(&memcg->css);
>  			continue;
> +		}
>  		if (!test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
>  			if (cpu == curcpu)
>  				drain_local_stock(&stock->work);
>  			else
>  				schedule_work_on(cpu, &stock->work);
>  		}
> +		css_put(&memcg->css);
>  	}
>  	put_cpu();
> -	put_online_cpus();
>  	mutex_unlock(&percpu_charge_mutex);
>  }
>  
> -- 
> 2.13.2
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem
  2017-09-04 14:03       ` Michal Hocko
@ 2017-09-05  8:19         ` Thomas Gleixner
  2017-09-05 10:23           ` Michal Hocko
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2017-09-05  8:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Artem Savkov, Paul E. McKenney, LKML, linux-mm

On Mon, 4 Sep 2017, Michal Hocko wrote:

> Thomas, Johannes,
> could you double check my thinking here? I will repost the patch to
> Andrew if you are OK with this.
> > +	/*
> > +	 * The only protection from memory hotplug vs. drain_stock races is
> > +	 * that we always operate on local CPU stock here with IRQ disabled
> > +	 */
> >  	local_irq_save(flags);
> >  
> >  	stock = this_cpu_ptr(&memcg_stock);
> > @@ -1807,26 +1811,27 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
> >  	if (!mutex_trylock(&percpu_charge_mutex))
> >  		return;
> >  	/* Notify other cpus that system-wide "drain" is running */
> > -	get_online_cpus();
> >  	curcpu = get_cpu();

The problem here is that this does only protect you against a CPU being
unplugged, but not against a CPU coming online concurrently. I have no idea
whether that might be a problem, but at least you should put a comment in
which explains why it is not.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem
  2017-09-05  8:19         ` Thomas Gleixner
@ 2017-09-05 10:23           ` Michal Hocko
  2017-09-05 11:04             ` Thomas Gleixner
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2017-09-05 10:23 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Johannes Weiner, Artem Savkov, Paul E. McKenney, LKML, linux-mm

On Tue 05-09-17 10:19:13, Thomas Gleixner wrote:
> On Mon, 4 Sep 2017, Michal Hocko wrote:
> 
> > Thomas, Johannes,
> > could you double check my thinking here? I will repost the patch to
> > Andrew if you are OK with this.
> > > +	/*
> > > +	 * The only protection from memory hotplug vs. drain_stock races is
> > > +	 * that we always operate on local CPU stock here with IRQ disabled
> > > +	 */
> > >  	local_irq_save(flags);
> > >  
> > >  	stock = this_cpu_ptr(&memcg_stock);
> > > @@ -1807,26 +1811,27 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
> > >  	if (!mutex_trylock(&percpu_charge_mutex))
> > >  		return;
> > >  	/* Notify other cpus that system-wide "drain" is running */
> > > -	get_online_cpus();
> > >  	curcpu = get_cpu();
> 
> The problem here is that this does only protect you against a CPU being
> unplugged, but not against a CPU coming online concurrently.

Yes but same as the drain_all_pages we do not have any cpu up specific
intialization so there is no specific action to race against AFAICS.

> I have no idea
> whether that might be a problem, but at least you should put a comment in
> which explains why it is not.

What about this?
---
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5c70f47abb3d..ff9b0979ccc3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1810,7 +1810,12 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
 	/* If someone's already draining, avoid adding running more workers. */
 	if (!mutex_trylock(&percpu_charge_mutex))
 		return;
-	/* Notify other cpus that system-wide "drain" is running */
+	/*
+	 * Notify other cpus that system-wide "drain" is running
+	 * We do not care about races with the cpu hotplug because cpu down
+	 * as well as workers from this path always operate on the local
+	 * per-cpu data. CPU up doesn't touch memcg_stock at all.
+	 */
 	curcpu = get_cpu();
 	for_each_online_cpu(cpu) {
 		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem
  2017-09-05 10:23           ` Michal Hocko
@ 2017-09-05 11:04             ` Thomas Gleixner
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Gleixner @ 2017-09-05 11:04 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Artem Savkov, Paul E. McKenney, LKML, linux-mm

On Tue, 5 Sep 2017, Michal Hocko wrote:
> On Tue 05-09-17 10:19:13, Thomas Gleixner wrote:
> > On Mon, 4 Sep 2017, Michal Hocko wrote:
> > 
> > > Thomas, Johannes,
> > > could you double check my thinking here? I will repost the patch to
> > > Andrew if you are OK with this.
> > > > +	/*
> > > > +	 * The only protection from memory hotplug vs. drain_stock races is
> > > > +	 * that we always operate on local CPU stock here with IRQ disabled
> > > > +	 */
> > > >  	local_irq_save(flags);
> > > >  
> > > >  	stock = this_cpu_ptr(&memcg_stock);
> > > > @@ -1807,26 +1811,27 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
> > > >  	if (!mutex_trylock(&percpu_charge_mutex))
> > > >  		return;
> > > >  	/* Notify other cpus that system-wide "drain" is running */
> > > > -	get_online_cpus();
> > > >  	curcpu = get_cpu();
> > 
> > The problem here is that this does only protect you against a CPU being
> > unplugged, but not against a CPU coming online concurrently.
> 
> Yes but same as the drain_all_pages we do not have any cpu up specific
> intialization so there is no specific action to race against AFAICS.
> 
> > I have no idea
> > whether that might be a problem, but at least you should put a comment in
> > which explains why it is not.
> 
> What about this?

Looks good.

> ---
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5c70f47abb3d..ff9b0979ccc3 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1810,7 +1810,12 @@ static void drain_all_stock(struct mem_cgroup *root_memcg)
>  	/* If someone's already draining, avoid adding running more workers. */
>  	if (!mutex_trylock(&percpu_charge_mutex))
>  		return;
> -	/* Notify other cpus that system-wide "drain" is running */
> +	/*
> +	 * Notify other cpus that system-wide "drain" is running
> +	 * We do not care about races with the cpu hotplug because cpu down
> +	 * as well as workers from this path always operate on the local
> +	 * per-cpu data. CPU up doesn't touch memcg_stock at all.
> +	 */
>  	curcpu = get_cpu();
>  	for_each_online_cpu(cpu) {
>  		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
> 
> -- 
> Michal Hocko
> SUSE Labs
> 

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-09-05 11:04 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-07 14:09 possible circular locking dependency mmap_sem/cpu_hotplug_lock.rw_sem Artem Savkov
2017-08-15 19:01 ` Paul E. McKenney
2017-08-16 13:39   ` Laurent Dufour
2017-08-16 15:36     ` Artem Savkov
2017-08-16 15:40       ` Laurent Dufour
2017-08-16 14:07 ` Thomas Gleixner
2017-08-30 14:15   ` Michal Hocko
2017-08-30 15:43     ` Michal Hocko
2017-08-31 11:10       ` Artem Savkov
2017-08-31 12:09         ` Michal Hocko
2017-08-31 12:19           ` Artem Savkov
2017-09-04 14:03       ` Michal Hocko
2017-09-05  8:19         ` Thomas Gleixner
2017-09-05 10:23           ` Michal Hocko
2017-09-05 11:04             ` Thomas Gleixner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).