* [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()
@ 2018-03-14 11:51 Kirill Tkhai
  2018-03-14 13:55 ` Tejun Heo
  2018-03-14 20:56 ` Andrew Morton
  0 siblings, 2 replies; 27+ messages in thread
From: Kirill Tkhai @ 2018-03-14 11:51 UTC (permalink / raw)
  To: akpm, tj, cl, linux-mm, linux-kernel

In case of memory deficit and low percpu memory pages,
pcpu_balance_workfn() takes pcpu_alloc_mutex for a long
time (as it makes memory allocations itself and waits
for memory reclaim). If tasks doing pcpu_alloc() are
chosen by the OOM killer, they can't exit, because they
are waiting for the mutex.

The patch makes pcpu_alloc() care about fatal signals
and use mutex_lock_killable() when the GFP flags allow
the allocation to fail. This guarantees a task does not
miss SIGKILL from the OOM killer.

Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
---
 mm/percpu.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/percpu.c b/mm/percpu.c
index 50e7fdf84055..212b4988926c 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1369,8 +1369,12 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 		return NULL;
 	}
 
-	if (!is_atomic)
-		mutex_lock(&pcpu_alloc_mutex);
+	if (!is_atomic) {
+		if (gfp & __GFP_NOFAIL)
+			mutex_lock(&pcpu_alloc_mutex);
+		else if (mutex_lock_killable(&pcpu_alloc_mutex))
+			return NULL;
+	}
 
 	spin_lock_irqsave(&pcpu_lock, flags);

^ permalink raw reply related	[flat|nested] 27+ messages in thread
* Re: [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()
  2018-03-14 11:51 [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() Kirill Tkhai
@ 2018-03-14 13:55 ` Tejun Heo
  2018-03-14 20:56 ` Andrew Morton
  1 sibling, 0 replies; 27+ messages in thread
From: Tejun Heo @ 2018-03-14 13:55 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: akpm, cl, linux-mm, linux-kernel

On Wed, Mar 14, 2018 at 02:51:48PM +0300, Kirill Tkhai wrote:
> In case of memory deficit and low percpu memory pages,
> pcpu_balance_workfn() takes pcpu_alloc_mutex for a long
> time (as it makes memory allocations itself and waits
> for memory reclaim). If tasks doing pcpu_alloc() are
> chosen by the OOM killer, they can't exit, because they
> are waiting for the mutex.
>
> The patch makes pcpu_alloc() care about fatal signals
> and use mutex_lock_killable() when the GFP flags allow
> the allocation to fail. This guarantees a task does not
> miss SIGKILL from the OOM killer.
>
> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>

Applied to percpu/for-4.16-fixes.

Thanks, Kirill.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()
  2018-03-14 11:51 [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() Kirill Tkhai
  2018-03-14 13:55 ` Tejun Heo
@ 2018-03-14 20:56 ` Andrew Morton
  2018-03-14 22:09   ` Tejun Heo
                      ` (2 more replies)
  1 sibling, 3 replies; 27+ messages in thread
From: Andrew Morton @ 2018-03-14 20:56 UTC (permalink / raw)
  To: Kirill Tkhai; +Cc: tj, cl, linux-mm, linux-kernel

On Wed, 14 Mar 2018 14:51:48 +0300 Kirill Tkhai <ktkhai@virtuozzo.com> wrote:

> In case of memory deficit and low percpu memory pages,
> pcpu_balance_workfn() takes pcpu_alloc_mutex for a long
> time (as it makes memory allocations itself and waits
> for memory reclaim). If tasks doing pcpu_alloc() are
> chosen by the OOM killer, they can't exit, because they
> are waiting for the mutex.
>
> The patch makes pcpu_alloc() care about fatal signals
> and use mutex_lock_killable() when the GFP flags allow
> the allocation to fail. This guarantees a task does not
> miss SIGKILL from the OOM killer.
>
> ...
>
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -1369,8 +1369,12 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
>  		return NULL;
>  	}
>  
> -	if (!is_atomic)
> -		mutex_lock(&pcpu_alloc_mutex);
> +	if (!is_atomic) {
> +		if (gfp & __GFP_NOFAIL)
> +			mutex_lock(&pcpu_alloc_mutex);
> +		else if (mutex_lock_killable(&pcpu_alloc_mutex))
> +			return NULL;
> +	}

It would benefit from a comment explaining why we're doing this (it's
for the oom-killer).

My memory is weak and our documentation is awful. What does
mutex_lock_killable() actually do and how does it differ from
mutex_lock_interruptible()? Userspace tasks can run pcpu_alloc() and I
wonder if there's any way in which a userspace-delivered signal can
disrupt another userspace task's memory allocation attempt?

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()
  2018-03-14 20:56 ` Andrew Morton
@ 2018-03-14 22:09   ` Tejun Heo
  2018-03-14 22:22     ` Andrew Morton
  2018-03-15 11:58   ` Matthew Wilcox
  2018-03-19 15:14   ` [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() Tejun Heo
  2 siblings, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2018-03-14 22:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kirill Tkhai, cl, linux-mm, linux-kernel

Hello, Andrew.

On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote:
> It would benefit from a comment explaining why we're doing this (it's
> for the oom-killer).

Will add.

> My memory is weak and our documentation is awful. What does
> mutex_lock_killable() actually do and how does it differ from
> mutex_lock_interruptible()? Userspace tasks can run pcpu_alloc() and I

IIRC, killable listens only to SIGKILL.

> wonder if there's any way in which a userspace-delivered signal can
> disrupt another userspace task's memory allocation attempt?

Hmm... maybe. Just honoring SIGKILL *should* be fine, but the alloc
failure paths might be broken, so there are some risks. Given that
the cases where userspace tasks end up allocating percpu memory are
pretty limited and/or privileged (like mount, bpf), I don't think the
risks are high tho.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()
  2018-03-14 22:09 ` Tejun Heo
@ 2018-03-14 22:22   ` Andrew Morton
  2018-03-15  8:58     ` Kirill Tkhai
  2018-03-19 15:13     ` Tejun Heo
  0 siblings, 2 replies; 27+ messages in thread
From: Andrew Morton @ 2018-03-14 22:22 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Kirill Tkhai, cl, linux-mm, linux-kernel

On Wed, 14 Mar 2018 15:09:09 -0700 Tejun Heo <tj@kernel.org> wrote:

> Hello, Andrew.
>
> On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote:
> > It would benefit from a comment explaining why we're doing this (it's
> > for the oom-killer).
>
> Will add.
>
> > My memory is weak and our documentation is awful. What does
> > mutex_lock_killable() actually do and how does it differ from
> > mutex_lock_interruptible()? Userspace tasks can run pcpu_alloc() and I
>
> IIRC, killable listens only to SIGKILL.
>
> > wonder if there's any way in which a userspace-delivered signal can
> > disrupt another userspace task's memory allocation attempt?
>
> Hmm... maybe. Just honoring SIGKILL *should* be fine, but the alloc
> failure paths might be broken, so there are some risks. Given that
> the cases where userspace tasks end up allocating percpu memory are
> pretty limited and/or privileged (like mount, bpf), I don't think the
> risks are high tho.

hm. spose so. Maybe. Are there other ways? I assume the time is
being spent in pcpu_create_chunk()? We could drop the mutex while
running that stuff and take the appropriate did-we-race-with-someone
testing after retaking it. Or similar.

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()
  2018-03-14 22:22 ` Andrew Morton
@ 2018-03-15  8:58   ` Kirill Tkhai
  2018-03-15 10:48     ` Tetsuo Handa
  1 sibling, 1 reply; 27+ messages in thread
From: Kirill Tkhai @ 2018-03-15 8:58 UTC (permalink / raw)
  To: Andrew Morton, Tejun Heo; +Cc: cl, linux-mm, linux-kernel

On 15.03.2018 01:22, Andrew Morton wrote:
> On Wed, 14 Mar 2018 15:09:09 -0700 Tejun Heo <tj@kernel.org> wrote:
>
>> Hello, Andrew.
>>
>> On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote:
>>> It would benefit from a comment explaining why we're doing this (it's
>>> for the oom-killer).
>>
>> Will add.
>>
>>> My memory is weak and our documentation is awful. What does
>>> mutex_lock_killable() actually do and how does it differ from
>>> mutex_lock_interruptible()? Userspace tasks can run pcpu_alloc() and I
>>
>> IIRC, killable listens only to SIGKILL.
>>
>>> wonder if there's any way in which a userspace-delivered signal can
>>> disrupt another userspace task's memory allocation attempt?
>>
>> Hmm... maybe. Just honoring SIGKILL *should* be fine, but the alloc
>> failure paths might be broken, so there are some risks. Given that
>> the cases where userspace tasks end up allocating percpu memory are
>> pretty limited and/or privileged (like mount, bpf), I don't think the
>> risks are high tho.
>
> hm. spose so. Maybe. Are there other ways? I assume the time is
> being spent in pcpu_create_chunk()? We could drop the mutex while
> running that stuff and take the appropriate did-we-race-with-someone
> testing after retaking it. Or similar.

The balance work spends its time in pcpu_populate_chunk(). Here are
two stack traces of this problem:

[ 106.313267] kworker/2:2 D13832 936 2 0x80000000
[ 106.313740] Workqueue: events pcpu_balance_workfn
[ 106.314109] Call Trace:
[ 106.314293]  ? __schedule+0x267/0x750
[ 106.314570]  schedule+0x2d/0x90
[ 106.314803]  schedule_timeout+0x17f/0x390
[ 106.315106]  ? __next_timer_interrupt+0xc0/0xc0
[ 106.315429]  __alloc_pages_slowpath+0xb73/0xd90
[ 106.315792]  __alloc_pages_nodemask+0x16a/0x210
[ 106.316148]  pcpu_populate_chunk+0xce/0x300
[ 106.316479]  pcpu_balance_workfn+0x3f3/0x580
[ 106.316853]  ? _raw_spin_unlock_irq+0xe/0x30
[ 106.317227]  ? finish_task_switch+0x8d/0x250
[ 106.317632]  process_one_work+0x1b7/0x410
[ 106.317970]  worker_thread+0x26/0x3d0
[ 106.318304]  ? process_one_work+0x410/0x410
[ 106.318649]  kthread+0x10e/0x130
[ 106.318916]  ? __kthread_create_worker+0x120/0x120
[ 106.319360]  ret_from_fork+0x35/0x40

[ 106.453375] a.out D13400 3670 1 0x00100004
[ 106.453880] Call Trace:
[ 106.454114]  ? __schedule+0x267/0x750
[ 106.454427]  schedule+0x2d/0x90
[ 106.454829]  schedule_preempt_disabled+0xf/0x20
[ 106.455422]  __mutex_lock.isra.2+0x181/0x4d0
[ 106.455988]  ? pcpu_alloc+0x3c4/0x670
[ 106.456465]  pcpu_alloc+0x3c4/0x670
[ 106.456973]  ? preempt_count_add+0x63/0x90
[ 106.457401]  ? __local_bh_enable_ip+0x2e/0x60
[ 106.457882]  ipv6_add_dev+0x121/0x490
[ 106.458330]  addrconf_notify+0x27b/0x9a0
[ 106.458823]  ? inetdev_init+0xd7/0x150
[ 106.459270]  ? inetdev_event+0x339/0x4b0
[ 106.459738]  ? preempt_count_add+0x63/0x90
[ 106.460243]  ? _raw_spin_lock_irq+0xf/0x30
[ 106.460747]  ? notifier_call_chain+0x42/0x60
[ 106.461271]  notifier_call_chain+0x42/0x60
[ 106.461819]  register_netdevice+0x415/0x530
[ 106.462364]  register_netdev+0x11/0x20
[ 106.462849]  loopback_net_init+0x43/0x90
[ 106.463216]  ops_init+0x3b/0x100
[ 106.463516]  setup_net+0x7d/0x150
[ 106.463831]  copy_net_ns+0x14b/0x180
[ 106.464134]  create_new_namespaces+0x117/0x1b0
[ 106.464481]  unshare_nsproxy_namespaces+0x5b/0x90
[ 106.464864]  SyS_unshare+0x1b0/0x300

[ 106.536845] Kernel panic - not syncing: Out of memory and no killable processes...

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()
  2018-03-15  8:58 ` Kirill Tkhai
@ 2018-03-15 10:48   ` Tetsuo Handa
  0 siblings, 0 replies; 27+ messages in thread
From: Tetsuo Handa @ 2018-03-15 10:48 UTC (permalink / raw)
  To: Kirill Tkhai, Andrew Morton, Tejun Heo; +Cc: cl, linux-mm, linux-kernel

On 2018/03/15 17:58, Kirill Tkhai wrote:
> On 15.03.2018 01:22, Andrew Morton wrote:
>> On Wed, 14 Mar 2018 15:09:09 -0700 Tejun Heo <tj@kernel.org> wrote:
>>
>>> Hello, Andrew.
>>>
>>> On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote:
>>>> It would benefit from a comment explaining why we're doing this (it's
>>>> for the oom-killer).
>>>
>>> Will add.
>>>
>>>> My memory is weak and our documentation is awful. What does
>>>> mutex_lock_killable() actually do and how does it differ from
>>>> mutex_lock_interruptible()? Userspace tasks can run pcpu_alloc() and I
>>>
>>> IIRC, killable listens only to SIGKILL.

I think that killable listens to any signal which results in termination of
that process. For example, if a process is configured to terminate upon SIGINT,
fatal_signal_pending() becomes true upon SIGINT.

>>>
>>>> wonder if there's any way in which a userspace-delivered signal can
>>>> disrupt another userspace task's memory allocation attempt?
>>>
>>> Hmm... maybe. Just honoring SIGKILL *should* be fine, but the alloc
>>> failure paths might be broken, so there are some risks. Given that
>>> the cases where userspace tasks end up allocating percpu memory are
>>> pretty limited and/or privileged (like mount, bpf), I don't think the
>>> risks are high tho.
>>
>> hm. spose so. Maybe. Are there other ways? I assume the time is
>> being spent in pcpu_create_chunk()? We could drop the mutex while
>> running that stuff and take the appropriate did-we-race-with-someone
>> testing after retaking it. Or similar.
>
> The balance work spends its time in pcpu_populate_chunk(). Here are
> two stack traces of this problem:

Will you show me more context? Except on CONFIG_MMU=n kernels, the OOM reaper
reclaims memory from the OOM victim. Therefore, "If tasks doing pcpu_alloc()
are chosen by the OOM killer, they can't exit, because they are waiting for the
mutex." should not cause problems. Of course, giving up upon SIGKILL is nice
regardless.

> [ 106.313267] kworker/2:2 D13832 936 2 0x80000000
> [ 106.313740] Workqueue: events pcpu_balance_workfn
> [ 106.314109] Call Trace:
> [ 106.314293]  ? __schedule+0x267/0x750
> [ 106.314570]  schedule+0x2d/0x90
> [ 106.314803]  schedule_timeout+0x17f/0x390
> [ 106.315106]  ? __next_timer_interrupt+0xc0/0xc0
> [ 106.315429]  __alloc_pages_slowpath+0xb73/0xd90
> [ 106.315792]  __alloc_pages_nodemask+0x16a/0x210
> [ 106.316148]  pcpu_populate_chunk+0xce/0x300
> [ 106.316479]  pcpu_balance_workfn+0x3f3/0x580
> [ 106.316853]  ? _raw_spin_unlock_irq+0xe/0x30
> [ 106.317227]  ? finish_task_switch+0x8d/0x250
> [ 106.317632]  process_one_work+0x1b7/0x410
> [ 106.317970]  worker_thread+0x26/0x3d0
> [ 106.318304]  ? process_one_work+0x410/0x410
> [ 106.318649]  kthread+0x10e/0x130
> [ 106.318916]  ? __kthread_create_worker+0x120/0x120
> [ 106.319360]  ret_from_fork+0x35/0x40
>
> [ 106.453375] a.out D13400 3670 1 0x00100004
> [ 106.453880] Call Trace:
> [ 106.454114]  ? __schedule+0x267/0x750
> [ 106.454427]  schedule+0x2d/0x90
> [ 106.454829]  schedule_preempt_disabled+0xf/0x20
> [ 106.455422]  __mutex_lock.isra.2+0x181/0x4d0
> [ 106.455988]  ? pcpu_alloc+0x3c4/0x670
> [ 106.456465]  pcpu_alloc+0x3c4/0x670
> [ 106.456973]  ? preempt_count_add+0x63/0x90
> [ 106.457401]  ? __local_bh_enable_ip+0x2e/0x60
> [ 106.457882]  ipv6_add_dev+0x121/0x490
> [ 106.458330]  addrconf_notify+0x27b/0x9a0
> [ 106.458823]  ? inetdev_init+0xd7/0x150
> [ 106.459270]  ? inetdev_event+0x339/0x4b0
> [ 106.459738]  ? preempt_count_add+0x63/0x90
> [ 106.460243]  ? _raw_spin_lock_irq+0xf/0x30
> [ 106.460747]  ? notifier_call_chain+0x42/0x60
> [ 106.461271]  notifier_call_chain+0x42/0x60
> [ 106.461819]  register_netdevice+0x415/0x530
> [ 106.462364]  register_netdev+0x11/0x20
> [ 106.462849]  loopback_net_init+0x43/0x90
> [ 106.463216]  ops_init+0x3b/0x100
> [ 106.463516]  setup_net+0x7d/0x150
> [ 106.463831]  copy_net_ns+0x14b/0x180
> [ 106.464134]  create_new_namespaces+0x117/0x1b0
> [ 106.464481]  unshare_nsproxy_namespaces+0x5b/0x90
> [ 106.464864]  SyS_unshare+0x1b0/0x300
>
> [ 106.536845] Kernel panic - not syncing: Out of memory and no killable processes...

These two stack traces are not blocked at mutex_lock().

Why were all OOM-killable threads killed? Were there only a few?
Does pcpu_alloc() allocate enough to deplete the memory reserves?

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()
  2018-03-15 10:48 ` Tetsuo Handa
@ 2018-03-15 12:09   ` Kirill Tkhai
  0 siblings, 1 reply; 27+ messages in thread
From: Kirill Tkhai @ 2018-03-15 12:09 UTC (permalink / raw)
  To: Tetsuo Handa, Andrew Morton, Tejun Heo; +Cc: cl, linux-mm, linux-kernel

On 15.03.2018 13:48, Tetsuo Handa wrote:
> On 2018/03/15 17:58, Kirill Tkhai wrote:
>> On 15.03.2018 01:22, Andrew Morton wrote:
>>> On Wed, 14 Mar 2018 15:09:09 -0700 Tejun Heo <tj@kernel.org> wrote:
>>>
>>>> Hello, Andrew.
>>>>
>>>> On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote:
>>>>> It would benefit from a comment explaining why we're doing this (it's
>>>>> for the oom-killer).
>>>>
>>>> Will add.
>>>>
>>>>> My memory is weak and our documentation is awful. What does
>>>>> mutex_lock_killable() actually do and how does it differ from
>>>>> mutex_lock_interruptible()? Userspace tasks can run pcpu_alloc() and I
>>>>
>>>> IIRC, killable listens only to SIGKILL.
>
> I think that killable listens to any signal which results in termination of
> that process. For example, if a process is configured to terminate upon SIGINT,
> fatal_signal_pending() becomes true upon SIGINT.

It shouldn't act on SIGINT:

static inline int __fatal_signal_pending(struct task_struct *p)
{
	return unlikely(sigismember(&p->pending.signal, SIGKILL));
}

static inline int fatal_signal_pending(struct task_struct *p)
{
	return signal_pending(p) && __fatal_signal_pending(p);
}

>>>>
>>>>> wonder if there's any way in which a userspace-delivered signal can
>>>>> disrupt another userspace task's memory allocation attempt?
>>>>
>>>> Hmm... maybe. Just honoring SIGKILL *should* be fine, but the alloc
>>>> failure paths might be broken, so there are some risks. Given that
>>>> the cases where userspace tasks end up allocating percpu memory are
>>>> pretty limited and/or privileged (like mount, bpf), I don't think the
>>>> risks are high tho.
>>>
>>> hm. spose so. Maybe. Are there other ways? I assume the time is
>>> being spent in pcpu_create_chunk()? We could drop the mutex while
>>> running that stuff and take the appropriate did-we-race-with-someone
>>> testing after retaking it. Or similar.
>>
>> The balance work spends its time in pcpu_populate_chunk(). Here are
>> two stack traces of this problem:
>
> Will you show me more context? Except on CONFIG_MMU=n kernels, the OOM reaper
> reclaims memory from the OOM victim. Therefore, "If tasks doing pcpu_alloc()
> are chosen by the OOM killer, they can't exit, because they are waiting for the
> mutex." should not cause problems. Of course, giving up upon SIGKILL is nice
> regardless.

There is a test case which drives my 4-CPU VM to OOM:

#define _GNU_SOURCE
#include <sched.h>

main()
{
	int i;

	for (i = 0; i < 8; i++)
		fork();
	daemon(1, 1);
	while (1)
		unshare(CLONE_NEWNET);
}

The problem is that the net namespace init/exit methods are not made to
be executed in parallel, and an exclusive mutex is used there. I'm
working on a solution at the moment, and you may find what I've already
done in net-next.git, if you are interested.

The pcpu_alloc()-related OOM happens on a stable kernel, and it's easy
to trigger with the test. pcpu is not the only problem there, but it's
one of them: there is a logically possible OOM deadlock in the pcpu
code, as described in the patch, and the patch fixes it.

Moving from this specific problem to the general case, I think all
allocating/registering actions in the kernel should use killable
primitives if they are allowed to fail, and the generic policy should be
mutex_lock_killable() instead of mutex_lock(). Otherwise, OOM victims
can't die if they are waiting for a mutex held by a process performing
reclaim. This creates circular dependencies and makes OOM badness
accounting useless, while it must not be so.

>>
>> [ 106.313267] kworker/2:2 D13832 936 2 0x80000000
>> [ 106.313740] Workqueue: events pcpu_balance_workfn
>> [ 106.314109] Call Trace:
>> [ 106.314293]  ? __schedule+0x267/0x750
>> [ 106.314570]  schedule+0x2d/0x90
>> [ 106.314803]  schedule_timeout+0x17f/0x390
>> [ 106.315106]  ? __next_timer_interrupt+0xc0/0xc0
>> [ 106.315429]  __alloc_pages_slowpath+0xb73/0xd90
>> [ 106.315792]  __alloc_pages_nodemask+0x16a/0x210
>> [ 106.316148]  pcpu_populate_chunk+0xce/0x300
>> [ 106.316479]  pcpu_balance_workfn+0x3f3/0x580
>> [ 106.316853]  ? _raw_spin_unlock_irq+0xe/0x30
>> [ 106.317227]  ? finish_task_switch+0x8d/0x250
>> [ 106.317632]  process_one_work+0x1b7/0x410
>> [ 106.317970]  worker_thread+0x26/0x3d0
>> [ 106.318304]  ? process_one_work+0x410/0x410
>> [ 106.318649]  kthread+0x10e/0x130
>> [ 106.318916]  ? __kthread_create_worker+0x120/0x120
>> [ 106.319360]  ret_from_fork+0x35/0x40
>>
>> [ 106.453375] a.out D13400 3670 1 0x00100004
>> [ 106.453880] Call Trace:
>> [ 106.454114]  ? __schedule+0x267/0x750
>> [ 106.454427]  schedule+0x2d/0x90
>> [ 106.454829]  schedule_preempt_disabled+0xf/0x20
>> [ 106.455422]  __mutex_lock.isra.2+0x181/0x4d0
>> [ 106.455988]  ? pcpu_alloc+0x3c4/0x670
>> [ 106.456465]  pcpu_alloc+0x3c4/0x670
>> [ 106.456973]  ? preempt_count_add+0x63/0x90
>> [ 106.457401]  ? __local_bh_enable_ip+0x2e/0x60
>> [ 106.457882]  ipv6_add_dev+0x121/0x490
>> [ 106.458330]  addrconf_notify+0x27b/0x9a0
>> [ 106.458823]  ? inetdev_init+0xd7/0x150
>> [ 106.459270]  ? inetdev_event+0x339/0x4b0
>> [ 106.459738]  ? preempt_count_add+0x63/0x90
>> [ 106.460243]  ? _raw_spin_lock_irq+0xf/0x30
>> [ 106.460747]  ? notifier_call_chain+0x42/0x60
>> [ 106.461271]  notifier_call_chain+0x42/0x60
>> [ 106.461819]  register_netdevice+0x415/0x530
>> [ 106.462364]  register_netdev+0x11/0x20
>> [ 106.462849]  loopback_net_init+0x43/0x90
>> [ 106.463216]  ops_init+0x3b/0x100
>> [ 106.463516]  setup_net+0x7d/0x150
>> [ 106.463831]  copy_net_ns+0x14b/0x180
>> [ 106.464134]  create_new_namespaces+0x117/0x1b0
>> [ 106.464481]  unshare_nsproxy_namespaces+0x5b/0x90
>> [ 106.464864]  SyS_unshare+0x1b0/0x300
>>
>> [ 106.536845] Kernel panic - not syncing: Out of memory and no killable processes...
>
> These two stack traces are not blocked at mutex_lock().
>
> Why were all OOM-killable threads killed? Were there only a few?
> Does pcpu_alloc() allocate enough to deplete the memory reserves?

The test eats all kmem, so the OOM killer kills everything. It's because
of slow net namespace destruction. But this patch is about a
"half"-deadlock between pcpu_alloc() and the worker, which slows down
OOM reaping. It is a real possibility, and it's good to fix it.

I've seen a crash with waiting on the mutex, but I have not saved it.
It seems the test may reproduce it after some time. With the patch
applied I don't see pcpu-related crashes in pcpu_alloc() at all.

Kirill

^ permalink raw reply	[flat|nested] 27+ messages in thread
* Re: [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() 2018-03-15 12:09 ` Kirill Tkhai (?) @ 2018-03-15 14:09 ` Tetsuo Handa 2018-03-15 14:42 ` Kirill Tkhai -1 siblings, 1 reply; 27+ messages in thread From: Tetsuo Handa @ 2018-03-15 14:09 UTC (permalink / raw) To: ktkhai, akpm, tj; +Cc: cl, linux-mm, linux-kernel

Kirill Tkhai wrote:
> >>>>> My memory is weak and our documentation is awful.  What does
> >>>>> mutex_lock_killable() actually do and how does it differ from
> >>>>> mutex_lock_interruptible()?  Userspace tasks can run pcpu_alloc() and I
> >>>>
> >>>> IIRC, killable listens only to SIGKILL.
> >
> > I think that killable listens to any signal which results in termination of
> > that process. For example, if a process is configured to terminate upon SIGINT,
> > fatal_signal_pending() becomes true upon SIGINT.
>
> It shouldn't act on SIGINT:
>
> static inline int __fatal_signal_pending(struct task_struct *p)
> {
>         return unlikely(sigismember(&p->pending.signal, SIGKILL));
> }
>
> static inline int fatal_signal_pending(struct task_struct *p)
> {
>         return signal_pending(p) && __fatal_signal_pending(p);
> }

Really? Compile the module below and try to load it with the insmod command.
----------------------------------------
#include <linux/module.h>
#include <linux/sched/signal.h>

static int __init test_init(void)
{
	static DEFINE_MUTEX(lock);

	mutex_lock(&lock);
	printk(KERN_INFO "signal_pending()=%d fatal_signal_pending()=%d\n",
	       signal_pending(current), fatal_signal_pending(current));
	/* Deadlocks on purpose: the second lock attempt can only be
	 * ended by a signal. */
	if (mutex_lock_killable(&lock)) {
		printk(KERN_INFO "signal_pending()=%d fatal_signal_pending()=%d\n",
		       signal_pending(current), fatal_signal_pending(current));
		mutex_unlock(&lock);
		return -EINTR;
	}
	mutex_unlock(&lock);
	mutex_unlock(&lock);
	return -EINVAL;
}

module_init(test_init);
MODULE_LICENSE("GPL");
----------------------------------------

What you will see (apart from the lockdep warning) upon SIGINT or SIGHUP is

signal_pending()=0 fatal_signal_pending()=0
signal_pending()=1 fatal_signal_pending()=1

which means that fatal_signal_pending() becomes true without SIGKILL.
If insmod is executed via the nohup wrapper, insmod does not terminate
upon SIGHUP.

> The problem is that the net namespace init/exit methods are not made to be
> executed in parallel, and an exclusive mutex is used there. I'm working on a
> solution at the moment, and you may find what I've done in net-next.git, if
> you are interested.

I see. Despite your patch, torture tests using your test case still allow
an OOM panic.
---------------------------------------- [ 860.420677] Out of memory: Kill process 12727 (a.out) score 0 or sacrifice child [ 860.423228] Killed process 12727 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB [ 860.428125] oom_reaper: reaped process 12727 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 860.438257] Out of memory: Kill process 12728 (a.out) score 0 or sacrifice child [ 860.440709] Killed process 12728 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB [ 860.445840] oom_reaper: reaped process 12728 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 860.456815] Out of memory: Kill process 12729 (a.out) score 0 or sacrifice child [ 860.459618] Killed process 12729 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB [ 860.464686] oom_reaper: reaped process 12729 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 860.489807] Out of memory: Kill process 12730 (a.out) score 0 or sacrifice child [ 860.492495] Killed process 12730 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB [ 860.501268] oom_reaper: reaped process 12730 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 860.536786] Out of memory: Kill process 12731 (a.out) score 0 or sacrifice child [ 860.539392] Killed process 12731 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB [ 860.544130] oom_reaper: reaped process 12731 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 860.553587] Out of memory: Kill process 12732 (a.out) score 0 or sacrifice child [ 860.556359] Killed process 12732 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB [ 860.559639] oom_reaper: reaped process 12732 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 860.564972] Out of memory: Kill process 12733 (a.out) score 0 or sacrifice child [ 860.567603] Killed process 12733 (a.out) total-vm:4176kB, anon-rss:84kB, file-rss:0kB, shmem-rss:0kB [ 860.573416] oom_reaper: reaped process 
12733 (a.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB [ 860.579675] Out of memory: Kill process 762 (dbus-daemon) score 0 or sacrifice child [ 860.582334] Killed process 762 (dbus-daemon) total-vm:24560kB, anon-rss:480kB, file-rss:0kB, shmem-rss:0kB [ 860.590607] systemd invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0 [ 860.594065] systemd cpuset=/ mems_allowed=0 [ 860.596172] CPU: 1 PID: 1 Comm: systemd Kdump: loaded Tainted: G O 4.16.0-rc5-next-20180315 #695 [ 860.599401] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017 [ 860.602676] Call Trace: [ 860.604118] dump_stack+0x5f/0x8b [ 860.605741] dump_header+0x69/0x431 [ 860.607380] ? rcu_read_unlock_special+0x2cc/0x2f0 [ 860.609342] out_of_memory+0x4d8/0x720 [ 860.611044] __alloc_pages_nodemask+0x12c5/0x1410 [ 860.613041] filemap_fault+0x479/0x640 [ 860.614725] __xfs_filemap_fault.constprop.0+0x5f/0x1f0 [ 860.616717] __do_fault+0x15/0xa0 [ 860.618294] __handle_mm_fault+0xcb2/0x1140 [ 860.620031] handle_mm_fault+0x186/0x350 [ 860.621720] __do_page_fault+0x2a7/0x510 [ 860.623402] do_page_fault+0x2c/0x2a0 [ 860.624999] ? 
page_fault+0x2f/0x50 [ 860.626548] page_fault+0x45/0x50 [ 860.628045] RIP: 61fa2380:0x55c6609099a0 [ 860.629805] RSP: 608a920b:00007ffc83a98620 EFLAGS: 7fdbd590e740 [ 860.630698] Mem-Info: [ 860.634428] active_anon:3783 inactive_anon:3987 isolated_anon:0 [ 860.634428] active_file:3 inactive_file:0 isolated_file:0 [ 860.634428] unevictable:0 dirty:0 writeback:0 unstable:0 [ 860.634428] slab_reclaimable:124666 slab_unreclaimable:694094 [ 860.634428] mapped:37 shmem:6270 pagetables:2087 bounce:0 [ 860.634428] free:21037 free_pcp:299 free_cma:0 [ 860.646361] Node 0 active_anon:15132kB inactive_anon:15948kB active_file:12kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:148kB dirty:0kB writeback:0kB shmem:25080kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 2048kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes [ 860.653706] Node 0 DMA free:14828kB min:284kB low:352kB high:420kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB [ 860.660693] lowmem_reserve[]: 0 2684 3642 3642 [ 860.662661] Node 0 DMA32 free:53532kB min:49596kB low:61992kB high:74388kB active_anon:3420kB inactive_anon:5048kB active_file:192kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129216kB managed:2771556kB mlocked:0kB kernel_stack:5184kB pagetables:7680kB bounce:0kB free_pcp:444kB local_pcp:76kB free_cma:0kB [ 860.671523] lowmem_reserve[]: 0 0 958 958 [ 860.673464] Node 0 Normal free:15616kB min:17696kB low:22120kB high:26544kB active_anon:11948kB inactive_anon:10900kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1048576kB managed:981136kB mlocked:0kB kernel_stack:2640kB pagetables:636kB bounce:0kB free_pcp:892kB local_pcp:648kB free_cma:0kB [ 860.682664] lowmem_reserve[]: 0 0 0 0 [ 860.684412] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 1*32kB 
(U) 1*64kB (U) 1*128kB (E) 1*256kB (E) 2*512kB (UE) 1*1024kB (E) 2*2048kB (ME) 2*4096kB (M) = 14828kB [ 860.688963] Node 0 DMA32: 565*4kB (UM) 568*8kB (UM) 1037*16kB (UM) 42*32kB (ME) 26*64kB (UME) 15*128kB (ME) 15*256kB (UM) 8*512kB (ME) 7*1024kB (UME) 3*2048kB (M) 1*4096kB (M) = 53668kB [ 860.694161] Node 0 Normal: 67*4kB (ME) 1218*8kB (UME) 348*16kB (UME) 1*32kB (E) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15612kB [ 860.698195] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 860.701105] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 860.703682] 6267 total pagecache pages [ 860.705388] 0 pages in swap cache [ 860.707457] Swap cache stats: add 0, delete 0, find 0/0 [ 860.709624] Free swap = 0kB [ 860.711105] Total swap = 0kB [ 860.712841] 1048445 pages RAM [ 860.714320] 0 pages HighMem/MovableOnly [ 860.715970] 106296 pages reserved [ 860.717625] 0 pages hwpoisoned [ 860.719086] Unreclaimable slab info: [ 860.720677] Name Used Total [ 860.722685] scsi_sense_cache 44KB 44KB [ 860.724517] RAWv6 237460KB 237460KB [ 860.726353] TCPv6 118KB 118KB [ 860.728183] sgpool-128 192KB 192KB [ 860.730434] sgpool-16 64KB 64KB [ 860.732303] mqueue_inode_cache 31KB 31KB [ 860.734302] xfs_buf 584KB 640KB [ 860.736188] xfs_ili 134KB 134KB [ 860.738154] xfs_efd_item 110KB 110KB [ 860.740163] xfs_trans 31KB 31KB [ 860.742707] xfs_ifork 108KB 108KB [ 860.744781] xfs_da_state 63KB 63KB [ 860.747019] xfs_btree_cur 31KB 31KB [ 860.748845] bio-2 47KB 47KB [ 860.750647] UNIX 273KB 273KB [ 860.752534] RAW 277018KB 277018KB [ 860.754339] UDP 19861KB 19861KB [ 860.756221] tw_sock_TCP 7KB 7KB [ 860.758115] request_sock_TCP 7KB 7KB [ 860.759851] TCP 120KB 120KB [ 860.761508] hugetlbfs_inode_cache 63KB 63KB [ 860.763700] eventpoll_pwq 15KB 15KB [ 860.765423] inotify_inode_mark 52KB 52KB [ 860.767707] request_queue 94KB 94KB [ 860.769379] blkdev_ioc 39KB 39KB [ 860.771060] biovec-(1<<(21-12)) 784KB 912KB [ 
860.772759] biovec-128 192KB 192KB [ 860.775225] biovec-64 128KB 128KB [ 860.777187] uid_cache 15KB 15KB [ 860.779341] dmaengine-unmap-2 16KB 16KB [ 860.781330] skbuff_head_cache 184KB 216KB [ 860.783001] file_lock_cache 31KB 31KB [ 860.784593] file_lock_ctx 15KB 15KB [ 860.786172] net_namespace 55270KB 55270KB [ 860.787706] shmem_inode_cache 980KB 980KB [ 860.789349] task_delay_info 138KB 179KB [ 860.791090] taskstats 23KB 23KB [ 860.792623] proc_dir_entry 142317KB 142317KB [ 860.794287] pde_opener 15KB 15KB [ 860.796069] seq_file 31KB 31KB [ 860.797641] sigqueue 19KB 19KB [ 860.799060] kernfs_node_cache 173844KB 173844KB [ 860.800572] mnt_cache 141KB 141KB [ 860.802101] filp 656KB 656KB [ 860.803624] names_cache 256KB 256KB [ 860.805107] key_jar 31KB 31KB [ 860.806655] vm_area_struct 1293KB 1726KB [ 860.808074] mm_struct 789KB 1012KB [ 860.809482] files_cache 1330KB 1330KB [ 860.811777] signal_cache 998KB 1606KB [ 860.813835] sighand_cache 1610KB 1990KB [ 860.815698] task_struct 3795KB 4766KB [ 860.817247] cred_jar 368KB 460KB [ 860.818819] anon_vma 1457KB 1747KB [ 860.820406] pid 1534KB 2196KB [ 860.822081] Acpi-Operand 480KB 480KB [ 860.823955] Acpi-State 27KB 27KB [ 860.825518] Acpi-Namespace 179KB 179KB [ 860.827060] numa_policy 15KB 15KB [ 860.828772] trace_event_file 90KB 90KB [ 860.830696] ftrace_event_field 95KB 95KB [ 860.832241] pool_workqueue 144KB 144KB [ 860.833746] task_group 599KB 630KB [ 860.835272] page->ptl 362KB 411KB [ 860.836755] dma-kmalloc-512 16KB 16KB [ 860.838274] kmalloc-8192 211320KB 211320KB [ 860.839748] kmalloc-4096 631748KB 631748KB [ 860.841248] kmalloc-2048 529296KB 529296KB [ 860.843094] kmalloc-1024 79300KB 79300KB [ 860.844575] kmalloc-512 192744KB 192744KB [ 860.847059] kmalloc-256 24292KB 24292KB [ 860.848747] kmalloc-192 56959KB 56959KB [ 860.850504] kmalloc-128 77732KB 77732KB [ 860.852197] kmalloc-96 1426KB 1519KB [ 860.853936] kmalloc-64 19192KB 19192KB [ 860.855454] kmalloc-32 1368KB 1368KB [ 860.856988] kmalloc-16 
336KB 336KB [ 860.858478] kmalloc-8 864KB 864KB [ 860.859990] kmem_cache_node 20KB 20KB [ 860.861448] kmem_cache 78KB 78KB [ 860.863006] Kernel panic - not syncing: Out of memory and no killable processes... [ 860.863006] [ 860.866069] CPU: 3 PID: 1 Comm: systemd Kdump: loaded Tainted: G O 4.16.0-rc5-next-20180315 #695 [ 860.868829] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017 [ 860.871926] Call Trace: [ 860.872850] dump_stack+0x5f/0x8b [ 860.873965] panic+0xde/0x231 [ 860.875005] out_of_memory+0x4e4/0x720 [ 860.876218] __alloc_pages_nodemask+0x12c5/0x1410 [ 860.877645] filemap_fault+0x479/0x640 [ 860.878842] __xfs_filemap_fault.constprop.0+0x5f/0x1f0 [ 860.880542] __do_fault+0x15/0xa0 [ 860.881674] __handle_mm_fault+0xcb2/0x1140 [ 860.883004] handle_mm_fault+0x186/0x350 [ 860.884283] __do_page_fault+0x2a7/0x510 [ 860.885576] do_page_fault+0x2c/0x2a0 [ 860.886805] ? page_fault+0x2f/0x50 [ 860.888003] page_fault+0x45/0x50 [ 860.889169] RIP: 61fa2380:0x55c6609099a0 ---------------------------------------- ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() 2018-03-15 14:09 ` Tetsuo Handa @ 2018-03-15 14:42 ` Kirill Tkhai 0 siblings, 0 replies; 27+ messages in thread From: Kirill Tkhai @ 2018-03-15 14:42 UTC (permalink / raw) To: Tetsuo Handa, akpm, tj, willy; +Cc: cl, linux-mm, linux-kernel

On 15.03.2018 17:09, Tetsuo Handa wrote:
> Kirill Tkhai wrote:
>>>>>>> My memory is weak and our documentation is awful.  What does
>>>>>>> mutex_lock_killable() actually do and how does it differ from
>>>>>>> mutex_lock_interruptible()?  Userspace tasks can run pcpu_alloc() and I
>>>>>>
>>>>>> IIRC, killable listens only to SIGKILL.
>>>
>>> I think that killable listens to any signal which results in termination of
>>> that process. For example, if a process is configured to terminate upon SIGINT,
>>> fatal_signal_pending() becomes true upon SIGINT.
>>
>> It shouldn't act on SIGINT:
>>
>> static inline int __fatal_signal_pending(struct task_struct *p)
>> {
>>         return unlikely(sigismember(&p->pending.signal, SIGKILL));
>> }
>>
>> static inline int fatal_signal_pending(struct task_struct *p)
>> {
>>         return signal_pending(p) && __fatal_signal_pending(p);
>> }
>
> Really? Compile the module below and try to load it with the insmod command.
>
> ----------------------------------------
> #include <linux/module.h>
> #include <linux/sched/signal.h>
>
> static int __init test_init(void)
> {
>         static DEFINE_MUTEX(lock);
>
>         mutex_lock(&lock);
>         printk(KERN_INFO "signal_pending()=%d fatal_signal_pending()=%d\n",
>                signal_pending(current), fatal_signal_pending(current));
>         if (mutex_lock_killable(&lock)) {
>                 printk(KERN_INFO "signal_pending()=%d fatal_signal_pending()=%d\n",
>                        signal_pending(current), fatal_signal_pending(current));
>                 mutex_unlock(&lock);
>                 return -EINTR;
>         }
>         mutex_unlock(&lock);
>         mutex_unlock(&lock);
>         return -EINVAL;
> }
>
> module_init(test_init);
> MODULE_LICENSE("GPL");
> ----------------------------------------
>
> What you will see (apart from the lockdep warning) upon SIGINT or SIGHUP is
>
> signal_pending()=0 fatal_signal_pending()=0
> signal_pending()=1 fatal_signal_pending()=1
>
> which means that fatal_signal_pending() becomes true without SIGKILL.
> If insmod is executed via the nohup wrapper, insmod does not terminate
> upon SIGHUP.

Matthew already pointed that out. Thanks for the explanation again :)

>> The problem is that the net namespace init/exit methods are not made to be
>> executed in parallel, and an exclusive mutex is used there. I'm working on a
>> solution at the moment, and you may find what I've done in net-next.git, if
>> you are interested.
>
> I see. Despite your patch, torture tests using your test case still allow
> an OOM panic.

I know. There are several problems. But fresh net-next.git with this patch
and these two patchsets:

https://patchwork.ozlabs.org/project/netdev/list/?series=33829
https://patchwork.ozlabs.org/project/netdev/list/?series=33949

does not bump into OOM during the test. Despite that, there is still a lot
of work to be done.
Kirill
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() 2018-03-14 22:22 ` Andrew Morton 2018-03-15 8:58 ` Kirill Tkhai @ 2018-03-19 15:13 ` Tejun Heo 1 sibling, 0 replies; 27+ messages in thread From: Tejun Heo @ 2018-03-19 15:13 UTC (permalink / raw) To: Andrew Morton; +Cc: Kirill Tkhai, cl, linux-mm, linux-kernel

Hello, Andrew.

On Wed, Mar 14, 2018 at 03:22:03PM -0700, Andrew Morton wrote:
> hm. spose so. Maybe. Are there other ways? I assume the time is
> being spent in pcpu_create_chunk()? We could drop the mutex while
> running that stuff and take the appropriate did-we-race-with-someone
> testing after retaking it. Or similar.

I'm not sure that'd change much. Ultimately, isn't the choice between
being able to return NULL and waiting for more memory? If we decide to
return NULL, it doesn't make a difference where we do that from, right?

Thanks.

-- 
tejun
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH] Improve mutex documentation 2018-03-14 20:56 ` Andrew Morton @ 2018-03-15 11:58 ` Matthew Wilcox 2018-03-15 11:58 ` Matthew Wilcox 2018-03-19 15:14 ` [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() Tejun Heo 2 siblings, 0 replies; 27+ messages in thread From: Matthew Wilcox @ 2018-03-15 11:58 UTC (permalink / raw) To: Andrew Morton Cc: Kirill Tkhai, tj, cl, linux-mm, linux-kernel, linux-doc, Jonathan Corbet, Mauro Carvalho Chehab, Peter Zijlstra, Ingo Molnar

On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote:
> My memory is weak and our documentation is awful.  What does
> mutex_lock_killable() actually do and how does it differ from
> mutex_lock_interruptible()?

From: Matthew Wilcox <mawilcox@microsoft.com>

Add kernel-doc for mutex_lock_killable() and mutex_lock_io().  Reword the
kernel-doc for mutex_lock_interruptible().

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 858a07590e39..2048359f33d2 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -1082,15 +1082,16 @@ static noinline int __sched
 __mutex_lock_interruptible_slowpath(struct mutex *lock);
 
 /**
- * mutex_lock_interruptible - acquire the mutex, interruptible
- * @lock: the mutex to be acquired
+ * mutex_lock_interruptible() - Acquire the mutex, interruptible by signals.
+ * @lock: The mutex to be acquired.
  *
- * Lock the mutex like mutex_lock(), and return 0 if the mutex has
- * been acquired or sleep until the mutex becomes available.  If a
- * signal arrives while waiting for the lock then this function
- * returns -EINTR.
+ * Lock the mutex like mutex_lock().  If a signal is delivered while the
+ * process is sleeping, this function will return without acquiring the
+ * mutex.
  *
- * This function is similar to (but not equivalent to) down_interruptible().
+ * Context: Process context.
+ * Return: 0 if the lock was successfully acquired or %-EINTR if a
+ * signal arrived.
  */
 int __sched mutex_lock_interruptible(struct mutex *lock)
 {
@@ -1104,6 +1105,18 @@ int __sched mutex_lock_interruptible(struct mutex *lock)
 
 EXPORT_SYMBOL(mutex_lock_interruptible);
 
+/**
+ * mutex_lock_killable() - Acquire the mutex, interruptible by fatal signals.
+ * @lock: The mutex to be acquired.
+ *
+ * Lock the mutex like mutex_lock().  If a signal which will be fatal to
+ * the current process is delivered while the process is sleeping, this
+ * function will return without acquiring the mutex.
+ *
+ * Context: Process context.
+ * Return: 0 if the lock was successfully acquired or %-EINTR if a
+ * fatal signal arrived.
+ */
 int __sched mutex_lock_killable(struct mutex *lock)
 {
 	might_sleep();
@@ -1115,6 +1128,16 @@ int __sched mutex_lock_killable(struct mutex *lock)
 }
 EXPORT_SYMBOL(mutex_lock_killable);
 
+/**
+ * mutex_lock_io() - Acquire the mutex and mark the process as waiting for I/O
+ * @lock: The mutex to be acquired.
+ *
+ * Lock the mutex like mutex_lock().  While the task is waiting for this
+ * mutex, it will be accounted as being in the IO wait state by the
+ * scheduler.
+ *
+ * Context: Process context.
+ */
 void __sched mutex_lock_io(struct mutex *lock)
 {
 	int token;

^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] Improve mutex documentation 2018-03-15 11:58 ` Matthew Wilcox @ 2018-03-15 12:12 ` Kirill Tkhai -1 siblings, 0 replies; 27+ messages in thread From: Kirill Tkhai @ 2018-03-15 12:12 UTC (permalink / raw) To: Matthew Wilcox, Andrew Morton Cc: tj, cl, linux-mm, linux-kernel, linux-doc, Jonathan Corbet, Mauro Carvalho Chehab, Peter Zijlstra, Ingo Molnar Hi, Matthew, On 15.03.2018 14:58, Matthew Wilcox wrote: > On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote: >> My memory is weak and our documentation is awful. What does >> mutex_lock_killable() actually do and how does it differ from >> mutex_lock_interruptible()? > > From: Matthew Wilcox <mawilcox@microsoft.com> > > Add kernel-doc for mutex_lock_killable() and mutex_lock_io(). Reword the > kernel-doc for mutex_lock_interruptible(). > > Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> > > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c > index 858a07590e39..2048359f33d2 100644 > --- a/kernel/locking/mutex.c > +++ b/kernel/locking/mutex.c > @@ -1082,15 +1082,16 @@ static noinline int __sched > __mutex_lock_interruptible_slowpath(struct mutex *lock); > > /** > - * mutex_lock_interruptible - acquire the mutex, interruptible > - * @lock: the mutex to be acquired > + * mutex_lock_interruptible() - Acquire the mutex, interruptible by signals. > + * @lock: The mutex to be acquired. > * > - * Lock the mutex like mutex_lock(), and return 0 if the mutex has > - * been acquired or sleep until the mutex becomes available. If a > - * signal arrives while waiting for the lock then this function > - * returns -EINTR. > + * Lock the mutex like mutex_lock(). If a signal is delivered while the > + * process is sleeping, this function will return without acquiring the > + * mutex. > * > - * This function is similar to (but not equivalent to) down_interruptible(). > + * Context: Process context. > + * Return: 0 if the lock was successfully acquired or %-EINTR if a > + * signal arrived. 
> */ > int __sched mutex_lock_interruptible(struct mutex *lock) > { > @@ -1104,6 +1105,18 @@ int __sched mutex_lock_interruptible(struct mutex *lock) > > EXPORT_SYMBOL(mutex_lock_interruptible); > > +/** > + * mutex_lock_killable() - Acquire the mutex, interruptible by fatal signals. Shouldn't we clarify that fatal signals are SIGKILL only? > + * @lock: The mutex to be acquired. > + * > + * Lock the mutex like mutex_lock(). If a signal which will be fatal to > + * the current process is delivered while the process is sleeping, this > + * function will return without acquiring the mutex. > + * > + * Context: Process context. > + * Return: 0 if the lock was successfully acquired or %-EINTR if a > + * fatal signal arrived. > + */ > int __sched mutex_lock_killable(struct mutex *lock) > { > might_sleep(); > @@ -1115,6 +1128,16 @@ int __sched mutex_lock_killable(struct mutex *lock) > } > EXPORT_SYMBOL(mutex_lock_killable); > > +/** > + * mutex_lock_io() - Acquire the mutex and mark the process as waiting for I/O > + * @lock: The mutex to be acquired. > + * > + * Lock the mutex like mutex_lock(). While the task is waiting for this > + * mutex, it will be accounted as being in the IO wait state by the > + * scheduler. > + * > + * Context: Process context. > + */ > void __sched mutex_lock_io(struct mutex *lock) > { > int token; > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Improve mutex documentation 2018-03-15 12:12 ` Kirill Tkhai @ 2018-03-15 13:18 ` Matthew Wilcox -1 siblings, 0 replies; 27+ messages in thread From: Matthew Wilcox @ 2018-03-15 13:18 UTC (permalink / raw) To: Kirill Tkhai Cc: Andrew Morton, tj, cl, linux-mm, linux-kernel, linux-doc, Jonathan Corbet, Mauro Carvalho Chehab, Peter Zijlstra, Ingo Molnar On Thu, Mar 15, 2018 at 03:12:30PM +0300, Kirill Tkhai wrote: > > +/** > > + * mutex_lock_killable() - Acquire the mutex, interruptible by fatal signals. > > Shouldn't we clarify that fatal signals are SIGKILL only? It's more complicated than it might seem (... welcome to signal handling!) If you send SIGINT to a task that's waiting on a mutex_killable(), it will still die. I *think* that's due to the code in complete_signal(): if (sig_fatal(p, sig) && !(signal->flags & SIGNAL_GROUP_EXIT) && !sigismember(&t->real_blocked, sig) && (sig == SIGKILL || !p->ptrace)) { ... sigaddset(&t->pending.signal, SIGKILL); You're correct that this code only checks for SIGKILL, but any fatal signal will result in the signal group receiving SIGKILL. Unless I've misunderstood, and it wouldn't be the first time I've misunderstood signal handling. ^ permalink raw reply [flat|nested] 27+ messages in thread
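The complete_signal() excerpt quoted above can be condensed into a toy predicate. This is a simplified model for illustration only — the real check in kernel/signal.c also consults SIGNAL_GROUP_EXIT, real_blocked and ptrace state — but it demonstrates the point of the thread: "fatal" is a property of how the task would react to the signal, not of the signal number being SIGKILL.

```c
#include <stdbool.h>

/*
 * Toy model of the sig_fatal() decision: a signal whose default action
 * terminates the process counts as fatal when the task has neither
 * installed a handler for it nor blocked it.  When such a signal is
 * delivered, the kernel marks the whole thread group with a pending
 * SIGKILL, which is why mutex_lock_killable() also wakes up for an
 * unhandled SIGINT.
 */
struct task_model {
	bool has_handler;	/* task installed a handler for this signal */
	bool sig_blocked;	/* task blocked this signal */
};

bool sig_fatal_model(const struct task_model *t, bool default_terminates)
{
	return default_terminates && !t->has_handler && !t->sig_blocked;
}
```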
* Re: [PATCH] Improve mutex documentation 2018-03-15 13:18 ` Matthew Wilcox @ 2018-03-15 13:23 ` Kirill Tkhai -1 siblings, 0 replies; 27+ messages in thread From: Kirill Tkhai @ 2018-03-15 13:23 UTC (permalink / raw) To: Matthew Wilcox Cc: Andrew Morton, tj, cl, linux-mm, linux-kernel, linux-doc, Jonathan Corbet, Mauro Carvalho Chehab, Peter Zijlstra, Ingo Molnar On 15.03.2018 16:18, Matthew Wilcox wrote: > On Thu, Mar 15, 2018 at 03:12:30PM +0300, Kirill Tkhai wrote: >>> +/** >>> + * mutex_lock_killable() - Acquire the mutex, interruptible by fatal signals. >> >> Shouldn't we clarify that fatal signals are SIGKILL only? > > It's more complicated than it might seem (... welcome to signal handling!) > If you send SIGINT to a task that's waiting on a mutex_killable(), it will > still die. I *think* that's due to the code in complete_signal(): > > if (sig_fatal(p, sig) && > !(signal->flags & SIGNAL_GROUP_EXIT) && > !sigismember(&t->real_blocked, sig) && > (sig == SIGKILL || !p->ptrace)) { > ... > sigaddset(&t->pending.signal, SIGKILL); > > You're correct that this code only checks for SIGKILL, but any fatal > signal will result in the signal group receiving SIGKILL. > > Unless I've misunderstood, and it wouldn't be the first time I've > misunderstood signal handling. Sure, thanks for the explanation. Kirill ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Improve mutex documentation 2018-03-15 11:58 ` Matthew Wilcox @ 2018-03-16 13:57 ` Peter Zijlstra -1 siblings, 0 replies; 27+ messages in thread From: Peter Zijlstra @ 2018-03-16 13:57 UTC (permalink / raw) To: Matthew Wilcox Cc: Andrew Morton, Kirill Tkhai, tj, cl, linux-mm, linux-kernel, linux-doc, Jonathan Corbet, Mauro Carvalho Chehab, Ingo Molnar On Thu, Mar 15, 2018 at 04:58:12AM -0700, Matthew Wilcox wrote: > On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote: > > My memory is weak and our documentation is awful. What does > > mutex_lock_killable() actually do and how does it differ from > > mutex_lock_interruptible()? > > From: Matthew Wilcox <mawilcox@microsoft.com> > > Add kernel-doc for mutex_lock_killable() and mutex_lock_io(). Reword the > kernel-doc for mutex_lock_interruptible(). > > Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Thanks Matthew! ^ permalink raw reply [flat|nested] 27+ messages in thread
* [tip:locking/urgent] locking/mutex: Improve documentation 2018-03-15 11:58 ` Matthew Wilcox ` (2 preceding siblings ...) (?) @ 2018-03-20 11:07 ` tip-bot for Matthew Wilcox -1 siblings, 0 replies; 27+ messages in thread From: tip-bot for Matthew Wilcox @ 2018-03-20 11:07 UTC (permalink / raw) To: linux-tip-commits Cc: tglx, peterz, hpa, mchehab, corbet, linux-kernel, ktkhai, mawilcox, mingo, paulmck, akpm, torvalds Commit-ID: 45dbac0e288350f9a4226a5b4b651ed434dd9f85 Gitweb: https://git.kernel.org/tip/45dbac0e288350f9a4226a5b4b651ed434dd9f85 Author: Matthew Wilcox <mawilcox@microsoft.com> AuthorDate: Thu, 15 Mar 2018 04:58:12 -0700 Committer: Ingo Molnar <mingo@kernel.org> CommitDate: Tue, 20 Mar 2018 08:07:41 +0100 locking/mutex: Improve documentation On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote: > My memory is weak and our documentation is awful. What does > mutex_lock_killable() actually do and how does it differ from > mutex_lock_interruptible()? Add kernel-doc for mutex_lock_killable() and mutex_lock_io(). Reword the kernel-doc for mutex_lock_interruptible(). Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: cl@linux.com Cc: tj@kernel.org Link: http://lkml.kernel.org/r/20180315115812.GA9949@bombadil.infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> --- kernel/locking/mutex.c | 37 ++++++++++++++++++++++++++++++------- 1 file changed, 30 insertions(+), 7 deletions(-) diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c index 858a07590e39..2048359f33d2 100644 --- a/kernel/locking/mutex.c +++ b/kernel/locking/mutex.c @@ -1082,15 +1082,16 @@ static noinline int __sched __mutex_lock_interruptible_slowpath(struct mutex *lock); /** - * mutex_lock_interruptible - acquire the mutex, interruptible - * @lock: the mutex to be acquired + * mutex_lock_interruptible() - Acquire the mutex, interruptible by signals. + * @lock: The mutex to be acquired. * - * Lock the mutex like mutex_lock(), and return 0 if the mutex has - * been acquired or sleep until the mutex becomes available. If a - * signal arrives while waiting for the lock then this function - * returns -EINTR. + * Lock the mutex like mutex_lock(). If a signal is delivered while the + * process is sleeping, this function will return without acquiring the + * mutex. * - * This function is similar to (but not equivalent to) down_interruptible(). + * Context: Process context. + * Return: 0 if the lock was successfully acquired or %-EINTR if a + * signal arrived. */ int __sched mutex_lock_interruptible(struct mutex *lock) { @@ -1104,6 +1105,18 @@ int __sched mutex_lock_interruptible(struct mutex *lock) EXPORT_SYMBOL(mutex_lock_interruptible); +/** + * mutex_lock_killable() - Acquire the mutex, interruptible by fatal signals. + * @lock: The mutex to be acquired. + * + * Lock the mutex like mutex_lock(). If a signal which will be fatal to + * the current process is delivered while the process is sleeping, this + * function will return without acquiring the mutex. 
+ * + * Context: Process context. + * Return: 0 if the lock was successfully acquired or %-EINTR if a + * fatal signal arrived. + */ int __sched mutex_lock_killable(struct mutex *lock) { might_sleep(); @@ -1115,6 +1128,16 @@ int __sched mutex_lock_killable(struct mutex *lock) } EXPORT_SYMBOL(mutex_lock_killable); +/** + * mutex_lock_io() - Acquire the mutex and mark the process as waiting for I/O + * @lock: The mutex to be acquired. + * + * Lock the mutex like mutex_lock(). While the task is waiting for this + * mutex, it will be accounted as being in the IO wait state by the + * scheduler. + * + * Context: Process context. + */ void __sched mutex_lock_io(struct mutex *lock) { int token; ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() 2018-03-14 20:56 ` Andrew Morton 2018-03-14 22:09 ` Tejun Heo 2018-03-15 11:58 ` Matthew Wilcox @ 2018-03-19 15:14 ` Tejun Heo 2018-03-19 15:32 ` [PATCH v2] mm: " Kirill Tkhai 2 siblings, 1 reply; 27+ messages in thread From: Tejun Heo @ 2018-03-19 15:14 UTC (permalink / raw) To: Andrew Morton; +Cc: Kirill Tkhai, cl, linux-mm, linux-kernel On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote: > > + if (!is_atomic) { > > + if (gfp & __GFP_NOFAIL) > > + mutex_lock(&pcpu_alloc_mutex); > > + else if (mutex_lock_killable(&pcpu_alloc_mutex)) > > + return NULL; > > + } > > It would benefit from a comment explaining why we're doing this (it's > for the oom-killer). And, yeah, this would be great. Kirill, can you please send a patch to add a comment there? Thanks. -- tejun ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2] mm: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() 2018-03-19 15:14 ` [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() Tejun Heo @ 2018-03-19 15:32 ` Kirill Tkhai 2018-03-19 16:39 ` Tejun Heo 0 siblings, 1 reply; 27+ messages in thread From: Kirill Tkhai @ 2018-03-19 15:32 UTC (permalink / raw) To: Tejun Heo, Andrew Morton; +Cc: cl, linux-mm, linux-kernel From: Kirill Tkhai <ktkhai@virtuozzo.com> In case of memory deficit and low percpu memory pages, pcpu_balance_workfn() takes pcpu_alloc_mutex for a long time (as it makes memory allocations itself and waits for memory reclaim). If tasks doing pcpu_alloc() are chosen by the OOM killer, they can't exit, because they are waiting for the mutex. The patch makes pcpu_alloc() care about the killing signal and use mutex_lock_killable(), when that is allowed by the GFP flags. This guarantees that a task does not miss SIGKILL from the OOM killer. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> --- v2: Added explanatory comment mm/percpu.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/mm/percpu.c b/mm/percpu.c index 50e7fdf84055..605e3228baa6 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -1369,8 +1369,17 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, return NULL; } - if (!is_atomic) - mutex_lock(&pcpu_alloc_mutex); + if (!is_atomic) { + /* + * pcpu_balance_workfn() allocates memory under this mutex, + * and it may wait for memory reclaim. Allow current task + * to become OOM victim, in case of memory pressure. + */ + if (gfp & __GFP_NOFAIL) + mutex_lock(&pcpu_alloc_mutex); + else if (mutex_lock_killable(&pcpu_alloc_mutex)) + return NULL; + } spin_lock_irqsave(&pcpu_lock, flags); ^ permalink raw reply related [flat|nested] 27+ messages in thread
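The control flow the v2 hunk adds can be isolated into a small standalone sketch. The names below (`pcpu_alloc_model()` and its flag arguments) are invented for illustration; in the real function the lock is taken on pcpu_alloc_mutex itself and the fatal-signal state comes from the current task, not a parameter.

```c
#include <errno.h>
#include <stddef.h>
#include <stdbool.h>

/* Stand-in for mutex_lock_killable(): fails only on a fatal signal. */
static int lock_killable_sim(bool fatal_signal_pending)
{
	return fatal_signal_pending ? -EINTR : 0;
}

/*
 * Model of the patched locking decision in pcpu_alloc():
 *  - __GFP_NOFAIL callers may not see failure, so they must block on
 *    the mutex unconditionally;
 *  - all other sleepable callers use the killable lock and bail out
 *    with NULL if a fatal signal (e.g. from the OOM killer) arrives
 *    while they wait for pcpu_balance_workfn() to release the mutex.
 */
void *pcpu_alloc_model(bool gfp_nofail, bool fatal_signal_pending)
{
	static char chunk[64];	/* placeholder for the real allocation */

	if (gfp_nofail) {
		/* mutex_lock(): wait as long as it takes. */
	} else if (lock_killable_sim(fatal_signal_pending)) {
		return NULL;	/* let the task die instead of waiting */
	}
	return chunk;
}
```

The NOFAIL branch is why the patch cannot simply use mutex_lock_killable() everywhere: a __GFP_NOFAIL caller has no error path to return NULL into.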
* Re: [PATCH v2] mm: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() 2018-03-19 15:32 ` [PATCH v2] mm: " Kirill Tkhai @ 2018-03-19 16:39 ` Tejun Heo 0 siblings, 0 replies; 27+ messages in thread From: Tejun Heo @ 2018-03-19 16:39 UTC (permalink / raw) To: Kirill Tkhai; +Cc: Andrew Morton, cl, linux-mm, linux-kernel On Mon, Mar 19, 2018 at 06:32:10PM +0300, Kirill Tkhai wrote: > From: Kirill Tkhai <ktkhai@virtuozzo.com> > > In case of memory deficit and low percpu memory pages, > pcpu_balance_workfn() takes pcpu_alloc_mutex for a long > time (as it makes memory allocations itself and waits > for memory reclaim). If tasks doing pcpu_alloc() are > choosen by OOM killer, they can't exit, because they > are waiting for the mutex. > > The patch makes pcpu_alloc() to care about killing signal > and use mutex_lock_killable(), when it's allowed by GFP > flags. This guarantees, a task does not miss SIGKILL > from OOM killer. > > Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Applied to percpu/for-4.16-fixes. Thanks. -- tejun ^ permalink raw reply [flat|nested] 27+ messages in thread