From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Kirill Tkhai <ktkhai@virtuozzo.com>, Andrew Morton <akpm@linux-foundation.org>, Tejun Heo <tj@kernel.org>
Cc: cl@linux.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()
Date: Thu, 15 Mar 2018 19:48:13 +0900
Message-ID: <5a4a1aae-8c61-de28-d3cd-2f8f4355f050@i-love.sakura.ne.jp>
In-Reply-To: <c5c1c98b-9e0c-ec09-36c6-4266ad239ef1@virtuozzo.com>

On 2018/03/15 17:58, Kirill Tkhai wrote:
> On 15.03.2018 01:22, Andrew Morton wrote:
>> On Wed, 14 Mar 2018 15:09:09 -0700 Tejun Heo <tj@kernel.org> wrote:
>>
>>> Hello, Andrew.
>>>
>>> On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote:
>>>> It would benefit from a comment explaining why we're doing this (it's
>>>> for the oom-killer).
>>>
>>> Will add.
>>>
>>>> My memory is weak and our documentation is awful. What does
>>>> mutex_lock_killable() actually do and how does it differ from
>>>> mutex_lock_interruptible()? Userspace tasks can run pcpu_alloc() and I
>>>
>>> IIRC, killable listens only to SIGKILL.

I think that killable listens to any signal which results in termination of
that process. For example, if a process is configured to terminate upon
SIGINT, fatal_signal_pending() becomes true upon SIGINT.

>>>
>>>> wonder if there's any way in which a userspace-delivered signal can
>>>> disrupt another userspace task's memory allocation attempt?
>>>
>>> Hmm... maybe. Just honoring SIGKILL *should* be fine but the alloc
>>> failure paths might be broken, so there are some risks. Given that
>>> the cases where userspace tasks end up allocation percpu memory is
>>> pretty limited and/or priviledged (like mount, bpf), I don't think the
>>> risks are high tho.
>>
>> hm. spose so. Maybe. Are there other ways? I assume the time is
>> being spent in pcpu_create_chunk()? We could drop the mutex while
>> running that stuff and take the appropriate did-we-race-with-someone
>> testing after retaking it. Or similar.
>
> The balance work spends its time in pcpu_populate_chunk(). There are
> two stacks of this problem:

Could you show me more context? Except on CONFIG_MMU=n kernels, the OOM
reaper reclaims memory from the OOM victim. Therefore, "If tasks doing
pcpu_alloc() are choosen by OOM killer, they can't exit, because they are
waiting for the mutex." should not cause problems. Of course, giving up
upon SIGKILL is nice regardless.

>
> [ 106.313267] kworker/2:2 D13832 936 2 0x80000000
> [ 106.313740] Workqueue: events pcpu_balance_workfn
> [ 106.314109] Call Trace:
> [ 106.314293] ? __schedule+0x267/0x750
> [ 106.314570] schedule+0x2d/0x90
> [ 106.314803] schedule_timeout+0x17f/0x390
> [ 106.315106] ? __next_timer_interrupt+0xc0/0xc0
> [ 106.315429] __alloc_pages_slowpath+0xb73/0xd90
> [ 106.315792] __alloc_pages_nodemask+0x16a/0x210
> [ 106.316148] pcpu_populate_chunk+0xce/0x300
> [ 106.316479] pcpu_balance_workfn+0x3f3/0x580
> [ 106.316853] ? _raw_spin_unlock_irq+0xe/0x30
> [ 106.317227] ? finish_task_switch+0x8d/0x250
> [ 106.317632] process_one_work+0x1b7/0x410
> [ 106.317970] worker_thread+0x26/0x3d0
> [ 106.318304] ? process_one_work+0x410/0x410
> [ 106.318649] kthread+0x10e/0x130
> [ 106.318916] ? __kthread_create_worker+0x120/0x120
> [ 106.319360] ret_from_fork+0x35/0x40
>
> [ 106.453375] a.out D13400 3670 1 0x00100004
> [ 106.453880] Call Trace:
> [ 106.454114] ? __schedule+0x267/0x750
> [ 106.454427] schedule+0x2d/0x90
> [ 106.454829] schedule_preempt_disabled+0xf/0x20
> [ 106.455422] __mutex_lock.isra.2+0x181/0x4d0
> [ 106.455988] ? pcpu_alloc+0x3c4/0x670
> [ 106.456465] pcpu_alloc+0x3c4/0x670
> [ 106.456973] ? preempt_count_add+0x63/0x90
> [ 106.457401] ? __local_bh_enable_ip+0x2e/0x60
> [ 106.457882] ipv6_add_dev+0x121/0x490
> [ 106.458330] addrconf_notify+0x27b/0x9a0
> [ 106.458823] ? inetdev_init+0xd7/0x150
> [ 106.459270] ? inetdev_event+0x339/0x4b0
> [ 106.459738] ? preempt_count_add+0x63/0x90
> [ 106.460243] ? _raw_spin_lock_irq+0xf/0x30
> [ 106.460747] ? notifier_call_chain+0x42/0x60
> [ 106.461271] notifier_call_chain+0x42/0x60
> [ 106.461819] register_netdevice+0x415/0x530
> [ 106.462364] register_netdev+0x11/0x20
> [ 106.462849] loopback_net_init+0x43/0x90
> [ 106.463216] ops_init+0x3b/0x100
> [ 106.463516] setup_net+0x7d/0x150
> [ 106.463831] copy_net_ns+0x14b/0x180
> [ 106.464134] create_new_namespaces+0x117/0x1b0
> [ 106.464481] unshare_nsproxy_namespaces+0x5b/0x90
> [ 106.464864] SyS_unshare+0x1b0/0x300
>
> [ 106.536845] Kernel panic - not syncing: Out of memory and no killable processes...

These two stacks of this problem are not blocked at mutex_lock().

Why were all OOM-killable threads killed? Were there only a few?
Does pcpu_alloc() allocate enough memory to deplete the memory reserves?