From: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
To: Kirill Tkhai <ktkhai@virtuozzo.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Tejun Heo <tj@kernel.org>
Cc: cl@linux.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn()
Date: Thu, 15 Mar 2018 19:48:13 +0900	[thread overview]
Message-ID: <5a4a1aae-8c61-de28-d3cd-2f8f4355f050@i-love.sakura.ne.jp> (raw)
In-Reply-To: <c5c1c98b-9e0c-ec09-36c6-4266ad239ef1@virtuozzo.com>

On 2018/03/15 17:58, Kirill Tkhai wrote:
> On 15.03.2018 01:22, Andrew Morton wrote:
>> On Wed, 14 Mar 2018 15:09:09 -0700 Tejun Heo <tj@kernel.org> wrote:
>>
>>> Hello, Andrew.
>>>
>>> On Wed, Mar 14, 2018 at 01:56:31PM -0700, Andrew Morton wrote:
>>>> It would benefit from a comment explaining why we're doing this (it's
>>>> for the oom-killer).
>>>
>>> Will add.
>>>
>>>> My memory is weak and our documentation is awful.  What does
>>>> mutex_lock_killable() actually do and how does it differ from
>>>> mutex_lock_interruptible()?  Userspace tasks can run pcpu_alloc() and I
>>>
>>> IIRC, killable listens only to SIGKILL.

I think that killable listens to any signal which results in termination of
that process. For example, if a process is configured to terminate upon SIGINT,
fatal_signal_pending() becomes true upon SIGINT.
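
For reference, roughly how that works (a simplified sketch from memory of
include/linux/sched/signal.h, not verbatim kernel source): when a signal whose
default action is termination (e.g. SIGINT with no handler installed) is
delivered, complete_signal() adds SIGKILL to the pending set, so:

  /* Simplified sketch; not verbatim kernel source. */
  static inline int __fatal_signal_pending(struct task_struct *p)
  {
          /* SIGKILL gets set here for *any* unhandled fatal signal,
           * not only an explicit kill -9. */
          return unlikely(sigismember(&p->pending.signal, SIGKILL));
  }

  static inline int fatal_signal_pending(struct task_struct *p)
  {
          return signal_pending(p) && __fatal_signal_pending(p);
  }

mutex_lock_killable() sleeps in TASK_KILLABLE and returns -EINTR once
fatal_signal_pending() is true, whereas mutex_lock_interruptible() sleeps in
TASK_INTERRUPTIBLE and returns -EINTR on any pending signal.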

>>>
>>>> wonder if there's any way in which a userspace-delivered signal can
>>>> disrupt another userspace task's memory allocation attempt?
>>>
>>> Hmm... maybe.  Just honoring SIGKILL *should* be fine but the alloc
>>> failure paths might be broken, so there are some risks.  Given that
>>> the cases where userspace tasks end up allocating percpu memory are
>>> pretty limited and/or privileged (like mount, bpf), I don't think the
>>> risks are high tho.
>>
>> hm.  spose so.  Maybe.  Are there other ways?  I assume the time is
>> being spent in pcpu_create_chunk()?  We could drop the mutex while
>> running that stuff and take the appropriate did-we-race-with-someone
>> testing after retaking it.  Or similar.
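
A rough sketch of that drop-and-revalidate approach (hypothetical; the
need_more_chunks()/install_chunk() helpers do not exist and signatures are
approximate):

  mutex_lock(&pcpu_alloc_mutex);
  while (need_more_chunks()) {
          struct pcpu_chunk *chunk;

          /* Drop the mutex across the slow allocation so waiters
           * (e.g. an OOM victim) are not stuck behind it. */
          mutex_unlock(&pcpu_alloc_mutex);
          chunk = pcpu_create_chunk();
          mutex_lock(&pcpu_alloc_mutex);

          if (!chunk)
                  break;
          /* Did we race with someone who already added a chunk? */
          if (!need_more_chunks()) {
                  pcpu_destroy_chunk(chunk);
                  break;
          }
          install_chunk(chunk);
  }
  mutex_unlock(&pcpu_alloc_mutex);
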
>
> The balance work spends its time in pcpu_populate_chunk(). There are
> two stacks of this problem:

Will you show me more context? Except on CONFIG_MMU=n kernels, the OOM reaper
reclaims memory from the OOM victim. Therefore, "If tasks doing pcpu_alloc()
are chosen by the OOM killer, they can't exit, because they are waiting for the
mutex." should not cause problems. Of course, giving up upon SIGKILL is nice
regardless.
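
For reference, the change under discussion is roughly of this shape (a sketch,
not the actual patch; is_atomic is an existing local in pcpu_alloc()):

  if (!is_atomic) {
          /* Bail out if a fatal signal (e.g. SIGKILL from the OOM
           * killer) is pending instead of sleeping uninterruptibly. */
          if (mutex_lock_killable(&pcpu_alloc_mutex))
                  return NULL;
  }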

>
> [  106.313267] kworker/2:2     D13832   936      2 0x80000000
> [  106.313740] Workqueue: events pcpu_balance_workfn
> [  106.314109] Call Trace:
> [  106.314293]  ? __schedule+0x267/0x750
> [  106.314570]  schedule+0x2d/0x90
> [  106.314803]  schedule_timeout+0x17f/0x390
> [  106.315106]  ? __next_timer_interrupt+0xc0/0xc0
> [  106.315429]  __alloc_pages_slowpath+0xb73/0xd90
> [  106.315792]  __alloc_pages_nodemask+0x16a/0x210
> [  106.316148]  pcpu_populate_chunk+0xce/0x300
> [  106.316479]  pcpu_balance_workfn+0x3f3/0x580
> [  106.316853]  ? _raw_spin_unlock_irq+0xe/0x30
> [  106.317227]  ? finish_task_switch+0x8d/0x250
> [  106.317632]  process_one_work+0x1b7/0x410
> [  106.317970]  worker_thread+0x26/0x3d0
> [  106.318304]  ? process_one_work+0x410/0x410
> [  106.318649]  kthread+0x10e/0x130
> [  106.318916]  ? __kthread_create_worker+0x120/0x120
> [  106.319360]  ret_from_fork+0x35/0x40
>
> [  106.453375] a.out           D13400  3670      1 0x00100004
> [  106.453880] Call Trace:
> [  106.454114]  ? __schedule+0x267/0x750
> [  106.454427]  schedule+0x2d/0x90
> [  106.454829]  schedule_preempt_disabled+0xf/0x20
> [  106.455422]  __mutex_lock.isra.2+0x181/0x4d0
> [  106.455988]  ? pcpu_alloc+0x3c4/0x670
> [  106.456465]  pcpu_alloc+0x3c4/0x670
> [  106.456973]  ? preempt_count_add+0x63/0x90
> [  106.457401]  ? __local_bh_enable_ip+0x2e/0x60
> [  106.457882]  ipv6_add_dev+0x121/0x490
> [  106.458330]  addrconf_notify+0x27b/0x9a0
> [  106.458823]  ? inetdev_init+0xd7/0x150
> [  106.459270]  ? inetdev_event+0x339/0x4b0
> [  106.459738]  ? preempt_count_add+0x63/0x90
> [  106.460243]  ? _raw_spin_lock_irq+0xf/0x30
> [  106.460747]  ? notifier_call_chain+0x42/0x60
> [  106.461271]  notifier_call_chain+0x42/0x60
> [  106.461819]  register_netdevice+0x415/0x530
> [  106.462364]  register_netdev+0x11/0x20
> [  106.462849]  loopback_net_init+0x43/0x90
> [  106.463216]  ops_init+0x3b/0x100
> [  106.463516]  setup_net+0x7d/0x150
> [  106.463831]  copy_net_ns+0x14b/0x180
> [  106.464134]  create_new_namespaces+0x117/0x1b0
> [  106.464481]  unshare_nsproxy_namespaces+0x5b/0x90
> [  106.464864]  SyS_unshare+0x1b0/0x300
>
> [  106.536845] Kernel panic - not syncing: Out of memory and no killable processes...

These two stacks of this problem are not blocked at mutex_lock().

Why were all OOM-killable threads killed? Were there only a few?
Does pcpu_alloc() allocate so much that it depletes the memory reserves?

Thread overview: 27+ messages
2018-03-14 11:51 [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() Kirill Tkhai
2018-03-14 13:55 ` Tejun Heo
2018-03-14 20:56 ` Andrew Morton
2018-03-14 22:09   ` Tejun Heo
2018-03-14 22:22     ` Andrew Morton
2018-03-15  8:58       ` Kirill Tkhai
2018-03-15 10:48         ` Tetsuo Handa [this message]
2018-03-15 12:09           ` Kirill Tkhai
2018-03-15 14:09             ` Tetsuo Handa
2018-03-15 14:42               ` Kirill Tkhai
2018-03-19 15:13       ` Tejun Heo
2018-03-15 11:58   ` [PATCH] Improve mutex documentation Matthew Wilcox
2018-03-15 12:12     ` Kirill Tkhai
2018-03-15 13:18       ` Matthew Wilcox
2018-03-15 13:23         ` Kirill Tkhai
2018-03-16 13:57     ` Peter Zijlstra
2018-03-20 11:07     ` [tip:locking/urgent] locking/mutex: Improve documentation tip-bot for Matthew Wilcox
2018-03-19 15:14   ` [PATCH] percpu: Allow to kill tasks doing pcpu_alloc() and waiting for pcpu_balance_workfn() Tejun Heo
2018-03-19 15:32     ` [PATCH v2] mm: " Kirill Tkhai
2018-03-19 16:39       ` Tejun Heo
