All of lore.kernel.org
 help / color / mirror / Atom feed
From: Luigi Semenzato <semenzato@google.com>
To: Sameer Nanda <snanda@chromium.org>, msb@facebook.com
Cc: David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	mhocko@suse.cz, Johannes Weiner <hannes@cmpxchg.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm, oom: Fix race when selecting process to kill
Date: Tue, 5 Nov 2013 23:17:16 -0800	[thread overview]
Message-ID: <CAA25o9Q-HvjQ_5pFJgYNeutaCoYgPu=e=k7EHq=6-+jeEuhzoA@mail.gmail.com> (raw)
In-Reply-To: <CAA25o9QG2BOmV5MoXCH73sadKoRD6wPivKq6TLvEem8GhZeXGg@mail.gmail.com>

Regarding other fixes: would it be possible to have the thread
iterator insert a dummy marker element in the thread list before the
scan?  There would be one such dummy element per CPU, so that multiple
CPUs can scan the list in parallel.  The loop would skip such
elements, and each dummy element would be removed at the end of each
scan.

I think this would work, i.e. it would have all the right properties,
but I don't have a sense of whether the performance impact is
acceptable.  Probably not, or it would have been proposed earlier.



On Tue, Nov 5, 2013 at 8:45 PM, Luigi Semenzato <semenzato@google.com> wrote:
> It's interesting that this was known for 3+ years, but nobody bothered
> adding a small warning to the code.
>
> We noticed this because it's actually happening on Chromebooks in the
> field.  We try to minimize OOM kills, but we can deal with them.  Of
> course, a hung kernel we cannot deal with.
>
> On Tue, Nov 5, 2013 at 7:04 PM, Sameer Nanda <snanda@chromium.org> wrote:
>>
>>
>>
>> On Tue, Nov 5, 2013 at 5:27 PM, David Rientjes <rientjes@google.com> wrote:
>>>
>>> On Tue, 5 Nov 2013, Luigi Semenzato wrote:
>>>
>>> > It's not enough to hold a reference to the task struct, because it can
>>> > still be taken out of the circular list of threads.  The RCU
>>> > assumptions don't hold in that case.
>>> >
>>>
>>> Could you please post a proper bug report that isolates this at the cause?
>>
>>
>> We've been running into this issue on Chrome OS. crbug.com/256326 has
>> additional
>> details.  The issue manifests itself as a soft lockup.
>>
>> The kernel we've been seeing this on is 3.8.
>>
>> We have a pretty consistent repro currently.  Happy to try out other
>> suggestions
>> for a fix.
>>
>>>
>>>
>>> Thanks.
>>
>>
>>
>>
>> --
>> Sameer

WARNING: multiple messages have this Message-ID (diff)
From: Luigi Semenzato <semenzato@google.com>
To: Sameer Nanda <snanda@chromium.org>, msb@facebook.com
Cc: David Rientjes <rientjes@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	mhocko@suse.cz, Johannes Weiner <hannes@cmpxchg.org>,
	Rusty Russell <rusty@rustcorp.com.au>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm, oom: Fix race when selecting process to kill
Date: Tue, 5 Nov 2013 23:17:16 -0800	[thread overview]
Message-ID: <CAA25o9Q-HvjQ_5pFJgYNeutaCoYgPu=e=k7EHq=6-+jeEuhzoA@mail.gmail.com> (raw)
In-Reply-To: <CAA25o9QG2BOmV5MoXCH73sadKoRD6wPivKq6TLvEem8GhZeXGg@mail.gmail.com>

Regarding other fixes: would it be possible to have the thread
iterator insert a dummy marker element in the thread list before the
scan?  There would be one such dummy element per CPU, so that multiple
CPUs can scan the list in parallel.  The loop would skip such
elements, and each dummy element would be removed at the end of each
scan.

I think this would work, i.e. it would have all the right properties,
but I don't have a sense of whether the performance impact is
acceptable.  Probably not, or it would have been proposed earlier.



On Tue, Nov 5, 2013 at 8:45 PM, Luigi Semenzato <semenzato@google.com> wrote:
> It's interesting that this was known for 3+ years, but nobody bothered
> adding a small warning to the code.
>
> We noticed this because it's actually happening on Chromebooks in the
> field.  We try to minimize OOM kills, but we can deal with them.  Of
> course, a hung kernel we cannot deal with.
>
> On Tue, Nov 5, 2013 at 7:04 PM, Sameer Nanda <snanda@chromium.org> wrote:
>>
>>
>>
>> On Tue, Nov 5, 2013 at 5:27 PM, David Rientjes <rientjes@google.com> wrote:
>>>
>>> On Tue, 5 Nov 2013, Luigi Semenzato wrote:
>>>
>>> > It's not enough to hold a reference to the task struct, because it can
>>> > still be taken out of the circular list of threads.  The RCU
>>> > assumptions don't hold in that case.
>>> >
>>>
>>> Could you please post a proper bug report that isolates this at the cause?
>>
>>
>> We've been running into this issue on Chrome OS. crbug.com/256326 has
>> additional
>> details.  The issue manifests itself as a soft lockup.
>>
>> The kernel we've been seeing this on is 3.8.
>>
>> We have a pretty consistent repro currently.  Happy to try out other
>> suggestions
>> for a fix.
>>
>>>
>>>
>>> Thanks.
>>
>>
>>
>>
>> --
>> Sameer

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2013-11-06  7:17 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-05 23:26 [PATCH] mm, oom: Fix race when selecting process to kill Sameer Nanda
2013-11-05 23:26 ` Sameer Nanda
2013-11-06  1:18 ` David Rientjes
2013-11-06  1:18   ` David Rientjes
2013-11-06  1:25   ` Luigi Semenzato
2013-11-06  1:25     ` Luigi Semenzato
2013-11-06  1:27     ` David Rientjes
2013-11-06  1:27       ` David Rientjes
2013-11-06  3:00       ` Vladimir Murzin
2013-11-06  3:00         ` Vladimir Murzin
2013-11-06  3:04       ` Sameer Nanda
2013-11-06  4:45         ` Luigi Semenzato
2013-11-06  4:45           ` Luigi Semenzato
2013-11-06  7:17           ` Luigi Semenzato [this message]
2013-11-06  7:17             ` Luigi Semenzato
2013-11-06 16:58             ` Sameer Nanda
2013-11-06 16:58               ` Sameer Nanda
2013-11-07  0:35               ` David Rientjes
2013-11-07  0:35                 ` David Rientjes
2013-11-07 19:34                 ` Sameer Nanda
2013-11-07 19:34                   ` Sameer Nanda
2013-11-08 18:07                 ` [PATCH v2] " Sameer Nanda
2013-11-08 18:07                   ` Sameer Nanda
2013-11-08 18:45                   ` Oleg Nesterov
2013-11-08 18:45                     ` Oleg Nesterov
2013-11-08 19:49                     ` [PATCH v3] " Sameer Nanda
2013-11-08 19:49                       ` Sameer Nanda
2013-11-09 15:16                       ` Oleg Nesterov
2013-11-09 15:16                         ` Oleg Nesterov
2013-11-11 23:15                         ` Sameer Nanda
2013-11-12  0:21                         ` [PATCH v4] " Sameer Nanda
2013-11-12  0:21                           ` Sameer Nanda
2013-11-12 15:13                           ` Michal Hocko
2013-11-12 15:13                             ` Michal Hocko
2013-11-12 20:01                           ` Oleg Nesterov
2013-11-12 20:01                             ` Oleg Nesterov
2013-11-12 20:08                             ` Sameer Nanda
2013-11-12 20:08                               ` Sameer Nanda
2013-11-12 20:23                               ` [PATCH v5] " Sameer Nanda
2013-11-12 20:23                                 ` Sameer Nanda
2013-11-13  2:33                                 ` David Rientjes
2013-11-13  2:33                                   ` David Rientjes
2013-11-13 16:46                                   ` Sameer Nanda
2013-11-13 16:46                                     ` Sameer Nanda
2013-11-13 17:18                                     ` [PATCH v6] " Sameer Nanda
2013-11-13 17:18                                       ` Sameer Nanda
2013-11-13 17:29                                       ` Oleg Nesterov
2013-11-13 17:29                                         ` Oleg Nesterov
2013-11-14 13:43                                       ` dserrg
2013-11-14 17:03                                         ` Sameer Nanda
2013-11-14 17:03                                           ` Sameer Nanda

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAA25o9Q-HvjQ_5pFJgYNeutaCoYgPu=e=k7EHq=6-+jeEuhzoA@mail.gmail.com' \
    --to=semenzato@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.cz \
    --cc=msb@facebook.com \
    --cc=rientjes@google.com \
    --cc=rusty@rustcorp.com.au \
    --cc=snanda@chromium.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.