From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754086Ab3KGTfM (ORCPT ); Thu, 7 Nov 2013 14:35:12 -0500 Received: from mail-wg0-f53.google.com ([74.125.82.53]:40506 "EHLO mail-wg0-f53.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752658Ab3KGTfF (ORCPT ); Thu, 7 Nov 2013 14:35:05 -0500 MIME-Version: 1.0 In-Reply-To: References: <1383693987-14171-1-git-send-email-snanda@chromium.org> From: Sameer Nanda Date: Thu, 7 Nov 2013 11:34:43 -0800 X-Google-Sender-Auth: m-7OqlRjQgIEYxDHWxgyWgztBVk Message-ID: Subject: Re: [PATCH] mm, oom: Fix race when selecting process to kill To: David Rientjes Cc: Luigi Semenzato , msb@facebook.com, Andrew Morton , mhocko@suse.cz, Johannes Weiner , Rusty Russell , oleg@redhat.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Nov 6, 2013 at 4:35 PM, David Rientjes wrote: > On Wed, 6 Nov 2013, Sameer Nanda wrote: > >> David -- I think we can make the duration that the tasklist_lock is >> held smaller by consolidating the process selection logic that is >> currently split across select_bad_process and oom_kill_process into >> one place in select_bad_process. The tasklist_lock would then need to >> be held only when the thread lists are being traversed. Would you be >> ok with that? I can re-spin the patch if that sounds like a workable >> option. >> > > No, this caused hundreds of machines to hit soft lockups for Google > because there's no synchronization that prevents dozens of cpus to take > tasklist_lock in the oom killer during parallel memcg oom conditions and > never allow the write_lock_irq() on fork() or exit() to make progress. We > absolutely must hold tasklist_lock for as little time as possible in the > oom killer. > > That said, I've never actually seen your reported bug manifest in our > production environment so let's see if Oleg has any ideas. Is the path you are referring to mem_cgroup_out_of_memory calling oom_kill_process? If so, then that path doesn't appear to suffer from the two step select_bad_process, oom_kill_process race since mem_cgroup_out_of_memory directly calls oom_kill_process without going through select_bad_process. This also means that the patch I sent is incorrect since it removes the existing tasklist_lock protection in oom_kill_process. Respinning patch to take care of this case. -- Sameer From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-pa0-f43.google.com (mail-pa0-f43.google.com [209.85.220.43]) by kanga.kvack.org (Postfix) with ESMTP id 0A0AA6B0171 for ; Thu, 7 Nov 2013 14:35:08 -0500 (EST) Received: by mail-pa0-f43.google.com with SMTP id hz1so1075018pad.2 for ; Thu, 07 Nov 2013 11:35:08 -0800 (PST) Received: from psmtp.com ([74.125.245.136]) by mx.google.com with SMTP id r4si4020942pan.159.2013.11.07.11.35.05 for ; Thu, 07 Nov 2013 11:35:06 -0800 (PST) Received: by mail-we0-f175.google.com with SMTP id t61so994224wes.6 for ; Thu, 07 Nov 2013 11:35:03 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: <1383693987-14171-1-git-send-email-snanda@chromium.org> From: Sameer Nanda Date: Thu, 7 Nov 2013 11:34:43 -0800 Message-ID: Subject: Re: [PATCH] mm, oom: Fix race when selecting process to kill Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: David Rientjes Cc: Luigi Semenzato , msb@facebook.com, Andrew Morton , mhocko@suse.cz, Johannes Weiner , Rusty Russell , oleg@redhat.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org On Wed, Nov 6, 2013 at 4:35 PM, David Rientjes wrote: > On Wed, 6 Nov 2013, Sameer Nanda wrote: > >> David -- I think we can make the duration that the tasklist_lock is >> held smaller by consolidating the process selection logic that is >> currently split across select_bad_process and oom_kill_process into >> one place in select_bad_process. The tasklist_lock would then need to >> be held only when the thread lists are being traversed. Would you be >> ok with that? I can re-spin the patch if that sounds like a workable >> option. >> > > No, this caused hundreds of machines to hit soft lockups for Google > because there's no synchronization that prevents dozens of cpus to take > tasklist_lock in the oom killer during parallel memcg oom conditions and > never allow the write_lock_irq() on fork() or exit() to make progress. We > absolutely must hold tasklist_lock for as little time as possible in the > oom killer. > > That said, I've never actually seen your reported bug manifest in our > production environment so let's see if Oleg has any ideas. Is the path you are referring to mem_cgroup_out_of_memory calling oom_kill_process? If so, then that path doesn't appear to suffer from the two step select_bad_process, oom_kill_process race since mem_cgroup_out_of_memory directly calls oom_kill_process without going through select_bad_process. This also means that the patch I sent is incorrect since it removes the existing tasklist_lock protection in oom_kill_process. Respinning patch to take care of this case. -- Sameer -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org