From: Yafang Shao
Date: Sat, 18 Jul 2020 10:15:51 +0800
Subject: Re: [PATCH v2] memcg, oom: check memcg margin for parallel oom
To: David Rientjes
Cc: Michal Hocko, Tetsuo Handa, Andrew Morton, Johannes Weiner, Linux MM
References: <1594735034-19190-1-git-send-email-laoar.shao@gmail.com>

On Sat, Jul 18, 2020 at 3:26 AM David Rientjes wrote:
>
> On Fri, 17 Jul 2020, Yafang Shao wrote:
>
> > > > Actually the kernel is doing it now, see below,
> > > >
> > > > dump_header() <<<< dump lots of information
> > > > __oom_kill_process
> > > >     p = find_lock_task_mm(victim);
> > > >     if (!p)
> > > >         return; <<<< without killing any process.
> > > >
> > >
> > > Ah, this is catching an instance where the chosen process has already done
> > > exit_mm(), good catch -- I can find examples of this by scraping kernel
> > > logs from our fleet.
> > >
> > > So it appears there is precedent for dumping all the oom info but not
> > > actually performing any action for it and I made the earlier point that
> > > diagnostic information in the kernel log here is still useful.  I think it
> > > is still preferable that the kernel at least tell us why it didn't do
> > > anything, but as you mention that already happens today.
> > >
> > > Would you like to send a patch that checks for mem_cgroup_margin() here as
> > > well?  A second patch could make the possible inaction more visible,
> > > something like "Process ${pid} (${comm}) is already exiting" for the above
> > > check or "Memcg ${memcg} is no longer out of memory".
> > >
> > > Another thing that these messages indicate, beyond telling us why the oom
> > > killer didn't actually SIGKILL anything, is that we can expect some skew
> > > in the memory stats that shows an availability of memory.
> > >
> >
> > Agreed, these messages would be helpful.
> > I will send a patch for it.
> >
>
> Thanks Yafang.  We should also continue talking about challenges you
> encounter with the oom killer either at the system level or for memcg
> limit ooms in a separate thread.  It's clear that you are meeting several
> of the issues that we have previously seen ourselves.
>
> I could do a full audit of all our oom killer changes that may be
> interesting to you, but off the top of my head:
>
>  - A means of triggering a memcg oom through the kernel: think of sysrq+f
>    but scoped to processes attached to a memcg hierarchy.  This allows
>    userspace to reliably oom kill processes on overcommitted systems
>    (SIGKILL can be insufficient if we depend on oom reaping, for example,
>    to make forward progress)
>

A memcg sysrq+f would be helpful.
But I wonder: how about also waking up the oom_reaper when we send
SIGKILL to a process?
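
For reference, the mem_cgroup_margin() check and the two messages
discussed above could be modelled in userspace roughly as follows. This
is only a sketch under stated assumptions: struct memcg_model, struct
task_model, margin() and oom_kill_model() are hypothetical stand-ins,
not kernel APIs, and the order of the two checks is an assumption rather
than something settled in this thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct memcg_model {
	const char *name;
	unsigned long limit;	/* limit, in pages */
	unsigned long usage;	/* current charge, in pages */
};

struct task_model {
	int pid;
	const char *comm;
	bool has_mm;		/* false once the task has done exit_mm() */
};

enum oom_result { OOM_KILLED, OOM_SKIP_MARGIN, OOM_SKIP_EXITING };

/* Pages that can still be charged; stands in for mem_cgroup_margin(). */
static unsigned long margin(const struct memcg_model *memcg)
{
	return memcg->limit > memcg->usage ? memcg->limit - memcg->usage : 0;
}

/*
 * Decide whether to kill the victim, and always write the reason to msg
 * so that inaction is visible in the log.
 */
static enum oom_result oom_kill_model(const struct memcg_model *memcg,
				      const struct task_model *victim,
				      unsigned long nr_pages,
				      char *msg, size_t len)
{
	/* A parallel oom kill may already have freed enough memory. */
	if (margin(memcg) >= nr_pages) {
		snprintf(msg, len, "Memcg %s is no longer out of memory",
			 memcg->name);
		return OOM_SKIP_MARGIN;
	}

	/* Models find_lock_task_mm() returning NULL for the victim. */
	if (!victim->has_mm) {
		snprintf(msg, len, "Process %d (%s) is already exiting",
			 victim->pid, victim->comm);
		return OOM_SKIP_EXITING;
	}

	snprintf(msg, len, "Killed process %d (%s)",
		 victim->pid, victim->comm);
	return OOM_KILLED;
}
```

In this model every path reports why it did or did not kill, which is
the visibility point made in the quoted text.
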

For the three proposals below, I think they would be helpful as well,
and I have no objections.

>  - Storing the state of a memcg's memory at the time reclaim has failed
>    and we must oom kill: when the memcg oom killer is disabled so that
>    userspace can handle it, if it triggers an oom kill through the kernel
>    because it prefers an oom kill on an overcommitted system, we need to
>    dump the state of the memory at oom rather than with the stack of the
>    explicit trigger
>
>  - Supplement memcg oom notification with an additional notification event
>    on kernel oom kill: allows users to register for an event that triggers
>    when the kernel oom killer kills something (and keeps a count of these
>    events available for read)
>
>  - Add a notion of an oom delay: on overcommitted systems, userspace may
>    become unreliable or unresponsive despite our best efforts, this
>    supplements the ability to disable the oom killer for a memcg hierarchy
>    with the ability to disable it for a set period of time until the oom
>    killer intervenes and kills something (last ditch effort).
>
> I'd be happy to discuss any of these topics if you are interested.

Please send these patches at your convenience.

-- 
Thanks
Yafang