Date: Fri, 17 Jul 2020 12:26:07 -0700 (PDT)
From: David Rientjes
To: Yafang Shao
cc: Michal Hocko, Tetsuo Handa, Andrew Morton, Johannes Weiner, Linux MM
Subject: Re: [PATCH v2] memcg, oom: check memcg margin for parallel oom
References: <1594735034-19190-1-git-send-email-laoar.shao@gmail.com>

On Fri, 17 Jul 2020, Yafang Shao wrote:

> > > Actually the kernel is doing it now, see below,
> > >
> > > dump_header() <<<< dump lots of information
> > > __oom_kill_process
> > >     p = find_lock_task_mm(victim);
> > >     if (!p)
> > >         return;   <<<< without killing any process.
> > >
> >
> > Ah, this is catching an instance where the chosen process has already done
> > exit_mm(), good catch -- I can find examples of this by scraping kernel
> > logs from our fleet.
> >
> > So it appears there is precedent for dumping all the oom info but not
> > actually performing any action for it and I made the earlier point that
> > diagnostic information in the kernel log here is still useful.  I think it
> > is still preferable that the kernel at least tell us why it didn't do
> > anything, but as you mention that already happens today.
> >
> > Would you like to send a patch that checks for mem_cgroup_margin() here as
> > well?
> > A second patch could make the possible inaction more visible,
> > something like "Process ${pid} (${comm}) is already exiting" for the above
> > check or "Memcg ${memcg} is no longer out of memory".
> >
> > Another thing that these messages indicate, beyond telling us why the oom
> > killer didn't actually SIGKILL anything, is that we can expect some skew
> > in the memory stats that shows an availability of memory.
> >
>
> Agreed, these messages would be helpful.
> I will send a patch for it.
>

Thanks Yafang.

We should also continue talking about challenges you encounter with the oom
killer either at the system level or for memcg limit ooms in a separate
thread.  It's clear that you are meeting several of the issues that we
have previously seen ourselves.

I could do a full audit of all our oom killer changes that may be
interesting to you, but off the top of my head:

 - A means of triggering a memcg oom through the kernel: think of sysrq+f
   but scoped to processes attached to a memcg hierarchy.
   This allows userspace to reliably oom kill processes on overcommitted
   systems (SIGKILL can be insufficient if we depend on oom reaping, for
   example, to make forward progress)

 - Storing the state of a memcg's memory at the time reclaim has failed
   and we must oom kill: when the memcg oom killer is disabled so that
   userspace can handle it, if it triggers an oom kill through the kernel
   because it prefers an oom kill on an overcommitted system, we need to
   dump the state of the memory at oom rather than with the stack of the
   explicit trigger

 - Supplement memcg oom notification with an additional notification event
   on kernel oom kill: allows users to register for an event that triggers
   when the kernel oom killer kills something (and keeps a count of these
   events available for read)

 - Add a notion of an oom delay: on overcommitted systems, userspace may
   become unreliable or unresponsive despite our best efforts; this
   supplements the ability to disable the oom killer for a memcg hierarchy
   with the ability to disable it for a set period of time until the oom
   killer intervenes and kills something (last-ditch effort).

I'd be happy to discuss any of these topics if you are interested.
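
For reference, the two "no action" cases discussed earlier in this mail (the
victim has already gone through exit_mm(), or the memcg has regained margin)
can be modelled in userspace roughly as below.  This is only a sketch under
stated assumptions, not kernel code and not the eventual patch: every
model_* name is hypothetical, the order of the two checks is my guess, and
"margin covers the failed charge" is an assumed criterion.  The two skip
messages are the ones suggested above; the kill message is a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_memcg {
	const char *name;
	unsigned long limit_pages;
	unsigned long usage_pages;
};

struct model_task {
	int pid;
	const char *comm;
	bool has_mm;	/* false once the task has gone through exit_mm() */
};

enum model_oom_result {
	MODEL_OOM_KILLED,
	MODEL_OOM_SKIP_EXITING,
	MODEL_OOM_SKIP_MARGIN,
};

/* Pages that could still be charged before the limit is hit. */
static unsigned long model_margin(const struct model_memcg *memcg)
{
	if (memcg->usage_pages >= memcg->limit_pages)
		return 0;
	return memcg->limit_pages - memcg->usage_pages;
}

/*
 * Decide what to do with the chosen victim and record why in msg.
 * nr_pages is the size of the charge that failed (an assumption of
 * this model).
 */
static enum model_oom_result
model_oom_kill(const struct model_memcg *memcg,
	       const struct model_task *victim,
	       unsigned long nr_pages, char *msg, size_t len)
{
	assert(memcg && victim && msg && len > 0);

	/* Victim already released its mm: nothing left to kill. */
	if (!victim->has_mm) {
		snprintf(msg, len, "Process %d (%s) is already exiting",
			 victim->pid, victim->comm);
		return MODEL_OOM_SKIP_EXITING;
	}

	/* A parallel oom kill (or exit) freed enough memory meanwhile. */
	if (model_margin(memcg) >= nr_pages) {
		snprintf(msg, len, "Memcg %s is no longer out of memory",
			 memcg->name);
		return MODEL_OOM_SKIP_MARGIN;
	}

	/* Placeholder message; the model does not signal anything. */
	snprintf(msg, len, "model: would SIGKILL %d (%s)",
		 victim->pid, victim->comm);
	return MODEL_OOM_KILLED;
}
```

A caller would pass a buffer for the message and log it regardless of the
result, so that the log explains the inaction as well as the kill.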