References: <1594735034-19190-1-git-send-email-laoar.shao@gmail.com>
From: Yafang Shao
Date: Thu, 16 Jul 2020 10:38:41 +0800
Subject: Re: [PATCH v2] memcg, oom: check memcg margin for parallel oom
To: David Rientjes
Cc: Michal Hocko, Tetsuo Handa, Andrew Morton, Johannes Weiner, Linux MM

On Thu, Jul 16, 2020 at 1:30 AM David Rientjes wrote:
>
> On Wed, 15 Jul 2020, Yafang Shao wrote:
>
> > > > If it is the race which causes this issue and we want to reduce the
> > > > race window, I don't know whether it is proper to check the memcg
> > > > margin in out_of_memory() or do it before calling do_send_sig_info().
> > > > Because per my understanding, dump_header() always takes much more
> > > > time than select_bad_process(), especially if there are slow consoles.
> > > > So the race might easily happen while doing dump_header() or dumping
> > > > other information, but if we check the memcg margin after dumping this
> > > > oom info, it would be strange to dump so many oom logs without killing
> > > > a process.
> > >
> > > Absolutely correct :)  In my proposed patch, we declare dump_header() as
> > > the "point of no return" since we don't want to dump oom kill information
> > > to the kernel log when nothing is actually killed.  We could abort at the
> > > very last minute, as you mention, but I think that may have an adverse
> > > impact on anything that cares about that log message.
> >
> > How about storing the memcg information in oom_control when the memcg
> > oom is triggered, and then showing that information in dump_header()?
> > IOW, the OOM info would really show the memcg status when the oom
> > occurred, rather than the memcg status when the info is printed.
>
> We actually do that too in our kernel, but for slightly different reasons :)
> It's pretty interesting how a lot of our previous concerns with memcg oom
> killing have been echoed by you in this thread.

These should be common concerns of container users :)
I'm a heavy container user myself these days.

> But yes, we store vital
> information about the memcg at the time of the first oom event when the
> oom killer is disabled (to allow userspace to determine what the best
> course of action is).

It would be great if you could upstream those features from your kernel;
I think they would help other users as well.

> But regardless of whether we present previous data to the user in the
> kernel log or not, we've determined that oom killing a process is a
> serious matter and go to any lengths possible to avoid having to do it.
> For us, that means waiting until the "point of no return" to either go
> ahead with oom killing a process or aborting and retrying the charge.
>
> I don't think moving the mem_cgroup_margin() check to out_of_memory()
> right before printing the oom info and killing the process is a very
> invasive patch.  Any strong preference against doing it that way?  I think
> moving the check as late as possible to save a process from being killed
> when racing with an exiter or killed process (including perhaps current)
> has a pretty clear motivation.

I understand what you mean by "point of no return", but that seems like
a workaround rather than a fix.

If you don't want to kill unnecessary processes, then checking the memcg
margin right before sending SIGKILL is better, because, as I said before,
the race is most likely to happen during dump_header().

If you don't want to show strange OOM information like "your process was
oom killed, yet the usage shown is 60MB in a memcg limited to 100MB", it
is better to take a snapshot of the memcg state when the OOM is triggered
and show it later; I think that could apply to the global OOM as well.

My patch, on the other hand, is meant to fix the issue caused by parallel
OOM, where the other tasks are waiting on oom_lock while one task is
handling the OOM (a rough sketch of the idea is appended at the end of
this mail).  As Michal explained before, it is more in line with the
global oom flow and much easier to reason about.

--
Thanks
Yafang
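
For reference, here is a minimal sketch of the approach the patch takes,
modeled on mem_cgroup_out_of_memory() and should_force_charge() as they
appear in mm/memcontrol.c around v5.8; it illustrates the idea rather than
reproducing the exact v2 diff:

static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
                                     int order)
{
        struct oom_control oc = {
                .zonelist = NULL,
                .nodemask = NULL,
                .memcg = memcg,
                .gfp_mask = gfp_mask,
                .order = order,
        };
        bool ret = true;

        if (mutex_lock_killable(&oom_lock))
                return true;

        /*
         * A task that slept on oom_lock may find that a parallel OOM kill
         * has already uncharged enough memory while it was waiting; in
         * that case there is no need to select and kill another victim,
         * the charge can simply be retried.
         */
        if (mem_cgroup_margin(memcg) >= (1 << order))
                goto unlock;

        ret = should_force_charge() || out_of_memory(&oc);

unlock:
        mutex_unlock(&oom_lock);
        return ret;
}

The margin re-check only runs after oom_lock has been acquired, so a task
that queued up behind a parallel OOM kill bails out and retries the charge
instead of killing a second victim.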