From: Yafang Shao
Date: Thu, 16 Jul 2020 19:53:12 +0800
Subject: Re: [PATCH v2] memcg, oom: check memcg margin for parallel oom
To: David Rientjes
Cc: Michal Hocko, Tetsuo Handa, Andrew Morton, Johannes Weiner, Linux MM
References: <1594735034-19190-1-git-send-email-laoar.shao@gmail.com>

On Thu, Jul 16, 2020 at 3:04 PM David Rientjes wrote:
>
> On Thu, 16 Jul 2020, Yafang Shao wrote:
>
> > > But yes, we store vital
> > > information about the memcg at the time of the first oom event when the
> > > oom killer is disabled (to allow userspace to determine what the best
> > > course of action is).
> > >
> >
> > It would be better if you could upstream the features in your kernel,
> > and I think it could also help the other users.
> >
>
> Everything we've discussed so far has been proposed in the past, actually.
> I think we stress the oom killer and use it at scale that others do not,
> so only a subset of users would find it interesting. You are very likely
> one of those subset of users.
>
> We should certainly talk about other issues that we have run into that
> make the upstream oom killer unusable. Are there other areas that you're
> currently focused on or having trouble with? I'd be happy to have a
> discussion on how we have resolved a lot of its issues.
>
> > I understand what you mean "point of no return", but that seems a
> > workaround rather than a fix.
> > If you don't want to kill unnecessary processes, then checking the
> > memcg margin before sending sigkill is better, because as I said
> > before the race will be most likely happening in dump_header().
> > If you don't want to show strange OOM information like "your process
> > was oom killed and it shows usage is 60MB in a memcg limited
> > to 100MB", it is better to get the snapshot of the OOM when it is
> > triggered and then show it later, and I think it could also apply to
> > the global OOM.
> >
>
> It's likely a misunderstanding: I wasn't necessarily concerned about
> showing 60MB in a memcg limited to 100MB, that part we can deal with, the
> concern was after dumping all of that great information that instead of
> getting a "Killed process..." we get a "Oh, there's memory now, just
> kidding about everything we just dumped" ;)
>

Actually the kernel is already doing that now, see below:

dump_header()                  <<<< dumps lots of information
__oom_kill_process
    p = find_lock_task_mm(victim);
    if (!p)
        return;                <<<< returns without killing any process

> We could likely enlighten userspace about that so that we don't consider
> that to be an actual oom kill. But I also very much agree that after
> dump_header() would be appropriate as well since the goal is to prevent
> unnecessary oom killing.
>
> Would you mind sending a patch to check mem_cgroup_margin() on
> is_memcg_oom() prior to sending the SIGKILL to the victim and printing the
> "Killed process..." line? We'd need a line that says "xKB of memory now
> available -- suppressing oom kill" or something along those lines so
> userspace understands what happened. But the memory info that it emits
> both for the state of the memcg and system RAM may also be helpful to
> understand why we got to the oom kill in the first place, which is also a
> good thing.
>
> I'd happy ack that patch since it would be a comprehensive solution that
> avoids oom kill of user processes at all costs, which is a goal I think we
> can all rally behind.
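
To make the requested check concrete, here is a rough userspace sketch of the decision only. It is not kernel code and not the actual patch: struct memcg_model, struct oom_model, margin_pages(), should_kill() and PAGE_KB are hypothetical stand-ins, and comparing the margin against (1 << order) pages is an assumption about how "enough memory is available now" could be decided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption for this sketch only: 4KB pages. */
#define PAGE_KB 4UL

/* Hypothetical stand-in for a memcg; not the kernel's struct mem_cgroup. */
struct memcg_model {
	unsigned long limit_pages;	/* hard limit of the memcg */
	unsigned long usage_pages;	/* pages currently charged */
};

/* Hypothetical stand-in for the OOM context. */
struct oom_model {
	const struct memcg_model *memcg;	/* NULL models a global OOM */
	unsigned int order;			/* order of the failing allocation */
};

/* Models the idea behind mem_cgroup_margin(): pages that can still be charged. */
static unsigned long margin_pages(const struct memcg_model *m)
{
	assert(m != NULL);
	return m->limit_pages > m->usage_pages ?
	       m->limit_pages - m->usage_pages : 0;
}

/*
 * Returns true if the victim should still get SIGKILL, false if the
 * kill is suppressed because the memcg has enough room again.
 */
static bool should_kill(const struct oom_model *oc)
{
	unsigned long need, margin;

	assert(oc != NULL);
	if (oc->memcg == NULL)		/* global OOM: nothing to re-check here */
		return true;

	need = 1UL << oc->order;
	margin = margin_pages(oc->memcg);
	if (margin >= need) {
		printf("%luKB of memory now available -- suppressing oom kill\n",
		       margin * PAGE_KB);
		return false;
	}
	return true;
}
```

In this model a memcg limited to 100MB (25600 pages) that is back down to 60MB (15360 pages) by the time the check runs would suppress the kill, while one still at its limit would not.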

I'd prefer to put dump_header() after do_send_sig_info(), for example:

__oom_kill_process()
    do_send_sig_info()
    dump_header()    <<<< it may be better to put it after wake_oom_reaper(),
                          but then it may lose some information to dump...
    pr_err("%s: Killed process %d (%s)....")

Because the main goal of the OOM killer is to kill a process and free pages
ASAP, to avoid a system stall or memcg stall.
We have all found that dump_header() may take a long time to finish,
especially if there is a slow console, and this delay may cause a long
system stall, so we'd better defer the dump.

But that should be another topic.

--
Thanks
Yafang