From: Shakeel Butt
Date: Wed, 21 Apr 2021 06:26:37 -0700
Subject: Re: [RFC] memory reserve for userspace oom-killer
To: Roman Gushchin
Cc: Johannes Weiner, Michal Hocko, Linux MM, Andrew Morton, Cgroups, David Rientjes, LKML, Suren Baghdasaryan, Greg Thelen, Dragos Sbirlea, Priya Duraisamy

On Tue, Apr 20, 2021 at 7:58 PM Roman Gushchin wrote:
> [...]
> >
> > Michal has suggested ALLOC_OOM which is less risky.
>
> The problem is that even if you'll serve the oom daemon task with pages
> from a reserve/custom pool, it doesn't guarantee anything, because the task
> still can wait for a long time on some mutex, taken by another process,
> throttled somewhere in the reclaim.

I assume that by mutex you mean the locks the oom-killer might have to
take to read metrics, or any other lock it might need that some other
process can take too. Have you observed this situation happening with
oomd in production?

> You're basically trying to introduce a
> "higher memory priority" and as always in such cases there will be priority
> inversion problems.
>
> So I doubt that you can simple create a common mechanism which will work
> flawlessly for all kinds of allocations, I anticipate many special cases
> requiring an individual approach.
> [...]
>
> First, I need to admit that I didn't follow the bpf development too close
> for last couple of years, so my knowledge can be a bit outdated.
>
> But in general bpf is great when there is a fixed amount of data as input
> (e.g. skb) and a fixed output (e.g. drop/pass the packet). There are different
> maps which are handy to store some persistent data between calls.
>
> However traversing complex data structures is way more complicated.
> It's especially tricky if the data structure is not of a fixed size: bpf programs
> have to be deterministic, so there are significant constraints on loops.
>
> Just for example: it's easy to call a bpf program for each task in the system,
> provide some stats/access to some fields of struct task and expect it to return
> an oom score, which then the kernel will look at to select the victim.
> Something like this can be done with cgroups too.
>
> Writing a kthread, which can sleep, poll some data all over the system and
> decide what to do (what oomd/... does), will be really challenging.
> And going back, it will not provide any guarantees unless we're not taking
> any locks, which is already quite challenging.
>

Thanks for the info. I agree that this direction needs much more thought
and time to materialize.

thanks,
Shakeel