From: Shakeel Butt
Date: Wed, 21 Apr 2021 11:28:38 -0700
Subject: Re: [RFC] memory reserve for userspace oom-killer
To: peter enderborg
Cc: Johannes Weiner, Roman Gushchin, Michal Hocko, Linux MM,
    Andrew Morton, Cgroups, David Rientjes, LKML, Suren Baghdasaryan,
    Greg Thelen, Dragos Sbirlea, Priya Duraisamy
In-Reply-To: <699e51ba-825d-b243-8205-4d8cff478a66@sony.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 21, 2021 at 10:06 AM peter enderborg wrote:
>
> On 4/20/21 3:44 AM, Shakeel Butt wrote:
[...]
>
> I think this is the wrong way to go.

Which one? Are you talking about the kernel one? We have already talked
ourselves out of that. To decide to OOM, we need to look at a very
diverse set of metrics, and it seems like that would be very hard to do
flexibly inside the kernel.

>
> I sent a patch for android lowmemorykiller some years ago.
>
> http://driverdev.linuxdriverproject.org/pipermail/driverdev-devel/2017-February/100319.html
>
> It has been improved since then, so it can handle oom callbacks, and
> it can act on vmpressure and psi and as a shrinker. The patches have
> not been ported to recent kernels, though.
>
> I don't think vmpressure and psi are that relevant now. (They are what
> userspace acts on.) But the basic idea is to have a priority queue
> within the kernel. It needs to pick up new processes and dying
> processes. And then it has an order, and that is set with oom adj
> values by the activity manager in Android. I see that this model can
> be reused for something that is between a standard oom and userspace.
> Instead of vmpressure and psi, a watchdog might be a better way. If
> userspace (in Android, the activity manager or lmkd) does not kick the
> watchdog, the watchdog bites the task according to the priority and
> kills it. This priority list does not have to be a list generated
> within the kernel. But it has the advantage that you inherit the
> parent's properties. We use an rb-tree for that.
>
> All that is missing is the watchdog.
>

Actually, no. It is missing the flexibility to monitor the metrics a
user cares about and on which they base the decision to trigger an
oom-kill. I am not sure how a watchdog would replace psi/vmpressure:
userspace continuing to pet the watchdog does not mean that the system
is not suffering. In addition, oom priorities change dynamically, and
changing them in your system seems very hard. Cgroup awareness is
missing too.

Anyway, there are already widely deployed userspace oom-killer
solutions (lmkd, oomd). I am aiming to further improve their
reliability.
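
To illustrate the point about metrics: a userspace killer can be alive and still petting a watchdog while tasks are stalled on memory, so liveness alone says little about how the system is doing. The sketch below parses the text format of the PSI memory file (/proc/pressure/memory) and applies a simple threshold. The names parse_psi and should_kill, the 10% threshold, and the sample values are assumptions made for illustration; they are not taken from lmkd, oomd, or the patches discussed in this thread.

```python
from typing import Dict

# Present on kernels built with PSI support; only read in the demo at the bottom.
PSI_MEMORY_PATH = "/proc/pressure/memory"

# Sample in the documented PSI format (the values are made up).
SAMPLE = (
    "some avg10=12.50 avg60=4.20 avg300=1.00 total=123456\n"
    "full avg10=11.00 avg60=3.10 avg300=0.70 total=65432\n"
)


def parse_psi(text: str) -> Dict[str, Dict[str, float]]:
    """Parse PSI text into {"some": {"avg10": ..., ...}, "full": {...}}."""
    result: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        # Each remaining field has the form key=value.
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


def should_kill(psi: Dict[str, Dict[str, float]],
                full_avg10_limit: float = 10.0) -> bool:
    """Hypothetical policy: act when all non-idle tasks were stalled on
    memory for at least full_avg10_limit percent of the last 10 seconds."""
    return psi.get("full", {}).get("avg10", 0.0) >= full_avg10_limit


if __name__ == "__main__":
    try:
        with open(PSI_MEMORY_PATH) as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample when PSI is unavailable
    print(should_kill(parse_psi(text)))
```

A real policy would combine several such signals (per-cgroup pressure, swap, reclaim activity) and re-evaluate priorities as they change, which is the flexibility argued for above.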