Date: Tue, 16 Feb 2021 13:38:40 +0000
From: Chris Down
To: Eiichi Tsukata
Cc: corbet@lwn.net, mike.kravetz@oracle.com, mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com, akpm@linux-foundation.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, felipe.franciosi@nutanix.com
Subject: Re: [RFC PATCH] mm, oom: introduce vm.sacrifice_hugepage_on_oom
References: <20210216030713.79101-1-eiichi.tsukata@nutanix.com>
In-Reply-To: <20210216030713.79101-1-eiichi.tsukata@nutanix.com>

Hi Eiichi,

I agree with Michal's points, and I think there are also some other design questions which don't quite make sense to me. Perhaps you can clear them up? :-)

Eiichi Tsukata writes:
>diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>index 4bdb58ab14cb..e2d57200fd00 100644
>--- a/mm/hugetlb.c
>+++ b/mm/hugetlb.c
>@@ -1726,8 +1726,8 @@ static int alloc_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>  * balanced over allowed nodes.
>  * Called with hugetlb_lock locked.
>  */
>-static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>-			       bool acct_surplus)
>+int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
>+			bool acct_surplus)
> {
> 	int nr_nodes, node;
> 	int ret = 0;

The immediate red flag to me is that we're investing further mm knowledge into hugetlb. For the vast majority of intents and purposes, hugetlb exists outside of the typical memory management lifecycle, and historic behaviour has been to treat it as a separate reserve that we don't touch. We expect that hugetlb is a reserve which is by and large explicitly managed by the system administrator, not by us, and this seems to violate that.

Shoehorning shrink-on-OOM support into it seems a little suspicious to me, because we already have a modernised system for huge pages that handles not only this, but many other memory management situations: THP. THP not only has support for this particular case, but also many other features which are necessary to coherently manage it as part of the mm lifecycle. For that reason, I'm not convinced that this composes into a sensible interface.

As some example questions which appear unresolved to me: if hugetlb pages are lost, what mechanisms will we provide to tell automation or the system administrator what to do in that scenario? How should the interface for resolving hugepage starvation due to repeated OOMs look? By what metrics will you decide if releasing the hugepage is worse for the system than selecting a victim for OOM? Why can't the system use the existing THP mechanisms to resolve this ahead of time?

Thanks,

Chris