From: Yang Shi
Date: Thu, 26 May 2022 10:39:42 -0700
Subject: Re: [RFC] mm: MADV_COLLAPSE semantics
To: Michal Hocko
Cc: "Zach O'Keefe", Alex Shi, David Hildenbrand, David Rientjes,
    Matthew Wilcox, Peter Xu, Song Liu, Linux MM, Rongwei Wang,
    Andrea Arcangeli, Axel Rasmussen, Hugh Dickins,
    "Kirill A. Shutemov", Minchan Kim, SeongJae Park, Pasha Tatashin

On Thu, May 26, 2022 at 12:12 AM Michal Hocko wrote:
>
> On Wed 25-05-22 10:32:44, Yang Shi wrote:
> > On Wed, May 25, 2022 at 1:24 AM Michal Hocko wrote:
> > >
> > > On Mon 23-05-22 17:18:32, Zach O'Keefe wrote:
> > > [...]
> > > > Idea: MADV_COLLAPSE should respect VM_NOHUGEPAGE and "never" THP mode,
> > > > but otherwise would attempt to collapse.
> > >
> > > I do agree that {process_}madvise should fail on VM_NOHUGEPAGE. The
> > > process has explicitly noted that THP shouldn't be used on such a VMA
> > > and seeing THP could be observed as not complying with that contract.
> > >
> > > I am not so sure about the global "never" policy, though. The global
> > > policy controls _kernel_ driven THPs. As the request to collapse memory
> > > comes from the userspace I do not think it should be limited by the
> > > kernel policy. I also think it can be beneficial to implement userspace
> > > based THP policies and exclude any kernel interference and that could be
> > > achieved by global kernel "never" policy and implement the whole
> > > functionality by process_madvise.
> >
> > I'd prefer to respect "never" for now since it is typically used to
> > disable THP globally even though the mappings are madvised
> > (MADV_HUGEPAGE).
> > IMHO I treat MADV_COLLAPSE as weaker MADV_HUGEPAGE
> > (take effect for non-madvised mappings but not flip VM_NOHUGEPAGE) +
> > best-effort synchronous THP collapse.
>
> MADV_HUGEPAGE is a way to tell the kernel what and how to do in future
> time by the kernel. MADV_COLLAPSE is a way to tell what the userspace wants
> at the moment of the call. So I do not really think they are directly
> related in any way except they somehow control THP.
>
> The primary question here is whether we want to support usecases which
> want to completely rule out THP handling by the kernel and only rely on
> the userspace. If yes, I do not see other way than using never global
> policy and rely on MADV_COLLAPSE from the userspace. Or am I missing
> something?

I'm not sure whether we want to reach that eventually. But isn't the
"madvise" mode good enough? IMHO "madvise" also delegates the decision
to the users: they decide whether huge pages are preferred or not. The
users could implement policies:

    No  - MADV_NOHUGEPAGE
    Yes - MADV_HUGEPAGE

But the THP allocation is deferred to real access (page fault) or
khugepaged. So I treated MADV_COLLAPSE as weaker MADV_HUGEPAGE +
synchronous THP allocation.

>
> > We could lift the restriction in the future if it turns out non
> > respecting "never" is more useful.
>
> I do not think we can change the behavior in the future without risking
> regressions.

Yeah, we may get THP out of the blue. So I thought "madvise" should be
good enough.

> --
> Michal Hocko
> SUSE Labs
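
FWIW, the userspace policy mapping I have in mind (No/Yes, plus a
synchronous collapse request) could be sketched roughly as below. This
is only an illustration under stated assumptions: enum thp_policy,
thp_policy_to_advice() and thp_apply_policy() are made-up names, not
existing APIs, and since MADV_COLLAPSE is still an RFC the fallback
value 25 is an assumption for headers that do not define it. Whether the
collapse request succeeds under the "never" mode is exactly the open
question in this thread, so the sketch does not assume either answer.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Assumption: MADV_COLLAPSE is only a proposal in this thread. The value
 * below is a placeholder for headers that do not define it yet.
 */
#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif

/* Hypothetical userspace policy; this is not a kernel interface. */
enum thp_policy {
	THP_POLICY_NO,		/* opt the range out of THP */
	THP_POLICY_YES,		/* THP later, via page fault or khugepaged */
	THP_POLICY_COLLAPSE,	/* best-effort synchronous collapse now */
};

/* Map a policy to a madvise() advice value; -1 for an unknown policy. */
int thp_policy_to_advice(enum thp_policy policy)
{
	switch (policy) {
	case THP_POLICY_NO:
		return MADV_NOHUGEPAGE;
	case THP_POLICY_YES:
		return MADV_HUGEPAGE;
	case THP_POLICY_COLLAPSE:
		return MADV_COLLAPSE;
	}
	return -1;
}

/*
 * Apply a policy to [addr, addr + len).
 * Returns 0 on success or a negative errno value on failure.
 */
int thp_apply_policy(void *addr, size_t len, enum thp_policy policy)
{
	int advice = thp_policy_to_advice(policy);

	if (advice < 0)
		return -EINVAL;
	if (madvise(addr, len, advice) != 0)
		return -errno;
	return 0;
}
```

A caller would mmap() a suitably aligned anonymous range, touch it, and
then call thp_apply_policy(addr, len, THP_POLICY_COLLAPSE). A negative
return only means the kernel refused or could not collapse at that
moment, which fits the best-effort semantics discussed above.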