From: "Zach O'Keefe"
Date: Wed, 20 Apr 2022 18:02:14 -0700
Subject: Re: [PATCH v2 00/12] mm: userspace hugepage collapse
To: Yang Shi
Cc: David Hildenbrand, Peter Xu, Alex Shi, David Rientjes, Matthew Wilcox, Michal Hocko, Pasha Tatashin, SeongJae Park, Song Liu, Vlastimil Babka, Zi Yan, Linux MM, Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen, Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins, Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe, "Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin, Minchan Kim, Patrick Xia, Pavel Begunkov, Thomas Bogendoerfer
References: <20220414180612.3844426-1-zokeefe@google.com> <8d8da2fb-aed9-96d0-47ed-94806e190250@redhat.com> <0cb08671-52b2-608a-74f1-eb6fdce5f100@redhat.com>

On Wed, Apr 20, 2022 at 5:57 PM Yang Shi wrote:
>
> On Tue, Apr 19, 2022 at 3:43 PM Zach O'Keefe wrote:
> >
> > On Tue, Apr 19, 2022 at 1:03 PM David Hildenbrand wrote:
> > >
> > > >> E.g., with a very sparse memory layout, we don't want to waste
> > > >> memory by allocating memory where we actually have no page populated yet
> > > >> -- it could be that user space won't reuse that memory in the foreseeable
> > > >> future. With too many swap entries, we don't want to trigger possibly
> > > >> unnecessary overhead by swapping in entries if user space
> > > >> won't access them in the foreseeable future. Something similar applies
> > > >> to max_ptes_shared, where one might just end up wasting a lot of memory
> > > >> eventually in some applications.
> > > >>
> > > >> So IMHO, with MADV_COLLAPSE we should ignore/disable any heuristics that
> > > >> try figuring out what user space might be doing. We know exactly what
> > > >> user space asks for -- and that can be documented properly.
> > > >>
> > >
> > > Just a thought, if we ever want to implement khugepaged in user space,
> > > it could theoretically obtain similar information using e.g., the
> > > pagemap.
> > > It wouldn't be race-free, but the question is whether it would matter.
> > >
> > > I consider the primary use case to be giving an application more precise
> > > control over actual THP placement.
> > >
> >
> > Good point about the pagemap, and I agree about the primary use case.
> > I'll make that clear in the v3 cover letter.
> >
> > > >
> > > > Sounds good to me. Would you also be in favor of decoupling allocation
> > > > semantics from khugepaged? I.e., we'd pick some default gfp flags and
> > > > not depend on /sys/kernel/mm/transparent_hugepage/khugepaged/defrag?
> > >
> > > Good question. It's not really a heuristic like the other stuff.
> > >
> > > Easy answer: we're not dealing with khugepaged, so anything in
> > > /sys/kernel/mm/transparent_hugepage/khugepaged/ shouldn't apply?
> > >
> >
> > That's what I'm thinking now too. If there are no objections, I'll
> > proceed in that direction for v3.
>
> I agree, we should not treat MADV_COLLAPSE as "userspace khugepaged",
> IMHO. It is still best effort, but since it is requested explicitly by
> the user, the kernel should trust the user's judgement and ignore the
> max_ptes_* limits; we should assume users know what they are doing
> and what it costs.
>

Thanks for reading and giving your thoughts, Yang. Glad to hear we are
aligned here!

I'll send out a v3 early next week. The only real change is the gfp
flags, but I want to avoid spamming folks so soon after v2.

Thanks,
Zach

> >
> > >
> > > Sure, we could have separate toggles for MADV_COLLAPSE.
> > >
> > > Maybe we simply want a dedicated syscall where we can specify additional
> > > options ... but maybe that simply over-complicates the problem.
> > >
> >
> > Thankfully process_madvise(2) has flags, and madvise(2) users can
> > always migrate to using process_madvise(2) on self. Piggybacking on
> > the madvise infrastructure for these "non-advice actions" (e.g.,
> > MADV_PAGEOUT) seems to be the norm.
> >
> > Thanks as always for your time and thoughts!
> >
> > Zach
> >
> > > --
> > > Thanks,
> > >
> > > David / dhildenb
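David's point, that a userspace khugepaged could gather similar information from pagemap, is illustrated below. This is a minimal sketch, not something taken from the patch series. It assumes the documented /proc/<pid>/pagemap layout: one 64-bit entry per virtual page, with bit 63 meaning present and bit 62 meaning swapped. The sketch classifies a range the way the max_ptes_none and max_ptes_swap knobs count it. The names range_stats, classify_entries, would_collapse and read_pagemap are hypothetical helpers. max_ptes_shared is not modeled. As noted in the thread, the snapshot is not race-free.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* pagemap entry bits, per the kernel's pagemap documentation. */
#define PM_PRESENT (1ULL << 63)
#define PM_SWAP    (1ULL << 62)

/* Hypothetical per-range summary, mirroring what khugepaged's knobs count. */
struct range_stats {
	size_t none;    /* neither present nor swapped (max_ptes_none) */
	size_t swap;    /* swapped out (max_ptes_swap) */
	size_t present; /* resident in memory */
};

/* Classify raw pagemap entries; kept pure so it can be tested without /proc. */
static struct range_stats classify_entries(const uint64_t *e, size_t n)
{
	struct range_stats s = {0, 0, 0};

	for (size_t i = 0; i < n; i++) {
		if (e[i] & PM_PRESENT)
			s.present++;
		else if (e[i] & PM_SWAP)
			s.swap++;
		else
			s.none++;
	}
	return s;
}

/*
 * Hypothetical khugepaged-style gate. Under the semantics agreed in this
 * thread, MADV_COLLAPSE would not apply a check like this at all.
 */
static int would_collapse(struct range_stats s, size_t max_ptes_none,
			  size_t max_ptes_swap)
{
	return s.none <= max_ptes_none && s.swap <= max_ptes_swap;
}

/*
 * Read n pagemap entries for the pages starting at addr from
 * /proc/self/pagemap. Returns 0 on success or a negative errno.
 * Pages may change state during or after the read (not race-free).
 */
static int read_pagemap(uintptr_t addr, uint64_t *out, size_t n)
{
	long psz = sysconf(_SC_PAGESIZE);
	if (psz <= 0)
		return -EINVAL;

	int fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -errno;

	/* Each virtual page has one 8-byte entry at offset (vaddr / pagesize) * 8. */
	off_t off = (off_t)(addr / (uintptr_t)psz) * (off_t)sizeof(uint64_t);
	size_t want = n * sizeof(uint64_t);
	ssize_t got = pread(fd, out, want, off);
	int err = errno;

	close(fd);
	if (got < 0)
		return -err;
	return (size_t)got == want ? 0 : -EIO;
}
```

On x86-64 with 4 KiB base pages, a PMD-sized (2 MiB) candidate corresponds to 512 consecutive entries. A caller would therefore read 512 entries starting at a 2 MiB-aligned address and pass them to classify_entries.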