From: Yang Shi
Date: Wed, 20 Apr 2022 17:56:47 -0700
Subject: Re: [PATCH v2 00/12] mm: userspace hugepage collapse
To: "Zach O'Keefe"
Cc: David Hildenbrand, Peter Xu, Alex Shi, David Rientjes, Matthew Wilcox,
 Michal Hocko, Pasha Tatashin, SeongJae Park, Song Liu, Vlastimil Babka,
 Zi Yan, Linux MM, Andrea Arcangeli, Andrew Morton, Arnd Bergmann,
 Axel Rasmussen, Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
 Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe, "Kirill A. Shutemov",
 Matt Turner, Max Filippov, Miaohe Lin, Minchan Kim, Patrick Xia,
 Pavel Begunkov, Thomas Bogendoerfer
References: <20220414180612.3844426-1-zokeefe@google.com>
 <8d8da2fb-aed9-96d0-47ed-94806e190250@redhat.com>
 <0cb08671-52b2-608a-74f1-eb6fdce5f100@redhat.com>

On Tue, Apr 19, 2022 at 3:43 PM Zach O'Keefe wrote:
>
> On Tue, Apr 19, 2022 at 1:03 PM David Hildenbrand wrote:
> >
> > >> E.g., have with a very sparse memory layout, we don't want to waste
> > >> memory by allocating memory where we actually have no page populated yet
> > >> -- could be user space won't reuse that memory in the foreseeable
> > >> future. With too many swap entries, we don't want to trigger an
> > >> eventually unnecessary overhead of swapping in entries if user space
> > >> won't access them in the foreseeable future. Something similar applies
> > >> to max_ptes_shared, where one might just end up wasting a lot of memory
> > >> eventually in some applications.
> > >>
> > >> So IMHO, with MADV_COLLAPSE we should ignore/disable any heuristics that
> > >> try figuring out what user space might be doing. We know exactly what
> > >> user space asks for -- and that can be documented properly.
> > >>
> >
> > Just a thought, if we ever want to implement khugepaged in user space,
> > it could theoretically obtain similar information using e.g., the
> > pagemap. It wouldn't be race-free, but the question is if it would matter.
> >
> > I consider the primary use case of giving an application more precise
> > control over actual THP placement.
> >
>
> Good point about the pagemap and agree about the primary use case -
> I'll make that clear in v3 cover letter.
>
> > >
> > > Sounds good to me. Would you also be in favor of decoupling allocation
> > > semantics from khugepaged? I.e. we'll pick some default gfp flags and
> > > not depend on /sys/kernel/mm/transparent_hugepage/khugepaged/defrag?
> >
> > Good question. It's not really a heuristic like that other stuff.
> >
> > Easy answer: we're not dealing with khugepaged, so anything in
> > /sys/kernel/mm/transparent_hugepage/khugepaged/ shouldn't apply?
> >
>
> That's what I'm thinking now too. If there's no objections, I'll
> proceed in that direction for v3.

I agree, we should not treat MADV_COLLAPSE as "userspace khugepaged"
IMHO. It is still best effort, but it is requested by the users
explicitly, so the kernel should trust the users' judgement and ignore
the max_ptes_* limits, since we should assume the users know what they
are doing and what it costs.

>
> > > Sure, we could have a separate toggles for MADV_COLLAPSE.
> >
> > Maybe we simply want a dedicated syscall where we can specify additional
> > options ... but maybe that simply over-complicates the problem.
> >
>
> Thankfully process_madvise(2) has flags, and madvise(2) users can
> always migrate to using process_madvise(2) on self. Piggy-backing off
> madvise infrastructure for these "non-advice actions" (e.g.
> MADV_PAGEOUT) seems to be the norm.
>
> Thanks as always for your time and thoughts!
>
> Zach
>
> > --
> > Thanks,
> >
> > David / dhildenb
> >
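
For illustration only, and not part of the patch series: the pagemap idea
quoted above could look roughly like the sketch below. It classifies the
64-bit /proc/<pid>/pagemap entries of one huge-page-sized range into
none/swap/present counts, which is about the information the max_ptes_none
and max_ptes_swap limits are based on. The 4 KiB base page, the 2 MiB huge
page, and the helper names count_ptes() and read_pagemap() are assumptions
made for the example; only bits 63 (present) and 62 (swapped) of the pagemap
entry format are used.

```python
import struct

PAGE_SIZE = 4096                          # assumed base page size
HPAGE_SIZE = 2 * 1024 * 1024              # assumed PMD-sized THP (e.g. x86-64)
PTES_PER_HPAGE = HPAGE_SIZE // PAGE_SIZE  # 512 under the assumptions above

PM_PRESENT = 1 << 63  # pagemap bit 63: page is present in RAM
PM_SWAP = 1 << 62     # pagemap bit 62: page is swapped out


def count_ptes(entries):
    """Return (none, swap, present) counts for raw 64-bit pagemap entries."""
    none = swap = present = 0
    for entry in entries:
        if entry & PM_PRESENT:
            present += 1
        elif entry & PM_SWAP:
            swap += 1
        else:
            none += 1
    return none, swap, present


def read_pagemap(pid, vaddr, npages=PTES_PER_HPAGE):
    """Read up to npages pagemap entries starting at virtual address vaddr.

    pid may be an integer or the string "self". Each entry is 8 bytes in
    native byte order, indexed by virtual page number.
    """
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // PAGE_SIZE) * 8)
        data = f.read(npages * 8)
    n = len(data) // 8
    return struct.unpack(f"={n}Q", data[: n * 8])
```

A userspace policy could call count_ptes(read_pagemap("self", addr)) on a
huge-page-aligned addr and compare the counts against its own thresholds
before requesting a collapse. As noted in the quoted text, the result is
not race-free: the mapping can change between the read and the request.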