From: Yang Shi
Date: Tue, 31 May 2022 16:52:23 -0700
Subject: Re: [RFC] mm: MADV_COLLAPSE semantics
To: "Zach O'Keefe"
Cc: Matthew Wilcox, Michal Hocko, Peter Xu, Alex Shi, David Hildenbrand, David Rientjes, Song Liu, Linux MM, Rongwei Wang, Andrea Arcangeli, Axel Rasmussen, Hugh Dickins, "Kirill A. Shutemov", Minchan Kim, SeongJae Park, Pasha Tatashin

On Tue, May 31, 2022 at 2:37 PM Zach O'Keefe wrote:
>
> Thanks everyone for your time and for the great discussion!
>
> For the purposes of arriving at a decision, I've tried to outline the
> major points + my 2c below as:

Thanks for summing up the discussion.

>
> 1. Breaking userland. AFAIK, if permitting MADV_COLLAPSE in "never"
> will break real, existing use cases, then linux's policy would
> necessitate that we don't do that. Is there a way we can reasonably
> determine this? An affirmative answer here makes this decision easy.

I don't have an affirmative answer. It depends on the users'
expectations. Some users may expect there won't be any THP allocation
in "never" mode even though it is requested by the users. AFAICT some
sys admins may expect so since they may manage machines which may run
untrusted software. So allowing MADV_COLLAPSE in "never" doesn't break
any workload, but may break some expectations.

>
> 2. Current uses of "never" a.k.a dev/debug. If (1) is false, then
> we've asserted that *currently* "never" is only used for
> development/debugging. During development of MADV_COLLAPSE, I found it
> necessary to disable khugepaged via a new debugfs tunable to prevent
> khugepaged collapsing memory before MADV_COLLAPSE could act.
> If MADV_COLLAPSE wasn't tied to "never", it's one less debugfs tunable
> we'd need. OTOH, I can still see the benefit, during debugging, of a
> master "no THPs" switch. If we think we'll ever want that master
> switch, then let's just keep "never" as said switch.
>
> 3. Future uses of "never". Do we want to permit a policy where
> userspace *entirely* takes over THP allocation, and khugepaged and
> at-fault are disabled in the kernel? If yes, then we might as well
> permit "never" to allow that now. Personally, though, I can't imagine
> wanting to disable faulting-in THPs in places where we know data will
> be hot; but respecting "never" does back us into a corner if we ever
> go that route.
>
> 4. Flexibility / separation of concerns: All else being equal,
> decoupling user MADV_COLLAPSE from kernel THP sysfs controls is more
> flexible and consistent with the rest of MADV_COLLAPSE semantics.
>
> If that's roughly accurate, and in lieu of any other critical points,
> if we can determine (1), then I'd prefer "never" to be tied to kernel
> decisions, not userspace. Any strong objections?

I do not have strong objections, and I think Michal's point and yours
do make some sense for some use cases. A simple way is to allow
MADV_COLLAPSE in "never" mode, then see whether there will be any
complaints.

>
> Thanks again for your time,
> Zach
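
For anyone who wants to try the "allow it and see" approach on a test
machine, the two knobs discussed above can be sketched in C as follows.
This is an illustration, not code from the thread. The helper names
(thp_active_mode, thp_read_mode, try_collapse) are hypothetical, the sysfs
path is the usual location of the system-wide THP setting, and the fallback
value for MADV_COLLAPSE is an assumption taken from the generic uapi headers
of kernels that ship the advice; on other kernels the call is expected to
fail.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Assumption: generic uapi value on kernels that provide MADV_COLLAPSE;
 * older libc headers may not define it. */
#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif

/* Assumed location of the system-wide THP "enabled" setting. */
#define THP_ENABLED_PATH "/sys/kernel/mm/transparent_hugepage/enabled"

/*
 * Copy the bracketed (active) token out of a sysfs line such as
 * "always madvise [never]". Returns 0 on success, -1 if there is no
 * complete bracketed token or it does not fit in out.
 */
int thp_active_mode(const char *line, char *out, size_t outlen)
{
    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    size_t len;

    if (!open || !close)
        return -1;
    len = (size_t)(close - open - 1);
    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/* Read the live sysfs setting; returns -1 if it cannot be read or parsed. */
int thp_read_mode(char *out, size_t outlen)
{
    char line[128];
    FILE *f = fopen(THP_ENABLED_PATH, "r");
    int ret = -1;

    if (!f)
        return -1;
    if (fgets(line, sizeof(line), f))
        ret = thp_active_mode(line, out, outlen);
    fclose(f);
    return ret;
}

/*
 * Ask the kernel to collapse [addr, addr + len) into huge pages.
 * Returns 0 on success or a negative errno. Whether this succeeds while
 * the sysfs mode is "never" is the policy question in this thread.
 */
int try_collapse(void *addr, size_t len)
{
    if (madvise(addr, len, MADV_COLLAPSE) == 0)
        return 0;
    return -errno;
}
```

A caller would map and touch a huge-page-aligned anonymous region, call
try_collapse() on it, and log the result next to the mode reported by
thp_read_mode(). Comparing the two under "never" shows which of the
semantics discussed above a given kernel implements.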