From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 5 Aug 2021 20:33:21 -0700 (PDT)
From: Hugh Dickins
To: "Kirill A. Shutemov"
Cc: Hugh Dickins, Andrew Morton, Shakeel Butt, "Kirill A. Shutemov",
    Yang Shi, Miaohe Lin, Mike Kravetz, Michal Hocko, Rik van Riel,
    Christoph Hellwig, Matthew Wilcox, "Eric W. Biederman",
    Alexey Gladkov, Chris Wilson, Matthew Auld,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-api@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 07/16] memfd: memfd_create(name, MFD_HUGEPAGE) for shmem huge pages
In-Reply-To: <20210804140341.m3ptxesrxwivqjmk@box.shutemov.name>
Message-ID: <7852f33a-bfe8-cbf6-65c8-30f7c06d5e@google.com>
References: <2862852d-badd-7486-3a8e-c5ea9666d6fb@google.com> <20210804140341.m3ptxesrxwivqjmk@box.shutemov.name>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 4 Aug 2021, Kirill A. Shutemov wrote:
> On Fri, Jul 30, 2021 at 12:45:49AM -0700, Hugh Dickins wrote:
> > Commit 749df87bd7be ("mm/shmem: add hugetlbfs support to memfd_create()")
> > in 4.14 added the MFD_HUGETLB flag to memfd_create(), to use hugetlbfs
> > pages instead of tmpfs pages: now add the MFD_HUGEPAGE flag, to use tmpfs
> > Transparent Huge Pages when they can be allocated (flag named to follow
> > the precedent of madvise's MADV_HUGEPAGE for THPs).
>
> I don't like the interface. THP supposed to be transparent, not yet another
> hugetlbs.

THP is transparent in the sense that it builds hugepages from the normal
page pool, when it can (or not when it cannot), rather than promising
hugepages from a separate pre-reserved hugetlbfs pool.  Not transparent
in the sense that it cannot be limited or guided.

> > /sys/kernel/mm/transparent_hugepage/shmem_enabled "always" or "force"
> > already made this possible: but that is much too blunt an instrument,
> > affecting all the very different kinds of files on the internal shmem
> > mount, and was intended just for ease of testing hugepage loads.
>
> I wounder if your tried "always" in production? What breaks? Maybe we can
> make it work with a heuristic? This would speed up adoption.

We have not tried /sys/kernel/mm/transparent_hugepage/shmem_enabled
"always" in production.
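
(For anyone following along, the two tuning surfaces being contrasted
here - the system-wide shmem_enabled knob versus selective per-mount
deployment - look like this; paths and option names are from current
mainline, shown for illustration only, and the commands need root:)

```shell
# The blunt instrument: one global knob covering every shmem file,
# including all memfds on the internal shmem mount.
cat /sys/kernel/mm/transparent_hugepage/shmem_enabled
# (one of: always within_size advise never deny force)

# The selective alternative: THP only for files on this one tmpfs.
mount -t tmpfs -o huge=always tmpfs /mnt/hugetmp
```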
Is that an experiment I want to recommend for production?  No, I don't
think so!  Why should we?  I am not looking to "speed up adoption" of
huge tmpfs everywhere: let those who find it useful use it, there is
no need for it to be used everywhere.

We have had this disagreement before: you were aiming for tmpfs on /tmp
huge=always, I didn't see the need for that; but we have always agreed
that it should not be broken there, and the better it works the better -
you did the unused_huge_shrink stuff in particular to meet such cases.

> If a tunable needed, I would rather go with fadvise(). It would operate
> on a couple of bits per struct file and they get translated into
> VM_HUGEPAGE and VM_NOHUGEPAGE on mmap().
>
> Later if needed fadvise() implementation may be extended to track
> requested ranges. But initially it can be simple.

Let me shift that to the 08/16 (fcntl) response, and here answer:

> Hm, But why is the MFD_* needed if the fcntl() can do the same.

You're right, MFD_HUGEPAGE (and MFD_MEM_LOCK) are not strictly needed
if there's an fcntl() or fadvise() which can do that too.

But MFD_HUGEPAGE is the option which was first asked for, and is the
most popular usage internally - I did the fcntl at the same time, and
it has been found useful, but MFD_HUGEPAGE was the priority (largely
because fiddling with shmem_enabled interferes with everyone's
different usages, whereas huge=always on a mount can be deployed
selectively).

And it makes good sense for memfd_create() to offer MFD_HUGEPAGE, as it
is already offering MFD_HUGETLB: when we document MFD_HUGEPAGE next to
MFD_HUGETLB in the memfd_create(2) man page, that will help developers
to make a good choice.

(You said MFD_*, so I take it that you're thinking of MFD_MEM_LOCK too:
MFD_MEM_LOCK is something I added when building this series, when I
realized that it became possible once size change permitted.
Nobody here is using it yet, I don't mind if it's dropped; but it's
natural to propose it as part of the series, and it can be justified
as offering the memlock option which MFD_HUGETLB already bundles in.)

Hugh