From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 2 Feb 2021 21:10:40 +0200
From: Mike Rapoport <rppt@kernel.org>
To: Michal Hocko
Cc: James Bottomley, David Hildenbrand, Andrew Morton, Alexander Viro,
 Andy Lutomirski, Arnd Bergmann, Borislav Petkov, Catalin Marinas,
 Christopher Lameter, Dave Hansen, Elena Reshetova, "H. Peter Anvin",
 Ingo Molnar, "Kirill A. Shutemov", Matthew Wilcox, Mark Rutland,
 Mike Rapoport, Michael Kerrisk, Palmer Dabbelt, Paul Walmsley,
 Peter Zijlstra, Rick Edgecombe, Roman Gushchin, Shakeel Butt,
 Shuah Khan, Thomas Gleixner, Tycho Andersen, Will Deacon,
 linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org,
 linux-riscv@lists.infradead.org, x86@kernel.org, Hagen Paul Pfeifer,
 Palmer Dabbelt
Subject: Re: [PATCH v16 07/11] secretmem: use PMD-size pages to amortize
 direct map fragmentation
Message-ID: <20210202191040.GP242749@kernel.org>
References: <303f348d-e494-e386-d1f5-14505b5da254@redhat.com>
 <20210126120823.GM827@dhcp22.suse.cz>
 <20210128092259.GB242749@kernel.org>
 <73738cda43236b5ac2714e228af362b67a712f5d.camel@linux.ibm.com>
 <6de6b9f9c2d28eecc494e7db6ffbedc262317e11.camel@linux.ibm.com>
 <20210202124857.GN242749@kernel.org>
List-Id: "Linux-nvdimm developer list."
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

On Tue, Feb 02, 2021 at 02:27:14PM +0100, Michal Hocko wrote:
> On Tue 02-02-21 14:48:57, Mike Rapoport wrote:
> > On Tue, Feb 02, 2021 at 10:35:05AM +0100, Michal Hocko wrote:
> > > On Mon 01-02-21 08:56:19, James Bottomley wrote:
> > >
> > > I have also proposed potential ways out of this. Either the pool is not
> > > fixed sized and you make it a regular unevictable memory (if direct map
> > > fragmentation is not considered a major problem)
> >
> > I think that the direct map fragmentation is not a major problem, and the
> > data we have confirms it, so I'd be more than happy to entirely drop the
> > pool, allocate memory page by page and remove each page from the direct
> > map.
> >
> > Still, we cannot prove negative and it could happen that there is a
> > workload that would suffer a lot from the direct map fragmentation, so
> > having a pool of large pages upfront is better than trying to fix it
> > afterwards. As we get more confidence that the direct map fragmentation is
> > not an issue as it is common to believe we may remove the pool altogether.
>
> I would drop the pool altogether and instantiate pages to the
> unevictable LRU list and internally treat it as ramdisk/mlock so you
> will get an accounting correctly. The feature should be still opt-in
> (e.g. a kernel command line parameter) for now. The recent report by
> Intel (http://lkml.kernel.org/r/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux.intel.com)
> there is no clear win to have huge mappings in _general_ but there are
> still workloads which benefit.
>
> > I think that using PMD_ORDER allocations for the pool with a fallback to
> > order 0 will do the job, but unfortunately I doubt we'll reach a consensus
> > about this because dogmatic beliefs are hard to shake...
>
> If this is opt-in then those beliefs can be relaxed somehow. Long term
> it makes a lot of sense to optimize for a better direct map management
> but I do not think this is a hard requirement for an initial
> implementation if it is not imposed to everybody by default.
>
> > A more restrictive possibility is to still use plain PMD_ORDER allocations
> > to fill the pool, without relying on CMA. In this case there will be no
> > global secretmem specific pool to exhaust, but then it's possible to drain
> > high order free blocks in a system, so CMA has an advantage of limiting
> > secretmem pools to certain amount of memory with somewhat higher
> > probability for high order allocation to succeed.
> >
> > > or you need a careful access control
> >
> > Do you mind elaborating what do you mean by "careful access control"?
>
> As already mentioned, a mechanism to control who can use this feature -
> e.g. make it a special device which you can access control by
> permissions or higher level security policies. But that is really needed
> only if the pool is fixed sized.

Let me reiterate to make sure I don't misread your suggestion.

If we make secretmem an opt-in feature with, e.g., a kernel parameter, the
pooling of large pages is unnecessary.
In this case there is no limited resource we need to protect because
secretmem will allocate page by page.

Since there is no limited resource, we don't need special permissions to
access secretmem so we can move forward with a system call that creates a
mmapable file descriptor and save the hassle of a chardev.

I cannot say I don't like this as it cuts roughly half of mm/secretmem.c :)

But I must say I am still a bit concerned that we have no provisions here
for dealing with the direct map fragmentation even with the set goal to
improve the direct map management in the long run...

--
Sincerely yours,
Mike.

_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org