Date: Wed, 30 Sep 2020 13:20:31 +0300
From: Mike Rapoport
To: Peter Zijlstra
Subject: Re: [PATCH v6 5/6] mm: secretmem: use PMD-size pages to amortize direct map fragmentation
Message-ID: <20200930102031.GJ2142832@kernel.org>
References: <20200924132904.1391-1-rppt@kernel.org> <20200924132904.1391-6-rppt@kernel.org> <20200925074125.GQ2628@hirez.programming.kicks-ass.net> <20200929130529.GE2142832@kernel.org> <20200929141216.GO2628@hirez.programming.kicks-ass.net>
In-Reply-To: <20200929141216.GO2628@hirez.programming.kicks-ass.net>
List-Id: linux-riscv@lists.infradead.org
Cc: Mark Rutland, David Hildenbrand, Catalin Marinas, Dave Hansen, linux-mm@kvack.org, Will Deacon, linux-kselftest@vger.kernel.org, "H. Peter Anvin", Christopher Lameter, Idan Yaniv, Thomas Gleixner, Elena Reshetova, linux-arch@vger.kernel.org, Tycho Andersen, linux-nvdimm@lists.01.org, Shuah Khan, x86@kernel.org, Matthew Wilcox, Mike Rapoport, Ingo Molnar, Michael Kerrisk, Arnd Bergmann, James Bottomley, Borislav Petkov, Alexander Viro, Andy Lutomirski, Paul Walmsley, "Kirill A. Shutemov", Dan Williams, linux-arm-kernel@lists.infradead.org, linux-api@vger.kernel.org, linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org, Palmer Dabbelt, linux-fsdevel@vger.kernel.org, Andrew Morton

On Tue, Sep 29, 2020 at 04:12:16PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 29, 2020 at 04:05:29PM +0300, Mike Rapoport wrote:
> > On Fri, Sep 25, 2020 at 09:41:25AM +0200, Peter Zijlstra wrote:
> > > On Thu, Sep 24, 2020 at 04:29:03PM +0300, Mike Rapoport wrote:
> > > > From: Mike Rapoport
> > > >
> > > > Removing a PAGE_SIZE page from the direct map every time such page is
> > > > allocated for a secret memory mapping will cause severe fragmentation of
> > > > the direct map. This fragmentation can be reduced by using PMD-size pages
> > > > as a pool for small pages for secret memory mappings.
> > > >
> > > > Add a gen_pool per secretmem inode and lazily populate this pool with
> > > > PMD-size pages.
> > >
> > > What's the actual efficacy of this? Since the pmd is per inode, all I
> > > need is a lot of inodes and we're in business to destroy the directmap,
> > > no?
> > >
> > > Afaict there's no privs needed to use this, all a process needs is to
> > > stay below the mlock limit, so a 'fork-bomb' that maps a single secret
> > > page will utterly destroy the direct map.
> >
> > This indeed will cause 1G pages in the direct map to be split into 2M
> > chunks, but I disagree with the 'destroy' term here. Citing the cover
> > letter of an earlier version of this series:
>
> It will drop them down to 4k pages. Given enough inodes, and allocating
> only a single sekrit page per pmd, we'll shatter the directmap into 4k.
>
> > I've tried to find some numbers that show the benefit of using larger
> > pages in the direct map, but I couldn't find anything so I've run a
> > couple of benchmarks from phoronix-test-suite on my laptop (i7-8650U
> > with 32G RAM).
>
> Existing benchmarks suck at this, but FB had a load that had a

I tried to dig up the regression report in the mailing list, and the
best I could find is

https://lore.kernel.org/lkml/20190823052335.572133-1-songliubraving@fb.com/

which does not mention the actual performance regression; it only
complains about the kernel text mapping being split into 4K pages.
Any chance you have the regression report handy?

> deterministic enough performance regression to bisect to a directmap
> issue, fixed by:
>
> 7af0145067bc ("x86/mm/cpa: Prevent large page split when ftrace flips RW on kernel text")

This commit talks about a large page split for the kernel text and
mentions iTLB performance. Could it be that the behaviour is different
for data?

> > I've tested three variants: the default with 28G of the physical
> > memory covered with 1G pages, then I disabled 1G pages using
> > "nogbpages" in the kernel command line, and finally I forced the
> > entire direct map to use 4K pages using a simple patch to
> > arch/x86/mm/init.c. I've made runs of the benchmarks with SSD and
> > tmpfs.
> >
> > Surprisingly, the results do not show a huge advantage for large
> > pages. For instance, here are the results for a kernel build with
> > 'make -j8', in seconds:
>
> Your benchmark should stress the TLB of your uarch, such that additional
> pressure added by the shattered directmap shows up.

I understand that the benchmark should stress the TLB, but it's not as
if we can add something like random access to a large working set as a
kernel module and insmod it. The userspace should do something that
will stress the TLB so that the entries corresponding to the direct map
are evicted frequently. And, frankly,

> And no, I don't have one either.
>
> >                       | 1G     | 2M     | 4K
> > ----------------------+--------+--------+---------
> > ssd, mitigations=on   | 308.75 | 317.37 | 314.9
> > ssd, mitigations=off  | 305.25 | 295.32 | 304.92
> > ram, mitigations=on   | 301.58 | 322.49 | 306.54
> > ram, mitigations=off  | 299.32 | 288.44 | 310.65
>
> These results lack error data, but assuming the results are significant,
> then this very much makes a case for 1G mappings. 5s on a kernel build
> is pretty good.

The standard error for those is between 2.5 and 4.5, over 3 runs for
each variant. For the kernel build, 1G mappings do perform better, but
the 5s difference is only 1.6% of 300s, and the direct map fragmentation
was taken to the extreme here.

I'm not saying that direct map fragmentation comes with no cost, but the
cost is not so big that features causing the fragmentation should be
dismissed out of hand. There were also benchmarks that actually
performed better with 2M pages in the direct map, so I'm still not
convinced that 1G pages in the direct map are the clear-cut winner.

--
Sincerely yours,
Mike.