From: Arnd Bergmann
Date: Mon, 9 Mar 2020 20:46:18 +0100
Subject: Re: [PATCH] vfs: keep inodes with page cache off the inode shrinker LRU
To: Russell King - ARM Linux admin
Cc: Catalin Marinas, Nishanth Menon, Santosh Shilimkar, Tero Kristo,
    Linux ARM, Michal Hocko, Rik van Riel, Santosh Shilimkar, Dave Chinner,
    Linux Kernel Mailing List, Linux-MM, Yafang Shao, Al Viro,
    Johannes Weiner, linux-fsdevel, kernel-team@fb.com,
    Kishon Vijay Abraham I, Linus Torvalds, Andrew Morton, Roman Gushchin
In-Reply-To: <20200309160919.GM25745@shell.armlinux.org.uk>

On Mon, Mar 9, 2020 at 5:09 PM Russell King - ARM Linux admin wrote:
> On Mon, Mar 09, 2020 at 03:59:45PM +0000, Catalin Marinas wrote:
> > On Sun, Mar 08, 2020 at 11:58:52AM +0100, Arnd Bergmann wrote:
> > > - revisit CONFIG_VMSPLIT_4G_4G for arm32 (and maybe mips32)
> > >   to see if it can be done, and what the overhead is. This is probably
> > >   more work than the others combined, but also the most promising
> > >   as it allows the most user address space and physical ram to be used.
> >
> > A rough outline of such support (and likely to miss some corner cases):
> >
> > 1. Kernel runs with its own ASID and non-global page tables.
> >
> > 2. Trampoline code on exception entry/exit to handle the TTBR0 switching
> >    between user and kernel.
> >
> > 3. uaccess routines need to be reworked to pin the user pages in memory
> >    (get_user_pages()) and access them via the kernel address space.
> >
> > Point 3 is probably the ugliest and it would introduce a noticeable
> > slowdown in certain syscalls.

There are probably a number of ways to do the basic design. The idea
I had (again, probably missing more corner cases than either of you
two that actually understand the details of the mmu):

- Assuming we have LPAE, run the kernel vmlinux and modules inside
  the vmalloc space, in the top 256MB or 512MB on TTBR1

- Map all the physical RAM (up to 3.75GB) into a reserved ASID
  with TTBR0

- Flip TTBR0 on kernel entry/exit, and again during user access.

This is probably more work to implement than your idea, but
I would hope this has a lower overhead on most microarchitectures
as it doesn't require pinning the pages. Depending on the
microarchitecture, I'd hope the overhead would be comparable
to that of ARM64_SW_TTBR0_PAN.

> We also need to consider that it has implications for the single-kernel
> support; a kernel doing this kind of switching would likely be horrid
> for a kernel supporting v6+ with VIPT aliasing caches. Would we be
> adding a new red line between kernels supporting VIPT-aliasing caches
> (present in earlier v6 implementations) and kernels using this system?

I would initially do it for LPAE only, given that this is already an
incompatible config option. I don't think there are any v6 machines with
more than 1GB of RAM (the maximum for AST2500), and the only distro
that ships a v6+ multiplatform kernel is Raspbian, which in turn needs
a separate LPAE kernel for the large-memory machines anyway.

Only doing it for LPAE would still cover the vast majority of systems that
actually shipped with more than 2GB. There are a couple of exceptions,
i.e. early Cubox i4x4, the Calxeda Highbank developer system and the
Novena Laptop, which I would guess have a limited life expectancy
(before users stop updating kernels) no longer than the 8GB
Keystone-2.

Based on that, I would hope that the ARMv7 distros can keep shipping
the two kernel images they already ship:

- The non-LPAE kernel modified to VMSPLIT_2G_OPT, not using highmem
  on anything up to 2GB, but still supporting the handful of remaining
  Cortex-A9s with 4GB using highmem until they are completely obsolete.

- The LPAE kernel modified to use a newly added VMSPLIT_4G_4G,
  with details to be worked out.

Most new systems tend to be based on Cortex-A7 with no more than 2GB,
so those could run either configuration well. If we find the 2GB of user
address space too limiting for the non-LPAE config, or I missed some
important pre-LPAE systems with 4GB that need to be supported for longer
than other highmem systems, that can probably be added later.

      Arnd