From: Catalin Marinas <catalin.marinas@arm.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
    "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>,
    "Kirill A. Shutemov" <kirill@shutemov.name>,
    Mark Langsdorf <mlangsdo@redhat.com>,
    Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Linux 3.19-rc3
Date: Mon, 12 Jan 2015 12:18:15 +0000
Message-ID: <20150112121815.GB19807@e104818-lin.cambridge.arm.com>
In-Reply-To: <10028397.kdbz8TfPck@wuerfel>

On Sat, Jan 10, 2015 at 09:36:13PM +0000, Arnd Bergmann wrote:
> On Saturday 10 January 2015 13:00:27 Linus Torvalds wrote:
> > > IIRC, AIX works great with 64k pages, but only because of two
> > > reasons that don't apply on Linux:
> >
> > .. there's a few other ones:
> >
> >  (c) nobody really runs AIX on desktops. It's very much a DB load
> >      environment, with historically some HPC.
> >
> >  (d) the powerpc TLB fill/buildup/teardown costs are horrible, so on
> >      AIX the cost of lots of small pages is much higher too.
>
> I think (d) applies to ARM as well, since it has no hardware
> dirty/referenced bit tracking and requires the OS to mark the
> pages as invalid/readonly until the first access. ARMv8.1
> has a fix for that, but it's optional and we haven't seen any
> implementations yet.

Do you happen to have any data on how significantly non-hardware
dirty/access bits impact the performance? I think it may affect the
user process start-up time a bit but at run-time it shouldn't be that
bad.

If it is that significant, we could optimise it further in the arch
code. For example, make a fast exception path where we need to mark the
pte dirty. This would be handled by arch code without even calling
handle_pte_fault().

> > so I feel pretty confident in saying it won't happen. It's just too
> > much of a bother, for little to no actual upside. It's likely a much
> > better approach to try to instead use THP for anonymous mappings.
>
> arm64 already supports 2MB transparent hugepages. I guess it
> wouldn't be too hard to change it so that an existing hugepage
> on an anonymous mapping that gets split up into 4KB pages gets
> split along 64KB boundaries with the contiguous mapping bit set.
>
> Having full support for multiple hugepage sizes (64KB, 2MB and 32MB
> in case of ARM64 with 4KB PAGE_SIZE) would be even better and
> probably negate any benefits of 64KB PAGE_SIZE, but requires more
> changes to common mm code.

As I replied to your other email, I don't think that's simple for the
transparent huge pages case.

The main advantage I see with 64KB pages is not the reduced TLB
pressure but the number of levels of page tables. Take the AMD Seattle
board for example: with 4KB pages you need 4 levels, but 64KB pages need
only 2 levels (42-bit VA). Larger TLBs and improved walk caches (caching
the VA -> pmd entry translation rather than all the way to pte/PA) make
things better, but you still have the warm-up time for any fork/new
process as they don't share the same TLB entries.

But as Linus said already, the trade-off with the memory wastage
is highly dependent on the targeted load.

--
Catalin
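
The page-table-level figures quoted in the message (4 levels with 4KB pages, 2 levels with 64KB pages and a 42-bit VA) can be checked with a little arithmetic. The sketch below is only an illustration and not kernel code: the function name `page_table_levels` is made up here, it assumes 8-byte descriptors so that one table fills exactly one page, and it assumes a 48-bit VA for the 4KB case, which the message does not state explicitly.

```python
import math

# Assumption: each page-table descriptor is 8 bytes (64-bit entries),
# and every table occupies exactly one page.
DESCRIPTOR_BYTES = 8


def page_table_levels(va_bits: int, page_shift: int) -> int:
    """Number of translation levels needed to resolve va_bits of address.

    page_shift is log2 of the page size (12 for 4KB, 16 for 64KB).
    """
    entries_per_table = (1 << page_shift) // DESCRIPTOR_BYTES
    # entries_per_table is a power of two, so this is an exact log2.
    bits_per_level = entries_per_table.bit_length() - 1
    # The low page_shift bits are the offset within the page; the
    # remaining bits are split across the table levels.
    return math.ceil((va_bits - page_shift) / bits_per_level)


if __name__ == "__main__":
    # (label, VA bits, page shift); the 48-bit VA is an assumed value.
    for label, va_bits, page_shift in [
        ("4KB pages, 48-bit VA", 48, 12),
        ("64KB pages, 42-bit VA", 42, 16),
    ]:
        print(f"{label}: {page_table_levels(va_bits, page_shift)} levels")
```

With 4KB pages each level resolves 9 bits (512 entries per table), so 36 bits above the page offset take 4 levels; with 64KB pages each level resolves 13 bits (8192 entries), so the 26 bits above the offset take 2 levels.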