From mboxrd@z Thu Jan 1 00:00:00 1970
From: agraf@suse.de (Alexander Graf)
Date: Fri, 2 Sep 2016 18:58:00 +0200
Subject: [RFC/RFT PATCH] arm64: mm: allow userland to run with one fewer translation level
In-Reply-To: <1471781914-16681-1-git-send-email-ard.biesheuvel@linaro.org>
References: <1471781914-16681-1-git-send-email-ard.biesheuvel@linaro.org>
Message-ID:
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On 21.08.16 14:18, Ard Biesheuvel wrote:
> The choice of VA size is usually decided by the requirements on the kernel
> side, particularly the size of the linear region, which must be large
> enough to cover all of physical memory, including the holes in between,
> which may be very large (~512 GB on some systems).
>
> Since running with more translation levels could potentially result in
> a performance penalty due to additional TLB pressure, this patch allows the
> kernel to be configured so that it runs with one fewer translation level on
> the userland side. Rather than modifying all the compile time logic to deal
> with folded PUDs or PMDs, we simply allocate the root table and the next
> table adjacently, so that we can simply point TTBR0_EL1 to the next table
> (and update TCR_EL1.T0SZ accordingly)
>
> Signed-off-by: Ard Biesheuvel
> ---
>
> This is just a proof of concept. *If* there is a performance penalty associated
> with using 4 translation levels instead of 3, I would expect this patch to
> compensate for that, given that the additional TLB pressure should be on the
> userland side primarily. Benchmark results are highly appreciated.
>
> As a bonus, this would fix the horrible yet real JIT issues we have been seeing
> with 48-bit VA configurations.
> IOW, I expect this to be an easier sell than
> simply limiting TASKSIZE to 47 bits (assuming anyone can show a benchmark where
> this patch has a positive impact on the performance of a 48-bit/4 levels kernel)
> and distros can ship kernels that work on all hardware (including Freescale and
> Xgene with >= 64 GB) but don't break their JITs.
>
> This patch is most likely broken for 16k/47-bit configs, but I didn't bother to
> fix that before having the discussion.

Let's roll forward by a few years. In that time, there's a good chance you
will have NVDIMMs in a good number of systems out there, with massive address
spaces that easily reach beyond the lousy 512GB you get with 3 levels.

That means at that point we'd have to roll back and have 48 bits
regardless, or add special attributes so that binaries can demand a
bigger address space.

Overall that doesn't sound terribly appealing, so I'm not sure going for
39 bits as an interim solution is a step in the right direction.

That said, I'd be very happy to see benchmark results too :)

Alex
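
For reference, the arithmetic behind the 39-bit vs. 48-bit figures in this
thread can be sketched as below. This is only an illustration, not code from
the patch: the helper names va_bits(), t0sz() and va_size_gib() are
hypothetical, the 48-bit cap assumes a CPU without the 52-bit VA extension,
and the formula assumes 8-byte descriptors, so each level resolves
(page_shift - 3) bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: ARMv8.0-style limit, i.e. no 52-bit VA extension. */
#define MAX_VA_BITS 48u

/*
 * VA bits covered by `levels` translation levels for a granule of
 * 2^page_shift bytes. Each table holds 2^(page_shift - 3) eight-byte
 * descriptors, so each level resolves (page_shift - 3) bits.
 */
static inline unsigned va_bits(unsigned page_shift, unsigned levels)
{
    assert(page_shift == 12 || page_shift == 14 || page_shift == 16);
    assert(levels >= 1 && levels <= 4);

    unsigned bits = page_shift + levels * (page_shift - 3);
    return bits > MAX_VA_BITS ? MAX_VA_BITS : bits;
}

/* TCR_EL1.T0SZ encodes the region size as 2^(64 - T0SZ) bytes. */
static inline unsigned t0sz(unsigned bits)
{
    return 64u - bits;
}

/* Size of a 2^bits address space, in GiB. */
static inline uint64_t va_size_gib(unsigned bits)
{
    assert(bits >= 30 && bits < 64);
    return UINT64_C(1) << (bits - 30);
}
```

With 4k pages, va_bits(12, 3) is 39 (512 GiB, T0SZ = 25) and va_bits(12, 4)
is 48 (256 TiB, T0SZ = 16). With 16k pages, va_bits(14, 3) is 47, which is
the 16k/47-bit configuration mentioned in the quoted text.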