linux-mm.kvack.org archive mirror
From: Andy Lutomirski <luto@kernel.org>
To: Joerg Roedel <joro@8bytes.org>
Cc: Andy Lutomirski <luto@kernel.org>, Joerg Roedel <jroedel@suse.de>,
	X86 ML <x86@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Arnd Bergmann <arnd@arndb.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Steven Rostedt <rostedt@goodmis.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Michal Hocko <mhocko@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux ACPI <linux-acpi@vger.kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: [RFC PATCH 0/7] mm: Get rid of vmalloc_sync_(un)mappings()
Date: Sat, 9 May 2020 22:05:43 -0700
Message-ID: <CALCETrWyQA=4y57PsKhhcRWpxfCufBpda5g7gyEVSST6H5FNJQ@mail.gmail.com>
In-Reply-To: <20200509215713.GE18353@8bytes.org>

On Sat, May 9, 2020 at 2:57 PM Joerg Roedel <joro@8bytes.org> wrote:
>
> Hi Andy,
>
> On Sat, May 09, 2020 at 12:05:29PM -0700, Andy Lutomirski wrote:

> > So, unless I'm missing something here, there is an absolute maximum of
> > 512 top-level entries that ever need to be synchronized.
>
> And here is where your assumption is wrong. On 32-bit PAE systems it is
> not the top-level entries that need to be synchronized for vmalloc, but
> the second-level entries. Depending on the kernel configuration,
> there are (in total, not only for vmalloc) 1536, 1024, or 512 of these
> second-level entries. How many of them are actually used for vmalloc
> depends on the size of the system RAM (but it is at least 64), because
> the vmalloc area begins after the kernel direct-mapping (with an 8MB
> unmapped hole).

I spent some time looking at the code, and I'm guessing you're talking
about the 3-level !SHARED_KERNEL_PMD case.  I can't quite figure out
what's going on.

Can you explain what is actually going on that causes different
mms/pgds to have top-level entries in the kernel range that point to
different tables?  Because I'm not seeing why this makes any sense.

>
> > Now, there's an additional complication.  On x86_64, we have a rule:
> > those entries that need to be synced start out null and may, during
> > the lifetime of the system, change *once*.  They are never unmapped or
> > modified after being allocated.  This means that those entries can
> > only ever point to a page *table* and not to a ginormous page.  So,
> > even if the hardware were to support ginormous pages (which, IIRC, it
> > doesn't), we would be limited to merely immense and not ginormous
> > pages in the vmalloc range.  On x86_32, I don't think we have this
> > rule right now.  And this means that it's possible for one of these
> > pages to be unmapped or modified.
>
> The reason for x86-32 being different is that the address space is
> orders of magnitude smaller than on x86-64. We just have 4 top-level
> entries with PAE paging and can't afford to partition the kernel address
> space on that level like we do on x86-64. That is the reason the address
> space is partitioned on the second (PMD) level, which is also the reason
> vmalloc synchronization needs to happen on that level. And because
> that's not enough yet, it's also the page-table level used to map huge pages.

Why does it need to be partitioned at all?  The only thing that comes
to mind is that the LDT range is per-mm.  So I can imagine that the
PAE case with a 3G user / 1G kernel split has to have the vmalloc
range and the LDT range in the same top-level entry.  Yuck.

>
> > So my suggestion is that we just apply the x86_64 rule to x86_32 as well.
> > The practical effect will be that 2-level-paging systems will not be
> > able to use huge pages in the vmalloc range, since the rule will be
> > that the vmalloc-relevant entries in the top-level table must point to
> > page *tables* instead of huge pages.
>
> I could very well live with prohibiting huge-page ioremap mappings for
> x86-32. But as I wrote before, this doesn't solve the problems I am
> trying to address with this patch-set, or would only address them if
> significant amount of total system memory is used.
>
> The pre-allocation solution would work for x86-64, it would only need
> 256KB of preallocated memory for the vmalloc range to never synchronize
> or fault again. I thought about that and did the math before
> writing this patch-set, but doing the math for 32 bit drove me away from
> it for reasons written above.
>

If it's *just* the LDT that's a problem, we could plausibly shrink the
user address range a little bit and put the LDT in the user portion.
I suppose this could end up creating its own set of problems involving
tracking which code owns which page tables.

> And since a lot of the vmalloc_sync_(un)mappings problems I debugged
> were actually related to 32-bit, I want a solution that works for 32 and
> 64-bit x86 (at least until support for x86-32 is removed). And I think
> this patch-set provides a solution that works well for both.

I'm not fundamentally objecting to your patch set, but I do want to
understand what's going on that needs this stuff.

>
>
>         Joerg



Thread overview: 37+ messages
2020-05-08 14:40 [RFC PATCH 0/7] mm: Get rid of vmalloc_sync_(un)mappings() Joerg Roedel
2020-05-08 14:40 ` [RFC PATCH 1/7] mm: Add functions to track page directory modifications Joerg Roedel
2020-05-08 14:40 ` [RFC PATCH 2/7] mm/vmalloc: Track which page-table levels were modified Joerg Roedel
2020-05-08 19:10   ` Peter Zijlstra
2020-05-08 21:23     ` Joerg Roedel
2020-05-08 14:40 ` [RFC PATCH 3/7] mm/ioremap: " Joerg Roedel
2020-05-08 14:40 ` [RFC PATCH 4/7] x86/mm/64: Implement arch_sync_kernel_mappings() Joerg Roedel
2020-05-08 14:40 ` [RFC PATCH 5/7] x86/mm/32: " Joerg Roedel
2020-05-08 14:40 ` [RFC PATCH 6/7] mm: Remove vmalloc_sync_(un)mappings() Joerg Roedel
2020-05-08 18:58   ` Steven Rostedt
2020-05-08 21:24     ` Joerg Roedel
2020-05-08 14:40 ` [RFC PATCH 7/7] x86/mm: Remove vmalloc faulting Joerg Roedel
2020-05-08 19:20 ` [RFC PATCH 0/7] mm: Get rid of vmalloc_sync_(un)mappings() Peter Zijlstra
2020-05-08 21:34   ` Joerg Roedel
2020-05-09  9:25     ` Peter Zijlstra
2020-05-09 17:54       ` Joerg Roedel
2020-05-10  1:11       ` Matthew Wilcox
2020-05-11  7:31         ` Peter Zijlstra
2020-05-11 15:52           ` Matthew Wilcox
2020-05-11 16:08             ` Matthew Wilcox
2020-05-11 17:02             ` Peter Zijlstra
2020-05-08 21:33 ` Andy Lutomirski
2020-05-08 21:36   ` Joerg Roedel
2020-05-08 23:49     ` Andy Lutomirski
2020-05-09 17:52       ` Joerg Roedel
2020-05-09 19:05         ` Andy Lutomirski
2020-05-09 21:57           ` Joerg Roedel
2020-05-10  5:05             ` Andy Lutomirski [this message]
2020-05-10  8:15               ` Joerg Roedel
2020-05-11  7:42           ` Peter Zijlstra
2020-05-11 15:36             ` Andy Lutomirski
2020-05-11 17:06               ` Peter Zijlstra
2020-05-11 19:14               ` Joerg Roedel
2020-05-11 19:36                 ` Andy Lutomirski
2020-05-12 15:02                   ` Joerg Roedel
2020-05-12 15:13                     ` Steven Rostedt
2020-05-11 20:50                 ` Steven Rostedt
