linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [GIT PULL] x86/mm changes for v5.9
@ 2020-08-03 19:03 Ingo Molnar
  2020-08-04  0:50 ` pr-tracker-bot
  2020-08-05 11:03 ` Jason A. Donenfeld
  0 siblings, 2 replies; 18+ messages in thread
From: Ingo Molnar @ 2020-08-03 19:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Thomas Gleixner, Borislav Petkov, Peter Zijlstra,
	Andrew Morton

Linus,

Please pull the latest x86/mm git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-mm-2020-08-03

   # HEAD: 2b32ab031e82a109e2c5b0d30ce563db0fe286b4 x86/mm/64: Make sync_global_pgds() static

The biggest change is to not sync the vmalloc and ioremap ranges for x86-64 anymore.

 Thanks,

	Ingo

------------------>
Joerg Roedel (3):
      x86/mm: Pre-allocate P4D/PUD pages for vmalloc area
      x86/mm/64: Do not sync vmalloc/ioremap mappings
      x86/mm/64: Make sync_global_pgds() static


 arch/x86/include/asm/pgtable_64.h       |  2 --
 arch/x86/include/asm/pgtable_64_types.h |  2 --
 arch/x86/mm/init_64.c                   | 59 +++++++++++++++++++++++++++++----
 3 files changed, 53 insertions(+), 10 deletions(-)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-03 19:03 [GIT PULL] x86/mm changes for v5.9 Ingo Molnar
@ 2020-08-04  0:50 ` pr-tracker-bot
  2020-08-05 11:03 ` Jason A. Donenfeld
  1 sibling, 0 replies; 18+ messages in thread
From: pr-tracker-bot @ 2020-08-04  0:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

The pull request you sent on Mon, 3 Aug 2020 21:03:54 +0200:

> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-mm-2020-08-03

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/e96ec8cf9ca12a8d6b3b896a2eccd4b92a1893ab

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.wiki.kernel.org/userdoc/prtracker

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-03 19:03 [GIT PULL] x86/mm changes for v5.9 Ingo Molnar
  2020-08-04  0:50 ` pr-tracker-bot
@ 2020-08-05 11:03 ` Jason A. Donenfeld
  2020-08-05 17:05   ` Linus Torvalds
  2020-08-06 22:56   ` Joerg Roedel
  1 sibling, 2 replies; 18+ messages in thread
From: Jason A. Donenfeld @ 2020-08-05 11:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, linux-kernel, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton, Joerg Roedel

Hi Ingo, Joerg,

On Mon, Aug 03, 2020 at 09:03:54PM +0200, Ingo Molnar wrote:
> Linus,
> 
> Please pull the latest x86/mm git tree from:
> 
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-mm-2020-08-03
> 
>    # HEAD: 2b32ab031e82a109e2c5b0d30ce563db0fe286b4 x86/mm/64: Make sync_global_pgds() static
> 
> The biggest change is to not sync the vmalloc and ioremap ranges for x86-64 anymore.
> 
>  Thanks,
> 
> 	Ingo
> 
> ------------------>
> Joerg Roedel (3):
>       x86/mm: Pre-allocate P4D/PUD pages for vmalloc area
>       x86/mm/64: Do not sync vmalloc/ioremap mappings
>       x86/mm/64: Make sync_global_pgds() static

The commit 8bb9bf242d1f ("x86/mm/64: Do not sync vmalloc/ioremap
mappings") causes the OOPS below, in Linus' tree and in linux-next,
unearthed by my CI on <https://www.wireguard.com/build-status/>.
Bisecting reveals 8bb9bf242d1f, and reverting this makes the OOPS go
away.

Jason


BUG: unable to handle page fault for address: ffffe8ffffd00608
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 22 Comm: kworker/2:0 Not tainted 5.8.0+ #154
RIP: 0010:process_one_work+0x2c/0x2d0
Code: 41 56 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 08 48 8b 06 4c 8b 67 40 49 89 c6 45 30 f6 a8 04 b8 00 00 00 00 4c 0f 44 f0 <49> 8b 46 08 44 8b a8 00 01 05
RSP: 0018:ffff88800016be98 EFLAGS: 00010002
RAX: 0000000000000000 RBX: ffff888000010c00 RCX: 0000000000000000
RDX: 00000000fffedb85 RSI: ffff888000391900 RDI: ffff888000010c00
RBP: ffff888000391900 R08: ffff888000391900 R09: ffff888000350090
R10: ffffffffff770df6 R11: ffff88800f91f840 R12: ffff88800f91f140
R13: ffff88800f91f160 R14: ffffe8ffffd00600 R15: ffff88800f91f140
FS:  0000000000000000(0000) GS:ffff88800f900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffe8ffffd00608 CR3: 00000000001be005 CR4: 00000000001706a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 worker_thread+0x4b/0x3b0
 ? rescuer_thread+0x360/0x360
 kthread+0x116/0x140
 ? __kthread_create_worker+0x110/0x110
 ret_from_fork+0x1f/0x30
CR2: ffffe8ffffd00608
---[ end trace ca0cfc5a79414fb2 ]---
RIP: 0010:process_one_work+0x2c/0x2d0
Code: 41 56 41 55 41 54 55 48 89 f5 53 48 89 fb 48 83 ec 08 48 8b 06 4c 8b 67 40 49 89 c6 45 30 f6 a8 04 b8 00 00 00 00 4c 0f 44 f0 <49> 8b 46 08 44 8b a8 00 01 05
RSP: 0018:ffff88800016be98 EFLAGS: 00010002
RAX: 0000000000000000 RBX: ffff888000010c00 RCX: 0000000000000000
RDX: 00000000fffedb85 RSI: ffff888000391900 RDI: ffff888000010c00
RBP: ffff888000391900 R08: ffff888000391900 R09: ffff888000350090
R10: ffffffffff770df6 R11: ffff88800f91f840 R12: ffff88800f91f140
R13: ffff88800f91f160 R14: ffffe8ffffd00600 R15: ffff88800f91f140
FS:  0000000000000000(0000) GS:ffff88800f900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffe8ffffd00608 CR3: 00000000001be005 CR4: 00000000001706a0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Kernel panic - not syncing: Fatal exception


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-05 11:03 ` Jason A. Donenfeld
@ 2020-08-05 17:05   ` Linus Torvalds
  2020-08-06 13:10     ` Ingo Molnar
  2020-08-06 22:56   ` Joerg Roedel
  1 sibling, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2020-08-05 17:05 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Ingo Molnar, Linux Kernel Mailing List, Thomas Gleixner,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Joerg Roedel

On Wed, Aug 5, 2020 at 4:03 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> The commit 8bb9bf242d1f ("x86/mm/64: Do not sync vmalloc/ioremap
> mappings") causes the OOPS below, in Linus' tree and in linux-next,
> unearthed by my CI on <https://www.wireguard.com/build-status/>.
> Bisecting reveals 8bb9bf242d1f, and reverting this makes the OOPS go
> away.

The oops happens early in the function, and the "Code:" line actually
gets almost the whole function prologue in it (missing first two bytes
are probably "push %rbp"):

   0: 41 56                push   %r14
   2: 41 55                push   %r13
   4: 41 54                push   %r12
   6: 55                    push   %rbp
   7: 48 89 f5              mov    %rsi,%rbp
   a: 53                    push   %rbx
   b: 48 89 fb              mov    %rdi,%rbx
   e: 48 83 ec 08          sub    $0x8,%rsp
  12: 48 8b 06              mov    (%rsi),%rax
  15: 4c 8b 67 40          mov    0x40(%rdi),%r12
  19: 49 89 c6              mov    %rax,%r14
  1c: 45 30 f6              xor    %r14b,%r14b
  1f: a8 04                test   $0x4,%al
  21: b8 00 00 00 00        mov    $0x0,%eax
  26: 4c 0f 44 f0          cmove  %rax,%r14
  2a:* 49 8b 46 08          mov    0x8(%r14),%rax <-- trapping instruction


> BUG: unable to handle page fault for address: ffffe8ffffd00608
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0

Yeah, missing page table because it wasn't copied.

Presumably because that kthread is using the active_mm of some random
user space process that didn't get sync'ed.

And the sync_global_pgds() may have ended up being sufficient
synchronization with whoever allocated thigns, even if it wasn't about
the TLB contents themselves.

So apparently the "the page-table pages are all pre-allocated now" is
simply not true. Joerg?

Unless somebody can figure this out fairly quickly, I think it should
just be reverted.

               Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-05 17:05   ` Linus Torvalds
@ 2020-08-06 13:10     ` Ingo Molnar
  2020-08-06 18:57       ` Joerg Roedel
  0 siblings, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2020-08-06 13:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason A. Donenfeld, Linux Kernel Mailing List, Thomas Gleixner,
	Borislav Petkov, Peter Zijlstra, Andrew Morton, Joerg Roedel


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Wed, Aug 5, 2020 at 4:03 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > The commit 8bb9bf242d1f ("x86/mm/64: Do not sync vmalloc/ioremap
> > mappings") causes the OOPS below, in Linus' tree and in linux-next,
> > unearthed by my CI on <https://www.wireguard.com/build-status/>.
> > Bisecting reveals 8bb9bf242d1f, and reverting this makes the OOPS go
> > away.
> 
> The oops happens early in the function, and the "Code:" line actually
> gets almost the whole function prologue in it (missing first two bytes
> are probably "push %rbp"):
> 
>    0: 41 56                push   %r14
>    2: 41 55                push   %r13
>    4: 41 54                push   %r12
>    6: 55                    push   %rbp
>    7: 48 89 f5              mov    %rsi,%rbp
>    a: 53                    push   %rbx
>    b: 48 89 fb              mov    %rdi,%rbx
>    e: 48 83 ec 08          sub    $0x8,%rsp
>   12: 48 8b 06              mov    (%rsi),%rax
>   15: 4c 8b 67 40          mov    0x40(%rdi),%r12
>   19: 49 89 c6              mov    %rax,%r14
>   1c: 45 30 f6              xor    %r14b,%r14b
>   1f: a8 04                test   $0x4,%al
>   21: b8 00 00 00 00        mov    $0x0,%eax
>   26: 4c 0f 44 f0          cmove  %rax,%r14
>   2a:* 49 8b 46 08          mov    0x8(%r14),%rax <-- trapping instruction
> 
> 
> > BUG: unable to handle page fault for address: ffffe8ffffd00608
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 0 P4D 0
> 
> Yeah, missing page table because it wasn't copied.
> 
> Presumably because that kthread is using the active_mm of some random
> user space process that didn't get sync'ed.
> 
> And the sync_global_pgds() may have ended up being sufficient
> synchronization with whoever allocated thigns, even if it wasn't about
> the TLB contents themselves.
> 
> So apparently the "the page-table pages are all pre-allocated now" is
> simply not true. Joerg?
> 
> Unless somebody can figure this out fairly quickly, I think it should
> just be reverted.

Agreed. Joerg?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-06 13:10     ` Ingo Molnar
@ 2020-08-06 18:57       ` Joerg Roedel
  2020-08-06 19:02         ` Linus Torvalds
  0 siblings, 1 reply; 18+ messages in thread
From: Joerg Roedel @ 2020-08-06 18:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Jason A. Donenfeld, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Thu, Aug 06, 2020 at 03:10:34PM +0200, Ingo Molnar wrote:
> 
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > So apparently the "the page-table pages are all pre-allocated now" is
> > simply not true. Joerg?

It pre-allocates the whole vmalloc/ioremap PUD/P4D pages, but I actually
only tested it with 4-level paging, as I don't have access to 5-level
paging hardware.

I'll try to make sense of that oops.

> > Unless somebody can figure this out fairly quickly, I think it should
> > just be reverted.
> 
> Agreed. Joerg?

Yeah, please just revert that commit for now until I have an idea what
went wrong.

	
	Joerg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-06 18:57       ` Joerg Roedel
@ 2020-08-06 19:02         ` Linus Torvalds
  2020-08-06 19:23           ` Joerg Roedel
  2020-08-06 21:20           ` Ingo Molnar
  0 siblings, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2020-08-06 19:02 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Ingo Molnar, Jason A. Donenfeld, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Thu, Aug 6, 2020 at 11:57 AM Joerg Roedel <jroedel@suse.de> wrote:
>
> On Thu, Aug 06, 2020 at 03:10:34PM +0200, Ingo Molnar wrote:
> >
> > * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > So apparently the "the page-table pages are all pre-allocated now" is
> > > simply not true. Joerg?
>
> It pre-allocates the whole vmalloc/ioremap PUD/P4D pages, but I actually
> only tested it with 4-level paging, as I don't have access to 5-level
> paging hardware.

I don't think Jason has either.

The

        PGD 0 P4D 0

line tells us that "pgd_present()" is true, even though PGD is 0
(otherwise it wouldn't print the P4D part). That means that he doesn't
have l5 enabled.

But you may obviously have different settings for CONFIG_X86_5LEVEL,
and maybe that ends up changing something?

But since apparently it's not immediately obvious what the problem is,
I'll revert it for now.

                Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-06 19:02         ` Linus Torvalds
@ 2020-08-06 19:23           ` Joerg Roedel
  2020-08-06 19:42             ` Linus Torvalds
  2020-08-07  9:53             ` Jason A. Donenfeld
  2020-08-06 21:20           ` Ingo Molnar
  1 sibling, 2 replies; 18+ messages in thread
From: Joerg Roedel @ 2020-08-06 19:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Jason A. Donenfeld, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Thu, Aug 06, 2020 at 12:02:40PM -0700, Linus Torvalds wrote:
> But you may obviously have different settings for CONFIG_X86_5LEVEL,
> and maybe that ends up changing something?
> 
> But since apparently it's not immediately obvious what the problem is,
> I'll revert it for now.

Yes, that's the best for now. My gut feeling is that the fault Jason is
seeing didn't happen on a vmalloc address, but I can't prove that yet.

And if this is true it means that more work is needed before the syncing
on x86-64 can be removed, so reverting is the best for now.

Jason, can you share more details about the test setup which triggers
this? Like the .config and the machine setup, ideally a qemu
command-line, and how to reproduce it on that setup.

Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-06 19:23           ` Joerg Roedel
@ 2020-08-06 19:42             ` Linus Torvalds
  2020-08-06 19:57               ` Tejun Heo
  2020-08-07  9:53             ` Jason A. Donenfeld
  1 sibling, 1 reply; 18+ messages in thread
From: Linus Torvalds @ 2020-08-06 19:42 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Ingo Molnar, Jason A. Donenfeld, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton,
	Tejun Heo

On Thu, Aug 6, 2020 at 12:23 PM Joerg Roedel <jroedel@suse.de> wrote:
>
> Yes, that's the best for now. My gut feeling is that the fault Jason is
> seeing didn't happen on a vmalloc address, but I can't prove that yet.

No, it's definitely fairly high in the vmalloc space. Look at the
faulting address:

   BUG: unable to handle page fault for address: ffffe8ffffd00608

and the code sequence is this:

>   12: 48 8b 06              mov    (%rsi),%rax
>   15: 4c 8b 67 40          mov    0x40(%rdi),%r12
>   19: 49 89 c6              mov    %rax,%r14
>   1c: 45 30 f6              xor    %r14b,%r14b
>   1f: a8 04                test   $0x4,%al
>   21: b8 00 00 00 00        mov    $0x0,%eax
>   26: 4c 0f 44 f0          cmove  %rax,%r14

that admittedly odd sequence is get_work_pwq(work)

And then the faulting instruction is:

>   2a:* 49 8b 46 08          mov    0x8(%r14),%rax <-- trapping instruction

and this is the "->wq" dereference.

So it's the pwq->wq that traps, with 'pwq' being the trapping base
pointer, and clearly being in the vmalloc space.

I think pwq may a percpu allocation, so not _directly_ vmalloc().
Adding Tejun to the cc in case he can clarify ("No, silly Linus, it's
allocated here..").

                Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-06 19:42             ` Linus Torvalds
@ 2020-08-06 19:57               ` Tejun Heo
  0 siblings, 0 replies; 18+ messages in thread
From: Tejun Heo @ 2020-08-06 19:57 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Joerg Roedel, Ingo Molnar, Jason A. Donenfeld,
	Linux Kernel Mailing List, Thomas Gleixner, Borislav Petkov,
	Peter Zijlstra, Andrew Morton

On Thu, Aug 06, 2020 at 12:42:23PM -0700, Linus Torvalds wrote:
> that admittedly odd sequence is get_work_pwq(work)
> 
> And then the faulting instruction is:
> 
> >   2a:* 49 8b 46 08          mov    0x8(%r14),%rax <-- trapping instruction
> 
> and this is the "->wq" dereference.
> 
> So it's the pwq->wq that traps, with 'pwq' being the trapping base
> pointer, and clearly being in the vmalloc space.
> 
> I think pwq may a percpu allocation, so not _directly_ vmalloc().
> Adding Tejun to the cc in case he can clarify ("No, silly Linus, it's
> allocated here..").

Hey, silly Linus, yeap, they're per-cpu allocations and will be in vmalloc
address space for per-cpu workqueues. For unbound workqueues, they're
regular kmallocs. The per-cpu allocation happens in alloc_and_link_pwqs():

  static int alloc_and_link_pwqs(struct workqueue_struct *wq)
  {
          bool highpri = wq->flags & WQ_HIGHPRI;
          int cpu, ret;

          if (!(wq->flags & WQ_UNBOUND)) {
                  wq->cpu_pwqs = alloc_percpu(struct pool_workqueue);
                  if (!wq->cpu_pwqs)
                          return -ENOMEM;

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-06 19:02         ` Linus Torvalds
  2020-08-06 19:23           ` Joerg Roedel
@ 2020-08-06 21:20           ` Ingo Molnar
  2020-08-07  8:47             ` Joerg Roedel
  1 sibling, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2020-08-06 21:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Joerg Roedel, Jason A. Donenfeld, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, Aug 6, 2020 at 11:57 AM Joerg Roedel <jroedel@suse.de> wrote:
> >
> > On Thu, Aug 06, 2020 at 03:10:34PM +0200, Ingo Molnar wrote:
> > >
> > > * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > > > So apparently the "the page-table pages are all pre-allocated now" is
> > > > simply not true. Joerg?
> >
> > It pre-allocates the whole vmalloc/ioremap PUD/P4D pages, but I actually
> > only tested it with 4-level paging, as I don't have access to 5-level
> > paging hardware.
> 
> I don't think Jason has either.
> 
> The
> 
>         PGD 0 P4D 0
> 
> line tells us that "pgd_present()" is true, even though PGD is 0
> (otherwise it wouldn't print the P4D part). That means that he doesn't
> have l5 enabled.
> 
> But you may obviously have different settings for CONFIG_X86_5LEVEL,
> and maybe that ends up changing something?
> 
> But since apparently it's not immediately obvious what the problem is,
> I'll revert it for now.

I've reverted it in x86/urgent as well earlier today, can send you 
that tree right now if you prefer that route.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-05 11:03 ` Jason A. Donenfeld
  2020-08-05 17:05   ` Linus Torvalds
@ 2020-08-06 22:56   ` Joerg Roedel
  2020-08-06 23:12     ` Joerg Roedel
  1 sibling, 1 reply; 18+ messages in thread
From: Joerg Roedel @ 2020-08-06 22:56 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Ingo Molnar, Linus Torvalds, linux-kernel, Thomas Gleixner,
	Borislav Petkov, Peter Zijlstra, Andrew Morton

On Wed, Aug 05, 2020 at 01:03:48PM +0200, Jason A. Donenfeld wrote:
> BUG: unable to handle page fault for address: ffffe8ffffd00608

Okay, looks like my usage of the page-table macros is bogus, seems I
don't understand their usage as good as I thought. The p?d_none checks
in the allocation path are wrong and led to the bug. In fact it caused
only the first PUD entry to be allocated and in the later iterations of
the loop it always ended up on the same PUD entry.

I still don't fully understand why, but its late here and my head spins.
So I take another look tomorrow in the hope to understand it better.
Please remind me to not take vacation again during merge windows :)

Anyway...

Jason, does the attached diff fix the issue in your testing? For me it
makes all PUD pages correctly allocated.

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index c7a47603537f..e4abf73167d0 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -696,6 +696,7 @@ static void __init memory_map_bottom_up(unsigned long map_start,
 static void __init init_trampoline(void)
 {
 #ifdef CONFIG_X86_64
+
 	if (!kaslr_memory_enabled())
 		trampoline_pgd_entry = init_top_pgt[pgd_index(__PAGE_OFFSET)];
 	else
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index e65b96f381a7..351fac590b02 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1248,27 +1248,23 @@ static void __init preallocate_vmalloc_pages(void)
 		p4d_t *p4d;
 		pud_t *pud;
 
-		p4d = p4d_offset(pgd, addr);
-		if (p4d_none(*p4d)) {
-			/* Can only happen with 5-level paging */
-			p4d = p4d_alloc(&init_mm, pgd, addr);
-			if (!p4d) {
-				lvl = "p4d";
-				goto failed;
-			}
+		p4d = p4d_alloc(&init_mm, pgd, addr);
+		if (!p4d) {
+			lvl = "p4d";
+			goto failed;
 		}
 
 		if (pgtable_l5_enabled())
 			continue;
 
-		pud = pud_offset(p4d, addr);
-		if (pud_none(*pud)) {
-			/* Ends up here only with 4-level paging */
-			pud = pud_alloc(&init_mm, p4d, addr);
-			if (!pud) {
-				lvl = "pud";
-				goto failed;
-			}
+		/*
+		 * With 4-level paging the P4D is folded, so allocate a
+		 * PUD to have one level below PGD present.
+		 */
+		pud = pud_alloc(&init_mm, p4d, addr);
+		if (!pud) {
+			lvl = "pud";
+			goto failed;
 		}
 	}
 

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-06 22:56   ` Joerg Roedel
@ 2020-08-06 23:12     ` Joerg Roedel
  0 siblings, 0 replies; 18+ messages in thread
From: Joerg Roedel @ 2020-08-06 23:12 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Ingo Molnar, Linus Torvalds, linux-kernel, Thomas Gleixner,
	Borislav Petkov, Peter Zijlstra, Andrew Morton

On Fri, Aug 07, 2020 at 12:56:51AM +0200, Joerg Roedel wrote:
> I still don't fully understand why, but its late here and my head spins.
> So I take another look tomorrow in the hope to understand it better.

I think I know what the bug is, I am calling p4d_offset(pgd, addr)
without checking for pgd_none() first. That is still a bug in the code
wich needs fixing, even with the revert.


	Joerg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-06 21:20           ` Ingo Molnar
@ 2020-08-07  8:47             ` Joerg Roedel
  2020-08-13 19:30               ` Ingo Molnar
  0 siblings, 1 reply; 18+ messages in thread
From: Joerg Roedel @ 2020-08-07  8:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Jason A. Donenfeld, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Thu, Aug 06, 2020 at 11:20:19PM +0200, Ingo Molnar wrote:
> I've reverted it in x86/urgent as well earlier today, can send you 
> that tree right now if you prefer that route.

I sent a fix for preallocate_vmalloc_pages() to correctly pre-allocate
the vmalloc PGD entries. I verified that it works and that
swapper_pg_dir contains the correct entries now. This should also fix
the issue Jason is seeing.

Sorry for screwing this up :-(

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-06 19:23           ` Joerg Roedel
  2020-08-06 19:42             ` Linus Torvalds
@ 2020-08-07  9:53             ` Jason A. Donenfeld
  1 sibling, 0 replies; 18+ messages in thread
From: Jason A. Donenfeld @ 2020-08-07  9:53 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Linus Torvalds, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

Hey Joerg,

On Thu, Aug 6, 2020 at 9:23 PM Joerg Roedel <jroedel@suse.de> wrote:
> Jason, can you share more details about the test setup which triggers
> this? Like the .config and the machine setup, ideally a qemu
> command-line, and how to reproduce it on that setup.

make -C tools/testing/selftests/wireguard/qemu -j$(nproc)

I can also confirm that the patch you sent to the list fixes the issue.

Jason

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-07  8:47             ` Joerg Roedel
@ 2020-08-13 19:30               ` Ingo Molnar
  2020-08-13 19:38                 ` Linus Torvalds
  2020-08-14 14:26                 ` Joerg Roedel
  0 siblings, 2 replies; 18+ messages in thread
From: Ingo Molnar @ 2020-08-13 19:30 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Linus Torvalds, Jason A. Donenfeld, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton


* Joerg Roedel <jroedel@suse.de> wrote:

> On Thu, Aug 06, 2020 at 11:20:19PM +0200, Ingo Molnar wrote:
> > I've reverted it in x86/urgent as well earlier today, can send you 
> > that tree right now if you prefer that route.
> 
> I sent a fix for preallocate_vmalloc_pages() to correctly pre-allocate
> the vmalloc PGD entries. I verified that it works and that
> swapper_pg_dir contains the correct entries now. This should also fix
> the issue Jason is seeing.

Thanks!

There's one thing left to do. Linus has reverted the patch which 
exposed this bug:

  7b4ea9456dd3: ("Revert "x86/mm/64: Do not sync vmalloc/ioremap mappings"")

and has applied your fix:

  995909a4e22b: ("x86/mm/64: Do not dereference non-present PGD entries")

I think now we can re-apply the original commit:

  8bb9bf242d1f: ("x86/mm/64: Do not sync vmalloc/ioremap mappings")

Mind re-sending it, with an updated changelog that explains why it's 
now truly safe?

Would be tentatively scheduled for v5.10 though, we've had enough 
excitement in this area for v5.9 I think. :-/

> Sorry for screwing this up :-(

No problem, and it was my fault too: I sent 8bb9bf242d1f to Linus too 
quickly, just 7 days after applying it - x86/mm patches usually need a 
few weeks of testing.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-13 19:30               ` Ingo Molnar
@ 2020-08-13 19:38                 ` Linus Torvalds
  2020-08-14 14:26                 ` Joerg Roedel
  1 sibling, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2020-08-13 19:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Joerg Roedel, Jason A. Donenfeld, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

On Thu, Aug 13, 2020 at 12:30 PM Ingo Molnar <mingo@kernel.org> wrote:
>
> Would be tentatively scheduled for v5.10 though, we've had enough
> excitement in this area for v5.9 I think. :-/

I think I can take it for 5.9 again if you send a pull request my way.

If it causes any other problems, we'll obviously revert it again and
at that point it's final for 5.9.

But the one report it got was fixed early and quickly enough that I
don't feel like things were _horribly_ broken, just a mistake that
wasn't something horribly fundamental that needs another release to
think about.

                  Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [GIT PULL] x86/mm changes for v5.9
  2020-08-13 19:30               ` Ingo Molnar
  2020-08-13 19:38                 ` Linus Torvalds
@ 2020-08-14 14:26                 ` Joerg Roedel
  1 sibling, 0 replies; 18+ messages in thread
From: Joerg Roedel @ 2020-08-14 14:26 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Jason A. Donenfeld, Linux Kernel Mailing List,
	Thomas Gleixner, Borislav Petkov, Peter Zijlstra, Andrew Morton

Hi Ingo,

On Thu, Aug 13, 2020 at 09:30:08PM +0200, Ingo Molnar wrote:
> Mind re-sending it, with an updated changelog that explains why it's 
> now truly safe?

I have updated the commit and will send it out later today, maybe
together with a comment update in the preallocate_vmalloc_pages()
function.

Thanks,

	Joerg


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-08-14 14:26 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-03 19:03 [GIT PULL] x86/mm changes for v5.9 Ingo Molnar
2020-08-04  0:50 ` pr-tracker-bot
2020-08-05 11:03 ` Jason A. Donenfeld
2020-08-05 17:05   ` Linus Torvalds
2020-08-06 13:10     ` Ingo Molnar
2020-08-06 18:57       ` Joerg Roedel
2020-08-06 19:02         ` Linus Torvalds
2020-08-06 19:23           ` Joerg Roedel
2020-08-06 19:42             ` Linus Torvalds
2020-08-06 19:57               ` Tejun Heo
2020-08-07  9:53             ` Jason A. Donenfeld
2020-08-06 21:20           ` Ingo Molnar
2020-08-07  8:47             ` Joerg Roedel
2020-08-13 19:30               ` Ingo Molnar
2020-08-13 19:38                 ` Linus Torvalds
2020-08-14 14:26                 ` Joerg Roedel
2020-08-06 22:56   ` Joerg Roedel
2020-08-06 23:12     ` Joerg Roedel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).