All of lore.kernel.org
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>,
	linux-arch@vger.kernel.org, linux-alpha@vger.kernel.org,
	Richard Henderson <richard.henderson@linaro.org>
Subject: Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
Date: Thu, 2 Feb 2023 06:58:03 +0000	[thread overview]
Message-ID: <Y9te+4n4ajSF++Ex@ZenIV> (raw)
In-Reply-To: <CAHk-=wjiwFzEGd_60H3nbgVB=R_8KTcfUJmXy=hSXCvLrXQRFA@mail.gmail.com>

On Tue, Jan 31, 2023 at 01:19:59PM -0800, Linus Torvalds wrote:
> On Tue, Jan 31, 2023 at 1:10 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Umm...  What about the semantics of get_user() of unmapped address?
> > Some architectures do quiet EFAULT; some (including alpha) hit
> > the sucker with SIGBUS, no matter what.
> 
> I think we should strive to just make this all common.
> 
> The reason alpha is different is almost certainly not intentional, but
> a combination of "pure accident" and "nobody actually cares".

BTW, speaking of alpha page faults - maybe I'm misreading the manual,
but it seems to imply that interrupts are *not* disabled when entering
page fault handler:

Table 24–2 Entry Point Address Registers
Entry Point Value in a0             Value in a1   Value in a2         PS<IPL>
entArith    Exception summary       Register mask UNPREDICTABLE       Unchanged
entIF       Fault or trap type code UNPREDICTABLE UNPREDICTABLE       Unchanged
entInt      Interrupt type          Vector        Interrupt parameter Priority of interrupt
entMM       VA                      MMCSR         Cause               Unchanged
entSys      p0                      p1            p2                  Unchanged
entUna      VA                      Opcode        Src/Dst             Unchanged

So there's nothing to prevent an interrupt hitting just as we reach
entMM, with interrupt handler stepping on a vmalloc'ed area and
triggering another page fault.

If that is correct, this
                /* Synchronize this task's top level page-table
                   with the "reference" page table from init.  */
                long index = pgd_index(address);
                pgd_t *pgd, *pgd_k;

                pgd = current->active_mm->pgd + index;
                pgd_k = swapper_pg_dir + index;
                if (!pgd_present(*pgd) && pgd_present(*pgd_k)) {
                        pgd_val(*pgd) = pgd_val(*pgd_k);
                        return;
                }
                goto no_context;
is not just missing local_irq_save()/local_irq_restore() around that
fragment - if it finds pgd already present, it needs to check pte
before deciding to proceed to no_context.

Suppose we access vmalloc area and corresponding pgd is not
present in current->active_mm.  Just as we get to entMM,
an interrupt arrives and proceeds to access something
covered by the same pgd.  OK, current->active_mm is still
not present, we get another page fault and do_page_fault()
gets to the quoted code.  pgd is copied from the swapper_pg_dir,
do_page_fault() returns and we get back to the instruction in
interrupt handler that had triggered the second #PF.  This
time around it succeeds.  Once the interrupt handler completes
we are back to entMM.  Once *that* gets to do_page_fault()
we hit the quoted code again.  Only this time around pgd
*is* present and instead of returning we get to no_context.
And since it's been a normal access to vmalloc'ed memory,
there's nothing to be found in exception table.  Oops...

	AFAICS, one way to deal with that is to treat
(unlikely) pgd_present(*pgd) as "get to pte, return if
it looks legitimate, proceed to no_context if it isn't".

Other bugs in the same area:
	* we ought to compare address with VMALLOC_START,
not TASK_SIZE.
	* we ought to do that *before* checking for
kernel threads/pagefault_disable() being in effect.

Wait a minute - pgd_present() on alpha has become constant 1
since a73c948952cc "alpha: use pgtable-nopud instead of 4level-fixup"

So that thing had been completely broken for 3 years and nobody
had noticed.  And that's really completely broken - it stopped
copying top-level entries since that commit.

That's also not hard to fix, but...
	* CONFIG_ALPHA_LARGE_VMALLOC had been racy all along
	* it had very limited use (need for >8Gb of vmalloc
space)
	* it had stopped working in late 2019 and nobody cared.

How about removing that kludge?  Richard?

WARNING: multiple messages have this Message-ID (diff)
From: Al Viro <viro@zeniv.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Xu <peterx@redhat.com>,
	linux-arch@vger.kernel.org, linux-alpha@vger.kernel.org,
	Richard Henderson <richard.henderson@linaro.org>
Subject: Re: [RFC][PATCHSET] VM_FAULT_RETRY fixes
Date: Thu, 2 Feb 2023 06:58:03 +0000	[thread overview]
Message-ID: <Y9te+4n4ajSF++Ex@ZenIV> (raw)
In-Reply-To: <CAHk-=wjiwFzEGd_60H3nbgVB=R_8KTcfUJmXy=hSXCvLrXQRFA@mail.gmail.com>

On Tue, Jan 31, 2023 at 01:19:59PM -0800, Linus Torvalds wrote:
> On Tue, Jan 31, 2023 at 1:10 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > Umm...  What about the semantics of get_user() of unmapped address?
> > Some architectures do quiet EFAULT; some (including alpha) hit
> > the sucker with SIGBUS, no matter what.
> 
> I think we should strive to just make this all common.
> 
> The reason alpha is different is almost certainly not intentional, but
> a combination of "pure accident" and "nobody actually cares".

BTW, speaking of alpha page faults - maybe I'm misreading the manual,
but it seems to imply that interrupts are *not* disabled when entering
page fault handler:

Table 24–2 Entry Point Address Registers
Entry Point Value in a0             Value in a1   Value in a2         PS<IPL>
entArith    Exception summary       Register mask UNPREDICTABLE       Unchanged
entIF       Fault or trap type code UNPREDICTABLE UNPREDICTABLE       Unchanged
entInt      Interrupt type          Vector        Interrupt parameter Priority of interrupt
entMM       VA                      MMCSR         Cause               Unchanged
entSys      p0                      p1            p2                  Unchanged
entUna      VA                      Opcode        Src/Dst             Unchanged

So there's nothing to prevent an interrupt hitting just as we reach
entMM, with interrupt handler stepping on a vmalloc'ed area and
triggering another page fault.

If that is correct, this
                /* Synchronize this task's top level page-table
                   with the "reference" page table from init.  */
                long index = pgd_index(address);
                pgd_t *pgd, *pgd_k;

                pgd = current->active_mm->pgd + index;
                pgd_k = swapper_pg_dir + index;
                if (!pgd_present(*pgd) && pgd_present(*pgd_k)) {
                        pgd_val(*pgd) = pgd_val(*pgd_k);
                        return;
                }
                goto no_context;
is not just missing local_irq_save()/local_irq_restore() around that
fragment - if it finds pgd already present, it needs to check pte
before deciding to proceed to no_context.

Suppose we access vmalloc area and corresponding pgd is not
present in current->active_mm.  Just as we get to entMM,
an interrupt arrives and proceeds to access something
covered by the same pgd.  OK, current->active_mm is still
not present, we get another page fault and do_page_fault()
gets to the quoted code.  pgd is copied from the swapper_pg_dir,
do_page_fault() returns and we get back to the instruction in
interrupt handler that had triggered the second #PF.  This
time around it succeeds.  Once the interrupt handler completes
we are back to entMM.  Once *that* gets to do_page_fault()
we hit the quoted code again.  Only this time around pgd
*is* present and instead of returning we get to no_context.
And since it's been a normal access to vmalloc'ed memory,
there's nothing to be found in exception table.  Oops...

	AFAICS, one way to deal with that is to treat
(unlikely) pgd_present(*pgd) as "get to pte, return if
it looks legitimate, proceed to no_context if it isn't".

Other bugs in the same area:
	* we ought to compare address with VMALLOC_START,
not TASK_SIZE.
	* we ought to do that *before* checking for
kernel threads/pagefault_disable() being in effect.

Wait a minute - pgd_present() on alpha has become constant 1
since a73c948952cc "alpha: use pgtable-nopud instead of 4level-fixup"

So that thing had been completely broken for 3 years and nobody
had noticed.  And that's really completely broken - it stopped
copying top-level entries since that commit.

That's also not hard to fix, but...
	* CONFIG_ALPHA_LARGE_VMALLOC had been racy all along
	* it had very limited use (need for >8Gb of vmalloc
space)
	* it had stopped working in late 2019 and nobody cared.

How about removing that kludge?  Richard?

  parent reply	other threads:[~2023-02-02  6:58 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-31 20:02 [RFC][PATCHSET] VM_FAULT_RETRY fixes Al Viro
2023-01-31 20:02 ` Al Viro
2023-01-31 20:03 ` [PATCH 01/10] alpha: fix livelock in uaccess Al Viro
2023-01-31 20:03   ` Al Viro
2023-03-07  0:48   ` patchwork-bot+linux-riscv
2023-03-07  0:48     ` patchwork-bot+linux-riscv
2023-01-31 20:03 ` [PATCH 02/10] hexagon: " Al Viro
2023-01-31 20:03   ` Al Viro
2023-02-10  2:59   ` Brian Cain
2023-02-10  2:59     ` Brian Cain
2023-01-31 20:04 ` [PATCH 03/10] ia64: " Al Viro
2023-01-31 20:04   ` Al Viro
2023-01-31 20:04 ` [PATCH 04/10] m68k: " Al Viro
2023-01-31 20:04   ` Al Viro
2023-02-05  6:18   ` Finn Thain
2023-02-05  6:18     ` Finn Thain
2023-02-05  6:18     ` Finn Thain
2023-02-05 18:51     ` Linus Torvalds
2023-02-05 18:51       ` Linus Torvalds
2023-02-05 18:51       ` Linus Torvalds
2023-02-07  3:07       ` Finn Thain
2023-02-07  3:07         ` Finn Thain
2023-02-07  3:07         ` Finn Thain
2023-02-05 20:39     ` Al Viro
2023-02-05 20:39       ` Al Viro
2023-02-05 20:39       ` Al Viro
2023-02-05 20:41       ` Linus Torvalds
2023-02-05 20:41         ` Linus Torvalds
2023-02-05 20:41         ` Linus Torvalds
2023-02-06 12:08   ` Geert Uytterhoeven
2023-02-06 12:08     ` Geert Uytterhoeven
2023-02-06 12:08     ` Geert Uytterhoeven
2023-01-31 20:05 ` [PATCH 05/10] microblaze: " Al Viro
2023-01-31 20:05   ` Al Viro
2023-01-31 20:05 ` [PATCH 06/10] nios2: " Al Viro
2023-01-31 20:05   ` Al Viro
2023-01-31 20:06 ` [PATCH 07/10] openrisc: " Al Viro
2023-01-31 20:06   ` Al Viro
2023-01-31 20:06 ` [PATCH 08/10] parisc: " Al Viro
2023-01-31 20:06   ` Al Viro
2023-02-06 16:58   ` Helge Deller
2023-02-06 16:58     ` Helge Deller
2023-02-06 16:58     ` Helge Deller
2023-02-28 17:34     ` Al Viro
2023-02-28 17:34       ` Al Viro
2023-02-28 18:26       ` Helge Deller
2023-02-28 19:14         ` Al Viro
2023-02-28 19:32           ` Helge Deller
2023-02-28 20:00             ` Helge Deller
2023-02-28 20:22               ` Helge Deller
2023-02-28 22:57                 ` Al Viro
2023-03-01  4:00                   ` Helge Deller
2023-03-02 17:53                     ` Al Viro
2023-02-28 15:22   ` Guenter Roeck
2023-02-28 15:22     ` Guenter Roeck
2023-02-28 15:22     ` Guenter Roeck
2023-02-28 19:18     ` Michael Schmitz
2023-02-28 19:18       ` Michael Schmitz
2023-02-28 19:18       ` Michael Schmitz
2023-01-31 20:06 ` [PATCH 09/10] riscv: " Al Viro
2023-01-31 20:06   ` Al Viro
2023-02-06 20:06   ` Björn Töpel
2023-02-06 20:06     ` Björn Töpel
2023-02-06 20:06     ` Björn Töpel
2023-02-07 16:11   ` Geert Uytterhoeven
2023-02-07 16:11     ` Geert Uytterhoeven
2023-02-07 16:11     ` Geert Uytterhoeven
2023-01-31 20:07 ` [PATCH 10/10] sparc: " Al Viro
2023-01-31 20:07   ` Al Viro
2023-01-31 20:24 ` [RFC][PATCHSET] VM_FAULT_RETRY fixes Linus Torvalds
2023-01-31 20:24   ` Linus Torvalds
2023-01-31 20:24   ` Linus Torvalds
2023-01-31 21:10   ` Al Viro
2023-01-31 21:10     ` Al Viro
2023-01-31 21:19     ` Linus Torvalds
2023-01-31 21:19       ` Linus Torvalds
2023-01-31 21:19       ` Linus Torvalds
2023-01-31 21:49       ` Al Viro
2023-01-31 21:49         ` Al Viro
2023-02-01  0:00         ` Linus Torvalds
2023-02-01  0:00           ` Linus Torvalds
2023-02-01  0:00           ` Linus Torvalds
2023-02-01 19:48           ` Peter Xu
2023-02-01 19:48             ` Peter Xu
2023-02-01 19:48             ` Peter Xu
2023-02-01 22:18             ` Al Viro
2023-02-01 22:18               ` Al Viro
2023-02-01 22:18               ` Al Viro
2023-02-02  0:57               ` Al Viro
2023-02-02  0:57                 ` Al Viro
2023-02-02  0:57                 ` Al Viro
2023-02-02 22:56               ` Peter Xu
2023-02-02 22:56                 ` Peter Xu
2023-02-02 22:56                 ` Peter Xu
2023-02-04  0:26                 ` Al Viro
2023-02-04  0:26                   ` Al Viro
2023-02-04  0:26                   ` Al Viro
2023-02-05  5:10                   ` Al Viro
2023-02-05  5:10                     ` Al Viro
2023-02-05  5:10                     ` Al Viro
2023-02-04  0:47         ` [loongarch oddities] " Al Viro
2023-02-01  8:21       ` Helge Deller
2023-02-01  8:21         ` Helge Deller
2023-02-01  8:21         ` Helge Deller
2023-02-01 19:51         ` Linus Torvalds
2023-02-01 19:51           ` Linus Torvalds
2023-02-01 19:51           ` Linus Torvalds
2023-02-02  6:58       ` Al Viro [this message]
2023-02-02  6:58         ` Al Viro
2023-02-02  8:54         ` Michael Cree
2023-02-02  9:56           ` John Paul Adrian Glaubitz
2023-02-02 15:20           ` Al Viro
2023-02-02 20:20             ` Al Viro
2023-02-02 20:34         ` Linus Torvalds
2023-02-01 10:50 ` Mark Rutland
2023-02-01 10:50   ` Mark Rutland
2023-02-01 10:50   ` Mark Rutland
2023-02-06 12:08   ` Geert Uytterhoeven
2023-02-06 12:08     ` Geert Uytterhoeven
2023-02-06 12:08     ` Geert Uytterhoeven

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y9te+4n4ajSF++Ex@ZenIV \
    --to=viro@zeniv.linux.org.uk \
    --cc=linux-alpha@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=peterx@redhat.com \
    --cc=richard.henderson@linaro.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.