x86/mm: determine whether the fault address is canonical
diff mbox series

Message ID 20191004134501.30651-1-changbin.du@gmail.com
State New
Headers show
Series
  • x86/mm: determine whether the fault address is canonical
Related show

Commit Message

Changbin Du Oct. 4, 2019, 1:45 p.m. UTC
We know the answer, so don't ask the user.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
---
 arch/x86/mm/extable.c     |  5 ++++-
 arch/x86/mm/mm_internal.h | 11 +++++++++++
 2 files changed, 15 insertions(+), 1 deletion(-)

Comments

Dave Hansen Oct. 4, 2019, 2:39 p.m. UTC | #1
On 10/4/19 6:45 AM, Changbin Du wrote:
> +static inline bool is_canonical_addr(u64 addr)
> +{
> +#ifdef CONFIG_X86_64
> +	int shift = 64 - boot_cpu_data.x86_phys_bits;

I think you mean to check the virtual bits member, not "phys_bits".

BTW, I also prefer the IS_ENABLED(CONFIG_) checks to explicit #ifdefs.
Would one of those work in this case?

As for the error message:

>  {
> -	WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
> +	WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
> +		  is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");

I've always read that as "the GP might have been caused by a
non-canonical access".  The main nit I'd have with the change is that I
don't think all #GP's during user access functions which are given a
non-canonical address *necessarily* caused the #GP.

There are a billion ways you can get a #GP and I bet canonical
violations aren't the only way you can get one in a user copy function.
Andy Lutomirski Oct. 4, 2019, 2:59 p.m. UTC | #2
On Fri, Oct 4, 2019 at 6:45 AM Changbin Du <changbin.du@gmail.com> wrote:
>
> We know the answer, so don't ask the user.
>
> Signed-off-by: Changbin Du <changbin.du@gmail.com>
> ---
>  arch/x86/mm/extable.c     |  5 ++++-
>  arch/x86/mm/mm_internal.h | 11 +++++++++++
>  2 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
> index 4d75bc656f97..5196e586756f 100644
> --- a/arch/x86/mm/extable.c
> +++ b/arch/x86/mm/extable.c
> @@ -8,6 +8,8 @@
>  #include <asm/traps.h>
>  #include <asm/kdebug.h>
>
> +#include "mm_internal.h"
> +
>  typedef bool (*ex_handler_t)(const struct exception_table_entry *,
>                             struct pt_regs *, int, unsigned long,
>                             unsigned long);
> @@ -123,7 +125,8 @@ __visible bool ex_handler_uaccess(const struct exception_table_entry *fixup,
>                                   unsigned long error_code,
>                                   unsigned long fault_addr)
>  {
> -       WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
> +       WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
> +                 is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");

Unless the hardware behaves rather differently from the way I think it
does, fault_addr is garbage for anything other than #PF and sometimes
for #DF.  (And maybe the virtualization faults?)  I don't believe that
#GP fills in CR2.
Dave Hansen Oct. 4, 2019, 3:14 p.m. UTC | #3
On 10/4/19 7:59 AM, Andy Lutomirski wrote:
>> @@ -123,7 +125,8 @@ __visible bool ex_handler_uaccess(const struct exception_table_entry *fixup,
>>                                   unsigned long error_code,
>>                                   unsigned long fault_addr)
>>  {
>> -       WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
>> +       WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
>> +                 is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");
> Unless the hardware behaves rather differently from the way I think it
> does, fault_addr is garbage for anything other than #PF and sometimes
> for #DF.  (And maybe the virtualization faults?)  I don't believe that
> #GP fills in CR2.

For #GP, we do:

do_general_protection(struct pt_regs *regs, long error_code)
{
...
        if (!user_mode(regs)) {
                if (fixup_exception(regs, X86_TRAP_GP, error_code, 0))
                        return;

Where the 0 is 'fault_addr'.  I'm not sure any other way that
ex_handler_uaccess() can get called with trapnr == X86_TRAP_GP.  0 is
canonical last I checked, which would make this patch a bit academic. :)
Sean Christopherson Oct. 4, 2019, 3:31 p.m. UTC | #4
On Fri, Oct 04, 2019 at 07:39:08AM -0700, Dave Hansen wrote:
> On 10/4/19 6:45 AM, Changbin Du wrote:
> > +static inline bool is_canonical_addr(u64 addr)
> > +{
> > +#ifdef CONFIG_X86_64
> > +	int shift = 64 - boot_cpu_data.x86_phys_bits;
> 
> I think you mean to check the virtual bits member, not "phys_bits".
> 
> BTW, I also prefer the IS_ENABLED(CONFIG_) checks to explicit #ifdefs.
> Would one of those work in this case?
> 
> As for the error message:
> 
> >  {
> > -	WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
> > +	WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
> > +		  is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");
> 
> I've always read that as "the GP might have been caused by a
> non-canonical access".  The main nit I'd have with the change is that I
> don't think all #GP's during user access functions which are given a
> non-canonical address *necessarily* caused the #GP.
> 
> There are a billion ways you can get a #GP and I bet canonical
> violations aren't the only way you can get one in a user copy function.

All the other reasons would require a fairly egregious kernel bug, hence
the speculation that the #GP is due to a non-canonical address.  Something
like the following would be more precise, though highly unlikely to ever
be exercised, e.g. KVM had a fatal bug related to injecting a non-zero
error code that went unnoticed for years.

	WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. %s?\n",
		  (IS_ENABLED(CONFIG_X86_64) && !error_code) ? "Non-canonical address" :
		  					       "Segmentation bug");
Changbin Du Oct. 6, 2019, 2:29 a.m. UTC | #5
On Fri, Oct 04, 2019 at 08:14:25AM -0700, Dave Hansen wrote:
> On 10/4/19 7:59 AM, Andy Lutomirski wrote:
> >> @@ -123,7 +125,8 @@ __visible bool ex_handler_uaccess(const struct exception_table_entry *fixup,
> >>                                   unsigned long error_code,
> >>                                   unsigned long fault_addr)
> >>  {
> >> -       WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
> >> +       WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
> >> +                 is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");
> > Unless the hardware behaves rather differently from the way I think it
> > does, fault_addr is garbage for anything other than #PF and sometimes
> > for #DF.  (And maybe the virtualization faults?)  I don't believe that
> > #GP fills in CR2.
> 
> For #GP, we do:
> 
> do_general_protection(struct pt_regs *regs, long error_code)
> {
> ...
>         if (!user_mode(regs)) {
>                 if (fixup_exception(regs, X86_TRAP_GP, error_code, 0))
>                         return;
> 
> Where the 0 is 'fault_addr'.  I'm not sure any other way that
> ex_handler_uaccess() can get called with trapnr == X86_TRAP_GP.  0 is
> canonical last I checked, which would make this patch a bit academic. :)
My fault. I thought the 'fault_addr' is filled with a valid value. So we really
don't know the answer without decoding the instruction which causes this #GP. :)
Ingo Molnar Oct. 7, 2019, 2:32 p.m. UTC | #6
* Sean Christopherson <sean.j.christopherson@intel.com> wrote:

> On Fri, Oct 04, 2019 at 07:39:08AM -0700, Dave Hansen wrote:
> > On 10/4/19 6:45 AM, Changbin Du wrote:
> > > +static inline bool is_canonical_addr(u64 addr)
> > > +{
> > > +#ifdef CONFIG_X86_64
> > > +	int shift = 64 - boot_cpu_data.x86_phys_bits;
> > 
> > I think you mean to check the virtual bits member, not "phys_bits".
> > 
> > BTW, I also prefer the IS_ENABLED(CONFIG_) checks to explicit #ifdefs.
> > Would one of those work in this case?
> > 
> > As for the error message:
> > 
> > >  {
> > > -	WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
> > > +	WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
> > > +		  is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");
> > 
> > I've always read that as "the GP might have been caused by a
> > non-canonical access".  The main nit I'd have with the change is that I
> > don't think all #GP's during user access functions which are given a
> > non-canonical address *necessarily* caused the #GP.
> > 
> > There are a billion ways you can get a #GP and I bet canonical
> > violations aren't the only way you can get one in a user copy function.
> 
> All the other reasons would require a fairly egregious kernel bug, hence
> the speculation that the #GP is due to a non-canonical address.  Something
> like the following would be more precise, though highly unlikely to ever
> be exercised, e.g. KVM had a fatal bug related to injecting a non-zero
> error code that went unnoticed for years.
> 
> 	WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. %s?\n",
> 		  (IS_ENABLED(CONFIG_X86_64) && !error_code) ? "Non-canonical address" :
> 		  					       "Segmentation bug");

Instead of trying to guess the reason of the #GPF (which guess might be 
wrong), please just state it as the reason if we are sure that the cause 
is a non-canonical address - and provide a best-guess if it's not but 
clearly signal that it's a guess.

I.e. if I understood all the cases correctly we'd have three types of 
messages generated:

 !error_code:
	"General protection fault in user access, due to non-canonical address."

 error_code && !is_canonical_addr(fault_addr):
	"General protection fault in user access. Non-canonical address?"

 error_code && is_canonical_addr(fault_addr):
	"General protection fault in user access. Segmentation bug?"

Only the first one is declarative, because we know we got a #GP with a 
zero error code which should denote a non-canonical address access.

The second and third ones are guesses with question marks to communicate 
the uncertainty.

Assuming that !error_code always means non-canonical access?

And hopefully "!error_code && !is_canonical_addr(fault_addr)" is not 
possible?

Thanks,

	Ingo
Ingo Molnar Oct. 7, 2019, 2:44 p.m. UTC | #7
* Ingo Molnar <mingo@kernel.org> wrote:

> > All the other reasons would require a fairly egregious kernel bug, hence
> > the speculation that the #GP is due to a non-canonical address.  Something
> > like the following would be more precise, though highly unlikely to ever
> > be exercised, e.g. KVM had a fatal bug related to injecting a non-zero
> > error code that went unnoticed for years.
> > 
> > 	WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. %s?\n",
> > 		  (IS_ENABLED(CONFIG_X86_64) && !error_code) ? "Non-canonical address" :
> > 		  					       "Segmentation bug");
> 
> Instead of trying to guess the reason of the #GPF (which guess might be 
> wrong), please just state it as the reason if we are sure that the cause 
> is a non-canonical address - and provide a best-guess if it's not but 
> clearly signal that it's a guess.
> 
> I.e. if I understood all the cases correctly we'd have three types of 
> messages generated:
> 
>  !error_code:
> 	"General protection fault in user access, due to non-canonical address."
> 
>  error_code && !is_canonical_addr(fault_addr):
> 	"General protection fault in user access. Non-canonical address?"
> 
>  error_code && is_canonical_addr(fault_addr):
> 	"General protection fault in user access. Segmentation bug?"

Now that I've read the rest of the thread, since fault_addr is always 0 
we can ignore most of this I suspect ...

Thanks,

	Ingo
Sean Christopherson Oct. 7, 2019, 3:13 p.m. UTC | #8
On Mon, Oct 07, 2019 at 04:44:23PM +0200, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@kernel.org> wrote:
> 
> > > All the other reasons would require a fairly egregious kernel bug, hence
> > > the speculation that the #GP is due to a non-canonical address.  Something
> > > like the following would be more precise, though highly unlikely to ever
> > > be exercised, e.g. KVM had a fatal bug related to injecting a non-zero
> > > error code that went unnoticed for years.
> > > 
> > > 	WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. %s?\n",
> > > 		  (IS_ENABLED(CONFIG_X86_64) && !error_code) ? "Non-canonical address" :
> > > 		  					       "Segmentation bug");
> > 
> > Instead of trying to guess the reason of the #GPF (which guess might be 
> > wrong), please just state it as the reason if we are sure that the cause 
> > is a non-canonical address - and provide a best-guess if it's not but 
> > clearly signal that it's a guess.
> > 
> > I.e. if I understood all the cases correctly we'd have three types of 
> > messages generated:
> > 
> >  !error_code:
> > 	"General protection fault in user access, due to non-canonical address."

A non-canonical #GP always has an error code of '0', but the reverse isn't
technically true.  And 32-bit mode obviously can't generate a non-canonical
address.

But practically speaking, since _ASM_EXTABLE_UA() should only be used for
reg<->mem instructions, the only way to get a #GP on a usercopy instruction
would be to corrupt the code itself or have a bad segment loaded in 32-bit
mode.  So qualifying the non-canonical message on '64-bit && !error_code'
is techncally more precise/correct, but likely meaningless in practice.

> >  error_code && !is_canonical_addr(fault_addr):
> > 	"General protection fault in user access. Non-canonical address?"
> > 
> >  error_code && is_canonical_addr(fault_addr):
> > 	"General protection fault in user access. Segmentation bug?"
> 
> Now that I've read the rest of the thread, since fault_addr is always 0 
> we can ignore most of this I suspect ...

Patch
diff mbox series

diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index 4d75bc656f97..5196e586756f 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -8,6 +8,8 @@ 
 #include <asm/traps.h>
 #include <asm/kdebug.h>
 
+#include "mm_internal.h"
+
 typedef bool (*ex_handler_t)(const struct exception_table_entry *,
 			    struct pt_regs *, int, unsigned long,
 			    unsigned long);
@@ -123,7 +125,8 @@  __visible bool ex_handler_uaccess(const struct exception_table_entry *fixup,
 				  unsigned long error_code,
 				  unsigned long fault_addr)
 {
-	WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault in user access. Non-canonical address?");
+	WARN_ONCE(trapnr == X86_TRAP_GP, "General protection fault at %s address in user access.",
+		  is_canonical_addr(fault_addr) ? "canonical" : "non-canonical");
 	regs->ip = ex_fixup_addr(fixup);
 	return true;
 }
diff --git a/arch/x86/mm/mm_internal.h b/arch/x86/mm/mm_internal.h
index eeae142062ed..4c8a0fdd1c64 100644
--- a/arch/x86/mm/mm_internal.h
+++ b/arch/x86/mm/mm_internal.h
@@ -2,6 +2,17 @@ 
 #ifndef __X86_MM_INTERNAL_H
 #define __X86_MM_INTERNAL_H
 
+static inline bool is_canonical_addr(u64 addr)
+{
+#ifdef CONFIG_X86_64
+	int shift = 64 - boot_cpu_data.x86_phys_bits;
+
+	return ((int64_t)addr << shift >> shift) == addr;
+#else
+	return true;
+#endif
+}
+
 void *alloc_low_pages(unsigned int num);
 static inline void *alloc_low_page(void)
 {