[RFC PATCH] arm64: fault: Don't populate ESR context for user fault on kernel VA

* [RFC PATCH] arm64: fault: Don't populate ESR context for user fault on kernel VA
@ 2018-03-05 10:31 Will Deacon
  2018-03-05 13:27 ` Robin Murphy
  2018-03-05 14:05 ` Dave Martin
  0 siblings, 2 replies; 12+ messages in thread
From: Will Deacon @ 2018-03-05 10:31 UTC (permalink / raw)
  To: linux-arm-kernel

User faults on kernel addresses are a good sign that the faulting task
is either up to no good or is in deep trouble. In such situations,
exposing the optional ESR context on the sigframe as part of the
delivered signal is only useful to attackers who are using information
about the underlying hardware fault (e.g. translation vs permission) as a
mechanism to defeat KASLR.

Remove the ESR context from the sigframe for user faults on kernel
addresses.

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---

Here's another one that doesn't make a huge amount of difference when
kpti is enabled, but I think is a change worth making all the same.

 arch/arm64/mm/fault.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 49dfb08a6c4d..b9800395788e 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -292,8 +292,10 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr,

 static void __do_user_fault(struct siginfo *info, unsigned int esr)
 {
-	current->thread.fault_address = (unsigned long)info->si_addr;
-	current->thread.fault_code = esr;
+	unsigned long addr = (unsigned long)info->si_addr;
+
+	current->thread.fault_address = addr;
+	current->thread.fault_code = addr < TASK_SIZE ? esr : 0;
 	arm64_force_sig_info(info, esr_to_fault_info(esr)->name, current);
 }

-- 
2.1.4

* [RFC PATCH] arm64: fault: Don't populate ESR context for user fault on kernel VA
  2018-03-05 10:31 [RFC PATCH] arm64: fault: Don't populate ESR context for user fault on kernel VA Will Deacon
@ 2018-03-05 13:27 ` Robin Murphy
  2018-03-05 15:56   ` Will Deacon
  2018-03-05 14:05 ` Dave Martin
  1 sibling, 1 reply; 12+ messages in thread
From: Robin Murphy @ 2018-03-05 13:27 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

On 05/03/18 10:31, Will Deacon wrote:
> User faults on kernel addresses are a good sign that the faulting task
> is either up to no good or is in deep trouble. In such situations,
> exposing the optional ESR context on the sigframe as part of the
> delivered signal is only useful to attackers who are using information
> about underlying hardware fault (e.g. translation vs permission) as a
> mechanism to defeat KASLR.
> 
> Remove the ESR context from the sigframe for user faults on kernel
> addresses.
> 
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
> 
> Here's another one that doesn't make a huge amount of difference when
> kpti is enabled, but I think is a change worth making all the same.
> 
>   arch/arm64/mm/fault.c | 6 ++++--
>   1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 49dfb08a6c4d..b9800395788e 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -292,8 +292,10 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr,
>   
>   static void __do_user_fault(struct siginfo *info, unsigned int esr)
>   {
> -	current->thread.fault_address = (unsigned long)info->si_addr;
> -	current->thread.fault_code = esr;
> +	unsigned long addr = (unsigned long)info->si_addr;
> +
> +	current->thread.fault_address = addr;
> +	current->thread.fault_code = addr < TASK_SIZE ? esr : 0;

Nit: there are still non-kernel addresses above TASK_SIZE which would 
only imply a wild pointer rather than nefarious misdeeds, but I guess if 
you can already see that the faulting address is in the hole you don't 
really need a level 0 translation fault spelled out. More generally 
though, if there's a chance that someone might still try to interpret 
fault_code as an ESR value regardless of what happened, should we be 
setting it to ESR_ELx_IL rather than 0, to be consistent with the 
implied "Unknown reason" EC value?

Robin.

>   	arm64_force_sig_info(info, esr_to_fault_info(esr)->name, current);
>   }
>   
> 
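
The two points in the reply above (the hole between TASK_SIZE and the kernel range, and reporting ESR_ELx_IL instead of 0) can be sketched as standalone C. This is only an illustration, not kernel code: the 48-bit VA split, the SKETCH_* names and both helper functions are assumptions made for the example; only the `addr < TASK_SIZE ? esr : 0` expression comes from the patch.

```c
#include <stdint.h>

/* Assumption: a 48-bit VA configuration. Real values depend on the kernel config. */
#define SKETCH_VA_BITS      48
#define SKETCH_TASK_SIZE    (UINT64_C(1) << SKETCH_VA_BITS)
#define SKETCH_KERNEL_START (~UINT64_C(0) << SKETCH_VA_BITS) /* 0xffff000000000000 */

/* ESR_ELx.IL is bit 25; an EC field (bits 31:26) of 0 means "Unknown reason". */
#define SKETCH_ESR_IL       (UINT32_C(1) << 25)

enum va_region { VA_USER, VA_HOLE, VA_KERNEL };

/* Classify an address: user range, the unmapped hole, or the kernel range. */
static inline enum va_region classify_va(uint64_t addr)
{
	if (addr < SKETCH_TASK_SIZE)
		return VA_USER;
	if (addr < SKETCH_KERNEL_START)
		return VA_HOLE;
	return VA_KERNEL;
}

/* The fault_code selection from the patch as posted (hypothetical wrapper). */
static inline uint32_t fault_code_as_posted(uint64_t addr, uint32_t esr)
{
	return addr < SKETCH_TASK_SIZE ? esr : 0;
}

/* The alternative raised in the reply: report EC == 0 with IL set. */
static inline uint32_t fault_code_unknown_reason(uint64_t addr, uint32_t esr)
{
	return addr < SKETCH_TASK_SIZE ? esr : SKETCH_ESR_IL;
}
```

Under these assumptions an address in the hole is treated exactly like a kernel address by both variants, which is the behaviour the nit describes.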

* [RFC PATCH] arm64: fault: Don't populate ESR context for user fault on kernel VA
  2018-03-05 10:31 [RFC PATCH] arm64: fault: Don't populate ESR context for user fault on kernel VA Will Deacon
  2018-03-05 13:27 ` Robin Murphy
@ 2018-03-05 14:05 ` Dave Martin
  2018-03-05 17:24   ` Will Deacon
  2018-03-06 14:49   ` Catalin Marinas
  1 sibling, 2 replies; 12+ messages in thread
From: Dave Martin @ 2018-03-05 14:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 05, 2018 at 10:31:15AM +0000, Will Deacon wrote:
> User faults on kernel addresses are a good sign that the faulting task
> is either up to no good or is in deep trouble. In such situations,
> exposing the optional ESR context on the sigframe as part of the
> delivered signal is only useful to attackers who are using information
> about underlying hardware fault (e.g. translation vs permission) as a
> mechanism to defeat KASLR.
> 
> Remove the ESR context from the sigframe for user faults on kernel
> addresses.

As this wording suggests, this change causes esr_context to disappear
entirely from the signal frame.  Previously, I think user code could
have relied on its being present for certain signals.

Does Debian's codesearch throw up any nontrivial users of esr_context?

Cheers
---Dave

> 
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Dave Martin <Dave.Martin@arm.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
> 
> Here's another one that doesn't make a huge amount of difference when
> kpti is enabled, but I think is a change worth making all the same.
> 
>  arch/arm64/mm/fault.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 49dfb08a6c4d..b9800395788e 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -292,8 +292,10 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr,
>  
>  static void __do_user_fault(struct siginfo *info, unsigned int esr)
>  {
> -	current->thread.fault_address = (unsigned long)info->si_addr;
> -	current->thread.fault_code = esr;
> +	unsigned long addr = (unsigned long)info->si_addr;
> +
> +	current->thread.fault_address = addr;
> +	current->thread.fault_code = addr < TASK_SIZE ? esr : 0;
>  	arm64_force_sig_info(info, esr_to_fault_info(esr)->name, current);
>  }
>  
> -- 
> 2.1.4
> 

* [RFC PATCH] arm64: fault: Don't populate ESR context for user fault on kernel VA
  2018-03-05 13:27 ` Robin Murphy
@ 2018-03-05 15:56   ` Will Deacon
  0 siblings, 0 replies; 12+ messages in thread
From: Will Deacon @ 2018-03-05 15:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 05, 2018 at 01:27:47PM +0000, Robin Murphy wrote:
> On 05/03/18 10:31, Will Deacon wrote:
> >User faults on kernel addresses are a good sign that the faulting task
> >is either up to no good or is in deep trouble. In such situations,
> >exposing the optional ESR context on the sigframe as part of the
> >delivered signal is only useful to attackers who are using information
> >about underlying hardware fault (e.g. translation vs permission) as a
> >mechanism to defeat KASLR.
> >
> >Remove the ESR context from the sigframe for user faults on kernel
> >addresses.
> >
> >Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >Cc: Dave Martin <Dave.Martin@arm.com>
> >Signed-off-by: Will Deacon <will.deacon@arm.com>
> >---
> >
> >Here's another one that doesn't make a huge amount of difference when
> >kpti is enabled, but I think is a change worth making all the same.
> >
> >  arch/arm64/mm/fault.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> >diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> >index 49dfb08a6c4d..b9800395788e 100644
> >--- a/arch/arm64/mm/fault.c
> >+++ b/arch/arm64/mm/fault.c
> >@@ -292,8 +292,10 @@ static void __do_kernel_fault(unsigned long addr, unsigned int esr,
> >  static void __do_user_fault(struct siginfo *info, unsigned int esr)
> >  {
> >-	current->thread.fault_address = (unsigned long)info->si_addr;
> >-	current->thread.fault_code = esr;
> >+	unsigned long addr = (unsigned long)info->si_addr;
> >+
> >+	current->thread.fault_address = addr;
> >+	current->thread.fault_code = addr < TASK_SIZE ? esr : 0;
> 
> Nit: there are still non-kernel addresses above TASK_SIZE which would only
> imply a wild pointer rather than nefarious misdeeds, but I guess if you can
> already see that the faulting address is in the hole you don't really need a
> level 0 translation fault spelled out. More generally though, if there's a
> chance that someone might still try to interpret fault_code as an ESR value
> regardless of what happened, should we be setting it to ESR_ELx_IL rather
> than 0, to be consistent with the implied "Unknown reason" EC value?

0 is a magic value, which means that the ESR record gets omitted from the
sigframe entirely (see setup_sigframe_layout).

Will
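
To show what "omitted from the sigframe entirely" means for a consumer, here is a rough userspace-style sketch of scanning the record area of a signal frame for an ESR record. The struct layouts and the magic value mirror the arm64 uapi definitions in <asm/sigcontext.h>, but they are redeclared locally under sketch_* names so that the example builds anywhere; the lookup function and the demo helper are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of struct _aarch64_ctx and struct esr_context. */
struct sketch_aarch64_ctx {
	uint32_t magic;
	uint32_t size;
};

struct sketch_esr_context {
	struct sketch_aarch64_ctx head;
	uint64_t esr;
};

#define SKETCH_ESR_MAGIC 0x45535201u	/* same value as ESR_MAGIC */

/*
 * Walk the records in a __reserved[]-style buffer.
 * Returns 1 and stores the ESR if an ESR record is present, 0 otherwise.
 */
static inline int sketch_find_esr(const unsigned char *buf, size_t len,
				  uint64_t *esr)
{
	size_t off = 0;

	while (off + sizeof(struct sketch_aarch64_ctx) <= len) {
		struct sketch_aarch64_ctx head;

		memcpy(&head, buf + off, sizeof(head));
		if (head.magic == 0)
			return 0;	/* terminator: no ESR record */
		if (head.size < sizeof(head) || head.size > len - off)
			return 0;	/* malformed record, give up */
		if (head.magic == SKETCH_ESR_MAGIC &&
		    head.size >= sizeof(struct sketch_esr_context)) {
			struct sketch_esr_context rec;

			memcpy(&rec, buf + off, sizeof(rec));
			*esr = rec.esr;
			return 1;
		}
		off += head.size;
	}
	return 0;
}

/*
 * Hypothetical demo helper: build a tiny record area with or without an
 * ESR record and return the ESR that was found, or 0 if there was none.
 */
static inline uint64_t sketch_demo(int with_esr, uint64_t esr_in)
{
	unsigned char buf[64];
	uint64_t found = 0;
	struct sketch_esr_context rec = {
		.head = {
			.magic = SKETCH_ESR_MAGIC,
			.size = sizeof(struct sketch_esr_context),
		},
		.esr = esr_in,
	};

	memset(buf, 0, sizeof(buf));	/* an all-zero header is the terminator */
	if (with_esr)
		memcpy(buf, &rec, sizeof(rec));
	return sketch_find_esr(buf, sizeof(buf), &found) ? found : 0;
}
```

A consumer written this way cannot tell "the kernel had no ESR to report" apart from "the kernel chose not to report it"; in both cases the record is simply not there.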

* [RFC PATCH] arm64: fault: Don't populate ESR context for user fault on kernel VA
  2018-03-05 14:05 ` Dave Martin
@ 2018-03-05 17:24   ` Will Deacon
  2018-03-06 15:59     ` Peter Maydell
  2018-03-06 14:49   ` Catalin Marinas
  1 sibling, 1 reply; 12+ messages in thread
From: Will Deacon @ 2018-03-05 17:24 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 05, 2018 at 02:05:06PM +0000, Dave Martin wrote:
> On Mon, Mar 05, 2018 at 10:31:15AM +0000, Will Deacon wrote:
> > User faults on kernel addresses are a good sign that the faulting task
> > is either up to no good or is in deep trouble. In such situations,
> > exposing the optional ESR context on the sigframe as part of the
> > delivered signal is only useful to attackers who are using information
> > about underlying hardware fault (e.g. translation vs permission) as a
> > mechanism to defeat KASLR.
> > 
> > Remove the ESR context from the sigframe for user faults on kernel
> > addresses.
> 
> As this wording suggests, this change causes esr_context to disappear
> entirely from the signal frame.  Previously, I think user code could
> have relied on its being present for certain signals.
> 
> Does Debian's codesearch throw up any nontrivial users of esr_context?

The main one seems to be ASAN, which uses the WnR bit to report "READ",
"WRITE" or "UNKNOWN". So with this change, the access will be treated as
UNKNOWN for kernel addresses.

Whilst I can see how that might cause a testsuite regression, I'm struggling
to see how it could sensibly impact ASAN given that userspace never has
permission to access these addresses and so the fault should be treated as
fatal regardless of whether or not it's a read or a write.

Will
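
The classification described above can be sketched as follows. This is not ASAN's code; it is a minimal illustration of the idea, and the enum, the function and the SKETCH_ESR_WNR name are made up for the example. The only architectural fact relied on is that WnR is bit 6 of the ISS for data aborts.

```c
#include <stdint.h>

/* ESR_ELx.WnR ("write not read"), ISS bit 6 for data aborts. */
#define SKETCH_ESR_WNR (UINT64_C(1) << 6)

enum access_kind { ACCESS_UNKNOWN, ACCESS_READ, ACCESS_WRITE };

/*
 * have_esr == 0 models a signal frame in which no ESR record was found,
 * which is what a handler would see for kernel addresses with this patch.
 */
static inline enum access_kind classify_access(int have_esr, uint64_t esr)
{
	if (!have_esr)
		return ACCESS_UNKNOWN;
	return (esr & SKETCH_ESR_WNR) ? ACCESS_WRITE : ACCESS_READ;
}
```

With the patch applied, a faulting access to a kernel address always takes the first branch, so a tool built like this would report UNKNOWN where it previously reported READ or WRITE.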

* [RFC PATCH] arm64: fault: Don't populate ESR context for user fault on kernel VA
  2018-03-05 14:05 ` Dave Martin
  2018-03-05 17:24   ` Will Deacon
@ 2018-03-06 14:49   ` Catalin Marinas
  1 sibling, 0 replies; 12+ messages in thread
From: Catalin Marinas @ 2018-03-06 14:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Mar 05, 2018 at 02:05:06PM +0000, Dave P Martin wrote:
> On Mon, Mar 05, 2018 at 10:31:15AM +0000, Will Deacon wrote:
> > User faults on kernel addresses are a good sign that the faulting task
> > is either up to no good or is in deep trouble. In such situations,
> > exposing the optional ESR context on the sigframe as part of the
> > delivered signal is only useful to attackers who are using information
> > about underlying hardware fault (e.g. translation vs permission) as a
> > mechanism to defeat KASLR.
> > 
> > Remove the ESR context from the sigframe for user faults on kernel
> > addresses.
> 
> As this wording suggests, this change causes esr_context to disappear
> entirely from the signal frame.  Previously, I think user code could
> have relied on its being present for certain signals.
> 
> Does Debian's codesearch throw up any nontrivial users of esr_context?

The request for ESR context came from the qemu people. Cc'ing Peter
Maydell (and bouncing the rest of the thread to him separately).

-- 
Catalin

* [RFC PATCH] arm64: fault: Don't populate ESR context for user fault on kernel VA
  2018-03-05 17:24   ` Will Deacon
@ 2018-03-06 15:59     ` Peter Maydell
  2018-03-06 16:05       ` Dave Martin
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Maydell @ 2018-03-06 15:59 UTC (permalink / raw)
  To: linux-arm-kernel

On 5 March 2018 at 17:24, Will Deacon <will.deacon@arm.com> wrote:
> On Mon, Mar 05, 2018 at 02:05:06PM +0000, Dave Martin wrote:
>> Does Debian's codesearch throw up any nontrivial users of esr_context?
>
> The main one seems to be ASAN, which uses the WnR bit to report "READ",
> "WRITE" or "UNKNOWN". So with this change, the access will be treated as
> UNKNOWN for kernel addresses.
>
> Whilst I can see how that might cause a testsuite regression, I'm struggling
> to see how it could sensibly impact ASAN given that userspace never has
> permission to access these addresses and so the fault should be treated as
> fatal regardless of whether or not it's a read or a write.

Right, but the read/write/unknown classification also affects the
severity of that warning level ('scariness' in the asan code),
and it's not immediately clear how much might then in turn be relying
on that.

I think that if you have widely deployed code that is using this
ESR value, then it's kernel ABI that people are relying on, and
the safest thing to do is to make the minimal change that will
fix the problem you have, not to yank the whole thing entirely
and hope that the users will cope.

QEMU is not currently using the ESR value, but it would be nice to
in future, and it would certainly be irritating not to have the
WnR information just because the faulting address happens to be in
the top half of memory.

AFAIK the major thing that consumers actually are after here
is the WnR information, so preserving that and sanitizing
the rest of the ESR if necessary would be a less risky fix IMHO.

thanks
-- PMM

* [RFC PATCH] arm64: fault: Don't populate ESR context for user fault on kernel VA
  2018-03-06 15:59     ` Peter Maydell
@ 2018-03-06 16:05       ` Dave Martin
  2018-03-06 17:54         ` Will Deacon
  2018-03-06 17:59         ` James Morse
  0 siblings, 2 replies; 12+ messages in thread
From: Dave Martin @ 2018-03-06 16:05 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 06, 2018 at 03:59:59PM +0000, Peter Maydell wrote:
> On 5 March 2018 at 17:24, Will Deacon <will.deacon@arm.com> wrote:
> > On Mon, Mar 05, 2018 at 02:05:06PM +0000, Dave Martin wrote:
> >> Does Debian's codesearch throw up any nontrivial users of esr_context?
> >
> > The main one seems to be ASAN, which uses the WnR bit to report "READ",
> > "WRITE" or "UNKNOWN". So with this change, the access will be treated as
> > UNKNOWN for kernel addresses.
> >
> > Whilst I can see how that might cause a testsuite regression, I'm struggling
> > to see how it could sensibly impact ASAN given that userspace never has
> > permission to access these addresses and so the fault should be treated as
> > fatal regardless of whether or not it's a read or a write.
> 
> Right, but the read/write/unknown classification also affects the
> severity of that warning level ('scariness' in the asan code),
> and it's not immediately clear how much might then in turn be relying
> on that.
> 
> I think that if you have widely deployed code that is using this
> ESR value, then it's kernel ABI that people are relying on, and
> the safest thing to do is to make the minimal change that will
> fix the problem you have, not to yank the whole thing entirely
> and hope that the users will cope.
> 
> QEMU is not currently using the ESR value, but it would be nice to
> in future, and it would certainly be irritating not to have the
> WnR information just because the faulting address happens to be in
> the top half of memory.
> 
> AFAIK the major thing that consumers actually are after here
> is the WnR information, so preserving that and sanitizing
> the rest of the ESR if necessary would be a less risky fix IMHO.

If there is a way of squashing the syndrome information so that it
reports a fixed syndrome except for information about what userspace
attempted to do (i.e., WnR -- I dunno if there's anything else), that
seems reasonable.

Cheers
---Dave

* [RFC PATCH] arm64: fault: Don't populate ESR context for user fault on kernel VA
  2018-03-06 16:05       ` Dave Martin
@ 2018-03-06 17:54         ` Will Deacon
  2018-03-07 10:50           ` Dave P Martin
  2018-03-06 17:59         ` James Morse
  1 sibling, 1 reply; 12+ messages in thread
From: Will Deacon @ 2018-03-06 17:54 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 06, 2018 at 04:05:53PM +0000, Dave Martin wrote:
> On Tue, Mar 06, 2018 at 03:59:59PM +0000, Peter Maydell wrote:
> > On 5 March 2018 at 17:24, Will Deacon <will.deacon@arm.com> wrote:
> > > On Mon, Mar 05, 2018 at 02:05:06PM +0000, Dave Martin wrote:
> > >> Does Debian's codesearch throw up any nontrivial users of esr_context?
> > >
> > > The main one seems to be ASAN, which uses the WnR bit to report "READ",
> > > "WRITE" or "UNKNOWN". So with this change, the access will be treated as
> > > UNKNOWN for kernel addresses.
> > >
> > > Whilst I can see how that might cause a testsuite regression, I'm struggling
> > > to see how it could sensibly impact ASAN given that userspace never has
> > > permission to access these addresses and so the fault should be treated as
> > > fatal regardless of whether or not it's a read or a write.
> > 
> > Right, but the read/write/unknown classification also affects the
> > severity of that warning level ('scariness' in the asan code),
> > and it's not immediately clear how much might then in turn be relying
> > on that.
> > 
> > I think that if you have widely deployed code that is using this
> > ESR value, then it's kernel ABI that people are relying on, and
> > the safest thing to do is to make the minimal change that will
> > fix the problem you have, not to yank the whole thing entirely
> > and hope that the users will cope.
> > 
> > QEMU is not currently using the ESR value, but it would be nice to
> > in future, and it would certainly be irritating not to have the
> > WnR information just because the faulting address happens to be in
> > the top half of memory.
> > 
> > AFAIK the major thing that consumers actually are after here
> > is the WnR information, so preserving that and sanitizing
> > the rest of the ESR if necessary would be a less risky fix IMHO.
> 
> If there is a way of squashing the syndrome information so that it
> reports a fixed syndrome except for information about what userspace
> attempted to do (i.e., WnR -- I dunno if there's anything else), that
> seems reasonable.

I don't know how we can do that, and I'm deeply sceptical of claims that
the WnR bit matters at all for kernel addresses. Any change we make here
will be user visible but I don't think that means we shouldn't consider
changes for cases that are highly unlikely to cause problems. We'll
obviously revert anything that does cause issues, but that shouldn't
be the goal.

I'll try to reach out to the ASAN people to get their feedback on this.

If we do want to use a sanitised ESR value, then we need to do this within
the constraints of the architecture because Linux advertises this as the
ESR when it is provided. What encoding would you suggest? Should we report
all faults on kernel addresses as Translation fault level 0?

Will

* [RFC PATCH] arm64: fault: Don't populate ESR context for user fault on kernel VA
  2018-03-06 16:05       ` Dave Martin
  2018-03-06 17:54         ` Will Deacon
@ 2018-03-06 17:59         ` James Morse
  2018-03-06 18:16           ` James Morse
  1 sibling, 1 reply; 12+ messages in thread
From: James Morse @ 2018-03-06 17:59 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Dave, Peter,

On 06/03/18 16:05, Dave Martin wrote:
> On Tue, Mar 06, 2018 at 03:59:59PM +0000, Peter Maydell wrote:
>> AFAIK the major thing that consumers actually are after here
>> is the WnR information, so preserving that and sanitizing
>> the rest of the ESR if necessary would be a less risky fix IMHO.
> 
> If there is a way of squashing the syndrome information so that it
> reports a fixed syndrome except for information about what userspace
> attempted to do (i.e., WnR -- I dunno if there's anything else), that
> seems reasonable.

Anything else? For the RAS stuff I planned to use this to indicate whether a
SIGBUS due to hwpoison was due to an instruction or data abort, so a
sanitised-EC field would be good.

(this lets KVM's user-space inject the correct flavour of external-abort)


Thanks,

James

* [RFC PATCH] arm64: fault: Don't populate ESR context for user fault on kernel VA
  2018-03-06 17:59         ` James Morse
@ 2018-03-06 18:16           ` James Morse
  0 siblings, 0 replies; 12+ messages in thread
From: James Morse @ 2018-03-06 18:16 UTC (permalink / raw)
  To: linux-arm-kernel

On 06/03/18 17:59, James Morse wrote:
> On 06/03/18 16:05, Dave Martin wrote:
>> On Tue, Mar 06, 2018 at 03:59:59PM +0000, Peter Maydell wrote:
>>> AFAIK the major thing that consumers actually are after here
>>> is the WnR information, so preserving that and sanitizing
>>> the rest of the ESR if necessary would be a less risky fix IMHO.
>>
>> If there is a way of squashing the syndrome information so that it
>> reports a fixed syndrome except for information about what userspace
>> attempted to do (i.e., WnR -- I dunno if there's anything else), that
>> seems reasonable.
> 
> Anything else? For the RAS stuff I planned to use this to indicate whether a
> SIGBUS due to hwpoison was due to an instruction or data abort, so a
> sanitised-EC field would be good.

I misread this as always sanitising the ESR field; Will was only talking about
sanitising the ESR for faults on kernel addresses.


Sorry for the noise!

James

* [RFC PATCH] arm64: fault: Don't populate ESR context for user fault on kernel VA
  2018-03-06 17:54         ` Will Deacon
@ 2018-03-07 10:50           ` Dave P Martin
  0 siblings, 0 replies; 12+ messages in thread
From: Dave P Martin @ 2018-03-07 10:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Mar 06, 2018 at 05:54:46PM +0000, Will Deacon wrote:
> On Tue, Mar 06, 2018 at 04:05:53PM +0000, Dave Martin wrote:
> > On Tue, Mar 06, 2018 at 03:59:59PM +0000, Peter Maydell wrote:
> > > On 5 March 2018 at 17:24, Will Deacon <will.deacon@arm.com> wrote:
> > > > On Mon, Mar 05, 2018 at 02:05:06PM +0000, Dave Martin wrote:
> > > >> Does Debian's codesearch throw up any nontrivial users of esr_context?
> > > >
> > > > The main one seems to be ASAN, which uses the WnR bit to report "READ",
> > > > "WRITE" or "UNKNOWN". So with this change, the access will be treated as
> > > > UNKNOWN for kernel addresses.
> > > >
> > > > Whilst I can see how that might cause a testsuite regression, I'm struggling
> > > > to see how it could sensibly impact ASAN given that userspace never has
> > > > permission to access these addresses and so the fault should be treated as
> > > > fatal regardless of whether or not it's a read or a write.
> > >
> > > Right, but the read/write/unknown classification also affects the
> > > severity of that warning level ('scariness' in the asan code),
> > > and it's not immediately clear how much might then in turn be relying
> > > on that.
> > >
> > > I think that if you have widely deployed code that is using this
> > > ESR value, then it's kernel ABI that people are relying on, and
> > > the safest thing to do is to make the minimal change that will
> > > fix the problem you have, not to yank the whole thing entirely
> > > and hope that the users will cope.
> > >
> > > QEMU is not currently using the ESR value, but it would be nice to
> > > in future, and it would certainly be irritating not to have the
> > > WnR information just because the faulting address happens to be in
> > > the top half of memory.
> > >
> > > AFAIK the major thing that consumers actually are after here
> > > is the WnR information, so preserving that and sanitizing
> > > the rest of the ESR if necessary would be a less risky fix IMHO.
> >
> > If there is a way of squashing the syndrome information so that it
> > reports a fixed syndrome except for information about what userspace
> > attempted to do (i.e., WnR -- I dunno if there's anything else), that
> > seems reasonable.
>
> I don't know how we can do that, and I'm deeply sceptical of claims that
> the WnR bit matters at all for kernel addresses. Any change we make here
> will be user visible but I don't think that means we shouldn't consider
> changes for cases that are highly unlikely to cause problems. We'll
> > obviously revert anything that does cause issues, but that shouldn't
> be the goal.
>
> I'll try to reach out to the ASAN people to get their feedback on this.
>
> If we do want to use a sanitised ESR value, then we need to do this within
> the constraints of the architecture because Linux advertises this as the
> ESR when it is provided. What encoding would you suggest? Should we report
> all faults on kernel addresses as Translation fault level 0?

We could, say, report everything that hits this case as a level 3
permission fault, preserving fields such as ISV (insn syndrome valid),
SAS (access size), SSE (sign extend), SRT (register number), SF (64-bit
register), AR (acquire/release), etc.

We would probably want to squash things like EA (external abort type),
SET (synchronous error type), S1PTW (stage2 fault on stage1 pagetable
walk).

This is still to some extent inventing architecture though, so I
don't think we should consider this kind of approach without
confirmation that some userspace software is legitimately relying on it.


Cheers
---Dave
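
The sanitising idea floated above could look roughly like the sketch below. It is an illustration only, not a proposed patch. The bit positions follow the ESR_ELx data abort ISS encoding, but the S_ESR_* names and sanitise_esr() are invented here, and two choices are assumptions that the thread does not settle: EC and IL are kept as reported, and any bit not explicitly listed as preserved (including FnV and CM) is cleared along with EA, SET and S1PTW. WnR is preserved because that is the field the earlier messages say consumers care about.

```c
#include <stdint.h>

/* ESR_ELx fields for data aborts (names are local to this sketch). */
#define S_ESR_EC_MASK     (UINT32_C(0x3f) << 26) /* exception class */
#define S_ESR_IL          (UINT32_C(1) << 25)    /* instruction length */
#define S_ESR_ISV         (UINT32_C(1) << 24)    /* insn syndrome valid */
#define S_ESR_SAS_MASK    (UINT32_C(3) << 22)    /* access size */
#define S_ESR_SSE         (UINT32_C(1) << 21)    /* sign extend */
#define S_ESR_SRT_MASK    (UINT32_C(0x1f) << 16) /* register number */
#define S_ESR_SF          (UINT32_C(1) << 15)    /* 64-bit register */
#define S_ESR_AR          (UINT32_C(1) << 14)    /* acquire/release */
#define S_ESR_WNR         (UINT32_C(1) << 6)     /* write not read */
#define S_ESR_FSC_PERM_L3 UINT32_C(0x0f)         /* permission fault, level 3 */

/* Everything that describes what userspace attempted to do. */
#define S_ESR_KEEP (S_ESR_EC_MASK | S_ESR_IL | S_ESR_ISV | S_ESR_SAS_MASK | \
		    S_ESR_SSE | S_ESR_SRT_MASK | S_ESR_SF | S_ESR_AR |      \
		    S_ESR_WNR)

/*
 * Drop EA, SET, S1PTW and the real fault status code, then report a
 * fixed "permission fault, level 3" status in their place.
 */
static inline uint32_t sanitise_esr(uint32_t esr)
{
	return (esr & S_ESR_KEEP) | S_ESR_FSC_PERM_L3;
}
```

For example, a level 0 translation fault on a read and a level 3 permission fault on a read become indistinguishable after sanitise_esr(), while a read and a write to the same address still differ in the WnR bit.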
