All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault on kernel VA
@ 2018-04-13 10:02 Peter Maydell
  2018-04-13 14:21 ` Dave Martin
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2018-04-13 10:02 UTC (permalink / raw)
  To: linux-arm-kernel

If userspace faults on a kernel address, handing them the raw ESR
value on the sigframe as part of the delivered signal can leak data
useful to attackers who are using information about the underlying hardware
fault type (e.g. translation vs permission) as a mechanism to defeat KASLR.

However there are also legitimate uses for the information provided
in the ESR -- notably the GCC and LLVM sanitizers would like to be
able to report whether wild pointer accesses by the application are
reads or writes (since a wild write is a more serious bug than a
wild read), so we don't want to drop the ESR information entirely.

For faulting addresses in the kernel, sanitize the ESR. We choose
to present userspace with the illusion that there is nothing mapped
in the kernel's part of the address space at all, by reporting all
faults as level 0 translation faults.

These fields are safe to pass through to userspace as they depend
only on the instruction that userspace used to provoke the fault:
 EC IL ISV SAS SSE SRT SF AR CM WNR
All the other fields in ESR except DFSC are architecturally zero
for an L1 translation fault, so can be zeroed out without confusing
userspace.

The illusion is not entirely perfect, as there is a tiny wrinkle
where we will report an alignment fault that was not due to the memory
type (for instance a LDREX to an unaligned address) as a translation
fault, whereas if you do this on real unmapped memory the alignment
fault takes precedence. This is not likely to trip anybody up in
practice, as the only users we know of for the ESR information who
care about the behaviour for kernel addresses only really want to
know about the WnR bit.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
---
This RFC patch is an alternative proposal to Will's patch
https://patchwork.kernel.org/patch/10258781/
which simply removed the ESR record entirely for kernel addresses.

 arch/arm64/mm/fault.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index bff11553eb05..933c6d3b906e 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -307,6 +307,57 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
 		__show_regs(regs);
 	}
 
+	/*
+	 * If the faulting address is in the kernel, we must sanitize the ESR.
+	 * From userspace's point of view, kernel-only mappings don't exist
+	 * at all, so we report them as level 0 translation faults.
+	 * (This is not quite the way that "no mapping there at all" behaves:
+	 * an alignment fault not caused by the memory type would take
+	 * precedence over translation fault for a real access to empty
+	 * space. Unfortunately we can't easily distinguish "alignment fault
+	 * not caused by memory type" from "alignment fault caused by memory
+	 * type", so we ignore this wrinkle and just return the translation
+	 * fault.)
+	 */
+	if (addr >= TASK_SIZE) {
+		switch (ESR_ELx_EC(esr)) {
+		case ESR_ELx_EC_DABT_CUR:
+		case ESR_ELx_EC_DABT_LOW:
+			/*
+			 * These bits provide only information about the
+			 * faulting instruction, which userspace knows already.
+			 * All other bits are architecturally required to be
+			 * zero for faults reported with a DFSCR indicating
+			 * a level 0 translation fault.
+			 */
+			esr &= ESR_ELx_EC_MASK | ESR_ELx_IL | ESR_ELx_ISV |
+				ESR_ELx_SAS | ESR_ELx_SSE | ESR_ELx_SRT_MASK |
+				ESR_ELx_SF | ESR_ELx_AR | ESR_ELx_CM |
+				ESR_ELx_WNR;
+			esr |= ESR_ELx_FSC_FAULT;
+			break;
+		case ESR_ELx_EC_IABT_CUR:
+		case ESR_ELx_EC_IABT_LOW:
+			/*
+			 * Claim a level 0 translation fault.
+			 * All other bits are architecturally required to be
+			 * zero for faults reported with that DFSC value.
+			 */
+			esr &= ESR_ELx_EC_MASK | ESR_ELx_IL;
+			esr |= ESR_ELx_FSC_FAULT;
+			break;
+		default:
+			/*
+			 * This should never happen (entry.S only brings us
+			 * into this code for insn and data aborts). Fail
+			 * safe by not providing an ESR context record at all.
+			 */
+			WARN(1, "ESR 0x%x is not DABT or IABT\n", esr);
+			esr = 0;
+			break;
+		}
+	}
+
 	tsk->thread.fault_address = addr;
 	tsk->thread.fault_code = esr;
 	si.si_signo = sig;
-- 
2.16.2

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault on kernel VA
  2018-04-13 10:02 [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault on kernel VA Peter Maydell
@ 2018-04-13 14:21 ` Dave Martin
  2018-04-13 14:47   ` Peter Maydell
                     ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Dave Martin @ 2018-04-13 14:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 13, 2018 at 11:02:04AM +0100, Peter Maydell wrote:
> If userspace faults on a kernel address, handing them the raw ESR
> value on the sigframe as part of the delivered signal can leak data
> useful to attackers who are using information about the underlying hardware
> fault type (e.g. translation vs permission) as a mechanism to defeat KASLR.
> 
> However there are also legitimate uses for the information provided
> in the ESR -- notably the GCC and LLVM sanitizers would like to be

"would like to" or "currently rely on for correct behaviour"?

The usefulness of some of this information (particularly WnR) does not
seem to be arch-dependent.  If this were reported in a portable way via
siginfo, would it allow any known users to be simplified?

I'm wondering whether there's an argument for coming up with a generic
solution for the future.  That wouldn't replace this patch though.

> able to report whether wild pointer accesses by the application are
> reads or writes (since a wild write is a more serious bug than a
> wild read), so we don't want to drop the ESR information entirely.
> 
> For faulting addresses in the kernel, sanitize the ESR. We choose
> to present userspace with the illusion that there is nothing mapped
> in the kernel's part of the address space at all, by reporting all
> faults as level 0 translation faults.
> 
> These fields are safe to pass through to userspace as they depend
> only on the instruction that userspace used to provoke the fault:
>  EC IL ISV SAS SSE SRT SF AR CM WNR
> All the other fields in ESR except DFSC are architecturally zero
> for an L1 translation fault, so can be zeroed out without confusing
> userspace.

While we're about it, it occurs to me that even for faults taken
on user addresses, ESR exposes information that is not really
relevant to userspace.  For example, the layout of the pagetables
is not something that userspace can normally see, so it is strange
to report the level at which a translation fault occurred etc.

If we are sanitising this information, we should perhaps squash
all faults that specify a translation table level to a single level.

There are certain fault types that should never be exposed to
userspace, such as TLB conflict aborts, access flag faults, etc.
See the changes upstream between v4.16 and torvalds/master (which I had
temporarily forgotten about...)

Now, __do_user_fault() is called for these cases but the signal has
already been mapped to SIGKILL by this point via the fault_info[]
table.  It doesn't matter too much what we do in such cases because
the user task will be killed on signal delivery and so doesn't get a
chance to inspect the ESR contents anyway.  It might be worth expanding
the WARN() to catch mismaintenance of the fault_info[] table -- but that
may be overkill.  Maybe adding some comments explaining the dependency
is sufficient.

> The illusion is not entirely perfect, as there is a tiny wrinkle
> where we will report an alignment fault that was not due to the memory
> type (for instance a LDREX to an unaligned address) as a translation
> fault, whereas if you do this on real unmapped memory the alignment
> fault takes precedence. This is not likely to trip anybody up in
> practice, as the only users we know of for the ESR information who
> care about the behaviour for kernel addresses only really want to
> know about the WnR bit.

I tend to agree with this.  We could probably allow the alignment fault
to show through, but I can't think of a reasonable scenario where
userspace would legitimately rely on this.

If we err on the size of exposing too little information, we are less
likely to get into problems later on...

> 
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> ---
> This RFC patch is an alternative proposal to Will's patch
> https://patchwork.kernel.org/patch/10258781/
> which simply removed the ESR record entirely for kernel addresses.
> 
>  arch/arm64/mm/fault.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index bff11553eb05..933c6d3b906e 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -307,6 +307,57 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
>  		__show_regs(regs);
>  	}
>  
> +	/*
> +	 * If the faulting address is in the kernel, we must sanitize the ESR.
> +	 * From userspace's point of view, kernel-only mappings don't exist
> +	 * at all, so we report them as level 0 translation faults.
> +	 * (This is not quite the way that "no mapping there at all" behaves:
> +	 * an alignment fault not caused by the memory type would take
> +	 * precedence over translation fault for a real access to empty
> +	 * space. Unfortunately we can't easily distinguish "alignment fault
> +	 * not caused by memory type" from "alignment fault caused by memory
> +	 * type", so we ignore this wrinkle and just return the translation
> +	 * fault.)
> +	 */
> +	if (addr >= TASK_SIZE) {
> +		switch (ESR_ELx_EC(esr)) {
> +		case ESR_ELx_EC_DABT_CUR:

How do we hit the _CUR cases, and if we can, aren't we reporting
information about an insn executed by the _kernel_ here?

Possibly we should delete those cases: if we took a _CUR exception with
user_mode(regs) it looks a bit like we would have to have been executing
at EL0 but somehow took the exception from EL1 to EL1.  That sounds like
it should be impossible...


> +		case ESR_ELx_EC_DABT_LOW:
> +			/*
> +			 * These bits provide only information about the
> +			 * faulting instruction, which userspace knows already.
> +			 * All other bits are architecturally required to be
> +			 * zero for faults reported with a DFSCR indicating
> +			 * a level 0 translation fault.
> +			 */
> +			esr &= ESR_ELx_EC_MASK | ESR_ELx_IL | ESR_ELx_ISV |

In the !ISV case, some of the other fields become architecturally
UNKNOWN.  We should probably mask them out in that case, since they
could expose something that EL0 is not supposed to see.

> +				ESR_ELx_SAS | ESR_ELx_SSE | ESR_ELx_SRT_MASK |
> +				ESR_ELx_SF | ESR_ELx_AR | ESR_ELx_CM |
> +				ESR_ELx_WNR;
> +			esr |= ESR_ELx_FSC_FAULT;
> +			break;
> +		case ESR_ELx_EC_IABT_CUR:
> +		case ESR_ELx_EC_IABT_LOW:
> +			/*
> +			 * Claim a level 0 translation fault.
> +			 * All other bits are architecturally required to be
> +			 * zero for faults reported with that DFSC value.

Nit: they're RES0, so they might become nonzero in the future, with
unpredicatable (and possibly sensitive) meanings.  So it makes sense to
hide them until we know they're safe to expose.

> +			 */
> +			esr &= ESR_ELx_EC_MASK | ESR_ELx_IL;
> +			esr |= ESR_ELx_FSC_FAULT;
> +			break;
> +		default:
> +			/*
> +			 * This should never happen (entry.S only brings us
> +			 * into this code for insn and data aborts). Fail
> +			 * safe by not providing an ESR context record at all.
> +			 */
> +			WARN(1, "ESR 0x%x is not DABT or IABT\n", esr);
> +			esr = 0;
> +			break;
> +		}
> +	}
> +
>  	tsk->thread.fault_address = addr;
>  	tsk->thread.fault_code = esr;
>  	si.si_signo = sig;
> -- 
> 2.16.2
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault on kernel VA
  2018-04-13 14:21 ` Dave Martin
@ 2018-04-13 14:47   ` Peter Maydell
  2018-04-13 15:21     ` Dave Martin
  2018-04-13 14:54   ` Robin Murphy
  2018-04-17 17:34   ` Peter Maydell
  2 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2018-04-13 14:47 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 April 2018 at 15:21, Dave Martin <Dave.Martin@arm.com> wrote:
> On Fri, Apr 13, 2018 at 11:02:04AM +0100, Peter Maydell wrote:
>> If userspace faults on a kernel address, handing them the raw ESR
>> value on the sigframe as part of the delivered signal can leak data
>> useful to attackers who are using information about the underlying hardware
>> fault type (e.g. translation vs permission) as a mechanism to defeat KASLR.
>>
>> However there are also legitimate uses for the information provided
>> in the ESR -- notably the GCC and LLVM sanitizers would like to be
>
> "would like to" or "currently rely on for correct behaviour"?

Closer to the latter -- certainly if we just dropped ESR we would
change their behaviour in those situations, though they have a
fallback codepath I think so they wouldn't die entirely.

> The usefulness of some of this information (particularly WnR) does not
> seem to be arch-dependent.  If this were reported in a portable way via
> siginfo, would it allow any known users to be simplified?
>
> I'm wondering whether there's an argument for coming up with a generic
> solution for the future.  That wouldn't replace this patch though.

That would certainly be nice. QEMU has accumulated a collection
of per-arch per-OS signal handlers that have code of varying
degrees of ugliness to try to determine "was this a write"
when in a SIGSEGV/SIGBUS handler:

https://git.qemu.org/?p=qemu.git;a=blob;f=accel/tcg/user-exec.c;h=26a3ffbba1de229060f683f9f6a0ca80db988eb9;hb=refs/heads/master#l180

as has the LLVM sanitizer code:

https://github.com/llvm-mirror/compiler-rt/blob/master/lib/sanitizer_common/sanitizer_linux.cc#L1724

(though they seem to have managed to make it a bit less ugly
than QEMU :-))

> While we're about it, it occurs to me that even for faults taken
> on user addresses, ESR exposes information that is not really
> relevant to userspace.  For example, the layout of the pagetables
> is not something that userspace can normally see, so it is strange
> to report the level at which a translation fault occurred etc.
>
> If we are sanitising this information, we should perhaps squash
> all faults that specify a translation table level to a single level.

Mmm, maybe (though I guess level 0 fault is not so appropriate
here). TBH it feels a bit like an unnecessary level of gold-plating.

> There are certain fault types that should never be exposed to
> userspace, such as TLB conflict aborts, access flag faults, etc.
> See the changes upstream between v4.16 and torvalds/master (which I had
> temporarily forgotten about...)
>
> Now, __do_user_fault() is called for these cases but the signal has
> already been mapped to SIGKILL by this point via the fault_info[]
> table.  It doesn't matter too much what we do in such cases because
> the user task will be killed on signal delivery and so doesn't get a
> chance to inspect the ESR contents anyway.  It might be worth expanding
> the WARN() to catch mismaintenance of the fault_info[] table -- but that
> may be overkill.  Maybe adding some comments explaining the dependency
> is sufficient.

I don't entirely understand this paragraph, but if you want to
provide a comment I'm happy to fill it into a v2 of this patch :-)

>> The illusion is not entirely perfect, as there is a tiny wrinkle
>> where we will report an alignment fault that was not due to the memory
>> type (for instance a LDREX to an unaligned address) as a translation
>> fault, whereas if you do this on real unmapped memory the alignment
>> fault takes precedence. This is not likely to trip anybody up in
>> practice, as the only users we know of for the ESR information who
>> care about the behaviour for kernel addresses only really want to
>> know about the WnR bit.
>
> I tend to agree with this.  We could probably allow the alignment fault
> to show through, but I can't think of a reasonable scenario where
> userspace would legitimately rely on this.

If you always allow "alignment fault" DFSC values through, then you
are permitting userspace to distinguish kernel-space memory that
is Device memory from (unmapped or Normal), because it can do an
unaligned LDR access or similar, and see whether it gets a DFSC
for alignment fault or one for a translation error.

>> +     if (addr >= TASK_SIZE) {
>> +             switch (ESR_ELx_EC(esr)) {
>> +             case ESR_ELx_EC_DABT_CUR:
>
> How do we hit the _CUR cases, and if we can, aren't we reporting
> information about an insn executed by the _kernel_ here?
>
> Possibly we should delete those cases: if we took a _CUR exception with
> user_mode(regs) it looks a bit like we would have to have been executing
> at EL0 but somehow took the exception from EL1 to EL1.  That sounds like
> it should be impossible...

Yes, you're right that we shouldn't get here with the _CUR cases.
I think I was thinking there might be cases where the kernel
accessed userspace memory and reported faults with a signal,
but looking at the code we can't get to this function unless
user_mode() returned true. So we can let the default case handle
the _CUR cases too.

>> +             case ESR_ELx_EC_DABT_LOW:
>> +                     /*
>> +                      * These bits provide only information about the
>> +                      * faulting instruction, which userspace knows already.
>> +                      * All other bits are architecturally required to be
>> +                      * zero for faults reported with a DFSCR indicating
>> +                      * a level 0 translation fault.
>> +                      */
>> +                     esr &= ESR_ELx_EC_MASK | ESR_ELx_IL | ESR_ELx_ISV |
>
> In the !ISV case, some of the other fields become architecturally
> UNKNOWN.  We should probably mask them out in that case, since they
> could expose something that EL0 is not supposed to see.

The spec says they're RES0 when ISV is 0, and only UNKNOWN if ISV
is UNKNOWN (which only happens for Debug state accesses, which I
think are not relevant here).

We can certainly mask out the RES0 bits in the ISV==0 case, though.

>> +                             ESR_ELx_SAS | ESR_ELx_SSE | ESR_ELx_SRT_MASK |
>> +                             ESR_ELx_SF | ESR_ELx_AR | ESR_ELx_CM |
>> +                             ESR_ELx_WNR;
>> +                     esr |= ESR_ELx_FSC_FAULT;
>> +                     break;
>> +             case ESR_ELx_EC_IABT_CUR:
>> +             case ESR_ELx_EC_IABT_LOW:
>> +                     /*
>> +                      * Claim a level 0 translation fault.
>> +                      * All other bits are architecturally required to be
>> +                      * zero for faults reported with that DFSC value.
>
> Nit: they're RES0, so they might become nonzero in the future, with
> unpredicatable (and possibly sensitive) meanings.  So it makes sense to
> hide them until we know they're safe to expose.

OK, I'll tweak the comment text.

>> +                      */
>> +                     esr &= ESR_ELx_EC_MASK | ESR_ELx_IL;
>> +                     esr |= ESR_ELx_FSC_FAULT;
>> +                     break;
>> +             default:
>> +                     /*
>> +                      * This should never happen (entry.S only brings us
>> +                      * into this code for insn and data aborts). Fail
>> +                      * safe by not providing an ESR context record at all.
>> +                      */
>> +                     WARN(1, "ESR 0x%x is not DABT or IABT\n", esr);
>> +                     esr = 0;
>> +                     break;
>> +             }
>> +     }
>> +
>>       tsk->thread.fault_address = addr;
>>       tsk->thread.fault_code = esr;
>>       si.si_signo = sig;

thanks
-- PMM

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault on kernel VA
  2018-04-13 14:21 ` Dave Martin
  2018-04-13 14:47   ` Peter Maydell
@ 2018-04-13 14:54   ` Robin Murphy
  2018-04-17 17:34   ` Peter Maydell
  2 siblings, 0 replies; 7+ messages in thread
From: Robin Murphy @ 2018-04-13 14:54 UTC (permalink / raw)
  To: linux-arm-kernel



On 13/04/18 15:21, Dave Martin wrote:
> On Fri, Apr 13, 2018 at 11:02:04AM +0100, Peter Maydell wrote:
>> If userspace faults on a kernel address, handing them the raw ESR
>> value on the sigframe as part of the delivered signal can leak data
>> useful to attackers who are using information about the underlying hardware
>> fault type (e.g. translation vs permission) as a mechanism to defeat KASLR.
>>
>> However there are also legitimate uses for the information provided
>> in the ESR -- notably the GCC and LLVM sanitizers would like to be
> 
> "would like to" or "currently rely on for correct behaviour"?
> 
> The usefulness of some of this information (particularly WnR) does not
> seem to be arch-dependent.  If this were reported in a portable way via
> siginfo, would it allow any known users to be simplified?
> 
> I'm wondering whether there's an argument for coming up with a generic
> solution for the future.  That wouldn't replace this patch though.
> 
>> able to report whether wild pointer accesses by the application are
>> reads or writes (since a wild write is a more serious bug than a
>> wild read), so we don't want to drop the ESR information entirely.
>>
>> For faulting addresses in the kernel, sanitize the ESR. We choose
>> to present userspace with the illusion that there is nothing mapped
>> in the kernel's part of the address space at all, by reporting all
>> faults as level 0 translation faults.
>>
>> These fields are safe to pass through to userspace as they depend
>> only on the instruction that userspace used to provoke the fault:
>>   EC IL ISV SAS SSE SRT SF AR CM WNR
>> All the other fields in ESR except DFSC are architecturally zero
>> for an L1 translation fault, so can be zeroed out without confusing
>> userspace.
> 
> While we're about it, it occurs to me that even for faults taken
> on user addresses, ESR exposes information that is not really
> relevant to userspace.  For example, the layout of the pagetables
> is not something that userspace can normally see, so it is strange
> to report the level at which a translation fault occurred etc.
> 
> If we are sanitising this information, we should perhaps squash
> all faults that specify a translation table level to a single level.
> 
> There are certain fault types that should never be exposed to
> userspace, such as TLB conflict aborts, access flag faults, etc.
> See the changes upstream between v4.16 and torvalds/master (which I had
> temporarily forgotten about...)
> 
> Now, __do_user_fault() is called for these cases but the signal has
> already been mapped to SIGKILL by this point via the fault_info[]
> table.  It doesn't matter too much what we do in such cases because
> the user task will be killed on signal delivery and so doesn't get a
> chance to inspect the ESR contents anyway.  It might be worth expanding
> the WARN() to catch mismaintenance of the fault_info[] table -- but that
> may be overkill.  Maybe adding some comments explaining the dependency
> is sufficient.
> 
>> The illusion is not entirely perfect, as there is a tiny wrinkle
>> where we will report an alignment fault that was not due to the memory
>> type (for instance a LDREX to an unaligned address) as a translation
>> fault, whereas if you do this on real unmapped memory the alignment
>> fault takes precedence. This is not likely to trip anybody up in
>> practice, as the only users we know of for the ESR information who
>> care about the behaviour for kernel addresses only really want to
>> know about the WnR bit.
> 
> I tend to agree with this.  We could probably allow the alignment fault
> to show through, but I can't think of a reasonable scenario where
> userspace would legitimately rely on this.

I think the problem with that is that memory type alignment faults also 
still have precedence over permission faults, meaning that valid device 
mappings could potentially be discovered by probing with deliberately 
misaligned accesses, which could give away the location of the vmalloc 
region.

Robin.

> If we err on the size of exposing too little information, we are less
> likely to get into problems later on...
> 
>>
>> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
>> ---
>> This RFC patch is an alternative proposal to Will's patch
>> https://patchwork.kernel.org/patch/10258781/
>> which simply removed the ESR record entirely for kernel addresses.
>>
>>   arch/arm64/mm/fault.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 51 insertions(+)
>>
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
>> index bff11553eb05..933c6d3b906e 100644
>> --- a/arch/arm64/mm/fault.c
>> +++ b/arch/arm64/mm/fault.c
>> @@ -307,6 +307,57 @@ static void __do_user_fault(struct task_struct *tsk, unsigned long addr,
>>   		__show_regs(regs);
>>   	}
>>   
>> +	/*
>> +	 * If the faulting address is in the kernel, we must sanitize the ESR.
>> +	 * From userspace's point of view, kernel-only mappings don't exist
>> +	 * at all, so we report them as level 0 translation faults.
>> +	 * (This is not quite the way that "no mapping there at all" behaves:
>> +	 * an alignment fault not caused by the memory type would take
>> +	 * precedence over translation fault for a real access to empty
>> +	 * space. Unfortunately we can't easily distinguish "alignment fault
>> +	 * not caused by memory type" from "alignment fault caused by memory
>> +	 * type", so we ignore this wrinkle and just return the translation
>> +	 * fault.)
>> +	 */
>> +	if (addr >= TASK_SIZE) {
>> +		switch (ESR_ELx_EC(esr)) {
>> +		case ESR_ELx_EC_DABT_CUR:
> 
> How do we hit the _CUR cases, and if we can, aren't we reporting
> information about an insn executed by the _kernel_ here?
> 
> Possibly we should delete those cases: if we took a _CUR exception with
> user_mode(regs) it looks a bit like we would have to have been executing
> at EL0 but somehow took the exception from EL1 to EL1.  That sounds like
> it should be impossible...
> 
> 
>> +		case ESR_ELx_EC_DABT_LOW:
>> +			/*
>> +			 * These bits provide only information about the
>> +			 * faulting instruction, which userspace knows already.
>> +			 * All other bits are architecturally required to be
>> +			 * zero for faults reported with a DFSCR indicating
>> +			 * a level 0 translation fault.
>> +			 */
>> +			esr &= ESR_ELx_EC_MASK | ESR_ELx_IL | ESR_ELx_ISV |
> 
> In the !ISV case, some of the other fields become architecturally
> UNKNOWN.  We should probably mask them out in that case, since they
> could expose something that EL0 is not supposed to see.
> 
>> +				ESR_ELx_SAS | ESR_ELx_SSE | ESR_ELx_SRT_MASK |
>> +				ESR_ELx_SF | ESR_ELx_AR | ESR_ELx_CM |
>> +				ESR_ELx_WNR;
>> +			esr |= ESR_ELx_FSC_FAULT;
>> +			break;
>> +		case ESR_ELx_EC_IABT_CUR:
>> +		case ESR_ELx_EC_IABT_LOW:
>> +			/*
>> +			 * Claim a level 0 translation fault.
>> +			 * All other bits are architecturally required to be
>> +			 * zero for faults reported with that DFSC value.
> 
> Nit: they're RES0, so they might become nonzero in the future, with
> unpredicatable (and possibly sensitive) meanings.  So it makes sense to
> hide them until we know they're safe to expose.
> 
>> +			 */
>> +			esr &= ESR_ELx_EC_MASK | ESR_ELx_IL;
>> +			esr |= ESR_ELx_FSC_FAULT;
>> +			break;
>> +		default:
>> +			/*
>> +			 * This should never happen (entry.S only brings us
>> +			 * into this code for insn and data aborts). Fail
>> +			 * safe by not providing an ESR context record at all.
>> +			 */
>> +			WARN(1, "ESR 0x%x is not DABT or IABT\n", esr);
>> +			esr = 0;
>> +			break;
>> +		}
>> +	}
>> +
>>   	tsk->thread.fault_address = addr;
>>   	tsk->thread.fault_code = esr;
>>   	si.si_signo = sig;
>> -- 
>> 2.16.2
>>
>>
>> _______________________________________________
>> linux-arm-kernel mailing list
>> linux-arm-kernel at lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault on kernel VA
  2018-04-13 14:47   ` Peter Maydell
@ 2018-04-13 15:21     ` Dave Martin
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Martin @ 2018-04-13 15:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Apr 13, 2018 at 03:47:08PM +0100, Peter Maydell wrote:
> On 13 April 2018 at 15:21, Dave Martin <Dave.Martin@arm.com> wrote:
> > On Fri, Apr 13, 2018 at 11:02:04AM +0100, Peter Maydell wrote:
> >> If userspace faults on a kernel address, handing them the raw ESR
> >> value on the sigframe as part of the delivered signal can leak data
> >> useful to attackers who are using information about the underlying hardware
> >> fault type (e.g. translation vs permission) as a mechanism to defeat KASLR.
> >>
> >> However there are also legitimate uses for the information provided
> >> in the ESR -- notably the GCC and LLVM sanitizers would like to be
> >
> > "would like to" or "currently rely on for correct behaviour"?
> 
> Closer to the latter -- certainly if we just dropped ESR we would
> change their behaviour in those situations, though they have a
> fallback codepath I think so they wouldn't die entirely.

I see.

> > The usefulness of some of this information (particularly WnR) does not
> > seem to be arch-dependent.  If this were reported in a portable way via
> > siginfo, would it allow any known users to be simplified?
> >
> > I'm wondering whether there's an argument for coming up with a generic
> > solution for the future.  That wouldn't replace this patch though.
> 
> That would certainly be nice. QEMU has accumulated a collection
> of per-arch per-OS signal handlers that have code of varying
> degrees of ugliness to try to determine "was this a write"
> when in a SIGSEGV/SIGBUS handler:
> 
> https://git.qemu.org/?p=qemu.git;a=blob;f=accel/tcg/user-exec.c;h=26a3ffbba1de229060f683f9f6a0ca80db988eb9;hb=refs/heads/master#l180
> 
> as has the LLVM sanitizer code:
> 
> https://github.com/llvm-mirror/compiler-rt/blob/master/lib/sanitizer_common/sanitizer_linux.cc#L1724
> 
> (though they seem to have managed to make it a bit less ugly
> than QEMU :-))
> 
> > While we're about it, it occurs to me that even for faults taken
> > on user addresses, ESR exposes information that is not really
> > relevant to userspace.  For example, the layout of the pagetables
> > is not something that userspace can normally see, so it is strange
> > to report the level at which a translation fault occurred etc.
> >
> > If we are sanitising this information, we should perhaps squash
> > all faults that specify a translation table level to a single level.
> 
> Mmm, maybe (though I guess level 0 fault is not so appropriate
> here). TBH it feels a bit like an unnecessary level of gold-plating.
> 
> > There are certain fault types that should never be exposed to
> > userspace, such as TLB conflict aborts, access flag faults, etc.
> > See the changes upstream between v4.16 and torvalds/master (which I had
> > temporarily forgotten about...)
> >
> > Now, __do_user_fault() is called for these cases but the signal has
> > already been mapped to SIGKILL by this point via the fault_info[]
> > table.  It doesn't matter too much what we do in such cases because
> > the user task will be killed on signal delivery and so doesn't get a
> > chance to inspect the ESR contents anyway.  It might be worth expanding
> > the WARN() to catch mismaintenance of the fault_info[] table -- but that
> > may be overkill.  Maybe adding some comments explaining the dependency
> > is sufficient.
> 
> I don't entirely understand this paragraph, but if you want to
> provide a comment I'm happy to fill it into a v2 of this patch :-)

Take a look at arch/arm64/mm/fault.c in torvalds/master.  That's
probably easier than trying to explain it.

What I was trying to say was that for certain DFSC values
__do_user_fault() may assume that userspace is sent SIGKILL and so never
sees the syndrome information.  But if we are relying on that for
safety here, we should comment and/or double-check with a WARN().

> >> The illusion is not entirely perfect, as there is a tiny wrinkle
> >> where we will report an alignment fault that was not due to the memory
> >> type (for instance a LDREX to an unaligned address) as a translation
> >> fault, whereas if you do this on real unmapped memory the alignment
> >> fault takes precedence. This is not likely to trip anybody up in
> >> practice, as the only users we know of for the ESR information who
> >> care about the behaviour for kernel addresses only really want to
> >> know about the WnR bit.
> >
> > I tend to agree with this.  We could probably allow the alignment fault
> > to show through, but I can't think of a reasonable scenario where
> > userspace would legitimately rely on this.
> 
> If you always allow "alignment fault" DFSC values through, then you
> are permitting userspace to distinguish kernel-space memory that
> is Device memory from (unmapped or Normal), because it can do an
> unaligned LDR access or similar, and see whether it gets a DFSC
> for alignment fault or one for a translation error.

Yes, true.  Bleh.  I think your argument about this apparent
repriotisation of fault types being harmless is reasonable in any case.

> >> +     if (addr >= TASK_SIZE) {
> >> +             switch (ESR_ELx_EC(esr)) {
> >> +             case ESR_ELx_EC_DABT_CUR:
> >
> > How do we hit the _CUR cases, and if we can, aren't we reporting
> > information about an insn executed by the _kernel_ here?
> >
> > Possibly we should delete those cases: if we took a _CUR exception with
> > user_mode(regs) it looks a bit like we would have to have been executing
> > at EL0 but somehow took the exception from EL1 to EL1.  That sounds like
> > it should be impossible...
> 
> Yes, you're right that we shouldn't get here with the _CUR cases.
> I think I was thinking there might be cases where the kernel
> accessed userspace memory and reported faults with a signal,
> but looking at the code we can't get to this function unless
> user_mode() returned true. So we can let the default case handle
> the _CUR cases too.

OK.  The WARN text should probably change if so.

> >> +             case ESR_ELx_EC_DABT_LOW:
> >> +                     /*
> >> +                      * These bits provide only information about the
> >> +                      * faulting instruction, which userspace knows already.
> >> +                      * All other bits are architecturally required to be
> >> +                      * zero for faults reported with a DFSCR indicating
> >> +                      * a level 0 translation fault.
> >> +                      */
> >> +                     esr &= ESR_ELx_EC_MASK | ESR_ELx_IL | ESR_ELx_ISV |
> >
> > In the !ISV case, some of the other fields become architecturally
> > UNKNOWN.  We should probably mask them out in that case, since they
> > could expose something that EL0 is not supposed to see.
> 
> The spec says they're RES0 when ISV is 0, and only UNKNOWN if ISV
> is UNKNOWN (which only happens for Debug state accesses, which I
> think are not relevant here).

Hmmm, you're right.  I wasn't reading the spec carefully enough.

RES0 still probably means we should mask those bits out though (same
rationale as in the next case).

> We can certainly mask out the RES0 bits in the ISV==0 case, though.

Agreed.

> >> +                             ESR_ELx_SAS | ESR_ELx_SSE | ESR_ELx_SRT_MASK |
> >> +                             ESR_ELx_SF | ESR_ELx_AR | ESR_ELx_CM |
> >> +                             ESR_ELx_WNR;
> >> +                     esr |= ESR_ELx_FSC_FAULT;
> >> +                     break;
> >> +             case ESR_ELx_EC_IABT_CUR:
> >> +             case ESR_ELx_EC_IABT_LOW:
> >> +                     /*
> >> +                      * Claim a level 0 translation fault.
> >> +                      * All other bits are architecturally required to be
> >> +                      * zero for faults reported with that DFSC value.
> >
> > Nit: they're RES0, so they might become nonzero in the future, with
> > unpredicatable (and possibly sensitive) meanings.  So it makes sense to
> > hide them until we know they're safe to expose.
> 
> OK, I'll tweak the comment text.
> 
> >> +                      */
> >> +                     esr &= ESR_ELx_EC_MASK | ESR_ELx_IL;
> >> +                     esr |= ESR_ELx_FSC_FAULT;
> >> +                     break;
> >> +             default:
> >> +                     /*
> >> +                      * This should never happen (entry.S only brings us
> >> +                      * into this code for insn and data aborts). Fail
> >> +                      * safe by not providing an ESR context record at all.
> >> +                      */
> >> +                     WARN(1, "ESR 0x%x is not DABT or IABT\n", esr);
> >> +                     esr = 0;
> >> +                     break;
> >> +             }
> >> +     }
> >> +
> >>       tsk->thread.fault_address = addr;
> >>       tsk->thread.fault_code = esr;
> >>       si.si_signo = sig;
> 
> thanks
> -- PMM
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault on kernel VA
  2018-04-13 14:21 ` Dave Martin
  2018-04-13 14:47   ` Peter Maydell
  2018-04-13 14:54   ` Robin Murphy
@ 2018-04-17 17:34   ` Peter Maydell
  2018-04-18 10:33     ` Dave Martin
  2 siblings, 1 reply; 7+ messages in thread
From: Peter Maydell @ 2018-04-17 17:34 UTC (permalink / raw)
  To: linux-arm-kernel

On 13 April 2018 at 15:21, Dave Martin <Dave.Martin@arm.com> wrote:
> There are certain fault types that should never be exposed to
> userspace, such as TLB conflict aborts, access flag faults, etc.
> See the changes upstream between v4.16 and torvalds/master (which I had
> temporarily forgotten about...)
>
> Now, __do_user_fault() is called for these cases but the signal has
> already been mapped to SIGKILL by this point via the fault_info[]
> table.  It doesn't matter too much what we do in such cases because
> the user task will be killed on signal delivery and so doesn't get a
> chance to inspect the ESR contents anyway.  It might be worth expanding
> the WARN() to catch mismaintenance of the fault_info[] table -- but that
> may be overkill.  Maybe adding some comments explaining the dependency
> is sufficient.

I had a look at the code in current master, and from my reading
of it it does not call __do_user_fault() for faults like TLB
conflict aborts, etc. Those all have fault_info[] table entries
that specify do_bad() as the function pointer, which just returns 1.
It's only do_page_fault(), do_translation_fault() and do_alignment()
fault that can get into __do_user_fault().

So I don't think there's any change that needs to be made for this
point. Am I misunderstanding what you have in mind? If so it would
probably be simplest if you explained the change you'd like to see.

(The changes in master since v4.16 do require a rebase anyway, mostly
for textual reasons, so I'll base v2 on master unless you have a
different branch you'd rather I based it on.)

I've made the various other changes you suggest.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault on kernel VA
  2018-04-17 17:34   ` Peter Maydell
@ 2018-04-18 10:33     ` Dave Martin
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Martin @ 2018-04-18 10:33 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Apr 17, 2018 at 06:34:11PM +0100, Peter Maydell wrote:
> On 13 April 2018 at 15:21, Dave Martin <Dave.Martin@arm.com> wrote:
> > There are certain fault types that should never be exposed to
> > userspace, such as TLB conflict aborts, access flag faults, etc.
> > See the changes upstream between v4.16 and torvalds/master (which I had
> > temporarily forgotten about...)
> >
> > Now, __do_user_fault() is called for these cases but the signal has
> > already been mapped to SIGKILL by this point via the fault_info[]
> > table.  It doesn't matter too much what we do in such cases because
> > the user task will be killed on signal delivery and so doesn't get a
> > chance to inspect the ESR contents anyway.  It might be worth expanding
> > the WARN() to catch mismaintenance of the fault_info[] table -- but that
> > may be overkill.  Maybe adding some comments explaining the dependency
> > is sufficient.
> 
> I had a look at the code in current master, and from my reading
> of it it does not call __do_user_fault() for faults like TLB
> conflict aborts, etc. Those all have fault_info[] table entries
> that specify do_bad() as the function pointer, which just returns 1.
> It's only do_page_fault(), do_translation_fault() and do_alignment()
> fault that can get into __do_user_fault().
> 
> So I don't think there's any change that needs to be made for this
> point. Am I misunderstanding what you have in mind? If so it would
> probably be simplest if you explained the change you'd like to see.

Don't worry about this; I think you're right.  I was confusing do_bad()
with do_bad_area() in my head somewhere along the line...

> 
> (The changes in master since v4.16 do require a rebase anyway, mostly
> for textual reasons, so I'll base v2 on master unless you have a
> different branch you'd rather I based it on.)

-rc1 or master is probably fine.  Most of the dust from the merge window
ought to have settled now.

> I've made the various other changes you suggest.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-04-18 10:33 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-13 10:02 [RFC PATCH] arm64: fault: Don't leak data in ESR context for user fault on kernel VA Peter Maydell
2018-04-13 14:21 ` Dave Martin
2018-04-13 14:47   ` Peter Maydell
2018-04-13 15:21     ` Dave Martin
2018-04-13 14:54   ` Robin Murphy
2018-04-17 17:34   ` Peter Maydell
2018-04-18 10:33     ` Dave Martin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.