* [PATCH] arm64: mm: Fix false positives in W+X checking
@ 2018-04-25 14:13 Jeffrey Hugo
  2018-04-25 15:03 ` Will Deacon
  0 siblings, 1 reply; 9+ messages in thread
From: Jeffrey Hugo @ 2018-04-25 14:13 UTC (permalink / raw)
  To: linux-arm-kernel

load_module() creates W+X mappings via __vmalloc_node_range() (from
layout_and_allocate()->move_module()->module_alloc()) by using
PAGE_KERNEL_EXEC.  These mappings are later cleaned up via
"call_rcu_sched(&freeinit->rcu, do_free_init)" from do_init_module().

This is a problem because call_rcu_sched() queues work, which can be run
after debug_checkwx() is run, resulting in a race condition.  If hit, the
race results in a nasty splat about insecure W+X mappings, which results
in a poor user experience as these are not the mappings that
debug_checkwx() is intended to catch.

Address the race by flushing the queued work before running
debug_checkwx().
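
For reference, the deferred-free path in question looks roughly like this
(paraphrased from kernel/module.c around v4.17; details elided):

  static void do_free_init(struct rcu_head *head)
  {
  	struct mod_initfree *m = container_of(head, struct mod_initfree, rcu);

  	/* Frees the init area, tearing down its (W+X) mapping. */
  	module_memfree(m->module_init);
  	kfree(m);
  }

  static noinline int do_init_module(struct module *mod)
  {
  	/* ... run mod->init(), mark the init sections unused ... */

  	/*
  	 * Deferred to an RCU callback, so do_free_init() may still be
  	 * pending when mark_rodata_ro()/debug_checkwx() runs.
  	 */
  	call_rcu_sched(&freeinit->rcu, do_free_init);
  	/* ... */
  	return 0;
  }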

Reported-by: Timur Tabi <timur@codeaurora.org>
Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
Fixes: 1404d6f13e47 ("arm64: dump: Add checking for writable and exectuable pages")
Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
---
 arch/arm64/mm/mmu.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2dbb2c9..40d45fd 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -503,6 +503,12 @@ void mark_rodata_ro(void)
 	update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
 			    section_size, PAGE_KERNEL_RO);
 
+	/*
+	 * load_module() results in W+X mappings, which are cleaned up with
+	 * call_rcu_sched().  Let's make sure that queued work is flushed so
+	 * that we don't hit false positives.
+	 */
+	rcu_barrier_sched();
 	debug_checkwx();
 }
 
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


* [PATCH] arm64: mm: Fix false positives in W+X checking
  2018-04-25 14:13 [PATCH] arm64: mm: Fix false positives in W+X checking Jeffrey Hugo
@ 2018-04-25 15:03 ` Will Deacon
  2018-04-25 15:12   ` Jeffrey Hugo
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2018-04-25 15:03 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Jeffrey,

Thanks for the patch. Comment below.

On Wed, Apr 25, 2018 at 08:13:31AM -0600, Jeffrey Hugo wrote:
> load_module() creates W+X mappings via __vmalloc_node_range() (from
> layout_and_allocate()->move_module()->module_alloc()) by using
> PAGE_KERNEL_EXEC.  These mappings are later cleaned up via
> "call_rcu_sched(&freeinit->rcu, do_free_init)" from do_init_module().
> 
> This is a problem because call_rcu_sched() queues work, which can be run
> after debug_checkwx() is run, resulting in a race condition.  If hit, the
> race results in a nasty splat about insecure W+X mappings, which results
> in a poor user experience as these are not the mappings that
> debug_checkwx() is intended to catch.
> 
> Address the race by flushing the queued work before running
> debug_checkwx().
> 
> Reported-by: Timur Tabi <timur@codeaurora.org>
> Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
> Fixes: 1404d6f13e47 ("arm64: dump: Add checking for writable and exectuable pages")
> Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
> ---
>  arch/arm64/mm/mmu.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 2dbb2c9..40d45fd 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -503,6 +503,12 @@ void mark_rodata_ro(void)
>  	update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
>  			    section_size, PAGE_KERNEL_RO);
>  
> +	/*
> +	 * load_module() results in W+X mappings, which are cleaned up with
> +	 * call_rcu_sched().  Let's make sure that queued work is flushed so
> +	 * that we don't hit false positives.
> +	 */
> +	rcu_barrier_sched();
>  	debug_checkwx();
>  }

Whilst this looks correct to me, it looks to me like other architectures
(e.g. x86) would be affected by this problem as well. Perhaps it would be
better to solve it in the core code before invoking mark_rodata_ro, or is
there some reason that we only run into this on arm64?

Cheers,

Will


* [PATCH] arm64: mm: Fix false positives in W+X checking
  2018-04-25 15:03 ` Will Deacon
@ 2018-04-25 15:12   ` Jeffrey Hugo
  2018-04-25 15:41     ` Laura Abbott
  2018-04-25 16:23     ` Kees Cook
  0 siblings, 2 replies; 9+ messages in thread
From: Jeffrey Hugo @ 2018-04-25 15:12 UTC (permalink / raw)
  To: linux-arm-kernel

On 4/25/2018 9:03 AM, Will Deacon wrote:
> Hi Jeffrey,
> 
> Thanks for the patch. Comment below.
> 
> On Wed, Apr 25, 2018 at 08:13:31AM -0600, Jeffrey Hugo wrote:
>> load_module() creates W+X mappings via __vmalloc_node_range() (from
>> layout_and_allocate()->move_module()->module_alloc()) by using
>> PAGE_KERNEL_EXEC.  These mappings are later cleaned up via
>> "call_rcu_sched(&freeinit->rcu, do_free_init)" from do_init_module().
>>
>> This is a problem because call_rcu_sched() queues work, which can be run
>> after debug_checkwx() is run, resulting in a race condition.  If hit, the
>> race results in a nasty splat about insecure W+X mappings, which results
>> in a poor user experience as these are not the mappings that
>> debug_checkwx() is intended to catch.
>>
>> Address the race by flushing the queued work before running
>> debug_checkwx().
>>
>> Reported-by: Timur Tabi <timur@codeaurora.org>
>> Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
>> Fixes: 1404d6f13e47 ("arm64: dump: Add checking for writable and exectuable pages")
>> Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
>> ---
>>   arch/arm64/mm/mmu.c | 6 ++++++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 2dbb2c9..40d45fd 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -503,6 +503,12 @@ void mark_rodata_ro(void)
>>   	update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
>>   			    section_size, PAGE_KERNEL_RO);
>>   
>> +	/*
>> +	 * load_module() results in W+X mappings, which are cleaned up with
>> +	 * call_rcu_sched().  Let's make sure that queued work is flushed so
>> +	 * that we don't hit false positives.
>> +	 */
>> +	rcu_barrier_sched();
>>   	debug_checkwx();
>>   }
> 
> Whilst this looks correct to me, it looks to me like other architectures
> (e.g. x86) would be affected by this problem as well. Perhaps it would be
> better to solve it in the core code before invoking mark_rodata_ro, or is
> there some reason that we only run into this on arm64?
> 

Thanks for the review.

I was actually pondering pushing this into ptdump_check_wx() so that the 
"barrier" hit is only observed with CONFIG_DEBUG_WX.

I agree, in principle this is not an arm64 specific issue.  I do not
have sufficient equipment to validate other architectures.  On QDF2400, 
the reproduction rate is very low, roughly 1-3 instances in 2000-3000 
reboots.  I do have a system simulator which happens to repro the issue 
100% of the time, which is what I used to debug and test this fix, 
before applying it to QDF2400.

I'm waffling.  I see the benefit of fixing this in common code, but the 
"core" functionality of mark_rodata_ro doesn't need this barrier...

I suppose I can push it up to core code, and see what the rest of the 
community says.  Is that what you recommend?

-- 
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


* [PATCH] arm64: mm: Fix false positives in W+X checking
  2018-04-25 15:12   ` Jeffrey Hugo
@ 2018-04-25 15:41     ` Laura Abbott
  2018-04-25 16:23     ` Kees Cook
  1 sibling, 0 replies; 9+ messages in thread
From: Laura Abbott @ 2018-04-25 15:41 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/25/2018 08:12 AM, Jeffrey Hugo wrote:
> On 4/25/2018 9:03 AM, Will Deacon wrote:
>> Hi Jeffrey,
>>
>> Thanks for the patch. Comment below.
>>
>> On Wed, Apr 25, 2018 at 08:13:31AM -0600, Jeffrey Hugo wrote:
>>> load_module() creates W+X mappings via __vmalloc_node_range() (from
>>> layout_and_allocate()->move_module()->module_alloc()) by using
>>> PAGE_KERNEL_EXEC.  These mappings are later cleaned up via
>>> "call_rcu_sched(&freeinit->rcu, do_free_init)" from do_init_module().
>>>
>>> This is a problem because call_rcu_sched() queues work, which can be run
>>> after debug_checkwx() is run, resulting in a race condition.  If hit, the
>>> race results in a nasty splat about insecure W+X mappings, which results
>>> in a poor user experience as these are not the mappings that
>>> debug_checkwx() is intended to catch.
>>>
>>> Address the race by flushing the queued work before running
>>> debug_checkwx().
>>>
>>> Reported-by: Timur Tabi <timur@codeaurora.org>
>>> Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
>>> Fixes: 1404d6f13e47 ("arm64: dump: Add checking for writable and exectuable pages")
>>> Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
>>> ---
>>>   arch/arm64/mm/mmu.c | 6 ++++++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 2dbb2c9..40d45fd 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -503,6 +503,12 @@ void mark_rodata_ro(void)
>>>  	update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
>>>  			    section_size, PAGE_KERNEL_RO);
>>> +	/*
>>> +	 * load_module() results in W+X mappings, which are cleaned up with
>>> +	 * call_rcu_sched().  Let's make sure that queued work is flushed so
>>> +	 * that we don't hit false positives.
>>> +	 */
>>> +	rcu_barrier_sched();
>>>  	debug_checkwx();
>>>  }
>>
>> Whilst this looks correct to me, it looks to me like other architectures
>> (e.g. x86) would be affected by this problem as well. Perhaps it would be
>> better to solve it in the core code before invoking mark_rodata_ro, or is
>> there some reason that we only run into this on arm64?
>>
> 
> Thanks for the review.
> 
> I was actually pondering pushing this into ptdump_check_wx() so that the "barrier" hit is only observed with CONFIG_DEBUG_WX.
> 
> I agree, in principle this is not an arm64 specific issue.  I do not have sufficient equipment to validate other architectures.  On QDF2400, the reproduction rate is very low, roughly 1-3 instances in 2000-3000 reboots.  I do have a system simulator which happens to repro the issue 100% of the time, which is what I used to debug and test this fix, before applying it to QDF2400.
> 
> I'm waffling.  I see the benefit of fixing this in common code, but the "core" functionality of mark_rodata_ro doesn't need this barrier...
> 
> I suppose I can push it up to core code, and see what the rest of the community says.  Is that what you recommend?
> 

I'd feel more confident if we had an explanation of why this doesn't
occur on x86. I'm also a bit wary of throwing this in here since it's
an implementation detail of modules. I'm traveling, but if I get
a chance I'll see if I can dig into whether x86 is just getting lucky
or whether something else is going on.

Thanks,
Laura


* [PATCH] arm64: mm: Fix false positives in W+X checking
  2018-04-25 15:12   ` Jeffrey Hugo
  2018-04-25 15:41     ` Laura Abbott
@ 2018-04-25 16:23     ` Kees Cook
  2018-04-25 16:36       ` Jeffrey Hugo
  1 sibling, 1 reply; 9+ messages in thread
From: Kees Cook @ 2018-04-25 16:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 25, 2018 at 8:12 AM, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
> On 4/25/2018 9:03 AM, Will Deacon wrote:
>>
>> Hi Jeffrey,
>>
>> Thanks for the patch. Comment below.
>>
>> On Wed, Apr 25, 2018 at 08:13:31AM -0600, Jeffrey Hugo wrote:
>>>
>>> load_module() creates W+X mappings via __vmalloc_node_range() (from
>>> layout_and_allocate()->move_module()->module_alloc()) by using
>>> PAGE_KERNEL_EXEC.  These mappings are later cleaned up via
>>> "call_rcu_sched(&freeinit->rcu, do_free_init)" from do_init_module().
>>>
>>> This is a problem because call_rcu_sched() queues work, which can be run
>>> after debug_checkwx() is run, resulting in a race condition.  If hit, the
>>> race results in a nasty splat about insecure W+X mappings, which results
>>> in a poor user experience as these are not the mappings that
>>> debug_checkwx() is intended to catch.
>>>
>>> Address the race by flushing the queued work before running
>>> debug_checkwx().
>>>
>>> Reported-by: Timur Tabi <timur@codeaurora.org>
>>> Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
>>> Fixes: 1404d6f13e47 ("arm64: dump: Add checking for writable and
>>> exectuable pages")
>>> Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
>>> ---
>>>   arch/arm64/mm/mmu.c | 6 ++++++
>>>   1 file changed, 6 insertions(+)
>>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 2dbb2c9..40d45fd 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -503,6 +503,12 @@ void mark_rodata_ro(void)
>>>         update_mapping_prot(__pa_symbol(__start_rodata), (unsigned
>>> long)__start_rodata,
>>>                             section_size, PAGE_KERNEL_RO);
>>>   +     /*
>>> +        * load_module() results in W+X mappings, which are cleaned up
>>> with
>>> +        * call_rcu_sched().  Let's make sure that queued work is flushed
>>> so
>>> +        * that we don't hit false positives.
>>> +        */
>>> +       rcu_barrier_sched();
>>>         debug_checkwx();
>>>   }
>>
>>
>> Whilst this looks correct to me, it looks to me like other architectures
>> (e.g. x86) would be affected by this problem as well. Perhaps it would be
>> better to solve it in the core code before invoking mark_rodata_ro, or is
>> there some reason that we only run into this on arm64?
>>
>
> Thanks for the review.
>
> I was actually pondering pushing this into ptdump_check_wx() so that the
> "barrier" hit is only observed with CONFIG_DEBUG_WX.
>
> I agree, in principle this is not an arm64 specific issue.  I do not have
> sufficient equipment to validate other architectures.  On QDF2400, the
> reproduction rate is very low, roughly 1-3 instances in 2000-3000 reboots.
> I do have a system simulator which happens to repro the issue 100% of the
> time, which is what I used to debug and test this fix, before applying it to
> QDF2400.
>
> I'm waffling.  I see the benefit of fixing this in common code, but the
> "core" functionality of mark_rodata_ro doesn't need this barrier...
>
> I suppose I can push it up to core code, and see what the rest of the
> community says.  Is that what you recommend?

I think fixing this in the general case makes sense.

-Kees

-- 
Kees Cook
Pixel Security


* [PATCH] arm64: mm: Fix false positives in W+X checking
  2018-04-25 16:23     ` Kees Cook
@ 2018-04-25 16:36       ` Jeffrey Hugo
  2018-04-25 16:37         ` Will Deacon
  0 siblings, 1 reply; 9+ messages in thread
From: Jeffrey Hugo @ 2018-04-25 16:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 4/25/2018 10:23 AM, Kees Cook wrote:
> On Wed, Apr 25, 2018 at 8:12 AM, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
>> On 4/25/2018 9:03 AM, Will Deacon wrote:
>>>
>>> Hi Jeffrey,
>>>
>>> Thanks for the patch. Comment below.
>>>
>>> On Wed, Apr 25, 2018 at 08:13:31AM -0600, Jeffrey Hugo wrote:
>>>>
>>>> load_module() creates W+X mappings via __vmalloc_node_range() (from
>>>> layout_and_allocate()->move_module()->module_alloc()) by using
>>>> PAGE_KERNEL_EXEC.  These mappings are later cleaned up via
>>>> "call_rcu_sched(&freeinit->rcu, do_free_init)" from do_init_module().
>>>>
>>>> This is a problem because call_rcu_sched() queues work, which can be run
>>>> after debug_checkwx() is run, resulting in a race condition.  If hit, the
>>>> race results in a nasty splat about insecure W+X mappings, which results
>>>> in a poor user experience as these are not the mappings that
>>>> debug_checkwx() is intended to catch.
>>>>
>>>> Address the race by flushing the queued work before running
>>>> debug_checkwx().
>>>>
>>>> Reported-by: Timur Tabi <timur@codeaurora.org>
>>>> Reported-by: Jan Glauber <jan.glauber@caviumnetworks.com>
>>>> Fixes: 1404d6f13e47 ("arm64: dump: Add checking for writable and
>>>> exectuable pages")
>>>> Signed-off-by: Jeffrey Hugo <jhugo@codeaurora.org>
>>>> ---
>>>>    arch/arm64/mm/mmu.c | 6 ++++++
>>>>    1 file changed, 6 insertions(+)
>>>>
>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>> index 2dbb2c9..40d45fd 100644
>>>> --- a/arch/arm64/mm/mmu.c
>>>> +++ b/arch/arm64/mm/mmu.c
>>>> @@ -503,6 +503,12 @@ void mark_rodata_ro(void)
>>>>          update_mapping_prot(__pa_symbol(__start_rodata), (unsigned
>>>> long)__start_rodata,
>>>>                              section_size, PAGE_KERNEL_RO);
>>>>    +     /*
>>>> +        * load_module() results in W+X mappings, which are cleaned up
>>>> with
>>>> +        * call_rcu_sched().  Let's make sure that queued work is flushed
>>>> so
>>>> +        * that we don't hit false positives.
>>>> +        */
>>>> +       rcu_barrier_sched();
>>>>          debug_checkwx();
>>>>    }
>>>
>>>
>>> Whilst this looks correct to me, it looks to me like other architectures
>>> (e.g. x86) would be affected by this problem as well. Perhaps it would be
>>> better to solve it in the core code before invoking mark_rodata_ro, or is
>>> there some reason that we only run into this on arm64?
>>>
>>
>> Thanks for the review.
>>
>> I was actually pondering pushing this into ptdump_check_wx() so that the
>> "barrier" hit is only observed with CONFIG_DEBUG_WX.
>>
>> I agree, in principle this is not an arm64 specific issue.  I do not have
>> sufficient equipment to validate other architectures.  On QDF2400, the
>> reproduction rate is very low, roughly 1-3 instances in 2000-3000 reboots.
>> I do have a system simulator which happens to repro the issue 100% of the
>> time, which is what I used to debug and test this fix, before applying it to
>> QDF2400.
>>
>> I'm waffling.  I see the benefit of fixing this in common code, but the
>> "core" functionality of mark_rodata_ro doesn't need this barrier...
>>
>> I suppose I can push it up to core code, and see what the rest of the
>> community says.  Is that what you recommend?
> 
> I think fixing this in the general case makes sense.

Ok.  I'll give it until Monday to see if Laura has any insights into 
x86, and then spin a v2.

-- 
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.


* [PATCH] arm64: mm: Fix false positives in W+X checking
  2018-04-25 16:36       ` Jeffrey Hugo
@ 2018-04-25 16:37         ` Will Deacon
  2018-04-27 19:30           ` Laura Abbott
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2018-04-25 16:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Apr 25, 2018 at 10:36:25AM -0600, Jeffrey Hugo wrote:
> On 4/25/2018 10:23 AM, Kees Cook wrote:
> >On Wed, Apr 25, 2018 at 8:12 AM, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
> >>On 4/25/2018 9:03 AM, Will Deacon wrote:
> >>>On Wed, Apr 25, 2018 at 08:13:31AM -0600, Jeffrey Hugo wrote:
> >>>>diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >>>>index 2dbb2c9..40d45fd 100644
> >>>>--- a/arch/arm64/mm/mmu.c
> >>>>+++ b/arch/arm64/mm/mmu.c
> >>>>@@ -503,6 +503,12 @@ void mark_rodata_ro(void)
> >>>>         update_mapping_prot(__pa_symbol(__start_rodata), (unsigned
> >>>>long)__start_rodata,
> >>>>                             section_size, PAGE_KERNEL_RO);
> >>>>   +     /*
> >>>>+        * load_module() results in W+X mappings, which are cleaned up
> >>>>with
> >>>>+        * call_rcu_sched().  Let's make sure that queued work is flushed
> >>>>so
> >>>>+        * that we don't hit false positives.
> >>>>+        */
> >>>>+       rcu_barrier_sched();
> >>>>         debug_checkwx();
> >>>>   }
> >>>
> >>>
> >>>Whilst this looks correct to me, it looks to me like other architectures
> >>>(e.g. x86) would be affected by this problem as well. Perhaps it would be
> >>>better to solve it in the core code before invoking mark_rodata_ro, or is
> >>>there some reason that we only run into this on arm64?
> >>>
> >>
> >>Thanks for the review.
> >>
> >>I was actually pondering pushing this into ptdump_check_wx() so that the
> >>"barrier" hit is only observed with CONFIG_DEBUG_WX.
> >>
> >>I agree, in principle this is not an arm64 specific issue.  I do not have
> >>sufficient equipment to validate other architectures.  On QDF2400, the
> >>reproduction rate is very low, roughly 1-3 instances in 2000-3000 reboots.
> >>I do have a system simulator which happens to repro the issue 100% of the
> >>time, which is what I used to debug and test this fix, before applying it to
> >>QDF2400.
> >>
> >>I'm waffling.  I see the benefit of fixing this in common code, but the
> >>"core" functionality of mark_rodata_ro doesn't need this barrier...
> >>
> >>I suppose I can push it up to core code, and see what the rest of the
> >>community says.  Is that what you recommend?
> >
> >I think fixing this in the general case makes sense.
> 
> Ok.  I'll give it until Monday to see if Laura has any insights into x86,
> and then spin a v2.

Thanks, Jeffrey.

Will


* [PATCH] arm64: mm: Fix false positives in W+X checking
  2018-04-25 16:37         ` Will Deacon
@ 2018-04-27 19:30           ` Laura Abbott
  2018-04-27 20:01             ` Jeffrey Hugo
  0 siblings, 1 reply; 9+ messages in thread
From: Laura Abbott @ 2018-04-27 19:30 UTC (permalink / raw)
  To: linux-arm-kernel

On 04/25/2018 09:37 AM, Will Deacon wrote:
> On Wed, Apr 25, 2018 at 10:36:25AM -0600, Jeffrey Hugo wrote:
>> On 4/25/2018 10:23 AM, Kees Cook wrote:
>>> On Wed, Apr 25, 2018 at 8:12 AM, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
>>>> On 4/25/2018 9:03 AM, Will Deacon wrote:
>>>>> On Wed, Apr 25, 2018 at 08:13:31AM -0600, Jeffrey Hugo wrote:
>>>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>>>> index 2dbb2c9..40d45fd 100644
>>>>>> --- a/arch/arm64/mm/mmu.c
>>>>>> +++ b/arch/arm64/mm/mmu.c
>>>>>> @@ -503,6 +503,12 @@ void mark_rodata_ro(void)
>>>>>>          update_mapping_prot(__pa_symbol(__start_rodata), (unsigned
>>>>>> long)__start_rodata,
>>>>>>                              section_size, PAGE_KERNEL_RO);
>>>>>>    +     /*
>>>>>> +        * load_module() results in W+X mappings, which are cleaned up
>>>>>> with
>>>>>> +        * call_rcu_sched().  Let's make sure that queued work is flushed
>>>>>> so
>>>>>> +        * that we don't hit false positives.
>>>>>> +        */
>>>>>> +       rcu_barrier_sched();
>>>>>>          debug_checkwx();
>>>>>>    }
>>>>>
>>>>>
>>>>> Whilst this looks correct to me, it looks to me like other architectures
>>>>> (e.g. x86) would be affected by this problem as well. Perhaps it would be
>>>>> better to solve it in the core code before invoking mark_rodata_ro, or is
>>>>> there some reason that we only run into this on arm64?
>>>>>
>>>>
>>>> Thanks for the review.
>>>>
>>>> I was actually pondering pushing this into ptdump_check_wx() so that the
>>>> "barrier" hit is only observed with CONFIG_DEBUG_WX.
>>>>
>>>> I agree, in principle this is not an arm64 specific issue.  I do not have
>>>> sufficient equipment to validate other architectures.  On QDF2400, the
>>>> reproduction rate is very low, roughly 1-3 instances in 2000-3000 reboots.
>>>> I do have a system simulator which happens to repro the issue 100% of the
>>>> time, which is what I used to debug and test this fix, before applying it to
>>>> QDF2400.
>>>>
>>>> I'm waffling.  I see the benefit of fixing this in common code, but the
>>>> "core" functionality of mark_rodata_ro doesn't need this barrier...
>>>>
>>>> I suppose I can push it up to core code, and see what the rest of the
>>>> community says.  Is that what you recommend?
>>>
>>> I think fixing this in the general case makes sense.
>>
>> Ok.  I'll give it until Monday to see if Laura has any insights into x86,
>> and then spin a v2.
> 
> Thanks, Jeffrey.
> 
> Will
> 

I set up some test code that does request_module in a late_initcall
and stuck a delay and print in do_free_init and triggered it:

[    6.115319]  ? module_memfree+0x10/0x10
[    6.116082]  rcu_process_callbacks+0x1bd/0x7f0
[    6.116895]  __do_softirq+0xd9/0x2af
[    6.117588]  irq_exit+0xa9/0xb0
[    6.118193]  smp_apic_timer_interrupt+0x67/0x120
[    6.118993]  apic_timer_interrupt+0xf/0x20
[    6.119732]  </IRQ>
[    6.120265] RIP: 0010:__change_page_attr_set_clr+0x667/0xc80
[    6.121363] RSP: 0018:ffffc90000197c60 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[    6.123417] RAX: 00000000000001e8 RBX: 80000000000000e3 RCX: ffffffff81c031d1
[    6.124677] RDX: 0000000000000000 RSI: ffffc90000197dd8 RDI: 80000000028000e3
[    6.125905] RBP: ffff8800029e7000 R08: 00000000ffffd801 R09: 000ffffffffff000
[    6.127114] R10: 0000000000000001 R11: 000ffffffffff000 R12: 80000000000000e3
[    6.128350] R13: 00000000000029e7 R14: 0000000000000200 R15: 80000000000000e3
[    6.129522]  ? __alloc_pages_nodemask+0xfc/0x220
[    6.130218]  ? smp_call_function_many+0x1e7/0x230
[    6.130935]  ? load_new_mm_cr3+0xe0/0xe0
[    6.131566]  __change_page_attr_set_clr+0xc59/0xc80
[    6.132293]  ? cpumask_next+0x16/0x20
[    6.132894]  change_page_attr_set_clr+0x131/0x470
[    6.133601]  set_memory_rw+0x21/0x30
[    6.134172]  free_init_pages+0x59/0x80
[    6.134766]  ? rest_init+0xb0/0xb0
[    6.135322]  kernel_init+0x14/0x100
[    6.135885]  ret_from_fork+0x35/0x40

This is partway through mark_rodata_ro(), which does the debug_checkwx(),
so I think it could still be an issue on x86. The set_memory_* functions
on x86 are a bit more involved than on arm64 and may provide more
opportunities for the RCU callbacks to run. I'm guessing arm64 may just
be loading modules that are particularly likely to load other modules.

So if you're willing to take some of my hand waving, I think this
should go in generic code since this is difficult to reproduce.
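
A rough reconstruction of the test, in case it's useful to anyone else
(the module name and delay value are just placeholders, not the exact hack):

/* force a module load late in boot */
static int __init wx_repro_test(void)
{
	request_module("loop");		/* any module available at this point */
	return 0;
}
late_initcall(wx_repro_test);

/* and in kernel/module.c, widen the race window in do_free_init() */
static void do_free_init(struct rcu_head *head)
{
	struct mod_initfree *m = container_of(head, struct mod_initfree, rcu);

	mdelay(100);	/* delay so the free lands mid-mark_rodata_ro() */
	pr_info("do_free_init: freeing module init\n");
	module_memfree(m->module_init);
	kfree(m);
}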

Thanks,
Laura


* [PATCH] arm64: mm: Fix false positives in W+X checking
  2018-04-27 19:30           ` Laura Abbott
@ 2018-04-27 20:01             ` Jeffrey Hugo
  0 siblings, 0 replies; 9+ messages in thread
From: Jeffrey Hugo @ 2018-04-27 20:01 UTC (permalink / raw)
  To: linux-arm-kernel

On 4/27/2018 1:30 PM, Laura Abbott wrote:
> On 04/25/2018 09:37 AM, Will Deacon wrote:
>> On Wed, Apr 25, 2018 at 10:36:25AM -0600, Jeffrey Hugo wrote:
>>> On 4/25/2018 10:23 AM, Kees Cook wrote:
>>>> On Wed, Apr 25, 2018 at 8:12 AM, Jeffrey Hugo <jhugo@codeaurora.org> 
>>>> wrote:
>>>>> On 4/25/2018 9:03 AM, Will Deacon wrote:
>>>>>> On Wed, Apr 25, 2018 at 08:13:31AM -0600, Jeffrey Hugo wrote:
>>>>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>>>>> index 2dbb2c9..40d45fd 100644
>>>>>>> --- a/arch/arm64/mm/mmu.c
>>>>>>> +++ b/arch/arm64/mm/mmu.c
>>>>>>> @@ -503,6 +503,12 @@ void mark_rodata_ro(void)
>>>>>>>  	update_mapping_prot(__pa_symbol(__start_rodata), (unsigned long)__start_rodata,
>>>>>>>  			    section_size, PAGE_KERNEL_RO);
>>>>>>> +	/*
>>>>>>> +	 * load_module() results in W+X mappings, which are cleaned up with
>>>>>>> +	 * call_rcu_sched().  Let's make sure that queued work is flushed so
>>>>>>> +	 * that we don't hit false positives.
>>>>>>> +	 */
>>>>>>> +	rcu_barrier_sched();
>>>>>>>  	debug_checkwx();
>>>>>>>  }
>>>>>>
>>>>>>
>>>>>> Whilst this looks correct to me, it looks to me like other 
>>>>>> architectures
>>>>>> (e.g. x86) would be affected by this problem as well. Perhaps it 
>>>>>> would be
>>>>>> better to solve it in the core code before invoking 
>>>>>> mark_rodata_ro, or is
>>>>>> there some reason that we only run into this on arm64?
>>>>>>
>>>>>
>>>>> Thanks for the review.
>>>>>
>>>>> I was actually pondering pushing this into ptdump_check_wx() so that
>>>>> the "barrier" hit is only observed with CONFIG_DEBUG_WX.
>>>>>
>>>>> I agree, in principle this is not an arm64 specific issue.  I do not
>>>>> have sufficient equipment to validate other architectures.  On
>>>>> QDF2400, the reproduction rate is very low, roughly 1-3 instances in
>>>>> 2000-3000 reboots.  I do have a system simulator which happens to
>>>>> repro the issue 100% of the time, which is what I used to debug and
>>>>> test this fix, before applying it to QDF2400.
>>>>>
>>>>> I'm waffling.  I see the benefit of fixing this in common code, but
>>>>> the "core" functionality of mark_rodata_ro doesn't need this
>>>>> barrier...
>>>>>
>>>>> I suppose I can push it up to core code, and see what the rest of the
>>>>> community says.  Is that what you recommend?
>>>>
>>>> I think fixing this in the general case makes sense.
>>>
>>> Ok.  I'll give it until Monday to see if Laura has any insights into
>>> x86, and then spin a v2.
>>
>> Thanks, Jeffrey.
>>
>> Will
>>
> 
> I set up some test code that does request_module in a late_initcall
> and stuck a delay and print in do_free_init and triggered it:
> 
> [    6.115319]  ? module_memfree+0x10/0x10
> [    6.116082]  rcu_process_callbacks+0x1bd/0x7f0
> [    6.116895]  __do_softirq+0xd9/0x2af
> [    6.117588]  irq_exit+0xa9/0xb0
> [    6.118193]  smp_apic_timer_interrupt+0x67/0x120
> [    6.118993]  apic_timer_interrupt+0xf/0x20
> [    6.119732]  </IRQ>
> [    6.120265] RIP: 0010:__change_page_attr_set_clr+0x667/0xc80
> [    6.121363] RSP: 0018:ffffc90000197c60 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> [    6.123417] RAX: 00000000000001e8 RBX: 80000000000000e3 RCX: ffffffff81c031d1
> [    6.124677] RDX: 0000000000000000 RSI: ffffc90000197dd8 RDI: 80000000028000e3
> [    6.125905] RBP: ffff8800029e7000 R08: 00000000ffffd801 R09: 000ffffffffff000
> [    6.127114] R10: 0000000000000001 R11: 000ffffffffff000 R12: 80000000000000e3
> [    6.128350] R13: 00000000000029e7 R14: 0000000000000200 R15: 80000000000000e3
> [    6.129522]  ? __alloc_pages_nodemask+0xfc/0x220
> [    6.130218]  ? smp_call_function_many+0x1e7/0x230
> [    6.130935]  ? load_new_mm_cr3+0xe0/0xe0
> [    6.131566]  __change_page_attr_set_clr+0xc59/0xc80
> [    6.132293]  ? cpumask_next+0x16/0x20
> [    6.132894]  change_page_attr_set_clr+0x131/0x470
> [    6.133601]  set_memory_rw+0x21/0x30
> [    6.134172]  free_init_pages+0x59/0x80
> [    6.134766]  ? rest_init+0xb0/0xb0
> [    6.135322]  kernel_init+0x14/0x100
> [    6.135885]  ret_from_fork+0x35/0x40
> 
> This is part way through mark_rodata_ro which does the debug_checkwx()
> so I think it could still be an issue on x86. The set_memory_* functions
> on x86 are a bit more involved than arm64 and may provide more
> opportunities for the rcu callbacks to run. I'm guessing arm64 may just
> be loading modules that could be particularly likely to load other
> modules.
> 
> So if you're willing to take some of my hand waving, I think this
> should go in generic code since this is difficult to reproduce.

Thank you for the follow-up, Laura.  In my opinion, there is no hand
waving.  I feel you synthetically demonstrated the issue on x86, thus
confirming this is a multi-architecture issue.

I will work on a v2 moving the barrier to common code as suggested.
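
Roughly, I'm picturing something like the following in init/main.c (sketch
only; the actual v2 may end up looking different):

static void mark_readonly(void)
{
	if (rodata_enabled) {
		/*
		 * load_module() leaves W+X mappings whose teardown is
		 * deferred via call_rcu_sched(); flush them before the
		 * arch W+X check in mark_rodata_ro() runs.
		 */
		rcu_barrier_sched();
		mark_rodata_ro();
		rodata_test();
	} else
		pr_info("Kernel memory protection disabled.\n");
}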

-- 
Jeffrey Hugo
Qualcomm Datacenter Technologies as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

