linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] stackleak: Fix a race between stack erasing sysctl handlers
@ 2020-08-28  3:19 Muchun Song
  2020-09-07  2:54 ` Muchun Song
  0 siblings, 1 reply; 10+ messages in thread
From: Muchun Song @ 2020-08-28  3:19 UTC (permalink / raw)
  To: keescook, mhiramat, rostedt, alex.popov, miguel.ojeda.sandonis
  Cc: linux-kernel, Muchun Song

There is a race between the assignment of `table->data` and write value
to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
the other thread.

    CPU0:                                 CPU1:
                                          proc_sys_write
    stack_erasing_sysctl                    proc_sys_call_handler
      table->data = &state;                   stack_erasing_sysctl
                                                table->data = &state;
      proc_doulongvec_minmax
        do_proc_doulongvec_minmax             sysctl_head_finish
          __do_proc_doulongvec_minmax           unuse_table
            i = table->data;
            *i = val;  // corrupt CPU1's stack

Fix this by duplicating the `table`, and only update the duplicate of
it.

Fixes: 964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack erasing")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
---
changelogs in v2:
 1. Add more details about how the race happened to the commit message.

 kernel/stackleak.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/stackleak.c b/kernel/stackleak.c
index a8fc9ae1d03d..fd95b87478ff 100644
--- a/kernel/stackleak.c
+++ b/kernel/stackleak.c
@@ -25,10 +25,15 @@ int stack_erasing_sysctl(struct ctl_table *table, int write,
 	int ret = 0;
 	int state = !static_branch_unlikely(&stack_erasing_bypass);
 	int prev_state = state;
+	struct ctl_table dup_table = *table;
 
-	table->data = &state;
-	table->maxlen = sizeof(int);
-	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+	/*
+	 * In order to avoid races with __do_proc_doulongvec_minmax(), we
+	 * can duplicate the @table and alter the duplicate of it.
+	 */
+	dup_table.data = &state;
+	dup_table.maxlen = sizeof(int);
+	ret = proc_dointvec_minmax(&dup_table, write, buffer, lenp, ppos);
 	state = !!state;
 	if (ret || !write || state == prev_state)
 		return ret;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] stackleak: Fix a race between stack erasing sysctl handlers
  2020-08-28  3:19 [PATCH v2] stackleak: Fix a race between stack erasing sysctl handlers Muchun Song
@ 2020-09-07  2:54 ` Muchun Song
  2020-09-07 11:24   ` Alexander Popov
  0 siblings, 1 reply; 10+ messages in thread
From: Muchun Song @ 2020-09-07  2:54 UTC (permalink / raw)
  To: Kees Cook, Masami Hiramatsu, Steven Rostedt, alex.popov,
	miguel.ojeda.sandonis
  Cc: LKML

Hi all,

Any comments or suggestions? Thanks.

On Fri, Aug 28, 2020 at 11:19 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> There is a race between the assignment of `table->data` and write value
> to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
> the other thread.
>
>     CPU0:                                 CPU1:
>                                           proc_sys_write
>     stack_erasing_sysctl                    proc_sys_call_handler
>       table->data = &state;                   stack_erasing_sysctl
>                                                 table->data = &state;
>       proc_doulongvec_minmax
>         do_proc_doulongvec_minmax             sysctl_head_finish
>           __do_proc_doulongvec_minmax           unuse_table
>             i = table->data;
>             *i = val;  // corrupt CPU1's stack
>
> Fix this by duplicating the `table`, and only update the duplicate of
> it.
>
> Fixes: 964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack erasing")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> ---
> changelogs in v2:
>  1. Add more details about how the race happened to the commit message.
>
>  kernel/stackleak.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
> index a8fc9ae1d03d..fd95b87478ff 100644
> --- a/kernel/stackleak.c
> +++ b/kernel/stackleak.c
> @@ -25,10 +25,15 @@ int stack_erasing_sysctl(struct ctl_table *table, int write,
>         int ret = 0;
>         int state = !static_branch_unlikely(&stack_erasing_bypass);
>         int prev_state = state;
> +       struct ctl_table dup_table = *table;
>
> -       table->data = &state;
> -       table->maxlen = sizeof(int);
> -       ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> +       /*
> +        * In order to avoid races with __do_proc_doulongvec_minmax(), we
> +        * can duplicate the @table and alter the duplicate of it.
> +        */
> +       dup_table.data = &state;
> +       dup_table.maxlen = sizeof(int);
> +       ret = proc_dointvec_minmax(&dup_table, write, buffer, lenp, ppos);
>         state = !!state;
>         if (ret || !write || state == prev_state)
>                 return ret;
> --
> 2.11.0
>


-- 
Yours,
Muchun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] stackleak: Fix a race between stack erasing sysctl handlers
  2020-09-07  2:54 ` Muchun Song
@ 2020-09-07 11:24   ` Alexander Popov
  2020-09-07 13:53     ` [External] " Muchun Song
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Popov @ 2020-09-07 11:24 UTC (permalink / raw)
  To: Muchun Song, Kees Cook, Masami Hiramatsu, Steven Rostedt,
	miguel.ojeda.sandonis
  Cc: LKML

On 07.09.2020 05:54, Muchun Song wrote:
> Hi all,
> 
> Any comments or suggestions? Thanks.
> 
> On Fri, Aug 28, 2020 at 11:19 AM Muchun Song <songmuchun@bytedance.com> wrote:
>>
>> There is a race between the assignment of `table->data` and write value
>> to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
>> the other thread.
>>
>>     CPU0:                                 CPU1:
>>                                           proc_sys_write
>>     stack_erasing_sysctl                    proc_sys_call_handler
>>       table->data = &state;                   stack_erasing_sysctl
>>                                                 table->data = &state;
>>       proc_doulongvec_minmax
>>         do_proc_doulongvec_minmax             sysctl_head_finish
>>           __do_proc_doulongvec_minmax           unuse_table
>>             i = table->data;
>>             *i = val;  // corrupt CPU1's stack

Hello everyone!

As I remember, I implemented stack_erasing_sysctl() very similar to other sysctl
handlers. Is that issue relevant for other handlers as well?

Muchun, could you elaborate how CPU1's stack is corrupted and how you detected
that? Thanks!

Best regards,
Alexander

>> Fix this by duplicating the `table`, and only update the duplicate of
>> it.
>>
>> Fixes: 964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack erasing")
>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>> ---
>> changelogs in v2:
>>  1. Add more details about how the race happened to the commit message.
>>
>>  kernel/stackleak.c | 11 ++++++++---
>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
>> index a8fc9ae1d03d..fd95b87478ff 100644
>> --- a/kernel/stackleak.c
>> +++ b/kernel/stackleak.c
>> @@ -25,10 +25,15 @@ int stack_erasing_sysctl(struct ctl_table *table, int write,
>>         int ret = 0;
>>         int state = !static_branch_unlikely(&stack_erasing_bypass);
>>         int prev_state = state;
>> +       struct ctl_table dup_table = *table;
>>
>> -       table->data = &state;
>> -       table->maxlen = sizeof(int);
>> -       ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>> +       /*
>> +        * In order to avoid races with __do_proc_doulongvec_minmax(), we
>> +        * can duplicate the @table and alter the duplicate of it.
>> +        */
>> +       dup_table.data = &state;
>> +       dup_table.maxlen = sizeof(int);
>> +       ret = proc_dointvec_minmax(&dup_table, write, buffer, lenp, ppos);
>>         state = !!state;
>>         if (ret || !write || state == prev_state)
>>                 return ret;
>> --
>> 2.11.0
>>
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [External] Re: [PATCH v2] stackleak: Fix a race between stack erasing sysctl handlers
  2020-09-07 11:24   ` Alexander Popov
@ 2020-09-07 13:53     ` Muchun Song
  2020-09-11  3:53       ` Muchun Song
  2020-09-14 13:56       ` Alexander Popov
  0 siblings, 2 replies; 10+ messages in thread
From: Muchun Song @ 2020-09-07 13:53 UTC (permalink / raw)
  To: alex.popov
  Cc: Kees Cook, Masami Hiramatsu, Steven Rostedt, miguel.ojeda.sandonis, LKML

On Mon, Sep 7, 2020 at 7:24 PM Alexander Popov <alex.popov@linux.com> wrote:
>
> On 07.09.2020 05:54, Muchun Song wrote:
> > Hi all,
> >
> > Any comments or suggestions? Thanks.
> >
> > On Fri, Aug 28, 2020 at 11:19 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >>
> >> There is a race between the assignment of `table->data` and write value
> >> to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
> >> the other thread.
> >>
> >>     CPU0:                                 CPU1:
> >>                                           proc_sys_write
> >>     stack_erasing_sysctl                    proc_sys_call_handler
> >>       table->data = &state;                   stack_erasing_sysctl
> >>                                                 table->data = &state;
> >>       proc_doulongvec_minmax
> >>         do_proc_doulongvec_minmax             sysctl_head_finish
> >>           __do_proc_doulongvec_minmax           unuse_table
> >>             i = table->data;
> >>             *i = val;  // corrupt CPU1's stack
>
> Hello everyone!
>
> As I remember, I implemented stack_erasing_sysctl() very similar to other sysctl
> handlers. Is that issue relevant for other handlers as well?

Yeah, it's very similar. But the difference is that others use a
global variable as the
`table->data`, but here we use a local variable as the `table->data`.
The local variable
is allocated from the stack. So other thread could corrupt the stack
like the diagram
above.

>
> Muchun, could you elaborate how CPU1's stack is corrupted and how you detected
> that? Thanks!

Why did I find this problem? Because I solve another problem which is
very similar to
this issue. You can reference the following fix patch. Thanks.

  https://lkml.org/lkml/2020/8/22/105




>
> Best regards,
> Alexander
>
> >> Fix this by duplicating the `table`, and only update the duplicate of
> >> it.
> >>
> >> Fixes: 964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack erasing")
> >> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >> ---
> >> changelogs in v2:
> >>  1. Add more details about how the race happened to the commit message.
> >>
> >>  kernel/stackleak.c | 11 ++++++++---
> >>  1 file changed, 8 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
> >> index a8fc9ae1d03d..fd95b87478ff 100644
> >> --- a/kernel/stackleak.c
> >> +++ b/kernel/stackleak.c
> >> @@ -25,10 +25,15 @@ int stack_erasing_sysctl(struct ctl_table *table, int write,
> >>         int ret = 0;
> >>         int state = !static_branch_unlikely(&stack_erasing_bypass);
> >>         int prev_state = state;
> >> +       struct ctl_table dup_table = *table;
> >>
> >> -       table->data = &state;
> >> -       table->maxlen = sizeof(int);
> >> -       ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> >> +       /*
> >> +        * In order to avoid races with __do_proc_doulongvec_minmax(), we
> >> +        * can duplicate the @table and alter the duplicate of it.
> >> +        */
> >> +       dup_table.data = &state;
> >> +       dup_table.maxlen = sizeof(int);
> >> +       ret = proc_dointvec_minmax(&dup_table, write, buffer, lenp, ppos);
> >>         state = !!state;
> >>         if (ret || !write || state == prev_state)
> >>                 return ret;
> >> --
> >> 2.11.0
> >>
> >
> >
>


--
Yours,
Muchun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [External] Re: [PATCH v2] stackleak: Fix a race between stack erasing sysctl handlers
  2020-09-07 13:53     ` [External] " Muchun Song
@ 2020-09-11  3:53       ` Muchun Song
  2020-09-14 13:56       ` Alexander Popov
  1 sibling, 0 replies; 10+ messages in thread
From: Muchun Song @ 2020-09-11  3:53 UTC (permalink / raw)
  To: alex.popov, Kees Cook, Masami Hiramatsu, Steven Rostedt,
	miguel.ojeda.sandonis
  Cc: LKML

Ping guys. Thanks.

On Mon, Sep 7, 2020 at 9:53 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Mon, Sep 7, 2020 at 7:24 PM Alexander Popov <alex.popov@linux.com> wrote:
> >
> > On 07.09.2020 05:54, Muchun Song wrote:
> > > Hi all,
> > >
> > > Any comments or suggestions? Thanks.
> > >
> > > On Fri, Aug 28, 2020 at 11:19 AM Muchun Song <songmuchun@bytedance.com> wrote:
> > >>
> > >> There is a race between the assignment of `table->data` and write value
> > >> to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
> > >> the other thread.
> > >>
> > >>     CPU0:                                 CPU1:
> > >>                                           proc_sys_write
> > >>     stack_erasing_sysctl                    proc_sys_call_handler
> > >>       table->data = &state;                   stack_erasing_sysctl
> > >>                                                 table->data = &state;
> > >>       proc_doulongvec_minmax
> > >>         do_proc_doulongvec_minmax             sysctl_head_finish
> > >>           __do_proc_doulongvec_minmax           unuse_table
> > >>             i = table->data;
> > >>             *i = val;  // corrupt CPU1's stack
> >
> > Hello everyone!
> >
> > As I remember, I implemented stack_erasing_sysctl() very similar to other sysctl
> > handlers. Is that issue relevant for other handlers as well?
>
> Yeah, it's very similar. But the difference is that others use a
> global variable as the
> `table->data`, but here we use a local variable as the `table->data`.
> The local variable
> is allocated from the stack. So other thread could corrupt the stack
> like the diagram
> above.
>
> >
> > Muchun, could you elaborate how CPU1's stack is corrupted and how you detected
> > that? Thanks!
>
> Why did I find this problem? Because I solve another problem which is
> very similar to
> this issue. You can reference the following fix patch. Thanks.
>
>   https://lkml.org/lkml/2020/8/22/105
>
>
>
>
> >
> > Best regards,
> > Alexander
> >
> > >> Fix this by duplicating the `table`, and only update the duplicate of
> > >> it.
> > >>
> > >> Fixes: 964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack erasing")
> > >> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > >> ---
> > >> changelogs in v2:
> > >>  1. Add more details about how the race happened to the commit message.
> > >>
> > >>  kernel/stackleak.c | 11 ++++++++---
> > >>  1 file changed, 8 insertions(+), 3 deletions(-)
> > >>
> > >> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
> > >> index a8fc9ae1d03d..fd95b87478ff 100644
> > >> --- a/kernel/stackleak.c
> > >> +++ b/kernel/stackleak.c
> > >> @@ -25,10 +25,15 @@ int stack_erasing_sysctl(struct ctl_table *table, int write,
> > >>         int ret = 0;
> > >>         int state = !static_branch_unlikely(&stack_erasing_bypass);
> > >>         int prev_state = state;
> > >> +       struct ctl_table dup_table = *table;
> > >>
> > >> -       table->data = &state;
> > >> -       table->maxlen = sizeof(int);
> > >> -       ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> > >> +       /*
> > >> +        * In order to avoid races with __do_proc_doulongvec_minmax(), we
> > >> +        * can duplicate the @table and alter the duplicate of it.
> > >> +        */
> > >> +       dup_table.data = &state;
> > >> +       dup_table.maxlen = sizeof(int);
> > >> +       ret = proc_dointvec_minmax(&dup_table, write, buffer, lenp, ppos);
> > >>         state = !!state;
> > >>         if (ret || !write || state == prev_state)
> > >>                 return ret;
> > >> --
> > >> 2.11.0
> > >>
> > >
> > >
> >
>
>
> --
> Yours,
> Muchun



-- 
Yours,
Muchun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [External] Re: [PATCH v2] stackleak: Fix a race between stack erasing sysctl handlers
  2020-09-07 13:53     ` [External] " Muchun Song
  2020-09-11  3:53       ` Muchun Song
@ 2020-09-14 13:56       ` Alexander Popov
  2020-09-14 14:09         ` Muchun Song
  2020-09-22  5:59         ` Muchun Song
  1 sibling, 2 replies; 10+ messages in thread
From: Alexander Popov @ 2020-09-14 13:56 UTC (permalink / raw)
  To: Muchun Song
  Cc: Kees Cook, Masami Hiramatsu, Steven Rostedt,
	miguel.ojeda.sandonis, LKML, Luis Chamberlain, Iurii Zaikin,
	linux-fsdevel, mike.kravetz

On 07.09.2020 16:53, Muchun Song wrote:
> On Mon, Sep 7, 2020 at 7:24 PM Alexander Popov <alex.popov@linux.com> wrote:
>>
>> On 07.09.2020 05:54, Muchun Song wrote:
>>> Hi all,
>>>
>>> Any comments or suggestions? Thanks.
>>>
>>> On Fri, Aug 28, 2020 at 11:19 AM Muchun Song <songmuchun@bytedance.com> wrote:
>>>>
>>>> There is a race between the assignment of `table->data` and write value
>>>> to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
>>>> the other thread.
>>>>
>>>>     CPU0:                                 CPU1:
>>>>                                           proc_sys_write
>>>>     stack_erasing_sysctl                    proc_sys_call_handler
>>>>       table->data = &state;                   stack_erasing_sysctl
>>>>                                                 table->data = &state;
>>>>       proc_doulongvec_minmax
>>>>         do_proc_doulongvec_minmax             sysctl_head_finish
>>>>           __do_proc_doulongvec_minmax           unuse_table
>>>>             i = table->data;
>>>>             *i = val;  // corrupt CPU1's stack
>>
>> Hello everyone!
>>
>> As I remember, I implemented stack_erasing_sysctl() very similar to other sysctl
>> handlers. Is that issue relevant for other handlers as well?
> 
> Yeah, it's very similar. But the difference is that others use a
> global variable as the
> `table->data`, but here we use a local variable as the `table->data`.
> The local variable
> is allocated from the stack. So other thread could corrupt the stack
> like the diagram
> above.

Hi Muchun,

I don't think that the proposed copying of struct ctl_table to local variable is
a good fix of that issue. There might be other bugs caused by concurrent
execution of stack_erasing_sysctl().

I would recommend using some locking instead.

But you say there are other similar issues. Should it be fixed on higher level
in kernel/sysctl.c?

[Adding more knowing people to CC]

Thanks!

>> Muchun, could you elaborate how CPU1's stack is corrupted and how you detected
>> that? Thanks!
> 
> Why did I find this problem? Because I solve another problem which is
> very similar to
> this issue. You can reference the following fix patch. Thanks.
> 
>   https://lkml.org/lkml/2020/8/22/105
>>
>>>> Fix this by duplicating the `table`, and only update the duplicate of
>>>> it.
>>>>
>>>> Fixes: 964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack erasing")
>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>>> ---
>>>> changelogs in v2:
>>>>  1. Add more details about how the race happened to the commit message.
>>>>
>>>>  kernel/stackleak.c | 11 ++++++++---
>>>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
>>>> index a8fc9ae1d03d..fd95b87478ff 100644
>>>> --- a/kernel/stackleak.c
>>>> +++ b/kernel/stackleak.c
>>>> @@ -25,10 +25,15 @@ int stack_erasing_sysctl(struct ctl_table *table, int write,
>>>>         int ret = 0;
>>>>         int state = !static_branch_unlikely(&stack_erasing_bypass);
>>>>         int prev_state = state;
>>>> +       struct ctl_table dup_table = *table;
>>>>
>>>> -       table->data = &state;
>>>> -       table->maxlen = sizeof(int);
>>>> -       ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>>>> +       /*
>>>> +        * In order to avoid races with __do_proc_doulongvec_minmax(), we
>>>> +        * can duplicate the @table and alter the duplicate of it.
>>>> +        */
>>>> +       dup_table.data = &state;
>>>> +       dup_table.maxlen = sizeof(int);
>>>> +       ret = proc_dointvec_minmax(&dup_table, write, buffer, lenp, ppos);
>>>>         state = !!state;
>>>>         if (ret || !write || state == prev_state)
>>>>                 return ret;
>>>> --
>>>> 2.11.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [External] Re: [PATCH v2] stackleak: Fix a race between stack erasing sysctl handlers
  2020-09-14 13:56       ` Alexander Popov
@ 2020-09-14 14:09         ` Muchun Song
  2020-09-22  5:59         ` Muchun Song
  1 sibling, 0 replies; 10+ messages in thread
From: Muchun Song @ 2020-09-14 14:09 UTC (permalink / raw)
  To: alex.popov
  Cc: Kees Cook, Masami Hiramatsu, Steven Rostedt,
	miguel.ojeda.sandonis, LKML, Luis Chamberlain, Iurii Zaikin,
	linux-fsdevel, Mike Kravetz

On Mon, Sep 14, 2020 at 9:56 PM Alexander Popov <alex.popov@linux.com> wrote:
>
> On 07.09.2020 16:53, Muchun Song wrote:
> > On Mon, Sep 7, 2020 at 7:24 PM Alexander Popov <alex.popov@linux.com> wrote:
> >>
> >> On 07.09.2020 05:54, Muchun Song wrote:
> >>> Hi all,
> >>>
> >>> Any comments or suggestions? Thanks.
> >>>
> >>> On Fri, Aug 28, 2020 at 11:19 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >>>>
> >>>> There is a race between the assignment of `table->data` and write value
> >>>> to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
> >>>> the other thread.
> >>>>
> >>>>     CPU0:                                 CPU1:
> >>>>                                           proc_sys_write
> >>>>     stack_erasing_sysctl                    proc_sys_call_handler
> >>>>       table->data = &state;                   stack_erasing_sysctl
> >>>>                                                 table->data = &state;
> >>>>       proc_doulongvec_minmax
> >>>>         do_proc_doulongvec_minmax             sysctl_head_finish
> >>>>           __do_proc_doulongvec_minmax           unuse_table
> >>>>             i = table->data;
> >>>>             *i = val;  // corrupt CPU1's stack
> >>
> >> Hello everyone!
> >>
> >> As I remember, I implemented stack_erasing_sysctl() very similar to other sysctl
> >> handlers. Is that issue relevant for other handlers as well?
> >
> > Yeah, it's very similar. But the difference is that others use a
> > global variable as the
> > `table->data`, but here we use a local variable as the `table->data`.
> > The local variable
> > is allocated from the stack. So other thread could corrupt the stack
> > like the diagram
> > above.
>
> Hi Muchun,
>
> I don't think that the proposed copying of struct ctl_table to local variable is
> a good fix of that issue. There might be other bugs caused by concurrent
> execution of stack_erasing_sysctl().

I can not figure out how the bug happened when there is concurrent
execution of stack_erasing_sysctl().

>
> I would recommend using some locking instead.
>
> But you say there are other similar issues. Should it be fixed on higher level
> in kernel/sysctl.c?

Yeah, we can see the same issue here.

    https://lkml.org/lkml/2020/8/22/105.

I agree with you. Maybe a fix on the higher level is a good choice in
kernel/sysctl.c. If someone also agrees with this solution, I can do
this work.

>
> [Adding more knowing people to CC]
>
> Thanks!
>
> >> Muchun, could you elaborate how CPU1's stack is corrupted and how you detected
> >> that? Thanks!
> >
> > Why did I find this problem? Because I solve another problem which is
> > very similar to
> > this issue. You can reference the following fix patch. Thanks.
> >
> >   https://lkml.org/lkml/2020/8/22/105
> >>
> >>>> Fix this by duplicating the `table`, and only update the duplicate of
> >>>> it.
> >>>>
> >>>> Fixes: 964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack erasing")
> >>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >>>> ---
> >>>> changelogs in v2:
> >>>>  1. Add more details about how the race happened to the commit message.
> >>>>
> >>>>  kernel/stackleak.c | 11 ++++++++---
> >>>>  1 file changed, 8 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
> >>>> index a8fc9ae1d03d..fd95b87478ff 100644
> >>>> --- a/kernel/stackleak.c
> >>>> +++ b/kernel/stackleak.c
> >>>> @@ -25,10 +25,15 @@ int stack_erasing_sysctl(struct ctl_table *table, int write,
> >>>>         int ret = 0;
> >>>>         int state = !static_branch_unlikely(&stack_erasing_bypass);
> >>>>         int prev_state = state;
> >>>> +       struct ctl_table dup_table = *table;
> >>>>
> >>>> -       table->data = &state;
> >>>> -       table->maxlen = sizeof(int);
> >>>> -       ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> >>>> +       /*
> >>>> +        * In order to avoid races with __do_proc_doulongvec_minmax(), we
> >>>> +        * can duplicate the @table and alter the duplicate of it.
> >>>> +        */
> >>>> +       dup_table.data = &state;
> >>>> +       dup_table.maxlen = sizeof(int);
> >>>> +       ret = proc_dointvec_minmax(&dup_table, write, buffer, lenp, ppos);
> >>>>         state = !!state;
> >>>>         if (ret || !write || state == prev_state)
> >>>>                 return ret;
> >>>> --
> >>>> 2.11.0



-- 
Yours,
Muchun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [External] Re: [PATCH v2] stackleak: Fix a race between stack erasing sysctl handlers
  2020-09-14 13:56       ` Alexander Popov
  2020-09-14 14:09         ` Muchun Song
@ 2020-09-22  5:59         ` Muchun Song
  2020-09-28  6:32           ` Alexander Popov
  1 sibling, 1 reply; 10+ messages in thread
From: Muchun Song @ 2020-09-22  5:59 UTC (permalink / raw)
  To: alex.popov
  Cc: Kees Cook, Masami Hiramatsu, Steven Rostedt,
	miguel.ojeda.sandonis, LKML, Luis Chamberlain, Iurii Zaikin,
	linux-fsdevel, Mike Kravetz

On Mon, Sep 14, 2020 at 9:56 PM Alexander Popov <alex.popov@linux.com> wrote:
>
> On 07.09.2020 16:53, Muchun Song wrote:
> > On Mon, Sep 7, 2020 at 7:24 PM Alexander Popov <alex.popov@linux.com> wrote:
> >>
> >> On 07.09.2020 05:54, Muchun Song wrote:
> >>> Hi all,
> >>>
> >>> Any comments or suggestions? Thanks.
> >>>
> >>> On Fri, Aug 28, 2020 at 11:19 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >>>>
> >>>> There is a race between the assignment of `table->data` and write value
> >>>> to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
> >>>> the other thread.
> >>>>
> >>>>     CPU0:                                 CPU1:
> >>>>                                           proc_sys_write
> >>>>     stack_erasing_sysctl                    proc_sys_call_handler
> >>>>       table->data = &state;                   stack_erasing_sysctl
> >>>>                                                 table->data = &state;
> >>>>       proc_doulongvec_minmax
> >>>>         do_proc_doulongvec_minmax             sysctl_head_finish
> >>>>           __do_proc_doulongvec_minmax           unuse_table
> >>>>             i = table->data;
> >>>>             *i = val;  // corrupt CPU1's stack
> >>
> >> Hello everyone!
> >>
> >> As I remember, I implemented stack_erasing_sysctl() very similar to other sysctl
> >> handlers. Is that issue relevant for other handlers as well?
> >
> > Yeah, it's very similar. But the difference is that others use a
> > global variable as the
> > `table->data`, but here we use a local variable as the `table->data`.
> > The local variable
> > is allocated from the stack. So other thread could corrupt the stack
> > like the diagram
> > above.
>
> Hi Muchun,
>
> I don't think that the proposed copying of struct ctl_table to local variable is
> a good fix of that issue. There might be other bugs caused by concurrent
> execution of stack_erasing_sysctl().

Hi Alexander,

Yeah, we can fix this issue on a higher level in kernel/sysctl.c. But
we will rework some kernel/sysctl.c base code. Because the commit:

    964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack erasing")

is introduced from linux-4.20. So we should backport this fix patch to the other
stable tree. Be the safe side, we can apply this patch to only fix the
stack_erasing_sysctl. In this case, the impact of backport is minimal.

In the feature, we can fix the issue(another patch) like this on a higher
level in kernel/sysctl.c and only apply it in the later kernel version. Is
this OK?

>
> I would recommend using some locking instead.
>
> But you say there are other similar issues. Should it be fixed on higher level
> in kernel/sysctl.c?
>
> [Adding more knowing people to CC]
>
> Thanks!
>
> >> Muchun, could you elaborate how CPU1's stack is corrupted and how you detected
> >> that? Thanks!
> >
> > Why did I find this problem? Because I solve another problem which is
> > very similar to
> > this issue. You can reference the following fix patch. Thanks.
> >
> >   https://lkml.org/lkml/2020/8/22/105
> >>
> >>>> Fix this by duplicating the `table`, and only update the duplicate of
> >>>> it.
> >>>>
> >>>> Fixes: 964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack erasing")
> >>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >>>> ---
> >>>> changelogs in v2:
> >>>>  1. Add more details about how the race happened to the commit message.
> >>>>
> >>>>  kernel/stackleak.c | 11 ++++++++---
> >>>>  1 file changed, 8 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
> >>>> index a8fc9ae1d03d..fd95b87478ff 100644
> >>>> --- a/kernel/stackleak.c
> >>>> +++ b/kernel/stackleak.c
> >>>> @@ -25,10 +25,15 @@ int stack_erasing_sysctl(struct ctl_table *table, int write,
> >>>>         int ret = 0;
> >>>>         int state = !static_branch_unlikely(&stack_erasing_bypass);
> >>>>         int prev_state = state;
> >>>> +       struct ctl_table dup_table = *table;
> >>>>
> >>>> -       table->data = &state;
> >>>> -       table->maxlen = sizeof(int);
> >>>> -       ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> >>>> +       /*
> >>>> +        * In order to avoid races with __do_proc_doulongvec_minmax(), we
> >>>> +        * can duplicate the @table and alter the duplicate of it.
> >>>> +        */
> >>>> +       dup_table.data = &state;
> >>>> +       dup_table.maxlen = sizeof(int);
> >>>> +       ret = proc_dointvec_minmax(&dup_table, write, buffer, lenp, ppos);
> >>>>         state = !!state;
> >>>>         if (ret || !write || state == prev_state)
> >>>>                 return ret;
> >>>> --
> >>>> 2.11.0



-- 
Yours,
Muchun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [External] Re: [PATCH v2] stackleak: Fix a race between stack erasing sysctl handlers
  2020-09-22  5:59         ` Muchun Song
@ 2020-09-28  6:32           ` Alexander Popov
  2020-09-28  7:30             ` Muchun Song
  0 siblings, 1 reply; 10+ messages in thread
From: Alexander Popov @ 2020-09-28  6:32 UTC (permalink / raw)
  To: Muchun Song
  Cc: Kees Cook, Masami Hiramatsu, Steven Rostedt,
	miguel.ojeda.sandonis, LKML, Luis Chamberlain, Iurii Zaikin,
	linux-fsdevel, Mike Kravetz

On 22.09.2020 08:59, Muchun Song wrote:
> On Mon, Sep 14, 2020 at 9:56 PM Alexander Popov <alex.popov@linux.com> wrote:
>>
>> On 07.09.2020 16:53, Muchun Song wrote:
>>> On Mon, Sep 7, 2020 at 7:24 PM Alexander Popov <alex.popov@linux.com> wrote:
>>>>
>>>> On 07.09.2020 05:54, Muchun Song wrote:
>>>>> Hi all,
>>>>>
>>>>> Any comments or suggestions? Thanks.
>>>>>
>>>>> On Fri, Aug 28, 2020 at 11:19 AM Muchun Song <songmuchun@bytedance.com> wrote:
>>>>>>
>>>>>> There is a race between the assignment of `table->data` and write value
>>>>>> to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
>>>>>> the other thread.
>>>>>>
>>>>>>     CPU0:                                 CPU1:
>>>>>>                                           proc_sys_write
>>>>>>     stack_erasing_sysctl                    proc_sys_call_handler
>>>>>>       table->data = &state;                   stack_erasing_sysctl
>>>>>>                                                 table->data = &state;
>>>>>>       proc_doulongvec_minmax
>>>>>>         do_proc_doulongvec_minmax             sysctl_head_finish
>>>>>>           __do_proc_doulongvec_minmax           unuse_table
>>>>>>             i = table->data;
>>>>>>             *i = val;  // corrupt CPU1's stack
>>>>
>>>> Hello everyone!
>>>>
>>>> As I remember, I implemented stack_erasing_sysctl() very similar to other sysctl
>>>> handlers. Is that issue relevant for other handlers as well?
>>>
>>> Yeah, it's very similar. But the difference is that others use a
>>> global variable as the
>>> `table->data`, but here we use a local variable as the `table->data`.
>>> The local variable
>>> is allocated from the stack. So other thread could corrupt the stack
>>> like the diagram
>>> above.
>>
>> Hi Muchun,
>>
>> I don't think that the proposed copying of struct ctl_table to local variable is
>> a good fix of that issue. There might be other bugs caused by concurrent
>> execution of stack_erasing_sysctl().
> 
> Hi Alexander,
> 
> Yeah, we can fix this issue on a higher level in kernel/sysctl.c. But
> we will rework some kernel/sysctl.c base code. Because the commit:
> 
>     964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack erasing")
> 
> is introduced from linux-4.20. So we should backport this fix patch to the other
> stable tree. Be the safe side, we can apply this patch to only fix the
> stack_erasing_sysctl. In this case, the impact of backport is minimal.
> 
> In the feature, we can fix the issue(another patch) like this on a higher
> level in kernel/sysctl.c and only apply it in the later kernel version. Is
> this OK?

Muchun, I would recommend:
  1) fixing the reason of the issue in kernel/sysctl.c
or
  2) use some locking in stack_erasing_sysctl() to fix the issue locally.

Honestly, I don't like this "dup_table" approach in the patch below. It doesn't
remove the data race.

Thank you!
Alexander

>> I would recommend using some locking instead.
>>
>> But you say there are other similar issues. Should it be fixed on higher level
>> in kernel/sysctl.c?
>>
>> [Adding more knowing people to CC]
>>
>> Thanks!
>>
>>>> Muchun, could you elaborate how CPU1's stack is corrupted and how you detected
>>>> that? Thanks!
>>>
>>> Why did I find this problem? Because I solve another problem which is
>>> very similar to
>>> this issue. You can reference the following fix patch. Thanks.
>>>
>>>   https://lkml.org/lkml/2020/8/22/105
>>>>
>>>>>> Fix this by duplicating the `table`, and only update the duplicate of
>>>>>> it.
>>>>>>
>>>>>> Fixes: 964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack erasing")
>>>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
>>>>>> ---
>>>>>> changelogs in v2:
>>>>>>  1. Add more details about how the race happened to the commit message.
>>>>>>
>>>>>>  kernel/stackleak.c | 11 ++++++++---
>>>>>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
>>>>>> index a8fc9ae1d03d..fd95b87478ff 100644
>>>>>> --- a/kernel/stackleak.c
>>>>>> +++ b/kernel/stackleak.c
>>>>>> @@ -25,10 +25,15 @@ int stack_erasing_sysctl(struct ctl_table *table, int write,
>>>>>>         int ret = 0;
>>>>>>         int state = !static_branch_unlikely(&stack_erasing_bypass);
>>>>>>         int prev_state = state;
>>>>>> +       struct ctl_table dup_table = *table;
>>>>>>
>>>>>> -       table->data = &state;
>>>>>> -       table->maxlen = sizeof(int);
>>>>>> -       ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
>>>>>> +       /*
>>>>>> +        * In order to avoid races with __do_proc_doulongvec_minmax(), we
>>>>>> +        * can duplicate the @table and alter the duplicate of it.
>>>>>> +        */
>>>>>> +       dup_table.data = &state;
>>>>>> +       dup_table.maxlen = sizeof(int);
>>>>>> +       ret = proc_dointvec_minmax(&dup_table, write, buffer, lenp, ppos);
>>>>>>         state = !!state;
>>>>>>         if (ret || !write || state == prev_state)
>>>>>>                 return ret;
>>>>>> --
>>>>>> 2.11.0
> 
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [External] Re: [PATCH v2] stackleak: Fix a race between stack erasing sysctl handlers
  2020-09-28  6:32           ` Alexander Popov
@ 2020-09-28  7:30             ` Muchun Song
  0 siblings, 0 replies; 10+ messages in thread
From: Muchun Song @ 2020-09-28  7:30 UTC (permalink / raw)
  To: alex.popov
  Cc: Kees Cook, Masami Hiramatsu, Steven Rostedt,
	miguel.ojeda.sandonis, LKML, Luis Chamberlain, Iurii Zaikin,
	linux-fsdevel, Mike Kravetz

On Mon, Sep 28, 2020 at 2:32 PM Alexander Popov <alex.popov@linux.com> wrote:
>
> On 22.09.2020 08:59, Muchun Song wrote:
> > On Mon, Sep 14, 2020 at 9:56 PM Alexander Popov <alex.popov@linux.com> wrote:
> >>
> >> On 07.09.2020 16:53, Muchun Song wrote:
> >>> On Mon, Sep 7, 2020 at 7:24 PM Alexander Popov <alex.popov@linux.com> wrote:
> >>>>
> >>>> On 07.09.2020 05:54, Muchun Song wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> Any comments or suggestions? Thanks.
> >>>>>
> >>>>> On Fri, Aug 28, 2020 at 11:19 AM Muchun Song <songmuchun@bytedance.com> wrote:
> >>>>>>
> >>>>>> There is a race between the assignment of `table->data` and write value
> >>>>>> to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on
> >>>>>> the other thread.
> >>>>>>
> >>>>>>     CPU0:                                 CPU1:
> >>>>>>                                           proc_sys_write
> >>>>>>     stack_erasing_sysctl                    proc_sys_call_handler
> >>>>>>       table->data = &state;                   stack_erasing_sysctl
> >>>>>>                                                 table->data = &state;
> >>>>>>       proc_doulongvec_minmax
> >>>>>>         do_proc_doulongvec_minmax             sysctl_head_finish
> >>>>>>           __do_proc_doulongvec_minmax           unuse_table
> >>>>>>             i = table->data;
> >>>>>>             *i = val;  // corrupt CPU1's stack
> >>>>
> >>>> Hello everyone!
> >>>>
> >>>> As I remember, I implemented stack_erasing_sysctl() very similar to other sysctl
> >>>> handlers. Is that issue relevant for other handlers as well?
> >>>
> >>> Yeah, it's very similar. But the difference is that others use a
> >>> global variable as the
> >>> `table->data`, but here we use a local variable as the `table->data`.
> >>> The local variable
> >>> is allocated from the stack. So other thread could corrupt the stack
> >>> like the diagram
> >>> above.
> >>
> >> Hi Muchun,
> >>
> >> I don't think that the proposed copying of struct ctl_table to local variable is
> >> a good fix of that issue. There might be other bugs caused by concurrent
> >> execution of stack_erasing_sysctl().
> >
> > Hi Alexander,
> >
> > Yeah, we can fix this issue on a higher level in kernel/sysctl.c. But
> > we will rework some kernel/sysctl.c base code. Because the commit:
> >
> >     964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack erasing")
> >
> > is introduced from linux-4.20. So we should backport this fix patch to the other
> > stable tree. Be the safe side, we can apply this patch to only fix the
> > stack_erasing_sysctl. In this case, the impact of backport is minimal.
> >
> > In the feature, we can fix the issue(another patch) like this on a higher
> > level in kernel/sysctl.c and only apply it in the later kernel version. Is
> > this OK?
>
> Muchun, I would recommend:
>   1) fixing the reason of the issue in kernel/sysctl.c
> or
>   2) use some locking in stack_erasing_sysctl() to fix the issue locally.

Yeah, this is work.

>
> Honestly, I don't like this "dup_table" approach in the patch below. It doesn't
> remove the data race.

Alexander, I don't understand where the race is? I think that the duplicate is
enough. But If you prefer using the lock to protect the data. I also
can do that.

>
> Thank you!
> Alexander
>
> >> I would recommend using some locking instead.
> >>
> >> But you say there are other similar issues. Should it be fixed on higher level
> >> in kernel/sysctl.c?
> >>
> >> [Adding more knowing people to CC]
> >>
> >> Thanks!
> >>
> >>>> Muchun, could you elaborate how CPU1's stack is corrupted and how you detected
> >>>> that? Thanks!
> >>>
> >>> Why did I find this problem? Because I solve another problem which is
> >>> very similar to
> >>> this issue. You can reference the following fix patch. Thanks.
> >>>
> >>>   https://lkml.org/lkml/2020/8/22/105
> >>>>
> >>>>>> Fix this by duplicating the `table`, and only update the duplicate of
> >>>>>> it.
> >>>>>>
> >>>>>> Fixes: 964c9dff0091 ("stackleak: Allow runtime disabling of kernel stack erasing")
> >>>>>> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> >>>>>> ---
> >>>>>> changelogs in v2:
> >>>>>>  1. Add more details about how the race happened to the commit message.
> >>>>>>
> >>>>>>  kernel/stackleak.c | 11 ++++++++---
> >>>>>>  1 file changed, 8 insertions(+), 3 deletions(-)
> >>>>>>
> >>>>>> diff --git a/kernel/stackleak.c b/kernel/stackleak.c
> >>>>>> index a8fc9ae1d03d..fd95b87478ff 100644
> >>>>>> --- a/kernel/stackleak.c
> >>>>>> +++ b/kernel/stackleak.c
> >>>>>> @@ -25,10 +25,15 @@ int stack_erasing_sysctl(struct ctl_table *table, int write,
> >>>>>>         int ret = 0;
> >>>>>>         int state = !static_branch_unlikely(&stack_erasing_bypass);
> >>>>>>         int prev_state = state;
> >>>>>> +       struct ctl_table dup_table = *table;
> >>>>>>
> >>>>>> -       table->data = &state;
> >>>>>> -       table->maxlen = sizeof(int);
> >>>>>> -       ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> >>>>>> +       /*
> >>>>>> +        * In order to avoid races with __do_proc_doulongvec_minmax(), we
> >>>>>> +        * can duplicate the @table and alter the duplicate of it.
> >>>>>> +        */
> >>>>>> +       dup_table.data = &state;
> >>>>>> +       dup_table.maxlen = sizeof(int);
> >>>>>> +       ret = proc_dointvec_minmax(&dup_table, write, buffer, lenp, ppos);
> >>>>>>         state = !!state;
> >>>>>>         if (ret || !write || state == prev_state)
> >>>>>>                 return ret;
> >>>>>> --
> >>>>>> 2.11.0
> >
> >
> >
>


-- 
Yours,
Muchun

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-09-28  7:31 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-28  3:19 [PATCH v2] stackleak: Fix a race between stack erasing sysctl handlers Muchun Song
2020-09-07  2:54 ` Muchun Song
2020-09-07 11:24   ` Alexander Popov
2020-09-07 13:53     ` [External] " Muchun Song
2020-09-11  3:53       ` Muchun Song
2020-09-14 13:56       ` Alexander Popov
2020-09-14 14:09         ` Muchun Song
2020-09-22  5:59         ` Muchun Song
2020-09-28  6:32           ` Alexander Popov
2020-09-28  7:30             ` Muchun Song

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).