linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv2] memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
@ 2020-03-06  1:02 brookxu
  2020-03-10  9:48 ` Michal Hocko
  0 siblings, 1 reply; 4+ messages in thread
From: brookxu @ 2020-03-06  1:02 UTC (permalink / raw)
  To: Michal Hocko; +Cc: hannes, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel

From: Chunguang Xu <brookxu@tencent.com>

An eventfd monitors multiple memory thresholds of the cgroup, closes them,
the kernel deletes all events related to this eventfd. Before all events
are deleted, another eventfd monitors the memory threshold of this cgroup,
leading to a crash:

[  135.675108] BUG: kernel NULL pointer dereference, address: 0000000000000004
[  135.675350] #PF: supervisor write access in kernel mode
[  135.675579] #PF: error_code(0x0002) - not-present page
[  135.675816] PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce067 PMD 0
[  135.676080] Oops: 0002 [#1] SMP PTI
[  135.676332] CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded Not tainted 5.6.0-rc4 #3
[  135.676610] Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GLET70WW (2.24 ) 05/21/2014
[  135.676909] Workqueue: events memcg_event_remove
[  135.677192] RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0x190
[  135.677825] RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202
[  135.678186] RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 0000000000000001
[  135.678548] RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 0000000000000001
[  135.678912] RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 0000000000000010
[  135.679287] R10: 000000000000000c R11: 071c71c71c71c71c R12: ffff8bb226aba880
[  135.679670] R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 0000000000000000
[  135.680066] FS:  0000000000000000(0000) GS:ffff8bb242680000(0000) knlGS:0000000000000000
[  135.680475] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  135.680894] CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 00000000001606e0
[  135.681325] Call Trace:
[  135.681763]  memcg_event_remove+0x32/0x90
[  135.682209]  process_one_work+0x172/0x380
[  135.682657]  worker_thread+0x49/0x3f0
[  135.683111]  kthread+0xf8/0x130
[  135.683570]  ? max_active_store+0x80/0x80
[  135.684034]  ? kthread_bind+0x10/0x10
[  135.684506]  ret_from_fork+0x35/0x40
[  135.689733] CR2: 0000000000000004

We can reproduce this problem in the following ways:
 
1. We create a new cgroup subdirectory and a new eventfd, and then we
   monitor multiple memory thresholds of the cgroup through this eventfd.
2. closing this eventfd, and __mem_cgroup_usage_unregister_event () will be
   called multiple times to delete all events related to this eventfd.

The first time __mem_cgroup_usage_unregister_event() is called, the kernel
will clear all items related to this eventfd in thresholds-> primary.Since
there is currently only one eventfd, thresholds-> primary becomes empty,
so the kernel will set thresholds-> primary and hresholds-> spare to NULL.
If at this time, the user creates a new eventfd and monitor the memory
threshold of this cgroup, kernel will re-initialize thresholds-> primary.
Then when __mem_cgroup_usage_unregister_event () is called for the second
time, because thresholds-> primary is not empty, the system will access
thresholds-> spare, but thresholds-> spare is NULL, which will trigger a
crash.

In general, the longer it takes to delete all events related to this
eventfd, the easier it is to trigger this problem.

The solution is to check whether the thresholds associated with the eventfd
has been cleared when deleting the event. If so, we do nothing.

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
---
 mm/memcontrol.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d09776c..4575a58 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4027,7 +4027,7 @@ static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
     struct mem_cgroup_thresholds *thresholds;
     struct mem_cgroup_threshold_ary *new;
     unsigned long usage;
-    int i, j, size;
+    int i, j, size, entries;
 
     mutex_lock(&memcg->thresholds_lock);
 
@@ -4047,12 +4047,18 @@ static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
     __mem_cgroup_threshold(memcg, type == _MEMSWAP);
 
     /* Calculate new number of threshold */
-    size = 0;
+    size = entries = 0;
     for (i = 0; i < thresholds->primary->size; i++) {
         if (thresholds->primary->entries[i].eventfd != eventfd)
             size++;
+        else
+            entries++;
     }
 
+    /* If items related to eventfd have been cleared, nothing to do */
+    if (!entries)
+        goto unlock;
+
     new = thresholds->spare;
 
     /* Set thresholds array to NULL if we don't have thresholds */
-- 
1.8.3.1



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCHv2] memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
  2020-03-06  1:02 [PATCHv2] memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event brookxu
@ 2020-03-10  9:48 ` Michal Hocko
  2020-03-10 10:41   ` Kirill A. Shutemov
  0 siblings, 1 reply; 4+ messages in thread
From: Michal Hocko @ 2020-03-10  9:48 UTC (permalink / raw)
  To: brookxu, Kirill A. Shutemov
  Cc: hannes, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel

[Cc Kirill, I didn't realize he has implemented this code]

On Fri 06-03-20 09:02:02, brookxu wrote:
> From: Chunguang Xu <brookxu@tencent.com>
> 
> An eventfd monitors multiple memory thresholds of the cgroup, closes them,
> the kernel deletes all events related to this eventfd. Before all events
> are deleted, another eventfd monitors the memory threshold of this cgroup,
> leading to a crash:
> 
> [  135.675108] BUG: kernel NULL pointer dereference, address: 0000000000000004
> [  135.675350] #PF: supervisor write access in kernel mode
> [  135.675579] #PF: error_code(0x0002) - not-present page
> [  135.675816] PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce067 PMD 0
> [  135.676080] Oops: 0002 [#1] SMP PTI
> [  135.676332] CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded Not tainted 5.6.0-rc4 #3
> [  135.676610] Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GLET70WW (2.24 ) 05/21/2014
> [  135.676909] Workqueue: events memcg_event_remove
> [  135.677192] RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0x190
> [  135.677825] RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202
> [  135.678186] RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 0000000000000001
> [  135.678548] RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 0000000000000001
> [  135.678912] RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 0000000000000010
> [  135.679287] R10: 000000000000000c R11: 071c71c71c71c71c R12: ffff8bb226aba880
> [  135.679670] R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 0000000000000000
> [  135.680066] FS:  0000000000000000(0000) GS:ffff8bb242680000(0000) knlGS:0000000000000000
> [  135.680475] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  135.680894] CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 00000000001606e0
> [  135.681325] Call Trace:
> [  135.681763]  memcg_event_remove+0x32/0x90
> [  135.682209]  process_one_work+0x172/0x380
> [  135.682657]  worker_thread+0x49/0x3f0
> [  135.683111]  kthread+0xf8/0x130
> [  135.683570]  ? max_active_store+0x80/0x80
> [  135.684034]  ? kthread_bind+0x10/0x10
> [  135.684506]  ret_from_fork+0x35/0x40
> [  135.689733] CR2: 0000000000000004
> 
> We can reproduce this problem in the following ways:
>  
> 1. We create a new cgroup subdirectory and a new eventfd, and then we
>    monitor multiple memory thresholds of the cgroup through this eventfd.
> 2. closing this eventfd, and __mem_cgroup_usage_unregister_event () will be
>    called multiple times to delete all events related to this eventfd.
> 
> The first time __mem_cgroup_usage_unregister_event() is called, the kernel
> will clear all items related to this eventfd in thresholds-> primary.Since
> there is currently only one eventfd, thresholds-> primary becomes empty,
> so the kernel will set thresholds-> primary and hresholds-> spare to NULL.
> If at this time, the user creates a new eventfd and monitor the memory
> threshold of this cgroup, kernel will re-initialize thresholds-> primary.
> Then when __mem_cgroup_usage_unregister_event () is called for the second
> time, because thresholds-> primary is not empty, the system will access
> thresholds-> spare, but thresholds-> spare is NULL, which will trigger a
> crash.
> 
> In general, the longer it takes to delete all events related to this
> eventfd, the easier it is to trigger this problem.
> 
> The solution is to check whether the thresholds associated with the eventfd
> has been cleared when deleting the event. If so, we do nothing.
> 
> Signed-off-by: Chunguang Xu <brookxu@tencent.com>

The fix looks reasonable to me
Acked-by: Michal Hocko <mhocko@suse.com>

It seems that the code has been broken since 2c488db27b61 ("memcg: clean
up memory thresholds"). We've had 371528caec55 ("mm: memcg: Correct
unregistring of events attached to the same eventfd") but it didn't
catch this case for some reason. Unless I am missing something the code
was broken back then already. Kirill please double check after me.

So if I am not wrong then we want
Fixes: 2c488db27b61 ("memcg: clean up memory thresholds")
Cc: stable

sounds appropriate because this seems to be user trigerable.

Thanks for preparing the patch!

Btw. you should double check your email sender because it seemed to
whitespace damaged the patch (\t -> spaces). Please use git send-email
instead.

> ---
>  mm/memcontrol.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d09776c..4575a58 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4027,7 +4027,7 @@ static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
>      struct mem_cgroup_thresholds *thresholds;
>      struct mem_cgroup_threshold_ary *new;
>      unsigned long usage;
> -    int i, j, size;
> +    int i, j, size, entries;
>  
>      mutex_lock(&memcg->thresholds_lock);
>  
> @@ -4047,12 +4047,18 @@ static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
>      __mem_cgroup_threshold(memcg, type == _MEMSWAP);
>  
>      /* Calculate new number of threshold */
> -    size = 0;
> +    size = entries = 0;
>      for (i = 0; i < thresholds->primary->size; i++) {
>          if (thresholds->primary->entries[i].eventfd != eventfd)
>              size++;
> +        else
> +            entries++;
>      }
>  
> +    /* If items related to eventfd have been cleared, nothing to do */
> +    if (!entries)
> +        goto unlock;
> +
>      new = thresholds->spare;
>  
>      /* Set thresholds array to NULL if we don't have thresholds */
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCHv2] memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
  2020-03-10  9:48 ` Michal Hocko
@ 2020-03-10 10:41   ` Kirill A. Shutemov
  2020-03-10 11:17     ` Michal Hocko
  0 siblings, 1 reply; 4+ messages in thread
From: Kirill A. Shutemov @ 2020-03-10 10:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: brookxu, hannes, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel

On Tue, Mar 10, 2020 at 10:48:36AM +0100, Michal Hocko wrote:
> [Cc Kirill, I didn't realize he has implemented this code]

My first non-trivial mm contribution :P

> On Fri 06-03-20 09:02:02, brookxu wrote:
> > From: Chunguang Xu <brookxu@tencent.com>
> > 
> > An eventfd monitors multiple memory thresholds of the cgroup, closes them,
> > the kernel deletes all events related to this eventfd. Before all events
> > are deleted, another eventfd monitors the memory threshold of this cgroup,
> > leading to a crash:
> > 
> > [  135.675108] BUG: kernel NULL pointer dereference, address: 0000000000000004
> > [  135.675350] #PF: supervisor write access in kernel mode
> > [  135.675579] #PF: error_code(0x0002) - not-present page
> > [  135.675816] PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce067 PMD 0
> > [  135.676080] Oops: 0002 [#1] SMP PTI
> > [  135.676332] CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded Not tainted 5.6.0-rc4 #3
> > [  135.676610] Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GLET70WW (2.24 ) 05/21/2014
> > [  135.676909] Workqueue: events memcg_event_remove
> > [  135.677192] RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0x190
> > [  135.677825] RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202
> > [  135.678186] RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 0000000000000001
> > [  135.678548] RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 0000000000000001
> > [  135.678912] RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 0000000000000010
> > [  135.679287] R10: 000000000000000c R11: 071c71c71c71c71c R12: ffff8bb226aba880
> > [  135.679670] R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 0000000000000000
> > [  135.680066] FS:  0000000000000000(0000) GS:ffff8bb242680000(0000) knlGS:0000000000000000
> > [  135.680475] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  135.680894] CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 00000000001606e0
> > [  135.681325] Call Trace:
> > [  135.681763]  memcg_event_remove+0x32/0x90
> > [  135.682209]  process_one_work+0x172/0x380
> > [  135.682657]  worker_thread+0x49/0x3f0
> > [  135.683111]  kthread+0xf8/0x130
> > [  135.683570]  ? max_active_store+0x80/0x80
> > [  135.684034]  ? kthread_bind+0x10/0x10
> > [  135.684506]  ret_from_fork+0x35/0x40
> > [  135.689733] CR2: 0000000000000004
> > 
> > We can reproduce this problem in the following ways:
> >  
> > 1. We create a new cgroup subdirectory and a new eventfd, and then we
> >    monitor multiple memory thresholds of the cgroup through this eventfd.
> > 2. closing this eventfd, and __mem_cgroup_usage_unregister_event () will be
> >    called multiple times to delete all events related to this eventfd.
> > 
> > The first time __mem_cgroup_usage_unregister_event() is called, the kernel
> > will clear all items related to this eventfd in thresholds-> primary.Since
> > there is currently only one eventfd, thresholds-> primary becomes empty,
> > so the kernel will set thresholds-> primary and hresholds-> spare to NULL.

						    ^ typo

> > If at this time, the user creates a new eventfd and monitor the memory
> > threshold of this cgroup, kernel will re-initialize thresholds-> primary.
> > Then when __mem_cgroup_usage_unregister_event () is called for the second
> > time, because thresholds-> primary is not empty, the system will access
> > thresholds-> spare, but thresholds-> spare is NULL, which will trigger a
> > crash.
> > 
> > In general, the longer it takes to delete all events related to this
> > eventfd, the easier it is to trigger this problem.
> > 
> > The solution is to check whether the thresholds associated with the eventfd
> > has been cleared when deleting the event. If so, we do nothing.
> > 
> > Signed-off-by: Chunguang Xu <brookxu@tencent.com>
> 
> The fix looks reasonable to me
> Acked-by: Michal Hocko <mhocko@suse.com>

Agreed. Two typos have to be addressed.

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

> It seems that the code has been broken since 2c488db27b61 ("memcg: clean
> up memory thresholds"). We've had 371528caec55 ("mm: memcg: Correct
> unregistring of events attached to the same eventfd") but it didn't
> catch this case for some reason. Unless I am missing something the code
> was broken back then already. Kirill please double check after me.

I think the issue exitsted before 2c488db27b61. The fields had different
names back then.

The logic to make unregister never-fail is added in 907860ed381a
("cgroups: make cftype.unregister_event() void-returning"). I believe the
Fixes should point there.

> 
> So if I am not wrong then we want
> Fixes: 2c488db27b61 ("memcg: clean up memory thresholds")
> Cc: stable
> 
> sounds appropriate because this seems to be user trigerable.
> 
> Thanks for preparing the patch!
> 
> Btw. you should double check your email sender because it seemed to
> whitespace damaged the patch (\t -> spaces). Please use git send-email
> instead.
> 
> > ---
> >  mm/memcontrol.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index d09776c..4575a58 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -4027,7 +4027,7 @@ static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
> >      struct mem_cgroup_thresholds *thresholds;
> >      struct mem_cgroup_threshold_ary *new;
> >      unsigned long usage;
> > -    int i, j, size;
> > +    int i, j, size, entries;
> >  
> >      mutex_lock(&memcg->thresholds_lock);
> >  
> > @@ -4047,12 +4047,18 @@ static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
> >      __mem_cgroup_threshold(memcg, type == _MEMSWAP);
> >  
> >      /* Calculate new number of threshold */
> > -    size = 0;
> > +    size = entries = 0;
> >      for (i = 0; i < thresholds->primary->size; i++) {
> >          if (thresholds->primary->entries[i].eventfd != eventfd)
> >              size++;
> > +        else
> > +            entries++;
> >      }
> >  
> > +    /* If items related to eventfd have been cleared, nothing to do */

	       ^ "no items" ?

> > +    if (!entries)
> > +        goto unlock;
> > +
> >      new = thresholds->spare;
> >  
> >      /* Set thresholds array to NULL if we don't have thresholds */
> > -- 
> > 1.8.3.1
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCHv2] memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
  2020-03-10 10:41   ` Kirill A. Shutemov
@ 2020-03-10 11:17     ` Michal Hocko
  0 siblings, 0 replies; 4+ messages in thread
From: Michal Hocko @ 2020-03-10 11:17 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: brookxu, hannes, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel

On Tue 10-03-20 13:41:49, Kirill A. Shutemov wrote:
> On Tue, Mar 10, 2020 at 10:48:36AM +0100, Michal Hocko wrote:
> > [Cc Kirill, I didn't realize he has implemented this code]
> 
> My first non-trivial mm contribution :P

Everybody has to pay for sins of youth :p

[...]

> > It seems that the code has been broken since 2c488db27b61 ("memcg: clean
> > up memory thresholds"). We've had 371528caec55 ("mm: memcg: Correct
> > unregistring of events attached to the same eventfd") but it didn't
> > catch this case for some reason. Unless I am missing something the code
> > was broken back then already. Kirill please double check after me.
> 
> I think the issue exitsted before 2c488db27b61. The fields had different
> names back then.
> 
> The logic to make unregister never-fail is added in 907860ed381a
> ("cgroups: make cftype.unregister_event() void-returning"). I believe the
> Fixes should point there.

Yes, you seem to be right. It doesn't make a difference much as both
went in to the same kernel but a proper Fixes tag is really valuable.

Thanks for looking into that.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-03-10 11:17 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-06  1:02 [PATCHv2] memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event brookxu
2020-03-10  9:48 ` Michal Hocko
2020-03-10 10:41   ` Kirill A. Shutemov
2020-03-10 11:17     ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).