linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
@ 2020-03-05  5:52 brookxu
  2020-03-05  9:24 ` Michal Hocko
  0 siblings, 1 reply; 4+ messages in thread
From: brookxu @ 2020-03-05  5:52 UTC (permalink / raw)
  To: hannes, mhocko, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel

One eventfd monitors multiple memory thresholds of cgroup, closing it, the
system will delete related events. Before all events are deleted, another
eventfd monitors the cgroup's memory threshold.

As a result, thresholds->primary[] is not empty, but thresholds->sparse[]
is NULL, __mem_cgroup_usage_unregister_event() leading to a crash:

[  138.925809] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[  138.926817] IP: [<ffffffff8116c9b7>] mem_cgroup_usage_unregister_event+0xd7/0x1f0
[  138.927701] PGD 73bce067 PUD 76ff3067 PMD 0
[  138.928384] Oops: 0002 [#1] SMP
[  138.935218] CPU: 1 PID: 14 Comm: kworker/1:0 Not tainted 3.10.107-1-tlinux2-0047 #1
[  138.936076] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  138.936988] Workqueue: events cgroup_event_remove
[  138.937581] task: ffff88007c07e440 ti: ffff88007c090000 task.ti: ffff88007c090000
[  138.938485] RIP: 0010:[<ffffffff8116c9b7>]  [<ffffffff8116c9b7>] mem_cgroup_usage_unregister_event+0xd7/0x1f0
[  138.940116] RSP: 0018:ffff88007c093dc0  EFLAGS: 00010202
[  138.941056] RAX: 0000000000000001 RBX: ffff880073b3e1a8 RCX: 0000000000000001
[  138.942095] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff880074519900
[  138.943129] RBP: ffff88007c093df0 R08: 0000000000000001 R09: 0000000000000000
[  138.946057] R10: 000000000000b95b R11: 0000000000000001 R12: ffff880076cc0480
[  138.947805] R13: ffff880073b3e1d0 R14: 0000000000000000 R15: 0000000000000000
[  138.948903] FS:  0000000000000000(0000) GS:ffff88007fd00000(0000) knlGS:0000000000000000
[  138.952264] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  138.953123] CR2: 0000000000000004 CR3: 00000000753b3000 CR4: 00000000000406e0
[  138.954110] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  138.963245] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  138.964088] Stack:
[  138.964456]  0000000000000246 ffff880076d6df68 ffff8800751b4c00 ffff880076d6df00
[  138.965650]  0000000000000040 ffff880076d6df68 ffff88007c093e18 ffffffff810b17ba
[  138.966803]  ffff88007d04cf80 ffff88007fd115c0 ffff88007fd15600 ffff88007c093e60
[  138.968179] Call Trace:
[  138.968592]  [<ffffffff810b17ba>] cgroup_event_remove+0x3a/0x80
[  138.969321]  [<ffffffff81066387>] process_one_work+0x177/0x450
[  138.970051]  [<ffffffff8106721b>] worker_thread+0x11b/0x390
[  138.970741]  [<ffffffff81067100>] ? manage_workers.isra.26+0x290/0x290
[  138.971612]  [<ffffffff8106dacf>] kthread+0xcf/0xe0
[  138.972340]  [<ffffffff8106da00>] ? insert_kthread_work+0x40/0x40
[  138.973142]  [<ffffffff81aad9f8>] ret_from_fork+0x58/0x90
[  138.973843]  [<ffffffff8106da00>] ? insert_kthread_work+0x40/0x40

The solution is to check whether the thresholds associated with the eventfd
has been cleared when deleting the event. If so, we do nothing.

Signed-off-by: Chunguang Xu <brookxu@tencent.com>
---
 mm/memcontrol.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d09776c..4575a58 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4027,7 +4027,7 @@ static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
     struct mem_cgroup_thresholds *thresholds;
     struct mem_cgroup_threshold_ary *new;
     unsigned long usage;
-    int i, j, size;
+    int i, j, size, entries;
 
     mutex_lock(&memcg->thresholds_lock);
 
@@ -4047,12 +4047,18 @@ static void __mem_cgroup_usage_unregister_event(struct mem_cgroup *memcg,
     __mem_cgroup_threshold(memcg, type == _MEMSWAP);
 
     /* Calculate new number of threshold */
-    size = 0;
+    size = entries = 0;
     for (i = 0; i < thresholds->primary->size; i++) {
         if (thresholds->primary->entries[i].eventfd != eventfd)
             size++;
+        else
+            entries++;
     }
 
+    /* If items related to eventfd have been cleared, nothing to do */
+    if (!entries)
+        goto unlock;
+
     new = thresholds->spare;
 
     /* Set thresholds array to NULL if we don't have thresholds */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
  2020-03-05  5:52 [PATCH] memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event brookxu
@ 2020-03-05  9:24 ` Michal Hocko
  2020-03-05 13:28   ` brookxu
  0 siblings, 1 reply; 4+ messages in thread
From: Michal Hocko @ 2020-03-05  9:24 UTC (permalink / raw)
  To: brookxu; +Cc: hannes, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel

Thank you for the report!

On Thu 05-03-20 13:52:03, brookxu wrote:
> One eventfd monitors multiple memory thresholds of cgroup, closing it, the
> system will delete related events. Before all events are deleted, another
> eventfd monitors the cgroup's memory threshold.

Could you describe the race scenario please? Ideally 
> 
> As a result, thresholds->primary[] is not empty, but thresholds->sparse[]
> is NULL, __mem_cgroup_usage_unregister_event() leading to a crash:
> 
> [  138.925809] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
> [  138.926817] IP: [<ffffffff8116c9b7>] mem_cgroup_usage_unregister_event+0xd7/0x1f0
> [  138.927701] PGD 73bce067 PUD 76ff3067 PMD 0
> [  138.928384] Oops: 0002 [#1] SMP
> [  138.935218] CPU: 1 PID: 14 Comm: kworker/1:0 Not tainted 3.10.107-1-tlinux2-0047 #1

Also you seem to be running a very old kernel. Does the problem exist in
the current Vanilla kernel?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
  2020-03-05  9:24 ` Michal Hocko
@ 2020-03-05 13:28   ` brookxu
  2020-03-05 14:07     ` Michal Hocko
  0 siblings, 1 reply; 4+ messages in thread
From: brookxu @ 2020-03-05 13:28 UTC (permalink / raw)
  To: Michal Hocko; +Cc: hannes, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel

Sorry, the calltrace in the email is from the old system, i will fix it later. this problem also exists in the latest master branch. The
following is the calltrace corresponding to the latest kernel:

[  135.675108] BUG: kernel NULL pointer dereference, address: 0000000000000004
[  135.675350] #PF: supervisor write access in kernel mode
[  135.675579] #PF: error_code(0x0002) - not-present page
[  135.675816] PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce067 PMD 0
[  135.676080] Oops: 0002 [#1] SMP PTI
[  135.676332] CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded Not tainted 5.6.0-rc4 #3
[  135.676610] Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GLET70WW (2.24 ) 05/21/2014
[  135.676909] Workqueue: events memcg_event_remove
[  135.677192] RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0x190
[  135.677473] Code: c0 31 d2 48 63 ca 48 c1 e1 04 4c 39 64 0e 08 0f 95 c1 83 c2 01 0f b6 c9 01 c8 39 fa 75 e5 85 c0 4d 8b 7d 08 0f 84 ad 00 00 00 <41> 89 47 04 41 c7 07 ff ff ff ff 31 c0 49 8b 4d 00 31 d2 8b 71 04
[  135.677825] RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202
[  135.678186] RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 0000000000000001
[  135.678548] RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 0000000000000001
[  135.678912] RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 0000000000000010
[  135.679287] R10: 000000000000000c R11: 071c71c71c71c71c R12: ffff8bb226aba880
[  135.679670] R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 0000000000000000
[  135.680066] FS:  0000000000000000(0000) GS:ffff8bb242680000(0000) knlGS:0000000000000000
[  135.680475] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  135.680894] CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 00000000001606e0
[  135.681325] Call Trace:
[  135.681763]  memcg_event_remove+0x32/0x90
[  135.682209]  process_one_work+0x172/0x380
[  135.682657]  worker_thread+0x49/0x3f0
[  135.683111]  kthread+0xf8/0x130
[  135.683570]  ? max_active_store+0x80/0x80
[  135.684034]  ? kthread_bind+0x10/0x10
[  135.684506]  ret_from_fork+0x35/0x40
[  135.684981] Modules linked in: netconsole cmac xt_CHECKSUM xt_MASQUERADE tun bridge stp llc ccm ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter bnep sunrpc intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm rtl8192cu rtl_usb rtl8192c_common irqbypass rtlwifi mac80211 uvcvideo snd_hda_codec_realtek crct10dif_pclmul snd_hda_codec_hdmi snd_hda_codec_generic iTCO_wdt crc32_pclmul wmi_bmof mei_wdt iTCO_vendor_support snd_hda_intel btusb videobuf2_vmalloc ghash_clmulni_intel snd_intel_dspcfg btrtl videobuf2_memops snd_hda_codec btbcm videobuf2_v4l2 btintel snd_hda_core videobuf2_common bluetooth aesni_intel crypto_simd cryptd cfg80211
videodev
[  135.685003]  snd_hwdep glue_helper snd_seq mc ecdh_generic libarc4 snd_seq_device ecc pcspkr joydev snd_pcm thinkpad_acpi snd_timer ledtrig_audio sg i2c_i801 lpc_ich wmi snd rfkill mei_me soundcore mei ie31200_edac sch_fq_codel ip_tables xfs libcrc32c sr_mod sd_mod cdrom t10_pi i915 i2c_algo_bit drm_kms_helper crc32c_intel syscopyarea sysfillrect sysimgblt serio_raw fb_sys_fops drm e1000e ahci libahci video libata ptp pps_core dm_mirror dm_region_hash dm_log dm_mod
[  135.689733] CR2: 0000000000000004

We can reproduce this problem in the following ways:

1. We create a new cgroup subdirectory and a new eventfd, and then we monitor multiple memory thresholds of the cgroup through this eventfd
2. closing this eventfd, the system will call __mem_cgroup_usage_unregister_event () multiple times to delete all events related to this eventfd.

The first time __mem_cgroup_usage_unregister_event () is called, the system will clear all items related to this eventfd in thresholds-> primary.
Since there is currently only one eventfd, thresholds-> primary becomes empty, so the system will set thresholds-> primary and hresholds-> spare
to NULL. If at this time, the user creates a new eventfd and monitor the memory threshold of this cgroup, then the system will re-initialize
thresholds-> primary. Then when the system call __mem_cgroup_usage_unregister_event () is called for the second time, because thresholds-> primary
is not empty, the system will access thresholds-> spare, but thresholds-> spare is NULL, which will trigger a crash.

In general, the longer it takes to delete all events related to this eventfd, the easier it is to trigger this problem.

thanks

Michal Hocko wrote on 2020/3/5 17:24:
> Thank you for the report!
>
> On Thu 05-03-20 13:52:03, brookxu wrote:
>> One eventfd monitors multiple memory thresholds of cgroup, closing it, the
>> system will delete related events. Before all events are deleted, another
>> eventfd monitors the cgroup's memory threshold.
> Could you describe the race scenario please? Ideally 
>> As a result, thresholds->primary[] is not empty, but thresholds->sparse[]
>> is NULL, __mem_cgroup_usage_unregister_event() leading to a crash:
>>
>> [  138.925809] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
>> [  138.926817] IP: [<ffffffff8116c9b7>] mem_cgroup_usage_unregister_event+0xd7/0x1f0
>> [  138.927701] PGD 73bce067 PUD 76ff3067 PMD 0
>> [  138.928384] Oops: 0002 [#1] SMP
>> [  138.935218] CPU: 1 PID: 14 Comm: kworker/1:0 Not tainted 3.10.107-1-tlinux2-0047 #1
> Also you seem to be running a very old kernel. Does the problem exist in
> the current Vanilla kernel?


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
  2020-03-05 13:28   ` brookxu
@ 2020-03-05 14:07     ` Michal Hocko
  0 siblings, 0 replies; 4+ messages in thread
From: Michal Hocko @ 2020-03-05 14:07 UTC (permalink / raw)
  To: brookxu; +Cc: hannes, vdavydov.dev, akpm, cgroups, linux-mm, linux-kernel

On Thu 05-03-20 21:28:20, brookxu wrote:
[...]
> We can reproduce this problem in the following ways:
> 
> 1. We create a new cgroup subdirectory and a new eventfd, and then we monitor multiple memory thresholds of the cgroup through this eventfd
> 2. closing this eventfd, the system will call __mem_cgroup_usage_unregister_event () multiple times to delete all events related to this eventfd.
> 
> The first time __mem_cgroup_usage_unregister_event () is called, the system will clear all items related to this eventfd in thresholds-> primary.
> Since there is currently only one eventfd, thresholds-> primary becomes empty, so the system will set thresholds-> primary and hresholds-> spare
> to NULL. If at this time, the user creates a new eventfd and monitor the memory threshold of this cgroup, then the system will re-initialize
> thresholds-> primary. Then when the system call __mem_cgroup_usage_unregister_event () is called for the second time, because thresholds-> primary
> is not empty, the system will access thresholds-> spare, but thresholds-> spare is NULL, which will trigger a crash.
> 
> In general, the longer it takes to delete all events related to this eventfd, the easier it is to trigger this problem.

Thank you for reproducing this on the current kernel and the above
description. That makes it much more easier to understand the underlying
problem. Please add it to the changelog. My memory about thresholds is
quite rusty so I will have to think about this more but I have some
coarse idea now at least.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-03-05 14:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-05  5:52 [PATCH] memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event brookxu
2020-03-05  9:24 ` Michal Hocko
2020-03-05 13:28   ` brookxu
2020-03-05 14:07     ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).