From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pb0-f80.google.com (mail-pb0-f80.google.com [209.85.160.80])
	by kanga.kvack.org (Postfix) with ESMTP id 85BFB6B0037
	for ; Mon, 28 Oct 2013 11:22:21 -0400 (EDT)
Received: by mail-pb0-f80.google.com with SMTP id md4so32892pbc.3
	for ; Mon, 28 Oct 2013 08:22:21 -0700 (PDT)
Received: from psmtp.com ([74.125.245.205])
	by mx.google.com with SMTP id je1si5902827pbb.0.2013.10.25.20.39.48
	for ; Fri, 25 Oct 2013 20:39:49 -0700 (PDT)
Date: Fri, 25 Oct 2013 23:39:36 -0400
From: Johannes Weiner
Subject: Re: RIP: mem_cgroup_move_account+0xf4/0x290
Message-ID: <20131026033936.GA14971@cmpxchg.org>
References: <20131025161555.GA4398@plex.lan>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <20131025161555.GA4398@plex.lan>
Sender: owner-linux-mm@kvack.org
List-ID:
To: Flavio Leitner
Cc: Andrew Morton, Sha Zhengju, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Fri, Oct 25, 2013 at 02:15:55PM -0200, Flavio Leitner wrote:
>
> While playing with guests and net-next kernel, I've triggered
> this with some frequency. Even Fedora 19 kernel reproduces.
>
> Is it a known issue?
>
> Thanks,
> fbl
>
> [ 6790.349763] kvm: zapping shadow pages for mmio generation wraparound
> [ 6792.283879] kvm: zapping shadow pages for mmio generation wraparound
> [ 7535.654438] perf samples too long (2719 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> [ 7535.665948] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 11.560 msecs
> [ 7691.048392] virbr0: port 1(vnet0) entered disabled state
> [ 7691.056281] device vnet0 left promiscuous mode
> [ 7691.061674] virbr0: port 1(vnet0) entered disabled state
> [ 7691.163363] BUG: unable to handle kernel paging request at 000060fbc0002a20
> [ 7691.171145] IP: [] mem_cgroup_move_account+0xf4/0x290
> [ 7691.178574] PGD 0
> [ 7691.181042] Oops: 0000 [#1] SMP
> [ 7691.184761] Modules linked in: vhost_net vhost macvtap macvlan tun veth openvswitch xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 vxlan ip_tunnel gre libcrc32c ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw coretemp kvm_intel snd_hda_codec_realtek snd_hda_intel nfsd snd_hda_codec kvm auth_rpcgss nfs_acl snd_hwdep lockd snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc sunrpc snd_timer crc32c_intel i7core_edac bnx2 shpchp ptp snd iTCO_wdt joydev pps_core iTCO_vendor_support pcspkr soundcore microcode serio_raw lpc_ich edac_core mfd_core i2c_i801 acpi_cpufreq hid_logitech_dj nouveau ata_generic pata_acpi video i2c_algo_bit drm_kms_helper ttm drm mxm_wmi i2c_core pata_marvell wmi [last unloaded: openvswitch]
> [ 7691.285989] CPU: 1 PID: 14 Comm: kworker/1:0 Tainted: G I 3.12.0-rc6-01188-gb45bd46 #1
> [ 7691.295779] Hardware name: /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012
> [ 7691.306066] Workqueue: events css_killed_work_fn
> [ 7691.311303] task: ffff880429555dc0 ti: ffff88042957a000 task.ti: ffff88042957a000
> [ 7691.319673] RIP: 0010:[] [] mem_cgroup_move_account+0xf4/0x290
> [ 7691.329728] RSP: 0018:ffff88042957bcc8 EFLAGS: 00010002
> [ 7691.335747] RAX: 0000000000000246 RBX: ffff88042b17bc30 RCX: 0000000000000004
> [ 7691.343720] RDX: ffff880424cd6000 RSI: 000060fbc0002a08 RDI: ffff880424cd622c
> [ 7691.351735] RBP: ffff88042957bd20 R08: ffff880424cd4000 R09: 0000000000000001
> [ 7691.359751] R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00103ef0c0
> [ 7691.367745] R13: ffff880424cd6000 R14: 0000000000000000 R15: ffff880424cd622c
> [ 7691.375738] FS: 0000000000000000(0000) GS:ffff88043fc20000(0000) knlGS:0000000000000000
> [ 7691.384755] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 7691.391238] CR2: 000060fbc0002a20 CR3: 0000000001c0c000 CR4: 00000000000027e0
> [ 7691.399235] Stack:
> [ 7691.401672]  ffff88042957bce8 ffff88042957bce8 ffffffff81312b6d ffff880424cd4000
> [ 7691.409968]  ffff880400000001 ffff880424cd6000 ffffea00103ef0c0 ffff880424cd0430
> [ 7691.418264]  ffff88042b17bc30 ffffea00103ef0e0 ffff880424cd6000 ffff88042957bda8
> [ 7691.426578] Call Trace:
> [ 7691.429513]  [] ? list_del+0xd/0x30
> [ 7691.435250]  [] mem_cgroup_reparent_charges+0x247/0x460
> [ 7691.442874]  [] mem_cgroup_css_offline+0xaf/0x1b0
> [ 7691.449942]  [] offline_css+0x27/0x50
> [ 7691.455874]  [] css_killed_work_fn+0x2d/0xa0
> [ 7691.462466]  [] process_one_work+0x175/0x430
> [ 7691.469041]  [] worker_thread+0x11b/0x3a0
> [ 7691.475345]  [] ? rescuer_thread+0x340/0x340
> [ 7691.481919]  [] kthread+0xc0/0xd0
> [ 7691.487478]  [] ? insert_kthread_work+0x40/0x40
> [ 7691.494352]  [] ret_from_fork+0x7c/0xb0
> [ 7691.500464]  [] ? insert_kthread_work+0x40/0x40
> [ 7691.507335] Code: 85 f6 48 8b 55 d0 44 8b 4d c8 4c 8b 45 c0 0f 85 b3 00 00 00 41 8b 4c 24 18 85 c9 0f 88 a6 00 00 00 48 8b b2 30 02 00 00 45 89 ca <4c> 39 56 18 0f 8c 36 01 00 00 44 89 c9 f7 d9 89 cf 65 48 01 7e

This is

All code
========
   0:   85 f6                   test   %esi,%esi
   2:   48 8b 55 d0             mov    -0x30(%rbp),%rdx
   6:   44 8b 4d c8             mov    -0x38(%rbp),%r9d
   a:   4c 8b 45 c0             mov    -0x40(%rbp),%r8
   e:   0f 85 b3 00 00 00       jne    0xc7
  14:   41 8b 4c 24 18          mov    0x18(%r12),%ecx
  19:   85 c9                   test   %ecx,%ecx
  1b:   0f 88 a6 00 00 00       js     0xc7
  21:   48 8b b2 30 02 00 00    mov    0x230(%rdx),%rsi
  28:   45 89 ca                mov    %r9d,%r10d
  2b:*  4c 39 56 18             cmp    %r10,0x18(%rsi)     <-- trapping instruction
  2f:   0f 8c 36 01 00 00       jl     0x16b
  35:   44 89 c9                mov    %r9d,%ecx
  38:   f7 d9                   neg    %ecx
  3a:   89 cf                   mov    %ecx,%edi
  3c:   65                      gs
  3d:   48                      rex.W
  3e:   01                      .byte 0x1
  3f:   7e                      .byte 0x7e

which corresponds to

	WARN_ON_ONCE(from->stat->count[idx] < nr_pages);

Humm.  from->stat is a percpu pointer, so it must not be dereferenced
directly: the value loaded into RSI (000060fbc0002a08) is not a
directly usable kernel address, and the compare at 0x18(%rsi) faults
at CR2 (000060fbc0002a20).  This patch should fix it:

---
From 4e9fe9d7e8502eab1c8bb4761de838f61cd4a8e0 Mon Sep 17 00:00:00 2001
From: Johannes Weiner
Date: Fri, 25 Oct 2013 23:23:31 -0400
Subject: [patch] mm: memcg: fix percpu variable access crash

3ea67d06e467 ("memcg: add per cgroup writeback pages accounting")
added a WARN_ON_ONCE() to sanity check the page statistics counter
when moving charges.  Unfortunately, it dereferences the percpu counter
directly, which may result in a crash like this:

[ 7691.163363] BUG: unable to handle kernel paging request at 000060fbc0002a20
[ 7691.171145] IP: [] mem_cgroup_move_account+0xf4/0x290
[ 7691.178574] PGD 0
[ 7691.181042] Oops: 0000 [#1] SMP
[...]
[ 7691.285989] CPU: 1 PID: 14 Comm: kworker/1:0 Tainted: G I 3.12.0-rc6-01188-gb45bd46 #1
[ 7691.295779] Hardware name: /DX58SO, BIOS SOX5810J.86A.5599.2012.0529.2218 05/29/2012
[ 7691.306066] Workqueue: events css_killed_work_fn
[ 7691.311303] task: ffff880429555dc0 ti: ffff88042957a000 task.ti: ffff88042957a000
[ 7691.319673] RIP: 0010:[] [] mem_cgroup_move_account+0xf4/0x290
[ 7691.329728] RSP: 0018:ffff88042957bcc8 EFLAGS: 00010002
[ 7691.335747] RAX: 0000000000000246 RBX: ffff88042b17bc30 RCX: 0000000000000004
[ 7691.343720] RDX: ffff880424cd6000 RSI: 000060fbc0002a08 RDI: ffff880424cd622c
[ 7691.351735] RBP: ffff88042957bd20 R08: ffff880424cd4000 R09: 0000000000000001
[ 7691.359751] R10: 0000000000000001 R11: 0000000000000001 R12: ffffea00103ef0c0
[ 7691.367745] R13: ffff880424cd6000 R14: 0000000000000000 R15: ffff880424cd622c
[ 7691.375738] FS: 0000000000000000(0000) GS:ffff88043fc20000(0000) knlGS:0000000000000000
[ 7691.384755] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7691.391238] CR2: 000060fbc0002a20 CR3: 0000000001c0c000 CR4: 00000000000027e0
[ 7691.399235] Stack:
[ 7691.401672]  ffff88042957bce8 ffff88042957bce8 ffffffff81312b6d ffff880424cd4000
[ 7691.409968]  ffff880400000001 ffff880424cd6000 ffffea00103ef0c0 ffff880424cd0430
[ 7691.418264]  ffff88042b17bc30 ffffea00103ef0e0 ffff880424cd6000 ffff88042957bda8
[ 7691.426578] Call Trace:
[ 7691.429513]  [] ? list_del+0xd/0x30
[ 7691.435250]  [] mem_cgroup_reparent_charges+0x247/0x460
[ 7691.442874]  [] mem_cgroup_css_offline+0xaf/0x1b0
[ 7691.449942]  [] offline_css+0x27/0x50
[ 7691.455874]  [] css_killed_work_fn+0x2d/0xa0
[ 7691.462466]  [] process_one_work+0x175/0x430
[ 7691.469041]  [] worker_thread+0x11b/0x3a0
[ 7691.475345]  [] ? rescuer_thread+0x340/0x340
[ 7691.481919]  [] kthread+0xc0/0xd0
[ 7691.487478]  [] ? insert_kthread_work+0x40/0x40
[ 7691.494352]  [] ret_from_fork+0x7c/0xb0
[ 7691.500464]  [] ? insert_kthread_work+0x40/0x40
[ 7691.507335] Code: 85 f6 48 8b 55 d0 44 8b 4d c8 4c 8b 45 c0 0f 85 b3 00 00 00 41 8b 4c 24 18 85 c9 0f 88 a6 00 00 00 48 8b b2 30 02 00 00 45 89 ca <4c> 39 56 18 0f 8c 36 01 00 00 44 89 c9 f7 d9 89 cf 65 48 01 7e
[ 7691.528638] RIP [] mem_cgroup_move_account+0xf4/0x290

Add the required __this_cpu_read().

Signed-off-by: Johannes Weiner
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 4097a78..a4864b6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3773,7 +3773,7 @@ void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
 {
 	/* Update stat data for mem_cgroup */
 	preempt_disable();
-	WARN_ON_ONCE(from->stat->count[idx] < nr_pages);
+	WARN_ON_ONCE(__this_cpu_read(from->stat->count[idx]) < nr_pages);
 	__this_cpu_add(from->stat->count[idx], -nr_pages);
 	__this_cpu_add(to->stat->count[idx], nr_pages);
 	preempt_enable();
-- 
1.8.4.1
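
For readers less familiar with percpu data, here is a rough user-space sketch of why the direct dereference in the WARN_ON_ONCE() faults while the __this_cpu_read() form does not. It is not kernel code. Everything named sketch_* or SKETCH_*, and the array standing in for the per-CPU memory areas, is an assumption made up for illustration. The point it models is that a percpu "pointer" is only meaningful once it is combined with one particular CPU's area, and the accessors exist to do that translation.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS    4   /* hypothetical CPU count for this model */
#define SKETCH_AREA_WORDS 64  /* hypothetical per-CPU area size, in longs */

/* One private area per CPU; stands in for the kernel's percpu regions. */
static long sketch_cpu_area[SKETCH_NR_CPUS][SKETCH_AREA_WORDS];

/*
 * In this model a percpu "pointer" is just an offset into every area.
 * Treating the offset itself as an address is the mistake the original
 * WARN_ON_ONCE() made.
 */
typedef size_t sketch_percpu_t;

/* Translate (offset, index) into a real address for one specific CPU. */
static long *sketch_per_cpu_ptr(sketch_percpu_t counters, int idx, int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	assert(idx >= 0 && counters + (size_t)idx < SKETCH_AREA_WORDS);
	return &sketch_cpu_area[cpu][counters + (size_t)idx];
}

/* Model of __this_cpu_read(): read the given CPU's copy of the counter. */
static long sketch_cpu_read(sketch_percpu_t counters, int idx, int cpu)
{
	return *sketch_per_cpu_ptr(counters, idx, cpu);
}

/* Model of __this_cpu_add(): update the given CPU's copy of the counter. */
static void sketch_cpu_add(sketch_percpu_t counters, int idx, int cpu,
			   long delta)
{
	*sketch_per_cpu_ptr(counters, idx, cpu) += delta;
}

/*
 * Shape of the fixed function: the sanity check goes through the same
 * per-CPU translation as the updates. Returns 1 where the kernel
 * version would have warned, 0 otherwise.
 */
static int sketch_move_stat(sketch_percpu_t from, sketch_percpu_t to,
			    int idx, int cpu, long nr_pages)
{
	int would_warn = sketch_cpu_read(from, idx, cpu) < nr_pages;

	sketch_cpu_add(from, idx, cpu, -nr_pages);
	sketch_cpu_add(to, idx, cpu, nr_pages);
	return would_warn;
}
```

The explicit cpu argument replaces what the kernel derives from the running CPU, which is why the real code brackets the accesses with preempt_disable() and preempt_enable().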