linux-mm.kvack.org archive mirror
* Re: [Bug 210075] New: [Thu Nov  5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at mm/page_counter.c:57 page_counter_uncharge+0x34/0x40
       [not found] <bug-210075-27@https.bugzilla.kernel.org/>
@ 2020-11-07  5:13 ` Andrew Morton
  2020-11-08 17:49   ` Lorenzo Stoakes
  2020-11-09  8:16   ` Michal Hocko
  0 siblings, 2 replies; 4+ messages in thread
From: Andrew Morton @ 2020-11-07  5:13 UTC (permalink / raw)
  To: linux-mm; +Cc: bugzilla-daemon, vladi, Johannes Weiner, Michal Hocko

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Thu, 05 Nov 2020 21:18:05 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=210075
> 
>             Bug ID: 210075
>            Summary: [Thu Nov  5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at
>                     mm/page_counter.c:57 page_counter_uncharge+0x34/0x40
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 5.9.5
>           Hardware: x86-64
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Page Allocator
>           Assignee: akpm@linux-foundation.org
>           Reporter: vladi@aresgate.net
>         Regression: No

I'm assuming this is a bug in the networking code.  I've seen a number
of possibly-related emails fly past - is it familiar to anyone?


> [Thu Nov  5 13:14:27 2020] ------------[ cut here ]------------
> [Thu Nov  5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at mm/page_counter.c:57
> page_counter_uncharge+0x34/0x40
> [Thu Nov  5 13:14:27 2020] Modules linked in: tcp_diag udp_diag raw_diag
> inet_diag netlink_diag nfnetlink_queue xt_nat macvlan veth ip6table_filter
> ip6_tables nf_conntrack_netlink nfnetlink bridge stp llc 8021q
>  iptable_raw nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_NFQUEUE ipt_REJECT
> nf_reject_ipv4 xt_mark xt_tcpudp xt_conntrack iptable_filter xt_MASQUERADE
> xt_addrtype iptable_nat nf_nat iptable_mangle ip_tables x_
> tables sch_fq_codel ext4 mbcache jbd2 dm_crypt encrypted_keys wireguard
> curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64
> poly1305_x86_64 ip6_udp_tunnel udp_tunnel libblake2s blake2s_x86
> _64 libblake2s_generic ipv6 libchacha amdgpu snd_hda_codec_hdmi
> nf_conntrack_sip nf_conntrack nf_defrag_ipv6 mfd_core snd_hda_intel gpu_sched
> nf_defrag_ipv4 snd_intel_dspcfg ttm kvm_amd snd_hda_codec drm_kms_he
> lper efi_pstore tun snd_hda_core syscopyarea kvm sysfillrect sysimgblt snd_pcm
> fb_sys_fops irqbypass snd_timer efivars drm snd ccp k10temp tcp_cubic backlight
> soundcore
> [Thu Nov  5 13:14:27 2020]  sha1_generic cp210x input_leds evdev led_class
> usbserial bfq acpi_cpufreq button
> [Thu Nov  5 13:14:27 2020] CPU: 4 PID: 133 Comm: kworker/u16:9 Not tainted
> 5.9.5 #85
> [Thu Nov  5 13:14:27 2020] Hardware name: System manufacturer System Product
> Name/PRIME X370-PRO, BIOS 5220 09/12/2019
> [Thu Nov  5 13:14:27 2020] Workqueue: netns cleanup_net
> [Thu Nov  5 13:14:27 2020] RIP: 0010:page_counter_uncharge+0x34/0x40
> [Thu Nov  5 13:14:27 2020] Code: 85 ff 74 24 4c 89 c8 f0 48 0f c1 07 4c 29 c0
> 48 89 c1 48 89 c6 e8 7c fe ff ff 48 85 c9 78 0b 48 8b 7f 28 48 85 ff 75 dc f3
> c3 <0f> 0b eb f1 0f 1f 84 00 00 00 00 00 48 8b 17 48 3
> 9 d6 72 41 41 54
> [Thu Nov  5 13:14:27 2020] RSP: 0018:ffffa225007c7d30 EFLAGS: 00010082
> [Thu Nov  5 13:14:27 2020] RAX: fffffffffffffffe RBX: ffff93a9cd087000 RCX:
> fffffffffffffffe
> [Thu Nov  5 13:14:27 2020] RDX: 0000000000000200 RSI: fffffffffffffffe RDI:
> ffff93a9cd087248
> [Thu Nov  5 13:14:27 2020] RBP: 0000000000000002 R08: 0000000000000002 R09:
> fffffffffffffffe
> [Thu Nov  5 13:14:27 2020] R10: 0000000000000246 R11: 0000000000000000 R12:
> 0000000000000518
> [Thu Nov  5 13:14:27 2020] R13: 0000000000000488 R14: ffffffffb96869f5 R15:
> ffff93a9cb992c00
> [Thu Nov  5 13:14:27 2020] FS:  0000000000000000(0000)
> GS:ffff93a9cef00000(0000) knlGS:0000000000000000
> [Thu Nov  5 13:14:27 2020] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [Thu Nov  5 13:14:27 2020] CR2: 00007f362472c008 CR3: 00000003f108a000 CR4:
> 00000000003506e0
> [Thu Nov  5 13:14:27 2020] Call Trace:
> [Thu Nov  5 13:14:27 2020]  __memcg_kmem_uncharge+0x4a/0x80
> [Thu Nov  5 13:14:27 2020]  drain_obj_stock+0x72/0x90
> [Thu Nov  5 13:14:27 2020]  refill_obj_stock+0x95/0xb0
> [Thu Nov  5 13:14:27 2020]  kmem_cache_free+0x194/0x390
> [Thu Nov  5 13:14:27 2020]  __sk_destruct+0x125/0x180
> [Thu Nov  5 13:14:27 2020]  inet_release+0x48/0x90
> [Thu Nov  5 13:14:27 2020]  sock_release+0x26/0x80
> [Thu Nov  5 13:14:27 2020]  ops_exit_list+0x2e/0x60
> [Thu Nov  5 13:14:27 2020]  cleanup_net+0x1eb/0x310
> [Thu Nov  5 13:14:27 2020]  process_one_work+0x1b1/0x310
> [Thu Nov  5 13:14:27 2020]  worker_thread+0x4b/0x400
> [Thu Nov  5 13:14:27 2020]  ? process_one_work+0x310/0x310
> [Thu Nov  5 13:14:27 2020]  kthread+0x112/0x130
> [Thu Nov  5 13:14:27 2020]  ? __kthread_bind_mask+0x90/0x90
> [Thu Nov  5 13:14:27 2020]  ret_from_fork+0x22/0x30
> [Thu Nov  5 13:14:27 2020] ---[ end trace a17bbc8650d8c295 ]---
> 
> -- 
> You are receiving this mail because:
> You are the assignee for the bug.



* Re: [Bug 210075] New: [Thu Nov 5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at mm/page_counter.c:57 page_counter_uncharge+0x34/0x40
  2020-11-07  5:13 ` [Bug 210075] New: [Thu Nov 5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at mm/page_counter.c:57 page_counter_uncharge+0x34/0x40 Andrew Morton
@ 2020-11-08 17:49   ` Lorenzo Stoakes
  2020-11-09  8:16   ` Michal Hocko
  1 sibling, 0 replies; 4+ messages in thread
From: Lorenzo Stoakes @ 2020-11-08 17:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, bugzilla-daemon, vladi, Johannes Weiner, Michal Hocko

I'm looking to jump back into some kernel hacking, so I thought I'd
take a quick (rusty) look.

I'm pattern matching a bit, but I wonder whether f2fe7b09 ("mm:
memcg/slab: charge individual slab objects instead of pages") may have
played a role in this bug, as it adds an obj_cgroup_uncharge()
invocation to memcg_slab_free_hook(), which is invoked from
kmem_cache_free(). sk_prot_free() also invokes mem_cgroup_sk_free()
before kmem_cache_free(), so perhaps an uncharge is getting doubled up
here? I traced through mem_cgroup_sk_free() (which invokes css_put())
but couldn't see where it would result in an additional uncharge, so I
may be barking up the wrong tree.

I'd be more than happy to take a deeper look at this if vladi has some
code that reproduces it, plus a .config, if that'd be helpful.

Best, Lorenzo



* Re: [Bug 210075] New: [Thu Nov  5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at mm/page_counter.c:57 page_counter_uncharge+0x34/0x40
  2020-11-07  5:13 ` [Bug 210075] New: [Thu Nov 5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at mm/page_counter.c:57 page_counter_uncharge+0x34/0x40 Andrew Morton
  2020-11-08 17:49   ` Lorenzo Stoakes
@ 2020-11-09  8:16   ` Michal Hocko
  2020-11-09 17:39     ` Shakeel Butt
  1 sibling, 1 reply; 4+ messages in thread
From: Michal Hocko @ 2020-11-09  8:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, bugzilla-daemon, vladi, Johannes Weiner,
	Roman Gushchin, Shakeel Butt

[Cc Roman and Shakeel]

On Fri 06-11-20 21:13:00, Andrew Morton wrote:
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> 
> I'm assuming this is a bug in the networking code.  I've seen a number
> of possibly-related emails fly past - is it familiar to anyone?

This looks similar to 8de15e920dc8 ("mm: memcg: link page counters to
root if use_hierarchy is false"). The path is different, though, so the
underlying reason might be something else.
 

-- 
Michal Hocko
SUSE Labs



* Re: [Bug 210075] New: [Thu Nov 5 13:14:27 2020] WARNING: CPU: 4 PID: 133 at mm/page_counter.c:57 page_counter_uncharge+0x34/0x40
  2020-11-09  8:16   ` Michal Hocko
@ 2020-11-09 17:39     ` Shakeel Butt
  0 siblings, 0 replies; 4+ messages in thread
From: Shakeel Butt @ 2020-11-09 17:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Linux MM, bugzilla-daemon, vladi, Johannes Weiner,
	Roman Gushchin

On Mon, Nov 9, 2020 at 12:16 AM Michal Hocko <mhocko@suse.com> wrote:
>
> [Cc Roman and Shakeel]
>
> On Fri 06-11-20 21:13:00, Andrew Morton wrote:
> > (switched to email.  Please respond via emailed reply-to-all, not via the
> > bugzilla web interface).
> >
> > I'm assuming this is a bug in the networking code.  I've seen a number
> > of possibly-related emails fly past - is it familiar to anyone?
>
> Looks similar to 8de15e920dc8 ("mm: memcg: link page counters to root if
> use_hierarchy is false"). The path is different so the underlying reason
> might be something else.
>

The commit 8de15e920dc8 is not in 5.9.5.

Is the issue reproducible and bisectable?


