From: Sebastian Andrzej Siewior <bigeasy@linutronix.de> To: cgroups@vger.kernel.org, linux-mm@kvack.org Cc: Johannes Weiner <hannes@cmpxchg.org>, Michal Hocko <mhocko@kernel.org>, Vladimir Davydov <vdavydov.dev@gmail.com>, Andrew Morton <akpm@linux-foundation.org>, Thomas Gleixner <tglx@linutronix.de>, Waiman Long <longman@redhat.com>, Peter Zijlstra <peterz@infradead.org> Subject: [RFC PATCH 0/3] mm/memcg: Address PREEMPT_RT problems instead of disabling it. Date: Wed, 22 Dec 2021 12:41:08 +0100 [thread overview] Message-ID: <20211222114111.2206248-1-bigeasy@linutronix.de> (raw) Hi, this is a follow up to https://lkml.kernel.org/r/20211207155208.eyre5svucpg7krxe@linutronix.de where it has been suggested that I should try again with memcg instead of simply disabling it. Patch #1 deals with the counters. It has been suggested to simply disable preemption on RT (like in vmstats) and I followed that advice as closely as possible. The local_irq_save() could be removed from mod_memcg_state() and the other wrapper on RT but I leave it since it does not hurt and it might look nicer ;) Patch #2 is a follow up to https://lkml.kernel.org/r/20211214144412.447035-1-longman@redhat.com Patch #3 restricts the task_obj usage to !PREEMPTION kernels. Based on my understanding the use of preempt_disable() minimizes (avoids?) the win of the optimisation. I tested them on CONFIG_PREEMPT_NONE + CONFIG_PREEMPT_RT with the tools/testing/selftests/cgroup/* tests. I looked good except for the following (which was also there before the patches): - test_kmem sometimes complained about: not ok 2 test_kmem_memcg_deletion - test_memcontrol complained always about not ok 3 test_memcg_min not ok 4 test_memcg_low and did not finish. - lockdep complains were triggered by test_core and test_freezer (both had to run): ====================================================== WARNING: possible circular locking dependency detected 5.16.0-rc5 #259 Not tainted ------------------------------------------------------ test_core/5996 is trying to acquire lock: ffffffff829a1258 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x2d/0xb0 but task is already holding lock: ffff888103034618 (&sighand->siglock){....}-{2:2}, at: get_signal+0x8d/0xdb0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sighand->siglock){....}-{2:2}: _raw_spin_lock+0x27/0x40 cgroup_post_fork+0x1f5/0x290 copy_process+0x191b/0x1f80 kernel_clone+0x5a/0x410 __do_sys_clone3+0xb3/0x110 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (css_set_lock){..-.}-{2:2}: __lock_acquire+0x1253/0x2280 lock_acquire+0xd4/0x2e0 _raw_spin_lock_irqsave+0x36/0x50 obj_cgroup_release+0x2d/0xb0 drain_obj_stock+0x1a9/0x1b0 refill_obj_stock+0x4f/0x220 memcg_slab_free_hook.part.0+0x108/0x290 kmem_cache_free+0xf5/0x3c0 dequeue_signal+0xaf/0x1e0 get_signal+0x232/0xdb0 arch_do_signal_or_restart+0xf8/0x740 exit_to_user_mode_prepare+0x17d/0x270 syscall_exit_to_user_mode+0x19/0x70 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sighand->siglock); lock(css_set_lock); lock(&sighand->siglock); lock(css_set_lock); *** DEADLOCK *** 2 locks held by test_core/5996: #0: ffff888103034618 (&sighand->siglock){....}-{2:2}, at: get_signal+0x8d/0xdb0 #1: ffffffff82905e40 (rcu_read_lock){....}-{1:2}, at: drain_obj_stock+0x71/0x1b0 stack backtrace: CPU: 2 PID: 5996 Comm: test_core Not tainted 5.16.0-rc5 #259 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x45/0x59 check_noncircular+0xfe/0x110 __lock_acquire+0x1253/0x2280 lock_acquire+0xd4/0x2e0 _raw_spin_lock_irqsave+0x36/0x50 obj_cgroup_release+0x2d/0xb0 drain_obj_stock+0x1a9/0x1b0 refill_obj_stock+0x4f/0x220 memcg_slab_free_hook.part.0+0x108/0x290 kmem_cache_free+0xf5/0x3c0 dequeue_signal+0xaf/0x1e0 get_signal+0x232/0xdb0 arch_do_signal_or_restart+0xf8/0x740 exit_to_user_mode_prepare+0x17d/0x270 syscall_exit_to_user_mode+0x19/0x70 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> Sebastian
WARNING: multiple messages have this Message-ID (diff)
From: Sebastian Andrzej Siewior <bigeasy-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org> To: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>, Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>, Vladimir Davydov <vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>, Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>, Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>, Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>, Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> Subject: [RFC PATCH 0/3] mm/memcg: Address PREEMPT_RT problems instead of disabling it. Date: Wed, 22 Dec 2021 12:41:08 +0100 [thread overview] Message-ID: <20211222114111.2206248-1-bigeasy@linutronix.de> (raw) Hi, this is a follow up to https://lkml.kernel.org/r/20211207155208.eyre5svucpg7krxe-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org where it has been suggested that I should try again with memcg instead of simply disabling it. Patch #1 deals with the counters. It has been suggested to simply disable preemption on RT (like in vmstats) and I followed that advice as closely as possible. The local_irq_save() could be removed from mod_memcg_state() and the other wrapper on RT but I leave it since it does not hurt and it might look nicer ;) Patch #2 is a follow up to https://lkml.kernel.org/r/20211214144412.447035-1-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org Patch #3 restricts the task_obj usage to !PREEMPTION kernels. Based on my understanding the use of preempt_disable() minimizes (avoids?) the win of the optimisation. I tested them on CONFIG_PREEMPT_NONE + CONFIG_PREEMPT_RT with the tools/testing/selftests/cgroup/* tests. I looked good except for the following (which was also there before the patches): - test_kmem sometimes complained about: not ok 2 test_kmem_memcg_deletion - test_memcontrol complained always about not ok 3 test_memcg_min not ok 4 test_memcg_low and did not finish. - lockdep complains were triggered by test_core and test_freezer (both had to run): ====================================================== WARNING: possible circular locking dependency detected 5.16.0-rc5 #259 Not tainted ------------------------------------------------------ test_core/5996 is trying to acquire lock: ffffffff829a1258 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x2d/0xb0 but task is already holding lock: ffff888103034618 (&sighand->siglock){....}-{2:2}, at: get_signal+0x8d/0xdb0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sighand->siglock){....}-{2:2}: _raw_spin_lock+0x27/0x40 cgroup_post_fork+0x1f5/0x290 copy_process+0x191b/0x1f80 kernel_clone+0x5a/0x410 __do_sys_clone3+0xb3/0x110 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (css_set_lock){..-.}-{2:2}: __lock_acquire+0x1253/0x2280 lock_acquire+0xd4/0x2e0 _raw_spin_lock_irqsave+0x36/0x50 obj_cgroup_release+0x2d/0xb0 drain_obj_stock+0x1a9/0x1b0 refill_obj_stock+0x4f/0x220 memcg_slab_free_hook.part.0+0x108/0x290 kmem_cache_free+0xf5/0x3c0 dequeue_signal+0xaf/0x1e0 get_signal+0x232/0xdb0 arch_do_signal_or_restart+0xf8/0x740 exit_to_user_mode_prepare+0x17d/0x270 syscall_exit_to_user_mode+0x19/0x70 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sighand->siglock); lock(css_set_lock); lock(&sighand->siglock); lock(css_set_lock); *** DEADLOCK *** 2 locks held by test_core/5996: #0: ffff888103034618 (&sighand->siglock){....}-{2:2}, at: get_signal+0x8d/0xdb0 #1: ffffffff82905e40 (rcu_read_lock){....}-{1:2}, at: drain_obj_stock+0x71/0x1b0 stack backtrace: CPU: 2 PID: 5996 Comm: test_core Not tainted 5.16.0-rc5 #259 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x45/0x59 check_noncircular+0xfe/0x110 __lock_acquire+0x1253/0x2280 lock_acquire+0xd4/0x2e0 _raw_spin_lock_irqsave+0x36/0x50 obj_cgroup_release+0x2d/0xb0 drain_obj_stock+0x1a9/0x1b0 refill_obj_stock+0x4f/0x220 memcg_slab_free_hook.part.0+0x108/0x290 kmem_cache_free+0xf5/0x3c0 dequeue_signal+0xaf/0x1e0 get_signal+0x232/0xdb0 arch_do_signal_or_restart+0xf8/0x740 exit_to_user_mode_prepare+0x17d/0x270 syscall_exit_to_user_mode+0x19/0x70 do_syscall_64+0x50/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> Sebastian
next reply other threads:[~2021-12-22 11:41 UTC|newest] Thread overview: 48+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-12-22 11:41 Sebastian Andrzej Siewior [this message] 2021-12-22 11:41 ` [RFC PATCH 0/3] mm/memcg: Address PREEMPT_RT problems instead of disabling it Sebastian Andrzej Siewior 2021-12-22 11:41 ` [RFC PATCH 1/3] mm/memcg: Protect per-CPU counter by disabling preemption on PREEMPT_RT Sebastian Andrzej Siewior 2021-12-22 11:41 ` Sebastian Andrzej Siewior 2021-12-23 2:31 ` Waiman Long 2021-12-23 2:31 ` Waiman Long 2021-12-23 7:34 ` Sebastian Andrzej Siewior 2021-12-23 7:34 ` Sebastian Andrzej Siewior 2021-12-23 16:01 ` Waiman Long 2021-12-23 16:01 ` Waiman Long 2022-01-05 14:16 ` Michal Koutný 2022-01-05 14:16 ` Michal Koutný 2022-01-13 13:08 ` Sebastian Andrzej Siewior 2022-01-13 13:08 ` Sebastian Andrzej Siewior 2022-01-13 14:48 ` Michal Koutný 2022-01-13 14:48 ` Michal Koutný 2022-01-14 9:09 ` Sebastian Andrzej Siewior 2022-01-14 9:09 ` Sebastian Andrzej Siewior 2022-01-18 18:26 ` [PATCH] mm/memcg: Do not check v1 event counter when not needed Michal Koutný 2022-01-18 18:26 ` Michal Koutný 2022-01-18 19:57 ` Sebastian Andrzej Siewior 2022-01-18 19:57 ` Sebastian Andrzej Siewior 2021-12-22 11:41 ` [RFC PATCH 2/3] mm/memcg: Add a local_lock_t for IRQ and TASK object Sebastian Andrzej Siewior 2021-12-22 11:41 ` Sebastian Andrzej Siewior 2021-12-23 21:38 ` Waiman Long 2021-12-23 21:38 ` Waiman Long 2022-01-03 16:34 ` Sebastian Andrzej Siewior 2022-01-03 16:34 ` Sebastian Andrzej Siewior 2022-01-03 17:09 ` Waiman Long 2022-01-03 17:09 ` Waiman Long 2021-12-22 11:41 ` [RFC PATCH 3/3] mm/memcg: Allow the task_obj optimization only on non-PREEMPTIBLE kernels Sebastian Andrzej Siewior 2021-12-22 11:41 ` Sebastian Andrzej Siewior 2021-12-23 21:48 ` Waiman Long 2021-12-23 21:48 ` Waiman Long 2022-01-03 14:44 ` Sebastian Andrzej Siewior 2022-01-03 14:44 ` Sebastian Andrzej Siewior 2022-01-03 15:04 ` Waiman Long 2022-01-03 15:04 ` Waiman Long 2022-01-05 20:22 ` Sebastian Andrzej Siewior 2022-01-05 20:22 ` Sebastian Andrzej Siewior 2022-01-06 3:28 ` Waiman Long 2022-01-06 3:28 ` Waiman Long 2022-01-13 15:26 ` Sebastian Andrzej Siewior 2022-01-13 15:26 ` Sebastian Andrzej Siewior 2022-01-05 14:59 ` [RFC PATCH 0/3] mm/memcg: Address PREEMPT_RT problems instead of disabling it Michal Koutný 2022-01-05 14:59 ` Michal Koutný 2022-01-05 15:06 ` Sebastian Andrzej Siewior 2022-01-05 15:06 ` Sebastian Andrzej Siewior
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20211222114111.2206248-1-bigeasy@linutronix.de \ --to=bigeasy@linutronix.de \ --cc=akpm@linux-foundation.org \ --cc=cgroups@vger.kernel.org \ --cc=hannes@cmpxchg.org \ --cc=linux-mm@kvack.org \ --cc=longman@redhat.com \ --cc=mhocko@kernel.org \ --cc=peterz@infradead.org \ --cc=tglx@linutronix.de \ --cc=vdavydov.dev@gmail.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.