All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: cgroups@vger.kernel.org, linux-mm@kvack.org
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Waiman Long <longman@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: [RFC PATCH 0/3] mm/memcg: Address PREEMPT_RT problems instead of disabling it.
Date: Wed, 22 Dec 2021 12:41:08 +0100	[thread overview]
Message-ID: <20211222114111.2206248-1-bigeasy@linutronix.de> (raw)

Hi,

this is a follow up to
   https://lkml.kernel.org/r/20211207155208.eyre5svucpg7krxe@linutronix.de

where it has been suggested that I should try again with memcg instead
of simply disabling it.

Patch #1 deals with the counters. It has been suggested to simply
disable preemption on RT (like in vmstats) and I followed that advice as
closely as possible. The local_irq_save() could be removed from
mod_memcg_state() and the other wrapper on RT but I leave it since it
does not hurt and it might look nicer ;)

Patch #2 is a follow up to
   https://lkml.kernel.org/r/20211214144412.447035-1-longman@redhat.com

Patch #3 restricts the task_obj usage to !PREEMPTION kernels. Based on
my understanding the use of preempt_disable() minimizes (avoids?) the
win of the optimisation.

I tested them on CONFIG_PREEMPT_NONE + CONFIG_PREEMPT_RT with the
tools/testing/selftests/cgroup/* tests. I looked good except for the
following (which was also there before the patches):
- test_kmem sometimes complained about:
 not ok 2 test_kmem_memcg_deletion
 
- test_memcontrol complained always about
 not ok 3 test_memcg_min
 not ok 4 test_memcg_low
 and did not finish.

- lockdep complains were triggered by test_core and test_freezer (both
  had to run):
 ======================================================
 WARNING: possible circular locking dependency detected
 5.16.0-rc5 #259 Not tainted
 ------------------------------------------------------
 test_core/5996 is trying to acquire lock:
 ffffffff829a1258 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x2d/0xb0
 
 but task is already holding lock:
 ffff888103034618 (&sighand->siglock){....}-{2:2}, at: get_signal+0x8d/0xdb0
 
 which lock already depends on the new lock.

 
 the existing dependency chain (in reverse order) is:
 
 -> #1 (&sighand->siglock){....}-{2:2}:
        _raw_spin_lock+0x27/0x40
        cgroup_post_fork+0x1f5/0x290
        copy_process+0x191b/0x1f80
        kernel_clone+0x5a/0x410
        __do_sys_clone3+0xb3/0x110
        do_syscall_64+0x43/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
 
 -> #0 (css_set_lock){..-.}-{2:2}:
        __lock_acquire+0x1253/0x2280
        lock_acquire+0xd4/0x2e0
        _raw_spin_lock_irqsave+0x36/0x50
        obj_cgroup_release+0x2d/0xb0
        drain_obj_stock+0x1a9/0x1b0
        refill_obj_stock+0x4f/0x220
        memcg_slab_free_hook.part.0+0x108/0x290
        kmem_cache_free+0xf5/0x3c0
        dequeue_signal+0xaf/0x1e0
        get_signal+0x232/0xdb0
        arch_do_signal_or_restart+0xf8/0x740
        exit_to_user_mode_prepare+0x17d/0x270
        syscall_exit_to_user_mode+0x19/0x70
        do_syscall_64+0x50/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
 
 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&sighand->siglock);
                                lock(css_set_lock);
                                lock(&sighand->siglock);
   lock(css_set_lock);
 
  *** DEADLOCK ***

 2 locks held by test_core/5996:
  #0: ffff888103034618 (&sighand->siglock){....}-{2:2}, at: get_signal+0x8d/0xdb0
  #1: ffffffff82905e40 (rcu_read_lock){....}-{1:2}, at: drain_obj_stock+0x71/0x1b0
 
 stack backtrace:
 CPU: 2 PID: 5996 Comm: test_core Not tainted 5.16.0-rc5 #259
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x45/0x59
  check_noncircular+0xfe/0x110
  __lock_acquire+0x1253/0x2280
  lock_acquire+0xd4/0x2e0
  _raw_spin_lock_irqsave+0x36/0x50
  obj_cgroup_release+0x2d/0xb0
  drain_obj_stock+0x1a9/0x1b0
  refill_obj_stock+0x4f/0x220
  memcg_slab_free_hook.part.0+0x108/0x290
  kmem_cache_free+0xf5/0x3c0
  dequeue_signal+0xaf/0x1e0
  get_signal+0x232/0xdb0
  arch_do_signal_or_restart+0xf8/0x740
  exit_to_user_mode_prepare+0x17d/0x270
  syscall_exit_to_user_mode+0x19/0x70
  do_syscall_64+0x50/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
  </TASK>

Sebastian




WARNING: multiple messages have this Message-ID (diff)
From: Sebastian Andrzej Siewior <bigeasy-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>
To: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
Cc: Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Vladimir Davydov
	<vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Thomas Gleixner <tglx-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org>,
	Waiman Long <longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
	Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
Subject: [RFC PATCH 0/3] mm/memcg: Address PREEMPT_RT problems instead of disabling it.
Date: Wed, 22 Dec 2021 12:41:08 +0100	[thread overview]
Message-ID: <20211222114111.2206248-1-bigeasy@linutronix.de> (raw)

Hi,

this is a follow up to
   https://lkml.kernel.org/r/20211207155208.eyre5svucpg7krxe-hfZtesqFncYOwBW4kG4KsQ@public.gmane.org

where it has been suggested that I should try again with memcg instead
of simply disabling it.

Patch #1 deals with the counters. It has been suggested to simply
disable preemption on RT (like in vmstats) and I followed that advice as
closely as possible. The local_irq_save() could be removed from
mod_memcg_state() and the other wrapper on RT but I leave it since it
does not hurt and it might look nicer ;)

Patch #2 is a follow up to
   https://lkml.kernel.org/r/20211214144412.447035-1-longman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org

Patch #3 restricts the task_obj usage to !PREEMPTION kernels. Based on
my understanding the use of preempt_disable() minimizes (avoids?) the
win of the optimisation.

I tested them on CONFIG_PREEMPT_NONE + CONFIG_PREEMPT_RT with the
tools/testing/selftests/cgroup/* tests. I looked good except for the
following (which was also there before the patches):
- test_kmem sometimes complained about:
 not ok 2 test_kmem_memcg_deletion
 
- test_memcontrol complained always about
 not ok 3 test_memcg_min
 not ok 4 test_memcg_low
 and did not finish.

- lockdep complains were triggered by test_core and test_freezer (both
  had to run):
 ======================================================
 WARNING: possible circular locking dependency detected
 5.16.0-rc5 #259 Not tainted
 ------------------------------------------------------
 test_core/5996 is trying to acquire lock:
 ffffffff829a1258 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x2d/0xb0
 
 but task is already holding lock:
 ffff888103034618 (&sighand->siglock){....}-{2:2}, at: get_signal+0x8d/0xdb0
 
 which lock already depends on the new lock.

 
 the existing dependency chain (in reverse order) is:
 
 -> #1 (&sighand->siglock){....}-{2:2}:
        _raw_spin_lock+0x27/0x40
        cgroup_post_fork+0x1f5/0x290
        copy_process+0x191b/0x1f80
        kernel_clone+0x5a/0x410
        __do_sys_clone3+0xb3/0x110
        do_syscall_64+0x43/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
 
 -> #0 (css_set_lock){..-.}-{2:2}:
        __lock_acquire+0x1253/0x2280
        lock_acquire+0xd4/0x2e0
        _raw_spin_lock_irqsave+0x36/0x50
        obj_cgroup_release+0x2d/0xb0
        drain_obj_stock+0x1a9/0x1b0
        refill_obj_stock+0x4f/0x220
        memcg_slab_free_hook.part.0+0x108/0x290
        kmem_cache_free+0xf5/0x3c0
        dequeue_signal+0xaf/0x1e0
        get_signal+0x232/0xdb0
        arch_do_signal_or_restart+0xf8/0x740
        exit_to_user_mode_prepare+0x17d/0x270
        syscall_exit_to_user_mode+0x19/0x70
        do_syscall_64+0x50/0x90
        entry_SYSCALL_64_after_hwframe+0x44/0xae
 
 other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&sighand->siglock);
                                lock(css_set_lock);
                                lock(&sighand->siglock);
   lock(css_set_lock);
 
  *** DEADLOCK ***

 2 locks held by test_core/5996:
  #0: ffff888103034618 (&sighand->siglock){....}-{2:2}, at: get_signal+0x8d/0xdb0
  #1: ffffffff82905e40 (rcu_read_lock){....}-{1:2}, at: drain_obj_stock+0x71/0x1b0
 
 stack backtrace:
 CPU: 2 PID: 5996 Comm: test_core Not tainted 5.16.0-rc5 #259
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x45/0x59
  check_noncircular+0xfe/0x110
  __lock_acquire+0x1253/0x2280
  lock_acquire+0xd4/0x2e0
  _raw_spin_lock_irqsave+0x36/0x50
  obj_cgroup_release+0x2d/0xb0
  drain_obj_stock+0x1a9/0x1b0
  refill_obj_stock+0x4f/0x220
  memcg_slab_free_hook.part.0+0x108/0x290
  kmem_cache_free+0xf5/0x3c0
  dequeue_signal+0xaf/0x1e0
  get_signal+0x232/0xdb0
  arch_do_signal_or_restart+0xf8/0x740
  exit_to_user_mode_prepare+0x17d/0x270
  syscall_exit_to_user_mode+0x19/0x70
  do_syscall_64+0x50/0x90
  entry_SYSCALL_64_after_hwframe+0x44/0xae
  </TASK>

Sebastian



             reply	other threads:[~2021-12-22 11:41 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-22 11:41 Sebastian Andrzej Siewior [this message]
2021-12-22 11:41 ` [RFC PATCH 0/3] mm/memcg: Address PREEMPT_RT problems instead of disabling it Sebastian Andrzej Siewior
2021-12-22 11:41 ` [RFC PATCH 1/3] mm/memcg: Protect per-CPU counter by disabling preemption on PREEMPT_RT Sebastian Andrzej Siewior
2021-12-22 11:41   ` Sebastian Andrzej Siewior
2021-12-23  2:31   ` Waiman Long
2021-12-23  2:31     ` Waiman Long
2021-12-23  7:34     ` Sebastian Andrzej Siewior
2021-12-23  7:34       ` Sebastian Andrzej Siewior
2021-12-23 16:01       ` Waiman Long
2021-12-23 16:01         ` Waiman Long
2022-01-05 14:16   ` Michal Koutný
2022-01-05 14:16     ` Michal Koutný
2022-01-13 13:08     ` Sebastian Andrzej Siewior
2022-01-13 13:08       ` Sebastian Andrzej Siewior
2022-01-13 14:48       ` Michal Koutný
2022-01-13 14:48         ` Michal Koutný
2022-01-14  9:09         ` Sebastian Andrzej Siewior
2022-01-14  9:09           ` Sebastian Andrzej Siewior
2022-01-18 18:26           ` [PATCH] mm/memcg: Do not check v1 event counter when not needed Michal Koutný
2022-01-18 18:26             ` Michal Koutný
2022-01-18 19:57             ` Sebastian Andrzej Siewior
2022-01-18 19:57               ` Sebastian Andrzej Siewior
2021-12-22 11:41 ` [RFC PATCH 2/3] mm/memcg: Add a local_lock_t for IRQ and TASK object Sebastian Andrzej Siewior
2021-12-22 11:41   ` Sebastian Andrzej Siewior
2021-12-23 21:38   ` Waiman Long
2021-12-23 21:38     ` Waiman Long
2022-01-03 16:34     ` Sebastian Andrzej Siewior
2022-01-03 16:34       ` Sebastian Andrzej Siewior
2022-01-03 17:09       ` Waiman Long
2022-01-03 17:09         ` Waiman Long
2021-12-22 11:41 ` [RFC PATCH 3/3] mm/memcg: Allow the task_obj optimization only on non-PREEMPTIBLE kernels Sebastian Andrzej Siewior
2021-12-22 11:41   ` Sebastian Andrzej Siewior
2021-12-23 21:48   ` Waiman Long
2021-12-23 21:48     ` Waiman Long
2022-01-03 14:44     ` Sebastian Andrzej Siewior
2022-01-03 14:44       ` Sebastian Andrzej Siewior
2022-01-03 15:04       ` Waiman Long
2022-01-03 15:04         ` Waiman Long
2022-01-05 20:22         ` Sebastian Andrzej Siewior
2022-01-05 20:22           ` Sebastian Andrzej Siewior
2022-01-06  3:28           ` Waiman Long
2022-01-06  3:28             ` Waiman Long
2022-01-13 15:26             ` Sebastian Andrzej Siewior
2022-01-13 15:26               ` Sebastian Andrzej Siewior
2022-01-05 14:59 ` [RFC PATCH 0/3] mm/memcg: Address PREEMPT_RT problems instead of disabling it Michal Koutný
2022-01-05 14:59   ` Michal Koutný
2022-01-05 15:06   ` Sebastian Andrzej Siewior
2022-01-05 15:06     ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211222114111.2206248-1-bigeasy@linutronix.de \
    --to=bigeasy@linutronix.de \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-mm@kvack.org \
    --cc=longman@redhat.com \
    --cc=mhocko@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.