Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
@ 2019-11-06 22:51 Roman Gushchin
  2019-11-06 22:51 ` [PATCH 2/2] mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() Roman Gushchin
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Roman Gushchin @ 2019-11-06 22:51 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Michal Hocko, Johannes Weiner, linux-kernel, kernel-team,
	Roman Gushchin, stable, Tejun Heo

We've encountered a rcu stall in get_mem_cgroup_from_mm():

 rcu: INFO: rcu_sched self-detected stall on CPU
 rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
 (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
 <...>
 RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
 <...>
 __memcg_kmem_charge+0x55/0x140
 __alloc_pages_nodemask+0x267/0x320
 pipe_write+0x1ad/0x400
 new_sync_write+0x127/0x1c0
 __kernel_write+0x4f/0xf0
 dump_emit+0x91/0xc0
 writenote+0xa0/0xc0
 elf_core_dump+0x11af/0x1430
 do_coredump+0xc65/0xee0
 ? unix_stream_sendmsg+0x37d/0x3b0
 get_signal+0x132/0x7c0
 do_signal+0x36/0x640
 ? recalc_sigpending+0x17/0x50
 exit_to_usermode_loop+0x61/0xd0
 do_syscall_64+0xd4/0x100
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The problem is caused by an exiting task which is associated with
an offline memcg. We're iterating over and over in the
do {} while (!css_tryget_online()) loop, but obviously the memcg won't
become online and the exiting task won't be migrated to a live memcg.

Let's fix it by switching from css_tryget_online() to css_tryget().

As css_tryget_online() cannot guarantee that the memcg won't go
offline, the check is usually useless, except some rare cases
when for example it determines if something should be presented
to a user.

A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
css_tryget() instead of css_tryget_online() in task_get_css()").

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: stable@vger.kernel.org
Cc: Tejun Heo <tj@kernel.org>
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 50f5bc55fcec..c5b5f74cfd4d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -939,7 +939,7 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
 			if (unlikely(!memcg))
 				memcg = root_mem_cgroup;
 		}
-	} while (!css_tryget_online(&memcg->css));
+	} while (!css_tryget(&memcg->css));
 	rcu_read_unlock();
 	return memcg;
 }
-- 
2.17.1



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 2/2] mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()
  2019-11-06 22:51 [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm() Roman Gushchin
@ 2019-11-06 22:51 ` Roman Gushchin
  2019-11-07  0:26   ` Johannes Weiner
                     ` (2 more replies)
  2019-11-07  0:22 ` [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm() Johannes Weiner
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 18+ messages in thread
From: Roman Gushchin @ 2019-11-06 22:51 UTC (permalink / raw)
  To: linux-mm, Andrew Morton
  Cc: Michal Hocko, Johannes Weiner, linux-kernel, kernel-team,
	Roman Gushchin, stable, Tejun Heo

An exiting task might belong to an offline cgroup. In this case
an attempt to grab a cgroup reference from the task can end up
with an infinite loop in hugetlb_cgroup_charge_cgroup(), because
neither the cgroup will become online, neither the task will
be migrated to a live cgroup.

Fix this by switching over to css_tryget(). As css_tryget_online()
can't guarantee that the cgroup won't go offline, in most cases
the check doesn't make sense. In this particular case users of
hugetlb_cgroup_charge_cgroup() are not affected by this change.

A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
css_tryget() instead of css_tryget_online() in task_get_css()").

Signed-off-by: Roman Gushchin <guro@fb.com>
Cc: stable@vger.kernel.org
Cc: Tejun Heo <tj@kernel.org>
---
 mm/hugetlb_cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index f1930fa0b445..2ac38bdc18a1 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -196,7 +196,7 @@ int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
 again:
 	rcu_read_lock();
 	h_cg = hugetlb_cgroup_from_task(current);
-	if (!css_tryget_online(&h_cg->css)) {
+	if (!css_tryget(&h_cg->css)) {
 		rcu_read_unlock();
 		goto again;
 	}
-- 
2.17.1



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
  2019-11-06 22:51 [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm() Roman Gushchin
  2019-11-06 22:51 ` [PATCH 2/2] mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() Roman Gushchin
@ 2019-11-07  0:22 ` Johannes Weiner
  2019-11-07  1:25   ` Shakeel Butt
  2019-11-07 12:21 ` Michal Hocko
  2019-11-07 15:43 ` Tejun Heo
  3 siblings, 1 reply; 18+ messages in thread
From: Johannes Weiner @ 2019-11-07  0:22 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Andrew Morton, Michal Hocko, linux-kernel, kernel-team,
	stable, Tejun Heo

On Wed, Nov 06, 2019 at 02:51:30PM -0800, Roman Gushchin wrote:
> We've encountered a rcu stall in get_mem_cgroup_from_mm():
> 
>  rcu: INFO: rcu_sched self-detected stall on CPU
>  rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
>  (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
>  <...>
>  RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
>  <...>
>  __memcg_kmem_charge+0x55/0x140
>  __alloc_pages_nodemask+0x267/0x320
>  pipe_write+0x1ad/0x400
>  new_sync_write+0x127/0x1c0
>  __kernel_write+0x4f/0xf0
>  dump_emit+0x91/0xc0
>  writenote+0xa0/0xc0
>  elf_core_dump+0x11af/0x1430
>  do_coredump+0xc65/0xee0
>  ? unix_stream_sendmsg+0x37d/0x3b0
>  get_signal+0x132/0x7c0
>  do_signal+0x36/0x640
>  ? recalc_sigpending+0x17/0x50
>  exit_to_usermode_loop+0x61/0xd0
>  do_syscall_64+0xd4/0x100
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> The problem is caused by an exiting task which is associated with
> an offline memcg. We're iterating over and over in the
> do {} while (!css_tryget_online()) loop, but obviously the memcg won't
> become online and the exiting task won't be migrated to a live memcg.
> 
> Let's fix it by switching from css_tryget_online() to css_tryget().
> 
> As css_tryget_online() cannot guarantee that the memcg won't go
> offline, the check is usually useless, except some rare cases
> when for example it determines if something should be presented
> to a user.
> 
> A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
> css_tryget() instead of css_tryget_online() in task_get_css()").
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: stable@vger.kernel.org
> Cc: Tejun Heo <tj@kernel.org>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

The bug aside, it doesn't matter whether the cgroup is online for the
callers. It used to matter when offlining needed to evacuate all
charges from the memcg, and so needed to prevent new ones from showing
up, but we don't care now.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()
  2019-11-06 22:51 ` [PATCH 2/2] mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() Roman Gushchin
@ 2019-11-07  0:26   ` Johannes Weiner
  2019-11-07  2:31   ` Shakeel Butt
  2019-11-07 15:44   ` Tejun Heo
  2 siblings, 0 replies; 18+ messages in thread
From: Johannes Weiner @ 2019-11-07  0:26 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Andrew Morton, Michal Hocko, linux-kernel, kernel-team,
	stable, Tejun Heo

On Wed, Nov 06, 2019 at 02:51:31PM -0800, Roman Gushchin wrote:
> An exiting task might belong to an offline cgroup. In this case
> an attempt to grab a cgroup reference from the task can end up
> with an infinite loop in hugetlb_cgroup_charge_cgroup(), because
> neither the cgroup will become online, neither the task will
> be migrated to a live cgroup.
> 
> Fix this by switching over to css_tryget(). As css_tryget_online()
> can't guarantee that the cgroup won't go offline, in most cases
> the check doesn't make sense. In this particular case users of
> hugetlb_cgroup_charge_cgroup() are not affected by this change.
> 
> A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
> css_tryget() instead of css_tryget_online() in task_get_css()").
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
  2019-11-07  0:22 ` [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm() Johannes Weiner
@ 2019-11-07  1:25   ` Shakeel Butt
  2019-11-07  1:43     ` Johannes Weiner
  2019-11-07  1:43     ` Roman Gushchin
  0 siblings, 2 replies; 18+ messages in thread
From: Shakeel Butt @ 2019-11-07  1:25 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Roman Gushchin, Linux MM, Andrew Morton, Michal Hocko, LKML,
	Kernel Team, stable, Tejun Heo

On Wed, Nov 6, 2019 at 4:22 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Wed, Nov 06, 2019 at 02:51:30PM -0800, Roman Gushchin wrote:
> > We've encountered a rcu stall in get_mem_cgroup_from_mm():
> >
> >  rcu: INFO: rcu_sched self-detected stall on CPU
> >  rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
> >  (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
> >  <...>
> >  RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
> >  <...>
> >  __memcg_kmem_charge+0x55/0x140
> >  __alloc_pages_nodemask+0x267/0x320
> >  pipe_write+0x1ad/0x400
> >  new_sync_write+0x127/0x1c0
> >  __kernel_write+0x4f/0xf0
> >  dump_emit+0x91/0xc0
> >  writenote+0xa0/0xc0
> >  elf_core_dump+0x11af/0x1430
> >  do_coredump+0xc65/0xee0
> >  ? unix_stream_sendmsg+0x37d/0x3b0
> >  get_signal+0x132/0x7c0
> >  do_signal+0x36/0x640
> >  ? recalc_sigpending+0x17/0x50
> >  exit_to_usermode_loop+0x61/0xd0
> >  do_syscall_64+0xd4/0x100
> >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > The problem is caused by an exiting task which is associated with
> > an offline memcg. We're iterating over and over in the
> > do {} while (!css_tryget_online()) loop, but obviously the memcg won't
> > become online and the exiting task won't be migrated to a live memcg.
> >
> > Let's fix it by switching from css_tryget_online() to css_tryget().
> >
> > As css_tryget_online() cannot guarantee that the memcg won't go
> > offline, the check is usually useless, except some rare cases
> > when for example it determines if something should be presented
> > to a user.
> >
> > A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
> > css_tryget() instead of css_tryget_online() in task_get_css()").
> >
> > Signed-off-by: Roman Gushchin <guro@fb.com>
> > Cc: stable@vger.kernel.org
> > Cc: Tejun Heo <tj@kernel.org>
>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>
> The bug aside, it doesn't matter whether the cgroup is online for the
> callers. It used to matter when offlining needed to evacuate all
> charges from the memcg, and so needed to prevent new ones from showing
> up, but we don't care now.

Should get_mem_cgroup_from_current() and get_mem_cgroup_from_page() be
switched to css_tryget() as well then?


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
  2019-11-07  1:25   ` Shakeel Butt
@ 2019-11-07  1:43     ` Johannes Weiner
  2019-11-07  1:43     ` Roman Gushchin
  1 sibling, 0 replies; 18+ messages in thread
From: Johannes Weiner @ 2019-11-07  1:43 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Roman Gushchin, Linux MM, Andrew Morton, Michal Hocko, LKML,
	Kernel Team, stable, Tejun Heo

On Wed, Nov 06, 2019 at 05:25:26PM -0800, Shakeel Butt wrote:
> On Wed, Nov 6, 2019 at 4:22 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Wed, Nov 06, 2019 at 02:51:30PM -0800, Roman Gushchin wrote:
> > > We've encountered a rcu stall in get_mem_cgroup_from_mm():
> > >
> > >  rcu: INFO: rcu_sched self-detected stall on CPU
> > >  rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
> > >  (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
> > >  <...>
> > >  RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
> > >  <...>
> > >  __memcg_kmem_charge+0x55/0x140
> > >  __alloc_pages_nodemask+0x267/0x320
> > >  pipe_write+0x1ad/0x400
> > >  new_sync_write+0x127/0x1c0
> > >  __kernel_write+0x4f/0xf0
> > >  dump_emit+0x91/0xc0
> > >  writenote+0xa0/0xc0
> > >  elf_core_dump+0x11af/0x1430
> > >  do_coredump+0xc65/0xee0
> > >  ? unix_stream_sendmsg+0x37d/0x3b0
> > >  get_signal+0x132/0x7c0
> > >  do_signal+0x36/0x640
> > >  ? recalc_sigpending+0x17/0x50
> > >  exit_to_usermode_loop+0x61/0xd0
> > >  do_syscall_64+0xd4/0x100
> > >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >
> > > The problem is caused by an exiting task which is associated with
> > > an offline memcg. We're iterating over and over in the
> > > do {} while (!css_tryget_online()) loop, but obviously the memcg won't
> > > become online and the exiting task won't be migrated to a live memcg.
> > >
> > > Let's fix it by switching from css_tryget_online() to css_tryget().
> > >
> > > As css_tryget_online() cannot guarantee that the memcg won't go
> > > offline, the check is usually useless, except some rare cases
> > > when for example it determines if something should be presented
> > > to a user.
> > >
> > > A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
> > > css_tryget() instead of css_tryget_online() in task_get_css()").
> > >
> > > Signed-off-by: Roman Gushchin <guro@fb.com>
> > > Cc: stable@vger.kernel.org
> > > Cc: Tejun Heo <tj@kernel.org>
> >
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> >
> > The bug aside, it doesn't matter whether the cgroup is online for the
> > callers. It used to matter when offlining needed to evacuate all
> > charges from the memcg, and so needed to prevent new ones from showing
> > up, but we don't care now.
> 
> Should get_mem_cgroup_from_current() and get_mem_cgroup_from_page() be
> switched to css_tryget() as well then?

Yes. Looking at the remaining css_tryget_online() in memcontrol.c, I
don't think the online/offline distinction is meaningful for any of
them anymore.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
  2019-11-07  1:25   ` Shakeel Butt
  2019-11-07  1:43     ` Johannes Weiner
@ 2019-11-07  1:43     ` Roman Gushchin
  2019-11-07  2:21       ` Shakeel Butt
  1 sibling, 1 reply; 18+ messages in thread
From: Roman Gushchin @ 2019-11-07  1:43 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Johannes Weiner, Linux MM, Andrew Morton, Michal Hocko, LKML,
	Kernel Team, stable, Tejun Heo

On Wed, Nov 06, 2019 at 05:25:26PM -0800, Shakeel Butt wrote:
> On Wed, Nov 6, 2019 at 4:22 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Wed, Nov 06, 2019 at 02:51:30PM -0800, Roman Gushchin wrote:
> > > We've encountered a rcu stall in get_mem_cgroup_from_mm():
> > >
> > >  rcu: INFO: rcu_sched self-detected stall on CPU
> > >  rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
> > >  (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
> > >  <...>
> > >  RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
> > >  <...>
> > >  __memcg_kmem_charge+0x55/0x140
> > >  __alloc_pages_nodemask+0x267/0x320
> > >  pipe_write+0x1ad/0x400
> > >  new_sync_write+0x127/0x1c0
> > >  __kernel_write+0x4f/0xf0
> > >  dump_emit+0x91/0xc0
> > >  writenote+0xa0/0xc0
> > >  elf_core_dump+0x11af/0x1430
> > >  do_coredump+0xc65/0xee0
> > >  ? unix_stream_sendmsg+0x37d/0x3b0
> > >  get_signal+0x132/0x7c0
> > >  do_signal+0x36/0x640
> > >  ? recalc_sigpending+0x17/0x50
> > >  exit_to_usermode_loop+0x61/0xd0
> > >  do_syscall_64+0xd4/0x100
> > >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > >
> > > The problem is caused by an exiting task which is associated with
> > > an offline memcg. We're iterating over and over in the
> > > do {} while (!css_tryget_online()) loop, but obviously the memcg won't
> > > become online and the exiting task won't be migrated to a live memcg.
> > >
> > > Let's fix it by switching from css_tryget_online() to css_tryget().
> > >
> > > As css_tryget_online() cannot guarantee that the memcg won't go
> > > offline, the check is usually useless, except some rare cases
> > > when for example it determines if something should be presented
> > > to a user.
> > >
> > > A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
> > > css_tryget() instead of css_tryget_online() in task_get_css()").
> > >
> > > Signed-off-by: Roman Gushchin <guro@fb.com>
> > > Cc: stable@vger.kernel.org
> > > Cc: Tejun Heo <tj@kernel.org>
> >
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> >
> > The bug aside, it doesn't matter whether the cgroup is online for the
> > callers. It used to matter when offlining needed to evacuate all
> > charges from the memcg, and so needed to prevent new ones from showing
> > up, but we don't care now.
> 
> Should get_mem_cgroup_from_current() and get_mem_cgroup_from_page() be
> switched to css_tryget() as well then?

In those case it can't cause a rcu stall, so it's not a so urgent.
But you are right, we should probably do the same here. I'll look
at all remaining callers and prepare the patchset.

I'll also probably rename it to css_tryget_if_online() to make obvious
that it doesn't hold the cgroup from being onlined.

Thanks!


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
  2019-11-07  1:43     ` Roman Gushchin
@ 2019-11-07  2:21       ` Shakeel Butt
  2019-11-07  2:28         ` Shakeel Butt
  0 siblings, 1 reply; 18+ messages in thread
From: Shakeel Butt @ 2019-11-07  2:21 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Johannes Weiner, Linux MM, Andrew Morton, Michal Hocko, LKML,
	Kernel Team, stable, Tejun Heo

On Wed, Nov 6, 2019 at 5:43 PM Roman Gushchin <guro@fb.com> wrote:
>
> On Wed, Nov 06, 2019 at 05:25:26PM -0800, Shakeel Butt wrote:
> > On Wed, Nov 6, 2019 at 4:22 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > On Wed, Nov 06, 2019 at 02:51:30PM -0800, Roman Gushchin wrote:
> > > > We've encountered a rcu stall in get_mem_cgroup_from_mm():
> > > >
> > > >  rcu: INFO: rcu_sched self-detected stall on CPU
> > > >  rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
> > > >  (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
> > > >  <...>
> > > >  RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
> > > >  <...>
> > > >  __memcg_kmem_charge+0x55/0x140
> > > >  __alloc_pages_nodemask+0x267/0x320
> > > >  pipe_write+0x1ad/0x400
> > > >  new_sync_write+0x127/0x1c0
> > > >  __kernel_write+0x4f/0xf0
> > > >  dump_emit+0x91/0xc0
> > > >  writenote+0xa0/0xc0
> > > >  elf_core_dump+0x11af/0x1430
> > > >  do_coredump+0xc65/0xee0
> > > >  ? unix_stream_sendmsg+0x37d/0x3b0
> > > >  get_signal+0x132/0x7c0
> > > >  do_signal+0x36/0x640
> > > >  ? recalc_sigpending+0x17/0x50
> > > >  exit_to_usermode_loop+0x61/0xd0
> > > >  do_syscall_64+0xd4/0x100
> > > >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > >
> > > > The problem is caused by an exiting task which is associated with
> > > > an offline memcg. We're iterating over and over in the
> > > > do {} while (!css_tryget_online()) loop, but obviously the memcg won't
> > > > become online and the exiting task won't be migrated to a live memcg.
> > > >
> > > > Let's fix it by switching from css_tryget_online() to css_tryget().
> > > >
> > > > As css_tryget_online() cannot guarantee that the memcg won't go
> > > > offline, the check is usually useless, except some rare cases
> > > > when for example it determines if something should be presented
> > > > to a user.
> > > >
> > > > A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
> > > > css_tryget() instead of css_tryget_online() in task_get_css()").
> > > >
> > > > Signed-off-by: Roman Gushchin <guro@fb.com>
> > > > Cc: stable@vger.kernel.org
> > > > Cc: Tejun Heo <tj@kernel.org>
> > >
> > > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> > >
> > > The bug aside, it doesn't matter whether the cgroup is online for the
> > > callers. It used to matter when offlining needed to evacuate all
> > > charges from the memcg, and so needed to prevent new ones from showing
> > > up, but we don't care now.
> >
> > Should get_mem_cgroup_from_current() and get_mem_cgroup_from_page() be
> > switched to css_tryget() as well then?
>
> In those case it can't cause a rcu stall, so it's not a so urgent.
> But you are right, we should probably do the same here. I'll look
> at all remaining callers and prepare the patchset.
>
> I'll also probably rename it to css_tryget_if_online() to make obvious
> that it doesn't hold the cgroup from being onlined.
>

SGTM


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
  2019-11-07  2:21       ` Shakeel Butt
@ 2019-11-07  2:28         ` Shakeel Butt
  0 siblings, 0 replies; 18+ messages in thread
From: Shakeel Butt @ 2019-11-07  2:28 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Johannes Weiner, Linux MM, Andrew Morton, Michal Hocko, LKML,
	Kernel Team, stable, Tejun Heo

On Wed, Nov 6, 2019 at 6:21 PM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Wed, Nov 6, 2019 at 5:43 PM Roman Gushchin <guro@fb.com> wrote:
> >
> > On Wed, Nov 06, 2019 at 05:25:26PM -0800, Shakeel Butt wrote:
> > > On Wed, Nov 6, 2019 at 4:22 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > >
> > > > On Wed, Nov 06, 2019 at 02:51:30PM -0800, Roman Gushchin wrote:
> > > > > We've encountered a rcu stall in get_mem_cgroup_from_mm():
> > > > >
> > > > >  rcu: INFO: rcu_sched self-detected stall on CPU
> > > > >  rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
> > > > >  (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
> > > > >  <...>
> > > > >  RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
> > > > >  <...>
> > > > >  __memcg_kmem_charge+0x55/0x140
> > > > >  __alloc_pages_nodemask+0x267/0x320
> > > > >  pipe_write+0x1ad/0x400
> > > > >  new_sync_write+0x127/0x1c0
> > > > >  __kernel_write+0x4f/0xf0
> > > > >  dump_emit+0x91/0xc0
> > > > >  writenote+0xa0/0xc0
> > > > >  elf_core_dump+0x11af/0x1430
> > > > >  do_coredump+0xc65/0xee0
> > > > >  ? unix_stream_sendmsg+0x37d/0x3b0
> > > > >  get_signal+0x132/0x7c0
> > > > >  do_signal+0x36/0x640
> > > > >  ? recalc_sigpending+0x17/0x50
> > > > >  exit_to_usermode_loop+0x61/0xd0
> > > > >  do_syscall_64+0xd4/0x100
> > > > >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > >
> > > > > The problem is caused by an exiting task which is associated with
> > > > > an offline memcg. We're iterating over and over in the
> > > > > do {} while (!css_tryget_online()) loop, but obviously the memcg won't
> > > > > become online and the exiting task won't be migrated to a live memcg.
> > > > >
> > > > > Let's fix it by switching from css_tryget_online() to css_tryget().
> > > > >
> > > > > As css_tryget_online() cannot guarantee that the memcg won't go
> > > > > offline, the check is usually useless, except some rare cases
> > > > > when for example it determines if something should be presented
> > > > > to a user.
> > > > >
> > > > > A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
> > > > > css_tryget() instead of css_tryget_online() in task_get_css()").
> > > > >
> > > > > Signed-off-by: Roman Gushchin <guro@fb.com>

Forgot to add:

Reviewed-by: Shakeel Butt <shakeeb@google.com>

> > > > > Cc: stable@vger.kernel.org
> > > > > Cc: Tejun Heo <tj@kernel.org>
> > > >
> > > > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> > > >
> > > > The bug aside, it doesn't matter whether the cgroup is online for the
> > > > callers. It used to matter when offlining needed to evacuate all
> > > > charges from the memcg, and so needed to prevent new ones from showing
> > > > up, but we don't care now.
> > >
> > > Should get_mem_cgroup_from_current() and get_mem_cgroup_from_page() be
> > > switched to css_tryget() as well then?
> >
> > In those case it can't cause a rcu stall, so it's not a so urgent.
> > But you are right, we should probably do the same here. I'll look
> > at all remaining callers and prepare the patchset.
> >
> > I'll also probably rename it to css_tryget_if_online() to make obvious
> > that it doesn't hold the cgroup from being onlined.
> >
>
> SGTM


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()
  2019-11-06 22:51 ` [PATCH 2/2] mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() Roman Gushchin
  2019-11-07  0:26   ` Johannes Weiner
@ 2019-11-07  2:31   ` Shakeel Butt
  2019-11-07 15:44   ` Tejun Heo
  2 siblings, 0 replies; 18+ messages in thread
From: Shakeel Butt @ 2019-11-07  2:31 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Linux MM, Andrew Morton, Michal Hocko, Johannes Weiner, LKML,
	Kernel Team, stable, Tejun Heo

On Wed, Nov 6, 2019 at 2:53 PM Roman Gushchin <guro@fb.com> wrote:
>
> An exiting task might belong to an offline cgroup. In this case
> an attempt to grab a cgroup reference from the task can end up
> with an infinite loop in hugetlb_cgroup_charge_cgroup(), because
> neither the cgroup will become online, neither the task will
> be migrated to a live cgroup.
>
> Fix this by switching over to css_tryget(). As css_tryget_online()
> can't guarantee that the cgroup won't go offline, in most cases
> the check doesn't make sense. In this particular case users of
> hugetlb_cgroup_charge_cgroup() are not affected by this change.
>
> A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
> css_tryget() instead of css_tryget_online() in task_get_css()").
>
> Signed-off-by: Roman Gushchin <guro@fb.com>

Reviewed-by: Shakeel Butt <shakeelb@google.com>

> Cc: stable@vger.kernel.org
> Cc: Tejun Heo <tj@kernel.org>
> ---
>  mm/hugetlb_cgroup.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
> index f1930fa0b445..2ac38bdc18a1 100644
> --- a/mm/hugetlb_cgroup.c
> +++ b/mm/hugetlb_cgroup.c
> @@ -196,7 +196,7 @@ int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
>  again:
>         rcu_read_lock();
>         h_cg = hugetlb_cgroup_from_task(current);
> -       if (!css_tryget_online(&h_cg->css)) {
> +       if (!css_tryget(&h_cg->css)) {
>                 rcu_read_unlock();
>                 goto again;
>         }
> --
> 2.17.1
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
  2019-11-06 22:51 [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm() Roman Gushchin
  2019-11-06 22:51 ` [PATCH 2/2] mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() Roman Gushchin
  2019-11-07  0:22 ` [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm() Johannes Weiner
@ 2019-11-07 12:21 ` Michal Hocko
  2019-11-07 15:43   ` Tejun Heo
  2019-11-07 16:42   ` Roman Gushchin
  2019-11-07 15:43 ` Tejun Heo
  3 siblings, 2 replies; 18+ messages in thread
From: Michal Hocko @ 2019-11-07 12:21 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Andrew Morton, Johannes Weiner, linux-kernel,
	kernel-team, stable, Tejun Heo

On Wed 06-11-19 14:51:30, Roman Gushchin wrote:
> We've encountered a rcu stall in get_mem_cgroup_from_mm():
> 
>  rcu: INFO: rcu_sched self-detected stall on CPU
>  rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
>  (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
>  <...>
>  RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
>  <...>
>  __memcg_kmem_charge+0x55/0x140
>  __alloc_pages_nodemask+0x267/0x320
>  pipe_write+0x1ad/0x400
>  new_sync_write+0x127/0x1c0
>  __kernel_write+0x4f/0xf0
>  dump_emit+0x91/0xc0
>  writenote+0xa0/0xc0
>  elf_core_dump+0x11af/0x1430
>  do_coredump+0xc65/0xee0
>  ? unix_stream_sendmsg+0x37d/0x3b0
>  get_signal+0x132/0x7c0
>  do_signal+0x36/0x640
>  ? recalc_sigpending+0x17/0x50
>  exit_to_usermode_loop+0x61/0xd0
>  do_syscall_64+0xd4/0x100
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> The problem is caused by an exiting task which is associated with
> an offline memcg.

Hmm, how can we have a task in an offline memcg? I thought that any
existing task will prevent cgroup removal from proceeding. Is this some
sort of race where the task managed to disassociate from the cgroup
while there is still a binding to a memcg existing? What am I missing?

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
  2019-11-07 12:21 ` Michal Hocko
@ 2019-11-07 15:43   ` Tejun Heo
  2019-11-07 16:42   ` Roman Gushchin
  1 sibling, 0 replies; 18+ messages in thread
From: Tejun Heo @ 2019-11-07 15:43 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Roman Gushchin, linux-mm, Andrew Morton, Johannes Weiner,
	linux-kernel, kernel-team, stable

Hello,

On Thu, Nov 07, 2019 at 01:21:25PM +0100, Michal Hocko wrote:
> Hmm, how can we have a task in an offline memcg? I thought that any
> existing task will prevent cgroup removal from proceeding. Is this some
> sort of race where the task managed to disassociate from the cgroup
> while there is still a binding to a memcg existing? What am I missing?

The previous cgroup one was from bsd process accounting which happens
after the process is marked head.  This one IIUC is from core dumping
path.

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
  2019-11-06 22:51 [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm() Roman Gushchin
                   ` (2 preceding siblings ...)
  2019-11-07 12:21 ` Michal Hocko
@ 2019-11-07 15:43 ` Tejun Heo
  3 siblings, 0 replies; 18+ messages in thread
From: Tejun Heo @ 2019-11-07 15:43 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Andrew Morton, Michal Hocko, Johannes Weiner,
	linux-kernel, kernel-team, stable

On Wed, Nov 06, 2019 at 02:51:30PM -0800, Roman Gushchin wrote:
> We've encountered a rcu stall in get_mem_cgroup_from_mm():
> 
>  rcu: INFO: rcu_sched self-detected stall on CPU
>  rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
>  (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
>  <...>
>  RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
>  <...>
>  __memcg_kmem_charge+0x55/0x140
>  __alloc_pages_nodemask+0x267/0x320
>  pipe_write+0x1ad/0x400
>  new_sync_write+0x127/0x1c0
>  __kernel_write+0x4f/0xf0
>  dump_emit+0x91/0xc0
>  writenote+0xa0/0xc0
>  elf_core_dump+0x11af/0x1430
>  do_coredump+0xc65/0xee0
>  ? unix_stream_sendmsg+0x37d/0x3b0
>  get_signal+0x132/0x7c0
>  do_signal+0x36/0x640
>  ? recalc_sigpending+0x17/0x50
>  exit_to_usermode_loop+0x61/0xd0
>  do_syscall_64+0xd4/0x100
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> The problem is caused by an exiting task which is associated with
> an offline memcg. We're iterating over and over in the
> do {} while (!css_tryget_online()) loop, but obviously the memcg won't
> become online and the exiting task won't be migrated to a live memcg.
> 
> Let's fix it by switching from css_tryget_online() to css_tryget().
> 
> As css_tryget_online() cannot guarantee that the memcg won't go
> offline, the check is usually useless, except some rare cases
> when for example it determines if something should be presented
> to a user.
> 
> A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
> css_tryget() instead of css_tryget_online() in task_get_css()").
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: stable@vger.kernel.org
> Cc: Tejun Heo <tj@kernel.org>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()
  2019-11-06 22:51 ` [PATCH 2/2] mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() Roman Gushchin
  2019-11-07  0:26   ` Johannes Weiner
  2019-11-07  2:31   ` Shakeel Butt
@ 2019-11-07 15:44   ` Tejun Heo
  2 siblings, 0 replies; 18+ messages in thread
From: Tejun Heo @ 2019-11-07 15:44 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Andrew Morton, Michal Hocko, Johannes Weiner,
	linux-kernel, kernel-team, stable

On Wed, Nov 06, 2019 at 02:51:31PM -0800, Roman Gushchin wrote:
> An exiting task might belong to an offline cgroup. In this case
> an attempt to grab a cgroup reference from the task can end up
> with an infinite loop in hugetlb_cgroup_charge_cgroup(), because
> neither the cgroup will become online, neither the task will
> be migrated to a live cgroup.
> 
> Fix this by switching over to css_tryget(). As css_tryget_online()
> can't guarantee that the cgroup won't go offline, in most cases
> the check doesn't make sense. In this particular case users of
> hugetlb_cgroup_charge_cgroup() are not affected by this change.
> 
> A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
> css_tryget() instead of css_tryget_online() in task_get_css()").
> 
> Signed-off-by: Roman Gushchin <guro@fb.com>
> Cc: stable@vger.kernel.org
> Cc: Tejun Heo <tj@kernel.org>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
  2019-11-07 12:21 ` Michal Hocko
  2019-11-07 15:43   ` Tejun Heo
@ 2019-11-07 16:42   ` Roman Gushchin
  2019-11-07 17:02     ` Michal Hocko
  1 sibling, 1 reply; 18+ messages in thread
From: Roman Gushchin @ 2019-11-07 16:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Johannes Weiner, linux-kernel,
	Kernel Team, stable, Tejun Heo

On Thu, Nov 07, 2019 at 01:21:25PM +0100, Michal Hocko wrote:
> On Wed 06-11-19 14:51:30, Roman Gushchin wrote:
> > We've encountered a rcu stall in get_mem_cgroup_from_mm():
> > 
> >  rcu: INFO: rcu_sched self-detected stall on CPU
> >  rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
> >  (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
> >  <...>
> >  RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
> >  <...>
> >  __memcg_kmem_charge+0x55/0x140
> >  __alloc_pages_nodemask+0x267/0x320
> >  pipe_write+0x1ad/0x400
> >  new_sync_write+0x127/0x1c0
> >  __kernel_write+0x4f/0xf0
> >  dump_emit+0x91/0xc0
> >  writenote+0xa0/0xc0
> >  elf_core_dump+0x11af/0x1430
> >  do_coredump+0xc65/0xee0
> >  ? unix_stream_sendmsg+0x37d/0x3b0
> >  get_signal+0x132/0x7c0
> >  do_signal+0x36/0x640
> >  ? recalc_sigpending+0x17/0x50
> >  exit_to_usermode_loop+0x61/0xd0
> >  do_syscall_64+0xd4/0x100
> >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > 
> > The problem is caused by an exiting task which is associated with
> > an offline memcg.
> 
> Hmm, how can we have a task in an offline memcg? I thought that any
> existing task will prevent cgroup removal from proceeding. Is this some
> sort of race where the task managed to disassociate from the cgroup
> while there is still a binding to a memcg existing? What am I missing?

It's an exiting task with the PF_EXITING flag set and it's in their late stages
of life.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
  2019-11-07 16:42   ` Roman Gushchin
@ 2019-11-07 17:02     ` Michal Hocko
  2019-11-07 22:41       ` Roman Gushchin
  0 siblings, 1 reply; 18+ messages in thread
From: Michal Hocko @ 2019-11-07 17:02 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Andrew Morton, Johannes Weiner, linux-kernel,
	Kernel Team, stable, Tejun Heo

On Thu 07-11-19 16:42:41, Roman Gushchin wrote:
> On Thu, Nov 07, 2019 at 01:21:25PM +0100, Michal Hocko wrote:
> > On Wed 06-11-19 14:51:30, Roman Gushchin wrote:
> > > We've encountered a rcu stall in get_mem_cgroup_from_mm():
> > > 
> > >  rcu: INFO: rcu_sched self-detected stall on CPU
> > >  rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
> > >  (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
> > >  <...>
> > >  RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
> > >  <...>
> > >  __memcg_kmem_charge+0x55/0x140
> > >  __alloc_pages_nodemask+0x267/0x320
> > >  pipe_write+0x1ad/0x400
> > >  new_sync_write+0x127/0x1c0
> > >  __kernel_write+0x4f/0xf0
> > >  dump_emit+0x91/0xc0
> > >  writenote+0xa0/0xc0
> > >  elf_core_dump+0x11af/0x1430
> > >  do_coredump+0xc65/0xee0
> > >  ? unix_stream_sendmsg+0x37d/0x3b0
> > >  get_signal+0x132/0x7c0
> > >  do_signal+0x36/0x640
> > >  ? recalc_sigpending+0x17/0x50
> > >  exit_to_usermode_loop+0x61/0xd0
> > >  do_syscall_64+0xd4/0x100
> > >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > 
> > > The problem is caused by an exiting task which is associated with
> > > an offline memcg.
> > 
> > Hmm, how can we have a task in an offline memcg? I thought that any
> > existing task will prevent cgroup removal from proceeding. Is this some
> > sort of race where the task managed to disassociate from the cgroup
> > while there is still a binding to a memcg existing? What am I missing?
> 
> It's an exiting task with the PF_EXITING flag set and it's in their late stages
> of life.

This is a signal delivery path AFAIU (get_signal) and the coredumping
happens before do_exit. My understanding is that that unlinking
happens from cgroup_exit. So either I am misreading the backtrace or
there is some other way to leave cgroups or there is something more
going on.

JFTR I am not really disputing the patch but I simply do not understand
how the problem really happened.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
  2019-11-07 17:02     ` Michal Hocko
@ 2019-11-07 22:41       ` Roman Gushchin
  2019-11-08  8:53         ` Michal Hocko
  0 siblings, 1 reply; 18+ messages in thread
From: Roman Gushchin @ 2019-11-07 22:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, Andrew Morton, Johannes Weiner, linux-kernel,
	Kernel Team, stable, Tejun Heo

On Thu, Nov 07, 2019 at 06:02:00PM +0100, Michal Hocko wrote:
> On Thu 07-11-19 16:42:41, Roman Gushchin wrote:
> > On Thu, Nov 07, 2019 at 01:21:25PM +0100, Michal Hocko wrote:
> > > On Wed 06-11-19 14:51:30, Roman Gushchin wrote:
> > > > We've encountered a rcu stall in get_mem_cgroup_from_mm():
> > > > 
> > > >  rcu: INFO: rcu_sched self-detected stall on CPU
> > > >  rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
> > > >  (t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
> > > >  <...>
> > > >  RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
> > > >  <...>
> > > >  __memcg_kmem_charge+0x55/0x140
> > > >  __alloc_pages_nodemask+0x267/0x320
> > > >  pipe_write+0x1ad/0x400
> > > >  new_sync_write+0x127/0x1c0
> > > >  __kernel_write+0x4f/0xf0
> > > >  dump_emit+0x91/0xc0
> > > >  writenote+0xa0/0xc0
> > > >  elf_core_dump+0x11af/0x1430
> > > >  do_coredump+0xc65/0xee0
> > > >  ? unix_stream_sendmsg+0x37d/0x3b0
> > > >  get_signal+0x132/0x7c0
> > > >  do_signal+0x36/0x640
> > > >  ? recalc_sigpending+0x17/0x50
> > > >  exit_to_usermode_loop+0x61/0xd0
> > > >  do_syscall_64+0xd4/0x100
> > > >  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > 
> > > > The problem is caused by an exiting task which is associated with
> > > > an offline memcg.
> > > 
> > > Hmm, how can we have a task in an offline memcg? I thought that any
> > > existing task will prevent cgroup removal from proceeding. Is this some
> > > sort of race where the task managed to disassociate from the cgroup
> > > while there is still a binding to a memcg existing? What am I missing?
> > 
> > It's an exiting task with the PF_EXITING flag set and it's in their late stages
> > of life.
> 
> This is a signal delivery path AFAIU (get_signal) and the coredumping
> happens before do_exit. My understanding is that that unlinking
> happens from cgroup_exit. So either I am misreading the backtrace or
> there is some other way to leave cgroups or there is something more
> going on.

Yeah, you're right. I have no better explanation for this and the similar,
mentioned in the commit bsd accounting issue, than some very rare race condition
that allows cgroups to be offlined with a task inside.

I'll think more about it.

Thanks, it's a really good question.

> 
> JFTR I am not really disputing the patch but I simply do not understand
> how the problem really happened.
> 
> -- 
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
  2019-11-07 22:41       ` Roman Gushchin
@ 2019-11-08  8:53         ` Michal Hocko
  0 siblings, 0 replies; 18+ messages in thread
From: Michal Hocko @ 2019-11-08  8:53 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: linux-mm, Andrew Morton, Johannes Weiner, linux-kernel,
	Kernel Team, stable, Tejun Heo

On Thu 07-11-19 22:41:13, Roman Gushchin wrote:
> On Thu, Nov 07, 2019 at 06:02:00PM +0100, Michal Hocko wrote:
> > On Thu 07-11-19 16:42:41, Roman Gushchin wrote:
[...]
> > > It's an exiting task with the PF_EXITING flag set and it's in their late stages
> > > of life.
> > 
> > This is a signal delivery path AFAIU (get_signal) and the coredumping
> > happens before do_exit. My understanding is that that unlinking
> > happens from cgroup_exit. So either I am misreading the backtrace or
> > there is some other way to leave cgroups or there is something more
> > going on.
> 
> Yeah, you're right. I have no better explanation for this and the similar,
> mentioned in the commit bsd accounting issue,

Tejun mentioned bsd accounting issue as well, but I do not see any
explicit reference to it in neither of the two patches.

> than some very rare race condition
> that allows cgroups to be offlined with a task inside.
> 
> I'll think more about it.

Thanks a lot. As I've said, I am not opposing this change once we have a
proper changelog but I find the explanation really weak. If there is a
race then it should be fixed as well.

Thanks!
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, back to index

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-06 22:51 [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm() Roman Gushchin
2019-11-06 22:51 ` [PATCH 2/2] mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup() Roman Gushchin
2019-11-07  0:26   ` Johannes Weiner
2019-11-07  2:31   ` Shakeel Butt
2019-11-07 15:44   ` Tejun Heo
2019-11-07  0:22 ` [PATCH 1/2] mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm() Johannes Weiner
2019-11-07  1:25   ` Shakeel Butt
2019-11-07  1:43     ` Johannes Weiner
2019-11-07  1:43     ` Roman Gushchin
2019-11-07  2:21       ` Shakeel Butt
2019-11-07  2:28         ` Shakeel Butt
2019-11-07 12:21 ` Michal Hocko
2019-11-07 15:43   ` Tejun Heo
2019-11-07 16:42   ` Roman Gushchin
2019-11-07 17:02     ` Michal Hocko
2019-11-07 22:41       ` Roman Gushchin
2019-11-08  8:53         ` Michal Hocko
2019-11-07 15:43 ` Tejun Heo

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git