linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [bug report] resctrl high memory comsumption
@ 2020-01-08 17:07 Shakeel Butt
  2020-01-08 20:23 ` Fenghua Yu
  0 siblings, 1 reply; 7+ messages in thread
From: Shakeel Butt @ 2020-01-08 17:07 UTC (permalink / raw)
  To: Reinette Chatre, Fenghua Yu, Borislav Petkov, LKML,
	Thomas Gleixner, Ingo Molnar, x86

Hi,

Recently we had a bug in the system software writing the same pids to
the tasks file of resctrl group multiple times. The resctrl code
allocates "struct task_move_callback" for each such write and call
task_work_add() for that task to handle it on return to user-space
without checking if such request already exist for that particular
task. The issue arises for long sleeping tasks which has thousands for
such request queued to be handled. On our production, we notice
thousands of tasks having thousands of such requests and taking GiBs
of memory for "struct task_move_callback". I am not very familiar with
the code to judge if task_work_cancel() is the right approach or just
checking closid/rmid before doing task_work_add().

==repro==
# mkdir /sys/fs/resctrl/test
# cat /proc/slabinfo | grep kmalloc-32
kmalloc-32         57219  57288     32  124    1 : tunables  120   60
  8 : slabdata    462    462      0
# sleep 600&
[1] 17611
# for i in {1..200000}; do echo 17611 > /sys/fs/resctrl/test/tasks ; done
# cat /proc/slabinfo | grep kmalloc-32
kmalloc-32        257466 257548     32  124    1 : tunables  120   60
  8 : slabdata   2077   2077      5
# kill 17611
[1]+  Terminated              sleep 600
# cat /proc/slabinfo | grep kmalloc-32
kmalloc-32         57924  60636     32  124    1 : tunables  120   60
  8 : slabdata    470    489    385

thanks,
Shakeel

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [bug report] resctrl high memory comsumption
  2020-01-08 17:07 [bug report] resctrl high memory comsumption Shakeel Butt
@ 2020-01-08 20:23 ` Fenghua Yu
  2020-01-08 20:42   ` Reinette Chatre
  2020-01-08 21:20   ` Shakeel Butt
  0 siblings, 2 replies; 7+ messages in thread
From: Fenghua Yu @ 2020-01-08 20:23 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Reinette Chatre, Borislav Petkov, LKML, Thomas Gleixner,
	Ingo Molnar, x86

On Wed, Jan 08, 2020 at 09:07:41AM -0800, Shakeel Butt wrote:
> Hi,
> 
> Recently we had a bug in the system software writing the same pids to
> the tasks file of resctrl group multiple times. The resctrl code
> allocates "struct task_move_callback" for each such write and call
> task_work_add() for that task to handle it on return to user-space
> without checking if such request already exist for that particular
> task. The issue arises for long sleeping tasks which has thousands for
> such request queued to be handled. On our production, we notice
> thousands of tasks having thousands of such requests and taking GiBs
> of memory for "struct task_move_callback". I am not very familiar with
> the code to judge if task_work_cancel() is the right approach or just
> checking closid/rmid before doing task_work_add().
> 

Thank you for reporting the issue, Shakeel!

Could you please check if the following patch fixes the issue?
From 3c23c39b6a44fdfbbbe0083d074dcc114d7d7f1c Mon Sep 17 00:00:00 2001
From: Fenghua Yu <fenghua.yu@intel.com>
Date: Wed, 8 Jan 2020 19:53:33 +0000
Subject: [RFC PATCH] x86/resctrl: Fix redundant task movements

Currently a task can be moved to a rdtgroup multiple times.
But, this can cause multiple task works are added, waste memory
and degrade performance.

To fix the issue, only move the task to a rdtgroup when the task
is not in the rdgroup. Don't try to move the task to the rdtgroup
again when the task is already in the rdtgroup.

Reported-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 2e3b06d6bbc6..75300c4a5969 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -546,6 +546,17 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
 	struct task_move_callback *callback;
 	int ret;
 
+	/* If the task is already in rdtgrp, don't move the task. */
+	if ((rdtgrp->type == RDTCTRL_GROUP && tsk->closid == rdtgrp->closid &&
+	    tsk->rmid == rdtgrp->mon.rmid) ||
+	    (rdtgrp->type == RDTMON_GROUP &&
+	     rdtgrp->mon.parent->closid == tsk->closid &&
+	     tsk->rmid == rdtgrp->mon.rmid)) {
+		rdt_last_cmd_puts("Task is already in the rdgroup\n");
+
+		return -EINVAL;
+	}
+
 	callback = kzalloc(sizeof(*callback), GFP_KERNEL);
 	if (!callback)
 		return -ENOMEM;
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [bug report] resctrl high memory comsumption
  2020-01-08 20:23 ` Fenghua Yu
@ 2020-01-08 20:42   ` Reinette Chatre
  2020-01-08 21:42     ` Fenghua Yu
  2020-01-08 21:20   ` Shakeel Butt
  1 sibling, 1 reply; 7+ messages in thread
From: Reinette Chatre @ 2020-01-08 20:42 UTC (permalink / raw)
  To: Fenghua Yu, Shakeel Butt
  Cc: Borislav Petkov, LKML, Thomas Gleixner, Ingo Molnar, x86

Hi Fenghua,

On 1/8/2020 12:23 PM, Fenghua Yu wrote:
> On Wed, Jan 08, 2020 at 09:07:41AM -0800, Shakeel Butt wrote:
>> Hi,
>>
>> Recently we had a bug in the system software writing the same pids to
>> the tasks file of resctrl group multiple times. The resctrl code
>> allocates "struct task_move_callback" for each such write and call
>> task_work_add() for that task to handle it on return to user-space
>> without checking if such request already exist for that particular
>> task. The issue arises for long sleeping tasks which has thousands for
>> such request queued to be handled. On our production, we notice
>> thousands of tasks having thousands of such requests and taking GiBs
>> of memory for "struct task_move_callback". I am not very familiar with
>> the code to judge if task_work_cancel() is the right approach or just
>> checking closid/rmid before doing task_work_add().
>>
> 
> Thank you for reporting the issue, Shakeel!
> 
> Could you please check if the following patch fixes the issue?
> From 3c23c39b6a44fdfbbbe0083d074dcc114d7d7f1c Mon Sep 17 00:00:00 2001
> From: Fenghua Yu <fenghua.yu@intel.com>
> Date: Wed, 8 Jan 2020 19:53:33 +0000
> Subject: [RFC PATCH] x86/resctrl: Fix redundant task movements
> 
> Currently a task can be moved to a rdtgroup multiple times.
> But, this can cause multiple task works are added, waste memory
> and degrade performance.
> 
> To fix the issue, only move the task to a rdtgroup when the task
> is not in the rdgroup. Don't try to move the task to the rdtgroup
> again when the task is already in the rdtgroup.
> 
> Reported-by: Shakeel Butt <shakeelb@google.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 2e3b06d6bbc6..75300c4a5969 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -546,6 +546,17 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
>  	struct task_move_callback *callback;
>  	int ret;
>  
> +	/* If the task is already in rdtgrp, don't move the task. */
> +	if ((rdtgrp->type == RDTCTRL_GROUP && tsk->closid == rdtgrp->closid &&
> +	    tsk->rmid == rdtgrp->mon.rmid) ||
> +	    (rdtgrp->type == RDTMON_GROUP &&
> +	     rdtgrp->mon.parent->closid == tsk->closid &&
> +	     tsk->rmid == rdtgrp->mon.rmid)) {
> +		rdt_last_cmd_puts("Task is already in the rdgroup\n");
> +
> +		return -EINVAL;
> +	}
> +
>  	callback = kzalloc(sizeof(*callback), GFP_KERNEL);
>  	if (!callback)
>  		return -ENOMEM;
> 

I think your fix would address this specific use case but a slightly
different use case will still encounter the problem of high memory
consumption. If for example, sleeping tasks are moved (many times)
between resource or monitoring groups then their task_works queue would
just keep growing. It seems that a call to task_work_cancel() before
adding a new work item should address all these cases?

Reinette

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [bug report] resctrl high memory comsumption
  2020-01-08 20:23 ` Fenghua Yu
  2020-01-08 20:42   ` Reinette Chatre
@ 2020-01-08 21:20   ` Shakeel Butt
  1 sibling, 0 replies; 7+ messages in thread
From: Shakeel Butt @ 2020-01-08 21:20 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Reinette Chatre, Borislav Petkov, LKML, Thomas Gleixner,
	Ingo Molnar, x86

On Wed, Jan 8, 2020 at 12:12 PM Fenghua Yu <fenghua.yu@intel.com> wrote:
>
> On Wed, Jan 08, 2020 at 09:07:41AM -0800, Shakeel Butt wrote:
> > Hi,
> >
> > Recently we had a bug in the system software writing the same pids to
> > the tasks file of resctrl group multiple times. The resctrl code
> > allocates "struct task_move_callback" for each such write and call
> > task_work_add() for that task to handle it on return to user-space
> > without checking if such request already exist for that particular
> > task. The issue arises for long sleeping tasks which has thousands for
> > such request queued to be handled. On our production, we notice
> > thousands of tasks having thousands of such requests and taking GiBs
> > of memory for "struct task_move_callback". I am not very familiar with
> > the code to judge if task_work_cancel() is the right approach or just
> > checking closid/rmid before doing task_work_add().
> >
>
> Thank you for reporting the issue, Shakeel!
>
> Could you please check if the following patch fixes the issue?
> From 3c23c39b6a44fdfbbbe0083d074dcc114d7d7f1c Mon Sep 17 00:00:00 2001
> From: Fenghua Yu <fenghua.yu@intel.com>
> Date: Wed, 8 Jan 2020 19:53:33 +0000
> Subject: [RFC PATCH] x86/resctrl: Fix redundant task movements
>
> Currently a task can be moved to a rdtgroup multiple times.
> But, this can cause multiple task works are added, waste memory
> and degrade performance.
>
> To fix the issue, only move the task to a rdtgroup when the task
> is not in the rdgroup. Don't try to move the task to the rdtgroup
> again when the task is already in the rdtgroup.
>
> Reported-by: Shakeel Butt <shakeelb@google.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
> ---
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 2e3b06d6bbc6..75300c4a5969 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -546,6 +546,17 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
>         struct task_move_callback *callback;
>         int ret;
>
> +       /* If the task is already in rdtgrp, don't move the task. */
> +       if ((rdtgrp->type == RDTCTRL_GROUP && tsk->closid == rdtgrp->closid &&
> +           tsk->rmid == rdtgrp->mon.rmid) ||
> +           (rdtgrp->type == RDTMON_GROUP &&
> +            rdtgrp->mon.parent->closid == tsk->closid &&
> +            tsk->rmid == rdtgrp->mon.rmid)) {
> +               rdt_last_cmd_puts("Task is already in the rdgroup\n");
> +
> +               return -EINVAL;

Why not just return success if the task is already in that group (i.e.
just follow the cgroup behavior).

> +       }
> +
>         callback = kzalloc(sizeof(*callback), GFP_KERNEL);
>         if (!callback)
>                 return -ENOMEM;
> --
> 2.19.1
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [bug report] resctrl high memory comsumption
  2020-01-08 20:42   ` Reinette Chatre
@ 2020-01-08 21:42     ` Fenghua Yu
  2020-01-08 21:54       ` Reinette Chatre
  0 siblings, 1 reply; 7+ messages in thread
From: Fenghua Yu @ 2020-01-08 21:42 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Shakeel Butt, Borislav Petkov, LKML, Thomas Gleixner, Ingo Molnar, x86

On Wed, Jan 08, 2020 at 12:42:17PM -0800, Reinette Chatre wrote:
> Hi Fenghua,
> On 1/8/2020 12:23 PM, Fenghua Yu wrote:
> > On Wed, Jan 08, 2020 at 09:07:41AM -0800, Shakeel Butt wrote:
> >> Recently we had a bug in the system software writing the same pids to
> >> the tasks file of resctrl group multiple times. The resctrl code
> > Subject: [RFC PATCH] x86/resctrl: Fix redundant task movements
> I think your fix would address this specific use case but a slightly
> different use case will still encounter the problem of high memory
> consumption. If for example, sleeping tasks are moved (many times)
> between resource or monitoring groups then their task_works queue would
> just keep growing. It seems that a call to task_work_cancel() before
> adding a new work item should address all these cases?

The checking code in this patch is also helpful to avoid redundant
task move preparation (kzalloc(), task_work_add(), etc) in the same
rdtgroup.

How about adding both the checking code and task_work_cancel()?

Thanks.

-Fenghua

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [bug report] resctrl high memory comsumption
  2020-01-08 21:42     ` Fenghua Yu
@ 2020-01-08 21:54       ` Reinette Chatre
  2020-01-13 18:38         ` Shakeel Butt
  0 siblings, 1 reply; 7+ messages in thread
From: Reinette Chatre @ 2020-01-08 21:54 UTC (permalink / raw)
  To: Fenghua Yu
  Cc: Shakeel Butt, Borislav Petkov, LKML, Thomas Gleixner, Ingo Molnar, x86

Hi Fenghua,

On 1/8/2020 1:42 PM, Fenghua Yu wrote:
> On Wed, Jan 08, 2020 at 12:42:17PM -0800, Reinette Chatre wrote:
>> Hi Fenghua,
>> On 1/8/2020 12:23 PM, Fenghua Yu wrote:
>>> On Wed, Jan 08, 2020 at 09:07:41AM -0800, Shakeel Butt wrote:
>>>> Recently we had a bug in the system software writing the same pids to
>>>> the tasks file of resctrl group multiple times. The resctrl code
>>> Subject: [RFC PATCH] x86/resctrl: Fix redundant task movements
>> I think your fix would address this specific use case but a slightly
>> different use case will still encounter the problem of high memory
>> consumption. If for example, sleeping tasks are moved (many times)
>> between resource or monitoring groups then their task_works queue would
>> just keep growing. It seems that a call to task_work_cancel() before
>> adding a new work item should address all these cases?
> 
> The checking code in this patch is also helpful to avoid redundant
> task move preparation (kzalloc(), task_work_add(), etc) in the same
> rdtgroup.

Indeed.

> 
> How about adding both the checking code and task_work_cancel()?

That does sound good to me.

There is something in the current implementation that I would appreciate
your feedback on: Currently the task's closid and rmid are initialized
_after_ the call to task_work_add() succeeds. Should these not be
initialized before the call to task_work_add()?

Thank you

Reinette

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [bug report] resctrl high memory comsumption
  2020-01-08 21:54       ` Reinette Chatre
@ 2020-01-13 18:38         ` Shakeel Butt
  0 siblings, 0 replies; 7+ messages in thread
From: Shakeel Butt @ 2020-01-13 18:38 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: Fenghua Yu, Borislav Petkov, LKML, Thomas Gleixner, Ingo Molnar, x86

On Wed, Jan 8, 2020 at 1:54 PM Reinette Chatre
<reinette.chatre@intel.com> wrote:
>
> Hi Fenghua,
>
> On 1/8/2020 1:42 PM, Fenghua Yu wrote:
> > On Wed, Jan 08, 2020 at 12:42:17PM -0800, Reinette Chatre wrote:
> >> Hi Fenghua,
> >> On 1/8/2020 12:23 PM, Fenghua Yu wrote:
> >>> On Wed, Jan 08, 2020 at 09:07:41AM -0800, Shakeel Butt wrote:
> >>>> Recently we had a bug in the system software writing the same pids to
> >>>> the tasks file of resctrl group multiple times. The resctrl code
> >>> Subject: [RFC PATCH] x86/resctrl: Fix redundant task movements
> >> I think your fix would address this specific use case but a slightly
> >> different use case will still encounter the problem of high memory
> >> consumption. If for example, sleeping tasks are moved (many times)
> >> between resource or monitoring groups then their task_works queue would
> >> just keep growing. It seems that a call to task_work_cancel() before
> >> adding a new work item should address all these cases?
> >
> > The checking code in this patch is also helpful to avoid redundant
> > task move preparation (kzalloc(), task_work_add(), etc) in the same
> > rdtgroup.
>
> Indeed.
>
> >
> > How about adding both the checking code and task_work_cancel()?
>
> That does sound good to me.
>

Hi Fenghua, any updates here?

> There is something in the current implementation that I would appreciate
> your feedback on: Currently the task's closid and rmid are initialized
> _after_ the call to task_work_add() succeeds. Should these not be
> initialized before the call to task_work_add()?
>

This seems like a potential race.

thanks,
Shakeel

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2020-01-13 18:38 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-08 17:07 [bug report] resctrl high memory comsumption Shakeel Butt
2020-01-08 20:23 ` Fenghua Yu
2020-01-08 20:42   ` Reinette Chatre
2020-01-08 21:42     ` Fenghua Yu
2020-01-08 21:54       ` Reinette Chatre
2020-01-13 18:38         ` Shakeel Butt
2020-01-08 21:20   ` Shakeel Butt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).