* [bug report] resctrl high memory comsumption @ 2020-01-08 17:07 Shakeel Butt 2020-01-08 20:23 ` Fenghua Yu 0 siblings, 1 reply; 7+ messages in thread From: Shakeel Butt @ 2020-01-08 17:07 UTC (permalink / raw) To: Reinette Chatre, Fenghua Yu, Borislav Petkov, LKML, Thomas Gleixner, Ingo Molnar, x86 Hi, Recently we had a bug in the system software writing the same pids to the tasks file of resctrl group multiple times. The resctrl code allocates "struct task_move_callback" for each such write and call task_work_add() for that task to handle it on return to user-space without checking if such request already exist for that particular task. The issue arises for long sleeping tasks which has thousands for such request queued to be handled. On our production, we notice thousands of tasks having thousands of such requests and taking GiBs of memory for "struct task_move_callback". I am not very familiar with the code to judge if task_work_cancel() is the right approach or just checking closid/rmid before doing task_work_add(). ==repro== # mkdir /sys/fs/resctrl/test # cat /proc/slabinfo | grep kmalloc-32 kmalloc-32 57219 57288 32 124 1 : tunables 120 60 8 : slabdata 462 462 0 # sleep 600& [1] 17611 # for i in {1..200000}; do echo 17611 > /sys/fs/resctrl/test/tasks ; done # cat /proc/slabinfo | grep kmalloc-32 kmalloc-32 257466 257548 32 124 1 : tunables 120 60 8 : slabdata 2077 2077 5 # kill 17611 [1]+ Terminated sleep 600 # cat /proc/slabinfo | grep kmalloc-32 kmalloc-32 57924 60636 32 124 1 : tunables 120 60 8 : slabdata 470 489 385 thanks, Shakeel ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [bug report] resctrl high memory comsumption 2020-01-08 17:07 [bug report] resctrl high memory comsumption Shakeel Butt @ 2020-01-08 20:23 ` Fenghua Yu 2020-01-08 20:42 ` Reinette Chatre 2020-01-08 21:20 ` Shakeel Butt 0 siblings, 2 replies; 7+ messages in thread From: Fenghua Yu @ 2020-01-08 20:23 UTC (permalink / raw) To: Shakeel Butt Cc: Reinette Chatre, Borislav Petkov, LKML, Thomas Gleixner, Ingo Molnar, x86 On Wed, Jan 08, 2020 at 09:07:41AM -0800, Shakeel Butt wrote: > Hi, > > Recently we had a bug in the system software writing the same pids to > the tasks file of resctrl group multiple times. The resctrl code > allocates "struct task_move_callback" for each such write and call > task_work_add() for that task to handle it on return to user-space > without checking if such request already exist for that particular > task. The issue arises for long sleeping tasks which has thousands for > such request queued to be handled. On our production, we notice > thousands of tasks having thousands of such requests and taking GiBs > of memory for "struct task_move_callback". I am not very familiar with > the code to judge if task_work_cancel() is the right approach or just > checking closid/rmid before doing task_work_add(). > Thank you for reporting the issue, Shakeel! Could you please check if the following patch fixes the issue? From 3c23c39b6a44fdfbbbe0083d074dcc114d7d7f1c Mon Sep 17 00:00:00 2001 From: Fenghua Yu <fenghua.yu@intel.com> Date: Wed, 8 Jan 2020 19:53:33 +0000 Subject: [RFC PATCH] x86/resctrl: Fix redundant task movements Currently a task can be moved to a rdtgroup multiple times. But, this can cause multiple task works are added, waste memory and degrade performance. To fix the issue, only move the task to a rdtgroup when the task is not in the rdgroup. Don't try to move the task to the rdtgroup again when the task is already in the rdtgroup. Reported-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> --- arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c index 2e3b06d6bbc6..75300c4a5969 100644 --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c @@ -546,6 +546,17 @@ static int __rdtgroup_move_task(struct task_struct *tsk, struct task_move_callback *callback; int ret; + /* If the task is already in rdtgrp, don't move the task. */ + if ((rdtgrp->type == RDTCTRL_GROUP && tsk->closid == rdtgrp->closid && + tsk->rmid == rdtgrp->mon.rmid) || + (rdtgrp->type == RDTMON_GROUP && + rdtgrp->mon.parent->closid == tsk->closid && + tsk->rmid == rdtgrp->mon.rmid)) { + rdt_last_cmd_puts("Task is already in the rdgroup\n"); + + return -EINVAL; + } + callback = kzalloc(sizeof(*callback), GFP_KERNEL); if (!callback) return -ENOMEM; -- 2.19.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [bug report] resctrl high memory comsumption 2020-01-08 20:23 ` Fenghua Yu @ 2020-01-08 20:42 ` Reinette Chatre 2020-01-08 21:42 ` Fenghua Yu 2020-01-08 21:20 ` Shakeel Butt 1 sibling, 1 reply; 7+ messages in thread From: Reinette Chatre @ 2020-01-08 20:42 UTC (permalink / raw) To: Fenghua Yu, Shakeel Butt Cc: Borislav Petkov, LKML, Thomas Gleixner, Ingo Molnar, x86 Hi Fenghua, On 1/8/2020 12:23 PM, Fenghua Yu wrote: > On Wed, Jan 08, 2020 at 09:07:41AM -0800, Shakeel Butt wrote: >> Hi, >> >> Recently we had a bug in the system software writing the same pids to >> the tasks file of resctrl group multiple times. The resctrl code >> allocates "struct task_move_callback" for each such write and call >> task_work_add() for that task to handle it on return to user-space >> without checking if such request already exist for that particular >> task. The issue arises for long sleeping tasks which has thousands for >> such request queued to be handled. On our production, we notice >> thousands of tasks having thousands of such requests and taking GiBs >> of memory for "struct task_move_callback". I am not very familiar with >> the code to judge if task_work_cancel() is the right approach or just >> checking closid/rmid before doing task_work_add(). >> > > Thank you for reporting the issue, Shakeel! > > Could you please check if the following patch fixes the issue? > From 3c23c39b6a44fdfbbbe0083d074dcc114d7d7f1c Mon Sep 17 00:00:00 2001 > From: Fenghua Yu <fenghua.yu@intel.com> > Date: Wed, 8 Jan 2020 19:53:33 +0000 > Subject: [RFC PATCH] x86/resctrl: Fix redundant task movements > > Currently a task can be moved to a rdtgroup multiple times. > But, this can cause multiple task works are added, waste memory > and degrade performance. > > To fix the issue, only move the task to a rdtgroup when the task > is not in the rdgroup. Don't try to move the task to the rdtgroup > again when the task is already in the rdtgroup. > > Reported-by: Shakeel Butt <shakeelb@google.com> > Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> > --- > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > index 2e3b06d6bbc6..75300c4a5969 100644 > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > @@ -546,6 +546,17 @@ static int __rdtgroup_move_task(struct task_struct *tsk, > struct task_move_callback *callback; > int ret; > > + /* If the task is already in rdtgrp, don't move the task. */ > + if ((rdtgrp->type == RDTCTRL_GROUP && tsk->closid == rdtgrp->closid && > + tsk->rmid == rdtgrp->mon.rmid) || > + (rdtgrp->type == RDTMON_GROUP && > + rdtgrp->mon.parent->closid == tsk->closid && > + tsk->rmid == rdtgrp->mon.rmid)) { > + rdt_last_cmd_puts("Task is already in the rdgroup\n"); > + > + return -EINVAL; > + } > + > callback = kzalloc(sizeof(*callback), GFP_KERNEL); > if (!callback) > return -ENOMEM; > I think your fix would address this specific use case but a slightly different use case will still encounter the problem of high memory consumption. If for example, sleeping tasks are moved (many times) between resource or monitoring groups then their task_works queue would just keep growing. It seems that a call to task_work_cancel() before adding a new work item should address all these cases? Reinette ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [bug report] resctrl high memory comsumption 2020-01-08 20:42 ` Reinette Chatre @ 2020-01-08 21:42 ` Fenghua Yu 2020-01-08 21:54 ` Reinette Chatre 0 siblings, 1 reply; 7+ messages in thread From: Fenghua Yu @ 2020-01-08 21:42 UTC (permalink / raw) To: Reinette Chatre Cc: Shakeel Butt, Borislav Petkov, LKML, Thomas Gleixner, Ingo Molnar, x86 On Wed, Jan 08, 2020 at 12:42:17PM -0800, Reinette Chatre wrote: > Hi Fenghua, > On 1/8/2020 12:23 PM, Fenghua Yu wrote: > > On Wed, Jan 08, 2020 at 09:07:41AM -0800, Shakeel Butt wrote: > >> Recently we had a bug in the system software writing the same pids to > >> the tasks file of resctrl group multiple times. The resctrl code > > Subject: [RFC PATCH] x86/resctrl: Fix redundant task movements > I think your fix would address this specific use case but a slightly > different use case will still encounter the problem of high memory > consumption. If for example, sleeping tasks are moved (many times) > between resource or monitoring groups then their task_works queue would > just keep growing. It seems that a call to task_work_cancel() before > adding a new work item should address all these cases? The checking code in this patch is also helpful to avoid redundant task move preparation (kzalloc(), task_work_add(), etc) in the same rdtgroup. How about adding both the checking code and task_work_cancel()? Thanks. -Fenghua ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [bug report] resctrl high memory comsumption 2020-01-08 21:42 ` Fenghua Yu @ 2020-01-08 21:54 ` Reinette Chatre 2020-01-13 18:38 ` Shakeel Butt 0 siblings, 1 reply; 7+ messages in thread From: Reinette Chatre @ 2020-01-08 21:54 UTC (permalink / raw) To: Fenghua Yu Cc: Shakeel Butt, Borislav Petkov, LKML, Thomas Gleixner, Ingo Molnar, x86 Hi Fenghua, On 1/8/2020 1:42 PM, Fenghua Yu wrote: > On Wed, Jan 08, 2020 at 12:42:17PM -0800, Reinette Chatre wrote: >> Hi Fenghua, >> On 1/8/2020 12:23 PM, Fenghua Yu wrote: >>> On Wed, Jan 08, 2020 at 09:07:41AM -0800, Shakeel Butt wrote: >>>> Recently we had a bug in the system software writing the same pids to >>>> the tasks file of resctrl group multiple times. The resctrl code >>> Subject: [RFC PATCH] x86/resctrl: Fix redundant task movements >> I think your fix would address this specific use case but a slightly >> different use case will still encounter the problem of high memory >> consumption. If for example, sleeping tasks are moved (many times) >> between resource or monitoring groups then their task_works queue would >> just keep growing. It seems that a call to task_work_cancel() before >> adding a new work item should address all these cases? > > The checking code in this patch is also helpful to avoid redundant > task move preparation (kzalloc(), task_work_add(), etc) in the same > rdtgroup. Indeed. > > How about adding both the checking code and task_work_cancel()? That does sound good to me. There is something in the current implementation that I would appreciate your feedback on: Currently the task's closid and rmid are initialized _after_ the call to task_work_add() succeeds. Should these not be initialized before the call to task_work_add()? Thank you Reinette ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [bug report] resctrl high memory comsumption 2020-01-08 21:54 ` Reinette Chatre @ 2020-01-13 18:38 ` Shakeel Butt 0 siblings, 0 replies; 7+ messages in thread From: Shakeel Butt @ 2020-01-13 18:38 UTC (permalink / raw) To: Reinette Chatre Cc: Fenghua Yu, Borislav Petkov, LKML, Thomas Gleixner, Ingo Molnar, x86 On Wed, Jan 8, 2020 at 1:54 PM Reinette Chatre <reinette.chatre@intel.com> wrote: > > Hi Fenghua, > > On 1/8/2020 1:42 PM, Fenghua Yu wrote: > > On Wed, Jan 08, 2020 at 12:42:17PM -0800, Reinette Chatre wrote: > >> Hi Fenghua, > >> On 1/8/2020 12:23 PM, Fenghua Yu wrote: > >>> On Wed, Jan 08, 2020 at 09:07:41AM -0800, Shakeel Butt wrote: > >>>> Recently we had a bug in the system software writing the same pids to > >>>> the tasks file of resctrl group multiple times. The resctrl code > >>> Subject: [RFC PATCH] x86/resctrl: Fix redundant task movements > >> I think your fix would address this specific use case but a slightly > >> different use case will still encounter the problem of high memory > >> consumption. If for example, sleeping tasks are moved (many times) > >> between resource or monitoring groups then their task_works queue would > >> just keep growing. It seems that a call to task_work_cancel() before > >> adding a new work item should address all these cases? > > > > The checking code in this patch is also helpful to avoid redundant > > task move preparation (kzalloc(), task_work_add(), etc) in the same > > rdtgroup. > > Indeed. > > > > > How about adding both the checking code and task_work_cancel()? > > That does sound good to me. > Hi Fenghua, any updates here? > There is something in the current implementation that I would appreciate > your feedback on: Currently the task's closid and rmid are initialized > _after_ the call to task_work_add() succeeds. Should these not be > initialized before the call to task_work_add()? > This seems like a potential race. thanks, Shakeel ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [bug report] resctrl high memory comsumption 2020-01-08 20:23 ` Fenghua Yu 2020-01-08 20:42 ` Reinette Chatre @ 2020-01-08 21:20 ` Shakeel Butt 1 sibling, 0 replies; 7+ messages in thread From: Shakeel Butt @ 2020-01-08 21:20 UTC (permalink / raw) To: Fenghua Yu Cc: Reinette Chatre, Borislav Petkov, LKML, Thomas Gleixner, Ingo Molnar, x86 On Wed, Jan 8, 2020 at 12:12 PM Fenghua Yu <fenghua.yu@intel.com> wrote: > > On Wed, Jan 08, 2020 at 09:07:41AM -0800, Shakeel Butt wrote: > > Hi, > > > > Recently we had a bug in the system software writing the same pids to > > the tasks file of resctrl group multiple times. The resctrl code > > allocates "struct task_move_callback" for each such write and call > > task_work_add() for that task to handle it on return to user-space > > without checking if such request already exist for that particular > > task. The issue arises for long sleeping tasks which has thousands for > > such request queued to be handled. On our production, we notice > > thousands of tasks having thousands of such requests and taking GiBs > > of memory for "struct task_move_callback". I am not very familiar with > > the code to judge if task_work_cancel() is the right approach or just > > checking closid/rmid before doing task_work_add(). > > > > Thank you for reporting the issue, Shakeel! > > Could you please check if the following patch fixes the issue? > From 3c23c39b6a44fdfbbbe0083d074dcc114d7d7f1c Mon Sep 17 00:00:00 2001 > From: Fenghua Yu <fenghua.yu@intel.com> > Date: Wed, 8 Jan 2020 19:53:33 +0000 > Subject: [RFC PATCH] x86/resctrl: Fix redundant task movements > > Currently a task can be moved to a rdtgroup multiple times. > But, this can cause multiple task works are added, waste memory > and degrade performance. > > To fix the issue, only move the task to a rdtgroup when the task > is not in the rdgroup. Don't try to move the task to the rdtgroup > again when the task is already in the rdtgroup. > > Reported-by: Shakeel Butt <shakeelb@google.com> > Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> > --- > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > index 2e3b06d6bbc6..75300c4a5969 100644 > --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c > +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c > @@ -546,6 +546,17 @@ static int __rdtgroup_move_task(struct task_struct *tsk, > struct task_move_callback *callback; > int ret; > > + /* If the task is already in rdtgrp, don't move the task. */ > + if ((rdtgrp->type == RDTCTRL_GROUP && tsk->closid == rdtgrp->closid && > + tsk->rmid == rdtgrp->mon.rmid) || > + (rdtgrp->type == RDTMON_GROUP && > + rdtgrp->mon.parent->closid == tsk->closid && > + tsk->rmid == rdtgrp->mon.rmid)) { > + rdt_last_cmd_puts("Task is already in the rdgroup\n"); > + > + return -EINVAL; Why not just return success if the task is already in that group (i.e. just follow the cgroup behavior). > + } > + > callback = kzalloc(sizeof(*callback), GFP_KERNEL); > if (!callback) > return -ENOMEM; > -- > 2.19.1 > ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2020-01-13 18:38 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-01-08 17:07 [bug report] resctrl high memory comsumption Shakeel Butt 2020-01-08 20:23 ` Fenghua Yu 2020-01-08 20:42 ` Reinette Chatre 2020-01-08 21:42 ` Fenghua Yu 2020-01-08 21:54 ` Reinette Chatre 2020-01-13 18:38 ` Shakeel Butt 2020-01-08 21:20 ` Shakeel Butt
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).