linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Reinette Chatre <reinette.chatre@intel.com>
To: James Morse <james.morse@arm.com>, fenghua.yu@intel.com
Cc: tglx@linutronix.de, bp@alien8.de, tony.luck@intel.com,
	kuo-lang.tseng@intel.com, shakeelb@google.com,
	valentin.schneider@arm.com, mingo@redhat.com, babu.moger@amd.com,
	hpa@zytor.com, x86@kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group
Date: Wed, 9 Dec 2020 16:22:40 -0800	[thread overview]
Message-ID: <d80e3d5d-1db2-0f47-dbf3-dfc607e1b451@intel.com> (raw)
In-Reply-To: <97610014-12a8-c389-e7e6-f655caf61d0d@arm.com>

Hi James,

On 12/9/2020 8:51 AM, James Morse wrote:
> Hi Reinette, Fenghua,
> 
> Subject nit: I think 'use IPI instead of task_work() to update PQR_ASSOC_MSR' conveys the
> guts of this change more quickly!

Sure. Thank you. A small change is that I plan to write "PQR_ASSOC MSR" 
instead to closer match the name.

> 
> On 03/12/2020 23:25, Reinette Chatre wrote:
>> From: Fenghua Yu <fenghua.yu@intel.com>
>>
>> Currently when moving a task to a resource group the PQR_ASSOC MSR
>> is updated with the new closid and rmid in an added task callback.
>> If the task is running the work is run as soon as possible. If the
>> task is not running the work is executed later
> 
>> in the kernel exit path when the kernel returns to the task again.
> 
> kernel exit makes me thing of user-space... is it enough to just say:
> "by __switch_to() when task is next run"?

I do not think that would be accurate. The paragraph to which you are 
proposing the change states the current context, before the fix, when 
task_work_add() is still used to update PQR_ASSOC. The "kernel exit" 
text you refer to is quite close to task_work_add()'s comments and 
indeed refers to the work that is run before returning to user space. If 
a function name would make things clearer perhaps adding 
exit_to_user_mode_loop() instead? Changing the text to __switch_to() 
does not reflect the context described here since __switch_to() is what 
is called when task is context switched back from where 
resctrl_sched_in() is called, not the queued work being described here.

>> Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
>> is running is the right thing to do. Queueing work for a task that is
>> not running is unnecessary (the PQR_ASSOC MSR is already updated when the
>> task is scheduled in) and causing system resource waste with the way in
>> which it is implemented: Work to update the PQR_ASSOC register is queued
>> every time the user writes a task id to the "tasks" file, even if the task
>> already belongs to the resource group. This could result in multiple pending
>> work items associated with a single task even if they are all identical and
>> even though only a single update with most recent values is needed.
>> Specifically, even if a task is moved between different resource groups
>> while it is sleeping then it is only the last move that is relevant but
>> yet a work item is queued during each move.
>> This unnecessary queueing of work items could result in significant system
>> resource waste, especially on tasks sleeping for a long time. For example,
>> as demonstrated by Shakeel Butt in [1] writing the same task id to the
>> "tasks" file can quickly consume significant memory. The same problem
>> (wasted system resources) occurs when moving a task between different
>> resource groups.
>>
>> As pointed out by Valentin Schneider in [2] there is an additional issue with
>> the way in which the queueing of work is done in that the task_struct update
>> is currently done after the work is queued, resulting in a race with the
>> register update possibly done before the data needed by the update is available.
>>
>> To solve these issues, the PQR_ASSOC MSR is updated in a synchronous way
>> right after the new closid and rmid are ready during the task movement,
>> only if the task is running. If a moved task is not running nothing is
>> done since the PQR_ASSOC MSR will be updated next time the task is scheduled.
>> This is the same way used to update the register when tasks are moved as
>> part of resource group removal.
> 
> (as t->on_cpu is already used...)
> 
> Reviewed-by: James Morse <james.morse@arm.com>

Thank you very much. I do plan to follow your suggestion below though.


>> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> index 68db7d2dec8f..9d62f1fadcc3 100644
>> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
>> @@ -525,6 +525,16 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
> 
> 
>> +static void update_task_closid_rmid(struct task_struct *t)
>>   {
>> +	int cpu;
>>   
>> +	if (task_on_cpu(t, &cpu))
>> +		smp_call_function_single(cpu, _update_task_closid_rmid, t, 1);
> 
> 
> I think:
> |	if (task_curr(t))
> |		smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
> 
> here would make for an easier backport as it doesn't depend on the previous patch.
> 

Will do, thank you very much.

The previous patch has now become a bugfix in its own right after you 
pointed out the issue with using t->on_cpu. In addressing that I plan to 
remove the helpers found in patch #1 so backporting should continue to 
be easier.

> 
>> +}
> 
> [...]
> 
>>   static int __rdtgroup_move_task(struct task_struct *tsk,
>>   				struct rdtgroup *rdtgrp)
>>   {
> 
>> +	if (rdtgrp->type == RDTCTRL_GROUP) {
>> +		tsk->closid = rdtgrp->closid;
>> +		tsk->rmid = rdtgrp->mon.rmid;
>> +	} else if (rdtgrp->type == RDTMON_GROUP) {
> 
> [...]
> 
>> +	} else {
> 
>> +		rdt_last_cmd_puts("Invalid resource group type\n");
>> +		return -EINVAL;
> 
> Wouldn't this be a kernel bug?
> I'd have thought there would be a WARN_ON_ONCE() here to make it clear this isn't
> user-space's fault!

You are right, this would be a kernel bug. I'll add a WARN_ON_ONCE().

> 
> 
>>   	}
>> -	return ret;
>> +
>> +	/*
>> +	 * By now, the task's closid and rmid are set. If the task is current
>> +	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
>> +	 * group go into effect. If the task is not current, the MSR will be
>> +	 * updated when the task is scheduled in.
>> +	 */
>> +	update_task_closid_rmid(tsk);
>> +
>> +	return 0;
>>   }
> 

Thank you very much.

Reinette


  reply	other threads:[~2020-12-10  0:23 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-03 23:25 [PATCH 0/3] x86/resctrl: Fix a few issues in moving a task to a resource group Reinette Chatre
2020-12-03 23:25 ` [PATCH 1/3] x86/resctrl: Move setting task's active CPU in a mask into helpers Reinette Chatre
2020-12-07 18:29   ` Borislav Petkov
2020-12-07 21:24     ` Reinette Chatre
2020-12-08  9:49       ` Borislav Petkov
2020-12-08 16:35         ` Reinette Chatre
2020-12-09 16:47   ` James Morse
2020-12-10  0:21     ` Reinette Chatre
2020-12-03 23:25 ` [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group Reinette Chatre
2020-12-09 16:51   ` James Morse
2020-12-10  0:22     ` Reinette Chatre [this message]
2020-12-11 20:46   ` Valentin Schneider
2020-12-14 18:41     ` Reinette Chatre
2020-12-16 17:41       ` Valentin Schneider
2020-12-16 18:26         ` Reinette Chatre
2020-12-17 10:39           ` Valentin Schneider
2020-12-03 23:25 ` [PATCH 3/3] x86/resctrl: Don't move a task to the same " Reinette Chatre
2020-12-11 20:46 ` [PATCH 0/3] x86/resctrl: Fix a few issues in moving a task to a " Valentin Schneider
2020-12-14 18:38   ` Reinette Chatre
2020-12-16 17:41     ` Valentin Schneider
2020-12-16 18:26       ` Reinette Chatre
2020-12-17 12:19 ` [PATCH 4/3] x86/intel_rdt: Apply READ_ONCE/WRITE_ONCE to task_struct .rmid & .closid Valentin Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d80e3d5d-1db2-0f47-dbf3-dfc607e1b451@intel.com \
    --to=reinette.chatre@intel.com \
    --cc=babu.moger@amd.com \
    --cc=bp@alien8.de \
    --cc=fenghua.yu@intel.com \
    --cc=hpa@zytor.com \
    --cc=james.morse@arm.com \
    --cc=kuo-lang.tseng@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=shakeelb@google.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=valentin.schneider@arm.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).