All of lore.kernel.org
 help / color / mirror / Atom feed
From: Parth Shah <parth@linux.ibm.com>
To: Pavan Kondeti <pkondeti@codeaurora.org>
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
	mingo@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, qais.yousef@arm.com,
	chris.hyser@oracle.com, valentin.schneider@arm.com,
	rjw@rjwysocki.net
Subject: Re: [RFC 2/4] sched/core: Set nr_lat_sensitive counter at various scheduler entry/exit points
Date: Fri, 8 May 2020 16:45:16 +0530	[thread overview]
Message-ID: <73506bba-7bcb-fd40-6866-d5d88d436fbf@linux.ibm.com> (raw)
In-Reply-To: <20200508083308.GI19464@codeaurora.org>

Hi Pavan,

Thanks for going through this patch-set.

On 5/8/20 2:03 PM, Pavan Kondeti wrote:
> Hi Parth,
> 
> On Thu, May 07, 2020 at 07:07:21PM +0530, Parth Shah wrote:
>> Monitor tasks at:
>> 1. wake_up_new_task() - forked tasks
>>
>> 2. set_task_cpu() - task migrations, Load balancer
>>
>> 3. __sched_setscheduler() - set/unset latency_nice value
>> Increment the nr_lat_sensitive count on the CPU with task marked with
>> latency_nice == -20.
>> Similarly, decrement the nr_lat_sensitive counter upon re-marking the task
>> with >-20 latency_nice task.
>>
>> 4. finish_task_switch() - dying task
>>
> 
> 
>> Signed-off-by: Parth Shah <parth@linux.ibm.com>
>> ---
>>  kernel/sched/core.c  | 30 ++++++++++++++++++++++++++++--
>>  kernel/sched/sched.h |  5 +++++
>>  2 files changed, 33 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index 2d8b76f41d61..ad396c36eba6 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -1744,6 +1744,11 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>>  	trace_sched_migrate_task(p, new_cpu);
>>  
>>  	if (task_cpu(p) != new_cpu) {
>> +		if (task_is_lat_sensitive(p)) {
>> +			per_cpu(nr_lat_sensitive, task_cpu(p))--;
>> +			per_cpu(nr_lat_sensitive, new_cpu)++;
>> +		}
>> +
> 
> Since we can come here without rq locks, there is a possibility
> of a race and incorrect updates can happen. Since the counters
> are used to prevent C-states, we don't want that to happen.

I did tried using task_lock(p) wherever we do change refcount and when
latency_nice value is set. There I was using nr_lat_sensitive with atomic_t
type.

After lots of thinking to optimize it and thinking that we anyways hold rq
lock, I thought of not using any lock here and see if scheduler community
has well known solution for this :-)

But in brief, using atomic_t nr_lat_sensitive and task_lock(p) when changin
refcount should solve problem, right?

If you or anyone have solution for this kind of pattern, then that surely
will be helpful.

> 
>>  		if (p->sched_class->migrate_task_rq)
>>  			p->sched_class->migrate_task_rq(p, new_cpu);
>>  		p->se.nr_migrations++;
>> @@ -2947,6 +2952,7 @@ void wake_up_new_task(struct task_struct *p)
>>  {
>>  	struct rq_flags rf;
>>  	struct rq *rq;
>> +	int target_cpu = 0;
>>  
>>  	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
>>  	p->state = TASK_RUNNING;
>> @@ -2960,9 +2966,17 @@ void wake_up_new_task(struct task_struct *p)
>>  	 * as we're not fully set-up yet.
>>  	 */
>>  	p->recent_used_cpu = task_cpu(p);
>> -	__set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
>> +	target_cpu = select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0);
>> +	__set_task_cpu(p, target_cpu);
>> +
> 
> The target_cpu variable can be eliminated by using task_cpu(p) directly
> in the below update.

Right. Will change it thus saving one diff line.

> 
>>  #endif
>>  	rq = __task_rq_lock(p, &rf);
>> +
>> +#ifdef CONFIG_SMP
>> +	if (task_is_lat_sensitive(p))
>> +		per_cpu(nr_lat_sensitive, target_cpu)++;
>> +#endif
>> +
> 
> Is the SMP check intentional? In some parts of this patch, updates to
> nr_lat_sensitive are done without SMP checks. For example,
> finish_task_switch() below.

No. just forget to remove. I will remove SMP checks in next revision.

> 
>>  	update_rq_clock(rq);
>>  	post_init_entity_util_avg(p);
>>  
>> @@ -3248,6 +3262,9 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>>  		if (prev->sched_class->task_dead)
>>  			prev->sched_class->task_dead(prev);
>>  
>> +		if (task_is_lat_sensitive(prev))
>> +			per_cpu(nr_lat_sensitive, prev->cpu)--;
>> +
>>  		/*
>>  		 * Remove function-return probe instances associated with this
>>  		 * task and put them back on the free list.
>> @@ -4732,8 +4749,17 @@ static void __setscheduler_params(struct task_struct *p,
>>  	p->normal_prio = normal_prio(p);
>>  	set_load_weight(p, true);
>>  
>> -	if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE)
>> +	if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE) {
>> +		if (p->state != TASK_DEAD &&
>> +		    attr->sched_latency_nice != p->latency_nice) {
>> +			if (attr->sched_latency_nice == MIN_LATENCY_NICE)
>> +				per_cpu(nr_lat_sensitive, task_cpu(p))++;
>> +			else if (task_is_lat_sensitive(p))
>> +				per_cpu(nr_lat_sensitive, task_cpu(p))--;
>> +		}
>> +
>>  		p->latency_nice = attr->sched_latency_nice;
>> +	}
>>  }
> 
> There is a potential race here due to which we can mess up the refcount.
> 
> - A latency sensitive task is marked TASK_DEAD
> <snip>
> - sched_setattr() called on the task to clear the latency nice. Since
> we check the task state here, we skip the decrement.
> - The task is finally context switched out and we skip the decrement again
> since it is not a latency senstivie task.

if task is already marked TASK_DEAD then we should have already decremented
its refcount in finish_task_switch().
am I missing something?

> 
>>  
>>  /* Actually do priority change: must hold pi & rq lock. */
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 5c41020c530e..56f885e37451 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -211,6 +211,11 @@ static inline int task_has_dl_policy(struct task_struct *p)
>>  	return dl_policy(p->policy);
>>  }
>>  
>> +static inline int task_is_lat_sensitive(struct task_struct *p)
>> +{
>> +	return p->latency_nice == MIN_LATENCY_NICE;
>> +}
>> +
>>  #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
>>  
>>  /*
>> -- 
>> 2.17.2
>>
> 
> Thanks,
> Pavan
> 

Thanks,
Parth

  reply	other threads:[~2020-05-08 11:15 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-07 13:37 [RFC 0/4] IDLE gating in presence of latency-sensitive tasks Parth Shah
2020-05-07 13:37 ` [RFC 1/4] sched/core: Introduce per_cpu counter to track latency sensitive tasks Parth Shah
2020-05-08  8:40   ` Pavan Kondeti
2020-05-08 11:30     ` Parth Shah
2020-05-09  2:14       ` Pavan Kondeti
2020-05-07 13:37 ` [RFC 2/4] sched/core: Set nr_lat_sensitive counter at various scheduler entry/exit points Parth Shah
2020-05-08  8:33   ` Pavan Kondeti
2020-05-08 11:15     ` Parth Shah [this message]
2020-05-09  2:39       ` Pavan Kondeti
2020-05-12  7:51         ` Parth Shah
2020-05-07 13:37 ` [RFC 3/4] sched/idle: Disable idle call on least latency requirements Parth Shah
2020-05-08  8:36   ` Pavan Kondeti
2020-05-08 11:19     ` Parth Shah
2020-05-09  2:18       ` Pavan Kondeti
2020-05-07 13:37 ` [RFC 4/4] sched/idle: Add debugging bits to validate inconsistency in latency sensitive task calculations Parth Shah

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=73506bba-7bcb-fd40-6866-d5d88d436fbf@linux.ibm.com \
    --to=parth@linux.ibm.com \
    --cc=chris.hyser@oracle.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pkondeti@codeaurora.org \
    --cc=qais.yousef@arm.com \
    --cc=rjw@rjwysocki.net \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.