All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavan Kondeti <pkondeti@codeaurora.org>
To: Parth Shah <parth@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
	mingo@redhat.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, qais.yousef@arm.com,
	chris.hyser@oracle.com, valentin.schneider@arm.com,
	rjw@rjwysocki.net
Subject: Re: [RFC 2/4] sched/core: Set nr_lat_sensitive counter at various scheduler entry/exit points
Date: Fri, 8 May 2020 14:03:08 +0530	[thread overview]
Message-ID: <20200508083308.GI19464@codeaurora.org> (raw)
In-Reply-To: <20200507133723.18325-3-parth@linux.ibm.com>

Hi Parth,

On Thu, May 07, 2020 at 07:07:21PM +0530, Parth Shah wrote:
> Monitor tasks at:
> 1. wake_up_new_task() - forked tasks
> 
> 2. set_task_cpu() - task migrations, Load balancer
> 
> 3. __sched_setscheduler() - set/unset latency_nice value
> Increment the nr_lat_sensitive count on the CPU with task marked with
> latency_nice == -20.
> Similarly, decrement the nr_lat_sensitive counter upon re-marking the task
> with >-20 latency_nice task.
> 
> 4. finish_task_switch() - dying task
> 


> Signed-off-by: Parth Shah <parth@linux.ibm.com>
> ---
>  kernel/sched/core.c  | 30 ++++++++++++++++++++++++++++--
>  kernel/sched/sched.h |  5 +++++
>  2 files changed, 33 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2d8b76f41d61..ad396c36eba6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1744,6 +1744,11 @@ void set_task_cpu(struct task_struct *p, unsigned int new_cpu)
>  	trace_sched_migrate_task(p, new_cpu);
>  
>  	if (task_cpu(p) != new_cpu) {
> +		if (task_is_lat_sensitive(p)) {
> +			per_cpu(nr_lat_sensitive, task_cpu(p))--;
> +			per_cpu(nr_lat_sensitive, new_cpu)++;
> +		}
> +

Since we can come here without rq locks, there is a possibility
of a race and incorrect updates can happen. Since the counters
are used to prevent C-states, we don't want that to happen.

>  		if (p->sched_class->migrate_task_rq)
>  			p->sched_class->migrate_task_rq(p, new_cpu);
>  		p->se.nr_migrations++;
> @@ -2947,6 +2952,7 @@ void wake_up_new_task(struct task_struct *p)
>  {
>  	struct rq_flags rf;
>  	struct rq *rq;
> +	int target_cpu = 0;
>  
>  	raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
>  	p->state = TASK_RUNNING;
> @@ -2960,9 +2966,17 @@ void wake_up_new_task(struct task_struct *p)
>  	 * as we're not fully set-up yet.
>  	 */
>  	p->recent_used_cpu = task_cpu(p);
> -	__set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
> +	target_cpu = select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0);
> +	__set_task_cpu(p, target_cpu);
> +

The target_cpu variable can be eliminated by using task_cpu(p) directly
in the below update.

>  #endif
>  	rq = __task_rq_lock(p, &rf);
> +
> +#ifdef CONFIG_SMP
> +	if (task_is_lat_sensitive(p))
> +		per_cpu(nr_lat_sensitive, target_cpu)++;
> +#endif
> +

Is the SMP check intentional? In some parts of this patch, updates to
nr_lat_sensitive are done without SMP checks. For example,
finish_task_switch() below.

>  	update_rq_clock(rq);
>  	post_init_entity_util_avg(p);
>  
> @@ -3248,6 +3262,9 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>  		if (prev->sched_class->task_dead)
>  			prev->sched_class->task_dead(prev);
>  
> +		if (task_is_lat_sensitive(prev))
> +			per_cpu(nr_lat_sensitive, prev->cpu)--;
> +
>  		/*
>  		 * Remove function-return probe instances associated with this
>  		 * task and put them back on the free list.
> @@ -4732,8 +4749,17 @@ static void __setscheduler_params(struct task_struct *p,
>  	p->normal_prio = normal_prio(p);
>  	set_load_weight(p, true);
>  
> -	if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE)
> +	if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE) {
> +		if (p->state != TASK_DEAD &&
> +		    attr->sched_latency_nice != p->latency_nice) {
> +			if (attr->sched_latency_nice == MIN_LATENCY_NICE)
> +				per_cpu(nr_lat_sensitive, task_cpu(p))++;
> +			else if (task_is_lat_sensitive(p))
> +				per_cpu(nr_lat_sensitive, task_cpu(p))--;
> +		}
> +
>  		p->latency_nice = attr->sched_latency_nice;
> +	}
>  }

There is a potential race here due to which we can mess up the refcount.

- A latency sensitive task is marked TASK_DEAD
<snip>
- sched_setattr() called on the task to clear the latency nice. Since
we check the task state here, we skip the decrement.
- The task is finally context switched out and we skip the decrement again
since it is not a latency senstivie task.

>  
>  /* Actually do priority change: must hold pi & rq lock. */
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 5c41020c530e..56f885e37451 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -211,6 +211,11 @@ static inline int task_has_dl_policy(struct task_struct *p)
>  	return dl_policy(p->policy);
>  }
>  
> +static inline int task_is_lat_sensitive(struct task_struct *p)
> +{
> +	return p->latency_nice == MIN_LATENCY_NICE;
> +}
> +
>  #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
>  
>  /*
> -- 
> 2.17.2
> 

Thanks,
Pavan

-- 
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.

  reply	other threads:[~2020-05-08  8:33 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-07 13:37 [RFC 0/4] IDLE gating in presence of latency-sensitive tasks Parth Shah
2020-05-07 13:37 ` [RFC 1/4] sched/core: Introduce per_cpu counter to track latency sensitive tasks Parth Shah
2020-05-08  8:40   ` Pavan Kondeti
2020-05-08 11:30     ` Parth Shah
2020-05-09  2:14       ` Pavan Kondeti
2020-05-07 13:37 ` [RFC 2/4] sched/core: Set nr_lat_sensitive counter at various scheduler entry/exit points Parth Shah
2020-05-08  8:33   ` Pavan Kondeti [this message]
2020-05-08 11:15     ` Parth Shah
2020-05-09  2:39       ` Pavan Kondeti
2020-05-12  7:51         ` Parth Shah
2020-05-07 13:37 ` [RFC 3/4] sched/idle: Disable idle call on least latency requirements Parth Shah
2020-05-08  8:36   ` Pavan Kondeti
2020-05-08 11:19     ` Parth Shah
2020-05-09  2:18       ` Pavan Kondeti
2020-05-07 13:37 ` [RFC 4/4] sched/idle: Add debugging bits to validate inconsistency in latency sensitive task calculations Parth Shah

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200508083308.GI19464@codeaurora.org \
    --to=pkondeti@codeaurora.org \
    --cc=chris.hyser@oracle.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=parth@linux.ibm.com \
    --cc=peterz@infradead.org \
    --cc=qais.yousef@arm.com \
    --cc=rjw@rjwysocki.net \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.