linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chen Yu <yu.c.chen@intel.com>
To: Xuewen Yan <xuewen.yan@unisoc.com>
Cc: <peterz@infradead.org>, <mingo@redhat.com>,
	<juri.lelli@redhat.com>, <vincent.guittot@linaro.org>,
	<dietmar.eggemann@arm.com>, <rostedt@goodmis.org>,
	<bsegall@google.com>, <mgorman@suse.de>, <bristot@redhat.com>,
	<vschneid@redhat.com>, <ke.wang@unisoc.com>,
	<xuewen.yan94@gmail.com>, <di.shen@unisoc.com>,
	<linux-kernel@vger.kernel.org>,
	Sergei Trofimovich <slyich@gmail.com>,
	"Breno Leitao" <leitao@debian.org>,
	Igor Raits <igor@gooddata.com>, Tim Chen <tim.c.chen@intel.com>,
	Yujie Liu <yujie.liu@intel.com>
Subject: Re: [PATCH v2] sched/eevdf: Prevent vlag from going out of bounds when reweight_eevdf
Date: Mon, 22 Apr 2024 16:47:31 +0800	[thread overview]
Message-ID: <ZiYkI4qn2nvdmktq@chenyu5-mobl2> (raw)
In-Reply-To: <20240422082238.5784-1-xuewen.yan@unisoc.com>

On 2024-04-22 at 16:22:38 +0800, Xuewen Yan wrote:
> kernel encounters the following error when running workload:
> 
> BUG: kernel NULL pointer dereference, address: 0000002c
> EIP: set_next_entity (fair.c:?)
> 
> which was caused by NULL pointer returned by pick_eevdf().
> 
> Further investigation has shown that, the entity_eligible() has an
> false-negative issue when the entity's vruntime is far behind the
> cfs_rq.min_vruntime that, the (vruntime - cfs_rq->min_vruntime) * load
> caused a s64 overflow, thus every entity on the rb-tree is not
> eligible, which results in a NULL candidate.
> 
> The reason why entity's vruntime is far behind the cfs_rq.min_vruntime
> is because during a on_rq task group's update_cfs_group()->reweight_eevdf(),
> there is no limit on the new entity's vlag. If the new weight is much
> smaller than the old one,
> 
> vlag = div_s64(vlag * old_weight, weight)
> 
> generates a huge vlag, and results in very small(negative) vruntime.
> 
> Thus limit the range of vlag accordingly.
>

Thanks for the fix.

Might also add comments from Tim suggested here:
https://lore.kernel.org/lkml/ec479251e6245148b89b226f734192f6d5343bbc.camel@linux.intel.com/

A fix tag might be needed.
Fixes: eab03c23c2a1 ("sched/eevdf: Fix vruntime adjustment on reweight")
 
> Reported-by: Sergei Trofimovich <slyich@gmail.com>
> Closes: https://lore.kernel.org/all/ZhuYyrh3mweP_Kd8@nz.home/
> Reported-by: Igor Raits <igor@gooddata.com>
> Closes: https://lore.kernel.org/all/CA+9S74ih+45M_2TPUY_mPPVDhNvyYfy1J1ftSix+KjiTVxg8nw@mail.gmail.com/
> Reported-by: Breno Leitao <leitao@debian.org>
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/lkml/202401301012.2ed95df0-oliver.sang@intel.com/
> Reported-by: Yujie Liu <yujie.liu@intel.com>
> Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
> ---

Cced Sergei, Igor, Breno who have encountered this NULL pointer issue before.

From my testing, with this applied I did not see the NULL pointer exception
after running trinity for 100 cycles, so

Reviewed-and-tested-by: Chen Yu <yu.c.chen@intel.com>

thanks,
Chenyu

> changes of v2:
> -add reported-by (suggested by <yu.c.chen@intel.com>)
> -remork the changelog (<yu.c.chen@intel.com>)
> -remove the judgement of fork (Peter)
> -remove the !on_rq case. (Peter)
> ---
> Previous discussion link:
> https://lore.kernel.org/lkml/20240226082349.302363-1-yu.c.chen@intel.com/
> https://lore.kernel.org/all/20240130080643.1828-1-xuewen.yan@unisoc.com/
> ---
> ---
>  kernel/sched/fair.c | 18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 03be0d1330a6..64826f406d6d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -696,15 +696,21 @@ u64 avg_vruntime(struct cfs_rq *cfs_rq)
>   *
>   * XXX could add max_slice to the augmented data to track this.
>   */
> -static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
> +static s64 entity_lag(u64 avruntime, struct sched_entity *se)
>  {
> -	s64 lag, limit;
> +	s64 vlag, limit;
> +
> +	vlag = avruntime - se->vruntime;
> +	limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se);
> +
> +	return clamp(vlag, -limit, limit);
> +}
>  
> +static void update_entity_lag(struct cfs_rq *cfs_rq, struct sched_entity *se)
> +{
>  	SCHED_WARN_ON(!se->on_rq);
> -	lag = avg_vruntime(cfs_rq) - se->vruntime;
>  
> -	limit = calc_delta_fair(max_t(u64, 2*se->slice, TICK_NSEC), se);
> -	se->vlag = clamp(lag, -limit, limit);
> +	se->vlag = entity_lag(avg_vruntime(cfs_rq), se);
>  }
>  
>  /*
> @@ -3761,7 +3767,7 @@ static void reweight_eevdf(struct cfs_rq *cfs_rq, struct sched_entity *se,
>  	 *	   = V  - vl'
>  	 */
>  	if (avruntime != se->vruntime) {
> -		vlag = (s64)(avruntime - se->vruntime);
> +		vlag = entity_lag(avruntime, se);
>  		vlag = div_s64(vlag * old_weight, weight);
>  		se->vruntime = avruntime - vlag;
>  	}
> -- 
> 2.25.1
> 

  parent reply	other threads:[~2024-04-22  8:47 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-22  8:22 [PATCH v2] sched/eevdf: Prevent vlag from going out of bounds when reweight_eevdf Xuewen Yan
2024-04-22  8:33 ` Xuewen Yan
2024-04-22  9:41   ` Peter Zijlstra
2024-04-22 11:07     ` Xuewen Yan
2024-04-22 11:17       ` Peter Zijlstra
2024-04-22 13:12         ` Xuewen Yan
2024-04-22 13:52           ` Chen Yu
2024-04-22 15:59           ` Peter Zijlstra
2024-04-23  3:05             ` Xuewen Yan
2024-04-23 11:48               ` Peter Zijlstra
2024-04-24  6:53                 ` Xuewen Yan
2024-04-22  8:47 ` Chen Yu [this message]
2024-04-23  1:26   ` Yujie Liu
2024-04-22 11:39 ` [tip: sched/urgent] sched/eevdf: Prevent vlag from going out of bounds in reweight_eevdf() tip-bot2 for Xuewen Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZiYkI4qn2nvdmktq@chenyu5-mobl2 \
    --to=yu.c.chen@intel.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=di.shen@unisoc.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=igor@gooddata.com \
    --cc=juri.lelli@redhat.com \
    --cc=ke.wang@unisoc.com \
    --cc=leitao@debian.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=slyich@gmail.com \
    --cc=tim.c.chen@intel.com \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=xuewen.yan94@gmail.com \
    --cc=xuewen.yan@unisoc.com \
    --cc=yujie.liu@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).