LKML Archive on lore.kernel.org
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
To: bsegall@google.com
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] sched/fair: initialize throttle_count for new task-groups lazily
Date: Thu, 16 Jun 2016 20:23:28 +0300
Message-ID: <5762E090.4000706@yandex-team.ru>
In-Reply-To: <xm26r3bxvzep.fsf@bsegall-linux.mtv.corp.google.com>

On 16.06.2016 20:03, bsegall@google.com wrote:
> Konstantin Khlebnikov <khlebnikov@yandex-team.ru> writes:
>
>> Cgroup created inside throttled group must inherit current throttle_count.
>> Broken throttle_count allows to nominate throttled entries as a next buddy,
>> later this leads to null pointer dereference in pick_next_task_fair().
>>
>> This patch initialize cfs_rq->throttle_count at first enqueue: laziness
>> allows to skip locking all rq at group creation. Lazy approach also allows
>> to skip full sub-tree scan at throttling hierarchy (not in this patch).
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> Cc: Stable <stable@vger.kernel.org> # v3.2+
>> ---
>>   kernel/sched/fair.c  |   19 +++++++++++++++++++
>>   kernel/sched/sched.h |    2 +-
>>   2 files changed, 20 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 218f8e83db73..fe809fe169d2 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4185,6 +4185,25 @@ static void check_enqueue_throttle(struct cfs_rq *cfs_rq)
>>   	if (!cfs_bandwidth_used())
>>   		return;
>>
>> +	/* synchronize hierarchical throttle counter */
>> +	if (unlikely(!cfs_rq->throttle_uptodate)) {
>> +		struct rq *rq = rq_of(cfs_rq);
>> +		struct cfs_rq *pcfs_rq;
>> +		struct task_group *tg;
>> +
>> +		cfs_rq->throttle_uptodate = 1;
>> +		/* get closest uptodate node because leaves goes first */
>> +		for (tg = cfs_rq->tg->parent; tg; tg = tg->parent) {
>> +			pcfs_rq = tg->cfs_rq[cpu_of(rq)];
>> +			if (pcfs_rq->throttle_uptodate)
>> +				break;
>> +		}
>> +		if (tg) {
>> +			cfs_rq->throttle_count = pcfs_rq->throttle_count;
>> +			cfs_rq->throttled_clock_task = rq_clock_task(rq);
>> +		}
>> +	}
>> +
>
> Checking just in enqueue is not sufficient - throttled_lb_pair can check
> against a cfs_rq that has never been enqueued (and possibly other
> paths).

This looks like a minor problem: in the worst case the load balancer will
migrate a task into a throttled hierarchy. And this can happen only once
for each newly created group.
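
A minimal userspace sketch of that worst case, assuming the balancer check
reduces to reading throttle_count on the source and destination run queues.
The model_* names are hypothetical and are not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU cfs_rq fields discussed here. */
struct model_cfs_rq {
	int throttle_count;	/* stays 0 until lazily synchronized */
	bool throttle_uptodate;	/* set at first enqueue */
};

/* Assumed shape of the check: non-zero count means "inside a throttled tree". */
static inline bool model_throttled_hierarchy(const struct model_cfs_rq *cfs_rq)
{
	return cfs_rq->throttle_count != 0;
}

/* Migration is refused if either side of the pair looks throttled. */
static inline bool model_throttled_lb_pair(const struct model_cfs_rq *src,
					   const struct model_cfs_rq *dst)
{
	return model_throttled_hierarchy(src) || model_throttled_hierarchy(dst);
}
```

In this model a destination that was never enqueued still has
throttle_count == 0, so the pair check lets one migration through even if
the parent is throttled. Once the counter is synchronized, the same check
refuses the migration.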

>
> It might also make sense to go ahead and initialize all the cfs_rqs we
> skipped over to avoid some n^2 pathological behavior. You could also use
> throttle_count == -1 or < 0. (We had our own version of this patch that
> I guess we forgot to push?)

n^2 shouldn't be a problem as long as this happens only once for each group.
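
A userspace sketch of the lazy walk from the quoted patch, reduced to a
single CPU. struct model_group and model_sync_throttle() are hypothetical
names; the returned step count is added only to make the cost visible.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-CPU model of a task group with its throttle state. */
struct model_group {
	struct model_group *parent;
	int throttle_count;
	bool throttle_uptodate;
};

/*
 * Mirror of the patch logic: mark the group up to date, find the closest
 * up-to-date ancestor and copy its counter. Returns how many ancestors
 * were visited.
 */
static inline int model_sync_throttle(struct model_group *g)
{
	struct model_group *p;
	int steps = 0;

	if (g->throttle_uptodate)
		return 0;

	g->throttle_uptodate = true;
	for (p = g->parent; p; p = p->parent) {
		steps++;
		if (p->throttle_uptodate)
			break;
	}
	if (p)
		g->throttle_count = p->throttle_count;

	return steps;
}
```

The first call on a group pays for the walk up to the nearest synchronized
ancestor; every later call on the same group returns immediately. Skipped
ancestors are left untouched here, as in the patch, so each of them pays
for its own walk once.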

throttle_count == -1 could be overwritten when the parent throttles/unthrottles
before initialization. We could set it to INT_MIN/2 and check for < 0, but this
would hide possible bugs here. One more int in the same cacheline shouldn't
add noticeable overhead.
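
A small userspace model of the sentinel problem, assuming that throttling
increments and unthrottling decrements the counter of every descendant,
including ones that are not yet initialized. The model_* names are
hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model: only the counter matters for this illustration. */
struct model_cfs_rq {
	int throttle_count;
};

/* Assumed effect of a parent throttling on each descendant. */
static inline void model_throttle_down(struct model_cfs_rq *cfs_rq)
{
	cfs_rq->throttle_count++;
}

/* Assumed effect of a parent unthrottling on each descendant. */
static inline void model_unthrottle_up(struct model_cfs_rq *cfs_rq)
{
	cfs_rq->throttle_count--;
}

/* Sentinel scheme: any negative value means "not initialized yet". */
static inline bool model_needs_init(const struct model_cfs_rq *cfs_rq)
{
	return cfs_rq->throttle_count < 0;
}
```

Starting from -1, a single model_throttle_down() brings the counter to 0,
which is indistinguishable from "initialized and not throttled". Starting
from INT_MIN/2 the value stays negative, but any accounting bug in that
range stays hidden as well, which is the argument for a separate flag.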

I've also added this to our kernel to catch such problems without a crash.
It's probably worth adding upstream, because a stale buddy is a real pain :)

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5506,8 +5506,11 @@ static void set_last_buddy(struct sched_entity *se)
         if (entity_is_task(se) && unlikely(task_of(se)->policy == SCHED_IDLE))
                 return;

-       for_each_sched_entity(se)
+       for_each_sched_entity(se) {
+               if (WARN_ON_ONCE(!se->on_rq))
+                       return;
                 cfs_rq_of(se)->last = se;
+       }
  }

  static void set_next_buddy(struct sched_entity *se)
@@ -5515,8 +5518,11 @@ static void set_next_buddy(struct sched_entity *se)
         if (entity_is_task(se) && unlikely(task_of(se)->policy == SCHED_IDLE))
                 return;

-       for_each_sched_entity(se)
+       for_each_sched_entity(se) {
+               if (WARN_ON_ONCE(!se->on_rq))
+                       return;
                 cfs_rq_of(se)->next = se;
+       }
  }

  static void set_skip_buddy(struct sched_entity *se)


>
>
>>   	/* an active group must be handled by the update_curr()->put() path */
>>   	if (!cfs_rq->runtime_enabled || cfs_rq->curr)
>>   		return;
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 72f1f3087b04..7cbeb92a1cb9 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -437,7 +437,7 @@ struct cfs_rq {
>>
>>   	u64 throttled_clock, throttled_clock_task;
>>   	u64 throttled_clock_task_time;
>> -	int throttled, throttle_count;
>> +	int throttled, throttle_count, throttle_uptodate;
>>   	struct list_head throttled_list;
>>   #endif /* CONFIG_CFS_BANDWIDTH */
>>   #endif /* CONFIG_FAIR_GROUP_SCHED */


-- 
Konstantin


Thread overview: 9+ messages
2016-06-16 12:57 Konstantin Khlebnikov
2016-06-16 17:03 ` bsegall
2016-06-16 17:23   ` Konstantin Khlebnikov [this message]
2016-06-16 17:33     ` bsegall
2016-06-21 13:41 ` Konstantin Khlebnikov
2016-06-21 21:10 ` Peter Zijlstra
2016-06-22  8:10   ` Konstantin Khlebnikov
2016-06-22  8:23     ` Peter Zijlstra
2016-06-24  8:59 ` [tip:sched/urgent] sched/fair: Initialize " tip-bot for Konstantin Khlebnikov
