LKML Archive on lore.kernel.org
From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
To: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Subject: Re: [PATCH] sched/fair: initialize throttle_count for new task-groups lazily
Date: Tue, 21 Jun 2016 16:41:03 +0300
Message-ID: <576943EF.8080902@yandex-team.ru>
In-Reply-To: <146608182119.21870.8439834428248129633.stgit@buzz>

On 16.06.2016 15:57, Konstantin Khlebnikov wrote:
> A cgroup created inside a throttled group must inherit the current throttle_count.
> A broken throttle_count allows throttled entries to be nominated as the next buddy,
> which later leads to a NULL pointer dereference in pick_next_task_fair().

Here is an example of the resulting kernel oops, to get the maintainers' attention:

<1>[3627487.878297] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
<1>[3627487.879028] IP: [<ffffffff8109ab6c>] set_next_entity+0x1c/0x80
<4>[3627487.879837] PGD 0
<4>[3627487.880567] Oops: 0000 [#1] SMP
<4>[3627487.881292] Modules linked in: macvlan overlay ipmi_si ipmi_devintf ipmi_msghandler ip6t_REJECT nf_reject_ipv6 xt_tcpudp 
ip6table_filter ip6_tables x_tables quota_v2 quota_tree cls_cgroup sch_htb bridge netconsole configfs 8021q mrp garp stp llc 
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crc32_pclmul ast ttm drm_kms_helper drm ghash_clmulni_intel aesni_intel 
ablk_helper sb_edac cryptd lrw lpc_ich gf128mul edac_core sysimgblt glue_helper aes_x86_64 microcode sysfillrect syscopyarea acpi_pad 
tcp_htcp mlx4_en mlx4_core vxlan udp_tunnel ip6_udp_tunnel igb i2c_algo_bit isci ixgbe libsas i2c_core ahci dca ptp libahci 
scsi_transport_sas pps_core mdio raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 
multipath linear [last unloaded: ipmi_msghandler]
<4>[3627487.886379]
<4>[3627487.887892] CPU: 21 PID: 0 Comm: swapper/21 Not tainted 3.18.19-24 #1
<4>[3627487.889429] Hardware name: AIC 1D-HV24-02/MB-DPSB04-04, BIOS IVYBV058 07/01/2015
<4>[3627487.891008] task: ffff881fd336f540 ti: ffff881fd33a4000 task.ti: ffff881fd33a4000
<4>[3627487.892569] RIP: 0010:[<ffffffff8109ab6c>]  [<ffffffff8109ab6c>] set_next_entity+0x1c/0x80
<4>[3627487.894200] RSP: 0018:ffff881fd33a7d68  EFLAGS: 00010082
<4>[3627487.895750] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff881fffdb2b70
<4>[3627487.897276] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff881fd1193600
<4>[3627487.898793] RBP: ffff881fd33a7d88 R08: 0000000000000f6d R09: 0000000000000000
<4>[3627487.900358] R10: 0000000000000078 R11: 0000000000000000 R12: 0000000000000000
<4>[3627487.901898] R13: ffffffff8180f3c0 R14: ffff881fd33a4000 R15: ffff881fd1193600
<4>[3627487.903381] FS:  0000000000000000(0000) GS:ffff881fffda0000(0000) knlGS:0000000000000000
<4>[3627487.904920] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[3627487.906382] CR2: 0000000000000038 CR3: 0000000001c14000 CR4: 00000000001407e0
<4>[3627487.907904] Stack:
<4>[3627487.909365]  ffff881fffdb2b00 ffff881fffdb2b00 0000000000000000 ffffffff8180f3c0
<4>[3627487.910837]  ffff881fd33a7e18 ffffffff810a1b18 00000001360794f4 00000001760794f3
<4>[3627487.912322]  ffff881fd2888000 0000000000000000 0000000000012b00 ffff881fd336f540
<4>[3627487.913770] Call Trace:
<4>[3627487.915188]  [<ffffffff810a1b18>] pick_next_task_fair+0x88/0x5f0
<4>[3627487.916573]  [<ffffffff816d258f>] __schedule+0x6ef/0x820
<4>[3627487.917936]  [<ffffffff816d2799>] schedule+0x29/0x70
<4>[3627487.919277]  [<ffffffff816d2a76>] schedule_preempt_disabled+0x16/0x20
<4>[3627487.920632]  [<ffffffff810a8ddb>] cpu_startup_entry+0x14b/0x3d0
<4>[3627487.921999]  [<ffffffff810ce272>] ? clockevents_register_device+0xe2/0x140
<4>[3627487.923323]  [<ffffffff810463fc>] start_secondary+0x14c/0x160
<4>[3627487.924660] Code: 89 ff 48 89 e5 f0 48 0f b3 3e 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 48 89 
f3 4c 89 6d f8 <44> 8b 4e 38 49 89 fc 45 85 c9 74 17 4c 8d 6e 10 4c 39 6f 30 74
<1>[3627487.927435] RIP  [<ffffffff8109ab6c>] set_next_entity+0x1c/0x80
<4>[3627487.928741]  RSP <ffff881fd33a7d68>
<4>[3627487.930010] CR2: 0000000000000038
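
For anyone decoding the trace: CR2 is 0x38 and RSI is 0, and the bytes at the marked fault position, 44 8b 4e 38, decode to mov 0x38(%rsi),%r9d. On x86-64, RSI carries the second argument, which for set_next_entity() is the sched_entity pointer, so this looks like a read at offset 0x38 of a NULL entity. The sketch below models the leading members of struct sched_entity with fixed-width types to show which field would sit at that offset. The layout is an assumption based on the 3.18-era definition, and all *_model names are hypothetical; check the real headers of the kernel in question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86-64 layouts of the leading sched_entity members (models only). */
struct load_weight_model { uint64_t weight; uint32_t inv_weight; };
struct rb_node_model     { uint64_t parent_color, right, left; };
struct list_head_model   { uint64_t next, prev; };

struct sched_entity_model {
	struct load_weight_model load;
	struct rb_node_model     run_node;
	struct list_head_model   group_node;
	uint32_t                 on_rq;
};

/* Sanity-check the sizes the offset arithmetic relies on (x86-64 ABI). */
void check_model_sizes(void)
{
	assert(sizeof(struct load_weight_model) == 16);
	assert(sizeof(struct rb_node_model) == 24);
	assert(sizeof(struct list_head_model) == 16);
}

/* Offset that a load of se->on_rq would use relative to the se pointer. */
size_t on_rq_offset(void)
{
	return offsetof(struct sched_entity_model, on_rq);
}

/* Returns 1 when the modelled offset equals the faulting address in CR2. */
int matches_cr2(uint64_t cr2)
{
	return on_rq_offset() == cr2;
}
```

Under these assumptions 16 + 24 + 16 = 56 = 0x38, so the faulting load would be the se->on_rq check at the top of set_next_entity(), which is consistent with pick_next_task_fair() passing it a NULL entity.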

>
> This patch initializes cfs_rq->throttle_count at first enqueue: laziness
> avoids locking every rq at group creation. The lazy approach also makes it
> possible to skip a full sub-tree scan when throttling a hierarchy (not in this patch).
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Cc: Stable <stable@vger.kernel.org> # v3.2+
> ---
>   kernel/sched/fair.c  |   19 +++++++++++++++++++
>   kernel/sched/sched.h |    2 +-
>   2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 218f8e83db73..fe809fe169d2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4185,6 +4185,25 @@ static void check_enqueue_throttle(struct cfs_rq *cfs_rq)
>   	if (!cfs_bandwidth_used())
>   		return;
>
> +	/* synchronize hierarchical throttle counter */
> +	if (unlikely(!cfs_rq->throttle_uptodate)) {
> +		struct rq *rq = rq_of(cfs_rq);
> +		struct cfs_rq *pcfs_rq;
> +		struct task_group *tg;
> +
> +		cfs_rq->throttle_uptodate = 1;
> +		/* get closest up-to-date node, because leaves go first */
> +		for (tg = cfs_rq->tg->parent; tg; tg = tg->parent) {
> +			pcfs_rq = tg->cfs_rq[cpu_of(rq)];
> +			if (pcfs_rq->throttle_uptodate)
> +				break;
> +		}
> +		if (tg) {
> +			cfs_rq->throttle_count = pcfs_rq->throttle_count;
> +			cfs_rq->throttled_clock_task = rq_clock_task(rq);
> +		}
> +	}
> +
>   	/* an active group must be handled by the update_curr()->put() path */
>   	if (!cfs_rq->runtime_enabled || cfs_rq->curr)
>   		return;
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 72f1f3087b04..7cbeb92a1cb9 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -437,7 +437,7 @@ struct cfs_rq {
>
>   	u64 throttled_clock, throttled_clock_task;
>   	u64 throttled_clock_task_time;
> -	int throttled, throttle_count;
> +	int throttled, throttle_count, throttle_uptodate;
>   	struct list_head throttled_list;
>   #endif /* CONFIG_CFS_BANDWIDTH */
>   #endif /* CONFIG_FAIR_GROUP_SCHED */
>
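
To illustrate the lazy synchronization in the hunk above, here is a minimal userspace sketch. It is not kernel code: struct group_model and both functions are hypothetical stand-ins for one CPU's cfs_rq per task group, and locking and the throttled_clock_task update are left out. It only shows why the walk stops at the closest up-to-date ancestor when a leaf is enqueued before its newly created parent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one CPU's cfs_rq of a task group. */
struct group_model {
	struct group_model *parent;
	int throttle_count;
	int throttle_uptodate;
};

/* Lazily inherit throttle_count from the closest up-to-date ancestor. */
void sync_throttle_count(struct group_model *g)
{
	struct group_model *p;

	if (g->throttle_uptodate)
		return;
	g->throttle_uptodate = 1;
	for (p = g->parent; p; p = p->parent)
		if (p->throttle_uptodate)
			break;
	if (p)
		g->throttle_count = p->throttle_count;
}

/* A leaf is enqueued before its freshly created parent has been synced. */
int demo_leaf_count(void)
{
	struct group_model top  = { NULL, 1, 1 };  /* throttled, already synced */
	struct group_model mid  = { &top, 0, 0 };  /* new, not yet synced */
	struct group_model leaf = { &mid, 0, 0 };  /* new, not yet synced */

	sync_throttle_count(&leaf);  /* skips mid, copies the count from top */
	sync_throttle_count(&mid);   /* the parent is synced afterwards */
	assert(mid.throttle_count == 1);
	return leaf.throttle_count;
}
```

In this model the leaf ends up with the throttled ancestor's count even though the intermediate group still held a stale zero at that point, which is the case the "closest up-to-date node" walk is there for.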


-- 
Konstantin


Thread overview: 9+ messages
2016-06-16 12:57 Konstantin Khlebnikov
2016-06-16 17:03 ` bsegall
2016-06-16 17:23   ` Konstantin Khlebnikov
2016-06-16 17:33     ` bsegall
2016-06-21 13:41 ` Konstantin Khlebnikov [this message]
2016-06-21 21:10 ` Peter Zijlstra
2016-06-22  8:10   ` Konstantin Khlebnikov
2016-06-22  8:23     ` Peter Zijlstra
2016-06-24  8:59 ` [tip:sched/urgent] sched/fair: Initialize " tip-bot for Konstantin Khlebnikov
