From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751982AbcFUNlL (ORCPT );
	Tue, 21 Jun 2016 09:41:11 -0400
Received: from forward-corp1m.cmail.yandex.net ([5.255.216.100]:32993 "EHLO
	forward-corp1m.cmail.yandex.net" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1751347AbcFUNlI (ORCPT );
	Tue, 21 Jun 2016 09:41:08 -0400
Authentication-Results: smtpcorp1m.mail.yandex.net; dkim=pass header.i=@yandex-team.ru
Subject: Re: [PATCH] sched/fair: initialize throttle_count for new task-groups lazily
To: Peter Zijlstra , Ingo Molnar , linux-kernel@vger.kernel.org
References: <146608182119.21870.8439834428248129633.stgit@buzz>
Cc: stable@vger.kernel.org
From: Konstantin Khlebnikov
Message-ID: <576943EF.8080902@yandex-team.ru>
Date: Tue, 21 Jun 2016 16:41:03 +0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.8.0
MIME-Version: 1.0
In-Reply-To: <146608182119.21870.8439834428248129633.stgit@buzz>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 16.06.2016 15:57, Konstantin Khlebnikov wrote:
> A cgroup created inside a throttled group must inherit the current throttle_count.
> A broken throttle_count allows throttled entries to be nominated as the next buddy,
> which later leads to a NULL pointer dereference in pick_next_task_fair().

Example of a kernel oops, to summon maintainers:

<1>[3627487.878297] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
<1>[3627487.879028] IP: [] set_next_entity+0x1c/0x80
<4>[3627487.879837] PGD 0
<4>[3627487.880567] Oops: 0000 [#1] SMP
<4>[3627487.881292] Modules linked in: macvlan overlay ipmi_si ipmi_devintf ipmi_msghandler ip6t_REJECT nf_reject_ipv6 xt_tcpudp ip6table_filter ip6_tables x_tables quota_v2 quota_tree cls_cgroup sch_htb bridge netconsole configfs 8021q mrp garp stp llc x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crc32_pclmul ast ttm drm_kms_helper drm ghash_clmulni_intel aesni_intel ablk_helper sb_edac cryptd lrw lpc_ich gf128mul edac_core sysimgblt glue_helper aes_x86_64 microcode sysfillrect syscopyarea acpi_pad tcp_htcp mlx4_en mlx4_core vxlan udp_tunnel ip6_udp_tunnel igb i2c_algo_bit isci ixgbe libsas i2c_core ahci dca ptp libahci scsi_transport_sas pps_core mdio raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid6_pq async_tx raid1 raid0 multipath linear [last unloaded: ipmi_msghandler]
<4>[3627487.886379]
<4>[3627487.887892] CPU: 21 PID: 0 Comm: swapper/21 Not tainted 3.18.19-24 #1
<4>[3627487.889429] Hardware name: AIC 1D-HV24-02/MB-DPSB04-04, BIOS IVYBV058 07/01/2015
<4>[3627487.891008] task: ffff881fd336f540 ti: ffff881fd33a4000 task.ti: ffff881fd33a4000
<4>[3627487.892569] RIP: 0010:[] [] set_next_entity+0x1c/0x80
<4>[3627487.894200] RSP: 0018:ffff881fd33a7d68 EFLAGS: 00010082
<4>[3627487.895750] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff881fffdb2b70
<4>[3627487.897276] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff881fd1193600
<4>[3627487.898793] RBP: ffff881fd33a7d88 R08: 0000000000000f6d R09: 0000000000000000
<4>[3627487.900358] R10: 0000000000000078 R11: 0000000000000000 R12: 0000000000000000
<4>[3627487.901898] R13: ffffffff8180f3c0 R14: ffff881fd33a4000 R15: ffff881fd1193600
<4>[3627487.903381] FS: 0000000000000000(0000) GS:ffff881fffda0000(0000) knlGS:0000000000000000
<4>[3627487.904920] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[3627487.906382] CR2: 0000000000000038 CR3: 0000000001c14000 CR4: 00000000001407e0
<4>[3627487.907904] Stack:
<4>[3627487.909365] ffff881fffdb2b00 ffff881fffdb2b00 0000000000000000 ffffffff8180f3c0
<4>[3627487.910837] ffff881fd33a7e18 ffffffff810a1b18 00000001360794f4 00000001760794f3
<4>[3627487.912322] ffff881fd2888000 0000000000000000 0000000000012b00 ffff881fd336f540
<4>[3627487.913770] Call Trace:
<4>[3627487.915188] [] pick_next_task_fair+0x88/0x5f0
<4>[3627487.916573] [] __schedule+0x6ef/0x820
<4>[3627487.917936] [] schedule+0x29/0x70
<4>[3627487.919277] [] schedule_preempt_disabled+0x16/0x20
<4>[3627487.920632] [] cpu_startup_entry+0x14b/0x3d0
<4>[3627487.921999] [] ? clockevents_register_device+0xe2/0x140
<4>[3627487.923323] [] start_secondary+0x14c/0x160
<4>[3627487.924660] Code: 89 ff 48 89 e5 f0 48 0f b3 3e 5d c3 0f 1f 00 0f 1f 44 00 00 55 48 89 e5 48 83 ec 20 48 89 5d e8 4c 89 65 f0 48 89 f3 4c 89 6d f8 <44> 8b 4e 38 49 89 fc 45 85 c9 74 17 4c 8d 6e 10 4c 39 6f 30 74
<1>[3627487.927435] RIP [] set_next_entity+0x1c/0x80
<4>[3627487.928741] RSP
<4>[3627487.930010] CR2: 0000000000000038

>
> This patch initializes cfs_rq->throttle_count at first enqueue: laziness
> avoids locking every rq at group creation. The lazy approach also allows
> skipping the full sub-tree scan when throttling a hierarchy (not in this patch).
>
> Signed-off-by: Konstantin Khlebnikov
> Cc: Stable # v3.2+
> ---
>  kernel/sched/fair.c  | 19 +++++++++++++++++++
>  kernel/sched/sched.h |  2 +-
>  2 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 218f8e83db73..fe809fe169d2 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4185,6 +4185,25 @@ static void check_enqueue_throttle(struct cfs_rq *cfs_rq)
>  	if (!cfs_bandwidth_used())
>  		return;
>
> +	/* synchronize hierarchical throttle counter */
> +	if (unlikely(!cfs_rq->throttle_uptodate)) {
> +		struct rq *rq = rq_of(cfs_rq);
> +		struct cfs_rq *pcfs_rq;
> +		struct task_group *tg;
> +
> +		cfs_rq->throttle_uptodate = 1;
> +		/* get closest uptodate node because leaves goes first */
> +		for (tg = cfs_rq->tg->parent; tg; tg = tg->parent) {
> +			pcfs_rq = tg->cfs_rq[cpu_of(rq)];
> +			if (pcfs_rq->throttle_uptodate)
> +				break;
> +		}
> +		if (tg) {
> +			cfs_rq->throttle_count = pcfs_rq->throttle_count;
> +			cfs_rq->throttled_clock_task = rq_clock_task(rq);
> +		}
> +	}
> +
>  	/* an active group must be handled by the update_curr()->put() path */
>  	if (!cfs_rq->runtime_enabled || cfs_rq->curr)
>  		return;
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 72f1f3087b04..7cbeb92a1cb9 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -437,7 +437,7 @@ struct cfs_rq {
>
>  	u64 throttled_clock, throttled_clock_task;
>  	u64 throttled_clock_task_time;
> -	int throttled, throttle_count;
> +	int throttled, throttle_count, throttle_uptodate;
>  	struct list_head throttled_list;
>  #endif /* CONFIG_CFS_BANDWIDTH */
>  #endif /* CONFIG_FAIR_GROUP_SCHED */
>

--
Konstantin
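
For readers who want to follow the lazy synchronization outside the kernel tree, here is a rough user-space model of the walk that the hunk adds to check_enqueue_throttle(). It is only a sketch under simplifying assumptions, not the kernel code: NR_CPUS, tg_init() and sync_throttle_count() are hypothetical names made up for illustration, the per-CPU cfs_rq is embedded in the group rather than reached through a pointer array, and the rq clock update (throttled_clock_task) is left out.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 2	/* hypothetical; kept small for illustration */

struct task_group;

/* Simplified stand-in for struct cfs_rq: only the fields the patch touches. */
struct cfs_rq {
	struct task_group *tg;
	int throttle_count;
	int throttle_uptodate;
};

/* Simplified stand-in for struct task_group (cfs_rq embedded, not a pointer array). */
struct task_group {
	struct task_group *parent;
	struct cfs_rq cfs_rq[NR_CPUS];
};

/* Hypothetical helper: a new group starts with a zero, not-yet-synchronized counter. */
static void tg_init(struct task_group *tg, struct task_group *parent)
{
	tg->parent = parent;
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		tg->cfs_rq[cpu].tg = tg;
		tg->cfs_rq[cpu].throttle_count = 0;
		tg->cfs_rq[cpu].throttle_uptodate = 0;
	}
}

/*
 * Models the added block: on first enqueue, copy throttle_count from the
 * closest ancestor on the same CPU whose counter is already up to date.
 */
static void sync_throttle_count(struct cfs_rq *cfs_rq, int cpu)
{
	struct cfs_rq *pcfs_rq = NULL;
	struct task_group *tg;

	assert(cpu >= 0 && cpu < NR_CPUS);

	if (cfs_rq->throttle_uptodate)
		return;

	cfs_rq->throttle_uptodate = 1;
	for (tg = cfs_rq->tg->parent; tg; tg = tg->parent) {
		pcfs_rq = &tg->cfs_rq[cpu];
		if (pcfs_rq->throttle_uptodate)
			break;
	}
	/* tg is NULL when no ancestor is up to date; the counter then stays 0. */
	if (tg)
		cfs_rq->throttle_count = pcfs_rq->throttle_count;
}
```

In this model a group created under a throttled parent picks up the parent's counter the first time sync_throttle_count() runs for it, even when an intermediate group has not been synchronized yet, because the loop skips ancestors that are not up to date.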