From: Hao Jia <jiahao.os@bytedance.com>
To: mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
    vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
    rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
    bristot@redhat.com
Cc: linux-kernel@vger.kernel.org, Hao Jia <jiahao.os@bytedance.com>
Subject: [PATCH] sched/core: Avoid obvious double update_rq_clock warning
Date: Mon, 18 Apr 2022 17:09:29 +0800
Message-Id: <20220418090929.54005-1-jiahao.os@bytedance.com>

When we use raw_spin_rq_lock() to acquire the rq lock and have to
update the rq clock while holding the lock, the kernel may issue a
WARN_DOUBLE_CLOCK warning. Because we acquire the rq lock directly
with raw_spin_rq_lock() instead of rq_lock(), rq->clock_update_flags
is never adjusted. In particular, once we have taken the rq lock of
another core, that core's rq->clock_update_flags may still be
RQCF_UPDATED, so a subsequent call to update_rq_clock() triggers the
WARN_DOUBLE_CLOCK warning.
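For context, the bookkeeping involved looks roughly like this (a
condensed sketch paraphrasing kernel/sched/core.c and
kernel/sched/sched.h; simplified, not the literal source):

/* Sketch of update_rq_clock(), CONFIG_SCHED_DEBUG parts only. */
void update_rq_clock(struct rq *rq)
{
	lockdep_assert_rq_held(rq);

	if (rq->clock_update_flags & RQCF_ACT_SKIP)
		return;

	if (sched_feat(WARN_DOUBLE_CLOCK))
		/* Fires if nothing cleared RQCF_UPDATED since the last update. */
		SCHED_WARN_ON(rq->clock_update_flags & RQCF_UPDATED);
	rq->clock_update_flags |= RQCF_UPDATED;

	/* ... advance rq->clock and rq->clock_task ... */
}

/*
 * Sketch of rq_pin_lock(): this is the part that rq_lock() does
 * and raw_spin_rq_lock() does not.
 */
static inline void rq_pin_lock(struct rq *rq, struct rq_flags *rf)
{
	rf->cookie = lockdep_pin_lock(__rq_lockp(rq));

	/* Drop RQCF_UPDATED so the next update_rq_clock() won't warn. */
	rq->clock_update_flags &= (RQCF_REQ_SKIP | RQCF_ACT_SKIP);
	rf->clock_update_flags = 0;
}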
Some example call traces:

Call Trace 1:
 sched_rt_period_timer+0x10f/0x3a0
 ? enqueue_top_rt_rq+0x110/0x110
 __hrtimer_run_queues+0x1a9/0x490
 hrtimer_interrupt+0x10b/0x240
 __sysvec_apic_timer_interrupt+0x8a/0x250
 sysvec_apic_timer_interrupt+0x9a/0xd0
 asm_sysvec_apic_timer_interrupt+0x12/0x20

Call Trace 2:
 activate_task+0x8b/0x110
 push_rt_task.part.108+0x241/0x2c0
 push_rt_tasks+0x15/0x30
 finish_task_switch+0xaa/0x2e0
 ? __switch_to+0x134/0x420
 __schedule+0x343/0x8e0
 ? hrtimer_start_range_ns+0x101/0x340
 schedule+0x4e/0xb0
 do_nanosleep+0x8e/0x160
 hrtimer_nanosleep+0x89/0x120
 ? hrtimer_init_sleeper+0x90/0x90
 __x64_sys_nanosleep+0x96/0xd0
 do_syscall_64+0x34/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Call Trace 3:
 deactivate_task+0x93/0xe0
 pull_rt_task+0x33e/0x400
 balance_rt+0x7e/0x90
 __schedule+0x62f/0x8e0
 do_task_dead+0x3f/0x50
 do_exit+0x7b8/0xbb0
 do_group_exit+0x2d/0x90
 get_signal+0x9df/0x9e0
 ? preempt_count_add+0x56/0xa0
 ? __remove_hrtimer+0x35/0x70
 arch_do_signal_or_restart+0x36/0x720
 ? nanosleep_copyout+0x39/0x50
 ? do_nanosleep+0x131/0x160
 ? audit_filter_inodes+0xf5/0x120
 exit_to_user_mode_prepare+0x10f/0x1e0
 syscall_exit_to_user_mode+0x17/0x30
 do_syscall_64+0x40/0x90
 entry_SYSCALL_64_after_hwframe+0x44/0xae

Steps to reproduce:
1. Enable CONFIG_SCHED_DEBUG when compiling the kernel
2. echo 1 > /sys/kernel/debug/clear_warn_once
   echo "WARN_DOUBLE_CLOCK" > /sys/kernel/debug/sched_features
   echo "NO_RT_PUSH_IPI" > /sys/kernel/debug/sched_features
3. Run some RT tasks that periodically change their priority and
   sleep, e.g.:

/* build with: gcc -pthread -lm; setting an RT policy needs root */
#include <math.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

#define MAGIC_NUM 1024	/* arbitrary value; only used to burn cycles */

void *ThreadFun(void *arg)
{
	int cnt = *(int *)arg;
	struct sched_param param;

	while (1) {
		sqrt(MAGIC_NUM);
		cnt = cnt % 10 + 1;
		param.sched_priority = cnt;
		pthread_setschedparam(pthread_self(), SCHED_RR, &param);
		sqrt(MAGIC_NUM);
		sqrt(MAGIC_NUM);
		sleep(cnt);
	}
	return NULL;
}
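The three push/pull hunks below all follow the same shape: bracket the
task migration with rq_pin_lock()/rq_unpin_lock() so that stale
RQCF_UPDATED flags on both runqueues are cleared before anything
updates the clock. Schematically (an illustrative sketch only; dst_rq
and p stand in for later_rq/lowest_rq/this_rq and the migrated task):

	struct rq_flags srf, drf;

	/*
	 * Both rq locks are already held here, via raw_spin_rq_lock()
	 * plus double_lock_balance(); pinning only fixes up the clock
	 * flags, it does not take any lock.
	 */
	rq_pin_lock(rq, &srf);
	rq_pin_lock(dst_rq, &drf);

	deactivate_task(rq, p, 0);
	set_task_cpu(p, dst_rq->cpu);
	activate_task(dst_rq, p, 0);	/* may safely update_rq_clock() now */

	rq_unpin_lock(rq, &srf);
	rq_unpin_lock(dst_rq, &drf);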
Signed-off-by: Hao Jia <jiahao.os@bytedance.com>
---
 kernel/sched/deadline.c | 18 +++++++++++-------
 kernel/sched/rt.c       | 20 ++++++++++++++++++--
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index fb4255ae0b2c..9207b978cc43 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2258,6 +2258,7 @@ static int push_dl_task(struct rq *rq)
 {
 	struct task_struct *next_task;
 	struct rq *later_rq;
+	struct rq_flags srf, drf;
 	int ret = 0;

 	if (!rq->dl.overloaded)
@@ -2317,16 +2318,14 @@ static int push_dl_task(struct rq *rq)
 		goto retry;
 	}

+	rq_pin_lock(rq, &srf);
+	rq_pin_lock(later_rq, &drf);
 	deactivate_task(rq, next_task, 0);
 	set_task_cpu(next_task, later_rq->cpu);
-
-	/*
-	 * Update the later_rq clock here, because the clock is used
-	 * by the cpufreq_update_util() inside __add_running_bw().
-	 */
-	update_rq_clock(later_rq);
-	activate_task(later_rq, next_task, ENQUEUE_NOCLOCK);
+	activate_task(later_rq, next_task, 0);
 	ret = 1;
+	rq_unpin_lock(rq, &srf);
+	rq_unpin_lock(later_rq, &drf);

 	resched_curr(later_rq);

@@ -2351,6 +2350,7 @@ static void pull_dl_task(struct rq *this_rq)
 	struct task_struct *p, *push_task;
 	bool resched = false;
 	struct rq *src_rq;
+	struct rq_flags this_rf, src_rf;
 	u64 dmin = LONG_MAX;

 	if (likely(!dl_overloaded(this_rq)))
@@ -2413,11 +2413,15 @@ static void pull_dl_task(struct rq *this_rq)
 			if (is_migration_disabled(p)) {
 				push_task = get_push_task(src_rq);
 			} else {
+				rq_pin_lock(this_rq, &this_rf);
+				rq_pin_lock(src_rq, &src_rf);
 				deactivate_task(src_rq, p, 0);
 				set_task_cpu(p, this_cpu);
 				activate_task(this_rq, p, 0);
 				dmin = p->dl.deadline;
 				resched = true;
+				rq_unpin_lock(this_rq, &this_rf);
+				rq_unpin_lock(src_rq, &src_rf);
 			}

 			/* Is there any other task even earlier? */
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index a32c46889af8..9305ad87fef0 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -871,6 +871,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 		int enqueue = 0;
 		struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
 		struct rq *rq = rq_of_rt_rq(rt_rq);
+		struct rq_flags rf;
 		int skip;

 		/*
@@ -885,7 +886,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 		if (skip)
 			continue;

-		raw_spin_rq_lock(rq);
+		rq_lock(rq, &rf);
 		update_rq_clock(rq);
 		if (rt_rq->rt_time) {
@@ -923,7 +924,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
 		if (enqueue)
 			sched_rt_rq_enqueue(rt_rq);
-		raw_spin_rq_unlock(rq);
+		rq_unlock(rq, &rf);
 	}

 	if (!throttled && (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF))
@@ -2001,6 +2002,7 @@ static int push_rt_task(struct rq *rq, bool pull)
 {
 	struct task_struct *next_task;
 	struct rq *lowest_rq;
+	struct rq_flags srf, drf;
 	int ret = 0;

 	if (!rq->rt.overloaded)
@@ -2102,9 +2104,18 @@ static int push_rt_task(struct rq *rq, bool pull)
 		goto retry;
 	}

+	/*
+	 * We may drop rq's lock in double_lock_balance(), so we
+	 * still need to clear the RQCF_UPDATED flag to avoid the
+	 * WARN_DOUBLE_CLOCK warning.
+	 */
+	rq_pin_lock(rq, &srf);
+	rq_pin_lock(lowest_rq, &drf);
 	deactivate_task(rq, next_task, 0);
 	set_task_cpu(next_task, lowest_rq->cpu);
 	activate_task(lowest_rq, next_task, 0);
+	rq_unpin_lock(rq, &srf);
+	rq_unpin_lock(lowest_rq, &drf);
 	resched_curr(lowest_rq);
 	ret = 1;
@@ -2299,6 +2310,7 @@ static void pull_rt_task(struct rq *this_rq)
 	bool resched = false;
 	struct task_struct *p, *push_task;
 	struct rq *src_rq;
+	struct rq_flags src_rf, this_rf;
 	int rt_overload_count = rt_overloaded(this_rq);

 	if (likely(!rt_overload_count))
@@ -2375,10 +2387,14 @@ static void pull_rt_task(struct rq *this_rq)
 			if (is_migration_disabled(p)) {
 				push_task = get_push_task(src_rq);
 			} else {
+				rq_pin_lock(this_rq, &this_rf);
+				rq_pin_lock(src_rq, &src_rf);
 				deactivate_task(src_rq, p, 0);
 				set_task_cpu(p, this_cpu);
 				activate_task(this_rq, p, 0);
 				resched = true;
+				rq_unpin_lock(this_rq, &this_rf);
+				rq_unpin_lock(src_rq, &src_rf);
 			}
 			/*
 			 * We continue with the search, just in
--
2.32.0
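For reference, rq_lock()/rq_unlock() are simply the raw lock plus the
pin/unpin (condensed from kernel/sched/sched.h; annotations trimmed),
which is why the do_sched_rt_period_timer() hunk can switch from
raw_spin_rq_lock() to rq_lock() and pick up the flag handling
automatically:

static inline void rq_lock(struct rq *rq, struct rq_flags *rf)
	__acquires(rq->lock)
{
	raw_spin_rq_lock(rq);
	rq_pin_lock(rq, rf);	/* clears RQCF_UPDATED under SCHED_DEBUG */
}

static inline void rq_unlock(struct rq *rq, struct rq_flags *rf)
	__releases(rq->lock)
{
	rq_unpin_lock(rq, rf);
	raw_spin_rq_unlock(rq);
}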