Subject: Re: [PATCH RT v2 2/3] sched: Lazy migrate_disable processing
From: Waiman Long
Organization: Red Hat
To: Scott Wood, Sebastian Andrzej Siewior
Cc: Thomas Gleixner, Steven Rostedt, Peter Zijlstra, Juri Lelli,
    linux-rt-users@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Fri, 18 Oct 2019 14:12:04 -0400
Message-ID: <67dca2a2-5322-ca9d-d093-4bbdfc4aec29@redhat.com>
In-Reply-To: <20191012065214.28109-3-swood@redhat.com>
References: <20191012065214.28109-1-swood@redhat.com>
 <20191012065214.28109-3-swood@redhat.com>

On 10/12/19 2:52 AM, Scott Wood wrote:
> Avoid overhead on the majority of migrate disable/enable sequences by
> only manipulating scheduler data (and grabbing the relevant locks) when
> the task actually schedules while migrate-disabled. A kernel build
> showed around a 10% reduction in system time (with CONFIG_NR_CPUS=512).
>
> Instead of cpuhp_pin_lock, CPU hotplug is handled by keeping a per-CPU
> count of the number of pinned tasks (including tasks which have not
> scheduled in the migrate-disabled section); takedown_cpu() will wait
> until that reaches zero (confirmed by take_cpu_down() in stop machine
> context to deal with races) before migrating tasks off of the cpu.
>
> To simplify synchronization, updating cpus_mask is no longer deferred
> until migrate_enable(). This lets us not have to worry about
> migrate_enable() missing the update if it's on the fast path (didn't
> schedule during the migrate disabled section).
> It also makes the code a bit simpler and reduces deviation from
> mainline.
>
> While the main motivation for this is the performance benefit, lazy
> migrate disable also eliminates the restriction on calling
> migrate_disable() while atomic but leaving the atomic region prior to
> calling migrate_enable() -- though this won't help with
> local_bh_disable() (and thus rcutorture) unless something similar is
> done with the recently added local_lock.
>
> Signed-off-by: Scott Wood
> ---
> The speedup is smaller than before, due to commit 659252061477862f
> ("lib/smp_processor_id: Don't use cpumask_equal()") achieving an
> overlapping speedup.
> ---
>  include/linux/cpu.h    |   4 --
>  include/linux/sched.h  |  11 +--
>  init/init_task.c       |   4 ++
>  kernel/cpu.c           | 103 +++++++++++-----------
>  kernel/sched/core.c    | 180 ++++++++++++++++++++-----------------------------
>  kernel/sched/sched.h   |   4 ++
>  lib/smp_processor_id.c |   3 +
>  7 files changed, 128 insertions(+), 181 deletions(-)
>
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index f4a772c12d14..2df500fdcbc4 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -113,8 +113,6 @@ static inline void cpu_maps_update_done(void)
>  extern void cpu_hotplug_enable(void);
>  void clear_tasks_mm_cpumask(int cpu);
>  int cpu_down(unsigned int cpu);
> -extern void pin_current_cpu(void);
> -extern void unpin_current_cpu(void);
>
>  #else /* CONFIG_HOTPLUG_CPU */
>
> @@ -126,8 +124,6 @@ static inline void cpus_read_unlock(void) { }
>  static inline void lockdep_assert_cpus_held(void) { }
>  static inline void cpu_hotplug_disable(void) { }
>  static inline void cpu_hotplug_enable(void) { }
> -static inline void pin_current_cpu(void) { }
> -static inline void unpin_current_cpu(void) { }
>
>  #endif /* !CONFIG_HOTPLUG_CPU */
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 7e892e727f12..c6872b38bf77 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -229,6 +229,8 @@
>  extern long io_schedule_timeout(long timeout);
>  extern void io_schedule(void);
>
> +int cpu_nr_pinned(int cpu);
> +
>  /**
>   * struct prev_cputime - snapshot of system and user cputime
>   * @utime:	time spent in user mode
> @@ -661,16 +663,13 @@ struct task_struct {
>  	cpumask_t			cpus_mask;
>  #if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
>  	int				migrate_disable;
> -	int				migrate_disable_update;
> -	int				pinned_on_cpu;
> +	bool				migrate_disable_scheduled;
>  # ifdef CONFIG_SCHED_DEBUG
> -	int				migrate_disable_atomic;
> +	int				pinned_on_cpu;
>  # endif
> -
>  #elif !defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE)
>  # ifdef CONFIG_SCHED_DEBUG
>  	int				migrate_disable;
> -	int				migrate_disable_atomic;
>  # endif
>  #endif
>  #ifdef CONFIG_PREEMPT_RT_FULL
> @@ -2074,4 +2073,6 @@ static inline void rseq_syscall(struct pt_regs *regs)
>
>  #endif
>
> +extern struct task_struct *takedown_cpu_task;
> +
>  #endif
> diff --git a/init/init_task.c b/init/init_task.c
> index e402413dc47d..c0c7618fd2fb 100644
> --- a/init/init_task.c
> +++ b/init/init_task.c
> @@ -81,6 +81,10 @@ struct task_struct init_task
>  	.cpus_ptr	= &init_task.cpus_mask,
>  	.cpus_mask	= CPU_MASK_ALL,
>  	.nr_cpus_allowed= NR_CPUS,
> +#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT_BASE) && \
> +	defined(CONFIG_SCHED_DEBUG)
> +	.pinned_on_cpu	= -1,
> +#endif
>  	.mm		= NULL,
>  	.active_mm	= &init_mm,
>  	.restart_block	= {
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 25afa2bb1a2c..e1bf3c698a32 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -76,11 +76,6 @@ static DEFINE_PER_CPU(struct cpuhp_cpu_state, cpuhp_state) = {
>  	.fail = CPUHP_INVALID,
>  };
>
> -#if defined(CONFIG_HOTPLUG_CPU) && defined(CONFIG_PREEMPT_RT_FULL)
> -static DEFINE_PER_CPU(struct rt_rw_lock, cpuhp_pin_lock) = \
> -	__RWLOCK_RT_INITIALIZER(cpuhp_pin_lock);
> -#endif
> -
>  #if defined(CONFIG_LOCKDEP) && defined(CONFIG_SMP)
>  static struct lockdep_map cpuhp_state_up_map =
>  	STATIC_LOCKDEP_MAP_INIT("cpuhp_state-up", &cpuhp_state_up_map);
> @@ -287,57 +282,6 @@ void cpu_maps_update_done(void)
>
>  #ifdef CONFIG_HOTPLUG_CPU
>
> -/**
> - * pin_current_cpu - Prevent the current cpu from being unplugged
> - */
> -void pin_current_cpu(void)
> -{
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -	struct rt_rw_lock *cpuhp_pin;
> -	unsigned int cpu;
> -	int ret;
> -
> -again:
> -	cpuhp_pin = this_cpu_ptr(&cpuhp_pin_lock);
> -	ret = __read_rt_trylock(cpuhp_pin);
> -	if (ret) {
> -		current->pinned_on_cpu = smp_processor_id();
> -		return;
> -	}
> -	cpu = smp_processor_id();
> -	preempt_lazy_enable();
> -	preempt_enable();
> -
> -	sleeping_lock_inc();
> -	__read_rt_lock(cpuhp_pin);
> -	sleeping_lock_dec();
> -
> -	preempt_disable();
> -	preempt_lazy_disable();
> -	if (cpu != smp_processor_id()) {
> -		__read_rt_unlock(cpuhp_pin);
> -		goto again;
> -	}
> -	current->pinned_on_cpu = cpu;
> -#endif
> -}
> -
> -/**
> - * unpin_current_cpu - Allow unplug of current cpu
> - */
> -void unpin_current_cpu(void)
> -{
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -	struct rt_rw_lock *cpuhp_pin = this_cpu_ptr(&cpuhp_pin_lock);
> -
> -	if (WARN_ON(current->pinned_on_cpu != smp_processor_id()))
> -		cpuhp_pin = per_cpu_ptr(&cpuhp_pin_lock, current->pinned_on_cpu);
> -
> -	current->pinned_on_cpu = -1;
> -	__read_rt_unlock(cpuhp_pin);
> -#endif
> -}
> -
>  DEFINE_STATIC_PERCPU_RWSEM(cpu_hotplug_lock);
>
>  void cpus_read_lock(void)
> @@ -895,6 +839,15 @@ static int take_cpu_down(void *_param)
>  	int err, cpu = smp_processor_id();
>  	int ret;
>
> +#ifdef CONFIG_PREEMPT_RT_BASE
> +	/*
> +	 * If any tasks disabled migration before we got here,
> +	 * go back and sleep again.
> +	 */
> +	if (cpu_nr_pinned(cpu))
> +		return -EAGAIN;
> +#endif
> +
>  	/* Ensure this CPU doesn't handle any more interrupts. */
>  	err = __cpu_disable();
>  	if (err < 0)
> @@ -924,11 +877,10 @@ static int take_cpu_down(void *_param)
>  	return 0;
>  }
>
> +struct task_struct *takedown_cpu_task;
> +
>  static int takedown_cpu(unsigned int cpu)
>  {
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -	struct rt_rw_lock *cpuhp_pin = per_cpu_ptr(&cpuhp_pin_lock, cpu);
> -#endif
>  	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>  	int err;
>
> @@ -941,17 +893,38 @@ static int takedown_cpu(unsigned int cpu)
>  	 */
>  	irq_lock_sparse();
>
> -#ifdef CONFIG_PREEMPT_RT_FULL
> -	__write_rt_lock(cpuhp_pin);
> +#ifdef CONFIG_PREEMPT_RT_BASE
> +	WARN_ON_ONCE(takedown_cpu_task);
> +	takedown_cpu_task = current;

I don't know how likely it is for more than one task to call
takedown_cpu() concurrently. But if that can happen, there is a
possibility of a missed wakeup. The current code is fragile. How about
something like:

	while (cmpxchg(&takedown_cpu_task, NULL, current)) {
		WARN_ON_ONCE(1);
		schedule();
	}

Cheers,
Longman
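For reference, the compare-and-swap claim pattern suggested above can be
illustrated with a minimal userspace sketch (plain C11 atomics and
pthreads rather than kernel code; slot_owner, claim_slot() and
release_slot() are made-up names standing in for takedown_cpu_task and
its users). The point is that a caller installs itself only when the
slot is empty, so a concurrent caller can never silently overwrite an
earlier claim:

/*
 * Minimal userspace sketch of the cmpxchg-based claim suggested above.
 * NOT kernel code: slot_owner, claim_slot() and release_slot() are
 * illustrative stand-ins only.  Build with: cc -std=c11 -pthread claim.c
 */
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

static atomic_long slot_owner;		/* 0 means "no owner" */

static void claim_slot(long id)
{
	long expected = 0;

	/*
	 * Analogous to cmpxchg(&takedown_cpu_task, NULL, current):
	 * install ourselves only if the slot is currently empty.
	 * A plain store here could silently overwrite another
	 * caller's claim, which is the missed-wakeup hazard.
	 */
	while (!atomic_compare_exchange_strong(&slot_owner, &expected, id)) {
		expected = 0;	/* CAS wrote back the observed owner; retry */
		sched_yield();	/* the kernel version would schedule() */
	}
}

static void release_slot(long id)
{
	long expected = id;

	/* Only the current owner clears the slot. */
	atomic_compare_exchange_strong(&slot_owner, &expected, 0);
}

static void *worker(void *arg)
{
	long id = (long)(intptr_t)arg;

	claim_slot(id);
	printf("caller %ld owns the slot\n", id);
	release_slot(id);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, worker, (void *)(intptr_t)1);
	pthread_create(&b, NULL, worker, (void *)(intptr_t)2);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}

In the kernel context the loser of the race would presumably schedule()
as in the suggestion above, so whichever task actually installed itself
in takedown_cpu_task is the one that later gets woken.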