From: Thomas Gleixner
To: Ingo Molnar
 , Linus Torvalds
Cc: Peter Zijlstra, Ankur Arora, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, x86@kernel.org, akpm@linux-foundation.org,
 luto@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com,
 mingo@redhat.com, juri.lelli@redhat.com, vincent.guittot@linaro.org,
 willy@infradead.org, mgorman@suse.de, rostedt@goodmis.org,
 jon.grimm@amd.com, bharata@amd.com, raghavendra.kt@amd.com,
 boris.ostrovsky@oracle.com, konrad.wilk@oracle.com, jgross@suse.com,
 andrew.cooper3@citrix.com
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED
References: <87edj64rj1.fsf@oracle.com> <87zg1u1h5t.fsf@oracle.com>
 <20230911150410.GC9098@noisy.programming.kicks-ass.net>
 <87h6o01w1a.fsf@oracle.com>
 <20230912082606.GB35261@noisy.programming.kicks-ass.net>
 <87cyyfxd4k.ffs@tglx>
Date: Tue, 19 Sep 2023 15:43:27 +0200
Message-ID: <87fs3awa68.ffs@tglx>

On Tue, Sep 19 2023 at 10:43, Ingo Molnar wrote:
> * Ingo Molnar wrote:
> Ie. a modern scheduler might have mooted much of this change:
>
>    4542057e18ca ("mm: avoid 'might_sleep()' in get_mmap_lock_carefully()")
>
> ... because now we'll only reschedule on timeslice exhaustion, or if a task
> comes in with a big deadline deficit.
>
> And even the deadline-deficit wakeup preemption can be turned off further
> with:
>
>    $ echo NO_WAKEUP_PREEMPTION > /debug/sched/features
>
> And we are considering making that the default behavior for same-prio tasks
> - basically turn same-prio SCHED_OTHER tasks into SCHED_BATCH - which
> should be quite similar to what NEED_RESCHED_LAZY achieves on -rt.

I don't think that you can get rid of NEED_RESCHED_LAZY for !RT because
there is a clear advantage of having the return to user preemption
point. It avoids a kernel/user transition whose only effect is that the
timeslice interrupt brings the task right back into the kernel.
I experimented with that on RT and the result was definitely worse.

We surely can revisit that, but I'd really start with the straightforward
mappable LAZY bit approach. If experimentation shows that not setting
that bit at all provides good enough results, then we can still do so
without changing anything except the core scheduler decision logic.

It's again a cheap thing due to the way the return to user TIF handling
works:

	ti_work = read_thread_flags();

	if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
		ti_work = exit_to_user_mode_loop(regs, ti_work);

TIF_LAZY_RESCHED is part of EXIT_TO_USER_MODE_WORK, so the non-work case
does not become more expensive than today. If any of the bits is set,
then the slowpath won't perform measurably differently whether the bit
is evaluated in exit_to_user_mode_loop() or not.

As we really want TIF_LAZY_RESCHED for RT, we just keep all of this
consistent in terms of code and make it purely a scheduler decision
whether it utilizes the bit or not. As a consequence PREEMPT_RT is no
longer special in that regard and the main RT difference becomes the
lock substitution and forced interrupt threading.

As for the magic 'spare me the extra conditional' optimization of
exit_to_user_mode_loop(): if LAZY can be optimized out for !RT because
the scheduler is sooo clever (which I doubt), we can just use the same
approach as for other TIF bits and define it to 0 :)

So let's start consistent and optimize on top if really required.

Thanks,

        tglx
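
For illustration only, here is a minimal user-space sketch of the two points made above: a lazy bit that is part of the exit work mask costs nothing on the no-work fast path, and defining the bit to 0 lets the compiler drop the check again. This is not kernel code. Every SKETCH_* and sketch_* name, the bit numbers and the counters are hypothetical, and unlike the real exit_to_user_mode_loop() the loop works on a local copy instead of re-reading the thread flags.

```c
#include <assert.h>

/* Hypothetical bit values, chosen for this sketch only. */
#define SKETCH_TIF_SIGPENDING		(1UL << 0)
#define SKETCH_TIF_NEED_RESCHED		(1UL << 1)

/*
 * Building with -DSKETCH_NO_LAZY models a configuration which defines
 * the lazy bit to 0, so every test against it folds away at compile time.
 */
#ifdef SKETCH_NO_LAZY
# define SKETCH_TIF_LAZY_RESCHED	0UL
#else
# define SKETCH_TIF_LAZY_RESCHED	(1UL << 2)
#endif

#define SKETCH_EXIT_WORK						\
	(SKETCH_TIF_SIGPENDING | SKETCH_TIF_NEED_RESCHED |		\
	 SKETCH_TIF_LAZY_RESCHED)

/* Counters stand in for the real work so the behaviour is observable. */
struct sketch_counters {
	unsigned int slowpath;
	unsigned int resched;
	unsigned int signals;
};

/* Slow path: handle and clear every pending work bit. */
static unsigned long sketch_exit_loop(unsigned long work,
				      struct sketch_counters *c)
{
	c->slowpath++;

	while (work & SKETCH_EXIT_WORK) {
		/* Lazy and immediate resched requests share one check. */
		if (work & (SKETCH_TIF_NEED_RESCHED | SKETCH_TIF_LAZY_RESCHED)) {
			c->resched++;	/* stands in for schedule() */
			work &= ~(SKETCH_TIF_NEED_RESCHED |
				  SKETCH_TIF_LAZY_RESCHED);
		}
		if (work & SKETCH_TIF_SIGPENDING) {
			c->signals++;	/* stands in for signal delivery */
			work &= ~SKETCH_TIF_SIGPENDING;
		}
	}
	return work;
}

/* Fast path: one mask test, no matter how many bits are in the mask. */
static unsigned long sketch_exit_to_user(unsigned long flags,
					 struct sketch_counters *c)
{
	unsigned long work = flags;

	if (work & SKETCH_EXIT_WORK)
		work = sketch_exit_loop(work, c);
	return work;
}
```

Calling sketch_exit_to_user() with no bits set never enters the slow path, and calling it with only SKETCH_TIF_LAZY_RESCHED set enters it once and counts one reschedule, which is the whole point of keeping the bit in the mask.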