From: Vineeth Remanan Pillai
Date: Wed, 2 Oct 2019 16:48:14 -0400
Subject: Re: [RFC PATCH v3 00/16] Core scheduling v3
To: Aaron Lu
Cc: Tim Chen, Julien Desfossez, Dario Faggioli, "Li, Aubrey", Aubrey Li,
    Nishanth Aravamudan, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
    Paul Turner, Linus Torvalds, Linux List Kernel Mailing,
    Frédéric Weisbecker, Kees Cook, Greg Kerr, Phil Auld,
    Valentin Schneider, Mel Gorman, Pawan Gupta, Paolo Bonzini

charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Sep 30, 2019 at 7:53 AM Vineeth Remanan Pillai wrote: > > > > Sorry, I misunderstood the fix and I did not initially see the core wide > min_vruntime that you tried to maintain in the rq->core. This approach > seems reasonable. I think we can fix the potential starvation that you > mentioned in the comment by adjusting for the difference in all the children > cfs_rq when we set the minvruntime in rq->core. Since we take the lock for > both the queues, it should be doable and I am trying to see how we can best > do that. > Attaching here with, the 2 patches I was working on in preparation of v4. Patch 1 is an improvement of patch 2 of Aaron where I am propagating the vruntime changes to the whole tree. Patch 2 is an improvement for patch 3 of Aaron where we do resched_curr only when the sibling is forced idle. Micro benchmarks seems good. Will be doing larger set of tests and hopefully posting v4 by end of week. Please let me know what you think of these patches (patch 1 is on top of Aaron's patch 2, patch 2 replaces Aaron's patch 3) Thanks, Vineeth [PATCH 1/2] sched/fair: propagate the min_vruntime change to the whole rq tree When we adjust the min_vruntime of rq->core, we need to propgate that down the tree so as to not cause starvation of existing tasks based on previous vruntime. --- kernel/sched/fair.c | 24 ++++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 59cb01a1563b..e8dd78a8c54d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -476,6 +476,23 @@ static inline u64 cfs_rq_min_vruntime(struct cfs_rq *cfs_rq) return cfs_rq->min_vruntime; } +static void coresched_adjust_vruntime(struct cfs_rq *cfs_rq, u64 delta) +{ + struct sched_entity *se, *next; + + if (!cfs_rq) + return; + + cfs_rq->min_vruntime -= delta; + rbtree_postorder_for_each_entry_safe(se, next, + &cfs_rq->tasks_timeline.rb_root, run_node) { + if (se->vruntime > delta) + se->vruntime -= delta; + if (se->my_q) + coresched_adjust_vruntime(se->my_q, delta); + } +} + static void update_core_cfs_rq_min_vruntime(struct cfs_rq *cfs_rq) { struct cfs_rq *cfs_rq_core; @@ -487,8 +504,11 @@ static void update_core_cfs_rq_min_vruntime(struct cfs_rq *cfs_rq) return; cfs_rq_core = core_cfs_rq(cfs_rq); - cfs_rq_core->min_vruntime = max(cfs_rq_core->min_vruntime, - cfs_rq->min_vruntime); + if (cfs_rq_core != cfs_rq && + cfs_rq->min_vruntime < cfs_rq_core->min_vruntime) { + u64 delta = cfs_rq_core->min_vruntime - cfs_rq->min_vruntime; + coresched_adjust_vruntime(cfs_rq_core, delta); + } } bool cfs_prio_less(struct task_struct *a, struct task_struct *b) -- 2.17.1 [PATCH 2/2] sched/fair : Wake up forced idle siblings if needed If a cpu has only one task and if it has used up its timeslice, then we should try to wake up the sibling to give the forced idle thread a chance. We do that by triggering schedule which will IPI the sibling if the task in the sibling wins the priority check. 
[PATCH 2/2] sched/fair: Wake up forced idle siblings if needed

If a CPU has only one task and that task has used up its timeslice, we
should try to wake up the sibling to give the forced-idle thread a
chance to run. We do that by triggering a schedule, which will IPI the
sibling if the task on the sibling wins the priority check.
---
 kernel/sched/fair.c | 43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e8dd78a8c54d..ba4d929abae6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4165,6 +4165,13 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 		update_min_vruntime(cfs_rq);
 }
 
+static inline bool
+__entity_slice_used(struct sched_entity *se)
+{
+	return (se->sum_exec_runtime - se->prev_sum_exec_runtime) >
+			sched_slice(cfs_rq_of(se), se);
+}
+
 /*
  * Preempt the current task with a newly woken task if needed:
  */
@@ -10052,6 +10059,39 @@ static void rq_offline_fair(struct rq *rq)
 
 #endif /* CONFIG_SMP */
 
+#ifdef CONFIG_SCHED_CORE
+/*
+ * If runqueue has only one task which used up its slice and
+ * if the sibling is forced idle, then trigger schedule
+ * to give forced idle task a chance.
+ */
+static void resched_forceidle(struct rq *rq, struct sched_entity *se)
+{
+	int cpu = cpu_of(rq), sibling_cpu;
+	if (rq->cfs.nr_running > 1 || !__entity_slice_used(se))
+		return;
+
+	for_each_cpu(sibling_cpu, cpu_smt_mask(cpu)) {
+		struct rq *sibling_rq;
+		if (sibling_cpu == cpu)
+			continue;
+		if (cpu_is_offline(sibling_cpu))
+			continue;
+
+		sibling_rq = cpu_rq(sibling_cpu);
+		if (sibling_rq->core_forceidle) {
+			resched_curr(rq);
+			break;
+		}
+	}
+}
+#else
+static inline void resched_forceidle(struct rq *rq, struct sched_entity *se)
+{
+}
+#endif
+
+
 /*
  * scheduler tick hitting a task of our scheduling class.
  *
@@ -10075,6 +10115,9 @@ static void task_tick_fair(struct rq *rq, struct task_struct *curr, int queued)
 
 	update_misfit_status(curr, rq);
 	update_overutilized_status(task_rq(curr));
+
+	if (sched_core_enabled(rq))
+		resched_forceidle(rq, &curr->se);
 }
 
 /*
-- 
2.17.1
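
And for patch 2, a similar user-space sketch of the check behind
__entity_slice_used() (again, the toy_* names and the numbers are
illustrative only, and the fixed 4 ms slice is just an assumed stand-in
for sched_slice()): the runtime accumulated since the entity was last
picked is compared against its slice, and only when the slice is used up
and the sibling is forced idle do we go on to call resched_curr():

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for the two sched_entity runtime counters we care about. */
struct toy_se {
	unsigned long long sum_exec_runtime;		/* ns, total so far */
	unsigned long long prev_sum_exec_runtime;	/* ns, when last picked */
};

/* Assumed stand-in for sched_slice(): pretend every task gets a 4 ms slice. */
static unsigned long long toy_sched_slice(void)
{
	return 4000000ULL;
}

/* Same arithmetic as __entity_slice_used() in the patch. */
static bool toy_entity_slice_used(const struct toy_se *se)
{
	return (se->sum_exec_runtime - se->prev_sum_exec_runtime) >
			toy_sched_slice();
}

int main(void)
{
	struct toy_se se = {
		.sum_exec_runtime	= 9500000ULL,	/* 9.5 ms total */
		.prev_sum_exec_runtime	= 5000000ULL,	/* 5.0 ms at last pick */
	};
	bool sibling_forceidle = true;	/* pretend the sibling is forced idle */

	if (toy_entity_slice_used(&se) && sibling_forceidle)
		printf("slice used up -> resched so the forced idle sibling gets a pick\n");
	else
		printf("keep running, no need to disturb the sibling\n");
	return 0;
}

Here 9.5 ms - 5.0 ms = 4.5 ms of runtime since the last pick, which is
more than the 4 ms slice, so with a forced-idle sibling the tick path
would call resched_curr() and let the core pick run again.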