From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B96FC433FE for ; Sun, 9 Oct 2022 13:17:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230052AbiJINRp (ORCPT ); Sun, 9 Oct 2022 09:17:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54988 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229749AbiJINRm (ORCPT ); Sun, 9 Oct 2022 09:17:42 -0400 Received: from mail-pl1-x633.google.com (mail-pl1-x633.google.com [IPv6:2607:f8b0:4864:20::633]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 77338237F3 for ; Sun, 9 Oct 2022 06:17:40 -0700 (PDT) Received: by mail-pl1-x633.google.com with SMTP id l4so8303685plb.8 for ; Sun, 09 Oct 2022 06:17:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:in-reply-to:references:cc:to:from :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=o8DJS0Bp8z45upCAfsSfSsEEeWWUheV9TXrd7HE4E5Q=; b=HabQFMpeWkb0G9twyGKYTxVx3VK2Y0/WFt2RsmoBIsBD6hPeQylWAWegVnix/khHbi gOrS2QCBvW2qK5mQ2Q6R4zhju5Xli6bItriGACBPMtq5pujMzxEvkROCVd2EGeUQhwrg Z+Rl49ecGsI/JbmleX02H77azPRiQZOqHOWqL6Kzw+hejCkOf6AecvQmlegwua2AAZwA aIp4Df7dKKJDn5H9fODP68s1Pi3rSA+BybbK82HeJKVf46IjMktDe3XVds4ZOQV7QDhL uHo1jYQfLH1jHhfxQHorpl18sS+gfXa+EiL3AC25/oUk6CRNggNrHl6fKgiGqtIUuGAi Q+ag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:references:cc:to:from :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=o8DJS0Bp8z45upCAfsSfSsEEeWWUheV9TXrd7HE4E5Q=; b=Il6EfUFlNdwUVeMZUjyeGWqqWh1jWwm6umN44Qw+2fEO7WPWUH6QN6WlQyNR0cCtPZ IjOR3zrxaehpu4A0NxhyHN9sU0d4nUzK2Wirfx4OPTCKZrOMmNhX0SDJRJtBW0XSPOmm OcfgbkYWnUJCCGxbylbwPI+qlaJ/ihBBAi+FH7jZdqFL7dyLY651QXtnzxTfT9ThvQ5s 8e3tonpga0XvoZZw/qRG/s19R7SYbbS8kCfSxyZayGh8ZFTZbgCB+i3EjY3G5y3MlIJj HlUzqGJKLVF0TErO8NoNxf9irKxg+AyZIE9XPWkz1QcxBRJ2mix2cFreEjxoDurE5Bne LDTQ== X-Gm-Message-State: ACrzQf0OPQjxuasH4UjfbtTe1hBIeB9pj5bgYcaSLiqFknEpF3AEjKSO ZUHlO1lVB/FEzx/IyP0VDRxhzA== X-Google-Smtp-Source: AMsMyM7Lv/jvgPK2zEH9j3o0xXv6dFpXCCN53f8L0eM0CaGfsvbD6ewAA3iuiiP7yT43P0kLYIAjtw== X-Received: by 2002:a17:902:b945:b0:181:c6b6:abc with SMTP id h5-20020a170902b94500b00181c6b60abcmr3524401pls.75.1665321459931; Sun, 09 Oct 2022 06:17:39 -0700 (PDT) Received: from [10.70.253.98] ([139.177.225.246]) by smtp.gmail.com with ESMTPSA id g5-20020aa796a5000000b00561d79f1064sm4958765pfk.57.2022.10.09.06.17.36 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 09 Oct 2022 06:17:38 -0700 (PDT) Message-ID: Date: Sun, 9 Oct 2022 21:17:34 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.3.1 Subject: Re: PSI idle-shutoff Content-Language: en-US From: Chengming Zhou To: Suren Baghdasaryan , Pavan Kondeti , Johannes Weiner Cc: LKML , Charan Teja Kalla References: <20220913140817.GA9091@hu-pkondeti-hyd.qualcomm.com> <20220915062027.GA14713@hu-pkondeti-hyd.qualcomm.com> <43f4d1c3-52fe-5254-7d50-c420de6d11a6@bytedance.com> In-Reply-To: <43f4d1c3-52fe-5254-7d50-c420de6d11a6@bytedance.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2022/10/9 20:41, Chengming Zhou wrote: > Hello, > > I just saw these emails, sorry for late. > > On 2022/10/6 00:32, Suren Baghdasaryan wrote: >> On Sun, Oct 2, 2022 at 11:11 PM Suren Baghdasaryan wrote: >>> >>> On Fri, Sep 16, 2022 at 10:45 PM Suren Baghdasaryan wrote: >>>> >>>> On Wed, Sep 14, 2022 at 11:20 PM Pavan Kondeti >>>> wrote: >>>>> >>>>> On Tue, Sep 13, 2022 at 07:38:17PM +0530, Pavan Kondeti wrote: >>>>>> Hi >>>>>> >>>>>> The fact that psi_avgs_work()->collect_percpu_times()->get_recent_times() >>>>>> run from a kworker thread, PSI_NONIDLE condition would be observed as >>>>>> there is a RUNNING task. So we would always end up re-arming the work. >>>>>> >>>>>> If the work is re-armed from the psi_avgs_work() it self, the backing off >>>>>> logic in psi_task_change() (will be moved to psi_task_switch soon) can't >>>>>> help. The work is already scheduled. so we don't do anything there. >>>> >>>> Hi Pavan, >>>> Thanks for reporting the issue. IIRC [1] was meant to fix exactly this >>>> issue. At the time it was written I tested it and it seemed to work. >>>> Maybe I missed something or some other change introduced afterwards >>>> affected the shutoff logic. I'll take a closer look next week when I'm >>>> back at my computer and will consult with Johannes. >>> >>> Sorry for the delay. I had some time to look into this and test psi >>> shutoff on my device and I think you are right. The patch I mentioned >>> prevents new psi_avgs_work from being scheduled when the only non-idle >>> task is psi_avgs_work itself, however the regular 2sec averaging work >>> will still go on. I think we could record the fact that the only >>> active task is psi_avgs_work in record_times() using a new >>> psi_group_cpu.state_mask flag and then prevent psi_avgs_work() from >>> rescheduling itself if that flag is set for all non-idle cpus. I'll >>> test this approach and will post a patch for review if that works. >> >> Hi Pavan, >> Testing PSI shutoff on Android proved more difficult than I expected. >> Lots of tasks to silence and I keep encountering new ones. >> The approach I was thinking about is something like this: >> >> --- >> include/linux/psi_types.h | 3 +++ >> kernel/sched/psi.c | 12 +++++++++--- >> 2 files changed, 12 insertions(+), 3 deletions(-) >> >> diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h >> index c7fe7c089718..8d936f22cb5b 100644 >> --- a/include/linux/psi_types.h >> +++ b/include/linux/psi_types.h >> @@ -68,6 +68,9 @@ enum psi_states { >> NR_PSI_STATES = 7, >> }; >> >> +/* state_mask flag to keep re-arming averaging work */ >> +#define PSI_STATE_WAKE_CLOCK (1 << NR_PSI_STATES) >> + >> enum psi_aggregators { >> PSI_AVGS = 0, >> PSI_POLL, >> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c >> index ecb4b4ff4ce0..dd62ad28bacd 100644 >> --- a/kernel/sched/psi.c >> +++ b/kernel/sched/psi.c >> @@ -278,6 +278,7 @@ static void get_recent_times(struct psi_group >> *group, int cpu, >> if (delta) >> *pchanged_states |= (1 << s); >> } >> + *pchanged_states |= (state_mask & PSI_STATE_WAKE_CLOCK); > > If the avgs_work kworker is running on this CPU, it will still see > PSI_STATE_WAKE_CLOCK set in state_mask? So the work will be re-armed? > > Maybe I missed something... but I have another different idea which > I want to implement later only for discussion. diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c index ee2ecc081422..f322e8fd8d41 100644 --- a/kernel/sched/psi.c +++ b/kernel/sched/psi.c @@ -241,11 +241,13 @@ static void get_recent_times(struct psi_group *group, int cpu, enum psi_aggregators aggregator, u32 *times, u32 *pchanged_states) { + int current_cpu = raw_smp_processor_id(); struct psi_group_cpu *groupc = per_cpu_ptr(group->pcpu, cpu); u64 now, state_start; enum psi_states s; unsigned int seq; u32 state_mask; + bool only_avgs_work = false; *pchanged_states = 0; @@ -256,6 +258,14 @@ static void get_recent_times(struct psi_group *group, int cpu, memcpy(times, groupc->times, sizeof(groupc->times)); state_mask = groupc->state_mask; state_start = groupc->state_start; + /* + * This CPU has only avgs_work kworker running, snapshot the + * newest times then don't need to re-arm work for this groupc. + * Normally this kworker will sleep soon and won't + * wake_clock in psi_group_change(). + */ + if (current_cpu == cpu && groupc->tasks[NR_RUNNING] == 1) + only_avgs_work = true; } while (read_seqcount_retry(&groupc->seq, seq)); /* Calculate state time deltas against the previous snapshot */ @@ -280,6 +290,10 @@ static void get_recent_times(struct psi_group *group, int cpu, if (delta) *pchanged_states |= (1 << s); } + + /* Clear PSI_NONIDLE so avgs_work won't be re-armed for this groupc */ + if (only_avgs_work) + *pchanged_states &= ~(1 << PSI_NONIDLE); } static void calc_avgs(unsigned long avg[3], int missed_periods, > > Thanks. > >> } >> >> static void calc_avgs(unsigned long avg[3], int missed_periods, >> @@ -413,7 +414,7 @@ static void psi_avgs_work(struct work_struct *work) >> struct delayed_work *dwork; >> struct psi_group *group; >> u32 changed_states; >> - bool nonidle; >> + bool wake_clock; >> u64 now; >> >> dwork = to_delayed_work(work); >> @@ -424,7 +425,7 @@ static void psi_avgs_work(struct work_struct *work) >> now = sched_clock(); >> >> collect_percpu_times(group, PSI_AVGS, &changed_states); >> - nonidle = changed_states & (1 << PSI_NONIDLE); >> + wake_clock = changed_states & PSI_STATE_WAKE_CLOCK; >> /* >> * If there is task activity, periodically fold the per-cpu >> * times and feed samples into the running averages. If things >> @@ -435,7 +436,7 @@ static void psi_avgs_work(struct work_struct *work) >> if (now >= group->avg_next_update) >> group->avg_next_update = update_averages(group, now); >> >> - if (nonidle) { >> + if (wake_clock) { >> schedule_delayed_work(dwork, nsecs_to_jiffies( >> group->avg_next_update - now) + 1); >> } >> @@ -742,6 +743,11 @@ static void psi_group_change(struct psi_group >> *group, int cpu, >> if (unlikely(groupc->tasks[NR_ONCPU] && cpu_curr(cpu)->in_memstall)) >> state_mask |= (1 << PSI_MEM_FULL); >> >> + if (wake_clock || test_state(groupc->tasks, PSI_NONIDLE)) { >> + /* psi_avgs_work was not the only task on the CPU */ >> + state_mask |= PSI_STATE_WAKE_CLOCK; >> + } >> + >> groupc->state_mask = state_mask; >> >> write_seqcount_end(&groupc->seq);