From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 73EFFC433FE for ; Mon, 10 Oct 2022 06:43:27 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231245AbiJJGnY (ORCPT ); Mon, 10 Oct 2022 02:43:24 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59078 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231220AbiJJGnW (ORCPT ); Mon, 10 Oct 2022 02:43:22 -0400 Received: from mx0b-0031df01.pphosted.com (mx0b-0031df01.pphosted.com [205.220.180.131]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0C9A052DD3 for ; Sun, 9 Oct 2022 23:43:20 -0700 (PDT) Received: from pps.filterd (m0279873.ppops.net [127.0.0.1]) by mx0a-0031df01.pphosted.com (8.17.1.5/8.17.1.5) with ESMTP id 29A4D833017818; Mon, 10 Oct 2022 06:43:17 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=quicinc.com; h=date : from : to : cc : subject : message-id : references : mime-version : content-type : in-reply-to; s=qcppdkim1; bh=boes3ma3opD9+iACwnulK78t7r3+2aMZJGaD0dxYKTc=; b=n2GFhZ2X9wYdBabRUpaxfF4qgY4FFhv6ynLRBhXhkUOGL1Y2uWwj7CPzn3jFJ+7Nmnvn rnv25IB5gwwX1dM8tFYk4CfodRoMGps4PBGGDZjiq/RmHwjj1sTSvBBGjAJwgYfBUek4 EQCI4XD5k/3VPlIA9glJmAopbbEe/Rs2acZdEe703BpR9zsw2/x9PZ7jAHNnMA4PQb5D p3G7gUV3m9j3qfofy1gTonVV42+Dbe8IZgtlh0Zj4CYE5eiUo+N5LucXJl51urh7T1/1 CvsbEFWHTPJiauTkpXap149CRGKt3f44Yga2tS9TQ5zCHLFCPhvkO0Dj3fl6VCLimH2b wA== Received: from nalasppmta04.qualcomm.com (Global_NAT1.qualcomm.com [129.46.96.20]) by mx0a-0031df01.pphosted.com (PPS) with ESMTPS id 3k31q5b4u2-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 10 Oct 2022 06:43:17 +0000 Received: from nalasex01a.na.qualcomm.com (nalasex01a.na.qualcomm.com [10.47.209.196]) by NALASPPMTA04.qualcomm.com (8.17.1.5/8.17.1.5) with ESMTPS id 29A6hGdQ020568 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 10 Oct 2022 06:43:16 GMT Received: from hu-pkondeti-hyd.qualcomm.com (10.80.80.8) by nalasex01a.na.qualcomm.com (10.47.209.196) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.986.29; Sun, 9 Oct 2022 23:43:13 -0700 Date: Mon, 10 Oct 2022 12:13:09 +0530 From: Pavan Kondeti To: Pavan Kondeti CC: Chengming Zhou , Suren Baghdasaryan , Johannes Weiner , LKML , Charan Teja Kalla Subject: Re: PSI idle-shutoff Message-ID: <20221010064309.GD1474@hu-pkondeti-hyd.qualcomm.com> References: <20220913140817.GA9091@hu-pkondeti-hyd.qualcomm.com> <20220915062027.GA14713@hu-pkondeti-hyd.qualcomm.com> <43f4d1c3-52fe-5254-7d50-c420de6d11a6@bytedance.com> <20221010061849.GB1474@hu-pkondeti-hyd.qualcomm.com> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Disposition: inline In-Reply-To: <20221010061849.GB1474@hu-pkondeti-hyd.qualcomm.com> User-Agent: Mutt/1.5.24 (2015-08-30) X-Originating-IP: [10.80.80.8] X-ClientProxiedBy: nasanex01b.na.qualcomm.com (10.46.141.250) To nalasex01a.na.qualcomm.com (10.47.209.196) X-QCInternal: smtphost X-Proofpoint-Virus-Version: vendor=nai engine=6200 definitions=5800 signatures=585085 X-Proofpoint-GUID: JclyZJhjXjMVkg_4FOzjV0wm4gJANVJZ X-Proofpoint-ORIG-GUID: JclyZJhjXjMVkg_4FOzjV0wm4gJANVJZ X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.205,Aquarius:18.0.895,Hydra:6.0.528,FMLib:17.11.122.1 definitions=2022-10-07_04,2022-10-07_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1015 impostorscore=0 spamscore=0 malwarescore=0 priorityscore=1501 mlxscore=0 adultscore=0 mlxlogscore=999 bulkscore=0 suspectscore=0 phishscore=0 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2209130000 definitions=main-2210100040 Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Oct 10, 2022 at 11:48:49AM +0530, Pavan Kondeti wrote: > On Sun, Oct 09, 2022 at 09:17:34PM +0800, Chengming Zhou wrote: > > On 2022/10/9 20:41, Chengming Zhou wrote: > > > Hello, > > > > > > I just saw these emails, sorry for late. > > > > > > On 2022/10/6 00:32, Suren Baghdasaryan wrote: > > >> On Sun, Oct 2, 2022 at 11:11 PM Suren Baghdasaryan wrote: > > >>> > > >>> On Fri, Sep 16, 2022 at 10:45 PM Suren Baghdasaryan wrote: > > >>>> > > >>>> On Wed, Sep 14, 2022 at 11:20 PM Pavan Kondeti > > >>>> wrote: > > >>>>> > > >>>>> On Tue, Sep 13, 2022 at 07:38:17PM +0530, Pavan Kondeti wrote: > > >>>>>> Hi > > >>>>>> > > >>>>>> The fact that psi_avgs_work()->collect_percpu_times()->get_recent_times() > > >>>>>> run from a kworker thread, PSI_NONIDLE condition would be observed as > > >>>>>> there is a RUNNING task. So we would always end up re-arming the work. > > >>>>>> > > >>>>>> If the work is re-armed from the psi_avgs_work() it self, the backing off > > >>>>>> logic in psi_task_change() (will be moved to psi_task_switch soon) can't > > >>>>>> help. The work is already scheduled. so we don't do anything there. > > >>>> > > >>>> Hi Pavan, > > >>>> Thanks for reporting the issue. IIRC [1] was meant to fix exactly this > > >>>> issue. At the time it was written I tested it and it seemed to work. > > >>>> Maybe I missed something or some other change introduced afterwards > > >>>> affected the shutoff logic. I'll take a closer look next week when I'm > > >>>> back at my computer and will consult with Johannes. > > >>> > > >>> Sorry for the delay. I had some time to look into this and test psi > > >>> shutoff on my device and I think you are right. The patch I mentioned > > >>> prevents new psi_avgs_work from being scheduled when the only non-idle > > >>> task is psi_avgs_work itself, however the regular 2sec averaging work > > >>> will still go on. I think we could record the fact that the only > > >>> active task is psi_avgs_work in record_times() using a new > > >>> psi_group_cpu.state_mask flag and then prevent psi_avgs_work() from > > >>> rescheduling itself if that flag is set for all non-idle cpus. I'll > > >>> test this approach and will post a patch for review if that works. > > >> > > >> Hi Pavan, > > >> Testing PSI shutoff on Android proved more difficult than I expected. > > >> Lots of tasks to silence and I keep encountering new ones. > > >> The approach I was thinking about is something like this: > > >> > > >> --- > > >> include/linux/psi_types.h | 3 +++ > > >> kernel/sched/psi.c | 12 +++++++++--- > > >> 2 files changed, 12 insertions(+), 3 deletions(-) > > >> > > >> diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h > > >> index c7fe7c089718..8d936f22cb5b 100644 > > >> --- a/include/linux/psi_types.h > > >> +++ b/include/linux/psi_types.h > > >> @@ -68,6 +68,9 @@ enum psi_states { > > >> NR_PSI_STATES = 7, > > >> }; > > >> > > >> +/* state_mask flag to keep re-arming averaging work */ > > >> +#define PSI_STATE_WAKE_CLOCK (1 << NR_PSI_STATES) > > >> + > > >> enum psi_aggregators { > > >> PSI_AVGS = 0, > > >> PSI_POLL, > > >> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c > > >> index ecb4b4ff4ce0..dd62ad28bacd 100644 > > >> --- a/kernel/sched/psi.c > > >> +++ b/kernel/sched/psi.c > > >> @@ -278,6 +278,7 @@ static void get_recent_times(struct psi_group > > >> *group, int cpu, > > >> if (delta) > > >> *pchanged_states |= (1 << s); > > >> } > > >> + *pchanged_states |= (state_mask & PSI_STATE_WAKE_CLOCK); > > > > > > If the avgs_work kworker is running on this CPU, it will still see > > > PSI_STATE_WAKE_CLOCK set in state_mask? So the work will be re-armed? > > > > > > Maybe I missed something... but I have another different idea which > > > I want to implement later only for discussion. > > > > diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c > > index ee2ecc081422..f322e8fd8d41 100644 > > --- a/kernel/sched/psi.c > > +++ b/kernel/sched/psi.c > > @@ -241,11 +241,13 @@ static void get_recent_times(struct psi_group *group, int cpu, > > enum psi_aggregators aggregator, u32 *times, > > u32 *pchanged_states) > > { > > + int current_cpu = raw_smp_processor_id(); > > struct psi_group_cpu *groupc = per_cpu_ptr(group->pcpu, cpu); > > u64 now, state_start; > > enum psi_states s; > > unsigned int seq; > > u32 state_mask; > > + bool only_avgs_work = false; > > > > *pchanged_states = 0; > > > > @@ -256,6 +258,14 @@ static void get_recent_times(struct psi_group *group, int cpu, > > memcpy(times, groupc->times, sizeof(groupc->times)); > > state_mask = groupc->state_mask; > > state_start = groupc->state_start; > > + /* > > + * This CPU has only avgs_work kworker running, snapshot the > > + * newest times then don't need to re-arm work for this groupc. > > + * Normally this kworker will sleep soon and won't > > + * wake_clock in psi_group_change(). > > + */ > > + if (current_cpu == cpu && groupc->tasks[NR_RUNNING] == 1) > > + only_avgs_work = true; > > } while (read_seqcount_retry(&groupc->seq, seq)); > > > > /* Calculate state time deltas against the previous snapshot */ > > @@ -280,6 +290,10 @@ static void get_recent_times(struct psi_group *group, int cpu, > > if (delta) > > *pchanged_states |= (1 << s); > > } > > + > > + /* Clear PSI_NONIDLE so avgs_work won't be re-armed for this groupc */ > > + if (only_avgs_work) > > + *pchanged_states &= ~(1 << PSI_NONIDLE); > > } > > > Thanks Chengming for the patch. I will test this patch and report my > observations. It makes sense to consider this CPU as non-idle if the PSI kworker > is the only task running. It could run other works but that decision is now > deferred to schedule out path. Ideally if this is the only (or last) work > running, we should not see PSI work not re-arming it self. > is condition groupc->tasks[NR_RUNNING] == 1 alone sufficient to clear NONIDLE? or should we also make sure that !NR_IOWAIT && !NR_MEMSTALL condition on this CPU? Thanks, Pavan