From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: brocken devfreq simple_ondemand for Odroid XU3/4?
To: Sylwester Nawrocki
Cc: Krzysztof Kozlowski, Willy Wolff, Chanwoo Choi, MyungJoo Ham,
 Kyungmin Park, Kukjin Kim, linux-pm@vger.kernel.org,
 linux-samsung-soc@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 linux-kernel@vger.kernel.org
References: <20200623164733.qbhua7b6cg2umafj@macmini.local>
 <20200623191129.GA4171@kozik-lap>
 <85f5a8c0-7d48-f2cd-3385-c56d662f2c88@arm.com>
From: Lukasz Luba
Message-ID: <4a72fcab-e8da-8323-1fbe-98a6a4b3e0f1@arm.com>
Date: Thu, 25 Jun 2020 11:02:08 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.9.0
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Sylwester,

On 6/24/20 4:11 PM, Sylwester Nawrocki wrote:
> Hi All,
>
> On 24.06.2020 12:32, Lukasz Luba wrote:
>> I had issues with devfreq governor which wasn't called by devfreq
>> workqueue. The old DELAYED vs DEFERRED work discussions and my patches
>> for it [1]. If the CPU which scheduled the next work went idle, the
>> devfreq workqueue will not be kicked and devfreq governor won't check
>> DMC status and will not decide to decrease the frequency based on low
>> busy_time.
>> The same applies for going up with the frequency. They both are
>> done by the governor but the workqueue must be scheduled periodically.
>
> As I have been working on resolving the video mixer IOMMU fault issue
> described here: https://patchwork.kernel.org/patch/10861757
> I did some investigation of the devfreq operation, mostly on Odroid U3.
>
> My conclusions are similar to what Lukasz says above. I would like to add
> that broken scheduling of the performance counters read and the devfreq
> updates seems to have one more serious implication.
> In each call, which
> normally should happen periodically with fixed interval we stop the counters,
> read counter values and start the counters again. But if period between
> calls becomes long enough to let any of the counters overflow, we will
> get wrong performance measurement results. My observations are that
> the workqueue job can be suspended for several seconds and conditions for
> the counter overflow occur sooner or later, depending among others
> on the CPUs load.
> Wrong bus load measurement can lead to setting too low interconnect bus
> clock frequency and then bad things happen in peripheral devices.
>
> I agree the workqueue issue needs to be fixed. I have some WIP code to use
> the performance counters overflow interrupts instead of SW polling and with
> that the interconnect bus clock control seems to work much better.
>

Thank you for sharing your use case and investigation results. I think
we are reaching a decent number of developers to maybe address this
issue: 'workqueue issue needs to be fixed'. I have run into this devfreq
workqueue issue ~5 times on different platforms.

Regarding the 'performance counters overflow interrupts', there is one
thing worth keeping in mind: variable utilization and frequency.
For example, for the algorithm to decide whether the device should
increase or decrease its frequency, we fix the observation period,
e.g. to 500ms. That can cause a long delay if the utilization of the
device suddenly drops. For example, we set the overflow threshold to
some value, e.g. 1000, and we know that at 1000MHz and full utilization
(100%) the counter will reach that threshold after 500ms (which we want,
because we don't want too many interrupts per second). What if the
utilization suddenly drops to 2% (e.g. from 5GB/s to 100MB/s (what if it
drops to 25MB/s?!))? The counter will then reach the threshold only
after 50*500ms = 25s. The counters alone cannot predict the next
utilization and adjust the threshold.
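
The delay arithmetic above can be sketched in a few lines of Python. This is only an illustration under the numbers used in the example (a threshold tuned to fire after 500 ms at 100% utilization). The function name `time_to_overflow_ms` and the assumption that the counter advances linearly with utilization are hypothetical and not taken from any driver.

```python
def time_to_overflow_ms(utilization, full_load_period_ms=500.0):
    """Time until the overflow interrupt fires.

    Assumption (illustrative): the counter advances at a rate proportional
    to utilization, and the threshold was tuned so that it is reached after
    full_load_period_ms at 100% utilization.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return full_load_period_ms / utilization


if __name__ == "__main__":
    # 100%, 50%, 2% and 0.5% utilization
    for util in (1.0, 0.5, 0.02, 0.005):
        seconds = time_to_overflow_ms(util) / 1000.0
        print(f"{util:6.1%} utilization -> interrupt after {seconds:.1f} s")
```

At 2% utilization the interrupt arrives after about 25 s, and the delay keeps growing as utilization falls, so the interrupt alone gives no upper bound on how late a frequency decision can be.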
To address that, we still need to have another mechanism (like watchdog)
which will be triggered just to check if the threshold needs adjustment.
This mechanism can be a local timer in the driver or a framework timer
running kind of 'for loop' on all this type of devices (like the
scheduled workqueue). In both cases in the system there will be
interrupts, timers (even at workqueues) and scheduling.
The approach to force developers to implement their local watchdog
timers (or workqueues) in drivers is IMHO wrong and that's why we have
frameworks.

Regards,
Lukasz
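
As an illustration of the watchdog idea described above, here is a minimal Python sketch of a single framework-level pass that walks all devices and retunes each overflow threshold from the rate it just observed. Everything in it is hypothetical: `retune_threshold`, `watchdog_pass`, the dict layout, and the assumption of a constant event rate between passes are made up for the example, and a real implementation would live in kernel C inside the devfreq framework.

```python
TARGET_PERIOD_MS = 500.0  # desired time between overflow interrupts (assumed)


def retune_threshold(events_seen, elapsed_ms,
                     target_period_ms=TARGET_PERIOD_MS, min_threshold=1):
    """Choose a threshold so that, at the rate just observed, the next
    overflow interrupt fires after roughly target_period_ms."""
    rate_per_ms = events_seen / elapsed_ms
    return max(min_threshold, round(rate_per_ms * target_period_ms))


def watchdog_pass(devices, elapsed_ms):
    """One framework timer, many devices: the 'for loop' over all devices."""
    for dev in devices:
        dev["threshold"] = retune_threshold(dev["events_seen"], elapsed_ms)
        dev["events_seen"] = 0  # start a fresh observation window


if __name__ == "__main__":
    devices = [
        {"name": "busy", "threshold": 1000, "events_seen": 1000},
        {"name": "idle", "threshold": 1000, "events_seen": 20},  # ~2% load
    ]
    watchdog_pass(devices, elapsed_ms=500.0)
    for dev in devices:
        print(dev["name"], dev["threshold"])
```

With the threshold lowered to match the observed rate, the nearly idle device would raise its next interrupt after about one target period instead of 25 s. The periodic pass itself is still a timer, which is the point made above: the timer and its scheduling do not go away, they only move.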