From: "Rafael J. Wysocki"
Date: Mon, 9 Dec 2019 13:44:41 +0100
Subject: Re: About CPU hot-plug stress test failed in cpufreq driver
To: Anson Huang
Cc: "Rafael J. Wysocki", "Rafael J. Wysocki", Peng Fan, Viresh Kumar, Jacky Bai, "linux-pm@vger.kernel.org"
References: <4087016.QifdzW7851@kreacher> <0EF688DF-FD00-456C-8CE1-C4F825651275@nxp.com>
In-Reply-To: <0EF688DF-FD00-456C-8CE1-C4F825651275@nxp.com>
X-Mailing-List: linux-pm@vger.kernel.org

On Mon, Dec 9, 2019 at 1:32 PM Anson Huang wrote:
>
> > On Dec 9, 2019, at 19:23, Rafael J. Wysocki wrote:
> >
> >> On Mon, Dec 9, 2019 at 11:57 AM Anson Huang wrote:
> >>
> >> Forgot to mention that the patch below on v5.4 can easily reproduce
> >> the panic() on our platforms, which I think is unexpected, as
> >> policy->cpus has already been updated after governor stop, but it
> >> still tries to have irq work queued on it.
> >>
> >> static void dbs_update_util_handler(struct update_util_data *data, u64 time, unsigned int flags)
> >> +       if (!cpumask_test_cpu(smp_processor_id(), policy_dbs->policy->cpus))
> >> +               panic("...irq work on offline cpu %d\n", smp_processor_id());
> >>         irq_work_queue(&policy_dbs->irq_work);
> >
> > Yes, that is unexpected.
> >
> > In cpufreq_offline(), we have:
> >
> >         down_write(&policy->rwsem);
> >         if (has_target())
> >                 cpufreq_stop_governor(policy);
> >
> >         cpumask_clear_cpu(cpu, policy->cpus);
> >
> > and cpufreq_stop_governor() calls policy->governor->stop(policy) which
> > is cpufreq_dbs_governor_stop().
> >
> > That calls gov_clear_update_util(policy_dbs->policy) first, which
> > invokes cpufreq_remove_update_util_hook() for each CPU in policy->cpus
> > and synchronizes RCU, so after that point none of the policy->cpus is
> > expected to run dbs_update_util_handler().
> >
> > policy->cpus is updated next and the governor is started again with
> > the new policy->cpus. Because the offline CPU is not there, it is not
> > expected to run dbs_update_util_handler() again.
> >
> > Do you only get the original error when one of the CPUs goes back online?
>
> No, sometimes I also get this error while a CPU is being taken offline.
>
> But the point is NOT that dbs_update_util_handler() is called during governor stop,
> it is that this function is running on a CPU which has already finished the governor stop
> function,

Yes, it is, and that should not be possible as per the above.

The offline CPU is not there in policy->cpus when
cpufreq_dbs_governor_start() is called for the policy, so its
cpufreq_update_util_data pointer is not set (it is NULL at that time).
Therefore it is not expected to run dbs_update_util_handler() until
it is turned back online.

> I thought the original expectation is that this function is ONLY executed on a CPU which needs frequency scaling?
> Is this correct?

Yes, it is.

> v4.19 follows this expectation while v5.4 does NOT.

As per the kernel code, they both do.

> The only thing I can imagine is that the changes in the kernel/sched/ folder cause this difference, but I still need more time to figure out which changes cause it. If you have any suggestions, please advise, thanks!

The CPU offline/online (hotplug) rework was done after 4.19 IIRC and
that changed the way online works.
Now, the online procedure runs on the CPU going online, whereas
previously it ran on the CPU "asking" the other one to go online.
That may be what makes the difference (if my recollection of the
time frame is correct).
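
For illustration only, the ordering described above (stop the governor, drop the CPU from policy->cpus, start the governor again) can be sketched as a small single-threaded userspace model. This is not kernel code: every model_* name is hypothetical, the bitmask merely stands in for policy->cpus, the governor_running flag replaces the has_target() check, and the RCU synchronization is reduced to a comment because the model has no concurrent readers.

```c
/* Userspace toy model, NOT kernel code: all model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

typedef void (*model_hook_fn)(int cpu);

/* Stand-in for the per-CPU cpufreq_update_util_data pointer. */
static model_hook_fn model_hook[MODEL_NR_CPUS];

/* Counts how many times the handler ran on each CPU. */
static unsigned int model_handler_runs[MODEL_NR_CPUS];

struct model_policy {
	unsigned int cpus;	/* bitmask standing in for policy->cpus */
	bool governor_running;	/* simplification of the has_target() check */
};

/* Stand-in for dbs_update_util_handler(): only records that it ran. */
static void model_dbs_handler(int cpu)
{
	model_handler_runs[cpu]++;
}

/* Governor start: install the hook on every CPU in the policy mask. */
static void model_governor_start(struct model_policy *policy)
{
	assert(!policy->governor_running);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (policy->cpus & (1u << cpu))
			model_hook[cpu] = model_dbs_handler;
	policy->governor_running = true;
}

/*
 * Governor stop: remove the hook from every CPU in the policy mask.
 * The real code also waits for an RCU grace period at this point; the
 * model is single-threaded, so there is nothing to wait for.
 */
static void model_governor_stop(struct model_policy *policy)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (policy->cpus & (1u << cpu))
			model_hook[cpu] = NULL;
	policy->governor_running = false;
}

/* Offline: stop, drop the CPU from the mask, restart on what is left. */
static void model_cpu_offline(struct model_policy *policy, int cpu)
{
	if (policy->governor_running)
		model_governor_stop(policy);
	policy->cpus &= ~(1u << cpu);
	if (policy->cpus)
		model_governor_start(policy);
}

/*
 * Scheduler-side call: run the hook only if one is installed.
 * Returns true if the handler ran on this CPU.
 */
static bool model_update_util(int cpu)
{
	model_hook_fn hook = model_hook[cpu];

	if (!hook)
		return false;
	hook(cpu);
	return true;
}
```

In this model, a policy that starts with mask 0xF and then takes CPU 2 offline ends up with no hook installed for CPU 2, so model_update_util(2) returns false while the remaining CPUs keep running the handler. If the real code behaves like the model, the handler running on an offline CPU would point at something outside this sequence, such as the hotplug ordering mentioned above.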