From mboxrd@z Thu Jan 1 00:00:00 1970
From: Will Deacon
Subject: Re: [PATCH v9 00/12] Support PPTT for ARM64
Date: Tue, 29 May 2018 21:16:24 +0100
Message-ID: <20180529201623.GA591@arm.com>
References: <20180511235807.30834-1-jeremy.linton@arm.com>
 <20180517170523.h7tuvbzdfluuidcz@armageddon.cambridge.arm.com>
 <09fb3fe7-d703-43f1-74f7-f8cb5ff1f67a@arm.com>
 <551905a6-eaa8-97df-06ec-1ceedfbc164f@arm.com>
 <20180529150823.GD17159@arm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Return-path:
Content-Disposition: inline
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
To: Geert Uytterhoeven
Cc: Sudeep Holla, Catalin Marinas, Jeremy Linton,
 ACPI Devel Maling List, Mark Rutland, austinwc@codeaurora.org,
 tnowicki@caviumnetworks.com, Palmer Dabbelt,
 linux-riscv@lists.infradead.org, Morten.Rasmussen@arm.com,
 vkilari@codeaurora.org, Lorenzo Pieralisi, jhugo@codeaurora.org,
 Al Stone, Len Brown, John Garry, wangxiongfeng2@huawei.com,
 Dietmar Eggemann, Linux ARM, Ard Biesheuvel, Greg KH,
 Rafael J. Wysocki, Linux Kernel Mailing List, Hanjun Guo,
 Linux-Renesas
List-Id: linux-acpi@vger.kernel.org

Hi Geert,

On Tue, May 29, 2018 at 05:51:29PM +0200, Geert Uytterhoeven wrote:
> On Tue, May 29, 2018 at 5:08 PM, Will Deacon wrote:
> > On Tue, May 29, 2018 at 02:18:40PM +0100, Sudeep Holla wrote:
> >> On 29/05/18 12:56, Geert Uytterhoeven wrote:
> >> > On Tue, May 29, 2018 at 1:14 PM, Sudeep Holla wrote:
> >> >> On 29/05/18 11:48, Geert Uytterhoeven wrote:
> >> >>> System supend still works fine on systems with big cores only:
> >> >>>
> >> >>> R-Car H3 ES1.0 (4xCA57 (4xCA53 disabled in firmware))
> >> >>> R-Car M3-N (2xCA57)
> >> >>>
> >> >>> Reverting this commit fixes the issue for me.
> >> >>
> >> >> I can't find anything that relates to system suspend in these patches
> >> >> unless they are messing with something during CPU hot plug-in back
> >> >> during resume.
> >> >
> >> > It's only the last patch that introduces the breakage.
> >> >
> >>
> >> As specified in the commit log, it won't change any behavior for DT
> >> systems if it's non-NUMA or single node system. So I am still wondering
> >> what could trigger this regression.
> >
> > I wonder if we're somehow giving an uninitialised/invalid NUMA configuration
> > to the scheduler, although I can't see how this would happen.
> >
> > Geert -- if you enable CONFIG_DEBUG_PER_CPU_MAPS=y and apply the diff below
> > do you see anything shouting in dmesg?
>
> Thanks, but unfortunately it doesn't help.
> I added some debug code to print cpumask, but so far I don't see anything
> suspicious.

Damn, sorry for wasting your time. For the record, Catalin's been seeing boot
failures under KVM on a non-big/LITTLE machine that bisect reliably to this
patch, but we've also not been able to explain them. Worse, adding so much as a
printk makes the problem disappear.

Will