From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: MIME-Version: 1.0 In-Reply-To: <671a0b39-b635-6e0e-d3fa-967651f2e29c@arm.com> References: <1488520809-31670-1-git-send-email-leo.yan@linaro.org> <1488520809-31670-4-git-send-email-leo.yan@linaro.org> <671a0b39-b635-6e0e-d3fa-967651f2e29c@arm.com> From: Mike Leach Date: Wed, 22 Mar 2017 12:54:36 +0000 Message-ID: Subject: Re: [PATCH v3 3/5] coresight: add support for debug module Content-Type: multipart/alternative; boundary=001a113cbdd849a780054b5143de To: Sudeep Holla Cc: Leo Yan , Rob Herring , Mark Rutland , Wei Xu , Catalin Marinas , Will Deacon , Michael Turquette , Stephen Boyd , Mathieu Poirier , John Stultz , Guodong Xu , Haojian Zhuang , Greg Kroah-Hartman , devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-clk@vger.kernel.org List-ID: --001a113cbdd849a780054b5143de Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On 21 March 2017 at 15:39, Sudeep Holla wrote: > > > On 03/03/17 06:00, Leo Yan wrote: > > Coresight includes debug module and usually the module connects with CP= U > > debug logic. ARMv8 architecture reference manual (ARM DDI 0487A.k) has > > description for related info in "Part H: External Debug". > > > > Chapter H7 "The Sample-based Profiling Extension" introduces several > > sampling registers, e.g. we can check program counter value with > > combined CPU exception level, secure state, etc. So this is helpful for > > analysis CPU lockup scenarios, e.g. if one CPU has run into infinite > > loop with IRQ disabled. In this case the CPU cannot switch context and > > handle any interrupt (including IPIs), as the result it cannot handle > > SMP call for stack dump. > > > > This patch is to enable coresight debug module, so firstly this driver > > is to bind apb clock for debug module and this is to ensure the debug > > module can be accessed from program or external debugger. And the drive= r > > uses sample-based registers for debug purpose, e.g. when system detects > > the CPU lockup and trigger panic, the driver will dump program counter > > and combined context registers (EDCIDSR, EDVIDSR); by parsing context > > registers so can quickly get to know CPU secure state, exception level, > > etc. > > > > Some of the debug module registers are located in CPU power domain, so > > in the driver it has checked the power state for CPU before accessing > > registers within CPU power domain. For most safe way to use this driver= , > > it's suggested to disable CPU low power states, this can simply set > > "nohlt" in kernel command line. > > > > I disagree with this approach. One of the main usefulness of such self > hosted debug feature is to debug issues around features like cpuidle. > Adding constraints like "cpuidle needs to be disabled" is not good IMO. > There are ways to make it work with cpuidle enabled. Please explore > them. In particular refer H9.2.39 EDPRCR, External Debug Power/Reset > Control Register. > > So, "nohlt" option is not an option. I prefer some sysfs option like > Suzuki suggested to enable this feature on demand if power saving in > normal usecase is the concern. Using "nohlt" just disables idle and > doesn't ensure the debug power domain is ON. Using the flag directly in > this driver to enable debug power domain also sounds misuse of that flag > for me. > > -- > Regards, > Sudeep > I think the key issue to remember here is that experience with external debug shows that CPU Idle means different things to different SoC designs / power management schemes. (and we are using external debug in a self hosted way here). Some designs will power down an entire cluster if all CPUs on the cluster are powered down - including the parts of the debug registers that should remain powered in the debug power domain. The bits in EDPRCR are not respected in these cases - these designs do not really support debug over power down in the way that the CoreSight / Debug designers anticipated. This means that even checking EDPRSR has the potential to cause a bus hang if the target register is unpowered. (and if the debug power domain is unpowered then the PC data is also lost). In these cases, accessing to the debug registers while they are not powered is a recipe for disaster - so preventing CPUIdle =E2=80=8Band the subsequent cluster power down =E2=80=8B allows investigation on this class of system - =E2=80=8Band allowing the CPUs of interest be interrogated without hanging = the crash log process.=E2=80=8B =E2=80=8BOn systems that do behave correctly with respect to debug power do= mains, then disabling CPUIdle is unnecessary - these can be controlled by =E2=80= =8BEDPRCR - perhaps; per the specification it is "implementation defined" if writing bits to this register have an effect on the system anyway even if the debug domain is correctly powered. =E2=80=8BWhile it is true to say that disabling CPUIdle does not guarantee = that the debug power domain is on, it does in a certain class of designs prevent it being powered off (Juno historically - not sure if that is still the case.)= . However, I do agree that the use of the driver should not be triggered _only_ on the existence of /nohlt on the command line - =E2=80=8Bthere is a= class of designs where this will not be required. When enabing the driver as a kernel config the user needs to decide:- 1) do I need this to debug the issue I am seeing 2) does the power management on my system require I use /nohlt as well. I think that the use of /nohlt as an option, and the reasons why it might be needed should be part of the configuration help in this case. There is also a case for considering if there should be an option to configure it to be enabled or disabled at boot time. It is easy to imagine cases I want to have this running from the start as a crash happens early - and cases I can enable it on demand later. =E2=80=8BRegards=E2=80=8B =E2=80=8BMike=E2=80=8B --=20 Mike Leach Principal Engineer, ARM Ltd. Blackburn Design Centre. UK --001a113cbdd849a780054b5143de Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable


On 21 March 2017 at 15:39, Sudeep Holla <sudeep.holla@arm.com= > wrote:


On 03/03/17 06:00, Leo Yan wrote:
> Coresight includes debug module and usua= lly the module connects with CPU
> debug logic. ARMv8 architecture reference manual (ARM DDI 0487A.k) has=
> description for related info in "Part H: External Debug". >
> Chapter H7 "The Sample-based Profiling Extension" introduces= several
> sampling registers, e.g. we can check program counter value with
> combined CPU exception level, secure state, etc. So this is helpful fo= r
> analysis CPU lockup scenarios, e.g. if one CPU has run into infinite > loop with IRQ disabled. In this case the CPU cannot switch context and=
> handle any interrupt (including IPIs), as the result it cannot handle<= br> > SMP call for stack dump.
>
> This patch is to enable coresight debug module, so firstly this driver=
> is to bind apb clock for debug module and this is to ensure the debug<= br> > module can be accessed from program or external debugger. And the driv= er
> uses sample-based registers for debug purpose, e.g. when system detect= s
> the CPU lockup and trigger panic, the driver will dump program counter=
> and combined context registers (EDCIDSR, EDVIDSR); by parsing context<= br> > registers so can quickly get to know CPU secure state, exception level= ,
> etc.
>
> Some of the debug module registers are located in CPU power domain, so=
> in the driver it has checked the power state for CPU before accessing<= br> > registers within CPU power domain. For most safe way to use this drive= r,
> it's suggested to disable CPU low power states, this can simply se= t
> "nohlt" in kernel command line.
>

I disagree with this approach. One of the main usefulness of such se= lf
hosted debug feature is to debug issues around features like cpuidle.
Adding constraints like "cpuidle needs to be disabled" is not goo= d IMO.
There are ways to make it work with cpuidle enabled. Please explore
them. In particular refer H9.2.39 EDPRCR, External Debug Power/Reset
Control Register.

So, "nohlt" option is not an option. I prefer some sysfs option l= ike
Suzuki suggested to enable this feature on demand if power saving in
normal usecase is the concern. Using "nohlt" just disables idle a= nd
doesn't ensure the debug power domain is ON. Using the flag directly in=
this driver to enable debug power domain also sounds misuse of that flag for me.

--
Regards,
Sudeep

I think the key issue to remember here is that experience with external=20 debug shows that CPU Idle means different things to different SoC=20 designs / power management schemes. (and we are using external debug in a s= elf hosted way here).

Some designs will power down an entir= e cluster if all CPUs on the cluster are=20 powered down - including the parts of the debug registers that should=20 remain powered in the debug power domain. The bits in EDPRCR are not=20 respected in these cases - these designs do not really support debug=20 over power down in the way that the CoreSight / Debug designers=20 anticipated. This means that even checking EDPRSR has the potential to caus= e a bus hang if the target register is unpowered. (and if the debug power doma= in is unpowered then the PC data is also lost).

In these cases, accessing to the debug registers while they are not=20 powered is a recipe for disaster - so preventing CPUIdle
= =E2=80=8Band the subsequent cluster power down =E2=80=8B
allows=20 investigation on this class of system -
=E2=80=8Band allo= wing the CPUs of interest be interrogated without hanging the crash log pro= cess.=E2=80=8B


=E2=80=8BOn systems that do behave correctly= with respect to debug power domains, then disabling CPUIdle is unnecessary= - these can be controlled by =E2=80=8BEDPRCR - perhaps; per the specificat= ion it is "implementation defined" if writing bits to this regist= er have an effect on the system anyway even if the debug domain is correctl= y powered.

=E2=80=8BWhile it is true to say that disabling = CPUIdle does not guarantee that the debug power domain is on, it does in a = certain class of designs prevent it being powered off (Juno historically - = not sure if that is still the case.).

However, I do agree t= hat the use of the driver should not be triggered _only_ on the existence o= f /nohlt on the command line - =E2=80=8Bthere is a class of designs where t= his will not be required.

When enabing the driver as a ker= nel config the user needs to decide:-
1) do I need this to debu= g the issue I am seeing
2) does the power management on my syst= em require I use /nohlt as well.

I think that the use of /n= ohlt as an option, and the reasons why it might be needed should be part of= the configuration help in this case.

There is also a case = for considering if there should be an option to configure it to be enabled = or disabled at boot time. It is easy to imagine cases I want to have this r= unning from the start as a crash happens early - and cases I can enable it = on demand later.


=E2=80=8BRegards=E2=80=8B
=E2=80=8BMike=E2=80=8B

--
Mike Leach
Principal Engineer, ARM Ltd= .
Blackburn Design Centre. UK
--001a113cbdd849a780054b5143de--