Re: [Freedreno] [PATCH v2 5/7] arm64: dts: qcom: sc7280: Update gpu register list

From: Akhil P Oommen <quic_akhilpo@quicinc.com>
To: Rajendra Nayak <quic_rjendra@quicinc.com>,
	Stephen Boyd <swboyd@chromium.org>,
	Doug Anderson <dianders@chromium.org>,
	Taniya Das <quic_tdas@quicinc.com>
Cc: <devicetree@vger.kernel.org>, Jonathan Marek <jonathan@marek.ca>,
	linux-arm-msm <linux-arm-msm@vger.kernel.org>,
	Andy Gross <agross@kernel.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	"Bjorn Andersson" <bjorn.andersson@linaro.org>,
	Rob Herring <robh+dt@kernel.org>, Rob Clark <robdclark@gmail.com>,
	Matthias Kaehlcke <mka@chromium.org>,
	Krzysztof Kozlowski <krzysztof.kozlowski+dt@linaro.org>,
	Jordan Crouse <jordan@cosmicpenguin.net>,
	freedreno <freedreno@lists.freedesktop.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [Freedreno] [PATCH v2 5/7] arm64: dts: qcom: sc7280: Update gpu register list
Date: Wed, 20 Jul 2022 11:34:01 +0530	[thread overview]
Message-ID: <698d3279-6a02-9b1e-a3bd-627b6afbc57e@quicinc.com> (raw)
In-Reply-To: <b6ab023b-601d-1df2-b04b-af5961b73bea@quicinc.com>

On 7/19/2022 3:26 PM, Rajendra Nayak wrote:
>
>
> On 7/19/2022 12:49 PM, Stephen Boyd wrote:
>> Quoting Akhil P Oommen (2022-07-18 23:37:16)
>>> On 7/19/2022 11:19 AM, Stephen Boyd wrote:
>>>> Quoting Akhil P Oommen (2022-07-18 21:07:05)
>>>>> On 7/14/2022 11:10 AM, Akhil P Oommen wrote:
>>>>>> IIUC, qcom gdsc driver doesn't ensure hardware is collapsed since 
>>>>>> they
>>>>>> are vote-able switches. Ideally, we should ensure that the hw has
>>>>>> collapsed for gpu recovery because there could be transient votes 
>>>>>> from
>>>>>> other subsystems like hypervisor using their vote register.
>>>>>>
>>>>>> I am not sure how complex the plumbing to gpucc driver would be 
>>>>>> to allow
>>>>>> gpu driver to check hw status. OTOH, with this patch, gpu driver 
>>>>>> does a
>>>>>> read operation on a gpucc register which is in always-on domain. 
>>>>>> That
>>>>>> means we don't need to vote any resource to access this register.
>>
>> Reading between the lines here, you're saying that you have to read the
>> gdsc register to make sure that the gdsc is in some state? Can you
>> clarify exactly what you're doing? And how do you know that something
>> else in the kernel can't cause the register to change after it is read?
>> It certainly seems like we can't be certain because there is voting
>> involved.
 From gpu driver, cx_gdscr.bit[31] (power off status) register can be 
polled to ensure that it *collapsed at least once*. We don't need to 
care if something turns ON gdsc after that.

>
> yes, this looks like the best case effort to get the gpu to recover, but
> the kernel driver really has no control to make sure this condition can
> always be met (because it depends on other entities like hyp, 
> trustzone etc right?)
> Why not just put a worst case polling delay?

I didn't get you entirely. Where do you mean to keep the polling delay?
>
>>
>>>>>>
>>>>>> Stephen/Rajendra/Taniya, any suggestion?
>>>> Why can't you assert a gpu reset signal with the reset APIs? This 
>>>> series
>>>> seems to jump through a bunch of hoops to get the gdsc and power 
>>>> domain
>>>> to "reset" when I don't know why any of that is necessary. Can't we
>>>> simply assert a reset to the hardware after recovery completes so the
>>>> device is back into a good known POR (power on reset) state?
>>> That is because there is no register interface to reset GPU CX domain.
>>> The recommended sequence from HW design folks is to collapse both cx 
>>> and
>>> gx gdsc to properly reset gpu/gmu.
>>>
>>
>> Ok. One knee jerk reaction is to treat the gdsc as a reset then and
>> possibly mux that request along with any power domain on/off so that if
>> the reset is requested and the power domain is off nothing happens.
>> Otherwise if the power domain is on then it manually sequences and
>> controls the two gdscs so that the GPU is reset and then restores the
>> enable state of the power domain.
It would be fatal to asynchronously pull the plug on CX gdsc forcefully 
because there might be another gpu/smmu driver thread accessing 
registers in cx domain.

-Akhil.