All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
@ 2023-12-13 21:06 Rob Clark
  2023-12-14  7:16 ` Johan Hovold
  0 siblings, 1 reply; 10+ messages in thread
From: Rob Clark @ 2023-12-13 21:06 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: Rob Clark, Dmitry Baryshkov, Andy Gross, Bjorn Andersson,
	Konrad Dybcio, open list

From: Rob Clark <robdclark@chromium.org>

We need to bail out before adding/removing devices, if we are going
to -EPROBE_DEFER.  Otherwise boot will get stuck forever at
deferred_probe_initcall().

Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/soc/qcom/pmic_glink.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c
index 914057331afd..2899746af6a6 100644
--- a/drivers/soc/qcom/pmic_glink.c
+++ b/drivers/soc/qcom/pmic_glink.c
@@ -268,10 +268,16 @@ static int pmic_glink_probe(struct platform_device *pdev)
 	else
 		pg->client_mask = PMIC_GLINK_CLIENT_DEFAULT;
 
+	pg->pdr = pdr_handle_alloc(pmic_glink_pdr_callback, pg);
+	if (IS_ERR(pg->pdr)) {
+		ret = dev_err_probe(&pdev->dev, PTR_ERR(pg->pdr), "failed to initialize pdr\n");
+		return ret;
+	}
+
 	if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_UCSI)) {
 		ret = pmic_glink_add_aux_device(pg, &pg->ucsi_aux, "ucsi");
 		if (ret)
-			return ret;
+			goto out_release_pdr_handle;
 	}
 	if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_ALTMODE)) {
 		ret = pmic_glink_add_aux_device(pg, &pg->altmode_aux, "altmode");
@@ -284,17 +290,11 @@ static int pmic_glink_probe(struct platform_device *pdev)
 			goto out_release_altmode_aux;
 	}
 
-	pg->pdr = pdr_handle_alloc(pmic_glink_pdr_callback, pg);
-	if (IS_ERR(pg->pdr)) {
-		ret = dev_err_probe(&pdev->dev, PTR_ERR(pg->pdr), "failed to initialize pdr\n");
-		goto out_release_aux_devices;
-	}
-
 	service = pdr_add_lookup(pg->pdr, "tms/servreg", "msm/adsp/charger_pd");
 	if (IS_ERR(service)) {
 		ret = dev_err_probe(&pdev->dev, PTR_ERR(service),
 				    "failed adding pdr lookup for charger_pd\n");
-		goto out_release_pdr_handle;
+		goto out_release_aux_devices;
 	}
 
 	mutex_lock(&__pmic_glink_lock);
@@ -303,8 +303,6 @@ static int pmic_glink_probe(struct platform_device *pdev)
 
 	return 0;
 
-out_release_pdr_handle:
-	pdr_handle_release(pg->pdr);
 out_release_aux_devices:
 	if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_BATT))
 		pmic_glink_del_aux_device(pg, &pg->ps_aux);
@@ -314,6 +312,8 @@ static int pmic_glink_probe(struct platform_device *pdev)
 out_release_ucsi_aux:
 	if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_UCSI))
 		pmic_glink_del_aux_device(pg, &pg->ucsi_aux);
+out_release_pdr_handle:
+	pdr_handle_release(pg->pdr);
 
 	return ret;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
  2023-12-13 21:06 [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m Rob Clark
@ 2023-12-14  7:16 ` Johan Hovold
  2023-12-14 11:04   ` Dmitry Baryshkov
  0 siblings, 1 reply; 10+ messages in thread
From: Johan Hovold @ 2023-12-14  7:16 UTC (permalink / raw)
  To: Rob Clark
  Cc: linux-arm-msm, Rob Clark, Dmitry Baryshkov, Andy Gross,
	Bjorn Andersson, Konrad Dybcio, open list

On Wed, Dec 13, 2023 at 01:06:43PM -0800, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> We need to bail out before adding/removing devices, if we are going
> to -EPROBE_DEFER.  Otherwise boot will get stuck forever at
> deferred_probe_initcall().

Can please you expand on why this is a problem here in the commit
message?

The aux devices appear to be tore down correctly in the probe error
paths so how exactly does that lead to deferred_probe_initcall() being
stuck? This sounds like we may have a problem elsewhere which this patch
is papering over.

Also where does the probe deferral come from in your case?
pdr_handle_alloc()?

If this is a correct fix, I'd also expect there to be a Fixes and
CC-stable tag.

Johan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
  2023-12-14  7:16 ` Johan Hovold
@ 2023-12-14 11:04   ` Dmitry Baryshkov
  2023-12-14 14:01     ` Johan Hovold
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Baryshkov @ 2023-12-14 11:04 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Rob Clark, linux-arm-msm, Rob Clark, Andy Gross, Bjorn Andersson,
	Konrad Dybcio, open list

On Thu, 14 Dec 2023 at 09:16, Johan Hovold <johan@kernel.org> wrote:
>
> On Wed, Dec 13, 2023 at 01:06:43PM -0800, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > We need to bail out before adding/removing devices, if we are going
> > to -EPROBE_DEFER.  Otherwise boot will get stuck forever at
> > deferred_probe_initcall().
>
> Can please you expand on why this is a problem here in the commit
> message?
>
> The aux devices appear to be tore down correctly in the probe error
> paths so how exactly does that lead to deferred_probe_initcall() being
> stuck? This sounds like we may have a problem elsewhere which this patch
> is papering over.

This is a known problem. Successful probes during the probe deferral
loop causes the whole loop to be reiterated. Creating child devices
usually results in  a successful probe. Aso I thought that just
creating new device also causes a reprobe, but I can not find any
evidence now.

>
> Also where does the probe deferral come from in your case?
> pdr_handle_alloc()?
>
> If this is a correct fix, I'd also expect there to be a Fixes and
> CC-stable tag.
>
> Johan
>


-- 
With best wishes
Dmitry

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
  2023-12-14 11:04   ` Dmitry Baryshkov
@ 2023-12-14 14:01     ` Johan Hovold
  2023-12-14 14:04       ` Dmitry Baryshkov
  0 siblings, 1 reply; 10+ messages in thread
From: Johan Hovold @ 2023-12-14 14:01 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Rob Clark, linux-arm-msm, Rob Clark, Andy Gross, Bjorn Andersson,
	Konrad Dybcio, open list

On Thu, Dec 14, 2023 at 01:04:43PM +0200, Dmitry Baryshkov wrote:
> On Thu, 14 Dec 2023 at 09:16, Johan Hovold <johan@kernel.org> wrote:
> >
> > On Wed, Dec 13, 2023 at 01:06:43PM -0800, Rob Clark wrote:
> > > From: Rob Clark <robdclark@chromium.org>
> > >
> > > We need to bail out before adding/removing devices, if we are going
> > > to -EPROBE_DEFER.  Otherwise boot will get stuck forever at
> > > deferred_probe_initcall().
> >
> > Can please you expand on why this is a problem here in the commit
> > message?
> >
> > The aux devices appear to be tore down correctly in the probe error
> > paths so how exactly does that lead to deferred_probe_initcall() being
> > stuck? This sounds like we may have a problem elsewhere which this patch
> > is papering over.
> 
> This is a known problem. Successful probes during the probe deferral
> loop causes the whole loop to be reiterated. Creating child devices
> usually results in  a successful probe. Aso I thought that just
> creating new device also causes a reprobe, but I can not find any
> evidence now.

This still needs to be described in the commit message.

Only a successful probe should trigger a reprobe, and when the child
devices are registered the parent is not yet on the deferred probe list.
So something is not right or missing here.

Johan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
  2023-12-14 14:01     ` Johan Hovold
@ 2023-12-14 14:04       ` Dmitry Baryshkov
  2023-12-14 14:09         ` Johan Hovold
  0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Baryshkov @ 2023-12-14 14:04 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Rob Clark, linux-arm-msm, Rob Clark, Andy Gross, Bjorn Andersson,
	Konrad Dybcio, open list

On Thu, 14 Dec 2023 at 16:01, Johan Hovold <johan@kernel.org> wrote:
>
> On Thu, Dec 14, 2023 at 01:04:43PM +0200, Dmitry Baryshkov wrote:
> > On Thu, 14 Dec 2023 at 09:16, Johan Hovold <johan@kernel.org> wrote:
> > >
> > > On Wed, Dec 13, 2023 at 01:06:43PM -0800, Rob Clark wrote:
> > > > From: Rob Clark <robdclark@chromium.org>
> > > >
> > > > We need to bail out before adding/removing devices, if we are going
> > > > to -EPROBE_DEFER.  Otherwise boot will get stuck forever at
> > > > deferred_probe_initcall().
> > >
> > > Can please you expand on why this is a problem here in the commit
> > > message?
> > >
> > > The aux devices appear to be tore down correctly in the probe error
> > > paths so how exactly does that lead to deferred_probe_initcall() being
> > > stuck? This sounds like we may have a problem elsewhere which this patch
> > > is papering over.
> >
> > This is a known problem. Successful probes during the probe deferral
> > loop causes the whole loop to be reiterated. Creating child devices
> > usually results in  a successful probe. Aso I thought that just
> > creating new device also causes a reprobe, but I can not find any
> > evidence now.
>
> This still needs to be described in the commit message.
>
> Only a successful probe should trigger a reprobe, and when the child
> devices are registered the parent is not yet on the deferred probe list.
> So something is not right or missing here.

Child devices can be successfully probed, then the parent gets
-EPROBE_DEFER, removes children and then it goes on and on.

-- 
With best wishes
Dmitry

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
  2023-12-14 14:04       ` Dmitry Baryshkov
@ 2023-12-14 14:09         ` Johan Hovold
  2023-12-14 15:38           ` Johan Hovold
  0 siblings, 1 reply; 10+ messages in thread
From: Johan Hovold @ 2023-12-14 14:09 UTC (permalink / raw)
  To: Dmitry Baryshkov
  Cc: Rob Clark, linux-arm-msm, Rob Clark, Andy Gross, Bjorn Andersson,
	Konrad Dybcio, open list

On Thu, Dec 14, 2023 at 04:04:49PM +0200, Dmitry Baryshkov wrote:
> On Thu, 14 Dec 2023 at 16:01, Johan Hovold <johan@kernel.org> wrote:
> > On Thu, Dec 14, 2023 at 01:04:43PM +0200, Dmitry Baryshkov wrote:
> > > On Thu, 14 Dec 2023 at 09:16, Johan Hovold <johan@kernel.org> wrote:
> > > > On Wed, Dec 13, 2023 at 01:06:43PM -0800, Rob Clark wrote:
> > > > > From: Rob Clark <robdclark@chromium.org>
> > > > >
> > > > > We need to bail out before adding/removing devices, if we are going
> > > > > to -EPROBE_DEFER.  Otherwise boot will get stuck forever at
> > > > > deferred_probe_initcall().
> > > >
> > > > Can please you expand on why this is a problem here in the commit
> > > > message?
> > > >
> > > > The aux devices appear to be tore down correctly in the probe error
> > > > paths so how exactly does that lead to deferred_probe_initcall() being
> > > > stuck? This sounds like we may have a problem elsewhere which this patch
> > > > is papering over.
> > >
> > > This is a known problem. Successful probes during the probe deferral
> > > loop causes the whole loop to be reiterated. Creating child devices
> > > usually results in  a successful probe. Aso I thought that just
> > > creating new device also causes a reprobe, but I can not find any
> > > evidence now.
> >
> > This still needs to be described in the commit message.
> >
> > Only a successful probe should trigger a reprobe, and when the child
> > devices are registered the parent is not yet on the deferred probe list.
> > So something is not right or missing here.
> 
> Child devices can be successfully probed, then the parent gets
> -EPROBE_DEFER, removes children and then it goes on and on.

So what? As I described above, the successful probe of the children
should have nothing to do with whether the parent is reprobed.

If that isn't the case, then explain how.

Johan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
  2023-12-14 14:09         ` Johan Hovold
@ 2023-12-14 15:38           ` Johan Hovold
  2023-12-14 16:08             ` Johan Hovold
  0 siblings, 1 reply; 10+ messages in thread
From: Johan Hovold @ 2023-12-14 15:38 UTC (permalink / raw)
  To: Dmitry Baryshkov, Rob Clark
  Cc: linux-arm-msm, Rob Clark, Andy Gross, Bjorn Andersson,
	Konrad Dybcio, open list

On Thu, Dec 14, 2023 at 03:09:36PM +0100, Johan Hovold wrote:
> On Thu, Dec 14, 2023 at 04:04:49PM +0200, Dmitry Baryshkov wrote:
> > On Thu, 14 Dec 2023 at 16:01, Johan Hovold <johan@kernel.org> wrote:
> > > On Thu, Dec 14, 2023 at 01:04:43PM +0200, Dmitry Baryshkov wrote:

> > > > This is a known problem. Successful probes during the probe deferral
> > > > loop causes the whole loop to be reiterated. Creating child devices
> > > > usually results in  a successful probe. Aso I thought that just
> > > > creating new device also causes a reprobe, but I can not find any
> > > > evidence now.
> > >
> > > This still needs to be described in the commit message.
> > >
> > > Only a successful probe should trigger a reprobe, and when the child
> > > devices are registered the parent is not yet on the deferred probe list.
> > > So something is not right or missing here.
> > 
> > Child devices can be successfully probed, then the parent gets
> > -EPROBE_DEFER, removes children and then it goes on and on.
> 
> So what? As I described above, the successful probe of the children
> should have nothing to do with whether the parent is reprobed.
> 
> If that isn't the case, then explain how.

I took a closer look at this and indeed we do have code that triggers a
reprobe of a device in case there was a successful probe while the
device was probing.

This was introduced by commit 58b116bce136 ("drivercore: deferral race
condition fix") and the workaround for the reprobe-loop bug that hack
led to is to not return -EPROBE_DEFER after registering child devices as
no one managed to come up with a proper fix. This was documented here:

	fbc35b45f9f6 ("Add documentation on meaning of -EPROBE_DEFER")

But please spell this out in some more detail in the commit message, and
add a Fixes and CC stable tag.

Johan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
  2023-12-14 15:38           ` Johan Hovold
@ 2023-12-14 16:08             ` Johan Hovold
  2023-12-14 20:44               ` Rob Clark
  0 siblings, 1 reply; 10+ messages in thread
From: Johan Hovold @ 2023-12-14 16:08 UTC (permalink / raw)
  To: Dmitry Baryshkov, Rob Clark
  Cc: linux-arm-msm, Rob Clark, Andy Gross, Bjorn Andersson,
	Konrad Dybcio, open list

On Thu, Dec 14, 2023 at 04:38:35PM +0100, Johan Hovold wrote:

> I took a closer look at this and indeed we do have code that triggers a
> reprobe of a device in case there was a successful probe while the
> device was probing.
> 
> This was introduced by commit 58b116bce136 ("drivercore: deferral race
> condition fix") and the workaround for the reprobe-loop bug that hack
> led to is to not return -EPROBE_DEFER after registering child devices as
> no one managed to come up with a proper fix. This was documented here:
> 
> 	fbc35b45f9f6 ("Add documentation on meaning of -EPROBE_DEFER")
> 
> But please spell this out in some more detail in the commit message, and
> add a Fixes and CC stable tag.

And please update the commit summary as I've been booting with QRTR=m
all along just fine. I guess the issue is if you have pmic_glink
built-in or in the initramfs but forgot to include qrtr or similar?

Johan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
  2023-12-14 16:08             ` Johan Hovold
@ 2023-12-14 20:44               ` Rob Clark
  2023-12-15  7:50                 ` Johan Hovold
  0 siblings, 1 reply; 10+ messages in thread
From: Rob Clark @ 2023-12-14 20:44 UTC (permalink / raw)
  To: Johan Hovold
  Cc: Dmitry Baryshkov, linux-arm-msm, Rob Clark, Andy Gross,
	Bjorn Andersson, Konrad Dybcio, open list

On Thu, Dec 14, 2023 at 8:08 AM Johan Hovold <johan@kernel.org> wrote:
>
> On Thu, Dec 14, 2023 at 04:38:35PM +0100, Johan Hovold wrote:
>
> > I took a closer look at this and indeed we do have code that triggers a
> > reprobe of a device in case there was a successful probe while the
> > device was probing.
> >
> > This was introduced by commit 58b116bce136 ("drivercore: deferral race
> > condition fix") and the workaround for the reprobe-loop bug that hack
> > led to is to not return -EPROBE_DEFER after registering child devices as
> > no one managed to come up with a proper fix. This was documented here:
> >
> >       fbc35b45f9f6 ("Add documentation on meaning of -EPROBE_DEFER")
> >
> > But please spell this out in some more detail in the commit message, and
> > add a Fixes and CC stable tag.
>
> And please update the commit summary as I've been booting with QRTR=m
> all along just fine. I guess the issue is if you have pmic_glink
> built-in or in the initramfs but forgot to include qrtr or similar?

I do have both QRTR=m and QCOM_GLINK=m.  I'm honestly not sure what
started triggering this issue for me.. it seemed to have started after
merging msm-next + drm-misc-next on top of your
jhovold/wip/sc8280xp-v6.7-rc5 (the merged branches were based on -rc3
so this shouldn't have really brought in random non-drm things).
Maybe there is a timing element to it?  I felt like the problem was
obvious enough, and the exact details of why I started hitting this
were not important enough to spend time tracking down.

BR,
-R

> Johan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
  2023-12-14 20:44               ` Rob Clark
@ 2023-12-15  7:50                 ` Johan Hovold
  0 siblings, 0 replies; 10+ messages in thread
From: Johan Hovold @ 2023-12-15  7:50 UTC (permalink / raw)
  To: Rob Clark
  Cc: Dmitry Baryshkov, linux-arm-msm, Rob Clark, Andy Gross,
	Bjorn Andersson, Konrad Dybcio, open list

On Thu, Dec 14, 2023 at 12:44:10PM -0800, Rob Clark wrote:
> On Thu, Dec 14, 2023 at 8:08 AM Johan Hovold <johan@kernel.org> wrote:
> >
> > On Thu, Dec 14, 2023 at 04:38:35PM +0100, Johan Hovold wrote:
> >
> > > I took a closer look at this and indeed we do have code that triggers a
> > > reprobe of a device in case there was a successful probe while the
> > > device was probing.
> > >
> > > This was introduced by commit 58b116bce136 ("drivercore: deferral race
> > > condition fix") and the workaround for the reprobe-loop bug that hack
> > > led to is to not return -EPROBE_DEFER after registering child devices as
> > > no one managed to come up with a proper fix. This was documented here:
> > >
> > >       fbc35b45f9f6 ("Add documentation on meaning of -EPROBE_DEFER")
> > >
> > > But please spell this out in some more detail in the commit message, and
> > > add a Fixes and CC stable tag.
> >
> > And please update the commit summary as I've been booting with QRTR=m
> > all along just fine. I guess the issue is if you have pmic_glink
> > built-in or in the initramfs but forgot to include qrtr or similar?
> 
> I do have both QRTR=m and QCOM_GLINK=m.  I'm honestly not sure what
> started triggering this issue for me.. it seemed to have started after
> merging msm-next + drm-misc-next on top of your
> jhovold/wip/sc8280xp-v6.7-rc5 (the merged branches were based on -rc3
> so this shouldn't have really brought in random non-drm things).
> Maybe there is a timing element to it?

Yeah, possibly, and device links may also come into play here.

> I felt like the problem was obvious enough, and the exact details of
> why I started hitting this were not important enough to spend time
> tracking down.

The patch itself is of course fine itself as a clean up (or
microoptimisation) but the claim that it solves, and is the correct fix
for, a probe loop issue is not obvious at all and requires some further
justification.

Since he last time someone suggested reverting the commit that
introduced the probe-loop issue, the result was to leave things as they
were and just document the workaround, it should be fine to just refer
to that commit:

	fbc35b45f9f6 ("Add documentation on meaning of -EPROBE_DEFER")

Perhaps you can keep the Subject if you make it clear in the commit
message that this bug isn't always hit with QRTR=m.

Johan

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-12-15  7:50 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-13 21:06 [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m Rob Clark
2023-12-14  7:16 ` Johan Hovold
2023-12-14 11:04   ` Dmitry Baryshkov
2023-12-14 14:01     ` Johan Hovold
2023-12-14 14:04       ` Dmitry Baryshkov
2023-12-14 14:09         ` Johan Hovold
2023-12-14 15:38           ` Johan Hovold
2023-12-14 16:08             ` Johan Hovold
2023-12-14 20:44               ` Rob Clark
2023-12-15  7:50                 ` Johan Hovold

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.