* [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
@ 2023-12-13 21:06 Rob Clark
2023-12-14 7:16 ` Johan Hovold
0 siblings, 1 reply; 10+ messages in thread
From: Rob Clark @ 2023-12-13 21:06 UTC (permalink / raw)
To: linux-arm-msm
Cc: Rob Clark, Dmitry Baryshkov, Andy Gross, Bjorn Andersson,
Konrad Dybcio, open list
From: Rob Clark <robdclark@chromium.org>
We need to bail out before adding/removing devices, if we are going
to -EPROBE_DEFER. Otherwise boot will get stuck forever at
deferred_probe_initcall().
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
---
drivers/soc/qcom/pmic_glink.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c
index 914057331afd..2899746af6a6 100644
--- a/drivers/soc/qcom/pmic_glink.c
+++ b/drivers/soc/qcom/pmic_glink.c
@@ -268,10 +268,16 @@ static int pmic_glink_probe(struct platform_device *pdev)
else
pg->client_mask = PMIC_GLINK_CLIENT_DEFAULT;
+ pg->pdr = pdr_handle_alloc(pmic_glink_pdr_callback, pg);
+ if (IS_ERR(pg->pdr)) {
+ ret = dev_err_probe(&pdev->dev, PTR_ERR(pg->pdr), "failed to initialize pdr\n");
+ return ret;
+ }
+
if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_UCSI)) {
ret = pmic_glink_add_aux_device(pg, &pg->ucsi_aux, "ucsi");
if (ret)
- return ret;
+ goto out_release_pdr_handle;
}
if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_ALTMODE)) {
ret = pmic_glink_add_aux_device(pg, &pg->altmode_aux, "altmode");
@@ -284,17 +290,11 @@ static int pmic_glink_probe(struct platform_device *pdev)
goto out_release_altmode_aux;
}
- pg->pdr = pdr_handle_alloc(pmic_glink_pdr_callback, pg);
- if (IS_ERR(pg->pdr)) {
- ret = dev_err_probe(&pdev->dev, PTR_ERR(pg->pdr), "failed to initialize pdr\n");
- goto out_release_aux_devices;
- }
-
service = pdr_add_lookup(pg->pdr, "tms/servreg", "msm/adsp/charger_pd");
if (IS_ERR(service)) {
ret = dev_err_probe(&pdev->dev, PTR_ERR(service),
"failed adding pdr lookup for charger_pd\n");
- goto out_release_pdr_handle;
+ goto out_release_aux_devices;
}
mutex_lock(&__pmic_glink_lock);
@@ -303,8 +303,6 @@ static int pmic_glink_probe(struct platform_device *pdev)
return 0;
-out_release_pdr_handle:
- pdr_handle_release(pg->pdr);
out_release_aux_devices:
if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_BATT))
pmic_glink_del_aux_device(pg, &pg->ps_aux);
@@ -314,6 +312,8 @@ static int pmic_glink_probe(struct platform_device *pdev)
out_release_ucsi_aux:
if (pg->client_mask & BIT(PMIC_GLINK_CLIENT_UCSI))
pmic_glink_del_aux_device(pg, &pg->ucsi_aux);
+out_release_pdr_handle:
+ pdr_handle_release(pg->pdr);
return ret;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
2023-12-13 21:06 [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m Rob Clark
@ 2023-12-14 7:16 ` Johan Hovold
2023-12-14 11:04 ` Dmitry Baryshkov
0 siblings, 1 reply; 10+ messages in thread
From: Johan Hovold @ 2023-12-14 7:16 UTC (permalink / raw)
To: Rob Clark
Cc: linux-arm-msm, Rob Clark, Dmitry Baryshkov, Andy Gross,
Bjorn Andersson, Konrad Dybcio, open list
On Wed, Dec 13, 2023 at 01:06:43PM -0800, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
>
> We need to bail out before adding/removing devices, if we are going
> to -EPROBE_DEFER. Otherwise boot will get stuck forever at
> deferred_probe_initcall().
Can please you expand on why this is a problem here in the commit
message?
The aux devices appear to be tore down correctly in the probe error
paths so how exactly does that lead to deferred_probe_initcall() being
stuck? This sounds like we may have a problem elsewhere which this patch
is papering over.
Also where does the probe deferral come from in your case?
pdr_handle_alloc()?
If this is a correct fix, I'd also expect there to be a Fixes and
CC-stable tag.
Johan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
2023-12-14 7:16 ` Johan Hovold
@ 2023-12-14 11:04 ` Dmitry Baryshkov
2023-12-14 14:01 ` Johan Hovold
0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Baryshkov @ 2023-12-14 11:04 UTC (permalink / raw)
To: Johan Hovold
Cc: Rob Clark, linux-arm-msm, Rob Clark, Andy Gross, Bjorn Andersson,
Konrad Dybcio, open list
On Thu, 14 Dec 2023 at 09:16, Johan Hovold <johan@kernel.org> wrote:
>
> On Wed, Dec 13, 2023 at 01:06:43PM -0800, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > We need to bail out before adding/removing devices, if we are going
> > to -EPROBE_DEFER. Otherwise boot will get stuck forever at
> > deferred_probe_initcall().
>
> Can please you expand on why this is a problem here in the commit
> message?
>
> The aux devices appear to be tore down correctly in the probe error
> paths so how exactly does that lead to deferred_probe_initcall() being
> stuck? This sounds like we may have a problem elsewhere which this patch
> is papering over.
This is a known problem. Successful probes during the probe deferral
loop causes the whole loop to be reiterated. Creating child devices
usually results in a successful probe. Aso I thought that just
creating new device also causes a reprobe, but I can not find any
evidence now.
>
> Also where does the probe deferral come from in your case?
> pdr_handle_alloc()?
>
> If this is a correct fix, I'd also expect there to be a Fixes and
> CC-stable tag.
>
> Johan
>
--
With best wishes
Dmitry
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
2023-12-14 11:04 ` Dmitry Baryshkov
@ 2023-12-14 14:01 ` Johan Hovold
2023-12-14 14:04 ` Dmitry Baryshkov
0 siblings, 1 reply; 10+ messages in thread
From: Johan Hovold @ 2023-12-14 14:01 UTC (permalink / raw)
To: Dmitry Baryshkov
Cc: Rob Clark, linux-arm-msm, Rob Clark, Andy Gross, Bjorn Andersson,
Konrad Dybcio, open list
On Thu, Dec 14, 2023 at 01:04:43PM +0200, Dmitry Baryshkov wrote:
> On Thu, 14 Dec 2023 at 09:16, Johan Hovold <johan@kernel.org> wrote:
> >
> > On Wed, Dec 13, 2023 at 01:06:43PM -0800, Rob Clark wrote:
> > > From: Rob Clark <robdclark@chromium.org>
> > >
> > > We need to bail out before adding/removing devices, if we are going
> > > to -EPROBE_DEFER. Otherwise boot will get stuck forever at
> > > deferred_probe_initcall().
> >
> > Can please you expand on why this is a problem here in the commit
> > message?
> >
> > The aux devices appear to be tore down correctly in the probe error
> > paths so how exactly does that lead to deferred_probe_initcall() being
> > stuck? This sounds like we may have a problem elsewhere which this patch
> > is papering over.
>
> This is a known problem. Successful probes during the probe deferral
> loop causes the whole loop to be reiterated. Creating child devices
> usually results in a successful probe. Aso I thought that just
> creating new device also causes a reprobe, but I can not find any
> evidence now.
This still needs to be described in the commit message.
Only a successful probe should trigger a reprobe, and when the child
devices are registered the parent is not yet on the deferred probe list.
So something is not right or missing here.
Johan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
2023-12-14 14:01 ` Johan Hovold
@ 2023-12-14 14:04 ` Dmitry Baryshkov
2023-12-14 14:09 ` Johan Hovold
0 siblings, 1 reply; 10+ messages in thread
From: Dmitry Baryshkov @ 2023-12-14 14:04 UTC (permalink / raw)
To: Johan Hovold
Cc: Rob Clark, linux-arm-msm, Rob Clark, Andy Gross, Bjorn Andersson,
Konrad Dybcio, open list
On Thu, 14 Dec 2023 at 16:01, Johan Hovold <johan@kernel.org> wrote:
>
> On Thu, Dec 14, 2023 at 01:04:43PM +0200, Dmitry Baryshkov wrote:
> > On Thu, 14 Dec 2023 at 09:16, Johan Hovold <johan@kernel.org> wrote:
> > >
> > > On Wed, Dec 13, 2023 at 01:06:43PM -0800, Rob Clark wrote:
> > > > From: Rob Clark <robdclark@chromium.org>
> > > >
> > > > We need to bail out before adding/removing devices, if we are going
> > > > to -EPROBE_DEFER. Otherwise boot will get stuck forever at
> > > > deferred_probe_initcall().
> > >
> > > Can please you expand on why this is a problem here in the commit
> > > message?
> > >
> > > The aux devices appear to be tore down correctly in the probe error
> > > paths so how exactly does that lead to deferred_probe_initcall() being
> > > stuck? This sounds like we may have a problem elsewhere which this patch
> > > is papering over.
> >
> > This is a known problem. Successful probes during the probe deferral
> > loop causes the whole loop to be reiterated. Creating child devices
> > usually results in a successful probe. Aso I thought that just
> > creating new device also causes a reprobe, but I can not find any
> > evidence now.
>
> This still needs to be described in the commit message.
>
> Only a successful probe should trigger a reprobe, and when the child
> devices are registered the parent is not yet on the deferred probe list.
> So something is not right or missing here.
Child devices can be successfully probed, then the parent gets
-EPROBE_DEFER, removes children and then it goes on and on.
--
With best wishes
Dmitry
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
2023-12-14 14:04 ` Dmitry Baryshkov
@ 2023-12-14 14:09 ` Johan Hovold
2023-12-14 15:38 ` Johan Hovold
0 siblings, 1 reply; 10+ messages in thread
From: Johan Hovold @ 2023-12-14 14:09 UTC (permalink / raw)
To: Dmitry Baryshkov
Cc: Rob Clark, linux-arm-msm, Rob Clark, Andy Gross, Bjorn Andersson,
Konrad Dybcio, open list
On Thu, Dec 14, 2023 at 04:04:49PM +0200, Dmitry Baryshkov wrote:
> On Thu, 14 Dec 2023 at 16:01, Johan Hovold <johan@kernel.org> wrote:
> > On Thu, Dec 14, 2023 at 01:04:43PM +0200, Dmitry Baryshkov wrote:
> > > On Thu, 14 Dec 2023 at 09:16, Johan Hovold <johan@kernel.org> wrote:
> > > > On Wed, Dec 13, 2023 at 01:06:43PM -0800, Rob Clark wrote:
> > > > > From: Rob Clark <robdclark@chromium.org>
> > > > >
> > > > > We need to bail out before adding/removing devices, if we are going
> > > > > to -EPROBE_DEFER. Otherwise boot will get stuck forever at
> > > > > deferred_probe_initcall().
> > > >
> > > > Can please you expand on why this is a problem here in the commit
> > > > message?
> > > >
> > > > The aux devices appear to be tore down correctly in the probe error
> > > > paths so how exactly does that lead to deferred_probe_initcall() being
> > > > stuck? This sounds like we may have a problem elsewhere which this patch
> > > > is papering over.
> > >
> > > This is a known problem. Successful probes during the probe deferral
> > > loop causes the whole loop to be reiterated. Creating child devices
> > > usually results in a successful probe. Aso I thought that just
> > > creating new device also causes a reprobe, but I can not find any
> > > evidence now.
> >
> > This still needs to be described in the commit message.
> >
> > Only a successful probe should trigger a reprobe, and when the child
> > devices are registered the parent is not yet on the deferred probe list.
> > So something is not right or missing here.
>
> Child devices can be successfully probed, then the parent gets
> -EPROBE_DEFER, removes children and then it goes on and on.
So what? As I described above, the successful probe of the children
should have nothing to do with whether the parent is reprobed.
If that isn't the case, then explain how.
Johan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
2023-12-14 14:09 ` Johan Hovold
@ 2023-12-14 15:38 ` Johan Hovold
2023-12-14 16:08 ` Johan Hovold
0 siblings, 1 reply; 10+ messages in thread
From: Johan Hovold @ 2023-12-14 15:38 UTC (permalink / raw)
To: Dmitry Baryshkov, Rob Clark
Cc: linux-arm-msm, Rob Clark, Andy Gross, Bjorn Andersson,
Konrad Dybcio, open list
On Thu, Dec 14, 2023 at 03:09:36PM +0100, Johan Hovold wrote:
> On Thu, Dec 14, 2023 at 04:04:49PM +0200, Dmitry Baryshkov wrote:
> > On Thu, 14 Dec 2023 at 16:01, Johan Hovold <johan@kernel.org> wrote:
> > > On Thu, Dec 14, 2023 at 01:04:43PM +0200, Dmitry Baryshkov wrote:
> > > > This is a known problem. Successful probes during the probe deferral
> > > > loop causes the whole loop to be reiterated. Creating child devices
> > > > usually results in a successful probe. Aso I thought that just
> > > > creating new device also causes a reprobe, but I can not find any
> > > > evidence now.
> > >
> > > This still needs to be described in the commit message.
> > >
> > > Only a successful probe should trigger a reprobe, and when the child
> > > devices are registered the parent is not yet on the deferred probe list.
> > > So something is not right or missing here.
> >
> > Child devices can be successfully probed, then the parent gets
> > -EPROBE_DEFER, removes children and then it goes on and on.
>
> So what? As I described above, the successful probe of the children
> should have nothing to do with whether the parent is reprobed.
>
> If that isn't the case, then explain how.
I took a closer look at this and indeed we do have code that triggers a
reprobe of a device in case there was a successful probe while the
device was probing.
This was introduced by commit 58b116bce136 ("drivercore: deferral race
condition fix") and the workaround for the reprobe-loop bug that hack
led to is to not return -EPROBE_DEFER after registering child devices as
no one managed to come up with a proper fix. This was documented here:
fbc35b45f9f6 ("Add documentation on meaning of -EPROBE_DEFER")
But please spell this out in some more detail in the commit message, and
add a Fixes and CC stable tag.
Johan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
2023-12-14 15:38 ` Johan Hovold
@ 2023-12-14 16:08 ` Johan Hovold
2023-12-14 20:44 ` Rob Clark
0 siblings, 1 reply; 10+ messages in thread
From: Johan Hovold @ 2023-12-14 16:08 UTC (permalink / raw)
To: Dmitry Baryshkov, Rob Clark
Cc: linux-arm-msm, Rob Clark, Andy Gross, Bjorn Andersson,
Konrad Dybcio, open list
On Thu, Dec 14, 2023 at 04:38:35PM +0100, Johan Hovold wrote:
> I took a closer look at this and indeed we do have code that triggers a
> reprobe of a device in case there was a successful probe while the
> device was probing.
>
> This was introduced by commit 58b116bce136 ("drivercore: deferral race
> condition fix") and the workaround for the reprobe-loop bug that hack
> led to is to not return -EPROBE_DEFER after registering child devices as
> no one managed to come up with a proper fix. This was documented here:
>
> fbc35b45f9f6 ("Add documentation on meaning of -EPROBE_DEFER")
>
> But please spell this out in some more detail in the commit message, and
> add a Fixes and CC stable tag.
And please update the commit summary as I've been booting with QRTR=m
all along just fine. I guess the issue is if you have pmic_glink
built-in or in the initramfs but forgot to include qrtr or similar?
Johan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
2023-12-14 16:08 ` Johan Hovold
@ 2023-12-14 20:44 ` Rob Clark
2023-12-15 7:50 ` Johan Hovold
0 siblings, 1 reply; 10+ messages in thread
From: Rob Clark @ 2023-12-14 20:44 UTC (permalink / raw)
To: Johan Hovold
Cc: Dmitry Baryshkov, linux-arm-msm, Rob Clark, Andy Gross,
Bjorn Andersson, Konrad Dybcio, open list
On Thu, Dec 14, 2023 at 8:08 AM Johan Hovold <johan@kernel.org> wrote:
>
> On Thu, Dec 14, 2023 at 04:38:35PM +0100, Johan Hovold wrote:
>
> > I took a closer look at this and indeed we do have code that triggers a
> > reprobe of a device in case there was a successful probe while the
> > device was probing.
> >
> > This was introduced by commit 58b116bce136 ("drivercore: deferral race
> > condition fix") and the workaround for the reprobe-loop bug that hack
> > led to is to not return -EPROBE_DEFER after registering child devices as
> > no one managed to come up with a proper fix. This was documented here:
> >
> > fbc35b45f9f6 ("Add documentation on meaning of -EPROBE_DEFER")
> >
> > But please spell this out in some more detail in the commit message, and
> > add a Fixes and CC stable tag.
>
> And please update the commit summary as I've been booting with QRTR=m
> all along just fine. I guess the issue is if you have pmic_glink
> built-in or in the initramfs but forgot to include qrtr or similar?
I do have both QRTR=m and QCOM_GLINK=m. I'm honestly not sure what
started triggering this issue for me.. it seemed to have started after
merging msm-next + drm-misc-next on top of your
jhovold/wip/sc8280xp-v6.7-rc5 (the merged branches were based on -rc3
so this shouldn't have really brought in random non-drm things).
Maybe there is a timing element to it? I felt like the problem was
obvious enough, and the exact details of why I started hitting this
were not important enough to spend time tracking down.
BR,
-R
> Johan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m
2023-12-14 20:44 ` Rob Clark
@ 2023-12-15 7:50 ` Johan Hovold
0 siblings, 0 replies; 10+ messages in thread
From: Johan Hovold @ 2023-12-15 7:50 UTC (permalink / raw)
To: Rob Clark
Cc: Dmitry Baryshkov, linux-arm-msm, Rob Clark, Andy Gross,
Bjorn Andersson, Konrad Dybcio, open list
On Thu, Dec 14, 2023 at 12:44:10PM -0800, Rob Clark wrote:
> On Thu, Dec 14, 2023 at 8:08 AM Johan Hovold <johan@kernel.org> wrote:
> >
> > On Thu, Dec 14, 2023 at 04:38:35PM +0100, Johan Hovold wrote:
> >
> > > I took a closer look at this and indeed we do have code that triggers a
> > > reprobe of a device in case there was a successful probe while the
> > > device was probing.
> > >
> > > This was introduced by commit 58b116bce136 ("drivercore: deferral race
> > > condition fix") and the workaround for the reprobe-loop bug that hack
> > > led to is to not return -EPROBE_DEFER after registering child devices as
> > > no one managed to come up with a proper fix. This was documented here:
> > >
> > > fbc35b45f9f6 ("Add documentation on meaning of -EPROBE_DEFER")
> > >
> > > But please spell this out in some more detail in the commit message, and
> > > add a Fixes and CC stable tag.
> >
> > And please update the commit summary as I've been booting with QRTR=m
> > all along just fine. I guess the issue is if you have pmic_glink
> > built-in or in the initramfs but forgot to include qrtr or similar?
>
> I do have both QRTR=m and QCOM_GLINK=m. I'm honestly not sure what
> started triggering this issue for me.. it seemed to have started after
> merging msm-next + drm-misc-next on top of your
> jhovold/wip/sc8280xp-v6.7-rc5 (the merged branches were based on -rc3
> so this shouldn't have really brought in random non-drm things).
> Maybe there is a timing element to it?
Yeah, possibly, and device links may also come into play here.
> I felt like the problem was obvious enough, and the exact details of
> why I started hitting this were not important enough to spend time
> tracking down.
The patch itself is of course fine itself as a clean up (or
microoptimisation) but the claim that it solves, and is the correct fix
for, a probe loop issue is not obvious at all and requires some further
justification.
Since he last time someone suggested reverting the commit that
introduced the probe-loop issue, the result was to leave things as they
were and just document the workaround, it should be fine to just refer
to that commit:
fbc35b45f9f6 ("Add documentation on meaning of -EPROBE_DEFER")
Perhaps you can keep the Subject if you make it clear in the commit
message that this bug isn't always hit with QRTR=m.
Johan
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-12-15 7:50 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-13 21:06 [PATCH] soc: qcom: pmic_glink: Fix boot when QRTR=m Rob Clark
2023-12-14 7:16 ` Johan Hovold
2023-12-14 11:04 ` Dmitry Baryshkov
2023-12-14 14:01 ` Johan Hovold
2023-12-14 14:04 ` Dmitry Baryshkov
2023-12-14 14:09 ` Johan Hovold
2023-12-14 15:38 ` Johan Hovold
2023-12-14 16:08 ` Johan Hovold
2023-12-14 20:44 ` Rob Clark
2023-12-15 7:50 ` Johan Hovold
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.