dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] drm/edid/firmware: stop using throwaway platform device
@ 2022-10-06 22:21 Jani Nikula
  2022-10-11  6:27 ` Matthieu CHARETTE
  0 siblings, 1 reply; 10+ messages in thread
From: Jani Nikula @ 2022-10-06 22:21 UTC (permalink / raw)
  To: dri-devel; +Cc: jani.nikula, intel-gfx, Matthieu CHARETTE

We've used a temporary platform device for firmware EDID loading since
it was introduced in commit da0df92b5731 ("drm: allow loading an EDID as
firmware to override broken monitor"), but there's no explanation why.

Do we need to?

Maybe this fixes the suspend/resume issue?

(Yes, I'll rewrite the commit message if this is the way to go ;)

References: https://lore.kernel.org/r/20220727074152.43059-1-matthieu.charette@gmail.com
Cc: Matthieu CHARETTE <matthieu.charette@gmail.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
---
 drivers/gpu/drm/drm_edid_load.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/drm_edid_load.c b/drivers/gpu/drm/drm_edid_load.c
index 37d8ba3ddb46..fbae12130234 100644
--- a/drivers/gpu/drm/drm_edid_load.c
+++ b/drivers/gpu/drm/drm_edid_load.c
@@ -182,18 +182,9 @@ static void *edid_load(struct drm_connector *connector, const char *name,
 		fwdata = generic_edid[builtin];
 		fwsize = sizeof(generic_edid[builtin]);
 	} else {
-		struct platform_device *pdev;
 		int err;
 
-		pdev = platform_device_register_simple(connector_name, -1, NULL, 0);
-		if (IS_ERR(pdev)) {
-			DRM_ERROR("Failed to register EDID firmware platform device "
-				  "for connector \"%s\"\n", connector_name);
-			return ERR_CAST(pdev);
-		}
-
-		err = request_firmware(&fw, name, &pdev->dev);
-		platform_device_unregister(pdev);
+		err = request_firmware(&fw, name, connector->dev->dev);
 		if (err) {
 			DRM_ERROR("Requesting EDID firmware \"%s\" failed (err=%d)\n",
 				  name, err);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/edid/firmware: stop using throwaway platform device
  2022-10-06 22:21 [PATCH] drm/edid/firmware: stop using throwaway platform device Jani Nikula
@ 2022-10-11  6:27 ` Matthieu CHARETTE
  2022-10-11  7:20   ` Jani Nikula
  0 siblings, 1 reply; 10+ messages in thread
From: Matthieu CHARETTE @ 2022-10-11  6:27 UTC (permalink / raw)
  To: Jani Nikula; +Cc: intel-gfx, dri-devel

It should fix the issue. Meanwhile, the system will still crash if a 
new monitor is plugged while the machine is suspended. We might need to 
precache the EDID to prevent that.

Matthieu

On Fri, Oct 7 2022 at 01:21:46 AM +0300, Jani Nikula 
<jani.nikula@intel.com> wrote:
> We've used a temporary platform device for firmware EDID loading since
> it was introduced in commit da0df92b5731 ("drm: allow loading an EDID 
> as
> firmware to override broken monitor"), but there's no explanation why.
> 
> Do we need to?
> 
> Maybe this fixes the suspend/resume issue?
> 
> (Yes, I'll rewrite the commit message if this is the way to go ;)
> 
> References: 
> https://lore.kernel.org/r/20220727074152.43059-1-matthieu.charette@gmail.com
> Cc: Matthieu CHARETTE <matthieu.charette@gmail.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
> ---
>  drivers/gpu/drm/drm_edid_load.c | 11 +----------
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_edid_load.c 
> b/drivers/gpu/drm/drm_edid_load.c
> index 37d8ba3ddb46..fbae12130234 100644
> --- a/drivers/gpu/drm/drm_edid_load.c
> +++ b/drivers/gpu/drm/drm_edid_load.c
> @@ -182,18 +182,9 @@ static void *edid_load(struct drm_connector 
> *connector, const char *name,
>  		fwdata = generic_edid[builtin];
>  		fwsize = sizeof(generic_edid[builtin]);
>  	} else {
> -		struct platform_device *pdev;
>  		int err;
> 
> -		pdev = platform_device_register_simple(connector_name, -1, NULL, 
> 0);
> -		if (IS_ERR(pdev)) {
> -			DRM_ERROR("Failed to register EDID firmware platform device "
> -				  "for connector \"%s\"\n", connector_name);
> -			return ERR_CAST(pdev);
> -		}
> -
> -		err = request_firmware(&fw, name, &pdev->dev);
> -		platform_device_unregister(pdev);
> +		err = request_firmware(&fw, name, connector->dev->dev);
>  		if (err) {
>  			DRM_ERROR("Requesting EDID firmware \"%s\" failed (err=%d)\n",
>  				  name, err);
> --
> 2.34.1
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/edid/firmware: stop using throwaway platform device
  2022-10-11  6:27 ` Matthieu CHARETTE
@ 2022-10-11  7:20   ` Jani Nikula
  2022-10-11 20:45     ` Matthieu CHARETTE
  0 siblings, 1 reply; 10+ messages in thread
From: Jani Nikula @ 2022-10-11  7:20 UTC (permalink / raw)
  To: Matthieu CHARETTE; +Cc: intel-gfx, dri-devel

On Tue, 11 Oct 2022, Matthieu CHARETTE <matthieu.charette@gmail.com> wrote:
> It should fix the issue. Meanwhile, the system will still crash if a 
> new monitor is plugged while the machine is suspended. We might need to 
> precache the EDID to prevent that.

Please elaborate.

BR,
Jani.


>
> Matthieu
>
> On Fri, Oct 7 2022 at 01:21:46 AM +0300, Jani Nikula 
> <jani.nikula@intel.com> wrote:
>> We've used a temporary platform device for firmware EDID loading since
>> it was introduced in commit da0df92b5731 ("drm: allow loading an EDID 
>> as
>> firmware to override broken monitor"), but there's no explanation why.
>> 
>> Do we need to?
>> 
>> Maybe this fixes the suspend/resume issue?
>> 
>> (Yes, I'll rewrite the commit message if this is the way to go ;)
>> 
>> References: 
>> https://lore.kernel.org/r/20220727074152.43059-1-matthieu.charette@gmail.com
>> Cc: Matthieu CHARETTE <matthieu.charette@gmail.com>
>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
>> ---
>>  drivers/gpu/drm/drm_edid_load.c | 11 +----------
>>  1 file changed, 1 insertion(+), 10 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/drm_edid_load.c 
>> b/drivers/gpu/drm/drm_edid_load.c
>> index 37d8ba3ddb46..fbae12130234 100644
>> --- a/drivers/gpu/drm/drm_edid_load.c
>> +++ b/drivers/gpu/drm/drm_edid_load.c
>> @@ -182,18 +182,9 @@ static void *edid_load(struct drm_connector 
>> *connector, const char *name,
>>  		fwdata = generic_edid[builtin];
>>  		fwsize = sizeof(generic_edid[builtin]);
>>  	} else {
>> -		struct platform_device *pdev;
>>  		int err;
>> 
>> -		pdev = platform_device_register_simple(connector_name, -1, NULL, 
>> 0);
>> -		if (IS_ERR(pdev)) {
>> -			DRM_ERROR("Failed to register EDID firmware platform device "
>> -				  "for connector \"%s\"\n", connector_name);
>> -			return ERR_CAST(pdev);
>> -		}
>> -
>> -		err = request_firmware(&fw, name, &pdev->dev);
>> -		platform_device_unregister(pdev);
>> +		err = request_firmware(&fw, name, connector->dev->dev);
>>  		if (err) {
>>  			DRM_ERROR("Requesting EDID firmware \"%s\" failed (err=%d)\n",
>>  				  name, err);
>> --
>> 2.34.1
>> 
>
>

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/edid/firmware: stop using throwaway platform device
  2022-10-11  7:20   ` Jani Nikula
@ 2022-10-11 20:45     ` Matthieu CHARETTE
  2022-10-12  8:25       ` Jani Nikula
  0 siblings, 1 reply; 10+ messages in thread
From: Matthieu CHARETTE @ 2022-10-11 20:45 UTC (permalink / raw)
  To: Jani Nikula; +Cc: intel-gfx, dri-devel

Currently the EDID is requested during the resume. But since it's
requested too early, this means before the filesystem is mounted, the
firmware request fails. This make the DRM driver crash when resuming.
This kind of issue should be prevented by the firmware caching process
which cache every firmware requested for the next resume. But since we
are using a temporary device, the firmware isn't cached on suspend
since the device doesn't work anymore.
When using a non temporary device to get the EDID, the firmware will
be cached on suspend for the next resume. So requesting the firmware
during resume will succeed.
But if the firmware has never been requested since the boot, this
means that the monitor isn't plugged since the boot. The kernel will
not be caching the EDID. So if we plug the monitor while the machine
is suspended. The resume will fail to load the firmware. And the DRM
driver will crash.
So basically, your fix should solve the issue except for the case
where the monitor hasn't been plugged since boot and is plugged while
the machine is suspended.
I hope I was clear. Tell me if I wasn't. I'm not really good at explaining.

Matthieu

On 10/11/22, Jani Nikula <jani.nikula@intel.com> wrote:
> On Tue, 11 Oct 2022, Matthieu CHARETTE <matthieu.charette@gmail.com> wrote:
>> It should fix the issue. Meanwhile, the system will still crash if a
>> new monitor is plugged while the machine is suspended. We might need to
>> precache the EDID to prevent that.
>
> Please elaborate.
>
> BR,
> Jani.
>
>
>>
>> Matthieu
>>
>> On Fri, Oct 7 2022 at 01:21:46 AM +0300, Jani Nikula
>> <jani.nikula@intel.com> wrote:
>>> We've used a temporary platform device for firmware EDID loading since
>>> it was introduced in commit da0df92b5731 ("drm: allow loading an EDID
>>> as
>>> firmware to override broken monitor"), but there's no explanation why.
>>>
>>> Do we need to?
>>>
>>> Maybe this fixes the suspend/resume issue?
>>>
>>> (Yes, I'll rewrite the commit message if this is the way to go ;)
>>>
>>> References:
>>> https://lore.kernel.org/r/20220727074152.43059-1-matthieu.charette@gmail.com
>>> Cc: Matthieu CHARETTE <matthieu.charette@gmail.com>
>>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
>>> ---
>>>  drivers/gpu/drm/drm_edid_load.c | 11 +----------
>>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/drm_edid_load.c
>>> b/drivers/gpu/drm/drm_edid_load.c
>>> index 37d8ba3ddb46..fbae12130234 100644
>>> --- a/drivers/gpu/drm/drm_edid_load.c
>>> +++ b/drivers/gpu/drm/drm_edid_load.c
>>> @@ -182,18 +182,9 @@ static void *edid_load(struct drm_connector
>>> *connector, const char *name,
>>>  		fwdata = generic_edid[builtin];
>>>  		fwsize = sizeof(generic_edid[builtin]);
>>>  	} else {
>>> -		struct platform_device *pdev;
>>>  		int err;
>>>
>>> -		pdev = platform_device_register_simple(connector_name, -1, NULL,
>>> 0);
>>> -		if (IS_ERR(pdev)) {
>>> -			DRM_ERROR("Failed to register EDID firmware platform device "
>>> -				  "for connector \"%s\"\n", connector_name);
>>> -			return ERR_CAST(pdev);
>>> -		}
>>> -
>>> -		err = request_firmware(&fw, name, &pdev->dev);
>>> -		platform_device_unregister(pdev);
>>> +		err = request_firmware(&fw, name, connector->dev->dev);
>>>  		if (err) {
>>>  			DRM_ERROR("Requesting EDID firmware \"%s\" failed (err=%d)\n",
>>>  				  name, err);
>>> --
>>> 2.34.1
>>>
>>
>>
>
> --
> Jani Nikula, Intel Open Source Graphics Center
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/edid/firmware: stop using throwaway platform device
  2022-10-11 20:45     ` Matthieu CHARETTE
@ 2022-10-12  8:25       ` Jani Nikula
  2022-10-12 17:16         ` Matthieu CHARETTE
  0 siblings, 1 reply; 10+ messages in thread
From: Jani Nikula @ 2022-10-12  8:25 UTC (permalink / raw)
  To: Matthieu CHARETTE; +Cc: intel-gfx, dri-devel

On Tue, 11 Oct 2022, Matthieu CHARETTE <matthieu.charette@gmail.com> wrote:
> Currently the EDID is requested during the resume. But since it's
> requested too early, this means before the filesystem is mounted, the
> firmware request fails. This make the DRM driver crash when resuming.
> This kind of issue should be prevented by the firmware caching process
> which cache every firmware requested for the next resume. But since we
> are using a temporary device, the firmware isn't cached on suspend
> since the device doesn't work anymore.
> When using a non temporary device to get the EDID, the firmware will
> be cached on suspend for the next resume. So requesting the firmware
> during resume will succeed.
> But if the firmware has never been requested since the boot, this
> means that the monitor isn't plugged since the boot. The kernel will
> not be caching the EDID. So if we plug the monitor while the machine
> is suspended. The resume will fail to load the firmware. And the DRM
> driver will crash.
> So basically, your fix should solve the issue except for the case
> where the monitor hasn't been plugged since boot and is plugged while
> the machine is suspended.
> I hope I was clear. Tell me if I wasn't. I'm not really good at explaining.

That was a pretty good explanation. The only thing I'm missing is what
the failure mode is exactly when you claim the driver will crash. Why
would request_firmware() "crash" if called for the first time on the
resume path?

I'm not sure I care much about not being able to load the firmware EDID
in the suspend-plug-resume case (as this can be remedied with a
subsequent modeset), but obviously any errors need to be handled
gracefully, without crashing.

BR,
Jani.


-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/edid/firmware: stop using throwaway platform device
  2022-10-12  8:25       ` Jani Nikula
@ 2022-10-12 17:16         ` Matthieu CHARETTE
  2022-11-06 15:03           ` Matthieu CHARETTE
  0 siblings, 1 reply; 10+ messages in thread
From: Matthieu CHARETTE @ 2022-10-12 17:16 UTC (permalink / raw)
  To: Jani Nikula; +Cc: intel-gfx, dri-devel

By crash, I mean that an error is returned here: 
https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux.git/+/refs/heads/master/drivers/gpu/drm/drm_edid_load.c#195
I don't really know what happens next, but on my machine the built-in 
screen and the external remains dark. Also the kernel seems to freeze. 
I suspect a kernel panic, but I'm not sure. Anyway, the error is 
definitely not well handled, and a fix would be great.
Also, request_firmware() will crash if called for the first time on the 
resume path because the file system isn't reachable on the resume 
process. And no cache is available for this firmware. So I guess that 
in this case, request_firmware() returns an error.
Suspend-plug-resume case is not my priority nether as long as it 
doesn't make the system crash (Which is currently the case).

On Wed, Oct 12 2022 at 11:25:59 AM +0300, Jani Nikula 
<jani.nikula@intel.com> wrote:
> On Tue, 11 Oct 2022, Matthieu CHARETTE <matthieu.charette@gmail.com> 
> wrote:
>>  Currently the EDID is requested during the resume. But since it's
>>  requested too early, this means before the filesystem is mounted, 
>> the
>>  firmware request fails. This make the DRM driver crash when 
>> resuming.
>>  This kind of issue should be prevented by the firmware caching 
>> process
>>  which cache every firmware requested for the next resume. But since 
>> we
>>  are using a temporary device, the firmware isn't cached on suspend
>>  since the device doesn't work anymore.
>>  When using a non temporary device to get the EDID, the firmware will
>>  be cached on suspend for the next resume. So requesting the firmware
>>  during resume will succeed.
>>  But if the firmware has never been requested since the boot, this
>>  means that the monitor isn't plugged since the boot. The kernel will
>>  not be caching the EDID. So if we plug the monitor while the machine
>>  is suspended. The resume will fail to load the firmware. And the DRM
>>  driver will crash.
>>  So basically, your fix should solve the issue except for the case
>>  where the monitor hasn't been plugged since boot and is plugged 
>> while
>>  the machine is suspended.
>>  I hope I was clear. Tell me if I wasn't. I'm not really good at 
>> explaining.
> 
> That was a pretty good explanation. The only thing I'm missing is what
> the failure mode is exactly when you claim the driver will crash. Why
> would request_firmware() "crash" if called for the first time on the
> resume path?
> 
> I'm not sure I care much about not being able to load the firmware 
> EDID
> in the suspend-plug-resume case (as this can be remedied with a
> subsequent modeset), but obviously any errors need to be handled
> gracefully, without crashing.
> 
> BR,
> Jani.
> 
> 
> --
> Jani Nikula, Intel Open Source Graphics Center



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/edid/firmware: stop using throwaway platform device
  2022-10-12 17:16         ` Matthieu CHARETTE
@ 2022-11-06 15:03           ` Matthieu CHARETTE
  2022-11-08 11:27             ` Jani Nikula
  0 siblings, 1 reply; 10+ messages in thread
From: Matthieu CHARETTE @ 2022-11-06 15:03 UTC (permalink / raw)
  To: Jani Nikula; +Cc: intel-gfx, dri-devel

Hi,

Can you tell me what are we waiting for? Maybe I can help.

Thanks.

Matthieu

On Wed, Oct 12 2022 at 07:16:29 PM +0200, Matthieu CHARETTE 
<matthieu.charette@gmail.com> wrote:
> By crash, I mean that an error is returned here: 
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux.git/+/refs/heads/master/drivers/gpu/drm/drm_edid_load.c#195
> I don't really know what happens next, but on my machine the built-in 
> screen and the external remains dark. Also the kernel seems to 
> freeze. I suspect a kernel panic, but I'm not sure. Anyway, the error 
> is definitely not well handled, and a fix would be great.
> Also, request_firmware() will crash if called for the first time on 
> the resume path because the file system isn't reachable on the resume 
> process. And no cache is available for this firmware. So I guess that 
> in this case, request_firmware() returns an error.
> Suspend-plug-resume case is not my priority nether as long as it 
> doesn't make the system crash (Which is currently the case).
> 
> On Wed, Oct 12 2022 at 11:25:59 AM +0300, Jani Nikula 
> <jani.nikula@intel.com> wrote:
>> On Tue, 11 Oct 2022, Matthieu CHARETTE <matthieu.charette@gmail.com> 
>> \x7fwrote:
>>>  Currently the EDID is requested during the resume. But since it's
>>>  requested too early, this means before the filesystem is mounted, 
>>> \x7f\x7fthe
>>>  firmware request fails. This make the DRM driver crash when 
>>> \x7f\x7fresuming.
>>>  This kind of issue should be prevented by the firmware caching 
>>> \x7f\x7fprocess
>>>  which cache every firmware requested for the next resume. But 
>>> since \x7f\x7fwe
>>>  are using a temporary device, the firmware isn't cached on suspend
>>>  since the device doesn't work anymore.
>>>  When using a non temporary device to get the EDID, the firmware 
>>> will
>>>  be cached on suspend for the next resume. So requesting the 
>>> firmware
>>>  during resume will succeed.
>>>  But if the firmware has never been requested since the boot, this
>>>  means that the monitor isn't plugged since the boot. The kernel 
>>> will
>>>  not be caching the EDID. So if we plug the monitor while the 
>>> machine
>>>  is suspended. The resume will fail to load the firmware. And the 
>>> DRM
>>>  driver will crash.
>>>  So basically, your fix should solve the issue except for the case
>>>  where the monitor hasn't been plugged since boot and is plugged 
>>> \x7f\x7fwhile
>>>  the machine is suspended.
>>>  I hope I was clear. Tell me if I wasn't. I'm not really good at 
>>> \x7f\x7fexplaining.
>> 
>> That was a pretty good explanation. The only thing I'm missing is 
>> what
>> the failure mode is exactly when you claim the driver will crash. Why
>> would request_firmware() "crash" if called for the first time on the
>> resume path?
>> 
>> I'm not sure I care much about not being able to load the firmware 
>> \x7fEDID
>> in the suspend-plug-resume case (as this can be remedied with a
>> subsequent modeset), but obviously any errors need to be handled
>> gracefully, without crashing.
>> 
>> BR,
>> Jani.
>> 
>> 
>> --
>> Jani Nikula, Intel Open Source Graphics Center
> 
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/edid/firmware: stop using throwaway platform device
  2022-11-06 15:03           ` Matthieu CHARETTE
@ 2022-11-08 11:27             ` Jani Nikula
  2022-11-08 15:40               ` Matthieu CHARETTE
  0 siblings, 1 reply; 10+ messages in thread
From: Jani Nikula @ 2022-11-08 11:27 UTC (permalink / raw)
  To: Matthieu CHARETTE; +Cc: intel-gfx, dri-devel

On Sun, 06 Nov 2022, Matthieu CHARETTE <matthieu.charette@gmail.com> wrote:
> Hi,
>
> Can you tell me what are we waiting for? Maybe I can help.

Have you tried the patch? Is it an improvement over the status quo?

The "crash" is still ambiguous to me. Do you observe it with the patch?
Do you have logs? Etc.

BR,
Jani.


>
> Thanks.
>
> Matthieu
>
> On Wed, Oct 12 2022 at 07:16:29 PM +0200, Matthieu CHARETTE 
> <matthieu.charette@gmail.com> wrote:
>> By crash, I mean that an error is returned here: 
>> https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux.git/+/refs/heads/master/drivers/gpu/drm/drm_edid_load.c#195
>> I don't really know what happens next, but on my machine the built-in 
>> screen and the external remains dark. Also the kernel seems to 
>> freeze. I suspect a kernel panic, but I'm not sure. Anyway, the error 
>> is definitely not well handled, and a fix would be great.
>> Also, request_firmware() will crash if called for the first time on 
>> the resume path because the file system isn't reachable on the resume 
>> process. And no cache is available for this firmware. So I guess that 
>> in this case, request_firmware() returns an error.
>> Suspend-plug-resume case is not my priority nether as long as it 
>> doesn't make the system crash (Which is currently the case).
>> 
>> On Wed, Oct 12 2022 at 11:25:59 AM +0300, Jani Nikula 
>> <jani.nikula@intel.com> wrote:
>>> On Tue, 11 Oct 2022, Matthieu CHARETTE <matthieu.charette@gmail.com> 
>>> \x7fwrote:
>>>>  Currently the EDID is requested during the resume. But since it's
>>>>  requested too early, this means before the filesystem is mounted, 
>>>> \x7f\x7fthe
>>>>  firmware request fails. This make the DRM driver crash when 
>>>> \x7f\x7fresuming.
>>>>  This kind of issue should be prevented by the firmware caching 
>>>> \x7f\x7fprocess
>>>>  which cache every firmware requested for the next resume. But 
>>>> since \x7f\x7fwe
>>>>  are using a temporary device, the firmware isn't cached on suspend
>>>>  since the device doesn't work anymore.
>>>>  When using a non temporary device to get the EDID, the firmware 
>>>> will
>>>>  be cached on suspend for the next resume. So requesting the 
>>>> firmware
>>>>  during resume will succeed.
>>>>  But if the firmware has never been requested since the boot, this
>>>>  means that the monitor isn't plugged since the boot. The kernel 
>>>> will
>>>>  not be caching the EDID. So if we plug the monitor while the 
>>>> machine
>>>>  is suspended. The resume will fail to load the firmware. And the 
>>>> DRM
>>>>  driver will crash.
>>>>  So basically, your fix should solve the issue except for the case
>>>>  where the monitor hasn't been plugged since boot and is plugged 
>>>> \x7f\x7fwhile
>>>>  the machine is suspended.
>>>>  I hope I was clear. Tell me if I wasn't. I'm not really good at 
>>>> \x7f\x7fexplaining.
>>> 
>>> That was a pretty good explanation. The only thing I'm missing is 
>>> what
>>> the failure mode is exactly when you claim the driver will crash. Why
>>> would request_firmware() "crash" if called for the first time on the
>>> resume path?
>>> 
>>> I'm not sure I care much about not being able to load the firmware 
>>> \x7fEDID
>>> in the suspend-plug-resume case (as this can be remedied with a
>>> subsequent modeset), but obviously any errors need to be handled
>>> gracefully, without crashing.
>>> 
>>> BR,
>>> Jani.
>>> 
>>> 
>>> --
>>> Jani Nikula, Intel Open Source Graphics Center
>> 
>> 
>
>

-- 
Jani Nikula, Intel Open Source Graphics Center

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/edid/firmware: stop using throwaway platform device
  2022-11-08 11:27             ` Jani Nikula
@ 2022-11-08 15:40               ` Matthieu CHARETTE
  2022-11-13 19:26                 ` Matthieu CHARETTE
  0 siblings, 1 reply; 10+ messages in thread
From: Matthieu CHARETTE @ 2022-11-08 15:40 UTC (permalink / raw)
  To: Jani Nikula; +Cc: intel-gfx, dri-devel

I didn't test the patch yet. I will do. But even without testing I can 
tell you that it will work (It will not crash).
Currently when the crash occurs, all screens remain black after resume. 
I'm not able to login with ssh neither. And logs end before the 
suspend. So the crash seems to be some kind of kernel panic.

Matthieu

On Tue, Nov 8 2022 at 01:27:33 PM +0200, Jani Nikula 
<jani.nikula@intel.com> wrote:
> On Sun, 06 Nov 2022, Matthieu CHARETTE <matthieu.charette@gmail.com> 
> wrote:
>>  Hi,
>> 
>>  Can you tell me what are we waiting for? Maybe I can help.
> 
> Have you tried the patch? Is it an improvement over the status quo?
> 
> The "crash" is still ambiguous to me. Do you observe it with the 
> patch?
> Do you have logs? Etc.
> 
> BR,
> Jani.
> 
> 
>> 
>>  Thanks.
>> 
>>  Matthieu
>> 
>>  On Wed, Oct 12 2022 at 07:16:29 PM +0200, Matthieu CHARETTE
>>  <matthieu.charette@gmail.com> wrote:
>>>  By crash, I mean that an error is returned here:
>>>  
>>> https://kernel.googlesource.com/pub/scm/linux/kernel/git/torvalds/linux.git/+/refs/heads/master/drivers/gpu/drm/drm_edid_load.c#195
>>>  I don't really know what happens next, but on my machine the 
>>> built-in
>>>  screen and the external remains dark. Also the kernel seems to
>>>  freeze. I suspect a kernel panic, but I'm not sure. Anyway, the 
>>> error
>>>  is definitely not well handled, and a fix would be great.
>>>  Also, request_firmware() will crash if called for the first time on
>>>  the resume path because the file system isn't reachable on the 
>>> resume
>>>  process. And no cache is available for this firmware. So I guess 
>>> that
>>>  in this case, request_firmware() returns an error.
>>>  Suspend-plug-resume case is not my priority nether as long as it
>>>  doesn't make the system crash (Which is currently the case).
>>> 
>>>  On Wed, Oct 12 2022 at 11:25:59 AM +0300, Jani Nikula
>>>  <jani.nikula@intel.com> wrote:
>>>>  On Tue, 11 Oct 2022, Matthieu CHARETTE 
>>>> <matthieu.charette@gmail.com>
>>>>  \x7fwrote:
>>>>>   Currently the EDID is requested during the resume. But since 
>>>>> it's
>>>>>   requested too early, this means before the filesystem is 
>>>>> mounted,
>>>>>  \x7f\x7fthe
>>>>>   firmware request fails. This make the DRM driver crash when
>>>>>  \x7f\x7fresuming.
>>>>>   This kind of issue should be prevented by the firmware caching
>>>>>  \x7f\x7fprocess
>>>>>   which cache every firmware requested for the next resume. But
>>>>>  since \x7f\x7fwe
>>>>>   are using a temporary device, the firmware isn't cached on 
>>>>> suspend
>>>>>   since the device doesn't work anymore.
>>>>>   When using a non temporary device to get the EDID, the firmware
>>>>>  will
>>>>>   be cached on suspend for the next resume. So requesting the
>>>>>  firmware
>>>>>   during resume will succeed.
>>>>>   But if the firmware has never been requested since the boot, 
>>>>> this
>>>>>   means that the monitor isn't plugged since the boot. The kernel
>>>>>  will
>>>>>   not be caching the EDID. So if we plug the monitor while the
>>>>>  machine
>>>>>   is suspended. The resume will fail to load the firmware. And the
>>>>>  DRM
>>>>>   driver will crash.
>>>>>   So basically, your fix should solve the issue except for the 
>>>>> case
>>>>>   where the monitor hasn't been plugged since boot and is plugged
>>>>>  \x7f\x7fwhile
>>>>>   the machine is suspended.
>>>>>   I hope I was clear. Tell me if I wasn't. I'm not really good at
>>>>>  \x7f\x7fexplaining.
>>>> 
>>>>  That was a pretty good explanation. The only thing I'm missing is
>>>>  what
>>>>  the failure mode is exactly when you claim the driver will crash. 
>>>> Why
>>>>  would request_firmware() "crash" if called for the first time on 
>>>> the
>>>>  resume path?
>>>> 
>>>>  I'm not sure I care much about not being able to load the firmware
>>>>  \x7fEDID
>>>>  in the suspend-plug-resume case (as this can be remedied with a
>>>>  subsequent modeset), but obviously any errors need to be handled
>>>>  gracefully, without crashing.
>>>> 
>>>>  BR,
>>>>  Jani.
>>>> 
>>>> 
>>>>  --
>>>>  Jani Nikula, Intel Open Source Graphics Center
>>> 
>>> 
>> 
>> 
> 
> --
> Jani Nikula, Intel Open Source Graphics Center



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] drm/edid/firmware: stop using throwaway platform device
  2022-11-08 15:40               ` Matthieu CHARETTE
@ 2022-11-13 19:26                 ` Matthieu CHARETTE
  0 siblings, 0 replies; 10+ messages in thread
From: Matthieu CHARETTE @ 2022-11-13 19:26 UTC (permalink / raw)
  To: Jani Nikula; +Cc: intel-gfx, dri-devel

Hi,

I've tested the patch and I can confirm that it fixed the issue.
Tested on Fedora 36 with kernel 6.0.8.

Thanks,
Matthieu

On Tue, Nov 8 2022 at 04:40:52 PM +0100, Matthieu CHARETTE 
<matthieu.charette@gmail.com> wrote:
> I didn't test the patch yet. I will do. But even without testing I 
> can tell you that it will work (It will not crash).
> Currently when the crash occurs, all screens remain black after 
> resume. I'm not able to login with ssh neither. And logs end before 
> the suspend. So the crash seems to be some kind of kernel panic.
> 
> Matthieu
> 
> On Tue, Nov 8 2022 at 01:27:33 PM +0200, Jani Nikula 
> <jani.nikula@intel.com> wrote:
>> On Sun, 06 Nov 2022, Matthieu CHARETTE <matthieu.charette@gmail.com> 
>> \x7fwrote:
>>>  Hi,
>>> 
>>>  Can you tell me what are we waiting for? Maybe I can help.
>> 
>> Have you tried the patch? Is it an improvement over the status quo?
>> 
>> The "crash" is still ambiguous to me. Do you observe it with the 
>> \x7fpatch?
>> Do you have logs? Etc.
>> 
>> BR,
>> Jani.
>> 
>> 
>>> 
>>>  Thanks.
>>> 
>>>  Matthieu
>>> 
>>>  On Wed, Oct 12 2022 at 07:16:29 PM +0200, Matthieu CHARETTE
>>>  <matthieu.charette@gmail.com> wrote:
>>>>  By crash, I mean that an error is returned here:
>>>>  
>>>> \x7f\x7f\x7fLINKIFYEFGabGCFEcabCEEfeCECDAcaebFGeICGHEAADGDb
>>>>  I don't really know what happens next, but on my machine the 
>>>> \x7f\x7f\x7fbuilt-in
>>>>  screen and the external remains dark. Also the kernel seems to
>>>>  freeze. I suspect a kernel panic, but I'm not sure. Anyway, the 
>>>> \x7f\x7f\x7ferror
>>>>  is definitely not well handled, and a fix would be great.
>>>>  Also, request_firmware() will crash if called for the first time 
>>>> on
>>>>  the resume path because the file system isn't reachable on the 
>>>> \x7f\x7f\x7fresume
>>>>  process. And no cache is available for this firmware. So I guess 
>>>> \x7f\x7f\x7fthat
>>>>  in this case, request_firmware() returns an error.
>>>>  Suspend-plug-resume case is not my priority nether as long as it
>>>>  doesn't make the system crash (Which is currently the case).
>>>> 
>>>>  On Wed, Oct 12 2022 at 11:25:59 AM +0300, Jani Nikula
>>>>  <jani.nikula@intel.com> wrote:
>>>>>  On Tue, 11 Oct 2022, Matthieu CHARETTE 
>>>>> \x7f\x7f\x7f\x7f<matthieu.charette@gmail.com>
>>>>>  \x7fwrote:
>>>>>>   Currently the EDID is requested during the resume. But since 
>>>>>> \x7f\x7f\x7f\x7f\x7fit's
>>>>>>   requested too early, this means before the filesystem is 
>>>>>> \x7f\x7f\x7f\x7f\x7fmounted,
>>>>>>  \x7f\x7fthe
>>>>>>   firmware request fails. This make the DRM driver crash when
>>>>>>  \x7f\x7fresuming.
>>>>>>   This kind of issue should be prevented by the firmware caching
>>>>>>  \x7f\x7fprocess
>>>>>>   which cache every firmware requested for the next resume. But
>>>>>>  since \x7f\x7fwe
>>>>>>   are using a temporary device, the firmware isn't cached on 
>>>>>> \x7f\x7f\x7f\x7f\x7fsuspend
>>>>>>   since the device doesn't work anymore.
>>>>>>   When using a non temporary device to get the EDID, the firmware
>>>>>>  will
>>>>>>   be cached on suspend for the next resume. So requesting the
>>>>>>  firmware
>>>>>>   during resume will succeed.
>>>>>>   But if the firmware has never been requested since the boot, 
>>>>>> \x7f\x7f\x7f\x7f\x7fthis
>>>>>>   means that the monitor isn't plugged since the boot. The kernel
>>>>>>  will
>>>>>>   not be caching the EDID. So if we plug the monitor while the
>>>>>>  machine
>>>>>>   is suspended. The resume will fail to load the firmware. And 
>>>>>> the
>>>>>>  DRM
>>>>>>   driver will crash.
>>>>>>   So basically, your fix should solve the issue except for the 
>>>>>> \x7f\x7f\x7f\x7f\x7fcase
>>>>>>   where the monitor hasn't been plugged since boot and is plugged
>>>>>>  \x7f\x7fwhile
>>>>>>   the machine is suspended.
>>>>>>   I hope I was clear. Tell me if I wasn't. I'm not really good at
>>>>>>  \x7f\x7fexplaining.
>>>>> 
>>>>>  That was a pretty good explanation. The only thing I'm missing is
>>>>>  what
>>>>>  the failure mode is exactly when you claim the driver will 
>>>>> crash. \x7f\x7f\x7f\x7fWhy
>>>>>  would request_firmware() "crash" if called for the first time on 
>>>>> \x7f\x7f\x7f\x7fthe
>>>>>  resume path?
>>>>> 
>>>>>  I'm not sure I care much about not being able to load the 
>>>>> firmware
>>>>>  \x7fEDID
>>>>>  in the suspend-plug-resume case (as this can be remedied with a
>>>>>  subsequent modeset), but obviously any errors need to be handled
>>>>>  gracefully, without crashing.
>>>>> 
>>>>>  BR,
>>>>>  Jani.
>>>>> 
>>>>> 
>>>>>  --
>>>>>  Jani Nikula, Intel Open Source Graphics Center
>>>> 
>>>> 
>>> 
>>> 
>> 
>> --
>> Jani Nikula, Intel Open Source Graphics Center
> 
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-11-13 19:27 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-06 22:21 [PATCH] drm/edid/firmware: stop using throwaway platform device Jani Nikula
2022-10-11  6:27 ` Matthieu CHARETTE
2022-10-11  7:20   ` Jani Nikula
2022-10-11 20:45     ` Matthieu CHARETTE
2022-10-12  8:25       ` Jani Nikula
2022-10-12 17:16         ` Matthieu CHARETTE
2022-11-06 15:03           ` Matthieu CHARETTE
2022-11-08 11:27             ` Jani Nikula
2022-11-08 15:40               ` Matthieu CHARETTE
2022-11-13 19:26                 ` Matthieu CHARETTE

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).