All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] fbdev: wait for references go away
@ 2020-01-20 10:00   ` Gerd Hoffmann
  0 siblings, 0 replies; 27+ messages in thread
From: Gerd Hoffmann @ 2020-01-20 10:00 UTC (permalink / raw)
  To: dri-devel
  Cc: marmarek, Gerd Hoffmann, Bartlomiej Zolnierkiewicz,
	open list:FRAMEBUFFER LAYER, open list

Problem: do_unregister_framebuffer() might return before the device is
fully cleaned up, due to userspace having a file handle for /dev/fb0
open.  Which can result in drm driver not being able to grab resources
(and fail initialization) because the firmware framebuffer still holds
them.  Reportedly plymouth can trigger this.

Fix this by trying to wait until all references are gone.  Don't wait
forever though given that userspace might keep the file handle open.

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 drivers/video/fbdev/core/fbmem.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index d04554959ea7..2ea8ac05b065 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -35,6 +35,7 @@
 #include <linux/fbcon.h>
 #include <linux/mem_encrypt.h>
 #include <linux/pci.h>
+#include <linux/delay.h>
 
 #include <asm/fb.h>
 
@@ -1707,6 +1708,8 @@ static void unlink_framebuffer(struct fb_info *fb_info)
 
 static void do_unregister_framebuffer(struct fb_info *fb_info)
 {
+	int limit = 100;
+
 	unlink_framebuffer(fb_info);
 	if (fb_info->pixmap.addr &&
 	    (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
@@ -1726,6 +1729,10 @@ static void do_unregister_framebuffer(struct fb_info *fb_info)
 	fbcon_fb_unregistered(fb_info);
 	console_unlock();
 
+	/* try wait until all references are gone */
+	while (atomic_read(&fb_info->count) > 1 && --limit > 0)
+		msleep(10);
+
 	/* this may free fb info */
 	put_fb_info(fb_info);
 }
-- 
2.18.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH] fbdev: wait for references go away
@ 2020-01-20 10:00   ` Gerd Hoffmann
  0 siblings, 0 replies; 27+ messages in thread
From: Gerd Hoffmann @ 2020-01-20 10:00 UTC (permalink / raw)
  To: dri-devel
  Cc: open list, open list:FRAMEBUFFER LAYER,
	Bartlomiej Zolnierkiewicz, marmarek, Gerd Hoffmann

Problem: do_unregister_framebuffer() might return before the device is
fully cleaned up, due to userspace having a file handle for /dev/fb0
open.  Which can result in drm driver not being able to grab resources
(and fail initialization) because the firmware framebuffer still holds
them.  Reportedly plymouth can trigger this.

Fix this by trying to wait until all references are gone.  Don't wait
forever though given that userspace might keep the file handle open.

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 drivers/video/fbdev/core/fbmem.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index d04554959ea7..2ea8ac05b065 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -35,6 +35,7 @@
 #include <linux/fbcon.h>
 #include <linux/mem_encrypt.h>
 #include <linux/pci.h>
+#include <linux/delay.h>
 
 #include <asm/fb.h>
 
@@ -1707,6 +1708,8 @@ static void unlink_framebuffer(struct fb_info *fb_info)
 
 static void do_unregister_framebuffer(struct fb_info *fb_info)
 {
+	int limit = 100;
+
 	unlink_framebuffer(fb_info);
 	if (fb_info->pixmap.addr &&
 	    (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
@@ -1726,6 +1729,10 @@ static void do_unregister_framebuffer(struct fb_info *fb_info)
 	fbcon_fb_unregistered(fb_info);
 	console_unlock();
 
+	/* try wait until all references are gone */
+	while (atomic_read(&fb_info->count) > 1 && --limit > 0)
+		msleep(10);
+
 	/* this may free fb info */
 	put_fb_info(fb_info);
 }
-- 
2.18.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH] fbdev: wait for references go away
@ 2020-01-20 10:00   ` Gerd Hoffmann
  0 siblings, 0 replies; 27+ messages in thread
From: Gerd Hoffmann @ 2020-01-20 10:00 UTC (permalink / raw)
  To: dri-devel
  Cc: open list, open list:FRAMEBUFFER LAYER,
	Bartlomiej Zolnierkiewicz, marmarek, Gerd Hoffmann

Problem: do_unregister_framebuffer() might return before the device is
fully cleaned up, due to userspace having a file handle for /dev/fb0
open.  Which can result in drm driver not being able to grab resources
(and fail initialization) because the firmware framebuffer still holds
them.  Reportedly plymouth can trigger this.

Fix this by trying to wait until all references are gone.  Don't wait
forever though given that userspace might keep the file handle open.

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 drivers/video/fbdev/core/fbmem.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index d04554959ea7..2ea8ac05b065 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -35,6 +35,7 @@
 #include <linux/fbcon.h>
 #include <linux/mem_encrypt.h>
 #include <linux/pci.h>
+#include <linux/delay.h>
 
 #include <asm/fb.h>
 
@@ -1707,6 +1708,8 @@ static void unlink_framebuffer(struct fb_info *fb_info)
 
 static void do_unregister_framebuffer(struct fb_info *fb_info)
 {
+	int limit = 100;
+
 	unlink_framebuffer(fb_info);
 	if (fb_info->pixmap.addr &&
 	    (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
@@ -1726,6 +1729,10 @@ static void do_unregister_framebuffer(struct fb_info *fb_info)
 	fbcon_fb_unregistered(fb_info);
 	console_unlock();
 
+	/* try wait until all references are gone */
+	while (atomic_read(&fb_info->count) > 1 && --limit > 0)
+		msleep(10);
+
 	/* this may free fb info */
 	put_fb_info(fb_info);
 }
-- 
2.18.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
  2020-01-20 10:00   ` Gerd Hoffmann
  (?)
@ 2020-01-20 17:51     ` Bartlomiej Zolnierkiewicz
  -1 siblings, 0 replies; 27+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2020-01-20 17:51 UTC (permalink / raw)
  To: Gerd Hoffmann, dri-devel; +Cc: marmarek, open list:FRAMEBUFFER LAYER, open list


Hi,

On 1/20/20 11:00 AM, Gerd Hoffmann wrote:
> Problem: do_unregister_framebuffer() might return before the device is
> fully cleaned up, due to userspace having a file handle for /dev/fb0

do_unregister_framebuffer() doesn't guarantee that fb_info is freed after
function's return (it only drops the kernel reference on fb_info).

> open.  Which can result in drm driver not being able to grab resources
> (and fail initialization) because the firmware framebuffer still holds
> them.  Reportedly plymouth can trigger this.

Could you please describe issue some more?

I guess that a problem is happening during DRM driver load while fbdev
driver is loaded? I assume do_unregister_framebuffer() is called inside
do_remove_conflicting_framebuffers()?

At first glance it seems to be an user-space issue as it should not be
holding references on /dev/fb0 while DRM driver is being loaded.

> Fix this by trying to wait until all references are gone.  Don't wait
> forever though given that userspace might keep the file handle open.

Where does the 1s maximum delay come from?

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  drivers/video/fbdev/core/fbmem.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> index d04554959ea7..2ea8ac05b065 100644
> --- a/drivers/video/fbdev/core/fbmem.c
> +++ b/drivers/video/fbdev/core/fbmem.c
> @@ -35,6 +35,7 @@
>  #include <linux/fbcon.h>
>  #include <linux/mem_encrypt.h>
>  #include <linux/pci.h>
> +#include <linux/delay.h>
>  
>  #include <asm/fb.h>
>  
> @@ -1707,6 +1708,8 @@ static void unlink_framebuffer(struct fb_info *fb_info)
>  
>  static void do_unregister_framebuffer(struct fb_info *fb_info)
>  {
> +	int limit = 100;
> +
>  	unlink_framebuffer(fb_info);
>  	if (fb_info->pixmap.addr &&
>  	    (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
> @@ -1726,6 +1729,10 @@ static void do_unregister_framebuffer(struct fb_info *fb_info)
>  	fbcon_fb_unregistered(fb_info);
>  	console_unlock();
>  
> +	/* try wait until all references are gone */
> +	while (atomic_read(&fb_info->count) > 1 && --limit > 0)
> +		msleep(10);
> +
>  	/* this may free fb info */
>  	put_fb_info(fb_info);
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-20 17:51     ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 27+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2020-01-20 17:51 UTC (permalink / raw)
  To: Gerd Hoffmann, dri-devel; +Cc: open list:FRAMEBUFFER LAYER, marmarek, open list


Hi,

On 1/20/20 11:00 AM, Gerd Hoffmann wrote:
> Problem: do_unregister_framebuffer() might return before the device is
> fully cleaned up, due to userspace having a file handle for /dev/fb0

do_unregister_framebuffer() doesn't guarantee that fb_info is freed after
function's return (it only drops the kernel reference on fb_info).

> open.  Which can result in drm driver not being able to grab resources
> (and fail initialization) because the firmware framebuffer still holds
> them.  Reportedly plymouth can trigger this.

Could you please describe issue some more?

I guess that a problem is happening during DRM driver load while fbdev
driver is loaded? I assume do_unregister_framebuffer() is called inside
do_remove_conflicting_framebuffers()?

At first glance it seems to be an user-space issue as it should not be
holding references on /dev/fb0 while DRM driver is being loaded.

> Fix this by trying to wait until all references are gone.  Don't wait
> forever though given that userspace might keep the file handle open.

Where does the 1s maximum delay come from?

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  drivers/video/fbdev/core/fbmem.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> index d04554959ea7..2ea8ac05b065 100644
> --- a/drivers/video/fbdev/core/fbmem.c
> +++ b/drivers/video/fbdev/core/fbmem.c
> @@ -35,6 +35,7 @@
>  #include <linux/fbcon.h>
>  #include <linux/mem_encrypt.h>
>  #include <linux/pci.h>
> +#include <linux/delay.h>
>  
>  #include <asm/fb.h>
>  
> @@ -1707,6 +1708,8 @@ static void unlink_framebuffer(struct fb_info *fb_info)
>  
>  static void do_unregister_framebuffer(struct fb_info *fb_info)
>  {
> +	int limit = 100;
> +
>  	unlink_framebuffer(fb_info);
>  	if (fb_info->pixmap.addr &&
>  	    (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
> @@ -1726,6 +1729,10 @@ static void do_unregister_framebuffer(struct fb_info *fb_info)
>  	fbcon_fb_unregistered(fb_info);
>  	console_unlock();
>  
> +	/* try wait until all references are gone */
> +	while (atomic_read(&fb_info->count) > 1 && --limit > 0)
> +		msleep(10);
> +
>  	/* this may free fb info */
>  	put_fb_info(fb_info);
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-20 17:51     ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 27+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2020-01-20 17:51 UTC (permalink / raw)
  To: Gerd Hoffmann, dri-devel; +Cc: open list:FRAMEBUFFER LAYER, marmarek, open list


Hi,

On 1/20/20 11:00 AM, Gerd Hoffmann wrote:
> Problem: do_unregister_framebuffer() might return before the device is
> fully cleaned up, due to userspace having a file handle for /dev/fb0

do_unregister_framebuffer() doesn't guarantee that fb_info is freed after
function's return (it only drops the kernel reference on fb_info).

> open.  Which can result in drm driver not being able to grab resources
> (and fail initialization) because the firmware framebuffer still holds
> them.  Reportedly plymouth can trigger this.

Could you please describe issue some more?

I guess that a problem is happening during DRM driver load while fbdev
driver is loaded? I assume do_unregister_framebuffer() is called inside
do_remove_conflicting_framebuffers()?

At first glance it seems to be an user-space issue as it should not be
holding references on /dev/fb0 while DRM driver is being loaded.

> Fix this by trying to wait until all references are gone.  Don't wait
> forever though given that userspace might keep the file handle open.

Where does the 1s maximum delay come from?

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  drivers/video/fbdev/core/fbmem.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> index d04554959ea7..2ea8ac05b065 100644
> --- a/drivers/video/fbdev/core/fbmem.c
> +++ b/drivers/video/fbdev/core/fbmem.c
> @@ -35,6 +35,7 @@
>  #include <linux/fbcon.h>
>  #include <linux/mem_encrypt.h>
>  #include <linux/pci.h>
> +#include <linux/delay.h>
>  
>  #include <asm/fb.h>
>  
> @@ -1707,6 +1708,8 @@ static void unlink_framebuffer(struct fb_info *fb_info)
>  
>  static void do_unregister_framebuffer(struct fb_info *fb_info)
>  {
> +	int limit = 100;
> +
>  	unlink_framebuffer(fb_info);
>  	if (fb_info->pixmap.addr &&
>  	    (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
> @@ -1726,6 +1729,10 @@ static void do_unregister_framebuffer(struct fb_info *fb_info)
>  	fbcon_fb_unregistered(fb_info);
>  	console_unlock();
>  
> +	/* try wait until all references are gone */
> +	while (atomic_read(&fb_info->count) > 1 && --limit > 0)
> +		msleep(10);
> +
>  	/* this may free fb info */
>  	put_fb_info(fb_info);
>  }
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
  2020-01-20 17:51     ` Bartlomiej Zolnierkiewicz
  (?)
@ 2020-01-20 17:54       ` Bartlomiej Zolnierkiewicz
  -1 siblings, 0 replies; 27+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2020-01-20 17:54 UTC (permalink / raw)
  To: Gerd Hoffmann, dri-devel; +Cc: marmarek, open list:FRAMEBUFFER LAYER, open list


On 1/20/20 6:51 PM, Bartlomiej Zolnierkiewicz wrote:

> do_unregister_framebuffer() doesn't guarantee that fb_info is freed after
> function's return (it only drops the kernel reference on fb_info).
s/kernel/fbdev device driver/

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-20 17:54       ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 27+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2020-01-20 17:54 UTC (permalink / raw)
  To: Gerd Hoffmann, dri-devel; +Cc: open list:FRAMEBUFFER LAYER, marmarek, open list


On 1/20/20 6:51 PM, Bartlomiej Zolnierkiewicz wrote:

> do_unregister_framebuffer() doesn't guarantee that fb_info is freed after
> function's return (it only drops the kernel reference on fb_info).
s/kernel/fbdev device driver/

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-20 17:54       ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 27+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2020-01-20 17:54 UTC (permalink / raw)
  To: Gerd Hoffmann, dri-devel; +Cc: open list:FRAMEBUFFER LAYER, marmarek, open list


On 1/20/20 6:51 PM, Bartlomiej Zolnierkiewicz wrote:

> do_unregister_framebuffer() doesn't guarantee that fb_info is freed after
> function's return (it only drops the kernel reference on fb_info).
s/kernel/fbdev device driver/

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
  2020-01-20 17:51     ` Bartlomiej Zolnierkiewicz
  (?)
@ 2020-01-20 18:11       ` Marek Marczykowski-Górecki
  -1 siblings, 0 replies; 27+ messages in thread
From: Marek Marczykowski-Górecki @ 2020-01-20 18:11 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Gerd Hoffmann, dri-devel, open list:FRAMEBUFFER LAYER, open list

[-- Attachment #1: Type: text/plain, Size: 786 bytes --]

On Mon, Jan 20, 2020 at 06:51:17PM +0100, Bartlomiej Zolnierkiewicz wrote:
> I guess that a problem is happening during DRM driver load while fbdev
> driver is loaded? I assume do_unregister_framebuffer() is called inside
> do_remove_conflicting_framebuffers()?

Yes, exactly. More details here:
https://lists.linuxfoundation.org/pipermail/virtualization/2020-January/045026.html

> At first glance it seems to be an user-space issue as it should not be
> holding references on /dev/fb0 while DRM driver is being loaded.

How plymouth would know when exactly it needs to release /dev/fb0?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-20 18:11       ` Marek Marczykowski-Górecki
  0 siblings, 0 replies; 27+ messages in thread
From: Marek Marczykowski-Górecki @ 2020-01-20 18:11 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: open list:FRAMEBUFFER LAYER, Gerd Hoffmann, dri-devel, open list

[-- Attachment #1: Type: text/plain, Size: 786 bytes --]

On Mon, Jan 20, 2020 at 06:51:17PM +0100, Bartlomiej Zolnierkiewicz wrote:
> I guess that a problem is happening during DRM driver load while fbdev
> driver is loaded? I assume do_unregister_framebuffer() is called inside
> do_remove_conflicting_framebuffers()?

Yes, exactly. More details here:
https://lists.linuxfoundation.org/pipermail/virtualization/2020-January/045026.html

> At first glance it seems to be an user-space issue as it should not be
> holding references on /dev/fb0 while DRM driver is being loaded.

How plymouth would know when exactly it needs to release /dev/fb0?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-20 18:11       ` Marek Marczykowski-Górecki
  0 siblings, 0 replies; 27+ messages in thread
From: Marek Marczykowski-Górecki @ 2020-01-20 18:11 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: open list:FRAMEBUFFER LAYER, Gerd Hoffmann, dri-devel, open list


[-- Attachment #1.1: Type: text/plain, Size: 786 bytes --]

On Mon, Jan 20, 2020 at 06:51:17PM +0100, Bartlomiej Zolnierkiewicz wrote:
> I guess that a problem is happening during DRM driver load while fbdev
> driver is loaded? I assume do_unregister_framebuffer() is called inside
> do_remove_conflicting_framebuffers()?

Yes, exactly. More details here:
https://lists.linuxfoundation.org/pipermail/virtualization/2020-January/045026.html

> At first glance it seems to be an user-space issue as it should not be
> holding references on /dev/fb0 while DRM driver is being loaded.

How plymouth would know when exactly it needs to release /dev/fb0?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
  2020-01-20 17:51     ` Bartlomiej Zolnierkiewicz
  (?)
@ 2020-01-21  5:53       ` Gerd Hoffmann
  -1 siblings, 0 replies; 27+ messages in thread
From: Gerd Hoffmann @ 2020-01-21  5:53 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: dri-devel, marmarek, open list:FRAMEBUFFER LAYER, open list

  Hi,

> > open.  Which can result in drm driver not being able to grab resources
> > (and fail initialization) because the firmware framebuffer still holds
> > them.  Reportedly plymouth can trigger this.
> 
> Could you please describe issue some more?
> 
> I guess that a problem is happening during DRM driver load while fbdev
> driver is loaded? I assume do_unregister_framebuffer() is called inside
> do_remove_conflicting_framebuffers()?

Yes.  Specifically bochs-drm.ko and efifb in virtual machines.

> At first glance it seems to be an user-space issue as it should not be
> holding references on /dev/fb0 while DRM driver is being loaded.

Well, the drm driver is loaded by udev like everything else.

Dunno what plymouth (graphical boot screen tool) does to handle the
situation.  I guess listening to udev events.  So it should notice efifb
going away and drop the /dev/fb0 reference, but this races against
bochs-drm initializing.

> > Fix this by trying to wait until all references are gone.  Don't wait
> > forever though given that userspace might keep the file handle open.
> 
> Where does the 1s maximum delay come from?

Pulled out something out of thin air which I expect being on the safe
side.  plymouth responding on the udev event should need only a small
fraction of that.

cheers,
  Gerd


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-21  5:53       ` Gerd Hoffmann
  0 siblings, 0 replies; 27+ messages in thread
From: Gerd Hoffmann @ 2020-01-21  5:53 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: open list:FRAMEBUFFER LAYER, marmarek, dri-devel, open list

  Hi,

> > open.  Which can result in drm driver not being able to grab resources
> > (and fail initialization) because the firmware framebuffer still holds
> > them.  Reportedly plymouth can trigger this.
> 
> Could you please describe issue some more?
> 
> I guess that a problem is happening during DRM driver load while fbdev
> driver is loaded? I assume do_unregister_framebuffer() is called inside
> do_remove_conflicting_framebuffers()?

Yes.  Specifically bochs-drm.ko and efifb in virtual machines.

> At first glance it seems to be an user-space issue as it should not be
> holding references on /dev/fb0 while DRM driver is being loaded.

Well, the drm driver is loaded by udev like everything else.

Dunno what plymouth (graphical boot screen tool) does to handle the
situation.  I guess listening to udev events.  So it should notice efifb
going away and drop the /dev/fb0 reference, but this races against
bochs-drm initializing.

> > Fix this by trying to wait until all references are gone.  Don't wait
> > forever though given that userspace might keep the file handle open.
> 
> Where does the 1s maximum delay come from?

Pulled out something out of thin air which I expect being on the safe
side.  plymouth responding on the udev event should need only a small
fraction of that.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-21  5:53       ` Gerd Hoffmann
  0 siblings, 0 replies; 27+ messages in thread
From: Gerd Hoffmann @ 2020-01-21  5:53 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: open list:FRAMEBUFFER LAYER, marmarek, dri-devel, open list

  Hi,

> > open.  Which can result in drm driver not being able to grab resources
> > (and fail initialization) because the firmware framebuffer still holds
> > them.  Reportedly plymouth can trigger this.
> 
> Could you please describe issue some more?
> 
> I guess that a problem is happening during DRM driver load while fbdev
> driver is loaded? I assume do_unregister_framebuffer() is called inside
> do_remove_conflicting_framebuffers()?

Yes.  Specifically bochs-drm.ko and efifb in virtual machines.

> At first glance it seems to be an user-space issue as it should not be
> holding references on /dev/fb0 while DRM driver is being loaded.

Well, the drm driver is loaded by udev like everything else.

Dunno what plymouth (graphical boot screen tool) does to handle the
situation.  I guess listening to udev events.  So it should notice efifb
going away and drop the /dev/fb0 reference, but this races against
bochs-drm initializing.

> > Fix this by trying to wait until all references are gone.  Don't wait
> > forever though given that userspace might keep the file handle open.
> 
> Where does the 1s maximum delay come from?

Pulled out something out of thin air which I expect being on the safe
side.  plymouth responding on the udev event should need only a small
fraction of that.

cheers,
  Gerd

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
  2020-01-21  5:53       ` Gerd Hoffmann
  (?)
@ 2020-01-28 15:20         ` Bartlomiej Zolnierkiewicz
  -1 siblings, 0 replies; 27+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2020-01-28 15:20 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: dri-devel, marmarek, open list:FRAMEBUFFER LAYER, open list


On 1/21/20 6:53 AM, Gerd Hoffmann wrote:
>   Hi,
> 
>>> open.  Which can result in drm driver not being able to grab resources
>>> (and fail initialization) because the firmware framebuffer still holds
>>> them.  Reportedly plymouth can trigger this.
>>
>> Could you please describe issue some more?
>>
>> I guess that a problem is happening during DRM driver load while fbdev
>> driver is loaded? I assume do_unregister_framebuffer() is called inside
>> do_remove_conflicting_framebuffers()?
> 
> Yes.  Specifically bochs-drm.ko and efifb in virtual machines.
> 
>> At first glance it seems to be an user-space issue as it should not be
>> holding references on /dev/fb0 while DRM driver is being loaded.
> 
> Well, the drm driver is loaded by udev like everything else.
> 
> Dunno what plymouth (graphical boot screen tool) does to handle the
> situation.  I guess listening to udev events.  So it should notice efifb
> going away and drop the /dev/fb0 reference, but this races against
> bochs-drm initializing.

It has been a week and there have been no alternative proposals to
address the problem so I incline to accepting this approach..

However please rework the patch slightly:

- Don't wait in the usual fb_info removal code-path, only in the driver
  replacement one. You can achieve this by adding additional "bool wait"
  parameter to do_unregister_framebuffer()

- Add a FIXME comment just before the wait loop with the description of
  the issue (the above explanation of the race between plymouth and udev
  would be fine) so we will remember why this workaround is needed.

- Change patch summary to something more descriptive (i.e. to "fbdev:
  workaround race on driver replacement").

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

>>> Fix this by trying to wait until all references are gone.  Don't wait
>>> forever though given that userspace might keep the file handle open.
>>
>> Where does the 1s maximum delay come from?
> 
> Pulled out something out of thin air which I expect being on the safe
> side.  plymouth responding on the udev event should need only a small
> fraction of that.
> 
> cheers,
>   Gerd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-28 15:20         ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 27+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2020-01-28 15:20 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: open list:FRAMEBUFFER LAYER, marmarek, dri-devel, open list


On 1/21/20 6:53 AM, Gerd Hoffmann wrote:
>   Hi,
> 
>>> open.  Which can result in drm driver not being able to grab resources
>>> (and fail initialization) because the firmware framebuffer still holds
>>> them.  Reportedly plymouth can trigger this.
>>
>> Could you please describe issue some more?
>>
>> I guess that a problem is happening during DRM driver load while fbdev
>> driver is loaded? I assume do_unregister_framebuffer() is called inside
>> do_remove_conflicting_framebuffers()?
> 
> Yes.  Specifically bochs-drm.ko and efifb in virtual machines.
> 
>> At first glance it seems to be an user-space issue as it should not be
>> holding references on /dev/fb0 while DRM driver is being loaded.
> 
> Well, the drm driver is loaded by udev like everything else.
> 
> Dunno what plymouth (graphical boot screen tool) does to handle the
> situation.  I guess listening to udev events.  So it should notice efifb
> going away and drop the /dev/fb0 reference, but this races against
> bochs-drm initializing.

It has been a week and there have been no alternative proposals to
address the problem so I incline to accepting this approach..

However please rework the patch slightly:

- Don't wait in the usual fb_info removal code-path, only in the driver
  replacement one. You can achieve this by adding additional "bool wait"
  parameter to do_unregister_framebuffer()

- Add a FIXME comment just before the wait loop with the description of
  the issue (the above explanation of the race between plymouth and udev
  would be fine) so we will remember why this workaround is needed.

- Change patch summary to something more descriptive (i.e. to "fbdev:
  workaround race on driver replacement").

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

>>> Fix this by trying to wait until all references are gone.  Don't wait
>>> forever though given that userspace might keep the file handle open.
>>
>> Where does the 1s maximum delay come from?
> 
> Pulled out something out of thin air which I expect being on the safe
> side.  plymouth responding on the udev event should need only a small
> fraction of that.
> 
> cheers,
>   Gerd

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-28 15:20         ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 27+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2020-01-28 15:20 UTC (permalink / raw)
  To: Gerd Hoffmann; +Cc: open list:FRAMEBUFFER LAYER, marmarek, dri-devel, open list


On 1/21/20 6:53 AM, Gerd Hoffmann wrote:
>   Hi,
> 
>>> open.  Which can result in drm driver not being able to grab resources
>>> (and fail initialization) because the firmware framebuffer still holds
>>> them.  Reportedly plymouth can trigger this.
>>
>> Could you please describe issue some more?
>>
>> I guess that a problem is happening during DRM driver load while fbdev
>> driver is loaded? I assume do_unregister_framebuffer() is called inside
>> do_remove_conflicting_framebuffers()?
> 
> Yes.  Specifically bochs-drm.ko and efifb in virtual machines.
> 
>> At first glance it seems to be an user-space issue as it should not be
>> holding references on /dev/fb0 while DRM driver is being loaded.
> 
> Well, the drm driver is loaded by udev like everything else.
> 
> Dunno what plymouth (graphical boot screen tool) does to handle the
> situation.  I guess listening to udev events.  So it should notice efifb
> going away and drop the /dev/fb0 reference, but this races against
> bochs-drm initializing.

It has been a week and there have been no alternative proposals to
address the problem so I incline to accepting this approach..

However please rework the patch slightly:

- Don't wait in the usual fb_info removal code-path, only in the driver
  replacement one. You can achieve this by adding additional "bool wait"
  parameter to do_unregister_framebuffer()

- Add a FIXME comment just before the wait loop with the description of
  the issue (the above explanation of the race between plymouth and udev
  would be fine) so we will remember why this workaround is needed.

- Change patch summary to something more descriptive (i.e. to "fbdev:
  workaround race on driver replacement").

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

>>> Fix this by trying to wait until all references are gone.  Don't wait
>>> forever though given that userspace might keep the file handle open.
>>
>> Where does the 1s maximum delay come from?
> 
> Pulled out something out of thin air which I expect being on the safe
> side.  plymouth responding on the udev event should need only a small
> fraction of that.
> 
> cheers,
>   Gerd
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
  2020-01-20 10:00   ` Gerd Hoffmann
  (?)
@ 2020-01-28 16:39     ` Daniel Vetter
  -1 siblings, 0 replies; 27+ messages in thread
From: Daniel Vetter @ 2020-01-28 16:39 UTC (permalink / raw)
  To: Gerd Hoffmann, Noralf Trønnes
  Cc: dri-devel, open list, open list:FRAMEBUFFER LAYER,
	Bartlomiej Zolnierkiewicz, marmarek

On Mon, Jan 20, 2020 at 11:00 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
>
> Problem: do_unregister_framebuffer() might return before the device is
> fully cleaned up, due to userspace having a file handle for /dev/fb0
> open.  Which can result in drm driver not being able to grab resources
> (and fail initialization) because the firmware framebuffer still holds
> them.  Reportedly plymouth can trigger this.
>
> Fix this by trying to wait until all references are gone.  Don't wait
> forever though given that userspace might keep the file handle open.
>
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

(Missed this because lca, so a bit late)

This isn't really how driver unload is supposed to happen. Instead:

- Driver unload starts
- Driver calls the foo_unregister function, which stops new userspace
from getting at the driver. If you're subsystem is good (i.e. drm
since Noralf fixed it) this will also sufficiently synchronize with
any pending ioctl.
- Important: This does _not_ wait until userspace closes all
references. You can't force that.
- Driver releases all hw structures and mappings and everything else.
With fbdev this is currently not fully race free because no one is
synchronizing with userspace everywhere correctly.

... much time can pass ...

- Userspace releases the last references, which triggers the final
destroy stuff and which releases the memory occupied by various
structures still (but not anything releated to hw or anything else
really).

So there's two bits:

1. Synchronizing with pending ioctls. This is mostly there already
with lock_fb_info/unlock_fb_info. From a quick look the missing bit
seems to be that the unregister code is not taking that lock, and so
not sufficiently synchronizing against concurrent ioctl calls and
other stuff. Plus would need to audit all entry points.

1a. fbcon works differently. Don't look too closely, but this is also
not the problem your facing here.

2. Refcounting of the fb structure and hw teardown. That's what's
tracked in fb_info->count. Most likely the fbdev driver you have has a
wrong split between the hw teardown code and what's in fb_destroy. If
you have any hw cleanup code in fb_destroy that driver is buggy. efifb
is very buggy in that area :-) Same for offb, simplefb, vesafb and
vesa16fb.

We might need a new fb_unregister callback for these drivers to be
able to fix this properly. Because the unregister comes from the fbdev
core, and not the driver as usual, so the usual driver unload sequence
doesnt work:

drm_dev_unregister();
... release all hw resource ...

drm_dev_put();

Or in terms of fbdev:

unregister_framebuffer(info);
... release all hw resources ... <- everyone gets this wrong
framebuffer_release(info); <- also wrong because not refcounted,
hooray, this should be moved to to end of the ->fb_destroy callback

So we need a callback to put the "release all hw resources" step into
the flow at the right place. Another option (slightly less midlayer)
would be to add a fb_takeover hook, for these platforms drivers, which
would then do the above sequence (like at driver unload).

Also adding Noralf, since he's fixed up all the drm stuff in this area
in the past.

Cheers, Daniel

> ---
>  drivers/video/fbdev/core/fbmem.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> index d04554959ea7..2ea8ac05b065 100644
> --- a/drivers/video/fbdev/core/fbmem.c
> +++ b/drivers/video/fbdev/core/fbmem.c
> @@ -35,6 +35,7 @@
>  #include <linux/fbcon.h>
>  #include <linux/mem_encrypt.h>
>  #include <linux/pci.h>
> +#include <linux/delay.h>
>
>  #include <asm/fb.h>
>
> @@ -1707,6 +1708,8 @@ static void unlink_framebuffer(struct fb_info *fb_info)
>
>  static void do_unregister_framebuffer(struct fb_info *fb_info)
>  {
> +       int limit = 100;
> +
>         unlink_framebuffer(fb_info);
>         if (fb_info->pixmap.addr &&
>             (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
> @@ -1726,6 +1729,10 @@ static void do_unregister_framebuffer(struct fb_info *fb_info)
>         fbcon_fb_unregistered(fb_info);
>         console_unlock();
>
> +       /* try wait until all references are gone */
> +       while (atomic_read(&fb_info->count) > 1 && --limit > 0)
> +               msleep(10);
> +
>         /* this may free fb info */
>         put_fb_info(fb_info);
>  }
> --
> 2.18.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-28 16:39     ` Daniel Vetter
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Vetter @ 2020-01-28 16:39 UTC (permalink / raw)
  To: Gerd Hoffmann, Noralf Trønnes
  Cc: marmarek, open list:FRAMEBUFFER LAYER, open list, dri-devel,
	Bartlomiej Zolnierkiewicz

On Mon, Jan 20, 2020 at 11:00 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
>
> Problem: do_unregister_framebuffer() might return before the device is
> fully cleaned up, due to userspace having a file handle for /dev/fb0
> open.  Which can result in drm driver not being able to grab resources
> (and fail initialization) because the firmware framebuffer still holds
> them.  Reportedly plymouth can trigger this.
>
> Fix this by trying to wait until all references are gone.  Don't wait
> forever though given that userspace might keep the file handle open.
>
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

(Missed this because lca, so a bit late)

This isn't really how driver unload is supposed to happen. Instead:

- Driver unload starts
- Driver calls the foo_unregister function, which stops new userspace
from getting at the driver. If you're subsystem is good (i.e. drm
since Noralf fixed it) this will also sufficiently synchronize with
any pending ioctl.
- Important: This does _not_ wait until userspace closes all
references. You can't force that.
- Driver releases all hw structures and mappings and everything else.
With fbdev this is currently not fully race free because no one is
synchronizing with userspace everywhere correctly.

... much time can pass ...

- Userspace releases the last references, which triggers the final
destroy stuff and which releases the memory occupied by various
structures still (but not anything releated to hw or anything else
really).

So there's two bits:

1. Synchronizing with pending ioctls. This is mostly there already
with lock_fb_info/unlock_fb_info. From a quick look the missing bit
seems to be that the unregister code is not taking that lock, and so
not sufficiently synchronizing against concurrent ioctl calls and
other stuff. Plus would need to audit all entry points.

1a. fbcon works differently. Don't look too closely, but this is also
not the problem your facing here.

2. Refcounting of the fb structure and hw teardown. That's what's
tracked in fb_info->count. Most likely the fbdev driver you have has a
wrong split between the hw teardown code and what's in fb_destroy. If
you have any hw cleanup code in fb_destroy that driver is buggy. efifb
is very buggy in that area :-) Same for offb, simplefb, vesafb and
vesa16fb.

We might need a new fb_unregister callback for these drivers to be
able to fix this properly. Because the unregister comes from the fbdev
core, and not the driver as usual, so the usual driver unload sequence
doesnt work:

drm_dev_unregister();
... release all hw resource ...

drm_dev_put();

Or in terms of fbdev:

unregister_framebuffer(info);
... release all hw resources ... <- everyone gets this wrong
framebuffer_release(info); <- also wrong because not refcounted,
hooray, this should be moved to to end of the ->fb_destroy callback

So we need a callback to put the "release all hw resources" step into
the flow at the right place. Another option (slightly less midlayer)
would be to add a fb_takeover hook, for these platforms drivers, which
would then do the above sequence (like at driver unload).

Also adding Noralf, since he's fixed up all the drm stuff in this area
in the past.

Cheers, Daniel

> ---
>  drivers/video/fbdev/core/fbmem.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> index d04554959ea7..2ea8ac05b065 100644
> --- a/drivers/video/fbdev/core/fbmem.c
> +++ b/drivers/video/fbdev/core/fbmem.c
> @@ -35,6 +35,7 @@
>  #include <linux/fbcon.h>
>  #include <linux/mem_encrypt.h>
>  #include <linux/pci.h>
> +#include <linux/delay.h>
>
>  #include <asm/fb.h>
>
> @@ -1707,6 +1708,8 @@ static void unlink_framebuffer(struct fb_info *fb_info)
>
>  static void do_unregister_framebuffer(struct fb_info *fb_info)
>  {
> +       int limit = 100;
> +
>         unlink_framebuffer(fb_info);
>         if (fb_info->pixmap.addr &&
>             (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
> @@ -1726,6 +1729,10 @@ static void do_unregister_framebuffer(struct fb_info *fb_info)
>         fbcon_fb_unregistered(fb_info);
>         console_unlock();
>
> +       /* try wait until all references are gone */
> +       while (atomic_read(&fb_info->count) > 1 && --limit > 0)
> +               msleep(10);
> +
>         /* this may free fb info */
>         put_fb_info(fb_info);
>  }
> --
> 2.18.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-28 16:39     ` Daniel Vetter
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Vetter @ 2020-01-28 16:39 UTC (permalink / raw)
  To: Gerd Hoffmann, Noralf Trønnes
  Cc: marmarek, open list:FRAMEBUFFER LAYER, open list, dri-devel,
	Bartlomiej Zolnierkiewicz

On Mon, Jan 20, 2020 at 11:00 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
>
> Problem: do_unregister_framebuffer() might return before the device is
> fully cleaned up, due to userspace having a file handle for /dev/fb0
> open.  Which can result in drm driver not being able to grab resources
> (and fail initialization) because the firmware framebuffer still holds
> them.  Reportedly plymouth can trigger this.
>
> Fix this by trying to wait until all references are gone.  Don't wait
> forever though given that userspace might keep the file handle open.
>
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>

(Missed this because lca, so a bit late)

This isn't really how driver unload is supposed to happen. Instead:

- Driver unload starts
- Driver calls the foo_unregister function, which stops new userspace
from getting at the driver. If you're subsystem is good (i.e. drm
since Noralf fixed it) this will also sufficiently synchronize with
any pending ioctl.
- Important: This does _not_ wait until userspace closes all
references. You can't force that.
- Driver releases all hw structures and mappings and everything else.
With fbdev this is currently not fully race free because no one is
synchronizing with userspace everywhere correctly.

... much time can pass ...

- Userspace releases the last references, which triggers the final
destroy stuff and which releases the memory occupied by various
structures still (but not anything releated to hw or anything else
really).

So there's two bits:

1. Synchronizing with pending ioctls. This is mostly there already
with lock_fb_info/unlock_fb_info. From a quick look the missing bit
seems to be that the unregister code is not taking that lock, and so
not sufficiently synchronizing against concurrent ioctl calls and
other stuff. Plus would need to audit all entry points.

1a. fbcon works differently. Don't look too closely, but this is also
not the problem your facing here.

2. Refcounting of the fb structure and hw teardown. That's what's
tracked in fb_info->count. Most likely the fbdev driver you have has a
wrong split between the hw teardown code and what's in fb_destroy. If
you have any hw cleanup code in fb_destroy that driver is buggy. efifb
is very buggy in that area :-) Same for offb, simplefb, vesafb and
vesa16fb.

We might need a new fb_unregister callback for these drivers to be
able to fix this properly. Because the unregister comes from the fbdev
core, and not the driver as usual, so the usual driver unload sequence
doesnt work:

drm_dev_unregister();
... release all hw resource ...

drm_dev_put();

Or in terms of fbdev:

unregister_framebuffer(info);
... release all hw resources ... <- everyone gets this wrong
framebuffer_release(info); <- also wrong because not refcounted,
hooray, this should be moved to to end of the ->fb_destroy callback

So we need a callback to put the "release all hw resources" step into
the flow at the right place. Another option (slightly less midlayer)
would be to add a fb_takeover hook, for these platforms drivers, which
would then do the above sequence (like at driver unload).

Also adding Noralf, since he's fixed up all the drm stuff in this area
in the past.

Cheers, Daniel

> ---
>  drivers/video/fbdev/core/fbmem.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> index d04554959ea7..2ea8ac05b065 100644
> --- a/drivers/video/fbdev/core/fbmem.c
> +++ b/drivers/video/fbdev/core/fbmem.c
> @@ -35,6 +35,7 @@
>  #include <linux/fbcon.h>
>  #include <linux/mem_encrypt.h>
>  #include <linux/pci.h>
> +#include <linux/delay.h>
>
>  #include <asm/fb.h>
>
> @@ -1707,6 +1708,8 @@ static void unlink_framebuffer(struct fb_info *fb_info)
>
>  static void do_unregister_framebuffer(struct fb_info *fb_info)
>  {
> +       int limit = 100;
> +
>         unlink_framebuffer(fb_info);
>         if (fb_info->pixmap.addr &&
>             (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
> @@ -1726,6 +1729,10 @@ static void do_unregister_framebuffer(struct fb_info *fb_info)
>         fbcon_fb_unregistered(fb_info);
>         console_unlock();
>
> +       /* try wait until all references are gone */
> +       while (atomic_read(&fb_info->count) > 1 && --limit > 0)
> +               msleep(10);
> +
>         /* this may free fb info */
>         put_fb_info(fb_info);
>  }
> --
> 2.18.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
  2020-01-28 16:39     ` Daniel Vetter
  (?)
@ 2020-01-28 16:44       ` Daniel Vetter
  -1 siblings, 0 replies; 27+ messages in thread
From: Daniel Vetter @ 2020-01-28 16:44 UTC (permalink / raw)
  To: Gerd Hoffmann, Noralf Trønnes
  Cc: dri-devel, open list, open list:FRAMEBUFFER LAYER,
	Bartlomiej Zolnierkiewicz, marmarek

On Tue, Jan 28, 2020 at 5:39 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Mon, Jan 20, 2020 at 11:00 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
> >
> > Problem: do_unregister_framebuffer() might return before the device is
> > fully cleaned up, due to userspace having a file handle for /dev/fb0
> > open.  Which can result in drm driver not being able to grab resources
> > (and fail initialization) because the firmware framebuffer still holds
> > them.  Reportedly plymouth can trigger this.
> >
> > Fix this by trying to wait until all references are gone.  Don't wait
> > forever though given that userspace might keep the file handle open.
> >
> > Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
>
> (Missed this because lca, so a bit late)
>
> This isn't really how driver unload is supposed to happen. Instead:
>
> - Driver unload starts
> - Driver calls the foo_unregister function, which stops new userspace
> from getting at the driver. If you're subsystem is good (i.e. drm
> since Noralf fixed it) this will also sufficiently synchronize with
> any pending ioctl.
> - Important: This does _not_ wait until userspace closes all
> references. You can't force that.
> - Driver releases all hw structures and mappings and everything else.
> With fbdev this is currently not fully race free because no one is
> synchronizing with userspace everywhere correctly.
>
> ... much time can pass ...
>
> - Userspace releases the last references, which triggers the final
> destroy stuff and which releases the memory occupied by various
> structures still (but not anything releated to hw or anything else
> really).
>
> So there's two bits:
>
> 1. Synchronizing with pending ioctls. This is mostly there already
> with lock_fb_info/unlock_fb_info. From a quick look the missing bit
> seems to be that the unregister code is not taking that lock, and so
> not sufficiently synchronizing against concurrent ioctl calls and
> other stuff. Plus would need to audit all entry points.

Correction: The check here is file_fb_info(), which checks for
unregister. Except it's totally racy and misses the end marker (unlike
drm_dev_enter/exit in drm). So bunch of work to do here too. The
lock_fb_info is purely locking, not lifetime (and I think in a bunch
of places way too late).

> 1a. fbcon works differently. Don't look too closely, but this is also
> not the problem your facing here.
>
> 2. Refcounting of the fb structure and hw teardown. That's what's
> tracked in fb_info->count. Most likely the fbdev driver you have has a
> wrong split between the hw teardown code and what's in fb_destroy. If
> you have any hw cleanup code in fb_destroy that driver is buggy. efifb
> is very buggy in that area :-) Same for offb, simplefb, vesafb and
> vesa16fb.
>
> We might need a new fb_unregister callback for these drivers to be
> able to fix this properly. Because the unregister comes from the fbdev
> core, and not the driver as usual, so the usual driver unload sequence
> doesnt work:
>
> drm_dev_unregister();
> ... release all hw resource ...
>
> drm_dev_put();
>
> Or in terms of fbdev:
>
> unregister_framebuffer(info);
> ... release all hw resources ... <- everyone gets this wrong
> framebuffer_release(info); <- also wrong because not refcounted,
> hooray, this should be moved to to end of the ->fb_destroy callback
>
> So we need a callback to put the "release all hw resources" step into
> the flow at the right place. Another option (slightly less midlayer)
> would be to add a fb_takeover hook, for these platforms drivers, which
> would then do the above sequence (like at driver unload).
>
> Also adding Noralf, since he's fixed up all the drm stuff in this area
> in the past.
>
> Cheers, Daniel
>
> > ---
> >  drivers/video/fbdev/core/fbmem.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> > index d04554959ea7..2ea8ac05b065 100644
> > --- a/drivers/video/fbdev/core/fbmem.c
> > +++ b/drivers/video/fbdev/core/fbmem.c
> > @@ -35,6 +35,7 @@
> >  #include <linux/fbcon.h>
> >  #include <linux/mem_encrypt.h>
> >  #include <linux/pci.h>
> > +#include <linux/delay.h>
> >
> >  #include <asm/fb.h>
> >
> > @@ -1707,6 +1708,8 @@ static void unlink_framebuffer(struct fb_info *fb_info)
> >
> >  static void do_unregister_framebuffer(struct fb_info *fb_info)
> >  {
> > +       int limit = 100;
> > +
> >         unlink_framebuffer(fb_info);
> >         if (fb_info->pixmap.addr &&
> >             (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
> > @@ -1726,6 +1729,10 @@ static void do_unregister_framebuffer(struct fb_info *fb_info)
> >         fbcon_fb_unregistered(fb_info);
> >         console_unlock();
> >
> > +       /* try wait until all references are gone */
> > +       while (atomic_read(&fb_info->count) > 1 && --limit > 0)
> > +               msleep(10);
> > +
> >         /* this may free fb info */
> >         put_fb_info(fb_info);
> >  }
> > --
> > 2.18.1
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-28 16:44       ` Daniel Vetter
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Vetter @ 2020-01-28 16:44 UTC (permalink / raw)
  To: Gerd Hoffmann, Noralf Trønnes
  Cc: marmarek, open list:FRAMEBUFFER LAYER, open list, dri-devel,
	Bartlomiej Zolnierkiewicz

On Tue, Jan 28, 2020 at 5:39 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Mon, Jan 20, 2020 at 11:00 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
> >
> > Problem: do_unregister_framebuffer() might return before the device is
> > fully cleaned up, due to userspace having a file handle for /dev/fb0
> > open.  Which can result in drm driver not being able to grab resources
> > (and fail initialization) because the firmware framebuffer still holds
> > them.  Reportedly plymouth can trigger this.
> >
> > Fix this by trying to wait until all references are gone.  Don't wait
> > forever though given that userspace might keep the file handle open.
> >
> > Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
>
> (Missed this because lca, so a bit late)
>
> This isn't really how driver unload is supposed to happen. Instead:
>
> - Driver unload starts
> - Driver calls the foo_unregister function, which stops new userspace
> from getting at the driver. If you're subsystem is good (i.e. drm
> since Noralf fixed it) this will also sufficiently synchronize with
> any pending ioctl.
> - Important: This does _not_ wait until userspace closes all
> references. You can't force that.
> - Driver releases all hw structures and mappings and everything else.
> With fbdev this is currently not fully race free because no one is
> synchronizing with userspace everywhere correctly.
>
> ... much time can pass ...
>
> - Userspace releases the last references, which triggers the final
> destroy stuff and which releases the memory occupied by various
> structures still (but not anything releated to hw or anything else
> really).
>
> So there's two bits:
>
> 1. Synchronizing with pending ioctls. This is mostly there already
> with lock_fb_info/unlock_fb_info. From a quick look the missing bit
> seems to be that the unregister code is not taking that lock, and so
> not sufficiently synchronizing against concurrent ioctl calls and
> other stuff. Plus would need to audit all entry points.

Correction: The check here is file_fb_info(), which checks for
unregister. Except it's totally racy and misses the end marker (unlike
drm_dev_enter/exit in drm). So bunch of work to do here too. The
lock_fb_info is purely locking, not lifetime (and I think in a bunch
of places way too late).

> 1a. fbcon works differently. Don't look too closely, but this is also
> not the problem your facing here.
>
> 2. Refcounting of the fb structure and hw teardown. That's what's
> tracked in fb_info->count. Most likely the fbdev driver you have has a
> wrong split between the hw teardown code and what's in fb_destroy. If
> you have any hw cleanup code in fb_destroy that driver is buggy. efifb
> is very buggy in that area :-) Same for offb, simplefb, vesafb and
> vesa16fb.
>
> We might need a new fb_unregister callback for these drivers to be
> able to fix this properly. Because the unregister comes from the fbdev
> core, and not the driver as usual, so the usual driver unload sequence
> doesnt work:
>
> drm_dev_unregister();
> ... release all hw resource ...
>
> drm_dev_put();
>
> Or in terms of fbdev:
>
> unregister_framebuffer(info);
> ... release all hw resources ... <- everyone gets this wrong
> framebuffer_release(info); <- also wrong because not refcounted,
> hooray, this should be moved to to end of the ->fb_destroy callback
>
> So we need a callback to put the "release all hw resources" step into
> the flow at the right place. Another option (slightly less midlayer)
> would be to add a fb_takeover hook, for these platforms drivers, which
> would then do the above sequence (like at driver unload).
>
> Also adding Noralf, since he's fixed up all the drm stuff in this area
> in the past.
>
> Cheers, Daniel
>
> > ---
> >  drivers/video/fbdev/core/fbmem.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> > index d04554959ea7..2ea8ac05b065 100644
> > --- a/drivers/video/fbdev/core/fbmem.c
> > +++ b/drivers/video/fbdev/core/fbmem.c
> > @@ -35,6 +35,7 @@
> >  #include <linux/fbcon.h>
> >  #include <linux/mem_encrypt.h>
> >  #include <linux/pci.h>
> > +#include <linux/delay.h>
> >
> >  #include <asm/fb.h>
> >
> > @@ -1707,6 +1708,8 @@ static void unlink_framebuffer(struct fb_info *fb_info)
> >
> >  static void do_unregister_framebuffer(struct fb_info *fb_info)
> >  {
> > +       int limit = 100;
> > +
> >         unlink_framebuffer(fb_info);
> >         if (fb_info->pixmap.addr &&
> >             (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
> > @@ -1726,6 +1729,10 @@ static void do_unregister_framebuffer(struct fb_info *fb_info)
> >         fbcon_fb_unregistered(fb_info);
> >         console_unlock();
> >
> > +       /* try wait until all references are gone */
> > +       while (atomic_read(&fb_info->count) > 1 && --limit > 0)
> > +               msleep(10);
> > +
> >         /* this may free fb info */
> >         put_fb_info(fb_info);
> >  }
> > --
> > 2.18.1
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-28 16:44       ` Daniel Vetter
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Vetter @ 2020-01-28 16:44 UTC (permalink / raw)
  To: Gerd Hoffmann, Noralf Trønnes
  Cc: marmarek, open list:FRAMEBUFFER LAYER, open list, dri-devel,
	Bartlomiej Zolnierkiewicz

On Tue, Jan 28, 2020 at 5:39 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Mon, Jan 20, 2020 at 11:00 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
> >
> > Problem: do_unregister_framebuffer() might return before the device is
> > fully cleaned up, due to userspace having a file handle for /dev/fb0
> > open.  Which can result in drm driver not being able to grab resources
> > (and fail initialization) because the firmware framebuffer still holds
> > them.  Reportedly plymouth can trigger this.
> >
> > Fix this by trying to wait until all references are gone.  Don't wait
> > forever though given that userspace might keep the file handle open.
> >
> > Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
>
> (Missed this because lca, so a bit late)
>
> This isn't really how driver unload is supposed to happen. Instead:
>
> - Driver unload starts
> - Driver calls the foo_unregister function, which stops new userspace
> from getting at the driver. If you're subsystem is good (i.e. drm
> since Noralf fixed it) this will also sufficiently synchronize with
> any pending ioctl.
> - Important: This does _not_ wait until userspace closes all
> references. You can't force that.
> - Driver releases all hw structures and mappings and everything else.
> With fbdev this is currently not fully race free because no one is
> synchronizing with userspace everywhere correctly.
>
> ... much time can pass ...
>
> - Userspace releases the last references, which triggers the final
> destroy stuff and which releases the memory occupied by various
> structures still (but not anything releated to hw or anything else
> really).
>
> So there's two bits:
>
> 1. Synchronizing with pending ioctls. This is mostly there already
> with lock_fb_info/unlock_fb_info. From a quick look the missing bit
> seems to be that the unregister code is not taking that lock, and so
> not sufficiently synchronizing against concurrent ioctl calls and
> other stuff. Plus would need to audit all entry points.

Correction: The check here is file_fb_info(), which checks for
unregister. Except it's totally racy and misses the end marker (unlike
drm_dev_enter/exit in drm). So bunch of work to do here too. The
lock_fb_info is purely locking, not lifetime (and I think in a bunch
of places way too late).

> 1a. fbcon works differently. Don't look too closely, but this is also
> not the problem your facing here.
>
> 2. Refcounting of the fb structure and hw teardown. That's what's
> tracked in fb_info->count. Most likely the fbdev driver you have has a
> wrong split between the hw teardown code and what's in fb_destroy. If
> you have any hw cleanup code in fb_destroy that driver is buggy. efifb
> is very buggy in that area :-) Same for offb, simplefb, vesafb and
> vesa16fb.
>
> We might need a new fb_unregister callback for these drivers to be
> able to fix this properly. Because the unregister comes from the fbdev
> core, and not the driver as usual, so the usual driver unload sequence
> doesnt work:
>
> drm_dev_unregister();
> ... release all hw resource ...
>
> drm_dev_put();
>
> Or in terms of fbdev:
>
> unregister_framebuffer(info);
> ... release all hw resources ... <- everyone gets this wrong
> framebuffer_release(info); <- also wrong because not refcounted,
> hooray, this should be moved to to end of the ->fb_destroy callback
>
> So we need a callback to put the "release all hw resources" step into
> the flow at the right place. Another option (slightly less midlayer)
> would be to add a fb_takeover hook, for these platforms drivers, which
> would then do the above sequence (like at driver unload).
>
> Also adding Noralf, since he's fixed up all the drm stuff in this area
> in the past.
>
> Cheers, Daniel
>
> > ---
> >  drivers/video/fbdev/core/fbmem.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> > index d04554959ea7..2ea8ac05b065 100644
> > --- a/drivers/video/fbdev/core/fbmem.c
> > +++ b/drivers/video/fbdev/core/fbmem.c
> > @@ -35,6 +35,7 @@
> >  #include <linux/fbcon.h>
> >  #include <linux/mem_encrypt.h>
> >  #include <linux/pci.h>
> > +#include <linux/delay.h>
> >
> >  #include <asm/fb.h>
> >
> > @@ -1707,6 +1708,8 @@ static void unlink_framebuffer(struct fb_info *fb_info)
> >
> >  static void do_unregister_framebuffer(struct fb_info *fb_info)
> >  {
> > +       int limit = 100;
> > +
> >         unlink_framebuffer(fb_info);
> >         if (fb_info->pixmap.addr &&
> >             (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
> > @@ -1726,6 +1729,10 @@ static void do_unregister_framebuffer(struct fb_info *fb_info)
> >         fbcon_fb_unregistered(fb_info);
> >         console_unlock();
> >
> > +       /* try wait until all references are gone */
> > +       while (atomic_read(&fb_info->count) > 1 && --limit > 0)
> > +               msleep(10);
> > +
> >         /* this may free fb info */
> >         put_fb_info(fb_info);
> >  }
> > --
> > 2.18.1
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
  2020-01-28 16:44       ` Daniel Vetter
  (?)
@ 2020-01-28 16:58         ` Daniel Vetter
  -1 siblings, 0 replies; 27+ messages in thread
From: Daniel Vetter @ 2020-01-28 16:58 UTC (permalink / raw)
  To: Gerd Hoffmann, Noralf Trønnes
  Cc: dri-devel, open list, open list:FRAMEBUFFER LAYER,
	Bartlomiej Zolnierkiewicz, marmarek

On Tue, Jan 28, 2020 at 5:44 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Jan 28, 2020 at 5:39 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Mon, Jan 20, 2020 at 11:00 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
> > >
> > > Problem: do_unregister_framebuffer() might return before the device is
> > > fully cleaned up, due to userspace having a file handle for /dev/fb0
> > > open.  Which can result in drm driver not being able to grab resources
> > > (and fail initialization) because the firmware framebuffer still holds
> > > them.  Reportedly plymouth can trigger this.
> > >
> > > Fix this by trying to wait until all references are gone.  Don't wait
> > > forever though given that userspace might keep the file handle open.
> > >
> > > Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> >
> > (Missed this because lca, so a bit late)
> >
> > This isn't really how driver unload is supposed to happen. Instead:
> >
> > - Driver unload starts
> > - Driver calls the foo_unregister function, which stops new userspace
> > from getting at the driver. If you're subsystem is good (i.e. drm
> > since Noralf fixed it) this will also sufficiently synchronize with
> > any pending ioctl.
> > - Important: This does _not_ wait until userspace closes all
> > references. You can't force that.
> > - Driver releases all hw structures and mappings and everything else.
> > With fbdev this is currently not fully race free because no one is
> > synchronizing with userspace everywhere correctly.
> >
> > ... much time can pass ...
> >
> > - Userspace releases the last references, which triggers the final
> > destroy stuff and which releases the memory occupied by various
> > structures still (but not anything releated to hw or anything else
> > really).
> >
> > So there's two bits:
> >
> > 1. Synchronizing with pending ioctls. This is mostly there already
> > with lock_fb_info/unlock_fb_info. From a quick look the missing bit
> > seems to be that the unregister code is not taking that lock, and so
> > not sufficiently synchronizing against concurrent ioctl calls and
> > other stuff. Plus would need to audit all entry points.
>
> Correction: The check here is file_fb_info(), which checks for
> unregister. Except it's totally racy and misses the end marker (unlike
> drm_dev_enter/exit in drm). So bunch of work to do here too. The
> lock_fb_info is purely locking, not lifetime (and I think in a bunch
> of places way too late).

Oh and since your patch is full of races too I expect you'll be able
to get away with just adding the ->fb_remove hook, fix up drivers, and
leave fixing the various races to someone else.
-Daniel

>
> > 1a. fbcon works differently. Don't look too closely, but this is also
> > not the problem your facing here.
> >
> > 2. Refcounting of the fb structure and hw teardown. That's what's
> > tracked in fb_info->count. Most likely the fbdev driver you have has a
> > wrong split between the hw teardown code and what's in fb_destroy. If
> > you have any hw cleanup code in fb_destroy that driver is buggy. efifb
> > is very buggy in that area :-) Same for offb, simplefb, vesafb and
> > vesa16fb.
> >
> > We might need a new fb_unregister callback for these drivers to be
> > able to fix this properly. Because the unregister comes from the fbdev
> > core, and not the driver as usual, so the usual driver unload sequence
> > doesnt work:
> >
> > drm_dev_unregister();
> > ... release all hw resource ...
> >
> > drm_dev_put();
> >
> > Or in terms of fbdev:
> >
> > unregister_framebuffer(info);
> > ... release all hw resources ... <- everyone gets this wrong
> > framebuffer_release(info); <- also wrong because not refcounted,
> > hooray, this should be moved to to end of the ->fb_destroy callback
> >
> > So we need a callback to put the "release all hw resources" step into
> > the flow at the right place. Another option (slightly less midlayer)
> > would be to add a fb_takeover hook, for these platforms drivers, which
> > would then do the above sequence (like at driver unload).
> >
> > Also adding Noralf, since he's fixed up all the drm stuff in this area
> > in the past.
> >
> > Cheers, Daniel
> >
> > > ---
> > >  drivers/video/fbdev/core/fbmem.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> > > index d04554959ea7..2ea8ac05b065 100644
> > > --- a/drivers/video/fbdev/core/fbmem.c
> > > +++ b/drivers/video/fbdev/core/fbmem.c
> > > @@ -35,6 +35,7 @@
> > >  #include <linux/fbcon.h>
> > >  #include <linux/mem_encrypt.h>
> > >  #include <linux/pci.h>
> > > +#include <linux/delay.h>
> > >
> > >  #include <asm/fb.h>
> > >
> > > @@ -1707,6 +1708,8 @@ static void unlink_framebuffer(struct fb_info *fb_info)
> > >
> > >  static void do_unregister_framebuffer(struct fb_info *fb_info)
> > >  {
> > > +       int limit = 100;
> > > +
> > >         unlink_framebuffer(fb_info);
> > >         if (fb_info->pixmap.addr &&
> > >             (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
> > > @@ -1726,6 +1729,10 @@ static void do_unregister_framebuffer(struct fb_info *fb_info)
> > >         fbcon_fb_unregistered(fb_info);
> > >         console_unlock();
> > >
> > > +       /* try wait until all references are gone */
> > > +       while (atomic_read(&fb_info->count) > 1 && --limit > 0)
> > > +               msleep(10);
> > > +
> > >         /* this may free fb info */
> > >         put_fb_info(fb_info);
> > >  }
> > > --
> > > 2.18.1
> > >
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-28 16:58         ` Daniel Vetter
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Vetter @ 2020-01-28 16:58 UTC (permalink / raw)
  To: Gerd Hoffmann, Noralf Trønnes
  Cc: marmarek, open list:FRAMEBUFFER LAYER, open list, dri-devel,
	Bartlomiej Zolnierkiewicz

On Tue, Jan 28, 2020 at 5:44 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Jan 28, 2020 at 5:39 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Mon, Jan 20, 2020 at 11:00 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
> > >
> > > Problem: do_unregister_framebuffer() might return before the device is
> > > fully cleaned up, due to userspace having a file handle for /dev/fb0
> > > open.  Which can result in drm driver not being able to grab resources
> > > (and fail initialization) because the firmware framebuffer still holds
> > > them.  Reportedly plymouth can trigger this.
> > >
> > > Fix this by trying to wait until all references are gone.  Don't wait
> > > forever though given that userspace might keep the file handle open.
> > >
> > > Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> >
> > (Missed this because lca, so a bit late)
> >
> > This isn't really how driver unload is supposed to happen. Instead:
> >
> > - Driver unload starts
> > - Driver calls the foo_unregister function, which stops new userspace
> > from getting at the driver. If you're subsystem is good (i.e. drm
> > since Noralf fixed it) this will also sufficiently synchronize with
> > any pending ioctl.
> > - Important: This does _not_ wait until userspace closes all
> > references. You can't force that.
> > - Driver releases all hw structures and mappings and everything else.
> > With fbdev this is currently not fully race free because no one is
> > synchronizing with userspace everywhere correctly.
> >
> > ... much time can pass ...
> >
> > - Userspace releases the last references, which triggers the final
> > destroy stuff and which releases the memory occupied by various
> > structures still (but not anything releated to hw or anything else
> > really).
> >
> > So there's two bits:
> >
> > 1. Synchronizing with pending ioctls. This is mostly there already
> > with lock_fb_info/unlock_fb_info. From a quick look the missing bit
> > seems to be that the unregister code is not taking that lock, and so
> > not sufficiently synchronizing against concurrent ioctl calls and
> > other stuff. Plus would need to audit all entry points.
>
> Correction: The check here is file_fb_info(), which checks for
> unregister. Except it's totally racy and misses the end marker (unlike
> drm_dev_enter/exit in drm). So bunch of work to do here too. The
> lock_fb_info is purely locking, not lifetime (and I think in a bunch
> of places way too late).

Oh and since your patch is full of races too I expect you'll be able
to get away with just adding the ->fb_remove hook, fix up drivers, and
leave fixing the various races to someone else.
-Daniel

>
> > 1a. fbcon works differently. Don't look too closely, but this is also
> > not the problem your facing here.
> >
> > 2. Refcounting of the fb structure and hw teardown. That's what's
> > tracked in fb_info->count. Most likely the fbdev driver you have has a
> > wrong split between the hw teardown code and what's in fb_destroy. If
> > you have any hw cleanup code in fb_destroy that driver is buggy. efifb
> > is very buggy in that area :-) Same for offb, simplefb, vesafb and
> > vesa16fb.
> >
> > We might need a new fb_unregister callback for these drivers to be
> > able to fix this properly. Because the unregister comes from the fbdev
> > core, and not the driver as usual, so the usual driver unload sequence
> > doesnt work:
> >
> > drm_dev_unregister();
> > ... release all hw resource ...
> >
> > drm_dev_put();
> >
> > Or in terms of fbdev:
> >
> > unregister_framebuffer(info);
> > ... release all hw resources ... <- everyone gets this wrong
> > framebuffer_release(info); <- also wrong because not refcounted,
> > hooray, this should be moved to to end of the ->fb_destroy callback
> >
> > So we need a callback to put the "release all hw resources" step into
> > the flow at the right place. Another option (slightly less midlayer)
> > would be to add a fb_takeover hook, for these platforms drivers, which
> > would then do the above sequence (like at driver unload).
> >
> > Also adding Noralf, since he's fixed up all the drm stuff in this area
> > in the past.
> >
> > Cheers, Daniel
> >
> > > ---
> > >  drivers/video/fbdev/core/fbmem.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> > > index d04554959ea7..2ea8ac05b065 100644
> > > --- a/drivers/video/fbdev/core/fbmem.c
> > > +++ b/drivers/video/fbdev/core/fbmem.c
> > > @@ -35,6 +35,7 @@
> > >  #include <linux/fbcon.h>
> > >  #include <linux/mem_encrypt.h>
> > >  #include <linux/pci.h>
> > > +#include <linux/delay.h>
> > >
> > >  #include <asm/fb.h>
> > >
> > > @@ -1707,6 +1708,8 @@ static void unlink_framebuffer(struct fb_info *fb_info)
> > >
> > >  static void do_unregister_framebuffer(struct fb_info *fb_info)
> > >  {
> > > +       int limit = 100;
> > > +
> > >         unlink_framebuffer(fb_info);
> > >         if (fb_info->pixmap.addr &&
> > >             (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
> > > @@ -1726,6 +1729,10 @@ static void do_unregister_framebuffer(struct fb_info *fb_info)
> > >         fbcon_fb_unregistered(fb_info);
> > >         console_unlock();
> > >
> > > +       /* try wait until all references are gone */
> > > +       while (atomic_read(&fb_info->count) > 1 && --limit > 0)
> > > +               msleep(10);
> > > +
> > >         /* this may free fb info */
> > >         put_fb_info(fb_info);
> > >  }
> > > --
> > > 2.18.1
> > >
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] fbdev: wait for references go away
@ 2020-01-28 16:58         ` Daniel Vetter
  0 siblings, 0 replies; 27+ messages in thread
From: Daniel Vetter @ 2020-01-28 16:58 UTC (permalink / raw)
  To: Gerd Hoffmann, Noralf Trønnes
  Cc: marmarek, open list:FRAMEBUFFER LAYER, open list, dri-devel,
	Bartlomiej Zolnierkiewicz

On Tue, Jan 28, 2020 at 5:44 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Jan 28, 2020 at 5:39 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Mon, Jan 20, 2020 at 11:00 AM Gerd Hoffmann <kraxel@redhat.com> wrote:
> > >
> > > Problem: do_unregister_framebuffer() might return before the device is
> > > fully cleaned up, due to userspace having a file handle for /dev/fb0
> > > open.  Which can result in drm driver not being able to grab resources
> > > (and fail initialization) because the firmware framebuffer still holds
> > > them.  Reportedly plymouth can trigger this.
> > >
> > > Fix this by trying to wait until all references are gone.  Don't wait
> > > forever though given that userspace might keep the file handle open.
> > >
> > > Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> > > Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> >
> > (Missed this because lca, so a bit late)
> >
> > This isn't really how driver unload is supposed to happen. Instead:
> >
> > - Driver unload starts
> > - Driver calls the foo_unregister function, which stops new userspace
> > from getting at the driver. If you're subsystem is good (i.e. drm
> > since Noralf fixed it) this will also sufficiently synchronize with
> > any pending ioctl.
> > - Important: This does _not_ wait until userspace closes all
> > references. You can't force that.
> > - Driver releases all hw structures and mappings and everything else.
> > With fbdev this is currently not fully race free because no one is
> > synchronizing with userspace everywhere correctly.
> >
> > ... much time can pass ...
> >
> > - Userspace releases the last references, which triggers the final
> > destroy stuff and which releases the memory occupied by various
> > structures still (but not anything releated to hw or anything else
> > really).
> >
> > So there's two bits:
> >
> > 1. Synchronizing with pending ioctls. This is mostly there already
> > with lock_fb_info/unlock_fb_info. From a quick look the missing bit
> > seems to be that the unregister code is not taking that lock, and so
> > not sufficiently synchronizing against concurrent ioctl calls and
> > other stuff. Plus would need to audit all entry points.
>
> Correction: The check here is file_fb_info(), which checks for
> unregister. Except it's totally racy and misses the end marker (unlike
> drm_dev_enter/exit in drm). So bunch of work to do here too. The
> lock_fb_info is purely locking, not lifetime (and I think in a bunch
> of places way too late).

Oh and since your patch is full of races too I expect you'll be able
to get away with just adding the ->fb_remove hook, fix up drivers, and
leave fixing the various races to someone else.
-Daniel

>
> > 1a. fbcon works differently. Don't look too closely, but this is also
> > not the problem your facing here.
> >
> > 2. Refcounting of the fb structure and hw teardown. That's what's
> > tracked in fb_info->count. Most likely the fbdev driver you have has a
> > wrong split between the hw teardown code and what's in fb_destroy. If
> > you have any hw cleanup code in fb_destroy that driver is buggy. efifb
> > is very buggy in that area :-) Same for offb, simplefb, vesafb and
> > vesa16fb.
> >
> > We might need a new fb_unregister callback for these drivers to be
> > able to fix this properly. Because the unregister comes from the fbdev
> > core, and not the driver as usual, so the usual driver unload sequence
> > doesnt work:
> >
> > drm_dev_unregister();
> > ... release all hw resource ...
> >
> > drm_dev_put();
> >
> > Or in terms of fbdev:
> >
> > unregister_framebuffer(info);
> > ... release all hw resources ... <- everyone gets this wrong
> > framebuffer_release(info); <- also wrong because not refcounted,
> > hooray, this should be moved to to end of the ->fb_destroy callback
> >
> > So we need a callback to put the "release all hw resources" step into
> > the flow at the right place. Another option (slightly less midlayer)
> > would be to add a fb_takeover hook, for these platforms drivers, which
> > would then do the above sequence (like at driver unload).
> >
> > Also adding Noralf, since he's fixed up all the drm stuff in this area
> > in the past.
> >
> > Cheers, Daniel
> >
> > > ---
> > >  drivers/video/fbdev/core/fbmem.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
> > > index d04554959ea7..2ea8ac05b065 100644
> > > --- a/drivers/video/fbdev/core/fbmem.c
> > > +++ b/drivers/video/fbdev/core/fbmem.c
> > > @@ -35,6 +35,7 @@
> > >  #include <linux/fbcon.h>
> > >  #include <linux/mem_encrypt.h>
> > >  #include <linux/pci.h>
> > > +#include <linux/delay.h>
> > >
> > >  #include <asm/fb.h>
> > >
> > > @@ -1707,6 +1708,8 @@ static void unlink_framebuffer(struct fb_info *fb_info)
> > >
> > >  static void do_unregister_framebuffer(struct fb_info *fb_info)
> > >  {
> > > +       int limit = 100;
> > > +
> > >         unlink_framebuffer(fb_info);
> > >         if (fb_info->pixmap.addr &&
> > >             (fb_info->pixmap.flags & FB_PIXMAP_DEFAULT))
> > > @@ -1726,6 +1729,10 @@ static void do_unregister_framebuffer(struct fb_info *fb_info)
> > >         fbcon_fb_unregistered(fb_info);
> > >         console_unlock();
> > >
> > > +       /* try wait until all references are gone */
> > > +       while (atomic_read(&fb_info->count) > 1 && --limit > 0)
> > > +               msleep(10);
> > > +
> > >         /* this may free fb info */
> > >         put_fb_info(fb_info);
> > >  }
> > > --
> > > 2.18.1
> > >
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-01-28 16:58 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20200120100025eucas1p21f5e2da0fd7c1fcb33cb47a97e9e645c@eucas1p2.samsung.com>
2020-01-20 10:00 ` [PATCH] fbdev: wait for references go away Gerd Hoffmann
2020-01-20 10:00   ` Gerd Hoffmann
2020-01-20 10:00   ` Gerd Hoffmann
2020-01-20 17:51   ` Bartlomiej Zolnierkiewicz
2020-01-20 17:51     ` Bartlomiej Zolnierkiewicz
2020-01-20 17:51     ` Bartlomiej Zolnierkiewicz
2020-01-20 17:54     ` Bartlomiej Zolnierkiewicz
2020-01-20 17:54       ` Bartlomiej Zolnierkiewicz
2020-01-20 17:54       ` Bartlomiej Zolnierkiewicz
2020-01-20 18:11     ` Marek Marczykowski-Górecki
2020-01-20 18:11       ` Marek Marczykowski-Górecki
2020-01-20 18:11       ` Marek Marczykowski-Górecki
2020-01-21  5:53     ` Gerd Hoffmann
2020-01-21  5:53       ` Gerd Hoffmann
2020-01-21  5:53       ` Gerd Hoffmann
2020-01-28 15:20       ` Bartlomiej Zolnierkiewicz
2020-01-28 15:20         ` Bartlomiej Zolnierkiewicz
2020-01-28 15:20         ` Bartlomiej Zolnierkiewicz
2020-01-28 16:39   ` Daniel Vetter
2020-01-28 16:39     ` Daniel Vetter
2020-01-28 16:39     ` Daniel Vetter
2020-01-28 16:44     ` Daniel Vetter
2020-01-28 16:44       ` Daniel Vetter
2020-01-28 16:44       ` Daniel Vetter
2020-01-28 16:58       ` Daniel Vetter
2020-01-28 16:58         ` Daniel Vetter
2020-01-28 16:58         ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.