All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal
@ 2024-02-20 11:10 Herve Codina
  2024-02-20 11:10 ` [PATCH 1/2] gpiolib: call gcdev_unregister() sooner in the removal operations Herve Codina
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Herve Codina @ 2024-02-20 11:10 UTC (permalink / raw)
  To: Bartosz Golaszewski, Kent Gibson, Linus Walleij
  Cc: linux-gpio, linux-kernel, Luca Ceresoli, Thomas Petazzoni, Herve Codina

Hi,

When a gpio chip device is removed while some related gpio are used by
the user-space (gpiomon for instance), the following warning can appear:
  remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon'
  WARNING: CPU: 2 PID: 72 at fs/proc/generic.c:717 remove_proc_entry+0x190/0x19c
  ...
  Call trace:
    remove_proc_entry+0x190/0x19c
    unregister_irq_proc+0xd0/0x104
    free_desc+0x4c/0xc4
    irq_free_descs+0x6c/0x90
    irq_dispose_mapping+0x104/0x14c
    gpiochip_irqchip_remove+0xcc/0x1a4
    gpiochip_remove+0x48/0x100
  ...

Indeed, even if the gpio removal is notified to the gpio-cdev, the
IRQ used is not released when it should be.

This series calls the gpio removal notifier sooner in the removal
process in order to give a chance to a notifier function to release
the IRQ before releasing the IRQ mapping and adds the needed
operations to release the IRQ in the gpio cdev notifier function.

Best regards,
Hervé Codina

Herve Codina (2):
  gpiolib: call gcdev_unregister() sooner in the removal operations
  gpiolib: cdev: release IRQs when the gpio chip device is removed

 drivers/gpio/gpiolib-cdev.c | 33 ++++++++++++++++++++++-----------
 drivers/gpio/gpiolib.c      |  8 +++++++-
 2 files changed, 29 insertions(+), 12 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/2] gpiolib: call gcdev_unregister() sooner in the removal operations
  2024-02-20 11:10 [PATCH 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal Herve Codina
@ 2024-02-20 11:10 ` Herve Codina
  2024-02-20 13:47   ` Bartosz Golaszewski
  2024-02-20 11:10 ` [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed Herve Codina
  2024-02-20 13:41 ` [PATCH 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal Bartosz Golaszewski
  2 siblings, 1 reply; 17+ messages in thread
From: Herve Codina @ 2024-02-20 11:10 UTC (permalink / raw)
  To: Bartosz Golaszewski, Kent Gibson, Linus Walleij
  Cc: linux-gpio, linux-kernel, Luca Ceresoli, Thomas Petazzoni, Herve Codina

When gpio chip device is removed while some related gpio are used by the
user-space, the following warning can appear:
  remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon'
  WARNING: CPU: 2 PID: 72 at fs/proc/generic.c:717 remove_proc_entry+0x190/0x19c
  ...
  Call trace:
    remove_proc_entry+0x190/0x19c
    unregister_irq_proc+0xd0/0x104
    free_desc+0x4c/0xc4
    irq_free_descs+0x6c/0x90
    irq_dispose_mapping+0x104/0x14c
    gpiochip_irqchip_remove+0xcc/0x1a4
    gpiochip_remove+0x48/0x100
  ...

Indeed, the gpio cdev uses an IRQ but this IRQ is not released
(irq_free() call) before the call to gpiochip_irqchip_remove().

In order to give a chance to the gpio dev driver to release this
irq before removing the IRQ mapping, notify the cdev driver about
the gpio device removal before the gpiochip_irqchip_remove() call.

Signed-off-by: Herve Codina <herve.codina@bootlin.com>
---
 drivers/gpio/gpiolib.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 8b3a0f45b574..079181b9daa8 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -1051,6 +1051,13 @@ void gpiochip_remove(struct gpio_chip *gc)
 
 	/* FIXME: should the legacy sysfs handling be moved to gpio_device? */
 	gpiochip_sysfs_unregister(gdev);
+
+	/*
+	 * Tell gcdev that the device is removing. If any gpio resources are in
+	 * use (irqs for instance), it's time for gcdev to release them.
+	 */
+	gcdev_unregister(gdev);
+
 	gpiochip_free_hogs(gc);
 	/* Numb the device, cancelling all outstanding operations */
 	gdev->chip = NULL;
@@ -1085,7 +1092,6 @@ void gpiochip_remove(struct gpio_chip *gc)
 	 * be removed, else it will be dangling until the last user is
 	 * gone.
 	 */
-	gcdev_unregister(gdev);
 	up_write(&gdev->sem);
 	gpio_device_put(gdev);
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed
  2024-02-20 11:10 [PATCH 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal Herve Codina
  2024-02-20 11:10 ` [PATCH 1/2] gpiolib: call gcdev_unregister() sooner in the removal operations Herve Codina
@ 2024-02-20 11:10 ` Herve Codina
  2024-02-20 14:29   ` Kent Gibson
  2024-02-20 13:41 ` [PATCH 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal Bartosz Golaszewski
  2 siblings, 1 reply; 17+ messages in thread
From: Herve Codina @ 2024-02-20 11:10 UTC (permalink / raw)
  To: Bartosz Golaszewski, Kent Gibson, Linus Walleij
  Cc: linux-gpio, linux-kernel, Luca Ceresoli, Thomas Petazzoni, Herve Codina

When gpio chip device is removed while some related gpio are used by the
user-space, the following warning can appear:
  remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon'
  WARNING: CPU: 2 PID: 72 at fs/proc/generic.c:717 remove_proc_entry+0x190/0x19c
  ...
  Call trace:
    remove_proc_entry+0x190/0x19c
    unregister_irq_proc+0xd0/0x104
    free_desc+0x4c/0xc4
    irq_free_descs+0x6c/0x90
    irq_dispose_mapping+0x104/0x14c
    gpiochip_irqchip_remove+0xcc/0x1a4
    gpiochip_remove+0x48/0x100
  ...

Indeed, the gpio cdev uses an IRQ but this IRQ is not released when the
gpio chip device is removed.

Release IRQs used in the device removal notifier functions.
Also move one of these function definition in order to avoid a forward
declaration (move after the edge_detector_stop() definition).

Signed-off-by: Herve Codina <herve.codina@bootlin.com>
---
 drivers/gpio/gpiolib-cdev.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
index 2a88736629ef..aec4a4c8490a 100644
--- a/drivers/gpio/gpiolib-cdev.c
+++ b/drivers/gpio/gpiolib-cdev.c
@@ -688,17 +688,6 @@ static void line_set_debounce_period(struct line *line,
 	 GPIO_V2_LINE_FLAG_EVENT_CLOCK_HTE | \
 	 GPIO_V2_LINE_EDGE_FLAGS)
 
-static int linereq_unregistered_notify(struct notifier_block *nb,
-				       unsigned long action, void *data)
-{
-	struct linereq *lr = container_of(nb, struct linereq,
-					  device_unregistered_nb);
-
-	wake_up_poll(&lr->wait, EPOLLIN | EPOLLERR);
-
-	return NOTIFY_OK;
-}
-
 static void linereq_put_event(struct linereq *lr,
 			      struct gpio_v2_line_event *le)
 {
@@ -1189,6 +1178,23 @@ static int edge_detector_update(struct line *line,
 	return edge_detector_setup(line, lc, line_idx, edflags);
 }
 
+static int linereq_unregistered_notify(struct notifier_block *nb,
+				       unsigned long action, void *data)
+{
+	struct linereq *lr = container_of(nb, struct linereq,
+					  device_unregistered_nb);
+	int i;
+
+	for (i = 0; i < lr->num_lines; i++) {
+		if (lr->lines[i].desc)
+			edge_detector_stop(&lr->lines[i]);
+	}
+
+	wake_up_poll(&lr->wait, EPOLLIN | EPOLLERR);
+
+	return NOTIFY_OK;
+}
+
 static u64 gpio_v2_line_config_flags(struct gpio_v2_line_config *lc,
 				     unsigned int line_idx)
 {
@@ -1898,6 +1904,11 @@ static int lineevent_unregistered_notify(struct notifier_block *nb,
 	struct lineevent_state *le = container_of(nb, struct lineevent_state,
 						  device_unregistered_nb);
 
+	if (le->irq) {
+		free_irq(le->irq, le);
+		le->irq = 0;
+	}
+
 	wake_up_poll(&le->wait, EPOLLIN | EPOLLERR);
 
 	return NOTIFY_OK;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal
  2024-02-20 11:10 [PATCH 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal Herve Codina
  2024-02-20 11:10 ` [PATCH 1/2] gpiolib: call gcdev_unregister() sooner in the removal operations Herve Codina
  2024-02-20 11:10 ` [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed Herve Codina
@ 2024-02-20 13:41 ` Bartosz Golaszewski
  2 siblings, 0 replies; 17+ messages in thread
From: Bartosz Golaszewski @ 2024-02-20 13:41 UTC (permalink / raw)
  To: Herve Codina
  Cc: Kent Gibson, Linus Walleij, linux-gpio, linux-kernel,
	Luca Ceresoli, Thomas Petazzoni

On Tue, Feb 20, 2024 at 12:10 PM Herve Codina <herve.codina@bootlin.com> wrote:
>
> Hi,
>
> When a gpio chip device is removed while some related gpio are used by
> the user-space (gpiomon for instance), the following warning can appear:
>   remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon'
>   WARNING: CPU: 2 PID: 72 at fs/proc/generic.c:717 remove_proc_entry+0x190/0x19c
>   ...
>   Call trace:
>     remove_proc_entry+0x190/0x19c
>     unregister_irq_proc+0xd0/0x104
>     free_desc+0x4c/0xc4
>     irq_free_descs+0x6c/0x90
>     irq_dispose_mapping+0x104/0x14c
>     gpiochip_irqchip_remove+0xcc/0x1a4
>     gpiochip_remove+0x48/0x100
>   ...
>
> Indeed, even if the gpio removal is notified to the gpio-cdev, the
> IRQ used is not released when it should be.
>
> This series calls the gpio removal notifier sooner in the removal
> process in order to give a chance to a notifier function to release
> the IRQ before releasing the IRQ mapping and adds the needed
> operations to release the IRQ in the gpio cdev notifier function.
>
> Best regards,
> Hervé Codina
>
> Herve Codina (2):
>   gpiolib: call gcdev_unregister() sooner in the removal operations
>   gpiolib: cdev: release IRQs when the gpio chip device is removed
>
>  drivers/gpio/gpiolib-cdev.c | 33 ++++++++++++++++++++++-----------
>  drivers/gpio/gpiolib.c      |  8 +++++++-
>  2 files changed, 29 insertions(+), 12 deletions(-)
>
> --
> 2.43.0
>

Thanks for taking a stab at it. I saw this issue some time ago, tried
to fix it directly in interrupt procfs code[1], got yelled at by
Thomas Gleixner for 20 or so emails and eventually forgot about it.
Nice to see someone tackle it again.

Bart

[1] https://lore.kernel.org/lkml/20230814093621.23209-1-brgl@bgdev.pl/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/2] gpiolib: call gcdev_unregister() sooner in the removal operations
  2024-02-20 11:10 ` [PATCH 1/2] gpiolib: call gcdev_unregister() sooner in the removal operations Herve Codina
@ 2024-02-20 13:47   ` Bartosz Golaszewski
  0 siblings, 0 replies; 17+ messages in thread
From: Bartosz Golaszewski @ 2024-02-20 13:47 UTC (permalink / raw)
  To: Herve Codina
  Cc: Kent Gibson, Linus Walleij, linux-gpio, linux-kernel,
	Luca Ceresoli, Thomas Petazzoni

On Tue, Feb 20, 2024 at 12:10 PM Herve Codina <herve.codina@bootlin.com> wrote:
>
> When gpio chip device is removed while some related gpio are used by the
> user-space, the following warning can appear:
>   remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon'
>   WARNING: CPU: 2 PID: 72 at fs/proc/generic.c:717 remove_proc_entry+0x190/0x19c
>   ...
>   Call trace:
>     remove_proc_entry+0x190/0x19c
>     unregister_irq_proc+0xd0/0x104
>     free_desc+0x4c/0xc4
>     irq_free_descs+0x6c/0x90
>     irq_dispose_mapping+0x104/0x14c
>     gpiochip_irqchip_remove+0xcc/0x1a4
>     gpiochip_remove+0x48/0x100
>   ...
>
> Indeed, the gpio cdev uses an IRQ but this IRQ is not released
> (irq_free() call) before the call to gpiochip_irqchip_remove().
>
> In order to give a chance to the gpio dev driver to release this
> irq before removing the IRQ mapping, notify the cdev driver about
> the gpio device removal before the gpiochip_irqchip_remove() call.
>
> Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> ---
>  drivers/gpio/gpiolib.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
> index 8b3a0f45b574..079181b9daa8 100644
> --- a/drivers/gpio/gpiolib.c
> +++ b/drivers/gpio/gpiolib.c
> @@ -1051,6 +1051,13 @@ void gpiochip_remove(struct gpio_chip *gc)
>
>         /* FIXME: should the legacy sysfs handling be moved to gpio_device? */
>         gpiochip_sysfs_unregister(gdev);
> +
> +       /*
> +        * Tell gcdev that the device is removing. If any gpio resources are in
> +        * use (irqs for instance), it's time for gcdev to release them.
> +        */
> +       gcdev_unregister(gdev);
> +
>         gpiochip_free_hogs(gc);
>         /* Numb the device, cancelling all outstanding operations */
>         gdev->chip = NULL;
> @@ -1085,7 +1092,6 @@ void gpiochip_remove(struct gpio_chip *gc)
>          * be removed, else it will be dangling until the last user is
>          * gone.
>          */
> -       gcdev_unregister(gdev);
>         up_write(&gdev->sem);

Please rebase it on top of the for-next branch of the GPIO tree. We've
had some significant rework recently, we no longer even have this
semaphore.

Bart

>         gpio_device_put(gdev);
>  }
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed
  2024-02-20 11:10 ` [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed Herve Codina
@ 2024-02-20 14:29   ` Kent Gibson
  2024-02-20 18:26     ` Herve Codina
  2024-02-22  0:57     ` Kent Gibson
  0 siblings, 2 replies; 17+ messages in thread
From: Kent Gibson @ 2024-02-20 14:29 UTC (permalink / raw)
  To: Herve Codina
  Cc: Bartosz Golaszewski, Linus Walleij, linux-gpio, linux-kernel,
	Luca Ceresoli, Thomas Petazzoni

On Tue, Feb 20, 2024 at 12:10:18PM +0100, Herve Codina wrote:
> When gpio chip device is removed while some related gpio are used by the
> user-space, the following warning can appear:
>   remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon'
>   WARNING: CPU: 2 PID: 72 at fs/proc/generic.c:717 remove_proc_entry+0x190/0x19c
>   ...
>   Call trace:
>     remove_proc_entry+0x190/0x19c
>     unregister_irq_proc+0xd0/0x104
>     free_desc+0x4c/0xc4
>     irq_free_descs+0x6c/0x90
>     irq_dispose_mapping+0x104/0x14c
>     gpiochip_irqchip_remove+0xcc/0x1a4
>     gpiochip_remove+0x48/0x100
>   ...
>
> Indeed, the gpio cdev uses an IRQ but this IRQ is not released when the
> gpio chip device is removed.
>
> Release IRQs used in the device removal notifier functions.
> Also move one of these function definition in order to avoid a forward
> declaration (move after the edge_detector_stop() definition).
>
> Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> ---
>  drivers/gpio/gpiolib-cdev.c | 33 ++++++++++++++++++++++-----------
>  1 file changed, 22 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
> index 2a88736629ef..aec4a4c8490a 100644
> --- a/drivers/gpio/gpiolib-cdev.c
> +++ b/drivers/gpio/gpiolib-cdev.c
> @@ -688,17 +688,6 @@ static void line_set_debounce_period(struct line *line,
>  	 GPIO_V2_LINE_FLAG_EVENT_CLOCK_HTE | \
>  	 GPIO_V2_LINE_EDGE_FLAGS)
>
> -static int linereq_unregistered_notify(struct notifier_block *nb,
> -				       unsigned long action, void *data)
> -{
> -	struct linereq *lr = container_of(nb, struct linereq,
> -					  device_unregistered_nb);
> -
> -	wake_up_poll(&lr->wait, EPOLLIN | EPOLLERR);
> -
> -	return NOTIFY_OK;
> -}
> -
>  static void linereq_put_event(struct linereq *lr,
>  			      struct gpio_v2_line_event *le)
>  {
> @@ -1189,6 +1178,23 @@ static int edge_detector_update(struct line *line,
>  	return edge_detector_setup(line, lc, line_idx, edflags);
>  }
>
> +static int linereq_unregistered_notify(struct notifier_block *nb,
> +				       unsigned long action, void *data)
> +{
> +	struct linereq *lr = container_of(nb, struct linereq,
> +					  device_unregistered_nb);
> +	int i;
> +
> +	for (i = 0; i < lr->num_lines; i++) {
> +		if (lr->lines[i].desc)
> +			edge_detector_stop(&lr->lines[i]);
> +	}
> +

Firstly, the re-ordering in the previous patch creates a race,
as the NULLing of the gdev->chip serves to numb the cdev ioctls, so
there is now a window between the notifier being called and that numbing,
during which userspace may call linereq_set_config() and re-request
the irq.

There is also a race here with linereq_set_config().  That can be prevented
by holding the lr->config_mutex - assuming the notifier is not being called
from atomic context.

You also have a race with the line being freed that could pull the
lr out from under you, so a use after free problem.
I'd rather live with the warning :(.
Fixing that requires rethinking the lifecycle management for the
linereq/lineevent.

Cheers,
Kent.

> +	wake_up_poll(&lr->wait, EPOLLIN | EPOLLERR);
> +
> +	return NOTIFY_OK;
> +}
> +
>  static u64 gpio_v2_line_config_flags(struct gpio_v2_line_config *lc,
>  				     unsigned int line_idx)
>  {
> @@ -1898,6 +1904,11 @@ static int lineevent_unregistered_notify(struct notifier_block *nb,
>  	struct lineevent_state *le = container_of(nb, struct lineevent_state,
>  						  device_unregistered_nb);
>
> +	if (le->irq) {
> +		free_irq(le->irq, le);
> +		le->irq = 0;
> +	}
> +
>  	wake_up_poll(&le->wait, EPOLLIN | EPOLLERR);
>
>  	return NOTIFY_OK;
> --
> 2.43.0
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed
  2024-02-20 14:29   ` Kent Gibson
@ 2024-02-20 18:26     ` Herve Codina
  2024-02-21  0:25       ` Kent Gibson
  2024-02-22  0:57     ` Kent Gibson
  1 sibling, 1 reply; 17+ messages in thread
From: Herve Codina @ 2024-02-20 18:26 UTC (permalink / raw)
  To: Kent Gibson
  Cc: Bartosz Golaszewski, Linus Walleij, linux-gpio, linux-kernel,
	Luca Ceresoli, Thomas Petazzoni

Hi Kent,

On Tue, 20 Feb 2024 22:29:59 +0800
Kent Gibson <warthog618@gmail.com> wrote:

> On Tue, Feb 20, 2024 at 12:10:18PM +0100, Herve Codina wrote:
> > When gpio chip device is removed while some related gpio are used by the
> > user-space, the following warning can appear:
> >   remove_proc_entry: removing non-empty directory 'irq/233', leaking at least 'gpiomon'
> >   WARNING: CPU: 2 PID: 72 at fs/proc/generic.c:717 remove_proc_entry+0x190/0x19c
> >   ...
> >   Call trace:
> >     remove_proc_entry+0x190/0x19c
> >     unregister_irq_proc+0xd0/0x104
> >     free_desc+0x4c/0xc4
> >     irq_free_descs+0x6c/0x90
> >     irq_dispose_mapping+0x104/0x14c
> >     gpiochip_irqchip_remove+0xcc/0x1a4
> >     gpiochip_remove+0x48/0x100
> >   ...
> >
> > Indeed, the gpio cdev uses an IRQ but this IRQ is not released when the
> > gpio chip device is removed.
> >
> > Release IRQs used in the device removal notifier functions.
> > Also move one of these function definition in order to avoid a forward
> > declaration (move after the edge_detector_stop() definition).
> >
> > Signed-off-by: Herve Codina <herve.codina@bootlin.com>
> > ---
> >  drivers/gpio/gpiolib-cdev.c | 33 ++++++++++++++++++++++-----------
> >  1 file changed, 22 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/gpio/gpiolib-cdev.c b/drivers/gpio/gpiolib-cdev.c
> > index 2a88736629ef..aec4a4c8490a 100644
> > --- a/drivers/gpio/gpiolib-cdev.c
> > +++ b/drivers/gpio/gpiolib-cdev.c
> > @@ -688,17 +688,6 @@ static void line_set_debounce_period(struct line *line,
> >  	 GPIO_V2_LINE_FLAG_EVENT_CLOCK_HTE | \
> >  	 GPIO_V2_LINE_EDGE_FLAGS)
> >
> > -static int linereq_unregistered_notify(struct notifier_block *nb,
> > -				       unsigned long action, void *data)
> > -{
> > -	struct linereq *lr = container_of(nb, struct linereq,
> > -					  device_unregistered_nb);
> > -
> > -	wake_up_poll(&lr->wait, EPOLLIN | EPOLLERR);
> > -
> > -	return NOTIFY_OK;
> > -}
> > -
> >  static void linereq_put_event(struct linereq *lr,
> >  			      struct gpio_v2_line_event *le)
> >  {
> > @@ -1189,6 +1178,23 @@ static int edge_detector_update(struct line *line,
> >  	return edge_detector_setup(line, lc, line_idx, edflags);
> >  }
> >
> > +static int linereq_unregistered_notify(struct notifier_block *nb,
> > +				       unsigned long action, void *data)
> > +{
> > +	struct linereq *lr = container_of(nb, struct linereq,
> > +					  device_unregistered_nb);
> > +	int i;
> > +
> > +	for (i = 0; i < lr->num_lines; i++) {
> > +		if (lr->lines[i].desc)
> > +			edge_detector_stop(&lr->lines[i]);
> > +	}
> > +  
> 
> Firstly, the re-ordering in the previous patch creates a race,
> as the NULLing of the gdev->chip serves to numb the cdev ioctls, so
> there is now a window between the notifier being called and that numbing,
> during which userspace may call linereq_set_config() and re-request
> the irq.

Well in my previous patch, if gdev->chip need to NULL before the call to 
gcdev_unregister(), this can be done.
I did modification that leads to the following sequence:
--- 8< ---
	...
        gcdev_unregister(gdev);

        gpiochip_free_hogs(gc);
        /* Numb the device, cancelling all outstanding operations */
        gdev->chip = NULL;
        gpiochip_irqchip_remove(gc);
        acpi_gpiochip_remove(gc);
        of_gpiochip_remove(gc);
        gpiochip_remove_pin_ranges(gc);
	...
--- 8< ---

I can call gcdev_unregister() right after gdev->chip = NULL.
The needed things is to have free_irq() (from the gcdev_unregister()) called
before calling gpiochip_irqchip_remove().

And so, why not:
--- 8< ---
	...
        gpiochip_free_hogs(gc);
        /* Numb the device, cancelling all outstanding operations */
        gdev->chip = NULL;
	gcdev_unregister(gdev);
        gpiochip_irqchip_remove(gc);
        acpi_gpiochip_remove(gc);
        of_gpiochip_remove(gc);
        gpiochip_remove_pin_ranges(gc);
	...
--- 8< ---

> 
> There is also a race here with linereq_set_config().  That can be prevented
> by holding the lr->config_mutex - assuming the notifier is not being called
> from atomic context.

I missed that one and indeed, I probably can take the mutex. With the mutex
holded, no more race condition with linereq_set_config() and so the IRQ cannot
be re-requested.

> 
> You also have a race with the line being freed that could pull the
> lr out from under you, so a use after free problem.

I probably missed something but I don't see this use after free.
Can you give me some details/pointers ?


> I'd rather live with the warning :(.
> Fixing that requires rethinking the lifecycle management for the
> linereq/lineevent.

Well, currently the warning is a big one with a dump_stack included.
It will be interesting to have it fixed.

The need to fix it is to have free_irq() called before
gpiochip_irqchip_remove();

Is there really no way to have this correct sequence without rethinking all
the lifecycle management ?

Also, after the warning related to the IRQ, the following one is present:
--- 8< ---
[ 9593.527961] gpio gpiochip9: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED
[ 9593.535602] ------------[ cut here ]------------
[ 9593.540244] WARNING: CPU: 0 PID: 309 at drivers/gpio/gpiolib.c:2352 gpiod_free.part.0+0x20/0x48
...
[ 9593.725016] Call trace:
[ 9593.727468]  gpiod_free.part.0+0x20/0x48
[ 9593.731404]  gpiod_free+0x14/0x24
[ 9593.734728]  lineevent_free+0x40/0x74
[ 9593.738402]  lineevent_release+0x14/0x24
[ 9593.742335]  __fput+0x70/0x2bc
[ 9593.745403]  __fput_sync+0x50/0x5c
[ 9593.748817]  __arm64_sys_close+0x38/0x7c
[ 9593.752751]  invoke_syscall+0x48/0x114
...
[ 9593.815299] ---[ end trace 0000000000000000 ]---
[ 9593.820616] hotplug-manager dock-hotplug-manager: remove overlay 0 (ovcs id 1)
gpiomon: error waiting for events: No such device
# 
--- 8< ---

Best regards,
Hervé

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed
  2024-02-20 18:26     ` Herve Codina
@ 2024-02-21  0:25       ` Kent Gibson
  2024-02-21  0:55         ` Kent Gibson
  0 siblings, 1 reply; 17+ messages in thread
From: Kent Gibson @ 2024-02-21  0:25 UTC (permalink / raw)
  To: Herve Codina
  Cc: Bartosz Golaszewski, Linus Walleij, linux-gpio, linux-kernel,
	Luca Ceresoli, Thomas Petazzoni

On Tue, Feb 20, 2024 at 07:26:57PM +0100, Herve Codina wrote:
> Hi Kent,
>
> On Tue, 20 Feb 2024 22:29:59 +0800
> Kent Gibson <warthog618@gmail.com> wrote:
>

...

>
> I can call gcdev_unregister() right after gdev->chip = NULL.
> The needed things is to have free_irq() (from the gcdev_unregister()) called
> before calling gpiochip_irqchip_remove().
>
> And so, why not:
> --- 8< ---
> 	...
>         gpiochip_free_hogs(gc);
>         /* Numb the device, cancelling all outstanding operations */
>         gdev->chip = NULL;
> 	gcdev_unregister(gdev);
>         gpiochip_irqchip_remove(gc);
>         acpi_gpiochip_remove(gc);
>         of_gpiochip_remove(gc);
>         gpiochip_remove_pin_ranges(gc);
> 	...
> --- 8< ---
>

Exactly - why not.  Though adding some comments to the code as to why
that ordering is required, as per the numbing, would be helpful for
maintainability.

> >
> > There is also a race here with linereq_set_config().  That can be prevented
> > by holding the lr->config_mutex - assuming the notifier is not being called
> > from atomic context.
>
> I missed that one and indeed, I probably can take the mutex. With the mutex
> holded, no more race condition with linereq_set_config() and so the IRQ cannot
> be re-requested.
>
> >
> > You also have a race with the line being freed that could pull the
> > lr out from under you, so a use after free problem.
>
> I probably missed something but I don't see this use after free.
> Can you give me some details/pointers ?
>

What is to prevent userspace releasing the request and freeing the
linereq while you use it?  The use after free is anywhere that is
possible.

AIUI, from the userspace side that is prevented by the file handle not
being closed, and so linereq_release() not being called, while ioctls
are in flight.  But as you are calling in from the kernel side there is
nothing in place to prevent userspace freeing the linereq while you are
using it.

>
> > I'd rather live with the warning :(.
> > Fixing that requires rethinking the lifecycle management for the
> > linereq/lineevent.
>
> Well, currently the warning is a big one with a dump_stack included.
> It will be interesting to have it fixed.
>
> The need to fix it is to have free_irq() called before
> gpiochip_irqchip_remove();
>
> Is there really no way to have this correct sequence without rethinking all
> the lifecycle management ?
>

Not that I am aware of.  You need to protect against the linereq
being freed while you are using it, which is by definition is lifecycle
management.  Though it isn't necessarily __all__ the lifecycle management.

> Also, after the warning related to the IRQ, the following one is present:
> --- 8< ---
> [ 9593.527961] gpio gpiochip9: REMOVING GPIOCHIP WITH GPIOS STILL REQUESTED
> [ 9593.535602] ------------[ cut here ]------------
> [ 9593.540244] WARNING: CPU: 0 PID: 309 at drivers/gpio/gpiolib.c:2352 gpiod_free.part.0+0x20/0x48
> ...
> [ 9593.725016] Call trace:
> [ 9593.727468]  gpiod_free.part.0+0x20/0x48
> [ 9593.731404]  gpiod_free+0x14/0x24
> [ 9593.734728]  lineevent_free+0x40/0x74
> [ 9593.738402]  lineevent_release+0x14/0x24
> [ 9593.742335]  __fput+0x70/0x2bc
> [ 9593.745403]  __fput_sync+0x50/0x5c
> [ 9593.748817]  __arm64_sys_close+0x38/0x7c
> [ 9593.752751]  invoke_syscall+0x48/0x114
> ...
> [ 9593.815299] ---[ end trace 0000000000000000 ]---
> [ 9593.820616] hotplug-manager dock-hotplug-manager: remove overlay 0 (ovcs id 1)
> gpiomon: error waiting for events: No such device
> #
> --- 8< ---
>

My understanding is that we now handle that case - that is what
the gdev->device_notifier chain is for, and gpiolib and gpiolib-cdev now
perform a controlled cleanupi - so that warning is obsolete and should be
removed from gpiochip_remove().

IIRC, previously you would've gotten a panic shortly after that warning.
Now you get gpiomon noticing that the device has been removed.

Btw, where I mention linereq here, the same applies to lineevent and
linehandle - the uAPI v1 equivalents of linereq.

Cheers,
Kent.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed
  2024-02-21  0:25       ` Kent Gibson
@ 2024-02-21  0:55         ` Kent Gibson
  0 siblings, 0 replies; 17+ messages in thread
From: Kent Gibson @ 2024-02-21  0:55 UTC (permalink / raw)
  To: Herve Codina
  Cc: Bartosz Golaszewski, Linus Walleij, linux-gpio, linux-kernel,
	Luca Ceresoli, Thomas Petazzoni

On Wed, Feb 21, 2024 at 08:25:55AM +0800, Kent Gibson wrote:
> On Tue, Feb 20, 2024 at 07:26:57PM +0100, Herve Codina wrote:
> > Hi Kent,
> >
> >
> > I probably missed something but I don't see this use after free.
> > Can you give me some details/pointers ?
> >
>
> What is to prevent userspace releasing the request and freeing the
> linereq while you use it?  The use after free is anywhere that is
> possible.
>

To answer my own question - the notifier call chain itself will prevent
that - linereq_free() will get blocked on the notifier chain semaphore
until the notifier call returns. So there is no use after free problem.

My bad - sorry for the added confusion.

Cheers,
Kent.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed
  2024-02-20 14:29   ` Kent Gibson
  2024-02-20 18:26     ` Herve Codina
@ 2024-02-22  0:57     ` Kent Gibson
  2024-02-22  1:05       ` Kent Gibson
  1 sibling, 1 reply; 17+ messages in thread
From: Kent Gibson @ 2024-02-22  0:57 UTC (permalink / raw)
  To: Herve Codina
  Cc: Bartosz Golaszewski, Linus Walleij, linux-gpio, linux-kernel,
	Luca Ceresoli, Thomas Petazzoni

On Tue, Feb 20, 2024 at 10:29:59PM +0800, Kent Gibson wrote:
> On Tue, Feb 20, 2024 at 12:10:18PM +0100, Herve Codina wrote:

...

> >  }
> >
> > +static int linereq_unregistered_notify(struct notifier_block *nb,
> > +				       unsigned long action, void *data)
> > +{
> > +	struct linereq *lr = container_of(nb, struct linereq,
> > +					  device_unregistered_nb);
> > +	int i;
> > +
> > +	for (i = 0; i < lr->num_lines; i++) {
> > +		if (lr->lines[i].desc)
> > +			edge_detector_stop(&lr->lines[i]);
> > +	}
> > +
>
> Firstly, the re-ordering in the previous patch creates a race,
> as the NULLing of the gdev->chip serves to numb the cdev ioctls, so
> there is now a window between the notifier being called and that numbing,
> during which userspace may call linereq_set_config() and re-request
> the irq.
>
> There is also a race here with linereq_set_config().  That can be prevented
> by holding the lr->config_mutex - assuming the notifier is not being called
> from atomic context.
>

It occurs to me that the fixed reordering in patch 1 would place
the notifier call AFTER the NULLing of the ioctls, so there will no longer
be any chance of a race with linereq_set_config() - so holding the
config_mutex semaphore is not necessary.

In which case this patch is fine - it is only patch 1 that requires
updating.

Cheers,
Kent.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed
  2024-02-22  0:57     ` Kent Gibson
@ 2024-02-22  1:05       ` Kent Gibson
  2024-02-22  8:31         ` Bartosz Golaszewski
  0 siblings, 1 reply; 17+ messages in thread
From: Kent Gibson @ 2024-02-22  1:05 UTC (permalink / raw)
  To: Herve Codina
  Cc: Bartosz Golaszewski, Linus Walleij, linux-gpio, linux-kernel,
	Luca Ceresoli, Thomas Petazzoni

On Thu, Feb 22, 2024 at 08:57:44AM +0800, Kent Gibson wrote:
> On Tue, Feb 20, 2024 at 10:29:59PM +0800, Kent Gibson wrote:
> > On Tue, Feb 20, 2024 at 12:10:18PM +0100, Herve Codina wrote:
>
> ...
>
> > >  }
> > >
> > > +static int linereq_unregistered_notify(struct notifier_block *nb,
> > > +				       unsigned long action, void *data)
> > > +{
> > > +	struct linereq *lr = container_of(nb, struct linereq,
> > > +					  device_unregistered_nb);
> > > +	int i;
> > > +
> > > +	for (i = 0; i < lr->num_lines; i++) {
> > > +		if (lr->lines[i].desc)
> > > +			edge_detector_stop(&lr->lines[i]);
> > > +	}
> > > +
> >
> > Firstly, the re-ordering in the previous patch creates a race,
> > as the NULLing of the gdev->chip serves to numb the cdev ioctls, so
> > there is now a window between the notifier being called and that numbing,
> > during which userspace may call linereq_set_config() and re-request
> > the irq.
> >
> > There is also a race here with linereq_set_config().  That can be prevented
> > by holding the lr->config_mutex - assuming the notifier is not being called
> > from atomic context.
> >
>
> It occurs to me that the fixed reordering in patch 1 would place
> the notifier call AFTER the NULLing of the ioctls, so there will no longer
> be any chance of a race with linereq_set_config() - so holding the
> config_mutex semaphore is not necessary.
>

NULLing -> numbing

The gdev->chip is NULLed, so the ioctls are numbed.
And I need to let the coffee soak in before sending.

> In which case this patch is fine - it is only patch 1 that requires
> updating.
>
> Cheers,
> Kent.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed
  2024-02-22  1:05       ` Kent Gibson
@ 2024-02-22  8:31         ` Bartosz Golaszewski
  2024-02-22 11:36           ` Herve Codina
  0 siblings, 1 reply; 17+ messages in thread
From: Bartosz Golaszewski @ 2024-02-22  8:31 UTC (permalink / raw)
  To: Kent Gibson
  Cc: Herve Codina, Bartosz Golaszewski, Linus Walleij, linux-gpio,
	linux-kernel, Luca Ceresoli, Thomas Petazzoni, Saravana Kannan

On Thu, 22 Feb 2024 02:05:30 +0100, Kent Gibson <warthog618@gmail.com> said:
> On Thu, Feb 22, 2024 at 08:57:44AM +0800, Kent Gibson wrote:
>> On Tue, Feb 20, 2024 at 10:29:59PM +0800, Kent Gibson wrote:
>> > On Tue, Feb 20, 2024 at 12:10:18PM +0100, Herve Codina wrote:
>>
>> ...
>>
>> > >  }
>> > >
>> > > +static int linereq_unregistered_notify(struct notifier_block *nb,
>> > > +				       unsigned long action, void *data)
>> > > +{
>> > > +	struct linereq *lr = container_of(nb, struct linereq,
>> > > +					  device_unregistered_nb);
>> > > +	int i;
>> > > +
>> > > +	for (i = 0; i < lr->num_lines; i++) {
>> > > +		if (lr->lines[i].desc)
>> > > +			edge_detector_stop(&lr->lines[i]);
>> > > +	}
>> > > +
>> >
>> > Firstly, the re-ordering in the previous patch creates a race,
>> > as the NULLing of the gdev->chip serves to numb the cdev ioctls, so
>> > there is now a window between the notifier being called and that numbing,
>> > during which userspace may call linereq_set_config() and re-request
>> > the irq.
>> >
>> > There is also a race here with linereq_set_config().  That can be prevented
>> > by holding the lr->config_mutex - assuming the notifier is not being called
>> > from atomic context.
>> >
>>
>> It occurs to me that the fixed reordering in patch 1 would place
>> the notifier call AFTER the NULLing of the ioctls, so there will no longer
>> be any chance of a race with linereq_set_config() - so holding the
>> config_mutex semaphore is not necessary.
>>
>
> NULLing -> numbing
>
> The gdev->chip is NULLed, so the ioctls are numbed.
> And I need to let the coffee soak in before sending.
>
>> In which case this patch is fine - it is only patch 1 that requires
>> updating.
>>
>> Cheers,
>> Kent.
>

The fix for the user-space issue may be more-or-less correct but the problem is
deeper and this won't fix it for in-kernel users.

Herve: please consider the following DT snippet:

	gpio0 {
		compatible = "foo";

		gpio-controller;
		#gpio-cells = <2>;
		interrupt-controller;
		#interrupt-cells = <1>;
		ngpios = <8>;
	};

	consumer {
		compatible = "bar";

		interrupts-extended = <&gpio0 0>;
	};

If you unbind the "gpio0" device after the consumer requested the interrupt,
you'll get the same splat. And device links will not help you here (on that
note: Saravana: is there anything we could do about it? Have you even
considered making the irqchip subsystem use the driver model in any way? Is it
even feasible?).

I would prefer this to be fixed at a lower lever than the GPIOLIB character
device.

Bartosz

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed
  2024-02-22  8:31         ` Bartosz Golaszewski
@ 2024-02-22 11:36           ` Herve Codina
  2024-02-22 12:21             ` Bartosz Golaszewski
  0 siblings, 1 reply; 17+ messages in thread
From: Herve Codina @ 2024-02-22 11:36 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Kent Gibson, Linus Walleij, linux-gpio, linux-kernel,
	Luca Ceresoli, Thomas Petazzoni, Saravana Kannan

Hi Bartosz,

On Thu, 22 Feb 2024 00:31:08 -0800
Bartosz Golaszewski <brgl@bgdev.pl> wrote:

> On Thu, 22 Feb 2024 02:05:30 +0100, Kent Gibson <warthog618@gmail.com> said:
> > On Thu, Feb 22, 2024 at 08:57:44AM +0800, Kent Gibson wrote:  
> >> On Tue, Feb 20, 2024 at 10:29:59PM +0800, Kent Gibson wrote:  
> >> > On Tue, Feb 20, 2024 at 12:10:18PM +0100, Herve Codina wrote:  
> >>
> >> ...
> >>  
> >> > >  }
> >> > >
> >> > > +static int linereq_unregistered_notify(struct notifier_block *nb,
> >> > > +				       unsigned long action, void *data)
> >> > > +{
> >> > > +	struct linereq *lr = container_of(nb, struct linereq,
> >> > > +					  device_unregistered_nb);
> >> > > +	int i;
> >> > > +
> >> > > +	for (i = 0; i < lr->num_lines; i++) {
> >> > > +		if (lr->lines[i].desc)
> >> > > +			edge_detector_stop(&lr->lines[i]);
> >> > > +	}
> >> > > +  
> >> >
> >> > Firstly, the re-ordering in the previous patch creates a race,
> >> > as the NULLing of the gdev->chip serves to numb the cdev ioctls, so
> >> > there is now a window between the notifier being called and that numbing,
> >> > during which userspace may call linereq_set_config() and re-request
> >> > the irq.
> >> >
> >> > There is also a race here with linereq_set_config().  That can be prevented
> >> > by holding the lr->config_mutex - assuming the notifier is not being called
> >> > from atomic context.
> >> >  
> >>
> >> It occurs to me that the fixed reordering in patch 1 would place
> >> the notifier call AFTER the NULLing of the ioctls, so there will no longer
> >> be any chance of a race with linereq_set_config() - so holding the
> >> config_mutex semaphore is not necessary.
> >>  
> >
> > NULLing -> numbing
> >
> > The gdev->chip is NULLed, so the ioctls are numbed.
> > And I need to let the coffee soak in before sending.
> >  
> >> In which case this patch is fine - it is only patch 1 that requires
> >> updating.
> >>
> >> Cheers,
> >> Kent.  
> >  
> 
> The fix for the user-space issue may be more-or-less correct but the problem is
> deeper and this won't fix it for in-kernel users.
> 
> Herve: please consider the following DT snippet:
> 
> 	gpio0 {
> 		compatible = "foo";
> 
> 		gpio-controller;
> 		#gpio-cells = <2>;
> 		interrupt-controller;
> 		#interrupt-cells = <1>;
> 		ngpios = <8>;
> 	};
> 
> 	consumer {
> 		compatible = "bar";
> 
> 		interrupts-extended = <&gpio0 0>;
> 	};
> 
> If you unbind the "gpio0" device after the consumer requested the interrupt,
> you'll get the same splat. And device links will not help you here (on that
> note: Saravana: is there anything we could do about it? Have you even
> considered making the irqchip subsystem use the driver model in any way? Is it
> even feasible?).
> 
> I would prefer this to be fixed at a lower lever than the GPIOLIB character
> device.

I think this use case is covered.
When the consumer device related to the consumer DT node is added, a
consumer/supplier relationship is created:
parse_interrupts() parses the 'interrups-extended' property
  https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316
and so, of_link_to_phandle() creates the consumer/supplier link.
  https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316

We that link present, if the supplier is removed, the consumer is removed
before.
The consumer should release the interrupt during its remove process (i.e
explicit in its .remove() or explicit because of a devm_*() call).

At least, it is my understanding.

Best regards,
Hervé

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed
  2024-02-22 11:36           ` Herve Codina
@ 2024-02-22 12:21             ` Bartosz Golaszewski
  2024-02-22 23:51               ` Saravana Kannan
  0 siblings, 1 reply; 17+ messages in thread
From: Bartosz Golaszewski @ 2024-02-22 12:21 UTC (permalink / raw)
  To: Herve Codina
  Cc: Kent Gibson, Linus Walleij, linux-gpio, linux-kernel,
	Luca Ceresoli, Thomas Petazzoni, Saravana Kannan

On Thu, Feb 22, 2024 at 12:36 PM Herve Codina <herve.codina@bootlin.com> wrote:
>
> Hi Bartosz,
>
> On Thu, 22 Feb 2024 00:31:08 -0800
> Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>
> > On Thu, 22 Feb 2024 02:05:30 +0100, Kent Gibson <warthog618@gmail.com> said:
> > > On Thu, Feb 22, 2024 at 08:57:44AM +0800, Kent Gibson wrote:
> > >> On Tue, Feb 20, 2024 at 10:29:59PM +0800, Kent Gibson wrote:
> > >> > On Tue, Feb 20, 2024 at 12:10:18PM +0100, Herve Codina wrote:
> > >>
> > >> ...
> > >>
> > >> > >  }
> > >> > >
> > >> > > +static int linereq_unregistered_notify(struct notifier_block *nb,
> > >> > > +                                     unsigned long action, void *data)
> > >> > > +{
> > >> > > +      struct linereq *lr = container_of(nb, struct linereq,
> > >> > > +                                        device_unregistered_nb);
> > >> > > +      int i;
> > >> > > +
> > >> > > +      for (i = 0; i < lr->num_lines; i++) {
> > >> > > +              if (lr->lines[i].desc)
> > >> > > +                      edge_detector_stop(&lr->lines[i]);
> > >> > > +      }
> > >> > > +
> > >> >
> > >> > Firstly, the re-ordering in the previous patch creates a race,
> > >> > as the NULLing of the gdev->chip serves to numb the cdev ioctls, so
> > >> > there is now a window between the notifier being called and that numbing,
> > >> > during which userspace may call linereq_set_config() and re-request
> > >> > the irq.
> > >> >
> > >> > There is also a race here with linereq_set_config().  That can be prevented
> > >> > by holding the lr->config_mutex - assuming the notifier is not being called
> > >> > from atomic context.
> > >> >
> > >>
> > >> It occurs to me that the fixed reordering in patch 1 would place
> > >> the notifier call AFTER the NULLing of the ioctls, so there will no longer
> > >> be any chance of a race with linereq_set_config() - so holding the
> > >> config_mutex semaphore is not necessary.
> > >>
> > >
> > > NULLing -> numbing
> > >
> > > The gdev->chip is NULLed, so the ioctls are numbed.
> > > And I need to let the coffee soak in before sending.
> > >
> > >> In which case this patch is fine - it is only patch 1 that requires
> > >> updating.
> > >>
> > >> Cheers,
> > >> Kent.
> > >
> >
> > The fix for the user-space issue may be more-or-less correct but the problem is
> > deeper and this won't fix it for in-kernel users.
> >
> > Herve: please consider the following DT snippet:
> >
> >       gpio0 {
> >               compatible = "foo";
> >
> >               gpio-controller;
> >               #gpio-cells = <2>;
> >               interrupt-controller;
> >               #interrupt-cells = <1>;
> >               ngpios = <8>;
> >       };
> >
> >       consumer {
> >               compatible = "bar";
> >
> >               interrupts-extended = <&gpio0 0>;
> >       };
> >
> > If you unbind the "gpio0" device after the consumer requested the interrupt,
> > you'll get the same splat. And device links will not help you here (on that
> > note: Saravana: is there anything we could do about it? Have you even
> > considered making the irqchip subsystem use the driver model in any way? Is it
> > even feasible?).
> >
> > I would prefer this to be fixed at a lower lever than the GPIOLIB character
> > device.
>
> I think this use case is covered.
> When the consumer device related to the consumer DT node is added, a
> consumer/supplier relationship is created:
> parse_interrupts() parses the 'interrups-extended' property
>   https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316
> and so, of_link_to_phandle() creates the consumer/supplier link.
>   https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316
>
> We that link present, if the supplier is removed, the consumer is removed
> before.
> The consumer should release the interrupt during its remove process (i.e
> explicit in its .remove() or explicit because of a devm_*() call).
>
> At least, it is my understanding.

Well, then it doesn't work, because I literally just tried it before
sending my previous email.

Please try it yourself, you'll see.

Also: an interrupt controller may not even have a device consuming its
DT node (see IRQCHIP_DECLARE()), what happens then?

Bart

>
> Best regards,
> Hervé

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed
  2024-02-22 12:21             ` Bartosz Golaszewski
@ 2024-02-22 23:51               ` Saravana Kannan
  2024-02-27  8:26                 ` Herve Codina
  2024-02-27 19:27                 ` Bartosz Golaszewski
  0 siblings, 2 replies; 17+ messages in thread
From: Saravana Kannan @ 2024-02-22 23:51 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Herve Codina, Kent Gibson, Linus Walleij, linux-gpio,
	linux-kernel, Luca Ceresoli, Thomas Petazzoni

On Thu, Feb 22, 2024 at 4:21 AM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>
> On Thu, Feb 22, 2024 at 12:36 PM Herve Codina <herve.codina@bootlin.com> wrote:
> >
> > Hi Bartosz,
> >
> > On Thu, 22 Feb 2024 00:31:08 -0800
> > Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> >
> > > On Thu, 22 Feb 2024 02:05:30 +0100, Kent Gibson <warthog618@gmail.com> said:
> > > > On Thu, Feb 22, 2024 at 08:57:44AM +0800, Kent Gibson wrote:
> > > >> On Tue, Feb 20, 2024 at 10:29:59PM +0800, Kent Gibson wrote:
> > > >> > On Tue, Feb 20, 2024 at 12:10:18PM +0100, Herve Codina wrote:
> > > >>
> > > >> ...
> > > >>
> > > >> > >  }
> > > >> > >
> > > >> > > +static int linereq_unregistered_notify(struct notifier_block *nb,
> > > >> > > +                                     unsigned long action, void *data)
> > > >> > > +{
> > > >> > > +      struct linereq *lr = container_of(nb, struct linereq,
> > > >> > > +                                        device_unregistered_nb);
> > > >> > > +      int i;
> > > >> > > +
> > > >> > > +      for (i = 0; i < lr->num_lines; i++) {
> > > >> > > +              if (lr->lines[i].desc)
> > > >> > > +                      edge_detector_stop(&lr->lines[i]);
> > > >> > > +      }
> > > >> > > +
> > > >> >
> > > >> > Firstly, the re-ordering in the previous patch creates a race,
> > > >> > as the NULLing of the gdev->chip serves to numb the cdev ioctls, so
> > > >> > there is now a window between the notifier being called and that numbing,
> > > >> > during which userspace may call linereq_set_config() and re-request
> > > >> > the irq.
> > > >> >
> > > >> > There is also a race here with linereq_set_config().  That can be prevented
> > > >> > by holding the lr->config_mutex - assuming the notifier is not being called
> > > >> > from atomic context.
> > > >> >
> > > >>
> > > >> It occurs to me that the fixed reordering in patch 1 would place
> > > >> the notifier call AFTER the NULLing of the ioctls, so there will no longer
> > > >> be any chance of a race with linereq_set_config() - so holding the
> > > >> config_mutex semaphore is not necessary.
> > > >>
> > > >
> > > > NULLing -> numbing
> > > >
> > > > The gdev->chip is NULLed, so the ioctls are numbed.
> > > > And I need to let the coffee soak in before sending.
> > > >
> > > >> In which case this patch is fine - it is only patch 1 that requires
> > > >> updating.
> > > >>
> > > >> Cheers,
> > > >> Kent.
> > > >
> > >
> > > The fix for the user-space issue may be more-or-less correct but the problem is
> > > deeper and this won't fix it for in-kernel users.
> > >
> > > Herve: please consider the following DT snippet:
> > >
> > >       gpio0 {
> > >               compatible = "foo";
> > >
> > >               gpio-controller;
> > >               #gpio-cells = <2>;
> > >               interrupt-controller;
> > >               #interrupt-cells = <1>;
> > >               ngpios = <8>;
> > >       };
> > >
> > >       consumer {
> > >               compatible = "bar";
> > >
> > >               interrupts-extended = <&gpio0 0>;
> > >       };
> > >
> > > If you unbind the "gpio0" device after the consumer requested the interrupt,
> > > you'll get the same splat. And device links will not help you here (on that
> > > note: Saravana: is there anything we could do about it? Have you even
> > > considered making the irqchip subsystem use the driver model in any way? Is it
> > > even feasible?).

I did add support to irqchip to use the driver model. See
IRQCHIP_PLATFORM_DRIVER_BEGIN() and uses of it.  So this makes sure
the probe ordering is correct.

But when I added that support, there was some pushback on making the
modules removable[1]. But that's why you'll see that the
IRQCHIP_PLATFORM_DRIVER_BEGIN() macro set .suppress_bind_attrs = true.

Do you have a way to unregister an interrupt controller in your
example? If so, how do you unregister it? It shouldn't be too hard to
extend those macros to add removal support. We could add a
IRQCHIP_MATCH2() that also takes in an exit() function op that gets
called on device unbind.

[1] - https://lore.kernel.org/lkml/86sghas7so.wl-maz@kernel.org/#t

> > >
> > > I would prefer this to be fixed at a lower lever than the GPIOLIB character
> > > device.
> >
> > I think this use case is covered.
> > When the consumer device related to the consumer DT node is added, a
> > consumer/supplier relationship is created:
> > parse_interrupts() parses the 'interrups-extended' property
> >   https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316
> > and so, of_link_to_phandle() creates the consumer/supplier link.
> >   https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316
> >
> > We that link present, if the supplier is removed, the consumer is removed
> > before.
> > The consumer should release the interrupt during its remove process (i.e
> > explicit in its .remove() or explicit because of a devm_*() call).
> >
> > At least, it is my understanding.
>
> Well, then it doesn't work, because I literally just tried it before
> sending my previous email.

For your gpio0 device, can you see why __device_release_driver()
doesn't end up calling device_links_unbind_consumers()?

Also, can you look at
/sys/class/devlink/<bus:gpio0-devicename>--<consumer device name>
folders and see what the status file says before you try to unbind the
gpio0 device? It should say "active".

> Please try it yourself, you'll see.
>
> Also: an interrupt controller may not even have a device consuming its
> DT node (see IRQCHIP_DECLARE()), what happens then?

Yeah, we are screwed in those cases. Ideally we are rejecting all
submissions for irqchip drivers that use IRQCHIP_DECLARE().

-Saravana

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed
  2024-02-22 23:51               ` Saravana Kannan
@ 2024-02-27  8:26                 ` Herve Codina
  2024-02-27 19:27                 ` Bartosz Golaszewski
  1 sibling, 0 replies; 17+ messages in thread
From: Herve Codina @ 2024-02-27  8:26 UTC (permalink / raw)
  To: Bartosz Golaszewski
  Cc: Saravana Kannan, Kent Gibson, Linus Walleij, linux-gpio,
	linux-kernel, Luca Ceresoli, Thomas Petazzoni

Hi Bartosz

On Thu, 22 Feb 2024 15:51:15 -0800
Saravana Kannan <saravanak@google.com> wrote:

> On Thu, Feb 22, 2024 at 4:21 AM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> >
...
> > > >
> > > > The fix for the user-space issue may be more-or-less correct but the problem is
> > > > deeper and this won't fix it for in-kernel users.
> > > >
> > > > Herve: please consider the following DT snippet:
> > > >
> > > >       gpio0 {
> > > >               compatible = "foo";
> > > >
> > > >               gpio-controller;
> > > >               #gpio-cells = <2>;
> > > >               interrupt-controller;
> > > >               #interrupt-cells = <1>;
> > > >               ngpios = <8>;
> > > >       };
> > > >
> > > >       consumer {
> > > >               compatible = "bar";
> > > >
> > > >               interrupts-extended = <&gpio0 0>;
> > > >       };
> > > >
> > > > If you unbind the "gpio0" device after the consumer requested the interrupt,
> > > > you'll get the same splat. And device links will not help you here (on that
> > > > note: Saravana: is there anything we could do about it? Have you even
> > > > considered making the irqchip subsystem use the driver model in any way? Is it
> > > > even feasible?).  
> 
> I did add support to irqchip to use the driver model. See
> IRQCHIP_PLATFORM_DRIVER_BEGIN() and uses of it.  So this makes sure
> the probe ordering is correct.
> 
> But when I added that support, there was some pushback on making the
> modules removable[1]. But that's why you'll see that the
> IRQCHIP_PLATFORM_DRIVER_BEGIN() macro set .suppress_bind_attrs = true.
> 
> Do you have a way to unregister an interrupt controller in your
> example? If so, how do you unregister it? It shouldn't be too hard to
> extend those macros to add removal support. We could add a
> IRQCHIP_MATCH2() that also takes in an exit() function op that gets
> called on device unbind.
> 
> [1] - https://lore.kernel.org/lkml/86sghas7so.wl-maz@kernel.org/#t
> 
> > > >
> > > > I would prefer this to be fixed at a lower lever than the GPIOLIB character
> > > > device.  
> > >
> > > I think this use case is covered.
> > > When the consumer device related to the consumer DT node is added, a
> > > consumer/supplier relationship is created:
> > > parse_interrupts() parses the 'interrups-extended' property
> > >   https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316
> > > and so, of_link_to_phandle() creates the consumer/supplier link.
> > >   https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316
> > >
> > > We that link present, if the supplier is removed, the consumer is removed
> > > before.
> > > The consumer should release the interrupt during its remove process (i.e
> > > explicit in its .remove() or explicit because of a devm_*() call).
> > >
> > > At least, it is my understanding.  
> >
> > Well, then it doesn't work, because I literally just tried it before
> > sending my previous email.  
> 
> For your gpio0 device, can you see why __device_release_driver()
> doesn't end up calling device_links_unbind_consumers()?
> 
> Also, can you look at
> /sys/class/devlink/<bus:gpio0-devicename>--<consumer device name>
> folders and see what the status file says before you try to unbind the
> gpio0 device? It should say "active".
> 
> > Please try it yourself, you'll see.
> >
> > Also: an interrupt controller may not even have a device consuming its
> > DT node (see IRQCHIP_DECLARE()), what happens then?  
> 
> Yeah, we are screwed in those cases. Ideally we are rejecting all
> submissions for irqchip drivers that use IRQCHIP_DECLARE().
> 

I have the feeling that this issue related to your gpio0 driver unbind is out of
the scope of this series.

Let move forward with the user-space fix (cdev) related to this series.
I will sent the v2 to cover the cdev case.

Regards,
Hervé

-- 
Hervé Codina, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed
  2024-02-22 23:51               ` Saravana Kannan
  2024-02-27  8:26                 ` Herve Codina
@ 2024-02-27 19:27                 ` Bartosz Golaszewski
  1 sibling, 0 replies; 17+ messages in thread
From: Bartosz Golaszewski @ 2024-02-27 19:27 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Herve Codina, Kent Gibson, Linus Walleij, linux-gpio,
	linux-kernel, Luca Ceresoli, Thomas Petazzoni

On Fri, Feb 23, 2024 at 12:51 AM Saravana Kannan <saravanak@google.com> wrote:
>

[snip]

> > > >
> > > > If you unbind the "gpio0" device after the consumer requested the interrupt,
> > > > you'll get the same splat. And device links will not help you here (on that
> > > > note: Saravana: is there anything we could do about it? Have you even
> > > > considered making the irqchip subsystem use the driver model in any way? Is it
> > > > even feasible?).
>
> I did add support to irqchip to use the driver model. See
> IRQCHIP_PLATFORM_DRIVER_BEGIN() and uses of it.  So this makes sure
> the probe ordering is correct.
>
> But when I added that support, there was some pushback on making the
> modules removable[1]. But that's why you'll see that the
> IRQCHIP_PLATFORM_DRIVER_BEGIN() macro set .suppress_bind_attrs = true.
>
> Do you have a way to unregister an interrupt controller in your
> example? If so, how do you unregister it? It shouldn't be too hard to
> extend those macros to add removal support. We could add a
> IRQCHIP_MATCH2() that also takes in an exit() function op that gets
> called on device unbind.
>

The provider in my example is a GPIO chip that registers its own IRQ
domain. The domain is destroyed when the GPIO controller is
unregistered but interrupts can be requested orthogonally to the GPIO
subsystem and we don't have the knowledge about any interrupt users as
the .to_irq() callback is never called. Let me know if anything can be
done in this case.

I used the gpio-sim testing module for it (instantiated over DT) but
any such GPIO chip that is also an interrupt-controller would behave
the same.

> [1] - https://lore.kernel.org/lkml/86sghas7so.wl-maz@kernel.org/#t
>
> > > >
> > > > I would prefer this to be fixed at a lower lever than the GPIOLIB character
> > > > device.
> > >
> > > I think this use case is covered.
> > > When the consumer device related to the consumer DT node is added, a
> > > consumer/supplier relationship is created:
> > > parse_interrupts() parses the 'interrups-extended' property
> > >   https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316
> > > and so, of_link_to_phandle() creates the consumer/supplier link.
> > >   https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/of/property.c#L1316
> > >
> > > We that link present, if the supplier is removed, the consumer is removed
> > > before.
> > > The consumer should release the interrupt during its remove process (i.e
> > > explicit in its .remove() or explicit because of a devm_*() call).
> > >
> > > At least, it is my understanding.
> >
> > Well, then it doesn't work, because I literally just tried it before
> > sending my previous email.
>
> For your gpio0 device, can you see why __device_release_driver()
> doesn't end up calling device_links_unbind_consumers()?
>

It never gets into the while (device_links_busy(dev)) loop.

> Also, can you look at
> /sys/class/devlink/<bus:gpio0-devicename>--<consumer device name>
> folders and see what the status file says before you try to unbind the
> gpio0 device? It should say "active".
>

It says "available".

If you have some board supported upstream you could use for testing, I
think I could prepare you a DT snippet for testing.

Bart

[snip]

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2024-02-27 19:27 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-20 11:10 [PATCH 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal Herve Codina
2024-02-20 11:10 ` [PATCH 1/2] gpiolib: call gcdev_unregister() sooner in the removal operations Herve Codina
2024-02-20 13:47   ` Bartosz Golaszewski
2024-02-20 11:10 ` [PATCH 2/2] gpiolib: cdev: release IRQs when the gpio chip device is removed Herve Codina
2024-02-20 14:29   ` Kent Gibson
2024-02-20 18:26     ` Herve Codina
2024-02-21  0:25       ` Kent Gibson
2024-02-21  0:55         ` Kent Gibson
2024-02-22  0:57     ` Kent Gibson
2024-02-22  1:05       ` Kent Gibson
2024-02-22  8:31         ` Bartosz Golaszewski
2024-02-22 11:36           ` Herve Codina
2024-02-22 12:21             ` Bartosz Golaszewski
2024-02-22 23:51               ` Saravana Kannan
2024-02-27  8:26                 ` Herve Codina
2024-02-27 19:27                 ` Bartosz Golaszewski
2024-02-20 13:41 ` [PATCH 0/2] gpio-cdev: Release IRQ used by gpio-cdev on gpio chip removal Bartosz Golaszewski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.