All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] Avoid system abort by moving pm domain's detach after devres_release_all
@ 2017-08-15  8:36 Shawn Lin
  2017-08-15  8:36 ` [PATCH v2 1/2] driver core: detach device's pm_domain " Shawn Lin
  2017-08-15  8:36 ` [PATCH v2 2/2] mmc: dw_mmc: fix potential system abort if activating CONFIG_DEBUG_SHIRQ Shawn Lin
  0 siblings, 2 replies; 6+ messages in thread
From: Shawn Lin @ 2017-08-15  8:36 UTC (permalink / raw)
  To: Ulf Hansson, Rafael J. Wysocki, Greg Kroah-Hartman
  Cc: Heiko Stuebner, Jaehoon Chung, linux-mmc, linux-kernel, Shawn Lin


CONFIG_DEBUG_SHIRQ will fire extra irq action to call the registered
irq callback after driver is removed or failed to probe. In general,
the irq callback provided by driver should read its internal registers
to see who fires the irq. So this leads a situation that we access the
registers with a powered-off pm domain that the system abort. This is
a system-wide issue may break lots of drivers, IMHO.

dwmmc driver is one of them suffered from this as it allows the variant
drivers to register shared irq. Please see the commit message of patch 2
for how that happened.

I haven't find a proper way to freeze the genpd_power_off_work_fn
and fire it again when finish devres_release_all indeed. And it
seems to me that device's pm domain detach is always called when
failing to probe or removing the driver. So from a top view, I have to cook
patch 1 for a RFC to see if folks think this's the best way to fix that,
otherwise we may need to fix it everywhere for other drivers.

These patchset was tested by hacking the driver to return a failure for probe
and also by unbinding the driver and all seem to work fine.


Changes in v2:
- include a driver core change to fix the genpd issue.

Shawn Lin (2):
  driver core: detach device's pm_domain after devres_release_all
  mmc: dw_mmc: fix potential system abort if activating
    CONFIG_DEBUG_SHIRQ

 drivers/base/dd.c         | 35 +++++++++++++++++++++++++++++-----
 drivers/base/platform.c   | 18 ++----------------
 drivers/mmc/host/dw_mmc.c | 48 +++++++++++++++++++++++------------------------
 3 files changed, 55 insertions(+), 46 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH v2 1/2] driver core: detach device's pm_domain after devres_release_all
  2017-08-15  8:36 [PATCH v2 0/2] Avoid system abort by moving pm domain's detach after devres_release_all Shawn Lin
@ 2017-08-15  8:36 ` Shawn Lin
  2017-08-29  6:42   ` Greg Kroah-Hartman
  2017-08-15  8:36 ` [PATCH v2 2/2] mmc: dw_mmc: fix potential system abort if activating CONFIG_DEBUG_SHIRQ Shawn Lin
  1 sibling, 1 reply; 6+ messages in thread
From: Shawn Lin @ 2017-08-15  8:36 UTC (permalink / raw)
  To: Ulf Hansson, Rafael J. Wysocki, Greg Kroah-Hartman
  Cc: Heiko Stuebner, Jaehoon Chung, linux-mmc, linux-kernel, Shawn Lin

Move dev_pm_domain_detach after devres_release_all to avoid
accessing device's registers with genpd been powered off.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
---

Changes in v2: None

 drivers/base/dd.c       | 35 ++++++++++++++++++++++++++++++-----
 drivers/base/platform.c | 18 ++----------------
 2 files changed, 32 insertions(+), 21 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index ad44b40..13dc0ad 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -25,7 +25,9 @@
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
+#include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
+#include <linux/pm_domain.h>
 #include <linux/pinctrl/devinfo.h>
 
 #include "base.h"
@@ -356,6 +358,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 	int local_trigger_count = atomic_read(&deferred_trigger_count);
 	bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
 			   !drv->suppress_bind_attrs;
+	struct platform_driver *pdrv;
+	bool do_pm_domain = false;
 
 	if (defer_all_probes) {
 		/*
@@ -414,6 +418,16 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 		if (ret)
 			goto probe_failed;
 	} else if (drv->probe) {
+		ret = dev_pm_domain_attach(dev, true);
+		pdrv = to_platform_driver(dev->driver);
+		/* don't fail if just dev_pm_domain_attach failed */
+		if (pdrv->prevent_deferred_probe &&
+		    ret == -EPROBE_DEFER) {
+			dev_warn(dev, "probe deferral not supported\n");
+			ret = -ENXIO;
+			goto probe_failed;
+		}
+		do_pm_domain = true;
 		ret = drv->probe(dev);
 		if (ret)
 			goto probe_failed;
@@ -421,13 +435,17 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 
 	if (test_remove) {
 		test_remove = false;
+		do_pm_domain = false;
 
-		if (dev->bus->remove)
+		if (dev->bus->remove) {
 			dev->bus->remove(dev);
-		else if (drv->remove)
+		} else if (drv->remove) {
 			drv->remove(dev);
-
+			do_pm_domain = true;
+		}
 		devres_release_all(dev);
+		if (do_pm_domain)
+			dev_pm_domain_detach(dev, true);
 		driver_sysfs_remove(dev);
 		dev->driver = NULL;
 		dev_set_drvdata(dev, NULL);
@@ -458,6 +476,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 pinctrl_bind_failed:
 	device_links_no_driver(dev);
 	devres_release_all(dev);
+	if (do_pm_domain)
+		dev_pm_domain_detach(dev, true);
 	driver_sysfs_remove(dev);
 	dev->driver = NULL;
 	dev_set_drvdata(dev, NULL);
@@ -818,6 +838,7 @@ int driver_attach(struct device_driver *drv)
 static void __device_release_driver(struct device *dev, struct device *parent)
 {
 	struct device_driver *drv;
+	bool do_pm_domain = false;
 
 	drv = dev->driver;
 	if (drv) {
@@ -855,15 +876,19 @@ static void __device_release_driver(struct device *dev, struct device *parent)
 
 		pm_runtime_put_sync(dev);
 
-		if (dev->bus && dev->bus->remove)
+		if (dev->bus && dev->bus->remove) {
 			dev->bus->remove(dev);
-		else if (drv->remove)
+		} else if (drv->remove) {
+			do_pm_domain = true;
 			drv->remove(dev);
+		}
 
 		device_links_driver_cleanup(dev);
 		dma_deconfigure(dev);
 
 		devres_release_all(dev);
+		if (do_pm_domain)
+			dev_pm_domain_detach(dev, true);
 		dev->driver = NULL;
 		dev_set_drvdata(dev, NULL);
 		if (dev->pm_domain && dev->pm_domain->dismiss)
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index d1bd992..8fa688d 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -572,22 +572,8 @@ static int platform_drv_probe(struct device *_dev)
 	if (ret < 0)
 		return ret;
 
-	ret = dev_pm_domain_attach(_dev, true);
-	if (ret != -EPROBE_DEFER) {
-		if (drv->probe) {
-			ret = drv->probe(dev);
-			if (ret)
-				dev_pm_domain_detach(_dev, true);
-		} else {
-			/* don't fail if just dev_pm_domain_attach failed */
-			ret = 0;
-		}
-	}
-
-	if (drv->prevent_deferred_probe && ret == -EPROBE_DEFER) {
-		dev_warn(_dev, "probe deferral not supported\n");
-		ret = -ENXIO;
-	}
+	if (drv->probe)
+		ret = drv->probe(dev);
 
 	return ret;
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH v2 2/2] mmc: dw_mmc: fix potential system abort if activating CONFIG_DEBUG_SHIRQ
  2017-08-15  8:36 [PATCH v2 0/2] Avoid system abort by moving pm domain's detach after devres_release_all Shawn Lin
  2017-08-15  8:36 ` [PATCH v2 1/2] driver core: detach device's pm_domain " Shawn Lin
@ 2017-08-15  8:36 ` Shawn Lin
  1 sibling, 0 replies; 6+ messages in thread
From: Shawn Lin @ 2017-08-15  8:36 UTC (permalink / raw)
  To: Ulf Hansson, Rafael J. Wysocki, Greg Kroah-Hartman
  Cc: Heiko Stuebner, Jaehoon Chung, linux-mmc, linux-kernel, Shawn Lin

With CONFIG_DEBUG_SHIRQ enabled, the irq tear down routine
would still access the irq handler registed as a shard irq.
Per the comment within the function of __free_irq, it says
"It's a shared IRQ -- the driver ought to be prepared for
an IRQ event to happen even now it's being freed". However
when failing to probe the driver, dw_mmc disables the clock
and asserts the reset pin, even power off its genpd for accessing
the registers and the following check for shared irq state would call
the irq handler which accesses the register w/o all necessary resource
prepared. That will hang the system forever.

With adding some dump_stack we could see how that happened.

Synopsys Designware Multimedia Card Interface Driver
dwmmc_rockchip fe320000.dwmmc: IDMAC supports 32-bit address mode.
dwmmc_rockchip fe320000.dwmmc: Using internal DMA controller.
dwmmc_rockchip fe320000.dwmmc: Version ID is 270a
CPU: 0 PID: 1 Comm: swapper/0 Not tainted
4.13.0-rc3-next-20170807-00004-g93d3644-dirty #5
Hardware name: Firefly-RK3399 Board (DT)
Call trace:
[<ffff20000808b5a0>] dump_backtrace+0x0/0x300
[<ffff20000808ba1c>] show_stack+0x14/0x20
[<ffff200008dc480c>] dump_stack+0xa4/0xc8
[<ffff200008b9691c>] dw_mci_interrupt+0x3c/0x6a8
[<ffff200008157450>] __free_irq+0x308/0x410
[<ffff20000815760c>] free_irq+0x54/0xb0
[<ffff20000815d630>] devm_irq_release+0x30/0x40
[<ffff2000087f0174>] release_nodes+0x1e4/0x390
[<ffff2000087f04c0>] devres_release_all+0x50/0x78
[<ffff2000087e9bc0>] driver_probe_device+0x128/0x3b8
[<ffff2000087e9f34>] __driver_attach+0xe4/0xe8
[<ffff2000087e7048>] bus_for_each_dev+0xe0/0x138
[<ffff2000087e93b8>] driver_attach+0x30/0x40
[<ffff2000087e8c00>] bus_add_driver+0x1d0/0x328
[<ffff2000087ead0c>] driver_register+0xb4/0x198
[<ffff2000087ec98c>] __platform_driver_register+0x7c/0x88
[<ffff2000095bc564>] dw_mci_rockchip_pltfm_driver_init+0x18/0x20
[<ffff200008083a8c>] do_one_initcall+0x14c/0x1b8
[<ffff200009560ff8>] kernel_init_freeable+0x238/0x2d8
[<ffff200008dde500>] kernel_init+0x10/0x110
[<ffff2000080836c0>] ret_from_fork+0x10/0x50
Synchronous External Abort: synchronous external abort (0x96000010) at
0xffff20000aaa4040
Internal error: : 96000010 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted
task: ffff80006ba28080 task.stack: ffff80006ba24000
PC is at dw_mci_interrupt+0x4c/0x6a8
LR is at dw_mci_interrupt+0x44/0x6a8
pc : [<ffff200008b9692c>] lr : [<ffff200008b96924>] pstate: 600001c5

...

Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x00200c
Memory Limit: none
---[ end Kernel panic - not syncing: Attempted to kill init!
exitcode=0x0000000b

In order to fix this, we remove all the clock-disabling from
the error handle path and driver's remove function. And replying
on the devm_add_action_or_reset to fire the clock-disabling and reset
signal at the appropriate time.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>

---

Changes in v2:
- include a driver core change to fix the genpd issue.

 drivers/mmc/host/dw_mmc.c | 48 +++++++++++++++++++++++------------------------
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index b9640c7..05b5acf 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -3012,6 +3012,18 @@ static void dw_mci_enable_cd(struct dw_mci *host)
 	}
 }
 
+static void dw_mci_post_cleanup(void *data)
+{
+	struct dw_mci *host = data;
+
+	clk_disable_unprepare(host->ciu_clk);
+	clk_disable_unprepare(host->biu_clk);
+
+	if (!IS_ERR(host->pdata->rstc))
+		reset_control_assert(host->pdata->rstc);
+
+}
+
 int dw_mci_probe(struct dw_mci *host)
 {
 	const struct dw_mci_drv_data *drv_data = host->drv_data;
@@ -3047,7 +3059,7 @@ int dw_mci_probe(struct dw_mci *host)
 		ret = clk_prepare_enable(host->ciu_clk);
 		if (ret) {
 			dev_err(host->dev, "failed to enable ciu clock\n");
-			goto err_clk_biu;
+			return ret;
 		}
 
 		if (host->pdata->bus_hz) {
@@ -3060,11 +3072,16 @@ int dw_mci_probe(struct dw_mci *host)
 		host->bus_hz = clk_get_rate(host->ciu_clk);
 	}
 
+	ret = devm_add_action_or_reset(host->dev, dw_mci_post_cleanup, host);
+	if (ret) {
+		dev_err(host->dev, "unable to add action or reset\n");
+		return ret;
+	}
+
 	if (!host->bus_hz) {
 		dev_err(host->dev,
 			"Platform data must supply bus speed\n");
-		ret = -ENODEV;
-		goto err_clk_ciu;
+		return -ENODEV;
 	}
 
 	if (drv_data && drv_data->init) {
@@ -3072,7 +3089,7 @@ int dw_mci_probe(struct dw_mci *host)
 		if (ret) {
 			dev_err(host->dev,
 				"implementation specific init failed\n");
-			goto err_clk_ciu;
+			return ret;
 		}
 	}
 
@@ -3119,10 +3136,8 @@ int dw_mci_probe(struct dw_mci *host)
 	}
 
 	/* Reset all blocks */
-	if (!dw_mci_ctrl_reset(host, SDMMC_CTRL_ALL_RESET_FLAGS)) {
-		ret = -ENODEV;
-		goto err_clk_ciu;
-	}
+	if (!dw_mci_ctrl_reset(host, SDMMC_CTRL_ALL_RESET_FLAGS))
+		return -ENODEV;
 
 	host->dma_ops = host->pdata->dma_ops;
 	dw_mci_init_dma(host);
@@ -3209,15 +3224,6 @@ int dw_mci_probe(struct dw_mci *host)
 	if (host->use_dma && host->dma_ops->exit)
 		host->dma_ops->exit(host);
 
-	if (!IS_ERR(host->pdata->rstc))
-		reset_control_assert(host->pdata->rstc);
-
-err_clk_ciu:
-	clk_disable_unprepare(host->ciu_clk);
-
-err_clk_biu:
-	clk_disable_unprepare(host->biu_clk);
-
 	return ret;
 }
 EXPORT_SYMBOL(dw_mci_probe);
@@ -3237,17 +3243,9 @@ void dw_mci_remove(struct dw_mci *host)
 
 	if (host->use_dma && host->dma_ops->exit)
 		host->dma_ops->exit(host);
-
-	if (!IS_ERR(host->pdata->rstc))
-		reset_control_assert(host->pdata->rstc);
-
-	clk_disable_unprepare(host->ciu_clk);
-	clk_disable_unprepare(host->biu_clk);
 }
 EXPORT_SYMBOL(dw_mci_remove);
 
-
-
 #ifdef CONFIG_PM
 int dw_mci_runtime_suspend(struct device *dev)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/2] driver core: detach device's pm_domain after devres_release_all
  2017-08-15  8:36 ` [PATCH v2 1/2] driver core: detach device's pm_domain " Shawn Lin
@ 2017-08-29  6:42   ` Greg Kroah-Hartman
  2017-08-29  8:08     ` Shawn Lin
  0 siblings, 1 reply; 6+ messages in thread
From: Greg Kroah-Hartman @ 2017-08-29  6:42 UTC (permalink / raw)
  To: Shawn Lin
  Cc: Ulf Hansson, Rafael J. Wysocki, Heiko Stuebner, Jaehoon Chung,
	linux-mmc, linux-kernel

On Tue, Aug 15, 2017 at 04:36:56PM +0800, Shawn Lin wrote:
> Move dev_pm_domain_detach after devres_release_all to avoid
> accessing device's registers with genpd been powered off.

So, what is this going to break that is working already today?  :)

> 
> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
> ---
> 
> Changes in v2: None
> 
>  drivers/base/dd.c       | 35 ++++++++++++++++++++++++++++++-----
>  drivers/base/platform.c | 18 ++----------------
>  2 files changed, 32 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index ad44b40..13dc0ad 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -25,7 +25,9 @@
>  #include <linux/kthread.h>
>  #include <linux/wait.h>
>  #include <linux/async.h>
> +#include <linux/platform_device.h>
>  #include <linux/pm_runtime.h>
> +#include <linux/pm_domain.h>
>  #include <linux/pinctrl/devinfo.h>
>  
>  #include "base.h"
> @@ -356,6 +358,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>  	int local_trigger_count = atomic_read(&deferred_trigger_count);
>  	bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
>  			   !drv->suppress_bind_attrs;
> +	struct platform_driver *pdrv;
> +	bool do_pm_domain = false;
>  
>  	if (defer_all_probes) {
>  		/*
> @@ -414,6 +418,16 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>  		if (ret)
>  			goto probe_failed;
>  	} else if (drv->probe) {
> +		ret = dev_pm_domain_attach(dev, true);
> +		pdrv = to_platform_driver(dev->driver);
> +		/* don't fail if just dev_pm_domain_attach failed */
> +		if (pdrv->prevent_deferred_probe &&
> +		    ret == -EPROBE_DEFER) {
> +			dev_warn(dev, "probe deferral not supported\n");
> +			ret = -ENXIO;
> +			goto probe_failed;
> +		}
> +		do_pm_domain = true;
>  		ret = drv->probe(dev);
>  		if (ret)
>  			goto probe_failed;
> @@ -421,13 +435,17 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>  
>  	if (test_remove) {
>  		test_remove = false;
> +		do_pm_domain = false;
>  
> -		if (dev->bus->remove)
> +		if (dev->bus->remove) {
>  			dev->bus->remove(dev);
> -		else if (drv->remove)
> +		} else if (drv->remove) {
>  			drv->remove(dev);
> -
> +			do_pm_domain = true;

Why is this set to true if you have a driver remove function, but not if
you only have a bus remove function?  Why the difference?


> +		}
>  		devres_release_all(dev);
> +		if (do_pm_domain)
> +			dev_pm_domain_detach(dev, true);
>  		driver_sysfs_remove(dev);
>  		dev->driver = NULL;
>  		dev_set_drvdata(dev, NULL);
> @@ -458,6 +476,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>  pinctrl_bind_failed:
>  	device_links_no_driver(dev);
>  	devres_release_all(dev);
> +	if (do_pm_domain)
> +		dev_pm_domain_detach(dev, true);

Can't you just always call this on the error path?

>  	driver_sysfs_remove(dev);
>  	dev->driver = NULL;
>  	dev_set_drvdata(dev, NULL);
> @@ -818,6 +838,7 @@ int driver_attach(struct device_driver *drv)
>  static void __device_release_driver(struct device *dev, struct device *parent)
>  {
>  	struct device_driver *drv;
> +	bool do_pm_domain = false;
>  
>  	drv = dev->driver;
>  	if (drv) {
> @@ -855,15 +876,19 @@ static void __device_release_driver(struct device *dev, struct device *parent)
>  
>  		pm_runtime_put_sync(dev);
>  
> -		if (dev->bus && dev->bus->remove)
> +		if (dev->bus && dev->bus->remove) {
>  			dev->bus->remove(dev);
> -		else if (drv->remove)
> +		} else if (drv->remove) {
> +			do_pm_domain = true;

Same question here about drivers and bus default functions.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/2] driver core: detach device's pm_domain after devres_release_all
  2017-08-29  6:42   ` Greg Kroah-Hartman
@ 2017-08-29  8:08     ` Shawn Lin
  2017-08-29  9:03       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 6+ messages in thread
From: Shawn Lin @ 2017-08-29  8:08 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: shawn.lin, Ulf Hansson, Rafael J. Wysocki, Heiko Stuebner,
	Jaehoon Chung, linux-mmc, linux-kernel

Hi Greg,

On 2017/8/29 14:42, Greg Kroah-Hartman wrote:
> On Tue, Aug 15, 2017 at 04:36:56PM +0800, Shawn Lin wrote:
>> Move dev_pm_domain_detach after devres_release_all to avoid
>> accessing device's registers with genpd been powered off.
> 
> So, what is this going to break that is working already today?  :)

Thanks for your comment!

The background of this patch is that:
(1) Some SoCs, including Rockchips' SoCs, couldn't support
accessing controllers' registers w/o clk and power domain enabled.
(2) Many common drivers use devm_request_irq to request irq for either
shared irq or non-shared irq.
(3) So we rely on devres_release_all to free irq automatically.

So the actually race condition is:
(1) Driver A probe failed or calling remove
(2) power domain is detached right now
(3) A irq triggerd cocurrently just before calling devm_irq_release..
(4) Driver A's ISR read its register .. panic..

The issue is exposed by enabing CONFIG_DEBUG_SHIRQ. Thus devres_free_irq
will try to call the ISR as it says: "It's a shared IRQ -- the driver
ought to be prepared for an IRQ event to happen even now it's being
freed". So it calls the driver's ISR w/o power domain enabled, which
hangup the system... This is theoretically help folks to make the code
robust enough to deal with shared case.

But, for no matter whether the irq is shared or non-shared, the race
condition is there. So we possible have two choices that
(1) Either using request_irq and free_irq directly
(2) Or moving dev_pm_domain_detach after devres_release_all which
makes sure we free the irq before powering off power domain.

However doesn't choice(1) imply that devm_request_irq shouldn't
exist? :) So I try to fix it like what this patch does.

> 
>>
>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>> ---

...

> 
> Why is this set to true if you have a driver remove function, but not if
> you only have a bus remove function?  Why the difference?
> 
> 

Sorry, I will fix these all and always call dev_pm_domain_detach on the
error  path.

>> +		}
>>   		devres_release_all(dev);
>> +		if (do_pm_domain)
>> +			dev_pm_domain_detach(dev, true);
>>   		driver_sysfs_remove(dev);
>>   		dev->driver = NULL;
>>   		dev_set_drvdata(dev, NULL);
>> @@ -458,6 +476,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>>   pinctrl_bind_failed:
>>   	device_links_no_driver(dev);
>>   	devres_release_all(dev);
>> +	if (do_pm_domain)
>> +		dev_pm_domain_detach(dev, true);
> 
> Can't you just always call this on the error path?
> 
>>   	driver_sysfs_remove(dev);
>>   	dev->driver = NULL;
>>   	dev_set_drvdata(dev, NULL);
>> @@ -818,6 +838,7 @@ int driver_attach(struct device_driver *drv)
>>   static void __device_release_driver(struct device *dev, struct device *parent)
>>   {
>>   	struct device_driver *drv;
>> +	bool do_pm_domain = false;
>>   
>>   	drv = dev->driver;
>>   	if (drv) {
>> @@ -855,15 +876,19 @@ static void __device_release_driver(struct device *dev, struct device *parent)
>>   
>>   		pm_runtime_put_sync(dev);
>>   
>> -		if (dev->bus && dev->bus->remove)
>> +		if (dev->bus && dev->bus->remove) {
>>   			dev->bus->remove(dev);
>> -		else if (drv->remove)
>> +		} else if (drv->remove) {
>> +			do_pm_domain = true;
> 
> Same question here about drivers and bus default functions.
> 
> thanks,
> 
> greg k-h
> 
> 
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2 1/2] driver core: detach device's pm_domain after devres_release_all
  2017-08-29  8:08     ` Shawn Lin
@ 2017-08-29  9:03       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 6+ messages in thread
From: Greg Kroah-Hartman @ 2017-08-29  9:03 UTC (permalink / raw)
  To: Shawn Lin
  Cc: Ulf Hansson, Rafael J. Wysocki, Heiko Stuebner, Jaehoon Chung,
	linux-mmc, linux-kernel

On Tue, Aug 29, 2017 at 04:08:52PM +0800, Shawn Lin wrote:
> Hi Greg,
> 
> On 2017/8/29 14:42, Greg Kroah-Hartman wrote:
> > On Tue, Aug 15, 2017 at 04:36:56PM +0800, Shawn Lin wrote:
> > > Move dev_pm_domain_detach after devres_release_all to avoid
> > > accessing device's registers with genpd been powered off.
> > 
> > So, what is this going to break that is working already today?  :)
> 
> Thanks for your comment!
> 
> The background of this patch is that:
> (1) Some SoCs, including Rockchips' SoCs, couldn't support
> accessing controllers' registers w/o clk and power domain enabled.
> (2) Many common drivers use devm_request_irq to request irq for either
> shared irq or non-shared irq.
> (3) So we rely on devres_release_all to free irq automatically.
> 
> So the actually race condition is:
> (1) Driver A probe failed or calling remove
> (2) power domain is detached right now
> (3) A irq triggerd cocurrently just before calling devm_irq_release..
> (4) Driver A's ISR read its register .. panic..

If a probe failed, the ISR should never be called, right?  So that
should not be an issue here.

> The issue is exposed by enabing CONFIG_DEBUG_SHIRQ. Thus devres_free_irq
> will try to call the ISR as it says: "It's a shared IRQ -- the driver
> ought to be prepared for an IRQ event to happen even now it's being
> freed". So it calls the driver's ISR w/o power domain enabled, which
> hangup the system... This is theoretically help folks to make the code
> robust enough to deal with shared case.
> 
> But, for no matter whether the irq is shared or non-shared, the race
> condition is there. So we possible have two choices that
> (1) Either using request_irq and free_irq directly
> (2) Or moving dev_pm_domain_detach after devres_release_all which
> makes sure we free the irq before powering off power domain.
> 
> However doesn't choice(1) imply that devm_request_irq shouldn't
> exist? :) So I try to fix it like what this patch does.

Ok, this makes a lot more sense, please put this kind of information in
the patch changelog when you resend it.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-08-29  9:03 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-08-15  8:36 [PATCH v2 0/2] Avoid system abort by moving pm domain's detach after devres_release_all Shawn Lin
2017-08-15  8:36 ` [PATCH v2 1/2] driver core: detach device's pm_domain " Shawn Lin
2017-08-29  6:42   ` Greg Kroah-Hartman
2017-08-29  8:08     ` Shawn Lin
2017-08-29  9:03       ` Greg Kroah-Hartman
2017-08-15  8:36 ` [PATCH v2 2/2] mmc: dw_mmc: fix potential system abort if activating CONFIG_DEBUG_SHIRQ Shawn Lin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.