linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied
@ 2020-03-24 12:20 Andy Shevchenko
  2020-03-24 12:20 ` [PATCH v2 2/3] driver core: Read atomic counter once in driver_probe_done() Andy Shevchenko
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Andy Shevchenko @ 2020-03-24 12:20 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, Rafael J. Wysocki, linux-pm
  Cc: Andy Shevchenko, Artem Bityutskiy, Mark Brown, Felipe Balbi,
	Andrzej Hajda, Peter Ujfalusi, Ferry Toth

Consider the following scenario.

The main driver of USB OTG controller (dwc3-pci), which has the following
functional dependencies on certain platform:
- ULPI (tusb1210)
- extcon (tested with extcon-intel-mrfld)

Note, that first driver, tusb1210, is available at the moment of
dwc3-pci probing, while extcon-intel-mrfld is built as a module and
won't appear till user space does something about it.

This is depicted by kernel configuration excerpt:

	CONFIG_PHY_TUSB1210=y
	CONFIG_USB_DWC3=y
	CONFIG_USB_DWC3_ULPI=y
	CONFIG_USB_DWC3_DUAL_ROLE=y
	CONFIG_USB_DWC3_PCI=y
	CONFIG_EXTCON_INTEL_MRFLD=m

In the Buildroot environment the modules are probed by alphabetical ordering
of their modaliases. The latter comes to the case when USB OTG driver will be
probed first followed by extcon one.

So, if the platform anticipates extcon device to be appeared, in the above case
we will get deferred probe of USB OTG, because of ordering.

Since current implementation, done by the commit 58b116bce136 ("drivercore:
deferral race condition fix") counts the amount of triggered deferred probe,
we never advance the situation -- the change makes it to be an infinite loop.

---8<---8<---

[   22.187127] driver_deferred_probe_trigger <<< 1

...here is the late initcall triggers deferred probe...

[   22.191725] platform dwc3.0.auto: deferred_probe_work_func in deferred list

...dwc3.0.auto is the only device in the deferred list...

[   22.198727] platform dwc3.0.auto: deferred_probe_work_func 1 <<< counter 1

...the counter before mutex is unlocked is kept the same...

[   22.205663] platform dwc3.0.auto: Retrying from deferred list

...mutes has been unlocked, we try to re-probe the driver...

[   22.211487] bus: 'platform': driver_probe_device: matched device dwc3.0.auto with driver dwc3
[   22.220060] bus: 'platform': really_probe: probing driver dwc3 with device dwc3.0.auto
[   22.238735] bus: 'ulpi': driver_probe_device: matched device dwc3.0.auto.ulpi with driver tusb1210
[   22.247743] bus: 'ulpi': really_probe: probing driver tusb1210 with device dwc3.0.auto.ulpi
[   22.256292] driver: 'tusb1210': driver_bound: bound to device 'dwc3.0.auto.ulpi'
[   22.263723] driver_deferred_probe_trigger <<< 2

...the dwc3.0.auto probes ULPI, we got successful bound and bumped counter...

[   22.268304] bus: 'ulpi': really_probe: bound device dwc3.0.auto.ulpi to driver tusb1210
[   22.276697] platform dwc3.0.auto: Driver dwc3 requests probe deferral

...but extcon driver is still missing...

[   22.283174] platform dwc3.0.auto: Added to deferred list
[   22.288513] platform dwc3.0.auto: driver_deferred_probe_add_trigger local counter: 1 new counter 2

...and since we had a successful probe, we got counter mismatch...

[   22.297490] driver_deferred_probe_trigger <<< 3
[   22.302074] platform dwc3.0.auto: deferred_probe_work_func 2 <<< counter 3

...at the end we have a new counter and loop repeats again, see 22.198727...

---8<---8<---

Revert of the commit helps, but it is probably not helpful for the initially
found regression. Artem Bityutskiy suggested to use counter of the successful
probes instead. This fixes above mentioned case and shouldn't prevent driver
to reprobe deferred ones.

Fixes: 58b116bce136 ("drivercore: deferral race condition fix")
Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Cc: Grant Likely grant.likely@arm.com
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Tested-by: Ferry Toth <fntoth@gmail.com>
---
v2: picked up tags, update Grant's email (Peter)
 drivers/base/dd.c | 39 +++++++++++++++++++++------------------
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index b25bcab2a26b..43720beb5300 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -53,7 +53,6 @@
 static DEFINE_MUTEX(deferred_probe_mutex);
 static LIST_HEAD(deferred_probe_pending_list);
 static LIST_HEAD(deferred_probe_active_list);
-static atomic_t deferred_trigger_count = ATOMIC_INIT(0);
 static struct dentry *deferred_devices;
 static bool initcalls_done;
 
@@ -147,17 +146,6 @@ static bool driver_deferred_probe_enable = false;
  * This functions moves all devices from the pending list to the active
  * list and schedules the deferred probe workqueue to process them.  It
  * should be called anytime a driver is successfully bound to a device.
- *
- * Note, there is a race condition in multi-threaded probe. In the case where
- * more than one device is probing at the same time, it is possible for one
- * probe to complete successfully while another is about to defer. If the second
- * depends on the first, then it will get put on the pending list after the
- * trigger event has already occurred and will be stuck there.
- *
- * The atomic 'deferred_trigger_count' is used to determine if a successful
- * trigger has occurred in the midst of probing a driver. If the trigger count
- * changes in the midst of a probe, then deferred processing should be triggered
- * again.
  */
 static void driver_deferred_probe_trigger(void)
 {
@@ -170,7 +158,6 @@ static void driver_deferred_probe_trigger(void)
 	 * into the active list so they can be retried by the workqueue
 	 */
 	mutex_lock(&deferred_probe_mutex);
-	atomic_inc(&deferred_trigger_count);
 	list_splice_tail_init(&deferred_probe_pending_list,
 			      &deferred_probe_active_list);
 	mutex_unlock(&deferred_probe_mutex);
@@ -350,6 +337,19 @@ static void __exit deferred_probe_exit(void)
 }
 __exitcall(deferred_probe_exit);
 
+/*
+ * Note, there is a race condition in multi-threaded probe. In the case where
+ * more than one device is probing at the same time, it is possible for one
+ * probe to complete successfully while another is about to defer. If the second
+ * depends on the first, then it will get put on the pending list after the
+ * trigger event has already occurred and will be stuck there.
+ *
+ * The atomic 'probe_okay' is used to determine if a successful probe has
+ * occurred in the midst of probing another driver. If the count changes in
+ * the midst of a probe, then deferred processing should be triggered again.
+ */
+static atomic_t probe_okay = ATOMIC_INIT(0);
+
 /**
  * device_is_bound() - Check if device is bound to a driver
  * @dev: device to check
@@ -375,6 +375,7 @@ static void driver_bound(struct device *dev)
 	pr_debug("driver: '%s': %s: bound to device '%s'\n", dev->driver->name,
 		 __func__, dev_name(dev));
 
+	atomic_inc(&probe_okay);
 	klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
 	device_links_driver_bound(dev);
 
@@ -481,18 +482,18 @@ static atomic_t probe_count = ATOMIC_INIT(0);
 static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
 
 static void driver_deferred_probe_add_trigger(struct device *dev,
-					      int local_trigger_count)
+					      int local_probe_okay_count)
 {
 	driver_deferred_probe_add(dev);
 	/* Did a trigger occur while probing? Need to re-trigger if yes */
-	if (local_trigger_count != atomic_read(&deferred_trigger_count))
+	if (local_probe_okay_count != atomic_read(&probe_okay))
 		driver_deferred_probe_trigger();
 }
 
 static int really_probe(struct device *dev, struct device_driver *drv)
 {
 	int ret = -EPROBE_DEFER;
-	int local_trigger_count = atomic_read(&deferred_trigger_count);
+	int local_probe_okay_count = atomic_read(&probe_okay);
 	bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
 			   !drv->suppress_bind_attrs;
 
@@ -509,7 +510,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 
 	ret = device_links_check_suppliers(dev);
 	if (ret == -EPROBE_DEFER)
-		driver_deferred_probe_add_trigger(dev, local_trigger_count);
+		driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
 	if (ret)
 		return ret;
 
@@ -619,7 +620,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 	case -EPROBE_DEFER:
 		/* Driver requested deferred probing */
 		dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
-		driver_deferred_probe_add_trigger(dev, local_trigger_count);
+		driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
 		break;
 	case -ENODEV:
 	case -ENXIO:
@@ -1148,6 +1149,8 @@ static void __device_release_driver(struct device *dev, struct device *parent)
 		dev_pm_set_driver_flags(dev, 0);
 
 		klist_remove(&dev->p->knode_driver);
+		atomic_dec(&probe_okay);
+
 		device_pm_check_callbacks(dev);
 		if (dev->bus)
 			blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 2/3] driver core: Read atomic counter once in driver_probe_done()
  2020-03-24 12:20 [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied Andy Shevchenko
@ 2020-03-24 12:20 ` Andy Shevchenko
  2020-03-24 12:55   ` Rafael J. Wysocki
  2020-03-24 12:20 ` [PATCH v2 3/3] driver core: Replace open-coded list_last_entry() Andy Shevchenko
  2020-03-24 12:52 ` [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied Rafael J. Wysocki
  2 siblings, 1 reply; 20+ messages in thread
From: Andy Shevchenko @ 2020-03-24 12:20 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, Rafael J. Wysocki, linux-pm
  Cc: Andy Shevchenko, Ferry Toth

Between printing the debug message and actual check atomic counter can be
altered. For better debugging experience read atomic counter value only once.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Ferry Toth <fntoth@gmail.com>
---
v2: picked up tags
 drivers/base/dd.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 43720beb5300..efd0e4c16ba5 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -669,9 +669,10 @@ static int really_probe_debug(struct device *dev, struct device_driver *drv)
  */
 int driver_probe_done(void)
 {
-	pr_debug("%s: probe_count = %d\n", __func__,
-		 atomic_read(&probe_count));
-	if (atomic_read(&probe_count))
+	int local_probe_count = atomic_read(&probe_count);
+
+	pr_debug("%s: probe_count = %d\n", __func__, local_probe_count);
+	if (local_probe_count)
 		return -EBUSY;
 	return 0;
 }
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()
  2020-03-24 12:20 [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied Andy Shevchenko
  2020-03-24 12:20 ` [PATCH v2 2/3] driver core: Read atomic counter once in driver_probe_done() Andy Shevchenko
@ 2020-03-24 12:20 ` Andy Shevchenko
  2020-03-24 12:53   ` Rafael J. Wysocki
  2020-03-24 12:52 ` [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied Rafael J. Wysocki
  2 siblings, 1 reply; 20+ messages in thread
From: Andy Shevchenko @ 2020-03-24 12:20 UTC (permalink / raw)
  To: Greg Kroah-Hartman, linux-kernel, Rafael J. Wysocki, linux-pm
  Cc: Andy Shevchenko

There is a place in the code where open-coded version of list entry accessors
list_last_entry() is used.

Replace that with the standard macro.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
v2: no change
 drivers/base/dd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index efd0e4c16ba5..27a4d51b5bba 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -1226,7 +1226,7 @@ void driver_detach(struct device_driver *drv)
 			spin_unlock(&drv->p->klist_devices.k_lock);
 			break;
 		}
-		dev_prv = list_entry(drv->p->klist_devices.k_list.prev,
+		dev_prv = list_last_entry(&drv->p->klist_devices.k_list,
 				     struct device_private,
 				     knode_driver.n_node);
 		dev = dev_prv->device;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied
  2020-03-24 12:20 [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied Andy Shevchenko
  2020-03-24 12:20 ` [PATCH v2 2/3] driver core: Read atomic counter once in driver_probe_done() Andy Shevchenko
  2020-03-24 12:20 ` [PATCH v2 3/3] driver core: Replace open-coded list_last_entry() Andy Shevchenko
@ 2020-03-24 12:52 ` Rafael J. Wysocki
  2020-03-24 13:39   ` Andy Shevchenko
  2 siblings, 1 reply; 20+ messages in thread
From: Rafael J. Wysocki @ 2020-03-24 12:52 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, Rafael J. Wysocki,
	Linux PM, Artem Bityutskiy, Mark Brown, Felipe Balbi,
	Andrzej Hajda, Peter Ujfalusi, Ferry Toth

On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> Consider the following scenario.
>
> The main driver of USB OTG controller (dwc3-pci), which has the following
> functional dependencies on certain platform:
> - ULPI (tusb1210)
> - extcon (tested with extcon-intel-mrfld)
>
> Note, that first driver, tusb1210, is available at the moment of
> dwc3-pci probing, while extcon-intel-mrfld is built as a module and
> won't appear till user space does something about it.
>
> This is depicted by kernel configuration excerpt:
>
>         CONFIG_PHY_TUSB1210=y
>         CONFIG_USB_DWC3=y
>         CONFIG_USB_DWC3_ULPI=y
>         CONFIG_USB_DWC3_DUAL_ROLE=y
>         CONFIG_USB_DWC3_PCI=y
>         CONFIG_EXTCON_INTEL_MRFLD=m
>
> In the Buildroot environment the modules are probed by alphabetical ordering
> of their modaliases. The latter comes to the case when USB OTG driver will be
> probed first followed by extcon one.
>
> So, if the platform anticipates extcon device to be appeared, in the above case
> we will get deferred probe of USB OTG, because of ordering.
>
> Since current implementation, done by the commit 58b116bce136 ("drivercore:
> deferral race condition fix") counts the amount of triggered deferred probe,
> we never advance the situation -- the change makes it to be an infinite loop.
>
> ---8<---8<---
>
> [   22.187127] driver_deferred_probe_trigger <<< 1
>
> ...here is the late initcall triggers deferred probe...
>
> [   22.191725] platform dwc3.0.auto: deferred_probe_work_func in deferred list
>
> ...dwc3.0.auto is the only device in the deferred list...
>
> [   22.198727] platform dwc3.0.auto: deferred_probe_work_func 1 <<< counter 1
>
> ...the counter before mutex is unlocked is kept the same...
>
> [   22.205663] platform dwc3.0.auto: Retrying from deferred list
>
> ...mutes has been unlocked, we try to re-probe the driver...
>
> [   22.211487] bus: 'platform': driver_probe_device: matched device dwc3.0.auto with driver dwc3
> [   22.220060] bus: 'platform': really_probe: probing driver dwc3 with device dwc3.0.auto
> [   22.238735] bus: 'ulpi': driver_probe_device: matched device dwc3.0.auto.ulpi with driver tusb1210
> [   22.247743] bus: 'ulpi': really_probe: probing driver tusb1210 with device dwc3.0.auto.ulpi
> [   22.256292] driver: 'tusb1210': driver_bound: bound to device 'dwc3.0.auto.ulpi'
> [   22.263723] driver_deferred_probe_trigger <<< 2
>
> ...the dwc3.0.auto probes ULPI, we got successful bound and bumped counter...
>
> [   22.268304] bus: 'ulpi': really_probe: bound device dwc3.0.auto.ulpi to driver tusb1210
> [   22.276697] platform dwc3.0.auto: Driver dwc3 requests probe deferral
>
> ...but extcon driver is still missing...
>
> [   22.283174] platform dwc3.0.auto: Added to deferred list
> [   22.288513] platform dwc3.0.auto: driver_deferred_probe_add_trigger local counter: 1 new counter 2
>
> ...and since we had a successful probe, we got counter mismatch...
>
> [   22.297490] driver_deferred_probe_trigger <<< 3
> [   22.302074] platform dwc3.0.auto: deferred_probe_work_func 2 <<< counter 3
>
> ...at the end we have a new counter and loop repeats again, see 22.198727...
>
> ---8<---8<---
>
> Revert of the commit helps, but it is probably not helpful for the initially
> found regression. Artem Bityutskiy suggested to use counter of the successful
> probes instead. This fixes above mentioned case and shouldn't prevent driver
> to reprobe deferred ones.
>
> Fixes: 58b116bce136 ("drivercore: deferral race condition fix")
> Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> Cc: Grant Likely grant.likely@arm.com
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Felipe Balbi <balbi@kernel.org>
> Cc: Andrzej Hajda <a.hajda@samsung.com>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Reviewed-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
> Tested-by: Ferry Toth <fntoth@gmail.com>
> ---
> v2: picked up tags, update Grant's email (Peter)
>  drivers/base/dd.c | 39 +++++++++++++++++++++------------------
>  1 file changed, 21 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index b25bcab2a26b..43720beb5300 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -53,7 +53,6 @@
>  static DEFINE_MUTEX(deferred_probe_mutex);
>  static LIST_HEAD(deferred_probe_pending_list);
>  static LIST_HEAD(deferred_probe_active_list);
> -static atomic_t deferred_trigger_count = ATOMIC_INIT(0);
>  static struct dentry *deferred_devices;
>  static bool initcalls_done;
>
> @@ -147,17 +146,6 @@ static bool driver_deferred_probe_enable = false;
>   * This functions moves all devices from the pending list to the active
>   * list and schedules the deferred probe workqueue to process them.  It
>   * should be called anytime a driver is successfully bound to a device.
> - *
> - * Note, there is a race condition in multi-threaded probe. In the case where
> - * more than one device is probing at the same time, it is possible for one
> - * probe to complete successfully while another is about to defer. If the second
> - * depends on the first, then it will get put on the pending list after the
> - * trigger event has already occurred and will be stuck there.
> - *
> - * The atomic 'deferred_trigger_count' is used to determine if a successful
> - * trigger has occurred in the midst of probing a driver. If the trigger count
> - * changes in the midst of a probe, then deferred processing should be triggered
> - * again.
>   */
>  static void driver_deferred_probe_trigger(void)
>  {
> @@ -170,7 +158,6 @@ static void driver_deferred_probe_trigger(void)
>          * into the active list so they can be retried by the workqueue
>          */
>         mutex_lock(&deferred_probe_mutex);
> -       atomic_inc(&deferred_trigger_count);
>         list_splice_tail_init(&deferred_probe_pending_list,
>                               &deferred_probe_active_list);
>         mutex_unlock(&deferred_probe_mutex);
> @@ -350,6 +337,19 @@ static void __exit deferred_probe_exit(void)
>  }
>  __exitcall(deferred_probe_exit);
>
> +/*
> + * Note, there is a race condition in multi-threaded probe. In the case where
> + * more than one device is probing at the same time, it is possible for one
> + * probe to complete successfully while another is about to defer. If the second
> + * depends on the first, then it will get put on the pending list after the
> + * trigger event has already occurred and will be stuck there.
> + *
> + * The atomic 'probe_okay' is used to determine if a successful probe has
> + * occurred in the midst of probing another driver. If the count changes in
> + * the midst of a probe, then deferred processing should be triggered again.
> + */
> +static atomic_t probe_okay = ATOMIC_INIT(0);
> +
>  /**
>   * device_is_bound() - Check if device is bound to a driver
>   * @dev: device to check
> @@ -375,6 +375,7 @@ static void driver_bound(struct device *dev)
>         pr_debug("driver: '%s': %s: bound to device '%s'\n", dev->driver->name,
>                  __func__, dev_name(dev));
>
> +       atomic_inc(&probe_okay);
>         klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
>         device_links_driver_bound(dev);
>
> @@ -481,18 +482,18 @@ static atomic_t probe_count = ATOMIC_INIT(0);
>  static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
>
>  static void driver_deferred_probe_add_trigger(struct device *dev,
> -                                             int local_trigger_count)
> +                                             int local_probe_okay_count)
>  {
>         driver_deferred_probe_add(dev);
>         /* Did a trigger occur while probing? Need to re-trigger if yes */
> -       if (local_trigger_count != atomic_read(&deferred_trigger_count))
> +       if (local_probe_okay_count != atomic_read(&probe_okay))
>                 driver_deferred_probe_trigger();
>  }
>
>  static int really_probe(struct device *dev, struct device_driver *drv)
>  {
>         int ret = -EPROBE_DEFER;
> -       int local_trigger_count = atomic_read(&deferred_trigger_count);
> +       int local_probe_okay_count = atomic_read(&probe_okay);
>         bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
>                            !drv->suppress_bind_attrs;
>
> @@ -509,7 +510,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>
>         ret = device_links_check_suppliers(dev);
>         if (ret == -EPROBE_DEFER)
> -               driver_deferred_probe_add_trigger(dev, local_trigger_count);
> +               driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
>         if (ret)
>                 return ret;
>
> @@ -619,7 +620,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
>         case -EPROBE_DEFER:
>                 /* Driver requested deferred probing */
>                 dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
> -               driver_deferred_probe_add_trigger(dev, local_trigger_count);
> +               driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
>                 break;
>         case -ENODEV:
>         case -ENXIO:
> @@ -1148,6 +1149,8 @@ static void __device_release_driver(struct device *dev, struct device *parent)
>                 dev_pm_set_driver_flags(dev, 0);
>
>                 klist_remove(&dev->p->knode_driver);
> +               atomic_dec(&probe_okay);
> +

Why is this needed?

>                 device_pm_check_callbacks(dev);
>                 if (dev->bus)
>                         blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
> --

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()
  2020-03-24 12:20 ` [PATCH v2 3/3] driver core: Replace open-coded list_last_entry() Andy Shevchenko
@ 2020-03-24 12:53   ` Rafael J. Wysocki
  2020-03-27 17:56     ` Naresh Kamboju
  0 siblings, 1 reply; 20+ messages in thread
From: Rafael J. Wysocki @ 2020-03-24 12:53 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, Rafael J. Wysocki,
	Linux PM

On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> There is a place in the code where open-coded version of list entry accessors
> list_last_entry() is used.
>
> Replace that with the standard macro.
>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
> v2: no change
>  drivers/base/dd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index efd0e4c16ba5..27a4d51b5bba 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -1226,7 +1226,7 @@ void driver_detach(struct device_driver *drv)
>                         spin_unlock(&drv->p->klist_devices.k_lock);
>                         break;
>                 }
> -               dev_prv = list_entry(drv->p->klist_devices.k_list.prev,
> +               dev_prv = list_last_entry(&drv->p->klist_devices.k_list,
>                                      struct device_private,
>                                      knode_driver.n_node);
>                 dev = dev_prv->device;
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 2/3] driver core: Read atomic counter once in driver_probe_done()
  2020-03-24 12:20 ` [PATCH v2 2/3] driver core: Read atomic counter once in driver_probe_done() Andy Shevchenko
@ 2020-03-24 12:55   ` Rafael J. Wysocki
  0 siblings, 0 replies; 20+ messages in thread
From: Rafael J. Wysocki @ 2020-03-24 12:55 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, Rafael J. Wysocki,
	Linux PM, Ferry Toth

On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> Between printing the debug message and actual check atomic counter can be
> altered. For better debugging experience read atomic counter value only once.
>
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Tested-by: Ferry Toth <fntoth@gmail.com>

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
> v2: picked up tags
>  drivers/base/dd.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 43720beb5300..efd0e4c16ba5 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -669,9 +669,10 @@ static int really_probe_debug(struct device *dev, struct device_driver *drv)
>   */
>  int driver_probe_done(void)
>  {
> -       pr_debug("%s: probe_count = %d\n", __func__,
> -                atomic_read(&probe_count));
> -       if (atomic_read(&probe_count))
> +       int local_probe_count = atomic_read(&probe_count);
> +
> +       pr_debug("%s: probe_count = %d\n", __func__, local_probe_count);
> +       if (local_probe_count)
>                 return -EBUSY;
>         return 0;
>  }
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied
  2020-03-24 12:52 ` [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied Rafael J. Wysocki
@ 2020-03-24 13:39   ` Andy Shevchenko
  2020-03-24 15:36     ` Rafael J. Wysocki
  0 siblings, 1 reply; 20+ messages in thread
From: Andy Shevchenko @ 2020-03-24 13:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, Linux PM,
	Artem Bityutskiy, Mark Brown, Felipe Balbi, Andrzej Hajda,
	Peter Ujfalusi, Ferry Toth

On Tue, Mar 24, 2020 at 01:52:00PM +0100, Rafael J. Wysocki wrote:
> On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> >
> > Consider the following scenario.
> >
> > The main driver of USB OTG controller (dwc3-pci), which has the following
> > functional dependencies on certain platform:
> > - ULPI (tusb1210)
> > - extcon (tested with extcon-intel-mrfld)
> >
> > Note, that first driver, tusb1210, is available at the moment of
> > dwc3-pci probing, while extcon-intel-mrfld is built as a module and
> > won't appear till user space does something about it.
> >
> > This is depicted by kernel configuration excerpt:
> >
> >         CONFIG_PHY_TUSB1210=y
> >         CONFIG_USB_DWC3=y
> >         CONFIG_USB_DWC3_ULPI=y
> >         CONFIG_USB_DWC3_DUAL_ROLE=y
> >         CONFIG_USB_DWC3_PCI=y
> >         CONFIG_EXTCON_INTEL_MRFLD=m
> >
> > In the Buildroot environment the modules are probed by alphabetical ordering
> > of their modaliases. The latter comes to the case when USB OTG driver will be
> > probed first followed by extcon one.
> >
> > So, if the platform anticipates extcon device to be appeared, in the above case
> > we will get deferred probe of USB OTG, because of ordering.
> >
> > Since current implementation, done by the commit 58b116bce136 ("drivercore:
> > deferral race condition fix") counts the amount of triggered deferred probe,
> > we never advance the situation -- the change makes it to be an infinite loop.
> >
> > ---8<---8<---
> >
> > [   22.187127] driver_deferred_probe_trigger <<< 1
> >
> > ...here is the late initcall triggers deferred probe...
> >
> > [   22.191725] platform dwc3.0.auto: deferred_probe_work_func in deferred list
> >
> > ...dwc3.0.auto is the only device in the deferred list...
> >
> > [   22.198727] platform dwc3.0.auto: deferred_probe_work_func 1 <<< counter 1
> >
> > ...the counter before mutex is unlocked is kept the same...
> >
> > [   22.205663] platform dwc3.0.auto: Retrying from deferred list
> >
> > ...mutes has been unlocked, we try to re-probe the driver...
> >
> > [   22.211487] bus: 'platform': driver_probe_device: matched device dwc3.0.auto with driver dwc3
> > [   22.220060] bus: 'platform': really_probe: probing driver dwc3 with device dwc3.0.auto
> > [   22.238735] bus: 'ulpi': driver_probe_device: matched device dwc3.0.auto.ulpi with driver tusb1210
> > [   22.247743] bus: 'ulpi': really_probe: probing driver tusb1210 with device dwc3.0.auto.ulpi
> > [   22.256292] driver: 'tusb1210': driver_bound: bound to device 'dwc3.0.auto.ulpi'
> > [   22.263723] driver_deferred_probe_trigger <<< 2
> >
> > ...the dwc3.0.auto probes ULPI, we got successful bound and bumped counter...
> >
> > [   22.268304] bus: 'ulpi': really_probe: bound device dwc3.0.auto.ulpi to driver tusb1210
> > [   22.276697] platform dwc3.0.auto: Driver dwc3 requests probe deferral
> >
> > ...but extcon driver is still missing...
> >
> > [   22.283174] platform dwc3.0.auto: Added to deferred list
> > [   22.288513] platform dwc3.0.auto: driver_deferred_probe_add_trigger local counter: 1 new counter 2
> >
> > ...and since we had a successful probe, we got counter mismatch...
> >
> > [   22.297490] driver_deferred_probe_trigger <<< 3
> > [   22.302074] platform dwc3.0.auto: deferred_probe_work_func 2 <<< counter 3
> >
> > ...at the end we have a new counter and loop repeats again, see 22.198727...
> >
> > ---8<---8<---
> >
> > Revert of the commit helps, but it is probably not helpful for the initially
> > found regression. Artem Bityutskiy suggested to use counter of the successful
> > probes instead. This fixes above mentioned case and shouldn't prevent driver
> > to reprobe deferred ones.
> >
> > Fixes: 58b116bce136 ("drivercore: deferral race condition fix")
> > Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> > Cc: Grant Likely grant.likely@arm.com
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Mark Brown <broonie@kernel.org>
> > Cc: Felipe Balbi <balbi@kernel.org>
> > Cc: Andrzej Hajda <a.hajda@samsung.com>
> > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Reviewed-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
> > Tested-by: Ferry Toth <fntoth@gmail.com>
> > ---
> > v2: picked up tags, update Grant's email (Peter)
> >  drivers/base/dd.c | 39 +++++++++++++++++++++------------------
> >  1 file changed, 21 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index b25bcab2a26b..43720beb5300 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -53,7 +53,6 @@
> >  static DEFINE_MUTEX(deferred_probe_mutex);
> >  static LIST_HEAD(deferred_probe_pending_list);
> >  static LIST_HEAD(deferred_probe_active_list);
> > -static atomic_t deferred_trigger_count = ATOMIC_INIT(0);
> >  static struct dentry *deferred_devices;
> >  static bool initcalls_done;
> >
> > @@ -147,17 +146,6 @@ static bool driver_deferred_probe_enable = false;
> >   * This functions moves all devices from the pending list to the active
> >   * list and schedules the deferred probe workqueue to process them.  It
> >   * should be called anytime a driver is successfully bound to a device.
> > - *
> > - * Note, there is a race condition in multi-threaded probe. In the case where
> > - * more than one device is probing at the same time, it is possible for one
> > - * probe to complete successfully while another is about to defer. If the second
> > - * depends on the first, then it will get put on the pending list after the
> > - * trigger event has already occurred and will be stuck there.
> > - *
> > - * The atomic 'deferred_trigger_count' is used to determine if a successful
> > - * trigger has occurred in the midst of probing a driver. If the trigger count
> > - * changes in the midst of a probe, then deferred processing should be triggered
> > - * again.
> >   */
> >  static void driver_deferred_probe_trigger(void)
> >  {
> > @@ -170,7 +158,6 @@ static void driver_deferred_probe_trigger(void)
> >          * into the active list so they can be retried by the workqueue
> >          */
> >         mutex_lock(&deferred_probe_mutex);
> > -       atomic_inc(&deferred_trigger_count);
> >         list_splice_tail_init(&deferred_probe_pending_list,
> >                               &deferred_probe_active_list);
> >         mutex_unlock(&deferred_probe_mutex);
> > @@ -350,6 +337,19 @@ static void __exit deferred_probe_exit(void)
> >  }
> >  __exitcall(deferred_probe_exit);
> >
> > +/*
> > + * Note, there is a race condition in multi-threaded probe. In the case where
> > + * more than one device is probing at the same time, it is possible for one
> > + * probe to complete successfully while another is about to defer. If the second
> > + * depends on the first, then it will get put on the pending list after the
> > + * trigger event has already occurred and will be stuck there.
> > + *
> > + * The atomic 'probe_okay' is used to determine if a successful probe has
> > + * occurred in the midst of probing another driver. If the count changes in
> > + * the midst of a probe, then deferred processing should be triggered again.
> > + */
> > +static atomic_t probe_okay = ATOMIC_INIT(0);
> > +
> >  /**
> >   * device_is_bound() - Check if device is bound to a driver
> >   * @dev: device to check
> > @@ -375,6 +375,7 @@ static void driver_bound(struct device *dev)
> >         pr_debug("driver: '%s': %s: bound to device '%s'\n", dev->driver->name,
> >                  __func__, dev_name(dev));
> >
> > +       atomic_inc(&probe_okay);
> >         klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
> >         device_links_driver_bound(dev);
> >
> > @@ -481,18 +482,18 @@ static atomic_t probe_count = ATOMIC_INIT(0);
> >  static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
> >
> >  static void driver_deferred_probe_add_trigger(struct device *dev,
> > -                                             int local_trigger_count)
> > +                                             int local_probe_okay_count)
> >  {
> >         driver_deferred_probe_add(dev);
> >         /* Did a trigger occur while probing? Need to re-trigger if yes */
> > -       if (local_trigger_count != atomic_read(&deferred_trigger_count))
> > +       if (local_probe_okay_count != atomic_read(&probe_okay))
> >                 driver_deferred_probe_trigger();
> >  }
> >
> >  static int really_probe(struct device *dev, struct device_driver *drv)
> >  {
> >         int ret = -EPROBE_DEFER;
> > -       int local_trigger_count = atomic_read(&deferred_trigger_count);
> > +       int local_probe_okay_count = atomic_read(&probe_okay);
> >         bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
> >                            !drv->suppress_bind_attrs;
> >
> > @@ -509,7 +510,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> >
> >         ret = device_links_check_suppliers(dev);
> >         if (ret == -EPROBE_DEFER)
> > -               driver_deferred_probe_add_trigger(dev, local_trigger_count);
> > +               driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
> >         if (ret)
> >                 return ret;
> >
> > @@ -619,7 +620,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> >         case -EPROBE_DEFER:
> >                 /* Driver requested deferred probing */
> >                 dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
> > -               driver_deferred_probe_add_trigger(dev, local_trigger_count);
> > +               driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
> >                 break;
> >         case -ENODEV:
> >         case -ENXIO:
> > @@ -1148,6 +1149,8 @@ static void __device_release_driver(struct device *dev, struct device *parent)
> >                 dev_pm_set_driver_flags(dev, 0);
> >
> >                 klist_remove(&dev->p->knode_driver);
> > +               atomic_dec(&probe_okay);
> > +
> 
> Why is this needed?

Under successful probe the following is comprehended. When probe of the driver
happens it may be discarded (as in above case) as it was initiated by another
driver which got deferred.

We also discussed this with Peter [1] during his review.

[1]: https://lkml.org/lkml/2020/3/12/347

> 
> >                 device_pm_check_callbacks(dev);
> >                 if (dev->bus)
> >                         blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
> > --

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied
  2020-03-24 13:39   ` Andy Shevchenko
@ 2020-03-24 15:36     ` Rafael J. Wysocki
  2020-03-24 15:47       ` Andy Shevchenko
  0 siblings, 1 reply; 20+ messages in thread
From: Rafael J. Wysocki @ 2020-03-24 15:36 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Linux Kernel Mailing List,
	Linux PM, Artem Bityutskiy, Mark Brown, Felipe Balbi,
	Andrzej Hajda, Peter Ujfalusi, Ferry Toth

 On Tue, Mar 24, 2020 at 2:39 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Tue, Mar 24, 2020 at 01:52:00PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
> > <andriy.shevchenko@linux.intel.com> wrote:
> > >
> > > Consider the following scenario.
> > >
> > > The main driver of USB OTG controller (dwc3-pci), which has the following
> > > functional dependencies on certain platform:
> > > - ULPI (tusb1210)
> > > - extcon (tested with extcon-intel-mrfld)
> > >
> > > Note, that first driver, tusb1210, is available at the moment of
> > > dwc3-pci probing, while extcon-intel-mrfld is built as a module and
> > > won't appear till user space does something about it.
> > >
> > > This is depicted by kernel configuration excerpt:
> > >
> > >         CONFIG_PHY_TUSB1210=y
> > >         CONFIG_USB_DWC3=y
> > >         CONFIG_USB_DWC3_ULPI=y
> > >         CONFIG_USB_DWC3_DUAL_ROLE=y
> > >         CONFIG_USB_DWC3_PCI=y
> > >         CONFIG_EXTCON_INTEL_MRFLD=m
> > >
> > > In the Buildroot environment the modules are probed by alphabetical ordering
> > > of their modaliases. The latter comes to the case when USB OTG driver will be
> > > probed first followed by extcon one.
> > >
> > > So, if the platform anticipates extcon device to be appeared, in the above case
> > > we will get deferred probe of USB OTG, because of ordering.
> > >
> > > Since current implementation, done by the commit 58b116bce136 ("drivercore:
> > > deferral race condition fix") counts the amount of triggered deferred probe,
> > > we never advance the situation -- the change makes it to be an infinite loop.
> > >
> > > ---8<---8<---
> > >
> > > [   22.187127] driver_deferred_probe_trigger <<< 1
> > >
> > > ...here is the late initcall triggers deferred probe...
> > >
> > > [   22.191725] platform dwc3.0.auto: deferred_probe_work_func in deferred list
> > >
> > > ...dwc3.0.auto is the only device in the deferred list...
> > >
> > > [   22.198727] platform dwc3.0.auto: deferred_probe_work_func 1 <<< counter 1
> > >
> > > ...the counter before mutex is unlocked is kept the same...
> > >
> > > [   22.205663] platform dwc3.0.auto: Retrying from deferred list
> > >
> > > ...mutes has been unlocked, we try to re-probe the driver...
> > >
> > > [   22.211487] bus: 'platform': driver_probe_device: matched device dwc3.0.auto with driver dwc3
> > > [   22.220060] bus: 'platform': really_probe: probing driver dwc3 with device dwc3.0.auto
> > > [   22.238735] bus: 'ulpi': driver_probe_device: matched device dwc3.0.auto.ulpi with driver tusb1210
> > > [   22.247743] bus: 'ulpi': really_probe: probing driver tusb1210 with device dwc3.0.auto.ulpi
> > > [   22.256292] driver: 'tusb1210': driver_bound: bound to device 'dwc3.0.auto.ulpi'
> > > [   22.263723] driver_deferred_probe_trigger <<< 2
> > >
> > > ...the dwc3.0.auto probes ULPI, we got successful bound and bumped counter...
> > >
> > > [   22.268304] bus: 'ulpi': really_probe: bound device dwc3.0.auto.ulpi to driver tusb1210
> > > [   22.276697] platform dwc3.0.auto: Driver dwc3 requests probe deferral
> > >
> > > ...but extcon driver is still missing...
> > >
> > > [   22.283174] platform dwc3.0.auto: Added to deferred list
> > > [   22.288513] platform dwc3.0.auto: driver_deferred_probe_add_trigger local counter: 1 new counter 2
> > >
> > > ...and since we had a successful probe, we got counter mismatch...
> > >
> > > [   22.297490] driver_deferred_probe_trigger <<< 3
> > > [   22.302074] platform dwc3.0.auto: deferred_probe_work_func 2 <<< counter 3
> > >
> > > ...at the end we have a new counter and loop repeats again, see 22.198727...
> > >
> > > ---8<---8<---
> > >
> > > Revert of the commit helps, but it is probably not helpful for the initially
> > > found regression. Artem Bityutskiy suggested to use counter of the successful
> > > probes instead. This fixes above mentioned case and shouldn't prevent driver
> > > to reprobe deferred ones.
> > >
> > > Fixes: 58b116bce136 ("drivercore: deferral race condition fix")
> > > Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
> > > Cc: Grant Likely grant.likely@arm.com
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: Mark Brown <broonie@kernel.org>
> > > Cc: Felipe Balbi <balbi@kernel.org>
> > > Cc: Andrzej Hajda <a.hajda@samsung.com>
> > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > Reviewed-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
> > > Tested-by: Ferry Toth <fntoth@gmail.com>
> > > ---
> > > v2: picked up tags, update Grant's email (Peter)
> > >  drivers/base/dd.c | 39 +++++++++++++++++++++------------------
> > >  1 file changed, 21 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > > index b25bcab2a26b..43720beb5300 100644
> > > --- a/drivers/base/dd.c
> > > +++ b/drivers/base/dd.c
> > > @@ -53,7 +53,6 @@
> > >  static DEFINE_MUTEX(deferred_probe_mutex);
> > >  static LIST_HEAD(deferred_probe_pending_list);
> > >  static LIST_HEAD(deferred_probe_active_list);
> > > -static atomic_t deferred_trigger_count = ATOMIC_INIT(0);
> > >  static struct dentry *deferred_devices;
> > >  static bool initcalls_done;
> > >
> > > @@ -147,17 +146,6 @@ static bool driver_deferred_probe_enable = false;
> > >   * This functions moves all devices from the pending list to the active
> > >   * list and schedules the deferred probe workqueue to process them.  It
> > >   * should be called anytime a driver is successfully bound to a device.
> > > - *
> > > - * Note, there is a race condition in multi-threaded probe. In the case where
> > > - * more than one device is probing at the same time, it is possible for one
> > > - * probe to complete successfully while another is about to defer. If the second
> > > - * depends on the first, then it will get put on the pending list after the
> > > - * trigger event has already occurred and will be stuck there.
> > > - *
> > > - * The atomic 'deferred_trigger_count' is used to determine if a successful
> > > - * trigger has occurred in the midst of probing a driver. If the trigger count
> > > - * changes in the midst of a probe, then deferred processing should be triggered
> > > - * again.
> > >   */
> > >  static void driver_deferred_probe_trigger(void)
> > >  {
> > > @@ -170,7 +158,6 @@ static void driver_deferred_probe_trigger(void)
> > >          * into the active list so they can be retried by the workqueue
> > >          */
> > >         mutex_lock(&deferred_probe_mutex);
> > > -       atomic_inc(&deferred_trigger_count);
> > >         list_splice_tail_init(&deferred_probe_pending_list,
> > >                               &deferred_probe_active_list);
> > >         mutex_unlock(&deferred_probe_mutex);
> > > @@ -350,6 +337,19 @@ static void __exit deferred_probe_exit(void)
> > >  }
> > >  __exitcall(deferred_probe_exit);
> > >
> > > +/*
> > > + * Note, there is a race condition in multi-threaded probe. In the case where
> > > + * more than one device is probing at the same time, it is possible for one
> > > + * probe to complete successfully while another is about to defer. If the second
> > > + * depends on the first, then it will get put on the pending list after the
> > > + * trigger event has already occurred and will be stuck there.
> > > + *
> > > + * The atomic 'probe_okay' is used to determine if a successful probe has
> > > + * occurred in the midst of probing another driver. If the count changes in
> > > + * the midst of a probe, then deferred processing should be triggered again.
> > > + */
> > > +static atomic_t probe_okay = ATOMIC_INIT(0);
> > > +
> > >  /**
> > >   * device_is_bound() - Check if device is bound to a driver
> > >   * @dev: device to check
> > > @@ -375,6 +375,7 @@ static void driver_bound(struct device *dev)
> > >         pr_debug("driver: '%s': %s: bound to device '%s'\n", dev->driver->name,
> > >                  __func__, dev_name(dev));
> > >
> > > +       atomic_inc(&probe_okay);
> > >         klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
> > >         device_links_driver_bound(dev);
> > >
> > > @@ -481,18 +482,18 @@ static atomic_t probe_count = ATOMIC_INIT(0);
> > >  static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
> > >
> > >  static void driver_deferred_probe_add_trigger(struct device *dev,
> > > -                                             int local_trigger_count)
> > > +                                             int local_probe_okay_count)
> > >  {
> > >         driver_deferred_probe_add(dev);
> > >         /* Did a trigger occur while probing? Need to re-trigger if yes */
> > > -       if (local_trigger_count != atomic_read(&deferred_trigger_count))
> > > +       if (local_probe_okay_count != atomic_read(&probe_okay))
> > >                 driver_deferred_probe_trigger();
> > >  }
> > >
> > >  static int really_probe(struct device *dev, struct device_driver *drv)
> > >  {
> > >         int ret = -EPROBE_DEFER;
> > > -       int local_trigger_count = atomic_read(&deferred_trigger_count);
> > > +       int local_probe_okay_count = atomic_read(&probe_okay);
> > >         bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
> > >                            !drv->suppress_bind_attrs;
> > >
> > > @@ -509,7 +510,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > >
> > >         ret = device_links_check_suppliers(dev);
> > >         if (ret == -EPROBE_DEFER)
> > > -               driver_deferred_probe_add_trigger(dev, local_trigger_count);
> > > +               driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
> > >         if (ret)
> > >                 return ret;
> > >
> > > @@ -619,7 +620,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
> > >         case -EPROBE_DEFER:
> > >                 /* Driver requested deferred probing */
> > >                 dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
> > > -               driver_deferred_probe_add_trigger(dev, local_trigger_count);
> > > +               driver_deferred_probe_add_trigger(dev, local_probe_okay_count);
> > >                 break;
> > >         case -ENODEV:
> > >         case -ENXIO:
> > > @@ -1148,6 +1149,8 @@ static void __device_release_driver(struct device *dev, struct device *parent)
> > >                 dev_pm_set_driver_flags(dev, 0);
> > >
> > >                 klist_remove(&dev->p->knode_driver);
> > > +               atomic_dec(&probe_okay);
> > > +
> >
> > Why is this needed?
>
> Under successful probe the following is comprehended. When probe of the driver
> happens it may be discarded (as in above case) as it was initiated by another
> driver which got deferred.
>
> We also discussed this with Peter [1] during his review.
>
> [1]: https://lkml.org/lkml/2020/3/12/347

OK, but I would add a comment explaining that to the code.

Also it would be good to explain why probe_okay cannot go below zero
here in the changelog.

Cheers!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied
  2020-03-24 15:36     ` Rafael J. Wysocki
@ 2020-03-24 15:47       ` Andy Shevchenko
  0 siblings, 0 replies; 20+ messages in thread
From: Andy Shevchenko @ 2020-03-24 15:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, Linux PM,
	Artem Bityutskiy, Mark Brown, Felipe Balbi, Andrzej Hajda,
	Peter Ujfalusi, Ferry Toth

On Tue, Mar 24, 2020 at 04:36:56PM +0100, Rafael J. Wysocki wrote:
>  On Tue, Mar 24, 2020 at 2:39 PM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> > On Tue, Mar 24, 2020 at 01:52:00PM +0100, Rafael J. Wysocki wrote:
> > > On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
> > > <andriy.shevchenko@linux.intel.com> wrote:

...

> > > > +               atomic_dec(&probe_okay);
> > >
> > > Why is this needed?
> >
> > Under successful probe the following is comprehended. When probe of the driver
> > happens it may be discarded (as in above case) as it was initiated by another
> > driver which got deferred.
> >
> > We also discussed this with Peter [1] during his review.
> >
> > [1]: https://lkml.org/lkml/2020/3/12/347
> 
> OK, but I would add a comment explaining that to the code.
> 
> Also it would be good to explain why probe_okay cannot go below zero
> here in the changelog.

Will do, thank you for review!

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()
  2020-03-24 12:53   ` Rafael J. Wysocki
@ 2020-03-27 17:56     ` Naresh Kamboju
  2020-03-27 19:40       ` Robin Murphy
  2020-03-28  7:25       ` Greg Kroah-Hartman
  0 siblings, 2 replies; 20+ messages in thread
From: Naresh Kamboju @ 2020-03-27 17:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Andy Shevchenko, Greg Kroah-Hartman, Linux Kernel Mailing List,
	Linux PM, Basil Eljuse, lkft-triage, Linux-Next Mailing List,
	fntoth, Arnd Bergmann, Anders Roxell

The kernel warning noticed on arm64 juno-r2 device running linux
next-20200326 and next-20200327

[   36.077086] ------------[ cut here ]------------
[   36.081752] amba 20010000.etf: deferred probe timeout, ignoring dependency
[   36.081859] WARNING: CPU: 1 PID: 42 at drivers/base/dd.c:270
driver_deferred_probe_check_state+0x54/0x80
[   36.098242] Modules linked in: fuse
[   36.101753] CPU: 1 PID: 42 Comm: kworker/1:1 Not tainted
5.6.0-rc7-next-20200327 #1
[   36.109427] Hardware name: ARM Juno development board (r2) (DT)
[   36.115372] Workqueue: events amba_deferred_retry_func
[   36.120526] pstate: 60000005 (nZCv daif -PAN -UAO)
[   36.125334] pc : driver_deferred_probe_check_state+0x54/0x80
[   36.131010] lr : driver_deferred_probe_check_state+0x54/0x80
[   36.136680] sp : ffff000934e0fae0
[   36.140001] x29: ffff000934e0fae0 x28: ffff000934db5608
[   36.145337] x27: ffffa00013c63240 x26: ffff000934f2a800
[   36.150668] x25: 0000000000000001 x24: fffffffffffffffe
[   36.155996] x23: ffff000934c6ab80 x22: ffffa00011b39ea0
[   36.161322] x21: ffff000934f2a800 x20: ffffa00011905fe0
[   36.166649] x19: ffff000934f2a800 x18: 0000000000000000
[   36.171974] x17: 0000000000000000 x16: 0000000000000000
[   36.177299] x15: 0000000000000000 x14: 003d090000000000
[   36.182625] x13: 00003d0900000000 x12: ffff9400027ef445
[   36.187952] x11: 1ffff400027ef444 x10: ffff9400027ef444
[   36.193278] x9 : dfffa00000000000 x8 : 0000000000000000
[   36.198603] x7 : 0000000000000001 x6 : ffffa00013f7a220
[   36.203929] x5 : 0000000000000004 x4 : dfffa00000000000
[   36.209255] x3 : ffffa000101a74ec x2 : ffff8001269c1f26
[   36.214581] x1 : da1107b7b6a8fb00 x0 : 0000000000000000
[   36.219906] Call trace:
[   36.222369]  driver_deferred_probe_check_state+0x54/0x80
[   36.227698]  __genpd_dev_pm_attach+0x264/0x2a0
[   36.232154]  genpd_dev_pm_attach+0x68/0x78
[   36.236265]  dev_pm_domain_attach+0x6c/0x70
[   36.240463]  amba_device_try_add+0xec/0x3f8
[   36.244659]  amba_deferred_retry_func+0x84/0x158
[   36.249301]  process_one_work+0x3f0/0x660
[   36.253326]  worker_thread+0x74/0x698
[   36.256997]  kthread+0x218/0x220
[   36.260236]  ret_from_fork+0x10/0x1c
[   36.263819] ---[ end trace c637c10e549bd716 ]---#

Full test log,
https://lkft.validation.linaro.org/scheduler/job/1317079#L981

On Tue, 24 Mar 2020 at 18:24, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> >
> > There is a place in the code where open-coded version of list entry accessors
> > list_last_entry() is used.
> >
> > Replace that with the standard macro.
> >
> > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>
> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> > ---
> > v2: no change
> >  drivers/base/dd.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index efd0e4c16ba5..27a4d51b5bba 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -1226,7 +1226,7 @@ void driver_detach(struct device_driver *drv)
> >                         spin_unlock(&drv->p->klist_devices.k_lock);
> >                         break;
> >                 }
> > -               dev_prv = list_entry(drv->p->klist_devices.k_list.prev,
> > +               dev_prv = list_last_entry(&drv->p->klist_devices.k_list,
> >                                      struct device_private,
> >                                      knode_driver.n_node);
> >                 dev = dev_prv->device;

metadata:
  git branch: master
  git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
  git describe: next-20200327
  kernel-config:
https://builds.tuxbuild.com/nqmmxorUbC1qTWp42iEKjQ/kernel.config


-- 
Linaro LKFT
https://lkft.linaro.org

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()
  2020-03-27 17:56     ` Naresh Kamboju
@ 2020-03-27 19:40       ` Robin Murphy
  2020-03-30 10:13         ` Sudeep Holla
  2020-03-28  7:25       ` Greg Kroah-Hartman
  1 sibling, 1 reply; 20+ messages in thread
From: Robin Murphy @ 2020-03-27 19:40 UTC (permalink / raw)
  To: Naresh Kamboju, Rafael J. Wysocki
  Cc: Andy Shevchenko, Greg Kroah-Hartman, Linux Kernel Mailing List,
	Linux PM, Basil Eljuse, lkft-triage, Linux-Next Mailing List,
	fntoth, Arnd Bergmann, Anders Roxell

On 2020-03-27 5:56 pm, Naresh Kamboju wrote:
> The kernel warning noticed on arm64 juno-r2 device running linux
> next-20200326 and next-20200327

I suspect this is the correct expected behaviour manifesting. If you're 
using the upstream juno-r2.dts, the power domain being waited for here 
is provided by SCPI, however unless you're using an SCP firmware from at 
least 3 years ago you won't actually have SCPI since they switched it to 
the newer SCMI protocol, which is not yet supported upstream for Juno. 
See what happened earlier in the log:

[    2.741206] scpi_protocol scpi: incorrect or no SCP firmware found
[    2.747586] scpi_protocol: probe of scpi failed with error -110

Thus this is the "waiting for a dependency which will never appear" 
case, for which I assume the warning is intentional, since the system is 
essentially broken (i.e. the hardware/firmware doesn't actually match 
what the DT describes).

Robin.

> [   36.077086] ------------[ cut here ]------------
> [   36.081752] amba 20010000.etf: deferred probe timeout, ignoring dependency
> [   36.081859] WARNING: CPU: 1 PID: 42 at drivers/base/dd.c:270
> driver_deferred_probe_check_state+0x54/0x80
> [   36.098242] Modules linked in: fuse
> [   36.101753] CPU: 1 PID: 42 Comm: kworker/1:1 Not tainted
> 5.6.0-rc7-next-20200327 #1
> [   36.109427] Hardware name: ARM Juno development board (r2) (DT)
> [   36.115372] Workqueue: events amba_deferred_retry_func
> [   36.120526] pstate: 60000005 (nZCv daif -PAN -UAO)
> [   36.125334] pc : driver_deferred_probe_check_state+0x54/0x80
> [   36.131010] lr : driver_deferred_probe_check_state+0x54/0x80
> [   36.136680] sp : ffff000934e0fae0
> [   36.140001] x29: ffff000934e0fae0 x28: ffff000934db5608
> [   36.145337] x27: ffffa00013c63240 x26: ffff000934f2a800
> [   36.150668] x25: 0000000000000001 x24: fffffffffffffffe
> [   36.155996] x23: ffff000934c6ab80 x22: ffffa00011b39ea0
> [   36.161322] x21: ffff000934f2a800 x20: ffffa00011905fe0
> [   36.166649] x19: ffff000934f2a800 x18: 0000000000000000
> [   36.171974] x17: 0000000000000000 x16: 0000000000000000
> [   36.177299] x15: 0000000000000000 x14: 003d090000000000
> [   36.182625] x13: 00003d0900000000 x12: ffff9400027ef445
> [   36.187952] x11: 1ffff400027ef444 x10: ffff9400027ef444
> [   36.193278] x9 : dfffa00000000000 x8 : 0000000000000000
> [   36.198603] x7 : 0000000000000001 x6 : ffffa00013f7a220
> [   36.203929] x5 : 0000000000000004 x4 : dfffa00000000000
> [   36.209255] x3 : ffffa000101a74ec x2 : ffff8001269c1f26
> [   36.214581] x1 : da1107b7b6a8fb00 x0 : 0000000000000000
> [   36.219906] Call trace:
> [   36.222369]  driver_deferred_probe_check_state+0x54/0x80
> [   36.227698]  __genpd_dev_pm_attach+0x264/0x2a0
> [   36.232154]  genpd_dev_pm_attach+0x68/0x78
> [   36.236265]  dev_pm_domain_attach+0x6c/0x70
> [   36.240463]  amba_device_try_add+0xec/0x3f8
> [   36.244659]  amba_deferred_retry_func+0x84/0x158
> [   36.249301]  process_one_work+0x3f0/0x660
> [   36.253326]  worker_thread+0x74/0x698
> [   36.256997]  kthread+0x218/0x220
> [   36.260236]  ret_from_fork+0x10/0x1c
> [   36.263819] ---[ end trace c637c10e549bd716 ]---#
> 
> Full test log,
> https://lkft.validation.linaro.org/scheduler/job/1317079#L981
> 
> On Tue, 24 Mar 2020 at 18:24, Rafael J. Wysocki <rafael@kernel.org> wrote:
>>
>> On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
>> <andriy.shevchenko@linux.intel.com> wrote:
>>>
>>> There is a place in the code where open-coded version of list entry accessors
>>> list_last_entry() is used.
>>>
>>> Replace that with the standard macro.
>>>
>>> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>>
>> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>>> ---
>>> v2: no change
>>>   drivers/base/dd.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
>>> index efd0e4c16ba5..27a4d51b5bba 100644
>>> --- a/drivers/base/dd.c
>>> +++ b/drivers/base/dd.c
>>> @@ -1226,7 +1226,7 @@ void driver_detach(struct device_driver *drv)
>>>                          spin_unlock(&drv->p->klist_devices.k_lock);
>>>                          break;
>>>                  }
>>> -               dev_prv = list_entry(drv->p->klist_devices.k_list.prev,
>>> +               dev_prv = list_last_entry(&drv->p->klist_devices.k_list,
>>>                                       struct device_private,
>>>                                       knode_driver.n_node);
>>>                  dev = dev_prv->device;
> 
> metadata:
>    git branch: master
>    git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>    git describe: next-20200327
>    kernel-config:
> https://builds.tuxbuild.com/nqmmxorUbC1qTWp42iEKjQ/kernel.config
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()
  2020-03-27 17:56     ` Naresh Kamboju
  2020-03-27 19:40       ` Robin Murphy
@ 2020-03-28  7:25       ` Greg Kroah-Hartman
  1 sibling, 0 replies; 20+ messages in thread
From: Greg Kroah-Hartman @ 2020-03-28  7:25 UTC (permalink / raw)
  To: Naresh Kamboju
  Cc: Rafael J. Wysocki, Andy Shevchenko, Linux Kernel Mailing List,
	Linux PM, Basil Eljuse, lkft-triage, Linux-Next Mailing List,
	fntoth, Arnd Bergmann, Anders Roxell

On Fri, Mar 27, 2020 at 11:26:13PM +0530, Naresh Kamboju wrote:
> The kernel warning noticed on arm64 juno-r2 device running linux
> next-20200326 and next-20200327
> 
> [   36.077086] ------------[ cut here ]------------
> [   36.081752] amba 20010000.etf: deferred probe timeout, ignoring dependency
> [   36.081859] WARNING: CPU: 1 PID: 42 at drivers/base/dd.c:270
> driver_deferred_probe_check_state+0x54/0x80
> [   36.098242] Modules linked in: fuse
> [   36.101753] CPU: 1 PID: 42 Comm: kworker/1:1 Not tainted
> 5.6.0-rc7-next-20200327 #1
> [   36.109427] Hardware name: ARM Juno development board (r2) (DT)
> [   36.115372] Workqueue: events amba_deferred_retry_func
> [   36.120526] pstate: 60000005 (nZCv daif -PAN -UAO)
> [   36.125334] pc : driver_deferred_probe_check_state+0x54/0x80
> [   36.131010] lr : driver_deferred_probe_check_state+0x54/0x80
> [   36.136680] sp : ffff000934e0fae0
> [   36.140001] x29: ffff000934e0fae0 x28: ffff000934db5608
> [   36.145337] x27: ffffa00013c63240 x26: ffff000934f2a800
> [   36.150668] x25: 0000000000000001 x24: fffffffffffffffe
> [   36.155996] x23: ffff000934c6ab80 x22: ffffa00011b39ea0
> [   36.161322] x21: ffff000934f2a800 x20: ffffa00011905fe0
> [   36.166649] x19: ffff000934f2a800 x18: 0000000000000000
> [   36.171974] x17: 0000000000000000 x16: 0000000000000000
> [   36.177299] x15: 0000000000000000 x14: 003d090000000000
> [   36.182625] x13: 00003d0900000000 x12: ffff9400027ef445
> [   36.187952] x11: 1ffff400027ef444 x10: ffff9400027ef444
> [   36.193278] x9 : dfffa00000000000 x8 : 0000000000000000
> [   36.198603] x7 : 0000000000000001 x6 : ffffa00013f7a220
> [   36.203929] x5 : 0000000000000004 x4 : dfffa00000000000
> [   36.209255] x3 : ffffa000101a74ec x2 : ffff8001269c1f26
> [   36.214581] x1 : da1107b7b6a8fb00 x0 : 0000000000000000
> [   36.219906] Call trace:
> [   36.222369]  driver_deferred_probe_check_state+0x54/0x80
> [   36.227698]  __genpd_dev_pm_attach+0x264/0x2a0
> [   36.232154]  genpd_dev_pm_attach+0x68/0x78
> [   36.236265]  dev_pm_domain_attach+0x6c/0x70
> [   36.240463]  amba_device_try_add+0xec/0x3f8
> [   36.244659]  amba_deferred_retry_func+0x84/0x158
> [   36.249301]  process_one_work+0x3f0/0x660
> [   36.253326]  worker_thread+0x74/0x698
> [   36.256997]  kthread+0x218/0x220
> [   36.260236]  ret_from_fork+0x10/0x1c
> [   36.263819] ---[ end trace c637c10e549bd716 ]---#
> 
> Full test log,
> https://lkft.validation.linaro.org/scheduler/job/1317079#L981
> 
> On Tue, 24 Mar 2020 at 18:24, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Tue, Mar 24, 2020 at 1:20 PM Andy Shevchenko
> > <andriy.shevchenko@linux.intel.com> wrote:
> > >
> > > There is a place in the code where open-coded version of list entry accessors
> > > list_last_entry() is used.
> > >
> > > Replace that with the standard macro.
> > >
> > > Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> >
> > Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > > ---
> > > v2: no change
> > >  drivers/base/dd.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > > index efd0e4c16ba5..27a4d51b5bba 100644
> > > --- a/drivers/base/dd.c
> > > +++ b/drivers/base/dd.c
> > > @@ -1226,7 +1226,7 @@ void driver_detach(struct device_driver *drv)
> > >                         spin_unlock(&drv->p->klist_devices.k_lock);
> > >                         break;
> > >                 }
> > > -               dev_prv = list_entry(drv->p->klist_devices.k_list.prev,
> > > +               dev_prv = list_last_entry(&drv->p->klist_devices.k_list,
> > >                                      struct device_private,
> > >                                      knode_driver.n_node);
> > >                 dev = dev_prv->device;
> 
> metadata:
>   git branch: master
>   git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   git describe: next-20200327
>   kernel-config:
> https://builds.tuxbuild.com/nqmmxorUbC1qTWp42iEKjQ/kernel.config
> 

And you bisected the warning to this patch?  If you revert it, does it
go away?

confused,

greg k-h

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()
  2020-03-27 19:40       ` Robin Murphy
@ 2020-03-30 10:13         ` Sudeep Holla
  2020-03-30 10:43           ` Andy Shevchenko
  2020-03-30 12:45           ` Robin Murphy
  0 siblings, 2 replies; 20+ messages in thread
From: Sudeep Holla @ 2020-03-30 10:13 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Naresh Kamboju, Rafael J. Wysocki, Robin Murphy, Andy Shevchenko,
	Greg Kroah-Hartman, Linux Kernel Mailing List, Linux PM,
	Basil Eljuse, lkft-triage, Linux-Next Mailing List, fntoth,
	Arnd Bergmann, Sudeep Holla, Anders Roxell

On Fri, Mar 27, 2020 at 07:40:25PM +0000, Robin Murphy wrote:
> On 2020-03-27 5:56 pm, Naresh Kamboju wrote:
> > The kernel warning noticed on arm64 juno-r2 device running linux
> > next-20200326 and next-20200327
>
> I suspect this is the correct expected behaviour manifesting. If you're
> using the upstream juno-r2.dts, the power domain being waited for here is
> provided by SCPI, however unless you're using an SCP firmware from at least
> 3 years ago you won't actually have SCPI since they switched it to the newer
> SCMI protocol, which is not yet supported upstream for Juno. See what
> happened earlier in the log:
>
> [    2.741206] scpi_protocol scpi: incorrect or no SCP firmware found
> [    2.747586] scpi_protocol: probe of scpi failed with error -110
>
> Thus this is the "waiting for a dependency which will never appear" case,
> for which I assume the warning is intentional,

Is that the case ?

Previously we used to get the warning:
"amba xx: ignoring dependency for device, assuming no driver"

Now we have the kernel warning in addition to the above.

> since the system is essentially broken (i.e. the hardware/firmware doesn't
> actually match what the DT describes).
>

Not sure if we can term it as "essentially broken". Definitely not 100%
functional but not broken if the situation like on Juno where SCP firmware
is fundamental for all OSPM but not essential for boot and other minimum
set of functionality.

Either way I am not against the warning, just wanted to get certain things
clarified myself.

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()
  2020-03-30 10:13         ` Sudeep Holla
@ 2020-03-30 10:43           ` Andy Shevchenko
  2020-03-30 13:16             ` Sudeep Holla
  2020-03-30 12:45           ` Robin Murphy
  1 sibling, 1 reply; 20+ messages in thread
From: Andy Shevchenko @ 2020-03-30 10:43 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: Naresh Kamboju, Rafael J. Wysocki, Robin Murphy,
	Greg Kroah-Hartman, Linux Kernel Mailing List, Linux PM,
	Basil Eljuse, lkft-triage, Linux-Next Mailing List, fntoth,
	Arnd Bergmann, Anders Roxell

On Mon, Mar 30, 2020 at 11:13:21AM +0100, Sudeep Holla wrote:
> On Fri, Mar 27, 2020 at 07:40:25PM +0000, Robin Murphy wrote:
> > On 2020-03-27 5:56 pm, Naresh Kamboju wrote:
> > > The kernel warning noticed on arm64 juno-r2 device running linux
> > > next-20200326 and next-20200327
> >
> > I suspect this is the correct expected behaviour manifesting. If you're
> > using the upstream juno-r2.dts, the power domain being waited for here is
> > provided by SCPI, however unless you're using an SCP firmware from at least
> > 3 years ago you won't actually have SCPI since they switched it to the newer
> > SCMI protocol, which is not yet supported upstream for Juno. See what
> > happened earlier in the log:
> >
> > [    2.741206] scpi_protocol scpi: incorrect or no SCP firmware found
> > [    2.747586] scpi_protocol: probe of scpi failed with error -110
> >
> > Thus this is the "waiting for a dependency which will never appear" case,
> > for which I assume the warning is intentional,
> 
> Is that the case ?
> 
> Previously we used to get the warning:
> "amba xx: ignoring dependency for device, assuming no driver"
> 
> Now we have the kernel warning in addition to the above.
> 
> > since the system is essentially broken (i.e. the hardware/firmware doesn't
> > actually match what the DT describes).
> >
> 
> Not sure if we can term it as "essentially broken". Definitely not 100%
> functional but not broken if the situation like on Juno where SCP firmware
> is fundamental for all OSPM but not essential for boot and other minimum
> set of functionality.
> 
> Either way I am not against the warning, just wanted to get certain things
> clarified myself.

How this warning related to the patch in the subject? Does revert of the patch
gives you no warning? (I will be very surprised).

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()
  2020-03-30 10:13         ` Sudeep Holla
  2020-03-30 10:43           ` Andy Shevchenko
@ 2020-03-30 12:45           ` Robin Murphy
  2020-03-30 13:11             ` Andy Shevchenko
  2020-03-30 13:25             ` Sudeep Holla
  1 sibling, 2 replies; 20+ messages in thread
From: Robin Murphy @ 2020-03-30 12:45 UTC (permalink / raw)
  To: Sudeep Holla, Andy Shevchenko
  Cc: Naresh Kamboju, Rafael J. Wysocki, Greg Kroah-Hartman,
	Linux Kernel Mailing List, Linux PM, Basil Eljuse, lkft-triage,
	Linux-Next Mailing List, fntoth, Arnd Bergmann, Anders Roxell

On 2020-03-30 11:13 am, Sudeep Holla wrote:
> On Fri, Mar 27, 2020 at 07:40:25PM +0000, Robin Murphy wrote:
>> On 2020-03-27 5:56 pm, Naresh Kamboju wrote:
>>> The kernel warning noticed on arm64 juno-r2 device running linux
>>> next-20200326 and next-20200327
>>
>> I suspect this is the correct expected behaviour manifesting. If you're
>> using the upstream juno-r2.dts, the power domain being waited for here is
>> provided by SCPI, however unless you're using an SCP firmware from at least
>> 3 years ago you won't actually have SCPI since they switched it to the newer
>> SCMI protocol, which is not yet supported upstream for Juno. See what
>> happened earlier in the log:
>>
>> [    2.741206] scpi_protocol scpi: incorrect or no SCP firmware found
>> [    2.747586] scpi_protocol: probe of scpi failed with error -110
>>
>> Thus this is the "waiting for a dependency which will never appear" case,
>> for which I assume the warning is intentional,
> 
> Is that the case ?
> 
> Previously we used to get the warning:
> "amba xx: ignoring dependency for device, assuming no driver"
> 
> Now we have the kernel warning in addition to the above.

AFAICS the difference is down to whether deferred_probe_timeout has 
expired or not - I'm not familiar enough with this code to know 
*exactly* what the difference is supposed to represent, nor which change 
has actually pushed the Juno case from one state to the other (other 
than it almost certainly can't be $SUBJECT - if this series is to blame 
at all I'd assume it would be down to patch #1/3, but there's a bunch of 
other rework previously queued in -next that is probably also interacting)

>> since the system is essentially broken (i.e. the hardware/firmware doesn't
>> actually match what the DT describes).
>>
> 
> Not sure if we can term it as "essentially broken". Definitely not 100%
> functional but not broken if the situation like on Juno where SCP firmware
> is fundamental for all OSPM but not essential for boot and other minimum
> set of functionality.

It's "broken" in the sense that the underlying system is *not* the 
system described in the DT. Yes, all the parts that still happen to line 
up will mostly still function OK, but those that don't will 
fundamentally not work as the kernel has been told to expect. I'm not 
sure what you prefer to call "not working as the kernel expects", but I 
call it "broken" ;)

Robin.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()
  2020-03-30 12:45           ` Robin Murphy
@ 2020-03-30 13:11             ` Andy Shevchenko
  2020-03-30 13:29               ` Robin Murphy
  2020-03-30 13:25             ` Sudeep Holla
  1 sibling, 1 reply; 20+ messages in thread
From: Andy Shevchenko @ 2020-03-30 13:11 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Sudeep Holla, Andy Shevchenko, Naresh Kamboju, Rafael J. Wysocki,
	Greg Kroah-Hartman, Linux Kernel Mailing List, Linux PM,
	Basil Eljuse, lkft-triage, Linux-Next Mailing List, Ferry Toth,
	Arnd Bergmann, Anders Roxell

On Mon, Mar 30, 2020 at 3:49 PM Robin Murphy <robin.murphy@arm.com> wrote:
> On 2020-03-30 11:13 am, Sudeep Holla wrote:
> > On Fri, Mar 27, 2020 at 07:40:25PM +0000, Robin Murphy wrote:

...

> AFAICS the difference is down to whether deferred_probe_timeout has
> expired or not - I'm not familiar enough with this code to know
> *exactly* what the difference is supposed to represent, nor which change
> has actually pushed the Juno case from one state to the other (other
> than it almost certainly can't be $SUBJECT - if this series is to blame
> at all I'd assume it would be down to patch #1/3, but there's a bunch of
> other rework previously queued in -next that is probably also interacting)

JFYI: patch #1/3 wasn't applied.


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()
  2020-03-30 10:43           ` Andy Shevchenko
@ 2020-03-30 13:16             ` Sudeep Holla
  0 siblings, 0 replies; 20+ messages in thread
From: Sudeep Holla @ 2020-03-30 13:16 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Naresh Kamboju, Rafael J. Wysocki, Robin Murphy,
	Greg Kroah-Hartman, Linux Kernel Mailing List, Linux PM,
	Basil Eljuse, lkft-triage, Linux-Next Mailing List, fntoth,
	Arnd Bergmann, Anders Roxell

On Mon, Mar 30, 2020 at 01:43:40PM +0300, Andy Shevchenko wrote:

[...]

>
> How this warning related to the patch in the subject? Does revert of the patch
> gives you no warning? (I will be very surprised).
>

I did a quick check reverting the $subject patch and it doesn't remove
extra warning. Sorry for the noise, I assumed so hastily, I was wrong.

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()
  2020-03-30 12:45           ` Robin Murphy
  2020-03-30 13:11             ` Andy Shevchenko
@ 2020-03-30 13:25             ` Sudeep Holla
  1 sibling, 0 replies; 20+ messages in thread
From: Sudeep Holla @ 2020-03-30 13:25 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Andy Shevchenko, Naresh Kamboju, Rafael J. Wysocki,
	Greg Kroah-Hartman, Linux Kernel Mailing List, Linux PM,
	Basil Eljuse, lkft-triage, Linux-Next Mailing List, fntoth,
	Arnd Bergmann, Sudeep Holla, Anders Roxell

On Mon, Mar 30, 2020 at 01:45:32PM +0100, Robin Murphy wrote:
> On 2020-03-30 11:13 am, Sudeep Holla wrote:
> > On Fri, Mar 27, 2020 at 07:40:25PM +0000, Robin Murphy wrote:
> > > On 2020-03-27 5:56 pm, Naresh Kamboju wrote:
> > > > The kernel warning noticed on arm64 juno-r2 device running linux
> > > > next-20200326 and next-20200327
> > >
> > > I suspect this is the correct expected behaviour manifesting. If you're
> > > using the upstream juno-r2.dts, the power domain being waited for here is
> > > provided by SCPI, however unless you're using an SCP firmware from at least
> > > 3 years ago you won't actually have SCPI since they switched it to the newer
> > > SCMI protocol, which is not yet supported upstream for Juno. See what
> > > happened earlier in the log:
> > >
> > > [    2.741206] scpi_protocol scpi: incorrect or no SCP firmware found
> > > [    2.747586] scpi_protocol: probe of scpi failed with error -110
> > >
> > > Thus this is the "waiting for a dependency which will never appear" case,
> > > for which I assume the warning is intentional,
> >
> > Is that the case ?
> >
> > Previously we used to get the warning:
> > "amba xx: ignoring dependency for device, assuming no driver"
> >
> > Now we have the kernel warning in addition to the above.
>
> AFAICS the difference is down to whether deferred_probe_timeout has expired
> or not - I'm not familiar enough with this code to know *exactly* what the
> difference is supposed to represent, nor which change has actually pushed
> the Juno case from one state to the other

Me either

> (other than it almost certainly
> can't be $SUBJECT - if this series is to blame at all I'd assume it would be
> down to patch #1/3, but there's a bunch of other rework previously queued in
> -next that is probably also interacting)
>

I agree, I was assuming one of the patch in series but again I may be wrong.

> > > since the system is essentially broken (i.e. the hardware/firmware doesn't
> > > actually match what the DT describes).
> > >
> >
> > Not sure if we can term it as "essentially broken". Definitely not 100%
> > functional but not broken if the situation like on Juno where SCP firmware
> > is fundamental for all OSPM but not essential for boot and other minimum
> > set of functionality.
>
> It's "broken" in the sense that the underlying system is *not* the system
> described in the DT. Yes, all the parts that still happen to line up will
> mostly still function OK, but those that don't will fundamentally not work
> as the kernel has been told to expect. I'm not sure what you prefer to call
> "not working as the kernel expects", but I call it "broken" ;)
>

I agree with you in context of Juno and it's firmware story.

But I also have another development use-case. Unless the DT becomes the
integral part of firmware from start, we can end up having DT with full
DT components(e.g. all SCMI users) while the firmware can add the
features incremental way. I agree this is not common for most of the
kernel developer but practical for few.

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()
  2020-03-30 13:11             ` Andy Shevchenko
@ 2020-03-30 13:29               ` Robin Murphy
  2020-03-30 20:02                 ` John Stultz
  0 siblings, 1 reply; 20+ messages in thread
From: Robin Murphy @ 2020-03-30 13:29 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Sudeep Holla, Andy Shevchenko, Naresh Kamboju, Rafael J. Wysocki,
	Greg Kroah-Hartman, Linux Kernel Mailing List, Linux PM,
	Basil Eljuse, lkft-triage, Linux-Next Mailing List, Ferry Toth,
	Arnd Bergmann, Anders Roxell

On 2020-03-30 2:11 pm, Andy Shevchenko wrote:
> On Mon, Mar 30, 2020 at 3:49 PM Robin Murphy <robin.murphy@arm.com> wrote:
>> On 2020-03-30 11:13 am, Sudeep Holla wrote:
>>> On Fri, Mar 27, 2020 at 07:40:25PM +0000, Robin Murphy wrote:
> 
> ...
> 
>> AFAICS the difference is down to whether deferred_probe_timeout has
>> expired or not - I'm not familiar enough with this code to know
>> *exactly* what the difference is supposed to represent, nor which change
>> has actually pushed the Juno case from one state to the other (other
>> than it almost certainly can't be $SUBJECT - if this series is to blame
>> at all I'd assume it would be down to patch #1/3, but there's a bunch of
>> other rework previously queued in -next that is probably also interacting)
> 
> JFYI: patch #1/3 wasn't applied.

OK, so if anyone's invested enough to want to investigate, it must be 
something in John's earlier changes here:

https://lore.kernel.org/lkml/20200225050828.56458-1-john.stultz@linaro.org/

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 3/3] driver core: Replace open-coded list_last_entry()
  2020-03-30 13:29               ` Robin Murphy
@ 2020-03-30 20:02                 ` John Stultz
  0 siblings, 0 replies; 20+ messages in thread
From: John Stultz @ 2020-03-30 20:02 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Andy Shevchenko, Sudeep Holla, Andy Shevchenko, Naresh Kamboju,
	Rafael J. Wysocki, Greg Kroah-Hartman, Linux Kernel Mailing List,
	Linux PM, Basil Eljuse, lkft-triage, Linux-Next Mailing List,
	Ferry Toth, Arnd Bergmann, Anders Roxell

On Mon, Mar 30, 2020 at 6:30 AM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2020-03-30 2:11 pm, Andy Shevchenko wrote:
> > On Mon, Mar 30, 2020 at 3:49 PM Robin Murphy <robin.murphy@arm.com> wrote:
> >> On 2020-03-30 11:13 am, Sudeep Holla wrote:
> >>> On Fri, Mar 27, 2020 at 07:40:25PM +0000, Robin Murphy wrote:
> >
> > ...
> >
> >> AFAICS the difference is down to whether deferred_probe_timeout has
> >> expired or not - I'm not familiar enough with this code to know
> >> *exactly* what the difference is supposed to represent, nor which change
> >> has actually pushed the Juno case from one state to the other (other
> >> than it almost certainly can't be $SUBJECT - if this series is to blame
> >> at all I'd assume it would be down to patch #1/3, but there's a bunch of
> >> other rework previously queued in -next that is probably also interacting)
> >
> > JFYI: patch #1/3 wasn't applied.
>
> OK, so if anyone's invested enough to want to investigate, it must be
> something in John's earlier changes here:
>
> https://lore.kernel.org/lkml/20200225050828.56458-1-john.stultz@linaro.org/

Hey all,
  Sorry, I just got a heads up about this thread.

So yea, it looks like the change is due likely to the first patch in
my series. Previously, after initcall_done, (since
deferred_probe_timeout was -1 unless manually specified) if the driver
wasn't already loaded we'd print "ignoring dependency for device,
assuming no driver" and return ENODEV.

Now, if modules are enabled (as without modules enabled, I believe
you'd see the same behavior as previous), we wait 30 seconds  (for
userspace to load any posssible modules that meet that dependency) and
then the driver_deferred_probe_timeout fires and we print "deferred
probe timeout, ignoring dependency".

It seems the issue here is the first message was printed with
dev_warn() and the second with dev_WARN() which provides the scary
backtrace.

I think functionally as mentioned above, there's no real behavioral
change here. But please correct me if that's wrong.

Since we are more likely to see the second message now, maybe we
should make both print via dev_warn()?

I'll spin up a patch.

thanks
-john

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2020-03-30 20:02 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-24 12:20 [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied Andy Shevchenko
2020-03-24 12:20 ` [PATCH v2 2/3] driver core: Read atomic counter once in driver_probe_done() Andy Shevchenko
2020-03-24 12:55   ` Rafael J. Wysocki
2020-03-24 12:20 ` [PATCH v2 3/3] driver core: Replace open-coded list_last_entry() Andy Shevchenko
2020-03-24 12:53   ` Rafael J. Wysocki
2020-03-27 17:56     ` Naresh Kamboju
2020-03-27 19:40       ` Robin Murphy
2020-03-30 10:13         ` Sudeep Holla
2020-03-30 10:43           ` Andy Shevchenko
2020-03-30 13:16             ` Sudeep Holla
2020-03-30 12:45           ` Robin Murphy
2020-03-30 13:11             ` Andy Shevchenko
2020-03-30 13:29               ` Robin Murphy
2020-03-30 20:02                 ` John Stultz
2020-03-30 13:25             ` Sudeep Holla
2020-03-28  7:25       ` Greg Kroah-Hartman
2020-03-24 12:52 ` [PATCH v2 1/3] driver core: Break infinite loop when deferred probe can't be satisfied Rafael J. Wysocki
2020-03-24 13:39   ` Andy Shevchenko
2020-03-24 15:36     ` Rafael J. Wysocki
2020-03-24 15:47       ` Andy Shevchenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).