linux-kernel.vger.kernel.org archive mirror
* [PATCHv2 0/2] drivers/base: bugfix for supplier<-consumer ordering in device_kset
@ 2018-06-25  7:47 Pingfan Liu
  2018-06-25  7:47 ` [PATCHv2 1/2] drivers/base: only reordering consumer device when probing Pingfan Liu
  2018-06-25  7:47 ` [PATCHv2 2/2] drivers/base: reorder consumer and its children behind suppliers Pingfan Liu
  0 siblings, 2 replies; 7+ messages in thread
From: Pingfan Liu @ 2018-06-25  7:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Greg Kroah-Hartman, Grygorii Strashko,
	Christoph Hellwig, Bjorn Helgaas, Dave Young, linux-pci,
	linuxppc-dev

commit 52cdbdd49853 ("driver core: correct device's shutdown order")
relies on the probe process to establish supplier<-consumer order.
But this turns out to break the parent<-child order in some scenarios.
E.g. in PCI, a bridge is enabled by the PCI core, and the devices behind
it have already been probed. Then the bridge's module is loaded, which
enables an extra feature (such as hotplug) on this bridge.
This breaks the parent<-child order and causes a failure on
"kexec -e" in some scenarios.

I tried to fix this issue in the PCI subsystem, and that turned out to be
wrong. Thanks to Christoph Hellwig, who pointed out that it is really a
bug in the driver core.

Note: this series has some locking issues, which should be fixed in the
next version.

v1 -> v2:
  refragment

Pingfan Liu (2):
  drivers/base: only reordering consumer device when probing
  drivers/base: reorder consumer and its children behind suppliers

 drivers/base/base.h |   1 +
 drivers/base/core.c | 135 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/base/dd.c   |   9 +---
 3 files changed, 138 insertions(+), 7 deletions(-)

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: linux-pci@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
-- 
2.7.4



* [PATCHv2 1/2] drivers/base: only reordering consumer device when probing
  2018-06-25  7:47 [PATCHv2 0/2] drivers/base: bugfix for supplier<-consumer ordering in device_kset Pingfan Liu
@ 2018-06-25  7:47 ` Pingfan Liu
  2018-06-25  7:47 ` [PATCHv2 2/2] drivers/base: reorder consumer and its children behind suppliers Pingfan Liu
  1 sibling, 0 replies; 7+ messages in thread
From: Pingfan Liu @ 2018-06-25  7:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Greg Kroah-Hartman, Grygorii Strashko,
	Christoph Hellwig, Bjorn Helgaas, Dave Young, linux-pci,
	linuxppc-dev

commit 52cdbdd49853 ("driver core: correct device's shutdown order")
relies on the probe process to establish supplier<-consumer order.
But this turns out to break the parent<-child order in some scenarios.
E.g. in PCI, a bridge is enabled by the PCI core, and the devices behind
it have already been probed. Then the bridge's module is loaded, which
enables an extra feature (such as hotplug) on this bridge. This breaks the
parent<-child order and causes a failure on "kexec -e" in some scenarios.

The detailed description of the scenario:
On an IBM Power9 machine, two drivers, portdrv_pci and shpchp (a module),
match PCI_CLASS_BRIDGE_PCI, but neither of them probes successfully due
to some issue. In this case, the bridge is moved after its children in
devices_kset. Then, on "kexec -e", an ATA disk behind the bridge cannot
write back its in-flight buffers, because the bridge has already been shut
down, which clears the BusMaster bit.

To fix this issue, only reorder a device and all of its children
if it is a consumer.
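
To make the failure easier to picture, here is a small user-space toy, not
kernel code. It assumes only that shutdown walks devices_kset from the tail
towards the head; struct toy_kset and all toy_* helpers are hypothetical
names made up for this illustration.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_DEVS 8

/* Hypothetical stand-in for devices_kset: device names in list order. */
struct toy_kset {
	const char *dev[TOY_MAX_DEVS];
	int n;
};

/* Index of a device in the list, or -1 if it is not registered. */
static int toy_index(const struct toy_kset *k, const char *name)
{
	for (int i = 0; i < k->n; i++)
		if (strcmp(k->dev[i], name) == 0)
			return i;
	return -1;
}

/* Move one device to the tail, mimicking devices_kset_move_last(). */
static void toy_move_last(struct toy_kset *k, const char *name)
{
	int i = toy_index(k, name);
	const char *moved;

	assert(i >= 0);
	moved = k->dev[i];
	for (; i < k->n - 1; i++)
		k->dev[i] = k->dev[i + 1];
	k->dev[k->n - 1] = moved;
}

/* The list is walked tail first, so a larger index is shut down earlier. */
static int toy_shut_down_before(const struct toy_kset *k,
				const char *a, const char *b)
{
	return toy_index(k, a) > toy_index(k, b);
}
```

Starting from the list { "bridge", "ata-disk" }, the disk is shut down
before the bridge. After toy_move_last(&k, "bridge"), which is what the
probe attempt on the bridge does here, the bridge is shut down first while
the disk may still have I/O in flight.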

Note, the bridge involved:
0004:00:00.0 PCI bridge: IBM Device 04c1 (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        NUMA node: 0
        Bus: primary=00, secondary=01, subordinate=12, sec-latency=0
        I/O behind bridge: 00000000-00000fff
        Memory behind bridge: 80000000-ffefffff
        Prefetchable memory behind bridge: 0006024000000000-0006027f7fffffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR+ NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [48] Express (v2) Root Port (Slot-), MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 256 bytes, MaxReadReq 128 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM not supported, Exit Latency L0s <64ns, L1 <1us
                        ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s, Width x4, TrErr- Train- SlotClk- DLActive+ BWMgmt- ABWMgmt+
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported ARIFwd+
                DevCtl2: Completion Timeout: 16ms to 55ms, TimeoutDis+, LTR-, OBFF Disabled ARIFwd+
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP+ FCP- CmpltTO+ CmpltAbrt+ UnxCmplt- RxOF- MalfTLP- ECRC+ UnsupReq- ACSViol-
                UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+
        Capabilities: [148 v1] #19

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: linux-pci@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
---
Note: this patch points out the code where the bug was introduced,
and I hope it sketches out the scenario. I will fold the series into one.

---
 drivers/base/base.h | 1 +
 drivers/base/core.c | 9 +++++++++
 drivers/base/dd.c   | 9 ++-------
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index a75c302..37f86ca 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -135,6 +135,7 @@ extern void device_unblock_probing(void);
 
 /* /sys/devices directory */
 extern struct kset *devices_kset;
+extern int device_reorder_consumer(struct device *dev);
 extern void devices_kset_move_last(struct device *dev);
 
 #if defined(CONFIG_MODULES) && defined(CONFIG_SYSFS)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 36622b5..66f06ff 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -123,6 +123,15 @@ static int device_is_dependent(struct device *dev, void *target)
 	return ret;
 }
 
+/* a temporary place holder to mark out the root cause of the bug.
+ * The proposal algorithm will come in next patch
+ */
+int device_reorder_consumer(struct device *dev)
+{
+	devices_kset_move_last(dev);
+	return 0;
+}
+
 static int device_reorder_to_tail(struct device *dev, void *not_used)
 {
 	struct device_link *link;
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 1435d72..c74f23c 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -434,13 +434,8 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 			goto probe_failed;
 	}
 
-	/*
-	 * Ensure devices are listed in devices_kset in correct order
-	 * It's important to move Dev to the end of devices_kset before
-	 * calling .probe, because it could be recursive and parent Dev
-	 * should always go first
-	 */
-	devices_kset_move_last(dev);
+	/* only reorder a consumer and its children after its suppliers */
+	device_reorder_consumer(dev);
 
 	if (dev->bus->probe) {
 		ret = dev->bus->probe(dev);
-- 
2.7.4



* [PATCHv2 2/2] drivers/base: reorder consumer and its children behind suppliers
  2018-06-25  7:47 [PATCHv2 0/2] drivers/base: bugfix for supplier<-consumer ordering in device_kset Pingfan Liu
  2018-06-25  7:47 ` [PATCHv2 1/2] drivers/base: only reordering consumer device when probing Pingfan Liu
@ 2018-06-25  7:47 ` Pingfan Liu
  2018-06-25 10:45   ` Greg Kroah-Hartman
  1 sibling, 1 reply; 7+ messages in thread
From: Pingfan Liu @ 2018-06-25  7:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Pingfan Liu, Greg Kroah-Hartman, Grygorii Strashko,
	Christoph Hellwig, Bjorn Helgaas, Dave Young, linux-pci,
	linuxppc-dev

commit 52cdbdd49853 ("driver core: correct device's shutdown order")
introduces supplier<-consumer order in devices_kset. The commit tries
to cleverly maintain both parent<-child and supplier<-consumer order by
reordering a device when probing. This method keeps things simple and
clean but, unfortunately, breaks parent<-child order in some cases,
which is described in the next patch in this series.
This patch tries to resolve supplier<-consumer order by only reordering a
device when it has suppliers, and takes care of the following scenario:
    [consumer, children] [ ... potential ... ] supplier
                         ^                   ^
After moving the consumer and its children after the supplier, the
potential section may contain consumers whose supplier is among the
moved children, and this requires moving all such consumers out of
the section recursively.
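
The "potential" section between the two carets is an open interval: both
boundary entries are excluded. A minimal user-space sketch of walking such
a section from right to left follows. It is an illustration only; struct
toy_node is a hypothetical stand-in modelled on struct list_head, and the
toy_* names are not part of the patch. The loop runs until the cursor
reaches the left boundary.

```c
#include <assert.h>

/* Minimal circular doubly linked node, modelled on struct list_head. */
struct toy_node {
	struct toy_node *prev, *next;
	int id;
};

/* Walk the open section (left, right) in reverse, excluding both ends. */
#define toy_opensect_for_each_reverse(cur, left, right) \
	for ((cur) = (right)->prev; (cur) != (left); (cur) = (cur)->prev)

/* Link an array of nodes into a circle in index order; id = index. */
static void toy_link_all(struct toy_node *nodes, int n)
{
	for (int i = 0; i < n; i++) {
		nodes[i].id = i;
		nodes[i].next = &nodes[(i + 1) % n];
		nodes[i].prev = &nodes[(i + n - 1) % n];
	}
}

/* Collect the ids strictly between left and right; returns the count. */
static int toy_collect(struct toy_node *left, struct toy_node *right,
		       int *out, int max)
{
	struct toy_node *cur;
	int n = 0;

	toy_opensect_for_each_reverse(cur, left, right) {
		assert(n < max);
		out[n++] = cur->id;
	}
	return n;
}
```

With five nodes linked in order, the section (node 0, node 4) yields the
ids 3, 2, 1, and the section between two adjacent nodes is empty.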

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Bjorn Helgaas <helgaas@kernel.org>
Cc: Dave Young <dyoung@redhat.com>
Cc: linux-pci@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
---
Note: there is a locking issue in this patch, which should be fixed in the next version.

---
 drivers/base/core.c | 132 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 129 insertions(+), 3 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 66f06ff..db30e86 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -123,12 +123,138 @@ static int device_is_dependent(struct device *dev, void *target)
 	return ret;
 }
 
-/* a temporary place holder to mark out the root cause of the bug.
- * The proposal algorithm will come in next patch
+struct pos_info {
+	struct device *pos;
+	struct device *tail;
+};
+
+/* caller takes the devices_kset->list_lock */
+static int descendants_reorder_after_pos(struct device *dev,
+	void *data)
+{
+	struct device *pos;
+	struct pos_info *p = data;
+
+	pos = p->pos;
+	pr_debug("devices_kset: Moving %s after %s\n",
+		 dev_name(dev), dev_name(pos));
+	device_for_each_child(dev, p, descendants_reorder_after_pos);
+	/* children at the tail */
+	list_move(&dev->kobj.entry, &pos->kobj.entry);
+	/* record the right boundary of the section */
+	if (p->tail == NULL)
+		p->tail = dev;
+	return 0;
+}
+
+/* iterate over an open section */
+#define list_opensect_for_each_reverse(cur, left, right)	\
+	for (cur = right->prev; cur != left; cur = cur->prev)
+
+static bool is_consumer(struct device *query, struct device *supplier)
+{
+	struct device_link *link;
+	/* todo, lock protection */
+	list_for_each_entry(link, &supplier->links.consumers, s_node)
+		if (link->consumer == query)
+			return true;
+	return false;
+}
+
+/* recursively move the potential consumers in open section (left, right)
+ * after the barrier
+ */
+static int __device_reorder_consumer(struct device *consumer,
+	struct list_head *left, struct list_head *right,
+	struct pos_info *p)
+{
+	struct list_head *iter;
+	struct device *c_dev, *s_dev, *tail_dev;
+
+	descendants_reorder_after_pos(consumer, p);
+	tail_dev = p->tail;
+	/* (left, right) may contain consumers, hence check whether any moved
+	 * child serves as a supplier. Iterating in reverse order helps us
+	 * find the last supplier of a consumer.
+	 */
+	list_opensect_for_each_reverse(iter, left, right) {
+		struct list_head *l_iter, *moved_left, *moved_right;
+
+		moved_left = (&consumer->kobj.entry)->prev;
+		moved_right = tail_dev->kobj.entry.next;
+		/* the moved section may contain potential suppliers */
+		list_opensect_for_each_reverse(l_iter, moved_left,
+			moved_right) {
+			s_dev = list_entry(l_iter, struct device, kobj.entry);
+			c_dev = list_entry(iter, struct device, kobj.entry);
+			/* to fix: this poses extra effort for locking */
+			if (is_consumer(c_dev, s_dev)) {
+				p->tail = NULL;
+				/* to fix: lock issue */
+				p->pos =  s_dev;
+				/* reorder after the last supplier */
+				__device_reorder_consumer(c_dev,
+					l_iter, right, p);
+			}
+		}
+	}
+	return 0;
+}
+
+static int find_last_supplier(struct device *dev, struct device *supplier)
+{
+	struct device_link *link;
+
+	list_for_each_entry_reverse(link, &dev->links.suppliers, c_node) {
+		if (link->supplier == supplier)
+			return 1;
+	}
+	if (dev == supplier)
+		return -1;
+	return 0;
+}
+
+/* When reordering, take care of the range (old_pos(dev), new_pos(dev));
+ * items in it may need to be moved recursively.
  */
 int device_reorder_consumer(struct device *dev)
 {
-	devices_kset_move_last(dev);
+	struct list_head *iter, *left, *right;
+	struct device *cur_dev;
+	struct pos_info info;
+	int ret, idx;
+
+	idx = device_links_read_lock();
+	if (list_empty(&dev->links.suppliers)) {
+		device_links_read_unlock(idx);
+		return 0;
+	}
+	spin_lock(&devices_kset->list_lock);
+	list_for_each_prev(iter, &devices_kset->list) {
+		cur_dev = list_entry(iter, struct device, kobj.entry);
+		ret = find_last_supplier(dev, cur_dev);
+		switch (ret) {
+		case -1:
+			goto unlock;
+		case 1:
+			break;
+		case 0:
+			continue;
+		}
+	}
+	BUG_ON(!ret);
+
+	/* record the affected open section */
+	left = dev->kobj.entry.prev;
+	right = iter;
+	info.pos = list_entry(iter, struct device, kobj.entry);
+	info.tail = NULL;
+	/* dry out the consumers in (left,right) */
+	__device_reorder_consumer(dev, left, right, &info);
+
+unlock:
+	spin_unlock(&devices_kset->list_lock);
+	device_links_read_unlock(idx);
 	return 0;
 }
 
-- 
2.7.4



* Re: [PATCHv2 2/2] drivers/base: reorder consumer and its children behind suppliers
  2018-06-25  7:47 ` [PATCHv2 2/2] drivers/base: reorder consumer and its children behind suppliers Pingfan Liu
@ 2018-06-25 10:45   ` Greg Kroah-Hartman
  2018-06-26  3:29     ` Pingfan Liu
  0 siblings, 1 reply; 7+ messages in thread
From: Greg Kroah-Hartman @ 2018-06-25 10:45 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, Grygorii Strashko, Christoph Hellwig,
	Bjorn Helgaas, Dave Young, linux-pci, linuxppc-dev

On Mon, Jun 25, 2018 at 03:47:39PM +0800, Pingfan Liu wrote:
> commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> introduces supplier<-consumer order in devices_kset. The commit tries
> to cleverly maintain both parent<-child and supplier<-consumer order by
> reordering a device when probing. This method makes things simple and
> clean, but unfortunately, breaks parent<-child order in some case,
> which is described in next patch in this series.

There is no "next patch in this series" :(

> Here this patch tries to resolve supplier<-consumer by only reordering a
> device when it has suppliers, and takes care of the following scenario:
>     [consumer, children] [ ... potential ... ] supplier
>                          ^                   ^
> After moving the consumer and its children after the supplier, the
> potentail section may contain consumers whose supplier is inside
> children, and this poses the requirement to dry out all consumpers in
> the section recursively.
> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Bjorn Helgaas <helgaas@kernel.org>
> Cc: Dave Young <dyoung@redhat.com>
> Cc: linux-pci@vger.kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> ---
> note: there is lock issue in this patch, should be fixed in next version

Please send patches that you know are correct, why would I want to
review this if you know it is not correct?

And if the original commit is causing problems for you, why not just
revert that instead of adding this much-increased complexity?



> 
> ---
>  drivers/base/core.c | 132 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 129 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 66f06ff..db30e86 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -123,12 +123,138 @@ static int device_is_dependent(struct device *dev, void *target)
>  	return ret;
>  }
>  
> -/* a temporary place holder to mark out the root cause of the bug.
> - * The proposal algorithm will come in next patch
> +struct pos_info {
> +	struct device *pos;
> +	struct device *tail;
> +};
> +
> +/* caller takes the devices_kset->list_lock */
> +static int descendants_reorder_after_pos(struct device *dev,
> +	void *data)

Why are you wrapping lines that do not need to be wrapped?

What does this function do?

> +{
> +	struct device *pos;
> +	struct pos_info *p = data;
> +
> +	pos = p->pos;
> +	pr_debug("devices_kset: Moving %s after %s\n",
> +		 dev_name(dev), dev_name(pos));

You have a device, use it for debugging, i.e. dev_dbg().

> +	device_for_each_child(dev, p, descendants_reorder_after_pos);

Recursive?

> +	/* children at the tail */
> +	list_move(&dev->kobj.entry, &pos->kobj.entry);
> +	/* record the right boundary of the section */
> +	if (p->tail == NULL)
> +		p->tail = dev;
> +	return 0;
> +}

I really do not understand what the above code is supposed to be doing :(

> +
> +/* iterate over an open section */
> +#define list_opensect_for_each_reverse(cur, left, right)	\
> +	for (cur = right->prev; cur == left; cur = cur->prev)
> +
> +static bool is_consumer(struct device *query, struct device *supplier)
> +{
> +	struct device_link *link;
> +	/* todo, lock protection */

Always run checkpatch.pl on patches so you do not get grumpy maintainers
telling you to run checkpatch.pl :(

> +	list_for_each_entry(link, &supplier->links.consumers, s_node)
> +		if (link->consumer == query)
> +			return true;
> +	return false;
> +}
> +
> +/* recursively move the potential consumers in open section (left, right)
> + * after the barrier

What barrier?

I'm stopping here as I have no idea what is going on, and this needs a
lot more work at the basic level of "it handles locking correctly"...

If you are working on this for power9, I'm guessing you work for IBM?
If so, please run this through your internal patch review process before
sending it out again...

thanks,

greg k-h


* Re: [PATCHv2 2/2] drivers/base: reorder consumer and its children behind suppliers
  2018-06-25 10:45   ` Greg Kroah-Hartman
@ 2018-06-26  3:29     ` Pingfan Liu
  2018-06-26 11:54       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 7+ messages in thread
From: Pingfan Liu @ 2018-06-26  3:29 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, Grygorii Strashko, Christoph Hellwig,
	Bjorn Helgaas, Dave Young, linux-pci, linuxppc-dev

On Mon, Jun 25, 2018 at 6:45 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Mon, Jun 25, 2018 at 03:47:39PM +0800, Pingfan Liu wrote:
> > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> > introduces supplier<-consumer order in devices_kset. The commit tries
> > to cleverly maintain both parent<-child and supplier<-consumer order by
> > reordering a device when probing. This method makes things simple and
> > clean, but unfortunately, breaks parent<-child order in some case,
> > which is described in next patch in this series.
>
> There is no "next patch in this series" :(
>
Oh, I rearranged the patches and forgot to update the comment in the log.

> > Here this patch tries to resolve supplier<-consumer by only reordering a
> > device when it has suppliers, and takes care of the following scenario:
> >     [consumer, children] [ ... potential ... ] supplier
> >                          ^                   ^
> > After moving the consumer and its children after the supplier, the
> > potentail section may contain consumers whose supplier is inside
> > children, and this poses the requirement to dry out all consumpers in
> > the section recursively.
> >
> > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Bjorn Helgaas <helgaas@kernel.org>
> > Cc: Dave Young <dyoung@redhat.com>
> > Cc: linux-pci@vger.kernel.org
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > ---
> > note: there is lock issue in this patch, should be fixed in next version
>
> Please send patches that you know are correct, why would I want to
> review this if you know it is not correct?
>
> And if the original commit is causing problems for you, why not just
> revert that instead of adding this much-increased complexity?
>
Reverting the original commit would expose the wrong order
"consumer <- supplier" again.
This patch tries to resolve that error and fix the following scenario:

step0: before the consumer device is probed (note child_a is a
supplier of consumer_a, etc.):
  [consumer-X, child_a, ..., child_z] [... consumer_a, ..., consumer_z, ...] supplier-X
                                      ^^^  affected range during moving  ^^^
step1: when probing, consumer-X is moved after supplier-X:
  [child_a, ..., child_z] [... consumer_a, ..., consumer_z, ...] supplier-X, consumer-X
But this now breaks the "parent <- child" sequence, which should be fixed like:
step2:
  [... consumer_a, ..., consumer_z, ...] supplier-X [consumer-X, child_a, ..., child_z]
  <--- descendants_reorder_after_pos() does it.
Again, the sequence "consumer_a <- child_a" breaks the "supplier <- consumer"
order, which should be fixed like:
step3:
  [... consumer_z, ...] supplier-X [consumer-X, child_a, consumer_a, ..., child_z]
  ^^ affected range ^^
  <--- __device_reorder_consumer() does it.
Moving consumer_a brings us back to the same scenario as step1,
hence we need an outer recursion.

In each round of step3, __device_reorder_consumer() resolves its "local
affected range", which is a fraction of the "whole affected range".
Hence, in the end, all potential consumers in the affected range are resolved.
(Maybe I can split the patch at step2 and step3 to ease the review of the
next version.)
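
To show what end state the steps aim for, here is a user-space toy. It is
NOT the algorithm in the patch: it is a simplified fixpoint loop that keeps
moving a consumer's whole subtree after its supplier until no link is
violated. All names (toy_link, toy_reorder, the integer device ids) are
assumptions made for this sketch.

```c
#include <assert.h>
#include <string.h>

#define TOY_NDEV 6

/* Hypothetical dependency: consumer must come after supplier. */
struct toy_link { int supplier, consumer; };

static int toy_pos(const int *order, int n, int dev)
{
	for (int i = 0; i < n; i++)
		if (order[i] == dev)
			return i;
	return -1;
}

/* True if dev is root or a descendant of root; parent[i] == -1 is no parent. */
static int toy_in_subtree(const int *parent, int dev, int root)
{
	for (; dev != -1; dev = parent[dev])
		if (dev == root)
			return 1;
	return 0;
}

/* Move root and its descendants, keeping their order, right after anchor. */
static void toy_move_subtree_after(int *order, int n, const int *parent,
				   int root, int anchor)
{
	int rest[TOY_NDEV], moved[TOY_NDEV], nr = 0, nm = 0, k = 0;

	assert(n <= TOY_NDEV && !toy_in_subtree(parent, anchor, root));
	for (int i = 0; i < n; i++) {
		if (toy_in_subtree(parent, order[i], root))
			moved[nm++] = order[i];
		else
			rest[nr++] = order[i];
	}
	for (int i = 0; i < nr; i++) {
		order[k++] = rest[i];
		if (rest[i] == anchor) {
			memcpy(&order[k], moved, nm * sizeof(int));
			k += nm;
		}
	}
}

/* Repeat until every consumer follows its supplier; -1 if it never settles. */
static int toy_reorder(int *order, int n, const int *parent,
		       const struct toy_link *links, int nlinks)
{
	for (int round = 0; round < n * n; round++) {
		int fixed = 0;

		for (int i = 0; i < nlinks; i++) {
			int c = links[i].consumer, s = links[i].supplier;

			if (toy_pos(order, n, c) < toy_pos(order, n, s)) {
				toy_move_subtree_after(order, n, parent, c, s);
				fixed = 1;
			}
		}
		if (!fixed)
			return 0;
	}
	return -1;
}
```

With device 0 = consumer-X, 1 = child_a (child of 0), 2 = consumer_a and
3 = supplier-X, the links (3 -> 0) and (1 -> 2), and the step0 order
0, 1, 2, 3, the loop settles on 3, 0, 1, 2, which matches step3. The round
limit is only a guard for this toy; it proves nothing about termination in
general.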

__device_reorder_consumer() already holds the devices_kset spin lock, and
it also needs to take the SRCU lock that protects a device's links.consumers.
This requires dropping the spin lock and will take considerable effort. If the
above algorithm is fine, I can do it.
>
>
> >
> > ---
> >  drivers/base/core.c | 132 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 129 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 66f06ff..db30e86 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -123,12 +123,138 @@ static int device_is_dependent(struct device *dev, void *target)
> >       return ret;
> >  }
> >
> > -/* a temporary place holder to mark out the root cause of the bug.
> > - * The proposal algorithm will come in next patch
> > +struct pos_info {
> > +     struct device *pos;
> > +     struct device *tail;
> > +};
> > +
> > +/* caller takes the devices_kset->list_lock */
> > +static int descendants_reorder_after_pos(struct device *dev,
> > +     void *data)
>
> Why are you wrapping lines that do not need to be wrapped?
>
OK, will fix.

> What does this function do?
>
As the name implies, it reorders dev and its children after a position.
When moving a consumer after a supplier, we break the "parent <- child"
order of the consumer and its children in devices_kset.
Hence we should move the children too.
The param "data" contains the position info, and its name is not
illuminating :(
Since the function prototype is required by device_for_each_child(), it
may be better to name it position_info.
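
A user-space toy of that walk, for illustration only: it visits the
children first and then moves the device itself right after the position,
so the device ends up ahead of its children. struct toy_order and the
toy_* helpers are hypothetical, and an array stands in for the kset list.

```c
#include <assert.h>

#define TOY_NDEV 8

/* Hypothetical stand-in for the devices_kset list: device ids in order. */
struct toy_order {
	int dev[TOY_NDEV];
	int n;
};

static int toy_pos(const struct toy_order *o, int dev)
{
	for (int i = 0; i < o->n; i++)
		if (o->dev[i] == dev)
			return i;
	return -1;
}

/* Unlink dev and re-insert it right after pos, like list_move(). */
static void toy_move_after(struct toy_order *o, int dev, int pos)
{
	int i = toy_pos(o, dev), p;

	assert(i >= 0 && dev != pos);
	for (; i < o->n - 1; i++)
		o->dev[i] = o->dev[i + 1];
	o->n--;
	p = toy_pos(o, pos);
	assert(p >= 0);
	for (i = o->n; i > p + 1; i--)
		o->dev[i] = o->dev[i - 1];
	o->dev[p + 1] = dev;
	o->n++;
}

/* Post-order walk: children are moved first, then dev lands before them. */
static void toy_descendants_after(struct toy_order *o, const int *parent,
				  int ndev, int dev, int pos)
{
	for (int c = 0; c < ndev; c++)
		if (parent[c] == dev)
			toy_descendants_after(o, parent, ndev, c, pos);
	toy_move_after(o, dev, pos);
}
```

For a tree where device 0 has children 1 and 2, device 1 has child 3, and
device 4 is the position, the order 0, 1, 3, 2, 4 becomes 4, 0, 2, 1, 3.
Siblings come out in reverse order, but every parent still precedes its
children.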

> > +{
> > +     struct device *pos;
> > +     struct pos_info *p = data;
> > +
> > +     pos = p->pos;
> > +     pr_debug("devices_kset: Moving %s after %s\n",
> > +              dev_name(dev), dev_name(pos));
>
> You have a device, use it for debugging, i.e. dev_dbg().
>
But here we have two devices.

> > +     device_for_each_child(dev, p, descendants_reorder_after_pos);
>
> Recursive?
>
Yes, in order to move all children of the consumer.

> > +     /* children at the tail */
> > +     list_move(&dev->kobj.entry, &pos->kobj.entry);
> > +     /* record the right boundary of the section */
> > +     if (p->tail == NULL)
> > +             p->tail = dev;
> > +     return 0;
> > +}
>
> I really do not understand what the above code is supposed to be doing :(
>
The moved consumer's children may be suppliers of other devices:
  [... consumer_a, ..., consumer_z, ...] supplier-X [consumer-X, child_a, ..., child_z]
  ^^^      potential consumers       ^^^                         ^potential suppliers^
Now consumer_a and its supplier child_a violate the order
"supplier <- consumer".
To pick out such violations, we need to check the potential suppliers
against the potential consumers. And p->tail helps to record the new
position of child_z.

> > +
> > +/* iterate over an open section */
> > +#define list_opensect_for_each_reverse(cur, left, right)     \
> > +     for (cur = right->prev; cur == left; cur = cur->prev)
> > +
> > +static bool is_consumer(struct device *query, struct device *supplier)
> > +{
> > +     struct device_link *link;
> > +     /* todo, lock protection */
>
> Always run checkpatch.pl on patches so you do not get grumpy maintainers
> telling you to run checkpatch.pl :(
>
Yes, I had run it, and only got a warning:
WARNING: Avoid crashing the kernel - try using WARN_ON & recovery code
rather than BUG() or BUG_ON()
#167: FILE: drivers/base/core.c:245:
+ BUG_ON(!ret);

total: 0 errors, 1 warnings, 141 lines checked

> > +     list_for_each_entry(link, &supplier->links.consumers, s_node)
> > +             if (link->consumer == query)
> > +                     return true;
> > +     return false;
> > +}
> > +
> > +/* recursively move the potential consumers in open section (left, right)
> > + * after the barrier
>
> What barrier?
>
A position that moved devices can not cross before.

> I'm stopping here as I have no idea what is going on, and this needs a
> lot more work at the basic level of "it handles locking correctly"...
>
> If you are working on this for power9, I'm guessing you work for IBM?

No. I just hit this bug.

> If so, please run this through your internal patch review process before
> sending it out again...
>
I will try my best to find some people to review it. But are the
assumption in step0 and the algorithm that follows worth trying?

Thanks and regards,
Pingfan


* Re: [PATCHv2 2/2] drivers/base: reorder consumer and its children behind suppliers
  2018-06-26  3:29     ` Pingfan Liu
@ 2018-06-26 11:54       ` Greg Kroah-Hartman
  2018-07-03  6:48         ` Pingfan Liu
  0 siblings, 1 reply; 7+ messages in thread
From: Greg Kroah-Hartman @ 2018-06-26 11:54 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-kernel, Grygorii Strashko, Christoph Hellwig,
	Bjorn Helgaas, Dave Young, linux-pci, linuxppc-dev

On Tue, Jun 26, 2018 at 11:29:48AM +0800, Pingfan Liu wrote:
> On Mon, Jun 25, 2018 at 6:45 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Mon, Jun 25, 2018 at 03:47:39PM +0800, Pingfan Liu wrote:
> > > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> > > introduces supplier<-consumer order in devices_kset. The commit tries
> > > to cleverly maintain both parent<-child and supplier<-consumer order by
> > > reordering a device when probing. This method makes things simple and
> > > clean, but unfortunately, breaks parent<-child order in some case,
> > > which is described in next patch in this series.
> >
> > There is no "next patch in this series" :(
> >
> Oh, re-arrange the patches, and forget the comment in log
> 
> > > Here this patch tries to resolve supplier<-consumer by only reordering a
> > > device when it has suppliers, and takes care of the following scenario:
> > >     [consumer, children] [ ... potential ... ] supplier
> > >                          ^                   ^
> > > After moving the consumer and its children after the supplier, the
> > > potentail section may contain consumers whose supplier is inside
> > > children, and this poses the requirement to dry out all consumpers in
> > > the section recursively.
> > >
> > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> > > Cc: Christoph Hellwig <hch@infradead.org>
> > > Cc: Bjorn Helgaas <helgaas@kernel.org>
> > > Cc: Dave Young <dyoung@redhat.com>
> > > Cc: linux-pci@vger.kernel.org
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > ---
> > > note: there is lock issue in this patch, should be fixed in next version
> >
> > Please send patches that you know are correct, why would I want to
> > review this if you know it is not correct?
> >
> > And if the original commit is causing problems for you, why not just
> > revert that instead of adding this much-increased complexity?
> >
> Revert the original commit, then it will expose the error  order
> "consumer <- supplier" again.
> This patch tries to resolve the error and fix the following scenario:
> step0:  before the consumer device's probing,  (note child_a is a
> supplier of consumer_a, etc)
> [ consumer-X,  child_a, ...., child_z]     [.... consumer_a, ...,
> consumer_z, ....]    supplier-X
>                                                              ^^^
> affected range during moving^^^
> step1: When probing, moving consumer-X after supplier-X
> [ child_a, ...., child_z]     [.... consumer_a, ...,     consumer_z,
> ....]   supplier-X, consumer-X
> But it breaks "parent <-child" seq now, and should be fixed like:
> step2:
> [.... consumer_a, ...,     consumer_z, ....]  supplier-X  [
> consumer-X,  child_a, ...., child_z]    <---
> descendants_reorder_after_pos() does it.
> Again, the seq "consumer_a <- child_a" breaks the "supplier<-consumer"
>  order, should be fixed like:
> step3:
> [....  consumer_z, .....]  supplier-X  [ consumer-X,  child_a,
> consumer_a ...., child_z]   <--- __device_reorder_consumer() does it.
> ^^ affected range^^
> The moving of consumer_a brings us to face the same scenario of step1,
> hence we need an external recursion.

Something really got messed up here, and this all does not make any
sense :(

Can you try again?

Also, please cc: Rafael on all of this, as he wrote all of this
consumer/supplier logic and I am not that familiar with it at all.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCHv2 2/2] drivers/base: reorder consumer and its children behind suppliers
  2018-06-26 11:54       ` Greg Kroah-Hartman
@ 2018-07-03  6:48         ` Pingfan Liu
  0 siblings, 0 replies; 7+ messages in thread
From: Pingfan Liu @ 2018-07-03  6:48 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: linux-kernel, Grygorii Strashko, Christoph Hellwig,
	Bjorn Helgaas, Dave Young, linux-pci, linuxppc-dev,
	rafael.j.wysocki

On Tue, Jun 26, 2018 at 7:54 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, Jun 26, 2018 at 11:29:48AM +0800, Pingfan Liu wrote:
> > On Mon, Jun 25, 2018 at 6:45 PM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Mon, Jun 25, 2018 at 03:47:39PM +0800, Pingfan Liu wrote:
> > > > commit 52cdbdd49853 ("driver core: correct device's shutdown order")
> > > > introduces supplier<-consumer order in devices_kset. The commit tries
> > > > to cleverly maintain both parent<-child and supplier<-consumer order by
> > > > reordering a device when probing. This method makes things simple and
> > > > clean, but unfortunately, breaks parent<-child order in some case,
> > > > which is described in next patch in this series.
> > >
> > > There is no "next patch in this series" :(
> > >
> > Oh, I re-arranged the patches and forgot to update the comment in the log.
> >
> > > > Here this patch tries to resolve supplier<-consumer by only reordering a
> > > > device when it has suppliers, and takes care of the following scenario:
> > > >     [consumer, children] [ ... potential ... ] supplier
> > > >                          ^                   ^
> > > > After moving the consumer and its children after the supplier, the
> > > > > potential section may contain consumers whose supplier is inside
> > > > > children, and this poses the requirement to dry out all consumers in
> > > > the section recursively.
> > > >
> > > > Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> > > > Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> > > > Cc: Christoph Hellwig <hch@infradead.org>
> > > > Cc: Bjorn Helgaas <helgaas@kernel.org>
> > > > Cc: Dave Young <dyoung@redhat.com>
> > > > Cc: linux-pci@vger.kernel.org
> > > > Cc: linuxppc-dev@lists.ozlabs.org
> > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> > > > ---
> > > > note: there is lock issue in this patch, should be fixed in next version
> > >
> > > Please send patches that you know are correct, why would I want to
> > > review this if you know it is not correct?
> > >
> > > And if the original commit is causing problems for you, why not just
> > > revert that instead of adding this much-increased complexity?
> > >
> > Reverting the original commit would expose the erroneous
> > "consumer <- supplier" order again.
> > This patch tries to resolve the error and fix the following scenario:
> > step0:  before the consumer device's probing,  (note child_a is a
> > supplier of consumer_a, etc)
> > [ consumer-X,  child_a, ...., child_z]     [.... consumer_a, ...,
> > consumer_z, ....]    supplier-X
> >                                                              ^^^
> > affected range during moving^^^
> > step1: When probing, moving consumer-X after supplier-X
> > [ child_a, ...., child_z]     [.... consumer_a, ...,     consumer_z,
> > ....]   supplier-X, consumer-X
> > But it breaks "parent <-child" seq now, and should be fixed like:
> > step2:
> > [.... consumer_a, ...,     consumer_z, ....]  supplier-X  [
> > consumer-X,  child_a, ...., child_z]    <---
> > descendants_reorder_after_pos() does it.
> > Again, the seq "consumer_a <- child_a" breaks the "supplier<-consumer"
> >  order, should be fixed like:
> > step3:
> > [....  consumer_z, .....]  supplier-X  [ consumer-X,  child_a,
> > consumer_a ...., child_z]   <--- __device_reorder_consumer() does it.
> > ^^ affected range^^
> > The moving of consumer_a brings us to face the same scenario of step1,
> > hence we need an external recursion.
>
> Something really got messed up here, and this all does not make any
> sense :(
>
> Can you try again?
>
> Also, please cc: Rafael on all of this, as he wrote all of this
> consumer/supplier logic and I am not that familiar with it at all.
>
Cc'ing Rafael J. Wysocki for context.  I will send out v3 soon.

Regards,
Pingfan

^ permalink raw reply	[flat|nested] 7+ messages in thread
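
The step0 to step3 walk-through quoted in the message above can be modelled
outside the kernel. What follows is a minimal user-space sketch in Python,
not the patch itself: it treats devices_kset as a plain list, and the helper
names (descendants, move_last, violations) and the device names are
hypothetical. It also assumes that a probed device is moved to the tail of
the list rather than to the slot right behind its supplier, and that the
dependency graph has no cycles.

```python
def descendants(dev, children):
    """Return all descendants of dev, each parent listed before its children."""
    out = []
    for child in children.get(dev, []):
        out.append(child)
        out.extend(descendants(child, children))
    return out


def move_last(order, dev, children, consumers):
    """Move dev and its descendants to the tail, then chase their consumers.

    Assumption: the parent/child and supplier/consumer graphs are acyclic,
    otherwise the recursion would not terminate.
    """
    group = [dev] + descendants(dev, children)
    for d in group:
        order.remove(d)
    order.extend(group)  # keeps parent <- child inside the moved group
    for d in group:
        for c in consumers.get(d, []):
            # A consumer left in front of a moved supplier must move as well,
            # which puts us back in the step1 situation: hence the recursion.
            if order.index(c) < order.index(d):
                move_last(order, c, children, consumers)


def violations(order, children, consumers):
    """List (first, later) pairs where 'later' wrongly precedes 'first'."""
    pos = {d: i for i, d in enumerate(order)}
    bad = []
    for table in (children, consumers):
        for first, laters in table.items():
            bad += [(first, later) for later in laters if pos[later] < pos[first]]
    return bad


# step0 from the thread: consumer_X and its children sit in front of supplier_X.
children = {"consumer_X": ["child_a", "child_z"]}
consumers = {
    "supplier_X": ["consumer_X"],
    "child_a": ["consumer_a"],
    "child_z": ["consumer_z"],
}
order = ["consumer_X", "child_a", "child_z",
         "consumer_a", "consumer_z", "supplier_X"]

move_last(order, "consumer_X", children, consumers)
print(order)
print(violations(order, children, consumers))
```

In this model, moving only consumer_X would leave child_a and child_z ahead
of their parent, which is the breakage step1 describes. Moving the whole
group and then chasing consumers recursively restores both orderings.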
