linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yinghai Lu <yinghai@kernel.org>
To: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] pci: fix invalid list entry warning for double pci device removing via sysfs
Date: Wed, 30 Oct 2013 23:45:07 -0700	[thread overview]
Message-ID: <CAE9FiQXp+9XKbs8ccJVt08icax6SOWioDfL5s+H1SWGi9cTAiw@mail.gmail.com> (raw)
In-Reply-To: <5270DBFC.3050705@cn.fujitsu.com>

[-- Attachment #1: Type: text/plain, Size: 4115 bytes --]

On Wed, Oct 30, 2013 at 3:14 AM, Gu Zheng <guz.fnst@cn.fujitsu.com> wrote:
> When concurent removing pci devices which are in the same pci subtree
> via sysfs, such as:
> echo -n 1 > /sys/bus/pci/devices/0000\:10\:00.0/remove ; echo -n 1 >
> /sys/bus/pci/devices/0000\:1a\:01.0/remove
> (1a:01.0 device is downstream from the 10:00.0 bridge)
>
> the following warning will show:
> [ 1799.280918] ------------[ cut here ]------------
> [ 1799.336199] WARNING: CPU: 7 PID: 126 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0()
> [ 1799.433093] list_del corruption, ffff8807b4a7c000->next is LIST_POISON1 (dead000000100100)
> [ 1799.532110] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables sg dm_mirror dm_region_hash dm_log dm_mod vfat fat e1000e igb iTCO_wdt iTCO_vendor_support ioatdma ptp i7core_edac ipmi_si edac_core lpc_ich pps_core i2c_i801 pcspkr mfd_core dca ipmi_msghandler acpi_cpufreq xfs libcrc32c sd_mod crc_t10dif crct10dif_common i2c_algo_bit drm_kms_helper ttm drm mptsas scsi_transport_sas mptscsih i2c_core megaraid_sas mptbase
> [ 1800.276623] CPU: 7 PID: 126 Comm: kworker/u512:1 Tainted: G        W    3.12.0-rc5+ #196
> [ 1800.508918] Workqueue: sysfsd sysfs_schedule_callback_work
> [ 1800.574703]  0000000000000009 ffff8807adbadbd8 ffffffff8168b26c ffff8807c27d08a8
> [ 1800.663860]  ffff8807adbadc28 ffff8807adbadc18 ffffffff810711dc ffff8807adbadc68
> [ 1800.753130]  ffff8807b4a7c000 ffff8807b4a7c000 ffff8807ad089c00 0000000000000000
> [ 1800.842282] Call Trace:
> [ 1800.871651]  [<ffffffff8168b26c>] dump_stack+0x55/0x76
> [ 1800.933301]  [<ffffffff810711dc>] warn_slowpath_common+0x8c/0xc0
> [ 1801.005283]  [<ffffffff810712c6>] warn_slowpath_fmt+0x46/0x50
> [ 1801.074081]  [<ffffffff8135a343>] __list_del_entry+0x63/0xd0
> [ 1801.141839]  [<ffffffff8135a3c1>] list_del+0x11/0x40
> [ 1801.201320]  [<ffffffff813734da>] pci_remove_bus_device+0x6a/0xe0
> [ 1801.274279]  [<ffffffff8137356e>] pci_stop_and_remove_bus_device+0x1e/0x30
> [ 1801.356606]  [<ffffffff8137b20b>] remove_callback+0x2b/0x40
> [ 1801.423412]  [<ffffffff81251848>] sysfs_schedule_callback_work+0x18/0x60
> [ 1801.503744]  [<ffffffff8108eab5>] process_one_work+0x1f5/0x540
> [ 1801.573640]  [<ffffffff8108ea53>] ? process_one_work+0x193/0x540
> [ 1801.645616]  [<ffffffff8108f2ac>] worker_thread+0x11c/0x370
> [ 1801.712337]  [<ffffffff8108f190>] ? rescuer_thread+0x350/0x350
> [ 1801.782178]  [<ffffffff8109731d>] kthread+0xed/0x100
> [ 1801.841661]  [<ffffffff81097230>] ? kthread_create_on_node+0x160/0x160
> [ 1801.919919]  [<ffffffff8169cc3c>] ret_from_fork+0x7c/0xb0
> [ 1801.984608]  [<ffffffff81097230>] ? kthread_create_on_node+0x160/0x160
> [ 1802.062825] ---[ end trace d77f2054de000fb7 ]---
>
> This issue is related to the bug 54411:
> https://bugzilla.kernel.org/show_bug.cgi?id=54411
>
> Since we added the pci_bus reference management, the bug becomes a
> invalid list entry warning as descripted above. Beacuse it still
> trys to delete an deleted pci device from device list.
>
> So here we make stop device actually detach driver only, and remove
> device will do device_del instead, and move bus_list change and pci device
> resource free into pci_release_dev, so that it'll consistent with
> bus reference managment, and the device only can be deleted from device
> list in pci_release_dev just for one time.
>
> Besides, it also makes hostbridge to use device_unregister to be pair
> with device_register before.
>
> This patch is based on Yinghai's privious similar patch:
> http://lkml.org/lkml/2013/5/13/658

I have updated version, please check if attached patches fix the problem.

virtfn_release.patch
fix_cx3_hotplug_2.patch
fix_racing_removing_6_1.patch
fix_racing_removing_6_2.patch
fix_racing_removing_6_3.patch

Thanks

Yinghai

[-- Attachment #2: virtfn_release.patch --]
[-- Type: text/x-patch, Size: 4589 bytes --]

Subject: [PATCH] PCI: Release PF ref during removing VF
From: Yinghai Lu <yinghai@kernel.org>

commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
(PCI: Simplify IOV implementation and fix reference count races)
broke the pcie native hotplug with SRIOV enabled: PF is not freed
during hot-remove, and later hot-add do not work as old pci_dev
is still around, and can not create new pci_dev.

That commit need VFs to be removed via pci_disable_sriov/virtfn_remove
to make sure ref to PF is put back.

So we can not call stop_and_remove for VF before PF, that will
make VF get removed directly before PF's driver try to use
pci_disable_sriov/virtfn_remove to do it.

Solution would be
1. separating stop and remove in two iterations.
2. or make pci_stop_and_remove_bus_device could be used on VF.

This patch is second solution:
Separate PF ref releasing from virtfn_remove, all that during
pci_destroy_dev for VFs.

-v2: fix warning when SRIOV is not defined.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/iov.c    |   34 +++++++++++++++++++++++++++++-----
 drivers/pci/pci.h    |    4 ++++
 drivers/pci/remove.c |   17 +++++++++++++++++
 3 files changed, 50 insertions(+), 5 deletions(-)

Index: linux-2.6/drivers/pci/iov.c
===================================================================
--- linux-2.6.orig/drivers/pci/iov.c
+++ linux-2.6/drivers/pci/iov.c
@@ -132,9 +132,37 @@ failed:
 	return rc;
 }
 
-static void virtfn_remove(struct pci_dev *dev, int id, int reset)
+void virtfn_release(struct pci_dev *virtfn)
 {
+	int i;
+	struct pci_dev *dev;
 	char buf[VIRTFN_ID_LEN];
+
+	if (!virtfn->is_virtfn)
+		return;
+
+	dev = virtfn->physfn;
+
+	/*
+	 * Remove link to virtfn for PF.
+	 * pci_disable_sriov() could be called after pci_stop_dev() is
+	 * called for PF. So check sd to avoid spurious sfsfs warning.
+	 */
+	if (dev->dev.kobj.sd)
+		for (i = 0; i < dev->sriov->num_VFs; i++)
+			if ((virtfn_bus(dev, i) == virtfn->bus->number) &&
+			    (virtfn_devfn(dev, i) == virtfn->devfn)) {
+				sprintf(buf, "virtfn%u", i);
+				sysfs_remove_link(&dev->dev.kobj, buf);
+				break;
+			}
+
+	virtfn_remove_bus(dev->bus, virtfn->bus);
+	pci_dev_put(dev);
+}
+
+static void virtfn_remove(struct pci_dev *dev, int id, int reset)
+{
 	struct pci_dev *virtfn;
 	struct pci_sriov *iov = dev->sriov;
 
@@ -149,8 +177,6 @@ static void virtfn_remove(struct pci_dev
 		__pci_reset_function(virtfn);
 	}
 
-	sprintf(buf, "virtfn%u", id);
-	sysfs_remove_link(&dev->dev.kobj, buf);
 	/*
 	 * pci_stop_dev() could have been called for this virtfn already,
 	 * so the directory for the virtfn may have been removed before.
@@ -161,12 +187,10 @@ static void virtfn_remove(struct pci_dev
 
 	mutex_lock(&iov->dev->sriov->lock);
 	pci_stop_and_remove_bus_device(virtfn);
-	virtfn_remove_bus(dev->bus, virtfn->bus);
 	mutex_unlock(&iov->dev->sriov->lock);
 
 	/* balance pci_get_domain_bus_and_slot() */
 	pci_dev_put(virtfn);
-	pci_dev_put(dev);
 }
 
 static int sriov_migration(struct pci_dev *dev)
Index: linux-2.6/drivers/pci/pci.h
===================================================================
--- linux-2.6.orig/drivers/pci/pci.h
+++ linux-2.6/drivers/pci/pci.h
@@ -259,6 +259,7 @@ int pci_iov_resource_bar(struct pci_dev
 resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
 void pci_restore_iov_state(struct pci_dev *dev);
 int pci_iov_bus_range(struct pci_bus *bus);
+void virtfn_release(struct pci_dev *dev);
 
 #else
 static inline int pci_iov_init(struct pci_dev *dev)
@@ -281,6 +282,9 @@ static inline int pci_iov_bus_range(stru
 {
 	return 0;
 }
+static inline void virtfn_release(struct pci_dev *dev)
+{
+}
 
 #endif /* CONFIG_PCI_IOV */
 
Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -38,6 +38,8 @@ static void pci_destroy_dev(struct pci_d
 	list_del(&dev->bus_list);
 	up_write(&pci_bus_sem);
 
+	virtfn_release(dev);
+
 	pci_free_resources(dev);
 	put_device(&dev->dev);
 }
@@ -107,8 +109,23 @@ static void pci_remove_bus_device(struct
  */
 void pci_stop_and_remove_bus_device(struct pci_dev *dev)
 {
+#ifdef CONFIG_PCI_ATS
+	struct pci_sriov *iov = NULL;
+	int locked = 0;
+
+	if (dev->is_virtfn) {
+		iov = dev->physfn->sriov;
+		locked = mutex_trylock(&iov->dev->sriov->lock);
+	}
+#endif
+
 	pci_stop_bus_device(dev);
 	pci_remove_bus_device(dev);
+
+#ifdef CONFIG_PCI_ATS
+	if (locked)
+		mutex_unlock(&iov->dev->sriov->lock);
+#endif
 }
 EXPORT_SYMBOL(pci_stop_and_remove_bus_device);
 

[-- Attachment #3: fix_cx3_hotplug_2.patch --]
[-- Type: text/x-patch, Size: 1575 bytes --]

Subject: [PATCH v2 3/3] PCI: Stop sriov after stop PF if PF's driver skip that
From: Yinghai Lu <yinghai@kernel.org>

After commit dc087f2f6a2925e81831f3016b9cbb6e470e7423
(PCI: Simplify IOV implementation and fix reference count races)
VF need to be removed via virtfn_remove to make sure ref to PF
is put back.

Some driver (like ixgbe) does not call pci_disable_sriov() if
sriov is enabled via /sys/.../sriov_numvfs setting.
ixgbe does allow driver for PF get detached, but still have VFs
around.

But how about PF get removed via /sys or pciehp finally?

During hot-remove, VF will still hold one ref to PF and it
prevent PF to be removed.
That make the next hot-add fails, as old PF dev struct is still around.

We need to add pci_disable_sriov() calling during stop PF .

Need this one for v3.11

-v2: Accoring to Bjorn, move that calling to pci_stop_dev.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Donald Dutile <ddutile@redhat.com>
Cc: Greg Rose <gregory.v.rose@intel.com>

---
 drivers/pci/remove.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -25,6 +25,8 @@ static void pci_stop_dev(struct pci_dev
 		pci_proc_detach_device(dev);
 		pci_remove_sysfs_dev_files(dev);
 		device_del(&dev->dev);
+		/* remove VF, if PF driver skip that */
+		pci_disable_sriov(dev);
 		dev->is_added = 0;
 	}
 

[-- Attachment #4: fix_racing_removing_6_1.patch --]
[-- Type: text/x-patch, Size: 1220 bytes --]

Subject: [PATCH 1/7] PCI: move back pci_proc_attach_devices calling
From: Yinghai Lu <yinghai@kernel.org>

We stop detach proc when pci_stop_device.
So should attach that during pci_bus_add_device.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/bus.c   |    1 +
 drivers/pci/probe.c |    2 --
 2 files changed, 1 insertion(+), 2 deletions(-)

Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -1382,8 +1382,6 @@ void pci_device_add(struct pci_dev *dev,
 	dev->match_driver = false;
 	ret = device_add(&dev->dev);
 	WARN_ON(ret < 0);
-
-	pci_proc_attach_device(dev);
 }
 
 struct pci_dev *__ref pci_scan_single_device(struct pci_bus *bus, int devfn)
Index: linux-2.6/drivers/pci/bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/bus.c
+++ linux-2.6/drivers/pci/bus.c
@@ -175,6 +175,7 @@ int pci_bus_add_device(struct pci_dev *d
 	 * are not assigned yet for some devices.
 	 */
 	pci_fixup_device(pci_fixup_final, dev);
+	pci_proc_attach_device(dev);
 	pci_create_sysfs_dev_files(dev);
 
 	dev->match_driver = true;

[-- Attachment #5: fix_racing_removing_6_2.patch --]
[-- Type: text/x-patch, Size: 2578 bytes --]

Subject: [PATCH 2/7] PCI: move resources and bus_list releasing to pci_release_dev
From: Yinghai Lu <yinghai@kernel.org>

We should not release resource in pci_destroy that is too early
as there could be still other use hold reference.

release them or remove it from bus devices list at last
in pci_release_dev instead.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/probe.c  |   25 +++++++++++++++++++++++--
 drivers/pci/remove.c |   21 ---------------------
 2 files changed, 23 insertions(+), 23 deletions(-)

Index: linux-2.6/drivers/pci/probe.c
===================================================================
--- linux-2.6.orig/drivers/pci/probe.c
+++ linux-2.6/drivers/pci/probe.c
@@ -1155,6 +1155,20 @@ static void pci_release_capabilities(str
 	pci_free_cap_save_buffers(dev);
 }
 
+static void pci_free_resources(struct pci_dev *dev)
+{
+	int i;
+
+	msi_remove_pci_irq_vectors(dev);
+
+	pci_cleanup_rom(dev);
+	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+		struct resource *res = dev->resource + i;
+		if (res->parent)
+			release_resource(res);
+	}
+}
+
 /**
  * pci_release_dev - free a pci device structure when all users of it are finished.
  * @dev: device that's been disconnected
@@ -1164,9 +1178,16 @@ static void pci_release_capabilities(str
  */
 static void pci_release_dev(struct device *dev)
 {
-	struct pci_dev *pci_dev;
+	struct pci_dev *pci_dev = to_pci_dev(dev);
+
+	down_write(&pci_bus_sem);
+	list_del(&pci_dev->bus_list);
+	up_write(&pci_bus_sem);
+
+	virtfn_release(pci_dev);
+
+	pci_free_resources(pci_dev);
 
-	pci_dev = to_pci_dev(dev);
 	pci_release_capabilities(pci_dev);
 	pci_release_of_node(pci_dev);
 	pcibios_release_device(pci_dev);
Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -3,20 +3,6 @@
 #include <linux/pci-aspm.h>
 #include "pci.h"
 
-static void pci_free_resources(struct pci_dev *dev)
-{
-	int i;
-
- 	msi_remove_pci_irq_vectors(dev);
-
-	pci_cleanup_rom(dev);
-	for (i = 0; i < PCI_NUM_RESOURCES; i++) {
-		struct resource *res = dev->resource + i;
-		if (res->parent)
-			release_resource(res);
-	}
-}
-
 static void pci_stop_dev(struct pci_dev *dev)
 {
 	pci_pme_active(dev, false);
@@ -36,13 +22,6 @@ static void pci_stop_dev(struct pci_dev
 
 static void pci_destroy_dev(struct pci_dev *dev)
 {
-	down_write(&pci_bus_sem);
-	list_del(&dev->bus_list);
-	up_write(&pci_bus_sem);
-
-	virtfn_release(dev);
-
-	pci_free_resources(dev);
 	put_device(&dev->dev);
 }
 

[-- Attachment #6: fix_racing_removing_6_3.patch --]
[-- Type: text/x-patch, Size: 1544 bytes --]

Subject: [PATCH 3/7] PCI: Destroy pci dev only once
From: Yinghai Lu <yinghai@kernel.org>

Mutliple removing via /sys will call pci_destroy_dev two times.

Add is_removed to record if pci_destroy_dev is called already.

During second calling, still have extra dev ref hold via
device_schedule_call, so we are safe to check dev->is_removed.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 drivers/pci/remove.c |    5 ++++-
 include/linux/pci.h  |    1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

Index: linux-2.6/drivers/pci/remove.c
===================================================================
--- linux-2.6.orig/drivers/pci/remove.c
+++ linux-2.6/drivers/pci/remove.c
@@ -22,7 +22,10 @@ static void pci_stop_dev(struct pci_dev
 
 static void pci_destroy_dev(struct pci_dev *dev)
 {
-	put_device(&dev->dev);
+	if (!dev->is_removed) {
+		dev->is_removed = 1;
+		put_device(&dev->dev);
+	}
 }
 
 void pci_remove_bus(struct pci_bus *bus)
Index: linux-2.6/include/linux/pci.h
===================================================================
--- linux-2.6.orig/include/linux/pci.h
+++ linux-2.6/include/linux/pci.h
@@ -308,6 +308,7 @@ struct pci_dev {
 	unsigned int	multifunction:1;/* Part of multi-function device */
 	/* keep track of device state */
 	unsigned int	is_added:1;
+	unsigned int	is_removed:1;	/* pci_destroy_dev is called */
 	unsigned int	is_busmaster:1; /* device is busmaster */
 	unsigned int	no_msi:1;	/* device may not use msi */
 	unsigned int	block_cfg_access:1;	/* config space access is blocked */

  parent reply	other threads:[~2013-10-31  6:45 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-30 10:14 [PATCH] pci: fix invalid list entry warning for double pci device removing via sysfs Gu Zheng
2013-10-30 17:08 ` Bjorn Helgaas
2013-10-31 10:11   ` Gu Zheng
2013-10-31  6:45 ` Yinghai Lu [this message]
2013-10-31 10:12   ` Gu Zheng
2013-10-31 13:06   ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAE9FiQXp+9XKbs8ccJVt08icax6SOWioDfL5s+H1SWGi9cTAiw@mail.gmail.com \
    --to=yinghai@kernel.org \
    --cc=bhelgaas@google.com \
    --cc=guz.fnst@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).