All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] pci aer: fix deadlock in do_recovery
@ 2017-09-27 21:42 Govindarajulu Varadarajan
  2017-09-27 21:42 ` [PATCH 1/4] pci: introduce __pci_walk_bus for caller with pci_bus_sem held Govindarajulu Varadarajan
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Govindarajulu Varadarajan @ 2017-09-27 21:42 UTC (permalink / raw)
  To: benve, bhelgaas, linux-pci, linux-kernel, jlbec, hch, mingo, peterz
  Cc: Govindarajulu Varadarajan

I am seeing a dead lock while loading enic driver with sriov enabled.

CPU0					CPU1
---------------------------------------------------------------------
__driver_attach()
device_lock(&dev->mutex) <--- device mutex lock here
driver_probe_device()
pci_enable_sriov()
pci_iov_add_virtfn()
pci_device_add()
					aer_isr()		<--- pci aer error
					do_recovery()
					broadcast_error_message()
					pci_walk_bus()
					down_read(&pci_bus_sem) <--- rd sem
down_write(&pci_bus_sem) <-- stuck on wr sem
					report_error_detected()
					device_lock(&dev->mutex)<--- DEAD LOCK

This can also happen when aer error occurs while pci_dev->sriov_config() is
called.

Only fix I could think of is to lock &pci_bus_sem and try locking all
device->mutex under that pci_bus. If it fails, unlock all device->mutex
and &pci_bus_sem and try again. This approach seems to be hackish and I
do not have better solution. I would like to open the discussion for
this.

Path 1 and 2 are code refactoring for pci locking api. Patch 3 fixes the
issue.

With current fix, we hold mutex lock of parent device and all the
devices under the bus. This can exceed the size of held_locks in lockdep
if number of devices (VFs) exceed 48. Patch 4 extends this 63, max
supported by lockdep.

Govindarajulu Varadarajan (4):
  pci: introduce __pci_walk_bus for caller with pci_bus_sem held
  pci: code refactor pci_bus_lock/unlock/trylock
  pci aer: fix deadlock in do_recovery
  lockdep: make MAX_LOCK_DEPTH configurable from Kconfig

 drivers/pci/bus.c                  | 13 ++++++++--
 drivers/pci/pci.c                  | 38 ++++++++++++++++++++---------
 drivers/pci/pcie/aer/aerdrv_core.c | 50 ++++++++++++++++++++++++++++++--------
 fs/configfs/inode.c                |  2 +-
 include/linux/pci.h                | 18 ++++++++++++++
 include/linux/sched.h              |  3 +--
 kernel/locking/lockdep.c           | 13 +++++-----
 lib/Kconfig.debug                  | 10 ++++++++
 8 files changed, 115 insertions(+), 32 deletions(-)

-- 
2.14.1

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/4] pci: introduce __pci_walk_bus for caller with pci_bus_sem held
  2017-09-27 21:42 [PATCH 0/4] pci aer: fix deadlock in do_recovery Govindarajulu Varadarajan
@ 2017-09-27 21:42 ` Govindarajulu Varadarajan
  2017-09-28 16:12   ` Sinan Kaya
  2017-09-27 21:42 ` [PATCH 2/4] pci: code refactor pci_bus_lock/unlock/trylock Govindarajulu Varadarajan
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Govindarajulu Varadarajan @ 2017-09-27 21:42 UTC (permalink / raw)
  To: benve, bhelgaas, linux-pci, linux-kernel, jlbec, hch, mingo, peterz
  Cc: Govindarajulu Varadarajan

Move pci_bus_sem down/up out of pci_walk_bus and rename it to
__pci_walk_bus. Caller who already hold pci_bus_sem can call
__pci_walk_bus.

Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
---
 drivers/pci/bus.c   | 13 +++++++++++--
 include/linux/pci.h |  2 ++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index bc56cf19afd3..3cababe74af0 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -376,8 +376,10 @@ EXPORT_SYMBOL(pci_bus_add_devices);
  *  We check the return of @cb each time. If it returns anything
  *  other than 0, we break out.
  *
+ *  Should be called with read pci_bus_sem held.
+ *
  */
-void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
+void __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
 		  void *userdata)
 {
 	struct pci_dev *dev;
@@ -386,7 +388,6 @@ void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
 	int retval;
 
 	bus = top;
-	down_read(&pci_bus_sem);
 	next = top->devices.next;
 	for (;;) {
 		if (next == &bus->devices) {
@@ -409,6 +410,14 @@ void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
 		if (retval)
 			break;
 	}
+}
+EXPORT_SYMBOL_GPL(__pci_walk_bus);
+
+void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
+		  void *userdata)
+{
+	down_read(&pci_bus_sem);
+	__pci_walk_bus(top, cb, userdata);
 	up_read(&pci_bus_sem);
 }
 EXPORT_SYMBOL_GPL(pci_walk_bus);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index f68c58a93dd0..b4b1a8a164c0 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1297,6 +1297,8 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max,
 
 void pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
 		  void *userdata);
+void __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
+		    void *userdata);
 int pci_cfg_space_size(struct pci_dev *dev);
 unsigned char pci_bus_max_busnr(struct pci_bus *bus);
 void pci_setup_bridge(struct pci_bus *bus);
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/4] pci: code refactor pci_bus_lock/unlock/trylock
  2017-09-27 21:42 [PATCH 0/4] pci aer: fix deadlock in do_recovery Govindarajulu Varadarajan
  2017-09-27 21:42 ` [PATCH 1/4] pci: introduce __pci_walk_bus for caller with pci_bus_sem held Govindarajulu Varadarajan
@ 2017-09-27 21:42 ` Govindarajulu Varadarajan
  2017-09-27 21:42 ` [PATCH 3/4] pci aer: fix deadlock in do_recovery Govindarajulu Varadarajan
  2017-09-27 21:42 ` [PATCH 4/4] lockdep: make MAX_LOCK_DEPTH configurable from Kconfig Govindarajulu Varadarajan
  3 siblings, 0 replies; 15+ messages in thread
From: Govindarajulu Varadarajan @ 2017-09-27 21:42 UTC (permalink / raw)
  To: benve, bhelgaas, linux-pci, linux-kernel, jlbec, hch, mingo, peterz
  Cc: Govindarajulu Varadarajan

Introduce __pci_bus_trylock and __pci_bus_unlock with lock and unlock cb
functions as arguments. User can pass on what they want to lock in pci_dev.

Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
---
 drivers/pci/pci.c   | 38 +++++++++++++++++++++++++++-----------
 include/linux/pci.h | 16 ++++++++++++++++
 2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 6078dfc11b11..3c6a9210f27c 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4377,29 +4377,38 @@ static void pci_bus_lock(struct pci_bus *bus)
 	}
 }
 
-/* Unlock devices from the bottom of the tree up */
-static void pci_bus_unlock(struct pci_bus *bus)
+void __pci_bus_unlock(struct pci_bus *bus,
+		      void (*unlock)(struct pci_dev *dev))
 {
 	struct pci_dev *dev;
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
 		if (dev->subordinate)
-			pci_bus_unlock(dev->subordinate);
-		pci_dev_unlock(dev);
+			__pci_bus_unlock(dev->subordinate, unlock);
+		unlock(dev);
 	}
 }
+EXPORT_SYMBOL_GPL(__pci_bus_unlock);
 
-/* Return 1 on successful lock, 0 on contention */
-static int pci_bus_trylock(struct pci_bus *bus)
+/* Unlock devices from the bottom of the tree up */
+static void pci_bus_unlock(struct pci_bus *bus)
+{
+	__pci_bus_unlock(bus, pci_dev_unlock);
+}
+
+int __pci_bus_trylock(struct pci_bus *bus,
+		      int (*lock)(struct pci_dev *dev),
+		      void (*unlock)(struct pci_dev *dev))
 {
 	struct pci_dev *dev;
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
-		if (!pci_dev_trylock(dev))
+		if (!lock(dev))
 			goto unlock;
 		if (dev->subordinate) {
-			if (!pci_bus_trylock(dev->subordinate)) {
-				pci_dev_unlock(dev);
+			if (!__pci_bus_trylock(dev->subordinate, lock,
+					       unlock)) {
+				unlock(dev);
 				goto unlock;
 			}
 		}
@@ -4409,11 +4418,18 @@ static int pci_bus_trylock(struct pci_bus *bus)
 unlock:
 	list_for_each_entry_continue_reverse(dev, &bus->devices, bus_list) {
 		if (dev->subordinate)
-			pci_bus_unlock(dev->subordinate);
-		pci_dev_unlock(dev);
+			__pci_bus_unlock(dev->subordinate, unlock);
+		unlock(dev);
 	}
 	return 0;
 }
+EXPORT_SYMBOL_GPL(__pci_bus_trylock);
+
+/* Return 1 on successful lock, 0 on contention */
+static int pci_bus_trylock(struct pci_bus *bus)
+{
+	return __pci_bus_trylock(bus, pci_dev_trylock, pci_dev_unlock);
+}
 
 /* Do any devices on or below this slot prevent a bus reset? */
 static bool pci_slot_resetable(struct pci_slot *slot)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index b4b1a8a164c0..33359c64cd2e 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1515,6 +1515,22 @@ void pci_cfg_access_lock(struct pci_dev *dev);
 bool pci_cfg_access_trylock(struct pci_dev *dev);
 void pci_cfg_access_unlock(struct pci_dev *dev);
 
+void __pci_bus_unlock(struct pci_bus *bus,
+		      void (*unlock)(struct pci_dev *dev));
+int __pci_bus_trylock(struct pci_bus *bus,
+		      int (*lock)(struct pci_dev *dev),
+		      void (*unlock)(struct pci_dev *dev));
+static inline int pci_device_trylock(struct pci_dev *dev)
+{
+	return device_trylock(&dev->dev);
+}
+
+static inline void pci_device_unlock(struct pci_dev *dev)
+{
+	device_unlock(&dev->dev);
+}
+
+
 /*
  * PCI domain support.  Sometimes called PCI segment (eg by ACPI),
  * a PCI domain is defined to be a set of PCI buses which share
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/4] pci aer: fix deadlock in do_recovery
  2017-09-27 21:42 [PATCH 0/4] pci aer: fix deadlock in do_recovery Govindarajulu Varadarajan
  2017-09-27 21:42 ` [PATCH 1/4] pci: introduce __pci_walk_bus for caller with pci_bus_sem held Govindarajulu Varadarajan
  2017-09-27 21:42 ` [PATCH 2/4] pci: code refactor pci_bus_lock/unlock/trylock Govindarajulu Varadarajan
@ 2017-09-27 21:42 ` Govindarajulu Varadarajan
  2017-09-28 16:47   ` Sinan Kaya
  2017-09-27 21:42 ` [PATCH 4/4] lockdep: make MAX_LOCK_DEPTH configurable from Kconfig Govindarajulu Varadarajan
  3 siblings, 1 reply; 15+ messages in thread
From: Govindarajulu Varadarajan @ 2017-09-27 21:42 UTC (permalink / raw)
  To: benve, bhelgaas, linux-pci, linux-kernel, jlbec, hch, mingo, peterz
  Cc: Govindarajulu Varadarajan

CPU0					CPU1
---------------------------------------------------------------------
__driver_attach()
device_lock(&dev->mutex) <--- device mutex lock here
driver_probe_device()
pci_enable_sriov()
pci_iov_add_virtfn()
pci_device_add()
					aer_isr()		<--- pci aer error
					do_recovery()
					broadcast_error_message()
					pci_walk_bus()
					down_read(&pci_bus_sem) <--- rd sem
down_write(&pci_bus_sem) <-- stuck on wr sem
					report_error_detected()
					device_lock(&dev->mutex)<--- DEAD LOCK

This can also happen when aer error occurs while pci_dev->sriov_config() is
called.

Only fix I could think of is to lock &pci_bus_sem and try locking all
device->mutex under that pci_bus. If it fails, unlock all device->mutex
and &pci_bus_sem and try again.

[   70.984091] pcieport 0000:00:02.0: AER: Uncorrected (Non-Fatal) error received: id=0010
[   70.984112] pcieport 0000:00:02.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0010(Requester ID)
[   70.984116] pcieport 0000:00:02.0:   device [8086:3c04] error status/mask=00004000/00100000
[   70.984120] pcieport 0000:00:02.0:    [14] Completion Timeout     (First)
...
[  107.484190] INFO: task kworker/0:1:76 blocked for more than 30 seconds.
[  107.563619]       Not tainted 4.13.0+ #28
[  107.611618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  107.705368] kworker/0:1     D    0    76      2 0x80000000
[  107.771050] Workqueue: events aer_isr
[  107.814895] Call Trace:
[  107.844181]  __schedule+0x312/0xa40
[  107.885928]  schedule+0x3d/0x90
[  107.923506]  schedule_preempt_disabled+0x15/0x20
[  107.978773]  __mutex_lock+0x304/0xa30
[  108.022594]  ? dev_printk_emit+0x3b/0x50
[  108.069534]  ? report_error_detected+0xa6/0x210
[  108.123770]  mutex_lock_nested+0x1b/0x20
[  108.170713]  ? mutex_lock_nested+0x1b/0x20
[  108.219730]  report_error_detected+0xa6/0x210
[  108.271881]  ? aer_recover_queue+0xe0/0xe0
[  108.320904]  pci_walk_bus+0x46/0x90
[  108.362645]  ? aer_recover_queue+0xe0/0xe0
[  108.411658]  broadcast_error_message+0xc3/0xf0
[  108.464835]  do_recovery+0x34/0x220
[  108.506569]  ? get_device_error_info+0x92/0x130
[  108.560785]  aer_isr+0x28f/0x3b0
[  108.599410]  process_one_work+0x277/0x6c0
[  108.647399]  worker_thread+0x4d/0x3b0
[  108.691218]  kthread+0x171/0x190
[  108.729830]  ? process_one_work+0x6c0/0x6c0
[  108.779888]  ? kthread_create_on_node+0x40/0x40
[  108.834110]  ret_from_fork+0x2a/0x40
[  108.876916] INFO: task kworker/0:2:205 blocked for more than 30 seconds.
[  108.957129]       Not tainted 4.13.0+ #28
[  109.005114] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  109.098873] kworker/0:2     D    0   205      2 0x80000000
[  109.164544] Workqueue: events work_for_cpu_fn
[  109.216681] Call Trace:
[  109.245943]  __schedule+0x312/0xa40
[  109.287687]  ? rwsem_down_write_failed+0x308/0x4f0
[  109.345021]  schedule+0x3d/0x90
[  109.382603]  rwsem_down_write_failed+0x30d/0x4f0
[  109.437869]  ? __lock_acquire+0x75c/0x1410
[  109.486910]  call_rwsem_down_write_failed+0x17/0x30
[  109.545287]  ? call_rwsem_down_write_failed+0x17/0x30
[  109.605752]  down_write+0x88/0xb0
[  109.645410]  pci_device_add+0x158/0x240
[  109.691313]  pci_iov_add_virtfn+0x24f/0x340
[  109.741375]  pci_enable_sriov+0x32b/0x420
[  109.789466]  ? pci_read+0x2c/0x30
[  109.829142]  enic_probe+0x5d4/0xff0 [enic]
[  109.878184]  ? trace_hardirqs_on+0xd/0x10
[  109.926180]  local_pci_probe+0x42/0xa0
[  109.971037]  work_for_cpu_fn+0x14/0x20
[  110.015898]  process_one_work+0x277/0x6c0
[  110.063884]  worker_thread+0x1d6/0x3b0
[  110.108750]  kthread+0x171/0x190
[  110.147363]  ? process_one_work+0x6c0/0x6c0
[  110.197426]  ? kthread_create_on_node+0x40/0x40
[  110.251642]  ret_from_fork+0x2a/0x40
[  110.294448] INFO: task systemd-udevd:492 blocked for more than 30 seconds.
[  110.376742]       Not tainted 4.13.0+ #28
[  110.424715] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  110.518457] systemd-udevd   D    0   492    444 0x80000180
[  110.584116] Call Trace:
[  110.613382]  __schedule+0x312/0xa40
[  110.655127]  ? wait_for_completion+0x14a/0x1d0
[  110.708302]  schedule+0x3d/0x90
[  110.745875]  schedule_timeout+0x26e/0x5b0
[  110.793887]  ? wait_for_completion+0x14a/0x1d0
[  110.847068]  wait_for_completion+0x169/0x1d0
[  110.898165]  ? wait_for_completion+0x169/0x1d0
[  110.951354]  ? wake_up_q+0x80/0x80
[  110.992060]  flush_work+0x237/0x300
[  111.033795]  ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[  111.093224]  ? wait_for_completion+0x5a/0x1d0
[  111.145363]  ? flush_work+0x237/0x300
[  111.189189]  work_on_cpu+0x94/0xb0
[  111.229894]  ? work_is_static_object+0x20/0x20
[  111.283070]  ? pci_device_shutdown+0x60/0x60
[  111.334173]  pci_device_probe+0x17a/0x190
[  111.382163]  driver_probe_device+0x2ff/0x450
[  111.433260]  __driver_attach+0x103/0x140
[  111.480195]  ? driver_probe_device+0x450/0x450
[  111.533381]  bus_for_each_dev+0x74/0xb0
[  111.579276]  driver_attach+0x1e/0x20
[  111.622056]  bus_add_driver+0x1ca/0x270
[  111.667955]  ? 0xffffffffc039c000
[  111.707616]  driver_register+0x60/0xe0
[  111.752472]  ? 0xffffffffc039c000
[  111.792126]  __pci_register_driver+0x6b/0x70
[  111.843275]  enic_init_module+0x38/0x1000 [enic]
[  111.898533]  do_one_initcall+0x50/0x192
[  111.944428]  ? trace_hardirqs_on+0xd/0x10
[  111.992408]  do_init_module+0x5f/0x1f2
[  112.037274]  load_module+0x1740/0x1f70
[  112.082148]  SYSC_finit_module+0xd7/0xf0
[  112.129083]  ? SYSC_finit_module+0xd7/0xf0
[  112.178106]  SyS_finit_module+0xe/0x10
[  112.222972]  do_syscall_64+0x69/0x180
[  112.266793]  entry_SYSCALL64_slow_path+0x25/0x25
[  112.322047] RIP: 0033:0x7f3da098b559
[  112.364826] RSP: 002b:00007ffeb3306a38 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[  112.455447] RAX: ffffffffffffffda RBX: 0000557fe41ed3d0 RCX: 00007f3da098b559
[  112.540860] RDX: 0000000000000000 RSI: 00007f3da14c79c5 RDI: 0000000000000006
[  112.626281] RBP: 00007f3da14c79c5 R08: 0000000000000000 R09: 00007ffeb3306b50
[  112.711698] R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000000000
[  112.797114] R13: 0000557fe420e210 R14: 0000000000020000 R15: 0000557fe2c1ef4a
[  112.882568]
               Showing all locks held in the system:
[  112.956545] 5 locks held by kworker/0:1/76:
[  113.006616]  #0:  ("events"){+.+.}, at: [<ffffffffb00b10ed>] process_one_work+0x1ed/0x6c0
[  113.104535]  #1:  ((&rpc->dpc_handler)){+.+.}, at: [<ffffffffb00b10ed>] process_one_work+0x1ed/0x6c0
[  113.213894]  #2:  (&rpc->rpc_mutex){+.+.}, at: [<ffffffffb0505ca2>] aer_isr+0x32/0x3b0
[  113.308711]  #3:  (pci_bus_sem){++++}, at: [<ffffffffb04ea18a>] pci_walk_bus+0x2a/0x90
[  113.403501]  #4:  (&dev->mutex){....}, at: [<ffffffffb0505706>] report_error_detected+0xa6/0x210
[  113.508715] 3 locks held by kworker/0:2/205:
[  113.559808]  #0:  ("events"){+.+.}, at: [<ffffffffb00b10ed>] process_one_work+0x1ed/0x6c0
[  113.657718]  #1:  ((&wfc.work)){+.+.}, at: [<ffffffffb00b10ed>] process_one_work+0x1ed/0x6c0
[  113.758745]  #2:  (pci_bus_sem){++++}, at: [<ffffffffb04ec978>] pci_device_add+0x158/0x240
[  113.857710] 1 lock held by khungtaskd/239:
[  113.906729]  #0:  (tasklist_lock){.+.+}, at: [<ffffffffb00f07dd>] debug_show_all_locks+0x3d/0x1a0
[  114.012972] 2 locks held by systemd-udevd/492:
[  114.066148]  #0:  (&dev->mutex){....}, at: [<ffffffffb06254d5>] __driver_attach+0x55/0x140
[  114.165107]  #1:  (&dev->mutex){....}, at: [<ffffffffb06254f2>] __driver_attach+0x72/0x140

[  114.281879] =============================================

Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
---
 drivers/pci/pcie/aer/aerdrv_core.c | 50 ++++++++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 890efcc574cb..a3869a3b6e82 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -26,6 +26,7 @@
 #include <linux/slab.h>
 #include <linux/kfifo.h>
 #include "aerdrv.h"
+#include "../../pci.h"
 
 #define	PCI_EXP_AER_FLAGS	(PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \
 				 PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE)
@@ -241,7 +242,6 @@ static int report_error_detected(struct pci_dev *dev, void *data)
 	struct aer_broadcast_data *result_data;
 	result_data = (struct aer_broadcast_data *) data;
 
-	device_lock(&dev->dev);
 	dev->error_state = result_data->state;
 
 	if (!dev->driver ||
@@ -281,7 +281,6 @@ static int report_error_detected(struct pci_dev *dev, void *data)
 	}
 
 	result_data->result = merge_result(result_data->result, vote);
-	device_unlock(&dev->dev);
 	return 0;
 }
 
@@ -292,7 +291,6 @@ static int report_mmio_enabled(struct pci_dev *dev, void *data)
 	struct aer_broadcast_data *result_data;
 	result_data = (struct aer_broadcast_data *) data;
 
-	device_lock(&dev->dev);
 	if (!dev->driver ||
 		!dev->driver->err_handler ||
 		!dev->driver->err_handler->mmio_enabled)
@@ -302,7 +300,6 @@ static int report_mmio_enabled(struct pci_dev *dev, void *data)
 	vote = err_handler->mmio_enabled(dev);
 	result_data->result = merge_result(result_data->result, vote);
 out:
-	device_unlock(&dev->dev);
 	return 0;
 }
 
@@ -313,7 +310,6 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
 	struct aer_broadcast_data *result_data;
 	result_data = (struct aer_broadcast_data *) data;
 
-	device_lock(&dev->dev);
 	if (!dev->driver ||
 		!dev->driver->err_handler ||
 		!dev->driver->err_handler->slot_reset)
@@ -323,7 +319,6 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
 	vote = err_handler->slot_reset(dev);
 	result_data->result = merge_result(result_data->result, vote);
 out:
-	device_unlock(&dev->dev);
 	return 0;
 }
 
@@ -331,7 +326,6 @@ static int report_resume(struct pci_dev *dev, void *data)
 {
 	const struct pci_error_handlers *err_handler;
 
-	device_lock(&dev->dev);
 	dev->error_state = pci_channel_io_normal;
 
 	if (!dev->driver ||
@@ -342,10 +336,46 @@ static int report_resume(struct pci_dev *dev, void *data)
 	err_handler = dev->driver->err_handler;
 	err_handler->resume(dev);
 out:
-	device_unlock(&dev->dev);
 	return 0;
 }
 
+static void aer_pci_walk_bus(struct pci_bus *bus,
+			     int (*cb)(struct pci_dev *, void *),
+			     struct aer_broadcast_data *res)
+{
+	bool locked;
+	uint8_t i;
+
+	for (i = 1; i; i++) {
+		/* PCI driver could hold device->mutex lock and call driver
+		 * cb functions which may try to aquire pci_bus_sem.
+		 * Trying to aquiring device->mutex lock holding pci_bus_sem
+		 * could lead to deadlock.
+		 *
+		 * Holding pci_bus_sem lets try to aquire device->mutex lock.
+		 * If trylock(device->mutex) fails, unlock pci_bus_sem and
+		 * try again.
+		 */
+		down_read(&pci_bus_sem);
+		locked = __pci_bus_trylock(bus, pci_device_trylock,
+					   pci_device_unlock);
+		if (locked)
+			goto out;
+		up_read(&pci_bus_sem);
+		dev_info(&bus->self->dev, "Could not aquire device lock on all subordinates, trying again.");
+		msleep(25);
+	};
+
+	res->result = PCI_ERS_RESULT_NONE;
+	dev_err(&bus->self->dev, "Could not aquire lock. No aer recovery done.");
+	return;
+out:
+	/* all devices under this subordinate is locked */
+	__pci_walk_bus(bus, cb, res);
+	__pci_bus_unlock(bus, pci_device_unlock);
+	up_read(&pci_bus_sem);
+}
+
 /**
  * broadcast_error_message - handle message broadcast to downstream drivers
  * @dev: pointer to from where in a hierarchy message is broadcasted down
@@ -380,7 +410,7 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
 		 */
 		if (cb == report_error_detected)
 			dev->error_state = state;
-		pci_walk_bus(dev->subordinate, cb, &result_data);
+		aer_pci_walk_bus(dev->subordinate, cb, &result_data);
 		if (cb == report_resume) {
 			pci_cleanup_aer_uncorrect_error_status(dev);
 			dev->error_state = pci_channel_io_normal;
@@ -390,7 +420,7 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
 		 * If the error is reported by an end point, we think this
 		 * error is related to the upstream link of the end point.
 		 */
-		pci_walk_bus(dev->bus, cb, &result_data);
+		aer_pci_walk_bus(dev->bus, cb, &result_data);
 	}
 
 	return result_data.result;
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/4] lockdep: make MAX_LOCK_DEPTH configurable from Kconfig
  2017-09-27 21:42 [PATCH 0/4] pci aer: fix deadlock in do_recovery Govindarajulu Varadarajan
                   ` (2 preceding siblings ...)
  2017-09-27 21:42 ` [PATCH 3/4] pci aer: fix deadlock in do_recovery Govindarajulu Varadarajan
@ 2017-09-27 21:42 ` Govindarajulu Varadarajan
  2017-09-28  9:26   ` Peter Zijlstra
  3 siblings, 1 reply; 15+ messages in thread
From: Govindarajulu Varadarajan @ 2017-09-27 21:42 UTC (permalink / raw)
  To: benve, bhelgaas, linux-pci, linux-kernel, jlbec, hch, mingo, peterz
  Cc: Govindarajulu Varadarajan

Make MAX_LOCK_DEPTH configurable. It is set to 48 right now. Number of
VFs under a PCI pf bus can exceed 48 and this disables lockdep.

lockdep currently allows max of 63 held_locks.

Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
---
 fs/configfs/inode.c      |  2 +-
 include/linux/sched.h    |  3 +--
 kernel/locking/lockdep.c | 13 +++++++------
 lib/Kconfig.debug        | 10 ++++++++++
 4 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/fs/configfs/inode.c b/fs/configfs/inode.c
index ad718e5e37bb..a1e2a7ce69d0 100644
--- a/fs/configfs/inode.c
+++ b/fs/configfs/inode.c
@@ -41,7 +41,7 @@
 #include "configfs_internal.h"
 
 #ifdef CONFIG_LOCKDEP
-static struct lock_class_key default_group_class[MAX_LOCK_DEPTH];
+static struct lock_class_key default_group_class[CONFIG_MAX_LOCK_DEPTH];
 #endif
 
 static const struct address_space_operations configfs_aops = {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 92fb8dd5a9e4..9a81eec702be 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -841,11 +841,10 @@ struct task_struct {
 #endif
 
 #ifdef CONFIG_LOCKDEP
-# define MAX_LOCK_DEPTH			48UL
 	u64				curr_chain_key;
 	int				lockdep_depth;
 	unsigned int			lockdep_recursion;
-	struct held_lock		held_locks[MAX_LOCK_DEPTH];
+	struct held_lock		held_locks[CONFIG_MAX_LOCK_DEPTH];
 #endif
 
 #ifdef CONFIG_LOCKDEP_CROSSRELEASE
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 44c8d0d17170..22a4a338616d 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3404,7 +3404,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	/*
 	 * Ran out of static storage for our per-task lock stack again have we?
 	 */
-	if (DEBUG_LOCKS_WARN_ON(depth >= MAX_LOCK_DEPTH))
+	if (DEBUG_LOCKS_WARN_ON(depth >= CONFIG_MAX_LOCK_DEPTH))
 		return 0;
 
 	class_idx = class - lock_classes + 1;
@@ -3513,11 +3513,11 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	if (unlikely(!debug_locks))
 		return 0;
 #endif
-	if (unlikely(curr->lockdep_depth >= MAX_LOCK_DEPTH)) {
+	if (unlikely(curr->lockdep_depth >= CONFIG_MAX_LOCK_DEPTH)) {
 		debug_locks_off();
 		print_lockdep_off("BUG: MAX_LOCK_DEPTH too low!");
 		printk(KERN_DEBUG "depth: %i  max: %lu!\n",
-		       curr->lockdep_depth, MAX_LOCK_DEPTH);
+		       curr->lockdep_depth, CONFIG_MAX_LOCK_DEPTH);
 
 		lockdep_print_held_locks(current);
 		debug_show_all_locks();
@@ -4276,7 +4276,8 @@ void lockdep_reset(void)
 	current->curr_chain_key = 0;
 	current->lockdep_depth = 0;
 	current->lockdep_recursion = 0;
-	memset(current->held_locks, 0, MAX_LOCK_DEPTH*sizeof(struct held_lock));
+	memset(current->held_locks, 0,
+	       CONFIG_MAX_LOCK_DEPTH * sizeof(struct held_lock));
 	nr_hardirq_chains = 0;
 	nr_softirq_chains = 0;
 	nr_process_chains = 0;
@@ -4421,7 +4422,7 @@ void __init lockdep_info(void)
 	printk("Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar\n");
 
 	printk("... MAX_LOCKDEP_SUBCLASSES:  %lu\n", MAX_LOCKDEP_SUBCLASSES);
-	printk("... MAX_LOCK_DEPTH:          %lu\n", MAX_LOCK_DEPTH);
+	printk("... MAX_LOCK_DEPTH:          %lu\n", CONFIG_MAX_LOCK_DEPTH);
 	printk("... MAX_LOCKDEP_KEYS:        %lu\n", MAX_LOCKDEP_KEYS);
 	printk("... CLASSHASH_SIZE:          %lu\n", CLASSHASH_SIZE);
 	printk("... MAX_LOCKDEP_ENTRIES:     %lu\n", MAX_LOCKDEP_ENTRIES);
@@ -4441,7 +4442,7 @@ void __init lockdep_info(void)
 		);
 
 	printk(" per task-struct memory footprint: %lu bytes\n",
-		sizeof(struct held_lock) * MAX_LOCK_DEPTH);
+		sizeof(struct held_lock) * CONFIG_MAX_LOCK_DEPTH);
 }
 
 static void
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 2689b7c50c52..60bc084315a6 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1187,6 +1187,16 @@ config DEBUG_LOCKDEP
 	  additional runtime checks to debug itself, at the price
 	  of more runtime overhead.
 
+config MAX_LOCK_DEPTH
+	int "Maximum held_locks depth"
+	depends on DEBUG_LOCKDEP
+	default 48
+	range 1 63
+	help
+	  This is maximum held_lock depth in task_struct for debugging.
+	  Increment if you think a task can hold more than default(48) locks.
+	  If unsure, set to default value, 48.
+
 config DEBUG_ATOMIC_SLEEP
 	bool "Sleep inside atomic section checking"
 	select PREEMPT_COUNT
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] lockdep: make MAX_LOCK_DEPTH configurable from Kconfig
  2017-09-27 21:42 ` [PATCH 4/4] lockdep: make MAX_LOCK_DEPTH configurable from Kconfig Govindarajulu Varadarajan
@ 2017-09-28  9:26   ` Peter Zijlstra
  2017-09-28 23:51     ` Govindarajulu Varadarajan
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2017-09-28  9:26 UTC (permalink / raw)
  To: Govindarajulu Varadarajan
  Cc: benve, bhelgaas, linux-pci, linux-kernel, jlbec, hch, mingo

On Wed, Sep 27, 2017 at 02:42:20PM -0700, Govindarajulu Varadarajan wrote:
> Make MAX_LOCK_DEPTH configurable. It is set to 48 right now. Number of
> VFs under a PCI pf bus can exceed 48 and this disables lockdep.
> 
> lockdep currently allows max of 63 held_locks.

But why a config knob? Why not just raise the number to 64
unconditionally? And is that sufficient; you only state 48 is
insufficient, you don't actually state the VF limit.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] pci: introduce __pci_walk_bus for caller with pci_bus_sem held
  2017-09-27 21:42 ` [PATCH 1/4] pci: introduce __pci_walk_bus for caller with pci_bus_sem held Govindarajulu Varadarajan
@ 2017-09-28 16:12   ` Sinan Kaya
  2017-09-28 23:52     ` Govindarajulu Varadarajan
  0 siblings, 1 reply; 15+ messages in thread
From: Sinan Kaya @ 2017-09-28 16:12 UTC (permalink / raw)
  To: Govindarajulu Varadarajan, benve, bhelgaas, linux-pci,
	linux-kernel, jlbec, hch, mingo, peterz

On 9/27/2017 5:42 PM, Govindarajulu Varadarajan wrote:
> +void __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
> +		    void *userdata);

pci_walk_bus_locked would be a better name as you are assuming that caller is
holding the lock. 

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] pci aer: fix deadlock in do_recovery
  2017-09-27 21:42 ` [PATCH 3/4] pci aer: fix deadlock in do_recovery Govindarajulu Varadarajan
@ 2017-09-28 16:47   ` Sinan Kaya
  2017-09-28 23:46     ` Govindarajulu Varadarajan
  0 siblings, 1 reply; 15+ messages in thread
From: Sinan Kaya @ 2017-09-28 16:47 UTC (permalink / raw)
  To: Govindarajulu Varadarajan, benve, bhelgaas, linux-pci,
	linux-kernel, jlbec, hch, mingo, peterz

On 9/27/2017 5:42 PM, Govindarajulu Varadarajan wrote:
> CPU0					CPU1
> ---------------------------------------------------------------------
> __driver_attach()
> device_lock(&dev->mutex) <--- device mutex lock here
> driver_probe_device()
> pci_enable_sriov()
> pci_iov_add_virtfn()
> pci_device_add()
> 					aer_isr()		<--- pci aer error
> 					do_recovery()
> 					broadcast_error_message()
> 					pci_walk_bus()
> 					down_read(&pci_bus_sem) <--- rd sem

How about releasing the device_lock here on CPU0? or in other words keep
device_lock as short as possible?

> down_write(&pci_bus_sem) <-- stuck on wr sem
> 					report_error_detected()
> 					device_lock(&dev->mutex)<--- DEAD LOCK


-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] pci aer: fix deadlock in do_recovery
  2017-09-28 16:47   ` Sinan Kaya
@ 2017-09-28 23:46     ` Govindarajulu Varadarajan
  2017-09-29 13:32       ` Sinan Kaya
  0 siblings, 1 reply; 15+ messages in thread
From: Govindarajulu Varadarajan @ 2017-09-28 23:46 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: benve, bhelgaas, linux-pci, linux-kernel, jlbec, hch, mingo, peterz

On Thu, 28 Sep 2017, Sinan Kaya wrote:

> On 9/27/2017 5:42 PM, Govindarajulu Varadarajan wrote:
>> CPU0					CPU1
>> ---------------------------------------------------------------------
>> __driver_attach()
>> device_lock(&dev->mutex) <--- device mutex lock here
>> driver_probe_device()
>> pci_enable_sriov()
>> pci_iov_add_virtfn()
>> pci_device_add()
>> 					aer_isr()		<--- pci aer error
>> 					do_recovery()
>> 					broadcast_error_message()
>> 					pci_walk_bus()
>> 					down_read(&pci_bus_sem) <--- rd sem
>
> How about releasing the device_lock here on CPU0?>

pci_device_add() is called by driver's pci probe function. device_lock(dev)
should be held before calling pci driver probe function.

> or in other words keep device_lock as short as possible?

The problem is not the duration device_lock is held. It is the order two locks
are aquired. We cannot control or implement a restriction that during
device_lock() is held, driver probe should not call pci function which aquires
pci_bus_sem. And in case of pci aer, aer handler needs to call driver err_handler()
for which we need to hold device_lock() before calling err_handler(). In order
to find all the devices on a pci bus, we should hold pci_bus_sem to do
pci_walk_bus().

>> down_write(&pci_bus_sem) <-- stuck on wr sem
>> 					report_error_detected()
>> 					device_lock(&dev->mutex)<--- DEAD LOCK

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] lockdep: make MAX_LOCK_DEPTH configurable from Kconfig
  2017-09-28  9:26   ` Peter Zijlstra
@ 2017-09-28 23:51     ` Govindarajulu Varadarajan
  2017-09-29 16:23       ` Bjorn Helgaas
  0 siblings, 1 reply; 15+ messages in thread
From: Govindarajulu Varadarajan @ 2017-09-28 23:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: benve, bhelgaas, linux-pci, linux-kernel, jlbec, hch, mingo

On Thu, 28 Sep 2017, Peter Zijlstra wrote:

> On Wed, Sep 27, 2017 at 02:42:20PM -0700, Govindarajulu Varadarajan wrote:
>> Make MAX_LOCK_DEPTH configurable. It is set to 48 right now. Number of
>> VFs under a PCI pf bus can exceed 48 and this disables lockdep.
>>
>> lockdep currently allows max of 63 held_locks.
>
> But why a config knob? Why not just raise the number to 64
> unconditionally? And is that sufficient; you only state 48 is
> insufficient, you don't actually state the VF limit.
>

I did not want to change the default configuration for everyone.

I will change it 63 unconditionally in v2 and resubmit the series.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/4] pci: introduce __pci_walk_bus for caller with pci_bus_sem held
  2017-09-28 16:12   ` Sinan Kaya
@ 2017-09-28 23:52     ` Govindarajulu Varadarajan
  0 siblings, 0 replies; 15+ messages in thread
From: Govindarajulu Varadarajan @ 2017-09-28 23:52 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: benve, bhelgaas, linux-pci, linux-kernel, jlbec, hch, mingo, peterz

On Thu, 28 Sep 2017, Sinan Kaya wrote:

> On 9/27/2017 5:42 PM, Govindarajulu Varadarajan wrote:
>> +void __pci_walk_bus(struct pci_bus *top, int (*cb)(struct pci_dev *, void *),
>> +		    void *userdata);
>
> pci_walk_bus_locked would be a better name as you are assuming that caller is
> holding the lock.
>

Will change and resubmit for v2.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] pci aer: fix deadlock in do_recovery
  2017-09-28 23:46     ` Govindarajulu Varadarajan
@ 2017-09-29 13:32       ` Sinan Kaya
  2017-09-30  6:00         ` Govindarajulu Varadarajan
  0 siblings, 1 reply; 15+ messages in thread
From: Sinan Kaya @ 2017-09-29 13:32 UTC (permalink / raw)
  To: Govindarajulu Varadarajan
  Cc: benve, bhelgaas, linux-pci, linux-kernel, jlbec, hch, mingo, peterz

On 9/28/2017 7:46 PM, Govindarajulu Varadarajan wrote:
>> How about releasing the device_lock here on CPU0?>
> 
> pci_device_add() is called by driver's pci probe function. device_lock(dev)
> should be held before calling pci driver probe function.

I see. The goal of the lock held here is to protect probe() operation from
being disrupted. I also don't think we can change this. 

> 
>> or in other words keep device_lock as short as possible?
> 
> The problem is not the duration device_lock is held. It is the order two locks
> are aquired. We cannot control or implement a restriction that during
> device_lock() is held, driver probe should not call pci function which aquires
> pci_bus_sem. And in case of pci aer, aer handler needs to call driver err_handler()
> for which we need to hold device_lock() before calling err_handler(). In order
> to find all the devices on a pci bus, we should hold pci_bus_sem to do
> pci_walk_bus().

I was reacting to this to see if there is a better way to do this.

"Only fix I could think of is to lock &pci_bus_sem and try locking all
device->mutex under that pci_bus. If it fails, unlock all device->mutex
and &pci_bus_sem and try again."

How about gracefully returning from report_error_detected() when we cannot obtain
the device_lock() by replacing it with device_trylock()? 

aer_pci_walk_bus() can still poll like you did until it gets the lock. At least,
we don't get to introduce a new API, new lock semantics and code refactoring.
__pci_bus_trylock() looked very powerful and also dangerously flexible to
introduce new bugs to me.

For instance, you called it like this.

+		down_read(&pci_bus_sem);
+		locked = __pci_bus_trylock(bus, pci_device_trylock,
+					   pci_device_unlock);

pci_bus_trylock() would obtain device + cfg locks whereas pci_device_trylock() only
obtains the device lock. Can it race against cfg lock? It depends on the caller.
Very subtle difference.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] lockdep: make MAX_LOCK_DEPTH configurable from Kconfig
  2017-09-28 23:51     ` Govindarajulu Varadarajan
@ 2017-09-29 16:23       ` Bjorn Helgaas
  2017-09-30  6:03         ` Govindarajulu Varadarajan
  0 siblings, 1 reply; 15+ messages in thread
From: Bjorn Helgaas @ 2017-09-29 16:23 UTC (permalink / raw)
  To: Govindarajulu Varadarajan
  Cc: Peter Zijlstra, benve, bhelgaas, linux-pci, linux-kernel, jlbec,
	hch, mingo

On Thu, Sep 28, 2017 at 04:51:46PM -0700, Govindarajulu Varadarajan wrote:
> On Thu, 28 Sep 2017, Peter Zijlstra wrote:
> 
> >On Wed, Sep 27, 2017 at 02:42:20PM -0700, Govindarajulu Varadarajan wrote:
> >>Make MAX_LOCK_DEPTH configurable. It is set to 48 right now. Number of
> >>VFs under a PCI pf bus can exceed 48 and this disables lockdep.
> >>
> >>lockdep currently allows max of 63 held_locks.
> >
> >But why a config knob? Why not just raise the number to 64
> >unconditionally? And is that sufficient; you only state 48 is
> >insufficient, you don't actually state the VF limit.
> >
> 
> I did not want to change the default configuration for everyone.
> 
> I will change it 63 unconditionally in v2 and resubmit the series.

I'm not happy about having to increase MAX_LOCK_DEPTH based on a
number of VFs.  I haven't had time to look at the locking strategy
you're proposing, but it just doesn't feel right to have to take 50+
locks for one operation.

Bjorn

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/4] pci aer: fix deadlock in do_recovery
  2017-09-29 13:32       ` Sinan Kaya
@ 2017-09-30  6:00         ` Govindarajulu Varadarajan
  0 siblings, 0 replies; 15+ messages in thread
From: Govindarajulu Varadarajan @ 2017-09-30  6:00 UTC (permalink / raw)
  To: Sinan Kaya
  Cc: benve, bhelgaas, linux-pci, linux-kernel, jlbec, hch, mingo, peterz

On Fri, 29 Sep 2017, Sinan Kaya wrote:

> On 9/28/2017 7:46 PM, Govindarajulu Varadarajan wrote:
>>> How about releasing the device_lock here on CPU0?>
>>
>> pci_device_add() is called by driver's pci probe function. device_lock(dev)
>> should be held before calling pci driver probe function.
>
> I see. The goal of the lock held here is to protect probe() operation from
> being disrupted. I also don't think we can change this.
>
>>
>>> or in other words keep device_lock as short as possible?
>>
>> The problem is not the duration device_lock is held. It is the order two locks
>> are aquired. We cannot control or implement a restriction that during
>> device_lock() is held, driver probe should not call pci function which aquires
>> pci_bus_sem. And in case of pci aer, aer handler needs to call driver err_handler()
>> for which we need to hold device_lock() before calling err_handler(). In order
>> to find all the devices on a pci bus, we should hold pci_bus_sem to do
>> pci_walk_bus().
>
> I was reacting to this to see if there is a better way to do this.
>
> "Only fix I could think of is to lock &pci_bus_sem and try locking all
> device->mutex under that pci_bus. If it fails, unlock all device->mutex
> and &pci_bus_sem and try again."
>
> How about gracefully returning from report_error_detected() when we cannot obtain
> the device_lock() by replacing it with device_trylock()?
>

Some of the devices may miss the error reporthing. I have sent V2 where we do
a pci_bus_walk and adds all the devices to a list. After unlocking (up_read)
&pci_bus_sem, we go through the list and call err_handler of the devices with
devic_lock() held. This way, we dont try to hold both locks at same time and
we dont hold 50+ device_lock. Let me know if this approach is better.

> aer_pci_walk_bus() can still poll like you did until it gets the lock. At least,
> we don't get to introduce a new API, new lock semantics and code refactoring.
> __pci_bus_trylock() looked very powerful and also dangerously flexible to
> introduce new bugs to me.
>
> For instance, you called it like this.
>
> +		down_read(&pci_bus_sem);
> +		locked = __pci_bus_trylock(bus, pci_device_trylock,
> +					   pci_device_unlock);
>
> pci_bus_trylock() would obtain device + cfg locks whereas pci_device_trylock() only
> obtains the device lock. Can it race against cfg lock? It depends on the caller.
> Very subtle difference.
>
> -- 
> Sinan Kaya
> Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
> Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/4] lockdep: make MAX_LOCK_DEPTH configurable from Kconfig
  2017-09-29 16:23       ` Bjorn Helgaas
@ 2017-09-30  6:03         ` Govindarajulu Varadarajan
  0 siblings, 0 replies; 15+ messages in thread
From: Govindarajulu Varadarajan @ 2017-09-30  6:03 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Peter Zijlstra, benve, bhelgaas, linux-pci, linux-kernel, jlbec,
	hch, mingo

On Fri, 29 Sep 2017, Bjorn Helgaas wrote:

> On Thu, Sep 28, 2017 at 04:51:46PM -0700, Govindarajulu Varadarajan wrote:
>> On Thu, 28 Sep 2017, Peter Zijlstra wrote:
>>
>>> On Wed, Sep 27, 2017 at 02:42:20PM -0700, Govindarajulu Varadarajan wrote:
>>>> Make MAX_LOCK_DEPTH configurable. It is set to 48 right now. Number of
>>>> VFs under a PCI pf bus can exceed 48 and this disables lockdep.
>>>>
>>>> lockdep currently allows max of 63 held_locks.
>>>
>>> But why a config knob? Why not just raise the number to 64
>>> unconditionally? And is that sufficient; you only state 48 is
>>> insufficient, you don't actually state the VF limit.
>>>
>>
>> I did not want to change the default configuration for everyone.
>>
>> I will change it 63 unconditionally in v2 and resubmit the series.
>
> I'm not happy about having to increase MAX_LOCK_DEPTH based on a
> number of VFs.  I haven't had time to look at the locking strategy
> you're proposing, but it just doesn't feel right to have to take 50+
> locks for one operation.

I agree. I have sent V2 where we dont lock 50+ device_lock. Please have a look.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-09-30  6:03 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-27 21:42 [PATCH 0/4] pci aer: fix deadlock in do_recovery Govindarajulu Varadarajan
2017-09-27 21:42 ` [PATCH 1/4] pci: introduce __pci_walk_bus for caller with pci_bus_sem held Govindarajulu Varadarajan
2017-09-28 16:12   ` Sinan Kaya
2017-09-28 23:52     ` Govindarajulu Varadarajan
2017-09-27 21:42 ` [PATCH 2/4] pci: code refactor pci_bus_lock/unlock/trylock Govindarajulu Varadarajan
2017-09-27 21:42 ` [PATCH 3/4] pci aer: fix deadlock in do_recovery Govindarajulu Varadarajan
2017-09-28 16:47   ` Sinan Kaya
2017-09-28 23:46     ` Govindarajulu Varadarajan
2017-09-29 13:32       ` Sinan Kaya
2017-09-30  6:00         ` Govindarajulu Varadarajan
2017-09-27 21:42 ` [PATCH 4/4] lockdep: make MAX_LOCK_DEPTH configurable from Kconfig Govindarajulu Varadarajan
2017-09-28  9:26   ` Peter Zijlstra
2017-09-28 23:51     ` Govindarajulu Varadarajan
2017-09-29 16:23       ` Bjorn Helgaas
2017-09-30  6:03         ` Govindarajulu Varadarajan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.