linux-rdma.vger.kernel.org archive mirror
* [PATCH for-rc 0/3] Series short description
@ 2020-02-10 13:10 Dennis Dalessandro
  2020-02-10 13:10 ` [PATCH for-rc 1/3] IB/hfi1: Acquire lock to release TID entries when user file is closed Dennis Dalessandro
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Dennis Dalessandro @ 2020-02-10 13:10 UTC (permalink / raw)
  To: jgg, dledford; +Cc: linux-rdma

Here are some fixes for the current rc cycle. The first fixes a potential data
corruption and involves adding a mutex lock around a missed critical section.
The other two patches are a bit more involved, but they fix panics.

---

Kaike Wan (2):
      IB/hfi1: Acquire lock to release TID entries when user file is closed
      IB/rdmavt: Reset all QPs when the device is shut down

Mike Marciniszyn (1):
      IB/hfi1: Close window for pq and request colliding


 drivers/infiniband/hw/hfi1/file_ops.c     |   52 +++++++++++-------
 drivers/infiniband/hw/hfi1/hfi.h          |    5 +-
 drivers/infiniband/hw/hfi1/user_exp_rcv.c |    5 +-
 drivers/infiniband/hw/hfi1/user_sdma.c    |   17 ++++--
 drivers/infiniband/sw/rdmavt/qp.c         |   84 ++++++++++++++++++-----------
 5 files changed, 101 insertions(+), 62 deletions(-)

--
-Denny

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH for-rc 1/3] IB/hfi1: Acquire lock to release TID entries when user file is closed
  2020-02-10 13:10 [PATCH for-rc 0/3] Series short description Dennis Dalessandro
@ 2020-02-10 13:10 ` Dennis Dalessandro
  2020-02-10 13:10 ` [PATCH for-rc 2/3] IB/hfi1: Close window for pq and request colliding Dennis Dalessandro
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Dennis Dalessandro @ 2020-02-10 13:10 UTC (permalink / raw)
  To: jgg, dledford; +Cc: linux-rdma, Mike Marciniszyn, Kaike Wan

From: Kaike Wan <kaike.wan@intel.com>

Each user context is allocated a certain number of RcvArray (TID)
entries and these entries are managed through TID groups. These groups
are put into one of three lists in each user context: tid_group_list,
tid_used_list, and tid_full_list, depending on the number of used TID
entries within each group. When TID packets are expected, one or more
TID groups will be allocated. After the packets are received, the TID
groups will be freed. Since multiple user threads may access the TID
groups simultaneously, a mutex (exp_mutex) is used to synchronize
access. However, when the user file is closed, the driver releases
all TID groups without first acquiring the mutex, which risks a race
with another thread that may be releasing its TID groups, leading to
data corruption.

This patch addresses the issue by acquiring the mutex first before
releasing the TID groups when the file is closed.

Fixes: 3abb33ac6521 ("staging/hfi1: Add TID cache receive init and free funcs")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
---
 drivers/infiniband/hw/hfi1/user_exp_rcv.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.c b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
index f05742a..2443423 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.c
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
@@ -142,10 +142,12 @@ void hfi1_user_exp_rcv_free(struct hfi1_filedata *fd)
 {
 	struct hfi1_ctxtdata *uctxt = fd->uctxt;
 
+	mutex_lock(&uctxt->exp_mutex);
 	if (!EXP_TID_SET_EMPTY(uctxt->tid_full_list))
 		unlock_exp_tids(uctxt, &uctxt->tid_full_list, fd);
 	if (!EXP_TID_SET_EMPTY(uctxt->tid_used_list))
 		unlock_exp_tids(uctxt, &uctxt->tid_used_list, fd);
+	mutex_unlock(&uctxt->exp_mutex);
 
 	kfree(fd->invalid_tids);
 	fd->invalid_tids = NULL;


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH for-rc 2/3] IB/hfi1: Close window for pq and request colliding
  2020-02-10 13:10 [PATCH for-rc 0/3] Series short description Dennis Dalessandro
  2020-02-10 13:10 ` [PATCH for-rc 1/3] IB/hfi1: Acquire lock to release TID entries when user file is closed Dennis Dalessandro
@ 2020-02-10 13:10 ` Dennis Dalessandro
  2020-02-10 13:10 ` [PATCH for-rc 3/3] IB/rdmavt: Reset all QPs when the device is shut down Dennis Dalessandro
  2020-02-11 17:52 ` [PATCH for-rc 0/3] Series short description Jason Gunthorpe
  3 siblings, 0 replies; 5+ messages in thread
From: Dennis Dalessandro @ 2020-02-10 13:10 UTC (permalink / raw)
  To: jgg, dledford; +Cc: linux-rdma, Mike Marciniszyn, Kaike Wan

From: Mike Marciniszyn <mike.marciniszyn@intel.com>

Cleaning up a pq can result in the following warning and panic:

[29386.970819] WARNING: CPU: 52 PID: 77418 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0
[29386.970821] list_del corruption, ffff88cb2c6ac068->next is LIST_POISON1 (dead000000000100)
[29386.970823] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
[29386.970877]  nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit]
[29386.970891] CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G           OE  ------------   3.10.0-957.38.3.el7.x86_64 #1
[29386.970893] Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
[29386.970894] Call Trace:
[29386.970905]  [<ffffffff90365ac0>] dump_stack+0x19/0x1b
[29386.970911]  [<ffffffff8fc98b78>] __warn+0xd8/0x100
[29386.970913]  [<ffffffff8fc98bff>] warn_slowpath_fmt+0x5f/0x80
[29386.970916]  [<ffffffff8ff970c3>] __list_del_entry+0x63/0xd0
[29386.970919]  [<ffffffff8ff9713d>] list_del+0xd/0x30
[29386.970924]  [<ffffffff8fddda70>] kmem_cache_destroy+0x50/0x110
[29386.970969]  [<ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1]
[29386.970984]  [<ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1]
[29386.970988]  [<ffffffff8fe4519c>] __fput+0xec/0x260
[29386.970991]  [<ffffffff8fe453fe>] ____fput+0xe/0x10
[29386.970995]  [<ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0
[29386.971000]  [<ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0
[29386.971004]  [<ffffffff90379134>] int_signal+0x12/0x17
[29386.971024] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[29386.978897] IP: [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300
[29386.984928] PGD 2cdab19067 PUD 2f7bfdb067 PMD 0
[29386.989627] Oops: 0000 [#1] SMP
[29386.992902] Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables
[29387.065320]  nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit]
[29387.078246] CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G        W  OE  ------------   3.10.0-957.38.3.el7.x86_64 #1
[29387.089796] Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019
[29387.099097] task: ffff88cc26db9040 ti: ffff88b5393a8000 task.ti: ffff88b5393a8000
[29387.106576] RIP: 0010:[<ffffffff8fe1f93e>]  [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300
[29387.115033] RSP: 0018:ffff88b5393abd60  EFLAGS: 00010287
[29387.120346] RAX: 0000000000000000 RBX: ffff88cb2c6ac000 RCX: 0000000000000003
[29387.127479] RDX: 0000000000000400 RSI: 0000000000000400 RDI: ffffffff9095b800
[29387.134612] RBP: ffff88b5393abdb0 R08: ffffffff9095b808 R09: ffffffff8ff77c19
[29387.141746] R10: ffff88b73ce1f160 R11: ffffddecddde9800 R12: ffff88cb2c6ac000
[29387.148877] R13: 000000000000000c R14: ffff88cf3fdca780 R15: 0000000000000000
[29387.156010] FS:  00002aaaaab52500(0000) GS:ffff88b73ce00000(0000) knlGS:0000000000000000
[29387.164097] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[29387.169839] CR2: 0000000000000010 CR3: 0000002d27664000 CR4: 00000000007607e0
[29387.176966] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[29387.184100] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[29387.191232] PKRU: 55555554
[29387.193944] Call Trace:
[29387.196400]  [<ffffffff8fe20d44>] __kmem_cache_shutdown+0x14/0x80
[29387.202492]  [<ffffffff8fddda78>] kmem_cache_destroy+0x58/0x110
[29387.208461]  [<ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1]
[29387.215694]  [<ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1]
[29387.221957]  [<ffffffff8fe4519c>] __fput+0xec/0x260
[29387.226837]  [<ffffffff8fe453fe>] ____fput+0xe/0x10
[29387.231718]  [<ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0
[29387.237116]  [<ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0
[29387.242778]  [<ffffffff90379134>] int_signal+0x12/0x17
[29387.247922] Code: 00 00 ba 00 04 00 00 0f 4f c2 3d 00 04 00 00 89 45 bc 0f 84 e7 01 00 00 48 63 45 bc 49 8d 04 c4 48 89 45 b0 48 8b 80 c8 00 00 00 <48> 8b 78 10 48 89 45 c0 48 83 c0 10 48 89 45 d0 48 8b 17 48 39
[29387.268313] RIP  [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300
[29387.274440]  RSP <ffff88b5393abd60>
[29387.277932] CR2: 0000000000000010

The panic is the result of slab entries being freed while the pq's
txreq slab cache is being destroyed.

The code attempts to quiesce the pq by waiting for n_reqs to reach
zero, but that check does not account for new requests arriving
concurrently.

Fix the issue by using SRCU to fetch the pq pointer and by adjusting
the pq free logic to NULL the fd's pq pointer prior to the quiesce.

Fixes: e87473bc1b6c ("IB/hfi1: Only set fd pointer when base context is completely initialized")
Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
---
 drivers/infiniband/hw/hfi1/file_ops.c     |   52 ++++++++++++++++++-----------
 drivers/infiniband/hw/hfi1/hfi.h          |    5 ++-
 drivers/infiniband/hw/hfi1/user_exp_rcv.c |    3 --
 drivers/infiniband/hw/hfi1/user_sdma.c    |   17 +++++++--
 4 files changed, 48 insertions(+), 29 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index bef6946..2591158 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -200,23 +200,24 @@ static int hfi1_file_open(struct inode *inode, struct file *fp)
 
 	fd = kzalloc(sizeof(*fd), GFP_KERNEL);
 
-	if (fd) {
-		fd->rec_cpu_num = -1; /* no cpu affinity by default */
-		fd->mm = current->mm;
-		mmgrab(fd->mm);
-		fd->dd = dd;
-		kobject_get(&fd->dd->kobj);
-		fp->private_data = fd;
-	} else {
-		fp->private_data = NULL;
-
-		if (atomic_dec_and_test(&dd->user_refcount))
-			complete(&dd->user_comp);
-
-		return -ENOMEM;
-	}
-
+	if (!fd || init_srcu_struct(&fd->pq_srcu))
+		goto nomem;
+	spin_lock_init(&fd->pq_rcu_lock);
+	spin_lock_init(&fd->tid_lock);
+	spin_lock_init(&fd->invalid_lock);
+	fd->rec_cpu_num = -1; /* no cpu affinity by default */
+	fd->mm = current->mm;
+	mmgrab(fd->mm);
+	fd->dd = dd;
+	kobject_get(&fd->dd->kobj);
+	fp->private_data = fd;
 	return 0;
+nomem:
+	kfree(fd);
+	fp->private_data = NULL;
+	if (atomic_dec_and_test(&dd->user_refcount))
+		complete(&dd->user_comp);
+	return -ENOMEM;
 }
 
 static long hfi1_file_ioctl(struct file *fp, unsigned int cmd,
@@ -301,21 +302,30 @@ static long hfi1_file_ioctl(struct file *fp, unsigned int cmd,
 static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from)
 {
 	struct hfi1_filedata *fd = kiocb->ki_filp->private_data;
-	struct hfi1_user_sdma_pkt_q *pq = fd->pq;
+	struct hfi1_user_sdma_pkt_q *pq;
 	struct hfi1_user_sdma_comp_q *cq = fd->cq;
 	int done = 0, reqs = 0;
 	unsigned long dim = from->nr_segs;
+	int idx;
 
-	if (!cq || !pq)
+	idx = srcu_read_lock(&fd->pq_srcu);
+	pq = srcu_dereference(fd->pq, &fd->pq_srcu);
+	if (!cq || !pq) {
+		srcu_read_unlock(&fd->pq_srcu, idx);
 		return -EIO;
+	}
 
-	if (!iter_is_iovec(from) || !dim)
+	if (!iter_is_iovec(from) || !dim) {
+		srcu_read_unlock(&fd->pq_srcu, idx);
 		return -EINVAL;
+	}
 
 	trace_hfi1_sdma_request(fd->dd, fd->uctxt->ctxt, fd->subctxt, dim);
 
-	if (atomic_read(&pq->n_reqs) == pq->n_max_reqs)
+	if (atomic_read(&pq->n_reqs) == pq->n_max_reqs) {
+		srcu_read_unlock(&fd->pq_srcu, idx);
 		return -ENOSPC;
+	}
 
 	while (dim) {
 		int ret;
@@ -333,6 +343,7 @@ static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from)
 		reqs++;
 	}
 
+	srcu_read_unlock(&fd->pq_srcu, idx);
 	return reqs;
 }
 
@@ -707,6 +718,7 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
 	if (atomic_dec_and_test(&dd->user_refcount))
 		complete(&dd->user_comp);
 
+	cleanup_srcu_struct(&fdata->pq_srcu);
 	kfree(fdata);
 	return 0;
 }
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index 6365e8f..cae12f4 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -1444,10 +1444,13 @@ static inline bool hfi1_vnic_is_rsm_full(struct hfi1_devdata *dd, int spare)
 
 /* Private data for file operations */
 struct hfi1_filedata {
+	struct srcu_struct pq_srcu;
 	struct hfi1_devdata *dd;
 	struct hfi1_ctxtdata *uctxt;
 	struct hfi1_user_sdma_comp_q *cq;
-	struct hfi1_user_sdma_pkt_q *pq;
+	/* update side lock for SRCU */
+	spinlock_t pq_rcu_lock;
+	struct hfi1_user_sdma_pkt_q __rcu *pq;
 	u16 subctxt;
 	/* for cpu affinity; -1 if none */
 	int rec_cpu_num;
diff --git a/drivers/infiniband/hw/hfi1/user_exp_rcv.c b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
index 2443423..4da03f8 100644
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.c
+++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.c
@@ -87,9 +87,6 @@ int hfi1_user_exp_rcv_init(struct hfi1_filedata *fd,
 {
 	int ret = 0;
 
-	spin_lock_init(&fd->tid_lock);
-	spin_lock_init(&fd->invalid_lock);
-
 	fd->entry_to_rb = kcalloc(uctxt->expected_count,
 				  sizeof(struct rb_node *),
 				  GFP_KERNEL);
diff --git a/drivers/infiniband/hw/hfi1/user_sdma.c b/drivers/infiniband/hw/hfi1/user_sdma.c
index fd754a1..c2f0d9b 100644
--- a/drivers/infiniband/hw/hfi1/user_sdma.c
+++ b/drivers/infiniband/hw/hfi1/user_sdma.c
@@ -179,7 +179,6 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
 	pq = kzalloc(sizeof(*pq), GFP_KERNEL);
 	if (!pq)
 		return -ENOMEM;
-
 	pq->dd = dd;
 	pq->ctxt = uctxt->ctxt;
 	pq->subctxt = fd->subctxt;
@@ -236,7 +235,7 @@ int hfi1_user_sdma_alloc_queues(struct hfi1_ctxtdata *uctxt,
 		goto pq_mmu_fail;
 	}
 
-	fd->pq = pq;
+	rcu_assign_pointer(fd->pq, pq);
 	fd->cq = cq;
 
 	return 0;
@@ -264,8 +263,14 @@ int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd,
 
 	trace_hfi1_sdma_user_free_queues(uctxt->dd, uctxt->ctxt, fd->subctxt);
 
-	pq = fd->pq;
+	spin_lock(&fd->pq_rcu_lock);
+	pq = srcu_dereference_check(fd->pq, &fd->pq_srcu,
+				    lockdep_is_held(&fd->pq_rcu_lock));
 	if (pq) {
+		rcu_assign_pointer(fd->pq, NULL);
+		spin_unlock(&fd->pq_rcu_lock);
+		synchronize_srcu(&fd->pq_srcu);
+		/* at this point there can be no more new requests */
 		if (pq->handler)
 			hfi1_mmu_rb_unregister(pq->handler);
 		iowait_sdma_drain(&pq->busy);
@@ -277,7 +282,8 @@ int hfi1_user_sdma_free_queues(struct hfi1_filedata *fd,
 		kfree(pq->req_in_use);
 		kmem_cache_destroy(pq->txreq_cache);
 		kfree(pq);
-		fd->pq = NULL;
+	} else {
+		spin_unlock(&fd->pq_rcu_lock);
 	}
 	if (fd->cq) {
 		vfree(fd->cq->comps);
@@ -321,7 +327,8 @@ int hfi1_user_sdma_process_request(struct hfi1_filedata *fd,
 {
 	int ret = 0, i;
 	struct hfi1_ctxtdata *uctxt = fd->uctxt;
-	struct hfi1_user_sdma_pkt_q *pq = fd->pq;
+	struct hfi1_user_sdma_pkt_q *pq =
+		srcu_dereference(fd->pq, &fd->pq_srcu);
 	struct hfi1_user_sdma_comp_q *cq = fd->cq;
 	struct hfi1_devdata *dd = pq->dd;
 	unsigned long idx = 0;


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH for-rc 3/3] IB/rdmavt: Reset all QPs when the device is shut down
  2020-02-10 13:10 [PATCH for-rc 0/3] Series short description Dennis Dalessandro
  2020-02-10 13:10 ` [PATCH for-rc 1/3] IB/hfi1: Acquire lock to release TID entries when user file is closed Dennis Dalessandro
  2020-02-10 13:10 ` [PATCH for-rc 2/3] IB/hfi1: Close window for pq and request colliding Dennis Dalessandro
@ 2020-02-10 13:10 ` Dennis Dalessandro
  2020-02-11 17:52 ` [PATCH for-rc 0/3] Series short description Jason Gunthorpe
  3 siblings, 0 replies; 5+ messages in thread
From: Dennis Dalessandro @ 2020-02-10 13:10 UTC (permalink / raw)
  To: jgg, dledford; +Cc: linux-rdma, Mike Marciniszyn, Kaike Wan

From: Kaike Wan <kaike.wan@intel.com>

When the hfi1 device is shut down during a system reboot, it is
possible that some QPs have not yet been freed by the ULPs. More send
requests could be posted, and a lingering timer could fire and
schedule further packet sends, leading to a crash:

[ 188.570075] BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
[ 188.570114] IP: [<ffffffff810a65f2>] __queue_work+0x32/0x3c0
[ 188.570142] PGD 0
[ 188.570154] Oops: 0000 [#1] SMP
[ 188.570171] Modules linked in: nvmet_rdma(OE) nvmet(OE) nvme(OE) dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) pal_raw(POE) pal_pmt(POE) pal_cache(POE) pal_pile(POE) pal(POE) pal_compatible(OE) rpcrdma sunrpc ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_ssif pcspkr ses enclosure joydev scsi_transport_sas i2c_i801 sg mei_me lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad dm_multipath hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en
[ 188.570501] sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core crct10dif_pclmul crct10dif_common hfi1(OE) igb crc32c_intel rdmavt(OE) ahci ib_core libahci libata ptp megaraid_sas pps_core dca i2c_algo_bit i2c_core devlink dm_mirror dm_region_hash dm_log dm_mod
[ 188.570641] CPU: 23 PID: 0 Comm: swapper/23 Tainted: P OE ------------ 3.10.0-693.el7.x86_64 #1
[ 188.570674] Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0028.121720182203 12/17/2018
[ 188.570708] task: ffff8808f4ec4f10 ti: ffff8808f4ed8000 task.ti: ffff8808f4ed8000
[ 188.570733] RIP: 0010:[<ffffffff810a65f2>]  [<ffffffff810a65f2>] __queue_work+0x32/0x3c0
[ 188.570763] RSP: 0018:ffff88105df43d48 EFLAGS: 00010046
[ 188.570782] RAX: 0000000000000086 RBX: 0000000000000086 RCX: 0000000000000000
[ 188.570806] RDX: ffff880f74e758b0 RSI: 0000000000000000 RDI: 000000000000001f
[ 188.570830] RBP: ffff88105df43d80 R08: ffff8808f3c583c8 R09: ffff8808f3c58000
[ 188.570854] R10: 0000000000000002 R11: ffff88105df43da8 R12: ffff880f74e758b0
[ 188.570877] R13: 000000000000001f R14: 0000000000000000 R15: ffff88105a300000
[ 188.570901] FS: 0000000000000000(0000) GS:ffff88105df40000(0000) knlGS:0000000000000000
[ 188.570929] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 188.570948] CR2: 0000000000000102 CR3: 00000000019f2000 CR4: 00000000001407e0
[ 188.570972] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 188.570996] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 188.571020] Stack:
[ 188.571029] ffff88105b6dd708 0000001f00000286 0000000000000086 ffff88105a300000
[ 188.571060] ffff880f74e75800 0000000000000000 ffff88105a300000 ffff88105df43d98
[ 188.571090] ffffffff810a6b85 ffff88105a301e80 ffff88105df43dc8 ffffffffc0224cde
[ 188.571119] Call Trace:
[ 188.571130]  <IRQ>
[ 188.571138]
[ 188.571148]  [<ffffffff810a6b85>] queue_work_on+0x45/0x50
[ 188.571186]  [<ffffffffc0224cde>] _hfi1_schedule_send+0x6e/0xc0 [hfi1]
[ 188.571215]  [<ffffffffc0170570>] ? get_map_page+0x60/0x60 [rdmavt]
[ 188.571248]  [<ffffffffc0224d62>] hfi1_schedule_send+0x32/0x70 [hfi1]
[ 188.571275]  [<ffffffffc0170644>] rvt_rc_timeout+0xd4/0x120 [rdmavt]
[ 188.571301]  [<ffffffffc0170570>] ? get_map_page+0x60/0x60 [rdmavt]
[ 188.571324]  [<ffffffff81097316>] call_timer_fn+0x36/0x110
[ 188.571348]  [<ffffffffc0170570>] ? get_map_page+0x60/0x60 [rdmavt]
[ 188.571370]  [<ffffffff8109982d>] run_timer_softirq+0x22d/0x310
[ 188.571393]  [<ffffffff81090b3f>] __do_softirq+0xef/0x280
[ 188.571415]  [<ffffffff816b6a5c>] call_softirq+0x1c/0x30
[ 188.571435]  [<ffffffff8102d3c5>] do_softirq+0x65/0xa0
[ 188.571454]  [<ffffffff81090ec5>] irq_exit+0x105/0x110
[ 188.572263]  [<ffffffff816b76c2>] smp_apic_timer_interrupt+0x42/0x50
[ 188.573060]  [<ffffffff816b5c1d>] apic_timer_interrupt+0x6d/0x80
[ 188.573846]  <EOI>
[ 188.573854]
[ 188.574631]  [<ffffffff81527a02>] ? cpuidle_enter_state+0x52/0xc0
[ 188.575425]  [<ffffffff81527b48>] cpuidle_idle_call+0xd8/0x210
[ 188.576203]  [<ffffffff81034fee>] arch_cpu_idle+0xe/0x30
[ 188.576959]  [<ffffffff810e7bca>] cpu_startup_entry+0x14a/0x1c0
[ 188.577697]  [<ffffffff81051af6>] start_secondary+0x1b6/0x230
[ 188.578412] Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 be 02 00 00 41 f6 86 02 01 00 00 01 0f 85 58 02 00 00 49 c7 c7 28 19 01 00
[ 188.579926] RIP  [<ffffffff810a65f2>] __queue_work+0x32/0x3c0
[ 188.580623]  RSP <ffff88105df43d48>
[ 188.581293] CR2: 0000000000000102

The solution is to reset the QPs before the device resources are
freed. The reset moves each QP to a state that rejects newly posted
sends and deletes its timers so that no further callbacks can fire.

Fixes: 0acb0cc7ecc1 ("IB/rdmavt: Initialize and teardown of qpn table")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
---
 drivers/infiniband/sw/rdmavt/qp.c |   84 ++++++++++++++++++++++---------------
 1 file changed, 51 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c
index 3cdf75d..7858d49 100644
--- a/drivers/infiniband/sw/rdmavt/qp.c
+++ b/drivers/infiniband/sw/rdmavt/qp.c
@@ -61,6 +61,8 @@
 #define RVT_RWQ_COUNT_THRESHOLD 16
 
 static void rvt_rc_timeout(struct timer_list *t);
+static void rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp,
+			 enum ib_qp_type type);
 
 /*
  * Convert the AETH RNR timeout code into the number of microseconds.
@@ -452,40 +454,41 @@ int rvt_driver_qp_init(struct rvt_dev_info *rdi)
 }
 
 /**
- * free_all_qps - check for QPs still in use
+ * rvt_free_qp_cb - callback function to reset a qp
+ * @qp: the qp to reset
+ * @v: a 64-bit value
+ *
+ * This function resets the qp and removes it from the
+ * qp hash table.
+ */
+static void rvt_free_qp_cb(struct rvt_qp *qp, u64 v)
+{
+	unsigned int *qp_inuse = (unsigned int *)v;
+	struct rvt_dev_info *rdi = ib_to_rvt(qp->ibqp.device);
+
+	/* Reset the qp and remove it from the qp hash list */
+	rvt_reset_qp(rdi, qp, qp->ibqp.qp_type);
+
+	/* Increment the qp_inuse count */
+	(*qp_inuse)++;
+}
+
+/**
+ * rvt_free_all_qps - check for QPs still in use
  * @rdi: rvt device info structure
  *
  * There should not be any QPs still in use.
  * Free memory for table.
+ * Return the number of QPs still in use.
  */
 static unsigned rvt_free_all_qps(struct rvt_dev_info *rdi)
 {
-	unsigned long flags;
-	struct rvt_qp *qp;
-	unsigned n, qp_inuse = 0;
-	spinlock_t *ql; /* work around too long line below */
-
-	if (rdi->driver_f.free_all_qps)
-		qp_inuse = rdi->driver_f.free_all_qps(rdi);
+	unsigned int qp_inuse = 0;
 
 	qp_inuse += rvt_mcast_tree_empty(rdi);
 
-	if (!rdi->qp_dev)
-		return qp_inuse;
-
-	ql = &rdi->qp_dev->qpt_lock;
-	spin_lock_irqsave(ql, flags);
-	for (n = 0; n < rdi->qp_dev->qp_table_size; n++) {
-		qp = rcu_dereference_protected(rdi->qp_dev->qp_table[n],
-					       lockdep_is_held(ql));
-		RCU_INIT_POINTER(rdi->qp_dev->qp_table[n], NULL);
+	rvt_qp_iter(rdi, (u64)&qp_inuse, rvt_free_qp_cb);
 
-		for (; qp; qp = rcu_dereference_protected(qp->next,
-							  lockdep_is_held(ql)))
-			qp_inuse++;
-	}
-	spin_unlock_irqrestore(ql, flags);
-	synchronize_rcu();
 	return qp_inuse;
 }
 
@@ -902,14 +905,14 @@ static void rvt_init_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp,
 }
 
 /**
- * rvt_reset_qp - initialize the QP state to the reset state
+ * _rvt_reset_qp - initialize the QP state to the reset state
  * @qp: the QP to reset
  * @type: the QP type
  *
  * r_lock, s_hlock, and s_lock are required to be held by the caller
  */
-static void rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp,
-			 enum ib_qp_type type)
+static void _rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp,
+			  enum ib_qp_type type)
 	__must_hold(&qp->s_lock)
 	__must_hold(&qp->s_hlock)
 	__must_hold(&qp->r_lock)
@@ -955,6 +958,27 @@ static void rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp,
 	lockdep_assert_held(&qp->s_lock);
 }
 
+/**
+ * rvt_reset_qp - initialize the QP state to the reset state
+ * @rdi: the device info
+ * @qp: the QP to reset
+ * @type: the QP type
+ *
+ * This is the wrapper function to acquire the r_lock, s_hlock, and s_lock
+ * before calling _rvt_reset_qp().
+ */
+static void rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp,
+			 enum ib_qp_type type)
+{
+	spin_lock_irq(&qp->r_lock);
+	spin_lock(&qp->s_hlock);
+	spin_lock(&qp->s_lock);
+	_rvt_reset_qp(rdi, qp, type);
+	spin_unlock(&qp->s_lock);
+	spin_unlock(&qp->s_hlock);
+	spin_unlock_irq(&qp->r_lock);
+}
+
 /** rvt_free_qpn - Free a qpn from the bit map
  * @qpt: QP table
  * @qpn: queue pair number to free
@@ -1546,7 +1570,7 @@ int rvt_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
 	switch (new_state) {
 	case IB_QPS_RESET:
 		if (qp->state != IB_QPS_RESET)
-			rvt_reset_qp(rdi, qp, ibqp->qp_type);
+			_rvt_reset_qp(rdi, qp, ibqp->qp_type);
 		break;
 
 	case IB_QPS_RTR:
@@ -1695,13 +1719,7 @@ int rvt_destroy_qp(struct ib_qp *ibqp, struct ib_udata *udata)
 	struct rvt_qp *qp = ibqp_to_rvtqp(ibqp);
 	struct rvt_dev_info *rdi = ib_to_rvt(ibqp->device);
 
-	spin_lock_irq(&qp->r_lock);
-	spin_lock(&qp->s_hlock);
-	spin_lock(&qp->s_lock);
 	rvt_reset_qp(rdi, qp, ibqp->qp_type);
-	spin_unlock(&qp->s_lock);
-	spin_unlock(&qp->s_hlock);
-	spin_unlock_irq(&qp->r_lock);
 
 	wait_event(qp->wait, !atomic_read(&qp->refcount));
 	/* qpn is now available for use again */


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH for-rc 0/3] Series short description
  2020-02-10 13:10 [PATCH for-rc 0/3] Series short description Dennis Dalessandro
                   ` (2 preceding siblings ...)
  2020-02-10 13:10 ` [PATCH for-rc 3/3] IB/rdmavt: Reset all QPs when the device is shut down Dennis Dalessandro
@ 2020-02-11 17:52 ` Jason Gunthorpe
  3 siblings, 0 replies; 5+ messages in thread
From: Jason Gunthorpe @ 2020-02-11 17:52 UTC (permalink / raw)
  To: Dennis Dalessandro; +Cc: dledford, linux-rdma

On Mon, Feb 10, 2020 at 08:10:20AM -0500, Dennis Dalessandro wrote:
> Here are some fixes for the current rc cycle. The first fixes a potential data
> corruption and involves adding a mutex lock around a missed critical section.
> The other two patches are a bit more involved but they fix panics.
> 
> 
> Kaike Wan (2):
>       IB/hfi1: Acquire lock to release TID entries when user file is closed
>       IB/rdmavt: Reset all QPs when the device is shut down
> 
> Mike Marciniszyn (1):
>       IB/hfi1: Close window for pq and request colliding

Applied to for-rc

Thanks,
Jason

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-02-11 17:52 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-10 13:10 [PATCH for-rc 0/3] Series short description Dennis Dalessandro
2020-02-10 13:10 ` [PATCH for-rc 1/3] IB/hfi1: Acquire lock to release TID entries when user file is closed Dennis Dalessandro
2020-02-10 13:10 ` [PATCH for-rc 2/3] IB/hfi1: Close window for pq and request colliding Dennis Dalessandro
2020-02-10 13:10 ` [PATCH for-rc 3/3] IB/rdmavt: Reset all QPs when the device is shut down Dennis Dalessandro
2020-02-11 17:52 ` [PATCH for-rc 0/3] Series short description Jason Gunthorpe
