All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-rc 0/7] hfi1 and qib patches for rc
@ 2019-01-17 20:40 ` Dennis Dalessandro
  0 siblings, 0 replies; 11+ messages in thread
From: Dennis Dalessandro @ 2019-01-17 20:40 UTC (permalink / raw)
  To: jgg, dledford
  Cc: Mike Marciniszyn, linux-rdma, Brian Welty, stable,
	Michael J. Ruhl, Andrzej Witkowski, Ira Weiny

Hi Folks,

Here are a handful of patches we'd like to target for the RC. These are bug
fixes, except one may be a little iffy (Add wc_flags and wc_immdata...) but
it's a very trivial patch and really helps with debugging.

---

Andrzej Witkowski (1):
      IB/hfi1: Use new API to deallocate vnic rdma-netdev in hfi1 driver

Brian Welty (1):
      IB/{hfi1,qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM

Michael J. Ruhl (3):
      IB/{hfi1,qib,rdmavt}: Do not depend on IB Verbs name for driver logging
      IB/hfi1: Close race condition on user context disable and close
      IB/hfi1: Remove overly conservative VM_EXEC flag check

Mike Marciniszyn (2):
      IB/rdmavt: Add wc_flags and wc_immdata to cq entry trace
      IB/hfi1: Add limit test for RC/UC send via loopback


 drivers/infiniband/hw/hfi1/affinity.c             |    6 ++-
 drivers/infiniband/hw/hfi1/driver.c               |    8 ++++
 drivers/infiniband/hw/hfi1/file_ops.c             |    4 ++
 drivers/infiniband/hw/hfi1/hfi.h                  |   25 +++++++-------
 drivers/infiniband/hw/hfi1/init.c                 |   10 +++---
 drivers/infiniband/hw/hfi1/ud.c                   |    1 -
 drivers/infiniband/hw/hfi1/verbs.c                |    1 +
 drivers/infiniband/hw/hfi1/vnic_main.c            |    4 +-
 drivers/infiniband/hw/qib/qib.h                   |   12 +++----
 drivers/infiniband/hw/qib/qib_driver.c            |    8 ++++
 drivers/infiniband/hw/qib/qib_init.c              |   10 ++++--
 drivers/infiniband/hw/qib/qib_ud.c                |    1 -
 drivers/infiniband/hw/qib/qib_verbs.c             |    1 +
 drivers/infiniband/sw/rdmavt/qp.c                 |    7 +++-
 drivers/infiniband/sw/rdmavt/trace.h              |    6 ++-
 drivers/infiniband/sw/rdmavt/trace_cq.h           |   10 ++++--
 drivers/infiniband/sw/rdmavt/vt.c                 |    2 +
 drivers/infiniband/sw/rdmavt/vt.h                 |   10 ++++--
 drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c |    5 +--
 include/rdma/ib_verbs.h                           |    7 ----
 include/rdma/rdma_vt.h                            |   38 ++++-----------------
 21 files changed, 92 insertions(+), 84 deletions(-)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH for-rc 0/7] hfi1 and qib patches for rc
@ 2019-01-17 20:40 ` Dennis Dalessandro
  0 siblings, 0 replies; 11+ messages in thread
From: Dennis Dalessandro @ 2019-01-17 20:40 UTC (permalink / raw)
  To: jgg, dledford
  Cc: Mike Marciniszyn, linux-rdma, Brian Welty, stable,
	Michael J. Ruhl, Andrzej Witkowski, Ira Weiny

Hi Folks,

Here are a handful of patches we'd like to target for the RC. These are bug
fixes, except one may be a little iffy (Add wc_flags and wc_immdata...) but
it's a very trivial patch and really helps with debugging.

---

Andrzej Witkowski (1):
      IB/hfi1: Use new API to deallocate vnic rdma-netdev in hfi1 driver

Brian Welty (1):
      IB/{hfi1,qib}: Fix WC.byte_len calculation for UD_SEND_WITH_IMM

Michael J. Ruhl (3):
      IB/{hfi1,qib,rdmavt}: Do not depend on IB Verbs name for driver logging
      IB/hfi1: Close race condition on user context disable and close
      IB/hfi1: Remove overly conservative VM_EXEC flag check

Mike Marciniszyn (2):
      IB/rdmavt: Add wc_flags and wc_immdata to cq entry trace
      IB/hfi1: Add limit test for RC/UC send via loopback


 drivers/infiniband/hw/hfi1/affinity.c             |    6 ++-
 drivers/infiniband/hw/hfi1/driver.c               |    8 ++++
 drivers/infiniband/hw/hfi1/file_ops.c             |    4 ++
 drivers/infiniband/hw/hfi1/hfi.h                  |   25 +++++++-------
 drivers/infiniband/hw/hfi1/init.c                 |   10 +++---
 drivers/infiniband/hw/hfi1/ud.c                   |    1 -
 drivers/infiniband/hw/hfi1/verbs.c                |    1 +
 drivers/infiniband/hw/hfi1/vnic_main.c            |    4 +-
 drivers/infiniband/hw/qib/qib.h                   |   12 +++----
 drivers/infiniband/hw/qib/qib_driver.c            |    8 ++++
 drivers/infiniband/hw/qib/qib_init.c              |   10 ++++--
 drivers/infiniband/hw/qib/qib_ud.c                |    1 -
 drivers/infiniband/hw/qib/qib_verbs.c             |    1 +
 drivers/infiniband/sw/rdmavt/qp.c                 |    7 +++-
 drivers/infiniband/sw/rdmavt/trace.h              |    6 ++-
 drivers/infiniband/sw/rdmavt/trace_cq.h           |   10 ++++--
 drivers/infiniband/sw/rdmavt/vt.c                 |    2 +
 drivers/infiniband/sw/rdmavt/vt.h                 |   10 ++++--
 drivers/infiniband/ulp/opa_vnic/opa_vnic_netdev.c |    5 +--
 include/rdma/ib_verbs.h                           |    7 ----
 include/rdma/rdma_vt.h                            |   38 ++++-----------------
 21 files changed, 92 insertions(+), 84 deletions(-)

--
-Denny

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH for-rc 5/7] IB/hfi1: Close race condition on user context disable and close
  2019-01-17 20:40 ` Dennis Dalessandro
  (?)
@ 2019-01-17 20:41 ` Dennis Dalessandro
  2019-01-17 21:12   ` Jason Gunthorpe
       [not found]   ` <20190122155611.B55A0217F9@mail.kernel.org>
  -1 siblings, 2 replies; 11+ messages in thread
From: Dennis Dalessandro @ 2019-01-17 20:41 UTC (permalink / raw)
  To: jgg, dledford; +Cc: linux-rdma, Michael J. Ruhl, Mike Marciniszyn, stable

From: Michael J. Ruhl <michael.j.ruhl@intel.com>

When disabling and removing a receive context, it is possible for an
asynchronous event (i.e IRQ) to occur.  Because of this there is a
race between cleaning up the context, and the context being used by
the asynchronous event.

cpu 0  (context cleanup)
    rc->ref_count-- (ref_count == 0)
    hfi1_rcd_free()
cpu 1  (IRQ (with rcd index))
	rcd_get_by_index()
	lock
	ref_count+++     <-- reference count race (WARNING)
	return rcd
	unlock
cpu 0
    hfi1_free_ctxtdata() <-- incorrect free location
    lock
    remove rcd from array
    unlock
    free rcd

This race will cause the following WARNING trace:

WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
Call Trace:
  dump_stack+0x19/0x1b
  __warn+0xd8/0x100
  warn_slowpath_null+0x1d/0x20
  hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
  is_rcv_urgent_int+0x24/0x90 [hfi1]
  general_interrupt+0x1b6/0x210 [hfi1]
  __handle_irq_event_percpu+0x44/0x1c0
  handle_irq_event_percpu+0x32/0x80
  handle_irq_event+0x3c/0x60
  handle_edge_irq+0x7f/0x150
  handle_irq+0xe4/0x1a0
  do_IRQ+0x4d/0xf0
  common_interrupt+0x162/0x162

The race can also lead to a use after free which could be similar
to:

general protection fault: 0000 1 SMP
CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1
Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230
Call Trace:
  ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
  hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
  hfi1_aio_write+0xba/0x110 [hfi1]
  do_sync_readv_writev+0x7b/0xd0
  do_readv_writev+0xce/0x260
  ? handle_mm_fault+0x39d/0x9b0
  ? pick_next_task_fair+0x5f/0x1b0
  ? sched_clock_cpu+0x85/0xc0
  ? __schedule+0x13a/0x890
  vfs_writev+0x35/0x60
  SyS_writev+0x7f/0x110
  system_call_fastpath+0x22/0x27

Introduce a context enable flag to indicate if the context is enabled.

Reorder context cleanup to ensure context removal before cleanup
occurs correctly.

Cc: stable@vger.kernel.org # v4.14.0+
Fixes: f683c80ca68e ("IB/hfi1: Resolve kernel panics by reference counting receive contexts")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
---
 drivers/infiniband/hw/hfi1/file_ops.c |    2 ++
 drivers/infiniband/hw/hfi1/hfi.h      |    3 ++-
 drivers/infiniband/hw/hfi1/init.c     |    8 +++++---
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index c22ebc7..0ba0cf5 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -671,6 +671,8 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
 	}
 	spin_unlock_irqrestore(&dd->uctxt_lock, flags);
 
+	uctxt->del_pend = 1;
+
 	/*
 	 * Disable receive context and interrupt available, reset all
 	 * RcvCtxtCtrl bits to default values.
diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h
index dc8ba43..ea99768 100644
--- a/drivers/infiniband/hw/hfi1/hfi.h
+++ b/drivers/infiniband/hw/hfi1/hfi.h
@@ -284,7 +284,8 @@ struct hfi1_ctxtdata {
 	u16 expected_base;
 	/* Device context index */
 	u8 ctxt;
-
+	/* ctxt deletion is pending */
+	u8 del_pend;
 	/* PSM Specific fields */
 	/* lock protecting all Expected TID data */
 	struct mutex exp_mutex;
diff --git a/drivers/infiniband/hw/hfi1/init.c b/drivers/infiniband/hw/hfi1/init.c
index 044d9a8..d2ac3ef 100644
--- a/drivers/infiniband/hw/hfi1/init.c
+++ b/drivers/infiniband/hw/hfi1/init.c
@@ -215,12 +215,11 @@ static void hfi1_rcd_free(struct kref *kref)
 	struct hfi1_ctxtdata *rcd =
 		container_of(kref, struct hfi1_ctxtdata, kref);
 
-	hfi1_free_ctxtdata(rcd->dd, rcd);
-
 	spin_lock_irqsave(&rcd->dd->uctxt_lock, flags);
 	rcd->dd->rcd[rcd->ctxt] = NULL;
 	spin_unlock_irqrestore(&rcd->dd->uctxt_lock, flags);
 
+	hfi1_free_ctxtdata(rcd->dd, rcd);
 	kfree(rcd);
 }
 
@@ -326,7 +325,10 @@ struct hfi1_ctxtdata *hfi1_rcd_get_by_index(struct hfi1_devdata *dd, u16 ctxt)
 	spin_lock_irqsave(&dd->uctxt_lock, flags);
 	if (dd->rcd[ctxt]) {
 		rcd = dd->rcd[ctxt];
-		hfi1_rcd_get(rcd);
+		if (rcd->del_pend)
+			rcd = NULL;
+		else
+			hfi1_rcd_get(rcd);
 	}
 	spin_unlock_irqrestore(&dd->uctxt_lock, flags);
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH for-rc 6/7] IB/hfi1: Remove overly conservative VM_EXEC flag check
  2019-01-17 20:40 ` Dennis Dalessandro
  (?)
  (?)
@ 2019-01-17 20:42 ` Dennis Dalessandro
  2019-01-18 21:02   ` Jason Gunthorpe
  -1 siblings, 1 reply; 11+ messages in thread
From: Dennis Dalessandro @ 2019-01-17 20:42 UTC (permalink / raw)
  To: jgg, dledford
  Cc: linux-rdma, Michael J. Ruhl, Mike Marciniszyn, Ira Weiny, stable

From: Michael J. Ruhl <michael.j.ruhl@intel.com>

Applications that use the stack for execution purposes cause
PSM jobs to fail during mmap().

Both Fortran (non-standard format parsing) and C (callback
functions located in the stack) applications can be written
such that stack execution is required.

Because of this the EXECSTACK bit can be automatically set at link
time for any application.

On application load, the ELF loader evaluates the EXECSTACK bit
for the application and it's linked libraries.  It will set the
process VM flags to allow the stack to include the VM_EXEC bit
if the EXECSTACK bit is set.  This flag is propagated to the
driver during the mmap() call in the vma flag bits.

Checking for this bit and failing the request with EPERM is overly
conservative and will break any PSM application that has the bit set.

Remove the VM_EXEC flag from the check.

Cc: <stable@vger.kernel.org> #v4.14+
Fixes: 12220267645c ("IB/hfi: Protect against writable mmap")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
---
 drivers/infiniband/hw/hfi1/file_ops.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
index 0ba0cf5..68c397a 100644
--- a/drivers/infiniband/hw/hfi1/file_ops.c
+++ b/drivers/infiniband/hw/hfi1/file_ops.c
@@ -488,7 +488,7 @@ static int hfi1_file_mmap(struct file *fp, struct vm_area_struct *vma)
 		vmf = 1;
 		break;
 	case STATUS:
-		if (flags & (unsigned long)(VM_WRITE | VM_EXEC)) {
+		if (flags & VM_WRITE) {
 			ret = -EPERM;
 			goto done;
 		}

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH for-rc 7/7] IB/hfi1: Add limit test for RC/UC send via loopback
  2019-01-17 20:40 ` Dennis Dalessandro
                   ` (2 preceding siblings ...)
  (?)
@ 2019-01-17 20:42 ` Dennis Dalessandro
  2019-01-18 21:04   ` Jason Gunthorpe
  -1 siblings, 1 reply; 11+ messages in thread
From: Dennis Dalessandro @ 2019-01-17 20:42 UTC (permalink / raw)
  To: jgg, dledford; +Cc: linux-rdma, Michael J. Ruhl, Mike Marciniszyn, stable

From: Mike Marciniszyn <mike.marciniszyn@intel.com>

Fix potential memory corruption and panic in loopback for
IB_WR_SEND variants.

The code blindly assumes the posted length will fit in the
fetched rwqe, which is not a valid assumption.

Fix by adding a limit test, and triggering the appropriate
send completion and putting the QP in an error state.  This
mimics the handling for non-loopback QPs.

Fixes: 15703461533a ("IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt")
Cc: <stable@vger.kernel.org> #v4.20+
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
---
 drivers/infiniband/sw/rdmavt/qp.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c
index a1bd8cf..c6cc3e4 100644
--- a/drivers/infiniband/sw/rdmavt/qp.c
+++ b/drivers/infiniband/sw/rdmavt/qp.c
@@ -2910,6 +2910,8 @@ void rvt_ruc_loopback(struct rvt_qp *sqp)
 			goto op_err;
 		if (!ret)
 			goto rnr_nak;
+		if (wqe->length > qp->r_len)
+			goto inv_err;
 		break;
 
 	case IB_WR_RDMA_WRITE_WITH_IMM:
@@ -3078,7 +3080,10 @@ void rvt_ruc_loopback(struct rvt_qp *sqp)
 	goto err;
 
 inv_err:
-	send_status = IB_WC_REM_INV_REQ_ERR;
+	send_status =
+		sqp->ibqp.qp_type == IB_QPT_RC ?
+			IB_WC_REM_INV_REQ_ERR :
+			IB_WC_SUCCESS;
 	wc.status = IB_WC_LOC_QP_OP_ERR;
 	goto err;
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH for-rc 5/7] IB/hfi1: Close race condition on user context disable and close
  2019-01-17 20:41 ` [PATCH for-rc 5/7] IB/hfi1: Close race condition on user context disable and close Dennis Dalessandro
@ 2019-01-17 21:12   ` Jason Gunthorpe
  2019-01-17 21:32     ` Ruhl, Michael J
       [not found]   ` <20190122155611.B55A0217F9@mail.kernel.org>
  1 sibling, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2019-01-17 21:12 UTC (permalink / raw)
  To: Dennis Dalessandro
  Cc: dledford, linux-rdma, Michael J. Ruhl, Mike Marciniszyn, stable

On Thu, Jan 17, 2019 at 12:41:54PM -0800, Dennis Dalessandro wrote:
> From: Michael J. Ruhl <michael.j.ruhl@intel.com>
> 
> When disabling and removing a receive context, it is possible for an
> asynchronous event (i.e IRQ) to occur.  Because of this there is a
> race between cleaning up the context, and the context being used by
> the asynchronous event.
> 
> cpu 0  (context cleanup)
>     rc->ref_count-- (ref_count == 0)
>     hfi1_rcd_free()
> cpu 1  (IRQ (with rcd index))
> 	rcd_get_by_index()
> 	lock
> 	ref_count+++     <-- reference count race (WARNING)
> 	return rcd
> 	unlock
> cpu 0
>     hfi1_free_ctxtdata() <-- incorrect free location
>     lock
>     remove rcd from array
>     unlock
>     free rcd
> 
> This race will cause the following WARNING trace:
> 
> WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52 hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
> CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE ------------ 3.10.0-957.el7.x86_64 #1
> Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
> Call Trace:
>   dump_stack+0x19/0x1b
>   __warn+0xd8/0x100
>   warn_slowpath_null+0x1d/0x20
>   hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
>   is_rcv_urgent_int+0x24/0x90 [hfi1]
>   general_interrupt+0x1b6/0x210 [hfi1]
>   __handle_irq_event_percpu+0x44/0x1c0
>   handle_irq_event_percpu+0x32/0x80
>   handle_irq_event+0x3c/0x60
>   handle_edge_irq+0x7f/0x150
>   handle_irq+0xe4/0x1a0
>   do_IRQ+0x4d/0xf0
>   common_interrupt+0x162/0x162
> 
> The race can also lead to a use after free which could be similar
> to:
> 
> general protection fault: 0000 1 SMP
> CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.el7.x86_64 #1
> Hardware name: Intel Corporation S2600KP/S2600KP, BIOS SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
> task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000 __kmalloc+0x94/0x230
> Call Trace:
>   ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
>   hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
>   hfi1_aio_write+0xba/0x110 [hfi1]
>   do_sync_readv_writev+0x7b/0xd0
>   do_readv_writev+0xce/0x260
>   ? handle_mm_fault+0x39d/0x9b0
>   ? pick_next_task_fair+0x5f/0x1b0
>   ? sched_clock_cpu+0x85/0xc0
>   ? __schedule+0x13a/0x890
>   vfs_writev+0x35/0x60
>   SyS_writev+0x7f/0x110
>   system_call_fastpath+0x22/0x27
> 
> Introduce a context enable flag to indicate if the context is enabled.
> 
> Reorder context cleanup to ensure context removal before cleanup
> occurs correctly.
> 
> Cc: stable@vger.kernel.org # v4.14.0+
> Fixes: f683c80ca68e ("IB/hfi1: Resolve kernel panics by reference counting receive contexts")
> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
>  drivers/infiniband/hw/hfi1/file_ops.c |    2 ++
>  drivers/infiniband/hw/hfi1/hfi.h      |    3 ++-
>  drivers/infiniband/hw/hfi1/init.c     |    8 +++++---
>  3 files changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/hfi1/file_ops.c b/drivers/infiniband/hw/hfi1/file_ops.c
> index c22ebc7..0ba0cf5 100644
> +++ b/drivers/infiniband/hw/hfi1/file_ops.c
> @@ -671,6 +671,8 @@ static int hfi1_file_close(struct inode *inode, struct file *fp)
>  	}
>  	spin_unlock_irqrestore(&dd->uctxt_lock, flags);
>  
> +	uctxt->del_pend = 1;

Not going to get good results if you update shared data outside the
lock that is reading it..

> @@ -326,7 +325,10 @@ struct hfi1_ctxtdata *hfi1_rcd_get_by_index(struct hfi1_devdata *dd, u16 ctxt)
>  	spin_lock_irqsave(&dd->uctxt_lock, flags);
>  	if (dd->rcd[ctxt]) {
>  		rcd = dd->rcd[ctxt];
> -		hfi1_rcd_get(rcd);
> +		if (rcd->del_pend)
> +			rcd = NULL;
> +		else
> +			hfi1_rcd_get(rcd);

This probably wants to be kref_get_not_zero() - why add a confusing
del_pend?

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH for-rc 5/7] IB/hfi1: Close race condition on user context disable and close
  2019-01-17 21:12   ` Jason Gunthorpe
@ 2019-01-17 21:32     ` Ruhl, Michael J
  0 siblings, 0 replies; 11+ messages in thread
From: Ruhl, Michael J @ 2019-01-17 21:32 UTC (permalink / raw)
  To: Jason Gunthorpe, Dalessandro, Dennis
  Cc: dledford, linux-rdma, Marciniszyn, Mike, stable

>-----Original Message-----
>From: Jason Gunthorpe [mailto:jgg@ziepe.ca]
>Sent: Thursday, January 17, 2019 4:12 PM
>To: Dalessandro, Dennis <dennis.dalessandro@intel.com>
>Cc: dledford@redhat.com; linux-rdma@vger.kernel.org; Ruhl, Michael J
><michael.j.ruhl@intel.com>; Marciniszyn, Mike
><mike.marciniszyn@intel.com>; stable@vger.kernel.org
>Subject: Re: [PATCH for-rc 5/7] IB/hfi1: Close race condition on user context
>disable and close
>
>On Thu, Jan 17, 2019 at 12:41:54PM -0800, Dennis Dalessandro wrote:
>> From: Michael J. Ruhl <michael.j.ruhl@intel.com>
>>
>> When disabling and removing a receive context, it is possible for an
>> asynchronous event (i.e IRQ) to occur.  Because of this there is a
>> race between cleaning up the context, and the context being used by
>> the asynchronous event.
>>
>> cpu 0  (context cleanup)
>>     rc->ref_count-- (ref_count == 0)
>>     hfi1_rcd_free()
>> cpu 1  (IRQ (with rcd index))
>> 	rcd_get_by_index()
>> 	lock
>> 	ref_count+++     <-- reference count race (WARNING)
>> 	return rcd
>> 	unlock
>> cpu 0
>>     hfi1_free_ctxtdata() <-- incorrect free location
>>     lock
>>     remove rcd from array
>>     unlock
>>     free rcd
>>
>> This race will cause the following WARNING trace:
>>
>> WARNING: CPU: 0 PID: 175027 at include/linux/kref.h:52
>hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
>> CPU: 0 PID: 175027 Comm: IMB-MPI1 Kdump: loaded Tainted: G OE -----------
>- 3.10.0-957.el7.x86_64 #1
>> Hardware name: Intel Corporation S2600KP/S2600KP, BIOS
>SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
>> Call Trace:
>>   dump_stack+0x19/0x1b
>>   __warn+0xd8/0x100
>>   warn_slowpath_null+0x1d/0x20
>>   hfi1_rcd_get_by_index+0x84/0xa0 [hfi1]
>>   is_rcv_urgent_int+0x24/0x90 [hfi1]
>>   general_interrupt+0x1b6/0x210 [hfi1]
>>   __handle_irq_event_percpu+0x44/0x1c0
>>   handle_irq_event_percpu+0x32/0x80
>>   handle_irq_event+0x3c/0x60
>>   handle_edge_irq+0x7f/0x150
>>   handle_irq+0xe4/0x1a0
>>   do_IRQ+0x4d/0xf0
>>   common_interrupt+0x162/0x162
>>
>> The race can also lead to a use after free which could be similar
>> to:
>>
>> general protection fault: 0000 1 SMP
>> CPU: 71 PID: 177147 Comm: IMB-MPI1 Kdump: loaded Tainted: G W OE ------
>------ 3.10.0-957.el7.x86_64 #1
>> Hardware name: Intel Corporation S2600KP/S2600KP, BIOS
>SE5C610.86B.11.01.0076.C4.111920150602 11/19/2015
>> task: ffff9962a8098000 ti: ffff99717a508000 task.ti: ffff99717a508000
>__kmalloc+0x94/0x230
>> Call Trace:
>>   ? hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
>>   hfi1_user_sdma_process_request+0x9c8/0x1250 [hfi1]
>>   hfi1_aio_write+0xba/0x110 [hfi1]
>>   do_sync_readv_writev+0x7b/0xd0
>>   do_readv_writev+0xce/0x260
>>   ? handle_mm_fault+0x39d/0x9b0
>>   ? pick_next_task_fair+0x5f/0x1b0
>>   ? sched_clock_cpu+0x85/0xc0
>>   ? __schedule+0x13a/0x890
>>   vfs_writev+0x35/0x60
>>   SyS_writev+0x7f/0x110
>>   system_call_fastpath+0x22/0x27
>>
>> Introduce a context enable flag to indicate if the context is enabled.
>>
>> Reorder context cleanup to ensure context removal before cleanup
>> occurs correctly.
>>
>> Cc: stable@vger.kernel.org # v4.14.0+
>> Fixes: f683c80ca68e ("IB/hfi1: Resolve kernel panics by reference counting
>receive contexts")
>> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
>> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
>> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
>>  drivers/infiniband/hw/hfi1/file_ops.c |    2 ++
>>  drivers/infiniband/hw/hfi1/hfi.h      |    3 ++-
>>  drivers/infiniband/hw/hfi1/init.c     |    8 +++++---
>>  3 files changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/infiniband/hw/hfi1/file_ops.c
>b/drivers/infiniband/hw/hfi1/file_ops.c
>> index c22ebc7..0ba0cf5 100644
>> +++ b/drivers/infiniband/hw/hfi1/file_ops.c
>> @@ -671,6 +671,8 @@ static int hfi1_file_close(struct inode *inode, struct
>file *fp)
>>  	}
>>  	spin_unlock_irqrestore(&dd->uctxt_lock, flags);
>>
>> +	uctxt->del_pend = 1;
>
>Not going to get good results if you update shared data outside the
>lock that is reading it..


>> @@ -326,7 +325,10 @@ struct hfi1_ctxtdata *hfi1_rcd_get_by_index(struct
>hfi1_devdata *dd, u16 ctxt)
>>  	spin_lock_irqsave(&dd->uctxt_lock, flags);
>>  	if (dd->rcd[ctxt]) {
>>  		rcd = dd->rcd[ctxt];
>> -		hfi1_rcd_get(rcd);
>> +		if (rcd->del_pend)
>> +			rcd = NULL;
>> +		else
>> +			hfi1_rcd_get(rcd);
>
>This probably wants to be kref_get_not_zero() - why add a confusing
>del_pend?

kref_get_unless_zero()?

/sigh.  I missed that function completely.

I will examine this some more and work up a new patch.

M

>Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-rc 6/7] IB/hfi1: Remove overly conservative VM_EXEC flag check
  2019-01-17 20:42 ` [PATCH for-rc 6/7] IB/hfi1: Remove overly conservative VM_EXEC flag check Dennis Dalessandro
@ 2019-01-18 21:02   ` Jason Gunthorpe
  2019-01-18 21:03     ` Jason Gunthorpe
  0 siblings, 1 reply; 11+ messages in thread
From: Jason Gunthorpe @ 2019-01-18 21:02 UTC (permalink / raw)
  To: Dennis Dalessandro
  Cc: dledford, linux-rdma, Michael J. Ruhl, Mike Marciniszyn,
	Ira Weiny, stable

On Thu, Jan 17, 2019 at 12:42:04PM -0800, Dennis Dalessandro wrote:
> From: Michael J. Ruhl <michael.j.ruhl@intel.com>
> 
> Applications that use the stack for execution purposes cause
> PSM jobs to fail during mmap().
> 
> Both Fortran (non-standard format parsing) and C (callback
> functions located in the stack) applications can be written
> such that stack execution is required.
> 
> Because of this the EXECSTACK bit can be automatically set at link
> time for any application.
> 
> On application load, the ELF loader evaluates the EXECSTACK bit
> for the application and it's linked libraries.  It will set the
> process VM flags to allow the stack to include the VM_EXEC bit
> if the EXECSTACK bit is set.  This flag is propagated to the
> driver during the mmap() call in the vma flag bits.
> 
> Checking for this bit and failing the request with EPERM is overly
> conservative and will break any PSM application that has the bit set.
> 
> Remove the VM_EXEC flag from the check.
> 
> Cc: <stable@vger.kernel.org> #v4.14+
> Fixes: 12220267645c ("IB/hfi: Protect against writable mmap")
> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
> ---
>  drivers/infiniband/hw/hfi1/file_ops.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)

applied to for-next

Thanks,
Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-rc 6/7] IB/hfi1: Remove overly conservative VM_EXEC flag check
  2019-01-18 21:02   ` Jason Gunthorpe
@ 2019-01-18 21:03     ` Jason Gunthorpe
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2019-01-18 21:03 UTC (permalink / raw)
  To: Dennis Dalessandro
  Cc: dledford, linux-rdma, Michael J. Ruhl, Mike Marciniszyn,
	Ira Weiny, stable

On Fri, Jan 18, 2019 at 02:02:35PM -0700, Jason Gunthorpe wrote:
> On Thu, Jan 17, 2019 at 12:42:04PM -0800, Dennis Dalessandro wrote:
> > From: Michael J. Ruhl <michael.j.ruhl@intel.com>
> > 
> > Applications that use the stack for execution purposes cause
> > PSM jobs to fail during mmap().
> > 
> > Both Fortran (non-standard format parsing) and C (callback
> > functions located in the stack) applications can be written
> > such that stack execution is required.
> > 
> > Because of this the EXECSTACK bit can be automatically set at link
> > time for any application.
> > 
> > On application load, the ELF loader evaluates the EXECSTACK bit
> > for the application and it's linked libraries.  It will set the
> > process VM flags to allow the stack to include the VM_EXEC bit
> > if the EXECSTACK bit is set.  This flag is propagated to the
> > driver during the mmap() call in the vma flag bits.
> > 
> > Checking for this bit and failing the request with EPERM is overly
> > conservative and will break any PSM application that has the bit set.
> > 
> > Remove the VM_EXEC flag from the check.
> > 
> > Cc: <stable@vger.kernel.org> #v4.14+
> > Fixes: 12220267645c ("IB/hfi: Protect against writable mmap")
> > Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
> > Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
> > Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> > Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
> > Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
> >  drivers/infiniband/hw/hfi1/file_ops.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> applied to for-next

sorry for-rc

Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH for-rc 7/7] IB/hfi1: Add limit test for RC/UC send via loopback
  2019-01-17 20:42 ` [PATCH for-rc 7/7] IB/hfi1: Add limit test for RC/UC send via loopback Dennis Dalessandro
@ 2019-01-18 21:04   ` Jason Gunthorpe
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Gunthorpe @ 2019-01-18 21:04 UTC (permalink / raw)
  To: Dennis Dalessandro
  Cc: dledford, linux-rdma, Michael J. Ruhl, Mike Marciniszyn, stable

On Thu, Jan 17, 2019 at 12:42:16PM -0800, Dennis Dalessandro wrote:
> From: Mike Marciniszyn <mike.marciniszyn@intel.com>
> 
> Fix potential memory corruption and panic in loopback for
> IB_WR_SEND variants.
> 
> The code blindly assumes the posted length will fit in the
> fetched rwqe, which is not a valid assumption.
> 
> Fix by adding a limit test, and triggering the appropriate
> send completion and putting the QP in an error state.  This
> mimics the handling for non-loopback QPs.
> 
> Fixes: 15703461533a ("IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt")
> Cc: <stable@vger.kernel.org> #v4.20+
> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
> ---
>  drivers/infiniband/sw/rdmavt/qp.c |    7 ++++++-
>  1 files changed, 6 insertions(+), 1 deletions(-)

Applied to for-rc

Thanks,
Jason

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: [PATCH for-rc 5/7] IB/hfi1: Close race condition on user context disable and close
       [not found]   ` <20190122155611.B55A0217F9@mail.kernel.org>
@ 2019-01-22 16:24     ` Ruhl, Michael J
  0 siblings, 0 replies; 11+ messages in thread
From: Ruhl, Michael J @ 2019-01-22 16:24 UTC (permalink / raw)
  To: Sasha Levin, Dalessandro, Dennis, jgg, dledford; +Cc: linux-rdma, stable

Hi,

This patch needs to be redone.

Please drop.

Thanks,

Mike

>-----Original Message-----
>From: Sasha Levin [mailto:sashal@kernel.org]
>Sent: Tuesday, January 22, 2019 10:56 AM
>To: Sasha Levin <sashal@kernel.org>; Dalessandro, Dennis
><dennis.dalessandro@intel.com>; Ruhl, Michael J
><michael.j.ruhl@intel.com>; jgg@ziepe.ca; dledford@redhat.com
>Cc: linux-rdma@vger.kernel.org; stable@vger.kernel.org
>Subject: Re: [PATCH for-rc 5/7] IB/hfi1: Close race condition on user context
>disable and close
>
>Hi,
>
>[This is an automated email]
>
>This commit has been processed because it contains a "Fixes:" tag,
>fixing commit: f683c80ca68e IB/hfi1: Resolve kernel panics by reference
>counting receive contexts.
>
>The bot has tested the following trees: v4.20.3, v4.19.16, v4.14.94.
>
>v4.20.3: Build OK!
>v4.19.16: Build OK!
>v4.14.94: Failed to apply! Possible dependencies:
>    071e4fec8e4d ("IB/hfi1: Reorg ctxtdata and rightsize fields")
>    1b311f8931cf ("IB/hfi1: Add tx_opcode_stats like the opcode_stats")
>    40442b30aad0 ("IB/hfi1: Move rhf_offset from devdata to ctxtdata")
>    b0ba3c18d6bf ("IB/hfi1: Move normal functions from hfi1_devdata to const
>array")
>    c8314811f9b2 ("IB/hfi1: Cleanup of exp_rcv")
>    ed71e86a8d66 ("IB/hfi1: Rename exp_lock to exp_mutex")
>
>
>How should we proceed with this patch?
>
>--
>Thanks,
>Sasha

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2019-01-22 16:24 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-17 20:40 [PATCH for-rc 0/7] hfi1 and qib patches for rc Dennis Dalessandro
2019-01-17 20:40 ` Dennis Dalessandro
2019-01-17 20:41 ` [PATCH for-rc 5/7] IB/hfi1: Close race condition on user context disable and close Dennis Dalessandro
2019-01-17 21:12   ` Jason Gunthorpe
2019-01-17 21:32     ` Ruhl, Michael J
     [not found]   ` <20190122155611.B55A0217F9@mail.kernel.org>
2019-01-22 16:24     ` Ruhl, Michael J
2019-01-17 20:42 ` [PATCH for-rc 6/7] IB/hfi1: Remove overly conservative VM_EXEC flag check Dennis Dalessandro
2019-01-18 21:02   ` Jason Gunthorpe
2019-01-18 21:03     ` Jason Gunthorpe
2019-01-17 20:42 ` [PATCH for-rc 7/7] IB/hfi1: Add limit test for RC/UC send via loopback Dennis Dalessandro
2019-01-18 21:04   ` Jason Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.