Linux-RDMA Archive on lore.kernel.org
* [PATCH rdma-rc 0/9] Fixes for v5.6
@ 2020-02-12  7:26 Leon Romanovsky
  2020-02-12  7:26 ` [PATCH rdma-rc 1/9] RDMA/ucma: Mask QPN to be 24 bits according to IBTA Leon Romanovsky
                   ` (9 more replies)
  0 siblings, 10 replies; 24+ messages in thread
From: Leon Romanovsky @ 2020-02-12  7:26 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Erez Shitrit,
	Jason Gunthorpe, Maor Gottlieb, Michael Guralnik, Moni Shoua,
	Parav Pandit, Sean Hefty, Valentine Fatiev, Yishai Hadas,
	Yonatan Cohen, Zhu Yanjun

From: Leon Romanovsky <leonro@mellanox.com>

Hi,

This batch of small fixes is sent as a patchset simply to simplify their
tracking. Some of them, like the first and second patches, were already
sent to the mailing list. The ucma patch was in our regression testing for
the whole cycle and we didn't notice any failures related to that change.

Changelog of second patch:
1. Maor added IB_QP_PKEY_INDEX and IB_QP_PORT checks and I rewrote the
code logic to be less hairy.

Thanks

Leon Romanovsky (2):
  RDMA/ucma: Mask QPN to be 24 bits according to IBTA
  RDMA/mlx5: Prevent overflow in mmap offset calculations

Maor Gottlieb (1):
  RDMA/core: Fix protection fault in get_pkey_idx_qp_list

Michael Guralnik (1):
  RDMA/core: Add missing list deletion on freeing event queue

Parav Pandit (1):
  Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"

Valentine Fatiev (1):
  IB/ipoib: Fix double free of skb in case of multicast traffic in CM
    mode

Yishai Hadas (1):
  IB/mlx5: Fix async events cleanup flows

Yonatan Cohen (1):
  IB/umad: Fix kernel crash while unloading ib_umad

Zhu Yanjun (1):
  RDMA/rxe: Fix soft lockup problem due to using tasklets in softirq

 drivers/infiniband/core/cm.c               |  3 ++
 drivers/infiniband/core/cma.c              | 15 +++++--
 drivers/infiniband/core/security.c         | 24 ++++------
 drivers/infiniband/core/ucma.c             |  2 +-
 drivers/infiniband/core/user_mad.c         |  5 ++-
 drivers/infiniband/core/uverbs_std_types.c |  1 +
 drivers/infiniband/hw/mlx5/devx.c          | 51 ++++++++++++----------
 drivers/infiniband/hw/mlx5/main.c          |  4 +-
 drivers/infiniband/sw/rxe/rxe_comp.c       |  8 ++--
 drivers/infiniband/ulp/ipoib/ipoib.h       |  1 +
 drivers/infiniband/ulp/ipoib/ipoib_cm.c    | 15 ++++---
 drivers/infiniband/ulp/ipoib/ipoib_ib.c    |  8 +++-
 drivers/infiniband/ulp/ipoib/ipoib_main.c  |  1 +
 13 files changed, 78 insertions(+), 60 deletions(-)

--
2.24.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH rdma-rc 1/9] RDMA/ucma: Mask QPN to be 24 bits according to IBTA
  2020-02-12  7:26 [PATCH rdma-rc 0/9] Fixes for v5.6 Leon Romanovsky
@ 2020-02-12  7:26 ` Leon Romanovsky
  2020-02-12  7:26 ` [PATCH rdma-rc 2/9] RDMA/core: Fix protection fault in get_pkey_idx_qp_list Leon Romanovsky
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Leon Romanovsky @ 2020-02-12  7:26 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Erez Shitrit,
	Jason Gunthorpe, Maor Gottlieb, Michael Guralnik, Moni Shoua,
	Parav Pandit, Sean Hefty, Valentine Fatiev, Yishai Hadas,
	Yonatan Cohen, Zhu Yanjun

From: Leon Romanovsky <leonro@mellanox.com>

IBTA declares the QPN as a 24-bit value. Mask the user-supplied input to
ensure that the kernel doesn't receive the higher bits, and add a
WARN_ONCE() to ensure that other CM users do the same.
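As a userspace sketch of the two checks this patch adds (QPN_MASK,
QPN_HIGH_BITS and the helper names are invented here for illustration;
they are not kernel symbols):

```c
#include <assert.h>
#include <stdint.h>

#define QPN_MASK      0x00FFFFFFu  /* IBTA: the QPN is 24 bits wide */
#define QPN_HIGH_BITS 0xFF000000u  /* bits a conforming caller never sets */

/* What ucma_copy_conn_param() now does to the user-supplied qp_num. */
static inline uint32_t qpn_clamp(uint32_t qp_num)
{
	return qp_num & QPN_MASK;
}

/* The condition ib_send_cm_rep() now flags via WARN_ONCE(). */
static inline int qpn_out_of_range(uint32_t qp_num)
{
	return (qp_num & QPN_HIGH_BITS) != 0;
}
```

A caller passing 0x12ABCDEF would trigger the warning and be clamped to
0x00ABCDEF.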

Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cm.c   | 3 +++
 drivers/infiniband/core/ucma.c | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 68cc1b2d6824..33c0d9e7bb66 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -2188,6 +2188,9 @@ int ib_send_cm_rep(struct ib_cm_id *cm_id,
 	cm_id_priv->initiator_depth = param->initiator_depth;
 	cm_id_priv->responder_resources = param->responder_resources;
 	cm_id_priv->rq_psn = cpu_to_be32(IBA_GET(CM_REP_STARTING_PSN, rep_msg));
+	WARN_ONCE(param->qp_num & 0xFF000000,
+		  "IBTA declares QPN to be 24 bits, but it is 0x%X\n",
+		  param->qp_num);
 	cm_id_priv->local_qpn = cpu_to_be32(param->qp_num & 0xFFFFFF);
 
 out:	spin_unlock_irqrestore(&cm_id_priv->lock, flags);
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 0274e9b704be..57e68491a2fd 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1045,7 +1045,7 @@ static void ucma_copy_conn_param(struct rdma_cm_id *id,
 	dst->retry_count = src->retry_count;
 	dst->rnr_retry_count = src->rnr_retry_count;
 	dst->srq = src->srq;
-	dst->qp_num = src->qp_num;
+	dst->qp_num = src->qp_num & 0xFFFFFF;
 	dst->qkey = (id->route.addr.src_addr.ss_family == AF_IB) ? src->qkey : 0;
 }
 
-- 
2.24.1



* [PATCH rdma-rc 2/9] RDMA/core: Fix protection fault in get_pkey_idx_qp_list
  2020-02-12  7:26 [PATCH rdma-rc 0/9] Fixes for v5.6 Leon Romanovsky
  2020-02-12  7:26 ` [PATCH rdma-rc 1/9] RDMA/ucma: Mask QPN to be 24 bits according to IBTA Leon Romanovsky
@ 2020-02-12  7:26 ` Leon Romanovsky
  2020-02-12  8:01   ` Leon Romanovsky
  2020-02-12  8:06   ` Leon Romanovsky
  2020-02-12  7:26 ` [PATCH rdma-rc 3/9] Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow" Leon Romanovsky
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 24+ messages in thread
From: Leon Romanovsky @ 2020-02-12  7:26 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Erez Shitrit,
	Jason Gunthorpe, Maor Gottlieb, Michael Guralnik, Moni Shoua,
	Parav Pandit, Sean Hefty, Valentine Fatiev, Yishai Hadas,
	Yonatan Cohen, Zhu Yanjun

From: Maor Gottlieb <maorg@mellanox.com>

There is no need to mark the pkey as valid when the user set only one of
the pkey index or the port number; doing so results in a NULL pointer
dereference when the uninitialized pkey list is accessed.
The following crash from Syzkaller revealed it.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 14753 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
RIP: 0010:get_pkey_idx_qp_list+0x161/0x2d0
Code: 01 00 00 49 8b 5e 20 4c 39 e3 0f 84 b9 00 00 00 e8 e4 42 6e fe 48
8d 7b 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04
02 84 c0 74 08 3c 01 0f 8e d0 00 00 00 48 8d 7d 04 48 b8
RSP: 0018:ffffc9000bc6f950 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff82c8bdec
RDX: 0000000000000002 RSI: ffffc900030a8000 RDI: 0000000000000010
RBP: ffff888112c8ce80 R08: 0000000000000004 R09: fffff5200178df1f
R10: 0000000000000001 R11: fffff5200178df1f R12: ffff888115dc4430
R13: ffff888115da8498 R14: ffff888115dc4410 R15: ffff888115da8000
FS:  00007f20777de700(0000) GS:ffff88811b100000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2f721000 CR3: 00000001173ca002 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 port_pkey_list_insert+0xd7/0x7c0
 ib_security_modify_qp+0x6fa/0xfc0
 _ib_modify_qp+0x8c4/0xbf0
 modify_qp+0x10da/0x16d0
 ib_uverbs_modify_qp+0x9a/0x100
 ib_uverbs_write+0xaa5/0xdf0
 __vfs_write+0x7c/0x100
 vfs_write+0x168/0x4a0
 ksys_write+0xc8/0x200
 do_syscall_64+0x9c/0x390
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
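The rule the fix enforces can be sketched in plain C (the flag values and
names below are illustrative stand-ins for IB_QP_PKEY_INDEX, IB_QP_PORT
and IB_PORT_PKEY_VALID, not the kernel's definitions):

```c
#include <assert.h>

/* Illustrative stand-ins; the real masks live in the IB verbs headers. */
enum { ATTR_PKEY_INDEX = 1 << 0, ATTR_PORT = 1 << 1 };
enum { PKEY_NOT_VALID = 0, PKEY_VALID = 1 };

/* The entry may be marked valid only when BOTH the pkey index and the
 * port number were supplied. Marking it valid with just one of them
 * leaves the other field uninitialized, which is what let
 * get_pkey_idx_qp_list() walk an uninitialized pkey list and crash. */
static int new_pps_state(int qp_attr_mask)
{
	if ((qp_attr_mask & ATTR_PKEY_INDEX) && (qp_attr_mask & ATTR_PORT))
		return PKEY_VALID;
	return PKEY_NOT_VALID;
}
```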

Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/security.c | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index 6eb6d2717ca5..5c8bf8ffb08c 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -339,22 +339,16 @@ static struct ib_ports_pkeys *get_new_pps(const struct ib_qp *qp,
 	if (!new_pps)
 		return NULL;
 
-	if (qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) {
-		if (!qp_pps) {
-			new_pps->main.port_num = qp_attr->port_num;
-			new_pps->main.pkey_index = qp_attr->pkey_index;
-		} else {
-			new_pps->main.port_num = (qp_attr_mask & IB_QP_PORT) ?
-						  qp_attr->port_num :
-						  qp_pps->main.port_num;
-
-			new_pps->main.pkey_index =
-					(qp_attr_mask & IB_QP_PKEY_INDEX) ?
-					 qp_attr->pkey_index :
-					 qp_pps->main.pkey_index;
-		}
+	if (qp_attr_mask & IB_QP_PORT)
+		new_pps->main.port_num =
+			(qp_pps) ? qp_pps->main.port_num : qp_attr->port_num;
+	if (qp_attr_mask & IB_QP_PKEY_INDEX)
+		new_pps->main.pkey_index = (qp_pps) ? qp_pps->main.pkey_index :
+						      qp_attr->pkey_index;
+	if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
 		new_pps->main.state = IB_PORT_PKEY_VALID;
-	} else if (qp_pps) {
+
+	if (!(qp_attr_mask & IB_QP_PKEY_INDEX & IB_QP_PORT) && qp_pps) {
 		new_pps->main.port_num = qp_pps->main.port_num;
 		new_pps->main.pkey_index = qp_pps->main.pkey_index;
 		if (qp_pps->main.state != IB_PORT_PKEY_NOT_VALID)
-- 
2.24.1



* [PATCH rdma-rc 3/9] Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
  2020-02-12  7:26 [PATCH rdma-rc 0/9] Fixes for v5.6 Leon Romanovsky
  2020-02-12  7:26 ` [PATCH rdma-rc 1/9] RDMA/ucma: Mask QPN to be 24 bits according to IBTA Leon Romanovsky
  2020-02-12  7:26 ` [PATCH rdma-rc 2/9] RDMA/core: Fix protection fault in get_pkey_idx_qp_list Leon Romanovsky
@ 2020-02-12  7:26 ` Leon Romanovsky
  2020-02-13 13:30   ` Jason Gunthorpe
  2020-02-12  7:26 ` [PATCH rdma-rc 4/9] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode Leon Romanovsky
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Leon Romanovsky @ 2020-02-12  7:26 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Erez Shitrit,
	Jason Gunthorpe, Maor Gottlieb, Michael Guralnik, Moni Shoua,
	Parav Pandit, Sean Hefty, Valentine Fatiev, Yishai Hadas,
	Yonatan Cohen, Zhu Yanjun

From: Parav Pandit <parav@mellanox.com>

This reverts commit 219d2e9dfda9431b808c28d5efc74b404b95b638.

The flow below requires cm_id_priv's destination address to be set up
before rdma_bind_addr() is performed.
Without it, source port allocation for an existing bind list fails when
the port range is small, resulting in rdma_resolve_addr() failure.

rdma_resolve_addr()
  cma_bind_addr()
    rdma_bind_addr()
      cma_get_port()
        cma_alloc_any_port()
          cma_port_is_unique() <- compared with zero daddr

Fixes: 219d2e9dfda9 ("RDMA/cma: Simplify rdma_resolve_addr() error flow")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/cma.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 72f032160c4b..2dec3a02ab9f 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -3212,19 +3212,26 @@ int rdma_resolve_addr(struct rdma_cm_id *id, struct sockaddr *src_addr,
 	int ret;
 
 	id_priv = container_of(id, struct rdma_id_private, id);
+	memcpy(cma_dst_addr(id_priv), dst_addr, rdma_addr_size(dst_addr));
 	if (id_priv->state == RDMA_CM_IDLE) {
 		ret = cma_bind_addr(id, src_addr, dst_addr);
-		if (ret)
+		if (ret) {
+			memset(cma_dst_addr(id_priv), 0,
+			       rdma_addr_size(dst_addr));
 			return ret;
+		}
 	}
 
-	if (cma_family(id_priv) != dst_addr->sa_family)
+	if (cma_family(id_priv) != dst_addr->sa_family) {
+		memset(cma_dst_addr(id_priv), 0, rdma_addr_size(dst_addr));
 		return -EINVAL;
+	}
 
-	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_ADDR_QUERY))
+	if (!cma_comp_exch(id_priv, RDMA_CM_ADDR_BOUND, RDMA_CM_ADDR_QUERY)) {
+		memset(cma_dst_addr(id_priv), 0, rdma_addr_size(dst_addr));
 		return -EINVAL;
+	}
 
-	memcpy(cma_dst_addr(id_priv), dst_addr, rdma_addr_size(dst_addr));
 	if (cma_any_addr(dst_addr)) {
 		ret = cma_resolve_loopback(id_priv);
 	} else {
-- 
2.24.1



* [PATCH rdma-rc 4/9] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
  2020-02-12  7:26 [PATCH rdma-rc 0/9] Fixes for v5.6 Leon Romanovsky
                   ` (2 preceding siblings ...)
  2020-02-12  7:26 ` [PATCH rdma-rc 3/9] Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow" Leon Romanovsky
@ 2020-02-12  7:26 ` Leon Romanovsky
  2020-02-13 15:37   ` Jason Gunthorpe
  2020-02-12  7:26 ` [PATCH rdma-rc 5/9] RDMA/core: Add missing list deletion on freeing event queue Leon Romanovsky
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 24+ messages in thread
From: Leon Romanovsky @ 2020-02-12  7:26 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Erez Shitrit,
	Jason Gunthorpe, Maor Gottlieb, Michael Guralnik, Moni Shoua,
	Parav Pandit, Sean Hefty, Valentine Fatiev, Yishai Hadas,
	Yonatan Cohen, Zhu Yanjun

From: Valentine Fatiev <valentinef@mellanox.com>

When connected mode is set and there is connected (CM) and datagram (UD)
traffic in parallel, it may end with a double free of a datagram skb.

The current mechanism assumes that the order in the completion queue is
the same as the order of sent packets for all QPs. In fact, order is kept
only within a specific QP; with mixed UD and CM traffic we have several
QPs (one UD and a few CMs) in parallel.

The problem:
----------------------------------------------------------

Transmit queue:
-----------------
The UD skb pointer is kept in the queue itself, while CM skbs are kept in
a separate queue; the transmit queue is used as a placeholder to count the
number of transmitted packets.

0   1   2   3   4  5  6  7  8   9  10  11 12 13 .........127
------------------------------------------------------------
NL ud1 UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
--------------------------------------------------------------
    ^                                  ^
   tail                               head

Completion queue(problematic scenario) - order not the same as in
transmit queue

  1  2  3  4  5  6  7  8  9
----------------------------
 ud1 CM1 UD2 ud3 cm2 cm3 ud4 cm4 ud5
----------------------------

1. CM1 'wc' processing
   - The skb is freed in the separate CM ring.
   - tx_tail of the transmit queue is increased although UD2 was not
     freed. The driver now assumes the UD2 index is already free and may
     use it for a newly transmitted skb.

0   1   2   3   4  5  6  7  8   9  10  11 12 13 .........127
------------------------------------------------------------
NL NL  UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
--------------------------------------------------------------
        ^   ^                       ^
      (Bad)tail                    head
(Bad - Could be used for new SKB)

In this case (due to heavy load) the UD2 skb pointer could be replaced by
a newly transmitted packet UD_NEW, as the driver assumes it is free.
At this point we will have to process two 'wc' entries with the same
index, but we have only one pointer to free.
During the second attempt to free the same skb we get a NULL pointer
exception.

2. UD2 'wc' processing
   - The skb is freed according to the index taken from the 'wc', but it
     was already overwritten by mistake. The skb actually released belongs
     to the newly transmitted packet, not the original one.

3. UD_NEW 'wc' processing
   - An attempt is made to free an already freed skb: NULL pointer
     exception.

Fix:
-----------------------------------------------------------------------
The fix is to stop using the UD ring as a placeholder for CM packets as
well. Instead, use an atomic counter to track the total number of
outstanding packets and use it to stop/wake the queue.
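The stop/wake logic built on the shared counter can be sketched as a
userspace model with C11 atomics (SENDQ_SIZE and the function names are
illustrative; the kernel code uses atomic_t, netif_stop_queue() and
netif_wake_queue()):

```c
#include <assert.h>
#include <stdatomic.h>

#define SENDQ_SIZE 128  /* stand-in for ipoib_sendq_size */

static atomic_int tx_outstanding;  /* shared by the UD QP and all CM QPs */
static int queue_stopped;

/* Post path: one shared counter counts every posted skb, whichever QP
 * carries it, so out-of-order completions across QPs can no longer
 * desynchronize a per-ring tail. */
static void post_send(void)
{
	if (atomic_load(&tx_outstanding) == SENDQ_SIZE - 1)
		queue_stopped = 1;              /* netif_stop_queue() */
	atomic_fetch_add(&tx_outstanding, 1);
}

/* Completion path: decrement, and wake once half the ring has drained. */
static void complete_send(void)
{
	atomic_fetch_sub(&tx_outstanding, 1);
	if (queue_stopped &&
	    atomic_load(&tx_outstanding) <= SENDQ_SIZE >> 1)
		queue_stopped = 0;              /* netif_wake_queue() */
}

/* Drive the model: n posts followed by m completions; returns whether
 * the queue ends up stopped. */
static int run(int posts, int completions)
{
	atomic_store(&tx_outstanding, 0);
	queue_stopped = 0;
	for (int i = 0; i < posts; i++)
		post_send();
	for (int i = 0; i < completions; i++)
		complete_send();
	return queue_stopped;
}
```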

Fixes: 2c104ea68350 ("IB/ipoib: Get rid of the tx_outstanding variable in all modes")
Signed-off-by: Valentine Fatiev <valentinef@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/ulp/ipoib/ipoib.h      |  1 +
 drivers/infiniband/ulp/ipoib/ipoib_cm.c   | 15 ++++++++-------
 drivers/infiniband/ulp/ipoib/ipoib_ib.c   |  8 ++++++--
 drivers/infiniband/ulp/ipoib/ipoib_main.c |  1 +
 4 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index 2aa3457a30ce..c614cb87d09b 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -379,6 +379,7 @@ struct ipoib_dev_priv {
 	struct ipoib_tx_buf *tx_ring;
 	unsigned int	     tx_head;
 	unsigned int	     tx_tail;
+	atomic_t             tx_outstanding;
 	struct ib_sge	     tx_sge[MAX_SKB_FRAGS + 1];
 	struct ib_ud_wr      tx_wr;
 	struct ib_wc	     send_wc[MAX_SEND_CQE];
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index c59e00a0881f..db6aace83fe5 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -756,7 +756,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
 		return;
 	}
 
-	if ((priv->tx_head - priv->tx_tail) == ipoib_sendq_size - 1) {
+	if (atomic_read(&priv->tx_outstanding) == ipoib_sendq_size - 1) {
 		ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
 			  tx->qp->qp_num);
 		netif_stop_queue(dev);
@@ -786,7 +786,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
 	} else {
 		netif_trans_update(dev);
 		++tx->tx_head;
-		++priv->tx_head;
+		atomic_inc(&priv->tx_outstanding);
 	}
 }
 
@@ -820,11 +820,11 @@ void ipoib_cm_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 	netif_tx_lock(dev);
 
 	++tx->tx_tail;
-	++priv->tx_tail;
+	atomic_dec(&priv->tx_outstanding);
 
 	if (unlikely(netif_queue_stopped(dev) &&
-		     (priv->tx_head - priv->tx_tail) <= ipoib_sendq_size >> 1 &&
-		     test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)))
+	    (atomic_read(&priv->tx_outstanding) <= ipoib_sendq_size >> 1) &&
+	    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)))
 		netif_wake_queue(dev);
 
 	if (wc->status != IB_WC_SUCCESS &&
@@ -1232,8 +1232,9 @@ static void ipoib_cm_tx_destroy(struct ipoib_cm_tx *p)
 		dev_kfree_skb_any(tx_req->skb);
 		netif_tx_lock_bh(p->dev);
 		++p->tx_tail;
-		++priv->tx_tail;
-		if (unlikely(priv->tx_head - priv->tx_tail == ipoib_sendq_size >> 1) &&
+		atomic_dec(&priv->tx_outstanding);
+		if (unlikely(atomic_read(&priv->tx_outstanding) <=
+			     ipoib_sendq_size >> 1) &&
 		    netif_queue_stopped(p->dev) &&
 		    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
 			netif_wake_queue(p->dev);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index c332b4761816..0652f25e8d0f 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -407,9 +407,11 @@ static void ipoib_ib_handle_tx_wc(struct net_device *dev, struct ib_wc *wc)
 	dev_kfree_skb_any(tx_req->skb);
 
 	++priv->tx_tail;
+	atomic_dec(&priv->tx_outstanding);
 
 	if (unlikely(netif_queue_stopped(dev) &&
-		     ((priv->tx_head - priv->tx_tail) <= ipoib_sendq_size >> 1) &&
+		     (atomic_read(&priv->tx_outstanding) <= ipoib_sendq_size >>
+		      1) &&
 		     test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)))
 		netif_wake_queue(dev);
 
@@ -634,7 +636,7 @@ int ipoib_send(struct net_device *dev, struct sk_buff *skb,
 	else
 		priv->tx_wr.wr.send_flags &= ~IB_SEND_IP_CSUM;
 	/* increase the tx_head after send success, but use it for queue state */
-	if (priv->tx_head - priv->tx_tail == ipoib_sendq_size - 1) {
+	if (atomic_read(&priv->tx_outstanding) == ipoib_sendq_size - 1) {
 		ipoib_dbg(priv, "TX ring full, stopping kernel net queue\n");
 		netif_stop_queue(dev);
 	}
@@ -662,6 +664,7 @@ int ipoib_send(struct net_device *dev, struct sk_buff *skb,
 
 		rc = priv->tx_head;
 		++priv->tx_head;
+		atomic_inc(&priv->tx_outstanding);
 	}
 	return rc;
 }
@@ -807,6 +810,7 @@ int ipoib_ib_dev_stop_default(struct net_device *dev)
 				ipoib_dma_unmap_tx(priv, tx_req);
 				dev_kfree_skb_any(tx_req->skb);
 				++priv->tx_tail;
+				atomic_dec(&priv->tx_outstanding);
 			}
 
 			for (i = 0; i < ipoib_recvq_size; ++i) {
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 4a0d3a9e72e1..0d98886cfef8 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1706,6 +1706,7 @@ static int ipoib_dev_init_default(struct net_device *dev)
 	}
 
 	/* priv->tx_head, tx_tail & tx_outstanding are already 0 */
+	atomic_set(&priv->tx_outstanding, 0);
 
 	if (ipoib_transport_dev_init(dev, priv->ca)) {
 		pr_warn("%s: ipoib_transport_dev_init failed\n",
-- 
2.24.1



* [PATCH rdma-rc 5/9] RDMA/core: Add missing list deletion on freeing event queue
  2020-02-12  7:26 [PATCH rdma-rc 0/9] Fixes for v5.6 Leon Romanovsky
                   ` (3 preceding siblings ...)
  2020-02-12  7:26 ` [PATCH rdma-rc 4/9] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode Leon Romanovsky
@ 2020-02-12  7:26 ` Leon Romanovsky
  2020-02-12  7:26 ` [PATCH rdma-rc 6/9] IB/mlx5: Fix async events cleanup flows Leon Romanovsky
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Leon Romanovsky @ 2020-02-12  7:26 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Erez Shitrit,
	Jason Gunthorpe, Maor Gottlieb, Michael Guralnik, Moni Shoua,
	Parav Pandit, Sean Hefty, Valentine Fatiev, Yishai Hadas,
	Yonatan Cohen, Zhu Yanjun

From: Michael Guralnik <michaelgur@mellanox.com>

Fix a bug in the disassociate flow when deleting the event queues.
The current code has a race between ib_uverbs_free_event_queue() and
ib_uverbs_event_read() which might leave entries in the list and lead to
a double free.
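The pattern the one-line fix restores can be shown with a minimal
userspace model of the kernel's intrusive list (the struct names and
helpers below are simplified stand-ins, not the <linux/list.h> API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal circular doubly linked list, modeled after struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail_entry(struct list_head *e, struct list_head *h)
{
	e->prev = h->prev; e->next = h;
	h->prev->next = e; h->prev = e;
}

static void list_del_entry(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

struct event { struct list_head list; int id; };

/* Drain the queue the way the fixed ib_uverbs_free_event_queue() does:
 * unlink each entry BEFORE freeing it, so the list never retains a
 * pointer into freed memory for a racing reader to follow. */
static int drain_queue(struct list_head *head)
{
	int freed = 0;

	while (head->next != head) {
		struct list_head *pos = head->next;
		struct event *e = (struct event *)
			((char *)pos - offsetof(struct event, list));

		list_del_entry(pos);	/* the previously missing list_del() */
		free(e);
		freed++;
	}
	return freed;
}

/* Build a queue of n events and drain it; returns the number freed. */
static int demo(int n)
{
	struct list_head head;

	list_init(&head);
	for (int i = 0; i < n; i++) {
		struct event *e = malloc(sizeof(*e));
		e->id = i;
		list_add_tail_entry(&e->list, &head);
	}
	return drain_queue(&head);
}
```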

Fixes: f7c8416ccea5 ("RDMA/core: Simplify destruction of FD uobjects")
Signed-off-by: Michael Guralnik <michaelgur@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/uverbs_std_types.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/core/uverbs_std_types.c b/drivers/infiniband/core/uverbs_std_types.c
index 994d8744b246..3abfc63225cb 100644
--- a/drivers/infiniband/core/uverbs_std_types.c
+++ b/drivers/infiniband/core/uverbs_std_types.c
@@ -220,6 +220,7 @@ void ib_uverbs_free_event_queue(struct ib_uverbs_event_queue *event_queue)
 	list_for_each_entry_safe(entry, tmp, &event_queue->event_list, list) {
 		if (entry->counter)
 			list_del(&entry->obj_list);
+		list_del(&entry->list);
 		kfree(entry);
 	}
 	spin_unlock_irq(&event_queue->lock);
-- 
2.24.1



* [PATCH rdma-rc 6/9] IB/mlx5: Fix async events cleanup flows
  2020-02-12  7:26 [PATCH rdma-rc 0/9] Fixes for v5.6 Leon Romanovsky
                   ` (4 preceding siblings ...)
  2020-02-12  7:26 ` [PATCH rdma-rc 5/9] RDMA/core: Add missing list deletion on freeing event queue Leon Romanovsky
@ 2020-02-12  7:26 ` Leon Romanovsky
  2020-02-12  7:26 ` [PATCH rdma-rc 7/9] RDMA/rxe: Fix soft lockup problem due to using tasklets in softirq Leon Romanovsky
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Leon Romanovsky @ 2020-02-12  7:26 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Erez Shitrit,
	Jason Gunthorpe, Maor Gottlieb, Michael Guralnik, Moni Shoua,
	Parav Pandit, Sean Hefty, Valentine Fatiev, Yishai Hadas,
	Yonatan Cohen, Zhu Yanjun

From: Yishai Hadas <yishaih@mellanox.com>

Fix the async event flows to prevent a race between the read-event APIs
and their destroy-uobj API.

In both the async command and event flows, delete the event entry from
the list before de-allocating its memory, and fix the async command flow
to check 'is_destroyed' properly under the lock.

This prevents accessing an entry after its de-allocation.

Fixes: f7c8416ccea5 ("RDMA/core: Simplify destruction of FD uobjects")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/devx.c | 51 +++++++++++++++++--------------
 1 file changed, 28 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/devx.c b/drivers/infiniband/hw/mlx5/devx.c
index d7efc9f6daf0..46e1ab771f10 100644
--- a/drivers/infiniband/hw/mlx5/devx.c
+++ b/drivers/infiniband/hw/mlx5/devx.c
@@ -2319,14 +2319,12 @@ static int deliver_event(struct devx_event_subscription *event_sub,
 
 	if (ev_file->omit_data) {
 		spin_lock_irqsave(&ev_file->lock, flags);
-		if (!list_empty(&event_sub->event_list)) {
+		if (!list_empty(&event_sub->event_list) ||
+		    ev_file->is_destroyed) {
 			spin_unlock_irqrestore(&ev_file->lock, flags);
 			return 0;
 		}
 
-		/* is_destroyed is ignored here because we don't have any memory
-		 * allocation to clean up for the omit_data case
-		 */
 		list_add_tail(&event_sub->event_list, &ev_file->event_list);
 		spin_unlock_irqrestore(&ev_file->lock, flags);
 		wake_up_interruptible(&ev_file->poll_wait);
@@ -2473,11 +2471,11 @@ static ssize_t devx_async_cmd_event_read(struct file *filp, char __user *buf,
 			return -ERESTARTSYS;
 		}
 
-		if (list_empty(&ev_queue->event_list) &&
-		    ev_queue->is_destroyed)
-			return -EIO;
-
 		spin_lock_irq(&ev_queue->lock);
+		if (ev_queue->is_destroyed) {
+			spin_unlock_irq(&ev_queue->lock);
+			return -EIO;
+		}
 	}
 
 	event = list_entry(ev_queue->event_list.next,
@@ -2551,10 +2549,6 @@ static ssize_t devx_async_event_read(struct file *filp, char __user *buf,
 		return -EOVERFLOW;
 	}
 
-	if (ev_file->is_destroyed) {
-		spin_unlock_irq(&ev_file->lock);
-		return -EIO;
-	}
 
 	while (list_empty(&ev_file->event_list)) {
 		spin_unlock_irq(&ev_file->lock);
@@ -2667,8 +2661,10 @@ static int devx_async_cmd_event_destroy_uobj(struct ib_uobject *uobj,
 
 	spin_lock_irq(&comp_ev_file->ev_queue.lock);
 	list_for_each_entry_safe(entry, tmp,
-				 &comp_ev_file->ev_queue.event_list, list)
+				 &comp_ev_file->ev_queue.event_list, list) {
+		list_del(&entry->list);
 		kvfree(entry);
+	}
 	spin_unlock_irq(&comp_ev_file->ev_queue.lock);
 	return 0;
 };
@@ -2680,11 +2676,29 @@ static int devx_async_event_destroy_uobj(struct ib_uobject *uobj,
 		container_of(uobj, struct devx_async_event_file,
 			     uobj);
 	struct devx_event_subscription *event_sub, *event_sub_tmp;
-	struct devx_async_event_data *entry, *tmp;
 	struct mlx5_ib_dev *dev = ev_file->dev;
 
 	spin_lock_irq(&ev_file->lock);
 	ev_file->is_destroyed = 1;
+
+	/* free the pending events allocation */
+	if (ev_file->omit_data) {
+		struct devx_event_subscription *event_sub, *tmp;
+
+		list_for_each_entry_safe(event_sub, tmp, &ev_file->event_list,
+					 event_list)
+			list_del_init(&event_sub->event_list);
+
+	} else {
+		struct devx_async_event_data *entry, *tmp;
+
+		list_for_each_entry_safe(entry, tmp, &ev_file->event_list,
+					 list) {
+			list_del(&entry->list);
+			kfree(entry);
+		}
+	}
+
 	spin_unlock_irq(&ev_file->lock);
 	wake_up_interruptible(&ev_file->poll_wait);
 
@@ -2699,15 +2713,6 @@ static int devx_async_event_destroy_uobj(struct ib_uobject *uobj,
 	}
 	mutex_unlock(&dev->devx_event_table.event_xa_lock);
 
-	/* free the pending events allocation */
-	if (!ev_file->omit_data) {
-		spin_lock_irq(&ev_file->lock);
-		list_for_each_entry_safe(entry, tmp,
-					 &ev_file->event_list, list)
-			kfree(entry); /* read can't come any more */
-		spin_unlock_irq(&ev_file->lock);
-	}
-
 	put_device(&dev->ib_dev.dev);
 	return 0;
 };
-- 
2.24.1



* [PATCH rdma-rc 7/9] RDMA/rxe: Fix soft lockup problem due to using tasklets in softirq
  2020-02-12  7:26 [PATCH rdma-rc 0/9] Fixes for v5.6 Leon Romanovsky
                   ` (5 preceding siblings ...)
  2020-02-12  7:26 ` [PATCH rdma-rc 6/9] IB/mlx5: Fix async events cleanup flows Leon Romanovsky
@ 2020-02-12  7:26 ` Leon Romanovsky
  2020-02-12  7:26 ` [PATCH rdma-rc 8/9] IB/umad: Fix kernel crash while unloading ib_umad Leon Romanovsky
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 24+ messages in thread
From: Leon Romanovsky @ 2020-02-12  7:26 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Erez Shitrit,
	Jason Gunthorpe, Maor Gottlieb, Michael Guralnik, Moni Shoua,
	Parav Pandit, Sean Hefty, Valentine Fatiev, Yishai Hadas,
	Yonatan Cohen, Zhu Yanjun

From: Zhu Yanjun <yanjunz@mellanox.com>

When running stress tests with RXE, Call Traces like the following often occur:
"
watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]
...
Call Trace:
<IRQ>
create_object+0x3f/0x3b0
kmem_cache_alloc_node_trace+0x129/0x2d0
__kmalloc_reserve.isra.52+0x2e/0x80
__alloc_skb+0x83/0x270
rxe_init_packet+0x99/0x150 [rdma_rxe]
rxe_requester+0x34e/0x11a0 [rdma_rxe]
rxe_do_task+0x85/0xf0 [rdma_rxe]
tasklet_action_common.isra.21+0xeb/0x100
__do_softirq+0xd0/0x298
irq_exit+0xc5/0xd0
smp_apic_timer_interrupt+0x68/0x120
apic_timer_interrupt+0xf/0x20
</IRQ>
...
"
The root cause is that a tasklet is actually a softirq. Inside the
tasklet handler, another softirq handler is triggered. These softirq
handlers usually run on the same CPU core, which causes the soft lockup
bug.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
CC: Yossef Efraim <yossefe@mellanox.com>
Signed-off-by: Zhu Yanjun <yanjunz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/sw/rxe/rxe_comp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 116cafc9afcf..4bc88708b355 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -329,7 +329,7 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
 					qp->comp.psn = pkt->psn;
 					if (qp->req.wait_psn) {
 						qp->req.wait_psn = 0;
-						rxe_run_task(&qp->req.task, 1);
+						rxe_run_task(&qp->req.task, 0);
 					}
 				}
 				return COMPST_ERROR_RETRY;
@@ -463,7 +463,7 @@ static void do_complete(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 	 */
 	if (qp->req.wait_fence) {
 		qp->req.wait_fence = 0;
-		rxe_run_task(&qp->req.task, 1);
+		rxe_run_task(&qp->req.task, 0);
 	}
 }
 
@@ -479,7 +479,7 @@ static inline enum comp_state complete_ack(struct rxe_qp *qp,
 		if (qp->req.need_rd_atomic) {
 			qp->comp.timeout_retry = 0;
 			qp->req.need_rd_atomic = 0;
-			rxe_run_task(&qp->req.task, 1);
+			rxe_run_task(&qp->req.task, 0);
 		}
 	}
 
@@ -725,7 +725,7 @@ int rxe_completer(void *arg)
 							RXE_CNT_COMP_RETRY);
 					qp->req.need_retry = 1;
 					qp->comp.started_retry = 1;
-					rxe_run_task(&qp->req.task, 1);
+					rxe_run_task(&qp->req.task, 0);
 				}
 
 				if (pkt) {
-- 
2.24.1



* [PATCH rdma-rc 8/9] IB/umad: Fix kernel crash while unloading ib_umad
  2020-02-12  7:26 [PATCH rdma-rc 0/9] Fixes for v5.6 Leon Romanovsky
                   ` (6 preceding siblings ...)
  2020-02-12  7:26 ` [PATCH rdma-rc 7/9] RDMA/rxe: Fix soft lockup problem due to using tasklets in softirq Leon Romanovsky
@ 2020-02-12  7:26 ` Leon Romanovsky
  2020-02-13 14:28   ` Jason Gunthorpe
  2020-02-12  7:26 ` [PATCH rdma-rc 9/9] RDMA/mlx5: Prevent overflow in mmap offset calculations Leon Romanovsky
  2020-02-13 18:03 ` [PATCH rdma-rc 0/9] Fixes for v5.6 Jason Gunthorpe
  9 siblings, 1 reply; 24+ messages in thread
From: Leon Romanovsky @ 2020-02-12  7:26 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Erez Shitrit,
	Jason Gunthorpe, Maor Gottlieb, Michael Guralnik, Moni Shoua,
	Parav Pandit, Sean Hefty, Valentine Fatiev, Yishai Hadas,
	Yonatan Cohen, Zhu Yanjun

From: Yonatan Cohen <yonatanc@mellanox.com>

When unloading ib_umad, remove the ibdev sysfs file first, before
port removal, to prevent a kernel oops.

ib_umad's ibdev_show() method might access a umad port
whose ibdev field has already been set to NULL when rmmod ib_umad
was issued from another shell.

Consider this scenario:
	         shell-1            	shell-2
	        rmmod ib_umad   	cat /sys/devices/../ibdev
	            |           		|
	        ib_umad_kill_port()        ibdev_show()
	     port->ib_dev = NULL	dev_name(port->ib_dev)

kernel stack
PF: error_code(0x0000) - not-present page
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
RIP: 0010:ibdev_show+0x18/0x50 [ib_umad]
RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000
RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870
RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000
R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40
R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58
FS:  00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0
Call Trace:
 dev_attr_show+0x15/0x50
 sysfs_kf_seq_show+0xb8/0x1a0
 seq_read+0x12d/0x350
 vfs_read+0x89/0x140
 ksys_read+0x55/0xd0
 do_syscall_64+0x55/0x1b0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: e9dd5daf884c ("IB/umad: Refactor code to use cdev_device_add()")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/user_mad.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index d1407fa378e8..1235ffb2389b 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -1312,6 +1312,9 @@ static void ib_umad_kill_port(struct ib_umad_port *port)
 	struct ib_umad_file *file;
 	int id;
 
+	cdev_device_del(&port->sm_cdev, &port->sm_dev);
+	cdev_device_del(&port->cdev, &port->dev);
+
 	mutex_lock(&port->file_mutex);
 
 	/* Mark ib_dev NULL and block ioctl or other file ops to progress
@@ -1331,8 +1334,6 @@ static void ib_umad_kill_port(struct ib_umad_port *port)
 
 	mutex_unlock(&port->file_mutex);
 
-	cdev_device_del(&port->sm_cdev, &port->sm_dev);
-	cdev_device_del(&port->cdev, &port->dev);
 	ida_free(&umad_ida, port->dev_num);
 
 	/* balances device_initialize() */
-- 
2.24.1



* [PATCH rdma-rc 9/9] RDMA/mlx5: Prevent overflow in mmap offset calculations
  2020-02-12  7:26 [PATCH rdma-rc 0/9] Fixes for v5.6 Leon Romanovsky
                   ` (7 preceding siblings ...)
  2020-02-12  7:26 ` [PATCH rdma-rc 8/9] IB/umad: Fix kernel crash while unloading ib_umad Leon Romanovsky
@ 2020-02-12  7:26 ` Leon Romanovsky
  2020-02-13 18:03 ` [PATCH rdma-rc 0/9] Fixes for v5.6 Jason Gunthorpe
  9 siblings, 0 replies; 24+ messages in thread
From: Leon Romanovsky @ 2020-02-12  7:26 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: Leon Romanovsky, RDMA mailing list, Daniel Jurgens, Erez Shitrit,
	Jason Gunthorpe, Maor Gottlieb, Michael Guralnik, Moni Shoua,
	Parav Pandit, Sean Hefty, Valentine Fatiev, Yishai Hadas,
	Yonatan Cohen, Zhu Yanjun

From: Leon Romanovsky <leonro@mellanox.com>

The cmd and index variables are declared as u16 while the result is
supposed to be stored in a u64. C integer promotions only widen the u16
operands to 32-bit int, so "(index >> 8) << 16" and the final shift by
PAGE_SHIFT are evaluated in 32 bits, and the upper bits of the offset
are truncated.

Fixes: 7be76bef320b ("IB/mlx5: Introduce VAR object and its alloc/destroy methods")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/hw/mlx5/main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index e874d688d040..987bfdcd12a5 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2283,8 +2283,8 @@ static int mlx5_ib_mmap_offset(struct mlx5_ib_dev *dev,
 
 static u64 mlx5_entry_to_mmap_offset(struct mlx5_user_mmap_entry *entry)
 {
-	u16 cmd = entry->rdma_entry.start_pgoff >> 16;
-	u16 index = entry->rdma_entry.start_pgoff & 0xFFFF;
+	u64 cmd = (entry->rdma_entry.start_pgoff >> 16) & 0xFFFF;
+	u64 index = entry->rdma_entry.start_pgoff & 0xFFFF;
 
 	return (((index >> 8) << 16) | (cmd << MLX5_IB_MMAP_CMD_SHIFT) |
 		(index & 0xFF)) << PAGE_SHIFT;
-- 
2.24.1



* Re: [PATCH rdma-rc 2/9] RDMA/core: Fix protection fault in get_pkey_idx_qp_list
  2020-02-12  7:26 ` [PATCH rdma-rc 2/9] RDMA/core: Fix protection fault in get_pkey_idx_qp_list Leon Romanovsky
@ 2020-02-12  8:01   ` Leon Romanovsky
  2020-02-12  8:06   ` Leon Romanovsky
  1 sibling, 0 replies; 24+ messages in thread
From: Leon Romanovsky @ 2020-02-12  8:01 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: RDMA mailing list, Daniel Jurgens, Erez Shitrit, Jason Gunthorpe,
	Maor Gottlieb, Michael Guralnik, Moni Shoua, Parav Pandit,
	Sean Hefty, Valentine Fatiev, Yishai Hadas, Yonatan Cohen,
	Zhu Yanjun

On Wed, Feb 12, 2020 at 09:26:28AM +0200, Leon Romanovsky wrote:
> From: Maor Gottlieb <maorg@mellanox.com>
>
> We should not mark the pkey as valid when the user sets only one of
> pkey index or port number; otherwise it results in a NULL pointer
> dereference when the uninitialized pkey list is accessed.
> The following crash from Syzkaller revealed it.
>
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] SMP KASAN PTI
> CPU: 1 PID: 14753 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> RIP: 0010:get_pkey_idx_qp_list+0x161/0x2d0
> Code: 01 00 00 49 8b 5e 20 4c 39 e3 0f 84 b9 00 00 00 e8 e4 42 6e fe 48
> 8d 7b 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04
> 02 84 c0 74 08 3c 01 0f 8e d0 00 00 00 48 8d 7d 04 48 b8
> RSP: 0018:ffffc9000bc6f950 EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff82c8bdec
> RDX: 0000000000000002 RSI: ffffc900030a8000 RDI: 0000000000000010
> RBP: ffff888112c8ce80 R08: 0000000000000004 R09: fffff5200178df1f
> R10: 0000000000000001 R11: fffff5200178df1f R12: ffff888115dc4430
> R13: ffff888115da8498 R14: ffff888115dc4410 R15: ffff888115da8000
> FS:  00007f20777de700(0000) GS:ffff88811b100000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000001b2f721000 CR3: 00000001173ca002 CR4: 0000000000360ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  port_pkey_list_insert+0xd7/0x7c0
>  ib_security_modify_qp+0x6fa/0xfc0
>  _ib_modify_qp+0x8c4/0xbf0
>  modify_qp+0x10da/0x16d0
>  ib_uverbs_modify_qp+0x9a/0x100
>  ib_uverbs_write+0xaa5/0xdf0
>  __vfs_write+0x7c/0x100
>  vfs_write+0x168/0x4a0
>  ksys_write+0xc8/0x200
>  do_syscall_64+0x9c/0x390
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs")
> Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  drivers/infiniband/core/security.c | 24 +++++++++---------------
>  1 file changed, 9 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
> index 6eb6d2717ca5..5c8bf8ffb08c 100644
> --- a/drivers/infiniband/core/security.c
> +++ b/drivers/infiniband/core/security.c
> @@ -339,22 +339,16 @@ static struct ib_ports_pkeys *get_new_pps(const struct ib_qp *qp,
>  	if (!new_pps)
>  		return NULL;
>
> -	if (qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) {
> -		if (!qp_pps) {
> -			new_pps->main.port_num = qp_attr->port_num;
> -			new_pps->main.pkey_index = qp_attr->pkey_index;
> -		} else {
> -			new_pps->main.port_num = (qp_attr_mask & IB_QP_PORT) ?
> -						  qp_attr->port_num :
> -						  qp_pps->main.port_num;
> -
> -			new_pps->main.pkey_index =
> -					(qp_attr_mask & IB_QP_PKEY_INDEX) ?
> -					 qp_attr->pkey_index :
> -					 qp_pps->main.pkey_index;
> -		}
> +	if (qp_attr_mask & IB_QP_PORT)
> +		new_pps->main.port_num =
> +			(qp_pps) ? qp_pps->main.port_num : qp_attr->port_num;
> +	if (qp_attr_mask & IB_QP_PKEY_INDEX)
> +		new_pps->main.pkey_index = (qp_pps) ? qp_pps->main.pkey_index :
> +						      qp_attr->pkey_index;
> +	if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
>  		new_pps->main.state = IB_PORT_PKEY_VALID;
> -	} else if (qp_pps) {
> +
> +	if (!(qp_attr_mask & IB_QP_PKEY_INDEX & IB_QP_PORT) && qp_pps) {
                                              ^^
Sorry, this is rebase error, I'll send another version of this patch now.

>  		new_pps->main.port_num = qp_pps->main.port_num;
>  		new_pps->main.pkey_index = qp_pps->main.pkey_index;
>  		if (qp_pps->main.state != IB_PORT_PKEY_NOT_VALID)
> --
> 2.24.1
>


* Re: [PATCH rdma-rc 2/9] RDMA/core: Fix protection fault in get_pkey_idx_qp_list
  2020-02-12  7:26 ` [PATCH rdma-rc 2/9] RDMA/core: Fix protection fault in get_pkey_idx_qp_list Leon Romanovsky
  2020-02-12  8:01   ` Leon Romanovsky
@ 2020-02-12  8:06   ` Leon Romanovsky
  1 sibling, 0 replies; 24+ messages in thread
From: Leon Romanovsky @ 2020-02-12  8:06 UTC (permalink / raw)
  To: Doug Ledford, Jason Gunthorpe
  Cc: RDMA mailing list, Daniel Jurgens, Erez Shitrit, Jason Gunthorpe,
	Maor Gottlieb, Michael Guralnik, Moni Shoua, Parav Pandit,
	Sean Hefty, Valentine Fatiev, Yishai Hadas, Yonatan Cohen,
	Zhu Yanjun

On Wed, Feb 12, 2020 at 09:26:28AM +0200, Leon Romanovsky wrote:
> From: Maor Gottlieb <maorg@mellanox.com>
>
> We should not mark the pkey as valid when the user sets only one of
> pkey index or port number; otherwise it results in a NULL pointer
> dereference when the uninitialized pkey list is accessed.
> The following crash from Syzkaller revealed it.
>
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory access
> general protection fault: 0000 [#1] SMP KASAN PTI
> CPU: 1 PID: 14753 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
> RIP: 0010:get_pkey_idx_qp_list+0x161/0x2d0
> Code: 01 00 00 49 8b 5e 20 4c 39 e3 0f 84 b9 00 00 00 e8 e4 42 6e fe 48
> 8d 7b 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04
> 02 84 c0 74 08 3c 01 0f 8e d0 00 00 00 48 8d 7d 04 48 b8
> RSP: 0018:ffffc9000bc6f950 EFLAGS: 00010202
> RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff82c8bdec
> RDX: 0000000000000002 RSI: ffffc900030a8000 RDI: 0000000000000010
> RBP: ffff888112c8ce80 R08: 0000000000000004 R09: fffff5200178df1f
> R10: 0000000000000001 R11: fffff5200178df1f R12: ffff888115dc4430
> R13: ffff888115da8498 R14: ffff888115dc4410 R15: ffff888115da8000
> FS:  00007f20777de700(0000) GS:ffff88811b100000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000001b2f721000 CR3: 00000001173ca002 CR4: 0000000000360ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  port_pkey_list_insert+0xd7/0x7c0
>  ib_security_modify_qp+0x6fa/0xfc0
>  _ib_modify_qp+0x8c4/0xbf0
>  modify_qp+0x10da/0x16d0
>  ib_uverbs_modify_qp+0x9a/0x100
>  ib_uverbs_write+0xaa5/0xdf0
>  __vfs_write+0x7c/0x100
>  vfs_write+0x168/0x4a0
>  ksys_write+0xc8/0x200
>  do_syscall_64+0x9c/0x390
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs")
> Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> ---
>  drivers/infiniband/core/security.c | 24 +++++++++---------------
>  1 file changed, 9 insertions(+), 15 deletions(-)


From e4536a662610eac76dc817c3a6956fa19542772d Mon Sep 17 00:00:00 2001
From: Maor Gottlieb <maorg@mellanox.com>
Date: Thu, 16 Jan 2020 09:31:40 +0200
Subject: [PATCH rdma-rc v1] RDMA/core: Fix protection fault in
 get_pkey_idx_qp_list
To: Doug Ledford <dledford@redhat.com>, Jason Gunthorpe <jgg@mellanox.com>
Cc: RDMA mailing list <linux-rdma@vger.kernel.org>, Daniel Jurgens <danielj@mellanox.com>, Maor Gottlieb <maorg@mellanox.com>

We should not mark the pkey as valid when the user sets only one of
pkey index or port number; otherwise it results in a NULL pointer
dereference when the uninitialized pkey list is accessed.
The following crash from Syzkaller revealed it.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 14753 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
RIP: 0010:get_pkey_idx_qp_list+0x161/0x2d0
Code: 01 00 00 49 8b 5e 20 4c 39 e3 0f 84 b9 00 00 00 e8 e4 42 6e fe 48
8d 7b 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04
02 84 c0 74 08 3c 01 0f 8e d0 00 00 00 48 8d 7d 04 48 b8
RSP: 0018:ffffc9000bc6f950 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff82c8bdec
RDX: 0000000000000002 RSI: ffffc900030a8000 RDI: 0000000000000010
RBP: ffff888112c8ce80 R08: 0000000000000004 R09: fffff5200178df1f
R10: 0000000000000001 R11: fffff5200178df1f R12: ffff888115dc4430
R13: ffff888115da8498 R14: ffff888115dc4410 R15: ffff888115da8000
FS:  00007f20777de700(0000) GS:ffff88811b100000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b2f721000 CR3: 00000001173ca002 CR4: 0000000000360ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 port_pkey_list_insert+0xd7/0x7c0
 ib_security_modify_qp+0x6fa/0xfc0
 _ib_modify_qp+0x8c4/0xbf0
 modify_qp+0x10da/0x16d0
 ib_uverbs_modify_qp+0x9a/0x100
 ib_uverbs_write+0xaa5/0xdf0
 __vfs_write+0x7c/0x100
 vfs_write+0x168/0x4a0
 ksys_write+0xc8/0x200
 do_syscall_64+0x9c/0x390
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
Changelog:
 1. Replaced "&" with "||" in last if.
---
 drivers/infiniband/core/security.c | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c
index 6eb6d2717ca5..2b4d80393bd0 100644
--- a/drivers/infiniband/core/security.c
+++ b/drivers/infiniband/core/security.c
@@ -339,22 +339,16 @@ static struct ib_ports_pkeys *get_new_pps(const struct ib_qp *qp,
 	if (!new_pps)
 		return NULL;

-	if (qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) {
-		if (!qp_pps) {
-			new_pps->main.port_num = qp_attr->port_num;
-			new_pps->main.pkey_index = qp_attr->pkey_index;
-		} else {
-			new_pps->main.port_num = (qp_attr_mask & IB_QP_PORT) ?
-						  qp_attr->port_num :
-						  qp_pps->main.port_num;
-
-			new_pps->main.pkey_index =
-					(qp_attr_mask & IB_QP_PKEY_INDEX) ?
-					 qp_attr->pkey_index :
-					 qp_pps->main.pkey_index;
-		}
+	if (qp_attr_mask & IB_QP_PORT)
+		new_pps->main.port_num =
+			(qp_pps) ? qp_pps->main.port_num : qp_attr->port_num;
+	if (qp_attr_mask & IB_QP_PKEY_INDEX)
+		new_pps->main.pkey_index = (qp_pps) ? qp_pps->main.pkey_index :
+						      qp_attr->pkey_index;
+	if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT))
 		new_pps->main.state = IB_PORT_PKEY_VALID;
-	} else if (qp_pps) {
+
+	if (!(qp_attr_mask & (IB_QP_PKEY_INDEX || IB_QP_PORT)) && qp_pps) {
 		new_pps->main.port_num = qp_pps->main.port_num;
 		new_pps->main.pkey_index = qp_pps->main.pkey_index;
 		if (qp_pps->main.state != IB_PORT_PKEY_NOT_VALID)
--
2.24.1



* Re: [PATCH rdma-rc 3/9] Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
  2020-02-12  7:26 ` [PATCH rdma-rc 3/9] Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow" Leon Romanovsky
@ 2020-02-13 13:30   ` Jason Gunthorpe
  2020-02-14  3:11     ` Parav Pandit
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2020-02-13 13:30 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Daniel Jurgens,
	Erez Shitrit, Maor Gottlieb, Michael Guralnik, Moni Shoua,
	Parav Pandit, Sean Hefty, Valentine Fatiev, Yishai Hadas,
	Yonatan Cohen, Zhu Yanjun

On Wed, Feb 12, 2020 at 09:26:29AM +0200, Leon Romanovsky wrote:
> From: Parav Pandit <parav@mellanox.com>
> 
> This reverts commit 219d2e9dfda9431b808c28d5efc74b404b95b638.
> 
> Below flow requires cm_id_priv's destination address to be set up
> before performing rdma_bind_addr().
> Without it, source port allocation for the existing bind list fails
> when the port range is small, resulting in rdma_resolve_addr()
> failure.

I don't quite understand this - what is "when port range is small"?
 
> rdma_resolve_addr()
>   cma_bind_addr()
>     rdma_bind_addr()
>       cma_get_port()
>         cma_alloc_any_port()
>           cma_port_is_unique() <- compared with zero daddr

Do you understand why cma_port_is_unique is even testing the dst_addr?
It seems very strange.

Outbound connections should not alias the source port in the first
place??

Jason


* Re: [PATCH rdma-rc 8/9] IB/umad: Fix kernel crash while unloading ib_umad
  2020-02-12  7:26 ` [PATCH rdma-rc 8/9] IB/umad: Fix kernel crash while unloading ib_umad Leon Romanovsky
@ 2020-02-13 14:28   ` Jason Gunthorpe
  2020-02-13 18:03     ` Leon Romanovsky
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2020-02-13 14:28 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Daniel Jurgens,
	Erez Shitrit, Maor Gottlieb, Michael Guralnik, Moni Shoua,
	Parav Pandit, Sean Hefty, Valentine Fatiev, Yishai Hadas,
	Yonatan Cohen, Zhu Yanjun

On Wed, Feb 12, 2020 at 09:26:34AM +0200, Leon Romanovsky wrote:
> From: Yonatan Cohen <yonatanc@mellanox.com>
> 
> When unloading ib_umad, remove the ibdev sysfs file first, before
> port removal, to prevent a kernel oops.
> 
> ib_umad's ibdev_show() method might access a umad port
> whose ibdev field has already been set to NULL when rmmod ib_umad
> was issued from another shell.
> 
> Consider this scenario
> 	         shell-1            	shell-2
> 	        rmmod ib_mod    	cat /sys/devices/../ibdev
> 	            |           		|
> 	        ib_umad_kill_port()        ibdev_show()
> 	     port->ib_dev = NULL	dev_name(port->ib_dev)
> 
> kernel stack
> PF: error_code(0x0000) - not-present page
> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> RIP: 0010:ibdev_show+0x18/0x50 [ib_umad]
> RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000
> RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870
> RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000
> R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40
> R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58
> FS:  00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0
> Call Trace:
>  dev_attr_show+0x15/0x50
>  sysfs_kf_seq_show+0xb8/0x1a0
>  seq_read+0x12d/0x350
>  vfs_read+0x89/0x140
>  ksys_read+0x55/0xd0
>  do_syscall_64+0x55/0x1b0
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9:
> 
> Fixes: e9dd5daf884c ("IB/umad: Refactor code to use cdev_device_add()")

This is the wrong fixes line, this ordering change was actually
deliberately done:

commit cf7ad3030271c55a7119a8c2162563e3f6e93879
Author: Parav Pandit <parav@mellanox.com>
Date:   Fri Dec 21 16:19:24 2018 +0200

    IB/umad: Avoid destroying device while it is accessed
    
    ib_umad_reg_agent2() and ib_umad_reg_agent() access the device name in
    dev_notice(), while concurrently, ib_umad_kill_port() can destroy the
    device using device_destroy().
    
            cpu-0                               cpu-1
            -----                               -----
        ib_umad_ioctl()
            [...]                            ib_umad_kill_port()
                                                  device_destroy(dev)
    
            ib_umad_reg_agent()
                dev_notice(dev)

The mistake in the above was to move the device_destroy() down, not to
split it into device_del() above and put_device() below.

Now that it is already split, we are OK.

Jason


* Re: [PATCH rdma-rc 4/9] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
  2020-02-12  7:26 ` [PATCH rdma-rc 4/9] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode Leon Romanovsky
@ 2020-02-13 15:37   ` Jason Gunthorpe
  2020-02-13 18:10     ` Leon Romanovsky
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2020-02-13 15:37 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Daniel Jurgens,
	Erez Shitrit, Maor Gottlieb, Michael Guralnik, Moni Shoua,
	Parav Pandit, Sean Hefty, Valentine Fatiev, Yishai Hadas,
	Yonatan Cohen, Zhu Yanjun

On Wed, Feb 12, 2020 at 09:26:30AM +0200, Leon Romanovsky wrote:

> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
> index 2aa3457a30ce..c614cb87d09b 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> @@ -379,6 +379,7 @@ struct ipoib_dev_priv {
>  	struct ipoib_tx_buf *tx_ring;
>  	unsigned int	     tx_head;
>  	unsigned int	     tx_tail;
> +	atomic_t             tx_outstanding;
>  	struct ib_sge	     tx_sge[MAX_SKB_FRAGS + 1];
>  	struct ib_ud_wr      tx_wr;
>  	struct ib_wc	     send_wc[MAX_SEND_CQE];
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> index c59e00a0881f..db6aace83fe5 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -756,7 +756,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
>  		return;
>  	}
>  
> -	if ((priv->tx_head - priv->tx_tail) == ipoib_sendq_size - 1) {
> +	if (atomic_read(&priv->tx_outstanding) == ipoib_sendq_size - 1) {
>  		ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
>  			  tx->qp->qp_num);
>  		netif_stop_queue(dev);
> @@ -786,7 +786,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
>  	} else {
>  		netif_trans_update(dev);
>  		++tx->tx_head;
> -		++priv->tx_head;
> +		atomic_inc(&priv->tx_outstanding);
>  	}

This use of an atomic is very weird, probably wrong.

Why is it an atomic?  priv->tx_head wasn't an atomic, and every place
touching tx_outstanding was also touching tx_head.

I assume there is some hidden locking here? Or much more stuff is
busted up.

In that case, drop the atomic.

However, if the atomic is needed (where/why?) then something has to
be dealing with the races, and if the write side is fully locked then
an atomic is the wrong choice, use READ_ONCE/WRITE_ONCE instead

Jason


* Re: [PATCH rdma-rc 0/9] Fixes for v5.6
  2020-02-12  7:26 [PATCH rdma-rc 0/9] Fixes for v5.6 Leon Romanovsky
                   ` (8 preceding siblings ...)
  2020-02-12  7:26 ` [PATCH rdma-rc 9/9] RDMA/mlx5: Prevent overflow in mmap offset calculations Leon Romanovsky
@ 2020-02-13 18:03 ` Jason Gunthorpe
  9 siblings, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2020-02-13 18:03 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Daniel Jurgens,
	Erez Shitrit, Maor Gottlieb, Michael Guralnik, Moni Shoua,
	Parav Pandit, Sean Hefty, Valentine Fatiev, Yishai Hadas,
	Yonatan Cohen, Zhu Yanjun

On Wed, Feb 12, 2020 at 09:26:26AM +0200, Leon Romanovsky wrote:
> From: Leon Romanovsky <leonro@mellanox.com>
> 
> Hi,
> 
> This pack of small fixes is sent as a patchset simply to simplify their
> tracking. Some of them, like first and second patches were already
> sent to the mailing list. The ucma patch was in our regression for whole
> cycle and we didn't notice any failures related to that change.
> 
> Changelog of second patch:
> 1. Maor added IB_QP_PKEY_INDEX and IB_QP_PORT checks and I rewrote the
> code logic to be less hairy.
> 
> Thanks
> 
> Leon Romanovsky (2):
>   RDMA/ucma: Mask QPN to be 24 bits according to IBTA

I put this one in to for-next

>   RDMA/mlx5: Prevent overflow in mmap offset calculations
> 
> Maor Gottlieb (1):
>   RDMA/core: Fix protection fault in get_pkey_idx_qp_list
> 
> Michael Guralnik (1):
>   RDMA/core: Add missing list deletion on freeing event queue
>  
> Yishai Hadas (1):
>   IB/mlx5: Fix async events cleanup flows
> 
> Yonatan Cohen (1):
>   IB/umad: Fix kernel crash while unloading ib_umad
> 
> Zhu Yanjun (1):
>   RDMA/rxe: Fix soft lockup problem due to using tasklets in softirq

And these to for-rc

> Parav Pandit (1):
>   Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
> 
> Valentine Fatiev (1):
>   IB/ipoib: Fix double free of skb in case of multicast traffic in CM
>     mode

These I want to see the discussion on

Thanks,
Jason


* Re: [PATCH rdma-rc 8/9] IB/umad: Fix kernel crash while unloading ib_umad
  2020-02-13 14:28   ` Jason Gunthorpe
@ 2020-02-13 18:03     ` Leon Romanovsky
  0 siblings, 0 replies; 24+ messages in thread
From: Leon Romanovsky @ 2020-02-13 18:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Daniel Jurgens, Erez Shitrit,
	Maor Gottlieb, Michael Guralnik, Moni Shoua, Parav Pandit,
	Sean Hefty, Valentine Fatiev, Yishai Hadas, Yonatan Cohen,
	Zhu Yanjun

On Thu, Feb 13, 2020 at 10:28:18AM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 12, 2020 at 09:26:34AM +0200, Leon Romanovsky wrote:
> > From: Yonatan Cohen <yonatanc@mellanox.com>
> >
> > When unloading ib_umad, remove the ibdev sysfs file first, before
> > port removal, to prevent a kernel oops.
> >
> > ib_umad's ibdev_show() method might access a umad port
> > whose ibdev field has already been set to NULL when rmmod ib_umad
> > was issued from another shell.
> >
> > Consider this scenario
> > 	         shell-1            	shell-2
> > 	        rmmod ib_mod    	cat /sys/devices/../ibdev
> > 	            |           		|
> > 	        ib_umad_kill_port()        ibdev_show()
> > 	     port->ib_dev = NULL	dev_name(port->ib_dev)
> >
> > kernel stack
> > PF: error_code(0x0000) - not-present page
> > Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> > RIP: 0010:ibdev_show+0x18/0x50 [ib_umad]
> > RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282
> > RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000
> > RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870
> > RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000
> > R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40
> > R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58
> > FS:  00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0
> > Call Trace:
> >  dev_attr_show+0x15/0x50
> >  sysfs_kf_seq_show+0xb8/0x1a0
> >  seq_read+0x12d/0x350
> >  vfs_read+0x89/0x140
> >  ksys_read+0x55/0xd0
> >  do_syscall_64+0x55/0x1b0
> >  entry_SYSCALL_64_after_hwframe+0x44/0xa9:
> >
> > Fixes: e9dd5daf884c ("IB/umad: Refactor code to use cdev_device_add()")
>
> This is the wrong fixes line, this ordering change was actually
> deliberately done:
>

Can you please fix the Fixes line, so I will resend only the unaccepted patches?

Thanks


* Re: [PATCH rdma-rc 4/9] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
  2020-02-13 15:37   ` Jason Gunthorpe
@ 2020-02-13 18:10     ` Leon Romanovsky
  2020-02-13 18:26       ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Leon Romanovsky @ 2020-02-13 18:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Daniel Jurgens, Erez Shitrit,
	Maor Gottlieb, Michael Guralnik, Moni Shoua, Parav Pandit,
	Sean Hefty, Valentine Fatiev, Yishai Hadas, Yonatan Cohen,
	Zhu Yanjun

On Thu, Feb 13, 2020 at 11:37:43AM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 12, 2020 at 09:26:30AM +0200, Leon Romanovsky wrote:
>
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
> > index 2aa3457a30ce..c614cb87d09b 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> > @@ -379,6 +379,7 @@ struct ipoib_dev_priv {
> >  	struct ipoib_tx_buf *tx_ring;
> >  	unsigned int	     tx_head;
> >  	unsigned int	     tx_tail;
> > +	atomic_t             tx_outstanding;
> >  	struct ib_sge	     tx_sge[MAX_SKB_FRAGS + 1];
> >  	struct ib_ud_wr      tx_wr;
> >  	struct ib_wc	     send_wc[MAX_SEND_CQE];
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > index c59e00a0881f..db6aace83fe5 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > @@ -756,7 +756,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
> >  		return;
> >  	}
> >
> > -	if ((priv->tx_head - priv->tx_tail) == ipoib_sendq_size - 1) {
> > +	if (atomic_read(&priv->tx_outstanding) == ipoib_sendq_size - 1) {
> >  		ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
> >  			  tx->qp->qp_num);
> >  		netif_stop_queue(dev);
> > @@ -786,7 +786,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
> >  	} else {
> >  		netif_trans_update(dev);
> >  		++tx->tx_head;
> > -		++priv->tx_head;
> > +		atomic_inc(&priv->tx_outstanding);
> >  	}
>
> This use of an atomic is very weird, probably wrong.
>
> Why is it an atomic?  priv->tx_head wasn't an atomic, and every place
> touching tx_outstanding was also touching tx_head.
>
> I assume there is some hidden locking here? Or much more stuff is
> busted up.
>
> In that case, drop the atomic.
>
> However, if the atomic is needed (where/why?) then something has to
> be dealing with the races, and if the write side is fully locked then
> an atomic is the wrong choice, use READ_ONCE/WRITE_ONCE instead

I thought that atomic_t was appropriate here. Valentine wanted to
implement a "window" and needed to ensure that the window is counted
correctly, while it doesn't need to be 100% accurate.

Thanks

>
> Jason


* Re: [PATCH rdma-rc 4/9] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
  2020-02-13 18:10     ` Leon Romanovsky
@ 2020-02-13 18:26       ` Jason Gunthorpe
  2020-02-13 18:36         ` Leon Romanovsky
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2020-02-13 18:26 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Daniel Jurgens, Erez Shitrit,
	Maor Gottlieb, Michael Guralnik, Moni Shoua, Parav Pandit,
	Sean Hefty, Valentine Fatiev, Yishai Hadas, Yonatan Cohen,
	Zhu Yanjun

On Thu, Feb 13, 2020 at 08:10:12PM +0200, Leon Romanovsky wrote:
> On Thu, Feb 13, 2020 at 11:37:43AM -0400, Jason Gunthorpe wrote:
> > On Wed, Feb 12, 2020 at 09:26:30AM +0200, Leon Romanovsky wrote:
> >
> > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
> > > index 2aa3457a30ce..c614cb87d09b 100644
> > > +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> > > @@ -379,6 +379,7 @@ struct ipoib_dev_priv {
> > >  	struct ipoib_tx_buf *tx_ring;
> > >  	unsigned int	     tx_head;
> > >  	unsigned int	     tx_tail;
> > > +	atomic_t             tx_outstanding;
> > >  	struct ib_sge	     tx_sge[MAX_SKB_FRAGS + 1];
> > >  	struct ib_ud_wr      tx_wr;
> > >  	struct ib_wc	     send_wc[MAX_SEND_CQE];
> > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > index c59e00a0881f..db6aace83fe5 100644
> > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > @@ -756,7 +756,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
> > >  		return;
> > >  	}
> > >
> > > -	if ((priv->tx_head - priv->tx_tail) == ipoib_sendq_size - 1) {
> > > +	if (atomic_read(&priv->tx_outstanding) == ipoib_sendq_size - 1) {
> > >  		ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
> > >  			  tx->qp->qp_num);
> > >  		netif_stop_queue(dev);
> > > @@ -786,7 +786,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
> > >  	} else {
> > >  		netif_trans_update(dev);
> > >  		++tx->tx_head;
> > > -		++priv->tx_head;
> > > +		atomic_inc(&priv->tx_outstanding);
> > >  	}
> >
> > This use of an atomic is very weird, probably wrong.
> >
> > Why is it an atomic?  priv->tx_head wasn't an atomic, and every place
> > touching tx_outstanding was also touching tx_head.
> >
> > I assume there is some hidden locking here? Or much more stuff is
> > busted up.
> >
> > In that case, drop the atomic.
> >
> > However, if the atomic is needed (where/why?) then something has to
> > be dealing with the races, and if the write side is fully locked then
> > an atomic is the wrong choice, use READ_ONCE/WRITE_ONCE instead
> 
> I thought that atomic_t is appropriate here. Valentine wanted to
> implement a "window" and needed to ensure that this window is counted
> correctly, although it doesn't need to be 100% accurate.

It does need to be 100% accurate because the stop_queue condition is:

 +	if (atomic_read(&priv->tx_outstanding) == ipoib_sendq_size - 1) {

And presumably the whole problem here is overflowing some fixed-size
queue object.

That atomic is only needed if there is concurrency, and where is the
concurrency?

Jason

* Re: [PATCH rdma-rc 4/9] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
  2020-02-13 18:26       ` Jason Gunthorpe
@ 2020-02-13 18:36         ` Leon Romanovsky
  2020-02-13 19:09           ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Leon Romanovsky @ 2020-02-13 18:36 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Doug Ledford, RDMA mailing list, Daniel Jurgens, Erez Shitrit,
	Maor Gottlieb, Michael Guralnik, Moni Shoua, Parav Pandit,
	Sean Hefty, Valentine Fatiev, Yishai Hadas, Yonatan Cohen,
	Zhu Yanjun

On Thu, Feb 13, 2020 at 02:26:55PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 13, 2020 at 08:10:12PM +0200, Leon Romanovsky wrote:
> > On Thu, Feb 13, 2020 at 11:37:43AM -0400, Jason Gunthorpe wrote:
> > > On Wed, Feb 12, 2020 at 09:26:30AM +0200, Leon Romanovsky wrote:
> > >
> > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
> > > > index 2aa3457a30ce..c614cb87d09b 100644
> > > > +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> > > > @@ -379,6 +379,7 @@ struct ipoib_dev_priv {
> > > >  	struct ipoib_tx_buf *tx_ring;
> > > >  	unsigned int	     tx_head;
> > > >  	unsigned int	     tx_tail;
> > > > +	atomic_t             tx_outstanding;
> > > >  	struct ib_sge	     tx_sge[MAX_SKB_FRAGS + 1];
> > > >  	struct ib_ud_wr      tx_wr;
> > > >  	struct ib_wc	     send_wc[MAX_SEND_CQE];
> > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > > index c59e00a0881f..db6aace83fe5 100644
> > > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > > @@ -756,7 +756,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
> > > >  		return;
> > > >  	}
> > > >
> > > > -	if ((priv->tx_head - priv->tx_tail) == ipoib_sendq_size - 1) {
> > > > +	if (atomic_read(&priv->tx_outstanding) == ipoib_sendq_size - 1) {
> > > >  		ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
> > > >  			  tx->qp->qp_num);
> > > >  		netif_stop_queue(dev);
> > > > @@ -786,7 +786,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
> > > >  	} else {
> > > >  		netif_trans_update(dev);
> > > >  		++tx->tx_head;
> > > > -		++priv->tx_head;
> > > > +		atomic_inc(&priv->tx_outstanding);
> > > >  	}
> > >
> > > This use of an atomic is very weird, probably wrong.
> > >
> > > Why is it an atomic?  priv->tx_head wasn't an atomic, and every place
> > > touching tx_outstanding was also touching tx_head.
> > >
> > > I assume there is some hidden locking here? Or much more stuff is
> > > busted up.
> > >
> > > In that case, drop the atomic.
> > >
> > > However, if the atomic is needed (where/why?) then something has to
> > > be dealing with the races, and if the write side is fully locked then
> > > an atomic is the wrong choice, use READ_ONCE/WRITE_ONCE instead
> >
> > I thought that atomic_t is appropriate here. Valentine wanted to
> > implement a "window" and needed to ensure that this window is counted
> > correctly, although it doesn't need to be 100% accurate.
>
> It does need to be 100% accurate because the stop_queue condition is:
>
>  +	if (atomic_read(&priv->tx_outstanding) == ipoib_sendq_size - 1) {

So better to write ">=" instead of "==".

>
> And presumably the whole problem here is overflowing some fixed-size
> queue object.
>
> That atomic is only needed if there is concurrency, and where is the
> concurrency?

Depends on whether you believe the description in the commit message :)

>
> Jason

* Re: [PATCH rdma-rc 4/9] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode
  2020-02-13 18:36         ` Leon Romanovsky
@ 2020-02-13 19:09           ` Jason Gunthorpe
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2020-02-13 19:09 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, RDMA mailing list, Daniel Jurgens, Erez Shitrit,
	Maor Gottlieb, Michael Guralnik, Moni Shoua, Parav Pandit,
	Sean Hefty, Valentine Fatiev, Yishai Hadas, Yonatan Cohen,
	Zhu Yanjun

On Thu, Feb 13, 2020 at 08:36:22PM +0200, Leon Romanovsky wrote:
> On Thu, Feb 13, 2020 at 02:26:55PM -0400, Jason Gunthorpe wrote:
> > On Thu, Feb 13, 2020 at 08:10:12PM +0200, Leon Romanovsky wrote:
> > > On Thu, Feb 13, 2020 at 11:37:43AM -0400, Jason Gunthorpe wrote:
> > > > On Wed, Feb 12, 2020 at 09:26:30AM +0200, Leon Romanovsky wrote:
> > > >
> > > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
> > > > > index 2aa3457a30ce..c614cb87d09b 100644
> > > > > +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> > > > > @@ -379,6 +379,7 @@ struct ipoib_dev_priv {
> > > > >  	struct ipoib_tx_buf *tx_ring;
> > > > >  	unsigned int	     tx_head;
> > > > >  	unsigned int	     tx_tail;
> > > > > +	atomic_t             tx_outstanding;
> > > > >  	struct ib_sge	     tx_sge[MAX_SKB_FRAGS + 1];
> > > > >  	struct ib_ud_wr      tx_wr;
> > > > >  	struct ib_wc	     send_wc[MAX_SEND_CQE];
> > > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > > > index c59e00a0881f..db6aace83fe5 100644
> > > > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> > > > > @@ -756,7 +756,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
> > > > >  		return;
> > > > >  	}
> > > > >
> > > > > -	if ((priv->tx_head - priv->tx_tail) == ipoib_sendq_size - 1) {
> > > > > +	if (atomic_read(&priv->tx_outstanding) == ipoib_sendq_size - 1) {
> > > > >  		ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n",
> > > > >  			  tx->qp->qp_num);
> > > > >  		netif_stop_queue(dev);
> > > > > @@ -786,7 +786,7 @@ void ipoib_cm_send(struct net_device *dev, struct sk_buff *skb, struct ipoib_cm_
> > > > >  	} else {
> > > > >  		netif_trans_update(dev);
> > > > >  		++tx->tx_head;
> > > > > -		++priv->tx_head;
> > > > > +		atomic_inc(&priv->tx_outstanding);
> > > > >  	}
> > > >
> > > > This use of an atomic is very weird, probably wrong.
> > > >
> > > > Why is it an atomic?  priv->tx_head wasn't an atomic, and every place
> > > > touching tx_outstanding was also touching tx_head.
> > > >
> > > > I assume there is some hidden locking here? Or much more stuff is
> > > > busted up.
> > > >
> > > > In that case, drop the atomic.
> > > >
> > > > However, if the atomic is needed (where/why?) then something has to
> > > > be dealing with the races, and if the write side is fully locked then
> > > > an atomic is the wrong choice, use READ_ONCE/WRITE_ONCE instead
> > >
> > > I thought that atomic_t is appropriate here. Valentine wanted to
> > > implement a "window" and needed to ensure that this window is counted
> > > correctly, although it doesn't need to be 100% accurate.
> >
> > It does need to be 100% accurate because the stop_queue condition is:
> >
> >  +	if (atomic_read(&priv->tx_outstanding) == ipoib_sendq_size - 1) {
> 
> So better to write ">=" instead of "==".

Then you risk overflowing the send q.

The required scheme appears to be to globally bound the # of packets
outstanding to tx such that the global bound is lower than any of the
(many) QP's sendq, thus there will always be a place to put the
packets.

So it must be exact or at least pessimistic.

This isn't a multiq netdev, so AFAICT, the tx side is single threaded
by netdev, and it looks like the wc side was single threaded by NAPI
as well.

The old code used two variables, each written in a single-threaded
way, and then read to generate the state. With the missing READ_ONCE
added, it should look like this:

    if ((priv->tx_head - READ_ONCE(priv->tx_tail)) == ipoib_sendq_size - 1) {

I feel like sticking with the single writer packets posted/completed
scheme is better and clearer than trying to collapse it into a single
atomic.
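For reference, the two-counter scheme could be sketched in plain C like
this. This is a userspace illustration only, not the ipoib code:
READ_ONCE/WRITE_ONCE are approximated with relaxed C11 atomics, and the
names (txq, SENDQ_SIZE, txq_post, txq_complete) are invented for the
sketch, merely mirroring the ipoib fields:

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace sketch of the single-writer head/tail scheme discussed
 * above.  The send path is the sole writer of tx_head, the completion
 * (NAPI) path the sole writer of tx_tail, so relaxed loads/stores
 * stand in for READ_ONCE/WRITE_ONCE; no read-modify-write is needed. */
#define SENDQ_SIZE 8u

struct txq {
	atomic_uint tx_head;	/* written only by the send path */
	atomic_uint tx_tail;	/* written only by the completion path */
};

/* Send path: claim a slot, or return -1 when the ring is full
 * (the point where the driver would call netif_stop_queue()). */
static int txq_post(struct txq *q)
{
	unsigned int head = atomic_load_explicit(&q->tx_head, memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&q->tx_tail, memory_order_relaxed);

	if (head - tail == SENDQ_SIZE - 1)	/* unsigned math survives wraparound */
		return -1;
	atomic_store_explicit(&q->tx_head, head + 1, memory_order_relaxed);
	return 0;
}

/* Completion path: retire one posted entry. */
static void txq_complete(struct txq *q)
{
	unsigned int tail = atomic_load_explicit(&q->tx_tail, memory_order_relaxed);
	atomic_store_explicit(&q->tx_tail, tail + 1, memory_order_relaxed);
}
```

Because each counter has exactly one writer, the occupancy head - tail
read by the send path is exact or pessimistic (a racing completion can
only make the ring look fuller than it is), which is what the
stop-queue check requires.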

> > And presumably the whole problem here is overflowing some fixed-size
> > queue object.
> >
> > That atomic is only needed if there is concurrency, and where is the
> > concurrency?
> 
> Depends on whether you believe the description in the commit message :)

I do. It is obviously wrong now that you point to it :) Each QP must
have exclusive control of its head/tail pointers; having a CM QP reach
into the head/tail of the UD QP is certainly wrong.

Jason

* Re: [PATCH rdma-rc 3/9] Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
  2020-02-13 13:30   ` Jason Gunthorpe
@ 2020-02-14  3:11     ` Parav Pandit
  2020-02-14 14:08       ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Parav Pandit @ 2020-02-14  3:11 UTC (permalink / raw)
  To: Jason Gunthorpe, Leon Romanovsky
  Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Daniel Jurgens,
	Erez Shitrit, Maor Gottlieb, Michael Guralnik, Moni Shoua,
	Sean Hefty, Valentine Fatiev, Yishai Hadas, Yonatan Cohen (SW),
	Yanjun Zhu

On 2/13/2020 7:30 AM, Jason Gunthorpe wrote:
> On Wed, Feb 12, 2020 at 09:26:29AM +0200, Leon Romanovsky wrote:
>> From: Parav Pandit <parav@mellanox.com>
>>
>> This reverts commit 219d2e9dfda9431b808c28d5efc74b404b95b638.
>>
>> Below flow requires cm_id_priv's destination address to be setup
>> before performing rdma_bind_addr().
>> Without which, source port allocation for existing bind list fails
>> when port range is small, resulting into rdma_resolve_addr()
>> failure.
> 
> I don't quite understand this - what is "when port range is small" ?
>  
There is a sysfs knob to reduce the source port range used for binding,
so it is easy to reproduce the issue by shrinking the port range to a
handful of ports.

>> rdma_resolve_addr()
>>   cma_bind_addr()
>>     rdma_bind_addr()
>>       cma_get_port()
>>         cma_alloc_any_port()
>>           cma_port_is_unique() <- compared with zero daddr
> 
> Do you understand why cma_port_is_unique is even testing the dst_addr?
> It seems very strange.

cma_port_is_unique() allows reusing the source port as long as the
destination is different (i.e. either a different dest.addr or a
different dest.port between the two cm_ids).

This behavior is analogous to TCP.

> Outbound connections should not alias the source port in the first
> place??
>
I believe it can, because TCP seems to allow that too as long as the
destination is different.

I think it is allowed if it results in a different 4-tuple.
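As a toy model of the reuse rule being described (an illustration only,
not the kernel's actual cma_port_is_unique() logic; struct and function
names here are invented): a source port may be shared by two ids only
when the resulting 4-tuples differ, which is exactly what breaks when
every not-yet-resolved id carries a zeroed destination:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-tuple model of the source-port reuse rule discussed
 * above.  Fields are illustrative, not kernel structures. */
struct bind4 {
	unsigned int saddr, daddr;
	unsigned short sport, dport;
};

/* A candidate may take its source port alongside the existing binds
 * only if no existing bind on that port has the same destination
 * (i.e. would produce an identical 4-tuple). */
static int port_reuse_ok(const struct bind4 *binds, size_t n,
			 const struct bind4 *cand)
{
	for (size_t i = 0; i < n; i++) {
		if (binds[i].sport != cand->sport)
			continue;
		if (binds[i].daddr == cand->daddr &&
		    binds[i].dport == cand->dport)
			return 0;	/* identical 4-tuple: not unique */
	}
	return 1;
}
```

With the destination populated before binding, two ids connecting to
different servers can share a source port. If the destination is still
zeroed at bind time, every unresolved id looks like the same 4-tuple,
so the allocator must burn a fresh port for each one and a small port
range is quickly exhausted.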

* Re: [PATCH rdma-rc 3/9] Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
  2020-02-14  3:11     ` Parav Pandit
@ 2020-02-14 14:08       ` Jason Gunthorpe
  2020-02-14 14:48         ` Parav Pandit
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2020-02-14 14:08 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky,
	RDMA mailing list, Daniel Jurgens, Erez Shitrit, Maor Gottlieb,
	Michael Guralnik, Moni Shoua, Sean Hefty, Valentine Fatiev,
	Yishai Hadas, Yonatan Cohen (SW),
	Yanjun Zhu

On Fri, Feb 14, 2020 at 03:11:48AM +0000, Parav Pandit wrote:
> On 2/13/2020 7:30 AM, Jason Gunthorpe wrote:
> > On Wed, Feb 12, 2020 at 09:26:29AM +0200, Leon Romanovsky wrote:
> >> From: Parav Pandit <parav@mellanox.com>
> >>
> >> This reverts commit 219d2e9dfda9431b808c28d5efc74b404b95b638.
> >>
> >> Below flow requires cm_id_priv's destination address to be setup
> >> before performing rdma_bind_addr().
> >> Without which, source port allocation for existing bind list fails
> >> when port range is small, resulting into rdma_resolve_addr()
> >> failure.
> > 
> > I don't quite understand this - what is "when port range is small" ?
> >  
> There is a sysfs knob to reduce the source port range used for binding,
> so it is easy to reproduce the issue by shrinking the port range to a
> handful of ports.
> 
> >> rdma_resolve_addr()
> >>   cma_bind_addr()
> >>     rdma_bind_addr()
> >>       cma_get_port()
> >>         cma_alloc_any_port()
> >>           cma_port_is_unique() <- compared with zero daddr
> > 
> > Do you understand why cma_port_is_unique is even testing the dst_addr?
> > It seems very strange.
> 
> cma_port_is_unique() allows reusing the source port as long as the
> destination is different (i.e. either a different dest.addr or a
> different dest.port between the two cm_ids).
> 
> This behavior is analogous to TCP.
> 
> > Outbound connections should not alias the source port in the first
> > place??
> >
> I believe it can, because TCP seems to allow that too as long as the
> destination is different.
> 
> I think it is allowed if it results in a different 4-tuple.

So the issue here is that if the dest is left as 0, then
cma_alloc_any_port() isn't able to alias source ports any more?

Jason

* Re: [PATCH rdma-rc 3/9] Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow"
  2020-02-14 14:08       ` Jason Gunthorpe
@ 2020-02-14 14:48         ` Parav Pandit
  0 siblings, 0 replies; 24+ messages in thread
From: Parav Pandit @ 2020-02-14 14:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky,
	RDMA mailing list, Daniel Jurgens, Erez Shitrit, Maor Gottlieb,
	Michael Guralnik, Moni Shoua, Sean Hefty, Valentine Fatiev,
	Yishai Hadas, Yonatan Cohen (SW),
	Yanjun Zhu

On 2/14/2020 8:08 AM, Jason Gunthorpe wrote:
> On Fri, Feb 14, 2020 at 03:11:48AM +0000, Parav Pandit wrote:
>> On 2/13/2020 7:30 AM, Jason Gunthorpe wrote:
>>> On Wed, Feb 12, 2020 at 09:26:29AM +0200, Leon Romanovsky wrote:
>>>> From: Parav Pandit <parav@mellanox.com>
>>>>
>>>> This reverts commit 219d2e9dfda9431b808c28d5efc74b404b95b638.
>>>>
>>>> Below flow requires cm_id_priv's destination address to be setup
>>>> before performing rdma_bind_addr().
>>>> Without which, source port allocation for existing bind list fails
>>>> when port range is small, resulting into rdma_resolve_addr()
>>>> failure.
>>>
>>> I don't quite understand this - what is "when port range is small" ?
>>>  
>> There is a sysfs knob to reduce the source port range used for binding,
>> so it is easy to reproduce the issue by shrinking the port range to a
>> handful of ports.
>>
>>>> rdma_resolve_addr()
>>>>   cma_bind_addr()
>>>>     rdma_bind_addr()
>>>>       cma_get_port()
>>>>         cma_alloc_any_port()
>>>>           cma_port_is_unique() <- compared with zero daddr
>>>
>>> Do you understand why cma_port_is_unique is even testing the dst_addr?
>>> It seems very strange.
>>
>> cma_port_is_unique() allows reusing the source port as long as the
>> destination is different (i.e. either a different dest.addr or a
>> different dest.port between the two cm_ids).
>>
>> This behavior is analogous to TCP.
>>
>>> Outbound connections should not alias the source port in the first
>>> place??
>>>
>> I believe it can, because TCP seems to allow that too as long as the
>> destination is different.
>>
>> I think it is allowed if it results in a different 4-tuple.
> 
> So the issue here is that if the dest is left as 0, then
> cma_alloc_any_port() isn't able to alias source ports any more?
>
Correct.


> Jason
> 


Thread overview: 24+ messages
2020-02-12  7:26 [PATCH rdma-rc 0/9] Fixes for v5.6 Leon Romanovsky
2020-02-12  7:26 ` [PATCH rdma-rc 1/9] RDMA/ucma: Mask QPN to be 24 bits according to IBTA Leon Romanovsky
2020-02-12  7:26 ` [PATCH rdma-rc 2/9] RDMA/core: Fix protection fault in get_pkey_idx_qp_list Leon Romanovsky
2020-02-12  8:01   ` Leon Romanovsky
2020-02-12  8:06   ` Leon Romanovsky
2020-02-12  7:26 ` [PATCH rdma-rc 3/9] Revert "RDMA/cma: Simplify rdma_resolve_addr() error flow" Leon Romanovsky
2020-02-13 13:30   ` Jason Gunthorpe
2020-02-14  3:11     ` Parav Pandit
2020-02-14 14:08       ` Jason Gunthorpe
2020-02-14 14:48         ` Parav Pandit
2020-02-12  7:26 ` [PATCH rdma-rc 4/9] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode Leon Romanovsky
2020-02-13 15:37   ` Jason Gunthorpe
2020-02-13 18:10     ` Leon Romanovsky
2020-02-13 18:26       ` Jason Gunthorpe
2020-02-13 18:36         ` Leon Romanovsky
2020-02-13 19:09           ` Jason Gunthorpe
2020-02-12  7:26 ` [PATCH rdma-rc 5/9] RDMA/core: Add missing list deletion on freeing event queue Leon Romanovsky
2020-02-12  7:26 ` [PATCH rdma-rc 6/9] IB/mlx5: Fix async events cleanup flows Leon Romanovsky
2020-02-12  7:26 ` [PATCH rdma-rc 7/9] RDMA/rxe: Fix soft lockup problem due to using tasklets in softirq Leon Romanovsky
2020-02-12  7:26 ` [PATCH rdma-rc 8/9] IB/umad: Fix kernel crash while unloading ib_umad Leon Romanovsky
2020-02-13 14:28   ` Jason Gunthorpe
2020-02-13 18:03     ` Leon Romanovsky
2020-02-12  7:26 ` [PATCH rdma-rc 9/9] RDMA/mlx5: Prevent overflow in mmap offset calculations Leon Romanovsky
2020-02-13 18:03 ` [PATCH rdma-rc 0/9] Fixes for v5.6 Jason Gunthorpe
