* [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4
@ 2014-02-26 15:06 Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 01/31] cxgb4: Fix some small bugs in t4_sge_init_soft() when our Page Size is 64KB Hariprasad Shenai
                   ` (30 more replies)
  0 siblings, 31 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

Hi All,

This patch series provides miscellaneous fixes for Chelsio T4/T5 adapters:
cxgb4 fixes related to the SGE and MTU handling, plus DB Drop avoidance and
other miscellaneous fixes for iw_cxgb4.

The patch series is created against David Miller's 'net-next' tree and
includes patches for the cxgb4 and iw_cxgb4 drivers.

We would like this patch series to be merged via David Miller's 'net-next'
tree.

We have included all the maintainers of the respective drivers. Kindly review
the changes and let us know if you have any review comments.

Thanks

Kumar Sanghvi (6):
  cxgb4: Fix some small bugs in t4_sge_init_soft() when our Page Size
    is 64KB
  cxgb4: Add code to dump SGE registers when hitting idma hangs
  cxgb4: Rectify emitting messages about SGE Ingress DMA channels being
    potentially stuck
  cxgb4: Updates for T5 SGE's Egress Congestion Threshold
  cxgb4: use spinlock_irqsave/spinlock_irqrestore for db lock.
  Revert "cxgb4: Don't assume LSO only uses SGL path in t4_eth_xmit()"

Steve Wise (25):
  iw_cxgb4: cap CQ size at T4_MAX_IQ_SIZE.
  iw_cxgb4: Allow loopback connections.
  iw_cxgb4: release neigh entry in error paths.
  iw_cxgb4: Treat CPL_ERR_KEEPALV_NEG_ADVICE as negative advice.
  cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes.
  iw_cxgb4: use the BAR2/WC path for kernel QPs and T5 devices.
  iw_cxgb4: Fix incorrect BUG_ON conditions.
  iw_cxgb4: Mind the sq_sig_all/sq_sig_type QP attributes.
  iw_cxgb4: default peer2peer mode to 1.
  iw_cxgb4: save the correct map length for fast_reg_page_lists.
  iw_cxgb4: don't leak skb in c4iw_uld_rx_handler().
  iw_cxgb4: fix possible memory leak in RX_PKT processing.
  iw_cxgb4: ignore read response type 1 CQEs.
  iw_cxgb4: connect_request_upcall fixes.
  iw_cxgb4: adjust tcp snd/rcv window based on link speed.
  iw_cxgb4: update snd_seq when sending MPA messages.
  iw_cxgb4: lock around accept/reject downcalls.
  iw_cxgb4: drop RX_DATA packets if the endpoint is gone.
  iw_cxgb4: rx_data() needs to hold the ep mutex.
  iw_cxgb4: endpoint timeout fixes.
  iw_cxgb4: rmb() after reading valid gen bit.
  iw_cxgb4: wc_wmb() needed after DB writes.
  iw_cxgb4: SQ flush fix.
  iw_cxgb4: minor fixes/cleanup.
  iw_cxgb4: Max fastreg depth depends on DSGL support.

 drivers/infiniband/hw/cxgb4/cm.c                |  270 ++++++++++++++++-------
 drivers/infiniband/hw/cxgb4/cq.c                |   50 +++--
 drivers/infiniband/hw/cxgb4/device.c            |  237 +++++++++++++-------
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h          |   17 +-
 drivers/infiniband/hw/cxgb4/mem.c               |   18 ++-
 drivers/infiniband/hw/cxgb4/provider.c          |   46 ++++-
 drivers/infiniband/hw/cxgb4/qp.c                |  216 +++++++++---------
 drivers/infiniband/hw/cxgb4/resource.c          |   10 +-
 drivers/infiniband/hw/cxgb4/t4.h                |   78 ++++++-
 drivers/infiniband/hw/cxgb4/user.h              |    5 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |   11 +-
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   84 ++++---
 drivers/net/ethernet/chelsio/cxgb4/sge.c        |  161 +++++++++-----
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c      |  106 +++++++++
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h     |    2 +
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h    |    9 +
 16 files changed, 924 insertions(+), 396 deletions(-)

* [PATCH net-next 01/31] cxgb4: Fix some small bugs in t4_sge_init_soft() when our Page Size is 64KB
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 02/31] cxgb4: Add code to dump SGE registers when hitting idma hangs Hariprasad Shenai
                   ` (29 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Kumar Sanghvi <kumaras@chelsio.com>

We'd come in with SGE_FL_BUFFER_SIZE[0] and [1] both equal to 64KB and the
extant logic would flag that as an error.

Based on original work by Casey Leedom <leedom@chelsio.com>

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/sge.c |   12 ++++++++++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index af76b25..3a2ecd8 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2596,11 +2596,19 @@ static int t4_sge_init_soft(struct adapter *adap)
 	fl_small_mtu = READ_FL_BUF(RX_SMALL_MTU_BUF);
 	fl_large_mtu = READ_FL_BUF(RX_LARGE_MTU_BUF);
 
+	/* We only bother using the Large Page logic if the Large Page Buffer
+	 * is larger than our Page Size Buffer.
+	 */
+	if (fl_large_pg <= fl_small_pg)
+		fl_large_pg = 0;
+
 	#undef READ_FL_BUF
 
+	/* The Page Size Buffer must be exactly equal to our Page Size and the
+	 * Large Page Size Buffer should be 0 (per above) or a power of 2.
+	 */
 	if (fl_small_pg != PAGE_SIZE ||
-	    (fl_large_pg != 0 && (fl_large_pg < fl_small_pg ||
-				  (fl_large_pg & (fl_large_pg-1)) != 0))) {
+	    (fl_large_pg & (fl_large_pg-1)) != 0) {
 		dev_err(adap->pdev_dev, "bad SGE FL page buffer sizes [%d, %d]\n",
 			fl_small_pg, fl_large_pg);
 		return -EINVAL;
-- 
1.7.1

* [PATCH net-next 02/31] cxgb4: Add code to dump SGE registers when hitting idma hangs
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 01/31] cxgb4: Fix some small bugs in t4_sge_init_soft() when our Page Size is 64KB Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 03/31] cxgb4: Rectify emitting messages about SGE Ingress DMA channels being potentially stuck Hariprasad Shenai
                   ` (28 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Kumar Sanghvi <kumaras@chelsio.com>

Based on original work by Casey Leedom <leedom@chelsio.com>

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h   |    1 +
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c   |  106 ++++++++++++++++++++++++++
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h |    3 +
 3 files changed, 110 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 944f2cb..509c976 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -1032,4 +1032,5 @@ void t4_db_dropped(struct adapter *adapter);
 int t4_mem_win_read_len(struct adapter *adap, u32 addr, __be32 *data, int len);
 int t4_fwaddrspace_write(struct adapter *adap, unsigned int mbox,
 			 u32 addr, u32 val);
+void t4_sge_decode_idma_state(struct adapter *adapter, int state);
 #endif /* __CXGB4_H__ */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 7ae756d..f50637a 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -2597,6 +2597,112 @@ int t4_mdio_wr(struct adapter *adap, unsigned int mbox, unsigned int phy_addr,
 }
 
 /**
+ *	t4_sge_decode_idma_state - decode the idma state
+ *	@adap: the adapter
+ *	@state: the state idma is stuck in
+ */
+void t4_sge_decode_idma_state(struct adapter *adapter, int state)
+{
+	static const char * const t4_decode[] = {
+		"IDMA_IDLE",
+		"IDMA_PUSH_MORE_CPL_FIFO",
+		"IDMA_PUSH_CPL_MSG_HEADER_TO_FIFO",
+		"Not used",
+		"IDMA_PHYSADDR_SEND_PCIEHDR",
+		"IDMA_PHYSADDR_SEND_PAYLOAD_FIRST",
+		"IDMA_PHYSADDR_SEND_PAYLOAD",
+		"IDMA_SEND_FIFO_TO_IMSG",
+		"IDMA_FL_REQ_DATA_FL_PREP",
+		"IDMA_FL_REQ_DATA_FL",
+		"IDMA_FL_DROP",
+		"IDMA_FL_H_REQ_HEADER_FL",
+		"IDMA_FL_H_SEND_PCIEHDR",
+		"IDMA_FL_H_PUSH_CPL_FIFO",
+		"IDMA_FL_H_SEND_CPL",
+		"IDMA_FL_H_SEND_IP_HDR_FIRST",
+		"IDMA_FL_H_SEND_IP_HDR",
+		"IDMA_FL_H_REQ_NEXT_HEADER_FL",
+		"IDMA_FL_H_SEND_NEXT_PCIEHDR",
+		"IDMA_FL_H_SEND_IP_HDR_PADDING",
+		"IDMA_FL_D_SEND_PCIEHDR",
+		"IDMA_FL_D_SEND_CPL_AND_IP_HDR",
+		"IDMA_FL_D_REQ_NEXT_DATA_FL",
+		"IDMA_FL_SEND_PCIEHDR",
+		"IDMA_FL_PUSH_CPL_FIFO",
+		"IDMA_FL_SEND_CPL",
+		"IDMA_FL_SEND_PAYLOAD_FIRST",
+		"IDMA_FL_SEND_PAYLOAD",
+		"IDMA_FL_REQ_NEXT_DATA_FL",
+		"IDMA_FL_SEND_NEXT_PCIEHDR",
+		"IDMA_FL_SEND_PADDING",
+		"IDMA_FL_SEND_COMPLETION_TO_IMSG",
+		"IDMA_FL_SEND_FIFO_TO_IMSG",
+		"IDMA_FL_REQ_DATAFL_DONE",
+		"IDMA_FL_REQ_HEADERFL_DONE",
+	};
+	static const char * const t5_decode[] = {
+		"IDMA_IDLE",
+		"IDMA_ALMOST_IDLE",
+		"IDMA_PUSH_MORE_CPL_FIFO",
+		"IDMA_PUSH_CPL_MSG_HEADER_TO_FIFO",
+		"IDMA_SGEFLRFLUSH_SEND_PCIEHDR",
+		"IDMA_PHYSADDR_SEND_PCIEHDR",
+		"IDMA_PHYSADDR_SEND_PAYLOAD_FIRST",
+		"IDMA_PHYSADDR_SEND_PAYLOAD",
+		"IDMA_SEND_FIFO_TO_IMSG",
+		"IDMA_FL_REQ_DATA_FL",
+		"IDMA_FL_DROP",
+		"IDMA_FL_DROP_SEND_INC",
+		"IDMA_FL_H_REQ_HEADER_FL",
+		"IDMA_FL_H_SEND_PCIEHDR",
+		"IDMA_FL_H_PUSH_CPL_FIFO",
+		"IDMA_FL_H_SEND_CPL",
+		"IDMA_FL_H_SEND_IP_HDR_FIRST",
+		"IDMA_FL_H_SEND_IP_HDR",
+		"IDMA_FL_H_REQ_NEXT_HEADER_FL",
+		"IDMA_FL_H_SEND_NEXT_PCIEHDR",
+		"IDMA_FL_H_SEND_IP_HDR_PADDING",
+		"IDMA_FL_D_SEND_PCIEHDR",
+		"IDMA_FL_D_SEND_CPL_AND_IP_HDR",
+		"IDMA_FL_D_REQ_NEXT_DATA_FL",
+		"IDMA_FL_SEND_PCIEHDR",
+		"IDMA_FL_PUSH_CPL_FIFO",
+		"IDMA_FL_SEND_CPL",
+		"IDMA_FL_SEND_PAYLOAD_FIRST",
+		"IDMA_FL_SEND_PAYLOAD",
+		"IDMA_FL_REQ_NEXT_DATA_FL",
+		"IDMA_FL_SEND_NEXT_PCIEHDR",
+		"IDMA_FL_SEND_PADDING",
+		"IDMA_FL_SEND_COMPLETION_TO_IMSG",
+	};
+	static const u32 sge_regs[] = {
+		SGE_DEBUG_DATA_LOW_INDEX_2,
+		SGE_DEBUG_DATA_LOW_INDEX_3,
+		SGE_DEBUG_DATA_HIGH_INDEX_10,
+	};
+	const char **sge_idma_decode;
+	int sge_idma_decode_nstates;
+	int i;
+
+	if (is_t4(adapter->params.chip)) {
+		sge_idma_decode = (const char **)t4_decode;
+		sge_idma_decode_nstates = ARRAY_SIZE(t4_decode);
+	} else {
+		sge_idma_decode = (const char **)t5_decode;
+		sge_idma_decode_nstates = ARRAY_SIZE(t5_decode);
+	}
+
+	if (state < sge_idma_decode_nstates)
+		CH_WARN(adapter, "idma state %s\n", sge_idma_decode[state]);
+	else
+		CH_WARN(adapter, "idma state %d unknown\n", state);
+
+	for (i = 0; i < ARRAY_SIZE(sge_regs); i++)
+		CH_WARN(adapter, "SGE register %#x value %#x\n",
+			sge_regs[i], t4_read_reg(adapter, sge_regs[i]));
+}
+
+/**
  *      t4_fw_hello - establish communication with FW
  *      @adap: the adapter
  *      @mbox: mailbox to use for the FW command
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
index 4082522..33cf9ef 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
@@ -278,6 +278,9 @@
 #define SGE_DEBUG_INDEX 0x10cc
 #define SGE_DEBUG_DATA_HIGH 0x10d0
 #define SGE_DEBUG_DATA_LOW 0x10d4
+#define SGE_DEBUG_DATA_LOW_INDEX_2	0x12c8
+#define SGE_DEBUG_DATA_LOW_INDEX_3	0x12cc
+#define SGE_DEBUG_DATA_HIGH_INDEX_10	0x12a8
 #define SGE_INGRESS_QUEUES_PER_PAGE_PF 0x10f4
 
 #define S_HP_INT_THRESH    28
-- 
1.7.1

* [PATCH net-next 03/31] cxgb4: Rectify emitting messages about SGE Ingress DMA channels being potentially stuck
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 01/31] cxgb4: Fix some small bugs in t4_sge_init_soft() when our Page Size is 64KB Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 02/31] cxgb4: Add code to dump SGE registers when hitting idma hangs Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 04/31] cxgb4: Updates for T5 SGE's Egress Congestion Threshold Hariprasad Shenai
                   ` (27 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Kumar Sanghvi <kumaras@chelsio.com>

Based on original work by Casey Leedom <leedom@chelsio.com>

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h |    9 ++-
 drivers/net/ethernet/chelsio/cxgb4/sge.c   |   91 ++++++++++++++++++++++------
 2 files changed, 80 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 509c976..50abe1d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -556,8 +556,13 @@ struct sge {
 	u32 pktshift;               /* padding between CPL & packet data */
 	u32 fl_align;               /* response queue message alignment */
 	u32 fl_starve_thres;        /* Free List starvation threshold */
-	unsigned int starve_thres;
-	u8 idma_state[2];
+
+	/* State variables for detecting an SGE Ingress DMA hang */
+	unsigned int idma_1s_thresh;/* SGE same State Counter 1s threshold */
+	unsigned int idma_stalled[2];/* SGE synthesized stalled timers in HZ */
+	unsigned int idma_state[2]; /* SGE IDMA Hang detect state */
+	unsigned int idma_qid[2];   /* SGE IDMA Hung Ingress Queue ID */
+
 	unsigned int egr_start;
 	unsigned int ingr_start;
 	void *egr_map[MAX_EGRQ];    /* qid->queue egress queue map */
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 3a2ecd8..809ab60 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -93,6 +93,16 @@
  */
 #define TX_QCHECK_PERIOD (HZ / 2)
 
+/* SGE Hung Ingress DMA Threshold Warning time (in Hz) and Warning Repeat Rate
+ * (in RX_QCHECK_PERIOD multiples).  If we find one of the SGE Ingress DMA
+ * State Machines in the same state for this amount of time (in HZ) then we'll
+ * issue a warning about a potential hang.  We'll repeat the warning as the
+ * SGE Ingress DMA Channel appears to be hung every N RX_QCHECK_PERIODs till
+ * the situation clears.  If the situation clears, we'll note that as well.
+ */
+#define SGE_IDMA_WARN_THRESH (1 * HZ)
+#define SGE_IDMA_WARN_REPEAT (20 * RX_QCHECK_PERIOD)
+
 /*
  * Max number of Tx descriptors to be reclaimed by the Tx timer.
  */
@@ -2008,7 +2018,7 @@ irq_handler_t t4_intr_handler(struct adapter *adap)
 static void sge_rx_timer_cb(unsigned long data)
 {
 	unsigned long m;
-	unsigned int i, cnt[2];
+	unsigned int i, idma_same_state_cnt[2];
 	struct adapter *adap = (struct adapter *)data;
 	struct sge *s = &adap->sge;
 
@@ -2031,21 +2041,65 @@ static void sge_rx_timer_cb(unsigned long data)
 		}
 
 	t4_write_reg(adap, SGE_DEBUG_INDEX, 13);
-	cnt[0] = t4_read_reg(adap, SGE_DEBUG_DATA_HIGH);
-	cnt[1] = t4_read_reg(adap, SGE_DEBUG_DATA_LOW);
-
-	for (i = 0; i < 2; i++)
-		if (cnt[i] >= s->starve_thres) {
-			if (s->idma_state[i] || cnt[i] == 0xffffffff)
-				continue;
-			s->idma_state[i] = 1;
-			t4_write_reg(adap, SGE_DEBUG_INDEX, 11);
-			m = t4_read_reg(adap, SGE_DEBUG_DATA_LOW) >> (i * 16);
-			dev_warn(adap->pdev_dev,
-				 "SGE idma%u starvation detected for "
-				 "queue %lu\n", i, m & 0xffff);
-		} else if (s->idma_state[i])
-			s->idma_state[i] = 0;
+	idma_same_state_cnt[0] = t4_read_reg(adap, SGE_DEBUG_DATA_HIGH);
+	idma_same_state_cnt[1] = t4_read_reg(adap, SGE_DEBUG_DATA_LOW);
+
+	for (i = 0; i < 2; i++) {
+		u32 debug0, debug11;
+
+		/* If the Ingress DMA Same State Counter ("timer") is less
+		 * than 1s, then we can reset our synthesized Stall Timer and
+		 * continue.  If we have previously emitted warnings about a
+		 * potential stalled Ingress Queue, issue a note indicating
+		 * that the Ingress Queue has resumed forward progress.
+		 */
+		if (idma_same_state_cnt[i] < s->idma_1s_thresh) {
+			if (s->idma_stalled[i] >= SGE_IDMA_WARN_THRESH)
+				CH_WARN(adap, "SGE idma%d, queue%u,resumed after %d sec",
+					i, s->idma_qid[i],
+					s->idma_stalled[i]/HZ);
+			s->idma_stalled[i] = 0;
+			continue;
+		}
+
+		/* Synthesize an SGE Ingress DMA Same State Timer in the Hz
+		 * domain.  The first time we get here it'll be because we
+		 * passed the 1s Threshold; each additional time it'll be
+		 * because the RX Timer Callback is being fired on its regular
+		 * schedule.
+		 *
+		 * If the stall is below our Potential Hung Ingress Queue
+		 * Warning Threshold, continue.
+		 */
+		if (s->idma_stalled[i] == 0)
+			s->idma_stalled[i] = HZ;
+		else
+			s->idma_stalled[i] += RX_QCHECK_PERIOD;
+
+		if (s->idma_stalled[i] < SGE_IDMA_WARN_THRESH)
+			continue;
+
+		/* We'll issue a warning every SGE_IDMA_WARN_REPEAT Hz */
+		if (((s->idma_stalled[i] - HZ) % SGE_IDMA_WARN_REPEAT) != 0)
+			continue;
+
+		/* Read and save the SGE IDMA State and Queue ID information.
+		 * We do this every time in case it changes across time ...
+		 */
+		t4_write_reg(adap, SGE_DEBUG_INDEX, 0);
+		debug0 = t4_read_reg(adap, SGE_DEBUG_DATA_LOW);
+		s->idma_state[i] = (debug0 >> (i * 9)) & 0x3f;
+
+		t4_write_reg(adap, SGE_DEBUG_INDEX, 11);
+		debug11 = t4_read_reg(adap, SGE_DEBUG_DATA_LOW);
+		s->idma_qid[i] = (debug11 >> (i * 16)) & 0xffff;
+
+		CH_WARN(adap, "SGE idma%u, queue%u, maybe stuck state%u %dsecs"
+			" (debug0=%#x, debug11=%#x)\n",
+			i, s->idma_qid[i], s->idma_state[i],
+			s->idma_stalled[i]/HZ, debug0, debug11);
+		t4_sge_decode_idma_state(adap, s->idma_state[i]);
+	}
 
 	mod_timer(&s->rx_timer, jiffies + RX_QCHECK_PERIOD);
 }
@@ -2756,8 +2810,9 @@ int t4_sge_init(struct adapter *adap)
 
 	setup_timer(&s->rx_timer, sge_rx_timer_cb, (unsigned long)adap);
 	setup_timer(&s->tx_timer, sge_tx_timer_cb, (unsigned long)adap);
-	s->starve_thres = core_ticks_per_usec(adap) * 1000000;  /* 1 s */
-	s->idma_state[0] = s->idma_state[1] = 0;
+	s->idma_1s_thresh = core_ticks_per_usec(adap) * 1000000;  /* 1 s */
+	s->idma_stalled[0] = 0;
+	s->idma_stalled[1] = 0;
 	spin_lock_init(&s->intrq_lock);
 
 	return 0;
-- 
1.7.1

* [PATCH net-next 04/31] cxgb4: Updates for T5 SGE's Egress Congestion Threshold
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (2 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 03/31] cxgb4: Rectify emitting messages about SGE Ingress DMA channels being potentially stuck Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 05/31] cxgb4: use spinlock_irqsave/spinlock_irqrestore for db lock Hariprasad Shenai
                   ` (26 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Kumar Sanghvi <kumaras@chelsio.com>

Based on original work by Casey Leedom <leedom@chelsio.com>

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/sge.c     |   18 +++++++++++++-----
 drivers/net/ethernet/chelsio/cxgb4/t4_regs.h |    6 ++++++
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 809ab60..e0376cd 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -2777,8 +2777,8 @@ static int t4_sge_init_hard(struct adapter *adap)
 int t4_sge_init(struct adapter *adap)
 {
 	struct sge *s = &adap->sge;
-	u32 sge_control;
-	int ret;
+	u32 sge_control, sge_conm_ctrl;
+	int ret, egress_threshold;
 
 	/*
 	 * Ingress Padding Boundary and Egress Status Page Size are set up by
@@ -2803,10 +2803,18 @@ int t4_sge_init(struct adapter *adap)
 	 * SGE's Egress Congestion Threshold.  If it isn't, then we can get
 	 * stuck waiting for new packets while the SGE is waiting for us to
 	 * give it more Free List entries.  (Note that the SGE's Egress
-	 * Congestion Threshold is in units of 2 Free List pointers.)
+	 * Congestion Threshold is in units of 2 Free List pointers.) For T4,
+	 * there was only a single field to control this.  For T5 there's the
+	 * original field which now only applies to Unpacked Mode Free List
+	 * buffers and a new field which only applies to Packed Mode Free List
+	 * buffers.
 	 */
-	s->fl_starve_thres
-		= EGRTHRESHOLD_GET(t4_read_reg(adap, SGE_CONM_CTRL))*2 + 1;
+	sge_conm_ctrl = t4_read_reg(adap, SGE_CONM_CTRL);
+	if (is_t4(adap->params.chip))
+		egress_threshold = EGRTHRESHOLD_GET(sge_conm_ctrl);
+	else
+		egress_threshold = EGRTHRESHOLDPACKING_GET(sge_conm_ctrl);
+	s->fl_starve_thres = 2*egress_threshold + 1;
 
 	setup_timer(&s->rx_timer, sge_rx_timer_cb, (unsigned long)adap);
 	setup_timer(&s->tx_timer, sge_tx_timer_cb, (unsigned long)adap);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
index 33cf9ef..225ad8a 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_regs.h
@@ -230,6 +230,12 @@
 #define  EGRTHRESHOLD(x)     ((x) << EGRTHRESHOLDshift)
 #define  EGRTHRESHOLD_GET(x) (((x) & EGRTHRESHOLD_MASK) >> EGRTHRESHOLDshift)
 
+#define EGRTHRESHOLDPACKING_MASK	0x3fU
+#define EGRTHRESHOLDPACKING_SHIFT	14
+#define EGRTHRESHOLDPACKING(x)		((x) << EGRTHRESHOLDPACKING_SHIFT)
+#define EGRTHRESHOLDPACKING_GET(x)	(((x) >> EGRTHRESHOLDPACKING_SHIFT) & \
+					  EGRTHRESHOLDPACKING_MASK)
+
 #define SGE_DBFIFO_STATUS 0x10a4
 #define  HP_INT_THRESH_SHIFT 28
 #define  HP_INT_THRESH_MASK  0xfU
-- 
1.7.1

* [PATCH net-next 05/31] cxgb4: use spinlock_irqsave/spinlock_irqrestore for db lock.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (3 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 04/31] cxgb4: Updates for T5 SGE's Egress Congestion Threshold Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 06/31] iw_cxgb4: cap CQ size at T4_MAX_IQ_SIZE Hariprasad Shenai
                   ` (25 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Kumar Sanghvi <kumaras@chelsio.com>

Currently ring_tx_db() can deadlock if a db_full interrupt fires and is
handled on the same CPU while ring_tx_db() has the db lock held.  It needs
to disable interrupts since it serializes with an interrupt handler.
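
A minimal sketch of the deadlock (a hypothetical call sequence, assuming the
db_full interrupt is delivered on the CPU that currently holds q->db_lock):

	/* process context, interrupts still enabled */
	spin_lock(&q->db_lock);         /* ring_tx_db() takes the DB lock */
	                                /* ... db_full interrupt fires here ... */
		/* interrupt context, same CPU */
		spin_lock(&q->db_lock); /* handler path spins forever; the lock
					 * owner can never run again -> deadlock */

Taking the lock with spin_lock_irqsave()/spin_unlock_irqrestore(), as the
hunks below do, prevents the interrupt from preempting the lock holder on
that CPU.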

Based on original work by Steve Wise <swise@opengridcomputing.com>

Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   11 +++++++----
 drivers/net/ethernet/chelsio/cxgb4/sge.c        |    5 +++--
 2 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 4660f55..61825f5 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -3585,9 +3585,11 @@ static void disable_txq_db(struct sge_txq *q)
 
 static void enable_txq_db(struct sge_txq *q)
 {
-	spin_lock_irq(&q->db_lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->db_lock, flags);
 	q->db_disabled = 0;
-	spin_unlock_irq(&q->db_lock);
+	spin_unlock_irqrestore(&q->db_lock, flags);
 }
 
 static void disable_dbs(struct adapter *adap)
@@ -3617,9 +3619,10 @@ static void enable_dbs(struct adapter *adap)
 static void sync_txq_pidx(struct adapter *adap, struct sge_txq *q)
 {
 	u16 hw_pidx, hw_cidx;
+	unsigned long flags;
 	int ret;
 
-	spin_lock_bh(&q->db_lock);
+	spin_lock_irqsave(&q->db_lock, flags);
 	ret = read_eq_indices(adap, (u16)q->cntxt_id, &hw_pidx, &hw_cidx);
 	if (ret)
 		goto out;
@@ -3636,7 +3639,7 @@ static void sync_txq_pidx(struct adapter *adap, struct sge_txq *q)
 	}
 out:
 	q->db_disabled = 0;
-	spin_unlock_bh(&q->db_lock);
+	spin_unlock_irqrestore(&q->db_lock, flags);
 	if (ret)
 		CH_WARN(adap, "DB drop recovery failed.\n");
 }
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index e0376cd..392baee 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -860,9 +860,10 @@ static void cxgb_pio_copy(u64 __iomem *dst, u64 *src)
 static inline void ring_tx_db(struct adapter *adap, struct sge_txq *q, int n)
 {
 	unsigned int *wr, index;
+	unsigned long flags;
 
 	wmb();            /* write descriptors before telling HW */
-	spin_lock(&q->db_lock);
+	spin_lock_irqsave(&q->db_lock, flags);
 	if (!q->db_disabled) {
 		if (is_t4(adap->params.chip)) {
 			t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL),
@@ -880,7 +881,7 @@ static inline void ring_tx_db(struct adapter *adap, struct sge_txq *q, int n)
 		}
 	}
 	q->db_pidx = q->pidx;
-	spin_unlock(&q->db_lock);
+	spin_unlock_irqrestore(&q->db_lock, flags);
 }
 
 /**
-- 
1.7.1

* [PATCH net-next 06/31] iw_cxgb4: cap CQ size at T4_MAX_IQ_SIZE.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (4 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 05/31] cxgb4: use spinlock_irqsave/spinlock_irqrestore for db lock Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 07/31] iw_cxgb4: Allow loopback connections Hariprasad Shenai
                   ` (24 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cq.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c
index 88de3aa..c0673ac 100644
--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -881,7 +881,7 @@ struct ib_cq *c4iw_create_cq(struct ib_device *ibdev, int entries,
 	/*
 	 * Make actual HW queue 2x to avoid cdix_inc overflows.
 	 */
-	hwentries = entries * 2;
+	hwentries = min(entries * 2, T4_MAX_IQ_SIZE);
 
 	/*
 	 * Make HW queue at least 64 entries so GTS updates aren't too
-- 
1.7.1

* [PATCH net-next 07/31] iw_cxgb4: Allow loopback connections.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (5 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 06/31] iw_cxgb4: cap CQ size at T4_MAX_IQ_SIZE Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 08/31] iw_cxgb4: release neigh entry in error paths Hariprasad Shenai
                   ` (23 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

find_route() must treat loopback as a valid
egress interface.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index d286bde..360807e 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -400,7 +400,8 @@ static struct dst_entry *find_route(struct c4iw_dev *dev, __be32 local_ip,
 	n = dst_neigh_lookup(&rt->dst, &peer_ip);
 	if (!n)
 		return NULL;
-	if (!our_interface(dev, n->dev)) {
+	if (!our_interface(dev, n->dev) &&
+	    !(n->dev->flags & IFF_LOOPBACK)) {
 		dst_release(&rt->dst);
 		return NULL;
 	}
-- 
1.7.1

* [PATCH net-next 08/31] iw_cxgb4: release neigh entry in error paths.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (6 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 07/31] iw_cxgb4: Allow loopback connections Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 09/31] iw_cxgb4: Treat CPL_ERR_KEEPALV_NEG_ADVICE as negative advice Hariprasad Shenai
                   ` (22 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

Always release the neigh entry in rx_pkt().

Based on original work by Santosh Rastapur <santosh@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 360807e..74a2250 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -3350,10 +3350,9 @@ static int rx_pkt(struct c4iw_dev *dev, struct sk_buff *skb)
 	if (!e) {
 		pr_err("%s - failed to allocate l2t entry!\n",
 		       __func__);
-		goto free_dst;
+		goto free_neigh;
 	}
 
-	neigh_release(neigh);
 	step = dev->rdev.lldi.nrxq / dev->rdev.lldi.nchan;
 	rss_qid = dev->rdev.lldi.rxq_ids[pi->port_id * step];
 	window = (__force u16) htons((__force u16)tcph->window);
@@ -3373,6 +3372,8 @@ static int rx_pkt(struct c4iw_dev *dev, struct sk_buff *skb)
 			      tcph->source, ntohl(tcph->seq), filter, window,
 			      rss_qid, pi->port_id);
 	cxgb4_l2t_release(e);
+free_neigh:
+	neigh_release(neigh);
 free_dst:
 	dst_release(dst);
 reject:
-- 
1.7.1

* [PATCH net-next 09/31] iw_cxgb4: Treat CPL_ERR_KEEPALV_NEG_ADVICE as negative advice.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (7 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 08/31] iw_cxgb4: release neigh entry in error paths Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 10/31] cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes Hariprasad Shenai
                   ` (21 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

Based on original work by Anand Priyadarshee <anandp@chelsio.com>.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c            |   25 +++++++++++++------------
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h |    1 +
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 74a2250..9387f74 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1648,6 +1648,16 @@ static inline int act_open_has_tid(int status)
 	       status != CPL_ERR_ARP_MISS;
 }
 
+/*
+ * Returns whether a CPL status conveys negative advice.
+ */
+static int is_neg_adv(unsigned int status)
+{
+	return status == CPL_ERR_RTX_NEG_ADVICE ||
+	       status == CPL_ERR_PERSIST_NEG_ADVICE ||
+	       status == CPL_ERR_KEEPALV_NEG_ADVICE;
+}
+
 #define ACT_OPEN_RETRY_COUNT 2
 
 static int import_ep(struct c4iw_ep *ep, int iptype, __u8 *peer_ip,
@@ -1836,7 +1846,7 @@ static int act_open_rpl(struct c4iw_dev *dev, struct sk_buff *skb)
 	PDBG("%s ep %p atid %u status %u errno %d\n", __func__, ep, atid,
 	     status, status2errno(status));
 
-	if (status == CPL_ERR_RTX_NEG_ADVICE) {
+	if (is_neg_adv(status)) {
 		printk(KERN_WARNING MOD "Connection problems for atid %u\n",
 			atid);
 		return 0;
@@ -2266,15 +2276,6 @@ static int peer_close(struct c4iw_dev *dev, struct sk_buff *skb)
 	return 0;
 }
 
-/*
- * Returns whether an ABORT_REQ_RSS message is a negative advice.
- */
-static int is_neg_adv_abort(unsigned int status)
-{
-	return status == CPL_ERR_RTX_NEG_ADVICE ||
-	       status == CPL_ERR_PERSIST_NEG_ADVICE;
-}
-
 static int peer_abort(struct c4iw_dev *dev, struct sk_buff *skb)
 {
 	struct cpl_abort_req_rss *req = cplhdr(skb);
@@ -2288,7 +2289,7 @@ static int peer_abort(struct c4iw_dev *dev, struct sk_buff *skb)
 	unsigned int tid = GET_TID(req);
 
 	ep = lookup_tid(t, tid);
-	if (is_neg_adv_abort(req->status)) {
+	if (is_neg_adv(req->status)) {
 		PDBG("%s neg_adv_abort ep %p tid %u\n", __func__, ep,
 		     ep->hwtid);
 		return 0;
@@ -3572,7 +3573,7 @@ static int peer_abort_intr(struct c4iw_dev *dev, struct sk_buff *skb)
 		kfree_skb(skb);
 		return 0;
 	}
-	if (is_neg_adv_abort(req->status)) {
+	if (is_neg_adv(req->status)) {
 		PDBG("%s neg_adv_abort ep %p tid %u\n", __func__, ep,
 		     ep->hwtid);
 		kfree_skb(skb);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h b/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
index cd6874b..f2738c7 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
@@ -116,6 +116,7 @@ enum CPL_error {
 	CPL_ERR_KEEPALIVE_TIMEDOUT = 34,
 	CPL_ERR_RTX_NEG_ADVICE     = 35,
 	CPL_ERR_PERSIST_NEG_ADVICE = 36,
+	CPL_ERR_KEEPALV_NEG_ADVICE = 37,
 	CPL_ERR_ABORT_FAILED       = 42,
 	CPL_ERR_IWARP_FLM          = 50,
 };
-- 
1.7.1

* [PATCH net-next 10/31] cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (8 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 09/31] iw_cxgb4: Treat CPL_ERR_KEEPALV_NEG_ADVICE as negative advice Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
       [not found]   ` <1393427230-14532-11-git-send-email-hariprasad-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
  2014-02-26 15:06 ` [PATCH net-next 11/31] iw_cxgb4: use the BAR2/WC path for kernel QPs and T5 devices Hariprasad Shenai
                   ` (20 subsequent siblings)
  30 siblings, 1 reply; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

The current logic suffers from a slow response time when disabling user DB
usage, and also fails to avoid DB FIFO drops under heavy load.  This commit
fixes these deficiencies and improves the avoidance logic by more
efficiently notifying the ULDs of potential DB problems, and by implementing
a smoother flow control algorithm in iw_cxgb4, which is the ULD that puts
the most load on the DB fifo.

Design:

cxgb4:

Direct ULD notification when a DB FULL/DROP interrupt fires.  This allows
the ULD to stop doing user DB writes as quickly as possible.

While user DB usage is disabled, the LLD will accumulate DB write events
for its queues.  Then once DB usage is reenabled, a single DB write is
done for each queue with its accumulated write count.  This reduces the
load put on the DB fifo when reenabling.
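
A rough sketch of this accumulate-then-flush idea on the LLD side (the field
and register usage here is illustrative and may not match the final code in
cxgb4_main.c/sge.c exactly):

	/* Tx path: if DBs are disabled, only remember the increment. */
	spin_lock_irqsave(&q->db_lock, flags);
	if (q->db_disabled)
		q->db_pidx_inc += n;                    /* no MMIO write */
	else
		t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL),
			     QID(q->cntxt_id) | PIDX(n));
	q->db_pidx = q->pidx;
	spin_unlock_irqrestore(&q->db_lock, flags);

	/* Re-enable path: one DB write per queue with the accumulated count. */
	spin_lock_irqsave(&q->db_lock, flags);
	if (q->db_pidx_inc) {
		t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL),
			     QID(q->cntxt_id) | PIDX(q->db_pidx_inc));
		q->db_pidx_inc = 0;
	}
	q->db_disabled = 0;
	spin_unlock_irqrestore(&q->db_lock, flags);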

iw_cxgb4:

Instead of marking each qp to indicate DB writes are disabled, we create
a device-global status page that each user process maps.  This allows
iw_cxgb4 to set a single bit to disable all DB writes for all user QPs,
instead of traversing all the active QPs.  If libcxgb4 doesn't support
this, then we fall back to the old approach of marking each QP.  Thus the
new driver still works with an older libcxgb4.
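
A rough sketch of how the user library consults the status page before
ringing a doorbell (illustrative pseudo-C; the libcxgb4 side is not part of
this kernel series and the helper names below are made up):

	struct t4_dev_status_page *sp;  /* mmap()ed via the new ucontext key */

	if (sp->db_off) {
		/* DBs disabled: hand the pidx increment to the kernel, which
		 * queues this QP on its flow-control list. */
		ring_db_via_kernel(qp, inc);
	} else {
		/* Normal case: write the user DB register directly. */
		ring_db_direct(qp, inc);
	}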

When the LLD upcalls indicating DB FULL, we disable all DB writes
via the status page and transition the DB state to STOPPED.  As user
processes see that DB writes are disabled, they call into iw_cxgb4 to
submit their DB write events.  Since the DB state is STOPPED, the QP
trying to write gets enqueued on a new DB "flow control" list.  As
subsequent DB writes are submitted for this flow-controlled QP, the
number of writes is accumulated for each QP on the flow control list.
So all the user QPs that are actively ringing the DB get put on this
list and the number of writes they request is accumulated.

When the LLD upcalls indicating DB EMPTY, which is in a workq context, we
change the DB state to FLOW_CONTROL, and begin resuming all the QPs that
are on the flow control list.  This logic runs until the flow control
list is empty or we exit FLOW_CONTROL mode (due to a DB DROP upcall,
for example).  QPs are removed from this list, and their accumulated
DB write counts written to the DB FIFO.  Sets of QPs, called chunks in
the code, are removed at one time.  The chunk size is a module option,
db_fc_resume_size, and defaults to 64.  So 64 QPs are resumed at a time,
and before the next chunk is resumed, the logic waits (blocks) for the
DB FIFO to drain.  This prevents resuming too quickly and overflowing
the FIFO.  Once the flow control list is empty, the DB state transitions
back to NORMAL and user QPs are again allowed to write directly to the
user DB register.

The algorithm is designed such that if the DB write load is high enough,
then all the DB writes get submitted by the kernel using this flow
controlled approach to avoid DB drops.  As the load lightens, we return
to normal DB writes performed directly by user applications.
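
For reference, the DB states involved (mirroring the enum db_state change in
the iw_cxgb4 hunk below; the comments summarize the scheme above and are not
verbatim driver comments):

	enum db_state {
		NORMAL = 0,       /* user QPs ring the DB register directly */
		FLOW_CONTROL = 1, /* DB EMPTY seen: kernel drains db_fc_list in chunks */
		RECOVERY = 2,     /* DB DROP: resync SQ/RQ indices, then back to STOPPED */
		STOPPED = 3       /* DB FULL: all DB writes go through the kernel */
	};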

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/device.c            |  196 ++++++++++++++---------
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h          |   11 +-
 drivers/infiniband/hw/cxgb4/provider.c          |   44 +++++-
 drivers/infiniband/hw/cxgb4/qp.c                |  152 +++++++-----------
 drivers/infiniband/hw/cxgb4/t4.h                |    6 +
 drivers/infiniband/hw/cxgb4/user.h              |    5 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4.h      |    1 +
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c |   73 +++++----
 drivers/net/ethernet/chelsio/cxgb4/sge.c        |    3 +-
 9 files changed, 285 insertions(+), 206 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index 4a03385..64334ef 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -45,15 +45,22 @@ MODULE_DESCRIPTION("Chelsio T4/T5 RDMA Driver");
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_VERSION(DRV_VERSION);
 
-static int allow_db_fc_on_t5;
-module_param(allow_db_fc_on_t5, int, 0644);
-MODULE_PARM_DESC(allow_db_fc_on_t5,
-		 "Allow DB Flow Control on T5 (default = 0)");
-
-static int allow_db_coalescing_on_t5;
-module_param(allow_db_coalescing_on_t5, int, 0644);
-MODULE_PARM_DESC(allow_db_coalescing_on_t5,
-		 "Allow DB Coalescing on T5 (default = 0)");
+static int db_fc_resume_size = 64;
+module_param(db_fc_resume_size, int, 0644);
+MODULE_PARM_DESC(db_fc_resume_size, "qps are resumed from db flow control in "
+		 "this size chunks (default = 64)");
+
+static int db_fc_resume_delay = 1;
+module_param(db_fc_resume_delay, int, 0644);
+MODULE_PARM_DESC(db_fc_resume_delay, "how long to delay between removing qps "
+		 "from the fc list (default is 1 jiffy)");
+
+static int db_fc_drain_thresh;
+module_param(db_fc_drain_thresh, int, 0644);
+MODULE_PARM_DESC(db_fc_drain_thresh,
+		 "relative threshold at which a chunk will be resumed"
+		 "from the fc list (default is 0 (int_thresh << "
+		 "db_fc_drain_thresh))");
 
 struct uld_ctx {
 	struct list_head entry;
@@ -311,9 +318,10 @@ static int stats_show(struct seq_file *seq, void *v)
 	seq_printf(seq, "  DB FULL: %10llu\n", dev->rdev.stats.db_full);
 	seq_printf(seq, " DB EMPTY: %10llu\n", dev->rdev.stats.db_empty);
 	seq_printf(seq, "  DB DROP: %10llu\n", dev->rdev.stats.db_drop);
-	seq_printf(seq, " DB State: %s Transitions %llu\n",
+	seq_printf(seq, " DB State: %s Transitions %llu FC Interruptions %llu\n",
 		   db_state_str[dev->db_state],
-		   dev->rdev.stats.db_state_transitions);
+		   dev->rdev.stats.db_state_transitions,
+		   dev->rdev.stats.db_fc_interruptions);
 	seq_printf(seq, "TCAM_FULL: %10llu\n", dev->rdev.stats.tcam_full);
 	seq_printf(seq, "ACT_OFLD_CONN_FAILS: %10llu\n",
 		   dev->rdev.stats.act_ofld_conn_fails);
@@ -643,6 +651,12 @@ static int c4iw_rdev_open(struct c4iw_rdev *rdev)
 		printk(KERN_ERR MOD "error %d initializing ocqp pool\n", err);
 		goto err4;
 	}
+	rdev->status_page = (struct t4_dev_status_page *)
+			    __get_free_page(GFP_KERNEL);
+	if (!rdev->status_page) {
+		pr_err(MOD "error allocating status page\n");
+		goto err4;
+	}
 	return 0;
 err4:
 	c4iw_rqtpool_destroy(rdev);
@@ -656,6 +670,7 @@ err1:
 
 static void c4iw_rdev_close(struct c4iw_rdev *rdev)
 {
+	free_page((unsigned long)rdev->status_page);
 	c4iw_pblpool_destroy(rdev);
 	c4iw_rqtpool_destroy(rdev);
 	c4iw_destroy_resource(&rdev->resource);
@@ -703,18 +718,6 @@ static struct c4iw_dev *c4iw_alloc(const struct cxgb4_lld_info *infop)
 		pr_info("%s: On-Chip Queues not supported on this device.\n",
 			pci_name(infop->pdev));
 
-	if (!is_t4(infop->adapter_type)) {
-		if (!allow_db_fc_on_t5) {
-			db_fc_threshold = 100000;
-			pr_info("DB Flow Control Disabled.\n");
-		}
-
-		if (!allow_db_coalescing_on_t5) {
-			db_coalescing_threshold = -1;
-			pr_info("DB Coalescing Disabled.\n");
-		}
-	}
-
 	devp = (struct c4iw_dev *)ib_alloc_device(sizeof(*devp));
 	if (!devp) {
 		printk(KERN_ERR MOD "Cannot allocate ib device\n");
@@ -749,6 +752,7 @@ static struct c4iw_dev *c4iw_alloc(const struct cxgb4_lld_info *infop)
 	spin_lock_init(&devp->lock);
 	mutex_init(&devp->rdev.stats.lock);
 	mutex_init(&devp->db_mutex);
+	INIT_LIST_HEAD(&devp->db_fc_list);
 
 	if (c4iw_debugfs_root) {
 		devp->debugfs_root = debugfs_create_dir(
@@ -977,13 +981,16 @@ static int disable_qp_db(int id, void *p, void *data)
 
 static void stop_queues(struct uld_ctx *ctx)
 {
-	spin_lock_irq(&ctx->dev->lock);
-	if (ctx->dev->db_state == NORMAL) {
-		ctx->dev->rdev.stats.db_state_transitions++;
-		ctx->dev->db_state = FLOW_CONTROL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ctx->dev->lock, flags);
+	ctx->dev->rdev.stats.db_state_transitions++;
+	ctx->dev->db_state = STOPPED;
+	if (ctx->dev->rdev.flags & T4_STATUS_PAGE_DISABLED)
 		idr_for_each(&ctx->dev->qpidr, disable_qp_db, NULL);
-	}
-	spin_unlock_irq(&ctx->dev->lock);
+	else
+		ctx->dev->rdev.status_page->db_off = 1;
+	spin_unlock_irqrestore(&ctx->dev->lock, flags);
 }
 
 static int enable_qp_db(int id, void *p, void *data)
@@ -994,15 +1001,70 @@ static int enable_qp_db(int id, void *p, void *data)
 	return 0;
 }
 
+static void resume_rc_qp(struct c4iw_qp *qp)
+{
+	spin_lock(&qp->lock);
+	t4_ring_sq_db(&qp->wq, qp->wq.sq.wq_pidx_inc);
+	qp->wq.sq.wq_pidx_inc = 0;
+	t4_ring_rq_db(&qp->wq, qp->wq.rq.wq_pidx_inc);
+	qp->wq.rq.wq_pidx_inc = 0;
+	spin_unlock(&qp->lock);
+}
+
+static void resume_a_chunk(struct uld_ctx *ctx)
+{
+	int i;
+	struct c4iw_qp *qp;
+
+	for (i = 0; i < db_fc_resume_size; i++) {
+		qp = list_first_entry(&ctx->dev->db_fc_list, struct c4iw_qp,
+				      db_fc_entry);
+		list_del_init(&qp->db_fc_entry);
+		resume_rc_qp(qp);
+		if (list_empty(&ctx->dev->db_fc_list))
+			break;
+	}
+}
+
 static void resume_queues(struct uld_ctx *ctx)
 {
 	spin_lock_irq(&ctx->dev->lock);
-	if (ctx->dev->qpcnt <= db_fc_threshold &&
-	    ctx->dev->db_state == FLOW_CONTROL) {
-		ctx->dev->db_state = NORMAL;
-		ctx->dev->rdev.stats.db_state_transitions++;
-		idr_for_each(&ctx->dev->qpidr, enable_qp_db, NULL);
+	if (ctx->dev->db_state != STOPPED)
+		goto out;
+	ctx->dev->db_state = FLOW_CONTROL;
+	while (1) {
+		if (list_empty(&ctx->dev->db_fc_list)) {
+			WARN_ON(ctx->dev->db_state != FLOW_CONTROL);
+			ctx->dev->db_state = NORMAL;
+			ctx->dev->rdev.stats.db_state_transitions++;
+			if (ctx->dev->rdev.flags & T4_STATUS_PAGE_DISABLED) {
+				idr_for_each(&ctx->dev->qpidr, enable_qp_db,
+					     NULL);
+			} else {
+				ctx->dev->rdev.status_page->db_off = 0;
+			}
+			break;
+		} else {
+			if (cxgb4_dbfifo_count(ctx->dev->rdev.lldi.ports[0], 1)
+			    < (ctx->dev->rdev.lldi.dbfifo_int_thresh <<
+			       db_fc_drain_thresh)) {
+				resume_a_chunk(ctx);
+			}
+			if (!list_empty(&ctx->dev->db_fc_list)) {
+				spin_unlock_irq(&ctx->dev->lock);
+				if (db_fc_resume_delay) {
+					set_current_state(TASK_UNINTERRUPTIBLE);
+					schedule_timeout(db_fc_resume_delay);
+				}
+				spin_lock_irq(&ctx->dev->lock);
+				if (ctx->dev->db_state != FLOW_CONTROL)
+					break;
+			}
+		}
 	}
+out:
+	if (ctx->dev->db_state != NORMAL)
+		ctx->dev->rdev.stats.db_fc_interruptions++;
 	spin_unlock_irq(&ctx->dev->lock);
 }
 
@@ -1028,12 +1090,12 @@ static int count_qps(int id, void *p, void *data)
 	return 0;
 }
 
-static void deref_qps(struct qp_list qp_list)
+static void deref_qps(struct qp_list *qp_list)
 {
 	int idx;
 
-	for (idx = 0; idx < qp_list.idx; idx++)
-		c4iw_qp_rem_ref(&qp_list.qps[idx]->ibqp);
+	for (idx = 0; idx < qp_list->idx; idx++)
+		c4iw_qp_rem_ref(&qp_list->qps[idx]->ibqp);
 }
 
 static void recover_lost_dbs(struct uld_ctx *ctx, struct qp_list *qp_list)
@@ -1044,17 +1106,22 @@ static void recover_lost_dbs(struct uld_ctx *ctx, struct qp_list *qp_list)
 	for (idx = 0; idx < qp_list->idx; idx++) {
 		struct c4iw_qp *qp = qp_list->qps[idx];
 
+		spin_lock_irq(&qp->rhp->lock);
+		spin_lock(&qp->lock);
 		ret = cxgb4_sync_txq_pidx(qp->rhp->rdev.lldi.ports[0],
 					  qp->wq.sq.qid,
 					  t4_sq_host_wq_pidx(&qp->wq),
 					  t4_sq_wq_size(&qp->wq));
 		if (ret) {
-			printk(KERN_ERR MOD "%s: Fatal error - "
+			pr_err(KERN_ERR MOD "%s: Fatal error - "
 			       "DB overflow recovery failed - "
 			       "error syncing SQ qid %u\n",
 			       pci_name(ctx->lldi.pdev), qp->wq.sq.qid);
+			spin_unlock(&qp->lock);
+			spin_unlock_irq(&qp->rhp->lock);
 			return;
 		}
+		qp->wq.sq.wq_pidx_inc = 0;
 
 		ret = cxgb4_sync_txq_pidx(qp->rhp->rdev.lldi.ports[0],
 					  qp->wq.rq.qid,
@@ -1062,12 +1129,17 @@ static void recover_lost_dbs(struct uld_ctx *ctx, struct qp_list *qp_list)
 					  t4_rq_wq_size(&qp->wq));
 
 		if (ret) {
-			printk(KERN_ERR MOD "%s: Fatal error - "
+			pr_err(KERN_ERR MOD "%s: Fatal error - "
 			       "DB overflow recovery failed - "
 			       "error syncing RQ qid %u\n",
 			       pci_name(ctx->lldi.pdev), qp->wq.rq.qid);
+			spin_unlock(&qp->lock);
+			spin_unlock_irq(&qp->rhp->lock);
 			return;
 		}
+		qp->wq.rq.wq_pidx_inc = 0;
+		spin_unlock(&qp->lock);
+		spin_unlock_irq(&qp->rhp->lock);
 
 		/* Wait for the dbfifo to drain */
 		while (cxgb4_dbfifo_count(qp->rhp->rdev.lldi.ports[0], 1) > 0) {
@@ -1083,36 +1155,22 @@ static void recover_queues(struct uld_ctx *ctx)
 	struct qp_list qp_list;
 	int ret;
 
-	/* lock out kernel db ringers */
-	mutex_lock(&ctx->dev->db_mutex);
-
-	/* put all queues in to recovery mode */
-	spin_lock_irq(&ctx->dev->lock);
-	ctx->dev->db_state = RECOVERY;
-	ctx->dev->rdev.stats.db_state_transitions++;
-	idr_for_each(&ctx->dev->qpidr, disable_qp_db, NULL);
-	spin_unlock_irq(&ctx->dev->lock);
-
 	/* slow everybody down */
 	set_current_state(TASK_UNINTERRUPTIBLE);
 	schedule_timeout(usecs_to_jiffies(1000));
 
-	/* Wait for the dbfifo to completely drain. */
-	while (cxgb4_dbfifo_count(ctx->dev->rdev.lldi.ports[0], 1) > 0) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(usecs_to_jiffies(10));
-	}
-
 	/* flush the SGE contexts */
 	ret = cxgb4_flush_eq_cache(ctx->dev->rdev.lldi.ports[0]);
 	if (ret) {
 		printk(KERN_ERR MOD "%s: Fatal error - DB overflow recovery failed\n",
 		       pci_name(ctx->lldi.pdev));
-		goto out;
+		return;
 	}
 
 	/* Count active queues so we can build a list of queues to recover */
 	spin_lock_irq(&ctx->dev->lock);
+	WARN_ON(ctx->dev->db_state != STOPPED);
+	ctx->dev->db_state = RECOVERY;
 	idr_for_each(&ctx->dev->qpidr, count_qps, &count);
 
 	qp_list.qps = kzalloc(count * sizeof *qp_list.qps, GFP_ATOMIC);
@@ -1120,7 +1178,7 @@ static void recover_queues(struct uld_ctx *ctx)
 		printk(KERN_ERR MOD "%s: Fatal error - DB overflow recovery failed\n",
 		       pci_name(ctx->lldi.pdev));
 		spin_unlock_irq(&ctx->dev->lock);
-		goto out;
+		return;
 	}
 	qp_list.idx = 0;
 
@@ -1133,29 +1191,13 @@ static void recover_queues(struct uld_ctx *ctx)
 	recover_lost_dbs(ctx, &qp_list);
 
 	/* we're almost done!  deref the qps and clean up */
-	deref_qps(qp_list);
+	deref_qps(&qp_list);
 	kfree(qp_list.qps);
 
-	/* Wait for the dbfifo to completely drain again */
-	while (cxgb4_dbfifo_count(ctx->dev->rdev.lldi.ports[0], 1) > 0) {
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(usecs_to_jiffies(10));
-	}
-
-	/* resume the queues */
 	spin_lock_irq(&ctx->dev->lock);
-	if (ctx->dev->qpcnt > db_fc_threshold)
-		ctx->dev->db_state = FLOW_CONTROL;
-	else {
-		ctx->dev->db_state = NORMAL;
-		idr_for_each(&ctx->dev->qpidr, enable_qp_db, NULL);
-	}
-	ctx->dev->rdev.stats.db_state_transitions++;
+	WARN_ON(ctx->dev->db_state != RECOVERY);
+	ctx->dev->db_state = STOPPED;
 	spin_unlock_irq(&ctx->dev->lock);
-
-out:
-	/* start up kernel db ringers again */
-	mutex_unlock(&ctx->dev->db_mutex);
 }
 
 static int c4iw_uld_control(void *handle, enum cxgb4_control control, ...)
@@ -1165,9 +1207,7 @@ static int c4iw_uld_control(void *handle, enum cxgb4_control control, ...)
 	switch (control) {
 	case CXGB4_CONTROL_DB_FULL:
 		stop_queues(ctx);
-		mutex_lock(&ctx->dev->rdev.stats.lock);
 		ctx->dev->rdev.stats.db_full++;
-		mutex_unlock(&ctx->dev->rdev.stats.lock);
 		break;
 	case CXGB4_CONTROL_DB_EMPTY:
 		resume_queues(ctx);
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 23eaeab..e9ecbfa 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -109,6 +109,7 @@ struct c4iw_dev_ucontext {
 
 enum c4iw_rdev_flags {
 	T4_FATAL_ERROR = (1<<0),
+	T4_STATUS_PAGE_DISABLED = (1<<1),
 };
 
 struct c4iw_stat {
@@ -130,6 +131,7 @@ struct c4iw_stats {
 	u64  db_empty;
 	u64  db_drop;
 	u64  db_state_transitions;
+	u64  db_fc_interruptions;
 	u64  tcam_full;
 	u64  act_ofld_conn_fails;
 	u64  pas_ofld_conn_fails;
@@ -150,6 +152,7 @@ struct c4iw_rdev {
 	unsigned long oc_mw_pa;
 	void __iomem *oc_mw_kva;
 	struct c4iw_stats stats;
+	struct t4_dev_status_page *status_page;
 };
 
 static inline int c4iw_fatal_error(struct c4iw_rdev *rdev)
@@ -211,7 +214,8 @@ static inline int c4iw_wait_for_reply(struct c4iw_rdev *rdev,
 enum db_state {
 	NORMAL = 0,
 	FLOW_CONTROL = 1,
-	RECOVERY = 2
+	RECOVERY = 2,
+	STOPPED = 3
 };
 
 struct c4iw_dev {
@@ -225,10 +229,10 @@ struct c4iw_dev {
 	struct mutex db_mutex;
 	struct dentry *debugfs_root;
 	enum db_state db_state;
-	int qpcnt;
 	struct idr hwtid_idr;
 	struct idr atid_idr;
 	struct idr stid_idr;
+	struct list_head db_fc_list;
 };
 
 static inline struct c4iw_dev *to_c4iw_dev(struct ib_device *ibdev)
@@ -432,6 +436,7 @@ struct c4iw_qp_attributes {
 
 struct c4iw_qp {
 	struct ib_qp ibqp;
+	struct list_head db_fc_entry;
 	struct c4iw_dev *rhp;
 	struct c4iw_ep *ep;
 	struct c4iw_qp_attributes attr;
@@ -936,8 +941,6 @@ void c4iw_ev_dispatch(struct c4iw_dev *dev, struct t4_cqe *err_cqe);
 extern struct cxgb4_client t4c_client;
 extern c4iw_handler_func c4iw_handlers[NUM_CPL_CMDS];
 extern int c4iw_max_read_depth;
-extern int db_fc_threshold;
-extern int db_coalescing_threshold;
 extern int use_dsgl;
 
 
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 7e94c9a..d1565a4 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -106,15 +106,55 @@ static struct ib_ucontext *c4iw_alloc_ucontext(struct ib_device *ibdev,
 {
 	struct c4iw_ucontext *context;
 	struct c4iw_dev *rhp = to_c4iw_dev(ibdev);
+	static int warned;
+	struct c4iw_alloc_ucontext_resp uresp;
+	int ret = 0;
+	struct c4iw_mm_entry *mm = NULL;
 
 	PDBG("%s ibdev %p\n", __func__, ibdev);
 	context = kzalloc(sizeof(*context), GFP_KERNEL);
-	if (!context)
-		return ERR_PTR(-ENOMEM);
+	if (!context) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
 	c4iw_init_dev_ucontext(&rhp->rdev, &context->uctx);
 	INIT_LIST_HEAD(&context->mmaps);
 	spin_lock_init(&context->mmap_lock);
+
+	if (udata->outlen < sizeof(uresp)) {
+		if (!warned++)
+			pr_err(MOD "Warning - downlevel libcxgb4 "
+			       "(non-fatal), device status page disabled.");
+		rhp->rdev.flags |= T4_STATUS_PAGE_DISABLED;
+	} else {
+		mm = kmalloc(sizeof(*mm), GFP_KERNEL);
+		if (!mm)
+			goto err_free;
+
+		uresp.status_page_size = PAGE_SIZE;
+
+		spin_lock(&context->mmap_lock);
+		uresp.status_page_key = context->key;
+		context->key += PAGE_SIZE;
+		spin_unlock(&context->mmap_lock);
+
+		ret = ib_copy_to_udata(udata, &uresp, sizeof(uresp));
+		if (ret)
+			goto err_mm;
+
+		mm->key = uresp.status_page_key;
+		mm->addr = virt_to_phys(rhp->rdev.status_page);
+		mm->len = PAGE_SIZE;
+		insert_mmap(context, mm);
+	}
 	return &context->ibucontext;
+err_mm:
+	kfree(mm);
+err_free:
+	kfree(context);
+err:
+	return ERR_PTR(ret);
 }
 
 static int c4iw_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index 5829367..2b0d994 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -42,18 +42,6 @@ static int ocqp_support = 1;
 module_param(ocqp_support, int, 0644);
 MODULE_PARM_DESC(ocqp_support, "Support on-chip SQs (default=1)");
 
-int db_fc_threshold = 1000;
-module_param(db_fc_threshold, int, 0644);
-MODULE_PARM_DESC(db_fc_threshold,
-		 "QP count/threshold that triggers"
-		 " automatic db flow control mode (default = 1000)");
-
-int db_coalescing_threshold;
-module_param(db_coalescing_threshold, int, 0644);
-MODULE_PARM_DESC(db_coalescing_threshold,
-		 "QP count/threshold that triggers"
-		 " disabling db coalescing (default = 0)");
-
 static int max_fr_immd = T4_MAX_FR_IMMD;
 module_param(max_fr_immd, int, 0644);
 MODULE_PARM_DESC(max_fr_immd, "fastreg threshold for using DSGL instead of immedate");
@@ -638,6 +626,46 @@ void c4iw_qp_rem_ref(struct ib_qp *qp)
 		wake_up(&(to_c4iw_qp(qp)->wait));
 }
 
+static void add_to_fc_list(struct list_head *head, struct list_head *entry)
+{
+	if (list_empty(entry))
+		list_add_tail(entry, head);
+}
+
+static int ring_kernel_sq_db(struct c4iw_qp *qhp, u16 inc)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&qhp->rhp->lock, flags);
+	spin_lock(&qhp->lock);
+	if (qhp->rhp->db_state == NORMAL) {
+		t4_ring_sq_db(&qhp->wq, inc);
+	} else {
+		add_to_fc_list(&qhp->rhp->db_fc_list, &qhp->db_fc_entry);
+		qhp->wq.sq.wq_pidx_inc += inc;
+	}
+	spin_unlock(&qhp->lock);
+	spin_unlock_irqrestore(&qhp->rhp->lock, flags);
+	return 0;
+}
+
+static int ring_kernel_rq_db(struct c4iw_qp *qhp, u16 inc)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&qhp->rhp->lock, flags);
+	spin_lock(&qhp->lock);
+	if (qhp->rhp->db_state == NORMAL) {
+		t4_ring_rq_db(&qhp->wq, inc);
+	} else {
+		add_to_fc_list(&qhp->rhp->db_fc_list, &qhp->db_fc_entry);
+		qhp->wq.rq.wq_pidx_inc += inc;
+	}
+	spin_unlock(&qhp->lock);
+	spin_unlock_irqrestore(&qhp->rhp->lock, flags);
+	return 0;
+}
+
 int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		   struct ib_send_wr **bad_wr)
 {
@@ -750,9 +778,13 @@ int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		t4_sq_produce(&qhp->wq, len16);
 		idx += DIV_ROUND_UP(len16*16, T4_EQ_ENTRY_SIZE);
 	}
-	if (t4_wq_db_enabled(&qhp->wq))
+	if (!qhp->rhp->rdev.status_page->db_off) {
 		t4_ring_sq_db(&qhp->wq, idx);
-	spin_unlock_irqrestore(&qhp->lock, flag);
+		spin_unlock_irqrestore(&qhp->lock, flag);
+	} else {
+		spin_unlock_irqrestore(&qhp->lock, flag);
+		ring_kernel_sq_db(qhp, idx);
+	}
 	return err;
 }
 
@@ -812,9 +844,13 @@ int c4iw_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr,
 		wr = wr->next;
 		num_wrs--;
 	}
-	if (t4_wq_db_enabled(&qhp->wq))
+	if (!qhp->rhp->rdev.status_page->db_off) {
 		t4_ring_rq_db(&qhp->wq, idx);
-	spin_unlock_irqrestore(&qhp->lock, flag);
+		spin_unlock_irqrestore(&qhp->lock, flag);
+	} else {
+		spin_unlock_irqrestore(&qhp->lock, flag);
+		ring_kernel_rq_db(qhp, idx);
+	}
 	return err;
 }
 
@@ -1200,35 +1236,6 @@ out:
 	return ret;
 }
 
-/*
- * Called by the library when the qp has user dbs disabled due to
- * a DB_FULL condition.  This function will single-thread all user
- * DB rings to avoid overflowing the hw db-fifo.
- */
-static int ring_kernel_db(struct c4iw_qp *qhp, u32 qid, u16 inc)
-{
-	int delay = db_delay_usecs;
-
-	mutex_lock(&qhp->rhp->db_mutex);
-	do {
-
-		/*
-		 * The interrupt threshold is dbfifo_int_thresh << 6. So
-		 * make sure we don't cross that and generate an interrupt.
-		 */
-		if (cxgb4_dbfifo_count(qhp->rhp->rdev.lldi.ports[0], 1) <
-		    (qhp->rhp->rdev.lldi.dbfifo_int_thresh << 5)) {
-			writel(QID(qid) | PIDX(inc), qhp->wq.db);
-			break;
-		}
-		set_current_state(TASK_UNINTERRUPTIBLE);
-		schedule_timeout(usecs_to_jiffies(delay));
-		delay = min(delay << 1, 2000);
-	} while (1);
-	mutex_unlock(&qhp->rhp->db_mutex);
-	return 0;
-}
-
 int c4iw_modify_qp(struct c4iw_dev *rhp, struct c4iw_qp *qhp,
 		   enum c4iw_qp_attr_mask mask,
 		   struct c4iw_qp_attributes *attrs,
@@ -1278,11 +1285,11 @@ int c4iw_modify_qp(struct c4iw_dev *rhp, struct c4iw_qp *qhp,
 	}
 
 	if (mask & C4IW_QP_ATTR_SQ_DB) {
-		ret = ring_kernel_db(qhp, qhp->wq.sq.qid, attrs->sq_db_inc);
+		ret = ring_kernel_sq_db(qhp, attrs->sq_db_inc);
 		goto out;
 	}
 	if (mask & C4IW_QP_ATTR_RQ_DB) {
-		ret = ring_kernel_db(qhp, qhp->wq.rq.qid, attrs->rq_db_inc);
+		ret = ring_kernel_rq_db(qhp, attrs->rq_db_inc);
 		goto out;
 	}
 
@@ -1465,14 +1472,6 @@ out:
 	return ret;
 }
 
-static int enable_qp_db(int id, void *p, void *data)
-{
-	struct c4iw_qp *qp = p;
-
-	t4_enable_wq_db(&qp->wq);
-	return 0;
-}
-
 int c4iw_destroy_qp(struct ib_qp *ib_qp)
 {
 	struct c4iw_dev *rhp;
@@ -1490,22 +1489,15 @@ int c4iw_destroy_qp(struct ib_qp *ib_qp)
 		c4iw_modify_qp(rhp, qhp, C4IW_QP_ATTR_NEXT_STATE, &attrs, 0);
 	wait_event(qhp->wait, !qhp->ep);
 
-	spin_lock_irq(&rhp->lock);
-	remove_handle_nolock(rhp, &rhp->qpidr, qhp->wq.sq.qid);
-	rhp->qpcnt--;
-	BUG_ON(rhp->qpcnt < 0);
-	if (rhp->qpcnt <= db_fc_threshold && rhp->db_state == FLOW_CONTROL) {
-		rhp->rdev.stats.db_state_transitions++;
-		rhp->db_state = NORMAL;
-		idr_for_each(&rhp->qpidr, enable_qp_db, NULL);
-	}
-	if (db_coalescing_threshold >= 0)
-		if (rhp->qpcnt <= db_coalescing_threshold)
-			cxgb4_enable_db_coalescing(rhp->rdev.lldi.ports[0]);
-	spin_unlock_irq(&rhp->lock);
+	remove_handle(rhp, &rhp->qpidr, qhp->wq.sq.qid);
 	atomic_dec(&qhp->refcnt);
 	wait_event(qhp->wait, !atomic_read(&qhp->refcnt));
 
+	spin_lock_irq(&rhp->lock);
+	if (!list_empty(&qhp->db_fc_entry))
+		list_del_init(&qhp->db_fc_entry);
+	spin_unlock_irq(&rhp->lock);
+
 	ucontext = ib_qp->uobject ?
 		   to_c4iw_ucontext(ib_qp->uobject->context) : NULL;
 	destroy_qp(&rhp->rdev, &qhp->wq,
@@ -1516,14 +1508,6 @@ int c4iw_destroy_qp(struct ib_qp *ib_qp)
 	return 0;
 }
 
-static int disable_qp_db(int id, void *p, void *data)
-{
-	struct c4iw_qp *qp = p;
-
-	t4_disable_wq_db(&qp->wq);
-	return 0;
-}
-
 struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs,
 			     struct ib_udata *udata)
 {
@@ -1610,20 +1594,7 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs,
 	init_waitqueue_head(&qhp->wait);
 	atomic_set(&qhp->refcnt, 1);
 
-	spin_lock_irq(&rhp->lock);
-	if (rhp->db_state != NORMAL)
-		t4_disable_wq_db(&qhp->wq);
-	rhp->qpcnt++;
-	if (rhp->qpcnt > db_fc_threshold && rhp->db_state == NORMAL) {
-		rhp->rdev.stats.db_state_transitions++;
-		rhp->db_state = FLOW_CONTROL;
-		idr_for_each(&rhp->qpidr, disable_qp_db, NULL);
-	}
-	if (db_coalescing_threshold >= 0)
-		if (rhp->qpcnt > db_coalescing_threshold)
-			cxgb4_disable_db_coalescing(rhp->rdev.lldi.ports[0]);
-	ret = insert_handle_nolock(rhp, &rhp->qpidr, qhp, qhp->wq.sq.qid);
-	spin_unlock_irq(&rhp->lock);
+	ret = insert_handle(rhp, &rhp->qpidr, qhp, qhp->wq.sq.qid);
 	if (ret)
 		goto err2;
 
@@ -1709,6 +1680,7 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs,
 	}
 	qhp->ibqp.qp_num = qhp->wq.sq.qid;
 	init_timer(&(qhp->timer));
+	INIT_LIST_HEAD(&qhp->db_fc_entry);
 	PDBG("%s qhp %p sq_num_entries %d, rq_num_entries %d qpid 0x%0x\n",
 	     __func__, qhp, qhp->attr.sq_num_entries, qhp->attr.rq_num_entries,
 	     qhp->wq.sq.qid);
diff --git a/drivers/infiniband/hw/cxgb4/t4.h b/drivers/infiniband/hw/cxgb4/t4.h
index e73ace7..eeca8b1 100644
--- a/drivers/infiniband/hw/cxgb4/t4.h
+++ b/drivers/infiniband/hw/cxgb4/t4.h
@@ -300,6 +300,7 @@ struct t4_sq {
 	u16 cidx;
 	u16 pidx;
 	u16 wq_pidx;
+	u16 wq_pidx_inc;
 	u16 flags;
 	short flush_cidx;
 };
@@ -324,6 +325,7 @@ struct t4_rq {
 	u16 cidx;
 	u16 pidx;
 	u16 wq_pidx;
+	u16 wq_pidx_inc;
 };
 
 struct t4_wq {
@@ -609,3 +611,7 @@ static inline void t4_set_cq_in_error(struct t4_cq *cq)
 	((struct t4_status_page *)&cq->queue[cq->size])->qp_err = 1;
 }
 #endif
+
+struct t4_dev_status_page {
+	u8 db_off;
+};
diff --git a/drivers/infiniband/hw/cxgb4/user.h b/drivers/infiniband/hw/cxgb4/user.h
index 32b754c..a4cac61 100644
--- a/drivers/infiniband/hw/cxgb4/user.h
+++ b/drivers/infiniband/hw/cxgb4/user.h
@@ -71,3 +71,8 @@ struct c4iw_create_qp_resp {
 	__u32 flags;
 };
 #endif
+struct c4iw_alloc_ucontext_resp {
+	__u64 status_page_key;
+	__u32 status_page_size;
+};
+
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
index 50abe1d..32db377 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4.h
@@ -500,6 +500,7 @@ struct sge_txq {
 	spinlock_t db_lock;
 	int db_disabled;
 	unsigned short db_pidx;
+	unsigned short db_pidx_inc;
 	u64 udb;
 };
 
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 61825f5..17891f1 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -3578,16 +3578,24 @@ static void drain_db_fifo(struct adapter *adap, int usecs)
 
 static void disable_txq_db(struct sge_txq *q)
 {
-	spin_lock_irq(&q->db_lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&q->db_lock, flags);
 	q->db_disabled = 1;
-	spin_unlock_irq(&q->db_lock);
+	spin_unlock_irqrestore(&q->db_lock, flags);
 }
 
-static void enable_txq_db(struct sge_txq *q)
+static void enable_txq_db(struct adapter *adap, struct sge_txq *q)
 {
 	unsigned long flags;
 
 	spin_lock_irqsave(&q->db_lock, flags);
+	if (q->db_pidx_inc) {
+		wmb();
+		t4_write_reg(adap, MYPF_REG(SGE_PF_KDOORBELL),
+			     QID(q->cntxt_id) | PIDX(q->db_pidx_inc));
+		q->db_pidx_inc = 0;
+	}
 	q->db_disabled = 0;
 	spin_unlock_irqrestore(&q->db_lock, flags);
 }
@@ -3609,11 +3617,32 @@ static void enable_dbs(struct adapter *adap)
 	int i;
 
 	for_each_ethrxq(&adap->sge, i)
-		enable_txq_db(&adap->sge.ethtxq[i].q);
+		enable_txq_db(adap, &adap->sge.ethtxq[i].q);
 	for_each_ofldrxq(&adap->sge, i)
-		enable_txq_db(&adap->sge.ofldtxq[i].q);
+		enable_txq_db(adap, &adap->sge.ofldtxq[i].q);
 	for_each_port(adap, i)
-		enable_txq_db(&adap->sge.ctrlq[i].q);
+		enable_txq_db(adap, &adap->sge.ctrlq[i].q);
+}
+
+static void notify_rdma_uld(struct adapter *adap, enum cxgb4_control cmd)
+{
+	if (adap->uld_handle[CXGB4_ULD_RDMA])
+		ulds[CXGB4_ULD_RDMA].control(adap->uld_handle[CXGB4_ULD_RDMA],
+				cmd);
+}
+
+static void process_db_full(struct work_struct *work)
+{
+	struct adapter *adap;
+
+	adap = container_of(work, struct adapter, db_full_task);
+
+	drain_db_fifo(adap, dbfifo_drain_delay);
+	enable_dbs(adap);
+	notify_rdma_uld(adap, CXGB4_CONTROL_DB_EMPTY);
+	t4_set_reg_field(adap, SGE_INT_ENABLE3,
+			 DBFIFO_HP_INT | DBFIFO_LP_INT,
+			 DBFIFO_HP_INT | DBFIFO_LP_INT);
 }
 
 static void sync_txq_pidx(struct adapter *adap, struct sge_txq *q)
@@ -3639,6 +3668,7 @@ static void sync_txq_pidx(struct adapter *adap, struct sge_txq *q)
 	}
 out:
 	q->db_disabled = 0;
+	q->db_pidx_inc = 0;
 	spin_unlock_irqrestore(&q->db_lock, flags);
 	if (ret)
 		CH_WARN(adap, "DB drop recovery failed.\n");
@@ -3655,29 +3685,6 @@ static void recover_all_queues(struct adapter *adap)
 		sync_txq_pidx(adap, &adap->sge.ctrlq[i].q);
 }
 
-static void notify_rdma_uld(struct adapter *adap, enum cxgb4_control cmd)
-{
-	mutex_lock(&uld_mutex);
-	if (adap->uld_handle[CXGB4_ULD_RDMA])
-		ulds[CXGB4_ULD_RDMA].control(adap->uld_handle[CXGB4_ULD_RDMA],
-				cmd);
-	mutex_unlock(&uld_mutex);
-}
-
-static void process_db_full(struct work_struct *work)
-{
-	struct adapter *adap;
-
-	adap = container_of(work, struct adapter, db_full_task);
-
-	notify_rdma_uld(adap, CXGB4_CONTROL_DB_FULL);
-	drain_db_fifo(adap, dbfifo_drain_delay);
-	t4_set_reg_field(adap, SGE_INT_ENABLE3,
-			 DBFIFO_HP_INT | DBFIFO_LP_INT,
-			 DBFIFO_HP_INT | DBFIFO_LP_INT);
-	notify_rdma_uld(adap, CXGB4_CONTROL_DB_EMPTY);
-}
-
 static void process_db_drop(struct work_struct *work)
 {
 	struct adapter *adap;
@@ -3685,11 +3692,13 @@ static void process_db_drop(struct work_struct *work)
 	adap = container_of(work, struct adapter, db_drop_task);
 
 	if (is_t4(adap->params.chip)) {
-		disable_dbs(adap);
+		drain_db_fifo(adap, dbfifo_drain_delay);
 		notify_rdma_uld(adap, CXGB4_CONTROL_DB_DROP);
-		drain_db_fifo(adap, 1);
+		drain_db_fifo(adap, dbfifo_drain_delay);
 		recover_all_queues(adap);
+		drain_db_fifo(adap, dbfifo_drain_delay);
 		enable_dbs(adap);
+		notify_rdma_uld(adap, CXGB4_CONTROL_DB_EMPTY);
 	} else {
 		u32 dropped_db = t4_read_reg(adap, 0x010ac);
 		u16 qid = (dropped_db >> 15) & 0x1ffff;
@@ -3730,6 +3739,8 @@ static void process_db_drop(struct work_struct *work)
 void t4_db_full(struct adapter *adap)
 {
 	if (is_t4(adap->params.chip)) {
+		disable_dbs(adap);
+		notify_rdma_uld(adap, CXGB4_CONTROL_DB_FULL);
 		t4_set_reg_field(adap, SGE_INT_ENABLE3,
 				 DBFIFO_HP_INT | DBFIFO_LP_INT, 0);
 		queue_work(workq, &adap->db_full_task);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 392baee..c2e142d 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -879,7 +879,8 @@ static inline void ring_tx_db(struct adapter *adap, struct sge_txq *q, int n)
 				writel(n,  adap->bar2 + q->udb + 8);
 			wmb();
 		}
-	}
+	} else
+		q->db_pidx_inc += n;
 	q->db_pidx = q->pidx;
 	spin_unlock_irqrestore(&q->db_lock, flags);
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 11/31] iw_cxgb4: use the BAR2/WC path for kernel QPs and T5 devices.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (9 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 10/31] cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 12/31] iw_cxgb4: Fix incorrect BUG_ON conditions Hariprasad Shenai
                   ` (19 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/device.c   |   41 +++++++++++++++++----
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |    2 +
 drivers/infiniband/hw/cxgb4/qp.c       |   59 ++++++++++++++++++++----------
 drivers/infiniband/hw/cxgb4/t4.h       |   62 +++++++++++++++++++++++++++++--
 4 files changed, 132 insertions(+), 32 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index 64334ef..38a1b38 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -685,7 +685,10 @@ static void c4iw_dealloc(struct uld_ctx *ctx)
 	idr_destroy(&ctx->dev->hwtid_idr);
 	idr_destroy(&ctx->dev->stid_idr);
 	idr_destroy(&ctx->dev->atid_idr);
-	iounmap(ctx->dev->rdev.oc_mw_kva);
+	if (ctx->dev->rdev.bar2_kva)
+		iounmap(ctx->dev->rdev.bar2_kva);
+	if (ctx->dev->rdev.oc_mw_kva)
+		iounmap(ctx->dev->rdev.oc_mw_kva);
 	ib_dealloc_device(&ctx->dev->ibdev);
 	ctx->dev = NULL;
 }
@@ -725,11 +728,31 @@ static struct c4iw_dev *c4iw_alloc(const struct cxgb4_lld_info *infop)
 	}
 	devp->rdev.lldi = *infop;
 
-	devp->rdev.oc_mw_pa = pci_resource_start(devp->rdev.lldi.pdev, 2) +
-		(pci_resource_len(devp->rdev.lldi.pdev, 2) -
-		 roundup_pow_of_two(devp->rdev.lldi.vr->ocq.size));
-	devp->rdev.oc_mw_kva = ioremap_wc(devp->rdev.oc_mw_pa,
-					       devp->rdev.lldi.vr->ocq.size);
+	/*
+	 * For T5 devices, we map all of BAR2 with WC.
+	 * For T4 devices with onchip qp mem, we map only that part
+	 * of BAR2 with WC.
+	 */
+	devp->rdev.bar2_pa = pci_resource_start(devp->rdev.lldi.pdev, 2);
+	if (is_t5(devp->rdev.lldi.adapter_type)) {
+		devp->rdev.bar2_kva = ioremap_wc(devp->rdev.bar2_pa,
+			pci_resource_len(devp->rdev.lldi.pdev, 2));
+		if (!devp->rdev.bar2_kva) {
+			printk(KERN_ERR MOD "Unable to ioremap BAR2\n");
+			return ERR_PTR(-EINVAL);
+		}
+	} else if (ocqp_supported(infop)) {
+		devp->rdev.oc_mw_pa =
+			pci_resource_start(devp->rdev.lldi.pdev, 2) +
+			pci_resource_len(devp->rdev.lldi.pdev, 2) -
+			roundup_pow_of_two(devp->rdev.lldi.vr->ocq.size);
+		devp->rdev.oc_mw_kva = ioremap_wc(devp->rdev.oc_mw_pa,
+			devp->rdev.lldi.vr->ocq.size);
+		if (!devp->rdev.oc_mw_kva) {
+			pr_err(MOD "Unable to ioremap onchip mem\n");
+			return ERR_PTR(-EINVAL);
+		}
+	}
 
 	PDBG(KERN_INFO MOD "ocq memory: "
 	       "hw_start 0x%x size %u mw_pa 0x%lx mw_kva %p\n",
@@ -1004,9 +1027,11 @@ static int enable_qp_db(int id, void *p, void *data)
 static void resume_rc_qp(struct c4iw_qp *qp)
 {
 	spin_lock(&qp->lock);
-	t4_ring_sq_db(&qp->wq, qp->wq.sq.wq_pidx_inc);
+	t4_ring_sq_db(&qp->wq, qp->wq.sq.wq_pidx_inc,
+		      is_t5(qp->rhp->rdev.lldi.adapter_type), NULL);
 	qp->wq.sq.wq_pidx_inc = 0;
-	t4_ring_rq_db(&qp->wq, qp->wq.rq.wq_pidx_inc);
+	t4_ring_rq_db(&qp->wq, qp->wq.rq.wq_pidx_inc,
+		      is_t5(qp->rhp->rdev.lldi.adapter_type), NULL);
 	qp->wq.rq.wq_pidx_inc = 0;
 	spin_unlock(&qp->lock);
 }
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index e9ecbfa..c05c875 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -149,6 +149,8 @@ struct c4iw_rdev {
 	struct gen_pool *ocqp_pool;
 	u32 flags;
 	struct cxgb4_lld_info lldi;
+	unsigned long bar2_pa;
+	void __iomem *bar2_kva;
 	unsigned long oc_mw_pa;
 	void __iomem *oc_mw_kva;
 	struct c4iw_stats stats;
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index 2b0d994..92c10d4 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -46,6 +46,10 @@ static int max_fr_immd = T4_MAX_FR_IMMD;
 module_param(max_fr_immd, int, 0644);
 MODULE_PARM_DESC(max_fr_immd, "fastreg threshold for using DSGL instead of immedate");
 
+int t5_en_wc = 1;
+module_param(t5_en_wc, int, 0644);
+MODULE_PARM_DESC(t5_en_wc, "Use BAR2/WC path for kernel users (default 1)");
+
 static void set_state(struct c4iw_qp *qhp, enum c4iw_qp_state state)
 {
 	unsigned long flag;
@@ -200,13 +204,23 @@ static int create_qp(struct c4iw_rdev *rdev, struct t4_wq *wq,
 
 	wq->db = rdev->lldi.db_reg;
 	wq->gts = rdev->lldi.gts_reg;
-	if (user) {
-		wq->sq.udb = (u64)pci_resource_start(rdev->lldi.pdev, 2) +
-					(wq->sq.qid << rdev->qpshift);
-		wq->sq.udb &= PAGE_MASK;
-		wq->rq.udb = (u64)pci_resource_start(rdev->lldi.pdev, 2) +
-					(wq->rq.qid << rdev->qpshift);
-		wq->rq.udb &= PAGE_MASK;
+	if (user || is_t5(rdev->lldi.adapter_type)) {
+		u32 off;
+
+		off = (wq->sq.qid << rdev->qpshift) & PAGE_MASK;
+		if (user)
+			wq->sq.udb = (u64 __iomem *)(rdev->bar2_pa + off);
+		else {
+			off += 128 * (wq->sq.qid & rdev->qpmask) + 8;
+			wq->sq.udb = (u64 __iomem *)(rdev->bar2_kva + off);
+		}
+		off = (wq->rq.qid << rdev->qpshift) & PAGE_MASK;
+		if (user)
+			wq->rq.udb = (u64 __iomem *)(rdev->bar2_pa + off);
+		else {
+			off += 128 * (wq->rq.qid & rdev->qpmask) + 8;
+			wq->rq.udb = (u64 __iomem *)(rdev->bar2_kva + off);
+		}
 	}
 	wq->rdev = rdev;
 	wq->rq.msn = 1;
@@ -289,7 +303,8 @@ static int create_qp(struct c4iw_rdev *rdev, struct t4_wq *wq,
 
 	PDBG("%s sqid 0x%x rqid 0x%x kdb 0x%p squdb 0x%llx rqudb 0x%llx\n",
 	     __func__, wq->sq.qid, wq->rq.qid, wq->db,
-	     (unsigned long long)wq->sq.udb, (unsigned long long)wq->rq.udb);
+	     (__force unsigned long long)wq->sq.udb,
+	     (__force unsigned long long)wq->rq.udb);
 
 	return 0;
 free_dma:
@@ -638,9 +653,10 @@ static int ring_kernel_sq_db(struct c4iw_qp *qhp, u16 inc)
 
 	spin_lock_irqsave(&qhp->rhp->lock, flags);
 	spin_lock(&qhp->lock);
-	if (qhp->rhp->db_state == NORMAL) {
-		t4_ring_sq_db(&qhp->wq, inc);
-	} else {
+	if (qhp->rhp->db_state == NORMAL)
+		t4_ring_sq_db(&qhp->wq, inc,
+			      is_t5(qhp->rhp->rdev.lldi.adapter_type), NULL);
+	else {
 		add_to_fc_list(&qhp->rhp->db_fc_list, &qhp->db_fc_entry);
 		qhp->wq.sq.wq_pidx_inc += inc;
 	}
@@ -655,9 +671,10 @@ static int ring_kernel_rq_db(struct c4iw_qp *qhp, u16 inc)
 
 	spin_lock_irqsave(&qhp->rhp->lock, flags);
 	spin_lock(&qhp->lock);
-	if (qhp->rhp->db_state == NORMAL) {
-		t4_ring_rq_db(&qhp->wq, inc);
-	} else {
+	if (qhp->rhp->db_state == NORMAL)
+		t4_ring_rq_db(&qhp->wq, inc,
+			      is_t5(qhp->rhp->rdev.lldi.adapter_type), NULL);
+	else {
 		add_to_fc_list(&qhp->rhp->db_fc_list, &qhp->db_fc_entry);
 		qhp->wq.rq.wq_pidx_inc += inc;
 	}
@@ -674,7 +691,7 @@ int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 	enum fw_wr_opcodes fw_opcode = 0;
 	enum fw_ri_wr_flags fw_flags;
 	struct c4iw_qp *qhp;
-	union t4_wr *wqe;
+	union t4_wr *wqe = NULL;
 	u32 num_wrs;
 	struct t4_swsqe *swsqe;
 	unsigned long flag;
@@ -779,7 +796,8 @@ int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		idx += DIV_ROUND_UP(len16*16, T4_EQ_ENTRY_SIZE);
 	}
 	if (!qhp->rhp->rdev.status_page->db_off) {
-		t4_ring_sq_db(&qhp->wq, idx);
+		t4_ring_sq_db(&qhp->wq, idx,
+			      is_t5(qhp->rhp->rdev.lldi.adapter_type), wqe);
 		spin_unlock_irqrestore(&qhp->lock, flag);
 	} else {
 		spin_unlock_irqrestore(&qhp->lock, flag);
@@ -793,7 +811,7 @@ int c4iw_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr,
 {
 	int err = 0;
 	struct c4iw_qp *qhp;
-	union t4_recv_wr *wqe;
+	union t4_recv_wr *wqe = NULL;
 	u32 num_wrs;
 	u8 len16 = 0;
 	unsigned long flag;
@@ -845,7 +863,8 @@ int c4iw_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr,
 		num_wrs--;
 	}
 	if (!qhp->rhp->rdev.status_page->db_off) {
-		t4_ring_rq_db(&qhp->wq, idx);
+		t4_ring_rq_db(&qhp->wq, idx,
+			      is_t5(qhp->rhp->rdev.lldi.adapter_type), wqe);
 		spin_unlock_irqrestore(&qhp->lock, flag);
 	} else {
 		spin_unlock_irqrestore(&qhp->lock, flag);
@@ -1663,11 +1682,11 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs,
 		mm2->len = PAGE_ALIGN(qhp->wq.rq.memsize);
 		insert_mmap(ucontext, mm2);
 		mm3->key = uresp.sq_db_gts_key;
-		mm3->addr = qhp->wq.sq.udb;
+		mm3->addr = (__force u64)qhp->wq.sq.udb;
 		mm3->len = PAGE_SIZE;
 		insert_mmap(ucontext, mm3);
 		mm4->key = uresp.rq_db_gts_key;
-		mm4->addr = qhp->wq.rq.udb;
+		mm4->addr = (__force u64)qhp->wq.rq.udb;
 		mm4->len = PAGE_SIZE;
 		insert_mmap(ucontext, mm4);
 		if (mm5) {
diff --git a/drivers/infiniband/hw/cxgb4/t4.h b/drivers/infiniband/hw/cxgb4/t4.h
index eeca8b1..edab0e9 100644
--- a/drivers/infiniband/hw/cxgb4/t4.h
+++ b/drivers/infiniband/hw/cxgb4/t4.h
@@ -292,7 +292,7 @@ struct t4_sq {
 	unsigned long phys_addr;
 	struct t4_swsqe *sw_sq;
 	struct t4_swsqe *oldest_read;
-	u64 udb;
+	u64 __iomem *udb;
 	size_t memsize;
 	u32 qid;
 	u16 in_use;
@@ -314,7 +314,7 @@ struct t4_rq {
 	dma_addr_t dma_addr;
 	DEFINE_DMA_UNMAP_ADDR(mapping);
 	struct t4_swrqe *sw_rq;
-	u64 udb;
+	u64 __iomem *udb;
 	size_t memsize;
 	u32 qid;
 	u32 msn;
@@ -435,15 +435,69 @@ static inline u16 t4_sq_wq_size(struct t4_wq *wq)
 		return wq->sq.size * T4_SQ_NUM_SLOTS;
 }
 
-static inline void t4_ring_sq_db(struct t4_wq *wq, u16 inc)
+/* This function copies 64 byte coalesced work request to
+ * memory mapped BAR2 space. For coalesced WR SGE fetches
+ * data from the FIFO instead of from Host.
+ */
+static inline void pio_copy(u64 __iomem *dst, u64 *src)
+{
+	int count = 8;
+
+	while (count) {
+		writeq(*src, dst);
+		src++;
+		dst++;
+		count--;
+	}
+}
+extern int t5_en_wc;
+
+static inline void wc_wmb(void)
+{
+#if defined(CONFIG_X86_32) || defined(CONFIG_X86_64)
+	asm volatile("sfence" : : : "memory");
+#else
+	wmb()
+#endif
+}
+
+static inline void t4_ring_sq_db(struct t4_wq *wq, u16 inc, u8 t5,
+				 union t4_wr *wqe)
 {
 	wmb();
+	if (t5) {
+		if (t5_en_wc && inc == 1 && wqe) {
+			PDBG("%s: WC wq->sq.pidx = %d\n",
+			     __func__, wq->sq.pidx);
+			pio_copy(wq->sq.udb + 7, (void *)wqe);
+			wc_wmb();
+		} else {
+			PDBG("%s: DB wq->sq.pidx = %d\n",
+			     __func__, wq->sq.pidx);
+			writel(PIDX_T5(inc), wq->sq.udb);
+		}
+		return;
+	}
 	writel(QID(wq->sq.qid) | PIDX(inc), wq->db);
 }
 
-static inline void t4_ring_rq_db(struct t4_wq *wq, u16 inc)
+static inline void t4_ring_rq_db(struct t4_wq *wq, u16 inc, u8 t5,
+				 union t4_recv_wr *wqe)
 {
 	wmb();
+	if (t5) {
+		if (t5_en_wc && inc == 1 && wqe) {
+			PDBG("%s: WC wq->rq.pidx = %d\n",
+			     __func__, wq->rq.pidx);
+			pio_copy(wq->rq.udb + 7, (void *)wqe);
+			wc_wmb();
+		} else {
+			PDBG("%s: DB wq->rq.pidx = %d\n",
+			     __func__, wq->rq.pidx);
+			writel(PIDX_T5(inc), wq->rq.udb);
+		}
+		return;
+	}
 	writel(QID(wq->rq.qid) | PIDX(inc), wq->db);
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 12/31] iw_cxgb4: Fix incorrect BUG_ON conditions.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (10 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 11/31] iw_cxgb4: use the BAR2/WC path for kernel QPs and T5 devices Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 13/31] iw_cxgb4: Mind the sq_sig_all/sq_sig_type QP attributes Hariprasad Shenai
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

Based on original work from Jay Hernandez <jay@chelsio.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cq.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c
index c0673ac..59f7601 100644
--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -603,7 +603,7 @@ proc_cqe:
 	 */
 	if (SQ_TYPE(hw_cqe)) {
 		int idx = CQE_WRID_SQ_IDX(hw_cqe);
-		BUG_ON(idx > wq->sq.size);
+		BUG_ON(idx >= wq->sq.size);
 
 		/*
 		* Account for any unsignaled completions completed by
@@ -617,7 +617,7 @@ proc_cqe:
 			wq->sq.in_use -= wq->sq.size + idx - wq->sq.cidx;
 		else
 			wq->sq.in_use -= idx - wq->sq.cidx;
-		BUG_ON(wq->sq.in_use < 0 && wq->sq.in_use < wq->sq.size);
+		BUG_ON(wq->sq.in_use <= 0 && wq->sq.in_use >= wq->sq.size);
 
 		wq->sq.cidx = (uint16_t)idx;
 		PDBG("%s completing sq idx %u\n", __func__, wq->sq.cidx);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 13/31] iw_cxgb4: Mind the sq_sig_all/sq_sig_type QP attributes.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (11 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 12/31] iw_cxgb4: Fix incorrect BUG_ON conditions Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 14/31] iw_cxgb4: default peer2peer mode to 1 Hariprasad Shenai
                   ` (17 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |    1 +
 drivers/infiniband/hw/cxgb4/qp.c       |    6 ++++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index c05c875..8c32088 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -448,6 +448,7 @@ struct c4iw_qp {
 	atomic_t refcnt;
 	wait_queue_head_t wait;
 	struct timer_list timer;
+	int sq_sig_all;
 };
 
 static inline struct c4iw_qp *to_c4iw_qp(struct ib_qp *ibqp)
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index 92c10d4..3b44f39 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -720,7 +720,7 @@ int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		fw_flags = 0;
 		if (wr->send_flags & IB_SEND_SOLICITED)
 			fw_flags |= FW_RI_SOLICITED_EVENT_FLAG;
-		if (wr->send_flags & IB_SEND_SIGNALED)
+		if (wr->send_flags & IB_SEND_SIGNALED || qhp->sq_sig_all)
 			fw_flags |= FW_RI_COMPLETION_FLAG;
 		swsqe = &qhp->wq.sq.sw_sq[qhp->wq.sq.pidx];
 		switch (wr->opcode) {
@@ -781,7 +781,8 @@ int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
 		}
 		swsqe->idx = qhp->wq.sq.pidx;
 		swsqe->complete = 0;
-		swsqe->signaled = (wr->send_flags & IB_SEND_SIGNALED);
+		swsqe->signaled = (wr->send_flags & IB_SEND_SIGNALED) ||
+				  qhp->sq_sig_all;
 		swsqe->flushed = 0;
 		swsqe->wr_id = wr->wr_id;
 
@@ -1608,6 +1609,7 @@ struct ib_qp *c4iw_create_qp(struct ib_pd *pd, struct ib_qp_init_attr *attrs,
 	qhp->attr.enable_bind = 1;
 	qhp->attr.max_ord = 1;
 	qhp->attr.max_ird = 1;
+	qhp->sq_sig_all = attrs->sq_sig_type == IB_SIGNAL_ALL_WR;
 	spin_lock_init(&qhp->lock);
 	mutex_init(&qhp->mutex);
 	init_waitqueue_head(&qhp->wait);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 14/31] iw_cxgb4: default peer2peer mode to 1.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (12 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 13/31] iw_cxgb4: Mind the sq_sig_all/sq_sig_type QP attributes Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 15/31] iw_cxgb4: save the correct map length for fast_reg_page_lists Hariprasad Shenai
                   ` (16 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 9387f74..d21ac15 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -98,9 +98,9 @@ int c4iw_debug;
 module_param(c4iw_debug, int, 0644);
 MODULE_PARM_DESC(c4iw_debug, "Enable debug logging (default=0)");
 
-static int peer2peer;
+static int peer2peer = 1;
 module_param(peer2peer, int, 0644);
-MODULE_PARM_DESC(peer2peer, "Support peer2peer ULPs (default=0)");
+MODULE_PARM_DESC(peer2peer, "Support peer2peer ULPs (default=1)");
 
 static int p2p_type = FW_RI_INIT_P2PTYPE_READ_REQ;
 module_param(p2p_type, int, 0644);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 15/31] iw_cxgb4: save the correct map length for fast_reg_page_lists.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (13 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 14/31] iw_cxgb4: default peer2peer mode to 1 Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 16/31] iw_cxgb4: don't leak skb in c4iw_uld_rx_handler() Hariprasad Shenai
                   ` (15 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

We cannot save the mapped length using the rdma max_page_list_len field
of the ib_fast_reg_page_list struct because the core code uses it.  This
results in an incorrect unmap of the page list in c4iw_free_fastreg_pbl().

I found this with dma map debugging enabled in the kernel.  The fix is
to save the length in the c4iw_fr_page_list struct.
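
A minimal sketch of the underlying rule (illustrative only, not part of the
patch; the struct and helper names are made up): dma_free_coherent() must be
passed the same size that was given to dma_alloc_coherent(), so the driver
has to remember that size in a field the core never touches.

#include <linux/device.h>
#include <linux/dma-mapping.h>

struct fr_pbl {				/* hypothetical container */
	void *page_list;
	dma_addr_t dma_addr;
	size_t pll_len;			/* size given to dma_alloc_coherent() */
};

static void *fr_pbl_alloc(struct device *dev, struct fr_pbl *pbl, size_t len)
{
	pbl->pll_len = len;
	pbl->page_list = dma_alloc_coherent(dev, len, &pbl->dma_addr,
					    GFP_KERNEL);
	return pbl->page_list;
}

static void fr_pbl_free(struct device *dev, struct fr_pbl *pbl)
{
	/* must match the original allocation length exactly */
	dma_free_coherent(dev, pbl->pll_len, pbl->page_list, pbl->dma_addr);
}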

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h |    1 +
 drivers/infiniband/hw/cxgb4/mem.c      |   12 ++++++++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 8c32088..b75f8f5 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -375,6 +375,7 @@ struct c4iw_fr_page_list {
 	DEFINE_DMA_UNMAP_ADDR(mapping);
 	dma_addr_t dma_addr;
 	struct c4iw_dev *dev;
+	int pll_len;
 };
 
 static inline struct c4iw_fr_page_list *to_c4iw_fr_page_list(
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index 41b1195..cdaf257 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -903,7 +903,11 @@ struct ib_fast_reg_page_list *c4iw_alloc_fastreg_pbl(struct ib_device *device,
 	dma_unmap_addr_set(c4pl, mapping, dma_addr);
 	c4pl->dma_addr = dma_addr;
 	c4pl->dev = dev;
-	c4pl->ibpl.max_page_list_len = pll_len;
+	c4pl->pll_len = pll_len;
+
+	PDBG("%s c4pl %p pll_len %u page_list %p dma_addr %p\n",
+	     __func__, c4pl, c4pl->pll_len, c4pl->ibpl.page_list,
+	     (void *)c4pl->dma_addr);
 
 	return &c4pl->ibpl;
 }
@@ -912,8 +916,12 @@ void c4iw_free_fastreg_pbl(struct ib_fast_reg_page_list *ibpl)
 {
 	struct c4iw_fr_page_list *c4pl = to_c4iw_fr_page_list(ibpl);
 
+	PDBG("%s c4pl %p pll_len %u page_list %p dma_addr %p\n",
+	     __func__, c4pl, c4pl->pll_len, c4pl->ibpl.page_list,
+	     (void *)c4pl->dma_addr);
+
 	dma_free_coherent(&c4pl->dev->rdev.lldi.pdev->dev,
-			  c4pl->ibpl.max_page_list_len,
+			  c4pl->pll_len,
 			  c4pl->ibpl.page_list, dma_unmap_addr(c4pl, mapping));
 	kfree(c4pl);
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 16/31] iw_cxgb4: don't leak skb in c4iw_uld_rx_handler().
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (14 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 15/31] iw_cxgb4: save the correct map length for fast_reg_page_lists Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 17/31] iw_cxgb4: fix possible memory leak in RX_PKT processing Hariprasad Shenai
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/device.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/device.c b/drivers/infiniband/hw/cxgb4/device.c
index 38a1b38..b3bf057 100644
--- a/drivers/infiniband/hw/cxgb4/device.c
+++ b/drivers/infiniband/hw/cxgb4/device.c
@@ -926,9 +926,11 @@ static int c4iw_uld_rx_handler(void *handle, const __be64 *rsp,
 	opcode = *(u8 *)rsp;
 	if (c4iw_handlers[opcode])
 		c4iw_handlers[opcode](dev, skb);
-	else
+	else {
 		pr_info("%s no handler opcode 0x%x...\n", __func__,
 		       opcode);
+		kfree_skb(skb);
+	}
 
 	return 0;
 nomem:
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 17/31] iw_cxgb4: fix possible memory leak in RX_PKT processing.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (15 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 16/31] iw_cxgb4: don't leak skb in c4iw_uld_rx_handler() Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 18/31] iw_cxgb4: ignore read response type 1 CQEs Hariprasad Shenai
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

If cxgb4_ofld_send() returns < 0, then send_fw_pass_open_req() must
free the request skb and the saved skb containing the TCP header.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index d21ac15..c5c42d0 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -3205,6 +3205,7 @@ static void send_fw_pass_open_req(struct c4iw_dev *dev, struct sk_buff *skb,
 	struct sk_buff *req_skb;
 	struct fw_ofld_connection_wr *req;
 	struct cpl_pass_accept_req *cpl = cplhdr(skb);
+	int ret;
 
 	req_skb = alloc_skb(sizeof(struct fw_ofld_connection_wr), GFP_KERNEL);
 	req = (struct fw_ofld_connection_wr *)__skb_put(req_skb, sizeof(*req));
@@ -3241,7 +3242,13 @@ static void send_fw_pass_open_req(struct c4iw_dev *dev, struct sk_buff *skb,
 	req->cookie = (unsigned long)skb;
 
 	set_wr_txq(req_skb, CPL_PRIORITY_CONTROL, port_id);
-	cxgb4_ofld_send(dev->rdev.lldi.ports[0], req_skb);
+	ret = cxgb4_ofld_send(dev->rdev.lldi.ports[0], req_skb);
+	if (ret < 0) {
+		pr_err("%s - cxgb4_ofld_send error %d - dropping\n", __func__,
+		       ret);
+		kfree_skb(skb);
+		kfree_skb(req_skb);
+	}
 }
 
 /*
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 18/31] iw_cxgb4: ignore read response type 1 CQEs.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (16 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 17/31] iw_cxgb4: fix possible memory leak in RX_PKT processing Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 19/31] iw_cxgb4: connect_request_upcall fixes Hariprasad Shenai
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

These are generated by HW in some error cases and need to be
silently discarded.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cq.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c
index 59f7601..e310762 100644
--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -366,6 +366,14 @@ void c4iw_flush_hw_cq(struct c4iw_cq *chp)
 		if (CQE_OPCODE(hw_cqe) == FW_RI_READ_RESP) {
 
 			/*
+			 * If we have reached here because of async
+			 * event or other error, and have egress error
+			 * then drop
+			 */
+			if (CQE_TYPE(hw_cqe) == 1)
+				goto next_cqe;
+
+			/*
 			 * drop peer2peer RTR reads.
 			 */
 			if (CQE_WRID_STAG(hw_cqe) == 1)
@@ -512,6 +520,18 @@ static int poll_cq(struct t4_wq *wq, struct t4_cq *cq, struct t4_cqe *cqe,
 	if (RQ_TYPE(hw_cqe) && (CQE_OPCODE(hw_cqe) == FW_RI_READ_RESP)) {
 
 		/*
+		 * If we have reached here because of async
+		 * event or other error, and have egress error
+		 * then drop
+		 */
+		if (CQE_TYPE(hw_cqe) == 1) {
+			if (CQE_STATUS(hw_cqe))
+				t4_set_wq_in_error(wq);
+			ret = -EAGAIN;
+			goto skip_cqe;
+		}
+
+		/*
 		 * If this is an unsolicited read response, then the read
 		 * was generated by the kernel driver as part of peer-2-peer
 		 * connection setup.  So ignore the completion.
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 19/31] iw_cxgb4: connect_request_upcall fixes.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (17 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 18/31] iw_cxgb4: ignore read response type 1 CQEs Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:06 ` [PATCH net-next 20/31] iw_cxgb4: adjust tcp snd/rcv window based on link speed Hariprasad Shenai
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

When processing an MPA Start Request, if the listening
endpoint is DEAD, then abort the connection.

If the IWCM returns an error, then we must abort the connection and
release resources.  Also abort_connection() should not post a CLOSE
event, so clean that up too.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c |   40 ++++++++++++++++++++++---------------
 1 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index c5c42d0..452ae3a 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -968,13 +968,14 @@ static int act_establish(struct c4iw_dev *dev, struct sk_buff *skb)
 	return 0;
 }
 
-static void close_complete_upcall(struct c4iw_ep *ep)
+static void close_complete_upcall(struct c4iw_ep *ep, int status)
 {
 	struct iw_cm_event event;
 
 	PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
 	memset(&event, 0, sizeof(event));
 	event.event = IW_CM_EVENT_CLOSE;
+	event.status = status;
 	if (ep->com.cm_id) {
 		PDBG("close complete delivered ep %p cm_id %p tid %u\n",
 		     ep, ep->com.cm_id, ep->hwtid);
@@ -988,7 +989,6 @@ static void close_complete_upcall(struct c4iw_ep *ep)
 static int abort_connection(struct c4iw_ep *ep, struct sk_buff *skb, gfp_t gfp)
 {
 	PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
-	close_complete_upcall(ep);
 	state_set(&ep->com, ABORTING);
 	set_bit(ABORT_CONN, &ep->com.history);
 	return send_abort(ep, skb, gfp);
@@ -1067,9 +1067,10 @@ static void connect_reply_upcall(struct c4iw_ep *ep, int status)
 	}
 }
 
-static void connect_request_upcall(struct c4iw_ep *ep)
+static int connect_request_upcall(struct c4iw_ep *ep)
 {
 	struct iw_cm_event event;
+	int ret;
 
 	PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
 	memset(&event, 0, sizeof(event));
@@ -1094,15 +1095,14 @@ static void connect_request_upcall(struct c4iw_ep *ep)
 		event.private_data_len = ep->plen;
 		event.private_data = ep->mpa_pkt + sizeof(struct mpa_message);
 	}
-	if (state_read(&ep->parent_ep->com) != DEAD) {
-		c4iw_get_ep(&ep->com);
-		ep->parent_ep->com.cm_id->event_handler(
-						ep->parent_ep->com.cm_id,
-						&event);
-	}
+	c4iw_get_ep(&ep->com);
+	ret = ep->parent_ep->com.cm_id->event_handler(ep->parent_ep->com.cm_id,
+						      &event);
+	if (ret)
+		c4iw_put_ep(&ep->com);
 	set_bit(CONNREQ_UPCALL, &ep->com.history);
 	c4iw_put_ep(&ep->parent_ep->com);
-	ep->parent_ep = NULL;
+	return ret;
 }
 
 static void established_upcall(struct c4iw_ep *ep)
@@ -1401,7 +1401,6 @@ static void process_mpa_request(struct c4iw_ep *ep, struct sk_buff *skb)
 		return;
 
 	PDBG("%s enter (%s line %u)\n", __func__, __FILE__, __LINE__);
-	stop_ep_timer(ep);
 	mpa = (struct mpa_message *) ep->mpa_pkt;
 
 	/*
@@ -1494,9 +1493,17 @@ static void process_mpa_request(struct c4iw_ep *ep, struct sk_buff *skb)
 	     ep->mpa_attr.p2p_type);
 
 	state_set(&ep->com, MPA_REQ_RCVD);
+	stop_ep_timer(ep);
 
 	/* drive upcall */
-	connect_request_upcall(ep);
+	mutex_lock(&ep->parent_ep->com.mutex);
+	if (ep->parent_ep->com.state != DEAD) {
+		if (connect_request_upcall(ep))
+			abort_connection(ep, skb, GFP_KERNEL);
+	} else {
+		abort_connection(ep, skb, GFP_KERNEL);
+	}
+	mutex_unlock(&ep->parent_ep->com.mutex);
 	return;
 }
 
@@ -2257,7 +2264,7 @@ static int peer_close(struct c4iw_dev *dev, struct sk_buff *skb)
 			c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp,
 				       C4IW_QP_ATTR_NEXT_STATE, &attrs, 1);
 		}
-		close_complete_upcall(ep);
+		close_complete_upcall(ep, 0);
 		__state_set(&ep->com, DEAD);
 		release = 1;
 		disconnect = 0;
@@ -2427,7 +2434,7 @@ static int close_con_rpl(struct c4iw_dev *dev, struct sk_buff *skb)
 					     C4IW_QP_ATTR_NEXT_STATE,
 					     &attrs, 1);
 		}
-		close_complete_upcall(ep);
+		close_complete_upcall(ep, 0);
 		__state_set(&ep->com, DEAD);
 		release = 1;
 		break;
@@ -2982,7 +2989,7 @@ int c4iw_ep_disconnect(struct c4iw_ep *ep, int abrupt, gfp_t gfp)
 	rdev = &ep->com.dev->rdev;
 	if (c4iw_fatal_error(rdev)) {
 		fatal = 1;
-		close_complete_upcall(ep);
+		close_complete_upcall(ep, -EIO);
 		ep->com.state = DEAD;
 	}
 	switch (ep->com.state) {
@@ -3024,7 +3031,7 @@ int c4iw_ep_disconnect(struct c4iw_ep *ep, int abrupt, gfp_t gfp)
 	if (close) {
 		if (abrupt) {
 			set_bit(EP_DISC_ABORT, &ep->com.history);
-			close_complete_upcall(ep);
+			close_complete_upcall(ep, -ECONNRESET);
 			ret = send_abort(ep, NULL, gfp);
 		} else {
 			set_bit(EP_DISC_CLOSE, &ep->com.history);
@@ -3437,6 +3444,7 @@ static void process_timeout(struct c4iw_ep *ep)
 				     &attrs, 1);
 		}
 		__state_set(&ep->com, ABORTING);
+		close_complete_upcall(ep, -ETIMEDOUT);
 		break;
 	default:
 		WARN(1, "%s unexpected state ep %p tid %u state %u\n",
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 20/31] iw_cxgb4: adjust tcp snd/rcv window based on link speed.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (18 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 19/31] iw_cxgb4: connect_request_upcall fixes Hariprasad Shenai
@ 2014-02-26 15:06 ` Hariprasad Shenai
  2014-02-26 15:07 ` [PATCH net-next 21/31] iw_cxgb4: update snd_seq when sending MPA messages Hariprasad Shenai
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:06 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

40G devices need bigger windows, so default 40G devices to a 512K send
window and a 1024K receive window.

Fixed a bug that shows up with receive window sizes that exceed what fits
in the RCV_BUFSIZ field in opt0 (>= 1024K).  If the receive window exceeds
this, we specify the maximum possible in opt0 and add the rest in via
RX_DATA_ACK credits.

Added a module option named adjust_win, defaulting to 1, that allows
disabling the 40G window bump.  This lets a user specify the exact
default window sizes via the snd_win and rcv_win module options.
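
To make the arithmetic concrete, here is a small standalone sketch of the
overflow handling described above (it mirrors the win/credits logic in the
patch; the 1024KB figure is the 40G receive-window default):

#include <stdio.h>

#define RCV_BUFSIZ_MASK 0x3FFU			/* 10-bit opt0 field, 1KB units */

int main(void)
{
	unsigned int rcv_win = 4 * 256 * 1024;	/* 1024KB: 40G default */
	unsigned int win = rcv_win >> 10;	/* window in 1KB units */
	unsigned int overage = 0;

	if (win > RCV_BUFSIZ_MASK)
		win = RCV_BUFSIZ_MASK;		/* opt0 carries at most 1023KB */
	if (rcv_win > RCV_BUFSIZ_MASK * 1024)
		overage = rcv_win - RCV_BUFSIZ_MASK * 1024;

	/* prints: opt0 RCV_BUFSIZ = 1023 KB, first RX_DATA_ACK adds 1024 bytes */
	printf("opt0 RCV_BUFSIZ = %u KB, first RX_DATA_ACK adds %u bytes\n",
	       win, overage);
	return 0;
}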

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c            |   63 +++++++++++++++++++++++++--
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h      |    2 +
 drivers/net/ethernet/chelsio/cxgb4/t4_msg.h |    1 +
 3 files changed, 62 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 452ae3a..81fbc6e 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -134,6 +134,11 @@ static int snd_win = 128 * 1024;
 module_param(snd_win, int, 0644);
 MODULE_PARM_DESC(snd_win, "TCP send window in bytes (default=128KB)");
 
+static int adjust_win = 1;
+module_param(adjust_win, int, 0644);
+MODULE_PARM_DESC(adjust_win,
+		 "Adjust TCP window based on link speed (default=1)");
+
 static struct workqueue_struct *workq;
 
 static struct sk_buff_head rxq;
@@ -465,7 +470,7 @@ static void send_flowc(struct c4iw_ep *ep, struct sk_buff *skb)
 	flowc->mnemval[5].mnemonic = FW_FLOWC_MNEM_RCVNXT;
 	flowc->mnemval[5].val = cpu_to_be32(ep->rcv_seq);
 	flowc->mnemval[6].mnemonic = FW_FLOWC_MNEM_SNDBUF;
-	flowc->mnemval[6].val = cpu_to_be32(snd_win);
+	flowc->mnemval[6].val = cpu_to_be32(ep->snd_win);
 	flowc->mnemval[7].mnemonic = FW_FLOWC_MNEM_MSS;
 	flowc->mnemval[7].val = cpu_to_be32(ep->emss);
 	/* Pad WR to 16 byte boundary */
@@ -547,6 +552,7 @@ static int send_connect(struct c4iw_ep *ep)
 	struct sockaddr_in *ra = (struct sockaddr_in *)&ep->com.remote_addr;
 	struct sockaddr_in6 *la6 = (struct sockaddr_in6 *)&ep->com.local_addr;
 	struct sockaddr_in6 *ra6 = (struct sockaddr_in6 *)&ep->com.remote_addr;
+	int win;
 
 	wrlen = (ep->com.remote_addr.ss_family == AF_INET) ?
 			roundup(sizev4, 16) :
@@ -564,6 +570,15 @@ static int send_connect(struct c4iw_ep *ep)
 
 	cxgb4_best_mtu(ep->com.dev->rdev.lldi.mtus, ep->mtu, &mtu_idx);
 	wscale = compute_wscale(rcv_win);
+
+	/*
+	 * Specify the largest window that will fit in opt0. The
+	 * remainder will be specified in the rx_data_ack.
+	 */
+	win = ep->rcv_win >> 10;
+	if (win > RCV_BUFSIZ_MASK)
+		win = RCV_BUFSIZ_MASK;
+
 	opt0 = (nocong ? NO_CONG(1) : 0) |
 	       KEEP_ALIVE(1) |
 	       DELACK(1) |
@@ -574,7 +589,7 @@ static int send_connect(struct c4iw_ep *ep)
 	       SMAC_SEL(ep->smac_idx) |
 	       DSCP(ep->tos) |
 	       ULP_MODE(ULP_MODE_TCPDDP) |
-	       RCV_BUFSIZ(rcv_win>>10);
+	       RCV_BUFSIZ(win);
 	opt2 = RX_CHANNEL(0) |
 	       CCTRL_ECN(enable_ecn) |
 	       RSS_QUEUE_VALID | RSS_QUEUE(ep->rss_qid);
@@ -1134,6 +1149,14 @@ static int update_rx_credits(struct c4iw_ep *ep, u32 credits)
 		return 0;
 	}
 
+	/*
+	 * If we couldn't specify the entire rcv window at connection setup
+	 * due to the limit in the number of bits in the RCV_BUFSIZ field,
+	 * then add the overage in to the credits returned.
+	 */
+	if (ep->rcv_win > RCV_BUFSIZ_MASK * 1024)
+		credits += ep->rcv_win - RCV_BUFSIZ_MASK * 1024;
+
 	req = (struct cpl_rx_data_ack *) skb_put(skb, wrlen);
 	memset(req, 0, wrlen);
 	INIT_TP_WR(req, ep->hwtid);
@@ -1592,6 +1615,7 @@ static void send_fw_act_open_req(struct c4iw_ep *ep, unsigned int atid)
 	unsigned int mtu_idx;
 	int wscale;
 	struct sockaddr_in *sin;
+	int win;
 
 	skb = get_skb(NULL, sizeof(*req), GFP_KERNEL);
 	req = (struct fw_ofld_connection_wr *)__skb_put(skb, sizeof(*req));
@@ -1616,6 +1640,15 @@ static void send_fw_act_open_req(struct c4iw_ep *ep, unsigned int atid)
 	req->tcb.rcv_adv = htons(1);
 	cxgb4_best_mtu(ep->com.dev->rdev.lldi.mtus, ep->mtu, &mtu_idx);
 	wscale = compute_wscale(rcv_win);
+
+	/*
+	 * Specify the largest window that will fit in opt0. The
+	 * remainder will be specified in the rx_data_ack.
+	 */
+	win = ep->rcv_win >> 10;
+	if (win > RCV_BUFSIZ_MASK)
+		win = RCV_BUFSIZ_MASK;
+
 	req->tcb.opt0 = (__force __be64) (TCAM_BYPASS(1) |
 		(nocong ? NO_CONG(1) : 0) |
 		KEEP_ALIVE(1) |
@@ -1627,7 +1660,7 @@ static void send_fw_act_open_req(struct c4iw_ep *ep, unsigned int atid)
 		SMAC_SEL(ep->smac_idx) |
 		DSCP(ep->tos) |
 		ULP_MODE(ULP_MODE_TCPDDP) |
-		RCV_BUFSIZ(rcv_win >> 10));
+		RCV_BUFSIZ(win));
 	req->tcb.opt2 = (__force __be32) (PACE(1) |
 		TX_QUEUE(ep->com.dev->rdev.lldi.tx_modq[ep->tx_chan]) |
 		RX_CHANNEL(0) |
@@ -1665,6 +1698,17 @@ static int is_neg_adv(unsigned int status)
 	       status == CPL_ERR_KEEPALV_NEG_ADVICE;
 }
 
+static void set_tcp_window(struct c4iw_ep *ep, struct port_info *pi)
+{
+	ep->snd_win = snd_win;
+	ep->rcv_win = rcv_win;
+	if (adjust_win && pi->link_cfg.speed == 40000) {
+		ep->snd_win *= 4;
+		ep->rcv_win *= 4;
+	}
+	PDBG("%s snd_win %d rcv_win %d\n", __func__, ep->snd_win, ep->rcv_win);
+}
+
 #define ACT_OPEN_RETRY_COUNT 2
 
 static int import_ep(struct c4iw_ep *ep, int iptype, __u8 *peer_ip,
@@ -1713,6 +1757,7 @@ static int import_ep(struct c4iw_ep *ep, int iptype, __u8 *peer_ip,
 		ep->ctrlq_idx = cxgb4_port_idx(pdev);
 		ep->rss_qid = cdev->rdev.lldi.rxq_ids[
 			cxgb4_port_idx(pdev) * step];
+		set_tcp_window(ep, (struct port_info *)netdev_priv(pdev));
 		dev_put(pdev);
 	} else {
 		pdev = get_real_dev(n->dev);
@@ -1731,6 +1776,7 @@ static int import_ep(struct c4iw_ep *ep, int iptype, __u8 *peer_ip,
 			cdev->rdev.lldi.nchan;
 		ep->rss_qid = cdev->rdev.lldi.rxq_ids[
 			cxgb4_port_idx(n->dev) * step];
+		set_tcp_window(ep, (struct port_info *)netdev_priv(pdev));
 
 		if (clear_mpa_v1) {
 			ep->retry_with_mpa_v1 = 0;
@@ -1961,6 +2007,7 @@ static void accept_cr(struct c4iw_ep *ep, struct sk_buff *skb,
 	u64 opt0;
 	u32 opt2;
 	int wscale;
+	int win;
 
 	PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
 	BUG_ON(skb_cloned(skb));
@@ -1968,6 +2015,14 @@ static void accept_cr(struct c4iw_ep *ep, struct sk_buff *skb,
 	skb_get(skb);
 	cxgb4_best_mtu(ep->com.dev->rdev.lldi.mtus, ep->mtu, &mtu_idx);
 	wscale = compute_wscale(rcv_win);
+
+	/*
+	 * Specify the largest window that will fit in opt0. The
+	 * remainder will be specified in the rx_data_ack.
+	 */
+	win = ep->rcv_win >> 10;
+	if (win > RCV_BUFSIZ_MASK)
+		win = RCV_BUFSIZ_MASK;
 	opt0 = (nocong ? NO_CONG(1) : 0) |
 	       KEEP_ALIVE(1) |
 	       DELACK(1) |
@@ -1978,7 +2033,7 @@ static void accept_cr(struct c4iw_ep *ep, struct sk_buff *skb,
 	       SMAC_SEL(ep->smac_idx) |
 	       DSCP(ep->tos >> 2) |
 	       ULP_MODE(ULP_MODE_TCPDDP) |
-	       RCV_BUFSIZ(rcv_win>>10);
+	       RCV_BUFSIZ(win);
 	opt2 = RX_CHANNEL(0) |
 	       RSS_QUEUE_VALID | RSS_QUEUE(ep->rss_qid);
 
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index b75f8f5..3b6cea0 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -804,6 +804,8 @@ struct c4iw_ep {
 	u8 retry_with_mpa_v1;
 	u8 tried_with_mpa_v1;
 	unsigned int retry_count;
+	int snd_win;
+	int rcv_win;
 };
 
 static inline struct c4iw_ep *to_ep(struct iw_cm_id *cm_id)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h b/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
index f2738c7..330bc14 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_msg.h
@@ -227,6 +227,7 @@ struct cpl_pass_open_req {
 #define DELACK(x)     ((x) << 5)
 #define ULP_MODE(x)   ((x) << 8)
 #define RCV_BUFSIZ(x) ((x) << 12)
+#define RCV_BUFSIZ_MASK 0x3FFU
 #define DSCP(x)       ((x) << 22)
 #define SMAC_SEL(x)   ((u64)(x) << 28)
 #define L2T_IDX(x)    ((u64)(x) << 36)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 21/31] iw_cxgb4: update snd_seq when sending MPA messages.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (19 preceding siblings ...)
  2014-02-26 15:06 ` [PATCH net-next 20/31] iw_cxgb4: adjust tcp snd/rcv window based on link speed Hariprasad Shenai
@ 2014-02-26 15:07 ` Hariprasad Shenai
  2014-02-26 15:07 ` [PATCH net-next 22/31] iw_cxgb4: lock around accept/reject downcalls Hariprasad Shenai
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:07 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 81fbc6e..11c99a6 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -777,6 +777,7 @@ static void send_mpa_req(struct c4iw_ep *ep, struct sk_buff *skb,
 	start_ep_timer(ep);
 	state_set(&ep->com, MPA_REQ_SENT);
 	ep->mpa_attr.initiator = 1;
+	ep->snd_seq += mpalen;
 	return;
 }
 
@@ -856,6 +857,7 @@ static int send_mpa_reject(struct c4iw_ep *ep, const void *pdata, u8 plen)
 	t4_set_arp_err_handler(skb, NULL, arp_failure_discard);
 	BUG_ON(ep->mpa_skb);
 	ep->mpa_skb = skb;
+	ep->snd_seq += mpalen;
 	return c4iw_l2t_send(&ep->com.dev->rdev, skb, ep->l2t);
 }
 
@@ -940,6 +942,7 @@ static int send_mpa_reply(struct c4iw_ep *ep, const void *pdata, u8 plen)
 	t4_set_arp_err_handler(skb, NULL, arp_failure_discard);
 	ep->mpa_skb = skb;
 	state_set(&ep->com, MPA_REP_SENT);
+	ep->snd_seq += mpalen;
 	return c4iw_l2t_send(&ep->com.dev->rdev, skb, ep->l2t);
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 22/31] iw_cxgb4: lock around accept/reject downcalls.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (20 preceding siblings ...)
  2014-02-26 15:07 ` [PATCH net-next 21/31] iw_cxgb4: update snd_seq when sending MPA messages Hariprasad Shenai
@ 2014-02-26 15:07 ` Hariprasad Shenai
  2014-02-26 15:07 ` [PATCH net-next 23/31] iw_cxgb4: drop RX_DATA packets if the endpoint is gone Hariprasad Shenai
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:07 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

There is a race between ULP threads doing an accept/reject, and the
ingress processing thread handling close/abort for the same connection.
The accept/reject path needs to hold the lock to serialize these paths.
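
In rough sketch form (simplified from the actual change below; history
bits, error handling and the post-unlock disconnect are omitted), the
reject path becomes a check-then-act sequence under the same mutex the
ingress close/abort processing takes, so the endpoint cannot go DEAD
between the state check and the reply:

	mutex_lock(&ep->com.mutex);	/* also held by close/abort processing */
	if (ep->com.state == DEAD)
		err = -ECONNRESET;	/* connection already torn down */
	else
		err = send_mpa_reject(ep, pdata, pdata_len);
	mutex_unlock(&ep->com.mutex);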

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c |   31 +++++++++++++++++++++----------
 1 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 11c99a6..cbeaa58 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -775,7 +775,7 @@ static void send_mpa_req(struct c4iw_ep *ep, struct sk_buff *skb,
 	ep->mpa_skb = skb;
 	c4iw_l2t_send(&ep->com.dev->rdev, skb, ep->l2t);
 	start_ep_timer(ep);
-	state_set(&ep->com, MPA_REQ_SENT);
+	__state_set(&ep->com, MPA_REQ_SENT);
 	ep->mpa_attr.initiator = 1;
 	ep->snd_seq += mpalen;
 	return;
@@ -941,7 +941,7 @@ static int send_mpa_reply(struct c4iw_ep *ep, const void *pdata, u8 plen)
 	skb_get(skb);
 	t4_set_arp_err_handler(skb, NULL, arp_failure_discard);
 	ep->mpa_skb = skb;
-	state_set(&ep->com, MPA_REP_SENT);
+	__state_set(&ep->com, MPA_REP_SENT);
 	ep->snd_seq += mpalen;
 	return c4iw_l2t_send(&ep->com.dev->rdev, skb, ep->l2t);
 }
@@ -959,6 +959,7 @@ static int act_establish(struct c4iw_dev *dev, struct sk_buff *skb)
 	PDBG("%s ep %p tid %u snd_isn %u rcv_isn %u\n", __func__, ep, tid,
 	     be32_to_cpu(req->snd_isn), be32_to_cpu(req->rcv_isn));
 
+	mutex_lock(&ep->com.mutex);
 	dst_confirm(ep->dst);
 
 	/* setup the hwtid for this connection */
@@ -982,7 +983,7 @@ static int act_establish(struct c4iw_dev *dev, struct sk_buff *skb)
 		send_mpa_req(ep, skb, 1);
 	else
 		send_mpa_req(ep, skb, mpa_rev);
-
+	mutex_unlock(&ep->com.mutex);
 	return 0;
 }
 
@@ -2567,22 +2568,28 @@ static int fw4_ack(struct c4iw_dev *dev, struct sk_buff *skb)
 
 int c4iw_reject_cr(struct iw_cm_id *cm_id, const void *pdata, u8 pdata_len)
 {
-	int err;
+	int err = 0;
+	int disconnect = 0;
 	struct c4iw_ep *ep = to_ep(cm_id);
 	PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
 
-	if (state_read(&ep->com) == DEAD) {
+
+	mutex_lock(&ep->com.mutex);
+	if (ep->com.state == DEAD) {
 		c4iw_put_ep(&ep->com);
 		return -ECONNRESET;
 	}
 	set_bit(ULP_REJECT, &ep->com.history);
-	BUG_ON(state_read(&ep->com) != MPA_REQ_RCVD);
+	BUG_ON(ep->com.state != MPA_REQ_RCVD);
 	if (mpa_rev == 0)
 		abort_connection(ep, NULL, GFP_KERNEL);
 	else {
 		err = send_mpa_reject(ep, pdata, pdata_len);
-		err = c4iw_ep_disconnect(ep, 0, GFP_KERNEL);
+		disconnect = 1;
 	}
+	mutex_unlock(&ep->com.mutex);
+	if (disconnect)
+		err = c4iw_ep_disconnect(ep, 0, GFP_KERNEL);
 	c4iw_put_ep(&ep->com);
 	return 0;
 }
@@ -2597,12 +2604,14 @@ int c4iw_accept_cr(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
 	struct c4iw_qp *qp = get_qhp(h, conn_param->qpn);
 
 	PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
-	if (state_read(&ep->com) == DEAD) {
+
+	mutex_lock(&ep->com.mutex);
+	if (ep->com.state == DEAD) {
 		err = -ECONNRESET;
 		goto err;
 	}
 
-	BUG_ON(state_read(&ep->com) != MPA_REQ_RCVD);
+	BUG_ON(ep->com.state != MPA_REQ_RCVD);
 	BUG_ON(!qp);
 
 	set_bit(ULP_ACCEPT, &ep->com.history);
@@ -2671,14 +2680,16 @@ int c4iw_accept_cr(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
 	if (err)
 		goto err1;
 
-	state_set(&ep->com, FPDU_MODE);
+	__state_set(&ep->com, FPDU_MODE);
 	established_upcall(ep);
+	mutex_unlock(&ep->com.mutex);
 	c4iw_put_ep(&ep->com);
 	return 0;
 err1:
 	ep->com.cm_id = NULL;
 	cm_id->rem_ref(cm_id);
 err:
+	mutex_unlock(&ep->com.mutex);
 	c4iw_put_ep(&ep->com);
 	return err;
 }
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 23/31] iw_cxgb4: drop RX_DATA packets if the endpoint is gone.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (21 preceding siblings ...)
  2014-02-26 15:07 ` [PATCH net-next 22/31] iw_cxgb4: lock around accept/reject downcalls Hariprasad Shenai
@ 2014-02-26 15:07 ` Hariprasad Shenai
  2014-02-26 15:07 ` [PATCH net-next 24/31] iw_cxgb4: rx_data() needs to hold the ep mutex Hariprasad Shenai
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:07 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index cbeaa58..2930e91 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1544,6 +1544,8 @@ static int rx_data(struct c4iw_dev *dev, struct sk_buff *skb)
 	__u8 status = hdr->status;
 
 	ep = lookup_tid(t, tid);
+	if (!ep)
+		return 0;
 	PDBG("%s ep %p tid %u dlen %u\n", __func__, ep, ep->hwtid, dlen);
 	skb_pull(skb, sizeof(*hdr));
 	skb_trim(skb, dlen);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 24/31] iw_cxgb4: rx_data() needs to hold the ep mutex.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (22 preceding siblings ...)
  2014-02-26 15:07 ` [PATCH net-next 23/31] iw_cxgb4: drop RX_DATA packets if the endpoint is gone Hariprasad Shenai
@ 2014-02-26 15:07 ` Hariprasad Shenai
  2014-02-26 15:07 ` [PATCH net-next 25/31] iw_cxgb4: endpoint timeout fixes Hariprasad Shenai
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:07 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

To avoid racing with other threads doing close/flush processing, rx_data()
must hold the endpoint mutex.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c |   16 +++++++++-------
 1 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index 2930e91..f30ed32 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -1193,7 +1193,7 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
 	 * the connection.
 	 */
 	stop_ep_timer(ep);
-	if (state_read(&ep->com) != MPA_REQ_SENT)
+	if (ep->com.state != MPA_REQ_SENT)
 		return;
 
 	/*
@@ -1268,7 +1268,7 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
 	 * start reply message including private data. And
 	 * the MPA header is valid.
 	 */
-	state_set(&ep->com, FPDU_MODE);
+	__state_set(&ep->com, FPDU_MODE);
 	ep->mpa_attr.crc_enabled = (mpa->flags & MPA_CRC) | crc_enabled ? 1 : 0;
 	ep->mpa_attr.recv_marker_enabled = markers_enabled;
 	ep->mpa_attr.xmit_marker_enabled = mpa->flags & MPA_MARKERS ? 1 : 0;
@@ -1383,7 +1383,7 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
 	}
 	goto out;
 err:
-	state_set(&ep->com, ABORTING);
+	__state_set(&ep->com, ABORTING);
 	send_abort(ep, skb, GFP_KERNEL);
 out:
 	connect_reply_upcall(ep, err);
@@ -1398,7 +1398,7 @@ static void process_mpa_request(struct c4iw_ep *ep, struct sk_buff *skb)
 
 	PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
 
-	if (state_read(&ep->com) != MPA_REQ_WAIT)
+	if (ep->com.state != MPA_REQ_WAIT)
 		return;
 
 	/*
@@ -1519,7 +1519,7 @@ static void process_mpa_request(struct c4iw_ep *ep, struct sk_buff *skb)
 	     ep->mpa_attr.xmit_marker_enabled, ep->mpa_attr.version,
 	     ep->mpa_attr.p2p_type);
 
-	state_set(&ep->com, MPA_REQ_RCVD);
+	__state_set(&ep->com, MPA_REQ_RCVD);
 	stop_ep_timer(ep);
 
 	/* drive upcall */
@@ -1549,11 +1549,12 @@ static int rx_data(struct c4iw_dev *dev, struct sk_buff *skb)
 	PDBG("%s ep %p tid %u dlen %u\n", __func__, ep, ep->hwtid, dlen);
 	skb_pull(skb, sizeof(*hdr));
 	skb_trim(skb, dlen);
+	mutex_lock(&ep->com.mutex);
 
 	/* update RX credits */
 	update_rx_credits(ep, dlen);
 
-	switch (state_read(&ep->com)) {
+	switch (ep->com.state) {
 	case MPA_REQ_SENT:
 		ep->rcv_seq += dlen;
 		process_mpa_reply(ep, skb);
@@ -1569,7 +1570,7 @@ static int rx_data(struct c4iw_dev *dev, struct sk_buff *skb)
 			pr_err("%s Unexpected streaming data." \
 			       " qpid %u ep %p state %d tid %u status %d\n",
 			       __func__, ep->com.qp->wq.sq.qid, ep,
-			       state_read(&ep->com), ep->hwtid, status);
+			       ep->com.state, ep->hwtid, status);
 		attrs.next_state = C4IW_QP_STATE_TERMINATE;
 		c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp,
 			       C4IW_QP_ATTR_NEXT_STATE, &attrs, 0);
@@ -1578,6 +1579,7 @@ static int rx_data(struct c4iw_dev *dev, struct sk_buff *skb)
 	default:
 		break;
 	}
+	mutex_unlock(&ep->com.mutex);
 	return 0;
 }
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 25/31] iw_cxgb4: endpoint timeout fixes.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (23 preceding siblings ...)
  2014-02-26 15:07 ` [PATCH net-next 24/31] iw_cxgb4: rx_data() needs to hold the ep mutex Hariprasad Shenai
@ 2014-02-26 15:07 ` Hariprasad Shenai
  2014-02-26 15:07 ` [PATCH net-next 26/31] iw_cxgb4: rmb() after reading valid gen bit Hariprasad Shenai
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:07 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

1) Timed-out endpoint processing can be starved. If there is a continual
stream of CPL messages flowing into the driver, the endpoint timeout
processing can be starved.  This condition exposed the other bugs below.

Solution: In process_work(), call process_timedout_eps() after each CPL
is processed.

2) Connection events can be processed even though the endpoint is on
the timeout list.  If the endpoint is scheduled for timeout processing,
then we must ignore MPA Start Requests and Replies.

Solution: Change stop_ep_timer() to return 1 if the ep has already been
queued for timeout processing.  All the callers of stop_ep_timer() need
to check this and act accordingly.  There are just a few cases where
the caller needs to do something different if stop_ep_timer() returns 1:

1) In process_mpa_reply(), ignore the reply; process_timeout()
will abort the connection.

2) In process_mpa_request(), ignore the request; process_timeout()
will abort the connection.

It is ok for callers of stop_ep_timer() to abort the connection since
that will leave the state in ABORTING or DEAD, and process_timeout()
now ignores timeouts when the ep is in these states.

3) Double insertion on the timeout list.  Since the endpoint timers are
used for connection setup and teardown, we need to guard against the
possibility that an endpoint is already on the timeout list.  This is
a rare condition, seen only under heavy load and in the presence of
the above two bugs.

Solution: In ep_timeout(), don't queue the endpoint if it is already on
the queue.
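
A small standalone model of the stop_ep_timer()/ep_timeout() interplay
described above, written with C11 atomics rather than the kernel's
test_and_set_bit() and list helpers (the names and fields here are
illustrative only, not the driver's):

#include <stdatomic.h>
#include <stdio.h>

struct ep {
	atomic_flag timeout;	/* stands in for the TIMEOUT bit in ep->com.flags */
	int queued;		/* models "ep->entry.next != NULL" in the driver */
};

/* 0: we stopped the timer first; 1: timeout processing already owns the ep */
static int stop_ep_timer(struct ep *ep)
{
	return atomic_flag_test_and_set(&ep->timeout) ? 1 : 0;
}

static void ep_timeout(struct ep *ep)
{
	if (!atomic_flag_test_and_set(&ep->timeout) && !ep->queued) {
		ep->queued = 1;	/* never queue the same ep twice */
		/* list_add_tail(&ep->entry, &timeout_list) in the real driver */
	}
}

static void process_mpa_reply(struct ep *ep)
{
	if (stop_ep_timer(ep))
		return;		/* timer fired first; process_timeout() will abort */
	printf("handled MPA reply\n");
}

int main(void)
{
	struct ep a = { ATOMIC_FLAG_INIT, 0 };
	struct ep b = { ATOMIC_FLAG_INIT, 0 };

	process_mpa_reply(&a);	/* normal case: reply handled */
	ep_timeout(&b);		/* timer wins the race ... */
	process_mpa_reply(&b);	/* ... so the reply is silently ignored */
	return 0;
}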

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cm.c |   89 ++++++++++++++++++++++++--------------
 1 files changed, 56 insertions(+), 33 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cm.c b/drivers/infiniband/hw/cxgb4/cm.c
index f30ed32..bb48510 100644
--- a/drivers/infiniband/hw/cxgb4/cm.c
+++ b/drivers/infiniband/hw/cxgb4/cm.c
@@ -178,12 +178,15 @@ static void start_ep_timer(struct c4iw_ep *ep)
 	add_timer(&ep->timer);
 }
 
-static void stop_ep_timer(struct c4iw_ep *ep)
+static int stop_ep_timer(struct c4iw_ep *ep)
 {
 	PDBG("%s ep %p stopping\n", __func__, ep);
 	del_timer_sync(&ep->timer);
-	if (!test_and_set_bit(TIMEOUT, &ep->com.flags))
+	if (!test_and_set_bit(TIMEOUT, &ep->com.flags)) {
 		c4iw_put_ep(&ep->com);
+		return 0;
+	}
+	return 1;
 }
 
 static int c4iw_l2t_send(struct c4iw_rdev *rdev, struct sk_buff *skb,
@@ -1188,12 +1191,11 @@ static void process_mpa_reply(struct c4iw_ep *ep, struct sk_buff *skb)
 	PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
 
 	/*
-	 * Stop mpa timer.  If it expired, then the state has
-	 * changed and we bail since ep_timeout already aborted
-	 * the connection.
+	 * Stop mpa timer.  If it expired, then
+	 * we ignore the MPA reply.  process_timeout()
+	 * will abort the connection.
 	 */
-	stop_ep_timer(ep);
-	if (ep->com.state != MPA_REQ_SENT)
+	if (stop_ep_timer(ep))
 		return;
 
 	/*
@@ -1398,15 +1400,12 @@ static void process_mpa_request(struct c4iw_ep *ep, struct sk_buff *skb)
 
 	PDBG("%s ep %p tid %u\n", __func__, ep, ep->hwtid);
 
-	if (ep->com.state != MPA_REQ_WAIT)
-		return;
-
 	/*
 	 * If we get more than the supported amount of private data
 	 * then we must fail this connection.
 	 */
 	if (ep->mpa_pkt_len + skb->len > sizeof(ep->mpa_pkt)) {
-		stop_ep_timer(ep);
+		(void)stop_ep_timer(ep);
 		abort_connection(ep, skb, GFP_KERNEL);
 		return;
 	}
@@ -1436,13 +1435,13 @@ static void process_mpa_request(struct c4iw_ep *ep, struct sk_buff *skb)
 	if (mpa->revision > mpa_rev) {
 		printk(KERN_ERR MOD "%s MPA version mismatch. Local = %d,"
 		       " Received = %d\n", __func__, mpa_rev, mpa->revision);
-		stop_ep_timer(ep);
+		(void)stop_ep_timer(ep);
 		abort_connection(ep, skb, GFP_KERNEL);
 		return;
 	}
 
 	if (memcmp(mpa->key, MPA_KEY_REQ, sizeof(mpa->key))) {
-		stop_ep_timer(ep);
+		(void)stop_ep_timer(ep);
 		abort_connection(ep, skb, GFP_KERNEL);
 		return;
 	}
@@ -1453,7 +1452,7 @@ static void process_mpa_request(struct c4iw_ep *ep, struct sk_buff *skb)
 	 * Fail if there's too much private data.
 	 */
 	if (plen > MPA_MAX_PRIVATE_DATA) {
-		stop_ep_timer(ep);
+		(void)stop_ep_timer(ep);
 		abort_connection(ep, skb, GFP_KERNEL);
 		return;
 	}
@@ -1462,7 +1461,7 @@ static void process_mpa_request(struct c4iw_ep *ep, struct sk_buff *skb)
 	 * If plen does not account for pkt size
 	 */
 	if (ep->mpa_pkt_len > (sizeof(*mpa) + plen)) {
-		stop_ep_timer(ep);
+		(void)stop_ep_timer(ep);
 		abort_connection(ep, skb, GFP_KERNEL);
 		return;
 	}
@@ -1519,18 +1518,24 @@ static void process_mpa_request(struct c4iw_ep *ep, struct sk_buff *skb)
 	     ep->mpa_attr.xmit_marker_enabled, ep->mpa_attr.version,
 	     ep->mpa_attr.p2p_type);
 
-	__state_set(&ep->com, MPA_REQ_RCVD);
-	stop_ep_timer(ep);
-
-	/* drive upcall */
-	mutex_lock(&ep->parent_ep->com.mutex);
-	if (ep->parent_ep->com.state != DEAD) {
-		if (connect_request_upcall(ep))
+	/*
+	 * If the endpoint timer already expired, then we ignore
+	 * the start request.  process_timeout() will abort
+	 * the connection.
+	 */
+	if (!stop_ep_timer(ep)) {
+		__state_set(&ep->com, MPA_REQ_RCVD);
+
+		/* drive upcall */
+		mutex_lock(&ep->parent_ep->com.mutex);
+		if (ep->parent_ep->com.state != DEAD) {
+			if (connect_request_upcall(ep))
+				abort_connection(ep, skb, GFP_KERNEL);
+		} else {
 			abort_connection(ep, skb, GFP_KERNEL);
-	} else {
-		abort_connection(ep, skb, GFP_KERNEL);
+		}
+		mutex_unlock(&ep->parent_ep->com.mutex);
 	}
-	mutex_unlock(&ep->parent_ep->com.mutex);
 	return;
 }
 
@@ -2321,7 +2326,7 @@ static int peer_close(struct c4iw_dev *dev, struct sk_buff *skb)
 		disconnect = 0;
 		break;
 	case MORIBUND:
-		stop_ep_timer(ep);
+		(void)stop_ep_timer(ep);
 		if (ep->com.cm_id && ep->com.qp) {
 			attrs.next_state = C4IW_QP_STATE_IDLE;
 			c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp,
@@ -2381,10 +2386,10 @@ static int peer_abort(struct c4iw_dev *dev, struct sk_buff *skb)
 	case CONNECTING:
 		break;
 	case MPA_REQ_WAIT:
-		stop_ep_timer(ep);
+		(void)stop_ep_timer(ep);
 		break;
 	case MPA_REQ_SENT:
-		stop_ep_timer(ep);
+		(void)stop_ep_timer(ep);
 		if (mpa_rev == 1 || (mpa_rev == 2 && ep->tried_with_mpa_v1))
 			connect_reply_upcall(ep, -ECONNRESET);
 		else {
@@ -2489,7 +2494,7 @@ static int close_con_rpl(struct c4iw_dev *dev, struct sk_buff *skb)
 		__state_set(&ep->com, MORIBUND);
 		break;
 	case MORIBUND:
-		stop_ep_timer(ep);
+		(void)stop_ep_timer(ep);
 		if ((ep->com.cm_id) && (ep->com.qp)) {
 			attrs.next_state = C4IW_QP_STATE_IDLE;
 			c4iw_modify_qp(ep->com.qp->rhp,
@@ -3084,7 +3089,7 @@ int c4iw_ep_disconnect(struct c4iw_ep *ep, int abrupt, gfp_t gfp)
 		if (!test_and_set_bit(CLOSE_SENT, &ep->com.flags)) {
 			close = 1;
 			if (abrupt) {
-				stop_ep_timer(ep);
+				(void)stop_ep_timer(ep);
 				ep->com.state = ABORTING;
 			} else
 				ep->com.state = MORIBUND;
@@ -3519,6 +3524,16 @@ static void process_timeout(struct c4iw_ep *ep)
 		__state_set(&ep->com, ABORTING);
 		close_complete_upcall(ep, -ETIMEDOUT);
 		break;
+	case ABORTING:
+	case DEAD:
+
+		/*
+		 * These states are expected if the ep timed out at the same
+		 * time as another thread was calling stop_ep_timer().
+		 * So we silently do nothing for these states.
+		 */
+		abort = 0;
+		break;
 	default:
 		WARN(1, "%s unexpected state ep %p tid %u state %u\n",
 			__func__, ep, ep->hwtid, ep->com.state);
@@ -3540,6 +3555,8 @@ static void process_timedout_eps(void)
 
 		tmp = timeout_list.next;
 		list_del(tmp);
+		tmp->next = NULL;
+		tmp->prev = NULL;
 		spin_unlock_irq(&timeout_lock);
 		ep = list_entry(tmp, struct c4iw_ep, entry);
 		process_timeout(ep);
@@ -3556,6 +3573,7 @@ static void process_work(struct work_struct *work)
 	unsigned int opcode;
 	int ret;
 
+	process_timedout_eps();
 	while ((skb = skb_dequeue(&rxq))) {
 		rpl = cplhdr(skb);
 		dev = *((struct c4iw_dev **) (skb->cb + sizeof(void *)));
@@ -3565,8 +3583,8 @@ static void process_work(struct work_struct *work)
 		ret = work_handlers[opcode](dev, skb);
 		if (!ret)
 			kfree_skb(skb);
+		process_timedout_eps();
 	}
-	process_timedout_eps();
 }
 
 static DECLARE_WORK(skb_work, process_work);
@@ -3578,8 +3596,13 @@ static void ep_timeout(unsigned long arg)
 
 	spin_lock(&timeout_lock);
 	if (!test_and_set_bit(TIMEOUT, &ep->com.flags)) {
-		list_add_tail(&ep->entry, &timeout_list);
-		kickit = 1;
+		/*
+		 * Only insert if it is not already on the list.
+		 */
+		if (!ep->entry.next) {
+			list_add_tail(&ep->entry, &timeout_list);
+			kickit = 1;
+		}
 	}
 	spin_unlock(&timeout_lock);
 	if (kickit)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 26/31] iw_cxgb4: rmb() after reading valid gen bit.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (24 preceding siblings ...)
  2014-02-26 15:07 ` [PATCH net-next 25/31] iw_cxgb4: endpoint timeout fixes Hariprasad Shenai
@ 2014-02-26 15:07 ` Hariprasad Shenai
  2014-02-26 15:07 ` [PATCH net-next 27/31] iw_cxgb4: wc_wmb() needed after DB writes Hariprasad Shenai
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:07 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

Some HW platforms can reorder read operations, so we must rmb() after
we see a valid gen bit in a CQE but before we read any other fields
from the CQE.
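
A minimal, self-contained model of that ordering requirement, using C11
acquire/release in place of the hardware gen bit and rmb() (this is an
illustration of the pattern, not the driver code):

#include <stdatomic.h>
#include <stdio.h>

struct cqe {
	int data;			/* fields the consumer must not read early */
	atomic_int valid;		/* stands in for the CQE gen bit */
};

/* producer: fill the CQE, then publish it (release pairs with the acquire) */
static void post_cqe(struct cqe *c, int data)
{
	c->data = data;
	atomic_store_explicit(&c->valid, 1, memory_order_release);
}

/* consumer: only after the acquire load (the rmb() analogue) may other
 * fields of the CQE be read */
static int poll_cqe(const struct cqe *c, int *data)
{
	if (!atomic_load_explicit(&c->valid, memory_order_acquire))
		return 0;		/* nothing new */
	*data = c->data;		/* safe: ordered after the valid check */
	return 1;
}

int main(void)
{
	struct cqe c = { 0, 0 };
	int v;

	post_cqe(&c, 42);
	if (poll_cqe(&c, &v))
		printf("cqe data %d\n", v);
	return 0;
}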

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/t4.h |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/t4.h b/drivers/infiniband/hw/cxgb4/t4.h
index edab0e9..67cd09e 100644
--- a/drivers/infiniband/hw/cxgb4/t4.h
+++ b/drivers/infiniband/hw/cxgb4/t4.h
@@ -622,6 +622,7 @@ static inline int t4_next_hw_cqe(struct t4_cq *cq, struct t4_cqe **cqe)
 		printk(KERN_ERR MOD "cq overflow cqid %u\n", cq->cqid);
 		BUG_ON(1);
 	} else if (t4_valid_cqe(cq, &cq->queue[cq->cidx])) {
+		rmb();
 		*cqe = &cq->queue[cq->cidx];
 		ret = 0;
 	} else
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 27/31] iw_cxgb4: wc_wmb() needed after DB writes.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (25 preceding siblings ...)
  2014-02-26 15:07 ` [PATCH net-next 26/31] iw_cxgb4: rmb() after reading valid gen bit Hariprasad Shenai
@ 2014-02-26 15:07 ` Hariprasad Shenai
  2014-02-26 15:07 ` [PATCH net-next 28/31] iw_cxgb4: SQ flush fix Hariprasad Shenai
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:07 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

We need to do an sfence after both the WC and the regular PIDX DB writes.
Otherwise, the host might reorder things and cause work request corruption
(seen with NFSRDMA).
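
Roughly, the change moves the barrier so that it covers both doorbell
paths.  A standalone sketch of that shape (userspace stand-ins only: the
variables model mapped registers and atomic_thread_fence() models
wc_wmb()/sfence):

#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

static uint64_t wc_window[8];	/* models the BAR2 write-combining window */
static uint32_t doorbell;	/* models the plain PIDX doorbell register */

static void ring_sq_db(const uint64_t *wqe, size_t n, int write_combined)
{
	if (write_combined)
		memcpy(wc_window, wqe, n * sizeof(*wqe));	/* pio_copy() analogue */
	else
		doorbell = 1;					/* writel(PIDX_T5(inc)) analogue */

	/* the fence now follows BOTH branches, which is what the patch changes */
	atomic_thread_fence(memory_order_seq_cst);
}

int main(void)
{
	uint64_t wqe[8] = { 0 };

	ring_sq_db(wqe, 8, 1);
	ring_sq_db(wqe, 8, 0);
	return 0;
}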

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/t4.h |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/t4.h b/drivers/infiniband/hw/cxgb4/t4.h
index 67cd09e..ace3154 100644
--- a/drivers/infiniband/hw/cxgb4/t4.h
+++ b/drivers/infiniband/hw/cxgb4/t4.h
@@ -470,12 +470,12 @@ static inline void t4_ring_sq_db(struct t4_wq *wq, u16 inc, u8 t5,
 			PDBG("%s: WC wq->sq.pidx = %d\n",
 			     __func__, wq->sq.pidx);
 			pio_copy(wq->sq.udb + 7, (void *)wqe);
-			wc_wmb();
 		} else {
 			PDBG("%s: DB wq->sq.pidx = %d\n",
 			     __func__, wq->sq.pidx);
 			writel(PIDX_T5(inc), wq->sq.udb);
 		}
+		wc_wmb();
 		return;
 	}
 	writel(QID(wq->sq.qid) | PIDX(inc), wq->db);
@@ -490,12 +490,12 @@ static inline void t4_ring_rq_db(struct t4_wq *wq, u16 inc, u8 t5,
 			PDBG("%s: WC wq->rq.pidx = %d\n",
 			     __func__, wq->rq.pidx);
 			pio_copy(wq->rq.udb + 7, (void *)wqe);
-			wc_wmb();
 		} else {
 			PDBG("%s: DB wq->rq.pidx = %d\n",
 			     __func__, wq->rq.pidx);
 			writel(PIDX_T5(inc), wq->rq.udb);
 		}
+		wc_wmb();
 		return;
 	}
 	writel(QID(wq->rq.qid) | PIDX(inc), wq->db);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 28/31] iw_cxgb4: SQ flush fix.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (26 preceding siblings ...)
  2014-02-26 15:07 ` [PATCH net-next 27/31] iw_cxgb4: wc_wmb() needed after DB writes Hariprasad Shenai
@ 2014-02-26 15:07 ` Hariprasad Shenai
  2014-02-26 15:07 ` [PATCH net-next 29/31] iw_cxgb4: minor fixes/cleanup Hariprasad Shenai
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:07 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

There is a race when moving a QP from RTS->CLOSING where an SQ work
request could be posted after the FW receives the RDMA_RI/FINI WR.
The SQ work request will never get processed and should be completed
with FLUSHED status.  Function c4iw_flush_sq(), however, was dropping
the oldest SQ work request when in the CLOSING or IDLE states, instead of
completing the pending work request. If that oldest pending work request
was actually complete and has a CQE in the CQ, then when that CQE is
processed in poll_cq(), we'll BUG_ON() due to the inconsistent SQ/CQ state.

This is a very small timing hole and has only been hit once so far.

The fix is two-fold:

1) c4iw_flush_sq() MUST always flush all non-completed WRs with FLUSHED
status regardless of the QP state.

2) In c4iw_modify_rc_qp(), always set the "in error" bit on the queue
before moving the state out of RTS.  This ensures that the state
transition will not happen while another thread is in post_rc_send(),
because set_state() and post_rc_send() both acquire the qp spinlock.
Also, once we transition the state out of RTS, subsequent calls to
post_rc_send() will fail because the "in error" bit is set.  I don't
think this fully closes the race where the FW can get a FINI followed by
an SQ work request being posted (because they are posted to different EQs),
but fix #1 will handle the issue by flushing the SQ work request.
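
A standalone userspace model of fix #2 (a pthread mutex in place of the
qp spinlock; the function names below are made up): because both paths
take the same lock, setting the in-error flag before the state change
guarantees that any later post sees it and is refused.

#include <pthread.h>
#include <stdio.h>

enum qp_state { RTS, CLOSING };

struct qp {
	pthread_mutex_t lock;	/* models the qp spinlock taken by both paths */
	enum qp_state state;
	int wq_in_error;	/* models t4_set_wq_in_error() */
};

static int post_send(struct qp *qp)
{
	int ret = 0;

	pthread_mutex_lock(&qp->lock);
	if (qp->wq_in_error)	/* any post after the transition is refused */
		ret = -1;
	else
		printf("work request posted\n");
	pthread_mutex_unlock(&qp->lock);
	return ret;
}

static void move_to_closing(struct qp *qp)
{
	pthread_mutex_lock(&qp->lock);
	qp->wq_in_error = 1;	/* set BEFORE the state change, as in the patch */
	qp->state = CLOSING;
	pthread_mutex_unlock(&qp->lock);
}

int main(void)
{
	struct qp qp = { PTHREAD_MUTEX_INITIALIZER, RTS, 0 };

	post_send(&qp);			/* accepted while in RTS */
	move_to_closing(&qp);
	if (post_send(&qp) != 0)	/* now refused, as intended */
		printf("post after CLOSING was refused\n");
	return 0;
}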

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cq.c |   22 ++++++++--------------
 drivers/infiniband/hw/cxgb4/qp.c |    6 +++---
 2 files changed, 11 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c
index e310762..351c8e0 100644
--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -235,27 +235,21 @@ int c4iw_flush_sq(struct c4iw_qp *qhp)
 	struct t4_cq *cq = &chp->cq;
 	int idx;
 	struct t4_swsqe *swsqe;
-	int error = (qhp->attr.state != C4IW_QP_STATE_CLOSING &&
-			qhp->attr.state != C4IW_QP_STATE_IDLE);
 
 	if (wq->sq.flush_cidx == -1)
 		wq->sq.flush_cidx = wq->sq.cidx;
 	idx = wq->sq.flush_cidx;
 	BUG_ON(idx >= wq->sq.size);
 	while (idx != wq->sq.pidx) {
-		if (error) {
-			swsqe = &wq->sq.sw_sq[idx];
-			BUG_ON(swsqe->flushed);
-			swsqe->flushed = 1;
-			insert_sq_cqe(wq, cq, swsqe);
-			if (wq->sq.oldest_read == swsqe) {
-				BUG_ON(swsqe->opcode != FW_RI_READ_REQ);
-				advance_oldest_read(wq);
-			}
-			flushed++;
-		} else {
-			t4_sq_consume(wq);
+		swsqe = &wq->sq.sw_sq[idx];
+		BUG_ON(swsqe->flushed);
+		swsqe->flushed = 1;
+		insert_sq_cqe(wq, cq, swsqe);
+		if (wq->sq.oldest_read == swsqe) {
+			BUG_ON(swsqe->opcode != FW_RI_READ_REQ);
+			advance_oldest_read(wq);
 		}
+		flushed++;
 		if (++idx == wq->sq.size)
 			idx = 0;
 	}
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index 3b44f39..274f59a 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -1359,6 +1359,7 @@ int c4iw_modify_qp(struct c4iw_dev *rhp, struct c4iw_qp *qhp,
 		switch (attrs->next_state) {
 		case C4IW_QP_STATE_CLOSING:
 			BUG_ON(atomic_read(&qhp->ep->com.kref.refcount) < 2);
+			t4_set_wq_in_error(&qhp->wq);
 			set_state(qhp, C4IW_QP_STATE_CLOSING);
 			ep = qhp->ep;
 			if (!internal) {
@@ -1366,16 +1367,15 @@ int c4iw_modify_qp(struct c4iw_dev *rhp, struct c4iw_qp *qhp,
 				disconnect = 1;
 				c4iw_get_ep(&qhp->ep->com);
 			}
-			t4_set_wq_in_error(&qhp->wq);
 			ret = rdma_fini(rhp, qhp, ep);
 			if (ret)
 				goto err;
 			break;
 		case C4IW_QP_STATE_TERMINATE:
+			t4_set_wq_in_error(&qhp->wq);
 			set_state(qhp, C4IW_QP_STATE_TERMINATE);
 			qhp->attr.layer_etype = attrs->layer_etype;
 			qhp->attr.ecode = attrs->ecode;
-			t4_set_wq_in_error(&qhp->wq);
 			ep = qhp->ep;
 			disconnect = 1;
 			if (!internal)
@@ -1388,8 +1388,8 @@ int c4iw_modify_qp(struct c4iw_dev *rhp, struct c4iw_qp *qhp,
 			c4iw_get_ep(&qhp->ep->com);
 			break;
 		case C4IW_QP_STATE_ERROR:
-			set_state(qhp, C4IW_QP_STATE_ERROR);
 			t4_set_wq_in_error(&qhp->wq);
+			set_state(qhp, C4IW_QP_STATE_ERROR);
 			if (!internal) {
 				abort = 1;
 				disconnect = 1;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 29/31] iw_cxgb4: minor fixes/cleanup.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (27 preceding siblings ...)
  2014-02-26 15:07 ` [PATCH net-next 28/31] iw_cxgb4: SQ flush fix Hariprasad Shenai
@ 2014-02-26 15:07 ` Hariprasad Shenai
  2014-02-26 15:07 ` [PATCH net-next 30/31] iw_cxgb4: Max fastreg depth depends on DSGL support Hariprasad Shenai
  2014-02-26 15:07 ` [PATCH net-next 31/31] Revert "cxgb4: Don't assume LSO only uses SGL path in t4_eth_xmit()" Hariprasad Shenai
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:07 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

Added some missing debug stats.

Use uninitialized_var().

Rate limit warning printks.

Initialize reserved fields in a FW work request.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/cq.c       |    2 +-
 drivers/infiniband/hw/cxgb4/mem.c      |    6 +++++-
 drivers/infiniband/hw/cxgb4/qp.c       |    2 ++
 drivers/infiniband/hw/cxgb4/resource.c |   10 +++++++---
 4 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c
index 351c8e0..6cbe4c5 100644
--- a/drivers/infiniband/hw/cxgb4/cq.c
+++ b/drivers/infiniband/hw/cxgb4/cq.c
@@ -676,7 +676,7 @@ skip_cqe:
 static int c4iw_poll_cq_one(struct c4iw_cq *chp, struct ib_wc *wc)
 {
 	struct c4iw_qp *qhp = NULL;
-	struct t4_cqe cqe = {0, 0}, *rd_cqe;
+	struct t4_cqe uninitialized_var(cqe), *rd_cqe;
 	struct t4_wq *wq;
 	u32 credit = 0;
 	u8 cqe_flushed;
diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c
index cdaf257..ecdee18 100644
--- a/drivers/infiniband/hw/cxgb4/mem.c
+++ b/drivers/infiniband/hw/cxgb4/mem.c
@@ -259,8 +259,12 @@ static int write_tpt_entry(struct c4iw_rdev *rdev, u32 reset_tpt_entry,
 
 	if ((!reset_tpt_entry) && (*stag == T4_STAG_UNSET)) {
 		stag_idx = c4iw_get_resource(&rdev->resource.tpt_table);
-		if (!stag_idx)
+		if (!stag_idx) {
+			mutex_lock(&rdev->stats.lock);
+			rdev->stats.stag.fail++;
+			mutex_unlock(&rdev->stats.lock);
 			return -ENOMEM;
+		}
 		mutex_lock(&rdev->stats.lock);
 		rdev->stats.stag.cur += 32;
 		if (rdev->stats.stag.cur > rdev->stats.stag.max)
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index 274f59a..30fcf84 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -428,6 +428,8 @@ static int build_rdma_send(struct t4_sq *sq, union t4_wr *wqe,
 	default:
 		return -EINVAL;
 	}
+	wqe->send.r3 = 0;
+	wqe->send.r4 = 0;
 
 	plen = 0;
 	if (wr->num_sge) {
diff --git a/drivers/infiniband/hw/cxgb4/resource.c b/drivers/infiniband/hw/cxgb4/resource.c
index cdef4d7..69e57d0 100644
--- a/drivers/infiniband/hw/cxgb4/resource.c
+++ b/drivers/infiniband/hw/cxgb4/resource.c
@@ -179,8 +179,12 @@ u32 c4iw_get_qpid(struct c4iw_rdev *rdev, struct c4iw_dev_ucontext *uctx)
 		kfree(entry);
 	} else {
 		qid = c4iw_get_resource(&rdev->resource.qid_table);
-		if (!qid)
+		if (!qid) {
+			mutex_lock(&rdev->stats.lock);
+			rdev->stats.qid.fail++;
+			mutex_unlock(&rdev->stats.lock);
 			goto out;
+		}
 		mutex_lock(&rdev->stats.lock);
 		rdev->stats.qid.cur += rdev->qpmask + 1;
 		mutex_unlock(&rdev->stats.lock);
@@ -322,8 +326,8 @@ u32 c4iw_rqtpool_alloc(struct c4iw_rdev *rdev, int size)
 	unsigned long addr = gen_pool_alloc(rdev->rqt_pool, size << 6);
 	PDBG("%s addr 0x%x size %d\n", __func__, (u32)addr, size << 6);
 	if (!addr)
-		printk_ratelimited(KERN_WARNING MOD "%s: Out of RQT memory\n",
-		       pci_name(rdev->lldi.pdev));
+		pr_warn_ratelimited(KERN_WARNING MOD "%s: Out of RQT memory\n",
+				    pci_name(rdev->lldi.pdev));
 	mutex_lock(&rdev->stats.lock);
 	if (addr) {
 		rdev->stats.rqt.cur += roundup(size << 6, 1 << MIN_RQT_SHIFT);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 30/31] iw_cxgb4: Max fastreg depth depends on DSGL support.
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (28 preceding siblings ...)
  2014-02-26 15:07 ` [PATCH net-next 29/31] iw_cxgb4: minor fixes/cleanup Hariprasad Shenai
@ 2014-02-26 15:07 ` Hariprasad Shenai
  2014-02-26 15:07 ` [PATCH net-next 31/31] Revert "cxgb4: Don't assume LSO only uses SGL path in t4_eth_xmit()" Hariprasad Shenai
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:07 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Steve Wise <swise@opengridcomputing.com>

The max depth of a fastreg MR depends on whether the device supports
DSGL or not, so compute it dynamically based on the device support and
the module's use_dsgl option.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 drivers/infiniband/hw/cxgb4/provider.c |    2 +-
 drivers/infiniband/hw/cxgb4/qp.c       |    3 ++-
 drivers/infiniband/hw/cxgb4/t4.h       |    9 ++++++++-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index d1565a4..9e1a409 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -327,7 +327,7 @@ static int c4iw_query_device(struct ib_device *ibdev,
 	props->max_mr = c4iw_num_stags(&dev->rdev);
 	props->max_pd = T4_MAX_NUM_PD;
 	props->local_ca_ack_delay = 0;
-	props->max_fast_reg_page_list_len = T4_MAX_FR_DEPTH;
+	props->max_fast_reg_page_list_len = t4_max_fr_depth(use_dsgl);
 
 	return 0;
 }
diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c
index 30fcf84..12b4bb4 100644
--- a/drivers/infiniband/hw/cxgb4/qp.c
+++ b/drivers/infiniband/hw/cxgb4/qp.c
@@ -560,7 +560,8 @@ static int build_fastreg(struct t4_sq *sq, union t4_wr *wqe,
 	int pbllen = roundup(wr->wr.fast_reg.page_list_len * sizeof(u64), 32);
 	int rem;
 
-	if (wr->wr.fast_reg.page_list_len > T4_MAX_FR_DEPTH)
+	if (wr->wr.fast_reg.page_list_len >
+	    t4_max_fr_depth(use_dsgl))
 		return -EINVAL;
 
 	wqe->fr.qpbinde_to_dcacpu = 0;
diff --git a/drivers/infiniband/hw/cxgb4/t4.h b/drivers/infiniband/hw/cxgb4/t4.h
index ace3154..1543d6b 100644
--- a/drivers/infiniband/hw/cxgb4/t4.h
+++ b/drivers/infiniband/hw/cxgb4/t4.h
@@ -84,7 +84,14 @@ struct t4_status_page {
 			sizeof(struct fw_ri_isgl)) / sizeof(struct fw_ri_sge))
 #define T4_MAX_FR_IMMD ((T4_SQ_NUM_BYTES - sizeof(struct fw_ri_fr_nsmr_wr) - \
 			sizeof(struct fw_ri_immd)) & ~31UL)
-#define T4_MAX_FR_DEPTH (1024 / sizeof(u64))
+#define T4_MAX_FR_IMMD_DEPTH (T4_MAX_FR_IMMD / sizeof(u64))
+#define T4_MAX_FR_DSGL 1024
+#define T4_MAX_FR_DSGL_DEPTH (T4_MAX_FR_DSGL / sizeof(u64))
+
+static inline int t4_max_fr_depth(int use_dsgl)
+{
+	return use_dsgl ? T4_MAX_FR_DSGL_DEPTH : T4_MAX_FR_IMMD_DEPTH;
+}
 
 #define T4_RQ_NUM_SLOTS 2
 #define T4_RQ_NUM_BYTES (T4_EQ_ENTRY_SIZE * T4_RQ_NUM_SLOTS)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH net-next 31/31] Revert "cxgb4: Don't assume LSO only uses SGL path in t4_eth_xmit()"
  2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
                   ` (29 preceding siblings ...)
  2014-02-26 15:07 ` [PATCH net-next 30/31] iw_cxgb4: Max fastreg depth depends on DSGL support Hariprasad Shenai
@ 2014-02-26 15:07 ` Hariprasad Shenai
  30 siblings, 0 replies; 40+ messages in thread
From: Hariprasad Shenai @ 2014-02-26 15:07 UTC (permalink / raw)
  To: netdev, linux-rdma
  Cc: davem, roland, kumaras, dm, swise, leedom, santosh, hariprasad,
	nirranjan

From: Kumar Sanghvi <kumaras@chelsio.com>

Commit 0034b29 (cxgb4: Don't assume LSO only uses SGL path in t4_eth_xmit())
introduced a regression that causes a chip hang. The change needs more
debugging and work, so revert it for now.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/sge.c |   32 +++++++++--------------------
 1 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index c2e142d..25f8981 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -716,17 +716,11 @@ static inline unsigned int flits_to_desc(unsigned int n)
  *	@skb: the packet
  *
  *	Returns whether an Ethernet packet is small enough to fit as
- *	immediate data. Return value corresponds to headroom required.
+ *	immediate data.
  */
 static inline int is_eth_imm(const struct sk_buff *skb)
 {
-	int hdrlen = skb_shinfo(skb)->gso_size ?
-			sizeof(struct cpl_tx_pkt_lso_core) : 0;
-
-	hdrlen += sizeof(struct cpl_tx_pkt);
-	if (skb->len <= MAX_IMM_TX_PKT_LEN - hdrlen)
-		return hdrlen;
-	return 0;
+	return skb->len <= MAX_IMM_TX_PKT_LEN - sizeof(struct cpl_tx_pkt);
 }
 
 /**
@@ -739,10 +733,9 @@ static inline int is_eth_imm(const struct sk_buff *skb)
 static inline unsigned int calc_tx_flits(const struct sk_buff *skb)
 {
 	unsigned int flits;
-	int hdrlen = is_eth_imm(skb);
 
-	if (hdrlen)
-		return DIV_ROUND_UP(skb->len + hdrlen, sizeof(__be64));
+	if (is_eth_imm(skb))
+		return DIV_ROUND_UP(skb->len + sizeof(struct cpl_tx_pkt), 8);
 
 	flits = sgl_len(skb_shinfo(skb)->nr_frags + 1) + 4;
 	if (skb_shinfo(skb)->gso_size)
@@ -990,7 +983,6 @@ static inline void txq_advance(struct sge_txq *q, unsigned int n)
  */
 netdev_tx_t t4_eth_xmit(struct sk_buff *skb, struct net_device *dev)
 {
-	int len;
 	u32 wr_mid;
 	u64 cntrl, *end;
 	int qidx, credits;
@@ -1002,7 +994,6 @@ netdev_tx_t t4_eth_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct cpl_tx_pkt_core *cpl;
 	const struct skb_shared_info *ssi;
 	dma_addr_t addr[MAX_SKB_FRAGS + 1];
-	bool immediate = false;
 
 	/*
 	 * The chip min packet length is 10 octets but play safe and reject
@@ -1032,10 +1023,7 @@ out_free:	dev_kfree_skb(skb);
 		return NETDEV_TX_BUSY;
 	}
 
-	if (is_eth_imm(skb))
-		immediate = true;
-
-	if (!immediate &&
+	if (!is_eth_imm(skb) &&
 	    unlikely(map_skb(adap->pdev_dev, skb, addr) < 0)) {
 		q->mapping_err++;
 		goto out_free;
@@ -1052,8 +1040,6 @@ out_free:	dev_kfree_skb(skb);
 	wr->r3 = cpu_to_be64(0);
 	end = (u64 *)wr + flits;
 
-	len = immediate ? skb->len : 0;
-	len += sizeof(*cpl);
 	ssi = skb_shinfo(skb);
 	if (ssi->gso_size) {
 		struct cpl_tx_pkt_lso *lso = (void *)wr;
@@ -1061,9 +1047,8 @@ out_free:	dev_kfree_skb(skb);
 		int l3hdr_len = skb_network_header_len(skb);
 		int eth_xtra_len = skb_network_offset(skb) - ETH_HLEN;
 
-		len += sizeof(*lso);
 		wr->op_immdlen = htonl(FW_WR_OP(FW_ETH_TX_PKT_WR) |
-				       FW_WR_IMMDLEN(len));
+				       FW_WR_IMMDLEN(sizeof(*lso)));
 		lso->c.lso_ctrl = htonl(LSO_OPCODE(CPL_TX_PKT_LSO) |
 					LSO_FIRST_SLICE | LSO_LAST_SLICE |
 					LSO_IPV6(v6) |
@@ -1081,6 +1066,9 @@ out_free:	dev_kfree_skb(skb);
 		q->tso++;
 		q->tx_cso += ssi->gso_segs;
 	} else {
+		int len;
+
+		len = is_eth_imm(skb) ? skb->len + sizeof(*cpl) : sizeof(*cpl);
 		wr->op_immdlen = htonl(FW_WR_OP(FW_ETH_TX_PKT_WR) |
 				       FW_WR_IMMDLEN(len));
 		cpl = (void *)(wr + 1);
@@ -1102,7 +1090,7 @@ out_free:	dev_kfree_skb(skb);
 	cpl->len = htons(skb->len);
 	cpl->ctrl1 = cpu_to_be64(cntrl);
 
-	if (immediate) {
+	if (is_eth_imm(skb)) {
 		inline_tx_skb(skb, &q->q, cpl + 1);
 		dev_kfree_skb(skb);
 	} else {
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 10/31] cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes.
       [not found]   ` <1393427230-14532-11-git-send-email-hariprasad-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
@ 2014-02-26 23:12     ` David Miller
       [not found]       ` <20140226.181216.1563503549713890339.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: David Miller @ 2014-02-26 23:12 UTC (permalink / raw)
  To: hariprasad-ut6Up61K2wZBDgjK7y7TUQ
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	roland-BHEL68pLQRGGvPXPguhicg, kumaras-ut6Up61K2wZBDgjK7y7TUQ,
	dm-ut6Up61K2wZBDgjK7y7TUQ,
	swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	leedom-ut6Up61K2wZBDgjK7y7TUQ, santosh-ut6Up61K2wZBDgjK7y7TUQ,
	nirranjan-ut6Up61K2wZBDgjK7y7TUQ

From: Hariprasad Shenai <hariprasad-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
Date: Wed, 26 Feb 2014 20:36:49 +0530

> -static int allow_db_fc_on_t5;
> -module_param(allow_db_fc_on_t5, int, 0644);
> -MODULE_PARM_DESC(allow_db_fc_on_t5,
> -		 "Allow DB Flow Control on T5 (default = 0)");
> -
> -static int allow_db_coalescing_on_t5;
> -module_param(allow_db_coalescing_on_t5, int, 0644);
> -MODULE_PARM_DESC(allow_db_coalescing_on_t5,
> -		 "Allow DB Coalescing on T5 (default = 0)");

Module parameters are a user facing interface.

You cannot just delete, or change the semantics of, the ones you feel
like doing so to.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* RE: [PATCH net-next 10/31] cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes.
       [not found]       ` <20140226.181216.1563503549713890339.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2014-02-27 17:11           ` Steve Wise
  0 siblings, 0 replies; 40+ messages in thread
From: Steve Wise @ 2014-02-27 17:11 UTC (permalink / raw)
  To: 'David Miller', hariprasad-ut6Up61K2wZBDgjK7y7TUQ
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	roland-BHEL68pLQRGGvPXPguhicg, kumaras-ut6Up61K2wZBDgjK7y7TUQ,
	dm-ut6Up61K2wZBDgjK7y7TUQ, leedom-ut6Up61K2wZBDgjK7y7TUQ,
	santosh-ut6Up61K2wZBDgjK7y7TUQ, nirranjan-ut6Up61K2wZBDgjK7y7TUQ

> 
> > -static int allow_db_fc_on_t5;
> > -module_param(allow_db_fc_on_t5, int, 0644);
> > -MODULE_PARM_DESC(allow_db_fc_on_t5,
> > -		 "Allow DB Flow Control on T5 (default = 0)");
> > -
> > -static int allow_db_coalescing_on_t5;
> > -module_param(allow_db_coalescing_on_t5, int, 0644);
> > -MODULE_PARM_DESC(allow_db_coalescing_on_t5,
> > -		 "Allow DB Coalescing on T5 (default = 0)");
> 
> Module parameters are a user facing interface.
> 
> You cannot just delete, or change the semantics of, the ones you feel
> like doing so to.

I see your point on user facing interfaces.   These module params were
added initially to allow tweaking the db drop recovery for T5 devices in
the thought that we might need it.  It turns out T5 doesn't suffer from
this issue.  These params default to 0 anyway, and I doubt anyone has
changed them.  Disabling the hw db coalescing feature proved problematic
and it turned out to even make the issue worse, so I removed it totally
at the recommendation from the HW engineers, and put in place the new
design which better rate controls things under heavy load.

Steve.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 10/31] cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes.
  2014-02-27 17:11           ` Steve Wise
  (?)
@ 2014-02-27 18:05           ` David Miller
  2014-02-27 18:21             ` Stephen Hemminger
       [not found]             ` <20140227.130544.1892840202381139797.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  -1 siblings, 2 replies; 40+ messages in thread
From: David Miller @ 2014-02-27 18:05 UTC (permalink / raw)
  To: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW
  Cc: hariprasad-ut6Up61K2wZBDgjK7y7TUQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-BHEL68pLQRGGvPXPguhicg,
	kumaras-ut6Up61K2wZBDgjK7y7TUQ, dm-ut6Up61K2wZBDgjK7y7TUQ,
	leedom-ut6Up61K2wZBDgjK7y7TUQ, santosh-ut6Up61K2wZBDgjK7y7TUQ,
	nirranjan-ut6Up61K2wZBDgjK7y7TUQ

From: "Steve Wise" <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
Date: Thu, 27 Feb 2014 11:11:49 -0600

>> 
>> > -static int allow_db_fc_on_t5;
>> > -module_param(allow_db_fc_on_t5, int, 0644);
>> > -MODULE_PARM_DESC(allow_db_fc_on_t5,
>> > -		 "Allow DB Flow Control on T5 (default = 0)");
>> > -
>> > -static int allow_db_coalescing_on_t5;
>> > -module_param(allow_db_coalescing_on_t5, int, 0644);
>> > -MODULE_PARM_DESC(allow_db_coalescing_on_t5,
>> > -		 "Allow DB Coalescing on T5 (default = 0)");
>> 
>> Module parameters are a user facing interface.
>> 
>> You cannot just delete, or change the semantics of, the ones you feel
>> like doing so to.
> 
> I see your point on user facing interfaces.   These module params were
> added initially to allow tweaking the db drop recovery for T5 devices in
> the thought that we might need it.  It turns out T5 doesn't suffer from
> this issue.  These params default to 0 anyway, and I doubt anyone has
> changed them.  Disabling the hw db coalescing feature proved problematic
> and it turned out to even make the issue worse, so I removed it totally
> at the recommendation from the HW engineers, and put in place the new
> design which better rate controls things under heavy load.

You have to keep the old ones around.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 10/31] cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes.
  2014-02-27 18:05           ` David Miller
@ 2014-02-27 18:21             ` Stephen Hemminger
       [not found]             ` <20140227.130544.1892840202381139797.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
  1 sibling, 0 replies; 40+ messages in thread
From: Stephen Hemminger @ 2014-02-27 18:21 UTC (permalink / raw)
  To: David Miller
  Cc: swise, hariprasad, netdev, linux-rdma, roland, kumaras, dm,
	leedom, santosh, nirranjan

On Thu, 27 Feb 2014 13:05:44 -0500 (EST)
David Miller <davem@davemloft.net> wrote:

> From: "Steve Wise" <swise@opengridcomputing.com>
> Date: Thu, 27 Feb 2014 11:11:49 -0600
> 
> >> 
> >> > -static int allow_db_fc_on_t5;
> >> > -module_param(allow_db_fc_on_t5, int, 0644);
> >> > -MODULE_PARM_DESC(allow_db_fc_on_t5,
> >> > -		 "Allow DB Flow Control on T5 (default = 0)");
> >> > -
> >> > -static int allow_db_coalescing_on_t5;
> >> > -module_param(allow_db_coalescing_on_t5, int, 0644);
> >> > -MODULE_PARM_DESC(allow_db_coalescing_on_t5,
> >> > -		 "Allow DB Coalescing on T5 (default = 0)");
> >> 
> >> Module parameters are a user facing interface.
> >> 
> >> You cannot just delete, or change the semantics of, the ones you feel
> >> like doing so to.
> > 
> > I see your point on user facing interfaces.   These module params were
> > added initially to allow tweaking the db drop recovery for T5 devices in
> > the thought that we might need it.  It turns out T5 doesn't suffer from
> > this issue.  These params default to 0 anyway, and I doubt anyone has
> > changed them.  Disabling the hw db coalescing feature proved problematic
> > and it turned out to even make the issue worse, so I removed it totally
> > at the recommendation from the HW engineers, and put in place the new
> > design which better rate controls things under heavy load.
> 
> You have to keep the old ones around.

You can keep the parameters but ignore them however.
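
For reference, a minimal sketch of that approach (illustrative only, not
part of this series): keep the parameter so existing configurations still
load, but warn and ignore its value.

/* deprecated_param.c - illustrative module skeleton, not driver code */
#include <linux/module.h>
#include <linux/moduleparam.h>

static int allow_db_fc_on_t5;
module_param(allow_db_fc_on_t5, int, 0644);
MODULE_PARM_DESC(allow_db_fc_on_t5,
		 "Deprecated; kept for compatibility and ignored");

static int __init demo_init(void)
{
	if (allow_db_fc_on_t5)
		pr_warn("allow_db_fc_on_t5 is deprecated and has no effect\n");
	return 0;
}

static void __exit demo_exit(void)
{
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");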

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 10/31] cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes.
       [not found]             ` <20140227.130544.1892840202381139797.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
@ 2014-03-04 23:22               ` Ben Hutchings
       [not found]                 ` <1393975366.16256.23.camel-nDn/Rdv9kqW9Jme8/bJn5UCKIB8iOfG2tUK59QYPAWc@public.gmane.org>
  0 siblings, 1 reply; 40+ messages in thread
From: Ben Hutchings @ 2014-03-04 23:22 UTC (permalink / raw)
  To: David Miller
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	hariprasad-ut6Up61K2wZBDgjK7y7TUQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, roland-BHEL68pLQRGGvPXPguhicg,
	kumaras-ut6Up61K2wZBDgjK7y7TUQ, dm-ut6Up61K2wZBDgjK7y7TUQ,
	leedom-ut6Up61K2wZBDgjK7y7TUQ, santosh-ut6Up61K2wZBDgjK7y7TUQ,
	nirranjan-ut6Up61K2wZBDgjK7y7TUQ

On Thu, 2014-02-27 at 13:05 -0500, David Miller wrote:
> From: "Steve Wise" <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
> Date: Thu, 27 Feb 2014 11:11:49 -0600
> 
> >> 
> >> > -static int allow_db_fc_on_t5;
> >> > -module_param(allow_db_fc_on_t5, int, 0644);
> >> > -MODULE_PARM_DESC(allow_db_fc_on_t5,
> >> > -           "Allow DB Flow Control on T5 (default = 0)");
> >> > -
> >> > -static int allow_db_coalescing_on_t5;
> >> > -module_param(allow_db_coalescing_on_t5, int, 0644);
> >> > -MODULE_PARM_DESC(allow_db_coalescing_on_t5,
> >> > -           "Allow DB Coalescing on T5 (default = 0)");
> >> 
> >> Module parameters are a user facing interface.
> >> 
> >> You cannot just delete, or change the semantics of, the ones you feel
> >> like doing so to.
> > 
> > I see your point on user facing interfaces.   These module params were
> > added initially to allow tweaking the db drop recovery for T5 devices in
> > the thought that we might need it.  It turns out T5 doesn't suffer from
> > this issue.  These params default to 0 anyway, and I doubt anyone has
> > changed them.  Disabling the hw db coalescing feature proved problematic
> > and it turned out to even make the issue worse, so I removed it totally
> > at the recommendation from the HW engineers, and put in place the new
> > design which better rate controls things under heavy load.
> 
> You have to keep the old ones around.

Setting an unknown module parameter now results in a warning (since
3.11), so removing a parameter isn't so disruptive as it used to be.

Obviously this is removing a feature, but as the feature sounds like it
was only marginally useful I think it's a perfectly valid change.

Ben.

-- 
Ben Hutchings
I'm always amazed by the number of people who take up solipsism because
they heard someone else explain it. - E*Borg on alt.fan.pratchett

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH net-next 10/31] cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes.
       [not found]                 ` <1393975366.16256.23.camel-nDn/Rdv9kqW9Jme8/bJn5UCKIB8iOfG2tUK59QYPAWc@public.gmane.org>
@ 2014-03-05  5:34                     ` Hariprasad S
  0 siblings, 0 replies; 40+ messages in thread
From: Hariprasad S @ 2014-03-05  5:34 UTC (permalink / raw)
  To: Ben Hutchings, David Miller
  Cc: swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW,
	netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	roland-BHEL68pLQRGGvPXPguhicg, kumaras-ut6Up61K2wZBDgjK7y7TUQ,
	dm-ut6Up61K2wZBDgjK7y7TUQ, leedom-ut6Up61K2wZBDgjK7y7TUQ,
	santosh-ut6Up61K2wZBDgjK7y7TUQ, nirranjan-ut6Up61K2wZBDgjK7y7TUQ

On Tue, Mar 04, 2014 at 23:22:46 +0000, Ben Hutchings wrote:
> On Thu, 2014-02-27 at 13:05 -0500, David Miller wrote:
> > From: "Steve Wise" <swise-7bPotxP6k4+P2YhJcF5u+vpXobYPEAuW@public.gmane.org>
> > Date: Thu, 27 Feb 2014 11:11:49 -0600
> > 
> > >> 
> > >> > -static int allow_db_fc_on_t5;
> > >> > -module_param(allow_db_fc_on_t5, int, 0644);
> > >> > -MODULE_PARM_DESC(allow_db_fc_on_t5,
> > >> > -           "Allow DB Flow Control on T5 (default = 0)");
> > >> > -
> > >> > -static int allow_db_coalescing_on_t5;
> > >> > -module_param(allow_db_coalescing_on_t5, int, 0644);
> > >> > -MODULE_PARM_DESC(allow_db_coalescing_on_t5,
> > >> > -           "Allow DB Coalescing on T5 (default = 0)");
> > >> 
> > >> Module parameters are a user facing interface.
> > >> 
> > >> You cannot just delete, or change the semantics of, the ones you feel
> > >> like doing so to.
> > > 
> > > I see your point on user facing interfaces.   These module params were
> > > added initially to allow tweaking the db drop recovery for T5 devices in
> > > the thought that we might need it.  It turns out T5 doesn't suffer from
> > > this issue.  These params default to 0 anyway, and I doubt anyone has
> > > changed them.  Disabling the hw db coalescing feature proved problematic
> > > and it turned out to even make the issue worse, so I removed it totally
> > > at the recommendation from the HW engineers, and put in place the new
> > > design which better rate controls things under heavy load.
> > 
> > You have to keep the old ones around.
> 
> Setting an unknown module parameter now results in a warning (since
> 3.11), so removing a parameter isn't so disruptive as it used to be.
> 
> Obviously this is removing a feature, but as the feature sounds like it
> was only marginally useful I think it's a perfectly valid change.
> 
> Ben.

Thanks, Ben, for letting us know that things have changed since the 3.11 kernel.

Hi David,
Would you be OK if we remove the unused module_param now when we re-submit the series?


Thanks,
-Hari.

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2014-03-05  5:34 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-26 15:06 [PATCH net-next 00/31] Misc. fixes for cxgb4 and iw_cxgb4 Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 01/31] cxgb4: Fix some small bugs in t4_sge_init_soft() when our Page Size is 64KB Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 02/31] cxgb4: Add code to dump SGE registers when hitting idma hangs Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 03/31] cxgb4: Rectify emitting messages about SGE Ingress DMA channels being potentially stuck Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 04/31] cxgb4: Updates for T5 SGE's Egress Congestion Threshold Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 05/31] cxgb4: use spinlock_irqsave/spinlock_irqrestore for db lock Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 06/31] iw_cxgb4: cap CQ size at T4_MAX_IQ_SIZE Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 07/31] iw_cxgb4: Allow loopback connections Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 08/31] iw_cxgb4: release neigh entry in error paths Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 09/31] iw_cxgb4: Treat CPL_ERR_KEEPALV_NEG_ADVICE as negative advice Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 10/31] cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes Hariprasad Shenai
     [not found]   ` <1393427230-14532-11-git-send-email-hariprasad-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
2014-02-26 23:12     ` David Miller
     [not found]       ` <20140226.181216.1563503549713890339.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2014-02-27 17:11         ` Steve Wise
2014-02-27 17:11           ` Steve Wise
2014-02-27 18:05           ` David Miller
2014-02-27 18:21             ` Stephen Hemminger
     [not found]             ` <20140227.130544.1892840202381139797.davem-fT/PcQaiUtIeIZ0/mPfg9Q@public.gmane.org>
2014-03-04 23:22               ` Ben Hutchings
     [not found]                 ` <1393975366.16256.23.camel-nDn/Rdv9kqW9Jme8/bJn5UCKIB8iOfG2tUK59QYPAWc@public.gmane.org>
2014-03-05  5:34                   ` Hariprasad S
2014-03-05  5:34                     ` Hariprasad S
2014-02-26 15:06 ` [PATCH net-next 11/31] iw_cxgb4: use the BAR2/WC path for kernel QPs and T5 devices Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 12/31] iw_cxgb4: Fix incorrect BUG_ON conditions Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 13/31] iw_cxgb4: Mind the sq_sig_all/sq_sig_type QP attributes Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 14/31] iw_cxgb4: default peer2peer mode to 1 Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 15/31] iw_cxgb4: save the correct map length for fast_reg_page_lists Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 16/31] iw_cxgb4: don't leak skb in c4iw_uld_rx_handler() Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 17/31] iw_cxgb4: fix possible memory leak in RX_PKT processing Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 18/31] iw_cxgb4: ignore read reponse type 1 CQEs Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 19/31] iw_cxgb4: connect_request_upcall fixes Hariprasad Shenai
2014-02-26 15:06 ` [PATCH net-next 20/31] iw_cxgb4: adjust tcp snd/rcv window based on link speed Hariprasad Shenai
2014-02-26 15:07 ` [PATCH net-next 21/31] iw_cxgb4: update snd_seq when sending MPA messages Hariprasad Shenai
2014-02-26 15:07 ` [PATCH net-next 22/31] iw_cxgb4: lock around accept/reject downcalls Hariprasad Shenai
2014-02-26 15:07 ` [PATCH net-next 23/31] iw_cxgb4: drop RX_DATA packets if the endpoint is gone Hariprasad Shenai
2014-02-26 15:07 ` [PATCH net-next 24/31] iw_cxgb4: rx_data() needs to hold the ep mutex Hariprasad Shenai
2014-02-26 15:07 ` [PATCH net-next 25/31] iw_cxgb4: endpoint timeout fixes Hariprasad Shenai
2014-02-26 15:07 ` [PATCH net-next 26/31] iw_cxgb4: rmb() after reading valid gen bit Hariprasad Shenai
2014-02-26 15:07 ` [PATCH net-next 27/31] iw_cxgb4: wc_wmb() needed after DB writes Hariprasad Shenai
2014-02-26 15:07 ` [PATCH net-next 28/31] iw_cxgb4: SQ flush fix Hariprasad Shenai
2014-02-26 15:07 ` [PATCH net-next 29/31] iw_cxgb4: minor fixes/cleanup Hariprasad Shenai
2014-02-26 15:07 ` [PATCH net-next 30/31] iw_cxgb4: Max fastreg depth depends on DSGL support Hariprasad Shenai
2014-02-26 15:07 ` [PATCH net-next 31/31] Revert "cxgb4: Don't assume LSO only uses SGL path in t4_eth_xmit()" Hariprasad Shenai
