* [PATCH 0/9] ath10k: improve throughput performance
@ 2016-03-22 11:52 ` Rajkumar Manoharan
  0 siblings, 0 replies; 33+ messages in thread
From: Rajkumar Manoharan @ 2016-03-22 11:52 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, rmanohar, Rajkumar Manoharan

Hi All,

In order to reuse HTT Rx descriptors (copy engine 5), HTT response
processing should be decoupled from txrx data processing. This change also
helps to reduce rx ring lock contention. As the txrx tasklet's workload is
reduced, the rx replenish task can be combined with the txrx task. Refilling
the complete rx ring from the txrx tasklet hurts UDP UL traffic on the AP135
platform, so the existing refill threshold is updated to meet peak throughput
on both the AP135 and AP148 platforms. Instead of a tasklet, the existing
refill timer is used to reschedule the replenish work at an interval of 5 ms
in case of a larger deficit, as sketched below.
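
For reference, a minimal sketch (not the exact code from patches 6-9) of the
timer-based rescheduling described above; ATH10K_HTT_MAX_NUM_REFILL,
HTT_RX_RING_REFILL_RESCHED_MS and the rx_ring field names are assumptions
based on this cover letter:

static void ath10k_htt_rx_msdu_buff_replenish(struct ath10k_htt *htt)
{
	int ret, num_deficit, num_to_fill;

	spin_lock_bh(&htt->rx_ring.lock);
	num_deficit = htt->rx_ring.fill_level - htt->rx_ring.fill_cnt;
	/* refill only a bounded batch per pass so the txrx tasklet is not
	 * monopolized by replenishing */
	num_to_fill = min(ATH10K_HTT_MAX_NUM_REFILL, num_deficit);
	num_deficit -= num_to_fill;
	ret = ath10k_htt_rx_ring_fill_n(htt, num_to_fill);
	if (ret == -ENOMEM) {
		/* allocation failed: retry via the existing retry timer */
		mod_timer(&htt->rx_ring.refill_retry_timer, jiffies +
			  msecs_to_jiffies(HTT_RX_RING_REFILL_RETRY_MS));
	} else if (num_deficit > 0) {
		/* still owe buffers: reschedule the replenish after ~5 ms
		 * instead of looping in the tasklet */
		mod_timer(&htt->rx_ring.refill_retry_timer, jiffies +
			  msecs_to_jiffies(HTT_RX_RING_REFILL_RESCHED_MS));
	}
	spin_unlock_bh(&htt->rx_ring.lock);
}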

This series was tested on both the AP148 (QCA99x0) and IPQ4019 platforms.
Below is a consolidated report (throughput, with CPU usage in parentheses
where measured). Thanks to Tamizh for helping to verify the changes.

        IPQ4019(TOT)  IPQ4019(+rework)   AP148(TOT)     AP148(+rework)
        ===========   ===============    ==========     =============
TCP DL   639 (40%)       646 (42%)       1134 (71%)      1134 (71%)
TCP UL   661 (31%)       663 (30%)       1244 (71%)      1270 (72%)
UDP DL   670 (50%)       682 (49%)       1240 (73%)      1244 (75%)

        AP135 (OpenWrt TOT)     AP135 (+changes)
        ==================      ===============

TCP DL          603             620
TCP UL          430             428
UDP DL          758             803
UDP UL          420             450

-Rajkumar

Rajkumar Manoharan (9):
  ath10k: speedup htt rx descriptor processing for tx completion
  ath10k: copy tx fetch indication message
  ath10k: remove unused fw_desc processing
  ath10k: cleanup amsdu processing for rx indication
  ath10k: speedup htt rx descriptor processing for rx_ind
  ath10k: register ath10k_htt_htc_t2h_msg_handler
  ath10k: cleanup copy engine receive next completion
  ath10k: reuse copy engine 5 (htt rx) descriptors
  ath10k: combine txrx and replenish task

 drivers/net/wireless/ath/ath10k/ce.c     |  44 ++---
 drivers/net/wireless/ath/ath10k/ce.h     |  13 +-
 drivers/net/wireless/ath/ath10k/htt.c    |   2 +-
 drivers/net/wireless/ath/ath10k/htt.h    |  25 ++-
 drivers/net/wireless/ath/ath10k/htt_rx.c | 301 ++++++++++++-------------------
 drivers/net/wireless/ath/ath10k/htt_tx.c |  14 +-
 drivers/net/wireless/ath/ath10k/pci.c    | 106 ++++++++---
 drivers/net/wireless/ath/ath10k/txrx.c   |  12 +-
 8 files changed, 260 insertions(+), 257 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/9] ath10k: speedup htt rx descriptor processing for tx completion
  2016-03-22 11:52 ` Rajkumar Manoharan
@ 2016-03-22 11:52   ` Rajkumar Manoharan
  -1 siblings, 0 replies; 33+ messages in thread
From: Rajkumar Manoharan @ 2016-03-22 11:52 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, rmanohar, Rajkumar Manoharan

To optimize CPU usage, htt rx descriptors will be reused instead of being
refilled for the htt rx copy engine (CE5). To support that, all htt rx
indications should be processed in the same context. A FIFO queue is used to
maintain the tx completion status for each msdu; this helps to retain
the order of tx completions.
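
For context, a minimal standalone sketch of the kfifo pattern this patch
introduces; the types mirror the new struct htt_tx_done below, while
process_tx_done() is a hypothetical stand-in for ath10k_txrx_tx_unref():

#include <linux/kfifo.h>
#include <linux/log2.h>

struct htt_tx_done {
	u16 msdu_id;
	u16 status;
};

/* single producer (CE/pci tasklet) and single consumer (txrx tasklet),
 * so kfifo_put()/kfifo_get() need no extra locking */
static DECLARE_KFIFO_PTR(txdone_fifo, struct htt_tx_done);

static int txdone_fifo_init(unsigned int max_pending)
{
	/* kfifo depth must be a power of two */
	return kfifo_alloc(&txdone_fifo, roundup_pow_of_two(max_pending),
			   GFP_KERNEL);
}

static void txdone_produce(struct htt_tx_done tx_done)
{
	/* on overrun, fall back to handling the completion inline */
	if (!kfifo_put(&txdone_fifo, tx_done))
		process_tx_done(&tx_done);
}

static void txdone_consume(void)
{
	struct htt_tx_done tx_done;

	/* entries come out in insertion order, preserving completion order */
	while (kfifo_get(&txdone_fifo, &tx_done))
		process_tx_done(&tx_done);
}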

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/htt.h    | 18 ++++++++---
 drivers/net/wireless/ath/ath10k/htt_rx.c | 55 +++++++++++++++++++-------------
 drivers/net/wireless/ath/ath10k/htt_tx.c | 14 +++++++-
 drivers/net/wireless/ath/ath10k/txrx.c   | 12 +++----
 4 files changed, 64 insertions(+), 35 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index d196bcc..76c4bae 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -22,6 +22,7 @@
 #include <linux/interrupt.h>
 #include <linux/dmapool.h>
 #include <linux/hashtable.h>
+#include <linux/kfifo.h>
 #include <net/mac80211.h>
 
 #include "htc.h"
@@ -1526,10 +1527,15 @@ struct htt_resp {
 /*** host side structures follow ***/
 
 struct htt_tx_done {
-	u32 msdu_id;
-	bool discard;
-	bool no_ack;
-	bool success;
+	u16 msdu_id;
+	u16 status;
+};
+
+enum htt_tx_compl_state {
+	HTT_TX_COMPL_STATE_NONE,
+	HTT_TX_COMPL_STATE_ACK,
+	HTT_TX_COMPL_STATE_NOACK,
+	HTT_TX_COMPL_STATE_DISCARD,
 };
 
 struct htt_peer_map_event {
@@ -1650,6 +1656,9 @@ struct ath10k_htt {
 	struct idr pending_tx;
 	wait_queue_head_t empty_tx_wq;
 
+	/* FIFO for storing tx done status {ack, no-ack, discard} and msdu id */
+	DECLARE_KFIFO_PTR(txdone_fifo, struct htt_tx_done);
+
 	/* set if host-fw communication goes haywire
 	 * used to avoid further failures */
 	bool rx_confused;
@@ -1658,7 +1667,6 @@ struct ath10k_htt {
 	/* This is used to group tx/rx completions separately and process them
 	 * in batches to reduce cache stalls */
 	struct tasklet_struct txrx_compl_task;
-	struct sk_buff_head tx_compl_q;
 	struct sk_buff_head rx_compl_q;
 	struct sk_buff_head rx_in_ord_compl_q;
 	struct sk_buff_head tx_fetch_ind_q;
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 2da8ccf..855ff4a 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -226,7 +226,6 @@ void ath10k_htt_rx_free(struct ath10k_htt *htt)
 	tasklet_kill(&htt->rx_replenish_task);
 	tasklet_kill(&htt->txrx_compl_task);
 
-	skb_queue_purge(&htt->tx_compl_q);
 	skb_queue_purge(&htt->rx_compl_q);
 	skb_queue_purge(&htt->rx_in_ord_compl_q);
 	skb_queue_purge(&htt->tx_fetch_ind_q);
@@ -567,7 +566,6 @@ int ath10k_htt_rx_alloc(struct ath10k_htt *htt)
 	tasklet_init(&htt->rx_replenish_task, ath10k_htt_rx_replenish_task,
 		     (unsigned long)htt);
 
-	skb_queue_head_init(&htt->tx_compl_q);
 	skb_queue_head_init(&htt->rx_compl_q);
 	skb_queue_head_init(&htt->rx_in_ord_compl_q);
 	skb_queue_head_init(&htt->tx_fetch_ind_q);
@@ -1678,7 +1676,7 @@ static void ath10k_htt_rx_frag_handler(struct ath10k_htt *htt,
 	}
 }
 
-static void ath10k_htt_rx_frm_tx_compl(struct ath10k *ar,
+static void ath10k_htt_rx_tx_compl_ind(struct ath10k *ar,
 				       struct sk_buff *skb)
 {
 	struct ath10k_htt *htt = &ar->htt;
@@ -1690,19 +1688,19 @@ static void ath10k_htt_rx_frm_tx_compl(struct ath10k *ar,
 
 	switch (status) {
 	case HTT_DATA_TX_STATUS_NO_ACK:
-		tx_done.no_ack = true;
+		tx_done.status = HTT_TX_COMPL_STATE_NOACK;
 		break;
 	case HTT_DATA_TX_STATUS_OK:
-		tx_done.success = true;
+		tx_done.status = HTT_TX_COMPL_STATE_ACK;
 		break;
 	case HTT_DATA_TX_STATUS_DISCARD:
 	case HTT_DATA_TX_STATUS_POSTPONE:
 	case HTT_DATA_TX_STATUS_DOWNLOAD_FAIL:
-		tx_done.discard = true;
+		tx_done.status = HTT_TX_COMPL_STATE_DISCARD;
 		break;
 	default:
 		ath10k_warn(ar, "unhandled tx completion status %d\n", status);
-		tx_done.discard = true;
+		tx_done.status = HTT_TX_COMPL_STATE_DISCARD;
 		break;
 	}
 
@@ -1712,7 +1710,20 @@ static void ath10k_htt_rx_frm_tx_compl(struct ath10k *ar,
 	for (i = 0; i < resp->data_tx_completion.num_msdus; i++) {
 		msdu_id = resp->data_tx_completion.msdus[i];
 		tx_done.msdu_id = __le16_to_cpu(msdu_id);
-		ath10k_txrx_tx_unref(htt, &tx_done);
+
+		/* kfifo_put: In practice firmware shouldn't fire off per-CE
+		 * interrupt and main interrupt (MSI/-X range case) for the same
+		 * HTC service so it should be safe to use kfifo_put w/o lock.
+		 *
+		 * From kfifo_put() documentation:
+		 *  Note that with only one concurrent reader and one concurrent
+		 *  writer, you don't need extra locking to use these macro.
+		 */
+		if (!kfifo_put(&htt->txdone_fifo, tx_done)) {
+			ath10k_warn(ar, "txdone fifo overrun, msdu_id %d status %d\n",
+				    tx_done.msdu_id, tx_done.status);
+			ath10k_txrx_tx_unref(htt, &tx_done);
+		}
 	}
 }
 
@@ -2338,19 +2349,18 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 	case HTT_T2H_MSG_TYPE_MGMT_TX_COMPLETION: {
 		struct htt_tx_done tx_done = {};
 		int status = __le32_to_cpu(resp->mgmt_tx_completion.status);
-
 		tx_done.msdu_id =
 			__le32_to_cpu(resp->mgmt_tx_completion.desc_id);
 
 		switch (status) {
 		case HTT_MGMT_TX_STATUS_OK:
-			tx_done.success = true;
+			tx_done.status = HTT_TX_COMPL_STATE_ACK;
 			break;
 		case HTT_MGMT_TX_STATUS_RETRY:
-			tx_done.no_ack = true;
+			tx_done.status = HTT_TX_COMPL_STATE_NOACK;
 			break;
 		case HTT_MGMT_TX_STATUS_DROP:
-			tx_done.discard = true;
+			tx_done.status = HTT_TX_COMPL_STATE_DISCARD;
 			break;
 		}
 
@@ -2364,9 +2374,9 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 		break;
 	}
 	case HTT_T2H_MSG_TYPE_TX_COMPL_IND:
-		skb_queue_tail(&htt->tx_compl_q, skb);
+		ath10k_htt_rx_tx_compl_ind(htt->ar, skb);
 		tasklet_schedule(&htt->txrx_compl_task);
-		return;
+		break;
 	case HTT_T2H_MSG_TYPE_SEC_IND: {
 		struct ath10k *ar = htt->ar;
 		struct htt_security_indication *ev = &resp->security_indication;
@@ -2475,7 +2485,7 @@ static void ath10k_htt_txrx_compl_task(unsigned long ptr)
 {
 	struct ath10k_htt *htt = (struct ath10k_htt *)ptr;
 	struct ath10k *ar = htt->ar;
-	struct sk_buff_head tx_q;
+	struct htt_tx_done tx_done = {};
 	struct sk_buff_head rx_q;
 	struct sk_buff_head rx_ind_q;
 	struct sk_buff_head tx_ind_q;
@@ -2483,14 +2493,10 @@ static void ath10k_htt_txrx_compl_task(unsigned long ptr)
 	struct sk_buff *skb;
 	unsigned long flags;
 
-	__skb_queue_head_init(&tx_q);
 	__skb_queue_head_init(&rx_q);
 	__skb_queue_head_init(&rx_ind_q);
 	__skb_queue_head_init(&tx_ind_q);
 
-	spin_lock_irqsave(&htt->tx_compl_q.lock, flags);
-	skb_queue_splice_init(&htt->tx_compl_q, &tx_q);
-	spin_unlock_irqrestore(&htt->tx_compl_q.lock, flags);
 
 	spin_lock_irqsave(&htt->rx_compl_q.lock, flags);
 	skb_queue_splice_init(&htt->rx_compl_q, &rx_q);
@@ -2504,10 +2510,13 @@ static void ath10k_htt_txrx_compl_task(unsigned long ptr)
 	skb_queue_splice_init(&htt->tx_fetch_ind_q, &tx_ind_q);
 	spin_unlock_irqrestore(&htt->tx_fetch_ind_q.lock, flags);
 
-	while ((skb = __skb_dequeue(&tx_q))) {
-		ath10k_htt_rx_frm_tx_compl(htt->ar, skb);
-		dev_kfree_skb_any(skb);
-	}
+	/* kfifo_get: called only within txrx_tasklet so it's neatly serialized.
+	 * From kfifo_get() documentation:
+	 *  Note that with only one concurrent reader and one concurrent writer,
+	 *  you don't need extra locking to use these macro.
+	 */
+	while (kfifo_get(&htt->txdone_fifo, &tx_done))
+		ath10k_txrx_tx_unref(htt, &tx_done);
 
 	while ((skb = __skb_dequeue(&tx_ind_q))) {
 		ath10k_htt_rx_tx_fetch_ind(ar, skb);
diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index b2ae122..9baa2e6 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -339,8 +339,18 @@ int ath10k_htt_tx_alloc(struct ath10k_htt *htt)
 		goto free_frag_desc;
 	}
 
+	size = roundup_pow_of_two(htt->max_num_pending_tx);
+	ret = kfifo_alloc(&htt->txdone_fifo, size, GFP_KERNEL);
+	if (ret) {
+		ath10k_err(ar, "failed to alloc txdone fifo: %d\n", ret);
+		goto free_txq;
+	}
+
 	return 0;
 
+free_txq:
+	ath10k_htt_tx_free_txq(htt);
+
 free_frag_desc:
 	ath10k_htt_tx_free_cont_frag_desc(htt);
 
@@ -364,8 +374,8 @@ static int ath10k_htt_tx_clean_up_pending(int msdu_id, void *skb, void *ctx)
 
 	ath10k_dbg(ar, ATH10K_DBG_HTT, "force cleanup msdu_id %hu\n", msdu_id);
 
-	tx_done.discard = 1;
 	tx_done.msdu_id = msdu_id;
+	tx_done.status = HTT_TX_COMPL_STATE_DISCARD;
 
 	ath10k_txrx_tx_unref(htt, &tx_done);
 
@@ -388,6 +398,8 @@ void ath10k_htt_tx_free(struct ath10k_htt *htt)
 
 	ath10k_htt_tx_free_txq(htt);
 	ath10k_htt_tx_free_cont_frag_desc(htt);
+	WARN_ON(!kfifo_is_empty(&htt->txdone_fifo));
+	kfifo_free(&htt->txdone_fifo);
 }
 
 void ath10k_htt_htc_tx_complete(struct ath10k *ar, struct sk_buff *skb)
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
index 48e26cd..9369411 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -61,9 +61,8 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
 	struct sk_buff *msdu;
 
 	ath10k_dbg(ar, ATH10K_DBG_HTT,
-		   "htt tx completion msdu_id %u discard %d no_ack %d success %d\n",
-		   tx_done->msdu_id, !!tx_done->discard,
-		   !!tx_done->no_ack, !!tx_done->success);
+		   "htt tx completion msdu_id %u status %d\n",
+		   tx_done->msdu_id, tx_done->status);
 
 	if (tx_done->msdu_id >= htt->max_num_pending_tx) {
 		ath10k_warn(ar, "warning: msdu_id %d too big, ignoring\n",
@@ -101,7 +100,7 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
 	memset(&info->status, 0, sizeof(info->status));
 	trace_ath10k_txrx_tx_unref(ar, tx_done->msdu_id);
 
-	if (tx_done->discard) {
+	if (tx_done->status == HTT_TX_COMPL_STATE_DISCARD) {
 		ieee80211_free_txskb(htt->ar->hw, msdu);
 		return 0;
 	}
@@ -109,10 +108,11 @@ int ath10k_txrx_tx_unref(struct ath10k_htt *htt,
 	if (!(info->flags & IEEE80211_TX_CTL_NO_ACK))
 		info->flags |= IEEE80211_TX_STAT_ACK;
 
-	if (tx_done->no_ack)
+	if (tx_done->status == HTT_TX_COMPL_STATE_NOACK)
 		info->flags &= ~IEEE80211_TX_STAT_ACK;
 
-	if (tx_done->success && (info->flags & IEEE80211_TX_CTL_NO_ACK))
+	if ((tx_done->status == HTT_TX_COMPL_STATE_ACK) &&
+	    (info->flags & IEEE80211_TX_CTL_NO_ACK))
 		info->flags |= IEEE80211_TX_STAT_NOACK_TRANSMITTED;
 
 	ieee80211_tx_status(htt->ar->hw, msdu);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/9] ath10k: copy tx fetch indication message
  2016-03-22 11:52 ` Rajkumar Manoharan
@ 2016-03-22 11:52   ` Rajkumar Manoharan
  -1 siblings, 0 replies; 33+ messages in thread
From: Rajkumar Manoharan @ 2016-03-22 11:52 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, rmanohar, Rajkumar Manoharan

To optimize CPU usage, htt rx descriptors will be reused instead of being
refilled for the htt rx copy engine (CE5). To support that, all htt rx
indications should be processed in the same context. Instead of queueing the
actual indication message, queue a copy of the message for txrx processing.
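
As a minimal sketch of the idea (the helper below is hypothetical, not part
of this patch): skb_copy() duplicates the message data, so the original rx
buffer can be handed back to CE5 as soon as the handler returns; skb_clone()
would not be enough here because it shares the underlying data buffer:

static int ath10k_htt_defer_t2h_msg(struct ath10k_htt *htt,
				    struct sk_buff_head *q,
				    struct sk_buff *skb)
{
	/* deep copy: the deferred message must own its data because the
	 * original buffer is about to be reposted to the rx ring */
	struct sk_buff *copy = skb_copy(skb, GFP_ATOMIC);

	if (!copy)
		return -ENOMEM;

	skb_queue_tail(q, copy);
	tasklet_schedule(&htt->txrx_compl_task);
	return 0;
}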

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/htt_rx.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 855ff4a..b888e3a 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -2449,10 +2449,17 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 	}
 	case HTT_T2H_MSG_TYPE_AGGR_CONF:
 		break;
-	case HTT_T2H_MSG_TYPE_TX_FETCH_IND:
-		skb_queue_tail(&htt->tx_fetch_ind_q, skb);
+	case HTT_T2H_MSG_TYPE_TX_FETCH_IND: {
+		struct sk_buff *tx_fetch_ind = skb_copy(skb, GFP_ATOMIC);
+
+		if (!tx_fetch_ind) {
+			ath10k_warn(ar, "failed to copy htt tx fetch ind\n");
+			break;
+		}
+		skb_queue_tail(&htt->tx_fetch_ind_q, tx_fetch_ind);
 		tasklet_schedule(&htt->txrx_compl_task);
-		return;
+		break;
+	}
 	case HTT_T2H_MSG_TYPE_TX_FETCH_CONFIRM:
 		ath10k_htt_rx_tx_fetch_confirm(ar, skb);
 		break;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/9] ath10k: remove unused fw_desc processing
  2016-03-22 11:52 ` Rajkumar Manoharan
@ 2016-03-22 11:52   ` Rajkumar Manoharan
  -1 siblings, 0 replies; 33+ messages in thread
From: Rajkumar Manoharan @ 2016-03-22 11:52 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, rmanohar, Rajkumar Manoharan

The fw descriptor was never used and probably never will be. It makes
little sense to maintain support for it. Remove it and simplify rx
processing. This will make it easier to optimize rx processing later
as well.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/htt_rx.c | 65 +-------------------------------
 1 file changed, 2 insertions(+), 63 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index b888e3a..abb712f 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -281,7 +281,6 @@ static inline struct sk_buff *ath10k_htt_rx_netbuf_pop(struct ath10k_htt *htt)
 
 /* return: < 0 fatal error, 0 - non chained msdu, 1 chained msdu */
 static int ath10k_htt_rx_amsdu_pop(struct ath10k_htt *htt,
-				   u8 **fw_desc, int *fw_desc_len,
 				   struct sk_buff_head *amsdu)
 {
 	struct ath10k *ar = htt->ar;
@@ -323,48 +322,6 @@ static int ath10k_htt_rx_amsdu_pop(struct ath10k_htt *htt,
 			return -EIO;
 		}
 
-		/*
-		 * Copy the FW rx descriptor for this MSDU from the rx
-		 * indication message into the MSDU's netbuf. HL uses the
-		 * same rx indication message definition as LL, and simply
-		 * appends new info (fields from the HW rx desc, and the
-		 * MSDU payload itself). So, the offset into the rx
-		 * indication message only has to account for the standard
-		 * offset of the per-MSDU FW rx desc info within the
-		 * message, and how many bytes of the per-MSDU FW rx desc
-		 * info have already been consumed. (And the endianness of
-		 * the host, since for a big-endian host, the rx ind
-		 * message contents, including the per-MSDU rx desc bytes,
-		 * were byteswapped during upload.)
-		 */
-		if (*fw_desc_len > 0) {
-			rx_desc->fw_desc.info0 = **fw_desc;
-			/*
-			 * The target is expected to only provide the basic
-			 * per-MSDU rx descriptors. Just to be sure, verify
-			 * that the target has not attached extension data
-			 * (e.g. LRO flow ID).
-			 */
-
-			/* or more, if there's extension data */
-			(*fw_desc)++;
-			(*fw_desc_len)--;
-		} else {
-			/*
-			 * When an oversized AMSDU happened, FW will lost
-			 * some of MSDU status - in this case, the FW
-			 * descriptors provided will be less than the
-			 * actual MSDUs inside this MPDU. Mark the FW
-			 * descriptors so that it will still deliver to
-			 * upper stack, if no CRC error for this MPDU.
-			 *
-			 * FIX THIS - the FW descriptors are actually for
-			 * MSDUs in the end of this A-MSDU instead of the
-			 * beginning.
-			 */
-			rx_desc->fw_desc.info0 = 0;
-		}
-
 		msdu_len_invalid = !!(__le32_to_cpu(rx_desc->attention.flags)
 					& (RX_ATTENTION_FLAGS_MPDU_LENGTH_ERR |
 					   RX_ATTENTION_FLAGS_MSDU_LENGTH_ERR));
@@ -1579,8 +1536,6 @@ static void ath10k_htt_rx_handler(struct ath10k_htt *htt,
 	struct htt_rx_indication_mpdu_range *mpdu_ranges;
 	struct sk_buff_head amsdu;
 	int num_mpdu_ranges;
-	int fw_desc_len;
-	u8 *fw_desc;
 	int i, ret, mpdu_count = 0;
 
 	lockdep_assert_held(&htt->rx_ring.lock);
@@ -1588,9 +1543,6 @@ static void ath10k_htt_rx_handler(struct ath10k_htt *htt,
 	if (htt->rx_confused)
 		return;
 
-	fw_desc_len = __le16_to_cpu(rx->prefix.fw_rx_desc_bytes);
-	fw_desc = (u8 *)&rx->fw_desc;
-
 	num_mpdu_ranges = MS(__le32_to_cpu(rx->hdr.info1),
 			     HTT_RX_INDICATION_INFO1_NUM_MPDU_RANGES);
 	mpdu_ranges = htt_rx_ind_get_mpdu_ranges(rx);
@@ -1605,8 +1557,7 @@ static void ath10k_htt_rx_handler(struct ath10k_htt *htt,
 
 	while (mpdu_count--) {
 		__skb_queue_head_init(&amsdu);
-		ret = ath10k_htt_rx_amsdu_pop(htt, &fw_desc,
-					      &fw_desc_len, &amsdu);
+		ret = ath10k_htt_rx_amsdu_pop(htt, &amsdu);
 		if (ret < 0) {
 			ath10k_warn(ar, "rx ring became corrupted: %d\n", ret);
 			__skb_queue_purge(&amsdu);
@@ -1634,17 +1585,11 @@ static void ath10k_htt_rx_frag_handler(struct ath10k_htt *htt,
 	struct ieee80211_rx_status *rx_status = &htt->rx_status;
 	struct sk_buff_head amsdu;
 	int ret;
-	u8 *fw_desc;
-	int fw_desc_len;
-
-	fw_desc_len = __le16_to_cpu(frag->fw_rx_desc_bytes);
-	fw_desc = (u8 *)frag->fw_msdu_rx_desc;
 
 	__skb_queue_head_init(&amsdu);
 
 	spin_lock_bh(&htt->rx_ring.lock);
-	ret = ath10k_htt_rx_amsdu_pop(htt, &fw_desc, &fw_desc_len,
-				      &amsdu);
+	ret = ath10k_htt_rx_amsdu_pop(htt, &amsdu);
 	spin_unlock_bh(&htt->rx_ring.lock);
 
 	tasklet_schedule(&htt->rx_replenish_task);
@@ -1668,12 +1613,6 @@ static void ath10k_htt_rx_frag_handler(struct ath10k_htt *htt,
 	ath10k_htt_rx_h_filter(ar, &amsdu, rx_status);
 	ath10k_htt_rx_h_mpdu(ar, &amsdu, rx_status);
 	ath10k_htt_rx_h_deliver(ar, &amsdu, rx_status);
-
-	if (fw_desc_len > 0) {
-		ath10k_dbg(ar, ATH10K_DBG_HTT,
-			   "expecting more fragmented rx in one indication %d\n",
-			   fw_desc_len);
-	}
 }
 
 static void ath10k_htt_rx_tx_compl_ind(struct ath10k *ar,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/9] ath10k: cleanup amsdu processing for rx indication
  2016-03-22 11:52 ` Rajkumar Manoharan
@ 2016-03-22 11:52   ` Rajkumar Manoharan
  -1 siblings, 0 replies; 33+ messages in thread
From: Rajkumar Manoharan @ 2016-03-22 11:52 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, rmanohar, Rajkumar Manoharan

Make the amsdu handlers (i.e. amsdu_pop and the rx_h_* handlers) common to
both the rx_ind and frag_ind htt events. It is sufficient to hold the rx_ring
lock for amsdu_pop alone; there is no need to hold it until the packets are
delivered to mac80211. This helps to reduce rx_lock contention as well.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/htt_rx.c | 100 +++++++++++++------------------
 1 file changed, 41 insertions(+), 59 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index abb712f..3fdbfee 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -1528,20 +1528,49 @@ static void ath10k_htt_rx_h_filter(struct ath10k *ar,
 	__skb_queue_purge(amsdu);
 }
 
+static int ath10k_htt_rx_handle_amsdu(struct ath10k_htt *htt)
+{
+	struct ath10k *ar = htt->ar;
+	static struct ieee80211_rx_status rx_status;
+	struct sk_buff_head amsdu;
+	int ret;
+
+	__skb_queue_head_init(&amsdu);
+
+	spin_lock_bh(&htt->rx_ring.lock);
+	if (htt->rx_confused) {
+		spin_unlock_bh(&htt->rx_ring.lock);
+		return -EIO;
+	}
+	ret = ath10k_htt_rx_amsdu_pop(htt, &amsdu);
+	spin_unlock_bh(&htt->rx_ring.lock);
+
+	if (ret < 0) {
+		ath10k_warn(ar, "rx ring became corrupted: %d\n", ret);
+		__skb_queue_purge(&amsdu);
+		/* FIXME: It's probably a good idea to reboot the
+		 * device instead of leaving it inoperable.
+		 */
+		htt->rx_confused = true;
+		return ret;
+	}
+
+	ath10k_htt_rx_h_ppdu(ar, &amsdu, &rx_status, 0xffff);
+	ath10k_htt_rx_h_unchain(ar, &amsdu, ret > 0);
+	ath10k_htt_rx_h_filter(ar, &amsdu, &rx_status);
+	ath10k_htt_rx_h_mpdu(ar, &amsdu, &rx_status);
+	ath10k_htt_rx_h_deliver(ar, &amsdu, &rx_status);
+
+	return 0;
+}
+
 static void ath10k_htt_rx_handler(struct ath10k_htt *htt,
 				  struct htt_rx_indication *rx)
 {
 	struct ath10k *ar = htt->ar;
-	struct ieee80211_rx_status *rx_status = &htt->rx_status;
 	struct htt_rx_indication_mpdu_range *mpdu_ranges;
-	struct sk_buff_head amsdu;
 	int num_mpdu_ranges;
-	int i, ret, mpdu_count = 0;
-
-	lockdep_assert_held(&htt->rx_ring.lock);
-
-	if (htt->rx_confused)
-		return;
+	int i, mpdu_count = 0;
 
 	num_mpdu_ranges = MS(__le32_to_cpu(rx->hdr.info1),
 			     HTT_RX_INDICATION_INFO1_NUM_MPDU_RANGES);
@@ -1556,63 +1585,18 @@ static void ath10k_htt_rx_handler(struct ath10k_htt *htt,
 		mpdu_count += mpdu_ranges[i].mpdu_count;
 
 	while (mpdu_count--) {
-		__skb_queue_head_init(&amsdu);
-		ret = ath10k_htt_rx_amsdu_pop(htt, &amsdu);
-		if (ret < 0) {
-			ath10k_warn(ar, "rx ring became corrupted: %d\n", ret);
-			__skb_queue_purge(&amsdu);
-			/* FIXME: It's probably a good idea to reboot the
-			 * device instead of leaving it inoperable.
-			 */
-			htt->rx_confused = true;
+		if (ath10k_htt_rx_handle_amsdu(htt) < 0)
 			break;
-		}
-
-		ath10k_htt_rx_h_ppdu(ar, &amsdu, rx_status, 0xffff);
-		ath10k_htt_rx_h_unchain(ar, &amsdu, ret > 0);
-		ath10k_htt_rx_h_filter(ar, &amsdu, rx_status);
-		ath10k_htt_rx_h_mpdu(ar, &amsdu, rx_status);
-		ath10k_htt_rx_h_deliver(ar, &amsdu, rx_status);
 	}
 
 	tasklet_schedule(&htt->rx_replenish_task);
 }
 
-static void ath10k_htt_rx_frag_handler(struct ath10k_htt *htt,
-				       struct htt_rx_fragment_indication *frag)
+static void ath10k_htt_rx_frag_handler(struct ath10k_htt *htt)
 {
-	struct ath10k *ar = htt->ar;
-	struct ieee80211_rx_status *rx_status = &htt->rx_status;
-	struct sk_buff_head amsdu;
-	int ret;
-
-	__skb_queue_head_init(&amsdu);
-
-	spin_lock_bh(&htt->rx_ring.lock);
-	ret = ath10k_htt_rx_amsdu_pop(htt, &amsdu);
-	spin_unlock_bh(&htt->rx_ring.lock);
+	ath10k_htt_rx_handle_amsdu(htt);
 
 	tasklet_schedule(&htt->rx_replenish_task);
-
-	ath10k_dbg(ar, ATH10K_DBG_HTT_DUMP, "htt rx frag ahead\n");
-
-	if (ret) {
-		ath10k_warn(ar, "failed to pop amsdu from httr rx ring for fragmented rx %d\n",
-			    ret);
-		__skb_queue_purge(&amsdu);
-		return;
-	}
-
-	if (skb_queue_len(&amsdu) != 1) {
-		ath10k_warn(ar, "failed to pop frag amsdu: too many msdus\n");
-		__skb_queue_purge(&amsdu);
-		return;
-	}
-
-	ath10k_htt_rx_h_ppdu(ar, &amsdu, rx_status, 0xffff);
-	ath10k_htt_rx_h_filter(ar, &amsdu, rx_status);
-	ath10k_htt_rx_h_mpdu(ar, &amsdu, rx_status);
-	ath10k_htt_rx_h_deliver(ar, &amsdu, rx_status);
 }
 
 static void ath10k_htt_rx_tx_compl_ind(struct ath10k *ar,
@@ -2331,7 +2315,7 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 	case HTT_T2H_MSG_TYPE_RX_FRAG_IND: {
 		ath10k_dbg_dump(ar, ATH10K_DBG_HTT_DUMP, NULL, "htt event: ",
 				skb->data, skb->len);
-		ath10k_htt_rx_frag_handler(htt, &resp->rx_frag_ind);
+		ath10k_htt_rx_frag_handler(htt);
 		break;
 	}
 	case HTT_T2H_MSG_TYPE_TEST:
@@ -2473,9 +2457,7 @@ static void ath10k_htt_txrx_compl_task(unsigned long ptr)
 
 	while ((skb = __skb_dequeue(&rx_q))) {
 		resp = (struct htt_resp *)skb->data;
-		spin_lock_bh(&htt->rx_ring.lock);
 		ath10k_htt_rx_handler(htt, &resp->rx_ind);
-		spin_unlock_bh(&htt->rx_ring.lock);
 		dev_kfree_skb_any(skb);
 	}
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 5/9] ath10k: speedup htt rx descriptor processing for rx_ind
  2016-03-22 11:52 ` Rajkumar Manoharan
@ 2016-03-22 11:52   ` Rajkumar Manoharan
  -1 siblings, 0 replies; 33+ messages in thread
From: Rajkumar Manoharan @ 2016-03-22 11:52 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, rmanohar, Rajkumar Manoharan

In a follow-up patch, htt rx descriptors will be reused instead of being
deallocated and refilled. To achieve that, htt rx indication messages should
not be deferred and should be processed in the pci tasklet itself. Also, only
mpdu_count is used from the rx indication message, so it is maintained as an
atomic variable and all rx amsdu handlers are processed from the txrx
tasklet. This change gets rid of the rx_compl_q usage.
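
A condensed view of the handoff this patch introduces (simplified from the
diff below); draining a snapshot via atomic_read()/atomic_sub() leaves any
MPDUs counted after the snapshot for the next run of the tasklet:

/* pci/CE context: only account for the work, do not process it here */
atomic_add(mpdu_count, &htt->num_mpdus_ready);
tasklet_schedule(&htt->txrx_compl_task);

/* txrx tasklet: drain what was pending when the tasklet started */
num_mpdus = atomic_read(&htt->num_mpdus_ready);
atomic_sub(num_mpdus, &htt->num_mpdus_ready);

while (num_mpdus--)
	if (ath10k_htt_rx_handle_amsdu(htt))
		break;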

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/htt.h    |  1 +
 drivers/net/wireless/ath/ath10k/htt_rx.c | 39 +++++++++++++++-----------------
 2 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index 76c4bae..27a65ec 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -1663,6 +1663,7 @@ struct ath10k_htt {
 	 * used to avoid further failures */
 	bool rx_confused;
 	struct tasklet_struct rx_replenish_task;
+	atomic_t num_mpdus_ready;
 
 	/* This is used to group tx/rx completions separately and process them
 	 * in batches to reduce cache stalls */
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 3fdbfee..dde2237 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -526,6 +526,7 @@ int ath10k_htt_rx_alloc(struct ath10k_htt *htt)
 	skb_queue_head_init(&htt->rx_compl_q);
 	skb_queue_head_init(&htt->rx_in_ord_compl_q);
 	skb_queue_head_init(&htt->tx_fetch_ind_q);
+	atomic_set(&htt->num_mpdus_ready, 0);
 
 	tasklet_init(&htt->txrx_compl_task, ath10k_htt_txrx_compl_task,
 		     (unsigned long)htt);
@@ -1564,8 +1565,8 @@ static int ath10k_htt_rx_handle_amsdu(struct ath10k_htt *htt)
 	return 0;
 }
 
-static void ath10k_htt_rx_handler(struct ath10k_htt *htt,
-				  struct htt_rx_indication *rx)
+static void ath10k_htt_rx_proc_rx_ind(struct ath10k_htt *htt,
+				      struct htt_rx_indication *rx)
 {
 	struct ath10k *ar = htt->ar;
 	struct htt_rx_indication_mpdu_range *mpdu_ranges;
@@ -1584,19 +1585,16 @@ static void ath10k_htt_rx_handler(struct ath10k_htt *htt,
 	for (i = 0; i < num_mpdu_ranges; i++)
 		mpdu_count += mpdu_ranges[i].mpdu_count;
 
-	while (mpdu_count--) {
-		if (ath10k_htt_rx_handle_amsdu(htt) < 0)
-			break;
-	}
+	atomic_add(mpdu_count, &htt->num_mpdus_ready);
 
-	tasklet_schedule(&htt->rx_replenish_task);
+	tasklet_schedule(&htt->txrx_compl_task);
 }
 
 static void ath10k_htt_rx_frag_handler(struct ath10k_htt *htt)
 {
-	ath10k_htt_rx_handle_amsdu(htt);
+	atomic_inc(&htt->num_mpdus_ready);
 
-	tasklet_schedule(&htt->rx_replenish_task);
+	tasklet_schedule(&htt->txrx_compl_task);
 }
 
 static void ath10k_htt_rx_tx_compl_ind(struct ath10k *ar,
@@ -2250,9 +2248,8 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 		break;
 	}
 	case HTT_T2H_MSG_TYPE_RX_IND:
-		skb_queue_tail(&htt->rx_compl_q, skb);
-		tasklet_schedule(&htt->txrx_compl_task);
-		return;
+		ath10k_htt_rx_proc_rx_ind(htt, &resp->rx_ind);
+		break;
 	case HTT_T2H_MSG_TYPE_PEER_MAP: {
 		struct htt_peer_map_event ev = {
 			.vdev_id = resp->peer_map.vdev_id,
@@ -2419,19 +2416,15 @@ static void ath10k_htt_txrx_compl_task(unsigned long ptr)
 	struct sk_buff_head rx_q;
 	struct sk_buff_head rx_ind_q;
 	struct sk_buff_head tx_ind_q;
-	struct htt_resp *resp;
 	struct sk_buff *skb;
 	unsigned long flags;
+	int num_mpdus;
 
 	__skb_queue_head_init(&rx_q);
 	__skb_queue_head_init(&rx_ind_q);
 	__skb_queue_head_init(&tx_ind_q);
 
 
-	spin_lock_irqsave(&htt->rx_compl_q.lock, flags);
-	skb_queue_splice_init(&htt->rx_compl_q, &rx_q);
-	spin_unlock_irqrestore(&htt->rx_compl_q.lock, flags);
-
 	spin_lock_irqsave(&htt->rx_in_ord_compl_q.lock, flags);
 	skb_queue_splice_init(&htt->rx_in_ord_compl_q, &rx_ind_q);
 	spin_unlock_irqrestore(&htt->rx_in_ord_compl_q.lock, flags);
@@ -2455,10 +2448,12 @@ static void ath10k_htt_txrx_compl_task(unsigned long ptr)
 
 	ath10k_mac_tx_push_pending(ar);
 
-	while ((skb = __skb_dequeue(&rx_q))) {
-		resp = (struct htt_resp *)skb->data;
-		ath10k_htt_rx_handler(htt, &resp->rx_ind);
-		dev_kfree_skb_any(skb);
+	num_mpdus = atomic_read(&htt->num_mpdus_ready);
+	atomic_sub(num_mpdus, &htt->num_mpdus_ready);
+
+	while (num_mpdus--) {
+		if (ath10k_htt_rx_handle_amsdu(htt))
+			break;
 	}
 
 	while ((skb = __skb_dequeue(&rx_ind_q))) {
@@ -2467,4 +2462,6 @@ static void ath10k_htt_txrx_compl_task(unsigned long ptr)
 		spin_unlock_bh(&htt->rx_ring.lock);
 		dev_kfree_skb_any(skb);
 	}
+
+	tasklet_schedule(&htt->rx_replenish_task);
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 5/9] ath10k: speedup htt rx descriptor processing for rx_ind
@ 2016-03-22 11:52   ` Rajkumar Manoharan
  0 siblings, 0 replies; 33+ messages in thread
From: Rajkumar Manoharan @ 2016-03-22 11:52 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, Rajkumar Manoharan, rmanohar

In a follow-up patch, htt rx descriptors will be reused instead of
being freed and refilled. To achieve that, htt rx indication messages
must not be deferred and are processed in the pci tasklet itself.
Also, only mpdu_count is used from the rx indication message, so it is
maintained as an atomic variable and all rx amsdu handling is done
from the txrx tasklet. This change gets rid of rx_compl_q usage.
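
A minimal sketch (not part of the patch itself) of the counter hand-off
described above: the interrupt path only accounts for the announced MPDUs
and the txrx tasklet drains them later. The names mirror the driver, but
the surrounding structure is simplified and hypothetical.

#include <linux/atomic.h>

static atomic_t num_mpdus_ready = ATOMIC_INIT(0);

/* interrupt path: account for the MPDUs announced by an rx_ind message */
static void rx_ind_account(int mpdu_count)
{
	atomic_add(mpdu_count, &num_mpdus_ready);
	/* heavy A-MSDU processing is deferred to the txrx tasklet */
}

/* tasklet context: drain whatever has been accounted so far */
static void txrx_drain(void)
{
	int num_mpdus = atomic_read(&num_mpdus_ready);

	atomic_sub(num_mpdus, &num_mpdus_ready);
	while (num_mpdus--)
		;	/* pop and deliver one A-MSDU per iteration */
}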

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/htt.h    |  1 +
 drivers/net/wireless/ath/ath10k/htt_rx.c | 39 +++++++++++++++-----------------
 2 files changed, 19 insertions(+), 21 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index 76c4bae..27a65ec 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -1663,6 +1663,7 @@ struct ath10k_htt {
 	 * used to avoid further failures */
 	bool rx_confused;
 	struct tasklet_struct rx_replenish_task;
+	atomic_t num_mpdus_ready;
 
 	/* This is used to group tx/rx completions separately and process them
 	 * in batches to reduce cache stalls */
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 3fdbfee..dde2237 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -526,6 +526,7 @@ int ath10k_htt_rx_alloc(struct ath10k_htt *htt)
 	skb_queue_head_init(&htt->rx_compl_q);
 	skb_queue_head_init(&htt->rx_in_ord_compl_q);
 	skb_queue_head_init(&htt->tx_fetch_ind_q);
+	atomic_set(&htt->num_mpdus_ready, 0);
 
 	tasklet_init(&htt->txrx_compl_task, ath10k_htt_txrx_compl_task,
 		     (unsigned long)htt);
@@ -1564,8 +1565,8 @@ static int ath10k_htt_rx_handle_amsdu(struct ath10k_htt *htt)
 	return 0;
 }
 
-static void ath10k_htt_rx_handler(struct ath10k_htt *htt,
-				  struct htt_rx_indication *rx)
+static void ath10k_htt_rx_proc_rx_ind(struct ath10k_htt *htt,
+				      struct htt_rx_indication *rx)
 {
 	struct ath10k *ar = htt->ar;
 	struct htt_rx_indication_mpdu_range *mpdu_ranges;
@@ -1584,19 +1585,16 @@ static void ath10k_htt_rx_handler(struct ath10k_htt *htt,
 	for (i = 0; i < num_mpdu_ranges; i++)
 		mpdu_count += mpdu_ranges[i].mpdu_count;
 
-	while (mpdu_count--) {
-		if (ath10k_htt_rx_handle_amsdu(htt) < 0)
-			break;
-	}
+	atomic_add(mpdu_count, &htt->num_mpdus_ready);
 
-	tasklet_schedule(&htt->rx_replenish_task);
+	tasklet_schedule(&htt->txrx_compl_task);
 }
 
 static void ath10k_htt_rx_frag_handler(struct ath10k_htt *htt)
 {
-	ath10k_htt_rx_handle_amsdu(htt);
+	atomic_inc(&htt->num_mpdus_ready);
 
-	tasklet_schedule(&htt->rx_replenish_task);
+	tasklet_schedule(&htt->txrx_compl_task);
 }
 
 static void ath10k_htt_rx_tx_compl_ind(struct ath10k *ar,
@@ -2250,9 +2248,8 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 		break;
 	}
 	case HTT_T2H_MSG_TYPE_RX_IND:
-		skb_queue_tail(&htt->rx_compl_q, skb);
-		tasklet_schedule(&htt->txrx_compl_task);
-		return;
+		ath10k_htt_rx_proc_rx_ind(htt, &resp->rx_ind);
+		break;
 	case HTT_T2H_MSG_TYPE_PEER_MAP: {
 		struct htt_peer_map_event ev = {
 			.vdev_id = resp->peer_map.vdev_id,
@@ -2419,19 +2416,15 @@ static void ath10k_htt_txrx_compl_task(unsigned long ptr)
 	struct sk_buff_head rx_q;
 	struct sk_buff_head rx_ind_q;
 	struct sk_buff_head tx_ind_q;
-	struct htt_resp *resp;
 	struct sk_buff *skb;
 	unsigned long flags;
+	int num_mpdus;
 
 	__skb_queue_head_init(&rx_q);
 	__skb_queue_head_init(&rx_ind_q);
 	__skb_queue_head_init(&tx_ind_q);
 
 
-	spin_lock_irqsave(&htt->rx_compl_q.lock, flags);
-	skb_queue_splice_init(&htt->rx_compl_q, &rx_q);
-	spin_unlock_irqrestore(&htt->rx_compl_q.lock, flags);
-
 	spin_lock_irqsave(&htt->rx_in_ord_compl_q.lock, flags);
 	skb_queue_splice_init(&htt->rx_in_ord_compl_q, &rx_ind_q);
 	spin_unlock_irqrestore(&htt->rx_in_ord_compl_q.lock, flags);
@@ -2455,10 +2448,12 @@ static void ath10k_htt_txrx_compl_task(unsigned long ptr)
 
 	ath10k_mac_tx_push_pending(ar);
 
-	while ((skb = __skb_dequeue(&rx_q))) {
-		resp = (struct htt_resp *)skb->data;
-		ath10k_htt_rx_handler(htt, &resp->rx_ind);
-		dev_kfree_skb_any(skb);
+	num_mpdus = atomic_read(&htt->num_mpdus_ready);
+	atomic_sub(num_mpdus, &htt->num_mpdus_ready);
+
+	while (num_mpdus--) {
+		if (ath10k_htt_rx_handle_amsdu(htt))
+			break;
 	}
 
 	while ((skb = __skb_dequeue(&rx_ind_q))) {
@@ -2467,4 +2462,6 @@ static void ath10k_htt_txrx_compl_task(unsigned long ptr)
 		spin_unlock_bh(&htt->rx_ring.lock);
 		dev_kfree_skb_any(skb);
 	}
+
+	tasklet_schedule(&htt->rx_replenish_task);
 }
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 6/9] ath10k: register ath10k_htt_htc_t2h_msg_handler
  2016-03-22 11:52 ` Rajkumar Manoharan
@ 2016-03-22 11:52   ` Rajkumar Manoharan
  -1 siblings, 0 replies; 33+ messages in thread
From: Rajkumar Manoharan @ 2016-03-22 11:52 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, rmanohar, Rajkumar Manoharan

Except for qca61x4 family chips (qca6164, qca6174), copy engine 5 is
used for receiving target-to-host htt messages. In a follow-up patch,
CE5 descriptors will be reused. In that case the same API can not be
used as the htc layer callback, where the response message is freed at
the end. Hence register a new API for the HTC layer that frees up the
received message, and keep the message handler common for both the HTC
and HIF layers.
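
A condensed sketch of the ownership convention introduced here, using
hypothetical helper names rather than the real driver functions: the core
handler reports whether the caller still owns the buffer, and only the
HTC-facing wrapper frees it.

#include <linux/skbuff.h>

/* returns true when the skb was fully consumed and may be freed by the
 * caller; false when it was queued for deferred processing
 */
static bool t2h_handle(struct sk_buff *skb)
{
	/* decode and dispatch the message here */
	return true;
}

/* HTC endpoint rx callback: releases the buffer when told to */
static void htc_t2h_handle(struct sk_buff *skb)
{
	if (t2h_handle(skb))
		dev_kfree_skb_any(skb);
}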

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/htt.c    |  2 +-
 drivers/net/wireless/ath/ath10k/htt.h    |  3 ++-
 drivers/net/wireless/ath/ath10k/htt_rx.c | 22 +++++++++++++++-------
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt.c b/drivers/net/wireless/ath/ath10k/htt.c
index 7561f22..17a3008 100644
--- a/drivers/net/wireless/ath/ath10k/htt.c
+++ b/drivers/net/wireless/ath/ath10k/htt.c
@@ -149,7 +149,7 @@ int ath10k_htt_connect(struct ath10k_htt *htt)
 	memset(&conn_resp, 0, sizeof(conn_resp));
 
 	conn_req.ep_ops.ep_tx_complete = ath10k_htt_htc_tx_complete;
-	conn_req.ep_ops.ep_rx_complete = ath10k_htt_t2h_msg_handler;
+	conn_req.ep_ops.ep_rx_complete = ath10k_htt_htc_t2h_msg_handler;
 
 	/* connect to control service */
 	conn_req.service_id = ATH10K_HTC_SVC_ID_HTT_DATA_MSG;
diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index 27a65ec..1adcae4 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -1765,7 +1765,8 @@ int ath10k_htt_rx_ring_refill(struct ath10k *ar);
 void ath10k_htt_rx_free(struct ath10k_htt *htt);
 
 void ath10k_htt_htc_tx_complete(struct ath10k *ar, struct sk_buff *skb);
-void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb);
+void ath10k_htt_htc_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb);
+bool ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb);
 int ath10k_htt_h2t_ver_req_msg(struct ath10k_htt *htt);
 int ath10k_htt_h2t_stats_req(struct ath10k_htt *htt, u8 mask, u64 cookie);
 int ath10k_htt_send_frag_desc_bank_cfg(struct ath10k_htt *htt);
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index dde2237..31be0c6 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -2219,7 +2219,18 @@ static inline enum ieee80211_band phy_mode_to_band(u32 phy_mode)
 	return band;
 }
 
-void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
+void ath10k_htt_htc_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
+{
+	bool release;
+
+	release = ath10k_htt_t2h_msg_handler(ar, skb);
+
+	/* Free the indication buffer */
+	if (release)
+		dev_kfree_skb_any(skb);
+}
+
+bool ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 {
 	struct ath10k_htt *htt = &ar->htt;
 	struct htt_resp *resp = (struct htt_resp *)skb->data;
@@ -2235,8 +2246,7 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 	if (resp->hdr.msg_type >= ar->htt.t2h_msg_types_max) {
 		ath10k_dbg(ar, ATH10K_DBG_HTT, "htt rx, unsupported msg_type: 0x%0X\n max: 0x%0X",
 			   resp->hdr.msg_type, ar->htt.t2h_msg_types_max);
-		dev_kfree_skb_any(skb);
-		return;
+		return true;
 	}
 	type = ar->htt.t2h_msg_types[resp->hdr.msg_type];
 
@@ -2352,7 +2362,7 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 	case HTT_T2H_MSG_TYPE_RX_IN_ORD_PADDR_IND: {
 		skb_queue_tail(&htt->rx_in_ord_compl_q, skb);
 		tasklet_schedule(&htt->txrx_compl_task);
-		return;
+		return false;
 	}
 	case HTT_T2H_MSG_TYPE_TX_CREDIT_UPDATE_IND:
 		break;
@@ -2394,9 +2404,7 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 				skb->data, skb->len);
 		break;
 	};
-
-	/* Free the indication buffer */
-	dev_kfree_skb_any(skb);
+	return true;
 }
 EXPORT_SYMBOL(ath10k_htt_t2h_msg_handler);
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 6/9] ath10k: register ath10k_htt_htc_t2h_msg_handler
@ 2016-03-22 11:52   ` Rajkumar Manoharan
  0 siblings, 0 replies; 33+ messages in thread
From: Rajkumar Manoharan @ 2016-03-22 11:52 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, Rajkumar Manoharan, rmanohar

Except for qca61x4 family chips (qca6164, qca6174), copy engine 5 is
used for receiving target-to-host htt messages. In a follow-up patch,
CE5 descriptors will be reused. In that case the same API can not be
used as the htc layer callback, where the response message is freed at
the end. Hence register a new API for the HTC layer that frees up the
received message, and keep the message handler common for both the HTC
and HIF layers.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/htt.c    |  2 +-
 drivers/net/wireless/ath/ath10k/htt.h    |  3 ++-
 drivers/net/wireless/ath/ath10k/htt_rx.c | 22 +++++++++++++++-------
 3 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt.c b/drivers/net/wireless/ath/ath10k/htt.c
index 7561f22..17a3008 100644
--- a/drivers/net/wireless/ath/ath10k/htt.c
+++ b/drivers/net/wireless/ath/ath10k/htt.c
@@ -149,7 +149,7 @@ int ath10k_htt_connect(struct ath10k_htt *htt)
 	memset(&conn_resp, 0, sizeof(conn_resp));
 
 	conn_req.ep_ops.ep_tx_complete = ath10k_htt_htc_tx_complete;
-	conn_req.ep_ops.ep_rx_complete = ath10k_htt_t2h_msg_handler;
+	conn_req.ep_ops.ep_rx_complete = ath10k_htt_htc_t2h_msg_handler;
 
 	/* connect to control service */
 	conn_req.service_id = ATH10K_HTC_SVC_ID_HTT_DATA_MSG;
diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index 27a65ec..1adcae4 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -1765,7 +1765,8 @@ int ath10k_htt_rx_ring_refill(struct ath10k *ar);
 void ath10k_htt_rx_free(struct ath10k_htt *htt);
 
 void ath10k_htt_htc_tx_complete(struct ath10k *ar, struct sk_buff *skb);
-void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb);
+void ath10k_htt_htc_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb);
+bool ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb);
 int ath10k_htt_h2t_ver_req_msg(struct ath10k_htt *htt);
 int ath10k_htt_h2t_stats_req(struct ath10k_htt *htt, u8 mask, u64 cookie);
 int ath10k_htt_send_frag_desc_bank_cfg(struct ath10k_htt *htt);
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index dde2237..31be0c6 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -2219,7 +2219,18 @@ static inline enum ieee80211_band phy_mode_to_band(u32 phy_mode)
 	return band;
 }
 
-void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
+void ath10k_htt_htc_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
+{
+	bool release;
+
+	release = ath10k_htt_t2h_msg_handler(ar, skb);
+
+	/* Free the indication buffer */
+	if (release)
+		dev_kfree_skb_any(skb);
+}
+
+bool ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 {
 	struct ath10k_htt *htt = &ar->htt;
 	struct htt_resp *resp = (struct htt_resp *)skb->data;
@@ -2235,8 +2246,7 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 	if (resp->hdr.msg_type >= ar->htt.t2h_msg_types_max) {
 		ath10k_dbg(ar, ATH10K_DBG_HTT, "htt rx, unsupported msg_type: 0x%0X\n max: 0x%0X",
 			   resp->hdr.msg_type, ar->htt.t2h_msg_types_max);
-		dev_kfree_skb_any(skb);
-		return;
+		return true;
 	}
 	type = ar->htt.t2h_msg_types[resp->hdr.msg_type];
 
@@ -2352,7 +2362,7 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 	case HTT_T2H_MSG_TYPE_RX_IN_ORD_PADDR_IND: {
 		skb_queue_tail(&htt->rx_in_ord_compl_q, skb);
 		tasklet_schedule(&htt->txrx_compl_task);
-		return;
+		return false;
 	}
 	case HTT_T2H_MSG_TYPE_TX_CREDIT_UPDATE_IND:
 		break;
@@ -2394,9 +2404,7 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 				skb->data, skb->len);
 		break;
 	};
-
-	/* Free the indication buffer */
-	dev_kfree_skb_any(skb);
+	return true;
 }
 EXPORT_SYMBOL(ath10k_htt_t2h_msg_handler);
 
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 7/9] ath10k: cleanup copy engine receive next completion
  2016-03-22 11:52 ` Rajkumar Manoharan
@ 2016-03-22 11:52   ` Rajkumar Manoharan
  -1 siblings, 0 replies; 33+ messages in thread
From: Rajkumar Manoharan @ 2016-03-22 11:52 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, rmanohar, Rajkumar Manoharan

The physical address necessary to unmap DMA ('bufferp') is stored
in ath10k_skb_cb as 'paddr'. For diag register read and write
operations, 'paddr' is stored in the transfer context. ath10k doesn't
rely on the meta/transfer_id. So the unused output arguments {bufferp,
transfer_idp and flagsp} are removed from the CE recv_next completion.
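
For reference, a sketch of how a completion-drain loop looks with the
trimmed signature (abridged; the function name is hypothetical and DMA
unmapping/error handling are omitted):

#include <linux/skbuff.h>
#include "ce.h"		/* driver-local: struct ath10k_ce_pipe */

static void drain_recv_completions(struct ath10k_ce_pipe *ce_state)
{
	void *ctx;
	unsigned int nbytes;

	while (ath10k_ce_completed_recv_next(ce_state, &ctx, &nbytes) == 0) {
		struct sk_buff *skb = ctx;

		/* the address for dma_unmap_single() now comes from
		 * ATH10K_SKB_RXCB(skb)->paddr, not from the descriptor
		 */
		skb_put(skb, nbytes);
	}
}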

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/ce.c  | 21 +++--------------
 drivers/net/wireless/ath/ath10k/ce.h  | 10 ++------
 drivers/net/wireless/ath/ath10k/pci.c | 43 ++++++++++++++---------------------
 3 files changed, 22 insertions(+), 52 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index edf3629..d6da404 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -444,14 +444,10 @@ int ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx, u32 paddr)
  */
 int ath10k_ce_completed_recv_next_nolock(struct ath10k_ce_pipe *ce_state,
 					 void **per_transfer_contextp,
-					 u32 *bufferp,
-					 unsigned int *nbytesp,
-					 unsigned int *transfer_idp,
-					 unsigned int *flagsp)
+					 unsigned int *nbytesp)
 {
 	struct ath10k_ce_ring *dest_ring = ce_state->dest_ring;
 	unsigned int nentries_mask = dest_ring->nentries_mask;
-	struct ath10k *ar = ce_state->ar;
 	unsigned int sw_index = dest_ring->sw_index;
 
 	struct ce_desc *base = dest_ring->base_addr_owner_space;
@@ -476,14 +472,7 @@ int ath10k_ce_completed_recv_next_nolock(struct ath10k_ce_pipe *ce_state,
 	desc->nbytes = 0;
 
 	/* Return data from completed destination descriptor */
-	*bufferp = __le32_to_cpu(sdesc.addr);
 	*nbytesp = nbytes;
-	*transfer_idp = MS(__le16_to_cpu(sdesc.flags), CE_DESC_FLAGS_META_DATA);
-
-	if (__le16_to_cpu(sdesc.flags) & CE_DESC_FLAGS_BYTE_SWAP)
-		*flagsp = CE_RECV_FLAG_SWAPPED;
-	else
-		*flagsp = 0;
 
 	if (per_transfer_contextp)
 		*per_transfer_contextp =
@@ -501,10 +490,7 @@ int ath10k_ce_completed_recv_next_nolock(struct ath10k_ce_pipe *ce_state,
 
 int ath10k_ce_completed_recv_next(struct ath10k_ce_pipe *ce_state,
 				  void **per_transfer_contextp,
-				  u32 *bufferp,
-				  unsigned int *nbytesp,
-				  unsigned int *transfer_idp,
-				  unsigned int *flagsp)
+				  unsigned int *nbytesp)
 {
 	struct ath10k *ar = ce_state->ar;
 	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
@@ -513,8 +499,7 @@ int ath10k_ce_completed_recv_next(struct ath10k_ce_pipe *ce_state,
 	spin_lock_bh(&ar_pci->ce_lock);
 	ret = ath10k_ce_completed_recv_next_nolock(ce_state,
 						   per_transfer_contextp,
-						   bufferp, nbytesp,
-						   transfer_idp, flagsp);
+						   nbytesp);
 	spin_unlock_bh(&ar_pci->ce_lock);
 
 	return ret;
diff --git a/drivers/net/wireless/ath/ath10k/ce.h b/drivers/net/wireless/ath/ath10k/ce.h
index dac6768..68717e5 100644
--- a/drivers/net/wireless/ath/ath10k/ce.h
+++ b/drivers/net/wireless/ath/ath10k/ce.h
@@ -177,10 +177,7 @@ int ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx, u32 paddr);
  */
 int ath10k_ce_completed_recv_next(struct ath10k_ce_pipe *ce_state,
 				  void **per_transfer_contextp,
-				  u32 *bufferp,
-				  unsigned int *nbytesp,
-				  unsigned int *transfer_idp,
-				  unsigned int *flagsp);
+				  unsigned int *nbytesp);
 /*
  * Supply data for the next completed unprocessed send descriptor.
  * Pops 1 completed send buffer from Source ring.
@@ -212,10 +209,7 @@ int ath10k_ce_revoke_recv_next(struct ath10k_ce_pipe *ce_state,
 
 int ath10k_ce_completed_recv_next_nolock(struct ath10k_ce_pipe *ce_state,
 					 void **per_transfer_contextp,
-					 u32 *bufferp,
-					 unsigned int *nbytesp,
-					 unsigned int *transfer_idp,
-					 unsigned int *flagsp);
+					 unsigned int *nbytesp);
 
 /*
  * Support clean shutdown by allowing the caller to cancel
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index b3cff1d..290a61a 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -870,10 +870,8 @@ static int ath10k_pci_diag_read_mem(struct ath10k *ar, u32 address, void *data,
 {
 	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
 	int ret = 0;
-	u32 buf;
+	u32 *buf;
 	unsigned int completed_nbytes, orig_nbytes, remaining_bytes;
-	unsigned int id;
-	unsigned int flags;
 	struct ath10k_ce_pipe *ce_diag;
 	/* Host buffer address in CE space */
 	u32 ce_data;
@@ -909,7 +907,7 @@ static int ath10k_pci_diag_read_mem(struct ath10k *ar, u32 address, void *data,
 		nbytes = min_t(unsigned int, remaining_bytes,
 			       DIAG_TRANSFER_LIMIT);
 
-		ret = __ath10k_ce_rx_post_buf(ce_diag, NULL, ce_data);
+		ret = __ath10k_ce_rx_post_buf(ce_diag, &ce_data, ce_data);
 		if (ret != 0)
 			goto done;
 
@@ -940,9 +938,10 @@ static int ath10k_pci_diag_read_mem(struct ath10k *ar, u32 address, void *data,
 		}
 
 		i = 0;
-		while (ath10k_ce_completed_recv_next_nolock(ce_diag, NULL, &buf,
-							    &completed_nbytes,
-							    &id, &flags) != 0) {
+		while (ath10k_ce_completed_recv_next_nolock(ce_diag,
+							    (void **)&buf,
+							    &completed_nbytes)
+								!= 0) {
 			mdelay(1);
 
 			if (i++ > DIAG_ACCESS_CE_TIMEOUT_MS) {
@@ -956,7 +955,7 @@ static int ath10k_pci_diag_read_mem(struct ath10k *ar, u32 address, void *data,
 			goto done;
 		}
 
-		if (buf != ce_data) {
+		if (*buf != ce_data) {
 			ret = -EIO;
 			goto done;
 		}
@@ -1026,10 +1025,8 @@ int ath10k_pci_diag_write_mem(struct ath10k *ar, u32 address,
 {
 	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
 	int ret = 0;
-	u32 buf;
+	u32 *buf;
 	unsigned int completed_nbytes, orig_nbytes, remaining_bytes;
-	unsigned int id;
-	unsigned int flags;
 	struct ath10k_ce_pipe *ce_diag;
 	void *data_buf = NULL;
 	u32 ce_data;	/* Host buffer address in CE space */
@@ -1078,7 +1075,7 @@ int ath10k_pci_diag_write_mem(struct ath10k *ar, u32 address,
 		nbytes = min_t(int, remaining_bytes, DIAG_TRANSFER_LIMIT);
 
 		/* Set up to receive directly into Target(!) address */
-		ret = __ath10k_ce_rx_post_buf(ce_diag, NULL, address);
+		ret = __ath10k_ce_rx_post_buf(ce_diag, &address, address);
 		if (ret != 0)
 			goto done;
 
@@ -1103,9 +1100,10 @@ int ath10k_pci_diag_write_mem(struct ath10k *ar, u32 address,
 		}
 
 		i = 0;
-		while (ath10k_ce_completed_recv_next_nolock(ce_diag, NULL, &buf,
-							    &completed_nbytes,
-							    &id, &flags) != 0) {
+		while (ath10k_ce_completed_recv_next_nolock(ce_diag,
+							    (void **)&buf,
+							    &completed_nbytes)
+								!= 0) {
 			mdelay(1);
 
 			if (i++ > DIAG_ACCESS_CE_TIMEOUT_MS) {
@@ -1119,7 +1117,7 @@ int ath10k_pci_diag_write_mem(struct ath10k *ar, u32 address,
 			goto done;
 		}
 
-		if (buf != address) {
+		if (*buf != address) {
 			ret = -EIO;
 			goto done;
 		}
@@ -1181,15 +1179,11 @@ static void ath10k_pci_process_rx_cb(struct ath10k_ce_pipe *ce_state,
 	struct sk_buff *skb;
 	struct sk_buff_head list;
 	void *transfer_context;
-	u32 ce_data;
 	unsigned int nbytes, max_nbytes;
-	unsigned int transfer_id;
-	unsigned int flags;
 
 	__skb_queue_head_init(&list);
 	while (ath10k_ce_completed_recv_next(ce_state, &transfer_context,
-					     &ce_data, &nbytes, &transfer_id,
-					     &flags) == 0) {
+					     &nbytes) == 0) {
 		skb = transfer_context;
 		max_nbytes = skb->len + skb_tailroom(skb);
 		dma_unmap_single(ar->dev, ATH10K_SKB_RXCB(skb)->paddr,
@@ -1835,13 +1829,10 @@ static void ath10k_pci_bmi_recv_data(struct ath10k_ce_pipe *ce_state)
 {
 	struct ath10k *ar = ce_state->ar;
 	struct bmi_xfer *xfer;
-	u32 ce_data;
 	unsigned int nbytes;
-	unsigned int transfer_id;
-	unsigned int flags;
 
-	if (ath10k_ce_completed_recv_next(ce_state, (void **)&xfer, &ce_data,
-					  &nbytes, &transfer_id, &flags))
+	if (ath10k_ce_completed_recv_next(ce_state, (void **)&xfer,
+					  &nbytes))
 		return;
 
 	if (WARN_ON_ONCE(!xfer))
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 7/9] ath10k: cleanup copy engine receive next completion
@ 2016-03-22 11:52   ` Rajkumar Manoharan
  0 siblings, 0 replies; 33+ messages in thread
From: Rajkumar Manoharan @ 2016-03-22 11:52 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, Rajkumar Manoharan, rmanohar

The physical address necessary to unmap DMA ('bufferp') is stored
in ath10k_skb_cb as 'paddr'. For diag register read and write
operations, 'paddr' is stored in the transfer context. ath10k doesn't
rely on the meta/transfer_id. So the unused output arguments {bufferp,
transfer_idp and flagsp} are removed from the CE recv_next completion.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/ce.c  | 21 +++--------------
 drivers/net/wireless/ath/ath10k/ce.h  | 10 ++------
 drivers/net/wireless/ath/ath10k/pci.c | 43 ++++++++++++++---------------------
 3 files changed, 22 insertions(+), 52 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index edf3629..d6da404 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -444,14 +444,10 @@ int ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx, u32 paddr)
  */
 int ath10k_ce_completed_recv_next_nolock(struct ath10k_ce_pipe *ce_state,
 					 void **per_transfer_contextp,
-					 u32 *bufferp,
-					 unsigned int *nbytesp,
-					 unsigned int *transfer_idp,
-					 unsigned int *flagsp)
+					 unsigned int *nbytesp)
 {
 	struct ath10k_ce_ring *dest_ring = ce_state->dest_ring;
 	unsigned int nentries_mask = dest_ring->nentries_mask;
-	struct ath10k *ar = ce_state->ar;
 	unsigned int sw_index = dest_ring->sw_index;
 
 	struct ce_desc *base = dest_ring->base_addr_owner_space;
@@ -476,14 +472,7 @@ int ath10k_ce_completed_recv_next_nolock(struct ath10k_ce_pipe *ce_state,
 	desc->nbytes = 0;
 
 	/* Return data from completed destination descriptor */
-	*bufferp = __le32_to_cpu(sdesc.addr);
 	*nbytesp = nbytes;
-	*transfer_idp = MS(__le16_to_cpu(sdesc.flags), CE_DESC_FLAGS_META_DATA);
-
-	if (__le16_to_cpu(sdesc.flags) & CE_DESC_FLAGS_BYTE_SWAP)
-		*flagsp = CE_RECV_FLAG_SWAPPED;
-	else
-		*flagsp = 0;
 
 	if (per_transfer_contextp)
 		*per_transfer_contextp =
@@ -501,10 +490,7 @@ int ath10k_ce_completed_recv_next_nolock(struct ath10k_ce_pipe *ce_state,
 
 int ath10k_ce_completed_recv_next(struct ath10k_ce_pipe *ce_state,
 				  void **per_transfer_contextp,
-				  u32 *bufferp,
-				  unsigned int *nbytesp,
-				  unsigned int *transfer_idp,
-				  unsigned int *flagsp)
+				  unsigned int *nbytesp)
 {
 	struct ath10k *ar = ce_state->ar;
 	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
@@ -513,8 +499,7 @@ int ath10k_ce_completed_recv_next(struct ath10k_ce_pipe *ce_state,
 	spin_lock_bh(&ar_pci->ce_lock);
 	ret = ath10k_ce_completed_recv_next_nolock(ce_state,
 						   per_transfer_contextp,
-						   bufferp, nbytesp,
-						   transfer_idp, flagsp);
+						   nbytesp);
 	spin_unlock_bh(&ar_pci->ce_lock);
 
 	return ret;
diff --git a/drivers/net/wireless/ath/ath10k/ce.h b/drivers/net/wireless/ath/ath10k/ce.h
index dac6768..68717e5 100644
--- a/drivers/net/wireless/ath/ath10k/ce.h
+++ b/drivers/net/wireless/ath/ath10k/ce.h
@@ -177,10 +177,7 @@ int ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx, u32 paddr);
  */
 int ath10k_ce_completed_recv_next(struct ath10k_ce_pipe *ce_state,
 				  void **per_transfer_contextp,
-				  u32 *bufferp,
-				  unsigned int *nbytesp,
-				  unsigned int *transfer_idp,
-				  unsigned int *flagsp);
+				  unsigned int *nbytesp);
 /*
  * Supply data for the next completed unprocessed send descriptor.
  * Pops 1 completed send buffer from Source ring.
@@ -212,10 +209,7 @@ int ath10k_ce_revoke_recv_next(struct ath10k_ce_pipe *ce_state,
 
 int ath10k_ce_completed_recv_next_nolock(struct ath10k_ce_pipe *ce_state,
 					 void **per_transfer_contextp,
-					 u32 *bufferp,
-					 unsigned int *nbytesp,
-					 unsigned int *transfer_idp,
-					 unsigned int *flagsp);
+					 unsigned int *nbytesp);
 
 /*
  * Support clean shutdown by allowing the caller to cancel
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index b3cff1d..290a61a 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -870,10 +870,8 @@ static int ath10k_pci_diag_read_mem(struct ath10k *ar, u32 address, void *data,
 {
 	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
 	int ret = 0;
-	u32 buf;
+	u32 *buf;
 	unsigned int completed_nbytes, orig_nbytes, remaining_bytes;
-	unsigned int id;
-	unsigned int flags;
 	struct ath10k_ce_pipe *ce_diag;
 	/* Host buffer address in CE space */
 	u32 ce_data;
@@ -909,7 +907,7 @@ static int ath10k_pci_diag_read_mem(struct ath10k *ar, u32 address, void *data,
 		nbytes = min_t(unsigned int, remaining_bytes,
 			       DIAG_TRANSFER_LIMIT);
 
-		ret = __ath10k_ce_rx_post_buf(ce_diag, NULL, ce_data);
+		ret = __ath10k_ce_rx_post_buf(ce_diag, &ce_data, ce_data);
 		if (ret != 0)
 			goto done;
 
@@ -940,9 +938,10 @@ static int ath10k_pci_diag_read_mem(struct ath10k *ar, u32 address, void *data,
 		}
 
 		i = 0;
-		while (ath10k_ce_completed_recv_next_nolock(ce_diag, NULL, &buf,
-							    &completed_nbytes,
-							    &id, &flags) != 0) {
+		while (ath10k_ce_completed_recv_next_nolock(ce_diag,
+							    (void **)&buf,
+							    &completed_nbytes)
+								!= 0) {
 			mdelay(1);
 
 			if (i++ > DIAG_ACCESS_CE_TIMEOUT_MS) {
@@ -956,7 +955,7 @@ static int ath10k_pci_diag_read_mem(struct ath10k *ar, u32 address, void *data,
 			goto done;
 		}
 
-		if (buf != ce_data) {
+		if (*buf != ce_data) {
 			ret = -EIO;
 			goto done;
 		}
@@ -1026,10 +1025,8 @@ int ath10k_pci_diag_write_mem(struct ath10k *ar, u32 address,
 {
 	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
 	int ret = 0;
-	u32 buf;
+	u32 *buf;
 	unsigned int completed_nbytes, orig_nbytes, remaining_bytes;
-	unsigned int id;
-	unsigned int flags;
 	struct ath10k_ce_pipe *ce_diag;
 	void *data_buf = NULL;
 	u32 ce_data;	/* Host buffer address in CE space */
@@ -1078,7 +1075,7 @@ int ath10k_pci_diag_write_mem(struct ath10k *ar, u32 address,
 		nbytes = min_t(int, remaining_bytes, DIAG_TRANSFER_LIMIT);
 
 		/* Set up to receive directly into Target(!) address */
-		ret = __ath10k_ce_rx_post_buf(ce_diag, NULL, address);
+		ret = __ath10k_ce_rx_post_buf(ce_diag, &address, address);
 		if (ret != 0)
 			goto done;
 
@@ -1103,9 +1100,10 @@ int ath10k_pci_diag_write_mem(struct ath10k *ar, u32 address,
 		}
 
 		i = 0;
-		while (ath10k_ce_completed_recv_next_nolock(ce_diag, NULL, &buf,
-							    &completed_nbytes,
-							    &id, &flags) != 0) {
+		while (ath10k_ce_completed_recv_next_nolock(ce_diag,
+							    (void **)&buf,
+							    &completed_nbytes)
+								!= 0) {
 			mdelay(1);
 
 			if (i++ > DIAG_ACCESS_CE_TIMEOUT_MS) {
@@ -1119,7 +1117,7 @@ int ath10k_pci_diag_write_mem(struct ath10k *ar, u32 address,
 			goto done;
 		}
 
-		if (buf != address) {
+		if (*buf != address) {
 			ret = -EIO;
 			goto done;
 		}
@@ -1181,15 +1179,11 @@ static void ath10k_pci_process_rx_cb(struct ath10k_ce_pipe *ce_state,
 	struct sk_buff *skb;
 	struct sk_buff_head list;
 	void *transfer_context;
-	u32 ce_data;
 	unsigned int nbytes, max_nbytes;
-	unsigned int transfer_id;
-	unsigned int flags;
 
 	__skb_queue_head_init(&list);
 	while (ath10k_ce_completed_recv_next(ce_state, &transfer_context,
-					     &ce_data, &nbytes, &transfer_id,
-					     &flags) == 0) {
+					     &nbytes) == 0) {
 		skb = transfer_context;
 		max_nbytes = skb->len + skb_tailroom(skb);
 		dma_unmap_single(ar->dev, ATH10K_SKB_RXCB(skb)->paddr,
@@ -1835,13 +1829,10 @@ static void ath10k_pci_bmi_recv_data(struct ath10k_ce_pipe *ce_state)
 {
 	struct ath10k *ar = ce_state->ar;
 	struct bmi_xfer *xfer;
-	u32 ce_data;
 	unsigned int nbytes;
-	unsigned int transfer_id;
-	unsigned int flags;
 
-	if (ath10k_ce_completed_recv_next(ce_state, (void **)&xfer, &ce_data,
-					  &nbytes, &transfer_id, &flags))
+	if (ath10k_ce_completed_recv_next(ce_state, (void **)&xfer,
+					  &nbytes))
 		return;
 
 	if (WARN_ON_ONCE(!xfer))
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 8/9] ath10k: reuse copy engine 5 (htt rx) descriptors
  2016-03-22 11:52 ` Rajkumar Manoharan
@ 2016-03-22 11:52   ` Rajkumar Manoharan
  -1 siblings, 0 replies; 33+ messages in thread
From: Rajkumar Manoharan @ 2016-03-22 11:52 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, rmanohar, Rajkumar Manoharan

Whenever an htt rx indication, i.e. a target-to-host message, is
received on the rx copy engine (CE5), the message is freed after the
response has been processed. Then CE5 is refilled with new descriptors
during post-rx processing. These memory alloc and free operations can
be avoided by reusing the same descriptors.

During CE pipe allocation the full ring is not initialized, i.e. only
n-1 entries are filled. So for CE5 the full ring has to be filled up to
reuse descriptors. Moreover, the CE5 write index is updated in a single
shot instead of being incremented per buffer. This avoids multiple pci
writes and ce_ring accesses. From experiments, it improves CPU usage by
~3% on the IPQ4019 platform.
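
As a small worked example of the single-shot index update (plain C,
assuming a power-of-two ring of 512 entries; the mask matches the shape
of the CE_RING_IDX_ADD() macro added below):

#include <stdio.h>

#define NENTRIES	512
#define NENTRIES_MASK	(NENTRIES - 1)
#define RING_IDX_ADD(mask, idx, num)	(((idx) + (num)) & (mask))

int main(void)
{
	unsigned int write_index = 510;

	/* batching 5 completed buffers advances the index once and wraps */
	write_index = RING_IDX_ADD(NENTRIES_MASK, write_index, 5);
	printf("%u\n", write_index);	/* prints 3 */
	return 0;
}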

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/ce.c  | 23 +++++++++++--
 drivers/net/wireless/ath/ath10k/ce.h  |  3 ++
 drivers/net/wireless/ath/ath10k/pci.c | 63 +++++++++++++++++++++++++++++++++--
 3 files changed, 84 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index d6da404..7212802 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -411,7 +411,8 @@ int __ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx, u32 paddr)
 
 	lockdep_assert_held(&ar_pci->ce_lock);
 
-	if (CE_RING_DELTA(nentries_mask, write_index, sw_index - 1) == 0)
+	if ((pipe->id != 5) &&
+	    CE_RING_DELTA(nentries_mask, write_index, sw_index - 1) == 0)
 		return -ENOSPC;
 
 	desc->addr = __cpu_to_le32(paddr);
@@ -425,6 +426,19 @@ int __ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx, u32 paddr)
 	return 0;
 }
 
+void ath10k_ce_rx_update_write_idx(struct ath10k_ce_pipe *pipe, u32 nentries)
+{
+	struct ath10k *ar = pipe->ar;
+	struct ath10k_ce_ring *dest_ring = pipe->dest_ring;
+	unsigned int nentries_mask = dest_ring->nentries_mask;
+	unsigned int write_index = dest_ring->write_index;
+	u32 ctrl_addr = pipe->ctrl_addr;
+
+	write_index = CE_RING_IDX_ADD(nentries_mask, write_index, nentries);
+	ath10k_ce_dest_ring_write_index_set(ar, ctrl_addr, write_index);
+	dest_ring->write_index = write_index;
+}
+
 int ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx, u32 paddr)
 {
 	struct ath10k *ar = pipe->ar;
@@ -478,8 +492,11 @@ int ath10k_ce_completed_recv_next_nolock(struct ath10k_ce_pipe *ce_state,
 		*per_transfer_contextp =
 			dest_ring->per_transfer_context[sw_index];
 
-	/* sanity */
-	dest_ring->per_transfer_context[sw_index] = NULL;
+	/* Copy engine 5 (HTT Rx) will reuse the same transfer context.
+	 * So update transfer context all CEs except CE5.
+	 */
+	if (ce_state->id != 5)
+		dest_ring->per_transfer_context[sw_index] = NULL;
 
 	/* Update sw_index */
 	sw_index = CE_RING_IDX_INCR(nentries_mask, sw_index);
diff --git a/drivers/net/wireless/ath/ath10k/ce.h b/drivers/net/wireless/ath/ath10k/ce.h
index 68717e5..25cafcf 100644
--- a/drivers/net/wireless/ath/ath10k/ce.h
+++ b/drivers/net/wireless/ath/ath10k/ce.h
@@ -166,6 +166,7 @@ int ath10k_ce_num_free_src_entries(struct ath10k_ce_pipe *pipe);
 int __ath10k_ce_rx_num_free_bufs(struct ath10k_ce_pipe *pipe);
 int __ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx, u32 paddr);
 int ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx, u32 paddr);
+void ath10k_ce_rx_update_write_idx(struct ath10k_ce_pipe *pipe, u32 nentries);
 
 /* recv flags */
 /* Data is byte-swapped */
@@ -410,6 +411,8 @@ static inline u32 ath10k_ce_base_address(struct ath10k *ar, unsigned int ce_id)
 	(((int)(toidx)-(int)(fromidx)) & (nentries_mask))
 
 #define CE_RING_IDX_INCR(nentries_mask, idx) (((idx) + 1) & (nentries_mask))
+#define CE_RING_IDX_ADD(nentries_mask, idx, num) \
+		(((idx) + (num)) & (nentries_mask))
 
 #define CE_WRAPPER_INTERRUPT_SUMMARY_HOST_MSI_LSB \
 				ar->regs->ce_wrap_intr_sum_host_msi_lsb
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 290a61a..0b305ef 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -809,7 +809,8 @@ static void ath10k_pci_rx_post_pipe(struct ath10k_pci_pipe *pipe)
 	spin_lock_bh(&ar_pci->ce_lock);
 	num = __ath10k_ce_rx_num_free_bufs(ce_pipe);
 	spin_unlock_bh(&ar_pci->ce_lock);
-	while (num--) {
+
+	while (num >= 0) {
 		ret = __ath10k_pci_rx_post_buf(pipe);
 		if (ret) {
 			if (ret == -ENOSPC)
@@ -819,6 +820,7 @@ static void ath10k_pci_rx_post_pipe(struct ath10k_pci_pipe *pipe)
 				  ATH10K_PCI_RX_POST_RETRY_MS);
 			break;
 		}
+		num--;
 	}
 }
 
@@ -1212,6 +1214,63 @@ static void ath10k_pci_process_rx_cb(struct ath10k_ce_pipe *ce_state,
 	ath10k_pci_rx_post_pipe(pipe_info);
 }
 
+static void ath10k_pci_process_htt_rx_cb(struct ath10k_ce_pipe *ce_state,
+					 void (*callback)(struct ath10k *ar,
+							  struct sk_buff *skb))
+{
+	struct ath10k *ar = ce_state->ar;
+	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
+	struct ath10k_pci_pipe *pipe_info =  &ar_pci->pipe_info[ce_state->id];
+	struct ath10k_ce_pipe *ce_pipe = pipe_info->ce_hdl;
+	struct sk_buff *skb;
+	struct sk_buff_head list;
+	void *transfer_context;
+	unsigned int nbytes, max_nbytes, nentries;
+	int orig_len;
+
+	/* No need to acquire ce_lock for CE5, since this is the only place CE5
+	 * is processed other than init and deinit. Before releasing CE5
+	 * buffers, interrupts are disabled. Thus CE5 access is serialized.
+	 */
+	__skb_queue_head_init(&list);
+	while (ath10k_ce_completed_recv_next_nolock(ce_state, &transfer_context,
+						    &nbytes) == 0) {
+		skb = transfer_context;
+		max_nbytes = skb->len + skb_tailroom(skb);
+
+		if (unlikely(max_nbytes < nbytes)) {
+			ath10k_warn(ar, "rxed more than expected (nbytes %d, max %d)",
+				    nbytes, max_nbytes);
+			continue;
+		}
+
+		dma_sync_single_for_cpu(ar->dev, ATH10K_SKB_RXCB(skb)->paddr,
+					max_nbytes, DMA_FROM_DEVICE);
+		skb_put(skb, nbytes);
+		__skb_queue_tail(&list, skb);
+	}
+
+	nentries = skb_queue_len(&list);
+	while ((skb = __skb_dequeue(&list))) {
+		ath10k_dbg(ar, ATH10K_DBG_PCI, "pci rx ce pipe %d len %d\n",
+			   ce_state->id, skb->len);
+		ath10k_dbg_dump(ar, ATH10K_DBG_PCI_DUMP, NULL, "pci rx: ",
+				skb->data, skb->len);
+
+		orig_len = skb->len;
+		callback(ar, skb);
+		skb_push(skb, orig_len - skb->len);
+		skb_reset_tail_pointer(skb);
+		skb_trim(skb, 0);
+
+		/*let device gain the buffer again*/
+		dma_sync_single_for_device(ar->dev, ATH10K_SKB_RXCB(skb)->paddr,
+					   skb->len + skb_tailroom(skb),
+					   DMA_FROM_DEVICE);
+	}
+	ath10k_ce_rx_update_write_idx(ce_pipe, nentries);
+}
+
 /* Called by lower (CE) layer when data is received from the Target. */
 static void ath10k_pci_htc_rx_cb(struct ath10k_ce_pipe *ce_state)
 {
@@ -1268,7 +1327,7 @@ static void ath10k_pci_htt_rx_cb(struct ath10k_ce_pipe *ce_state)
 	 */
 	ath10k_ce_per_engine_service(ce_state->ar, 4);
 
-	ath10k_pci_process_rx_cb(ce_state, ath10k_pci_htt_rx_deliver);
+	ath10k_pci_process_htt_rx_cb(ce_state, ath10k_pci_htt_rx_deliver);
 }
 
 int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 8/9] ath10k: reuse copy engine 5 (htt rx) descriptors
@ 2016-03-22 11:52   ` Rajkumar Manoharan
  0 siblings, 0 replies; 33+ messages in thread
From: Rajkumar Manoharan @ 2016-03-22 11:52 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, Rajkumar Manoharan, rmanohar

Whenever an htt rx indication, i.e. a target-to-host message, is
received on the rx copy engine (CE5), the message is freed after the
response has been processed. Then CE5 is refilled with new descriptors
during post-rx processing. These memory alloc and free operations can
be avoided by reusing the same descriptors.

During CE pipe allocation the full ring is not initialized, i.e. only
n-1 entries are filled. So for CE5 the full ring has to be filled up to
reuse descriptors. Moreover, the CE5 write index is updated in a single
shot instead of being incremented per buffer. This avoids multiple pci
writes and ce_ring accesses. From experiments, it improves CPU usage by
~3% on the IPQ4019 platform.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/ce.c  | 23 +++++++++++--
 drivers/net/wireless/ath/ath10k/ce.h  |  3 ++
 drivers/net/wireless/ath/ath10k/pci.c | 63 +++++++++++++++++++++++++++++++++--
 3 files changed, 84 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index d6da404..7212802 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -411,7 +411,8 @@ int __ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx, u32 paddr)
 
 	lockdep_assert_held(&ar_pci->ce_lock);
 
-	if (CE_RING_DELTA(nentries_mask, write_index, sw_index - 1) == 0)
+	if ((pipe->id != 5) &&
+	    CE_RING_DELTA(nentries_mask, write_index, sw_index - 1) == 0)
 		return -ENOSPC;
 
 	desc->addr = __cpu_to_le32(paddr);
@@ -425,6 +426,19 @@ int __ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx, u32 paddr)
 	return 0;
 }
 
+void ath10k_ce_rx_update_write_idx(struct ath10k_ce_pipe *pipe, u32 nentries)
+{
+	struct ath10k *ar = pipe->ar;
+	struct ath10k_ce_ring *dest_ring = pipe->dest_ring;
+	unsigned int nentries_mask = dest_ring->nentries_mask;
+	unsigned int write_index = dest_ring->write_index;
+	u32 ctrl_addr = pipe->ctrl_addr;
+
+	write_index = CE_RING_IDX_ADD(nentries_mask, write_index, nentries);
+	ath10k_ce_dest_ring_write_index_set(ar, ctrl_addr, write_index);
+	dest_ring->write_index = write_index;
+}
+
 int ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx, u32 paddr)
 {
 	struct ath10k *ar = pipe->ar;
@@ -478,8 +492,11 @@ int ath10k_ce_completed_recv_next_nolock(struct ath10k_ce_pipe *ce_state,
 		*per_transfer_contextp =
 			dest_ring->per_transfer_context[sw_index];
 
-	/* sanity */
-	dest_ring->per_transfer_context[sw_index] = NULL;
+	/* Copy engine 5 (HTT Rx) will reuse the same transfer context.
+	 * So update transfer context all CEs except CE5.
+	 */
+	if (ce_state->id != 5)
+		dest_ring->per_transfer_context[sw_index] = NULL;
 
 	/* Update sw_index */
 	sw_index = CE_RING_IDX_INCR(nentries_mask, sw_index);
diff --git a/drivers/net/wireless/ath/ath10k/ce.h b/drivers/net/wireless/ath/ath10k/ce.h
index 68717e5..25cafcf 100644
--- a/drivers/net/wireless/ath/ath10k/ce.h
+++ b/drivers/net/wireless/ath/ath10k/ce.h
@@ -166,6 +166,7 @@ int ath10k_ce_num_free_src_entries(struct ath10k_ce_pipe *pipe);
 int __ath10k_ce_rx_num_free_bufs(struct ath10k_ce_pipe *pipe);
 int __ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx, u32 paddr);
 int ath10k_ce_rx_post_buf(struct ath10k_ce_pipe *pipe, void *ctx, u32 paddr);
+void ath10k_ce_rx_update_write_idx(struct ath10k_ce_pipe *pipe, u32 nentries);
 
 /* recv flags */
 /* Data is byte-swapped */
@@ -410,6 +411,8 @@ static inline u32 ath10k_ce_base_address(struct ath10k *ar, unsigned int ce_id)
 	(((int)(toidx)-(int)(fromidx)) & (nentries_mask))
 
 #define CE_RING_IDX_INCR(nentries_mask, idx) (((idx) + 1) & (nentries_mask))
+#define CE_RING_IDX_ADD(nentries_mask, idx, num) \
+		(((idx) + (num)) & (nentries_mask))
 
 #define CE_WRAPPER_INTERRUPT_SUMMARY_HOST_MSI_LSB \
 				ar->regs->ce_wrap_intr_sum_host_msi_lsb
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 290a61a..0b305ef 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -809,7 +809,8 @@ static void ath10k_pci_rx_post_pipe(struct ath10k_pci_pipe *pipe)
 	spin_lock_bh(&ar_pci->ce_lock);
 	num = __ath10k_ce_rx_num_free_bufs(ce_pipe);
 	spin_unlock_bh(&ar_pci->ce_lock);
-	while (num--) {
+
+	while (num >= 0) {
 		ret = __ath10k_pci_rx_post_buf(pipe);
 		if (ret) {
 			if (ret == -ENOSPC)
@@ -819,6 +820,7 @@ static void ath10k_pci_rx_post_pipe(struct ath10k_pci_pipe *pipe)
 				  ATH10K_PCI_RX_POST_RETRY_MS);
 			break;
 		}
+		num--;
 	}
 }
 
@@ -1212,6 +1214,63 @@ static void ath10k_pci_process_rx_cb(struct ath10k_ce_pipe *ce_state,
 	ath10k_pci_rx_post_pipe(pipe_info);
 }
 
+static void ath10k_pci_process_htt_rx_cb(struct ath10k_ce_pipe *ce_state,
+					 void (*callback)(struct ath10k *ar,
+							  struct sk_buff *skb))
+{
+	struct ath10k *ar = ce_state->ar;
+	struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
+	struct ath10k_pci_pipe *pipe_info =  &ar_pci->pipe_info[ce_state->id];
+	struct ath10k_ce_pipe *ce_pipe = pipe_info->ce_hdl;
+	struct sk_buff *skb;
+	struct sk_buff_head list;
+	void *transfer_context;
+	unsigned int nbytes, max_nbytes, nentries;
+	int orig_len;
+
+	/* No need to acquire ce_lock for CE5, since this is the only place CE5
+	 * is processed other than init and deinit. Before releasing CE5
+	 * buffers, interrupts are disabled. Thus CE5 access is serialized.
+	 */
+	__skb_queue_head_init(&list);
+	while (ath10k_ce_completed_recv_next_nolock(ce_state, &transfer_context,
+						    &nbytes) == 0) {
+		skb = transfer_context;
+		max_nbytes = skb->len + skb_tailroom(skb);
+
+		if (unlikely(max_nbytes < nbytes)) {
+			ath10k_warn(ar, "rxed more than expected (nbytes %d, max %d)",
+				    nbytes, max_nbytes);
+			continue;
+		}
+
+		dma_sync_single_for_cpu(ar->dev, ATH10K_SKB_RXCB(skb)->paddr,
+					max_nbytes, DMA_FROM_DEVICE);
+		skb_put(skb, nbytes);
+		__skb_queue_tail(&list, skb);
+	}
+
+	nentries = skb_queue_len(&list);
+	while ((skb = __skb_dequeue(&list))) {
+		ath10k_dbg(ar, ATH10K_DBG_PCI, "pci rx ce pipe %d len %d\n",
+			   ce_state->id, skb->len);
+		ath10k_dbg_dump(ar, ATH10K_DBG_PCI_DUMP, NULL, "pci rx: ",
+				skb->data, skb->len);
+
+		orig_len = skb->len;
+		callback(ar, skb);
+		skb_push(skb, orig_len - skb->len);
+		skb_reset_tail_pointer(skb);
+		skb_trim(skb, 0);
+
+		/*let device gain the buffer again*/
+		dma_sync_single_for_device(ar->dev, ATH10K_SKB_RXCB(skb)->paddr,
+					   skb->len + skb_tailroom(skb),
+					   DMA_FROM_DEVICE);
+	}
+	ath10k_ce_rx_update_write_idx(ce_pipe, nentries);
+}
+
 /* Called by lower (CE) layer when data is received from the Target. */
 static void ath10k_pci_htc_rx_cb(struct ath10k_ce_pipe *ce_state)
 {
@@ -1268,7 +1327,7 @@ static void ath10k_pci_htt_rx_cb(struct ath10k_ce_pipe *ce_state)
 	 */
 	ath10k_ce_per_engine_service(ce_state->ar, 4);
 
-	ath10k_pci_process_rx_cb(ce_state, ath10k_pci_htt_rx_deliver);
+	ath10k_pci_process_htt_rx_cb(ce_state, ath10k_pci_htt_rx_deliver);
 }
 
 int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 9/9] ath10k: combine txrx and replenish task
  2016-03-22 11:52 ` Rajkumar Manoharan
@ 2016-03-22 11:52   ` Rajkumar Manoharan
  -1 siblings, 0 replies; 33+ messages in thread
From: Rajkumar Manoharan @ 2016-03-22 11:52 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, rmanohar, Rajkumar Manoharan

Since tx completion and rx indication processing have been moved out
of the txrx tasklet, and rx ring lock contention for rx_ind messages
has also been removed from it, it is more efficient to combine the
replenish and txrx tasks. The refill threshold is adjusted for both
AP135 and AP148 (low- and high-end systems). With this adjustment, on
AP135 TCP DL improves from 603 Mbps to 620 Mbps and UDP DL improves
from 758 Mbps to 803 Mbps. Also, no watchdogs are observed on UDP BiDi.
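
A sketch of the refill decision that remains (hypothetical wrapper name;
the real logic lives in ath10k_htt_rx_msdu_buff_replenish()): a failed
allocation backs off for 50 ms, while a remaining deficit reschedules the
same retry timer 5 ms out instead of kicking a dedicated tasklet.

#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/timer.h>

static struct timer_list refill_retry_timer;

static void reschedule_replenish(int ret, int num_deficit)
{
	if (ret == -ENOMEM)		/* under memory pressure: back off */
		mod_timer(&refill_retry_timer,
			  jiffies + msecs_to_jiffies(50));
	else if (num_deficit > 0)	/* ring still short: retry shortly */
		mod_timer(&refill_retry_timer,
			  jiffies + msecs_to_jiffies(5));
}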

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/htt.h    |  3 +--
 drivers/net/wireless/ath/ath10k/htt_rx.c | 21 ++++++---------------
 2 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index 1adcae4..60bd9fe 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -1662,7 +1662,6 @@ struct ath10k_htt {
 	/* set if host-fw communication goes haywire
 	 * used to avoid further failures */
 	bool rx_confused;
-	struct tasklet_struct rx_replenish_task;
 	atomic_t num_mpdus_ready;
 
 	/* This is used to group tx/rx completions separately and process them
@@ -1737,7 +1736,7 @@ struct htt_rx_desc {
 
 /* Refill a bunch of RX buffers for each refill round so that FW/HW can handle
  * aggregated traffic more nicely. */
-#define ATH10K_HTT_MAX_NUM_REFILL 16
+#define ATH10K_HTT_MAX_NUM_REFILL 100
 
 /*
  * DMA_MAP expects the buffer to be an integral number of cache lines.
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 31be0c6..96a7417 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -31,6 +31,8 @@
 /* when under memory pressure rx ring refill may fail and needs a retry */
 #define HTT_RX_RING_REFILL_RETRY_MS 50
 
+#define HTT_RX_RING_REFILL_RESCHED_MS 5
+
 static int ath10k_htt_rx_get_csum_state(struct sk_buff *skb);
 static void ath10k_htt_txrx_compl_task(unsigned long ptr);
 
@@ -192,7 +194,8 @@ static void ath10k_htt_rx_msdu_buff_replenish(struct ath10k_htt *htt)
 		mod_timer(&htt->rx_ring.refill_retry_timer, jiffies +
 			  msecs_to_jiffies(HTT_RX_RING_REFILL_RETRY_MS));
 	} else if (num_deficit > 0) {
-		tasklet_schedule(&htt->rx_replenish_task);
+		mod_timer(&htt->rx_ring.refill_retry_timer, jiffies +
+			  msecs_to_jiffies(HTT_RX_RING_REFILL_RESCHED_MS));
 	}
 	spin_unlock_bh(&htt->rx_ring.lock);
 }
@@ -223,7 +226,6 @@ int ath10k_htt_rx_ring_refill(struct ath10k *ar)
 void ath10k_htt_rx_free(struct ath10k_htt *htt)
 {
 	del_timer_sync(&htt->rx_ring.refill_retry_timer);
-	tasklet_kill(&htt->rx_replenish_task);
 	tasklet_kill(&htt->txrx_compl_task);
 
 	skb_queue_purge(&htt->rx_compl_q);
@@ -380,13 +382,6 @@ static int ath10k_htt_rx_amsdu_pop(struct ath10k_htt *htt,
 	return msdu_chaining;
 }
 
-static void ath10k_htt_rx_replenish_task(unsigned long ptr)
-{
-	struct ath10k_htt *htt = (struct ath10k_htt *)ptr;
-
-	ath10k_htt_rx_msdu_buff_replenish(htt);
-}
-
 static struct sk_buff *ath10k_htt_rx_pop_paddr(struct ath10k_htt *htt,
 					       u32 paddr)
 {
@@ -520,9 +515,6 @@ int ath10k_htt_rx_alloc(struct ath10k_htt *htt)
 	htt->rx_ring.sw_rd_idx.msdu_payld = 0;
 	hash_init(htt->rx_ring.skb_table);
 
-	tasklet_init(&htt->rx_replenish_task, ath10k_htt_rx_replenish_task,
-		     (unsigned long)htt);
-
 	skb_queue_head_init(&htt->rx_compl_q);
 	skb_queue_head_init(&htt->rx_in_ord_compl_q);
 	skb_queue_head_init(&htt->tx_fetch_ind_q);
@@ -1912,8 +1904,7 @@ static void ath10k_htt_rx_in_ord_ind(struct ath10k *ar, struct sk_buff *skb)
 			return;
 		}
 	}
-
-	tasklet_schedule(&htt->rx_replenish_task);
+	ath10k_htt_rx_msdu_buff_replenish(htt);
 }
 
 static void ath10k_htt_rx_tx_fetch_resp_id_confirm(struct ath10k *ar,
@@ -2471,5 +2462,5 @@ static void ath10k_htt_txrx_compl_task(unsigned long ptr)
 		dev_kfree_skb_any(skb);
 	}
 
-	tasklet_schedule(&htt->rx_replenish_task);
+	ath10k_htt_rx_msdu_buff_replenish(htt);
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 9/9] ath10k: combine txrx and replenish task
@ 2016-03-22 11:52   ` Rajkumar Manoharan
  0 siblings, 0 replies; 33+ messages in thread
From: Rajkumar Manoharan @ 2016-03-22 11:52 UTC (permalink / raw)
  To: ath10k; +Cc: linux-wireless, Rajkumar Manoharan, rmanohar

Since tx completion and rx indication processing have been moved out
of the txrx tasklet, and rx ring lock contention for rx_ind messages
has also been removed from it, it is more efficient to combine the
replenish and txrx tasks. The refill threshold is adjusted for both
AP135 and AP148 (low- and high-end systems). With this adjustment, on
AP135 TCP DL improves from 603 Mbps to 620 Mbps and UDP DL improves
from 758 Mbps to 803 Mbps. Also, no watchdogs are observed on UDP BiDi.

Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>
---
 drivers/net/wireless/ath/ath10k/htt.h    |  3 +--
 drivers/net/wireless/ath/ath10k/htt_rx.c | 21 ++++++---------------
 2 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index 1adcae4..60bd9fe 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -1662,7 +1662,6 @@ struct ath10k_htt {
 	/* set if host-fw communication goes haywire
 	 * used to avoid further failures */
 	bool rx_confused;
-	struct tasklet_struct rx_replenish_task;
 	atomic_t num_mpdus_ready;
 
 	/* This is used to group tx/rx completions separately and process them
@@ -1737,7 +1736,7 @@ struct htt_rx_desc {
 
 /* Refill a bunch of RX buffers for each refill round so that FW/HW can handle
  * aggregated traffic more nicely. */
-#define ATH10K_HTT_MAX_NUM_REFILL 16
+#define ATH10K_HTT_MAX_NUM_REFILL 100
 
 /*
  * DMA_MAP expects the buffer to be an integral number of cache lines.
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 31be0c6..96a7417 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -31,6 +31,8 @@
 /* when under memory pressure rx ring refill may fail and needs a retry */
 #define HTT_RX_RING_REFILL_RETRY_MS 50
 
+#define HTT_RX_RING_REFILL_RESCHED_MS 5
+
 static int ath10k_htt_rx_get_csum_state(struct sk_buff *skb);
 static void ath10k_htt_txrx_compl_task(unsigned long ptr);
 
@@ -192,7 +194,8 @@ static void ath10k_htt_rx_msdu_buff_replenish(struct ath10k_htt *htt)
 		mod_timer(&htt->rx_ring.refill_retry_timer, jiffies +
 			  msecs_to_jiffies(HTT_RX_RING_REFILL_RETRY_MS));
 	} else if (num_deficit > 0) {
-		tasklet_schedule(&htt->rx_replenish_task);
+		mod_timer(&htt->rx_ring.refill_retry_timer, jiffies +
+			  msecs_to_jiffies(HTT_RX_RING_REFILL_RESCHED_MS));
 	}
 	spin_unlock_bh(&htt->rx_ring.lock);
 }
@@ -223,7 +226,6 @@ int ath10k_htt_rx_ring_refill(struct ath10k *ar)
 void ath10k_htt_rx_free(struct ath10k_htt *htt)
 {
 	del_timer_sync(&htt->rx_ring.refill_retry_timer);
-	tasklet_kill(&htt->rx_replenish_task);
 	tasklet_kill(&htt->txrx_compl_task);
 
 	skb_queue_purge(&htt->rx_compl_q);
@@ -380,13 +382,6 @@ static int ath10k_htt_rx_amsdu_pop(struct ath10k_htt *htt,
 	return msdu_chaining;
 }
 
-static void ath10k_htt_rx_replenish_task(unsigned long ptr)
-{
-	struct ath10k_htt *htt = (struct ath10k_htt *)ptr;
-
-	ath10k_htt_rx_msdu_buff_replenish(htt);
-}
-
 static struct sk_buff *ath10k_htt_rx_pop_paddr(struct ath10k_htt *htt,
 					       u32 paddr)
 {
@@ -520,9 +515,6 @@ int ath10k_htt_rx_alloc(struct ath10k_htt *htt)
 	htt->rx_ring.sw_rd_idx.msdu_payld = 0;
 	hash_init(htt->rx_ring.skb_table);
 
-	tasklet_init(&htt->rx_replenish_task, ath10k_htt_rx_replenish_task,
-		     (unsigned long)htt);
-
 	skb_queue_head_init(&htt->rx_compl_q);
 	skb_queue_head_init(&htt->rx_in_ord_compl_q);
 	skb_queue_head_init(&htt->tx_fetch_ind_q);
@@ -1912,8 +1904,7 @@ static void ath10k_htt_rx_in_ord_ind(struct ath10k *ar, struct sk_buff *skb)
 			return;
 		}
 	}
-
-	tasklet_schedule(&htt->rx_replenish_task);
+	ath10k_htt_rx_msdu_buff_replenish(htt);
 }
 
 static void ath10k_htt_rx_tx_fetch_resp_id_confirm(struct ath10k *ar,
@@ -2471,5 +2462,5 @@ static void ath10k_htt_txrx_compl_task(unsigned long ptr)
 		dev_kfree_skb_any(skb);
 	}
 
-	tasklet_schedule(&htt->rx_replenish_task);
+	ath10k_htt_rx_msdu_buff_replenish(htt);
 }
-- 
2.7.4



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/9] ath10k: speedup htt rx descriptor processing for tx completion
  2016-03-22 11:52   ` Rajkumar Manoharan
@ 2016-03-24 13:08     ` Valo, Kalle
  -1 siblings, 0 replies; 33+ messages in thread
From: Valo, Kalle @ 2016-03-24 13:08 UTC (permalink / raw)
  To: Manoharan, Rajkumar; +Cc: ath10k, linux-wireless, rmanohar

Rajkumar Manoharan <rmanohar@qti.qualcomm.com> writes:

> To optimize CPU usage htt rx descriptors will be reused instead of
> refilling it for htt rx copy engine (CE5). To support that all htt rx
> indications should be processed at same context. FIFO queue is used
> to maintain tx completion status for each msdu. This helps to retain
> the order of tx completion.
>
> Signed-off-by: Rajkumar Manoharan <rmanohar@qti.qualcomm.com>

[...]

> @@ -1712,7 +1710,20 @@ static void ath10k_htt_rx_frm_tx_compl(struct ath10k *ar,
>  	for (i = 0; i < resp->data_tx_completion.num_msdus; i++) {
>  		msdu_id = resp->data_tx_completion.msdus[i];
>  		tx_done.msdu_id = __le16_to_cpu(msdu_id);
> -		ath10k_txrx_tx_unref(htt, &tx_done);
> +
> +		/* kfifo_put: In practice firmware shouldn't fire off per-CE
> +		 * interrupt and main interrupt (MSI/-X range case) for the same
> +		 * HTC service so it should be safe to use kfifo_put w/o lock.
> +		 *
> +		 * From kfifo_put() documentation:
> +		 *  Note that with only one concurrent reader and one concurrent
> +		 *  writer, you don't need extra locking to use these macro.
> +		 */
> +		if (!kfifo_put(&htt->txdone_fifo, tx_done)) {
> +			ath10k_warn(ar, "txdone fifo overrun, msdu_id %d status %d\n",
> +				    tx_done.msdu_id, tx_done.status);
> +			ath10k_txrx_tx_unref(htt, &tx_done);
> +		}

I see two new warnings on the kfifo_put() call:

drivers/net/wireless/ath/ath10k/htt_rx.c:1722:22: warning: cast to non-scalar
drivers/net/wireless/ath/ath10k/htt_rx.c:1722:22: warning: cast from non-scalar

But I suspect they are false warnings due to my old compiler:

gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3

Opinions?

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 33+ messages in thread
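
For readers following the kfifo discussion above, here is a minimal, self-contained sketch of the single-producer/single-consumer pattern the patch relies on. The fifo name and struct name follow the patch, but the fields, the size and the two example_* functions are illustrative only, not the driver's actual definitions.

#include <linux/kfifo.h>
#include <linux/printk.h>
#include <linux/types.h>

struct htt_tx_done {
	u16 msdu_id;
	u16 status;
};

/* Illustrative size; kfifo sizes must be a power of two. */
#define TXDONE_FIFO_SIZE 512

static DEFINE_KFIFO(txdone_fifo, struct htt_tx_done, TXDONE_FIFO_SIZE);

/* Producer (e.g. the HTT completion handler): one entry per MSDU.
 * With exactly one concurrent writer and one concurrent reader,
 * kfifo_put()/kfifo_get() need no extra locking.
 */
static void example_record_tx_done(struct htt_tx_done tx_done)
{
	if (!kfifo_put(&txdone_fifo, tx_done))
		pr_warn("txdone fifo overrun, msdu_id %d\n", tx_done.msdu_id);
}

/* Consumer (e.g. the txrx completion path): drains entries in order,
 * which is what preserves the tx completion ordering mentioned in the
 * commit message.
 */
static void example_drain_tx_done(void)
{
	struct htt_tx_done tx_done;

	while (kfifo_get(&txdone_fifo, &tx_done))
		pr_debug("msdu %d done, status %d\n",
			 tx_done.msdu_id, tx_done.status);
}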

* Re: [PATCH 1/9] ath10k: speedup htt rx descriptor processing for tx completion
  2016-03-24 13:08     ` Valo, Kalle
@ 2016-03-24 15:52       ` Manoharan, Rajkumar
  -1 siblings, 0 replies; 33+ messages in thread
From: Manoharan, Rajkumar @ 2016-03-24 15:52 UTC (permalink / raw)
  To: Valo, Kalle; +Cc: ath10k, linux-wireless, rmanohar

[...]
>
>> @@ -1712,7 +1710,20 @@ static void ath10k_htt_rx_frm_tx_compl(struct ath10k *ar,
>>       for (i = 0; i < resp->data_tx_completion.num_msdus; i++) {
>>               msdu_id = resp->data_tx_completion.msdus[i];
>>               tx_done.msdu_id = __le16_to_cpu(msdu_id);
>> -             ath10k_txrx_tx_unref(htt, &tx_done);
>> +
>> +             /* kfifo_put: In practice firmware shouldn't fire off per-CE
>> +              * interrupt and main interrupt (MSI/-X range case) for the same
>> +              * HTC service so it should be safe to use kfifo_put w/o lock.
>> +              *
>> +              * From kfifo_put() documentation:
>> +              *  Note that with only one concurrent reader and one concurrent
>> +              *  writer, you don't need extra locking to use these macro.
>> +              */
>> +             if (!kfifo_put(&htt->txdone_fifo, tx_done)) {
>> +                     ath10k_warn(ar, "txdone fifo overrun, msdu_id %d status %d\n",
>> +                                 tx_done.msdu_id, tx_done.status);
>> +                     ath10k_txrx_tx_unref(htt, &tx_done);
>> +             }
>
> I see two new warnings on the kfifo_put() call:
>
> drivers/net/wireless/ath/ath10k/htt_rx.c:1722:22: warning: cast to non-scalar
> drivers/net/wireless/ath/ath10k/htt_rx.c:1722:22: warning: cast from non-scalar
> 
> But I suspect they are false warnings due to my old compiler:
> 
> gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
> 
> Opinions?
> 
Hmm... I am not sure why the older compiler is not complaining about this.
Unfortunately, neither the x86 nor the cross compiler throws any warnings :(

gcc version 5.3.0 (GCC)
Target: x86_64-unknown-linux-gnu

gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r47724)
Target: arm-openwrt-linux-uclibcgnueabi

Any suggestions welcome.

-Rajkumar


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/9] ath10k: speedup htt rx descriptor processing for tx completion
  2016-03-24 15:52       ` Manoharan, Rajkumar
@ 2016-03-24 16:13         ` Manoharan, Rajkumar
  -1 siblings, 0 replies; 33+ messages in thread
From: Manoharan, Rajkumar @ 2016-03-24 16:13 UTC (permalink / raw)
  To: Valo, Kalle; +Cc: ath10k, linux-wireless, rmanohar

[...]
>>
>> I see two new warnings on the kfifo_put() call:
>>
>> drivers/net/wireless/ath/ath10k/htt_rx.c:1722:22: warning: cast to non-scalar
>> drivers/net/wireless/ath/ath10k/htt_rx.c:1722:22: warning: cast from non-scalar
>>
>> But I suspect they are false warnings due to my old compiler:
>>
>> gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
>>
>> Opinions?
>>
> Hmm... I am not sure why older compiler is not complaining this. Unfortunately
> both x86 and cross compiler are not throwing any warnings though :(
>

s/older/latest/g (read "older" as "latest" in my previous mail)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/9] ath10k: speedup htt rx descriptor processing for tx completion
  2016-03-24 15:52       ` Manoharan, Rajkumar
@ 2016-03-29  7:39         ` Valo, Kalle
  -1 siblings, 0 replies; 33+ messages in thread
From: Valo, Kalle @ 2016-03-29  7:39 UTC (permalink / raw)
  To: Manoharan, Rajkumar; +Cc: ath10k, linux-wireless, rmanohar

"Manoharan, Rajkumar" <rmanohar@qti.qualcomm.com> writes:

>>> @@ -1712,7 +1710,20 @@ static void ath10k_htt_rx_frm_tx_compl(struct ath10k *ar,
>>>       for (i = 0; i < resp->data_tx_completion.num_msdus; i++) {
>>>               msdu_id = resp->data_tx_completion.msdus[i];
>>>               tx_done.msdu_id = __le16_to_cpu(msdu_id);
>>> -             ath10k_txrx_tx_unref(htt, &tx_done);
>>> +
>>> +             /* kfifo_put: In practice firmware shouldn't fire off per-CE
>>> +              * interrupt and main interrupt (MSI/-X range case) for the same
>>> +              * HTC service so it should be safe to use kfifo_put w/o lock.
>>> +              *
>>> +              * From kfifo_put() documentation:
>>> +              *  Note that with only one concurrent reader and one concurrent
>>> +              *  writer, you don't need extra locking to use these macro.
>>> +              */
>>> +             if (!kfifo_put(&htt->txdone_fifo, tx_done)) {
>>> +                     ath10k_warn(ar, "txdone fifo overrun, msdu_id %d status %d\n",
>>> +                                 tx_done.msdu_id, tx_done.status);
>>> +                     ath10k_txrx_tx_unref(htt, &tx_done);
>>> +             }
>>
>> I see two new warnings on the kfifo_put() call:
>>
>> drivers/net/wireless/ath/ath10k/htt_rx.c:1722:22: warning: cast to non-scalar
>> drivers/net/wireless/ath/ath10k/htt_rx.c:1722:22: warning: cast from non-scalar
>> 
>> But I suspect they are false warnings due to my old compiler:
>> 
>> gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
>> 
>> Opinions?
>> 
> Hmm... I am not sure why older compiler is not complaining this. Unfortunately
> both x86 and cross compiler are not throwing any warnings though :(
>
> gcc version 5.3.0 (GCC)
> Target: x86_64-unknown-linux-gnu
>
> gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 r47724)
> Target: arm-openwrt-linux-uclibcgnueabi

Maybe we can just ignore the warning and hope that all recent compilers
are smart enough.

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* latest ath10k pending tree bug's
  2016-03-29  7:39         ` Valo, Kalle
  (?)
@ 2016-03-29  9:12         ` Sebastian Gottschall
  2016-03-29 11:02           ` Manoharan, Rajkumar
  -1 siblings, 1 reply; 33+ messages in thread
From: Sebastian Gottschall @ 2016-03-29  9:12 UTC (permalink / raw)
  To: ath10k

The newly introduced code which initializes the pre-cal data for new
chipsets breaks QCA99XX support, so ath10k fails to initialize.
I commented out
ret = ath10k_core_pre_cal_config(ar);
and
  ret = ath10k_core_pre_cal_download(ar);

to get it working again. This might not affect real PCIe cards, but on
IPQ8064 based boards with on-board wifi cards it shows this behaviour.


In addition, this message floods the kernel log:
[125917.204121] ath10k_pci 0001:01:00.0: failed to increase tx mgmt
pending count: -16, dropping
but it has no further visible side effects.


Sebastian




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: latest ath10k pending tree bug's
  2016-03-29  9:12         ` latest ath10k pending tree bug's Sebastian Gottschall
@ 2016-03-29 11:02           ` Manoharan, Rajkumar
  2016-03-29 16:02             ` Sebastian Gottschall
  0 siblings, 1 reply; 33+ messages in thread
From: Manoharan, Rajkumar @ 2016-03-29 11:02 UTC (permalink / raw)
  To: Sebastian Gottschall, ath10k, Rajkumar Manoharan

> the new introduced code which initialized the precal data for new
> chipsets will break QCA99XX support.
> so ath10k fails to initialize.
> i commented out
> ret = ath10k_core_pre_cal_config(ar);
> and
>   ret = ath10k_core_pre_cal_download(ar);
> 
> to get it working again. this might not affect real pcie cards. but on
> IPQ8064 based boards with on board wifi cards it shows that behaviour.
> 
Can you please try below change?

diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index ca45da3..cf90c87 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -725,10 +725,8 @@ struct ath10k {
        const void *firmware_data;
        size_t firmware_len;

-       union {
-               const struct firmware *pre_cal_file;
-               const struct firmware *cal_file;
-       };
+       const struct firmware *pre_cal_file;
+       const struct firmware *cal_file;

        struct {
                const void *firmware_codeswap_data;

> in addition this message here floods the kernel log
> [125917.204121] ath10k_pci 0001:01:00.0: failed to increase tx mgmt
> pending count: -16, dropping
> but it has no further visible side effects
> 
Yeah... It is harmless, but it would be better to lower the debug level.

-Rajkumar

^ permalink raw reply related	[flat|nested] 33+ messages in thread
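
To make the kind of aliasing the change above removes easier to see, here is a small, contrived user-space example: with an anonymous union, pre_cal_file and cal_file share the same storage, so whichever pointer is assigned last silently overwrites the other. The struct and file names below are simplified stand-ins, not the driver's real definitions.

#include <stdio.h>

struct firmware_blob {
	const char *name;
};

struct example_state {
	/* Before the fix: both pointers alias the same storage. */
	union {
		const struct firmware_blob *pre_cal_file;
		const struct firmware_blob *cal_file;
	};
};

int main(void)
{
	static const struct firmware_blob pre_cal = { "pre-cal data" };
	static const struct firmware_blob board_cal = { "board cal data" };
	struct example_state st;

	st.pre_cal_file = &pre_cal;	/* pre-cal path stores its file  */
	st.cal_file = &board_cal;	/* cal path stores its file ...  */

	/* ... and the pre-cal pointer is silently gone. */
	printf("pre_cal_file now refers to: %s\n", st.pre_cal_file->name);
	return 0;
}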

* Re: latest ath10k pending tree bug's
  2016-03-29 11:02           ` Manoharan, Rajkumar
@ 2016-03-29 16:02             ` Sebastian Gottschall
  0 siblings, 0 replies; 33+ messages in thread
From: Sebastian Gottschall @ 2016-03-29 16:02 UTC (permalink / raw)
  To: Manoharan, Rajkumar, ath10k, Rajkumar Manoharan

Tried it. It works.

Thanks for the quick solution.

Sebastian

Am 29.03.2016 um 13:02 schrieb Manoharan, Rajkumar:
>> the new introduced code which initialized the precal data for new
>> chipsets will break QCA99XX support.
>> so ath10k fails to initialize.
>> i commented out
>> ret = ath10k_core_pre_cal_config(ar);
>> and
>>    ret = ath10k_core_pre_cal_download(ar);
>>
>> to get it working again. this might not affect real pcie cards. but on
>> IPQ8064 based boards with on board wifi cards it shows that behaviour.
>>
> Can you please try below change?
>
> diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
> index ca45da3..cf90c87 100644
> --- a/drivers/net/wireless/ath/ath10k/core.h
> +++ b/drivers/net/wireless/ath/ath10k/core.h
> @@ -725,10 +725,8 @@ struct ath10k {
>          const void *firmware_data;
>          size_t firmware_len;
>
> -       union {
> -               const struct firmware *pre_cal_file;
> -               const struct firmware *cal_file;
> -       };
> +       const struct firmware *pre_cal_file;
> +       const struct firmware *cal_file;
>
>          struct {
>                  const void *firmware_codeswap_data;
>
>> in addition this message here floods the kernel log
>> [125917.204121] ath10k_pci 0001:01:00.0: failed to increase tx mgmt
>> pending count: -16, dropping
>> but it has no further visible side effects
>>
> Yeah... It is harmless but better to lower debug level.
>
> -Rajkumar




^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/9] ath10k: improve throughput performance
  2016-03-22 11:52 ` Rajkumar Manoharan
@ 2016-04-04 14:55   ` Valo, Kalle
  -1 siblings, 0 replies; 33+ messages in thread
From: Valo, Kalle @ 2016-04-04 14:55 UTC (permalink / raw)
  To: Manoharan, Rajkumar; +Cc: ath10k, linux-wireless, rmanohar

Rajkumar Manoharan <rmanohar@qti.qualcomm.com> writes:

> Hi All,
>
> In order to reuse HTT Rx descriptor (copy engine 5), HTT response
> processing should be decoupled from txrx data processing. This change also
> helps to reduce rx ring lock contention. As txrx tasklet's work load is
> reduced, rx replenish task can be combined with txrx_task. Refilling
> complete rx ring from txrx tasket is affecting UDP UL traffic in AP135
> platform. Hence existing refill threshold is updated to meet peak
> throughput in both AP135 and AP148 platforms. Instead of tasklet existing
> refill timer is used to reschedule replenish work at an interval of 5 ms
> incase of more deficit.
>
> This series are experimented in both AP148(QCA99x0) & IPQ4019 platforms.
> Below are consolidated report alongwith CPU usage. Thanks Tamizh for helping 
> to verify the changes.
>
>         IPQ4019(TOT)  IPQ4019(+rework)   AP148(TOT)     AP148(+rework)
>         ===========   ===============    ==========     =============
> TCP DL   639 (40%)       646 (42%)       1134 (71%)      1134 (71%)
> TCP UL   661 (31%)       663 (30%)       1244 (71%)      1270 (72%)
> UDP DL   670 (50%)       682 (49%)       1240 (73%)      1244 (75%)
>
>         AP135 (OpenWrt TOT)     AP135 (+changes)
>         ==================      ===============
>
> TCP DL          603             620
> TCP UL          430             428
> UDP DL          758             803
> UDP UL          420             450
>
> -Rajkumar
>
> Rajkumar Manoharan (9):
>   ath10k: speedup htt rx descriptor processing for tx completion
>   ath10k: copy tx fetch indication message
>   ath10k: remove unused fw_desc processing
>   ath10k: cleanup amsdu processing for rx indication
>   ath10k: speedup htt rx descriptor processing for rx_ind
>   ath10k: register ath10k_htt_htc_t2h_msg_handler
>   ath10k: cleanup copy engine receive next completion
>   ath10k: reuse copy engine 5 (htt rx) descriptors
>   ath10k: combine txrx and replenish task

Applied, thanks.

-- 
Kalle Valo

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2016-04-04 14:55 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-22 11:52 [PATCH 0/9] ath10k: improve throughput performance Rajkumar Manoharan
2016-03-22 11:52 ` Rajkumar Manoharan
2016-03-22 11:52 ` [PATCH 1/9] ath10k: speedup htt rx descriptor processing for tx completion Rajkumar Manoharan
2016-03-22 11:52   ` Rajkumar Manoharan
2016-03-24 13:08   ` Valo, Kalle
2016-03-24 13:08     ` Valo, Kalle
2016-03-24 15:52     ` Manoharan, Rajkumar
2016-03-24 15:52       ` Manoharan, Rajkumar
2016-03-24 16:13       ` Manoharan, Rajkumar
2016-03-24 16:13         ` Manoharan, Rajkumar
2016-03-29  7:39       ` Valo, Kalle
2016-03-29  7:39         ` Valo, Kalle
2016-03-29  9:12         ` latest ath10k pending tree bug's Sebastian Gottschall
2016-03-29 11:02           ` Manoharan, Rajkumar
2016-03-29 16:02             ` Sebastian Gottschall
2016-03-22 11:52 ` [PATCH 2/9] ath10k: copy tx fetch indication message Rajkumar Manoharan
2016-03-22 11:52   ` Rajkumar Manoharan
2016-03-22 11:52 ` [PATCH 3/9] ath10k: remove unused fw_desc processing Rajkumar Manoharan
2016-03-22 11:52   ` Rajkumar Manoharan
2016-03-22 11:52 ` [PATCH 4/9] ath10k: cleanup amsdu processing for rx indication Rajkumar Manoharan
2016-03-22 11:52   ` Rajkumar Manoharan
2016-03-22 11:52 ` [PATCH 5/9] ath10k: speedup htt rx descriptor processing for rx_ind Rajkumar Manoharan
2016-03-22 11:52   ` Rajkumar Manoharan
2016-03-22 11:52 ` [PATCH 6/9] ath10k: register ath10k_htt_htc_t2h_msg_handler Rajkumar Manoharan
2016-03-22 11:52   ` Rajkumar Manoharan
2016-03-22 11:52 ` [PATCH 7/9] ath10k: cleanup copy engine receive next completion Rajkumar Manoharan
2016-03-22 11:52   ` Rajkumar Manoharan
2016-03-22 11:52 ` [PATCH 8/9] ath10k: reuse copy engine 5 (htt rx) descriptors Rajkumar Manoharan
2016-03-22 11:52   ` Rajkumar Manoharan
2016-03-22 11:52 ` [PATCH 9/9] ath10k: combine txrx and replenish task Rajkumar Manoharan
2016-03-22 11:52   ` Rajkumar Manoharan
2016-04-04 14:55 ` [PATCH 0/9] ath10k: improve throughput performance Valo, Kalle
2016-04-04 14:55   ` Valo, Kalle
