Linux-Wireless Archive on lore.kernel.org
 help / color / Atom feed
* [RFC 0/7] Add support to process rx packets in thread
@ 2020-07-21 17:14 Rakesh Pillai
  2020-07-21 17:14 ` [RFC 1/7] mac80211: Add check for napi handle before WARN_ON Rakesh Pillai
                   ` (7 more replies)
  0 siblings, 8 replies; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-21 17:14 UTC (permalink / raw)
  To: ath10k
  Cc: linux-wireless, linux-kernel, kvalo, johannes, davem, kuba,
	netdev, dianders, evgreen, Rakesh Pillai

NAPI gets scheduled on the CPU core which got the
interrupt. The linux scheduler cannot move it to a
different core, even if the CPU on which NAPI is running
is heavily loaded. This can lead to degraded wifi
performance when running traffic at peak data rates.

A thread on the other hand can be moved to different
CPU cores, if the one on which its running is heavily
loaded. During high incoming data traffic, this gives
better performance, since the thread can be moved to a
less loaded or sometimes even a more powerful CPU core
to account for the required CPU performance in order
to process the incoming packets.

This patch series adds the support to use a high priority
thread to process the incoming packets, as opposed to
everything being done in NAPI context.

The rx thread can be enabled by using a module parameter
when loading the ath10k_snoc module.

---
This patch series is dependent on the below patch series
https://patchwork.kernel.org/project/ath10k/list/?series=315759

Rakesh Pillai (7):
  mac80211: Add check for napi handle before WARN_ON
  ath10k: Add support to process rx packet in thread
  ath10k: Add module param to enable rx thread
  ath10k: Do not exhaust budget on process tx completion
  ath10k: Handle the rx packet processing in thread
  ath10k: Add deliver to stack from thread context
  ath10k: Handle rx thread suspend and resume

 drivers/net/wireless/ath/ath10k/core.c   |  64 +++++++++++++++++++
 drivers/net/wireless/ath/ath10k/core.h   |  33 ++++++++++
 drivers/net/wireless/ath/ath10k/htt.h    |   2 +
 drivers/net/wireless/ath/ath10k/htt_rx.c |  66 ++++++++++++++-----
 drivers/net/wireless/ath/ath10k/snoc.c   | 105 ++++++++++++++++++++++++++++++-
 net/mac80211/rx.c                        |   2 +-
 6 files changed, 253 insertions(+), 19 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC 1/7] mac80211: Add check for napi handle before WARN_ON
  2020-07-21 17:14 [RFC 0/7] Add support to process rx packets in thread Rakesh Pillai
@ 2020-07-21 17:14 ` Rakesh Pillai
  2020-07-22 12:56   ` Johannes Berg
  2020-07-21 17:14 ` [RFC 2/7] ath10k: Add support to process rx packet in thread Rakesh Pillai
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-21 17:14 UTC (permalink / raw)
  To: ath10k
  Cc: linux-wireless, linux-kernel, kvalo, johannes, davem, kuba,
	netdev, dianders, evgreen, Rakesh Pillai

The function ieee80211_rx_napi can be now called
from a thread context as well, with napi context
being NULL.

Hence add the napi context check before giving out
a warning for softirq count being 0.

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1

Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
---
 net/mac80211/rx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index a88ab6f..1e703f1 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -4652,7 +4652,7 @@ void ieee80211_rx_napi(struct ieee80211_hw *hw, struct ieee80211_sta *pubsta,
 	struct ieee80211_supported_band *sband;
 	struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
 
-	WARN_ON_ONCE(softirq_count() == 0);
+	WARN_ON_ONCE(napi && softirq_count() == 0);
 
 	if (WARN_ON(status->band >= NUM_NL80211_BANDS))
 		goto drop;
-- 
2.7.4


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC 2/7] ath10k: Add support to process rx packet in thread
  2020-07-21 17:14 [RFC 0/7] Add support to process rx packets in thread Rakesh Pillai
  2020-07-21 17:14 ` [RFC 1/7] mac80211: Add check for napi handle before WARN_ON Rakesh Pillai
@ 2020-07-21 17:14 ` Rakesh Pillai
  2020-07-21 21:53   ` Rajkumar Manoharan
  2020-07-21 17:14 ` [RFC 3/7] ath10k: Add module param to enable rx thread Rakesh Pillai
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-21 17:14 UTC (permalink / raw)
  To: ath10k
  Cc: linux-wireless, linux-kernel, kvalo, johannes, davem, kuba,
	netdev, dianders, evgreen, Rakesh Pillai

NAPI instance gets scheduled on a CPU core on which
the IRQ was triggered. The processing of rx packets
can be CPU intensive and since NAPI cannot be moved
to a different CPU core, to get better performance,
its better to move the gist of rx packet processing
in a high priority thread.

Add the init/deinit part for a thread to process the
receive packets.

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1

Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
---
 drivers/net/wireless/ath/ath10k/core.c | 33 +++++++++++++++++++++++++++
 drivers/net/wireless/ath/ath10k/core.h | 23 +++++++++++++++++++
 drivers/net/wireless/ath/ath10k/snoc.c | 41 ++++++++++++++++++++++++++++++++++
 3 files changed, 97 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 9104496..2b520a0 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -668,6 +668,39 @@ static unsigned int ath10k_core_get_fw_feature_str(char *buf,
 	return scnprintf(buf, buf_len, "%s", ath10k_core_fw_feature_str[feat]);
 }
 
+int ath10k_core_thread_shutdown(struct ath10k *ar,
+				struct ath10k_thread *thread)
+{
+	ath10k_dbg(ar, ATH10K_DBG_BOOT, "shutting down %s\n", thread->name);
+	set_bit(ATH10K_THREAD_EVENT_SHUTDOWN, thread->event_flags);
+	wake_up_process(thread->task);
+	wait_for_completion(&thread->shutdown);
+	ath10k_info(ar, "thread %s exited\n", thread->name);
+
+	return 0;
+}
+EXPORT_SYMBOL(ath10k_core_thread_shutdown);
+
+int ath10k_core_thread_init(struct ath10k *ar,
+			    struct ath10k_thread *thread,
+			    int (*handler)(void *data),
+			    char *thread_name)
+{
+	thread->task = kthread_create(handler, thread, thread_name);
+	if (IS_ERR(thread->task))
+		return -EINVAL;
+
+	init_waitqueue_head(&thread->wait_q);
+	init_completion(&thread->shutdown);
+	memcpy(thread->name, thread_name, ATH10K_THREAD_NAME_SIZE_MAX);
+	clear_bit(ATH10K_THREAD_EVENT_SHUTDOWN, thread->event_flags);
+	ath10k_info(ar, "Starting thread %s\n", thread_name);
+	wake_up_process(thread->task);
+
+	return 0;
+}
+EXPORT_SYMBOL(ath10k_core_thread_init);
+
 void ath10k_core_get_fw_features_str(struct ath10k *ar,
 				     char *buf,
 				     size_t buf_len)
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 5c18f6c..96919e8 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -970,6 +970,22 @@ struct ath10k_bus_params {
 	bool hl_msdu_ids;
 };
 
+#define ATH10K_THREAD_NAME_SIZE_MAX	32
+
+enum ath10k_thread_events {
+	ATH10K_THREAD_EVENT_SHUTDOWN,
+	ATH10K_THREAD_EVENT_MAX,
+};
+
+struct ath10k_thread {
+	struct ath10k *ar;
+	struct task_struct *task;
+	struct completion shutdown;
+	wait_queue_head_t wait_q;
+	DECLARE_BITMAP(event_flags, ATH10K_THREAD_EVENT_MAX);
+	char name[ATH10K_THREAD_NAME_SIZE_MAX];
+};
+
 struct ath10k {
 	struct ath_common ath_common;
 	struct ieee80211_hw *hw;
@@ -982,6 +998,7 @@ struct ath10k {
 	} msa;
 	u8 mac_addr[ETH_ALEN];
 
+	struct ath10k_thread rx_thread;
 	enum ath10k_hw_rev hw_rev;
 	u16 dev_id;
 	u32 chip_id;
@@ -1276,6 +1293,12 @@ static inline bool ath10k_peer_stats_enabled(struct ath10k *ar)
 
 extern unsigned long ath10k_coredump_mask;
 
+int ath10k_core_thread_shutdown(struct ath10k *ar,
+				struct ath10k_thread *thread);
+int ath10k_core_thread_init(struct ath10k *ar,
+			    struct ath10k_thread *thread,
+			    int (*handler)(void *data),
+			    char *thread_name);
 struct ath10k *ath10k_core_create(size_t priv_size, struct device *dev,
 				  enum ath10k_bus bus,
 				  enum ath10k_hw_rev hw_rev,
diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index 1ef5fdb..463c34e 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -909,6 +909,31 @@ static void ath10k_snoc_buffer_cleanup(struct ath10k *ar)
 	}
 }
 
+int ath10k_snoc_rx_thread_loop(void *data)
+{
+	struct ath10k_thread *rx_thread = data;
+	struct ath10k *ar = rx_thread->ar;
+	bool shutdown = false;
+
+	ath10k_dbg(ar, ATH10K_DBG_SNOC, "rx thread started\n");
+	set_user_nice(current, -1);
+
+	while (!shutdown) {
+		wait_event_interruptible(
+			rx_thread->wait_q,
+			(test_bit(ATH10K_THREAD_EVENT_SHUTDOWN,
+				  rx_thread->event_flags)));
+		if (test_and_clear_bit(ATH10K_THREAD_EVENT_SHUTDOWN,
+				       rx_thread->event_flags))
+			shutdown = true;
+	}
+
+	ath10k_dbg(ar, ATH10K_DBG_SNOC, "rx thread exiting\n");
+	complete(&rx_thread->shutdown);
+
+	do_exit(0);
+}
+
 static void ath10k_snoc_hif_stop(struct ath10k *ar)
 {
 	if (!test_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags))
@@ -916,6 +941,7 @@ static void ath10k_snoc_hif_stop(struct ath10k *ar)
 
 	napi_synchronize(&ar->napi);
 	napi_disable(&ar->napi);
+	ath10k_core_thread_shutdown(ar, &ar->rx_thread);
 	ath10k_snoc_buffer_cleanup(ar);
 	ath10k_dbg(ar, ATH10K_DBG_BOOT, "boot hif stop\n");
 }
@@ -923,9 +949,19 @@ static void ath10k_snoc_hif_stop(struct ath10k *ar)
 static int ath10k_snoc_hif_start(struct ath10k *ar)
 {
 	struct ath10k_snoc *ar_snoc = ath10k_snoc_priv(ar);
+	int ret;
 
 	bitmap_clear(ar_snoc->pending_ce_irqs, 0, CE_COUNT_MAX);
 	napi_enable(&ar->napi);
+
+	ret = ath10k_core_thread_init(ar, &ar->rx_thread,
+				      ath10k_snoc_rx_thread_loop,
+				      "ath10k_rx_thread");
+	if (ret) {
+		ath10k_err(ar, "failed to start rx thread\n");
+		goto rx_thread_fail;
+	}
+
 	ath10k_snoc_irq_enable(ar);
 	ath10k_snoc_rx_post(ar);
 
@@ -934,6 +970,10 @@ static int ath10k_snoc_hif_start(struct ath10k *ar)
 	ath10k_dbg(ar, ATH10K_DBG_BOOT, "boot hif start\n");
 
 	return 0;
+
+rx_thread_fail:
+	napi_disable(&ar->napi);
+	return ret;
 }
 
 static int ath10k_snoc_init_pipes(struct ath10k *ar)
@@ -1652,6 +1692,7 @@ static int ath10k_snoc_probe(struct platform_device *pdev)
 		return -ENOMEM;
 	}
 
+	ar->rx_thread.ar = ar;
 	ar_snoc = ath10k_snoc_priv(ar);
 	ar_snoc->dev = pdev;
 	platform_set_drvdata(pdev, ar);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC 3/7] ath10k: Add module param to enable rx thread
  2020-07-21 17:14 [RFC 0/7] Add support to process rx packets in thread Rakesh Pillai
  2020-07-21 17:14 ` [RFC 1/7] mac80211: Add check for napi handle before WARN_ON Rakesh Pillai
  2020-07-21 17:14 ` [RFC 2/7] ath10k: Add support to process rx packet in thread Rakesh Pillai
@ 2020-07-21 17:14 ` Rakesh Pillai
  2020-07-21 17:14 ` [RFC 4/7] ath10k: Do not exhaust budget on process tx completion Rakesh Pillai
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-21 17:14 UTC (permalink / raw)
  To: ath10k
  Cc: linux-wireless, linux-kernel, kvalo, johannes, davem, kuba,
	netdev, dianders, evgreen, Rakesh Pillai

Add a module parameter to enable or disable
the processing of received packets in rx thread.

To enable rx packet processing in a thread
context, use the belo command to load driver:
insmod ath10k_snoc.ko rx_thread_enable=1

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1

Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
---
 drivers/net/wireless/ath/ath10k/core.h |  1 +
 drivers/net/wireless/ath/ath10k/snoc.c | 24 +++++++++++++++++-------
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 96919e8..59bdf11 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -998,6 +998,7 @@ struct ath10k {
 	} msa;
 	u8 mac_addr[ETH_ALEN];
 
+	bool rx_thread_enable;
 	struct ath10k_thread rx_thread;
 	enum ath10k_hw_rev hw_rev;
 	u16 dev_id;
diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index 463c34e..f01725b 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -26,6 +26,12 @@
 #define CE_POLL_PIPE 4
 #define ATH10K_SNOC_WAKE_IRQ 2
 
+static bool ath10k_rx_thread_enable;
+
+module_param_named(rx_thread_enable, ath10k_rx_thread_enable, bool, 0644);
+
+MODULE_PARM_DESC(rx_thread_enable, "Receive packet processing in thread");
+
 static char *const ce_name[] = {
 	"WLAN_CE_0",
 	"WLAN_CE_1",
@@ -941,7 +947,8 @@ static void ath10k_snoc_hif_stop(struct ath10k *ar)
 
 	napi_synchronize(&ar->napi);
 	napi_disable(&ar->napi);
-	ath10k_core_thread_shutdown(ar, &ar->rx_thread);
+	if (ar->rx_thread_enable)
+		ath10k_core_thread_shutdown(ar, &ar->rx_thread);
 	ath10k_snoc_buffer_cleanup(ar);
 	ath10k_dbg(ar, ATH10K_DBG_BOOT, "boot hif stop\n");
 }
@@ -954,12 +961,14 @@ static int ath10k_snoc_hif_start(struct ath10k *ar)
 	bitmap_clear(ar_snoc->pending_ce_irqs, 0, CE_COUNT_MAX);
 	napi_enable(&ar->napi);
 
-	ret = ath10k_core_thread_init(ar, &ar->rx_thread,
-				      ath10k_snoc_rx_thread_loop,
-				      "ath10k_rx_thread");
-	if (ret) {
-		ath10k_err(ar, "failed to start rx thread\n");
-		goto rx_thread_fail;
+	if (ar->rx_thread_enable) {
+		ret = ath10k_core_thread_init(ar, &ar->rx_thread,
+					      ath10k_snoc_rx_thread_loop,
+					      "ath10k_rx_thread");
+		if (ret) {
+			ath10k_err(ar, "failed to start rx thread\n");
+			goto rx_thread_fail;
+		}
 	}
 
 	ath10k_snoc_irq_enable(ar);
@@ -1693,6 +1702,7 @@ static int ath10k_snoc_probe(struct platform_device *pdev)
 	}
 
 	ar->rx_thread.ar = ar;
+	ar->rx_thread_enable = ath10k_rx_thread_enable;
 	ar_snoc = ath10k_snoc_priv(ar);
 	ar_snoc->dev = pdev;
 	platform_set_drvdata(pdev, ar);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC 4/7] ath10k: Do not exhaust budget on process tx completion
  2020-07-21 17:14 [RFC 0/7] Add support to process rx packets in thread Rakesh Pillai
                   ` (2 preceding siblings ...)
  2020-07-21 17:14 ` [RFC 3/7] ath10k: Add module param to enable rx thread Rakesh Pillai
@ 2020-07-21 17:14 ` Rakesh Pillai
  2020-07-21 17:14 ` [RFC 5/7] ath10k: Handle the rx packet processing in thread Rakesh Pillai
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-21 17:14 UTC (permalink / raw)
  To: ath10k
  Cc: linux-wireless, linux-kernel, kvalo, johannes, davem, kuba,
	netdev, dianders, evgreen, Rakesh Pillai

Currently the entire NAPI budget is marked as exhausted
if any tx completion is processed.
In scenarios of bi-directional traffic, this leads to a
situation where the irqs are never enabled and the NAPI
is rescheuled again and again.

Increase the work done quota by the number of tx completions
which are processed in the NAPI context.

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1

Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
---
 drivers/net/wireless/ath/ath10k/htt_rx.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index cac05e7..a4a6618 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -4077,21 +4077,18 @@ int ath10k_htt_txrx_compl_task(struct ath10k *ar, int budget)
 	/* Deliver received data after processing data from hardware */
 	quota = ath10k_htt_rx_deliver_msdu(ar, quota, budget);
 
-	/* From NAPI documentation:
-	 *  The napi poll() function may also process TX completions, in which
-	 *  case if it processes the entire TX ring then it should count that
-	 *  work as the rest of the budget.
-	 */
-	if ((quota < budget) && !kfifo_is_empty(&htt->txdone_fifo))
-		quota = budget;
-
 	/* kfifo_get: called only within txrx_tasklet so it's neatly serialized.
 	 * From kfifo_get() documentation:
 	 *  Note that with only one concurrent reader and one concurrent writer,
 	 *  you don't need extra locking to use these macro.
 	 */
-	while (kfifo_get(&htt->txdone_fifo, &tx_done))
+	while (kfifo_get(&htt->txdone_fifo, &tx_done)) {
 		ath10k_txrx_tx_unref(htt, &tx_done);
+		quota++;
+	}
+
+	if (quota > budget)
+		resched_napi = true;
 
 	ath10k_mac_tx_push_pending(ar);
 
-- 
2.7.4


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC 5/7] ath10k: Handle the rx packet processing in thread
  2020-07-21 17:14 [RFC 0/7] Add support to process rx packets in thread Rakesh Pillai
                   ` (3 preceding siblings ...)
  2020-07-21 17:14 ` [RFC 4/7] ath10k: Do not exhaust budget on process tx completion Rakesh Pillai
@ 2020-07-21 17:14 ` Rakesh Pillai
  2020-07-21 17:14 ` [RFC 6/7] ath10k: Add deliver to stack from thread context Rakesh Pillai
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-21 17:14 UTC (permalink / raw)
  To: ath10k
  Cc: linux-wireless, linux-kernel, kvalo, johannes, davem, kuba,
	netdev, dianders, evgreen, Rakesh Pillai

Add the support to handle the receive packet and
the tx completion processing in a thread context
if the feature has been enabled via module parameter.

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1

Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
---
 drivers/net/wireless/ath/ath10k/core.c   |  8 ++++++
 drivers/net/wireless/ath/ath10k/core.h   |  4 +++
 drivers/net/wireless/ath/ath10k/htt.h    |  2 ++
 drivers/net/wireless/ath/ath10k/htt_rx.c | 46 +++++++++++++++++++++++++++-----
 drivers/net/wireless/ath/ath10k/snoc.c   | 12 +++++++--
 5 files changed, 64 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 2b520a0..4064fa2 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -668,6 +668,14 @@ static unsigned int ath10k_core_get_fw_feature_str(char *buf,
 	return scnprintf(buf, buf_len, "%s", ath10k_core_fw_feature_str[feat]);
 }
 
+void ath10k_core_thread_post_event(struct ath10k_thread *thread,
+				   enum ath10k_thread_events event)
+{
+	set_bit(event, thread->event_flags);
+	wake_up(&thread->wait_q);
+}
+EXPORT_SYMBOL(ath10k_core_thread_post_event);
+
 int ath10k_core_thread_shutdown(struct ath10k *ar,
 				struct ath10k_thread *thread)
 {
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 59bdf11..596d31b 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -974,6 +974,8 @@ struct ath10k_bus_params {
 
 enum ath10k_thread_events {
 	ATH10K_THREAD_EVENT_SHUTDOWN,
+	ATH10K_THREAD_EVENT_RX_POST,
+	ATH10K_THREAD_EVENT_TX_POST,
 	ATH10K_THREAD_EVENT_MAX,
 };
 
@@ -1294,6 +1296,8 @@ static inline bool ath10k_peer_stats_enabled(struct ath10k *ar)
 
 extern unsigned long ath10k_coredump_mask;
 
+void ath10k_core_thread_post_event(struct ath10k_thread *thread,
+				   enum ath10k_thread_events event);
 int ath10k_core_thread_shutdown(struct ath10k *ar,
 				struct ath10k_thread *thread);
 int ath10k_core_thread_init(struct ath10k *ar,
diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index cad5949..e3cb723 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -1970,6 +1970,8 @@ struct ath10k_htt {
 		spinlock_t lock;
 	} rx_ring;
 
+	/* Protects access to in order indication queue */
+	spinlock_t rx_in_ord_q_lock;
 	unsigned int prefetch_len;
 
 	/* Protects access to pending_tx, num_pending_tx */
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index a4a6618..becbd56 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -796,6 +796,7 @@ int ath10k_htt_rx_alloc(struct ath10k_htt *htt)
 	timer_setup(timer, ath10k_htt_rx_ring_refill_retry, 0);
 
 	spin_lock_init(&htt->rx_ring.lock);
+	spin_lock_init(&htt->rx_in_ord_q_lock);
 
 	htt->rx_ring.fill_cnt = 0;
 	htt->rx_ring.sw_rd_idx.msdu_payld = 0;
@@ -2702,10 +2703,17 @@ static void ath10k_htt_rx_tx_compl_ind(struct ath10k *ar,
 		 */
 		if (ar->bus_param.dev_type == ATH10K_DEV_TYPE_HL) {
 			ath10k_txrx_tx_unref(htt, &tx_done);
-		} else if (!kfifo_put(&htt->txdone_fifo, tx_done)) {
-			ath10k_warn(ar, "txdone fifo overrun, msdu_id %d status %d\n",
-				    tx_done.msdu_id, tx_done.status);
-			ath10k_txrx_tx_unref(htt, &tx_done);
+		} else {
+			if (!kfifo_put(&htt->txdone_fifo, tx_done)) {
+				ath10k_warn(ar, "txdone fifo overrun, msdu_id %d status %d\n",
+					    tx_done.msdu_id, tx_done.status);
+				ath10k_txrx_tx_unref(htt, &tx_done);
+				continue;
+			}
+			if (ar->rx_thread_enable)
+				ath10k_core_thread_post_event(
+					&ar->rx_thread,
+					ATH10K_THREAD_EVENT_TX_POST);
 		}
 	}
 
@@ -3903,7 +3911,16 @@ bool ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
 		break;
 	}
 	case HTT_T2H_MSG_TYPE_RX_IN_ORD_PADDR_IND: {
-		skb_queue_tail(&htt->rx_in_ord_compl_q, skb);
+		if (!ar->rx_thread_enable) {
+			skb_queue_tail(&htt->rx_in_ord_compl_q, skb);
+		} else {
+			spin_lock_bh(&htt->rx_in_ord_q_lock);
+			skb_queue_tail(&htt->rx_in_ord_compl_q, skb);
+			spin_unlock_bh(&htt->rx_in_ord_q_lock);
+			ath10k_core_thread_post_event(
+				&ar->rx_thread,
+				ATH10K_THREAD_EVENT_RX_POST);
+		}
 		return false;
 	}
 	case HTT_T2H_MSG_TYPE_TX_CREDIT_UPDATE_IND: {
@@ -4032,6 +4049,23 @@ int ath10k_htt_rx_hl_indication(struct ath10k *ar, int budget)
 }
 EXPORT_SYMBOL(ath10k_htt_rx_hl_indication);
 
+static inline struct sk_buff *
+ath10k_htt_dequeue_in_ord_q(struct ath10k_htt *htt)
+{
+	struct ath10k *ar = htt->ar;
+	struct sk_buff *skb = NULL;
+
+	if (ar->rx_thread_enable) {
+		spin_lock_bh(&htt->rx_in_ord_q_lock);
+		skb = skb_dequeue(&htt->rx_in_ord_compl_q);
+		spin_unlock_bh(&htt->rx_in_ord_q_lock);
+	} else {
+		skb = skb_dequeue(&htt->rx_in_ord_compl_q);
+	}
+
+	return skb;
+}
+
 int ath10k_htt_txrx_compl_task(struct ath10k *ar, int budget)
 {
 	struct ath10k_htt *htt = &ar->htt;
@@ -4053,7 +4087,7 @@ int ath10k_htt_txrx_compl_task(struct ath10k *ar, int budget)
 		goto exit;
 	}
 
-	while ((skb = skb_dequeue(&htt->rx_in_ord_compl_q))) {
+	while ((skb = ath10k_htt_dequeue_in_ord_q(htt))) {
 		spin_lock_bh(&htt->rx_ring.lock);
 		ret = ath10k_htt_rx_in_ord_ind(ar, skb);
 		spin_unlock_bh(&htt->rx_ring.lock);
diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index f01725b..3eb5eac 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -920,6 +920,7 @@ int ath10k_snoc_rx_thread_loop(void *data)
 	struct ath10k_thread *rx_thread = data;
 	struct ath10k *ar = rx_thread->ar;
 	bool shutdown = false;
+	u32 thread_budget = 8192;
 
 	ath10k_dbg(ar, ATH10K_DBG_SNOC, "rx thread started\n");
 	set_user_nice(current, -1);
@@ -927,8 +928,14 @@ int ath10k_snoc_rx_thread_loop(void *data)
 	while (!shutdown) {
 		wait_event_interruptible(
 			rx_thread->wait_q,
-			(test_bit(ATH10K_THREAD_EVENT_SHUTDOWN,
+			(test_and_clear_bit(ATH10K_THREAD_EVENT_RX_POST,
+					    rx_thread->event_flags) ||
+			 test_and_clear_bit(ATH10K_THREAD_EVENT_TX_POST,
+					    rx_thread->event_flags) ||
+			 test_bit(ATH10K_THREAD_EVENT_SHUTDOWN,
 				  rx_thread->event_flags)));
+
+		ath10k_htt_txrx_compl_task(ar, thread_budget);
 		if (test_and_clear_bit(ATH10K_THREAD_EVENT_SHUTDOWN,
 				       rx_thread->event_flags))
 			shutdown = true;
@@ -1235,7 +1242,8 @@ static int ath10k_snoc_napi_poll(struct napi_struct *ctx, int budget)
 			ath10k_ce_enable_interrupt(ar, ce_id);
 		}
 
-	done = ath10k_htt_txrx_compl_task(ar, budget);
+	if (!ar->rx_thread_enable)
+		done = ath10k_htt_txrx_compl_task(ar, budget);
 
 	if (done < budget)
 		napi_complete(ctx);
-- 
2.7.4


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC 6/7] ath10k: Add deliver to stack from thread context
  2020-07-21 17:14 [RFC 0/7] Add support to process rx packets in thread Rakesh Pillai
                   ` (4 preceding siblings ...)
  2020-07-21 17:14 ` [RFC 5/7] ath10k: Handle the rx packet processing in thread Rakesh Pillai
@ 2020-07-21 17:14 ` Rakesh Pillai
  2020-07-21 17:14 ` [RFC 7/7] ath10k: Handle rx thread suspend and resume Rakesh Pillai
  2020-07-21 17:25 ` [RFC 0/7] Add support to process rx packets in thread Andrew Lunn
  7 siblings, 0 replies; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-21 17:14 UTC (permalink / raw)
  To: ath10k
  Cc: linux-wireless, linux-kernel, kvalo, johannes, davem, kuba,
	netdev, dianders, evgreen, Rakesh Pillai

When the receive packets are submitted to the stack
from a thread context, the NAPI handle should be passed
as NULL to the function ieee80211_rx_napi. This will
make sure that the packets are submitted to stack via
non-napi method

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1

Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
---
 drivers/net/wireless/ath/ath10k/htt_rx.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index becbd56..85c169c 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -1321,7 +1321,10 @@ static void ath10k_process_rx(struct ath10k *ar, struct sk_buff *skb)
 	trace_ath10k_rx_hdr(ar, skb->data, skb->len);
 	trace_ath10k_rx_payload(ar, skb->data, skb->len);
 
-	ieee80211_rx_napi(ar->hw, NULL, skb, &ar->napi);
+	if (in_serving_softirq())
+		ieee80211_rx_napi(ar->hw, NULL, skb, &ar->napi);
+	else
+		ieee80211_rx_napi(ar->hw, NULL, skb, NULL);
 }
 
 static int ath10k_htt_rx_nwifi_hdrlen(struct ath10k *ar,
-- 
2.7.4


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [RFC 7/7] ath10k: Handle rx thread suspend and resume
  2020-07-21 17:14 [RFC 0/7] Add support to process rx packets in thread Rakesh Pillai
                   ` (5 preceding siblings ...)
  2020-07-21 17:14 ` [RFC 6/7] ath10k: Add deliver to stack from thread context Rakesh Pillai
@ 2020-07-21 17:14 ` Rakesh Pillai
  2020-07-23 23:06   ` Sebastian Gottschall
  2020-07-21 17:25 ` [RFC 0/7] Add support to process rx packets in thread Andrew Lunn
  7 siblings, 1 reply; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-21 17:14 UTC (permalink / raw)
  To: ath10k
  Cc: linux-wireless, linux-kernel, kvalo, johannes, davem, kuba,
	netdev, dianders, evgreen, Rakesh Pillai

During the system suspend or resume, the rx thread
also needs to be suspended or resume respectively.

Handle the rx thread as well during the system
suspend and resume.

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1

Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
---
 drivers/net/wireless/ath/ath10k/core.c | 23 ++++++++++++++++++
 drivers/net/wireless/ath/ath10k/core.h |  5 ++++
 drivers/net/wireless/ath/ath10k/snoc.c | 44 +++++++++++++++++++++++++++++++++-
 3 files changed, 71 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 4064fa2..b82b355 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -668,6 +668,27 @@ static unsigned int ath10k_core_get_fw_feature_str(char *buf,
 	return scnprintf(buf, buf_len, "%s", ath10k_core_fw_feature_str[feat]);
 }
 
+int ath10k_core_thread_suspend(struct ath10k_thread *thread)
+{
+	ath10k_dbg(thread->ar, ATH10K_DBG_BOOT, "Suspending thread %s\n",
+		   thread->name);
+	set_bit(ATH10K_THREAD_EVENT_SUSPEND, thread->event_flags);
+	wait_for_completion(&thread->suspend);
+
+	return 0;
+}
+EXPORT_SYMBOL(ath10k_core_thread_suspend);
+
+int ath10k_core_thread_resume(struct ath10k_thread *thread)
+{
+	ath10k_dbg(thread->ar, ATH10K_DBG_BOOT, "Resuming thread %s\n",
+		   thread->name);
+	complete(&thread->resume);
+
+	return 0;
+}
+EXPORT_SYMBOL(ath10k_core_thread_resume);
+
 void ath10k_core_thread_post_event(struct ath10k_thread *thread,
 				   enum ath10k_thread_events event)
 {
@@ -700,6 +721,8 @@ int ath10k_core_thread_init(struct ath10k *ar,
 
 	init_waitqueue_head(&thread->wait_q);
 	init_completion(&thread->shutdown);
+	init_completion(&thread->suspend);
+	init_completion(&thread->resume);
 	memcpy(thread->name, thread_name, ATH10K_THREAD_NAME_SIZE_MAX);
 	clear_bit(ATH10K_THREAD_EVENT_SHUTDOWN, thread->event_flags);
 	ath10k_info(ar, "Starting thread %s\n", thread_name);
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 596d31b..df65e75 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -976,6 +976,7 @@ enum ath10k_thread_events {
 	ATH10K_THREAD_EVENT_SHUTDOWN,
 	ATH10K_THREAD_EVENT_RX_POST,
 	ATH10K_THREAD_EVENT_TX_POST,
+	ATH10K_THREAD_EVENT_SUSPEND,
 	ATH10K_THREAD_EVENT_MAX,
 };
 
@@ -983,6 +984,8 @@ struct ath10k_thread {
 	struct ath10k *ar;
 	struct task_struct *task;
 	struct completion shutdown;
+	struct completion suspend;
+	struct completion resume;
 	wait_queue_head_t wait_q;
 	DECLARE_BITMAP(event_flags, ATH10K_THREAD_EVENT_MAX);
 	char name[ATH10K_THREAD_NAME_SIZE_MAX];
@@ -1296,6 +1299,8 @@ static inline bool ath10k_peer_stats_enabled(struct ath10k *ar)
 
 extern unsigned long ath10k_coredump_mask;
 
+int ath10k_core_thread_suspend(struct ath10k_thread *thread);
+int ath10k_core_thread_resume(struct ath10k_thread *thread);
 void ath10k_core_thread_post_event(struct ath10k_thread *thread,
 				   enum ath10k_thread_events event);
 int ath10k_core_thread_shutdown(struct ath10k *ar,
diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index 3eb5eac..a373b2b 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -932,13 +932,31 @@ int ath10k_snoc_rx_thread_loop(void *data)
 					    rx_thread->event_flags) ||
 			 test_and_clear_bit(ATH10K_THREAD_EVENT_TX_POST,
 					    rx_thread->event_flags) ||
+			 test_bit(ATH10K_THREAD_EVENT_SUSPEND,
+				  rx_thread->event_flags) ||
 			 test_bit(ATH10K_THREAD_EVENT_SHUTDOWN,
 				  rx_thread->event_flags)));
 
+		if (test_and_clear_bit(ATH10K_THREAD_EVENT_SUSPEND,
+				       rx_thread->event_flags)) {
+			complete(&rx_thread->suspend);
+			ath10k_info(ar, "rx thread suspend\n");
+			wait_for_completion(&rx_thread->resume);
+			ath10k_info(ar, "rx thread resume\n");
+		}
+
 		ath10k_htt_txrx_compl_task(ar, thread_budget);
 		if (test_and_clear_bit(ATH10K_THREAD_EVENT_SHUTDOWN,
 				       rx_thread->event_flags))
 			shutdown = true;
+
+		if (test_and_clear_bit(ATH10K_THREAD_EVENT_SUSPEND,
+				       rx_thread->event_flags)) {
+			complete(&rx_thread->suspend);
+			ath10k_info(ar, "rx thread suspend\n");
+			wait_for_completion(&rx_thread->resume);
+			ath10k_info(ar, "rx thread resume\n");
+		}
 	}
 
 	ath10k_dbg(ar, ATH10K_DBG_SNOC, "rx thread exiting\n");
@@ -1133,15 +1151,30 @@ static int ath10k_snoc_hif_suspend(struct ath10k *ar)
 	if (!device_may_wakeup(ar->dev))
 		return -EPERM;
 
+	if (ar->rx_thread_enable) {
+		ret = ath10k_core_thread_suspend(&ar->rx_thread);
+		if (ret) {
+			ath10k_err(ar, "failed to suspend rx_thread, %d\n",
+				   ret);
+			return ret;
+		}
+	}
+
 	ret = enable_irq_wake(ar_snoc->ce_irqs[ATH10K_SNOC_WAKE_IRQ].irq_line);
 	if (ret) {
 		ath10k_err(ar, "failed to enable wakeup irq :%d\n", ret);
-		return ret;
+		goto fail;
 	}
 
 	ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc device suspended\n");
 
 	return ret;
+
+fail:
+	if (ar->rx_thread_enable)
+		ath10k_core_thread_resume(&ar->rx_thread);
+
+	return ret;
 }
 
 static int ath10k_snoc_hif_resume(struct ath10k *ar)
@@ -1158,6 +1191,15 @@ static int ath10k_snoc_hif_resume(struct ath10k *ar)
 		return ret;
 	}
 
+	if (ar->rx_thread_enable) {
+		ret = ath10k_core_thread_resume(&ar->rx_thread);
+		if (ret) {
+			ath10k_err(ar, "failed to suspend rx_thread, %d\n",
+				   ret);
+			return ret;
+		}
+	}
+
 	ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc device resumed\n");
 
 	return ret;
-- 
2.7.4


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 0/7] Add support to process rx packets in thread
  2020-07-21 17:14 [RFC 0/7] Add support to process rx packets in thread Rakesh Pillai
                   ` (6 preceding siblings ...)
  2020-07-21 17:14 ` [RFC 7/7] ath10k: Handle rx thread suspend and resume Rakesh Pillai
@ 2020-07-21 17:25 ` Andrew Lunn
  2020-07-21 18:05   ` Florian Fainelli
                     ` (3 more replies)
  7 siblings, 4 replies; 39+ messages in thread
From: Andrew Lunn @ 2020-07-21 17:25 UTC (permalink / raw)
  To: Rakesh Pillai
  Cc: ath10k, linux-wireless, linux-kernel, kvalo, johannes, davem,
	kuba, netdev, dianders, evgreen

On Tue, Jul 21, 2020 at 10:44:19PM +0530, Rakesh Pillai wrote:
> NAPI gets scheduled on the CPU core which got the
> interrupt. The linux scheduler cannot move it to a
> different core, even if the CPU on which NAPI is running
> is heavily loaded. This can lead to degraded wifi
> performance when running traffic at peak data rates.
> 
> A thread on the other hand can be moved to different
> CPU cores, if the one on which its running is heavily
> loaded. During high incoming data traffic, this gives
> better performance, since the thread can be moved to a
> less loaded or sometimes even a more powerful CPU core
> to account for the required CPU performance in order
> to process the incoming packets.
> 
> This patch series adds the support to use a high priority
> thread to process the incoming packets, as opposed to
> everything being done in NAPI context.

I don't see why this problem is limited to the ath10k driver. I expect
it applies to all drivers using NAPI. So shouldn't you be solving this
in the NAPI core? Allow a driver to request the NAPI core uses a
thread?

	Andrew

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 0/7] Add support to process rx packets in thread
  2020-07-21 17:25 ` [RFC 0/7] Add support to process rx packets in thread Andrew Lunn
@ 2020-07-21 18:05   ` Florian Fainelli
  2020-07-23 18:21     ` Rakesh Pillai
  2020-07-22  9:12   ` David Laight
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 39+ messages in thread
From: Florian Fainelli @ 2020-07-21 18:05 UTC (permalink / raw)
  To: Andrew Lunn, Rakesh Pillai
  Cc: ath10k, linux-wireless, linux-kernel, kvalo, johannes, davem,
	kuba, netdev, dianders, evgreen

On 7/21/20 10:25 AM, Andrew Lunn wrote:
> On Tue, Jul 21, 2020 at 10:44:19PM +0530, Rakesh Pillai wrote:
>> NAPI gets scheduled on the CPU core which got the
>> interrupt. The linux scheduler cannot move it to a
>> different core, even if the CPU on which NAPI is running
>> is heavily loaded. This can lead to degraded wifi
>> performance when running traffic at peak data rates.
>>
>> A thread on the other hand can be moved to different
>> CPU cores, if the one on which its running is heavily
>> loaded. During high incoming data traffic, this gives
>> better performance, since the thread can be moved to a
>> less loaded or sometimes even a more powerful CPU core
>> to account for the required CPU performance in order
>> to process the incoming packets.
>>
>> This patch series adds the support to use a high priority
>> thread to process the incoming packets, as opposed to
>> everything being done in NAPI context.
> 
> I don't see why this problem is limited to the ath10k driver. I expect
> it applies to all drivers using NAPI. So shouldn't you be solving this
> in the NAPI core? Allow a driver to request the NAPI core uses a
> thread?

What's more, you should be able to configure interrupt affinity to steer
RX processing onto a desired CPU core, is not that working for you somehow?
-- 
Florian

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/7] ath10k: Add support to process rx packet in thread
  2020-07-21 17:14 ` [RFC 2/7] ath10k: Add support to process rx packet in thread Rakesh Pillai
@ 2020-07-21 21:53   ` Rajkumar Manoharan
  2020-07-22 12:27     ` Felix Fietkau
  2020-07-23 18:25     ` Rakesh Pillai
  0 siblings, 2 replies; 39+ messages in thread
From: Rajkumar Manoharan @ 2020-07-21 21:53 UTC (permalink / raw)
  To: Rakesh Pillai
  Cc: ath10k, linux-wireless, linux-kernel, kvalo, johannes, davem,
	kuba, netdev, dianders, evgreen, linux-wireless-owner

On 2020-07-21 10:14, Rakesh Pillai wrote:
> NAPI instance gets scheduled on a CPU core on which
> the IRQ was triggered. The processing of rx packets
> can be CPU intensive and since NAPI cannot be moved
> to a different CPU core, to get better performance,
> its better to move the gist of rx packet processing
> in a high priority thread.
> 
> Add the init/deinit part for a thread to process the
> receive packets.
> 
IMHO this defeat the whole purpose of NAPI. Originally in ath10k
irq processing happened in tasklet (high priority) context which in
turn push more data to net core even though net is unable to process
driver data as both happen in different context (fast producer - slow 
consumer)
issue. Why can't CPU governor schedule the interrupts in less loaded CPU 
core?
Otherwise you can play with different RPS and affinity settings to meet 
your
requirement.

IMO introducing high priority tasklets/threads is not viable solution.

-Rajkumar

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [RFC 0/7] Add support to process rx packets in thread
  2020-07-21 17:25 ` [RFC 0/7] Add support to process rx packets in thread Andrew Lunn
  2020-07-21 18:05   ` Florian Fainelli
@ 2020-07-22  9:12   ` David Laight
  2020-07-22 16:20   ` Jakub Kicinski
       [not found]   ` <20200725081633.7432-1-hdanton@sina.com>
  3 siblings, 0 replies; 39+ messages in thread
From: David Laight @ 2020-07-22  9:12 UTC (permalink / raw)
  To: 'Andrew Lunn', Rakesh Pillai
  Cc: ath10k, linux-wireless, linux-kernel, kvalo, johannes, davem,
	kuba, netdev, dianders, evgreen

From: Andrew Lunn
> Sent: 21 July 2020 18:25
> 
> On Tue, Jul 21, 2020 at 10:44:19PM +0530, Rakesh Pillai wrote:
> > NAPI gets scheduled on the CPU core which got the
> > interrupt. The linux scheduler cannot move it to a
> > different core, even if the CPU on which NAPI is running
> > is heavily loaded. This can lead to degraded wifi
> > performance when running traffic at peak data rates.
> >
> > A thread on the other hand can be moved to different
> > CPU cores, if the one on which its running is heavily
> > loaded. During high incoming data traffic, this gives
> > better performance, since the thread can be moved to a
> > less loaded or sometimes even a more powerful CPU core
> > to account for the required CPU performance in order
> > to process the incoming packets.
> >
> > This patch series adds the support to use a high priority
> > thread to process the incoming packets, as opposed to
> > everything being done in NAPI context.
> 
> I don't see why this problem is limited to the ath10k driver. I expect
> it applies to all drivers using NAPI. So shouldn't you be solving this
> in the NAPI core? Allow a driver to request the NAPI core uses a
> thread?

It's not just NAPI the problem is with the softint processing.
I suspect a lot of systems would work better if it ran as
a (highish priority) kernel thread.

I've had to remove the main locks from a multi-threaded application
and replace them with atomic counters.
Consider what happens when the threads remove items from a shared
work list.
The code looks like:
	mutex_enter();
	remove_item_from_list();
	mutex_exit().
The mutex is only held for a few instructions, so while you'd expect
the cache line to be 'hot' you wouldn't get real contention.
However the following scenarios happen:
1) An ethernet interrupt happens while the mutex is held.
   This stops the other threads until all the softint processing
   has finished.
2) An ethernet interrupt (and softint) runs on a thread that is
   waiting for the mutex.
   (Or on the cpu that the thread's processor affinity ties it to.)
   In this case the 'fair' (ticket) mutex code won't let any other
   thread acquire the mutex.
   So again everything stops until the softints all complete.

The second one is also a problem when trying to wake up all
the threads (eg after adding a lot of items to the list).
The ticket locks force them to wake in order, but
sometimes the 'thundering herd' would work better.

IIRC this is actually worse for processes running under the RT
scheduler (without CONFIG_PREEMPT) because the they are almost
always scheduled on the same cpu they ran on last.
If it is busy, but cannot be pre-empted, they are not moved
to an idle cpu.
   
To confound things there is a very broken workaround for broken
hardware in the driver for the e1000 interface on (at least)
Ivy Bridge cpu that can cause the driver to spin for a very
long time (IIRC milliseconds) whenever it has to write to a
MAC register (ie on every transmit setup).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/7] ath10k: Add support to process rx packet in thread
  2020-07-21 21:53   ` Rajkumar Manoharan
@ 2020-07-22 12:27     ` Felix Fietkau
  2020-07-22 12:55       ` Johannes Berg
  2020-07-23 18:25     ` Rakesh Pillai
  1 sibling, 1 reply; 39+ messages in thread
From: Felix Fietkau @ 2020-07-22 12:27 UTC (permalink / raw)
  To: Rajkumar Manoharan, Rakesh Pillai
  Cc: ath10k, linux-wireless, linux-kernel, kvalo, johannes, davem,
	kuba, netdev, dianders, evgreen

On 2020-07-21 23:53, Rajkumar Manoharan wrote:
> On 2020-07-21 10:14, Rakesh Pillai wrote:
>> NAPI instance gets scheduled on a CPU core on which
>> the IRQ was triggered. The processing of rx packets
>> can be CPU intensive and since NAPI cannot be moved
>> to a different CPU core, to get better performance,
>> its better to move the gist of rx packet processing
>> in a high priority thread.
>> 
>> Add the init/deinit part for a thread to process the
>> receive packets.
>> 
> IMHO this defeat the whole purpose of NAPI. Originally in ath10k
> irq processing happened in tasklet (high priority) context which in
> turn push more data to net core even though net is unable to process
> driver data as both happen in different context (fast producer - slow 
> consumer)
> issue. Why can't CPU governor schedule the interrupts in less loaded CPU 
> core?
> Otherwise you can play with different RPS and affinity settings to meet 
> your
> requirement.
> 
> IMO introducing high priority tasklets/threads is not viable solution.
I'm beginning to think that the main problem with NAPI here is that the
work done by poll functions on 802.11 drivers is significantly more CPU
intensive compared to ethernet drivers, possibly more than what NAPI was
designed for.

I'm considering testing a different approach (with mt76 initially):
- Add a mac80211 rx function that puts processed skbs into a list
instead of handing them to the network stack directly.
- Move all rx processing to a high priority thread, keep a driver
internal queue for fully processed packets.
- Schedule NAPI poll on completion.
- NAPI poll function pulls from the internal queue and passes to the
network stack.

With this approach, the network stack retains some control over the
processing rate of rx packets, while the scheduler can move the CPU
intensive processing around to where it fits best.

What do you think?

- Felix

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/7] ath10k: Add support to process rx packet in thread
  2020-07-22 12:27     ` Felix Fietkau
@ 2020-07-22 12:55       ` Johannes Berg
  2020-07-22 13:00         ` Felix Fietkau
  0 siblings, 1 reply; 39+ messages in thread
From: Johannes Berg @ 2020-07-22 12:55 UTC (permalink / raw)
  To: Felix Fietkau, Rajkumar Manoharan, Rakesh Pillai
  Cc: ath10k, linux-wireless, linux-kernel, kvalo, davem, kuba, netdev,
	dianders, evgreen

On Wed, 2020-07-22 at 14:27 +0200, Felix Fietkau wrote:

> I'm considering testing a different approach (with mt76 initially):
> - Add a mac80211 rx function that puts processed skbs into a list
> instead of handing them to the network stack directly.

Would this be *after* all the mac80211 processing, i.e. in place of the
rx-up-to-stack?

johannes


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 1/7] mac80211: Add check for napi handle before WARN_ON
  2020-07-21 17:14 ` [RFC 1/7] mac80211: Add check for napi handle before WARN_ON Rakesh Pillai
@ 2020-07-22 12:56   ` Johannes Berg
  2020-07-23 18:26     ` Rakesh Pillai
  0 siblings, 1 reply; 39+ messages in thread
From: Johannes Berg @ 2020-07-22 12:56 UTC (permalink / raw)
  To: Rakesh Pillai, ath10k
  Cc: linux-wireless, linux-kernel, kvalo, davem, kuba, netdev,
	dianders, evgreen

On Tue, 2020-07-21 at 22:44 +0530, Rakesh Pillai wrote:
> The function ieee80211_rx_napi can be now called
> from a thread context as well, with napi context
> being NULL.
> 
> Hence add the napi context check before giving out
> a warning for softirq count being 0.
> 
> Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1
> 
> Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
> ---
>  net/mac80211/rx.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
> index a88ab6f..1e703f1 100644
> --- a/net/mac80211/rx.c
> +++ b/net/mac80211/rx.c
> @@ -4652,7 +4652,7 @@ void ieee80211_rx_napi(struct ieee80211_hw *hw, struct ieee80211_sta *pubsta,
>  	struct ieee80211_supported_band *sband;
>  	struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
>  
> -	WARN_ON_ONCE(softirq_count() == 0);
> +	WARN_ON_ONCE(napi && softirq_count() == 0);

FWIW, I'm pretty sure this is incorrect - we make assumptions on
softirqs being disabled in mac80211 for serialization and in place of
some locking, I believe.

johannes


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/7] ath10k: Add support to process rx packet in thread
  2020-07-22 12:55       ` Johannes Berg
@ 2020-07-22 13:00         ` Felix Fietkau
  2020-07-23  6:09           ` Rajkumar Manoharan
  0 siblings, 1 reply; 39+ messages in thread
From: Felix Fietkau @ 2020-07-22 13:00 UTC (permalink / raw)
  To: Johannes Berg, Rajkumar Manoharan, Rakesh Pillai
  Cc: ath10k, linux-wireless, linux-kernel, kvalo, davem, kuba, netdev,
	dianders, evgreen

On 2020-07-22 14:55, Johannes Berg wrote:
> On Wed, 2020-07-22 at 14:27 +0200, Felix Fietkau wrote:
> 
>> I'm considering testing a different approach (with mt76 initially):
>> - Add a mac80211 rx function that puts processed skbs into a list
>> instead of handing them to the network stack directly.
> 
> Would this be *after* all the mac80211 processing, i.e. in place of the
> rx-up-to-stack?
Yes, it would run all the rx handlers normally and then put the
resulting skbs into a list instead of calling netif_receive_skb or
napi_gro_frags.

- Felix

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 0/7] Add support to process rx packets in thread
  2020-07-21 17:25 ` [RFC 0/7] Add support to process rx packets in thread Andrew Lunn
  2020-07-21 18:05   ` Florian Fainelli
  2020-07-22  9:12   ` David Laight
@ 2020-07-22 16:20   ` Jakub Kicinski
       [not found]   ` <20200725081633.7432-1-hdanton@sina.com>
  3 siblings, 0 replies; 39+ messages in thread
From: Jakub Kicinski @ 2020-07-22 16:20 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Rakesh Pillai, ath10k, linux-wireless, linux-kernel, kvalo,
	johannes, davem, netdev, dianders, evgreen, Eric Dumazet

On Tue, 21 Jul 2020 19:25:14 +0200 Andrew Lunn wrote:
> On Tue, Jul 21, 2020 at 10:44:19PM +0530, Rakesh Pillai wrote:
> > NAPI gets scheduled on the CPU core which got the
> > interrupt. The linux scheduler cannot move it to a
> > different core, even if the CPU on which NAPI is running
> > is heavily loaded. This can lead to degraded wifi
> > performance when running traffic at peak data rates.
> > 
> > A thread on the other hand can be moved to different
> > CPU cores, if the one on which its running is heavily
> > loaded. During high incoming data traffic, this gives
> > better performance, since the thread can be moved to a
> > less loaded or sometimes even a more powerful CPU core
> > to account for the required CPU performance in order
> > to process the incoming packets.
> > 
> > This patch series adds the support to use a high priority
> > thread to process the incoming packets, as opposed to
> > everything being done in NAPI context.  
> 
> I don't see why this problem is limited to the ath10k driver. I expect
> it applies to all drivers using NAPI. So shouldn't you be solving this
> in the NAPI core? Allow a driver to request the NAPI core uses a
> thread?

Agreed, this is a problem we have with all drivers today.
We see seriously sub-optimal behavior in data center workloads, 
because kernel overloads the cores doing packet processing.

I think the fix may actually be in the scheduler. AFAIU the scheduler
counts the softIRQ time towards the interrupted process, and on top
of that tries to move processes to the cores handling their IO. In the
end the configuration which works somewhat okay is when each core has
its own IRQ and queues, which is seriously sub-optimal.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/7] ath10k: Add support to process rx packet in thread
  2020-07-22 13:00         ` Felix Fietkau
@ 2020-07-23  6:09           ` Rajkumar Manoharan
  0 siblings, 0 replies; 39+ messages in thread
From: Rajkumar Manoharan @ 2020-07-23  6:09 UTC (permalink / raw)
  To: Felix Fietkau
  Cc: Johannes Berg, Rakesh Pillai, netdev, linux-wireless,
	linux-kernel, ath10k, dianders, evgreen, kuba, davem, kvalo

On 2020-07-22 06:00, Felix Fietkau wrote:
> On 2020-07-22 14:55, Johannes Berg wrote:
>> On Wed, 2020-07-22 at 14:27 +0200, Felix Fietkau wrote:
>> 
>>> I'm considering testing a different approach (with mt76 initially):
>>> - Add a mac80211 rx function that puts processed skbs into a list
>>> instead of handing them to the network stack directly.
>> 
>> Would this be *after* all the mac80211 processing, i.e. in place of 
>> the
>> rx-up-to-stack?
> Yes, it would run all the rx handlers normally and then put the
> resulting skbs into a list instead of calling netif_receive_skb or
> napi_gro_frags.
> 
Felix,

This seems like split & batch processing. In past (ath9k), we observed 
some
behavioral differences between netif_rx and netif_receive_skb. The 
intermediate
queue in netif_rx changed not just performance but also time sensitive 
application
data. Agree that wireless stack processing might be heavier than 
ethernet packet
processing. If the hardware supports rx decap offload, NAPI processing 
shouldn't be
an issue. right?

-Rajkumar

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [RFC 0/7] Add support to process rx packets in thread
  2020-07-21 18:05   ` Florian Fainelli
@ 2020-07-23 18:21     ` Rakesh Pillai
  2020-07-23 19:02       ` Florian Fainelli
  0 siblings, 1 reply; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-23 18:21 UTC (permalink / raw)
  To: 'Florian Fainelli', 'Andrew Lunn'
  Cc: ath10k, linux-wireless, linux-kernel, kvalo, johannes, davem,
	kuba, netdev, dianders, evgreen



> -----Original Message-----
> From: Florian Fainelli <f.fainelli@gmail.com>
> Sent: Tuesday, July 21, 2020 11:35 PM
> To: Andrew Lunn <andrew@lunn.ch>; Rakesh Pillai <pillair@codeaurora.org>
> Cc: ath10k@lists.infradead.org; linux-wireless@vger.kernel.org; linux-
> kernel@vger.kernel.org; kvalo@codeaurora.org; johannes@sipsolutions.net;
> davem@davemloft.net; kuba@kernel.org; netdev@vger.kernel.org;
> dianders@chromium.org; evgreen@chromium.org
> Subject: Re: [RFC 0/7] Add support to process rx packets in thread
> 
> On 7/21/20 10:25 AM, Andrew Lunn wrote:
> > On Tue, Jul 21, 2020 at 10:44:19PM +0530, Rakesh Pillai wrote:
> >> NAPI gets scheduled on the CPU core which got the
> >> interrupt. The linux scheduler cannot move it to a
> >> different core, even if the CPU on which NAPI is running
> >> is heavily loaded. This can lead to degraded wifi
> >> performance when running traffic at peak data rates.
> >>
> >> A thread on the other hand can be moved to different
> >> CPU cores, if the one on which its running is heavily
> >> loaded. During high incoming data traffic, this gives
> >> better performance, since the thread can be moved to a
> >> less loaded or sometimes even a more powerful CPU core
> >> to account for the required CPU performance in order
> >> to process the incoming packets.
> >>
> >> This patch series adds the support to use a high priority
> >> thread to process the incoming packets, as opposed to
> >> everything being done in NAPI context.
> >
> > I don't see why this problem is limited to the ath10k driver. I expect
> > it applies to all drivers using NAPI. So shouldn't you be solving this
> > in the NAPI core? Allow a driver to request the NAPI core uses a
> > thread?
> 
> What's more, you should be able to configure interrupt affinity to steer
> RX processing onto a desired CPU core, is not that working for you
> somehow?

Hi Florian,
Yes, the affinity of IRQ does work for me.
But the affinity of IRQ does not happen runtime based on load.


> --
> Florian


^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [RFC 2/7] ath10k: Add support to process rx packet in thread
  2020-07-21 21:53   ` Rajkumar Manoharan
  2020-07-22 12:27     ` Felix Fietkau
@ 2020-07-23 18:25     ` Rakesh Pillai
  2020-07-24 23:11       ` Jacob Keller
  1 sibling, 1 reply; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-23 18:25 UTC (permalink / raw)
  To: 'Rajkumar Manoharan'
  Cc: ath10k, linux-wireless, linux-kernel, kvalo, johannes, davem,
	kuba, netdev, dianders, evgreen, linux-wireless-owner



> -----Original Message-----
> From: Rajkumar Manoharan <rmanohar@codeaurora.org>
> Sent: Wednesday, July 22, 2020 3:23 AM
> To: Rakesh Pillai <pillair@codeaurora.org>
> Cc: ath10k@lists.infradead.org; linux-wireless@vger.kernel.org; linux-
> kernel@vger.kernel.org; kvalo@codeaurora.org; johannes@sipsolutions.net;
> davem@davemloft.net; kuba@kernel.org; netdev@vger.kernel.org;
> dianders@chromium.org; evgreen@chromium.org; linux-wireless-
> owner@vger.kernel.org
> Subject: Re: [RFC 2/7] ath10k: Add support to process rx packet in thread
> 
> On 2020-07-21 10:14, Rakesh Pillai wrote:
> > NAPI instance gets scheduled on a CPU core on which
> > the IRQ was triggered. The processing of rx packets
> > can be CPU intensive and since NAPI cannot be moved
> > to a different CPU core, to get better performance,
> > its better to move the gist of rx packet processing
> > in a high priority thread.
> >
> > Add the init/deinit part for a thread to process the
> > receive packets.
> >
> IMHO this defeat the whole purpose of NAPI. Originally in ath10k
> irq processing happened in tasklet (high priority) context which in
> turn push more data to net core even though net is unable to process
> driver data as both happen in different context (fast producer - slow
> consumer)
> issue. Why can't CPU governor schedule the interrupts in less loaded CPU
> core?
> Otherwise you can play with different RPS and affinity settings to meet
> your
> requirement.
> 
> IMO introducing high priority tasklets/threads is not viable solution.

Hi Rajkumar,
In linux, the IRQs are directed to the first core which is booted.
I see that all the IRQs are getting routed to CORE0 even if its heavily
loaded.

IRQ and NAPI are not under the scheduler's control, since it cannot be
moved.
NAPI is scheduled only on the same core as IRQ. But a thread can be moved
around by the scheduler based on the CPU load.
This is the reason I went ahead with using thread.

Affinity settings are static, and cannot be done runtime without any
downstream userspace entity.


> 
> -Rajkumar


^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [RFC 1/7] mac80211: Add check for napi handle before WARN_ON
  2020-07-22 12:56   ` Johannes Berg
@ 2020-07-23 18:26     ` Rakesh Pillai
  2020-07-23 20:06       ` Johannes Berg
  0 siblings, 1 reply; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-23 18:26 UTC (permalink / raw)
  To: 'Johannes Berg', ath10k
  Cc: linux-wireless, linux-kernel, kvalo, davem, kuba, netdev,
	dianders, evgreen



> -----Original Message-----
> From: Johannes Berg <johannes@sipsolutions.net>
> Sent: Wednesday, July 22, 2020 6:26 PM
> To: Rakesh Pillai <pillair@codeaurora.org>; ath10k@lists.infradead.org
> Cc: linux-wireless@vger.kernel.org; linux-kernel@vger.kernel.org;
> kvalo@codeaurora.org; davem@davemloft.net; kuba@kernel.org;
> netdev@vger.kernel.org; dianders@chromium.org; evgreen@chromium.org
> Subject: Re: [RFC 1/7] mac80211: Add check for napi handle before
> WARN_ON
> 
> On Tue, 2020-07-21 at 22:44 +0530, Rakesh Pillai wrote:
> > The function ieee80211_rx_napi can be now called
> > from a thread context as well, with napi context
> > being NULL.
> >
> > Hence add the napi context check before giving out
> > a warning for softirq count being 0.
> >
> > Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1
> >
> > Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
> > ---
> >  net/mac80211/rx.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
> > index a88ab6f..1e703f1 100644
> > --- a/net/mac80211/rx.c
> > +++ b/net/mac80211/rx.c
> > @@ -4652,7 +4652,7 @@ void ieee80211_rx_napi(struct ieee80211_hw
> *hw, struct ieee80211_sta *pubsta,
> >  	struct ieee80211_supported_band *sband;
> >  	struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
> >
> > -	WARN_ON_ONCE(softirq_count() == 0);
> > +	WARN_ON_ONCE(napi && softirq_count() == 0);
> 
> FWIW, I'm pretty sure this is incorrect - we make assumptions on
> softirqs being disabled in mac80211 for serialization and in place of
> some locking, I believe.
> 

I checked this, but let me double confirm.
But after this change, no packet is submitted from driver in a softirq context.
So ideally this should take care of serialization.


> johannes



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 0/7] Add support to process rx packets in thread
  2020-07-23 18:21     ` Rakesh Pillai
@ 2020-07-23 19:02       ` Florian Fainelli
  2020-07-24  6:20         ` Rakesh Pillai
  0 siblings, 1 reply; 39+ messages in thread
From: Florian Fainelli @ 2020-07-23 19:02 UTC (permalink / raw)
  To: Rakesh Pillai, 'Andrew Lunn'
  Cc: ath10k, linux-wireless, linux-kernel, kvalo, johannes, davem,
	kuba, netdev, dianders, evgreen

On 7/23/20 11:21 AM, Rakesh Pillai wrote:
> 
> 
>> -----Original Message-----
>> From: Florian Fainelli <f.fainelli@gmail.com>
>> Sent: Tuesday, July 21, 2020 11:35 PM
>> To: Andrew Lunn <andrew@lunn.ch>; Rakesh Pillai <pillair@codeaurora.org>
>> Cc: ath10k@lists.infradead.org; linux-wireless@vger.kernel.org; linux-
>> kernel@vger.kernel.org; kvalo@codeaurora.org; johannes@sipsolutions.net;
>> davem@davemloft.net; kuba@kernel.org; netdev@vger.kernel.org;
>> dianders@chromium.org; evgreen@chromium.org
>> Subject: Re: [RFC 0/7] Add support to process rx packets in thread
>>
>> On 7/21/20 10:25 AM, Andrew Lunn wrote:
>>> On Tue, Jul 21, 2020 at 10:44:19PM +0530, Rakesh Pillai wrote:
>>>> NAPI gets scheduled on the CPU core which got the
>>>> interrupt. The linux scheduler cannot move it to a
>>>> different core, even if the CPU on which NAPI is running
>>>> is heavily loaded. This can lead to degraded wifi
>>>> performance when running traffic at peak data rates.
>>>>
>>>> A thread on the other hand can be moved to different
>>>> CPU cores, if the one on which its running is heavily
>>>> loaded. During high incoming data traffic, this gives
>>>> better performance, since the thread can be moved to a
>>>> less loaded or sometimes even a more powerful CPU core
>>>> to account for the required CPU performance in order
>>>> to process the incoming packets.
>>>>
>>>> This patch series adds the support to use a high priority
>>>> thread to process the incoming packets, as opposed to
>>>> everything being done in NAPI context.
>>>
>>> I don't see why this problem is limited to the ath10k driver. I expect
>>> it applies to all drivers using NAPI. So shouldn't you be solving this
>>> in the NAPI core? Allow a driver to request the NAPI core uses a
>>> thread?
>>
>> What's more, you should be able to configure interrupt affinity to steer
>> RX processing onto a desired CPU core, is not that working for you
>> somehow?
> 
> Hi Florian,
> Yes, the affinity of IRQ does work for me.
> But the affinity of IRQ does not happen runtime based on load.

It can if you also run irqbalance.
-- 
Florian

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 1/7] mac80211: Add check for napi handle before WARN_ON
  2020-07-23 18:26     ` Rakesh Pillai
@ 2020-07-23 20:06       ` Johannes Berg
  2020-07-24  6:21         ` Rakesh Pillai
  2020-07-26 16:19         ` Rakesh Pillai
  0 siblings, 2 replies; 39+ messages in thread
From: Johannes Berg @ 2020-07-23 20:06 UTC (permalink / raw)
  To: Rakesh Pillai, ath10k
  Cc: linux-wireless, linux-kernel, kvalo, davem, kuba, netdev,
	dianders, evgreen

On Thu, 2020-07-23 at 23:56 +0530, Rakesh Pillai wrote:

> > > -	WARN_ON_ONCE(softirq_count() == 0);
> > > +	WARN_ON_ONCE(napi && softirq_count() == 0);
> > 
> > FWIW, I'm pretty sure this is incorrect - we make assumptions on
> > softirqs being disabled in mac80211 for serialization and in place of
> > some locking, I believe.
> > 
> 
> I checked this, but let me double confirm.
> But after this change, no packet is submitted from driver in a softirq context.
> So ideally this should take care of serialization.

I'd guess that we have some reliance on BHs already being disabled, for
things like u64 sync updates, or whatnot. I mean, we did "rx_ni()" for a
reason ... Maybe lockdep can help catch some of the issues.

But couldn't you be in a thread and have BHs disabled too?

johannes


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 7/7] ath10k: Handle rx thread suspend and resume
  2020-07-21 17:14 ` [RFC 7/7] ath10k: Handle rx thread suspend and resume Rakesh Pillai
@ 2020-07-23 23:06   ` Sebastian Gottschall
  2020-07-24  6:19     ` Rakesh Pillai
  0 siblings, 1 reply; 39+ messages in thread
From: Sebastian Gottschall @ 2020-07-23 23:06 UTC (permalink / raw)
  To: Rakesh Pillai, ath10k
  Cc: linux-wireless, linux-kernel, kvalo, johannes, davem, kuba,
	netdev, dianders, evgreen

your patch seem to only affect the WCN3990 chipset. all other ath10k 
supported chipset are not supported here. so you see a chance to 
implement this more generic?

Sebastian

Am 21.07.2020 um 19:14 schrieb Rakesh Pillai:
> During the system suspend or resume, the rx thread
> also needs to be suspended or resume respectively.
>
> Handle the rx thread as well during the system
> suspend and resume.
>
> Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1
>
> Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
> ---
>   drivers/net/wireless/ath/ath10k/core.c | 23 ++++++++++++++++++
>   drivers/net/wireless/ath/ath10k/core.h |  5 ++++
>   drivers/net/wireless/ath/ath10k/snoc.c | 44 +++++++++++++++++++++++++++++++++-
>   3 files changed, 71 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
> index 4064fa2..b82b355 100644
> --- a/drivers/net/wireless/ath/ath10k/core.c
> +++ b/drivers/net/wireless/ath/ath10k/core.c
> @@ -668,6 +668,27 @@ static unsigned int ath10k_core_get_fw_feature_str(char *buf,
>   	return scnprintf(buf, buf_len, "%s", ath10k_core_fw_feature_str[feat]);
>   }
>   
> +int ath10k_core_thread_suspend(struct ath10k_thread *thread)
> +{
> +	ath10k_dbg(thread->ar, ATH10K_DBG_BOOT, "Suspending thread %s\n",
> +		   thread->name);
> +	set_bit(ATH10K_THREAD_EVENT_SUSPEND, thread->event_flags);
> +	wait_for_completion(&thread->suspend);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(ath10k_core_thread_suspend);
> +
> +int ath10k_core_thread_resume(struct ath10k_thread *thread)
> +{
> +	ath10k_dbg(thread->ar, ATH10K_DBG_BOOT, "Resuming thread %s\n",
> +		   thread->name);
> +	complete(&thread->resume);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL(ath10k_core_thread_resume);
> +
>   void ath10k_core_thread_post_event(struct ath10k_thread *thread,
>   				   enum ath10k_thread_events event)
>   {
> @@ -700,6 +721,8 @@ int ath10k_core_thread_init(struct ath10k *ar,
>   
>   	init_waitqueue_head(&thread->wait_q);
>   	init_completion(&thread->shutdown);
> +	init_completion(&thread->suspend);
> +	init_completion(&thread->resume);
>   	memcpy(thread->name, thread_name, ATH10K_THREAD_NAME_SIZE_MAX);
>   	clear_bit(ATH10K_THREAD_EVENT_SHUTDOWN, thread->event_flags);
>   	ath10k_info(ar, "Starting thread %s\n", thread_name);
> diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
> index 596d31b..df65e75 100644
> --- a/drivers/net/wireless/ath/ath10k/core.h
> +++ b/drivers/net/wireless/ath/ath10k/core.h
> @@ -976,6 +976,7 @@ enum ath10k_thread_events {
>   	ATH10K_THREAD_EVENT_SHUTDOWN,
>   	ATH10K_THREAD_EVENT_RX_POST,
>   	ATH10K_THREAD_EVENT_TX_POST,
> +	ATH10K_THREAD_EVENT_SUSPEND,
>   	ATH10K_THREAD_EVENT_MAX,
>   };
>   
> @@ -983,6 +984,8 @@ struct ath10k_thread {
>   	struct ath10k *ar;
>   	struct task_struct *task;
>   	struct completion shutdown;
> +	struct completion suspend;
> +	struct completion resume;
>   	wait_queue_head_t wait_q;
>   	DECLARE_BITMAP(event_flags, ATH10K_THREAD_EVENT_MAX);
>   	char name[ATH10K_THREAD_NAME_SIZE_MAX];
> @@ -1296,6 +1299,8 @@ static inline bool ath10k_peer_stats_enabled(struct ath10k *ar)
>   
>   extern unsigned long ath10k_coredump_mask;
>   
> +int ath10k_core_thread_suspend(struct ath10k_thread *thread);
> +int ath10k_core_thread_resume(struct ath10k_thread *thread);
>   void ath10k_core_thread_post_event(struct ath10k_thread *thread,
>   				   enum ath10k_thread_events event);
>   int ath10k_core_thread_shutdown(struct ath10k *ar,
> diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
> index 3eb5eac..a373b2b 100644
> --- a/drivers/net/wireless/ath/ath10k/snoc.c
> +++ b/drivers/net/wireless/ath/ath10k/snoc.c
> @@ -932,13 +932,31 @@ int ath10k_snoc_rx_thread_loop(void *data)
>   					    rx_thread->event_flags) ||
>   			 test_and_clear_bit(ATH10K_THREAD_EVENT_TX_POST,
>   					    rx_thread->event_flags) ||
> +			 test_bit(ATH10K_THREAD_EVENT_SUSPEND,
> +				  rx_thread->event_flags) ||
>   			 test_bit(ATH10K_THREAD_EVENT_SHUTDOWN,
>   				  rx_thread->event_flags)));
>   
> +		if (test_and_clear_bit(ATH10K_THREAD_EVENT_SUSPEND,
> +				       rx_thread->event_flags)) {
> +			complete(&rx_thread->suspend);
> +			ath10k_info(ar, "rx thread suspend\n");
> +			wait_for_completion(&rx_thread->resume);
> +			ath10k_info(ar, "rx thread resume\n");
> +		}
> +
>   		ath10k_htt_txrx_compl_task(ar, thread_budget);
>   		if (test_and_clear_bit(ATH10K_THREAD_EVENT_SHUTDOWN,
>   				       rx_thread->event_flags))
>   			shutdown = true;
> +
> +		if (test_and_clear_bit(ATH10K_THREAD_EVENT_SUSPEND,
> +				       rx_thread->event_flags)) {
> +			complete(&rx_thread->suspend);
> +			ath10k_info(ar, "rx thread suspend\n");
> +			wait_for_completion(&rx_thread->resume);
> +			ath10k_info(ar, "rx thread resume\n");
> +		}
>   	}
>   
>   	ath10k_dbg(ar, ATH10K_DBG_SNOC, "rx thread exiting\n");
> @@ -1133,15 +1151,30 @@ static int ath10k_snoc_hif_suspend(struct ath10k *ar)
>   	if (!device_may_wakeup(ar->dev))
>   		return -EPERM;
>   
> +	if (ar->rx_thread_enable) {
> +		ret = ath10k_core_thread_suspend(&ar->rx_thread);
> +		if (ret) {
> +			ath10k_err(ar, "failed to suspend rx_thread, %d\n",
> +				   ret);
> +			return ret;
> +		}
> +	}
> +
>   	ret = enable_irq_wake(ar_snoc->ce_irqs[ATH10K_SNOC_WAKE_IRQ].irq_line);
>   	if (ret) {
>   		ath10k_err(ar, "failed to enable wakeup irq :%d\n", ret);
> -		return ret;
> +		goto fail;
>   	}
>   
>   	ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc device suspended\n");
>   
>   	return ret;
> +
> +fail:
> +	if (ar->rx_thread_enable)
> +		ath10k_core_thread_resume(&ar->rx_thread);
> +
> +	return ret;
>   }
>   
>   static int ath10k_snoc_hif_resume(struct ath10k *ar)
> @@ -1158,6 +1191,15 @@ static int ath10k_snoc_hif_resume(struct ath10k *ar)
>   		return ret;
>   	}
>   
> +	if (ar->rx_thread_enable) {
> +		ret = ath10k_core_thread_resume(&ar->rx_thread);
> +		if (ret) {
> +			ath10k_err(ar, "failed to suspend rx_thread, %d\n",
> +				   ret);
> +			return ret;
> +		}
> +	}
> +
>   	ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc device resumed\n");
>   
>   	return ret;

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [RFC 7/7] ath10k: Handle rx thread suspend and resume
  2020-07-23 23:06   ` Sebastian Gottschall
@ 2020-07-24  6:19     ` Rakesh Pillai
  0 siblings, 0 replies; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-24  6:19 UTC (permalink / raw)
  To: 'Sebastian Gottschall', ath10k
  Cc: linux-wireless, linux-kernel, kvalo, johannes, davem, kuba,
	netdev, dianders, evgreen



> -----Original Message-----
> From: Sebastian Gottschall <s.gottschall@dd-wrt.com>
> Sent: Friday, July 24, 2020 4:36 AM
> To: Rakesh Pillai <pillair@codeaurora.org>; ath10k@lists.infradead.org
> Cc: linux-wireless@vger.kernel.org; linux-kernel@vger.kernel.org;
> kvalo@codeaurora.org; johannes@sipsolutions.net; davem@davemloft.net;
> kuba@kernel.org; netdev@vger.kernel.org; dianders@chromium.org;
> evgreen@chromium.org
> Subject: Re: [RFC 7/7] ath10k: Handle rx thread suspend and resume
> 
> your patch seem to only affect the WCN3990 chipset. all other ath10k
> supported chipset are not supported here. so you see a chance to
> implement this more generic?
> 
> Sebastian

Hi Sebastian,

A generic core for handling threads is added with this patchset.
So the handling of rx packet processing in thread can always be extended to other targets, if they wish so.

The thread related APIs are in core.c which gives all the other targets access to these APIs for using them.

Thanks,
Rakesh Pillai.

> 
> Am 21.07.2020 um 19:14 schrieb Rakesh Pillai:
> > During the system suspend or resume, the rx thread
> > also needs to be suspended or resume respectively.
> >
> > Handle the rx thread as well during the system
> > suspend and resume.
> >
> > Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.1-01040-QCAHLSWMTPLZ-1
> >
> > Signed-off-by: Rakesh Pillai <pillair@codeaurora.org>
> > ---
> >   drivers/net/wireless/ath/ath10k/core.c | 23 ++++++++++++++++++
> >   drivers/net/wireless/ath/ath10k/core.h |  5 ++++
> >   drivers/net/wireless/ath/ath10k/snoc.c | 44
> +++++++++++++++++++++++++++++++++-
> >   3 files changed, 71 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/wireless/ath/ath10k/core.c
> b/drivers/net/wireless/ath/ath10k/core.c
> > index 4064fa2..b82b355 100644
> > --- a/drivers/net/wireless/ath/ath10k/core.c
> > +++ b/drivers/net/wireless/ath/ath10k/core.c
> > @@ -668,6 +668,27 @@ static unsigned int
> ath10k_core_get_fw_feature_str(char *buf,
> >   	return scnprintf(buf, buf_len, "%s",
> ath10k_core_fw_feature_str[feat]);
> >   }
> >
> > +int ath10k_core_thread_suspend(struct ath10k_thread *thread)
> > +{
> > +	ath10k_dbg(thread->ar, ATH10K_DBG_BOOT, "Suspending thread
> %s\n",
> > +		   thread->name);
> > +	set_bit(ATH10K_THREAD_EVENT_SUSPEND, thread->event_flags);
> > +	wait_for_completion(&thread->suspend);
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL(ath10k_core_thread_suspend);
> > +
> > +int ath10k_core_thread_resume(struct ath10k_thread *thread)
> > +{
> > +	ath10k_dbg(thread->ar, ATH10K_DBG_BOOT, "Resuming thread
> %s\n",
> > +		   thread->name);
> > +	complete(&thread->resume);
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL(ath10k_core_thread_resume);
> > +
> >   void ath10k_core_thread_post_event(struct ath10k_thread *thread,
> >   				   enum ath10k_thread_events event)
> >   {
> > @@ -700,6 +721,8 @@ int ath10k_core_thread_init(struct ath10k *ar,
> >
> >   	init_waitqueue_head(&thread->wait_q);
> >   	init_completion(&thread->shutdown);
> > +	init_completion(&thread->suspend);
> > +	init_completion(&thread->resume);
> >   	memcpy(thread->name, thread_name,
> ATH10K_THREAD_NAME_SIZE_MAX);
> >   	clear_bit(ATH10K_THREAD_EVENT_SHUTDOWN, thread-
> >event_flags);
> >   	ath10k_info(ar, "Starting thread %s\n", thread_name);
> > diff --git a/drivers/net/wireless/ath/ath10k/core.h
> b/drivers/net/wireless/ath/ath10k/core.h
> > index 596d31b..df65e75 100644
> > --- a/drivers/net/wireless/ath/ath10k/core.h
> > +++ b/drivers/net/wireless/ath/ath10k/core.h
> > @@ -976,6 +976,7 @@ enum ath10k_thread_events {
> >   	ATH10K_THREAD_EVENT_SHUTDOWN,
> >   	ATH10K_THREAD_EVENT_RX_POST,
> >   	ATH10K_THREAD_EVENT_TX_POST,
> > +	ATH10K_THREAD_EVENT_SUSPEND,
> >   	ATH10K_THREAD_EVENT_MAX,
> >   };
> >
> > @@ -983,6 +984,8 @@ struct ath10k_thread {
> >   	struct ath10k *ar;
> >   	struct task_struct *task;
> >   	struct completion shutdown;
> > +	struct completion suspend;
> > +	struct completion resume;
> >   	wait_queue_head_t wait_q;
> >   	DECLARE_BITMAP(event_flags, ATH10K_THREAD_EVENT_MAX);
> >   	char name[ATH10K_THREAD_NAME_SIZE_MAX];
> > @@ -1296,6 +1299,8 @@ static inline bool
> ath10k_peer_stats_enabled(struct ath10k *ar)
> >
> >   extern unsigned long ath10k_coredump_mask;
> >
> > +int ath10k_core_thread_suspend(struct ath10k_thread *thread);
> > +int ath10k_core_thread_resume(struct ath10k_thread *thread);
> >   void ath10k_core_thread_post_event(struct ath10k_thread *thread,
> >   				   enum ath10k_thread_events event);
> >   int ath10k_core_thread_shutdown(struct ath10k *ar,
> > diff --git a/drivers/net/wireless/ath/ath10k/snoc.c
> b/drivers/net/wireless/ath/ath10k/snoc.c
> > index 3eb5eac..a373b2b 100644
> > --- a/drivers/net/wireless/ath/ath10k/snoc.c
> > +++ b/drivers/net/wireless/ath/ath10k/snoc.c
> > @@ -932,13 +932,31 @@ int ath10k_snoc_rx_thread_loop(void *data)
> >   					    rx_thread->event_flags) ||
> >
> test_and_clear_bit(ATH10K_THREAD_EVENT_TX_POST,
> >   					    rx_thread->event_flags) ||
> > +			 test_bit(ATH10K_THREAD_EVENT_SUSPEND,
> > +				  rx_thread->event_flags) ||
> >   			 test_bit(ATH10K_THREAD_EVENT_SHUTDOWN,
> >   				  rx_thread->event_flags)));
> >
> > +		if (test_and_clear_bit(ATH10K_THREAD_EVENT_SUSPEND,
> > +				       rx_thread->event_flags)) {
> > +			complete(&rx_thread->suspend);
> > +			ath10k_info(ar, "rx thread suspend\n");
> > +			wait_for_completion(&rx_thread->resume);
> > +			ath10k_info(ar, "rx thread resume\n");
> > +		}
> > +
> >   		ath10k_htt_txrx_compl_task(ar, thread_budget);
> >   		if
> (test_and_clear_bit(ATH10K_THREAD_EVENT_SHUTDOWN,
> >   				       rx_thread->event_flags))
> >   			shutdown = true;
> > +
> > +		if (test_and_clear_bit(ATH10K_THREAD_EVENT_SUSPEND,
> > +				       rx_thread->event_flags)) {
> > +			complete(&rx_thread->suspend);
> > +			ath10k_info(ar, "rx thread suspend\n");
> > +			wait_for_completion(&rx_thread->resume);
> > +			ath10k_info(ar, "rx thread resume\n");
> > +		}
> >   	}
> >
> >   	ath10k_dbg(ar, ATH10K_DBG_SNOC, "rx thread exiting\n");
> > @@ -1133,15 +1151,30 @@ static int ath10k_snoc_hif_suspend(struct
> ath10k *ar)
> >   	if (!device_may_wakeup(ar->dev))
> >   		return -EPERM;
> >
> > +	if (ar->rx_thread_enable) {
> > +		ret = ath10k_core_thread_suspend(&ar->rx_thread);
> > +		if (ret) {
> > +			ath10k_err(ar, "failed to suspend rx_thread, %d\n",
> > +				   ret);
> > +			return ret;
> > +		}
> > +	}
> > +
> >   	ret = enable_irq_wake(ar_snoc-
> >ce_irqs[ATH10K_SNOC_WAKE_IRQ].irq_line);
> >   	if (ret) {
> >   		ath10k_err(ar, "failed to enable wakeup irq :%d\n", ret);
> > -		return ret;
> > +		goto fail;
> >   	}
> >
> >   	ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc device suspended\n");
> >
> >   	return ret;
> > +
> > +fail:
> > +	if (ar->rx_thread_enable)
> > +		ath10k_core_thread_resume(&ar->rx_thread);
> > +
> > +	return ret;
> >   }
> >
> >   static int ath10k_snoc_hif_resume(struct ath10k *ar)
> > @@ -1158,6 +1191,15 @@ static int ath10k_snoc_hif_resume(struct ath10k
> *ar)
> >   		return ret;
> >   	}
> >
> > +	if (ar->rx_thread_enable) {
> > +		ret = ath10k_core_thread_resume(&ar->rx_thread);
> > +		if (ret) {
> > +			ath10k_err(ar, "failed to suspend rx_thread, %d\n",
> > +				   ret);
> > +			return ret;
> > +		}
> > +	}
> > +
> >   	ath10k_dbg(ar, ATH10K_DBG_SNOC, "snoc device resumed\n");
> >
> >   	return ret;


^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [RFC 0/7] Add support to process rx packets in thread
  2020-07-23 19:02       ` Florian Fainelli
@ 2020-07-24  6:20         ` Rakesh Pillai
  2020-07-24 22:28           ` Florian Fainelli
  0 siblings, 1 reply; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-24  6:20 UTC (permalink / raw)
  To: 'Florian Fainelli', 'Andrew Lunn'
  Cc: ath10k, linux-wireless, linux-kernel, kvalo, johannes, davem,
	kuba, netdev, dianders, evgreen



> -----Original Message-----
> From: Florian Fainelli <f.fainelli@gmail.com>
> Sent: Friday, July 24, 2020 12:33 AM
> To: Rakesh Pillai <pillair@codeaurora.org>; 'Andrew Lunn'
> <andrew@lunn.ch>
> Cc: ath10k@lists.infradead.org; linux-wireless@vger.kernel.org; linux-
> kernel@vger.kernel.org; kvalo@codeaurora.org; johannes@sipsolutions.net;
> davem@davemloft.net; kuba@kernel.org; netdev@vger.kernel.org;
> dianders@chromium.org; evgreen@chromium.org
> Subject: Re: [RFC 0/7] Add support to process rx packets in thread
> 
> On 7/23/20 11:21 AM, Rakesh Pillai wrote:
> >
> >
> >> -----Original Message-----
> >> From: Florian Fainelli <f.fainelli@gmail.com>
> >> Sent: Tuesday, July 21, 2020 11:35 PM
> >> To: Andrew Lunn <andrew@lunn.ch>; Rakesh Pillai
> <pillair@codeaurora.org>
> >> Cc: ath10k@lists.infradead.org; linux-wireless@vger.kernel.org; linux-
> >> kernel@vger.kernel.org; kvalo@codeaurora.org;
> johannes@sipsolutions.net;
> >> davem@davemloft.net; kuba@kernel.org; netdev@vger.kernel.org;
> >> dianders@chromium.org; evgreen@chromium.org
> >> Subject: Re: [RFC 0/7] Add support to process rx packets in thread
> >>
> >> On 7/21/20 10:25 AM, Andrew Lunn wrote:
> >>> On Tue, Jul 21, 2020 at 10:44:19PM +0530, Rakesh Pillai wrote:
> >>>> NAPI gets scheduled on the CPU core which got the
> >>>> interrupt. The linux scheduler cannot move it to a
> >>>> different core, even if the CPU on which NAPI is running
> >>>> is heavily loaded. This can lead to degraded wifi
> >>>> performance when running traffic at peak data rates.
> >>>>
> >>>> A thread on the other hand can be moved to different
> >>>> CPU cores, if the one on which its running is heavily
> >>>> loaded. During high incoming data traffic, this gives
> >>>> better performance, since the thread can be moved to a
> >>>> less loaded or sometimes even a more powerful CPU core
> >>>> to account for the required CPU performance in order
> >>>> to process the incoming packets.
> >>>>
> >>>> This patch series adds the support to use a high priority
> >>>> thread to process the incoming packets, as opposed to
> >>>> everything being done in NAPI context.
> >>>
> >>> I don't see why this problem is limited to the ath10k driver. I expect
> >>> it applies to all drivers using NAPI. So shouldn't you be solving this
> >>> in the NAPI core? Allow a driver to request the NAPI core uses a
> >>> thread?
> >>
> >> What's more, you should be able to configure interrupt affinity to steer
> >> RX processing onto a desired CPU core, is not that working for you
> >> somehow?
> >
> > Hi Florian,
> > Yes, the affinity of IRQ does work for me.
> > But the affinity of IRQ does not happen runtime based on load.
> 
> It can if you also run irqbalance.


Hi Florian,

Is it some kernel feature ?  Sorry I am not aware of this ?
I know it can be done in userspace.

> --
> Florian


^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [RFC 1/7] mac80211: Add check for napi handle before WARN_ON
  2020-07-23 20:06       ` Johannes Berg
@ 2020-07-24  6:21         ` Rakesh Pillai
  2020-07-26 16:19         ` Rakesh Pillai
  1 sibling, 0 replies; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-24  6:21 UTC (permalink / raw)
  To: 'Johannes Berg', ath10k
  Cc: linux-wireless, linux-kernel, kvalo, davem, kuba, netdev,
	dianders, evgreen



> -----Original Message-----
> From: Johannes Berg <johannes@sipsolutions.net>
> Sent: Friday, July 24, 2020 1:37 AM
> To: Rakesh Pillai <pillair@codeaurora.org>; ath10k@lists.infradead.org
> Cc: linux-wireless@vger.kernel.org; linux-kernel@vger.kernel.org;
> kvalo@codeaurora.org; davem@davemloft.net; kuba@kernel.org;
> netdev@vger.kernel.org; dianders@chromium.org; evgreen@chromium.org
> Subject: Re: [RFC 1/7] mac80211: Add check for napi handle before
> WARN_ON
> 
> On Thu, 2020-07-23 at 23:56 +0530, Rakesh Pillai wrote:
> 
> > > > -	WARN_ON_ONCE(softirq_count() == 0);
> > > > +	WARN_ON_ONCE(napi && softirq_count() == 0);
> > >
> > > FWIW, I'm pretty sure this is incorrect - we make assumptions on
> > > softirqs being disabled in mac80211 for serialization and in place of
> > > some locking, I believe.
> > >
> >
> > I checked this, but let me double confirm.
> > But after this change, no packet is submitted from driver in a softirq
> context.
> > So ideally this should take care of serialization.
> 
> I'd guess that we have some reliance on BHs already being disabled, for
> things like u64 sync updates, or whatnot. I mean, we did "rx_ni()" for a
> reason ... Maybe lockdep can help catch some of the issues.
> 
> But couldn't you be in a thread and have BHs disabled too?

This would ideally beat the purpose and possibly hurt the other subsystems running on the same core.

> 
> johannes



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 0/7] Add support to process rx packets in thread
  2020-07-24  6:20         ` Rakesh Pillai
@ 2020-07-24 22:28           ` Florian Fainelli
  0 siblings, 0 replies; 39+ messages in thread
From: Florian Fainelli @ 2020-07-24 22:28 UTC (permalink / raw)
  To: Rakesh Pillai, 'Andrew Lunn'
  Cc: ath10k, linux-wireless, linux-kernel, kvalo, johannes, davem,
	kuba, netdev, dianders, evgreen

On 7/23/20 11:20 PM, Rakesh Pillai wrote:
> 
> 
>> -----Original Message-----
>> From: Florian Fainelli <f.fainelli@gmail.com>
>> Sent: Friday, July 24, 2020 12:33 AM
>> To: Rakesh Pillai <pillair@codeaurora.org>; 'Andrew Lunn'
>> <andrew@lunn.ch>
>> Cc: ath10k@lists.infradead.org; linux-wireless@vger.kernel.org; linux-
>> kernel@vger.kernel.org; kvalo@codeaurora.org; johannes@sipsolutions.net;
>> davem@davemloft.net; kuba@kernel.org; netdev@vger.kernel.org;
>> dianders@chromium.org; evgreen@chromium.org
>> Subject: Re: [RFC 0/7] Add support to process rx packets in thread
>>
>> On 7/23/20 11:21 AM, Rakesh Pillai wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Florian Fainelli <f.fainelli@gmail.com>
>>>> Sent: Tuesday, July 21, 2020 11:35 PM
>>>> To: Andrew Lunn <andrew@lunn.ch>; Rakesh Pillai
>> <pillair@codeaurora.org>
>>>> Cc: ath10k@lists.infradead.org; linux-wireless@vger.kernel.org; linux-
>>>> kernel@vger.kernel.org; kvalo@codeaurora.org;
>> johannes@sipsolutions.net;
>>>> davem@davemloft.net; kuba@kernel.org; netdev@vger.kernel.org;
>>>> dianders@chromium.org; evgreen@chromium.org
>>>> Subject: Re: [RFC 0/7] Add support to process rx packets in thread
>>>>
>>>> On 7/21/20 10:25 AM, Andrew Lunn wrote:
>>>>> On Tue, Jul 21, 2020 at 10:44:19PM +0530, Rakesh Pillai wrote:
>>>>>> NAPI gets scheduled on the CPU core which got the
>>>>>> interrupt. The linux scheduler cannot move it to a
>>>>>> different core, even if the CPU on which NAPI is running
>>>>>> is heavily loaded. This can lead to degraded wifi
>>>>>> performance when running traffic at peak data rates.
>>>>>>
>>>>>> A thread on the other hand can be moved to different
>>>>>> CPU cores, if the one on which its running is heavily
>>>>>> loaded. During high incoming data traffic, this gives
>>>>>> better performance, since the thread can be moved to a
>>>>>> less loaded or sometimes even a more powerful CPU core
>>>>>> to account for the required CPU performance in order
>>>>>> to process the incoming packets.
>>>>>>
>>>>>> This patch series adds the support to use a high priority
>>>>>> thread to process the incoming packets, as opposed to
>>>>>> everything being done in NAPI context.
>>>>>
>>>>> I don't see why this problem is limited to the ath10k driver. I expect
>>>>> it applies to all drivers using NAPI. So shouldn't you be solving this
>>>>> in the NAPI core? Allow a driver to request the NAPI core uses a
>>>>> thread?
>>>>
>>>> What's more, you should be able to configure interrupt affinity to steer
>>>> RX processing onto a desired CPU core, is not that working for you
>>>> somehow?
>>>
>>> Hi Florian,
>>> Yes, the affinity of IRQ does work for me.
>>> But the affinity of IRQ does not happen runtime based on load.
>>
>> It can if you also run irqbalance.
> 
> 
> Hi Florian,
> 
> Is it some kernel feature ?  Sorry I am not aware of this ?
> I know it can be done in userspace.

The kernel interface is /proc/<irq>/smp_affinity and the users-space
implementation resides here:

https://github.com/Irqbalance/irqbalance
-- 
Florian

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 2/7] ath10k: Add support to process rx packet in thread
  2020-07-23 18:25     ` Rakesh Pillai
@ 2020-07-24 23:11       ` Jacob Keller
  0 siblings, 0 replies; 39+ messages in thread
From: Jacob Keller @ 2020-07-24 23:11 UTC (permalink / raw)
  To: Rakesh Pillai, 'Rajkumar Manoharan'
  Cc: ath10k, linux-wireless, linux-kernel, kvalo, johannes, davem,
	kuba, netdev, dianders, evgreen, linux-wireless-owner



On 7/23/2020 11:25 AM, Rakesh Pillai wrote:
> Hi Rajkumar,
> In linux, the IRQs are directed to the first core which is booted.
> I see that all the IRQs are getting routed to CORE0 even if its heavily
> loaded.
> 

You should be able to configure the initial IRQ setup so that they don't
all go on CPU 0 when you create the IRQ. That obviously doesn't help the
case of wanting scheduler to dynamically move the processing around to
other CPUs though.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 0/7] Add support to process rx packets in thread
       [not found]   ` <20200725081633.7432-1-hdanton@sina.com>
@ 2020-07-25 10:38     ` Sebastian Gottschall
  2020-07-25 14:08       ` Sebastian Gottschall
       [not found]       ` <20200725145728.10556-1-hdanton@sina.com>
  2020-07-25 17:57     ` Felix Fietkau
       [not found]     ` <20200726012244.15264-1-hdanton@sina.com>
  2 siblings, 2 replies; 39+ messages in thread
From: Sebastian Gottschall @ 2020-07-25 10:38 UTC (permalink / raw)
  To: Hillf Danton, David Laight, Andrew Lunn, Rakesh Pillai
  Cc: netdev, linux-wireless, linux-kernel, ath10k, dianders,
	Markus Elfring, evgreen, kuba, johannes, davem, kvalo

you may consider this

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1142611.html 
<https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1142611.html>

years ago someone already wanted to bring this feature upstream, but it 
was denied. i already tested this patch the last 2 days and it worked so 
far (with some little modifications)
so such a solution existed already and may be considered here

Sebastian


someone

Am 25.07.2020 um 10:16 schrieb Hillf Danton:
> On Wed, 22 Jul 2020 09:12:42 +0000 David Laight wrote:
>>> On 21 July 2020 18:25 Andrew Lunn wrote:
>>>
>>> On Tue, Jul 21, 2020 at 10:44:19PM +0530, Rakesh Pillai wrote:
>>>> NAPI gets scheduled on the CPU core which got the
>>>> interrupt. The linux scheduler cannot move it to a
>>>> different core, even if the CPU on which NAPI is running
>>>> is heavily loaded. This can lead to degraded wifi
>>>> performance when running traffic at peak data rates.
>>>>
>>>> A thread on the other hand can be moved to different
>>>> CPU cores, if the one on which its running is heavily
>>>> loaded. During high incoming data traffic, this gives
>>>> better performance, since the thread can be moved to a
>>>> less loaded or sometimes even a more powerful CPU core
>>>> to account for the required CPU performance in order
>>>> to process the incoming packets.
>>>>
>>>> This patch series adds the support to use a high priority
>>>> thread to process the incoming packets, as opposed to
>>>> everything being done in NAPI context.
>>> I don't see why this problem is limited to the ath10k driver. I expect
>>> it applies to all drivers using NAPI. So shouldn't you be solving this
>>> in the NAPI core? Allow a driver to request the NAPI core uses a
>>> thread?
>> It's not just NAPI the problem is with the softint processing.
>> I suspect a lot of systems would work better if it ran as
>> a (highish priority) kernel thread.
> Hi folks
>
> Below is a minimunm poc implementation I can imagine on top of workqueue
> to make napi threaded. Thoughts are appreciated.
>
>> I've had to remove the main locks from a multi-threaded application
>> and replace them with atomic counters.
>> Consider what happens when the threads remove items from a shared
>> work list.
>> The code looks like:
>> 	mutex_enter();
>> 	remove_item_from_list();
>> 	mutex_exit().
>> The mutex is only held for a few instructions, so while you'd expect
>> the cache line to be 'hot' you wouldn't get real contention.
>> However the following scenarios happen:
>> 1) An ethernet interrupt happens while the mutex is held.
>>     This stops the other threads until all the softint processing
>>     has finished.
>> 2) An ethernet interrupt (and softint) runs on a thread that is
>>     waiting for the mutex.
>>     (Or on the cpu that the thread's processor affinity ties it to.)
>>     In this case the 'fair' (ticket) mutex code won't let any other
>>     thread acquire the mutex.
>>     So again everything stops until the softints all complete.
>>
>> The second one is also a problem when trying to wake up all
>> the threads (eg after adding a lot of items to the list).
>> The ticket locks force them to wake in order, but
>> sometimes the 'thundering herd' would work better.
>>
>> IIRC this is actually worse for processes running under the RT
>> scheduler (without CONFIG_PREEMPT) because the they are almost
>> always scheduled on the same cpu they ran on last.
>> If it is busy, but cannot be pre-empted, they are not moved
>> to an idle cpu.
>>     
>> To confound things there is a very broken workaround for broken
>> hardware in the driver for the e1000 interface on (at least)
>> Ivy Bridge cpu that can cause the driver to spin for a very
>> long time (IIRC milliseconds) whenever it has to write to a
>> MAC register (ie on every transmit setup).
>>
>> 	David
>>
>> -
>> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
>> Registration No: 1397386 (Wales)
> To make napi threaded, if either irq or softirq thread is entirely ruled
> out, add napi::work that will be queued on a highpri workqueue. It is
> actually a unbound one to facilitate scheduler to catter napi loads on to
> idle CPU cores. What users need to do with the threaded napi
> is s/netif_napi_add/netif_threaded_napi_add/ and no more.
>
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -338,6 +338,9 @@ struct napi_struct {
>   	struct list_head	dev_list;
>   	struct hlist_node	napi_hash_node;
>   	unsigned int		napi_id;
> +#ifdef CONFIG_THREADED_NAPI
> +	struct work_struct	work;
> +#endif
>   };
>   
>   enum {
> @@ -2234,6 +2237,19 @@ static inline void *netdev_priv(const st
>   void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
>   		    int (*poll)(struct napi_struct *, int), int weight);
>   
> +#ifdef CONFIG_THREADED_NAPI
> +void netif_threaded_napi_add(struct net_device *dev, struct napi_struct *napi,
> +		    int (*poll)(struct napi_struct *, int), int weight);
> +#else
> +static inline void netif_threaded_napi_add(struct net_device *dev,
> +					struct napi_struct *napi,
> +					int (*poll)(struct napi_struct *, int),
> +					int weight)
> +{
> +	netif_napi_add(dev, napi, poll, weight);
> +}
> +#endif
> +
>   /**
>    *	netif_tx_napi_add - initialize a NAPI context
>    *	@dev:  network device
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6277,6 +6277,61 @@ static int process_backlog(struct napi_s
>   	return work;
>   }
>   
> +#ifdef CONFIG_THREADED_NAPI
> +/* unbound highpri workqueue for threaded napi */
> +static struct workqueue_struct *napi_workq;
> +
> +static void napi_workfn(struct work_struct *work)
> +{
> +	struct napi_struct *n = container_of(work, struct napi_struct, work);
> +
> +	for (;;) {
> +		if (!test_bit(NAPI_STATE_SCHED, &n->state))
> +			return;
> +
> +		if (n->poll(n, n->weight) < n->weight)
> +			return;
> +
> +		if (need_resched()) {
> +			/*
> +			 * have to pay for the latency of task switch even if
> +			 * napi is scheduled
> +			 */
> +			if (test_bit(NAPI_STATE_SCHED, &n->state))
> +				queue_work(napi_workq, work);
> +			return;
> +		}
> +	}
> +}
> +
> +void netif_threaded_napi_add(struct net_device *dev,
> +				struct napi_struct *napi,
> +				int (*poll)(struct napi_struct *, int),
> +				int weight)
> +{
> +	netif_napi_add(dev, napi, poll, weight);
> +	INIT_WORK(&napi->work, napi_workfn);
> +}
> +
> +static inline bool is_threaded_napi(struct napi_struct *n)
> +{
> +	return n->work.func == napi_workfn;
> +}
> +
> +static inline void threaded_napi_sched(struct napi_struct *n)
> +{
> +	if (is_threaded_napi(n))
> +		queue_work(napi_workq, &n->work);
> +	else
> +		____napi_schedule(this_cpu_ptr(&softnet_data), n);
> +}
> +#else
> +static inline void threaded_napi_sched(struct napi_struct *n)
> +{
> +	____napi_schedule(this_cpu_ptr(&softnet_data), n);
> +}
> +#endif
> +
>   /**
>    * __napi_schedule - schedule for receive
>    * @n: entry to schedule
> @@ -6289,7 +6344,7 @@ void __napi_schedule(struct napi_struct
>   	unsigned long flags;
>   
>   	local_irq_save(flags);
> -	____napi_schedule(this_cpu_ptr(&softnet_data), n);
> +	threaded_napi_sched(n);
>   	local_irq_restore(flags);
>   }
>   EXPORT_SYMBOL(__napi_schedule);
> @@ -6335,7 +6390,7 @@ EXPORT_SYMBOL(napi_schedule_prep);
>    */
>   void __napi_schedule_irqoff(struct napi_struct *n)
>   {
> -	____napi_schedule(this_cpu_ptr(&softnet_data), n);
> +	threaded_napi_sched(n);
>   }
>   EXPORT_SYMBOL(__napi_schedule_irqoff);
>   
> @@ -10685,6 +10740,10 @@ static int __init net_dev_init(void)
>   		sd->backlog.weight = weight_p;
>   	}
>   
> +#ifdef CONFIG_THREADED_NAPI
> +	napi_workq = alloc_workqueue("napi_workq", WQ_UNBOUND | WQ_HIGHPRI,
> +					    WQ_UNBOUND_MAX_ACTIVE);
> +#endif
>   	dev_boot_phase = 0;
>   
>   	/* The loopback device is special if any other network devices
>
>
> _______________________________________________
> ath10k mailing list
> ath10k@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/ath10k
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 0/7] Add support to process rx packets in thread
  2020-07-25 10:38     ` Sebastian Gottschall
@ 2020-07-25 14:08       ` Sebastian Gottschall
       [not found]       ` <20200725145728.10556-1-hdanton@sina.com>
  1 sibling, 0 replies; 39+ messages in thread
From: Sebastian Gottschall @ 2020-07-25 14:08 UTC (permalink / raw)
  To: Hillf Danton, David Laight, Andrew Lunn, Rakesh Pillai
  Cc: netdev, linux-wireless, linux-kernel, ath10k, dianders,
	Markus Elfring, evgreen, kuba, johannes, davem, kvalo


Am 25.07.2020 um 14:25 schrieb Hillf Danton:
> On Sat, 25 Jul 2020 12:38:00 +0200 Sebastian Gottschall wrote:
>> you may consider this
>>
>> https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1142611.html 
>>
>> <https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1142611.html> 
>>
> Thanks very much for your link.
>
>> years ago someone already wanted to bring this feature upstream, but it
>> was denied. i already tested this patch the last 2 days and it worked so
>> far (with some little modifications)
>> so such a solution existed already and may be considered here
> I don't see outstanding difference in principle from Paolo's work in
> 2016 except for the use of kthread_create() and friends because kworker
> made use of them even before 2016. This is a simpler one as shown by
> the diff stat in his cover letter.
i agree. i just can say that i tested this patch recently due this 
discussion here. and it can be changed by sysfs. but it doesnt work for
wifi drivers which are mainly using dummy netdev devices. for this i 
made a small patch to get them working using napi_set_threaded manually 
hardcoded in the drivers. (see patch bellow)
i also tested various networking drivers. one thing i notice doesnt 
work. some napi code is used for tx polling. so from my experience this 
concept just works good for rx with the most drivers.
so far i tested mt76, ath10k and some soc ethernet chipsets with good 
success. on ath10k i had about 10 - 20% performance gain on multicore 
systems. using standard iperf3 with 4 parallel streams.

-5439,7 +5441,7 @@ int napi_set_threaded(struct napi_struct *n, bool
                 clear_bit(NAPI_STATE_THREADED, &n->state);

         /* if the device is initializing, nothing todo */
-       if (test_bit(__LINK_STATE_START, &n->dev->state))
+       if (test_bit(__LINK_STATE_START, &n->dev->state) && 
n->dev->reg_state != NETREG_DUMMY)
                 return 0;

         napi_thread_stop(n)
;


>
> Paolo, feel free to correct me if I misread anything.
>
> Finally I don't see the need to add sysfs attr, given 
> CONFIG_THREADED_NAPI
> in this work.
>
> BTW let us know if anyone has plans to pick up the 2016 RFC.
>
> Hillf
>
> Paolo Abeni (2):
>    net: implement threaded-able napi poll loop support
>    net: add sysfs attribute to control napi threaded mode
>
>   include/linux/netdevice.h |   4 ++
>   net/core/dev.c            | 113 
> ++++++++++++++++++++++++++++++++++++++++++++++
>   net/core/net-sysfs.c      | 102 
> +++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 219 insertions(+)
>> Sebastian
>>
>>
>> someone
>>
>> Am 25.07.2020 um 10:16 schrieb Hillf Danton:
>>> On Wed, 22 Jul 2020 09:12:42 +0000 David Laight wrote:
>>>>> On 21 July 2020 18:25 Andrew Lunn wrote:
>>>>>
>>>>> On Tue, Jul 21, 2020 at 10:44:19PM +0530, Rakesh Pillai wrote:
>>>>>> NAPI gets scheduled on the CPU core which got the
>>>>>> interrupt. The linux scheduler cannot move it to a
>>>>>> different core, even if the CPU on which NAPI is running
>>>>>> is heavily loaded. This can lead to degraded wifi
>>>>>> performance when running traffic at peak data rates.
>>>>>>
>>>>>> A thread on the other hand can be moved to different
>>>>>> CPU cores, if the one on which its running is heavily
>>>>>> loaded. During high incoming data traffic, this gives
>>>>>> better performance, since the thread can be moved to a
>>>>>> less loaded or sometimes even a more powerful CPU core
>>>>>> to account for the required CPU performance in order
>>>>>> to process the incoming packets.
>>>>>>
>>>>>> This patch series adds the support to use a high priority
>>>>>> thread to process the incoming packets, as opposed to
>>>>>> everything being done in NAPI context.
>>>>> I don't see why this problem is limited to the ath10k driver. I 
>>>>> expect
>>>>> it applies to all drivers using NAPI. So shouldn't you be solving 
>>>>> this
>>>>> in the NAPI core? Allow a driver to request the NAPI core uses a
>>>>> thread?
>>>> It's not just NAPI the problem is with the softint processing.
>>>> I suspect a lot of systems would work better if it ran as
>>>> a (highish priority) kernel thread.
>>> Hi folks
>>>
>>> Below is a minimunm poc implementation I can imagine on top of 
>>> workqueue
>>> to make napi threaded. Thoughts are appreciated.
>>>
>>>> I've had to remove the main locks from a multi-threaded application
>>>> and replace them with atomic counters.
>>>> Consider what happens when the threads remove items from a shared
>>>> work list.
>>>> The code looks like:
>>>>     mutex_enter();
>>>>     remove_item_from_list();
>>>>     mutex_exit().
>>>> The mutex is only held for a few instructions, so while you'd expect
>>>> the cache line to be 'hot' you wouldn't get real contention.
>>>> However the following scenarios happen:
>>>> 1) An ethernet interrupt happens while the mutex is held.
>>>>      This stops the other threads until all the softint processing
>>>>      has finished.
>>>> 2) An ethernet interrupt (and softint) runs on a thread that is
>>>>      waiting for the mutex.
>>>>      (Or on the cpu that the thread's processor affinity ties it to.)
>>>>      In this case the 'fair' (ticket) mutex code won't let any other
>>>>      thread acquire the mutex.
>>>>      So again everything stops until the softints all complete.
>>>>
>>>> The second one is also a problem when trying to wake up all
>>>> the threads (eg after adding a lot of items to the list).
>>>> The ticket locks force them to wake in order, but
>>>> sometimes the 'thundering herd' would work better.
>>>>
>>>> IIRC this is actually worse for processes running under the RT
>>>> scheduler (without CONFIG_PREEMPT) because the they are almost
>>>> always scheduled on the same cpu they ran on last.
>>>> If it is busy, but cannot be pre-empted, they are not moved
>>>> to an idle cpu.
>>>>      To confound things there is a very broken workaround for broken
>>>> hardware in the driver for the e1000 interface on (at least)
>>>> Ivy Bridge cpu that can cause the driver to spin for a very
>>>> long time (IIRC milliseconds) whenever it has to write to a
>>>> MAC register (ie on every transmit setup).
>>>>
>>>>     David
>>>>
>>>> -
>>>> Registered Address Lakeside, Bramley Road, Mount Farm, Milton 
>>>> Keynes, MK1 1PT, UK
>>>> Registration No: 1397386 (Wales)
>>>>
>>>>
>>> To make napi threaded, if either irq or softirq thread is entirely 
>>> ruled
>>> out, add napi::work that will be queued on a highpri workqueue. It is
>>> actually a unbound one to facilitate scheduler to catter napi loads 
>>> on to
>>> idle CPU cores. What users need to do with the threaded napi
>>> is s/netif_napi_add/netif_threaded_napi_add/ and no more.
>>>
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -338,6 +338,9 @@ struct napi_struct {
>>>        struct list_head    dev_list;
>>>        struct hlist_node    napi_hash_node;
>>>        unsigned int        napi_id;
>>> +#ifdef CONFIG_THREADED_NAPI
>>> +    struct work_struct    work;
>>> +#endif
>>>    };
>>>       enum {
>>> @@ -2234,6 +2237,19 @@ static inline void *netdev_priv(const st
>>>    void netif_napi_add(struct net_device *dev, struct napi_struct 
>>> *napi,
>>>                int (*poll)(struct napi_struct *, int), int weight);
>>>    +#ifdef CONFIG_THREADED_NAPI
>>> +void netif_threaded_napi_add(struct net_device *dev, struct 
>>> napi_struct *napi,
>>> +            int (*poll)(struct napi_struct *, int), int weight);
>>> +#else
>>> +static inline void netif_threaded_napi_add(struct net_device *dev,
>>> +                    struct napi_struct *napi,
>>> +                    int (*poll)(struct napi_struct *, int),
>>> +                    int weight)
>>> +{
>>> +    netif_napi_add(dev, napi, poll, weight);
>>> +}
>>> +#endif
>>> +
>>>    /**
>>>     *    netif_tx_napi_add - initialize a NAPI context
>>>     *    @dev:  network device
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -6277,6 +6277,61 @@ static int process_backlog(struct napi_s
>>>        return work;
>>>    }
>>>    +#ifdef CONFIG_THREADED_NAPI
>>> +/* unbound highpri workqueue for threaded napi */
>>> +static struct workqueue_struct *napi_workq;
>>> +
>>> +static void napi_workfn(struct work_struct *work)
>>> +{
>>> +    struct napi_struct *n = container_of(work, struct napi_struct, 
>>> work);
>>> +
>>> +    for (;;) {
>>> +        if (!test_bit(NAPI_STATE_SCHED, &n->state))
>>> +            return;
>>> +
>>> +        if (n->poll(n, n->weight) < n->weight)
>>> +            return;
>>> +
>>> +        if (need_resched()) {
>>> +            /*
>>> +             * have to pay for the latency of task switch even if
>>> +             * napi is scheduled
>>> +             */
>>> +            if (test_bit(NAPI_STATE_SCHED, &n->state))
>>> +                queue_work(napi_workq, work);
>>> +            return;
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +void netif_threaded_napi_add(struct net_device *dev,
>>> +                struct napi_struct *napi,
>>> +                int (*poll)(struct napi_struct *, int),
>>> +                int weight)
>>> +{
>>> +    netif_napi_add(dev, napi, poll, weight);
>>> +    INIT_WORK(&napi->work, napi_workfn);
>>> +}
>>> +
>>> +static inline bool is_threaded_napi(struct napi_struct *n)
>>> +{
>>> +    return n->work.func == napi_workfn;
>>> +}
>>> +
>>> +static inline void threaded_napi_sched(struct napi_struct *n)
>>> +{
>>> +    if (is_threaded_napi(n))
>>> +        queue_work(napi_workq, &n->work);
>>> +    else
>>> +        ____napi_schedule(this_cpu_ptr(&softnet_data), n);
>>> +}
>>> +#else
>>> +static inline void threaded_napi_sched(struct napi_struct *n)
>>> +{
>>> +    ____napi_schedule(this_cpu_ptr(&softnet_data), n);
>>> +}
>>> +#endif
>>> +
>>>    /**
>>>     * __napi_schedule - schedule for receive
>>>     * @n: entry to schedule
>>> @@ -6289,7 +6344,7 @@ void __napi_schedule(struct napi_struct
>>>        unsigned long flags;
>>>           local_irq_save(flags);
>>> -    ____napi_schedule(this_cpu_ptr(&softnet_data), n);
>>> +    threaded_napi_sched(n);
>>>        local_irq_restore(flags);
>>>    }
>>>    EXPORT_SYMBOL(__napi_schedule);
>>> @@ -6335,7 +6390,7 @@ EXPORT_SYMBOL(napi_schedule_prep);
>>>     */
>>>    void __napi_schedule_irqoff(struct napi_struct *n)
>>>    {
>>> -    ____napi_schedule(this_cpu_ptr(&softnet_data), n);
>>> +    threaded_napi_sched(n);
>>>    }
>>>    EXPORT_SYMBOL(__napi_schedule_irqoff);
>>>    @@ -10685,6 +10740,10 @@ static int __init net_dev_init(void)
>>>            sd->backlog.weight = weight_p;
>>>        }
>>>    +#ifdef CONFIG_THREADED_NAPI
>>> +    napi_workq = alloc_workqueue("napi_workq", WQ_UNBOUND | 
>>> WQ_HIGHPRI,
>>> +                        WQ_UNBOUND_MAX_ACTIVE);
>>> +#endif
>>>        dev_boot_phase = 0;
>>>           /* The loopback device is special if any other network 
>>> devices
>>>
>>>
>>> _______________________________________________
>>> ath10k mailing list
>>> ath10k@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/ath10k
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 0/7] Add support to process rx packets in thread
       [not found]       ` <20200725145728.10556-1-hdanton@sina.com>
@ 2020-07-25 15:41         ` Sebastian Gottschall
  2020-07-26 11:16           ` David Laight
  0 siblings, 1 reply; 39+ messages in thread
From: Sebastian Gottschall @ 2020-07-25 15:41 UTC (permalink / raw)
  To: Hillf Danton
  Cc: David Laight, Andrew Lunn, Rakesh Pillai, netdev, linux-wireless,
	linux-kernel, ath10k, dianders, Markus Elfring, evgreen, kuba,
	johannes, davem, kvalo


>> i agree. i just can say that i tested this patch recently due this
>> discussion here. and it can be changed by sysfs. but it doesnt work for
>> wifi drivers which are mainly using dummy netdev devices. for this i
>> made a small patch to get them working using napi_set_threaded manually
>> hardcoded in the drivers. (see patch bellow)
> By CONFIG_THREADED_NAPI, there is no need to consider what you did here
> in the napi core because device drivers know better and are responsible
> for it before calling napi_schedule(n).
yeah. but that approach will not work for some cases. some stupid 
drivers are using locking context in the napi poll function.
in that case the performance will runto shit. i discovered this with the 
mvneta eth driver (marvell) and mt76 tx polling (rx  works)
for mvneta is will cause very high latencies and packet drops. for mt76 
it causes packet stop. doesnt work simply (on all cases no crashes)
so the threading will only work for drivers which are compatible with 
that approach. it cannot be used as drop in replacement from my point of 
view.
its all a question of the driver design

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 0/7] Add support to process rx packets in thread
       [not found]   ` <20200725081633.7432-1-hdanton@sina.com>
  2020-07-25 10:38     ` Sebastian Gottschall
@ 2020-07-25 17:57     ` Felix Fietkau
       [not found]     ` <20200726012244.15264-1-hdanton@sina.com>
  2 siblings, 0 replies; 39+ messages in thread
From: Felix Fietkau @ 2020-07-25 17:57 UTC (permalink / raw)
  To: Hillf Danton, David Laight, Andrew Lunn, Rakesh Pillai
  Cc: netdev, linux-wireless, linux-kernel, ath10k, dianders,
	Markus Elfring, evgreen, kuba, johannes, davem, kvalo

On 2020-07-25 10:16, Hillf Danton wrote:
> Hi folks
> 
> Below is a minimunm poc implementation I can imagine on top of workqueue
> to make napi threaded. Thoughts are appreciated.
Hi Hillf,

For some reason I don't see your mails on linux-wireless/netdev.
I've cleaned up your implementation a bit and I ran some tests with mt76
on an mt7621 embedded board. The results look pretty nice, performance
is a lot more consistent in my tests now.

Here are the changes that I've made compared to your version:

- remove the #ifdef, I think it's unnecessary
- add a state bit for threaded NAPI
- make netif_threaded_napi_add inline
- run queue_work outside of local_irq_save/restore (it does that
internally already)

If you don't mind, I'd like to propose this to netdev soon. Can I have
your Signed-off-by for that?

Thanks,

- Felix

---
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -347,6 +347,7 @@ struct napi_struct {
 	struct list_head	dev_list;
 	struct hlist_node	napi_hash_node;
 	unsigned int		napi_id;
+	struct work_struct	work;
 };
 
 enum {
@@ -357,6 +358,7 @@ enum {
 	NAPI_STATE_HASHED,	/* In NAPI hash (busy polling possible) */
 	NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy polling */
 	NAPI_STATE_IN_BUSY_POLL,/* sk_busy_loop() owns this NAPI */
+	NAPI_STATE_THREADED,	/* Use threaded NAPI */
 };
 
 enum {
@@ -367,6 +369,7 @@ enum {
 	NAPIF_STATE_HASHED	 = BIT(NAPI_STATE_HASHED),
 	NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL),
 	NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
+	NAPIF_STATE_THREADED	 = BIT(NAPI_STATE_THREADED),
 };
 
 enum gro_result {
@@ -2315,6 +2318,26 @@ static inline void *netdev_priv(const struct net_device *dev)
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight);
 
+/**
+ *	netif_threaded_napi_add - initialize a NAPI context
+ *	@dev:  network device
+ *	@napi: NAPI context
+ *	@poll: polling function
+ *	@weight: default weight
+ *
+ * This variant of netif_napi_add() should be used from drivers using NAPI
+ * with CPU intensive poll functions.
+ * This will schedule polling from a high priority workqueue that
+ */
+static inline void netif_threaded_napi_add(struct net_device *dev,
+					   struct napi_struct *napi,
+					   int (*poll)(struct napi_struct *, int),
+					   int weight)
+{
+	set_bit(NAPI_STATE_THREADED, &napi->state);
+	netif_napi_add(dev, napi, poll, weight);
+}
+
 /**
  *	netif_tx_napi_add - initialize a NAPI context
  *	@dev:  network device
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -158,6 +158,7 @@ static DEFINE_SPINLOCK(offload_lock);
 struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
 struct list_head ptype_all __read_mostly;	/* Taps */
 static struct list_head offload_base __read_mostly;
+static struct workqueue_struct *napi_workq;
 
 static int netif_rx_internal(struct sk_buff *skb);
 static int call_netdevice_notifiers_info(unsigned long val,
@@ -6286,6 +6287,11 @@ void __napi_schedule(struct napi_struct *n)
 {
 	unsigned long flags;
 
+	if (test_bit(NAPI_STATE_THREADED, &n->state)) {
+		queue_work(napi_workq, &n->work);
+		return;
+	}
+
 	local_irq_save(flags);
 	____napi_schedule(this_cpu_ptr(&softnet_data), n);
 	local_irq_restore(flags);
@@ -6333,6 +6339,11 @@ EXPORT_SYMBOL(napi_schedule_prep);
  */
 void __napi_schedule_irqoff(struct napi_struct *n)
 {
+	if (test_bit(NAPI_STATE_THREADED, &n->state)) {
+		queue_work(napi_workq, &n->work);
+		return;
+	}
+
 	____napi_schedule(this_cpu_ptr(&softnet_data), n);
 }
 EXPORT_SYMBOL(__napi_schedule_irqoff);
@@ -6601,6 +6612,29 @@ static void init_gro_hash(struct napi_struct *napi)
 	napi->gro_bitmask = 0;
 }
 
+static void napi_workfn(struct work_struct *work)
+{
+	struct napi_struct *n = container_of(work, struct napi_struct, work);
+
+	for (;;) {
+		if (!test_bit(NAPI_STATE_SCHED, &n->state))
+			return;
+
+		if (n->poll(n, n->weight) < n->weight)
+			return;
+
+		if (need_resched()) {
+			/*
+			 * have to pay for the latency of task switch even if
+			 * napi is scheduled
+			 */
+			if (test_bit(NAPI_STATE_SCHED, &n->state))
+				queue_work(napi_workq, work);
+			return;
+		}
+	}
+}
+
 void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 		    int (*poll)(struct napi_struct *, int), int weight)
 {
@@ -6621,6 +6655,7 @@ void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
 #ifdef CONFIG_NETPOLL
 	napi->poll_owner = -1;
 #endif
+	INIT_WORK(&napi->work, napi_workfn);
 	set_bit(NAPI_STATE_SCHED, &napi->state);
 	napi_hash_add(napi);
 }
@@ -10676,6 +10711,10 @@ static int __init net_dev_init(void)
 		sd->backlog.weight = weight_p;
 	}
 
+	napi_workq = alloc_workqueue("napi_workq", WQ_UNBOUND | WQ_HIGHPRI,
+				     WQ_UNBOUND_MAX_ACTIVE);
+	BUG_ON(!napi_workq);
+
 	dev_boot_phase = 0;
 
 	/* The loopback device is special if any other network devices

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 0/7] Add support to process rx packets in thread
       [not found]     ` <20200726012244.15264-1-hdanton@sina.com>
@ 2020-07-26  8:10       ` Felix Fietkau
       [not found]       ` <20200726083239.5060-1-hdanton@sina.com>
  1 sibling, 0 replies; 39+ messages in thread
From: Felix Fietkau @ 2020-07-26  8:10 UTC (permalink / raw)
  To: Hillf Danton
  Cc: David Laight, Andrew Lunn, Rakesh Pillai, netdev, linux-wireless,
	linux-kernel, ath10k, dianders, Markus Elfring, Ding Zhao Nan,
	evgreen, kuba, johannes, davem, kvalo

On 2020-07-26 03:22, Hillf Danton wrote:
>> - add a state bit for threaded NAPI
>> - make netif_threaded_napi_add inline
>> - run queue_work outside of local_irq_save/restore (it does that
>> internally already)
>>
>> If you don't mind, I'd like to propose this to netdev soon. Can I have
>> your Signed-off-by for that?
> 
> Feel free to do that. Is it likely for me to select a Cc?
Shall I use Signed-off-by: Hillf Danton <hdanton@sina.com>?
What Cc do you want me to add?

>> ---
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -347,6 +347,7 @@ struct napi_struct {
>>  	struct list_head	dev_list;
>>  	struct hlist_node	napi_hash_node;
>>  	unsigned int		napi_id;
>> +	struct work_struct	work;
>>  };
>>  
>>  enum {
>> @@ -357,6 +358,7 @@ enum {
>>  	NAPI_STATE_HASHED,	/* In NAPI hash (busy polling possible) */
>>  	NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy polling */
>>  	NAPI_STATE_IN_BUSY_POLL,/* sk_busy_loop() owns this NAPI */
>> +	NAPI_STATE_THREADED,	/* Use threaded NAPI */
>>  };
>>  
>>  enum {
>> @@ -367,6 +369,7 @@ enum {
>>  	NAPIF_STATE_HASHED	 = BIT(NAPI_STATE_HASHED),
>>  	NAPIF_STATE_NO_BUSY_POLL = BIT(NAPI_STATE_NO_BUSY_POLL),
>>  	NAPIF_STATE_IN_BUSY_POLL = BIT(NAPI_STATE_IN_BUSY_POLL),
>> +	NAPIF_STATE_THREADED	 = BIT(NAPI_STATE_THREADED),
>>  };
>>  
>>  enum gro_result {
>> @@ -2315,6 +2318,26 @@ static inline void *netdev_priv(const struct net_device *dev)
>>  void netif_napi_add(struct net_device *dev, struct napi_struct *napi,
>>  		    int (*poll)(struct napi_struct *, int), int weight);
>>  
>> +/**
>> + *	netif_threaded_napi_add - initialize a NAPI context
>> + *	@dev:  network device
>> + *	@napi: NAPI context
>> + *	@poll: polling function
>> + *	@weight: default weight
>> + *
>> + * This variant of netif_napi_add() should be used from drivers using NAPI
>> + * with CPU intensive poll functions.
>> + * This will schedule polling from a high priority workqueue that
>> + */
>> +static inline void netif_threaded_napi_add(struct net_device *dev,
>> +					   struct napi_struct *napi,
>> +					   int (*poll)(struct napi_struct *, int),
>> +					   int weight)
>> +{
>> +	set_bit(NAPI_STATE_THREADED, &napi->state);
>> +	netif_napi_add(dev, napi, poll, weight);
>> +}
>> +
>>  /**
>>   *	netif_tx_napi_add - initialize a NAPI context
>>   *	@dev:  network device
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -158,6 +158,7 @@ static DEFINE_SPINLOCK(offload_lock);
>>  struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
>>  struct list_head ptype_all __read_mostly;	/* Taps */
>>  static struct list_head offload_base __read_mostly;
>> +static struct workqueue_struct *napi_workq;
> 
> Is __read_mostly missing?
Yes, thanks. I will add that before sending the patch.

- Felix

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 0/7] Add support to process rx packets in thread
       [not found]       ` <20200726083239.5060-1-hdanton@sina.com>
@ 2020-07-26  8:59         ` Felix Fietkau
  0 siblings, 0 replies; 39+ messages in thread
From: Felix Fietkau @ 2020-07-26  8:59 UTC (permalink / raw)
  To: Hillf Danton
  Cc: David Laight, Andrew Lunn, Rakesh Pillai, netdev, linux-wireless,
	linux-kernel, ath10k, dianders, Markus Elfring, Ding Zhao Nan,
	evgreen, kuba, johannes, davem, kvalo

On 2020-07-26 10:32, Hillf Danton wrote:
> 
> On Sun, 26 Jul 2020 10:10:15 +0200 Felix Fietkau wrote:
>> On 2020-07-26 03:22, Hillf Danton wrote:
>> > 
>> > Feel free to do that. Is it likely for me to select a Cc?
>> > 
>> Shall I use Signed-off-by: Hillf Danton <hdanton@sina.com>?
> 
> s/Signed-off-by/Cc/
> 
>> What Cc do you want me to add?
> 
> I prefer Cc over other tags.
Ah, okay. I was planning on adding you as the author of the patch, since
you did most of the work on it. If you want to be attributed as author
(or in Co-developed-by), I'd need your Signed-off-by.

If you don't want that, I can submit it under my name and leave you in
as Cc only.

- Felix

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [RFC 0/7] Add support to process rx packets in thread
  2020-07-25 15:41         ` Sebastian Gottschall
@ 2020-07-26 11:16           ` David Laight
  2020-07-28 16:59             ` Rakesh Pillai
  0 siblings, 1 reply; 39+ messages in thread
From: David Laight @ 2020-07-26 11:16 UTC (permalink / raw)
  To: 'Sebastian Gottschall', Hillf Danton
  Cc: Andrew Lunn, Rakesh Pillai, netdev, linux-wireless, linux-kernel,
	ath10k, dianders, Markus Elfring, evgreen, kuba, johannes, davem,
	kvalo

From: Sebastian Gottschall <s.gottschall@dd-wrt.com>
> Sent: 25 July 2020 16:42
> >> i agree. i just can say that i tested this patch recently due this
> >> discussion here. and it can be changed by sysfs. but it doesnt work for
> >> wifi drivers which are mainly using dummy netdev devices. for this i
> >> made a small patch to get them working using napi_set_threaded manually
> >> hardcoded in the drivers. (see patch bellow)

> > By CONFIG_THREADED_NAPI, there is no need to consider what you did here
> > in the napi core because device drivers know better and are responsible
> > for it before calling napi_schedule(n).

> yeah. but that approach will not work for some cases. some stupid
> drivers are using locking context in the napi poll function.
> in that case the performance will runto shit. i discovered this with the
> mvneta eth driver (marvell) and mt76 tx polling (rx  works)
> for mvneta is will cause very high latencies and packet drops. for mt76
> it causes packet stop. doesnt work simply (on all cases no crashes)
> so the threading will only work for drivers which are compatible with
> that approach. it cannot be used as drop in replacement from my point of
> view.
> its all a question of the driver design

Why should it make (much) difference whether the napi callbacks (etc)
are done in the context of the interrupted process or that of a
specific kernel thread.
The process flags (or whatever) can even be set so that it appears
to be the expected 'softint' context.

In any case running NAPI from a thread will just show up the next
piece of code that runs for ages in softint context.
I think I've seen the tail end of memory being freed under rcu
finally happening under softint and taking absolutely ages.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [RFC 1/7] mac80211: Add check for napi handle before WARN_ON
  2020-07-23 20:06       ` Johannes Berg
  2020-07-24  6:21         ` Rakesh Pillai
@ 2020-07-26 16:19         ` Rakesh Pillai
  2020-07-30 12:40           ` Johannes Berg
  1 sibling, 1 reply; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-26 16:19 UTC (permalink / raw)
  To: 'Johannes Berg', ath10k
  Cc: linux-wireless, linux-kernel, kvalo, davem, kuba, netdev,
	dianders, evgreen



> -----Original Message-----
> From: Rakesh Pillai <pillair@codeaurora.org>
> Sent: Friday, July 24, 2020 11:51 AM
> To: 'Johannes Berg' <johannes@sipsolutions.net>;
> 'ath10k@lists.infradead.org' <ath10k@lists.infradead.org>
> Cc: 'linux-wireless@vger.kernel.org' <linux-wireless@vger.kernel.org>;
> 'linux-kernel@vger.kernel.org' <linux-kernel@vger.kernel.org>;
> 'kvalo@codeaurora.org' <kvalo@codeaurora.org>; 'davem@davemloft.net'
> <davem@davemloft.net>; 'kuba@kernel.org' <kuba@kernel.org>;
> 'netdev@vger.kernel.org' <netdev@vger.kernel.org>;
> 'dianders@chromium.org' <dianders@chromium.org>;
> 'evgreen@chromium.org' <evgreen@chromium.org>
> Subject: RE: [RFC 1/7] mac80211: Add check for napi handle before
> WARN_ON
> 
> 
> 
> > -----Original Message-----
> > From: Johannes Berg <johannes@sipsolutions.net>
> > Sent: Friday, July 24, 2020 1:37 AM
> > To: Rakesh Pillai <pillair@codeaurora.org>; ath10k@lists.infradead.org
> > Cc: linux-wireless@vger.kernel.org; linux-kernel@vger.kernel.org;
> > kvalo@codeaurora.org; davem@davemloft.net; kuba@kernel.org;
> > netdev@vger.kernel.org; dianders@chromium.org;
> evgreen@chromium.org
> > Subject: Re: [RFC 1/7] mac80211: Add check for napi handle before
> > WARN_ON
> >
> > On Thu, 2020-07-23 at 23:56 +0530, Rakesh Pillai wrote:
> >
> > > > > -	WARN_ON_ONCE(softirq_count() == 0);
> > > > > +	WARN_ON_ONCE(napi && softirq_count() == 0);
> > > >
> > > > FWIW, I'm pretty sure this is incorrect - we make assumptions on
> > > > softirqs being disabled in mac80211 for serialization and in place of
> > > > some locking, I believe.
> > > >
> > >
> > > I checked this, but let me double confirm.
> > > But after this change, no packet is submitted from driver in a softirq
> > context.
> > > So ideally this should take care of serialization.
> >
> > I'd guess that we have some reliance on BHs already being disabled, for
> > things like u64 sync updates, or whatnot. I mean, we did "rx_ni()" for a
> > reason ... Maybe lockdep can help catch some of the issues.
> >
> > But couldn't you be in a thread and have BHs disabled too?
> 
> This would ideally beat the purpose and possibly hurt the other subsystems
> running on the same core.
> 

Hi Johannes,

We do have the usage of napi_gro_receive and netif_receive_skb in mac80211.
                /* deliver to local stack */
                if (rx->napi)
                        napi_gro_receive(rx->napi, skb);
                else
                        netif_receive_skb(skb);


Also all the rx_handlers are called under the " rx->local->rx_path_lock" lock.
Is the BH disable still required ?


> >
> > johannes



^ permalink raw reply	[flat|nested] 39+ messages in thread

* RE: [RFC 0/7] Add support to process rx packets in thread
  2020-07-26 11:16           ` David Laight
@ 2020-07-28 16:59             ` Rakesh Pillai
  0 siblings, 0 replies; 39+ messages in thread
From: Rakesh Pillai @ 2020-07-28 16:59 UTC (permalink / raw)
  To: 'David Laight', 'Sebastian Gottschall',
	'Hillf Danton'
  Cc: 'Andrew Lunn',
	netdev, linux-wireless, linux-kernel, ath10k, dianders,
	'Markus Elfring',
	evgreen, kuba, johannes, davem, kvalo



> -----Original Message-----
> From: David Laight <David.Laight@ACULAB.COM>
> Sent: Sunday, July 26, 2020 4:46 PM
> To: 'Sebastian Gottschall' <s.gottschall@dd-wrt.com>; Hillf Danton
> <hdanton@sina.com>
> Cc: Andrew Lunn <andrew@lunn.ch>; Rakesh Pillai <pillair@codeaurora.org>;
> netdev@vger.kernel.org; linux-wireless@vger.kernel.org; linux-
> kernel@vger.kernel.org; ath10k@lists.infradead.org;
> dianders@chromium.org; Markus Elfring <Markus.Elfring@web.de>;
> evgreen@chromium.org; kuba@kernel.org; johannes@sipsolutions.net;
> davem@davemloft.net; kvalo@codeaurora.org
> Subject: RE: [RFC 0/7] Add support to process rx packets in thread
> 
> From: Sebastian Gottschall <s.gottschall@dd-wrt.com>
> > Sent: 25 July 2020 16:42
> > >> i agree. i just can say that i tested this patch recently due this
> > >> discussion here. and it can be changed by sysfs. but it doesnt work for
> > >> wifi drivers which are mainly using dummy netdev devices. for this i
> > >> made a small patch to get them working using napi_set_threaded
> manually
> > >> hardcoded in the drivers. (see patch bellow)
> 
> > > By CONFIG_THREADED_NAPI, there is no need to consider what you did
> here
> > > in the napi core because device drivers know better and are responsible
> > > for it before calling napi_schedule(n).
> 
> > yeah. but that approach will not work for some cases. some stupid
> > drivers are using locking context in the napi poll function.
> > in that case the performance will runto shit. i discovered this with the
> > mvneta eth driver (marvell) and mt76 tx polling (rx  works)
> > for mvneta is will cause very high latencies and packet drops. for mt76
> > it causes packet stop. doesnt work simply (on all cases no crashes)
> > so the threading will only work for drivers which are compatible with
> > that approach. it cannot be used as drop in replacement from my point of
> > view.
> > its all a question of the driver design
> 
> Why should it make (much) difference whether the napi callbacks (etc)
> are done in the context of the interrupted process or that of a
> specific kernel thread.
> The process flags (or whatever) can even be set so that it appears
> to be the expected 'softint' context.
> 
> In any case running NAPI from a thread will just show up the next
> piece of code that runs for ages in softint context.
> I think I've seen the tail end of memory being freed under rcu
> finally happening under softint and taking absolutely ages.
> 
> 	David
> 

Hi All,

Is the threaded NAPI change posted to kernel ? 
Is the conclusion of this discussion that " we cannot use threads for processing packets " ??


> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes,
> MK1 1PT, UK
> Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [RFC 1/7] mac80211: Add check for napi handle before WARN_ON
  2020-07-26 16:19         ` Rakesh Pillai
@ 2020-07-30 12:40           ` Johannes Berg
  0 siblings, 0 replies; 39+ messages in thread
From: Johannes Berg @ 2020-07-30 12:40 UTC (permalink / raw)
  To: Rakesh Pillai, ath10k
  Cc: linux-wireless, linux-kernel, kvalo, davem, kuba, netdev,
	dianders, evgreen

On Sun, 2020-07-26 at 21:49 +0530, Rakesh Pillai wrote:

> We do have the usage of napi_gro_receive and netif_receive_skb in mac80211.
>                 /* deliver to local stack */
>                 if (rx->napi)
>                         napi_gro_receive(rx->napi, skb);
>                 else
>                         netif_receive_skb(skb);
> 
> 
> Also all the rx_handlers are called under the " rx->local->rx_path_lock" lock.
> Is the BH disable still required ?

I tend to think so, but you're welcome to show that it's not. The lock
serves a different purpose.

TBH, I don't have all of this in my head at all times either, so please
do your own work and analyse why it may or may not be necessary. But
without any such analysis I'm not going to take patches that change it.

johannes


^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, back to index

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-21 17:14 [RFC 0/7] Add support to process rx packets in thread Rakesh Pillai
2020-07-21 17:14 ` [RFC 1/7] mac80211: Add check for napi handle before WARN_ON Rakesh Pillai
2020-07-22 12:56   ` Johannes Berg
2020-07-23 18:26     ` Rakesh Pillai
2020-07-23 20:06       ` Johannes Berg
2020-07-24  6:21         ` Rakesh Pillai
2020-07-26 16:19         ` Rakesh Pillai
2020-07-30 12:40           ` Johannes Berg
2020-07-21 17:14 ` [RFC 2/7] ath10k: Add support to process rx packet in thread Rakesh Pillai
2020-07-21 21:53   ` Rajkumar Manoharan
2020-07-22 12:27     ` Felix Fietkau
2020-07-22 12:55       ` Johannes Berg
2020-07-22 13:00         ` Felix Fietkau
2020-07-23  6:09           ` Rajkumar Manoharan
2020-07-23 18:25     ` Rakesh Pillai
2020-07-24 23:11       ` Jacob Keller
2020-07-21 17:14 ` [RFC 3/7] ath10k: Add module param to enable rx thread Rakesh Pillai
2020-07-21 17:14 ` [RFC 4/7] ath10k: Do not exhaust budget on process tx completion Rakesh Pillai
2020-07-21 17:14 ` [RFC 5/7] ath10k: Handle the rx packet processing in thread Rakesh Pillai
2020-07-21 17:14 ` [RFC 6/7] ath10k: Add deliver to stack from thread context Rakesh Pillai
2020-07-21 17:14 ` [RFC 7/7] ath10k: Handle rx thread suspend and resume Rakesh Pillai
2020-07-23 23:06   ` Sebastian Gottschall
2020-07-24  6:19     ` Rakesh Pillai
2020-07-21 17:25 ` [RFC 0/7] Add support to process rx packets in thread Andrew Lunn
2020-07-21 18:05   ` Florian Fainelli
2020-07-23 18:21     ` Rakesh Pillai
2020-07-23 19:02       ` Florian Fainelli
2020-07-24  6:20         ` Rakesh Pillai
2020-07-24 22:28           ` Florian Fainelli
2020-07-22  9:12   ` David Laight
2020-07-22 16:20   ` Jakub Kicinski
     [not found]   ` <20200725081633.7432-1-hdanton@sina.com>
2020-07-25 10:38     ` Sebastian Gottschall
2020-07-25 14:08       ` Sebastian Gottschall
     [not found]       ` <20200725145728.10556-1-hdanton@sina.com>
2020-07-25 15:41         ` Sebastian Gottschall
2020-07-26 11:16           ` David Laight
2020-07-28 16:59             ` Rakesh Pillai
2020-07-25 17:57     ` Felix Fietkau
     [not found]     ` <20200726012244.15264-1-hdanton@sina.com>
2020-07-26  8:10       ` Felix Fietkau
     [not found]       ` <20200726083239.5060-1-hdanton@sina.com>
2020-07-26  8:59         ` Felix Fietkau

Linux-Wireless Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-wireless/0 linux-wireless/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-wireless linux-wireless/ https://lore.kernel.org/linux-wireless \
		linux-wireless@vger.kernel.org
	public-inbox-index linux-wireless

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-wireless


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git