* Bad performance in RX with sfc 40G
@ 2021-11-18 15:14 Íñigo Huguet
  2021-11-18 17:19 ` Eric Dumazet
  2021-11-20  8:31 ` Martin Habets
  0 siblings, 2 replies; 15+ messages in thread
From: Íñigo Huguet @ 2021-11-18 15:14 UTC (permalink / raw)
  To: Edward Cree, habetsm.xilinx; +Cc: netdev, Dinan Gunawardena, Pablo Cascon

Hello,

Doing some tests a few weeks ago, I noticed very low RX performance
with 40G Solarflare NICs. In tests with iperf3 I got more than 30Gbps
in TX, but only around 15Gbps in RX. NICs from other vendors could
send and receive at over 30Gbps.

I ran the tests with multiple parallel streams in iperf3 (-P 8).

The models used are SFC9140 and SFC9220.
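For reference, a test setup along these lines would reproduce the numbers
above (hostnames are placeholders, not from the original report):

```shell
# On the receiver (the sfc 40G side under test):
iperf3 -s

# On the sender: 8 parallel TCP streams; "rx-host" is a placeholder
# for the receiver's address.
iperf3 -c rx-host -P 8 -t 60

# With -R the client receives instead, to measure RX from either end.
iperf3 -c rx-host -P 8 -t 60 -R
```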

Perf showed that most of the time was being spent in
`native_queued_spin_lock_slowpath`. Tracing the calls to it with
bpftrace, I found that most of them came from __napi_poll > efx_poll >
efx_fast_push_rx_descriptors > __alloc_pages >
get_page_from_freelist > ...
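Stacks like the ones below can be collected with a bpftrace one-liner
along these lines (a sketch; the exact invocation used isn't shown in
the report):

```shell
# Count the kernel stacks that enter the contended-spinlock slow path.
# Run as root while the iperf3 test is in flight; Ctrl-C prints the map.
bpftrace -e 'kprobe:native_queued_spin_lock_slowpath { @[kstack] = count(); }'
```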

Could you help me investigate this issue? At first sight, it looks
like a suboptimal memory allocation strategy, or perhaps a failure in
the page recycling strategy...

This is the output of bpftrace, showing the two call chains that
repeat most often, both from sfc:

@[
    native_queued_spin_lock_slowpath+1
    _raw_spin_lock+26
    rmqueue_bulk+76
    get_page_from_freelist+2295
    __alloc_pages+214
    efx_fast_push_rx_descriptors+640
    efx_poll+660
    __napi_poll+42
    net_rx_action+547
    __softirqentry_text_start+208
    __irq_exit_rcu+179
    common_interrupt+131
    asm_common_interrupt+30
    cpuidle_enter_state+199
    cpuidle_enter+41
    do_idle+462
    cpu_startup_entry+25
    start_kernel+2465
    secondary_startup_64_no_verify+194
]: 2650
@[
    native_queued_spin_lock_slowpath+1
    _raw_spin_lock+26
    rmqueue_bulk+76
    get_page_from_freelist+2295
    __alloc_pages+214
    efx_fast_push_rx_descriptors+640
    efx_poll+660
    __napi_poll+42
    net_rx_action+547
    __softirqentry_text_start+208
    __irq_exit_rcu+179
    common_interrupt+131
    asm_common_interrupt+30
    cpuidle_enter_state+199
    cpuidle_enter+41
    do_idle+462
    cpu_startup_entry+25
    secondary_startup_64_no_verify+194
]: 17119

--
Íñigo Huguet


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Bad performance in RX with sfc 40G
  2021-11-18 15:14 Bad performance in RX with sfc 40G Íñigo Huguet
@ 2021-11-18 17:19 ` Eric Dumazet
  2021-12-02 14:26   ` Íñigo Huguet
  2021-11-20  8:31 ` Martin Habets
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2021-11-18 17:19 UTC (permalink / raw)
  To: Íñigo Huguet, Edward Cree, habetsm.xilinx
  Cc: netdev, Dinan Gunawardena, Pablo Cascon



On 11/18/21 7:14 AM, Íñigo Huguet wrote:
> Hello,
> 
> Doing some tests a few weeks ago, I noticed very low RX performance
> with 40G Solarflare NICs. In tests with iperf3 I got more than 30Gbps
> in TX, but only around 15Gbps in RX. NICs from other vendors could
> send and receive at over 30Gbps.
> 
> I ran the tests with multiple parallel streams in iperf3 (-P 8).
> 
> The models used are SFC9140 and SFC9220.
> 
> Perf showed that most of the time was being spent in
> `native_queued_spin_lock_slowpath`. Tracing the calls to it with
> bpftrace, I found that most of them came from __napi_poll > efx_poll >
> efx_fast_push_rx_descriptors > __alloc_pages >
> get_page_from_freelist > ...
> 
> Could you help me investigate this issue? At first sight, it looks
> like a suboptimal memory allocation strategy, or perhaps a failure in
> the page recycling strategy...
> 
> This is the output of bpftrace, showing the two call chains that
> repeat most often, both from sfc:
> 
> @[
>     native_queued_spin_lock_slowpath+1
>     _raw_spin_lock+26
>     rmqueue_bulk+76
>     get_page_from_freelist+2295
>     __alloc_pages+214
>     efx_fast_push_rx_descriptors+640
>     efx_poll+660
>     __napi_poll+42
>     net_rx_action+547
>     __softirqentry_text_start+208
>     __irq_exit_rcu+179
>     common_interrupt+131
>     asm_common_interrupt+30
>     cpuidle_enter_state+199
>     cpuidle_enter+41
>     do_idle+462
>     cpu_startup_entry+25
>     start_kernel+2465
>     secondary_startup_64_no_verify+194
> ]: 2650
> @[
>     native_queued_spin_lock_slowpath+1
>     _raw_spin_lock+26
>     rmqueue_bulk+76
>     get_page_from_freelist+2295
>     __alloc_pages+214
>     efx_fast_push_rx_descriptors+640
>     efx_poll+660
>     __napi_poll+42
>     net_rx_action+547
>     __softirqentry_text_start+208
>     __irq_exit_rcu+179
>     common_interrupt+131
>     asm_common_interrupt+30
>     cpuidle_enter_state+199
>     cpuidle_enter+41
>     do_idle+462
>     cpu_startup_entry+25
>     secondary_startup_64_no_verify+194
> ]: 17119
> 
> --
> Íñigo Huguet
> 


You could try to :

Make the RX ring buffers bigger (ethtool -G eth0 rx 8192)

and/or

Make sure your TCP socket receive buffer is smaller than the number of frames in the ring buffer

echo "4096 131072 2097152" >/proc/sys/net/ipv4/tcp_rmem

You can also try the latest net-next, as TCP recently got something to help this case.

f35f821935d8df76f9c92e2431a225bdff938169 tcp: defer skb freeing after socket lock is released
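Taken together, the suggestions above could be applied roughly like this
("eth0" is a placeholder for the interface under test):

```shell
# Enlarge the RX descriptor ring (driver and NIC permitting).
ethtool -G eth0 rx 8192

# Cap the TCP receive buffer below a ring's worth of frames;
# equivalent to writing /proc/sys/net/ipv4/tcp_rmem directly.
sysctl -w net.ipv4.tcp_rmem="4096 131072 2097152"
```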


* Re: Bad performance in RX with sfc 40G
  2021-11-18 15:14 Bad performance in RX with sfc 40G Íñigo Huguet
  2021-11-18 17:19 ` Eric Dumazet
@ 2021-11-20  8:31 ` Martin Habets
  2021-12-09 12:06   ` Íñigo Huguet
  1 sibling, 1 reply; 15+ messages in thread
From: Martin Habets @ 2021-11-20  8:31 UTC (permalink / raw)
  To: Íñigo Huguet; +Cc: Edward Cree, netdev, Dinan Gunawardena

On Thu, Nov 18, 2021 at 04:14:35PM +0100, Íñigo Huguet wrote:
> Hello,
> 
> Doing some tests a few weeks ago, I noticed very low RX performance
> with 40G Solarflare NICs. In tests with iperf3 I got more than 30Gbps
> in TX, but only around 15Gbps in RX. NICs from other vendors could
> send and receive at over 30Gbps.
> 
> I ran the tests with multiple parallel streams in iperf3 (-P 8).
> 
> The models used are SFC9140 and SFC9220.
> 
> Perf showed that most of the time was being spent in
> `native_queued_spin_lock_slowpath`. Tracing the calls to it with
> bpftrace, I found that most of them came from __napi_poll > efx_poll >
> efx_fast_push_rx_descriptors > __alloc_pages >
> get_page_from_freelist > ...
> 
> Could you help me investigate this issue? At first sight, it looks
> like a suboptimal memory allocation strategy, or perhaps a failure in
> the page recycling strategy...

Apologies for the late reply. I'm having trouble digging out a
machine to test this.
If you're testing without the IOMMU enabled, I suspect the recycle ring
size may be too small. Can you try the patch below?

Martin

diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
index 68fc7d317693..e37702bf3380 100644
--- a/drivers/net/ethernet/sfc/rx_common.c
+++ b/drivers/net/ethernet/sfc/rx_common.c
@@ -28,7 +28,7 @@ MODULE_PARM_DESC(rx_refill_threshold,
  * the number of pages to store in the RX page recycle ring.
  */
 #define EFX_RECYCLE_RING_SIZE_IOMMU 4096
-#define EFX_RECYCLE_RING_SIZE_NOIOMMU (2 * EFX_RX_PREFERRED_BATCH)
+#define EFX_RECYCLE_RING_SIZE_NOIOMMU 1024
 
 /* RX maximum head room required.
  *


> 
> This is the output of bpftrace, showing the two call chains that
> repeat most often, both from sfc:
> 
> @[
>     native_queued_spin_lock_slowpath+1
>     _raw_spin_lock+26
>     rmqueue_bulk+76
>     get_page_from_freelist+2295
>     __alloc_pages+214
>     efx_fast_push_rx_descriptors+640
>     efx_poll+660
>     __napi_poll+42
>     net_rx_action+547
>     __softirqentry_text_start+208
>     __irq_exit_rcu+179
>     common_interrupt+131
>     asm_common_interrupt+30
>     cpuidle_enter_state+199
>     cpuidle_enter+41
>     do_idle+462
>     cpu_startup_entry+25
>     start_kernel+2465
>     secondary_startup_64_no_verify+194
> ]: 2650
> @[
>     native_queued_spin_lock_slowpath+1
>     _raw_spin_lock+26
>     rmqueue_bulk+76
>     get_page_from_freelist+2295
>     __alloc_pages+214
>     efx_fast_push_rx_descriptors+640
>     efx_poll+660
>     __napi_poll+42
>     net_rx_action+547
>     __softirqentry_text_start+208
>     __irq_exit_rcu+179
>     common_interrupt+131
>     asm_common_interrupt+30
>     cpuidle_enter_state+199
>     cpuidle_enter+41
>     do_idle+462
>     cpu_startup_entry+25
>     secondary_startup_64_no_verify+194
> ]: 17119
> 
> --
> Íñigo Huguet


* Re: Bad performance in RX with sfc 40G
  2021-11-18 17:19 ` Eric Dumazet
@ 2021-12-02 14:26   ` Íñigo Huguet
  0 siblings, 0 replies; 15+ messages in thread
From: Íñigo Huguet @ 2021-12-02 14:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Edward Cree, habetsm.xilinx, netdev, Dinan Gunawardena, Pablo Cascon

Hi,

On Thu, Nov 18, 2021 at 6:19 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> You could try to :
>
> Make the RX ring buffers bigger (ethtool -G eth0 rx 8192)
>
> and/or
>
> Make sure your TCP socket receive buffer is smaller than the number of frames in the ring buffer
>
> echo "4096 131072 2097152" >/proc/sys/net/ipv4/tcp_rmem

Thanks for the suggestion. Sadly, the results are almost the same.

> You can also try the latest net-next, as TCP recently got something to help this case.
>
> f35f821935d8df76f9c92e2431a225bdff938169 tcp: defer skb freeing after socket lock is released

I'll try it ASAP, but I will need a few days. Thanks



-- 
Íñigo Huguet



* Re: Bad performance in RX with sfc 40G
  2021-11-20  8:31 ` Martin Habets
@ 2021-12-09 12:06   ` Íñigo Huguet
  2021-12-23 13:18     ` Íñigo Huguet
  0 siblings, 1 reply; 15+ messages in thread
From: Íñigo Huguet @ 2021-12-09 12:06 UTC (permalink / raw)
  To: Íñigo Huguet, Edward Cree, netdev, Dinan Gunawardena

Hi,

On Sat, Nov 20, 2021 at 9:31 AM Martin Habets <habetsm.xilinx@gmail.com> wrote:
> If you're testing without the IOMMU enabled, I suspect the recycle ring
> size may be too small. Can you try the patch below?

Sorry for the very late reply, but I've been out of the office for many days.

This patch has improved performance a lot, reaching the same 30Gbps
as in TX. However, it's still a bit erratic, sometimes dropping to
15Gbps, especially after module removal & probe, or from one iperf
run to the next. Since it doesn't happen every time, I haven't found
a clear pattern. Anyway, it clearly improves things.

Can this patch be applied as-is, or is it just a test?

--
Íñigo Huguet



* Re: Bad performance in RX with sfc 40G
  2021-12-09 12:06   ` Íñigo Huguet
@ 2021-12-23 13:18     ` Íñigo Huguet
  2022-01-02  9:22       ` Martin Habets
  0 siblings, 1 reply; 15+ messages in thread
From: Íñigo Huguet @ 2021-12-23 13:18 UTC (permalink / raw)
  To: habetsm.xilinx; +Cc: Edward Cree, netdev, Dinan Gunawardena

Hi Martin,

I replied to this a few weeks ago, but it seems that, for some reason,
I didn't CC you.

On Thu, Dec 9, 2021 at 1:06 PM Íñigo Huguet <ihuguet@redhat.com> wrote:
>
> Hi,
>
> On Sat, Nov 20, 2021 at 9:31 AM Martin Habets <habetsm.xilinx@gmail.com> wrote:
> > If you're testing without the IOMMU enabled, I suspect the recycle ring
> > size may be too small. Can you try the patch below?
>
> Sorry for the very late reply, but I've been out of the office for many days.
>
> This patch has improved performance a lot, reaching the same 30Gbps
> as in TX. However, it's still a bit erratic, sometimes dropping to
> 15Gbps, especially after module removal & probe, or from one iperf
> run to the next. Since it doesn't happen every time, I haven't found
> a clear pattern. Anyway, it clearly improves things.
>
> Can this patch be applied as-is, or is it just a test?
>
> --
> Íñigo Huguet



-- 
Íñigo Huguet



* Re: Bad performance in RX with sfc 40G
  2021-12-23 13:18     ` Íñigo Huguet
@ 2022-01-02  9:22       ` Martin Habets
  2022-01-10  8:58         ` [PATCH net-next] sfc: The size of the RX recycle ring should be more flexible Martin Habets
  0 siblings, 1 reply; 15+ messages in thread
From: Martin Habets @ 2022-01-02  9:22 UTC (permalink / raw)
  To: Íñigo Huguet; +Cc: Edward Cree, netdev, Dinan Gunawardena

Hi Íñigo,

On Thu, Dec 23, 2021 at 02:18:03PM +0100, Íñigo Huguet wrote:
> Hi Martin,
> 
> I replied to this a few weeks ago, but it seems that, for some reason,
> I didn't CC you.

I'm just getting back to work after my holidays. Happy new year!

> On Thu, Dec 9, 2021 at 1:06 PM Íñigo Huguet <ihuguet@redhat.com> wrote:
> >
> > Hi,
> >
> > On Sat, Nov 20, 2021 at 9:31 AM Martin Habets <habetsm.xilinx@gmail.com> wrote:
> > > If you're testing without the IOMMU enabled, I suspect the recycle ring
> > > size may be too small. Can you try the patch below?
> >
> > Sorry for the very late reply, but I've been out of the office for many days.
> >
> > This patch has improved performance a lot, reaching the same 30Gbps
> > as in TX. However, it's still a bit erratic, sometimes dropping to
> > 15Gbps, especially after module removal & probe, or from one iperf
> > run to the next. Since it doesn't happen every time, I haven't found
> > a clear pattern. Anyway, it clearly improves things.

Thanks for the feedback. After module probe the RX cache is cold (empty),
as pages only get recycled as they come in.
The issue you see between iperf calls could be related to the NUMA
locality of the cache. After the 1st run the cache will contain pages for
the NUMA node that iperf ran on. If a subsequent run executes on a
different NUMA node the pages in the cache are further away.
This is where pinning the iperf runs to cores on the same NUMA node will
help.
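A minimal sketch of that pinning, assuming the NIC sits on NUMA node 0;
"eth0" and "rx-host" are placeholder names, not from the thread:

```shell
# Find the NIC's local NUMA node (-1 means no NUMA affinity).
cat /sys/class/net/eth0/device/numa_node

# Bind iperf3's CPUs and memory allocations to that node (here node 0),
# so the recycle ring's pages stay NUMA-local across runs.
numactl --cpunodebind=0 --membind=0 iperf3 -c rx-host -P 8
```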

> > Can this patch be applied as-is, or is it just a test?

The patch is good for a 40G NIC. But it won't be good enough on a 100G NIC,
and for a 10G NIC the size can be smaller.
I've been puzzling over how best to code this link-speed dependency.
We create this page ring when the interface is brought up, which is
before the driver knows the link speed. So I think it is best to
size it for the maximum speed of a given NIC.
In short, I'll work on a better patch for this.
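The sizing rule described above works out roughly as follows (a sketch of
the arithmetic only; the per-10G baseline matches EFX_RECYCLE_RING_SIZE_10G
from the eventual patch, while the buffers-per-page value is an assumption,
not taken from the driver):

```shell
# Recycle ring entries scale with the NIC's maximum link speed; the page
# ring is then the next power of two of entries / buffers-per-page.
RING_10G=256          # per-10G baseline (cf. EFX_RECYCLE_RING_SIZE_10G)
BUFS_PER_PAGE=2       # assumption: two RX buffers per 4K page

roundup_pow2() {
    n=$1; p=1
    while [ "$p" -lt "$n" ]; do p=$((p * 2)); done
    echo "$p"
}

for speed_mult in 1 4 10; do    # 10G, 40G, 100G NICs
    bufs=$((speed_mult * RING_10G))
    pages=$(roundup_pow2 $((bufs / BUFS_PER_PAGE)))
    echo "$((speed_mult * 10))G: $bufs buffers -> $pages pages"
done
```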

Martin


* [PATCH net-next] sfc: The size of the RX recycle ring should be more flexible
  2022-01-02  9:22       ` Martin Habets
@ 2022-01-10  8:58         ` Martin Habets
  2022-01-10  9:31           ` Íñigo Huguet
                             ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Martin Habets @ 2022-01-10  8:58 UTC (permalink / raw)
  To: Íñigo Huguet, davem, kuba; +Cc: ecree.xilinx, netdev, dinang

Ideally the size would depend on the link speed, but the recycle
ring is created when the interface is brought up before the driver
knows the link speed. So size it for the maximum speed of a given NIC.
PowerPC is only supported on SFN7xxx and SFN8xxx NICs.

With this patch on a 40G NIC the number of calls to alloc_pages and
friends went down from about 18% to under 2%.
On a 10G NIC the number of calls to alloc_pages and friends went down
from about 15% to 0 (perf did not capture any calls during the 60
second test).
On a 100G NIC the number of calls to alloc_pages and friends went down
from about 23% to 4%.

Reported-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
---
 drivers/net/ethernet/sfc/ef10.c       |   31 +++++++++++++++++++++++++++++++
 drivers/net/ethernet/sfc/ef100_nic.c  |    9 +++++++++
 drivers/net/ethernet/sfc/net_driver.h |    2 ++
 drivers/net/ethernet/sfc/nic_common.h |    5 +++++
 drivers/net/ethernet/sfc/rx_common.c  |   18 +-----------------
 drivers/net/ethernet/sfc/rx_common.h  |    6 ++++++
 drivers/net/ethernet/sfc/siena.c      |    8 ++++++++
 7 files changed, 62 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index cf366ed2557c..dc3f95503d9c 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -3990,6 +3990,35 @@ static unsigned int ef10_check_caps(const struct efx_nic *efx,
 	}
 }
 
+static unsigned int efx_ef10_recycle_ring_size(const struct efx_nic *efx)
+{
+	unsigned int ret;
+
+	/* There is no difference between PFs and VFs. The size is based on
+	 * the maximum link speed of a given NIC.
+	 */
+	switch (efx->pci_dev->device & 0xfff) {
+	case 0x0903:	/* Farmingdale can do up to 10G */
+#ifdef CONFIG_PPC64
+		ret = 4 * EFX_RECYCLE_RING_SIZE_10G;
+#else
+		ret = EFX_RECYCLE_RING_SIZE_10G;
+#endif
+		break;
+	case 0x0923:	/* Greenport can do up to 40G */
+	case 0x0a03:	/* Medford can do up to 40G */
+#ifdef CONFIG_PPC64
+		ret = 16 * EFX_RECYCLE_RING_SIZE_10G;
+#else
+		ret = 4 * EFX_RECYCLE_RING_SIZE_10G;
+#endif
+		break;
+	default:	/* Medford2 can do up to 100G */
+		ret = 10 * EFX_RECYCLE_RING_SIZE_10G;
+	}
+	return ret;
+}
+
 #define EF10_OFFLOAD_FEATURES		\
 	(NETIF_F_IP_CSUM |		\
 	 NETIF_F_HW_VLAN_CTAG_FILTER |	\
@@ -4106,6 +4135,7 @@ const struct efx_nic_type efx_hunt_a0_vf_nic_type = {
 	.check_caps = ef10_check_caps,
 	.print_additional_fwver = efx_ef10_print_additional_fwver,
 	.sensor_event = efx_mcdi_sensor_event,
+	.rx_recycle_ring_size = efx_ef10_recycle_ring_size,
 };
 
 const struct efx_nic_type efx_hunt_a0_nic_type = {
@@ -4243,4 +4273,5 @@ const struct efx_nic_type efx_hunt_a0_nic_type = {
 	.check_caps = ef10_check_caps,
 	.print_additional_fwver = efx_ef10_print_additional_fwver,
 	.sensor_event = efx_mcdi_sensor_event,
+	.rx_recycle_ring_size = efx_ef10_recycle_ring_size,
 };
diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index f79b14a119ae..a07cbf45a326 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -23,6 +23,7 @@
 #include "ef100_rx.h"
 #include "ef100_tx.h"
 #include "ef100_netdev.h"
+#include "rx_common.h"
 
 #define EF100_MAX_VIS 4096
 #define EF100_NUM_MCDI_BUFFERS	1
@@ -696,6 +697,12 @@ static unsigned int ef100_check_caps(const struct efx_nic *efx,
 	}
 }
 
+static unsigned int efx_ef100_recycle_ring_size(const struct efx_nic *efx)
+{
+	/* Maximum link speed for Riverhead is 100G */
+	return 10 * EFX_RECYCLE_RING_SIZE_10G;
+}
+
 /*	NIC level access functions
  */
 #define EF100_OFFLOAD_FEATURES	(NETIF_F_HW_CSUM | NETIF_F_RXCSUM |	\
@@ -770,6 +777,7 @@ const struct efx_nic_type ef100_pf_nic_type = {
 	.rx_push_rss_context_config = efx_mcdi_rx_push_rss_context_config,
 	.rx_pull_rss_context_config = efx_mcdi_rx_pull_rss_context_config,
 	.rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
+	.rx_recycle_ring_size = efx_ef100_recycle_ring_size,
 
 	.reconfigure_mac = ef100_reconfigure_mac,
 	.reconfigure_port = efx_mcdi_port_reconfigure,
@@ -849,6 +857,7 @@ const struct efx_nic_type ef100_vf_nic_type = {
 	.rx_pull_rss_config = efx_mcdi_rx_pull_rss_config,
 	.rx_push_rss_config = efx_mcdi_pf_rx_push_rss_config,
 	.rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
+	.rx_recycle_ring_size = efx_ef100_recycle_ring_size,
 
 	.reconfigure_mac = ef100_reconfigure_mac,
 	.test_nvram = efx_new_mcdi_nvram_test_all,
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index cc15ee8812d9..c75dc75e2857 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1282,6 +1282,7 @@ struct efx_udp_tunnel {
  * @udp_tnl_has_port: Check if a port has been added as UDP tunnel
  * @print_additional_fwver: Dump NIC-specific additional FW version info
  * @sensor_event: Handle a sensor event from MCDI
+ * @rx_recycle_ring_size: Size of the RX recycle ring
  * @revision: Hardware architecture revision
  * @txd_ptr_tbl_base: TX descriptor ring base address
  * @rxd_ptr_tbl_base: RX descriptor ring base address
@@ -1460,6 +1461,7 @@ struct efx_nic_type {
 	size_t (*print_additional_fwver)(struct efx_nic *efx, char *buf,
 					 size_t len);
 	void (*sensor_event)(struct efx_nic *efx, efx_qword_t *ev);
+	unsigned int (*rx_recycle_ring_size)(const struct efx_nic *efx);
 
 	int revision;
 	unsigned int txd_ptr_tbl_base;
diff --git a/drivers/net/ethernet/sfc/nic_common.h b/drivers/net/ethernet/sfc/nic_common.h
index b9cafe9cd568..0cef35c0c559 100644
--- a/drivers/net/ethernet/sfc/nic_common.h
+++ b/drivers/net/ethernet/sfc/nic_common.h
@@ -195,6 +195,11 @@ static inline void efx_sensor_event(struct efx_nic *efx, efx_qword_t *ev)
 		efx->type->sensor_event(efx, ev);
 }
 
+static inline unsigned int efx_rx_recycle_ring_size(const struct efx_nic *efx)
+{
+	return efx->type->rx_recycle_ring_size(efx);
+}
+
 /* Some statistics are computed as A - B where A and B each increase
  * linearly with some hardware counter(s) and the counters are read
  * asynchronously.  If the counters contributing to B are always read
diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
index 633ca77a26fd..1b22c7be0088 100644
--- a/drivers/net/ethernet/sfc/rx_common.c
+++ b/drivers/net/ethernet/sfc/rx_common.c
@@ -23,13 +23,6 @@ module_param(rx_refill_threshold, uint, 0444);
 MODULE_PARM_DESC(rx_refill_threshold,
 		 "RX descriptor ring refill threshold (%)");
 
-/* Number of RX buffers to recycle pages for.  When creating the RX page recycle
- * ring, this number is divided by the number of buffers per page to calculate
- * the number of pages to store in the RX page recycle ring.
- */
-#define EFX_RECYCLE_RING_SIZE_IOMMU 4096
-#define EFX_RECYCLE_RING_SIZE_NOIOMMU (2 * EFX_RX_PREFERRED_BATCH)
-
 /* RX maximum head room required.
  *
  * This must be at least 1 to prevent overflow, plus one packet-worth
@@ -141,16 +134,7 @@ static void efx_init_rx_recycle_ring(struct efx_rx_queue *rx_queue)
 	unsigned int bufs_in_recycle_ring, page_ring_size;
 	struct efx_nic *efx = rx_queue->efx;
 
-	/* Set the RX recycle ring size */
-#ifdef CONFIG_PPC64
-	bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_IOMMU;
-#else
-	if (iommu_present(&pci_bus_type))
-		bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_IOMMU;
-	else
-		bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_NOIOMMU;
-#endif /* CONFIG_PPC64 */
-
+	bufs_in_recycle_ring = efx_rx_recycle_ring_size(efx);
 	page_ring_size = roundup_pow_of_two(bufs_in_recycle_ring /
 					    efx->rx_bufs_per_page);
 	rx_queue->page_ring = kcalloc(page_ring_size,
diff --git a/drivers/net/ethernet/sfc/rx_common.h b/drivers/net/ethernet/sfc/rx_common.h
index 207ccd8ba062..fbd2769307f9 100644
--- a/drivers/net/ethernet/sfc/rx_common.h
+++ b/drivers/net/ethernet/sfc/rx_common.h
@@ -18,6 +18,12 @@
 #define EFX_RX_MAX_FRAGS DIV_ROUND_UP(EFX_MAX_FRAME_LEN(EFX_MAX_MTU), \
 				      EFX_RX_USR_BUF_SIZE)
 
+/* Number of RX buffers to recycle pages for.  When creating the RX page recycle
+ * ring, this number is divided by the number of buffers per page to calculate
+ * the number of pages to store in the RX page recycle ring.
+ */
+#define EFX_RECYCLE_RING_SIZE_10G	256
+
 static inline u8 *efx_rx_buf_va(struct efx_rx_buffer *buf)
 {
 	return page_address(buf->page) + buf->page_offset;
diff --git a/drivers/net/ethernet/sfc/siena.c b/drivers/net/ethernet/sfc/siena.c
index 16347a6d0c47..ce3060e15b54 100644
--- a/drivers/net/ethernet/sfc/siena.c
+++ b/drivers/net/ethernet/sfc/siena.c
@@ -25,6 +25,7 @@
 #include "mcdi_port_common.h"
 #include "selftest.h"
 #include "siena_sriov.h"
+#include "rx_common.h"
 
 /* Hardware control for SFC9000 family including SFL9021 (aka Siena). */
 
@@ -958,6 +959,12 @@ static unsigned int siena_check_caps(const struct efx_nic *efx,
 	return 0;
 }
 
+static unsigned int efx_siena_recycle_ring_size(const struct efx_nic *efx)
+{
+	/* Maximum link speed is 10G */
+	return EFX_RECYCLE_RING_SIZE_10G;
+}
+
 /**************************************************************************
  *
  * Revision-dependent attributes used by efx.c and nic.c
@@ -1098,4 +1105,5 @@ const struct efx_nic_type siena_a0_nic_type = {
 	.rx_hash_key_size = 16,
 	.check_caps = siena_check_caps,
 	.sensor_event = efx_mcdi_sensor_event,
+	.rx_recycle_ring_size = efx_siena_recycle_ring_size,
 };


* Re: [PATCH net-next] sfc: The size of the RX recycle ring should be more flexible
  2022-01-10  8:58         ` [PATCH net-next] sfc: The size of the RX recycle ring should be more flexible Martin Habets
@ 2022-01-10  9:31           ` Íñigo Huguet
  2022-01-12  9:05             ` Martin Habets
  2022-01-31 11:08             ` Martin Habets
  2022-01-10 17:22           ` Jakub Kicinski
  2022-01-31 11:10           ` [PATCH V2 " Martin Habets
  2 siblings, 2 replies; 15+ messages in thread
From: Íñigo Huguet @ 2022-01-10  9:31 UTC (permalink / raw)
  To: Íñigo Huguet, David S. Miller, Jakub Kicinski,
	Edward Cree, netdev, Dinan Gunawardena

Hi Martin, thanks for the quick fix.

On Mon, Jan 10, 2022 at 9:58 AM Martin Habets <habetsm.xilinx@gmail.com> wrote:
>
> Ideally the size would depend on the link speed, but the recycle
> ring is created when the interface is brought up before the driver
> knows the link speed. So size it for the maximum speed of a given NIC.
> PowerPC is only supported on SFN7xxx and SFN8xxx NICs.
>
> With this patch on a 40G NIC the number of calls to alloc_pages and
> friends went down from about 18% to under 2%.
> On a 10G NIC the number of calls to alloc_pages and friends went down
> from about 15% to 0 (perf did not capture any calls during the 60
> second test).
> On a 100G NIC the number of calls to alloc_pages and friends went down
> from about 23% to 4%.

Although the problem seemed to occur mainly with the IOMMU absent,
this patch also changes the ring size for the IOMMU-present case,
using the same size for both. Have you checked that performance is
not reduced in that case?

> Reported-by: Íñigo Huguet <ihuguet@redhat.com>
> Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
> ---
>  drivers/net/ethernet/sfc/ef10.c       |   31 +++++++++++++++++++++++++++++++
>  drivers/net/ethernet/sfc/ef100_nic.c  |    9 +++++++++
>  drivers/net/ethernet/sfc/net_driver.h |    2 ++
>  drivers/net/ethernet/sfc/nic_common.h |    5 +++++
>  drivers/net/ethernet/sfc/rx_common.c  |   18 +-----------------
>  drivers/net/ethernet/sfc/rx_common.h  |    6 ++++++
>  drivers/net/ethernet/sfc/siena.c      |    8 ++++++++
>  7 files changed, 62 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
> index cf366ed2557c..dc3f95503d9c 100644
> --- a/drivers/net/ethernet/sfc/ef10.c
> +++ b/drivers/net/ethernet/sfc/ef10.c
> @@ -3990,6 +3990,35 @@ static unsigned int ef10_check_caps(const struct efx_nic *efx,
>         }
>  }
>
> +static unsigned int efx_ef10_recycle_ring_size(const struct efx_nic *efx)
> +{
> +       unsigned int ret;
> +
> +       /* There is no difference between PFs and VFs. The size is based on
> +        * the maximum link speed of a given NIC.
> +        */
> +       switch (efx->pci_dev->device & 0xfff) {
> +       case 0x0903:    /* Farmingdale can do up to 10G */
> +#ifdef CONFIG_PPC64
> +               ret = 4 * EFX_RECYCLE_RING_SIZE_10G;
> +#else
> +               ret = EFX_RECYCLE_RING_SIZE_10G;
> +#endif
> +               break;
> +       case 0x0923:    /* Greenport can do up to 40G */
> +       case 0x0a03:    /* Medford can do up to 40G */
> +#ifdef CONFIG_PPC64
> +               ret = 16 * EFX_RECYCLE_RING_SIZE_10G;
> +#else
> +               ret = 4 * EFX_RECYCLE_RING_SIZE_10G;
> +#endif
> +               break;
> +       default:        /* Medford2 can do up to 100G */
> +               ret = 10 * EFX_RECYCLE_RING_SIZE_10G;
> +       }
> +       return ret;
> +}
> +
>  #define EF10_OFFLOAD_FEATURES          \
>         (NETIF_F_IP_CSUM |              \
>          NETIF_F_HW_VLAN_CTAG_FILTER |  \
> @@ -4106,6 +4135,7 @@ const struct efx_nic_type efx_hunt_a0_vf_nic_type = {
>         .check_caps = ef10_check_caps,
>         .print_additional_fwver = efx_ef10_print_additional_fwver,
>         .sensor_event = efx_mcdi_sensor_event,
> +       .rx_recycle_ring_size = efx_ef10_recycle_ring_size,
>  };
>
>  const struct efx_nic_type efx_hunt_a0_nic_type = {
> @@ -4243,4 +4273,5 @@ const struct efx_nic_type efx_hunt_a0_nic_type = {
>         .check_caps = ef10_check_caps,
>         .print_additional_fwver = efx_ef10_print_additional_fwver,
>         .sensor_event = efx_mcdi_sensor_event,
> +       .rx_recycle_ring_size = efx_ef10_recycle_ring_size,
>  };
> diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
> index f79b14a119ae..a07cbf45a326 100644
> --- a/drivers/net/ethernet/sfc/ef100_nic.c
> +++ b/drivers/net/ethernet/sfc/ef100_nic.c
> @@ -23,6 +23,7 @@
>  #include "ef100_rx.h"
>  #include "ef100_tx.h"
>  #include "ef100_netdev.h"
> +#include "rx_common.h"
>
>  #define EF100_MAX_VIS 4096
>  #define EF100_NUM_MCDI_BUFFERS 1
> @@ -696,6 +697,12 @@ static unsigned int ef100_check_caps(const struct efx_nic *efx,
>         }
>  }
>
> +static unsigned int efx_ef100_recycle_ring_size(const struct efx_nic *efx)
> +{
> +       /* Maximum link speed for Riverhead is 100G */
> +       return 10 * EFX_RECYCLE_RING_SIZE_10G;
> +}
> +
>  /*     NIC level access functions
>   */
>  #define EF100_OFFLOAD_FEATURES (NETIF_F_HW_CSUM | NETIF_F_RXCSUM |     \
> @@ -770,6 +777,7 @@ const struct efx_nic_type ef100_pf_nic_type = {
>         .rx_push_rss_context_config = efx_mcdi_rx_push_rss_context_config,
>         .rx_pull_rss_context_config = efx_mcdi_rx_pull_rss_context_config,
>         .rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
> +       .rx_recycle_ring_size = efx_ef100_recycle_ring_size,
>
>         .reconfigure_mac = ef100_reconfigure_mac,
>         .reconfigure_port = efx_mcdi_port_reconfigure,
> @@ -849,6 +857,7 @@ const struct efx_nic_type ef100_vf_nic_type = {
>         .rx_pull_rss_config = efx_mcdi_rx_pull_rss_config,
>         .rx_push_rss_config = efx_mcdi_pf_rx_push_rss_config,
>         .rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
> +       .rx_recycle_ring_size = efx_ef100_recycle_ring_size,
>
>         .reconfigure_mac = ef100_reconfigure_mac,
>         .test_nvram = efx_new_mcdi_nvram_test_all,
> diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
> index cc15ee8812d9..c75dc75e2857 100644
> --- a/drivers/net/ethernet/sfc/net_driver.h
> +++ b/drivers/net/ethernet/sfc/net_driver.h
> @@ -1282,6 +1282,7 @@ struct efx_udp_tunnel {
>   * @udp_tnl_has_port: Check if a port has been added as UDP tunnel
>   * @print_additional_fwver: Dump NIC-specific additional FW version info
>   * @sensor_event: Handle a sensor event from MCDI
> + * @rx_recycle_ring_size: Size of the RX recycle ring
>   * @revision: Hardware architecture revision
>   * @txd_ptr_tbl_base: TX descriptor ring base address
>   * @rxd_ptr_tbl_base: RX descriptor ring base address
> @@ -1460,6 +1461,7 @@ struct efx_nic_type {
>         size_t (*print_additional_fwver)(struct efx_nic *efx, char *buf,
>                                          size_t len);
>         void (*sensor_event)(struct efx_nic *efx, efx_qword_t *ev);
> +       unsigned int (*rx_recycle_ring_size)(const struct efx_nic *efx);
>
>         int revision;
>         unsigned int txd_ptr_tbl_base;
> diff --git a/drivers/net/ethernet/sfc/nic_common.h b/drivers/net/ethernet/sfc/nic_common.h
> index b9cafe9cd568..0cef35c0c559 100644
> --- a/drivers/net/ethernet/sfc/nic_common.h
> +++ b/drivers/net/ethernet/sfc/nic_common.h
> @@ -195,6 +195,11 @@ static inline void efx_sensor_event(struct efx_nic *efx, efx_qword_t *ev)
>                 efx->type->sensor_event(efx, ev);
>  }
>
> +static inline unsigned int efx_rx_recycle_ring_size(const struct efx_nic *efx)
> +{
> +       return efx->type->rx_recycle_ring_size(efx);
> +}
> +
>  /* Some statistics are computed as A - B where A and B each increase
>   * linearly with some hardware counter(s) and the counters are read
>   * asynchronously.  If the counters contributing to B are always read
> diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
> index 633ca77a26fd..1b22c7be0088 100644
> --- a/drivers/net/ethernet/sfc/rx_common.c
> +++ b/drivers/net/ethernet/sfc/rx_common.c
> @@ -23,13 +23,6 @@ module_param(rx_refill_threshold, uint, 0444);
>  MODULE_PARM_DESC(rx_refill_threshold,
>                  "RX descriptor ring refill threshold (%)");
>
> -/* Number of RX buffers to recycle pages for.  When creating the RX page recycle
> - * ring, this number is divided by the number of buffers per page to calculate
> - * the number of pages to store in the RX page recycle ring.
> - */
> -#define EFX_RECYCLE_RING_SIZE_IOMMU 4096
> -#define EFX_RECYCLE_RING_SIZE_NOIOMMU (2 * EFX_RX_PREFERRED_BATCH)
> -
>  /* RX maximum head room required.
>   *
>   * This must be at least 1 to prevent overflow, plus one packet-worth
> @@ -141,16 +134,7 @@ static void efx_init_rx_recycle_ring(struct efx_rx_queue *rx_queue)
>         unsigned int bufs_in_recycle_ring, page_ring_size;
>         struct efx_nic *efx = rx_queue->efx;
>
> -       /* Set the RX recycle ring size */
> -#ifdef CONFIG_PPC64
> -       bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_IOMMU;
> -#else
> -       if (iommu_present(&pci_bus_type))
> -               bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_IOMMU;
> -       else
> -               bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_NOIOMMU;
> -#endif /* CONFIG_PPC64 */
> -
> +       bufs_in_recycle_ring = efx_rx_recycle_ring_size(efx);
>         page_ring_size = roundup_pow_of_two(bufs_in_recycle_ring /
>                                             efx->rx_bufs_per_page);
>         rx_queue->page_ring = kcalloc(page_ring_size,
> diff --git a/drivers/net/ethernet/sfc/rx_common.h b/drivers/net/ethernet/sfc/rx_common.h
> index 207ccd8ba062..fbd2769307f9 100644
> --- a/drivers/net/ethernet/sfc/rx_common.h
> +++ b/drivers/net/ethernet/sfc/rx_common.h
> @@ -18,6 +18,12 @@
>  #define EFX_RX_MAX_FRAGS DIV_ROUND_UP(EFX_MAX_FRAME_LEN(EFX_MAX_MTU), \
>                                       EFX_RX_USR_BUF_SIZE)
>
> +/* Number of RX buffers to recycle pages for.  When creating the RX page recycle
> + * ring, this number is divided by the number of buffers per page to calculate
> + * the number of pages to store in the RX page recycle ring.
> + */
> +#define EFX_RECYCLE_RING_SIZE_10G      256
> +
>  static inline u8 *efx_rx_buf_va(struct efx_rx_buffer *buf)
>  {
>         return page_address(buf->page) + buf->page_offset;
> diff --git a/drivers/net/ethernet/sfc/siena.c b/drivers/net/ethernet/sfc/siena.c
> index 16347a6d0c47..ce3060e15b54 100644
> --- a/drivers/net/ethernet/sfc/siena.c
> +++ b/drivers/net/ethernet/sfc/siena.c
> @@ -25,6 +25,7 @@
>  #include "mcdi_port_common.h"
>  #include "selftest.h"
>  #include "siena_sriov.h"
> +#include "rx_common.h"
>
>  /* Hardware control for SFC9000 family including SFL9021 (aka Siena). */
>
> @@ -958,6 +959,12 @@ static unsigned int siena_check_caps(const struct efx_nic *efx,
>         return 0;
>  }
>
> +static unsigned int efx_siena_recycle_ring_size(const struct efx_nic *efx)
> +{
> +       /* Maximum link speed is 10G */
> +       return EFX_RECYCLE_RING_SIZE_10G;
> +}
> +
>  /**************************************************************************
>   *
>   * Revision-dependent attributes used by efx.c and nic.c
> @@ -1098,4 +1105,5 @@ const struct efx_nic_type siena_a0_nic_type = {
>         .rx_hash_key_size = 16,
>         .check_caps = siena_check_caps,
>         .sensor_event = efx_mcdi_sensor_event,
> +       .rx_recycle_ring_size = efx_siena_recycle_ring_size,
>  };
>


-- 
Íñigo Huguet


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH net-next] sfc: The size of the RX recycle ring should be more flexible
  2022-01-10  8:58         ` [PATCH net-next] sfc: The size of the RX recycle ring should be more flexible Martin Habets
  2022-01-10  9:31           ` Íñigo Huguet
@ 2022-01-10 17:22           ` Jakub Kicinski
  2022-01-12  9:08             ` Martin Habets
  2022-01-31 11:10           ` [PATCH V2 " Martin Habets
  2 siblings, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2022-01-10 17:22 UTC (permalink / raw)
  To: Martin Habets; +Cc: Íñigo Huguet, davem, ecree.xilinx, netdev, dinang

On Mon, 10 Jan 2022 08:58:21 +0000 Martin Habets wrote:
> +static unsigned int efx_ef10_recycle_ring_size(const struct efx_nic *efx)
> +{
> +	unsigned int ret;
> +
> +	/* There is no difference between PFs and VFs. The size is based on
> +	 * the maximum link speed of a given NIC.
> +	 */
> +	switch (efx->pci_dev->device & 0xfff) {
> +	case 0x0903:	/* Farmingdale can do up to 10G */
> +#ifdef CONFIG_PPC64
> +		ret = 4 * EFX_RECYCLE_RING_SIZE_10G;
> +#else
> +		ret = EFX_RECYCLE_RING_SIZE_10G;
> +#endif
> +		break;
> +	case 0x0923:	/* Greenport can do up to 40G */
> +	case 0x0a03:	/* Medford can do up to 40G */
> +#ifdef CONFIG_PPC64
> +		ret = 16 * EFX_RECYCLE_RING_SIZE_10G;
> +#else
> +		ret = 4 * EFX_RECYCLE_RING_SIZE_10G;
> +#endif
> +		break;
> +	default:	/* Medford2 can do up to 100G */
> +		ret = 10 * EFX_RECYCLE_RING_SIZE_10G;
> +	}
> +	return ret;
> +}

Why not factor out the 4x scaling for powerpc outside of the switch?

The callback could return the scaling factor but failing that:

static unsigned int efx_ef10_recycle_ring_size(const struct efx_nic *efx)
{
	unsigned int ret = EFX_RECYCLE_RING_SIZE_10G;

	/* There is no difference between PFs and VFs. The size is based on
	 * the maximum link speed of a given NIC.
	 */
	switch (efx->pci_dev->device & 0xfff) {
	case 0x0903:	/* Farmingdale can do up to 10G */
		break;
	case 0x0923:	/* Greenport can do up to 40G */
	case 0x0a03:	/* Medford can do up to 40G */
		ret *= 4;
		break;
	default:	/* Medford2 can do up to 100G */
		ret *= 10;
	}

	if (IS_ENABLED(CONFIG_PPC64))
		ret *= 4;

	return ret;
}

Other than that - net-next is closed, please switch to RFC postings
until it opens back up once 5.17-rc1 is cut. Thanks!


* Re: [PATCH net-next] sfc: The size of the RX recycle ring should be more flexible
  2022-01-10  9:31           ` Íñigo Huguet
@ 2022-01-12  9:05             ` Martin Habets
  2022-01-31 11:08             ` Martin Habets
  1 sibling, 0 replies; 15+ messages in thread
From: Martin Habets @ 2022-01-12  9:05 UTC (permalink / raw)
  To: Íñigo Huguet
  Cc: David S. Miller, Jakub Kicinski, Edward Cree, netdev, Dinan Gunawardena

Hi Íñigo,

On Mon, Jan 10, 2022 at 10:31:57AM +0100, Íñigo Huguet wrote:
> Hi Martin, thanks for the quick fix.
> 
> On Mon, Jan 10, 2022 at 9:58 AM Martin Habets <habetsm.xilinx@gmail.com> wrote:
> >
> > Ideally the size would depend on the link speed, but the recycle
> > ring is created when the interface is brought up before the driver
> > knows the link speed. So size it for the maximum speed of a given NIC.
> > PowerPC is only supported on SFN7xxx and SFN8xxx NICs.
> >
> > With this patch on a 40G NIC the number of calls to alloc_pages and
> > friends went down from about 18% to under 2%.
> > On a 10G NIC the number of calls to alloc_pages and friends went down
> > from about 15% to 0 (perf did not capture any calls during the 60
> > second test).
> > On a 100G NIC the number of calls to alloc_pages and friends went down
> > from about 23% to 4%.
> 
> Although the problem seemed to be mainly in the case of IOMMU not
> present, this patch also changes the ring size for the IOMMU present
> case, using the same size for both. Have you checked that performance
> is not reduced in the second case?

I have not checked that yet; it will take longer. I expect the performance
will be similar to the numbers without the IOMMU enabled.

Martin

> > Reported-by: Íñigo Huguet <ihuguet@redhat.com>
> > Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
> > ---
> >  drivers/net/ethernet/sfc/ef10.c       |   31 +++++++++++++++++++++++++++++++
> >  drivers/net/ethernet/sfc/ef100_nic.c  |    9 +++++++++
> >  drivers/net/ethernet/sfc/net_driver.h |    2 ++
> >  drivers/net/ethernet/sfc/nic_common.h |    5 +++++
> >  drivers/net/ethernet/sfc/rx_common.c  |   18 +-----------------
> >  drivers/net/ethernet/sfc/rx_common.h  |    6 ++++++
> >  drivers/net/ethernet/sfc/siena.c      |    8 ++++++++
> >  7 files changed, 62 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
> > index cf366ed2557c..dc3f95503d9c 100644
> > --- a/drivers/net/ethernet/sfc/ef10.c
> > +++ b/drivers/net/ethernet/sfc/ef10.c
> > @@ -3990,6 +3990,35 @@ static unsigned int ef10_check_caps(const struct efx_nic *efx,
> >         }
> >  }
> >
> > +static unsigned int efx_ef10_recycle_ring_size(const struct efx_nic *efx)
> > +{
> > +       unsigned int ret;
> > +
> > +       /* There is no difference between PFs and VFs. The size is based on
> > +        * the maximum link speed of a given NIC.
> > +        */
> > +       switch (efx->pci_dev->device & 0xfff) {
> > +       case 0x0903:    /* Farmingdale can do up to 10G */
> > +#ifdef CONFIG_PPC64
> > +               ret = 4 * EFX_RECYCLE_RING_SIZE_10G;
> > +#else
> > +               ret = EFX_RECYCLE_RING_SIZE_10G;
> > +#endif
> > +               break;
> > +       case 0x0923:    /* Greenport can do up to 40G */
> > +       case 0x0a03:    /* Medford can do up to 40G */
> > +#ifdef CONFIG_PPC64
> > +               ret = 16 * EFX_RECYCLE_RING_SIZE_10G;
> > +#else
> > +               ret = 4 * EFX_RECYCLE_RING_SIZE_10G;
> > +#endif
> > +               break;
> > +       default:        /* Medford2 can do up to 100G */
> > +               ret = 10 * EFX_RECYCLE_RING_SIZE_10G;
> > +       }
> > +       return ret;
> > +}
> > +
> >  #define EF10_OFFLOAD_FEATURES          \
> >         (NETIF_F_IP_CSUM |              \
> >          NETIF_F_HW_VLAN_CTAG_FILTER |  \
> > @@ -4106,6 +4135,7 @@ const struct efx_nic_type efx_hunt_a0_vf_nic_type = {
> >         .check_caps = ef10_check_caps,
> >         .print_additional_fwver = efx_ef10_print_additional_fwver,
> >         .sensor_event = efx_mcdi_sensor_event,
> > +       .rx_recycle_ring_size = efx_ef10_recycle_ring_size,
> >  };
> >
> >  const struct efx_nic_type efx_hunt_a0_nic_type = {
> > @@ -4243,4 +4273,5 @@ const struct efx_nic_type efx_hunt_a0_nic_type = {
> >         .check_caps = ef10_check_caps,
> >         .print_additional_fwver = efx_ef10_print_additional_fwver,
> >         .sensor_event = efx_mcdi_sensor_event,
> > +       .rx_recycle_ring_size = efx_ef10_recycle_ring_size,
> >  };
> > diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
> > index f79b14a119ae..a07cbf45a326 100644
> > --- a/drivers/net/ethernet/sfc/ef100_nic.c
> > +++ b/drivers/net/ethernet/sfc/ef100_nic.c
> > @@ -23,6 +23,7 @@
> >  #include "ef100_rx.h"
> >  #include "ef100_tx.h"
> >  #include "ef100_netdev.h"
> > +#include "rx_common.h"
> >
> >  #define EF100_MAX_VIS 4096
> >  #define EF100_NUM_MCDI_BUFFERS 1
> > @@ -696,6 +697,12 @@ static unsigned int ef100_check_caps(const struct efx_nic *efx,
> >         }
> >  }
> >
> > +static unsigned int efx_ef100_recycle_ring_size(const struct efx_nic *efx)
> > +{
> > +       /* Maximum link speed for Riverhead is 100G */
> > +       return 10 * EFX_RECYCLE_RING_SIZE_10G;
> > +}
> > +
> >  /*     NIC level access functions
> >   */
> >  #define EF100_OFFLOAD_FEATURES (NETIF_F_HW_CSUM | NETIF_F_RXCSUM |     \
> > @@ -770,6 +777,7 @@ const struct efx_nic_type ef100_pf_nic_type = {
> >         .rx_push_rss_context_config = efx_mcdi_rx_push_rss_context_config,
> >         .rx_pull_rss_context_config = efx_mcdi_rx_pull_rss_context_config,
> >         .rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
> > +       .rx_recycle_ring_size = efx_ef100_recycle_ring_size,
> >
> >         .reconfigure_mac = ef100_reconfigure_mac,
> >         .reconfigure_port = efx_mcdi_port_reconfigure,
> > @@ -849,6 +857,7 @@ const struct efx_nic_type ef100_vf_nic_type = {
> >         .rx_pull_rss_config = efx_mcdi_rx_pull_rss_config,
> >         .rx_push_rss_config = efx_mcdi_pf_rx_push_rss_config,
> >         .rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
> > +       .rx_recycle_ring_size = efx_ef100_recycle_ring_size,
> >
> >         .reconfigure_mac = ef100_reconfigure_mac,
> >         .test_nvram = efx_new_mcdi_nvram_test_all,
> > diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
> > index cc15ee8812d9..c75dc75e2857 100644
> > --- a/drivers/net/ethernet/sfc/net_driver.h
> > +++ b/drivers/net/ethernet/sfc/net_driver.h
> > @@ -1282,6 +1282,7 @@ struct efx_udp_tunnel {
> >   * @udp_tnl_has_port: Check if a port has been added as UDP tunnel
> >   * @print_additional_fwver: Dump NIC-specific additional FW version info
> >   * @sensor_event: Handle a sensor event from MCDI
> > + * @rx_recycle_ring_size: Size of the RX recycle ring
> >   * @revision: Hardware architecture revision
> >   * @txd_ptr_tbl_base: TX descriptor ring base address
> >   * @rxd_ptr_tbl_base: RX descriptor ring base address
> > @@ -1460,6 +1461,7 @@ struct efx_nic_type {
> >         size_t (*print_additional_fwver)(struct efx_nic *efx, char *buf,
> >                                          size_t len);
> >         void (*sensor_event)(struct efx_nic *efx, efx_qword_t *ev);
> > +       unsigned int (*rx_recycle_ring_size)(const struct efx_nic *efx);
> >
> >         int revision;
> >         unsigned int txd_ptr_tbl_base;
> > diff --git a/drivers/net/ethernet/sfc/nic_common.h b/drivers/net/ethernet/sfc/nic_common.h
> > index b9cafe9cd568..0cef35c0c559 100644
> > --- a/drivers/net/ethernet/sfc/nic_common.h
> > +++ b/drivers/net/ethernet/sfc/nic_common.h
> > @@ -195,6 +195,11 @@ static inline void efx_sensor_event(struct efx_nic *efx, efx_qword_t *ev)
> >                 efx->type->sensor_event(efx, ev);
> >  }
> >
> > +static inline unsigned int efx_rx_recycle_ring_size(const struct efx_nic *efx)
> > +{
> > +       return efx->type->rx_recycle_ring_size(efx);
> > +}
> > +
> >  /* Some statistics are computed as A - B where A and B each increase
> >   * linearly with some hardware counter(s) and the counters are read
> >   * asynchronously.  If the counters contributing to B are always read
> > diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
> > index 633ca77a26fd..1b22c7be0088 100644
> > --- a/drivers/net/ethernet/sfc/rx_common.c
> > +++ b/drivers/net/ethernet/sfc/rx_common.c
> > @@ -23,13 +23,6 @@ module_param(rx_refill_threshold, uint, 0444);
> >  MODULE_PARM_DESC(rx_refill_threshold,
> >                  "RX descriptor ring refill threshold (%)");
> >
> > -/* Number of RX buffers to recycle pages for.  When creating the RX page recycle
> > - * ring, this number is divided by the number of buffers per page to calculate
> > - * the number of pages to store in the RX page recycle ring.
> > - */
> > -#define EFX_RECYCLE_RING_SIZE_IOMMU 4096
> > -#define EFX_RECYCLE_RING_SIZE_NOIOMMU (2 * EFX_RX_PREFERRED_BATCH)
> > -
> >  /* RX maximum head room required.
> >   *
> >   * This must be at least 1 to prevent overflow, plus one packet-worth
> > @@ -141,16 +134,7 @@ static void efx_init_rx_recycle_ring(struct efx_rx_queue *rx_queue)
> >         unsigned int bufs_in_recycle_ring, page_ring_size;
> >         struct efx_nic *efx = rx_queue->efx;
> >
> > -       /* Set the RX recycle ring size */
> > -#ifdef CONFIG_PPC64
> > -       bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_IOMMU;
> > -#else
> > -       if (iommu_present(&pci_bus_type))
> > -               bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_IOMMU;
> > -       else
> > -               bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_NOIOMMU;
> > -#endif /* CONFIG_PPC64 */
> > -
> > +       bufs_in_recycle_ring = efx_rx_recycle_ring_size(efx);
> >         page_ring_size = roundup_pow_of_two(bufs_in_recycle_ring /
> >                                             efx->rx_bufs_per_page);
> >         rx_queue->page_ring = kcalloc(page_ring_size,
> > diff --git a/drivers/net/ethernet/sfc/rx_common.h b/drivers/net/ethernet/sfc/rx_common.h
> > index 207ccd8ba062..fbd2769307f9 100644
> > --- a/drivers/net/ethernet/sfc/rx_common.h
> > +++ b/drivers/net/ethernet/sfc/rx_common.h
> > @@ -18,6 +18,12 @@
> >  #define EFX_RX_MAX_FRAGS DIV_ROUND_UP(EFX_MAX_FRAME_LEN(EFX_MAX_MTU), \
> >                                       EFX_RX_USR_BUF_SIZE)
> >
> > +/* Number of RX buffers to recycle pages for.  When creating the RX page recycle
> > + * ring, this number is divided by the number of buffers per page to calculate
> > + * the number of pages to store in the RX page recycle ring.
> > + */
> > +#define EFX_RECYCLE_RING_SIZE_10G      256
> > +
> >  static inline u8 *efx_rx_buf_va(struct efx_rx_buffer *buf)
> >  {
> >         return page_address(buf->page) + buf->page_offset;
> > diff --git a/drivers/net/ethernet/sfc/siena.c b/drivers/net/ethernet/sfc/siena.c
> > index 16347a6d0c47..ce3060e15b54 100644
> > --- a/drivers/net/ethernet/sfc/siena.c
> > +++ b/drivers/net/ethernet/sfc/siena.c
> > @@ -25,6 +25,7 @@
> >  #include "mcdi_port_common.h"
> >  #include "selftest.h"
> >  #include "siena_sriov.h"
> > +#include "rx_common.h"
> >
> >  /* Hardware control for SFC9000 family including SFL9021 (aka Siena). */
> >
> > @@ -958,6 +959,12 @@ static unsigned int siena_check_caps(const struct efx_nic *efx,
> >         return 0;
> >  }
> >
> > +static unsigned int efx_siena_recycle_ring_size(const struct efx_nic *efx)
> > +{
> > +       /* Maximum link speed is 10G */
> > +       return EFX_RECYCLE_RING_SIZE_10G;
> > +}
> > +
> >  /**************************************************************************
> >   *
> >   * Revision-dependent attributes used by efx.c and nic.c
> > @@ -1098,4 +1105,5 @@ const struct efx_nic_type siena_a0_nic_type = {
> >         .rx_hash_key_size = 16,
> >         .check_caps = siena_check_caps,
> >         .sensor_event = efx_mcdi_sensor_event,
> > +       .rx_recycle_ring_size = efx_siena_recycle_ring_size,
> >  };
> >
> 
> 
> -- 
> Íñigo Huguet


* Re: [PATCH net-next] sfc: The size of the RX recycle ring should be more flexible
  2022-01-10 17:22           ` Jakub Kicinski
@ 2022-01-12  9:08             ` Martin Habets
  0 siblings, 0 replies; 15+ messages in thread
From: Martin Habets @ 2022-01-12  9:08 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Íñigo Huguet, davem, ecree.xilinx, netdev, dinang

On Mon, Jan 10, 2022 at 09:22:24AM -0800, Jakub Kicinski wrote:
> On Mon, 10 Jan 2022 08:58:21 +0000 Martin Habets wrote:
> > +static unsigned int efx_ef10_recycle_ring_size(const struct efx_nic *efx)
> > +{
> > +	unsigned int ret;
> > +
> > +	/* There is no difference between PFs and VFs. The size is based on
> > +	 * the maximum link speed of a given NIC.
> > +	 */
> > +	switch (efx->pci_dev->device & 0xfff) {
> > +	case 0x0903:	/* Farmingdale can do up to 10G */
> > +#ifdef CONFIG_PPC64
> > +		ret = 4 * EFX_RECYCLE_RING_SIZE_10G;
> > +#else
> > +		ret = EFX_RECYCLE_RING_SIZE_10G;
> > +#endif
> > +		break;
> > +	case 0x0923:	/* Greenport can do up to 40G */
> > +	case 0x0a03:	/* Medford can do up to 40G */
> > +#ifdef CONFIG_PPC64
> > +		ret = 16 * EFX_RECYCLE_RING_SIZE_10G;
> > +#else
> > +		ret = 4 * EFX_RECYCLE_RING_SIZE_10G;
> > +#endif
> > +		break;
> > +	default:	/* Medford2 can do up to 100G */
> > +		ret = 10 * EFX_RECYCLE_RING_SIZE_10G;
> > +	}
> > +	return ret;
> > +}
> 
> Why not factor out the 4x scaling for powerpc outside of the switch?
> 
> The callback could return the scaling factor but failing that:
> 
> static unsigned int efx_ef10_recycle_ring_size(const struct efx_nic *efx)
> {
> 	unsigned int ret = EFX_RECYCLE_RING_SIZE_10G;
> 
> 	/* There is no difference between PFs and VFs. The size is based on
> 	 * the maximum link speed of a given NIC.
> 	 */
> 	switch (efx->pci_dev->device & 0xfff) {
> 	case 0x0903:	/* Farmingdale can do up to 10G */
> 		break;
> 	case 0x0923:	/* Greenport can do up to 40G */
> 	case 0x0a03:	/* Medford can do up to 40G */
> 		ret *= 4;
> 		break;
> 	default:	/* Medford2 can do up to 100G */
> 		ret *= 10;
> 	}
> 
> 	if (IS_ENABLED(CONFIG_PPC64))
> 		ret *= 4;
> 
> 	return ret;
> }

Thanks, will do.

> Other than that - net-next is closed, please switch to RFC postings
> until it opens back up once 5.17-rc1 is cut. Thanks!

I knew it had to be near closing; I even checked the weblink. ;)
Will repost when net-next is open again.

Martin


* Re: [PATCH net-next] sfc: The size of the RX recycle ring should be more flexible
  2022-01-10  9:31           ` Íñigo Huguet
  2022-01-12  9:05             ` Martin Habets
@ 2022-01-31 11:08             ` Martin Habets
  1 sibling, 0 replies; 15+ messages in thread
From: Martin Habets @ 2022-01-31 11:08 UTC (permalink / raw)
  To: Íñigo Huguet
  Cc: David S. Miller, Jakub Kicinski, Edward Cree, netdev, Dinan Gunawardena

[-- Attachment #1: Type: text/plain, Size: 11596 bytes --]

On Mon, Jan 10, 2022 at 10:31:57AM +0100, Íñigo Huguet wrote:
> Hi Martin, thanks for the quick fix.
> 
> On Mon, Jan 10, 2022 at 9:58 AM Martin Habets <habetsm.xilinx@gmail.com> wrote:
> >
> > Ideally the size would depend on the link speed, but the recycle
> > ring is created when the interface is brought up before the driver
> > knows the link speed. So size it for the maximum speed of a given NIC.
> > PowerPC is only supported on SFN7xxx and SFN8xxx NICs.
> >
> > With this patch on a 40G NIC the number of calls to alloc_pages and
> > friends went down from about 18% to under 2%.
> > On a 10G NIC the number of calls to alloc_pages and friends went down
> > from about 15% to 0 (perf did not capture any calls during the 60
> > second test).
> > On a 100G NIC the number of calls to alloc_pages and friends went down
> > from about 23% to 4%.
> 
> Although the problem seemed to be mainly in the case of IOMMU not
> present, this patch also changes the ring size for the IOMMU present
> case, using the same size for both. Have you checked that performance
> is not reduced in the second case?

With the IOMMU enabled, calls to __alloc_pages are at a whopping 0.01%.
No change in performance was measured.

Martin

> > Reported-by: Íñigo Huguet <ihuguet@redhat.com>
> > Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
> > ---
> >  drivers/net/ethernet/sfc/ef10.c       |   31 +++++++++++++++++++++++++++++++
> >  drivers/net/ethernet/sfc/ef100_nic.c  |    9 +++++++++
> >  drivers/net/ethernet/sfc/net_driver.h |    2 ++
> >  drivers/net/ethernet/sfc/nic_common.h |    5 +++++
> >  drivers/net/ethernet/sfc/rx_common.c  |   18 +-----------------
> >  drivers/net/ethernet/sfc/rx_common.h  |    6 ++++++
> >  drivers/net/ethernet/sfc/siena.c      |    8 ++++++++
> >  7 files changed, 62 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
> > index cf366ed2557c..dc3f95503d9c 100644
> > --- a/drivers/net/ethernet/sfc/ef10.c
> > +++ b/drivers/net/ethernet/sfc/ef10.c
> > @@ -3990,6 +3990,35 @@ static unsigned int ef10_check_caps(const struct efx_nic *efx,
> >         }
> >  }
> >
> > +static unsigned int efx_ef10_recycle_ring_size(const struct efx_nic *efx)
> > +{
> > +       unsigned int ret;
> > +
> > +       /* There is no difference between PFs and VFs. The size is based on
> > +        * the maximum link speed of a given NIC.
> > +        */
> > +       switch (efx->pci_dev->device & 0xfff) {
> > +       case 0x0903:    /* Farmingdale can do up to 10G */
> > +#ifdef CONFIG_PPC64
> > +               ret = 4 * EFX_RECYCLE_RING_SIZE_10G;
> > +#else
> > +               ret = EFX_RECYCLE_RING_SIZE_10G;
> > +#endif
> > +               break;
> > +       case 0x0923:    /* Greenport can do up to 40G */
> > +       case 0x0a03:    /* Medford can do up to 40G */
> > +#ifdef CONFIG_PPC64
> > +               ret = 16 * EFX_RECYCLE_RING_SIZE_10G;
> > +#else
> > +               ret = 4 * EFX_RECYCLE_RING_SIZE_10G;
> > +#endif
> > +               break;
> > +       default:        /* Medford2 can do up to 100G */
> > +               ret = 10 * EFX_RECYCLE_RING_SIZE_10G;
> > +       }
> > +       return ret;
> > +}
> > +
> >  #define EF10_OFFLOAD_FEATURES          \
> >         (NETIF_F_IP_CSUM |              \
> >          NETIF_F_HW_VLAN_CTAG_FILTER |  \
> > @@ -4106,6 +4135,7 @@ const struct efx_nic_type efx_hunt_a0_vf_nic_type = {
> >         .check_caps = ef10_check_caps,
> >         .print_additional_fwver = efx_ef10_print_additional_fwver,
> >         .sensor_event = efx_mcdi_sensor_event,
> > +       .rx_recycle_ring_size = efx_ef10_recycle_ring_size,
> >  };
> >
> >  const struct efx_nic_type efx_hunt_a0_nic_type = {
> > @@ -4243,4 +4273,5 @@ const struct efx_nic_type efx_hunt_a0_nic_type = {
> >         .check_caps = ef10_check_caps,
> >         .print_additional_fwver = efx_ef10_print_additional_fwver,
> >         .sensor_event = efx_mcdi_sensor_event,
> > +       .rx_recycle_ring_size = efx_ef10_recycle_ring_size,
> >  };
> > diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
> > index f79b14a119ae..a07cbf45a326 100644
> > --- a/drivers/net/ethernet/sfc/ef100_nic.c
> > +++ b/drivers/net/ethernet/sfc/ef100_nic.c
> > @@ -23,6 +23,7 @@
> >  #include "ef100_rx.h"
> >  #include "ef100_tx.h"
> >  #include "ef100_netdev.h"
> > +#include "rx_common.h"
> >
> >  #define EF100_MAX_VIS 4096
> >  #define EF100_NUM_MCDI_BUFFERS 1
> > @@ -696,6 +697,12 @@ static unsigned int ef100_check_caps(const struct efx_nic *efx,
> >         }
> >  }
> >
> > +static unsigned int efx_ef100_recycle_ring_size(const struct efx_nic *efx)
> > +{
> > +       /* Maximum link speed for Riverhead is 100G */
> > +       return 10 * EFX_RECYCLE_RING_SIZE_10G;
> > +}
> > +
> >  /*     NIC level access functions
> >   */
> >  #define EF100_OFFLOAD_FEATURES (NETIF_F_HW_CSUM | NETIF_F_RXCSUM |     \
> > @@ -770,6 +777,7 @@ const struct efx_nic_type ef100_pf_nic_type = {
> >         .rx_push_rss_context_config = efx_mcdi_rx_push_rss_context_config,
> >         .rx_pull_rss_context_config = efx_mcdi_rx_pull_rss_context_config,
> >         .rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
> > +       .rx_recycle_ring_size = efx_ef100_recycle_ring_size,
> >
> >         .reconfigure_mac = ef100_reconfigure_mac,
> >         .reconfigure_port = efx_mcdi_port_reconfigure,
> > @@ -849,6 +857,7 @@ const struct efx_nic_type ef100_vf_nic_type = {
> >         .rx_pull_rss_config = efx_mcdi_rx_pull_rss_config,
> >         .rx_push_rss_config = efx_mcdi_pf_rx_push_rss_config,
> >         .rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
> > +       .rx_recycle_ring_size = efx_ef100_recycle_ring_size,
> >
> >         .reconfigure_mac = ef100_reconfigure_mac,
> >         .test_nvram = efx_new_mcdi_nvram_test_all,
> > diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
> > index cc15ee8812d9..c75dc75e2857 100644
> > --- a/drivers/net/ethernet/sfc/net_driver.h
> > +++ b/drivers/net/ethernet/sfc/net_driver.h
> > @@ -1282,6 +1282,7 @@ struct efx_udp_tunnel {
> >   * @udp_tnl_has_port: Check if a port has been added as UDP tunnel
> >   * @print_additional_fwver: Dump NIC-specific additional FW version info
> >   * @sensor_event: Handle a sensor event from MCDI
> > + * @rx_recycle_ring_size: Size of the RX recycle ring
> >   * @revision: Hardware architecture revision
> >   * @txd_ptr_tbl_base: TX descriptor ring base address
> >   * @rxd_ptr_tbl_base: RX descriptor ring base address
> > @@ -1460,6 +1461,7 @@ struct efx_nic_type {
> >         size_t (*print_additional_fwver)(struct efx_nic *efx, char *buf,
> >                                          size_t len);
> >         void (*sensor_event)(struct efx_nic *efx, efx_qword_t *ev);
> > +       unsigned int (*rx_recycle_ring_size)(const struct efx_nic *efx);
> >
> >         int revision;
> >         unsigned int txd_ptr_tbl_base;
> > diff --git a/drivers/net/ethernet/sfc/nic_common.h b/drivers/net/ethernet/sfc/nic_common.h
> > index b9cafe9cd568..0cef35c0c559 100644
> > --- a/drivers/net/ethernet/sfc/nic_common.h
> > +++ b/drivers/net/ethernet/sfc/nic_common.h
> > @@ -195,6 +195,11 @@ static inline void efx_sensor_event(struct efx_nic *efx, efx_qword_t *ev)
> >                 efx->type->sensor_event(efx, ev);
> >  }
> >
> > +static inline unsigned int efx_rx_recycle_ring_size(const struct efx_nic *efx)
> > +{
> > +       return efx->type->rx_recycle_ring_size(efx);
> > +}
> > +
> >  /* Some statistics are computed as A - B where A and B each increase
> >   * linearly with some hardware counter(s) and the counters are read
> >   * asynchronously.  If the counters contributing to B are always read
> > diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
> > index 633ca77a26fd..1b22c7be0088 100644
> > --- a/drivers/net/ethernet/sfc/rx_common.c
> > +++ b/drivers/net/ethernet/sfc/rx_common.c
> > @@ -23,13 +23,6 @@ module_param(rx_refill_threshold, uint, 0444);
> >  MODULE_PARM_DESC(rx_refill_threshold,
> >                  "RX descriptor ring refill threshold (%)");
> >
> > -/* Number of RX buffers to recycle pages for.  When creating the RX page recycle
> > - * ring, this number is divided by the number of buffers per page to calculate
> > - * the number of pages to store in the RX page recycle ring.
> > - */
> > -#define EFX_RECYCLE_RING_SIZE_IOMMU 4096
> > -#define EFX_RECYCLE_RING_SIZE_NOIOMMU (2 * EFX_RX_PREFERRED_BATCH)
> > -
> >  /* RX maximum head room required.
> >   *
> >   * This must be at least 1 to prevent overflow, plus one packet-worth
> > @@ -141,16 +134,7 @@ static void efx_init_rx_recycle_ring(struct efx_rx_queue *rx_queue)
> >         unsigned int bufs_in_recycle_ring, page_ring_size;
> >         struct efx_nic *efx = rx_queue->efx;
> >
> > -       /* Set the RX recycle ring size */
> > -#ifdef CONFIG_PPC64
> > -       bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_IOMMU;
> > -#else
> > -       if (iommu_present(&pci_bus_type))
> > -               bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_IOMMU;
> > -       else
> > -               bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_NOIOMMU;
> > -#endif /* CONFIG_PPC64 */
> > -
> > +       bufs_in_recycle_ring = efx_rx_recycle_ring_size(efx);
> >         page_ring_size = roundup_pow_of_two(bufs_in_recycle_ring /
> >                                             efx->rx_bufs_per_page);
> >         rx_queue->page_ring = kcalloc(page_ring_size,
> > diff --git a/drivers/net/ethernet/sfc/rx_common.h b/drivers/net/ethernet/sfc/rx_common.h
> > index 207ccd8ba062..fbd2769307f9 100644
> > --- a/drivers/net/ethernet/sfc/rx_common.h
> > +++ b/drivers/net/ethernet/sfc/rx_common.h
> > @@ -18,6 +18,12 @@
> >  #define EFX_RX_MAX_FRAGS DIV_ROUND_UP(EFX_MAX_FRAME_LEN(EFX_MAX_MTU), \
> >                                       EFX_RX_USR_BUF_SIZE)
> >
> > +/* Number of RX buffers to recycle pages for.  When creating the RX page recycle
> > + * ring, this number is divided by the number of buffers per page to calculate
> > + * the number of pages to store in the RX page recycle ring.
> > + */
> > +#define EFX_RECYCLE_RING_SIZE_10G      256
> > +
> >  static inline u8 *efx_rx_buf_va(struct efx_rx_buffer *buf)
> >  {
> >         return page_address(buf->page) + buf->page_offset;
> > diff --git a/drivers/net/ethernet/sfc/siena.c b/drivers/net/ethernet/sfc/siena.c
> > index 16347a6d0c47..ce3060e15b54 100644
> > --- a/drivers/net/ethernet/sfc/siena.c
> > +++ b/drivers/net/ethernet/sfc/siena.c
> > @@ -25,6 +25,7 @@
> >  #include "mcdi_port_common.h"
> >  #include "selftest.h"
> >  #include "siena_sriov.h"
> > +#include "rx_common.h"
> >
> >  /* Hardware control for SFC9000 family including SFL9021 (aka Siena). */
> >
> > @@ -958,6 +959,12 @@ static unsigned int siena_check_caps(const struct efx_nic *efx,
> >         return 0;
> >  }
> >
> > +static unsigned int efx_siena_recycle_ring_size(const struct efx_nic *efx)
> > +{
> > +       /* Maximum link speed is 10G */
> > +       return EFX_RECYCLE_RING_SIZE_10G;
> > +}
> > +
> >  /**************************************************************************
> >   *
> >   * Revision-dependent attributes used by efx.c and nic.c
> > @@ -1098,4 +1105,5 @@ const struct efx_nic_type siena_a0_nic_type = {
> >         .rx_hash_key_size = 16,
> >         .check_caps = siena_check_caps,
> >         .sensor_event = efx_mcdi_sensor_event,
> > +       .rx_recycle_ring_size = efx_siena_recycle_ring_size,
> >  };
> >
> 
> 
> -- 
> Íñigo Huguet

[-- Attachment #2: receive-flamegraph.svg --]
[-- Type: image/svg+xml, Size: 165963 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH V2 net-next] sfc: The size of the RX recycle ring should be more flexible
  2022-01-10  8:58         ` [PATCH net-next] sfc: The size of the RX recycle ring should be more flexible Martin Habets
  2022-01-10  9:31           ` Íñigo Huguet
  2022-01-10 17:22           ` Jakub Kicinski
@ 2022-01-31 11:10           ` Martin Habets
  2022-02-02  5:10             ` patchwork-bot+netdevbpf
  2 siblings, 1 reply; 15+ messages in thread
From: Martin Habets @ 2022-01-31 11:10 UTC (permalink / raw)
  To: Íñigo Huguet, davem, kuba, ecree.xilinx, netdev, dinang

Ideally the size would depend on the link speed, but the recycle
ring is created when the interface is brought up before the driver
knows the link speed. So size it for the maximum speed of a given NIC.
PowerPC is only supported on SFN7xxx and SFN8xxx NICs.

With this patch on a 40G NIC the number of calls to alloc_pages and
friends went down from about 18% to under 2%.
On a 10G NIC the number of calls to alloc_pages and friends went down
from about 15% to 0 (perf did not capture any calls during the 60
second test).
On a 100G NIC the number of calls to alloc_pages and friends went down
from about 23% to 4%.

Change from V1:
- Factored out the PowerPC scaling to be outside the switch.

Reported-by: Íñigo Huguet <ihuguet@redhat.com>
Signed-off-by: Martin Habets <habetsm.xilinx@gmail.com>
---
 drivers/net/ethernet/sfc/ef10.c       |   26 ++++++++++++++++++++++++++
 drivers/net/ethernet/sfc/ef100_nic.c  |    9 +++++++++
 drivers/net/ethernet/sfc/net_driver.h |    2 ++
 drivers/net/ethernet/sfc/nic_common.h |    5 +++++
 drivers/net/ethernet/sfc/rx_common.c  |   18 +-----------------
 drivers/net/ethernet/sfc/rx_common.h  |    6 ++++++
 drivers/net/ethernet/sfc/siena.c      |    8 ++++++++
 7 files changed, 57 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c
index cf366ed2557c..50d535981a35 100644
--- a/drivers/net/ethernet/sfc/ef10.c
+++ b/drivers/net/ethernet/sfc/ef10.c
@@ -3990,6 +3990,30 @@ static unsigned int ef10_check_caps(const struct efx_nic *efx,
 	}
 }
 
+static unsigned int efx_ef10_recycle_ring_size(const struct efx_nic *efx)
+{
+	unsigned int ret = EFX_RECYCLE_RING_SIZE_10G;
+
+	/* There is no difference between PFs and VFs. The size is based on
+	 * the maximum link speed of a given NIC.
+	 */
+	switch (efx->pci_dev->device & 0xfff) {
+	case 0x0903:	/* Farmingdale can do up to 10G */
+		break;
+	case 0x0923:	/* Greenport can do up to 40G */
+	case 0x0a03:	/* Medford can do up to 40G */
+		ret *= 4;
+		break;
+	default:	/* Medford2 can do up to 100G */
+		ret *= 10;
+	}
+
+	if (IS_ENABLED(CONFIG_PPC64))
+		ret *= 4;
+
+	return ret;
+}
+
 #define EF10_OFFLOAD_FEATURES		\
 	(NETIF_F_IP_CSUM |		\
 	 NETIF_F_HW_VLAN_CTAG_FILTER |	\
@@ -4106,6 +4130,7 @@ const struct efx_nic_type efx_hunt_a0_vf_nic_type = {
 	.check_caps = ef10_check_caps,
 	.print_additional_fwver = efx_ef10_print_additional_fwver,
 	.sensor_event = efx_mcdi_sensor_event,
+	.rx_recycle_ring_size = efx_ef10_recycle_ring_size,
 };
 
 const struct efx_nic_type efx_hunt_a0_nic_type = {
@@ -4243,4 +4268,5 @@ const struct efx_nic_type efx_hunt_a0_nic_type = {
 	.check_caps = ef10_check_caps,
 	.print_additional_fwver = efx_ef10_print_additional_fwver,
 	.sensor_event = efx_mcdi_sensor_event,
+	.rx_recycle_ring_size = efx_ef10_recycle_ring_size,
 };
diff --git a/drivers/net/ethernet/sfc/ef100_nic.c b/drivers/net/ethernet/sfc/ef100_nic.c
index f79b14a119ae..a07cbf45a326 100644
--- a/drivers/net/ethernet/sfc/ef100_nic.c
+++ b/drivers/net/ethernet/sfc/ef100_nic.c
@@ -23,6 +23,7 @@
 #include "ef100_rx.h"
 #include "ef100_tx.h"
 #include "ef100_netdev.h"
+#include "rx_common.h"
 
 #define EF100_MAX_VIS 4096
 #define EF100_NUM_MCDI_BUFFERS	1
@@ -696,6 +697,12 @@ static unsigned int ef100_check_caps(const struct efx_nic *efx,
 	}
 }
 
+static unsigned int efx_ef100_recycle_ring_size(const struct efx_nic *efx)
+{
+	/* Maximum link speed for Riverhead is 100G */
+	return 10 * EFX_RECYCLE_RING_SIZE_10G;
+}
+
 /*	NIC level access functions
  */
 #define EF100_OFFLOAD_FEATURES	(NETIF_F_HW_CSUM | NETIF_F_RXCSUM |	\
@@ -770,6 +777,7 @@ const struct efx_nic_type ef100_pf_nic_type = {
 	.rx_push_rss_context_config = efx_mcdi_rx_push_rss_context_config,
 	.rx_pull_rss_context_config = efx_mcdi_rx_pull_rss_context_config,
 	.rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
+	.rx_recycle_ring_size = efx_ef100_recycle_ring_size,
 
 	.reconfigure_mac = ef100_reconfigure_mac,
 	.reconfigure_port = efx_mcdi_port_reconfigure,
@@ -849,6 +857,7 @@ const struct efx_nic_type ef100_vf_nic_type = {
 	.rx_pull_rss_config = efx_mcdi_rx_pull_rss_config,
 	.rx_push_rss_config = efx_mcdi_pf_rx_push_rss_config,
 	.rx_restore_rss_contexts = efx_mcdi_rx_restore_rss_contexts,
+	.rx_recycle_ring_size = efx_ef100_recycle_ring_size,
 
 	.reconfigure_mac = ef100_reconfigure_mac,
 	.test_nvram = efx_new_mcdi_nvram_test_all,
diff --git a/drivers/net/ethernet/sfc/net_driver.h b/drivers/net/ethernet/sfc/net_driver.h
index cc15ee8812d9..c75dc75e2857 100644
--- a/drivers/net/ethernet/sfc/net_driver.h
+++ b/drivers/net/ethernet/sfc/net_driver.h
@@ -1282,6 +1282,7 @@ struct efx_udp_tunnel {
  * @udp_tnl_has_port: Check if a port has been added as UDP tunnel
  * @print_additional_fwver: Dump NIC-specific additional FW version info
  * @sensor_event: Handle a sensor event from MCDI
+ * @rx_recycle_ring_size: Size of the RX recycle ring
  * @revision: Hardware architecture revision
  * @txd_ptr_tbl_base: TX descriptor ring base address
  * @rxd_ptr_tbl_base: RX descriptor ring base address
@@ -1460,6 +1461,7 @@ struct efx_nic_type {
 	size_t (*print_additional_fwver)(struct efx_nic *efx, char *buf,
 					 size_t len);
 	void (*sensor_event)(struct efx_nic *efx, efx_qword_t *ev);
+	unsigned int (*rx_recycle_ring_size)(const struct efx_nic *efx);
 
 	int revision;
 	unsigned int txd_ptr_tbl_base;
diff --git a/drivers/net/ethernet/sfc/nic_common.h b/drivers/net/ethernet/sfc/nic_common.h
index b9cafe9cd568..0cef35c0c559 100644
--- a/drivers/net/ethernet/sfc/nic_common.h
+++ b/drivers/net/ethernet/sfc/nic_common.h
@@ -195,6 +195,11 @@ static inline void efx_sensor_event(struct efx_nic *efx, efx_qword_t *ev)
 		efx->type->sensor_event(efx, ev);
 }
 
+static inline unsigned int efx_rx_recycle_ring_size(const struct efx_nic *efx)
+{
+	return efx->type->rx_recycle_ring_size(efx);
+}
+
 /* Some statistics are computed as A - B where A and B each increase
  * linearly with some hardware counter(s) and the counters are read
  * asynchronously.  If the counters contributing to B are always read
diff --git a/drivers/net/ethernet/sfc/rx_common.c b/drivers/net/ethernet/sfc/rx_common.c
index 633ca77a26fd..1b22c7be0088 100644
--- a/drivers/net/ethernet/sfc/rx_common.c
+++ b/drivers/net/ethernet/sfc/rx_common.c
@@ -23,13 +23,6 @@ module_param(rx_refill_threshold, uint, 0444);
 MODULE_PARM_DESC(rx_refill_threshold,
 		 "RX descriptor ring refill threshold (%)");
 
-/* Number of RX buffers to recycle pages for.  When creating the RX page recycle
- * ring, this number is divided by the number of buffers per page to calculate
- * the number of pages to store in the RX page recycle ring.
- */
-#define EFX_RECYCLE_RING_SIZE_IOMMU 4096
-#define EFX_RECYCLE_RING_SIZE_NOIOMMU (2 * EFX_RX_PREFERRED_BATCH)
-
 /* RX maximum head room required.
  *
  * This must be at least 1 to prevent overflow, plus one packet-worth
@@ -141,16 +134,7 @@ static void efx_init_rx_recycle_ring(struct efx_rx_queue *rx_queue)
 	unsigned int bufs_in_recycle_ring, page_ring_size;
 	struct efx_nic *efx = rx_queue->efx;
 
-	/* Set the RX recycle ring size */
-#ifdef CONFIG_PPC64
-	bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_IOMMU;
-#else
-	if (iommu_present(&pci_bus_type))
-		bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_IOMMU;
-	else
-		bufs_in_recycle_ring = EFX_RECYCLE_RING_SIZE_NOIOMMU;
-#endif /* CONFIG_PPC64 */
-
+	bufs_in_recycle_ring = efx_rx_recycle_ring_size(efx);
 	page_ring_size = roundup_pow_of_two(bufs_in_recycle_ring /
 					    efx->rx_bufs_per_page);
 	rx_queue->page_ring = kcalloc(page_ring_size,
diff --git a/drivers/net/ethernet/sfc/rx_common.h b/drivers/net/ethernet/sfc/rx_common.h
index 207ccd8ba062..fbd2769307f9 100644
--- a/drivers/net/ethernet/sfc/rx_common.h
+++ b/drivers/net/ethernet/sfc/rx_common.h
@@ -18,6 +18,12 @@
 #define EFX_RX_MAX_FRAGS DIV_ROUND_UP(EFX_MAX_FRAME_LEN(EFX_MAX_MTU), \
 				      EFX_RX_USR_BUF_SIZE)
 
+/* Number of RX buffers to recycle pages for.  When creating the RX page recycle
+ * ring, this number is divided by the number of buffers per page to calculate
+ * the number of pages to store in the RX page recycle ring.
+ */
+#define EFX_RECYCLE_RING_SIZE_10G	256
+
 static inline u8 *efx_rx_buf_va(struct efx_rx_buffer *buf)
 {
 	return page_address(buf->page) + buf->page_offset;
diff --git a/drivers/net/ethernet/sfc/siena.c b/drivers/net/ethernet/sfc/siena.c
index 16347a6d0c47..ce3060e15b54 100644
--- a/drivers/net/ethernet/sfc/siena.c
+++ b/drivers/net/ethernet/sfc/siena.c
@@ -25,6 +25,7 @@
 #include "mcdi_port_common.h"
 #include "selftest.h"
 #include "siena_sriov.h"
+#include "rx_common.h"
 
 /* Hardware control for SFC9000 family including SFL9021 (aka Siena). */
 
@@ -958,6 +959,12 @@ static unsigned int siena_check_caps(const struct efx_nic *efx,
 	return 0;
 }
 
+static unsigned int efx_siena_recycle_ring_size(const struct efx_nic *efx)
+{
+	/* Maximum link speed is 10G */
+	return EFX_RECYCLE_RING_SIZE_10G;
+}
+
 /**************************************************************************
  *
  * Revision-dependent attributes used by efx.c and nic.c
@@ -1098,4 +1105,5 @@ const struct efx_nic_type siena_a0_nic_type = {
 	.rx_hash_key_size = 16,
 	.check_caps = siena_check_caps,
 	.sensor_event = efx_mcdi_sensor_event,
+	.rx_recycle_ring_size = efx_siena_recycle_ring_size,
 };

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH V2 net-next] sfc: The size of the RX recycle ring should be more flexible
  2022-01-31 11:10           ` [PATCH V2 " Martin Habets
@ 2022-02-02  5:10             ` patchwork-bot+netdevbpf
  0 siblings, 0 replies; 15+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-02-02  5:10 UTC (permalink / raw)
  To: Martin Habets; +Cc: ihuguet, davem, kuba, ecree.xilinx, netdev, dinang

Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 31 Jan 2022 11:10:54 +0000 you wrote:
> Ideally the size would depend on the link speed, but the recycle
> ring is created when the interface is brought up before the driver
> knows the link speed. So size it for the maximum speed of a given NIC.
> PowerPC is only supported on SFN7xxx and SFN8xxx NICs.
> 
> With this patch on a 40G NIC the number of calls to alloc_pages and
> friends went down from about 18% to under 2%.
> On a 10G NIC the number of calls to alloc_pages and friends went down
> from about 15% to 0 (perf did not capture any calls during the 60
> second test).
> On a 100G NIC the number of calls to alloc_pages and friends went down
> from about 23% to 4%.
> 
> [...]

Here is the summary with links:
  - [V2,net-next] sfc: The size of the RX recycle ring should be more flexible
    https://git.kernel.org/netdev/net-next/c/000fe940e51f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-02-02  5:10 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-18 15:14 Bad performance in RX with sfc 40G Íñigo Huguet
2021-11-18 17:19 ` Eric Dumazet
2021-12-02 14:26   ` Íñigo Huguet
2021-11-20  8:31 ` Martin Habets
2021-12-09 12:06   ` Íñigo Huguet
2021-12-23 13:18     ` Íñigo Huguet
2022-01-02  9:22       ` Martin Habets
2022-01-10  8:58         ` [PATCH net-next] sfc: The size of the RX recycle ring should be more flexible Martin Habets
2022-01-10  9:31           ` Íñigo Huguet
2022-01-12  9:05             ` Martin Habets
2022-01-31 11:08             ` Martin Habets
2022-01-10 17:22           ` Jakub Kicinski
2022-01-12  9:08             ` Martin Habets
2022-01-31 11:10           ` [PATCH V2 " Martin Habets
2022-02-02  5:10             ` patchwork-bot+netdevbpf
