linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] net: Optimize the qed* allocations inside kdump kernel
@ 2020-05-05 19:04 Bhupesh Sharma
  2020-05-05 19:04 ` [PATCH 1/2] net: qed*: Reduce RX and TX default ring count when running " Bhupesh Sharma
  2020-05-05 19:04 ` [PATCH 2/2] net: qed: Disable SRIOV functionality " Bhupesh Sharma
  0 siblings, 2 replies; 7+ messages in thread
From: Bhupesh Sharma @ 2020-05-05 19:04 UTC (permalink / raw)
  To: netdev
  Cc: bhsharma, bhupesh.linux, kexec, linux-kernel, aelior,
	GR-everest-linux-l2, manishc, davem

Since kdump kernel(s) run under severe memory constraint with the
basic idea being to save the crashdump vmcore reliably when the primary
kernel panics/hangs, large memory allocations done by a network driver
can cause the crashkernel to panic with OOM.

The qed* drivers take up approximately 214MB memory when run in the
kdump kernel with the default configuration settings presently used in
the driver. With a typical crashkernel size of 512M, this allocation
amounts to almost half of the total crashkernel reservation.

See some logs obtained via memstrack tool (see [1]) below:
 dracut-pre-pivot[676]: ======== Report format module_summary: ========
 dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)
 dracut-pre-pivot[676]: Module qede using 65.3MB (1045 pages), peak allocation 65.3MB (1045 pages)

This patchset tries to reduce the overall memory allocation profile of
the qed* drivers when they run in the kdump kernel. With these
optimizations we see a saving of approx 85M in the kdump kernel:
 dracut-pre-pivot[671]: ======== Report format module_summary: ========
 dracut-pre-pivot[671]: Module qed using 124.6MB (1993 pages), peak allocation 124.7MB (1995 pages)
 <..snip..>
 dracut-pre-pivot[671]: Module qede using 4.6MB (73 pages), peak allocation 4.6MB (74 pages)

And the kdump kernel can save vmcore successfully via both ssh and nfs
interfaces.
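
The totals quoted above can be cross-checked from the two memstrack reports. A minimal sketch of that arithmetic follows; the table layout and the helper names (module_usage, total_before(), total_after(), saving()) are made up for illustration, and the figures are simply the ones from the logs, kept in tenths of a MB so the sums stay exact:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* memstrack "using" figures from the logs above, in tenths of a MB. */
struct module_usage {
	const char *name;
	int before;	/* default configuration */
	int after;	/* with both patches applied */
};

static const struct module_usage usage[] = {
	{ "qed",  1496, 1246 },	/* 149.6MB -> 124.6MB */
	{ "qede",  653,   46 },	/*  65.3MB ->   4.6MB */
};

#define N_MODULES (sizeof(usage) / sizeof(usage[0]))

static int total_before(void)
{
	int sum = 0;

	for (size_t i = 0; i < N_MODULES; i++)
		sum += usage[i].before;
	return sum;
}

static int total_after(void)
{
	int sum = 0;

	for (size_t i = 0; i < N_MODULES; i++)
		sum += usage[i].after;
	return sum;
}

/* Saving in tenths of a MB; the patched totals must not be larger. */
static int saving(void)
{
	int s = total_before() - total_after();

	assert(s >= 0);
	return s;
}

static void print_summary(void)
{
	printf("before %d.%d MB, after %d.%d MB, saved %d.%d MB\n",
	       total_before() / 10, total_before() % 10,
	       total_after() / 10, total_after() % 10,
	       saving() / 10, saving() % 10);
}
```

Most of the saving comes from qede (the ring size change in patch 1/2); the qed reduction comes from the SRIOV change in patch 2/2.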

This patchset contains two patches:
[PATCH 1/2] - Reduces the default TX and RX ring count in kdump kernel.
[PATCH 2/2] - Disables qed SRIOV feature in kdump kernel (as it is
              normally not a supported kdump target for saving
	      vmcore).

[1]. Memstrack tool: https://github.com/ryncsn/memstrack

-
Bhupesh Sharma (2):
  net: qed*: Reduce RX and TX default ring count when running inside
    kdump kernel
  net: qed: Disable SRIOV functionality inside kdump kernel

 drivers/net/ethernet/qlogic/qed/qed_sriov.h  | 10 +++++++---
 drivers/net/ethernet/qlogic/qede/qede.h      |  5 +++--
 drivers/net/ethernet/qlogic/qede/qede_main.c |  2 +-
 3 files changed, 11 insertions(+), 6 deletions(-)

-- 
2.7.4



* [PATCH 1/2] net: qed*: Reduce RX and TX default ring count when running inside kdump kernel
  2020-05-05 19:04 [PATCH 0/2] net: Optimize the qed* allocations inside kdump kernel Bhupesh Sharma
@ 2020-05-05 19:04 ` Bhupesh Sharma
  2020-05-05 21:24   ` David Miller
  2020-05-06  6:51   ` [EXT] " Igor Russkikh
  2020-05-05 19:04 ` [PATCH 2/2] net: qed: Disable SRIOV functionality " Bhupesh Sharma
  1 sibling, 2 replies; 7+ messages in thread
From: Bhupesh Sharma @ 2020-05-05 19:04 UTC (permalink / raw)
  To: netdev
  Cc: bhsharma, bhupesh.linux, kexec, linux-kernel, aelior,
	GR-everest-linux-l2, manishc, davem

Normally kdump kernel(s) run under severe memory constraint with the
basic idea being to save the crashdump vmcore reliably when the primary
kernel panics/hangs.

Currently the qed* ethernet driver ends up consuming a lot of memory in
the kdump kernel, leading to kdump kernel panic when one tries to save
the vmcore via ssh/nfs (thus utilizing the services of the underlying
qed* network interfaces).

An example OOM log from the kdump kernel is shown in [1], with a
crashkernel size reservation of 512M.

Using tools like memstrack (see [2]), we can track the modules taking up
the bulk of memory in the kdump kernel and organize the memory usage
output as per 'highest allocator first'. An example log for the OOM case
indicates that the qed* modules end up allocating approximately 215M of
memory, which is a large part of the total crashkernel size:

 dracut-pre-pivot[676]: ======== Report format module_summary: ========
 dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)
 dracut-pre-pivot[676]: Module qede using 65.3MB (1045 pages), peak allocation 65.3MB (1045 pages)

This patch reduces the default RX ring size from 1023 to 63 descriptors
and the default TX ring size from 8191 to 63 descriptors when running
inside the kdump kernel, which leads to a significant memory saving.
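
To see what the new defaults evaluate to, here is a small userspace sketch. The two NUM_*_BDS_DEF macros are copied from the diff in this patch; everything else is a stand-in and an assumption: is_kdump_kernel() is stubbed with a flag (the real one comes from <linux/crash_dump.h>), BIT() and u16 are redefined locally, and rx_default()/tx_default() are hypothetical helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t u16;
#define BIT(n)			(1UL << (n))

/* Stub for the kernel helper; toggled by the functions below. */
static bool fake_kdump;
static bool is_kdump_kernel(void)
{
	return fake_kdump;
}

#define TX_RING_SIZE_POW	13
#define TX_RING_SIZE		((u16)BIT(TX_RING_SIZE_POW))
#define NUM_TX_BDS_MAX		(TX_RING_SIZE - 1)

/* Copied from the patch. */
#define NUM_RX_BDS_DEF		((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : ((u16)BIT(10) - 1))
#define NUM_TX_BDS_DEF		((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : NUM_TX_BDS_MAX)

/* Default RX descriptor count for a normal or a kdump boot. */
static unsigned int rx_default(bool kdump)
{
	fake_kdump = kdump;
	return NUM_RX_BDS_DEF;
}

/* Default TX descriptor count; it can never exceed the ring maximum. */
static unsigned int tx_default(bool kdump)
{
	unsigned int v;

	fake_kdump = kdump;
	v = NUM_TX_BDS_DEF;
	assert(v <= NUM_TX_BDS_MAX);
	return v;
}
```

Note that the kdump value of 63 is below NUM_RX_BDS_MIN/NUM_TX_BDS_MIN (128) as defined in the same header.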

An example log with the patch applied shows the reduced memory
allocation in the kdump kernel:
 dracut-pre-pivot[674]: ======== Report format module_summary: ========
 dracut-pre-pivot[674]: Module qed using 141.8MB (2268 pages), peak allocation 141.8MB (2268 pages)
 <..snip..>
 dracut-pre-pivot[674]: Module qede using 4.8MB (76 pages), peak allocation 4.9MB (78 pages)

Tested crashdump vmcore save via ssh/nfs protocol using underlying qed*
network interface after applying this patch.

[1] OOM log:
------------

 kworker/0:6: page allocation failure: order:6, mode:0x60c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null)
 kworker/0:6 cpuset=/ mems_allowed=0
 CPU: 0 PID: 145 Comm: kworker/0:6 Not tainted 4.18.0-109.el8.aarch64 #1
 Hardware name: To be filled by O.E.M. Saber/Saber, BIOS 0ACKL025 01/18/2019
 Workqueue: events work_for_cpu_fn
 Call trace:
  dump_backtrace+0x0/0x188
  show_stack+0x24/0x30
  dump_stack+0x90/0xb4
  warn_alloc+0xf4/0x178
  __alloc_pages_nodemask+0xcac/0xd58
  alloc_pages_current+0x8c/0xf8
  kmalloc_order_trace+0x38/0x108
  qed_iov_alloc+0x40/0x248 [qed]
  qed_resc_alloc+0x224/0x518 [qed]
  qed_slowpath_start+0x254/0x928 [qed]
  __qede_probe+0xf8/0x5e0 [qede]
  qede_probe+0x68/0xd8 [qede]
  local_pci_probe+0x44/0xa8
  work_for_cpu_fn+0x20/0x30
  process_one_work+0x1ac/0x3e8
  worker_thread+0x44/0x448
  kthread+0x130/0x138
  ret_from_fork+0x10/0x18
  Cannot start slowpath
  qede: probe of 0000:05:00.1 failed with error -12
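
For scale: an order:6 request asks the page allocator for 2^6 = 64 physically contiguous pages. A minimal sketch of that arithmetic follows. The 64 KiB page size is an assumption, inferred from the memstrack report (149.6MB reported as 2394 pages) and not stated in the log itself; order_to_bytes() and ASSUMED_PAGE_SIZE are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64 KiB pages, inferred from the memstrack numbers. */
#define ASSUMED_PAGE_SIZE	(64UL * 1024)

/* Bytes requested by an order-N allocation: page_size * 2^order. */
static unsigned long order_to_bytes(unsigned int order, unsigned long page_size)
{
	/* Guard against shifting by the full width of the type. */
	assert(order < 8 * sizeof(unsigned long));
	return page_size << order;
}
```

Under that assumption the failing kmalloc in qed_iov_alloc() needs a contiguous 4 MiB region, which is hard to find in a fragmented 512M crashkernel.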

[2]. Memstrack tool: https://github.com/ryncsn/memstrack

Cc: kexec@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Ariel Elior <aelior@marvell.com>
Cc: GR-everest-linux-l2@marvell.com
Cc: Manish Chopra <manishc@marvell.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
---
 drivers/net/ethernet/qlogic/qede/qede.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 234c6f30effb..b55ab32ef0b3 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -32,6 +32,7 @@
 #ifndef _QEDE_H_
 #define _QEDE_H_
 #include <linux/compiler.h>
+#include <linux/crash_dump.h>
 #include <linux/version.h>
 #include <linux/workqueue.h>
 #include <linux/netdevice.h>
@@ -574,13 +575,13 @@ int qede_add_tc_flower_fltr(struct qede_dev *edev, __be16 proto,
 #define RX_RING_SIZE		((u16)BIT(RX_RING_SIZE_POW))
 #define NUM_RX_BDS_MAX		(RX_RING_SIZE - 1)
 #define NUM_RX_BDS_MIN		128
-#define NUM_RX_BDS_DEF		((u16)BIT(10) - 1)
+#define NUM_RX_BDS_DEF		((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : ((u16)BIT(10) - 1))
 
 #define TX_RING_SIZE_POW	13
 #define TX_RING_SIZE		((u16)BIT(TX_RING_SIZE_POW))
 #define NUM_TX_BDS_MAX		(TX_RING_SIZE - 1)
 #define NUM_TX_BDS_MIN		128
-#define NUM_TX_BDS_DEF		NUM_TX_BDS_MAX
+#define NUM_TX_BDS_DEF		((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : NUM_TX_BDS_MAX)
 
 #define QEDE_MIN_PKT_LEN		64
 #define QEDE_RX_HDR_SIZE		256
-- 
2.7.4



* [PATCH 2/2] net: qed: Disable SRIOV functionality inside kdump kernel
  2020-05-05 19:04 [PATCH 0/2] net: Optimize the qed* allocations inside kdump kernel Bhupesh Sharma
  2020-05-05 19:04 ` [PATCH 1/2] net: qed*: Reduce RX and TX default ring count when running " Bhupesh Sharma
@ 2020-05-05 19:04 ` Bhupesh Sharma
  1 sibling, 0 replies; 7+ messages in thread
From: Bhupesh Sharma @ 2020-05-05 19:04 UTC (permalink / raw)
  To: netdev
  Cc: bhsharma, bhupesh.linux, kexec, linux-kernel, aelior,
	GR-everest-linux-l2, manishc, davem

Since we have kdump kernel(s) running under severe memory constraint
it makes sense to disable the qed SRIOV functionality when running the
kdump kernel as kdump configurations on several distributions don't
support SRIOV targets for saving the vmcore (see [1] for example).

Currently the qed SRIOV functionality ends up consuming memory in
the kdump kernel, even though it is not used there.

An example log seen in the kdump kernel with the SRIOV functionality
enabled can be seen below (obtained via memstrack tool, see [2]):
 dracut-pre-pivot[676]: ======== Report format module_summary: ========
 dracut-pre-pivot[676]: Module qed using 149.6MB (2394 pages), peak allocation 149.6MB (2394 pages)

This patch disables the SRIOV functionality inside the kdump kernel.
With it applied, the memory consumption goes down:
 dracut-pre-pivot[671]: ======== Report format module_summary: ========
 dracut-pre-pivot[671]: Module qed using 124.6MB (1993 pages), peak allocation 124.7MB (1995 pages)
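
The effect of the reworked macros can be tried out in userspace. In the sketch below the three macros are copied from the diff in this patch; the rest is assumed for illustration: is_kdump_kernel() is stubbed with a flag, struct qed_dev and struct qed_hwfn are cut down to the fields the macros touch, and sriov_view() with its VIEW_* flags is a hypothetical helper:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stub for the kernel helper from <linux/crash_dump.h>. */
static bool fake_kdump;
static bool is_kdump_kernel(void)
{
	return fake_kdump;
}

/* Cut-down stand-ins for the driver structures. */
struct qed_dev {
	bool b_is_vf;
	void *p_iov_info;
};

struct qed_hwfn {
	struct qed_dev *cdev;
};

/* Copied from the patch. */
#define IS_VF(cdev)             ((is_kdump_kernel()) ? \
				 (0) : ((cdev)->b_is_vf))
#define IS_PF(cdev)             ((is_kdump_kernel()) ? \
				 (1) : !((cdev)->b_is_vf))
#define IS_PF_SRIOV(p_hwfn)     ((is_kdump_kernel()) ? \
				 (0) : !!((p_hwfn)->cdev->p_iov_info))

enum { VIEW_PF_SRIOV = 1, VIEW_PF = 2, VIEW_VF = 4 };

/* Report which of the three macros are true for a given device state. */
static int sriov_view(bool is_vf, bool has_iov_info, bool kdump)
{
	int token;
	struct qed_dev dev = {
		.b_is_vf = is_vf,
		.p_iov_info = has_iov_info ? &token : NULL,
	};
	struct qed_hwfn hwfn = { .cdev = &dev };
	int vf, pf, pf_sriov;

	fake_kdump = kdump;
	vf = IS_VF(&dev);
	pf = IS_PF(&dev);
	pf_sriov = IS_PF_SRIOV(&hwfn);

	/* A device is always seen as exactly one of VF or PF. */
	assert((vf != 0) != (pf != 0));

	return (vf ? VIEW_VF : 0) | (pf ? VIEW_PF : 0) |
	       (pf_sriov ? VIEW_PF_SRIOV : 0);
}
```

In the kdump case every device is treated as a plain PF without SRIOV, regardless of b_is_vf or p_iov_info.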

[1]. https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/installing-and-configuring-kdump_managing-monitoring-and-updating-the-kernel#supported-kdump-targets_supported-kdump-configurations-and-targets
[2]. Memstrack tool: https://github.com/ryncsn/memstrack

Cc: kexec@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: Ariel Elior <aelior@marvell.com>
Cc: GR-everest-linux-l2@marvell.com
Cc: Manish Chopra <manishc@marvell.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
---
 drivers/net/ethernet/qlogic/qed/qed_sriov.h  | 10 +++++++---
 drivers/net/ethernet/qlogic/qede/qede_main.c |  2 +-
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_sriov.h b/drivers/net/ethernet/qlogic/qed/qed_sriov.h
index 368e88565783..f2ebd9a76e20 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sriov.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_sriov.h
@@ -32,6 +32,7 @@
 
 #ifndef _QED_SRIOV_H
 #define _QED_SRIOV_H
+#include <linux/crash_dump.h>
 #include <linux/types.h>
 #include "qed_vf.h"
 
@@ -40,9 +41,12 @@
 #define QED_VF_ARRAY_LENGTH (3)
 
 #ifdef CONFIG_QED_SRIOV
-#define IS_VF(cdev)             ((cdev)->b_is_vf)
-#define IS_PF(cdev)             (!((cdev)->b_is_vf))
-#define IS_PF_SRIOV(p_hwfn)     (!!((p_hwfn)->cdev->p_iov_info))
+#define IS_VF(cdev)             ((is_kdump_kernel()) ? \
+				 (0) : ((cdev)->b_is_vf))
+#define IS_PF(cdev)             ((is_kdump_kernel()) ? \
+				 (1) : !((cdev)->b_is_vf))
+#define IS_PF_SRIOV(p_hwfn)     ((is_kdump_kernel()) ? \
+				 (0) : !!((p_hwfn)->cdev->p_iov_info))
 #else
 #define IS_VF(cdev)             (0)
 #define IS_PF(cdev)             (1)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 34fa3917eb33..f557ae90ce7c 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1187,7 +1187,7 @@ static int qede_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	case QEDE_PRIVATE_VF:
 		if (debug & QED_LOG_VERBOSE_MASK)
 			dev_err(&pdev->dev, "Probing a VF\n");
-		is_vf = true;
+		is_vf = is_kdump_kernel() ? false : true;
 		break;
 	default:
 		if (debug & QED_LOG_VERBOSE_MASK)
-- 
2.7.4



* Re: [PATCH 1/2] net: qed*: Reduce RX and TX default ring count when running inside kdump kernel
  2020-05-05 19:04 ` [PATCH 1/2] net: qed*: Reduce RX and TX default ring count when running " Bhupesh Sharma
@ 2020-05-05 21:24   ` David Miller
  2020-05-06  5:04     ` Bhupesh Sharma
  2020-05-06  6:51   ` [EXT] " Igor Russkikh
  1 sibling, 1 reply; 7+ messages in thread
From: David Miller @ 2020-05-05 21:24 UTC (permalink / raw)
  To: bhsharma
  Cc: netdev, bhupesh.linux, kexec, linux-kernel, aelior,
	GR-everest-linux-l2, manishc

From: Bhupesh Sharma <bhsharma@redhat.com>
Date: Wed,  6 May 2020 00:34:40 +0530

> -#define NUM_RX_BDS_DEF		((u16)BIT(10) - 1)
> +#define NUM_RX_BDS_DEF		((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : ((u16)BIT(10) - 1))

These parentheses are very excessive and unnecessary.  At the
very least remove the parentheses around is_kdump_kernel().
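
For illustration, one possible trimmed spelling is sketched below next to the posted one. This is only an assumption about what a cleaned-up macro could look like, not the v2 text; the _V1 and _TRIMMED suffixes, the is_kdump_kernel() stub and the helper functions are all hypothetical. It relies on the cast binding tighter than the subtraction, and the subtraction binding tighter than ?:, so only the outer parentheses are kept:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t u16;
#define BIT(n)	(1UL << (n))

/* Stub for the kernel helper from <linux/crash_dump.h>. */
static bool fake_kdump;
static bool is_kdump_kernel(void)
{
	return fake_kdump;
}

/* As posted in the patch. */
#define NUM_RX_BDS_DEF_V1	((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : ((u16)BIT(10) - 1))

/* Trimmed: keep only the outer pair so the macro stays safe inside larger expressions. */
#define NUM_RX_BDS_DEF_TRIMMED	(is_kdump_kernel() ? (u16)BIT(6) - 1 : (u16)BIT(10) - 1)

/* True when both spellings evaluate to the same value. */
static bool spellings_agree(bool kdump)
{
	fake_kdump = kdump;
	return NUM_RX_BDS_DEF_V1 == NUM_RX_BDS_DEF_TRIMMED;
}

/* Value of the trimmed macro, checked against the posted one. */
static int trimmed_value(bool kdump)
{
	fake_kdump = kdump;
	assert(NUM_RX_BDS_DEF_V1 == NUM_RX_BDS_DEF_TRIMMED);
	return NUM_RX_BDS_DEF_TRIMMED;
}
```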


* Re: [PATCH 1/2] net: qed*: Reduce RX and TX default ring count when running inside kdump kernel
  2020-05-05 21:24   ` David Miller
@ 2020-05-06  5:04     ` Bhupesh Sharma
  0 siblings, 0 replies; 7+ messages in thread
From: Bhupesh Sharma @ 2020-05-06  5:04 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, Bhupesh SHARMA, kexec mailing list,
	Linux Kernel Mailing List, aelior, GR-everest-linux-l2, manishc

Hi David,

On Wed, May 6, 2020 at 2:54 AM David Miller <davem@davemloft.net> wrote:
>
> From: Bhupesh Sharma <bhsharma@redhat.com>
> Date: Wed,  6 May 2020 00:34:40 +0530
>
> > -#define NUM_RX_BDS_DEF               ((u16)BIT(10) - 1)
> > +#define NUM_RX_BDS_DEF               ((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : ((u16)BIT(10) - 1))
>
> These parentheses are very excessive and unnecessary.  At the
> very least remove the parentheses around is_kdump_kernel().

Thanks a lot for the review.
Sure, will fix this in the v2.

Regards,
Bhupesh



* Re: [EXT] [PATCH 1/2] net: qed*: Reduce RX and TX default ring count when running inside kdump kernel
  2020-05-05 19:04 ` [PATCH 1/2] net: qed*: Reduce RX and TX default ring count when running " Bhupesh Sharma
  2020-05-05 21:24   ` David Miller
@ 2020-05-06  6:51   ` Igor Russkikh
  2020-05-06  7:13     ` Bhupesh Sharma
  1 sibling, 1 reply; 7+ messages in thread
From: Igor Russkikh @ 2020-05-06  6:51 UTC (permalink / raw)
  To: Bhupesh Sharma, netdev
  Cc: bhupesh.linux, kexec, linux-kernel, Ariel Elior,
	GR-everest-linux-l2, Manish Chopra, davem, Alok Prasad



>  #include <linux/compiler.h>
> +#include <linux/crash_dump.h>
>  #include <linux/version.h>
>  #include <linux/workqueue.h>
>  #include <linux/netdevice.h>
> @@ -574,13 +575,13 @@ int qede_add_tc_flower_fltr(struct qede_dev *edev, __be16 proto,
>  #define RX_RING_SIZE		((u16)BIT(RX_RING_SIZE_POW))
>  #define NUM_RX_BDS_MAX		(RX_RING_SIZE - 1)
>  #define NUM_RX_BDS_MIN		128
> -#define NUM_RX_BDS_DEF		((u16)BIT(10) - 1)
> +#define NUM_RX_BDS_DEF		((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : ((u16)BIT(10) - 1))
>  
>  #define TX_RING_SIZE_POW	13
>  #define TX_RING_SIZE		((u16)BIT(TX_RING_SIZE_POW))
>  #define NUM_TX_BDS_MAX		(TX_RING_SIZE - 1)
>  #define NUM_TX_BDS_MIN		128
> -#define NUM_TX_BDS_DEF		NUM_TX_BDS_MAX
> +#define NUM_TX_BDS_DEF		((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : NUM_TX_BDS_MAX)
>  

Hi Bhupesh,

Thanks for looking into this. We are also analyzing how to reduce qed* memory
usage even more.

Patch is good, but may I suggest not to introduce conditional logic into the
defines but instead just add two new defines like NUM_[RT]X_BDS_MIN and check
for is_kdump_kernel() in the code explicitly?

if (is_kdump_kernel()) {
	edev->q_num_rx_buffers = NUM_RX_BDS_MIN;
	edev->q_num_tx_buffers = NUM_TX_BDS_MIN;
} else {
	edev->q_num_rx_buffers = NUM_RX_BDS_DEF;
	edev->q_num_tx_buffers = NUM_TX_BDS_DEF;
}

This may make the configuration logic more explicit. In future we may want
to add more specific configs under this `if`.
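
A self-contained userspace version of the snippet above, as a sketch only: the defines mirror the pre-patch qede.h values quoted in this thread, struct qede_dev is cut down to the two fields used here, and the helper names (qede_set_default_ring_sizes(), rx_buffers_for(), tx_buffers_for()) are hypothetical. The kdump state is passed in as a parameter instead of calling is_kdump_kernel(), so the code builds outside the kernel:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;
#define BIT(n)			(1UL << (n))

/* Values as in qede.h before the patch. */
#define NUM_RX_BDS_MIN		128
#define NUM_RX_BDS_DEF		((u16)BIT(10) - 1)

#define TX_RING_SIZE_POW	13
#define TX_RING_SIZE		((u16)BIT(TX_RING_SIZE_POW))
#define NUM_TX_BDS_MAX		(TX_RING_SIZE - 1)
#define NUM_TX_BDS_MIN		128
#define NUM_TX_BDS_DEF		NUM_TX_BDS_MAX

/* Cut-down stand-in for the driver structure. */
struct qede_dev {
	u16 q_num_rx_buffers;
	u16 q_num_tx_buffers;
};

/* Pick ring sizes explicitly; the macros themselves stay unconditional. */
static void qede_set_default_ring_sizes(struct qede_dev *edev, bool kdump)
{
	assert(edev != NULL);

	if (kdump) {
		edev->q_num_rx_buffers = NUM_RX_BDS_MIN;
		edev->q_num_tx_buffers = NUM_TX_BDS_MIN;
	} else {
		edev->q_num_rx_buffers = NUM_RX_BDS_DEF;
		edev->q_num_tx_buffers = NUM_TX_BDS_DEF;
	}
}

static u16 rx_buffers_for(bool kdump)
{
	struct qede_dev edev = { 0 };

	qede_set_default_ring_sizes(&edev, kdump);
	return edev.q_num_rx_buffers;
}

static u16 tx_buffers_for(bool kdump)
{
	struct qede_dev edev = { 0 };

	qede_set_default_ring_sizes(&edev, kdump);
	return edev.q_num_tx_buffers;
}
```

With the existing MIN defines this gives 128 descriptors per ring in kdump, rather than the 63 used in the posted patch.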

Regards
  Igor


* Re: [EXT] [PATCH 1/2] net: qed*: Reduce RX and TX default ring count when running inside kdump kernel
  2020-05-06  6:51   ` [EXT] " Igor Russkikh
@ 2020-05-06  7:13     ` Bhupesh Sharma
  0 siblings, 0 replies; 7+ messages in thread
From: Bhupesh Sharma @ 2020-05-06  7:13 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: netdev, Bhupesh SHARMA, kexec mailing list,
	Linux Kernel Mailing List, Ariel Elior, GR-everest-linux-l2,
	Manish Chopra, David S . Miller, Alok Prasad

Hello Igor,

On Wed, May 6, 2020 at 12:21 PM Igor Russkikh <irusskikh@marvell.com> wrote:
>
>
>
> >  #include <linux/compiler.h>
> > +#include <linux/crash_dump.h>
> >  #include <linux/version.h>
> >  #include <linux/workqueue.h>
> >  #include <linux/netdevice.h>
> > @@ -574,13 +575,13 @@ int qede_add_tc_flower_fltr(struct qede_dev *edev, __be16 proto,
> >  #define RX_RING_SIZE         ((u16)BIT(RX_RING_SIZE_POW))
> >  #define NUM_RX_BDS_MAX               (RX_RING_SIZE - 1)
> >  #define NUM_RX_BDS_MIN               128
> > -#define NUM_RX_BDS_DEF               ((u16)BIT(10) - 1)
> > +#define NUM_RX_BDS_DEF               ((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : ((u16)BIT(10) - 1))
> >
> >  #define TX_RING_SIZE_POW     13
> >  #define TX_RING_SIZE         ((u16)BIT(TX_RING_SIZE_POW))
> >  #define NUM_TX_BDS_MAX               (TX_RING_SIZE - 1)
> >  #define NUM_TX_BDS_MIN               128
> > -#define NUM_TX_BDS_DEF               NUM_TX_BDS_MAX
> > +#define NUM_TX_BDS_DEF               ((is_kdump_kernel()) ? ((u16)BIT(6) - 1) : NUM_TX_BDS_MAX)
> >
>
> Hi Bhupesh,
>
> Thanks for looking into this. We are also analyzing how to reduce qed* memory
> usage even more.
>
> Patch is good, but may I suggest not to introduce conditional logic into the
> defines but instead just add two new defines like NUM_[RT]X_BDS_MIN and check
> for is_kdump_kernel() in the code explicitly?
>
> if (is_kdump_kernel()) {
>         edev->q_num_rx_buffers = NUM_RX_BDS_MIN;
>         edev->q_num_tx_buffers = NUM_TX_BDS_MIN;
> } else {
>         edev->q_num_rx_buffers = NUM_RX_BDS_DEF;
>         edev->q_num_tx_buffers = NUM_TX_BDS_DEF;
> }
>
> This may make the configuration logic more explicit. In future we may want
> to add more specific configs under this `if`.

Thanks for the review comments.
The suggestions seem fine to me. I will incorporate them in v2.

Regards,
Bhupesh

