All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] [v2] ethtool: reduce stack usage with clang
@ 2019-03-07 15:58 Arnd Bergmann
  2019-03-07 16:41 ` Michal Kubecek
  2019-03-07 17:46 ` David Miller
  0 siblings, 2 replies; 4+ messages in thread
From: Arnd Bergmann @ 2019-03-07 15:58 UTC (permalink / raw)
  To: David S. Miller
  Cc: Arnd Bergmann, Florian Fainelli, Jakub Kicinski, Jiri Pirko,
	Wenwen Wang, Michal Kubecek, netdev, linux-kernel

clang inlines the dev_ethtool() more aggressively than gcc does, leading
to a larger amount of used stack space:

net/core/ethtool.c:2536:24: error: stack frame size of 1216 bytes in function 'dev_ethtool' [-Werror,-Wframe-larger-than=]

Marking the sub-functions that require the most stack space as
noinline_for_stack gives us reasonable behavior on all compilers.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
v2: don't annotate dev_ethtool itself, as pointed out by Michal Kubecek
---
 net/core/ethtool.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index d4918ffddda8..b1eb32419732 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -2319,9 +2319,10 @@ static int ethtool_set_tunable(struct net_device *dev, void __user *useraddr)
 	return ret;
 }
 
-static int ethtool_get_per_queue_coalesce(struct net_device *dev,
-					  void __user *useraddr,
-					  struct ethtool_per_queue_op *per_queue_opt)
+static noinline_for_stack int
+ethtool_get_per_queue_coalesce(struct net_device *dev,
+			       void __user *useraddr,
+			       struct ethtool_per_queue_op *per_queue_opt)
 {
 	u32 bit;
 	int ret;
@@ -2349,9 +2350,10 @@ static int ethtool_get_per_queue_coalesce(struct net_device *dev,
 	return 0;
 }
 
-static int ethtool_set_per_queue_coalesce(struct net_device *dev,
-					  void __user *useraddr,
-					  struct ethtool_per_queue_op *per_queue_opt)
+static noinline_for_stack int
+ethtool_set_per_queue_coalesce(struct net_device *dev,
+			       void __user *useraddr,
+			       struct ethtool_per_queue_op *per_queue_opt)
 {
 	u32 bit;
 	int i, ret = 0;
@@ -2405,7 +2407,7 @@ static int ethtool_set_per_queue_coalesce(struct net_device *dev,
 	return ret;
 }
 
-static int ethtool_set_per_queue(struct net_device *dev,
+static int noinline_for_stack ethtool_set_per_queue(struct net_device *dev,
 				 void __user *useraddr, u32 sub_cmd)
 {
 	struct ethtool_per_queue_op per_queue_opt;
-- 
2.20.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] [v2] ethtool: reduce stack usage with clang
  2019-03-07 15:58 [PATCH] [v2] ethtool: reduce stack usage with clang Arnd Bergmann
@ 2019-03-07 16:41 ` Michal Kubecek
  2019-03-07 17:46 ` David Miller
  1 sibling, 0 replies; 4+ messages in thread
From: Michal Kubecek @ 2019-03-07 16:41 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: David S. Miller, Florian Fainelli, Jakub Kicinski, Jiri Pirko,
	Wenwen Wang, netdev, linux-kernel

On Thu, Mar 07, 2019 at 04:58:35PM +0100, Arnd Bergmann wrote:
> clang inlines the dev_ethtool() more aggressively than gcc does, leading
> to a larger amount of used stack space:
> 
> net/core/ethtool.c:2536:24: error: stack frame size of 1216 bytes in function 'dev_ethtool' [-Werror,-Wframe-larger-than=]
> 
> Marking the sub-functions that require the most stack space as
> noinline_for_stack gives us reasonable behavior on all compilers.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> v2: don't annotate dev_ethtool itself, as pointed out by Michal Kubecek
> ---
>  net/core/ethtool.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 

Reviewed-by: Michal Kubecek <mkubecek@suse.cz>

In the long term, these three functions would IMHO deserve some rework.
While marking them as noinline_for_stack prevents one big stack frame in
dev_ethtool(), we still allocate rather big structures on the stack,
only it won't be in one stack frame. When we take the

  dev_ethtool
    ethtool_set_perqueue
      ethtool_[gs]et_per_queue_coalesce

path, there will still be two 512-byte arrays on the stack, one inside
struct ethtool_per_queue_op in ethtool_set_per_queue() and one as the
queue_mask bitmap in ethtool_[gs]et_per_queue_coalesce() (MAX_NUM_QUEUE
is 4096 now). In other words, actual stack consumption might be
approximately the same after this change as it was before, only divided
into three stack frames.

Michal Kubecek

> diff --git a/net/core/ethtool.c b/net/core/ethtool.c
> index d4918ffddda8..b1eb32419732 100644
> --- a/net/core/ethtool.c
> +++ b/net/core/ethtool.c
> @@ -2319,9 +2319,10 @@ static int ethtool_set_tunable(struct net_device *dev, void __user *useraddr)
>  	return ret;
>  }
>  
> -static int ethtool_get_per_queue_coalesce(struct net_device *dev,
> -					  void __user *useraddr,
> -					  struct ethtool_per_queue_op *per_queue_opt)
> +static noinline_for_stack int
> +ethtool_get_per_queue_coalesce(struct net_device *dev,
> +			       void __user *useraddr,
> +			       struct ethtool_per_queue_op *per_queue_opt)
>  {
>  	u32 bit;
>  	int ret;
> @@ -2349,9 +2350,10 @@ static int ethtool_get_per_queue_coalesce(struct net_device *dev,
>  	return 0;
>  }
>  
> -static int ethtool_set_per_queue_coalesce(struct net_device *dev,
> -					  void __user *useraddr,
> -					  struct ethtool_per_queue_op *per_queue_opt)
> +static noinline_for_stack int
> +ethtool_set_per_queue_coalesce(struct net_device *dev,
> +			       void __user *useraddr,
> +			       struct ethtool_per_queue_op *per_queue_opt)
>  {
>  	u32 bit;
>  	int i, ret = 0;
> @@ -2405,7 +2407,7 @@ static int ethtool_set_per_queue_coalesce(struct net_device *dev,
>  	return ret;
>  }
>  
> -static int ethtool_set_per_queue(struct net_device *dev,
> +static int noinline_for_stack ethtool_set_per_queue(struct net_device *dev,
>  				 void __user *useraddr, u32 sub_cmd)
>  {
>  	struct ethtool_per_queue_op per_queue_opt;

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] [v2] ethtool: reduce stack usage with clang
  2019-03-07 15:58 [PATCH] [v2] ethtool: reduce stack usage with clang Arnd Bergmann
  2019-03-07 16:41 ` Michal Kubecek
@ 2019-03-07 17:46 ` David Miller
  2019-03-07 22:03   ` Arnd Bergmann
  1 sibling, 1 reply; 4+ messages in thread
From: David Miller @ 2019-03-07 17:46 UTC (permalink / raw)
  To: arnd
  Cc: f.fainelli, jakub.kicinski, jiri, wang6495, mkubecek, netdev,
	linux-kernel

From: Arnd Bergmann <arnd@arndb.de>
Date: Thu,  7 Mar 2019 16:58:35 +0100

> clang inlines the dev_ethtool() more aggressively than gcc does, leading
> to a larger amount of used stack space:
> 
> net/core/ethtool.c:2536:24: error: stack frame size of 1216 bytes in function 'dev_ethtool' [-Werror,-Wframe-larger-than=]
> 
> Marking the sub-functions that require the most stack space as
> noinline_for_stack gives us reasonable behavior on all compilers.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> v2: don't annotate dev_ethtool itself, as pointed out by Michal Kubecek

I'll apply this, but as Michal said this is just papering over the problem,
the aggregate stack allocation is still the same and very large.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] [v2] ethtool: reduce stack usage with clang
  2019-03-07 17:46 ` David Miller
@ 2019-03-07 22:03   ` Arnd Bergmann
  0 siblings, 0 replies; 4+ messages in thread
From: Arnd Bergmann @ 2019-03-07 22:03 UTC (permalink / raw)
  To: David Miller
  Cc: Florian Fainelli, Jakub Kicinski, Jiri Pirko, Wenwen Wang,
	Michal Kubecek, Networking, Linux Kernel Mailing List

On Thu, Mar 7, 2019 at 6:46 PM David Miller <davem@davemloft.net> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
> Date: Thu,  7 Mar 2019 16:58:35 +0100
>
> > clang inlines the dev_ethtool() more aggressively than gcc does, leading
> > to a larger amount of used stack space:
> >
> > net/core/ethtool.c:2536:24: error: stack frame size of 1216 bytes in function 'dev_ethtool' [-Werror,-Wframe-larger-than=]
> >
> > Marking the sub-functions that require the most stack space as
> > noinline_for_stack gives us reasonable behavior on all compilers.
> >
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> > ---
> > v2: don't annotate dev_ethtool itself, as pointed out by Michal Kubecek
>
> I'll apply this, but as Michal said this is just papering over the problem,
> the aggregate stack allocation is still the same and very large.

Thanks,

I looked at it again and found that ethtool_get_per_queue_coalesce()
and ethtool_set_per_queue_coalesce() in fact use the same stack slots
(as one would hope) when they are both inlined, so you are both right
that this doesn't change as much for the ethtool_set_per_queue()
function as I had hoped.

On the other hand, at least it helps reduce the stack usage for all
the other sub-functions of dev_ethtool(), which now don't pile
on top of the 1216 bytes for the combined function but only
on top of the 272 bytes for the rest of dev_ethtool.

     Arnd

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-03-07 22:04 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-07 15:58 [PATCH] [v2] ethtool: reduce stack usage with clang Arnd Bergmann
2019-03-07 16:41 ` Michal Kubecek
2019-03-07 17:46 ` David Miller
2019-03-07 22:03   ` Arnd Bergmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.