linux-kernel.vger.kernel.org archive mirror
* [PATCH] netdev: Use flexible array for trailing private bytes
@ 2024-02-29 21:30 Kees Cook
  2024-02-29 22:15 ` Gustavo A. R. Silva
  2024-03-01  6:59 ` Jakub Kicinski
  0 siblings, 2 replies; 14+ messages in thread
From: Kees Cook @ 2024-02-29 21:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Kees Cook, David S. Miller, Eric Dumazet, Paolo Abeni,
	Andy Shevchenko, Gustavo A. R. Silva, netdev, linux-hardening,
	Simon Horman, Jiri Pirko, Daniel Borkmann, Coco Li,
	Amritha Nambiar, linux-kernel

Introduce a new struct net_device_priv that contains struct net_device
but also accounts for the commonly trailing bytes through the "size" and
"data" members. As many dummy struct net_device instances exist still,
it is non-trivial to but this flexible array inside struct net_device
itself. But we can add a sanity check in netdev_priv() to catch any
attempts to access the private data of a dummy struct.

Adjust allocation logic to use the new full structure.

Signed-off-by: Kees Cook <keescook@chromium.org>
---
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-hardening@vger.kernel.org
---
 include/linux/netdevice.h | 21 ++++++++++++++++++---
 net/core/dev.c            | 12 ++++--------
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 118c40258d07..b476809d0bae 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1815,6 +1815,8 @@ enum netdev_stat_type {
 	NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */
 };
 
+#define	NETDEV_ALIGN		32
+
 /**
  *	struct net_device - The DEVICE structure.
  *
@@ -2476,6 +2478,14 @@ struct net_device {
 	struct hlist_head	page_pools;
 #endif
 };
+
+struct net_device_priv {
+	struct net_device	dev;
+	u32			size;
+	u8			data[] __counted_by(size)
+				       __aligned(NETDEV_ALIGN);
+};
+
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
 /*
@@ -2496,8 +2506,6 @@ static inline bool netif_elide_gro(const struct net_device *dev)
 	return false;
 }
 
-#define	NETDEV_ALIGN		32
-
 static inline
 int netdev_get_prio_tc_map(const struct net_device *dev, u32 prio)
 {
@@ -2665,7 +2673,14 @@ void dev_net_set(struct net_device *dev, struct net *net)
  */
 static inline void *netdev_priv(const struct net_device *dev)
 {
-	return (char *)dev + ALIGN(sizeof(struct net_device), NETDEV_ALIGN);
+	struct net_device_priv *priv;
+
+	/* Dummy struct net_device have no trailing data. */
+	if (WARN_ON_ONCE(dev->reg_state == NETREG_DUMMY))
+		return NULL;
+
+	priv = container_of(dev, struct net_device_priv, dev);
+	return (u8 *)priv->data;
 }
 
 /* Set the sysfs physical device reference for the network logical device
diff --git a/net/core/dev.c b/net/core/dev.c
index cb2dab0feee0..0fcaf6ae8486 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10800,7 +10800,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 {
 	struct net_device *dev;
 	unsigned int alloc_size;
-	struct net_device *p;
+	struct net_device_priv *p;
 
 	BUG_ON(strlen(name) >= sizeof(dev->name));
 
@@ -10814,20 +10814,16 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		return NULL;
 	}
 
-	alloc_size = sizeof(struct net_device);
-	if (sizeof_priv) {
-		/* ensure 32-byte alignment of private area */
-		alloc_size = ALIGN(alloc_size, NETDEV_ALIGN);
-		alloc_size += sizeof_priv;
-	}
+	alloc_size = struct_size(p, data, sizeof_priv);
 	/* ensure 32-byte alignment of whole construct */
 	alloc_size += NETDEV_ALIGN - 1;
 
 	p = kvzalloc(alloc_size, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
 	if (!p)
 		return NULL;
+	p->size = sizeof_priv;
 
-	dev = PTR_ALIGN(p, NETDEV_ALIGN);
+	dev = &PTR_ALIGN(p, NETDEV_ALIGN)->dev;
 	dev->padded = (char *)dev - (char *)p;
 
 	ref_tracker_dir_init(&dev->refcnt_tracker, 128, name);
-- 
2.34.1
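
For readers unfamiliar with the pattern, the layout proposed above can be sketched in userspace C. This is only an illustration under stated assumptions: struct fake_dev, struct fake_dev_priv, to_priv(), fake_alloc(), fake_priv() and fake_free() are hypothetical names that do not exist in the kernel, the size computation is an open-coded stand-in for struct_size(), and the NETDEV_ALIGN handling from the patch is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net_device; the real one is far larger. */
struct fake_dev {
	int reg_state;
	char name[16];
};

/* Same shape as the proposed struct net_device_priv, minus alignment. */
struct fake_dev_priv {
	struct fake_dev dev;
	uint32_t size;		/* number of bytes available in data[] */
	unsigned char data[];	/* trailing private area */
};

/* Userspace equivalent of container_of() for this one member. */
static struct fake_dev_priv *to_priv(const struct fake_dev *dev)
{
	assert(dev != NULL);
	return (struct fake_dev_priv *)((char *)dev -
					offsetof(struct fake_dev_priv, dev));
}

/* Allocate the header plus sizeof_priv zeroed trailing bytes. */
static struct fake_dev *fake_alloc(size_t sizeof_priv)
{
	struct fake_dev_priv *p;

	/* Refuse sizes that do not fit in "size" or would overflow the sum. */
	if (sizeof_priv > UINT32_MAX - sizeof(*p))
		return NULL;

	p = calloc(1, sizeof(*p) + sizeof_priv);
	if (!p)
		return NULL;

	p->size = (uint32_t)sizeof_priv;
	return &p->dev;
}

/* Return the private area through the named member, not pointer math. */
static void *fake_priv(const struct fake_dev *dev)
{
	return to_priv(dev)->data;
}

static void fake_free(struct fake_dev *dev)
{
	free(to_priv(dev));
}
```

The point of the named data[] member is that the private area has a declared location and a recorded length, so tooling that understands the length annotation can in principle bounds-check accesses to it.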



* Re: [PATCH] netdev: Use flexible array for trailing private bytes
  2024-02-29 21:30 [PATCH] netdev: Use flexible array for trailing private bytes Kees Cook
@ 2024-02-29 22:15 ` Gustavo A. R. Silva
  2024-03-01  6:59 ` Jakub Kicinski
  1 sibling, 0 replies; 14+ messages in thread
From: Gustavo A. R. Silva @ 2024-02-29 22:15 UTC (permalink / raw)
  To: Kees Cook, Jakub Kicinski
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Gustavo A. R. Silva, netdev, linux-hardening, Simon Horman,
	Jiri Pirko, Daniel Borkmann, Coco Li, Amritha Nambiar,
	linux-kernel



On 2/29/24 15:30, Kees Cook wrote:
> Introduce a new struct net_device_priv that contains struct net_device
> but also accounts for the commonly trailing bytes through the "size" and
> "data" members. As many dummy struct net_device instances exist still,
> it is non-trivial to but this flexible array inside struct net_device
> itself. But we can add a sanity check in netdev_priv() to catch any
> attempts to access the private data of a dummy struct.
> 
> Adjust allocation logic to use the new full structure.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
[for the flex `struct net_device_priv`, `struct_size()`, `__counted_by()`,
and the use of `container_of()` to retrieve a pointer to the flex struct
and return pointer to flex-array member `data` in `netdev_priv()`]

Thanks
--
Gustavo

> ---
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
> Cc: netdev@vger.kernel.org
> Cc: linux-hardening@vger.kernel.org
> ---
>   include/linux/netdevice.h | 21 ++++++++++++++++++---
>   net/core/dev.c            | 12 ++++--------
>   2 files changed, 22 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 118c40258d07..b476809d0bae 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1815,6 +1815,8 @@ enum netdev_stat_type {
>   	NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */
>   };
>   
> +#define	NETDEV_ALIGN		32
> +
>   /**
>    *	struct net_device - The DEVICE structure.
>    *
> @@ -2476,6 +2478,14 @@ struct net_device {
>   	struct hlist_head	page_pools;
>   #endif
>   };
> +
> +struct net_device_priv {
> +	struct net_device	dev;
> +	u32			size;
> +	u8			data[] __counted_by(size)
> +				       __aligned(NETDEV_ALIGN);
> +};
> +
>   #define to_net_dev(d) container_of(d, struct net_device, dev)
>   
>   /*
> @@ -2496,8 +2506,6 @@ static inline bool netif_elide_gro(const struct net_device *dev)
>   	return false;
>   }
>   
> -#define	NETDEV_ALIGN		32
> -
>   static inline
>   int netdev_get_prio_tc_map(const struct net_device *dev, u32 prio)
>   {
> @@ -2665,7 +2673,14 @@ void dev_net_set(struct net_device *dev, struct net *net)
>    */
>   static inline void *netdev_priv(const struct net_device *dev)
>   {
> -	return (char *)dev + ALIGN(sizeof(struct net_device), NETDEV_ALIGN);
> +	struct net_device_priv *priv;
> +
> +	/* Dummy struct net_device have no trailing data. */
> +	if (WARN_ON_ONCE(dev->reg_state == NETREG_DUMMY))
> +		return NULL;
> +
> +	priv = container_of(dev, struct net_device_priv, dev);
> +	return (u8 *)priv->data;
>   }
>   
>   /* Set the sysfs physical device reference for the network logical device
> diff --git a/net/core/dev.c b/net/core/dev.c
> index cb2dab0feee0..0fcaf6ae8486 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -10800,7 +10800,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
>   {
>   	struct net_device *dev;
>   	unsigned int alloc_size;
> -	struct net_device *p;
> +	struct net_device_priv *p;
>   
>   	BUG_ON(strlen(name) >= sizeof(dev->name));
>   
> @@ -10814,20 +10814,16 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
>   		return NULL;
>   	}
>   
> -	alloc_size = sizeof(struct net_device);
> -	if (sizeof_priv) {
> -		/* ensure 32-byte alignment of private area */
> -		alloc_size = ALIGN(alloc_size, NETDEV_ALIGN);
> -		alloc_size += sizeof_priv;
> -	}
> +	alloc_size = struct_size(p, data, sizeof_priv);
>   	/* ensure 32-byte alignment of whole construct */
>   	alloc_size += NETDEV_ALIGN - 1;
>   
>   	p = kvzalloc(alloc_size, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL);
>   	if (!p)
>   		return NULL;
> +	p->size = sizeof_priv;
>   
> -	dev = PTR_ALIGN(p, NETDEV_ALIGN);
> +	dev = &PTR_ALIGN(p, NETDEV_ALIGN)->dev;
>   	dev->padded = (char *)dev - (char *)p;
>   
>   	ref_tracker_dir_init(&dev->refcnt_tracker, 128, name);


* Re: [PATCH] netdev: Use flexible array for trailing private bytes
  2024-02-29 21:30 [PATCH] netdev: Use flexible array for trailing private bytes Kees Cook
  2024-02-29 22:15 ` Gustavo A. R. Silva
@ 2024-03-01  6:59 ` Jakub Kicinski
  2024-03-01  8:03   ` Eric Dumazet
                     ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: Jakub Kicinski @ 2024-03-01  6:59 UTC (permalink / raw)
  To: Kees Cook
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Gustavo A. R. Silva, netdev, linux-hardening, Simon Horman,
	Jiri Pirko, Daniel Borkmann, Coco Li, Amritha Nambiar,
	linux-kernel

On Thu, 29 Feb 2024 13:30:22 -0800 Kees Cook wrote:
> Introduce a new struct net_device_priv that contains struct net_device
> but also accounts for the commonly trailing bytes through the "size" and
> "data" members.

I'm a bit unclear on the benefit. Perhaps I'm unaccustomed to "safe C".

> As many dummy struct net_device instances exist still,
> it is non-trivial to but this flexible array inside struct net_device

put

Non-trivial, meaning what's the challenge?
We also do somewhat silly things with netdev lifetime, because we can't
assume netdev gets freed by netdev_free(). Cleaning up the "embedders"
would be beneficial for multiple reasons.

> itself. But we can add a sanity check in netdev_priv() to catch any
> attempts to access the private data of a dummy struct.
> 
> Adjust allocation logic to use the new full structure.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>

> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 118c40258d07..b476809d0bae 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1815,6 +1815,8 @@ enum netdev_stat_type {
>  	NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */
>  };
>  
> +#define	NETDEV_ALIGN		32

Unless someone knows what this is for it should go.
Align priv to cacheline size.

>  /**
>   *	struct net_device - The DEVICE structure.
>   *

> @@ -2665,7 +2673,14 @@ void dev_net_set(struct net_device *dev, struct net *net)
>   */
>  static inline void *netdev_priv(const struct net_device *dev)
>  {
> -	return (char *)dev + ALIGN(sizeof(struct net_device), NETDEV_ALIGN);
> +	struct net_device_priv *priv;
> +
> +	/* Dummy struct net_device have no trailing data. */
> +	if (WARN_ON_ONCE(dev->reg_state == NETREG_DUMMY))
> +		return NULL;

This is a static inline with roughly 11,000 call sites, according to 
a quick grep. Aren't WARN_ONCE() in static inlines creating a "once"
object in every compilation unit where they get used?
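
The concern can be illustrated with a userspace imitation of a "warn once" macro. This is a sketch, not the kernel's definition: WARN_ONCE_SKETCH(), warn_once_at(), warn_count, site_a() and site_b() are hypothetical names. It only shows that each expansion of such a macro carries its own static flag within one file; how that adds up for a static inline used across many compilation units is exactly the question raised here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warn_count;	/* total warnings printed, for demonstration */

/* Print a warning the first time cond is true for this particular flag. */
static void warn_once_at(bool cond, bool *warned, const char *where)
{
	assert(warned != NULL);
	if (cond && !*warned) {
		*warned = true;
		warn_count++;
		fprintf(stderr, "warning at %s\n", where);
	}
}

/*
 * Each expansion declares its own static flag, so "once" means
 * once per expansion site, not once per program.
 */
#define WARN_ONCE_SKETCH(cond, where)				\
	do {							\
		static bool warned_;				\
		warn_once_at((cond), &warned_, (where));	\
	} while (0)

static void site_a(bool bad)
{
	WARN_ONCE_SKETCH(bad, "site_a");
}

static void site_b(bool bad)
{
	WARN_ONCE_SKETCH(bad, "site_b");
}
```

Calling site_a(true) twice prints one warning, and a later site_b(true) prints a second one, because the two expansions do not share a flag.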

> +	priv = container_of(dev, struct net_device_priv, dev);
> +	return (u8 *)priv->data;
>  }


* Re: [PATCH] netdev: Use flexible array for trailing private bytes
  2024-03-01  6:59 ` Jakub Kicinski
@ 2024-03-01  8:03   ` Eric Dumazet
  2024-03-01 12:58     ` Alexander Lobakin
  2024-03-01 11:41   ` Greg KH
  2024-03-06 13:16   ` Breno Leitao
  2 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2024-03-01  8:03 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Kees Cook, David S. Miller, Paolo Abeni, Andy Shevchenko,
	Gustavo A. R. Silva, netdev, linux-hardening, Simon Horman,
	Jiri Pirko, Daniel Borkmann, Coco Li, Amritha Nambiar,
	linux-kernel

On Fri, Mar 1, 2024 at 7:59 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 29 Feb 2024 13:30:22 -0800 Kees Cook wrote:

> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 118c40258d07..b476809d0bae 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -1815,6 +1815,8 @@ enum netdev_stat_type {
> >       NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */
> >  };
> >
> > +#define      NETDEV_ALIGN            32
>
> Unless someone knows what this is for it should go.
> Align priv to cacheline size.

+2

#define NETDEV_ALIGN    L1_CACHE_BYTES

or a general replacement of NETDEV_ALIGN....


* Re: [PATCH] netdev: Use flexible array for trailing private bytes
  2024-03-01  6:59 ` Jakub Kicinski
  2024-03-01  8:03   ` Eric Dumazet
@ 2024-03-01 11:41   ` Greg KH
  2024-03-06 13:16   ` Breno Leitao
  2 siblings, 0 replies; 14+ messages in thread
From: Greg KH @ 2024-03-01 11:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Kees Cook, David S. Miller, Eric Dumazet, Paolo Abeni,
	Andy Shevchenko, Gustavo A. R. Silva, netdev, linux-hardening,
	Simon Horman, Jiri Pirko, Daniel Borkmann, Coco Li,
	Amritha Nambiar, linux-kernel

On Thu, Feb 29, 2024 at 10:59:10PM -0800, Jakub Kicinski wrote:
> On Thu, 29 Feb 2024 13:30:22 -0800 Kees Cook wrote:
> > Introduce a new struct net_device_priv that contains struct net_device
> > but also accounts for the commonly trailing bytes through the "size" and
> > "data" members.
> 
> I'm a bit unclear on the benefit. Perhaps I'm unaccustomed to "safe C".
> 
> > As many dummy struct net_device instances exist still,
> > it is non-trivial to but this flexible array inside struct net_device
> 
> put
> 
> Non-trivial, meaning what's the challenge?
> We also do somewhat silly things with netdev lifetime, because we can't
> assume netdev gets freed by netdev_free(). Cleaning up the "embedders"
> would be beneficial for multiple reasons.
> 
> > itself. But we can add a sanity check in netdev_priv() to catch any
> > attempts to access the private data of a dummy struct.
> > 
> > Adjust allocation logic to use the new full structure.
> > 
> > Signed-off-by: Kees Cook <keescook@chromium.org>
> 
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 118c40258d07..b476809d0bae 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -1815,6 +1815,8 @@ enum netdev_stat_type {
> >  	NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */
> >  };
> >  
> > +#define	NETDEV_ALIGN		32
> 
> Unless someone knows what this is for it should go.
> Align priv to cacheline size.
> 
> >  /**
> >   *	struct net_device - The DEVICE structure.
> >   *
> 
> > @@ -2665,7 +2673,14 @@ void dev_net_set(struct net_device *dev, struct net *net)
> >   */
> >  static inline void *netdev_priv(const struct net_device *dev)
> >  {
> > -	return (char *)dev + ALIGN(sizeof(struct net_device), NETDEV_ALIGN);
> > +	struct net_device_priv *priv;
> > +
> > +	/* Dummy struct net_device have no trailing data. */
> > +	if (WARN_ON_ONCE(dev->reg_state == NETREG_DUMMY))
> > +		return NULL;
> 
> This is a static inline with roughly 11,000 call sites, according to 
> a quick grep. Aren't WARN_ONCE() in static inlines creating a "once"
> object in every compilation unit where they get used?

Also, if this ever trips, it will reboot the box for those who run
with panic-on-warn set. Is that something that you all really want?

thanks,

greg k-h


* Re: [PATCH] netdev: Use flexible array for trailing private bytes
  2024-03-01  8:03   ` Eric Dumazet
@ 2024-03-01 12:58     ` Alexander Lobakin
  2024-03-01 13:25       ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Lobakin @ 2024-03-01 12:58 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Kees Cook
  Cc: David S. Miller, Paolo Abeni, Andy Shevchenko,
	Gustavo A. R. Silva, netdev, linux-hardening, Simon Horman,
	Jiri Pirko, Daniel Borkmann, Coco Li, Amritha Nambiar,
	linux-kernel

From: Eric Dumazet <edumazet@google.com>
Date: Fri, 1 Mar 2024 09:03:55 +0100

> On Fri, Mar 1, 2024 at 7:59 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Thu, 29 Feb 2024 13:30:22 -0800 Kees Cook wrote:

Re WARN_ONCE() in netdev_priv(): netdev_priv() is VERY hot, I'm not sure
we want to add checks there. Maybe under CONFIG_DEBUG_NET?
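
A minimal sketch of what gating the check behind a debug option could look like, written as plain userspace C. Everything here is an assumption for illustration: SKETCH_DEBUG_NET stands in for a config option such as CONFIG_DEBUG_NET, and struct sketch_dev, SKETCH_REG_DUMMY and sketch_priv() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a debug config option; defaults to off. */
#ifndef SKETCH_DEBUG_NET
#define SKETCH_DEBUG_NET 0
#endif

enum { SKETCH_REG_NORMAL, SKETCH_REG_DUMMY };

/* Hypothetical device with a fixed private area, to keep the sketch small. */
struct sketch_dev {
	int reg_state;
	unsigned char priv[32];
};

static inline void *sketch_priv(struct sketch_dev *dev)
{
	assert(dev != NULL);
#if SKETCH_DEBUG_NET
	/* Only debug builds pay for this branch on the hot path. */
	if (dev->reg_state == SKETCH_REG_DUMMY)
		return NULL;
#endif
	return dev->priv;
}
```

With the option off, the accessor compiles down to the address computation alone, which is the property being asked for on a hot path.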

> 
>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>> index 118c40258d07..b476809d0bae 100644
>>> --- a/include/linux/netdevice.h
>>> +++ b/include/linux/netdevice.h
>>> @@ -1815,6 +1815,8 @@ enum netdev_stat_type {
>>>       NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */
>>>  };
>>>
>>> +#define      NETDEV_ALIGN            32
>>
>> Unless someone knows what this is for it should go.
>> Align priv to cacheline size.
> 
> +2
> 

Maybe

> #define NETDEV_ALIGN    L1_CACHE_BYTES

#define NETDEV_ALIGN	max(SMP_CACHE_BYTES, 32)

?

(or even max(1 << INTERNODE_CACHE_SHIFT, 32))

> 
> or a general replacement of NETDEV_ALIGN....
> 
> 

+ I'd align both struct net_device AND its private space to
%NETDEV_ALIGN and remove this weird PTR_ALIGN. {k,v}malloc ensures
natural alignment of allocations for at least a couple years already
(IOW if struct net_device is aligned to 64, {k,v}malloc will *always*
return a 64-byte aligned address).
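
For reference, the over-allocate-and-round-up trick that PTR_ALIGN performs in alloc_netdev_mqs() can be sketched in userspace as follows. The names SKETCH_ALIGN, align_up(), struct aligned_buf and aligned_buf_alloc() are hypothetical, and calloc() stands in for kvzalloc(); the sketch makes no claim about what alignment the kernel allocators guarantee.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ALIGN 32u

/* Round addr up to the next multiple of align, which must be a power of two. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~(align - 1);
}

struct aligned_buf {
	void *raw;	/* pointer returned by calloc(); pass this to free() */
	void *aligned;	/* first SKETCH_ALIGN-aligned address inside raw */
	size_t padded;	/* aligned - raw, comparable to dev->padded */
};

/* Over-allocate by SKETCH_ALIGN - 1 bytes so an aligned start always fits. */
static int aligned_buf_alloc(struct aligned_buf *b, size_t size)
{
	if (size > SIZE_MAX - (SKETCH_ALIGN - 1))
		return -1;

	b->raw = calloc(1, size + SKETCH_ALIGN - 1);
	if (!b->raw)
		return -1;

	b->aligned = (void *)align_up((uintptr_t)b->raw, SKETCH_ALIGN);
	b->padded = (size_t)((char *)b->aligned - (char *)b->raw);
	return 0;
}
```

If the allocator already returns suitably aligned memory, padded is always zero and both the extra bytes and the bookkeeping field become unnecessary, which is the simplification suggested above.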

Thanks,
Olek


* Re: [PATCH] netdev: Use flexible array for trailing private bytes
  2024-03-01 12:58     ` Alexander Lobakin
@ 2024-03-01 13:25       ` Eric Dumazet
  2024-03-01 14:30         ` Alexander Lobakin
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2024-03-01 13:25 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Jakub Kicinski, Kees Cook, David S. Miller, Paolo Abeni,
	Andy Shevchenko, Gustavo A. R. Silva, netdev, linux-hardening,
	Simon Horman, Jiri Pirko, Daniel Borkmann, Coco Li,
	Amritha Nambiar, linux-kernel

On Fri, Mar 1, 2024 at 1:59 PM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
> Date: Fri, 1 Mar 2024 09:03:55 +0100
>
> > On Fri, Mar 1, 2024 at 7:59 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >>
> >> On Thu, 29 Feb 2024 13:30:22 -0800 Kees Cook wrote:
>
> Re WARN_ONCE() in netdev_priv(): netdev_priv() is VERY hot, I'm not sure
> we want to add checks there. Maybe under CONFIG_DEBUG_NET?
>
> >
> >>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> >>> index 118c40258d07..b476809d0bae 100644
> >>> --- a/include/linux/netdevice.h
> >>> +++ b/include/linux/netdevice.h
> >>> @@ -1815,6 +1815,8 @@ enum netdev_stat_type {
> >>>       NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */
> >>>  };
> >>>
> >>> +#define      NETDEV_ALIGN            32
> >>
> >> Unless someone knows what this is for it should go.
> >> Align priv to cacheline size.
> >
> > +2
> >
>
> Maybe
>
> > #define NETDEV_ALIGN    L1_CACHE_BYTES
>
> #define NETDEV_ALIGN    max(SMP_CACHE_BYTES, 32)

Why would we care if some arches have a very small SMP_CACHE_BYTES ?
Bet it !

IMO nothing in networking mandates this minimal 32 byte alignment.

>
> ?
>
> (or even max(1 << INTERNODE_CACHE_SHIFT, 32))

I do not think so.

INTERNODE_CACHE_SHIFT is a bit extreme on allyesconfig on x86 :/
(with CONFIG_X86_VSMP=y)


>
> >
> > or a general replacement of NETDEV_ALIGN....
> >
> >
>
> + I'd align both struct net_device AND its private space to
> %NETDEV_ALIGN and remove this weird PTR_ALIGN. {k,v}malloc ensures
> natural alignment of allocations for at least a couple years already
> (IOW if struct net_device is aligned to 64, {k,v}malloc will *always*
> return a 64-byte aligned address).

I think that with SLAB or SLOB in the past with some DEBUG options
there was no such guarantee.

But this is probably no longer the case, and heavy DEBUG options these
days (KASAN, KFENCE...)
do not expect fast networking anyway.


* Re: [PATCH] netdev: Use flexible array for trailing private bytes
  2024-03-01 13:25       ` Eric Dumazet
@ 2024-03-01 14:30         ` Alexander Lobakin
  2024-03-01 17:35           ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Lobakin @ 2024-03-01 14:30 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Jakub Kicinski, Kees Cook, David S. Miller, Paolo Abeni,
	Andy Shevchenko, Gustavo A. R. Silva, netdev, linux-hardening,
	Simon Horman, Jiri Pirko, Daniel Borkmann, Coco Li,
	Amritha Nambiar, linux-kernel

From: Eric Dumazet <edumazet@google.com>
Date: Fri, 1 Mar 2024 14:25:37 +0100

> On Fri, Mar 1, 2024 at 1:59 PM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
>>
>> From: Eric Dumazet <edumazet@google.com>
>> Date: Fri, 1 Mar 2024 09:03:55 +0100
>>
>>> On Fri, Mar 1, 2024 at 7:59 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>>
>>>> On Thu, 29 Feb 2024 13:30:22 -0800 Kees Cook wrote:
>>
>> Re WARN_ONCE() in netdev_priv(): netdev_priv() is VERY hot, I'm not sure
>> we want to add checks there. Maybe under CONFIG_DEBUG_NET?
>>
>>>
>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>> index 118c40258d07..b476809d0bae 100644
>>>>> --- a/include/linux/netdevice.h
>>>>> +++ b/include/linux/netdevice.h
>>>>> @@ -1815,6 +1815,8 @@ enum netdev_stat_type {
>>>>>       NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */
>>>>>  };
>>>>>
>>>>> +#define      NETDEV_ALIGN            32
>>>>
>>>> Unless someone knows what this is for it should go.
>>>> Align priv to cacheline size.
>>>
>>> +2
>>>
>>
>> Maybe
>>
>>> #define NETDEV_ALIGN    L1_CACHE_BYTES
>>
>> #define NETDEV_ALIGN    max(SMP_CACHE_BYTES, 32)
> 
> Why would we care if some arches have a very small SMP_CACHE_BYTES ?

Oh sorry, I thought %SMP_CACHE_BYTES is 1 when !SMP.
We can then just add ____cacheline_aligned to both struct net_device and
its ::priv flex array and that's it.

I like the idea of declaring priv explicitly rather than doing size +
ptr magic. But maybe we could just add this flex array to struct
net_device and avoid introducing a new structure.

> Bet it !
> 
> IMO nothing in networking mandates this minimal 32 byte alignment.
> 
>>
>> ?
>>
>> (or even max(1 << INTERNODE_CACHE_SHIFT, 32))
> 
> I do not think so.
> 
> INTERNODE_CACHE_SHIFT is a bit extreme on allyesconfig on x86 :/
> (with CONFIG_X86_VSMP=y)
> 
> 
>>
>>>
>>> or a general replacement of NETDEV_ALIGN....
>>>
>>>
>>
>> + I'd align both struct net_device AND its private space to
>> %NETDEV_ALIGN and remove this weird PTR_ALIGN. {k,v}malloc ensures
>> natural alignment of allocations for at least a couple years already
>> (IOW if struct net_device is aligned to 64, {k,v}malloc will *always*
>> return a 64-byte aligned address).
> 
> I think that with SLAB or SLOB in the past with some DEBUG options
> there was no such guarantee.
> 
> But this is probably no longer the case, and heavy DEBUG options these
> days (KASAN, KFENCE...)
> do not expect fast networking anyway.

Thanks,
Olek


* Re: [PATCH] netdev: Use flexible array for trailing private bytes
  2024-03-01 14:30         ` Alexander Lobakin
@ 2024-03-01 17:35           ` Jakub Kicinski
  2024-03-04 14:32             ` Alexander Lobakin
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-03-01 17:35 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Eric Dumazet, Kees Cook, David S. Miller, Paolo Abeni,
	Andy Shevchenko, Gustavo A. R. Silva, netdev, linux-hardening,
	Simon Horman, Jiri Pirko, Daniel Borkmann, Coco Li,
	Amritha Nambiar, linux-kernel

On Fri, 1 Mar 2024 15:30:03 +0100 Alexander Lobakin wrote:
> I like the idea of declaring priv explicitly rather than doing size +
> ptr magic. But maybe we could just add this flex array to struct
> net_device and avoid introducing a new structure.

100% I should have linked to the thread that led to Kees's work.
Adding directly to net_device would be way better but there's
a handful of drivers which embed the struct.
If we can switch them to dynamic allocation, that'd be great.
And, as you may be alluding to, it removes the need for the WARN_ON()
entirely as well.


* Re: [PATCH] netdev: Use flexible array for trailing private bytes
  2024-03-01 17:35           ` Jakub Kicinski
@ 2024-03-04 14:32             ` Alexander Lobakin
  2024-03-04 15:24               ` Jakub Kicinski
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Lobakin @ 2024-03-04 14:32 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Eric Dumazet, Kees Cook, David S. Miller, Paolo Abeni,
	Andy Shevchenko, Gustavo A. R. Silva, netdev, linux-hardening,
	Simon Horman, Jiri Pirko, Daniel Borkmann, Coco Li,
	Amritha Nambiar, linux-kernel

From: Jakub Kicinski <kuba@kernel.org>
Date: Fri, 1 Mar 2024 09:35:17 -0800

> On Fri, 1 Mar 2024 15:30:03 +0100 Alexander Lobakin wrote:
>> I like the idea of declaring priv explicitly rather than doing size +
>> ptr magic. But maybe we could just add this flex array to struct
>> net_device and avoid introducing a new structure.
> 
> 100% I should have linked to the thread that led to Kees's work.
> Adding directly to net_device would be way better but there's
> a handful of drivers which embed the struct.

I think it's okay to embed a struct with flex array at the end as long
as it's not used? Or the compiler will say that the flex array is not at
the end of the structure?

> If we can switch them to dynamic allocation, that'd be great.

It's mega weird to embed &net_device rather than do alloc_*dev() >_<

> And, as you may be alluding to, it removes the need for the WARN_ON()
> entirely as well.

Thanks,
Olek


* Re: [PATCH] netdev: Use flexible array for trailing private bytes
  2024-03-04 14:32             ` Alexander Lobakin
@ 2024-03-04 15:24               ` Jakub Kicinski
  0 siblings, 0 replies; 14+ messages in thread
From: Jakub Kicinski @ 2024-03-04 15:24 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Eric Dumazet, Kees Cook, David S. Miller, Paolo Abeni,
	Andy Shevchenko, Gustavo A. R. Silva, netdev, linux-hardening,
	Simon Horman, Jiri Pirko, Daniel Borkmann, Coco Li,
	Amritha Nambiar, linux-kernel

On Mon, 4 Mar 2024 15:32:51 +0100 Alexander Lobakin wrote:
> > 100% I should have linked to the thread that led to Kees's work.
> > Adding directly to net_device would be way better but there's
> > a handful of drivers which embed the struct.  
> 
> I think it's okay to embed a struct with flex array at the end as long
> as it's not used? Or the compiler will say that the flex array is not at
> the end of the structure?

Technically, yes. Practically it ties the lifetime of a refcounted
object to something semi-related with different lifetime rules :(


* Re: [PATCH] netdev: Use flexible array for trailing private bytes
  2024-03-01  6:59 ` Jakub Kicinski
  2024-03-01  8:03   ` Eric Dumazet
  2024-03-01 11:41   ` Greg KH
@ 2024-03-06 13:16   ` Breno Leitao
  2024-03-06 15:06     ` Jakub Kicinski
  2 siblings, 1 reply; 14+ messages in thread
From: Breno Leitao @ 2024-03-06 13:16 UTC (permalink / raw)
  To: Jakub Kicinski, keescook
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Gustavo A. R. Silva, netdev, linux-hardening, Simon Horman,
	Jiri Pirko, Daniel Borkmann, Coco Li, Amritha Nambiar,
	linux-kernel

On Thu, Feb 29, 2024 at 10:59:10PM -0800, Jakub Kicinski wrote:
> On Thu, 29 Feb 2024 13:30:22 -0800 Kees Cook wrote:
> > Introduce a new struct net_device_priv that contains struct net_device
> > but also accounts for the commonly trailing bytes through the "size" and
> > "data" members.
> 
> I'm a bit unclear on the benefit. Perhaps I'm unaccustomed to "safe C".
> 
> > As many dummy struct net_device instances exist still,
> > it is non-trivial to but this flexible array inside struct net_device
> 
> put
> 
> Non-trivial, meaning what's the challenge?
> We also do somewhat silly things with netdev lifetime, because we can't
> assume netdev gets freed by netdev_free(). Cleaning up the "embedders"
> would be beneficial for multiple reasons.

I've been looking at some of these embedders as reported by Kees[1], and
most of them are for dummy interfaces, i.e., they are basically used to
schedule NAPI polls.

From that list[1], most of the drivers match:

	# git grep init_dummy_netdev

That said, do you think it is still worth cleaning up embedders for
dummy net_devices?

[1] https://lore.kernel.org/all/202402281554.C1CEEF744@keescook/


* Re: [PATCH] netdev: Use flexible array for trailing private bytes
  2024-03-06 13:16   ` Breno Leitao
@ 2024-03-06 15:06     ` Jakub Kicinski
  2024-03-06 23:42       ` Kees Cook
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2024-03-06 15:06 UTC (permalink / raw)
  To: Breno Leitao, keescook
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Andy Shevchenko,
	Gustavo A. R. Silva, netdev, linux-hardening, Simon Horman,
	Jiri Pirko, Daniel Borkmann, Coco Li, Amritha Nambiar,
	linux-kernel

On Wed, 6 Mar 2024 05:16:16 -0800 Breno Leitao wrote:
> I've been looking at some of these embedders as reported by Kees[1], and
> most of them are for dummy interfaces, i.e., they are basically used to
> schedule NAPI polls.
> 
> From that list[1], most of the drivers match:
> 
> 	# git grep init_dummy_netdev
> 
> That said, do you think it is still worth cleaning up embedders for
> dummy net_devices?
> 
> [1] https://lore.kernel.org/all/202402281554.C1CEEF744@keescook/

Yes, I think so.
Kees, did you plan to send a v2? Otherwise I can put the cleanup on our
"public ToDo" list :)


* Re: [PATCH] netdev: Use flexible array for trailing private bytes
  2024-03-06 15:06     ` Jakub Kicinski
@ 2024-03-06 23:42       ` Kees Cook
  0 siblings, 0 replies; 14+ messages in thread
From: Kees Cook @ 2024-03-06 23:42 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Breno Leitao, David S. Miller, Eric Dumazet, Paolo Abeni,
	Andy Shevchenko, Gustavo A. R. Silva, netdev, linux-hardening,
	Simon Horman, Jiri Pirko, Daniel Borkmann, Coco Li,
	Amritha Nambiar, linux-kernel

On Wed, Mar 06, 2024 at 07:06:58AM -0800, Jakub Kicinski wrote:
> On Wed, 6 Mar 2024 05:16:16 -0800 Breno Leitao wrote:
> > I've been looking at some of these embedders as reported by Kees[1], and
> > most of them are for dummy interfaces, i.e., they are basically used to
> > schedule NAPI polls.
> > 
> > From that list[1], most of the drivers match:
> > 
> > 	# git grep init_dummy_netdev
> > 
> > That said, do you think it is still worth cleaning up embedders for
> > dummy net_devices?
> > 
> > [1] https://lore.kernel.org/all/202402281554.C1CEEF744@keescook/
> 
> Yes, I think so.
> Kees, did you plan to send a v2? Otherwise I can put the cleanup on our
> "public ToDo" list :)

I found the requested collateral changes that popped out of v1 to be
rather a bit much for me to tackle right now, so I think adding to the
TODO list is probably best. :)

-Kees

-- 
Kees Cook

