All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table.
@ 2022-06-26  8:23 Kuniyuki Iwashima
  2022-06-26 16:43 ` Eric W. Biederman
  0 siblings, 1 reply; 13+ messages in thread
From: Kuniyuki Iwashima @ 2022-06-26  8:23 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Eric W. Biederman, Pavel Emelyanov, Herbert Xu,
	Kuniyuki Iwashima, Kuniyuki Iwashima, netdev

While setting up init_net's sysctl table, we need not duplicate the global
table and can use it directly.

Fixes: 1597fbc0faf8 ("[UNIX]: Make the unix sysctl tables per-namespace")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
v2:
  * Fix NULL comparison style by checkpatch.pl

v1: https://lore.kernel.org/all/20220626074454.28944-1-kuniyu@amazon.com/
---
---
 net/unix/sysctl_net_unix.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/unix/sysctl_net_unix.c b/net/unix/sysctl_net_unix.c
index 01d44e2598e2..4bd856d05135 100644
--- a/net/unix/sysctl_net_unix.c
+++ b/net/unix/sysctl_net_unix.c
@@ -26,11 +26,16 @@ int __net_init unix_sysctl_register(struct net *net)
 {
 	struct ctl_table *table;
 
-	table = kmemdup(unix_table, sizeof(unix_table), GFP_KERNEL);
-	if (table == NULL)
-		goto err_alloc;
+	if (net_eq(net, &init_net)) {
+		table = unix_table;
+	} else {
+		table = kmemdup(unix_table, sizeof(unix_table), GFP_KERNEL);
+		if (!table)
+			goto err_alloc;
+
+		table[0].data = &net->unx.sysctl_max_dgram_qlen;
+	}
 
-	table[0].data = &net->unx.sysctl_max_dgram_qlen;
 	net->unx.ctl = register_net_sysctl(net, "net/unix", table);
 	if (net->unx.ctl == NULL)
 		goto err_reg;
@@ -38,7 +43,8 @@ int __net_init unix_sysctl_register(struct net *net)
 	return 0;
 
 err_reg:
-	kfree(table);
+	if (net_eq(net, &init_net))
+		kfree(table);
 err_alloc:
 	return -ENOMEM;
 }
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table.
  2022-06-26  8:23 [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table Kuniyuki Iwashima
@ 2022-06-26 16:43 ` Eric W. Biederman
  2022-06-27 17:58   ` Jakub Kicinski
  0 siblings, 1 reply; 13+ messages in thread
From: Eric W. Biederman @ 2022-06-26 16:43 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Pavel Emelyanov, Herbert Xu, Kuniyuki Iwashima, netdev

Kuniyuki Iwashima <kuniyu@amazon.com> writes:

> While setting up init_net's sysctl table, we need not duplicate the global
> table and can use it directly.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

I am not quite certain the savings of a single entry table justivies
the complexity.  But the looks correct.

Eric


>
> Fixes: 1597fbc0faf8 ("[UNIX]: Make the unix sysctl tables per-namespace")
> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
> ---
> v2:
>   * Fix NULL comparison style by checkpatch.pl
>
> v1: https://lore.kernel.org/all/20220626074454.28944-1-kuniyu@amazon.com/
> ---
> ---
>  net/unix/sysctl_net_unix.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/net/unix/sysctl_net_unix.c b/net/unix/sysctl_net_unix.c
> index 01d44e2598e2..4bd856d05135 100644
> --- a/net/unix/sysctl_net_unix.c
> +++ b/net/unix/sysctl_net_unix.c
> @@ -26,11 +26,16 @@ int __net_init unix_sysctl_register(struct net *net)
>  {
>  	struct ctl_table *table;
>  
> -	table = kmemdup(unix_table, sizeof(unix_table), GFP_KERNEL);
> -	if (table == NULL)
> -		goto err_alloc;
> +	if (net_eq(net, &init_net)) {
> +		table = unix_table;
> +	} else {
> +		table = kmemdup(unix_table, sizeof(unix_table), GFP_KERNEL);
> +		if (!table)
> +			goto err_alloc;
> +
> +		table[0].data = &net->unx.sysctl_max_dgram_qlen;
> +	}
>  
> -	table[0].data = &net->unx.sysctl_max_dgram_qlen;
>  	net->unx.ctl = register_net_sysctl(net, "net/unix", table);
>  	if (net->unx.ctl == NULL)
>  		goto err_reg;
> @@ -38,7 +43,8 @@ int __net_init unix_sysctl_register(struct net *net)
>  	return 0;
>  
>  err_reg:
> -	kfree(table);
> +	if (net_eq(net, &init_net))
> +		kfree(table);
>  err_alloc:
>  	return -ENOMEM;
>  }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table.
  2022-06-26 16:43 ` Eric W. Biederman
@ 2022-06-27 17:58   ` Jakub Kicinski
  2022-06-27 18:30     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 13+ messages in thread
From: Jakub Kicinski @ 2022-06-27 17:58 UTC (permalink / raw)
  To: Eric W. Biederman, Kuniyuki Iwashima
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Pavel Emelyanov,
	Herbert Xu, Kuniyuki Iwashima, netdev

On Sun, 26 Jun 2022 11:43:27 -0500 Eric W. Biederman wrote:
> Kuniyuki Iwashima <kuniyu@amazon.com> writes:
> 
> > While setting up init_net's sysctl table, we need not duplicate the global
> > table and can use it directly.  
> 
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> I am not quite certain the savings of a single entry table justivies
> the complexity.  But the looks correct.

Yeah, the commit message is a little sparse. The "why" is not addressed.
Could you add more details to explain the motivation?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table.
  2022-06-27 17:58   ` Jakub Kicinski
@ 2022-06-27 18:30     ` Kuniyuki Iwashima
  2022-06-27 18:40       ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Kuniyuki Iwashima @ 2022-06-27 18:30 UTC (permalink / raw)
  To: kuba
  Cc: davem, ebiederm, edumazet, herbert, kuni1840, kuniyu, netdev,
	pabeni, xemul

From:   Jakub Kicinski <kuba@kernel.org>
Date:   Mon, 27 Jun 2022 10:58:59 -0700
> On Sun, 26 Jun 2022 11:43:27 -0500 Eric W. Biederman wrote:
> > Kuniyuki Iwashima <kuniyu@amazon.com> writes:
> > 
> > > While setting up init_net's sysctl table, we need not duplicate the global
> > > table and can use it directly.  
> > 
> > Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > 
> > I am not quite certain the savings of a single entry table justivies
> > the complexity.  But the looks correct.
> 
> Yeah, the commit message is a little sparse. The "why" is not addressed.
> Could you add more details to explain the motivation?

I was working on a series which converts UDP/TCP hash tables into per-netns
ones like AF_UNIX to speed up looking up sockets.  It will consume much
memory on a host with thousands of netns, but it can be waste if we do not
have its protocol family's sockets.

So, I'm now working on a follow-up series for AF_UNIX per-netns hash table
so that we can change the size for a child netns by a sysctl knob:

  # sysctl -w net.unix.child_hash_entries=128
  # ip net add test  # created with the hash table size 128
  # ip net exec test sh
  # sysctl net.unix.hash_entries  # read-only
  128

  (The size for init_net can be changed via a new boot parameter
   xhash_entries like uhash_entries/thash_entries.)

While implementing that, I found that kmemdup() is called for init_net but
TCP/UDP does not (See: ipv4_sysctl_init_net()).  Unlike IPv4, AF_UNIX does
not have a huge sysctl table, so it cannot be a problem though, this patch
is for consuming less memory and kind of consistency.  The reason I submit
this seperately is that it might be better to have a Fixes tag.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table.
  2022-06-27 18:30     ` Kuniyuki Iwashima
@ 2022-06-27 18:40       ` Eric Dumazet
  2022-06-27 18:58         ` Kuniyuki Iwashima
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2022-06-27 18:40 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: Jakub Kicinski, David Miller, Eric W. Biederman, Herbert Xu,
	Kuniyuki Iwashima, netdev, Paolo Abeni, Pavel Emelyanov

On Mon, Jun 27, 2022 at 8:30 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From:   Jakub Kicinski <kuba@kernel.org>
> Date:   Mon, 27 Jun 2022 10:58:59 -0700
> > On Sun, 26 Jun 2022 11:43:27 -0500 Eric W. Biederman wrote:
> > > Kuniyuki Iwashima <kuniyu@amazon.com> writes:
> > >
> > > > While setting up init_net's sysctl table, we need not duplicate the global
> > > > table and can use it directly.
> > >
> > > Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > >
> > > I am not quite certain the savings of a single entry table justivies
> > > the complexity.  But the looks correct.
> >
> > Yeah, the commit message is a little sparse. The "why" is not addressed.
> > Could you add more details to explain the motivation?
>
> I was working on a series which converts UDP/TCP hash tables into per-netns
> ones like AF_UNIX to speed up looking up sockets.  It will consume much
> memory on a host with thousands of netns, but it can be waste if we do not
> have its protocol family's sockets.

For the record, I doubt we will accept such a patch (per net-ns
TCP/UDP hash tables)

>
> So, I'm now working on a follow-up series for AF_UNIX per-netns hash table
> so that we can change the size for a child netns by a sysctl knob:
>
>   # sysctl -w net.unix.child_hash_entries=128
>   # ip net add test  # created with the hash table size 128
>   # ip net exec test sh
>   # sysctl net.unix.hash_entries  # read-only
>   128
>
>   (The size for init_net can be changed via a new boot parameter
>    xhash_entries like uhash_entries/thash_entries.)
>
> While implementing that, I found that kmemdup() is called for init_net but
> TCP/UDP does not (See: ipv4_sysctl_init_net()).  Unlike IPv4, AF_UNIX does
> not have a huge sysctl table, so it cannot be a problem though, this patch
> is for consuming less memory and kind of consistency.  The reason I submit
> this seperately is that it might be better to have a Fixes tag.

I think that af_unix module can be unloaded.

Your patch will break the module unload operation.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table.
  2022-06-27 18:40       ` Eric Dumazet
@ 2022-06-27 18:58         ` Kuniyuki Iwashima
  2022-06-27 19:06           ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Kuniyuki Iwashima @ 2022-06-27 18:58 UTC (permalink / raw)
  To: edumazet
  Cc: davem, ebiederm, herbert, kuba, kuni1840, kuniyu, netdev, pabeni, xemul

From:   Eric Dumazet <edumazet@google.com>
Date:   Mon, 27 Jun 2022 20:40:24 +0200
> On Mon, Jun 27, 2022 at 8:30 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From:   Jakub Kicinski <kuba@kernel.org>
> > Date:   Mon, 27 Jun 2022 10:58:59 -0700
> > > On Sun, 26 Jun 2022 11:43:27 -0500 Eric W. Biederman wrote:
> > > > Kuniyuki Iwashima <kuniyu@amazon.com> writes:
> > > >
> > > > > While setting up init_net's sysctl table, we need not duplicate the global
> > > > > table and can use it directly.
> > > >
> > > > Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > > >
> > > > I am not quite certain the savings of a single entry table justivies
> > > > the complexity.  But the looks correct.
> > >
> > > Yeah, the commit message is a little sparse. The "why" is not addressed.
> > > Could you add more details to explain the motivation?
> >
> > I was working on a series which converts UDP/TCP hash tables into per-netns
> > ones like AF_UNIX to speed up looking up sockets.  It will consume much
> > memory on a host with thousands of netns, but it can be waste if we do not
> > have its protocol family's sockets.
> 
> For the record, I doubt we will accept such a patch (per net-ns
> TCP/UDP hash tables)

Is it because it's risky?
IIRC, you said we need per netns table for TCP in the future.


> > So, I'm now working on a follow-up series for AF_UNIX per-netns hash table
> > so that we can change the size for a child netns by a sysctl knob:
> >
> >   # sysctl -w net.unix.child_hash_entries=128
> >   # ip net add test  # created with the hash table size 128
> >   # ip net exec test sh
> >   # sysctl net.unix.hash_entries  # read-only
> >   128
> >
> >   (The size for init_net can be changed via a new boot parameter
> >    xhash_entries like uhash_entries/thash_entries.)
> >
> > While implementing that, I found that kmemdup() is called for init_net but
> > TCP/UDP does not (See: ipv4_sysctl_init_net()).  Unlike IPv4, AF_UNIX does
> > not have a huge sysctl table, so it cannot be a problem though, this patch
> > is for consuming less memory and kind of consistency.  The reason I submit
> > this seperately is that it might be better to have a Fixes tag.
> 
> I think that af_unix module can be unloaded.
> 
> Your patch will break the module unload operation.

Thank you!
I had to take of kfree() in unix_sysctl_unregister().

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table.
  2022-06-27 18:58         ` Kuniyuki Iwashima
@ 2022-06-27 19:06           ` Eric Dumazet
  2022-06-27 19:15             ` Kuniyuki Iwashima
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2022-06-27 19:06 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David Miller, Eric W. Biederman, Herbert Xu, Jakub Kicinski,
	Kuniyuki Iwashima, netdev, Paolo Abeni, Pavel Emelyanov

On Mon, Jun 27, 2022 at 8:59 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From:   Eric Dumazet <edumazet@google.com>
> Date:   Mon, 27 Jun 2022 20:40:24 +0200
> > On Mon, Jun 27, 2022 at 8:30 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > >
> > > From:   Jakub Kicinski <kuba@kernel.org>
> > > Date:   Mon, 27 Jun 2022 10:58:59 -0700
> > > > On Sun, 26 Jun 2022 11:43:27 -0500 Eric W. Biederman wrote:
> > > > > Kuniyuki Iwashima <kuniyu@amazon.com> writes:
> > > > >
> > > > > > While setting up init_net's sysctl table, we need not duplicate the global
> > > > > > table and can use it directly.
> > > > >
> > > > > Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > > > >
> > > > > I am not quite certain the savings of a single entry table justivies
> > > > > the complexity.  But the looks correct.
> > > >
> > > > Yeah, the commit message is a little sparse. The "why" is not addressed.
> > > > Could you add more details to explain the motivation?
> > >
> > > I was working on a series which converts UDP/TCP hash tables into per-netns
> > > ones like AF_UNIX to speed up looking up sockets.  It will consume much
> > > memory on a host with thousands of netns, but it can be waste if we do not
> > > have its protocol family's sockets.
> >
> > For the record, I doubt we will accept such a patch (per net-ns
> > TCP/UDP hash tables)
>
> Is it because it's risky?

Because it will be very expensive. TCP hash tables are quite big.

[    4.917080] tcp_listen_portaddr_hash hash table entries: 65536
(order: 8, 1048576 bytes, vmalloc)
[    4.917260] TCP established hash table entries: 524288 (order: 10,
4194304 bytes, vmalloc hugepage)
[    4.917760] TCP bind hash table entries: 65536 (order: 8, 1048576
bytes, vmalloc)
[    4.917881] TCP: Hash tables configured (established 524288 bind 65536)



> IIRC, you said we need per netns table for TCP in the future.

Which ones exactly ? I guess you misunderstood.

>
>
> > > So, I'm now working on a follow-up series for AF_UNIX per-netns hash table
> > > so that we can change the size for a child netns by a sysctl knob:
> > >
> > >   # sysctl -w net.unix.child_hash_entries=128
> > >   # ip net add test  # created with the hash table size 128
> > >   # ip net exec test sh
> > >   # sysctl net.unix.hash_entries  # read-only
> > >   128
> > >
> > >   (The size for init_net can be changed via a new boot parameter
> > >    xhash_entries like uhash_entries/thash_entries.)
> > >
> > > While implementing that, I found that kmemdup() is called for init_net but
> > > TCP/UDP does not (See: ipv4_sysctl_init_net()).  Unlike IPv4, AF_UNIX does
> > > not have a huge sysctl table, so it cannot be a problem though, this patch
> > > is for consuming less memory and kind of consistency.  The reason I submit
> > > this seperately is that it might be better to have a Fixes tag.
> >
> > I think that af_unix module can be unloaded.
> >
> > Your patch will break the module unload operation.
>
> Thank you!
> I had to take of kfree() in unix_sysctl_unregister().

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table.
  2022-06-27 19:06           ` Eric Dumazet
@ 2022-06-27 19:15             ` Kuniyuki Iwashima
  2022-06-27 19:36               ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Kuniyuki Iwashima @ 2022-06-27 19:15 UTC (permalink / raw)
  To: edumazet
  Cc: davem, ebiederm, herbert, kuba, kuni1840, kuniyu, netdev, pabeni, xemul

From:   Eric Dumazet <edumazet@google.com>
Date:   Mon, 27 Jun 2022 21:06:14 +0200
> On Mon, Jun 27, 2022 at 8:59 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From:   Eric Dumazet <edumazet@google.com>
> > Date:   Mon, 27 Jun 2022 20:40:24 +0200
> > > On Mon, Jun 27, 2022 at 8:30 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > >
> > > > From:   Jakub Kicinski <kuba@kernel.org>
> > > > Date:   Mon, 27 Jun 2022 10:58:59 -0700
> > > > > On Sun, 26 Jun 2022 11:43:27 -0500 Eric W. Biederman wrote:
> > > > > > Kuniyuki Iwashima <kuniyu@amazon.com> writes:
> > > > > >
> > > > > > > While setting up init_net's sysctl table, we need not duplicate the global
> > > > > > > table and can use it directly.
> > > > > >
> > > > > > Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > > > > >
> > > > > > I am not quite certain the savings of a single entry table justivies
> > > > > > the complexity.  But the looks correct.
> > > > >
> > > > > Yeah, the commit message is a little sparse. The "why" is not addressed.
> > > > > Could you add more details to explain the motivation?
> > > >
> > > > I was working on a series which converts UDP/TCP hash tables into per-netns
> > > > ones like AF_UNIX to speed up looking up sockets.  It will consume much
> > > > memory on a host with thousands of netns, but it can be waste if we do not
> > > > have its protocol family's sockets.
> > >
> > > For the record, I doubt we will accept such a patch (per net-ns
> > > TCP/UDP hash tables)
> >
> > Is it because it's risky?
> 
> Because it will be very expensive. TCP hash tables are quite big.

Yes, so I'm wondering if changing the size by sysctl makes sense.  If we
have per-netns hash tables, each table should have smaller amount of
sockets and smaller size should be enough, I think.

> 
> [    4.917080] tcp_listen_portaddr_hash hash table entries: 65536
> (order: 8, 1048576 bytes, vmalloc)
> [    4.917260] TCP established hash table entries: 524288 (order: 10,
> 4194304 bytes, vmalloc hugepage)
> [    4.917760] TCP bind hash table entries: 65536 (order: 8, 1048576
> bytes, vmalloc)
> [    4.917881] TCP: Hash tables configured (established 524288 bind 65536)
> 
> 
> 
> > IIRC, you said we need per netns table for TCP in the future.
> 
> Which ones exactly ? I guess you misunderstood.

I think this.
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=04c494e68a13

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table.
  2022-06-27 19:15             ` Kuniyuki Iwashima
@ 2022-06-27 19:36               ` Eric Dumazet
  2022-06-27 19:59                 ` Kuniyuki Iwashima
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2022-06-27 19:36 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David Miller, Eric W. Biederman, Herbert Xu, Jakub Kicinski,
	Kuniyuki Iwashima, netdev, Paolo Abeni, Pavel Emelyanov

On Mon, Jun 27, 2022 at 9:16 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From:   Eric Dumazet <edumazet@google.com>
> Date:   Mon, 27 Jun 2022 21:06:14 +0200
> > On Mon, Jun 27, 2022 at 8:59 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > >
> > > From:   Eric Dumazet <edumazet@google.com>
> > > Date:   Mon, 27 Jun 2022 20:40:24 +0200
> > > > On Mon, Jun 27, 2022 at 8:30 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > > >
> > > > > From:   Jakub Kicinski <kuba@kernel.org>
> > > > > Date:   Mon, 27 Jun 2022 10:58:59 -0700
> > > > > > On Sun, 26 Jun 2022 11:43:27 -0500 Eric W. Biederman wrote:
> > > > > > > Kuniyuki Iwashima <kuniyu@amazon.com> writes:
> > > > > > >
> > > > > > > > While setting up init_net's sysctl table, we need not duplicate the global
> > > > > > > > table and can use it directly.
> > > > > > >
> > > > > > > Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > > > > > >
> > > > > > > I am not quite certain the savings of a single entry table justivies
> > > > > > > the complexity.  But the looks correct.
> > > > > >
> > > > > > Yeah, the commit message is a little sparse. The "why" is not addressed.
> > > > > > Could you add more details to explain the motivation?
> > > > >
> > > > > I was working on a series which converts UDP/TCP hash tables into per-netns
> > > > > ones like AF_UNIX to speed up looking up sockets.  It will consume much
> > > > > memory on a host with thousands of netns, but it can be waste if we do not
> > > > > have its protocol family's sockets.
> > > >
> > > > For the record, I doubt we will accept such a patch (per net-ns
> > > > TCP/UDP hash tables)
> > >
> > > Is it because it's risky?
> >
> > Because it will be very expensive. TCP hash tables are quite big.
>
> Yes, so I'm wondering if changing the size by sysctl makes sense.  If we
> have per-netns hash tables, each table should have smaller amount of
> sockets and smaller size should be enough, I think.

How can a sysctl be safely used if two different threads call "unshare
-n" at the same time ?

>
> >
> > [    4.917080] tcp_listen_portaddr_hash hash table entries: 65536
> > (order: 8, 1048576 bytes, vmalloc)
> > [    4.917260] TCP established hash table entries: 524288 (order: 10,
> > 4194304 bytes, vmalloc hugepage)
> > [    4.917760] TCP bind hash table entries: 65536 (order: 8, 1048576
> > bytes, vmalloc)
> > [    4.917881] TCP: Hash tables configured (established 524288 bind 65536)
> >
> >
> >
> > > IIRC, you said we need per netns table for TCP in the future.
> >
> > Which ones exactly ? I guess you misunderstood.
>
> I think this.
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=04c494e68a13

"might" is very different than "will"

I would rather use the list of time_wait, instead of adding huge
memory costs for hosts with hundreds of netns.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table.
  2022-06-27 19:36               ` Eric Dumazet
@ 2022-06-27 19:59                 ` Kuniyuki Iwashima
  2022-06-27 20:04                   ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Kuniyuki Iwashima @ 2022-06-27 19:59 UTC (permalink / raw)
  To: edumazet
  Cc: davem, ebiederm, herbert, kuba, kuni1840, kuniyu, netdev, pabeni, xemul

From:   Eric Dumazet <edumazet@google.com>
Date:   Mon, 27 Jun 2022 21:36:18 +0200
> On Mon, Jun 27, 2022 at 9:16 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From:   Eric Dumazet <edumazet@google.com>
> > Date:   Mon, 27 Jun 2022 21:06:14 +0200
> > > On Mon, Jun 27, 2022 at 8:59 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > >
> > > > From:   Eric Dumazet <edumazet@google.com>
> > > > Date:   Mon, 27 Jun 2022 20:40:24 +0200
> > > > > On Mon, Jun 27, 2022 at 8:30 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > > > >
> > > > > > From:   Jakub Kicinski <kuba@kernel.org>
> > > > > > Date:   Mon, 27 Jun 2022 10:58:59 -0700
> > > > > > > On Sun, 26 Jun 2022 11:43:27 -0500 Eric W. Biederman wrote:
> > > > > > > > Kuniyuki Iwashima <kuniyu@amazon.com> writes:
> > > > > > > >
> > > > > > > > > While setting up init_net's sysctl table, we need not duplicate the global
> > > > > > > > > table and can use it directly.
> > > > > > > >
> > > > > > > > Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > > > > > > >
> > > > > > > > I am not quite certain the savings of a single entry table justivies
> > > > > > > > the complexity.  But the looks correct.
> > > > > > >
> > > > > > > Yeah, the commit message is a little sparse. The "why" is not addressed.
> > > > > > > Could you add more details to explain the motivation?
> > > > > >
> > > > > > I was working on a series which converts UDP/TCP hash tables into per-netns
> > > > > > ones like AF_UNIX to speed up looking up sockets.  It will consume much
> > > > > > memory on a host with thousands of netns, but it can be waste if we do not
> > > > > > have its protocol family's sockets.
> > > > >
> > > > > For the record, I doubt we will accept such a patch (per net-ns
> > > > > TCP/UDP hash tables)
> > > >
> > > > Is it because it's risky?
> > >
> > > Because it will be very expensive. TCP hash tables are quite big.
> >
> > Yes, so I'm wondering if changing the size by sysctl makes sense.  If we
> > have per-netns hash tables, each table should have smaller amount of
> > sockets and smaller size should be enough, I think.
> 
> How can a sysctl be safely used if two different threads call "unshare
> -n" at the same time ?

I think two unshare are safe. Each of them reads its parent netns's sysctl
knob.  Even when the parent is the same, they can read the same value.

But I think we need READ_ONCE()/WRITE_ONCE() in such a sysctl.  While
creating a child netns, another one can change the value and there can be
a data-race.  So we have to use custome handler and pass write/read handler
as conv of do_proc_douintvec(), like do_proc_douintvec_conv_lockless().

If there are some sysctls missing READ_ONCE/WRITE_ONCE(), I will add
more general one, proc_douintvec_lockless().


> > > [    4.917080] tcp_listen_portaddr_hash hash table entries: 65536
> > > (order: 8, 1048576 bytes, vmalloc)
> > > [    4.917260] TCP established hash table entries: 524288 (order: 10,
> > > 4194304 bytes, vmalloc hugepage)
> > > [    4.917760] TCP bind hash table entries: 65536 (order: 8, 1048576
> > > bytes, vmalloc)
> > > [    4.917881] TCP: Hash tables configured (established 524288 bind 65536)
> > >
> > >
> > >
> > > > IIRC, you said we need per netns table for TCP in the future.
> > >
> > > Which ones exactly ? I guess you misunderstood.
> >
> > I think this.
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=04c494e68a13
> 
> "might" is very different than "will"
> 
> I would rather use the list of time_wait, instead of adding huge
> memory costs for hosts with hundreds of netns.

Sorry, my bad.
I would give it a try only for TIME_WAIT.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table.
  2022-06-27 19:59                 ` Kuniyuki Iwashima
@ 2022-06-27 20:04                   ` Eric Dumazet
  2022-06-27 20:18                     ` Kuniyuki Iwashima
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2022-06-27 20:04 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David Miller, Eric W. Biederman, Herbert Xu, Jakub Kicinski,
	Kuniyuki Iwashima, netdev, Paolo Abeni, Pavel Emelyanov

On Mon, Jun 27, 2022 at 10:00 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From:   Eric Dumazet <edumazet@google.com>
> Date:   Mon, 27 Jun 2022 21:36:18 +0200
> > On Mon, Jun 27, 2022 at 9:16 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > >
> > > From:   Eric Dumazet <edumazet@google.com>
> > > Date:   Mon, 27 Jun 2022 21:06:14 +0200
> > > > On Mon, Jun 27, 2022 at 8:59 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > > >
> > > > > From:   Eric Dumazet <edumazet@google.com>
> > > > > Date:   Mon, 27 Jun 2022 20:40:24 +0200
> > > > > > On Mon, Jun 27, 2022 at 8:30 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > > > > >
> > > > > > > From:   Jakub Kicinski <kuba@kernel.org>
> > > > > > > Date:   Mon, 27 Jun 2022 10:58:59 -0700
> > > > > > > > On Sun, 26 Jun 2022 11:43:27 -0500 Eric W. Biederman wrote:
> > > > > > > > > Kuniyuki Iwashima <kuniyu@amazon.com> writes:
> > > > > > > > >
> > > > > > > > > > While setting up init_net's sysctl table, we need not duplicate the global
> > > > > > > > > > table and can use it directly.
> > > > > > > > >
> > > > > > > > > Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > > > > > > > >
> > > > > > > > > I am not quite certain the savings of a single entry table justivies
> > > > > > > > > the complexity.  But the looks correct.
> > > > > > > >
> > > > > > > > Yeah, the commit message is a little sparse. The "why" is not addressed.
> > > > > > > > Could you add more details to explain the motivation?
> > > > > > >
> > > > > > > I was working on a series which converts UDP/TCP hash tables into per-netns
> > > > > > > ones like AF_UNIX to speed up looking up sockets.  It will consume much
> > > > > > > memory on a host with thousands of netns, but it can be waste if we do not
> > > > > > > have its protocol family's sockets.
> > > > > >
> > > > > > For the record, I doubt we will accept such a patch (per net-ns
> > > > > > TCP/UDP hash tables)
> > > > >
> > > > > Is it because it's risky?
> > > >
> > > > Because it will be very expensive. TCP hash tables are quite big.
> > >
> > > Yes, so I'm wondering if changing the size by sysctl makes sense.  If we
> > > have per-netns hash tables, each table should have smaller amount of
> > > sockets and smaller size should be enough, I think.
> >
> > How can a sysctl be safely used if two different threads call "unshare
> > -n" at the same time ?
>
> I think two unshare are safe. Each of them reads its parent netns's sysctl
> knob.  Even when the parent is the same, they can read the same value.

How can one thread create a netns with a TCP ehash table with 1024 buckets,
and a second one create a netns with a TCP ehash table with 1 million
buckets at the same time,
if they share the same sysctl ???

>
> But I think we need READ_ONCE()/WRITE_ONCE() in such a sysctl.

Like all sysctls really.

  While
> creating a child netns, another one can change the value and there can be
> a data-race.  So we have to use custome handler and pass write/read handler
> as conv of do_proc_douintvec(), like do_proc_douintvec_conv_lockless().
>
> If there are some sysctls missing READ_ONCE/WRITE_ONCE(), I will add
> more general one, proc_douintvec_lockless().

Seriously, all sysctls can be set while being read. That is not something new.

>
>
> > > > [    4.917080] tcp_listen_portaddr_hash hash table entries: 65536
> > > > (order: 8, 1048576 bytes, vmalloc)
> > > > [    4.917260] TCP established hash table entries: 524288 (order: 10,
> > > > 4194304 bytes, vmalloc hugepage)
> > > > [    4.917760] TCP bind hash table entries: 65536 (order: 8, 1048576
> > > > bytes, vmalloc)
> > > > [    4.917881] TCP: Hash tables configured (established 524288 bind 65536)
> > > >
> > > >
> > > >
> > > > > IIRC, you said we need per netns table for TCP in the future.
> > > >
> > > > Which ones exactly ? I guess you misunderstood.
> > >
> > > I think this.
> > > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=04c494e68a13
> >
> > "might" is very different than "will"
> >
> > I would rather use the list of time_wait, instead of adding huge
> > memory costs for hosts with hundreds of netns.
>
> Sorry, my bad.
> I would give it a try only for TIME_WAIT.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table.
  2022-06-27 20:04                   ` Eric Dumazet
@ 2022-06-27 20:18                     ` Kuniyuki Iwashima
  0 siblings, 0 replies; 13+ messages in thread
From: Kuniyuki Iwashima @ 2022-06-27 20:18 UTC (permalink / raw)
  To: edumazet
  Cc: davem, ebiederm, herbert, kuba, kuni1840, kuniyu, netdev, pabeni, xemul

From:   Eric Dumazet <edumazet@google.com>
Date:   Mon, 27 Jun 2022 22:04:07 +0200
> On Mon, Jun 27, 2022 at 10:00 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> >
> > From:   Eric Dumazet <edumazet@google.com>
> > Date:   Mon, 27 Jun 2022 21:36:18 +0200
> > > On Mon, Jun 27, 2022 at 9:16 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > >
> > > > From:   Eric Dumazet <edumazet@google.com>
> > > > Date:   Mon, 27 Jun 2022 21:06:14 +0200
> > > > > On Mon, Jun 27, 2022 at 8:59 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > > > >
> > > > > > From:   Eric Dumazet <edumazet@google.com>
> > > > > > Date:   Mon, 27 Jun 2022 20:40:24 +0200
> > > > > > > On Mon, Jun 27, 2022 at 8:30 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > > > > > >
> > > > > > > > From:   Jakub Kicinski <kuba@kernel.org>
> > > > > > > > Date:   Mon, 27 Jun 2022 10:58:59 -0700
> > > > > > > > > On Sun, 26 Jun 2022 11:43:27 -0500 Eric W. Biederman wrote:
> > > > > > > > > > Kuniyuki Iwashima <kuniyu@amazon.com> writes:
> > > > > > > > > >
> > > > > > > > > > > While setting up init_net's sysctl table, we need not duplicate the global
> > > > > > > > > > > table and can use it directly.
> > > > > > > > > >
> > > > > > > > > > Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > > > > > > > > >
> > > > > > > > > > I am not quite certain the savings of a single entry table justivies
> > > > > > > > > > the complexity.  But the looks correct.
> > > > > > > > >
> > > > > > > > > Yeah, the commit message is a little sparse. The "why" is not addressed.
> > > > > > > > > Could you add more details to explain the motivation?
> > > > > > > >
> > > > > > > > I was working on a series which converts UDP/TCP hash tables into per-netns
> > > > > > > > ones like AF_UNIX to speed up looking up sockets.  It will consume much
> > > > > > > > memory on a host with thousands of netns, but it can be waste if we do not
> > > > > > > > have its protocol family's sockets.
> > > > > > >
> > > > > > > For the record, I doubt we will accept such a patch (per net-ns
> > > > > > > TCP/UDP hash tables)
> > > > > >
> > > > > > Is it because it's risky?
> > > > >
> > > > > Because it will be very expensive. TCP hash tables are quite big.
> > > >
> > > > Yes, so I'm wondering if changing the size by sysctl makes sense.  If we
> > > > have per-netns hash tables, each table should have smaller amount of
> > > > sockets and smaller size should be enough, I think.
> > >
> > > How can a sysctl be safely used if two different threads call "unshare
> > > -n" at the same time ?
> >
> > I think two unshare are safe. Each of them reads its parent netns's sysctl
> > knob.  Even when the parent is the same, they can read the same value.
> 
> How can one thread create a netns with a TCP ehash table with 1024 buckets,
> and a second one create a netns with a TCP ehash table with 1 million
> buckets at the same time,
> if they share the same sysctl ???

Oh, I undertood.
In the example, I added net.unix.hash_entries so we can confirm if the size
is intended one, but yes, checking it and recreating netns is crazy...

  # sysctl -w net.unix.child_hash_entries=128
  # ip net add test  # created with the hash table size 128
  # ip net exec test sh
  # sysctl net.unix.hash_entries  # read-only
  128

Do you have good idea?


> 
> >
> > But I think we need READ_ONCE()/WRITE_ONCE() in such a sysctl.
> 
> Like all sysctls really.
> 
>   While
> > creating a child netns, another one can change the value and there can be
> > a data-race.  So we have to use custome handler and pass write/read handler
> > as conv of do_proc_douintvec(), like do_proc_douintvec_conv_lockless().
> >
> > If there are some sysctls missing READ_ONCE/WRITE_ONCE(), I will add
> > more general one, proc_douintvec_lockless().
> 
> Seriously, all sysctls can be set while being read. That is not something new.

Ok, I added that on TODO.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table.
@ 2022-07-20  6:35 kernel test robot
  0 siblings, 0 replies; 13+ messages in thread
From: kernel test robot @ 2022-07-20  6:35 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 18161 bytes --]

:::::: 
:::::: Manual check reason: "low confidence static check warning: net/unix/sysctl_net_unix.c:47:3: warning: Argument to kfree() is the address of the global variable 'unix_table', which is not memory allocated by malloc() [clang-analyzer-unix.Malloc]"
:::::: 

CC: llvm(a)lists.linux.dev
CC: kbuild-all(a)lists.01.org
BCC: lkp(a)intel.com
In-Reply-To: <20220626082331.36119-1-kuniyu@amazon.com>
References: <20220626082331.36119-1-kuniyu@amazon.com>
TO: Kuniyuki Iwashima <kuniyu@amazon.com>

Hi Kuniyuki,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net/master]

url:    https://github.com/intel-lab-lkp/linux/commits/Kuniyuki-Iwashima/af_unix-Do-not-call-kmemdup-for-init_net-s-sysctl-table/20220626-162736
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 3b89b511ea0c705cc418440e2abf9d692a556d84
:::::: branch date: 3 weeks ago
:::::: commit date: 3 weeks ago
config: s390-randconfig-c005-20220715 (https://download.01.org/0day-ci/archive/20220720/202207201427.L9FolDQx-lkp(a)intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 07022e6cf9b5b3baa642be53d0b3c3f1c403dbfd)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/ba827e6db65fb677e56f718249dcacecad4d364d
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Kuniyuki-Iwashima/af_unix-Do-not-call-kmemdup-for-init_net-s-sysctl-table/20220626-162736
        git checkout ba827e6db65fb677e56f718249dcacecad4d364d
        # save the config file
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390 clang-analyzer 

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>


clang-analyzer warnings: (new ones prefixed by >>)
                              ^~~~~
   lib/cpumask.c:270:9: warning: Dereference of null pointer [clang-analyzer-core.NullDereference]
           prev = __this_cpu_read(distribute_cpu_mask_prev);
                  ^
   include/linux/percpu-defs.h:446:2: note: expanded from macro '__this_cpu_read'
           raw_cpu_read(pcp);                                              \
           ^~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:420:28: note: expanded from macro 'raw_cpu_read'
   #define raw_cpu_read(pcp)               __pcpu_size_call_return(raw_cpu_read_, pcp)
                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:323:23: note: expanded from macro '__pcpu_size_call_return'
           case 4: pscr_ret__ = stem##4(variable); break;                  \
                                ^~~~~~~~~~~~~~~~~
   note: (skipping 4 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
   include/asm-generic/percpu.h:44:31: note: expanded from macro 'arch_raw_cpu_ptr'
   #define arch_raw_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, __my_cpu_offset)
                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:231:2: note: expanded from macro 'SHIFT_PERCPU_PTR'
           RELOC_HIDE((typeof(*(__p)) __kernel __force *)(__p), (__offset))
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:170:28: note: expanded from macro 'RELOC_HIDE'
       (typeof(ptr)) (__ptr + (off)); })
                              ^~~~~
   lib/cpumask.c:270:9: note: Loop condition is false.  Exiting loop
           prev = __this_cpu_read(distribute_cpu_mask_prev);
                  ^
   include/linux/percpu-defs.h:446:2: note: expanded from macro '__this_cpu_read'
           raw_cpu_read(pcp);                                              \
           ^
   include/linux/percpu-defs.h:420:28: note: expanded from macro 'raw_cpu_read'
   #define raw_cpu_read(pcp)               __pcpu_size_call_return(raw_cpu_read_, pcp)
                                           ^
   include/linux/percpu-defs.h:319:2: note: expanded from macro '__pcpu_size_call_return'
           __verify_pcpu_ptr(&(variable));                                 \
           ^
   include/linux/percpu-defs.h:217:37: note: expanded from macro '__verify_pcpu_ptr'
   #define __verify_pcpu_ptr(ptr)                                          \
                                                                           ^
   lib/cpumask.c:270:9: note: Control jumps to 'case 4:'  at line 270
           prev = __this_cpu_read(distribute_cpu_mask_prev);
                  ^
   include/linux/percpu-defs.h:446:2: note: expanded from macro '__this_cpu_read'
           raw_cpu_read(pcp);                                              \
           ^
   include/linux/percpu-defs.h:420:28: note: expanded from macro 'raw_cpu_read'
   #define raw_cpu_read(pcp)               __pcpu_size_call_return(raw_cpu_read_, pcp)
                                           ^
   include/linux/percpu-defs.h:320:2: note: expanded from macro '__pcpu_size_call_return'
           switch(sizeof(variable)) {                                      \
           ^
   lib/cpumask.c:270:9: note: Loop condition is false.  Exiting loop
           prev = __this_cpu_read(distribute_cpu_mask_prev);
                  ^
   include/linux/percpu-defs.h:446:2: note: expanded from macro '__this_cpu_read'
           raw_cpu_read(pcp);                                              \
           ^
   include/linux/percpu-defs.h:420:28: note: expanded from macro 'raw_cpu_read'
   #define raw_cpu_read(pcp)               __pcpu_size_call_return(raw_cpu_read_, pcp)
                                           ^
   include/linux/percpu-defs.h:323:23: note: expanded from macro '__pcpu_size_call_return'
           case 4: pscr_ret__ = stem##4(variable); break;                  \
                                ^
   note: (skipping 2 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
   include/asm-generic/percpu.h:67:3: note: expanded from macro 'raw_cpu_generic_read'
           *raw_cpu_ptr(&(pcp));                                           \
            ^
   include/linux/percpu-defs.h:241:2: note: expanded from macro 'raw_cpu_ptr'
           __verify_pcpu_ptr(ptr);                                         \
           ^
   include/linux/percpu-defs.h:217:37: note: expanded from macro '__verify_pcpu_ptr'
   #define __verify_pcpu_ptr(ptr)                                          \
                                                                           ^
   lib/cpumask.c:270:9: note: Dereference of null pointer
           prev = __this_cpu_read(distribute_cpu_mask_prev);
                  ^
   include/linux/percpu-defs.h:446:2: note: expanded from macro '__this_cpu_read'
           raw_cpu_read(pcp);                                              \
           ^~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:420:28: note: expanded from macro 'raw_cpu_read'
   #define raw_cpu_read(pcp)               __pcpu_size_call_return(raw_cpu_read_, pcp)
                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:323:23: note: expanded from macro '__pcpu_size_call_return'
           case 4: pscr_ret__ = stem##4(variable); break;                  \
                                ^~~~~~~~~~~~~~~~~
   note: (skipping 4 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
   include/asm-generic/percpu.h:44:31: note: expanded from macro 'arch_raw_cpu_ptr'
   #define arch_raw_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, __my_cpu_offset)
                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/percpu-defs.h:231:2: note: expanded from macro 'SHIFT_PERCPU_PTR'
           RELOC_HIDE((typeof(*(__p)) __kernel __force *)(__p), (__offset))
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:170:28: note: expanded from macro 'RELOC_HIDE'
       (typeof(ptr)) (__ptr + (off)); })
                              ^~~~~
   Suppressed 57 warnings (45 in non-user code, 12 with check filters).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   102 warnings generated.
   Suppressed 102 warnings (90 in non-user code, 12 with check filters).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   103 warnings generated.
>> net/unix/sysctl_net_unix.c:47:3: warning: Argument to kfree() is the address of the global variable 'unix_table', which is not memory allocated by malloc() [clang-analyzer-unix.Malloc]
                   kfree(table);
                   ^     ~~~~~
   net/unix/sysctl_net_unix.c:29:6: note: Assuming the condition is true
           if (net_eq(net, &init_net)) {
               ^~~~~~~~~~~~~~~~~~~~~~
   net/unix/sysctl_net_unix.c:29:2: note: Taking true branch
           if (net_eq(net, &init_net)) {
           ^
   net/unix/sysctl_net_unix.c:40:6: note: Assuming field 'ctl' is equal to NULL
           if (net->unx.ctl == NULL)
               ^~~~~~~~~~~~~~~~~~~~
   net/unix/sysctl_net_unix.c:40:2: note: Taking true branch
           if (net->unx.ctl == NULL)
           ^
   net/unix/sysctl_net_unix.c:41:3: note: Control jumps to line 46
                   goto err_reg;
                   ^
   net/unix/sysctl_net_unix.c:46:6: note: Assuming the condition is true
           if (net_eq(net, &init_net))
               ^~~~~~~~~~~~~~~~~~~~~~
   net/unix/sysctl_net_unix.c:46:2: note: Taking true branch
           if (net_eq(net, &init_net))
           ^
   net/unix/sysctl_net_unix.c:47:3: note: Argument to kfree() is the address of the global variable 'unix_table', which is not memory allocated by malloc()
                   kfree(table);
                   ^     ~~~~~
   Suppressed 102 warnings (90 in non-user code, 12 with check filters).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   103 warnings generated.
   net/decnet/dn_nsp_out.c:85:2: warning: Call to function 'memset' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memset_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
           memset(&fld, 0, sizeof(fld));
           ^
   include/linux/fortify-string.h:288:25: note: expanded from macro 'memset'
   #define memset(p, c, s) __fortify_memset_chk(p, c, s,                   \
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/fortify-string.h:281:2: note: expanded from macro '__fortify_memset_chk'
           __underlying_memset(p, c, __fortify_size);                      \
           ^~~~~~~~~~~~~~~~~~~
   include/linux/fortify-string.h:47:29: note: expanded from macro '__underlying_memset'
   #define __underlying_memset     __builtin_memset
                                   ^~~~~~~~~~~~~~~~
   net/decnet/dn_nsp_out.c:85:2: note: Call to function 'memset' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memset_s' in case of C11
           memset(&fld, 0, sizeof(fld));
           ^
   include/linux/fortify-string.h:288:25: note: expanded from macro 'memset'
   #define memset(p, c, s) __fortify_memset_chk(p, c, s,                   \
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/fortify-string.h:281:2: note: expanded from macro '__fortify_memset_chk'
           __underlying_memset(p, c, __fortify_size);                      \
           ^~~~~~~~~~~~~~~~~~~
   include/linux/fortify-string.h:47:29: note: expanded from macro '__underlying_memset'
   #define __underlying_memset     __builtin_memset
                                   ^~~~~~~~~~~~~~~~
   net/decnet/dn_nsp_out.c:555:3: warning: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11 [clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling]
                   memcpy(msg, dd, ddl);
                   ^
   include/linux/fortify-string.h:385:26: note: expanded from macro 'memcpy'
   #define memcpy(p, q, s)  __fortify_memcpy_chk(p, q, s,                  \
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/fortify-string.h:378:2: note: expanded from macro '__fortify_memcpy_chk'
           __underlying_##op(p, q, __fortify_size);                        \
           ^~~~~~~~~~~~~~~~~
   note: expanded from here
   include/linux/fortify-string.h:45:29: note: expanded from macro '__underlying_memcpy'
   #define __underlying_memcpy     __builtin_memcpy
                                   ^~~~~~~~~~~~~~~~
   net/decnet/dn_nsp_out.c:555:3: note: Call to function 'memcpy' is insecure as it does not provide security checks introduced in the C11 standard. Replace with analogous functions that support length arguments or provides boundary checks such as 'memcpy_s' in case of C11
                   memcpy(msg, dd, ddl);
                   ^
   include/linux/fortify-string.h:385:26: note: expanded from macro 'memcpy'
   #define memcpy(p, q, s)  __fortify_memcpy_chk(p, q, s,                  \
                            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/fortify-string.h:378:2: note: expanded from macro '__fortify_memcpy_chk'
           __underlying_##op(p, q, __fortify_size);                        \
           ^~~~~~~~~~~~~~~~~
   note: expanded from here
   include/linux/fortify-string.h:45:29: note: expanded from macro '__underlying_memcpy'
   #define __underlying_memcpy     __builtin_memcpy
                                   ^~~~~~~~~~~~~~~~
   Suppressed 101 warnings (89 in non-user code, 12 with check filters).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   99 warnings generated.
   Suppressed 99 warnings (87 in non-user code, 12 with check filters).
   Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
   102 warnings generated.
   include/net/sch_generic.h:868:2: warning: Dereference of null pointer [clang-analyzer-core.NullDereference]
           this_cpu_sub(sch->cpu_qstats->backlog, qdisc_pkt_len(skb));
           ^
   include/linux/percpu-defs.h:519:33: note: expanded from macro 'this_cpu_sub'
   #define this_cpu_sub(pcp, val)          this_cpu_add(pcp, -(typeof(pcp))(val))
                                           ^
   include/linux/percpu-defs.h:509:33: note: expanded from macro 'this_cpu_add'
   #define this_cpu_add(pcp, val)          __pcpu_size_call(this_cpu_add_, pcp, val)
                                           ^
   include/linux/percpu-defs.h:379:11: note: expanded from macro '__pcpu_size_call'
                   case 4: stem##4(variable, __VA_ARGS__);break;           \
                           ^
   note: (skipping 4 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
   include/asm-generic/percpu.h:44:31: note: expanded from macro 'arch_raw_cpu_ptr'
   #define arch_raw_cpu_ptr(ptr) SHIFT_PERCPU_PTR(ptr, __my_cpu_offset)

vim +/unix_table +47 net/unix/sysctl_net_unix.c

^1da177e4c3f41 Linus Torvalds    2005-04-16  24  
2c8c1e7297e19b Alexey Dobriyan   2010-01-17  25  int __net_init unix_sysctl_register(struct net *net)
^1da177e4c3f41 Linus Torvalds    2005-04-16  26  {
1597fbc0faf88c Pavel Emelyanov   2007-12-01  27  	struct ctl_table *table;
1597fbc0faf88c Pavel Emelyanov   2007-12-01  28  
ba827e6db65fb6 Kuniyuki Iwashima 2022-06-26  29  	if (net_eq(net, &init_net)) {
ba827e6db65fb6 Kuniyuki Iwashima 2022-06-26  30  		table = unix_table;
ba827e6db65fb6 Kuniyuki Iwashima 2022-06-26  31  	} else {
1597fbc0faf88c Pavel Emelyanov   2007-12-01  32  		table = kmemdup(unix_table, sizeof(unix_table), GFP_KERNEL);
ba827e6db65fb6 Kuniyuki Iwashima 2022-06-26  33  		if (!table)
1597fbc0faf88c Pavel Emelyanov   2007-12-01  34  			goto err_alloc;
1597fbc0faf88c Pavel Emelyanov   2007-12-01  35  
a0a53c8ba95451 Denis V. Lunev    2007-12-11  36  		table[0].data = &net->unx.sysctl_max_dgram_qlen;
ba827e6db65fb6 Kuniyuki Iwashima 2022-06-26  37  	}
ba827e6db65fb6 Kuniyuki Iwashima 2022-06-26  38  
ec8f23ce0f4005 Eric W. Biederman 2012-04-19  39  	net->unx.ctl = register_net_sysctl(net, "net/unix", table);
a0a53c8ba95451 Denis V. Lunev    2007-12-11  40  	if (net->unx.ctl == NULL)
1597fbc0faf88c Pavel Emelyanov   2007-12-01  41  		goto err_reg;
1597fbc0faf88c Pavel Emelyanov   2007-12-01  42  
1597fbc0faf88c Pavel Emelyanov   2007-12-01  43  	return 0;
1597fbc0faf88c Pavel Emelyanov   2007-12-01  44  
1597fbc0faf88c Pavel Emelyanov   2007-12-01  45  err_reg:
ba827e6db65fb6 Kuniyuki Iwashima 2022-06-26  46  	if (net_eq(net, &init_net))
1597fbc0faf88c Pavel Emelyanov   2007-12-01 @47  		kfree(table);
1597fbc0faf88c Pavel Emelyanov   2007-12-01  48  err_alloc:
1597fbc0faf88c Pavel Emelyanov   2007-12-01  49  	return -ENOMEM;
^1da177e4c3f41 Linus Torvalds    2005-04-16  50  }
^1da177e4c3f41 Linus Torvalds    2005-04-16  51  

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-07-20  6:35 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-26  8:23 [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table Kuniyuki Iwashima
2022-06-26 16:43 ` Eric W. Biederman
2022-06-27 17:58   ` Jakub Kicinski
2022-06-27 18:30     ` Kuniyuki Iwashima
2022-06-27 18:40       ` Eric Dumazet
2022-06-27 18:58         ` Kuniyuki Iwashima
2022-06-27 19:06           ` Eric Dumazet
2022-06-27 19:15             ` Kuniyuki Iwashima
2022-06-27 19:36               ` Eric Dumazet
2022-06-27 19:59                 ` Kuniyuki Iwashima
2022-06-27 20:04                   ` Eric Dumazet
2022-06-27 20:18                     ` Kuniyuki Iwashima
2022-07-20  6:35 kernel test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.