From: Eric Dumazet <edumazet@google.com>
To: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: David Miller <davem@davemloft.net>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	Jakub Kicinski <kuba@kernel.org>,
	Kuniyuki Iwashima <kuni1840@gmail.com>,
	netdev <netdev@vger.kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Pavel Emelyanov <xemul@openvz.org>
Subject: Re: [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table.
Date: Mon, 27 Jun 2022 21:36:18 +0200
Message-ID: <CANn89iJsk7g0LcH17u=JbLy5dwYi0QVg84b3c5eLf-zUTK5b8g@mail.gmail.com>
In-Reply-To: <20220627191544.4266-1-kuniyu@amazon.com>

On Mon, Jun 27, 2022 at 9:16 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
>
> From:   Eric Dumazet <edumazet@google.com>
> Date:   Mon, 27 Jun 2022 21:06:14 +0200
> > On Mon, Jun 27, 2022 at 8:59 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > >
> > > From:   Eric Dumazet <edumazet@google.com>
> > > Date:   Mon, 27 Jun 2022 20:40:24 +0200
> > > > On Mon, Jun 27, 2022 at 8:30 PM Kuniyuki Iwashima <kuniyu@amazon.com> wrote:
> > > > >
> > > > > From:   Jakub Kicinski <kuba@kernel.org>
> > > > > Date:   Mon, 27 Jun 2022 10:58:59 -0700
> > > > > > On Sun, 26 Jun 2022 11:43:27 -0500 Eric W. Biederman wrote:
> > > > > > > Kuniyuki Iwashima <kuniyu@amazon.com> writes:
> > > > > > >
> > > > > > > > While setting up init_net's sysctl table, we need not duplicate the global
> > > > > > > > table and can use it directly.
> > > > > > >
> > > > > > > Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> > > > > > >
> > > > > > > > I am not quite certain the savings of a single entry table justifies
> > > > > > > > the complexity.  But the patch looks correct.
> > > > > >
> > > > > > Yeah, the commit message is a little sparse. The "why" is not addressed.
> > > > > > Could you add more details to explain the motivation?
> > > > >
> > > > > I was working on a series which converts UDP/TCP hash tables into per-netns
> > > > > > ones like AF_UNIX to speed up looking up sockets.  It will consume much
> > > > > > memory on a host with thousands of netns, and that memory is wasted in
> > > > > > any netns that has no sockets of the protocol family.
> > > >
> > > > For the record, I doubt we will accept such a patch (per net-ns
> > > > TCP/UDP hash tables)
> > >
> > > Is it because it's risky?
> >
> > Because it will be very expensive. TCP hash tables are quite big.
>
> Yes, so I'm wondering if changing the size by sysctl makes sense.  If we
> have per-netns hash tables, each table should hold fewer sockets, so a
> smaller size should be enough, I think.

How can a sysctl be safely used if two different threads call "unshare
-n" at the same time ?
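
The concern can be illustrated with a minimal sketch in Python, not kernel code. It assumes a hypothetical global knob (called `hash_size` here) that is read at the moment a new netns is created. The class, the names, and the sizes 1024 and 65536 are invented for illustration, and the interleaving is fixed by hand rather than produced by real threads.

```python
class Host:
    """Toy model: one global knob shared by every caller (hypothetical)."""

    def __init__(self):
        self.hash_size = 524288  # stand-in for a global sysctl value

    def write_sysctl(self, value):
        self.hash_size = value

    def unshare_net(self):
        # The new netns takes whatever the global knob holds right now.
        return {"hash_size": self.hash_size}


def racy_interleaving():
    host = Host()
    host.write_sysctl(1024)       # thread A wants a small table
    host.write_sysctl(65536)      # thread B overwrites the knob before A unshares
    netns_a = host.unshare_net()  # A ends up with B's size
    netns_b = host.unshare_net()
    return netns_a, netns_b
```

In this interleaving both namespaces end up with the size written by thread B, because nothing ties the write of the global value to the creation that follows it.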

>
> >
> > [    4.917080] tcp_listen_portaddr_hash hash table entries: 65536 (order: 8, 1048576 bytes, vmalloc)
> > [    4.917260] TCP established hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc hugepage)
> > [    4.917760] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, vmalloc)
> > [    4.917881] TCP: Hash tables configured (established 524288 bind 65536)
> >
> >
> >
> > > IIRC, you said we need per netns table for TCP in the future.
> >
> > Which ones exactly ? I guess you misunderstood.
>
> I think this.
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=04c494e68a13

"might" is very different than "will"

I would rather use the list of time_wait, instead of adding huge
memory costs for hosts with hundreds of netns.
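
As a rough back-of-the-envelope check of the cost, the sketch below sums the table sizes from the boot log quoted above and multiplies by the number of netns. It assumes each netns would get a full-size copy of all three tables, which is the worst case and not something the thread proposes. The function names are illustrative only.

```python
# Sizes in bytes, copied from the boot log quoted in this message.
TABLE_BYTES = {
    "tcp_listen_portaddr_hash": 1048576,
    "TCP established": 4194304,
    "TCP bind": 1048576,
}


def per_netns_bytes(tables=TABLE_BYTES):
    """Total bytes for one full set of tables."""
    return sum(tables.values())


def total_bytes(netns_count, tables=TABLE_BYTES):
    # Assumption: every netns allocates a full-size copy of each table.
    return netns_count * per_netns_bytes(tables)


if __name__ == "__main__":
    for n in (1, 100, 1000):
        print(f"{n:>5} netns: {total_bytes(n) / 2**20:.0f} MiB")
```

Under that assumption one set of tables is 6 MiB, so 100 netns would need about 600 MiB and 1000 netns about 6000 MiB.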

Thread overview: 13+ messages
2022-06-26  8:23 [PATCH v2 net] af_unix: Do not call kmemdup() for init_net's sysctl table Kuniyuki Iwashima
2022-06-26 16:43 ` Eric W. Biederman
2022-06-27 17:58   ` Jakub Kicinski
2022-06-27 18:30     ` Kuniyuki Iwashima
2022-06-27 18:40       ` Eric Dumazet
2022-06-27 18:58         ` Kuniyuki Iwashima
2022-06-27 19:06           ` Eric Dumazet
2022-06-27 19:15             ` Kuniyuki Iwashima
2022-06-27 19:36               ` Eric Dumazet [this message]
2022-06-27 19:59                 ` Kuniyuki Iwashima
2022-06-27 20:04                   ` Eric Dumazet
2022-06-27 20:18                     ` Kuniyuki Iwashima
2022-07-20  6:35 kernel test robot
