netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: Marcel Apfelbaum <mapfelba@redhat.com>
Cc: marcel@redhat.com, Saeed Mahameed <saeed@kernel.org>,
	netdev@vger.kernel.org, Paolo Abeni <pabeni@redhat.com>,
	Jonathan Corbet <corbet@lwn.net>,
	"David S. Miller" <davem@davemloft.net>,
	Shuah Khan <shuah@kernel.org>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Daniel Borkmann <daniel@iogearbox.net>
Subject: Re: [kuba@kernel.org: Re: [PATCH net-next v2 0/3] net: introduce rps_default_mask]
Date: Fri, 20 Nov 2020 12:50:39 -0800	[thread overview]
Message-ID: <20201120125039.6a88b0b1@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com> (raw)
In-Reply-To: <CA+aaKfD_6qdNVRgr2TdDeuOau1UCFzRqWRB8iM-_GHV7mMrcsg@mail.gmail.com>

On Fri, 20 Nov 2020 19:39:24 +0200 Marcel Apfelbaum wrote:
> > The CPU isolation is done statically at system boot by setting
> > Linux kernel parameters, So the container management component, in
> > this case the Machine Configuration Operator (for Openshift) or the
> > K8s counterpart can't really help. (actually they would help if a
> > global RPS mask would exist)
> >
> > I tried to tweak the rps_cpus mask using the container management
> > stack, but there is no sane way to do it, please let me get a
> > little into the details.
> >
> > The k8s orchestration component that deals with injecting the
> > network device(s) into the container is CNI, which is interface
> > based and implemented by a lot of plugins, making it hardly
> > feasible to go over all the existing plugins and change them. Also
> > what about the 3rd party ones?

I'm not particularly amenable to the "changing user space is hard"
argument. Especially that you don't appear to have given it an honest
try.

> > Writing a new CNI plugin and chain it into the existing one is also
> > not an option AFAIK, they work at the network level and do not have
> > access to sysfs (they handle the network namespaces). Even if it
> > would be possible (I don't have a deep CNI understanding), it will
> > require a cluster global configuration that is actually needed only
> > for some of the cluster nodes.
> >
> > Another approach is to set the RPS configuration from the inside(of
> > the container), but the /sys mount is read only for unprivileged
> > containers, so we lose again.
> >
> > That leaves us with a host daemon hack:
> > Since the virtual network devices are created in the host namespace
> > and then "moved" into the container, we can listen to some udev
> > event and write to the rps_cpus file after the virtual netdev is
> > created and before it is moved (as stated above, the work is done
> > by a CNI plugin implementation). That is of course extremely racy
> > and not a valid solution.
> >
> >> > Possibly I can reduce the amount of new code introduced by this
> >> > patchset removing some code duplication
> >> > between rps_default_mask_sysctl() and flow_limit_cpu_sysctl().
> >> > Would that make this change more acceptable? Or should I drop
> >> > this altogether?  
> >>
> >> I'm leaning towards drop altogether, unless you can get some
> >> support/review tags from other netdev developers. So far it
> >> appears we only got a down vote from Saeed.

As I said here, try to convince some other senior networking developers
this is the right solution and I'll apply it.

This is giving me flashbacks of trying bend the kernel for OpenStack
because there was no developer on my team who could change OpenStack.

> > Any solution that will allow the user space to avoid the
> > network soft IRQs on specific CPUs would be welcome.
> >
> > The proposed global mask is a solution, maybe there other ways?

  reply	other threads:[~2020-11-20 20:50 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20201119162527.GB9877@fuller.cnet>
     [not found] ` <CA+aaKfCMa1sOa6bMXFAaP6Wb=5ZgoAcnZAaP9aBmWwOzaAtcHw@mail.gmail.com>
2020-11-20 17:39   ` [kuba@kernel.org: Re: [PATCH net-next v2 0/3] net: introduce rps_default_mask] Marcel Apfelbaum
2020-11-20 20:50     ` Jakub Kicinski [this message]
2020-11-20 21:56     ` Daniel Borkmann
2020-11-21  5:39       ` Marcel Apfelbaum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201120125039.6a88b0b1@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com \
    --to=kuba@kernel.org \
    --cc=corbet@lwn.net \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=mapfelba@redhat.com \
    --cc=marcel@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=saeed@kernel.org \
    --cc=shuah@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).