linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Miller <davem@davemloft.net>
To: tj@kernel.org
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	lizefan@huawei.com, cgroups@vger.kernel.org, hannes@cmpxchg.org,
	kernel-team@fb.com, ninasc@fb.com, dgoode@fb.com
Subject: Re: [PATCH v4.11] cgroup, net_cls: iterate the fds of only the tasks which are being migrated
Date: Wed, 22 Mar 2017 10:33:28 -0700 (PDT)	[thread overview]
Message-ID: <20170322.103328.2258083179184839596.davem@davemloft.net> (raw)
In-Reply-To: <20170314232556.GE15132@htj.duckdns.org>

From: Tejun Heo <tj@kernel.org>
Date: Tue, 14 Mar 2017 19:25:56 -0400

> The net_cls controller controls the classid field of each socket which
> is associated with the cgroup.  Because the classid is per-socket
> attribute, when a task migrates to another cgroup or the configured
> classid of the cgroup changes, the controller needs to walk all
> sockets and update the classid value, which was implemented by
> 3b13758f51de ("cgroups: Allow dynamically changing net_classid").
> 
> While the approach is not scalable, migrating tasks which have a lot
> of fds attached to them is rare and the cost is born by the ones
> initiating the operations.  However, for simplicity, both the
> migration and classid config change paths call update_classid() which
> scans all fds of all tasks in the target css.  This is an overkill for
> the migration path which only needs to cover a much smaller subset of
> tasks which are actually getting migrated in.
> 
> On cgroup v1, this can lead to unexpected scalability issues when one
> tries to migrate a task or process into a net_cls cgroup which already
> contains a lot of fds.  Even if the migration traget doesn't have many
> to get scanned, update_classid() ends up scanning all fds in the
> target cgroup which can be extremely numerous.
> 
> Unfortunately, on cgroup v2 which doesn't use net_cls, the problem is
> even worse.  Before bfc2cf6f61fc ("cgroup: call subsys->*attach() only
> for subsystems which are actually affected by migration"), cgroup core
> would call the ->css_attach callback even for controllers which don't
> see actual migration to a different css.
> 
> As net_cls is always disabled but still mounted on cgroup v2, whenever
> a process is migrated on the cgroup v2 hierarchy, net_cls sees
> identity migration from root to root and cgroup core used to call
> ->css_attach callback for those.  The net_cls ->css_attach ends up
> calling update_classid() on the root net_cls css to which all
> processes on the system belong to as the controller isn't used.  This
> makes any cgroup v2 migration O(total_number_of_fds_on_the_system)
> which is horrible and easily leads to noticeable stalls triggering RCU
> stall warnings and so on.
> 
> The worst symptom is already fixed in upstream by bfc2cf6f61fc
> ("cgroup: call subsys->*attach() only for subsystems which are
> actually affected by migration"); however, backporting that commit is
> too invasive and we want to avoid other cases too.
> 
> This patch updates net_cls's cgrp_attach() to iterate fds of only the
> processes which are actually getting migrated.  This removes the
> surprising migration cost which is dependent on the total number of
> fds in the target cgroup.  As this leaves write_classid() the only
> user of update_classid(), open-code the helper into write_classid().
> 
> Reported-by: David Goode <dgoode@fb.com>
> Fixes: 3b13758f51de ("cgroups: Allow dynamically changing net_classid")
> Cc: stable@vger.kernel.org # v4.4+
> Cc: Nina Schiff <ninasc@fb.com>
> Cc: David S. Miller <davem@davemloft.net>
> Signed-off-by: Tejun Heo <tj@kernel.org>

Applied, thanks.

      reply	other threads:[~2017-03-22 17:33 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-14 23:25 [PATCH v4.11] cgroup, net_cls: iterate the fds of only the tasks which are being migrated Tejun Heo
2017-03-22 17:33 ` David Miller [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170322.103328.2258083179184839596.davem@davemloft.net \
    --to=davem@davemloft.net \
    --cc=cgroups@vger.kernel.org \
    --cc=dgoode@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=netdev@vger.kernel.org \
    --cc=ninasc@fb.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).