From: Feng Tang <feng.tang@intel.com>
To: Shakeel Butt <shakeelb@google.com>
Cc: Eric Dumazet <edumazet@google.com>, Linux MM <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Michal Hocko <mhocko@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Muchun Song <songmuchun@bytedance.com>,
Jakub Kicinski <kuba@kernel.org>, Xin Long <lucien.xin@gmail.com>,
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>,
kernel test robot <oliver.sang@intel.com>,
Soheil Hassas Yeganeh <soheil@google.com>,
LKML <linux-kernel@vger.kernel.org>,
network dev <netdev@vger.kernel.org>,
linux-s390@vger.kernel.org,
MPTCP Upstream <mptcp@lists.linux.dev>,
"linux-sctp @ vger . kernel . org" <linux-sctp@vger.kernel.org>,
lkp@lists.01.org, kbuild test robot <lkp@intel.com>,
Huang Ying <ying.huang@intel.com>,
Xing Zhengjun <zhengjun.xing@linux.intel.com>,
Yin Fengwei <fengwei.yin@intel.com>, Ying Xu <yinxu@redhat.com>
Subject: Re: [net] 4890b686f4: netperf.Throughput_Mbps -69.4% regression
Date: Fri, 24 Jun 2022 15:06:56 +0800
Message-ID: <20220624070656.GE79500@shbuild999.sh.intel.com>
In-Reply-To: <CALvZod7kULCvHAuk53FE-XBOi4-BbLdY3HCg6jfCZTJDxYsZow@mail.gmail.com>
On Thu, Jun 23, 2022 at 11:34:15PM -0700, Shakeel Butt wrote:
> CCing memcg folks.
>
> The thread starts at
> https://lore.kernel.org/all/20220619150456.GB34471@xsang-OptiPlex-9020/
>
> On Thu, Jun 23, 2022 at 9:14 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Fri, Jun 24, 2022 at 3:57 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > >
> > > On Thu, 23 Jun 2022 18:50:07 -0400 Xin Long wrote:
> > > > From the perf data, we can see __sk_mem_reduce_allocated() is the one
> > > > using CPU the most more than before, and mem_cgroup APIs are also
> > > > called in this function. It means the mem cgroup must be enabled in
> > > > the test env, which may explain why I couldn't reproduce it.
> > > >
> > > > The Commit 4890b686f4 ("net: keep sk->sk_forward_alloc as small as
> > > > possible") uses sk_mem_reclaim(checking reclaimable >= PAGE_SIZE) to
> > > > reclaim the memory, which is *more frequent* to call
> > > > __sk_mem_reduce_allocated() than before (checking reclaimable >=
> > > > SK_RECLAIM_THRESHOLD). It might be cheap when
> > > > mem_cgroup_sockets_enabled is false, but I'm not sure if it's still
> > > > cheap when mem_cgroup_sockets_enabled is true.
> > > >
> > > > I think SCTP netperf could trigger this, as the CPU is the bottleneck
> > > > for SCTP netperf testing, which is more sensitive to the extra
> > > > function calls than TCP.
> > > >
> > > > Can we re-run this testing without mem cgroup enabled?
> > >
> > > FWIW I defer to Eric, thanks a lot for double checking the report
> > > and digging in!
> >
> > I did tests with TCP + memcg and noticed a very small additional cost
> > in memcg functions,
> > because of suboptimal layout:
> >
> > Extract of an internal Google bug, update from June 9th:
> >
> > --------------------------------
> > I have noticed a minor false sharing to fetch (struct
> > mem_cgroup)->css.parent, at offset 0xc0,
> > because it shares the cache line containing struct mem_cgroup.memory,
> > at offset 0xd0
> >
> > Ideally, memcg->socket_pressure and memcg->parent should sit in a read
> > mostly cache line.
> > -----------------------
> >
> > But nothing that could explain a "-69.4% regression"
> >
> > memcg has a very similar strategy of per-cpu reserves, with
> > MEMCG_CHARGE_BATCH being 32 pages per cpu.
> >
> > It is not clear why SCTP with 10K writes would overflow this reserve constantly.
> >
> > Presumably memcg experts will have to rework structure alignments to
> > make sure they can cope better
> > with more charge/uncharge operations, because we are not going back to
> > gigantic per-socket reserves,
> > this simply does not scale.
>
> Yes I agree. As you pointed out there are fields which are mostly
> read-only but sharing cache lines with fields which get updated and
> definitely need work.
>
> However can we first confirm if memcg charging is really the issue
> here as I remember these intel lkp tests are configured to run in root
> memcg and the kernel does not associate root memcg to any socket (see
> mem_cgroup_sk_alloc()).
>
> If these tests are running in non-root memcg, is this cgroup v1 or v2?
> The memory counter and the 32 pages per cpu stock are only used on v2.
> For v1, there is no per-cpu stock and there is a separate tcpmem page
> counter and on v1 the network memory accounting has to be enabled
> explicitly i.e. not enabled by default.
>
> There is definite possibility of slowdown on v1 but let's first
> confirm the memcg setup used for this testing environment.
>
> Feng, can you please explain the memcg setup on these test machines
> and if the tests are run in root or non-root memcg?
I don't know the exact setup; Philip/Oliver from 0Day can correct me.
I logged into a test box which runs the netperf test, and it seems to be
cgroup v1 and a non-root memcg. The netperf tasks all sit in the dir:
'/sys/fs/cgroup/memory/system.slice/lkp-bootstrap.service'
And the rootfs is a Debian-based rootfs.
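For anyone who wants to repeat the check on another box, here is a rough
sketch of how the memcg placement of a task could be read from
/proc/<pid>/cgroup. The helper name memory_cgroup() and the sample text
are made up for illustration (the hierarchy IDs are not copied from the
test box); only the "id:controllers:path" line format is the kernel's.

```python
from typing import Optional, Tuple


def memory_cgroup(proc_cgroup_text: str) -> Optional[Tuple[str, str]]:
    """Return (version, path) for the memory cgroup of a task.

    proc_cgroup_text is the content of /proc/<pid>/cgroup, where each
    line has the form "hierarchy-ID:controller-list:cgroup-path".
    """
    unified_path = None
    for line in proc_cgroup_text.splitlines():
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        hierarchy_id, controllers, path = parts
        # A v1 hierarchy names its controllers explicitly.
        if "memory" in controllers.split(","):
            return ("v1", path)
        # The v2 (unified) hierarchy is listed as "0::<path>".
        if hierarchy_id == "0" and controllers == "":
            unified_path = path
    if unified_path is not None:
        return ("v2", unified_path)
    return None


# Illustrative sample only, shaped like the layout described above.
sample = "\n".join([
    "11:cpu,cpuacct:/system.slice/lkp-bootstrap.service",
    "9:memory:/system.slice/lkp-bootstrap.service",
    "0::/system.slice/lkp-bootstrap.service",
])

version, path = memory_cgroup(sample)
print(version, path, "root" if path == "/" else "non-root")
```

On a live system the input would come from something like
open(f"/proc/{pid}/cgroup").read(). Note that on v2 this only shows where
the task sits; whether the memory controller is actually enabled for that
cgroup still has to be checked separately.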
Thanks,
Feng
> thanks,
> Shakeel
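
As a side note on Xin Long's point quoted above (reclaiming at
reclaimable >= PAGE_SIZE calls __sk_mem_reduce_allocated() more often
than reclaiming at SK_RECLAIM_THRESHOLD), here is a toy model of the call
frequency. It is not kernel code. It assumes 4 KiB pages, a 2 MiB value
for the old threshold, one uncharge per 10K write, and that each reclaim
returns all whole pages at once; count_reclaims() is a made-up name.

```python
PAGE_SIZE = 4096                 # assumption: 4 KiB pages
SK_RECLAIM_THRESHOLD = 1 << 21   # assumption: old threshold of 2 MiB


def count_reclaims(uncharge_bytes: int, n_uncharges: int, threshold: int) -> int:
    """Count how often the reclaim path runs for a stream of uncharges."""
    forward_alloc = 0
    reclaims = 0
    for _ in range(n_uncharges):
        forward_alloc += uncharge_bytes
        if forward_alloc >= threshold:
            # Give back every whole page, keep the sub-page remainder.
            forward_alloc %= PAGE_SIZE
            reclaims += 1
    return reclaims


writes = 1000
frequent = count_reclaims(10 * 1024, writes, PAGE_SIZE)
rare = count_reclaims(10 * 1024, writes, SK_RECLAIM_THRESHOLD)
print(frequent, rare)
```

Under these assumptions every 10K uncharge crosses the PAGE_SIZE
threshold, while the larger threshold is crossed only once every couple
of hundred uncharges. Whether each of those extra calls is expensive
depends on the memcg side, which is the open question in this thread.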