From: Vladimir Davydov <vdavydov@virtuozzo.com>
To: Sudeep K N <sudeepholla.maillist@gmail.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"David S. Miller" <davem@davemloft.net>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>, <linux-mm@kvack.org>,
	<linux-fsdevel@vger.kernel.org>, netdev <netdev@vger.kernel.org>,
	<x86@kernel.org>, open list <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	Sudeep Holla <sudeep.holla@arm.com>
Subject: Re: [PATCH RESEND 8/8] af_unix: charge buffers to kmemcg
Date: Tue, 23 Aug 2016 19:44:59 +0300	[thread overview]
Message-ID: <20160823164459.GD1863@esperanza> (raw)
In-Reply-To: <CAPKp9uY9kFTqPT+9rkAcWACWrnE-FbGbuU=6mw715X6eCC4PVg@mail.gmail.com>

Hello,

On Tue, Aug 23, 2016 at 02:48:11PM +0100, Sudeep K N wrote:
> On Tue, May 24, 2016 at 5:36 PM, Vladimir Davydov
> <vdavydov@virtuozzo.com> wrote:
> > On Tue, May 24, 2016 at 06:02:06AM -0700, Eric Dumazet wrote:
> >> On Tue, 2016-05-24 at 11:49 +0300, Vladimir Davydov wrote:
> >> > Unix sockets can consume a significant amount of system memory, hence
> >> > they should be accounted to kmemcg.
> >> >
> >> > Since unix socket buffers are always allocated from process context,
> >> > all we need to do to charge them to kmemcg is set __GFP_ACCOUNT in
> >> > sock->sk_allocation mask.
> >>
> >> I have two questions:
> >>
> >> 1) What happens when a buffer, allocated from socket <A>, lands in a
> >> different socket <B>, maybe owned by another user/process?
> >>
> >> Who owns it now, in terms of kmemcg accounting?
> >
> > We never move memcg charges. E.g. if two processes from different
> > cgroups are sharing a memory region, each page will be charged to the
> > process which touched it first. Or if two processes are working with the
> > same directory tree, inodes and dentries will be charged to the first
> > user. The same holds for unix socket buffers - they will be charged to
> > the sender.
> >
> >>
> >> 2) Has performance impact been evaluated ?
> >
> > I ran netperf STREAM_STREAM with default options in a kmemcg on
> > a 4 core x 2 HT box. The results are below:
> >
> >  # clients            bandwidth (10^6bits/sec)
> >                     base              patched
> >          1      67643 +-  725      64874 +-  353    - 4.0 %
> >          4     193585 +- 2516     186715 +- 1460    - 3.5 %
> >          8     194820 +-  377     187443 +- 1229    - 3.7 %
> >
> > So the accounting doesn't come for free - it takes ~4% of performance.
> > I believe we could optimize it by using per cpu batching not only on
> > charge, but also on uncharge in memcg core, but that's beyond the scope
> > of this patch set - I'll take a look at this later.
> >
> > Anyway, if performance impact is found to be unacceptable, it is always
> > possible to disable kmem accounting at boot time (cgroup.memory=nokmem)
> > or not use memory cgroups at runtime at all (thanks to jump labels
> > there'll be no overhead even if they are compiled in).
> >
> 
> I started seeing an almost 10% degradation in the hackbench score with
> v4.8-rc1. Bisecting pointed to this patch, i.e. commit 3aa9799e1364
> ("af_unix: charge buffers to kmemcg") in the mainline.
> 
> As per the commit log, it seems like that's expected but I was not sure about
> the margin. I also see the hackbench score is more inconsistent after this
> patch, but I may be wrong as that's based on limited observation.
> 
> Is this something we can ignore as hackbench is more synthetic compared
> to the gain this patch provides in some real workloads ?

AFAIU hackbench essentially measures the rate of sending data over a
unix socket back and forth between processes running on different cpus,
so it isn't a surprise that the patch resulted in a degradation, as it
makes every skb page allocation/deallocation inc/dec an atomic counter
inside memcg. The more processes/cpus running in the same cgroup are
involved in this test, the more significant the overhead of this atomic
counter is going to be.

The degradation is not unavoidable - it can be fixed by making kmem
charge/uncharge code use per-cpu batches. The infrastructure for this
already exists in memcontrol.c. If it were not for the legacy
mem_cgroup->kmem counter (which is actually useless and will be dropped
in cgroup v2), the issue would be pretty easy to fix. However, this
legacy counter makes a possible implementation quite messy, so I'd like
to postpone it until cgroup v2 has finally settled down.

Regarding your problem: as a workaround, you can either start your
workload in the root memory cgroup or disable kmem accounting for
memory cgroups altogether (via the cgroup.memory=nokmem boot option).
If you find the issue critical, I don't mind reverting the patch - we
can always re-apply it once per-cpu batches are implemented for kmem
charges.
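Concretely, the two workarounds look like this (a configuration fragment; cgroup-v1 paths shown, adjust for your distribution and boot loader):

```shell
# Workaround 1: run the affected workload from the root memory cgroup,
# where kmem pages are not charged.
echo $$ > /sys/fs/cgroup/memory/cgroup.procs   # move current shell to the root memcg
hackbench -g 10 -l 1000                        # then launch the workload from it

# Workaround 2: disable kmem accounting system-wide at boot by adding
# cgroup.memory=nokmem to the kernel command line, e.g. in GRUB:
#   GRUB_CMDLINE_LINUX="... cgroup.memory=nokmem"
# After rebooting, verify it took effect:
grep -o 'cgroup.memory=nokmem' /proc/cmdline
```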

Thanks,
Vladimir

Thread overview: 63+ messages
2016-05-24  8:49 [PATCH RESEND 0/8] More stuff to charge to kmemcg Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 1/8] mm: remove pointless struct in struct page definition Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 2/8] mm: clean up non-standard page->_mapcount users Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 3/8] mm: memcontrol: cleanup kmem charge functions Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 4/8] mm: charge/uncharge kmemcg from generic page allocator paths Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 5/8] mm: memcontrol: teach uncharge_list to deal with kmem pages Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 6/8] arch: x86: charge page tables to kmemcg Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 7/8] pipe: account " Vladimir Davydov
2016-05-24 12:59   ` Eric Dumazet
2016-05-24 16:13     ` Vladimir Davydov
2016-05-24 20:04       ` Eric Dumazet
2016-05-25 10:30         ` Vladimir Davydov
2016-05-26  7:04           ` Minchan Kim
2016-05-26 13:59             ` Vladimir Davydov
2016-05-26 14:15               ` Eric Dumazet
2016-05-27 15:03                 ` Vladimir Davydov
2016-05-24  8:49 ` [PATCH RESEND 8/8] af_unix: charge buffers " Vladimir Davydov
2016-05-24 13:02   ` Eric Dumazet
2016-05-24 16:36     ` Vladimir Davydov
2016-08-23 13:48       ` Sudeep K N
2016-08-23 16:44         ` Vladimir Davydov [this message]
2016-08-23 16:50           ` Sudeep Holla
