* [For Stable] mm: memcontrol: fix excessive complexity in memory.stat reporting
@ 2019-04-02  5:35 Vaibhav Rustagi
  2019-04-24 16:50 ` Greg KH
  0 siblings, 1 reply; 8+ messages in thread
From: Vaibhav Rustagi @ 2019-04-02  5:35 UTC (permalink / raw)
  To: stable; +Cc: hannes, tj, mhocko, vdavydov.dev, guro, riel, sfr, akpm, torvalds

In the Linux stable kernel (tested on 4.14), reading memory.stat takes
100 ms to 700 ms when tens of thousands of ghost cgroups are pinned by
lingering page cache.

Repro steps (tested on 4.14 kernel):

$ cat /tmp/make_zombies

mkdir /tmp/fs
mount -t tmpfs nodev /tmp/fs
for i in {1..10000}; do
   mkdir /sys/fs/cgroup/memory/z$i
   (echo $BASHPID >> /sys/fs/cgroup/memory/z$i/cgroup.procs && echo $i > /tmp/fs/$i)
done

# establish baseline
$ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
0.011642670 seconds time elapsed

$ bash /tmp/make_zombies
$ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
0.134939281 seconds time elapsed

$ rmdir /sys/fs/cgroup/memory/z*
$ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
0.135323145 seconds time elapsed
# even after rmdir we have zombies, so still slow.
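The cost is roughly linear in the number of zombie memcgs, because each
memory.stat read re-accumulates counters over every cgroup in the
hierarchy. A toy model of that walk in plain shell (stand-in files, not
kernel code; the counter values are arbitrary):

```shell
# Toy model of the pre-fix behaviour: every "memory.stat read" walks all
# descendant memcgs (here: one file per zombie) and sums their counters,
# so read cost grows linearly with the number of zombies.
d=$(mktemp -d)

# Create 200 "zombie" counters (stand-ins for offlined memcgs).
i=1
while [ "$i" -le 200 ]; do
    echo "$i" > "$d/z$i"
    i=$((i + 1))
done

# One stat read = O(number of zombies): sum every counter.
total=0
for f in "$d"/z*; do
    total=$((total + $(cat "$f")))
done
echo "$total"   # 1+2+...+200 = 20100

rm -rf "$d"
```

The fix below moves the aggregation work to the write side, so the read
side no longer pays per-zombie cost.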

The fix has been in Linux mainline since 4.16, via the following commits:

c9019e9bf42e66d028d70d2da6206cad4dd9250d mm: memcontrol: eliminate raw access to stat and event counters
284542656e22c43fdada8c8cc0ca9ede8453eed7 mm: memcontrol: implement lruvec stat functions on top of each other
a983b5ebee57209c99f68c8327072f25e0e6e3da mm: memcontrol: fix excessive complexity in memory.stat reporting
c3cc39118c3610eb6ab4711bc624af7fc48a35fe mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats
e27be240df53f1a20c659168e722b5d9f16cc7f4 mm: memcg: make sure memory.events is uptodate when waking pollers

I would like to request cherry-picking the above commits to the
linux-4.14.y stable branch.

Thanks,
Vaibhav


* Re: [For Stable] mm: memcontrol: fix excessive complexity in memory.stat reporting
  2019-04-02  5:35 [For Stable] mm: memcontrol: fix excessive complexity in memory.stat reporting Vaibhav Rustagi
@ 2019-04-24 16:50 ` Greg KH
  2019-04-24 17:35   ` Vaibhav Rustagi
  0 siblings, 1 reply; 8+ messages in thread
From: Greg KH @ 2019-04-24 16:50 UTC (permalink / raw)
  To: Vaibhav Rustagi
  Cc: stable, hannes, tj, mhocko, vdavydov.dev, guro, riel, sfr, akpm,
	torvalds

On Mon, Apr 01, 2019 at 10:35:59PM -0700, Vaibhav Rustagi wrote:
> In the Linux stable kernel (tested on 4.14), reading memory.stat takes
> 100 ms to 700 ms when tens of thousands of ghost cgroups are pinned by
> lingering page cache.

Great, don't do that :)

> Repro steps (tested on 4.14 kernel):
> 
> $ cat /tmp/make_zombies
> 
> mkdir /tmp/fs
> mount -t tmpfs nodev /tmp/fs
> for i in {1..10000}; do
>    mkdir /sys/fs/cgroup/memory/z$i
>    (echo $BASHPID >> /sys/fs/cgroup/memory/z$i/cgroup.procs && echo $i > /tmp/fs/$i)
> done
> 
> # establish baseline
> $ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
> 0.011642670 seconds time elapsed
> 
> $ bash /tmp/make_zombies
> $ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
> 0.134939281 seconds time elapsed
> 
> $ rmdir /sys/fs/cgroup/memory/z*
> $ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
> 0.135323145 seconds time elapsed
> # even after rmdir we have zombies, so still slow.
> 
> The fix has been in Linux mainline since 4.16, via the following commits:
> 
> c9019e9bf42e66d028d70d2da6206cad4dd9250d mm: memcontrol: eliminate raw access to stat and event counters
> 284542656e22c43fdada8c8cc0ca9ede8453eed7 mm: memcontrol: implement lruvec stat functions on top of each other
> a983b5ebee57209c99f68c8327072f25e0e6e3da mm: memcontrol: fix excessive complexity in memory.stat reporting
> c3cc39118c3610eb6ab4711bc624af7fc48a35fe mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats
> e27be240df53f1a20c659168e722b5d9f16cc7f4 mm: memcg: make sure memory.events is uptodate when waking pollers
> 
> I would like to request cherry-picking the above commits to the
> linux-4.14.y stable branch.

What's wrong with just moving to a newer kernel, like 4.19.y, if you
have this issue?  That's a much better thing to do than to backport the
above patches, right?

As this is just an "annoyance" on the old kernel, I don't really see
why it needs to be backported; it can't cause any problems overall,
right?

thanks,

greg k-h


* Re: [For Stable] mm: memcontrol: fix excessive complexity in memory.stat reporting
  2019-04-24 16:50 ` Greg KH
@ 2019-04-24 17:35   ` Vaibhav Rustagi
  2019-04-24 18:34     ` Greg KH
  0 siblings, 1 reply; 8+ messages in thread
From: Vaibhav Rustagi @ 2019-04-24 17:35 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, hannes, tj, mhocko, vdavydov.dev, guro, riel, sfr, akpm,
	torvalds, Aditya Kali

Apologies for sending a non-plain text e-mail previously.

This issue is encountered in actual production environments by our
customers, who are constantly creating containers and tearing them down
(using Kubernetes for the workload).  Kubernetes constantly reads the
memory.stat file to account for memory usage, and over time (around a
week) the memcgs accumulate, the response time for reading memory.stat
increases, and customer applications are affected.
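For what it's worth, the read latency itself is easy to watch for. A
small sketch of a probe (hypothetical helper, not part of Kubernetes;
the millisecond budget is an arbitrary assumption) that flags a slow
stat-file read the way a node agent might:

```shell
# Hypothetical probe: time a single read of a stat file and print OK or
# SLOW depending on a millisecond budget. Uses GNU date's %N (nanoseconds),
# so this assumes Linux with coreutils.
time_read() {
    file=$1
    budget_ms=$2
    start=$(date +%s%N)
    cat "$file" > /dev/null
    end=$(date +%s%N)
    elapsed_ms=$(( (end - start) / 1000000 ))
    if [ "$elapsed_ms" -gt "$budget_ms" ]; then
        echo "SLOW ${elapsed_ms}ms"
    else
        echo "OK ${elapsed_ms}ms"
    fi
}

# Demo against a file that always exists; on an affected node you would
# point this at /sys/fs/cgroup/memory/memory.stat instead.
time_read /proc/self/status 5000
```

Pointed at /sys/fs/cgroup/memory/memory.stat on an affected node, the
reported time grows as zombie memcgs accumulate.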

The repro steps mentioned previously were only used for testing the
patches locally.

Yes, we are moving to 4.19, but we are also supporting 4.14 until
Jan 2020 (so production environments will still contain the 4.14
kernel).

Let me know your thoughts on this.

Thanks,
Vaibhav


On Wed, Apr 24, 2019 at 9:50 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Mon, Apr 01, 2019 at 10:35:59PM -0700, Vaibhav Rustagi wrote:
> > In the Linux stable kernel (tested on 4.14), reading memory.stat takes
> > 100 ms to 700 ms when tens of thousands of ghost cgroups are pinned by
> > lingering page cache.
>
> Great, don't do that :)
>
> > Repro steps (tested on 4.14 kernel):
> >
> > $ cat /tmp/make_zombies
> >
> > mkdir /tmp/fs
> > mount -t tmpfs nodev /tmp/fs
> > for i in {1..10000}; do
> >    mkdir /sys/fs/cgroup/memory/z$i
> >    (echo $BASHPID >> /sys/fs/cgroup/memory/z$i/cgroup.procs && echo $i
> > > /tmp/fs/$i)
> >  done
> >
> > # establish baseline
> > $ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
> > 0.011642670 seconds time elapsed
> >
> > $ bash /tmp/make_zombies
> > $ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
> > 0.134939281 seconds time elapsed
> >
> > $ rmdir /sys/fs/cgroup/memory/z*
> > $ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
> > 0.135323145 seconds time elapsed
> > # even after rmdir we have zombies, so still slow.
> >
> > The fix has been in Linux mainline since 4.16, via the following commits:
> >
> > c9019e9bf42e66d028d70d2da6206cad4dd9250d mm: memcontrol: eliminate raw access to stat and event counters
> > 284542656e22c43fdada8c8cc0ca9ede8453eed7 mm: memcontrol: implement lruvec stat functions on top of each other
> > a983b5ebee57209c99f68c8327072f25e0e6e3da mm: memcontrol: fix excessive complexity in memory.stat reporting
> > c3cc39118c3610eb6ab4711bc624af7fc48a35fe mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats
> > e27be240df53f1a20c659168e722b5d9f16cc7f4 mm: memcg: make sure memory.events is uptodate when waking pollers
> >
> > I would like to request cherry-picking the above commits to the
> > linux-4.14.y stable branch.
>
> What's wrong with just moving to a newer kernel, like 4.19.y, if you
> have this issue?  That's a much better thing to do than to backport the
> above patches, right?
>
> As this is just an "annoyance" on the old kernel, I don't really see
> why it needs to be backported; it can't cause any problems overall,
> right?
>
> thanks,
>
> greg k-h


* Re: [For Stable] mm: memcontrol: fix excessive complexity in memory.stat reporting
  2019-04-24 17:35   ` Vaibhav Rustagi
@ 2019-04-24 18:34     ` Greg KH
  2019-04-30 20:41       ` Vaibhav Rustagi
  0 siblings, 1 reply; 8+ messages in thread
From: Greg KH @ 2019-04-24 18:34 UTC (permalink / raw)
  To: Vaibhav Rustagi
  Cc: stable, hannes, tj, mhocko, vdavydov.dev, guro, riel, sfr, akpm,
	torvalds, Aditya Kali


A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

A: No.
Q: Should I include quotations after my reply?

http://daringfireball.net/2007/07/on_top

On Wed, Apr 24, 2019 at 10:35:51AM -0700, Vaibhav Rustagi wrote:
> Apologies for sending a non-plain text e-mail previously.
> 
> This issue is encountered in actual production environments by our
> customers, who are constantly creating containers and tearing them down
> (using Kubernetes for the workload).  Kubernetes constantly reads the
> memory.stat file to account for memory usage, and over time (around a
> week) the memcgs accumulate, the response time for reading memory.stat
> increases, and customer applications are affected.

Please define "affected".  Their apps still run properly, so all should
be fine, it would be kubernetes that sees the slowdowns, not the
application.  How exactly does this show up to an end-user?

> The repro steps mentioned previously were only used for testing the
> patches locally.
> 
> Yes, we are moving to 4.19, but we are also supporting 4.14 until
> Jan 2020 (so production environments will still contain the 4.14
> kernel).

If you are already moving to 4.19, this seems like as good a reason as
any (hint, I can give you more) to move off of 4.14 at this point in
time.  There's no real need to keep 4.14 around; given that you don't
have any out-of-tree code in your kernels, it should be simple to just
update at the next reboot, right?

thanks,

greg k-h


* Re: [For Stable] mm: memcontrol: fix excessive complexity in memory.stat reporting
  2019-04-24 18:34     ` Greg KH
@ 2019-04-30 20:41       ` Vaibhav Rustagi
  2019-05-01  7:08         ` Greg KH
  0 siblings, 1 reply; 8+ messages in thread
From: Vaibhav Rustagi @ 2019-04-30 20:41 UTC (permalink / raw)
  To: Greg KH
  Cc: stable, hannes, tj, mhocko, vdavydov.dev, guro, riel, sfr, akpm,
	torvalds, Aditya Kali

On Wed, Apr 24, 2019 at 11:53 AM Greg KH <gregkh@linuxfoundation.org> wrote:
>
>
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
> A: Top-posting.
> Q: What is the most annoying thing in e-mail?
>
> A: No.
> Q: Should I include quotations after my reply?
>
> http://daringfireball.net/2007/07/on_top
>
> On Wed, Apr 24, 2019 at 10:35:51AM -0700, Vaibhav Rustagi wrote:
> > Apologies for sending a non-plain text e-mail previously.
> >
> > This issue is encountered in actual production environments by our
> > customers, who are constantly creating containers and tearing them down
> > (using Kubernetes for the workload).  Kubernetes constantly reads the
> > memory.stat file to account for memory usage, and over time (around a
> > week) the memcgs accumulate, the response time for reading memory.stat
> > increases, and customer applications are affected.
>
> Please define "affected".  Their apps still run properly, so all should
> be fine, it would be kubernetes that sees the slowdowns, not the
> application.  How exactly does this show up to an end-user?
>

Over time, as the zombie cgroups accumulate, the kubelet (the process
doing the frequent memory.stat reads) becomes more CPU intensive, and
all other user containers running on the same machine are starved for
CPU.  This affects the user containers in at least two ways that we
know of: (1) users experience liveness-probe failures because their
applications do not complete in the expected amount of time. (2) New
user jobs cannot be scheduled.
There certainly is a possibility of reducing the adverse effect at the
Kubernetes level as well, and we are investigating that.  But the
requested kernel patches help keep the problem from getting worse.

> > The repro steps mentioned previously were only used for testing the
> > patches locally.
> >
> > Yes, we are moving to 4.19, but we are also supporting 4.14 until
> > Jan 2020 (so production environments will still contain the 4.14
> > kernel).
>
> If you are already moving to 4.19, this seems like as good a reason as
> any (hint, I can give you more) to move off of 4.14 at this point in
> time.  There's no real need to keep 4.14 around; given that you don't
> have any out-of-tree code in your kernels, it should be simple to just
> update at the next reboot, right?
>

Based on past experience, a major kernel upgrade sometimes introduces
new regressions as well.  So while we are working to roll out kernel
4.19, it may not be a practical solution for all users.

> thanks,
>
> greg k-h

Thanks,
Vaibhav


* Re: [For Stable] mm: memcontrol: fix excessive complexity in memory.stat reporting
  2019-04-30 20:41       ` Vaibhav Rustagi
@ 2019-05-01  7:08         ` Greg KH
  0 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2019-05-01  7:08 UTC (permalink / raw)
  To: Vaibhav Rustagi
  Cc: stable, hannes, tj, mhocko, vdavydov.dev, guro, riel, sfr, akpm,
	torvalds, Aditya Kali

On Tue, Apr 30, 2019 at 01:41:16PM -0700, Vaibhav Rustagi wrote:
> On Wed, Apr 24, 2019 at 11:53 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> >
> > A: Because it messes up the order in which people normally read text.
> > Q: Why is top-posting such a bad thing?
> > A: Top-posting.
> > Q: What is the most annoying thing in e-mail?
> >
> > A: No.
> > Q: Should I include quotations after my reply?
> >
> > http://daringfireball.net/2007/07/on_top
> >
> > On Wed, Apr 24, 2019 at 10:35:51AM -0700, Vaibhav Rustagi wrote:
> > > Apologies for sending a non-plain text e-mail previously.
> > >
> > > This issue is encountered in actual production environments by our
> > > customers, who are constantly creating containers and tearing them down
> > > (using Kubernetes for the workload).  Kubernetes constantly reads the
> > > memory.stat file to account for memory usage, and over time (around a
> > > week) the memcgs accumulate, the response time for reading memory.stat
> > > increases, and customer applications are affected.
> >
> > Please define "affected".  Their apps still run properly, so all should
> > be fine, it would be kubernetes that sees the slowdowns, not the
> > application.  How exactly does this show up to an end-user?
> >
> 
> Over time, as the zombie cgroups accumulate, the kubelet (the process
> doing the frequent memory.stat reads) becomes more CPU intensive, and
> all other user containers running on the same machine are starved for
> CPU.  This affects the user containers in at least two ways that we
> know of: (1) users experience liveness-probe failures because their
> applications do not complete in the expected amount of time.

"expected amount of time" is interesting to claim in a shared
environment :)

> (2) New user jobs cannot be scheduled.

Really?  This slows down starting new processes?  Or is this just
slowing down your system overall?

> There certainly is a possibility of reducing the adverse effect at the
> Kubernetes level as well, and we are investigating that.  But the
> requested kernel patches help keep the problem from getting worse.

I understand this is a kernel issue, but if you see this happen, just
updating to a modern kernel should be fine.

> > > The repro steps mentioned previously were only used for testing the
> > > patches locally.
> > >
> > > Yes, we are moving to 4.19, but we are also supporting 4.14 until
> > > Jan 2020 (so production environments will still contain the 4.14
> > > kernel).
> >
> > If you are already moving to 4.19, this seems like as good a reason as
> > any (hint, I can give you more) to move off of 4.14 at this point in
> > time.  There's no real need to keep 4.14 around; given that you don't
> > have any out-of-tree code in your kernels, it should be simple to just
> > update at the next reboot, right?
> >
> 
> Based on past experience, a major kernel upgrade sometimes introduces
> new regressions as well.  So while we are working to roll out kernel
> 4.19, it may not be a practical solution for all users.

If you are not doing the exact same testing scenario for a new 4.14.y
kernel release as you are for a move to 4.19.y, then your roll-out
process is broken.

Given that 4.19.y is now 6 months old, I would have expected any "new
regressions" to have already been reported.  Please just use a new
kernel, and if you have regressions, we will work to address them.

thanks,

greg k-h


* Re: [For Stable] mm: memcontrol: fix excessive complexity in memory.stat reporting
  2019-04-01 20:34 Vaibhav Rustagi
@ 2019-04-02  5:24 ` Greg KH
  0 siblings, 0 replies; 8+ messages in thread
From: Greg KH @ 2019-04-02  5:24 UTC (permalink / raw)
  To: Vaibhav Rustagi; +Cc: stable, hannes

On Mon, Apr 01, 2019 at 01:34:14PM -0700, Vaibhav Rustagi wrote:
> In the Linux stable kernel (tested on 4.14), reading memory.stat takes
> 100 ms to 700 ms when tens of thousands of ghost cgroups are pinned by
> lingering page cache.
> 
> Repro steps (tested on 4.14 kernel):
> 
> $ cat /tmp/make_zombies
> 
> mkdir /tmp/fs
> mount -t tmpfs nodev /tmp/fs
> for i in {1..10000}; do
>    mkdir /sys/fs/cgroup/memory/z$i
>    (echo $BASHPID >> /sys/fs/cgroup/memory/z$i/cgroup.procs && echo $i > /tmp/fs/$i)
> done
> 
> # establish baseline
> $ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
> 0.011642670 seconds time elapsed
> 
> $ bash /tmp/make_zombies
> $ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
> 0.134939281 seconds time elapsed
> 
> $ rmdir /sys/fs/cgroup/memory/z*
> $ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
> 0.135323145 seconds time elapsed
> # even after rmdir we have zombies, so still slow.
> 
> The fix has been in Linux mainline since 4.16, via the following commits:
> 
> c9019e9bf42e66d028d70d2da6206cad4dd9250d mm: memcontrol: eliminate raw access to stat and event counters
> 284542656e22c43fdada8c8cc0ca9ede8453eed7 mm: memcontrol: implement lruvec stat functions on top of each other
> a983b5ebee57209c99f68c8327072f25e0e6e3da mm: memcontrol: fix excessive complexity in memory.stat reporting
> c3cc39118c3610eb6ab4711bc624af7fc48a35fe mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats
> e27be240df53f1a20c659168e722b5d9f16cc7f4 mm: memcg: make sure memory.events is uptodate when waking pollers
> 
> I would like to request cherry-picking the above commits to the
> linux-4.14.y stable branch.

Please resend and cc: all of the people on those patches so we can get
their opinion on if this is stable kernel material or not.

thanks,

greg k-h


* [For Stable] mm: memcontrol: fix excessive complexity in memory.stat reporting
@ 2019-04-01 20:34 Vaibhav Rustagi
  2019-04-02  5:24 ` Greg KH
  0 siblings, 1 reply; 8+ messages in thread
From: Vaibhav Rustagi @ 2019-04-01 20:34 UTC (permalink / raw)
  To: stable; +Cc: hannes

In the Linux stable kernel (tested on 4.14), reading memory.stat takes
100 ms to 700 ms when tens of thousands of ghost cgroups are pinned by
lingering page cache.

Repro steps (tested on 4.14 kernel):

$ cat /tmp/make_zombies

mkdir /tmp/fs
mount -t tmpfs nodev /tmp/fs
for i in {1..10000}; do
   mkdir /sys/fs/cgroup/memory/z$i
   (echo $BASHPID >> /sys/fs/cgroup/memory/z$i/cgroup.procs && echo $i > /tmp/fs/$i)
done

# establish baseline
$ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
0.011642670 seconds time elapsed

$ bash /tmp/make_zombies
$ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
0.134939281 seconds time elapsed

$ rmdir /sys/fs/cgroup/memory/z*
$ perf stat -r3 cat /sys/fs/cgroup/memory/memory.stat > /dev/null
0.135323145 seconds time elapsed
# even after rmdir we have zombies, so still slow.

The fix has been in Linux mainline since 4.16, via the following commits:

c9019e9bf42e66d028d70d2da6206cad4dd9250d mm: memcontrol: eliminate raw access to stat and event counters
284542656e22c43fdada8c8cc0ca9ede8453eed7 mm: memcontrol: implement lruvec stat functions on top of each other
a983b5ebee57209c99f68c8327072f25e0e6e3da mm: memcontrol: fix excessive complexity in memory.stat reporting
c3cc39118c3610eb6ab4711bc624af7fc48a35fe mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats
e27be240df53f1a20c659168e722b5d9f16cc7f4 mm: memcg: make sure memory.events is uptodate when waking pollers

I would like to request cherry-picking the above commits to the
linux-4.14.y stable branch.

Thanks,
Vaibhav
