* Re: [Bug 190841] New: [REGRESSION] Intensive Memory CGroup removal leads to high load average 10+
       [not found] <bug-190841-27@https.bugzilla.kernel.org/>
@ 2017-01-05  1:30 ` Andrew Morton
  2017-01-05 12:33   ` Michal Hocko
                     ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Andrew Morton @ 2017-01-05  1:30 UTC (permalink / raw)
  To: Michal Hocko, Johannes Weiner; +Cc: bugzilla-daemon, frolvlad, linux-mm


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 21 Dec 2016 19:56:16 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=190841
> 
>             Bug ID: 190841
>            Summary: [REGRESSION] Intensive Memory CGroup removal leads to
>                     high load average 10+
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 4.7.0-rc1+
>           Hardware: All
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>           Assignee: akpm@linux-foundation.org
>           Reporter: frolvlad@gmail.com
>         Regression: No
> 
> My simplified workflow looks like this:
> 
> 1. Create a Memory CGroup with memory limit
> 2. Exec a child process
> 3. Add the child process PID into the Memory CGroup
> 4. Wait for the child process to finish
> 5. Remove the Memory CGroup
> 
> The child processes usually run less than 0.1 seconds, but I have lots of them.
> Normally, I could run over 10000 child processes per minute, but with newer
> kernels, I can only do 400-500 executions per minute, and my system becomes
> extremely sluggish (the only indicator of the weirdness I found is an unusually
> high load average, which sometimes goes over 250!).
> 
> Here is a simple reproduction script:
> 
> #!/bin/sh
> CGROUP_BASE=/sys/fs/cgroup/memory/qq
> 
> for i in $(seq 1000); do
>     echo "Iteration #$i"
>     sh -c "
>         mkdir '$CGROUP_BASE'
>         sh -c 'echo \$$ > $CGROUP_BASE/tasks ; sleep 0.0'
>         rmdir '$CGROUP_BASE' || true
>     "
> done
> # ===
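
A variant of the inner loop that also applies the memory limit from step 1
of the workflow could look like this (a sketch; the 64M value is an
arbitrary example, and memory.limit_in_bytes is the cgroup v1 interface):

        sh -c "
            mkdir '$CGROUP_BASE'
            echo 64M > '$CGROUP_BASE/memory.limit_in_bytes'   # example limit (step 1)
            sh -c 'echo \$$ > $CGROUP_BASE/tasks ; sleep 0.0'
            rmdir '$CGROUP_BASE' || true
        "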
> 
> Running this script on 4.7.0-rc1 and above I get a noticeable slowdown and also
> high load average with no other indicators like high CPU or IO usage reported
> in top/iotop/vmstat.
> 
> It used to work just fine up until kernel 4.7.0. In fact, I jumped from a
> 4.4 to a 4.8 kernel, so I had to test several kernels before I came to the
> conclusion that this seems to be a regression in the kernel. Currently, I have
> tried the following kernels (using a fresh minimal Ubuntu 16.04 on VirtualBox
> with their binary mainline kernels):
> 
> * Ubuntu 4.4.0-57 kernel works fine
> * Mainline 4.4.39 and below seem to work just fine -
> https://youtu.be/tGD6sfwa-3c
> * Mainline 4.6.7 kernel behaves seminormal, load average is higher than on 4.4,
> but not as bad as on 4.7+ - https://youtu.be/-CyhmkkPbKE
> * Mainline 4.7.0-rc1 kernel is the first kernel after 4.6.7 that is available
> in binaries, so I chose to test it and it doesn't play nicely -
> https://youtu.be/C_J5es74Ars
> * Mainline 4.9.0 kernel still doesn't play nicely -
> https://youtu.be/_o17U5x3bmY
> 
> OTHER NOTES:
> 1. Using VirtualBox I have noticed that this bug is only reproducible when I
> have 2+ CPU cores!
> 2. This bug is also reproducible on other Linux distributions: Fedora 25 with
> 4.8.14-300.fc25.x86_64 kernel, latest Arch Linux with 4.8.13 and 4.8.15 with
> Liquorix patchset.
> 3. Commenting out `rmdir '$CGROUP_BASE'` in the reproduction script makes
> things fly yet again, but I don't want to leave leftovers after the runs.
> 
> -- 
> You are receiving this mail because:
> You are the assignee for the bug.


* Re: [Bug 190841] New: [REGRESSION] Intensive Memory CGroup removal leads to high load average 10+
  2017-01-05  1:30 ` [Bug 190841] New: [REGRESSION] Intensive Memory CGroup removal leads to high load average 10+ Andrew Morton
@ 2017-01-05 12:33   ` Michal Hocko
  2017-01-05 20:26     ` Vladyslav Frolov
  2017-01-05 21:22   ` Johannes Weiner
  2017-01-06 16:28   ` Vladimir Davydov
  2 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2017-01-05 12:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Johannes Weiner, bugzilla-daemon, frolvlad, linux-mm

On Wed 04-01-17 17:30:37, Andrew Morton wrote:
> 
> (switched to email.  Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
> 
> On Wed, 21 Dec 2016 19:56:16 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
> 
> > https://bugzilla.kernel.org/show_bug.cgi?id=190841
> > 
> >             Bug ID: 190841
> >            Summary: [REGRESSION] Intensive Memory CGroup removal leads to
> >                     high load average 10+
> >            Product: Memory Management
> >            Version: 2.5
> >     Kernel Version: 4.7.0-rc1+
> >           Hardware: All
> >                 OS: Linux
> >               Tree: Mainline
> >             Status: NEW
> >           Severity: normal
> >           Priority: P1
> >          Component: Other
> >           Assignee: akpm@linux-foundation.org
> >           Reporter: frolvlad@gmail.com
> >         Regression: No
> > 
> > My simplified workflow looks like this:
> > 
> > 1. Create a Memory CGroup with memory limit
> > 2. Exec a child process
> > 3. Add the child process PID into the Memory CGroup
> > 4. Wait for the child process to finish
> > 5. Remove the Memory CGroup
> > 
> > The child processes usually run less than 0.1 seconds, but I have lots of them.
> > Normally, I could run over 10000 child processes per minute, but with newer
> > kernels, I can only do 400-500 executions per minute, and my system becomes
> > extremely sluggish (the only indicator of the weirdness I found is an unusually
> > high load average, which sometimes goes over 250!).

Well, yes, rmdir is not the cheapest operation... Since b2052564e66d
("mm: memcontrol: continue cache reclaim from offlined groups") we
postpone the real memcg removal until there is memory pressure.
73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many
small jobs") fixed unbounded id space consumption. I would be quite
surprised if this caused a new regression. But the report says that
this is a 4.7+ thing. I would expect older kernels to just refuse to
create new cgroups... Maybe that happens in your script and just goes
unnoticed?

We might come up with some more hardening in the offline path (e.g.
count the number of dead memcgs and force their reclaim after some
number accumulates). But all that just adds more code and risk of
regression for something that is not used very often. Cgroup
creation/destruction is too heavy an operation to be done for very
short-lived processes, even without memcg involved. Are there any
strong reasons you cannot reuse an existing cgroup?
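
For illustration, such reuse could look roughly like the sketch below
(assuming cgroup v1 with the memory controller mounted at
/sys/fs/cgroup/memory, strictly serialized runs, and my_short_job as a
placeholder for the real workload):

    CG=/sys/fs/cgroup/memory/reused_qq      # hypothetical path, created once
    mkdir "$CG"
    echo 64M > "$CG/memory.limit_in_bytes"

    for i in $(seq 1000); do
        sh -c "echo \$\$ > '$CG/tasks'; exec my_short_job"   # job runs inside the memcg
        cat "$CG/memory.max_usage_in_bytes"                  # high-water mark for this run
        echo 0 > "$CG/memory.max_usage_in_bytes"             # reset the watermark
    done

    rmdir "$CG"     # a single rmdir at the end instead of one per run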

> > Here is a simple reproduction script:
> > 
> > #!/bin/sh
> > CGROUP_BASE=/sys/fs/cgroup/memory/qq
> > 
> > for i in $(seq 1000); do
> >     echo "Iteration #$i"
> >     sh -c "
> >         mkdir '$CGROUP_BASE'
> >         sh -c 'echo \$$ > $CGROUP_BASE/tasks ; sleep 0.0'

one possible workaround would be to do
            echo 1 > $CGROUP_BASE/memory.force_empty

before you remove the cgroup. That should drop the existing charges - at
least for the page cache which might be what keeps those memcgs alive.

> >         rmdir '$CGROUP_BASE' || true
> >     "
> > done
> > # ===
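
Written into the reproduction loop, that would look roughly like this (a
sketch; memory.force_empty is a cgroup v1-only interface):

        sh -c "
            mkdir '$CGROUP_BASE'
            sh -c 'echo \$$ > $CGROUP_BASE/tasks ; sleep 0.0'
            echo 1 > '$CGROUP_BASE/memory.force_empty'   # drop remaining page cache charges
            rmdir '$CGROUP_BASE' || true
        "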

-- 
Michal Hocko
SUSE Labs


* Re: [Bug 190841] New: [REGRESSION] Intensive Memory CGroup removal leads to high load average 10+
  2017-01-05 12:33   ` Michal Hocko
@ 2017-01-05 20:26     ` Vladyslav Frolov
  2017-01-06 14:08       ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Vladyslav Frolov @ 2017-01-05 20:26 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, Johannes Weiner, bugzilla-daemon, linux-mm

> I would expect older kernels to just refuse to create new cgroups... Maybe that happens in your script and just goes unnoticed?

I have been running a production service that does this "intensive"
cgroup creation and cleanup for over a year now, and it just works
with 3.x - 4.5 kernels (currently I run it on an LTS 4.4 kernel),
triggering up to 100 cgroup creation/cleanup events per second
non-stop for months, and I haven't noticed any refused cgroup
creations whatsoever, even on 1 GB RAM boxes.


> Even without memcg involved. Are there any strong reasons you cannot reuse an existing cgroup?

I run concurrent executions (I use cgmemtime
[https://github.com/gsauthof/cgmemtime] to measure the high-water
memory usage of a group of processes), so I cannot reuse a single
cgroup, and I currently cannot maintain a pool of cgroups (it would
add extra complexity to my code and would require patching cgmemtime,
while older kernels just worked fine). Do you believe there is no bug
here and it is just slow by design? There are a few odd things here:

1. 4.7+ kernels perform 20 times *slower*, while postponing should in
theory speed things up due to its "async" nature.
2. Other cgroup creation/cleanup works like a charm; it is only the
`memory` cgroup that overloads my system.


> echo 1 > $CGROUP_BASE/memory.force_empty

This didn't help at all.



* Re: [Bug 190841] New: [REGRESSION] Intensive Memory CGroup removal leads to high load average 10+
  2017-01-05  1:30 ` [Bug 190841] New: [REGRESSION] Intensive Memory CGroup removal leads to high load average 10+ Andrew Morton
  2017-01-05 12:33   ` Michal Hocko
@ 2017-01-05 21:22   ` Johannes Weiner
  2017-01-06 16:28   ` Vladimir Davydov
  2 siblings, 0 replies; 9+ messages in thread
From: Johannes Weiner @ 2017-01-05 21:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, bugzilla-daemon, frolvlad, linux-mm, Vladimir Davydov

On Wed, Jan 04, 2017 at 05:30:37PM -0800, Andrew Morton wrote:
> > My simplified workflow looks like this:
> > 
> > 1. Create a Memory CGroup with memory limit
> > 2. Exec a child process
> > 3. Add the child process PID into the Memory CGroup
> > 4. Wait for the child process to finish
> > 5. Remove the Memory CGroup
> > 
> > The child processes usually run less than 0.1 seconds, but I have lots of them.
> > Normally, I could run over 10000 child processes per minute, but with newer
> > kernels, I can only do 400-500 executions per minute, and my system becomes
> > extremely sluggish (the only indicator of the weirdness I found is an unusually
> > high load average, which sometimes goes over 250!).
> > 
> > Here is a simple reproduction script:
> > 
> > #!/bin/sh
> > CGROUP_BASE=/sys/fs/cgroup/memory/qq
> > 
> > for i in $(seq 1000); do
> >     echo "Iteration #$i"
> >     sh -c "
> >         mkdir '$CGROUP_BASE'
> >         sh -c 'echo \$$ > $CGROUP_BASE/tasks ; sleep 0.0'
> >         rmdir '$CGROUP_BASE' || true
> >     "
> > done
> > # ===

You're not even running anything concurrently. While I agree with
Michal that cgroup creation and destruction are not the fastest paths,
a load of 250 from a single-threaded testcase is silly.

We recently had a load spike issue with the on-demand memcg slab
cache duplication, but that should have happened in 4.6 already. I
don't see anything suspicious going into memcontrol.c after 4.6.

When the load is high like this, can you check with ps what the
blocked tasks are?

A run with perf record -a might also give us an idea of whether cycles
are going to the wrong place.
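
For instance, something along these lines (a sketch; exact ps field
widths and perf options can be adjusted):

    # list tasks stuck in uninterruptible sleep (D state) and where they block
    ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'

    # sample the whole system for ~30s while the reproduction script runs
    perf record -a -g -- sleep 30
    perf report --stdio | head -n 50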

I'll try to reproduce this once I have access to my test machine again
next week.


* Re: [Bug 190841] New: [REGRESSION] Intensive Memory CGroup removal leads to high load average 10+
  2017-01-05 20:26     ` Vladyslav Frolov
@ 2017-01-06 14:08       ` Michal Hocko
  0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2017-01-06 14:08 UTC (permalink / raw)
  To: Vladyslav Frolov
  Cc: Andrew Morton, Johannes Weiner, bugzilla-daemon, linux-mm

On Thu 05-01-17 22:26:53, Vladyslav Frolov wrote:
[...]
> > Even without memcg involved. Are there any strong reasons you cannot reuse an existing cgroup?
> 
> I run concurrent executions (I use cgmemtime
> [https://github.com/gsauthof/cgmemtime] to measure the high-water
> memory usage of a group of processes), so I cannot reuse a single
> cgroup, and I currently cannot maintain a pool of cgroups (it would
> add extra complexity to my code and would require patching cgmemtime,
> while older kernels just worked fine). Do you believe there is no bug
> here and it is just slow by design?

> There are a few odd things here:
> 
> 1. 4.7+ kernels perform 20 times *slower*, while postponing should in
> theory speed things up due to its "async" nature.
> 2. Other cgroup creation/cleanup works like a charm; it is only the
> `memory` cgroup that overloads my system.
> 
> > echo 1 > $CGROUP_BASE/memory.force_empty
> 
> This didn't help at all.

OK, then it is not just the page cache staying behind that prevents
those memcgs from going away. Another reason might be kmem charges.
Memcg kernel memory accounting has been enabled by default since 4.6
AFAIR. You say the slowdown is a 4.7+ thing, though, so this might be
completely unrelated. But it would be good to see whether the same
happens with the kernel command line:
cgroup.memory=nokmem
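
On a GRUB-based distribution like the Ubuntu setup from the report, that
could be tested roughly as follows (a sketch; file paths and the update
command differ between distributions):

    # /etc/default/grub: add the parameter to the existing line, e.g.
    #   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash cgroup.memory=nokmem"
    sudo update-grub      # Debian/Ubuntu; use grub2-mkconfig elsewhere
    sudo reboot

    # after reboot, confirm the parameter took effect
    grep -o 'cgroup.memory=nokmem' /proc/cmdline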
-- 
Michal Hocko
SUSE Labs


* Re: [Bug 190841] New: [REGRESSION] Intensive Memory CGroup removal leads to high load average 10+
  2017-01-05  1:30 ` [Bug 190841] New: [REGRESSION] Intensive Memory CGroup removal leads to high load average 10+ Andrew Morton
  2017-01-05 12:33   ` Michal Hocko
  2017-01-05 21:22   ` Johannes Weiner
@ 2017-01-06 16:28   ` Vladimir Davydov
  2017-01-12 13:55     ` Vladyslav Frolov
  2 siblings, 1 reply; 9+ messages in thread
From: Vladimir Davydov @ 2017-01-06 16:28 UTC (permalink / raw)
  To: frolvlad
  Cc: Michal Hocko, Johannes Weiner, Andrew Morton, bugzilla-daemon, linux-mm

Hello,

The issue does look kmemcg-related - see below.

On Wed, Jan 04, 2017 at 05:30:37PM -0800, Andrew Morton wrote:

> > * Ubuntu 4.4.0-57 kernel works fine
> > * Mainline 4.4.39 and below seem to work just fine -
> > https://youtu.be/tGD6sfwa-3c

kmemcg is disabled

> > * Mainline 4.6.7 kernel behaves seminormal, load average is higher than on 4.4,
> > but not as bad as on 4.7+ - https://youtu.be/-CyhmkkPbKE

4.6+

b313aeee25098 mm: memcontrol: enable kmem accounting for all cgroups in the legacy hierarchy

kmemcg is enabled by default for all cgroups, which introduces extra
overhead to the memcg destruction path.

> > * Mainline 4.7.0-rc1 kernel is the first kernel after 4.6.7 that is available
> > in binaries, so I chose to test it and it doesn't play nicely -
> > https://youtu.be/C_J5es74Ars

4.7+

81ae6d03952c1 mm/slub.c: replace kick_all_cpus_sync() with synchronize_sched() in kmem_cache_shrink()

kick_all_cpus_sync(), which was used for synchronizing slub cache
destruction before this commit, turned out to be too disruptive on big
SMP machines as it generates a lot of IPIs, so it was replaced with the
more lightweight synchronize_sched(). The latter, however, blocks
cgroup rmdir under the slab_mutex for a relatively long time, resulting
in a higher load average as well as stalling other processes trying to
create or destroy a kmem cache.

> > * Mainline 4.9.0 kernel still doesn't play nicely -
> > https://youtu.be/_o17U5x3bmY

The above-mentioned issue is still unfixed.

> > 
> > OTHER NOTES:
> > 1. Using VirtualBox I have noticed that this bug is only reproducible when I
> > have 2+ CPU cores!

synchronize_sched() is a no-op on UP machines, which explains why the
problem goes away on a UP machine.

If I'm correct, the issue must have been fixed in 4.10, which is yet to
be released:

89e364db71fb5 slub: move synchronize_sched out of slab_mutex on shrink

You can work around it on older kernels by turning kmem accounting off.
To do that, append 'cgroup.memory=nokmem' to the kernel command line.
Alternatively, you can try recompiling the kernel with SLAB as the slab
allocator, because only SLUB is affected IIRC.
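
A quick way to check which case applies on a given box might be the
following (a sketch; the config path assumes a distribution kernel that
ships /boot/config-*):

    # which slab allocator is the running kernel built with?
    grep -E '^CONFIG_SL[AU]B=y' /boot/config-$(uname -r)

    # is kmem accounting already disabled on the command line?
    grep -o 'cgroup.memory=nokmem' /proc/cmdline || echo "kmem accounting enabled (default since 4.6)"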

FWIW I tried the script you provided in a 4 CPU VM running 4.10-rc2 and
didn't notice any significant stalls or latency spikes. Could you please
check if this kernel fixes your problem? If it does it might be worth
submitting the patch to stable..


* Re: [Bug 190841] New: [REGRESSION] Intensive Memory CGroup removal leads to high load average 10+
  2017-01-06 16:28   ` Vladimir Davydov
@ 2017-01-12 13:55     ` Vladyslav Frolov
  2017-01-12 15:33       ` Vladimir Davydov
  0 siblings, 1 reply; 9+ messages in thread
From: Vladyslav Frolov @ 2017-01-12 13:55 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Michal Hocko, Johannes Weiner, Andrew Morton, bugzilla-daemon, linux-mm

Indeed, `cgroup.memory=nokmem` works around the high load average on
all the kernels!

4.10rc2 kernel without `cgroup.memory=nokmem` behaves much better than
4.7-4.9 kernels, yet it still reaches LA ~6 using my reproduction
script, while LA <=1.0 is expected. 4.10rc2 feels like 4.6, which I
described as "seminormal".

Running the reproduction script 3000 times gives the following results:

* 4.4 kernel takes 13 seconds to complete and LA <= 1.0
* 4.6-4.10rc2 kernels with `cgroup.memory=nokmem` also take 13
seconds to complete and LA <= 1.0
* 4.6 kernel takes 25 seconds to complete and LA ~= 5
* 4.7-4.9 kernels take 6-9 minutes (yes, 25-40 times slower than with
`nokmem`) to complete and LA > 20
* 4.10rc2 kernel takes 60 seconds (4 times slower than with `nokmem`)
to complete and LA ~= 6
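
(A minimal way to collect numbers like these is sketched below; repro.sh
stands for the reproduction script above with the loop count raised to
3000.)

    time sh repro.sh      # wall-clock time for the whole run
    cat /proc/loadavg     # 1/5/15-minute load averages right after the run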




* Re: [Bug 190841] New: [REGRESSION] Intensive Memory CGroup removal leads to high load average 10+
  2017-01-12 13:55     ` Vladyslav Frolov
@ 2017-01-12 15:33       ` Vladimir Davydov
  2017-12-26 20:40         ` Vladyslav Frolov
  0 siblings, 1 reply; 9+ messages in thread
From: Vladimir Davydov @ 2017-01-12 15:33 UTC (permalink / raw)
  To: Vladyslav Frolov
  Cc: Michal Hocko, Johannes Weiner, Andrew Morton, bugzilla-daemon, linux-mm

On Thu, Jan 12, 2017 at 03:55:59PM +0200, Vladyslav Frolov wrote:
> Indeed, `cgroup.memory=nokmem` works around the high load average on
> all the kernels!
> 
> 4.10rc2 kernel without `cgroup.memory=nokmem` behaves much better than
> 4.7-4.9 kernels, yet it still reaches LA ~6 using my reproduction
> script, while LA <=1.0 is expected. 4.10rc2 feels like 4.6, which I
> described as "seminormal".

Thanks for trying it out. I'll think over the weekend about whether we
can do anything to further improve performance.


* Re: [Bug 190841] New: [REGRESSION] Intensive Memory CGroup removal leads to high load average 10+
  2017-01-12 15:33       ` Vladimir Davydov
@ 2017-12-26 20:40         ` Vladyslav Frolov
  0 siblings, 0 replies; 9+ messages in thread
From: Vladyslav Frolov @ 2017-12-26 20:40 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Michal Hocko, Johannes Weiner, Andrew Morton, bugzilla-daemon, linux-mm

It seems that this issue has been fixed in one of the recent major
releases. I cannot reproduce it on 4.14.8 now (I can still reproduce
the issue on the same host with older kernels, and even with 4.9.71
LTS).

Can someone close the issue on bugzilla?

Thank you!


end of thread, other threads:[~2017-12-26 20:40 UTC | newest]

Thread overview: 9+ messages
-- links below jump to the message on this page --
     [not found] <bug-190841-27@https.bugzilla.kernel.org/>
2017-01-05  1:30 ` [Bug 190841] New: [REGRESSION] Intensive Memory CGroup removal leads to high load average 10+ Andrew Morton
2017-01-05 12:33   ` Michal Hocko
2017-01-05 20:26     ` Vladyslav Frolov
2017-01-06 14:08       ` Michal Hocko
2017-01-05 21:22   ` Johannes Weiner
2017-01-06 16:28   ` Vladimir Davydov
2017-01-12 13:55     ` Vladyslav Frolov
2017-01-12 15:33       ` Vladimir Davydov
2017-12-26 20:40         ` Vladyslav Frolov
