linux-kernel.vger.kernel.org archive mirror
* [BUG]: mm/vmscan.c: shrink_slab does not work correctly with memcg disabled via commandline
From: Jan Hadrava @ 2019-08-01 13:42 UTC
  To: linux-mm; +Cc: linux-kernel, wizards

There seems to be a bug in the shrink_slab() function in mm/vmscan.c when the
kernel is compiled with CONFIG_MEMCG=y and memcg is then disabled at boot with
the command-line parameter cgroup_disable=memory. Slab caches are then not
shrunk when system memory is consumed by userspace.

This issue is present in linux-stable 4.19 and all newer lines
(tested on git tags v5.3-rc2, v5.2.5, v5.1.21, v4.19.63).
It is not present in 4.14.135 (tag v4.14.135).

Git bisect is pointing to commit:
	b0dedc49a2daa0f44ddc51fbf686b2ef012fccbf

In particular, the last hunk seems to be causing it:

@@ -585,13 +657,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
                        .memcg = memcg,
                };

-               /*
-                * If kernel memory accounting is disabled, we ignore
-                * SHRINKER_MEMCG_AWARE flag and call all shrinkers
-                * passing NULL for memcg.
-                */
-               if (memcg_kmem_enabled() &&
-                   !!memcg != !!(shrinker->flags & SHRINKER_MEMCG_AWARE))
+               if (!!memcg != !!(shrinker->flags & SHRINKER_MEMCG_AWARE))
                        continue;

                if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
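
A minimal user-space sketch of that hunk's skip test may make the effect
clearer. It is a toy model, not kernel code: skip_old() and skip_new() are
hypothetical names, the flag value is only illustrative, and the model assumes
that shrink_slab() is called with memcg == NULL when memcg is disabled (as the
removed comment suggests) and that memcg_kmem_enabled() is then false.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag value for this model only. */
#define SHRINKER_MEMCG_AWARE 0x2u

/* Opaque in this model: only NULL vs. non-NULL matters. */
struct mem_cgroup;

/* Skip test as it read before b0dedc49a2da (the removed lines). */
bool skip_old(bool kmem_enabled, const struct mem_cgroup *memcg,
              unsigned int flags)
{
    return kmem_enabled &&
           (!!memcg != !!(flags & SHRINKER_MEMCG_AWARE));
}

/* Skip test as it reads after b0dedc49a2da (the added line). */
bool skip_new(const struct mem_cgroup *memcg, unsigned int flags)
{
    return !!memcg != !!(flags & SHRINKER_MEMCG_AWARE);
}
```

Under those assumptions, skip_old(false, NULL, SHRINKER_MEMCG_AWARE) is false
while skip_new(NULL, SHRINKER_MEMCG_AWARE) is true, so memcg-aware shrinkers
would be skipped only after the change.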

The following commit, aeed1d325d429ac9699c4bf62d17156d60905519,
deletes the conditional continue (and so fixes that problem), but it creates a
similar issue a few lines earlier:

@@ -644,7 +642,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
        struct shrinker *shrinker;
        unsigned long freed = 0;

-       if (memcg && !mem_cgroup_is_root(memcg))
+       if (!mem_cgroup_is_root(memcg))
                return shrink_slab_memcg(gfp_mask, nid, memcg, priority);

        if (!down_read_trylock(&shrinker_rwsem))
@@ -657,9 +655,6 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
                        .memcg = memcg,
                };

-               if (!!memcg != !!(shrinker->flags & SHRINKER_MEMCG_AWARE))
-                       continue;
-
                if (!(shrinker->flags & SHRINKER_NUMA_AWARE))
                        sc.nid = 0;
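
The condition changed by the first hunk of this commit can also be modelled in
user space. This is a hedged sketch, not kernel code: the function names are
hypothetical, is_root() is assumed to be a plain pointer comparison against the
root group, and the interesting case assumes that a root group object exists
while shrink_slab() is called with memcg == NULL (neither assumption is shown
in this thread).

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in type for this model; the real structure is much larger. */
struct mem_cgroup {
    int id;
};

/* Assumed model of mem_cgroup_is_root(): compare against the root group. */
bool is_root(const struct mem_cgroup *root, const struct mem_cgroup *memcg)
{
    return memcg == root;
}

/* Before aeed1d325d42: a NULL memcg never takes the per-memcg path. */
bool takes_memcg_path_old(const struct mem_cgroup *root,
                          const struct mem_cgroup *memcg)
{
    return memcg && !is_root(root, memcg);
}

/* After aeed1d325d42: the NULL check is gone. */
bool takes_memcg_path_new(const struct mem_cgroup *root,
                          const struct mem_cgroup *memcg)
{
    return !is_root(root, memcg);
}
```

In this model a NULL memcg combined with a non-NULL root is sent to
shrink_slab_memcg() by the new condition but not by the old one, which would
match the observation that the global shrinker loop is no longer reached.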


How the bisection was done:

 - Compile kernel with x86_64_defconfig + CONFIG_MEMCG=y
 - Boot a VM with cgroup_disable=memory and a filesystem with 500k inodes, and
   run a simple script on it:
   - Observe number of active objects of ext4_inode_cache
     --> around 400, but anything under 1000 was accepted by the bisect script

   - Call `find / > /dev/null`
   - Again observe number of active objects of ext4_inode_cache
     --> around 7000, but anything over 6000 was accepted by the script

   - Consume all memory with a simple program, `while(1){ malloc(1); }`, until
     it gets killed by the oom-killer.
   - Again observe number of active objects of ext4_inode_cache
     --> around 7000, script threshold: >= 6000 --> bug is there
     --> around 100, script threshold <= 1000 --> bug not present
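
The observation steps above could be automated roughly as follows. This is a
sketch, not the script actually used for the bisection: the function names are
hypothetical, it assumes the usual /proc/slabinfo layout in which the second
column is the number of active objects, and the exit codes follow the
`git bisect run` convention (0 = good, 1 = bad, 125 = skip).

```c
#include <stdio.h>
#include <string.h>

/* Exit codes understood by `git bisect run`. */
enum { BISECT_GOOD = 0, BISECT_BAD = 1, BISECT_SKIP = 125 };

/*
 * Parse one slabinfo-style line. Returns 1 and stores the active object
 * count if the line describes cache_name, 0 otherwise.
 */
int slab_active_objs(const char *line, const char *cache_name,
                     unsigned long *active)
{
    char name[64];
    unsigned long objs;

    if (sscanf(line, "%63s %lu", name, &objs) != 2)
        return 0;   /* header line or malformed input */
    if (strcmp(name, cache_name) != 0)
        return 0;   /* some other cache */
    *active = objs;
    return 1;
}

/* Scan a slabinfo stream, e.g. fopen("/proc/slabinfo", "r"). */
int find_active_objs(FILE *f, const char *cache_name, unsigned long *active)
{
    char line[512];

    while (fgets(line, sizeof(line), f))
        if (slab_active_objs(line, cache_name, active))
            return 1;
    return 0;
}

/* Thresholds from the report, applied to the count seen after the OOM kill. */
int classify_after_oom(unsigned long active)
{
    if (active >= 6000)
        return BISECT_BAD;   /* cache was not shrunk: bug is there */
    if (active <= 1000)
        return BISECT_GOOD;  /* cache was shrunk: bug not present */
    return BISECT_SKIP;      /* inconclusive result */
}
```

A wrapper would call find_active_objs() for "ext4_inode_cache" after the
memory hog has been killed and exit with the value of classify_after_oom().
Reading /proc/slabinfo normally requires root.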

Real-world appearance:

We encountered this issue after upgrading the kernel from 4.9 to 4.19 on our
backup server (Debian Stretch userspace, upgraded to either the Debian Buster
distribution kernel or a custom 4.19.60 build). The server has around 12 M used
inodes and only 4 GB of RAM. Whenever we run the backup, memory is quickly
consumed by kernel slab caches (mainly ext4_inode_cache). The oom-killer then
hits userspace repeatedly, so the server is completely unusable until reboot.

We have simply removed the cgroup_disable=memory parameter on our server; the
memory footprint of memcg is significantly smaller than it was at the time we
started using this parameter. But I still think that the behaviour described
above is a bug and should be fixed.

By the way, it seems the raspberrypi kernel was fighting this issue as well:
	https://github.com/raspberrypi/linux/issues/2829
If I'm reading it correctly, they disabled memcg via the command line because
of some memory leaks. A month later they hit this issue and re-enabled memcg.


Thanks,
Jan


* Re: [BUG]: mm/vmscan.c: shrink_slab does not work correctly with memcg disabled via commandline
From: Michal Hocko @ 2019-08-01 14:06 UTC
  To: Jan Hadrava
  Cc: linux-mm, linux-kernel, wizards, Kirill Tkhai, Johannes Weiner,
	Yang Shi, Shakeel Butt

Cc'ing a few more people.

On Thu 01-08-19 15:42:50, Jan Hadrava wrote:
> There seems to be a bug in mm/vmscan.c shrink_slab function when kernel is
> compilled with CONFIG_MEMCG=y and it is then disabled at boot with commandline
> parameter cgroup_disable=memory. SLABs are then not getting shrinked if the
> system memory is consumed by userspace.

This looks similar to http://lkml.kernel.org/r/1563385526-20805-1-git-send-email-yang.shi@linux.alibaba.com
although the culprit commit has been identified to be different. Could
you try it out please? Maybe we need more fixes.

-- 
Michal Hocko
SUSE Labs


* Re: [BUG]: mm/vmscan.c: shrink_slab does not work correctly with memcg disabled via commandline
From: Jan Hadrava @ 2019-08-01 15:54 UTC
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, wizards, Kirill Tkhai, Johannes Weiner,
	Yang Shi, Shakeel Butt

On Thu, Aug 01, 2019 at 04:06:10PM +0200, Michal Hocko wrote:
> On Thu 01-08-19 15:42:50, Jan Hadrava wrote:
> > There seems to be a bug in mm/vmscan.c shrink_slab function when kernel is
> > compilled with CONFIG_MEMCG=y and it is then disabled at boot with commandline
> > parameter cgroup_disable=memory. SLABs are then not getting shrinked if the
> > system memory is consumed by userspace.
> 
> This looks similar to http://lkml.kernel.org/r/1563385526-20805-1-git-send-email-yang.shi@linux.alibaba.com
> although the culprit commit has been identified to be different. Could
> you try it out please? Maybe we need more fixes.

Yes, it is the same. So my report is a duplicate and I'm just bad at searching
the archives, sorry.

Just to be sure, I ran my tests, and the patch proposed in the original thread
solves my issue in all four affected stable releases:

> > This issue is present in linux-stable 4.19 and all newer lines.
> >     (tested on git tags v5.3-rc2 v5.2.5 v5.1.21 v4.19.63)

And the culprit commit is in fact also the same: b0dedc49a2da introduces the
issue in one place and aeed1d325d42 (the culprit according to the original
thread) moves it a few lines up:

> > Git bisect is pointing to commit:
> > 	b0dedc49a2daa0f44ddc51fbf686b2ef012fccbf
(...)
> > Following commit aeed1d325d429ac9699c4bf62d17156d60905519
> > deletes conditional continue (and so it fixes the problem). But it is creating
> > similar issue few lines earlier:


* Re: [BUG]: mm/vmscan.c: shrink_slab does not work correctly with memcg disabled via commandline
From: Michal Hocko @ 2019-08-01 16:32 UTC
  To: Jan Hadrava
  Cc: linux-mm, linux-kernel, wizards, Kirill Tkhai, Johannes Weiner,
	Yang Shi, Shakeel Butt

On Thu 01-08-19 17:54:34, Jan Hadrava wrote:
> On Thu, Aug 01, 2019 at 04:06:10PM +0200, Michal Hocko wrote:
> > On Thu 01-08-19 15:42:50, Jan Hadrava wrote:
> > > There seems to be a bug in mm/vmscan.c shrink_slab function when kernel is
> > > compilled with CONFIG_MEMCG=y and it is then disabled at boot with commandline
> > > parameter cgroup_disable=memory. SLABs are then not getting shrinked if the
> > > system memory is consumed by userspace.
> > 
> > This looks similar to http://lkml.kernel.org/r/1563385526-20805-1-git-send-email-yang.shi@linux.alibaba.com
> > although the culprit commit has been identified to be different. Could
> > you try it out please? Maybe we need more fixes.
> 
> Yes, it is same.

I am happy to hear that!

> So my report is duplicate and I'm just bad in searching the
> archives, sorry.

No worries. Your bug report was really good, with a great level of detail.
I wish all bug reports were done so thoroughly.
 
> Just to be sure, i run my tests and patch proposed in the original thread
> solves my issue in all four affected stable releases:

Cc Andrew. I assume we can assume your Tested-by tag?

Thanks a lot!
-- 
Michal Hocko
SUSE Labs


* Re: [BUG]: mm/vmscan.c: shrink_slab does not work correctly with memcg disabled via commandline
From: Jan Hadrava @ 2019-08-01 17:46 UTC
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, wizards, Kirill Tkhai, Johannes Weiner,
	Yang Shi, Shakeel Butt

On Thu, Aug 01, 2019 at 06:32:13PM +0200, Michal Hocko wrote:
> On Thu 01-08-19 17:54:34, Jan Hadrava wrote:
> > Just to be sure, i run my tests and patch proposed in the original thread
> > solves my issue in all four affected stable releases:
> 
> Cc Andrew.

Are you sure? I can't see any change in e-mail headers.

> I assume we can assume your Tested-by tag?

Well, these tests only checked that the bug is present without the patch
and disappears after applying it. Anyway, I am OK with it.


-- 
Jan Hadrava


* Re: [BUG]: mm/vmscan.c: shrink_slab does not work correctly with memcg disabled via commandline
From: Yang Shi @ 2019-08-01 21:10 UTC
  To: Jan Hadrava, Michal Hocko
  Cc: linux-mm, linux-kernel, wizards, Kirill Tkhai, Johannes Weiner,
	Shakeel Butt, Andrew Morton



On 8/1/19 10:46 AM, Jan Hadrava wrote:
> On Thu, Aug 01, 2019 at 06:32:13PM +0200, Michal Hocko wrote:
>> On Thu 01-08-19 17:54:34, Jan Hadrava wrote:
>>> Just to be sure, i run my tests and patch proposed in the original thread
>>> solves my issue in all four affected stable releases:
>> Cc Andrew.
> Are you sure? I can't see any change in e-mail headers.

Cc'ed Andrew.

>
>> I assume we can assume your Tested-by tag?
> Well, these test only checked, that bug is present without the patch
> and disappears after applying it. Anyway: I am ok with it.

Thanks for testing it. I think you ran into a premature OOM issue similar
to the one Shakeel reported.

Andrew,

The patch has been in the -mm tree
(mm-vmscan-check-if-mem-cgroup-is-disabled-or-not-before-calling-memcg-slab-shrinker.patch).
It seems we had better get this fix into the upcoming 5.3-rc so that it
can get into the stable releases soon.
