linux-kernel.vger.kernel.org archive mirror
* Memory management issue in 4.18.15
@ 2018-10-20 11:41 Spock
  2018-10-20 15:37 ` Randy Dunlap
  2018-10-22  8:33 ` Michal Hocko
  0 siblings, 2 replies; 9+ messages in thread
From: Spock @ 2018-10-20 11:41 UTC
  To: linux-kernel

Hello,

I have a workload that creates lots of cache pages. Before 4.18.15,
the behavior was very stable: the pagecache grew steadily until it
consumed all the free memory, and then kswapd balanced it around the
low watermark. After 4.18.15, khugepaged wakes up once in a while
and reclaims almost all the pages from the pagecache, so around 2G of
8G is always unused. THP is enabled only for the madvise case and is
not used.

The exact change that leads to the current behavior is
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.18.y&id=62aad93f09c1952ede86405894df1b22012fd5ab
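
A minimal sketch of how the symptom described above could be watched from
userspace, assuming the usual /proc/meminfo text layout ("Key:   value kB").
The helper name meminfo_kb() is hypothetical and not part of any kernel or
libc API; it only extracts one field from text the caller has already read.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value (in kB) of `key` from /proc/meminfo-formatted text,
 * or -1 if the key is missing or its value cannot be parsed.
 * Only whole keys at the start of a line match, so "Cached" does not
 * match the "SwapCached:" line.
 */
long meminfo_kb(const char *text, const char *key)
{
    const char *line = text;
    size_t klen;

    assert(text != NULL && key != NULL);
    klen = strlen(key);

    while (line != NULL && *line != '\0') {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long val;

            /* %ld skips the padding spaces after the colon */
            if (sscanf(line + klen + 1, "%ld", &val) == 1)
                return val;
            return -1;
        }
        /* advance to the start of the next line, if any */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}
```

Reading /proc/meminfo into a buffer periodically and logging
meminfo_kb(buf, "MemFree") next to meminfo_kb(buf, "Cached") should make a
sudden drop in cached pages, like the one reported here, easy to spot.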


* Re: Memory management issue in 4.18.15
  2018-10-20 11:41 Memory management issue in 4.18.15 Spock
@ 2018-10-20 15:37 ` Randy Dunlap
  2018-10-20 17:41   ` Roman Gushchin
  2018-10-22  8:33 ` Michal Hocko
  1 sibling, 1 reply; 9+ messages in thread
From: Randy Dunlap @ 2018-10-20 15:37 UTC
  To: Spock, linux-kernel, Linux MM
  Cc: Andrew Morton, Roman Gushchin, Rik van Riel, Sasha Levin

[add linux-mm mailing list + people]


On 10/20/18 4:41 AM, Spock wrote:
> Hello,
> 
> I have a workload, which creates lots of cache pages. Before 4.18.15,
> the behavior was very stable: pagecache is constantly growing until it
> consumes all the free memory, and then kswapd is balancing it around
> low watermark. After 4.18.15, once in a while khugepaged is waking up
> and reclaims almost all the pages from pagecache, so there is always
> around 2G of 8G unused. THP is enabled only for madvise case and are
> not used.
> 
> The exact change that leads to current behavior is
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.18.y&id=62aad93f09c1952ede86405894df1b22012fd5ab
> 


-- 
~Randy


* Re: Memory management issue in 4.18.15
  2018-10-20 15:37 ` Randy Dunlap
@ 2018-10-20 17:41   ` Roman Gushchin
  0 siblings, 0 replies; 9+ messages in thread
From: Roman Gushchin @ 2018-10-20 17:41 UTC
  To: Randy Dunlap
  Cc: Spock, linux-kernel, Linux MM, Andrew Morton, Rik van Riel, Sasha Levin

On Sat, Oct 20, 2018 at 08:37:28AM -0700, Randy Dunlap wrote:
> [add linux-mm mailing list + people]
> 
> 
> On 10/20/18 4:41 AM, Spock wrote:
> > Hello,
> > 
> > I have a workload, which creates lots of cache pages. Before 4.18.15,
> > the behavior was very stable: pagecache is constantly growing until it
> > consumes all the free memory, and then kswapd is balancing it around
> > low watermark. After 4.18.15, once in a while khugepaged is waking up
> > and reclaims almost all the pages from pagecache, so there is always
> > around 2G of 8G unused. THP is enabled only for madvise case and are
> > not used.
> > 
> > The exact change that leads to current behavior is
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.18.y&id=62aad93f09c1952ede86405894df1b22012fd5ab
> > 

Hello!

Could you please describe your workload in more detail?
Do you use memory cgroups? How many of them? What's the ratio between slabs
and pagecache in the affected cgroup? Is the pagecache mmapped by some process?
Is the majority of the pagecache created by a few cached files, or is the
number of files large?

This is definitely a strange effect. The change shouldn't affect pagecache
reclaim directly, so the only possibility I see is that, because we started
applying some minimal pressure on slabs, we also started reclaiming some internal
fs structures under background memory pressure, which leads to more aggressive
pagecache reclaim.

Thanks!


* Re: Memory management issue in 4.18.15
  2018-10-20 11:41 Memory management issue in 4.18.15 Spock
  2018-10-20 15:37 ` Randy Dunlap
@ 2018-10-22  8:33 ` Michal Hocko
  2018-10-22 15:08   ` Roman Gushchin
                     ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: Michal Hocko @ 2018-10-22  8:33 UTC
  To: Spock
  Cc: linux-kernel, Roman Gushchin, Rik van Riel, Johannes Weiner,
	Vladimir Davydov, Shakeel Butt, Andrew Morton, Sasha Levin,
	Greg Kroah-Hartman, linux-mm

Cc some more people.

I am wondering why 172b06c32b94 ("mm: slowly shrink slabs with a
relatively small number of objects") has been backported to the stable
tree when it was not marked that way. Putting that aside, I suspect the
upstream kernel is likely to have the same issue. Roman, could you
have a look please?

On Sat 20-10-18 14:41:40, Spock wrote:
> Hello,
> 
> I have a workload, which creates lots of cache pages. Before 4.18.15,
> the behavior was very stable: pagecache is constantly growing until it
> consumes all the free memory, and then kswapd is balancing it around
> low watermark. After 4.18.15, once in a while khugepaged is waking up
> and reclaims almost all the pages from pagecache, so there is always
> around 2G of 8G unused. THP is enabled only for madvise case and are
> not used.
> 
> The exact change that leads to current behavior is
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.18.y&id=62aad93f09c1952ede86405894df1b22012fd5ab

-- 
Michal Hocko
SUSE Labs


* Re: Memory management issue in 4.18.15
  2018-10-22  8:33 ` Michal Hocko
@ 2018-10-22 15:08   ` Roman Gushchin
  2018-10-22 17:01     ` Michal Hocko
  2018-10-22 17:35   ` Roman Gushchin
  2018-10-22 23:44   ` Roman Gushchin
  2 siblings, 1 reply; 9+ messages in thread
From: Roman Gushchin @ 2018-10-22 15:08 UTC
  To: Michal Hocko
  Cc: Spock, linux-kernel, Rik van Riel, Johannes Weiner,
	Vladimir Davydov, Shakeel Butt, Andrew Morton, Sasha Levin,
	Greg Kroah-Hartman, linux-mm

On Mon, Oct 22, 2018 at 10:33:22AM +0200, Michal Hocko wrote:
> Cc some more people.
> 
> I am wondering why 172b06c32b94 ("mm: slowly shrink slabs with a
> relatively small number of objects") has been backported to the stable
> tree when not marked that way. Put that aside it seems likely that the
> upstream kernel will have the same issue I suspect. Roman, could you
> have a look please?

Sure, already looking... Spock provided some useful details, and I think
I know what's happening. I hope to propose a solution soon.

RE backporting: I'm slightly surprised that only one patch of the memcg
reclaim fix series has been backported. Backporting either all of them or
none makes much more sense to me.

Thanks!


* Re: Memory management issue in 4.18.15
  2018-10-22 15:08   ` Roman Gushchin
@ 2018-10-22 17:01     ` Michal Hocko
  2018-10-25 11:26       ` Sasha Levin
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2018-10-22 17:01 UTC
  To: Roman Gushchin, Sasha Levin
  Cc: Spock, linux-kernel, Rik van Riel, Johannes Weiner,
	Vladimir Davydov, Shakeel Butt, Andrew Morton,
	Greg Kroah-Hartman, linux-mm

On Mon 22-10-18 15:08:22, Roman Gushchin wrote:
[...]
> RE backporting: I'm slightly surprised that only one patch of the memcg
> reclaim fix series has been backported. Either all or none makes much more
> sense to me.

Yeah, I think this is AUTOSEL trying to be clever again. I thought it had
been agreed that MM is quite good at marking patches for stable, so MM
patches would not be considered by the machinery. Sasha?
-- 
Michal Hocko
SUSE Labs


* Re: Memory management issue in 4.18.15
  2018-10-22  8:33 ` Michal Hocko
  2018-10-22 15:08   ` Roman Gushchin
@ 2018-10-22 17:35   ` Roman Gushchin
  2018-10-22 23:44   ` Roman Gushchin
  2 siblings, 0 replies; 9+ messages in thread
From: Roman Gushchin @ 2018-10-22 17:35 UTC
  To: Michal Hocko
  Cc: Spock, linux-kernel, Rik van Riel, Johannes Weiner,
	Vladimir Davydov, Shakeel Butt, Andrew Morton, Sasha Levin,
	Greg Kroah-Hartman, linux-mm

On Mon, Oct 22, 2018 at 10:33:22AM +0200, Michal Hocko wrote:
> Cc some more people.
> 
> I am wondering why 172b06c32b94 ("mm: slowly shrink slabs with a
> relatively small number of objects") has been backported to the stable
> tree when not marked that way. Put that aside it seems likely that the
> upstream kernel will have the same issue I suspect. Roman, could you
> have a look please?

So, the problem is probably caused by the unused inode eviction code:
inode_lru_isolate() invalidates all pages belonging to an unreferenced
clean inode at once, even if the goal was to scan (and potentially free)
just one inode (or any other slab object).

Spock's workload, as described, has a few large files in the pagecache,
so the effect becomes noticeable. A small amount of pressure applied to the
inode cache surprisingly results in freeing a significant percentage of memory.

It happened before my change too, but was probably less noticeable because
it usually required higher memory pressure to trigger, so overly aggressive
reclaim was less unexpected.
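
The effect described above can be illustrated with a toy userspace model.
This is a sketch, not kernel code: struct toy_inode and toy_shrink_inodes()
are hypothetical names, and the model assumes every scanned inode is
unreferenced and clean, so it is evicted together with all of its pagecache.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an inode sitting on the unused-inode LRU. */
struct toy_inode {
    unsigned long nrpages; /* pagecache pages attached to the inode */
    int evicted;           /* set once the inode has been reclaimed */
};

/*
 * Scan up to nr_to_scan not-yet-evicted inodes from the head of the LRU
 * and evict each one, dropping all of its pagecache at once.
 * Returns the total number of pages freed.
 */
unsigned long toy_shrink_inodes(struct toy_inode *lru, size_t n,
                                size_t nr_to_scan)
{
    unsigned long freed = 0;
    size_t i;

    assert(lru != NULL || n == 0);

    for (i = 0; i < n && nr_to_scan > 0; i++) {
        if (lru[i].evicted)
            continue;               /* already gone, costs no scan budget */
        freed += lru[i].nrpages;    /* the whole file's cache goes at once */
        lru[i].nrpages = 0;
        lru[i].evicted = 1;
        nr_to_scan--;
    }
    return freed;
}
```

With a scan target of one object, an LRU of small files gives back a handful
of pages, while an LRU holding a file with 262144 cached pages (1 GiB at an
assumed 4 KiB page size) gives back all of them in a single step.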

How to fix this?

It seems to me that we shouldn't try invalidating pagecache pages from
the inode reclaim path at all (except, maybe, for inodes with only a few pages).
If an inode has a lot of attached pagecache, let it be evicted "naturally"
through the file LRU lists.
But I need to do some real-life testing to see how this works.

Thanks!


* Re: Memory management issue in 4.18.15
  2018-10-22  8:33 ` Michal Hocko
  2018-10-22 15:08   ` Roman Gushchin
  2018-10-22 17:35   ` Roman Gushchin
@ 2018-10-22 23:44   ` Roman Gushchin
  2 siblings, 0 replies; 9+ messages in thread
From: Roman Gushchin @ 2018-10-22 23:44 UTC
  To: Spock
  Cc: Spock, linux-kernel, Rik van Riel, Johannes Weiner,
	Vladimir Davydov, Shakeel Butt, Andrew Morton, Sasha Levin,
	Greg Kroah-Hartman, linux-mm

> On Sat 20-10-18 14:41:40, Spock wrote:
> > Hello,
> > 
> > I have a workload, which creates lots of cache pages. Before 4.18.15,
> > the behavior was very stable: pagecache is constantly growing until it
> > consumes all the free memory, and then kswapd is balancing it around
> > low watermark. After 4.18.15, once in a while khugepaged is waking up
> > and reclaims almost all the pages from pagecache, so there is always
> > around 2G of 8G unused. THP is enabled only for madvise case and are
> > not used.

Spock, could you please check whether the following patch solves the problem
for you?

Thank you!

--

diff --git a/fs/inode.c b/fs/inode.c
index 73432e64f874..63aca301a8bc 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -731,7 +731,7 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
        }
 
        /* recently referenced inodes get one more pass */
-       if (inode->i_state & I_REFERENCED) {
+       if (inode->i_state & I_REFERENCED || inode->i_data.nrpages > 1) {
                inode->i_state &= ~I_REFERENCED;
                spin_unlock(&inode->i_lock);
                return LRU_ROTATE;
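
The decision changed by the hunk above can be sketched in isolation as
follows. This is a userspace model under stated assumptions, not the kernel
function: toy_isolate(), struct toy_inode_state, and the TOY_I_REFERENCED
value are hypothetical, and all the other checks that the real
inode_lru_isolate() performs are left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_I_REFERENCED 0x1u /* hypothetical flag value for this sketch */

enum toy_lru_status { TOY_LRU_REMOVED, TOY_LRU_ROTATE };

/* Only the two fields that the patched condition looks at. */
struct toy_inode_state {
    unsigned int i_state;  /* state flags, models inode->i_state */
    unsigned long nrpages; /* models inode->i_data.nrpages */
};

/*
 * Model of the patched check: a recently referenced inode, or one that
 * still has more than one pagecache page, gets another pass on the LRU
 * instead of being evicted.
 */
enum toy_lru_status toy_isolate(struct toy_inode_state *inode)
{
    assert(inode != NULL);

    if ((inode->i_state & TOY_I_REFERENCED) || inode->nrpages > 1) {
        inode->i_state &= ~TOY_I_REFERENCED;
        return TOY_LRU_ROTATE;
    }
    return TOY_LRU_REMOVED;
}
```

In this model a referenced inode without pagecache is rotated once and
removed on the next pass, while an inode with many cached pages keeps being
rotated for as long as nrpages stays above 1, which leaves its pages to be
reclaimed through the file LRU lists instead.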



* Re: Memory management issue in 4.18.15
  2018-10-22 17:01     ` Michal Hocko
@ 2018-10-25 11:26       ` Sasha Levin
  0 siblings, 0 replies; 9+ messages in thread
From: Sasha Levin @ 2018-10-25 11:26 UTC
  To: Michal Hocko
  Cc: guro, Sasha Levin, dairinin, linux-kernel@vger.kernel.org List,
	riel, hannes, vdavydov.dev, shakeelb, Andrew Morton, Greg KH,
	linux-mm, sashal

On Mon, Oct 22, 2018 at 1:01 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Mon 22-10-18 15:08:22, Roman Gushchin wrote:
> [...]
> > RE backporting: I'm slightly surprised that only one patch of the memcg
> > reclaim fix series has been backported. Either all or none makes much more
> > sense to me.
>
> Yeah, I think this is AUTOSEL trying to be clever again. I thought it has
> been agreed that MM is quite good at marking patches for stable and so
> it was not considered by the machinery. Sasha?

I've talked about it briefly with Andrew, and he suggested that I
send him the list of AUTOSEL commits separately to avoid the noise, so
we'll try that and see what happens.


--
Thanks.
Sasha

