linux-mm.kvack.org archive mirror
* No memory reclaim while reaching MemoryHigh
@ 2019-07-25 13:17 Stefan Priebe - Profihost AG
  2019-07-25 14:01 ` Michal Hocko
  2019-07-25 14:53 ` Chris Down
  0 siblings, 2 replies; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-07-25 13:17 UTC (permalink / raw)
  To: cgroups
  Cc: linux-mm, Michal Hocko, Johannes Weiner, n.fahldieck,
	Daniel Aberger - Profihost AG, p.kramme

Hello all,

I hope I picked the right list and people - if I missed someone, please
let me know.

While using kernel 4.19.55 and cgroup v2, I set a MemoryHigh value for a
varnish service.

It happens that the varnish.service cgroup reaches its MemoryHigh value
and stops working due to throttling.

What I don't understand is that the process itself accounts for only 40%
of its cgroup's usage.

So the other 60% is dirty dentries and inode cache. If I issue
echo 3 > /proc/sys/vm/drop_caches

the varnish cgroup's memory usage drops to just the usage of the process itself.

I thought that the kernel would trigger automatic memory reclaim and drop
caches when a cgroup reaches its memory.high value.

Doesn't it? Does it need a special flag or tuning? Is this expected?

Before drop caches:
   Memory: 13.1G (high: 13.0G)

After drop caches:
   Memory: 5.8G (high: 13.0G)
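For reference, the before/after numbers above can be captured with a small
script like this (a sketch: the cgroup path and a unified hierarchy are
assumptions; adjust for your system):

```shell
#!/bin/bash
# Read the cgroup's charge before and after dropping caches.
# The path below is an assumption for a systemd-managed varnish.service.
CG=/sys/fs/cgroup/system.slice/varnish.service

bytes_to_gib() {  # helper: bytes -> GiB with one decimal place
    awk -v b="$1" 'BEGIN { printf "%.1f", b / (1024 * 1024 * 1024) }'
}

before=$(cat "$CG/memory.current")
sync
echo 3 > /proc/sys/vm/drop_caches
after=$(cat "$CG/memory.current")

echo "before: $(bytes_to_gib "$before")G  after: $(bytes_to_gib "$after")G"
```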

Greets,
Stefan


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: No memory reclaim while reaching MemoryHigh
  2019-07-25 13:17 No memory reclaim while reaching MemoryHigh Stefan Priebe - Profihost AG
@ 2019-07-25 14:01 ` Michal Hocko
  2019-07-25 21:37   ` Stefan Priebe - Profihost AG
  2019-07-25 14:53 ` Chris Down
  1 sibling, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2019-07-25 14:01 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: cgroups, linux-mm, Johannes Weiner, n.fahldieck,
	Daniel Aberger - Profihost AG, p.kramme

On Thu 25-07-19 15:17:17, Stefan Priebe - Profihost AG wrote:
> Hello all,
> 
> i hope i added the right list and people - if i missed someone i would
> be happy to know.
> 
> While using kernel 4.19.55 and cgroupv2 i set a MemoryHigh value for a
> varnish service.
> 
> It happens that the varnish.service cgroup reaches it's MemoryHigh value
> and stops working due to throttling.

What do you mean by "stops working"? Does it mean that the process is
stuck in the kernel doing the reclaim? /proc/<pid>/stack would tell you
what the kernel is executing for the process.
 
> But i don't understand is that the process itself only consumes 40% of
> it's cgroup usage.
> 
> So the other 60% is dirty dentries and inode cache. If i issue an
> echo 3 > /proc/sys/vm/drop_caches
> 
> the varnish cgroup memory usage drops to the 50% of the pure process.
> 
> I thought that the kernel would trigger automatic memory reclaim if a
> cgroup reaches is memory high value to drop caches.

Yes, that is indeed the case, and kernel memory (e.g. inodes/dentries
and others) should be reclaimed along the way. Maybe it is harder for the
reclaim to get rid of those than for drop_caches. We need more data.
-- 
Michal Hocko
SUSE Labs



* Re: No memory reclaim while reaching MemoryHigh
  2019-07-25 13:17 No memory reclaim while reaching MemoryHigh Stefan Priebe - Profihost AG
  2019-07-25 14:01 ` Michal Hocko
@ 2019-07-25 14:53 ` Chris Down
  2019-07-25 21:42   ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 13+ messages in thread
From: Chris Down @ 2019-07-25 14:53 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: cgroups, linux-mm, Michal Hocko, Johannes Weiner, n.fahldieck,
	Daniel Aberger - Profihost AG, p.kramme

Hi Stefan,

Stefan Priebe - Profihost AG writes:
>While using kernel 4.19.55 and cgroupv2 i set a MemoryHigh value for a
>varnish service.
>
>It happens that the varnish.service cgroup reaches it's MemoryHigh value
>and stops working due to throttling.

In that kernel version, the only throttling we have is reclaim-based throttling 
(I also have a patch out to do schedule-based throttling, but it's not in 
mainline yet). If the application is slowing down, it likely means that we are 
struggling to reclaim pages.

>But i don't understand is that the process itself only consumes 40% of
>it's cgroup usage.
>
>So the other 60% is dirty dentries and inode cache. If i issue an
>echo 3 > /proc/sys/vm/drop_caches
>
>the varnish cgroup memory usage drops to the 50% of the pure process.

As a caching server, doesn't Varnish have a lot of hot inodes/dentries in 
memory? If they are hot, it's possible it's hard for us to evict them.

>I thought that the kernel would trigger automatic memory reclaim if a
>cgroup reaches is memory high value to drop caches.

It does, that's the throttling you're seeing :-) I think more information is 
needed to work out what's going on here. For example: what do your kswapd 
counters look like? What does "stops working due to throttling" mean -- are you 
stuck in reclaim?
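(For reference, the kswapd/reclaim counters mentioned here can be read as
follows - a sketch; the per-cgroup path is an assumption and the exact
counter names vary slightly between kernel versions:)

```shell
# Global reclaim counters:
grep -E '^(pgscan|pgsteal|pageoutrun)' /proc/vmstat

# Per-cgroup reclaim activity (path assumed for a systemd-managed service):
grep -E '^(pgscan|pgsteal|pgrefill|workingset)' \
    /sys/fs/cgroup/system.slice/varnish.service/memory.stat
```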

Thanks,

Chris



* Re: No memory reclaim while reaching MemoryHigh
  2019-07-25 14:01 ` Michal Hocko
@ 2019-07-25 21:37   ` Stefan Priebe - Profihost AG
  2019-07-26  7:45     ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-07-25 21:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, Johannes Weiner, n.fahldieck,
	Daniel Aberger - Profihost AG, p.kramme

Hi Michal,

Am 25.07.19 um 16:01 schrieb Michal Hocko:
> On Thu 25-07-19 15:17:17, Stefan Priebe - Profihost AG wrote:
>> Hello all,
>>
>> i hope i added the right list and people - if i missed someone i would
>> be happy to know.
>>
>> While using kernel 4.19.55 and cgroupv2 i set a MemoryHigh value for a
>> varnish service.
>>
>> It happens that the varnish.service cgroup reaches it's MemoryHigh value
>> and stops working due to throttling.
> 
> What do you mean by "stops working"? Does it mean that the process is
> stuck in the kernel doing the reclaim? /proc/<pid>/stack would tell you
> what the kernel executing for the process.

The service no longer responds to HTTP requests.

stack switches in this case between:
[<0>] io_schedule+0x12/0x40
[<0>] __lock_page_or_retry+0x1e7/0x4e0
[<0>] filemap_fault+0x42f/0x830
[<0>] __xfs_filemap_fault.constprop.11+0x49/0x120
[<0>] __do_fault+0x57/0x108
[<0>] __handle_mm_fault+0x949/0xef0
[<0>] handle_mm_fault+0xfc/0x1f0
[<0>] __do_page_fault+0x24a/0x450
[<0>] do_page_fault+0x32/0x110
[<0>] async_page_fault+0x1e/0x30
[<0>] 0xffffffffffffffff

and

[<0>] poll_schedule_timeout.constprop.13+0x42/0x70
[<0>] do_sys_poll+0x51e/0x5f0
[<0>] __x64_sys_poll+0xe7/0x130
[<0>] do_syscall_64+0x5b/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff


>> But i don't understand is that the process itself only consumes 40% of
>> it's cgroup usage.
>>
>> So the other 60% is dirty dentries and inode cache. If i issue an
>> echo 3 > /proc/sys/vm/drop_caches
>>
>> the varnish cgroup memory usage drops to the 50% of the pure process.
>>
>> I thought that the kernel would trigger automatic memory reclaim if a
>> cgroup reaches is memory high value to drop caches.
> 
> Yes, that is indeed the case and the kernel memory (e.g. inodes/dentries
> and others) should be reclaim on the way. Maybe it is harder for the
> reclaim to get rid of those than drop_caches. We need more data.

Tell me what you need ;-)

Stefan



* Re: No memory reclaim while reaching MemoryHigh
  2019-07-25 14:53 ` Chris Down
@ 2019-07-25 21:42   ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-07-25 21:42 UTC (permalink / raw)
  To: Chris Down
  Cc: cgroups, linux-mm, Michal Hocko, Johannes Weiner, n.fahldieck,
	Daniel Aberger - Profihost AG, p.kramme

Hi Chris,

Am 25.07.19 um 16:53 schrieb Chris Down:
> Hi Stefan,
> 
> Stefan Priebe - Profihost AG writes:
>> While using kernel 4.19.55 and cgroupv2 i set a MemoryHigh value for a
>> varnish service.
>>
>> It happens that the varnish.service cgroup reaches it's MemoryHigh value
>> and stops working due to throttling.
> 
> In that kernel version, the only throttling we have is reclaim-based
> throttling (I also have a patch out to do schedule-based throttling, but
> it's not in mainline yet). If the application is slowing down, it likely
> means that we are struggling to reclaim pages.

Sounds interesting - can you point me to a discussion or thread?


>> But i don't understand is that the process itself only consumes 40% of
>> it's cgroup usage.
>>
>> So the other 60% is dirty dentries and inode cache. If i issue an
>> echo 3 > /proc/sys/vm/drop_caches
>>
>> the varnish cgroup memory usage drops to the 50% of the pure process.
> 
> As a caching server, doesn't Varnish have a lot of hot inodes/dentries
> in memory? If they are hot, it's possible it's hard for us to evict them.

Maybe, but they can't be what I would call hot. If you drop caches, the
whole cgroup only uses ~1G of extra memory even after hours.

>> I thought that the kernel would trigger automatic memory reclaim if a
>> cgroup reaches is memory high value to drop caches.
> 
> It does, that's the throttling you're seeing :-) I think more
> information is needed to work out what's going on here. For example:
> what do your kswapd counters look like?

Where do I find those?

> What does "stops working due to
> throttling" mean -- are you stuck in reclaim?

See the other mail to Michal - varnish does not respond and the stack hangs
in handle_mm_fault.

I thought the kernel would quickly drop the unneeded page cache, inode and
dentry caches.

Thanks,
Stefan



* Re: No memory reclaim while reaching MemoryHigh
  2019-07-25 21:37   ` Stefan Priebe - Profihost AG
@ 2019-07-26  7:45     ` Michal Hocko
  2019-07-26 18:30       ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2019-07-26  7:45 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: cgroups, linux-mm, Johannes Weiner, n.fahldieck,
	Daniel Aberger - Profihost AG, p.kramme

On Thu 25-07-19 23:37:14, Stefan Priebe - Profihost AG wrote:
> Hi Michal,
> 
> Am 25.07.19 um 16:01 schrieb Michal Hocko:
> > On Thu 25-07-19 15:17:17, Stefan Priebe - Profihost AG wrote:
> >> Hello all,
> >>
> >> i hope i added the right list and people - if i missed someone i would
> >> be happy to know.
> >>
> >> While using kernel 4.19.55 and cgroupv2 i set a MemoryHigh value for a
> >> varnish service.
> >>
> >> It happens that the varnish.service cgroup reaches it's MemoryHigh value
> >> and stops working due to throttling.
> > 
> > What do you mean by "stops working"? Does it mean that the process is
> > stuck in the kernel doing the reclaim? /proc/<pid>/stack would tell you
> > what the kernel executing for the process.
> 
> The service no longer responses to HTTP requests.
> 
> stack switches in this case between:
> [<0>] io_schedule+0x12/0x40
> [<0>] __lock_page_or_retry+0x1e7/0x4e0
> [<0>] filemap_fault+0x42f/0x830
> [<0>] __xfs_filemap_fault.constprop.11+0x49/0x120
> [<0>] __do_fault+0x57/0x108
> [<0>] __handle_mm_fault+0x949/0xef0
> [<0>] handle_mm_fault+0xfc/0x1f0
> [<0>] __do_page_fault+0x24a/0x450
> [<0>] do_page_fault+0x32/0x110
> [<0>] async_page_fault+0x1e/0x30
> [<0>] 0xffffffffffffffff
> 
> and
> 
> [<0>] poll_schedule_timeout.constprop.13+0x42/0x70
> [<0>] do_sys_poll+0x51e/0x5f0
> [<0>] __x64_sys_poll+0xe7/0x130
> [<0>] do_syscall_64+0x5b/0x170
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [<0>] 0xffffffffffffffff

Neither of the two seems to be memcg related. Have you tried to get
several snapshots to see whether the backtrace is stable? strace would also
tell you whether your application is stuck in a single syscall or just
progressing very slowly (the -ttt parameter should give you timing).
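A minimal way to do both (a sketch; the PID is a placeholder):

```shell
#!/bin/bash
# Sample the kernel stack of a process repeatedly to see whether the
# backtrace is stable, then (separately) trace syscall timing.
PID=$1

for i in $(seq 1 10); do
    date '+%H:%M:%S'
    cat "/proc/$PID/stack"
    echo ---
    sleep 0.5
done

# Timing per syscall shows whether the process is stuck in one call or
# merely progressing slowly:
#   strace -ttt -p "$PID" -o /tmp/varnish.strace
```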
-- 
Michal Hocko
SUSE Labs



* Re: No memory reclaim while reaching MemoryHigh
  2019-07-26  7:45     ` Michal Hocko
@ 2019-07-26 18:30       ` Stefan Priebe - Profihost AG
  2019-07-28 21:11         ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-07-26 18:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, Johannes Weiner, n.fahldieck,
	Daniel Aberger - Profihost AG, p.kramme

Am 26.07.19 um 09:45 schrieb Michal Hocko:
> On Thu 25-07-19 23:37:14, Stefan Priebe - Profihost AG wrote:
>> Hi Michal,
>>
>> Am 25.07.19 um 16:01 schrieb Michal Hocko:
>>> On Thu 25-07-19 15:17:17, Stefan Priebe - Profihost AG wrote:
>>>> Hello all,
>>>>
>>>> i hope i added the right list and people - if i missed someone i would
>>>> be happy to know.
>>>>
>>>> While using kernel 4.19.55 and cgroupv2 i set a MemoryHigh value for a
>>>> varnish service.
>>>>
>>>> It happens that the varnish.service cgroup reaches it's MemoryHigh value
>>>> and stops working due to throttling.
>>>
>>> What do you mean by "stops working"? Does it mean that the process is
>>> stuck in the kernel doing the reclaim? /proc/<pid>/stack would tell you
>>> what the kernel executing for the process.
>>
>> The service no longer responses to HTTP requests.
>>
>> stack switches in this case between:
>> [<0>] io_schedule+0x12/0x40
>> [<0>] __lock_page_or_retry+0x1e7/0x4e0
>> [<0>] filemap_fault+0x42f/0x830
>> [<0>] __xfs_filemap_fault.constprop.11+0x49/0x120
>> [<0>] __do_fault+0x57/0x108
>> [<0>] __handle_mm_fault+0x949/0xef0
>> [<0>] handle_mm_fault+0xfc/0x1f0
>> [<0>] __do_page_fault+0x24a/0x450
>> [<0>] do_page_fault+0x32/0x110
>> [<0>] async_page_fault+0x1e/0x30
>> [<0>] 0xffffffffffffffff
>>
>> and
>>
>> [<0>] poll_schedule_timeout.constprop.13+0x42/0x70
>> [<0>] do_sys_poll+0x51e/0x5f0
>> [<0>] __x64_sys_poll+0xe7/0x130
>> [<0>] do_syscall_64+0x5b/0x170
>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [<0>] 0xffffffffffffffff
> 
> Neither of the two seem to be memcg related.

Yes, but at least the xfs one is a page fault - isn't that related?

> Have you tried to get
> several snapshots and see if the backtrace is stable?
No, it's not - it switches between these two most of the time. But as long
as the xfs one with the page fault is seen, it does not serve requests;
that one is visible for at least 1-5s, then the poll one shows up, and
then the xfs one again for 1-5s.

This happens if I do:
systemctl set-property --runtime varnish.service MemoryHigh=6.5G

If I set:
systemctl set-property --runtime varnish.service MemoryHigh=14G

I never get the xfs handle_mm_fault one. This is reproducible.

> tell you whether your application is stuck in a single syscall or they
> are just progressing very slowly (-ttt parameter should give you timing)

Yes, it's still making progress, but really slowly due to memory
pressure. memory.pressure of the varnish cgroup shows high values, above 100
or 200.
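(Note: the avg fields in memory.pressure are percentages of wall time stalled
and cannot exceed 100; values above that were most likely read from the
`total` field, which counts cumulative stall time in microseconds. A sketch
for sampling it; the path is an assumption and the helper is hypothetical:)

```shell
# Dump the cgroup's PSI memory pressure (path is an assumption):
cat /sys/fs/cgroup/system.slice/varnish.service/memory.pressure

# Hypothetical helper: pull out the "some avg10" value, i.e. the share of
# the last 10 seconds in which at least one task stalled on memory.
avg10_some() {
    awk '$1 == "some" { sub("avg10=", "", $2); print $2 }'
}
avg10_some < /sys/fs/cgroup/system.slice/varnish.service/memory.pressure
```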

I can reproduce the same with rsync or other tasks using memory for
inodes and dentries. What I don't understand is why the kernel does not
reclaim memory for the userspace process and drop the cache. I can't
believe those entries are hot - they must be at least some days old,
as a fresh process running for a day only consumes about 200MB of inode /
dentry / page cache.

Greets,
Stefan



* Re: No memory reclaim while reaching MemoryHigh
  2019-07-26 18:30       ` Stefan Priebe - Profihost AG
@ 2019-07-28 21:11         ` Stefan Priebe - Profihost AG
  2019-07-28 21:39           ` Chris Down
  2019-07-29  7:07           ` Stefan Priebe - Profihost AG
  0 siblings, 2 replies; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-07-28 21:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, Johannes Weiner, n.fahldieck,
	Daniel Aberger - Profihost AG, p.kramme

here is a memory.stat output of the cgroup:
# cat /sys/fs/cgroup/system.slice/varnish.service/memory.stat
anon 8113229824
file 39735296
kernel_stack 26345472
slab 24985600
sock 339968
shmem 0
file_mapped 38793216
file_dirty 946176
file_writeback 0
inactive_anon 0
active_anon 8113119232
inactive_file 40198144
active_file 102400
unevictable 0
slab_reclaimable 2859008
slab_unreclaimable 22126592
pgfault 178231449
pgmajfault 22011
pgrefill 393038
pgscan 4218254
pgsteal 430005
pgactivate 295416
pgdeactivate 351487
pglazyfree 0
pglazyfreed 0
workingset_refault 401874
workingset_activate 62535
workingset_nodereclaim 0
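A quick sanity check on this output is to sum the main buckets (a sketch;
the `summarize` helper is hypothetical and the cgroup path is an
assumption). For the numbers above, roughly 99% of the charged memory is
anonymous, with the file and slab caches nearly negligible - which matters
because anonymous pages can generally only be reclaimed to swap:

```shell
# Summarize the main memory.stat buckets as percentages of their sum.
summarize() {
    awk '
        $1 == "anon"               { anon = $2 }
        $1 == "file"               { file = $2 }
        $1 == "slab_reclaimable"   { sr   = $2 }
        $1 == "slab_unreclaimable" { su   = $2 }
        END {
            total = anon + file + sr + su
            printf "anon %.0f%% file %.0f%% slab %.0f%%\n",
                   100 * anon / total, 100 * file / total,
                   100 * (sr + su) / total
        }'
}

summarize < /sys/fs/cgroup/system.slice/varnish.service/memory.stat
```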

Greets,
Stefan

Am 26.07.19 um 20:30 schrieb Stefan Priebe - Profihost AG:
> Am 26.07.19 um 09:45 schrieb Michal Hocko:
>> On Thu 25-07-19 23:37:14, Stefan Priebe - Profihost AG wrote:
>>> Hi Michal,
>>>
>>> Am 25.07.19 um 16:01 schrieb Michal Hocko:
>>>> On Thu 25-07-19 15:17:17, Stefan Priebe - Profihost AG wrote:
>>>>> Hello all,
>>>>>
>>>>> i hope i added the right list and people - if i missed someone i would
>>>>> be happy to know.
>>>>>
>>>>> While using kernel 4.19.55 and cgroupv2 i set a MemoryHigh value for a
>>>>> varnish service.
>>>>>
>>>>> It happens that the varnish.service cgroup reaches it's MemoryHigh value
>>>>> and stops working due to throttling.
>>>>
>>>> What do you mean by "stops working"? Does it mean that the process is
>>>> stuck in the kernel doing the reclaim? /proc/<pid>/stack would tell you
>>>> what the kernel executing for the process.
>>>
>>> The service no longer responses to HTTP requests.
>>>
>>> stack switches in this case between:
>>> [<0>] io_schedule+0x12/0x40
>>> [<0>] __lock_page_or_retry+0x1e7/0x4e0
>>> [<0>] filemap_fault+0x42f/0x830
>>> [<0>] __xfs_filemap_fault.constprop.11+0x49/0x120
>>> [<0>] __do_fault+0x57/0x108
>>> [<0>] __handle_mm_fault+0x949/0xef0
>>> [<0>] handle_mm_fault+0xfc/0x1f0
>>> [<0>] __do_page_fault+0x24a/0x450
>>> [<0>] do_page_fault+0x32/0x110
>>> [<0>] async_page_fault+0x1e/0x30
>>> [<0>] 0xffffffffffffffff
>>>
>>> and
>>>
>>> [<0>] poll_schedule_timeout.constprop.13+0x42/0x70
>>> [<0>] do_sys_poll+0x51e/0x5f0
>>> [<0>] __x64_sys_poll+0xe7/0x130
>>> [<0>] do_syscall_64+0x5b/0x170
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [<0>] 0xffffffffffffffff
>>
>> Neither of the two seem to be memcg related.
> 
> Yes but at least the xfs one is a page fault - isn't this related?
> 
>> Have you tried to get
>> several snapshots and see if the backtrace is stable?
> No it's not it switches most of the time between these both. But as long
> as the xfs one with the page fault is seen it does not serve requests
> and that one is seen for at least 1-5s than the poill one is visible and
> than the xfs one again for 1-5s.
> 
> This happens if i do:
> systemctl set-property --runtime varnish.service MemoryHigh=6.5G
> 
> if i set:
> systemctl set-property --runtime varnish.service MemoryHigh=14G
> 
> i never get the xfs handle_mm fault one. This is reproducable.
> 
>> tell you whether your application is stuck in a single syscall or they
>> are just progressing very slowly (-ttt parameter should give you timing)
> 
> Yes it's still going forward but really really slow due to memory
> pressure. memory.pressure of varnish cgroup shows high values above 100
> or 200.
> 
> I can reproduce the same with rsync or other tasks using memory for
> inodes and dentries. What i don't unterstand is that the kernel does not
> reclaim memory for the userspace process and drops the cache. I can't
> believe those entries are hot - as they must be at least some days old
> as a fresh process running a day only consumes about 200MB of indoe /
> dentries / page cache.
> 
> Greets,
> Stefan
> 



* Re: No memory reclaim while reaching MemoryHigh
  2019-07-28 21:11         ` Stefan Priebe - Profihost AG
@ 2019-07-28 21:39           ` Chris Down
  2019-07-29  5:34             ` Stefan Priebe - Profihost AG
  2019-07-29  7:07           ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 13+ messages in thread
From: Chris Down @ 2019-07-28 21:39 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Michal Hocko, cgroups, linux-mm, Johannes Weiner, n.fahldieck,
	Daniel Aberger - Profihost AG, p.kramme

Hi Stefan,

Stefan Priebe - Profihost AG writes:
>anon 8113229824

You mention this problem happens if you set memory.high to 6.5G, however in
steady state your application is 8G. What makes you think it (both its RSS
and shared resources like the page cache) can compress to 6.5G without
memory thrashing?

From the evidence you presented, I expect you're just setting memory.high so
low that we end up having to constantly thrash the disk due to reclaim.



* Re: No memory reclaim while reaching MemoryHigh
  2019-07-28 21:39           ` Chris Down
@ 2019-07-29  5:34             ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-07-29  5:34 UTC (permalink / raw)
  To: Chris Down
  Cc: Michal Hocko, cgroups, linux-mm, Johannes Weiner, n.fahldieck,
	Daniel Aberger - Profihost AG, p.kramme

Hi Chris,
Am 28.07.19 um 23:39 schrieb Chris Down:
> Hi Stefan,
> 
> Stefan Priebe - Profihost AG writes:
>> anon 8113229824
> 
> You mention this problem happens if you set memory.high to 6.5G, however
> in steady state your application is 8G.
This is the current memory.stat; now I would test with memory.high set to
7.9 or 8G.

Last week it was at 6.5G.

> What makes you think it (both
> its RSS and other shared resources like the page cache and other shared
> resources) can compress to 6.5G without memory thrashing?
If I issue echo 3 > /proc/sys/vm/drop_caches, the usage always drops to 5.8G.


> I expect you're just setting memory.high so low that we end up having to
> constantly thrash the disk due to reclaim, from the evidence you presented.

This sounds interesting - how can I verify it? And what do you mean by
thrashing the disk? Swap is completely disabled.

I thought all memory which I can drop with drop_caches could be reclaimed?

Greets,
Stefan



* Re: No memory reclaim while reaching MemoryHigh
  2019-07-28 21:11         ` Stefan Priebe - Profihost AG
  2019-07-28 21:39           ` Chris Down
@ 2019-07-29  7:07           ` Stefan Priebe - Profihost AG
  2019-07-29  7:45             ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-07-29  7:07 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, Johannes Weiner, n.fahldieck,
	Daniel Aberger - Profihost AG, p.kramme

Hi all,

it might be that I just misunderstood how it works.

This test works absolutely fine without any penalty:

test.sh:
#####
#!/bin/bash

sync
echo 3 >/proc/sys/vm/drop_caches
sync
time find / -xdev -type f -exec cat "{}" \; >/dev/null 2>/dev/null
#####

started with:
systemd-run -pRemainAfterExit=True -- /root/spriebe/test.sh

or

systemd-run --property=MemoryHigh=300M -pRemainAfterExit=True --
/root/spriebe/test.sh

In both cases it takes ~1m 45s, even though it consumes about 2G of memory
in the first case.

So even though it can only consume a maximum of 300M in the second case, it
is as fast as the first one without any limit.

I thought until today that the same would happen for varnish. Where's
the difference?

I also tried stuff like:
sysctl -w vm.vfs_cache_pressure=1000000

but the cgroup memory usage of varnish still rises slowly, about 100M
per hour. The varnish process itself stays constant at ~5.6G.

Greets,
Stefan

Am 28.07.19 um 23:11 schrieb Stefan Priebe - Profihost AG:
> here is a memory.stat output of the cgroup:
> # cat /sys/fs/cgroup/system.slice/varnish.service/memory.stat
> anon 8113229824
> file 39735296
> kernel_stack 26345472
> slab 24985600
> sock 339968
> shmem 0
> file_mapped 38793216
> file_dirty 946176
> file_writeback 0
> inactive_anon 0
> active_anon 8113119232
> inactive_file 40198144
> active_file 102400
> unevictable 0
> slab_reclaimable 2859008
> slab_unreclaimable 22126592
> pgfault 178231449
> pgmajfault 22011
> pgrefill 393038
> pgscan 4218254
> pgsteal 430005
> pgactivate 295416
> pgdeactivate 351487
> pglazyfree 0
> pglazyfreed 0
> workingset_refault 401874
> workingset_activate 62535
> workingset_nodereclaim 0
> 
> Greets,
> Stefan
> 
> Am 26.07.19 um 20:30 schrieb Stefan Priebe - Profihost AG:
>> Am 26.07.19 um 09:45 schrieb Michal Hocko:
>>> On Thu 25-07-19 23:37:14, Stefan Priebe - Profihost AG wrote:
>>>> Hi Michal,
>>>>
>>>> Am 25.07.19 um 16:01 schrieb Michal Hocko:
>>>>> On Thu 25-07-19 15:17:17, Stefan Priebe - Profihost AG wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> i hope i added the right list and people - if i missed someone i would
>>>>>> be happy to know.
>>>>>>
>>>>>> While using kernel 4.19.55 and cgroupv2 i set a MemoryHigh value for a
>>>>>> varnish service.
>>>>>>
>>>>>> It happens that the varnish.service cgroup reaches it's MemoryHigh value
>>>>>> and stops working due to throttling.
>>>>>
>>>>> What do you mean by "stops working"? Does it mean that the process is
>>>>> stuck in the kernel doing the reclaim? /proc/<pid>/stack would tell you
>>>>> what the kernel executing for the process.
>>>>
>>>> The service no longer responses to HTTP requests.
>>>>
>>>> stack switches in this case between:
>>>> [<0>] io_schedule+0x12/0x40
>>>> [<0>] __lock_page_or_retry+0x1e7/0x4e0
>>>> [<0>] filemap_fault+0x42f/0x830
>>>> [<0>] __xfs_filemap_fault.constprop.11+0x49/0x120
>>>> [<0>] __do_fault+0x57/0x108
>>>> [<0>] __handle_mm_fault+0x949/0xef0
>>>> [<0>] handle_mm_fault+0xfc/0x1f0
>>>> [<0>] __do_page_fault+0x24a/0x450
>>>> [<0>] do_page_fault+0x32/0x110
>>>> [<0>] async_page_fault+0x1e/0x30
>>>> [<0>] 0xffffffffffffffff
>>>>
>>>> and
>>>>
>>>> [<0>] poll_schedule_timeout.constprop.13+0x42/0x70
>>>> [<0>] do_sys_poll+0x51e/0x5f0
>>>> [<0>] __x64_sys_poll+0xe7/0x130
>>>> [<0>] do_syscall_64+0x5b/0x170
>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> [<0>] 0xffffffffffffffff
>>>
>>> Neither of the two seem to be memcg related.
>>
>> Yes but at least the xfs one is a page fault - isn't this related?
>>
>>> Have you tried to get
>>> several snapshots and see if the backtrace is stable?
>> No it's not it switches most of the time between these both. But as long
>> as the xfs one with the page fault is seen it does not serve requests
>> and that one is seen for at least 1-5s than the poill one is visible and
>> than the xfs one again for 1-5s.
>>
>> This happens if i do:
>> systemctl set-property --runtime varnish.service MemoryHigh=6.5G
>>
>> if i set:
>> systemctl set-property --runtime varnish.service MemoryHigh=14G
>>
>> i never get the xfs handle_mm fault one. This is reproducable.
>>
>>> tell you whether your application is stuck in a single syscall or they
>>> are just progressing very slowly (-ttt parameter should give you timing)
>>
>> Yes it's still going forward but really really slow due to memory
>> pressure. memory.pressure of varnish cgroup shows high values above 100
>> or 200.
>>
>> I can reproduce the same with rsync or other tasks using memory for
>> inodes and dentries. What i don't unterstand is that the kernel does not
>> reclaim memory for the userspace process and drops the cache. I can't
>> believe those entries are hot - as they must be at least some days old
>> as a fresh process running a day only consumes about 200MB of indoe /
>> dentries / page cache.
>>
>> Greets,
>> Stefan
>>



* Re: No memory reclaim while reaching MemoryHigh
  2019-07-29  7:07           ` Stefan Priebe - Profihost AG
@ 2019-07-29  7:45             ` Stefan Priebe - Profihost AG
  2019-07-31 13:03               ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-07-29  7:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: cgroups, linux-mm, Johannes Weiner, n.fahldieck,
	Daniel Aberger - Profihost AG, p.kramme

Sorry if I'm spamming - I'm trying to share as much information as I can.

The difference between varnish and my test is that:
* the varnish cgroup consumes the active_anon type of memory
* my test consumes the inactive_file type of memory

Both get freed by drop_caches, but active_anon does not get freed by
hitting MemoryHigh.

Greets,
Stefan

Am 29.07.19 um 09:07 schrieb Stefan Priebe - Profihost AG:
> Hi all,
> 
> it might be that i just missunderstood how it works.
> 
> This test works absolutely fine without any penalty:
> 
> test.sh:
> #####
> #!/bin/bash
> 
> sync
> echo 3 >/proc/sys/vm/drop_caches
> sync
> time find / -xdev -type f -exec cat "{}" \; >/dev/null 2>/dev/null
> #####
> 
> started with:
> systemd-run -pRemainAfterExit=True -- /root/spriebe/test.sh
> 
> or
> 
> systemd-run --property=MemoryHigh=300M -pRemainAfterExit=True --
> /root/spriebe/test.sh
> 
> In both cases it takes ~ 1m 45s even though it consumes about 2G of mem
> in the first case.
> 
> So it seems even though it can only consume a max of 300M in the 2nd
> case. It is as fast as the first one without any limit.
> 
> I thought until today that the same would happen for varnish. Where's
> the difference?
> 
> I also tried stuff like:
> sysctl -w vm.vfs_cache_pressure=1000000
> 
> but the cgroup memory usage of varnish still raises slowly about 100M
> per hour. The varnish process itself stays constant at ~5.6G
> 
> Greets,
> Stefan
> 
> Am 28.07.19 um 23:11 schrieb Stefan Priebe - Profihost AG:
>> here is a memory.stat output of the cgroup:
>> # cat /sys/fs/cgroup/system.slice/varnish.service/memory.stat
>> anon 8113229824
>> file 39735296
>> kernel_stack 26345472
>> slab 24985600
>> sock 339968
>> shmem 0
>> file_mapped 38793216
>> file_dirty 946176
>> file_writeback 0
>> inactive_anon 0
>> active_anon 8113119232
>> inactive_file 40198144
>> active_file 102400
>> unevictable 0
>> slab_reclaimable 2859008
>> slab_unreclaimable 22126592
>> pgfault 178231449
>> pgmajfault 22011
>> pgrefill 393038
>> pgscan 4218254
>> pgsteal 430005
>> pgactivate 295416
>> pgdeactivate 351487
>> pglazyfree 0
>> pglazyfreed 0
>> workingset_refault 401874
>> workingset_activate 62535
>> workingset_nodereclaim 0
>>
>> Greets,
>> Stefan
>>
>> Am 26.07.19 um 20:30 schrieb Stefan Priebe - Profihost AG:
>>> Am 26.07.19 um 09:45 schrieb Michal Hocko:
>>>> On Thu 25-07-19 23:37:14, Stefan Priebe - Profihost AG wrote:
>>>>> Hi Michal,
>>>>>
>>>>> Am 25.07.19 um 16:01 schrieb Michal Hocko:
>>>>>> On Thu 25-07-19 15:17:17, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hello all,
>>>>>>>
>>>>>>> i hope i added the right list and people - if i missed someone i would
>>>>>>> be happy to know.
>>>>>>>
>>>>>>> While using kernel 4.19.55 and cgroupv2 i set a MemoryHigh value for a
>>>>>>> varnish service.
>>>>>>>
>>>>>>> It happens that the varnish.service cgroup reaches it's MemoryHigh value
>>>>>>> and stops working due to throttling.
>>>>>>
>>>>>> What do you mean by "stops working"? Does it mean that the process is
>>>>>> stuck in the kernel doing the reclaim? /proc/<pid>/stack would tell you
>>>>>> what the kernel executing for the process.
>>>>>
>>>>> The service no longer responses to HTTP requests.
>>>>>
>>>>> stack switches in this case between:
>>>>> [<0>] io_schedule+0x12/0x40
>>>>> [<0>] __lock_page_or_retry+0x1e7/0x4e0
>>>>> [<0>] filemap_fault+0x42f/0x830
>>>>> [<0>] __xfs_filemap_fault.constprop.11+0x49/0x120
>>>>> [<0>] __do_fault+0x57/0x108
>>>>> [<0>] __handle_mm_fault+0x949/0xef0
>>>>> [<0>] handle_mm_fault+0xfc/0x1f0
>>>>> [<0>] __do_page_fault+0x24a/0x450
>>>>> [<0>] do_page_fault+0x32/0x110
>>>>> [<0>] async_page_fault+0x1e/0x30
>>>>> [<0>] 0xffffffffffffffff
>>>>>
>>>>> and
>>>>>
>>>>> [<0>] poll_schedule_timeout.constprop.13+0x42/0x70
>>>>> [<0>] do_sys_poll+0x51e/0x5f0
>>>>> [<0>] __x64_sys_poll+0xe7/0x130
>>>>> [<0>] do_syscall_64+0x5b/0x170
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [<0>] 0xffffffffffffffff
>>>>
>>>> Neither of the two seem to be memcg related.
>>>
>>> Yes but at least the xfs one is a page fault - isn't this related?
>>>
>>>> Have you tried to get
>>>> several snapshots and see if the backtrace is stable?
>>> No it's not it switches most of the time between these both. But as long
>>> as the xfs one with the page fault is seen it does not serve requests
>>> and that one is seen for at least 1-5s than the poill one is visible and
>>> than the xfs one again for 1-5s.
>>>
>>> This happens if i do:
>>> systemctl set-property --runtime varnish.service MemoryHigh=6.5G
>>>
>>> if i set:
>>> systemctl set-property --runtime varnish.service MemoryHigh=14G
>>>
>>> i never get the xfs handle_mm fault one. This is reproducable.
>>>
>>>> tell you whether your application is stuck in a single syscall or they
>>>> are just progressing very slowly (-ttt parameter should give you timing)
>>>
>>> Yes it's still going forward but really really slow due to memory
>>> pressure. memory.pressure of varnish cgroup shows high values above 100
>>> or 200.
>>>
>>> I can reproduce the same with rsync or other tasks using memory for
>>> inodes and dentries. What i don't unterstand is that the kernel does not
>>> reclaim memory for the userspace process and drops the cache. I can't
>>> believe those entries are hot - as they must be at least some days old
>>> as a fresh process running a day only consumes about 200MB of indoe /
>>> dentries / page cache.
>>>
>>> Greets,
>>> Stefan
>>>



* Re: No memory reclaim while reaching MemoryHigh
  2019-07-29  7:45             ` Stefan Priebe - Profihost AG
@ 2019-07-31 13:03               ` Michal Hocko
  0 siblings, 0 replies; 13+ messages in thread
From: Michal Hocko @ 2019-07-31 13:03 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: cgroups, linux-mm, Johannes Weiner, n.fahldieck,
	Daniel Aberger - Profihost AG, p.kramme

On Mon 29-07-19 09:45:00, Stefan Priebe - Profihost AG wrote:
> Sorry for may be spamming - i try to share as much information as i can:
> 
> The difference varnish between my is that:
> * varnish cgroup consumes active_anon type of mem
> * my test consumes inactive_file type of mem
> 
> both get freed by drop_caches but active_anon does not get freed by
> triggering memoryhigh.

Do you have swap available?
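(For reference, whether swap exists and is permitted for the cgroup can be
checked like this - a sketch; the cgroup path is an assumption. Without
swap, anonymous memory has nowhere to be reclaimed to, which would explain
why only the caches can shrink:)

```shell
# Any active swap devices/files on the system?
cat /proc/swaps

# Is swap permitted for the cgroup itself? "0" disables swap for the
# cgroup even when the system has swap configured (path is an assumption).
cat /sys/fs/cgroup/system.slice/varnish.service/memory.swap.max
```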
-- 
Michal Hocko
SUSE Labs



end of thread, other threads:[~2019-07-31 13:03 UTC | newest]

Thread overview: 13+ messages
2019-07-25 13:17 No memory reclaim while reaching MemoryHigh Stefan Priebe - Profihost AG
2019-07-25 14:01 ` Michal Hocko
2019-07-25 21:37   ` Stefan Priebe - Profihost AG
2019-07-26  7:45     ` Michal Hocko
2019-07-26 18:30       ` Stefan Priebe - Profihost AG
2019-07-28 21:11         ` Stefan Priebe - Profihost AG
2019-07-28 21:39           ` Chris Down
2019-07-29  5:34             ` Stefan Priebe - Profihost AG
2019-07-29  7:07           ` Stefan Priebe - Profihost AG
2019-07-29  7:45             ` Stefan Priebe - Profihost AG
2019-07-31 13:03               ` Michal Hocko
2019-07-25 14:53 ` Chris Down
2019-07-25 21:42   ` Stefan Priebe - Profihost AG
