linux-mm.kvack.org archive mirror
* lot of MemAvailable but falling cache and raising PSI
@ 2019-09-05 11:27 Stefan Priebe - Profihost AG
  2019-09-05 11:40 ` Michal Hocko
  2019-09-05 12:15 ` Vlastimil Babka
  0 siblings, 2 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-05 11:27 UTC
  To: linux-mm; +Cc: l.roehrs, cgroups, Johannes Weiner, Michal Hocko

Hello all,

I hope you can help me again to understand the current MemAvailable
value in the Linux kernel. I'm running a 4.19.52 kernel + PSI patches in
this case.

I'm seeing the following behaviour I don't understand and would like
help with.

While MemAvailable shows 5G, the kernel starts to drop cache from 4G down
to 1G as Apache spawns some PHP processes. After that the PSI
mem.some value rises and the kernel tries to reclaim memory, but
MemAvailable stays at 5G.
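
For reference, the mem.some value used here is the "some" line from
/proc/pressure/memory (values elided):

  # cat /proc/pressure/memory
  some avg10=... avg60=... avg300=... total=...
  full avg10=... avg60=... avg300=... total=...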

Any ideas?

Thanks!

Greets,
Stefan




* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 11:27 lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG
@ 2019-09-05 11:40 ` Michal Hocko
  2019-09-05 11:56   ` Stefan Priebe - Profihost AG
  2019-09-05 12:15 ` Vlastimil Babka
  1 sibling, 1 reply; 61+ messages in thread
From: Michal Hocko @ 2019-09-05 11:40 UTC
  To: Stefan Priebe - Profihost AG; +Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On Thu 05-09-19 13:27:10, Stefan Priebe - Profihost AG wrote:
> Hello all,
> 
> I hope you can help me again to understand the current MemAvailable
> value in the Linux kernel. I'm running a 4.19.52 kernel + PSI patches in
> this case.
>
> I'm seeing the following behaviour I don't understand and would like
> help with.
>
> While MemAvailable shows 5G, the kernel starts to drop cache from 4G down
> to 1G as Apache spawns some PHP processes. After that the PSI
> mem.some value rises and the kernel tries to reclaim memory, but
> MemAvailable stays at 5G.
> 
> Any ideas?

Can you collect /proc/vmstat (every second or so) and post it while this
is the case please?
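
E.g. with a simple collection loop (just a sketch; adjust the interval
and file names as you like):

  while true; do cat /proc/vmstat > vmstat.$(date +%s); sleep 1; done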
-- 
Michal Hocko
SUSE Labs



* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 11:40 ` Michal Hocko
@ 2019-09-05 11:56   ` Stefan Priebe - Profihost AG
  2019-09-05 16:28     ` Yang Shi
  2019-09-06 10:08     ` Stefan Priebe - Profihost AG
  0 siblings, 2 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-05 11:56 UTC
  To: Michal Hocko; +Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner


On 05.09.19 at 13:40, Michal Hocko wrote:
> On Thu 05-09-19 13:27:10, Stefan Priebe - Profihost AG wrote:
>> Hello all,
>>
>> I hope you can help me again to understand the current MemAvailable
>> value in the Linux kernel. I'm running a 4.19.52 kernel + PSI patches in
>> this case.
>>
>> I'm seeing the following behaviour I don't understand and would like
>> help with.
>>
>> While MemAvailable shows 5G, the kernel starts to drop cache from 4G down
>> to 1G as Apache spawns some PHP processes. After that the PSI
>> mem.some value rises and the kernel tries to reclaim memory, but
>> MemAvailable stays at 5G.
>>
>> Any ideas?
> 
> Can you collect /proc/vmstat (every second or so) and post it while this
> is the case please?

Yes, sure.

But I don't know which event you mean exactly. The current situation is
that PSI / memory pressure is > 20, but:

This is the current status, where MemAvailable shows 5G but Cached has
already dropped to 1G, down from 4G:


meminfo:
MemTotal:       16423116 kB
MemFree:         5280736 kB
MemAvailable:    5332752 kB
Buffers:            2572 kB
Cached:          1225112 kB
SwapCached:            0 kB
Active:          8934976 kB
Inactive:        1026900 kB
Active(anon):    8740396 kB
Inactive(anon):   873448 kB
Active(file):     194580 kB
Inactive(file):   153452 kB
Unevictable:       19900 kB
Mlocked:           19900 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:              1980 kB
Writeback:             0 kB
AnonPages:       8423480 kB
Mapped:           978212 kB
Shmem:            875680 kB
Slab:             839868 kB
SReclaimable:     383396 kB
SUnreclaim:       456472 kB
KernelStack:       22576 kB
PageTables:        49824 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     8211556 kB
Committed_AS:   32060624 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:           118048 kB
HardwareCorrupted:     0 kB
AnonHugePages:   6406144 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     2580336 kB
DirectMap2M:    14196736 kB
DirectMap1G:     2097152 kB


vmstat shows:
nr_free_pages 1320053
nr_zone_inactive_anon 218362
nr_zone_active_anon 2185108
nr_zone_inactive_file 38363
nr_zone_active_file 48645
nr_zone_unevictable 4975
nr_zone_write_pending 495
nr_mlock 4975
nr_page_table_pages 12553
nr_kernel_stack 22576
nr_bounce 0
nr_zspages 0
nr_free_cma 0
numa_hit 13916119899
numa_miss 0
numa_foreign 0
numa_interleave 15629
numa_local 13916119899
numa_other 0
nr_inactive_anon 218362
nr_active_anon 2185164
nr_inactive_file 38363
nr_active_file 48645
nr_unevictable 4975
nr_slab_reclaimable 95849
nr_slab_unreclaimable 114118
nr_isolated_anon 0
nr_isolated_file 0
workingset_refault 71365357
workingset_activate 20281670
workingset_restore 8995665
workingset_nodereclaim 326085
nr_anon_pages 2105903
nr_mapped 244553
nr_file_pages 306921
nr_dirty 495
nr_writeback 0
nr_writeback_temp 0
nr_shmem 218920
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_anon_transparent_hugepages 3128
nr_unstable 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 1833104
nr_dirtied 386544087
nr_written 259220036
nr_dirty_threshold 265636
nr_dirty_background_threshold 132656
pgpgin 1817628997
pgpgout 3730818029
pswpin 0
pswpout 0
pgalloc_dma 0
pgalloc_dma32 5790777997
pgalloc_normal 20003662520
pgalloc_movable 0
allocstall_dma 0
allocstall_dma32 0
allocstall_normal 39
allocstall_movable 1980089
pgskip_dma 0
pgskip_dma32 0
pgskip_normal 0
pgskip_movable 0
pgfree 26637215947
pgactivate 316722654
pgdeactivate 261039211
pglazyfree 0
pgfault 17719356599
pgmajfault 30985544
pglazyfreed 0
pgrefill 286826568
pgsteal_kswapd 36740923
pgsteal_direct 349291470
pgscan_kswapd 36878966
pgscan_direct 395327492
pgscan_direct_throttle 0
zone_reclaim_failed 0
pginodesteal 49817087
slabs_scanned 597956834
kswapd_inodesteal 1412447
kswapd_low_wmark_hit_quickly 39
kswapd_high_wmark_hit_quickly 319
pageoutrun 3585
pgrotated 2873743
drop_pagecache 0
drop_slab 0
oom_kill 0
pgmigrate_success 839062285
pgmigrate_fail 507313
compact_migrate_scanned 9619077010
compact_free_scanned 67985619651
compact_isolated 1684537704
compact_stall 205761
compact_fail 182420
compact_success 23341
compact_daemon_wake 2
compact_daemon_migrate_scanned 811
compact_daemon_free_scanned 490241
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 1006521
unevictable_pgs_scanned 0
unevictable_pgs_rescued 997077
unevictable_pgs_mlocked 1319203
unevictable_pgs_munlocked 842471
unevictable_pgs_cleared 470531
unevictable_pgs_stranded 459613
thp_fault_alloc 20263113
thp_fault_fallback 3368635
thp_collapse_alloc 226476
thp_collapse_alloc_failed 17594
thp_file_alloc 0
thp_file_mapped 0
thp_split_page 1159
thp_split_page_failed 3927
thp_deferred_split_page 20348941
thp_split_pmd 53361
thp_split_pud 0
thp_zero_page_alloc 1
thp_zero_page_alloc_failed 0
thp_swpout 0
thp_swpout_fallback 0
balloon_inflate 0
balloon_deflate 0
balloon_migrate 0
swap_ra 0
swap_ra_hit 0

Greets,
Stefan




* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 11:27 lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG
  2019-09-05 11:40 ` Michal Hocko
@ 2019-09-05 12:15 ` Vlastimil Babka
  2019-09-05 12:27   ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 61+ messages in thread
From: Vlastimil Babka @ 2019-09-05 12:15 UTC
  To: Stefan Priebe - Profihost AG, linux-mm
  Cc: l.roehrs, cgroups, Johannes Weiner, Michal Hocko

On 9/5/19 1:27 PM, Stefan Priebe - Profihost AG wrote:
> Hello all,
> 
> I hope you can help me again to understand the current MemAvailable
> value in the Linux kernel. I'm running a 4.19.52 kernel + PSI patches in
> this case.
>
> I'm seeing the following behaviour I don't understand and would like
> help with.
>
> While MemAvailable shows 5G, the kernel starts to drop cache from 4G down
> to 1G as Apache spawns some PHP processes. After that the PSI
> mem.some value rises and the kernel tries to reclaim memory, but
> MemAvailable stays at 5G.
> 
> Any ideas?

PHP seems to use madvise(MADV_HUGEPAGE), so if it's a NUMA machine it
might be worth trying to cherry-pick these two commits:
92717d429b38 ("Revert "Revert "mm, thp: consolidate THP gfp handling
into alloc_hugepage_direct_gfpmask""")
a8282608c88e ("Revert "mm, thp: restore node-local hugepage allocations"")

> Thanks!
> 
> Greets,
> Stefan
> 
> 




* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 12:15 ` Vlastimil Babka
@ 2019-09-05 12:27   ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-05 12:27 UTC
  To: Vlastimil Babka, linux-mm
  Cc: l.roehrs, cgroups, Johannes Weiner, Michal Hocko


On 05.09.19 at 14:15, Vlastimil Babka wrote:
> On 9/5/19 1:27 PM, Stefan Priebe - Profihost AG wrote:
>> Hello all,
>>
>> I hope you can help me again to understand the current MemAvailable
>> value in the Linux kernel. I'm running a 4.19.52 kernel + PSI patches in
>> this case.
>>
>> I'm seeing the following behaviour I don't understand and would like
>> help with.
>>
>> While MemAvailable shows 5G, the kernel starts to drop cache from 4G down
>> to 1G as Apache spawns some PHP processes. After that the PSI
>> mem.some value rises and the kernel tries to reclaim memory, but
>> MemAvailable stays at 5G.
>>
>> Any ideas?
> 
> PHP seems to use madvise(MADV_HUGEPAGE), so if it's a NUMA machine it
> might be worth trying to cherry-pick these two commits:
> 92717d429b38 ("Revert "Revert "mm, thp: consolidate THP gfp handling
> into alloc_hugepage_direct_gfpmask""")
> a8282608c88e ("Revert "mm, thp: restore node-local hugepage allocations"")

No, it's a VM running inside qemu/kvm without NUMA.

Greets,
Stefan



* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 11:56   ` Stefan Priebe - Profihost AG
@ 2019-09-05 16:28     ` Yang Shi
  2019-09-05 17:26       ` Stefan Priebe - Profihost AG
  2019-09-06 10:08     ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 61+ messages in thread
From: Yang Shi @ 2019-09-05 16:28 UTC
  To: Stefan Priebe - Profihost AG
  Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner

On Thu, Sep 5, 2019 at 4:56 AM Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
>
>
> On 05.09.19 at 13:40, Michal Hocko wrote:
> > On Thu 05-09-19 13:27:10, Stefan Priebe - Profihost AG wrote:
> >> Hello all,
> >>
> >> I hope you can help me again to understand the current MemAvailable
> >> value in the Linux kernel. I'm running a 4.19.52 kernel + PSI patches in
> >> this case.
> >>
> >> I'm seeing the following behaviour I don't understand and would like
> >> help with.
> >>
> >> While MemAvailable shows 5G, the kernel starts to drop cache from 4G down
> >> to 1G as Apache spawns some PHP processes. After that the PSI
> >> mem.some value rises and the kernel tries to reclaim memory, but
> >> MemAvailable stays at 5G.
> >>
> >> Any ideas?
> >
> > Can you collect /proc/vmstat (every second or so) and post it while this
> > is the case please?
>
> Yes, sure.
>
> But I don't know which event you mean exactly. The current situation is
> that PSI / memory pressure is > 20, but:
>
> This is the current status, where MemAvailable shows 5G but Cached has
> already dropped to 1G, down from 4G:

I don't get what problem you are running into. MemAvailable is *not*
the indication for triggering memory reclaim.

Basically MemAvailable = MemFree + page cache (active file + inactive
file) / 2 + SReclaimable / 2, which means that much memory could be
reclaimed if memory pressure is hit.
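
(As a rough illustration with the numbers below, ignoring the watermark
corrections the kernel also applies: 5280736 + (194580 + 153452)/2 +
383396/2 ≈ 5646450 kB, which is in the same ballpark as the reported
MemAvailable of 5332752 kB.)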

But memory pressure (tracked by PSI) is triggered by how much memory is
consumed relative to the zone watermarks.

So it looks like the page reclaim logic just reclaimed file cache (which
seems sane since your VM doesn't have a swap partition), so I would
expect MemFree to increase as "Cached" drops, while MemAvailable stays
basically unchanged. That looks sane to me. Am I missing something?

>
>
> meminfo:
> MemTotal:       16423116 kB
> MemFree:         5280736 kB
> MemAvailable:    5332752 kB
> Buffers:            2572 kB
> Cached:          1225112 kB
> SwapCached:            0 kB
> Active:          8934976 kB
> Inactive:        1026900 kB
> Active(anon):    8740396 kB
> Inactive(anon):   873448 kB
> Active(file):     194580 kB
> Inactive(file):   153452 kB
> Unevictable:       19900 kB
> Mlocked:           19900 kB
> SwapTotal:             0 kB
> SwapFree:              0 kB
> Dirty:              1980 kB
> Writeback:             0 kB
> AnonPages:       8423480 kB
> Mapped:           978212 kB
> Shmem:            875680 kB
> Slab:             839868 kB
> SReclaimable:     383396 kB
> SUnreclaim:       456472 kB
> KernelStack:       22576 kB
> PageTables:        49824 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:     8211556 kB
> Committed_AS:   32060624 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:           0 kB
> VmallocChunk:          0 kB
> Percpu:           118048 kB
> HardwareCorrupted:     0 kB
> AnonHugePages:   6406144 kB
> ShmemHugePages:        0 kB
> ShmemPmdMapped:        0 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> Hugetlb:               0 kB
> DirectMap4k:     2580336 kB
> DirectMap2M:    14196736 kB
> DirectMap1G:     2097152 kB
>
>
> vmstat shows:
> nr_free_pages 1320053
> nr_zone_inactive_anon 218362
> nr_zone_active_anon 2185108
> nr_zone_inactive_file 38363
> nr_zone_active_file 48645
> nr_zone_unevictable 4975
> nr_zone_write_pending 495
> nr_mlock 4975
> nr_page_table_pages 12553
> nr_kernel_stack 22576
> nr_bounce 0
> nr_zspages 0
> nr_free_cma 0
> numa_hit 13916119899
> numa_miss 0
> numa_foreign 0
> numa_interleave 15629
> numa_local 13916119899
> numa_other 0
> nr_inactive_anon 218362
> nr_active_anon 2185164
> nr_inactive_file 38363
> nr_active_file 48645
> nr_unevictable 4975
> nr_slab_reclaimable 95849
> nr_slab_unreclaimable 114118
> nr_isolated_anon 0
> nr_isolated_file 0
> workingset_refault 71365357
> workingset_activate 20281670
> workingset_restore 8995665
> workingset_nodereclaim 326085
> nr_anon_pages 2105903
> nr_mapped 244553
> nr_file_pages 306921
> nr_dirty 495
> nr_writeback 0
> nr_writeback_temp 0
> nr_shmem 218920
> nr_shmem_hugepages 0
> nr_shmem_pmdmapped 0
> nr_anon_transparent_hugepages 3128
> nr_unstable 0
> nr_vmscan_write 0
> nr_vmscan_immediate_reclaim 1833104
> nr_dirtied 386544087
> nr_written 259220036
> nr_dirty_threshold 265636
> nr_dirty_background_threshold 132656
> pgpgin 1817628997
> pgpgout 3730818029
> pswpin 0
> pswpout 0
> pgalloc_dma 0
> pgalloc_dma32 5790777997
> pgalloc_normal 20003662520
> pgalloc_movable 0
> allocstall_dma 0
> allocstall_dma32 0
> allocstall_normal 39
> allocstall_movable 1980089
> pgskip_dma 0
> pgskip_dma32 0
> pgskip_normal 0
> pgskip_movable 0
> pgfree 26637215947
> pgactivate 316722654
> pgdeactivate 261039211
> pglazyfree 0
> pgfault 17719356599
> pgmajfault 30985544
> pglazyfreed 0
> pgrefill 286826568
> pgsteal_kswapd 36740923
> pgsteal_direct 349291470
> pgscan_kswapd 36878966
> pgscan_direct 395327492
> pgscan_direct_throttle 0
> zone_reclaim_failed 0
> pginodesteal 49817087
> slabs_scanned 597956834
> kswapd_inodesteal 1412447
> kswapd_low_wmark_hit_quickly 39
> kswapd_high_wmark_hit_quickly 319
> pageoutrun 3585
> pgrotated 2873743
> drop_pagecache 0
> drop_slab 0
> oom_kill 0
> pgmigrate_success 839062285
> pgmigrate_fail 507313
> compact_migrate_scanned 9619077010
> compact_free_scanned 67985619651
> compact_isolated 1684537704
> compact_stall 205761
> compact_fail 182420
> compact_success 23341
> compact_daemon_wake 2
> compact_daemon_migrate_scanned 811
> compact_daemon_free_scanned 490241
> htlb_buddy_alloc_success 0
> htlb_buddy_alloc_fail 0
> unevictable_pgs_culled 1006521
> unevictable_pgs_scanned 0
> unevictable_pgs_rescued 997077
> unevictable_pgs_mlocked 1319203
> unevictable_pgs_munlocked 842471
> unevictable_pgs_cleared 470531
> unevictable_pgs_stranded 459613
> thp_fault_alloc 20263113
> thp_fault_fallback 3368635
> thp_collapse_alloc 226476
> thp_collapse_alloc_failed 17594
> thp_file_alloc 0
> thp_file_mapped 0
> thp_split_page 1159
> thp_split_page_failed 3927
> thp_deferred_split_page 20348941
> thp_split_pmd 53361
> thp_split_pud 0
> thp_zero_page_alloc 1
> thp_zero_page_alloc_failed 0
> thp_swpout 0
> thp_swpout_fallback 0
> balloon_inflate 0
> balloon_deflate 0
> balloon_migrate 0
> swap_ra 0
> swap_ra_hit 0
>
> Greets,
> Stefan
>
>



* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 16:28     ` Yang Shi
@ 2019-09-05 17:26       ` Stefan Priebe - Profihost AG
  2019-09-05 18:46         ` Yang Shi
  0 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-05 17:26 UTC
  To: Yang Shi; +Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner

Hi,
On 05.09.19 at 18:28, Yang Shi wrote:
> On Thu, Sep 5, 2019 at 4:56 AM Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>>
>>
>> On 05.09.19 at 13:40, Michal Hocko wrote:
>>> On Thu 05-09-19 13:27:10, Stefan Priebe - Profihost AG wrote:
>>>> Hello all,
>>>>
>>>> I hope you can help me again to understand the current MemAvailable
>>>> value in the Linux kernel. I'm running a 4.19.52 kernel + PSI patches in
>>>> this case.
>>>>
>>>> I'm seeing the following behaviour I don't understand and would like
>>>> help with.
>>>>
>>>> While MemAvailable shows 5G, the kernel starts to drop cache from 4G down
>>>> to 1G as Apache spawns some PHP processes. After that the PSI
>>>> mem.some value rises and the kernel tries to reclaim memory, but
>>>> MemAvailable stays at 5G.
>>>>
>>>> Any ideas?
>>>
>>> Can you collect /proc/vmstat (every second or so) and post it while this
>>> is the case please?
>>
>> Yes, sure.
>>
>> But I don't know which event you mean exactly. The current situation is
>> that PSI / memory pressure is > 20, but:
>>
>> This is the current status, where MemAvailable shows 5G but Cached has
>> already dropped to 1G, down from 4G:
> 
> I don't get what problem you are running into. MemAvailable is *not*
> the indication for triggering memory reclaim.

Yes, sure, it's not. But I don't get why:
* PSI is rising and caches are dropped when MemAvail and MemFree show 5GB

> Basically MemAvailable = MemFree + page cache (active file + inactive
> file) / 2 + SReclaimable / 2, which means that much memory could be
> reclaimed if memory pressure is hit.

Yes, but MemFree also shows 5G in this case (see below), and still the
file cache gets dropped and PSI is rising.

> But memory pressure (tracked by PSI) is triggered by how much memory is
> consumed relative to the zone watermarks.
What exactly does this mean?

> So it looks like the page reclaim logic just reclaimed file cache (which
> seems sane since your VM doesn't have a swap partition), so I would
> expect MemFree to increase as "Cached" drops,

No, it does not. MemFree and MemAvail stay constant at 5G.

> while MemAvailable stays
> basically unchanged. That looks sane to me. Am I missing something?

I always thought the kernel would not free the cache, nor would PSI
rise, when there are 5GB in MemFree and in MemAvail. This still makes
no sense to me. Why drop the cache when you have 5G free? This
currently results in I/O waits once the dropped pages are needed again.

Greets,
Stefan

>>
>> meminfo:
>> MemTotal:       16423116 kB
>> MemFree:         5280736 kB
>> MemAvailable:    5332752 kB
>> Buffers:            2572 kB
>> Cached:          1225112 kB
>> SwapCached:            0 kB
>> Active:          8934976 kB
>> Inactive:        1026900 kB
>> Active(anon):    8740396 kB
>> Inactive(anon):   873448 kB
>> Active(file):     194580 kB
>> Inactive(file):   153452 kB
>> Unevictable:       19900 kB
>> Mlocked:           19900 kB
>> SwapTotal:             0 kB
>> SwapFree:              0 kB
>> Dirty:              1980 kB
>> Writeback:             0 kB
>> AnonPages:       8423480 kB
>> Mapped:           978212 kB
>> Shmem:            875680 kB
>> Slab:             839868 kB
>> SReclaimable:     383396 kB
>> SUnreclaim:       456472 kB
>> KernelStack:       22576 kB
>> PageTables:        49824 kB
>> NFS_Unstable:          0 kB
>> Bounce:                0 kB
>> WritebackTmp:          0 kB
>> CommitLimit:     8211556 kB
>> Committed_AS:   32060624 kB
>> VmallocTotal:   34359738367 kB
>> VmallocUsed:           0 kB
>> VmallocChunk:          0 kB
>> Percpu:           118048 kB
>> HardwareCorrupted:     0 kB
>> AnonHugePages:   6406144 kB
>> ShmemHugePages:        0 kB
>> ShmemPmdMapped:        0 kB
>> HugePages_Total:       0
>> HugePages_Free:        0
>> HugePages_Rsvd:        0
>> HugePages_Surp:        0
>> Hugepagesize:       2048 kB
>> Hugetlb:               0 kB
>> DirectMap4k:     2580336 kB
>> DirectMap2M:    14196736 kB
>> DirectMap1G:     2097152 kB
>>
>>
>> vmstat shows:
>> nr_free_pages 1320053
>> nr_zone_inactive_anon 218362
>> nr_zone_active_anon 2185108
>> nr_zone_inactive_file 38363
>> nr_zone_active_file 48645
>> nr_zone_unevictable 4975
>> nr_zone_write_pending 495
>> nr_mlock 4975
>> nr_page_table_pages 12553
>> nr_kernel_stack 22576
>> nr_bounce 0
>> nr_zspages 0
>> nr_free_cma 0
>> numa_hit 13916119899
>> numa_miss 0
>> numa_foreign 0
>> numa_interleave 15629
>> numa_local 13916119899
>> numa_other 0
>> nr_inactive_anon 218362
>> nr_active_anon 2185164
>> nr_inactive_file 38363
>> nr_active_file 48645
>> nr_unevictable 4975
>> nr_slab_reclaimable 95849
>> nr_slab_unreclaimable 114118
>> nr_isolated_anon 0
>> nr_isolated_file 0
>> workingset_refault 71365357
>> workingset_activate 20281670
>> workingset_restore 8995665
>> workingset_nodereclaim 326085
>> nr_anon_pages 2105903
>> nr_mapped 244553
>> nr_file_pages 306921
>> nr_dirty 495
>> nr_writeback 0
>> nr_writeback_temp 0
>> nr_shmem 218920
>> nr_shmem_hugepages 0
>> nr_shmem_pmdmapped 0
>> nr_anon_transparent_hugepages 3128
>> nr_unstable 0
>> nr_vmscan_write 0
>> nr_vmscan_immediate_reclaim 1833104
>> nr_dirtied 386544087
>> nr_written 259220036
>> nr_dirty_threshold 265636
>> nr_dirty_background_threshold 132656
>> pgpgin 1817628997
>> pgpgout 3730818029
>> pswpin 0
>> pswpout 0
>> pgalloc_dma 0
>> pgalloc_dma32 5790777997
>> pgalloc_normal 20003662520
>> pgalloc_movable 0
>> allocstall_dma 0
>> allocstall_dma32 0
>> allocstall_normal 39
>> allocstall_movable 1980089
>> pgskip_dma 0
>> pgskip_dma32 0
>> pgskip_normal 0
>> pgskip_movable 0
>> pgfree 26637215947
>> pgactivate 316722654
>> pgdeactivate 261039211
>> pglazyfree 0
>> pgfault 17719356599
>> pgmajfault 30985544
>> pglazyfreed 0
>> pgrefill 286826568
>> pgsteal_kswapd 36740923
>> pgsteal_direct 349291470
>> pgscan_kswapd 36878966
>> pgscan_direct 395327492
>> pgscan_direct_throttle 0
>> zone_reclaim_failed 0
>> pginodesteal 49817087
>> slabs_scanned 597956834
>> kswapd_inodesteal 1412447
>> kswapd_low_wmark_hit_quickly 39
>> kswapd_high_wmark_hit_quickly 319
>> pageoutrun 3585
>> pgrotated 2873743
>> drop_pagecache 0
>> drop_slab 0
>> oom_kill 0
>> pgmigrate_success 839062285
>> pgmigrate_fail 507313
>> compact_migrate_scanned 9619077010
>> compact_free_scanned 67985619651
>> compact_isolated 1684537704
>> compact_stall 205761
>> compact_fail 182420
>> compact_success 23341
>> compact_daemon_wake 2
>> compact_daemon_migrate_scanned 811
>> compact_daemon_free_scanned 490241
>> htlb_buddy_alloc_success 0
>> htlb_buddy_alloc_fail 0
>> unevictable_pgs_culled 1006521
>> unevictable_pgs_scanned 0
>> unevictable_pgs_rescued 997077
>> unevictable_pgs_mlocked 1319203
>> unevictable_pgs_munlocked 842471
>> unevictable_pgs_cleared 470531
>> unevictable_pgs_stranded 459613
>> thp_fault_alloc 20263113
>> thp_fault_fallback 3368635
>> thp_collapse_alloc 226476
>> thp_collapse_alloc_failed 17594
>> thp_file_alloc 0
>> thp_file_mapped 0
>> thp_split_page 1159
>> thp_split_page_failed 3927
>> thp_deferred_split_page 20348941
>> thp_split_pmd 53361
>> thp_split_pud 0
>> thp_zero_page_alloc 1
>> thp_zero_page_alloc_failed 0
>> thp_swpout 0
>> thp_swpout_fallback 0
>> balloon_inflate 0
>> balloon_deflate 0
>> balloon_migrate 0
>> swap_ra 0
>> swap_ra_hit 0
>>
>> Greets,
>> Stefan
>>
>>



* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 17:26       ` Stefan Priebe - Profihost AG
@ 2019-09-05 18:46         ` Yang Shi
  2019-09-05 19:31           ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 61+ messages in thread
From: Yang Shi @ 2019-09-05 18:46 UTC
  To: Stefan Priebe - Profihost AG
  Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner

On Thu, Sep 5, 2019 at 10:26 AM Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
>
> Hi,
> On 05.09.19 at 18:28, Yang Shi wrote:
> > On Thu, Sep 5, 2019 at 4:56 AM Stefan Priebe - Profihost AG
> > <s.priebe@profihost.ag> wrote:
> >>
> >>
> >> On 05.09.19 at 13:40, Michal Hocko wrote:
> >>> On Thu 05-09-19 13:27:10, Stefan Priebe - Profihost AG wrote:
> >>>> Hello all,
> >>>>
> >>>> I hope you can help me again to understand the current MemAvailable
> >>>> value in the Linux kernel. I'm running a 4.19.52 kernel + PSI patches in
> >>>> this case.
> >>>>
> >>>> I'm seeing the following behaviour I don't understand and would like
> >>>> help with.
> >>>>
> >>>> While MemAvailable shows 5G, the kernel starts to drop cache from 4G down
> >>>> to 1G as Apache spawns some PHP processes. After that the PSI
> >>>> mem.some value rises and the kernel tries to reclaim memory, but
> >>>> MemAvailable stays at 5G.
> >>>>
> >>>> Any ideas?
> >>>
> >>> Can you collect /proc/vmstat (every second or so) and post it while this
> >>> is the case please?
> >>
> >> Yes, sure.
> >>
> >> But I don't know which event you mean exactly. The current situation is
> >> that PSI / memory pressure is > 20, but:
> >>
> >> This is the current status, where MemAvailable shows 5G but Cached has
> >> already dropped to 1G, down from 4G:
> >
> > I don't get what problem you are running into. MemAvailable is *not*
> > the indication for triggering memory reclaim.
>
> Yes, sure, it's not. But I don't get why:
> * PSI is rising and caches are dropped when MemAvail and MemFree show 5GB

You need to check your watermark settings (/proc/sys/vm/min_free_kbytes,
/proc/sys/vm/watermark_scale_factor and /proc/zoneinfo) to see why
kswapd is launched when there is 5 GB of free memory.

>
> > Basically MemAvailable = MemFree + page cache (active file + inactive
> > file) / 2 + SReclaimable / 2, which means that much memory could be
> > reclaimed if memory pressure is hit.
>
> Yes, but MemFree also shows 5G in this case (see below), and still the
> file cache gets dropped and PSI is rising.
>
> > But memory pressure (tracked by PSI) is triggered by how much memory is
> > consumed relative to the zone watermarks.
> What exactly does this mean?

cat /proc/zoneinfo would show something like:

pages free     4118641
        min      12470
        low      16598
        high     20726

Here min/low/high are the so-called "watermarks". When free memory
falls below low, kswapd is woken up.
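
A quick way to eyeball free vs. low per zone (just a sketch, adjust as
needed):

  awk '/^Node/ {zone=$4} /pages free/ {free=$3} /^ +low/ {print zone, "free=" free, "low=" $2}' /proc/zoneinfo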

>
> > So it looks like the page reclaim logic just reclaimed file cache (which
> > seems sane since your VM doesn't have a swap partition), so I would
> > expect MemFree to increase as "Cached" drops,
>
> No, it does not. MemFree and MemAvail stay constant at 5G.
>
> > while MemAvailable stays
> > basically unchanged. That looks sane to me. Am I missing something?
>
> I always thought the kernel would not free the cache, nor would PSI
> rise, when there are 5GB in MemFree and in MemAvail. This still makes
> no sense to me. Why drop the cache when you have 5G free? This
> currently results in I/O waits once the dropped pages are needed again.
>
> Greets,
> Stefan
>
> >>
> >> meminfo:
> >> MemTotal:       16423116 kB
> >> MemFree:         5280736 kB
> >> MemAvailable:    5332752 kB
> >> Buffers:            2572 kB
> >> Cached:          1225112 kB
> >> SwapCached:            0 kB
> >> Active:          8934976 kB
> >> Inactive:        1026900 kB
> >> Active(anon):    8740396 kB
> >> Inactive(anon):   873448 kB
> >> Active(file):     194580 kB
> >> Inactive(file):   153452 kB
> >> Unevictable:       19900 kB
> >> Mlocked:           19900 kB
> >> SwapTotal:             0 kB
> >> SwapFree:              0 kB
> >> Dirty:              1980 kB
> >> Writeback:             0 kB
> >> AnonPages:       8423480 kB
> >> Mapped:           978212 kB
> >> Shmem:            875680 kB
> >> Slab:             839868 kB
> >> SReclaimable:     383396 kB
> >> SUnreclaim:       456472 kB
> >> KernelStack:       22576 kB
> >> PageTables:        49824 kB
> >> NFS_Unstable:          0 kB
> >> Bounce:                0 kB
> >> WritebackTmp:          0 kB
> >> CommitLimit:     8211556 kB
> >> Committed_AS:   32060624 kB
> >> VmallocTotal:   34359738367 kB
> >> VmallocUsed:           0 kB
> >> VmallocChunk:          0 kB
> >> Percpu:           118048 kB
> >> HardwareCorrupted:     0 kB
> >> AnonHugePages:   6406144 kB
> >> ShmemHugePages:        0 kB
> >> ShmemPmdMapped:        0 kB
> >> HugePages_Total:       0
> >> HugePages_Free:        0
> >> HugePages_Rsvd:        0
> >> HugePages_Surp:        0
> >> Hugepagesize:       2048 kB
> >> Hugetlb:               0 kB
> >> DirectMap4k:     2580336 kB
> >> DirectMap2M:    14196736 kB
> >> DirectMap1G:     2097152 kB
> >>
> >>
> >> vmstat shows:
> >> nr_free_pages 1320053
> >> nr_zone_inactive_anon 218362
> >> nr_zone_active_anon 2185108
> >> nr_zone_inactive_file 38363
> >> nr_zone_active_file 48645
> >> nr_zone_unevictable 4975
> >> nr_zone_write_pending 495
> >> nr_mlock 4975
> >> nr_page_table_pages 12553
> >> nr_kernel_stack 22576
> >> nr_bounce 0
> >> nr_zspages 0
> >> nr_free_cma 0
> >> numa_hit 13916119899
> >> numa_miss 0
> >> numa_foreign 0
> >> numa_interleave 15629
> >> numa_local 13916119899
> >> numa_other 0
> >> nr_inactive_anon 218362
> >> nr_active_anon 2185164
> >> nr_inactive_file 38363
> >> nr_active_file 48645
> >> nr_unevictable 4975
> >> nr_slab_reclaimable 95849
> >> nr_slab_unreclaimable 114118
> >> nr_isolated_anon 0
> >> nr_isolated_file 0
> >> workingset_refault 71365357
> >> workingset_activate 20281670
> >> workingset_restore 8995665
> >> workingset_nodereclaim 326085
> >> nr_anon_pages 2105903
> >> nr_mapped 244553
> >> nr_file_pages 306921
> >> nr_dirty 495
> >> nr_writeback 0
> >> nr_writeback_temp 0
> >> nr_shmem 218920
> >> nr_shmem_hugepages 0
> >> nr_shmem_pmdmapped 0
> >> nr_anon_transparent_hugepages 3128
> >> nr_unstable 0
> >> nr_vmscan_write 0
> >> nr_vmscan_immediate_reclaim 1833104
> >> nr_dirtied 386544087
> >> nr_written 259220036
> >> nr_dirty_threshold 265636
> >> nr_dirty_background_threshold 132656
> >> pgpgin 1817628997
> >> pgpgout 3730818029
> >> pswpin 0
> >> pswpout 0
> >> pgalloc_dma 0
> >> pgalloc_dma32 5790777997
> >> pgalloc_normal 20003662520
> >> pgalloc_movable 0
> >> allocstall_dma 0
> >> allocstall_dma32 0
> >> allocstall_normal 39
> >> allocstall_movable 1980089
> >> pgskip_dma 0
> >> pgskip_dma32 0
> >> pgskip_normal 0
> >> pgskip_movable 0
> >> pgfree 26637215947
> >> pgactivate 316722654
> >> pgdeactivate 261039211
> >> pglazyfree 0
> >> pgfault 17719356599
> >> pgmajfault 30985544
> >> pglazyfreed 0
> >> pgrefill 286826568
> >> pgsteal_kswapd 36740923
> >> pgsteal_direct 349291470
> >> pgscan_kswapd 36878966
> >> pgscan_direct 395327492
> >> pgscan_direct_throttle 0
> >> zone_reclaim_failed 0
> >> pginodesteal 49817087
> >> slabs_scanned 597956834
> >> kswapd_inodesteal 1412447
> >> kswapd_low_wmark_hit_quickly 39
> >> kswapd_high_wmark_hit_quickly 319
> >> pageoutrun 3585
> >> pgrotated 2873743
> >> drop_pagecache 0
> >> drop_slab 0
> >> oom_kill 0
> >> pgmigrate_success 839062285
> >> pgmigrate_fail 507313
> >> compact_migrate_scanned 9619077010
> >> compact_free_scanned 67985619651
> >> compact_isolated 1684537704
> >> compact_stall 205761
> >> compact_fail 182420
> >> compact_success 23341
> >> compact_daemon_wake 2
> >> compact_daemon_migrate_scanned 811
> >> compact_daemon_free_scanned 490241
> >> htlb_buddy_alloc_success 0
> >> htlb_buddy_alloc_fail 0
> >> unevictable_pgs_culled 1006521
> >> unevictable_pgs_scanned 0
> >> unevictable_pgs_rescued 997077
> >> unevictable_pgs_mlocked 1319203
> >> unevictable_pgs_munlocked 842471
> >> unevictable_pgs_cleared 470531
> >> unevictable_pgs_stranded 459613
> >> thp_fault_alloc 20263113
> >> thp_fault_fallback 3368635
> >> thp_collapse_alloc 226476
> >> thp_collapse_alloc_failed 17594
> >> thp_file_alloc 0
> >> thp_file_mapped 0
> >> thp_split_page 1159
> >> thp_split_page_failed 3927
> >> thp_deferred_split_page 20348941
> >> thp_split_pmd 53361
> >> thp_split_pud 0
> >> thp_zero_page_alloc 1
> >> thp_zero_page_alloc_failed 0
> >> thp_swpout 0
> >> thp_swpout_fallback 0
> >> balloon_inflate 0
> >> balloon_deflate 0
> >> balloon_migrate 0
> >> swap_ra 0
> >> swap_ra_hit 0
> >>
> >> Greets,
> >> Stefan
> >>
> >>



* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 18:46         ` Yang Shi
@ 2019-09-05 19:31           ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-05 19:31 UTC
  To: Yang Shi; +Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner

On 05.09.19 at 20:46, Yang Shi wrote:
> On Thu, Sep 5, 2019 at 10:26 AM Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>>
>> Hi,
>> On 05.09.19 at 18:28, Yang Shi wrote:
>>> On Thu, Sep 5, 2019 at 4:56 AM Stefan Priebe - Profihost AG
>>> <s.priebe@profihost.ag> wrote:
>>>>
>>>>
>>>> On 05.09.19 at 13:40, Michal Hocko wrote:
>>>>> On Thu 05-09-19 13:27:10, Stefan Priebe - Profihost AG wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> I hope you can help me again to understand the current MemAvailable
>>>>>> value in the Linux kernel. I'm running a 4.19.52 kernel + PSI patches in
>>>>>> this case.
>>>>>>
>>>>>> I'm seeing the following behaviour I don't understand and would like
>>>>>> help with.
>>>>>>
>>>>>> While MemAvailable shows 5G, the kernel starts to drop cache from 4G down
>>>>>> to 1G as Apache spawns some PHP processes. After that the PSI
>>>>>> mem.some value rises and the kernel tries to reclaim memory, but
>>>>>> MemAvailable stays at 5G.
>>>>>>
>>>>>> Any ideas?
>>>>>
>>>>> Can you collect /proc/vmstat (every second or so) and post it while this
>>>>> is the case please?
>>>>
>>>> Yes, sure.
>>>>
>>>> But I don't know which event you mean exactly. The current situation is
>>>> that PSI / memory pressure is > 20, but:
>>>>
>>>> This is the current status, where MemAvailable shows 5G but Cached has
>>>> already dropped to 1G, down from 4G:
>>>
>>> I don't get what problem you are running into. MemAvailable is *not*
>>> the indication for triggering memory reclaim.
>>
>> Yes, sure, it's not. But I don't get why:
>> * PSI is rising and caches are dropped when MemAvail and MemFree show 5GB
> 
> You need to check your watermark settings (/proc/sys/vm/min_free_kbytes,
> /proc/sys/vm/watermark_scale_factor and /proc/zoneinfo) to see why
> kswapd is launched when there is 5 GB of free memory.

Sure, I did, but I can't find anything:
# cat /proc/sys/vm/min_free_kbytes
164231

# cat /proc/sys/vm/watermark_scale_factor
10


# cat /proc/zoneinfo
Node 0, zone      DMA
  per-node stats
      nr_inactive_anon 177046
      nr_active_anon 1718836
      nr_inactive_file 288146
      nr_active_file 121497
      nr_unevictable 5510
      nr_slab_reclaimable 301721
      nr_slab_unreclaimable 119276
      nr_isolated_anon 0
      nr_isolated_file 0
      workingset_refault 72376392
      workingset_activate 20641006
      workingset_restore 9149962
      workingset_nodereclaim 326469
      nr_anon_pages 1647524
      nr_mapped    211704
      nr_file_pages 587984
      nr_dirty     212
      nr_writeback 0
      nr_writeback_temp 0
      nr_shmem     177458
      nr_shmem_hugepages 0
      nr_shmem_pmdmapped 0
      nr_anon_transparent_hugepages 2480
      nr_unstable  0
      nr_vmscan_write 0
      nr_vmscan_immediate_reclaim 1843759
      nr_dirtied   388618149
      nr_written   260643754
  pages free     3977
        min      39
        low      48
        high     57
        spanned  4095
        present  3998
        managed  3977
        protection: (0, 2968, 16022, 16022, 16022)
      nr_free_pages 3977
      nr_zone_inactive_anon 0
      nr_zone_active_anon 0
      nr_zone_inactive_file 0
      nr_zone_active_file 0
      nr_zone_unevictable 0
      nr_zone_write_pending 0
      nr_mlock     0
      nr_page_table_pages 0
      nr_kernel_stack 0
      nr_bounce    0
      nr_zspages   0
      nr_free_cma  0
      numa_hit     0
      numa_miss    0
      numa_foreign 0
      numa_interleave 0
      numa_local   0
      numa_other   0
  pagesets
    cpu: 0
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
    cpu: 1
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
    cpu: 2
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
    cpu: 3
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
    cpu: 4
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
    cpu: 5
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
    cpu: 6
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
    cpu: 7
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 8
  node_unreclaimable:  0
  start_pfn:           1
Node 0, zone    DMA32
  pages free     439019
        min      7600
        low      9500
        high     11400
        spanned  1044480
        present  782300
        managed  760023
        protection: (0, 0, 13053, 13053, 13053)
      nr_free_pages 439019
      nr_zone_inactive_anon 0
      nr_zone_active_anon 309777
      nr_zone_inactive_file 809
      nr_zone_active_file 645
      nr_zone_unevictable 2048
      nr_zone_write_pending 1
      nr_mlock     2048
      nr_page_table_pages 8
      nr_kernel_stack 32
      nr_bounce    0
      nr_zspages   0
      nr_free_cma  0
      numa_hit     213697054
      numa_miss    0
      numa_foreign 0
      numa_interleave 0
      numa_local   213697054
      numa_other   0
  pagesets
    cpu: 0
              count: 0
              high:  378
              batch: 63
  vm stats threshold: 48
    cpu: 1
              count: 1
              high:  378
              batch: 63
  vm stats threshold: 48
    cpu: 2
              count: 338
              high:  378
              batch: 63
  vm stats threshold: 48
    cpu: 3
              count: 10
              high:  378
              batch: 63
  vm stats threshold: 48
    cpu: 4
              count: 0
              high:  378
              batch: 63
  vm stats threshold: 48
    cpu: 5
              count: 324
              high:  378
              batch: 63
  vm stats threshold: 48
    cpu: 6
              count: 136
              high:  378
              batch: 63
  vm stats threshold: 48
    cpu: 7
              count: 1
              high:  378
              batch: 63
  vm stats threshold: 48
  node_unreclaimable:  0
  start_pfn:           4096
Node 0, zone   Normal
  pages free     734519
        min      33417
        low      41771
        high     50125
        spanned  3407872
        present  3407872
        managed  3341779
        protection: (0, 0, 0, 0, 0)
      nr_free_pages 734519
      nr_zone_inactive_anon 177046
      nr_zone_active_anon 1409059
      nr_zone_inactive_file 287337
      nr_zone_active_file 120852
      nr_zone_unevictable 3462
      nr_zone_write_pending 211
      nr_mlock     3462
      nr_page_table_pages 10551
      nr_kernel_stack 22464
      nr_bounce    0
      nr_zspages   0
      nr_free_cma  0
      numa_hit     13801352577
      numa_miss    0
      numa_foreign 0
      numa_interleave 15629
      numa_local   13801352577
      numa_other   0
  pagesets
    cpu: 0
              count: 12
              high:  42
              batch: 7
  vm stats threshold: 64
    cpu: 1
              count: 40
              high:  42
              batch: 7
  vm stats threshold: 64
    cpu: 2
              count: 41
              high:  42
              batch: 7
  vm stats threshold: 64
    cpu: 3
              count: 41
              high:  42
              batch: 7
  vm stats threshold: 64
    cpu: 4
              count: 37
              high:  42
              batch: 7
  vm stats threshold: 64
    cpu: 5
              count: 39
              high:  42
              batch: 7
  vm stats threshold: 64
    cpu: 6
              count: 19
              high:  42
              batch: 7
  vm stats threshold: 64
    cpu: 7
              count: 9
              high:  42
              batch: 7
  vm stats threshold: 64
  node_unreclaimable:  0
  start_pfn:           1048576
Node 0, zone  Movable
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 0, 0)
Node 0, zone   Device
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 0, 0)

>>> Basically MemAvailable = MemFree + page cache (active file + inactive
>>> file) / 2 + SReclaimable / 2, which means that much memory could be
>>> reclaimed if memory pressure is hit.
>>
>> Yes, but MemFree also shows 5G in this case (see below), and still the
>> file cache gets dropped and PSI is rising.
>>
>>> But memory pressure (tracked by PSI) is triggered by how much memory is
>>> consumed relative to the zone watermarks.
>> What exactly does this mean?
> 
> cat /proc/zoneinfo would show something like:
>
> pages free     4118641
>         min      12470
>         low      16598
>         high     20726
>
> Here min/low/high are the so-called "watermarks". When free memory
> falls below low, kswapd is woken up.
> 
>>
>>> So it looks like the page reclaim logic just reclaimed file cache (which
>>> seems sane since your VM doesn't have a swap partition), so I would
>>> expect MemFree to increase as "Cached" drops,
>>
>> No, it does not. MemFree and MemAvail stay constant at 5G.
>>
>>> while MemAvailable stays
>>> basically unchanged. That looks sane to me. Am I missing something?
>>
>> I always thought the kernel would not free the cache, nor would PSI
>> rise, when there are 5GB in MemFree and in MemAvail. This still makes
>> no sense to me. Why drop the cache when you have 5G free? This
>> currently results in I/O waits once the dropped pages are needed again.
>>
>> Greets,
>> Stefan
>>
>>>>
>>>> meminfo:
>>>> MemTotal:       16423116 kB
>>>> MemFree:         5280736 kB
>>>> MemAvailable:    5332752 kB
>>>> Buffers:            2572 kB
>>>> Cached:          1225112 kB
>>>> SwapCached:            0 kB
>>>> Active:          8934976 kB
>>>> Inactive:        1026900 kB
>>>> Active(anon):    8740396 kB
>>>> Inactive(anon):   873448 kB
>>>> Active(file):     194580 kB
>>>> Inactive(file):   153452 kB
>>>> Unevictable:       19900 kB
>>>> Mlocked:           19900 kB
>>>> SwapTotal:             0 kB
>>>> SwapFree:              0 kB
>>>> Dirty:              1980 kB
>>>> Writeback:             0 kB
>>>> AnonPages:       8423480 kB
>>>> Mapped:           978212 kB
>>>> Shmem:            875680 kB
>>>> Slab:             839868 kB
>>>> SReclaimable:     383396 kB
>>>> SUnreclaim:       456472 kB
>>>> KernelStack:       22576 kB
>>>> PageTables:        49824 kB
>>>> NFS_Unstable:          0 kB
>>>> Bounce:                0 kB
>>>> WritebackTmp:          0 kB
>>>> CommitLimit:     8211556 kB
>>>> Committed_AS:   32060624 kB
>>>> VmallocTotal:   34359738367 kB
>>>> VmallocUsed:           0 kB
>>>> VmallocChunk:          0 kB
>>>> Percpu:           118048 kB
>>>> HardwareCorrupted:     0 kB
>>>> AnonHugePages:   6406144 kB
>>>> ShmemHugePages:        0 kB
>>>> ShmemPmdMapped:        0 kB
>>>> HugePages_Total:       0
>>>> HugePages_Free:        0
>>>> HugePages_Rsvd:        0
>>>> HugePages_Surp:        0
>>>> Hugepagesize:       2048 kB
>>>> Hugetlb:               0 kB
>>>> DirectMap4k:     2580336 kB
>>>> DirectMap2M:    14196736 kB
>>>> DirectMap1G:     2097152 kB
>>>>
>>>>
>>>> vmstat shows:
>>>> nr_free_pages 1320053
>>>> nr_zone_inactive_anon 218362
>>>> nr_zone_active_anon 2185108
>>>> nr_zone_inactive_file 38363
>>>> nr_zone_active_file 48645
>>>> nr_zone_unevictable 4975
>>>> nr_zone_write_pending 495
>>>> nr_mlock 4975
>>>> nr_page_table_pages 12553
>>>> nr_kernel_stack 22576
>>>> nr_bounce 0
>>>> nr_zspages 0
>>>> nr_free_cma 0
>>>> numa_hit 13916119899
>>>> numa_miss 0
>>>> numa_foreign 0
>>>> numa_interleave 15629
>>>> numa_local 13916119899
>>>> numa_other 0
>>>> nr_inactive_anon 218362
>>>> nr_active_anon 2185164
>>>> nr_inactive_file 38363
>>>> nr_active_file 48645
>>>> nr_unevictable 4975
>>>> nr_slab_reclaimable 95849
>>>> nr_slab_unreclaimable 114118
>>>> nr_isolated_anon 0
>>>> nr_isolated_file 0
>>>> workingset_refault 71365357
>>>> workingset_activate 20281670
>>>> workingset_restore 8995665
>>>> workingset_nodereclaim 326085
>>>> nr_anon_pages 2105903
>>>> nr_mapped 244553
>>>> nr_file_pages 306921
>>>> nr_dirty 495
>>>> nr_writeback 0
>>>> nr_writeback_temp 0
>>>> nr_shmem 218920
>>>> nr_shmem_hugepages 0
>>>> nr_shmem_pmdmapped 0
>>>> nr_anon_transparent_hugepages 3128
>>>> nr_unstable 0
>>>> nr_vmscan_write 0
>>>> nr_vmscan_immediate_reclaim 1833104
>>>> nr_dirtied 386544087
>>>> nr_written 259220036
>>>> nr_dirty_threshold 265636
>>>> nr_dirty_background_threshold 132656
>>>> pgpgin 1817628997
>>>> pgpgout 3730818029
>>>> pswpin 0
>>>> pswpout 0
>>>> pgalloc_dma 0
>>>> pgalloc_dma32 5790777997
>>>> pgalloc_normal 20003662520
>>>> pgalloc_movable 0
>>>> allocstall_dma 0
>>>> allocstall_dma32 0
>>>> allocstall_normal 39
>>>> allocstall_movable 1980089
>>>> pgskip_dma 0
>>>> pgskip_dma32 0
>>>> pgskip_normal 0
>>>> pgskip_movable 0
>>>> pgfree 26637215947
>>>> pgactivate 316722654
>>>> pgdeactivate 261039211
>>>> pglazyfree 0
>>>> pgfault 17719356599
>>>> pgmajfault 30985544
>>>> pglazyfreed 0
>>>> pgrefill 286826568
>>>> pgsteal_kswapd 36740923
>>>> pgsteal_direct 349291470
>>>> pgscan_kswapd 36878966
>>>> pgscan_direct 395327492
>>>> pgscan_direct_throttle 0
>>>> zone_reclaim_failed 0
>>>> pginodesteal 49817087
>>>> slabs_scanned 597956834
>>>> kswapd_inodesteal 1412447
>>>> kswapd_low_wmark_hit_quickly 39
>>>> kswapd_high_wmark_hit_quickly 319
>>>> pageoutrun 3585
>>>> pgrotated 2873743
>>>> drop_pagecache 0
>>>> drop_slab 0
>>>> oom_kill 0
>>>> pgmigrate_success 839062285
>>>> pgmigrate_fail 507313
>>>> compact_migrate_scanned 9619077010
>>>> compact_free_scanned 67985619651
>>>> compact_isolated 1684537704
>>>> compact_stall 205761
>>>> compact_fail 182420
>>>> compact_success 23341
>>>> compact_daemon_wake 2
>>>> compact_daemon_migrate_scanned 811
>>>> compact_daemon_free_scanned 490241
>>>> htlb_buddy_alloc_success 0
>>>> htlb_buddy_alloc_fail 0
>>>> unevictable_pgs_culled 1006521
>>>> unevictable_pgs_scanned 0
>>>> unevictable_pgs_rescued 997077
>>>> unevictable_pgs_mlocked 1319203
>>>> unevictable_pgs_munlocked 842471
>>>> unevictable_pgs_cleared 470531
>>>> unevictable_pgs_stranded 459613
>>>> thp_fault_alloc 20263113
>>>> thp_fault_fallback 3368635
>>>> thp_collapse_alloc 226476
>>>> thp_collapse_alloc_failed 17594
>>>> thp_file_alloc 0
>>>> thp_file_mapped 0
>>>> thp_split_page 1159
>>>> thp_split_page_failed 3927
>>>> thp_deferred_split_page 20348941
>>>> thp_split_pmd 53361
>>>> thp_split_pud 0
>>>> thp_zero_page_alloc 1
>>>> thp_zero_page_alloc_failed 0
>>>> thp_swpout 0
>>>> thp_swpout_fallback 0
>>>> balloon_inflate 0
>>>> balloon_deflate 0
>>>> balloon_migrate 0
>>>> swap_ra 0
>>>> swap_ra_hit 0
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>>



* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 11:56   ` Stefan Priebe - Profihost AG
  2019-09-05 16:28     ` Yang Shi
@ 2019-09-06 10:08     ` Stefan Priebe - Profihost AG
  2019-09-06 10:25       ` Vlastimil Babka
                         ` (2 more replies)
  1 sibling, 3 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-06 10:08 UTC
  To: Michal Hocko; +Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

These are the biggest differences in meminfo before and after Cached
starts to drop. I didn't expect Cached to end up in MemFree.

Before:
MemTotal:       16423116 kB
MemFree:          374572 kB
MemAvailable:    5633816 kB
Cached:          5550972 kB
Inactive:        4696580 kB
Inactive(file):  3624776 kB


After:
MemTotal:       16423116 kB
MemFree:         3477168 kB
MemAvailable:    6066916 kB
Cached:          2724504 kB
Inactive:        1854740 kB
Inactive(file):   950680 kB

Any explanation?

Greets,
Stefan
On 05.09.19 at 13:56, Stefan Priebe - Profihost AG wrote:
> 
> On 05.09.19 at 13:40, Michal Hocko wrote:
>> On Thu 05-09-19 13:27:10, Stefan Priebe - Profihost AG wrote:
>>> Hello all,
>>>
>>> I hope you can help me again to understand the current MemAvailable
>>> value in the Linux kernel. I'm running a 4.19.52 kernel + PSI patches in
>>> this case.
>>>
>>> I'm seeing the following behaviour I don't understand and would like
>>> help with.
>>>
>>> While MemAvailable shows 5G, the kernel starts to drop cache from 4G down
>>> to 1G as Apache spawns some PHP processes. After that the PSI
>>> mem.some value rises and the kernel tries to reclaim memory, but
>>> MemAvailable stays at 5G.
>>>
>>> Any ideas?
>>
>> Can you collect /proc/vmstat (every second or so) and post it while this
>> is the case please?
> 
> Yes, sure.
>
> But I don't know which event you mean exactly. The current situation is
> that PSI / memory pressure is > 20, but:
>
> This is the current status, where MemAvailable shows 5G but Cached has
> already dropped to 1G, down from 4G:
> 
> 
> meminfo:
> MemTotal:       16423116 kB
> MemFree:         5280736 kB
> MemAvailable:    5332752 kB
> Buffers:            2572 kB
> Cached:          1225112 kB
> SwapCached:            0 kB
> Active:          8934976 kB
> Inactive:        1026900 kB
> Active(anon):    8740396 kB
> Inactive(anon):   873448 kB
> Active(file):     194580 kB
> Inactive(file):   153452 kB
> Unevictable:       19900 kB
> Mlocked:           19900 kB
> SwapTotal:             0 kB
> SwapFree:              0 kB
> Dirty:              1980 kB
> Writeback:             0 kB
> AnonPages:       8423480 kB
> Mapped:           978212 kB
> Shmem:            875680 kB
> Slab:             839868 kB
> SReclaimable:     383396 kB
> SUnreclaim:       456472 kB
> KernelStack:       22576 kB
> PageTables:        49824 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:     8211556 kB
> Committed_AS:   32060624 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:           0 kB
> VmallocChunk:          0 kB
> Percpu:           118048 kB
> HardwareCorrupted:     0 kB
> AnonHugePages:   6406144 kB
> ShmemHugePages:        0 kB
> ShmemPmdMapped:        0 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> Hugetlb:               0 kB
> DirectMap4k:     2580336 kB
> DirectMap2M:    14196736 kB
> DirectMap1G:     2097152 kB
> 
> 
> vmstat shows:
> nr_free_pages 1320053
> nr_zone_inactive_anon 218362
> nr_zone_active_anon 2185108
> nr_zone_inactive_file 38363
> nr_zone_active_file 48645
> nr_zone_unevictable 4975
> nr_zone_write_pending 495
> nr_mlock 4975
> nr_page_table_pages 12553
> nr_kernel_stack 22576
> nr_bounce 0
> nr_zspages 0
> nr_free_cma 0
> numa_hit 13916119899
> numa_miss 0
> numa_foreign 0
> numa_interleave 15629
> numa_local 13916119899
> numa_other 0
> nr_inactive_anon 218362
> nr_active_anon 2185164
> nr_inactive_file 38363
> nr_active_file 48645
> nr_unevictable 4975
> nr_slab_reclaimable 95849
> nr_slab_unreclaimable 114118
> nr_isolated_anon 0
> nr_isolated_file 0
> workingset_refault 71365357
> workingset_activate 20281670
> workingset_restore 8995665
> workingset_nodereclaim 326085
> nr_anon_pages 2105903
> nr_mapped 244553
> nr_file_pages 306921
> nr_dirty 495
> nr_writeback 0
> nr_writeback_temp 0
> nr_shmem 218920
> nr_shmem_hugepages 0
> nr_shmem_pmdmapped 0
> nr_anon_transparent_hugepages 3128
> nr_unstable 0
> nr_vmscan_write 0
> nr_vmscan_immediate_reclaim 1833104
> nr_dirtied 386544087
> nr_written 259220036
> nr_dirty_threshold 265636
> nr_dirty_background_threshold 132656
> pgpgin 1817628997
> pgpgout 3730818029
> pswpin 0
> pswpout 0
> pgalloc_dma 0
> pgalloc_dma32 5790777997
> pgalloc_normal 20003662520
> pgalloc_movable 0
> allocstall_dma 0
> allocstall_dma32 0
> allocstall_normal 39
> allocstall_movable 1980089
> pgskip_dma 0
> pgskip_dma32 0
> pgskip_normal 0
> pgskip_movable 0
> pgfree 26637215947
> pgactivate 316722654
> pgdeactivate 261039211
> pglazyfree 0
> pgfault 17719356599
> pgmajfault 30985544
> pglazyfreed 0
> pgrefill 286826568
> pgsteal_kswapd 36740923
> pgsteal_direct 349291470
> pgscan_kswapd 36878966
> pgscan_direct 395327492
> pgscan_direct_throttle 0
> zone_reclaim_failed 0
> pginodesteal 49817087
> slabs_scanned 597956834
> kswapd_inodesteal 1412447
> kswapd_low_wmark_hit_quickly 39
> kswapd_high_wmark_hit_quickly 319
> pageoutrun 3585
> pgrotated 2873743
> drop_pagecache 0
> drop_slab 0
> oom_kill 0
> pgmigrate_success 839062285
> pgmigrate_fail 507313
> compact_migrate_scanned 9619077010
> compact_free_scanned 67985619651
> compact_isolated 1684537704
> compact_stall 205761
> compact_fail 182420
> compact_success 23341
> compact_daemon_wake 2
> compact_daemon_migrate_scanned 811
> compact_daemon_free_scanned 490241
> htlb_buddy_alloc_success 0
> htlb_buddy_alloc_fail 0
> unevictable_pgs_culled 1006521
> unevictable_pgs_scanned 0
> unevictable_pgs_rescued 997077
> unevictable_pgs_mlocked 1319203
> unevictable_pgs_munlocked 842471
> unevictable_pgs_cleared 470531
> unevictable_pgs_stranded 459613
> thp_fault_alloc 20263113
> thp_fault_fallback 3368635
> thp_collapse_alloc 226476
> thp_collapse_alloc_failed 17594
> thp_file_alloc 0
> thp_file_mapped 0
> thp_split_page 1159
> thp_split_page_failed 3927
> thp_deferred_split_page 20348941
> thp_split_pmd 53361
> thp_split_pud 0
> thp_zero_page_alloc 1
> thp_zero_page_alloc_failed 0
> thp_swpout 0
> thp_swpout_fallback 0
> balloon_inflate 0
> balloon_deflate 0
> balloon_migrate 0
> swap_ra 0
> swap_ra_hit 0
> 
> Greets,
> Stefan
> 



* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-06 10:08     ` Stefan Priebe - Profihost AG
@ 2019-09-06 10:25       ` Vlastimil Babka
  2019-09-06 18:52       ` Yang Shi
  2019-09-09  8:27       ` Michal Hocko
  2 siblings, 0 replies; 61+ messages in thread
From: Vlastimil Babka @ 2019-09-06 10:25 UTC
  To: Stefan Priebe - Profihost AG, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On 9/6/19 12:08 PM, Stefan Priebe - Profihost AG wrote:
> These are the biggest differences in meminfo before and after Cached
> starts to drop. I didn't expect Cached to end up in MemFree.
> 
> Before:
> MemTotal:       16423116 kB
> MemFree:          374572 kB
> MemAvailable:    5633816 kB
> Cached:          5550972 kB
> Inactive:        4696580 kB
> Inactive(file):  3624776 kB
> 
> 
> After:
> MemTotal:       16423116 kB
> MemFree:         3477168 kB
> MemAvailable:    6066916 kB
> Cached:          2724504 kB
> Inactive:        1854740 kB
> Inactive(file):   950680 kB
> 
> Any explanation?

What does /proc/pagetypeinfo look like?
Also, as Michal said, collecting the whole of /proc/vmstat (e.g. catting
it to vmstat.$TIMESTAMP once per second) while the bad situation is
happening would be useful.
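A minimal collection loop could look like this (just a sketch; the
destination directory is an arbitrary example):
 while true; do cat /proc/vmstat > /var/tmp/vmstat.$(date +%s); sleep 1; done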
You could also try if the bad trend stops after you execute:
 echo never > /sys/kernel/mm/transparent_hugepage/defrag


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-06 10:08     ` Stefan Priebe - Profihost AG
  2019-09-06 10:25       ` Vlastimil Babka
@ 2019-09-06 18:52       ` Yang Shi
  2019-09-07  7:32         ` Stefan Priebe - Profihost AG
  2019-09-09  8:27       ` Michal Hocko
  2 siblings, 1 reply; 61+ messages in thread
From: Yang Shi @ 2019-09-06 18:52 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner

On Fri, Sep 6, 2019 at 3:08 AM Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
>
> These are the biggest differences in meminfo before and after cached
> starts to drop. I didn't expect cached to end up in MemFree.
>
> Before:
> MemTotal:       16423116 kB
> MemFree:          374572 kB

Here MemFree is only ~300MB? That is quite low compared with the total
amount of memory. It may dip below the watermark and wake up kswapd.
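One way to see how close each zone's free pages are to the watermarks
is something like this (a sketch; values are in pages, field names as
in the 4.19 /proc/zoneinfo):

awk '/^Node/ { zone = $4 }
     $1 == "pages" && $2 == "free" { print zone, "free", $3 }
     $1 == "min" || $1 == "low" || $1 == "high" { print zone, $1, $2 }' /proc/zoneinfo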

> MemAvailable:    5633816 kB
> Cached:          5550972 kB
> Inactive:        4696580 kB
> Inactive(file):  3624776 kB
>
>
> After:
> MemTotal:       16423116 kB
> MemFree:         3477168 kB

Here MemFree is ~3GB and file cache was shrunk from ~5G down to ~2G.

> MemAvailable:    6066916 kB
> Cached:          2724504 kB
> Inactive:        1854740 kB
> Inactive(file):   950680 kB
>
> Any explanation?
>
> Greets,
> Stefan
> [quoted earlier message, including the full meminfo and vmstat dumps,
> snipped]


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-06 18:52       ` Yang Shi
@ 2019-09-07  7:32         ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-07  7:32 UTC (permalink / raw)
  To: Yang Shi; +Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner


On 06.09.19 at 20:52, Yang Shi wrote:
> On Fri, Sep 6, 2019 at 3:08 AM Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>>
>> These are the biggest differences in meminfo before and after cached
>> starts to drop. I didn't expect cached to end up in MemFree.
>>
>> Before:
>> MemTotal:       16423116 kB
>> MemFree:          374572 kB
> 
> Here MemFree is only ~300MB? That is quite low compared with the total
> amount of memory. It may dip below the watermark and wake up kswapd.

Mhm, yes, that might be possible, but I don't see kswapd running in the
process list, at least not using CPU.

Also, does it really free all of the cache? I thought it would only free
up to vm.min_free_kbytes, but that is 160MB on this machine.
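For reference, the per-zone min watermarks should roughly add up to
vm.min_free_kbytes; a quick cross-check could be (a sketch, assuming
4kB pages):

cat /proc/sys/vm/min_free_kbytes
awk '$1 == "min" { sum += $2 } END { print sum * 4 " kB" }' /proc/zoneinfo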

>> MemAvailable:    5633816 kB
>> Cached:          5550972 kB
>> Inactive:        4696580 kB
>> Inactive(file):  3624776 kB
>>
>>
>> After:
>> MemTotal:       16423116 kB
>> MemFree:         3477168 kB
> 
> Here MemFree is ~3GB and file cache was shrunk from ~5G down to ~2G.

Yes - but I thought that if the file cache gets shrunk, another process
has requested this memory and would use it, so it would not end up in
free.

I'm sure all of this is explainable - but I'd really like to know how.

Greets,
Stefan

>> MemAvailable:    6066916 kB
>> Cached:          2724504 kB
>> Inactive:        1854740 kB
>> Inactive(file):   950680 kB
>>
>> Any explanation?
>>
>> Greets,
>> Stefan
>> [quoted earlier messages, including the full meminfo and vmstat dumps,
>> snipped]


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-06 10:08     ` Stefan Priebe - Profihost AG
  2019-09-06 10:25       ` Vlastimil Babka
  2019-09-06 18:52       ` Yang Shi
@ 2019-09-09  8:27       ` Michal Hocko
  2019-09-09  8:54         ` Stefan Priebe - Profihost AG
  2 siblings, 1 reply; 61+ messages in thread
From: Michal Hocko @ 2019-09-09  8:27 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On Fri 06-09-19 12:08:31, Stefan Priebe - Profihost AG wrote:
> These are the biggest differences in meminfo before and after cached
> starts to drop. I didn't expect cached to end up in MemFree.
> 
> Before:
> MemTotal:       16423116 kB
> MemFree:          374572 kB
> MemAvailable:    5633816 kB
> Cached:          5550972 kB
> Inactive:        4696580 kB
> Inactive(file):  3624776 kB
> 
> 
> After:
> MemTotal:       16423116 kB
> MemFree:         3477168 kB
> MemAvailable:    6066916 kB
> Cached:          2724504 kB
> Inactive:        1854740 kB
> Inactive(file):   950680 kB
> 
> Any explanation?

Do you have more snapshots of /proc/vmstat as suggested by Vlastimil and
me earlier in this thread? Seeing the overall progress would tell us
much more than before and after. Or have I missed this data?

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09  8:27       ` Michal Hocko
@ 2019-09-09  8:54         ` Stefan Priebe - Profihost AG
  2019-09-09 11:01           ` Michal Hocko
  2019-09-09 11:49           ` Vlastimil Babka
  0 siblings, 2 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-09  8:54 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

[-- Attachment #1: Type: text/plain, Size: 1116 bytes --]

Hello Michal,

On 09.09.19 at 10:27, Michal Hocko wrote:
> On Fri 06-09-19 12:08:31, Stefan Priebe - Profihost AG wrote:
>> These are the biggest differences in meminfo before and after cached
>> starts to drop. I didn't expect cached to end up in MemFree.
>>
>> Before:
>> MemTotal:       16423116 kB
>> MemFree:          374572 kB
>> MemAvailable:    5633816 kB
>> Cached:          5550972 kB
>> Inactive:        4696580 kB
>> Inactive(file):  3624776 kB
>>
>>
>> After:
>> MemTotal:       16423116 kB
>> MemFree:         3477168 kB
>> MemAvailable:    6066916 kB
>> Cached:          2724504 kB
>> Inactive:        1854740 kB
>> Inactive(file):   950680 kB
>>
>> Any explanation?
> 
> Do you have more snapshots of /proc/vmstat as suggested by Vlastimil and
> me earlier in this thread? Seeing the overall progress would tell us
> much more than before and after. Or have I missed this data?

I needed to wait until today to catch such a situation again, but from
what I can see it is very clear that MemFree gets low and then the
kernel starts to drop the caches.

Attached you'll find two log files.

Greets,
Stefan





[-- Attachment #2: meminfo.gz --]
[-- Type: application/gzip, Size: 114233 bytes --]

[-- Attachment #3: vmstat.gz --]
[-- Type: application/gzip, Size: 224712 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09  8:54         ` Stefan Priebe - Profihost AG
@ 2019-09-09 11:01           ` Michal Hocko
  2019-09-09 12:08             ` Michal Hocko
  2019-09-09 11:49           ` Vlastimil Babka
  1 sibling, 1 reply; 61+ messages in thread
From: Michal Hocko @ 2019-09-09 11:01 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

[Cc Vlastimil - logs are http://lkml.kernel.org/r/1d9ee19a-98c9-cd78-1e5b-21d9d6e36792@profihost.ag]

On Mon 09-09-19 10:54:21, Stefan Priebe - Profihost AG wrote:
> Hello Michal,
> 
> On 09.09.19 at 10:27, Michal Hocko wrote:
> > On Fri 06-09-19 12:08:31, Stefan Priebe - Profihost AG wrote:
> >> These are the biggest differences in meminfo before and after cached
> >> starts to drop. I didn't expect cached to end up in MemFree.
> >>
> >> Before:
> >> MemTotal:       16423116 kB
> >> MemFree:          374572 kB
> >> MemAvailable:    5633816 kB
> >> Cached:          5550972 kB
> >> Inactive:        4696580 kB
> >> Inactive(file):  3624776 kB
> >>
> >>
> >> After:
> >> MemTotal:       16423116 kB
> >> MemFree:         3477168 kB
> >> MemAvailable:    6066916 kB
> >> Cached:          2724504 kB
> >> Inactive:        1854740 kB
> >> Inactive(file):   950680 kB
> >>
> >> Any explanation?
> > 
> > Do you have more snapshots of /proc/vmstat as suggested by Vlastimil and
> > me earlier in this thread? Seeing the overall progress would tell us
> > much more than before and after. Or have I missed this data?
> 
> I needed to wait until today to catch such a situation again, but from
> what I can see it is very clear that MemFree gets low and then the
> kernel starts to drop the caches.
> 
> Attached you'll find two log files.

$ grep pgsteal_kswapd vmstat | uniq -c
   1331 pgsteal_kswapd 37142300
$ grep pgscan_kswapd vmstat | uniq -c
   1331 pgscan_kswapd 37285092

kswapd hasn't scanned nor reclaimed any memory throughout the whole
collected time span. On the other hand, we can see quite some direct
reclaim activity:
$ awk '/pgsteal_direct/ {val=$2+0; ln++; if (last && val-last > 0) {printf("%d %d\n", ln, val-last)} last=val}' vmstat | head
17 1058
18 9773
19 1036
24 11413
49 1055
50 1050
51 17938
52 22665
53 29400
54 5997

So there is a steady source of direct reclaim, which is quite
unexpected considering the background reclaim is inactive. Or maybe it
is blocked and unable to make forward progress.

780513 pages have been reclaimed; at 4kB per page that is ~3G worth of
memory, which matches the drop you are seeing AFAICS.

$ grep allocstall_dma32 vmstat | uniq -c
   1331 allocstall_dma32 0
$ grep allocstall_normal vmstat | uniq -c
   1331 allocstall_normal 39

No direct reclaim was invoked for the DMA32 and Normal zones. But the
Movable zone seems to be the source of the direct reclaim:
awk '/allocstall_movable/ {val=$2+0; ln++; if (last && val-last > 0) {printf("%d %d\n", ln, val-last)} last=val}' vmstat | head
17 1
18 9
19 1
24 10
49 1
50 1
51 17
52 20
53 28
54 5

and that matches the moments when we reclaimed memory. There seems to
be a steady flow of THP allocations, so maybe this is the source of the
direct reclaim?
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09  8:54         ` Stefan Priebe - Profihost AG
  2019-09-09 11:01           ` Michal Hocko
@ 2019-09-09 11:49           ` Vlastimil Babka
  2019-09-09 12:09             ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 61+ messages in thread
From: Vlastimil Babka @ 2019-09-09 11:49 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On 9/9/19 10:54 AM, Stefan Priebe - Profihost AG wrote:
>> Do you have more snapshots of /proc/vmstat as suggested by Vlastimil and
>> me earlier in this thread? Seeing the overall progress would tell us
>> much more than before and after. Or have I missed this data?
> 
> I needed to wait until today to catch such a situation again, but from
> what I can see it is very clear that MemFree gets low and then the
> kernel starts to drop the caches.
> 
> Attached you'll find two log files.

Thanks, what about my other requests/suggestions from earlier?

1. What does /proc/pagetypeinfo look like?
2. Could you also try if the bad trend stops after you execute:
  echo never > /sys/kernel/mm/transparent_hugepage/defrag
and report the result?
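Before switching, the currently active mode can be noted down so it can
be restored afterwards (the active setting is shown in brackets):
 cat /sys/kernel/mm/transparent_hugepage/defrag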

Thanks


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09 11:01           ` Michal Hocko
@ 2019-09-09 12:08             ` Michal Hocko
  2019-09-09 12:10               ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 61+ messages in thread
From: Michal Hocko @ 2019-09-09 12:08 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

On Mon 09-09-19 13:01:36, Michal Hocko wrote:
> and that matches the moments when we reclaimed memory. There seems to
> be a steady flow of THP allocations, so maybe this is the source of the
> direct reclaim?

I was thinking about this some more and THP being a source of reclaim
sounds quite unlikely. At least in a default configuration, because we
shouldn't do anything expensive in the #PF path. But there might be a
different source of high order (!costly) allocations. Could you check
how many allocation requests like that you have on your system?

mount -t debugfs none /debug
echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter
echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable
cat /debug/tracing/trace_pipe > $file
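Note that reading trace_pipe blocks waiting for new events, so it is
best to bound the capture, e.g. (a sketch):
timeout 120 cat /debug/tracing/trace_pipe > $file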
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09 11:49           ` Vlastimil Babka
@ 2019-09-09 12:09             ` Stefan Priebe - Profihost AG
  2019-09-09 12:21               ` Vlastimil Babka
  0 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-09 12:09 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner


On 09.09.19 at 13:49, Vlastimil Babka wrote:
> On 9/9/19 10:54 AM, Stefan Priebe - Profihost AG wrote:
>>> Do you have more snapshots of /proc/vmstat as suggested by Vlastimil and
>>> me earlier in this thread? Seeing the overall progress would tell us
>>> much more than before and after. Or have I missed this data?
>>
>> I needed to wait until today to catch such a situation again, but from
>> what I can see it is very clear that MemFree gets low and then the
>> kernel starts to drop the caches.
>>
>> Attached you'll find two log files.
> 
> Thanks, what about my other requests/suggestions from earlier?

Sorry i missed your email.

> 1. What does /proc/pagetypeinfo look like?

# cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      1      0      0      1      2      1      1      0      1      0      0
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      1      3
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type    Unmovable   1141    970    903    628    302    106     27      4      0      0      0
Node    0, zone    DMA32, type      Movable    274    269    368    396    342    265    214    178    113     12     13
Node    0, zone    DMA32, type  Reclaimable     81     57    134    114     60     50     25      4      2      0      0
Node    0, zone    DMA32, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type    Unmovable     39     36  13257   3474   1333    317     42      0      0      0      0
Node    0, zone   Normal, type      Movable   1087   9678   1104   4250   2391   1946   1768    691    141      0      0
Node    0, zone   Normal, type  Reclaimable      1   1782   1153   2455   1927    986    330      7      2      0      0
Node    0, zone   Normal, type   HighAtomic      1      1      2      2      2      0      1      1      1      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic      Isolate
Node 0, zone      DMA            1            7            0            0            0
Node 0, zone    DMA32           52         1461           15            0            0
Node 0, zone   Normal          824         5448          383            1            0

> 2. Could you also try if the bad trend stops after you execute:
>  echo never > /sys/kernel/mm/transparent_hugepage/defrag
> and report the result?

It's pretty difficult to catch those moments. Is it OK to set the value
now and monitor whether it happens again?

Just to let you know:
I now also have some more servers where MemFree shows 10-20GB but the
cache drops suddenly and memory PSI rises.

Greets,
Stefan


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09 12:08             ` Michal Hocko
@ 2019-09-09 12:10               ` Stefan Priebe - Profihost AG
  2019-09-09 12:28                 ` Michal Hocko
  0 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-09 12:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka


On 09.09.19 at 14:08, Michal Hocko wrote:
> On Mon 09-09-19 13:01:36, Michal Hocko wrote:
>> and that matches the moments when we reclaimed memory. There seems to
>> be a steady flow of THP allocations, so maybe this is the source of
>> the direct reclaim?
> 
> I was thinking about this some more and THP being a source of reclaim
> sounds quite unlikely. At least in a default configuration, because we
> shouldn't do anything expensive in the #PF path. But there might be a
> different source of high order (!costly) allocations. Could you check
> how many allocation requests like that you have on your system?
> 
> mount -t debugfs none /debug
> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter
> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable
> cat /debug/tracing/trace_pipe > $file

Just now or when PSI rises?

Greets,
Stefan


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09 12:09             ` Stefan Priebe - Profihost AG
@ 2019-09-09 12:21               ` Vlastimil Babka
  2019-09-09 12:31                 ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 61+ messages in thread
From: Vlastimil Babka @ 2019-09-09 12:21 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On 9/9/19 2:09 PM, Stefan Priebe - Profihost AG wrote:
> 
> On 09.09.19 at 13:49, Vlastimil Babka wrote:
>> On 9/9/19 10:54 AM, Stefan Priebe - Profihost AG wrote:
>>>> Do you have more snapshots of /proc/vmstat as suggested by Vlastimil and
>>>> me earlier in this thread? Seeing the overall progress would tell us
>>>> much more than before and after. Or have I missed this data?
>>>
>>> I needed to wait until today to catch such a situation again, but from
>>> what I can see it is very clear that MemFree gets low and then the
>>> kernel starts to drop the caches.
>>>
>>> Attached you'll find two log files.
>>
>> Thanks, what about my other requests/suggestions from earlier?
> 
> Sorry i missed your email.
> 
>> 1. What does /proc/pagetypeinfo look like?
> 
> # cat /proc/pagetypeinfo
> Page block order: 9
> Pages per block:  512

Looks like it might be fragmented, but was that snapshot taken in the 
situation where there's free memory and the system still drops cache?

>> 2. Could you also try if the bad trend stops after you execute:
>>   echo never > /sys/kernel/mm/transparent_hugepage/defrag
>> and report the result?
> 
>> It's pretty difficult to catch those moments. Is it OK to set the value
>> now and monitor whether it happens again?

Well if it doesn't happen again after changing that setting, it would 
definitely point at THP interactions.

> Just to let you know:
> I now also have some more servers where MemFree shows 10-20GB but the
> cache drops suddenly and memory PSI rises.

You mean those are in that state right now? So how does 
/proc/pagetypeinfo look there, and would changing the defrag setting help?

> Greets,
> Stefan
> 



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09 12:10               ` Stefan Priebe - Profihost AG
@ 2019-09-09 12:28                 ` Michal Hocko
  2019-09-09 12:37                   ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 61+ messages in thread
From: Michal Hocko @ 2019-09-09 12:28 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote:
> 
> On 09.09.19 at 14:08, Michal Hocko wrote:
> > On Mon 09-09-19 13:01:36, Michal Hocko wrote:
> >> and that matches the moments when we reclaimed memory. There seems to
> >> be a steady flow of THP allocations, so maybe this is the source of
> >> the direct reclaim?
> > 
> > I was thinking about this some more and THP being a source of reclaim
> > sounds quite unlikely. At least in a default configuration, because we
> > shouldn't do anything expensive in the #PF path. But there might be a
> > different source of high order (!costly) allocations. Could you check
> > how many allocation requests like that you have on your system?
> > 
> > mount -t debugfs none /debug
> > echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter
> > echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable
> > cat /debug/tracing/trace_pipe > $file

echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
 
might tell us something as well but it might turn out that it just still
doesn't give us the full picture and we might need
echo stacktrace > /debug/tracing/trace_options

It will generate much more output though.
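The stack traces can be switched off again afterwards with:
echo nostacktrace > /debug/tracing/trace_options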

> Just now or when PSI rises?

When the excessive reclaim is happening ideally.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09 12:21               ` Vlastimil Babka
@ 2019-09-09 12:31                 ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-09 12:31 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On 09.09.19 at 14:21, Vlastimil Babka wrote:
> On 9/9/19 2:09 PM, Stefan Priebe - Profihost AG wrote:
>>
>> On 09.09.19 at 13:49, Vlastimil Babka wrote:
>>> On 9/9/19 10:54 AM, Stefan Priebe - Profihost AG wrote:
>>>>> Do you have more snapshots of /proc/vmstat as suggested by
>>>>> Vlastimil and
>>>>> me earlier in this thread? Seeing the overall progress would tell us
>>>>> much more than before and after. Or have I missed this data?
>>>>
>>>> I needed to wait until today to catch such a situation again, but from
>>>> what I can see it is very clear that MemFree gets low and then the
>>>> kernel starts to drop the caches.
>>>>
>>>> Attached you'll find two log files.
>>>
>>> Thanks, what about my other requests/suggestions from earlier?
>>
>> Sorry i missed your email.
>>
>>> 1. What does /proc/pagetypeinfo look like?
>>
>> # cat /proc/pagetypeinfo
>> Page block order: 9
>> Pages per block:  512
> 
> Looks like it might be fragmented, but was that snapshot taken in the
> situation where there's free memory and the system still drops cache?

No, this one is from "now", where no pressure is recorded and where
MemFree is at 3G and the cache is also at 3G.

>>> 2. Could you also try if the bad trend stops after you execute:
>>>   echo never > /sys/kernel/mm/transparent_hugepage/defrag
>>> and report the result?
>>
>> It's pretty difficult to catch those moments. Is it OK to set the value
>> now and monitor whether it happens again?
> 
> Well if it doesn't happen again after changing that setting, it would
> definitely point at THP interactions.

OK, I set it to never.

>> Just to let you know:
>> I now also have some more servers where MemFree shows 10-20GB but the
>> cache drops suddenly and memory PSI rises.
> 
> You mean those are in that state right now? So how does
> /proc/pagetypeinfo look there, and would changing the defrag setting help?

Yes, I have a system which constantly triggers PSI (just 1-3%) but
MemFree is at 29GB.

1402:
# cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      0      0      0      1      2      1      1      0      1      0      0
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      1      3
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type    Unmovable      0      1      0      1      0      1      0      1      1      0      3
Node    0, zone    DMA32, type      Movable     42     29     60     52     56     52     47     46     24      3     48
Node    0, zone    DMA32, type  Reclaimable      0      0      3      1      0      1      1      1      1      0      0
Node    0, zone    DMA32, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type    Unmovable    189   7690  24737  14314   7620   5362   3458   1607    165      0      0
Node    0, zone   Normal, type      Movable  29269  31003  70251  73957  54776  37134  21084  10547   2307     35      4
Node    0, zone   Normal, type  Reclaimable   1431   3837   1821   2137   2475    978    386    112      2      0      0
Node    0, zone   Normal, type   HighAtomic      0      0      1      3      3      3      1      0      1      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic      Isolate
Node 0, zone      DMA            1            7            0            0            0
Node 0, zone    DMA32           10         1005            1            0            0
Node 0, zone   Normal         3407        27184         1152            1            0

Stefan


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09 12:28                 ` Michal Hocko
@ 2019-09-09 12:37                   ` Stefan Priebe - Profihost AG
  2019-09-09 12:49                     ` Michal Hocko
  0 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-09 12:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

[-- Attachment #1: Type: text/plain, Size: 1941 bytes --]


On 09.09.19 at 14:28, Michal Hocko wrote:
> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote:
>>
>> On 09.09.19 at 14:08, Michal Hocko wrote:
>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote:
>>>> and that matches the moments when we reclaimed memory. There seems to
>>>> be a steady flow of THP allocations, so maybe this is the source of
>>>> the direct reclaim?
>>>
>>> I was thinking about this some more and THP being a source of reclaim
>>> sounds quite unlikely. At least in a default configuration, because we
>>> shouldn't do anything expensive in the #PF path. But there might be a
>>> different source of high order (!costly) allocations. Could you check
>>> how many allocation requests like that you have on your system?
>>>
>>> mount -t debugfs none /debug
>>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter
>>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable
>>> cat /debug/tracing/trace_pipe > $file
> 
> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
>  
> might tell us something as well but it might turn out that it just still
> doesn't give us the full picture and we might need
> echo stacktrace > /debug/tracing/trace_options
> 
> It will generate much more output though.
> 
>> Just now or when PSI rises?
> 
> When the excessive reclaim is happening ideally.

This one is from a server with 28G memfree but memory pressure is still
jumping between 0 and 10%.

I did:
echo "order > 0" >
/sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter

echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable

echo 1 >
/sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable

echo 1 >
/sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable

timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace

File attached.

Stefan

[-- Attachment #2: trace.gz --]
[-- Type: application/gzip, Size: 311017 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09 12:37                   ` Stefan Priebe - Profihost AG
@ 2019-09-09 12:49                     ` Michal Hocko
  2019-09-09 12:56                       ` Stefan Priebe - Profihost AG
  2019-09-10  5:41                       ` Stefan Priebe - Profihost AG
  0 siblings, 2 replies; 61+ messages in thread
From: Michal Hocko @ 2019-09-09 12:49 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote:
> 
> On 09.09.19 at 14:28, Michal Hocko wrote:
> > On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote:
> >>
> >> On 09.09.19 at 14:08, Michal Hocko wrote:
> >>> On Mon 09-09-19 13:01:36, Michal Hocko wrote:
> >>>> and that matches the moments when we reclaimed memory. There seems
> >>>> to be a steady flow of THP allocations, so maybe this is the source
> >>>> of the direct reclaim?
> >>>
> >>> I was thinking about this some more and THP being a source of reclaim
> >>> sounds quite unlikely. At least in a default configuration, because we
> >>> shouldn't do anything expensive in the #PF path. But there might be a
> >>> different source of high order (!costly) allocations. Could you check
> >>> how many allocation requests like that you have on your system?
> >>>
> >>> mount -t debugfs none /debug
> >>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter
> >>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable
> >>> cat /debug/tracing/trace_pipe > $file
> > 
> > echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
> > echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
> >  
> > might tell us something as well but it might turn out that it just still
> > doesn't give us the full picture and we might need
> > echo stacktrace > /debug/tracing/trace_options
> > 
> > It will generate much more output though.
> > 
> >> Just now or when PSI rises?
> > 
> > When the excessive reclaim is happening ideally.
> 
> This one is from a server with 28G memfree but memory pressure is still
> jumping between 0 and 10%.
> 
> I did:
> echo "order > 0" >
> /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter
> 
> echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable
> 
> echo 1 >
> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
> 
> echo 1 >
> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
> 
> timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace
> 
> File attached.

There is no reclaim captured in this trace dump.
$ zcat trace1.gz | sed 's@.*\(order=[0-9]\).*\(gfp_flags=.*\)@\1 \2@' | sort | uniq -c
    777 order=1 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
    663 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
    153 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
    911 order=1 gfp_flags=GFP_KERNEL_ACCOUNT|__GFP_ZERO
   4872 order=1 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
     62 order=1 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
     14 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
     11 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_RECLAIMABLE
   1263 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
     45 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE
      1 order=2 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO
   7853 order=2 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
     73 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
    729 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE
    528 order=3 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
   1203 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
   5295 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
      1 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
    132 order=3 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
     13 order=5 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO
      1 order=6 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO
   1232 order=9 gfp_flags=GFP_TRANSHUGE
    108 order=9 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE
    362 order=9 gfp_flags=GFP_TRANSHUGE_LIGHT|__GFP_THISNODE

Nothing really stands out, because except for the THP ones none of the
others are even going to use the movable zone. You've said that your
machine doesn't have more than one NUMA node, right?
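A quick way to double-check the node count (a sketch):
cat /sys/devices/system/node/online   # a single-node machine prints just "0"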
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09 12:49                     ` Michal Hocko
@ 2019-09-09 12:56                       ` Stefan Priebe - Profihost AG
       [not found]                         ` <52235eda-ffe2-721c-7ad7-575048e2d29d@profihost.ag>
  2019-09-10  5:41                       ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-09 12:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

On 09.09.19 at 14:49, Michal Hocko wrote:
> On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote:
>>
>> On 09.09.19 at 14:28, Michal Hocko wrote:
>>> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> On 09.09.19 at 14:08, Michal Hocko wrote:
>>>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote:
>>>>>> and that matches the moments when we reclaimed memory. There seems
>>>>>> to be a steady flow of THP allocations, so maybe this is the source
>>>>>> of the direct reclaim?
>>>>>
>>>>> I was thinking about this some more and THP being a source of reclaim
>>>>> sounds quite unlikely. At least in a default configuration, because we
>>>>> shouldn't do anything expensive in the #PF path. But there might be a
>>>>> different source of high order (!costly) allocations. Could you check
>>>>> how many allocation requests like that you have on your system?
>>>>>
>>>>> mount -t debugfs none /debug
>>>>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter
>>>>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable
>>>>> cat /debug/tracing/trace_pipe > $file
>>>
>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
>>>  
>>> might tell us something as well but it might turn out that it just still
>>> doesn't give us the full picture and we might need
>>> echo stacktrace > /debug/tracing/trace_options
>>>
>>> It will generate much more output though.
>>>
>>>> Just now or when PSI rises?
>>>
>>> When the excessive reclaim is happening ideally.
>>
>> This one is from a server with 28G memfree but memory pressure is still
>> jumping between 0 and 10%.
>>
>> I did:
>> echo "order > 0" >
>> /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter
>>
>> echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable
>>
>> echo 1 >
>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
>>
>> echo 1 >
>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
>>
>> timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace
>>
>> File attached.
> 
> There is no reclaim captured in this trace dump.
> $ zcat trace1.gz | sed 's@.*\(order=[0-9]\).*\(gfp_flags=.*\)@\1 \2@' | sort | uniq -c
> [output snipped - the full summary appears in the message above]
> 
> Nothing really stands out, because except for the THP ones none of the
> others are even going to use the movable zone.
It might be that this is not an ideal example; it was just the fastest I
could find. Maybe we really need one with much higher pressure.

I would try to find one with much higher pressure.

>  You've said that your machine
> doesn't have more than one NUMA node, right?

Yes, the first example is/was a VM. This one is a single Xeon.

Stefan


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09 12:49                     ` Michal Hocko
  2019-09-09 12:56                       ` Stefan Priebe - Profihost AG
@ 2019-09-10  5:41                       ` Stefan Priebe - Profihost AG
  1 sibling, 0 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-10  5:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka


On 09.09.19 at 14:49, Michal Hocko wrote:
> On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote:
>>
>> On 09.09.19 at 14:28, Michal Hocko wrote:
>>> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> On 09.09.19 at 14:08, Michal Hocko wrote:
>>>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote:
>>>>>> and that matches the moments when we reclaimed memory. There seems
>>>>>> to be a steady flow of THP allocations, so maybe this is the source
>>>>>> of the direct reclaim?
>>>>>
>>>>> I was thinking about this some more and THP being a source of reclaim
>>>>> sounds quite unlikely. At least in a default configuration, because we
>>>>> shouldn't do anything expensive in the #PF path. But there might be a
>>>>> different source of high order (!costly) allocations. Could you check
>>>>> how many allocation requests like that you have on your system?

I have another system which might be interesting. I'm not sure which
data to gather.

It never builds up any read cache because memory is constantly under
pressure. But MemFree is 28G.

What would be interesting to collect here? The pressure is not very
high, just 1-3%, but it seems to prevent the system from building up
file cache. Mostly at night, when there is no pressure, it starts
building up a read cache until pressure happens again. But all this
happens with MemFree at nearly 30GB of memory.
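One way to see whether reclaim runs continuously would be to sample the
reclaim and refault counters once per second (a sketch; the counter
names are the same as in the vmstat dumps above):

while sleep 1; do
    date +%s
    awk '$1 == "pgscan_direct" || $1 == "pgsteal_direct" || $1 == "workingset_refault"' /proc/vmstat
done > /var/tmp/reclaim.log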

Greets,
Stefan


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
       [not found]                         ` <52235eda-ffe2-721c-7ad7-575048e2d29d@profihost.ag>
@ 2019-09-10  5:58                           ` Stefan Priebe - Profihost AG
  2019-09-10  8:29                           ` Michal Hocko
  1 sibling, 0 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-10  5:58 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

Those are also constantly running on this system (30G free mem):
  101 root      20   0       0      0      0 S  12,9  0,0  40:38.45 [kswapd0]
   89 root      39  19       0      0      0 S  11,6  0,0  38:58.84 [khugepaged]

# cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      0      0      0      1      2      1      1      0      1      0      0
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      1      3
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type    Unmovable      0      1      0      1      0      1      0      1      1      0      3
Node    0, zone    DMA32, type      Movable     66     53     71     57     59     53     49     47     24      2     42
Node    0, zone    DMA32, type  Reclaimable      0      0      3      1      0      1      1      1      1      0      0
Node    0, zone    DMA32, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type    Unmovable      1   5442  25546  12849   8379   5771   3297   1523    268      0      0
Node    0, zone   Normal, type      Movable 100322 153229 102511  75583  52007  34284  19259   9465   2014     15      5
Node    0, zone   Normal, type  Reclaimable   4002   4299   2395   3721   2568   1056    489    177     63      0      0
Node    0, zone   Normal, type   HighAtomic      0      0      1      3      3      3      1      0      1      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic      Isolate
Node 0, zone      DMA            1            7            0            0            0
Node 0, zone    DMA32           10         1005            1            0            0
Node 0, zone   Normal         3411        27125         1207            1            0
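
If a fragmentation view helps I can also dump the external fragmentation
numbers - a sketch, assuming debugfs is mounted at /sys/kernel/debug and
the kernel exposes the extfrag files:

cat /sys/kernel/debug/extfrag/extfrag_index
cat /sys/kernel/debug/extfrag/unusable_index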

Greets,
Stefan

Am 10.09.19 um 07:56 schrieb Stefan Priebe - Profihost AG:
> 
> Am 09.09.19 um 14:56 schrieb Stefan Priebe - Profihost AG:
>> Am 09.09.19 um 14:49 schrieb Michal Hocko:
>>> On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> Am 09.09.19 um 14:28 schrieb Michal Hocko:
>>>>> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote:
>>>>>>
>>>>>> Am 09.09.19 um 14:08 schrieb Michal Hocko:
>>>>>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote:
>>>>>>>> and that matches moments when we reclaimed memory. There seems to be a
>>>>>>>> steady flow of THP allocations so maybe this is a source of the direct
>>>>>>>> reclaim?
>>>>>>>
>>>>>>> I was thinking about this some more and THP being a source of reclaim
>>>>>>> sounds quite unlikely. At least in a default configuration because we
>>>>>>> shouldn't do anything expensive in the #PF path. But there might be a
>>>>>>> different source of high order (!costly) allocations. Could you check
>>>>>>> how many allocation requests like that you have on your system?
>>>>>>>
>>>>>>> mount -t debugfs none /debug
>>>>>>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter
>>>>>>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable
>>>>>>> cat /debug/tracing/trace_pipe > $file
>>>>>
>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
>>>>>  
>>>>> might tell us something as well but it might turn out that it just still
>>>>> doesn't give us the full picture and we might need
>>>>> echo stacktrace > /debug/tracing/trace_options
>>>>>
>>>>> It will generate much more output though.
>>>>>
>>>>>> Just now or when PSI rises?
>>>>>
>>>>> When the excessive reclaim is happening ideally.
>>>>
>>>> This one is from a server with 28G memfree but memory pressure is still
>>>> jumping between 0 and 10%.
>>>>
>>>> I did:
>>>> echo "order > 0" >
>>>> /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter
>>>>
>>>> echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable
>>>>
>>>> echo 1 >
>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
>>>>
>>>> echo 1 >
>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
>>>>
>>>> timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace
>>>>
>>>> File attached.
>>>
>>> There is no reclaim captured in this trace dump.
>>> $ zcat trace1.gz | sed 's@.*\(order=[0-9]\).*\(gfp_flags=.*\)@\1 \2@' | sort | uniq -c
>>>     777 order=1 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>     663 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>     153 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>     911 order=1 gfp_flags=GFP_KERNEL_ACCOUNT|__GFP_ZERO
>>>    4872 order=1 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
>>>      62 order=1 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>      14 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
>>>      11 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_RECLAIMABLE
>>>    1263 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>      45 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE
>>>       1 order=2 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO
>>>    7853 order=2 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
>>>      73 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>     729 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE
>>>     528 order=3 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>    1203 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
>>>    5295 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
>>>       1 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>     132 order=3 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>      13 order=5 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO
>>>       1 order=6 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO
>>>    1232 order=9 gfp_flags=GFP_TRANSHUGE
>>>     108 order=9 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE
>>>     362 order=9 gfp_flags=GFP_TRANSHUGE_LIGHT|__GFP_THISNODE
>>>
>>> Nothing really stands out because, except for the THP ones, none of the
>>> others are even going to be using the movable zone.
>> It might be that this is not an ideal example; it was just the fastest I
>> could find. Maybe we really need one with much higher pressure.
> 
> Here is another trace log where a system has 30GB of free memory but is
> under constant pressure and does not build up any file cache because of
> memory pressure.
> 
> 
> Greets,
> Stefan
> 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
       [not found]                         ` <52235eda-ffe2-721c-7ad7-575048e2d29d@profihost.ag>
  2019-09-10  5:58                           ` Stefan Priebe - Profihost AG
@ 2019-09-10  8:29                           ` Michal Hocko
  2019-09-10  8:38                             ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 61+ messages in thread
From: Michal Hocko @ 2019-09-10  8:29 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

On Tue 10-09-19 07:56:36, Stefan Priebe - Profihost AG wrote:
> 
> Am 09.09.19 um 14:56 schrieb Stefan Priebe - Profihost AG:
> > Am 09.09.19 um 14:49 schrieb Michal Hocko:
> >> On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote:
> >>>
> >>> Am 09.09.19 um 14:28 schrieb Michal Hocko:
> >>>> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote:
> >>>>>
> >>>>> Am 09.09.19 um 14:08 schrieb Michal Hocko:
> >>>>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote:
> >>>>>>> and that matches moments when we reclaimed memory. There seems to be a
> >>>>>>> steady flow of THP allocations so maybe this is a source of the direct
> >>>>>>> reclaim?
> >>>>>>
> >>>>>> I was thinking about this some more and THP being a source of reclaim
> >>>>>> sounds quite unlikely. At least in a default configuration because we
> >>>>>> shouldn't do anything expensive in the #PF path. But there might be a
> >>>>>> different source of high order (!costly) allocations. Could you check
> >>>>>> how many allocation requests like that you have on your system?
> >>>>>>
> >>>>>> mount -t debugfs none /debug
> >>>>>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter
> >>>>>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable
> >>>>>> cat /debug/tracing/trace_pipe > $file
> >>>>
> >>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
> >>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
> >>>>  
> >>>> might tell us something as well but it might turn out that it just still
> >>>> doesn't give us the full picture and we might need
> >>>> echo stacktrace > /debug/tracing/trace_options
> >>>>
> >>>> It will generate much more output though.
> >>>>
> >>>>> Just now or when PSI rises?
> >>>>
> >>>> When the excessive reclaim is happening ideally.
> >>>
> >>> This one is from a server with 28G memfree but memory pressure is still
> >>> jumping between 0 and 10%.
> >>>
> >>> I did:
> >>> echo "order > 0" >
> >>> /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter
> >>>
> >>> echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable
> >>>
> >>> echo 1 >
> >>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
> >>>
> >>> echo 1 >
> >>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
> >>>
> >>> timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace
> >>>
> >>> File attached.
> >>
> >> There is no reclaim captured in this trace dump.
> >> $ zcat trace1.gz | sed 's@.*\(order=[0-9]\).*\(gfp_flags=.*\)@\1 \2@' | sort | uniq -c
> >>     777 order=1 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>     663 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>     153 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>     911 order=1 gfp_flags=GFP_KERNEL_ACCOUNT|__GFP_ZERO
> >>    4872 order=1 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
> >>      62 order=1 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>      14 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
> >>      11 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_RECLAIMABLE
> >>    1263 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>      45 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE
> >>       1 order=2 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO
> >>    7853 order=2 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
> >>      73 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>     729 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE
> >>     528 order=3 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>    1203 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
> >>    5295 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
> >>       1 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>     132 order=3 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>      13 order=5 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO
> >>       1 order=6 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO
> >>    1232 order=9 gfp_flags=GFP_TRANSHUGE
> >>     108 order=9 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE
> >>     362 order=9 gfp_flags=GFP_TRANSHUGE_LIGHT|__GFP_THISNODE
> >>
> >> Nothing really stands out because, except for the THP ones, none of the
> >> others are even going to be using the movable zone.
> > It might be that this is not an ideal example; it was just the fastest I
> > could find. Maybe we really need one with much higher pressure.
> 
> Here is another trace log where a system has 30GB of free memory but is
> under constant pressure and does not build up any file cache because of
> memory pressure.

So the reclaim is clearly induced by THP allocations
$ zgrep vmscan trace2.gz | grep gfp_flags | sed 's@.*\(gfp_flags=.*\) .*@\1@' | sort | uniq -c
   1580 gfp_flags=GFP_TRANSHUGE
     15 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE

$ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@nr_reclaimed=@@' |  awk '{nr+=$6+0}END{print nr}'
1541726

6GB of memory reclaimed in 1776s. That is a lot! But the THP allocation
rate is really high as well
$ zgrep "page_alloc.*GFP_TRANSHUGE" trace2.gz | wc -l
15340

This is 30GB worth of THPs (some of them might get released of course).
Also only 10% of the requests end up reclaiming.
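
Spelled out (assuming 4kB base pages and 2MB THPs) that is:
$ awk 'BEGIN { print 1541726 * 4096 / 2^30 }'     # ~5.9 GB reclaimed
$ awk 'BEGIN { print 15340 * 2 / 1024 }'          # ~30 GB of THP requests
$ awk 'BEGIN { print (1580 + 15) / 15340 * 100 }' # ~10.4% hit direct reclaim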

One additional interesting point
$ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@.*nr_reclaimed=\([[0-9]*\)@\1@' | calc_min_max.awk
min: 1.00 max: 2792.00 avg: 965.99 std: 331.12 nr: 1596

Even though the std is high there are quite a few outliers where a lot of
memory is reclaimed.
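
For reference, calc_min_max.awk is just a trivial min/max/avg/std
helper, roughly this sketch (not necessarily the exact script):

awk '{
    v = $1 + 0
    if (nr == 0 || v < min) min = v
    if (v > max) max = v
    sum += v; sumsq += v * v; nr++
}
END {
    avg = sum / nr
    var = sumsq / nr - avg * avg
    if (var < 0) var = 0
    printf "min: %.2f max: %.2f avg: %.2f std: %.2f nr: %d\n", min, max, avg, sqrt(var), nr
}'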

Which kernel version is this? And again, what is the THP configuration?
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-10  8:29                           ` Michal Hocko
@ 2019-09-10  8:38                             ` Stefan Priebe - Profihost AG
  2019-09-10  9:02                               ` Michal Hocko
  0 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-10  8:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

Am 10.09.19 um 10:29 schrieb Michal Hocko:
> On Tue 10-09-19 07:56:36, Stefan Priebe - Profihost AG wrote:
>>
>> Am 09.09.19 um 14:56 schrieb Stefan Priebe - Profihost AG:
>>> Am 09.09.19 um 14:49 schrieb Michal Hocko:
>>>> On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote:
>>>>>
>>>>> Am 09.09.19 um 14:28 schrieb Michal Hocko:
>>>>>> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote:
>>>>>>>
>>>>>>> Am 09.09.19 um 14:08 schrieb Michal Hocko:
>>>>>>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote:
>>>>>>>>> and that matches moments when we reclaimed memory. There seems to be a
>>>>>>>>> steady flow of THP allocations so maybe this is a source of the direct
>>>>>>>>> reclaim?
>>>>>>>>
>>>>>>>> I was thinking about this some more and THP being a source of reclaim
>>>>>>>> sounds quite unlikely. At least in a default configuration because we
>>>>>>>> shouldn't do anything expensive in the #PF path. But there might be a
>>>>>>>> different source of high order (!costly) allocations. Could you check
>>>>>>>> how many allocation requests like that you have on your system?
>>>>>>>>
>>>>>>>> mount -t debugfs none /debug
>>>>>>>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter
>>>>>>>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable
>>>>>>>> cat /debug/tracing/trace_pipe > $file
>>>>>>
>>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
>>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
>>>>>>  
>>>>>> might tell us something as well but it might turn out that it just still
>>>>>> doesn't give us the full picture and we might need
>>>>>> echo stacktrace > /debug/tracing/trace_options
>>>>>>
>>>>>> It will generate much more output though.
>>>>>>
>>>>>>> Just now or when PSI rises?
>>>>>>
>>>>>> When the excessive reclaim is happening ideally.
>>>>>
>>>>> This one is from a server with 28G memfree but memory pressure is still
>>>>> jumping between 0 and 10%.
>>>>>
>>>>> I did:
>>>>> echo "order > 0" >
>>>>> /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter
>>>>>
>>>>> echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable
>>>>>
>>>>> echo 1 >
>>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
>>>>>
>>>>> echo 1 >
>>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
>>>>>
>>>>> timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace
>>>>>
>>>>> File attached.
>>>>
>>>> There is no reclaim captured in this trace dump.
>>>> $ zcat trace1.gz | sed 's@.*\(order=[0-9]\).*\(gfp_flags=.*\)@\1 \2@' | sort | uniq -c
>>>>     777 order=1 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>     663 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>     153 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>     911 order=1 gfp_flags=GFP_KERNEL_ACCOUNT|__GFP_ZERO
>>>>    4872 order=1 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
>>>>      62 order=1 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>      14 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
>>>>      11 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_RECLAIMABLE
>>>>    1263 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>      45 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE
>>>>       1 order=2 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO
>>>>    7853 order=2 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
>>>>      73 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>     729 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE
>>>>     528 order=3 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>    1203 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
>>>>    5295 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
>>>>       1 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>     132 order=3 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>      13 order=5 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO
>>>>       1 order=6 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO
>>>>    1232 order=9 gfp_flags=GFP_TRANSHUGE
>>>>     108 order=9 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE
>>>>     362 order=9 gfp_flags=GFP_TRANSHUGE_LIGHT|__GFP_THISNODE
>>>>
>>>> Nothing really stands out because, except for the THP ones, none of the
>>>> others are even going to be using the movable zone.
>>> It might be that this is not an ideal example; it was just the fastest I
>>> could find. Maybe we really need one with much higher pressure.
>>
>> Here is another trace log where a system has 30GB of free memory but is
>> under constant pressure and does not build up any file cache because of
>> memory pressure.
> 
> So the reclaim is clearly induced by THP allocations
> $ zgrep vmscan trace2.gz | grep gfp_flags | sed 's@.*\(gfp_flags=.*\) .*@\1@' | sort | uniq -c
>    1580 gfp_flags=GFP_TRANSHUGE
>      15 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE
> 
> $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@nr_reclaimed=@@' |  awk '{nr+=$6+0}END{print nr}'
> 1541726
> 
> 6GB of memory reclaimed in 1776s. That is a lot! But the THP allocation
> rate is really high as well
> $ zgrep "page_alloc.*GFP_TRANSHUGE" trace2.gz | wc -l
> 15340
> 
> This is 30GB worth of THPs (some of them might get released of course).
> Also only 10% of the requests end up reclaiming.
> 
> One additional interesting point
> $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@.*nr_reclaimed=\([[0-9]*\)@\1@' | calc_min_max.awk
> min: 1.00 max: 2792.00 avg: 965.99 std: 331.12 nr: 1596
> 
> Even though the std is high there are quite a few outliers where a lot of
> memory is reclaimed.
> 
> Which kernel version is this? And again, what is the THP configuration?

This is 4.19.66. Regarding THP, you mean this:
/sys/kernel/mm/transparent_hugepage/defrag:always defer [defer+madvise] madvise never

/sys/kernel/mm/transparent_hugepage/enabled:[always] madvise never

/sys/kernel/mm/transparent_hugepage/hpage_pmd_size:2097152

/sys/kernel/mm/transparent_hugepage/shmem_enabled:always within_size advise [never] deny force

/sys/kernel/mm/transparent_hugepage/use_zero_page:1

/sys/kernel/mm/transparent_hugepage/enabled was madvise until yesterday,
when I tried to switch to defer+madvise - which didn't help.

Greets,
Stefan



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-10  8:38                             ` Stefan Priebe - Profihost AG
@ 2019-09-10  9:02                               ` Michal Hocko
  2019-09-10  9:37                                 ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 61+ messages in thread
From: Michal Hocko @ 2019-09-10  9:02 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

On Tue 10-09-19 10:38:25, Stefan Priebe - Profihost AG wrote:
> Am 10.09.19 um 10:29 schrieb Michal Hocko:
> > On Tue 10-09-19 07:56:36, Stefan Priebe - Profihost AG wrote:
> >>
> >> Am 09.09.19 um 14:56 schrieb Stefan Priebe - Profihost AG:
> >>> Am 09.09.19 um 14:49 schrieb Michal Hocko:
> >>>> On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote:
> >>>>>
> >>>>> Am 09.09.19 um 14:28 schrieb Michal Hocko:
> >>>>>> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote:
> >>>>>>>
> >>>>>>> Am 09.09.19 um 14:08 schrieb Michal Hocko:
> >>>>>>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote:
> >>>>>>>>> and that matches moments when we reclaimed memory. There seems to be a
> >>>>>>>>> steady flow of THP allocations so maybe this is a source of the direct
> >>>>>>>>> reclaim?
> >>>>>>>>
> >>>>>>>> I was thinking about this some more and THP being a source of reclaim
> >>>>>>>> sounds quite unlikely. At least in a default configuration because we
> >>>>>>>> shouldn't do anything expensive in the #PF path. But there might be a
> >>>>>>>> different source of high order (!costly) allocations. Could you check
> >>>>>>>> how many allocation requests like that you have on your system?
> >>>>>>>>
> >>>>>>>> mount -t debugfs none /debug
> >>>>>>>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter
> >>>>>>>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable
> >>>>>>>> cat /debug/tracing/trace_pipe > $file
> >>>>>>
> >>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
> >>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
> >>>>>>  
> >>>>>> might tell us something as well but it might turn out that it just still
> >>>>>> doesn't give us the full picture and we might need
> >>>>>> echo stacktrace > /debug/tracing/trace_options
> >>>>>>
> >>>>>> It will generate much more output though.
> >>>>>>
> >>>>>>> Just now or when PSI rises?
> >>>>>>
> >>>>>> When the excessive reclaim is happening ideally.
> >>>>>
> >>>>> This one is from a server with 28G memfree but memory pressure is still
> >>>>> jumping between 0 and 10%.
> >>>>>
> >>>>> I did:
> >>>>> echo "order > 0" >
> >>>>> /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter
> >>>>>
> >>>>> echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable
> >>>>>
> >>>>> echo 1 >
> >>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
> >>>>>
> >>>>> echo 1 >
> >>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
> >>>>>
> >>>>> timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace
> >>>>>
> >>>>> File attached.
> >>>>
> >>>> There is no reclaim captured in this trace dump.
> >>>> $ zcat trace1.gz | sed 's@.*\(order=[0-9]\).*\(gfp_flags=.*\)@\1 \2@' | sort | uniq -c
> >>>>     777 order=1 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>     663 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>     153 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>     911 order=1 gfp_flags=GFP_KERNEL_ACCOUNT|__GFP_ZERO
> >>>>    4872 order=1 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
> >>>>      62 order=1 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>      14 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
> >>>>      11 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_RECLAIMABLE
> >>>>    1263 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>      45 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE
> >>>>       1 order=2 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO
> >>>>    7853 order=2 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
> >>>>      73 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>     729 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE
> >>>>     528 order=3 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>    1203 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
> >>>>    5295 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
> >>>>       1 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>     132 order=3 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
> >>>>      13 order=5 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO
> >>>>       1 order=6 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO
> >>>>    1232 order=9 gfp_flags=GFP_TRANSHUGE
> >>>>     108 order=9 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE
> >>>>     362 order=9 gfp_flags=GFP_TRANSHUGE_LIGHT|__GFP_THISNODE
> >>>>
> >>>> Nothing really stands out because, except for the THP ones, none of the
> >>>> others are even going to be using the movable zone.
> >>> It might be that this is not an ideal example; it was just the fastest I
> >>> could find. Maybe we really need one with much higher pressure.
> >>
> >> Here is another trace log where a system has 30GB of free memory but is
> >> under constant pressure and does not build up any file cache because of
> >> memory pressure.
> > 
> > So the reclaim is clearly induced by THP allocations
> > $ zgrep vmscan trace2.gz | grep gfp_flags | sed 's@.*\(gfp_flags=.*\) .*@\1@' | sort | uniq -c
> >    1580 gfp_flags=GFP_TRANSHUGE
> >      15 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE
> > 
> > $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@nr_reclaimed=@@' |  awk '{nr+=$6+0}END{print nr}'
> > 1541726
> > 
> > 6GB of memory reclaimed in 1776s. That is a lot! But the THP allocation
> > rate is really high as well
> > $ zgrep "page_alloc.*GFP_TRANSHUGE" trace2.gz | wc -l
> > 15340
> > 
> > This is 30GB worth of THPs (some of them might get released of course).
> > Also only 10% of the requests end up reclaiming.
> > 
> > One additional interesting point
> > $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@.*nr_reclaimed=\([[0-9]*\)@\1@' | calc_min_max.awk
> > min: 1.00 max: 2792.00 avg: 965.99 std: 331.12 nr: 1596
> > 
> > Even though the std is high there are quite a few outliers where a lot of
> > memory is reclaimed.
> > 
> > Which kernel version is this? And again, what is the THP configuration?
> 
> This is 4.19.66. Regarding THP, you mean this:

Do you see the same behavior with 5.3?

> /sys/kernel/mm/transparent_hugepage/defrag:always defer [defer+madvise] madvise never
>
> /sys/kernel/mm/transparent_hugepage/enabled:[always] madvise never
>
> /sys/kernel/mm/transparent_hugepage/hpage_pmd_size:2097152
>
> /sys/kernel/mm/transparent_hugepage/shmem_enabled:always within_size advise [never] deny force
>
> /sys/kernel/mm/transparent_hugepage/use_zero_page:1
>
> /sys/kernel/mm/transparent_hugepage/enabled was madvise until yesterday,
> when I tried to switch to defer+madvise - which didn't help.

Many of the processes hitting the reclaim are php5; for the others I
cannot say because their cmd is not reflected in the trace. I suspect
those are using madvise. I haven't really seen kcompactd interfering
much. That would suggest using defer.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-10  9:02                               ` Michal Hocko
@ 2019-09-10  9:37                                 ` Stefan Priebe - Profihost AG
  2019-09-10 11:07                                   ` Michal Hocko
  0 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-10  9:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka


Am 10.09.19 um 11:02 schrieb Michal Hocko:
> On Tue 10-09-19 10:38:25, Stefan Priebe - Profihost AG wrote:
>> Am 10.09.19 um 10:29 schrieb Michal Hocko:
>>> On Tue 10-09-19 07:56:36, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> Am 09.09.19 um 14:56 schrieb Stefan Priebe - Profihost AG:
>>>>> Am 09.09.19 um 14:49 schrieb Michal Hocko:
>>>>>> On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote:
>>>>>>>
>>>>>>> Am 09.09.19 um 14:28 schrieb Michal Hocko:
>>>>>>>> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote:
>>>>>>>>>
>>>>>>>>> Am 09.09.19 um 14:08 schrieb Michal Hocko:
>>>>>>>>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote:
>>>>>>>>>>> and that matches moments when we reclaimed memory. There seems to be a
>>>>>>>>>>> steady flow of THP allocations so maybe this is a source of the direct
>>>>>>>>>>> reclaim?
>>>>>>>>>>
>>>>>>>>>> I was thinking about this some more and THP being a source of reclaim
>>>>>>>>>> sounds quite unlikely. At least in a default configuration because we
>>>>>>>>>> shouldn't do anything expensive in the #PF path. But there might be a
>>>>>>>>>> different source of high order (!costly) allocations. Could you check
>>>>>>>>>> how many allocation requests like that you have on your system?
>>>>>>>>>>
>>>>>>>>>> mount -t debugfs none /debug
>>>>>>>>>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter
>>>>>>>>>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable
>>>>>>>>>> cat /debug/tracing/trace_pipe > $file
>>>>>>>>
>>>>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
>>>>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
>>>>>>>>  
>>>>>>>> might tell us something as well but it might turn out that it just still
>>>>>>>> doesn't give us the full picture and we might need
>>>>>>>> echo stacktrace > /debug/tracing/trace_options
>>>>>>>>
>>>>>>>> It will generate much more output though.
>>>>>>>>
>>>>>>>>> Just now or when PSI rises?
>>>>>>>>
>>>>>>>> When the excessive reclaim is happening ideally.
>>>>>>>
>>>>>>> This one is from a server with 28G memfree but memory pressure is still
>>>>>>> jumping between 0 and 10%.
>>>>>>>
>>>>>>> I did:
>>>>>>> echo "order > 0" >
>>>>>>> /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter
>>>>>>>
>>>>>>> echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable
>>>>>>>
>>>>>>> echo 1 >
>>>>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable
>>>>>>>
>>>>>>> echo 1 >
>>>>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable
>>>>>>>
>>>>>>> timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace
>>>>>>>
>>>>>>> File attached.
>>>>>>
>>>>>> There is no reclaim captured in this trace dump.
>>>>>> $ zcat trace1.gz | sed 's@.*\(order=[0-9]\).*\(gfp_flags=.*\)@\1 \2@' | sort | uniq -c
>>>>>>     777 order=1 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>>>     663 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>>>     153 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>>>     911 order=1 gfp_flags=GFP_KERNEL_ACCOUNT|__GFP_ZERO
>>>>>>    4872 order=1 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
>>>>>>      62 order=1 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>>>      14 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
>>>>>>      11 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_RECLAIMABLE
>>>>>>    1263 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>>>      45 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE
>>>>>>       1 order=2 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO
>>>>>>    7853 order=2 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
>>>>>>      73 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>>>     729 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE
>>>>>>     528 order=3 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>>>    1203 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT
>>>>>>    5295 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP
>>>>>>       1 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>>>     132 order=3 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC
>>>>>>      13 order=5 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO
>>>>>>       1 order=6 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO
>>>>>>    1232 order=9 gfp_flags=GFP_TRANSHUGE
>>>>>>     108 order=9 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE
>>>>>>     362 order=9 gfp_flags=GFP_TRANSHUGE_LIGHT|__GFP_THISNODE
>>>>>>
>>>>>> Nothing really stands out because, except for the THP ones, none of the
>>>>>> others are even going to be using the movable zone.
>>>>> It might be that this is not an ideal example; it was just the fastest I
>>>>> could find. Maybe we really need one with much higher pressure.
>>>>
>>>> Here is another trace log where a system has 30GB of free memory but is
>>>> under constant pressure and does not build up any file cache because of
>>>> memory pressure.
>>>
>>> So the reclaim is clearly induced by THP allocations
>>> $ zgrep vmscan trace2.gz | grep gfp_flags | sed 's@.*\(gfp_flags=.*\) .*@\1@' | sort | uniq -c
>>>    1580 gfp_flags=GFP_TRANSHUGE
>>>      15 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE
>>>
>>> $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@nr_reclaimed=@@' |  awk '{nr+=$6+0}END{print nr}'
>>> 1541726
>>>
>>> 6GB of memory reclaimed in 1776s. That is a lot! But the THP allocation
>>> rate is really high as well
>>> $ zgrep "page_alloc.*GFP_TRANSHUGE" trace2.gz | wc -l
>>> 15340
>>>
>>> This is 30GB worth of THPs (some of them might get released of course).
>>> Also only 10% of the requests end up reclaiming.
>>>
>>> One additional interesting point
>>> $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@.*nr_reclaimed=\([[0-9]*\)@\1@' | calc_min_max.awk
>>> min: 1.00 max: 2792.00 avg: 965.99 std: 331.12 nr: 1596
>>>
>>> Even though the std is high there are quite a few outliers where a lot of
>>> memory is reclaimed.
>>>
>>> Which kernel version is this? And again, what is the THP configuration?
>>
>> This is 4.19.66. Regarding THP, you mean this:
> 
> Do you see the same behavior with 5.3?

I rebooted with 5.3.0-rc8 - let's see what happens; it might take some
hours or even days.

>> /sys/kernel/mm/transparent_hugepage/defrag:always defer [defer+madvise] madvise never
>>
>> /sys/kernel/mm/transparent_hugepage/enabled:[always] madvise never
>>
>> /sys/kernel/mm/transparent_hugepage/hpage_pmd_size:2097152
>>
>> /sys/kernel/mm/transparent_hugepage/shmem_enabled:always within_size advise [never] deny force
>>
>> /sys/kernel/mm/transparent_hugepage/use_zero_page:1
>>
>> /sys/kernel/mm/transparent_hugepage/enabled was madvise until yesterday,
>> when I tried to switch to defer+madvise - which didn't help.
> 
> Many of the processes hitting the reclaim are php5; for the others I
> cannot say because their cmd is not reflected in the trace. I suspect
> those are using madvise. I haven't really seen kcompactd interfering
> much. That would suggest using defer.

You mean I should set transparent_hugepage to defer?

Stefan



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-10  9:37                                 ` Stefan Priebe - Profihost AG
@ 2019-09-10 11:07                                   ` Michal Hocko
  2019-09-10 12:45                                     ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 61+ messages in thread
From: Michal Hocko @ 2019-09-10 11:07 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

On Tue 10-09-19 11:37:19, Stefan Priebe - Profihost AG wrote:
> 
> Am 10.09.19 um 11:02 schrieb Michal Hocko:
> > On Tue 10-09-19 10:38:25, Stefan Priebe - Profihost AG wrote:
[...]
> >> /sys/kernel/mm/transparent_hugepage/defrag:always defer [defer+madvise] madvise never
[...]
> > Many of the processes hitting the reclaim are php5; for the others I
> > cannot say because their cmd is not reflected in the trace. I suspect
> > those are using madvise. I haven't really seen kcompactd interfering
> > much. That would suggest using defer.
> 
> You mean I should set transparent_hugepage to defer?

Let's try with 5.3 without any changes first and then, if the problem is
still reproducible, limit the THP load by setting
transparent_hugepage to defer.
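
That is the defrag knob, i.e. something like:

echo defer > /sys/kernel/mm/transparent_hugepage/defrag

With defer, a faulting process wakes kswapd/kcompactd in the background
instead of performing reclaim/compaction synchronously in the fault
path.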
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-10 11:07                                   ` Michal Hocko
@ 2019-09-10 12:45                                     ` Stefan Priebe - Profihost AG
  2019-09-10 12:57                                       ` Michal Hocko
  0 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-10 12:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

[-- Attachment #1: Type: text/plain, Size: 1118 bytes --]

Hello Michal,

OK, this might take a long time. Attached you'll find a graph showing
what happens over time after a fresh boot (here 17 August to 30 August).
Memory usage decreases, as does cache, but slowly and only over days.

So it might take 2-3 weeks of running kernel 5.3 to see what happens.

Greets,
Stefan

Am 10.09.19 um 13:07 schrieb Michal Hocko:
> On Tue 10-09-19 11:37:19, Stefan Priebe - Profihost AG wrote:
>>
>> Am 10.09.19 um 11:02 schrieb Michal Hocko:
>>> On Tue 10-09-19 10:38:25, Stefan Priebe - Profihost AG wrote:
> [...]
>>>> /sys/kernel/mm/transparent_hugepage/defrag:always defer [defer+madvise] madvise never
> [...]
>>> Many of the processes hitting the reclaim are php5; for the others I
>>> cannot say because their cmd is not reflected in the trace. I suspect
>>> those are using madvise. I haven't really seen kcompactd interfering
>>> much. That would suggest using defer.
>>
>> You mean I should set transparent_hugepage to defer?
> 
> Let's try with 5.3 without any changes first and then, if the problem is
> still reproducible, limit the THP load by setting
> transparent_hugepage to defer.

[-- Attachment #2: psi-overview.png --]
[-- Type: image/png, Size: 111334 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-10 12:45                                     ` Stefan Priebe - Profihost AG
@ 2019-09-10 12:57                                       ` Michal Hocko
  2019-09-10 13:05                                         ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 61+ messages in thread
From: Michal Hocko @ 2019-09-10 12:57 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote:
> Hello Michal,
> 
> OK, this might take a long time. Attached you'll find a graph showing
> what happens over time after a fresh boot (here 17 August to 30 August).
> Memory usage decreases, as does cache, but slowly and only over days.
>
> So it might take 2-3 weeks of running kernel 5.3 to see what happens.

No problem. Just make sure to collect the requested data from the time
you see the actual problem. Btw. you can try my very dumb scriptlets to
get an idea of how much memory gets reclaimed due to THP.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-10 12:57                                       ` Michal Hocko
@ 2019-09-10 13:05                                         ` Stefan Priebe - Profihost AG
  2019-09-10 13:14                                           ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-10 13:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka


Am 10.09.19 um 14:57 schrieb Michal Hocko:
> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote:
>> Hello Michal,
>>
>> OK, this might take a long time. Attached you'll find a graph showing
>> what happens over time after a fresh boot (here 17 August to 30 August).
>> Memory usage decreases, as does cache, but slowly and only over days.
>>
>> So it might take 2-3 weeks of running kernel 5.3 to see what happens.
> 
> No problem. Just make sure to collect the requested data from the time
> you see the actual problem. Btw. you can try my very dumb scriptlets to
> get an idea of how much memory gets reclaimed due to THP.

You mean your sed and sort on top of the trace file? No, I did not with
the current 5.3 kernel; do you think it will show anything interesting?
Which line shows me how much memory gets reclaimed due to THP?

Stefan


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-10 13:05                                         ` Stefan Priebe - Profihost AG
@ 2019-09-10 13:14                                           ` Stefan Priebe - Profihost AG
  2019-09-10 13:24                                             ` Michal Hocko
  0 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-10 13:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

Am 10.09.19 um 15:05 schrieb Stefan Priebe - Profihost AG:
> 
> Am 10.09.19 um 14:57 schrieb Michal Hocko:
>> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote:
>>> Hello Michal,
>>>
>>> OK, this might take a long time. Attached you'll find a graph showing
>>> what happens over time after a fresh boot (here 17 August to 30 August).
>>> Memory usage decreases, as does cache, but slowly and only over days.
>>>
>>> So it might take 2-3 weeks of running kernel 5.3 to see what happens.
>>
>> No problem. Just make sure to collect the requested data from the time
>> you see the actual problem. Btw. you can try my very dumb scriptlets to
>> get an idea of how much memory gets reclaimed due to THP.
> 
> You mean your sed and sort on top of the trace file? No, I did not with
> the current 5.3 kernel; do you think it will show anything interesting?
> Which line shows me how much memory gets reclaimed due to THP?

Is something like a kernel memory leak possible? Or wouldn't this end up
with a lot of free memory which doesn't seem usable?

I also wonder why a reclaim takes place when there is enough memory.
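
If it helps to rule out slab growth in the meantime I can also log the
slab counters, e.g.:

grep -E '^(Slab|SReclaimable|SUnreclaim)' /proc/meminfo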

Greets,
Stefan


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-10 13:14                                           ` Stefan Priebe - Profihost AG
@ 2019-09-10 13:24                                             ` Michal Hocko
  2019-09-11  6:12                                               ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 61+ messages in thread
From: Michal Hocko @ 2019-09-10 13:24 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

On Tue 10-09-19 15:14:45, Stefan Priebe - Profihost AG wrote:
> Am 10.09.19 um 15:05 schrieb Stefan Priebe - Profihost AG:
> > 
> > Am 10.09.19 um 14:57 schrieb Michal Hocko:
> >> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote:
> >>> Hello Michal,
> >>>
> >>> OK, this might take a long time. Attached you'll find a graph showing
> >>> what happens over time after a fresh boot (here 17 August to 30 August).
> >>> Memory usage decreases, as does cache, but slowly and only over days.
> >>>
> >>> So it might take 2-3 weeks of running kernel 5.3 to see what happens.
> >>
> >> No problem. Just make sure to collect the requested data from the time
> >> you see the actual problem. Btw. you can try my very dumb scriptlets to
> >> get an idea of how much memory gets reclaimed due to THP.
> > 
> > You mean your sed and sort on top of the trace file? No, I did not with
> > the current 5.3 kernel; do you think it will show anything interesting?
> > Which line shows me how much memory gets reclaimed due to THP?

Please re-read http://lkml.kernel.org/r/20190910082919.GL2063@dhcp22.suse.cz
Each command has a commented output. If you see the number of reclaimed
pages being large for GFP_TRANSHUGE then you are seeing a similar
problem.

> Is something like a kernel memory leak possible? Or wouldn't this end up
> with a lot of free memory which doesn't seem usable?

I would be really surprised if this was the case.

> I also wonder why a reclaim takes place when there is enough memory.

This is not clear yet and it might be a bug that has been fixed since
4.18. That's why we need to see whether the same pattern is happening
with 5.3 as well.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-10 13:24                                             ` Michal Hocko
@ 2019-09-11  6:12                                               ` Stefan Priebe - Profihost AG
  2019-09-11  6:24                                                 ` Stefan Priebe - Profihost AG
                                                                   ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-11  6:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

Hi Michal,
Am 10.09.19 um 15:24 schrieb Michal Hocko:
> On Tue 10-09-19 15:14:45, Stefan Priebe - Profihost AG wrote:
>> Am 10.09.19 um 15:05 schrieb Stefan Priebe - Profihost AG:
>>>
>>> Am 10.09.19 um 14:57 schrieb Michal Hocko:
>>>> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote:
>>>>> Hello Michal,
>>>>>
>>>>> OK, this might take a long time. Attached you'll find a graph showing
>>>>> what happens over time after a fresh boot (here 17 August to 30 August).
>>>>> Memory usage decreases, as does cache, but slowly and only over days.
>>>>>
>>>>> So it might take 2-3 weeks of running kernel 5.3 to see what happens.
>>>>
>>>> No problem. Just make sure to collect the requested data from the time
>>>> you see the actual problem. Btw. you can try my very dumb scriptlets to
>>>> get an idea of how much memory gets reclaimed due to THP.
>>>
>>> You mean your sed and sort on top of the trace file? No, I did not with
>>> the current 5.3 kernel; do you think it will show anything interesting?
>>> Which line shows me how much memory gets reclaimed due to THP?
> 
> Please re-read http://lkml.kernel.org/r/20190910082919.GL2063@dhcp22.suse.cz
> Each command has a commented output. If you see the number of reclaimed
> pages being large for GFP_TRANSHUGE then you are seeing a similar
> problem.
> 
>> Is something like a kernel memory leak possible? Or wouldn't this end up
>> with a lot of free memory which doesn't seem usable?
> 
> I would be really surprised if this was the case.
> 
>> I also wonder why a reclaim takes place when there is enough memory.
> 
> This is not clear yet and it might be a bug that has been fixed since
> 4.18. That's why we need to see whether the same pattern is happening
> with 5.3 as well.

Sadly I'm running into issues with btrfs on 5.3-rc8 - the rsync process
on the backup disk completely hangs / is blocked at 100% I/O:
[54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds.
[54739.066973]       Not tainted 5.3.0-rc8 #1
[54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[54739.069065] rsync           D    0  9830   9829 0x00004002
[54739.070146] Call Trace:
[54739.071183]  ? __schedule+0x3cf/0x680
[54739.072202]  ? bit_wait+0x50/0x50
[54739.073196]  schedule+0x39/0xa0
[54739.074213]  io_schedule+0x12/0x40
[54739.075219]  bit_wait_io+0xd/0x50
[54739.076227]  __wait_on_bit+0x66/0x90
[54739.077239]  ? bit_wait+0x50/0x50
[54739.078273]  out_of_line_wait_on_bit+0x8b/0xb0
[54739.078741]  ? init_wait_var_entry+0x40/0x40
[54739.079162]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
[54739.079557]  btree_write_cache_pages+0x17d/0x350 [btrfs]
[54739.079956]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
[54739.080357]  ? merge_state.part.47+0x3f/0x160 [btrfs]
[54739.080748]  do_writepages+0x1a/0x60
[54739.081140]  __filemap_fdatawrite_range+0xc8/0x100
[54739.081558]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
[54739.081985]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
[54739.082412]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
[54739.082847]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
[54739.083280]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
[54739.083725]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
[54739.084170]  btrfs_sync_file+0x395/0x3e0 [btrfs]
[54739.084608]  ? retarget_shared_pending+0x70/0x70
[54739.085049]  do_fsync+0x38/0x60
[54739.085494]  __x64_sys_fdatasync+0x13/0x20
[54739.085944]  do_syscall_64+0x55/0x1a0
[54739.086395]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[54739.086850] RIP: 0033:0x7f1db3fc85f0
[54739.087310] Code: Bad RIP value.
[54739.087772] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
000000000000004b
[54739.088249] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
00007f1db3fc85f0
[54739.088733] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
0000000000000001
[54739.089234] RBP: 0000000000000001 R08: 0000000000000000 R09:
0000000081c492ca
[54739.089722] R10: 0000000000000008 R11: 0000000000000246 R12:
0000000000000028
[54739.090205] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
0000000000000000
[54859.899715] INFO: task rsync:9830 blocked for more than 241 seconds.
[54859.900863]       Not tainted 5.3.0-rc8 #1
[54859.901885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[54859.902909] rsync           D    0  9830   9829 0x00004002
[54859.903930] Call Trace:
[54859.904888]  ? __schedule+0x3cf/0x680
[54859.905831]  ? bit_wait+0x50/0x50
[54859.906751]  schedule+0x39/0xa0
[54859.907653]  io_schedule+0x12/0x40
[54859.908535]  bit_wait_io+0xd/0x50
[54859.909441]  __wait_on_bit+0x66/0x90
[54859.910306]  ? bit_wait+0x50/0x50
[54859.911177]  out_of_line_wait_on_bit+0x8b/0xb0
[54859.912043]  ? init_wait_var_entry+0x40/0x40
[54859.912727]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
[54859.913113]  btree_write_cache_pages+0x17d/0x350 [btrfs]
[54859.913501]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
[54859.913894]  ? merge_state.part.47+0x3f/0x160 [btrfs]
[54859.914276]  do_writepages+0x1a/0x60
[54859.914656]  __filemap_fdatawrite_range+0xc8/0x100
[54859.915052]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
[54859.915449]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
[54859.915855]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
[54859.916256]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
[54859.916658]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
[54859.917078]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
[54859.917497]  btrfs_sync_file+0x395/0x3e0 [btrfs]
[54859.917903]  ? retarget_shared_pending+0x70/0x70
[54859.918307]  do_fsync+0x38/0x60
[54859.918707]  __x64_sys_fdatasync+0x13/0x20
[54859.919106]  do_syscall_64+0x55/0x1a0
[54859.919482]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[54859.919866] RIP: 0033:0x7f1db3fc85f0
[54859.920243] Code: Bad RIP value.
[54859.920614] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
000000000000004b
[54859.920997] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
00007f1db3fc85f0
[54859.921383] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
0000000000000001
[54859.921773] RBP: 0000000000000001 R08: 0000000000000000 R09:
0000000081c492ca
[54859.922165] R10: 0000000000000008 R11: 0000000000000246 R12:
0000000000000028
[54859.922551] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
0000000000000000
[54980.733463] INFO: task rsync:9830 blocked for more than 362 seconds.
[54980.734061]       Not tainted 5.3.0-rc8 #1
[54980.734619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[54980.735209] rsync           D    0  9830   9829 0x00004002
[54980.735802] Call Trace:
[54980.736473]  ? __schedule+0x3cf/0x680
[54980.737054]  ? bit_wait+0x50/0x50
[54980.737664]  schedule+0x39/0xa0
[54980.738243]  io_schedule+0x12/0x40
[54980.738712]  bit_wait_io+0xd/0x50
[54980.739171]  __wait_on_bit+0x66/0x90
[54980.739623]  ? bit_wait+0x50/0x50
[54980.740073]  out_of_line_wait_on_bit+0x8b/0xb0
[54980.740548]  ? init_wait_var_entry+0x40/0x40
[54980.741033]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
[54980.741579]  btree_write_cache_pages+0x17d/0x350 [btrfs]
[54980.742076]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
[54980.742560]  ? merge_state.part.47+0x3f/0x160 [btrfs]
[54980.743045]  do_writepages+0x1a/0x60
[54980.743516]  __filemap_fdatawrite_range+0xc8/0x100
[54980.744019]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
[54980.744513]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
[54980.745026]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
[54980.745563]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
[54980.746073]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
[54980.746575]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
[54980.747074]  btrfs_sync_file+0x395/0x3e0 [btrfs]
[54980.747575]  ? retarget_shared_pending+0x70/0x70
[54980.748059]  do_fsync+0x38/0x60
[54980.748539]  __x64_sys_fdatasync+0x13/0x20
[54980.749012]  do_syscall_64+0x55/0x1a0
[54980.749512]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[54980.749995] RIP: 0033:0x7f1db3fc85f0
[54980.750368] Code: Bad RIP value.
[54980.750735] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
000000000000004b
[54980.751117] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
00007f1db3fc85f0
[54980.751505] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
0000000000000001
[54980.751895] RBP: 0000000000000001 R08: 0000000000000000 R09:
0000000081c492ca
[54980.752291] R10: 0000000000000008 R11: 0000000000000246 R12:
0000000000000028
[54980.752680] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
0000000000000000
[... the identical rsync hung-task trace (fdatasync -> btrfs_commit_transaction) repeats at 483, 604, 724, 845, 966, 1087 and 1208 seconds ...]
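
If it hangs like this again i'll also dump all blocked tasks for more
context - a minimal sketch, assuming sysrq is enabled:

    # enable all sysrq functions (if not already enabled)
    echo 1 > /proc/sys/kernel/sysrq
    # dump stack traces of all uninterruptible (D state) tasks to dmesg
    echo w > /proc/sysrq-trigger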


Greets,
Stefan


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-11  6:12                                               ` Stefan Priebe - Profihost AG
@ 2019-09-11  6:24                                                 ` Stefan Priebe - Profihost AG
  2019-09-11 13:59                                                   ` Stefan Priebe - Profihost AG
  2019-09-11  7:09                                                 ` 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) Michal Hocko
  2019-09-19 10:21                                                 ` lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG
  2 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-11  6:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

Hi Michal,

On 11.09.19 at 08:12, Stefan Priebe - Profihost AG wrote:
> Hi Michal,
> On 10.09.19 at 15:24, Michal Hocko wrote:
>> On Tue 10-09-19 15:14:45, Stefan Priebe - Profihost AG wrote:
>>> On 10.09.19 at 15:05, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> On 10.09.19 at 14:57, Michal Hocko wrote:
>>>>> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote:
>>>>>> Hello Michal,
>>>>>>
>>>>>> ok this might take a long time. Attached you'll find a graph showing
>>>>>> what happens over time after a fresh boot (here 17 August to 30
>>>>>> August). Memory usage decreases, as does the cache, but only slowly,
>>>>>> over days.
>>>>>>
>>>>>> So it might take 2-3 weeks running Kernel 5.3 to see what happens.
>>>>>
>>>>> No problem. Just make sure to collect the requested data from the time
>>>>> you see the actual problem. Btw. did you try my very dumb scriptlets to
>>>>> get an idea of how much memory gets reclaimed due to THP?
>>>>
>>>> You mean your sed and sort on top of the trace file? No, i did not with
>>>> the current 5.3 kernel - do you think it will show anything interesting?
>>>> Which line shows me how much memory gets reclaimed due to THP?
>>
>> Please re-read http://lkml.kernel.org/r/20190910082919.GL2063@dhcp22.suse.cz
>> Each command has a commented output. If you see the number of reclaimed
>> pages to be large for GFP_TRANSHUGE then you are seeing a similar
>> problem.
>>
>>> Is something like a kernel memory leak possible? Or wouldn't this end
>>> up with a lot of free memory which doesn't seem usable?
>>
>> I would be really surprised if this was the case.
>>
>>> I also wonder why a reclaim takes place when there is enough memory.
>>
>> This is not clear yet and it might be a bug that has been fixed since
>> 4.18. That's why we need to see whether the same pattern is happening
>> with 5.3 as well.
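
For reference, my reading of those scriptlets - a minimal sketch of the
idea, not your exact commands, assuming the vmscan tracepoints are
available under /sys/kernel/debug/tracing:

    cd /sys/kernel/debug/tracing
    echo 1 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable
    echo 1 > events/vmscan/mm_vmscan_direct_reclaim_end/enable
    # direct reclaim invocations per gfp mask (gfp_flags= is the last field)
    grep mm_vmscan_direct_reclaim_begin trace | sed 's/.*gfp_flags=//' | sort | uniq -c | sort -rn
    # total pages reclaimed by direct reclaim
    grep mm_vmscan_direct_reclaim_end trace | sed 's/.*nr_reclaimed=//' | awk '{s += $1} END {print s}'

A large count next to GFP_TRANSHUGE in the first command would point at
THP allocations driving the reclaim.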

but apart from the btrfs problem the memory consumption looks far
better than before.

Running 4.19.X:
after about 12h cache starts to drop from 30G to 24G

Running 5.3-rc8:
after about 24h cache is still constant at nearly 30G
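
(i'm watching this with a trivial loop over /proc/meminfo - just a
sketch, assuming GNU date and a writable meminfo.log:)

    while true; do
        echo "$(date '+%F %T') $(grep -E '^(MemAvailable|Cached):' /proc/meminfo | tr -s ' \n' ' ')"
        sleep 60
    done >> meminfo.log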

Greets,
Stefan


^ permalink raw reply	[flat|nested] 61+ messages in thread

* 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI)
  2019-09-11  6:12                                               ` Stefan Priebe - Profihost AG
  2019-09-11  6:24                                                 ` Stefan Priebe - Profihost AG
@ 2019-09-11  7:09                                                 ` Michal Hocko
  2019-09-11 14:09                                                   ` Stefan Priebe - Profihost AG
  2019-09-11 14:56                                                   ` Filipe Manana
  2019-09-19 10:21                                                 ` lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG
  2 siblings, 2 replies; 61+ messages in thread
From: Michal Hocko @ 2019-09-11  7:09 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka,
	Jens Axboe, linux-block, linux-fsdevel, David Sterba,
	linux-btrfs

This smells like an IO/Btrfs issue to me. Cc'ing some more people.

On Wed 11-09-19 08:12:28, Stefan Priebe - Profihost AG wrote:
[...]
> Sadly i'm running into issues with btrfs on 5.3-rc8 - the rsync process
> on the backup disk completely hangs / is blocked at 100% i/o:
> [54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds.
> [54739.066973]       Not tainted 5.3.0-rc8 #1
> [54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [54739.069065] rsync           D    0  9830   9829 0x00004002
> [54739.070146] Call Trace:
> [54739.071183]  ? __schedule+0x3cf/0x680
> [54739.072202]  ? bit_wait+0x50/0x50
> [54739.073196]  schedule+0x39/0xa0
> [54739.074213]  io_schedule+0x12/0x40
> [54739.075219]  bit_wait_io+0xd/0x50
> [54739.076227]  __wait_on_bit+0x66/0x90
> [54739.077239]  ? bit_wait+0x50/0x50
> [54739.078273]  out_of_line_wait_on_bit+0x8b/0xb0
> [54739.078741]  ? init_wait_var_entry+0x40/0x40
> [54739.079162]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> [54739.079557]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> [54739.079956]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> [54739.080357]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> [54739.080748]  do_writepages+0x1a/0x60
> [54739.081140]  __filemap_fdatawrite_range+0xc8/0x100
> [54739.081558]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> [54739.081985]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> [54739.082412]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> [54739.082847]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> [54739.083280]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> [54739.083725]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> [54739.084170]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> [54739.084608]  ? retarget_shared_pending+0x70/0x70
> [54739.085049]  do_fsync+0x38/0x60
> [54739.085494]  __x64_sys_fdatasync+0x13/0x20
> [54739.085944]  do_syscall_64+0x55/0x1a0
> [54739.086395]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [54739.086850] RIP: 0033:0x7f1db3fc85f0
> [54739.087310] Code: Bad RIP value.
> [54739.087772] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
> [54739.088249] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1db3fc85f0
> [54739.088733] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: 0000000000000001
> [54739.089234] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000081c492ca
> [54739.089722] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000028
> [54739.090205] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: 0000000000000000
> [... the identical trace repeats at 241, 362, 483, 604, 724, 845, 966,
> 1087 and 1208 seconds ...]
> 
> 
> Greets,
> Stefan

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-11  6:24                                                 ` Stefan Priebe - Profihost AG
@ 2019-09-11 13:59                                                   ` Stefan Priebe - Profihost AG
  2019-09-12 10:53                                                     ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-11 13:59 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

Hi,

i've now tried v5.2.14, but that one died with the BUG below - i don't
know which version to try now.

2019-09-11 15:41:09     ------------[ cut here ]------------
2019-09-11 15:41:09     kernel BUG at mm/page-writeback.c:2655!
2019-09-11 15:41:09     invalid opcode: 0000 [#1] SMP PTI
2019-09-11 15:41:09     CPU: 4 PID: 466 Comm: kworker/u24:6 Not tainted 5.2.14 #1
2019-09-11 15:41:09     Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015
2019-09-11 15:41:09     Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs]
2019-09-11 15:41:09     RIP: 0010:clear_page_dirty_for_io+0xfc/0x210
2019-09-11 15:41:09     Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85 d2 48 8b
2019-09-11 15:41:09     RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246
2019-09-11 15:41:09     RAX: 001000000004205c RBX: ffffe660525b3140 RCX: 0000000000000000
2019-09-11 15:41:09     RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffe660525b3140
2019-09-11 15:41:09     RBP: ffff9ad639868818 R08: 0000000000000001 R09: 000000000002de18
2019-09-11 15:41:09     R10: 0000000000000002 R11: ffff9ade7ffd6000 R12: 0000000000000000
2019-09-11 15:41:09     R13: 0000000000000001 R14: 0000000000000000 R15: ffffbd4b8d2f3d08
2019-09-11 15:41:09     FS: 0000000000000000(0000) GS:ffff9ade3f900000(0000) knlGS:0000000000000000
2019-09-11 15:41:09     CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2019-09-11 15:41:09     CR2: 000055fa10d2bf70 CR3: 00000005a420a002 CR4: 00000000001606e0
2019-09-11 15:41:09     Call Trace:
2019-09-11 15:41:09     __process_pages_contig+0x270/0x360 [btrfs]
2019-09-11 15:41:09     submit_compressed_extents+0x39d/0x460 [btrfs]
2019-09-11 15:41:09     normal_work_helper+0x20f/0x320 [btrfs]
2019-09-11 15:41:09     process_one_work+0x18b/0x380
2019-09-11 15:41:09     worker_thread+0x4f/0x3a0
2019-09-11 15:41:09     ? rescuer_thread+0x330/0x330
2019-09-11 15:41:09     kthread+0xf8/0x130
2019-09-11 15:41:09     ? kthread_create_worker_on_cpu+0x70/0x70
2019-09-11 15:41:09     ret_from_fork+0x35/0x40
2019-09-11 15:41:09     Modules linked in: netconsole xt_tcpudp xt_owner
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter fuse
ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac
x86_pkg_temp_thermal coretemp kvm_intel ast kvm ttm drm_kms_helper
irqbypass crc32_pclmul drm fb_sys_fops syscopyarea lpc_ich sysfillrect
ghash_clmulni_intel sysimgblt mfd_core sg wmi ipmi_si ipmi_devintf
ipmi_msghandler button ip_tables x_tables btrfs zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor usbhid raid6_pq raid1 raid0 multipath linear
md_mod sd_mod xhci_pci ehci_pci igb xhci_hcd ehci_hcd i2c_algo_bit
i2c_i801 ahci ptp i2c_core usbcore libahci usb_common pps_core megaraid_sas
2019-09-11 15:41:09     ---[ end trace d9a3f99c047dc8bf ]---
2019-09-11 15:41:10     RIP: 0010:clear_page_dirty_for_io+0xfc/0x210
2019-09-11 15:41:10     Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85 d2 48 8b
2019-09-11 15:41:10     RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246
2019-09-11 15:41:10     RAX: 001000000004205c RBX: ffffe660525b3140 RCX: 0000000000000000
2019-09-11 15:41:10     RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffe660525b3140
2019-09-11 15:41:10     RBP: ffff9ad639868818 R08: 0000000000000001 R09: 000000000002de18
2019-09-11 15:41:10     R10: 0000000000000002 R11: ffff9ade7ffd6000 R12: 0000000000000000
2019-09-11 15:41:10     R13: 0000000000000001 R14: 0000000000000000 R15: ffffbd4b8d2f3d08
2019-09-11 15:41:10     FS: 0000000000000000(0000) GS:ffff9ade3f900000(0000) knlGS:0000000000000000
2019-09-11 15:41:10     CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2019-09-11 15:41:10     CR2: 000055fa10d2bf70 CR3: 00000005a420a002 CR4: 00000000001606e0
2019-09-11 15:41:10     Kernel panic - not syncing: Fatal exception
2019-09-11 15:41:10     Kernel Offset: 0x1a000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
2019-09-11 15:41:10     Rebooting in 20 seconds..
2019-09-11 15:41:29     ACPI MEMORY or I/O RESET_REG.
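
If nothing obvious turns up, the usual way to narrow down where this
regressed is a bisect between the last good and first bad releases - a
minimal sketch, assuming v4.19 was still fine:

    git bisect start
    git bisect bad v5.2.14
    git bisect good v4.19
    # build and boot each commit git suggests, run the rsync workload,
    # then mark the result and repeat until the first bad commit is printed:
    git bisect good   # or: git bisect bad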

Stefan
On 11.09.19 at 08:24, Stefan Priebe - Profihost AG wrote:
> Hi Michal,
> 
> On 11.09.19 at 08:12, Stefan Priebe - Profihost AG wrote:
>> Hi Michal,
>> On 10.09.19 at 15:24, Michal Hocko wrote:
>>> On Tue 10-09-19 15:14:45, Stefan Priebe - Profihost AG wrote:
>>>> On 10.09.19 at 15:05, Stefan Priebe - Profihost AG wrote:
>>>>>
>>>>> On 10.09.19 at 14:57, Michal Hocko wrote:
>>>>>> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hello Michal,
>>>>>>>
>>>>>>> ok this might take a long time. Attached you'll find a graph showing
>>>>>>> what happens over time after a fresh boot (here 17 August to 30
>>>>>>> August). Memory usage decreases, as does the cache, but only slowly,
>>>>>>> over days.
>>>>>>>
>>>>>>> So it might take 2-3 weeks running Kernel 5.3 to see what happens.
>>>>>>
>>>>>> No problem. Just make sure to collect the requested data from the time
>>>>>> you see the actual problem. Btw. did you try my very dumb scriptlets to
>>>>>> get an idea of how much memory gets reclaimed due to THP?
>>>>>
>>>>> You mean your sed and sort on top of the trace file? No, i did not with
>>>>> the current 5.3 kernel - do you think it will show anything interesting?
>>>>> Which line shows me how much memory gets reclaimed due to THP?
>>>
>>> Please re-read http://lkml.kernel.org/r/20190910082919.GL2063@dhcp22.suse.cz
>>> Each command has a commented output. If you see the number of reclaimed
>>> pages to be large for GFP_TRANSHUGE then you are seeing a similar
>>> problem.
>>>
>>>> Is something like a kernel memory leak possible? Or wouldn't this end
>>>> up with a lot of free memory which doesn't seem usable?
>>>
>>> I would be really surprised if this was the case.
>>>
>>>> I also wonder why a reclaim takes place when there is enough memory.
>>>
>>> This is not clear yet and it might be a bug that has been fixed since
>>> 4.18. That's why we need to see whether the same pattern is happening
>>> with 5.3 as well.
> 
> but apart from the btrfs problem the memory consumption looks far
> better than before.
> 
> Running 4.19.X:
> after about 12h cache starts to drop from 30G to 24G
> 
> Running 5.3-rc8:
> after about 24h cache is still constant at nearly 30G
> 
> Greets,
> Stefan
> 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI)
  2019-09-11  7:09                                                 ` 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) Michal Hocko
@ 2019-09-11 14:09                                                   ` Stefan Priebe - Profihost AG
  2019-09-11 14:56                                                   ` Filipe Manana
  1 sibling, 0 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-11 14:09 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka,
	Jens Axboe, linux-block, linux-fsdevel, David Sterba,
	linux-btrfs

Hi,

i've now tried v5.2.14, but that one died with the BUG below - i don't
know which version to try now.

[... the same 5.2.14 "kernel BUG at mm/page-writeback.c:2655!" panic log in clear_page_dirty_for_io as quoted in full in the previous message ...]



On 11.09.19 at 09:09, Michal Hocko wrote:
> This smells like an IO/Btrfs issue to me. Cc'ing some more people.
> 
> On Wed 11-09-19 08:12:28, Stefan Priebe - Profihost AG wrote:
> [...]
>> Sadly i'm running into issues with btrfs on 5.3-rc8 - the rsync process
>> on the backup disk completely hangs / is blocked at 100% i/o:
>> [54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds.
>> [54739.066973]       Not tainted 5.3.0-rc8 #1
>> [54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [54739.069065] rsync           D    0  9830   9829 0x00004002
>> [54739.070146] Call Trace:
>> [54739.071183]  ? __schedule+0x3cf/0x680
>> [54739.072202]  ? bit_wait+0x50/0x50
>> [54739.073196]  schedule+0x39/0xa0
>> [54739.074213]  io_schedule+0x12/0x40
>> [54739.075219]  bit_wait_io+0xd/0x50
>> [54739.076227]  __wait_on_bit+0x66/0x90
>> [54739.077239]  ? bit_wait+0x50/0x50
>> [54739.078273]  out_of_line_wait_on_bit+0x8b/0xb0
>> [54739.078741]  ? init_wait_var_entry+0x40/0x40
>> [54739.079162]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>> [54739.079557]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>> [54739.079956]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>> [54739.080357]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>> [54739.080748]  do_writepages+0x1a/0x60
>> [54739.081140]  __filemap_fdatawrite_range+0xc8/0x100
>> [54739.081558]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>> [54739.081985]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>> [54739.082412]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>> [54739.082847]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [54739.083280]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [54739.083725]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>> [54739.084170]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>> [54739.084608]  ? retarget_shared_pending+0x70/0x70
>> [54739.085049]  do_fsync+0x38/0x60
>> [54739.085494]  __x64_sys_fdatasync+0x13/0x20
>> [54739.085944]  do_syscall_64+0x55/0x1a0
>> [54739.086395]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [54739.086850] RIP: 0033:0x7f1db3fc85f0
>> [54739.087310] Code: Bad RIP value.
>> [54739.087772] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
>> [54739.088249] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1db3fc85f0
>> [54739.088733] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: 0000000000000001
>> [54739.089234] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000081c492ca
>> [54739.089722] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000028
>> [54739.090205] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: 0000000000000000
>> [... the identical trace repeats at 241 seconds ...]
>> [54980.733463] INFO: task rsync:9830 blocked for more than 362 seconds.
>> [54980.734061]       Not tainted 5.3.0-rc8 #1
>> [54980.734619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [54980.735209] rsync           D    0  9830   9829 0x00004002
>> [54980.735802] Call Trace:
>> [54980.736473]  ? __schedule+0x3cf/0x680
>> [54980.737054]  ? bit_wait+0x50/0x50
>> [54980.737664]  schedule+0x39/0xa0
>> [54980.738243]  io_schedule+0x12/0x40
>> [54980.738712]  bit_wait_io+0xd/0x50
>> [54980.739171]  __wait_on_bit+0x66/0x90
>> [54980.739623]  ? bit_wait+0x50/0x50
>> [54980.740073]  out_of_line_wait_on_bit+0x8b/0xb0
>> [54980.740548]  ? init_wait_var_entry+0x40/0x40
>> [54980.741033]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>> [54980.741579]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>> [54980.742076]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>> [54980.742560]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>> [54980.743045]  do_writepages+0x1a/0x60
>> [54980.743516]  __filemap_fdatawrite_range+0xc8/0x100
>> [54980.744019]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>> [54980.744513]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>> [54980.745026]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>> [54980.745563]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [54980.746073]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [54980.746575]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>> [54980.747074]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>> [54980.747575]  ? retarget_shared_pending+0x70/0x70
>> [54980.748059]  do_fsync+0x38/0x60
>> [54980.748539]  __x64_sys_fdatasync+0x13/0x20
>> [54980.749012]  do_syscall_64+0x55/0x1a0
>> [54980.749512]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [54980.749995] RIP: 0033:0x7f1db3fc85f0
>> [54980.750368] Code: Bad RIP value.
>> [54980.750735] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>> 000000000000004b
>> [54980.751117] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>> 00007f1db3fc85f0
>> [54980.751505] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>> 0000000000000001
>> [54980.751895] RBP: 0000000000000001 R08: 0000000000000000 R09:
>> 0000000081c492ca
>> [54980.752291] R10: 0000000000000008 R11: 0000000000000246 R12:
>> 0000000000000028
>> [54980.752680] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>> 0000000000000000
>> [55101.567251] INFO: task rsync:9830 blocked for more than 483 seconds.
>> [55101.567775]       Not tainted 5.3.0-rc8 #1
>> [55101.568218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [55101.568649] rsync           D    0  9830   9829 0x00004002
>> [55101.569101] Call Trace:
>> [55101.569609]  ? __schedule+0x3cf/0x680
>> [55101.570052]  ? bit_wait+0x50/0x50
>> [55101.570504]  schedule+0x39/0xa0
>> [55101.570938]  io_schedule+0x12/0x40
>> [55101.571404]  bit_wait_io+0xd/0x50
>> [55101.571934]  __wait_on_bit+0x66/0x90
>> [55101.572601]  ? bit_wait+0x50/0x50
>> [55101.573235]  out_of_line_wait_on_bit+0x8b/0xb0
>> [55101.573599]  ? init_wait_var_entry+0x40/0x40
>> [55101.574008]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>> [55101.574394]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>> [55101.574783]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>> [55101.575184]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>> [55101.575580]  do_writepages+0x1a/0x60
>> [55101.575959]  __filemap_fdatawrite_range+0xc8/0x100
>> [55101.576351]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>> [55101.576746]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>> [55101.577144]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>> [55101.577543]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [55101.577939]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [55101.578343]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>> [55101.578746]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>> [55101.579139]  ? retarget_shared_pending+0x70/0x70
>> [55101.579543]  do_fsync+0x38/0x60
>> [55101.579928]  __x64_sys_fdatasync+0x13/0x20
>> [55101.580312]  do_syscall_64+0x55/0x1a0
>> [55101.580706]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [55101.581086] RIP: 0033:0x7f1db3fc85f0
>> [55101.581463] Code: Bad RIP value.
>> [55101.581834] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>> 000000000000004b
>> [55101.582219] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>> 00007f1db3fc85f0
>> [55101.582607] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>> 0000000000000001
>> [55101.582998] RBP: 0000000000000001 R08: 0000000000000000 R09:
>> 0000000081c492ca
>> [55101.583397] R10: 0000000000000008 R11: 0000000000000246 R12:
>> 0000000000000028
>> [55101.583784] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>> 0000000000000000
>> [55222.405056] INFO: task rsync:9830 blocked for more than 604 seconds.
>> [55222.405773]       Not tainted 5.3.0-rc8 #1
>> [55222.406456] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [55222.407158] rsync           D    0  9830   9829 0x00004002
>> [55222.407776] Call Trace:
>> [55222.408450]  ? __schedule+0x3cf/0x680
>> [55222.409206]  ? bit_wait+0x50/0x50
>> [55222.409942]  schedule+0x39/0xa0
>> [55222.410658]  io_schedule+0x12/0x40
>> [55222.411346]  bit_wait_io+0xd/0x50
>> [55222.411946]  __wait_on_bit+0x66/0x90
>> [55222.412572]  ? bit_wait+0x50/0x50
>> [55222.413249]  out_of_line_wait_on_bit+0x8b/0xb0
>> [55222.413944]  ? init_wait_var_entry+0x40/0x40
>> [55222.414675]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>> [55222.415362]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>> [55222.416085]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>> [55222.416796]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>> [55222.417505]  do_writepages+0x1a/0x60
>> [55222.418243]  __filemap_fdatawrite_range+0xc8/0x100
>> [55222.418969]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>> [55222.419713]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>> [55222.420453]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>> [55222.421206]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [55222.421925]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [55222.422656]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>> [55222.423400]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>> [55222.424140]  ? retarget_shared_pending+0x70/0x70
>> [55222.424861]  do_fsync+0x38/0x60
>> [55222.425581]  __x64_sys_fdatasync+0x13/0x20
>> [55222.426308]  do_syscall_64+0x55/0x1a0
>> [55222.427025]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [55222.427732] RIP: 0033:0x7f1db3fc85f0
>> [55222.428396] Code: Bad RIP value.
>> [55222.429087] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>> 000000000000004b
>> [55222.429757] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>> 00007f1db3fc85f0
>> [55222.430451] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>> 0000000000000001
>> [55222.431159] RBP: 0000000000000001 R08: 0000000000000000 R09:
>> 0000000081c492ca
>> [55222.431856] R10: 0000000000000008 R11: 0000000000000246 R12:
>> 0000000000000028
>> [55222.432544] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>> 0000000000000000
>> [55343.234863] INFO: task rsync:9830 blocked for more than 724 seconds.
>> [55343.235887]       Not tainted 5.3.0-rc8 #1
>> [55343.236611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [55343.237213] rsync           D    0  9830   9829 0x00004002
>> [55343.237766] Call Trace:
>> [55343.238353]  ? __schedule+0x3cf/0x680
>> [55343.238971]  ? bit_wait+0x50/0x50
>> [55343.239592]  schedule+0x39/0xa0
>> [55343.240173]  io_schedule+0x12/0x40
>> [55343.240721]  bit_wait_io+0xd/0x50
>> [55343.241266]  __wait_on_bit+0x66/0x90
>> [55343.241835]  ? bit_wait+0x50/0x50
>> [55343.242418]  out_of_line_wait_on_bit+0x8b/0xb0
>> [55343.242938]  ? init_wait_var_entry+0x40/0x40
>> [55343.243496]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>> [55343.244090]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>> [55343.244720]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>> [55343.245296]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>> [55343.245843]  do_writepages+0x1a/0x60
>> [55343.246407]  __filemap_fdatawrite_range+0xc8/0x100
>> [55343.247014]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>> [55343.247631]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>> [55343.248186]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>> [55343.248743]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [55343.249326]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [55343.249931]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>> [55343.250562]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>> [55343.251139]  ? retarget_shared_pending+0x70/0x70
>> [55343.251628]  do_fsync+0x38/0x60
>> [55343.252208]  __x64_sys_fdatasync+0x13/0x20
>> [55343.252702]  do_syscall_64+0x55/0x1a0
>> [55343.253212]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [55343.253798] RIP: 0033:0x7f1db3fc85f0
>> [55343.254294] Code: Bad RIP value.
>> [55343.254821] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>> 000000000000004b
>> [55343.255404] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>> 00007f1db3fc85f0
>> [55343.255989] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>> 0000000000000001
>> [55343.256521] RBP: 0000000000000001 R08: 0000000000000000 R09:
>> 0000000081c492ca
>> [55343.257073] R10: 0000000000000008 R11: 0000000000000246 R12:
>> 0000000000000028
>> [55343.257649] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>> 0000000000000000
>> [55464.068704] INFO: task rsync:9830 blocked for more than 845 seconds.
>> [55464.069701]       Not tainted 5.3.0-rc8 #1
>> [55464.070655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [55464.071637] rsync           D    0  9830   9829 0x00004002
>> [55464.072637] Call Trace:
>> [55464.073623]  ? __schedule+0x3cf/0x680
>> [55464.074604]  ? bit_wait+0x50/0x50
>> [55464.075577]  schedule+0x39/0xa0
>> [55464.076531]  io_schedule+0x12/0x40
>> [55464.077480]  bit_wait_io+0xd/0x50
>> [55464.078400]  __wait_on_bit+0x66/0x90
>> [55464.079300]  ? bit_wait+0x50/0x50
>> [55464.080184]  out_of_line_wait_on_bit+0x8b/0xb0
>> [55464.081107]  ? init_wait_var_entry+0x40/0x40
>> [55464.082047]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>> [55464.083001]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>> [55464.083963]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>> [55464.084944]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>> [55464.085456]  do_writepages+0x1a/0x60
>> [55464.085840]  __filemap_fdatawrite_range+0xc8/0x100
>> [55464.086231]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>> [55464.086625]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>> [55464.087019]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>> [55464.087417]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [55464.087814]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [55464.088219]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>> [55464.088652]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>> [55464.089043]  ? retarget_shared_pending+0x70/0x70
>> [55464.089429]  do_fsync+0x38/0x60
>> [55464.089811]  __x64_sys_fdatasync+0x13/0x20
>> [55464.090190]  do_syscall_64+0x55/0x1a0
>> [55464.090568]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [55464.090944] RIP: 0033:0x7f1db3fc85f0
>> [55464.091321] Code: Bad RIP value.
>> [55464.091693] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>> 000000000000004b
>> [55464.092078] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>> 00007f1db3fc85f0
>> [55464.092467] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>> 0000000000000001
>> [55464.092863] RBP: 0000000000000001 R08: 0000000000000000 R09:
>> 0000000081c492ca
>> [55464.093254] R10: 0000000000000008 R11: 0000000000000246 R12:
>> 0000000000000028
>> [55464.093643] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>> 0000000000000000
>> [55584.902564] INFO: task rsync:9830 blocked for more than 966 seconds.
>> [55584.903748]       Not tainted 5.3.0-rc8 #1
>> [55584.904868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [55584.906023] rsync           D    0  9830   9829 0x00004002
>> [55584.907207] Call Trace:
>> [55584.908355]  ? __schedule+0x3cf/0x680
>> [55584.909507]  ? bit_wait+0x50/0x50
>> [55584.910682]  schedule+0x39/0xa0
>> [55584.911230]  io_schedule+0x12/0x40
>> [55584.911666]  bit_wait_io+0xd/0x50
>> [55584.912092]  __wait_on_bit+0x66/0x90
>> [55584.912510]  ? bit_wait+0x50/0x50
>> [55584.912924]  out_of_line_wait_on_bit+0x8b/0xb0
>> [55584.913343]  ? init_wait_var_entry+0x40/0x40
>> [55584.913795]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>> [55584.914242]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>> [55584.914698]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>> [55584.915152]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>> [55584.915588]  do_writepages+0x1a/0x60
>> [55584.916022]  __filemap_fdatawrite_range+0xc8/0x100
>> [55584.916474]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>> [55584.916928]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>> [55584.917386]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>> [55584.917844]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [55584.918300]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [55584.918772]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>> [55584.919233]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>> [55584.919679]  ? retarget_shared_pending+0x70/0x70
>> [55584.920122]  do_fsync+0x38/0x60
>> [55584.920559]  __x64_sys_fdatasync+0x13/0x20
>> [55584.920996]  do_syscall_64+0x55/0x1a0
>> [55584.921429]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [55584.921865] RIP: 0033:0x7f1db3fc85f0
>> [55584.922298] Code: Bad RIP value.
>> [55584.922734] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>> 000000000000004b
>> [55584.923174] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>> 00007f1db3fc85f0
>> [55584.923568] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>> 0000000000000001
>> [55584.923982] RBP: 0000000000000001 R08: 0000000000000000 R09:
>> 0000000081c492ca
>> [55584.924378] R10: 0000000000000008 R11: 0000000000000246 R12:
>> 0000000000000028
>> [55584.924774] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>> 0000000000000000
>> [55705.736285] INFO: task rsync:9830 blocked for more than 1087 seconds.
>> [55705.736999]       Not tainted 5.3.0-rc8 #1
>> [55705.737694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [55705.738411] rsync           D    0  9830   9829 0x00004002
>> [55705.739072] Call Trace:
>> [55705.739455]  ? __schedule+0x3cf/0x680
>> [55705.739837]  ? bit_wait+0x50/0x50
>> [55705.740215]  schedule+0x39/0xa0
>> [55705.740610]  io_schedule+0x12/0x40
>> [55705.741243]  bit_wait_io+0xd/0x50
>> [55705.741897]  __wait_on_bit+0x66/0x90
>> [55705.742524]  ? bit_wait+0x50/0x50
>> [55705.743131]  out_of_line_wait_on_bit+0x8b/0xb0
>> [55705.743750]  ? init_wait_var_entry+0x40/0x40
>> [55705.744128]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>> [55705.744766]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>> [55705.745440]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>> [55705.746118]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>> [55705.746753]  do_writepages+0x1a/0x60
>> [55705.747411]  __filemap_fdatawrite_range+0xc8/0x100
>> [55705.748106]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>> [55705.748807]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>> [55705.749495]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>> [55705.750190]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [55705.750890]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [55705.751580]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>> [55705.752293]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>> [55705.752981]  ? retarget_shared_pending+0x70/0x70
>> [55705.753686]  do_fsync+0x38/0x60
>> [55705.754340]  __x64_sys_fdatasync+0x13/0x20
>> [55705.755012]  do_syscall_64+0x55/0x1a0
>> [55705.755678]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [55705.756375] RIP: 0033:0x7f1db3fc85f0
>> [55705.757042] Code: Bad RIP value.
>> [55705.757690] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>> 000000000000004b
>> [55705.758300] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>> 00007f1db3fc85f0
>> [55705.758678] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>> 0000000000000001
>> [55705.759107] RBP: 0000000000000001 R08: 0000000000000000 R09:
>> 0000000081c492ca
>> [55705.759785] R10: 0000000000000008 R11: 0000000000000246 R12:
>> 0000000000000028
>> [55705.760471] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>> 0000000000000000
>> [55826.570182] INFO: task rsync:9830 blocked for more than 1208 seconds.
>> [55826.571349]       Not tainted 5.3.0-rc8 #1
>> [55826.572469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>> disables this message.
>> [55826.573618] rsync           D    0  9830   9829 0x00004002
>> [55826.574790] Call Trace:
>> [55826.575932]  ? __schedule+0x3cf/0x680
>> [55826.577079]  ? bit_wait+0x50/0x50
>> [55826.578233]  schedule+0x39/0xa0
>> [55826.579350]  io_schedule+0x12/0x40
>> [55826.580451]  bit_wait_io+0xd/0x50
>> [55826.581527]  __wait_on_bit+0x66/0x90
>> [55826.582596]  ? bit_wait+0x50/0x50
>> [55826.583178]  out_of_line_wait_on_bit+0x8b/0xb0
>> [55826.583550]  ? init_wait_var_entry+0x40/0x40
>> [55826.583953]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>> [55826.584356]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>> [55826.584755]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>> [55826.585155]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>> [55826.585547]  do_writepages+0x1a/0x60
>> [55826.585937]  __filemap_fdatawrite_range+0xc8/0x100
>> [55826.586352]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>> [55826.586761]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>> [55826.587171]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>> [55826.587581]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [55826.587990]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [55826.588406]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>> [55826.588818]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>> [55826.589219]  ? retarget_shared_pending+0x70/0x70
>> [55826.589617]  do_fsync+0x38/0x60
>> [55826.590011]  __x64_sys_fdatasync+0x13/0x20
>> [55826.590411]  do_syscall_64+0x55/0x1a0
>> [55826.590798]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [55826.591185] RIP: 0033:0x7f1db3fc85f0
>> [55826.591572] Code: Bad RIP value.
>> [55826.591952] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>> 000000000000004b
>> [55826.592347] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>> 00007f1db3fc85f0
>> [55826.592743] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>> 0000000000000001
>> [55826.593143] RBP: 0000000000000001 R08: 0000000000000000 R09:
>> 0000000081c492ca
>> [55826.593543] R10: 0000000000000008 R11: 0000000000000246 R12:
>> 0000000000000028
>> [55826.593941] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>> 0000000000000000
>>
>>
>> Greets,
>> Stefan
> 


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI)
  2019-09-11  7:09                                                 ` 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) Michal Hocko
  2019-09-11 14:09                                                   ` Stefan Priebe - Profihost AG
@ 2019-09-11 14:56                                                   ` Filipe Manana
  2019-09-11 15:39                                                     ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 61+ messages in thread
From: Filipe Manana @ 2019-09-11 14:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Stefan Priebe - Profihost AG, linux-mm, l.roehrs, cgroups,
	Johannes Weiner, Vlastimil Babka, Jens Axboe, linux-block,
	linux-fsdevel, David Sterba, linux-btrfs

On Wed, Sep 11, 2019 at 8:10 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> This smells like an IO/Btrfs issue to me. Cc'ing some more people.
>
> On Wed 11-09-19 08:12:28, Stefan Priebe - Profihost AG wrote:
> [...]
> > Sadly I'm running into issues with btrfs on 5.3-rc8 - the rsync process
> > on the backup disk completely hangs / is blocked at 100% I/O:
> > [54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds.
> > [54739.066973]       Not tainted 5.3.0-rc8 #1
> > [54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [54739.069065] rsync           D    0  9830   9829 0x00004002
> > [54739.070146] Call Trace:
> > [54739.071183]  ? __schedule+0x3cf/0x680
> > [54739.072202]  ? bit_wait+0x50/0x50
> > [54739.073196]  schedule+0x39/0xa0
> > [54739.074213]  io_schedule+0x12/0x40
> > [54739.075219]  bit_wait_io+0xd/0x50
> > [54739.076227]  __wait_on_bit+0x66/0x90
> > [54739.077239]  ? bit_wait+0x50/0x50
> > [54739.078273]  out_of_line_wait_on_bit+0x8b/0xb0
> > [54739.078741]  ? init_wait_var_entry+0x40/0x40
> > [54739.079162]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> > [54739.079557]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> > [54739.079956]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> > [54739.080357]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> > [54739.080748]  do_writepages+0x1a/0x60
> > [54739.081140]  __filemap_fdatawrite_range+0xc8/0x100
> > [54739.081558]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> > [54739.081985]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> > [54739.082412]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> > [54739.082847]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [54739.083280]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [54739.083725]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> > [54739.084170]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> > [54739.084608]  ? retarget_shared_pending+0x70/0x70
> > [54739.085049]  do_fsync+0x38/0x60
> > [54739.085494]  __x64_sys_fdatasync+0x13/0x20
> > [54739.085944]  do_syscall_64+0x55/0x1a0
> > [54739.086395]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [54739.086850] RIP: 0033:0x7f1db3fc85f0
> > [54739.087310] Code: Bad RIP value.

It's a regression introduced in 5.2.
Fix just sent: https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@kernel.org/T/#u

Thanks.
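
For anyone who wants to retest with that patch: lore.kernel.org
normally serves the raw message when /raw is appended to the message
URL, so it can be applied straight to a kernel git checkout. A minimal
sketch, assuming a tree checked out at the tested v5.3-rc8 (the /raw
URL form is an assumption, not something taken from this thread):

  # fetch the raw patch from lore and apply it to the current tree
  curl -sL 'https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@kernel.org/raw' | git am
  # rebuild and boot the patched kernel, then retry the rsync workload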

> > [54739.087772] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> > 000000000000004b
> > [54739.088249] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> > 00007f1db3fc85f0
> > [54739.088733] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> > 0000000000000001
> > [54739.089234] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > 0000000081c492ca
> > [54739.089722] R10: 0000000000000008 R11: 0000000000000246 R12:
> > 0000000000000028
> > [54739.090205] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> > 0000000000000000
> > [54859.899715] INFO: task rsync:9830 blocked for more than 241 seconds.
> > [54859.900863]       Not tainted 5.3.0-rc8 #1
> > [54859.901885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [54859.902909] rsync           D    0  9830   9829 0x00004002
> > [54859.903930] Call Trace:
> > [54859.904888]  ? __schedule+0x3cf/0x680
> > [54859.905831]  ? bit_wait+0x50/0x50
> > [54859.906751]  schedule+0x39/0xa0
> > [54859.907653]  io_schedule+0x12/0x40
> > [54859.908535]  bit_wait_io+0xd/0x50
> > [54859.909441]  __wait_on_bit+0x66/0x90
> > [54859.910306]  ? bit_wait+0x50/0x50
> > [54859.911177]  out_of_line_wait_on_bit+0x8b/0xb0
> > [54859.912043]  ? init_wait_var_entry+0x40/0x40
> > [54859.912727]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> > [54859.913113]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> > [54859.913501]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> > [54859.913894]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> > [54859.914276]  do_writepages+0x1a/0x60
> > [54859.914656]  __filemap_fdatawrite_range+0xc8/0x100
> > [54859.915052]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> > [54859.915449]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> > [54859.915855]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> > [54859.916256]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [54859.916658]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [54859.917078]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> > [54859.917497]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> > [54859.917903]  ? retarget_shared_pending+0x70/0x70
> > [54859.918307]  do_fsync+0x38/0x60
> > [54859.918707]  __x64_sys_fdatasync+0x13/0x20
> > [54859.919106]  do_syscall_64+0x55/0x1a0
> > [54859.919482]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [54859.919866] RIP: 0033:0x7f1db3fc85f0
> > [54859.920243] Code: Bad RIP value.
> > [54859.920614] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> > 000000000000004b
> > [54859.920997] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> > 00007f1db3fc85f0
> > [54859.921383] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> > 0000000000000001
> > [54859.921773] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > 0000000081c492ca
> > [54859.922165] R10: 0000000000000008 R11: 0000000000000246 R12:
> > 0000000000000028
> > [54859.922551] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> > 0000000000000000
> > [54980.733463] INFO: task rsync:9830 blocked for more than 362 seconds.
> > [54980.734061]       Not tainted 5.3.0-rc8 #1
> > [54980.734619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [54980.735209] rsync           D    0  9830   9829 0x00004002
> > [54980.735802] Call Trace:
> > [54980.736473]  ? __schedule+0x3cf/0x680
> > [54980.737054]  ? bit_wait+0x50/0x50
> > [54980.737664]  schedule+0x39/0xa0
> > [54980.738243]  io_schedule+0x12/0x40
> > [54980.738712]  bit_wait_io+0xd/0x50
> > [54980.739171]  __wait_on_bit+0x66/0x90
> > [54980.739623]  ? bit_wait+0x50/0x50
> > [54980.740073]  out_of_line_wait_on_bit+0x8b/0xb0
> > [54980.740548]  ? init_wait_var_entry+0x40/0x40
> > [54980.741033]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> > [54980.741579]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> > [54980.742076]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> > [54980.742560]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> > [54980.743045]  do_writepages+0x1a/0x60
> > [54980.743516]  __filemap_fdatawrite_range+0xc8/0x100
> > [54980.744019]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> > [54980.744513]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> > [54980.745026]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> > [54980.745563]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [54980.746073]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [54980.746575]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> > [54980.747074]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> > [54980.747575]  ? retarget_shared_pending+0x70/0x70
> > [54980.748059]  do_fsync+0x38/0x60
> > [54980.748539]  __x64_sys_fdatasync+0x13/0x20
> > [54980.749012]  do_syscall_64+0x55/0x1a0
> > [54980.749512]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [54980.749995] RIP: 0033:0x7f1db3fc85f0
> > [54980.750368] Code: Bad RIP value.
> > [54980.750735] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> > 000000000000004b
> > [54980.751117] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> > 00007f1db3fc85f0
> > [54980.751505] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> > 0000000000000001
> > [54980.751895] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > 0000000081c492ca
> > [54980.752291] R10: 0000000000000008 R11: 0000000000000246 R12:
> > 0000000000000028
> > [54980.752680] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> > 0000000000000000
> > [55101.567251] INFO: task rsync:9830 blocked for more than 483 seconds.
> > [55101.567775]       Not tainted 5.3.0-rc8 #1
> > [55101.568218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [55101.568649] rsync           D    0  9830   9829 0x00004002
> > [55101.569101] Call Trace:
> > [55101.569609]  ? __schedule+0x3cf/0x680
> > [55101.570052]  ? bit_wait+0x50/0x50
> > [55101.570504]  schedule+0x39/0xa0
> > [55101.570938]  io_schedule+0x12/0x40
> > [55101.571404]  bit_wait_io+0xd/0x50
> > [55101.571934]  __wait_on_bit+0x66/0x90
> > [55101.572601]  ? bit_wait+0x50/0x50
> > [55101.573235]  out_of_line_wait_on_bit+0x8b/0xb0
> > [55101.573599]  ? init_wait_var_entry+0x40/0x40
> > [55101.574008]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> > [55101.574394]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> > [55101.574783]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> > [55101.575184]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> > [55101.575580]  do_writepages+0x1a/0x60
> > [55101.575959]  __filemap_fdatawrite_range+0xc8/0x100
> > [55101.576351]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> > [55101.576746]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> > [55101.577144]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> > [55101.577543]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [55101.577939]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [55101.578343]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> > [55101.578746]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> > [55101.579139]  ? retarget_shared_pending+0x70/0x70
> > [55101.579543]  do_fsync+0x38/0x60
> > [55101.579928]  __x64_sys_fdatasync+0x13/0x20
> > [55101.580312]  do_syscall_64+0x55/0x1a0
> > [55101.580706]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [55101.581086] RIP: 0033:0x7f1db3fc85f0
> > [55101.581463] Code: Bad RIP value.
> > [55101.581834] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> > 000000000000004b
> > [55101.582219] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> > 00007f1db3fc85f0
> > [55101.582607] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> > 0000000000000001
> > [55101.582998] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > 0000000081c492ca
> > [55101.583397] R10: 0000000000000008 R11: 0000000000000246 R12:
> > 0000000000000028
> > [55101.583784] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> > 0000000000000000
> > [55222.405056] INFO: task rsync:9830 blocked for more than 604 seconds.
> > [55222.405773]       Not tainted 5.3.0-rc8 #1
> > [55222.406456] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [55222.407158] rsync           D    0  9830   9829 0x00004002
> > [55222.407776] Call Trace:
> > [55222.408450]  ? __schedule+0x3cf/0x680
> > [55222.409206]  ? bit_wait+0x50/0x50
> > [55222.409942]  schedule+0x39/0xa0
> > [55222.410658]  io_schedule+0x12/0x40
> > [55222.411346]  bit_wait_io+0xd/0x50
> > [55222.411946]  __wait_on_bit+0x66/0x90
> > [55222.412572]  ? bit_wait+0x50/0x50
> > [55222.413249]  out_of_line_wait_on_bit+0x8b/0xb0
> > [55222.413944]  ? init_wait_var_entry+0x40/0x40
> > [55222.414675]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> > [55222.415362]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> > [55222.416085]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> > [55222.416796]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> > [55222.417505]  do_writepages+0x1a/0x60
> > [55222.418243]  __filemap_fdatawrite_range+0xc8/0x100
> > [55222.418969]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> > [55222.419713]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> > [55222.420453]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> > [55222.421206]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [55222.421925]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [55222.422656]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> > [55222.423400]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> > [55222.424140]  ? retarget_shared_pending+0x70/0x70
> > [55222.424861]  do_fsync+0x38/0x60
> > [55222.425581]  __x64_sys_fdatasync+0x13/0x20
> > [55222.426308]  do_syscall_64+0x55/0x1a0
> > [55222.427025]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [55222.427732] RIP: 0033:0x7f1db3fc85f0
> > [55222.428396] Code: Bad RIP value.
> > [55222.429087] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> > 000000000000004b
> > [55222.429757] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> > 00007f1db3fc85f0
> > [55222.430451] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> > 0000000000000001
> > [55222.431159] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > 0000000081c492ca
> > [55222.431856] R10: 0000000000000008 R11: 0000000000000246 R12:
> > 0000000000000028
> > [55222.432544] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> > 0000000000000000
> > [55343.234863] INFO: task rsync:9830 blocked for more than 724 seconds.
> > [55343.235887]       Not tainted 5.3.0-rc8 #1
> > [55343.236611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [55343.237213] rsync           D    0  9830   9829 0x00004002
> > [55343.237766] Call Trace:
> > [55343.238353]  ? __schedule+0x3cf/0x680
> > [55343.238971]  ? bit_wait+0x50/0x50
> > [55343.239592]  schedule+0x39/0xa0
> > [55343.240173]  io_schedule+0x12/0x40
> > [55343.240721]  bit_wait_io+0xd/0x50
> > [55343.241266]  __wait_on_bit+0x66/0x90
> > [55343.241835]  ? bit_wait+0x50/0x50
> > [55343.242418]  out_of_line_wait_on_bit+0x8b/0xb0
> > [55343.242938]  ? init_wait_var_entry+0x40/0x40
> > [55343.243496]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> > [55343.244090]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> > [55343.244720]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> > [55343.245296]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> > [55343.245843]  do_writepages+0x1a/0x60
> > [55343.246407]  __filemap_fdatawrite_range+0xc8/0x100
> > [55343.247014]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> > [55343.247631]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> > [55343.248186]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> > [55343.248743]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [55343.249326]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [55343.249931]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> > [55343.250562]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> > [55343.251139]  ? retarget_shared_pending+0x70/0x70
> > [55343.251628]  do_fsync+0x38/0x60
> > [55343.252208]  __x64_sys_fdatasync+0x13/0x20
> > [55343.252702]  do_syscall_64+0x55/0x1a0
> > [55343.253212]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [55343.253798] RIP: 0033:0x7f1db3fc85f0
> > [55343.254294] Code: Bad RIP value.
> > [55343.254821] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> > 000000000000004b
> > [55343.255404] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> > 00007f1db3fc85f0
> > [55343.255989] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> > 0000000000000001
> > [55343.256521] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > 0000000081c492ca
> > [55343.257073] R10: 0000000000000008 R11: 0000000000000246 R12:
> > 0000000000000028
> > [55343.257649] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> > 0000000000000000
> > [55464.068704] INFO: task rsync:9830 blocked for more than 845 seconds.
> > [55464.069701]       Not tainted 5.3.0-rc8 #1
> > [55464.070655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [55464.071637] rsync           D    0  9830   9829 0x00004002
> > [55464.072637] Call Trace:
> > [55464.073623]  ? __schedule+0x3cf/0x680
> > [55464.074604]  ? bit_wait+0x50/0x50
> > [55464.075577]  schedule+0x39/0xa0
> > [55464.076531]  io_schedule+0x12/0x40
> > [55464.077480]  bit_wait_io+0xd/0x50
> > [55464.078400]  __wait_on_bit+0x66/0x90
> > [55464.079300]  ? bit_wait+0x50/0x50
> > [55464.080184]  out_of_line_wait_on_bit+0x8b/0xb0
> > [55464.081107]  ? init_wait_var_entry+0x40/0x40
> > [55464.082047]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> > [55464.083001]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> > [55464.083963]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> > [55464.084944]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> > [55464.085456]  do_writepages+0x1a/0x60
> > [55464.085840]  __filemap_fdatawrite_range+0xc8/0x100
> > [55464.086231]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> > [55464.086625]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> > [55464.087019]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> > [55464.087417]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [55464.087814]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [55464.088219]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> > [55464.088652]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> > [55464.089043]  ? retarget_shared_pending+0x70/0x70
> > [55464.089429]  do_fsync+0x38/0x60
> > [55464.089811]  __x64_sys_fdatasync+0x13/0x20
> > [55464.090190]  do_syscall_64+0x55/0x1a0
> > [55464.090568]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [55464.090944] RIP: 0033:0x7f1db3fc85f0
> > [55464.091321] Code: Bad RIP value.
> > [55464.091693] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> > 000000000000004b
> > [55464.092078] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> > 00007f1db3fc85f0
> > [55464.092467] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> > 0000000000000001
> > [55464.092863] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > 0000000081c492ca
> > [55464.093254] R10: 0000000000000008 R11: 0000000000000246 R12:
> > 0000000000000028
> > [55464.093643] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> > 0000000000000000
> > [55584.902564] INFO: task rsync:9830 blocked for more than 966 seconds.
> > [55584.903748]       Not tainted 5.3.0-rc8 #1
> > [55584.904868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [55584.906023] rsync           D    0  9830   9829 0x00004002
> > [55584.907207] Call Trace:
> > [55584.908355]  ? __schedule+0x3cf/0x680
> > [55584.909507]  ? bit_wait+0x50/0x50
> > [55584.910682]  schedule+0x39/0xa0
> > [55584.911230]  io_schedule+0x12/0x40
> > [55584.911666]  bit_wait_io+0xd/0x50
> > [55584.912092]  __wait_on_bit+0x66/0x90
> > [55584.912510]  ? bit_wait+0x50/0x50
> > [55584.912924]  out_of_line_wait_on_bit+0x8b/0xb0
> > [55584.913343]  ? init_wait_var_entry+0x40/0x40
> > [55584.913795]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> > [55584.914242]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> > [55584.914698]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> > [55584.915152]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> > [55584.915588]  do_writepages+0x1a/0x60
> > [55584.916022]  __filemap_fdatawrite_range+0xc8/0x100
> > [55584.916474]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> > [55584.916928]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> > [55584.917386]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> > [55584.917844]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [55584.918300]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [55584.918772]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> > [55584.919233]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> > [55584.919679]  ? retarget_shared_pending+0x70/0x70
> > [55584.920122]  do_fsync+0x38/0x60
> > [55584.920559]  __x64_sys_fdatasync+0x13/0x20
> > [55584.920996]  do_syscall_64+0x55/0x1a0
> > [55584.921429]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [55584.921865] RIP: 0033:0x7f1db3fc85f0
> > [55584.922298] Code: Bad RIP value.
> > [55584.922734] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> > 000000000000004b
> > [55584.923174] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> > 00007f1db3fc85f0
> > [55584.923568] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> > 0000000000000001
> > [55584.923982] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > 0000000081c492ca
> > [55584.924378] R10: 0000000000000008 R11: 0000000000000246 R12:
> > 0000000000000028
> > [55584.924774] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> > 0000000000000000
> > [55705.736285] INFO: task rsync:9830 blocked for more than 1087 seconds.
> > [55705.736999]       Not tainted 5.3.0-rc8 #1
> > [55705.737694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [55705.738411] rsync           D    0  9830   9829 0x00004002
> > [55705.739072] Call Trace:
> > [55705.739455]  ? __schedule+0x3cf/0x680
> > [55705.739837]  ? bit_wait+0x50/0x50
> > [55705.740215]  schedule+0x39/0xa0
> > [55705.740610]  io_schedule+0x12/0x40
> > [55705.741243]  bit_wait_io+0xd/0x50
> > [55705.741897]  __wait_on_bit+0x66/0x90
> > [55705.742524]  ? bit_wait+0x50/0x50
> > [55705.743131]  out_of_line_wait_on_bit+0x8b/0xb0
> > [55705.743750]  ? init_wait_var_entry+0x40/0x40
> > [55705.744128]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> > [55705.744766]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> > [55705.745440]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> > [55705.746118]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> > [55705.746753]  do_writepages+0x1a/0x60
> > [55705.747411]  __filemap_fdatawrite_range+0xc8/0x100
> > [55705.748106]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> > [55705.748807]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> > [55705.749495]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> > [55705.750190]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [55705.750890]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [55705.751580]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> > [55705.752293]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> > [55705.752981]  ? retarget_shared_pending+0x70/0x70
> > [55705.753686]  do_fsync+0x38/0x60
> > [55705.754340]  __x64_sys_fdatasync+0x13/0x20
> > [55705.755012]  do_syscall_64+0x55/0x1a0
> > [55705.755678]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [55705.756375] RIP: 0033:0x7f1db3fc85f0
> > [55705.757042] Code: Bad RIP value.
> > [55705.757690] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> > 000000000000004b
> > [55705.758300] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> > 00007f1db3fc85f0
> > [55705.758678] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> > 0000000000000001
> > [55705.759107] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > 0000000081c492ca
> > [55705.759785] R10: 0000000000000008 R11: 0000000000000246 R12:
> > 0000000000000028
> > [55705.760471] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> > 0000000000000000
> > [55826.570182] INFO: task rsync:9830 blocked for more than 1208 seconds.
> > [55826.571349]       Not tainted 5.3.0-rc8 #1
> > [55826.572469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [55826.573618] rsync           D    0  9830   9829 0x00004002
> > [55826.574790] Call Trace:
> > [55826.575932]  ? __schedule+0x3cf/0x680
> > [55826.577079]  ? bit_wait+0x50/0x50
> > [55826.578233]  schedule+0x39/0xa0
> > [55826.579350]  io_schedule+0x12/0x40
> > [55826.580451]  bit_wait_io+0xd/0x50
> > [55826.581527]  __wait_on_bit+0x66/0x90
> > [55826.582596]  ? bit_wait+0x50/0x50
> > [55826.583178]  out_of_line_wait_on_bit+0x8b/0xb0
> > [55826.583550]  ? init_wait_var_entry+0x40/0x40
> > [55826.583953]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> > [55826.584356]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> > [55826.584755]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> > [55826.585155]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> > [55826.585547]  do_writepages+0x1a/0x60
> > [55826.585937]  __filemap_fdatawrite_range+0xc8/0x100
> > [55826.586352]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> > [55826.586761]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> > [55826.587171]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> > [55826.587581]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [55826.587990]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [55826.588406]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> > [55826.588818]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> > [55826.589219]  ? retarget_shared_pending+0x70/0x70
> > [55826.589617]  do_fsync+0x38/0x60
> > [55826.590011]  __x64_sys_fdatasync+0x13/0x20
> > [55826.590411]  do_syscall_64+0x55/0x1a0
> > [55826.590798]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [55826.591185] RIP: 0033:0x7f1db3fc85f0
> > [55826.591572] Code: Bad RIP value.
> > [55826.591952] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> > 000000000000004b
> > [55826.592347] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> > 00007f1db3fc85f0
> > [55826.592743] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> > 0000000000000001
> > [55826.593143] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > 0000000081c492ca
> > [55826.593543] R10: 0000000000000008 R11: 0000000000000246 R12:
> > 0000000000000028
> > [55826.593941] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> > 0000000000000000
> >
> >
> > Greets,
> > Stefan
>
> --
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI)
  2019-09-11 14:56                                                   ` Filipe Manana
@ 2019-09-11 15:39                                                     ` Stefan Priebe - Profihost AG
  2019-09-11 15:56                                                       ` Filipe Manana
  0 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-11 15:39 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner,
	Vlastimil Babka, Jens Axboe, linux-block, linux-fsdevel,
	David Sterba, linux-btrfs

Thanks! Is this the same issue as the one I hit on the 5.3-rc8 I tested? The stack trace looked different to me.

Stefan

> On 11.09.2019 at 16:56, Filipe Manana <fdmanana@kernel.org> wrote:
> 
>> On Wed, Sep 11, 2019 at 8:10 AM Michal Hocko <mhocko@kernel.org> wrote:
>> 
>> This smells like an IO/Btrfs issue to me. Cc'ing some more people.
>> 
>>> On Wed 11-09-19 08:12:28, Stefan Priebe - Profihost AG wrote:
>>> [...]
>>> Sadly I'm running into issues with btrfs on 5.3-rc8 - the rsync process
>>> on the backup disk completely hangs / is blocked at 100% I/O:
>>> [54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds.
>>> [54739.066973]       Not tainted 5.3.0-rc8 #1
>>> [54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [54739.069065] rsync           D    0  9830   9829 0x00004002
>>> [54739.070146] Call Trace:
>>> [54739.071183]  ? __schedule+0x3cf/0x680
>>> [54739.072202]  ? bit_wait+0x50/0x50
>>> [54739.073196]  schedule+0x39/0xa0
>>> [54739.074213]  io_schedule+0x12/0x40
>>> [54739.075219]  bit_wait_io+0xd/0x50
>>> [54739.076227]  __wait_on_bit+0x66/0x90
>>> [54739.077239]  ? bit_wait+0x50/0x50
>>> [54739.078273]  out_of_line_wait_on_bit+0x8b/0xb0
>>> [54739.078741]  ? init_wait_var_entry+0x40/0x40
>>> [54739.079162]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>>> [54739.079557]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>>> [54739.079956]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>>> [54739.080357]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>>> [54739.080748]  do_writepages+0x1a/0x60
>>> [54739.081140]  __filemap_fdatawrite_range+0xc8/0x100
>>> [54739.081558]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>>> [54739.081985]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>>> [54739.082412]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>>> [54739.082847]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>> [54739.083280]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>> [54739.083725]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>>> [54739.084170]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>>> [54739.084608]  ? retarget_shared_pending+0x70/0x70
>>> [54739.085049]  do_fsync+0x38/0x60
>>> [54739.085494]  __x64_sys_fdatasync+0x13/0x20
>>> [54739.085944]  do_syscall_64+0x55/0x1a0
>>> [54739.086395]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [54739.086850] RIP: 0033:0x7f1db3fc85f0
>>> [54739.087310] Code: Bad RIP value.
> 
> It's a regression introduced in 5.2.
> Fix just sent: https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@kernel.org/T/#u
> 
> Thanks.
> 
>>> [54739.087772] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>>> 000000000000004b
>>> [54739.088249] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>>> 00007f1db3fc85f0
>>> [54739.088733] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>>> 0000000000000001
>>> [54739.089234] RBP: 0000000000000001 R08: 0000000000000000 R09:
>>> 0000000081c492ca
>>> [54739.089722] R10: 0000000000000008 R11: 0000000000000246 R12:
>>> 0000000000000028
>>> [54739.090205] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>>> 0000000000000000
>>> [54859.899715] INFO: task rsync:9830 blocked for more than 241 seconds.
>>> [54859.900863]       Not tainted 5.3.0-rc8 #1
>>> [54859.901885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [54859.902909] rsync           D    0  9830   9829 0x00004002
>>> [54859.903930] Call Trace:
>>> [54859.904888]  ? __schedule+0x3cf/0x680
>>> [54859.905831]  ? bit_wait+0x50/0x50
>>> [54859.906751]  schedule+0x39/0xa0
>>> [54859.907653]  io_schedule+0x12/0x40
>>> [54859.908535]  bit_wait_io+0xd/0x50
>>> [54859.909441]  __wait_on_bit+0x66/0x90
>>> [54859.910306]  ? bit_wait+0x50/0x50
>>> [54859.911177]  out_of_line_wait_on_bit+0x8b/0xb0
>>> [54859.912043]  ? init_wait_var_entry+0x40/0x40
>>> [54859.912727]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>>> [54859.913113]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>>> [54859.913501]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>>> [54859.913894]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>>> [54859.914276]  do_writepages+0x1a/0x60
>>> [54859.914656]  __filemap_fdatawrite_range+0xc8/0x100
>>> [54859.915052]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>>> [54859.915449]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>>> [54859.915855]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>>> [54859.916256]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>> [54859.916658]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>> [54859.917078]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>>> [54859.917497]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>>> [54859.917903]  ? retarget_shared_pending+0x70/0x70
>>> [54859.918307]  do_fsync+0x38/0x60
>>> [54859.918707]  __x64_sys_fdatasync+0x13/0x20
>>> [54859.919106]  do_syscall_64+0x55/0x1a0
>>> [54859.919482]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [54859.919866] RIP: 0033:0x7f1db3fc85f0
>>> [54859.920243] Code: Bad RIP value.
>>> [54859.920614] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>>> 000000000000004b
>>> [54859.920997] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>>> 00007f1db3fc85f0
>>> [54859.921383] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>>> 0000000000000001
>>> [54859.921773] RBP: 0000000000000001 R08: 0000000000000000 R09:
>>> 0000000081c492ca
>>> [54859.922165] R10: 0000000000000008 R11: 0000000000000246 R12:
>>> 0000000000000028
>>> [54859.922551] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>>> 0000000000000000
>>> [54980.733463] INFO: task rsync:9830 blocked for more than 362 seconds.
>>> [54980.734061]       Not tainted 5.3.0-rc8 #1
>>> [54980.734619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [54980.735209] rsync           D    0  9830   9829 0x00004002
>>> [54980.735802] Call Trace:
>>> [54980.736473]  ? __schedule+0x3cf/0x680
>>> [54980.737054]  ? bit_wait+0x50/0x50
>>> [54980.737664]  schedule+0x39/0xa0
>>> [54980.738243]  io_schedule+0x12/0x40
>>> [54980.738712]  bit_wait_io+0xd/0x50
>>> [54980.739171]  __wait_on_bit+0x66/0x90
>>> [54980.739623]  ? bit_wait+0x50/0x50
>>> [54980.740073]  out_of_line_wait_on_bit+0x8b/0xb0
>>> [54980.740548]  ? init_wait_var_entry+0x40/0x40
>>> [54980.741033]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>>> [54980.741579]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>>> [54980.742076]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>>> [54980.742560]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>>> [54980.743045]  do_writepages+0x1a/0x60
>>> [54980.743516]  __filemap_fdatawrite_range+0xc8/0x100
>>> [54980.744019]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>>> [54980.744513]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>>> [54980.745026]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>>> [54980.745563]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>> [54980.746073]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>> [54980.746575]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>>> [54980.747074]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>>> [54980.747575]  ? retarget_shared_pending+0x70/0x70
>>> [54980.748059]  do_fsync+0x38/0x60
>>> [54980.748539]  __x64_sys_fdatasync+0x13/0x20
>>> [54980.749012]  do_syscall_64+0x55/0x1a0
>>> [54980.749512]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [54980.749995] RIP: 0033:0x7f1db3fc85f0
>>> [54980.750368] Code: Bad RIP value.
>>> [54980.750735] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>>> 000000000000004b
>>> [54980.751117] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>>> 00007f1db3fc85f0
>>> [54980.751505] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>>> 0000000000000001
>>> [54980.751895] RBP: 0000000000000001 R08: 0000000000000000 R09:
>>> 0000000081c492ca
>>> [54980.752291] R10: 0000000000000008 R11: 0000000000000246 R12:
>>> 0000000000000028
>>> [54980.752680] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>>> 0000000000000000
>>> [ the same hung-task report for rsync:9830 repeats roughly every 120
>>> seconds, at 483 s through 1208 s; identical traces trimmed ]
>>> 
>>> 
>>> Greets,
>>> Stefan
>> 
>> --
>> Michal Hocko
>> SUSE Labs



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI)
  2019-09-11 15:39                                                     ` Stefan Priebe - Profihost AG
@ 2019-09-11 15:56                                                       ` Filipe Manana
  2019-09-11 16:15                                                         ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 61+ messages in thread
From: Filipe Manana @ 2019-09-11 15:56 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner,
	Vlastimil Babka, Jens Axboe, linux-block, linux-fsdevel,
	David Sterba, linux-btrfs

On Wed, Sep 11, 2019 at 4:39 PM Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
>
> Thanks! Is this the same issue as the one I tested on 5.3-rc8? The stack trace looked different to me.

I don't know; I can't see that backtrace. The thread was split, and
I've only seen the one sent to the btrfs list.
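
A note for readers of the archive: the fix referenced in the quoted
message below can be applied to a local kernel tree straight from lore.
A minimal sketch, assuming lore's per-message "raw" endpoint and a clean
checkout:

    # fetch the single-patch mbox and apply it as a commit; the message-id
    # comes from the URL quoted below, with the /T/#u thread suffix dropped
    curl -sL 'https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@kernel.org/raw' \
        | git am    # "git am --abort" backs out if the patch does not apply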

>
> Stefan
>
> > On 11.09.2019 at 16:56, Filipe Manana <fdmanana@kernel.org> wrote:
> >
> >> On Wed, Sep 11, 2019 at 8:10 AM Michal Hocko <mhocko@kernel.org> wrote:
> >>
> >> This smells like an IO/Btrfs issue to me. Cc'ing some more people.
> >>
> >>> On Wed 11-09-19 08:12:28, Stefan Priebe - Profihost AG wrote:
> >>> [...]
> >>> Sadly I'm running into issues with btrfs on 5.3-rc8 - the rsync process
> >>> on the backup disk completely hangs / is blocked at 100% I/O:
> >>> [54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds.
> >>> [54739.066973]       Not tainted 5.3.0-rc8 #1
> >>> [54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >>> disables this message.
> >>> [54739.069065] rsync           D    0  9830   9829 0x00004002
> >>> [54739.070146] Call Trace:
> >>> [54739.071183]  ? __schedule+0x3cf/0x680
> >>> [54739.072202]  ? bit_wait+0x50/0x50
> >>> [54739.073196]  schedule+0x39/0xa0
> >>> [54739.074213]  io_schedule+0x12/0x40
> >>> [54739.075219]  bit_wait_io+0xd/0x50
> >>> [54739.076227]  __wait_on_bit+0x66/0x90
> >>> [54739.077239]  ? bit_wait+0x50/0x50
> >>> [54739.078273]  out_of_line_wait_on_bit+0x8b/0xb0
> >>> [54739.078741]  ? init_wait_var_entry+0x40/0x40
> >>> [54739.079162]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> >>> [54739.079557]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> >>> [54739.079956]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> >>> [54739.080357]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> >>> [54739.080748]  do_writepages+0x1a/0x60
> >>> [54739.081140]  __filemap_fdatawrite_range+0xc8/0x100
> >>> [54739.081558]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> >>> [54739.081985]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> >>> [54739.082412]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> >>> [54739.082847]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>> [54739.083280]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>> [54739.083725]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> >>> [54739.084170]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> >>> [54739.084608]  ? retarget_shared_pending+0x70/0x70
> >>> [54739.085049]  do_fsync+0x38/0x60
> >>> [54739.085494]  __x64_sys_fdatasync+0x13/0x20
> >>> [54739.085944]  do_syscall_64+0x55/0x1a0
> >>> [54739.086395]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>> [54739.086850] RIP: 0033:0x7f1db3fc85f0
> >>> [54739.087310] Code: Bad RIP value.
> >
> > It's a regression introduced in 5.2.
> > The fix was just sent: https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@kernel.org/T/#u
> >
> > Thanks.
> >
> >>> [54739.087772] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> >>> 000000000000004b
> >>> [54739.088249] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> >>> 00007f1db3fc85f0
> >>> [54739.088733] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> >>> 0000000000000001
> >>> [54739.089234] RBP: 0000000000000001 R08: 0000000000000000 R09:
> >>> 0000000081c492ca
> >>> [54739.089722] R10: 0000000000000008 R11: 0000000000000246 R12:
> >>> 0000000000000028
> >>> [54739.090205] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> >>> 0000000000000000
> >>> [ the same hung-task report for rsync:9830 repeats roughly every 120
> >>> seconds, at 241 s through 1208 s; identical traces trimmed ]
> >>>
> >>>
> >>> Greets,
> >>> Stefan
> >>
> >> --
> >> Michal Hocko
> >> SUSE Labs
>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI)
  2019-09-11 15:56                                                       ` Filipe Manana
@ 2019-09-11 16:15                                                         ` Stefan Priebe - Profihost AG
  2019-09-11 16:19                                                           ` Filipe Manana
  0 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-11 16:15 UTC (permalink / raw)
  To: Filipe Manana
  Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner,
	Vlastimil Babka, Jens Axboe, linux-block, linux-fsdevel,
	David Sterba, linux-btrfs

On 11.09.19 at 17:56, Filipe Manana wrote:
> On Wed, Sep 11, 2019 at 4:39 PM Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>>
>> Thanks! Is this the same issue as the one I tested on 5.3-rc8? The stack trace looked different to me.
> 
> I don't know; I can't see that backtrace. The thread was split, and
> I've only seen the one sent to the btrfs list.

Hi,

Strange.

This is the 5.3-rc8 stack trace:
https://lore.kernel.org/linux-mm/d07620d9-4967-40fe-fa0f-be51f2459dc5@profihost.ag/

and this is the 5.2.14 one:
https://lore.kernel.org/linux-mm/289fbe71-0472-520f-64e2-b6d07ced5436@profihost.ag/

Greets,
Stefan
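
For context when comparing the two traces: the 5.3-rc8 one (quoted
below) shows rsync in uninterruptible sleep inside fdatasync(2). On
btrfs, btrfs_sync_file() commits the current transaction, and the
commit then sits in lock_extent_buffer_for_io(), waiting on a btree
extent buffer's writeback bit that never clears here. khungtaskd
re-reports the same blocked task roughly every 120 seconds (the default
hung_task_timeout_secs), hence the 120 s / 241 s / ... / 1208 s cadence.
A minimal sketch, not from the thread, for inspecting such a task on a
live machine (assumes root; 9830 is the rsync PID from the reports):

    cat /proc/9830/stack           # kernel-side stack of the blocked task
    echo w > /proc/sysrq-trigger   # or dump every D-state task via sysrq-w
    dmesg | tail -n 60             # the sysrq output lands in the kernel log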

>>
>> Stefan
>>
>>> On 11.09.2019 at 16:56, Filipe Manana <fdmanana@kernel.org> wrote:
>>>
>>>> On Wed, Sep 11, 2019 at 8:10 AM Michal Hocko <mhocko@kernel.org> wrote:
>>>>
>>>> This smells like an IO/Btrfs issue to me. Cc'ing some more people.
>>>>
>>>>> On Wed 11-09-19 08:12:28, Stefan Priebe - Profihost AG wrote:
>>>>> [...]
>>>>> Sadly I'm running into issues with btrfs on 5.3-rc8 - the rsync process
>>>>> on the backup disk completely hangs / is blocked at 100% I/O:
>>>>> [54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds.
>>>>> [54739.066973]       Not tainted 5.3.0-rc8 #1
>>>>> [54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>> disables this message.
>>>>> [54739.069065] rsync           D    0  9830   9829 0x00004002
>>>>> [54739.070146] Call Trace:
>>>>> [54739.071183]  ? __schedule+0x3cf/0x680
>>>>> [54739.072202]  ? bit_wait+0x50/0x50
>>>>> [54739.073196]  schedule+0x39/0xa0
>>>>> [54739.074213]  io_schedule+0x12/0x40
>>>>> [54739.075219]  bit_wait_io+0xd/0x50
>>>>> [54739.076227]  __wait_on_bit+0x66/0x90
>>>>> [54739.077239]  ? bit_wait+0x50/0x50
>>>>> [54739.078273]  out_of_line_wait_on_bit+0x8b/0xb0
>>>>> [54739.078741]  ? init_wait_var_entry+0x40/0x40
>>>>> [54739.079162]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>>>>> [54739.079557]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>>>>> [54739.079956]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>>>>> [54739.080357]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>>>>> [54739.080748]  do_writepages+0x1a/0x60
>>>>> [54739.081140]  __filemap_fdatawrite_range+0xc8/0x100
>>>>> [54739.081558]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>>>>> [54739.081985]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>>>>> [54739.082412]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>>>>> [54739.082847]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>>>> [54739.083280]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>>>> [54739.083725]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>>>>> [54739.084170]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>>>>> [54739.084608]  ? retarget_shared_pending+0x70/0x70
>>>>> [54739.085049]  do_fsync+0x38/0x60
>>>>> [54739.085494]  __x64_sys_fdatasync+0x13/0x20
>>>>> [54739.085944]  do_syscall_64+0x55/0x1a0
>>>>> [54739.086395]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [54739.086850] RIP: 0033:0x7f1db3fc85f0
>>>>> [54739.087310] Code: Bad RIP value.
>>>
>>> It's a regression introduced in 5.2.
>>> The fix was just sent: https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@kernel.org/T/#u
>>>
>>> Thanks.
>>>
>>>>> [54739.087772] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>>>>> 000000000000004b
>>>>> [54739.088249] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>>>>> 00007f1db3fc85f0
>>>>> [54739.088733] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>>>>> 0000000000000001
>>>>> [54739.089234] RBP: 0000000000000001 R08: 0000000000000000 R09:
>>>>> 0000000081c492ca
>>>>> [54739.089722] R10: 0000000000000008 R11: 0000000000000246 R12:
>>>>> 0000000000000028
>>>>> [54739.090205] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>>>>> 0000000000000000
>>>>> [ the same hung-task report for rsync:9830 repeats roughly every 120
>>>>> seconds; the 241 s through 724 s traces are identical and trimmed ]
>>>>> [55464.068704] INFO: task rsync:9830 blocked for more than 845 seconds.
>>>>> [55464.069701]       Not tainted 5.3.0-rc8 #1
>>>>> [55464.070655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>> disables this message.
>>>>> [55464.071637] rsync           D    0  9830   9829 0x00004002
>>>>> [55464.072637] Call Trace:
>>>>> [55464.073623]  ? __schedule+0x3cf/0x680
>>>>> [55464.074604]  ? bit_wait+0x50/0x50
>>>>> [55464.075577]  schedule+0x39/0xa0
>>>>> [55464.076531]  io_schedule+0x12/0x40
>>>>> [55464.077480]  bit_wait_io+0xd/0x50
>>>>> [55464.078400]  __wait_on_bit+0x66/0x90
>>>>> [55464.079300]  ? bit_wait+0x50/0x50
>>>>> [55464.080184]  out_of_line_wait_on_bit+0x8b/0xb0
>>>>> [55464.081107]  ? init_wait_var_entry+0x40/0x40
>>>>> [55464.082047]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>>>>> [55464.083001]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>>>>> [55464.083963]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>>>>> [55464.084944]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>>>>> [55464.085456]  do_writepages+0x1a/0x60
>>>>> [55464.085840]  __filemap_fdatawrite_range+0xc8/0x100
>>>>> [55464.086231]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>>>>> [55464.086625]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>>>>> [55464.087019]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>>>>> [55464.087417]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>>>> [55464.087814]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>>>> [55464.088219]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>>>>> [55464.088652]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>>>>> [55464.089043]  ? retarget_shared_pending+0x70/0x70
>>>>> [55464.089429]  do_fsync+0x38/0x60
>>>>> [55464.089811]  __x64_sys_fdatasync+0x13/0x20
>>>>> [55464.090190]  do_syscall_64+0x55/0x1a0
>>>>> [55464.090568]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [55464.090944] RIP: 0033:0x7f1db3fc85f0
>>>>> [55464.091321] Code: Bad RIP value.
>>>>> [55464.091693] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>>>>> 000000000000004b
>>>>> [55464.092078] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>>>>> 00007f1db3fc85f0
>>>>> [55464.092467] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>>>>> 0000000000000001
>>>>> [55464.092863] RBP: 0000000000000001 R08: 0000000000000000 R09:
>>>>> 0000000081c492ca
>>>>> [55464.093254] R10: 0000000000000008 R11: 0000000000000246 R12:
>>>>> 0000000000000028
>>>>> [55464.093643] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>>>>> 0000000000000000
>>>>> [55584.902564] INFO: task rsync:9830 blocked for more than 966 seconds.
>>>>> [55584.903748]       Not tainted 5.3.0-rc8 #1
>>>>> [55584.904868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>> disables this message.
>>>>> [55584.906023] rsync           D    0  9830   9829 0x00004002
>>>>> [55584.907207] Call Trace:
>>>>> [55584.908355]  ? __schedule+0x3cf/0x680
>>>>> [55584.909507]  ? bit_wait+0x50/0x50
>>>>> [55584.910682]  schedule+0x39/0xa0
>>>>> [55584.911230]  io_schedule+0x12/0x40
>>>>> [55584.911666]  bit_wait_io+0xd/0x50
>>>>> [55584.912092]  __wait_on_bit+0x66/0x90
>>>>> [55584.912510]  ? bit_wait+0x50/0x50
>>>>> [55584.912924]  out_of_line_wait_on_bit+0x8b/0xb0
>>>>> [55584.913343]  ? init_wait_var_entry+0x40/0x40
>>>>> [55584.913795]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>>>>> [55584.914242]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>>>>> [55584.914698]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>>>>> [55584.915152]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>>>>> [55584.915588]  do_writepages+0x1a/0x60
>>>>> [55584.916022]  __filemap_fdatawrite_range+0xc8/0x100
>>>>> [55584.916474]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>>>>> [55584.916928]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>>>>> [55584.917386]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>>>>> [55584.917844]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>>>> [55584.918300]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>>>> [55584.918772]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>>>>> [55584.919233]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>>>>> [55584.919679]  ? retarget_shared_pending+0x70/0x70
>>>>> [55584.920122]  do_fsync+0x38/0x60
>>>>> [55584.920559]  __x64_sys_fdatasync+0x13/0x20
>>>>> [55584.920996]  do_syscall_64+0x55/0x1a0
>>>>> [55584.921429]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [55584.921865] RIP: 0033:0x7f1db3fc85f0
>>>>> [55584.922298] Code: Bad RIP value.
>>>>> [55584.922734] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>>>>> 000000000000004b
>>>>> [55584.923174] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>>>>> 00007f1db3fc85f0
>>>>> [55584.923568] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>>>>> 0000000000000001
>>>>> [55584.923982] RBP: 0000000000000001 R08: 0000000000000000 R09:
>>>>> 0000000081c492ca
>>>>> [55584.924378] R10: 0000000000000008 R11: 0000000000000246 R12:
>>>>> 0000000000000028
>>>>> [55584.924774] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>>>>> 0000000000000000
>>>>> [55705.736285] INFO: task rsync:9830 blocked for more than 1087 seconds.
>>>>> [55705.736999]       Not tainted 5.3.0-rc8 #1
>>>>> [55705.737694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>> disables this message.
>>>>> [55705.738411] rsync           D    0  9830   9829 0x00004002
>>>>> [55705.739072] Call Trace:
>>>>> [55705.739455]  ? __schedule+0x3cf/0x680
>>>>> [55705.739837]  ? bit_wait+0x50/0x50
>>>>> [55705.740215]  schedule+0x39/0xa0
>>>>> [55705.740610]  io_schedule+0x12/0x40
>>>>> [55705.741243]  bit_wait_io+0xd/0x50
>>>>> [55705.741897]  __wait_on_bit+0x66/0x90
>>>>> [55705.742524]  ? bit_wait+0x50/0x50
>>>>> [55705.743131]  out_of_line_wait_on_bit+0x8b/0xb0
>>>>> [55705.743750]  ? init_wait_var_entry+0x40/0x40
>>>>> [55705.744128]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>>>>> [55705.744766]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>>>>> [55705.745440]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>>>>> [55705.746118]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>>>>> [55705.746753]  do_writepages+0x1a/0x60
>>>>> [55705.747411]  __filemap_fdatawrite_range+0xc8/0x100
>>>>> [55705.748106]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>>>>> [55705.748807]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>>>>> [55705.749495]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>>>>> [55705.750190]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>>>> [55705.750890]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>>>> [55705.751580]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>>>>> [55705.752293]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>>>>> [55705.752981]  ? retarget_shared_pending+0x70/0x70
>>>>> [55705.753686]  do_fsync+0x38/0x60
>>>>> [55705.754340]  __x64_sys_fdatasync+0x13/0x20
>>>>> [55705.755012]  do_syscall_64+0x55/0x1a0
>>>>> [55705.755678]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [55705.756375] RIP: 0033:0x7f1db3fc85f0
>>>>> [55705.757042] Code: Bad RIP value.
>>>>> [55705.757690] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>>>>> 000000000000004b
>>>>> [55705.758300] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>>>>> 00007f1db3fc85f0
>>>>> [55705.758678] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>>>>> 0000000000000001
>>>>> [55705.759107] RBP: 0000000000000001 R08: 0000000000000000 R09:
>>>>> 0000000081c492ca
>>>>> [55705.759785] R10: 0000000000000008 R11: 0000000000000246 R12:
>>>>> 0000000000000028
>>>>> [55705.760471] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>>>>> 0000000000000000
>>>>> [55826.570182] INFO: task rsync:9830 blocked for more than 1208 seconds.
>>>>> [55826.571349]       Not tainted 5.3.0-rc8 #1
>>>>> [55826.572469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>>>> disables this message.
>>>>> [55826.573618] rsync           D    0  9830   9829 0x00004002
>>>>> [55826.574790] Call Trace:
>>>>> [55826.575932]  ? __schedule+0x3cf/0x680
>>>>> [55826.577079]  ? bit_wait+0x50/0x50
>>>>> [55826.578233]  schedule+0x39/0xa0
>>>>> [55826.579350]  io_schedule+0x12/0x40
>>>>> [55826.580451]  bit_wait_io+0xd/0x50
>>>>> [55826.581527]  __wait_on_bit+0x66/0x90
>>>>> [55826.582596]  ? bit_wait+0x50/0x50
>>>>> [55826.583178]  out_of_line_wait_on_bit+0x8b/0xb0
>>>>> [55826.583550]  ? init_wait_var_entry+0x40/0x40
>>>>> [55826.583953]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
>>>>> [55826.584356]  btree_write_cache_pages+0x17d/0x350 [btrfs]
>>>>> [55826.584755]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
>>>>> [55826.585155]  ? merge_state.part.47+0x3f/0x160 [btrfs]
>>>>> [55826.585547]  do_writepages+0x1a/0x60
>>>>> [55826.585937]  __filemap_fdatawrite_range+0xc8/0x100
>>>>> [55826.586352]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
>>>>> [55826.586761]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
>>>>> [55826.587171]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>>>>> [55826.587581]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>>>> [55826.587990]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>>>>> [55826.588406]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>>>>> [55826.588818]  btrfs_sync_file+0x395/0x3e0 [btrfs]
>>>>> [55826.589219]  ? retarget_shared_pending+0x70/0x70
>>>>> [55826.589617]  do_fsync+0x38/0x60
>>>>> [55826.590011]  __x64_sys_fdatasync+0x13/0x20
>>>>> [55826.590411]  do_syscall_64+0x55/0x1a0
>>>>> [55826.590798]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [55826.591185] RIP: 0033:0x7f1db3fc85f0
>>>>> [55826.591572] Code: Bad RIP value.
>>>>> [55826.591952] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>>>>> 000000000000004b
>>>>> [55826.592347] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>>>>> 00007f1db3fc85f0
>>>>> [55826.592743] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>>>>> 0000000000000001
>>>>> [55826.593143] RBP: 0000000000000001 R08: 0000000000000000 R09:
>>>>> 0000000081c492ca
>>>>> [55826.593543] R10: 0000000000000008 R11: 0000000000000246 R12:
>>>>> 0000000000000028
>>>>> [55826.593941] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>>>>> 0000000000000000
>>>>>
>>>>>
>>>>> Greets,
>>>>> Stefan
>>>>
>>>> --
>>>> Michal Hocko
>>>> SUSE Labs
>>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI)
  2019-09-11 16:15                                                         ` Stefan Priebe - Profihost AG
@ 2019-09-11 16:19                                                           ` Filipe Manana
  0 siblings, 0 replies; 61+ messages in thread
From: Filipe Manana @ 2019-09-11 16:19 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner,
	Vlastimil Babka, Jens Axboe, linux-block, linux-fsdevel,
	David Sterba, linux-btrfs

On Wed, Sep 11, 2019 at 5:15 PM Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
>
> > On 11.09.19 at 17:56, Filipe Manana wrote:
> > On Wed, Sep 11, 2019 at 4:39 PM Stefan Priebe - Profihost AG
> > <s.priebe@profihost.ag> wrote:
> >>
> >> Thanks! Is this the same as for the 5.3-rc8 I tested? The stack trace looked different to me.
> >
> > I don't know, I can't see that backtrace. The thread was split and
> > I've only seen the one sent to the btrfs list.
>
> Hi,
>
> strange.
>
> This is the 5.3-rc8 stacktrace:
> https://lore.kernel.org/linux-mm/d07620d9-4967-40fe-fa0f-be51f2459dc5@profihost.ag/

It's the same.

>
> and this the 5.2.14:
> https://lore.kernel.org/linux-mm/289fbe71-0472-520f-64e2-b6d07ced5436@profihost.ag/
>
> Greets,
> Stefan
>
> >>
> >> Stefan
> >>
> >>> Am 11.09.2019 um 16:56 schrieb Filipe Manana <fdmanana@kernel.org>:
> >>>
> >>>> On Wed, Sep 11, 2019 at 8:10 AM Michal Hocko <mhocko@kernel.org> wrote:
> >>>>
> >>>> This smells like an IO/Btrfs issue to me. Cc'ing some more people.
> >>>>
> >>>>> On Wed 11-09-19 08:12:28, Stefan Priebe - Profihost AG wrote:
> >>>>> [...]
> >>>>> Sadly I'm running into issues with btrfs on 5.3-rc8 - the rsync process
> >>>>> on the backup disk completely hangs / is blocked at 100% I/O:
> >>>>> [54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds.
> >>>>> [54739.066973]       Not tainted 5.3.0-rc8 #1
> >>>>> [54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >>>>> disables this message.
> >>>>> [54739.069065] rsync           D    0  9830   9829 0x00004002
> >>>>> [54739.070146] Call Trace:
> >>>>> [54739.071183]  ? __schedule+0x3cf/0x680
> >>>>> [54739.072202]  ? bit_wait+0x50/0x50
> >>>>> [54739.073196]  schedule+0x39/0xa0
> >>>>> [54739.074213]  io_schedule+0x12/0x40
> >>>>> [54739.075219]  bit_wait_io+0xd/0x50
> >>>>> [54739.076227]  __wait_on_bit+0x66/0x90
> >>>>> [54739.077239]  ? bit_wait+0x50/0x50
> >>>>> [54739.078273]  out_of_line_wait_on_bit+0x8b/0xb0
> >>>>> [54739.078741]  ? init_wait_var_entry+0x40/0x40
> >>>>> [54739.079162]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> >>>>> [54739.079557]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> >>>>> [54739.079956]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> >>>>> [54739.080357]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> >>>>> [54739.080748]  do_writepages+0x1a/0x60
> >>>>> [54739.081140]  __filemap_fdatawrite_range+0xc8/0x100
> >>>>> [54739.081558]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> >>>>> [54739.081985]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> >>>>> [54739.082412]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> >>>>> [54739.082847]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [54739.083280]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [54739.083725]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> >>>>> [54739.084170]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> >>>>> [54739.084608]  ? retarget_shared_pending+0x70/0x70
> >>>>> [54739.085049]  do_fsync+0x38/0x60
> >>>>> [54739.085494]  __x64_sys_fdatasync+0x13/0x20
> >>>>> [54739.085944]  do_syscall_64+0x55/0x1a0
> >>>>> [54739.086395]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>>> [54739.086850] RIP: 0033:0x7f1db3fc85f0
> >>>>> [54739.087310] Code: Bad RIP value.
> >>>
> >>> It's a regression introduced in 5.2.
> >>> Fix just sent: https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@kernel.org/T/#u
> >>>
> >>> Thanks.
> >>>
> >>>>> [54739.087772] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> >>>>> 000000000000004b
> >>>>> [54739.088249] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> >>>>> 00007f1db3fc85f0
> >>>>> [54739.088733] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> >>>>> 0000000000000001
> >>>>> [54739.089234] RBP: 0000000000000001 R08: 0000000000000000 R09:
> >>>>> 0000000081c492ca
> >>>>> [54739.089722] R10: 0000000000000008 R11: 0000000000000246 R12:
> >>>>> 0000000000000028
> >>>>> [54739.090205] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> >>>>> 0000000000000000
> >>>>> [54859.899715] INFO: task rsync:9830 blocked for more than 241 seconds.
> >>>>> [54859.900863]       Not tainted 5.3.0-rc8 #1
> >>>>> [54859.901885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >>>>> disables this message.
> >>>>> [54859.902909] rsync           D    0  9830   9829 0x00004002
> >>>>> [54859.903930] Call Trace:
> >>>>> [54859.904888]  ? __schedule+0x3cf/0x680
> >>>>> [54859.905831]  ? bit_wait+0x50/0x50
> >>>>> [54859.906751]  schedule+0x39/0xa0
> >>>>> [54859.907653]  io_schedule+0x12/0x40
> >>>>> [54859.908535]  bit_wait_io+0xd/0x50
> >>>>> [54859.909441]  __wait_on_bit+0x66/0x90
> >>>>> [54859.910306]  ? bit_wait+0x50/0x50
> >>>>> [54859.911177]  out_of_line_wait_on_bit+0x8b/0xb0
> >>>>> [54859.912043]  ? init_wait_var_entry+0x40/0x40
> >>>>> [54859.912727]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> >>>>> [54859.913113]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> >>>>> [54859.913501]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> >>>>> [54859.913894]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> >>>>> [54859.914276]  do_writepages+0x1a/0x60
> >>>>> [54859.914656]  __filemap_fdatawrite_range+0xc8/0x100
> >>>>> [54859.915052]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> >>>>> [54859.915449]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> >>>>> [54859.915855]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> >>>>> [54859.916256]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [54859.916658]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [54859.917078]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> >>>>> [54859.917497]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> >>>>> [54859.917903]  ? retarget_shared_pending+0x70/0x70
> >>>>> [54859.918307]  do_fsync+0x38/0x60
> >>>>> [54859.918707]  __x64_sys_fdatasync+0x13/0x20
> >>>>> [54859.919106]  do_syscall_64+0x55/0x1a0
> >>>>> [54859.919482]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>>> [54859.919866] RIP: 0033:0x7f1db3fc85f0
> >>>>> [54859.920243] Code: Bad RIP value.
> >>>>> [54859.920614] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> >>>>> 000000000000004b
> >>>>> [54859.920997] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> >>>>> 00007f1db3fc85f0
> >>>>> [54859.921383] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> >>>>> 0000000000000001
> >>>>> [54859.921773] RBP: 0000000000000001 R08: 0000000000000000 R09:
> >>>>> 0000000081c492ca
> >>>>> [54859.922165] R10: 0000000000000008 R11: 0000000000000246 R12:
> >>>>> 0000000000000028
> >>>>> [54859.922551] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> >>>>> 0000000000000000
> >>>>> [54980.733463] INFO: task rsync:9830 blocked for more than 362 seconds.
> >>>>> [54980.734061]       Not tainted 5.3.0-rc8 #1
> >>>>> [54980.734619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >>>>> disables this message.
> >>>>> [54980.735209] rsync           D    0  9830   9829 0x00004002
> >>>>> [54980.735802] Call Trace:
> >>>>> [54980.736473]  ? __schedule+0x3cf/0x680
> >>>>> [54980.737054]  ? bit_wait+0x50/0x50
> >>>>> [54980.737664]  schedule+0x39/0xa0
> >>>>> [54980.738243]  io_schedule+0x12/0x40
> >>>>> [54980.738712]  bit_wait_io+0xd/0x50
> >>>>> [54980.739171]  __wait_on_bit+0x66/0x90
> >>>>> [54980.739623]  ? bit_wait+0x50/0x50
> >>>>> [54980.740073]  out_of_line_wait_on_bit+0x8b/0xb0
> >>>>> [54980.740548]  ? init_wait_var_entry+0x40/0x40
> >>>>> [54980.741033]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> >>>>> [54980.741579]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> >>>>> [54980.742076]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> >>>>> [54980.742560]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> >>>>> [54980.743045]  do_writepages+0x1a/0x60
> >>>>> [54980.743516]  __filemap_fdatawrite_range+0xc8/0x100
> >>>>> [54980.744019]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> >>>>> [54980.744513]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> >>>>> [54980.745026]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> >>>>> [54980.745563]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [54980.746073]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [54980.746575]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> >>>>> [54980.747074]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> >>>>> [54980.747575]  ? retarget_shared_pending+0x70/0x70
> >>>>> [54980.748059]  do_fsync+0x38/0x60
> >>>>> [54980.748539]  __x64_sys_fdatasync+0x13/0x20
> >>>>> [54980.749012]  do_syscall_64+0x55/0x1a0
> >>>>> [54980.749512]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>>> [54980.749995] RIP: 0033:0x7f1db3fc85f0
> >>>>> [54980.750368] Code: Bad RIP value.
> >>>>> [54980.750735] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> >>>>> 000000000000004b
> >>>>> [54980.751117] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> >>>>> 00007f1db3fc85f0
> >>>>> [54980.751505] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> >>>>> 0000000000000001
> >>>>> [54980.751895] RBP: 0000000000000001 R08: 0000000000000000 R09:
> >>>>> 0000000081c492ca
> >>>>> [54980.752291] R10: 0000000000000008 R11: 0000000000000246 R12:
> >>>>> 0000000000000028
> >>>>> [54980.752680] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> >>>>> 0000000000000000
> >>>>> [55101.567251] INFO: task rsync:9830 blocked for more than 483 seconds.
> >>>>> [55101.567775]       Not tainted 5.3.0-rc8 #1
> >>>>> [55101.568218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >>>>> disables this message.
> >>>>> [55101.568649] rsync           D    0  9830   9829 0x00004002
> >>>>> [55101.569101] Call Trace:
> >>>>> [55101.569609]  ? __schedule+0x3cf/0x680
> >>>>> [55101.570052]  ? bit_wait+0x50/0x50
> >>>>> [55101.570504]  schedule+0x39/0xa0
> >>>>> [55101.570938]  io_schedule+0x12/0x40
> >>>>> [55101.571404]  bit_wait_io+0xd/0x50
> >>>>> [55101.571934]  __wait_on_bit+0x66/0x90
> >>>>> [55101.572601]  ? bit_wait+0x50/0x50
> >>>>> [55101.573235]  out_of_line_wait_on_bit+0x8b/0xb0
> >>>>> [55101.573599]  ? init_wait_var_entry+0x40/0x40
> >>>>> [55101.574008]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> >>>>> [55101.574394]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> >>>>> [55101.574783]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> >>>>> [55101.575184]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> >>>>> [55101.575580]  do_writepages+0x1a/0x60
> >>>>> [55101.575959]  __filemap_fdatawrite_range+0xc8/0x100
> >>>>> [55101.576351]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> >>>>> [55101.576746]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> >>>>> [55101.577144]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> >>>>> [55101.577543]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [55101.577939]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [55101.578343]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> >>>>> [55101.578746]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> >>>>> [55101.579139]  ? retarget_shared_pending+0x70/0x70
> >>>>> [55101.579543]  do_fsync+0x38/0x60
> >>>>> [55101.579928]  __x64_sys_fdatasync+0x13/0x20
> >>>>> [55101.580312]  do_syscall_64+0x55/0x1a0
> >>>>> [55101.580706]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>>> [55101.581086] RIP: 0033:0x7f1db3fc85f0
> >>>>> [55101.581463] Code: Bad RIP value.
> >>>>> [55101.581834] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> >>>>> 000000000000004b
> >>>>> [55101.582219] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> >>>>> 00007f1db3fc85f0
> >>>>> [55101.582607] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> >>>>> 0000000000000001
> >>>>> [55101.582998] RBP: 0000000000000001 R08: 0000000000000000 R09:
> >>>>> 0000000081c492ca
> >>>>> [55101.583397] R10: 0000000000000008 R11: 0000000000000246 R12:
> >>>>> 0000000000000028
> >>>>> [55101.583784] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> >>>>> 0000000000000000
> >>>>> [55222.405056] INFO: task rsync:9830 blocked for more than 604 seconds.
> >>>>> [55222.405773]       Not tainted 5.3.0-rc8 #1
> >>>>> [55222.406456] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >>>>> disables this message.
> >>>>> [55222.407158] rsync           D    0  9830   9829 0x00004002
> >>>>> [55222.407776] Call Trace:
> >>>>> [55222.408450]  ? __schedule+0x3cf/0x680
> >>>>> [55222.409206]  ? bit_wait+0x50/0x50
> >>>>> [55222.409942]  schedule+0x39/0xa0
> >>>>> [55222.410658]  io_schedule+0x12/0x40
> >>>>> [55222.411346]  bit_wait_io+0xd/0x50
> >>>>> [55222.411946]  __wait_on_bit+0x66/0x90
> >>>>> [55222.412572]  ? bit_wait+0x50/0x50
> >>>>> [55222.413249]  out_of_line_wait_on_bit+0x8b/0xb0
> >>>>> [55222.413944]  ? init_wait_var_entry+0x40/0x40
> >>>>> [55222.414675]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> >>>>> [55222.415362]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> >>>>> [55222.416085]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> >>>>> [55222.416796]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> >>>>> [55222.417505]  do_writepages+0x1a/0x60
> >>>>> [55222.418243]  __filemap_fdatawrite_range+0xc8/0x100
> >>>>> [55222.418969]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> >>>>> [55222.419713]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> >>>>> [55222.420453]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> >>>>> [55222.421206]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [55222.421925]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [55222.422656]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> >>>>> [55222.423400]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> >>>>> [55222.424140]  ? retarget_shared_pending+0x70/0x70
> >>>>> [55222.424861]  do_fsync+0x38/0x60
> >>>>> [55222.425581]  __x64_sys_fdatasync+0x13/0x20
> >>>>> [55222.426308]  do_syscall_64+0x55/0x1a0
> >>>>> [55222.427025]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>>> [55222.427732] RIP: 0033:0x7f1db3fc85f0
> >>>>> [55222.428396] Code: Bad RIP value.
> >>>>> [55222.429087] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> >>>>> 000000000000004b
> >>>>> [55222.429757] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> >>>>> 00007f1db3fc85f0
> >>>>> [55222.430451] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> >>>>> 0000000000000001
> >>>>> [55222.431159] RBP: 0000000000000001 R08: 0000000000000000 R09:
> >>>>> 0000000081c492ca
> >>>>> [55222.431856] R10: 0000000000000008 R11: 0000000000000246 R12:
> >>>>> 0000000000000028
> >>>>> [55222.432544] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> >>>>> 0000000000000000
> >>>>> [55343.234863] INFO: task rsync:9830 blocked for more than 724 seconds.
> >>>>> [55343.235887]       Not tainted 5.3.0-rc8 #1
> >>>>> [55343.236611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >>>>> disables this message.
> >>>>> [55343.237213] rsync           D    0  9830   9829 0x00004002
> >>>>> [55343.237766] Call Trace:
> >>>>> [55343.238353]  ? __schedule+0x3cf/0x680
> >>>>> [55343.238971]  ? bit_wait+0x50/0x50
> >>>>> [55343.239592]  schedule+0x39/0xa0
> >>>>> [55343.240173]  io_schedule+0x12/0x40
> >>>>> [55343.240721]  bit_wait_io+0xd/0x50
> >>>>> [55343.241266]  __wait_on_bit+0x66/0x90
> >>>>> [55343.241835]  ? bit_wait+0x50/0x50
> >>>>> [55343.242418]  out_of_line_wait_on_bit+0x8b/0xb0
> >>>>> [55343.242938]  ? init_wait_var_entry+0x40/0x40
> >>>>> [55343.243496]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> >>>>> [55343.244090]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> >>>>> [55343.244720]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> >>>>> [55343.245296]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> >>>>> [55343.245843]  do_writepages+0x1a/0x60
> >>>>> [55343.246407]  __filemap_fdatawrite_range+0xc8/0x100
> >>>>> [55343.247014]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> >>>>> [55343.247631]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> >>>>> [55343.248186]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> >>>>> [55343.248743]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [55343.249326]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [55343.249931]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> >>>>> [55343.250562]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> >>>>> [55343.251139]  ? retarget_shared_pending+0x70/0x70
> >>>>> [55343.251628]  do_fsync+0x38/0x60
> >>>>> [55343.252208]  __x64_sys_fdatasync+0x13/0x20
> >>>>> [55343.252702]  do_syscall_64+0x55/0x1a0
> >>>>> [55343.253212]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>>> [55343.253798] RIP: 0033:0x7f1db3fc85f0
> >>>>> [55343.254294] Code: Bad RIP value.
> >>>>> [55343.254821] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> >>>>> 000000000000004b
> >>>>> [55343.255404] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> >>>>> 00007f1db3fc85f0
> >>>>> [55343.255989] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> >>>>> 0000000000000001
> >>>>> [55343.256521] RBP: 0000000000000001 R08: 0000000000000000 R09:
> >>>>> 0000000081c492ca
> >>>>> [55343.257073] R10: 0000000000000008 R11: 0000000000000246 R12:
> >>>>> 0000000000000028
> >>>>> [55343.257649] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> >>>>> 0000000000000000
> >>>>> [55464.068704] INFO: task rsync:9830 blocked for more than 845 seconds.
> >>>>> [55464.069701]       Not tainted 5.3.0-rc8 #1
> >>>>> [55464.070655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >>>>> disables this message.
> >>>>> [55464.071637] rsync           D    0  9830   9829 0x00004002
> >>>>> [55464.072637] Call Trace:
> >>>>> [55464.073623]  ? __schedule+0x3cf/0x680
> >>>>> [55464.074604]  ? bit_wait+0x50/0x50
> >>>>> [55464.075577]  schedule+0x39/0xa0
> >>>>> [55464.076531]  io_schedule+0x12/0x40
> >>>>> [55464.077480]  bit_wait_io+0xd/0x50
> >>>>> [55464.078400]  __wait_on_bit+0x66/0x90
> >>>>> [55464.079300]  ? bit_wait+0x50/0x50
> >>>>> [55464.080184]  out_of_line_wait_on_bit+0x8b/0xb0
> >>>>> [55464.081107]  ? init_wait_var_entry+0x40/0x40
> >>>>> [55464.082047]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> >>>>> [55464.083001]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> >>>>> [55464.083963]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> >>>>> [55464.084944]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> >>>>> [55464.085456]  do_writepages+0x1a/0x60
> >>>>> [55464.085840]  __filemap_fdatawrite_range+0xc8/0x100
> >>>>> [55464.086231]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> >>>>> [55464.086625]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> >>>>> [55464.087019]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> >>>>> [55464.087417]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [55464.087814]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [55464.088219]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> >>>>> [55464.088652]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> >>>>> [55464.089043]  ? retarget_shared_pending+0x70/0x70
> >>>>> [55464.089429]  do_fsync+0x38/0x60
> >>>>> [55464.089811]  __x64_sys_fdatasync+0x13/0x20
> >>>>> [55464.090190]  do_syscall_64+0x55/0x1a0
> >>>>> [55464.090568]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>>> [55464.090944] RIP: 0033:0x7f1db3fc85f0
> >>>>> [55464.091321] Code: Bad RIP value.
> >>>>> [55464.091693] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> >>>>> 000000000000004b
> >>>>> [55464.092078] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> >>>>> 00007f1db3fc85f0
> >>>>> [55464.092467] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> >>>>> 0000000000000001
> >>>>> [55464.092863] RBP: 0000000000000001 R08: 0000000000000000 R09:
> >>>>> 0000000081c492ca
> >>>>> [55464.093254] R10: 0000000000000008 R11: 0000000000000246 R12:
> >>>>> 0000000000000028
> >>>>> [55464.093643] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> >>>>> 0000000000000000
> >>>>> [55584.902564] INFO: task rsync:9830 blocked for more than 966 seconds.
> >>>>> [55584.903748]       Not tainted 5.3.0-rc8 #1
> >>>>> [55584.904868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >>>>> disables this message.
> >>>>> [55584.906023] rsync           D    0  9830   9829 0x00004002
> >>>>> [55584.907207] Call Trace:
> >>>>> [55584.908355]  ? __schedule+0x3cf/0x680
> >>>>> [55584.909507]  ? bit_wait+0x50/0x50
> >>>>> [55584.910682]  schedule+0x39/0xa0
> >>>>> [55584.911230]  io_schedule+0x12/0x40
> >>>>> [55584.911666]  bit_wait_io+0xd/0x50
> >>>>> [55584.912092]  __wait_on_bit+0x66/0x90
> >>>>> [55584.912510]  ? bit_wait+0x50/0x50
> >>>>> [55584.912924]  out_of_line_wait_on_bit+0x8b/0xb0
> >>>>> [55584.913343]  ? init_wait_var_entry+0x40/0x40
> >>>>> [55584.913795]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> >>>>> [55584.914242]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> >>>>> [55584.914698]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> >>>>> [55584.915152]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> >>>>> [55584.915588]  do_writepages+0x1a/0x60
> >>>>> [55584.916022]  __filemap_fdatawrite_range+0xc8/0x100
> >>>>> [55584.916474]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> >>>>> [55584.916928]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> >>>>> [55584.917386]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> >>>>> [55584.917844]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [55584.918300]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [55584.918772]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> >>>>> [55584.919233]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> >>>>> [55584.919679]  ? retarget_shared_pending+0x70/0x70
> >>>>> [55584.920122]  do_fsync+0x38/0x60
> >>>>> [55584.920559]  __x64_sys_fdatasync+0x13/0x20
> >>>>> [55584.920996]  do_syscall_64+0x55/0x1a0
> >>>>> [55584.921429]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>>> [55584.921865] RIP: 0033:0x7f1db3fc85f0
> >>>>> [55584.922298] Code: Bad RIP value.
> >>>>> [55584.922734] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> >>>>> 000000000000004b
> >>>>> [55584.923174] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> >>>>> 00007f1db3fc85f0
> >>>>> [55584.923568] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> >>>>> 0000000000000001
> >>>>> [55584.923982] RBP: 0000000000000001 R08: 0000000000000000 R09:
> >>>>> 0000000081c492ca
> >>>>> [55584.924378] R10: 0000000000000008 R11: 0000000000000246 R12:
> >>>>> 0000000000000028
> >>>>> [55584.924774] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> >>>>> 0000000000000000
> >>>>> [55705.736285] INFO: task rsync:9830 blocked for more than 1087 seconds.
> >>>>> [55705.736999]       Not tainted 5.3.0-rc8 #1
> >>>>> [55705.737694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >>>>> disables this message.
> >>>>> [55705.738411] rsync           D    0  9830   9829 0x00004002
> >>>>> [55705.739072] Call Trace:
> >>>>> [55705.739455]  ? __schedule+0x3cf/0x680
> >>>>> [55705.739837]  ? bit_wait+0x50/0x50
> >>>>> [55705.740215]  schedule+0x39/0xa0
> >>>>> [55705.740610]  io_schedule+0x12/0x40
> >>>>> [55705.741243]  bit_wait_io+0xd/0x50
> >>>>> [55705.741897]  __wait_on_bit+0x66/0x90
> >>>>> [55705.742524]  ? bit_wait+0x50/0x50
> >>>>> [55705.743131]  out_of_line_wait_on_bit+0x8b/0xb0
> >>>>> [55705.743750]  ? init_wait_var_entry+0x40/0x40
> >>>>> [55705.744128]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> >>>>> [55705.744766]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> >>>>> [55705.745440]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> >>>>> [55705.746118]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> >>>>> [55705.746753]  do_writepages+0x1a/0x60
> >>>>> [55705.747411]  __filemap_fdatawrite_range+0xc8/0x100
> >>>>> [55705.748106]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> >>>>> [55705.748807]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> >>>>> [55705.749495]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> >>>>> [55705.750190]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [55705.750890]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [55705.751580]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> >>>>> [55705.752293]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> >>>>> [55705.752981]  ? retarget_shared_pending+0x70/0x70
> >>>>> [55705.753686]  do_fsync+0x38/0x60
> >>>>> [55705.754340]  __x64_sys_fdatasync+0x13/0x20
> >>>>> [55705.755012]  do_syscall_64+0x55/0x1a0
> >>>>> [55705.755678]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>>> [55705.756375] RIP: 0033:0x7f1db3fc85f0
> >>>>> [55705.757042] Code: Bad RIP value.
> >>>>> [55705.757690] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> >>>>> 000000000000004b
> >>>>> [55705.758300] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> >>>>> 00007f1db3fc85f0
> >>>>> [55705.758678] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> >>>>> 0000000000000001
> >>>>> [55705.759107] RBP: 0000000000000001 R08: 0000000000000000 R09:
> >>>>> 0000000081c492ca
> >>>>> [55705.759785] R10: 0000000000000008 R11: 0000000000000246 R12:
> >>>>> 0000000000000028
> >>>>> [55705.760471] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> >>>>> 0000000000000000
> >>>>> [55826.570182] INFO: task rsync:9830 blocked for more than 1208 seconds.
> >>>>> [55826.571349]       Not tainted 5.3.0-rc8 #1
> >>>>> [55826.572469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >>>>> disables this message.
> >>>>> [55826.573618] rsync           D    0  9830   9829 0x00004002
> >>>>> [55826.574790] Call Trace:
> >>>>> [55826.575932]  ? __schedule+0x3cf/0x680
> >>>>> [55826.577079]  ? bit_wait+0x50/0x50
> >>>>> [55826.578233]  schedule+0x39/0xa0
> >>>>> [55826.579350]  io_schedule+0x12/0x40
> >>>>> [55826.580451]  bit_wait_io+0xd/0x50
> >>>>> [55826.581527]  __wait_on_bit+0x66/0x90
> >>>>> [55826.582596]  ? bit_wait+0x50/0x50
> >>>>> [55826.583178]  out_of_line_wait_on_bit+0x8b/0xb0
> >>>>> [55826.583550]  ? init_wait_var_entry+0x40/0x40
> >>>>> [55826.583953]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> >>>>> [55826.584356]  btree_write_cache_pages+0x17d/0x350 [btrfs]
> >>>>> [55826.584755]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
> >>>>> [55826.585155]  ? merge_state.part.47+0x3f/0x160 [btrfs]
> >>>>> [55826.585547]  do_writepages+0x1a/0x60
> >>>>> [55826.585937]  __filemap_fdatawrite_range+0xc8/0x100
> >>>>> [55826.586352]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
> >>>>> [55826.586761]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
> >>>>> [55826.587171]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> >>>>> [55826.587581]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [55826.587990]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> >>>>> [55826.588406]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> >>>>> [55826.588818]  btrfs_sync_file+0x395/0x3e0 [btrfs]
> >>>>> [55826.589219]  ? retarget_shared_pending+0x70/0x70
> >>>>> [55826.589617]  do_fsync+0x38/0x60
> >>>>> [55826.590011]  __x64_sys_fdatasync+0x13/0x20
> >>>>> [55826.590411]  do_syscall_64+0x55/0x1a0
> >>>>> [55826.590798]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >>>>> [55826.591185] RIP: 0033:0x7f1db3fc85f0
> >>>>> [55826.591572] Code: Bad RIP value.
> >>>>> [55826.591952] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
> >>>>> 000000000000004b
> >>>>> [55826.592347] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
> >>>>> 00007f1db3fc85f0
> >>>>> [55826.592743] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
> >>>>> 0000000000000001
> >>>>> [55826.593143] RBP: 0000000000000001 R08: 0000000000000000 R09:
> >>>>> 0000000081c492ca
> >>>>> [55826.593543] R10: 0000000000000008 R11: 0000000000000246 R12:
> >>>>> 0000000000000028
> >>>>> [55826.593941] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
> >>>>> 0000000000000000
> >>>>>
> >>>>>
> >>>>> Greets,
> >>>>> Stefan
> >>>>
> >>>> --
> >>>> Michal Hocko
> >>>> SUSE Labs
> >>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-11 13:59                                                   ` Stefan Priebe - Profihost AG
@ 2019-09-12 10:53                                                     ` Stefan Priebe - Profihost AG
  2019-09-12 11:06                                                       ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-12 10:53 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

Hello Michal,

now the kernel (5.2.14) has locked up / deadlocked with:
---------------
2019-09-12 12:41:47     ------------[ cut here ]------------
2019-09-12 12:41:47     NETDEV WATCHDOG: eth0 (igb): transmit queue 2
timed out
2019-09-12 12:41:47     WARNING: CPU: 2 PID: 0 at
net/sched/sch_generic.c:443 dev_watchdog+0x254/0x260
2019-09-12 12:41:47     Modules linked in: btrfs dm_mod netconsole
xt_tcpudp xt_owner xt_conntrack nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 fuse xt_multiport ipt_REJECT nf_reject_ipv4 xt_set
iptable_filter bpfilter ip_set_hash_net ip_set nfnetlink 8021q garp
bonding sb_edac x86_pkg_temp_thermal coretemp kvm_intel ast ttm kvm
drm_kms_helper irqbypass drm crc32_pclmul fb_sys_fops lpc_ich ipmi_si
syscopyarea sysfillrect ipmi_devintf mfd_core ghash_clmulni_intel wmi
sysimgblt sg ipmi_msghandler button ip_tables x_tables zstd_decompress
zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
async_xor async_tx xor usbhid raid6_pq raid1 raid0 multipath linear
md_mod xhci_pci sd_mod ehci_pci xhci_hcd ehci_hcd igb i2c_i801
i2c_algo_bit ahci usbcore ptp libahci i2c_core usb_common pps_core
megaraid_sas [last unloaded: btrfs]
2019-09-12 12:41:47     CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.2.14 #1
2019-09-12 12:41:47     Hardware name: Supermicro Super Server/X10SRi-F,
BIOS 1.0b 04/21/2015
2019-09-12 12:41:47     RIP: 0010:dev_watchdog+0x254/0x260
2019-09-12 12:41:47     Code: 48 85 c0 75 e4 eb 9d 4c 89 ef c6 05 a6 09
c8 00 01 e8 b0 53 fb ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 10 d6 0c be e8
ac ca 98 ff <0f> 0b e9 7c ff ff ff 0f 1f 44 00 00 0f 1f 44 00 00 41 57
41 56 49
2019-09-12 12:41:47     RSP: 0018:ffffbea7c63a0e68 EFLAGS: 00010282
2019-09-12 12:41:47     RAX: 0000000000000000 RBX: 0000000000000002 RCX:
0000000000000006
2019-09-12 12:41:47     RDX: 0000000000000007 RSI: 0000000000000086 RDI:
ffff96f9ff896540
2019-09-12 12:41:47     RBP: ffff96f9fc18041c R08: 0000000000000001 R09:
000000000000046f
2019-09-12 12:41:47     R10: ffff96f9ff89a630 R11: 0000000000000000 R12:
ffff96f9f9e16940
2019-09-12 12:41:47     R13: ffff96f9fc180000 R14: ffff96f9fc180440 R15:
0000000000000008
2019-09-12 12:41:47     FS: 0000000000000000(0000)
GS:ffff96f9ff880000(0000) knlGS:0000000000000000
2019-09-12 12:41:47     CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2019-09-12 12:41:47     CR2: 00007fbb2c4e2000 CR3: 0000000c0d20a004 CR4:
00000000001606e0
2019-09-12 12:41:47     Call Trace:
2019-09-12 12:41:47      <IRQ>
2019-09-12 12:41:47      ? pfifo_fast_reset+0x110/0x110
2019-09-12 12:41:47      call_timer_fn+0x2d/0x140
2019-09-12 12:41:47      run_timer_softirq+0x1e2/0x440
2019-09-12 12:41:47      ? timerqueue_add+0x54/0x80
2019-09-12 12:41:48      ? enqueue_hrtimer+0x3a/0x90
2019-09-12 12:41:48      __do_softirq+0x10c/0x2d4
2019-09-12 12:41:48      irq_exit+0xdd/0xf0
2019-09-12 12:41:48      smp_apic_timer_interrupt+0x74/0x130
2019-09-12 12:41:48      apic_timer_interrupt+0xf/0x20
2019-09-12 12:41:48      </IRQ>
2019-09-12 12:41:48     RIP: 0010:cpuidle_enter_state+0xbd/0x410
2019-09-12 12:41:48     Code: 24 0f 1f 44 00 00 31 ff e8 b0 67 a5 ff 80
7c 24 13 00 74 12 9c 58 f6 c4 02 0f 85 2c 03 00 00 31 ff e8 a7 b9 aa ff
fb 45 85 ed <0f> 88 e0 02 00 00 4c 8b 04 24 4c 2b 44 24 08 48 ba cf f7
53 e3 a5
2019-09-12 12:41:48     RSP: 0018:ffffbea7c62f7e60 EFLAGS: 00000202
ORIG_RAX: ffffffffffffff13
2019-09-12 12:41:48     RAX: ffff96f9ff8a9840 RBX: ffffffffbe3271a0 RCX:
000000000000001f
2019-09-12 12:41:48     RDX: 000044a065c471eb RSI: 0000000024925419 RDI:
0000000000000000
2019-09-12 12:41:48     RBP: ffffdea7bfa80f00 R08: 0000000000000002 R09:
00000000000290c0
2019-09-12 12:41:48     R10: 00000000ffffffff R11: 0000000000000f05 R12:
0000000000000004
2019-09-12 12:41:48     R13: 0000000000000004 R14: 0000000000000004 R15: ffffffffbe3271a0
2019-09-12 12:41:48      cpuidle_enter+0x29/0x40
2019-09-12 12:41:48      do_idle+0x1d5/0x220
2019-09-12 12:41:48      cpu_startup_entry+0x19/0x20
2019-09-12 12:41:48      start_secondary+0x16b/0x1b0
2019-09-12 12:41:48      secondary_startup_64+0xa4/0xb0
2019-09-12 12:41:48     ---[ end trace 3241d99856ac4582 ]---
2019-09-12 12:41:48     igb 0000:05:00.0 eth0: Reset adapter
-------------------------------

Stefan
On 11.09.19 at 15:59, Stefan Priebe - Profihost AG wrote:
> Hi,
> 
> I've now tried v5.2.14, but that one died with the trace below - I don't
> know which version to try now...
> 
> 2019-09-11 15:41:09     ------------[ cut here ]------------
> 2019-09-11 15:41:09     kernel BUG at mm/page-writeback.c:2655!
> 2019-09-11 15:41:09     invalid opcode: 0000 [#1] SMP PTI
> 2019-09-11 15:41:09     CPU: 4 PID: 466 Comm: kworker/u24:6 Not tainted
> 5.2.14 #1
> 2019-09-11 15:41:09     Hardware name: Supermicro Super Server/X10SRi-F,
> BIOS 1.0b 04/21/2015
> 2019-09-11 15:41:09     Workqueue: btrfs-delalloc btrfs_delalloc_helper
> [btrfs]
> 2019-09-11 15:41:09     RIP: 0010:clear_page_dirty_for_io+0xfc/0x210
> 2019-09-11 15:41:09     Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00
> 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41
> 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85
> d2 48 8b
> 2019-09-11 15:41:09     RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246
> 2019-09-11 15:41:09     RAX: 001000000004205c RBX: ffffe660525b3140 RCX:
> 0000000000000000
> 2019-09-11 15:41:09     RDX: 0000000000000000 RSI: 0000000000000006 RDI:
> ffffe660525b3140
> 2019-09-11 15:41:09     RBP: ffff9ad639868818 R08: 0000000000000001 R09:
> 000000000002de18
> 2019-09-11 15:41:09     R10: 0000000000000002 R11: ffff9ade7ffd6000 R12:
> 0000000000000000
> 2019-09-11 15:41:09     R13: 0000000000000001 R14: 0000000000000000 R15:
> ffffbd4b8d2f3d08
> 2019-09-11 15:41:09     FS: 0000000000000000(0000)
> GS:ffff9ade3f900000(0000) knlGS:0000000000000000
> 2019-09-11 15:41:09     CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2019-09-11 15:41:09     CR2: 000055fa10d2bf70 CR3: 00000005a420a002 CR4:
> 00000000001606e0
> 2019-09-11 15:41:09     Call Trace:
> 2019-09-11 15:41:09     __process_pages_contig+0x270/0x360 [btrfs]
> 2019-09-11 15:41:09     submit_compressed_extents+0x39d/0x460 [btrfs]
> 2019-09-11 15:41:09     normal_work_helper+0x20f/0x320 [btrfs]
> 2019-09-11 15:41:09     process_one_work+0x18b/0x380
> 2019-09-11 15:41:09     worker_thread+0x4f/0x3a0
> 2019-09-11 15:41:09     ? rescuer_thread+0x330/0x330
> 2019-09-11 15:41:09     kthread+0xf8/0x130
> 2019-09-11 15:41:09     ? kthread_create_worker_on_cpu+0x70/0x70
> 2019-09-11 15:41:09     ret_from_fork+0x35/0x40
> 2019-09-11 15:41:09     Modules linked in: netconsole xt_tcpudp xt_owner
> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
> ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter fuse
> ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac
> x86_pkg_temp_thermal coretemp kvm_intel ast kvm ttm drm_kms_helper
> irqbypass crc32_pclmul drm fb_sys_fops syscopyarea lpc_ich sysfillrect
> ghash_clmulni_intel sysimgblt mfd_core sg wmi ipmi_si ipmi_devintf
> ipmi_msghandler button ip_tables x_tables btrfs zstd_decompress
> zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
> async_xor async_tx xor usbhid raid6_pq raid1 raid0 multipath linear
> md_mod sd_mod xhci_pci ehci_pci igb xhci_hcd ehci_hcd i2c_algo_bit
> i2c_i801 ahci ptp i2c_core usbcore libahci usb_common pps_core megaraid_sas
> 2019-09-11 15:41:09     ---[ end trace d9a3f99c047dc8bf ]---
> 2019-09-11 15:41:10     RIP: 0010:clear_page_dirty_for_io+0xfc/0x210
> 2019-09-11 15:41:10     Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00
> 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41
> 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85
> d2 48 8b
> 2019-09-11 15:41:10     RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246
> 2019-09-11 15:41:10     RAX: 001000000004205c RBX: ffffe660525b3140 RCX:
> 0000000000000000
> 2019-09-11 15:41:10     RDX: 0000000000000000 RSI: 0000000000000006 RDI:
> ffffe660525b3140
> 2019-09-11 15:41:10     RBP: ffff9ad639868818 R08: 0000000000000001 R09:
> 000000000002de18
> 2019-09-11 15:41:10     R10: 0000000000000002 R11: ffff9ade7ffd6000 R12:
> 0000000000000000
> 2019-09-11 15:41:10     R13: 0000000000000001 R14: 0000000000000000 R15:
> ffffbd4b8d2f3d08
> 2019-09-11 15:41:10     FS: 0000000000000000(0000)
> GS:ffff9ade3f900000(0000) knlGS:0000000000000000
> 2019-09-11 15:41:10     CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2019-09-11 15:41:10     CR2: 000055fa10d2bf70 CR3: 00000005a420a002 CR4:
> 00000000001606e0
> 2019-09-11 15:41:10     Kernel panic - not syncing: Fatal exception
> 2019-09-11 15:41:10     Kernel Offset: 0x1a000000 from
> 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> 2019-09-11 15:41:10     Rebooting in 20 seconds..
> 2019-09-11 15:41:29     ACPI MEMORY or I/O RESET_REG.
> 
> Stefan
> On 11.09.19 at 08:24, Stefan Priebe - Profihost AG wrote:
>> Hi Michal,
>>
>> On 11.09.19 at 08:12, Stefan Priebe - Profihost AG wrote:
>>> Hi Michal,
>>> On 10.09.19 at 15:24, Michal Hocko wrote:
>>>> On Tue 10-09-19 15:14:45, Stefan Priebe - Profihost AG wrote:
>>>>> On 10.09.19 at 15:05, Stefan Priebe - Profihost AG wrote:
>>>>>>
>>>>>> On 10.09.19 at 14:57, Michal Hocko wrote:
>>>>>>> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote:
>>>>>>>> Hello Michal,
>>>>>>>>
>>>>>>>> OK, this might take a long time. Attached you'll find a graph from a
>>>>>>>> fresh boot showing what happens over time (here 17 August to 30 August).
>>>>>>>> Memory usage decreases, as does cache, but slowly and only over days.
>>>>>>>>
>>>>>>>> So it might take 2-3 weeks running Kernel 5.3 to see what happens.
>>>>>>>
>>>>>>> No problem. Just make sure to collect the requested data from the time
>>>>>>> you see the actual problem. Btw. did you try my very dumb scriptlets to get
>>>>>>> an idea of how much memory gets reclaimed due to THP?
>>>>>>
>>>>>> You mean your sed and sort on top of the trace file? No, I did not with
>>>>>> the current 5.3 kernel - do you think it will show anything interesting?
>>>>>> Which line shows me how much memory gets reclaimed due to THP?
>>>>
>>>> Please re-read http://lkml.kernel.org/r/20190910082919.GL2063@dhcp22.suse.cz
>>>> Each command has a commented output. If you see the number of reclaimed
>>>> pages being large for GFP_TRANSHUGE, then you are seeing a similar
>>>> problem.
>>>>
>>>>> Is something like a kernel memory leak possible? Or wouldn't this end up
>>>>> with a lot of free memory which doesn't seem usable?
>>>>
>>>> I would be really surprised if this was the case.
>>>>
>>>>> I also wonder why a reclaim takes place when there is enough memory.
>>>>
>>>> This is not clear yet and it might be a bug that has been fixed since
>>>> 4.18. That's why we need to see whether the same pattern is happening
>>>> with 5.3 as well.
>>
>> But apart from the btrfs problem, the memory consumption looks far
>> better than before.
>>
>> Running 4.19.X:
>> after about 12h cache starts to drop from 30G to 24G
>>
>> Running 5.3-rc8:
>> after about 24h cache is still constant at nearly 30G
>>
>> Greets,
>> Stefan
>>
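A minimal sketch of the kind of trace accounting Michal's scriptlets above
perform, assuming the stock vmscan tracepoints (the tracepoint names and
paths here are assumptions; the exact commands are in the message he links
and may differ):

    cd /sys/kernel/debug/tracing
    echo 1 > events/vmscan/mm_vmscan_direct_reclaim_begin/enable
    echo 1 > events/vmscan/mm_vmscan_direct_reclaim_end/enable
    cat trace_pipe > /tmp/vmscan.trace   # capture while the problem is visible

    # how many direct reclaim rounds were triggered by THP allocations
    grep mm_vmscan_direct_reclaim_begin /tmp/vmscan.trace | grep -c GFP_TRANSHUGE

    # per-round distribution of nr_reclaimed (the "number of reclaimed pages")
    grep mm_vmscan_direct_reclaim_end /tmp/vmscan.trace \
        | sed 's/.*nr_reclaimed=//' | sort -n | uniq -c

Large GFP_TRANSHUGE counts together with large nr_reclaimed values would be
the pattern Michal describes: THP faults driving reclaim while MemAvailable
still looks healthy.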


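On the question above of why reclaim happens while there is still plenty of
free memory: one mechanism worth ruling out (an assumption, not a conclusion
from this thread) is fragmentation - a THP fault needs a free order-9 block,
and once the higher orders are exhausted the allocator falls back to
compaction and direct reclaim even though MemAvailable is high. Two quick
checks, assuming CONFIG_COMPACTION is enabled:

    # free blocks per order (columns are order 0..10); thin high-order
    # columns mean order-9 THP allocations can stall in compaction/reclaim
    cat /proc/buddyinfo

    # compaction counters; watch whether they grow while PSI rises
    grep -E '^compact_(stall|fail|success)' /proc/vmstat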
^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-12 10:53                                                     ` Stefan Priebe - Profihost AG
@ 2019-09-12 11:06                                                       ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-12 11:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

Sorry, very shortly after that even more traces showed up:

So currently it seems we can't test with 5.3-rc8 or 5.2.14 - what's next?


watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [authscanclient:812]
Modules linked in: btrfs dm_mod netconsole xt_tcpudp xt_owner
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse
xt_multiport ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter
ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac
x86_pkg_temp_thermal coretemp kvm_intel ast ttm kvm drm_kms_helper
irqbypass drm crc32_pclmul fb_sys_fops lpc_ich ipmi_si syscopyarea
sysfillrect ipmi_devintf mfd_core ghash_clmulni_intel wmi sysimgblt sg
ipmi_msghandler button ip_tables x_tables zstd_decompress zstd_compress
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor usbhid raid6_pq raid1 raid0 multipath linear md_mod
xhci_pci sd_mod ehci_pci xhci_hcd ehci_hcd igb i2c_i801 i2c_algo_bit
ahci usbcore ptp libahci i2c_core usb_common pps_core megaraid_sas [last
unloaded: btrfs]
CPU: 7 PID: 812 Comm: authscanclient Tainted: G        W         5.2.14 #1
watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [authscanclient:813]
Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015
Modules linked in: btrfs dm_mod netconsole xt_tcpudp xt_owner
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse
xt_multiport ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter
ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac
x86_pkg_temp_thermal coretemp kvm_intel ast ttm kvm drm_kms_helper
irqbypass drm crc32_pclmul fb_sys_fops lpc_ich ipmi_si syscopyarea
sysfillrect ipmi_devintf mfd_core ghash_clmulni_intel wmi sysimgblt sg
ipmi_msghandler button ip_tables x_tables zstd_decompress zstd_compress
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor usbhid raid6_pq raid1 raid0 multipath linear md_mod
xhci_pci sd_mod ehci_pci xhci_hcd ehci_hcd igb i2c_i801 i2c_algo_bit
ahci usbcore ptp libahci i2c_core usb_common pps_core megaraid_sas [last
unloaded: btrfs]
RIP: 0010:find_next_bit+0x1c/0x60
CPU: 11 PID: 813 Comm: authscanclient Tainted: G        W         5.2.14 #1
Code: 8d 04 0a c3 f3 c3 0f 1f 84 00 00 00 00 00 48 39 d6 48 89 f0 48 89
d1 76 4e 48 89 d6 48 c7 c2 ff ff ff ff 48 c1 ee 06 48 d3 e2 <48> 83 e1
c0 48 23 14 f7 75 24 48 83 c1 40 48 39 c8 77 0b eb 2a 48
Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015
RIP: 0010:cpumask_next+0x12/0x20
RSP: 0000:ffffbea7c770b608 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff13
Code: 48 01 f8 80 38 22 75 97 eb d4 90 90 90 90 90 90 90 90 90 90 90 90
90 90 48 89 f0 8b 35 4b 82 b6 00 8d 57 01 48 89 c7 48 63 d2 <e8> d9 33
c6 ff f3 c3 0f 1f 80 00 00 00 00 55 53 89 f3 8b 35 2a 82
RAX: 000000000000000c RBX: 0000000000000004 RCX: 0000000000000002
RDX: fffffffffffffffc RSI: 0000000000000000 RDI: ffffffffbe3e5e40
RSP: 0000:ffffbea7c727f620 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13
RBP: ffff96f9edc8cc00 R08: 0000000000000000 R09: ffff96eac78049b8
R10: ffffbea7c770b750 R11: ffffffffbe2bd8d8 R12: 0000000000000010
RAX: ffffffffbe3e5e40 RBX: ffff96f9edc8cc00 RCX: 0000000000000007
RDX: 0000000000000008 RSI: 000000000000000c RDI: ffffffffbe3e5e40
R13: ffffffffbe3e5e40 R14: 0000000000000002 R15: ffffffffffffffa0
FS:  00007f3ab8edd700(0000) GS:ffff96f9ff9c0000(0000) knlGS:0000000000000000
RBP: ffffffffbe3e5e40 R08: ffff96eac7804920 R09: ffff96eac78049b8
R10: 0000000000000000 R11: ffffffffbe2bd8d8 R12: 0000000000000018
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3abbf66f20 CR3: 0000000fe60a6001 CR4: 00000000001606e0
R13: 0000000000000003 R14: 0000000000008e4b R15: fffffffffffffff6
FS:  00007f3aabfff700(0000) GS:ffff96f9ffac0000(0000) knlGS:0000000000000000
Call Trace:
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3abbf66fd0 CR3: 0000000fe60a6004 CR4: 00000000001606e0
 cpumask_next+0x17/0x20
Call Trace:
 lruvec_lru_size+0x5c/0x110
 count_shadow_nodes+0xac/0x220
 shrink_node_memcg+0xd7/0x7d0
 do_shrink_slab+0x55/0x2d0
 ? shrink_slab+0x2a3/0x2b0
 shrink_slab+0x219/0x2b0
 ? shrink_node+0xe0/0x4b0
 shrink_node+0xf6/0x4b0
 shrink_node+0xe0/0x4b0
 do_try_to_free_pages+0xeb/0x380
 do_try_to_free_pages+0xeb/0x380
 try_to_free_mem_cgroup_pages+0xe6/0x1e0
 try_to_free_mem_cgroup_pages+0xe6/0x1e0
 try_charge+0x295/0x780
 try_charge+0x295/0x780
 ? shrink_slab+0x2a3/0x2b0
 ? mem_cgroup_commit_charge+0x79/0x4a0
 mem_cgroup_try_charge+0xc2/0x190
 mem_cgroup_try_charge+0xc2/0x190
 __add_to_page_cache_locked+0x282/0x330
 __add_to_page_cache_locked+0x282/0x330
 ? count_shadow_nodes+0x220/0x220
 ? count_shadow_nodes+0x220/0x220
 add_to_page_cache_lru+0x4a/0xc0
 add_to_page_cache_lru+0x4a/0xc0
 iomap_readpages_actor+0x103/0x220
 iomap_readpages_actor+0x103/0x220
 ? iomap_write_begin.constprop.45+0x370/0x370
 ? iomap_write_begin.constprop.45+0x370/0x370
 iomap_apply+0xba/0x150
 ? iomap_write_begin.constprop.45+0x370/0x370
 iomap_apply+0xba/0x150
 iomap_readpages+0xaa/0x1a0
 ? iomap_write_begin.constprop.45+0x370/0x370
 ? iomap_write_begin.constprop.45+0x370/0x370
 iomap_readpages+0xaa/0x1a0
 ? iomap_write_begin.constprop.45+0x370/0x370
 read_pages+0x71/0x1a0
 read_pages+0x71/0x1a0
 ? 0xffffffffbd000000
 ? __do_page_cache_readahead+0x1cc/0x1e0
 ? __do_page_cache_readahead+0x1a8/0x1e0
 __do_page_cache_readahead+0x1cc/0x1e0
 __do_page_cache_readahead+0x1a8/0x1e0
 filemap_fault+0x6fc/0x960
 filemap_fault+0x6fc/0x960
 ? __mod_lruvec_state+0x3f/0xe0
 ? schedule+0x39/0xa0
 ? page_add_file_rmap+0xd1/0x160
 ? __mod_lruvec_state+0x3f/0xe0
 ? alloc_set_pte+0x4f8/0x5c0
 __xfs_filemap_fault.constprop.13+0x49/0x120
 ? page_add_file_rmap+0xd1/0x160
 __do_fault+0x3c/0x110
 ? alloc_set_pte+0x4f8/0x5c0
 __handle_mm_fault+0xa7c/0xfb0
 __xfs_filemap_fault.constprop.13+0x49/0x120
 handle_mm_fault+0xd0/0x1d0
 __do_fault+0x3c/0x110
 __do_page_fault+0x253/0x470
 __handle_mm_fault+0xa7c/0xfb0
 do_page_fault+0x2c/0x106
 handle_mm_fault+0xd0/0x1d0
 ? page_fault+0x8/0x30
 __do_page_fault+0x253/0x470
 page_fault+0x1e/0x30
 do_page_fault+0x2c/0x106
RIP: 0033:0x7f3abbf66f20
 ? page_fault+0x8/0x30
Code: 68 38 00 00 00 e9 60 fc ff ff ff 25 da 72 20 00 68 39 00 00 00 e9
50 fc ff ff ff 25 d2 72 20 00 68 3a 00 00 00 e9 40 fc ff ff <ff> 25 ca
72 20 00 68 3b 00 00 00 e9 30 fc ff ff ff 25 c2 72 20 00
 page_fault+0x1e/0x30
RSP: 002b:00007f3ab8edcb58 EFLAGS: 00010246
RIP: 0033:0x7f3abbf66fd0
RAX: 0000000000000001 RBX: 000055f087797ae0 RCX: 0000000000000010
RDX: 000000002090000c RSI: 0000000000000050 RDI: 000055f087797ae0
Code: 68 43 00 00 00 e9 b0 fb ff ff ff 25 82 72 20 00 68 44 00 00 00 e9
a0 fb ff ff ff 25 7a 72 20 00 68 45 00 00 00 e9 90 fb ff ff <ff> 25 72
72 20 00 68 46 00 00 00 e9 80 fb ff ff ff 25 6a 72 20 00
RBP: 000055f0873a69d0 R08: 0000000000000602 R09: 000055f0876daee0
R10: 000055f087034980 R11: 00000000e24313e9 R12: 000055f08779e558
RSP: 002b:00007f3aabffea68 EFLAGS: 00010287
R13: 000055f08779e558 R14: 000055f0873a69d0 R15: 000055f08779e550
RAX: 000055f087b75190 RBX: 00007f3abc16e2c0 RCX: 0000000000000012
RDX: 0000000000000002 RSI: 00007f3abc16e2c0 RDI: 00007f3abc16e2c0
RBP: 000055f08777a9f0 R08: 000000000000000f R09: 0000000000000003
R10: 000000000000000b R11: 0000000000000000 R12: 000055f08777a9f0
R13: 000055f085e5b5b8 R14: 000055f087b79240 R15: 0000000000000000
igb 0000:05:00.1 eth1: Reset adapter
igb 0000:05:00.1 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex,
Flow Control: None
watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [authscanclient:812]
Modules linked in: btrfs dm_mod netconsole xt_tcpudp xt_owner
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse
xt_multiport ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter
ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac
x86_pkg_temp_thermal coretemp kvm_intel ast ttm kvm drm_kms_helper
irqbypass drm crc32_pclmul fb_sys_fops lpc_ich ipmi_si syscopyarea
sysfillrect ipmi_devintf mfd_core ghash_clmulni_intel wmi sysimgblt sg
ipmi_msghandler button ip_tables x_tables zstd_decompress zstd_compress
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor usbhid raid6_pq raid1 raid0 multipath linear md_mod
xhci_pci sd_mod ehci_pci xhci_hcd ehci_hcd igb i2c_i801 i2c_algo_bit
ahci usbcore ptp libahci i2c_core usb_common pps_core megaraid_sas [last
unloaded: btrfs]
CPU: 7 PID: 812 Comm: authscanclient Tainted: G        W    L    5.2.14 #1
watchdog: BUG: soft lockup - CPU#11 stuck for 23s! [authscanclient:813]
Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015
Modules linked in: btrfs dm_mod netconsole xt_tcpudp xt_owner
xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse
xt_multiport ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter
ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac
x86_pkg_temp_thermal coretemp kvm_intel ast ttm kvm drm_kms_helper
irqbypass drm crc32_pclmul fb_sys_fops lpc_ich ipmi_si syscopyarea
sysfillrect ipmi_devintf mfd_core ghash_clmulni_intel wmi sysimgblt sg
ipmi_msghandler button ip_tables x_tables zstd_decompress zstd_compress
raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor
async_tx xor usbhid raid6_pq raid1 raid0 multipath linear md_mod
xhci_pci sd_mod ehci_pci xhci_hcd ehci_hcd igb i2c_i801 i2c_algo_bit
ahci usbcore ptp libahci i2c_core usb_common pps_core megaraid_sas [last
unloaded: btrfs]
RIP: 0010:cpumask_next+0x0/0x20
CPU: 11 PID: 813 Comm: authscanclient Tainted: G        W    L    5.2.14 #1
Code: 83 c7 01 e9 24 ff ff ff 48 83 c0 01 48 89 02 44 89 d0 48 01 f8 80
38 22 75 97 eb d4 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <48> 89 f0
8b 35 4b 82 b6 00 8d 57 01 48 89 c7 48 63 d2 e8 d9 33 c6
Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015
RIP: 0010:cpumask_next+0x12/0x20
RSP: 0000:ffffbea7c770b610 EFLAGS: 00000292 ORIG_RAX: ffffffffffffff13
Code: 48 01 f8 80 38 22 75 97 eb d4 90 90 90 90 90 90 90 90 90 90 90 90
90 90 48 89 f0 8b 35 4b 82 b6 00 8d 57 01 48 89 c7 48 63 d2 <e8> d9 33
c6 ff f3 c3 0f 1f 80 00 00 00 00 55 53 89 f3 8b 35 2a 82
RAX: 0000000000000002 RBX: 0000000000000004 RCX: ffff96f9ff880000
RDX: 000047adbfe07c58 RSI: ffffffffbe3e5e40 RDI: 0000000000000002
RSP: 0000:ffffbea7c727f620 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
RBP: ffff96f9edc8cc00 R08: 0000000000000000 R09: ffff96eac78049b8
R10: ffffbea7c770b750 R11: ffffffffbe2bd8d8 R12: 0000000000000018
RAX: ffffffffbe3e5e40 RBX: ffff96f9edc8cc00 RCX: 0000000000000009
RDX: 000000000000000a RSI: 000000000000000c RDI: ffffffffbe3e5e40
R13: ffffffffbe3e5e40 R14: 0000000000000003 R15: ffffffffffffffc1
FS:  00007f3ab8edd700(0000) GS:ffff96f9ff9c0000(0000) knlGS:0000000000000000
RBP: ffffffffbe3e5e40 R08: ffff96eac7804920 R09: ffff96eac78049b8
R10: 0000000000000000 R11: ffffffffbe2bd8d8 R12: 0000000000000020
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3abbf66f20 CR3: 0000000fe60a6001 CR4: 00000000001606e0
R13: 0000000000000004 R14: 0000000000008e4b R15: 0000000000000000
FS:  00007f3aabfff700(0000) GS:ffff96f9ffac0000(0000) knlGS:0000000000000000
Call Trace:
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f3abbf66fd0 CR3: 0000000fe60a6004 CR4: 00000000001606e0
 lruvec_lru_size+0x5c/0x110
Call Trace:
 shrink_node_memcg+0xd7/0x7d0
 count_shadow_nodes+0xac/0x220
 ? shrink_slab+0x2a3/0x2b0
 do_shrink_slab+0x55/0x2d0
 ? shrink_node+0xe0/0x4b0
 shrink_slab+0x219/0x2b0
 shrink_node+0xe0/0x4b0

Stefan

On 12.09.19 at 12:53, Stefan Priebe - Profihost AG wrote:
> Hello Michal,
> 
> now the kernel (5.2.14) locked up / deadlocked with:
> ---------------
> 2019-09-12 12:41:47     ------------[ cut here ]------------
> 2019-09-12 12:41:47     NETDEV WATCHDOG: eth0 (igb): transmit queue 2
> timed out
> 2019-09-12 12:41:47     WARNING: CPU: 2 PID: 0 at
> net/sched/sch_generic.c:443 dev_watchdog+0x254/0x260
> 2019-09-12 12:41:47     Modules linked in: btrfs dm_mod netconsole
> xt_tcpudp xt_owner xt_conntrack nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 fuse xt_multiport ipt_REJECT nf_reject_ipv4 xt_set
> iptable_filter bpfilter ip_set_hash_net ip_set nfnetlink 8021q garp
> bonding sb_edac x86_pkg_temp_thermal coretemp kvm_intel ast ttm kvm
> drm_kms_helper irqbypass drm crc32_pclmul fb_sys_fops lpc_ich ipmi_si
> syscopyarea sysfillrect ipmi_devintf mfd_core ghash_clmulni_intel wmi
> sysimgblt sg ipmi_msghandler button ip_tables x_tables zstd_decompress
> zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
> async_xor async_tx xor usbhid raid6_pq raid1 raid0 multipath linear
> md_mod xhci_pci sd_mod ehci_pci xhci_hcd ehci_hcd igb i2c_i801
> i2c_algo_bit ahci usbcore ptp libahci i2c_core usb_common pps_core
> megaraid_sas [last unloaded: btrfs]
> 2019-09-12 12:41:47     CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.2.14 #1
> 2019-09-12 12:41:47     Hardware name: Supermicro Super Server/X10SRi-F,
> BIOS 1.0b 04/21/2015
> 2019-09-12 12:41:47     RIP: 0010:dev_watchdog+0x254/0x260
> 2019-09-12 12:41:47     Code: 48 85 c0 75 e4 eb 9d 4c 89 ef c6 05 a6 09
> c8 00 01 e8 b0 53 fb ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 10 d6 0c be e8
> ac ca 98 ff <0f> 0b e9 7c ff ff ff 0f 1f 44 00 00 0f 1f 44 00 00 41 57
> 41 56 49
> 2019-09-12 12:41:47     RSP: 0018:ffffbea7c63a0e68 EFLAGS: 00010282
> 2019-09-12 12:41:47     RAX: 0000000000000000 RBX: 0000000000000002 RCX:
> 0000000000000006
> 2019-09-12 12:41:47     RDX: 0000000000000007 RSI: 0000000000000086 RDI:
> ffff96f9ff896540
> 2019-09-12 12:41:47     RBP: ffff96f9fc18041c R08: 0000000000000001 R09:
> 000000000000046f
> 2019-09-12 12:41:47     R10: ffff96f9ff89a630 R11: 0000000000000000 R12:
> ffff96f9f9e16940
> 2019-09-12 12:41:47     R13: ffff96f9fc180000 R14: ffff96f9fc180440 R15:
> 0000000000000008
> 2019-09-12 12:41:47     FS: 0000000000000000(0000)
> GS:ffff96f9ff880000(0000) knlGS:0000000000000000
> 2019-09-12 12:41:47     CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 2019-09-12 12:41:47     CR2: 00007fbb2c4e2000 CR3: 0000000c0d20a004 CR4:
> 00000000001606e0
> 2019-09-12 12:41:47     Call Trace:<IRQ>
> 2019-09-12 12:41:47     ?
> pfifo_fast_reset+0x110/0x110call_timer_fn+0x2d/0x140run_timer_softirq+0x1e2/0x440
> 2019-09-12 12:41:47     ? timerqueue_add+0x54/0x80
> 2019-09-12 12:41:48     ?
> enqueue_hrtimer+0x3a/0x90__do_softirq+0x10c/0x2d4irq_exit+0xdd/0xf0smp_apic_timer_interrupt+0x74/0x130apic_timer_interrupt+0xf/0x20</IRQ>
> 2019-09-12 12:41:48     RIP: 0010:cpuidle_enter_state+0xbd/0x410
> 2019-09-12 12:41:48     Code: 24 0f 1f 44 00 00 31 ff e8 b0 67 a5 ff 80
> 7c 24 13 00 74 12 9c 58 f6 c4 02 0f 85 2c 03 00 00 31 ff e8 a7 b9 aa ff
> fb 45 85 ed <0f> 88 e0 02 00 00 4c 8b 04 24 4c 2b 44 24 08 48 ba cf f7
> 53 e3 a5
> 2019-09-12 12:41:48     RSP: 0018:ffffbea7c62f7e60 EFLAGS: 00000202
> ORIG_RAX: ffffffffffffff13
> 2019-09-12 12:41:48     RAX: ffff96f9ff8a9840 RBX: ffffffffbe3271a0 RCX:
> 000000000000001f
> 2019-09-12 12:41:48     RDX: 000044a065c471eb RSI: 0000000024925419 RDI:
> 0000000000000000
> 2019-09-12 12:41:48     RBP: ffffdea7bfa80f00 R08: 0000000000000002 R09:
> 00000000000290c0
> 2019-09-12 12:41:48     R10: 00000000ffffffff R11: 0000000000000f05 R12:
> 0000000000000004
> 2019-09-12 12:41:48     R13: 0000000000000004 R14: 0000000000000004 R15:
> ffffffffbe3271a0cpuidle_enter+0x29/0x40do_idle+0x1d5/0x220cpu_startup_entry+0x19/0x20start_secondary+0x16b/0x1b0secondary_startup_64+0xa4/0xb0
> 2019-09-12 12:41:48     ---[ end trace 3241d99856ac4582 ]---
> 2019-09-12 12:41:48     igb 0000:05:00.0 eth0: Reset adapter
> -------------------------------
> 
> Stefan
> On 11.09.19 at 15:59, Stefan Priebe - Profihost AG wrote:
>> Hi,
>>
>> I've now tried v5.2.14, but that one died with the following - I don't
>> know which version to try now...
>>
>> 2019-09-11 15:41:09     ------------[ cut here ]------------
>> 2019-09-11 15:41:09     kernel BUG at mm/page-writeback.c:2655!
>> 2019-09-11 15:41:09     invalid opcode: 0000 [#1] SMP PTI
>> 2019-09-11 15:41:09     CPU: 4 PID: 466 Comm: kworker/u24:6 Not tainted
>> 5.2.14 #1
>> 2019-09-11 15:41:09     Hardware name: Supermicro Super Server/X10SRi-F,
>> BIOS 1.0b 04/21/2015
>> 2019-09-11 15:41:09     Workqueue: btrfs-delalloc btrfs_delalloc_helper
>> [btrfs]
>> 2019-09-11 15:41:09     RIP: 0010:clear_page_dirty_for_io+0xfc/0x210
>> 2019-09-11 15:41:09     Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00
>> 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41
>> 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85
>> d2 48 8b
>> 2019-09-11 15:41:09     RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246
>> 2019-09-11 15:41:09     RAX: 001000000004205c RBX: ffffe660525b3140 RCX:
>> 0000000000000000
>> 2019-09-11 15:41:09     RDX: 0000000000000000 RSI: 0000000000000006 RDI:
>> ffffe660525b3140
>> 2019-09-11 15:41:09     RBP: ffff9ad639868818 R08: 0000000000000001 R09:
>> 000000000002de18
>> 2019-09-11 15:41:09     R10: 0000000000000002 R11: ffff9ade7ffd6000 R12:
>> 0000000000000000
>> 2019-09-11 15:41:09     R13: 0000000000000001 R14: 0000000000000000 R15:
>> ffffbd4b8d2f3d08
>> 2019-09-11 15:41:09     FS: 0000000000000000(0000)
>> GS:ffff9ade3f900000(0000) knlGS:0000000000000000
>> 2019-09-11 15:41:09     CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> 2019-09-11 15:41:09     CR2: 000055fa10d2bf70 CR3: 00000005a420a002 CR4:
>> 00000000001606e0
>> 2019-09-11 15:41:09     Call Trace:
>> 2019-09-11 15:41:09     __process_pages_contig+0x270/0x360 [btrfs]
>> 2019-09-11 15:41:09     submit_compressed_extents+0x39d/0x460 [btrfs]
>> 2019-09-11 15:41:09     normal_work_helper+0x20f/0x320
>> [btrfs]process_one_work+0x18b/0x380worker_thread+0x4f/0x3a0
>> 2019-09-11 15:41:09     ? rescuer_thread+0x330/0x330kthread+0xf8/0x130
>> 2019-09-11 15:41:09     ?
>> kthread_create_worker_on_cpu+0x70/0x70ret_from_fork+0x35/0x40
>> 2019-09-11 15:41:09     Modules linked in: netconsole xt_tcpudp xt_owner
>> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport
>> ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter fuse
>> ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac
>> x86_pkg_temp_thermal coretemp kvm_intel ast kvm ttm drm_kms_helper
>> irqbypass crc32_pclmul drm fb_sys_fops syscopyarea lpc_ich sysfillrect
>> ghash_clmulni_intel sysimgblt mfd_core sg wmi ipmi_si ipmi_devintf
>> ipmi_msghandler button ip_tables x_tables btrfs zstd_decompress
>> zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq
>> async_xor async_tx xor usbhid raid6_pq raid1 raid0 multipath linear
>> md_mod sd_mod xhci_pci ehci_pci igb xhci_hcd ehci_hcd i2c_algo_bit
>> i2c_i801 ahci ptp i2c_core usbcore libahci usb_common pps_core megaraid_sas
>> 2019-09-11 15:41:09     ---[ end trace d9a3f99c047dc8bf ]---
>> 2019-09-11 15:41:10     RIP: 0010:clear_page_dirty_for_io+0xfc/0x210
>> 2019-09-11 15:41:10     Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00
>> 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41
>> 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85
>> d2 48 8b
>> 2019-09-11 15:41:10     RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246
>> 2019-09-11 15:41:10     RAX: 001000000004205c RBX: ffffe660525b3140 RCX:
>> 0000000000000000
>> 2019-09-11 15:41:10     RDX: 0000000000000000 RSI: 0000000000000006 RDI:
>> ffffe660525b3140
>> 2019-09-11 15:41:10     RBP: ffff9ad639868818 R08: 0000000000000001 R09:
>> 000000000002de18
>> 2019-09-11 15:41:10     R10: 0000000000000002 R11: ffff9ade7ffd6000 R12:
>> 0000000000000000
>> 2019-09-11 15:41:10     R13: 0000000000000001 R14: 0000000000000000 R15:
>> ffffbd4b8d2f3d08
>> 2019-09-11 15:41:10     FS: 0000000000000000(0000)
>> GS:ffff9ade3f900000(0000) knlGS:0000000000000000
>> 2019-09-11 15:41:10     CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> 2019-09-11 15:41:10     CR2: 000055fa10d2bf70 CR3: 00000005a420a002 CR4:
>> 00000000001606e0
>> 2019-09-11 15:41:10     Kernel panic - not syncing: Fatal exception
>> 2019-09-11 15:41:10     Kernel Offset: 0x1a000000 from
>> 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>> 2019-09-11 15:41:10     Rebooting in 20 seconds..
>> 2019-09-11 15:41:29     ACPI MEMORY or I/O RESET_REG.
>>
>> Stefan
>> On 11.09.19 at 08:24, Stefan Priebe - Profihost AG wrote:
>>> Hi Michal,
>>>
>>> On 11.09.19 at 08:12, Stefan Priebe - Profihost AG wrote:
>>>> Hi Michal,
>>>> On 10.09.19 at 15:24, Michal Hocko wrote:
>>>>> On Tue 10-09-19 15:14:45, Stefan Priebe - Profihost AG wrote:
>>>>>> On 10.09.19 at 15:05, Stefan Priebe - Profihost AG wrote:
>>>>>>>
>>>>>>>> On 10.09.19 at 14:57, Michal Hocko wrote:
>>>>>>>> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote:
>>>>>>>>> Hello Michal,
>>>>>>>>>
>>>>>>>>> OK, this might take a long time. Attached you'll find a graph from a
>>>>>>>>> fresh boot showing what happens over time (here 17 August to 30
>>>>>>>>> August). Memory usage decreases, as does cache, but only slowly, over days.
>>>>>>>>>
>>>>>>>>> So it might take 2-3 weeks running Kernel 5.3 to see what happens.
>>>>>>>>
>>>>>>>> No problem. Just make sure to collect the requested data from the time
>>>>>>>> you see the actual problem. Btw. did you try my very dumb scriptlets
>>>>>>>> to get an idea of how much memory gets reclaimed due to THP?
>>>>>>>
>>>>>>> You mean your sed and sort on top of the trace file? No, I did not with
>>>>>>> the current 5.3 kernel; do you think it will show anything interesting?
>>>>>>> Which line shows me how much memory gets reclaimed due to THP?
>>>>>
>>>>> Please re-read http://lkml.kernel.org/r/20190910082919.GL2063@dhcp22.suse.cz
>>>>> Each command has a commented output. If you see the number of reclaimed
>>>>> pages to be large for GFP_TRANSHUGE then you are seeing a similar
>>>>> problem.
>>>>>
>>>>>> Is something like a kernel memory leak possible? Or wouldn't this end
>>>>>> up with a lot of free memory which doesn't seem usable?
>>>>>
>>>>> I would be really surprised if this was the case.
>>>>>
>>>>>> I also wonder why a reclaim takes place when there is enough memory.
>>>>>
>>>>> This is not clear yet and it might be a bug that has been fixed since
>>>>> 4.18. That's why we need to see whether the same pattern is happening
>>>>> with 5.3 as well.
>>>
>>> But apart from the btrfs problem, the memory consumption looks far
>>> better than before.
>>>
>>> Running 4.19.X:
>>> after about 12h cache starts to drop from 30G to 24G
>>>
>>> Running 5.3-rc8:
>>> after about 24h cache is still constant at nearly 30G
>>>
>>> Greets,
>>> Stefan
>>>


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-11  6:12                                               ` Stefan Priebe - Profihost AG
  2019-09-11  6:24                                                 ` Stefan Priebe - Profihost AG
  2019-09-11  7:09                                                 ` 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) Michal Hocko
@ 2019-09-19 10:21                                                 ` Stefan Priebe - Profihost AG
  2019-09-23 12:08                                                   ` Michal Hocko
  2019-09-27 12:45                                                   ` Vlastimil Babka
  2 siblings, 2 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-19 10:21 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

Dear Michal,

On 11.09.19 at 08:12, Stefan Priebe - Profihost AG wrote:
> Hi Michal,
> On 10.09.19 at 15:24, Michal Hocko wrote:
>> On Tue 10-09-19 15:14:45, Stefan Priebe - Profihost AG wrote:
>>> On 10.09.19 at 15:05, Stefan Priebe - Profihost AG wrote:
>>>>
>>>>> On 10.09.19 at 14:57, Michal Hocko wrote:
>>>>> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote:
>>>>>> Hello Michal,
>>>>>>
>>>>>> OK, this might take a long time. Attached you'll find a graph from a
>>>>>> fresh boot showing what happens over time (here 17 August to 30
>>>>>> August). Memory usage decreases, as does cache, but only slowly, over days.
>>>>>>
>>>>>> So it might take 2-3 weeks running Kernel 5.3 to see what happens.
>>>>>
>>>>> No problem. Just make sure to collect the requested data from the time
>>>>> you see the actual problem. Btw. did you try my very dumb scriptlets
>>>>> to get an idea of how much memory gets reclaimed due to THP?
>>>>
>>>> You mean your sed and sort on top of the trace file? No, I did not with
>>>> the current 5.3 kernel; do you think it will show anything interesting?
>>>> Which line shows me how much memory gets reclaimed due to THP?
>>
>> Please re-read http://lkml.kernel.org/r/20190910082919.GL2063@dhcp22.suse.cz
>> Each command has a commented output. If you see the number of reclaimed
>> pages to be large for GFP_TRANSHUGE then you are seeing a similar
>> problem.
>>
>>> Is something like a kernel memory leak possible? Or wouldn't this end
>>> up with a lot of free memory which doesn't seem usable?
>>
>> I would be really surprised if this was the case.
>>
>>> I also wonder why a reclaim takes place when there is enough memory.
>>
>> This is not clear yet and it might be a bug that has been fixed since
>> 4.18. That's why we need to see whether the same pattern is happening
>> with 5.3 as well.

Kernel 5.2.14 has now been running for exactly 7 days, so we can easily
view a trend; I'm not sure if I should post graphs.

Cache size is continuously shrinking while memfree is rising.

While there were 4.5GB free on avg in the beginning, we now have an avg of
8GB free memory.

Cache has shrunk from avg 24G to avg 18G.

Memory pressure has risen from avg 0% to avg 0.1% - not much, but if you
look at the graphs it's continuously rising while cache is shrinking and
memfree is rising.

Which values should I collect now?

Greets,
Stefan


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-19 10:21                                                 ` lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG
@ 2019-09-23 12:08                                                   ` Michal Hocko
  2019-09-27 12:45                                                   ` Vlastimil Babka
  1 sibling, 0 replies; 61+ messages in thread
From: Michal Hocko @ 2019-09-23 12:08 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

On Thu 19-09-19 12:21:15, Stefan Priebe - Profihost AG wrote:
[...]
> Which values should I collect now?

Collect the same tracepoints as in the past.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-19 10:21                                                 ` lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG
  2019-09-23 12:08                                                   ` Michal Hocko
@ 2019-09-27 12:45                                                   ` Vlastimil Babka
  2019-09-30  6:56                                                     ` Stefan Priebe - Profihost AG
  2019-10-22  7:41                                                     ` Stefan Priebe - Profihost AG
  1 sibling, 2 replies; 61+ messages in thread
From: Vlastimil Babka @ 2019-09-27 12:45 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

[-- Attachment #1: Type: text/plain, Size: 1279 bytes --]

On 9/19/19 12:21 PM, Stefan Priebe - Profihost AG wrote:
> Kernel 5.2.14 has now been running for exactly 7 days, so we can easily
> view a trend; I'm not sure if I should post graphs.
> 
> Cache size is continuously shrinking while memfree is rising.
> 
> While there were 4.5GB free on avg in the beginning, we now have an avg of
> 8GB free memory.
> 
> Cache has shrunk from avg 24G to avg 18G.
> 
> Memory pressure has risen from avg 0% to avg 0.1% - not much, but if you
> look at the graphs it's continuously rising while cache is shrinking and
> memfree is rising.

Hi, could you try the patch below? I suspect you're hitting a corner
case where compaction_suitable() returns COMPACT_SKIPPED for the
ZONE_DMA, triggering reclaim even if other zones have plenty of free
memory. And should_continue_reclaim() then returns true until twice the
requested page size is reclaimed (compact_gap()). That means 4MB
reclaimed for each THP allocation attempt, which roughly matches the
trace data you provided previously.

The amplification to 4MB should be removed in patches merged for 5.4, so
it would be only 32 pages reclaimed per THP allocation. The patch below
tries to remove this corner case completely, and it should be more
visible on your 5.2.x, so please apply it there.
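
For reference, the arithmetic above as a minimal userspace sketch (an
illustration only: it assumes 4KB base pages and order-9 THPs as on
x86-64, and models compact_gap() on the kernel's mm/internal.h):

#include <stdio.h>

/* Twice the requested allocation size; should_continue_reclaim() keeps
 * reclaiming until this gap is met. */
static unsigned long compact_gap(unsigned int order)
{
	return 2UL << order;
}

int main(void)
{
	unsigned int thp_order = 9;	/* 2MB THP = order-9 with 4KB pages */
	unsigned long pages = compact_gap(thp_order);

	/* 2UL << 9 = 1024 base pages, i.e. 4MB per THP allocation attempt */
	printf("compact_gap(%u) = %lu pages = %luMB\n",
	       thp_order, pages, pages * 4 / 1024);
	return 0;
}

The 32 pages figure for 5.4 presumably corresponds to SWAP_CLUSTER_MAX,
the minimum reclaim batch.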

[-- Attachment #2: 0001-mm-compaction-distinguish-when-compaction-is-impossi.patch --]
[-- Type: text/x-patch, Size: 6381 bytes --]

From 565008042b759835d51703f1da9b335dc0404546 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 12 Sep 2019 13:40:46 +0200
Subject: [PATCH] mm, compaction: distinguish when compaction is impossible

---
 include/linux/compaction.h     |  7 ++++++-
 include/trace/events/mmflags.h |  1 +
 mm/compaction.c                | 16 +++++++++++++--
 mm/vmscan.c                    | 36 ++++++++++++++++++++++++----------
 4 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 9569e7c786d3..6e624f482a08 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -17,8 +17,13 @@ enum compact_priority {
 };
 
 /* Return values for compact_zone() and try_to_compact_pages() */
-/* When adding new states, please adjust include/trace/events/compaction.h */
+/* When adding new states, please adjust include/trace/events/mmflags.h */
 enum compact_result {
+	/*
+	 * The zone is too small to provide the requested allocation even if
+	 * fully freed (i.e. ZONE_DMA for THP allocation due to lowmem reserves)
+	 */
+	COMPACT_IMPOSSIBLE,
 	/* For more detailed tracepoint output - internal to compaction */
 	COMPACT_NOT_SUITABLE_ZONE,
 	/*
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a1675d43777e..557dad69a9db 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -170,6 +170,7 @@ IF_HAVE_VM_SOFTDIRTY(VM_SOFTDIRTY,	"softdirty"	)		\
 
 #ifdef CONFIG_COMPACTION
 #define COMPACTION_STATUS					\
+	EM( COMPACT_IMPOSSIBLE,		"impossible")		\
 	EM( COMPACT_SKIPPED,		"skipped")		\
 	EM( COMPACT_DEFERRED,		"deferred")		\
 	EM( COMPACT_CONTINUE,		"continue")		\
diff --git a/mm/compaction.c b/mm/compaction.c
index 9e1b9acb116b..50a3dd2e2b6e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1948,6 +1948,7 @@ static enum compact_result compact_finished(struct compact_control *cc)
 /*
  * compaction_suitable: Is this suitable to run compaction on this zone now?
  * Returns
+ *   COMPACT_IMPOSSIBLE If the allocation would fail even with all pages free
  *   COMPACT_SKIPPED  - If there are too few free pages for compaction
  *   COMPACT_SUCCESS  - If the allocation would succeed without compaction
  *   COMPACT_CONTINUE - If compaction should run now
@@ -1971,6 +1972,16 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 								alloc_flags))
 		return COMPACT_SUCCESS;
 
+	/*
+	 * If the allocation would not succeed even with a fully free zone
+	 * due to e.g. lowmem reserves, indicate that compaction can't possibly
+	 * help and it would be pointless to reclaim.
+	 */
+	watermark += 1UL << order;
+	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
+				 alloc_flags, zone_managed_pages(zone)))
+		return COMPACT_IMPOSSIBLE;
+
 	/*
 	 * Watermarks for order-0 must be met for compaction to be able to
 	 * isolate free pages for migration targets. This means that the
@@ -2058,7 +2069,7 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
 		compact_result = __compaction_suitable(zone, order, alloc_flags,
 				ac_classzone_idx(ac), available);
-		if (compact_result != COMPACT_SKIPPED)
+		if (compact_result > COMPACT_SKIPPED)
 			return true;
 	}
 
@@ -2079,7 +2090,8 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 	ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
 	/* Compaction is likely to fail */
-	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
+	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED
+	    || ret == COMPACT_IMPOSSIBLE)
 		return ret;
 
 	/* huh, compaction_suitable is returning something unexpected */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 910e02c793ff..20ba471a8454 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2778,11 +2778,12 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 }
 
 /*
- * Returns true if compaction should go ahead for a costly-order request, or
- * the allocation would already succeed without compaction. Return false if we
- * should reclaim first.
+ * Returns 1 if compaction should go ahead for a costly-order request, or the
+ * allocation would already succeed without compaction. Return 0 if we should
+ * reclaim first. Return -1 when compaction can't help at all due to zone being
+ * too small, which means there's no point in reclaim nor compaction.
  */
-static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
+static inline int compaction_ready(struct zone *zone, struct scan_control *sc)
 {
 	unsigned long watermark;
 	enum compact_result suitable;
@@ -2790,10 +2791,16 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 	suitable = compaction_suitable(zone, sc->order, 0, sc->reclaim_idx);
 	if (suitable == COMPACT_SUCCESS)
 		/* Allocation should succeed already. Don't reclaim. */
-		return true;
+		return 1;
 	if (suitable == COMPACT_SKIPPED)
 		/* Compaction cannot yet proceed. Do reclaim. */
-		return false;
+		return 0;
+	if (suitable == COMPACT_IMPOSSIBLE)
+		/*
+		 * Compaction can't possibly help. So don't reclaim, but keep
+		 * checking other zones.
+		 */
+		return -1;
 
 	/*
 	 * Compaction is already possible, but it takes time to run and there
@@ -2839,6 +2846,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 
 	for_each_zone_zonelist_nodemask(zone, z, zonelist,
 					sc->reclaim_idx, sc->nodemask) {
+		int compact_ready;
 		/*
 		 * Take care memory controller reclaiming has small influence
 		 * to global LRU.
@@ -2858,10 +2866,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			 * page allocations.
 			 */
 			if (IS_ENABLED(CONFIG_COMPACTION) &&
-			    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
-			    compaction_ready(zone, sc)) {
-				sc->compaction_ready = true;
-				continue;
+			    sc->order > PAGE_ALLOC_COSTLY_ORDER) {
+				compact_ready = compaction_ready(zone, sc);
+				if (compact_ready == 1) {
+					sc->compaction_ready = true;
+					continue;
+				} else if (compact_ready == -1) {
+					/*
+					 * In this zone, neither reclaim nor
+					 * compaction can help.
+					 */
+					continue;
+				}
 			}
 
 			/*
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-27 12:45                                                   ` Vlastimil Babka
@ 2019-09-30  6:56                                                     ` Stefan Priebe - Profihost AG
  2019-09-30  7:21                                                       ` Vlastimil Babka
  2019-10-22  7:41                                                     ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-30  6:56 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

Hi,

the current status is that everything works well since I switched from
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS to CONFIG_TRANSPARENT_HUGEPAGE_MADVISE.
On 27.09.19 at 14:45, Vlastimil Babka wrote:
> On 9/19/19 12:21 PM, Stefan Priebe - Profihost AG wrote:
>> Kernel 5.2.14 has now been running for exactly 7 days, so we can easily
>> view a trend; I'm not sure if I should post graphs.
>>
>> Cache size is continuously shrinking while memfree is rising.
>>
>> While there were 4.5GB free on avg in the beginning, we now have an avg of
>> 8GB free memory.
>>
>> Cache has shrunk from avg 24G to avg 18G.
>>
>> Memory pressure has risen from avg 0% to avg 0.1% - not much, but if you
>> look at the graphs it's continuously rising while cache is shrinking and
>> memfree is rising.
> 
> Hi, could you try the patch below? I suspect you're hitting a corner
> case where compaction_suitable() returns COMPACT_SKIPPED for the
> ZONE_DMA, triggering reclaim even if other zones have plenty of free
> memory. And should_continue_reclaim() then returns true until twice the
> requested page size is reclaimed (compact_gap()). That means 4MB
> reclaimed for each THP allocation attempt, which roughly matches the
> trace data you provided previously.
> 
> The amplification to 4MB should be removed in patches merged for 5.4, so
> it would be only 32 pages reclaimed per THP allocation. The patch below
> tries to remove this corner case completely, and it should be more
> visible on your 5.2.x, so please apply it there.

So I switched back to the 4.19 LTS kernel, as this is the kernel we run
on all our infrastructure. THP is now only in use on KVM host machines.
Your patch applies to 4.19 as well - but I'm not sure if it is a good
idea to apply it to those machines.

Greets,
Stefan


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-30  6:56                                                     ` Stefan Priebe - Profihost AG
@ 2019-09-30  7:21                                                       ` Vlastimil Babka
  0 siblings, 0 replies; 61+ messages in thread
From: Vlastimil Babka @ 2019-09-30  7:21 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On 9/30/19 8:56 AM, Stefan Priebe - Profihost AG wrote:
> Hi,
> 
> the current status is that everything works well since I switched from
> CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS to CONFIG_TRANSPARENT_HUGEPAGE_MADVISE.

Thanks, that indeed confirms the problem is related to THPs.

> On 27.09.19 at 14:45, Vlastimil Babka wrote:
>> On 9/19/19 12:21 PM, Stefan Priebe - Profihost AG wrote:
>>
>> Hi, could you try the patch below? I suspect you're hitting a corner
>> case where compaction_suitable() returns COMPACT_SKIPPED for the
>> ZONE_DMA, triggering reclaim even if other zones have plenty of free
>> memory. And should_continue_reclaim() then returns true until twice the
>> requested page size is reclaimed (compact_gap()). That means 4MB
>> reclaimed for each THP allocation attempt, which roughly matches the
>> trace data you provided previously.
>>
>> The amplification to 4MB should be removed in patches merged for 5.4, so
>> it would be only 32 pages reclaimed per THP allocation. The patch below
>> tries to remove this corner case completely, and it should be more
>> visible on your 5.2.x, so please apply it there.
> 
> So I switched back to the 4.19 LTS kernel, as this is the kernel we run
> on all our infrastructure. THP is now only in use on KVM host machines.
> Your patch applies to 4.19 as well - but I'm not sure if it is a good
> idea to apply it to those machines.

If you could try that, it would be great (and switch hugepages back to
'always' after applying). The problem is older than 4.19.
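
(Assuming the usual sysfs interface, the mode can also be flipped at
runtime without a rebuild, via
echo always > /sys/kernel/mm/transparent_hugepage/enabled,
which overrides the CONFIG_TRANSPARENT_HUGEPAGE_* compile-time default.)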

> Greets,
> Stefan
> 



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-27 12:45                                                   ` Vlastimil Babka
  2019-09-30  6:56                                                     ` Stefan Priebe - Profihost AG
@ 2019-10-22  7:41                                                     ` Stefan Priebe - Profihost AG
  2019-10-22  7:48                                                       ` Vlastimil Babka
  1 sibling, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-10-22  7:41 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

Hi,
On 27.09.19 at 14:45, Vlastimil Babka wrote:
> On 9/19/19 12:21 PM, Stefan Priebe - Profihost AG wrote:
>> Kernel 5.2.14 has now been running for exactly 7 days, so we can easily
>> view a trend; I'm not sure if I should post graphs.
>>
>> Cache size is continuously shrinking while memfree is rising.
>>
>> While there were 4.5GB free on avg in the beginning, we now have an avg of
>> 8GB free memory.
>>
>> Cache has shrunk from avg 24G to avg 18G.
>>
>> Memory pressure has risen from avg 0% to avg 0.1% - not much, but if you
>> look at the graphs it's continuously rising while cache is shrinking and
>> memfree is rising.
> 
> Hi, could you try the patch below? I suspect you're hitting a corner
> case where compaction_suitable() returns COMPACT_SKIPPED for the
> ZONE_DMA, triggering reclaim even if other zones have plenty of free
> memory. And should_continue_reclaim() then returns true until twice the
> requested page size is reclaimed (compact_gap()). That means 4MB
> reclaimed for each THP allocation attempt, which roughly matches the
> trace data you provided previously.
> 
> The amplification to 4MB should be removed in patches merged for 5.4, so
> it would be only 32 pages reclaimed per THP allocation. The patch below
> tries to remove this corner case completely, and it should be more
> visible on your 5.2.x, so please apply it there.
> 
Is there any reason not to apply that one on top of 4.19?

Greets,
Stefan



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-10-22  7:41                                                     ` Stefan Priebe - Profihost AG
@ 2019-10-22  7:48                                                       ` Vlastimil Babka
  2019-10-22 10:02                                                         ` Stefan Priebe - Profihost AG
  0 siblings, 1 reply; 61+ messages in thread
From: Vlastimil Babka @ 2019-10-22  7:48 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On 10/22/19 9:41 AM, Stefan Priebe - Profihost AG wrote:
>> Hi, could you try the patch below? I suspect you're hitting a corner
>> case where compaction_suitable() returns COMPACT_SKIPPED for the
>> ZONE_DMA, triggering reclaim even if other zones have plenty of free
>> memory. And should_continue_reclaim() then returns true until twice the
>> requested page size is reclaimed (compact_gap()). That means 4MB
>> reclaimed for each THP allocation attempt, which roughly matches the
>> trace data you provided previously.
>>
>> The amplification to 4MB should be removed in patches merged for 5.4, so
>> it would be only 32 pages reclaimed per THP allocation. The patch below
>> tries to remove this corner case completely, and it should be more
>> visible on your 5.2.x, so please apply it there.
>>
> Is there any reason not to apply that one on top of 4.19?
> 
> Greets,
> Stefan
> 

It should work; it cherry-picks fine without conflict here.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-10-22  7:48                                                       ` Vlastimil Babka
@ 2019-10-22 10:02                                                         ` Stefan Priebe - Profihost AG
  2019-10-22 10:20                                                           ` Oscar Salvador
  2019-10-22 10:21                                                           ` Vlastimil Babka
  0 siblings, 2 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-10-22 10:02 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner


On 22.10.19 at 09:48, Vlastimil Babka wrote:
> On 10/22/19 9:41 AM, Stefan Priebe - Profihost AG wrote:
>>> Hi, could you try the patch below? I suspect you're hitting a corner
>>> case where compaction_suitable() returns COMPACT_SKIPPED for the
>>> ZONE_DMA, triggering reclaim even if other zones have plenty of free
>>> memory. And should_continue_reclaim() then returns true until twice the
>>> requested page size is reclaimed (compact_gap()). That means 4MB
>>> reclaimed for each THP allocation attempt, which roughly matches the
>>> trace data you provided previously.
>>>
>>> The amplification to 4MB should be removed in patches merged for 5.4, so
>>> it would be only 32 pages reclaimed per THP allocation. The patch below
>>> tries to remove this corner case completely, and it should be more
>>> visible on your 5.2.x, so please apply it there.
>>>
>> Is there any reason not to apply that one on top of 4.19?
>>
>> Greets,
>> Stefan
>>
> 
> It should work; it cherry-picks fine without conflict here.

OK, but it does not work ;-)


mm/compaction.c: In function '__compaction_suitable':
mm/compaction.c:1451:19: error: implicit declaration of function
'zone_managed_pages'; did you mean 'node_spanned_pages'?
[-Werror=implicit-function-declaration]
      alloc_flags, zone_managed_pages(zone)))
                   ^~~~~~~~~~~~~~~~~~
                   node_spanned_pages

Greets,
Stefan




^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-10-22 10:02                                                         ` Stefan Priebe - Profihost AG
@ 2019-10-22 10:20                                                           ` Oscar Salvador
  2019-10-22 10:21                                                           ` Vlastimil Babka
  1 sibling, 0 replies; 61+ messages in thread
From: Oscar Salvador @ 2019-10-22 10:20 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Vlastimil Babka, Michal Hocko, linux-mm, l.roehrs, cgroups,
	Johannes Weiner

On Tue, Oct 22, 2019 at 12:02:13PM +0200, Stefan Priebe - Profihost AG wrote:
> 
> On 22.10.19 at 09:48, Vlastimil Babka wrote:
> > On 10/22/19 9:41 AM, Stefan Priebe - Profihost AG wrote:
> >>> Hi, could you try the patch below? I suspect you're hitting a corner
> >>> case where compaction_suitable() returns COMPACT_SKIPPED for the
> >>> ZONE_DMA, triggering reclaim even if other zones have plenty of free
> >>> memory. And should_continue_reclaim() then returns true until twice the
> >>> requested page size is reclaimed (compact_gap()). That means 4MB
> >>> reclaimed for each THP allocation attempt, which roughly matches the
> >>> trace data you provided previously.
> >>>
> >>> The amplification to 4MB should be removed in patches merged for 5.4, so
> >>> it would be only 32 pages reclaimed per THP allocation. The patch below
> >>> tries to remove this corner case completely, and it should be more
> >>> visible on your 5.2.x, so please apply it there.
> >>>
> >> Is there any reason not to apply that one on top of 4.19?
> >>
> >> Greets,
> >> Stefan
> >>
> > 
> > It should work; it cherry-picks fine without conflict here.
> 
> OK, but it does not work ;-)
> 
> 
> mm/compaction.c: In function '__compaction_suitable':
> mm/compaction.c:1451:19: error: implicit declaration of function
> 'zone_managed_pages'; did you mean 'node_spanned_pages'?
> [-Werror=implicit-function-declaration]
>       alloc_flags, zone_managed_pages(zone)))
>                    ^~~~~~~~~~~~~~~~~~
>                    node_spanned_pages

zone_managed_pages() was introduced later; on 4.19 you need
zone->managed_pages. So changing zone_managed_pages(zone) to
zone->managed_pages in that hunk should do the trick.
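
In other words, the backported check in __compaction_suitable() would
look something like this (a sketch against 4.19; the surrounding lines
stay as in the original patch):

	watermark += 1UL << order;
	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
				 alloc_flags, zone->managed_pages))
		return COMPACT_IMPOSSIBLE;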

> 
> Greets,
> Stefan
> 
> 
> 

-- 
Oscar Salvador
SUSE L3


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-10-22 10:02                                                         ` Stefan Priebe - Profihost AG
  2019-10-22 10:20                                                           ` Oscar Salvador
@ 2019-10-22 10:21                                                           ` Vlastimil Babka
  2019-10-22 11:08                                                             ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 61+ messages in thread
From: Vlastimil Babka @ 2019-10-22 10:21 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On 10/22/19 12:02 PM, Stefan Priebe - Profihost AG wrote:
> 
> On 22.10.19 at 09:48, Vlastimil Babka wrote:
>> On 10/22/19 9:41 AM, Stefan Priebe - Profihost AG wrote:
>>>> Hi, could you try the patch below? I suspect you're hitting a corner
>>>> case where compaction_suitable() returns COMPACT_SKIPPED for the
>>>> ZONE_DMA, triggering reclaim even if other zones have plenty of free
>>>> memory. And should_continue_reclaim() then returns true until twice the
>>>> requested page size is reclaimed (compact_gap()). That means 4MB
>>>> reclaimed for each THP allocation attempt, which roughly matches the
>>>> trace data you provided previously.
>>>>
>>>> The amplification to 4MB should be removed in patches merged for 5.4, so
>>>> it would be only 32 pages reclaimed per THP allocation. The patch below
>>>> tries to remove this corner case completely, and it should be more
>>>> visible on your 5.2.x, so please apply it there.
>>>>
>>> Is there any reason not to apply that one on top of 4.19?
>>>
>>> Greets,
>>> Stefan
>>>
>>
>> It should work; it cherry-picks fine without conflict here.
> 
> OK, but it does not work ;-)
> 
> 
> mm/compaction.c: In function '__compaction_suitable':
> mm/compaction.c:1451:19: error: implicit declaration of function
> 'zone_managed_pages'; did you mean 'node_spanned_pages'?
> [-Werror=implicit-function-declaration]
>       alloc_flags, zone_managed_pages(zone)))
>                    ^~~~~~~~~~~~~~~~~~
>                    node_spanned_pages

Ah, this?

----8<----
From f1335e1c0d4b74205fc0cc40b5960223d6f1dec7 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 12 Sep 2019 13:40:46 +0200
Subject: [PATCH] WIP

---
 include/linux/compaction.h     |  7 ++++++-
 include/trace/events/mmflags.h |  1 +
 mm/compaction.c                | 16 +++++++++++++--
 mm/vmscan.c                    | 36 ++++++++++++++++++++++++----------
 4 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 68250a57aace..2f3b331c5239 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -17,8 +17,13 @@ enum compact_priority {
 };
 
 /* Return values for compact_zone() and try_to_compact_pages() */
-/* When adding new states, please adjust include/trace/events/compaction.h */
+/* When adding new states, please adjust include/trace/events/mmflags.h */
 enum compact_result {
+	/*
+	 * The zone is too small to provide the requested allocation even if
+	 * fully freed (i.e. ZONE_DMA for THP allocation due to lowmem reserves)
+	 */
+	COMPACT_IMPOSSIBLE,
 	/* For more detailed tracepoint output - internal to compaction */
 	COMPACT_NOT_SUITABLE_ZONE,
 	/*
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a81cffb76d89..d7aa9cece234 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -169,6 +169,7 @@ IF_HAVE_VM_SOFTDIRTY(VM_SOFTDIRTY,	"softdirty"	)		\
 
 #ifdef CONFIG_COMPACTION
 #define COMPACTION_STATUS					\
+	EM( COMPACT_IMPOSSIBLE,		"impossible")		\
 	EM( COMPACT_SKIPPED,		"skipped")		\
 	EM( COMPACT_DEFERRED,		"deferred")		\
 	EM( COMPACT_CONTINUE,		"continue")		\
diff --git a/mm/compaction.c b/mm/compaction.c
index 5079ddbec8f9..7d2299c7faa2 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1416,6 +1416,7 @@ static enum compact_result compact_finished(struct zone *zone,
 /*
  * compaction_suitable: Is this suitable to run compaction on this zone now?
  * Returns
+ *   COMPACT_IMPOSSIBLE If the allocation would fail even with all pages free
  *   COMPACT_SKIPPED  - If there are too few free pages for compaction
  *   COMPACT_SUCCESS  - If the allocation would succeed without compaction
  *   COMPACT_CONTINUE - If compaction should run now
@@ -1439,6 +1440,16 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 								alloc_flags))
 		return COMPACT_SUCCESS;
 
+	/*
+	 * If the allocation would not succeed even with a fully free zone
+	 * due to e.g. lowmem reserves, indicate that compaction can't possibly
+	 * help and it would be pointless to reclaim.
+	 */
+	watermark += 1UL << order;
+	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
+				 alloc_flags, zone->managed_pages))
+		return COMPACT_IMPOSSIBLE;
+
 	/*
 	 * Watermarks for order-0 must be met for compaction to be able to
 	 * isolate free pages for migration targets. This means that the
@@ -1526,7 +1537,7 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
 		compact_result = __compaction_suitable(zone, order, alloc_flags,
 				ac_classzone_idx(ac), available);
-		if (compact_result != COMPACT_SKIPPED)
+		if (compact_result > COMPACT_SKIPPED)
 			return true;
 	}
 
@@ -1555,7 +1566,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
 	/* Compaction is likely to fail */
-	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
+	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED
+	    || ret == COMPACT_IMPOSSIBLE)
 		return ret;
 
 	/* huh, compaction_suitable is returning something unexpected */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b37610c0eac6..7ad331a64fc5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2849,11 +2849,12 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 }
 
 /*
- * Returns true if compaction should go ahead for a costly-order request, or
- * the allocation would already succeed without compaction. Return false if we
- * should reclaim first.
+ * Returns 1 if compaction should go ahead for a costly-order request, or the
+ * allocation would already succeed without compaction. Return 0 if we should
+ * reclaim first. Return -1 when compaction can't help at all due to zone being
+ * too small, which means there's no point in reclaim nor compaction.
  */
-static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
+static inline int compaction_ready(struct zone *zone, struct scan_control *sc)
 {
 	unsigned long watermark;
 	enum compact_result suitable;
@@ -2861,10 +2862,16 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 	suitable = compaction_suitable(zone, sc->order, 0, sc->reclaim_idx);
 	if (suitable == COMPACT_SUCCESS)
 		/* Allocation should succeed already. Don't reclaim. */
-		return true;
+		return 1;
 	if (suitable == COMPACT_SKIPPED)
 		/* Compaction cannot yet proceed. Do reclaim. */
-		return false;
+		return 0;
+	if (suitable == COMPACT_IMPOSSIBLE)
+		/*
+		 * Compaction can't possibly help. So don't reclaim, but keep
+		 * checking other zones.
+		 */
+		return -1;
 
 	/*
 	 * Compaction is already possible, but it takes time to run and there
@@ -2910,6 +2917,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 
 	for_each_zone_zonelist_nodemask(zone, z, zonelist,
 					sc->reclaim_idx, sc->nodemask) {
+		int compact_ready;
 		/*
 		 * Take care memory controller reclaiming has small influence
 		 * to global LRU.
@@ -2929,10 +2937,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			 * page allocations.
 			 */
 			if (IS_ENABLED(CONFIG_COMPACTION) &&
-			    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
-			    compaction_ready(zone, sc)) {
-				sc->compaction_ready = true;
-				continue;
+			    sc->order > PAGE_ALLOC_COSTLY_ORDER) {
+				compact_ready = compaction_ready(zone, sc);
+				if (compact_ready == 1) {
+					sc->compaction_ready = true;
+					continue;
+				} else if (compact_ready == -1) {
+					/*
+					 * In this zone, neither reclaim nor
+					 * compaction can help.
+					 */
+					continue;
+				}
 			}
 
 			/*
-- 
2.23.0




^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: lot of MemAvailable but falling cache and raising PSI
  2019-10-22 10:21                                                           ` Vlastimil Babka
@ 2019-10-22 11:08                                                             ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-10-22 11:08 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

Works - thanks

On 22.10.19 at 12:21, Vlastimil Babka wrote:
> On 10/22/19 12:02 PM, Stefan Priebe - Profihost AG wrote:
>>
>> On 22.10.19 at 09:48, Vlastimil Babka wrote:
>>> On 10/22/19 9:41 AM, Stefan Priebe - Profihost AG wrote:
>>>>> Hi, could you try the patch below? I suspect you're hitting a corner
>>>>> case where compaction_suitable() returns COMPACT_SKIPPED for the
>>>>> ZONE_DMA, triggering reclaim even if other zones have plenty of free
>>>>> memory. And should_continue_reclaim() then returns true until twice the
>>>>> requested page size is reclaimed (compact_gap()). That means 4MB
>>>>> reclaimed for each THP allocation attempt, which roughly matches the
> >>>>> trace data you provided previously.
>>>>>
>>>>> The amplification to 4MB should be removed in patches merged for 5.4, so
>>>>> it would be only 32 pages reclaimed per THP allocation. The patch below
>>>>> tries to remove this corner case completely, and it should be more
>>>>> visible on your 5.2.x, so please apply it there.
>>>>>
>>>> Is there any reason not to apply that one on top of 4.19?
>>>>
>>>> Greets,
>>>> Stefan
>>>>
>>>
>>> It should work; it cherry-picks fine without conflict here.
>>
>> OK, but it does not work ;-)
>>
>>
>> mm/compaction.c: In function '__compaction_suitable':
>> mm/compaction.c:1451:19: error: implicit declaration of function
>> 'zone_managed_pages'; did you mean 'node_spanned_pages'?
>> [-Werror=implicit-function-declaration]
>>       alloc_flags, zone_managed_pages(zone)))
>>                    ^~~~~~~~~~~~~~~~~~
>>                    node_spanned_pages
> 
> Ah, this?
> 
> ----8<----
> From f1335e1c0d4b74205fc0cc40b5960223d6f1dec7 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Thu, 12 Sep 2019 13:40:46 +0200
> Subject: [PATCH] WIP
> 
> ---
>  include/linux/compaction.h     |  7 ++++++-
>  include/trace/events/mmflags.h |  1 +
>  mm/compaction.c                | 16 +++++++++++++--
>  mm/vmscan.c                    | 36 ++++++++++++++++++++++++----------
>  4 files changed, 47 insertions(+), 13 deletions(-)
> 
> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
> index 68250a57aace..2f3b331c5239 100644
> --- a/include/linux/compaction.h
> +++ b/include/linux/compaction.h
> @@ -17,8 +17,13 @@ enum compact_priority {
>  };
>  
>  /* Return values for compact_zone() and try_to_compact_pages() */
> -/* When adding new states, please adjust include/trace/events/compaction.h */
> +/* When adding new states, please adjust include/trace/events/mmflags.h */
>  enum compact_result {
> +	/*
> +	 * The zone is too small to provide the requested allocation even if
> +	 * fully freed (i.e. ZONE_DMA for THP allocation due to lowmem reserves)
> +	 */
> +	COMPACT_IMPOSSIBLE,
>  	/* For more detailed tracepoint output - internal to compaction */
>  	COMPACT_NOT_SUITABLE_ZONE,
>  	/*
> diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
> index a81cffb76d89..d7aa9cece234 100644
> --- a/include/trace/events/mmflags.h
> +++ b/include/trace/events/mmflags.h
> @@ -169,6 +169,7 @@ IF_HAVE_VM_SOFTDIRTY(VM_SOFTDIRTY,	"softdirty"	)		\
>  
>  #ifdef CONFIG_COMPACTION
>  #define COMPACTION_STATUS					\
> +	EM( COMPACT_IMPOSSIBLE,		"impossible")		\
>  	EM( COMPACT_SKIPPED,		"skipped")		\
>  	EM( COMPACT_DEFERRED,		"deferred")		\
>  	EM( COMPACT_CONTINUE,		"continue")		\
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 5079ddbec8f9..7d2299c7faa2 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1416,6 +1416,7 @@ static enum compact_result compact_finished(struct zone *zone,
>  /*
>   * compaction_suitable: Is this suitable to run compaction on this zone now?
>   * Returns
> + *   COMPACT_IMPOSSIBLE If the allocation would fail even with all pages free
>   *   COMPACT_SKIPPED  - If there are too few free pages for compaction
>   *   COMPACT_SUCCESS  - If the allocation would succeed without compaction
>   *   COMPACT_CONTINUE - If compaction should run now
> @@ -1439,6 +1440,16 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
>  								alloc_flags))
>  		return COMPACT_SUCCESS;
>  
> +	/*
> +	 * If the allocation would not succeed even with a fully free zone
> +	 * due to e.g. lowmem reserves, indicate that compaction can't possibly
> +	 * help and it would be pointless to reclaim.
> +	 */
> +	watermark += 1UL << order;
> +	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
> +				 alloc_flags, zone->managed_pages))
> +		return COMPACT_IMPOSSIBLE;
> +
>  	/*
>  	 * Watermarks for order-0 must be met for compaction to be able to
>  	 * isolate free pages for migration targets. This means that the
> @@ -1526,7 +1537,7 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>  		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
>  		compact_result = __compaction_suitable(zone, order, alloc_flags,
>  				ac_classzone_idx(ac), available);
> -		if (compact_result != COMPACT_SKIPPED)
> +		if (compact_result > COMPACT_SKIPPED)
>  			return true;
>  	}
>  
> @@ -1555,7 +1566,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
>  	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
>  							cc->classzone_idx);
>  	/* Compaction is likely to fail */
> -	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
> +	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED
> +	    || ret == COMPACT_IMPOSSIBLE)
>  		return ret;
>  
>  	/* huh, compaction_suitable is returning something unexpected */
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index b37610c0eac6..7ad331a64fc5 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2849,11 +2849,12 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>  }
>  
>  /*
> - * Returns true if compaction should go ahead for a costly-order request, or
> - * the allocation would already succeed without compaction. Return false if we
> - * should reclaim first.
> + * Returns 1 if compaction should go ahead for a costly-order request, or the
> + * allocation would already succeed without compaction. Return 0 if we should
> + * reclaim first. Return -1 when compaction can't help at all because the zone
> + * is too small, in which case there is no point in either reclaim or compaction.
>   */
> -static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
> +static inline int compaction_ready(struct zone *zone, struct scan_control *sc)
>  {
>  	unsigned long watermark;
>  	enum compact_result suitable;
> @@ -2861,10 +2862,16 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
>  	suitable = compaction_suitable(zone, sc->order, 0, sc->reclaim_idx);
>  	if (suitable == COMPACT_SUCCESS)
>  		/* Allocation should succeed already. Don't reclaim. */
> -		return true;
> +		return 1;
>  	if (suitable == COMPACT_SKIPPED)
>  		/* Compaction cannot yet proceed. Do reclaim. */
> -		return false;
> +		return 0;
> +	if (suitable == COMPACT_IMPOSSIBLE)
> +		/*
> +		 * Compaction can't possibly help. So don't reclaim, but keep
> +		 * checking other zones.
> +		 */
> +		return -1;
>  
>  	/*
>  	 * Compaction is already possible, but it takes time to run and there
> @@ -2910,6 +2917,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>  
>  	for_each_zone_zonelist_nodemask(zone, z, zonelist,
>  					sc->reclaim_idx, sc->nodemask) {
> +		int compact_ready;
>  		/*
>  		 * Take care memory controller reclaiming has small influence
>  		 * to global LRU.
> @@ -2929,10 +2937,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>  			 * page allocations.
>  			 */
>  			if (IS_ENABLED(CONFIG_COMPACTION) &&
> -			    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
> -			    compaction_ready(zone, sc)) {
> -				sc->compaction_ready = true;
> -				continue;
> +			    sc->order > PAGE_ALLOC_COSTLY_ORDER) {
> +				compact_ready = compaction_ready(zone, sc);
> +				if (compact_ready == 1) {
> +					sc->compaction_ready = true;
> +					continue;
> +				} else if (compact_ready == -1) {
> +					/*
> +					 * In this zone, neither reclaim nor
> +					 * compaction can help.
> +					 */
> +					continue;
> +				}
>  			}
>  
>  			/*
> 
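
To make the effect of the new COMPACT_IMPOSSIBLE check concrete, here is a
minimal userspace sketch of the underlying arithmetic. It is not kernel code:
the function name, the simplified watermark formula, and the example numbers
are all illustrative assumptions, but they model the ZONE_DMA-vs-THP case the
patch comment mentions.

	#include <stdbool.h>
	#include <stdio.h>

	/*
	 * Simplified model of the check __compaction_suitable() gains above:
	 * even if every managed page in the zone were free, can the order-0
	 * watermark plus the requested allocation still clear the lowmem
	 * reserve? If not, neither reclaim nor compaction can ever satisfy
	 * the request from this zone.
	 */
	static bool zone_possibly_compactable(unsigned long managed_pages,
					      unsigned long low_wmark,
					      unsigned long lowmem_reserve,
					      unsigned int order)
	{
		unsigned long needed = low_wmark + (1UL << order) + lowmem_reserve;

		return managed_pages >= needed;
	}

	int main(void)
	{
		/*
		 * Illustrative numbers for a ~16 MiB ZONE_DMA: 3998 managed
		 * 4 KiB pages, a small low watermark, and a large lowmem
		 * reserve protecting the zone from ordinary allocations.
		 */
		unsigned long managed = 3998, wmark = 67, reserve = 3840;
		unsigned int thp_order = 9;	/* order-9 == 2 MiB THP */

		printf("order-%u allocation from this zone: %s\n", thp_order,
		       zone_possibly_compactable(managed, wmark, reserve,
						 thp_order)
		       ? "possible" : "impossible, don't bother reclaiming");
		return 0;
	}

With these example numbers the check needs 67 + 512 + 3840 = 4419 free pages
out of only 3998 managed ones, so the zone would be reported as
COMPACT_IMPOSSIBLE and shrink_zones() would skip it rather than reclaim in
vain, which is the pointless cache-dropping behaviour discussed earlier in
the thread.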


^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads: [~2019-10-22 11:08 UTC | newest]

Thread overview: 61+ messages
2019-09-05 11:27 lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG
2019-09-05 11:40 ` Michal Hocko
2019-09-05 11:56   ` Stefan Priebe - Profihost AG
2019-09-05 16:28     ` Yang Shi
2019-09-05 17:26       ` Stefan Priebe - Profihost AG
2019-09-05 18:46         ` Yang Shi
2019-09-05 19:31           ` Stefan Priebe - Profihost AG
2019-09-06 10:08     ` Stefan Priebe - Profihost AG
2019-09-06 10:25       ` Vlastimil Babka
2019-09-06 18:52       ` Yang Shi
2019-09-07  7:32         ` Stefan Priebe - Profihost AG
2019-09-09  8:27       ` Michal Hocko
2019-09-09  8:54         ` Stefan Priebe - Profihost AG
2019-09-09 11:01           ` Michal Hocko
2019-09-09 12:08             ` Michal Hocko
2019-09-09 12:10               ` Stefan Priebe - Profihost AG
2019-09-09 12:28                 ` Michal Hocko
2019-09-09 12:37                   ` Stefan Priebe - Profihost AG
2019-09-09 12:49                     ` Michal Hocko
2019-09-09 12:56                       ` Stefan Priebe - Profihost AG
     [not found]                         ` <52235eda-ffe2-721c-7ad7-575048e2d29d@profihost.ag>
2019-09-10  5:58                           ` Stefan Priebe - Profihost AG
2019-09-10  8:29                           ` Michal Hocko
2019-09-10  8:38                             ` Stefan Priebe - Profihost AG
2019-09-10  9:02                               ` Michal Hocko
2019-09-10  9:37                                 ` Stefan Priebe - Profihost AG
2019-09-10 11:07                                   ` Michal Hocko
2019-09-10 12:45                                     ` Stefan Priebe - Profihost AG
2019-09-10 12:57                                       ` Michal Hocko
2019-09-10 13:05                                         ` Stefan Priebe - Profihost AG
2019-09-10 13:14                                           ` Stefan Priebe - Profihost AG
2019-09-10 13:24                                             ` Michal Hocko
2019-09-11  6:12                                               ` Stefan Priebe - Profihost AG
2019-09-11  6:24                                                 ` Stefan Priebe - Profihost AG
2019-09-11 13:59                                                   ` Stefan Priebe - Profihost AG
2019-09-12 10:53                                                     ` Stefan Priebe - Profihost AG
2019-09-12 11:06                                                       ` Stefan Priebe - Profihost AG
2019-09-11  7:09                                                 ` 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) Michal Hocko
2019-09-11 14:09                                                   ` Stefan Priebe - Profihost AG
2019-09-11 14:56                                                   ` Filipe Manana
2019-09-11 15:39                                                     ` Stefan Priebe - Profihost AG
2019-09-11 15:56                                                       ` Filipe Manana
2019-09-11 16:15                                                         ` Stefan Priebe - Profihost AG
2019-09-11 16:19                                                           ` Filipe Manana
2019-09-19 10:21                                                 ` lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG
2019-09-23 12:08                                                   ` Michal Hocko
2019-09-27 12:45                                                   ` Vlastimil Babka
2019-09-30  6:56                                                     ` Stefan Priebe - Profihost AG
2019-09-30  7:21                                                       ` Vlastimil Babka
2019-10-22  7:41                                                     ` Stefan Priebe - Profihost AG
2019-10-22  7:48                                                       ` Vlastimil Babka
2019-10-22 10:02                                                         ` Stefan Priebe - Profihost AG
2019-10-22 10:20                                                           ` Oscar Salvador
2019-10-22 10:21                                                           ` Vlastimil Babka
2019-10-22 11:08                                                             ` Stefan Priebe - Profihost AG
2019-09-10  5:41                       ` Stefan Priebe - Profihost AG
2019-09-09 11:49           ` Vlastimil Babka
2019-09-09 12:09             ` Stefan Priebe - Profihost AG
2019-09-09 12:21               ` Vlastimil Babka
2019-09-09 12:31                 ` Stefan Priebe - Profihost AG
2019-09-05 12:15 ` Vlastimil Babka
2019-09-05 12:27   ` Stefan Priebe - Profihost AG
