* lot of MemAvailable but falling cache and raising PSI
  2019-09-05 11:27 ` Stefan Priebe - Profihost AG

From: Stefan Priebe - Profihost AG
To: linux-mm
Cc: l.roehrs, cgroups, Johannes Weiner, Michal Hocko

Hello all,

I hope you can help me again to understand the MemAvailable value in the
Linux kernel. I'm running a 4.19.52 kernel plus the PSI patches in this
case.

I'm seeing the following behaviour that I don't understand:

While MemAvailable shows 5G, the kernel starts to drop the cache from 4G
down to 1G while Apache spawns some PHP processes. After that the PSI
mem.some value rises and the kernel tries to reclaim memory, but
MemAvailable stays at 5G.

Any ideas? Thanks!

Greets,
Stefan
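The mem.some value mentioned above is read from /proc/pressure/memory, where
each line looks like `some avg10=21.50 avg60=18.23 avg300=10.01 total=9368143`
(averages in percent, total in microseconds stalled). As a minimal sketch of
how such a file can be parsed (`parse_psi` is a hypothetical helper name, not
part of any kernel interface):

```python
def parse_psi(text):
    """Parse /proc/pressure/* content into {"some": {...}, "full": {...}}.

    Each line has the form:
      some avg10=21.50 avg60=18.23 avg300=10.01 total=9368143
    """
    out = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        out[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return out

# On a PSI-enabled kernel you would feed it the real file:
#   with open("/proc/pressure/memory") as f:
#       psi = parse_psi(f.read())
```

A pressure situation like the one described would show up as `psi["some"]["avg10"]`
climbing above 20.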
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 11:40 ` Michal Hocko

From: Michal Hocko
To: Stefan Priebe - Profihost AG
Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On Thu 05-09-19 13:27:10, Stefan Priebe - Profihost AG wrote:
> [...]
> While MemAvailable shows 5G the kernel starts to drop cache from 4G down
> to 1G while Apache spawns some PHP processes. After that the PSI
> mem.some value rises and the kernel tries to reclaim memory but
> MemAvailable stays at 5G.
>
> Any ideas?

Can you collect /proc/vmstat (every second or so) and post it while this
is the case please?

--
Michal Hocko
SUSE Labs
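A once-per-second collection like Michal asks for can be done with a trivial
shell loop or, as a minimal Python sketch (`sample_proc` is a hypothetical
helper, not from the thread):

```python
import time

def sample_proc(path="/proc/vmstat", interval=1.0, count=60):
    """Collect timestamped snapshots of a /proc file, one every `interval` seconds.

    Returns a list of (unix_timestamp, file_contents) tuples that can be
    diffed afterwards to see which counters move while PSI rises.
    """
    snapshots = []
    for _ in range(count):
        with open(path) as f:
            snapshots.append((time.time(), f.read()))
        time.sleep(interval)
    return snapshots
```

Diffing consecutive snapshots of counters such as pgscan_kswapd/pgscan_direct
is usually more informative than a single dump.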
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 11:56 ` Stefan Priebe - Profihost AG

From: Stefan Priebe - Profihost AG
To: Michal Hocko
Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On 05.09.19 at 13:40, Michal Hocko wrote:
> [...]
> Can you collect /proc/vmstat (every second or so) and post it while this
> is the case please?

Yes, sure. But I don't know which event you mean exactly.
The current situation is: PSI memory pressure is > 20, but MemAvailable
shows 5G while Cached has already dropped to 1G, coming from 4G:

meminfo:
MemTotal:       16423116 kB
MemFree:         5280736 kB
MemAvailable:    5332752 kB
Buffers:            2572 kB
Cached:          1225112 kB
SwapCached:            0 kB
Active:          8934976 kB
Inactive:        1026900 kB
Active(anon):    8740396 kB
Inactive(anon):   873448 kB
Active(file):     194580 kB
Inactive(file):   153452 kB
Unevictable:       19900 kB
Mlocked:           19900 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:              1980 kB
Writeback:             0 kB
AnonPages:       8423480 kB
Mapped:           978212 kB
Shmem:            875680 kB
Slab:             839868 kB
SReclaimable:     383396 kB
SUnreclaim:       456472 kB
KernelStack:       22576 kB
PageTables:        49824 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     8211556 kB
Committed_AS:   32060624 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
Percpu:           118048 kB
HardwareCorrupted:     0 kB
AnonHugePages:   6406144 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:     2580336 kB
DirectMap2M:    14196736 kB
DirectMap1G:     2097152 kB

vmstat shows:
nr_free_pages 1320053
nr_zone_inactive_anon 218362
nr_zone_active_anon 2185108
nr_zone_inactive_file 38363
nr_zone_active_file 48645
nr_zone_unevictable 4975
nr_zone_write_pending 495
nr_mlock 4975
nr_page_table_pages 12553
nr_kernel_stack 22576
nr_bounce 0
nr_zspages 0
nr_free_cma 0
numa_hit 13916119899
numa_miss 0
numa_foreign 0
numa_interleave 15629
numa_local 13916119899
numa_other 0
nr_inactive_anon 218362
nr_active_anon 2185164
nr_inactive_file 38363
nr_active_file 48645
nr_unevictable 4975
nr_slab_reclaimable 95849
nr_slab_unreclaimable 114118
nr_isolated_anon 0
nr_isolated_file 0
workingset_refault 71365357
workingset_activate 20281670
workingset_restore 8995665
workingset_nodereclaim 326085
nr_anon_pages 2105903
nr_mapped 244553
nr_file_pages 306921
nr_dirty 495
nr_writeback 0
nr_writeback_temp 0
nr_shmem 218920
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_anon_transparent_hugepages 3128
nr_unstable 0
nr_vmscan_write 0
nr_vmscan_immediate_reclaim 1833104
nr_dirtied 386544087
nr_written 259220036
nr_dirty_threshold 265636
nr_dirty_background_threshold 132656
pgpgin 1817628997
pgpgout 3730818029
pswpin 0
pswpout 0
pgalloc_dma 0
pgalloc_dma32 5790777997
pgalloc_normal 20003662520
pgalloc_movable 0
allocstall_dma 0
allocstall_dma32 0
allocstall_normal 39
allocstall_movable 1980089
pgskip_dma 0
pgskip_dma32 0
pgskip_normal 0
pgskip_movable 0
pgfree 26637215947
pgactivate 316722654
pgdeactivate 261039211
pglazyfree 0
pgfault 17719356599
pgmajfault 30985544
pglazyfreed 0
pgrefill 286826568
pgsteal_kswapd 36740923
pgsteal_direct 349291470
pgscan_kswapd 36878966
pgscan_direct 395327492
pgscan_direct_throttle 0
zone_reclaim_failed 0
pginodesteal 49817087
slabs_scanned 597956834
kswapd_inodesteal 1412447
kswapd_low_wmark_hit_quickly 39
kswapd_high_wmark_hit_quickly 319
pageoutrun 3585
pgrotated 2873743
drop_pagecache 0
drop_slab 0
oom_kill 0
pgmigrate_success 839062285
pgmigrate_fail 507313
compact_migrate_scanned 9619077010
compact_free_scanned 67985619651
compact_isolated 1684537704
compact_stall 205761
compact_fail 182420
compact_success 23341
compact_daemon_wake 2
compact_daemon_migrate_scanned 811
compact_daemon_free_scanned 490241
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 1006521
unevictable_pgs_scanned 0
unevictable_pgs_rescued 997077
unevictable_pgs_mlocked 1319203
unevictable_pgs_munlocked 842471
unevictable_pgs_cleared 470531
unevictable_pgs_stranded 459613
thp_fault_alloc 20263113
thp_fault_fallback 3368635
thp_collapse_alloc 226476
thp_collapse_alloc_failed 17594
thp_file_alloc 0
thp_file_mapped 0
thp_split_page 1159
thp_split_page_failed 3927
thp_deferred_split_page 20348941
thp_split_pmd 53361
thp_split_pud 0
thp_zero_page_alloc 1
thp_zero_page_alloc_failed 0
thp_swpout 0
thp_swpout_fallback 0
balloon_inflate 0
balloon_deflate 0
balloon_migrate 0
swap_ra 0
swap_ra_hit 0

Greets,
Stefan
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 16:28 ` Yang Shi

From: Yang Shi
To: Stefan Priebe - Profihost AG
Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner

On Thu, Sep 5, 2019 at 4:56 AM Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
> [...]
> But I don't know which event you mean exactly. The current situation is:
> PSI memory pressure is > 20, but MemAvailable shows 5G while Cached has
> already dropped to 1G, coming from 4G:

I don't get what problem you are running into. MemAvailable is *not*
the trigger for memory reclaim. Basically,

  MemAvailable = MemFree + page cache (active file + inactive file) / 2
                 + SReclaimable / 2,

i.e. it estimates how much memory could be reclaimed if memory pressure
were hit. Memory pressure (tracked by PSI) is instead triggered by how
much memory is consumed relative to the watermarks.

So it looks like the page reclaim logic just reclaimed file cache (which
looks sane, since your VM doesn't have a swap partition). I would expect
you to see MemFree increase as "Cached" drops, while MemAvailable stays
basically unchanged. That looks sane to me. Am I missing something?

> [meminfo and vmstat dump snipped]
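Yang Shi's simplified formula can be checked against the meminfo dump above.
As a minimal sketch (helper names are made up for illustration; the kernel's
actual si_mem_available() additionally subtracts watermarks and caps the
page-cache and slab terms, so this estimate comes out somewhat above the
reported 5332752 kB):

```python
def meminfo_to_dict(text):
    """Parse /proc/meminfo content into {field_name: value}, values in kB
    for the kB fields (the unit suffix is simply ignored)."""
    d = {}
    for line in text.strip().splitlines():
        name, rest = line.split(":", 1)
        d[name] = int(rest.split()[0])
    return d

def memavailable_estimate(mi):
    """Rough MemAvailable per the simplified formula in the thread:
    MemFree + (Active(file) + Inactive(file)) / 2 + SReclaimable / 2."""
    pagecache = mi["Active(file)"] + mi["Inactive(file)"]
    return mi["MemFree"] + pagecache // 2 + mi["SReclaimable"] // 2
```

With Stefan's numbers (MemFree 5280736, Active(file) 194580, Inactive(file)
153452, SReclaimable 383396) this yields 5646450 kB, in the same ballpark as
the reported MemAvailable of 5332752 kB; the gap is the watermark and reserve
accounting the simplified formula leaves out.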
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 17:26 ` Stefan Priebe - Profihost AG

From: Stefan Priebe - Profihost AG
To: Yang Shi
Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner

Hi,

On 05.09.19 at 18:28, Yang Shi wrote:
> I don't get what problem you are running into. MemAvailable is *not*
> the trigger for memory reclaim.

Yes, sure. But I don't get why PSI is rising and caches are dropped
while MemAvail and MemFree both show 5GB.

> Basically MemAvailable = MemFree + page cache (active file + inactive
> file) / 2 + SReclaimable / 2, which means that much memory could be
> reclaimed if memory pressure is hit.

Yes, but MemFree also shows 5G in this case (see below), and still the
file cache gets dropped and PSI is rising.

> But, memory pressure (tracked by PSI) is triggered by how much memory
> (aka watermark) is consumed.

What does this mean exactly?

> So it looks like the page reclaim logic just reclaimed file cache
> (which looks sane, since your VM doesn't have a swap partition). I
> would expect you to see MemFree increase as "Cached" drops,

No, it does not. MemFree and MemAvail stay constant at 5G.

> while MemAvailable stays basically unchanged. That looks sane to me.
> Am I missing something?

I always thought the kernel would neither free the cache nor would PSI
rise while there are 5GB in MemFree and MemAvail. This still makes no
sense to me: why drop the cache when you have 5G free? It currently
results in I/O waits because the pages were dropped.

Greets,
Stefan

> [meminfo and vmstat dump snipped]
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 18:46 ` Yang Shi

From: Yang Shi
To: Stefan Priebe - Profihost AG
Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner

On Thu, Sep 5, 2019 at 10:26 AM Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
> [...]
> But I don't get why PSI is rising and caches are dropped while MemAvail
> and MemFree both show 5GB.

You need to check your watermark settings (/proc/sys/vm/min_free_kbytes,
/proc/sys/vm/watermark_scale_factor and /proc/zoneinfo) to see why
kswapd is woken up when there is 5 GB of free memory.

> > But, memory pressure (tracked by PSI) is triggered by how much memory
> > (aka watermark) is consumed.
>
> What does this mean exactly?

cat /proc/zoneinfo would show something like:

pages free     4118641
      min      12470
      low      16598
      high     20726

Here min/low/high are the so-called "watermarks". When free memory drops
below the low watermark, kswapd is woken up to reclaim.

> [rest of quote snipped]
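The watermark check Yang Shi describes can be sketched as a small parser over
/proc/zoneinfo content (a sketch under the assumption that the `pages free /
min / low / high` block directly follows each `Node N, zone NAME` header, as
in 4.19-era kernels; the exact layout varies across versions):

```python
import re

def zone_watermarks(text):
    """Extract (zone, free, min, low, high) tuples, all in pages,
    from /proc/zoneinfo content."""
    pat = re.compile(
        r"Node\s+(\d+),\s+zone\s+(\w+).*?pages free\s+(\d+)\s+"
        r"min\s+(\d+)\s+low\s+(\d+)\s+high\s+(\d+)", re.S)
    return [(f"node{n}/{z}", int(free), int(mn), int(lo), int(hi))
            for n, z, free, mn, lo, hi in pat.findall(text)]

def below_low(zones):
    """Zones where kswapd would be woken: free pages under the low watermark."""
    return [z for z in zones if z[1] < z[3]]
```

Fed the zoneinfo dump later in this thread, every zone sits well above its
low watermark at the moment of the snapshot, which is why sampling it while
the pressure spike happens matters.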
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 19:31 ` Stefan Priebe - Profihost AG

From: Stefan Priebe - Profihost AG
To: Yang Shi
Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner

On 05.09.19 at 20:46, Yang Shi wrote:
> You need to check your watermark settings (/proc/sys/vm/min_free_kbytes,
> /proc/sys/vm/watermark_scale_factor and /proc/zoneinfo) to see why
> kswapd is woken up when there is 5 GB of free memory.

Sure, I did, but I can't find anything:

# cat /proc/sys/vm/min_free_kbytes
164231
# cat /proc/sys/vm/watermark_scale_factor
10
# cat /proc/zoneinfo
Node 0, zone      DMA
  per-node stats
      nr_inactive_anon 177046
      nr_active_anon 1718836
      nr_inactive_file 288146
      nr_active_file 121497
      nr_unevictable 5510
      nr_slab_reclaimable 301721
      nr_slab_unreclaimable 119276
      nr_isolated_anon 0
      nr_isolated_file 0
      workingset_refault 72376392
      workingset_activate 20641006
      workingset_restore 9149962
      workingset_nodereclaim 326469
      nr_anon_pages 1647524
      nr_mapped 211704
      nr_file_pages 587984
      nr_dirty 212
      nr_writeback 0
      nr_writeback_temp 0
      nr_shmem 177458
      nr_shmem_hugepages 0
      nr_shmem_pmdmapped 0
      nr_anon_transparent_hugepages 2480
      nr_unstable 0
      nr_vmscan_write 0
      nr_vmscan_immediate_reclaim 1843759
      nr_dirtied 388618149
      nr_written 260643754
  pages free     3977
        min      39
        low      48
        high     57
        spanned  4095
        present  3998
        managed  3977
        protection: (0, 2968, 16022, 16022, 16022)
      nr_free_pages 3977
      nr_zone_inactive_anon 0
      nr_zone_active_anon 0
      nr_zone_inactive_file 0
      nr_zone_active_file 0
      nr_zone_unevictable 0
      nr_zone_write_pending 0
      nr_mlock 0
      nr_page_table_pages 0
      nr_kernel_stack 0
      nr_bounce 0
      nr_zspages 0
      nr_free_cma 0
      numa_hit 0
      numa_miss 0
      numa_foreign 0
      numa_interleave 0
      numa_local 0
      numa_other 0
  pagesets
    cpu 0-7: count: 0, high: 0, batch: 1, vm stats threshold: 8
  node_unreclaimable: 0
  start_pfn: 1
Node 0, zone    DMA32
  pages free     439019
        min      7600
        low      9500
        high     11400
        spanned  1044480
        present  782300
        managed  760023
        protection: (0, 0, 13053, 13053, 13053)
      nr_free_pages 439019
      nr_zone_inactive_anon 0
      nr_zone_active_anon 309777
      nr_zone_inactive_file 809
      nr_zone_active_file 645
      nr_zone_unevictable 2048
      nr_zone_write_pending 1
      nr_mlock 2048
      nr_page_table_pages 8
      nr_kernel_stack 32
      nr_bounce 0
      nr_zspages 0
      nr_free_cma 0
      numa_hit 213697054
      numa_miss 0
      numa_foreign 0
      numa_interleave 0
      numa_local 213697054
      numa_other 0
  pagesets (high: 378, batch: 63, vm stats threshold: 48)
    cpu 0: count: 0    cpu 1: count: 1    cpu 2: count: 338  cpu 3: count: 10
    cpu 4: count: 0    cpu 5: count: 324  cpu 6: count: 136  cpu 7: count: 1
  node_unreclaimable: 0
  start_pfn: 4096
Node 0, zone   Normal
  pages free     734519
        min      33417
        low      41771
        high     50125
        spanned  3407872
        present  3407872
        managed  3341779
        protection: (0, 0, 0, 0, 0)
      nr_free_pages 734519
      nr_zone_inactive_anon 177046
      nr_zone_active_anon 1409059
      nr_zone_inactive_file 287337
      nr_zone_active_file 120852
      nr_zone_unevictable 3462
      nr_zone_write_pending 211
      nr_mlock 3462
      nr_page_table_pages 10551
      nr_kernel_stack 22464
      nr_bounce 0
      nr_zspages 0
      nr_free_cma 0
      numa_hit 13801352577
      numa_miss 0
      numa_foreign 0
      numa_interleave 15629
      numa_local 13801352577
      numa_other 0
  pagesets (high: 42, batch: 7, vm stats threshold: 64)
    cpu 0: count: 12   cpu 1: count: 40   cpu 2: count: 41   cpu 3: count: 41
    cpu 4: count: 37   cpu 5: count: 39   cpu 6: count: 19   cpu 7: count: 9
  node_unreclaimable: 0
  start_pfn: 1048576
Node 0, zone  Movable
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 0, 0)
Node 0, zone   Device
  pages free     0
        min      0
        low      0
        high     0
        spanned  0
        present  0
        managed  0
        protection: (0, 0, 0, 0, 0)

> [rest of quote snipped]
>> >> Greets, >> Stefan >> >>>> >>>> meminfo: >>>> MemTotal: 16423116 kB >>>> MemFree: 5280736 kB >>>> MemAvailable: 5332752 kB >>>> Buffers: 2572 kB >>>> Cached: 1225112 kB >>>> SwapCached: 0 kB >>>> Active: 8934976 kB >>>> Inactive: 1026900 kB >>>> Active(anon): 8740396 kB >>>> Inactive(anon): 873448 kB >>>> Active(file): 194580 kB >>>> Inactive(file): 153452 kB >>>> Unevictable: 19900 kB >>>> Mlocked: 19900 kB >>>> SwapTotal: 0 kB >>>> SwapFree: 0 kB >>>> Dirty: 1980 kB >>>> Writeback: 0 kB >>>> AnonPages: 8423480 kB >>>> Mapped: 978212 kB >>>> Shmem: 875680 kB >>>> Slab: 839868 kB >>>> SReclaimable: 383396 kB >>>> SUnreclaim: 456472 kB >>>> KernelStack: 22576 kB >>>> PageTables: 49824 kB >>>> NFS_Unstable: 0 kB >>>> Bounce: 0 kB >>>> WritebackTmp: 0 kB >>>> CommitLimit: 8211556 kB >>>> Committed_AS: 32060624 kB >>>> VmallocTotal: 34359738367 kB >>>> VmallocUsed: 0 kB >>>> VmallocChunk: 0 kB >>>> Percpu: 118048 kB >>>> HardwareCorrupted: 0 kB >>>> AnonHugePages: 6406144 kB >>>> ShmemHugePages: 0 kB >>>> ShmemPmdMapped: 0 kB >>>> HugePages_Total: 0 >>>> HugePages_Free: 0 >>>> HugePages_Rsvd: 0 >>>> HugePages_Surp: 0 >>>> Hugepagesize: 2048 kB >>>> Hugetlb: 0 kB >>>> DirectMap4k: 2580336 kB >>>> DirectMap2M: 14196736 kB >>>> DirectMap1G: 2097152 kB >>>> >>>> >>>> vmstat shows: >>>> nr_free_pages 1320053 >>>> nr_zone_inactive_anon 218362 >>>> nr_zone_active_anon 2185108 >>>> nr_zone_inactive_file 38363 >>>> nr_zone_active_file 48645 >>>> nr_zone_unevictable 4975 >>>> nr_zone_write_pending 495 >>>> nr_mlock 4975 >>>> nr_page_table_pages 12553 >>>> nr_kernel_stack 22576 >>>> nr_bounce 0 >>>> nr_zspages 0 >>>> nr_free_cma 0 >>>> numa_hit 13916119899 >>>> numa_miss 0 >>>> numa_foreign 0 >>>> numa_interleave 15629 >>>> numa_local 13916119899 >>>> numa_other 0 >>>> nr_inactive_anon 218362 >>>> nr_active_anon 2185164 >>>> nr_inactive_file 38363 >>>> nr_active_file 48645 >>>> nr_unevictable 4975 >>>> nr_slab_reclaimable 95849 >>>> nr_slab_unreclaimable 114118 >>>> 
nr_isolated_anon 0 >>>> nr_isolated_file 0 >>>> workingset_refault 71365357 >>>> workingset_activate 20281670 >>>> workingset_restore 8995665 >>>> workingset_nodereclaim 326085 >>>> nr_anon_pages 2105903 >>>> nr_mapped 244553 >>>> nr_file_pages 306921 >>>> nr_dirty 495 >>>> nr_writeback 0 >>>> nr_writeback_temp 0 >>>> nr_shmem 218920 >>>> nr_shmem_hugepages 0 >>>> nr_shmem_pmdmapped 0 >>>> nr_anon_transparent_hugepages 3128 >>>> nr_unstable 0 >>>> nr_vmscan_write 0 >>>> nr_vmscan_immediate_reclaim 1833104 >>>> nr_dirtied 386544087 >>>> nr_written 259220036 >>>> nr_dirty_threshold 265636 >>>> nr_dirty_background_threshold 132656 >>>> pgpgin 1817628997 >>>> pgpgout 3730818029 >>>> pswpin 0 >>>> pswpout 0 >>>> pgalloc_dma 0 >>>> pgalloc_dma32 5790777997 >>>> pgalloc_normal 20003662520 >>>> pgalloc_movable 0 >>>> allocstall_dma 0 >>>> allocstall_dma32 0 >>>> allocstall_normal 39 >>>> allocstall_movable 1980089 >>>> pgskip_dma 0 >>>> pgskip_dma32 0 >>>> pgskip_normal 0 >>>> pgskip_movable 0 >>>> pgfree 26637215947 >>>> pgactivate 316722654 >>>> pgdeactivate 261039211 >>>> pglazyfree 0 >>>> pgfault 17719356599 >>>> pgmajfault 30985544 >>>> pglazyfreed 0 >>>> pgrefill 286826568 >>>> pgsteal_kswapd 36740923 >>>> pgsteal_direct 349291470 >>>> pgscan_kswapd 36878966 >>>> pgscan_direct 395327492 >>>> pgscan_direct_throttle 0 >>>> zone_reclaim_failed 0 >>>> pginodesteal 49817087 >>>> slabs_scanned 597956834 >>>> kswapd_inodesteal 1412447 >>>> kswapd_low_wmark_hit_quickly 39 >>>> kswapd_high_wmark_hit_quickly 319 >>>> pageoutrun 3585 >>>> pgrotated 2873743 >>>> drop_pagecache 0 >>>> drop_slab 0 >>>> oom_kill 0 >>>> pgmigrate_success 839062285 >>>> pgmigrate_fail 507313 >>>> compact_migrate_scanned 9619077010 >>>> compact_free_scanned 67985619651 >>>> compact_isolated 1684537704 >>>> compact_stall 205761 >>>> compact_fail 182420 >>>> compact_success 23341 >>>> compact_daemon_wake 2 >>>> compact_daemon_migrate_scanned 811 >>>> compact_daemon_free_scanned 490241 >>>> 
htlb_buddy_alloc_success 0 >>>> htlb_buddy_alloc_fail 0 >>>> unevictable_pgs_culled 1006521 >>>> unevictable_pgs_scanned 0 >>>> unevictable_pgs_rescued 997077 >>>> unevictable_pgs_mlocked 1319203 >>>> unevictable_pgs_munlocked 842471 >>>> unevictable_pgs_cleared 470531 >>>> unevictable_pgs_stranded 459613 >>>> thp_fault_alloc 20263113 >>>> thp_fault_fallback 3368635 >>>> thp_collapse_alloc 226476 >>>> thp_collapse_alloc_failed 17594 >>>> thp_file_alloc 0 >>>> thp_file_mapped 0 >>>> thp_split_page 1159 >>>> thp_split_page_failed 3927 >>>> thp_deferred_split_page 20348941 >>>> thp_split_pmd 53361 >>>> thp_split_pud 0 >>>> thp_zero_page_alloc 1 >>>> thp_zero_page_alloc_failed 0 >>>> thp_swpout 0 >>>> thp_swpout_fallback 0 >>>> balloon_inflate 0 >>>> balloon_deflate 0 >>>> balloon_migrate 0 >>>> swap_ra 0 >>>> swap_ra_hit 0 >>>> >>>> Greets, >>>> Stefan >>>> >>>> ^ permalink raw reply [flat|nested] 61+ messages in thread
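Yang Shi's rule of thumb quoted above can be turned into a small shell helper. This is only a sketch of the approximation, not the kernel's actual si_mem_available() logic (which additionally subtracts watermark shares), and the function name is illustrative:

```shell
# Rough estimate of MemAvailable from a meminfo-format dump, following
# the "MemFree + file cache / 2 + SReclaimable / 2" rule of thumb.
# The kernel's own calculation also accounts for watermarks, so expect
# this to overshoot slightly.
mem_avail_est() {
    awk '
        $1 == "MemFree:"        { free  = $2 }
        $1 == "Active(file):"   { afile = $2 }
        $1 == "Inactive(file):" { ifile = $2 }
        $1 == "SReclaimable:"   { srec  = $2 }
        END { printf "approx MemAvailable: %d kB\n",
                     free + (afile + ifile) / 2 + srec / 2 }
    ' "${1:-/proc/meminfo}"
}
```

Fed the meminfo snapshot above (MemFree 5280736 kB, 348032 kB of file cache, SReclaimable 383396 kB), it prints "approx MemAvailable: 5646450 kB" — the same ballpark as the reported MemAvailable of 5332752 kB.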
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 11:56           ` Stefan Priebe - Profihost AG
  2019-09-05 16:28             ` Yang Shi
@ 2019-09-06 10:08           ` Stefan Priebe - Profihost AG
  2019-09-06 10:25             ` Vlastimil Babka
                              ` (2 more replies)
  1 sibling, 3 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-06 10:08 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

These are the biggest differences in meminfo before and after cached
starts to drop. I didn't expect cached to end up in MemFree.

Before:
MemTotal:       16423116 kB
MemFree:          374572 kB
MemAvailable:    5633816 kB
Cached:          5550972 kB
Inactive:        4696580 kB
Inactive(file):  3624776 kB


After:
MemTotal:       16423116 kB
MemFree:         3477168 kB
MemAvailable:    6066916 kB
Cached:          2724504 kB
Inactive:        1854740 kB
Inactive(file):   950680 kB

Any explanation?

Greets,
Stefan

On 05.09.19 at 13:56, Stefan Priebe - Profihost AG wrote:
>
> On 05.09.19 at 13:40, Michal Hocko wrote:
>> On Thu 05-09-19 13:27:10, Stefan Priebe - Profihost AG wrote:
>>> Hello all,
>>>
>>> i hope you can help me again to understand the current MemAvailable
>>> value in the linux kernel. I'm running a 4.19.52 kernel + psi patches in
>>> this case.
>>>
>>> I'm seeing the following behaviour i don't understand and ask for help.
>>>
>>> While MemAvailable shows 5G the kernel starts to drop cache from 4G down
>>> to 1G while the apache spawns some PHP processes. After that the PSI
>>> mem.some value rises and the kernel tries to reclaim memory but
>>> MemAvailable stays at 5G.
>>>
>>> Any ideas?
>>
>> Can you collect /proc/vmstat (every second or so) and post it while this
>> is the case please?
>
> Yes sure.
>
> But i don't know which event you mean exactly.
Current situation is PSI > / memory pressure is > 20 but: > > This is the current status where MemAvailable show 5G but Cached is > already dropped to 1G coming from 4G: > > > meminfo: > MemTotal: 16423116 kB > MemFree: 5280736 kB > MemAvailable: 5332752 kB > Buffers: 2572 kB > Cached: 1225112 kB > SwapCached: 0 kB > Active: 8934976 kB > Inactive: 1026900 kB > Active(anon): 8740396 kB > Inactive(anon): 873448 kB > Active(file): 194580 kB > Inactive(file): 153452 kB > Unevictable: 19900 kB > Mlocked: 19900 kB > SwapTotal: 0 kB > SwapFree: 0 kB > Dirty: 1980 kB > Writeback: 0 kB > AnonPages: 8423480 kB > Mapped: 978212 kB > Shmem: 875680 kB > Slab: 839868 kB > SReclaimable: 383396 kB > SUnreclaim: 456472 kB > KernelStack: 22576 kB > PageTables: 49824 kB > NFS_Unstable: 0 kB > Bounce: 0 kB > WritebackTmp: 0 kB > CommitLimit: 8211556 kB > Committed_AS: 32060624 kB > VmallocTotal: 34359738367 kB > VmallocUsed: 0 kB > VmallocChunk: 0 kB > Percpu: 118048 kB > HardwareCorrupted: 0 kB > AnonHugePages: 6406144 kB > ShmemHugePages: 0 kB > ShmemPmdMapped: 0 kB > HugePages_Total: 0 > HugePages_Free: 0 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > Hugepagesize: 2048 kB > Hugetlb: 0 kB > DirectMap4k: 2580336 kB > DirectMap2M: 14196736 kB > DirectMap1G: 2097152 kB > > > vmstat shows: > nr_free_pages 1320053 > nr_zone_inactive_anon 218362 > nr_zone_active_anon 2185108 > nr_zone_inactive_file 38363 > nr_zone_active_file 48645 > nr_zone_unevictable 4975 > nr_zone_write_pending 495 > nr_mlock 4975 > nr_page_table_pages 12553 > nr_kernel_stack 22576 > nr_bounce 0 > nr_zspages 0 > nr_free_cma 0 > numa_hit 13916119899 > numa_miss 0 > numa_foreign 0 > numa_interleave 15629 > numa_local 13916119899 > numa_other 0 > nr_inactive_anon 218362 > nr_active_anon 2185164 > nr_inactive_file 38363 > nr_active_file 48645 > nr_unevictable 4975 > nr_slab_reclaimable 95849 > nr_slab_unreclaimable 114118 > nr_isolated_anon 0 > nr_isolated_file 0 > workingset_refault 71365357 > workingset_activate 20281670 > 
workingset_restore 8995665 > workingset_nodereclaim 326085 > nr_anon_pages 2105903 > nr_mapped 244553 > nr_file_pages 306921 > nr_dirty 495 > nr_writeback 0 > nr_writeback_temp 0 > nr_shmem 218920 > nr_shmem_hugepages 0 > nr_shmem_pmdmapped 0 > nr_anon_transparent_hugepages 3128 > nr_unstable 0 > nr_vmscan_write 0 > nr_vmscan_immediate_reclaim 1833104 > nr_dirtied 386544087 > nr_written 259220036 > nr_dirty_threshold 265636 > nr_dirty_background_threshold 132656 > pgpgin 1817628997 > pgpgout 3730818029 > pswpin 0 > pswpout 0 > pgalloc_dma 0 > pgalloc_dma32 5790777997 > pgalloc_normal 20003662520 > pgalloc_movable 0 > allocstall_dma 0 > allocstall_dma32 0 > allocstall_normal 39 > allocstall_movable 1980089 > pgskip_dma 0 > pgskip_dma32 0 > pgskip_normal 0 > pgskip_movable 0 > pgfree 26637215947 > pgactivate 316722654 > pgdeactivate 261039211 > pglazyfree 0 > pgfault 17719356599 > pgmajfault 30985544 > pglazyfreed 0 > pgrefill 286826568 > pgsteal_kswapd 36740923 > pgsteal_direct 349291470 > pgscan_kswapd 36878966 > pgscan_direct 395327492 > pgscan_direct_throttle 0 > zone_reclaim_failed 0 > pginodesteal 49817087 > slabs_scanned 597956834 > kswapd_inodesteal 1412447 > kswapd_low_wmark_hit_quickly 39 > kswapd_high_wmark_hit_quickly 319 > pageoutrun 3585 > pgrotated 2873743 > drop_pagecache 0 > drop_slab 0 > oom_kill 0 > pgmigrate_success 839062285 > pgmigrate_fail 507313 > compact_migrate_scanned 9619077010 > compact_free_scanned 67985619651 > compact_isolated 1684537704 > compact_stall 205761 > compact_fail 182420 > compact_success 23341 > compact_daemon_wake 2 > compact_daemon_migrate_scanned 811 > compact_daemon_free_scanned 490241 > htlb_buddy_alloc_success 0 > htlb_buddy_alloc_fail 0 > unevictable_pgs_culled 1006521 > unevictable_pgs_scanned 0 > unevictable_pgs_rescued 997077 > unevictable_pgs_mlocked 1319203 > unevictable_pgs_munlocked 842471 > unevictable_pgs_cleared 470531 > unevictable_pgs_stranded 459613 > thp_fault_alloc 20263113 > thp_fault_fallback 3368635 
> thp_collapse_alloc 226476 > thp_collapse_alloc_failed 17594 > thp_file_alloc 0 > thp_file_mapped 0 > thp_split_page 1159 > thp_split_page_failed 3927 > thp_deferred_split_page 20348941 > thp_split_pmd 53361 > thp_split_pud 0 > thp_zero_page_alloc 1 > thp_zero_page_alloc_failed 0 > thp_swpout 0 > thp_swpout_fallback 0 > balloon_inflate 0 > balloon_deflate 0 > balloon_migrate 0 > swap_ra 0 > swap_ra_hit 0 > > Greets, > Stefan > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-06 10:08           ` Stefan Priebe - Profihost AG
@ 2019-09-06 10:25             ` Vlastimil Babka
  2019-09-06 18:52             ` Yang Shi
  2019-09-09  8:27             ` Michal Hocko
  2 siblings, 0 replies; 61+ messages in thread
From: Vlastimil Babka @ 2019-09-06 10:25 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On 9/6/19 12:08 PM, Stefan Priebe - Profihost AG wrote:
> These are the biggest differences in meminfo before and after cached
> starts to drop. I didn't expect cached to end up in MemFree.
>
> Before:
> MemTotal:       16423116 kB
> MemFree:          374572 kB
> MemAvailable:    5633816 kB
> Cached:          5550972 kB
> Inactive:        4696580 kB
> Inactive(file):  3624776 kB
>
>
> After:
> MemTotal:       16423116 kB
> MemFree:         3477168 kB
> MemAvailable:    6066916 kB
> Cached:          2724504 kB
> Inactive:        1854740 kB
> Inactive(file):   950680 kB
>
> Any explanation?

What does /proc/pagetypeinfo look like?

Also, as Michal said, collecting the whole of /proc/vmstat (e.g. catting
it to vmstat.$TIMESTAMP once per second) while the bad situation is
happening would be useful.

You could also try whether the bad trend stops after you execute:

echo never > /sys/kernel/mm/transparent_hugepage/defrag

^ permalink raw reply	[flat|nested] 61+ messages in thread
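The once-per-second capture requested here can be scripted in a few lines of POSIX shell. This is a sketch; the function name and snapshot naming are illustrative, and the source file is a parameter so it is not hard-wired to /proc/vmstat:

```shell
# Take N snapshots of a stats file (by default /proc/vmstat), one second
# apart, writing one timestamped vmstat.* copy per snapshot into DIR.
collect_vmstat() {
    n=$1
    dir=${2:-.}
    src=${3:-/proc/vmstat}
    i=0
    while [ "$i" -lt "$n" ]; do
        # append the loop index so names stay unique within one second
        cp "$src" "$dir/vmstat.$(date +%s).$i"
        i=$((i + 1))
        if [ "$i" -lt "$n" ]; then sleep 1; fi
    done
}
```

For example, `collect_vmstat 600 /tmp` run while the pressure spike is happening yields ten minutes of snapshots, which can then be concatenated and gzipped as done later in the thread.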
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-06 10:08           ` Stefan Priebe - Profihost AG
  2019-09-06 10:25             ` Vlastimil Babka
@ 2019-09-06 18:52             ` Yang Shi
  2019-09-07  7:32               ` Stefan Priebe - Profihost AG
  2019-09-09  8:27             ` Michal Hocko
  2 siblings, 1 reply; 61+ messages in thread
From: Yang Shi @ 2019-09-06 18:52 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG
  Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner

On Fri, Sep 6, 2019 at 3:08 AM Stefan Priebe - Profihost AG
<s.priebe@profihost.ag> wrote:
>
> These are the biggest differences in meminfo before and after cached
> starts to drop. I didn't expect cached to end up in MemFree.
>
> Before:
> MemTotal:       16423116 kB
> MemFree:          374572 kB

Here MemFree is only ~300MB? That is quite low compared with the amount
of total memory. It may drop below the watermark and launch kswapd.

> MemAvailable:    5633816 kB
> Cached:          5550972 kB
> Inactive:        4696580 kB
> Inactive(file):  3624776 kB
>
>
> After:
> MemTotal:       16423116 kB
> MemFree:         3477168 kB

Here MemFree is ~3GB and the file cache was shrunk from ~5G down to ~2G.

> MemAvailable:    6066916 kB
> Cached:          2724504 kB
> Inactive:        1854740 kB
> Inactive(file):   950680 kB
>
> Any explanation?
>
> Greets,
> Stefan
> On 05.09.19 at 13:56, Stefan Priebe - Profihost AG wrote:
> >
> > On 05.09.19 at 13:40, Michal Hocko wrote:
> >> On Thu 05-09-19 13:27:10, Stefan Priebe - Profihost AG wrote:
> >>> Hello all,
> >>>
> >>> i hope you can help me again to understand the current MemAvailable
> >>> value in the linux kernel. I'm running a 4.19.52 kernel + psi patches in
> >>> this case.
> >>>
> >>> I'm seeing the following behaviour i don't understand and ask for help.
> >>>
> >>> While MemAvailable shows 5G the kernel starts to drop cache from 4G down
> >>> to 1G while the apache spawns some PHP processes. After that the PSI
> >>> mem.some value rises and the kernel tries to reclaim memory but
> >>> MemAvailable stays at 5G.
> >>>
> >>> Any ideas?
> >> > >> Can you collect /proc/vmstat (every second or so) and post it while this > >> is the case please? > > > > Yes sure. > > > > But i don't know which event you mean exactly. Current situation is PSI > > / memory pressure is > 20 but: > > > > This is the current status where MemAvailable show 5G but Cached is > > already dropped to 1G coming from 4G: > > > > > > meminfo: > > MemTotal: 16423116 kB > > MemFree: 5280736 kB > > MemAvailable: 5332752 kB > > Buffers: 2572 kB > > Cached: 1225112 kB > > SwapCached: 0 kB > > Active: 8934976 kB > > Inactive: 1026900 kB > > Active(anon): 8740396 kB > > Inactive(anon): 873448 kB > > Active(file): 194580 kB > > Inactive(file): 153452 kB > > Unevictable: 19900 kB > > Mlocked: 19900 kB > > SwapTotal: 0 kB > > SwapFree: 0 kB > > Dirty: 1980 kB > > Writeback: 0 kB > > AnonPages: 8423480 kB > > Mapped: 978212 kB > > Shmem: 875680 kB > > Slab: 839868 kB > > SReclaimable: 383396 kB > > SUnreclaim: 456472 kB > > KernelStack: 22576 kB > > PageTables: 49824 kB > > NFS_Unstable: 0 kB > > Bounce: 0 kB > > WritebackTmp: 0 kB > > CommitLimit: 8211556 kB > > Committed_AS: 32060624 kB > > VmallocTotal: 34359738367 kB > > VmallocUsed: 0 kB > > VmallocChunk: 0 kB > > Percpu: 118048 kB > > HardwareCorrupted: 0 kB > > AnonHugePages: 6406144 kB > > ShmemHugePages: 0 kB > > ShmemPmdMapped: 0 kB > > HugePages_Total: 0 > > HugePages_Free: 0 > > HugePages_Rsvd: 0 > > HugePages_Surp: 0 > > Hugepagesize: 2048 kB > > Hugetlb: 0 kB > > DirectMap4k: 2580336 kB > > DirectMap2M: 14196736 kB > > DirectMap1G: 2097152 kB > > > > > > vmstat shows: > > nr_free_pages 1320053 > > nr_zone_inactive_anon 218362 > > nr_zone_active_anon 2185108 > > nr_zone_inactive_file 38363 > > nr_zone_active_file 48645 > > nr_zone_unevictable 4975 > > nr_zone_write_pending 495 > > nr_mlock 4975 > > nr_page_table_pages 12553 > > nr_kernel_stack 22576 > > nr_bounce 0 > > nr_zspages 0 > > nr_free_cma 0 > > numa_hit 13916119899 > > numa_miss 0 > > numa_foreign 0 > > numa_interleave 
15629 > > numa_local 13916119899 > > numa_other 0 > > nr_inactive_anon 218362 > > nr_active_anon 2185164 > > nr_inactive_file 38363 > > nr_active_file 48645 > > nr_unevictable 4975 > > nr_slab_reclaimable 95849 > > nr_slab_unreclaimable 114118 > > nr_isolated_anon 0 > > nr_isolated_file 0 > > workingset_refault 71365357 > > workingset_activate 20281670 > > workingset_restore 8995665 > > workingset_nodereclaim 326085 > > nr_anon_pages 2105903 > > nr_mapped 244553 > > nr_file_pages 306921 > > nr_dirty 495 > > nr_writeback 0 > > nr_writeback_temp 0 > > nr_shmem 218920 > > nr_shmem_hugepages 0 > > nr_shmem_pmdmapped 0 > > nr_anon_transparent_hugepages 3128 > > nr_unstable 0 > > nr_vmscan_write 0 > > nr_vmscan_immediate_reclaim 1833104 > > nr_dirtied 386544087 > > nr_written 259220036 > > nr_dirty_threshold 265636 > > nr_dirty_background_threshold 132656 > > pgpgin 1817628997 > > pgpgout 3730818029 > > pswpin 0 > > pswpout 0 > > pgalloc_dma 0 > > pgalloc_dma32 5790777997 > > pgalloc_normal 20003662520 > > pgalloc_movable 0 > > allocstall_dma 0 > > allocstall_dma32 0 > > allocstall_normal 39 > > allocstall_movable 1980089 > > pgskip_dma 0 > > pgskip_dma32 0 > > pgskip_normal 0 > > pgskip_movable 0 > > pgfree 26637215947 > > pgactivate 316722654 > > pgdeactivate 261039211 > > pglazyfree 0 > > pgfault 17719356599 > > pgmajfault 30985544 > > pglazyfreed 0 > > pgrefill 286826568 > > pgsteal_kswapd 36740923 > > pgsteal_direct 349291470 > > pgscan_kswapd 36878966 > > pgscan_direct 395327492 > > pgscan_direct_throttle 0 > > zone_reclaim_failed 0 > > pginodesteal 49817087 > > slabs_scanned 597956834 > > kswapd_inodesteal 1412447 > > kswapd_low_wmark_hit_quickly 39 > > kswapd_high_wmark_hit_quickly 319 > > pageoutrun 3585 > > pgrotated 2873743 > > drop_pagecache 0 > > drop_slab 0 > > oom_kill 0 > > pgmigrate_success 839062285 > > pgmigrate_fail 507313 > > compact_migrate_scanned 9619077010 > > compact_free_scanned 67985619651 > > compact_isolated 1684537704 > > compact_stall 
205761 > > compact_fail 182420 > > compact_success 23341 > > compact_daemon_wake 2 > > compact_daemon_migrate_scanned 811 > > compact_daemon_free_scanned 490241 > > htlb_buddy_alloc_success 0 > > htlb_buddy_alloc_fail 0 > > unevictable_pgs_culled 1006521 > > unevictable_pgs_scanned 0 > > unevictable_pgs_rescued 997077 > > unevictable_pgs_mlocked 1319203 > > unevictable_pgs_munlocked 842471 > > unevictable_pgs_cleared 470531 > > unevictable_pgs_stranded 459613 > > thp_fault_alloc 20263113 > > thp_fault_fallback 3368635 > > thp_collapse_alloc 226476 > > thp_collapse_alloc_failed 17594 > > thp_file_alloc 0 > > thp_file_mapped 0 > > thp_split_page 1159 > > thp_split_page_failed 3927 > > thp_deferred_split_page 20348941 > > thp_split_pmd 53361 > > thp_split_pud 0 > > thp_zero_page_alloc 1 > > thp_zero_page_alloc_failed 0 > > thp_swpout 0 > > thp_swpout_fallback 0 > > balloon_inflate 0 > > balloon_deflate 0 > > balloon_migrate 0 > > swap_ra 0 > > swap_ra_hit 0 > > > > Greets, > > Stefan > > > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-06 18:52             ` Yang Shi
@ 2019-09-07  7:32               ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-07 7:32 UTC (permalink / raw)
  To: Yang Shi; +Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner

On 06.09.19 at 20:52, Yang Shi wrote:
> On Fri, Sep 6, 2019 at 3:08 AM Stefan Priebe - Profihost AG
> <s.priebe@profihost.ag> wrote:
>>
>> These are the biggest differences in meminfo before and after cached
>> starts to drop. I didn't expect cached to end up in MemFree.
>>
>> Before:
>> MemTotal:       16423116 kB
>> MemFree:          374572 kB
>
> Here MemFree is only ~300MB? That is quite low compared with the amount
> of total memory. It may drop below the watermark and launch kswapd.

Mhm, yes, that might be possible, but I don't see kswapd running in the
process list, at least not using any CPU. Also, does it really free all
of the cache? I thought it would only free up to vm.min_free_kbytes, but
that is 160MB on this machine.

>> MemAvailable:    5633816 kB
>> Cached:          5550972 kB
>> Inactive:        4696580 kB
>> Inactive(file):  3624776 kB
>>
>>
>> After:
>> MemTotal:       16423116 kB
>> MemFree:         3477168 kB
>
> Here MemFree is ~3GB and the file cache was shrunk from ~5G down to ~2G.

Yes - but I thought if the file cache gets shrunk, another process has
requested this memory and would use it, so it does not end up in free.
I'm sure this all is explainable - but I really would like to know how.

Greets,
Stefan

>> MemAvailable:    6066916 kB
>> Cached:          2724504 kB
>> Inactive:        1854740 kB
>> Inactive(file):   950680 kB
>>
>> Any explanation?
>>
>> Greets,
>> Stefan
>> On 05.09.19 at 13:56, Stefan Priebe - Profihost AG wrote:
>>>
>>> On 05.09.19 at 13:40, Michal Hocko wrote:
>>>> On Thu 05-09-19 13:27:10, Stefan Priebe - Profihost AG wrote:
>>>>> Hello all,
>>>>>
>>>>> i hope you can help me again to understand the current MemAvailable
>>>>> value in the linux kernel.
I'm running a 4.19.52 kernel + psi patches in >>>>> this case. >>>>> >>>>> I'm seeing the following behaviour i don't understand and ask for help. >>>>> >>>>> While MemAvailable shows 5G the kernel starts to drop cache from 4G down >>>>> to 1G while the apache spawns some PHP processes. After that the PSI >>>>> mem.some value rises and the kernel tries to reclaim memory but >>>>> MemAvailable stays at 5G. >>>>> >>>>> Any ideas? >>>> >>>> Can you collect /proc/vmstat (every second or so) and post it while this >>>> is the case please? >>> >>> Yes sure. >>> >>> But i don't know which event you mean exactly. Current situation is PSI >>> / memory pressure is > 20 but: >>> >>> This is the current status where MemAvailable show 5G but Cached is >>> already dropped to 1G coming from 4G: >>> >>> >>> meminfo: >>> MemTotal: 16423116 kB >>> MemFree: 5280736 kB >>> MemAvailable: 5332752 kB >>> Buffers: 2572 kB >>> Cached: 1225112 kB >>> SwapCached: 0 kB >>> Active: 8934976 kB >>> Inactive: 1026900 kB >>> Active(anon): 8740396 kB >>> Inactive(anon): 873448 kB >>> Active(file): 194580 kB >>> Inactive(file): 153452 kB >>> Unevictable: 19900 kB >>> Mlocked: 19900 kB >>> SwapTotal: 0 kB >>> SwapFree: 0 kB >>> Dirty: 1980 kB >>> Writeback: 0 kB >>> AnonPages: 8423480 kB >>> Mapped: 978212 kB >>> Shmem: 875680 kB >>> Slab: 839868 kB >>> SReclaimable: 383396 kB >>> SUnreclaim: 456472 kB >>> KernelStack: 22576 kB >>> PageTables: 49824 kB >>> NFS_Unstable: 0 kB >>> Bounce: 0 kB >>> WritebackTmp: 0 kB >>> CommitLimit: 8211556 kB >>> Committed_AS: 32060624 kB >>> VmallocTotal: 34359738367 kB >>> VmallocUsed: 0 kB >>> VmallocChunk: 0 kB >>> Percpu: 118048 kB >>> HardwareCorrupted: 0 kB >>> AnonHugePages: 6406144 kB >>> ShmemHugePages: 0 kB >>> ShmemPmdMapped: 0 kB >>> HugePages_Total: 0 >>> HugePages_Free: 0 >>> HugePages_Rsvd: 0 >>> HugePages_Surp: 0 >>> Hugepagesize: 2048 kB >>> Hugetlb: 0 kB >>> DirectMap4k: 2580336 kB >>> DirectMap2M: 14196736 kB >>> DirectMap1G: 2097152 kB >>> >>> >>> 
vmstat shows: >>> nr_free_pages 1320053 >>> nr_zone_inactive_anon 218362 >>> nr_zone_active_anon 2185108 >>> nr_zone_inactive_file 38363 >>> nr_zone_active_file 48645 >>> nr_zone_unevictable 4975 >>> nr_zone_write_pending 495 >>> nr_mlock 4975 >>> nr_page_table_pages 12553 >>> nr_kernel_stack 22576 >>> nr_bounce 0 >>> nr_zspages 0 >>> nr_free_cma 0 >>> numa_hit 13916119899 >>> numa_miss 0 >>> numa_foreign 0 >>> numa_interleave 15629 >>> numa_local 13916119899 >>> numa_other 0 >>> nr_inactive_anon 218362 >>> nr_active_anon 2185164 >>> nr_inactive_file 38363 >>> nr_active_file 48645 >>> nr_unevictable 4975 >>> nr_slab_reclaimable 95849 >>> nr_slab_unreclaimable 114118 >>> nr_isolated_anon 0 >>> nr_isolated_file 0 >>> workingset_refault 71365357 >>> workingset_activate 20281670 >>> workingset_restore 8995665 >>> workingset_nodereclaim 326085 >>> nr_anon_pages 2105903 >>> nr_mapped 244553 >>> nr_file_pages 306921 >>> nr_dirty 495 >>> nr_writeback 0 >>> nr_writeback_temp 0 >>> nr_shmem 218920 >>> nr_shmem_hugepages 0 >>> nr_shmem_pmdmapped 0 >>> nr_anon_transparent_hugepages 3128 >>> nr_unstable 0 >>> nr_vmscan_write 0 >>> nr_vmscan_immediate_reclaim 1833104 >>> nr_dirtied 386544087 >>> nr_written 259220036 >>> nr_dirty_threshold 265636 >>> nr_dirty_background_threshold 132656 >>> pgpgin 1817628997 >>> pgpgout 3730818029 >>> pswpin 0 >>> pswpout 0 >>> pgalloc_dma 0 >>> pgalloc_dma32 5790777997 >>> pgalloc_normal 20003662520 >>> pgalloc_movable 0 >>> allocstall_dma 0 >>> allocstall_dma32 0 >>> allocstall_normal 39 >>> allocstall_movable 1980089 >>> pgskip_dma 0 >>> pgskip_dma32 0 >>> pgskip_normal 0 >>> pgskip_movable 0 >>> pgfree 26637215947 >>> pgactivate 316722654 >>> pgdeactivate 261039211 >>> pglazyfree 0 >>> pgfault 17719356599 >>> pgmajfault 30985544 >>> pglazyfreed 0 >>> pgrefill 286826568 >>> pgsteal_kswapd 36740923 >>> pgsteal_direct 349291470 >>> pgscan_kswapd 36878966 >>> pgscan_direct 395327492 >>> pgscan_direct_throttle 0 >>> zone_reclaim_failed 0 >>> 
pginodesteal 49817087 >>> slabs_scanned 597956834 >>> kswapd_inodesteal 1412447 >>> kswapd_low_wmark_hit_quickly 39 >>> kswapd_high_wmark_hit_quickly 319 >>> pageoutrun 3585 >>> pgrotated 2873743 >>> drop_pagecache 0 >>> drop_slab 0 >>> oom_kill 0 >>> pgmigrate_success 839062285 >>> pgmigrate_fail 507313 >>> compact_migrate_scanned 9619077010 >>> compact_free_scanned 67985619651 >>> compact_isolated 1684537704 >>> compact_stall 205761 >>> compact_fail 182420 >>> compact_success 23341 >>> compact_daemon_wake 2 >>> compact_daemon_migrate_scanned 811 >>> compact_daemon_free_scanned 490241 >>> htlb_buddy_alloc_success 0 >>> htlb_buddy_alloc_fail 0 >>> unevictable_pgs_culled 1006521 >>> unevictable_pgs_scanned 0 >>> unevictable_pgs_rescued 997077 >>> unevictable_pgs_mlocked 1319203 >>> unevictable_pgs_munlocked 842471 >>> unevictable_pgs_cleared 470531 >>> unevictable_pgs_stranded 459613 >>> thp_fault_alloc 20263113 >>> thp_fault_fallback 3368635 >>> thp_collapse_alloc 226476 >>> thp_collapse_alloc_failed 17594 >>> thp_file_alloc 0 >>> thp_file_mapped 0 >>> thp_split_page 1159 >>> thp_split_page_failed 3927 >>> thp_deferred_split_page 20348941 >>> thp_split_pmd 53361 >>> thp_split_pud 0 >>> thp_zero_page_alloc 1 >>> thp_zero_page_alloc_failed 0 >>> thp_swpout 0 >>> thp_swpout_fallback 0 >>> balloon_inflate 0 >>> balloon_deflate 0 >>> balloon_migrate 0 >>> swap_ra 0 >>> swap_ra_hit 0 >>> >>> Greets, >>> Stefan >>> >> ^ permalink raw reply [flat|nested] 61+ messages in thread
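Stefan's assumption about vm.min_free_kbytes can be cross-checked against /proc/zoneinfo: the summed per-zone `min` watermarks, times the 4 kB page size, should land near min_free_kbytes, but kswapd keeps reclaiming until each zone reaches its `high` watermark, so it frees noticeably more than min_free_kbytes alone suggests. A sketch (the helper name is illustrative):

```shell
# Sum the per-zone min/low/high watermarks from a zoneinfo-format dump.
# With 4 kB pages, min * 4 should be close to vm.min_free_kbytes;
# kswapd keeps reclaiming until free pages reach "high" in each zone.
watermark_totals() {
    awk '
        $1 == "min"  { min  += $2 }
        $1 == "low"  { low  += $2 }
        $1 == "high" { high += $2 }
        END { printf "min=%d low=%d high=%d pages\n", min, low, high }
    ' "${1:-/proc/zoneinfo}"
}
```

For the zoneinfo posted earlier in the thread this yields min=41056 pages (about 164224 kB, matching the configured min_free_kbytes of 164231) and high=61582 pages, i.e. kswapd targets roughly 240 MB free rather than 160 MB.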
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-06 10:08           ` Stefan Priebe - Profihost AG
  2019-09-06 10:25             ` Vlastimil Babka
  2019-09-06 18:52             ` Yang Shi
@ 2019-09-09  8:27             ` Michal Hocko
  2019-09-09  8:54               ` Stefan Priebe - Profihost AG
  2 siblings, 1 reply; 61+ messages in thread
From: Michal Hocko @ 2019-09-09 8:27 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG; +Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On Fri 06-09-19 12:08:31, Stefan Priebe - Profihost AG wrote:
> These are the biggest differences in meminfo before and after cached
> starts to drop. I didn't expect cached to end up in MemFree.
>
> Before:
> MemTotal:       16423116 kB
> MemFree:          374572 kB
> MemAvailable:    5633816 kB
> Cached:          5550972 kB
> Inactive:        4696580 kB
> Inactive(file):  3624776 kB
>
>
> After:
> MemTotal:       16423116 kB
> MemFree:         3477168 kB
> MemAvailable:    6066916 kB
> Cached:          2724504 kB
> Inactive:        1854740 kB
> Inactive(file):   950680 kB
>
> Any explanation?

Do you have more snapshots of /proc/vmstat as suggested by Vlastimil and
me earlier in this thread? Seeing the overall progress would tell us
much more than before and after. Or have I missed this data?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09  8:27             ` Michal Hocko
@ 2019-09-09  8:54               ` Stefan Priebe - Profihost AG
  2019-09-09 11:01                 ` Michal Hocko
  2019-09-09 11:49                 ` Vlastimil Babka
  0 siblings, 2 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-09 8:54 UTC (permalink / raw)
  To: Michal Hocko; +Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

[-- Attachment #1: Type: text/plain, Size: 1116 bytes --]

Hello Michal,

On 09.09.19 at 10:27, Michal Hocko wrote:
> On Fri 06-09-19 12:08:31, Stefan Priebe - Profihost AG wrote:
>> These are the biggest differences in meminfo before and after cached
>> starts to drop. I didn't expect cached to end up in MemFree.
>>
>> Before:
>> MemTotal:       16423116 kB
>> MemFree:          374572 kB
>> MemAvailable:    5633816 kB
>> Cached:          5550972 kB
>> Inactive:        4696580 kB
>> Inactive(file):  3624776 kB
>>
>>
>> After:
>> MemTotal:       16423116 kB
>> MemFree:         3477168 kB
>> MemAvailable:    6066916 kB
>> Cached:          2724504 kB
>> Inactive:        1854740 kB
>> Inactive(file):   950680 kB
>>
>> Any explanation?
>
> Do you have more snapshots of /proc/vmstat as suggested by Vlastimil and
> me earlier in this thread? Seeing the overall progress would tell us
> much more than before and after. Or have I missed this data?

I needed to wait until today to grab such a situation again, but from
what I can see it is very clear that MemFree gets low and then the
kernel starts to drop the caches.

Attached you'll find two log files.

Greets,
Stefan

[-- Attachment #2: meminfo.gz --]
[-- Type: application/gzip, Size: 114233 bytes --]

[-- Attachment #3: vmstat.gz --]
[-- Type: application/gzip, Size: 224712 bytes --]

^ permalink raw reply	[flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-09 8:54 ` Stefan Priebe - Profihost AG @ 2019-09-09 11:01 ` Michal Hocko 2019-09-09 12:08 ` Michal Hocko 2019-09-09 11:49 ` Vlastimil Babka 1 sibling, 1 reply; 61+ messages in thread From: Michal Hocko @ 2019-09-09 11:01 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka [Cc Vlastimil - logs are http://lkml.kernel.org/r/1d9ee19a-98c9-cd78-1e5b-21d9d6e36792@profihost.ag] On Mon 09-09-19 10:54:21, Stefan Priebe - Profihost AG wrote: > Hello Michal, > > Am 09.09.19 um 10:27 schrieb Michal Hocko: > > On Fri 06-09-19 12:08:31, Stefan Priebe - Profihost AG wrote: > >> These are the biggest differences in meminfo before and after cached > >> starts to drop. I didn't expect cached end up in MemFree. > >> > >> Before: > >> MemTotal: 16423116 kB > >> MemFree: 374572 kB > >> MemAvailable: 5633816 kB > >> Cached: 5550972 kB > >> Inactive: 4696580 kB > >> Inactive(file): 3624776 kB > >> > >> > >> After: > >> MemTotal: 16423116 kB > >> MemFree: 3477168 kB > >> MemAvailable: 6066916 kB > >> Cached: 2724504 kB > >> Inactive: 1854740 kB > >> Inactive(file): 950680 kB > >> > >> Any explanation? > > > > Do you have more snapshots of /proc/vmstat as suggested by Vlastimil and > > me earlier in this thread? Seeing the overall progress would tell us > > much more than before and after. Or have I missed this data? > > I needed to wait until today to grab again such a situation but from > what i know it is very clear that MemFree is low and than the kernel > starts to drop the chaches. > > Attached you'll find two log files. $ grep pgsteal_kswapd vmstat | uniq -c 1331 pgsteal_kswapd 37142300 $ grep pgscan_kswapd vmstat | uniq -c 1331 pgscan_kswapd 37285092 kswapd hasn't scanned nor reclaimed any memory throughout the whole collected time span. On the other hand we can see direct reclaim active. 
But we can see quite some direct reclaim activity: $ awk '/pgsteal_direct/ {val=$2+0; ln++; if (last && val-last > 0) {printf("%d %d\n", ln, val-last)} last=val}' vmstat | head 17 1058 18 9773 19 1036 24 11413 49 1055 50 1050 51 17938 52 22665 53 29400 54 5997 So there is a steady source of direct reclaim, which is quite unexpected considering that the background reclaim is inactive. Or maybe it is blocked, unable to make forward progress. 780513 pages have been reclaimed, which is 3G worth of memory and matches the dropdown you are seeing AFAICS. $ grep allocstall_dma32 vmstat | uniq -c 1331 allocstall_dma32 0 $ grep allocstall_normal vmstat | uniq -c 1331 allocstall_normal 39 No direct reclaim was invoked for the DMA32 and Normal zones. But the Movable zone seems to be the source of the direct reclaim: awk '/allocstall_movable/ {val=$2+0; ln++; if (last && val-last > 0) {printf("%d %d\n", ln, val-last)} last=val}' vmstat | head 17 1 18 9 19 1 24 10 49 1 50 1 51 17 52 20 53 28 54 5 and that matches the moments when we reclaimed memory. There seems to be a steady flow of THP allocations, so maybe that is the source of the direct reclaim? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 61+ messages in thread
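The awk one-liner above extracts per-sample increases of a single counter. The same delta extraction can be sketched in Python for any counter, assuming only the plain `name value` line format of /proc/vmstat; the sample values below are illustrative, not taken from the attached logs:

```python
def parse_vmstat(text):
    """Parse the plain 'name value' lines of one /proc/vmstat snapshot."""
    return {name: int(value) for name, value in
            (line.split() for line in text.strip().splitlines())}

def counter_deltas(snapshots, counter):
    """Return (sample_index, increase) wherever the given counter grew
    between consecutive snapshots -- the same thing the awk one-liner
    above computes for pgsteal_direct."""
    deltas = []
    last = None
    for i, snap in enumerate(snapshots):
        val = snap.get(counter, 0)
        if last is not None and val - last > 0:
            deltas.append((i, val - last))
        last = val
    return deltas
```

A counter that never moves (like pgsteal_kswapd in the collected data) simply produces an empty list.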
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-09 11:01 ` Michal Hocko @ 2019-09-09 12:08 ` Michal Hocko 2019-09-09 12:10 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 61+ messages in thread From: Michal Hocko @ 2019-09-09 12:08 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka On Mon 09-09-19 13:01:36, Michal Hocko wrote: > and that matches moments when we reclaimed memory. There seems to be a > steady THP allocations flow so maybe this is a source of the direct > reclaim? I was thinking about this some more and THP being a source of reclaim sounds quite unlikely, at least in a default configuration, because we shouldn't do anything expensive in the #PF path. But there might be a different source of high-order (!costly) allocations. Could you check how many allocation requests like that you have on your system? mount -t debugfs none /debug echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable cat /debug/tracing/trace_pipe > $file -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-09 12:08 ` Michal Hocko @ 2019-09-09 12:10 ` Stefan Priebe - Profihost AG 2019-09-09 12:28 ` Michal Hocko 0 siblings, 1 reply; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-09 12:10 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka Am 09.09.19 um 14:08 schrieb Michal Hocko: > On Mon 09-09-19 13:01:36, Michal Hocko wrote: >> and that matches moments when we reclaimed memory. There seems to be a >> steady THP allocations flow so maybe this is a source of the direct >> reclaim? > > I was thinking about this some more and THP being a source of reclaim > sounds quite unlikely. At least in a default configuration because we > shouldn't do anything expensinve in the #PF path. But there might be a > difference source of high order (!costly) allocations. Could you check > how many allocation requests like that you have on your system? > > mount -t debugfs none /debug > echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter > echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable > cat /debug/tracing/trace_pipe > $file Just now or when PSI raises? Greets, Stefan ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-09 12:10 ` Stefan Priebe - Profihost AG @ 2019-09-09 12:28 ` Michal Hocko 2019-09-09 12:37 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 61+ messages in thread From: Michal Hocko @ 2019-09-09 12:28 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote: > > Am 09.09.19 um 14:08 schrieb Michal Hocko: > > On Mon 09-09-19 13:01:36, Michal Hocko wrote: > >> and that matches moments when we reclaimed memory. There seems to be a > >> steady THP allocations flow so maybe this is a source of the direct > >> reclaim? > > > > I was thinking about this some more and THP being a source of reclaim > > sounds quite unlikely. At least in a default configuration because we > > shouldn't do anything expensinve in the #PF path. But there might be a > > difference source of high order (!costly) allocations. Could you check > > how many allocation requests like that you have on your system? > > > > mount -t debugfs none /debug > > echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter > > echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable > > cat /debug/tracing/trace_pipe > $file echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable might tell us something as well but it might turn out that it just still doesn't give us the full picture and we might need echo stacktrace > /debug/tracing/trace_options It will generate much more output though. > Just now or when PSI raises? When the excessive reclaim is happening ideally. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-09 12:28 ` Michal Hocko @ 2019-09-09 12:37 ` Stefan Priebe - Profihost AG 2019-09-09 12:49 ` Michal Hocko 0 siblings, 1 reply; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-09 12:37 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka [-- Attachment #1: Type: text/plain, Size: 1941 bytes --] Am 09.09.19 um 14:28 schrieb Michal Hocko: > On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote: >> >> Am 09.09.19 um 14:08 schrieb Michal Hocko: >>> On Mon 09-09-19 13:01:36, Michal Hocko wrote: >>>> and that matches moments when we reclaimed memory. There seems to be a >>>> steady THP allocations flow so maybe this is a source of the direct >>>> reclaim? >>> >>> I was thinking about this some more and THP being a source of reclaim >>> sounds quite unlikely. At least in a default configuration because we >>> shouldn't do anything expensinve in the #PF path. But there might be a >>> difference source of high order (!costly) allocations. Could you check >>> how many allocation requests like that you have on your system? >>> >>> mount -t debugfs none /debug >>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter >>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable >>> cat /debug/tracing/trace_pipe > $file > > echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable > echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable > > might tell us something as well but it might turn out that it just still > doesn't give us the full picture and we might need > echo stacktrace > /debug/tracing/trace_options > > It will generate much more output though. > >> Just now or when PSI raises? > > When the excessive reclaim is happening ideally. This one is from a server with 28G memfree but memory pressure is still jumping between 0 and 10%. 
I did: echo "order > 0" > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable echo 1 > /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable echo 1 > /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace File attached. Stefan [-- Attachment #2: trace.gz --] [-- Type: application/gzip, Size: 311017 bytes --] ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-09 12:37 ` Stefan Priebe - Profihost AG @ 2019-09-09 12:49 ` Michal Hocko 2019-09-09 12:56 ` Stefan Priebe - Profihost AG 2019-09-10 5:41 ` Stefan Priebe - Profihost AG 0 siblings, 2 replies; 61+ messages in thread From: Michal Hocko @ 2019-09-09 12:49 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote: > > Am 09.09.19 um 14:28 schrieb Michal Hocko: > > On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote: > >> > >> Am 09.09.19 um 14:08 schrieb Michal Hocko: > >>> On Mon 09-09-19 13:01:36, Michal Hocko wrote: > >>>> and that matches moments when we reclaimed memory. There seems to be a > >>>> steady THP allocations flow so maybe this is a source of the direct > >>>> reclaim? > >>> > >>> I was thinking about this some more and THP being a source of reclaim > >>> sounds quite unlikely. At least in a default configuration because we > >>> shouldn't do anything expensinve in the #PF path. But there might be a > >>> difference source of high order (!costly) allocations. Could you check > >>> how many allocation requests like that you have on your system? > >>> > >>> mount -t debugfs none /debug > >>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter > >>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable > >>> cat /debug/tracing/trace_pipe > $file > > > > echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable > > echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable > > > > might tell us something as well but it might turn out that it just still > > doesn't give us the full picture and we might need > > echo stacktrace > /debug/tracing/trace_options > > > > It will generate much more output though. > > > >> Just now or when PSI raises? > > > > When the excessive reclaim is happening ideally. 
> > This one is from a server with 28G memfree but memory pressure is still > jumping between 0 and 10%. > > I did: > echo "order > 0" > > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter > > echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable > > echo 1 > > /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable > > echo 1 > > /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable > > timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace > > File attached. There is no reclaim captured in this trace dump. $ zcat trace1.gz | sed 's@.*\(order=[0-9]\).*\(gfp_flags=.*\)@\1 \2@' | sort | uniq -c 777 order=1 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC 663 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC 153 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC 911 order=1 gfp_flags=GFP_KERNEL_ACCOUNT|__GFP_ZERO 4872 order=1 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT 62 order=1 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC 14 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP 11 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_RECLAIMABLE 1263 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC 45 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE 1 order=2 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO 7853 order=2 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT 73 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC 729 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE 528 order=3 
gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC 1203 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT 5295 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP 1 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC 132 order=3 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC 13 order=5 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO 1 order=6 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO 1232 order=9 gfp_flags=GFP_TRANSHUGE 108 order=9 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE 362 order=9 gfp_flags=GFP_TRANSHUGE_LIGHT|__GFP_THISNODE Nothing really stands out, because except for the THP ones none of the others are even going to be using the movable zone. You've said that your machine doesn't have more than one NUMA node, right? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 61+ messages in thread
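The sed | sort | uniq -c pipeline above can be reproduced in a few lines of Python. This sketch assumes only that each mm_page_alloc trace line carries an `order=` and a `gfp_flags=` field somewhere; the surrounding trace-line layout in the test data is illustrative, not copied from the attached trace:

```python
import re
from collections import Counter

# Lazily skip whatever sits between the order and the flags field
# (e.g. migratetype=... in the real trace output).
ALLOC_RE = re.compile(r"(order=\d+).*?(gfp_flags=\S+)")

def alloc_histogram(lines):
    """Count (order, gfp_flags) pairs across mm_page_alloc trace lines,
    mirroring the sed | sort | uniq -c pipeline above."""
    hist = Counter()
    for line in lines:
        m = ALLOC_RE.search(line)
        if m:
            hist[m.groups()] += 1
    return hist
```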
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-09 12:49 ` Michal Hocko @ 2019-09-09 12:56 ` Stefan Priebe - Profihost AG [not found] ` <52235eda-ffe2-721c-7ad7-575048e2d29d@profihost.ag> 2019-09-10 5:41 ` Stefan Priebe - Profihost AG 1 sibling, 1 reply; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-09 12:56 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka Am 09.09.19 um 14:49 schrieb Michal Hocko: > On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote: >> >> Am 09.09.19 um 14:28 schrieb Michal Hocko: >>> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote: >>>> >>>> Am 09.09.19 um 14:08 schrieb Michal Hocko: >>>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote: >>>>>> and that matches moments when we reclaimed memory. There seems to be a >>>>>> steady THP allocations flow so maybe this is a source of the direct >>>>>> reclaim? >>>>> >>>>> I was thinking about this some more and THP being a source of reclaim >>>>> sounds quite unlikely. At least in a default configuration because we >>>>> shouldn't do anything expensinve in the #PF path. But there might be a >>>>> difference source of high order (!costly) allocations. Could you check >>>>> how many allocation requests like that you have on your system? >>>>> >>>>> mount -t debugfs none /debug >>>>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter >>>>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable >>>>> cat /debug/tracing/trace_pipe > $file >>> >>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable >>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable >>> >>> might tell us something as well but it might turn out that it just still >>> doesn't give us the full picture and we might need >>> echo stacktrace > /debug/tracing/trace_options >>> >>> It will generate much more output though. >>> >>>> Just now or when PSI raises? 
>>> >>> When the excessive reclaim is happening ideally. >> >> This one is from a server with 28G memfree but memory pressure is still >> jumping between 0 and 10%. >> >> I did: >> echo "order > 0" > >> /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter >> >> echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable >> >> echo 1 > >> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable >> >> echo 1 > >> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable >> >> timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace >> >> File attached. > > There is no reclaim captured in this trace dump. > $ zcat trace1.gz | sed 's@.*\(order=[0-9]\).*\(gfp_flags=.*\)@\1 \2@' | sort | uniq -c > 777 order=1 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > 663 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > 153 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > 911 order=1 gfp_flags=GFP_KERNEL_ACCOUNT|__GFP_ZERO > 4872 order=1 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT > 62 order=1 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > 14 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP > 11 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_RECLAIMABLE > 1263 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > 45 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE > 1 order=2 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO > 7853 order=2 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT > 73 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > 729 order=3 
gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE > 528 order=3 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > 1203 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT > 5295 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP > 1 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > 132 order=3 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > 13 order=5 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO > 1 order=6 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO > 1232 order=9 gfp_flags=GFP_TRANSHUGE > 108 order=9 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE > 362 order=9 gfp_flags=GFP_TRANSHUGE_LIGHT|__GFP_THISNODE > > Nothing really stands out because except for the THP ones none of others > are going to even be using movable zone. It might be that this is not an ideal example; it was just the fastest I could find. Maybe we really need one with much higher pressure; I will try to find one. > You've said that your machine > doesn't have more than one NUMA node, right? Yes, the first example was a VM. This one is a single Xeon. Stefan ^ permalink raw reply [flat|nested] 61+ messages in thread
[parent not found: <52235eda-ffe2-721c-7ad7-575048e2d29d@profihost.ag>]
* Re: lot of MemAvailable but falling cache and raising PSI [not found] ` <52235eda-ffe2-721c-7ad7-575048e2d29d@profihost.ag> @ 2019-09-10 5:58 ` Stefan Priebe - Profihost AG 2019-09-10 8:29 ` Michal Hocko 1 sibling, 0 replies; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-10 5:58 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka Those are also constantly running on this system (30G free mem): 101 root 20 0 0 0 0 S 12,9 0,0 40:38.45 [kswapd0] 89 root 39 19 0 0 0 S 11,6 0,0 38:58.84 [khugepaged] # cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order      0      1      2      3      4      5      6      7      8      9     10
Node 0, zone      DMA, type    Unmovable        0      0      0      1      2      1      1      0      1      0      0
Node 0, zone      DMA, type      Movable        0      0      0      0      0      0      0      0      0      1      3
Node 0, zone      DMA, type  Reclaimable        0      0      0      0      0      0      0      0      0      0      0
Node 0, zone      DMA, type   HighAtomic        0      0      0      0      0      0      0      0      0      0      0
Node 0, zone      DMA, type      Isolate        0      0      0      0      0      0      0      0      0      0      0
Node 0, zone    DMA32, type    Unmovable        0      1      0      1      0      1      0      1      1      0      3
Node 0, zone    DMA32, type      Movable       66     53     71     57     59     53     49     47     24      2     42
Node 0, zone    DMA32, type  Reclaimable        0      0      3      1      0      1      1      1      1      0      0
Node 0, zone    DMA32, type   HighAtomic        0      0      0      0      0      0      0      0      0      0      0
Node 0, zone    DMA32, type      Isolate        0      0      0      0      0      0      0      0      0      0      0
Node 0, zone   Normal, type    Unmovable        1   5442  25546  12849   8379   5771   3297   1523    268      0      0
Node 0, zone   Normal, type      Movable   100322 153229 102511  75583  52007  34284  19259   9465   2014     15      5
Node 0, zone   Normal, type  Reclaimable     4002   4299   2395   3721   2568   1056    489    177     63      0      0
Node 0, zone   Normal, type   HighAtomic        0      0      1      3      3      3      1      0      1      0      0
Node 0, zone   Normal, type      Isolate        0      0      0      0      0      0      0      0      0      0      0

Number of blocks type    Unmovable  Movable  Reclaimable  HighAtomic  Isolate
Node 0, zone      DMA            1        7            0           0        0
Node 0, zone    DMA32           10     1005            1           0        0
Node 0, zone   Normal         3411    27125         1207           1        0

Greets, Stefan Am 10.09.19 um 07:56 schrieb Stefan Priebe - Profihost AG: > > Am 09.09.19 um 14:56 schrieb Stefan Priebe - Profihost AG: >> Am 09.09.19 um 14:49 schrieb Michal
Hocko: >>> On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote: >>>> >>>> Am 09.09.19 um 14:28 schrieb Michal Hocko: >>>>> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote: >>>>>> >>>>>> Am 09.09.19 um 14:08 schrieb Michal Hocko: >>>>>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote: >>>>>>>> and that matches moments when we reclaimed memory. There seems to be a >>>>>>>> steady THP allocations flow so maybe this is a source of the direct >>>>>>>> reclaim? >>>>>>> >>>>>>> I was thinking about this some more and THP being a source of reclaim >>>>>>> sounds quite unlikely. At least in a default configuration because we >>>>>>> shouldn't do anything expensinve in the #PF path. But there might be a >>>>>>> difference source of high order (!costly) allocations. Could you check >>>>>>> how many allocation requests like that you have on your system? >>>>>>> >>>>>>> mount -t debugfs none /debug >>>>>>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter >>>>>>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable >>>>>>> cat /debug/tracing/trace_pipe > $file >>>>> >>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable >>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable >>>>> >>>>> might tell us something as well but it might turn out that it just still >>>>> doesn't give us the full picture and we might need >>>>> echo stacktrace > /debug/tracing/trace_options >>>>> >>>>> It will generate much more output though. >>>>> >>>>>> Just now or when PSI raises? >>>>> >>>>> When the excessive reclaim is happening ideally. >>>> >>>> This one is from a server with 28G memfree but memory pressure is still >>>> jumping between 0 and 10%. 
>>>> >>>> I did: >>>> echo "order > 0" > >>>> /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter >>>> >>>> echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable >>>> >>>> echo 1 > >>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable >>>> >>>> echo 1 > >>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable >>>> >>>> timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace >>>> >>>> File attached. >>> >>> There is no reclaim captured in this trace dump. >>> $ zcat trace1.gz | sed 's@.*\(order=[0-9]\).*\(gfp_flags=.*\)@\1 \2@' | sort | uniq -c >>> 777 order=1 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>> 663 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>> 153 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>> 911 order=1 gfp_flags=GFP_KERNEL_ACCOUNT|__GFP_ZERO >>> 4872 order=1 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT >>> 62 order=1 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>> 14 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP >>> 11 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_RECLAIMABLE >>> 1263 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>> 45 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE >>> 1 order=2 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO >>> 7853 order=2 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT >>> 73 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>> 729 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE >>> 528 order=3 
gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>> 1203 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT >>> 5295 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP >>> 1 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>> 132 order=3 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>> 13 order=5 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO >>> 1 order=6 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO >>> 1232 order=9 gfp_flags=GFP_TRANSHUGE >>> 108 order=9 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE >>> 362 order=9 gfp_flags=GFP_TRANSHUGE_LIGHT|__GFP_THISNODE >>> >>> Nothing really stands out because except for the THP ones none of others >>> are going to even be using movable zone. >> It might be that this is not an ideal example is was just the fastest i >> could find. May be we really need one with much higher pressure. > > here another trace log where a system has 30GB free memory but is under > constant pressure and does not build up any file cache caused by memory > pressure. > > > Greets, > Stefan > ^ permalink raw reply [flat|nested] 61+ messages in thread
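As an aside, the per-order free-block counts in the /proc/pagetypeinfo dump quoted above can be turned into free-page totals per migrate type: a free block of order o holds 2**o pages. A minimal Python sketch, with the row layout assumed from the dump shown (not a general pagetypeinfo parser):

```python
def parse_pagetypeinfo_row(line):
    """Split one 'Node N, zone Z, type T c0 ... c10' row of
    /proc/pagetypeinfo into (zone, migratetype, per-order free counts)."""
    head, _, tail = line.partition(", type ")
    zone = head.split("zone")[1].strip()
    fields = tail.split()
    return zone, fields[0], [int(c) for c in fields[1:]]

def free_pages(counts):
    """Free pages implied by per-order free-block counts:
    a free block of order o contributes 2**o pages."""
    return sum(c << order for order, c in enumerate(counts))
```

Applied to the Normal/Movable row above, this kind of summation is what shows most of the free memory sitting in the Movable migrate type.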
* Re: lot of MemAvailable but falling cache and raising PSI [not found] ` <52235eda-ffe2-721c-7ad7-575048e2d29d@profihost.ag> 2019-09-10 5:58 ` Stefan Priebe - Profihost AG @ 2019-09-10 8:29 ` Michal Hocko 2019-09-10 8:38 ` Stefan Priebe - Profihost AG 1 sibling, 1 reply; 61+ messages in thread From: Michal Hocko @ 2019-09-10 8:29 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka On Tue 10-09-19 07:56:36, Stefan Priebe - Profihost AG wrote: > > Am 09.09.19 um 14:56 schrieb Stefan Priebe - Profihost AG: > > Am 09.09.19 um 14:49 schrieb Michal Hocko: > >> On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote: > >>> > >>> Am 09.09.19 um 14:28 schrieb Michal Hocko: > >>>> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote: > >>>>> > >>>>> Am 09.09.19 um 14:08 schrieb Michal Hocko: > >>>>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote: > >>>>>>> and that matches moments when we reclaimed memory. There seems to be a > >>>>>>> steady THP allocations flow so maybe this is a source of the direct > >>>>>>> reclaim? > >>>>>> > >>>>>> I was thinking about this some more and THP being a source of reclaim > >>>>>> sounds quite unlikely. At least in a default configuration because we > >>>>>> shouldn't do anything expensinve in the #PF path. But there might be a > >>>>>> difference source of high order (!costly) allocations. Could you check > >>>>>> how many allocation requests like that you have on your system? 
> >>>>>> > >>>>>> mount -t debugfs none /debug > >>>>>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter > >>>>>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable > >>>>>> cat /debug/tracing/trace_pipe > $file > >>>> > >>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable > >>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable > >>>> > >>>> might tell us something as well but it might turn out that it just still > >>>> doesn't give us the full picture and we might need > >>>> echo stacktrace > /debug/tracing/trace_options > >>>> > >>>> It will generate much more output though. > >>>> > >>>>> Just now or when PSI raises? > >>>> > >>>> When the excessive reclaim is happening ideally. > >>> > >>> This one is from a server with 28G memfree but memory pressure is still > >>> jumping between 0 and 10%. > >>> > >>> I did: > >>> echo "order > 0" > > >>> /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter > >>> > >>> echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable > >>> > >>> echo 1 > > >>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable > >>> > >>> echo 1 > > >>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable > >>> > >>> timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace > >>> > >>> File attached. > >> > >> There is no reclaim captured in this trace dump. 
> >> $ zcat trace1.gz | sed 's@.*\(order=[0-9]\).*\(gfp_flags=.*\)@\1 \2@' | sort | uniq -c > >> 777 order=1 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >> 663 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >> 153 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >> 911 order=1 gfp_flags=GFP_KERNEL_ACCOUNT|__GFP_ZERO > >> 4872 order=1 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT > >> 62 order=1 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >> 14 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP > >> 11 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_RECLAIMABLE > >> 1263 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >> 45 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE > >> 1 order=2 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO > >> 7853 order=2 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT > >> 73 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >> 729 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE > >> 528 order=3 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >> 1203 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT > >> 5295 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP > >> 1 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >> 132 order=3 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >> 13 order=5 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO > >> 1 order=6 
gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO > >> 1232 order=9 gfp_flags=GFP_TRANSHUGE > >> 108 order=9 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE > >> 362 order=9 gfp_flags=GFP_TRANSHUGE_LIGHT|__GFP_THISNODE > >> > >> Nothing really stands out because, except for the THP ones, none of the others > >> are even going to be using the movable zone. > > It might be that this is not an ideal example, it was just the fastest i > > could find. Maybe we really need one with much higher pressure. > > here another trace log where a system has 30GB free memory but is under > constant pressure and does not build up any file cache caused by memory > pressure. So the reclaim is clearly induced by THP allocations: $ zgrep vmscan trace2.gz | grep gfp_flags | sed 's@.*\(gfp_flags=.*\) .*@\1@' | sort | uniq -c 1580 gfp_flags=GFP_TRANSHUGE 15 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@nr_reclaimed=@@' | awk '{nr+=$6+0}END{print nr}' 1541726 6GB of memory reclaimed in 1776s. That is a lot! But the THP allocation rate is really high as well: $ zgrep "page_alloc.*GFP_TRANSHUGE" trace2.gz | wc -l 15340 this is 30GB worth of THPs (some of them might get released of course). Also only 10% of requests end up reclaiming. One additional interesting point: $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@.*nr_reclaimed=\([0-9]*\)@\1@' | calc_min_max.awk min: 1.00 max: 2792.00 avg: 965.99 std: 331.12 nr: 1596 Even though the std is high there are quite some outliers when a lot of memory is reclaimed. Which kernel version is this? And again, what is the THP configuration? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 61+ messages in thread
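The `calc_min_max.awk` helper used in the analysis above is Michal's private script and is not posted in the thread; a minimal stand-in, assuming it simply reads one number per line and prints min/max/avg/std/count in the format shown, might look like this:

```shell
# Hypothetical equivalent of calc_min_max.awk (the real script is not in
# the thread): reads one number per line, prints min, max, mean,
# population standard deviation and count.
calc_min_max() {
  awk '
    NR == 1 { min = $1; max = $1 }
    {
      if ($1 < min) min = $1
      if ($1 > max) max = $1
      sum += $1; sumsq += $1 * $1; n++
    }
    END {
      if (n > 0) {
        avg = sum / n
        std = sqrt(sumsq / n - avg * avg)
        printf "min: %.2f max: %.2f avg: %.2f std: %.2f nr: %d\n",
               min, max, avg, std, n
      }
    }'
}

# Example: printf '1\n2\n3\n4\n' | calc_min_max
```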
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-10 8:29 ` Michal Hocko @ 2019-09-10 8:38 ` Stefan Priebe - Profihost AG 2019-09-10 9:02 ` Michal Hocko 0 siblings, 1 reply; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-10 8:38 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka Am 10.09.19 um 10:29 schrieb Michal Hocko: > On Tue 10-09-19 07:56:36, Stefan Priebe - Profihost AG wrote: >> >> Am 09.09.19 um 14:56 schrieb Stefan Priebe - Profihost AG: >>> Am 09.09.19 um 14:49 schrieb Michal Hocko: >>>> On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote: >>>>> >>>>> Am 09.09.19 um 14:28 schrieb Michal Hocko: >>>>>> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote: >>>>>>> >>>>>>> Am 09.09.19 um 14:08 schrieb Michal Hocko: >>>>>>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote: >>>>>>>>> and that matches moments when we reclaimed memory. There seems to be a >>>>>>>>> steady THP allocations flow so maybe this is a source of the direct >>>>>>>>> reclaim? >>>>>>>> >>>>>>>> I was thinking about this some more and THP being a source of reclaim >>>>>>>> sounds quite unlikely. At least in a default configuration because we >>>>>>>> shouldn't do anything expensinve in the #PF path. But there might be a >>>>>>>> difference source of high order (!costly) allocations. Could you check >>>>>>>> how many allocation requests like that you have on your system? 
>>>>>>>> >>>>>>>> mount -t debugfs none /debug >>>>>>>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter >>>>>>>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable >>>>>>>> cat /debug/tracing/trace_pipe > $file >>>>>> >>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable >>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable >>>>>> >>>>>> might tell us something as well but it might turn out that it just still >>>>>> doesn't give us the full picture and we might need >>>>>> echo stacktrace > /debug/tracing/trace_options >>>>>> >>>>>> It will generate much more output though. >>>>>> >>>>>>> Just now or when PSI raises? >>>>>> >>>>>> When the excessive reclaim is happening ideally. >>>>> >>>>> This one is from a server with 28G memfree but memory pressure is still >>>>> jumping between 0 and 10%. >>>>> >>>>> I did: >>>>> echo "order > 0" > >>>>> /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter >>>>> >>>>> echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable >>>>> >>>>> echo 1 > >>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable >>>>> >>>>> echo 1 > >>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable >>>>> >>>>> timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace >>>>> >>>>> File attached. >>>> >>>> There is no reclaim captured in this trace dump. 
>>>> $ zcat trace1.gz | sed 's@.*\(order=[0-9]\).*\(gfp_flags=.*\)@\1 \2@' | sort | uniq -c >>>> 777 order=1 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>> 663 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>> 153 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>> 911 order=1 gfp_flags=GFP_KERNEL_ACCOUNT|__GFP_ZERO >>>> 4872 order=1 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT >>>> 62 order=1 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>> 14 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP >>>> 11 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_RECLAIMABLE >>>> 1263 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>> 45 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE >>>> 1 order=2 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO >>>> 7853 order=2 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT >>>> 73 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>> 729 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE >>>> 528 order=3 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>> 1203 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT >>>> 5295 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP >>>> 1 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>> 132 order=3 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>> 13 order=5 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO >>>> 1 order=6 
gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO >>>> 1232 order=9 gfp_flags=GFP_TRANSHUGE >>>> 108 order=9 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE >>>> 362 order=9 gfp_flags=GFP_TRANSHUGE_LIGHT|__GFP_THISNODE >>>> >>>> Nothing really stands out because except for the THP ones none of others >>>> are going to even be using movable zone. >>> It might be that this is not an ideal example is was just the fastest i >>> could find. May be we really need one with much higher pressure. >> >> here another trace log where a system has 30GB free memory but is under >> constant pressure and does not build up any file cache caused by memory >> pressure. > > So the reclaim is clearly induced by THP allocations > $ zgrep vmscan trace2.gz | grep gfp_flags | sed 's@.*\(gfp_flags=.*\) .*@\1@' | sort | uniq -c > 1580 gfp_flags=GFP_TRANSHUGE > 15 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE > > $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@nr_reclaimed=@@' | awk '{nr+=$6+0}END{print nr}' > 1541726 > > 6GB of memory reclaimed in 1776s. That is a lot! But the THP allocation > rate is really high as well > $ zgrep "page_alloc.*GFP_TRANSHUGE" trace2.gz | wc -l > 15340 > > this is 30GB worth of THPs (some of them might get released of course). > Also only 10% of requests ends up reclaiming. > > One additional interesting point > $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@.*nr_reclaimed=\([[0-9]*\)@\1@' | calc_min_max.awk > min: 1.00 max: 2792.00 avg: 965.99 std: 331.12 nr: 1596 > > Even though the std is high there are quite some outliers when a lot of > memory is reclaimed. > > Which kernel version is this. And again, what is the THP configuration. 
This is 4.19.66. Regarding THP you mean this: /sys/kernel/mm/transparent_hugepage/defrag:always defer [defer+madvise] madvise never /sys/kernel/mm/transparent_hugepage/enabled:[always] madvise never /sys/kernel/mm/transparent_hugepage/hpage_pmd_size:2097152 /sys/kernel/mm/transparent_hugepage/shmem_enabled:always within_size advise [never] deny force /sys/kernel/mm/transparent_hugepage/use_zero_page:1 /sys/kernel/mm/transparent_hugepage/enabled was madvise until yesterday, when i tried to switch to defer+madvise - which didn't help. Greets, Stefan ^ permalink raw reply [flat|nested] 61+ messages in thread
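A `file:value` listing like the one above can be produced with a recursive grep over the sysfs directory; here it is written as a function over an arbitrary directory (only an illustration - the real path, /sys/kernel/mm/transparent_hugepage, is the default):

```shell
# Dump every tunable under a directory as "path:content", matching the
# style of the THP settings listing above. Works on any directory of
# one-line files, so it can be exercised without the real sysfs.
thp_dump() {
  dir=${1:-/sys/kernel/mm/transparent_hugepage}
  grep -R . "$dir" 2>/dev/null | sort
}
```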
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-10 8:38 ` Stefan Priebe - Profihost AG @ 2019-09-10 9:02 ` Michal Hocko 2019-09-10 9:37 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 61+ messages in thread From: Michal Hocko @ 2019-09-10 9:02 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka On Tue 10-09-19 10:38:25, Stefan Priebe - Profihost AG wrote: > Am 10.09.19 um 10:29 schrieb Michal Hocko: > > On Tue 10-09-19 07:56:36, Stefan Priebe - Profihost AG wrote: > >> > >> Am 09.09.19 um 14:56 schrieb Stefan Priebe - Profihost AG: > >>> Am 09.09.19 um 14:49 schrieb Michal Hocko: > >>>> On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote: > >>>>> > >>>>> Am 09.09.19 um 14:28 schrieb Michal Hocko: > >>>>>> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote: > >>>>>>> > >>>>>>> Am 09.09.19 um 14:08 schrieb Michal Hocko: > >>>>>>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote: > >>>>>>>>> and that matches moments when we reclaimed memory. There seems to be a > >>>>>>>>> steady THP allocations flow so maybe this is a source of the direct > >>>>>>>>> reclaim? > >>>>>>>> > >>>>>>>> I was thinking about this some more and THP being a source of reclaim > >>>>>>>> sounds quite unlikely. At least in a default configuration because we > >>>>>>>> shouldn't do anything expensinve in the #PF path. But there might be a > >>>>>>>> difference source of high order (!costly) allocations. Could you check > >>>>>>>> how many allocation requests like that you have on your system? 
> >>>>>>>> > >>>>>>>> mount -t debugfs none /debug > >>>>>>>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter > >>>>>>>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable > >>>>>>>> cat /debug/tracing/trace_pipe > $file > >>>>>> > >>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable > >>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable > >>>>>> > >>>>>> might tell us something as well but it might turn out that it just still > >>>>>> doesn't give us the full picture and we might need > >>>>>> echo stacktrace > /debug/tracing/trace_options > >>>>>> > >>>>>> It will generate much more output though. > >>>>>> > >>>>>>> Just now or when PSI raises? > >>>>>> > >>>>>> When the excessive reclaim is happening ideally. > >>>>> > >>>>> This one is from a server with 28G memfree but memory pressure is still > >>>>> jumping between 0 and 10%. > >>>>> > >>>>> I did: > >>>>> echo "order > 0" > > >>>>> /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter > >>>>> > >>>>> echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable > >>>>> > >>>>> echo 1 > > >>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable > >>>>> > >>>>> echo 1 > > >>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable > >>>>> > >>>>> timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace > >>>>> > >>>>> File attached. > >>>> > >>>> There is no reclaim captured in this trace dump. 
> >>>> $ zcat trace1.gz | sed 's@.*\(order=[0-9]\).*\(gfp_flags=.*\)@\1 \2@' | sort | uniq -c > >>>> 777 order=1 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >>>> 663 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >>>> 153 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >>>> 911 order=1 gfp_flags=GFP_KERNEL_ACCOUNT|__GFP_ZERO > >>>> 4872 order=1 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT > >>>> 62 order=1 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >>>> 14 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP > >>>> 11 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_RECLAIMABLE > >>>> 1263 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >>>> 45 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE > >>>> 1 order=2 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO > >>>> 7853 order=2 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT > >>>> 73 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >>>> 729 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE > >>>> 528 order=3 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >>>> 1203 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT > >>>> 5295 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP > >>>> 1 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >>>> 132 order=3 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC > >>>> 13 order=5 
gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO > >>>> 1 order=6 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO > >>>> 1232 order=9 gfp_flags=GFP_TRANSHUGE > >>>> 108 order=9 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE > >>>> 362 order=9 gfp_flags=GFP_TRANSHUGE_LIGHT|__GFP_THISNODE > >>>> > >>>> Nothing really stands out because except for the THP ones none of others > >>>> are going to even be using movable zone. > >>> It might be that this is not an ideal example is was just the fastest i > >>> could find. May be we really need one with much higher pressure. > >> > >> here another trace log where a system has 30GB free memory but is under > >> constant pressure and does not build up any file cache caused by memory > >> pressure. > > > > So the reclaim is clearly induced by THP allocations > > $ zgrep vmscan trace2.gz | grep gfp_flags | sed 's@.*\(gfp_flags=.*\) .*@\1@' | sort | uniq -c > > 1580 gfp_flags=GFP_TRANSHUGE > > 15 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE > > > > $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@nr_reclaimed=@@' | awk '{nr+=$6+0}END{print nr}' > > 1541726 > > > > 6GB of memory reclaimed in 1776s. That is a lot! But the THP allocation > > rate is really high as well > > $ zgrep "page_alloc.*GFP_TRANSHUGE" trace2.gz | wc -l > > 15340 > > > > this is 30GB worth of THPs (some of them might get released of course). > > Also only 10% of requests ends up reclaiming. > > > > One additional interesting point > > $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@.*nr_reclaimed=\([[0-9]*\)@\1@' | calc_min_max.awk > > min: 1.00 max: 2792.00 avg: 965.99 std: 331.12 nr: 1596 > > > > Even though the std is high there are quite some outliers when a lot of > > memory is reclaimed. > > > > Which kernel version is this. And again, what is the THP configuration. > > This is 4.19.66 regarding THP you mean this: Do you see the same behavior with 5.3? 
> /sys/kernel/mm/transparent_hugepage/defrag:always defer [defer+madvise] > madvise never > > /sys/kernel/mm/transparent_hugepage/enabled:[always] madvise never > > /sys/kernel/mm/transparent_hugepage/hpage_pmd_size:2097152 > > /sys/kernel/mm/transparent_hugepage/shmem_enabled:always within_size > advise [never] deny force > > /sys/kernel/mm/transparent_hugepage/use_zero_page:1 > > /sys/kernel/mm/transparent_hugepage/enabled was madvise until yesterday > where i tried to switch to defer+madvise - which didn't help. Many processes hitting the reclaim are php5; for the others I cannot say because their cmd is not reflected in the trace. I suspect those are using madvise. I haven't really seen kcompactd interfering much. That would suggest using defer. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-10 9:02 ` Michal Hocko @ 2019-09-10 9:37 ` Stefan Priebe - Profihost AG 2019-09-10 11:07 ` Michal Hocko 0 siblings, 1 reply; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-10 9:37 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka Am 10.09.19 um 11:02 schrieb Michal Hocko: > On Tue 10-09-19 10:38:25, Stefan Priebe - Profihost AG wrote: >> Am 10.09.19 um 10:29 schrieb Michal Hocko: >>> On Tue 10-09-19 07:56:36, Stefan Priebe - Profihost AG wrote: >>>> >>>> Am 09.09.19 um 14:56 schrieb Stefan Priebe - Profihost AG: >>>>> Am 09.09.19 um 14:49 schrieb Michal Hocko: >>>>>> On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote: >>>>>>> >>>>>>> Am 09.09.19 um 14:28 schrieb Michal Hocko: >>>>>>>> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote: >>>>>>>>> >>>>>>>>> Am 09.09.19 um 14:08 schrieb Michal Hocko: >>>>>>>>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote: >>>>>>>>>>> and that matches moments when we reclaimed memory. There seems to be a >>>>>>>>>>> steady THP allocations flow so maybe this is a source of the direct >>>>>>>>>>> reclaim? >>>>>>>>>> >>>>>>>>>> I was thinking about this some more and THP being a source of reclaim >>>>>>>>>> sounds quite unlikely. At least in a default configuration because we >>>>>>>>>> shouldn't do anything expensinve in the #PF path. But there might be a >>>>>>>>>> difference source of high order (!costly) allocations. Could you check >>>>>>>>>> how many allocation requests like that you have on your system? 
>>>>>>>>>> >>>>>>>>>> mount -t debugfs none /debug >>>>>>>>>> echo "order > 0" > /debug/tracing/events/kmem/mm_page_alloc/filter >>>>>>>>>> echo 1 > /debug/tracing/events/kmem/mm_page_alloc/enable >>>>>>>>>> cat /debug/tracing/trace_pipe > $file >>>>>>>> >>>>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable >>>>>>>> echo 1 > /debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable >>>>>>>> >>>>>>>> might tell us something as well but it might turn out that it just still >>>>>>>> doesn't give us the full picture and we might need >>>>>>>> echo stacktrace > /debug/tracing/trace_options >>>>>>>> >>>>>>>> It will generate much more output though. >>>>>>>> >>>>>>>>> Just now or when PSI raises? >>>>>>>> >>>>>>>> When the excessive reclaim is happening ideally. >>>>>>> >>>>>>> This one is from a server with 28G memfree but memory pressure is still >>>>>>> jumping between 0 and 10%. >>>>>>> >>>>>>> I did: >>>>>>> echo "order > 0" > >>>>>>> /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/filter >>>>>>> >>>>>>> echo 1 > /sys/kernel/debug/tracing/events/kmem/mm_page_alloc/enable >>>>>>> >>>>>>> echo 1 > >>>>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_begin/enable >>>>>>> >>>>>>> echo 1 > >>>>>>> /sys/kernel/debug/tracing/events/vmscan/mm_vmscan_direct_reclaim_end/enable >>>>>>> >>>>>>> timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace >>>>>>> >>>>>>> File attached. >>>>>> >>>>>> There is no reclaim captured in this trace dump. 
>>>>>> $ zcat trace1.gz | sed 's@.*\(order=[0-9]\).*\(gfp_flags=.*\)@\1 \2@' | sort | uniq -c >>>>>> 777 order=1 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>>>> 663 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>>>> 153 order=1 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>>>> 911 order=1 gfp_flags=GFP_KERNEL_ACCOUNT|__GFP_ZERO >>>>>> 4872 order=1 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT >>>>>> 62 order=1 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>>>> 14 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP >>>>>> 11 order=2 gfp_flags=GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_RECLAIMABLE >>>>>> 1263 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>>>> 45 order=2 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE >>>>>> 1 order=2 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO >>>>>> 7853 order=2 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT >>>>>> 73 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>>>> 729 order=3 gfp_flags=__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE >>>>>> 528 order=3 gfp_flags=__GFP_IO|__GFP_NOWARN|__GFP_RETRY_MAYFAIL|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>>>> 1203 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_COMP|__GFP_ACCOUNT >>>>>> 5295 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP >>>>>> 1 order=3 gfp_flags=GFP_NOWAIT|__GFP_IO|__GFP_FS|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>>>> 132 order=3 gfp_flags=GFP_NOWAIT|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC >>>>>> 13 order=5 
gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO >>>>>> 1 order=6 gfp_flags=GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_ZERO >>>>>> 1232 order=9 gfp_flags=GFP_TRANSHUGE >>>>>> 108 order=9 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE >>>>>> 362 order=9 gfp_flags=GFP_TRANSHUGE_LIGHT|__GFP_THISNODE >>>>>> >>>>>> Nothing really stands out because except for the THP ones none of others >>>>>> are going to even be using movable zone. >>>>> It might be that this is not an ideal example is was just the fastest i >>>>> could find. May be we really need one with much higher pressure. >>>> >>>> here another trace log where a system has 30GB free memory but is under >>>> constant pressure and does not build up any file cache caused by memory >>>> pressure. >>> >>> So the reclaim is clearly induced by THP allocations >>> $ zgrep vmscan trace2.gz | grep gfp_flags | sed 's@.*\(gfp_flags=.*\) .*@\1@' | sort | uniq -c >>> 1580 gfp_flags=GFP_TRANSHUGE >>> 15 gfp_flags=GFP_TRANSHUGE|__GFP_THISNODE >>> >>> $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@nr_reclaimed=@@' | awk '{nr+=$6+0}END{print nr}' >>> 1541726 >>> >>> 6GB of memory reclaimed in 1776s. That is a lot! But the THP allocation >>> rate is really high as well >>> $ zgrep "page_alloc.*GFP_TRANSHUGE" trace2.gz | wc -l >>> 15340 >>> >>> this is 30GB worth of THPs (some of them might get released of course). >>> Also only 10% of requests ends up reclaiming. >>> >>> One additional interesting point >>> $ zgrep vmscan trace2.gz | grep nr_reclaimed | sed 's@.*nr_reclaimed=\([[0-9]*\)@\1@' | calc_min_max.awk >>> min: 1.00 max: 2792.00 avg: 965.99 std: 331.12 nr: 1596 >>> >>> Even though the std is high there are quite some outliers when a lot of >>> memory is reclaimed. >>> >>> Which kernel version is this. And again, what is the THP configuration. >> >> This is 4.19.66 regarding THP you mean this: > > Do you see the same behavior with 5.3? 
I rebooted with 5.3.0-rc8 - let's see what happens; it might take some hours or even days. >> /sys/kernel/mm/transparent_hugepage/defrag:always defer [defer+madvise] >> madvise never >> >> /sys/kernel/mm/transparent_hugepage/enabled:[always] madvise never >> >> /sys/kernel/mm/transparent_hugepage/hpage_pmd_size:2097152 >> >> /sys/kernel/mm/transparent_hugepage/shmem_enabled:always within_size >> advise [never] deny force >> >> /sys/kernel/mm/transparent_hugepage/use_zero_page:1 >> >> /sys/kernel/mm/transparent_hugepage/enabled was madvise until yesterday >> where i tried to switch to defer+madvise - which didn't help. > > Many processes hitting the reclaim are php5 others I cannot say because > their cmd is not reflected in the trace. I suspect those are using > madvise. I haven't really seen kcompactd interfering much. That would > suggest using defer. You mean i should set transparent_hugepage to defer? Stefan ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-10 9:37 ` Stefan Priebe - Profihost AG @ 2019-09-10 11:07 ` Michal Hocko 2019-09-10 12:45 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 61+ messages in thread From: Michal Hocko @ 2019-09-10 11:07 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka On Tue 10-09-19 11:37:19, Stefan Priebe - Profihost AG wrote: > > Am 10.09.19 um 11:02 schrieb Michal Hocko: > > On Tue 10-09-19 10:38:25, Stefan Priebe - Profihost AG wrote: [...] > >> /sys/kernel/mm/transparent_hugepage/defrag:always defer [defer+madvise] > >> madvise never [...] > > Many processes hitting the reclaim are php5 others I cannot say because > > their cmd is not reflected in the trace. I suspect those are using > > madvise. I haven't really seen kcompactd interfering much. That would > > suggest using defer. > > You mean i should set transparent_hugepage to defer? Let's try with 5.3 without any changes first and then if the problem is still reproducible then limit the THP load by setting transparent_hugepage to defer. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-10 11:07 ` Michal Hocko @ 2019-09-10 12:45 ` Stefan Priebe - Profihost AG 2019-09-10 12:57 ` Michal Hocko 0 siblings, 1 reply; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-10 12:45 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka [-- Attachment #1: Type: text/plain, Size: 1118 bytes --] Hello Michal, ok this might take a long time. Attached you'll find a graph from a fresh boot showing what happens over time (here 17 August to 30 August). Memory usage decreases, as does the cache, but only slowly, over days. So it might take 2-3 weeks running Kernel 5.3 to see what happens. Greets, Stefan Am 10.09.19 um 13:07 schrieb Michal Hocko: > On Tue 10-09-19 11:37:19, Stefan Priebe - Profihost AG wrote: >> >> Am 10.09.19 um 11:02 schrieb Michal Hocko: >>> On Tue 10-09-19 10:38:25, Stefan Priebe - Profihost AG wrote: > [...] >>>> /sys/kernel/mm/transparent_hugepage/defrag:always defer [defer+madvise] >>>> madvise never > [...] >>> Many processes hitting the reclaim are php5 others I cannot say because >>> their cmd is not reflected in the trace. I suspect those are using >>> madvise. I haven't really seen kcompactd interfering much. That would >>> suggest using defer. >> >> You mean i should set transparent_hugepage to defer? > > Let's try with 5.3 without any changes first and then if the problem is > still reproducible then limit the THP load by setting > transparent_hugepage to defer. [-- Attachment #2: psi-overview.png --] [-- Type: image/png, Size: 111334 bytes --] ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-10 12:45 ` Stefan Priebe - Profihost AG @ 2019-09-10 12:57 ` Michal Hocko 2019-09-10 13:05 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 61+ messages in thread From: Michal Hocko @ 2019-09-10 12:57 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote: > Hello Michal, > > ok this might take a long time. Attached you'll find a graph from a > fresh boot what happens over time (here 17 August to 30 August). Memory > Usage decreases as well as cache but slowly and only over time and days. > > So it might take 2-3 weeks running Kernel 5.3 to see what happens. No problem. Just make sure to collect the requested data from the time you see the actual problem. Btw. you can try my very dumb scriptlets to get an idea of how much memory gets reclaimed due to THP. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 61+ messages in thread
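The tracing setup quoted piecemeal throughout the thread can be collected into one helper. This is only a sketch of the exact commands already shown above; the tracefs root is made a parameter (on a real system it is /sys/kernel/debug/tracing and writing to it requires root):

```shell
# Enable the events discussed in the thread: order > 0 page allocations
# plus direct-reclaim begin/end. Capture of trace_pipe is left to the
# caller, as in the thread.
trace_reclaim_setup() {
  t=${1:-/sys/kernel/debug/tracing}
  echo "order > 0" > "$t/events/kmem/mm_page_alloc/filter"
  echo 1 > "$t/events/kmem/mm_page_alloc/enable"
  echo 1 > "$t/events/vmscan/mm_vmscan_direct_reclaim_begin/enable"
  echo 1 > "$t/events/vmscan/mm_vmscan_direct_reclaim_end/enable"
}

# Usage on a real system (as root):
#   trace_reclaim_setup
#   timeout 120 cat /sys/kernel/debug/tracing/trace_pipe > /trace
```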
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-10 12:57 ` Michal Hocko @ 2019-09-10 13:05 ` Stefan Priebe - Profihost AG 2019-09-10 13:14 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-10 13:05 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka Am 10.09.19 um 14:57 schrieb Michal Hocko: > On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote: >> Hello Michal, >> >> ok this might take a long time. Attached you'll find a graph from a >> fresh boot what happens over time (here 17 August to 30 August). Memory >> Usage decreases as well as cache but slowly and only over time and days. >> >> So it might take 2-3 weeks running Kernel 5.3 to see what happens. > > No problem. Just make sure to collect the requested data from the time > you see the actual problem. Btw. you try my very dumb scriplets to get > an idea of how much memory gets reclaimed due to THP. You mean your sed and sort on top of the trace file? No, i did not with the current 5.3 kernel - do you think it will show anything interesting? Which line shows me how much memory gets reclaimed due to THP? Stefan ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-10 13:05 ` Stefan Priebe - Profihost AG @ 2019-09-10 13:14 ` Stefan Priebe - Profihost AG 2019-09-10 13:24 ` Michal Hocko 0 siblings, 1 reply; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-10 13:14 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka Am 10.09.19 um 15:05 schrieb Stefan Priebe - Profihost AG: > > Am 10.09.19 um 14:57 schrieb Michal Hocko: >> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote: >>> Hello Michal, >>> >>> ok this might take a long time. Attached you'll find a graph from a >>> fresh boot what happens over time (here 17 August to 30 August). Memory >>> Usage decreases as well as cache but slowly and only over time and days. >>> >>> So it might take 2-3 weeks running Kernel 5.3 to see what happens. >> >> No problem. Just make sure to collect the requested data from the time >> you see the actual problem. Btw. you try my very dumb scriplets to get >> an idea of how much memory gets reclaimed due to THP. > > You mean your sed and sort on top of the trace file? No i did not with > the current 5.3 kernel do you think it will show anything interesting? > Which line shows me how much memory gets reclaimed due to THP? Is something like a kernel memory leak possible? Or wouldn't this end up with a lot of free memory which doesn't seem usable? I also wonder why a reclaim takes place when there is enough memory. Greets, Stefan ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-10 13:14 ` Stefan Priebe - Profihost AG @ 2019-09-10 13:24 ` Michal Hocko 2019-09-11 6:12 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 61+ messages in thread From: Michal Hocko @ 2019-09-10 13:24 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka On Tue 10-09-19 15:14:45, Stefan Priebe - Profihost AG wrote: > Am 10.09.19 um 15:05 schrieb Stefan Priebe - Profihost AG: > > > > Am 10.09.19 um 14:57 schrieb Michal Hocko: > >> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote: > >>> Hello Michal, > >>> > >>> ok this might take a long time. Attached you'll find a graph from a > >>> fresh boot what happens over time (here 17 August to 30 August). Memory > >>> Usage decreases as well as cache but slowly and only over time and days. > >>> > >>> So it might take 2-3 weeks running Kernel 5.3 to see what happens. > >> > >> No problem. Just make sure to collect the requested data from the time > >> you see the actual problem. Btw. you try my very dumb scriplets to get > >> an idea of how much memory gets reclaimed due to THP. > > > > You mean your sed and sort on top of the trace file? No i did not with > > the current 5.3 kernel do you think it will show anything interesting? > > Which line shows me how much memory gets reclaimed due to THP? Please re-read http://lkml.kernel.org/r/20190910082919.GL2063@dhcp22.suse.cz Each command has a commented output. If you see a large number of reclaimed pages for GFP_TRANSHUGE then you are seeing a similar problem. > Is something like a kernel memory leak possible? Or wouldn't this end up > in having a lot of free memory which doesn't seem usable. I would be really surprised if this was the case. > I also wonder why a reclaim takes place when there is enough memory. This is not clear yet and it might be a bug that has been fixed since 4.18. 
That's why we need to see whether the same is pattern is happening with 5.3 as well. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 61+ messages in thread
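[Editor's note: Michal's actual scriptlets live in the linked message and are not quoted here. As a rough illustration of the idea only - pair each task's mm_vmscan_direct_reclaim_begin/end events from the ftrace output and sum nr_reclaimed per gfp mask - here is a hypothetical awk sketch over synthetic trace lines; the exact event format varies slightly between kernel versions.]

```shell
# Sum nr_reclaimed per gfp mask by pairing each task's begin/end events.
# $1 in the trace output is the "task-pid" column, used as the pairing key.
reclaim_by_gfp() {
    awk '
        /mm_vmscan_direct_reclaim_begin/ {
            match($0, /gfp_flags=[^ ]+/)
            pending[$1] = substr($0, RSTART + 10, RLENGTH - 10)
        }
        /mm_vmscan_direct_reclaim_end/ {
            match($0, /nr_reclaimed=[0-9]+/)
            total[pending[$1]] += substr($0, RSTART + 13, RLENGTH - 13)
        }
        END { for (g in total) print g, total[g] }'
}

# Synthetic excerpt roughly in the ftrace event format:
reclaim_by_gfp <<'EOF'
php-1234  [000] ....  100.000001: mm_vmscan_direct_reclaim_begin: order=9 gfp_flags=GFP_TRANSHUGE
php-1234  [000] ....  100.000002: mm_vmscan_direct_reclaim_end: nr_reclaimed=612
rsync-99  [001] ....  100.000003: mm_vmscan_direct_reclaim_begin: order=0 gfp_flags=GFP_KERNEL
rsync-99  [001] ....  100.000004: mm_vmscan_direct_reclaim_end: nr_reclaimed=32
EOF
```

A disproportionately large total for GFP_TRANSHUGE would point at THP allocations driving the reclaim, which is the pattern Michal describes.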
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-10 13:24 ` Michal Hocko
@ 2019-09-11  6:12   ` Stefan Priebe - Profihost AG
  2019-09-11  6:24     ` Stefan Priebe - Profihost AG
                       ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-11 6:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

Hi Michal,

Am 10.09.19 um 15:24 schrieb Michal Hocko:
> On Tue 10-09-19 15:14:45, Stefan Priebe - Profihost AG wrote:
>> Am 10.09.19 um 15:05 schrieb Stefan Priebe - Profihost AG:
>>>
>>> Am 10.09.19 um 14:57 schrieb Michal Hocko:
>>>> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote:
>>>>> Hello Michal,
>>>>>
>>>>> ok this might take a long time. Attached you'll find a graph from a
>>>>> fresh boot of what happens over time (here 17 August to 30 August). Memory
>>>>> usage decreases, as does cache, but slowly and only over time and days.
>>>>>
>>>>> So it might take 2-3 weeks running kernel 5.3 to see what happens.
>>>>
>>>> No problem. Just make sure to collect the requested data from the time
>>>> you see the actual problem. Btw. did you try my very dumb scriptlets to get
>>>> an idea of how much memory gets reclaimed due to THP?
>>>
>>> You mean your sed and sort on top of the trace file? No, I did not with
>>> the current 5.3 kernel. Do you think it will show anything interesting?
>>> Which line shows me how much memory gets reclaimed due to THP?
>
> Please re-read http://lkml.kernel.org/r/20190910082919.GL2063@dhcp22.suse.cz
> Each command has a commented output. If you see a large number of
> reclaimed pages for GFP_TRANSHUGE then you are seeing a similar
> problem.
>
>> Is something like a kernel memory leak possible? Or wouldn't this end up
>> in having a lot of free memory which doesn't seem usable.
>
> I would be really surprised if this was the case.
>
>> I also wonder why a reclaim takes place when there is enough memory.
>
> This is not clear yet and it might be a bug that has been fixed since
> 4.18. That's why we need to see whether the same pattern is happening
> with 5.3 as well.

Sadly I'm running into issues with btrfs on 5.3-rc8 - the rsync process
on the backup disk completely hangs / is blocked at 100% i/o:

[54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds.
[54739.066973]       Not tainted 5.3.0-rc8 #1
[54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[54739.069065] rsync           D    0  9830   9829 0x00004002
[54739.070146] Call Trace:
[54739.071183]  ? __schedule+0x3cf/0x680
[54739.072202]  ? bit_wait+0x50/0x50
[54739.073196]  schedule+0x39/0xa0
[54739.074213]  io_schedule+0x12/0x40
[54739.075219]  bit_wait_io+0xd/0x50
[54739.076227]  __wait_on_bit+0x66/0x90
[54739.077239]  ? bit_wait+0x50/0x50
[54739.078273]  out_of_line_wait_on_bit+0x8b/0xb0
[54739.078741]  ? init_wait_var_entry+0x40/0x40
[54739.079162]  lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
[54739.079557]  btree_write_cache_pages+0x17d/0x350 [btrfs]
[54739.079956]  ? btrfs_set_token_32+0x72/0x130 [btrfs]
[54739.080357]  ? merge_state.part.47+0x3f/0x160 [btrfs]
[54739.080748]  do_writepages+0x1a/0x60
[54739.081140]  __filemap_fdatawrite_range+0xc8/0x100
[54739.081558]  ? convert_extent_bit+0x2e8/0x580 [btrfs]
[54739.081985]  btrfs_write_marked_extents+0x141/0x160 [btrfs]
[54739.082412]  btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
[54739.082847]  ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
[54739.083280]  btrfs_commit_transaction+0x752/0x9d0 [btrfs]
[54739.083725]  ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
[54739.084170]  btrfs_sync_file+0x395/0x3e0 [btrfs]
[54739.084608]  ? retarget_shared_pending+0x70/0x70
[54739.085049]  do_fsync+0x38/0x60
[54739.085494]  __x64_sys_fdatasync+0x13/0x20
[54739.085944]  do_syscall_64+0x55/0x1a0
[54739.086395]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[54739.086850] RIP: 0033:0x7f1db3fc85f0
[54739.087310] Code: Bad RIP value.
[54739.087772] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: 000000000000004b
[54739.088249] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1db3fc85f0
[54739.088733] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: 0000000000000001
[54739.089234] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000081c492ca
[54739.089722] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000028
[54739.090205] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: 0000000000000000

[The identical call trace and register dump then repeat roughly every
two minutes, with the reported blocked time growing through 241, 362,
483, 604, 724, 845, 966, 1087 and 1208 seconds.]

Greets,
Stefan

^ permalink raw reply	[flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-11  6:12 ` Stefan Priebe - Profihost AG
@ 2019-09-11  6:24   ` Stefan Priebe - Profihost AG
  2019-09-11 13:59     ` Stefan Priebe - Profihost AG
  2019-09-11  7:09   ` 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) Michal Hocko
  2019-09-19 10:21   ` lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG
  2 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-11 6:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

Hi Michal,

Am 11.09.19 um 08:12 schrieb Stefan Priebe - Profihost AG:
> Hi Michal,
> Am 10.09.19 um 15:24 schrieb Michal Hocko:
>> On Tue 10-09-19 15:14:45, Stefan Priebe - Profihost AG wrote:
>>> Am 10.09.19 um 15:05 schrieb Stefan Priebe - Profihost AG:
>>>>
>>>> Am 10.09.19 um 14:57 schrieb Michal Hocko:
>>>>> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote:
>>>>>> Hello Michal,
>>>>>>
>>>>>> ok this might take a long time. Attached you'll find a graph from a
>>>>>> fresh boot of what happens over time (here 17 August to 30 August). Memory
>>>>>> usage decreases, as does cache, but slowly and only over time and days.
>>>>>>
>>>>>> So it might take 2-3 weeks running kernel 5.3 to see what happens.
>>>>>
>>>>> No problem. Just make sure to collect the requested data from the time
>>>>> you see the actual problem. Btw. did you try my very dumb scriptlets to get
>>>>> an idea of how much memory gets reclaimed due to THP?
>>>>
>>>> You mean your sed and sort on top of the trace file? No, I did not with
>>>> the current 5.3 kernel. Do you think it will show anything interesting?
>>>> Which line shows me how much memory gets reclaimed due to THP?
>>
>> Please re-read http://lkml.kernel.org/r/20190910082919.GL2063@dhcp22.suse.cz
>> Each command has a commented output. If you see a large number of
>> reclaimed pages for GFP_TRANSHUGE then you are seeing a similar
>> problem.
>>
>>> Is something like a kernel memory leak possible? Or wouldn't this end up
>>> in having a lot of free memory which doesn't seem usable.
>>
>> I would be really surprised if this was the case.
>>
>>> I also wonder why a reclaim takes place when there is enough memory.
>>
>> This is not clear yet and it might be a bug that has been fixed since
>> 4.18. That's why we need to see whether the same pattern is happening
>> with 5.3 as well.

but apart from the btrfs problem the memory consumption looks far
better than before.

Running 4.19.X:
after about 12h cache starts to drop from 30G to 24G

Running 5.3-rc8:
after about 24h cache is still constant at nearly 30G

Greets,
Stefan

^ permalink raw reply	[flat|nested] 61+ messages in thread
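[Editor's note: for anyone reproducing the data collection requested earlier in the thread (/proc/vmstat every second or so), a minimal sampling loop can look like the sketch below. The output file name and the sample count default are arbitrary choices, and /proc/pressure/memory only exists on kernels built with PSI support.]

```shell
#!/bin/sh
# Snapshot /proc/vmstat (and PSI, when available) once per second with a
# timestamp, so reclaim counters can later be correlated with PSI spikes.
# Usage: sample-vmstat.sh [outfile] [count]  -- count=0 samples forever.
OUT=${1:-vmstat.log}
COUNT=${2:-3}
i=0
while [ "$COUNT" -eq 0 ] || [ "$i" -lt "$COUNT" ]; do
    {
        date '+%s'                        # epoch timestamp for this sample
        cat /proc/vmstat
        if [ -r /proc/pressure/memory ]; then
            cat /proc/pressure/memory     # PSI "some"/"full" memory pressure
        fi
        echo '---'                        # record separator
    } >>"$OUT"
    i=$((i + 1))
    sleep 1
done
```

Collecting PSI alongside vmstat makes it easy to line up spikes in mem.some with the reclaim counters in the same one-second window.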
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-11 6:24 ` Stefan Priebe - Profihost AG @ 2019-09-11 13:59 ` Stefan Priebe - Profihost AG 2019-09-12 10:53 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-11 13:59 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka HI, i've now tried v5.2.14 but that one died with - i don't know which version to try... now 2019-09-11 15:41:09 ------------[ cut here ]------------ 2019-09-11 15:41:09 kernel BUG at mm/page-writeback.c:2655! 2019-09-11 15:41:09 invalid opcode: 0000 [#1] SMP PTI 2019-09-11 15:41:09 CPU: 4 PID: 466 Comm: kworker/u24:6 Not tainted 5.2.14 #1 2019-09-11 15:41:09 Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015 2019-09-11 15:41:09 Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs] 2019-09-11 15:41:09 RIP: 0010:clear_page_dirty_for_io+0xfc/0x210 2019-09-11 15:41:09 Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85 d2 48 8b 2019-09-11 15:41:09 RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246 2019-09-11 15:41:09 RAX: 001000000004205c RBX: ffffe660525b3140 RCX: 0000000000000000 2019-09-11 15:41:09 RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffe660525b3140 2019-09-11 15:41:09 RBP: ffff9ad639868818 R08: 0000000000000001 R09: 000000000002de18 2019-09-11 15:41:09 R10: 0000000000000002 R11: ffff9ade7ffd6000 R12: 0000000000000000 2019-09-11 15:41:09 R13: 0000000000000001 R14: 0000000000000000 R15: ffffbd4b8d2f3d08 2019-09-11 15:41:09 FS: 0000000000000000(0000) GS:ffff9ade3f900000(0000) knlGS:0000000000000000 2019-09-11 15:41:09 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 2019-09-11 15:41:09 CR2: 000055fa10d2bf70 CR3: 00000005a420a002 CR4: 00000000001606e0 2019-09-11 15:41:09 Call Trace: 2019-09-11 15:41:09 
__process_pages_contig+0x270/0x360 [btrfs] 2019-09-11 15:41:09 submit_compressed_extents+0x39d/0x460 [btrfs] 2019-09-11 15:41:09 normal_work_helper+0x20f/0x320 [btrfs]process_one_work+0x18b/0x380worker_thread+0x4f/0x3a0 2019-09-11 15:41:09 ? rescuer_thread+0x330/0x330kthread+0xf8/0x130 2019-09-11 15:41:09 ? kthread_create_worker_on_cpu+0x70/0x70ret_from_fork+0x35/0x40 2019-09-11 15:41:09 Modules linked in: netconsole xt_tcpudp xt_owner xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter fuse ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac x86_pkg_temp_thermal coretemp kvm_intel ast kvm ttm drm_kms_helper irqbypass crc32_pclmul drm fb_sys_fops syscopyarea lpc_ich sysfillrect ghash_clmulni_intel sysimgblt mfd_core sg wmi ipmi_si ipmi_devintf ipmi_msghandler button ip_tables x_tables btrfs zstd_decompress zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid1 raid0 multipath linear md_mod sd_mod xhci_pci ehci_pci igb xhci_hcd ehci_hcd i2c_algo_bit i2c_i801 ahci ptp i2c_core usbcore libahci usb_common pps_core megaraid_sas 2019-09-11 15:41:09 ---[ end trace d9a3f99c047dc8bf ]--- 2019-09-11 15:41:10 RIP: 0010:clear_page_dirty_for_io+0xfc/0x210 2019-09-11 15:41:10 Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85 d2 48 8b 2019-09-11 15:41:10 RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246 2019-09-11 15:41:10 RAX: 001000000004205c RBX: ffffe660525b3140 RCX: 0000000000000000 2019-09-11 15:41:10 RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffe660525b3140 2019-09-11 15:41:10 RBP: ffff9ad639868818 R08: 0000000000000001 R09: 000000000002de18 2019-09-11 15:41:10 R10: 0000000000000002 R11: ffff9ade7ffd6000 R12: 0000000000000000 2019-09-11 15:41:10 R13: 0000000000000001 R14: 0000000000000000 R15: 
ffffbd4b8d2f3d08 2019-09-11 15:41:10 FS: 0000000000000000(0000) GS:ffff9ade3f900000(0000) knlGS:0000000000000000 2019-09-11 15:41:10 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 2019-09-11 15:41:10 CR2: 000055fa10d2bf70 CR3: 00000005a420a002 CR4: 00000000001606e0 2019-09-11 15:41:10 Kernel panic - not syncing: Fatal exception 2019-09-11 15:41:10 Kernel Offset: 0x1a000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) 2019-09-11 15:41:10 Rebooting in 20 seconds.. 2019-09-11 15:41:29 ACPI MEMORY or I/O RESET_REG. Stefan Am 11.09.19 um 08:24 schrieb Stefan Priebe - Profihost AG: > Hi Michal, > > Am 11.09.19 um 08:12 schrieb Stefan Priebe - Profihost AG: >> Hi Michal, >> Am 10.09.19 um 15:24 schrieb Michal Hocko: >>> On Tue 10-09-19 15:14:45, Stefan Priebe - Profihost AG wrote: >>>> Am 10.09.19 um 15:05 schrieb Stefan Priebe - Profihost AG: >>>>> >>>>> Am 10.09.19 um 14:57 schrieb Michal Hocko: >>>>>> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote: >>>>>>> Hello Michal, >>>>>>> >>>>>>> ok this might take a long time. Attached you'll find a graph from a >>>>>>> fresh boot what happens over time (here 17 August to 30 August). Memory >>>>>>> Usage decreases as well as cache but slowly and only over time and days. >>>>>>> >>>>>>> So it might take 2-3 weeks running Kernel 5.3 to see what happens. >>>>>> >>>>>> No problem. Just make sure to collect the requested data from the time >>>>>> you see the actual problem. Btw. you try my very dumb scriplets to get >>>>>> an idea of how much memory gets reclaimed due to THP. >>>>> >>>>> You mean your sed and sort on top of the trace file? No i did not with >>>>> the current 5.3 kernel do you think it will show anything interesting? >>>>> Which line shows me how much memory gets reclaimed due to THP? >>> >>> Please re-read http://lkml.kernel.org/r/20190910082919.GL2063@dhcp22.suse.cz >>> Each command has a commented output. 
If you see number of reclaimed >>> pages to be large for GFP_TRANSHUGE then you are seeing a similar >>> problem. >>> >>>> Is something like a kernel memory leak possible? Or wouldn't this end up >>>> in having a lot of free memory which doesn't seem usable? >>> >>> I would be really surprised if this was the case. >>> >>>> I also wonder why a reclaim takes place when there is enough memory. >>> >>> This is not clear yet and it might be a bug that has been fixed since >>> 4.18. That's why we need to see whether the same pattern is happening >>> with 5.3 as well. > > but apart from the btrfs problem the memory consumption looks far > better than before. > > Running 4.19.X: > after about 12h cache starts to drop from 30G to 24G > > Running 5.3-rc8: > after about 24h cache is still constant at nearly 30G > > Greets, > Stefan > ^ permalink raw reply [flat|nested] 61+ messages in thread
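The "dumb scriptlets" referenced above live in the linked lkml message and are not reproduced in this thread; the idea is to aggregate the vmscan tracepoint output by gfp mask and see whether GFP_TRANSHUGE accounts for most of the reclaimed pages. A rough, self-contained sketch of such a pipeline follows. The trace lines below are fabricated samples, and the one-line format is a simplification: on a real capture the gfp_flags typically appear in the reclaim-begin event while nr_reclaimed appears in the matching end event, and the exact field layout varies between kernel versions.

```shell
# On a live system one would enable the vmscan tracepoints and capture a
# window of activity (needs root), roughly:
#   echo 1 > /sys/kernel/debug/tracing/events/vmscan/enable
#   cat /sys/kernel/debug/tracing/trace_pipe > /tmp/trace.sample

# Fabricated sample lines standing in for a real capture:
cat > /tmp/trace.sample <<'EOF'
kswapd0-123 [001] 100.000001: mm_vmscan_direct_reclaim_end: nr_reclaimed=32 gfp_flags=GFP_TRANSHUGE
php-fpm-456 [002] 100.000002: mm_vmscan_direct_reclaim_end: nr_reclaimed=512 gfp_flags=GFP_TRANSHUGE
php-fpm-457 [003] 100.000003: mm_vmscan_direct_reclaim_end: nr_reclaimed=8 gfp_flags=GFP_HIGHUSER_MOVABLE
EOF

# Sum nr_reclaimed per gfp mask; a large total for GFP_TRANSHUGE would
# point at THP allocations driving the reclaim.
sed -n 's/.*nr_reclaimed=\([0-9]*\).*gfp_flags=\(.*\)/\2 \1/p' /tmp/trace.sample \
  | awk '{sum[$1] += $2} END {for (g in sum) print g, sum[g]}' \
  | sort -rnk2
# prints: GFP_TRANSHUGE 544 (then the other masks in descending order)
```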
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-11 13:59 ` Stefan Priebe - Profihost AG @ 2019-09-12 10:53 ` Stefan Priebe - Profihost AG 2019-09-12 11:06 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-12 10:53 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka Hello Michal, now the kernel (5.2.14) was locked / deadlocked with: --------------- 019-09-12 12:41:47 ------------[ cut here ]------------ 2019-09-12 12:41:47 NETDEV WATCHDOG: eth0 (igb): transmit queue 2 timed out 2019-09-12 12:41:47 WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x254/0x260 2019-09-12 12:41:47 Modules linked in: btrfs dm_mod netconsole xt_tcpudp xt_owner xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse xt_multiport ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac x86_pkg_temp_thermal coretemp kvm_intel ast ttm kvm drm_kms_helper irqbypass drm crc32_pclmul fb_sys_fops lpc_ich ipmi_si syscopyarea sysfillrect ipmi_devintf mfd_core ghash_clmulni_intel wmi sysimgblt sg ipmi_msghandler button ip_tables x_tables zstd_decompress zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid1 raid0 multipath linear md_mod xhci_pci sd_mod ehci_pci xhci_hcd ehci_hcd igb i2c_i801 i2c_algo_bit ahci usbcore ptp libahci i2c_core usb_common pps_core megaraid_sas [last unloaded: btrfs] 2019-09-12 12:41:47 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.2.14 #1 2019-09-12 12:41:47 Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015 2019-09-12 12:41:47 RIP: 0010:dev_watchdog+0x254/0x260 2019-09-12 12:41:47 Code: 48 85 c0 75 e4 eb 9d 4c 89 ef c6 05 a6 09 c8 00 01 e8 b0 53 fb ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 10 d6 0c be e8 ac ca 98 ff <0f> 0b e9 7c ff ff ff 0f 1f 44 00 00 0f 1f 44 00 00 41 57 41 56 49 
2019-09-12 12:41:47 RSP: 0018:ffffbea7c63a0e68 EFLAGS: 00010282 2019-09-12 12:41:47 RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000006 2019-09-12 12:41:47 RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff96f9ff896540 2019-09-12 12:41:47 RBP: ffff96f9fc18041c R08: 0000000000000001 R09: 000000000000046f 2019-09-12 12:41:47 R10: ffff96f9ff89a630 R11: 0000000000000000 R12: ffff96f9f9e16940 2019-09-12 12:41:47 R13: ffff96f9fc180000 R14: ffff96f9fc180440 R15: 0000000000000008 2019-09-12 12:41:47 FS: 0000000000000000(0000) GS:ffff96f9ff880000(0000) knlGS:0000000000000000 2019-09-12 12:41:47 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 2019-09-12 12:41:47 CR2: 00007fbb2c4e2000 CR3: 0000000c0d20a004 CR4: 00000000001606e0 2019-09-12 12:41:47 Call Trace:<IRQ> 2019-09-12 12:41:47 ? pfifo_fast_reset+0x110/0x110call_timer_fn+0x2d/0x140run_timer_softirq+0x1e2/0x440 2019-09-12 12:41:47 ? timerqueue_add+0x54/0x80 2019-09-12 12:41:48 ? enqueue_hrtimer+0x3a/0x90__do_softirq+0x10c/0x2d4irq_exit+0xdd/0xf0smp_apic_timer_interrupt+0x74/0x130apic_timer_interrupt+0xf/0x20</IRQ> 2019-09-12 12:41:48 RIP: 0010:cpuidle_enter_state+0xbd/0x410 2019-09-12 12:41:48 Code: 24 0f 1f 44 00 00 31 ff e8 b0 67 a5 ff 80 7c 24 13 00 74 12 9c 58 f6 c4 02 0f 85 2c 03 00 00 31 ff e8 a7 b9 aa ff fb 45 85 ed <0f> 88 e0 02 00 00 4c 8b 04 24 4c 2b 44 24 08 48 ba cf f7 53 e3 a5 2019-09-12 12:41:48 RSP: 0018:ffffbea7c62f7e60 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff13 2019-09-12 12:41:48 RAX: ffff96f9ff8a9840 RBX: ffffffffbe3271a0 RCX: 000000000000001f 2019-09-12 12:41:48 RDX: 000044a065c471eb RSI: 0000000024925419 RDI: 0000000000000000 2019-09-12 12:41:48 RBP: ffffdea7bfa80f00 R08: 0000000000000002 R09: 00000000000290c0 2019-09-12 12:41:48 R10: 00000000ffffffff R11: 0000000000000f05 R12: 0000000000000004 2019-09-12 12:41:48 R13: 0000000000000004 R14: 0000000000000004 R15: 
ffffffffbe3271a0cpuidle_enter+0x29/0x40do_idle+0x1d5/0x220cpu_startup_entry+0x19/0x20start_secondary+0x16b/0x1b0secondary_startup_64+0xa4/0xb0 2019-09-12 12:41:48 ---[ end trace 3241d99856ac4582 ]--- 2019-09-12 12:41:48 igb 0000:05:00.0 eth0: Reset adapter ------------------------------- Stefan Am 11.09.19 um 15:59 schrieb Stefan Priebe - Profihost AG: > HI, > > i've now tried v5.2.14 but that one died with - i don't know which > version to try... now > > 2019-09-11 15:41:09 ------------[ cut here ]------------ > 2019-09-11 15:41:09 kernel BUG at mm/page-writeback.c:2655! > 2019-09-11 15:41:09 invalid opcode: 0000 [#1] SMP PTI > 2019-09-11 15:41:09 CPU: 4 PID: 466 Comm: kworker/u24:6 Not tainted > 5.2.14 #1 > 2019-09-11 15:41:09 Hardware name: Supermicro Super Server/X10SRi-F, > BIOS 1.0b 04/21/2015 > 2019-09-11 15:41:09 Workqueue: btrfs-delalloc btrfs_delalloc_helper > [btrfs] > 2019-09-11 15:41:09 RIP: 0010:clear_page_dirty_for_io+0xfc/0x210 > 2019-09-11 15:41:09 Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00 > 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 > 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85 > d2 48 8b > 2019-09-11 15:41:09 RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246 > 2019-09-11 15:41:09 RAX: 001000000004205c RBX: ffffe660525b3140 RCX: > 0000000000000000 > 2019-09-11 15:41:09 RDX: 0000000000000000 RSI: 0000000000000006 RDI: > ffffe660525b3140 > 2019-09-11 15:41:09 RBP: ffff9ad639868818 R08: 0000000000000001 R09: > 000000000002de18 > 2019-09-11 15:41:09 R10: 0000000000000002 R11: ffff9ade7ffd6000 R12: > 0000000000000000 > 2019-09-11 15:41:09 R13: 0000000000000001 R14: 0000000000000000 R15: > ffffbd4b8d2f3d08 > 2019-09-11 15:41:09 FS: 0000000000000000(0000) > GS:ffff9ade3f900000(0000) knlGS:0000000000000000 > 2019-09-11 15:41:09 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > 2019-09-11 15:41:09 CR2: 000055fa10d2bf70 CR3: 00000005a420a002 CR4: > 00000000001606e0 > 2019-09-11 15:41:09 Call Trace: > 
2019-09-11 15:41:09 __process_pages_contig+0x270/0x360 [btrfs] > 2019-09-11 15:41:09 submit_compressed_extents+0x39d/0x460 [btrfs] > 2019-09-11 15:41:09 normal_work_helper+0x20f/0x320 > [btrfs]process_one_work+0x18b/0x380worker_thread+0x4f/0x3a0 > 2019-09-11 15:41:09 ? rescuer_thread+0x330/0x330kthread+0xf8/0x130 > 2019-09-11 15:41:09 ? > kthread_create_worker_on_cpu+0x70/0x70ret_from_fork+0x35/0x40 > 2019-09-11 15:41:09 Modules linked in: netconsole xt_tcpudp xt_owner > xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport > ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter fuse > ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac > x86_pkg_temp_thermal coretemp kvm_intel ast kvm ttm drm_kms_helper > irqbypass crc32_pclmul drm fb_sys_fops syscopyarea lpc_ich sysfillrect > ghash_clmulni_intel sysimgblt mfd_core sg wmi ipmi_si ipmi_devintf > ipmi_msghandler button ip_tables x_tables btrfs zstd_decompress > zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq > async_xor async_tx xor usbhid raid6_pq raid1 raid0 multipath linear > md_mod sd_mod xhci_pci ehci_pci igb xhci_hcd ehci_hcd i2c_algo_bit > i2c_i801 ahci ptp i2c_core usbcore libahci usb_common pps_core megaraid_sas > 2019-09-11 15:41:09 ---[ end trace d9a3f99c047dc8bf ]--- > 2019-09-11 15:41:10 RIP: 0010:clear_page_dirty_for_io+0xfc/0x210 > 2019-09-11 15:41:10 Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00 > 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 > 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85 > d2 48 8b > 2019-09-11 15:41:10 RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246 > 2019-09-11 15:41:10 RAX: 001000000004205c RBX: ffffe660525b3140 RCX: > 0000000000000000 > 2019-09-11 15:41:10 RDX: 0000000000000000 RSI: 0000000000000006 RDI: > ffffe660525b3140 > 2019-09-11 15:41:10 RBP: ffff9ad639868818 R08: 0000000000000001 R09: > 000000000002de18 > 2019-09-11 15:41:10 R10: 0000000000000002 R11: ffff9ade7ffd6000 R12: > 
0000000000000000 > 2019-09-11 15:41:10 R13: 0000000000000001 R14: 0000000000000000 R15: > ffffbd4b8d2f3d08 > 2019-09-11 15:41:10 FS: 0000000000000000(0000) > GS:ffff9ade3f900000(0000) knlGS:0000000000000000 > 2019-09-11 15:41:10 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > 2019-09-11 15:41:10 CR2: 000055fa10d2bf70 CR3: 00000005a420a002 CR4: > 00000000001606e0 > 2019-09-11 15:41:10 Kernel panic - not syncing: Fatal exception > 2019-09-11 15:41:10 Kernel Offset: 0x1a000000 from > 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > 2019-09-11 15:41:10 Rebooting in 20 seconds.. > 2019-09-11 15:41:29 ACPI MEMORY or I/O RESET_REG. > > Stefan > Am 11.09.19 um 08:24 schrieb Stefan Priebe - Profihost AG: >> Hi Michal, >> >> Am 11.09.19 um 08:12 schrieb Stefan Priebe - Profihost AG: >>> Hi Michal, >>> Am 10.09.19 um 15:24 schrieb Michal Hocko: >>>> On Tue 10-09-19 15:14:45, Stefan Priebe - Profihost AG wrote: >>>>> Am 10.09.19 um 15:05 schrieb Stefan Priebe - Profihost AG: >>>>>> >>>>>> Am 10.09.19 um 14:57 schrieb Michal Hocko: >>>>>>> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote: >>>>>>>> Hello Michal, >>>>>>>> >>>>>>>> ok this might take a long time. Attached you'll find a graph from a >>>>>>>> fresh boot what happens over time (here 17 August to 30 August). Memory >>>>>>>> Usage decreases as well as cache but slowly and only over time and days. >>>>>>>> >>>>>>>> So it might take 2-3 weeks running Kernel 5.3 to see what happens. >>>>>>> >>>>>>> No problem. Just make sure to collect the requested data from the time >>>>>>> you see the actual problem. Btw. you try my very dumb scriplets to get >>>>>>> an idea of how much memory gets reclaimed due to THP. >>>>>> >>>>>> You mean your sed and sort on top of the trace file? No i did not with >>>>>> the current 5.3 kernel do you think it will show anything interesting? >>>>>> Which line shows me how much memory gets reclaimed due to THP? 
>>>> >>>> Please re-read http://lkml.kernel.org/r/20190910082919.GL2063@dhcp22.suse.cz >>>> Each command has a commented output. If you see number of reclaimed >>>> pages to be large for GFP_TRANSHUGE then you are seeing a similar >>>> problem. >>>> >>>>> Is something like a kernel memory leak possible? Or wouldn't this end up >>>>> in having a lot of free memory which doesn't seem usable? >>>> >>>> I would be really surprised if this was the case. >>>> >>>>> I also wonder why a reclaim takes place when there is enough memory. >>>> >>>> This is not clear yet and it might be a bug that has been fixed since >>>> 4.18. That's why we need to see whether the same pattern is happening >>>> with 5.3 as well. >> >> but apart from the btrfs problem the memory consumption looks far >> better than before. >> >> Running 4.19.X: >> after about 12h cache starts to drop from 30G to 24G >> >> Running 5.3-rc8: >> after about 24h cache is still constant at nearly 30G >> >> Greets, >> Stefan >> ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-12 10:53 ` Stefan Priebe - Profihost AG @ 2019-09-12 11:06 ` Stefan Priebe - Profihost AG 0 siblings, 0 replies; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-12 11:06 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka Sorry, very shortly after that even more traces showed up: So currently it seems we can't test with 5.3-rc8 or 5.2.14 - what's next? watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [authscanclient:812] Modules linked in: btrfs dm_mod netconsole xt_tcpudp xt_owner xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse xt_multiport ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac x86_pkg_temp_thermal coretemp kvm_intel ast ttm kvm drm_kms_helper irqbypass drm crc32_pclmul fb_sys_fops lpc_ich ipmi_si syscopyarea sysfillrect ipmi_devintf mfd_core ghash_clmulni_intel wmi sysimgblt sg ipmi_msghandler button ip_tables x_tables zstd_decompress zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid1 raid0 multipath linear md_mod xhci_pci sd_mod ehci_pci xhci_hcd ehci_hcd igb i2c_i801 i2c_algo_bit ahci usbcore ptp libahci i2c_core usb_common pps_core megaraid_sas [last unloaded: btrfs] CPU: 7 PID: 812 Comm: authscanclient Tainted: G W 5.2.14 #1 watchdog: BUG: soft lockup - CPU#11 stuck for 23s! 
[authscanclient:813] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015 Modules linked in: btrfs dm_mod netconsole xt_tcpudp xt_owner xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse xt_multiport ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac x86_pkg_temp_thermal coretemp kvm_intel ast ttm kvm drm_kms_helper irqbypass drm crc32_pclmul fb_sys_fops lpc_ich ipmi_si syscopyarea sysfillrect ipmi_devintf mfd_core ghash_clmulni_intel wmi sysimgblt sg ipmi_msghandler button ip_tables x_tables zstd_decompress zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid1 raid0 multipath linear md_mod xhci_pci sd_mod ehci_pci xhci_hcd ehci_hcd igb i2c_i801 i2c_algo_bit ahci usbcore ptp libahci i2c_core usb_common pps_core megaraid_sas [last unloaded: btrfs] RIP: 0010:find_next_bit+0x1c/0x60 CPU: 11 PID: 813 Comm: authscanclient Tainted: G W 5.2.14 #1 Code: 8d 04 0a c3 f3 c3 0f 1f 84 00 00 00 00 00 48 39 d6 48 89 f0 48 89 d1 76 4e 48 89 d6 48 c7 c2 ff ff ff ff 48 c1 ee 06 48 d3 e2 <48> 83 e1 c0 48 23 14 f7 75 24 48 83 c1 40 48 39 c8 77 0b eb 2a 48 Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015 RIP: 0010:cpumask_next+0x12/0x20 RSP: 0000:ffffbea7c770b608 EFLAGS: 00000287 ORIG_RAX: ffffffffffffff13 Code: 48 01 f8 80 38 22 75 97 eb d4 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 f0 8b 35 4b 82 b6 00 8d 57 01 48 89 c7 48 63 d2 <e8> d9 33 c6 ff f3 c3 0f 1f 80 00 00 00 00 55 53 89 f3 8b 35 2a 82 RAX: 000000000000000c RBX: 0000000000000004 RCX: 0000000000000002 RDX: fffffffffffffffc RSI: 0000000000000000 RDI: ffffffffbe3e5e40 RSP: 0000:ffffbea7c727f620 EFLAGS: 00000286 ORIG_RAX: ffffffffffffff13 RBP: ffff96f9edc8cc00 R08: 0000000000000000 R09: ffff96eac78049b8 R10: ffffbea7c770b750 R11: ffffffffbe2bd8d8 R12: 0000000000000010 RAX: ffffffffbe3e5e40 RBX: ffff96f9edc8cc00 RCX: 0000000000000007 RDX: 
0000000000000008 RSI: 000000000000000c RDI: ffffffffbe3e5e40 R13: ffffffffbe3e5e40 R14: 0000000000000002 R15: ffffffffffffffa0 FS: 00007f3ab8edd700(0000) GS:ffff96f9ff9c0000(0000) knlGS:0000000000000000 RBP: ffffffffbe3e5e40 R08: ffff96eac7804920 R09: ffff96eac78049b8 R10: 0000000000000000 R11: ffffffffbe2bd8d8 R12: 0000000000000018 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3abbf66f20 CR3: 0000000fe60a6001 CR4: 00000000001606e0 R13: 0000000000000003 R14: 0000000000008e4b R15: fffffffffffffff6 FS: 00007f3aabfff700(0000) GS:ffff96f9ffac0000(0000) knlGS:0000000000000000 Call Trace: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3abbf66fd0 CR3: 0000000fe60a6004 CR4: 00000000001606e0 cpumask_next+0x17/0x20 Call Trace: lruvec_lru_size+0x5c/0x110 count_shadow_nodes+0xac/0x220 shrink_node_memcg+0xd7/0x7d0 do_shrink_slab+0x55/0x2d0 ? shrink_slab+0x2a3/0x2b0 shrink_slab+0x219/0x2b0 ? shrink_node+0xe0/0x4b0 shrink_node+0xf6/0x4b0 shrink_node+0xe0/0x4b0 do_try_to_free_pages+0xeb/0x380 do_try_to_free_pages+0xeb/0x380 try_to_free_mem_cgroup_pages+0xe6/0x1e0 try_to_free_mem_cgroup_pages+0xe6/0x1e0 try_charge+0x295/0x780 try_charge+0x295/0x780 ? shrink_slab+0x2a3/0x2b0 ? mem_cgroup_commit_charge+0x79/0x4a0 mem_cgroup_try_charge+0xc2/0x190 mem_cgroup_try_charge+0xc2/0x190 __add_to_page_cache_locked+0x282/0x330 __add_to_page_cache_locked+0x282/0x330 ? count_shadow_nodes+0x220/0x220 ? count_shadow_nodes+0x220/0x220 add_to_page_cache_lru+0x4a/0xc0 add_to_page_cache_lru+0x4a/0xc0 iomap_readpages_actor+0x103/0x220 iomap_readpages_actor+0x103/0x220 ? iomap_write_begin.constprop.45+0x370/0x370 ? iomap_write_begin.constprop.45+0x370/0x370 iomap_apply+0xba/0x150 ? iomap_write_begin.constprop.45+0x370/0x370 iomap_apply+0xba/0x150 iomap_readpages+0xaa/0x1a0 ? iomap_write_begin.constprop.45+0x370/0x370 ? iomap_write_begin.constprop.45+0x370/0x370 iomap_readpages+0xaa/0x1a0 ? iomap_write_begin.constprop.45+0x370/0x370 read_pages+0x71/0x1a0 read_pages+0x71/0x1a0 ? 
0xffffffffbd000000 ? __do_page_cache_readahead+0x1cc/0x1e0 ? __do_page_cache_readahead+0x1a8/0x1e0 __do_page_cache_readahead+0x1cc/0x1e0 __do_page_cache_readahead+0x1a8/0x1e0 filemap_fault+0x6fc/0x960 filemap_fault+0x6fc/0x960 ? __mod_lruvec_state+0x3f/0xe0 ? schedule+0x39/0xa0 ? page_add_file_rmap+0xd1/0x160 ? __mod_lruvec_state+0x3f/0xe0 ? alloc_set_pte+0x4f8/0x5c0 __xfs_filemap_fault.constprop.13+0x49/0x120 ? page_add_file_rmap+0xd1/0x160 __do_fault+0x3c/0x110 ? alloc_set_pte+0x4f8/0x5c0 __handle_mm_fault+0xa7c/0xfb0 __xfs_filemap_fault.constprop.13+0x49/0x120 handle_mm_fault+0xd0/0x1d0 __do_fault+0x3c/0x110 __do_page_fault+0x253/0x470 __handle_mm_fault+0xa7c/0xfb0 do_page_fault+0x2c/0x106 handle_mm_fault+0xd0/0x1d0 ? page_fault+0x8/0x30 __do_page_fault+0x253/0x470 page_fault+0x1e/0x30 do_page_fault+0x2c/0x106 RIP: 0033:0x7f3abbf66f20 ? page_fault+0x8/0x30 Code: 68 38 00 00 00 e9 60 fc ff ff ff 25 da 72 20 00 68 39 00 00 00 e9 50 fc ff ff ff 25 d2 72 20 00 68 3a 00 00 00 e9 40 fc ff ff <ff> 25 ca 72 20 00 68 3b 00 00 00 e9 30 fc ff ff ff 25 c2 72 20 00 page_fault+0x1e/0x30 RSP: 002b:00007f3ab8edcb58 EFLAGS: 00010246 RIP: 0033:0x7f3abbf66fd0 RAX: 0000000000000001 RBX: 000055f087797ae0 RCX: 0000000000000010 RDX: 000000002090000c RSI: 0000000000000050 RDI: 000055f087797ae0 Code: 68 43 00 00 00 e9 b0 fb ff ff ff 25 82 72 20 00 68 44 00 00 00 e9 a0 fb ff ff ff 25 7a 72 20 00 68 45 00 00 00 e9 90 fb ff ff <ff> 25 72 72 20 00 68 46 00 00 00 e9 80 fb ff ff ff 25 6a 72 20 00 RBP: 000055f0873a69d0 R08: 0000000000000602 R09: 000055f0876daee0 R10: 000055f087034980 R11: 00000000e24313e9 R12: 000055f08779e558 RSP: 002b:00007f3aabffea68 EFLAGS: 00010287 R13: 000055f08779e558 R14: 000055f0873a69d0 R15: 000055f08779e550 RAX: 000055f087b75190 RBX: 00007f3abc16e2c0 RCX: 0000000000000012 RDX: 0000000000000002 RSI: 00007f3abc16e2c0 RDI: 00007f3abc16e2c0 RBP: 000055f08777a9f0 R08: 000000000000000f R09: 0000000000000003 R10: 000000000000000b R11: 0000000000000000 R12: 000055f08777a9f0 
R13: 000055f085e5b5b8 R14: 000055f087b79240 R15: 0000000000000000 igb 0000:05:00.1 eth1: Reset adapter igb 0000:05:00.1 eth1: igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [authscanclient:812] Modules linked in: btrfs dm_mod netconsole xt_tcpudp xt_owner xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse xt_multiport ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac x86_pkg_temp_thermal coretemp kvm_intel ast ttm kvm drm_kms_helper irqbypass drm crc32_pclmul fb_sys_fops lpc_ich ipmi_si syscopyarea sysfillrect ipmi_devintf mfd_core ghash_clmulni_intel wmi sysimgblt sg ipmi_msghandler button ip_tables x_tables zstd_decompress zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid1 raid0 multipath linear md_mod xhci_pci sd_mod ehci_pci xhci_hcd ehci_hcd igb i2c_i801 i2c_algo_bit ahci usbcore ptp libahci i2c_core usb_common pps_core megaraid_sas [last unloaded: btrfs] CPU: 7 PID: 812 Comm: authscanclient Tainted: G W L 5.2.14 #1 watchdog: BUG: soft lockup - CPU#11 stuck for 23s! 
[authscanclient:813] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015 Modules linked in: btrfs dm_mod netconsole xt_tcpudp xt_owner xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 fuse xt_multiport ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac x86_pkg_temp_thermal coretemp kvm_intel ast ttm kvm drm_kms_helper irqbypass drm crc32_pclmul fb_sys_fops lpc_ich ipmi_si syscopyarea sysfillrect ipmi_devintf mfd_core ghash_clmulni_intel wmi sysimgblt sg ipmi_msghandler button ip_tables x_tables zstd_decompress zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid1 raid0 multipath linear md_mod xhci_pci sd_mod ehci_pci xhci_hcd ehci_hcd igb i2c_i801 i2c_algo_bit ahci usbcore ptp libahci i2c_core usb_common pps_core megaraid_sas [last unloaded: btrfs] RIP: 0010:cpumask_next+0x0/0x20 CPU: 11 PID: 813 Comm: authscanclient Tainted: G W L 5.2.14 #1 Code: 83 c7 01 e9 24 ff ff ff 48 83 c0 01 48 89 02 44 89 d0 48 01 f8 80 38 22 75 97 eb d4 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <48> 89 f0 8b 35 4b 82 b6 00 8d 57 01 48 89 c7 48 63 d2 e8 d9 33 c6 Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015 RIP: 0010:cpumask_next+0x12/0x20 RSP: 0000:ffffbea7c770b610 EFLAGS: 00000292 ORIG_RAX: ffffffffffffff13 Code: 48 01 f8 80 38 22 75 97 eb d4 90 90 90 90 90 90 90 90 90 90 90 90 90 90 48 89 f0 8b 35 4b 82 b6 00 8d 57 01 48 89 c7 48 63 d2 <e8> d9 33 c6 ff f3 c3 0f 1f 80 00 00 00 00 55 53 89 f3 8b 35 2a 82 RAX: 0000000000000002 RBX: 0000000000000004 RCX: ffff96f9ff880000 RDX: 000047adbfe07c58 RSI: ffffffffbe3e5e40 RDI: 0000000000000002 RSP: 0000:ffffbea7c727f620 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13 RBP: ffff96f9edc8cc00 R08: 0000000000000000 R09: ffff96eac78049b8 R10: ffffbea7c770b750 R11: ffffffffbe2bd8d8 R12: 0000000000000018 RAX: ffffffffbe3e5e40 RBX: ffff96f9edc8cc00 RCX: 0000000000000009 RDX: 
000000000000000a RSI: 000000000000000c RDI: ffffffffbe3e5e40 R13: ffffffffbe3e5e40 R14: 0000000000000003 R15: ffffffffffffffc1 FS: 00007f3ab8edd700(0000) GS:ffff96f9ff9c0000(0000) knlGS:0000000000000000 RBP: ffffffffbe3e5e40 R08: ffff96eac7804920 R09: ffff96eac78049b8 R10: 0000000000000000 R11: ffffffffbe2bd8d8 R12: 0000000000000020 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3abbf66f20 CR3: 0000000fe60a6001 CR4: 00000000001606e0 R13: 0000000000000004 R14: 0000000000008e4b R15: 0000000000000000 FS: 00007f3aabfff700(0000) GS:ffff96f9ffac0000(0000) knlGS:0000000000000000 Call Trace: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f3abbf66fd0 CR3: 0000000fe60a6004 CR4: 00000000001606e0 lruvec_lru_size+0x5c/0x110 Call Trace: shrink_node_memcg+0xd7/0x7d0 count_shadow_nodes+0xac/0x220 ? shrink_slab+0x2a3/0x2b0 do_shrink_slab+0x55/0x2d0 ? shrink_node+0xe0/0x4b0 shrink_slab+0x219/0x2b0 shrink_node+0xe0/0x4b0 Stefan Am 12.09.19 um 12:53 schrieb Stefan Priebe - Profihost AG: > Hello Michal, > > now the kernel (5.2.14) was locked / deadlocked with: > --------------- > 019-09-12 12:41:47 ------------[ cut here ]------------ > 2019-09-12 12:41:47 NETDEV WATCHDOG: eth0 (igb): transmit queue 2 > timed out > 2019-09-12 12:41:47 WARNING: CPU: 2 PID: 0 at > net/sched/sch_generic.c:443 dev_watchdog+0x254/0x260 > 2019-09-12 12:41:47 Modules linked in: btrfs dm_mod netconsole > xt_tcpudp xt_owner xt_conntrack nf_conntrack nf_defrag_ipv6 > nf_defrag_ipv4 fuse xt_multiport ipt_REJECT nf_reject_ipv4 xt_set > iptable_filter bpfilter ip_set_hash_net ip_set nfnetlink 8021q garp > bonding sb_edac x86_pkg_temp_thermal coretemp kvm_intel ast ttm kvm > drm_kms_helper irqbypass drm crc32_pclmul fb_sys_fops lpc_ich ipmi_si > syscopyarea sysfillrect ipmi_devintf mfd_core ghash_clmulni_intel wmi > sysimgblt sg ipmi_msghandler button ip_tables x_tables zstd_decompress > zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq > async_xor async_tx xor usbhid 
raid6_pq raid1 raid0 multipath linear > md_mod xhci_pci sd_mod ehci_pci xhci_hcd ehci_hcd igb i2c_i801 > i2c_algo_bit ahci usbcore ptp libahci i2c_core usb_common pps_core > megaraid_sas [last unloaded: btrfs] > 2019-09-12 12:41:47 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.2.14 #1 > 2019-09-12 12:41:47 Hardware name: Supermicro Super Server/X10SRi-F, > BIOS 1.0b 04/21/2015 > 2019-09-12 12:41:47 RIP: 0010:dev_watchdog+0x254/0x260 > 2019-09-12 12:41:47 Code: 48 85 c0 75 e4 eb 9d 4c 89 ef c6 05 a6 09 > c8 00 01 e8 b0 53 fb ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 10 d6 0c be e8 > ac ca 98 ff <0f> 0b e9 7c ff ff ff 0f 1f 44 00 00 0f 1f 44 00 00 41 57 > 41 56 49 > 2019-09-12 12:41:47 RSP: 0018:ffffbea7c63a0e68 EFLAGS: 00010282 > 2019-09-12 12:41:47 RAX: 0000000000000000 RBX: 0000000000000002 RCX: > 0000000000000006 > 2019-09-12 12:41:47 RDX: 0000000000000007 RSI: 0000000000000086 RDI: > ffff96f9ff896540 > 2019-09-12 12:41:47 RBP: ffff96f9fc18041c R08: 0000000000000001 R09: > 000000000000046f > 2019-09-12 12:41:47 R10: ffff96f9ff89a630 R11: 0000000000000000 R12: > ffff96f9f9e16940 > 2019-09-12 12:41:47 R13: ffff96f9fc180000 R14: ffff96f9fc180440 R15: > 0000000000000008 > 2019-09-12 12:41:47 FS: 0000000000000000(0000) > GS:ffff96f9ff880000(0000) knlGS:0000000000000000 > 2019-09-12 12:41:47 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > 2019-09-12 12:41:47 CR2: 00007fbb2c4e2000 CR3: 0000000c0d20a004 CR4: > 00000000001606e0 > 2019-09-12 12:41:47 Call Trace:<IRQ> > 2019-09-12 12:41:47 ? > pfifo_fast_reset+0x110/0x110call_timer_fn+0x2d/0x140run_timer_softirq+0x1e2/0x440 > 2019-09-12 12:41:47 ? timerqueue_add+0x54/0x80 > 2019-09-12 12:41:48 ? 
> enqueue_hrtimer+0x3a/0x90__do_softirq+0x10c/0x2d4irq_exit+0xdd/0xf0smp_apic_timer_interrupt+0x74/0x130apic_timer_interrupt+0xf/0x20</IRQ> > 2019-09-12 12:41:48 RIP: 0010:cpuidle_enter_state+0xbd/0x410 > 2019-09-12 12:41:48 Code: 24 0f 1f 44 00 00 31 ff e8 b0 67 a5 ff 80 > 7c 24 13 00 74 12 9c 58 f6 c4 02 0f 85 2c 03 00 00 31 ff e8 a7 b9 aa ff > fb 45 85 ed <0f> 88 e0 02 00 00 4c 8b 04 24 4c 2b 44 24 08 48 ba cf f7 > 53 e3 a5 > 2019-09-12 12:41:48 RSP: 0018:ffffbea7c62f7e60 EFLAGS: 00000202 > ORIG_RAX: ffffffffffffff13 > 2019-09-12 12:41:48 RAX: ffff96f9ff8a9840 RBX: ffffffffbe3271a0 RCX: > 000000000000001f > 2019-09-12 12:41:48 RDX: 000044a065c471eb RSI: 0000000024925419 RDI: > 0000000000000000 > 2019-09-12 12:41:48 RBP: ffffdea7bfa80f00 R08: 0000000000000002 R09: > 00000000000290c0 > 2019-09-12 12:41:48 R10: 00000000ffffffff R11: 0000000000000f05 R12: > 0000000000000004 > 2019-09-12 12:41:48 R13: 0000000000000004 R14: 0000000000000004 R15: > ffffffffbe3271a0cpuidle_enter+0x29/0x40do_idle+0x1d5/0x220cpu_startup_entry+0x19/0x20start_secondary+0x16b/0x1b0secondary_startup_64+0xa4/0xb0 > 2019-09-12 12:41:48 ---[ end trace 3241d99856ac4582 ]--- > 2019-09-12 12:41:48 igb 0000:05:00.0 eth0: Reset adapter > ------------------------------- > > Stefan > Am 11.09.19 um 15:59 schrieb Stefan Priebe - Profihost AG: >> HI, >> >> i've now tried v5.2.14 but that one died with - i don't know which >> version to try... now >> >> 2019-09-11 15:41:09 ------------[ cut here ]------------ >> 2019-09-11 15:41:09 kernel BUG at mm/page-writeback.c:2655! 
>> 2019-09-11 15:41:09 invalid opcode: 0000 [#1] SMP PTI >> 2019-09-11 15:41:09 CPU: 4 PID: 466 Comm: kworker/u24:6 Not tainted >> 5.2.14 #1 >> 2019-09-11 15:41:09 Hardware name: Supermicro Super Server/X10SRi-F, >> BIOS 1.0b 04/21/2015 >> 2019-09-11 15:41:09 Workqueue: btrfs-delalloc btrfs_delalloc_helper >> [btrfs] >> 2019-09-11 15:41:09 RIP: 0010:clear_page_dirty_for_io+0xfc/0x210 >> 2019-09-11 15:41:09 Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00 >> 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 >> 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85 >> d2 48 8b >> 2019-09-11 15:41:09 RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246 >> 2019-09-11 15:41:09 RAX: 001000000004205c RBX: ffffe660525b3140 RCX: >> 0000000000000000 >> 2019-09-11 15:41:09 RDX: 0000000000000000 RSI: 0000000000000006 RDI: >> ffffe660525b3140 >> 2019-09-11 15:41:09 RBP: ffff9ad639868818 R08: 0000000000000001 R09: >> 000000000002de18 >> 2019-09-11 15:41:09 R10: 0000000000000002 R11: ffff9ade7ffd6000 R12: >> 0000000000000000 >> 2019-09-11 15:41:09 R13: 0000000000000001 R14: 0000000000000000 R15: >> ffffbd4b8d2f3d08 >> 2019-09-11 15:41:09 FS: 0000000000000000(0000) >> GS:ffff9ade3f900000(0000) knlGS:0000000000000000 >> 2019-09-11 15:41:09 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> 2019-09-11 15:41:09 CR2: 000055fa10d2bf70 CR3: 00000005a420a002 CR4: >> 00000000001606e0 >> 2019-09-11 15:41:09 Call Trace: >> 2019-09-11 15:41:09 __process_pages_contig+0x270/0x360 [btrfs] >> 2019-09-11 15:41:09 submit_compressed_extents+0x39d/0x460 [btrfs] >> 2019-09-11 15:41:09 normal_work_helper+0x20f/0x320 >> [btrfs]process_one_work+0x18b/0x380worker_thread+0x4f/0x3a0 >> 2019-09-11 15:41:09 ? rescuer_thread+0x330/0x330kthread+0xf8/0x130 >> 2019-09-11 15:41:09 ? 
>> kthread_create_worker_on_cpu+0x70/0x70ret_from_fork+0x35/0x40 >> 2019-09-11 15:41:09 Modules linked in: netconsole xt_tcpudp xt_owner >> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport >> ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter fuse >> ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac >> x86_pkg_temp_thermal coretemp kvm_intel ast kvm ttm drm_kms_helper >> irqbypass crc32_pclmul drm fb_sys_fops syscopyarea lpc_ich sysfillrect >> ghash_clmulni_intel sysimgblt mfd_core sg wmi ipmi_si ipmi_devintf >> ipmi_msghandler button ip_tables x_tables btrfs zstd_decompress >> zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq >> async_xor async_tx xor usbhid raid6_pq raid1 raid0 multipath linear >> md_mod sd_mod xhci_pci ehci_pci igb xhci_hcd ehci_hcd i2c_algo_bit >> i2c_i801 ahci ptp i2c_core usbcore libahci usb_common pps_core megaraid_sas >> 2019-09-11 15:41:09 ---[ end trace d9a3f99c047dc8bf ]--- >> 2019-09-11 15:41:10 RIP: 0010:clear_page_dirty_for_io+0xfc/0x210 >> 2019-09-11 15:41:10 Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00 >> 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 >> 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85 >> d2 48 8b >> 2019-09-11 15:41:10 RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246 >> 2019-09-11 15:41:10 RAX: 001000000004205c RBX: ffffe660525b3140 RCX: >> 0000000000000000 >> 2019-09-11 15:41:10 RDX: 0000000000000000 RSI: 0000000000000006 RDI: >> ffffe660525b3140 >> 2019-09-11 15:41:10 RBP: ffff9ad639868818 R08: 0000000000000001 R09: >> 000000000002de18 >> 2019-09-11 15:41:10 R10: 0000000000000002 R11: ffff9ade7ffd6000 R12: >> 0000000000000000 >> 2019-09-11 15:41:10 R13: 0000000000000001 R14: 0000000000000000 R15: >> ffffbd4b8d2f3d08 >> 2019-09-11 15:41:10 FS: 0000000000000000(0000) >> GS:ffff9ade3f900000(0000) knlGS:0000000000000000 >> 2019-09-11 15:41:10 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> 2019-09-11 15:41:10 CR2: 
000055fa10d2bf70 CR3: 00000005a420a002 CR4: >> 00000000001606e0 >> 2019-09-11 15:41:10 Kernel panic - not syncing: Fatal exception >> 2019-09-11 15:41:10 Kernel Offset: 0x1a000000 from >> 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) >> 2019-09-11 15:41:10 Rebooting in 20 seconds.. >> 2019-09-11 15:41:29 ACPI MEMORY or I/O RESET_REG. >> >> Stefan >> Am 11.09.19 um 08:24 schrieb Stefan Priebe - Profihost AG: >>> Hi Michal, >>> >>> Am 11.09.19 um 08:12 schrieb Stefan Priebe - Profihost AG: >>>> Hi Michal, >>>> Am 10.09.19 um 15:24 schrieb Michal Hocko: >>>>> On Tue 10-09-19 15:14:45, Stefan Priebe - Profihost AG wrote: >>>>>> Am 10.09.19 um 15:05 schrieb Stefan Priebe - Profihost AG: >>>>>>> >>>>>>> Am 10.09.19 um 14:57 schrieb Michal Hocko: >>>>>>>> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote: >>>>>>>>> Hello Michal, >>>>>>>>> >>>>>>>>> ok this might take a long time. Attached you'll find a graph from a >>>>>>>>> fresh boot what happens over time (here 17 August to 30 August). Memory >>>>>>>>> Usage decreases as well as cache but slowly and only over time and days. >>>>>>>>> >>>>>>>>> So it might take 2-3 weeks running Kernel 5.3 to see what happens. >>>>>>>> >>>>>>>> No problem. Just make sure to collect the requested data from the time >>>>>>>> you see the actual problem. Btw. you try my very dumb scriplets to get >>>>>>>> an idea of how much memory gets reclaimed due to THP. >>>>>>> >>>>>>> You mean your sed and sort on top of the trace file? No i did not with >>>>>>> the current 5.3 kernel do you think it will show anything interesting? >>>>>>> Which line shows me how much memory gets reclaimed due to THP? >>>>> >>>>> Please re-read http://lkml.kernel.org/r/20190910082919.GL2063@dhcp22.suse.cz >>>>> Each command has a commented output. If you see nunmber of reclaimed >>>>> pages to be large for GFP_TRANSHUGE then you are seeing a similar >>>>> problem. >>>>> >>>>>> Is something like a kernel memory leak possible? 
Or wouldn't this end up >>>>>> in having a lot of free memory which doesn't seem usable? >>>>> >>>>> I would be really surprised if this was the case. >>>>> >>>>>> I also wonder why a reclaim takes place when there is enough memory. >>>>> >>>>> This is not clear yet, and it might be a bug that has been fixed since >>>>> 4.18. That's why we need to see whether the same pattern is happening >>>>> with 5.3 as well. >>> >>> But apart from the btrfs problem, the memory consumption looks far >>> better than before. >>> >>> Running 4.19.X: >>> after about 12h the cache starts to drop from 30G to 24G >>> >>> Running 5.3-rc8: >>> after about 24h the cache is still constant at nearly 30G >>> >>> Greets, >>> Stefan >>> ^ permalink raw reply [flat|nested] 61+ messages in thread
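The check Michal refers to (his "sed and sort on top of the trace file") can be approximated by summing nr_reclaimed per gfp mask over the vmscan tracepoint output. The sketch below is a stand-in, not the scriptlet from the linked mail; the trace-line layout and the sample numbers are illustrative assumptions, and a real run would read `/sys/kernel/debug/tracing/trace` with the vmscan events enabled instead of the inlined sample lines:

```shell
# Sum nr_reclaimed per gfp_flags value from vmscan trace output.
# Two fabricated sample lines keep the pipeline self-contained;
# replace the printf with: cat /sys/kernel/debug/tracing/trace
result="$(printf '%s\n' \
  'kswapd0-123 [000] 100.0: mm_vmscan_lru_shrink_inactive: nr_reclaimed=512 gfp_flags=GFP_KERNEL' \
  'php-456 [001] 101.0: mm_vmscan_lru_shrink_inactive: nr_reclaimed=2048 gfp_flags=GFP_TRANSHUGE' |
sed -n 's/.*nr_reclaimed=\([0-9]*\).*gfp_flags=\(.*\)$/\2 \1/p' |
awk '{ sum[$1] += $2 } END { for (f in sum) print f, sum[f] }' |
sort)"
printf '%s\n' "$result"
```

If the GFP_TRANSHUGE total dominates, reclaim is being driven by THP allocations, which is exactly the signature Michal describes.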
* 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) 2019-09-11 6:12 ` Stefan Priebe - Profihost AG 2019-09-11 6:24 ` Stefan Priebe - Profihost AG @ 2019-09-11 7:09 ` Michal Hocko 2019-09-11 14:09 ` Stefan Priebe - Profihost AG 2019-09-11 14:56 ` Filipe Manana 2019-09-19 10:21 ` lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG 2 siblings, 2 replies; 61+ messages in thread From: Michal Hocko @ 2019-09-11 7:09 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka, Jens Axboe, linux-block, linux-fsdevel, David Sterba, linux-btrfs This smells like IO/Btrfs issue to me. Cc some more people. On Wed 11-09-19 08:12:28, Stefan Priebe - Profihost AG wrote: [...] > Sadly i'm running into issues with btrfs on 5.3-rc8 - the rsync process > on backup disk completely hangs / is blocked at 100% i/o: > [54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds. > [54739.066973] Not tainted 5.3.0-rc8 #1 > [54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [54739.069065] rsync D 0 9830 9829 0x00004002 > [54739.070146] Call Trace: > [54739.071183] ? __schedule+0x3cf/0x680 > [54739.072202] ? bit_wait+0x50/0x50 > [54739.073196] schedule+0x39/0xa0 > [54739.074213] io_schedule+0x12/0x40 > [54739.075219] bit_wait_io+0xd/0x50 > [54739.076227] __wait_on_bit+0x66/0x90 > [54739.077239] ? bit_wait+0x50/0x50 > [54739.078273] out_of_line_wait_on_bit+0x8b/0xb0 > [54739.078741] ? init_wait_var_entry+0x40/0x40 > [54739.079162] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > [54739.079557] btree_write_cache_pages+0x17d/0x350 [btrfs] > [54739.079956] ? btrfs_set_token_32+0x72/0x130 [btrfs] > [54739.080357] ? merge_state.part.47+0x3f/0x160 [btrfs] > [54739.080748] do_writepages+0x1a/0x60 > [54739.081140] __filemap_fdatawrite_range+0xc8/0x100 > [54739.081558] ? 
convert_extent_bit+0x2e8/0x580 [btrfs] > [54739.081985] btrfs_write_marked_extents+0x141/0x160 [btrfs] > [54739.082412] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > [54739.082847] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [54739.083280] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [54739.083725] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > [54739.084170] btrfs_sync_file+0x395/0x3e0 [btrfs] > [54739.084608] ? retarget_shared_pending+0x70/0x70 > [54739.085049] do_fsync+0x38/0x60 > [54739.085494] __x64_sys_fdatasync+0x13/0x20 > [54739.085944] do_syscall_64+0x55/0x1a0 > [54739.086395] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [54739.086850] RIP: 0033:0x7f1db3fc85f0 > [54739.087310] Code: Bad RIP value. > [54739.087772] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > 000000000000004b > [54739.088249] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > 00007f1db3fc85f0 > [54739.088733] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > 0000000000000001 > [54739.089234] RBP: 0000000000000001 R08: 0000000000000000 R09: > 0000000081c492ca > [54739.089722] R10: 0000000000000008 R11: 0000000000000246 R12: > 0000000000000028 > [54739.090205] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > 0000000000000000 > [54859.899715] INFO: task rsync:9830 blocked for more than 241 seconds. > [54859.900863] Not tainted 5.3.0-rc8 #1 > [54859.901885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [54859.902909] rsync D 0 9830 9829 0x00004002 > [54859.903930] Call Trace: > [54859.904888] ? __schedule+0x3cf/0x680 > [54859.905831] ? bit_wait+0x50/0x50 > [54859.906751] schedule+0x39/0xa0 > [54859.907653] io_schedule+0x12/0x40 > [54859.908535] bit_wait_io+0xd/0x50 > [54859.909441] __wait_on_bit+0x66/0x90 > [54859.910306] ? bit_wait+0x50/0x50 > [54859.911177] out_of_line_wait_on_bit+0x8b/0xb0 > [54859.912043] ? 
init_wait_var_entry+0x40/0x40 > [54859.912727] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > [54859.913113] btree_write_cache_pages+0x17d/0x350 [btrfs] > [54859.913501] ? btrfs_set_token_32+0x72/0x130 [btrfs] > [54859.913894] ? merge_state.part.47+0x3f/0x160 [btrfs] > [54859.914276] do_writepages+0x1a/0x60 > [54859.914656] __filemap_fdatawrite_range+0xc8/0x100 > [54859.915052] ? convert_extent_bit+0x2e8/0x580 [btrfs] > [54859.915449] btrfs_write_marked_extents+0x141/0x160 [btrfs] > [54859.915855] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > [54859.916256] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [54859.916658] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [54859.917078] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > [54859.917497] btrfs_sync_file+0x395/0x3e0 [btrfs] > [54859.917903] ? retarget_shared_pending+0x70/0x70 > [54859.918307] do_fsync+0x38/0x60 > [54859.918707] __x64_sys_fdatasync+0x13/0x20 > [54859.919106] do_syscall_64+0x55/0x1a0 > [54859.919482] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [54859.919866] RIP: 0033:0x7f1db3fc85f0 > [54859.920243] Code: Bad RIP value. > [54859.920614] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > 000000000000004b > [54859.920997] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > 00007f1db3fc85f0 > [54859.921383] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > 0000000000000001 > [54859.921773] RBP: 0000000000000001 R08: 0000000000000000 R09: > 0000000081c492ca > [54859.922165] R10: 0000000000000008 R11: 0000000000000246 R12: > 0000000000000028 > [54859.922551] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > 0000000000000000 > [54980.733463] INFO: task rsync:9830 blocked for more than 362 seconds. > [54980.734061] Not tainted 5.3.0-rc8 #1 > [54980.734619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [54980.735209] rsync D 0 9830 9829 0x00004002 > [54980.735802] Call Trace: > [54980.736473] ? __schedule+0x3cf/0x680 > [54980.737054] ? 
bit_wait+0x50/0x50 > [54980.737664] schedule+0x39/0xa0 > [54980.738243] io_schedule+0x12/0x40 > [54980.738712] bit_wait_io+0xd/0x50 > [54980.739171] __wait_on_bit+0x66/0x90 > [54980.739623] ? bit_wait+0x50/0x50 > [54980.740073] out_of_line_wait_on_bit+0x8b/0xb0 > [54980.740548] ? init_wait_var_entry+0x40/0x40 > [54980.741033] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > [54980.741579] btree_write_cache_pages+0x17d/0x350 [btrfs] > [54980.742076] ? btrfs_set_token_32+0x72/0x130 [btrfs] > [54980.742560] ? merge_state.part.47+0x3f/0x160 [btrfs] > [54980.743045] do_writepages+0x1a/0x60 > [54980.743516] __filemap_fdatawrite_range+0xc8/0x100 > [54980.744019] ? convert_extent_bit+0x2e8/0x580 [btrfs] > [54980.744513] btrfs_write_marked_extents+0x141/0x160 [btrfs] > [54980.745026] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > [54980.745563] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [54980.746073] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [54980.746575] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > [54980.747074] btrfs_sync_file+0x395/0x3e0 [btrfs] > [54980.747575] ? retarget_shared_pending+0x70/0x70 > [54980.748059] do_fsync+0x38/0x60 > [54980.748539] __x64_sys_fdatasync+0x13/0x20 > [54980.749012] do_syscall_64+0x55/0x1a0 > [54980.749512] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [54980.749995] RIP: 0033:0x7f1db3fc85f0 > [54980.750368] Code: Bad RIP value. > [54980.750735] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > 000000000000004b > [54980.751117] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > 00007f1db3fc85f0 > [54980.751505] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > 0000000000000001 > [54980.751895] RBP: 0000000000000001 R08: 0000000000000000 R09: > 0000000081c492ca > [54980.752291] R10: 0000000000000008 R11: 0000000000000246 R12: > 0000000000000028 > [54980.752680] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > 0000000000000000 > [55101.567251] INFO: task rsync:9830 blocked for more than 483 seconds. 
> [55101.567775] Not tainted 5.3.0-rc8 #1 > [55101.568218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [55101.568649] rsync D 0 9830 9829 0x00004002 > [55101.569101] Call Trace: > [55101.569609] ? __schedule+0x3cf/0x680 > [55101.570052] ? bit_wait+0x50/0x50 > [55101.570504] schedule+0x39/0xa0 > [55101.570938] io_schedule+0x12/0x40 > [55101.571404] bit_wait_io+0xd/0x50 > [55101.571934] __wait_on_bit+0x66/0x90 > [55101.572601] ? bit_wait+0x50/0x50 > [55101.573235] out_of_line_wait_on_bit+0x8b/0xb0 > [55101.573599] ? init_wait_var_entry+0x40/0x40 > [55101.574008] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > [55101.574394] btree_write_cache_pages+0x17d/0x350 [btrfs] > [55101.574783] ? btrfs_set_token_32+0x72/0x130 [btrfs] > [55101.575184] ? merge_state.part.47+0x3f/0x160 [btrfs] > [55101.575580] do_writepages+0x1a/0x60 > [55101.575959] __filemap_fdatawrite_range+0xc8/0x100 > [55101.576351] ? convert_extent_bit+0x2e8/0x580 [btrfs] > [55101.576746] btrfs_write_marked_extents+0x141/0x160 [btrfs] > [55101.577144] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > [55101.577543] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [55101.577939] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [55101.578343] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > [55101.578746] btrfs_sync_file+0x395/0x3e0 [btrfs] > [55101.579139] ? retarget_shared_pending+0x70/0x70 > [55101.579543] do_fsync+0x38/0x60 > [55101.579928] __x64_sys_fdatasync+0x13/0x20 > [55101.580312] do_syscall_64+0x55/0x1a0 > [55101.580706] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [55101.581086] RIP: 0033:0x7f1db3fc85f0 > [55101.581463] Code: Bad RIP value. 
> [55101.581834] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > 000000000000004b > [55101.582219] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > 00007f1db3fc85f0 > [55101.582607] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > 0000000000000001 > [55101.582998] RBP: 0000000000000001 R08: 0000000000000000 R09: > 0000000081c492ca > [55101.583397] R10: 0000000000000008 R11: 0000000000000246 R12: > 0000000000000028 > [55101.583784] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > 0000000000000000 > [55222.405056] INFO: task rsync:9830 blocked for more than 604 seconds. > [55222.405773] Not tainted 5.3.0-rc8 #1 > [55222.406456] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [55222.407158] rsync D 0 9830 9829 0x00004002 > [55222.407776] Call Trace: > [55222.408450] ? __schedule+0x3cf/0x680 > [55222.409206] ? bit_wait+0x50/0x50 > [55222.409942] schedule+0x39/0xa0 > [55222.410658] io_schedule+0x12/0x40 > [55222.411346] bit_wait_io+0xd/0x50 > [55222.411946] __wait_on_bit+0x66/0x90 > [55222.412572] ? bit_wait+0x50/0x50 > [55222.413249] out_of_line_wait_on_bit+0x8b/0xb0 > [55222.413944] ? init_wait_var_entry+0x40/0x40 > [55222.414675] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > [55222.415362] btree_write_cache_pages+0x17d/0x350 [btrfs] > [55222.416085] ? btrfs_set_token_32+0x72/0x130 [btrfs] > [55222.416796] ? merge_state.part.47+0x3f/0x160 [btrfs] > [55222.417505] do_writepages+0x1a/0x60 > [55222.418243] __filemap_fdatawrite_range+0xc8/0x100 > [55222.418969] ? convert_extent_bit+0x2e8/0x580 [btrfs] > [55222.419713] btrfs_write_marked_extents+0x141/0x160 [btrfs] > [55222.420453] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > [55222.421206] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [55222.421925] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [55222.422656] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > [55222.423400] btrfs_sync_file+0x395/0x3e0 [btrfs] > [55222.424140] ? 
retarget_shared_pending+0x70/0x70 > [55222.424861] do_fsync+0x38/0x60 > [55222.425581] __x64_sys_fdatasync+0x13/0x20 > [55222.426308] do_syscall_64+0x55/0x1a0 > [55222.427025] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [55222.427732] RIP: 0033:0x7f1db3fc85f0 > [55222.428396] Code: Bad RIP value. > [55222.429087] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > 000000000000004b > [55222.429757] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > 00007f1db3fc85f0 > [55222.430451] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > 0000000000000001 > [55222.431159] RBP: 0000000000000001 R08: 0000000000000000 R09: > 0000000081c492ca > [55222.431856] R10: 0000000000000008 R11: 0000000000000246 R12: > 0000000000000028 > [55222.432544] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > 0000000000000000 > [55343.234863] INFO: task rsync:9830 blocked for more than 724 seconds. > [55343.235887] Not tainted 5.3.0-rc8 #1 > [55343.236611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [55343.237213] rsync D 0 9830 9829 0x00004002 > [55343.237766] Call Trace: > [55343.238353] ? __schedule+0x3cf/0x680 > [55343.238971] ? bit_wait+0x50/0x50 > [55343.239592] schedule+0x39/0xa0 > [55343.240173] io_schedule+0x12/0x40 > [55343.240721] bit_wait_io+0xd/0x50 > [55343.241266] __wait_on_bit+0x66/0x90 > [55343.241835] ? bit_wait+0x50/0x50 > [55343.242418] out_of_line_wait_on_bit+0x8b/0xb0 > [55343.242938] ? init_wait_var_entry+0x40/0x40 > [55343.243496] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > [55343.244090] btree_write_cache_pages+0x17d/0x350 [btrfs] > [55343.244720] ? btrfs_set_token_32+0x72/0x130 [btrfs] > [55343.245296] ? merge_state.part.47+0x3f/0x160 [btrfs] > [55343.245843] do_writepages+0x1a/0x60 > [55343.246407] __filemap_fdatawrite_range+0xc8/0x100 > [55343.247014] ? 
convert_extent_bit+0x2e8/0x580 [btrfs] > [55343.247631] btrfs_write_marked_extents+0x141/0x160 [btrfs] > [55343.248186] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > [55343.248743] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [55343.249326] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [55343.249931] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > [55343.250562] btrfs_sync_file+0x395/0x3e0 [btrfs] > [55343.251139] ? retarget_shared_pending+0x70/0x70 > [55343.251628] do_fsync+0x38/0x60 > [55343.252208] __x64_sys_fdatasync+0x13/0x20 > [55343.252702] do_syscall_64+0x55/0x1a0 > [55343.253212] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [55343.253798] RIP: 0033:0x7f1db3fc85f0 > [55343.254294] Code: Bad RIP value. > [55343.254821] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > 000000000000004b > [55343.255404] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > 00007f1db3fc85f0 > [55343.255989] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > 0000000000000001 > [55343.256521] RBP: 0000000000000001 R08: 0000000000000000 R09: > 0000000081c492ca > [55343.257073] R10: 0000000000000008 R11: 0000000000000246 R12: > 0000000000000028 > [55343.257649] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > 0000000000000000 > [55464.068704] INFO: task rsync:9830 blocked for more than 845 seconds. > [55464.069701] Not tainted 5.3.0-rc8 #1 > [55464.070655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [55464.071637] rsync D 0 9830 9829 0x00004002 > [55464.072637] Call Trace: > [55464.073623] ? __schedule+0x3cf/0x680 > [55464.074604] ? bit_wait+0x50/0x50 > [55464.075577] schedule+0x39/0xa0 > [55464.076531] io_schedule+0x12/0x40 > [55464.077480] bit_wait_io+0xd/0x50 > [55464.078400] __wait_on_bit+0x66/0x90 > [55464.079300] ? bit_wait+0x50/0x50 > [55464.080184] out_of_line_wait_on_bit+0x8b/0xb0 > [55464.081107] ? 
init_wait_var_entry+0x40/0x40 > [55464.082047] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > [55464.083001] btree_write_cache_pages+0x17d/0x350 [btrfs] > [55464.083963] ? btrfs_set_token_32+0x72/0x130 [btrfs] > [55464.084944] ? merge_state.part.47+0x3f/0x160 [btrfs] > [55464.085456] do_writepages+0x1a/0x60 > [55464.085840] __filemap_fdatawrite_range+0xc8/0x100 > [55464.086231] ? convert_extent_bit+0x2e8/0x580 [btrfs] > [55464.086625] btrfs_write_marked_extents+0x141/0x160 [btrfs] > [55464.087019] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > [55464.087417] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [55464.087814] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [55464.088219] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > [55464.088652] btrfs_sync_file+0x395/0x3e0 [btrfs] > [55464.089043] ? retarget_shared_pending+0x70/0x70 > [55464.089429] do_fsync+0x38/0x60 > [55464.089811] __x64_sys_fdatasync+0x13/0x20 > [55464.090190] do_syscall_64+0x55/0x1a0 > [55464.090568] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [55464.090944] RIP: 0033:0x7f1db3fc85f0 > [55464.091321] Code: Bad RIP value. > [55464.091693] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > 000000000000004b > [55464.092078] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > 00007f1db3fc85f0 > [55464.092467] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > 0000000000000001 > [55464.092863] RBP: 0000000000000001 R08: 0000000000000000 R09: > 0000000081c492ca > [55464.093254] R10: 0000000000000008 R11: 0000000000000246 R12: > 0000000000000028 > [55464.093643] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > 0000000000000000 > [55584.902564] INFO: task rsync:9830 blocked for more than 966 seconds. > [55584.903748] Not tainted 5.3.0-rc8 #1 > [55584.904868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [55584.906023] rsync D 0 9830 9829 0x00004002 > [55584.907207] Call Trace: > [55584.908355] ? __schedule+0x3cf/0x680 > [55584.909507] ? 
bit_wait+0x50/0x50 > [55584.910682] schedule+0x39/0xa0 > [55584.911230] io_schedule+0x12/0x40 > [55584.911666] bit_wait_io+0xd/0x50 > [55584.912092] __wait_on_bit+0x66/0x90 > [55584.912510] ? bit_wait+0x50/0x50 > [55584.912924] out_of_line_wait_on_bit+0x8b/0xb0 > [55584.913343] ? init_wait_var_entry+0x40/0x40 > [55584.913795] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > [55584.914242] btree_write_cache_pages+0x17d/0x350 [btrfs] > [55584.914698] ? btrfs_set_token_32+0x72/0x130 [btrfs] > [55584.915152] ? merge_state.part.47+0x3f/0x160 [btrfs] > [55584.915588] do_writepages+0x1a/0x60 > [55584.916022] __filemap_fdatawrite_range+0xc8/0x100 > [55584.916474] ? convert_extent_bit+0x2e8/0x580 [btrfs] > [55584.916928] btrfs_write_marked_extents+0x141/0x160 [btrfs] > [55584.917386] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > [55584.917844] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [55584.918300] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [55584.918772] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > [55584.919233] btrfs_sync_file+0x395/0x3e0 [btrfs] > [55584.919679] ? retarget_shared_pending+0x70/0x70 > [55584.920122] do_fsync+0x38/0x60 > [55584.920559] __x64_sys_fdatasync+0x13/0x20 > [55584.920996] do_syscall_64+0x55/0x1a0 > [55584.921429] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [55584.921865] RIP: 0033:0x7f1db3fc85f0 > [55584.922298] Code: Bad RIP value. > [55584.922734] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > 000000000000004b > [55584.923174] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > 00007f1db3fc85f0 > [55584.923568] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > 0000000000000001 > [55584.923982] RBP: 0000000000000001 R08: 0000000000000000 R09: > 0000000081c492ca > [55584.924378] R10: 0000000000000008 R11: 0000000000000246 R12: > 0000000000000028 > [55584.924774] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > 0000000000000000 > [55705.736285] INFO: task rsync:9830 blocked for more than 1087 seconds. 
> [55705.736999] Not tainted 5.3.0-rc8 #1 > [55705.737694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [55705.738411] rsync D 0 9830 9829 0x00004002 > [55705.739072] Call Trace: > [55705.739455] ? __schedule+0x3cf/0x680 > [55705.739837] ? bit_wait+0x50/0x50 > [55705.740215] schedule+0x39/0xa0 > [55705.740610] io_schedule+0x12/0x40 > [55705.741243] bit_wait_io+0xd/0x50 > [55705.741897] __wait_on_bit+0x66/0x90 > [55705.742524] ? bit_wait+0x50/0x50 > [55705.743131] out_of_line_wait_on_bit+0x8b/0xb0 > [55705.743750] ? init_wait_var_entry+0x40/0x40 > [55705.744128] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > [55705.744766] btree_write_cache_pages+0x17d/0x350 [btrfs] > [55705.745440] ? btrfs_set_token_32+0x72/0x130 [btrfs] > [55705.746118] ? merge_state.part.47+0x3f/0x160 [btrfs] > [55705.746753] do_writepages+0x1a/0x60 > [55705.747411] __filemap_fdatawrite_range+0xc8/0x100 > [55705.748106] ? convert_extent_bit+0x2e8/0x580 [btrfs] > [55705.748807] btrfs_write_marked_extents+0x141/0x160 [btrfs] > [55705.749495] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > [55705.750190] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [55705.750890] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [55705.751580] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > [55705.752293] btrfs_sync_file+0x395/0x3e0 [btrfs] > [55705.752981] ? retarget_shared_pending+0x70/0x70 > [55705.753686] do_fsync+0x38/0x60 > [55705.754340] __x64_sys_fdatasync+0x13/0x20 > [55705.755012] do_syscall_64+0x55/0x1a0 > [55705.755678] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [55705.756375] RIP: 0033:0x7f1db3fc85f0 > [55705.757042] Code: Bad RIP value. 
> [55705.757690] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > 000000000000004b > [55705.758300] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > 00007f1db3fc85f0 > [55705.758678] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > 0000000000000001 > [55705.759107] RBP: 0000000000000001 R08: 0000000000000000 R09: > 0000000081c492ca > [55705.759785] R10: 0000000000000008 R11: 0000000000000246 R12: > 0000000000000028 > [55705.760471] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > 0000000000000000 > [55826.570182] INFO: task rsync:9830 blocked for more than 1208 seconds. > [55826.571349] Not tainted 5.3.0-rc8 #1 > [55826.572469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > [55826.573618] rsync D 0 9830 9829 0x00004002 > [55826.574790] Call Trace: > [55826.575932] ? __schedule+0x3cf/0x680 > [55826.577079] ? bit_wait+0x50/0x50 > [55826.578233] schedule+0x39/0xa0 > [55826.579350] io_schedule+0x12/0x40 > [55826.580451] bit_wait_io+0xd/0x50 > [55826.581527] __wait_on_bit+0x66/0x90 > [55826.582596] ? bit_wait+0x50/0x50 > [55826.583178] out_of_line_wait_on_bit+0x8b/0xb0 > [55826.583550] ? init_wait_var_entry+0x40/0x40 > [55826.583953] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > [55826.584356] btree_write_cache_pages+0x17d/0x350 [btrfs] > [55826.584755] ? btrfs_set_token_32+0x72/0x130 [btrfs] > [55826.585155] ? merge_state.part.47+0x3f/0x160 [btrfs] > [55826.585547] do_writepages+0x1a/0x60 > [55826.585937] __filemap_fdatawrite_range+0xc8/0x100 > [55826.586352] ? convert_extent_bit+0x2e8/0x580 [btrfs] > [55826.586761] btrfs_write_marked_extents+0x141/0x160 [btrfs] > [55826.587171] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > [55826.587581] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [55826.587990] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > [55826.588406] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > [55826.588818] btrfs_sync_file+0x395/0x3e0 [btrfs] > [55826.589219] ? 
retarget_shared_pending+0x70/0x70 > [55826.589617] do_fsync+0x38/0x60 > [55826.590011] __x64_sys_fdatasync+0x13/0x20 > [55826.590411] do_syscall_64+0x55/0x1a0 > [55826.590798] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [55826.591185] RIP: 0033:0x7f1db3fc85f0 > [55826.591572] Code: Bad RIP value. > [55826.591952] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > 000000000000004b > [55826.592347] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > 00007f1db3fc85f0 > [55826.592743] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > 0000000000000001 > [55826.593143] RBP: 0000000000000001 R08: 0000000000000000 R09: > 0000000081c492ca > [55826.593543] R10: 0000000000000008 R11: 0000000000000246 R12: > 0000000000000028 > [55826.593941] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > 0000000000000000 > > > Greets, > Stefan -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 61+ messages in thread
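The hung-task watchdog above re-reports the identical rsync stack roughly every 120 seconds, so the long dump collapses to a single stuck fdatasync. When triaging such logs it helps to count reports per task first; a small sketch over a sample of the lines above (a real run would pipe in `dmesg` or the serial log instead of the inlined sample):

```shell
# Count hung-task reports per task:pid. Repeated identical reports,
# as in the dump above, collapse to one counted entry.
result="$(printf '%s\n' \
  '[54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds.' \
  '[54859.899715] INFO: task rsync:9830 blocked for more than 241 seconds.' \
  '[54980.733463] INFO: task rsync:9830 blocked for more than 362 seconds.' |
sed -n 's/.*INFO: task \([^ ]*\) blocked.*/\1/p' | sort | uniq -c)"
printf '%s\n' "$result"
```

Here every report comes from the same rsync task (pid 9830), i.e. one blocked transaction commit rather than many independent hangs.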
* Re: 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) 2019-09-11 7:09 ` 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) Michal Hocko @ 2019-09-11 14:09 ` Stefan Priebe - Profihost AG 2019-09-11 14:56 ` Filipe Manana 1 sibling, 0 replies; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-11 14:09 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka, Jens Axboe, linux-block, linux-fsdevel, David Sterba, linux-btrfs Hi, i've now tried v5.2.14, but that one died with the following - i don't know which version to try now... 2019-09-11 15:41:09 ------------[ cut here ]------------ 2019-09-11 15:41:09 kernel BUG at mm/page-writeback.c:2655! 2019-09-11 15:41:09 invalid opcode: 0000 [#1] SMP PTI 2019-09-11 15:41:09 CPU: 4 PID: 466 Comm: kworker/u24:6 Not tainted 5.2.14 #1 2019-09-11 15:41:09 Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015 2019-09-11 15:41:09 Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs] 2019-09-11 15:41:09 RIP: 0010:clear_page_dirty_for_io+0xfc/0x210 2019-09-11 15:41:09 Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85 d2 48 8b 2019-09-11 15:41:09 RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246 2019-09-11 15:41:09 RAX: 001000000004205c RBX: ffffe660525b3140 RCX: 0000000000000000 2019-09-11 15:41:09 RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffe660525b3140 2019-09-11 15:41:09 RBP: ffff9ad639868818 R08: 0000000000000001 R09: 000000000002de18 2019-09-11 15:41:09 R10: 0000000000000002 R11: ffff9ade7ffd6000 R12: 0000000000000000 2019-09-11 15:41:09 R13: 0000000000000001 R14: 0000000000000000 R15: ffffbd4b8d2f3d08 2019-09-11 15:41:09 FS: 0000000000000000(0000) GS:ffff9ade3f900000(0000) knlGS:0000000000000000 2019-09-11 15:41:09 CS: 0010 DS: 0000 ES: 
0000 CR0: 0000000080050033 2019-09-11 15:41:09 CR2: 000055fa10d2bf70 CR3: 00000005a420a002 CR4: 00000000001606e0 2019-09-11 15:41:09 Call Trace: 2019-09-11 15:41:09 __process_pages_contig+0x270/0x360 [btrfs] 2019-09-11 15:41:09 submit_compressed_extents+0x39d/0x460 [btrfs] 2019-09-11 15:41:09 normal_work_helper+0x20f/0x320 [btrfs]process_one_work+0x18b/0x380worker_thread+0x4f/0x3a0 2019-09-11 15:41:09 ? rescuer_thread+0x330/0x330kthread+0xf8/0x130 2019-09-11 15:41:09 ? kthread_create_worker_on_cpu+0x70/0x70ret_from_fork+0x35/0x40 2019-09-11 15:41:09 Modules linked in: netconsole xt_tcpudp xt_owner xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter fuse ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac x86_pkg_temp_thermal coretemp kvm_intel ast kvm ttm drm_kms_helper irqbypass crc32_pclmul drm fb_sys_fops syscopyarea lpc_ich sysfillrect ghash_clmulni_intel sysimgblt mfd_core sg wmi ipmi_si ipmi_devintf ipmi_msghandler button ip_tables x_tables btrfs zstd_decompress zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid1 raid0 multipath linear md_mod sd_mod xhci_pci ehci_pci igb xhci_hcd ehci_hcd i2c_algo_bit i2c_i801 ahci ptp i2c_core usbcore libahci usb_common pps_core megaraid_sas 2019-09-11 15:41:09 ---[ end trace d9a3f99c047dc8bf ]--- 2019-09-11 15:41:10 RIP: 0010:clear_page_dirty_for_io+0xfc/0x210 2019-09-11 15:41:10 Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85 d2 48 8b 2019-09-11 15:41:10 RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246 2019-09-11 15:41:10 RAX: 001000000004205c RBX: ffffe660525b3140 RCX: 0000000000000000 2019-09-11 15:41:10 RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffe660525b3140 2019-09-11 15:41:10 RBP: ffff9ad639868818 R08: 0000000000000001 R09: 
000000000002de18 2019-09-11 15:41:10 R10: 0000000000000002 R11: ffff9ade7ffd6000 R12: 0000000000000000 2019-09-11 15:41:10 R13: 0000000000000001 R14: 0000000000000000 R15: ffffbd4b8d2f3d08 2019-09-11 15:41:10 FS: 0000000000000000(0000) GS:ffff9ade3f900000(0000) knlGS:0000000000000000 2019-09-11 15:41:10 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 2019-09-11 15:41:10 CR2: 000055fa10d2bf70 CR3: 00000005a420a002 CR4: 00000000001606e0 2019-09-11 15:41:10 Kernel panic - not syncing: Fatal exception 2019-09-11 15:41:10 Kernel Offset: 0x1a000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) 2019-09-11 15:41:10 Rebooting in 20 seconds.. 2019-09-11 15:41:29 ACPI MEMORY or I/O RESET_REG. Am 11.09.19 um 09:09 schrieb Michal Hocko: > This smells like IO/Btrfs issue to me. Cc some more people. > > On Wed 11-09-19 08:12:28, Stefan Priebe - Profihost AG wrote: > [...] >> Sadly i'm running into issues with btrfs on 5.3-rc8 - the rsync process >> on backup disk completely hangs / is blocked at 100% i/o: >> [54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds. >> [54739.066973] Not tainted 5.3.0-rc8 #1 >> [54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >> disables this message. >> [54739.069065] rsync D 0 9830 9829 0x00004002 >> [54739.070146] Call Trace: >> [54739.071183] ? __schedule+0x3cf/0x680 >> [54739.072202] ? bit_wait+0x50/0x50 >> [54739.073196] schedule+0x39/0xa0 >> [54739.074213] io_schedule+0x12/0x40 >> [54739.075219] bit_wait_io+0xd/0x50 >> [54739.076227] __wait_on_bit+0x66/0x90 >> [54739.077239] ? bit_wait+0x50/0x50 >> [54739.078273] out_of_line_wait_on_bit+0x8b/0xb0 >> [54739.078741] ? init_wait_var_entry+0x40/0x40 >> [54739.079162] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >> [54739.079557] btree_write_cache_pages+0x17d/0x350 [btrfs] >> [54739.079956] ? btrfs_set_token_32+0x72/0x130 [btrfs] >> [54739.080357] ? 
merge_state.part.47+0x3f/0x160 [btrfs]
>> [54739.080748] do_writepages+0x1a/0x60
>> [54739.081140] __filemap_fdatawrite_range+0xc8/0x100
>> [54739.081558] ? convert_extent_bit+0x2e8/0x580 [btrfs]
>> [54739.081985] btrfs_write_marked_extents+0x141/0x160 [btrfs]
>> [54739.082412] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
>> [54739.082847] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [54739.083280] btrfs_commit_transaction+0x752/0x9d0 [btrfs]
>> [54739.083725] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
>> [54739.084170] btrfs_sync_file+0x395/0x3e0 [btrfs]
>> [54739.084608] ? retarget_shared_pending+0x70/0x70
>> [54739.085049] do_fsync+0x38/0x60
>> [54739.085494] __x64_sys_fdatasync+0x13/0x20
>> [54739.085944] do_syscall_64+0x55/0x1a0
>> [54739.086395] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [54739.086850] RIP: 0033:0x7f1db3fc85f0
>> [54739.087310] Code: Bad RIP value.
>> [54739.087772] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX:
>> 000000000000004b
>> [54739.088249] RAX: ffffffffffffffda RBX: 0000000000000001 RCX:
>> 00007f1db3fc85f0
>> [54739.088733] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI:
>> 0000000000000001
>> [54739.089234] RBP: 0000000000000001 R08: 0000000000000000 R09:
>> 0000000081c492ca
>> [54739.089722] R10: 0000000000000008 R11: 0000000000000246 R12:
>> 0000000000000028
>> [54739.090205] R13: 00007ffe6f827e40 R14: 0000000000000000 R15:
>> 0000000000000000
>> [identical hung-task reports for rsync:9830, with the same call trace, repeat at 241, 362, 483, 604, 724, 845, 966, 1087 and 1208 seconds - snipped]
>>
>> Greets,
>> Stefan
>

^ permalink raw reply	[flat|nested] 61+ messages in thread
* Re: 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI)
  2019-09-11  7:09 ` 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) Michal Hocko
  2019-09-11 14:09   ` Stefan Priebe - Profihost AG
@ 2019-09-11 14:56   ` Filipe Manana
  2019-09-11 15:39     ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 61+ messages in thread
From: Filipe Manana @ 2019-09-11 14:56 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Stefan Priebe - Profihost AG, linux-mm, l.roehrs, cgroups,
	Johannes Weiner, Vlastimil Babka, Jens Axboe, linux-block,
	linux-fsdevel, David Sterba, linux-btrfs

On Wed, Sep 11, 2019 at 8:10 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> This smells like IO/Btrfs issue to me. Cc some more people.
>
> On Wed 11-09-19 08:12:28, Stefan Priebe - Profihost AG wrote:
> [...]
> > Sadly i'm running into issues with btrfs on 5.3-rc8 - the rsync process
> > on backup disk completely hangs / is blocked at 100% i/o:
> > [54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds.
> > [54739.066973] Not tainted 5.3.0-rc8 #1
> > [54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > [54739.069065] rsync D 0 9830 9829 0x00004002
> > [54739.070146] Call Trace:
> > [54739.071183] ? __schedule+0x3cf/0x680
> > [54739.072202] ? bit_wait+0x50/0x50
> > [54739.073196] schedule+0x39/0xa0
> > [54739.074213] io_schedule+0x12/0x40
> > [54739.075219] bit_wait_io+0xd/0x50
> > [54739.076227] __wait_on_bit+0x66/0x90
> > [54739.077239] ? bit_wait+0x50/0x50
> > [54739.078273] out_of_line_wait_on_bit+0x8b/0xb0
> > [54739.078741] ? init_wait_var_entry+0x40/0x40
> > [54739.079162] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs]
> > [54739.079557] btree_write_cache_pages+0x17d/0x350 [btrfs]
> > [54739.079956] ? btrfs_set_token_32+0x72/0x130 [btrfs]
> > [54739.080357] ?
merge_state.part.47+0x3f/0x160 [btrfs]
> > [54739.080748] do_writepages+0x1a/0x60
> > [54739.081140] __filemap_fdatawrite_range+0xc8/0x100
> > [54739.081558] ? convert_extent_bit+0x2e8/0x580 [btrfs]
> > [54739.081985] btrfs_write_marked_extents+0x141/0x160 [btrfs]
> > [54739.082412] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs]
> > [54739.082847] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [54739.083280] btrfs_commit_transaction+0x752/0x9d0 [btrfs]
> > [54739.083725] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs]
> > [54739.084170] btrfs_sync_file+0x395/0x3e0 [btrfs]
> > [54739.084608] ? retarget_shared_pending+0x70/0x70
> > [54739.085049] do_fsync+0x38/0x60
> > [54739.085494] __x64_sys_fdatasync+0x13/0x20
> > [54739.085944] do_syscall_64+0x55/0x1a0
> > [54739.086395] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [54739.086850] RIP: 0033:0x7f1db3fc85f0
> > [54739.087310] Code: Bad RIP value.

It's a regression introduced in 5.2

Fix just sent:
https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@kernel.org/T/#u

Thanks.

> > [remaining quoted hung-task reports, identical to the ones in the previous message, snipped]
merge_state.part.47+0x3f/0x160 [btrfs] > > [55464.085456] do_writepages+0x1a/0x60 > > [55464.085840] __filemap_fdatawrite_range+0xc8/0x100 > > [55464.086231] ? convert_extent_bit+0x2e8/0x580 [btrfs] > > [55464.086625] btrfs_write_marked_extents+0x141/0x160 [btrfs] > > [55464.087019] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > > [55464.087417] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > > [55464.087814] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > > [55464.088219] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > > [55464.088652] btrfs_sync_file+0x395/0x3e0 [btrfs] > > [55464.089043] ? retarget_shared_pending+0x70/0x70 > > [55464.089429] do_fsync+0x38/0x60 > > [55464.089811] __x64_sys_fdatasync+0x13/0x20 > > [55464.090190] do_syscall_64+0x55/0x1a0 > > [55464.090568] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > [55464.090944] RIP: 0033:0x7f1db3fc85f0 > > [55464.091321] Code: Bad RIP value. > > [55464.091693] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > > 000000000000004b > > [55464.092078] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > > 00007f1db3fc85f0 > > [55464.092467] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > > 0000000000000001 > > [55464.092863] RBP: 0000000000000001 R08: 0000000000000000 R09: > > 0000000081c492ca > > [55464.093254] R10: 0000000000000008 R11: 0000000000000246 R12: > > 0000000000000028 > > [55464.093643] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > > 0000000000000000 > > [55584.902564] INFO: task rsync:9830 blocked for more than 966 seconds. > > [55584.903748] Not tainted 5.3.0-rc8 #1 > > [55584.904868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > disables this message. > > [55584.906023] rsync D 0 9830 9829 0x00004002 > > [55584.907207] Call Trace: > > [55584.908355] ? __schedule+0x3cf/0x680 > > [55584.909507] ? 
bit_wait+0x50/0x50 > > [55584.910682] schedule+0x39/0xa0 > > [55584.911230] io_schedule+0x12/0x40 > > [55584.911666] bit_wait_io+0xd/0x50 > > [55584.912092] __wait_on_bit+0x66/0x90 > > [55584.912510] ? bit_wait+0x50/0x50 > > [55584.912924] out_of_line_wait_on_bit+0x8b/0xb0 > > [55584.913343] ? init_wait_var_entry+0x40/0x40 > > [55584.913795] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > > [55584.914242] btree_write_cache_pages+0x17d/0x350 [btrfs] > > [55584.914698] ? btrfs_set_token_32+0x72/0x130 [btrfs] > > [55584.915152] ? merge_state.part.47+0x3f/0x160 [btrfs] > > [55584.915588] do_writepages+0x1a/0x60 > > [55584.916022] __filemap_fdatawrite_range+0xc8/0x100 > > [55584.916474] ? convert_extent_bit+0x2e8/0x580 [btrfs] > > [55584.916928] btrfs_write_marked_extents+0x141/0x160 [btrfs] > > [55584.917386] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > > [55584.917844] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > > [55584.918300] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > > [55584.918772] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > > [55584.919233] btrfs_sync_file+0x395/0x3e0 [btrfs] > > [55584.919679] ? retarget_shared_pending+0x70/0x70 > > [55584.920122] do_fsync+0x38/0x60 > > [55584.920559] __x64_sys_fdatasync+0x13/0x20 > > [55584.920996] do_syscall_64+0x55/0x1a0 > > [55584.921429] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > [55584.921865] RIP: 0033:0x7f1db3fc85f0 > > [55584.922298] Code: Bad RIP value. 
> > [55584.922734] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > > 000000000000004b > > [55584.923174] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > > 00007f1db3fc85f0 > > [55584.923568] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > > 0000000000000001 > > [55584.923982] RBP: 0000000000000001 R08: 0000000000000000 R09: > > 0000000081c492ca > > [55584.924378] R10: 0000000000000008 R11: 0000000000000246 R12: > > 0000000000000028 > > [55584.924774] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > > 0000000000000000 > > [55705.736285] INFO: task rsync:9830 blocked for more than 1087 seconds. > > [55705.736999] Not tainted 5.3.0-rc8 #1 > > [55705.737694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > disables this message. > > [55705.738411] rsync D 0 9830 9829 0x00004002 > > [55705.739072] Call Trace: > > [55705.739455] ? __schedule+0x3cf/0x680 > > [55705.739837] ? bit_wait+0x50/0x50 > > [55705.740215] schedule+0x39/0xa0 > > [55705.740610] io_schedule+0x12/0x40 > > [55705.741243] bit_wait_io+0xd/0x50 > > [55705.741897] __wait_on_bit+0x66/0x90 > > [55705.742524] ? bit_wait+0x50/0x50 > > [55705.743131] out_of_line_wait_on_bit+0x8b/0xb0 > > [55705.743750] ? init_wait_var_entry+0x40/0x40 > > [55705.744128] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > > [55705.744766] btree_write_cache_pages+0x17d/0x350 [btrfs] > > [55705.745440] ? btrfs_set_token_32+0x72/0x130 [btrfs] > > [55705.746118] ? merge_state.part.47+0x3f/0x160 [btrfs] > > [55705.746753] do_writepages+0x1a/0x60 > > [55705.747411] __filemap_fdatawrite_range+0xc8/0x100 > > [55705.748106] ? convert_extent_bit+0x2e8/0x580 [btrfs] > > [55705.748807] btrfs_write_marked_extents+0x141/0x160 [btrfs] > > [55705.749495] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > > [55705.750190] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > > [55705.750890] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > > [55705.751580] ? 
btrfs_log_dentry_safe+0x54/0x70 [btrfs] > > [55705.752293] btrfs_sync_file+0x395/0x3e0 [btrfs] > > [55705.752981] ? retarget_shared_pending+0x70/0x70 > > [55705.753686] do_fsync+0x38/0x60 > > [55705.754340] __x64_sys_fdatasync+0x13/0x20 > > [55705.755012] do_syscall_64+0x55/0x1a0 > > [55705.755678] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > [55705.756375] RIP: 0033:0x7f1db3fc85f0 > > [55705.757042] Code: Bad RIP value. > > [55705.757690] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > > 000000000000004b > > [55705.758300] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > > 00007f1db3fc85f0 > > [55705.758678] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > > 0000000000000001 > > [55705.759107] RBP: 0000000000000001 R08: 0000000000000000 R09: > > 0000000081c492ca > > [55705.759785] R10: 0000000000000008 R11: 0000000000000246 R12: > > 0000000000000028 > > [55705.760471] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > > 0000000000000000 > > [55826.570182] INFO: task rsync:9830 blocked for more than 1208 seconds. > > [55826.571349] Not tainted 5.3.0-rc8 #1 > > [55826.572469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > disables this message. > > [55826.573618] rsync D 0 9830 9829 0x00004002 > > [55826.574790] Call Trace: > > [55826.575932] ? __schedule+0x3cf/0x680 > > [55826.577079] ? bit_wait+0x50/0x50 > > [55826.578233] schedule+0x39/0xa0 > > [55826.579350] io_schedule+0x12/0x40 > > [55826.580451] bit_wait_io+0xd/0x50 > > [55826.581527] __wait_on_bit+0x66/0x90 > > [55826.582596] ? bit_wait+0x50/0x50 > > [55826.583178] out_of_line_wait_on_bit+0x8b/0xb0 > > [55826.583550] ? init_wait_var_entry+0x40/0x40 > > [55826.583953] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > > [55826.584356] btree_write_cache_pages+0x17d/0x350 [btrfs] > > [55826.584755] ? btrfs_set_token_32+0x72/0x130 [btrfs] > > [55826.585155] ? 
merge_state.part.47+0x3f/0x160 [btrfs] > > [55826.585547] do_writepages+0x1a/0x60 > > [55826.585937] __filemap_fdatawrite_range+0xc8/0x100 > > [55826.586352] ? convert_extent_bit+0x2e8/0x580 [btrfs] > > [55826.586761] btrfs_write_marked_extents+0x141/0x160 [btrfs] > > [55826.587171] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > > [55826.587581] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > > [55826.587990] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > > [55826.588406] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > > [55826.588818] btrfs_sync_file+0x395/0x3e0 [btrfs] > > [55826.589219] ? retarget_shared_pending+0x70/0x70 > > [55826.589617] do_fsync+0x38/0x60 > > [55826.590011] __x64_sys_fdatasync+0x13/0x20 > > [55826.590411] do_syscall_64+0x55/0x1a0 > > [55826.590798] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > [55826.591185] RIP: 0033:0x7f1db3fc85f0 > > [55826.591572] Code: Bad RIP value. > > [55826.591952] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > > 000000000000004b > > [55826.592347] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > > 00007f1db3fc85f0 > > [55826.592743] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > > 0000000000000001 > > [55826.593143] RBP: 0000000000000001 R08: 0000000000000000 R09: > > 0000000081c492ca > > [55826.593543] R10: 0000000000000008 R11: 0000000000000246 R12: > > 0000000000000028 > > [55826.593941] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > > 0000000000000000 > > > > > > Greets, > > Stefan > > -- > Michal Hocko > SUSE Labs ^ permalink raw reply [flat|nested] 61+ messages in thread
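Earlier in the thread Michal asked for /proc/vmstat samples collected every second or so while the problem is visible. A minimal sketch of one way to gather them — the output file name `vmstat.log` and the iteration count are arbitrary illustration choices, not from the thread:

```shell
#!/bin/sh
# Sample /proc/vmstat once per second into vmstat.log.
# N bounds the run for illustration; in practice you would keep
# sampling for as long as the reclaim/PSI problem is occurring.
N=2
i=0
while [ "$i" -lt "$N" ]; do
    date '+%F %T'
    cat /proc/vmstat
    sleep 1
    i=$((i + 1))
done > vmstat.log
```

Each sample is prefixed with a timestamp so the counters can later be correlated with the PSI and cache-drop events.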
* Re: 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) 2019-09-11 14:56 ` Filipe Manana @ 2019-09-11 15:39 ` Stefan Priebe - Profihost AG 2019-09-11 15:56 ` Filipe Manana 0 siblings, 1 reply; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-11 15:39 UTC (permalink / raw) To: Filipe Manana Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka, Jens Axboe, linux-block, linux-fsdevel, David Sterba, linux-btrfs Thanks! Is this the same issue as the one in the 5.3-rc8 I tested? The stack trace looked different to me. Stefan > On 11.09.2019 at 16:56, Filipe Manana <fdmanana@kernel.org> wrote: > >> On Wed, Sep 11, 2019 at 8:10 AM Michal Hocko <mhocko@kernel.org> wrote: >> >> This smells like an IO/Btrfs issue to me. Cc some more people. >> >>> On Wed 11-09-19 08:12:28, Stefan Priebe - Profihost AG wrote: >>> [...] >>> Sadly I'm running into issues with btrfs on 5.3-rc8 - the rsync process >>> on the backup disk completely hangs / is blocked at 100% I/O: >>> [54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds. >>> [54739.066973] Not tainted 5.3.0-rc8 #1 >>> [54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>> disables this message. >>> [54739.069065] rsync D 0 9830 9829 0x00004002 >>> [54739.070146] Call Trace: >>> [54739.071183] ? __schedule+0x3cf/0x680 >>> [54739.072202] ? bit_wait+0x50/0x50 >>> [54739.073196] schedule+0x39/0xa0 >>> [54739.074213] io_schedule+0x12/0x40 >>> [54739.075219] bit_wait_io+0xd/0x50 >>> [54739.076227] __wait_on_bit+0x66/0x90 >>> [54739.077239] ? bit_wait+0x50/0x50 >>> [54739.078273] out_of_line_wait_on_bit+0x8b/0xb0 >>> [54739.078741] ? init_wait_var_entry+0x40/0x40 >>> [54739.079162] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>> [54739.079557] btree_write_cache_pages+0x17d/0x350 [btrfs] >>> [54739.079956] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>> [54739.080357] ?
merge_state.part.47+0x3f/0x160 [btrfs] >>> [54739.080748] do_writepages+0x1a/0x60 >>> [54739.081140] __filemap_fdatawrite_range+0xc8/0x100 >>> [54739.081558] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>> [54739.081985] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>> [54739.082412] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>> [54739.082847] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [54739.083280] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [54739.083725] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>> [54739.084170] btrfs_sync_file+0x395/0x3e0 [btrfs] >>> [54739.084608] ? retarget_shared_pending+0x70/0x70 >>> [54739.085049] do_fsync+0x38/0x60 >>> [54739.085494] __x64_sys_fdatasync+0x13/0x20 >>> [54739.085944] do_syscall_64+0x55/0x1a0 >>> [54739.086395] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> [54739.086850] RIP: 0033:0x7f1db3fc85f0 >>> [54739.087310] Code: Bad RIP value. > > It's a regression introduced in 5.2 > Fix just sent: https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@kernel.org/T/#u > > Thanks. > >>> [54739.087772] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>> 000000000000004b >>> [54739.088249] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>> 00007f1db3fc85f0 >>> [54739.088733] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>> 0000000000000001 >>> [54739.089234] RBP: 0000000000000001 R08: 0000000000000000 R09: >>> 0000000081c492ca >>> [54739.089722] R10: 0000000000000008 R11: 0000000000000246 R12: >>> 0000000000000028 >>> [54739.090205] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>> 0000000000000000 >>> [54859.899715] INFO: task rsync:9830 blocked for more than 241 seconds. >>> [54859.900863] Not tainted 5.3.0-rc8 #1 >>> [54859.901885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>> disables this message. >>> [54859.902909] rsync D 0 9830 9829 0x00004002 >>> [54859.903930] Call Trace: >>> [54859.904888] ? __schedule+0x3cf/0x680 >>> [54859.905831] ? 
bit_wait+0x50/0x50 >>> [54859.906751] schedule+0x39/0xa0 >>> [54859.907653] io_schedule+0x12/0x40 >>> [54859.908535] bit_wait_io+0xd/0x50 >>> [54859.909441] __wait_on_bit+0x66/0x90 >>> [54859.910306] ? bit_wait+0x50/0x50 >>> [54859.911177] out_of_line_wait_on_bit+0x8b/0xb0 >>> [54859.912043] ? init_wait_var_entry+0x40/0x40 >>> [54859.912727] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>> [54859.913113] btree_write_cache_pages+0x17d/0x350 [btrfs] >>> [54859.913501] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>> [54859.913894] ? merge_state.part.47+0x3f/0x160 [btrfs] >>> [54859.914276] do_writepages+0x1a/0x60 >>> [54859.914656] __filemap_fdatawrite_range+0xc8/0x100 >>> [54859.915052] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>> [54859.915449] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>> [54859.915855] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>> [54859.916256] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [54859.916658] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [54859.917078] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>> [54859.917497] btrfs_sync_file+0x395/0x3e0 [btrfs] >>> [54859.917903] ? retarget_shared_pending+0x70/0x70 >>> [54859.918307] do_fsync+0x38/0x60 >>> [54859.918707] __x64_sys_fdatasync+0x13/0x20 >>> [54859.919106] do_syscall_64+0x55/0x1a0 >>> [54859.919482] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> [54859.919866] RIP: 0033:0x7f1db3fc85f0 >>> [54859.920243] Code: Bad RIP value. 
>>> [54859.920614] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>> 000000000000004b >>> [54859.920997] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>> 00007f1db3fc85f0 >>> [54859.921383] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>> 0000000000000001 >>> [54859.921773] RBP: 0000000000000001 R08: 0000000000000000 R09: >>> 0000000081c492ca >>> [54859.922165] R10: 0000000000000008 R11: 0000000000000246 R12: >>> 0000000000000028 >>> [54859.922551] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>> 0000000000000000 >>> [54980.733463] INFO: task rsync:9830 blocked for more than 362 seconds. >>> [54980.734061] Not tainted 5.3.0-rc8 #1 >>> [54980.734619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>> disables this message. >>> [54980.735209] rsync D 0 9830 9829 0x00004002 >>> [54980.735802] Call Trace: >>> [54980.736473] ? __schedule+0x3cf/0x680 >>> [54980.737054] ? bit_wait+0x50/0x50 >>> [54980.737664] schedule+0x39/0xa0 >>> [54980.738243] io_schedule+0x12/0x40 >>> [54980.738712] bit_wait_io+0xd/0x50 >>> [54980.739171] __wait_on_bit+0x66/0x90 >>> [54980.739623] ? bit_wait+0x50/0x50 >>> [54980.740073] out_of_line_wait_on_bit+0x8b/0xb0 >>> [54980.740548] ? init_wait_var_entry+0x40/0x40 >>> [54980.741033] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>> [54980.741579] btree_write_cache_pages+0x17d/0x350 [btrfs] >>> [54980.742076] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>> [54980.742560] ? merge_state.part.47+0x3f/0x160 [btrfs] >>> [54980.743045] do_writepages+0x1a/0x60 >>> [54980.743516] __filemap_fdatawrite_range+0xc8/0x100 >>> [54980.744019] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>> [54980.744513] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>> [54980.745026] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>> [54980.745563] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [54980.746073] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [54980.746575] ? 
btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>> [54980.747074] btrfs_sync_file+0x395/0x3e0 [btrfs] >>> [54980.747575] ? retarget_shared_pending+0x70/0x70 >>> [54980.748059] do_fsync+0x38/0x60 >>> [54980.748539] __x64_sys_fdatasync+0x13/0x20 >>> [54980.749012] do_syscall_64+0x55/0x1a0 >>> [54980.749512] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> [54980.749995] RIP: 0033:0x7f1db3fc85f0 >>> [54980.750368] Code: Bad RIP value. >>> [54980.750735] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>> 000000000000004b >>> [54980.751117] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>> 00007f1db3fc85f0 >>> [54980.751505] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>> 0000000000000001 >>> [54980.751895] RBP: 0000000000000001 R08: 0000000000000000 R09: >>> 0000000081c492ca >>> [54980.752291] R10: 0000000000000008 R11: 0000000000000246 R12: >>> 0000000000000028 >>> [54980.752680] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>> 0000000000000000 >>> [55101.567251] INFO: task rsync:9830 blocked for more than 483 seconds. >>> [55101.567775] Not tainted 5.3.0-rc8 #1 >>> [55101.568218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>> disables this message. >>> [55101.568649] rsync D 0 9830 9829 0x00004002 >>> [55101.569101] Call Trace: >>> [55101.569609] ? __schedule+0x3cf/0x680 >>> [55101.570052] ? bit_wait+0x50/0x50 >>> [55101.570504] schedule+0x39/0xa0 >>> [55101.570938] io_schedule+0x12/0x40 >>> [55101.571404] bit_wait_io+0xd/0x50 >>> [55101.571934] __wait_on_bit+0x66/0x90 >>> [55101.572601] ? bit_wait+0x50/0x50 >>> [55101.573235] out_of_line_wait_on_bit+0x8b/0xb0 >>> [55101.573599] ? init_wait_var_entry+0x40/0x40 >>> [55101.574008] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>> [55101.574394] btree_write_cache_pages+0x17d/0x350 [btrfs] >>> [55101.574783] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>> [55101.575184] ? 
merge_state.part.47+0x3f/0x160 [btrfs] >>> [55101.575580] do_writepages+0x1a/0x60 >>> [55101.575959] __filemap_fdatawrite_range+0xc8/0x100 >>> [55101.576351] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>> [55101.576746] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>> [55101.577144] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>> [55101.577543] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [55101.577939] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [55101.578343] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>> [55101.578746] btrfs_sync_file+0x395/0x3e0 [btrfs] >>> [55101.579139] ? retarget_shared_pending+0x70/0x70 >>> [55101.579543] do_fsync+0x38/0x60 >>> [55101.579928] __x64_sys_fdatasync+0x13/0x20 >>> [55101.580312] do_syscall_64+0x55/0x1a0 >>> [55101.580706] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> [55101.581086] RIP: 0033:0x7f1db3fc85f0 >>> [55101.581463] Code: Bad RIP value. >>> [55101.581834] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>> 000000000000004b >>> [55101.582219] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>> 00007f1db3fc85f0 >>> [55101.582607] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>> 0000000000000001 >>> [55101.582998] RBP: 0000000000000001 R08: 0000000000000000 R09: >>> 0000000081c492ca >>> [55101.583397] R10: 0000000000000008 R11: 0000000000000246 R12: >>> 0000000000000028 >>> [55101.583784] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>> 0000000000000000 >>> [55222.405056] INFO: task rsync:9830 blocked for more than 604 seconds. >>> [55222.405773] Not tainted 5.3.0-rc8 #1 >>> [55222.406456] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>> disables this message. >>> [55222.407158] rsync D 0 9830 9829 0x00004002 >>> [55222.407776] Call Trace: >>> [55222.408450] ? __schedule+0x3cf/0x680 >>> [55222.409206] ? 
bit_wait+0x50/0x50 >>> [55222.409942] schedule+0x39/0xa0 >>> [55222.410658] io_schedule+0x12/0x40 >>> [55222.411346] bit_wait_io+0xd/0x50 >>> [55222.411946] __wait_on_bit+0x66/0x90 >>> [55222.412572] ? bit_wait+0x50/0x50 >>> [55222.413249] out_of_line_wait_on_bit+0x8b/0xb0 >>> [55222.413944] ? init_wait_var_entry+0x40/0x40 >>> [55222.414675] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>> [55222.415362] btree_write_cache_pages+0x17d/0x350 [btrfs] >>> [55222.416085] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>> [55222.416796] ? merge_state.part.47+0x3f/0x160 [btrfs] >>> [55222.417505] do_writepages+0x1a/0x60 >>> [55222.418243] __filemap_fdatawrite_range+0xc8/0x100 >>> [55222.418969] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>> [55222.419713] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>> [55222.420453] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>> [55222.421206] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [55222.421925] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [55222.422656] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>> [55222.423400] btrfs_sync_file+0x395/0x3e0 [btrfs] >>> [55222.424140] ? retarget_shared_pending+0x70/0x70 >>> [55222.424861] do_fsync+0x38/0x60 >>> [55222.425581] __x64_sys_fdatasync+0x13/0x20 >>> [55222.426308] do_syscall_64+0x55/0x1a0 >>> [55222.427025] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> [55222.427732] RIP: 0033:0x7f1db3fc85f0 >>> [55222.428396] Code: Bad RIP value. 
>>> [55222.429087] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>> 000000000000004b >>> [55222.429757] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>> 00007f1db3fc85f0 >>> [55222.430451] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>> 0000000000000001 >>> [55222.431159] RBP: 0000000000000001 R08: 0000000000000000 R09: >>> 0000000081c492ca >>> [55222.431856] R10: 0000000000000008 R11: 0000000000000246 R12: >>> 0000000000000028 >>> [55222.432544] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>> 0000000000000000 >>> [55343.234863] INFO: task rsync:9830 blocked for more than 724 seconds. >>> [55343.235887] Not tainted 5.3.0-rc8 #1 >>> [55343.236611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>> disables this message. >>> [55343.237213] rsync D 0 9830 9829 0x00004002 >>> [55343.237766] Call Trace: >>> [55343.238353] ? __schedule+0x3cf/0x680 >>> [55343.238971] ? bit_wait+0x50/0x50 >>> [55343.239592] schedule+0x39/0xa0 >>> [55343.240173] io_schedule+0x12/0x40 >>> [55343.240721] bit_wait_io+0xd/0x50 >>> [55343.241266] __wait_on_bit+0x66/0x90 >>> [55343.241835] ? bit_wait+0x50/0x50 >>> [55343.242418] out_of_line_wait_on_bit+0x8b/0xb0 >>> [55343.242938] ? init_wait_var_entry+0x40/0x40 >>> [55343.243496] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>> [55343.244090] btree_write_cache_pages+0x17d/0x350 [btrfs] >>> [55343.244720] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>> [55343.245296] ? merge_state.part.47+0x3f/0x160 [btrfs] >>> [55343.245843] do_writepages+0x1a/0x60 >>> [55343.246407] __filemap_fdatawrite_range+0xc8/0x100 >>> [55343.247014] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>> [55343.247631] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>> [55343.248186] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>> [55343.248743] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [55343.249326] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [55343.249931] ? 
btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>> [55343.250562] btrfs_sync_file+0x395/0x3e0 [btrfs] >>> [55343.251139] ? retarget_shared_pending+0x70/0x70 >>> [55343.251628] do_fsync+0x38/0x60 >>> [55343.252208] __x64_sys_fdatasync+0x13/0x20 >>> [55343.252702] do_syscall_64+0x55/0x1a0 >>> [55343.253212] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> [55343.253798] RIP: 0033:0x7f1db3fc85f0 >>> [55343.254294] Code: Bad RIP value. >>> [55343.254821] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>> 000000000000004b >>> [55343.255404] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>> 00007f1db3fc85f0 >>> [55343.255989] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>> 0000000000000001 >>> [55343.256521] RBP: 0000000000000001 R08: 0000000000000000 R09: >>> 0000000081c492ca >>> [55343.257073] R10: 0000000000000008 R11: 0000000000000246 R12: >>> 0000000000000028 >>> [55343.257649] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>> 0000000000000000 >>> [55464.068704] INFO: task rsync:9830 blocked for more than 845 seconds. >>> [55464.069701] Not tainted 5.3.0-rc8 #1 >>> [55464.070655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>> disables this message. >>> [55464.071637] rsync D 0 9830 9829 0x00004002 >>> [55464.072637] Call Trace: >>> [55464.073623] ? __schedule+0x3cf/0x680 >>> [55464.074604] ? bit_wait+0x50/0x50 >>> [55464.075577] schedule+0x39/0xa0 >>> [55464.076531] io_schedule+0x12/0x40 >>> [55464.077480] bit_wait_io+0xd/0x50 >>> [55464.078400] __wait_on_bit+0x66/0x90 >>> [55464.079300] ? bit_wait+0x50/0x50 >>> [55464.080184] out_of_line_wait_on_bit+0x8b/0xb0 >>> [55464.081107] ? init_wait_var_entry+0x40/0x40 >>> [55464.082047] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>> [55464.083001] btree_write_cache_pages+0x17d/0x350 [btrfs] >>> [55464.083963] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>> [55464.084944] ? 
merge_state.part.47+0x3f/0x160 [btrfs] >>> [55464.085456] do_writepages+0x1a/0x60 >>> [55464.085840] __filemap_fdatawrite_range+0xc8/0x100 >>> [55464.086231] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>> [55464.086625] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>> [55464.087019] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>> [55464.087417] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [55464.087814] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [55464.088219] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>> [55464.088652] btrfs_sync_file+0x395/0x3e0 [btrfs] >>> [55464.089043] ? retarget_shared_pending+0x70/0x70 >>> [55464.089429] do_fsync+0x38/0x60 >>> [55464.089811] __x64_sys_fdatasync+0x13/0x20 >>> [55464.090190] do_syscall_64+0x55/0x1a0 >>> [55464.090568] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> [55464.090944] RIP: 0033:0x7f1db3fc85f0 >>> [55464.091321] Code: Bad RIP value. >>> [55464.091693] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>> 000000000000004b >>> [55464.092078] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>> 00007f1db3fc85f0 >>> [55464.092467] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>> 0000000000000001 >>> [55464.092863] RBP: 0000000000000001 R08: 0000000000000000 R09: >>> 0000000081c492ca >>> [55464.093254] R10: 0000000000000008 R11: 0000000000000246 R12: >>> 0000000000000028 >>> [55464.093643] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>> 0000000000000000 >>> [55584.902564] INFO: task rsync:9830 blocked for more than 966 seconds. >>> [55584.903748] Not tainted 5.3.0-rc8 #1 >>> [55584.904868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>> disables this message. >>> [55584.906023] rsync D 0 9830 9829 0x00004002 >>> [55584.907207] Call Trace: >>> [55584.908355] ? __schedule+0x3cf/0x680 >>> [55584.909507] ? 
bit_wait+0x50/0x50 >>> [55584.910682] schedule+0x39/0xa0 >>> [55584.911230] io_schedule+0x12/0x40 >>> [55584.911666] bit_wait_io+0xd/0x50 >>> [55584.912092] __wait_on_bit+0x66/0x90 >>> [55584.912510] ? bit_wait+0x50/0x50 >>> [55584.912924] out_of_line_wait_on_bit+0x8b/0xb0 >>> [55584.913343] ? init_wait_var_entry+0x40/0x40 >>> [55584.913795] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>> [55584.914242] btree_write_cache_pages+0x17d/0x350 [btrfs] >>> [55584.914698] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>> [55584.915152] ? merge_state.part.47+0x3f/0x160 [btrfs] >>> [55584.915588] do_writepages+0x1a/0x60 >>> [55584.916022] __filemap_fdatawrite_range+0xc8/0x100 >>> [55584.916474] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>> [55584.916928] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>> [55584.917386] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>> [55584.917844] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [55584.918300] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [55584.918772] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>> [55584.919233] btrfs_sync_file+0x395/0x3e0 [btrfs] >>> [55584.919679] ? retarget_shared_pending+0x70/0x70 >>> [55584.920122] do_fsync+0x38/0x60 >>> [55584.920559] __x64_sys_fdatasync+0x13/0x20 >>> [55584.920996] do_syscall_64+0x55/0x1a0 >>> [55584.921429] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> [55584.921865] RIP: 0033:0x7f1db3fc85f0 >>> [55584.922298] Code: Bad RIP value. 
>>> [55584.922734] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>> 000000000000004b >>> [55584.923174] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>> 00007f1db3fc85f0 >>> [55584.923568] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>> 0000000000000001 >>> [55584.923982] RBP: 0000000000000001 R08: 0000000000000000 R09: >>> 0000000081c492ca >>> [55584.924378] R10: 0000000000000008 R11: 0000000000000246 R12: >>> 0000000000000028 >>> [55584.924774] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>> 0000000000000000 >>> [55705.736285] INFO: task rsync:9830 blocked for more than 1087 seconds. >>> [55705.736999] Not tainted 5.3.0-rc8 #1 >>> [55705.737694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>> disables this message. >>> [55705.738411] rsync D 0 9830 9829 0x00004002 >>> [55705.739072] Call Trace: >>> [55705.739455] ? __schedule+0x3cf/0x680 >>> [55705.739837] ? bit_wait+0x50/0x50 >>> [55705.740215] schedule+0x39/0xa0 >>> [55705.740610] io_schedule+0x12/0x40 >>> [55705.741243] bit_wait_io+0xd/0x50 >>> [55705.741897] __wait_on_bit+0x66/0x90 >>> [55705.742524] ? bit_wait+0x50/0x50 >>> [55705.743131] out_of_line_wait_on_bit+0x8b/0xb0 >>> [55705.743750] ? init_wait_var_entry+0x40/0x40 >>> [55705.744128] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>> [55705.744766] btree_write_cache_pages+0x17d/0x350 [btrfs] >>> [55705.745440] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>> [55705.746118] ? merge_state.part.47+0x3f/0x160 [btrfs] >>> [55705.746753] do_writepages+0x1a/0x60 >>> [55705.747411] __filemap_fdatawrite_range+0xc8/0x100 >>> [55705.748106] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>> [55705.748807] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>> [55705.749495] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>> [55705.750190] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [55705.750890] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [55705.751580] ? 
btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>> [55705.752293] btrfs_sync_file+0x395/0x3e0 [btrfs] >>> [55705.752981] ? retarget_shared_pending+0x70/0x70 >>> [55705.753686] do_fsync+0x38/0x60 >>> [55705.754340] __x64_sys_fdatasync+0x13/0x20 >>> [55705.755012] do_syscall_64+0x55/0x1a0 >>> [55705.755678] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> [55705.756375] RIP: 0033:0x7f1db3fc85f0 >>> [55705.757042] Code: Bad RIP value. >>> [55705.757690] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>> 000000000000004b >>> [55705.758300] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>> 00007f1db3fc85f0 >>> [55705.758678] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>> 0000000000000001 >>> [55705.759107] RBP: 0000000000000001 R08: 0000000000000000 R09: >>> 0000000081c492ca >>> [55705.759785] R10: 0000000000000008 R11: 0000000000000246 R12: >>> 0000000000000028 >>> [55705.760471] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>> 0000000000000000 >>> [55826.570182] INFO: task rsync:9830 blocked for more than 1208 seconds. >>> [55826.571349] Not tainted 5.3.0-rc8 #1 >>> [55826.572469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>> disables this message. >>> [55826.573618] rsync D 0 9830 9829 0x00004002 >>> [55826.574790] Call Trace: >>> [55826.575932] ? __schedule+0x3cf/0x680 >>> [55826.577079] ? bit_wait+0x50/0x50 >>> [55826.578233] schedule+0x39/0xa0 >>> [55826.579350] io_schedule+0x12/0x40 >>> [55826.580451] bit_wait_io+0xd/0x50 >>> [55826.581527] __wait_on_bit+0x66/0x90 >>> [55826.582596] ? bit_wait+0x50/0x50 >>> [55826.583178] out_of_line_wait_on_bit+0x8b/0xb0 >>> [55826.583550] ? init_wait_var_entry+0x40/0x40 >>> [55826.583953] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>> [55826.584356] btree_write_cache_pages+0x17d/0x350 [btrfs] >>> [55826.584755] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>> [55826.585155] ? 
merge_state.part.47+0x3f/0x160 [btrfs] >>> [55826.585547] do_writepages+0x1a/0x60 >>> [55826.585937] __filemap_fdatawrite_range+0xc8/0x100 >>> [55826.586352] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>> [55826.586761] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>> [55826.587171] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>> [55826.587581] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [55826.587990] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>> [55826.588406] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>> [55826.588818] btrfs_sync_file+0x395/0x3e0 [btrfs] >>> [55826.589219] ? retarget_shared_pending+0x70/0x70 >>> [55826.589617] do_fsync+0x38/0x60 >>> [55826.590011] __x64_sys_fdatasync+0x13/0x20 >>> [55826.590411] do_syscall_64+0x55/0x1a0 >>> [55826.590798] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> [55826.591185] RIP: 0033:0x7f1db3fc85f0 >>> [55826.591572] Code: Bad RIP value. >>> [55826.591952] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>> 000000000000004b >>> [55826.592347] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>> 00007f1db3fc85f0 >>> [55826.592743] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>> 0000000000000001 >>> [55826.593143] RBP: 0000000000000001 R08: 0000000000000000 R09: >>> 0000000081c492ca >>> [55826.593543] R10: 0000000000000008 R11: 0000000000000246 R12: >>> 0000000000000028 >>> [55826.593941] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>> 0000000000000000 >>> >>> >>> Greets, >>> Stefan >> >> -- >> Michal Hocko >> SUSE Labs ^ permalink raw reply [flat|nested] 61+ messages in thread
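[Editorial note: the original report pairs a high MemAvailable with falling cache and rising PSI memory pressure. A minimal sketch for reading both values side by side — assuming a kernel built with CONFIG_PSI (mainline >= 4.20, or backported psi patches as in the reporter's 4.19 kernel); the PSI read is guarded since /proc/pressure/memory is absent otherwise:]

```shell
#!/bin/sh
# Snapshot MemAvailable and page-cache size alongside PSI memory pressure.
grep -E '^(MemAvailable|Cached):' /proc/meminfo
# /proc/pressure/memory only exists with CONFIG_PSI, so guard the read.
if [ -r /proc/pressure/memory ]; then
    cat /proc/pressure/memory
fi
```

[Run once a second (e.g. under `watch -n1`) to see whether PSI "some" rises while MemAvailable stays flat, as described in the report.]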
* Re: 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) 2019-09-11 15:39 ` Stefan Priebe - Profihost AG @ 2019-09-11 15:56 ` Filipe Manana 2019-09-11 16:15 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 61+ messages in thread From: Filipe Manana @ 2019-09-11 15:56 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka, Jens Axboe, linux-block, linux-fsdevel, David Sterba, linux-btrfs On Wed, Sep 11, 2019 at 4:39 PM Stefan Priebe - Profihost AG <s.priebe@profihost.ag> wrote: > > Thanks! Is this the same as for the 5.3-rc8 I tested? Stacktrace looked different to me. I don't know, I can't see that backtrace. The thread was split and I've only seen the one sent to the btrfs list. > > Stefan > > > Am 11.09.2019 um 16:56 schrieb Filipe Manana <fdmanana@kernel.org>: > > > >> On Wed, Sep 11, 2019 at 8:10 AM Michal Hocko <mhocko@kernel.org> wrote: > >> > >> This smells like IO/Btrfs issue to me. Cc some more people. > >> > >>> On Wed 11-09-19 08:12:28, Stefan Priebe - Profihost AG wrote: > >>> [...] > >>> Sadly i'm running into issues with btrfs on 5.3-rc8 - the rsync process > >>> on backup disk completely hangs / is blocked at 100% i/o: > >>> [54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds. > >>> [54739.066973] Not tainted 5.3.0-rc8 #1 > >>> [54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>> disables this message. > >>> [54739.069065] rsync D 0 9830 9829 0x00004002 > >>> [54739.070146] Call Trace: > >>> [54739.071183] ? __schedule+0x3cf/0x680 > >>> [54739.072202] ? bit_wait+0x50/0x50 > >>> [54739.073196] schedule+0x39/0xa0 > >>> [54739.074213] io_schedule+0x12/0x40 > >>> [54739.075219] bit_wait_io+0xd/0x50 > >>> [54739.076227] __wait_on_bit+0x66/0x90 > >>> [54739.077239] ? bit_wait+0x50/0x50 > >>> [54739.078273] out_of_line_wait_on_bit+0x8b/0xb0 > >>> [54739.078741] ? 
init_wait_var_entry+0x40/0x40 > >>> [54739.079162] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>> [54739.079557] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>> [54739.079956] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>> [54739.080357] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>> [54739.080748] do_writepages+0x1a/0x60 > >>> [54739.081140] __filemap_fdatawrite_range+0xc8/0x100 > >>> [54739.081558] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>> [54739.081985] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>> [54739.082412] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>> [54739.082847] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [54739.083280] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [54739.083725] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>> [54739.084170] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>> [54739.084608] ? retarget_shared_pending+0x70/0x70 > >>> [54739.085049] do_fsync+0x38/0x60 > >>> [54739.085494] __x64_sys_fdatasync+0x13/0x20 > >>> [54739.085944] do_syscall_64+0x55/0x1a0 > >>> [54739.086395] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>> [54739.086850] RIP: 0033:0x7f1db3fc85f0 > >>> [54739.087310] Code: Bad RIP value. > > > > It's a regression introduced in 5.2 > > Fix just sent: https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@kernel.org/T/#u > > > > Thanks. 
> > > >>> [54739.087772] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>> 000000000000004b > >>> [54739.088249] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>> 00007f1db3fc85f0 > >>> [54739.088733] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>> 0000000000000001 > >>> [54739.089234] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>> 0000000081c492ca > >>> [54739.089722] R10: 0000000000000008 R11: 0000000000000246 R12: > >>> 0000000000000028 > >>> [54739.090205] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>> 0000000000000000 > >>> [54859.899715] INFO: task rsync:9830 blocked for more than 241 seconds. > >>> [54859.900863] Not tainted 5.3.0-rc8 #1 > >>> [54859.901885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>> disables this message. > >>> [54859.902909] rsync D 0 9830 9829 0x00004002 > >>> [54859.903930] Call Trace: > >>> [54859.904888] ? __schedule+0x3cf/0x680 > >>> [54859.905831] ? bit_wait+0x50/0x50 > >>> [54859.906751] schedule+0x39/0xa0 > >>> [54859.907653] io_schedule+0x12/0x40 > >>> [54859.908535] bit_wait_io+0xd/0x50 > >>> [54859.909441] __wait_on_bit+0x66/0x90 > >>> [54859.910306] ? bit_wait+0x50/0x50 > >>> [54859.911177] out_of_line_wait_on_bit+0x8b/0xb0 > >>> [54859.912043] ? init_wait_var_entry+0x40/0x40 > >>> [54859.912727] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>> [54859.913113] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>> [54859.913501] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>> [54859.913894] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>> [54859.914276] do_writepages+0x1a/0x60 > >>> [54859.914656] __filemap_fdatawrite_range+0xc8/0x100 > >>> [54859.915052] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>> [54859.915449] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>> [54859.915855] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>> [54859.916256] ? 
btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [54859.916658] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [54859.917078] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>> [54859.917497] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>> [54859.917903] ? retarget_shared_pending+0x70/0x70 > >>> [54859.918307] do_fsync+0x38/0x60 > >>> [54859.918707] __x64_sys_fdatasync+0x13/0x20 > >>> [54859.919106] do_syscall_64+0x55/0x1a0 > >>> [54859.919482] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>> [54859.919866] RIP: 0033:0x7f1db3fc85f0 > >>> [54859.920243] Code: Bad RIP value. > >>> [54859.920614] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>> 000000000000004b > >>> [54859.920997] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>> 00007f1db3fc85f0 > >>> [54859.921383] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>> 0000000000000001 > >>> [54859.921773] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>> 0000000081c492ca > >>> [54859.922165] R10: 0000000000000008 R11: 0000000000000246 R12: > >>> 0000000000000028 > >>> [54859.922551] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>> 0000000000000000 > >>> [54980.733463] INFO: task rsync:9830 blocked for more than 362 seconds. > >>> [54980.734061] Not tainted 5.3.0-rc8 #1 > >>> [54980.734619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>> disables this message. > >>> [54980.735209] rsync D 0 9830 9829 0x00004002 > >>> [54980.735802] Call Trace: > >>> [54980.736473] ? __schedule+0x3cf/0x680 > >>> [54980.737054] ? bit_wait+0x50/0x50 > >>> [54980.737664] schedule+0x39/0xa0 > >>> [54980.738243] io_schedule+0x12/0x40 > >>> [54980.738712] bit_wait_io+0xd/0x50 > >>> [54980.739171] __wait_on_bit+0x66/0x90 > >>> [54980.739623] ? bit_wait+0x50/0x50 > >>> [54980.740073] out_of_line_wait_on_bit+0x8b/0xb0 > >>> [54980.740548] ? 
init_wait_var_entry+0x40/0x40 > >>> [54980.741033] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>> [54980.741579] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>> [54980.742076] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>> [54980.742560] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>> [54980.743045] do_writepages+0x1a/0x60 > >>> [54980.743516] __filemap_fdatawrite_range+0xc8/0x100 > >>> [54980.744019] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>> [54980.744513] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>> [54980.745026] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>> [54980.745563] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [54980.746073] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [54980.746575] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>> [54980.747074] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>> [54980.747575] ? retarget_shared_pending+0x70/0x70 > >>> [54980.748059] do_fsync+0x38/0x60 > >>> [54980.748539] __x64_sys_fdatasync+0x13/0x20 > >>> [54980.749012] do_syscall_64+0x55/0x1a0 > >>> [54980.749512] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>> [54980.749995] RIP: 0033:0x7f1db3fc85f0 > >>> [54980.750368] Code: Bad RIP value. > >>> [54980.750735] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>> 000000000000004b > >>> [54980.751117] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>> 00007f1db3fc85f0 > >>> [54980.751505] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>> 0000000000000001 > >>> [54980.751895] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>> 0000000081c492ca > >>> [54980.752291] R10: 0000000000000008 R11: 0000000000000246 R12: > >>> 0000000000000028 > >>> [54980.752680] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>> 0000000000000000 > >>> [55101.567251] INFO: task rsync:9830 blocked for more than 483 seconds. > >>> [55101.567775] Not tainted 5.3.0-rc8 #1 > >>> [55101.568218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>> disables this message. 
> >>> [55101.568649] rsync D 0 9830 9829 0x00004002 > >>> [55101.569101] Call Trace: > >>> [55101.569609] ? __schedule+0x3cf/0x680 > >>> [55101.570052] ? bit_wait+0x50/0x50 > >>> [55101.570504] schedule+0x39/0xa0 > >>> [55101.570938] io_schedule+0x12/0x40 > >>> [55101.571404] bit_wait_io+0xd/0x50 > >>> [55101.571934] __wait_on_bit+0x66/0x90 > >>> [55101.572601] ? bit_wait+0x50/0x50 > >>> [55101.573235] out_of_line_wait_on_bit+0x8b/0xb0 > >>> [55101.573599] ? init_wait_var_entry+0x40/0x40 > >>> [55101.574008] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>> [55101.574394] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>> [55101.574783] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>> [55101.575184] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>> [55101.575580] do_writepages+0x1a/0x60 > >>> [55101.575959] __filemap_fdatawrite_range+0xc8/0x100 > >>> [55101.576351] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>> [55101.576746] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>> [55101.577144] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>> [55101.577543] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [55101.577939] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [55101.578343] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>> [55101.578746] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>> [55101.579139] ? retarget_shared_pending+0x70/0x70 > >>> [55101.579543] do_fsync+0x38/0x60 > >>> [55101.579928] __x64_sys_fdatasync+0x13/0x20 > >>> [55101.580312] do_syscall_64+0x55/0x1a0 > >>> [55101.580706] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>> [55101.581086] RIP: 0033:0x7f1db3fc85f0 > >>> [55101.581463] Code: Bad RIP value. 
> >>> [55101.581834] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>> 000000000000004b > >>> [55101.582219] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>> 00007f1db3fc85f0 > >>> [55101.582607] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>> 0000000000000001 > >>> [55101.582998] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>> 0000000081c492ca > >>> [55101.583397] R10: 0000000000000008 R11: 0000000000000246 R12: > >>> 0000000000000028 > >>> [55101.583784] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>> 0000000000000000 > >>> [55222.405056] INFO: task rsync:9830 blocked for more than 604 seconds. > >>> [55222.405773] Not tainted 5.3.0-rc8 #1 > >>> [55222.406456] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>> disables this message. > >>> [55222.407158] rsync D 0 9830 9829 0x00004002 > >>> [55222.407776] Call Trace: > >>> [55222.408450] ? __schedule+0x3cf/0x680 > >>> [55222.409206] ? bit_wait+0x50/0x50 > >>> [55222.409942] schedule+0x39/0xa0 > >>> [55222.410658] io_schedule+0x12/0x40 > >>> [55222.411346] bit_wait_io+0xd/0x50 > >>> [55222.411946] __wait_on_bit+0x66/0x90 > >>> [55222.412572] ? bit_wait+0x50/0x50 > >>> [55222.413249] out_of_line_wait_on_bit+0x8b/0xb0 > >>> [55222.413944] ? init_wait_var_entry+0x40/0x40 > >>> [55222.414675] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>> [55222.415362] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>> [55222.416085] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>> [55222.416796] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>> [55222.417505] do_writepages+0x1a/0x60 > >>> [55222.418243] __filemap_fdatawrite_range+0xc8/0x100 > >>> [55222.418969] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>> [55222.419713] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>> [55222.420453] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>> [55222.421206] ? 
btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [55222.421925] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [55222.422656] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>> [55222.423400] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>> [55222.424140] ? retarget_shared_pending+0x70/0x70 > >>> [55222.424861] do_fsync+0x38/0x60 > >>> [55222.425581] __x64_sys_fdatasync+0x13/0x20 > >>> [55222.426308] do_syscall_64+0x55/0x1a0 > >>> [55222.427025] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>> [55222.427732] RIP: 0033:0x7f1db3fc85f0 > >>> [55222.428396] Code: Bad RIP value. > >>> [55222.429087] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>> 000000000000004b > >>> [55222.429757] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>> 00007f1db3fc85f0 > >>> [55222.430451] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>> 0000000000000001 > >>> [55222.431159] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>> 0000000081c492ca > >>> [55222.431856] R10: 0000000000000008 R11: 0000000000000246 R12: > >>> 0000000000000028 > >>> [55222.432544] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>> 0000000000000000 > >>> [55343.234863] INFO: task rsync:9830 blocked for more than 724 seconds. > >>> [55343.235887] Not tainted 5.3.0-rc8 #1 > >>> [55343.236611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>> disables this message. > >>> [55343.237213] rsync D 0 9830 9829 0x00004002 > >>> [55343.237766] Call Trace: > >>> [55343.238353] ? __schedule+0x3cf/0x680 > >>> [55343.238971] ? bit_wait+0x50/0x50 > >>> [55343.239592] schedule+0x39/0xa0 > >>> [55343.240173] io_schedule+0x12/0x40 > >>> [55343.240721] bit_wait_io+0xd/0x50 > >>> [55343.241266] __wait_on_bit+0x66/0x90 > >>> [55343.241835] ? bit_wait+0x50/0x50 > >>> [55343.242418] out_of_line_wait_on_bit+0x8b/0xb0 > >>> [55343.242938] ? 
init_wait_var_entry+0x40/0x40 > >>> [55343.243496] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>> [55343.244090] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>> [55343.244720] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>> [55343.245296] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>> [55343.245843] do_writepages+0x1a/0x60 > >>> [55343.246407] __filemap_fdatawrite_range+0xc8/0x100 > >>> [55343.247014] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>> [55343.247631] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>> [55343.248186] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>> [55343.248743] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [55343.249326] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [55343.249931] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>> [55343.250562] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>> [55343.251139] ? retarget_shared_pending+0x70/0x70 > >>> [55343.251628] do_fsync+0x38/0x60 > >>> [55343.252208] __x64_sys_fdatasync+0x13/0x20 > >>> [55343.252702] do_syscall_64+0x55/0x1a0 > >>> [55343.253212] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>> [55343.253798] RIP: 0033:0x7f1db3fc85f0 > >>> [55343.254294] Code: Bad RIP value. > >>> [55343.254821] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>> 000000000000004b > >>> [55343.255404] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>> 00007f1db3fc85f0 > >>> [55343.255989] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>> 0000000000000001 > >>> [55343.256521] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>> 0000000081c492ca > >>> [55343.257073] R10: 0000000000000008 R11: 0000000000000246 R12: > >>> 0000000000000028 > >>> [55343.257649] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>> 0000000000000000 > >>> [55464.068704] INFO: task rsync:9830 blocked for more than 845 seconds. > >>> [55464.069701] Not tainted 5.3.0-rc8 #1 > >>> [55464.070655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>> disables this message. 
> >>> [55464.071637] rsync D 0 9830 9829 0x00004002 > >>> [55464.072637] Call Trace: > >>> [55464.073623] ? __schedule+0x3cf/0x680 > >>> [55464.074604] ? bit_wait+0x50/0x50 > >>> [55464.075577] schedule+0x39/0xa0 > >>> [55464.076531] io_schedule+0x12/0x40 > >>> [55464.077480] bit_wait_io+0xd/0x50 > >>> [55464.078400] __wait_on_bit+0x66/0x90 > >>> [55464.079300] ? bit_wait+0x50/0x50 > >>> [55464.080184] out_of_line_wait_on_bit+0x8b/0xb0 > >>> [55464.081107] ? init_wait_var_entry+0x40/0x40 > >>> [55464.082047] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>> [55464.083001] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>> [55464.083963] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>> [55464.084944] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>> [55464.085456] do_writepages+0x1a/0x60 > >>> [55464.085840] __filemap_fdatawrite_range+0xc8/0x100 > >>> [55464.086231] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>> [55464.086625] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>> [55464.087019] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>> [55464.087417] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [55464.087814] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [55464.088219] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>> [55464.088652] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>> [55464.089043] ? retarget_shared_pending+0x70/0x70 > >>> [55464.089429] do_fsync+0x38/0x60 > >>> [55464.089811] __x64_sys_fdatasync+0x13/0x20 > >>> [55464.090190] do_syscall_64+0x55/0x1a0 > >>> [55464.090568] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>> [55464.090944] RIP: 0033:0x7f1db3fc85f0 > >>> [55464.091321] Code: Bad RIP value. 
> >>> [55464.091693] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>> 000000000000004b > >>> [55464.092078] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>> 00007f1db3fc85f0 > >>> [55464.092467] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>> 0000000000000001 > >>> [55464.092863] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>> 0000000081c492ca > >>> [55464.093254] R10: 0000000000000008 R11: 0000000000000246 R12: > >>> 0000000000000028 > >>> [55464.093643] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>> 0000000000000000 > >>> [55584.902564] INFO: task rsync:9830 blocked for more than 966 seconds. > >>> [55584.903748] Not tainted 5.3.0-rc8 #1 > >>> [55584.904868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>> disables this message. > >>> [55584.906023] rsync D 0 9830 9829 0x00004002 > >>> [55584.907207] Call Trace: > >>> [55584.908355] ? __schedule+0x3cf/0x680 > >>> [55584.909507] ? bit_wait+0x50/0x50 > >>> [55584.910682] schedule+0x39/0xa0 > >>> [55584.911230] io_schedule+0x12/0x40 > >>> [55584.911666] bit_wait_io+0xd/0x50 > >>> [55584.912092] __wait_on_bit+0x66/0x90 > >>> [55584.912510] ? bit_wait+0x50/0x50 > >>> [55584.912924] out_of_line_wait_on_bit+0x8b/0xb0 > >>> [55584.913343] ? init_wait_var_entry+0x40/0x40 > >>> [55584.913795] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>> [55584.914242] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>> [55584.914698] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>> [55584.915152] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>> [55584.915588] do_writepages+0x1a/0x60 > >>> [55584.916022] __filemap_fdatawrite_range+0xc8/0x100 > >>> [55584.916474] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>> [55584.916928] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>> [55584.917386] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>> [55584.917844] ? 
btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [55584.918300] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [55584.918772] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>> [55584.919233] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>> [55584.919679] ? retarget_shared_pending+0x70/0x70 > >>> [55584.920122] do_fsync+0x38/0x60 > >>> [55584.920559] __x64_sys_fdatasync+0x13/0x20 > >>> [55584.920996] do_syscall_64+0x55/0x1a0 > >>> [55584.921429] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>> [55584.921865] RIP: 0033:0x7f1db3fc85f0 > >>> [55584.922298] Code: Bad RIP value. > >>> [55584.922734] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>> 000000000000004b > >>> [55584.923174] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>> 00007f1db3fc85f0 > >>> [55584.923568] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>> 0000000000000001 > >>> [55584.923982] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>> 0000000081c492ca > >>> [55584.924378] R10: 0000000000000008 R11: 0000000000000246 R12: > >>> 0000000000000028 > >>> [55584.924774] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>> 0000000000000000 > >>> [55705.736285] INFO: task rsync:9830 blocked for more than 1087 seconds. > >>> [55705.736999] Not tainted 5.3.0-rc8 #1 > >>> [55705.737694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>> disables this message. > >>> [55705.738411] rsync D 0 9830 9829 0x00004002 > >>> [55705.739072] Call Trace: > >>> [55705.739455] ? __schedule+0x3cf/0x680 > >>> [55705.739837] ? bit_wait+0x50/0x50 > >>> [55705.740215] schedule+0x39/0xa0 > >>> [55705.740610] io_schedule+0x12/0x40 > >>> [55705.741243] bit_wait_io+0xd/0x50 > >>> [55705.741897] __wait_on_bit+0x66/0x90 > >>> [55705.742524] ? bit_wait+0x50/0x50 > >>> [55705.743131] out_of_line_wait_on_bit+0x8b/0xb0 > >>> [55705.743750] ? 
init_wait_var_entry+0x40/0x40 > >>> [55705.744128] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>> [55705.744766] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>> [55705.745440] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>> [55705.746118] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>> [55705.746753] do_writepages+0x1a/0x60 > >>> [55705.747411] __filemap_fdatawrite_range+0xc8/0x100 > >>> [55705.748106] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>> [55705.748807] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>> [55705.749495] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>> [55705.750190] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [55705.750890] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [55705.751580] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>> [55705.752293] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>> [55705.752981] ? retarget_shared_pending+0x70/0x70 > >>> [55705.753686] do_fsync+0x38/0x60 > >>> [55705.754340] __x64_sys_fdatasync+0x13/0x20 > >>> [55705.755012] do_syscall_64+0x55/0x1a0 > >>> [55705.755678] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>> [55705.756375] RIP: 0033:0x7f1db3fc85f0 > >>> [55705.757042] Code: Bad RIP value. > >>> [55705.757690] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>> 000000000000004b > >>> [55705.758300] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>> 00007f1db3fc85f0 > >>> [55705.758678] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>> 0000000000000001 > >>> [55705.759107] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>> 0000000081c492ca > >>> [55705.759785] R10: 0000000000000008 R11: 0000000000000246 R12: > >>> 0000000000000028 > >>> [55705.760471] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>> 0000000000000000 > >>> [55826.570182] INFO: task rsync:9830 blocked for more than 1208 seconds. > >>> [55826.571349] Not tainted 5.3.0-rc8 #1 > >>> [55826.572469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>> disables this message. 
> >>> [55826.573618] rsync D 0 9830 9829 0x00004002 > >>> [55826.574790] Call Trace: > >>> [55826.575932] ? __schedule+0x3cf/0x680 > >>> [55826.577079] ? bit_wait+0x50/0x50 > >>> [55826.578233] schedule+0x39/0xa0 > >>> [55826.579350] io_schedule+0x12/0x40 > >>> [55826.580451] bit_wait_io+0xd/0x50 > >>> [55826.581527] __wait_on_bit+0x66/0x90 > >>> [55826.582596] ? bit_wait+0x50/0x50 > >>> [55826.583178] out_of_line_wait_on_bit+0x8b/0xb0 > >>> [55826.583550] ? init_wait_var_entry+0x40/0x40 > >>> [55826.583953] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>> [55826.584356] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>> [55826.584755] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>> [55826.585155] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>> [55826.585547] do_writepages+0x1a/0x60 > >>> [55826.585937] __filemap_fdatawrite_range+0xc8/0x100 > >>> [55826.586352] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>> [55826.586761] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>> [55826.587171] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>> [55826.587581] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [55826.587990] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>> [55826.588406] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>> [55826.588818] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>> [55826.589219] ? retarget_shared_pending+0x70/0x70 > >>> [55826.589617] do_fsync+0x38/0x60 > >>> [55826.590011] __x64_sys_fdatasync+0x13/0x20 > >>> [55826.590411] do_syscall_64+0x55/0x1a0 > >>> [55826.590798] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>> [55826.591185] RIP: 0033:0x7f1db3fc85f0 > >>> [55826.591572] Code: Bad RIP value. 
> >>> [55826.591952] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>> 000000000000004b > >>> [55826.592347] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>> 00007f1db3fc85f0 > >>> [55826.592743] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>> 0000000000000001 > >>> [55826.593143] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>> 0000000081c492ca > >>> [55826.593543] R10: 0000000000000008 R11: 0000000000000246 R12: > >>> 0000000000000028 > >>> [55826.593941] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>> 0000000000000000 > >>> > >>> > >>> Greets, > >>> Stefan > >> > >> -- > >> Michal Hocko > >> SUSE Labs > ^ permalink raw reply [flat|nested] 61+ messages in thread
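[Editorial note: earlier in the thread Michal asked for /proc/vmstat to be collected every second while the cache drop occurs. A minimal collection loop for that — the sample count and log file name are arbitrary choices for this sketch; in practice it would run for the whole episode:]

```shell
#!/bin/sh
# Sample /proc/vmstat once per second with a timestamp header per sample,
# as requested in the thread. Five samples here for illustration.
LOG=vmstat.log
for i in 1 2 3 4 5; do
    echo "=== $(date +%s) ===" >> "$LOG"
    cat /proc/vmstat >> "$LOG"
    sleep 1
done
```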
* Re: 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) 2019-09-11 15:56 ` Filipe Manana @ 2019-09-11 16:15 ` Stefan Priebe - Profihost AG 2019-09-11 16:19 ` Filipe Manana 0 siblings, 1 reply; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-11 16:15 UTC (permalink / raw) To: Filipe Manana Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka, Jens Axboe, linux-block, linux-fsdevel, David Sterba, linux-btrfs Am 11.09.19 um 17:56 schrieb Filipe Manana: > On Wed, Sep 11, 2019 at 4:39 PM Stefan Priebe - Profihost AG > <s.priebe@profihost.ag> wrote: >> >> Thanks! Is this the same as for the 5.3-rc8 I tested? Stacktrace looked different to me. > > I don't know, I can't see that backtrace. The thread was split and > I've only seen the one sent to the btrfs list. Hi, strange. This is the 5.3-rc8 stacktrace: https://lore.kernel.org/linux-mm/d07620d9-4967-40fe-fa0f-be51f2459dc5@profihost.ag/ and this the 5.2.14: https://lore.kernel.org/linux-mm/289fbe71-0472-520f-64e2-b6d07ced5436@profihost.ag/ Greets, Stefan >> >> Stefan >> >>> Am 11.09.2019 um 16:56 schrieb Filipe Manana <fdmanana@kernel.org>: >>> >>>> On Wed, Sep 11, 2019 at 8:10 AM Michal Hocko <mhocko@kernel.org> wrote: >>>> >>>> This smells like IO/Btrfs issue to me. Cc some more people. >>>> >>>>> On Wed 11-09-19 08:12:28, Stefan Priebe - Profihost AG wrote: >>>>> [...] >>>>> Sadly i'm running into issues with btrfs on 5.3-rc8 - the rsync process >>>>> on backup disk completely hangs / is blocked at 100% i/o: >>>>> [54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds. >>>>> [54739.066973] Not tainted 5.3.0-rc8 #1 >>>>> [54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>>>> disables this message. >>>>> [54739.069065] rsync D 0 9830 9829 0x00004002 >>>>> [54739.070146] Call Trace: >>>>> [54739.071183] ? __schedule+0x3cf/0x680 >>>>> [54739.072202] ? 
bit_wait+0x50/0x50 >>>>> [54739.073196] schedule+0x39/0xa0 >>>>> [54739.074213] io_schedule+0x12/0x40 >>>>> [54739.075219] bit_wait_io+0xd/0x50 >>>>> [54739.076227] __wait_on_bit+0x66/0x90 >>>>> [54739.077239] ? bit_wait+0x50/0x50 >>>>> [54739.078273] out_of_line_wait_on_bit+0x8b/0xb0 >>>>> [54739.078741] ? init_wait_var_entry+0x40/0x40 >>>>> [54739.079162] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>>>> [54739.079557] btree_write_cache_pages+0x17d/0x350 [btrfs] >>>>> [54739.079956] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>>>> [54739.080357] ? merge_state.part.47+0x3f/0x160 [btrfs] >>>>> [54739.080748] do_writepages+0x1a/0x60 >>>>> [54739.081140] __filemap_fdatawrite_range+0xc8/0x100 >>>>> [54739.081558] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>>>> [54739.081985] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>>>> [54739.082412] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>>>> [54739.082847] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [54739.083280] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [54739.083725] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>>>> [54739.084170] btrfs_sync_file+0x395/0x3e0 [btrfs] >>>>> [54739.084608] ? retarget_shared_pending+0x70/0x70 >>>>> [54739.085049] do_fsync+0x38/0x60 >>>>> [54739.085494] __x64_sys_fdatasync+0x13/0x20 >>>>> [54739.085944] do_syscall_64+0x55/0x1a0 >>>>> [54739.086395] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>> [54739.086850] RIP: 0033:0x7f1db3fc85f0 >>>>> [54739.087310] Code: Bad RIP value. >>> >>> It's a regression introduced in 5.2 >>> Fix just sent: https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@kernel.org/T/#u >>> >>> Thanks. 
>>> >>>>> [54739.087772] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>>>> 000000000000004b >>>>> [54739.088249] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>>>> 00007f1db3fc85f0 >>>>> [54739.088733] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>>>> 0000000000000001 >>>>> [54739.089234] RBP: 0000000000000001 R08: 0000000000000000 R09: >>>>> 0000000081c492ca >>>>> [54739.089722] R10: 0000000000000008 R11: 0000000000000246 R12: >>>>> 0000000000000028 >>>>> [54739.090205] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>>>> 0000000000000000 >>>>> [54859.899715] INFO: task rsync:9830 blocked for more than 241 seconds. >>>>> [54859.900863] Not tainted 5.3.0-rc8 #1 >>>>> [54859.901885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>>>> disables this message. >>>>> [54859.902909] rsync D 0 9830 9829 0x00004002 >>>>> [54859.903930] Call Trace: >>>>> [54859.904888] ? __schedule+0x3cf/0x680 >>>>> [54859.905831] ? bit_wait+0x50/0x50 >>>>> [54859.906751] schedule+0x39/0xa0 >>>>> [54859.907653] io_schedule+0x12/0x40 >>>>> [54859.908535] bit_wait_io+0xd/0x50 >>>>> [54859.909441] __wait_on_bit+0x66/0x90 >>>>> [54859.910306] ? bit_wait+0x50/0x50 >>>>> [54859.911177] out_of_line_wait_on_bit+0x8b/0xb0 >>>>> [54859.912043] ? init_wait_var_entry+0x40/0x40 >>>>> [54859.912727] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>>>> [54859.913113] btree_write_cache_pages+0x17d/0x350 [btrfs] >>>>> [54859.913501] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>>>> [54859.913894] ? merge_state.part.47+0x3f/0x160 [btrfs] >>>>> [54859.914276] do_writepages+0x1a/0x60 >>>>> [54859.914656] __filemap_fdatawrite_range+0xc8/0x100 >>>>> [54859.915052] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>>>> [54859.915449] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>>>> [54859.915855] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>>>> [54859.916256] ? 
btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [54859.916658] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [54859.917078] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>>>> [54859.917497] btrfs_sync_file+0x395/0x3e0 [btrfs] >>>>> [54859.917903] ? retarget_shared_pending+0x70/0x70 >>>>> [54859.918307] do_fsync+0x38/0x60 >>>>> [54859.918707] __x64_sys_fdatasync+0x13/0x20 >>>>> [54859.919106] do_syscall_64+0x55/0x1a0 >>>>> [54859.919482] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>> [54859.919866] RIP: 0033:0x7f1db3fc85f0 >>>>> [54859.920243] Code: Bad RIP value. >>>>> [54859.920614] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>>>> 000000000000004b >>>>> [54859.920997] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>>>> 00007f1db3fc85f0 >>>>> [54859.921383] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>>>> 0000000000000001 >>>>> [54859.921773] RBP: 0000000000000001 R08: 0000000000000000 R09: >>>>> 0000000081c492ca >>>>> [54859.922165] R10: 0000000000000008 R11: 0000000000000246 R12: >>>>> 0000000000000028 >>>>> [54859.922551] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>>>> 0000000000000000 >>>>> [54980.733463] INFO: task rsync:9830 blocked for more than 362 seconds. >>>>> [54980.734061] Not tainted 5.3.0-rc8 #1 >>>>> [54980.734619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>>>> disables this message. >>>>> [54980.735209] rsync D 0 9830 9829 0x00004002 >>>>> [54980.735802] Call Trace: >>>>> [54980.736473] ? __schedule+0x3cf/0x680 >>>>> [54980.737054] ? bit_wait+0x50/0x50 >>>>> [54980.737664] schedule+0x39/0xa0 >>>>> [54980.738243] io_schedule+0x12/0x40 >>>>> [54980.738712] bit_wait_io+0xd/0x50 >>>>> [54980.739171] __wait_on_bit+0x66/0x90 >>>>> [54980.739623] ? bit_wait+0x50/0x50 >>>>> [54980.740073] out_of_line_wait_on_bit+0x8b/0xb0 >>>>> [54980.740548] ? 
init_wait_var_entry+0x40/0x40 >>>>> [54980.741033] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>>>> [54980.741579] btree_write_cache_pages+0x17d/0x350 [btrfs] >>>>> [54980.742076] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>>>> [54980.742560] ? merge_state.part.47+0x3f/0x160 [btrfs] >>>>> [54980.743045] do_writepages+0x1a/0x60 >>>>> [54980.743516] __filemap_fdatawrite_range+0xc8/0x100 >>>>> [54980.744019] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>>>> [54980.744513] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>>>> [54980.745026] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>>>> [54980.745563] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [54980.746073] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [54980.746575] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>>>> [54980.747074] btrfs_sync_file+0x395/0x3e0 [btrfs] >>>>> [54980.747575] ? retarget_shared_pending+0x70/0x70 >>>>> [54980.748059] do_fsync+0x38/0x60 >>>>> [54980.748539] __x64_sys_fdatasync+0x13/0x20 >>>>> [54980.749012] do_syscall_64+0x55/0x1a0 >>>>> [54980.749512] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>> [54980.749995] RIP: 0033:0x7f1db3fc85f0 >>>>> [54980.750368] Code: Bad RIP value. >>>>> [54980.750735] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>>>> 000000000000004b >>>>> [54980.751117] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>>>> 00007f1db3fc85f0 >>>>> [54980.751505] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>>>> 0000000000000001 >>>>> [54980.751895] RBP: 0000000000000001 R08: 0000000000000000 R09: >>>>> 0000000081c492ca >>>>> [54980.752291] R10: 0000000000000008 R11: 0000000000000246 R12: >>>>> 0000000000000028 >>>>> [54980.752680] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>>>> 0000000000000000 >>>>> [55101.567251] INFO: task rsync:9830 blocked for more than 483 seconds. >>>>> [55101.567775] Not tainted 5.3.0-rc8 #1 >>>>> [55101.568218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>>>> disables this message. 
>>>>> [55101.568649] rsync D 0 9830 9829 0x00004002 >>>>> [55101.569101] Call Trace: >>>>> [55101.569609] ? __schedule+0x3cf/0x680 >>>>> [55101.570052] ? bit_wait+0x50/0x50 >>>>> [55101.570504] schedule+0x39/0xa0 >>>>> [55101.570938] io_schedule+0x12/0x40 >>>>> [55101.571404] bit_wait_io+0xd/0x50 >>>>> [55101.571934] __wait_on_bit+0x66/0x90 >>>>> [55101.572601] ? bit_wait+0x50/0x50 >>>>> [55101.573235] out_of_line_wait_on_bit+0x8b/0xb0 >>>>> [55101.573599] ? init_wait_var_entry+0x40/0x40 >>>>> [55101.574008] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>>>> [55101.574394] btree_write_cache_pages+0x17d/0x350 [btrfs] >>>>> [55101.574783] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>>>> [55101.575184] ? merge_state.part.47+0x3f/0x160 [btrfs] >>>>> [55101.575580] do_writepages+0x1a/0x60 >>>>> [55101.575959] __filemap_fdatawrite_range+0xc8/0x100 >>>>> [55101.576351] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>>>> [55101.576746] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>>>> [55101.577144] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>>>> [55101.577543] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [55101.577939] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [55101.578343] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>>>> [55101.578746] btrfs_sync_file+0x395/0x3e0 [btrfs] >>>>> [55101.579139] ? retarget_shared_pending+0x70/0x70 >>>>> [55101.579543] do_fsync+0x38/0x60 >>>>> [55101.579928] __x64_sys_fdatasync+0x13/0x20 >>>>> [55101.580312] do_syscall_64+0x55/0x1a0 >>>>> [55101.580706] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>> [55101.581086] RIP: 0033:0x7f1db3fc85f0 >>>>> [55101.581463] Code: Bad RIP value. 
>>>>> [55101.581834] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>>>> 000000000000004b >>>>> [55101.582219] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>>>> 00007f1db3fc85f0 >>>>> [55101.582607] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>>>> 0000000000000001 >>>>> [55101.582998] RBP: 0000000000000001 R08: 0000000000000000 R09: >>>>> 0000000081c492ca >>>>> [55101.583397] R10: 0000000000000008 R11: 0000000000000246 R12: >>>>> 0000000000000028 >>>>> [55101.583784] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>>>> 0000000000000000 >>>>> [55222.405056] INFO: task rsync:9830 blocked for more than 604 seconds. >>>>> [55222.405773] Not tainted 5.3.0-rc8 #1 >>>>> [55222.406456] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>>>> disables this message. >>>>> [55222.407158] rsync D 0 9830 9829 0x00004002 >>>>> [55222.407776] Call Trace: >>>>> [55222.408450] ? __schedule+0x3cf/0x680 >>>>> [55222.409206] ? bit_wait+0x50/0x50 >>>>> [55222.409942] schedule+0x39/0xa0 >>>>> [55222.410658] io_schedule+0x12/0x40 >>>>> [55222.411346] bit_wait_io+0xd/0x50 >>>>> [55222.411946] __wait_on_bit+0x66/0x90 >>>>> [55222.412572] ? bit_wait+0x50/0x50 >>>>> [55222.413249] out_of_line_wait_on_bit+0x8b/0xb0 >>>>> [55222.413944] ? init_wait_var_entry+0x40/0x40 >>>>> [55222.414675] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>>>> [55222.415362] btree_write_cache_pages+0x17d/0x350 [btrfs] >>>>> [55222.416085] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>>>> [55222.416796] ? merge_state.part.47+0x3f/0x160 [btrfs] >>>>> [55222.417505] do_writepages+0x1a/0x60 >>>>> [55222.418243] __filemap_fdatawrite_range+0xc8/0x100 >>>>> [55222.418969] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>>>> [55222.419713] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>>>> [55222.420453] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>>>> [55222.421206] ? 
btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [55222.421925] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [55222.422656] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>>>> [55222.423400] btrfs_sync_file+0x395/0x3e0 [btrfs] >>>>> [55222.424140] ? retarget_shared_pending+0x70/0x70 >>>>> [55222.424861] do_fsync+0x38/0x60 >>>>> [55222.425581] __x64_sys_fdatasync+0x13/0x20 >>>>> [55222.426308] do_syscall_64+0x55/0x1a0 >>>>> [55222.427025] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>> [55222.427732] RIP: 0033:0x7f1db3fc85f0 >>>>> [55222.428396] Code: Bad RIP value. >>>>> [55222.429087] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>>>> 000000000000004b >>>>> [55222.429757] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>>>> 00007f1db3fc85f0 >>>>> [55222.430451] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>>>> 0000000000000001 >>>>> [55222.431159] RBP: 0000000000000001 R08: 0000000000000000 R09: >>>>> 0000000081c492ca >>>>> [55222.431856] R10: 0000000000000008 R11: 0000000000000246 R12: >>>>> 0000000000000028 >>>>> [55222.432544] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>>>> 0000000000000000 >>>>> [55343.234863] INFO: task rsync:9830 blocked for more than 724 seconds. >>>>> [55343.235887] Not tainted 5.3.0-rc8 #1 >>>>> [55343.236611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>>>> disables this message. >>>>> [55343.237213] rsync D 0 9830 9829 0x00004002 >>>>> [55343.237766] Call Trace: >>>>> [55343.238353] ? __schedule+0x3cf/0x680 >>>>> [55343.238971] ? bit_wait+0x50/0x50 >>>>> [55343.239592] schedule+0x39/0xa0 >>>>> [55343.240173] io_schedule+0x12/0x40 >>>>> [55343.240721] bit_wait_io+0xd/0x50 >>>>> [55343.241266] __wait_on_bit+0x66/0x90 >>>>> [55343.241835] ? bit_wait+0x50/0x50 >>>>> [55343.242418] out_of_line_wait_on_bit+0x8b/0xb0 >>>>> [55343.242938] ? 
init_wait_var_entry+0x40/0x40 >>>>> [55343.243496] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>>>> [55343.244090] btree_write_cache_pages+0x17d/0x350 [btrfs] >>>>> [55343.244720] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>>>> [55343.245296] ? merge_state.part.47+0x3f/0x160 [btrfs] >>>>> [55343.245843] do_writepages+0x1a/0x60 >>>>> [55343.246407] __filemap_fdatawrite_range+0xc8/0x100 >>>>> [55343.247014] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>>>> [55343.247631] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>>>> [55343.248186] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>>>> [55343.248743] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [55343.249326] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [55343.249931] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>>>> [55343.250562] btrfs_sync_file+0x395/0x3e0 [btrfs] >>>>> [55343.251139] ? retarget_shared_pending+0x70/0x70 >>>>> [55343.251628] do_fsync+0x38/0x60 >>>>> [55343.252208] __x64_sys_fdatasync+0x13/0x20 >>>>> [55343.252702] do_syscall_64+0x55/0x1a0 >>>>> [55343.253212] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>> [55343.253798] RIP: 0033:0x7f1db3fc85f0 >>>>> [55343.254294] Code: Bad RIP value. >>>>> [55343.254821] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>>>> 000000000000004b >>>>> [55343.255404] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>>>> 00007f1db3fc85f0 >>>>> [55343.255989] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>>>> 0000000000000001 >>>>> [55343.256521] RBP: 0000000000000001 R08: 0000000000000000 R09: >>>>> 0000000081c492ca >>>>> [55343.257073] R10: 0000000000000008 R11: 0000000000000246 R12: >>>>> 0000000000000028 >>>>> [55343.257649] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>>>> 0000000000000000 >>>>> [55464.068704] INFO: task rsync:9830 blocked for more than 845 seconds. >>>>> [55464.069701] Not tainted 5.3.0-rc8 #1 >>>>> [55464.070655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>>>> disables this message. 
>>>>> [55464.071637] rsync D 0 9830 9829 0x00004002 >>>>> [55464.072637] Call Trace: >>>>> [55464.073623] ? __schedule+0x3cf/0x680 >>>>> [55464.074604] ? bit_wait+0x50/0x50 >>>>> [55464.075577] schedule+0x39/0xa0 >>>>> [55464.076531] io_schedule+0x12/0x40 >>>>> [55464.077480] bit_wait_io+0xd/0x50 >>>>> [55464.078400] __wait_on_bit+0x66/0x90 >>>>> [55464.079300] ? bit_wait+0x50/0x50 >>>>> [55464.080184] out_of_line_wait_on_bit+0x8b/0xb0 >>>>> [55464.081107] ? init_wait_var_entry+0x40/0x40 >>>>> [55464.082047] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>>>> [55464.083001] btree_write_cache_pages+0x17d/0x350 [btrfs] >>>>> [55464.083963] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>>>> [55464.084944] ? merge_state.part.47+0x3f/0x160 [btrfs] >>>>> [55464.085456] do_writepages+0x1a/0x60 >>>>> [55464.085840] __filemap_fdatawrite_range+0xc8/0x100 >>>>> [55464.086231] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>>>> [55464.086625] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>>>> [55464.087019] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>>>> [55464.087417] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [55464.087814] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [55464.088219] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>>>> [55464.088652] btrfs_sync_file+0x395/0x3e0 [btrfs] >>>>> [55464.089043] ? retarget_shared_pending+0x70/0x70 >>>>> [55464.089429] do_fsync+0x38/0x60 >>>>> [55464.089811] __x64_sys_fdatasync+0x13/0x20 >>>>> [55464.090190] do_syscall_64+0x55/0x1a0 >>>>> [55464.090568] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>> [55464.090944] RIP: 0033:0x7f1db3fc85f0 >>>>> [55464.091321] Code: Bad RIP value. 
>>>>> [55464.091693] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>>>> 000000000000004b >>>>> [55464.092078] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>>>> 00007f1db3fc85f0 >>>>> [55464.092467] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>>>> 0000000000000001 >>>>> [55464.092863] RBP: 0000000000000001 R08: 0000000000000000 R09: >>>>> 0000000081c492ca >>>>> [55464.093254] R10: 0000000000000008 R11: 0000000000000246 R12: >>>>> 0000000000000028 >>>>> [55464.093643] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>>>> 0000000000000000 >>>>> [55584.902564] INFO: task rsync:9830 blocked for more than 966 seconds. >>>>> [55584.903748] Not tainted 5.3.0-rc8 #1 >>>>> [55584.904868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>>>> disables this message. >>>>> [55584.906023] rsync D 0 9830 9829 0x00004002 >>>>> [55584.907207] Call Trace: >>>>> [55584.908355] ? __schedule+0x3cf/0x680 >>>>> [55584.909507] ? bit_wait+0x50/0x50 >>>>> [55584.910682] schedule+0x39/0xa0 >>>>> [55584.911230] io_schedule+0x12/0x40 >>>>> [55584.911666] bit_wait_io+0xd/0x50 >>>>> [55584.912092] __wait_on_bit+0x66/0x90 >>>>> [55584.912510] ? bit_wait+0x50/0x50 >>>>> [55584.912924] out_of_line_wait_on_bit+0x8b/0xb0 >>>>> [55584.913343] ? init_wait_var_entry+0x40/0x40 >>>>> [55584.913795] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>>>> [55584.914242] btree_write_cache_pages+0x17d/0x350 [btrfs] >>>>> [55584.914698] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>>>> [55584.915152] ? merge_state.part.47+0x3f/0x160 [btrfs] >>>>> [55584.915588] do_writepages+0x1a/0x60 >>>>> [55584.916022] __filemap_fdatawrite_range+0xc8/0x100 >>>>> [55584.916474] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>>>> [55584.916928] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>>>> [55584.917386] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>>>> [55584.917844] ? 
btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [55584.918300] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [55584.918772] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>>>> [55584.919233] btrfs_sync_file+0x395/0x3e0 [btrfs] >>>>> [55584.919679] ? retarget_shared_pending+0x70/0x70 >>>>> [55584.920122] do_fsync+0x38/0x60 >>>>> [55584.920559] __x64_sys_fdatasync+0x13/0x20 >>>>> [55584.920996] do_syscall_64+0x55/0x1a0 >>>>> [55584.921429] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>> [55584.921865] RIP: 0033:0x7f1db3fc85f0 >>>>> [55584.922298] Code: Bad RIP value. >>>>> [55584.922734] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>>>> 000000000000004b >>>>> [55584.923174] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>>>> 00007f1db3fc85f0 >>>>> [55584.923568] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>>>> 0000000000000001 >>>>> [55584.923982] RBP: 0000000000000001 R08: 0000000000000000 R09: >>>>> 0000000081c492ca >>>>> [55584.924378] R10: 0000000000000008 R11: 0000000000000246 R12: >>>>> 0000000000000028 >>>>> [55584.924774] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>>>> 0000000000000000 >>>>> [55705.736285] INFO: task rsync:9830 blocked for more than 1087 seconds. >>>>> [55705.736999] Not tainted 5.3.0-rc8 #1 >>>>> [55705.737694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>>>> disables this message. >>>>> [55705.738411] rsync D 0 9830 9829 0x00004002 >>>>> [55705.739072] Call Trace: >>>>> [55705.739455] ? __schedule+0x3cf/0x680 >>>>> [55705.739837] ? bit_wait+0x50/0x50 >>>>> [55705.740215] schedule+0x39/0xa0 >>>>> [55705.740610] io_schedule+0x12/0x40 >>>>> [55705.741243] bit_wait_io+0xd/0x50 >>>>> [55705.741897] __wait_on_bit+0x66/0x90 >>>>> [55705.742524] ? bit_wait+0x50/0x50 >>>>> [55705.743131] out_of_line_wait_on_bit+0x8b/0xb0 >>>>> [55705.743750] ? 
init_wait_var_entry+0x40/0x40 >>>>> [55705.744128] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>>>> [55705.744766] btree_write_cache_pages+0x17d/0x350 [btrfs] >>>>> [55705.745440] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>>>> [55705.746118] ? merge_state.part.47+0x3f/0x160 [btrfs] >>>>> [55705.746753] do_writepages+0x1a/0x60 >>>>> [55705.747411] __filemap_fdatawrite_range+0xc8/0x100 >>>>> [55705.748106] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>>>> [55705.748807] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>>>> [55705.749495] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>>>> [55705.750190] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [55705.750890] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [55705.751580] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>>>> [55705.752293] btrfs_sync_file+0x395/0x3e0 [btrfs] >>>>> [55705.752981] ? retarget_shared_pending+0x70/0x70 >>>>> [55705.753686] do_fsync+0x38/0x60 >>>>> [55705.754340] __x64_sys_fdatasync+0x13/0x20 >>>>> [55705.755012] do_syscall_64+0x55/0x1a0 >>>>> [55705.755678] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>> [55705.756375] RIP: 0033:0x7f1db3fc85f0 >>>>> [55705.757042] Code: Bad RIP value. >>>>> [55705.757690] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>>>> 000000000000004b >>>>> [55705.758300] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>>>> 00007f1db3fc85f0 >>>>> [55705.758678] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>>>> 0000000000000001 >>>>> [55705.759107] RBP: 0000000000000001 R08: 0000000000000000 R09: >>>>> 0000000081c492ca >>>>> [55705.759785] R10: 0000000000000008 R11: 0000000000000246 R12: >>>>> 0000000000000028 >>>>> [55705.760471] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>>>> 0000000000000000 >>>>> [55826.570182] INFO: task rsync:9830 blocked for more than 1208 seconds. >>>>> [55826.571349] Not tainted 5.3.0-rc8 #1 >>>>> [55826.572469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" >>>>> disables this message. 
>>>>> [55826.573618] rsync D 0 9830 9829 0x00004002 >>>>> [55826.574790] Call Trace: >>>>> [55826.575932] ? __schedule+0x3cf/0x680 >>>>> [55826.577079] ? bit_wait+0x50/0x50 >>>>> [55826.578233] schedule+0x39/0xa0 >>>>> [55826.579350] io_schedule+0x12/0x40 >>>>> [55826.580451] bit_wait_io+0xd/0x50 >>>>> [55826.581527] __wait_on_bit+0x66/0x90 >>>>> [55826.582596] ? bit_wait+0x50/0x50 >>>>> [55826.583178] out_of_line_wait_on_bit+0x8b/0xb0 >>>>> [55826.583550] ? init_wait_var_entry+0x40/0x40 >>>>> [55826.583953] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] >>>>> [55826.584356] btree_write_cache_pages+0x17d/0x350 [btrfs] >>>>> [55826.584755] ? btrfs_set_token_32+0x72/0x130 [btrfs] >>>>> [55826.585155] ? merge_state.part.47+0x3f/0x160 [btrfs] >>>>> [55826.585547] do_writepages+0x1a/0x60 >>>>> [55826.585937] __filemap_fdatawrite_range+0xc8/0x100 >>>>> [55826.586352] ? convert_extent_bit+0x2e8/0x580 [btrfs] >>>>> [55826.586761] btrfs_write_marked_extents+0x141/0x160 [btrfs] >>>>> [55826.587171] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] >>>>> [55826.587581] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [55826.587990] btrfs_commit_transaction+0x752/0x9d0 [btrfs] >>>>> [55826.588406] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] >>>>> [55826.588818] btrfs_sync_file+0x395/0x3e0 [btrfs] >>>>> [55826.589219] ? retarget_shared_pending+0x70/0x70 >>>>> [55826.589617] do_fsync+0x38/0x60 >>>>> [55826.590011] __x64_sys_fdatasync+0x13/0x20 >>>>> [55826.590411] do_syscall_64+0x55/0x1a0 >>>>> [55826.590798] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>>>> [55826.591185] RIP: 0033:0x7f1db3fc85f0 >>>>> [55826.591572] Code: Bad RIP value. 
>>>>> [55826.591952] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: >>>>> 000000000000004b >>>>> [55826.592347] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: >>>>> 00007f1db3fc85f0 >>>>> [55826.592743] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: >>>>> 0000000000000001 >>>>> [55826.593143] RBP: 0000000000000001 R08: 0000000000000000 R09: >>>>> 0000000081c492ca >>>>> [55826.593543] R10: 0000000000000008 R11: 0000000000000246 R12: >>>>> 0000000000000028 >>>>> [55826.593941] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: >>>>> 0000000000000000 >>>>> >>>>> >>>>> Greets, >>>>> Stefan >>>> >>>> -- >>>> Michal Hocko >>>> SUSE Labs >>
* Re: 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) 2019-09-11 16:15 ` Stefan Priebe - Profihost AG @ 2019-09-11 16:19 ` Filipe Manana 0 siblings, 0 replies; 61+ messages in thread From: Filipe Manana @ 2019-09-11 16:19 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka, Jens Axboe, linux-block, linux-fsdevel, David Sterba, linux-btrfs On Wed, Sep 11, 2019 at 5:15 PM Stefan Priebe - Profihost AG <s.priebe@profihost.ag> wrote: > > On 11.09.19 at 17:56, Filipe Manana wrote: > > On Wed, Sep 11, 2019 at 4:39 PM Stefan Priebe - Profihost AG > > <s.priebe@profihost.ag> wrote: > >> > >> Thanks! Is this the same as for the 5.3-rc8 I tested? Stacktrace looked different to me. > > > > I don't know, I can't see that backtrace. The thread was split and > > I've only seen the one sent to the btrfs list. > > Hi, > > strange. > > This is the 5.3-rc8 stacktrace: > https://lore.kernel.org/linux-mm/d07620d9-4967-40fe-fa0f-be51f2459dc5@profihost.ag/ It's the same. > > and this is the 5.2.14: > https://lore.kernel.org/linux-mm/289fbe71-0472-520f-64e2-b6d07ced5436@profihost.ag/ > > Greets, > Stefan > > >> > >> Stefan > >> > >>> On 11.09.2019 at 16:56, Filipe Manana <fdmanana@kernel.org> wrote: > >>> > >>>> On Wed, Sep 11, 2019 at 8:10 AM Michal Hocko <mhocko@kernel.org> wrote: > >>>> > >>>> This smells like an IO/Btrfs issue to me. Cc some more people. > >>>> > >>>>> On Wed 11-09-19 08:12:28, Stefan Priebe - Profihost AG wrote: > >>>>> [...] > >>>>> Sadly I'm running into issues with btrfs on 5.3-rc8 - the rsync process > >>>>> on backup disk completely hangs / is blocked at 100% i/o: > >>>>> [54739.065906] INFO: task rsync:9830 blocked for more than 120 seconds. > >>>>> [54739.066973] Not tainted 5.3.0-rc8 #1 > >>>>> [54739.067988] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>>>> disables this message. 
> >>>>> [54739.069065] rsync D 0 9830 9829 0x00004002 > >>>>> [54739.070146] Call Trace: > >>>>> [54739.071183] ? __schedule+0x3cf/0x680 > >>>>> [54739.072202] ? bit_wait+0x50/0x50 > >>>>> [54739.073196] schedule+0x39/0xa0 > >>>>> [54739.074213] io_schedule+0x12/0x40 > >>>>> [54739.075219] bit_wait_io+0xd/0x50 > >>>>> [54739.076227] __wait_on_bit+0x66/0x90 > >>>>> [54739.077239] ? bit_wait+0x50/0x50 > >>>>> [54739.078273] out_of_line_wait_on_bit+0x8b/0xb0 > >>>>> [54739.078741] ? init_wait_var_entry+0x40/0x40 > >>>>> [54739.079162] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>>>> [54739.079557] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>>>> [54739.079956] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>>>> [54739.080357] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>>>> [54739.080748] do_writepages+0x1a/0x60 > >>>>> [54739.081140] __filemap_fdatawrite_range+0xc8/0x100 > >>>>> [54739.081558] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>>>> [54739.081985] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>>>> [54739.082412] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>>>> [54739.082847] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [54739.083280] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [54739.083725] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>>>> [54739.084170] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>>>> [54739.084608] ? retarget_shared_pending+0x70/0x70 > >>>>> [54739.085049] do_fsync+0x38/0x60 > >>>>> [54739.085494] __x64_sys_fdatasync+0x13/0x20 > >>>>> [54739.085944] do_syscall_64+0x55/0x1a0 > >>>>> [54739.086395] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>>>> [54739.086850] RIP: 0033:0x7f1db3fc85f0 > >>>>> [54739.087310] Code: Bad RIP value. > >>> > >>> It's a regression introduced in 5.2 > >>> Fix just sent: https://lore.kernel.org/linux-btrfs/20190911145542.1125-1-fdmanana@kernel.org/T/#u > >>> > >>> Thanks. 
> >>> > >>>>> [54739.087772] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>>>> 000000000000004b > >>>>> [54739.088249] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>>>> 00007f1db3fc85f0 > >>>>> [54739.088733] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>>>> 0000000000000001 > >>>>> [54739.089234] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>>>> 0000000081c492ca > >>>>> [54739.089722] R10: 0000000000000008 R11: 0000000000000246 R12: > >>>>> 0000000000000028 > >>>>> [54739.090205] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>>>> 0000000000000000 > >>>>> [54859.899715] INFO: task rsync:9830 blocked for more than 241 seconds. > >>>>> [54859.900863] Not tainted 5.3.0-rc8 #1 > >>>>> [54859.901885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>>>> disables this message. > >>>>> [54859.902909] rsync D 0 9830 9829 0x00004002 > >>>>> [54859.903930] Call Trace: > >>>>> [54859.904888] ? __schedule+0x3cf/0x680 > >>>>> [54859.905831] ? bit_wait+0x50/0x50 > >>>>> [54859.906751] schedule+0x39/0xa0 > >>>>> [54859.907653] io_schedule+0x12/0x40 > >>>>> [54859.908535] bit_wait_io+0xd/0x50 > >>>>> [54859.909441] __wait_on_bit+0x66/0x90 > >>>>> [54859.910306] ? bit_wait+0x50/0x50 > >>>>> [54859.911177] out_of_line_wait_on_bit+0x8b/0xb0 > >>>>> [54859.912043] ? init_wait_var_entry+0x40/0x40 > >>>>> [54859.912727] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>>>> [54859.913113] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>>>> [54859.913501] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>>>> [54859.913894] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>>>> [54859.914276] do_writepages+0x1a/0x60 > >>>>> [54859.914656] __filemap_fdatawrite_range+0xc8/0x100 > >>>>> [54859.915052] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>>>> [54859.915449] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>>>> [54859.915855] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>>>> [54859.916256] ? 
btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [54859.916658] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [54859.917078] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>>>> [54859.917497] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>>>> [54859.917903] ? retarget_shared_pending+0x70/0x70 > >>>>> [54859.918307] do_fsync+0x38/0x60 > >>>>> [54859.918707] __x64_sys_fdatasync+0x13/0x20 > >>>>> [54859.919106] do_syscall_64+0x55/0x1a0 > >>>>> [54859.919482] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>>>> [54859.919866] RIP: 0033:0x7f1db3fc85f0 > >>>>> [54859.920243] Code: Bad RIP value. > >>>>> [54859.920614] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>>>> 000000000000004b > >>>>> [54859.920997] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>>>> 00007f1db3fc85f0 > >>>>> [54859.921383] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>>>> 0000000000000001 > >>>>> [54859.921773] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>>>> 0000000081c492ca > >>>>> [54859.922165] R10: 0000000000000008 R11: 0000000000000246 R12: > >>>>> 0000000000000028 > >>>>> [54859.922551] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>>>> 0000000000000000 > >>>>> [54980.733463] INFO: task rsync:9830 blocked for more than 362 seconds. > >>>>> [54980.734061] Not tainted 5.3.0-rc8 #1 > >>>>> [54980.734619] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>>>> disables this message. > >>>>> [54980.735209] rsync D 0 9830 9829 0x00004002 > >>>>> [54980.735802] Call Trace: > >>>>> [54980.736473] ? __schedule+0x3cf/0x680 > >>>>> [54980.737054] ? bit_wait+0x50/0x50 > >>>>> [54980.737664] schedule+0x39/0xa0 > >>>>> [54980.738243] io_schedule+0x12/0x40 > >>>>> [54980.738712] bit_wait_io+0xd/0x50 > >>>>> [54980.739171] __wait_on_bit+0x66/0x90 > >>>>> [54980.739623] ? bit_wait+0x50/0x50 > >>>>> [54980.740073] out_of_line_wait_on_bit+0x8b/0xb0 > >>>>> [54980.740548] ? 
init_wait_var_entry+0x40/0x40 > >>>>> [54980.741033] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>>>> [54980.741579] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>>>> [54980.742076] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>>>> [54980.742560] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>>>> [54980.743045] do_writepages+0x1a/0x60 > >>>>> [54980.743516] __filemap_fdatawrite_range+0xc8/0x100 > >>>>> [54980.744019] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>>>> [54980.744513] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>>>> [54980.745026] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>>>> [54980.745563] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [54980.746073] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [54980.746575] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>>>> [54980.747074] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>>>> [54980.747575] ? retarget_shared_pending+0x70/0x70 > >>>>> [54980.748059] do_fsync+0x38/0x60 > >>>>> [54980.748539] __x64_sys_fdatasync+0x13/0x20 > >>>>> [54980.749012] do_syscall_64+0x55/0x1a0 > >>>>> [54980.749512] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>>>> [54980.749995] RIP: 0033:0x7f1db3fc85f0 > >>>>> [54980.750368] Code: Bad RIP value. > >>>>> [54980.750735] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>>>> 000000000000004b > >>>>> [54980.751117] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>>>> 00007f1db3fc85f0 > >>>>> [54980.751505] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>>>> 0000000000000001 > >>>>> [54980.751895] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>>>> 0000000081c492ca > >>>>> [54980.752291] R10: 0000000000000008 R11: 0000000000000246 R12: > >>>>> 0000000000000028 > >>>>> [54980.752680] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>>>> 0000000000000000 > >>>>> [55101.567251] INFO: task rsync:9830 blocked for more than 483 seconds. 
> >>>>> [55101.567775] Not tainted 5.3.0-rc8 #1 > >>>>> [55101.568218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>>>> disables this message. > >>>>> [55101.568649] rsync D 0 9830 9829 0x00004002 > >>>>> [55101.569101] Call Trace: > >>>>> [55101.569609] ? __schedule+0x3cf/0x680 > >>>>> [55101.570052] ? bit_wait+0x50/0x50 > >>>>> [55101.570504] schedule+0x39/0xa0 > >>>>> [55101.570938] io_schedule+0x12/0x40 > >>>>> [55101.571404] bit_wait_io+0xd/0x50 > >>>>> [55101.571934] __wait_on_bit+0x66/0x90 > >>>>> [55101.572601] ? bit_wait+0x50/0x50 > >>>>> [55101.573235] out_of_line_wait_on_bit+0x8b/0xb0 > >>>>> [55101.573599] ? init_wait_var_entry+0x40/0x40 > >>>>> [55101.574008] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>>>> [55101.574394] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>>>> [55101.574783] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>>>> [55101.575184] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>>>> [55101.575580] do_writepages+0x1a/0x60 > >>>>> [55101.575959] __filemap_fdatawrite_range+0xc8/0x100 > >>>>> [55101.576351] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>>>> [55101.576746] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>>>> [55101.577144] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>>>> [55101.577543] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [55101.577939] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [55101.578343] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>>>> [55101.578746] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>>>> [55101.579139] ? retarget_shared_pending+0x70/0x70 > >>>>> [55101.579543] do_fsync+0x38/0x60 > >>>>> [55101.579928] __x64_sys_fdatasync+0x13/0x20 > >>>>> [55101.580312] do_syscall_64+0x55/0x1a0 > >>>>> [55101.580706] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>>>> [55101.581086] RIP: 0033:0x7f1db3fc85f0 > >>>>> [55101.581463] Code: Bad RIP value. 
> >>>>> [55101.581834] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>>>> 000000000000004b > >>>>> [55101.582219] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>>>> 00007f1db3fc85f0 > >>>>> [55101.582607] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>>>> 0000000000000001 > >>>>> [55101.582998] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>>>> 0000000081c492ca > >>>>> [55101.583397] R10: 0000000000000008 R11: 0000000000000246 R12: > >>>>> 0000000000000028 > >>>>> [55101.583784] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>>>> 0000000000000000 > >>>>> [55222.405056] INFO: task rsync:9830 blocked for more than 604 seconds. > >>>>> [55222.405773] Not tainted 5.3.0-rc8 #1 > >>>>> [55222.406456] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>>>> disables this message. > >>>>> [55222.407158] rsync D 0 9830 9829 0x00004002 > >>>>> [55222.407776] Call Trace: > >>>>> [55222.408450] ? __schedule+0x3cf/0x680 > >>>>> [55222.409206] ? bit_wait+0x50/0x50 > >>>>> [55222.409942] schedule+0x39/0xa0 > >>>>> [55222.410658] io_schedule+0x12/0x40 > >>>>> [55222.411346] bit_wait_io+0xd/0x50 > >>>>> [55222.411946] __wait_on_bit+0x66/0x90 > >>>>> [55222.412572] ? bit_wait+0x50/0x50 > >>>>> [55222.413249] out_of_line_wait_on_bit+0x8b/0xb0 > >>>>> [55222.413944] ? init_wait_var_entry+0x40/0x40 > >>>>> [55222.414675] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>>>> [55222.415362] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>>>> [55222.416085] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>>>> [55222.416796] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>>>> [55222.417505] do_writepages+0x1a/0x60 > >>>>> [55222.418243] __filemap_fdatawrite_range+0xc8/0x100 > >>>>> [55222.418969] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>>>> [55222.419713] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>>>> [55222.420453] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>>>> [55222.421206] ? 
btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [55222.421925] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [55222.422656] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>>>> [55222.423400] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>>>> [55222.424140] ? retarget_shared_pending+0x70/0x70 > >>>>> [55222.424861] do_fsync+0x38/0x60 > >>>>> [55222.425581] __x64_sys_fdatasync+0x13/0x20 > >>>>> [55222.426308] do_syscall_64+0x55/0x1a0 > >>>>> [55222.427025] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>>>> [55222.427732] RIP: 0033:0x7f1db3fc85f0 > >>>>> [55222.428396] Code: Bad RIP value. > >>>>> [55222.429087] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>>>> 000000000000004b > >>>>> [55222.429757] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>>>> 00007f1db3fc85f0 > >>>>> [55222.430451] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>>>> 0000000000000001 > >>>>> [55222.431159] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>>>> 0000000081c492ca > >>>>> [55222.431856] R10: 0000000000000008 R11: 0000000000000246 R12: > >>>>> 0000000000000028 > >>>>> [55222.432544] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>>>> 0000000000000000 > >>>>> [55343.234863] INFO: task rsync:9830 blocked for more than 724 seconds. > >>>>> [55343.235887] Not tainted 5.3.0-rc8 #1 > >>>>> [55343.236611] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>>>> disables this message. > >>>>> [55343.237213] rsync D 0 9830 9829 0x00004002 > >>>>> [55343.237766] Call Trace: > >>>>> [55343.238353] ? __schedule+0x3cf/0x680 > >>>>> [55343.238971] ? bit_wait+0x50/0x50 > >>>>> [55343.239592] schedule+0x39/0xa0 > >>>>> [55343.240173] io_schedule+0x12/0x40 > >>>>> [55343.240721] bit_wait_io+0xd/0x50 > >>>>> [55343.241266] __wait_on_bit+0x66/0x90 > >>>>> [55343.241835] ? bit_wait+0x50/0x50 > >>>>> [55343.242418] out_of_line_wait_on_bit+0x8b/0xb0 > >>>>> [55343.242938] ? 
init_wait_var_entry+0x40/0x40 > >>>>> [55343.243496] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>>>> [55343.244090] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>>>> [55343.244720] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>>>> [55343.245296] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>>>> [55343.245843] do_writepages+0x1a/0x60 > >>>>> [55343.246407] __filemap_fdatawrite_range+0xc8/0x100 > >>>>> [55343.247014] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>>>> [55343.247631] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>>>> [55343.248186] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>>>> [55343.248743] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [55343.249326] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [55343.249931] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>>>> [55343.250562] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>>>> [55343.251139] ? retarget_shared_pending+0x70/0x70 > >>>>> [55343.251628] do_fsync+0x38/0x60 > >>>>> [55343.252208] __x64_sys_fdatasync+0x13/0x20 > >>>>> [55343.252702] do_syscall_64+0x55/0x1a0 > >>>>> [55343.253212] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>>>> [55343.253798] RIP: 0033:0x7f1db3fc85f0 > >>>>> [55343.254294] Code: Bad RIP value. > >>>>> [55343.254821] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>>>> 000000000000004b > >>>>> [55343.255404] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>>>> 00007f1db3fc85f0 > >>>>> [55343.255989] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>>>> 0000000000000001 > >>>>> [55343.256521] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>>>> 0000000081c492ca > >>>>> [55343.257073] R10: 0000000000000008 R11: 0000000000000246 R12: > >>>>> 0000000000000028 > >>>>> [55343.257649] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>>>> 0000000000000000 > >>>>> [55464.068704] INFO: task rsync:9830 blocked for more than 845 seconds. 
> >>>>> [55464.069701] Not tainted 5.3.0-rc8 #1 > >>>>> [55464.070655] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>>>> disables this message. > >>>>> [55464.071637] rsync D 0 9830 9829 0x00004002 > >>>>> [55464.072637] Call Trace: > >>>>> [55464.073623] ? __schedule+0x3cf/0x680 > >>>>> [55464.074604] ? bit_wait+0x50/0x50 > >>>>> [55464.075577] schedule+0x39/0xa0 > >>>>> [55464.076531] io_schedule+0x12/0x40 > >>>>> [55464.077480] bit_wait_io+0xd/0x50 > >>>>> [55464.078400] __wait_on_bit+0x66/0x90 > >>>>> [55464.079300] ? bit_wait+0x50/0x50 > >>>>> [55464.080184] out_of_line_wait_on_bit+0x8b/0xb0 > >>>>> [55464.081107] ? init_wait_var_entry+0x40/0x40 > >>>>> [55464.082047] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>>>> [55464.083001] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>>>> [55464.083963] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>>>> [55464.084944] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>>>> [55464.085456] do_writepages+0x1a/0x60 > >>>>> [55464.085840] __filemap_fdatawrite_range+0xc8/0x100 > >>>>> [55464.086231] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>>>> [55464.086625] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>>>> [55464.087019] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>>>> [55464.087417] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [55464.087814] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [55464.088219] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>>>> [55464.088652] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>>>> [55464.089043] ? retarget_shared_pending+0x70/0x70 > >>>>> [55464.089429] do_fsync+0x38/0x60 > >>>>> [55464.089811] __x64_sys_fdatasync+0x13/0x20 > >>>>> [55464.090190] do_syscall_64+0x55/0x1a0 > >>>>> [55464.090568] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>>>> [55464.090944] RIP: 0033:0x7f1db3fc85f0 > >>>>> [55464.091321] Code: Bad RIP value. 
> >>>>> [55464.091693] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>>>> 000000000000004b > >>>>> [55464.092078] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>>>> 00007f1db3fc85f0 > >>>>> [55464.092467] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>>>> 0000000000000001 > >>>>> [55464.092863] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>>>> 0000000081c492ca > >>>>> [55464.093254] R10: 0000000000000008 R11: 0000000000000246 R12: > >>>>> 0000000000000028 > >>>>> [55464.093643] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>>>> 0000000000000000 > >>>>> [55584.902564] INFO: task rsync:9830 blocked for more than 966 seconds. > >>>>> [55584.903748] Not tainted 5.3.0-rc8 #1 > >>>>> [55584.904868] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>>>> disables this message. > >>>>> [55584.906023] rsync D 0 9830 9829 0x00004002 > >>>>> [55584.907207] Call Trace: > >>>>> [55584.908355] ? __schedule+0x3cf/0x680 > >>>>> [55584.909507] ? bit_wait+0x50/0x50 > >>>>> [55584.910682] schedule+0x39/0xa0 > >>>>> [55584.911230] io_schedule+0x12/0x40 > >>>>> [55584.911666] bit_wait_io+0xd/0x50 > >>>>> [55584.912092] __wait_on_bit+0x66/0x90 > >>>>> [55584.912510] ? bit_wait+0x50/0x50 > >>>>> [55584.912924] out_of_line_wait_on_bit+0x8b/0xb0 > >>>>> [55584.913343] ? init_wait_var_entry+0x40/0x40 > >>>>> [55584.913795] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>>>> [55584.914242] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>>>> [55584.914698] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>>>> [55584.915152] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>>>> [55584.915588] do_writepages+0x1a/0x60 > >>>>> [55584.916022] __filemap_fdatawrite_range+0xc8/0x100 > >>>>> [55584.916474] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>>>> [55584.916928] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>>>> [55584.917386] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>>>> [55584.917844] ? 
btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [55584.918300] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [55584.918772] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>>>> [55584.919233] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>>>> [55584.919679] ? retarget_shared_pending+0x70/0x70 > >>>>> [55584.920122] do_fsync+0x38/0x60 > >>>>> [55584.920559] __x64_sys_fdatasync+0x13/0x20 > >>>>> [55584.920996] do_syscall_64+0x55/0x1a0 > >>>>> [55584.921429] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>>>> [55584.921865] RIP: 0033:0x7f1db3fc85f0 > >>>>> [55584.922298] Code: Bad RIP value. > >>>>> [55584.922734] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>>>> 000000000000004b > >>>>> [55584.923174] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>>>> 00007f1db3fc85f0 > >>>>> [55584.923568] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>>>> 0000000000000001 > >>>>> [55584.923982] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>>>> 0000000081c492ca > >>>>> [55584.924378] R10: 0000000000000008 R11: 0000000000000246 R12: > >>>>> 0000000000000028 > >>>>> [55584.924774] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>>>> 0000000000000000 > >>>>> [55705.736285] INFO: task rsync:9830 blocked for more than 1087 seconds. > >>>>> [55705.736999] Not tainted 5.3.0-rc8 #1 > >>>>> [55705.737694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>>>> disables this message. > >>>>> [55705.738411] rsync D 0 9830 9829 0x00004002 > >>>>> [55705.739072] Call Trace: > >>>>> [55705.739455] ? __schedule+0x3cf/0x680 > >>>>> [55705.739837] ? bit_wait+0x50/0x50 > >>>>> [55705.740215] schedule+0x39/0xa0 > >>>>> [55705.740610] io_schedule+0x12/0x40 > >>>>> [55705.741243] bit_wait_io+0xd/0x50 > >>>>> [55705.741897] __wait_on_bit+0x66/0x90 > >>>>> [55705.742524] ? bit_wait+0x50/0x50 > >>>>> [55705.743131] out_of_line_wait_on_bit+0x8b/0xb0 > >>>>> [55705.743750] ? 
init_wait_var_entry+0x40/0x40 > >>>>> [55705.744128] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>>>> [55705.744766] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>>>> [55705.745440] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>>>> [55705.746118] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>>>> [55705.746753] do_writepages+0x1a/0x60 > >>>>> [55705.747411] __filemap_fdatawrite_range+0xc8/0x100 > >>>>> [55705.748106] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>>>> [55705.748807] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>>>> [55705.749495] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>>>> [55705.750190] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [55705.750890] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [55705.751580] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>>>> [55705.752293] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>>>> [55705.752981] ? retarget_shared_pending+0x70/0x70 > >>>>> [55705.753686] do_fsync+0x38/0x60 > >>>>> [55705.754340] __x64_sys_fdatasync+0x13/0x20 > >>>>> [55705.755012] do_syscall_64+0x55/0x1a0 > >>>>> [55705.755678] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>>>> [55705.756375] RIP: 0033:0x7f1db3fc85f0 > >>>>> [55705.757042] Code: Bad RIP value. > >>>>> [55705.757690] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>>>> 000000000000004b > >>>>> [55705.758300] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>>>> 00007f1db3fc85f0 > >>>>> [55705.758678] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>>>> 0000000000000001 > >>>>> [55705.759107] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>>>> 0000000081c492ca > >>>>> [55705.759785] R10: 0000000000000008 R11: 0000000000000246 R12: > >>>>> 0000000000000028 > >>>>> [55705.760471] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>>>> 0000000000000000 > >>>>> [55826.570182] INFO: task rsync:9830 blocked for more than 1208 seconds. 
> >>>>> [55826.571349] Not tainted 5.3.0-rc8 #1 > >>>>> [55826.572469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > >>>>> disables this message. > >>>>> [55826.573618] rsync D 0 9830 9829 0x00004002 > >>>>> [55826.574790] Call Trace: > >>>>> [55826.575932] ? __schedule+0x3cf/0x680 > >>>>> [55826.577079] ? bit_wait+0x50/0x50 > >>>>> [55826.578233] schedule+0x39/0xa0 > >>>>> [55826.579350] io_schedule+0x12/0x40 > >>>>> [55826.580451] bit_wait_io+0xd/0x50 > >>>>> [55826.581527] __wait_on_bit+0x66/0x90 > >>>>> [55826.582596] ? bit_wait+0x50/0x50 > >>>>> [55826.583178] out_of_line_wait_on_bit+0x8b/0xb0 > >>>>> [55826.583550] ? init_wait_var_entry+0x40/0x40 > >>>>> [55826.583953] lock_extent_buffer_for_io+0x10b/0x2c0 [btrfs] > >>>>> [55826.584356] btree_write_cache_pages+0x17d/0x350 [btrfs] > >>>>> [55826.584755] ? btrfs_set_token_32+0x72/0x130 [btrfs] > >>>>> [55826.585155] ? merge_state.part.47+0x3f/0x160 [btrfs] > >>>>> [55826.585547] do_writepages+0x1a/0x60 > >>>>> [55826.585937] __filemap_fdatawrite_range+0xc8/0x100 > >>>>> [55826.586352] ? convert_extent_bit+0x2e8/0x580 [btrfs] > >>>>> [55826.586761] btrfs_write_marked_extents+0x141/0x160 [btrfs] > >>>>> [55826.587171] btrfs_write_and_wait_transaction.isra.26+0x58/0xb0 [btrfs] > >>>>> [55826.587581] ? btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [55826.587990] btrfs_commit_transaction+0x752/0x9d0 [btrfs] > >>>>> [55826.588406] ? btrfs_log_dentry_safe+0x54/0x70 [btrfs] > >>>>> [55826.588818] btrfs_sync_file+0x395/0x3e0 [btrfs] > >>>>> [55826.589219] ? retarget_shared_pending+0x70/0x70 > >>>>> [55826.589617] do_fsync+0x38/0x60 > >>>>> [55826.590011] __x64_sys_fdatasync+0x13/0x20 > >>>>> [55826.590411] do_syscall_64+0x55/0x1a0 > >>>>> [55826.590798] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>>>> [55826.591185] RIP: 0033:0x7f1db3fc85f0 > >>>>> [55826.591572] Code: Bad RIP value. 
> >>>>> [55826.591952] RSP: 002b:00007ffe6f827db8 EFLAGS: 00000246 ORIG_RAX: > >>>>> 000000000000004b > >>>>> [55826.592347] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: > >>>>> 00007f1db3fc85f0 > >>>>> [55826.592743] RDX: 00007f1db4aa6060 RSI: 0000000000000003 RDI: > >>>>> 0000000000000001 > >>>>> [55826.593143] RBP: 0000000000000001 R08: 0000000000000000 R09: > >>>>> 0000000081c492ca > >>>>> [55826.593543] R10: 0000000000000008 R11: 0000000000000246 R12: > >>>>> 0000000000000028 > >>>>> [55826.593941] R13: 00007ffe6f827e40 R14: 0000000000000000 R15: > >>>>> 0000000000000000 > >>>>> > >>>>> > >>>>> Greets, > >>>>> Stefan > >>>> > >>>> -- > >>>> Michal Hocko > >>>> SUSE Labs > >> ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-11 6:12 ` Stefan Priebe - Profihost AG 2019-09-11 6:24 ` Stefan Priebe - Profihost AG 2019-09-11 7:09 ` 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) Michal Hocko @ 2019-09-19 10:21 ` Stefan Priebe - Profihost AG 2019-09-23 12:08 ` Michal Hocko 2019-09-27 12:45 ` Vlastimil Babka 2 siblings, 2 replies; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-19 10:21 UTC (permalink / raw) To: Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka Dear Michal, On 11.09.19 at 08:12, Stefan Priebe - Profihost AG wrote: > Hi Michal, > On 10.09.19 at 15:24, Michal Hocko wrote: >> On Tue 10-09-19 15:14:45, Stefan Priebe - Profihost AG wrote: >>> On 10.09.19 at 15:05, Stefan Priebe - Profihost AG wrote: >>>> >>>> On 10.09.19 at 14:57, Michal Hocko wrote: >>>>> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote: >>>>>> Hello Michal, >>>>>> >>>>>> ok, this might take a long time. Attached you'll find a graph from a >>>>>> fresh boot showing what happens over time (here 17 August to 30 August). Memory >>>>>> usage decreases as well as cache, but slowly and only over time and days. >>>>>> >>>>>> So it might take 2-3 weeks running kernel 5.3 to see what happens. >>>>> >>>>> No problem. Just make sure to collect the requested data from the time >>>>> you see the actual problem. Btw. did you try my very dumb scriptlets to get >>>>> an idea of how much memory gets reclaimed due to THP? >>>> >>>> You mean your sed and sort on top of the trace file? No, I did not with >>>> the current 5.3 kernel. Do you think it will show anything interesting? >>>> Which line shows me how much memory gets reclaimed due to THP? >> >> Please re-read http://lkml.kernel.org/r/20190910082919.GL2063@dhcp22.suse.cz >> Each command has a commented output. If you see the number of reclaimed >> pages to be large for GFP_TRANSHUGE then you are seeing a similar >> problem.
>> >>> Is something like a kernel memory leak possible? Or wouldn't this end up >>> in having a lot of free memory which doesn't seem usable. >> >> I would be really surprised if this was the case. >> >>> I also wonder why a reclaim takes place when there is enough memory. >> >> This is not clear yet and it might be a bug that has been fixed since >> 4.18. That's why we need to see whether the same pattern is happening >> with 5.3 as well. Kernel 5.2.14 has now been running for exactly 7 days, and we can easily see a trend; I'm not sure if I should post graphs. Cache size is continuously shrinking while memfree is rising. While there were 4.5GB free on average in the beginning, we now have an average of 8GB free memory. Cache has shrunk from an average of 24G to an average of 18G. Memory pressure has risen from an average of 0% to an average of 0.1% - not much, but if you look at the graphs it's continuously rising while cache is shrinking and memfree is rising. Which values should I collect now? Greets, Stefan ^ permalink raw reply [flat|nested] 61+ messages in thread
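As background on where the pressure figures above come from: on a PSI-enabled kernel they are read from /proc/pressure/memory. Below is a minimal C sketch of extracting the "some" avg10 value; the line format is assumed here to follow the kernel's PSI documentation, so verify it on the target system.

```c
#include <stdio.h>

/* Parse the avg10 value out of a PSI line such as
 *   "some avg10=0.15 avg60=0.05 avg300=0.01 total=12345"
 * Returns -1.0 if the line does not match the assumed format. */
static double psi_some_avg10(const char *line)
{
	double v;

	if (sscanf(line, "some avg10=%lf", &v) != 1)
		return -1.0;
	return v;
}

/* On a PSI-enabled kernel the line itself comes from /proc/pressure/memory;
 * return -1.0 when the file is absent (PSI not built in or not mounted). */
static double read_memory_psi(void)
{
	char buf[256];
	double v = -1.0;
	FILE *f = fopen("/proc/pressure/memory", "r");

	if (!f)
		return -1.0;
	if (fgets(buf, sizeof(buf), f))
		v = psi_some_avg10(buf);
	fclose(f);
	return v;
}
```

An avg10 of 0.1 corresponds to the roughly 0.1% "some" stall time reported above.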
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-19 10:21 ` lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG @ 2019-09-23 12:08 ` Michal Hocko 2019-09-27 12:45 ` Vlastimil Babka 1 sibling, 0 replies; 61+ messages in thread From: Michal Hocko @ 2019-09-23 12:08 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka On Thu 19-09-19 12:21:15, Stefan Priebe - Profihost AG wrote: [...] > Which values should i collect now? Collect the same tracepoints as in the past. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 61+ messages in thread
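As an aside, the tracepoint collection Michal refers to is typically toggled through tracefs. The sketch below is a hypothetical helper for flipping the relevant event switches from C; the mount point and the event names are assumptions, to be checked against `available_events` on the running kernel.

```c
#include <stdio.h>

/* Assumed tracefs location; newer systems may use /sys/kernel/tracing. */
#define TRACEFS "/sys/kernel/debug/tracing"

/* Write "1" to a tracing control file; returns 0 on success, -1 if the
 * file cannot be opened (tracefs absent, or insufficient privileges). */
static int enable_trace_file(const char *path)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputs("1", f);
	fclose(f);
	return 0;
}

/* Enable the direct-reclaim begin/end events discussed in the thread;
 * returns how many could actually be enabled. */
static int enable_vmscan_tracepoints(void)
{
	static const char *events[] = {
		TRACEFS "/events/vmscan/mm_vmscan_direct_reclaim_begin/enable",
		TRACEFS "/events/vmscan/mm_vmscan_direct_reclaim_end/enable",
	};
	int i, ok = 0;

	for (i = 0; i < 2; i++)
		if (enable_trace_file(events[i]) == 0)
			ok++;
	/* The resulting trace is then read from TRACEFS "/trace_pipe". */
	return ok;
}
```

Reading trace_pipe afterwards and post-processing it with the sed/sort scriptlets mentioned earlier in the thread shows how many pages each reclaim pass freed.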
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-19 10:21 ` lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG 2019-09-23 12:08 ` Michal Hocko @ 2019-09-27 12:45 ` Vlastimil Babka 2019-09-30 6:56 ` Stefan Priebe - Profihost AG 2019-10-22 7:41 ` Stefan Priebe - Profihost AG 1 sibling, 2 replies; 61+ messages in thread From: Vlastimil Babka @ 2019-09-27 12:45 UTC (permalink / raw) To: Stefan Priebe - Profihost AG, Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner [-- Attachment #1: Type: text/plain, Size: 1279 bytes --] On 9/19/19 12:21 PM, Stefan Priebe - Profihost AG wrote: > Kernel 5.2.14 has now been running for exactly 7 days, and we can easily > see a trend; I'm not sure if I should post graphs. > > Cache size is continuously shrinking while memfree is rising. > > While there were 4.5GB free on average in the beginning, we now have an > average of 8GB free memory. > > Cache has shrunk from an average of 24G to an average of 18G. > > Memory pressure has risen from an average of 0% to an average of 0.1% - not much, but if you > look at the graphs it's continuously rising while cache is shrinking and > memfree is rising. Hi, could you try the patch below? I suspect you're hitting a corner case where compaction_suitable() returns COMPACT_SKIPPED for the ZONE_DMA, triggering reclaim even if other zones have plenty of free memory. And should_continue_reclaim() then returns true until twice the requested page size is reclaimed (compact_gap()). That means 4MB reclaimed for each THP allocation attempt, which roughly matches the trace data you provided previously. The amplification to 4MB should be removed in patches merged for 5.4, so it would be only 32 pages reclaimed per THP allocation. The patch below tries to remove this corner case completely, and it should be more visible on your 5.2.x, so please apply it there.
[-- Attachment #2: 0001-mm-compaction-distinguish-when-compaction-is-impossi.patch --]
[-- Type: text/x-patch, Size: 6381 bytes --]

From 565008042b759835d51703f1da9b335dc0404546 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 12 Sep 2019 13:40:46 +0200
Subject: [PATCH] mm, compaction: distinguish when compaction is impossible

---
 include/linux/compaction.h     |  7 ++++++-
 include/trace/events/mmflags.h |  1 +
 mm/compaction.c                | 16 +++++++++++++--
 mm/vmscan.c                    | 36 ++++++++++++++++++++++++----------
 4 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 9569e7c786d3..6e624f482a08 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -17,8 +17,13 @@ enum compact_priority {
 };
 
 /* Return values for compact_zone() and try_to_compact_pages() */
-/* When adding new states, please adjust include/trace/events/compaction.h */
+/* When adding new states, please adjust include/trace/events/mmflags.h */
 enum compact_result {
+	/*
+	 * The zone is too small to provide the requested allocation even if
+	 * fully freed (i.e. ZONE_DMA for THP allocation due to lowmem reserves)
+	 */
+	COMPACT_IMPOSSIBLE,
 	/* For more detailed tracepoint output - internal to compaction */
 	COMPACT_NOT_SUITABLE_ZONE,
 	/*
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a1675d43777e..557dad69a9db 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -170,6 +170,7 @@ IF_HAVE_VM_SOFTDIRTY(VM_SOFTDIRTY, "softdirty" ) \
 
 #ifdef CONFIG_COMPACTION
 #define COMPACTION_STATUS					\
+	EM( COMPACT_IMPOSSIBLE,		"impossible")		\
 	EM( COMPACT_SKIPPED,		"skipped")		\
 	EM( COMPACT_DEFERRED,		"deferred")		\
 	EM( COMPACT_CONTINUE,		"continue")		\
diff --git a/mm/compaction.c b/mm/compaction.c
index 9e1b9acb116b..50a3dd2e2b6e 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1948,6 +1948,7 @@ static enum compact_result compact_finished(struct compact_control *cc)
 /*
  * compaction_suitable: Is this suitable to run compaction on this zone now?
  * Returns
+ *   COMPACT_IMPOSSIBLE If the allocation would fail even with all pages free
  *   COMPACT_SKIPPED  - If there are too few free pages for compaction
  *   COMPACT_SUCCESS  - If the allocation would succeed without compaction
  *   COMPACT_CONTINUE - If compaction should run now
@@ -1971,6 +1972,16 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 				 alloc_flags))
 		return COMPACT_SUCCESS;
 
+	/*
+	 * If the allocation would not succeed even with a fully free zone
+	 * due to e.g. lowmem reserves, indicate that compaction can't possibly
+	 * help and it would be pointless to reclaim.
+	 */
+	watermark += 1UL << order;
+	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
+				 alloc_flags, zone_managed_pages(zone)))
+		return COMPACT_IMPOSSIBLE;
+
 	/*
 	 * Watermarks for order-0 must be met for compaction to be able to
 	 * isolate free pages for migration targets. This means that the
@@ -2058,7 +2069,7 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
 		compact_result = __compaction_suitable(zone, order, alloc_flags,
 				ac_classzone_idx(ac), available);
-		if (compact_result != COMPACT_SKIPPED)
+		if (compact_result > COMPACT_SKIPPED)
 			return true;
 	}
 
@@ -2079,7 +2090,8 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 	ret = compaction_suitable(cc->zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
 	/* Compaction is likely to fail */
-	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
+	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED
+	    || ret == COMPACT_IMPOSSIBLE)
 		return ret;
 
 	/* huh, compaction_suitable is returning something unexpected */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 910e02c793ff..20ba471a8454 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2778,11 +2778,12 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 }
 
 /*
- * Returns true if compaction should go ahead for a costly-order request, or
- * the allocation would already succeed without compaction. Return false if we
- * should reclaim first.
+ * Returns 1 if compaction should go ahead for a costly-order request, or the
+ * allocation would already succeed without compaction. Return 0 if we should
+ * reclaim first. Return -1 when compaction can't help at all due to zone being
+ * too small, which means there's no point in reclaim nor compaction.
  */
-static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
+static inline int compaction_ready(struct zone *zone, struct scan_control *sc)
 {
 	unsigned long watermark;
 	enum compact_result suitable;
@@ -2790,10 +2791,16 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 	suitable = compaction_suitable(zone, sc->order, 0, sc->reclaim_idx);
 	if (suitable == COMPACT_SUCCESS)
 		/* Allocation should succeed already. Don't reclaim. */
-		return true;
+		return 1;
 	if (suitable == COMPACT_SKIPPED)
 		/* Compaction cannot yet proceed. Do reclaim. */
-		return false;
+		return 0;
+	if (suitable == COMPACT_IMPOSSIBLE)
+		/*
+		 * Compaction can't possibly help. So don't reclaim, but keep
+		 * checking other zones.
+		 */
+		return -1;
 
 	/*
 	 * Compaction is already possible, but it takes time to run and there
@@ -2839,6 +2846,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 
 	for_each_zone_zonelist_nodemask(zone, z, zonelist,
 					sc->reclaim_idx, sc->nodemask) {
+		int compact_ready;
 		/*
 		 * Take care memory controller reclaiming has small influence
 		 * to global LRU.
@@ -2858,10 +2866,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 		 * page allocations.
 		 */
 		if (IS_ENABLED(CONFIG_COMPACTION) &&
-		    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
-		    compaction_ready(zone, sc)) {
-			sc->compaction_ready = true;
-			continue;
+		    sc->order > PAGE_ALLOC_COSTLY_ORDER) {
+			compact_ready = compaction_ready(zone, sc);
+			if (compact_ready == 1) {
+				sc->compaction_ready = true;
+				continue;
+			} else if (compact_ready == -1) {
+				/*
+				 * In this zone, neither reclaim nor
+				 * compaction can help.
+				 */
+				continue;
+			}
+		}
 
 		/*
-- 
2.23.0

^ permalink raw reply related [flat|nested] 61+ messages in thread
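For readers following the arithmetic: the "4MB reclaimed per THP allocation attempt" figure above comes from compact_gap() applied to an order-9 (2MB) request. A standalone sketch, with the pre-5.4 helper reproduced here as an assumption about that era's kernel source:

```c
#include <assert.h>

#define PAGE_SHIFT 12          /* 4KB base pages, as on x86-64 */
#define HPAGE_PMD_ORDER 9      /* a THP covers 2^9 base pages = 2MB */

/* Mirrors the pre-5.4 kernel helper: ask reclaim for twice the request. */
static unsigned long compact_gap(unsigned int order)
{
	return 2UL << order;
}

/* Bytes reclaimed per allocation attempt of the given order. */
static unsigned long reclaim_bytes_per_attempt(unsigned int order)
{
	return compact_gap(order) << PAGE_SHIFT;
}
```

For order 9 this gives 1024 pages, i.e. 4MB of reclaim per attempt, matching the trace data discussed earlier; per the message above, the 5.4-era changes reduce this to 32 pages per attempt.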
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-27 12:45 ` Vlastimil Babka @ 2019-09-30 6:56 ` Stefan Priebe - Profihost AG 2019-09-30 7:21 ` Vlastimil Babka 2019-10-22 7:41 ` Stefan Priebe - Profihost AG 1 sibling, 1 reply; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-09-30 6:56 UTC (permalink / raw) To: Vlastimil Babka, Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner Hi, the current status is that everything has been working fine since I switched from CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS to CONFIG_TRANSPARENT_HUGEPAGE_MADVISE. On 27.09.19 at 14:45, Vlastimil Babka wrote: > On 9/19/19 12:21 PM, Stefan Priebe - Profihost AG wrote: >> Kernel 5.2.14 has now been running for exactly 7 days, and we can easily >> see a trend; I'm not sure if I should post graphs. >> >> Cache size is continuously shrinking while memfree is rising. >> >> While there were 4.5GB free on average in the beginning, we now have an >> average of 8GB free memory. >> >> Cache has shrunk from an average of 24G to an average of 18G. >> >> Memory pressure has risen from an average of 0% to an average of 0.1% - not much, but if you >> look at the graphs it's continuously rising while cache is shrinking and >> memfree is rising. > > Hi, could you try the patch below? I suspect you're hitting a corner > case where compaction_suitable() returns COMPACT_SKIPPED for the > ZONE_DMA, triggering reclaim even if other zones have plenty of free > memory. And should_continue_reclaim() then returns true until twice the > requested page size is reclaimed (compact_gap()). That means 4MB > reclaimed for each THP allocation attempt, which roughly matches the > trace data you provided previously. > > The amplification to 4MB should be removed in patches merged for 5.4, so > it would be only 32 pages reclaimed per THP allocation. The patch below > tries to remove this corner case completely, and it should be more > visible on your 5.2.x, so please apply it there.
So I switched back to the 4.19 LTS kernel, as this is the kernel we run on all our infrastructure. THP is now only in use on KVM host machines. Your patch applies to 4.19 as well, but I'm not sure whether it is a good idea to apply it to those machines. Greets, Stefan ^ permalink raw reply [flat|nested] 61+ messages in thread
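For context on the config change above: with CONFIG_TRANSPARENT_HUGEPAGE_MADVISE the kernel only considers THP for regions that opt in via madvise(2), whereas "always" makes every large anonymous mapping a candidate. A minimal sketch of such an opt-in, deliberately tolerant of madvise() failing (e.g. when THP is disabled entirely):

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>

/* Map an anonymous region and mark it THP-eligible. Returns 0 if the
 * mapping succeeded; a failed MADV_HUGEPAGE is ignored on purpose. */
static int thp_opt_in(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	/* Under the MADVISE policy, only regions marked like this may be
	 * backed by huge pages; everything else stays on base pages. */
	(void)madvise(p, len, MADV_HUGEPAGE);
	memset(p, 0, len);	/* fault the range in */
	munmap(p, len);
	return 0;
}
```

Under "always", by contrast, no such call is needed, which is what exposed the background reclaim behaviour discussed in this thread.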
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-30 6:56 ` Stefan Priebe - Profihost AG @ 2019-09-30 7:21 ` Vlastimil Babka 0 siblings, 0 replies; 61+ messages in thread From: Vlastimil Babka @ 2019-09-30 7:21 UTC (permalink / raw) To: Stefan Priebe - Profihost AG, Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner On 9/30/19 8:56 AM, Stefan Priebe - Profihost AG wrote: > Hi, > > the current status is that everything has been working fine since I > switched from CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS to > CONFIG_TRANSPARENT_HUGEPAGE_MADVISE. Thanks, that indeed confirms the problem is related to THPs. > On 27.09.19 at 14:45, Vlastimil Babka wrote: >> On 9/19/19 12:21 PM, Stefan Priebe - Profihost AG wrote: >> >> Hi, could you try the patch below? I suspect you're hitting a corner >> case where compaction_suitable() returns COMPACT_SKIPPED for the >> ZONE_DMA, triggering reclaim even if other zones have plenty of free >> memory. And should_continue_reclaim() then returns true until twice the >> requested page size is reclaimed (compact_gap()). That means 4MB >> reclaimed for each THP allocation attempt, which roughly matches the >> trace data you provided previously. >> >> The amplification to 4MB should be removed in patches merged for 5.4, so >> it would be only 32 pages reclaimed per THP allocation. The patch below >> tries to remove this corner case completely, and it should be more >> visible on your 5.2.x, so please apply it there. > > So I switched back to the 4.19 LTS kernel, as this is the kernel we run on > all our infrastructure. THP is now only in use on KVM host machines. > Your patch applies to 4.19 as well, but I'm not sure whether it is a good idea > to apply it to those machines. If you could try that, it would be great (and switch back hugepages to always after applying). The problem is older than 4.19. > Greets, > Stefan > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-27 12:45                           ` Vlastimil Babka
  2019-09-30  6:56                             ` Stefan Priebe - Profihost AG
@ 2019-10-22  7:41                             ` Stefan Priebe - Profihost AG
  2019-10-22  7:48                               ` Vlastimil Babka
  1 sibling, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-10-22 7:41 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

Hi,

On 27.09.19 at 14:45, Vlastimil Babka wrote:
> On 9/19/19 12:21 PM, Stefan Priebe - Profihost AG wrote:
>> Kernel 5.2.14 has now been running for exactly 7 days and we can easily
>> see a trend; I'm not sure if I should post graphs.
>>
>> Cache size is continuously shrinking while MemFree is rising.
>>
>> While there was an average of 4.5GB free in the beginning, we now have
>> an average of 8GB free memory.
>>
>> Cache has shrunk from an average of 24G to an average of 18G.
>>
>> Memory pressure has risen from an average of 0% to 0.1% - not much, but
>> if you look at the graphs it's continuously rising while cache is
>> shrinking and MemFree is rising.
>
> Hi, could you try the patch below? I suspect you're hitting a corner
> case where compaction_suitable() returns COMPACT_SKIPPED for the
> ZONE_DMA, triggering reclaim even if other zones have plenty of free
> memory. And should_continue_reclaim() then returns true until twice the
> requested page size is reclaimed (compact_gap()). That means 4MB
> reclaimed for each THP allocation attempt, which roughly matches the
> trace data you provided previously.
>
> The amplification to 4MB should be removed in patches merged for 5.4, so
> it would be only 32 pages reclaimed per THP allocation. The patch below
> tries to remove this corner case completely, and it should be more
> visible on your 5.2.x, so please apply it there.
>

Is there any reason not to apply that one on top of 4.19?

Greets,
Stefan

^ permalink raw reply	[flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-10-22 7:41 ` Stefan Priebe - Profihost AG @ 2019-10-22 7:48 ` Vlastimil Babka 2019-10-22 10:02 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 61+ messages in thread From: Vlastimil Babka @ 2019-10-22 7:48 UTC (permalink / raw) To: Stefan Priebe - Profihost AG, Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner On 10/22/19 9:41 AM, Stefan Priebe - Profihost AG wrote: >> Hi, could you try the patch below? I suspect you're hitting a corner >> case where compaction_suitable() returns COMPACT_SKIPPED for the >> ZONE_DMA, triggering reclaim even if other zones have plenty of free >> memory. And should_continue_reclaim() then returns true until twice the >> requested page size is reclaimed (compact_gap()). That means 4MB >> reclaimed for each THP allocation attempt, which roughly matches the >> trace data you preovided previously. >> >> The amplification to 4MB should be removed in patches merged for 5.4, so >> it would be only 32 pages reclaimed per THP allocation. The patch below >> tries to remove this corner case completely, and it should be more >> visible on your 5.2.x, so please apply it there. >> > is there any reason to not apply that one on top of 4.19? > > Greets, > Stefan > It should work, cherrypicks fine without conflict here. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-10-22 7:48 ` Vlastimil Babka @ 2019-10-22 10:02 ` Stefan Priebe - Profihost AG 2019-10-22 10:20 ` Oscar Salvador 2019-10-22 10:21 ` Vlastimil Babka 0 siblings, 2 replies; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-10-22 10:02 UTC (permalink / raw) To: Vlastimil Babka, Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner Am 22.10.19 um 09:48 schrieb Vlastimil Babka: > On 10/22/19 9:41 AM, Stefan Priebe - Profihost AG wrote: >>> Hi, could you try the patch below? I suspect you're hitting a corner >>> case where compaction_suitable() returns COMPACT_SKIPPED for the >>> ZONE_DMA, triggering reclaim even if other zones have plenty of free >>> memory. And should_continue_reclaim() then returns true until twice the >>> requested page size is reclaimed (compact_gap()). That means 4MB >>> reclaimed for each THP allocation attempt, which roughly matches the >>> trace data you preovided previously. >>> >>> The amplification to 4MB should be removed in patches merged for 5.4, so >>> it would be only 32 pages reclaimed per THP allocation. The patch below >>> tries to remove this corner case completely, and it should be more >>> visible on your 5.2.x, so please apply it there. >>> >> is there any reason to not apply that one on top of 4.19? >> >> Greets, >> Stefan >> > > It should work, cherrypicks fine without conflict here. OK but does not work ;-) mm/compaction.c: In function '__compaction_suitable': mm/compaction.c:1451:19: error: implicit declaration of function 'zone_managed_pages'; did you mean 'node_spanned_pages'? [-Werror=implicit-function-declaration] alloc_flags, zone_managed_pages(zone))) ^~~~~~~~~~~~~~~~~~ node_spanned_pages Greets, Stefan ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-10-22 10:02 ` Stefan Priebe - Profihost AG @ 2019-10-22 10:20 ` Oscar Salvador 2019-10-22 10:21 ` Vlastimil Babka 1 sibling, 0 replies; 61+ messages in thread From: Oscar Salvador @ 2019-10-22 10:20 UTC (permalink / raw) To: Stefan Priebe - Profihost AG Cc: Vlastimil Babka, Michal Hocko, linux-mm, l.roehrs, cgroups, Johannes Weiner On Tue, Oct 22, 2019 at 12:02:13PM +0200, Stefan Priebe - Profihost AG wrote: > > Am 22.10.19 um 09:48 schrieb Vlastimil Babka: > > On 10/22/19 9:41 AM, Stefan Priebe - Profihost AG wrote: > >>> Hi, could you try the patch below? I suspect you're hitting a corner > >>> case where compaction_suitable() returns COMPACT_SKIPPED for the > >>> ZONE_DMA, triggering reclaim even if other zones have plenty of free > >>> memory. And should_continue_reclaim() then returns true until twice the > >>> requested page size is reclaimed (compact_gap()). That means 4MB > >>> reclaimed for each THP allocation attempt, which roughly matches the > >>> trace data you preovided previously. > >>> > >>> The amplification to 4MB should be removed in patches merged for 5.4, so > >>> it would be only 32 pages reclaimed per THP allocation. The patch below > >>> tries to remove this corner case completely, and it should be more > >>> visible on your 5.2.x, so please apply it there. > >>> > >> is there any reason to not apply that one on top of 4.19? > >> > >> Greets, > >> Stefan > >> > > > > It should work, cherrypicks fine without conflict here. > > OK but does not work ;-) > > > mm/compaction.c: In function '__compaction_suitable': > mm/compaction.c:1451:19: error: implicit declaration of function > 'zone_managed_pages'; did you mean 'node_spanned_pages'? > [-Werror=implicit-function-declaration] > alloc_flags, zone_managed_pages(zone))) > ^~~~~~~~~~~~~~~~~~ > node_spanned_pages zone_managed_pages() was introduced later. On 4.19, you need zone->managed_pages. 
So, changing zone_managed_pages(zone) to zone->managed_pages in that
chunk should do the trick.

> 
> Greets,
> Stefan
> 

-- 
Oscar Salvador
SUSE L3

^ permalink raw reply	[flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-10-22 10:02 ` Stefan Priebe - Profihost AG 2019-10-22 10:20 ` Oscar Salvador @ 2019-10-22 10:21 ` Vlastimil Babka 2019-10-22 11:08 ` Stefan Priebe - Profihost AG 1 sibling, 1 reply; 61+ messages in thread From: Vlastimil Babka @ 2019-10-22 10:21 UTC (permalink / raw) To: Stefan Priebe - Profihost AG, Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner On 10/22/19 12:02 PM, Stefan Priebe - Profihost AG wrote: > > Am 22.10.19 um 09:48 schrieb Vlastimil Babka: >> On 10/22/19 9:41 AM, Stefan Priebe - Profihost AG wrote: >>>> Hi, could you try the patch below? I suspect you're hitting a corner >>>> case where compaction_suitable() returns COMPACT_SKIPPED for the >>>> ZONE_DMA, triggering reclaim even if other zones have plenty of free >>>> memory. And should_continue_reclaim() then returns true until twice the >>>> requested page size is reclaimed (compact_gap()). That means 4MB >>>> reclaimed for each THP allocation attempt, which roughly matches the >>>> trace data you preovided previously. >>>> >>>> The amplification to 4MB should be removed in patches merged for 5.4, so >>>> it would be only 32 pages reclaimed per THP allocation. The patch below >>>> tries to remove this corner case completely, and it should be more >>>> visible on your 5.2.x, so please apply it there. >>>> >>> is there any reason to not apply that one on top of 4.19? >>> >>> Greets, >>> Stefan >>> >> >> It should work, cherrypicks fine without conflict here. > > OK but does not work ;-) > > > mm/compaction.c: In function '__compaction_suitable': > mm/compaction.c:1451:19: error: implicit declaration of function > 'zone_managed_pages'; did you mean 'node_spanned_pages'? > [-Werror=implicit-function-declaration] > alloc_flags, zone_managed_pages(zone))) > ^~~~~~~~~~~~~~~~~~ > node_spanned_pages Ah, this? 
----8<----
From f1335e1c0d4b74205fc0cc40b5960223d6f1dec7 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 12 Sep 2019 13:40:46 +0200
Subject: [PATCH] WIP

---
 include/linux/compaction.h     |  7 ++++++-
 include/trace/events/mmflags.h |  1 +
 mm/compaction.c                | 16 +++++++++++++--
 mm/vmscan.c                    | 36 ++++++++++++++++++++++++----------
 4 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/include/linux/compaction.h b/include/linux/compaction.h
index 68250a57aace..2f3b331c5239 100644
--- a/include/linux/compaction.h
+++ b/include/linux/compaction.h
@@ -17,8 +17,13 @@ enum compact_priority {
 };
 
 /* Return values for compact_zone() and try_to_compact_pages() */
-/* When adding new states, please adjust include/trace/events/compaction.h */
+/* When adding new states, please adjust include/trace/events/mmflags.h */
 enum compact_result {
+	/*
+	 * The zone is too small to provide the requested allocation even if
+	 * fully freed (i.e. ZONE_DMA for THP allocation due to lowmem reserves)
+	 */
+	COMPACT_IMPOSSIBLE,
 	/* For more detailed tracepoint output - internal to compaction */
 	COMPACT_NOT_SUITABLE_ZONE,
 	/*
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index a81cffb76d89..d7aa9cece234 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -169,6 +169,7 @@ IF_HAVE_VM_SOFTDIRTY(VM_SOFTDIRTY,	"softdirty"	) \
 
 #ifdef CONFIG_COMPACTION
 #define COMPACTION_STATUS					\
+	EM( COMPACT_IMPOSSIBLE,		"impossible")		\
 	EM( COMPACT_SKIPPED,		"skipped")		\
 	EM( COMPACT_DEFERRED,		"deferred")		\
 	EM( COMPACT_CONTINUE,		"continue")		\
diff --git a/mm/compaction.c b/mm/compaction.c
index 5079ddbec8f9..7d2299c7faa2 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1416,6 +1416,7 @@ static enum compact_result compact_finished(struct zone *zone,
 /*
  * compaction_suitable: Is this suitable to run compaction on this zone now?
  * Returns
+ *   COMPACT_IMPOSSIBLE If the allocation would fail even with all pages free
  *   COMPACT_SKIPPED  - If there are too few free pages for compaction
  *   COMPACT_SUCCESS  - If the allocation would succeed without compaction
  *   COMPACT_CONTINUE - If compaction should run now
@@ -1439,6 +1440,16 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order,
 								alloc_flags))
 		return COMPACT_SUCCESS;
 
+	/*
+	 * If the allocation would not succeed even with a fully free zone
+	 * due to e.g. lowmem reserves, indicate that compaction can't possibly
+	 * help and it would be pointless to reclaim.
+	 */
+	watermark += 1UL << order;
+	if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx,
+				 alloc_flags, zone->managed_pages))
+		return COMPACT_IMPOSSIBLE;
+
 	/*
 	 * Watermarks for order-0 must be met for compaction to be able to
 	 * isolate free pages for migration targets. This means that the
@@ -1526,7 +1537,7 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
 		available += zone_page_state_snapshot(zone, NR_FREE_PAGES);
 		compact_result = __compaction_suitable(zone, order, alloc_flags,
 				ac_classzone_idx(ac), available);
-		if (compact_result != COMPACT_SKIPPED)
+		if (compact_result > COMPACT_SKIPPED)
 			return true;
 	}
 
@@ -1555,7 +1566,8 @@ static enum compact_result compact_zone(struct zone *zone, struct compact_contro
 	ret = compaction_suitable(zone, cc->order, cc->alloc_flags,
 							cc->classzone_idx);
 	/* Compaction is likely to fail */
-	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED)
+	if (ret == COMPACT_SUCCESS || ret == COMPACT_SKIPPED
+	    || ret == COMPACT_IMPOSSIBLE)
 		return ret;
 
 	/* huh, compaction_suitable is returning something unexpected */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b37610c0eac6..7ad331a64fc5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2849,11 +2849,12 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 }
 
 /*
- * Returns true if compaction should go ahead for a costly-order request, or
- * the allocation would already succeed without compaction. Return false if we
- * should reclaim first.
+ * Returns 1 if compaction should go ahead for a costly-order request, or the
+ * allocation would already succeed without compaction. Return 0 if we should
+ * reclaim first. Return -1 when compaction can't help at all due to zone being
+ * too small, which means there's no point in reclaim nor compaction.
  */
-static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
+static inline int compaction_ready(struct zone *zone, struct scan_control *sc)
 {
 	unsigned long watermark;
 	enum compact_result suitable;
@@ -2861,10 +2862,16 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
 	suitable = compaction_suitable(zone, sc->order, 0, sc->reclaim_idx);
 	if (suitable == COMPACT_SUCCESS)
 		/* Allocation should succeed already. Don't reclaim. */
-		return true;
+		return 1;
 	if (suitable == COMPACT_SKIPPED)
 		/* Compaction cannot yet proceed. Do reclaim. */
-		return false;
+		return 0;
+	if (suitable == COMPACT_IMPOSSIBLE)
+		/*
+		 * Compaction can't possibly help. So don't reclaim, but keep
+		 * checking other zones.
+		 */
+		return -1;
 
 	/*
 	 * Compaction is already possible, but it takes time to run and there
@@ -2910,6 +2917,7 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 
 	for_each_zone_zonelist_nodemask(zone, z, zonelist,
 					sc->reclaim_idx, sc->nodemask) {
+		int compact_ready;
 		/*
 		 * Take care memory controller reclaiming has small influence
 		 * to global LRU.
@@ -2929,10 +2937,18 @@ static void shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 			 * page allocations.
 			 */
 			if (IS_ENABLED(CONFIG_COMPACTION) &&
-			    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
-			    compaction_ready(zone, sc)) {
-				sc->compaction_ready = true;
-				continue;
+			    sc->order > PAGE_ALLOC_COSTLY_ORDER) {
+				compact_ready = compaction_ready(zone, sc);
+				if (compact_ready == 1) {
+					sc->compaction_ready = true;
+					continue;
+				} else if (compact_ready == -1) {
+					/*
+					 * In this zone, neither reclaim nor
+					 * compaction can help.
+					 */
+					continue;
+				}
 			}
 
 			/*
-- 
2.23.0

^ permalink raw reply related	[flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-10-22 10:21 ` Vlastimil Babka @ 2019-10-22 11:08 ` Stefan Priebe - Profihost AG 0 siblings, 0 replies; 61+ messages in thread From: Stefan Priebe - Profihost AG @ 2019-10-22 11:08 UTC (permalink / raw) To: Vlastimil Babka, Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner works - thanks Am 22.10.19 um 12:21 schrieb Vlastimil Babka: > On 10/22/19 12:02 PM, Stefan Priebe - Profihost AG wrote: >> >> Am 22.10.19 um 09:48 schrieb Vlastimil Babka: >>> On 10/22/19 9:41 AM, Stefan Priebe - Profihost AG wrote: >>>>> Hi, could you try the patch below? I suspect you're hitting a corner >>>>> case where compaction_suitable() returns COMPACT_SKIPPED for the >>>>> ZONE_DMA, triggering reclaim even if other zones have plenty of free >>>>> memory. And should_continue_reclaim() then returns true until twice the >>>>> requested page size is reclaimed (compact_gap()). That means 4MB >>>>> reclaimed for each THP allocation attempt, which roughly matches the >>>>> trace data you preovided previously. >>>>> >>>>> The amplification to 4MB should be removed in patches merged for 5.4, so >>>>> it would be only 32 pages reclaimed per THP allocation. The patch below >>>>> tries to remove this corner case completely, and it should be more >>>>> visible on your 5.2.x, so please apply it there. >>>>> >>>> is there any reason to not apply that one on top of 4.19? >>>> >>>> Greets, >>>> Stefan >>>> >>> >>> It should work, cherrypicks fine without conflict here. >> >> OK but does not work ;-) >> >> >> mm/compaction.c: In function '__compaction_suitable': >> mm/compaction.c:1451:19: error: implicit declaration of function >> 'zone_managed_pages'; did you mean 'node_spanned_pages'? >> [-Werror=implicit-function-declaration] >> alloc_flags, zone_managed_pages(zone))) >> ^~~~~~~~~~~~~~~~~~ >> node_spanned_pages > > Ah, this? 
> [full patch quoted verbatim - snipped]

^ permalink raw reply	[flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09 12:49                     ` Michal Hocko
  2019-09-09 12:56                       ` Stefan Priebe - Profihost AG
@ 2019-09-10  5:41                       ` Stefan Priebe - Profihost AG
  1 sibling, 0 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-10 5:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner, Vlastimil Babka

On 09.09.19 at 14:49, Michal Hocko wrote:
> On Mon 09-09-19 14:37:52, Stefan Priebe - Profihost AG wrote:
>>
>> On 09.09.19 at 14:28, Michal Hocko wrote:
>>> On Mon 09-09-19 14:10:02, Stefan Priebe - Profihost AG wrote:
>>>>
>>>> On 09.09.19 at 14:08, Michal Hocko wrote:
>>>>> On Mon 09-09-19 13:01:36, Michal Hocko wrote:
>>>>>> and that matches moments when we reclaimed memory. There seems to be a
>>>>>> steady flow of THP allocations, so maybe this is a source of the direct
>>>>>> reclaim?
>>>>>
>>>>> I was thinking about this some more and THP being a source of reclaim
>>>>> sounds quite unlikely. At least in a default configuration, because we
>>>>> shouldn't do anything expensive in the #PF path. But there might be a
>>>>> different source of high order (!costly) allocations. Could you check
>>>>> how many allocation requests like that you have on your system?

I have another system which might be interesting, though I'm not sure
what to gather from it. It never builds up any read cache because memory
is constantly under pressure - even though MemFree is 28G. What would be
interesting to collect here?

Pressure is not very high, just 1-3%, but it seems to prevent the system
from building up a file cache. Mostly at night, when there is no
pressure, it starts building up a read cache until pressure happens
again. And all of this happens with MemFree at nearly 30GB.

Greets,
Stefan

^ permalink raw reply	[flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09  8:54             ` Stefan Priebe - Profihost AG
  2019-09-09 11:01               ` Michal Hocko
@ 2019-09-09 11:49               ` Vlastimil Babka
  2019-09-09 12:09                 ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 61+ messages in thread
From: Vlastimil Babka @ 2019-09-09 11:49 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On 9/9/19 10:54 AM, Stefan Priebe - Profihost AG wrote:
>> Do you have more snapshots of /proc/vmstat as suggested by Vlastimil and
>> me earlier in this thread? Seeing the overall progress would tell us
>> much more than before and after. Or have I missed this data?
> 
> I needed to wait until today to grab such a situation again, but from
> what I know it is very clear that MemFree is low and then the kernel
> starts to drop the caches.
> 
> Attached you'll find two log files.

Thanks, what about my other requests/suggestions from earlier?

1. What does /proc/pagetypeinfo look like?

2. Could you also try if the bad trend stops after you execute:
   echo never > /sys/kernel/mm/transparent_hugepage/defrag
   and report the result?

Thanks

^ permalink raw reply	[flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09 11:49               ` Vlastimil Babka
@ 2019-09-09 12:09                 ` Stefan Priebe - Profihost AG
  2019-09-09 12:21                   ` Vlastimil Babka
  0 siblings, 1 reply; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-09 12:09 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On 09.09.19 at 13:49, Vlastimil Babka wrote:
> On 9/9/19 10:54 AM, Stefan Priebe - Profihost AG wrote:
>>> Do you have more snapshots of /proc/vmstat as suggested by Vlastimil and
>>> me earlier in this thread? Seeing the overall progress would tell us
>>> much more than before and after. Or have I missed this data?
>>
>> I needed to wait until today to grab such a situation again, but from
>> what I know it is very clear that MemFree is low and then the kernel
>> starts to drop the caches.
>>
>> Attached you'll find two log files.
>
> Thanks, what about my other requests/suggestions from earlier?

Sorry, I missed your email.

> 1. What does /proc/pagetypeinfo look like?
# cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      1      0      0      1      2      1      1      0      1      0      0
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      1      3
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type    Unmovable   1141    970    903    628    302    106     27      4      0      0      0
Node    0, zone    DMA32, type      Movable    274    269    368    396    342    265    214    178    113     12     13
Node    0, zone    DMA32, type  Reclaimable     81     57    134    114     60     50     25      4      2      0      0
Node    0, zone    DMA32, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type    Unmovable     39     36  13257   3474   1333    317     42      0      0      0      0
Node    0, zone   Normal, type      Movable   1087   9678   1104   4250   2391   1946   1768    691    141      0      0
Node    0, zone   Normal, type  Reclaimable      1   1782   1153   2455   1927    986    330      7      2      0      0
Node    0, zone   Normal, type   HighAtomic      1      1      2      2      2      0      1      1      1      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable      Movable  Reclaimable   HighAtomic      Isolate
Node 0, zone      DMA             1            7            0            0            0
Node 0, zone    DMA32            52         1461           15            0            0
Node 0, zone   Normal           824         5448          383            1            0

> 2. Could you also try if the bad trend stops after you execute:
>    echo never > /sys/kernel/mm/transparent_hugepage/defrag
>    and report the result?

It's pretty difficult to catch those moments. Is it OK to set the value
now and monitor whether it happens again?

Just to let you know:
I've now also some more servers where MemFree shows 10-20GB but the
cache drops suddenly and memory PSI rises.

Greets,
Stefan

^ permalink raw reply	[flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI 2019-09-09 12:09 ` Stefan Priebe - Profihost AG @ 2019-09-09 12:21 ` Vlastimil Babka 2019-09-09 12:31 ` Stefan Priebe - Profihost AG 0 siblings, 1 reply; 61+ messages in thread From: Vlastimil Babka @ 2019-09-09 12:21 UTC (permalink / raw) To: Stefan Priebe - Profihost AG, Michal Hocko Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner On 9/9/19 2:09 PM, Stefan Priebe - Profihost AG wrote: > > Am 09.09.19 um 13:49 schrieb Vlastimil Babka: >> On 9/9/19 10:54 AM, Stefan Priebe - Profihost AG wrote: >>>> Do you have more snapshots of /proc/vmstat as suggested by Vlastimil and >>>> me earlier in this thread? Seeing the overall progress would tell us >>>> much more than before and after. Or have I missed this data? >>> >>> I needed to wait until today to grab again such a situation but from >>> what i know it is very clear that MemFree is low and than the kernel >>> starts to drop the chaches. >>> >>> Attached you'll find two log files. >> >> Thanks, what about my other requests/suggestions from earlier? > > Sorry i missed your email. > >> 1. How does /proc/pagetypeinfo look like? > > # cat /proc/pagetypeinfo > Page block order: 9 > Pages per block: 512 Looks like it might be fragmented, but was that snapshot taken in the situation where there's free memory and the system still drops cache? >> 2. Could you also try if the bad trend stops after you execute: >> echo never > /sys/kernel/mm/transparent_hugepage/defrag >> and report the result? > > it's pretty difficult to catch those moments. Is it OK so set the value > now and monitor if it happens again? Well if it doesn't happen again after changing that setting, it would definitely point at THP interactions. > Just to let you know: > I've now also some more servers where memfree show 10-20Gb but cache > drops suddently and memory PSI raises. You mean those are in that state right now? 
So how does /proc/pagetypeinfo look there, and would changing the defrag setting help? > Greets, > Stefan > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-09 12:21                   ` Vlastimil Babka
@ 2019-09-09 12:31                     ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-09 12:31 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko
  Cc: linux-mm, l.roehrs, cgroups, Johannes Weiner

On 09.09.19 at 14:21, Vlastimil Babka wrote:
> On 9/9/19 2:09 PM, Stefan Priebe - Profihost AG wrote:
>>
>> On 09.09.19 at 13:49, Vlastimil Babka wrote:
>>> On 9/9/19 10:54 AM, Stefan Priebe - Profihost AG wrote:
>>>>> Do you have more snapshots of /proc/vmstat as suggested by
>>>>> Vlastimil and me earlier in this thread? Seeing the overall
>>>>> progress would tell us much more than before and after. Or have I
>>>>> missed this data?
>>>>
>>>> I needed to wait until today to grab such a situation again, but from
>>>> what I know it is very clear that MemFree is low and then the kernel
>>>> starts to drop the caches.
>>>>
>>>> Attached you'll find two log files.
>>>
>>> Thanks, what about my other requests/suggestions from earlier?
>>
>> Sorry, I missed your email.
>>
>>> 1. What does /proc/pagetypeinfo look like?
>>
>> # cat /proc/pagetypeinfo
>> Page block order: 9
>> Pages per block:  512
>
> Looks like it might be fragmented, but was that snapshot taken in the
> situation where there's free memory and the system still drops cache?

No, this one is from "now", where no pressure is recorded and where
MemFree is at 3G and cache is also at 3G.

>>> 2. Could you also try if the bad trend stops after you execute:
>>>    echo never > /sys/kernel/mm/transparent_hugepage/defrag
>>>    and report the result?
>>
>> It's pretty difficult to catch those moments. Is it OK to set the value
>> now and monitor whether it happens again?
>
> Well if it doesn't happen again after changing that setting, it would
> definitely point at THP interactions.

OK, I set it to never.
>> Just to let you know: >> I've now also some more servers where memfree show 10-20Gb but cache >> drops suddently and memory PSI raises. > > You mean those are in that state right now? So how does > /proc/pagetypeinfo look there, and would changing the defrag setting help? Yes i've a system which constantly triggers PSI (just 1-3%) but Mem Free is at 29GB. 1402: # cat /proc/pagetypeinfo Page block order: 9 Pages per block: 512 Free pages count per migrate type at order 0 1 2 3 4 5 6 7 8 9 10 Node 0, zone DMA, type Unmovable 0 0 0 1 2 1 1 0 1 0 0 Node 0, zone DMA, type Movable 0 0 0 0 0 0 0 0 0 1 3 Node 0, zone DMA, type Reclaimable 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone DMA, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone DMA, type Isolate 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone DMA32, type Unmovable 0 1 0 1 0 1 0 1 1 0 3 Node 0, zone DMA32, type Movable 42 29 60 52 56 52 47 46 24 3 48 Node 0, zone DMA32, type Reclaimable 0 0 3 1 0 1 1 1 1 0 0 Node 0, zone DMA32, type HighAtomic 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone DMA32, type Isolate 0 0 0 0 0 0 0 0 0 0 0 Node 0, zone Normal, type Unmovable 189 7690 24737 14314 7620 5362 3458 1607 165 0 0 Node 0, zone Normal, type Movable 29269 31003 70251 73957 54776 37134 21084 10547 2307 35 4 Node 0, zone Normal, type Reclaimable 1431 3837 1821 2137 2475 978 386 112 2 0 0 Node 0, zone Normal, type HighAtomic 0 0 1 3 3 3 1 0 1 0 0 Node 0, zone Normal, type Isolate 0 0 0 0 0 0 0 0 0 0 0 Number of blocks type Unmovable Movable Reclaimable HighAtomic Isolate Node 0, zone DMA 1 7 0 0 0 Node 0, zone DMA32 10 1005 1 0 0 Node 0, zone Normal 3407 27184 1152 1 0 Stefan ^ permalink raw reply [flat|nested] 61+ messages in thread
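The free-page counts posted above can be condensed with a short awk pass to show the shape of the fragmentation: at order n each count is a block of 2^n pages of 4 KiB, and only blocks of order >= 9 (the pageblock order here) can back a 2 MiB THP. The numbers below are the "Node 0, zone Normal, type Movable" row copied from the message; the script itself is just a sketch.

```shell
# Aggregate the Normal-zone Movable free-page counts quoted above.
counts='29269 31003 70251 73957 54776 37134 21084 10547 2307 35 4'
out=$(printf '%s\n' "$counts" | awk '{
    total = 0; huge = 0
    for (i = 1; i <= NF; i++) {
        order = i - 1
        # order n holds blocks of 2^n pages (4 KiB each)
        total += $i * 2 ^ order
        # order >= 9 blocks are the 2 MiB chunks THP allocations need
        if (order >= 9) huge += $i * 2 ^ (order - 9)
    }
    printf "free: %d pages (%d MiB), order>=9 chunks: %d\n", total, total * 4 / 1024, huge
}')
echo "$out"
```

For the row above this yields roughly 24 GiB of free movable memory but only 43 huge-page-sized chunks, which is consistent with Vlastimil's suspicion that THP defrag activity, not a real memory shortage, is behind the pressure.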
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 11:27 lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG
  2019-09-05 11:40 ` Michal Hocko
@ 2019-09-05 12:15 ` Vlastimil Babka
  2019-09-05 12:27   ` Stefan Priebe - Profihost AG
  1 sibling, 1 reply; 61+ messages in thread
From: Vlastimil Babka @ 2019-09-05 12:15 UTC (permalink / raw)
  To: Stefan Priebe - Profihost AG, linux-mm
  Cc: l.roehrs, cgroups, Johannes Weiner, Michal Hocko

On 9/5/19 1:27 PM, Stefan Priebe - Profihost AG wrote:
> Hello all,
>
> I hope you can help me again to understand the current MemAvailable
> value in the Linux kernel. I'm running a 4.19.52 kernel + PSI patches
> in this case.
>
> I'm seeing the following behaviour I don't understand and ask for help.
>
> While MemAvailable shows 5G, the kernel starts to drop cache from 4G
> down to 1G while Apache spawns some PHP processes. After that the PSI
> memory "some" value rises and the kernel tries to reclaim memory, but
> MemAvailable stays at 5G.
>
> Any ideas?

PHP seems to use madvise(MADV_HUGEPAGE), so if it's a NUMA machine it
might be worth trying to cherry-pick these two commits:

92717d429b38 ("Revert "Revert "mm, thp: consolidate THP gfp handling
into alloc_hugepage_direct_gfpmask""")
a8282608c88e ("Revert "mm, thp: restore node-local hugepage allocations"")

> Thanks!
>
> Greets,
> Stefan

^ permalink raw reply	[flat|nested] 61+ messages in thread
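Whether the madvise(MADV_HUGEPAGE) regions actually ended up backed by huge pages can be checked from userspace by summing the AnonHugePages fields in /proc/&lt;pid&gt;/smaps. The helper below is a sketch run against made-up sample data rather than a live PHP process; on a real system it would be pointed at the smaps file of a PHP worker.

```shell
# Sum the AnonHugePages counters of an smaps dump (field 2 is in kB).
sum_thp() {
    awk '/^AnonHugePages:/ { kb += $2 }
         END { printf "AnonHugePages total: %d kB\n", kb }' "$@"
}

# On a live system: sum_thp /proc/<php-worker-pid>/smaps
# Sample smaps excerpt (values are hypothetical, for illustration only):
out=$(sum_thp <<'EOF'
AnonHugePages:      2048 kB
AnonHugePages:         0 kB
AnonHugePages:      4096 kB
EOF
)
echo "$out"
```

A total of zero despite the MADV_HUGEPAGE hint would suggest THP allocations are failing or being deferred, which fits the fragmentation picture discussed elsewhere in the thread.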
* Re: lot of MemAvailable but falling cache and raising PSI
  2019-09-05 12:15 ` Vlastimil Babka
@ 2019-09-05 12:27   ` Stefan Priebe - Profihost AG
  0 siblings, 0 replies; 61+ messages in thread
From: Stefan Priebe - Profihost AG @ 2019-09-05 12:27 UTC (permalink / raw)
  To: Vlastimil Babka, linux-mm
  Cc: l.roehrs, cgroups, Johannes Weiner, Michal Hocko

On 05.09.19 at 14:15, Vlastimil Babka wrote:
> On 9/5/19 1:27 PM, Stefan Priebe - Profihost AG wrote:
>> Hello all,
>>
>> I hope you can help me again to understand the current MemAvailable
>> value in the Linux kernel. I'm running a 4.19.52 kernel + PSI patches
>> in this case.
>>
>> I'm seeing the following behaviour I don't understand and ask for help.
>>
>> While MemAvailable shows 5G, the kernel starts to drop cache from 4G
>> down to 1G while Apache spawns some PHP processes. After that the PSI
>> memory "some" value rises and the kernel tries to reclaim memory, but
>> MemAvailable stays at 5G.
>>
>> Any ideas?
>
> PHP seems to use madvise(MADV_HUGEPAGE), so if it's a NUMA machine it
> might be worth trying to cherry-pick these two commits:
>
> 92717d429b38 ("Revert "Revert "mm, thp: consolidate THP gfp handling
> into alloc_hugepage_direct_gfpmask""")
> a8282608c88e ("Revert "mm, thp: restore node-local hugepage allocations"")

No, it's a VM running inside qemu/kvm without NUMA.

Greets,
Stefan

^ permalink raw reply	[flat|nested] 61+ messages in thread
end of thread, other threads:[~2019-10-22 11:08 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-05 11:27 lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG
2019-09-05 11:40 ` Michal Hocko
2019-09-05 11:56 ` Stefan Priebe - Profihost AG
2019-09-05 16:28 ` Yang Shi
2019-09-05 17:26 ` Stefan Priebe - Profihost AG
2019-09-05 18:46 ` Yang Shi
2019-09-05 19:31 ` Stefan Priebe - Profihost AG
2019-09-06 10:08 ` Stefan Priebe - Profihost AG
2019-09-06 10:25 ` Vlastimil Babka
2019-09-06 18:52 ` Yang Shi
2019-09-07  7:32 ` Stefan Priebe - Profihost AG
2019-09-09  8:27 ` Michal Hocko
2019-09-09  8:54 ` Stefan Priebe - Profihost AG
2019-09-09 11:01 ` Michal Hocko
2019-09-09 12:08 ` Michal Hocko
2019-09-09 12:10 ` Stefan Priebe - Profihost AG
2019-09-09 12:28 ` Michal Hocko
2019-09-09 12:37 ` Stefan Priebe - Profihost AG
2019-09-09 12:49 ` Michal Hocko
2019-09-09 12:56 ` Stefan Priebe - Profihost AG
     [not found] ` <52235eda-ffe2-721c-7ad7-575048e2d29d@profihost.ag>
2019-09-10  5:58 ` Stefan Priebe - Profihost AG
2019-09-10  8:29 ` Michal Hocko
2019-09-10  8:38 ` Stefan Priebe - Profihost AG
2019-09-10  9:02 ` Michal Hocko
2019-09-10  9:37 ` Stefan Priebe - Profihost AG
2019-09-10 11:07 ` Michal Hocko
2019-09-10 12:45 ` Stefan Priebe - Profihost AG
2019-09-10 12:57 ` Michal Hocko
2019-09-10 13:05 ` Stefan Priebe - Profihost AG
2019-09-10 13:14 ` Stefan Priebe - Profihost AG
2019-09-10 13:24 ` Michal Hocko
2019-09-11  6:12 ` Stefan Priebe - Profihost AG
2019-09-11  6:24 ` Stefan Priebe - Profihost AG
2019-09-11 13:59 ` Stefan Priebe - Profihost AG
2019-09-12 10:53 ` Stefan Priebe - Profihost AG
2019-09-12 11:06 ` Stefan Priebe - Profihost AG
2019-09-11  7:09 ` 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) Michal Hocko
2019-09-11 14:09 ` Stefan Priebe - Profihost AG
2019-09-11 14:56 ` Filipe Manana
2019-09-11 15:39 ` Stefan Priebe - Profihost AG
2019-09-11 15:56 ` Filipe Manana
2019-09-11 16:15 ` Stefan Priebe - Profihost AG
2019-09-11 16:19 ` Filipe Manana
2019-09-19 10:21 ` lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG
2019-09-23 12:08 ` Michal Hocko
2019-09-27 12:45 ` Vlastimil Babka
2019-09-30  6:56 ` Stefan Priebe - Profihost AG
2019-09-30  7:21 ` Vlastimil Babka
2019-10-22  7:41 ` Stefan Priebe - Profihost AG
2019-10-22  7:48 ` Vlastimil Babka
2019-10-22 10:02 ` Stefan Priebe - Profihost AG
2019-10-22 10:20 ` Oscar Salvador
2019-10-22 10:21 ` Vlastimil Babka
2019-10-22 11:08 ` Stefan Priebe - Profihost AG
2019-09-10  5:41 ` Stefan Priebe - Profihost AG
2019-09-09 11:49 ` Vlastimil Babka
2019-09-09 12:09 ` Stefan Priebe - Profihost AG
2019-09-09 12:21 ` Vlastimil Babka
2019-09-09 12:31 ` Stefan Priebe - Profihost AG
2019-09-05 12:15 ` Vlastimil Babka
2019-09-05 12:27 ` Stefan Priebe - Profihost AG
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).