Linux-mm Archive on lore.kernel.org
From: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
To: Michal Hocko <mhocko@kernel.org>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	l.roehrs@profihost.ag, cgroups@vger.kernel.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: lot of MemAvailable but falling cache and raising PSI
Date: Wed, 11 Sep 2019 15:59:56 +0200
Message-ID: <289fbe71-0472-520f-64e2-b6d07ced5436@profihost.ag> (raw)
In-Reply-To: <5f960e74-1f44-9a0a-58a6-dcb64aa71612@profihost.ag>

Hi,

I've now tried v5.2.14, but that one died with the crash below - I don't
know which version to try now...

2019-09-11 15:41:09     ------------[ cut here ]------------
2019-09-11 15:41:09     kernel BUG at mm/page-writeback.c:2655!
2019-09-11 15:41:09     invalid opcode: 0000 [#1] SMP PTI
2019-09-11 15:41:09     CPU: 4 PID: 466 Comm: kworker/u24:6 Not tainted 5.2.14 #1
2019-09-11 15:41:09     Hardware name: Supermicro Super Server/X10SRi-F, BIOS 1.0b 04/21/2015
2019-09-11 15:41:09     Workqueue: btrfs-delalloc btrfs_delalloc_helper [btrfs]
2019-09-11 15:41:09     RIP: 0010:clear_page_dirty_for_io+0xfc/0x210
2019-09-11 15:41:09     Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85 d2 48 8b
2019-09-11 15:41:09     RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246
2019-09-11 15:41:09     RAX: 001000000004205c RBX: ffffe660525b3140 RCX: 0000000000000000
2019-09-11 15:41:09     RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffe660525b3140
2019-09-11 15:41:09     RBP: ffff9ad639868818 R08: 0000000000000001 R09: 000000000002de18
2019-09-11 15:41:09     R10: 0000000000000002 R11: ffff9ade7ffd6000 R12: 0000000000000000
2019-09-11 15:41:09     R13: 0000000000000001 R14: 0000000000000000 R15: ffffbd4b8d2f3d08
2019-09-11 15:41:09     FS: 0000000000000000(0000) GS:ffff9ade3f900000(0000) knlGS:0000000000000000
2019-09-11 15:41:09     CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2019-09-11 15:41:09     CR2: 000055fa10d2bf70 CR3: 00000005a420a002 CR4: 00000000001606e0
2019-09-11 15:41:09     Call Trace:
2019-09-11 15:41:09     __process_pages_contig+0x270/0x360 [btrfs]
2019-09-11 15:41:09     submit_compressed_extents+0x39d/0x460 [btrfs]
2019-09-11 15:41:09     normal_work_helper+0x20f/0x320 [btrfs]
2019-09-11 15:41:09     process_one_work+0x18b/0x380
2019-09-11 15:41:09     worker_thread+0x4f/0x3a0
2019-09-11 15:41:09     ? rescuer_thread+0x330/0x330
2019-09-11 15:41:09     kthread+0xf8/0x130
2019-09-11 15:41:09     ? kthread_create_worker_on_cpu+0x70/0x70
2019-09-11 15:41:09     ret_from_fork+0x35/0x40
2019-09-11 15:41:09     Modules linked in: netconsole xt_tcpudp xt_owner xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_multiport ipt_REJECT nf_reject_ipv4 xt_set iptable_filter bpfilter fuse ip_set_hash_net ip_set nfnetlink 8021q garp bonding sb_edac x86_pkg_temp_thermal coretemp kvm_intel ast kvm ttm drm_kms_helper irqbypass crc32_pclmul drm fb_sys_fops syscopyarea lpc_ich sysfillrect ghash_clmulni_intel sysimgblt mfd_core sg wmi ipmi_si ipmi_devintf ipmi_msghandler button ip_tables x_tables btrfs zstd_decompress zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor usbhid raid6_pq raid1 raid0 multipath linear md_mod sd_mod xhci_pci ehci_pci igb xhci_hcd ehci_hcd i2c_algo_bit i2c_i801 ahci ptp i2c_core usbcore libahci usb_common pps_core megaraid_sas
2019-09-11 15:41:09     ---[ end trace d9a3f99c047dc8bf ]---
2019-09-11 15:41:10     RIP: 0010:clear_page_dirty_for_io+0xfc/0x210
2019-09-11 15:41:10     Code: 01 48 0f 44 d3 f0 48 0f ba 32 03 b8 00 00 00 00 72 1a 4d 85 e4 0f 85 b4 00 00 00 48 83 c4 08 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 9c 41 5f fa 48 8b 03 48 8b 53 38 48 c1 e8 36 48 85 d2 48 8b
2019-09-11 15:41:10     RSP: 0018:ffffbd4b8d2f3c18 EFLAGS: 00010246
2019-09-11 15:41:10     RAX: 001000000004205c RBX: ffffe660525b3140 RCX: 0000000000000000
2019-09-11 15:41:10     RDX: 0000000000000000 RSI: 0000000000000006 RDI: ffffe660525b3140
2019-09-11 15:41:10     RBP: ffff9ad639868818 R08: 0000000000000001 R09: 000000000002de18
2019-09-11 15:41:10     R10: 0000000000000002 R11: ffff9ade7ffd6000 R12: 0000000000000000
2019-09-11 15:41:10     R13: 0000000000000001 R14: 0000000000000000 R15: ffffbd4b8d2f3d08
2019-09-11 15:41:10     FS: 0000000000000000(0000) GS:ffff9ade3f900000(0000) knlGS:0000000000000000
2019-09-11 15:41:10     CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2019-09-11 15:41:10     CR2: 000055fa10d2bf70 CR3: 00000005a420a002 CR4: 00000000001606e0
2019-09-11 15:41:10     Kernel panic - not syncing: Fatal exception
2019-09-11 15:41:10     Kernel Offset: 0x1a000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
2019-09-11 15:41:10     Rebooting in 20 seconds..
2019-09-11 15:41:29     ACPI MEMORY or I/O RESET_REG.

Stefan
Am 11.09.19 um 08:24 schrieb Stefan Priebe - Profihost AG:
> Hi Michal,
> 
> Am 11.09.19 um 08:12 schrieb Stefan Priebe - Profihost AG:
>> Hi Michal,
>> Am 10.09.19 um 15:24 schrieb Michal Hocko:
>>> On Tue 10-09-19 15:14:45, Stefan Priebe - Profihost AG wrote:
>>>> Am 10.09.19 um 15:05 schrieb Stefan Priebe - Profihost AG:
>>>>>
>>>>> Am 10.09.19 um 14:57 schrieb Michal Hocko:
>>>>>> On Tue 10-09-19 14:45:37, Stefan Priebe - Profihost AG wrote:
>>>>>>> Hello Michal,
>>>>>>>
>>>>>>> ok this might take a long time. Attached you'll find a graph from a
>>>>>>> fresh boot what happens over time (here 17 August to 30 August). Memory
>>>>>>> Usage decreases as well as cache but slowly and only over time and days.
>>>>>>>
>>>>>>> So it might take 2-3 weeks running Kernel 5.3 to see what happens.
>>>>>>
>>>>>> No problem. Just make sure to collect the requested data from the time
>>>>>> you see the actual problem. Btw. did you try my very dumb scriptlets to
>>>>>> get an idea of how much memory gets reclaimed due to THP?
>>>>>
>>>>> You mean your sed and sort on top of the trace file? No, I did not with
>>>>> the current 5.3 kernel - do you think it will show anything interesting?
>>>>> Which line shows me how much memory gets reclaimed due to THP?
>>>
>>> Please re-read http://lkml.kernel.org/r/20190910082919.GL2063@dhcp22.suse.cz
>>> Each command has a commented output. If you see the number of reclaimed
>>> pages to be large for GFP_TRANSHUGE then you are seeing a similar
>>> problem.
>>>
>>>> Is something like a kernel memory leak possible? Or wouldn't this end up
>>>> in having a lot of free memory which doesn't seem usable.
>>>
>>> I would be really surprised if this was the case.
>>>
>>>> I also wonder why a reclaim takes place when there is enough memory.
>>>
>>> This is not clear yet and it might be a bug that has been fixed since
>>> 4.18. That's why we need to see whether the same pattern is happening
>>> with 5.3 as well.
> 
> but except for the btrfs problem the memory consumption looks far
> better than before.
> 
> Running 4.19.X:
> after about 12h cache starts to drop from 30G to 24G
> 
> Running 5.3-rc8:
> after about 24h cache is still constant at nearly 30G
> 
> Greets,
> Stefan
> 
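[Editor's note: the "scriptlets" referenced above (sed and sort over the ftrace output) are in Michal's linked mail; the gist is summing nr_reclaimed per gfp_flags combination to see whether GFP_TRANSHUGE allocations dominate the reclaim. A minimal sketch of that aggregation, assuming input lines that carry both a gfp_flags= and an nr_reclaimed= field (in the raw trace these come from pairing the vmscan begin/end events, which is omitted here):]

```shell
#!/bin/sh
# Sum pages reclaimed per gfp_flags combination from (pre-paired) ftrace
# vmscan output. The field layout below is an assumption; check the actual
# trace format under /sys/kernel/debug/tracing/events/vmscan/.
sum_reclaimed() {
    # Extract "FLAGS N" pairs, then accumulate N per FLAGS combination.
    sed -n 's/.*gfp_flags=\([A-Z_|]*\).*nr_reclaimed=\([0-9]*\).*/\1 \2/p' |
        awk '{ total[$1] += $2 } END { for (f in total) print f, total[f] }'
}

# Hypothetical sample input; a large GFP_TRANSHUGE total would indicate
# THP-triggered reclaim, the pattern discussed in the thread.
printf '%s\n' \
    'kswapd0-123 gfp_flags=GFP_TRANSHUGE nr_reclaimed=32' \
    'cc1-456     gfp_flags=GFP_KERNEL nr_reclaimed=5' \
    'cc1-789     gfp_flags=GFP_TRANSHUGE nr_reclaimed=10' | sum_reclaimed
```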


Thread overview: 61+ messages
2019-09-05 11:27 Stefan Priebe - Profihost AG
2019-09-05 11:40 ` Michal Hocko
2019-09-05 11:56   ` Stefan Priebe - Profihost AG
2019-09-05 16:28     ` Yang Shi
2019-09-05 17:26       ` Stefan Priebe - Profihost AG
2019-09-05 18:46         ` Yang Shi
2019-09-05 19:31           ` Stefan Priebe - Profihost AG
2019-09-06 10:08     ` Stefan Priebe - Profihost AG
2019-09-06 10:25       ` Vlastimil Babka
2019-09-06 18:52       ` Yang Shi
2019-09-07  7:32         ` Stefan Priebe - Profihost AG
2019-09-09  8:27       ` Michal Hocko
2019-09-09  8:54         ` Stefan Priebe - Profihost AG
2019-09-09 11:01           ` Michal Hocko
2019-09-09 12:08             ` Michal Hocko
2019-09-09 12:10               ` Stefan Priebe - Profihost AG
2019-09-09 12:28                 ` Michal Hocko
2019-09-09 12:37                   ` Stefan Priebe - Profihost AG
2019-09-09 12:49                     ` Michal Hocko
2019-09-09 12:56                       ` Stefan Priebe - Profihost AG
     [not found]                         ` <52235eda-ffe2-721c-7ad7-575048e2d29d@profihost.ag>
2019-09-10  5:58                           ` Stefan Priebe - Profihost AG
2019-09-10  8:29                           ` Michal Hocko
2019-09-10  8:38                             ` Stefan Priebe - Profihost AG
2019-09-10  9:02                               ` Michal Hocko
2019-09-10  9:37                                 ` Stefan Priebe - Profihost AG
2019-09-10 11:07                                   ` Michal Hocko
2019-09-10 12:45                                     ` Stefan Priebe - Profihost AG
2019-09-10 12:57                                       ` Michal Hocko
2019-09-10 13:05                                         ` Stefan Priebe - Profihost AG
2019-09-10 13:14                                           ` Stefan Priebe - Profihost AG
2019-09-10 13:24                                             ` Michal Hocko
2019-09-11  6:12                                               ` Stefan Priebe - Profihost AG
2019-09-11  6:24                                                 ` Stefan Priebe - Profihost AG
2019-09-11 13:59                                                   ` Stefan Priebe - Profihost AG [this message]
2019-09-12 10:53                                                     ` Stefan Priebe - Profihost AG
2019-09-12 11:06                                                       ` Stefan Priebe - Profihost AG
2019-09-11  7:09                                                 ` 5.3-rc-8 hung task in IO (was: Re: lot of MemAvailable but falling cache and raising PSI) Michal Hocko
2019-09-11 14:09                                                   ` Stefan Priebe - Profihost AG
2019-09-11 14:56                                                   ` Filipe Manana
2019-09-11 15:39                                                     ` Stefan Priebe - Profihost AG
2019-09-11 15:56                                                       ` Filipe Manana
2019-09-11 16:15                                                         ` Stefan Priebe - Profihost AG
2019-09-11 16:19                                                           ` Filipe Manana
2019-09-19 10:21                                                 ` lot of MemAvailable but falling cache and raising PSI Stefan Priebe - Profihost AG
2019-09-23 12:08                                                   ` Michal Hocko
2019-09-27 12:45                                                   ` Vlastimil Babka
2019-09-30  6:56                                                     ` Stefan Priebe - Profihost AG
2019-09-30  7:21                                                       ` Vlastimil Babka
2019-10-22  7:41                                                     ` Stefan Priebe - Profihost AG
2019-10-22  7:48                                                       ` Vlastimil Babka
2019-10-22 10:02                                                         ` Stefan Priebe - Profihost AG
2019-10-22 10:20                                                           ` Oscar Salvador
2019-10-22 10:21                                                           ` Vlastimil Babka
2019-10-22 11:08                                                             ` Stefan Priebe - Profihost AG
2019-09-10  5:41                       ` Stefan Priebe - Profihost AG
2019-09-09 11:49           ` Vlastimil Babka
2019-09-09 12:09             ` Stefan Priebe - Profihost AG
2019-09-09 12:21               ` Vlastimil Babka
2019-09-09 12:31                 ` Stefan Priebe - Profihost AG
2019-09-05 12:15 ` Vlastimil Babka
2019-09-05 12:27   ` Stefan Priebe - Profihost AG

