stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>
To: Jan Kara <jack@suse.cz>
Cc: Mikulas Patocka <mpatocka@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	stable@vger.kernel.org, regressions@lists.linux.dev,
	Alasdair Kergon <agk@redhat.com>,
	Mike Snitzer <snitzer@kernel.org>,
	dm-devel@lists.linux.dev, linux-mm@kvack.org
Subject: Re: Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5
Date: Mon, 30 Oct 2023 17:51:07 +0100	[thread overview]
Message-ID: <ZT/e/EaBIkJEgevQ@mail-itl> (raw)
In-Reply-To: <20231030155603.k3kejytq2e4vnp7z@quack3>

[-- Attachment #1: Type: text/plain, Size: 20199 bytes --]

On Mon, Oct 30, 2023 at 04:56:03PM +0100, Jan Kara wrote:
> On Mon 30-10-23 15:08:56, Mikulas Patocka wrote:
> > On Mon, 30 Oct 2023, Marek Marczykowski-Górecki wrote:
> > 
> > > > Well, it would be possible that larger pages in a bio would trip e.g. bio
> > > > splitting due to maximum segment size the disk supports (which can be e.g.
> > > > 0xffff) and that upsets something somewhere. But this is pure
> > > > speculation. We definitely need more debug data to be able to tell more.
> > > 
> > > I can collect more info, but I need some guidance how :) Some patch
> > > adding extra debug messages?
> > > Note I collect those via serial console (writing to disk doesn't work
> > > when it freezes), and that has some limits in the amount of data I can
> > > extract especially when printed quickly. For example sysrq-t is too much.
> > > Or maybe there is some trick to it, like increasing log_bug_len?
> > 
> > If you can do more tests, I would suggest this:
> > 
> > We already know that it works with order 3 and doesn't work with order 4.
> > 
> > So, in the file include/linux/mmzone.h, change PAGE_ALLOC_COSTLY_ORDER 
> > from 3 to 4 and in the file drivers/md/dm-crypt.c leave "unsigned int 
> > order = PAGE_ALLOC_COSTLY_ORDER" there.
> > 
> > Does it deadlock or not?
> > 
> > So, that we can see whether the deadlock depends on 
> > PAGE_ALLOC_COSTLY_ORDER or whether it is just a coincidence.
> 
> Good idea. Also if the kernel hangs, please find kcryptd processes. In what
> state are they? If they are sleeping, please send what's in
> /proc/<kcryptd-pid>/stack. Thanks!

Will do.

In the meantime, while testing version with PAGE_ALLOC_COSTLY_ORDER=4,
and order=PAGE_ALLOC_COSTLY_ORDER, I'm getting crash like this (see
important note below both traces):

    [   92.668486] BUG: unable to handle page fault for address: ffff8880c7b64098
    [   92.668558] #PF: supervisor read access in kernel mode
    [   92.668574] #PF: error_code(0x0000) - not-present page
    [   92.668590] PGD 2a32067 P4D 2a32067 PUD 12868a067 PMD 0 
    [   92.668617] Oops: 0000 [#1] PREEMPT SMP NOPTI
    [   92.668637] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G        W          6.5.6-dirty #354
    [   92.668658] Hardware name: Star Labs StarBook/StarBook, BIOS 8.97 10/03/2023
    [   92.668675] RIP: e030:__free_one_page+0x301/0x3e0
    [   92.668704] Code: 02 0f 85 c1 fe ff ff 49 c1 e6 04 49 8d 4c 24 08 4a 8d 94 36 c0 00 00 00 48 8d 34 80 48 8d 04 70 4c 01 fa 48 c1 e0 03 49 01 c6 <4b> 8b b4 37 c0 00 00 00 48 89 4e 08 49 89 74 24 08 49 89 54 24 10
    [   92.668738] RSP: e02b:ffffc90040154c60 EFLAGS: 00010006
    [   92.668754] RAX: 0000000000000058 RBX: 0000000000000001 RCX: ffff888075f66bc0
    [   92.668773] RDX: ffff8880c7b64098 RSI: 0000000000000005 RDI: 0000000000000000
    [   92.668789] RBP: fffffe7a01d7d9ae R08: ffff888075f66b38 R09: fffffe7a01d7d9ac
    [   92.668805] R10: ffffea0004b35bc8 R11: 0000000000000001 R12: ffff888075f66bb8
    [   92.668821] R13: 0000000000000000 R14: 0000000051bfd4d8 R15: ffff888075f66b00
    [   92.668854] FS:  0000000000000000(0000) GS:ffff888189740000(0000) knlGS:0000000000000000
    [   92.668874] CS:  10000e030 DS: 002b ES: 002b CR0: 0000000080050033
    [   92.668889] CR2: ffff8880c7b64098 CR3: 0000000133cf8000 CR4: 0000000000050660
    [   92.668912] Call Trace:
    [   92.668924]  <IRQ>
    [   92.668934]  ? __die+0x1e/0x60
    [   92.668955]  ? page_fault_oops+0x178/0x4a0
    [   92.668975]  ? exc_page_fault+0x14e/0x160
    [   92.668994]  ? asm_exc_page_fault+0x26/0x30
    [   92.669014]  ? __free_one_page+0x301/0x3e0
    [   92.669027]  free_pcppages_bulk+0x11c/0x2b0
    [   92.669042]  free_unref_page+0x10d/0x170
    [   92.669058]  crypt_free_buffer_pages+0x1f4/0x250
    [   92.669079]  crypt_endio+0x48/0x70
    [   92.669094]  blk_mq_end_request_batch+0xd0/0x400
    [   92.669114]  nvme_irq+0x6d/0x80
    [   92.669132]  ? __pfx_nvme_pci_complete_batch+0x10/0x10
    [   92.669148]  __handle_irq_event_percpu+0x42/0x1a0
    [   92.669166]  handle_irq_event+0x33/0x70
    [   92.669179]  handle_edge_irq+0x9e/0x240
    [   92.669197]  handle_irq_desc+0x36/0x50
    [   92.669215]  __evtchn_fifo_handle_events+0x1af/0x1d0
    [   92.669236]  __xen_evtchn_do_upcall+0x70/0xd0
    [   92.669255]  __xen_pv_evtchn_do_upcall+0x3d/0x70
    [   92.669275]  xen_pv_evtchn_do_upcall+0xd9/0x110
    [   92.669294]  </IRQ>
    [   92.669302]  <TASK>
    [   92.669310]  exc_xen_hypervisor_callback+0x8/0x20
    [   92.669330] RIP: e030:xen_hypercall_xen_version+0xa/0x20
    [   92.669348] Code: 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
    [   92.669380] RSP: e02b:ffffc900400c7e08 EFLAGS: 00000246
    [   92.669394] RAX: 0000000000040011 RBX: 0000000000000000 RCX: ffffffff81fc222a
    [   92.669411] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000000
    [   92.669426] RBP: ffffc900400c7eb0 R08: 0000000000000000 R09: 00000000000001ba
    [   92.669442] R10: 0000000000007ff0 R11: 0000000000000246 R12: ffff88818976c540
    [   92.669458] R13: ffff888101e10000 R14: 0000000000000000 R15: 0000000000000402
    [   92.669475]  ? xen_hypercall_xen_version+0xa/0x20
    [   92.669493]  ? pmu_msr_read+0x3c/0xd0
    [   92.669510]  ? xen_force_evtchn_callback+0xd/0x20
    [   92.669526]  ? check_events+0x16/0x30
    [   92.669540]  ? xen_irq_enable_direct+0x1d/0x30
    [   92.669556]  ? finish_task_switch.isra.0+0x8e/0x270
    [   92.669576]  ? __switch_to+0x165/0x3b0
    [   92.669590]  ? __schedule+0x316/0x8b0
    [   92.669609]  ? schedule_idle+0x25/0x40
    [   92.669626]  ? cpu_startup_entry+0x25/0x30
    [   92.669641]  ? cpu_bringup_and_idle+0x89/0xa0
    [   92.669660]  ? asm_cpu_bringup_and_idle+0x9/0x10
    [   92.669680]  </TASK>
    [   92.669688] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_generic ledtrig_audio snd_sof_pci_intel_tgl snd_sof_intel_hda_common snd_soc_hdac_hda soundwire_intel soundwire_generic_allocation snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof snd_sof_utils snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core soundwire_bus snd_soc_core snd_compress snd_pcm_dmaengine ac97_bus snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hwdep hid_multitouch snd_hda_core snd_seq snd_seq_device idma64 snd_pcm i2c_designware_platform iwlwifi i2c_designware_core snd_timer snd soundcore i2c_i801 i2c_smbus efivarfs i2c_hid_acpi i2c_hid pinctrl_tigerlake pinctrl_intel xen_acpi_processor xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
    [   92.669901] CR2: ffff8880c7b64098
    [   92.669915] ---[ end trace 0000000000000000 ]---
    [   92.669930] RIP: e030:__free_one_page+0x301/0x3e0
    [   92.669946] Code: 02 0f 85 c1 fe ff ff 49 c1 e6 04 49 8d 4c 24 08 4a 8d 94 36 c0 00 00 00 48 8d 34 80 48 8d 04 70 4c 01 fa 48 c1 e0 03 49 01 c6 <4b> 8b b4 37 c0 00 00 00 48 89 4e 08 49 89 74 24 08 49 89 54 24 10
    [   92.669977] RSP: e02b:ffffc90040154c60 EFLAGS: 00010006
    [   92.669992] RAX: 0000000000000058 RBX: 0000000000000001 RCX: ffff888075f66bc0
    [   92.670008] RDX: ffff8880c7b64098 RSI: 0000000000000005 RDI: 0000000000000000
    [   92.670024] RBP: fffffe7a01d7d9ae R08: ffff888075f66b38 R09: fffffe7a01d7d9ac
    [   92.670040] R10: ffffea0004b35bc8 R11: 0000000000000001 R12: ffff888075f66bb8
    [   92.670055] R13: 0000000000000000 R14: 0000000051bfd4d8 R15: ffff888075f66b00
    [   92.670083] FS:  0000000000000000(0000) GS:ffff888189740000(0000) knlGS:0000000000000000
    [   92.670101] CS:  10000e030 DS: 002b ES: 002b CR0: 0000000080050033
    [   92.670116] CR2: ffff8880c7b64098 CR3: 0000000133cf8000 CR4: 0000000000050660
    [   92.670148] Kernel panic - not syncing: Fatal exception in interrupt
    [   92.670177] ------------[ cut here ]------------
    [   92.670188] WARNING: CPU: 5 PID: 0 at kernel/smp.c:766 smp_call_function_many_cond+0x4db/0x560
    [   92.670219] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_generic ledtrig_audio snd_sof_pci_intel_tgl snd_sof_intel_hda_common snd_soc_hdac_hda soundwire_intel soundwire_generic_allocation snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof snd_sof_utils snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core soundwire_bus snd_soc_core snd_compress snd_pcm_dmaengine ac97_bus snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hwdep hid_multitouch snd_hda_core snd_seq snd_seq_device idma64 snd_pcm i2c_designware_platform iwlwifi i2c_designware_core snd_timer snd soundcore i2c_i801 i2c_smbus efivarfs i2c_hid_acpi i2c_hid pinctrl_tigerlake pinctrl_intel xen_acpi_processor xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput
    [   92.670400] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G      D W          6.5.6-dirty #354
    [   92.670419] Hardware name: Star Labs StarBook/StarBook, BIOS 8.97 10/03/2023
    [   92.670434] RIP: e030:smp_call_function_many_cond+0x4db/0x560
    [   92.670455] Code: f0 52 1b 81 48 89 74 24 08 e8 d1 c8 f6 ff 48 8b 74 24 08 65 ff 0d 5d 70 e7 7e 0f 85 bb fd ff ff 0f 1f 44 00 00 e9 b1 fd ff ff <0f> 0b e9 69 fb ff ff 8b 7c 24 38 e8 d5 4f f7 ff 84 c0 0f 84 a8 fd
    [   92.670486] RSP: e02b:ffffc900401549d0 EFLAGS: 00010006
    [   92.670500] RAX: 0000000080010007 RBX: 0000000000000000 RCX: 0000000000000000
    [   92.670516] RDX: 0000000000000000 RSI: ffffffff826f3c61 RDI: ffffffff82cbc1d0
    [   92.670531] RBP: 0000000000000005 R08: 0000000000000000 R09: 0000000000000001
    [   92.670546] R10: 00000000ffffdfff R11: ffffffff82a5ddc0 R12: 0000000000000005
    [   92.670562] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    [   92.670588] FS:  0000000000000000(0000) GS:ffff888189740000(0000) knlGS:0000000000000000
    [   92.670606] CS:  10000e030 DS: 002b ES: 002b CR0: 0000000080050033
    [   92.670620] CR2: ffff8880c7b64098 CR3: 0000000133cf8000 CR4: 0000000000050660
    [   92.670642] Call Trace:
    [   92.670650]  <IRQ>
    [   92.670659]  ? smp_call_function_many_cond+0x4db/0x560
    [   92.670677]  ? __warn+0x7c/0x130
    [   92.670693]  ? smp_call_function_many_cond+0x4db/0x560
    [   92.670711]  ? report_bug+0x191/0x1c0
    [   92.670726]  ? handle_bug+0x3c/0x80
    [   92.670742]  ? exc_invalid_op+0x17/0x70
    [   92.670758]  ? asm_exc_invalid_op+0x1a/0x20
    [   92.670775]  ? smp_call_function_many_cond+0x4db/0x560
    [   92.670793]  ? __pfx_stop_self+0x10/0x10
    [   92.670811]  ? _printk+0x5f/0x80
    [   92.670823]  ? __pfx_stop_self+0x10/0x10
    [   92.670840]  smp_call_function+0x38/0x70
    [   92.670857]  panic+0x19c/0x320
    [   92.670873]  oops_end+0xd8/0xe0
    [   92.670887]  page_fault_oops+0x19c/0x4a0
    [   92.670905]  exc_page_fault+0x14e/0x160
    [   92.670921]  asm_exc_page_fault+0x26/0x30
    [   92.670937] RIP: e030:__free_one_page+0x301/0x3e0
    [   92.670952] Code: 02 0f 85 c1 fe ff ff 49 c1 e6 04 49 8d 4c 24 08 4a 8d 94 36 c0 00 00 00 48 8d 34 80 48 8d 04 70 4c 01 fa 48 c1 e0 03 49 01 c6 <4b> 8b b4 37 c0 00 00 00 48 89 4e 08 49 89 74 24 08 49 89 54 24 10
    [   92.670983] RSP: e02b:ffffc90040154c60 EFLAGS: 00010006
    [   92.670997] RAX: 0000000000000058 RBX: 0000000000000001 RCX: ffff888075f66bc0
    [   92.671013] RDX: ffff8880c7b64098 RSI: 0000000000000005 RDI: 0000000000000000
    [   92.671028] RBP: fffffe7a01d7d9ae R08: ffff888075f66b38 R09: fffffe7a01d7d9ac
    [   92.671044] R10: ffffea0004b35bc8 R11: 0000000000000001 R12: ffff888075f66bb8
    [   92.671059] R13: 0000000000000000 R14: 0000000051bfd4d8 R15: ffff888075f66b00
    [   92.671077]  free_pcppages_bulk+0x11c/0x2b0
    [   92.671091]  free_unref_page+0x10d/0x170
    [   92.671106]  crypt_free_buffer_pages+0x1f4/0x250
    [   92.671122]  crypt_endio+0x48/0x70
    [   92.671136]  blk_mq_end_request_batch+0xd0/0x400
    [   92.671152]  nvme_irq+0x6d/0x80
    [   92.671166]  ? __pfx_nvme_pci_complete_batch+0x10/0x10
    [   92.671181]  __handle_irq_event_percpu+0x42/0x1a0
    [   92.671196]  handle_irq_event+0x33/0x70
    [   92.671208]  handle_edge_irq+0x9e/0x240
    [   92.671224]  handle_irq_desc+0x36/0x50
    [   92.671241]  __evtchn_fifo_handle_events+0x1af/0x1d0
    [   92.671258]  __xen_evtchn_do_upcall+0x70/0xd0
    [   92.671275]  __xen_pv_evtchn_do_upcall+0x3d/0x70
    [   92.671292]  xen_pv_evtchn_do_upcall+0xd9/0x110
    [   92.671308]  </IRQ>
    [   92.671316]  <TASK>
    [   92.671324]  exc_xen_hypervisor_callback+0x8/0x20
    [   92.671342] RIP: e030:xen_hypercall_xen_version+0xa/0x20
    [   92.671360] Code: 51 41 53 b8 10 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 11 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
    [   92.671390] RSP: e02b:ffffc900400c7e08 EFLAGS: 00000246
    [   92.671403] RAX: 0000000000040011 RBX: 0000000000000000 RCX: ffffffff81fc222a
    [   92.671419] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000000
    [   92.671434] RBP: ffffc900400c7eb0 R08: 0000000000000000 R09: 00000000000001ba
    [   92.671449] R10: 0000000000007ff0 R11: 0000000000000246 R12: ffff88818976c540
    [   92.671465] R13: ffff888101e10000 R14: 0000000000000000 R15: 0000000000000402
    [   92.671481]  ? xen_hypercall_xen_version+0xa/0x20
    [   92.671499]  ? pmu_msr_read+0x3c/0xd0
    [   92.671514]  ? xen_force_evtchn_callback+0xd/0x20
    [   92.671529]  ? check_events+0x16/0x30
    [   92.671543]  ? xen_irq_enable_direct+0x1d/0x30
    [   92.671560]  ? finish_task_switch.isra.0+0x8e/0x270
    [   92.671579]  ? __switch_to+0x165/0x3b0
    [   92.671593]  ? __schedule+0x316/0x8b0
    [   92.671611]  ? schedule_idle+0x25/0x40
    [   92.671631]  ? cpu_startup_entry+0x25/0x30
    [   92.671644]  ? cpu_bringup_and_idle+0x89/0xa0
    [   92.671663]  ? asm_cpu_bringup_and_idle+0x9/0x10
    [   92.671683]  </TASK>
    [   92.671691] ---[ end trace 0000000000000000 ]---
    [   92.671782] Kernel Offset: disabled
    (XEN) Hardware Dom0 crashed: rebooting machine in 5 seconds.


If I change order=PAGE_ALLOC_COSTLY_ORDER+1, then this:


    [ 2205.112802] BUG: unable to handle page fault for address: ffffffff89630301
    [ 2205.112866] #PF: supervisor write access in kernel mode
    [ 2205.112882] #PF: error_code(0x0002) - not-present page
    [ 2205.112899] PGD 2a35067 P4D 2a35067 PUD 2a36067 PMD 0 
    [ 2205.112921] Oops: 0002 [#1] PREEMPT SMP NOPTI
    [ 2205.112946] CPU: 0 PID: 12609 Comm: kworker/u12:9 Tainted: G        W          6.5.6-dirty #355
    [ 2205.112979] Hardware name: Star Labs StarBook/StarBook, BIOS 8.97 10/03/2023
    [ 2205.112997] Workqueue: kcryptd/252:0 kcryptd_crypt
    [ 2205.113022] RIP: e030:get_page_from_freelist+0x281/0x10c0
    [ 2205.113044] Code: 6c 05 18 49 89 df 8b 5c 24 34 49 8b 47 18 48 39 c5 0f 84 85 02 00 00 49 8b 47 18 48 8b 48 08 48 8b 30 48 8d 50 f8 48 89 4e 08 <48> 89 31 48 b9 00 01 00 00 00 00 ad de 48 89 08 48 83 c1 22 48 89
    [ 2205.113079] RSP: e02b:ffffc90041ea7c48 EFLAGS: 00010283
    [ 2205.113096] RAX: ffffea00059b9248 RBX: 0000000000000001 RCX: ffffffff89630301
    [ 2205.113117] RDX: ffffea00059b9240 RSI: ffffea00059b9201 RDI: 0000000000000001
    [ 2205.113134] RBP: ffff888189631718 R08: 000000000000c000 R09: 000000000003cb07
    [ 2205.113153] R10: 0000000000000001 R11: fefefefefefefeff R12: ffff888075f66b00
    [ 2205.113171] R13: ffff888189631700 R14: 0000000000000000 R15: ffff888189631700
    [ 2205.113208] FS:  0000000000000000(0000) GS:ffff888189600000(0000) knlGS:0000000000000000
    [ 2205.113229] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 2205.113245] CR2: ffffffff89630301 CR3: 00000001049d0000 CR4: 0000000000050660
    [ 2205.113272] Call Trace:
    [ 2205.113285]  <TASK>
    [ 2205.113295]  ? __die+0x1e/0x60
    [ 2205.113312]  ? page_fault_oops+0x178/0x4a0
    [ 2205.113330]  ? exc_page_fault+0x14e/0x160
    [ 2205.113346]  ? asm_exc_page_fault+0x26/0x30
    [ 2205.113363]  ? get_page_from_freelist+0x281/0x10c0
    [ 2205.113379]  ? get_page_from_freelist+0x227/0x10c0
    [ 2205.113396]  __alloc_pages+0x1dd/0x300
    [ 2205.113412]  crypt_page_alloc+0x29/0x60
    [ 2205.113425]  mempool_alloc+0x81/0x1b0
    [ 2205.113443]  kcryptd_crypt+0x293/0x4b0
    [ 2205.113458]  process_one_work+0x1e0/0x3e0
    [ 2205.113476]  worker_thread+0x49/0x3b0
    [ 2205.113490]  ? _raw_spin_lock_irqsave+0x22/0x50
    [ 2205.113510]  ? __pfx_worker_thread+0x10/0x10
    [ 2205.113527]  kthread+0xef/0x120
    [ 2205.113541]  ? __pfx_kthread+0x10/0x10
    [ 2205.113554]  ret_from_fork+0x2c/0x50
    [ 2205.113568]  ? __pfx_kthread+0x10/0x10
    [ 2205.113581]  ret_from_fork_asm+0x1b/0x30
    [ 2205.113598]  </TASK>
    [ 2205.113606] Modules linked in: snd_hda_codec_hdmi snd_sof_pci_intel_tgl snd_sof_intel_hda_common snd_soc_hdac_hda soundwire_intel soundwire_generic_allocation snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_hda_codec_generic snd_sof ledtrig_audio snd_sof_utils snd_sof_xtensa_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_hda_ext_core soundwire_bus snd_soc_core snd_compress snd_pcm_dmaengine ac97_bus snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device snd_pcm hid_multitouch snd_timer snd i2c_designware_platform i2c_designware_core i2c_i801 soundcore iwlwifi idma64 i2c_smbus i2c_hid_acpi i2c_hid pinctrl_tigerlake pinctrl_intel xen_acpi_processor xen_pciback xen_blkback xen_gntalloc xen_gntdev xen_evtchn uinput efivarfs
    [ 2205.113795] CR2: ffffffff89630301
    [ 2205.113808] ---[ end trace 0000000000000000 ]---
    [ 2205.113822] RIP: e030:get_page_from_freelist+0x281/0x10c0
    [ 2205.113839] Code: 6c 05 18 49 89 df 8b 5c 24 34 49 8b 47 18 48 39 c5 0f 84 85 02 00 00 49 8b 47 18 48 8b 48 08 48 8b 30 48 8d 50 f8 48 89 4e 08 <48> 89 31 48 b9 00 01 00 00 00 00 ad de 48 89 08 48 83 c1 22 48 89
    [ 2205.113873] RSP: e02b:ffffc90041ea7c48 EFLAGS: 00010283
    [ 2205.113887] RAX: ffffea00059b9248 RBX: 0000000000000001 RCX: ffffffff89630301
    [ 2205.113903] RDX: ffffea00059b9240 RSI: ffffea00059b9201 RDI: 0000000000000001
    [ 2205.113919] RBP: ffff888189631718 R08: 000000000000c000 R09: 000000000003cb07
    [ 2205.113935] R10: 0000000000000001 R11: fefefefefefefeff R12: ffff888075f66b00
    [ 2205.113950] R13: ffff888189631700 R14: 0000000000000000 R15: ffff888189631700
    [ 2205.113980] FS:  0000000000000000(0000) GS:ffff888189600000(0000) knlGS:0000000000000000
    [ 2205.113998] CS:  e030 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 2205.114012] CR2: ffffffff89630301 CR3: 00000001049d0000 CR4: 0000000000050660
    [ 2205.114038] note: kworker/u12:9[12609] exited with irqs disabled
    [ 2205.114100] note: kworker/u12:9[12609] exited with preempt_count 2

Then retried with order=PAGE_ALLOC_COSTLY_ORDER and
PAGE_ALLOC_COSTLY_ORDER back at 3, and also got similar crash.

Both happened only after logging into X session (lightdm -> Xfce) and
starting xfce4-terminal. If I use text console/ssh/serial and leave
lightdm at login screen, then it does not crash (this is in fact how I
did most previous tests). I have no idea if it's related to some other
bug somewhere else (graphics driver?), or simply higher memory usage due
to the whole Xfce session running.
It could be also a coincidence, the sample size is rather small...
But could be also some memory corruption that depending on memory layout
sometimes results in a crash and sometimes in "just" storage freeze.

Note this all is still on top of 6.5.6 with changes we discuss here. If
you believe it's another issue that got fixed in the meantime, I can
switch to another version, but otherwise I'd like to limit changes.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  reply	other threads:[~2023-10-30 16:51 UTC|newest]

Thread overview: 62+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-21  3:38 Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5 Marek Marczykowski-Górecki
2023-10-21  7:48 ` Bagas Sanjaya
2023-10-29  6:23   ` Linux regression tracking #update (Thorsten Leemhuis)
2023-10-23 20:59 ` Mikulas Patocka
2023-10-25  3:10   ` Marek Marczykowski-Górecki
2023-10-25  3:22     ` Marek Marczykowski-Górecki
2023-10-25 10:13       ` Mikulas Patocka
2023-10-27 17:32         ` Mikulas Patocka
2023-10-28  9:23           ` Matthew Wilcox
2023-10-28 15:14             ` Mike Snitzer
2023-10-29 11:15           ` Marek Marczykowski-Górecki
2023-10-29 20:02         ` Vlastimil Babka
2023-10-30  7:37           ` Mikulas Patocka
2023-10-30  8:37             ` Vlastimil Babka
2023-10-30 11:22               ` Mikulas Patocka
2023-10-30 11:30                 ` Vlastimil Babka
2023-10-30 11:37                   ` Mikulas Patocka
2023-10-30 12:25                   ` Jan Kara
2023-10-30 13:30                     ` Marek Marczykowski-Górecki
2023-10-30 14:08                       ` Mikulas Patocka
2023-10-30 15:56                         ` Jan Kara
2023-10-30 16:51                           ` Marek Marczykowski-Górecki [this message]
2023-10-30 17:50                             ` Mikulas Patocka
2023-10-31  3:48                               ` Marek Marczykowski-Górecki
2023-10-31 14:01                                 ` Jan Kara
2023-10-31 15:42                                   ` Marek Marczykowski-Górecki
2023-10-31 17:17                                     ` Mikulas Patocka
2023-10-31 17:24                                       ` Mikulas Patocka
2023-11-02  0:38                                         ` Marek Marczykowski-Górecki
2023-11-02  9:28                                           ` Mikulas Patocka
2023-11-02 11:45                                             ` Marek Marczykowski-Górecki
2023-11-02 17:06                                               ` Mikulas Patocka
2023-11-03 15:01                                                 ` Marek Marczykowski-Górecki
2023-11-03 15:10                                                   ` Keith Busch
2023-11-03 16:15                                                 ` Marek Marczykowski-Górecki
2023-11-03 16:54                                                   ` Keith Busch
2023-11-03 20:30                                                     ` Marek Marczykowski-G'orecki
2023-11-03 22:42                                                       ` Keith Busch
2023-11-04  9:27                                                         ` Mikulas Patocka
2023-11-04 13:59                                                           ` Keith Busch
2023-11-06  7:10                                                             ` Christoph Hellwig
2023-11-06 14:59                                                               ` [PATCH] swiotlb-xen: provide the "max_mapping_size" method Mikulas Patocka
2023-11-06 15:16                                                                 ` Keith Busch
2023-11-06 15:30                                                                   ` Mike Snitzer
2023-11-06 17:12                                                                     ` [PATCH v2] " Mikulas Patocka
2023-11-07  4:18                                                                       ` Stefano Stabellini
2023-11-08  7:31                                                                       ` Christoph Hellwig
2023-11-06  7:08                                                     ` Intermittent storage (dm-crypt?) freeze - regression 6.4->6.5 Christoph Hellwig
2023-11-02 12:21                                             ` Jan Kara
2023-11-01  1:27                                     ` Ming Lei
2023-11-01  2:14                                       ` Marek Marczykowski-Górecki
2023-11-01  2:15                                         ` Marek Marczykowski-Górecki
2023-11-01  2:35                                           ` Marek Marczykowski-Górecki
2023-11-01  3:24                                         ` Ming Lei
2023-11-01 10:15                                           ` Hannes Reinecke
2023-11-01 10:26                                             ` Jan Kara
2023-11-01 11:23                                             ` Ming Lei
2023-11-02 14:02                                               ` Keith Busch
2023-11-01 12:16                                             ` Mikulas Patocka
2023-10-30 11:28               ` Jan Kara
2023-10-30 11:49                 ` Mikulas Patocka
2023-10-30 12:11                   ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZT/e/EaBIkJEgevQ@mail-itl \
    --to=marmarek@invisiblethingslab.com \
    --cc=agk@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dm-devel@lists.linux.dev \
    --cc=jack@suse.cz \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mpatocka@redhat.com \
    --cc=regressions@lists.linux.dev \
    --cc=snitzer@kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).