* BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct
@ 2019-11-17 17:34 ` Tomas Hlavaty
  0 siblings, 0 replies; 32+ messages in thread
From: Tomas Hlavaty @ 2019-11-17 17:34 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs, linux-kernel

Hi Ryusuke,

today I got this bug in kernel, which seems to be related to nilfs2.

It was likely caused by improper shutdown and following nilfs2 partition
corruption.  Now I can still read the data, but on the whole the
computer is not useable, because starting a process which uses the
corrupted file system simply crashes in kernel.  I am actually not sure
if the filesystem is corrupted, as I don't know about any tool to check
that.  The relevant parts of dmesg log are below.

Please let me know if you are the right contact or if you need more info
about the problem.

Thank you,

Tomas

[ 0.000000] Linux version 4.19.84 (nixbld@localhost) (gcc version 8.3.0 (GCC)) #1-NixOS SMP Tue Nov 12 18:21:46 UTC 2019
[ 0.000000] Command line: initrd=\efi\nixos\4s51zw36kd1qb0ymk0charxjg8x6k5k3-initrd-linux-4.19.84-initrd.efi systemConfig=/nix/store/gdbxhzysr929abrymjqala0b5bh2fqmv-nixos-system-ushi-19.09.1258.07e66484e67 init=/nix/store/gdbxhzysr929abrymjqala0b5bh2fqmv-nixos-system-ushi-19.09.1258.07e66484e67/init loglevel=4
[ 37.741106] systemd-journald[470]: Received client request to flush runtime journal.
[ 37.749084] systemd-journald[470]: File /var/log/journal/55a4ea9159c14c0bb8767a43819c6927/system.journal corrupted or uncleanly shut down, renaming and replacing.
[ 37.810819] audit: type=1130 audit(1573985039.617:3): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-udevd comm="systemd" exe="/nix/store/v8flm2h07zcfg5k5npz56m0ayj0qm1q8-systemd-243/lib/systemd/systemd" hostname=? addr=? terminal=?
res=success' [ 38.321561] NILFS version 2 loaded [ 38.323236] NILFS (dm-1): mounting unchecked fs [ 38.349185] NILFS (dm-1): recovery complete [ 38.353228] NILFS (dm-1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds [ 63.543941] systemd-journald[470]: File /var/log/journal/55a4ea9159c14c0bb8767a43819c6927/user-1000.journal corrupted or uncleanly shut down, renaming and replacing. [12637.085548] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 [12637.085558] PGD 0 P4D 0 [12637.085567] Oops: 0000 [#1] SMP PTI [12637.085574] CPU: 0 PID: 657 Comm: segctord Not tainted 4.19.84 #1-NixOS [12637.085577] Hardware name: ASUSTeK COMPUTER INC. VivoBook 15_ASUS Laptop X507MA_R507MA/X507MA, BIOS X507MA.301 09/14/2018 [12637.085589] RIP: 0010:percpu_counter_add_batch+0x4/0x60 [12637.085593] Code: 89 e6 89 c7 e8 dd 3b 28 00 3b 05 fb e0 b6 00 72 d8 4c 89 ee 48 89 ef e8 7a 63 2a 00 48 89 d8 5b 5d 41 5c 41 5d c3 41 54 55 53 <48> 8b 47 20 65 44 8b 20 49 63 ec 48 63 ca 48 01 f5 48 39 e9 7e 0a [12637.085597] RSP: 0018:ffff9d1b00a0bd20 EFLAGS: 00010006 [12637.085601] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 [12637.085604] RDX: 0000000000000018 RSI: 0000000000000001 RDI: 0000000000000088 [12637.085608] RBP: ffff8df67a2988d0 R08: 0000000000000000 R09: ffff8df66fe0cfe0 [12637.085611] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000 [12637.085614] R13: ffff8df67a298758 R14: ffff8df67a2988c8 R15: ffffccd684229a80 [12637.085618] FS: 0000000000000000(0000) GS:ffff8df67ba00000(0000) knlGS:0000000000000000 [12637.085621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [12637.085624] CR2: 00000000000000a8 CR3: 000000011ac0a000 CR4: 0000000000340ef0 [12637.085628] Call Trace: [12637.085640] __test_set_page_writeback+0x37c/0x3f0 [12637.085663] nilfs_segctor_do_construct+0x184e/0x2040 [nilfs2] [12637.085680] nilfs_segctor_construct+0x1f5/0x2e0 [nilfs2] [12637.085693] nilfs_segctor_thread+0x129/0x370 
[nilfs2] [12637.085706] ? nilfs_segctor_construct+0x2e0/0x2e0 [nilfs2] [12637.085713] kthread+0x112/0x130 [12637.085719] ? kthread_bind+0x30/0x30 [12637.085728] ret_from_fork+0x1f/0x40 [12637.085734] Modules linked in: ctr ccm af_packet msr 8021q snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic hid_multitouch arc4 ath9k ath9k_common ath9k_hw ath mac80211 snd_soc_skl snd_soc_skl_ipc spi_pxa2xx_platform asus_nb_wmi snd_soc_sst_ipc snd_soc_sst_dsp asus_wmi 8250_dw i2c_designware_platform sparse_keymap i2c_designware_core wmi_bmof i915 snd_hda_ext_core nilfs2 snd_soc_acpi_intel_match snd_soc_acpi uvcvideo videobuf2_vmalloc nls_iso8859_1 videobuf2_memops videobuf2_v4l2 snd_soc_core nls_cp437 rtsx_usb_ms intel_telemetry_pltdrv vfat intel_punit_ipc intel_telemetry_core fat intel_pmc_ipc memstick videobuf2_common snd_compress kvmgt vfio_mdev mdev ath3k vfio_iommu_type1 vfio btusb ac97_bus snd_pcm_dmaengine btrtl x86_pkg_temp_thermal intel_powerclamp btbcm cec coretemp btintel [12637.085819] crct10dif_pclmul crc32_pclmul videodev snd_hda_intel bluetooth drm_kms_helper ghash_clmulni_intel deflate media efi_pstore intel_cstate pstore intel_rapl_perf cfg80211 snd_hda_codec joydev mousedev evdev wdat_wdt serio_raw mac_hid efivars drm snd_hda_core snd_hwdep ecdh_generic snd_pcm snd_timer mei_me idma64 virt_dma snd intel_gtt agpgart i2c_i801 i2c_algo_bit mei fb_sys_fops syscopyarea soundcore rfkill processor_thermal_device sysfillrect sysimgblt intel_lpss_pci intel_soc_dts_iosf thermal wmi intel_lpss i2c_hid i2c_core battery tpm_crb button ac tpm_tis tpm_tis_core asus_wireless video pcc_cpufreq tpm rng_core pinctrl_geminilake int3400_thermal int3403_thermal pinctrl_intel int340x_thermal_zone acpi_thermal_rel iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 [12637.085912] nf_defrag_ipv4 libcrc32c ip6t_rpfilter ipt_rpfilter ip6table_raw iptable_raw xt_pkttype nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp ip6table_filter ip6_tables 
iptable_filter sch_fq_codel loop cpufreq_powersave tun tap macvlan bridge stp llc kvm_intel kvm irqbypass efivarfs ip_tables x_tables ipv6 crc_ccitt autofs4 ext4 crc32c_generic crc16 mbcache jbd2 fscrypto dm_crypt algif_skcipher af_alg rtsx_usb_sdmmc mmc_core rtsx_usb hid_generic usbhid hid sd_mod input_leds led_class atkbd libps2 ahci libahci xhci_pci libata xhci_hcd aesni_intel usbcore aes_x86_64 crypto_simd scsi_mod cryptd glue_helper crc32c_intel usb_common rtc_cmos i8042 serio dm_mod [12637.086000] CR2: 00000000000000a8 [12637.086005] ---[ end trace ee0079180c990cd2 ]--- [12637.120805] RIP: 0010:percpu_counter_add_batch+0x4/0x60 [12637.120807] Code: 89 e6 89 c7 e8 dd 3b 28 00 3b 05 fb e0 b6 00 72 d8 4c 89 ee 48 89 ef e8 7a 63 2a 00 48 89 d8 5b 5d 41 5c 41 5d c3 41 54 55 53 <48> 8b 47 20 65 44 8b 20 49 63 ec 48 63 ca 48 01 f5 48 39 e9 7e 0a [12637.120809] RSP: 0018:ffff9d1b00a0bd20 EFLAGS: 00010006 [12637.120811] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 [12637.120812] RDX: 0000000000000018 RSI: 0000000000000001 RDI: 0000000000000088 [12637.120814] RBP: ffff8df67a2988d0 R08: 0000000000000000 R09: ffff8df66fe0cfe0 [12637.120815] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000 [12637.120816] R13: ffff8df67a298758 R14: ffff8df67a2988c8 R15: ffffccd684229a80 [12637.120818] FS: 0000000000000000(0000) GS:ffff8df67ba00000(0000) knlGS:0000000000000000 [12637.120820] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [12637.120821] CR2: 00000000000000a8 CR3: 0000000138e0a000 CR4: 0000000000340ef0 ^ permalink raw reply [flat|nested] 32+ messages in thread
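Editorial aside on reading the register dump above. The bytes marked `<48> 8b 47 20` in the Code: line decode to `mov 0x20(%rdi),%rax`, so the faulting load reads from RDI + 0x20. With RDI = 0x88 that is 0xa8, which matches CR2. In other words, percpu_counter_add_batch() appears to have been handed a pointer that was itself computed as a small offset from NULL. The log alone does not establish which object that pointer was derived from. The Python sketch below only redoes this arithmetic on an abridged excerpt of the oops; the helper names are hypothetical and not part of any kernel tool.

```python
import re

# Abridged excerpt of the oops above (timestamps and leading Code: bytes dropped).
OOPS = """\
RIP: 0010:percpu_counter_add_batch+0x4/0x60
Code: 41 5d c3 41 54 55 53 <48> 8b 47 20 65 44 8b 20
RDX: 0000000000000018 RSI: 0000000000000001 RDI: 0000000000000088
CR2: 00000000000000a8 CR3: 000000011ac0a000 CR4: 0000000000340ef0
"""


def read_register(dump: str, name: str) -> int:
    """Return the 64-bit hex value printed after 'NAME: ' in the dump."""
    match = re.search(rf"\b{name}: ([0-9a-f]{{16}})", dump)
    if match is None:
        raise ValueError(f"register {name} not found")
    return int(match.group(1), 16)


def faulting_bytes(dump: str, count: int) -> list[int]:
    """Return `count` opcode bytes starting at the <xx> marker in the Code: line."""
    code_line = next(l for l in dump.splitlines() if l.startswith("Code:"))
    tokens = code_line.split()[1:]
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:start + count]]


def disp8_of_rdi_load(insn: list[int]) -> int:
    """Decode only 'REX.W 8B /r' with mod=01, rm=rdi and return its signed disp8."""
    rex, opcode, modrm, disp = insn
    if rex != 0x48 or opcode != 0x8B:
        raise ValueError("not a 64-bit MOV r64, r/m64")
    if modrm >> 6 != 0b01 or modrm & 0b111 != 0b111:
        raise ValueError("not a [rdi + disp8] operand")
    return disp - 0x100 if disp >= 0x80 else disp


if __name__ == "__main__":
    rdi = read_register(OOPS, "RDI")
    disp = disp8_of_rdi_load(faulting_bytes(OOPS, 4))
    print(hex(rdi + disp), hex(read_register(OOPS, "CR2")))
```

The decoder is deliberately narrow: it rejects anything other than this one instruction form rather than guessing, so it is a check of this particular oops and not a general disassembler.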
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct
  2019-11-17 17:34 ` Tomas Hlavaty
  (?)
@ 2019-11-18 16:51 ` Ryusuke Konishi
  2019-11-19 6:04 ` Viacheslav Dubeyko
  ` (2 more replies)
  -1 siblings, 3 replies; 32+ messages in thread
From: Ryusuke Konishi @ 2019-11-18 16:51 UTC (permalink / raw)
  To: Tomas Hlavaty; +Cc: linux-nilfs, LKML

Hi,

> It was likely caused by improper shutdown and following nilfs2 partition
> corruption.  Now I can still read the data, but on the whole the
> computer is not useable, because starting a process which uses the
> corrupted file system simply crashes in kernel.

Thank you for reporting the issue.
Let me ask you a few questions:

1) Is the crash reproducible in the environment?
2) Can you mount the corrupted(?) partition from a recent version of kernel?
3) Does read-only mount option (-r) work to avoid the crash?

Thanks,
Ryusuke Konishi

On Mon, Nov 18, 2019 at 2:34, Tomas Hlavaty <tom@logand.com> wrote:
>
> Hi Ryusuke,
>
> today I got this bug in kernel, which seems to be related to nilfs2.
>
> It was likely caused by improper shutdown and following nilfs2 partition
> corruption.  Now I can still read the data, but on the whole the
> computer is not useable, because starting a process which uses the
> corrupted file system simply crashes in kernel.  I am actually not sure
> if the filesystem is corrupted, as I don't know about any tool to check
> that.  The relevant parts of dmesg log are below.
>
> Please let me know if you are the right contact or if you need more info
> about the problem.
> > Thank you, > > Tomas > > [ 0.000000] Linux version 4.19.84 (nixbld@localhost) (gcc version 8.3.0 (GCC)) #1-NixOS SMP Tue Nov 12 18:21:46 UTC 2019 > [ 0.000000] Command line: initrd=\efi\nixos\4s51zw36kd1qb0ymk0charxjg8x6k5k3-initrd-linux-4.19.84-initrd.efi systemConfig=/nix/store/gdbxhzysr929abrymjqala0b5bh2fqmv-nixos-system-ushi-19.09.1258.07e66484e67 init=/nix/store/gdbxhzysr929abrymjqala0b5bh2fqmv-nixos-system-ushi-19.09.1258.07e66484e67/init loglevel=4 > > > > [ 37.741106] systemd-journald[470]: Received client request to flush runtime journal. > [ 37.749084] systemd-journald[470]: File /var/log/journal/55a4ea9159c14c0bb8767a43819c6927/system.journal corrupted or uncleanly shut down, renaming and replacing. > [ 37.810819] audit: type=1130 audit(1573985039.617:3): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-udevd comm="systemd" exe="/nix/store/v8flm2h07zcfg5k5npz56m0ayj0qm1q8-systemd-243/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' > > [ 38.321561] NILFS version 2 loaded > [ 38.323236] NILFS (dm-1): mounting unchecked fs > > > [ 38.349185] NILFS (dm-1): recovery complete > [ 38.353228] NILFS (dm-1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds > > [ 63.543941] systemd-journald[470]: File > /var/log/journal/55a4ea9159c14c0bb8767a43819c6927/user-1000.journal > corrupted or uncleanly shut down, renaming and replacing. > > [12637.085548] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 > [12637.085558] PGD 0 P4D 0 > [12637.085567] Oops: 0000 [#1] SMP PTI > [12637.085574] CPU: 0 PID: 657 Comm: segctord Not tainted 4.19.84 #1-NixOS > [12637.085577] Hardware name: ASUSTeK COMPUTER INC. 
VivoBook 15_ASUS Laptop X507MA_R507MA/X507MA, BIOS X507MA.301 09/14/2018 > [12637.085589] RIP: 0010:percpu_counter_add_batch+0x4/0x60 > [12637.085593] Code: 89 e6 89 c7 e8 dd 3b 28 00 3b 05 fb e0 b6 00 72 d8 4c 89 ee 48 89 ef e8 7a 63 2a 00 48 89 d8 5b 5d 41 5c 41 5d c3 41 54 55 53 <48> 8b 47 20 65 44 8b 20 49 63 ec 48 63 ca 48 01 f5 48 39 e9 7e 0a > [12637.085597] RSP: 0018:ffff9d1b00a0bd20 EFLAGS: 00010006 > [12637.085601] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 > [12637.085604] RDX: 0000000000000018 RSI: 0000000000000001 RDI: 0000000000000088 > [12637.085608] RBP: ffff8df67a2988d0 R08: 0000000000000000 R09: ffff8df66fe0cfe0 > [12637.085611] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000 > [12637.085614] R13: ffff8df67a298758 R14: ffff8df67a2988c8 R15: ffffccd684229a80 > [12637.085618] FS: 0000000000000000(0000) GS:ffff8df67ba00000(0000) knlGS:0000000000000000 > [12637.085621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [12637.085624] CR2: 00000000000000a8 CR3: 000000011ac0a000 CR4: 0000000000340ef0 > [12637.085628] Call Trace: > [12637.085640] __test_set_page_writeback+0x37c/0x3f0 > [12637.085663] nilfs_segctor_do_construct+0x184e/0x2040 [nilfs2] > [12637.085680] nilfs_segctor_construct+0x1f5/0x2e0 [nilfs2] > [12637.085693] nilfs_segctor_thread+0x129/0x370 [nilfs2] > [12637.085706] ? nilfs_segctor_construct+0x2e0/0x2e0 [nilfs2] > [12637.085713] kthread+0x112/0x130 > [12637.085719] ? 
kthread_bind+0x30/0x30 > [12637.085728] ret_from_fork+0x1f/0x40 > [12637.085734] Modules linked in: ctr ccm af_packet msr 8021q snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic hid_multitouch arc4 ath9k ath9k_common ath9k_hw ath mac80211 snd_soc_skl snd_soc_skl_ipc spi_pxa2xx_platform asus_nb_wmi snd_soc_sst_ipc snd_soc_sst_dsp asus_wmi 8250_dw i2c_designware_platform sparse_keymap i2c_designware_core wmi_bmof i915 snd_hda_ext_core nilfs2 snd_soc_acpi_intel_match snd_soc_acpi uvcvideo videobuf2_vmalloc nls_iso8859_1 videobuf2_memops videobuf2_v4l2 snd_soc_core nls_cp437 rtsx_usb_ms intel_telemetry_pltdrv vfat intel_punit_ipc intel_telemetry_core fat intel_pmc_ipc memstick videobuf2_common snd_compress kvmgt vfio_mdev mdev ath3k vfio_iommu_type1 vfio btusb ac97_bus snd_pcm_dmaengine btrtl x86_pkg_temp_thermal intel_powerclamp btbcm cec coretemp btintel > [12637.085819] crct10dif_pclmul crc32_pclmul videodev snd_hda_intel bluetooth drm_kms_helper ghash_clmulni_intel deflate media efi_pstore intel_cstate pstore intel_rapl_perf cfg80211 snd_hda_codec joydev mousedev evdev wdat_wdt serio_raw mac_hid efivars drm snd_hda_core snd_hwdep ecdh_generic snd_pcm snd_timer mei_me idma64 virt_dma snd intel_gtt agpgart i2c_i801 i2c_algo_bit mei fb_sys_fops syscopyarea soundcore rfkill processor_thermal_device sysfillrect sysimgblt intel_lpss_pci intel_soc_dts_iosf thermal wmi intel_lpss i2c_hid i2c_core battery tpm_crb button ac tpm_tis tpm_tis_core asus_wireless video pcc_cpufreq tpm rng_core pinctrl_geminilake int3400_thermal int3403_thermal pinctrl_intel int340x_thermal_zone acpi_thermal_rel iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 > [12637.085912] nf_defrag_ipv4 libcrc32c ip6t_rpfilter ipt_rpfilter ip6table_raw iptable_raw xt_pkttype nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp ip6table_filter ip6_tables iptable_filter sch_fq_codel loop cpufreq_powersave tun tap macvlan bridge stp llc kvm_intel kvm irqbypass efivarfs 
ip_tables x_tables ipv6 crc_ccitt autofs4 ext4 crc32c_generic crc16 mbcache jbd2 fscrypto dm_crypt algif_skcipher af_alg rtsx_usb_sdmmc mmc_core rtsx_usb hid_generic usbhid hid sd_mod input_leds led_class atkbd libps2 ahci libahci xhci_pci libata xhci_hcd aesni_intel usbcore aes_x86_64 crypto_simd scsi_mod cryptd glue_helper crc32c_intel usb_common rtc_cmos i8042 serio dm_mod > [12637.086000] CR2: 00000000000000a8 > [12637.086005] ---[ end trace ee0079180c990cd2 ]--- > [12637.120805] RIP: 0010:percpu_counter_add_batch+0x4/0x60 > [12637.120807] Code: 89 e6 89 c7 e8 dd 3b 28 00 3b 05 fb e0 b6 00 72 d8 4c 89 ee 48 89 ef e8 7a 63 2a 00 48 89 d8 5b 5d 41 5c 41 5d c3 41 54 55 53 <48> 8b 47 20 65 44 8b 20 49 63 ec 48 63 ca 48 01 f5 48 39 e9 7e 0a > [12637.120809] RSP: 0018:ffff9d1b00a0bd20 EFLAGS: 00010006 > [12637.120811] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 > [12637.120812] RDX: 0000000000000018 RSI: 0000000000000001 RDI: 0000000000000088 > [12637.120814] RBP: ffff8df67a2988d0 R08: 0000000000000000 R09: ffff8df66fe0cfe0 > [12637.120815] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000 > [12637.120816] R13: ffff8df67a298758 R14: ffff8df67a2988c8 R15: ffffccd684229a80 > [12637.120818] FS: 0000000000000000(0000) GS:ffff8df67ba00000(0000) knlGS:0000000000000000 > [12637.120820] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [12637.120821] CR2: 00000000000000a8 CR3: 0000000138e0a000 CR4: 0000000000340ef0 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct
  2019-11-18 16:51 ` Ryusuke Konishi
@ 2019-11-19 6:04 ` Viacheslav Dubeyko
  2020-01-23 13:00 ` Tomas Hlavaty
  2019-12-19 21:02 ` Tomas Hlavaty
  2020-01-23 13:58 ` BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct ARAI Shun-ichi
  2 siblings, 1 reply; 32+ messages in thread
From: Viacheslav Dubeyko @ 2019-11-19 6:04 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: Tomas Hlavaty, linux-nilfs, LKML

> On Nov 18, 2019, at 7:51 PM, Ryusuke Konishi <konishi.ryusuke@gmail.com> wrote:
>
> Hi,
>
>> It was likely caused by improper shutdown and following nilfs2 partition
>> corruption.  Now I can still read the data, but on the whole the
>> computer is not useable, because starting a process which uses the
>> corrupted file system simply crashes in kernel.
>
> Thank you for reporting the issue.
> Let me ask you a few questions:
>
> 1) Is the crash reproducible in the environment?
> 2) Can you mount the corrupted(?) partition from a recent version of kernel?
> 3) Does read-only mount option (-r) work to avoid the crash?

I believe it could be important to know more details about the partition too:
(1) the partition size?
(2) the logical block size?
(3) the segment size?
(4) how the partition was created?
(5) the version of tools that created the partition?
(6) the amount of free space on the partition?

Thanks,
Viacheslav Dubeyko.

>
> Thanks,
> Ryusuke Konishi
>
> On Mon, Nov 18, 2019 at 2:34, Tomas Hlavaty <tom@logand.com> wrote:
>>
>> Hi Ryusuke,
>>
>> today I got this bug in kernel, which seems to be related to nilfs2.
>>
>> It was likely caused by improper shutdown and following nilfs2 partition
>> corruption.  Now I can still read the data, but on the whole the
>> computer is not useable, because starting a process which uses the
>> corrupted file system simply crashes in kernel.
I am actually not sure >> if the filesystem is corrupted, as I don't know about any tool to check >> that. The relevant parts of dmesg log are bellow. >> >> Please let me know if you are the right contact or if you need more info >> about the problem. >> >> Thank you, >> >> Tomas >> >> [ 0.000000] Linux version 4.19.84 (nixbld@localhost) (gcc version 8.3.0 (GCC)) #1-NixOS SMP Tue Nov 12 18:21:46 UTC 2019 >> [ 0.000000] Command line: initrd=\efi\nixos\4s51zw36kd1qb0ymk0charxjg8x6k5k3-initrd-linux-4.19.84-initrd.efi systemConfig=/nix/store/gdbxhzysr929abrymjqala0b5bh2fqmv-nixos-system-ushi-19.09.1258.07e66484e67 init=/nix/store/gdbxhzysr929abrymjqala0b5bh2fqmv-nixos-system-ushi-19.09.1258.07e66484e67/init loglevel=4 >> >> >> >> [ 37.741106] systemd-journald[470]: Received client request to flush runtime journal. >> [ 37.749084] systemd-journald[470]: File /var/log/journal/55a4ea9159c14c0bb8767a43819c6927/system.journal corrupted or uncleanly shut down, renaming and replacing. >> [ 37.810819] audit: type=1130 audit(1573985039.617:3): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-udevd comm="systemd" exe="/nix/store/v8flm2h07zcfg5k5npz56m0ayj0qm1q8-systemd-243/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' >> >> [ 38.321561] NILFS version 2 loaded >> [ 38.323236] NILFS (dm-1): mounting unchecked fs >> >> >> [ 38.349185] NILFS (dm-1): recovery complete >> [ 38.353228] NILFS (dm-1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds >> >> [ 63.543941] systemd-journald[470]: File >> /var/log/journal/55a4ea9159c14c0bb8767a43819c6927/user-1000.journal >> corrupted or uncleanly shut down, renaming and replacing. 
>> >> [12637.085548] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 >> [12637.085558] PGD 0 P4D 0 >> [12637.085567] Oops: 0000 [#1] SMP PTI >> [12637.085574] CPU: 0 PID: 657 Comm: segctord Not tainted 4.19.84 #1-NixOS >> [12637.085577] Hardware name: ASUSTeK COMPUTER INC. VivoBook 15_ASUS Laptop X507MA_R507MA/X507MA, BIOS X507MA.301 09/14/2018 >> [12637.085589] RIP: 0010:percpu_counter_add_batch+0x4/0x60 >> [12637.085593] Code: 89 e6 89 c7 e8 dd 3b 28 00 3b 05 fb e0 b6 00 72 d8 4c 89 ee 48 89 ef e8 7a 63 2a 00 48 89 d8 5b 5d 41 5c 41 5d c3 41 54 55 53 <48> 8b 47 20 65 44 8b 20 49 63 ec 48 63 ca 48 01 f5 48 39 e9 7e 0a >> [12637.085597] RSP: 0018:ffff9d1b00a0bd20 EFLAGS: 00010006 >> [12637.085601] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 >> [12637.085604] RDX: 0000000000000018 RSI: 0000000000000001 RDI: 0000000000000088 >> [12637.085608] RBP: ffff8df67a2988d0 R08: 0000000000000000 R09: ffff8df66fe0cfe0 >> [12637.085611] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000 >> [12637.085614] R13: ffff8df67a298758 R14: ffff8df67a2988c8 R15: ffffccd684229a80 >> [12637.085618] FS: 0000000000000000(0000) GS:ffff8df67ba00000(0000) knlGS:0000000000000000 >> [12637.085621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [12637.085624] CR2: 00000000000000a8 CR3: 000000011ac0a000 CR4: 0000000000340ef0 >> [12637.085628] Call Trace: >> [12637.085640] __test_set_page_writeback+0x37c/0x3f0 >> [12637.085663] nilfs_segctor_do_construct+0x184e/0x2040 [nilfs2] >> [12637.085680] nilfs_segctor_construct+0x1f5/0x2e0 [nilfs2] >> [12637.085693] nilfs_segctor_thread+0x129/0x370 [nilfs2] >> [12637.085706] ? nilfs_segctor_construct+0x2e0/0x2e0 [nilfs2] >> [12637.085713] kthread+0x112/0x130 >> [12637.085719] ? 
kthread_bind+0x30/0x30 >> [12637.085728] ret_from_fork+0x1f/0x40 >> [12637.085734] Modules linked in: ctr ccm af_packet msr 8021q snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic hid_multitouch arc4 ath9k ath9k_common ath9k_hw ath mac80211 snd_soc_skl snd_soc_skl_ipc spi_pxa2xx_platform asus_nb_wmi snd_soc_sst_ipc snd_soc_sst_dsp asus_wmi 8250_dw i2c_designware_platform sparse_keymap i2c_designware_core wmi_bmof i915 snd_hda_ext_core nilfs2 snd_soc_acpi_intel_match snd_soc_acpi uvcvideo videobuf2_vmalloc nls_iso8859_1 videobuf2_memops videobuf2_v4l2 snd_soc_core nls_cp437 rtsx_usb_ms intel_telemetry_pltdrv vfat intel_punit_ipc intel_telemetry_core fat intel_pmc_ipc memstick videobuf2_common snd_compress kvmgt vfio_mdev mdev ath3k vfio_iommu_type1 vfio btusb ac97_bus snd_pcm_dmaengine btrtl x86_pkg_temp_thermal intel_powerclamp btbcm cec coretemp btintel >> [12637.085819] crct10dif_pclmul crc32_pclmul videodev snd_hda_intel bluetooth drm_kms_helper ghash_clmulni_intel deflate media efi_pstore intel_cstate pstore intel_rapl_perf cfg80211 snd_hda_codec joydev mousedev evdev wdat_wdt serio_raw mac_hid efivars drm snd_hda_core snd_hwdep ecdh_generic snd_pcm snd_timer mei_me idma64 virt_dma snd intel_gtt agpgart i2c_i801 i2c_algo_bit mei fb_sys_fops syscopyarea soundcore rfkill processor_thermal_device sysfillrect sysimgblt intel_lpss_pci intel_soc_dts_iosf thermal wmi intel_lpss i2c_hid i2c_core battery tpm_crb button ac tpm_tis tpm_tis_core asus_wireless video pcc_cpufreq tpm rng_core pinctrl_geminilake int3400_thermal int3403_thermal pinctrl_intel int340x_thermal_zone acpi_thermal_rel iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 >> [12637.085912] nf_defrag_ipv4 libcrc32c ip6t_rpfilter ipt_rpfilter ip6table_raw iptable_raw xt_pkttype nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp ip6table_filter ip6_tables iptable_filter sch_fq_codel loop cpufreq_powersave tun tap macvlan bridge stp llc kvm_intel kvm irqbypass efivarfs 
ip_tables x_tables ipv6 crc_ccitt autofs4 ext4 crc32c_generic crc16 mbcache jbd2 fscrypto dm_crypt algif_skcipher af_alg rtsx_usb_sdmmc mmc_core rtsx_usb hid_generic usbhid hid sd_mod input_leds led_class atkbd libps2 ahci libahci xhci_pci libata xhci_hcd aesni_intel usbcore aes_x86_64 crypto_simd scsi_mod cryptd glue_helper crc32c_intel usb_common rtc_cmos i8042 serio dm_mod >> [12637.086000] CR2: 00000000000000a8 >> [12637.086005] ---[ end trace ee0079180c990cd2 ]--- >> [12637.120805] RIP: 0010:percpu_counter_add_batch+0x4/0x60 >> [12637.120807] Code: 89 e6 89 c7 e8 dd 3b 28 00 3b 05 fb e0 b6 00 72 d8 4c 89 ee 48 89 ef e8 7a 63 2a 00 48 89 d8 5b 5d 41 5c 41 5d c3 41 54 55 53 <48> 8b 47 20 65 44 8b 20 49 63 ec 48 63 ca 48 01 f5 48 39 e9 7e 0a >> [12637.120809] RSP: 0018:ffff9d1b00a0bd20 EFLAGS: 00010006 >> [12637.120811] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 >> [12637.120812] RDX: 0000000000000018 RSI: 0000000000000001 RDI: 0000000000000088 >> [12637.120814] RBP: ffff8df67a2988d0 R08: 0000000000000000 R09: ffff8df66fe0cfe0 >> [12637.120815] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000 >> [12637.120816] R13: ffff8df67a298758 R14: ffff8df67a2988c8 R15: ffffccd684229a80 >> [12637.120818] FS: 0000000000000000(0000) GS:ffff8df67ba00000(0000) knlGS:0000000000000000 >> [12637.120820] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [12637.120821] CR2: 00000000000000a8 CR3: 0000000138e0a000 CR4: 0000000000340ef0 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct
@ 2020-01-23 13:00 ` Tomas Hlavaty
  0 siblings, 0 replies; 32+ messages in thread
From: Tomas Hlavaty @ 2020-01-23 13:00 UTC (permalink / raw)
  To: Viacheslav Dubeyko, Ryusuke Konishi; +Cc: linux-nilfs, LKML

Hi Viacheslav,

Viacheslav Dubeyko <slava@dubeyko.com> writes:
> (1) the partition size?

the first disk with crash was 1TB

the second disk with crash, which i have by me, is 2TB:

$ lsblk /dev/sdb
NAME      MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINT
sdb         8:16   0  1.8T  0 disk
└─extdisk 254:2    0  1.8T  0 crypt /mnt/b

> (2) the logical block size?
> (3) the segment size?

how can i find (2) and (3) out?

here is the output of nilfs-tune:

$ sudo nilfs-tune -l /dev/mapper/extdisk
nilfs-tune 2.2.7
Filesystem volume name:   backup1_nilfs2
Filesystem UUID:          7d9708f9-464f-41b7-a0c6-eda18741012f
Filesystem magic number:  0x3434
Filesystem revision #:    2.0
Filesystem features:      (none)
Filesystem state:         valid
Filesystem OS type:       Linux
Block size:               4096
Filesystem created:       Thu Dec 27 14:14:14 2018
Last mount time:          Fri Dec 20 13:06:15 2019
Last write time:          Thu Jan 23 13:04:30 2020
Mount count:              15
Maximum mount count:      50
Reserve blocks uid:       0 (user root)
Reserve blocks gid:       0 (group root)
First inode:              11
Inode size:               128
DAT entry size:           32
Checkpoint size:          192
Segment usage size:       16
Number of segments:       238465
Device size:              2000396834816
First data block:         1
# of blocks per segment:  2048
Reserved segments %:      5
Last checkpoint #:        9884
Last block address:       280841435
Last sequence #:          137120
Free blocks count:        207591424
Commit interval:          0
# of blks to create seg:  0
CRC seed:                 0x5172270a
CRC check sum:            0x2ef767d2
CRC check data size:      0x00000118

it seems strange that the last write time is today, even though i
mounted the partition read-only

/dev/mapper/extdisk on /mnt/b type nilfs2 (ro,relatime)

> (4) how the partition was created?

using parted
then cryptsetup luksFormat
then cryptsetup luksOpen
then mkfs.nilfs2

> (5) the version of tools that created the partition?

how can i find this out?  is it saved somewhere?

> (6) the amount of free space on the partition?

/dev/mapper/extdisk  1.9T  1.1T  699G  61% /mnt/b

Regards,

Tomas

^ permalink raw reply [flat|nested] 32+ messages in thread
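
On questions (2) and (3): the nilfs-tune output above appears to already contain what is needed, assuming "Block size" is the logical block size being asked about and that the segment size is the block size multiplied by "# of blocks per segment". A minimal sketch follows; nilfs_segment_bytes is a hypothetical helper name, not part of nilfs-utils:

```shell
# Hypothetical helper (not part of nilfs-utils): reads "nilfs-tune -l"
# output on stdin and prints block size * blocks per segment in bytes.
nilfs_segment_bytes() {
    awk -F': *' '
        /^Block size:/              { bs = $2 }
        /^# of blocks per segment:/ { bps = $2 }
        END { if (bs && bps) print bs * bps; else exit 1 }
    '
}

# Example with the two relevant lines from the report:
printf 'Block size: 4096\n# of blocks per segment: 2048\n' | nilfs_segment_bytes
```

On the real device this would be used as "sudo nilfs-tune -l /dev/mapper/extdisk | nilfs_segment_bytes". With the values reported here (4096 and 2048) the product is 8388608 bytes, i.e. 8 MiB per segment.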
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct
  2019-11-18 16:51 ` Ryusuke Konishi
  2019-11-19  6:04   ` Viacheslav Dubeyko
@ 2019-12-19 21:02   ` Tomas Hlavaty
  2020-01-23 12:31     ` Tomas Hlavaty
  2020-01-23 13:58     ` BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct ARAI Shun-ichi
  2 siblings, 1 reply; 32+ messages in thread
From: Tomas Hlavaty @ 2019-12-19 21:02 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs, LKML

Hi Ryusuke,

thanks for your answer.

Ryusuke Konishi <konishi.ryusuke@gmail.com> writes:
> 1) Is the crash reproducible in the environment ?

yes

> 2) Can you mount the corrupted(?) partition from a recent version of
>    kernel ?
> 3) Does read-only mount option (-r) work to avoid the crash ?

I'll have access to the computer sometime next week so I'll try this out
and let you know.

Thank you,

Tomas

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct
@ 2020-01-23 12:31 ` Tomas Hlavaty
  0 siblings, 0 replies; 32+ messages in thread
From: Tomas Hlavaty @ 2020-01-23 12:31 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs, LKML

Hi Ryusuke,

>> 2) Can you mount the corrupted(?) partition from a recent version of
>>    kernel ?

this will take me some time to figure out

>> 3) Does read-only mount option (-r) work to avoid the crash ?

ro mount doesn't seem to crash

at least after mounting the partition read-only

- running lscp
- running sudo find . -type f inside the mounted partition
- cat <some random file on the nilfs partition>

does not crash

the crash i was seeing was during rsync (writing i guess)

Other info that might be relevant:

- the nilfs partition was on top of luks

- the corruption happened probably during shutdown

  the shutdown hung for a long time waiting for nilfs disk (iirc it
  waits for 1m30s) and even after that it did not finish so i turned the
  computer off without waiting further.  after new start, i got the
  crash

- i got the same problem on another disk recently

Regards,

Tomas

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct
@ 2020-03-27  6:26 ` Tomas Hlavaty
  0 siblings, 0 replies; 32+ messages in thread
From: Tomas Hlavaty @ 2020-03-27 6:26 UTC (permalink / raw)
  To: Ryusuke Konishi; +Cc: linux-nilfs, LKML

Tomas Hlavaty <tom@logand.com> writes:
>>> 2) Can you mount the corrupted(?) partition from a recent version of
>>>    kernel ?

I tried the following Linux kernel versions:

- v4.19
- v5.4
- v5.5.11

and still get the crash

^ permalink raw reply [flat|nested] 32+ messages in thread
[parent not found: <CAKFNMomjWkNvHvHkEp=Jv_BiGPNj=oLEChyoXX1yCj5xctAkMA@mail.gmail.com>]
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct
  [not found] ` <CAKFNMomjWkNvHvHkEp=Jv_BiGPNj=oLEChyoXX1yCj5xctAkMA@mail.gmail.com>
@ 2020-03-28  9:26   ` ARAI Shun-ichi
  2020-04-30 12:38     ` Hideki EIRAKU
  0 siblings, 1 reply; 32+ messages in thread
From: ARAI Shun-ichi @ 2020-03-28 9:26 UTC (permalink / raw)
  To: linux-nilfs, linux-kernel

In Msg <874kuapb2s.fsf@logand.com>;
   Subject "Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct":

> Tomas Hlavaty <tom@logand.com> writes:
>>>> 2) Can you mount the corrupted(?) partition from a recent version of
>>>>    kernel ?
>
> I tried the following Linux kernel versions:
>
> - v4.19
> - v5.4
> - v5.5.11
>
> and still get the crash

Ryusuke Konishi pointed out:

In Msg <CAKFNMomjWkNvHvHkEp=Jv_BiGPNj=oLEChyoXX1yCj5xctAkMA@mail.gmail.com>;
   Subject "Re: BUG: kernel NULL pointer dereference, address: 00000000000000a8":

> As the result of bisection, it turned out that commit
> f4bdb2697ccc9cecf1a9de86905c309ad901da4c on 5.3.y
> ("mm/filemap.c: don't initiate writeback if mapping has no dirty pages")
> triggers the crash.

This commit modifies __filemap_fdatawrite_range() as follows.

[before]
	if (!mapping_cap_writeback_dirty(mapping))
		return 0;

[after]
	if (!mapping_cap_writeback_dirty(mapping) ||
	    !mapping_tagged(mapping, PAGECACHE_TAG_DIRTY))
		return 0;

I did simple test with this code (Kernel 5.5.13).

[test]
	if (!mapping_cap_writeback_dirty(mapping) ||
	    mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK))
		return 0;

It does not cause crash by the test (without long-term operation).
So, I think that it may be related to PAGECACHE_TAG_TOWRITE.

One possible(?) scenario is:

0. some write operation
1. sync (WB_SYNC_ALL)
2. tagged "PAGECACHE_TAG_TOWRITE"
3. __filemap_fdatawrite_range() is called and returns successfully
   (but no-op)
4. some data is/are free-ed (because of 3.)
5. crash at test/setting writeback for free-ed data
   nilfs_segctor_do_construct()
     nilfs_segctor_prepare_write()
       set_page_writeback()

How about this?

^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_co
@ 2020-04-30 12:38 ` Hideki EIRAKU
  0 siblings, 0 replies; 32+ messages in thread
From: Hideki EIRAKU @ 2020-04-30 12:38 UTC (permalink / raw)
  To: linux-nilfs; +Cc: linux-kernel

> In Msg <874kuapb2s.fsf@logand.com>;
>    Subject "Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct":
>
>> Tomas Hlavaty <tom@logand.com> writes:
>>>>> 2) Can you mount the corrupted(?) partition from a recent version of
>>>>>    kernel ?
>>
>> I tried the following Linux kernel versions:
>>
>> - v4.19
>> - v5.4
>> - v5.5.11
>>
>> and still get the crash

I found conditions to reproduce this issue with Linux 5.7-rc3:

- CONFIG_MEMCG=y *and* CONFIG_BLK_CGROUP=y

- When the NILFS2 file system writes to a device, the device file has
  never written by other programs since boot

The following is an example with CONFIG_MEMCG=y and
CONFIG_BLK_CGROUP=y kernel.  If you do mkfs and mount it, it works
because the mkfs command has written data to the device file before
mounting:

# mkfs -t nilfs2 /dev/sda1
mkfs.nilfs2 (nilfs-utils 2.2.7)
Start writing file system initial data to the device
       Blocksize:4096  Device:/dev/sda1  Device Size:267386880
File system initialization succeeded !!
# mount /dev/sda1 /mnt
# touch /mnt
# sync
#

Loopback mount seems to be the same - if you do losetup, mkfs and
mount on a loopback device, it works:

# losetup /dev/loop0 foo
# mkfs -t nilfs2 /dev/loop0
mkfs.nilfs2 (nilfs-utils 2.2.7)
Start writing file system initial data to the device
       Blocksize:4096  Device:/dev/loop0  Device Size:267386880
File system initialization succeeded !!
# mount /dev/sda1 /mnt
# touch /mnt
# sync
#

But if you do mkfs on a file and use mount -o loop, it may fail,
depending on whether the loopback device assigned by the mount command
was used or not before mounting:

# /sbin/mkfs.nilfs2 ./foo
mkfs.nilfs2 (nilfs-utils 2.2.7)
Start writing file system initial data to the device
       Blocksize:4096  Device:./foo  Device Size:268435456
File system initialization succeeded !!
# mount -o loop ./foo /mnt
[   36.371331] NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
# touch /mnt
# sync
[   40.252869] BUG: kernel NULL pointer dereference, address: 00000000000000a8
(snip)

After reboot, it fails:

# mount /dev/sda1 /mnt
[   14.021188] NILFS (sda1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
# touch /mnt
# sync
[   20.576309] BUG: kernel NULL pointer dereference, address: 00000000000000a8
(snip)

But if you do dummy write to the device file before mounting, it
works:

# dd if=/dev/sda1 of=/dev/sda1 count=1
1+0 records in
1+0 records out
512 bytes copied, 0.0135982 s, 37.7 kB/s
# mount /dev/sda1 /mnt
[   52.604560] NILFS (sda1): mounting unchecked fs
[   52.613335] NILFS (sda1): recovery complete
[   52.613877] NILFS (sda1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
# touch /mnt
# sync
#

# losetup /dev/loop0 foo
# dd if=/dev/loop0 of=/dev/loop0 count=1
1+0 records in
1+0 records out
512 bytes copied, 0.0243797 s, 21.0 kB/s
# mount /dev/loop0 /mnt
[  271.915595] NILFS (loop0): mounting unchecked fs
[  272.049603] NILFS (loop0): recovery complete
[  272.049724] NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
# touch /mnt
# sync
#

I think the dummy write is a simple workaround for now, unless
mounting NILFS2 at boot time.  But I have been using NILFS2 /home for
years, I would like to know better workarounds.

^ permalink raw reply [flat|nested] 32+ messages in thread
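
To see whether a given kernel meets the first reproduction condition above (CONFIG_MEMCG=y and CONFIG_BLK_CGROUP=y), something like the following sketch might help. has_repro_config is a hypothetical helper, and where the plain-text kernel config lives is an assumption that differs between distributions:

```shell
# Hypothetical helper: succeeds only if the given plain-text kernel config
# file enables both options named in the report. The file location is an
# assumption; it is distribution dependent.
has_repro_config() {
    cfg=$1
    grep -q '^CONFIG_MEMCG=y$' "$cfg" &&
        grep -q '^CONFIG_BLK_CGROUP=y$' "$cfg"
}

# Self-contained demonstration on a throwaway file:
demo_cfg=$(mktemp)
printf 'CONFIG_MEMCG=y\nCONFIG_BLK_CGROUP=y\n' > "$demo_cfg"
has_repro_config "$demo_cfg" && echo "both options enabled"
rm -f "$demo_cfg"
```

On a system that ships its config under /boot, the call could look like has_repro_config "/boot/config-$(uname -r)"; a compressed config would have to be unpacked to a file first.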
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_co
@ 2020-04-30 15:27 ` Tom
  0 siblings, 0 replies; 32+ messages in thread
From: Tom @ 2020-04-30 15:27 UTC (permalink / raw)
  To: Hideki EIRAKU, linux-nilfs; +Cc: linux-kernel

Thank you!  This is very helpful information, and does seem to be a
workaround.

Like you, I have my home directory on a separate NILFS2 filesystem.  As
a temporary solution, I removed the line from /etc/fstab for that
filesystem and added your dd suggestion along with a manual mount of
the home filesystem to /etc/rc.local.  /home is now mounted properly
at boot with any of the newer kernels I tried.

Thanks,
Tom

On 4/30/20 5:38 AM, Hideki EIRAKU wrote:
>> In Msg <874kuapb2s.fsf@logand.com>;
>>    Subject "Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct":
>>
>>> Tomas Hlavaty <tom@logand.com> writes:
>>>>>> 2) Can you mount the corrupted(?) partition from a recent version of
>>>>>>    kernel ?
>>>
>>> I tried the following Linux kernel versions:
>>>
>>> - v4.19
>>> - v5.4
>>> - v5.5.11
>>>
>>> and still get the crash
>
> I found conditions to reproduce this issue with Linux 5.7-rc3:
>
> - CONFIG_MEMCG=y *and* CONFIG_BLK_CGROUP=y
>
> - When the NILFS2 file system writes to a device, the device file has
>   never written by other programs since boot
>
> The following is an example with CONFIG_MEMCG=y and
> CONFIG_BLK_CGROUP=y kernel.  If you do mkfs and mount it, it works
> because the mkfs command has written data to the device file before
> mounting:
>
> # mkfs -t nilfs2 /dev/sda1
> mkfs.nilfs2 (nilfs-utils 2.2.7)
> Start writing file system initial data to the device
>        Blocksize:4096  Device:/dev/sda1  Device Size:267386880
> File system initialization succeeded !!
> # mount /dev/sda1 /mnt
> # touch /mnt
> # sync
> #
>
> Loopback mount seems to be the same - if you do losetup, mkfs and
> mount on a loopback device, it works:
>
> # losetup /dev/loop0 foo
> # mkfs -t nilfs2 /dev/loop0
> mkfs.nilfs2 (nilfs-utils 2.2.7)
> Start writing file system initial data to the device
>        Blocksize:4096  Device:/dev/loop0  Device Size:267386880
> File system initialization succeeded !!
> # mount /dev/sda1 /mnt
> # touch /mnt
> # sync
> #
>
> But if you do mkfs on a file and use mount -o loop, it may fail,
> depending on whether the loopback device assigned by the mount command
> was used or not before mounting:
>
> # /sbin/mkfs.nilfs2 ./foo
> mkfs.nilfs2 (nilfs-utils 2.2.7)
> Start writing file system initial data to the device
>        Blocksize:4096  Device:./foo  Device Size:268435456
> File system initialization succeeded !!
> # mount -o loop ./foo /mnt
> [   36.371331] NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
> # touch /mnt
> # sync
> [   40.252869] BUG: kernel NULL pointer dereference, address: 00000000000000a8
> (snip)
>
> After reboot, it fails:
>
> # mount /dev/sda1 /mnt
> [   14.021188] NILFS (sda1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
> # touch /mnt
> # sync
> [   20.576309] BUG: kernel NULL pointer dereference, address: 00000000000000a8
> (snip)
>
> But if you do dummy write to the device file before mounting, it
> works:
>
> # dd if=/dev/sda1 of=/dev/sda1 count=1
> 1+0 records in
> 1+0 records out
> 512 bytes copied, 0.0135982 s, 37.7 kB/s
> # mount /dev/sda1 /mnt
> [   52.604560] NILFS (sda1): mounting unchecked fs
> [   52.613335] NILFS (sda1): recovery complete
> [   52.613877] NILFS (sda1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
> # touch /mnt
> # sync
> #
>
> # losetup /dev/loop0 foo
> # dd if=/dev/loop0 of=/dev/loop0 count=1
> 1+0 records in
> 1+0 records out
> 512 bytes copied, 0.0243797 s, 21.0 kB/s
> # mount /dev/loop0 /mnt
> [  271.915595] NILFS (loop0): mounting unchecked fs
> [  272.049603] NILFS (loop0): recovery complete
> [  272.049724] NILFS (loop0): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
> # touch /mnt
> # sync
> #
>
> I think the dummy write is a simple workaround for now, unless
> mounting NILFS2 at boot time.  But I have been using NILFS2 /home for
> years, I would like to know better workarounds.
>

^ permalink raw reply [flat|nested] 32+ messages in thread
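
The rc.local arrangement described above (a dummy write followed by a manual mount) could be sketched roughly as below. This is not the exact fragment used in the thread, which was not posted; nilfs_mount_with_dummy_write and the DRY_RUN switch are hypothetical, and the device and mount point are placeholders:

```shell
# Hypothetical rc.local helper; the function name and DRY_RUN switch are
# illustrative assumptions. With DRY_RUN set to any non-empty value the
# commands are printed instead of being executed.
nilfs_mount_with_dummy_write() {
    dev=$1
    mnt=$2
    run=${DRY_RUN:+echo}

    # dd with if == of would truncate a regular file, so only allow
    # block devices when actually running.
    if [ -z "$run" ] && [ ! -b "$dev" ]; then
        echo "refusing: $dev is not a block device" >&2
        return 1
    fi

    # Rewrite the first 512-byte sector with its own contents, so the
    # device file has been written once before NILFS2 writes to it.
    $run dd if="$dev" of="$dev" count=1 || return 1
    $run mount -t nilfs2 "$dev" "$mnt"
}

# Placeholder device and mount point, printed only:
DRY_RUN=1 nilfs_mount_with_dummy_write /dev/sda1 /home
```

The dry run only shows the two commands in order; dropping DRY_RUN would execute them, which needs root and the real device name.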
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_co
@ 2020-05-31 17:49 ` Ryusuke Konishi
  0 siblings, 0 replies; 32+ messages in thread
From: Ryusuke Konishi @ 2020-05-31 17:49 UTC (permalink / raw)
  To: hdk1983; +Cc: tommytoad0, linux-nilfs, linux-kernel

Hi,

This bug turned out to be caused by set_page_writeback() call for
segment summary buffers and super root buffers at
nilfs_segctor_prepare_write().

set_page_writeback() can call inc_wb_stat(inode_to_wb(inode),
WB_WRITEBACK) where inode_to_wb(inode) is NULL if inode_attach_wb() is
not called in advance.

To ensure inode_attach_wb() is called, mark_buffer_dirty() should be
called for those buffers.

The following patch fixes this issue, but I got another oops at
nilfs_segctor_complete_write() during a stress test.
So, I'm still investigating.

Regards,
Ryusuke Konishi

===
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 445eef4..f6b5ca8 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1650,6 +1650,8 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci)
 
 		list_for_each_entry(bh, &segbuf->sb_segsum_buffers,
 				    b_assoc_buffers) {
+			set_buffer_uptodate(bh);
+			mark_buffer_dirty(bh);
 			if (bh->b_page != bd_page) {
 				if (bd_page) {
 					lock_page(bd_page);
@@ -1665,6 +1667,8 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci)
 				    b_assoc_buffers) {
 			set_buffer_async_write(bh);
 			if (bh == segbuf->sb_super_root) {
+				set_buffer_uptodate(bh);
+				mark_buffer_dirty(bh);
 				if (bh->b_page != bd_page) {
 					lock_page(bd_page);
 					clear_page_dirty_for_io(bd_page);
===

On Thu, 30 Apr 2020 08:27:47 -0700, Tom <tommytoad0@gmail.com> wrote:
> Thank you!  This is very helpful information, and does seem to be a
> workaround.
>
> Like you, I have my home directory on a separate NILFS2 filesystem.  As
> a temporary solution, I removed the line from /etc/fstab for that
> filesystem and added your dd suggestion along with a manual mount of
> the home filesystem to /etc/rc.local.  /home is now mounted properly
> at boot with any of the newer kernels I tried.
>
> Thanks,
> Tom
>
> On 4/30/20 5:38 AM, Hideki EIRAKU wrote:
>>> In Msg <874kuapb2s.fsf@logand.com>;
>>>    Subject "Re: BUG: unable to handle kernel NULL pointer dereference at
>>>    00000000000000a8 in nilfs_segctor_do_construct":
>>>
>>>> Tomas Hlavaty <tom@logand.com> writes:
>>>>>>> 2) Can you mount the corrupted(?) partition from a recent version of
>>>>>>>    kernel ?
>>>>
>>>> I tried the following Linux kernel versions:
>>>>
>>>> - v4.19
>>>> - v5.4
>>>> - v5.5.11
>>>>
>>>> and still get the crash
>> I found conditions to reproduce this issue with Linux 5.7-rc3:
>> - CONFIG_MEMCG=y *and* CONFIG_BLK_CGROUP=y
>> - When the NILFS2 file system writes to a device, the device file has
>>   never written by other programs since boot
>> The following is an example with CONFIG_MEMCG=y and
>> CONFIG_BLK_CGROUP=y kernel.  If you do mkfs and mount it, it works
>> because the mkfs command has written data to the device file before
>> mounting:
>> # mkfs -t nilfs2 /dev/sda1
>> mkfs.nilfs2 (nilfs-utils 2.2.7)
>> Start writing file system initial data to the device
>>        Blocksize:4096  Device:/dev/sda1  Device Size:267386880
>> File system initialization succeeded !!
>> # mount /dev/sda1 /mnt
>> # touch /mnt
>> # sync
>> #
>> Loopback mount seems to be the same - if you do losetup, mkfs and
>> mount on a loopback device, it works:
>> # losetup /dev/loop0 foo
>> # mkfs -t nilfs2 /dev/loop0
>> mkfs.nilfs2 (nilfs-utils 2.2.7)
>> Start writing file system initial data to the device
>>        Blocksize:4096  Device:/dev/loop0  Device Size:267386880
>> File system initialization succeeded !!
>> # mount /dev/sda1 /mnt
>> # touch /mnt
>> # sync
>> #
>> But if you do mkfs on a file and use mount -o loop, it may fail,
>> depending on whether the loopback device assigned by the mount command
>> was used or not before mounting:
>> # /sbin/mkfs.nilfs2 ./foo
>> mkfs.nilfs2 (nilfs-utils 2.2.7)
>> Start writing file system initial data to the device
>>        Blocksize:4096  Device:./foo  Device Size:268435456
>> File system initialization succeeded !!
>> # mount -o loop ./foo /mnt
>> [   36.371331] NILFS (loop0): segctord starting. Construction interval =
>> 5 seconds, CP frequency < 30 seconds
>> # touch /mnt
>> # sync
>> [   40.252869] BUG: kernel NULL pointer dereference, address:
>> 00000000000000a8
>> (snip)
>> After reboot, it fails:
>> # mount /dev/sda1 /mnt
>> [   14.021188] NILFS (sda1): segctord starting. Construction interval =
>> 5 seconds, CP frequency < 30 seconds
>> # touch /mnt
>> # sync
>> [   20.576309] BUG: kernel NULL pointer dereference, address:
>> 00000000000000a8
>> (snip)
>> But if you do dummy write to the device file before mounting, it
>> works:
>> # dd if=/dev/sda1 of=/dev/sda1 count=1
>> 1+0 records in
>> 1+0 records out
>> 512 bytes copied, 0.0135982 s, 37.7 kB/s
>> # mount /dev/sda1 /mnt
>> [   52.604560] NILFS (sda1): mounting unchecked fs
>> [   52.613335] NILFS (sda1): recovery complete
>> [   52.613877] NILFS (sda1): segctord starting. Construction interval =
>> 5 seconds, CP frequency < 30 seconds
>> # touch /mnt
>> # sync
>> #
>> # losetup /dev/loop0 foo
>> # dd if=/dev/loop0 of=/dev/loop0 count=1
>> 1+0 records in
>> 1+0 records out
>> 512 bytes copied, 0.0243797 s, 21.0 kB/s
>> # mount /dev/loop0 /mnt
>> [  271.915595] NILFS (loop0): mounting unchecked fs
>> [  272.049603] NILFS (loop0): recovery complete
>> [  272.049724] NILFS (loop0): segctord starting. Construction interval
>> = 5 seconds, CP frequency < 30 seconds
>> # touch /mnt
>> # sync
>> #
>> I think the dummy write is a simple workaround for now, unless
>> mounting NILFS2 at boot time.  But I have been using NILFS2 /home for
>> years, I would like to know better workarounds.
>>

^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_co
@ 2020-05-31 17:49 ` Ryusuke Konishi
  0 siblings, 0 replies; 32+ messages in thread
From: Ryusuke Konishi @ 2020-05-31 17:49 UTC (permalink / raw)
  To: hdk1983
  Cc: tommytoad0, linux-nilfs, linux-kernel

Hi,

This bug turned out to be caused by the set_page_writeback() calls for
segment summary buffers and super root buffers in
nilfs_segctor_prepare_write().

set_page_writeback() can call inc_wb_stat(inode_to_wb(inode),
WB_WRITEBACK), where inode_to_wb(inode) is NULL if inode_attach_wb() has
not been called in advance. To ensure inode_attach_wb() is called,
mark_buffer_dirty() should be called for those buffers.

The following patch fixes this issue, but I got another oops at
nilfs_segctor_complete_write() during a stress test. So, I'm still
investigating.

Regards,
Ryusuke Konishi

===
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 445eef4..f6b5ca8 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -1650,6 +1650,8 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci)
 
 		list_for_each_entry(bh, &segbuf->sb_segsum_buffers,
 				    b_assoc_buffers) {
+			set_buffer_uptodate(bh);
+			mark_buffer_dirty(bh);
 			if (bh->b_page != bd_page) {
 				if (bd_page) {
 					lock_page(bd_page);
@@ -1665,6 +1667,8 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci)
 				    b_assoc_buffers) {
 			set_buffer_async_write(bh);
 			if (bh == segbuf->sb_super_root) {
+				set_buffer_uptodate(bh);
+				mark_buffer_dirty(bh);
 				if (bh->b_page != bd_page) {
 					lock_page(bd_page);
 					clear_page_dirty_for_io(bd_page);
===

On Thu, 30 Apr 2020 08:27:47 -0700, Tom <tommytoad0@gmail.com> wrote:
> Thank you! This is very helpful information, and does seem to be a
> workaround.
>
> Like you, I have my home directory on a separate NILFS2 filesystem. As
> a temporary solution, I removed the line from /etc/fstab for that
> filesystem and added your dd suggestion along with a manual mount of
> the home filesystem to /etc/rc.local. /home is now mounted properly
> at boot with any of the newer kernels I tried.
>
> Thanks,
> Tom
>
> On 4/30/20 5:38 AM, Hideki EIRAKU wrote:
>>> In Msg <874kuapb2s.fsf@logand.com>;
>>> Subject "Re: BUG: unable to handle kernel NULL pointer dereference at
>>> 00000000000000a8 in nilfs_segctor_do_construct":
>>>
>>>> Tomas Hlavaty <tom@logand.com> writes:
>>>>>>> 2) Can you mount the corrupted(?) partition from a recent version of
>>>>>>> kernel ?
>>>>
>>>> I tried the following Linux kernel versions:
>>>>
>>>> - v4.19
>>>> - v5.4
>>>> - v5.5.11
>>>>
>>>> and still get the crash
>> I found conditions to reproduce this issue with Linux 5.7-rc3:
>> - CONFIG_MEMCG=y *and* CONFIG_BLK_CGROUP=y
>> - When the NILFS2 file system writes to a device, the device file has
>> never written by other programs since boot
>> The following is an example with CONFIG_MEMCG=y and
>> CONFIG_BLK_CGROUP=y kernel. If you do mkfs and mount it, it works
>> because the mkfs command has written data to the device file before
>> mounting:
>> # mkfs -t nilfs2 /dev/sda1
>> mkfs.nilfs2 (nilfs-utils 2.2.7)
>> Start writing file system initial data to the device
>> Blocksize:4096 Device:/dev/sda1 Device Size:267386880
>> File system initialization succeeded !!
>> # mount /dev/sda1 /mnt
>> # touch /mnt
>> # sync
>> #
>> Loopback mount seems to be the same - if you do losetup, mkfs and
>> mount on a loopback device, it works:
>> # losetup /dev/loop0 foo
>> # mkfs -t nilfs2 /dev/loop0
>> mkfs.nilfs2 (nilfs-utils 2.2.7)
>> Start writing file system initial data to the device
>> Blocksize:4096 Device:/dev/loop0 Device Size:267386880
>> File system initialization succeeded !!
>> # mount /dev/sda1 /mnt
>> # touch /mnt
>> # sync
>> #
>> But if you do mkfs on a file and use mount -o loop, it may fail,
>> depending on whether the loopback device assigned by the mount command
>> was used or not before mounting:
>> # /sbin/mkfs.nilfs2 ./foo
>> mkfs.nilfs2 (nilfs-utils 2.2.7)
>> Start writing file system initial data to the device
>> Blocksize:4096 Device:./foo Device Size:268435456
>> File system initialization succeeded !!
>> # mount -o loop ./foo /mnt
>> [ 36.371331] NILFS (loop0): segctord starting. Construction interval =
>> 5 seconds, CP frequency < 30 seconds
>> # touch /mnt
>> # sync
>> [ 40.252869] BUG: kernel NULL pointer dereference, address:
>> 00000000000000a8
>> (snip)
>> After reboot, it fails:
>> # mount /dev/sda1 /mnt
>> [ 14.021188] NILFS (sda1): segctord starting. Construction interval =
>> 5 seconds, CP frequency < 30 seconds
>> # touch /mnt
>> # sync
>> [ 20.576309] BUG: kernel NULL pointer dereference, address:
>> 00000000000000a8
>> (snip)
>> But if you do dummy write to the device file before mounting, it
>> works:
>> # dd if=/dev/sda1 of=/dev/sda1 count=1
>> 1+0 records in
>> 1+0 records out
>> 512 bytes copied, 0.0135982 s, 37.7 kB/s
>> # mount /dev/sda1 /mnt
>> [ 52.604560] NILFS (sda1): mounting unchecked fs
>> [ 52.613335] NILFS (sda1): recovery complete
>> [ 52.613877] NILFS (sda1): segctord starting. Construction interval =
>> 5 seconds, CP frequency < 30 seconds
>> # touch /mnt
>> # sync
>> #
>> # losetup /dev/loop0 foo
>> # dd if=/dev/loop0 of=/dev/loop0 count=1
>> 1+0 records in
>> 1+0 records out
>> 512 bytes copied, 0.0243797 s, 21.0 kB/s
>> # mount /dev/loop0 /mnt
>> [ 271.915595] NILFS (loop0): mounting unchecked fs
>> [ 272.049603] NILFS (loop0): recovery complete
>> [ 272.049724] NILFS (loop0): segctord starting. Construction interval
>> = 5 seconds, CP frequency < 30 seconds
>> # touch /mnt
>> # sync
>> #
>> I think the dummy write is a simple workaround for now, unless
>> mounting NILFS2 at boot time. But I have been using NILFS2 /home for
>> years, I would like to know better workarounds.
>>

^ permalink raw reply related [flat|nested] 32+ messages in thread
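The dummy-write workaround quoted above, together with Tom's note about moving the mount out of /etc/fstab and into /etc/rc.local, could be sketched as a small shell helper. This is only an illustration under assumptions: the function name nilfs_premount_cmds and the device name /dev/sdXN are hypothetical and do not come from the thread, and the helper only prints the two commands instead of running them, so they can be reviewed before being executed as root.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the
# "dummy write, then mount" sequence for one NILFS2 device.
# It runs nothing itself; review the output, then run it as root.
nilfs_premount_cmds() {
    dev=$1
    mnt=$2
    if [ -z "$dev" ] || [ -z "$mnt" ]; then
        echo "usage: nilfs_premount_cmds DEVICE MOUNTPOINT" >&2
        return 1
    fi
    # Copy the first 512-byte sector of the device back onto itself, so
    # the device file has been written once before NILFS2 writes to it.
    printf 'dd if=%s of=%s count=1\n' "$dev" "$dev"
    # Mount only after the dummy write has been issued.
    printf 'mount -t nilfs2 %s %s\n' "$dev" "$mnt"
}

# Example with a placeholder device name:
nilfs_premount_cmds /dev/sdXN /home
```

The printed lines correspond to what would go into /etc/rc.local in place of the removed /etc/fstab entry. Paths containing spaces are not quoted by this sketch.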
[parent not found: <20200601024013.1296-1-hdanton@sina.com>]
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_co
@ 2020-06-01 11:46 ` Ryusuke Konishi
  0 siblings, 0 replies; 32+ messages in thread
From: Ryusuke Konishi @ 2020-06-01 11:46 UTC (permalink / raw)
  To: Hillf Danton
  Cc: hdk1983, tommytoad0, linux-nilfs, LKML, Konstantin Khlebnikov,
	Greg Kroah-Hartman

> Wondering if it can be reproduced on mainline with c3aab9a0bd91
> ("mm/filemap.c: dont initiate writeback if mapping has no dirty pages")
> reverted?

For mainline kernels with that commit reverted, this oops actually
doesn't occur.

Regards,
Ryusuke Konishi

On Mon, Jun 1, 2020 at 11:40 AM Hillf Danton <hdanton@sina.com> wrote:
> On Mon, 01 Jun 2020 02:49:54 Ryusuke Konishi wrote:
> > Hi,
> >
> > This bug turned out to be caused by set_page_writeback() call for
> > segment summary buffers and super root buffers at
> > nilfs_segctor_prepare_write().
> >
> > set_page_writeback() can call inc_wb_stat(inode_to_wb(inode),
> > WB_WRITEBACK) where inode_to_wb(inode) is NULL if inode_attach_wb() is
> > not called in advance. To ensure inode_attach_wb() is called,
> > mark_buffer_dirty() should be called for those buffers.
> >
> > The following patch fixes this issue,
>
> Thanks for sharing your analysis and patch.
>
> Wondering if it can be reproduced on mainline with c3aab9a0bd91
> ("mm/filemap.c: dont initiate writeback if mapping has no dirty pages")
> reverted? If no then we need to update the stable trees.
>
> Hillf
>
> > but I got another oops at
> > nilfs_segctor_complete_write() during a stress test. So, I'm still
> > investigating.
> > > > Regards, > > Ryusuke Konishi > > > > === > > diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c > > index 445eef4..f6b5ca8 100644 > > --- a/fs/nilfs2/segment.c > > +++ b/fs/nilfs2/segment.c > > @@ -1650,6 +1650,8 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci) > > > > list_for_each_entry(bh, &segbuf->sb_segsum_buffers, > > b_assoc_buffers) { > > + set_buffer_uptodate(bh); > > + mark_buffer_dirty(bh); > > if (bh->b_page != bd_page) { > > if (bd_page) { > > lock_page(bd_page); > > @@ -1665,6 +1667,8 @@ static void nilfs_segctor_prepare_write(struct nilfs_sc_info *sci) > > b_assoc_buffers) { > > set_buffer_async_write(bh); > > if (bh == segbuf->sb_super_root) { > > + set_buffer_uptodate(bh); > > + mark_buffer_dirty(bh); > > if (bh->b_page != bd_page) { > > lock_page(bd_page); > > clear_page_dirty_for_io(bd_page); > > === > > > > > > On Thu, 30 Apr 2020 08:27:47 -0700, Tom <tommytoad0@gmail.com> wrote: > > > Thank you! This is very helpful information, and does seem to be a > > > workaround. > > > > > > Like you, I have my home directory on a separate NILFS2 filesystem. As > > > a temporary solution, I removed the line from /etc/fstab for that > > > filesystem and added your dd suggestion along with a manual mount of > > > the home filesystem to /etc/rc.local. /home is now mounted properly > > > at boot with any of the newer kernels I tried. > > > > > > Thanks, > > > Tom > > > > > > On 4/30/20 5:38 AM, Hideki EIRAKU wrote: > > >>> In Msg <874kuapb2s.fsf@logand.com>; > > >>> Subject "Re: BUG: unable to handle kernel NULL pointer dereference at > > >>> 00000000000000a8 in nilfs_segctor_do_construct": > > >>> > > >>>> Tomas Hlavaty <tom@logand.com> writes: > > >>>>>>> 2) Can you mount the corrupted(?) partition from a recent version of > > >>>>>>> kernel ? 
> > >>>> > > >>>> I tried the following Linux kernel versions: > > >>>> > > >>>> - v4.19 > > >>>> - v5.4 > > >>>> - v5.5.11 > > >>>> > > >>>> and still get the crash > > >> I found conditions to reproduce this issue with Linux 5.7-rc3: > > >> - CONFIG_MEMCG=y *and* CONFIG_BLK_CGROUP=y > > >> - When the NILFS2 file system writes to a device, the device file has > > >> never written by other programs since boot > > >> The following is an example with CONFIG_MEMCG=y and > > >> CONFIG_BLK_CGROUP=y kernel. If you do mkfs and mount it, it works > > >> because the mkfs command has written data to the device file before > > >> mounting: > > >> # mkfs -t nilfs2 /dev/sda1 > > >> mkfs.nilfs2 (nilfs-utils 2.2.7) > > >> Start writing file system initial data to the device > > >> Blocksize:4096 Device:/dev/sda1 Device Size:267386880 > > >> File system initialization succeeded !! > > >> # mount /dev/sda1 /mnt > > >> # touch /mnt > > >> # sync > > >> # > > >> Loopback mount seems to be the same - if you do losetup, mkfs and > > >> mount on a loopback device, it works: > > >> # losetup /dev/loop0 foo > > >> # mkfs -t nilfs2 /dev/loop0 > > >> mkfs.nilfs2 (nilfs-utils 2.2.7) > > >> Start writing file system initial data to the device > > >> Blocksize:4096 Device:/dev/loop0 Device Size:267386880 > > >> File system initialization succeeded !! > > >> # mount /dev/sda1 /mnt > > >> # touch /mnt > > >> # sync > > >> # > > >> But if you do mkfs on a file and use mount -o loop, it may fail, > > >> depending on whether the loopback device assigned by the mount command > > >> was used or not before mounting: > > >> # /sbin/mkfs.nilfs2 ./foo > > >> mkfs.nilfs2 (nilfs-utils 2.2.7) > > >> Start writing file system initial data to the device > > >> Blocksize:4096 Device:./foo Device Size:268435456 > > >> File system initialization succeeded !! > > >> # mount -o loop ./foo /mnt > > >> [ 36.371331] NILFS (loop0): segctord starting. 
Construction interval = > > >> 5 seconds, CP frequency < 30 seconds > > >> # touch /mnt > > >> # sync > > >> [ 40.252869] BUG: kernel NULL pointer dereference, address: > > >> 00000000000000a8 > > >> (snip) > > >> After reboot, it fails: > > >> # mount /dev/sda1 /mnt > > >> [ 14.021188] NILFS (sda1): segctord starting. Construction interval = > > >> 5 seconds, CP frequency < 30 seconds > > >> # touch /mnt > > >> # sync > > >> [ 20.576309] BUG: kernel NULL pointer dereference, address: > > >> 00000000000000a8 > > >> (snip) > > >> But if you do dummy write to the device file before mounting, it > > >> works: > > >> # dd if=/dev/sda1 of=/dev/sda1 count=1 > > >> 1+0 records in > > >> 1+0 records out > > >> 512 bytes copied, 0.0135982 s, 37.7 kB/s > > >> # mount /dev/sda1 /mnt > > >> [ 52.604560] NILFS (sda1): mounting unchecked fs > > >> [ 52.613335] NILFS (sda1): recovery complete > > >> [ 52.613877] NILFS (sda1): segctord starting. Construction interval = > > >> 5 seconds, CP frequency < 30 seconds > > >> # touch /mnt > > >> # sync > > >> # > > >> # losetup /dev/loop0 foo > > >> # dd if=/dev/loop0 of=/dev/loop0 count=1 > > >> 1+0 records in > > >> 1+0 records out > > >> 512 bytes copied, 0.0243797 s, 21.0 kB/s > > >> # mount /dev/loop0 /mnt > > >> [ 271.915595] NILFS (loop0): mounting unchecked fs > > >> [ 272.049603] NILFS (loop0): recovery complete > > >> [ 272.049724] NILFS (loop0): segctord starting. Construction interval > > >> = 5 seconds, CP frequency < 30 seconds > > >> # touch /mnt > > >> # sync > > >> # > > >> I think the dummy write is a simple workaround for now, unless > > >> mounting NILFS2 at boot time. But I have been using NILFS2 /home for > > >> years, I would like to know better workarounds. > > >> > > > ^ permalink raw reply [flat|nested] 32+ messages in thread
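The reproduction conditions quoted in this thread name two kernel options, CONFIG_MEMCG=y and CONFIG_BLK_CGROUP=y. A quick way to check whether a given kernel config file has both enabled might look like the sketch below. The function name nilfs_oops_config_present is hypothetical, and the location of the config file is an assumption that varies by distribution (some ship /boot/config-<version>, others expose /proc/config.gz, which would need to be decompressed first).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): exit 0 when the given
# kernel config file enables both options named in the reproduction
# conditions, and non-zero otherwise.
nilfs_oops_config_present() {
    cfg=$1
    grep -q '^CONFIG_MEMCG=y$' "$cfg" &&
        grep -q '^CONFIG_BLK_CGROUP=y$' "$cfg"
}
```

Usage, assuming the running kernel's config is installed under /boot: `nilfs_oops_config_present "/boot/config-$(uname -r)" && echo "both options enabled"`. A positive result only means the first quoted condition holds; the second condition (no prior write to the device file since boot) still has to be true for the oops to trigger.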
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct 2019-11-18 16:51 ` Ryusuke Konishi 2019-11-19 6:04 ` Viacheslav Dubeyko 2019-12-19 21:02 ` Tomas Hlavaty @ 2020-01-23 13:58 ` ARAI Shun-ichi [not found] ` <20200123.225827.1155989593018204741.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org> ` (2 more replies) 2 siblings, 3 replies; 32+ messages in thread From: ARAI Shun-ichi @ 2020-01-23 13:58 UTC (permalink / raw) To: linux-kernel Hi, It is reproducible in my environment. Kernel version is 4.19.86 (Gentoo). NILFS2 with kernel 4.19.82 works well. I did following for the test. i) mount corrupt partition with read-only option (this partition causes "mounting fs with errors" at every rw mount) i-1) wait a few minutes ... not crash i-2) fs access (ls, du, ...) ... not crash ii) create small NILFS2 fs and read-write mount dd if=/dev/zero of=/tmp/n bs=1M count=500 mount -o loop /tmp/n /mnt/tmp ii-1) wait a few minutes ... not crash ii-2) touch file in the fs ... crash (in few seconds) In <CAKFNMo=k1wVHOwXhTLEOJ+A-nwmvJ+sN_PPa8kY8fMxrQ4R+Jw@mail.gmail.com>; Ryusuke Konishi <konishi.ryusuke@gmail.com> wrote as Subject "Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct": > Hi, > >> It was likely caused by improper shutdown and following nilfs2 partition >> corruption. Now I can still read the data, but on the whole the >> computer is not useable, because starting a process which uses the >> corrupted file system simply crashes in kernel. > > Thank you for reporting the issue. > Let me ask you a few questions: > > 1) Is the crash reproducible in the environment ? > 2) Can you mount the corrupted(?) partition from a recent version of kernel ? > 3) Does read-only mount option (-r) work to avoid the crash ? 
> > Thanks, > Ryusuke Konishi > > 2019年11月18日(月) 2:34 Tomas Hlavaty <tom@logand.com>: >> >> Hi Ryusuke, >> >> today I got this bug in kernel, which seems to be related to nilfs2. >> >> It was likely caused by improper shutdown and following nilfs2 partition >> corruption. Now I can still read the data, but on the whole the >> computer is not useable, because starting a process which uses the >> corrupted file system simply crashes in kernel. I am actually not sure >> if the filesystem is corrupted, as I don't know about any tool to check >> that. The relevant parts of dmesg log are bellow. >> >> Please let me know if you are the right contact or if you need more info >> about the problem. >> >> Thank you, >> >> Tomas >> >> [ 0.000000] Linux version 4.19.84 (nixbld@localhost) (gcc version 8.3.0 (GCC)) #1-NixOS SMP Tue Nov 12 18:21:46 UTC 2019 >> [ 0.000000] Command line: initrd=\efi\nixos\4s51zw36kd1qb0ymk0charxjg8x6k5k3-initrd-linux-4.19.84-initrd.efi systemConfig=/nix/store/gdbxhzysr929abrymjqala0b5bh2fqmv-nixos-system-ushi-19.09.1258.07e66484e67 init=/nix/store/gdbxhzysr929abrymjqala0b5bh2fqmv-nixos-system-ushi-19.09.1258.07e66484e67/init loglevel=4 >> >> >> >> [ 37.741106] systemd-journald[470]: Received client request to flush runtime journal. >> [ 37.749084] systemd-journald[470]: File /var/log/journal/55a4ea9159c14c0bb8767a43819c6927/system.journal corrupted or uncleanly shut down, renaming and replacing. >> [ 37.810819] audit: type=1130 audit(1573985039.617:3): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-udevd comm="systemd" exe="/nix/store/v8flm2h07zcfg5k5npz56m0ayj0qm1q8-systemd-243/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' >> >> [ 38.321561] NILFS version 2 loaded >> [ 38.323236] NILFS (dm-1): mounting unchecked fs >> >> >> [ 38.349185] NILFS (dm-1): recovery complete >> [ 38.353228] NILFS (dm-1): segctord starting. 
Construction interval = 5 seconds, CP frequency < 30 seconds >> >> [ 63.543941] systemd-journald[470]: File >> /var/log/journal/55a4ea9159c14c0bb8767a43819c6927/user-1000.journal >> corrupted or uncleanly shut down, renaming and replacing. >> >> [12637.085548] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 >> [12637.085558] PGD 0 P4D 0 >> [12637.085567] Oops: 0000 [#1] SMP PTI >> [12637.085574] CPU: 0 PID: 657 Comm: segctord Not tainted 4.19.84 #1-NixOS >> [12637.085577] Hardware name: ASUSTeK COMPUTER INC. VivoBook 15_ASUS Laptop X507MA_R507MA/X507MA, BIOS X507MA.301 09/14/2018 >> [12637.085589] RIP: 0010:percpu_counter_add_batch+0x4/0x60 >> [12637.085593] Code: 89 e6 89 c7 e8 dd 3b 28 00 3b 05 fb e0 b6 00 72 d8 4c 89 ee 48 89 ef e8 7a 63 2a 00 48 89 d8 5b 5d 41 5c 41 5d c3 41 54 55 53 <48> 8b 47 20 65 44 8b 20 49 63 ec 48 63 ca 48 01 f5 48 39 e9 7e 0a >> [12637.085597] RSP: 0018:ffff9d1b00a0bd20 EFLAGS: 00010006 >> [12637.085601] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 >> [12637.085604] RDX: 0000000000000018 RSI: 0000000000000001 RDI: 0000000000000088 >> [12637.085608] RBP: ffff8df67a2988d0 R08: 0000000000000000 R09: ffff8df66fe0cfe0 >> [12637.085611] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000 >> [12637.085614] R13: ffff8df67a298758 R14: ffff8df67a2988c8 R15: ffffccd684229a80 >> [12637.085618] FS: 0000000000000000(0000) GS:ffff8df67ba00000(0000) knlGS:0000000000000000 >> [12637.085621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [12637.085624] CR2: 00000000000000a8 CR3: 000000011ac0a000 CR4: 0000000000340ef0 >> [12637.085628] Call Trace: >> [12637.085640] __test_set_page_writeback+0x37c/0x3f0 >> [12637.085663] nilfs_segctor_do_construct+0x184e/0x2040 [nilfs2] >> [12637.085680] nilfs_segctor_construct+0x1f5/0x2e0 [nilfs2] >> [12637.085693] nilfs_segctor_thread+0x129/0x370 [nilfs2] >> [12637.085706] ? 
nilfs_segctor_construct+0x2e0/0x2e0 [nilfs2] >> [12637.085713] kthread+0x112/0x130 >> [12637.085719] ? kthread_bind+0x30/0x30 >> [12637.085728] ret_from_fork+0x1f/0x40 >> [12637.085734] Modules linked in: ctr ccm af_packet msr 8021q snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic hid_multitouch arc4 ath9k ath9k_common ath9k_hw ath mac80211 snd_soc_skl snd_soc_skl_ipc spi_pxa2xx_platform asus_nb_wmi snd_soc_sst_ipc snd_soc_sst_dsp asus_wmi 8250_dw i2c_designware_platform sparse_keymap i2c_designware_core wmi_bmof i915 snd_hda_ext_core nilfs2 snd_soc_acpi_intel_match snd_soc_acpi uvcvideo videobuf2_vmalloc nls_iso8859_1 videobuf2_memops videobuf2_v4l2 snd_soc_core nls_cp437 rtsx_usb_ms intel_telemetry_pltdrv vfat intel_punit_ipc intel_telemetry_core fat intel_pmc_ipc memstick videobuf2_common snd_compress kvmgt vfio_mdev mdev ath3k vfio_iommu_type1 vfio btusb ac97_bus snd_pcm_dmaengine btrtl x86_pkg_temp_thermal intel_powerclamp btbcm cec coretemp btintel >> [12637.085819] crct10dif_pclmul crc32_pclmul videodev snd_hda_intel bluetooth drm_kms_helper ghash_clmulni_intel deflate media efi_pstore intel_cstate pstore intel_rapl_perf cfg80211 snd_hda_codec joydev mousedev evdev wdat_wdt serio_raw mac_hid efivars drm snd_hda_core snd_hwdep ecdh_generic snd_pcm snd_timer mei_me idma64 virt_dma snd intel_gtt agpgart i2c_i801 i2c_algo_bit mei fb_sys_fops syscopyarea soundcore rfkill processor_thermal_device sysfillrect sysimgblt intel_lpss_pci intel_soc_dts_iosf thermal wmi intel_lpss i2c_hid i2c_core battery tpm_crb button ac tpm_tis tpm_tis_core asus_wireless video pcc_cpufreq tpm rng_core pinctrl_geminilake int3400_thermal int3403_thermal pinctrl_intel int340x_thermal_zone acpi_thermal_rel iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 >> [12637.085912] nf_defrag_ipv4 libcrc32c ip6t_rpfilter ipt_rpfilter ip6table_raw iptable_raw xt_pkttype nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp ip6table_filter ip6_tables 
iptable_filter sch_fq_codel loop cpufreq_powersave tun tap macvlan bridge stp llc kvm_intel kvm irqbypass efivarfs ip_tables x_tables ipv6 crc_ccitt autofs4 ext4 crc32c_generic crc16 mbcache jbd2 fscrypto dm_crypt algif_skcipher af_alg rtsx_usb_sdmmc mmc_core rtsx_usb hid_generic usbhid hid sd_mod input_leds led_class atkbd libps2 ahci libahci xhci_pci libata xhci_hcd aesni_intel usbcore aes_x86_64 crypto_simd scsi_mod cryptd glue_helper crc32c_intel usb_common rtc_cmos i8042 serio dm_mod >> [12637.086000] CR2: 00000000000000a8 >> [12637.086005] ---[ end trace ee0079180c990cd2 ]--- >> [12637.120805] RIP: 0010:percpu_counter_add_batch+0x4/0x60 >> [12637.120807] Code: 89 e6 89 c7 e8 dd 3b 28 00 3b 05 fb e0 b6 00 72 d8 4c 89 ee 48 89 ef e8 7a 63 2a 00 48 89 d8 5b 5d 41 5c 41 5d c3 41 54 55 53 <48> 8b 47 20 65 44 8b 20 49 63 ec 48 63 ca 48 01 f5 48 39 e9 7e 0a >> [12637.120809] RSP: 0018:ffff9d1b00a0bd20 EFLAGS: 00010006 >> [12637.120811] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 >> [12637.120812] RDX: 0000000000000018 RSI: 0000000000000001 RDI: 0000000000000088 >> [12637.120814] RBP: ffff8df67a2988d0 R08: 0000000000000000 R09: ffff8df66fe0cfe0 >> [12637.120815] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000 >> [12637.120816] R13: ffff8df67a298758 R14: ffff8df67a2988c8 R15: ffffccd684229a80 >> [12637.120818] FS: 0000000000000000(0000) GS:ffff8df67ba00000(0000) knlGS:0000000000000000 >> [12637.120820] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [12637.120821] CR2: 00000000000000a8 CR3: 0000000138e0a000 CR4: 0000000000340ef0 ^ permalink raw reply [flat|nested] 32+ messages in thread
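The register dump quoted above can be tied to the reported fault address with a little arithmetic. The faulting instruction bytes marked in the Code: line, `<48> 8b 47 20`, decode to `mov 0x20(%rdi),%rax`, a load from RDI plus 0x20, and the dump shows RDI = 0x88. The sketch below just adds the two; the helper name fault_addr is hypothetical. Reading 0x88 as a member offset from a NULL base pointer is an inference that fits the NULL inode_to_wb() analysis elsewhere in this thread, not something the oops states directly.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): add a base register value
# and a displacement, both given in hex, and print the effective
# address in hex.
fault_addr() {
    printf '0x%x\n' $(( $1 + $2 ))
}

# RDI = 0x88 in the register dump; the load is from RDI + 0x20.
fault_addr 0x88 0x20   # prints 0xa8, matching CR2 in the oops
```

A small non-zero RDI followed by a small displacement is the usual signature of a field access through a NULL structure pointer, which is why the reported address is 0xa8 rather than 0.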
[parent not found: <20200123.225827.1155989593018204741.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org>]
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct [not found] ` <20200123.225827.1155989593018204741.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org> @ 2020-01-23 14:07 ` ARAI Shun-ichi 0 siblings, 0 replies; 32+ messages in thread From: ARAI Shun-ichi @ 2020-01-23 14:07 UTC (permalink / raw) To: linux-nilfs-u79uwXL29TY76Z2rM5mHXA Hi, I fogot to send it to this ML... In <20200123.225827.1155989593018204741.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org>; ARAI Shun-ichi <hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org> wrote as Subject "Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct": > Hi, > > It is reproducible in my environment. > Kernel version is 4.19.86 (Gentoo). > NILFS2 with kernel 4.19.82 works well. > > I did following for the test. > > i) mount corrupt partition with read-only option > (this partition causes "mounting fs with errors" at every rw mount) > i-1) wait a few minutes ... not crash > i-2) fs access (ls, du, ...) ... not crash > > ii) create small NILFS2 fs and read-write mount > dd if=/dev/zero of=/tmp/n bs=1M count=500 > mount -o loop /tmp/n /mnt/tmp > ii-1) wait a few minutes ... not crash > ii-2) touch file in the fs ... crash (in few seconds) > > > In <CAKFNMo=k1wVHOwXhTLEOJ+A-nwmvJ+sN_PPa8kY8fMxrQ4R+Jw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>; > Ryusuke Konishi <konishi.ryusuke-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote > as Subject "Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct": > >> Hi, >> >>> It was likely caused by improper shutdown and following nilfs2 partition >>> corruption. Now I can still read the data, but on the whole the >>> computer is not useable, because starting a process which uses the >>> corrupted file system simply crashes in kernel. >> >> Thank you for reporting the issue. 
>> Let me ask you a few questions: >> >> 1) Is the crash reproducible in the environment ? >> 2) Can you mount the corrupted(?) partition from a recent version of kernel ? >> 3) Does read-only mount option (-r) work to avoid the crash ? >> >> Thanks, >> Ryusuke Konishi >> >> 2019年11月18日(月) 2:34 Tomas Hlavaty <tom-3l5KtCzVe0PQT0dZR+AlfA@public.gmane.org>: >>> >>> Hi Ryusuke, >>> >>> today I got this bug in kernel, which seems to be related to nilfs2. >>> >>> It was likely caused by improper shutdown and following nilfs2 partition >>> corruption. Now I can still read the data, but on the whole the >>> computer is not useable, because starting a process which uses the >>> corrupted file system simply crashes in kernel. I am actually not sure >>> if the filesystem is corrupted, as I don't know about any tool to check >>> that. The relevant parts of dmesg log are bellow. >>> >>> Please let me know if you are the right contact or if you need more info >>> about the problem. >>> >>> Thank you, >>> >>> Tomas >>> >>> [ 0.000000] Linux version 4.19.84 (nixbld@localhost) (gcc version 8.3.0 (GCC)) #1-NixOS SMP Tue Nov 12 18:21:46 UTC 2019 >>> [ 0.000000] Command line: initrd=\efi\nixos\4s51zw36kd1qb0ymk0charxjg8x6k5k3-initrd-linux-4.19.84-initrd.efi systemConfig=/nix/store/gdbxhzysr929abrymjqala0b5bh2fqmv-nixos-system-ushi-19.09.1258.07e66484e67 init=/nix/store/gdbxhzysr929abrymjqala0b5bh2fqmv-nixos-system-ushi-19.09.1258.07e66484e67/init loglevel=4 >>> >>> >>> >>> [ 37.741106] systemd-journald[470]: Received client request to flush runtime journal. >>> [ 37.749084] systemd-journald[470]: File /var/log/journal/55a4ea9159c14c0bb8767a43819c6927/system.journal corrupted or uncleanly shut down, renaming and replacing. >>> [ 37.810819] audit: type=1130 audit(1573985039.617:3): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-udevd comm="systemd" exe="/nix/store/v8flm2h07zcfg5k5npz56m0ayj0qm1q8-systemd-243/lib/systemd/systemd" hostname=? addr=? 
terminal=? res=success' >>> >>> [ 38.321561] NILFS version 2 loaded >>> [ 38.323236] NILFS (dm-1): mounting unchecked fs >>> >>> >>> [ 38.349185] NILFS (dm-1): recovery complete >>> [ 38.353228] NILFS (dm-1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds >>> >>> [ 63.543941] systemd-journald[470]: File >>> /var/log/journal/55a4ea9159c14c0bb8767a43819c6927/user-1000.journal >>> corrupted or uncleanly shut down, renaming and replacing. >>> >>> [12637.085548] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 >>> [12637.085558] PGD 0 P4D 0 >>> [12637.085567] Oops: 0000 [#1] SMP PTI >>> [12637.085574] CPU: 0 PID: 657 Comm: segctord Not tainted 4.19.84 #1-NixOS >>> [12637.085577] Hardware name: ASUSTeK COMPUTER INC. VivoBook 15_ASUS Laptop X507MA_R507MA/X507MA, BIOS X507MA.301 09/14/2018 >>> [12637.085589] RIP: 0010:percpu_counter_add_batch+0x4/0x60 >>> [12637.085593] Code: 89 e6 89 c7 e8 dd 3b 28 00 3b 05 fb e0 b6 00 72 d8 4c 89 ee 48 89 ef e8 7a 63 2a 00 48 89 d8 5b 5d 41 5c 41 5d c3 41 54 55 53 <48> 8b 47 20 65 44 8b 20 49 63 ec 48 63 ca 48 01 f5 48 39 e9 7e 0a >>> [12637.085597] RSP: 0018:ffff9d1b00a0bd20 EFLAGS: 00010006 >>> [12637.085601] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 >>> [12637.085604] RDX: 0000000000000018 RSI: 0000000000000001 RDI: 0000000000000088 >>> [12637.085608] RBP: ffff8df67a2988d0 R08: 0000000000000000 R09: ffff8df66fe0cfe0 >>> [12637.085611] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000 >>> [12637.085614] R13: ffff8df67a298758 R14: ffff8df67a2988c8 R15: ffffccd684229a80 >>> [12637.085618] FS: 0000000000000000(0000) GS:ffff8df67ba00000(0000) knlGS:0000000000000000 >>> [12637.085621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [12637.085624] CR2: 00000000000000a8 CR3: 000000011ac0a000 CR4: 0000000000340ef0 >>> [12637.085628] Call Trace: >>> [12637.085640] __test_set_page_writeback+0x37c/0x3f0 >>> [12637.085663] 
nilfs_segctor_do_construct+0x184e/0x2040 [nilfs2] >>> [12637.085680] nilfs_segctor_construct+0x1f5/0x2e0 [nilfs2] >>> [12637.085693] nilfs_segctor_thread+0x129/0x370 [nilfs2] >>> [12637.085706] ? nilfs_segctor_construct+0x2e0/0x2e0 [nilfs2] >>> [12637.085713] kthread+0x112/0x130 >>> [12637.085719] ? kthread_bind+0x30/0x30 >>> [12637.085728] ret_from_fork+0x1f/0x40 >>> [12637.085734] Modules linked in: ctr ccm af_packet msr 8021q snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic hid_multitouch arc4 ath9k ath9k_common ath9k_hw ath mac80211 snd_soc_skl snd_soc_skl_ipc spi_pxa2xx_platform asus_nb_wmi snd_soc_sst_ipc snd_soc_sst_dsp asus_wmi 8250_dw i2c_designware_platform sparse_keymap i2c_designware_core wmi_bmof i915 snd_hda_ext_core nilfs2 snd_soc_acpi_intel_match snd_soc_acpi uvcvideo videobuf2_vmalloc nls_iso8859_1 videobuf2_memops videobuf2_v4l2 snd_soc_core nls_cp437 rtsx_usb_ms intel_telemetry_pltdrv vfat intel_punit_ipc intel_telemetry_core fat intel_pmc_ipc memstick videobuf2_common snd_compress kvmgt vfio_mdev mdev ath3k vfio_iommu_type1 vfio btusb ac97_bus snd_pcm_dmaengine btrtl x86_pkg_temp_thermal intel_powerclamp btbcm cec coretemp btintel >>> [12637.085819] crct10dif_pclmul crc32_pclmul videodev snd_hda_intel bluetooth drm_kms_helper ghash_clmulni_intel deflate media efi_pstore intel_cstate pstore intel_rapl_perf cfg80211 snd_hda_codec joydev mousedev evdev wdat_wdt serio_raw mac_hid efivars drm snd_hda_core snd_hwdep ecdh_generic snd_pcm snd_timer mei_me idma64 virt_dma snd intel_gtt agpgart i2c_i801 i2c_algo_bit mei fb_sys_fops syscopyarea soundcore rfkill processor_thermal_device sysfillrect sysimgblt intel_lpss_pci intel_soc_dts_iosf thermal wmi intel_lpss i2c_hid i2c_core battery tpm_crb button ac tpm_tis tpm_tis_core asus_wireless video pcc_cpufreq tpm rng_core pinctrl_geminilake int3400_thermal int3403_thermal pinctrl_intel int340x_thermal_zone acpi_thermal_rel iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack
nf_defrag_ipv6 >>> [12637.085912] nf_defrag_ipv4 libcrc32c ip6t_rpfilter ipt_rpfilter ip6table_raw iptable_raw xt_pkttype nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp ip6table_filter ip6_tables iptable_filter sch_fq_codel loop cpufreq_powersave tun tap macvlan bridge stp llc kvm_intel kvm irqbypass efivarfs ip_tables x_tables ipv6 crc_ccitt autofs4 ext4 crc32c_generic crc16 mbcache jbd2 fscrypto dm_crypt algif_skcipher af_alg rtsx_usb_sdmmc mmc_core rtsx_usb hid_generic usbhid hid sd_mod input_leds led_class atkbd libps2 ahci libahci xhci_pci libata xhci_hcd aesni_intel usbcore aes_x86_64 crypto_simd scsi_mod cryptd glue_helper crc32c_intel usb_common rtc_cmos i8042 serio dm_mod >>> [12637.086000] CR2: 00000000000000a8 >>> [12637.086005] ---[ end trace ee0079180c990cd2 ]--- >>> [12637.120805] RIP: 0010:percpu_counter_add_batch+0x4/0x60 >>> [12637.120807] Code: 89 e6 89 c7 e8 dd 3b 28 00 3b 05 fb e0 b6 00 72 d8 4c 89 ee 48 89 ef e8 7a 63 2a 00 48 89 d8 5b 5d 41 5c 41 5d c3 41 54 55 53 <48> 8b 47 20 65 44 8b 20 49 63 ec 48 63 ca 48 01 f5 48 39 e9 7e 0a >>> [12637.120809] RSP: 0018:ffff9d1b00a0bd20 EFLAGS: 00010006 >>> [12637.120811] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 >>> [12637.120812] RDX: 0000000000000018 RSI: 0000000000000001 RDI: 0000000000000088 >>> [12637.120814] RBP: ffff8df67a2988d0 R08: 0000000000000000 R09: ffff8df66fe0cfe0 >>> [12637.120815] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000 >>> [12637.120816] R13: ffff8df67a298758 R14: ffff8df67a2988c8 R15: ffffccd684229a80 >>> [12637.120818] FS: 0000000000000000(0000) GS:ffff8df67ba00000(0000) knlGS:0000000000000000 >>> [12637.120820] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [12637.120821] CR2: 00000000000000a8 CR3: 0000000138e0a000 CR4: 0000000000340ef0 ^ permalink raw reply [flat|nested] 32+ messages in thread
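Step ii) of the test above can be written down as a small script. This is only a sketch under stated assumptions: the image path /tmp/n and mount point /mnt/tmp are taken from the mail, while the mkfs.nilfs2 call (from nilfs-utils), the explicit "-t nilfs2", the file name "testfile", the trailing sync and the NILFS_REPRO_RUN switch are additions made here for illustration. By default it only prints the commands, because a real run on an affected kernel is reported to oops within seconds.

```shell
#!/bin/sh
# Sketch of reproduction step ii). Paths are the ones quoted in the mail.
IMG=${IMG:-/tmp/n}
MNT=${MNT:-/mnt/tmp}

# Print each command; execute it only when NILFS_REPRO_RUN=1 is set.
# (NILFS_REPRO_RUN is a hypothetical switch invented for this sketch.)
step() {
    echo "+ $*"
    if [ "${NILFS_REPRO_RUN:-0}" = "1" ]; then
        "$@"
    fi
}

repro() {
    # 500 MiB zero-filled backing file, as in the mail.
    step dd if=/dev/zero of="$IMG" bs=1M count=500 &&
    # Assumed step: the mail says "create small NILFS2 fs" without naming the command.
    step mkfs.nilfs2 "$IMG" &&
    # Read-write loopback mount.
    step mount -t nilfs2 -o loop "$IMG" "$MNT" &&
    # The write that was followed by the crash in the report.
    step touch "$MNT/testfile" &&
    # Ask for an immediate flush instead of waiting for the periodic writer.
    step sync
}

repro
```

Run as root with NILFS_REPRO_RUN=1 only on a machine that may crash; without it the script is a dry run.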
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct [not found] ` <20200123.225827.1155989593018204741.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org> @ 2020-01-23 14:30 ` ARAI Shun-ichi 0 siblings, 0 replies; 32+ messages in thread From: ARAI Shun-ichi @ 2020-01-23 14:30 UTC (permalink / raw) To: linux-nilfs, linux-kernel Hi, Now I found that my /tmp is not tmpfs, it is in root partition (!!!???) And, I use LUKS for it. LUKS - VG - LV (root, usr, ...) I want to try "ii)" without LUKS/LVM, but cannot reboot now. In <20200123.225827.1155989593018204741.hermes@ceres.dti.ne.jp>; ARAI Shun-ichi <hermes@ceres.dti.ne.jp> wrote as Subject "Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct": > Hi, > > It is reproducible in my environment. > Kernel version is 4.19.86 (Gentoo). > NILFS2 with kernel 4.19.82 works well. > > I did following for the test. > > i) mount corrupt partition with read-only option > (this partition causes "mounting fs with errors" at every rw mount) > i-1) wait a few minutes ... not crash > i-2) fs access (ls, du, ...) ... not crash > > ii) create small NILFS2 fs and read-write mount > dd if=/dev/zero of=/tmp/n bs=1M count=500 > mount -o loop /tmp/n /mnt/tmp > ii-1) wait a few minutes ... not crash > ii-2) touch file in the fs ... crash (in few seconds) > > > In <CAKFNMo=k1wVHOwXhTLEOJ+A-nwmvJ+sN_PPa8kY8fMxrQ4R+Jw@mail.gmail.com>; > Ryusuke Konishi <konishi.ryusuke@gmail.com> wrote > as Subject "Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct": > >> Hi, >> >>> It was likely caused by improper shutdown and following nilfs2 partition >>> corruption. Now I can still read the data, but on the whole the >>> computer is not useable, because starting a process which uses the >>> corrupted file system simply crashes in kernel. >> >> Thank you for reporting the issue. 
>> Let me ask you a few questions: >> >> 1) Is the crash reproducible in the environment ? >> 2) Can you mount the corrupted(?) partition from a recent version of kernel ? >> 3) Does read-only mount option (-r) work to avoid the crash ? >> >> Thanks, >> Ryusuke Konishi >> >> 2019年11月18日(月) 2:34 Tomas Hlavaty <tom@logand.com>: >>> >>> Hi Ryusuke, >>> >>> today I got this bug in kernel, which seems to be related to nilfs2. >>> >>> It was likely caused by improper shutdown and following nilfs2 partition >>> corruption. Now I can still read the data, but on the whole the >>> computer is not useable, because starting a process which uses the >>> corrupted file system simply crashes in kernel. I am actually not sure >>> if the filesystem is corrupted, as I don't know about any tool to check >>> that. The relevant parts of dmesg log are bellow. >>> >>> Please let me know if you are the right contact or if you need more info >>> about the problem. >>> >>> Thank you, >>> >>> Tomas >>> >>> [ 0.000000] Linux version 4.19.84 (nixbld@localhost) (gcc version 8.3.0 (GCC)) #1-NixOS SMP Tue Nov 12 18:21:46 UTC 2019 >>> [ 0.000000] Command line: initrd=\efi\nixos\4s51zw36kd1qb0ymk0charxjg8x6k5k3-initrd-linux-4.19.84-initrd.efi systemConfig=/nix/store/gdbxhzysr929abrymjqala0b5bh2fqmv-nixos-system-ushi-19.09.1258.07e66484e67 init=/nix/store/gdbxhzysr929abrymjqala0b5bh2fqmv-nixos-system-ushi-19.09.1258.07e66484e67/init loglevel=4 >>> >>> >>> >>> [ 37.741106] systemd-journald[470]: Received client request to flush runtime journal. >>> [ 37.749084] systemd-journald[470]: File /var/log/journal/55a4ea9159c14c0bb8767a43819c6927/system.journal corrupted or uncleanly shut down, renaming and replacing. >>> [ 37.810819] audit: type=1130 audit(1573985039.617:3): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-udevd comm="systemd" exe="/nix/store/v8flm2h07zcfg5k5npz56m0ayj0qm1q8-systemd-243/lib/systemd/systemd" hostname=? addr=? terminal=? 
res=success' >>> >>> [ 38.321561] NILFS version 2 loaded >>> [ 38.323236] NILFS (dm-1): mounting unchecked fs >>> >>> >>> [ 38.349185] NILFS (dm-1): recovery complete >>> [ 38.353228] NILFS (dm-1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds >>> >>> [ 63.543941] systemd-journald[470]: File >>> /var/log/journal/55a4ea9159c14c0bb8767a43819c6927/user-1000.journal >>> corrupted or uncleanly shut down, renaming and replacing. >>> >>> [12637.085548] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 >>> [12637.085558] PGD 0 P4D 0 >>> [12637.085567] Oops: 0000 [#1] SMP PTI >>> [12637.085574] CPU: 0 PID: 657 Comm: segctord Not tainted 4.19.84 #1-NixOS >>> [12637.085577] Hardware name: ASUSTeK COMPUTER INC. VivoBook 15_ASUS Laptop X507MA_R507MA/X507MA, BIOS X507MA.301 09/14/2018 >>> [12637.085589] RIP: 0010:percpu_counter_add_batch+0x4/0x60 >>> [12637.085593] Code: 89 e6 89 c7 e8 dd 3b 28 00 3b 05 fb e0 b6 00 72 d8 4c 89 ee 48 89 ef e8 7a 63 2a 00 48 89 d8 5b 5d 41 5c 41 5d c3 41 54 55 53 <48> 8b 47 20 65 44 8b 20 49 63 ec 48 63 ca 48 01 f5 48 39 e9 7e 0a >>> [12637.085597] RSP: 0018:ffff9d1b00a0bd20 EFLAGS: 00010006 >>> [12637.085601] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 >>> [12637.085604] RDX: 0000000000000018 RSI: 0000000000000001 RDI: 0000000000000088 >>> [12637.085608] RBP: ffff8df67a2988d0 R08: 0000000000000000 R09: ffff8df66fe0cfe0 >>> [12637.085611] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000 >>> [12637.085614] R13: ffff8df67a298758 R14: ffff8df67a2988c8 R15: ffffccd684229a80 >>> [12637.085618] FS: 0000000000000000(0000) GS:ffff8df67ba00000(0000) knlGS:0000000000000000 >>> [12637.085621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [12637.085624] CR2: 00000000000000a8 CR3: 000000011ac0a000 CR4: 0000000000340ef0 >>> [12637.085628] Call Trace: >>> [12637.085640] __test_set_page_writeback+0x37c/0x3f0 >>> [12637.085663] 
nilfs_segctor_do_construct+0x184e/0x2040 [nilfs2] >>> [12637.085680] nilfs_segctor_construct+0x1f5/0x2e0 [nilfs2] >>> [12637.085693] nilfs_segctor_thread+0x129/0x370 [nilfs2] >>> [12637.085706] ? nilfs_segctor_construct+0x2e0/0x2e0 [nilfs2] >>> [12637.085713] kthread+0x112/0x130 >>> [12637.085719] ? kthread_bind+0x30/0x30 >>> [12637.085728] ret_from_fork+0x1f/0x40 >>> [12637.085734] Modules linked in: ctr ccm af_packet msr 8021q snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic hid_multitouch arc4 ath9k ath9k_common ath9k_hw ath mac80211 snd_soc_skl snd_soc_skl_ipc spi_pxa2xx_platform asus_nb_wmi snd_soc_sst_ipc snd_soc_sst_dsp asus_wmi 8250_dw i2c_designware_platform sparse_keymap i2c_designware_core wmi_bmof i915 snd_hda_ext_core nilfs2 snd_soc_acpi_intel_match snd_soc_acpi uvcvideo videobuf2_vmalloc nls_iso8859_1 videobuf2_memops videobuf2_v4l2 snd_soc_core nls_cp437 rtsx_usb_ms intel_telemetry_pltdrv vfat intel_punit_ipc intel_telemetry_core fat intel_pmc_ipc memstick videobuf2_common snd_compress kvmgt vfio_mdev mdev ath3k vfio_iommu_type1 vfio btusb ac97_bus snd_pcm_dmaengine btrtl x86_pkg_temp_thermal intel_powerclamp btbcm cec coretemp btintel >>> [12637.085819] crct10dif_pclmul crc32_pclmul videodev snd_hda_intel bluetooth drm_kms_helper ghash_clmulni_intel deflate media efi_pstore intel_cstate pstore intel_rapl_perf cfg80211 snd_hda_codec joydev mousedev evdev wdat_wdt serio_raw mac_hid efivars drm snd_hda_core snd_hwdep ecdh_generic snd_pcm snd_timer mei_me idma64 virt_dma snd intel_gtt agpgart i2c_i801 i2c_algo_bit mei fb_sys_fops syscopyarea soundcore rfkill processor_thermal_device sysfillrect sysimgblt intel_lpss_pci intel_soc_dts_iosf thermal wmi intel_lpss i2c_hid i2c_core battery tpm_crb button ac tpm_tis tpm_tis_core asus_wireless video pcc_cpufreq tpm rng_core pinctrl_geminilake int3400_thermal int3403_thermal pinctrl_intel int340x_thermal_zone acpi_thermal_rel iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6 
>>> [12637.085912] nf_defrag_ipv4 libcrc32c ip6t_rpfilter ipt_rpfilter ip6table_raw iptable_raw xt_pkttype nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp ip6table_filter ip6_tables iptable_filter sch_fq_codel loop cpufreq_powersave tun tap macvlan bridge stp llc kvm_intel kvm irqbypass efivarfs ip_tables x_tables ipv6 crc_ccitt autofs4 ext4 crc32c_generic crc16 mbcache jbd2 fscrypto dm_crypt algif_skcipher af_alg rtsx_usb_sdmmc mmc_core rtsx_usb hid_generic usbhid hid sd_mod input_leds led_class atkbd libps2 ahci libahci xhci_pci libata xhci_hcd aesni_intel usbcore aes_x86_64 crypto_simd scsi_mod cryptd glue_helper crc32c_intel usb_common rtc_cmos i8042 serio dm_mod >>> [12637.086000] CR2: 00000000000000a8 >>> [12637.086005] ---[ end trace ee0079180c990cd2 ]--- >>> [12637.120805] RIP: 0010:percpu_counter_add_batch+0x4/0x60 >>> [12637.120807] Code: 89 e6 89 c7 e8 dd 3b 28 00 3b 05 fb e0 b6 00 72 d8 4c 89 ee 48 89 ef e8 7a 63 2a 00 48 89 d8 5b 5d 41 5c 41 5d c3 41 54 55 53 <48> 8b 47 20 65 44 8b 20 49 63 ec 48 63 ca 48 01 f5 48 39 e9 7e 0a >>> [12637.120809] RSP: 0018:ffff9d1b00a0bd20 EFLAGS: 00010006 >>> [12637.120811] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018 >>> [12637.120812] RDX: 0000000000000018 RSI: 0000000000000001 RDI: 0000000000000088 >>> [12637.120814] RBP: ffff8df67a2988d0 R08: 0000000000000000 R09: ffff8df66fe0cfe0 >>> [12637.120815] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000 >>> [12637.120816] R13: ffff8df67a298758 R14: ffff8df67a2988c8 R15: ffffccd684229a80 >>> [12637.120818] FS: 0000000000000000(0000) GS:ffff8df67ba00000(0000) knlGS:0000000000000000 >>> [12637.120820] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [12637.120821] CR2: 00000000000000a8 CR3: 0000000138e0a000 CR4: 0000000000340ef0 ^ permalink raw reply [flat|nested] 32+ messages in thread
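A note on reading the quoted oops: the byte marked in the Code: line, "<48> 8b 47 20", decodes as a 64-bit load from RDI+0x20 (mov 0x20(%rdi),%rax), RDI is 0000000000000088 and CR2 is 00000000000000a8. The small check below only redoes that arithmetic; the function name fault_addr is made up for this sketch. The match is consistent with percpu_counter_add_batch receiving a pointer computed as a field offset from a NULL base, although the thread so far does not say which structure that is.

```shell
#!/bin/sh
# Values copied from the oops in the quoted report.
RDI=0x88    # first argument register at the time of the fault
CR2=0xa8    # faulting address
DISP=0x20   # displacement of the faulting load: 48 8b 47 20 = mov 0x20(%rdi),%rax

# fault_addr BASE DISP: print BASE+DISP in hex.
fault_addr() {
    printf '0x%x\n' "$(( $1 + $2 ))"
}

addr=$(fault_addr "$RDI" "$DISP")
if [ "$(( $RDI + $DISP ))" -eq "$(( $CR2 ))" ]; then
    echo "$addr matches CR2"
else
    echo "$addr does not match CR2"
fi
```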
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct
       [not found] ` <20200123.225827.1155989593018204741.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org>
@ 2020-02-10 13:46 ` ARAI Shun-ichi
  0 siblings, 0 replies; 32+ messages in thread
From: ARAI Shun-ichi @ 2020-02-10 13:46 UTC (permalink / raw)
  To: linux-kernel, linux-nilfs

Hi,

FYI, reporting additional test results.

I reproduced this problem with a clean NILFS2 fs in the previous mail.
"Clean" means that the filesystem was made fresh before every test.

In this mail, I tried to reproduce it with and without VG/LV, LUKS, and loopback.

* Not reproduced
  USB stick - primary partition - NILFS2
  USB stick - primary partition - VG/LV - NILFS2
  USB stick - primary partition - VG/LV - LUKS - NILFS2
  USB stick - primary partition - LUKS - VG/LV - NILFS2
  USB stick - primary partition - LUKS - VG/LV - LUKS - NILFS2
  /tmp (tmpfs) - regular file - NILFS2 (loopback mount, kernel 4.19.82)
  USB stick - primary partition(512MiB) - NILFS2

* Reproduced (always, immediately)
  /tmp (tmpfs) - regular file - NILFS2 (loopback mount)
  USB stick - primary partition - ext4 - regular file - NILFS2 (loopback mount)

Test conditions:
  kernel 4.19.86 (same as previous test)
  NILFS2/ext4 filesystems, VG/LV, and LUKS were made with default parameters
  size of "primary partition" on the USB stick is approx. 14GiB
  size of "regular file" is approx. 512MiB
  "reproduce": mount NILFS2, touch file, sync

In <20200123.225827.1155989593018204741.hermes@ceres.dti.ne.jp>;
ARAI Shun-ichi <hermes@ceres.dti.ne.jp> wrote
as Subject "Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct":

> Hi,
>
> It is reproducible in my environment.
> Kernel version is 4.19.86 (Gentoo).
> NILFS2 with kernel 4.19.82 works well.
>
> I did following for the test.
>
> i) mount corrupt partition with read-only option
>    (this partition causes "mounting fs with errors" at every rw mount)
>  i-1) wait a few minutes ...
not crash
>   i-2) fs access (ls, du, ...) ... not crash
>
> ii) create small NILFS2 fs and read-write mount
>       dd if=/dev/zero of=/tmp/n bs=1M count=500
>       mount -o loop /tmp/n /mnt/tmp
>   ii-1) wait a few minutes ... not crash
>   ii-2) touch file in the fs ... crash (in a few seconds)
>
>
> In <CAKFNMo=k1wVHOwXhTLEOJ+A-nwmvJ+sN_PPa8kY8fMxrQ4R+Jw@mail.gmail.com>;
> Ryusuke Konishi <konishi.ryusuke@gmail.com> wrote
> as Subject "Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct":
>
>> Hi,
>>
>>> It was likely caused by improper shutdown and following nilfs2 partition
>>> corruption. Now I can still read the data, but on the whole the
>>> computer is not useable, because starting a process which uses the
>>> corrupted file system simply crashes in kernel.
>>
>> Thank you for reporting the issue.
>> Let me ask you a few questions:
>>
>> 1) Is the crash reproducible in the environment ?
>> 2) Can you mount the corrupted(?) partition from a recent version of kernel ?
>> 3) Does read-only mount option (-r) work to avoid the crash ?
>>
>> Thanks,
>> Ryusuke Konishi
>>
>> On Mon, Nov 18, 2019 at 2:34, Tomas Hlavaty <tom@logand.com> wrote:
>>>
>>> Hi Ryusuke,
>>>
>>> today I got this bug in kernel, which seems to be related to nilfs2.
>>>
>>> It was likely caused by improper shutdown and following nilfs2 partition
>>> corruption. Now I can still read the data, but on the whole the
>>> computer is not useable, because starting a process which uses the
>>> corrupted file system simply crashes in kernel. I am actually not sure
>>> if the filesystem is corrupted, as I don't know about any tool to check
>>> that. The relevant parts of dmesg log are below.
>>>
>>> Please let me know if you are the right contact or if you need more info
>>> about the problem.
>>>
>>> Thank you,
>>>
>>> Tomas
>>>
>>> [ 0.000000] Linux version 4.19.84 (nixbld@localhost) (gcc version 8.3.0 (GCC)) #1-NixOS SMP Tue Nov 12 18:21:46 UTC 2019
>>> [ 0.000000] Command line: initrd=\efi\nixos\4s51zw36kd1qb0ymk0charxjg8x6k5k3-initrd-linux-4.19.84-initrd.efi systemConfig=/nix/store/gdbxhzysr929abrymjqala0b5bh2fqmv-nixos-system-ushi-19.09.1258.07e66484e67 init=/nix/store/gdbxhzysr929abrymjqala0b5bh2fqmv-nixos-system-ushi-19.09.1258.07e66484e67/init loglevel=4
>>>
>>>
>>>
>>> [ 37.741106] systemd-journald[470]: Received client request to flush runtime journal.
>>> [ 37.749084] systemd-journald[470]: File /var/log/journal/55a4ea9159c14c0bb8767a43819c6927/system.journal corrupted or uncleanly shut down, renaming and replacing.
>>> [ 37.810819] audit: type=1130 audit(1573985039.617:3): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-udevd comm="systemd" exe="/nix/store/v8flm2h07zcfg5k5npz56m0ayj0qm1q8-systemd-243/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
>>>
>>> [ 38.321561] NILFS version 2 loaded
>>> [ 38.323236] NILFS (dm-1): mounting unchecked fs
>>>
>>>
>>> [ 38.349185] NILFS (dm-1): recovery complete
>>> [ 38.353228] NILFS (dm-1): segctord starting. Construction interval = 5 seconds, CP frequency < 30 seconds
>>>
>>> [ 63.543941] systemd-journald[470]: File
>>> /var/log/journal/55a4ea9159c14c0bb8767a43819c6927/user-1000.journal
>>> corrupted or uncleanly shut down, renaming and replacing.
>>>
>>> [12637.085548] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
>>> [12637.085558] PGD 0 P4D 0
>>> [12637.085567] Oops: 0000 [#1] SMP PTI
>>> [12637.085574] CPU: 0 PID: 657 Comm: segctord Not tainted 4.19.84 #1-NixOS
>>> [12637.085577] Hardware name: ASUSTeK COMPUTER INC. VivoBook 15_ASUS Laptop X507MA_R507MA/X507MA, BIOS X507MA.301 09/14/2018
>>> [12637.085589] RIP: 0010:percpu_counter_add_batch+0x4/0x60
>>> [12637.085593] Code: 89 e6 89 c7 e8 dd 3b 28 00 3b 05 fb e0 b6 00 72 d8 4c 89 ee 48 89 ef e8 7a 63 2a 00 48 89 d8 5b 5d 41 5c 41 5d c3 41 54 55 53 <48> 8b 47 20 65 44 8b 20 49 63 ec 48 63 ca 48 01 f5 48 39 e9 7e 0a
>>> [12637.085597] RSP: 0018:ffff9d1b00a0bd20 EFLAGS: 00010006
>>> [12637.085601] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018
>>> [12637.085604] RDX: 0000000000000018 RSI: 0000000000000001 RDI: 0000000000000088
>>> [12637.085608] RBP: ffff8df67a2988d0 R08: 0000000000000000 R09: ffff8df66fe0cfe0
>>> [12637.085611] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000
>>> [12637.085614] R13: ffff8df67a298758 R14: ffff8df67a2988c8 R15: ffffccd684229a80
>>> [12637.085618] FS: 0000000000000000(0000) GS:ffff8df67ba00000(0000) knlGS:0000000000000000
>>> [12637.085621] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [12637.085624] CR2: 00000000000000a8 CR3: 000000011ac0a000 CR4: 0000000000340ef0
>>> [12637.085628] Call Trace:
>>> [12637.085640] __test_set_page_writeback+0x37c/0x3f0
>>> [12637.085663] nilfs_segctor_do_construct+0x184e/0x2040 [nilfs2]
>>> [12637.085680] nilfs_segctor_construct+0x1f5/0x2e0 [nilfs2]
>>> [12637.085693] nilfs_segctor_thread+0x129/0x370 [nilfs2]
>>> [12637.085706] ? nilfs_segctor_construct+0x2e0/0x2e0 [nilfs2]
>>> [12637.085713] kthread+0x112/0x130
>>> [12637.085719] ? kthread_bind+0x30/0x30
>>> [12637.085728] ret_from_fork+0x1f/0x40
>>> [12637.085734] Modules linked in: ctr ccm af_packet msr 8021q snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic hid_multitouch arc4 ath9k ath9k_common ath9k_hw ath mac80211 snd_soc_skl snd_soc_skl_ipc spi_pxa2xx_platform asus_nb_wmi snd_soc_sst_ipc snd_soc_sst_dsp asus_wmi 8250_dw i2c_designware_platform sparse_keymap i2c_designware_core wmi_bmof i915 snd_hda_ext_core nilfs2 snd_soc_acpi_intel_match snd_soc_acpi uvcvideo videobuf2_vmalloc nls_iso8859_1 videobuf2_memops videobuf2_v4l2 snd_soc_core nls_cp437 rtsx_usb_ms intel_telemetry_pltdrv vfat intel_punit_ipc intel_telemetry_core fat intel_pmc_ipc memstick videobuf2_common snd_compress kvmgt vfio_mdev mdev ath3k vfio_iommu_type1 vfio btusb ac97_bus snd_pcm_dmaengine btrtl x86_pkg_temp_thermal intel_powerclamp btbcm cec coretemp btintel
>>> [12637.085819] crct10dif_pclmul crc32_pclmul videodev snd_hda_intel bluetooth drm_kms_helper ghash_clmulni_intel deflate media efi_pstore intel_cstate pstore intel_rapl_perf cfg80211 snd_hda_codec joydev mousedev evdev wdat_wdt serio_raw mac_hid efivars drm snd_hda_core snd_hwdep ecdh_generic snd_pcm snd_timer mei_me idma64 virt_dma snd intel_gtt agpgart i2c_i801 i2c_algo_bit mei fb_sys_fops syscopyarea soundcore rfkill processor_thermal_device sysfillrect sysimgblt intel_lpss_pci intel_soc_dts_iosf thermal wmi intel_lpss i2c_hid i2c_core battery tpm_crb button ac tpm_tis tpm_tis_core asus_wireless video pcc_cpufreq tpm rng_core pinctrl_geminilake int3400_thermal int3403_thermal pinctrl_intel int340x_thermal_zone acpi_thermal_rel iptable_nat nf_nat_ipv4 nf_nat xt_conntrack nf_conntrack nf_defrag_ipv6
>>> [12637.085912] nf_defrag_ipv4 libcrc32c ip6t_rpfilter ipt_rpfilter ip6table_raw iptable_raw xt_pkttype nf_log_ipv6 nf_log_ipv4 nf_log_common xt_LOG xt_tcpudp ip6table_filter ip6_tables iptable_filter sch_fq_codel loop cpufreq_powersave tun tap macvlan bridge stp llc kvm_intel kvm irqbypass efivarfs ip_tables x_tables ipv6 crc_ccitt autofs4 ext4 crc32c_generic crc16 mbcache jbd2 fscrypto dm_crypt algif_skcipher af_alg rtsx_usb_sdmmc mmc_core rtsx_usb hid_generic usbhid hid sd_mod input_leds led_class atkbd libps2 ahci libahci xhci_pci libata xhci_hcd aesni_intel usbcore aes_x86_64 crypto_simd scsi_mod cryptd glue_helper crc32c_intel usb_common rtc_cmos i8042 serio dm_mod
>>> [12637.086000] CR2: 00000000000000a8
>>> [12637.086005] ---[ end trace ee0079180c990cd2 ]---
>>> [12637.120805] RIP: 0010:percpu_counter_add_batch+0x4/0x60
>>> [12637.120807] Code: 89 e6 89 c7 e8 dd 3b 28 00 3b 05 fb e0 b6 00 72 d8 4c 89 ee 48 89 ef e8 7a 63 2a 00 48 89 d8 5b 5d 41 5c 41 5d c3 41 54 55 53 <48> 8b 47 20 65 44 8b 20 49 63 ec 48 63 ca 48 01 f5 48 39 e9 7e 0a
>>> [12637.120809] RSP: 0018:ffff9d1b00a0bd20 EFLAGS: 00010006
>>> [12637.120811] RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000018
>>> [12637.120812] RDX: 0000000000000018 RSI: 0000000000000001 RDI: 0000000000000088
>>> [12637.120814] RBP: ffff8df67a2988d0 R08: 0000000000000000 R09: ffff8df66fe0cfe0
>>> [12637.120815] R10: 0000000000000230 R11: 0000000000000000 R12: 0000000000000000
>>> [12637.120816] R13: ffff8df67a298758 R14: ffff8df67a2988c8 R15: ffffccd684229a80
>>> [12637.120818] FS: 0000000000000000(0000) GS:ffff8df67ba00000(0000) knlGS:0000000000000000
>>> [12637.120820] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [12637.120821] CR2: 00000000000000a8 CR3: 0000000138e0a000 CR4: 0000000000340ef0

^ permalink raw reply	[flat|nested] 32+ messages in thread
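As a reading aid for the oops quoted above: the marked bytes in the Code: line, <48> 8b 47 20, decode to "mov 0x20(%rdi),%rax", so the faulting address should be RDI plus 0x20. The sketch below only redoes that arithmetic with the register values from the log. The function name fault_addr is hypothetical, and the further reading (that percpu_counter_add_batch was handed a pointer at offset 0x88 from a NULL base) is an inference from the registers, not something confirmed in this thread.

```shell
#!/bin/sh
# Values copied from the quoted oops; fault_addr is a hypothetical helper name.
RDI=0x88      # RDI: 0000000000000088 (first argument register on x86-64)
DISP=0x20     # displacement in "mov 0x20(%rdi),%rax" (bytes 48 8b 47 20)
CR2=0xa8      # CR2: 00000000000000a8 (the address that faulted)

# Add a base register value and a displacement, print the result in hex.
fault_addr() {
    printf '0x%x\n' $(( $1 + $2 ))
}

addr=$(fault_addr "$RDI" "$DISP")
if [ "$addr" = "$CR2" ]; then
    echo "RDI + disp = $addr, matches CR2"
else
    echo "RDI + disp = $addr, does not match CR2 ($CR2)"
fi
```

The match says the small address 0xa8 is a structure offset applied to a NULL-based pointer rather than a random wild pointer.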
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct
@ 2020-02-16  2:10 ` ARAI Shun-ichi
  0 siblings, 0 replies; 32+ messages in thread
From: ARAI Shun-ichi @ 2020-02-16  2:10 UTC (permalink / raw)
  To: linux-kernel, linux-nilfs

And,

In <20200210.224609.499887311281343618.hermes@ceres.dti.ne.jp>;
ARAI Shun-ichi <hermes@ceres.dti.ne.jp> wrote
as Subject "Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct":

> Hi,
>
> FYI, reporting additional test results.
>
> I reproduced this problem with clean NILFS2 fs in previous mail.
> "clean" means that "make filesystem before every test."
> In this mail, I tried to reproduce with/without VG/LV, LUKS, loopback.
>
> * Not reproduced
> USB stick - primary partition - NILFS2
> USB stick - primary partition - VG/LV - NILFS2
> USB stick - primary partition - VG/LV - LUKS - NILFS2
> USB stick - primary partition - LUKS - VG/LV - NILFS2
> USB stick - primary partition - LUKS - VG/LV - LUKS - NILFS2
> /tmp (tmpfs) - regular file - NILFS2 (loopback mount, kernel 4.19.82)
> USB stick - primary partition(512MiB) - NILFS2
>
> * Reproduced (always, immediately)
> /tmp (tmpfs) - regular file - NILFS2 (loopback mount)
> USB stick - primary partition - ext4 - regular file - NILFS2 (loopback mount)

This loopback problem is also seen in kernel 5.5.4.

> Test conditions:
> kernel 4.19.86 (same as previous test)
> NILFS2/ext4 filesystem, VG/LV, LUKS were made with default parameters
> size of "primary partition" in USB stick is approx. 14GiB
> size of "regular file" is approx. 512MiB
> "reproduce": mount NILFS2, touch file, sync

^ permalink raw reply	[flat|nested] 32+ messages in thread
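The "reproduce" recipe in the message above (fresh NILFS2 on a regular file, loopback mount, touch a file, sync) could be scripted roughly as follows. This is a sketch under assumptions, not the reporter's actual script: the paths follow the earlier dd/mount example in this thread, the file name testfile is made up, and it is assumed that mkfs.nilfs2 from nilfs-utils accepts a regular file (if it does not, attach the file with losetup first). On an affected kernel the touch/sync step is expected to oops, so the script only prints the commands unless DRY_RUN=0 is set, and it should only be run for real as root inside a throwaway VM.

```shell
#!/bin/sh
# Hypothetical reproduction sketch; IMG, MNT and "testfile" are assumed names.
IMG=${IMG:-/tmp/n}          # backing file for the loopback mount
MNT=${MNT:-/mnt/tmp}        # mount point
DRY_RUN=${DRY_RUN:-1}       # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro() {
    run dd if=/dev/zero of="$IMG" bs=1M count=500   # approx. 500 MiB regular file
    run mkfs.nilfs2 "$IMG"                          # "clean" fs, default parameters
    run mkdir -p "$MNT"
    run mount -t nilfs2 -o loop "$IMG" "$MNT"       # read-write loopback mount
    run touch "$MNT/testfile"                       # first write to the filesystem
    run sync                                        # push the write out
}

repro
```

Running it as "DRY_RUN=0 sh repro.sh" (the script name is arbitrary) performs the steps for real. Keeping dry-run as the default is deliberate, since a successful reproduction takes the machine down.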
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct
@ 2020-02-16  2:24 ` Brian G.
  0 siblings, 0 replies; 32+ messages in thread
From: Brian G. @ 2020-02-16  2:24 UTC (permalink / raw)
  To: ARAI Shun-ichi; +Cc: linux-kernel, linux-nilfs

This is my first post to the LKML, so please be kind :)

I have also been affected by this bug. The bug is triggered whenever a
write happens to the filesystem, which means mounting read-only is an
available option to recover data. I took the time to do a full bisect
on the kernel sources and have identified the commit where the
breakage happens.

Regarding versions, I can confirm that 4.19.83 is stable with regard
to NILFS, and that 4.19.84 and later are broken. I can also confirm
that 5.3.10 works fine, and I have heard that 5.3.12 breaks NILFS as
well. I can also confirm that the 5.4.18 kernel still has this issue.
I did not trace how far back the issue goes in the 5.4.x series, or
look in more detail at the 5.3.x series.

To simplify my bisection task, I used the 4.19.x series and determined
that commit d3b3c0a14615c495118acc4bdca23d53eea46ed2 is the commit
that breaks NILFS. Furthermore, when this commit is reverted on
otherwise clean 4.19.84 kernel sources, the NILFS issue no longer
occurs.

I'm not familiar enough with NILFS's internals to determine why the
small caching change to the kernel from that commit breaks NILFS, nor
can I offer a patch to fix it (besides reverting the offending
change), but I can confirm that this is the initial cause. I also know
there have been a lot of new changes to kernel caching in more recent
(5.4 / 5.5 / 5.6) kernels, so perhaps there are still more diagnostics
to do.

I have the test VM that I used for bisection available if someone
wants to coordinate with me to put together a patch for this, but
ideally someone can take my diagnostics effort here and make use of it
directly. I saved dmesg logs from both good and bad cases and I can
send them if someone is interested. I can also provide system setup
instructions, in some detail, to reproduce the issue.

I did my testing against an existing external hard drive, but I have
been able to reproduce the issue consistently against a freshly
created loopback mount as well, so it is not just caused by disk
corruption or an unclean unmount.

- Brian

On Sat, Feb 15, 2020 at 8:11 PM ARAI Shun-ichi <hermes@ceres.dti.ne.jp> wrote:
>
> And,
>
> In <20200210.224609.499887311281343618.hermes@ceres.dti.ne.jp>;
> ARAI Shun-ichi <hermes@ceres.dti.ne.jp> wrote
> as Subject "Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct":
>
> > Hi,
> >
> > FYI, reporting additional test results.
> >
> > I reproduced this problem with clean NILFS2 fs in previous mail.
> > "clean" means that "make filesystem before every test."
> > In this mail, I tried to reproduce with/without VG/LV, LUKS, loopback.
> >
> > * Not reproduced
> > USB stick - primary partition - NILFS2
> > USB stick - primary partition - VG/LV - NILFS2
> > USB stick - primary partition - VG/LV - LUKS - NILFS2
> > USB stick - primary partition - LUKS - VG/LV - NILFS2
> > USB stick - primary partition - LUKS - VG/LV - LUKS - NILFS2
> > /tmp (tmpfs) - regular file - NILFS2 (loopback mount, kernel 4.19.82)
> > USB stick - primary partition(512MiB) - NILFS2
> >
> > * Reproduced (always, immediately)
> > /tmp (tmpfs) - regular file - NILFS2 (loopback mount)
> > USB stick - primary partition - ext4 - regular file - NILFS2 (loopback mount)
>
> This loopback problem is also seen in kernel 5.5.4.
>
> > Test conditions:
> > kernel 4.19.86 (same as previous test)
> > NILFS2/ext4 filesystem, VG/LV, LUKS were made with default parameters
> > size of "primary partition" in USB stick is approx. 14GiB
> > size of "regular file" is approx. 512MiB
> > "reproduce": mount NILFS2, touch file, sync

^ permalink raw reply	[flat|nested] 32+ messages in thread
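For readers who want to repeat the bisection described in the message above, the workflow might look like the sketch below. It is an illustration under assumptions, not Brian's actual procedure: LINUX_TREE is a hypothetical path to a clone of the stable kernel tree that carries the v4.19.83 and v4.19.84 tags, and the build, boot and NILFS write test between bisect steps are left to the reader. As a safety default the script only prints the git commands (DRY_RUN=1).

```shell
#!/bin/sh
# Hypothetical bisect sketch; LINUX_TREE is an assumed location of a stable-tree clone.
LINUX_TREE=${LINUX_TREE:-$HOME/src/linux-stable}
GOOD=v4.19.83      # reported as working with NILFS
BAD=v4.19.84       # reported as broken
SUSPECT=d3b3c0a14615c495118acc4bdca23d53eea46ed2   # commit named in the report
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Start bisecting between the known-bad and known-good releases.
bisect_start() {
    run git -C "$LINUX_TREE" bisect start "$BAD" "$GOOD"
}

# After building and testing the kernel git checked out, record the verdict.
mark() {
    case "$1" in
        good|bad) run git -C "$LINUX_TREE" bisect "$1" ;;
        *) echo "usage: mark good|bad" >&2; return 2 ;;
    esac
}

# Cross-check: revert the suspect commit on top of the bad release and retest.
check_revert() {
    run git -C "$LINUX_TREE" checkout "$BAD"
    run git -C "$LINUX_TREE" revert --no-edit "$SUSPECT"
}

bisect_start
```

A typical session would call bisect_start once, then "mark good" or "mark bad" after each test kernel, and finally check_revert (after "git bisect reset") to confirm that removing the identified commit makes the crash go away.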
* Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct
@ 2020-02-16  3:59 ` Ryusuke Konishi
  0 siblings, 0 replies; 32+ messages in thread
From: Ryusuke Konishi @ 2020-02-16  3:59 UTC (permalink / raw)
  To: ARAI Shun-ichi; +Cc: LKML, linux-nilfs, Brian G.

Thank you Arai-san,

Your loopback-device method worked to reproduce the issue, even in an
environment where the bug does not easily hit physical devices.

Regards,
Ryusuke Konishi

On Sun, Feb 16, 2020 at 11:11 AM ARAI Shun-ichi <hermes@ceres.dti.ne.jp> wrote:
>
> And,
>
> In <20200210.224609.499887311281343618.hermes@ceres.dti.ne.jp>;
> ARAI Shun-ichi <hermes@ceres.dti.ne.jp> wrote
> as Subject "Re: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct":
>
> > Hi,
> >
> > FYI, reporting additional test results.
> >
> > I reproduced this problem with clean NILFS2 fs in previous mail.
> > "clean" means that "make filesystem before every test."
> > In this mail, I tried to reproduce with/without VG/LV, LUKS, loopback.
> >
> > * Not reproduced
> > USB stick - primary partition - NILFS2
> > USB stick - primary partition - VG/LV - NILFS2
> > USB stick - primary partition - VG/LV - LUKS - NILFS2
> > USB stick - primary partition - LUKS - VG/LV - NILFS2
> > USB stick - primary partition - LUKS - VG/LV - LUKS - NILFS2
> > /tmp (tmpfs) - regular file - NILFS2 (loopback mount, kernel 4.19.82)
> > USB stick - primary partition(512MiB) - NILFS2
> >
> > * Reproduced (always, immediately)
> > /tmp (tmpfs) - regular file - NILFS2 (loopback mount)
> > USB stick - primary partition - ext4 - regular file - NILFS2 (loopback mount)
>
> This loopback problem is also seen in kernel 5.5.4.
>
> > Test conditions:
> > kernel 4.19.86 (same as previous test)
> > NILFS2/ext4 filesystem, VG/LV, LUKS were made with default parameters
> > size of "primary partition" in USB stick is approx. 14GiB
> > size of "regular file" is approx. 512MiB
> > "reproduce": mount NILFS2, touch file, sync

^ permalink raw reply	[flat|nested] 32+ messages in thread
end of thread, other threads:[~2020-06-01 11:46 UTC | newest]

Thread overview: 32+ messages
2019-11-17 17:34 BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct Tomas Hlavaty
2019-11-17 17:34 ` Tomas Hlavaty
2019-11-18 16:51 ` Ryusuke Konishi
2019-11-19  6:04 ` Viacheslav Dubeyko
2020-01-23 13:00 ` Tomas Hlavaty
2020-01-23 13:00 ` Tomas Hlavaty
2019-12-19 21:02 ` Tomas Hlavaty
2020-01-23 12:31 ` Tomas Hlavaty
2020-01-23 12:31 ` Tomas Hlavaty
2020-03-27  6:26 ` Tomas Hlavaty
2020-03-27  6:26 ` Tomas Hlavaty
[not found] ` <CAKFNMomjWkNvHvHkEp=Jv_BiGPNj=oLEChyoXX1yCj5xctAkMA@mail.gmail.com>
2020-03-28  9:26 ` ARAI Shun-ichi
2020-04-30 12:38 ` BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_co Hideki EIRAKU
2020-04-30 12:38 ` Hideki EIRAKU
2020-04-30 15:27 ` Tom
2020-04-30 15:27 ` Tom
2020-05-31 17:49 ` Ryusuke Konishi
2020-05-31 17:49 ` Ryusuke Konishi
[not found] ` <20200601024013.1296-1-hdanton@sina.com>
2020-06-01 11:46 ` Ryusuke Konishi
2020-06-01 11:46 ` Ryusuke Konishi
2020-01-23 13:58 ` BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 in nilfs_segctor_do_construct ARAI Shun-ichi
[not found] ` <20200123.225827.1155989593018204741.hermes-akuOmOme3sQYOdUovKs6ag@public.gmane.org>
2020-01-23 14:07 ` ARAI Shun-ichi
2020-01-23 14:30 ` ARAI Shun-ichi
2020-01-23 14:30 ` ARAI Shun-ichi
2020-02-10 13:46 ` ARAI Shun-ichi
2020-02-10 13:46 ` ARAI Shun-ichi
2020-02-16  2:10 ` ARAI Shun-ichi
2020-02-16  2:10 ` ARAI Shun-ichi
2020-02-16  2:24 ` Brian G.
2020-02-16  2:24 ` Brian G.
2020-02-16  3:59 ` Ryusuke Konishi
2020-02-16  3:59 ` Ryusuke Konishi