* ext4: journal has aborted
From: Matteo Croce @ 2014-06-30 21:30 UTC
To: linux-ext4

I was web surfing and using gimp when:

EXT4-fs error (device sda2): ext4_mb_generate_buddy:756: group 199, 9414 clusters in bitmap, 9500 in gd; block bitmap corrupt.
Aborting journal on device sda2-8.
EXT4-fs (sda2): Remounting filesystem read-only
------------[ cut here ]------------
WARNING: CPU: 6 PID: 4134 at fs/ext4/ext4_jbd2.c:259 __ext4_handle_dirty_metadata+0x18e/0x1d0()
Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ecb uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core videodev ath3k btusb rts5139(C) ctr ccm iTCO_wdt bnep rfcomm bluetooth nls_iso8859_1 vfat fat arc4 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm aesni_intel aes_x86_64 snd_seq_midi snd_seq_midi_event ath9k led_class glue_helper ath9k_common lrw gf128mul ath9k_hw ablk_helper cryptd ath mac80211 snd_rawmidi snd_seq cfg80211 radeon microcode rfkill snd_timer snd_seq_device sr_mod psmouse r8169 snd cdrom i915 lpc_ich soundcore ttm mii mfd_core drm_kms_helper drm intel_gtt agpgart ehci_pci mei_me xhci_hcd tpm_infineon ehci_hcd video mei wmi tpm backlight
CPU: 6 PID: 4134 Comm: gimp-2.8 Tainted: G C 3.15.0 #6
 0000000000000009 ffffffff813acbdd 0000000000000000 ffffffff8103de3d
 ffff8802365231a0 00000000ffffffe2 0000000000000000 ffff8800b90816c0
 ffffffff814205a0 ffffffff8118879e 0000000000000005 ffff8802365231a0
Call Trace:
 [<ffffffff813acbdd>] ? dump_stack+0x41/0x51
 [<ffffffff8103de3d>] ? warn_slowpath_common+0x6d/0x90
 [<ffffffff8118879e>] ? __ext4_handle_dirty_metadata+0x18e/0x1d0
 [<ffffffff8116e130>] ? ext4_dirty_inode+0x20/0x50
 [<ffffffff811903e9>] ? ext4_free_blocks+0x539/0xa40
 [<ffffffff8118468b>] ? ext4_ext_remove_space+0x83b/0xe60
 [<ffffffff81186a58>] ? ext4_ext_truncate+0x98/0xc0
 [<ffffffff8116c985>] ? ext4_truncate+0x2b5/0x300
 [<ffffffff8116d3d8>] ? ext4_evict_inode+0x3d8/0x410
 [<ffffffff81114a46>] ? evict+0xa6/0x160
 [<ffffffff81109346>] ? do_unlinkat+0x186/0x2a0
 [<ffffffff8110e51e>] ? SyS_getdents+0xde/0x100
 [<ffffffff8110e1d0>] ? fillonedir+0xd0/0xd0
 [<ffffffff813b2626>] ? system_call_fastpath+0x1a/0x1f
---[ end trace 795411398e41fbcb ]---
EXT4: jbd2_journal_dirty_metadata failed: handle type 5 started at line 241, credits 91/91, errcode -30
EXT4: jbd2_journal_dirty_metadata failed: handle type 5 started at line 241, credits 91/91, errcode -30
EXT4-fs error (device sda2) in ext4_free_blocks:4867: Journal has aborted
EXT4-fs error (device sda2): ext4_ext_rm_leaf:2731: inode #8257653: block 6520936: comm gimp-2.8: journal_dirty_metadata failed: handle type 5 started at line 241, credits 91/91, errcode -30
EXT4-fs error (device sda2) in ext4_ext_remove_space:3018: Journal has aborted
EXT4-fs error (device sda2) in ext4_ext_truncate:4666: Journal has aborted
EXT4-fs error (device sda2) in ext4_reserve_inode_write:4877: Journal has aborted
EXT4-fs error (device sda2) in ext4_truncate:3788: Journal has aborted
EXT4-fs error (device sda2) in ext4_reserve_inode_write:4877: Journal has aborted
EXT4-fs error (device sda2) in ext4_orphan_del:2684: Journal has aborted
EXT4-fs error (device sda2) in ext4_reserve_inode_write:4877: Journal has aborted
* Re: ext4: journal has aborted
From: David Jander @ 2014-07-01 6:26 UTC
To: Matteo Croce; +Cc: linux-ext4, Theodore Ts'o

Hi,

On Mon, 30 Jun 2014 23:30:10 +0200
Matteo Croce <technoboy85@gmail.com> wrote:

> I was web surfing and using gimp when:
>
> EXT4-fs error (device sda2): ext4_mb_generate_buddy:756: group 199,
> 9414 clusters in bitmap, 9500 in gd; block bitmap corrupt.

I was about to post a related question to this list. I am also seeing this kind of error when using ext4 on latest mainline (I began testing with 3.15, where I first saw this, and it is still there in 3.16-rc3). It happens almost instantly when power-cycling the system (unclean shutdown). The next time the system boots, I get these errors.

AFAICT, you are using a pretty recent kernel. Which version exactly?

> Aborting journal on device sda2-8.
> EXT4-fs (sda2): Remounting filesystem read-only
> [...]
> EXT4-fs error (device sda2) in ext4_orphan_del:2684: Journal has aborted
> EXT4-fs error (device sda2) in ext4_reserve_inode_write:4877: Journal
> has aborted

I did not get these errors. I suspect this may be a consequence of FS corruption due to a bug in ext4.

Here's why I suspect a bug:

I am running latest git head (3.16-rc3+ as of yesterday) on an ARM system with eMMC flash. The eMMC is formatted in SLC mode ("enhanced" mode according to eMMC 4.41) and "reliable writes" are enabled, so power-cycling should not cause FS corruption in the presence of a journal.

I can format the eMMC device as either EXT3 or EXT4 for the test. After formatting and writing the rootfs to the partition, I can boot successfully in either case. Once booted from eMMC, I start bonnie++ (just to stress the FS for a while), and after a minute or so the board is power-cycled while bonnie++ is still running.

The next time I boot, the situation is this:

With EXT3: All seems fine, the journal is replayed, no errors. I can repeat this as many times as I want; the FS stays consistent.

With EXT4: After just one power cycle I start getting this:

[    7.603871] EXT4-fs error (device mmcblk0p2): ext4_mb_generate_buddy:757: group 1, 8542 clusters in bitmap, 8550 in gd; block bitmap corrupt.
[    7.616743] JBD2: Spotted dirty metadata buffer (dev = mmcblk0p2, blocknr = 0). There's a risk of filesystem corruption in case of system crash.

If I continue the test, it doesn't take long before serious corruption starts occurring.

Again, with EXT3 I am unable to detect any problems.

Best regards,

--
David Jander
Protonic Holland.
* Re: ext4: journal has aborted
From: Matteo Croce @ 2014-07-01 8:00 UTC
To: David Jander; +Cc: linux-ext4, Theodore Ts'o

2014-07-01 8:26 GMT+02:00 David Jander <david@protonic.nl>:
>
> I was about to post a related question to this list. I am also seeing this
> kind of error when using ext4 on latest mainline (I began testing with
> 3.15, where I first saw this, and it is still there in 3.16-rc3).
> It happens almost instantly when power-cycling the system (unclean
> shutdown). The next time the system boots, I get these errors.

Indeed I had a power failure during the last boot, but it took about an hour after the unclean mount for the journal to abort.

> AFAICT, you are using a pretty recent kernel. Which version exactly?

Vanilla 3.15.0.

> [...]
> Again, with EXT3 I am unable to detect any problems.

--
Matteo Croce
OpenWrt Developer
* Re: ext4: journal has aborted
From: Darrick J. Wong @ 2014-07-01 8:42 UTC
To: Matteo Croce, David Jander; +Cc: linux-ext4, Theodore Ts'o

On Tue, Jul 01, 2014 at 08:26:19AM +0200, David Jander wrote:
>
> On Mon, 30 Jun 2014 23:30:10 +0200
> Matteo Croce <technoboy85@gmail.com> wrote:
>
> > I was web surfing and using gimp when:
> >
> > EXT4-fs error (device sda2): ext4_mb_generate_buddy:756: group 199,
> > 9414 clusters in bitmap, 9500 in gd; block bitmap corrupt.
>
> [...]
> AFAICT, you are using a pretty recent kernel. Which version exactly?
>
> > Aborting journal on device sda2-8.
> > EXT4-fs (sda2): Remounting filesystem read-only

Matteo, could you please post the full dmesg log somewhere? I'm interested in what happens before all this happens, because...

> [...]
> With EXT4: After just one power cycle I start getting this:
>
> [    7.603871] EXT4-fs error (device mmcblk0p2): ext4_mb_generate_buddy:757: group 1, 8542 clusters in bitmap, 8550 in gd; block bitmap corrupt.
> [    7.616743] JBD2: Spotted dirty metadata buffer (dev = mmcblk0p2, blocknr = 0). There's a risk of filesystem corruption in case of system crash.

I've been seeing this same set of symptoms with 3.15.0 on various SSDs (Samsung 840 Pro, Crucial M4). It seems that something (upstart?) is holding open some file or other during poweroff, which means that the root fs can't be unmounted or even remounted ro. I also noticed that the next time the system comes up, the kernel tells me that it has to process the inode orphan list as part of recovery.

Shortly after the orphan list gets processed, I get that message and the FS goes ro. A subsequent fsck run reveals that the block bitmap is indeed incorrect in that block group, and when I bd (debugfs block dump) the blocks that are incorrect in the bitmap, I see what could be some kind of upstart log file. Either way, I suspect some bug in orphan processing.

<shrug> I don't know if this is specific to SSDs or spinning rust. Right now I've simply rigged the initramfs to e2fsck -p the root fs before mounting it, which seems(?) to have patched around it for now.

> If I continue the test, it doesn't take long before serious corruption
> starts occurring.

You're getting actual FS data corruption too? Or more of those messages?

--D

> Again, with EXT3 I am unable to detect any problems.
>
> Best regards,
>
> --
> David Jander
> Protonic Holland.
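Darrick's initramfs workaround is easy to reproduce. Below is a minimal sketch of such a hook for a Debian-style initramfs-tools setup; the hook path, the PREREQ boilerplate, and the use of $ROOT are assumptions about that environment, not the script Darrick actually used:

  #!/bin/sh
  # Hypothetical /etc/initramfs-tools/scripts/local-premount/fsck-root hook.
  # Runs a preen-mode e2fsck on the root device before it is mounted,
  # mirroring the workaround described above.
  PREREQ=""
  prereqs() { echo "$PREREQ"; }
  case "$1" in
      prereqs) prereqs; exit 0 ;;
  esac

  # $ROOT is set by the initramfs-tools boot scripts to the root device.
  if [ -b "$ROOT" ]; then
      # -p repairs only problems that are safe to fix automatically;
      # anything worse still needs a manual e2fsck.
      e2fsck -p "$ROOT"
  fi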
* Re: ext4: journal has aborted
From: Matteo Croce @ 2014-07-01 8:55 UTC
To: Darrick J. Wong; +Cc: David Jander, linux-ext4, Theodore Ts'o

2014-07-01 10:42 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>:
>
> Matteo, could you please post the full dmesg log somewhere? I'm interested
> in what happens before all this happens, because...

I've rebooted the notebook twice since then, so I no longer have the full log.

> > I am running latest git head (3.16-rc3+ as of yesterday) on an ARM system
> > with eMMC flash. The eMMC is formatted in SLC mode ("enhanced" mode
> > according to eMMC 4.41) and "reliable writes" are enabled, so
> > power-cycling should not cause FS corruption in the presence of a journal.

I have a Samsung SSD 840 PRO.

> [...]
> <shrug> I don't know if this is specific to SSDs or spinning rust. Right
> now I've simply rigged the initramfs to e2fsck -p the root fs before
> mounting it, which seems(?) to have patched around it for now.
>
> > If I continue the test, it doesn't take long before serious corruption
> > starts occurring.
>
> You're getting actual FS data corruption too? Or more of those messages?

Actually it seems that there is no corruption.

--
Matteo Croce
OpenWrt Developer
* Re: ext4: journal has aborted
From: Dmitry Monakhov @ 2014-07-02 13:49 UTC
To: Matteo Croce, Darrick J. Wong; +Cc: David Jander, linux-ext4, Theodore Ts'o

On Tue, 1 Jul 2014 10:55:11 +0200, Matteo Croce <technoboy85@gmail.com> wrote:
> 2014-07-01 10:42 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>:
> >
> > Matteo, could you please post the full dmesg log somewhere? I'm
> > interested in what happens before all this happens, because...
>
> [...]

It would be useful to understand whether the root of this issue is in the disk driver or in JBD. Can you please run the fio utility against your disk (it can be found at http://git.kernel.dk/?p=fio.git;a=summary, or simply install it with your favorite package manager, e.g. "yum install fio") and run the attached verify job for your disk, like so:

# fio disk-verify.fio

####### job:disk-verify.fio #############

[-- Attachment: disk-veryfy.fio --]

[global]
bs=64k
direct=1
rw=randwrite
ioengine=libaio
iodepth=32
runtime=3600
time_based
verify=crc32c-intel
verify_fatal=1
verify_dump=1
verify_backlog=1024
verify_async=1
verifysort=1
exitall

##############
# My volume '/dev/mapper/vg-test_dev' size of 10gb
# In order to run this script just change the job's name, for example '/dev/sda'
###############
# This job will run 4 concurrent threads which operate on separate regions
# [0,2gb][2gb,4gb][4gb,6gb][6gb,8gb]
#
[/dev/mapper/vg-test_dev]
numjobs=4
offset_increment=2G
size=2G
* Re: ext4: journal has aborted
From: Theodore Ts'o @ 2014-07-03 13:43 UTC
To: Matteo Croce; +Cc: Darrick J. Wong, David Jander, linux-ext4

On Tue, Jul 01, 2014 at 10:55:11AM +0200, Matteo Croce wrote:
> 2014-07-01 10:42 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>:
>
> I have a Samsung SSD 840 PRO

Matteo,

You said you were seeing these problems on 3.15. Was it *not* happening for you when you used an older kernel? If so, that would give us a basis for doing a bisection search.

Using the kvm-xfstests infrastructure, I've been trying to reproduce the problem as follows:

./kvm-xfstests --no-log -c 4k generic/075 ; e2fsck -p /dev/heap/test-4k ; e2fsck -f /dev/heap/test-4k

xfstests generic/075 runs fsx, which does a fair amount of block allocation and deallocation; after the test finishes, this first replays the journal (e2fsck -p) and then forces an fsck run on the test disk that I use for the run.

After I launch this, in a separate window, I do this:

sleep 60 ; killall qemu-system-x86_64

This kills the qemu process midway through the fsx test, and then I see if I can find a problem. I haven't had a chance to automate this yet, and my intention is to set this up so I can run it on a ramdisk or an SSD, to more closely approximate what people are reporting on flash-based media.

So far, I haven't been able to reproduce the problem. If, after doing this a large number of times, it can't be reproduced (especially if it can't be reproduced on an SSD), that would lead us to believe that one of two things is the cause: (a) the CACHE FLUSH command isn't properly getting sent to the device in some cases, or (b) there really is a hardware problem with the flash device in question.

Cheers,

- Ted
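A minimal sketch of how one might loop Ted's two-window procedure, using exactly the commands he quotes (the iteration count and the use of e2fsck -fn for the final read-only check are assumptions of this sketch, not part of his setup):

  #!/bin/bash
  # Repeatedly crash a VM mid-test and check the filesystem afterwards,
  # automating the manual procedure described above.
  for i in $(seq 1 50); do
      # Start the fsx-based test in the background...
      ./kvm-xfstests --no-log -c 4k generic/075 &
      sleep 60
      # ...then kill qemu to simulate a power failure mid-run.
      killall qemu-system-x86_64
      wait
      e2fsck -p /dev/heap/test-4k               # replay the journal
      if ! e2fsck -fn /dev/heap/test-4k; then   # forced read-only check
          echo "iteration $i: inconsistency found after journal replay"
          break
      fi
  done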
* Re: ext4: journal has aborted
From: David Jander @ 2014-07-03 14:15 UTC
To: Theodore Ts'o; +Cc: Matteo Croce, Darrick J. Wong, linux-ext4

Hi Ted,

On Thu, 3 Jul 2014 09:43:38 -0400
"Theodore Ts'o" <tytso@mit.edu> wrote:

> You said you were seeing these problems on 3.15. Was it *not* happening
> for you when you used an older kernel? If so, that would give us a basis
> for doing a bisection search.

I also tested with 3.15, and there too I see the same problem.

> [...]
> So far, I haven't been able to reproduce the problem. If, after doing this
> a large number of times, it can't be reproduced (especially if it can't be
> reproduced on an SSD), that would lead us to believe that one of two
> things is the cause: (a) the CACHE FLUSH command isn't properly getting
> sent to the device in some cases, or (b) there really is a hardware
> problem with the flash device in question.

Could (a) be caused by a bug in the mmc subsystem or in the MMC peripheral driver? Can you explain why I don't see any problems with EXT3?

I can't rule out (b) because I cannot prove it, but I will try to see if I can do the same test on an SSD which I happen to have on that platform. That should rule out problems with the eMMC chip and driver, right?

Do you know a way to investigate (a) (CACHE FLUSH not being sent correctly)?

I left the system running (it started from a dirty EXT4 partition), and I am seeing the following error pop up after a few minutes. The system is not doing much (some syslog activity maybe, but not much more):

[  303.072983] EXT4-fs (mmcblk1p2): error count: 4
[  303.077558] EXT4-fs (mmcblk1p2): initial error at 1404216838: ext4_mb_generate_buddy:756
[  303.085690] EXT4-fs (mmcblk1p2): last error at 1404388969: ext4_mb_generate_buddy:757

What does that mean?

Best regards,

--
David Jander
Protonic Holland.
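As an aside on David's question (a): one generic way to check whether cache flushes actually reach a block device is to watch the request stream with blktrace while forcing writeback. This is a sketch of the general technique, not something proposed in the thread; the device and test path are placeholders:

  # Terminal 1: trace all requests hitting the eMMC device. In blkparse
  # output, flush requests carry an 'F' in the RWBS column (e.g. "FWS"
  # for a flush+write+sync request).
  blktrace -d /dev/mmcblk1 -o - | blkparse -i -

  # Terminal 2: generate a write followed by an explicit flush.
  dd if=/dev/zero of=/mnt/test/flushprobe bs=4k count=16 conv=fsync

If journal commits never produce a request with 'F' set, the flush is being lost somewhere between the filesystem and the hardware.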
* Re: ext4: journal has aborted
From: Theodore Ts'o @ 2014-07-03 14:46 UTC
To: David Jander; +Cc: Matteo Croce, Darrick J. Wong, linux-ext4

On Thu, Jul 03, 2014 at 04:15:51PM +0200, David Jander wrote:
>
> Could (a) be caused by a bug in the mmc subsystem or in the MMC peripheral
> driver? Can you explain why I don't see any problems with EXT3?

It's possible. I seem to recall a bug related to the mmc subsystem that was causing file system corruption after power failure across multiple file systems --- xfs and reiserfs were mentioned, as I recall. I *thought* the problem was fixed, and the fix backported where necessary.

Hmm... here's where that bug was reported:

https://lkml.org/lkml/2014/6/12/19

... but I haven't found the fix yet.

Now, this would be quite different from the bug Matteo is seeing, since he has a Samsung SSD, which is *not* an MMC device.

As far as why you aren't seeing a problem with ext3: it doesn't have the same sort of paranoid checks that ext4 has, so it's less likely to catch certain problems at runtime. If you ran fsck on an ext3 file system and it was corrupt, of course that would show the problem.

> I left the system running (it started from a dirty EXT4 partition), and I
> am seeing the following error pop up after a few minutes. The system is
> not doing much (some syslog activity maybe, but not much more):
>
> [  303.072983] EXT4-fs (mmcblk1p2): error count: 4
> [  303.077558] EXT4-fs (mmcblk1p2): initial error at 1404216838: ext4_mb_generate_buddy:756
> [  303.085690] EXT4-fs (mmcblk1p2): last error at 1404388969: ext4_mb_generate_buddy:757
>
> What does that mean?

This means the file system has errors that weren't fixed by an fsck. The first error occurred at:

% date -d @1404216838
Tue Jul  1 08:13:58 EDT 2014

and the most recent error occurred at:

% date -d @1404388969
Thu Jul  3 08:02:49 EDT 2014

The error count information should have been cleared by e2fsck, so long as you are using a version of e2fsck newer than 1.41.13, released in December 2010. So if it has not been cleared, and you've since rebooted, that's an indication that e2fsck isn't getting run at boot. If you haven't rebooted yet, then about once a day you'll see that message in your syslog. It's there so that people know that their file system has been having problems, and that they *really* should get it unmounted and checked before they lose more data....

The reason why I added this is that there were systems where people didn't notice that they had been running with a corrupted file system for days, weeks, or months, and then would complain that they had lost lots of data. By putting something in the logs once a day, hopefully we reduce the chance of that happening. (And if they had configured their file system to panic when an error is detected, via "tune2fs -e panic /dev/sdXX", then so long as their init scripts were properly configured, the file system would have been repaired after the reboot.)

- Ted
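The error fields Ted describes live in the superblock and can be inspected without mounting the filesystem. A small sketch, assuming a reasonably recent e2fsprogs (the exact field names printed by dumpe2fs vary a little between versions):

  # Show the recorded error state (count, first/last error time and the
  # function that reported it) directly from the superblock.
  dumpe2fs -h /dev/mmcblk1p2 | grep -i error

  # Decode the raw Unix timestamps the kernel printed.
  date -d @1404216838   # initial error
  date -d @1404388969   # last error

  # A full forced check repairs the fs and clears the error counters.
  e2fsck -f /dev/mmcblk1p2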
* Re: ext4: journal has aborted
From: Dmitry Monakhov @ 2014-07-03 14:57 UTC
To: David Jander, Theodore Ts'o; +Cc: Matteo Croce, Darrick J. Wong, linux-ext4

On Thu, 3 Jul 2014 16:15:51 +0200, David Jander <david@protonic.nl> wrote:
> [...]
> I left the system running (it started from a dirty EXT4 partition), and I
> am seeing the following error pop up after a few minutes.
>
> [  303.072983] EXT4-fs (mmcblk1p2): error count: 4
> [  303.077558] EXT4-fs (mmcblk1p2): initial error at 1404216838: ext4_mb_generate_buddy:756
> [  303.085690] EXT4-fs (mmcblk1p2): last error at 1404388969: ext4_mb_generate_buddy:757
>
> What does that mean?

This means that ext4 found a record of previous errors in its internal error log, which is expected because your fs was corrupted before. It would be reasonable to recreate the filesystem from scratch.

In order to understand whether this is a regression in the eMMC driver, it is reasonable to run an integrity test on the device itself. You can run any integrity test you like; for example, just run a fio verify job, "fio disk-verify2.fio" (see attachment). IMPORTANT: this script will destroy data on the test partition. If it fails with errors like "verify: bad magic header XXX", then it is definitely a driver issue.

If my theory is right and it is a storage driver issue, then JBD complains simply because it cares about its data (it does integrity checks).

Can you also create btrfs on that partition, perform some I/O activity, and run fsck after that? You will likely see similar corruption.
* Re: ext4: journal has aborted
From: Dmitry Monakhov @ 2014-07-03 14:58 UTC
To: David Jander, Theodore Ts'o; +Cc: Matteo Croce, Darrick J. Wong, linux-ext4

On Thu, 3 Jul 2014 16:15:51 +0200, David Jander <david@protonic.nl> wrote:
> [...]
> [  303.072983] EXT4-fs (mmcblk1p2): error count: 4
> [  303.077558] EXT4-fs (mmcblk1p2): initial error at 1404216838: ext4_mb_generate_buddy:756
> [  303.085690] EXT4-fs (mmcblk1p2): last error at 1404388969: ext4_mb_generate_buddy:757
>
> What does that mean?

This means that ext4 found a record of previous errors in its internal error log, which is expected because your fs was corrupted before. It would be reasonable to recreate the filesystem from scratch.

In order to understand whether this is a regression in the eMMC driver, it is reasonable to run an integrity test on the device itself. You can run any integrity test you like; for example, just run a fio verify job, "fio disk-verify2.fio" (see attachment). IMPORTANT: this script will destroy data on the test partition. If it fails with errors like "verify: bad magic header XXX", then it is definitely a driver issue.

If my theory is right and it is a storage driver issue, then JBD complains simply because it cares about its data (it does integrity checks).

Can you also create btrfs on that partition, perform some I/O activity, and run fsck after that? You will likely see similar corruption.

[-- Attachment: disk-veryfy2.fio --]

[global]
bs=4k
direct=1
rw=randwrite
ioengine=libaio
iodepth=32
runtime=3600
time_based
verify=crc32c-intel
verify_fatal=1
verify_dump=1
verify_backlog=1024
verify_async=1
verifysort=1
exitall

##############
# My volume '/dev/mmcblk1p2' size of 4gb
# In order to run this script just change the job's name, for example '/dev/sda'
###############
# This job will run 4 concurrent threads which operate on separate regions
# [0,1gb][1gb,2gb][2gb,3gb][3gb,4gb]
#
[/dev/mmcblk1p2]
numjobs=4
offset_increment=1G
size=1G
* Re: ext4: journal has aborted 2014-07-03 14:58 ` Dmitry Monakhov @ 2014-07-04 9:40 ` David Jander 2014-07-04 10:17 ` Dmitry Monakhov 2014-07-04 11:04 ` Jaehoon Chung 0 siblings, 2 replies; 50+ messages in thread From: David Jander @ 2014-07-04 9:40 UTC (permalink / raw) To: Dmitry Monakhov Cc: Theodore Ts'o, Matteo Croce, Darrick J. Wong, linux-ext4 Hi Dmitry, On Thu, 03 Jul 2014 18:58:48 +0400 Dmitry Monakhov <dmonakhov@openvz.org> wrote: > On Thu, 3 Jul 2014 16:15:51 +0200, David Jander <david@protonic.nl> wrote: > > > > Hi Ted, > > > > On Thu, 3 Jul 2014 09:43:38 -0400 > > "Theodore Ts'o" <tytso@mit.edu> wrote: > > > > > On Tue, Jul 01, 2014 at 10:55:11AM +0200, Matteo Croce wrote: > > > > 2014-07-01 10:42 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>: > > > > > > > > I have a Samsung SSD 840 PRO > > > > > > Matteo, > > > > > > For you, you said you were seeing these problems on 3.15. Was it > > > *not* happening for you when you used an older kernel? If so, that > > > would help us try to provide the basis of trying to do a bisection > > > search. > > > > I also tested with 3.15, and there too I see the same problem. > > > > > Using the kvm-xfstests infrastructure, I've been trying to reproduce > > > the problem as follows: > > > > > > ./kvm-xfstests --no-log -c 4k generic/075 ; e2fsck -p /dev/heap/test-4k ; e2fsck -f /dev/heap/test-4k > > > > > > xfstests geneeric/075 runs fsx which does a fair amount of block > > > allocation deallocations, and then after the test finishes, it first > > > replays the journal (e2fsck -p) and then forces a fsck run on the > > > test disk that I use for the run. > > > > > > After I launch this, in a separate window, I do this: > > > > > > sleep 60 ; killall qemu-system-x86_64 > > > > > > This kills the qemu process midway through the fsx test, and then I > > > see if I can find a problem. I haven't had a chance to automate this > > > yet, and it is my intention to try to set this up where I can run this > > > on a ramdisk or a SSD, so I can more closely approximate what people > > > are reporting on flash-based media. > > > > > > So far, I haven't been able to reproduce the problem. If after doing > > > a large number of times, it can't be reproduced (especially if it > > > can't be reproduced on an SSD), then it would lead us to believe that > > > one of two things is the cause. (a) The CACHE FLUSH command isn't > > > properly getting sent to the device in some cases, or (b) there really > > > is a hardware problem with the flash device in question. > > > > Could (a) be caused by a bug in the mmc subsystem or in the MMC peripheral > > driver? Can you explain why I don't see any problems with EXT3? > > > > I can't discard the possibility of (b) because I cannot prove it, but I will > > try to see if I can do the same test on a SSD which I happen to have on that > > platform. That should be able to rule out problems with the eMMC chip and > > -driver, right? > > > > Do you know a way to investigate (a) (CACHE FLUSH not being sent correctly)? > > > > I left the system running (it started from a dirty EXT4 partition), and I am > > seen the following error pop up after a few minutes. The system is not doing > > much (some syslog activity maybe, but not much more): > > > > [ 303.072983] EXT4-fs (mmcblk1p2): error count: 4 > > [ 303.077558] EXT4-fs (mmcblk1p2): initial error at 1404216838: ext4_mb_generate_buddy:756 > > [ 303.085690] EXT4-fs (mmcblk1p2): last error at 1404388969: ext4_mb_generate_buddy:757 > > > > What does that mean? 
> This means that it found previous error in internal ext4's log. Which is > normal because your fs was corrupted before. It is reasonable to > recreate filesystem from very beginning. > > In order to understand whenever it is regression in eMMC driver it is > reasonable to run integrity test for a device itself. You can run > any integrity test you like, For example just run a fio's job > "fio disk-verify2.fio" (see attachment), IMPORTANT this script will > destroy data on test partition. If it failed with errors like > follows "verify: bad magic header XXX" than it is definitely a drivers issue. I have been trying to run fio on my board with your configuration file, but I am having problems, and since I am not familiar with fio at all, I can't really figure out what's wrong. My eMMC device is only 916MiB in size, so I edited the last part to be: offset_increment=100M size=100M Is that ok? I still get error messages complaining about blocksize though. Here is the output I get (can't really make sense of it): # ./fio ../disk-verify2.fio Multiple writers may overwrite blocks that belong to other jobs. This can cause verification failures. /dev/mmcblk1p2: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 ... fio-2.1.10-49-gf302 Starting 4 processes fio: blocksize too large for data set fio: blocksize too large for data set fio: blocksize too large for data set fio: io_u.c:1315: __get_io_u: Assertion `io_u->flags & IO_U_F_FREE' failed.ta 00m:00s] fio: pid=7612, got signal=6 /dev/mmcblk1p2: (groupid=0, jobs=1): err= 0: pid=7612: Fri Jul 4 09:31:15 2014 lat (msec) : 4=0.19%, 10=0.19%, 20=0.19%, 50=0.85%, 100=1.23% lat (msec) : 250=56.01%, 500=37.18%, 750=1.14% cpu : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0 IO depths : 1=0.1%, 2=0.2%, 4=0.4%, 8=0.8%, 16=1.5%, 32=97.1%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=33/w=1024/d=0, short=r=0/w=0/d=0 latency : target=0, window=0, percentile=100.00%, depth=32 Run status group 0 (all jobs): Disk stats (read/write): mmcblk1: ios=11/1025, merge=0/0, ticks=94/6671, in_queue=7121, util=96.12% fio: file hash not empty on exit This assertion bugs me. Is it due to the previous errors ("blocksize too large for data set") or is it because my eMMC drive/kernel is seriously screwed? Help please! > If my theory is true and it is storage's driver issue than JBD complain > simply because it do care about it's data (it does integrity checks). > Can you also create btrfs on that partition and performs some io > activity and run fsck after that. You likely will see similar corruption Best regards, -- David Jander Protonic Holland. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-04 9:40 ` David Jander @ 2014-07-04 10:17 ` Dmitry Monakhov 2014-07-04 11:28 ` David Jander 2014-07-04 11:04 ` Jaehoon Chung 1 sibling, 1 reply; 50+ messages in thread From: Dmitry Monakhov @ 2014-07-04 10:17 UTC (permalink / raw) To: David Jander; +Cc: Theodore Ts'o, Matteo Croce, Darrick J. Wong, linux-ext4 On Fri, 4 Jul 2014 11:40:31 +0200, David Jander <david@protonic.nl> wrote: > > Hi Dmitry, > > On Thu, 03 Jul 2014 18:58:48 +0400 > Dmitry Monakhov <dmonakhov@openvz.org> wrote: > > > On Thu, 3 Jul 2014 16:15:51 +0200, David Jander <david@protonic.nl> wrote: > > > > > > Hi Ted, > > > > > > On Thu, 3 Jul 2014 09:43:38 -0400 > > > "Theodore Ts'o" <tytso@mit.edu> wrote: > > > > > > > On Tue, Jul 01, 2014 at 10:55:11AM +0200, Matteo Croce wrote: > > > > > 2014-07-01 10:42 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>: > > > > > > > > > > I have a Samsung SSD 840 PRO > > > > > > > > Matteo, > > > > > > > > For you, you said you were seeing these problems on 3.15. Was it > > > > *not* happening for you when you used an older kernel? If so, that > > > > would help us try to provide the basis of trying to do a bisection > > > > search. > > > > > > I also tested with 3.15, and there too I see the same problem. > > > > > > > Using the kvm-xfstests infrastructure, I've been trying to reproduce > > > > the problem as follows: > > > > > > > > ./kvm-xfstests --no-log -c 4k generic/075 ; e2fsck -p /dev/heap/test-4k ; e2fsck -f /dev/heap/test-4k > > > > > > > > xfstests geneeric/075 runs fsx which does a fair amount of block > > > > allocation deallocations, and then after the test finishes, it first > > > > replays the journal (e2fsck -p) and then forces a fsck run on the > > > > test disk that I use for the run. > > > > > > > > After I launch this, in a separate window, I do this: > > > > > > > > sleep 60 ; killall qemu-system-x86_64 > > > > > > > > This kills the qemu process midway through the fsx test, and then I > > > > see if I can find a problem. I haven't had a chance to automate this > > > > yet, and it is my intention to try to set this up where I can run this > > > > on a ramdisk or a SSD, so I can more closely approximate what people > > > > are reporting on flash-based media. > > > > > > > > So far, I haven't been able to reproduce the problem. If after doing > > > > a large number of times, it can't be reproduced (especially if it > > > > can't be reproduced on an SSD), then it would lead us to believe that > > > > one of two things is the cause. (a) The CACHE FLUSH command isn't > > > > properly getting sent to the device in some cases, or (b) there really > > > > is a hardware problem with the flash device in question. > > > > > > Could (a) be caused by a bug in the mmc subsystem or in the MMC peripheral > > > driver? Can you explain why I don't see any problems with EXT3? > > > > > > I can't discard the possibility of (b) because I cannot prove it, but I will > > > try to see if I can do the same test on a SSD which I happen to have on that > > > platform. That should be able to rule out problems with the eMMC chip and > > > -driver, right? > > > > > > Do you know a way to investigate (a) (CACHE FLUSH not being sent correctly)? > > > > > > I left the system running (it started from a dirty EXT4 partition), and I am > > > seen the following error pop up after a few minutes. 
The system is not doing > > > much (some syslog activity maybe, but not much more): > > > > > > [ 303.072983] EXT4-fs (mmcblk1p2): error count: 4 > > > [ 303.077558] EXT4-fs (mmcblk1p2): initial error at 1404216838: ext4_mb_generate_buddy:756 > > > [ 303.085690] EXT4-fs (mmcblk1p2): last error at 1404388969: ext4_mb_generate_buddy:757 > > > > > > What does that mean? > > This means that it found previous error in internal ext4's log. Which is > > normal because your fs was corrupted before. It is reasonable to > > recreate filesystem from very beginning. > > > > In order to understand whenever it is regression in eMMC driver it is > > reasonable to run integrity test for a device itself. You can run > > any integrity test you like, For example just run a fio's job > > "fio disk-verify2.fio" (see attachment), IMPORTANT this script will > > destroy data on test partition. If it failed with errors like > > follows "verify: bad magic header XXX" than it is definitely a drivers issue. > > I have been trying to run fio on my board with your configuration file, but I > am having problems, and since I am not familiar with fio at all, I can't > really figure out what's wrong. My eMMC device is only 916MiB in size, so I > edited the last part to be: > > offset_increment=100M > size=100M > > Is that ok? > > I still get error messages complaining about blocksize though. Here is the > output I get (can't really make sense of it): > > # ./fio ../disk-verify2.fio > Multiple writers may overwrite blocks that belong to other jobs. This can cause verification failures. > /dev/mmcblk1p2: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 > ... > fio-2.1.10-49-gf302 > Starting 4 processes > fio: blocksize too large for data set > fio: blocksize too large for data set > fio: blocksize too large for data set > fio: io_u.c:1315: __get_io_u: Assertion `io_u->flags & IO_U_F_FREE' failed.ta 00m:00s] > fio: pid=7612, got signal=6 > > /dev/mmcblk1p2: (groupid=0, jobs=1): err= 0: pid=7612: Fri Jul 4 09:31:15 2014 > lat (msec) : 4=0.19%, 10=0.19%, 20=0.19%, 50=0.85%, 100=1.23% > lat (msec) : 250=56.01%, 500=37.18%, 750=1.14% > cpu : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0 > IO depths : 1=0.1%, 2=0.2%, 4=0.4%, 8=0.8%, 16=1.5%, 32=97.1%, >=64=0.0% > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > issued : total=r=33/w=1024/d=0, short=r=0/w=0/d=0 > latency : target=0, window=0, percentile=100.00%, depth=32 > > Run status group 0 (all jobs): > > Disk stats (read/write): > mmcblk1: ios=11/1025, merge=0/0, ticks=94/6671, in_queue=7121, util=96.12% > fio: file hash not empty on exit > > > This assertion bugs me. Is it due to the previous errors ("blocksize too large > for data set") or is is because my eMMC drive/kernel is seriously screwed? > > Help please! Ohhh. Actually this is axboe's crap. Recent fio versions are broken. Please use the old, known-good commit ffa93ca9d8d37ef: git clone git://git.kernel.dk/fio.git cd fio git checkout -b b2.0.13 ffa93ca9d8d37ef make -j4 ./fio > > > If my theory is true and it is storage's driver issue than JBD complain > > simply because it do care about it's data (it does integrity checks). > > Can you also create btrfs on that partition and performs some io > > activity and run fsck after that. You likely will see similar corruption > > Best regards, > > -- > David Jander > Protonic Holland.
> -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-04 10:17 ` Dmitry Monakhov @ 2014-07-04 11:28 ` David Jander 2014-07-04 12:20 ` Theodore Ts'o 0 siblings, 1 reply; 50+ messages in thread From: David Jander @ 2014-07-04 11:28 UTC (permalink / raw) To: Dmitry Monakhov Cc: Theodore Ts'o, Matteo Croce, Darrick J. Wong, linux-ext4 On Fri, 04 Jul 2014 14:17:21 +0400 Dmitry Monakhov <dmonakhov@openvz.org> wrote: > On Fri, 4 Jul 2014 11:40:31 +0200, David Jander <david@protonic.nl> wrote: > > > > Hi Dmitry, > > > > On Thu, 03 Jul 2014 18:58:48 +0400 > > Dmitry Monakhov <dmonakhov@openvz.org> wrote: > > > > > On Thu, 3 Jul 2014 16:15:51 +0200, David Jander <david@protonic.nl> wrote: > > > > > > > > Hi Ted, > > > > > > > > On Thu, 3 Jul 2014 09:43:38 -0400 > > > > "Theodore Ts'o" <tytso@mit.edu> wrote: > > > > > > > > > On Tue, Jul 01, 2014 at 10:55:11AM +0200, Matteo Croce wrote: > > > > > > 2014-07-01 10:42 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>: > > > > > > > > > > > > I have a Samsung SSD 840 PRO > > > > > > > > > > Matteo, > > > > > > > > > > For you, you said you were seeing these problems on 3.15. Was it > > > > > *not* happening for you when you used an older kernel? If so, that > > > > > would help us try to provide the basis of trying to do a bisection > > > > > search. > > > > > > > > I also tested with 3.15, and there too I see the same problem. > > > > > > > > > Using the kvm-xfstests infrastructure, I've been trying to reproduce > > > > > the problem as follows: > > > > > > > > > > ./kvm-xfstests --no-log -c 4k generic/075 ; e2fsck -p /dev/heap/test-4k ; e2fsck -f /dev/heap/test-4k > > > > > > > > > > xfstests geneeric/075 runs fsx which does a fair amount of block > > > > > allocation deallocations, and then after the test finishes, it first > > > > > replays the journal (e2fsck -p) and then forces a fsck run on the > > > > > test disk that I use for the run. > > > > > > > > > > After I launch this, in a separate window, I do this: > > > > > > > > > > sleep 60 ; killall qemu-system-x86_64 > > > > > > > > > > This kills the qemu process midway through the fsx test, and then I > > > > > see if I can find a problem. I haven't had a chance to automate this > > > > > yet, and it is my intention to try to set this up where I can run this > > > > > on a ramdisk or a SSD, so I can more closely approximate what people > > > > > are reporting on flash-based media. > > > > > > > > > > So far, I haven't been able to reproduce the problem. If after doing > > > > > a large number of times, it can't be reproduced (especially if it > > > > > can't be reproduced on an SSD), then it would lead us to believe that > > > > > one of two things is the cause. (a) The CACHE FLUSH command isn't > > > > > properly getting sent to the device in some cases, or (b) there really > > > > > is a hardware problem with the flash device in question. > > > > > > > > Could (a) be caused by a bug in the mmc subsystem or in the MMC peripheral > > > > driver? Can you explain why I don't see any problems with EXT3? > > > > > > > > I can't discard the possibility of (b) because I cannot prove it, but I will > > > > try to see if I can do the same test on a SSD which I happen to have on that > > > > platform. That should be able to rule out problems with the eMMC chip and > > > > -driver, right? > > > > > > > > Do you know a way to investigate (a) (CACHE FLUSH not being sent correctly)? 
> > > > > > > > I left the system running (it started from a dirty EXT4 partition), and I am > > > > seen the following error pop up after a few minutes. The system is not doing > > > > much (some syslog activity maybe, but not much more): > > > > > > > > [ 303.072983] EXT4-fs (mmcblk1p2): error count: 4 > > > > [ 303.077558] EXT4-fs (mmcblk1p2): initial error at 1404216838: ext4_mb_generate_buddy:756 > > > > [ 303.085690] EXT4-fs (mmcblk1p2): last error at 1404388969: ext4_mb_generate_buddy:757 > > > > > > > > What does that mean? > > > This means that it found previous error in internal ext4's log. Which is > > > normal because your fs was corrupted before. It is reasonable to > > > recreate filesystem from very beginning. > > > > > > In order to understand whenever it is regression in eMMC driver it is > > > reasonable to run integrity test for a device itself. You can run > > > any integrity test you like, For example just run a fio's job > > > "fio disk-verify2.fio" (see attachment), IMPORTANT this script will > > > destroy data on test partition. If it failed with errors like > > > follows "verify: bad magic header XXX" than it is definitely a drivers issue. > > > > I have been trying to run fio on my board with your configuration file, but I > > am having problems, and since I am not familiar with fio at all, I can't > > really figure out what's wrong. My eMMC device is only 916MiB in size, so I > > edited the last part to be: > > > > offset_increment=100M > > size=100M > > > > Is that ok? > > > > I still get error messages complaining about blocksize though. Here is the > > output I get (can't really make sense of it): > > > > # ./fio ../disk-verify2.fio > > Multiple writers may overwrite blocks that belong to other jobs. This can cause verification failures. > > /dev/mmcblk1p2: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 > > ... > > fio-2.1.10-49-gf302 > > Starting 4 processes > > fio: blocksize too large for data set > > fio: blocksize too large for data set > > fio: blocksize too large for data set > > fio: io_u.c:1315: __get_io_u: Assertion `io_u->flags & IO_U_F_FREE' failed.ta 00m:00s] > > fio: pid=7612, got signal=6 > > > > /dev/mmcblk1p2: (groupid=0, jobs=1): err= 0: pid=7612: Fri Jul 4 09:31:15 2014 > > lat (msec) : 4=0.19%, 10=0.19%, 20=0.19%, 50=0.85%, 100=1.23% > > lat (msec) : 250=56.01%, 500=37.18%, 750=1.14% > > cpu : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0 > > IO depths : 1=0.1%, 2=0.2%, 4=0.4%, 8=0.8%, 16=1.5%, 32=97.1%, >=64=0.0% > > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > > issued : total=r=33/w=1024/d=0, short=r=0/w=0/d=0 > > latency : target=0, window=0, percentile=100.00%, depth=32 > > > > Run status group 0 (all jobs): > > > > Disk stats (read/write): > > mmcblk1: ios=11/1025, merge=0/0, ticks=94/6671, in_queue=7121, util=96.12% > > fio: file hash not empty on exit > > > > > > This assertion bugs me. Is it due to the previous errors ("blocksize too large > > for data set") or is is because my eMMC drive/kernel is seriously screwed? > > > > Help please! > Ohhh. Actually this is axboe's crap. Recent fio's version is broken. > Please use old good commit ffa93ca9d8d37ef > git checkout git://git.kernel.dk/fio.git > cd fio > git checkout -b b2.0.13 ffa93ca9d8d37ef > make -j4 > ./fio ffa93ca9d8d37ef Ok, that fixed it... :-) Here is the output I am getting... AFAICS no problems on the raw device. 
Is this sufficient testing, Ted? # ./fio ../disk-verify2.fio Multiple writers may overwrite blocks that belong to other jobs. This can cause verification failures. /dev/mmcblk1p2: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 ... /dev/mmcblk1p2: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 fio-2.0.13 Starting 4 processes Jobs: 4 (f=4): [wVww] [100.0% done] [339K/523K/0K /s] [84 /130 /0 iops] [eta 00m:00s] /dev/mmcblk1p2: (groupid=0, jobs=1): err= 0: pid=1157: Fri Jul 4 11:22:53 2014 write: io=448828KB, bw=127664 B/s, iops=31 , runt=3600049msec slat (msec): min=1 , max=1821 , avg=30.33, stdev=66.98 clat (usec): min=52 , max=5463.3K, avg=910506.27, stdev=554911.89 lat (msec): min=5 , max=5531 , avg=940.84, stdev=567.60 clat percentiles (msec): | 1.00th=[ 159], 5.00th=[ 253], 10.00th=[ 351], 20.00th=[ 465], | 30.00th=[ 578], 40.00th=[ 685], 50.00th=[ 791], 60.00th=[ 914], | 70.00th=[ 1074], 80.00th=[ 1270], 90.00th=[ 1614], 95.00th=[ 1958], | 99.00th=[ 2802], 99.50th=[ 3163], 99.90th=[ 4228], 99.95th=[ 4555], | 99.99th=[ 5211] bw (KB/s) : min= 3, max= 944, per=37.55%, avg=139.69, stdev=93.70 lat (usec) : 100=0.01% lat (msec) : 2=0.01%, 4=0.61%, 10=4.33%, 20=5.93%, 50=16.04% lat (msec) : 100=18.92%, 250=4.74%, 500=9.38%, 750=12.09%, 1000=9.87% lat (msec) : 2000=15.58%, >=2000=2.51% cpu : usr=0.38%, sys=0.59%, ctx=332741, majf=0, minf=0 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0% issued : total=r=0/w=111616/d=0, short=r=112207/w=0/d=0 /dev/mmcblk1p2: (groupid=0, jobs=1): err= 0: pid=1158: Fri Jul 4 11:22:53 2014 read : io=352016KB, bw=946262 B/s, iops=231 , runt=380935msec slat (msec): min=2 , max=12334 , avg=261.58, stdev=527.14 clat (msec): min=18 , max=6145 , avg=59.95, stdev=221.07 lat (msec): min=21 , max=12360 , avg=321.53, stdev=570.56 clat percentiles (msec): | 1.00th=[ 22], 5.00th=[ 23], 10.00th=[ 23], 20.00th=[ 25], | 30.00th=[ 25], 40.00th=[ 26], 50.00th=[ 27], 60.00th=[ 28], | 70.00th=[ 29], 80.00th=[ 34], 90.00th=[ 43], 95.00th=[ 198], | 99.00th=[ 865], 99.50th=[ 1500], 99.90th=[ 3687], 99.95th=[ 3916], | 99.99th=[ 6128] write: io=352072KB, bw=111995 B/s, iops=27 , runt=3219084msec slat (msec): min=1 , max=12214 , avg=36.47, stdev=175.23 clat (usec): min=20 , max=12325K, avg=224666.84, stdev=487725.70 lat (msec): min=2 , max=12333 , avg=261.14, stdev=527.05 clat percentiles (usec): | 1.00th=[ 25], 5.00th=[ 28], 10.00th=[ 8032], 20.00th=[21376], | 30.00th=[38144], 40.00th=[57088], 50.00th=[84480], 60.00th=[132096], | 70.00th=[224256], 80.00th=[288768], 90.00th=[477184], 95.00th=[782336], | 99.00th=[2072576], 99.50th=[2932736], 99.90th=[5865472], 99.95th=[7503872], | 99.99th=[12255232] bw (KB/s) : min= 0, max= 8, per=1.13%, avg= 4.20, stdev= 2.05 lat (usec) : 50=3.49%, 100=0.08% lat (msec) : 4=0.20%, 10=2.37%, 20=3.64%, 50=54.33%, 100=9.55% lat (msec) : 250=12.29%, 500=8.59%, 750=2.25%, 1000=1.04%, 2000=1.49% lat (msec) : >=2000=0.69% cpu : usr=0.28%, sys=0.38%, ctx=199155, majf=0, minf=0 IO depths : 1=7.1%, 2=14.3%, 4=28.6%, 8=50.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=0/w=88004/d=0, short=r=88018/w=0/d=0 /dev/mmcblk1p2: (groupid=0, jobs=1): err= 0: pid=1159: Fri Jul 4 11:22:53 2014 read : io=270648KB, 
bw=164244 B/s, iops=40 , runt=1687389msec slat (msec): min=1 , max=2338 , avg=28.22, stdev=72.10 clat (msec): min=1 , max=2747 , avg=24.77, stdev=68.06 lat (msec): min=3 , max=2753 , avg=52.99, stdev=105.59 clat percentiles (msec): | 1.00th=[ 3], 5.00th=[ 3], 10.00th=[ 4], 20.00th=[ 6], | 30.00th=[ 6], 40.00th=[ 7], 50.00th=[ 7], 60.00th=[ 8], | 70.00th=[ 9], 80.00th=[ 10], 90.00th=[ 35], 95.00th=[ 178], | 99.00th=[ 269], 99.50th=[ 367], 99.90th=[ 734], 99.95th=[ 914], | 99.99th=[ 1942] write: io=270648KB, bw=144929 B/s, iops=35 , runt=1912261msec slat (msec): min=1 , max=2338 , avg=28.04, stdev=72.08 clat (usec): min=20 , max=4560 , avg=27.39, stdev=40.17 lat (msec): min=1 , max=2338 , avg=28.07, stdev=72.08 clat percentiles (usec): | 1.00th=[ 22], 5.00th=[ 23], 10.00th=[ 23], 20.00th=[ 24], | 30.00th=[ 24], 40.00th=[ 25], 50.00th=[ 25], 60.00th=[ 26], | 70.00th=[ 26], 80.00th=[ 27], 90.00th=[ 30], 95.00th=[ 41], | 99.00th=[ 54], 99.50th=[ 65], 99.90th=[ 81], 99.95th=[ 98], | 99.99th=[ 2160] bw (KB/s) : min= 1, max= 8, per=1.40%, avg= 5.22, stdev= 1.68 lat (usec) : 50=49.15%, 100=0.83%, 250=0.01%, 500=0.01% lat (msec) : 2=0.31%, 4=7.00%, 10=33.14%, 20=2.84%, 50=2.10% lat (msec) : 100=0.89%, 250=2.99%, 500=0.61%, 750=0.09%, 1000=0.03% lat (msec) : 2000=0.02%, >=2000=0.01% cpu : usr=0.34%, sys=0.58%, ctx=179182, majf=0, minf=0 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=0/w=67662/d=0, short=r=67662/w=0/d=0 /dev/mmcblk1p2: (groupid=0, jobs=1): err= 0: pid=1160: Fri Jul 4 11:22:53 2014 read : io=269304KB, bw=161896 B/s, iops=39 , runt=1703357msec slat (msec): min=1 , max=2787 , avg=28.12, stdev=69.99 clat (msec): min=1 , max=2876 , avg=25.13, stdev=70.53 lat (msec): min=3 , max=3065 , avg=53.26, stdev=105.59 clat percentiles (msec): | 1.00th=[ 3], 5.00th=[ 3], 10.00th=[ 4], 20.00th=[ 6], | 30.00th=[ 6], 40.00th=[ 7], 50.00th=[ 7], 60.00th=[ 8], | 70.00th=[ 9], 80.00th=[ 10], 90.00th=[ 36], 95.00th=[ 178], | 99.00th=[ 269], 99.50th=[ 375], 99.90th=[ 742], 99.95th=[ 1012], | 99.99th=[ 1844] write: io=269304KB, bw=145424 B/s, iops=35 , runt=1896288msec slat (msec): min=1 , max=2787 , avg=27.94, stdev=69.97 clat (usec): min=20 , max=4580 , avg=27.51, stdev=44.68 lat (msec): min=1 , max=2787 , avg=27.98, stdev=69.97 clat percentiles (usec): | 1.00th=[ 22], 5.00th=[ 23], 10.00th=[ 23], 20.00th=[ 24], | 30.00th=[ 24], 40.00th=[ 25], 50.00th=[ 25], 60.00th=[ 25], | 70.00th=[ 26], 80.00th=[ 27], 90.00th=[ 30], 95.00th=[ 40], | 99.00th=[ 54], 99.50th=[ 65], 99.90th=[ 82], 99.95th=[ 105], | 99.99th=[ 2160] bw (KB/s) : min= 1, max= 8, per=1.42%, avg= 5.30, stdev= 1.56 lat (usec) : 50=49.18%, 100=0.79%, 250=0.01%, 500=0.01% lat (msec) : 2=0.45%, 4=6.66%, 10=33.29%, 20=2.93%, 50=2.03% lat (msec) : 100=0.86%, 250=3.03%, 500=0.63%, 750=0.09%, 1000=0.02% lat (msec) : 2000=0.02%, >=2000=0.01% cpu : usr=0.35%, sys=0.58%, ctx=179806, majf=0, minf=0 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued : total=r=0/w=67326/d=0, short=r=67326/w=0/d=0 Run status group 0 (all jobs): READ: io=891968KB, aggrb=523KB/s, minb=158KB/s, maxb=924KB/s, mint=380935msec, maxt=1703357msec WRITE: io=1309.5MB, aggrb=372KB/s, minb=109KB/s, maxb=142KB/s, mint=1896288msec, 
maxt=3600049msec Disk stats (read/write): mmcblk1: ios=514970/335209, merge=2162/0, ticks=17103230/10385442, in_queue=27486452, util=100.00% Best regards, -- David Jander Protonic Holland. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-04 11:28 ` David Jander @ 2014-07-04 12:20 ` Theodore Ts'o 2014-07-04 12:38 ` Dmitry Monakhov 2014-07-04 13:45 ` David Jander 0 siblings, 2 replies; 50+ messages in thread From: Theodore Ts'o @ 2014-07-04 12:20 UTC (permalink / raw) To: David Jander; +Cc: Dmitry Monakhov, Matteo Croce, Darrick J. Wong, linux-ext4 On Fri, Jul 04, 2014 at 01:28:02PM +0200, David Jander wrote: > > Here is the output I am getting... AFAICS no problems on the raw device. Is > this sufficient testing, Ted? I'm not sure what theory Dmitry was trying to pursue when he requested that you run the fio test. Dmitry? Please note that at this point there may be multiple causes with similar symptoms that are showing up. So just because one person reports one set of data points, such as someone claiming they've seen this without a power drop to the storage device, it does not follow that all of the problems were caused by flaky I/O to the device. Right now, there are multiple theories floating around --- and it may be that more than one of them are true (i.e., there may be multiple bugs here). Some of the possibilities, which, again, may not be mutually exclusive: 1) Some kind of eMMC driver bug, which is possibly causing the CACHE FLUSH command not to be sent. 2) Some kind of hardware problem involving flash translation layers not having durable transactions of their flash metadata across power failures. 3) Some kind of ext4/jbd2 bug, recently introduced, where we are modifying some ext4 metadata (either the block allocation bitmap or block group summary statistics) outside of a valid transaction handle. 4) Some other kind of hard-to-reproduce race or wild pointer which is sometimes corrupting fs data structures. If someone has an easy-to-reproduce failure case, the first step is to do a very rough bisection test. Does the easy-to-reproduce failure go away if you use 3.14? 3.12? Also, if you can describe in great detail your hardware and software configuration, and under what circumstances the problem reproduces, and when it doesn't, that would also be critical. Whether you are doing just a reset or a power cycle when an unclean shutdown is involved might also be important. And at this point, because I'm getting very suspicious that there may be more than one root cause, we should try to keep the debugging of one person's reproduction, such as David's, separate from another's, such as Matteo's. It may be that they ultimately have the same root cause, and so if one person is able to get an interesting reproduction result, it would be great for the other person to try running the same experiment on their hardware/software configuration. But what we must not do is assume that one person's experiment is automatically applicable to other circumstances. Cheers, - Ted ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-04 12:20 ` Theodore Ts'o @ 2014-07-04 12:38 ` Dmitry Monakhov 2014-07-04 13:45 ` David Jander 1 sibling, 0 replies; 50+ messages in thread From: Dmitry Monakhov @ 2014-07-04 12:38 UTC (permalink / raw) To: Theodore Ts'o, David Jander; +Cc: Matteo Croce, Darrick J. Wong, linux-ext4 On Fri, 4 Jul 2014 08:20:22 -0400, Theodore Ts'o <tytso@mit.edu> wrote: > On Fri, Jul 04, 2014 at 01:28:02PM +0200, David Jander wrote: > > > > Here is the output I am getting... AFAICS no problems on the raw device. Is > > this sufficient testing, Ted? > > I'm not sure what theory Dmitry was trying to pursue when he requested > that you run the fio test. Dmitry? Because at this moment we have some complex storage+fs interaction, my idea was to simply isolate the raw device case and run an integrity test on that storage. fio/libaio is a trivial and easy way to do it (except that it does not issue flush commands). Unfortunately, according to David, the test finished w/o any error. So my theory about a broken storage driver was not confirmed. > > > Please note that at this point there may be multiple causes with > similar symptoms that are showing up. So just because one person > reports one set of data points, such as someone claiming they've seen > this without a power drop to the storage device, that therefore all of > the problems were caused by flaky I/O to the device. > > Right now, there are multiple theories floating around --- and it may > be that more than one of them are true (i.e., there may be multiple > bugs here). Some of the possibilities, which again, may not be > mutually exclusive: > > 1) Some kind of eMMC driver bug, which is possibly causing the CACHE > FLUSH command not to be sent. > > 2) Some kind of hardware problem involving flash translation layers > not having durable transactions of their flash metadata across power > failures. > > 3) Some kind of ext4/jbd2 bug, recently introduced, where we are > modifying some ext4 metadata (either the block allocation bitmap or > block group summary statistics) outside of a valid transaction handle. > > 4) Some other kind of hard-to-reproduce race or wild pointer which is > sometimes corrupting fs data structures. > > > If someone has a easy to reproduce failure case, the first step is to > do a very rough bisection test. Does the easy-to-reproduce failure go > away if you use 3.14? 3.12? Also, if you can describe in great > detail your hardware and software configuration, and under what > circumstances the problem reproduces, and when it doesn't, that would > also be critical. Whether you are just doing reset or a power cycle > if an unclean shutdown is involved, might also be important. > > And at this point, because I'm getting very suspicious that there may > be more than one root cause, we should try to keep the debugging of > one person's reproduction, such as David's, separate from another's, > such as Matteo's. It may be that there ultimately have the same root > cause, and so if one person is able to get an interesting reproduction > result, it would be great for the other person to try running the same > experiment on their hardware/software configuration. But what we must > not do is assume that one person's experiment is automatically > applicable to other circumstances.
> > Cheers, > > - Ted > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-04 12:20 ` Theodore Ts'o 2014-07-04 12:38 ` Dmitry Monakhov @ 2014-07-04 13:45 ` David Jander 2014-07-04 18:45 ` Theodore Ts'o 1 sibling, 1 reply; 50+ messages in thread From: David Jander @ 2014-07-04 13:45 UTC (permalink / raw) To: Theodore Ts'o Cc: Dmitry Monakhov, Matteo Croce, Darrick J. Wong, linux-ext4 Hi Ted, Dmitry, On Fri, 4 Jul 2014 08:20:22 -0400 "Theodore Ts'o" <tytso@mit.edu> wrote: > On Fri, Jul 04, 2014 at 01:28:02PM +0200, David Jander wrote: > > > > Here is the output I am getting... AFAICS no problems on the raw device. Is > > this sufficient testing, Ted? > > I'm not sure what theory Dmitry was trying to pursue when he requested > that you run the fio test. Dmitry? > > > Please note that at this point there may be multiple causes with > similar symptoms that are showing up. So just because one person > reports one set of data points, such as someone claiming they've seen > this without a power drop to the storage device, that therefore all of > the problems were caused by flaky I/O to the device. > > Right now, there are multiple theories floating around --- and it may > be that more than one of them are true (i.e., there may be multiple > bugs here). Some of the possibilities, which again, may not be > mutually exclusive: > > 1) Some kind of eMMC driver bug, which is possibly causing the CACHE > FLUSH command not to be sent. How can I investigate this? According to the fio tests I ran and the explanation Dmitry gave, I conclude that incorrect sending of CACHE-FLUSH commands is the only thing left to rule out on the eMMC driver front, right? > 2) Some kind of hardware problem involving flash translation layers > not having durable transactions of their flash metadata across power > failures. That would be like blaming Micron (the eMMC part manufacturer) for faulty firmware... could be, but how can we test this? > 3) Some kind of ext4/jbd2 bug, recently introduced, where we are > modifying some ext4 metadata (either the block allocation bitmap or > block group summary statistics) outside of a valid transaction handle. I think I have some more evidence to support this case: Until now, I never ran fsck EVER! I know that this is not a good idea to do in a production environment, but I am only testing right now, and in theory it should not be necessary, right? What I did this time was to run fsck.ext3 or fsck.ext4 (depending on FS format of course) once every one or two power cycles. So effectively, what I did amounts to this: CASE 1: fsck on every power-cycle: 1.- Boot from clean filesystem 2.- Run the following command line: $ cp -a /usr . & bonnie\+\+ -r 32 -u 100:100 & bonnie\+\+ -r 32 -u 102:102 3.- Hit CTRL+Z (to stop the second bonnie++ process) 4.- Execute "sync" 5.- While "sync" was running, cut off the power supply. 6.- Turn on power and boot from external medium 7.- Run fsck.ext3/4 on eMMC device 8.- Repeat In this case, there was a minor difference between the fsck output of the two filesystems: EXT4 was always something like this: # fsck.ext4 /dev/mmcblk1p2 e2fsck 1.42.5 (29-Jul-2012) rootfs: recovering journal Setting free inodes count to 37692 (was 37695) Setting free blocks count to 136285 (was 136291) rootfs: clean, 7140/44832 files, 42915/179200 blocks While for EXT3 the output did not contain the "Setting free * count..."
messages: # fsck.ext3 -p /dev/mmcblk1p2 rootfs: recovering journal rootfs: clean, 4895/44832 files, 36473/179200 blocks CASE 2: fsck on every other power-cycle: Same as CASE 1 steps 1...5 and then: 6.- Turn on power and boot again from dirty internal eMMC without running fsck. 7.- Repeat steps 2...5 one more time 8.- Perform steps 6...8 from CASE 1. With this test, the following difference became apparent: With EXT3: fsck.ext3 did the same as in CASE 1 With EXT4: I get a long list of errors that are being fixed. It starts like this: # fsck.ext4 /dev/mmcblk1p2 e2fsck 1.42.5 (29-Jul-2012) rootfs: recovering journal rootfs contains a file system with errors, check forced. Pass 1: Checking inodes, blocks, and sizes Inode 4591, i_blocks is 16, should be 8. Fix<y>? yes Inode 4594, i_blocks is 16, should be 8. Fix<y>? yes Inode 4595, i_blocks is 16, should be 8. Fix<y>? yes Inode 4596, i_blocks is 16, should be 8. Fix<y>? yes Inode 4597, i_blocks is 16, should be 8. Fix<y>? yes Inode 4598, i_blocks is 16, should be 8. Fix<y>? yes Inode 4599, i_blocks is 16, should be 8. Fix<y>? yes Inode 4600, i_blocks is 16, should be 8. Fix<y>? yes Inode 4601, i_blocks is 16, should be 8. Fix<y>? yes Inode 4602, i_blocks is 16, should be 8. Fix<y>? yes Inode 4603, i_blocks is 16, should be 8. Fix<y>? yes ... ... Eventually I pressed CTRL+C and restarted fsck with the option "-p", because this list was getting a little long. ... ... # fsck.ext4 -p /dev/mmcblk1p2 rootfs contains a file system with errors, check forced. rootfs: Inode 5391, i_blocks is 32, should be 16. FIXED. rootfs: Inode 5392, i_blocks is 16, should be 8. FIXED. rootfs: Inode 5393, i_blocks is 48, should be 24. FIXED. rootfs: Inode 5394, i_blocks is 32, should be 16. FIXED. rootfs: Inode 5395, i_blocks is 16, should be 8. FIXED. ... ... rootfs: Inode 5854, i_blocks is 240, should be 120. FIXED. rootfs: Inode 5857, i_blocks is 576, should be 288. FIXED. rootfs: Inode 5860, i_blocks is 512, should be 256. FIXED. rootfs: Inode 5863, i_blocks is 656, should be 328. FIXED. rootfs: Inode 5866, i_blocks is 480, should be 240. FIXED. rootfs: Inode 5869, i_blocks is 176, should be 88. FIXED. rootfs: Inode 5872, i_blocks is 336, should be 168. FIXED. rootfs: 11379/44832 files (0.1% non-contiguous), 70010/179200 blocks # > 4) Some other kind of hard-to-reproduce race or wild pointer which is > sometimes corrupting fs data structures. I don't have such a hard time reproducing it... but it does take quite some time (booting several times, re-installing, testing, etc...) > If someone has a easy to reproduce failure case, the first step is to > do a very rough bisection test. Does the easy-to-reproduce failure go > away if you use 3.14? 3.12? Also, if you can describe in great > detail your hardware and software configuration, and under what > circumstances the problem reproduces, and when it doesn't, that would > also be critical. Whether you are just doing reset or a power cycle > if an unclean shutdown is involved, might also be important. So far, I have always done a power-cycle, but I can try to check if I am able to reproduce the problem with just a "shutdown -f" (AFAIK, this does NOT sync filesystems, right?) I will try to check 3.14 and 3.12 (if 3.14 still seems buggy). It could take quite a while until I have results... certainly not before Monday.
> And at this point, because I'm getting very suspicious that there may > be more than one root cause, we should try to keep the debugging of > one person's reproduction, such as David's, separate from another's, > such as Matteo's. It may be that there ultimately have the same root > cause, and so if one person is able to get an interesting reproduction > result, it would be great for the other person to try running the same > experiment on their hardware/software configuration. But what we must > not do is assume that one person's experiment is automatically > applicable to other circumstances. I agree. Best regards, -- David Jander Protonic Holland. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-04 13:45 ` David Jander @ 2014-07-04 18:45 ` Theodore Ts'o 2014-07-04 22:46 ` Dave Chinner ` (3 more replies) 0 siblings, 4 replies; 50+ messages in thread From: Theodore Ts'o @ 2014-07-04 18:45 UTC (permalink / raw) To: David Jander; +Cc: Dmitry Monakhov, Matteo Croce, Darrick J. Wong, linux-ext4 On Fri, Jul 04, 2014 at 03:45:59PM +0200, David Jander wrote: > > 1) Some kind of eMMC driver bug, which is possibly causing the CACHE > > FLUSH command not to be sent. > > How can I investigate this? According to the fio tests I ran and the > explanation Dmitry gave, I conclude that incorrectly sending of CACHE-FLUSH > commands is the only thing left to be discarded on the eMMC driver front, > right? Can you try using an older kernel? The report that I quoted from John Stultz (https://lkml.org/lkml/2014/6/12/19) indicated that it was a problem that showed up in "recent kernels", and a bisection search seemed to point towards an unknown problem in the eMMC driver. Quoting from https://lkml.org/lkml/2014/6/12/762: "However, despite many many reboots the last good commit in my branch - bb5cba40dc7f079ea7ee3ae760b7c388b6eb5fc3 (mmc: block: Fixup busy detection while...) doesn't ever show the issue. While the immediately following commit which bisect found - e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 (mmc: mmci: Handle CMD irq before DATA irq) always does. The immensely frustrating part is while backing that single change off from its commit sha always makes the issue go away, reverting that change from on top of v3.15 doesn't. The issue persists....." > > 2) Some kind of hardware problem involving flash translation layers > > not having durable transactions of their flash metadata across power > > failures. > > That would be like blaming Micron (the eMMC part manufacturer) for faulty > firmware... could be, but how can we test this? The problem is that people who write these programs end up doing one-offs, as opposed to something that is well packaged and stands the test of time. But basically what we want is a program that writes to sequential blocks in a block device with the following information: *) a timestamp (seconds and microseconds from gettimeofday) *) a 64-bit generation number (which is randomly generated and the same for each run of the program) *) a 32-bit sequence number (starts at zero and increments once per block) *) a 32-bit "sync" number which is written after each time fsync(2) is called while writing to the disk *) the sector number where the data was written *) a CRC of the above information *) some random pattern to fill the rest of the 512 or 4k block, depending on the physical sector size The program would then write this information to each 512 or 4k block, and periodically, say every N blocks (where N should be configurable), it would call fsync(), which on an open block device should result in a CACHE FLUSH command being sent. Ideally, the sequence number and sync number would be output via a serial console, or maybe over an ssh connection, so you can see what was the last sequence number and sync number that was written to the disk before you yank the power cord out of the wall. Then you have to write another program which scans the disk and makes sure that all of the blocks up until the last sync number that was output were in fact preserved faithfully on the disk after the power outage. If not, you know you have a problem.
For bonus credit, you can also have a mode where the disk is pre-written with some known repeating pattern, such as 0xDEADBEEF, and then see if you have some sectors that do not contain either the expected 0xDEADBEEF, or the test pattern described above. That indicates a potential problem where, as power drops, sometimes the memory can go insane before the hard drive and the DMA engine manage to stop. (Since as the saying goes, death is not an event, but rather a process, and different components inside the computer can die at different times.) SGI discovered this problem very early on, which is why SGI machines have extra large capacitors on their power supplies, plus a power fail interrupt, so that when they lose power, the system can abort DMA transfers before allowing the power to shut down. Otherwise, a file system which is doing logical journalling can end up with crap getting written on top of some critical file system data block, which replaying the journal won't help, since the journal just contains things like "update the mtime to be this value". Since ext3 and ext4 do physical block journalling, we don't suffer from this problem since we always write the journal first, and the journal is flushed to disk before we update the metadata blocks. On a power failure, when we replay the journal, if there is a corrupted fs metadata block due to the power failure causing garbage to be written to the disk, the damage can get repaired automatically when the journal is replayed. And this is a good thing, since PC class hardware generally doesn't have power fail interrupts. OTOH, ext3/4 is not as efficient as file systems such as XFS that do logical journalling. Life is full of tradeoffs.... > EXT4 was always something like this: > > # fsck.ext4 /dev/mmcblk1p2 > e2fsck 1.42.5 (29-Jul-2012) > rootfs: recovering journal > Setting free inodes count to 37692 (was 37695) > Setting free blocks count to 136285 (was 136291) > rootfs: clean, 7140/44832 files, 42915/179200 blocks This is normal, and not an indication of a problem. In ext4, we no longer update the free blocks and free inodes count in the superblock, except when we unmount the file system. Otherwise on large CPU systems, needing to grab the superblock lock to update these values on each block/inode allocation and deallocation becomes a massive scalability bottleneck. Instead, at mount time, since we need to read in the block group descriptors *anyway*, we just sum the free blocks/inodes in each block group descriptor, and update the global counters then. So the fact that the global free blocks and free inodes values in the superblock are not up to date after a crash is normal, and expected. This is why e2fsck didn't ask for permission before fixing this issue, which it would have done had you run e2fsck without either the -p or -y option. > CASE 2: fsck on every other power-cycle: > > Same as CASE 1 steps 1...5 and then: > 6.- Turn on power and boot again from dirty internal eMMC without running fsck. > 7.- Repeat steps 2...5 one more time > 8.- Perform steps 6...8 from CASE 1. > > With this test, the following difference became apparent: > > With EXT3: fsck.ext3 did the same as in CASE 1 > > With EXT4: I get a long list of errors that are being fixed. > It starts like this: > > # fsck.ext4 /dev/mmcblk1p2 > e2fsck 1.42.5 (29-Jul-2012) > rootfs: recovering journal > rootfs contains a file system with errors, check forced. > Pass 1: Checking inodes, blocks, and sizes > Inode 4591, i_blocks is 16, should be 8. Fix<y>?
yes > Inode 4594, i_blocks is 16, should be 8. Fix<y>? yes > Inode 4595, i_blocks is 16, should be 8. Fix<y>? yes > Inode 4596, i_blocks is 16, should be 8. Fix<y>? yes > Inode 4597, i_blocks is 16, should be 8. Fix<y>? yes > Inode 4598, i_blocks is 16, should be 8. Fix<y>? yes > Inode 4599, i_blocks is 16, should be 8. Fix<y>? yes > Inode 4600, i_blocks is 16, should be 8. Fix<y>? yes > Inode 4601, i_blocks is 16, should be 8. Fix<y>? yes > Inode 4602, i_blocks is 16, should be 8. Fix<y>? yes > Inode 4603, i_blocks is 16, should be 8. Fix<y>? yes *This* is a problem. And if you're seeing it for such a very large number of cases, it makes it unlikely to me that it is an ext4/jbd2 bug. For one thing, if it's so blatant, I should be able to replicate it very easily when using my "kill qemu" test. What this makes me suspect is that you may be getting an older version of an inode table block. That's why the above test procedure I outlined included the timestamp and the generation number in each block. If after a power failure you start seeing an old generation number, then that's a hint that the FTL did something funky. And the advantage of using a test program that writes to the raw eMMC flash is that it cuts the file system completely out of the picture.... Cheers, - Ted ^ permalink raw reply [flat|nested] 50+ messages in thread
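To make the program Ted describes concrete, here is a minimal sketch of the writer side. The details left open above are filled in with assumptions: 512-byte records, an fsync() every 1024 blocks, and a trivial additive checksum standing in for a real CRC. It is illustrative only, not the well-packaged tool being asked for (see the follow-ups in this thread for existing implementations of the same idea).

/* torture-writer.c: sketch of the block-device pattern writer described
 * above.  DESTROYS ALL DATA on the target device. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>
#include <sys/time.h>

#define BLKSZ      512
#define SYNC_EVERY 1024                 /* "every N blocks"; make configurable */

struct record {
	uint64_t ts_sec;                /* gettimeofday() seconds */
	uint64_t ts_usec;               /* gettimeofday() microseconds */
	uint64_t generation;            /* random, fixed for this run */
	uint32_t sequence;              /* increments once per block */
	uint32_t sync_count;            /* fsync()s completed so far */
	uint64_t sector;                /* where this record was written */
	uint32_t csum;                  /* checksum over the struct with csum=0 */
};

static uint32_t csum(const void *p, size_t n)
{
	const uint8_t *b = p;
	uint32_t s = 0;
	while (n--)
		s = s * 31 + *b++;      /* toy checksum; use a real CRC */
	return s;
}

int main(int argc, char **argv)
{
	const char *dev = argc > 1 ? argv[1] : "/dev/mmcblk1p2";
	uint8_t buf[BLKSZ];
	struct record r;
	struct timeval tv;
	int fd = open(dev, O_WRONLY);

	if (fd < 0) { perror(dev); return 1; }
	srand(getpid() ^ time(NULL));
	memset(&r, 0, sizeof(r));
	r.generation = ((uint64_t)rand() << 32) | (uint32_t)rand();

	for (r.sequence = 0; ; r.sequence++) {
		gettimeofday(&tv, NULL);
		r.ts_sec  = tv.tv_sec;
		r.ts_usec = tv.tv_usec;
		r.sector  = r.sequence;         /* one record per 512B sector */
		r.csum    = 0;
		r.csum    = csum(&r, sizeof(r));
		memset(buf, 0xA5, sizeof(buf)); /* filler for the rest of the block */
		memcpy(buf, &r, sizeof(r));
		if (pwrite(fd, buf, BLKSZ, (off_t)r.sector * BLKSZ) != BLKSZ)
			break;                  /* end of device: stop (or wrap) */
		if (r.sequence % SYNC_EVERY == SYNC_EVERY - 1) {
			if (fsync(fd)) { perror("fsync"); return 1; }
			r.sync_count++;
			/* last durable point; capture this on the serial console */
			fprintf(stderr, "seq=%u sync=%u\n", r.sequence, r.sync_count);
		}
	}
	return 0;
}

The matching scanner would re-read every record, recompute the checksum, and confirm that each sequence number at or below the last sync count reported on the console is present and carries the current generation number; an old generation number in that range is exactly the stale-inode-table-block symptom suspected above.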
* Re: ext4: journal has aborted 2014-07-04 18:45 ` Theodore Ts'o @ 2014-07-04 22:46 ` Dave Chinner 2014-07-05 2:30 ` Dmitry Monakhov ` (2 subsequent siblings) 3 siblings, 0 replies; 50+ messages in thread From: Dave Chinner @ 2014-07-04 22:46 UTC (permalink / raw) To: Theodore Ts'o Cc: David Jander, Dmitry Monakhov, Matteo Croce, Darrick J. Wong, linux-ext4 On Fri, Jul 04, 2014 at 02:45:39PM -0400, Theodore Ts'o wrote: > On Fri, Jul 04, 2014 at 03:45:59PM +0200, David Jander wrote: > > > 1) Some kind of eMMC driver bug, which is possibly causing the CACHE > > > FLUSH command not to be sent. > > > > How can I investigate this? According to the fio tests I ran and the > > explanation Dmitry gave, I conclude that incorrectly sending of CACHE-FLUSH > > commands is the only thing left to be discarded on the eMMC driver front, > > right? > > Can you try using an older kernel? The report that that I quoted from > John Stultz (https://lkml.org/lkml/2014/6/12/19) indicated that it was > a problem that showed up in "recent kernels", and a bisection search > seemed to point towards an unknown problem in the eMMC driver. > Quoting from https://lkml.org/lkml/2014/6/12/762: > > "However, despite many many reboots the last good commit in my > branch - bb5cba40dc7f079ea7ee3ae760b7c388b6eb5fc3 (mmc: block: > Fixup busy detection while...) doesn't ever show the issue. While > the immediately following commit which bisect found - > e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 (mmc: mmci: Handle CMD > irq before DATA irq) always does. > > The immensely frustrating part is while backing that single change off > from its commit sha always makes the issue go away, reverting that > change from on top of v3.15 doesn't. The issue persists....." > > > > 2) Some kind of hardware problem involving flash translation layers > > > not having durable transactions of their flash metadata across power > > > failures. > > > > That would be like blaming Micron (the eMMC part manufacturer) for faulty > > firmware... could be, but how can we test this? > > The problem is that people who write these programs end up doing > one-offs, as opposed to something that is well packaged and stands the > test of time. But basically what we want is a program that writes to > sequential blocks in a block device with the following information: > > *) a timestamp (seconds and microseconds from gettimeofday) > *) a 64-bit generation number (which is randomly > generated and the same for each run of the progam) > *) a 32-bit sequence number (starts at zero and > increments once per block > *) a 32-bit "sync" number which is written after each time > fsync(2) is called while writing to the disk > *) the sector number where the data was written > *) a CRC of the above information > *) some random pattern to fill the rest of the 512 or 4k block, > depending on the physical sector size genstream + checkstream. http://oss.sgi.com/projects/nfs/testtools/ Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-04 18:45 ` Theodore Ts'o 2014-07-04 22:46 ` Dave Chinner @ 2014-07-05 2:30 ` Dmitry Monakhov 2014-07-05 20:36 ` Theodore Ts'o 2014-07-07 12:17 ` David Jander 3 siblings, 0 replies; 50+ messages in thread From: Dmitry Monakhov @ 2014-07-05 2:30 UTC (permalink / raw) To: Theodore Ts'o, David Jander; +Cc: Matteo Croce, Darrick J. Wong, linux-ext4 On Fri, 4 Jul 2014 14:45:39 -0400, Theodore Ts'o <tytso@mit.edu> wrote: > On Fri, Jul 04, 2014 at 03:45:59PM +0200, David Jander wrote: > > > 1) Some kind of eMMC driver bug, which is possibly causing the CACHE > > > FLUSH command not to be sent. > > > > How can I investigate this? According to the fio tests I ran and the > > explanation Dmitry gave, I conclude that incorrectly sending of CACHE-FLUSH > > commands is the only thing left to be discarded on the eMMC driver front, > > right? > > Can you try using an older kernel? The report that that I quoted from > John Stultz (https://lkml.org/lkml/2014/6/12/19) indicated that it was > a problem that showed up in "recent kernels", and a bisection search > seemed to point towards an unknown problem in the eMMC driver. > Quoting from https://lkml.org/lkml/2014/6/12/762: > > "However, despite many many reboots the last good commit in my > branch - bb5cba40dc7f079ea7ee3ae760b7c388b6eb5fc3 (mmc: block: > Fixup busy detection while...) doesn't ever show the issue. While > the immediately following commit which bisect found - > e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 (mmc: mmci: Handle CMD > irq before DATA irq) always does. > > The immensely frustrating part is while backing that single change off > from its commit sha always makes the issue go away, reverting that > change from on top of v3.15 doesn't. The issue persists....." > > > > 2) Some kind of hardware problem involving flash translation layers > > > not having durable transactions of their flash metadata across power > > > failures. > > > > That would be like blaming Micron (the eMMC part manufacturer) for faulty > > firmware... could be, but how can we test this? > > The problem is that people who write these programs end up doing > one-offs, as opposed to something that is well packaged and stands the > test of time. But basically what we want is a program that writes to > sequential blocks in a block device with the following information: > > *) a timestamp (seconds and microseconds from gettimeofday) > *) a 64-bit generation number (which is randomly > generated and the same for each run of the progam) > *) a 32-bit sequence number (starts at zero and > increments once per block > *) a 32-bit "sync" number which is written after each time > fsync(2) is called while writing to the disk > *) the sector number where the data was written > *) a CRC of the above information > *) some random pattern to fill the rest of the 512 or 4k block, > depending on the physical sector size I've posted a similar utility (hwflush-check) here: http://oss.sgi.com/archives/xfs/2013-05/msg00443.html We use it for testing the power-failure tolerance of SSDs. It has a client and a server. # On the server side, start listening for incoming connections # Stage 0: hwflush-check -l ############ # On tested node # Stage 1: Create and allocate all files. hwflush-check -s $SERVER_IP -d $SCRATCH_MNT --prep # Stage 2: Start the test and let it work for a while hwflush-check -s $SERVER_IP -d $SCRATCH_MNT & sleep 20; wall "it is time to cut power cord" # Stage 3: On next boot, check that all data was successfully flushed.
hwflush-check -s $SERVER_IP -d $SCRATCH_MNT --check It is reasonable to repeat steps 2 and 3 several times. > > The program would then write this information to each 512 or 4k block, > and periodically, say every N blocks (where N should be configurable), > it would call fsync() which on an open block device, should result in > a CACHE FLUSH command being sent. > > Ideally, the sequence number and sync number would be output via a > serial console, or maybe over an ssh connection, so you can see what > was the last sequence number and sync number that was written before > the disk, before you yank the power cord out of the wall. > > Then you have write another program which scans the disk and makes > sure that all of the blocks up until the last sync number that was > output was in fact preserved faithfully on the disk after the power > outage. > > If not, you know you have a problem. > > For bonus credit, you can also have a mode where the disk is > pre-written with some known repeating pattern, such as 0xDEADBEEF, and > then see if you have some sectors that do not contain either the > expected 0xDEADBEEF, or the test pattern described above. > > That indicates a potential problem where as power drops, sometimes the > memory can go insane before the hard drive and the DMA engine manages > to stop. (Since as the saying goes, death is not an event, but rather > a process, and different components inside the computer can die at > different times.) SGI discovered this problem very early on, which is > why SGI machines have extra large capacitors on their power supplies, > plus a power fail interrupt, so that when they lose power, the system > can abort DMA transfers before allowing the power to shut down. > Otherwise, a file system which is doing logical journalling can end up > with crap getting written on top of some critical file system data > block, and which replying the journal won't help, since the journal > just contains things like "update the mtime to be this value". > > Since ext3 and ext4 do physical block journalling, we don't suffer > from this problem since we always write the journal first, and the > journal is flushed to disk before we update the metadata blocks. On a > power failure, when we replay the journal, if there is a corrupted fs > metadata block due to the power failure causing garbage to be written > to the disk, the damage can get repaired automatically when the > journal is replayed. And this is a good thing, since PC class > hardware generally don't have power fail interrupts. OTOH, ext3/4 is > not as efficient as file systems such as XFS that do logical > journalling. Life is full of tradeoffs.... > > > EXT4 was always something like this: > > > > # fsck.ext4 /dev/mmcblk1p2 > > e2fsck 1.42.5 (29-Jul-2012) > > rootfs: recovering journal > > Setting free inodes count to 37692 (was 37695) > > Setting free blocks count to 136285 (was 136291) > > rootfs: clean, 7140/44832 files, 42915/179200 blocks > > This is normal, and not an indication of a problem. In ext4, we > longer update the free blocks and free inodes count in the superblock, > except when we unmount the file system. Otherwise on large CPU > systems, needing to grab the superblock lock to update these values on > each block/inode allocation and deallocation becomes a massive > scalability bottleneck. > > Instead, at mount time, since we need need to read in the block group > descriptors *anyway*, we just sum the free blocks/inodes in each block > group descriptor, and update the global counters then.
So the fact > that the global free blocks and free inodes values in the superblock > are not up to date after a crash is normal, and expected. This is why > e2fsck didn't ask for permission before fixing this issue, which it > would have done if you had run e2fsck without either the -p or -y option. > > > CASE 2: fsck on every other power-cycle: > > > > Same as CASE 1 steps 1...5 and then: > > 6.- Turn on power and boot again from dirty internal eMMC without running fsck. > > 7.- Repeat steps 2...5 one more time > > 8.- Perform steps 6...8 from CASE 1. > > > > With this test, the following difference became apparent: > > > > With EXT3: fsck.ext3 did the same as in CASE 1 > > > > With EXT4: I get a long list of errors that are being fixed. > > It starts like this: > > > > # fsck.ext4 /dev/mmcblk1p2 > > e2fsck 1.42.5 (29-Jul-2012) > > rootfs: recovering journal > > rootfs contains a file system with errors, check forced. > > Pass 1: Checking inodes, blocks, and sizes > > Inode 4591, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4594, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4595, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4596, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4597, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4598, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4599, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4600, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4601, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4602, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4603, i_blocks is 16, should be 8. Fix<y>? yes > > *This* is a problem. And if you're seeing it for such a very large > number of cases, it makes it unlikely to me that it is an ext4/jbd2 > bug. For one thing, if it's so blatant, I should be able to replicate > it very easily when using my "kill qemu" test. > > What this makes me suspect is that you may be getting an older version > of an inode table block. That's why the above test procedure I > outlined included the timestamp and the generation number in each > block. If after a power failure you start seeing an old generation > number, then that's a hint that the FTL did something funky. > > And the advantage of using a test program that writes to the raw > eMMC flash is that it cuts the file system completely out of the > picture.... > > Cheers, > > - Ted > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 50+ messages in thread
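For reference, a minimal sketch of the kind of raw-device writer described above (Python 3 here; the struct layout, field order, and command-line arguments are illustrative choices, not from the thread):

#!/usr/bin/env python3
# Sketch of the torture-test writer Ted describes: one 512-byte record
# per sector, fsync() every N sectors.  Assumes fsync() on an open
# block device results in a CACHE FLUSH being sent.
import os, struct, sys, time, random
from zlib import crc32

def write_pattern(dev, nsync):
    gen = random.getrandbits(64)                 # per-run generation number
    fd = os.open(dev, os.O_WRONLY)
    nsectors = os.lseek(fd, 0, os.SEEK_END) // 512
    os.lseek(fd, 0, os.SEEK_SET)
    syncs = 0
    for seq in range(nsectors):
        now = time.time()
        hdr = struct.pack("<QQQLLQ",
                          int(now), int((now % 1) * 1e6),  # sec, usec
                          gen, seq, syncs, seq)  # seq doubles as sector no.
        rec = hdr + struct.pack("<L", crc32(hdr))
        rec += os.urandom(512 - len(rec))        # random fill pattern
        os.write(fd, rec)
        if (seq + 1) % nsync == 0:
            os.fsync(fd)                         # should force a CACHE FLUSH
            syncs += 1
            print("synced: seq=%d syncs=%d" % (seq, syncs), flush=True)
    os.close(fd)

if __name__ == "__main__":
    write_pattern(sys.argv[1], int(sys.argv[2]))

The matching checker would re-scan the device after the power-cycle and verify that every record up to the last sync count printed on the console survived intact.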
* Re: ext4: journal has aborted 2014-07-04 18:45 ` Theodore Ts'o 2014-07-04 22:46 ` Dave Chinner 2014-07-05 2:30 ` Dmitry Monakhov @ 2014-07-05 20:36 ` Theodore Ts'o 2014-07-07 12:17 ` David Jander 3 siblings, 0 replies; 50+ messages in thread From: Theodore Ts'o @ 2014-07-05 20:36 UTC (permalink / raw) To: David Jander Cc: Dmitry Monakhov, Matteo Croce, Darrick J. Wong, linux-ext4, Pavel Machek I've been auditing the ext4 code looking for a potential bug in our journalling code that might result in fs corruption after a crash/power failure. I did find one such instance: http://patchwork.ozlabs.org/patch/367346/ However, I'm fairly sure this is not the cause of most of the problems folks have been complaining about. For one thing, this bug has been around for a long, long time (since v3.5 or so). For another, hitting it is incredibly rare. It would only be noticeable if you crash immediately (within 5 seconds or so) after allocating the first inode in a block group where both the inode allocation bitmap and block allocation bitmap are uninitialized (i.e., no inodes or blocks have ever been allocated from that block group since the file system was first initialized). So for people who are indicating that they are seeing problems much more frequently in 3.15, but not in older kernels, it's almost certainly something else going on. But if you can reliably reproduce the problem, you might want to try applying the patch. It certainly won't hurt, and if it does help, please let me know. :-) Cheers, - Ted ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-04 18:45 ` Theodore Ts'o ` (2 preceding siblings ...) 2014-07-05 20:36 ` Theodore Ts'o @ 2014-07-07 12:17 ` David Jander 2014-07-07 15:53 ` Theodore Ts'o 3 siblings, 1 reply; 50+ messages in thread From: David Jander @ 2014-07-07 12:17 UTC (permalink / raw) To: Theodore Ts'o Cc: Dmitry Monakhov, Matteo Croce, Darrick J. Wong, linux-ext4 Dear Ted, On Fri, 4 Jul 2014 14:45:39 -0400 "Theodore Ts'o" <tytso@mit.edu> wrote: > On Fri, Jul 04, 2014 at 03:45:59PM +0200, David Jander wrote: > > > 1) Some kind of eMMC driver bug, which is possibly causing the CACHE > > > FLUSH command not to be sent. > > > > How can I investigate this? According to the fio tests I ran and the > > explanation Dmitry gave, I conclude that incorrect sending of CACHE-FLUSH > > commands is the only thing left to be ruled out on the eMMC driver front, > > right? > > Can you try using an older kernel? The report that I quoted from > John Stultz (https://lkml.org/lkml/2014/6/12/19) indicated that it was > a problem that showed up in "recent kernels", and a bisection search > seemed to point towards an unknown problem in the eMMC driver. > Quoting from https://lkml.org/lkml/2014/6/12/762: > > "However, despite many many reboots the last good commit in my > branch - bb5cba40dc7f079ea7ee3ae760b7c388b6eb5fc3 (mmc: block: > Fixup busy detection while...) doesn't ever show the issue. While > the immediately following commit which bisect found - > e7f3d22289e4307b3071cc18b1d8ecc6598c0be4 (mmc: mmci: Handle CMD > irq before DATA irq) always does. > > The immensely frustrating part is that while backing that single change off > from its commit sha always makes the issue go away, reverting that > change from on top of v3.15 doesn't. The issue persists....." The next thing I am going to do is check 3.14 and/or 3.12, but first... > > > 2) Some kind of hardware problem involving flash translation layers > > > not having durable transactions of their flash metadata across power > > > failures. > > > > That would be like blaming Micron (the eMMC part manufacturer) for faulty > > firmware... could be, but how can we test this? > > The problem is that people who write these programs end up doing > one-offs, as opposed to something that is well packaged and stands the > test of time. But basically what we want is a program that writes to > sequential blocks in a block device with the following information: > > *) a timestamp (seconds and microseconds from gettimeofday) > *) a 64-bit generation number (which is randomly > generated and the same for each run of the program) > *) a 32-bit sequence number (starts at zero and > increments once per block) > *) a 32-bit "sync" number which is written after each time > fsync(2) is called while writing to the disk > *) the sector number where the data was written > *) a CRC of the above information > *) some random pattern to fill the rest of the 512 or 4k block, > depending on the physical sector size > > The program would then write this information to each 512 or 4k block, > and periodically, say every N blocks (where N should be configurable), > it would call fsync(), which on an open block device should result in > a CACHE FLUSH command being sent. > > Ideally, the sequence number and sync number would be output via a > serial console, or maybe over an ssh connection, so you can see what > was the last sequence number and sync number that was written to the > disk before you yank the power cord out of the wall.
> > Then you have to write another program which scans the disk and makes > sure that all of the blocks up until the last sync number that was > output were in fact preserved faithfully on the disk after the power > outage. > > If not, you know you have a problem. > > For bonus credit, you can also have a mode where the disk is > pre-written with some known repeating pattern, such as 0xDEADBEEF, and > then see if you have some sectors that do not contain either the > expected 0xDEADBEEF, or the test pattern described above. Ok, I wrote a simple python test script (see the end of this e-mail) that implements your idea. The tool checks the device to see how far it got last time, analyzes and dumps a corrupt block if it finds one, rewrites the whole device to the background pattern (0xdeadbeef), syncs, and then starts filling sectors from the beginning with records consisting of a sequence number (= sector index), a 64-bit pseudo-random number, a timestamp, a CRC, and fill data (0x55). The sync-interval is an input parameter: the interval (in sectors) between calls to fsync() (yes, I took care to flush python file objects first!). The sector number after each sync is output to stdout, which is logged on a serial port. The next time the program is run (after a power-cycle), data must have been correctly written at least up to that sector, and beyond it only sectors containing 100% background fill may exist. This is checked for. I have been running this script repeatedly with different sync-interval values on a 256 MiB partition of the eMMC device. So far, I have not been able to produce a sector with corrupt data, nor any evidence of CACHE_FLUSH not being processed correctly. The tool itself is quite hacky (I wrote it this morning), but AFAICS it should work correctly for our purposes. > That indicates a potential problem where, as power drops, sometimes the > memory can go insane before the hard drive and the DMA engine manage > to stop. (Since as the saying goes, death is not an event, but rather > a process, and different components inside the computer can die at > different times.) SGI discovered this problem very early on, which is > why SGI machines have extra large capacitors on their power supplies, Hahaha, yes, I think I heard that story before, but I honestly always thought it was a joke or a myth, since it is pretty stupid. As for our hardware, it does have proper brown-out detection and asserts "RESET" pins on all peripherals that have one as soon as any power rail comes out of range. This would have been the correct way of fixing SGI's "problem" IMHO. > plus a power fail interrupt, so that when they lose power, the system > can abort DMA transfers before allowing the power to shut down. A power-fail interrupt would be helpful to make sure there is no ongoing write or erase activity in the FLASH chips when power finally drops, but then the eMMC device itself would need support for it. Instead of that, they actually guarantee that power interruptions are harmless through the "reliable-writes" feature in the FTL firmware. The question is whether we can indeed trust that firmware. > Otherwise, a file system which is doing logical journalling can end up > with crap getting written on top of some critical file system data > block, which replaying the journal won't help, since the journal > just contains things like "update the mtime to be this value".
I am not familiar with filesystems that do that type of journalling, and I wonder what the use of it would be anyway, but I am not an expert in filesystems.... > Since ext3 and ext4 do physical block journalling, we don't suffer > from this problem, since we always write the journal first, and the > journal is flushed to disk before we update the metadata blocks. That I understand. That's why a working CACHE_FLUSH is so important, right? > On a > power failure, when we replay the journal, if there is a corrupted fs > metadata block due to the power failure causing garbage to be written > to the disk, the damage can get repaired automatically when the > journal is replayed. And this is a good thing, since PC class > hardware generally doesn't have power fail interrupts. OTOH, ext3/4 is > not as efficient as file systems such as XFS that do logical > journalling. Life is full of tradeoffs.... I never dared to use anything other than trusty ol' ext3/4.... or JFFS2 and UBIFS when dealing with bare flash devices, of course. > > EXT4 was always something like this: > > > > # fsck.ext4 /dev/mmcblk1p2 > > e2fsck 1.42.5 (29-Jul-2012) > > rootfs: recovering journal > > Setting free inodes count to 37692 (was 37695) > > Setting free blocks count to 136285 (was 136291) > > rootfs: clean, 7140/44832 files, 42915/179200 blocks > > This is normal, and not an indication of a problem. In ext4, we no > longer update the free blocks and free inodes count in the superblock, > except when we unmount the file system. Otherwise, on large CPU > systems, needing to grab the superblock lock to update these values on > each block/inode allocation and deallocation becomes a massive > scalability bottleneck. > > Instead, at mount time, since we need to read in the block group > descriptors *anyway*, we just sum the free blocks/inodes in each block > group descriptor, and update the global counters then. So the fact > that the global free blocks and free inodes values in the superblock > are not up to date after a crash is normal, and expected. This is why > e2fsck didn't ask for permission before fixing this issue, which it > would have done if you had run e2fsck without either the -p or -y option. Ok, thanks a lot for this detailed explanation! > > CASE 2: fsck on every other power-cycle: > > > > Same as CASE 1 steps 1...5 and then: > > 6.- Turn on power and boot again from dirty internal eMMC without running fsck. > > 7.- Repeat steps 2...5 one more time > > 8.- Perform steps 6...8 from CASE 1. > > > > With this test, the following difference became apparent: > > > > With EXT3: fsck.ext3 did the same as in CASE 1 > > > > With EXT4: I get a long list of errors that are being fixed. > > It starts like this: > > > > # fsck.ext4 /dev/mmcblk1p2 > > e2fsck 1.42.5 (29-Jul-2012) > > rootfs: recovering journal > > rootfs contains a file system with errors, check forced. > > Pass 1: Checking inodes, blocks, and sizes > > Inode 4591, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4594, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4595, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4596, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4597, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4598, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4599, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4600, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4601, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4602, i_blocks is 16, should be 8. Fix<y>? yes > > Inode 4603, i_blocks is 16, should be 8. Fix<y>?
yes > > *This* is a problem. And if you're seeing it for such a very large > number of cases, it makes it unlikely to me that it is an ext4/jbd2 > bug. For one thing, if it's so blatant, I should be able to replicate > it very easily when using my "kill qemu" test. > > What this makes me suspect is that you may be getting an older version > of an inode table block. That's why the above test procedure I > outlined included the timestamp and the generation number in each > block. If after a power failure you start seeing an old generation > number, then that's a hint that the FTL did something funky. So far I have failed miserably at trying to prove the FTL wrong (or at least funky ;-). > And the advantage of using a test program that writes to the raw > eMMC flash is that it cuts the file system completely out of the > picture.... Any suggestions on how to improve my eMMC device test script (see below)? Best regards, -- David Jander Protonic Holland.

filldev.py:

#!/usr/bin/env python
# vim: tabstop=4

import os, sys, stat
from time import time
from zlib import crc32
import subprocess

class Perc(object):
    # Simple progress-percentage printer
    def __init__(self, size):
        self.size = size
        self.last = -1

    def update(self, n):
        perc = int((n * 1000) / self.size)
        if perc != self.last:
            sys.stdout.write("\rProgress : %4.1f %% " % (perc/10.0))
            sys.stdout.flush()
            self.last = perc

def devstat(dev):
    # Return size of dev, if it's a file or a device
    size = os.stat(dev).st_size
    if not size:
        # It's probably a device...
        txt, _ = subprocess.Popen("blockdev --getsize " + dev, shell=True,
                stdout=subprocess.PIPE).communicate()
        size = int(txt) * 512
    return size

def initdev(dev):
    size = devstat(dev)
    f = open(dev, 'r+b')
    data = "\xde\xad\xbe\xef" * (512 / 4)
    blocks = size / len(data)
    p = Perc(blocks)
    print "Filling whole device with background pattern:"
    for i in xrange(blocks):
        f.write(data)
        p.update(i)
    p.update(i+1)
    f.close()
    os.system("sync")
    print "\nDone."

class PatternGenerator(object):
    def __init__(self):
        self.index = 0
        self.rand = 18274091872761

    def nextrand(self):
        ret = self.rand * 128273237635 + 45845
        self.rand = ret % (2**32)
        return (ret >> 16) & 0xffff

    def nextrand64(self):
        ret = self.nextrand() | (self.nextrand() << 16)
        return ret | (self.nextrand() << 32) | (self.nextrand() << 48)

    def timestamp(self):
        return "%17.6f" % time()

    def make_record(self):
        # Record: sequence number, timestamp, 64-bit PRN, CRC, 0x55 fill
        ret = "%010d %17.6f %16x" % (self.index, time(), self.nextrand64())
        ret += " %08x \n\0" % (2**31 + crc32(ret))
        self.index += 1
        fill = "\x55" * (512 - len(ret))
        return ret + fill

    def check_record_crc(self, rec):
        try:
            data, crc, rest = rec.rsplit(" ", 2)
        except ValueError:
            return False
        ncrc = (2**31 + crc32(data))
        crc = int(crc, 16)
        return crc == ncrc

    def sync(self):
        self.f.flush()
        os.fsync(self.f)
        os.system("sync")

class PatternWriter(PatternGenerator):
    def __init__(self, dev, syncinterval):
        PatternGenerator.__init__(self)
        self.size = devstat(dev)
        self.blocks = self.size / 512
        self.f = open(dev, 'r+b')
        self.perc = Perc(self.blocks)
        self.syncinterval = syncinterval

    def write_one(self):
        self.f.write(self.make_record())
        if (self.index % self.syncinterval) == 0:
            self.sync()
            print "Synced at block", self.index
        self.perc.update(self.index)

    def run(self):
        print "Writing data to device..."
        while self.blocks > self.index:
            self.write_one()
        print "\nDone."

class PatternChecker(PatternGenerator):
    def __init__(self, dev):
        PatternGenerator.__init__(self)
        self.size = devstat(dev)
        self.blocks = self.size / 512
        self.f = open(dev, 'rb')
        self.perc = Perc(self.blocks)
        self.ti = -1

    def read_one(self):
        return self.f.read(512)

    def analyze_block(self, blk):
        data = "\xde\xad\xbe\xef" * (512 / 4)
        self.index += 1
        if blk == data:
            return 1
        print "Possibly corrupt block detected!"
        print "Block data:", repr(blk)
        return 2

    def check_block(self, blk):
        if not self.check_record_crc(blk):
            print "\nCRC error on block", self.index
            ret = self.analyze_block(blk)
            if ret == 1:
                print "Block contents are 100% fill pattern."
                print "Checking rest of device for fill pattern..."
            return ret
        idx, ti, rnd, crc, rest = blk.split()
        idx = int(idx)
        ti = float(ti)
        rnd = int(rnd, 16)
        if idx != self.index:
            print "\nIndex number error on block", self.index
            print "Block data:", repr(blk)
            return 3
        if ti < self.ti:
            print "\nTimestamp error on block", self.index
            print "Block data:", repr(blk)
            return 4
        self.ti = ti
        if rnd != self.nextrand64():
            # FIXME: This can never be true, so remove this check?
            print "\nRandom number incorrect on block", self.index
            print "Block data:", repr(blk)
            return 5
        self.index += 1
        return 0

    def run(self):
        print "Checking device contents..."
        checkfill = False
        while self.blocks > self.index:
            blk = self.read_one()
            if checkfill:
                ret = self.analyze_block(blk)
            else:
                ret = self.check_block(blk)
            if ret == 1:
                checkfill = True
            elif ret:
                sys.exit(ret)
            self.perc.update(self.index)
        print "\nDone."
        return 0

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print "Syntax: filldev <device> <sync-interval>"
        sys.exit(2)
    dev = sys.argv[1]
    sint = int(sys.argv[2], 0)
    c = PatternChecker(dev)
    c.run()
    initdev(dev)
    w = PatternWriter(dev, sint)
    w.run()

^ permalink raw reply [flat|nested] 50+ messages in thread
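For reference, a hypothetical invocation of the script above, following its own syntax line and mirroring the staging of Dmitry's hwflush-check (the device name and sync interval are examples only):

# First run: the check pass finds nothing yet; the device is then filled
# with 0xdeadbeef and records are written with fsync() every 64 sectors.
# Cut power while "Synced at block N" lines are scrolling by.
./filldev.py /dev/mmcblk1p3 64
# After the power-cycle, the same invocation runs the checker first; it
# must find valid records at least up to the last N logged on the console.
./filldev.py /dev/mmcblk1p3 64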
* Re: ext4: journal has aborted 2014-07-07 12:17 ` David Jander @ 2014-07-07 15:53 ` Theodore Ts'o 2014-07-07 22:31 ` Darrick J. Wong 2014-07-07 22:56 ` Theodore Ts'o 0 siblings, 2 replies; 50+ messages in thread From: Theodore Ts'o @ 2014-07-07 15:53 UTC (permalink / raw) To: David Jander; +Cc: Dmitry Monakhov, Matteo Croce, Darrick J. Wong, linux-ext4 An update from today's ext4 concall. Eric Whitney can fairly reliably reproduce this on his Panda board with 3.15, and definitely not on 3.14. So at this point there seems to be at least some kind of 3.15 regression going on here, regardless of whether it's in the eMMC driver or the ext4 code. (It also means that the bug fix I found is irrelevant for the purposes of working on this issue, since that one is much harder to hit, and that bug has been around since long before 3.14.) The problem in terms of narrowing it down any further is that the Pandaboard is running into RCU bugs which make it hard to test the early 3.15-rcX kernels. There is some indication that the bug showed up in the ext4 patches which Linus pulled at the beginning of 3.15-rc3. However, due to the ARM (or at least Pandaboard) RCU bugs, it's not possible to bisect-test this on the Pandaboard. And on x86_64, it takes most of a day to confirm the absence of a test failure. (Although this is with an HDD, so assuming that we don't have an eMMC as well as an ext4 regression in 3.15, it seems likely that the problem is some kind of ext4 regression introduced between 3.14 and 3.15.) So we are making progress, but it's slow. Hopefully we'll know more in the near future. Thanks to everyone who has been working on this bug! Cheers, - Ted ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-07 15:53 ` Theodore Ts'o @ 2014-07-07 22:31 ` Darrick J. Wong 2014-07-07 22:56 ` Theodore Ts'o 1 sibling, 0 replies; 50+ messages in thread From: Darrick J. Wong @ 2014-07-07 22:31 UTC (permalink / raw) To: Theodore Ts'o; +Cc: David Jander, Dmitry Monakhov, Matteo Croce, linux-ext4 Hi all, Whatever this bug is, I hit it on a spinny disk with 3.15.0. Disk: APPLE HDD HTS541010A9E662 (Hitachi SATA disk in a Mac Mini) Kernel: 3.15.0 (fairly standard build; didn't have problems w/ 3.14) OS: Ubuntu 14.04 + e2fsprogs 1.42.10 (May 2014) That machine hadn't been rebooted for a couple of weeks, when I rebooted it to put 3.15.4 on the machine (ptrace bug fix). When 3.15.4 came up, I saw this in dmesg: [ 31.717188] EXT4-fs (dm-3): recovery complete [ 31.721056] EXT4-fs (dm-3): mounted filesystem with ordered data mode. Opts: discard <snip> [ 213.150808] EXT4-fs error (device dm-3): ext4_mb_generate_buddy:756: group 4803, 12227 clusters in bitmap, 12217 in gd; block bitmap corrupt. [ 213.150822] JBD2: Spotted dirty metadata buffer (dev = dm-3, blocknr = 0). There's a risk of filesystem corruption in case of system crash. After which I umounted the FS and ran e2fsck -n: # e2fsck -fn /dev/dm-3 -C0 e2fsck 1.42.10 (18-May-2014) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information Block bitmap differences: -(157397995--157398003) -(157398006--157398014) -(157403147--157403184) -(157403237--157403242) -(157403258--157403273) -(157403280--157403505) -(157403508--157403647) Fix? no dumpe2fs shows this for BG 4803: Group 4803: (Blocks 157384704-157417471) [INODE_UNINIT, ITABLE_ZEROED] Checksum 0xfbd1, unused inodes 8192 Block bitmap at 157286403 (bg #4800 + 3), Inode bitmap at 157286419 (bg #4800 + 19) Inode table at 157287968-157288479 (bg #4800 + 1568) 12233 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes Free blocks: 157384770, 157384840-157384888, 157385059-157385087, 157385213-157385215, 157385283-157385285, 157385408, 157385469-157385471, 157385779-157385780, 157385846, 157385916-157385918, 157386119-157386175, 157386235-157386239, 157386563-157386571, 157386641-157386687, 157387037-157387062, 157387331, 157387531-157387555, 157387624-157387646, 157387725-157387775, 157388100-157388130, 157388523-157388543, 157388942-157389055, 157389258-157389300, 157389379-157389428, 157389795-157390137, 157390294-157390591, 157390798-157390847, 157391360-157391524, 157391594-157391871, 157391950-157392009, 157392073-157392309, 157392455-157392536, 157392604-157393151, 157393423-157394310, 157394377-157394431, 157395200-157395261, 157395320-157395342, 157395408-157395455, 157395968-157396438, 157396440-157396735, 157396799-157397005, 157397007-157397994, 157398004-157398005, 157398015, 157398272-157398527, 157399040-157399104, 157399106-157399757, 157399759-157400094, 157400177-157400190, 157400210-157400223, 157400404-157400419, 157400459-157400494, 157400571-157400831, 157401600-157401696, 157401698-157402124, 157402126-157402833, 157402835-157402836, 157402839-157403146, 157403185-157403214, 157403226-157403236, 157403243-157403257, 157403274-157403279, 157403506-157403507, 157403648-157403865, 157403867-157404241, 157404250-157404671, 157405440-157405695, 157406464-157406564, 157406710-157406764, 157406768-157407086, 157407088-157407231, 157407271-157407487, 157407744-157407975, 157408034-157408149, 
157408672-157408676, 157408736-157408766, 157409000-157409023, 157409280-157409785, 157412864-157413119 Free inodes: 39346177-39354368 I then had the funny idea to copy all of those blocks out of the filesystem to see what had previously been stored there. Of the files that weren't unintelligible binary crud, the recognizable blocks (according to file(1)) mostly appear to be HTML, CSS, and JPEG files from Firefox's browser cache. There's nothing obviously wrong with Fx's cache directory, though. Anyway, I'll keep digging. --D On Mon, Jul 07, 2014 at 11:53:10AM -0400, Theodore Ts'o wrote: > An update from today's ext4 concall. Eric Whitney can fairly reliably > reproduce this on his Panda board with 3.15, and definitely not on > 3.14. So at this point there seems to be at least some kind of 3.15 > regression going on here, regardless of whether it's in the eMMC > driver or the ext4 code. (It also means that the bug fix I found is > irrelevant for the purposes of working this issue, since that's a much > harder to hit, and that bug has been around long before 3.14.) > > The problem in terms of narrowing it down any further is that the > Pandaboard is running into RCU bugs which makes it hard to test the > early 3.15-rcX kernels. There is some indication that the bug showed > up in the ext4 patches which Linus pulled at the beginning of > 3.15-rc3. However, due to the ARM (or at least Pandaboard) RCU bugs, > it's not possible to bisect test this on the Pandaboard. > > And on the x86_64, it takes most of a day to confirm the absence of a > test failure. (Although this is with a HDD, so assuming that we don't > have an eMMC as well as an ext4 regression in 3.15, it seems likely > that the problem is with some kind of ext4 regression sometime between > 3.14 and 3.15. > > So we are making progress, but it's slow. Hopefuly we'll know more in > the near future. > > Thanks to everyone who has been working on this bug! > > Cheers, > > - Ted > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 50+ messages in thread
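One way to pull such block ranges out of the device for inspection with file(1) is a short sketch like the following (assuming the 4 KiB filesystem block size implied by the 32768-block groups above; the ranges shown are examples taken from the e2fsck output):

#!/usr/bin/env python3
# Sketch: dump filesystem-block ranges (from the e2fsck bitmap
# differences above) out of the device so file(1) can identify
# what used to live there.  Assumes a 4 KiB fs block size.
import os, sys

BLKSZ = 4096
ranges = [(157397995, 157398003), (157398006, 157398014),
          (157403147, 157403184)]            # etc., from e2fsck -n

def dump(dev, outdir):
    os.makedirs(outdir, exist_ok=True)
    with open(dev, "rb") as f:
        for lo, hi in ranges:
            f.seek(lo * BLKSZ)
            data = f.read((hi - lo + 1) * BLKSZ)
            with open(os.path.join(outdir, "%d-%d.bin" % (lo, hi)), "wb") as o:
                o.write(data)

if __name__ == "__main__":
    dump(sys.argv[1], sys.argv[2])   # e.g. dump("/dev/dm-3", "blocks")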
* Re: ext4: journal has aborted 2014-07-07 15:53 ` Theodore Ts'o 2014-07-07 22:31 ` Darrick J. Wong @ 2014-07-07 22:56 ` Theodore Ts'o 2014-07-10 18:57 ` Eric Whitney 1 sibling, 1 reply; 50+ messages in thread From: Theodore Ts'o @ 2014-07-07 22:56 UTC (permalink / raw) To: David Jander Cc: Dmitry Monakhov, Matteo Croce, Darrick J. Wong, linux-ext4, Eric Whitney On Mon, Jul 07, 2014 at 11:53:10AM -0400, Theodore Ts'o wrote: > An update from today's ext4 concall. Eric Whitney can fairly reliably > reproduce this on his Panda board with 3.15, and definitely not on > 3.14. So at this point there seems to be at least some kind of 3.15 > regression going on here, regardless of whether it's in the eMMC > driver or the ext4 code. (It also means that the bug fix I found is > irrelevant for the purposes of working this issue, since that's a much > harder to hit, and that bug has been around long before 3.14.) > > The problem in terms of narrowing it down any further is that the > Pandaboard is running into RCU bugs which make it hard to test the > early 3.15-rcX kernels..... In the hopes of making it easy to bisect, I've created a kernel branch which starts with 3.14, and then adds on all of the ext4-related commits since then. You can find it at:

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git test-mb_generate_buddy-failure

https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/log/?h=test-mb_generate_buddy-failure

Eric, can you see if you can repro the failure on your Panda Board? If you can, try doing a bisection search on this series:

git bisect start
git bisect good v3.14
git bisect bad test-mb_generate_buddy-failure

Hopefully, if it is caused by one of the commits in this series, we'll be able to pinpoint it this way. Thanks!! - Ted ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-07 22:56 ` Theodore Ts'o @ 2014-07-10 18:57 ` Eric Whitney 2014-07-10 20:01 ` Darrick J. Wong 2014-07-10 23:29 ` Azat Khuzhin 0 siblings, 2 replies; 50+ messages in thread From: Eric Whitney @ 2014-07-10 18:57 UTC (permalink / raw) To: Theodore Ts'o Cc: David Jander, Dmitry Monakhov, Matteo Croce, Darrick J. Wong, linux-ext4, Eric Whitney * Theodore Ts'o <tytso@mit.edu>: > On Mon, Jul 07, 2014 at 11:53:10AM -0400, Theodore Ts'o wrote: > > An update from today's ext4 concall. Eric Whitney can fairly reliably > > reproduce this on his Panda board with 3.15, and definitely not on > > 3.14. So at this point there seems to be at least some kind of 3.15 > > regression going on here, regardless of whether it's in the eMMC > > driver or the ext4 code. (It also means that the bug fix I found is > > irrelevant for the purposes of working this issue, since that's a much > > harder to hit, and that bug has been around long before 3.14.) > > > > The problem in terms of narrowing it down any further is that the > > Pandaboard is running into RCU bugs which makes it hard to test the > > early 3.15-rcX kernels..... > > In the hopes of making it easy to bisect, I've created a kernel branch > which starts with 3.14, and then adds on all of the ext4-related > commits since then. You can find it at: > > git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git test-mb_generate_buddy-failure > > https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/log/?h=test-mb_generate_buddy-failure > > Eric, can you see if you can repro the failure on your Panda Board? > If you can, try doing a bisection search on these series: > > git bisect start > git bisect good v3.14 > git bisect bad test-mb_generate_buddy-failure > > Hopefully if it is caused by one of the commits in this series, we'll > be able to pin point it this way. First, the good news (with luck): My testing currently suggests that the patch causing this regression was pulled into 3.15-rc3 - 007649375f6af242d5b1df2c15996949714303ba ext4: initialize multi-block allocator before checking block descriptors Bisection by selectively reverting ext4 commits in -rc3 identified this patch while running on the Pandaboard. I'm still using generic/068 as my reproducer. It occasionally yields a false negative, but it has passed 10 consecutive trials on my revert/bisect kernel derived from 3.15-rc3. Given the frequency of false negatives I've seen, I'm reasonably confident in that result. I'm going to run another series with just that patch reverted on 3.16-rc3. Looking at the patch, the call to ext4_mb_init() was hoisted above the code performing journal recovery in ext4_fill_super(). The regression occurs only after journal recovery on the root filesystem. Secondly: Thanks for that git tree! However, I discovered that the same "RCU bug" I thought I was seeing on the Panda was also visible on the x86_64 KVM, and it was actually just RCU noticing stalls. These also occurred when using your git tree as well as on mainline 3.15-rc1 and 3.15-rc2 and during bisection attempts on 3.15-rc3 within the ext4 patches, and had the effect of masking the regression on the root filesystem. The test system would lock up completely - no console response - and made it impossible to force the reboot which was required to set up the failure. Hence the reversion approach, since RCU does not report stalls in 3.15-rc3 (final). Eric > > Thanks!! > > - Ted ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-10 18:57 ` Eric Whitney @ 2014-07-10 20:01 ` Darrick J. Wong 2014-07-10 21:31 ` Matteo Croce 2014-07-10 23:29 ` Azat Khuzhin 1 sibling, 1 reply; 50+ messages in thread From: Darrick J. Wong @ 2014-07-10 20:01 UTC (permalink / raw) To: Eric Whitney Cc: Theodore Ts'o, David Jander, Dmitry Monakhov, Matteo Croce, linux-ext4 On Thu, Jul 10, 2014 at 02:57:48PM -0400, Eric Whitney wrote: > * Theodore Ts'o <tytso@mit.edu>: > > On Mon, Jul 07, 2014 at 11:53:10AM -0400, Theodore Ts'o wrote: > > > An update from today's ext4 concall. Eric Whitney can fairly reliably > > > reproduce this on his Panda board with 3.15, and definitely not on > > > 3.14. So at this point there seems to be at least some kind of 3.15 > > > regression going on here, regardless of whether it's in the eMMC > > > driver or the ext4 code. (It also means that the bug fix I found is > > > irrelevant for the purposes of working this issue, since that's a much > > > harder to hit, and that bug has been around long before 3.14.) > > > > > > The problem in terms of narrowing it down any further is that the > > > Pandaboard is running into RCU bugs which makes it hard to test the > > > early 3.15-rcX kernels..... > > > > In the hopes of making it easy to bisect, I've created a kernel branch > > which starts with 3.14, and then adds on all of the ext4-related > > commits since then. You can find it at: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git test-mb_generate_buddy-failure > > > > https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/log/?h=test-mb_generate_buddy-failure > > > > Eric, can you see if you can repro the failure on your Panda Board? > > If you can, try doing a bisection search on these series: > > > > git bisect start > > git bisect good v3.14 > > git bisect bad test-mb_generate_buddy-failure > > > > Hopefully if it is caused by one of the commits in this series, we'll > > be able to pin point it this way. > > First, the good news (with luck): > > My testing currently suggests that the patch causing this regression was > pulled into 3.15-rc3 - > > 007649375f6af242d5b1df2c15996949714303ba > ext4: initialize multi-block allocator before checking block descriptors > > Bisection by selectively reverting ext4 commits in -rc3 identified this patch > while running on the Pandaboard. I'm still using generic/068 as my reproducer. > It occasionally yields a false negative, but it has passed 10 consecutive > trials on my revert/bisect kernel derived from 3.15-rc3. Given the frequency > of false negatives I've seen, I'm reasonably confident in that result. I'm > going to run another series with just that patch reverted on 3.16-rc3. > > Looking at the patch, the call to ext4_mb_init() was hoisted above the code > performing journal recovery in ext4_fill_super(). The regression occurs only > after journal recovery on the root filesystem. Thanks for finding the culprit! :) Can you apply this patch, build with CONFIG_EXT4FS_DEBUG=y, and see if an FS will mount without crashing? This was the cruddy patch I sent in (and later killed) that fixed the crash on mount with EXT4FS_DEBUG in a somewhat silly way. Maybe it's appropriate now. http://www.spinics.net/lists/linux-ext4/msg43287.html --D > > Secondly: > > Thanks for that git tree! However, I discovered that the same "RCU bug" I > thought I was seeing on the Panda was also visible on the x86_64 KVM, and > it was actually just RCU noticing stalls. 
These also occurred when using > your git tree as well as on mainline 3.15-rc1 and 3.15-rc2 and during > bisection attempts on 3.15-rc3 within the ext4 patches, and had the effect of > masking the regression on the root filesystem. The test system would lock up > completely - no console response - and made it impossible to force the reboot > which was required to set up the failure. Hence the reversion approach, since > RCU does not report stalls in 3.15-rc3 (final). > > Eric > > > > > > > Thanks!! > > > > - Ted > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-10 20:01 ` Darrick J. Wong @ 2014-07-10 21:31 ` Matteo Croce 2014-07-10 22:32 ` Theodore Ts'o 0 siblings, 1 reply; 50+ messages in thread From: Matteo Croce @ 2014-07-10 21:31 UTC (permalink / raw) To: Darrick J. Wong Cc: Eric Whitney, Theodore Ts'o, David Jander, Dmitry Monakhov, linux-ext4 Will do, thanks! 2014-07-10 22:01 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>: > On Thu, Jul 10, 2014 at 02:57:48PM -0400, Eric Whitney wrote: >> * Theodore Ts'o <tytso@mit.edu>: >> > On Mon, Jul 07, 2014 at 11:53:10AM -0400, Theodore Ts'o wrote: >> > > An update from today's ext4 concall. Eric Whitney can fairly reliably >> > > reproduce this on his Panda board with 3.15, and definitely not on >> > > 3.14. So at this point there seems to be at least some kind of 3.15 >> > > regression going on here, regardless of whether it's in the eMMC >> > > driver or the ext4 code. (It also means that the bug fix I found is >> > > irrelevant for the purposes of working this issue, since that's a much >> > > harder to hit, and that bug has been around long before 3.14.) >> > > >> > > The problem in terms of narrowing it down any further is that the >> > > Pandaboard is running into RCU bugs which makes it hard to test the >> > > early 3.15-rcX kernels..... >> > >> > In the hopes of making it easy to bisect, I've created a kernel branch >> > which starts with 3.14, and then adds on all of the ext4-related >> > commits since then. You can find it at: >> > >> > git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git test-mb_generate_buddy-failure >> > >> > https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/log/?h=test-mb_generate_buddy-failure >> > >> > Eric, can you see if you can repro the failure on your Panda Board? >> > If you can, try doing a bisection search on these series: >> > >> > git bisect start >> > git bisect good v3.14 >> > git bisect bad test-mb_generate_buddy-failure >> > >> > Hopefully if it is caused by one of the commits in this series, we'll >> > be able to pin point it this way. >> >> First, the good news (with luck): >> >> My testing currently suggests that the patch causing this regression was >> pulled into 3.15-rc3 - >> >> 007649375f6af242d5b1df2c15996949714303ba >> ext4: initialize multi-block allocator before checking block descriptors >> >> Bisection by selectively reverting ext4 commits in -rc3 identified this patch >> while running on the Pandaboard. I'm still using generic/068 as my reproducer. >> It occasionally yields a false negative, but it has passed 10 consecutive >> trials on my revert/bisect kernel derived from 3.15-rc3. Given the frequency >> of false negatives I've seen, I'm reasonably confident in that result. I'm >> going to run another series with just that patch reverted on 3.16-rc3. >> >> Looking at the patch, the call to ext4_mb_init() was hoisted above the code >> performing journal recovery in ext4_fill_super(). The regression occurs only >> after journal recovery on the root filesystem. > > Thanks for finding the culprit! :) > > Can you apply this patch, build with CONFIG_EXT4FS_DEBUG=y, and see if an > FS will mount without crashing? This was the cruddy patch I sent in (and later > killed) that fixed the crash on mount with EXT4FS_DEBUG in a somewhat silly > way. Maybe it's appropriate now. > http://www.spinics.net/lists/linux-ext4/msg43287.html > > --D > >> >> Secondly: >> >> Thanks for that git tree! 
However, I discovered that the same "RCU bug" I >> thought I was seeing on the Panda was also visible on the x86_64 KVM, and >> it was actually just RCU noticing stalls. These also occurred when using >> your git tree as well as on mainline 3.15-rc1 and 3.15-rc2 and during >> bisection attempts on 3.15-rc3 within the ext4 patches, and had the effect of >> masking the regression on the root filesystem. The test system would lock up >> completely - no console response - and made it impossible to force the reboot >> which was required to set up the failure. Hence the reversion approach, since >> RCU does not report stalls in 3.15-rc3 (final). >> >> Eric >> >> >> >> > >> > Thanks!! >> > >> > - Ted >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html -- Matteo Croce OpenWrt Developer ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-10 21:31 ` Matteo Croce @ 2014-07-10 22:32 ` Theodore Ts'o 2014-07-11 0:13 ` Darrick J. Wong 0 siblings, 1 reply; 50+ messages in thread From: Theodore Ts'o @ 2014-07-10 22:32 UTC (permalink / raw) To: Matteo Croce Cc: Darrick J. Wong, Eric Whitney, David Jander, Dmitry Monakhov, linux-ext4, Azat Khuzhin To be clear, what you would need to do is to revert commit 007649375f6af242d5b1df2c15996949714303ba to prevent the fs corruption. Darrick's patch is one that tries to fix the problem addressed by that commit in a different fashion. Quite frankly, reverting the commit, which is causing real damage, is far more important to me right now than what to do in order to allow CONFIG_EXT4FS_DEBUG to work (which is nice, but it's only something that file system developers need, and to be honest I can't remember the last time I've used said config option). But if we know that Darrick's fix works, I'm willing to push that to Linus at the same time that I push a revert of 007649375f6af242d5b1df2c15996949714303ba Cheers, - Ted On Thu, Jul 10, 2014 at 11:31:14PM +0200, Matteo Croce wrote: > Will do, thanks! > > 2014-07-10 22:01 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>: > > On Thu, Jul 10, 2014 at 02:57:48PM -0400, Eric Whitney wrote: > >> * Theodore Ts'o <tytso@mit.edu>: > >> > On Mon, Jul 07, 2014 at 11:53:10AM -0400, Theodore Ts'o wrote: > >> > > An update from today's ext4 concall. Eric Whitney can fairly reliably > >> > > reproduce this on his Panda board with 3.15, and definitely not on > >> > > 3.14. So at this point there seems to be at least some kind of 3.15 > >> > > regression going on here, regardless of whether it's in the eMMC > >> > > driver or the ext4 code. (It also means that the bug fix I found is > >> > > irrelevant for the purposes of working this issue, since that's a much > >> > > harder to hit, and that bug has been around long before 3.14.) > >> > > > >> > > The problem in terms of narrowing it down any further is that the > >> > > Pandaboard is running into RCU bugs which make it hard to test the > >> > > early 3.15-rcX kernels..... > >> > > >> > In the hopes of making it easy to bisect, I've created a kernel branch > >> > which starts with 3.14, and then adds on all of the ext4-related > >> > commits since then. You can find it at: > >> > > >> > git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git test-mb_generate_buddy-failure > >> > > >> > https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/log/?h=test-mb_generate_buddy-failure > >> > > >> > Eric, can you see if you can repro the failure on your Panda Board? > >> > If you can, try doing a bisection search on this series: > >> > > >> > git bisect start > >> > git bisect good v3.14 > >> > git bisect bad test-mb_generate_buddy-failure > >> > > >> > Hopefully if it is caused by one of the commits in this series, we'll > >> > be able to pinpoint it this way. > >> > >> First, the good news (with luck): > >> > >> My testing currently suggests that the patch causing this regression was > >> pulled into 3.15-rc3 - > >> > >> 007649375f6af242d5b1df2c15996949714303ba > >> ext4: initialize multi-block allocator before checking block descriptors > >> > >> Bisection by selectively reverting ext4 commits in -rc3 identified this patch > >> while running on the Pandaboard. I'm still using generic/068 as my reproducer. > >> It occasionally yields a false negative, but it has passed 10 consecutive > >> trials on my revert/bisect kernel derived from 3.15-rc3.
Given the frequency > >> of false negatives I've seen, I'm reasonably confident in that result. I'm > >> going to run another series with just that patch reverted on 3.16-rc3. > >> > >> Looking at the patch, the call to ext4_mb_init() was hoisted above the code > >> performing journal recovery in ext4_fill_super(). The regression occurs only > >> after journal recovery on the root filesystem. > > > > Thanks for finding the culprit! :) > > > > Can you apply this patch, build with CONFIG_EXT4FS_DEBUG=y, and see if an > > FS will mount without crashing? This was the cruddy patch I sent in (and later > > killed) that fixed the crash on mount with EXT4FS_DEBUG in a somewhat silly > > way. Maybe it's appropriate now. > > http://www.spinics.net/lists/linux-ext4/msg43287.html > > > > --D > > > >> > >> Secondly: > >> > >> Thanks for that git tree! However, I discovered that the same "RCU bug" I > >> thought I was seeing on the Panda was also visible on the x86_64 KVM, and > >> it was actually just RCU noticing stalls. These also occurred when using > >> your git tree as well as on mainline 3.15-rc1 and 3.15-rc2 and during > >> bisection attempts on 3.15-rc3 within the ext4 patches, and had the effect of > >> masking the regression on the root filesystem. The test system would lock up > >> completely - no console response - and made it impossible to force the reboot > >> which was required to set up the failure. Hence the reversion approach, since > >> RCU does not report stalls in 3.15-rc3 (final). > >> > >> Eric > >> > >> > >> > >> > > >> > Thanks!! > >> > > >> > - Ted > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- > Matteo Croce > OpenWrt Developer > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 50+ messages in thread
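For anyone following along at home, trying the fix locally is plain git against the commit id quoted above (a hypothetical local test, not an official fix):

git revert 007649375f6af242d5b1df2c15996949714303ba
# then rebuild and re-run a reproducer (e.g. xfstests generic/068,
# as used by Eric above)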
* Re: ext4: journal has aborted 2014-07-10 22:32 ` Theodore Ts'o @ 2014-07-11 0:13 ` Darrick J. Wong 2014-07-11 0:45 ` Eric Whitney 0 siblings, 1 reply; 50+ messages in thread From: Darrick J. Wong @ 2014-07-11 0:13 UTC (permalink / raw) To: Theodore Ts'o Cc: Matteo Croce, Eric Whitney, David Jander, Dmitry Monakhov, linux-ext4, Azat Khuzhin On Thu, Jul 10, 2014 at 06:32:45PM -0400, Theodore Ts'o wrote: > To be clear, what you would need to do is to revert commit > 007649375f6af242d5b1df2c15996949714303ba to prevent the fs corruption. > Darrick's patch is one that tries to fix the problem addressed by that > commit in a different fashion. > > Quite frankly, reverting the commit, which is causing real damage, is > far more impotrant to me right now than what to do in order allow > CONFIG_EXT4FS_DEBUG to work (which is nice, but it's only something > that file system developers need, and to be honest I can't remember > the last time I've used said config option). But if we know that > Darrick's fix works, I'm willing to push that to Linus at the same > time that I push a revert of 007649375f6af242d5b1df2c15996949714303ba Reverting the 007649375... patch doesn't seem to create any obvious regressions on my test box (though again, I was never able to reproduce it as consistently as Eric W.). Tossing in the [1] patch also fixes the crash when CONFIG_EXT4_DEBUG=y on 3.16-rc4. I'd say it's safe to send both to Linus and stable. If anyone experiences problems that I'm not seeing, please yell loudly and soon! --D [1] http://www.spinics.net/lists/linux-ext4/msg43287.html > > Cheers, > > - Ted > > On Thu, Jul 10, 2014 at 11:31:14PM +0200, Matteo Croce wrote: > > Will do, thanks! > > > > 2014-07-10 22:01 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>: > > > On Thu, Jul 10, 2014 at 02:57:48PM -0400, Eric Whitney wrote: > > >> * Theodore Ts'o <tytso@mit.edu>: > > >> > On Mon, Jul 07, 2014 at 11:53:10AM -0400, Theodore Ts'o wrote: > > >> > > An update from today's ext4 concall. Eric Whitney can fairly reliably > > >> > > reproduce this on his Panda board with 3.15, and definitely not on > > >> > > 3.14. So at this point there seems to be at least some kind of 3.15 > > >> > > regression going on here, regardless of whether it's in the eMMC > > >> > > driver or the ext4 code. (It also means that the bug fix I found is > > >> > > irrelevant for the purposes of working this issue, since that's a much > > >> > > harder to hit, and that bug has been around long before 3.14.) > > >> > > > > >> > > The problem in terms of narrowing it down any further is that the > > >> > > Pandaboard is running into RCU bugs which makes it hard to test the > > >> > > early 3.15-rcX kernels..... > > >> > > > >> > In the hopes of making it easy to bisect, I've created a kernel branch > > >> > which starts with 3.14, and then adds on all of the ext4-related > > >> > commits since then. You can find it at: > > >> > > > >> > git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git test-mb_generate_buddy-failure > > >> > > > >> > https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/log/?h=test-mb_generate_buddy-failure > > >> > > > >> > Eric, can you see if you can repro the failure on your Panda Board? > > >> > If you can, try doing a bisection search on these series: > > >> > > > >> > git bisect start > > >> > git bisect good v3.14 > > >> > git bisect bad test-mb_generate_buddy-failure > > >> > > > >> > Hopefully if it is caused by one of the commits in this series, we'll > > >> > be able to pin point it this way. 
> > >> > > >> First, the good news (with luck): > > >> > > >> My testing currently suggests that the patch causing this regression was > > >> pulled into 3.15-rc3 - > > >> > > >> 007649375f6af242d5b1df2c15996949714303ba > > >> ext4: initialize multi-block allocator before checking block descriptors > > >> > > >> Bisection by selectively reverting ext4 commits in -rc3 identified this patch > > >> while running on the Pandaboard. I'm still using generic/068 as my reproducer. > > >> It occasionally yields a false negative, but it has passed 10 consecutive > > >> trials on my revert/bisect kernel derived from 3.15-rc3. Given the frequency > > >> of false negatives I've seen, I'm reasonably confident in that result. I'm > > >> going to run another series with just that patch reverted on 3.16-rc3. > > >> > > >> Looking at the patch, the call to ext4_mb_init() was hoisted above the code > > >> performing journal recovery in ext4_fill_super(). The regression occurs only > > >> after journal recovery on the root filesystem. > > > > > > Thanks for finding the culprit! :) > > > > > > Can you apply this patch, build with CONFIG_EXT4FS_DEBUG=y, and see if an > > > FS will mount without crashing? This was the cruddy patch I sent in (and later > > > killed) that fixed the crash on mount with EXT4FS_DEBUG in a somewhat silly > > > way. Maybe it's appropriate now. > > > http://www.spinics.net/lists/linux-ext4/msg43287.html > > > > > > --D > > > > > >> > > >> Secondly: > > >> > > >> Thanks for that git tree! However, I discovered that the same "RCU bug" I > > >> thought I was seeing on the Panda was also visible on the x86_64 KVM, and > > >> it was actually just RCU noticing stalls. These also occurred when using > > >> your git tree as well as on mainline 3.15-rc1 and 3.15-rc2 and during > > >> bisection attempts on 3.15-rc3 within the ext4 patches, and had the effect of > > >> masking the regression on the root filesystem. The test system would lock up > > >> completely - no console response - and made it impossible to force the reboot > > >> which was required to set up the failure. Hence the reversion approach, since > > >> RCU does not report stalls in 3.15-rc3 (final). > > >> > > >> Eric > > >> > > >> > > >> > > >> > > > >> > Thanks!! > > >> > > > >> > - Ted > > >> -- > > >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > > >> the body of a message to majordomo@vger.kernel.org > > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > > > > -- > > Matteo Croce > > OpenWrt Developer > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-11 0:13 ` Darrick J. Wong @ 2014-07-11 0:45 ` Eric Whitney 2014-07-11 8:50 ` Jaehoon Chung ` (2 more replies) 0 siblings, 3 replies; 50+ messages in thread From: Eric Whitney @ 2014-07-11 0:45 UTC (permalink / raw) To: Darrick J. Wong Cc: Theodore Ts'o, Matteo Croce, Eric Whitney, David Jander, Dmitry Monakhov, linux-ext4, Azat Khuzhin * Darrick J. Wong <darrick.wong@oracle.com>: > On Thu, Jul 10, 2014 at 06:32:45PM -0400, Theodore Ts'o wrote: > > To be clear, what you would need to do is to revert commit > > 007649375f6af242d5b1df2c15996949714303ba to prevent the fs corruption. > > Darrick's patch is one that tries to fix the problem addressed by that > > commit in a different fashion. > > > > Quite frankly, reverting the commit, which is causing real damage, is > > far more important to me right now than what to do in order to allow > > CONFIG_EXT4FS_DEBUG to work (which is nice, but it's only something > > that file system developers need, and to be honest I can't remember > > the last time I've used said config option). But if we know that > > Darrick's fix works, I'm willing to push that to Linus at the same > > time that I push a revert of 007649375f6af242d5b1df2c15996949714303ba > > Reverting the 007649375... patch doesn't seem to create any obvious regressions > on my test box (though again, I was never able to reproduce it as consistently > as Eric W.). > > Tossing in the [1] patch also fixes the crash when CONFIG_EXT4_DEBUG=y on > 3.16-rc4. I'd say it's safe to send both to Linus and stable. > > If anyone experiences problems that I'm not seeing, please yell loudly and > soon! > Reverting the suspect patch - 007649375f - on 3.16-rc3 and running on the Panda yielded 10 successive "successful" generic/068 failures (no block bitmap trouble on reboot). So, it looks like that patch is all of it. Running the same test scenario with Darrick's patch (CONFIG_EXT4FS_DEBUG => CONFIG_EXT4_DEBUG) applied to 3.16-rc3 led to exactly the same result. No panics, BUGs, or other misbehavior whether generic/068 completed successfully or failed (that test was used here simply because it was convenient), and no trouble on boot, etc. Let me know if anything else is needed. Eric > --D > > [1] http://www.spinics.net/lists/linux-ext4/msg43287.html > > > > Cheers, > > > > - Ted > > > > On Thu, Jul 10, 2014 at 11:31:14PM +0200, Matteo Croce wrote: > > > Will do, thanks! > > > > > > 2014-07-10 22:01 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>: > > > > On Thu, Jul 10, 2014 at 02:57:48PM -0400, Eric Whitney wrote: > > > >> * Theodore Ts'o <tytso@mit.edu>: > > > >> > On Mon, Jul 07, 2014 at 11:53:10AM -0400, Theodore Ts'o wrote: > > > >> > > An update from today's ext4 concall. Eric Whitney can fairly reliably > > > >> > > reproduce this on his Panda board with 3.15, and definitely not on > > > >> > > 3.14. So at this point there seems to be at least some kind of 3.15 > > > >> > > regression going on here, regardless of whether it's in the eMMC > > > >> > > driver or the ext4 code. (It also means that the bug fix I found is > > > >> > > irrelevant for the purposes of working this issue, since that's a much > > > >> > > harder to hit, and that bug has been around long before 3.14.) > > > >> > > > > > >> > > The problem in terms of narrowing it down any further is that the > > > >> > > Pandaboard is running into RCU bugs which make it hard to test the > > > >> > > early 3.15-rcX kernels.....
> > > >> > > > > >> > In the hopes of making it easy to bisect, I've created a kernel branch > > > >> > which starts with 3.14, and then adds on all of the ext4-related > > > >> > commits since then. You can find it at: > > > >> > > > > >> > git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git test-mb_generate_buddy-failure > > > >> > > > > >> > https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/log/?h=test-mb_generate_buddy-failure > > > >> > > > > >> > Eric, can you see if you can repro the failure on your Panda Board? > > > >> > If you can, try doing a bisection search on these series: > > > >> > > > > >> > git bisect start > > > >> > git bisect good v3.14 > > > >> > git bisect bad test-mb_generate_buddy-failure > > > >> > > > > >> > Hopefully if it is caused by one of the commits in this series, we'll > > > >> > be able to pin point it this way. > > > >> > > > >> First, the good news (with luck): > > > >> > > > >> My testing currently suggests that the patch causing this regression was > > > >> pulled into 3.15-rc3 - > > > >> > > > >> 007649375f6af242d5b1df2c15996949714303ba > > > >> ext4: initialize multi-block allocator before checking block descriptors > > > >> > > > >> Bisection by selectively reverting ext4 commits in -rc3 identified this patch > > > >> while running on the Pandaboard. I'm still using generic/068 as my reproducer. > > > >> It occasionally yields a false negative, but it has passed 10 consecutive > > > >> trials on my revert/bisect kernel derived from 3.15-rc3. Given the frequency > > > >> of false negatives I've seen, I'm reasonably confident in that result. I'm > > > >> going to run another series with just that patch reverted on 3.16-rc3. > > > >> > > > >> Looking at the patch, the call to ext4_mb_init() was hoisted above the code > > > >> performing journal recovery in ext4_fill_super(). The regression occurs only > > > >> after journal recovery on the root filesystem. > > > > > > > > Thanks for finding the culprit! :) > > > > > > > > Can you apply this patch, build with CONFIG_EXT4FS_DEBUG=y, and see if an > > > > FS will mount without crashing? This was the cruddy patch I sent in (and later > > > > killed) that fixed the crash on mount with EXT4FS_DEBUG in a somewhat silly > > > > way. Maybe it's appropriate now. > > > > http://www.spinics.net/lists/linux-ext4/msg43287.html > > > > > > > > --D > > > > > > > >> > > > >> Secondly: > > > >> > > > >> Thanks for that git tree! However, I discovered that the same "RCU bug" I > > > >> thought I was seeing on the Panda was also visible on the x86_64 KVM, and > > > >> it was actually just RCU noticing stalls. These also occurred when using > > > >> your git tree as well as on mainline 3.15-rc1 and 3.15-rc2 and during > > > >> bisection attempts on 3.15-rc3 within the ext4 patches, and had the effect of > > > >> masking the regression on the root filesystem. The test system would lock up > > > >> completely - no console response - and made it impossible to force the reboot > > > >> which was required to set up the failure. Hence the reversion approach, since > > > >> RCU does not report stalls in 3.15-rc3 (final). > > > >> > > > >> Eric > > > >> > > > >> > > > >> > > > >> > > > > >> > Thanks!! 
> > > >> > > > > >> > - Ted > > > >> -- > > > >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > > > >> the body of a message to majordomo@vger.kernel.org > > > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > > > > > > > > -- > > > Matteo Croce > > > OpenWrt Developer > > > -- > > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > > > the body of a message to majordomo@vger.kernel.org > > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-11 0:45 ` Eric Whitney @ 2014-07-11 8:50 ` Jaehoon Chung 2014-07-11 11:43 ` Theodore Ts'o 2014-07-15 6:31 ` David Jander 2 siblings, 0 replies; 50+ messages in thread From: Jaehoon Chung @ 2014-07-11 8:50 UTC (permalink / raw) To: Eric Whitney, Darrick J. Wong Cc: Theodore Ts'o, Matteo Croce, David Jander, Dmitry Monakhov, linux-ext4, Azat Khuzhin
On 07/11/2014 09:45 AM, Eric Whitney wrote:
> * Darrick J. Wong <darrick.wong@oracle.com>:
>> On Thu, Jul 10, 2014 at 06:32:45PM -0400, Theodore Ts'o wrote:
>>> To be clear, what you would need to do is to revert commit
>>> 007649375f6af242d5b1df2c15996949714303ba to prevent the fs corruption.
>>> Darrick's patch is one that tries to fix the problem addressed by that
>>> commit in a different fashion.
>>>
>>> Quite frankly, reverting the commit, which is causing real damage, is
>>> far more important to me right now than what to do in order to allow
>>> CONFIG_EXT4FS_DEBUG to work (which is nice, but it's only something
>>> that file system developers need, and to be honest I can't remember
>>> the last time I've used said config option). But if we know that
>>> Darrick's fix works, I'm willing to push that to Linus at the same
>>> time that I push a revert of 007649375f6af242d5b1df2c15996949714303ba
>>
>> Reverting the 007649375... patch doesn't seem to create any obvious regressions
>> on my test box (though again, I was never able to reproduce it as consistently
>> as Eric W.).
>>
>> Tossing in the [1] patch also fixes the crash when CONFIG_EXT4_DEBUG=y on
>> 3.16-rc4. I'd say it's safe to send both to Linus and stable.
>>
>> If anyone experiences problems that I'm not seeing, please yell loudly and
>> soon!
>>
>
> Reverting the suspect patch - 007649375f - on 3.16-rc3 and running on the
> Panda yielded 10 successive "successful" generic/068 failures (no block
> bitmap trouble on reboot). So, it looks like that patch is all of it.
In my case, after reverting it, I no longer see the block bitmap corruption problem on my Exynos board. Before reverting it, the problem occurred almost every time I tried to reboot. (The kernel version is 3.16-rc4, and an eMMC 5.0 card is used.)
Best Regards,
Jaehoon Chung
>
> Running the same test scenario with Darrick's patch (CONFIG_EXT4FS_DEBUG =>
> CONFIG_EXT4_DEBUG) applied to 3.16-rc3 led to exactly the same result:
> no panics, BUGs, or other misbehavior whether generic/068 completed
> successfully or failed (that test was used here simply because it was
> convenient), and no trouble on boot, etc.
>
> Let me know if anything else is needed.
>
> Eric
>
>> --D
>>
>> [1] http://www.spinics.net/lists/linux-ext4/msg43287.html
>>> >>> Cheers, >>> >>> - Ted >>> >>> On Thu, Jul 10, 2014 at 11:31:14PM +0200, Matteo Croce wrote: >>>> Will do, thanks! >>>> >>>> 2014-07-10 22:01 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>: >>>>> On Thu, Jul 10, 2014 at 02:57:48PM -0400, Eric Whitney wrote: >>>>>> * Theodore Ts'o <tytso@mit.edu>: >>>>>>> On Mon, Jul 07, 2014 at 11:53:10AM -0400, Theodore Ts'o wrote: >>>>>>>> An update from today's ext4 concall. Eric Whitney can fairly reliably >>>>>>>> reproduce this on his Panda board with 3.15, and definitely not on >>>>>>>> 3.14. So at this point there seems to be at least some kind of 3.15 >>>>>>>> regression going on here, regardless of whether it's in the eMMC >>>>>>>> driver or the ext4 code.
(It also means that the bug fix I found is >>>>>>>> irrelevant for the purposes of working this issue, since that's a much >>>>>>>> harder to hit, and that bug has been around long before 3.14.) >>>>>>>> >>>>>>>> The problem in terms of narrowing it down any further is that the >>>>>>>> Pandaboard is running into RCU bugs which makes it hard to test the >>>>>>>> early 3.15-rcX kernels..... >>>>>>> >>>>>>> In the hopes of making it easy to bisect, I've created a kernel branch >>>>>>> which starts with 3.14, and then adds on all of the ext4-related >>>>>>> commits since then. You can find it at: >>>>>>> >>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git test-mb_generate_buddy-failure >>>>>>> >>>>>>> https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/log/?h=test-mb_generate_buddy-failure >>>>>>> >>>>>>> Eric, can you see if you can repro the failure on your Panda Board? >>>>>>> If you can, try doing a bisection search on these series: >>>>>>> >>>>>>> git bisect start >>>>>>> git bisect good v3.14 >>>>>>> git bisect bad test-mb_generate_buddy-failure >>>>>>> >>>>>>> Hopefully if it is caused by one of the commits in this series, we'll >>>>>>> be able to pin point it this way. >>>>>> >>>>>> First, the good news (with luck): >>>>>> >>>>>> My testing currently suggests that the patch causing this regression was >>>>>> pulled into 3.15-rc3 - >>>>>> >>>>>> 007649375f6af242d5b1df2c15996949714303ba >>>>>> ext4: initialize multi-block allocator before checking block descriptors >>>>>> >>>>>> Bisection by selectively reverting ext4 commits in -rc3 identified this patch >>>>>> while running on the Pandaboard. I'm still using generic/068 as my reproducer. >>>>>> It occasionally yields a false negative, but it has passed 10 consecutive >>>>>> trials on my revert/bisect kernel derived from 3.15-rc3. Given the frequency >>>>>> of false negatives I've seen, I'm reasonably confident in that result. I'm >>>>>> going to run another series with just that patch reverted on 3.16-rc3. >>>>>> >>>>>> Looking at the patch, the call to ext4_mb_init() was hoisted above the code >>>>>> performing journal recovery in ext4_fill_super(). The regression occurs only >>>>>> after journal recovery on the root filesystem. >>>>> >>>>> Thanks for finding the culprit! :) >>>>> >>>>> Can you apply this patch, build with CONFIG_EXT4FS_DEBUG=y, and see if an >>>>> FS will mount without crashing? This was the cruddy patch I sent in (and later >>>>> killed) that fixed the crash on mount with EXT4FS_DEBUG in a somewhat silly >>>>> way. Maybe it's appropriate now. >>>>> http://www.spinics.net/lists/linux-ext4/msg43287.html >>>>> >>>>> --D >>>>> >>>>>> >>>>>> Secondly: >>>>>> >>>>>> Thanks for that git tree! However, I discovered that the same "RCU bug" I >>>>>> thought I was seeing on the Panda was also visible on the x86_64 KVM, and >>>>>> it was actually just RCU noticing stalls. These also occurred when using >>>>>> your git tree as well as on mainline 3.15-rc1 and 3.15-rc2 and during >>>>>> bisection attempts on 3.15-rc3 within the ext4 patches, and had the effect of >>>>>> masking the regression on the root filesystem. The test system would lock up >>>>>> completely - no console response - and made it impossible to force the reboot >>>>>> which was required to set up the failure. Hence the reversion approach, since >>>>>> RCU does not report stalls in 3.15-rc3 (final). >>>>>> >>>>>> Eric >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> Thanks!! 
>>>>>>> >>>>>>> - Ted >>>>>> -- >>>>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >>>>>> the body of a message to majordomo@vger.kernel.org >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>>> >>>> >>>> -- >>>> Matteo Croce >>>> OpenWrt Developer >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-11 0:45 ` Eric Whitney 2014-07-11 8:50 ` Jaehoon Chung @ 2014-07-11 11:43 ` Theodore Ts'o 2014-07-15 6:31 ` David Jander 2 siblings, 0 replies; 50+ messages in thread From: Theodore Ts'o @ 2014-07-11 11:43 UTC (permalink / raw) To: Eric Whitney Cc: Darrick J. Wong, Matteo Croce, David Jander, Dmitry Monakhov, linux-ext4, Azat Khuzhin
On Thu, Jul 10, 2014 at 08:45:08PM -0400, Eric Whitney wrote:
>
> Reverting the suspect patch - 007649375f - on 3.16-rc3 and running on the
> Panda yielded 10 successive "successful" generic/068 failures (no block
> bitmap trouble on reboot). So, it looks like that patch is all of it.
Thanks again, Eric, for finding this!!
> Running the same test scenario with Darrick's patch (CONFIG_EXT4FS_DEBUG =>
> CONFIG_EXT4_DEBUG) applied to 3.16-rc3 led to exactly the same result:
> no panics, BUGs, or other misbehavior whether generic/068 completed
> successfully or failed (that test was used here simply because it was
> convenient), and no trouble on boot, etc.
I've been looking more closely at the changed lines, and I suspect the real fix is that we should move these lines:

	ext4_free_blocks_count_set(sbi->s_es,
			EXT4_C2B(sbi, ext4_count_free_clusters(sb)));
	sbi->s_es->s_free_inodes_count =
			cpu_to_le32(ext4_count_free_inodes(sb));

to after the journal has been replayed. Not that it really matters, since very little (and nothing for normal file system operation, including the statfs(2) system call) actually depends on the free blocks count in the superblock --- instead we use the percpu "s_freeclusters_counter" and "s_dirtyclusters_counter" fields for scalability reasons --- but if we *are* going to set these fields in the on-disk superblock, we should wait until *after* we have updated the allocation bitmaps before we start counting the free blocks in those bitmaps. Cheers, - Ted ^ permalink raw reply [flat|nested] 50+ messages in thread
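To make the ordering Ted describes above concrete, here is a rough sketch of the intended sequence in ext4_fill_super(), assuming the 3.16-era function and field names from fs/ext4/super.c; this is an illustration of the idea, not a reproduction of the actual patch or revert:

	/* Sketch: journal replay must come first, so everything below
	 * sees the *recovered* on-disk state. */
	err = ext4_load_journal(sb, es, journal_devnum);
	if (err)
		goto failed_mount3;

	/* The multi-block allocator builds its buddy bitmaps from the
	 * block bitmaps, so it may only be initialized after replay
	 * (hoisting this above replay is what 007649375f did). */
	err = ext4_mb_init(sb);
	if (err)
		goto failed_mount3;

	/* And per Ted's suggestion, count free space last, so the
	 * superblock counters are derived from the recovered bitmaps. */
	ext4_free_blocks_count_set(sbi->s_es,
			EXT4_C2B(sbi, ext4_count_free_clusters(sb)));
	sbi->s_es->s_free_inodes_count =
			cpu_to_le32(ext4_count_free_inodes(sb));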
* Re: ext4: journal has aborted 2014-07-11 0:45 ` Eric Whitney 2014-07-11 8:50 ` Jaehoon Chung 2014-07-11 11:43 ` Theodore Ts'o @ 2014-07-15 6:31 ` David Jander 2 siblings, 0 replies; 50+ messages in thread From: David Jander @ 2014-07-15 6:31 UTC (permalink / raw) To: Eric Whitney Cc: Darrick J. Wong, Theodore Ts'o, Matteo Croce, Dmitry Monakhov, linux-ext4, Azat Khuzhin
Dear Eric, Ted, ...
On Thu, 10 Jul 2014 20:45:08 -0400 Eric Whitney <enwlinux@gmail.com> wrote:
> * Darrick J. Wong <darrick.wong@oracle.com>:
> > On Thu, Jul 10, 2014 at 06:32:45PM -0400, Theodore Ts'o wrote:
> > > To be clear, what you would need to do is to revert commit
> > > 007649375f6af242d5b1df2c15996949714303ba to prevent the fs corruption.
> > > Darrick's patch is one that tries to fix the problem addressed by that
> > > commit in a different fashion.
> > >
> > > Quite frankly, reverting the commit, which is causing real damage, is
> > > far more important to me right now than what to do in order to allow
> > > CONFIG_EXT4FS_DEBUG to work (which is nice, but it's only something
> > > that file system developers need, and to be honest I can't remember
> > > the last time I've used said config option). But if we know that
> > > Darrick's fix works, I'm willing to push that to Linus at the same
> > > time that I push a revert of 007649375f6af242d5b1df2c15996949714303ba
> >
> > Reverting the 007649375... patch doesn't seem to create any obvious regressions
> > on my test box (though again, I was never able to reproduce it as consistently
> > as Eric W.).
> >
> > Tossing in the [1] patch also fixes the crash when CONFIG_EXT4_DEBUG=y on
> > 3.16-rc4. I'd say it's safe to send both to Linus and stable.
> >
> > If anyone experiences problems that I'm not seeing, please yell loudly and
> > soon!
> >
>
> Reverting the suspect patch - 007649375f - on 3.16-rc3 and running on the
> Panda yielded 10 successive "successful" generic/068 failures (no block
> bitmap trouble on reboot). So, it looks like that patch is all of it.
>
> Running the same test scenario with Darrick's patch (CONFIG_EXT4FS_DEBUG =>
> CONFIG_EXT4_DEBUG) applied to 3.16-rc3 led to exactly the same result:
> no panics, BUGs, or other misbehavior whether generic/068 completed
> successfully or failed (that test was used here simply because it was
> convenient), and no trouble on boot, etc.
Thanks a lot to everyone for the excellent work hunting this down!
I can confirm that on 3.16-rc5 the problem is gone. At least after extensive testing yesterday, I have not been able to reproduce it. No ext4 error messages and no filesystem corruption anymore!
Best regards,
--
David Jander Protonic Holland. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-10 18:57 ` Eric Whitney 2014-07-10 20:01 ` Darrick J. Wong @ 2014-07-10 23:29 ` Azat Khuzhin 1 sibling, 0 replies; 50+ messages in thread From: Azat Khuzhin @ 2014-07-10 23:29 UTC (permalink / raw) To: Eric Whitney Cc: Theodore Ts'o, David Jander, Dmitry Monakhov, Matteo Croce, Darrick J. Wong, linux-ext4
On Thu, Jul 10, 2014 at 02:57:48PM -0400, Eric Whitney wrote:
> * Theodore Ts'o <tytso@mit.edu>:
> > On Mon, Jul 07, 2014 at 11:53:10AM -0400, Theodore Ts'o wrote:
> > > An update from today's ext4 concall. Eric Whitney can fairly reliably
> > > reproduce this on his Panda board with 3.15, and definitely not on
> > > 3.14. So at this point there seems to be at least some kind of 3.15
> > > regression going on here, regardless of whether it's in the eMMC
> > > driver or the ext4 code. (It also means that the bug fix I found is
> > > irrelevant for the purposes of working this issue, since that one is much
> > > harder to hit, and that bug has been around since long before 3.14.)
> > >
> > > The problem in terms of narrowing it down any further is that the
> > > Pandaboard is running into RCU bugs which makes it hard to test the
> > > early 3.15-rcX kernels.....
> >
> > In the hopes of making it easy to bisect, I've created a kernel branch
> > which starts with 3.14, and then adds on all of the ext4-related
> > commits since then. You can find it at:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git test-mb_generate_buddy-failure
> >
> > https://git.kernel.org/cgit/linux/kernel/git/tytso/ext4.git/log/?h=test-mb_generate_buddy-failure
> >
> > Eric, can you see if you can repro the failure on your Panda Board?
> > If you can, try doing a bisection search on this series:
> >
> > git bisect start
> > git bisect good v3.14
> > git bisect bad test-mb_generate_buddy-failure
> >
> > Hopefully if it is caused by one of the commits in this series, we'll
> > be able to pinpoint it this way.
>
> First, the good news (with luck):
>
> My testing currently suggests that the patch causing this regression was
> pulled into 3.15-rc3 -
>
> 007649375f6af242d5b1df2c15996949714303ba
> ext4: initialize multi-block allocator before checking block descriptors
>
> Bisection by selectively reverting ext4 commits in -rc3 identified this patch
> while running on the Pandaboard. I'm still using generic/068 as my reproducer.
> It occasionally yields a false negative, but it has passed 10 consecutive
> trials on my revert/bisect kernel derived from 3.15-rc3. Given the frequency
> of false negatives I've seen, I'm reasonably confident in that result. I'm
> going to run another series with just that patch reverted on 3.16-rc3.
>
> Looking at the patch, the call to ext4_mb_init() was hoisted above the code
> performing journal recovery in ext4_fill_super(). The regression occurs only
> after journal recovery on the root filesystem.
Oops, nice catch! I'm very sorry for this. When these problems began, I rechecked my patch, but didn't find this. (I should test more next time!) But I don't understand why this triggers only on the root fs?
It would be great if ext4 had a BUG_ON() for this case, to avoid further bugs, something like this:

$ git diff fs/ext4/mballoc.c
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 59e3162..8dfc999 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -832,6 +832,8 @@ static int ext4_mb_init_cache(struct page *page, char *incore)
 	inode = page->mapping->host;
 	sb = inode->i_sb;
+	BUG_ON(EXT4_HAS_INCOMPAT_FEATURE(sb, EXT4_FEATURE_INCOMPAT_RECOVER));
+
 	ngroups = ext4_get_groups_count(sb);
 	blocksize = 1 << inode->i_blkbits;
 	blocks_per_page = PAGE_CACHE_SIZE / blocksize;

Thanks, and please accept my apologies to those who got a corrupted fs.
>
> Secondly:
>
> Thanks for that git tree! However, I discovered that the same "RCU bug" I
> thought I was seeing on the Panda was also visible on the x86_64 KVM, and
> it was actually just RCU noticing stalls. These also occurred when using
> your git tree as well as on mainline 3.15-rc1 and 3.15-rc2 and during
> bisection attempts on 3.15-rc3 within the ext4 patches, and had the effect of
> masking the regression on the root filesystem. The test system would lock up
> completely - no console response - and made it impossible to force the reboot
> which was required to set up the failure. Hence the reversion approach, since
> RCU does not report stalls in 3.15-rc3 (final).
>
> Eric
>
> > > > Thanks!! > > - Ted > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 50+ messages in thread
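One note on the assertion sketched above: BUG_ON() halts the whole machine, which is a heavy price for catching a mount-ordering mistake. A gentler variant, sketched here under the same 3.16-era ext4 assumptions and untested (the helper name is hypothetical, not from any posted patch), would warn once and let the caller fail with an I/O error instead:

	/* Hypothetical helper: the INCOMPAT_RECOVER feature flag stays
	 * set while the journal still needs replay, so the buddy cache
	 * must never be populated before recovery clears it. */
	static inline int ext4_mb_check_recovered(struct super_block *sb)
	{
		if (WARN_ON_ONCE(EXT4_HAS_INCOMPAT_FEATURE(sb,
				EXT4_FEATURE_INCOMPAT_RECOVER)))
			return -EIO;	/* refuse to build buddy bitmaps */
		return 0;
	}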
* Re: ext4: journal has aborted 2014-07-04 9:40 ` David Jander 2014-07-04 10:17 ` Dmitry Monakhov @ 2014-07-04 11:04 ` Jaehoon Chung 2014-07-04 11:32 ` David Jander 1 sibling, 1 reply; 50+ messages in thread From: Jaehoon Chung @ 2014-07-04 11:04 UTC (permalink / raw) To: David Jander, Dmitry Monakhov Cc: Theodore Ts'o, Matteo Croce, Darrick J. Wong, linux-ext4 Hi, David. On 07/04/2014 06:40 PM, David Jander wrote: > > Hi Dmitry, > > On Thu, 03 Jul 2014 18:58:48 +0400 > Dmitry Monakhov <dmonakhov@openvz.org> wrote: > >> On Thu, 3 Jul 2014 16:15:51 +0200, David Jander <david@protonic.nl> wrote: >>> >>> Hi Ted, >>> >>> On Thu, 3 Jul 2014 09:43:38 -0400 >>> "Theodore Ts'o" <tytso@mit.edu> wrote: >>> >>>> On Tue, Jul 01, 2014 at 10:55:11AM +0200, Matteo Croce wrote: >>>>> 2014-07-01 10:42 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>: >>>>> >>>>> I have a Samsung SSD 840 PRO >>>> >>>> Matteo, >>>> >>>> For you, you said you were seeing these problems on 3.15. Was it >>>> *not* happening for you when you used an older kernel? If so, that >>>> would help us try to provide the basis of trying to do a bisection >>>> search. >>> >>> I also tested with 3.15, and there too I see the same problem. >>> >>>> Using the kvm-xfstests infrastructure, I've been trying to reproduce >>>> the problem as follows: >>>> >>>> ./kvm-xfstests --no-log -c 4k generic/075 ; e2fsck -p /dev/heap/test-4k ; e2fsck -f /dev/heap/test-4k >>>> >>>> xfstests geneeric/075 runs fsx which does a fair amount of block >>>> allocation deallocations, and then after the test finishes, it first >>>> replays the journal (e2fsck -p) and then forces a fsck run on the >>>> test disk that I use for the run. >>>> >>>> After I launch this, in a separate window, I do this: >>>> >>>> sleep 60 ; killall qemu-system-x86_64 >>>> >>>> This kills the qemu process midway through the fsx test, and then I >>>> see if I can find a problem. I haven't had a chance to automate this >>>> yet, and it is my intention to try to set this up where I can run this >>>> on a ramdisk or a SSD, so I can more closely approximate what people >>>> are reporting on flash-based media. >>>> >>>> So far, I haven't been able to reproduce the problem. If after doing >>>> a large number of times, it can't be reproduced (especially if it >>>> can't be reproduced on an SSD), then it would lead us to believe that >>>> one of two things is the cause. (a) The CACHE FLUSH command isn't >>>> properly getting sent to the device in some cases, or (b) there really >>>> is a hardware problem with the flash device in question. >>> >>> Could (a) be caused by a bug in the mmc subsystem or in the MMC peripheral >>> driver? Can you explain why I don't see any problems with EXT3? >>> >>> I can't discard the possibility of (b) because I cannot prove it, but I will >>> try to see if I can do the same test on a SSD which I happen to have on that >>> platform. That should be able to rule out problems with the eMMC chip and >>> -driver, right? >>> >>> Do you know a way to investigate (a) (CACHE FLUSH not being sent correctly)? >>> >>> I left the system running (it started from a dirty EXT4 partition), and I am >>> seen the following error pop up after a few minutes. 
The system is not doing >>> much (some syslog activity maybe, but not much more): >>> >>> [ 303.072983] EXT4-fs (mmcblk1p2): error count: 4 >>> [ 303.077558] EXT4-fs (mmcblk1p2): initial error at 1404216838: ext4_mb_generate_buddy:756 >>> [ 303.085690] EXT4-fs (mmcblk1p2): last error at 1404388969: ext4_mb_generate_buddy:757 >>> >>> What does that mean? >> This means that it found previous error in internal ext4's log. Which is >> normal because your fs was corrupted before. It is reasonable to >> recreate filesystem from very beginning. >> >> In order to understand whenever it is regression in eMMC driver it is >> reasonable to run integrity test for a device itself. You can run >> any integrity test you like, For example just run a fio's job >> "fio disk-verify2.fio" (see attachment), IMPORTANT this script will >> destroy data on test partition. If it failed with errors like >> follows "verify: bad magic header XXX" than it is definitely a drivers issue. > > I have been trying to run fio on my board with your configuration file, but I > am having problems, and since I am not familiar with fio at all, I can't > really figure out what's wrong. My eMMC device is only 916MiB in size, so I > edited the last part to be: Which eMMC host controller did you use? > > offset_increment=100M > size=100M > > Is that ok? > > I still get error messages complaining about blocksize though. Here is the > output I get (can't really make sense of it): > > # ./fio ../disk-verify2.fio > Multiple writers may overwrite blocks that belong to other jobs. This can cause verification failures. > /dev/mmcblk1p2: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 > ... > fio-2.1.10-49-gf302 > Starting 4 processes > fio: blocksize too large for data set > fio: blocksize too large for data set > fio: blocksize too large for data set > fio: io_u.c:1315: __get_io_u: Assertion `io_u->flags & IO_U_F_FREE' failed.ta 00m:00s] > fio: pid=7612, got signal=6 > > /dev/mmcblk1p2: (groupid=0, jobs=1): err= 0: pid=7612: Fri Jul 4 09:31:15 2014 > lat (msec) : 4=0.19%, 10=0.19%, 20=0.19%, 50=0.85%, 100=1.23% > lat (msec) : 250=56.01%, 500=37.18%, 750=1.14% > cpu : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0 > IO depths : 1=0.1%, 2=0.2%, 4=0.4%, 8=0.8%, 16=1.5%, 32=97.1%, >=64=0.0% > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > issued : total=r=33/w=1024/d=0, short=r=0/w=0/d=0 > latency : target=0, window=0, percentile=100.00%, depth=32 > > Run status group 0 (all jobs): > > Disk stats (read/write): > mmcblk1: ios=11/1025, merge=0/0, ticks=94/6671, in_queue=7121, util=96.12% > fio: file hash not empty on exit > > > This assertion bugs me. Is it due to the previous errors ("blocksize too large > for data set") or is is because my eMMC drive/kernel is seriously screwed? > > Help please! > >> If my theory is true and it is storage's driver issue than JBD complain >> simply because it do care about it's data (it does integrity checks). >> Can you also create btrfs on that partition and performs some io >> activity and run fsck after that. You likely will see similar corruption > > Best regards, > ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-04 11:04 ` Jaehoon Chung @ 2014-07-04 11:32 ` David Jander 0 siblings, 0 replies; 50+ messages in thread From: David Jander @ 2014-07-04 11:32 UTC (permalink / raw) To: Jaehoon Chung Cc: Dmitry Monakhov, Theodore Ts'o, Matteo Croce, Darrick J. Wong, linux-ext4 On Fri, 04 Jul 2014 20:04:13 +0900 Jaehoon Chung <jh80.chung@samsung.com> wrote: > Hi, David. > > On 07/04/2014 06:40 PM, David Jander wrote: > > > > Hi Dmitry, > > > > On Thu, 03 Jul 2014 18:58:48 +0400 > > Dmitry Monakhov <dmonakhov@openvz.org> wrote: > > > >> On Thu, 3 Jul 2014 16:15:51 +0200, David Jander <david@protonic.nl> wrote: > >>> > >>> Hi Ted, > >>> > >>> On Thu, 3 Jul 2014 09:43:38 -0400 > >>> "Theodore Ts'o" <tytso@mit.edu> wrote: > >>> > >>>> On Tue, Jul 01, 2014 at 10:55:11AM +0200, Matteo Croce wrote: > >>>>> 2014-07-01 10:42 GMT+02:00 Darrick J. Wong <darrick.wong@oracle.com>: > >>>>> > >>>>> I have a Samsung SSD 840 PRO > >>>> > >>>> Matteo, > >>>> > >>>> For you, you said you were seeing these problems on 3.15. Was it > >>>> *not* happening for you when you used an older kernel? If so, that > >>>> would help us try to provide the basis of trying to do a bisection > >>>> search. > >>> > >>> I also tested with 3.15, and there too I see the same problem. > >>> > >>>> Using the kvm-xfstests infrastructure, I've been trying to reproduce > >>>> the problem as follows: > >>>> > >>>> ./kvm-xfstests --no-log -c 4k generic/075 ; e2fsck -p /dev/heap/test-4k ; e2fsck -f /dev/heap/test-4k > >>>> > >>>> xfstests geneeric/075 runs fsx which does a fair amount of block > >>>> allocation deallocations, and then after the test finishes, it first > >>>> replays the journal (e2fsck -p) and then forces a fsck run on the > >>>> test disk that I use for the run. > >>>> > >>>> After I launch this, in a separate window, I do this: > >>>> > >>>> sleep 60 ; killall qemu-system-x86_64 > >>>> > >>>> This kills the qemu process midway through the fsx test, and then I > >>>> see if I can find a problem. I haven't had a chance to automate this > >>>> yet, and it is my intention to try to set this up where I can run this > >>>> on a ramdisk or a SSD, so I can more closely approximate what people > >>>> are reporting on flash-based media. > >>>> > >>>> So far, I haven't been able to reproduce the problem. If after doing > >>>> a large number of times, it can't be reproduced (especially if it > >>>> can't be reproduced on an SSD), then it would lead us to believe that > >>>> one of two things is the cause. (a) The CACHE FLUSH command isn't > >>>> properly getting sent to the device in some cases, or (b) there really > >>>> is a hardware problem with the flash device in question. > >>> > >>> Could (a) be caused by a bug in the mmc subsystem or in the MMC peripheral > >>> driver? Can you explain why I don't see any problems with EXT3? > >>> > >>> I can't discard the possibility of (b) because I cannot prove it, but I will > >>> try to see if I can do the same test on a SSD which I happen to have on that > >>> platform. That should be able to rule out problems with the eMMC chip and > >>> -driver, right? > >>> > >>> Do you know a way to investigate (a) (CACHE FLUSH not being sent correctly)? > >>> > >>> I left the system running (it started from a dirty EXT4 partition), and I am > >>> seen the following error pop up after a few minutes. 
The system is not doing > >>> much (some syslog activity maybe, but not much more): > >>> > >>> [ 303.072983] EXT4-fs (mmcblk1p2): error count: 4 > >>> [ 303.077558] EXT4-fs (mmcblk1p2): initial error at 1404216838: ext4_mb_generate_buddy:756 > >>> [ 303.085690] EXT4-fs (mmcblk1p2): last error at 1404388969: ext4_mb_generate_buddy:757 > >>> > >>> What does that mean? > >> This means that it found previous error in internal ext4's log. Which is > >> normal because your fs was corrupted before. It is reasonable to > >> recreate filesystem from very beginning. > >> > >> In order to understand whenever it is regression in eMMC driver it is > >> reasonable to run integrity test for a device itself. You can run > >> any integrity test you like, For example just run a fio's job > >> "fio disk-verify2.fio" (see attachment), IMPORTANT this script will > >> destroy data on test partition. If it failed with errors like > >> follows "verify: bad magic header XXX" than it is definitely a drivers issue. > > > > I have been trying to run fio on my board with your configuration file, but I > > am having problems, and since I am not familiar with fio at all, I can't > > really figure out what's wrong. My eMMC device is only 916MiB in size, so I > > edited the last part to be: > > Which eMMC host controller did you use? compatible = "fsl,imx6q-usdhc"; Basically a Freescale i.MX6 which uses the drivers/mmc/host/sdhci-esdhc-imx.c driver which (as the name says) is SDHCI compliant. > > offset_increment=100M > > size=100M > > > > Is that ok? > > > > I still get error messages complaining about blocksize though. Here is the > > output I get (can't really make sense of it): > > > > # ./fio ../disk-verify2.fio > > Multiple writers may overwrite blocks that belong to other jobs. This can cause verification failures. > > /dev/mmcblk1p2: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32 > > ... > > fio-2.1.10-49-gf302 > > Starting 4 processes > > fio: blocksize too large for data set > > fio: blocksize too large for data set > > fio: blocksize too large for data set > > fio: io_u.c:1315: __get_io_u: Assertion `io_u->flags & IO_U_F_FREE' failed.ta 00m:00s] > > fio: pid=7612, got signal=6 > > > > /dev/mmcblk1p2: (groupid=0, jobs=1): err= 0: pid=7612: Fri Jul 4 09:31:15 2014 > > lat (msec) : 4=0.19%, 10=0.19%, 20=0.19%, 50=0.85%, 100=1.23% > > lat (msec) : 250=56.01%, 500=37.18%, 750=1.14% > > cpu : usr=0.00%, sys=0.00%, ctx=0, majf=0, minf=0 > > IO depths : 1=0.1%, 2=0.2%, 4=0.4%, 8=0.8%, 16=1.5%, 32=97.1%, >=64=0.0% > > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% > > issued : total=r=33/w=1024/d=0, short=r=0/w=0/d=0 > > latency : target=0, window=0, percentile=100.00%, depth=32 > > > > Run status group 0 (all jobs): > > > > Disk stats (read/write): > > mmcblk1: ios=11/1025, merge=0/0, ticks=94/6671, in_queue=7121, util=96.12% > > fio: file hash not empty on exit > > > > > > This assertion bugs me. Is it due to the previous errors ("blocksize too large > > for data set") or is is because my eMMC drive/kernel is seriously screwed? > > > > Help please! > > > >> If my theory is true and it is storage's driver issue than JBD complain > >> simply because it do care about it's data (it does integrity checks). > >> Can you also create btrfs on that partition and performs some io > >> activity and run fsck after that. 
You likely will see similar corruption > > > > Best regards, > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Best regards, -- David Jander Protonic Holland. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-01 8:42 ` Darrick J. Wong 2014-07-01 8:55 ` Matteo Croce @ 2014-07-01 12:07 ` Jaehoon Chung 2014-07-01 13:50 ` David Jander 2014-07-01 15:58 ` Theodore Ts'o 1 sibling, 2 replies; 50+ messages in thread From: Jaehoon Chung @ 2014-07-01 12:07 UTC (permalink / raw) To: Darrick J. Wong, Matteo Croce, David Jander; +Cc: linux-ext4, Theodore Ts'o
Hi,
I am interested in this problem, because I have also run into it. Is it a journal problem?
I am using Linux version 3.16.0-rc3.

[    3.866449] EXT4-fs error (device mmcblk0p13): ext4_mb_generate_buddy:756: group 0, 20490 clusters in bitmap, 20488 in gd; block bitmap corrupt.
[    3.877937] Aborting journal on device mmcblk0p13-8.
[    3.885025] Kernel panic - not syncing: EXT4-fs (device mmcblk0p13): panic forced after error
[    3.885025]
[    3.891751] CPU: 1 PID: 1488 Comm: mv Not tainted 3.16.0-rc3-00206-gc0f4eea0-dirty #2
[    3.899594] [<c0016258>] (unwind_backtrace) from [<c00127d4>] (show_stack+0x10/0x14)
[    3.907294] [<c00127d4>] (show_stack) from [<c051f458>] (dump_stack+0x78/0xc8)
[    3.914495] [<c051f458>] (dump_stack) from [<c051cd28>] (panic+0x8c/0x1f0)
[    3.921354] [<c051cd28>] (panic) from [<c018eb38>] (ext4_handle_error+0x68/0xa8)
[    3.928730] [<c018eb38>] (ext4_handle_error) from [<c018f584>] (__ext4_grp_locked_error+0x88/0x1c0)
[    3.937759] [<c018f584>] (__ext4_grp_locked_error) from [<c01a1e04>] (ext4_mb_generate_buddy+0x260/0x398)
[    3.947306] [<c01a1e04>] (ext4_mb_generate_buddy) from [<c01a3d20>] (ext4_mb_init_cache+0x2fc/0x7fc)
[    3.956433] [<c01a3d20>] (ext4_mb_init_cache) from [<c01a43d8>] (ext4_mb_init_group+0x1b8/0x234)
[    3.965186] [<c01a43d8>] (ext4_mb_init_group) from [<c01a4950>] (ext4_mb_load_buddy+0x34c/0x380)
[    3.973952] [<c01a4950>] (ext4_mb_load_buddy) from [<c01a9918>] (ext4_free_blocks+0x47c/0xd30)
[    3.982545] [<c01a9918>] (ext4_free_blocks) from [<c019b440>] (ext4_ext_remove_space+0xae0/0x1474)
[    3.991486] [<c019b440>] (ext4_ext_remove_space) from [<c019df68>] (ext4_ext_truncate+0xb4/0xd4)
[    4.000251] [<c019df68>] (ext4_ext_truncate) from [<c0178ac4>] (ext4_truncate+0x37c/0x43c)
[    4.008497] [<c0178ac4>] (ext4_truncate) from [<c017ba90>] (ext4_evict_inode+0x334/0x4f4)
[    4.016660] [<c017ba90>] (ext4_evict_inode) from [<c01229b0>] (evict+0x94/0x158)
[    4.024036] [<c01229b0>] (evict) from [<c011f1fc>] (__dentry_kill+0x1dc/0x22c)
[    4.031238] [<c011f1fc>] (__dentry_kill) from [<c011f924>] (dput+0xa8/0x1ac)
[    4.038282] [<c011f924>] (dput) from [<c0119a1c>] (SyS_renameat2+0x250/0x36c)
[    4.045396] [<c0119a1c>] (SyS_renameat2) from [<c0119b7c>] (SyS_rename+0x28/0x30)
[    4.052859] [<c0119b7c>] (SyS_rename) from [<c000eea0>] (ret_fast_syscall+0x0/0x3c)

I am using an eMMC card. I would like to know where to start debugging this problem. If anybody has a solution, please let me know.
Best Regards,
Jaehoon Chung
On 07/01/2014 05:42 PM, Darrick J. Wong wrote:
> On Tue, Jul 01, 2014 at 08:26:19AM +0200, David Jander wrote:
>>
>> Hi,
>>
>> On Mon, 30 Jun 2014 23:30:10 +0200
>> Matteo Croce <technoboy85@gmail.com> wrote:
>>
>>> I was web surfing and using gimp when:
>>>
>>> EXT4-fs error (device sda2): ext4_mb_generate_buddy:756: group 199,
>>> 9414 clusters in bitmap, 9500 in gd; block bitmap corrupt.
>>
>> I was about to post a related question to this list. I am also seeing these
>> kind of errors when using ext4 on latest mainline (I began testing with 3.15
>> where I saw this and now in 3.16-rc3 it is still there).
>> It happens almost instantly when power-cycling the system (unclean shutdown).
>> The next time the system boots, I get these errors. >> >> AFAICT, you are using a pretty recent kernel. Which version exactly? >> >>> Aborting journal on device sda2-8. >>> EXT4-fs (sda2): Remounting filesystem read-only > > Matteo, could you please post the full dmesg log somewhere? I'm interested in > what happens before all this happens, because... > >>> ------------[ cut here ]------------ >>> WARNING: CPU: 6 PID: 4134 at fs/ext4/ext4_jbd2.c:259 >>> __ext4_handle_dirty_metadata+0x18e/0x1d0() >>> Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek >>> snd_hda_codec_generic ecb uvcvideo videobuf2_vmalloc videobuf2_memops >>> videobuf2_core videodev ath3k btusb rts5139(C) ctr ccm iTCO_wdt bnep >>> rfcomm bluetooth nls_iso8859_1 vfat fat arc4 intel_rapl >>> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm >>> snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm >>> aesni_intel aes_x86_64 snd_seq_midi snd_seq_midi_event ath9k led_class >>> glue_helper ath9k_common lrw gf128mul ath9k_hw ablk_helper cryptd ath >>> mac80211 snd_rawmidi snd_seq cfg80211 radeon microcode rfkill >>> snd_timer snd_seq_device sr_mod psmouse r8169 snd cdrom i915 lpc_ich >>> soundcore ttm mii mfd_core drm_kms_helper drm intel_gtt agpgart >>> ehci_pci mei_me xhci_hcd tpm_infineon ehci_hcd video mei wmi tpm >>> backlight >>> CPU: 6 PID: 4134 Comm: gimp-2.8 Tainted: G C 3.15.0 #6 >>> 0000000000000009 ffffffff813acbdd 0000000000000000 ffffffff8103de3d >>> ffff8802365231a0 00000000ffffffe2 0000000000000000 ffff8800b90816c0 >>> ffffffff814205a0 ffffffff8118879e 0000000000000005 ffff8802365231a0 >>> Call Trace: >>> [<ffffffff813acbdd>] ? dump_stack+0x41/0x51 >>> [<ffffffff8103de3d>] ? warn_slowpath_common+0x6d/0x90 >>> [<ffffffff8118879e>] ? __ext4_handle_dirty_metadata+0x18e/0x1d0 >>> [<ffffffff8116e130>] ? ext4_dirty_inode+0x20/0x50 >>> [<ffffffff811903e9>] ? ext4_free_blocks+0x539/0xa40 >>> [<ffffffff8118468b>] ? ext4_ext_remove_space+0x83b/0xe60 >>> [<ffffffff81186a58>] ? ext4_ext_truncate+0x98/0xc0 >>> [<ffffffff8116c985>] ? ext4_truncate+0x2b5/0x300 >>> [<ffffffff8116d3d8>] ? ext4_evict_inode+0x3d8/0x410 >>> [<ffffffff81114a46>] ? evict+0xa6/0x160 >>> [<ffffffff81109346>] ? do_unlinkat+0x186/0x2a0 >>> [<ffffffff8110e51e>] ? SyS_getdents+0xde/0x100 >>> [<ffffffff8110e1d0>] ? fillonedir+0xd0/0xd0 >>> [<ffffffff813b2626>] ? 
system_call_fastpath+0x1a/0x1f >>> ---[ end trace 795411398e41fbcb ]--- >>> EXT4: jbd2_journal_dirty_metadata failed: handle type 5 started at >>> line 241, credits 91/91, errcode -30 >>> EXT4: jbd2_journal_dirty_metadata failed: handle type 5 started at >>> line 241, credits 91/91, errcode -30<2>EXT4-fs error (device sda2) in >>> ext4_free_blocks:4867: Journal has aborted >>> EXT4-fs error (device sda2): ext4_ext_rm_leaf:2731: inode #8257653: >>> block 6520936: comm gimp-2.8: journal_dirty_metadata failed: handle >>> type 5 started at line 241, credits 91/91, errcode -30 >>> EXT4-fs error (device sda2) in ext4_ext_remove_space:3018: Journal has >>> aborted EXT4-fs error (device sda2) in ext4_ext_truncate:4666: Journal has >>> aborted EXT4-fs error (device sda2) in ext4_reserve_inode_write:4877: Journal >>> has aborted >>> EXT4-fs error (device sda2) in ext4_truncate:3788: Journal has aborted >>> EXT4-fs error (device sda2) in ext4_reserve_inode_write:4877: Journal >>> has aborted >>> EXT4-fs error (device sda2) in ext4_orphan_del:2684: Journal has aborted >>> EXT4-fs error (device sda2) in ext4_reserve_inode_write:4877: Journal >>> has aborted >> >> I did not get these errors. I suspect this may be a consequence of FS >> corruption due to a bug in etx4. >> >> Here's why I suspect a bug: >> >> I am running latest git head (3.16-rc3+ as of yesterday) on an ARM system with >> eMMC flash. The eMMC is formatted in SLC mode ("enhanced" mode according to >> eMMC 4.41) and "reliable-writes" are enabled, so power-cycling should not >> cause FS corruption in presence of a journal. >> >> I can format the eMMC device either as EXT3 or EXT4 for the test. After >> formatting and writing the rootfs to the partition I can boot successfully in >> either situation. Once booted from eMMC, I start bonnie++ (to just stress the >> FS for a while), and after a minute or so the board is power-cycled while >> bonnie++ is still running. >> >> Next time I boot the situation is this: >> >> With EXT3: All seems fine, journal is replayed, no errors. I can repeat this as >> many times as I want, FS stays consistent. >> >> With EXT4: After just one power cycle I start getting this: >> >> [ 7.603871] EXT4-fs error (device mmcblk0p2): ext4_mb_generate_buddy:757: group 1, 8542 clusters in bitmap, 8550 in gd; block bitmap corrupt. >> [ 7.616743] JBD2: Spotted dirty metadata buffer (dev = mmcblk0p2, blocknr = 0). There's a risk of filesystem corruption in case of system crash. > > I've been seeing this same set of symptoms with 3.15.0 on various SSDs (Samsung > 840 Pro, Crucial M4). It seems that something (upstart?) is holding open some > file or other during poweroff, which means that the root fs can't be unmounted > or even remounted rw. I also noticed that the next time the system comes up, > the kernel tells me that it has to process the inode orphan list as part of > recovery. > > Shortly after the orphan list gets processed, I get that message and the FS > goes ro. A subsequent fsck run reveals that the block bitmap is indeed > incorrect in that block group, and when I bd the blocks that are incorrect in > the bitmap, I see what could be some kind of upstart log file. Either way, I > suspect some bug in orphan processing. > > <shrug> I don't know if this is specific to SSDs or spinning rust. Right now > I've simply rigged the initramfs to e2fsck -p the root fs before mounting it, > which seems(?) to have patched around it for now. 
> >> If I continue the test, it doesn't take long and serious corruption starts >> occurring. > > You're getting actual FS data corruption too? Or more of those messages? > > --D >> >> Again, with EXT3 I am unable to detect any problems. >> >> Best regards, >> >> -- >> David Jander >> Protonic Holland. >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 50+ messages in thread
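For readers unfamiliar with the orphan processing Darrick suspects above, this is roughly what happens at mount time after journal replay - a heavily condensed sketch of 3.16-era fs/ext4/super.c:ext4_orphan_cleanup(), with quota handling, locking, and most error handling omitted. Each inode on the on-disk orphan list is either re-truncated (it was in the middle of a truncate at crash time) or deleted (it was unlinked while still open):

	while (es->s_last_orphan) {
		struct inode *inode;

		inode = ext4_orphan_get(sb, le32_to_cpu(es->s_last_orphan));
		if (IS_ERR(inode)) {
			es->s_last_orphan = 0;	/* list is damaged; give up */
			break;
		}
		list_add(&EXT4_I(inode)->i_orphan, &EXT4_SB(sb)->s_orphan);
		if (inode->i_nlink)
			ext4_truncate(inode);	/* finish the interrupted truncate */
		/* when i_nlink == 0, the final iput() deletes the inode,
		 * freeing its blocks and rewriting the block bitmaps */
		iput(inode);
	}

Since this path frees blocks and rewrites allocation bitmaps immediately after journal replay, a multi-block allocator initialized from pre-replay bitmaps - the 007649375f ordering identified elsewhere in this thread - would plausibly first show up as exactly this kind of bitmap-versus-descriptor mismatch.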
* Re: ext4: journal has aborted 2014-07-01 12:07 ` Jaehoon Chung @ 2014-07-01 13:50 ` David Jander 2014-07-01 15:58 ` Theodore Ts'o 1 sibling, 0 replies; 50+ messages in thread From: David Jander @ 2014-07-01 13:50 UTC (permalink / raw) To: Jaehoon Chung Cc: Darrick J. Wong, Matteo Croce, linux-ext4, Theodore Ts'o Hi Jaehoon, On Tue, 01 Jul 2014 21:07:27 +0900 Jaehoon Chung <jh80.chung@samsung.com> wrote: > Hi, > > i have interesting for this problem..Because i also found the same problem.. > Is it Journal problem? > > I used the Linux version 3.16.0-rc3. > > [ 3.866449] EXT4-fs error (device mmcblk0p13): ext4_mb_generate_buddy:756: group 0, 20490 clusters in bitmap, 20488 in gd; block bitmap corrupt. > [ 3.877937] Aborting journal on device mmcblk0p13-8. > [ 3.885025] Kernel panic - not syncing: EXT4-fs (device mmcblk0p13): panic forced after error > [ 3.885025] > [ 3.891751] CPU: 1 PID: 1488 Comm: mv Not tainted 3.16.0-rc3-00206-gc0f4eea0-dirty #2 > [ 3.899594] [<c0016258>] (unwind_backtrace) from [<c00127d4>] (show_stack+0x10/0x14) > [ 3.907294] [<c00127d4>] (show_stack) from [<c051f458>] (dump_stack+0x78/0xc8) > [ 3.914495] [<c051f458>] (dump_stack) from [<c051cd28>] (panic+0x8c/0x1f0) > [ 3.921354] [<c051cd28>] (panic) from [<c018eb38>] (ext4_handle_error+0x68/0xa8) > [ 3.928730] [<c018eb38>] (ext4_handle_error) from [<c018f584>] (__ext4_grp_locked_error+0x88/0x1c0) > [ 3.937759] [<c018f584>] (__ext4_grp_locked_error) from [<c01a1e04>] (ext4_mb_generate_buddy+0x260/0x398) > [ 3.947306] [<c01a1e04>] (ext4_mb_generate_buddy) from [<c01a3d20>] (ext4_mb_init_cache+0x2fc/0x7fc) > [ 3.956433] [<c01a3d20>] (ext4_mb_init_cache) from [<c01a43d8>] (ext4_mb_init_group+0x1b8/0x234) > [ 3.965186] [<c01a43d8>] (ext4_mb_init_group) from [<c01a4950>] (ext4_mb_load_buddy+0x34c/0x380) > [ 3.973952] [<c01a4950>] (ext4_mb_load_buddy) from [<c01a9918>] (ext4_free_blocks+0x47c/0xd30) > [ 3.982545] [<c01a9918>] (ext4_free_blocks) from [<c019b440>] (ext4_ext_remove_space+0xae0/0x1474) > [ 3.991486] [<c019b440>] (ext4_ext_remove_space) from [<c019df68>] (ext4_ext_truncate+0xb4/0xd4) > [ 4.000251] [<c019df68>] (ext4_ext_truncate) from [<c0178ac4>] (ext4_truncate+0x37c/0x43c) > [ 4.008497] [<c0178ac4>] (ext4_truncate) from [<c017ba90>] (ext4_evict_inode+0x334/0x4f4) > [ 4.016660] [<c017ba90>] (ext4_evict_inode) from [<c01229b0>] (evict+0x94/0x158) > [ 4.024036] [<c01229b0>] (evict) from [<c011f1fc>] (__dentry_kill+0x1dc/0x22c) > [ 4.031238] [<c011f1fc>] (__dentry_kill) from [<c011f924>] (dput+0xa8/0x1ac) > [ 4.038282] [<c011f924>] (dput) from [<c0119a1c>] (SyS_renameat2+0x250/0x36c) > [ 4.045396] [<c0119a1c>] (SyS_renameat2) from [<c0119b7c>] (SyS_rename+0x28/0x30) > [ 4.052859] [<c0119b7c>] (SyS_rename) from [<c000eea0>] (ret_fast_syscall+0x0/0x3c) > > I used with the eMMC card. I want to know the debugging point for this problem. > If anybody have solution, let me know, plz. You can try to disable the mount option "errors=panic", I suspect the actual error is harmless.... OTOH, then you cannot see the stack-trace anymore. AFAICS, it seems to originate in ext4_ext_remove_space() in all cases I have seen, and happens quicker if the filesystem was not unmounted properly, so maybe the bug is related to the journal code's accounting of free space. Let's wait and see what the experts say about this... Regards, -- David Jander Protonic Holland. ^ permalink raw reply [flat|nested] 50+ messages in thread
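David's advice works because the panic in Jaehoon's trace is policy, not a crash: ext4's error handler simply branches on the errors= mount option. The following is a paraphrased sketch of the 3.16-era logic in fs/ext4/super.c:ext4_handle_error(), with details simplified (the real function also marks the filesystem as aborted):

	journal_t *journal = EXT4_SB(sb)->s_journal;

	if (test_opt(sb, ERRORS_CONT))		/* errors=continue: */
		return;				/* log it and keep going */
	if (journal)				/* otherwise abort the journal */
		jbd2_journal_abort(journal, -EIO);	/* -> "Aborting journal ..." */
	if (test_opt(sb, ERRORS_RO)) {		/* errors=remount-ro (the default) */
		ext4_msg(sb, KERN_CRIT, "Remounting filesystem read-only");
		sb->s_flags |= MS_RDONLY;
	}
	if (test_opt(sb, ERRORS_PANIC))		/* errors=panic */
		panic("EXT4-fs (device %s): panic forced after error\n",
		      sb->s_id);

With errors=remount-ro the same corruption would leave the trace in dmesg and flip the filesystem read-only, as seen earlier in this thread, instead of taking the whole system down.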
* Re: ext4: journal has aborted 2014-07-01 12:07 ` Jaehoon Chung 2014-07-01 13:50 ` David Jander @ 2014-07-01 15:58 ` Theodore Ts'o 2014-07-01 16:14 ` Lukáš Czerner ` (2 more replies) 1 sibling, 3 replies; 50+ messages in thread From: Theodore Ts'o @ 2014-07-01 15:58 UTC (permalink / raw) To: Jaehoon Chung; +Cc: Darrick J. Wong, Matteo Croce, David Jander, linux-ext4
On Tue, Jul 01, 2014 at 09:07:27PM +0900, Jaehoon Chung wrote:
> Hi,
>
> I am interested in this problem, because I have also run into it. Is it a journal problem?
>
> I am using Linux version 3.16.0-rc3.
>
> [    3.866449] EXT4-fs error (device mmcblk0p13): ext4_mb_generate_buddy:756: group 0, 20490 clusters in bitmap, 20488 in gd; block bitmap corrupt.
> [    3.877937] Aborting journal on device mmcblk0p13-8.
> [    3.885025] Kernel panic - not syncing: EXT4-fs (device mmcblk0p13): panic forced after error
This message means that the file system has detected an inconsistency --- specifically, that the number of blocks marked as in use in the allocation bitmap is different from what is in the block group descriptors.
The file system has been marked to force a panic after an error, at which point e2fsck will be able to repair the inconsistency.
What's not clear is *how* or *why* this happened. It can happen simply because of a hardware problem. (In particular, not all mmc flash devices handle power failures gracefully.) Or it could be a cosmic ray, or it might be a kernel bug.
Normally I would chalk this up to a hardware bug, but it's possible that it is a kernel bug. If people can reliably reproduce the problem where no power failures or other unclean shutdowns were involved (since the last time the file system was checked using e2fsck) then that would be really interesting.
We should probably also change the message so that it is a bit more understandable to people who aren't ext4 developers.
- Ted ^ permalink raw reply [flat|nested] 50+ messages in thread
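For reference, the check that produces the "clusters in bitmap / in gd" message is small. A minimal sketch of the relevant part of fs/ext4/mballoc.c:ext4_mb_generate_buddy() from this era (paraphrased, not the verbatim source):

	/* "free" has just been counted bit-by-bit from the on-disk
	 * block bitmap; grp->bb_free was taken from the block group
	 * descriptor when the group info was set up */
	if (free != grp->bb_free) {
		ext4_grp_locked_error(sb, group, 0, 0,
				"%u clusters in bitmap, %u in gd; "
				"block bitmap corrupt",
				free, grp->bb_free);
		/* flag the group so the allocator avoids it from now on */
		set_bit(EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT, &grp->bb_state);
	}

The message itself only says that the two counts disagree; it carries no information about whether the bitmap, the descriptor, the hardware, or the software is at fault, which is exactly the ambiguity discussed above.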
* Re: ext4: journal has aborted 2014-07-01 15:58 ` Theodore Ts'o @ 2014-07-01 16:14 ` Lukáš Czerner 2014-07-01 16:36 ` Eric Whitney 2014-07-02 9:44 ` David Jander 2 siblings, 0 replies; 50+ messages in thread From: Lukáš Czerner @ 2014-07-01 16:14 UTC (permalink / raw) To: Theodore Ts'o Cc: Jaehoon Chung, Darrick J. Wong, Matteo Croce, David Jander, linux-ext4
On Tue, 1 Jul 2014, Theodore Ts'o wrote:
> Date: Tue, 1 Jul 2014 11:58:12 -0400
> From: Theodore Ts'o <tytso@mit.edu>
> To: Jaehoon Chung <jh80.chung@samsung.com>
> Cc: Darrick J. Wong <darrick.wong@oracle.com>,
> Matteo Croce <technoboy85@gmail.com>, David Jander <david@protonic.nl>,
> linux-ext4@vger.kernel.org
> Subject: Re: ext4: journal has aborted
>
> On Tue, Jul 01, 2014 at 09:07:27PM +0900, Jaehoon Chung wrote:
> > Hi,
> >
> > I am interested in this problem, because I have also run into it. Is it a journal problem?
> >
> > I am using Linux version 3.16.0-rc3.
> >
> > [    3.866449] EXT4-fs error (device mmcblk0p13): ext4_mb_generate_buddy:756: group 0, 20490 clusters in bitmap, 20488 in gd; block bitmap corrupt.
> > [    3.877937] Aborting journal on device mmcblk0p13-8.
> > [    3.885025] Kernel panic - not syncing: EXT4-fs (device mmcblk0p13): panic forced after error
>
> This message means that the file system has detected an inconsistency
> --- specifically, that the number of blocks marked as in use in the
> allocation bitmap is different from what is in the block group
> descriptors.
>
> The file system has been marked to force a panic after an error, at
> which point e2fsck will be able to repair the inconsistency.
>
> What's not clear is *how* or *why* this happened. It can happen simply
> because of a hardware problem. (In particular, not all mmc flash
> devices handle power failures gracefully.) Or it could be a cosmic
> ray, or it might be a kernel bug.
>
> Normally I would chalk this up to a hardware bug, but it's possible
> that it is a kernel bug. If people can reliably reproduce the problem
> where no power failures or other unclean shutdowns were involved
> (since the last time the file system was checked using e2fsck) then
> that would be really interesting.
>
> We should probably also change the message so that it is a bit
> more understandable to people who aren't ext4 developers.
Yes, we probably should :) Maybe we could make it clearer that there might be a wide variety of reasons for this, and that one of the likely ones is hardware related.
-Lukas
>
> - Ted
> -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-01 15:58 ` Theodore Ts'o 2014-07-01 16:14 ` Lukáš Czerner @ 2014-07-01 16:36 ` Eric Whitney 2014-07-02 8:34 ` Matteo Croce 2014-07-02 10:17 ` David Jander 1 sibling, 2 replies; 50+ messages in thread From: Eric Whitney @ 2014-07-01 16:36 UTC (permalink / raw) To: Theodore Ts'o Cc: Jaehoon Chung, Darrick J. Wong, Matteo Croce, David Jander, linux-ext4
* Theodore Ts'o <tytso@mit.edu>:
> On Tue, Jul 01, 2014 at 09:07:27PM +0900, Jaehoon Chung wrote:
> > Hi,
> >
> > I am interested in this problem, because I have also run into it. Is it a journal problem?
> >
> > I am using Linux version 3.16.0-rc3.
> >
> > [    3.866449] EXT4-fs error (device mmcblk0p13): ext4_mb_generate_buddy:756: group 0, 20490 clusters in bitmap, 20488 in gd; block bitmap corrupt.
> > [    3.877937] Aborting journal on device mmcblk0p13-8.
> > [    3.885025] Kernel panic - not syncing: EXT4-fs (device mmcblk0p13): panic forced after error
>
> This message means that the file system has detected an inconsistency
> --- specifically, that the number of blocks marked as in use in the
> allocation bitmap is different from what is in the block group
> descriptors.
>
> The file system has been marked to force a panic after an error, at
> which point e2fsck will be able to repair the inconsistency.
>
> What's not clear is *how* or *why* this happened. It can happen simply
> because of a hardware problem. (In particular, not all mmc flash
> devices handle power failures gracefully.) Or it could be a cosmic
> ray, or it might be a kernel bug.
>
> Normally I would chalk this up to a hardware bug, but it's possible
> that it is a kernel bug. If people can reliably reproduce the problem
> where no power failures or other unclean shutdowns were involved
> (since the last time the file system was checked using e2fsck) then
> that would be really interesting.
Hi Ted:
I saw a similar failure during 3.16-rc3 (plus ext4 stable fixes plus the msync patch) regression testing on the Pandaboard this morning. A generic/068 hang on data_journal required a reboot for recovery (an old bug, though rarer lately). On reboot, the root filesystem - default 4K, and on an SD card - went ro after the same sort of bad block bitmap / journal abort sequence. Rebooting forced a fsck that cleared up the problem.
The target test filesystem was on a USB-attached disk, and it did not exhibit the same problems on recovery.
So, it looks like there might be more than just hardware involved here, although eMMC/flash might be a common denominator. I'll see if I can come up with a reliable reproducer once the regression pass is finished, if someone doesn't beat me to it.
Eric
>
> We should probably also change the message so that it is a bit
> more understandable to people who aren't ext4 developers.
>
> - Ted
> -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-01 16:36 ` Eric Whitney @ 2014-07-02 8:34 ` Matteo Croce 2014-07-02 10:17 ` David Jander 1 sibling, 0 replies; 50+ messages in thread From: Matteo Croce @ 2014-07-02 8:34 UTC (permalink / raw) To: Eric Whitney Cc: Theodore Ts'o, Jaehoon Chung, Darrick J. Wong, David Jander, linux-ext4 Similar issue on an X86 router: # dmesg Initializing cgroup subsys cpu Linux version 3.15.0-alix (root@alix) (gcc version 4.8.3 (Debian 4.8.3-2) ) #2 Mon Jun 9 16:54:44 CEST 2014 KERNEL supported cpus: AMD AuthenticAMD e820: BIOS-provided physical RAM map: BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved BIOS-e820: [mem 0x0000000000100000-0x000000000fffffff] usable BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved Notice: NX (Execute Disable) protection missing in CPU! e820: update [mem 0x00000000-0x00000fff] usable ==> reserved e820: remove [mem 0x000a0000-0x000fffff] usable e820: last_pfn = 0x10000 max_arch_pfn = 0x100000 initial memory mapped: [mem 0x00000000-0x017fffff] Base memory trampoline at [c009b000] 9b000 size 16384 init_memory_mapping: [mem 0x00000000-0x000fffff] [mem 0x00000000-0x000fffff] page 4k init_memory_mapping: [mem 0x0fc00000-0x0fffffff] [mem 0x0fc00000-0x0fffffff] page 2M init_memory_mapping: [mem 0x08000000-0x0fbfffff] [mem 0x08000000-0x0fbfffff] page 2M init_memory_mapping: [mem 0x00100000-0x07ffffff] [mem 0x00100000-0x003fffff] page 4k [mem 0x00400000-0x07ffffff] page 2M 256MB LOWMEM available. mapped low ram: 0 - 10000000 low ram: 0 - 10000000 Zone ranges: DMA [mem 0x00001000-0x00ffffff] Normal [mem 0x01000000-0x0fffffff] Movable zone start for each node Early memory node ranges node 0: [mem 0x00001000-0x0009ffff] node 0: [mem 0x00100000-0x0fffffff] On node 0 totalpages: 65439 DMA zone: 32 pages used for memmap DMA zone: 0 pages reserved DMA zone: 3999 pages, LIFO batch:0 Normal zone: 480 pages used for memmap Normal zone: 61440 pages, LIFO batch:15 e820: [mem 0x10000000-0xffefffff] available for PCI devices pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768 pcpu-alloc: [0] 0 Built 1 zonelists in Zone order, mobility grouping on. Total pages: 64927 Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.15.0-alix root=/dev/sda1 ro console=ttyS0,115200 panic=1 init=/bin/systemd PID hash table entries: 1024 (order: 0, 4096 bytes) Dentry cache hash table entries: 32768 (order: 5, 131072 bytes) Inode-cache hash table entries: 16384 (order: 4, 65536 bytes) Initializing CPU#0 Memory: 255748K/261756K available (2409K kernel code, 151K rwdata, 628K rodata, 160K init, 236K bss, 6008K reserved) virtual kernel memory layout: fixmap : 0xfffe5000 - 0xfffff000 ( 104 kB) vmalloc : 0xd0800000 - 0xfffe3000 ( 759 MB) lowmem : 0xc0000000 - 0xd0000000 ( 256 MB) .init : 0xc1320000 - 0xc1348000 ( 160 kB) .data : 0xc125aaef - 0xc131ece0 ( 784 kB) .text : 0xc1000000 - 0xc125aaef (2410 kB) Checking if this processor honours the WP bit even in supervisor mode...Ok. SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 NR_IRQS:16 nr_irqs:16 16 CPU 0 irqstacks, hard=cf808000 soft=cf80a000 console [ttyS0] enabled tsc: Fast TSC calibration using PIT tsc: Detected 498.030 MHz processor Calibrating delay loop (skipped), value calculated using timer frequency.. 
996.06 BogoMIPS (lpj=4980300) pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 1024 (order: 0, 4096 bytes) Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes) Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0 Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0 tlb_flushall_shift: -1 CPU: Geode(TM) Integrated Processor by AMD PCS (fam: 06, model: 0a, stepping: 02) Performance Events: no APIC, boot with the "lapic" boot parameter to force-enable it. no hardware sampling interrupt available. Broken PMU hardware detected, using software events only. Failed to access perfctr msr (MSR c0010004 is 0) devtmpfs: initialized NET: Registered protocol family 16 cpuidle: using governor ladder cpuidle: using governor menu PCI: PCI BIOS revision 2.10 entry at 0xfced9, last bus=0 PCI: Using configuration type 1 for base access SCSI subsystem initialized libata version 3.00 loaded. PCI: Probing PCI hardware PCI: root bus 00: using default resources PCI: Probing PCI hardware (bus 00) PCI host bridge to bus 0000:00 pci_bus 0000:00: root bus resource [io 0x0000-0xffff] pci_bus 0000:00: root bus resource [mem 0x00000000-0xffffffff] pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff] pci 0000:00:01.0: [1022:2080] type 00 class 0x060000 pci 0000:00:01.0: reg 0x10: [io 0xac1c-0xac1f] pci 0000:00:01.2: [1022:2082] type 00 class 0x101000 pci 0000:00:01.2: reg 0x10: [mem 0xefff4000-0xefff7fff] pci 0000:00:09.0: [1106:3053] type 00 class 0x020000 pci 0000:00:09.0: reg 0x10: [io 0x1000-0x10ff] pci 0000:00:09.0: reg 0x14: [mem 0xe0000000-0xe00000ff] pci 0000:00:09.0: supports D1 D2 pci 0000:00:09.0: PME# supported from D0 D1 D2 D3hot D3cold pci 0000:00:0c.0: [168c:0029] type 00 class 0x028000 pci 0000:00:0c.0: reg 0x10: [mem 0xe0040000-0xe004ffff] pci 0000:00:0c.0: PME# supported from D0 D3hot pci 0000:00:0f.0: [1022:2090] type 00 class 0x060100 pci 0000:00:0f.0: reg 0x10: [io 0x6000-0x6007] pci 0000:00:0f.0: reg 0x14: [io 0x6100-0x61ff] pci 0000:00:0f.0: reg 0x18: [io 0x6200-0x623f] pci 0000:00:0f.0: reg 0x20: [io 0x9d00-0x9d7f] pci 0000:00:0f.0: reg 0x24: [io 0x9c00-0x9c3f] pci 0000:00:0f.2: [1022:209a] type 00 class 0x010180 pci 0000:00:0f.2: reg 0x20: [io 0xff00-0xff0f] pci 0000:00:0f.2: legacy IDE quirk: reg 0x10: [io 0x01f0-0x01f7] pci 0000:00:0f.2: legacy IDE quirk: reg 0x14: [io 0x03f6] pci 0000:00:0f.2: legacy IDE quirk: reg 0x18: [io 0x0170-0x0177] pci 0000:00:0f.2: legacy IDE quirk: reg 0x1c: [io 0x0376] pci 0000:00:0f.4: [1022:2094] type 00 class 0x0c0310 pci 0000:00:0f.4: reg 0x10: [mem 0xefffe000-0xefffefff] pci 0000:00:0f.4: PME# supported from D0 D3hot D3cold pci 0000:00:0f.5: [1022:2095] type 00 class 0x0c0320 pci 0000:00:0f.5: reg 0x10: [mem 0xefffd000-0xefffdfff] pci 0000:00:0f.5: PME# supported from D0 D3hot D3cold pci_bus 0000:00: busn_res: [bus 00-ff] end is updated to 00 PCI: pci_cache_line_size set to 32 bytes Switched to clocksource pit pci_bus 0000:00: resource 4 [io 0x0000-0xffff] pci_bus 0000:00: resource 5 [mem 0x00000000-0xffffffff] NET: Registered protocol family 2 TCP established hash table entries: 2048 (order: 1, 8192 bytes) TCP bind hash table entries: 2048 (order: 1, 8192 bytes) TCP: Hash tables configured (established 2048 bind 2048) TCP: reno registered UDP hash table entries: 256 (order: 0, 4096 bytes) UDP-Lite hash table entries: 256 (order: 0, 4096 bytes) NET: Registered protocol family 1 platform rtc_cmos: registered platform RTC device (no PNP device found) futex hash table entries: 256 (order: -1, 3072 bytes) msgmni has been 
set to 499 Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252) io scheduler noop registered io scheduler deadline registered (default) Serial: 8250/16550 driver, 1 ports, IRQ sharing disabled serial8250: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 921600) is a NS16550A scsi0 : pata_cs5536 scsi1 : pata_cs5536 ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xff00 irq 14 ata2: DUMMY rtc_cmos rtc_cmos: rtc core: registered rtc_cmos as rtc0 rtc_cmos rtc_cmos: alarms up to one day, 114 bytes nvram TCP: cubic registered NET: Registered protocol family 10 NET: Registered protocol family 17 rtc_cmos rtc_cmos: setting system clock to 2000-01-01 00:00:04 UTC (946684804) ata1.00: CFA: , 20101012, max UDMA/100 ata1.00: 62537328 sectors, multi 0: LBA ata1.00: limited to UDMA/33 due to 40-wire cable ata1.00: configured for UDMA/33 scsi 0:0:0:0: Direct-Access ATA 2010 PQ: 0 ANSI: 5 sd 0:0:0:0: [sda] 62537328 512-byte logical blocks: (32.0 GB/29.8 GiB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA sda: sda1 sda2 sd 0:0:0:0: [sda] Attached SCSI disk EXT4-fs (sda1): couldn't mount as ext3 due to feature incompatibilities EXT4-fs (sda1): couldn't mount as ext2 due to feature incompatibilities EXT4-fs (sda1): INFO: recovery required on readonly filesystem EXT4-fs (sda1): write access will be enabled during recovery Switched to clocksource tsc EXT4-fs (sda1): orphan cleanup on readonly fs EXT4-fs (sda1): 1 orphan inode deleted EXT4-fs (sda1): recovery complete EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null) VFS: Mounted root (ext4 filesystem) readonly on device 8:1. devtmpfs: mounted Freeing unused kernel memory: 160K (c1320000 - c1348000) Write protecting the kernel text: 2412k Write protecting the kernel read-only data: 632k systemd[1]: systemd 204 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ) systemd[1]: Inserted module 'autofs4' systemd[1]: Set hostname to <alix>. random: systemd urandom read with 30 bits of entropy available systemd[1]: Cannot add dependency job for unit display-manager.service, ignoring: Unit display-manager.service failed to load: No such file or directory. See system logs and 'systemctl status display-manager.service' for details. systemd[1]: Expecting device dev-ttyS0.device... systemd[1]: Starting Forward Password Requests to Wall Directory Watch. systemd[1]: Started Forward Password Requests to Wall Directory Watch. systemd[1]: Starting Syslog Socket. systemd[1]: Listening on Syslog Socket. systemd[1]: Starting Delayed Shutdown Socket. systemd[1]: Listening on Delayed Shutdown Socket. systemd[1]: Starting /dev/initctl Compatibility Named Pipe. systemd[1]: Listening on /dev/initctl Compatibility Named Pipe. systemd[1]: Starting Dispatch Password Requests to Console Directory Watch. systemd[1]: Started Dispatch Password Requests to Console Directory Watch. systemd[1]: Starting Encrypted Volumes. systemd[1]: Reached target Encrypted Volumes. systemd[1]: Starting udev Kernel Socket. systemd[1]: Listening on udev Kernel Socket. systemd[1]: Starting udev Control Socket. systemd[1]: Listening on udev Control Socket. systemd[1]: Set up automount Arbitrary Executable File Formats File System Automount Point. systemd[1]: Starting Journal Socket. systemd[1]: Listening on Journal Socket. systemd[1]: Starting Syslog. systemd[1]: Reached target Syslog. 
systemd[1]: Mounted Huge Pages File System. systemd[1]: Started Set Up Additional Binary Formats. systemd[1]: Starting Create static device nodes in /dev... systemd[1]: Starting Apply Kernel Variables... systemd[1]: Starting Load Kernel Modules... systemd[1]: Starting udev Coldplug all Devices... systemd[1]: Starting Journal Service... systemd[1]: Started Journal Service. systemd[1]: Mounted POSIX Message Queue File System. systemd[1]: Expecting device dev-sda2.device... systemd[1]: Starting File System Check on Root Device... cs5535-smb cs5535-smb: SCx200 device 'CS5535 ACB0' registered cs5535-mfgpt cs5535-mfgpt: reserved resource region [io 0x6200-0x623f] cs5535-mfgpt cs5535-mfgpt: 8 MFGPT timers available cs5535-mfd 0000:00:0f.0: 5 devices registered. systemd[1]: Started Create static device nodes in /dev. systemd[1]: Started Apply Kernel Variables. cs5535-mfgpt cs5535-mfgpt: registered timer 0 cs5535-clockevt: Registering MFGPT timer as a clock event, using IRQ 7 systemd[1]: Starting udev Kernel Device Manager... systemd-udevd[317]: starting version 204 EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro cs5535-gpio cs5535-gpio: reserved resource region [io 0x6100-0x61ff] via_rhine: v1.10-LK1.5.1 2010-10-09 Written by Donald Becker via-rhine 0000:00:09.0 eth0: VIA Rhine III (Management Adapter) at 0xe0000000, 00:0d:b9:19:4c:bc, IRQ 10 via-rhine 0000:00:09.0 eth0: MII PHY found at address 1, status 0x7849 advertising 05e1 Link 0000 AMD Geode RNG detected geode-aes: GEODE AES engine enabled. cfg80211: Calling CRDA to update world regulatory domain usbcore: registered new interface driver usbfs usbcore: registered new interface driver hub usbcore: registered new device driver usb ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver ohci-pci: OHCI PCI platform driver ohci-pci 0000:00:0f.4: OHCI PCI host controller ohci-pci 0000:00:0f.4: new USB bus registered, assigned bus number 1 ohci-pci 0000:00:0f.4: irq 12, io mem 0xefffe000 ehci-pci: EHCI PCI platform driver hub 1-0:1.0: USB hub found hub 1-0:1.0: 4 ports detected ehci-pci 0000:00:0f.5: EHCI Host Controller ehci-pci 0000:00:0f.5: new USB bus registered, assigned bus number 2 ehci-pci 0000:00:0f.5: irq 12, io mem 0xefffd000 cfg80211: World regulatory domain updated: cfg80211: DFS Master region: unset cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time) cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A) cfg80211: (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A) cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A) cfg80211: (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A) cfg80211: (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A) cfg80211: (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A) ehci-pci 0000:00:0f.5: USB 2.0 started, EHCI 1.00 hub 2-0:1.0: USB hub found hub 2-0:1.0: 4 ports detected hub 1-0:1.0: USB hub found hub 1-0:1.0: 4 ports detected ath: EEPROM regdomain: 0x0 ath: EEPROM indicates default country code should be used ath: doing EEPROM country->regdmn map search ath: country maps to regdmn code: 0x3a ath: Country alpha2 being used: US ath: Regpair used: 0x3a ieee80211 phy0: Selected rate control algorithm 'minstrel_ht' ieee80211 phy0: Atheros AR9280 Rev:2 mem=0xd0940000, irq=9 IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready cfg80211: Calling CRDA for country: US cfg80211: Regulatory domain changed to country: US 
cfg80211: DFS Master region: unset cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time) cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 3000 mBm), (N/A) cfg80211: (5170000 KHz - 5250000 KHz @ 80000 KHz), (N/A, 1700 mBm), (N/A) cfg80211: (5250000 KHz - 5330000 KHz @ 80000 KHz), (N/A, 2300 mBm), (0 s) cfg80211: (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 3000 mBm), (N/A) cfg80211: (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 4000 mBm), (N/A) Adding 858932k swap on /dev/sda2. Priority:-1 extents:1 across:858932k random: nonblocking pool is initialized IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready EXT4-fs error (device sda1): ext4_mb_generate_buddy:756: group 114, 24855 clusters in bitmap, 24856 in gd; block bitmap corrupt. Aborting journal on device sda1-8. EXT4-fs (sda1): Remounting filesystem read-only # e2fsck -fy /dev/sda1 e2fsck 1.42.10 (18-May-2014) /dev/sda1: recovering journal Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Pass 5: Checking group summary information Free blocks count wrong (6708644, counted=6708655). Fix? yes Free inodes count wrong (1752623, counted=1752627). Fix? yes /dev/sda1: ***** FILE SYSTEM WAS MODIFIED ***** /dev/sda1: ***** REBOOT LINUX ***** /dev/sda1: 147917/1900544 files (0.1% non-contiguous), 893521/7602176 blocks 2014-07-01 18:36 GMT+02:00 Eric Whitney <enwlinux@gmail.com>: > [...] -- Matteo Croce OpenWrt Developer ^ permalink raw reply [flat|nested] 50+ messages in thread
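The divergent outcomes in this thread - Jaehoon's board panics on the error while Matteo's filesystems remount read-only - follow from each filesystem's configured error behavior rather than from the corruption itself. A sketch of inspecting and changing that setting, assuming e2fsprogs; the device name is a placeholder:

  tune2fs -l /dev/sda1 | grep 'Errors behavior'   # show the current setting
  tune2fs -e remount-ro /dev/sda1                 # alternatives: continue, panic
  # The same choice can be made per mount: mount -o errors=remount-ro ...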
* Re: ext4: journal has aborted 2014-07-01 16:36 ` Eric Whitney 2014-07-02 8:34 ` Matteo Croce @ 2014-07-02 10:17 ` David Jander 2014-07-02 10:19 ` Matteo Croce 1 sibling, 1 reply; 50+ messages in thread From: David Jander @ 2014-07-02 10:17 UTC (permalink / raw) To: Eric Whitney Cc: Theodore Ts'o, Jaehoon Chung, Darrick J. Wong, Matteo Croce, linux-ext4 Hi Eric, On Tue, 1 Jul 2014 12:36:46 -0400 Eric Whitney <enwlinux@gmail.com> wrote: > * Theodore Ts'o <tytso@mit.edu>: > > [...] > Hi Ted: > I saw a similar failure during a 3.16-rc3 (plus ext4 stable fixes plus msync patch) regression run on the Pandaboard this morning. A generic/068 hang on data_journal required a reboot for recovery (an old bug, though rarer lately). On reboot, the root filesystem - default 4K, and on an SD card - went read-only after the same sort of bad block bitmap / journal abort sequence. Rebooting forced a fsck that cleared up the problem. The target test filesystem was on a USB-attached disk, and it did not exhibit the same problems on recovery. Please be careful about drawing conclusions from regular SD cards and USB sticks used for mass storage. Unlike hardened eMMC (4.41+), these COTS mass-storage devices are not meant for intensive use and can quite easily corrupt data on their own. I've seen it happen many times already. > So, it looks like there might be more than just hardware involved here, although eMMC/flash might be a common denominator. I'll see if I can come up with a reliable reproducer once the regression pass is finished if someone doesn't beat me to it. I agree that there is a strong correlation with flash-based storage, but I cannot explain why this factor would make a difference. How do flash-based block devices look different to ext4 than spinning-disk media (besides trim support)? Best regards, -- David Jander Protonic Holland. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: ext4: journal has aborted 2014-07-02 10:17 ` David Jander @ 2014-07-02 10:19 ` Matteo Croce 2014-07-03 17:14 ` Eric Whitney 0 siblings, 1 reply; 50+ messages in thread From: Matteo Croce @ 2014-07-02 10:19 UTC (permalink / raw) To: David Jander Cc: Eric Whitney, Theodore Ts'o, Jaehoon Chung, Darrick J. Wong, linux-ext4 2014-07-02 12:17 GMT+02:00 David Jander <david@protonic.nl>: > [...] > I agree that there is a strong correlation with flash-based storage, but I cannot explain why this factor would make a difference. How do flash-based block devices look different to ext4 than spinning-disk media (besides trim support)? Maybe the near-zero access latency can trigger some race condition? > Best regards, > > -- > David Jander > Protonic Holland.
-- Matteo Croce OpenWrt Developer ^ permalink raw reply [flat|nested] 50+ messages in thread
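Whether trim is in play at all can be checked quickly, since ext4 only issues discards when mounted with -o discard or when fstrim is run explicitly. A sketch, assuming util-linux is available; the device name is a placeholder:

  lsblk -D /dev/mmcblk0        # non-zero DISC-GRAN/DISC-MAX means the device accepts discards
  grep discard /proc/mounts    # shows whether anything was actually mounted with -o discard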
* Re: ext4: journal has aborted 2014-07-02 10:19 ` Matteo Croce @ 2014-07-03 17:14 ` Eric Whitney 2014-07-03 23:17 ` Theodore Ts'o 0 siblings, 1 reply; 50+ messages in thread From: Eric Whitney @ 2014-07-03 17:14 UTC (permalink / raw) To: Matteo Croce Cc: David Jander, Eric Whitney, Theodore Ts'o, Jaehoon Chung, Darrick J. Wong, linux-ext4 * Matteo Croce <technoboy85@gmail.com>: > 2014-07-02 12:17 GMT+02:00 David Jander <david@protonic.nl>: > > [...] > >> > Normally I would chalk this up to a hardware bug, but it's possible that it is a kernel bug. If people can reliably reproduce the problem where no power failures or other unclean shutdowns were involved (since the last time the file system was checked using e2fsck) then that would be really interesting. > >> So, it looks like there might be more than just hardware involved here, although eMMC/flash might be a common denominator. I'll see if I can come up with a reliable reproducer once the regression pass is finished if someone doesn't beat me to it. I've not found a reproducer that doesn't involve an unclean shutdown, which is what Ted's looking for. However, I've noted a behavioral change between 3.14 and 3.15-rc3 that might be of interest in connection with the failure scenario described above using xfstests generic/068. It's possible that this change would make filesystem damage caused by an unclean shutdown more likely or more noticeable, and perhaps it's in play for the power fail/cycle cases described in this thread. FWIW, I've also been able to reproduce that failure scenario on an x86_64 KVM with raw virtio disks alone. It's just a lot harder to get there with that configuration - many more trials required. The change is that the root filesystem sustains damage reported as - EXT4-fs error (device mmcblk0p3): ext4_mb_generate_buddy:757: group 65, 1243 clusters in bitmap, 1244 in gd; block bitmap corrupt. Aborting journal on device mmcblk0p3-8. EXT4-fs error (device mmcblk0p3): ext4_journal_check_start:56: Detected aborted journal EXT4-fs (mmcblk0p3): Remounting filesystem read-only - when generic/068 is run on a separate test filesystem and forces an ext4 failure that requires a power cycle / reset to recover from a hung reboot attempt. This doesn't happen in 3.14 on either my x86_64 or ARM test systems. Generally, the root filesystem doesn't appear to be affected at all or is minimally affected (does not require fsck to fully recover) in 3.14, whereas a fsck is usually required to recover the root in 3.15-rc3. My attempts to bisect further into 3.15-rc1 to 3.15-rc3 haven't gone well as yet - other kernel problems are making it difficult to work in there. Eric ^ permalink raw reply [flat|nested] 50+ messages in thread
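Eric's trial counts suggest a simple retry harness around the test. A sketch of such a loop, assuming a plain xfstests checkout already configured with TEST_DEV/SCRATCH_DEV; the iteration count mirrors the 100-200 trials mentioned below, and note that some xfstests versions read EXT_MOUNT_OPTIONS rather than MOUNT_OPTIONS for ext4:

  cd xfstests
  export MOUNT_OPTIONS="-o data=journal,block_validity"
  for i in $(seq 1 200); do
      ./check generic/068 || break   # stop at the first hang or failure
  done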
* Re: ext4: journal has aborted 2014-07-03 17:14 ` Eric Whitney @ 2014-07-03 23:17 ` Theodore Ts'o 2014-07-04 20:48 ` Eric Whitney 0 siblings, 1 reply; 50+ messages in thread From: Theodore Ts'o @ 2014-07-03 23:17 UTC (permalink / raw) To: Eric Whitney Cc: Matteo Croce, David Jander, Jaehoon Chung, Darrick J. Wong, linux-ext4 On Thu, Jul 03, 2014 at 01:14:34PM -0400, Eric Whitney wrote: > > FWIW, I've also been able to reproduce that failure scenario on an > x86_64 KVM with raw virtio disks alone. It's just a lot harder to > get there with that configuration - many more trials required. What configuration are you using when you see the failure using x86_64 KVM with raw virtio? Is this using kvm-xfstests, or some other setup? And which file system is getting corrupted? Because with kvm-xfstests the root file system is a read-only snapshot, and the "mmcblk0p3" device which you show before is clearly not the virtio device.... - Ted ^ permalink raw reply [flat|nested] 50+ messages in thread
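For reference, the kvm-xfstests harness Ted mentions wraps this kind of setup in a disposable VM with a read-only root snapshot. A sketch of typical invocations, assuming a built xfstests-bld tree; exact option spellings have varied across versions:

  kvm-xfstests -c 4k generic/068   # run one test against the 4k ext4 configuration
  kvm-xfstests smoke               # quick sanity set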
* Re: ext4: journal has aborted 2014-07-03 23:17 ` Theodore Ts'o @ 2014-07-04 20:48 ` Eric Whitney 0 siblings, 0 replies; 50+ messages in thread From: Eric Whitney @ 2014-07-04 20:48 UTC (permalink / raw) To: Theodore Ts'o Cc: Eric Whitney, Matteo Croce, David Jander, Jaehoon Chung, Darrick J. Wong, linux-ext4 * Theodore Ts'o <tytso@mit.edu>: > On Thu, Jul 03, 2014 at 01:14:34PM -0400, Eric Whitney wrote: > > > > FWIW, I've also been able to reproduce that failure scenario on an > > x86_64 KVM with raw virtio disks alone. It's just a lot harder to > > get there with that configuration - many more trials required. > > What configuration are you using when you see the failure using x86_64 > KVM with raw virtio? Is this using kvm-xfstests, or some other setup? > And which file system is getting corrupted? > > Because with kvm-xfstests the root file system is a read-only > snapshot, and the "mmcblk0p3" device which you show before is clearly > not the virtio device.... > I'd meant the x86_64 note (paragraph beginning with FWIW) as an aside, and that regrettably caused confusion. Here's a more formal report, this time largely limited to an x86_64 case where eMMC is not involved.

x86_64 KVM case:
Base distro installed - Ubuntu 13.10
Kernel version - mainline 3.15-rc3
e2fsprogs - master branch, bb9cca2ca9
xfstests - master branch, 45d1fac130
xfsprogs - master branch, 03e956b252
xfstests-bld - master branch, 723c23e1b2 (about a year old)

(e2fsprogs, xfstests, and xfsprogs are relatively recent - 1 to 2 months old, and the latest available as of 3.15-rc1.) The test scenario involves running xfstests generic/068 using the runtests.sh script from xfstests-bld within a KVM constructed with libvirt tools, where all test components were built and installed by hand. (kvm-xfstests was not used here.) The virtual machine running the test used a raw virtio disk for the root (vda) and for the test filesystem (vdc). (Two other virtio disks were also attached but not used for this test - vdb and vdd.) vda contained a vanilla 4k ext4 filesystem as provided by the distro. vdc contained a 4k ext4 filesystem mounted with the data=journal and block_validity mount options, newly created and mounted before each attempt to execute the test. The failure sequence: 1) running generic/068 on vdc forces a BUG - [ 1976.238967] kernel BUG at fs/buffer.c:3017!
[ 1976.239070] invalid opcode: 0000 [#1] SMP [ 1976.239150] Modules linked in: kvm_intel kvm psmouse microcode i2c_piix4 serio_raw virtio_balloon [ 1976.239286] CPU: 0 PID: 11705 Comm: xfs_io Not tainted 3.15.0-rc3-emmc+ #1 [ 1976.239380] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 1976.239472] task: ffff88007624a110 ti: ffff8800768da000 task.ti: ffff8800768da000 [ 1976.239568] RIP: 0010:[<ffffffff811d849c>] [<ffffffff811d849c>] _submit_bh+0x18c/0x210 [ 1976.239682] RSP: 0018:ffff8800768dbd38 EFLAGS: 00010246 [ 1976.239763] RAX: 000000000022c005 RBX: ffff88006e3e3e38 RCX: 0000000000000000 [ 1976.242063] RDX: 0000000000000000 RSI: ffff88006e3e3e38 RDI: 0000000000000411 [ 1976.242759] RBP: ffff8800768dbd58 R08: 0000000000000000 R09: 0000000000000000 [ 1976.242759] R10: 00000000000360b0 R11: 0000000225c17d03 R12: ffff88006e3e3e38 [ 1976.242759] R13: ffff8800768dbe14 R14: ffff880067648800 R15: ffff88006e023410 [ 1976.242759] FS: 00007f2e03b85740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000 [ 1976.242759] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1976.242759] CR2: 000000000061da00 CR3: 0000000076a68000 CR4: 00000000000006f0 [ 1976.242759] Stack: [ 1976.242759] ffff88006e3e3e38 0000000000000411 ffff8800768dbe14 ffff880067648800 [ 1976.242759] ffff8800768dbd78 ffffffff811d8f71 0000000000000001 ffff880067648b00 [ 1976.242759] ffff8800768dbde0 ffffffff8129a67d 0000000091827364 ffff8800768dbd90 [ 1976.242759] Call Trace: [ 1976.242759] [<ffffffff811d8f71>] write_dirty_buffer+0x51/0x70 [ 1976.242759] [<ffffffff8129a67d>] __flush_batch+0x4d/0xa0 [ 1976.242759] [<ffffffff8129ac5c>] jbd2_log_do_checkpoint+0x1dc/0x480 [ 1976.242759] [<ffffffff8129f574>] jbd2_journal_flush+0x94/0x180 [ 1976.242759] [<ffffffff8126277e>] ext4_freeze+0x3e/0x80 [ 1976.242759] [<ffffffff811a6a88>] freeze_super+0xb8/0x130 [ 1976.242759] [<ffffffff811b7d3e>] do_vfs_ioctl+0x2ce/0x520 [ 1976.242759] [<ffffffff816d3689>] ? retint_swapgs+0xe/0x13 [ 1976.242759] [<ffffffff810a09ed>] ? trace_hardirqs_on_caller+0xfd/0x1c0 [ 1976.242759] [<ffffffff811b8011>] SyS_ioctl+0x81/0xa0 [ 1976.242759] [<ffffffff816dbf92>] system_call_fastpath+0x16/0x1b [ 1976.242759] Code: 5d 41 5e 5d c3 66 2e 0f 1f 84 00 00 00 00 00 40 f6 c7 01 0f 84 e3 fe ff ff f0 41 80 64 24 01 f7 e9 d7 fe ff ff 0f 0b 0f 0b 0f 0b <0f> 0b 0f 0b 48 8b 53 78 c1 e0 09 41 f6 c5 01 89 43 28\ 89 42 08 [ 1976.242759] RIP [<ffffffff811d849c>] _submit_bh+0x18c/0x210 [ 1976.242759] RSP <ffff8800768dbd38> [ 1976.268124] ---[ end trace c0ad2d4fabce4179 ]--- (Note that this snippet was extracted from the kernel log - /var/log/kern.log on the root filesystem, and was therefore written successfully.) 2) Components of the test are then left in the D state (fsstress blocked on wchan sync_inodes_sb and rm on vfs_unlink according to ps), and the test's controlling shell is unresponsive. 3) Rebooting the test system then leads to a hang as the system is shutting down - unresponsive console, shutdown does not make progress. (Nothing new here.) 4) Resetting the virtual machine leads to a reboot, which in most (but not all) cases results in a read-only root filesystem. (This is where the filesystem damage is that I mentioned.) 
From the dmesg log after reboot (since the kernel log can't be written once the root filesystem is read only) - [ 1.030238] EXT4-fs (vda1): INFO: recovery required on readonly filesystem [ 1.031162] EXT4-fs (vda1): write access will be enabled during recovery [ 1.432329] tsc: Refined TSC clocksource calibration: 3000.110 MHz [ 1.472223] EXT4-fs (vda1): recovery complete [ 1.477099] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: (null) [ 1.598232] random: init urandom read with 38 bits of entropy available [ 2.079563] Adding 1046524k swap on /dev/vda5. Priority:-1 extents:1 across:1046524k FS [ 2.305267] systemd-udevd[263]: starting version 204 [ 2.440193] Switched to clocksource tsc [ 2.441306] EXT4-fs (vda1): re-mounted. Opts: errors=remount-ro [ 2.626895] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0xb100, revision 0 [ 2.729328] audit: type=1400 audit(1404499589.548:2): apparmor="STATUS" operation="profile_load" name="/sbin/dhclient" pid=314 comm="apparmor_parser" [ 2.729338] audit: type=1400 audit(1404499589.548:3): apparmor="STATUS" operation="profile_load" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=314 comm="apparmor_parser" [ 2.729342] audit: type=1400 audit(1404499589.548:4): apparmor="STATUS" operation="profile_load" name="/usr/lib/connman/scripts/dhclient-script" pid=314 comm="apparmor_parser" [ 2.729803] audit: type=1400 audit(1404499589.548:5): apparmor="STATUS" operation="profile_replace" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=314 comm="apparmor_parser" [ 2.729810] audit: type=1400 audit(1404499589.548:6): apparmor="STATUS" operation="profile_replace" name="/usr/lib/connman/scripts/dhclient-script" pid=314 comm="apparmor_parser" [ 2.729997] audit: type=1400 audit(1404499589.548:7): apparmor="STATUS" operation="profile_replace" name="/usr/lib/connman/scripts/dhclient-script" pid=314 comm="apparmor_parser" [ 2.853872] microcode: CPU0 sig=0x623, pf=0x0, revision=0x1 [ 2.858650] microcode: CPU1 sig=0x623, pf=0x0, revision=0x1 [ 2.871491] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba [ 3.780154] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input3 [ 4.012541] init: failsafe main process (628) killed by TERM signal [ 4.300987] audit: type=1400 audit(1404499591.120:8): apparmor="STATUS" operation="profile_replace" name="/sbin/dhclient" pid=749 comm="apparmor_parser" [ 4.300997] audit: type=1400 audit(1404499591.120:9): apparmor="STATUS" operation="profile_replace" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=749 comm="apparmor_parser" [ 4.301003] audit: type=1400 audit(1404499591.120:10): apparmor="STATUS" operation="profile_replace" name="/usr/lib/connman/scripts/dhclient-script" pid=749 comm="apparmor_parser" [ 4.361772] EXT4-fs error (device vda1): ext4_mb_generate_buddy:756: group 42, 2591 clusters in bitmap, 2592 in gd; block bitmap corrupt. [ 4.361908] Aborting journal on device vda1-8. [ 4.362198] EXT4-fs (vda1): Remounting filesystem read-only (Note that this snippet is complete from the first indication of successful recovery of the root filesystem on reboot down to the apparent subsequent discovery of block bitmap corruption and journal abort.) 5) A second reboot attempt forces fsck to run and successfully recover the root filesystem, whereupon the test system resumes normal operation. (I've never observed a failed recovery attempt at this step.)
This same sequence occurs in 3.14 (and has over several past releases) with the exception of the block bitmap corruption indication and automated fsck recovery on reboot here at the end. That appears to be new in 3.15 (at least as early as -rc3). It's about an order of magnitude harder for me to reproduce this on my x86_64 KVM than on my Pandaboard test system. It requires 100-200 generic/068 trials to trigger the failure on x86_64 as opposed to roughly 10 on the Pandaboard (this is a measure of the relative difficulty in triggering the generic/068 failure only - the block bitmap corruption then generally but not always occurs). So far, I've not been able to reproduce the failure on 3.14 on either test system. To be clear, it's not that I think the generic/068 failure is directly related to the block bitmap corruption on the root filesystem - it simply appears to be a (really awkward) way to trigger that behavior. Eric ^ permalink raw reply [flat|nested] 50+ messages in thread
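Eric notes above that log snippets had to be pulled from dmesg because the log file can't be written once the root filesystem goes read-only. One way around that in this kind of debugging is netconsole, which streams printk output over UDP to another machine. A sketch, with all addresses, ports, and interface names as placeholders:

  # on the machine under test: local-port@local-ip/interface,remote-port@remote-ip/
  modprobe netconsole netconsole=6665@192.168.1.2/eth0,6666@192.168.1.10/
  # on the log host (traditional netcat wants "nc -u -l -p 6666"):
  nc -u -l 6666 | tee failure.log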
* Re: ext4: journal has aborted 2014-07-01 15:58 ` Theodore Ts'o 2014-07-01 16:14 ` Lukáš Czerner 2014-07-01 16:36 ` Eric Whitney @ 2014-07-02 9:44 ` David Jander 2 siblings, 0 replies; 50+ messages in thread From: David Jander @ 2014-07-02 9:44 UTC (permalink / raw) To: Theodore Ts'o Cc: Jaehoon Chung, Darrick J. Wong, Matteo Croce, linux-ext4 Hi Ted, On Tue, 1 Jul 2014 11:58:12 -0400 "Theodore Ts'o" <tytso@mit.edu> wrote: > On Tue, Jul 01, 2014 at 09:07:27PM +0900, Jaehoon Chung wrote: > > Hi, > > > > I'm interested in this problem, because I also found the same problem. > > Is it a journal problem? > > > > I used Linux version 3.16.0-rc3. > > > > [ 3.866449] EXT4-fs error (device mmcblk0p13): ext4_mb_generate_buddy:756: group 0, 20490 clusters in bitmap, 20488 in gd; block bitmap corrupt. > > [ 3.877937] Aborting journal on device mmcblk0p13-8. > > [ 3.885025] Kernel panic - not syncing: EXT4-fs (device mmcblk0p13): panic forced after error > > This message means that the file system has detected an inconsistency --- specifically, that the number of blocks marked as in use in the allocation bitmap is different from what is in the block group descriptors. > > The file system has been marked to force a panic after an error, at which point e2fsck will be able to repair the inconsistency. > > What's not clear is *why* this happened. It can happen simply because of a hardware problem. (In particular, not all mmc flash devices handle power failures gracefully.) Or it could be a cosmic ray, or it might be a kernel bug. I understand all this. > Normally I would chalk this up to a hardware bug, but it's possible that it is a kernel bug. If people can reliably reproduce the problem where no power failures or other unclean shutdowns were involved (since the last time the file system was checked using e2fsck) then that would be really interesting. If you read my first reply to Matteo, you would have noticed that I can reliably reproduce this bug with ext4, and that I can be pretty confident that this is NOT a hardware issue. Here's (again) why: The eMMC device supports eMMC 4.41 and is configured with all the "hardening" features necessary for embedded systems that boot from eMMC: Enhanced mode is active (SLC NAND mode) and reliable writes are turned on. This means that (at least by design) when a power cut occurs it is guaranteed that: 1.- The sector currently being written will be either in the old state or in the new (re-written) state, but never "in-between" or in an unstable state (which is what happens to regular MLC NAND flash). 2.- No other sectors of the flash may be affected by write interruptions on one sector. So power cuts should always end up just requiring a journal replay on the next mount. No real corruption should ever occur this way. Right? I have been testing with both EXT3 and EXT4 on this device and I only see problems when using EXT4. Furthermore, the reproduction procedure triggers this error with almost 100% reliability when using EXT4, and until now I have not been able to use it to corrupt or otherwise harm an EXT3 filesystem beyond simply replaying the journal on the next boot. Please tell me what you want me to test to continue investigating. I am convinced this is a kernel bug, but I'd be happy if you managed to prove me wrong. I could even try git bisecting if you think this could help, but if I have to go too far back in time to find a working version I will get into trouble getting the kernel to boot on my hardware without patching a lot of things on each iteration.... > We should probably also change the message so that it is a bit more understandable to people who aren't ext4 developers. That would be nice, but not really necessary. Let's find the bug and solve it instead. Best regards, -- David Jander Protonic Holland. ^ permalink raw reply [flat|nested] 50+ messages in thread
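A condensed sketch of the reproduction David describes, with the device node, mount point, and timing as placeholders; the power cut itself has to be applied externally (bench supply, relay, or simply pulling the plug):

  mkfs.ext4 /dev/mmcblk0p2
  mount /dev/mmcblk0p2 /mnt
  bonnie++ -d /mnt -u root &   # stress the filesystem with mixed I/O
  sleep 60                     # let it run, then cut power while bonnie++ is writing
  # after reboot: remount and watch dmesg for ext4_mb_generate_buddy errors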
* Re: ext4: journal has aborted 2014-07-01 6:26 ` David Jander 2014-07-01 8:00 ` Matteo Croce 2014-07-01 8:42 ` Darrick J. Wong @ 2014-07-01 9:02 ` Darrick J. Wong 2 siblings, 0 replies; 50+ messages in thread From: Darrick J. Wong @ 2014-07-01 9:02 UTC (permalink / raw) To: David Jander; +Cc: Matteo Croce, linux-ext4, Theodore Ts'o On Tue, Jul 01, 2014 at 08:26:19AM +0200, David Jander wrote: > > Hi, > > On Mon, 30 Jun 2014 23:30:10 +0200 > Matteo Croce <technoboy85@gmail.com> wrote: > > > I was web surfing and using gimp when: > > > > EXT4-fs error (device sda2): ext4_mb_generate_buddy:756: group 199, > > 9414 clusters in bitmap, 9500 in gd; block bitmap corrupt. > > I was about to post a related question to this list. I am also seeing these > kinds of errors when using ext4 on latest mainline (I began testing with 3.15 > where I saw this and now in 3.16-rc3 it is still there). > It happens almost instantly when power-cycling the system (unclean shutdown). > The next time the system boots, I get these errors. Hmm... David, could you also send me your dmesg when this happens? (I hope you can figure out a way to stash it somewhere...) I'm particularly interested in what the FS might do with regard to post-crash recovery. --D > > AFAICT, you are using a pretty recent kernel. Which version exactly? > > > Aborting journal on device sda2-8. > > [...] > > I did not get these errors. I suspect this may be a consequence of FS corruption due to a bug in ext4. > > Here's why I suspect a bug: > > I am running latest git head (3.16-rc3+ as of yesterday) on an ARM system with eMMC flash. The eMMC is formatted in SLC mode ("enhanced" mode according to eMMC 4.41) and "reliable-writes" are enabled, so power-cycling should not cause FS corruption in the presence of a journal. > > I can format the eMMC device either as EXT3 or EXT4 for the test. After formatting and writing the rootfs to the partition I can boot successfully in either situation. Once booted from eMMC, I start bonnie++ (to just stress the FS for a while), and after a minute or so the board is power-cycled while bonnie++ is still running. > > Next time I boot the situation is this: > > With EXT3: All seems fine, the journal is replayed, no errors. I can repeat this as many times as I want, the FS stays consistent. > > With EXT4: After just one power cycle I start getting this: > > [ 7.603871] EXT4-fs error (device mmcblk0p2): ext4_mb_generate_buddy:757: group 1, 8542 clusters in bitmap, 8550 in gd; block bitmap corrupt. > [ 7.616743] JBD2: Spotted dirty metadata buffer (dev = mmcblk0p2, blocknr = 0). There's a risk of filesystem corruption in case of system crash. > > If I continue the test, it doesn't take long before serious corruption starts occurring. > > Again, with EXT3 I am unable to detect any problems. > > Best regards, > > -- > David Jander > Protonic Holland. ^ permalink raw reply [flat|nested] 50+ messages in thread
end of thread, other threads:[~2014-07-15 6:31 UTC | newest] Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-06-30 21:30 ext4: journal has aborted Matteo Croce 2014-07-01 6:26 ` David Jander 2014-07-01 8:00 ` Matteo Croce 2014-07-01 8:42 ` Darrick J. Wong 2014-07-01 8:55 ` Matteo Croce 2014-07-02 13:49 ` Dmitry Monakhov 2014-07-03 13:43 ` Theodore Ts'o 2014-07-03 14:15 ` David Jander 2014-07-03 14:46 ` Theodore Ts'o 2014-07-03 14:57 ` Dmitry Monakhov 2014-07-03 14:58 ` Dmitry Monakhov 2014-07-04 9:40 ` David Jander 2014-07-04 10:17 ` Dmitry Monakhov 2014-07-04 11:28 ` David Jander 2014-07-04 12:20 ` Theodore Ts'o 2014-07-04 12:38 ` Dmitry Monakhov 2014-07-04 13:45 ` David Jander 2014-07-04 18:45 ` Theodore Ts'o 2014-07-04 22:46 ` Dave Chinner 2014-07-05 2:30 ` Dmitry Monakhov 2014-07-05 20:36 ` Theodore Ts'o 2014-07-07 12:17 ` David Jander 2014-07-07 15:53 ` Theodore Ts'o 2014-07-07 22:31 ` Darrick J. Wong 2014-07-07 22:56 ` Theodore Ts'o 2014-07-10 18:57 ` Eric Whitney 2014-07-10 20:01 ` Darrick J. Wong 2014-07-10 21:31 ` Matteo Croce 2014-07-10 22:32 ` Theodore Ts'o 2014-07-11 0:13 ` Darrick J. Wong 2014-07-11 0:45 ` Eric Whitney 2014-07-11 8:50 ` Jaehoon Chung 2014-07-11 11:43 ` Theodore Ts'o 2014-07-15 6:31 ` David Jander 2014-07-10 23:29 ` Azat Khuzhin 2014-07-04 11:04 ` Jaehoon Chung 2014-07-04 11:32 ` David Jander 2014-07-01 12:07 ` Jaehoon Chung 2014-07-01 13:50 ` David Jander 2014-07-01 15:58 ` Theodore Ts'o 2014-07-01 16:14 ` Lukáš Czerner 2014-07-01 16:36 ` Eric Whitney 2014-07-02 8:34 ` Matteo Croce 2014-07-02 10:17 ` David Jander 2014-07-02 10:19 ` Matteo Croce 2014-07-03 17:14 ` Eric Whitney 2014-07-03 23:17 ` Theodore Ts'o 2014-07-04 20:48 ` Eric Whitney 2014-07-02 9:44 ` David Jander 2014-07-01 9:02 ` Darrick J. Wong