* 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc)
@ 2009-12-24 22:28 Alexander Beregalov
  2009-12-24 22:49 ` Alexander Beregalov
  2009-12-24 23:05 ` tytso
  0 siblings, 2 replies; 16+ messages in thread
From: Alexander Beregalov @ 2009-12-24 22:28 UTC (permalink / raw)
  To: Linux Kernel Mailing List, linux-ext4, Jens Axboe, Theodore Ts'o

Hi

Kernel is 2.6.33-rc1-00366-g2f99f5c
Ext4 mounts ext3 filesystem

kernel BUG at fs/ext4/inode.c:1063!
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
flush-8:0(1137): Kernel bad sw trap 5 [#1]
TSTATE: 0000000080001603 TPC: 0000000000544fb4 TNPC: 0000000000544fb8 Y: 00000000    Not tainted
TPC: <ext4_get_blocks+0x3f4/0x400>
g0: fffff800dc862fe0 g1: 0000000000000001 g2: 0000000000000001 g3: fffff800dc85f9c1
g4: fffff800df305880 g5: 2e68006800002e68 g6: fffff800dc860000 g7: 0000000000833e08
o0: 00000000007b0cb8 o1: 0000000000000427 o2: 0000000000000040 o3: 0000000000000040
o4: fffff800dc8630a0 o5: 000000000000000d sp: fffff800dc8628d1 ret_pc: 0000000000544fac
RPC: <ext4_get_blocks+0x3ec/0x400>
l0: 0000000000000002 l1: fffff800c2674000 l2: fffff800c26740d0 l3: 0000000000000001
l4: fffff800df39d2d0 l5: 00000000000003f9 l6: 0006000000000000 l7: 0000000000001ffe
i0: 0000000000000040 i1: fffff800c2674130 i2: 000000000000064c i3: fffff800c26745a8
i4: fffff800dc863308 i5: 0000000000000003 i6: fffff800dc8629a1 i7: 0000000000545380
I7: <mpage_da_map_blocks+0x80/0x800>
Disabling lock debugging due to kernel taint
Caller[0000000000545380]: mpage_da_map_blocks+0x80/0x800
Caller[00000000005461c0]: mpage_add_bh_to_extent+0x40/0x100
Caller[000000000054642c]: __mpage_da_writepage+0x1ac/0x220
Caller[00000000004a951c]: write_cache_pages+0x19c/0x380
Caller[0000000000545d7c]: ext4_da_writepages+0x27c/0x680
Caller[00000000004a978c]: do_writepages+0x2c/0x60
Caller[00000000004f94cc]: writeback_single_inode+0xcc/0x3c0
Caller[00000000004fa3d8]: writeback_inodes_wb+0x338/0x500
Caller[00000000004fa6e8]: wb_writeback+0x148/0x220
Caller[00000000004fab00]: wb_do_writeback+0x240/0x260
Caller[00000000004fab8c]: bdi_writeback_task+0x6c/0xc0
Caller[00000000004b6f50]: bdi_start_fn+0x70/0xe0
Caller[000000000047030c]: kthread+0x6c/0x80
Caller[000000000042bc9c]: kernel_thread+0x3c/0x60
Caller[0000000000470408]: kthreadd+0xe8/0x160
Instruction DUMP: 92102427 7ffb95a5 901220b8 <91d02005> 01000000 01000000 9de3bf40 11002096 a4100018
note: flush-8:0[1137] exited with preempt_count 1

^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc) 2009-12-24 22:28 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc) Alexander Beregalov @ 2009-12-24 22:49 ` Alexander Beregalov 2009-12-25 12:31 ` Dmitry Monakhov 2009-12-24 23:05 ` tytso 1 sibling, 1 reply; 16+ messages in thread From: Alexander Beregalov @ 2009-12-24 22:49 UTC (permalink / raw) To: Linux Kernel Mailing List, linux-ext4, Jens Axboe, Theodore Ts'o 2009/12/25 Alexander Beregalov <a.beregalov@gmail.com>: > Hi > > Kernel is 2.6.33-rc1-00366-g2f99f5c > Ext4 mounts ext3 filesystem > > > > kernel BUG at fs/ext4/inode.c:1063! > \|/ ____ \|/ > "@'/ .. \`@" > /_| \__/ |_\ > \__U_/ > flush-8:0(1137): Kernel bad sw trap 5 [#1] > TSTATE: 0000000080001603 TPC: 0000000000544fb4 TNPC: 0000000000544fb8 > Y: 00000000 Not tainted > TPC: <ext4_get_blocks+0x3f4/0x400> > g0: fffff800dc862fe0 g1: 0000000000000001 g2: 0000000000000001 g3: > fffff800dc85f9c1 > g4: fffff800df305880 g5: 2e68006800002e68 g6: fffff800dc860000 g7: > 0000000000833e08 > o0: 00000000007b0cb8 o1: 0000000000000427 o2: 0000000000000040 o3: > 0000000000000040 > o4: fffff800dc8630a0 o5: 000000000000000d sp: fffff800dc8628d1 ret_pc: > 0000000000544fac > RPC: <ext4_get_blocks+0x3ec/0x400> > l0: 0000000000000002 l1: fffff800c2674000 l2: fffff800c26740d0 l3: > 0000000000000001 > l4: fffff800df39d2d0 l5: 00000000000003f9 l6: 0006000000000000 l7: > 0000000000001ffe > i0: 0000000000000040 i1: fffff800c2674130 i2: 000000000000064c i3: > fffff800c26745a8 > i4: fffff800dc863308 i5: 0000000000000003 i6: fffff800dc8629a1 i7: > 0000000000545380 > I7: <mpage_da_map_blocks+0x80/0x800> > Disabling lock debugging due to kernel taint > Caller[0000000000545380]: mpage_da_map_blocks+0x80/0x800 > Caller[00000000005461c0]: mpage_add_bh_to_extent+0x40/0x100 > Caller[000000000054642c]: __mpage_da_writepage+0x1ac/0x220 > Caller[00000000004a951c]: write_cache_pages+0x19c/0x380 > Caller[0000000000545d7c]: ext4_da_writepages+0x27c/0x680 > Caller[00000000004a978c]: 
do_writepages+0x2c/0x60
> Caller[00000000004f94cc]: writeback_single_inode+0xcc/0x3c0
> Caller[00000000004fa3d8]: writeback_inodes_wb+0x338/0x500
> Caller[00000000004fa6e8]: wb_writeback+0x148/0x220
> Caller[00000000004fab00]: wb_do_writeback+0x240/0x260
> Caller[00000000004fab8c]: bdi_writeback_task+0x6c/0xc0
> Caller[00000000004b6f50]: bdi_start_fn+0x70/0xe0
> Caller[000000000047030c]: kthread+0x6c/0x80
> Caller[000000000042bc9c]: kernel_thread+0x3c/0x60
> Caller[0000000000470408]: kthreadd+0xe8/0x160
> Instruction DUMP: 92102427 7ffb95a5 901220b8 <91d02005> 01000000 01000000 9de3bf40 11002096 a4100018
> note: flush-8:0[1137] exited with preempt_count 1
>

It seems I can easily reproduce it.
But I can't compile 2.6.33-rc2 :)

scripts/kconfig/conf -s arch/sparc/Kconfig
  CHK     include/linux/version.h
  CHK     include/generated/utsrelease.h
  CALL    scripts/checksyscalls.sh
  CHK     include/generated/compile.h
  GZIP    kernel/config_data.gz
  CC      fs/configfs/inode.o
  IKCFG   kernel/config_data.h
  LD [M]  fs/btrfs/btrfs.o
  CC      kernel/configs.o
fs/btrfs/sysfs.o: file not recognized: File truncated
make[2]: *** [fs/btrfs/btrfs.o] Error 1
make[1]: *** [fs/btrfs] Error 2
make[1]: *** Waiting for unfinished jobs....
* Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc) 2009-12-24 22:49 ` Alexander Beregalov @ 2009-12-25 12:31 ` Dmitry Monakhov 2009-12-25 19:33 ` Alexander Beregalov 0 siblings, 1 reply; 16+ messages in thread From: Dmitry Monakhov @ 2009-12-25 12:31 UTC (permalink / raw) To: Alexander Beregalov Cc: Linux Kernel Mailing List, linux-ext4, Jens Axboe, Theodore Ts'o Alexander Beregalov <a.beregalov@gmail.com> writes: > 2009/12/25 Alexander Beregalov <a.beregalov@gmail.com>: >> Hi >> >> Kernel is 2.6.33-rc1-00366-g2f99f5c >> Ext4 mounts ext3 filesystem >> >> >> >> kernel BUG at fs/ext4/inode.c:1063! >> \|/ ____ \|/ >> "@'/ .. \`@" >> /_| \__/ |_\ >> \__U_/ >> flush-8:0(1137): Kernel bad sw trap 5 [#1] >> TSTATE: 0000000080001603 TPC: 0000000000544fb4 TNPC: 0000000000544fb8 >> Y: 00000000 Not tainted >> TPC: <ext4_get_blocks+0x3f4/0x400> >> g0: fffff800dc862fe0 g1: 0000000000000001 g2: 0000000000000001 g3: >> fffff800dc85f9c1 >> g4: fffff800df305880 g5: 2e68006800002e68 g6: fffff800dc860000 g7: >> 0000000000833e08 >> o0: 00000000007b0cb8 o1: 0000000000000427 o2: 0000000000000040 o3: >> 0000000000000040 >> o4: fffff800dc8630a0 o5: 000000000000000d sp: fffff800dc8628d1 ret_pc: >> 0000000000544fac >> RPC: <ext4_get_blocks+0x3ec/0x400> >> l0: 0000000000000002 l1: fffff800c2674000 l2: fffff800c26740d0 l3: >> 0000000000000001 >> l4: fffff800df39d2d0 l5: 00000000000003f9 l6: 0006000000000000 l7: >> 0000000000001ffe >> i0: 0000000000000040 i1: fffff800c2674130 i2: 000000000000064c i3: >> fffff800c26745a8 >> i4: fffff800dc863308 i5: 0000000000000003 i6: fffff800dc8629a1 i7: >> 0000000000545380 >> I7: <mpage_da_map_blocks+0x80/0x800> >> Disabling lock debugging due to kernel taint >> Caller[0000000000545380]: mpage_da_map_blocks+0x80/0x800 >> Caller[00000000005461c0]: mpage_add_bh_to_extent+0x40/0x100 >> Caller[000000000054642c]: __mpage_da_writepage+0x1ac/0x220 >> Caller[00000000004a951c]: write_cache_pages+0x19c/0x380 >> Caller[0000000000545d7c]: 
ext4_da_writepages+0x27c/0x680
>> Caller[00000000004a978c]: do_writepages+0x2c/0x60
>> Caller[00000000004f94cc]: writeback_single_inode+0xcc/0x3c0
>> Caller[00000000004fa3d8]: writeback_inodes_wb+0x338/0x500
>> Caller[00000000004fa6e8]: wb_writeback+0x148/0x220
>> Caller[00000000004fab00]: wb_do_writeback+0x240/0x260
>> Caller[00000000004fab8c]: bdi_writeback_task+0x6c/0xc0
>> Caller[00000000004b6f50]: bdi_start_fn+0x70/0xe0
>> Caller[000000000047030c]: kthread+0x6c/0x80
>> Caller[000000000042bc9c]: kernel_thread+0x3c/0x60
>> Caller[0000000000470408]: kthreadd+0xe8/0x160
>> Instruction DUMP: 92102427 7ffb95a5 901220b8 <91d02005> 01000000 01000000 9de3bf40 11002096 a4100018
>> note: flush-8:0[1137] exited with preempt_count 1
>>
>
> It seems I can easily reproduce it.
> But I can't compile 2.6.33-rc2 :)
>
> scripts/kconfig/conf -s arch/sparc/Kconfig
>   CHK     include/linux/version.h
>   CHK     include/generated/utsrelease.h
>   CALL    scripts/checksyscalls.sh
>   CHK     include/generated/compile.h
>   GZIP    kernel/config_data.gz
>   CC      fs/configfs/inode.o
>   IKCFG   kernel/config_data.h
>   LD [M]  fs/btrfs/btrfs.o
>   CC      kernel/configs.o
> fs/btrfs/sysfs.o: file not recognized: File truncated

This happens because of delayed allocation. Each time a BUG or an
unexpected power-off happens while object files are being written, they
usually end up broken. IMHO this is an expected issue. Just recompile
from the beginning:
# make clean; make -j4

So your testcase is kernel compilation.
Strange: I have been living with the quota patches on my notebook for
more than a month (two weeks with the version committed to the quota git
tree), with and without quota, and this has never happened.
Currently I'm trying to reproduce the bug on 2.6.33-rc2.
Please keep me in Cc, because it seems the bug was introduced
(or just triggered) by my quota patches.

> make[2]: *** [fs/btrfs/btrfs.o] Error 1
> make[1]: *** [fs/btrfs] Error 2
> make[1]: *** Waiting for unfinished jobs....
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc)
  2009-12-25 12:31 ` Dmitry Monakhov
@ 2009-12-25 19:33 ` Alexander Beregalov
  2009-12-25 23:47 ` Dmitry Monakhov
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Beregalov @ 2009-12-25 19:33 UTC (permalink / raw)
  To: Dmitry Monakhov
  Cc: Linux Kernel Mailing List, linux-ext4, Jens Axboe, Theodore Ts'o

>> It seems I can easily reproduce it.
>> But I can't compile 2.6.33-rc2 :)
>>
>> scripts/kconfig/conf -s arch/sparc/Kconfig
>>   CHK     include/linux/version.h
>>   CHK     include/generated/utsrelease.h
>>   CALL    scripts/checksyscalls.sh
>>   CHK     include/generated/compile.h
>>   GZIP    kernel/config_data.gz
>>   CC      fs/configfs/inode.o
>>   IKCFG   kernel/config_data.h
>>   LD [M]  fs/btrfs/btrfs.o
>>   CC      kernel/configs.o
>> fs/btrfs/sysfs.o: file not recognized: File truncated
> This happens because of delayed allocation. Each time BUG or
> unexpected power off happens during object files usually becomes
> broken. IMHO this is expected issue. Just recompile from beginning
> # make clean; make -j4

It does not help, it still fails.
I will try to crosscompile the kernel with Ted's patch on another host.

> As soon as your testcase is kernel compilation.
> Strange i'm living with quota patches on my notebook more than
> a month( two weeks with the version committed to quota git tree)
> with and without quota . But this never happens.
> Currently i'm trying to reproduce the bug on 2.6.33-rc2
> Please add keep me in cc because seems the bug was introduced
> (or just triggered) by my quota patches.
>> make[2]: *** [fs/btrfs/btrfs.o] Error 1
>> make[1]: *** [fs/btrfs] Error 2
>> make[1]: *** Waiting for unfinished jobs....
* Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc)
  2009-12-25 19:33 ` Alexander Beregalov
@ 2009-12-25 23:47 ` Dmitry Monakhov
  2009-12-27 20:32 ` Alexander Beregalov
  0 siblings, 1 reply; 16+ messages in thread
From: Dmitry Monakhov @ 2009-12-25 23:47 UTC (permalink / raw)
  To: Alexander Beregalov
  Cc: Linux Kernel Mailing List, linux-ext4, Jens Axboe, Theodore Ts'o

Alexander Beregalov <a.beregalov@gmail.com> writes:

>>> It seems I can easily reproduce it.
>>> But I can't compile 2.6.33-rc2 :)

BTW, which git commit (sha1) did you use to reproduce the bug?
(2.6.33-rc1 HEAD does not have this BUG_ON.)
It is important for me to know; alternatively, just post your
fs/ext4/inode.c file.

>>>
>>> scripts/kconfig/conf -s arch/sparc/Kconfig
>>>   CHK     include/linux/version.h
>>>   CHK     include/generated/utsrelease.h
>>>   CALL    scripts/checksyscalls.sh
>>>   CHK     include/generated/compile.h
>>>   GZIP    kernel/config_data.gz
>>>   CC      fs/configfs/inode.o
>>>   IKCFG   kernel/config_data.h
>>>   LD [M]  fs/btrfs/btrfs.o
>>>   CC      kernel/configs.o
>>> fs/btrfs/sysfs.o: file not recognized: File truncated
>> This happens because of delayed allocation. Each time BUG or
>> unexpected power off happens during object files usually becomes
>> broken. IMHO this is expected issue. Just recompile from beginning
>> # make clean; make -j4
>
> It does not help, it still fails.

Strange again; please run fsck. What about compiling it from the very
beginning (starting from unpacking the tarball from kernel.org)?
Or maybe compile it on another filesystem (ext3, or
ext4 with the nodelalloc option).

> I will try to crosscompile the kernel with Ted's patch on another host.
>

Sadly, I still cannot reproduce your bug.
So far I have tested the following configurations:
system   : 2.6.33-rc2, x86 two-core CPU with 2GB of RAM
block dev: real SATA drive, loopdev over tmpfs
mkfs     : 4k and 1k blocksize
mount    : w/o quota, quota, journaled quota
quota    : both ON and OFF states
fs-load  : - fsstress with 1,4,16,32 concurrent tasks
           - kernel compilation -j4, -j32
           - In fact, my mail-dir is currently under quota control.

Please clarify your use-case:
0) Your system specification: cpu_num, mem_size, page_size (I guess 8k),
   block device.
1) mkfs options
2) mount options
3) quota options (if any)
4) your fs load test-case
5) How long does it take you to reproduce the bug?

>> As soon as your testcase is kernel compilation.
>> Strange i'm living with quota patches on my notebook more than
>> a month( two weeks with the version committed to quota git tree)
>> with and without quota . But this never happens.
>> Currently i'm trying to reproduce the bug on 2.6.33-rc2
>> Please add keep me in cc because seems the bug was introduced
>> (or just triggered) by my quota patches.
>>> make[2]: *** [fs/btrfs/btrfs.o] Error 1
>>> make[1]: *** [fs/btrfs] Error 2
>>> make[1]: *** Waiting for unfinished jobs....
* Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc) 2009-12-25 23:47 ` Dmitry Monakhov @ 2009-12-27 20:32 ` Alexander Beregalov 2009-12-27 21:38 ` Dmitry Torokhov 2009-12-27 22:52 ` tytso 0 siblings, 2 replies; 16+ messages in thread From: Alexander Beregalov @ 2009-12-27 20:32 UTC (permalink / raw) To: Dmitry Monakhov Cc: Linux Kernel Mailing List, linux-ext4, Jens Axboe, Theodore Ts'o, dmitry.torokhov It seems Dmitry Torokhov has the same issue, Cc'ed. 2009/12/26 Dmitry Monakhov <dmonakhov@openvz.org>: > Alexander Beregalov <a.beregalov@gmail.com> writes: > >>>> It seems I can easily reproduce it. >>>> But I can't compile 2.6.33-rc2 :) > BTW what sha1 of the git-commit you have used to reproduce > the bug (2.6.33-rc1 HEAD has no this BUG_ON). > This is important to me to know it, or just post the > fs/ext4/inode.c file. It was in the first post - 2f99f5c There is only OCFS update between it and -rc2. >>>> >>>> scripts/kconfig/conf -s arch/sparc/Kconfig >>>> CHK include/linux/version.h >>>> CHK include/generated/utsrelease.h >>>> CALL scripts/checksyscalls.sh >>>> CHK include/generated/compile.h >>>> GZIP kernel/config_data.gz >>>> CC fs/configfs/inode.o >>>> IKCFG kernel/config_data.h >>>> LD [M] fs/btrfs/btrfs.o >>>> CC kernel/configs.o >>>> fs/btrfs/sysfs.o: file not recognized: File truncated >>> This happens because of delayed allocation. Each time BUG or >>> unexpected power off happens during object files usually becomes >>> broken. IMHO this is expected issue. Just recompile from beginning >>> # make clean; make -j4 >> >> It does not help, it still fails. > Again strange, please run fsck. What about compile it from very > beginning (start from unpacking tar-ball from kernel.org) > Or may be compile it on another file-system(ext3 or > ext4 with nodelalloc option) I tried fsck, it did not find any problem, kernel build still fails after it. >> I will try to crosscompile the kernel with Ted's patch on another host. 
Here is output of 2.6.33-rc2 plus Ted's patch

EXT4-fs (sda1): inode #1387643: mdb_free (1) < mdb_claim (2) BUG

------------[ cut here ]------------
WARNING: at fs/ext4/inode.c:1067 ext4_get_blocks+0x3f0/0x440()
Modules linked in:
Call Trace:
 [0000000000456bb0] warn_slowpath_common+0x50/0xa0
 [0000000000456c1c] warn_slowpath_null+0x1c/0x40
 [0000000000545010] ext4_get_blocks+0x3f0/0x440
 [0000000000545420] mpage_da_map_blocks+0x80/0x800
 [0000000000546260] mpage_add_bh_to_extent+0x40/0x100
 [00000000005464cc] __mpage_da_writepage+0x1ac/0x220
 [00000000004a957c] write_cache_pages+0x19c/0x380
 [0000000000545e1c] ext4_da_writepages+0x27c/0x680
 [00000000004a97ec] do_writepages+0x2c/0x60
 [00000000004f952c] writeback_single_inode+0xcc/0x3c0
 [00000000004fa438] writeback_inodes_wb+0x338/0x500
 [00000000004fa748] wb_writeback+0x148/0x220
 [00000000004fab60] wb_do_writeback+0x240/0x260
 [00000000004fabec] bdi_writeback_task+0x6c/0xc0
 [00000000004b6fb0] bdi_start_fn+0x70/0xe0
 [000000000047036c] kthread+0x6c/0x80
---[ end trace 46a56c443941c84d ]---

>>
> It is sad, but i still can not reproduce your bug.
> At this time i've tested following configurations:
> system : 2.6.33-rc2, x86 two cores cpu with 2GB of ram
> block dev: real sata drive, loopdev over tmpfs
> mkfs : 4k and 1k blocksize
> mount : w/o quota, quota, journaled quota
> quota : both ON and OFF states
> fs-load : - fsstress with 1,4,16,32 concurrent tasks
>           - kernel compilation -j4, -j32
>           - In fact currently my mail-dir is under quota control.
> Please clarify your use-case:
> 0) Your system speciffication: cpu_num, mem_size, page_size(i guess 8k)
> block device.
UltraSparc IIe, UP, 2Gb, 8kb, real SCSI disk (sym53c8xx driver)
> 1) mkfs options
I do not remember.
Perhaps dumpe2fs can help

root@v120 ~ # dumpe2fs -h /dev/sda1
dumpe2fs 1.41.9 (22-Aug-2009)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          b34f302e-78a3-4f80-bae6-31639456216c
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags:         signed_directory_hash
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              2113536
Block count:              8448000
Reserved block count:     422400
Free blocks:              6661110
Free inodes:              1861302
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1021
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Filesystem created:       Tue Nov 10 00:44:17 2009
Last mount time:          Sun Dec 27 20:05:48 2009
Last write time:          Sat Dec 26 10:59:00 2009
Mount count:              3
Maximum mount count:      21
Last checked:             Sat Dec 26 06:07:50 2009
Check interval:           15552000 (6 months)
Next check after:         Thu Jun 24 07:07:50 2010
Lifetime writes:          30 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      ae1ec2f1-0f86-4f26-ace5-eb656fd25709
Journal backup:           inode blocks
Journal size:             128M

> 2) mount options
noatime
> 3) quota options (if any)
No
> 4) your fs load test-case
Have not tried to find a simpler testcase yet.
make CROSS_COMPILE="ccache sparc64-unknown-linux-gnu-" -j4 zImage modules

Hm, perhaps ccache is the real trigger of the problem.

> 5) How long does it takes you to reproduce the bug.
Few seconds (~5)
* Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc) 2009-12-27 20:32 ` Alexander Beregalov @ 2009-12-27 21:38 ` Dmitry Torokhov 2009-12-27 22:52 ` tytso 1 sibling, 0 replies; 16+ messages in thread From: Dmitry Torokhov @ 2009-12-27 21:38 UTC (permalink / raw) To: Alexander Beregalov Cc: Dmitry Monakhov, Linux Kernel Mailing List, linux-ext4, Jens Axboe, Theodore Ts'o On Sun, Dec 27, 2009 at 11:32:25PM +0300, Alexander Beregalov wrote: > It seems Dmitry Torokhov has the same issue, Cc'ed. > > 2009/12/26 Dmitry Monakhov <dmonakhov@openvz.org>: > > Alexander Beregalov <a.beregalov@gmail.com> writes: > > > >>>> It seems I can easily reproduce it. > >>>> But I can't compile 2.6.33-rc2 :) > > BTW what sha1 of the git-commit you have used to reproduce > > the bug (2.6.33-rc1 HEAD has no this BUG_ON). > > This is important to me to know it, or just post the > > fs/ext4/inode.c file. > > It was in the first post - 2f99f5c > There is only OCFS update between it and -rc2. > > >>>> > >>>> scripts/kconfig/conf -s arch/sparc/Kconfig > >>>> CHK include/linux/version.h > >>>> CHK include/generated/utsrelease.h > >>>> CALL scripts/checksyscalls.sh > >>>> CHK include/generated/compile.h > >>>> GZIP kernel/config_data.gz > >>>> CC fs/configfs/inode.o > >>>> IKCFG kernel/config_data.h > >>>> LD [M] fs/btrfs/btrfs.o > >>>> CC kernel/configs.o > >>>> fs/btrfs/sysfs.o: file not recognized: File truncated > >>> This happens because of delayed allocation. Each time BUG or > >>> unexpected power off happens during object files usually becomes > >>> broken. IMHO this is expected issue. Just recompile from beginning > >>> # make clean; make -j4 > >> > >> It does not help, it still fails. > > Again strange, please run fsck. What about compile it from very > > beginning (start from unpacking tar-ball from kernel.org) > > Or may be compile it on another file-system(ext3 or > > ext4 with nodelalloc option) > > I tried fsck, it did not find any problem, kernel build still fails after it. 
>

Are you using ccache? I do, and all the breakage is hidden there (so
"make clean" does not help); just clean your cache and you should be
good to go.

> >> I will try to crosscompile the kernel with Ted's patch on another host.
>
> Here is output of 2.6.33-rc2 plus Ted's patch
>
> EXT4-fs (sda1): inode #1387643: mdb_free (1) < mdb_claim (2) BUG
>
> ------------[ cut here ]------------
> WARNING: at fs/ext4/inode.c:1067 ext4_get_blocks+0x3f0/0x440()
> Modules linked in:
> Call Trace:
>  [0000000000456bb0] warn_slowpath_common+0x50/0xa0
>  [0000000000456c1c] warn_slowpath_null+0x1c/0x40
>  [0000000000545010] ext4_get_blocks+0x3f0/0x440
>  [0000000000545420] mpage_da_map_blocks+0x80/0x800
>  [0000000000546260] mpage_add_bh_to_extent+0x40/0x100
>  [00000000005464cc] __mpage_da_writepage+0x1ac/0x220
>  [00000000004a957c] write_cache_pages+0x19c/0x380
>  [0000000000545e1c] ext4_da_writepages+0x27c/0x680
>  [00000000004a97ec] do_writepages+0x2c/0x60
>  [00000000004f952c] writeback_single_inode+0xcc/0x3c0
>  [00000000004fa438] writeback_inodes_wb+0x338/0x500
>  [00000000004fa748] wb_writeback+0x148/0x220
>  [00000000004fab60] wb_do_writeback+0x240/0x260
>  [00000000004fabec] bdi_writeback_task+0x6c/0xc0
>  [00000000004b6fb0] bdi_start_fn+0x70/0xe0
>  [000000000047036c] kthread+0x6c/0x80
> ---[ end trace 46a56c443941c84d ]---
>
> >>
> > It is sad, but i still can not reproduce your bug.

It happens to me as soon as a moderate load is put on an ext3 fs mounted
with the ext4 driver.

> > At this time i've tested following configurations:
> > system : 2.6.33-rc2, x86 two cores cpu with 2GB of ram
> > block dev: real sata drive, loopdev over tmpfs
> > mkfs : 4k and 1k blocksize
> > mount : w/o quota, quota, journaled quota
> > quota : both ON and OFF states
> > fs-load : - fsstress with 1,4,16,32 concurrent tasks
> >           - kernel compilation -j4, -j32
> >           - In fact currently my mail-dir is under quota control.
> > Please clarify your use-case: > > 0) Your system speciffication: cpu_num, mem_size, page_size(i guess 8k) > > block device. > UltraSparc IIe, UP, 2Gb, 8kb, real SCSI disk (sym53c8xx driver) > > 1) mkfs options > I do not remember. > Perhaps dumpe2fs can help > > root@v120 ~ # dumpe2fs -h /dev/sda1 > dumpe2fs 1.41.9 (22-Aug-2009) > Filesystem volume name: <none> > Last mounted on: / > Filesystem UUID: b34f302e-78a3-4f80-bae6-31639456216c > Filesystem magic number: 0xEF53 > Filesystem revision #: 1 (dynamic) > Filesystem features: has_journal ext_attr resize_inode dir_index > filetype needs_recovery sparse_super large_file > Filesystem flags: signed_directory_hash > Default mount options: (none) > Filesystem state: clean > Errors behavior: Continue > Filesystem OS type: Linux > Inode count: 2113536 > Block count: 8448000 > Reserved block count: 422400 > Free blocks: 6661110 > Free inodes: 1861302 > First block: 0 > Block size: 4096 > Fragment size: 4096 > Reserved GDT blocks: 1021 > Blocks per group: 32768 > Fragments per group: 32768 > Inodes per group: 8192 > Inode blocks per group: 512 > Filesystem created: Tue Nov 10 00:44:17 2009 > Last mount time: Sun Dec 27 20:05:48 2009 > Last write time: Sat Dec 26 10:59:00 2009 > Mount count: 3 > Maximum mount count: 21 > Last checked: Sat Dec 26 06:07:50 2009 > Check interval: 15552000 (6 months) > Next check after: Thu Jun 24 07:07:50 2010 > Lifetime writes: 30 GB > Reserved blocks uid: 0 (user root) > Reserved blocks gid: 0 (group root) > First inode: 11 > Inode size: 256 > Required extra isize: 28 > Desired extra isize: 28 > Journal inode: 8 > Default directory hash: half_md4 > Directory Hash Seed: ae1ec2f1-0f86-4f26-ace5-eb656fd25709 > Journal backup: inode blocks > Journal size: 128M > > > > 2) mount options > noatime > > 3) quota options (if any) > No > > 4) your fs load test-case > Have not tried to find a simpler testcase yet. 
> make CROSS_COMPILE="ccache sparc64-unknown-linux-gnu-" -j4 zImage modules > > Hm, perhaps ccache is the real trigger of the problem. > > > 5) How long does it takes you to reproduce the bug. > Few seconds (~5) -- Dmitry ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc)
  2009-12-27 20:32 ` Alexander Beregalov
  2009-12-27 21:38 ` Dmitry Torokhov
@ 2009-12-27 22:52 ` tytso
  2009-12-27 23:02 ` Alexander Beregalov
  1 sibling, 1 reply; 16+ messages in thread
From: tytso @ 2009-12-27 22:52 UTC (permalink / raw)
  To: Alexander Beregalov
  Cc: Dmitry Monakhov, Linux Kernel Mailing List, linux-ext4, Jens Axboe, dmitry.torokhov

On Sun, Dec 27, 2009 at 11:32:25PM +0300, Alexander Beregalov wrote:
> >> I will try to crosscompile the kernel with Ted's patch on another host.
>
> Here is output of 2.6.33-rc2 plus Ted's patch
>
> EXT4-fs (sda1): inode #1387643: mdb_free (1) < mdb_claim (2) BUG

OK, can you give me the output of:

debugfs /dev/sda1
debugfs: stat <1387643>
debugfs: ncheck 1387643
debugfs: quit

Thanks!!

						- Ted
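
[Editorial aside, not part of the original message.] The two debugfs requests above can also be issued non-interactively. What follows is only a minimal sketch, assuming the debugfs binary from e2fsprogs is installed and that /dev/sda1 is the affected device, as in this thread. The helper names debugfs_argv and run_requests are made up for illustration. The only debugfs feature relied on is its -R option, which executes a single request and exits.

```python
import subprocess


def debugfs_argv(device, request):
    """Build the argv for one debugfs request (hypothetical helper name)."""
    # -R runs a single request and exits. Without -w, debugfs opens the
    # filesystem read-only, so nothing on the device is modified.
    return ["debugfs", "-R", request, device]


def run_requests(device, inode):
    """Run 'stat <inode>' and 'ncheck inode', returning the combined output."""
    outputs = []
    for request in (f"stat <{inode}>", f"ncheck {inode}"):
        result = subprocess.run(
            debugfs_argv(device, request),
            capture_output=True,
            text=True,
            check=True,
        )
        outputs.append(result.stdout)
    return "\n".join(outputs)


# Example call (needs root and a real block device, so it is left commented out):
# print(run_requests("/dev/sda1", 1387643))
```

Building the argument list separately from running it keeps the command easy to inspect before anything touches the block device.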
* Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc)
  2009-12-27 22:52 ` tytso
@ 2009-12-27 23:02 ` Alexander Beregalov
  2009-12-28 3:51 ` tytso
  0 siblings, 1 reply; 16+ messages in thread
From: Alexander Beregalov @ 2009-12-27 23:02 UTC (permalink / raw)
  To: tytso, Alexander Beregalov, Dmitry Monakhov, Linux Kernel Mailing List, linux-ext4, Jens Axboe, dmitry.torokhov

2009/12/28 <tytso@mit.edu>:
> On Sun, Dec 27, 2009 at 11:32:25PM +0300, Alexander Beregalov wrote:
>> >> I will try to crosscompile the kernel with Ted's patch on another host.
>>
>> Here is output of 2.6.33-rc2 plus Ted's patch
>>
>> EXT4-fs (sda1): inode #1387643: mdb_free (1) < mdb_claim (2) BUG
>
> OK, can you give me the output of:
>
> debugfs /dev/sda1
> debugfs: stat <1387643>
> debugfs: ncheck 1387643
> debugfs: quit

Cleaning of CCache does not help.

debugfs 1.41.9 (22-Aug-2009)
debugfs: stat <1387643>
Inode: 1387643   Type: regular   Mode: 0644   Flags: 0x0
Generation: 2004186252   Version: 0x00000000:00000001
User: 1000   Group: 1003   Size: 11028803
File ACL: 0   Directory ACL: 0
Links: 1   Blockcount: 21576
Fragment:  Address: 0   Number: 0   Size: 0
 ctime: 0x4b37d65c:04b0870a -- Mon Dec 28 00:49:16 2009
 atime: 0x4b37d65a:2803e3e5 -- Mon Dec 28 00:49:14 2009
 mtime: 0x4b37d65c:04b0870a -- Mon Dec 28 00:49:16 2009
crtime: 0x5ad6374b:2803e3e5 -- Tue Apr 17 22:04:59 2018
Size of extra inode fields: 28
BLOCKS:
(0-11):172032-172043, (IND):165412, (12-1035):172044-173067,
(DIND):165380, (IND):165381, (1036-2047):173068-174079,
(2048-2059):188416-188427, (IND):165414, (2060-2692):188428-189060
TOTAL: 2697

debugfs: ncheck 1387643
Inode	Pathname
1387643	/home/alexb/linux-2.6/kernel/built-in.o

~/linux-2.6 $ rm kernel/built-in.o
~/linux-2.6 $ sync
~/linux-2.6 $ kmake
  CHK     include/linux/version.h
  CHK     include/generated/utsrelease.h
  CALL    scripts/checksyscalls.sh
  CHK     include/generated/compile.h
  LD      kernel/built-in.o
  LD [M]  fs/btrfs/btrfs.o
fs/btrfs/relocation.o: file not recognized: File truncated
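
[Editorial aside, not part of the original message.] The block figures in the debugfs output above can be cross-checked with a little arithmetic. This is a small sketch, assuming the 4096-byte block size reported by dumpe2fs earlier in the thread, and assuming that the Blockcount field is expressed in 512-byte units; the variable names are only illustrative.

```python
import math

BLOCK_SIZE = 4096      # "Block size: 4096" from the dumpe2fs output
SECTOR_SIZE = 512      # assumption: "Blockcount" is counted in 512-byte units

size_bytes = 11028803  # "Size:" from debugfs stat
blockcount = 21576     # "Blockcount:" from debugfs stat
total_blocks = 2697    # "TOTAL:" from debugfs stat

# Data blocks needed to hold the file contents (logical blocks 0..2692).
data_blocks = math.ceil(size_bytes / BLOCK_SIZE)

# Indirect blocks listed in the BLOCKS map: (IND), (DIND), (IND), (IND).
indirect_blocks = 4

assert data_blocks == 2693
assert data_blocks + indirect_blocks == total_blocks
assert total_blocks * (BLOCK_SIZE // SECTOR_SIZE) == blockcount

print(data_blocks, indirect_blocks, total_blocks)  # prints: 2693 4 2697
```

Under those assumptions the on-disk inode is self-consistent, which is in line with the earlier report that fsck did not find any problem.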
* Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc)
  2009-12-27 23:02 ` Alexander Beregalov
@ 2009-12-28 3:51 ` tytso
  2009-12-30 5:37 ` tytso
  0 siblings, 1 reply; 16+ messages in thread
From: tytso @ 2009-12-28 3:51 UTC (permalink / raw)
  To: Alexander Beregalov
  Cc: Dmitry Monakhov, Linux Kernel Mailing List, linux-ext4, Jens Axboe, dmitry.torokhov, Jan Kara

kernel BUG at fs/ext4/inode.c:1063!

> >> Here is output of 2.6.33-rc2 plus Ted's patch
> >>
> >> EXT4-fs (sda1): inode #1387643: mdb_free (1) < mdb_claim (2) BUG
> >

OK, I've been able to reproduce the problem using xfsqa test #74
(fstest) when an ext3 file system is mounted with the ext4 file system
driver. I was then able to bisect it down to commit d21cd8f6, which
was introduced between 2.6.33-rc1 and 2.6.33-rc2, as part of the
quota/ext4 patch series pushed by Jan.

I then tested v2.6.33-rc2 with commit d21cd8 reverted, and I was not
able to replicate the BUG.

More investigation is needed, but weighing the potential quota
deadlock against the apparently fairly-easy-to-replicate BUG, if we
can't find a better fix fairly quickly we should probably just revert
this commit for now.

						- Ted

commit d21cd8f163ac44b15c465aab7306db931c606908
Author:     Dmitry Monakhov <dmonakhov@openvz.org>
AuthorDate: Thu Dec 10 03:31:45 2009 +0000
Commit:     Jan Kara <jack@suse.cz>
CommitDate: Wed Dec 23 13:44:12 2009 +0100

    ext4: Fix potential quota deadlock

    We have to delay vfs_dq_claim_space() until allocation context destruction.
    Currently we have following call-trace:
    ext4_mb_new_blocks()
      /* task is already holding ac->alloc_semp */
     ->ext4_mb_mark_diskspace_used
        ->vfs_dq_claim_space() /* acquire dqptr_sem here.
Possible deadlock */ ->ext4_mb_release_context() /* drop ac->alloc_semp here */ Let's move quota claiming to ext4_da_update_reserve_space() ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.32-rc7 #18 ------------------------------------------------------- write-truncate-/3465 is trying to acquire lock: (&s->s_dquot.dqptr_sem){++++..}, at: [<c025e73b>] dquot_claim_space+0x3b/0x1b0 but task is already holding lock: (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&meta_group_info[i]->alloc_sem){++++..}: [<c017d04b>] __lock_acquire+0xd7b/0x1260 [<c017d5ea>] lock_acquire+0xba/0xd0 [<c0527191>] down_read+0x51/0x90 [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370 [<c02d0c1c>] ext4_mb_free_blocks+0x46c/0x870 [<c029c9d3>] ext4_free_blocks+0x73/0x130 [<c02c8cfc>] ext4_ext_truncate+0x76c/0x8d0 [<c02a8087>] ext4_truncate+0x187/0x5e0 [<c01e0f7b>] vmtruncate+0x6b/0x70 [<c022ec02>] inode_setattr+0x62/0x190 [<c02a2d7a>] ext4_setattr+0x25a/0x370 [<c022ee81>] notify_change+0x151/0x340 [<c021349d>] do_truncate+0x6d/0xa0 [<c0221034>] may_open+0x1d4/0x200 [<c022412b>] do_filp_open+0x1eb/0x910 [<c021244d>] do_sys_open+0x6d/0x140 [<c021258e>] sys_open+0x2e/0x40 [<c0103100>] sysenter_do_call+0x12/0x32 -> #2 (&ei->i_data_sem){++++..}: [<c017d04b>] __lock_acquire+0xd7b/0x1260 [<c017d5ea>] lock_acquire+0xba/0xd0 [<c0527191>] down_read+0x51/0x90 [<c02a5787>] ext4_get_blocks+0x47/0x450 [<c02a74c1>] ext4_getblk+0x61/0x1d0 [<c02a7a7f>] ext4_bread+0x1f/0xa0 [<c02bcddc>] ext4_quota_write+0x12c/0x310 [<c0262d23>] qtree_write_dquot+0x93/0x120 [<c0261708>] v2_write_dquot+0x28/0x30 [<c025d3fb>] dquot_commit+0xab/0xf0 [<c02be977>] ext4_write_dquot+0x77/0x90 [<c02be9bf>] ext4_mark_dquot_dirty+0x2f/0x50 [<c025e321>] dquot_alloc_inode+0x101/0x180 [<c029fec2>] ext4_new_inode+0x602/0xf00 [<c02ad789>] 
ext4_create+0x89/0x150 [<c0221ff2>] vfs_create+0xa2/0xc0 [<c02246e7>] do_filp_open+0x7a7/0x910 [<c021244d>] do_sys_open+0x6d/0x140 [<c021258e>] sys_open+0x2e/0x40 [<c0103100>] sysenter_do_call+0x12/0x32 -> #1 (&sb->s_type->i_mutex_key#7/4){+.+...}: [<c017d04b>] __lock_acquire+0xd7b/0x1260 [<c017d5ea>] lock_acquire+0xba/0xd0 [<c0526505>] mutex_lock_nested+0x65/0x2d0 [<c0260c9d>] vfs_load_quota_inode+0x4bd/0x5a0 [<c02610af>] vfs_quota_on_path+0x5f/0x70 [<c02bc812>] ext4_quota_on+0x112/0x190 [<c026345a>] sys_quotactl+0x44a/0x8a0 [<c0103100>] sysenter_do_call+0x12/0x32 -> #0 (&s->s_dquot.dqptr_sem){++++..}: [<c017d361>] __lock_acquire+0x1091/0x1260 [<c017d5ea>] lock_acquire+0xba/0xd0 [<c0527191>] down_read+0x51/0x90 [<c025e73b>] dquot_claim_space+0x3b/0x1b0 [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380 [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530 [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0 [<c02a5966>] ext4_get_blocks+0x226/0x450 [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0 [<c02a6ed6>] ext4_da_writepages+0x506/0x790 [<c01de272>] do_writepages+0x22/0x50 [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80 [<c01d7b9b>] filemap_flush+0x2b/0x30 [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60 [<c029e595>] ext4_release_file+0x75/0xb0 [<c0216b59>] __fput+0xf9/0x210 [<c0216c97>] fput+0x27/0x30 [<c02122dc>] filp_close+0x4c/0x80 [<c014510e>] put_files_struct+0x6e/0xd0 [<c01451b7>] exit_files+0x47/0x60 [<c0146a24>] do_exit+0x144/0x710 [<c0147028>] do_group_exit+0x38/0xa0 [<c0159abc>] get_signal_to_deliver+0x2ac/0x410 [<c0102849>] do_notify_resume+0xb9/0x890 [<c01032d2>] work_notifysig+0x13/0x21 other info that might help us debug this: 3 locks held by write-truncate-/3465: #0: (jbd2_handle){+.+...}, at: [<c02e1f8f>] start_this_handle+0x38f/0x5c0 #1: (&ei->i_data_sem){++++..}, at: [<c02a57f6>] ext4_get_blocks+0xb6/0x450 #2: (&meta_group_info[i]->alloc_sem){++++..}, at: [<c02ce962>] ext4_mb_load_buddy+0xb2/0x370 stack backtrace: Pid: 3465, comm: write-truncate- Not tainted 
2.6.32-rc7 #18 Call Trace: [<c0524cb3>] ? printk+0x1d/0x22 [<c017ac9a>] print_circular_bug+0xca/0xd0 [<c017d361>] __lock_acquire+0x1091/0x1260 [<c016bca2>] ? sched_clock_local+0xd2/0x170 [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0 [<c017d5ea>] lock_acquire+0xba/0xd0 [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0 [<c0527191>] down_read+0x51/0x90 [<c025e73b>] ? dquot_claim_space+0x3b/0x1b0 [<c025e73b>] dquot_claim_space+0x3b/0x1b0 [<c02cb95f>] ext4_mb_mark_diskspace_used+0x36f/0x380 [<c02d210a>] ext4_mb_new_blocks+0x34a/0x530 [<c02c601d>] ? ext4_ext_find_extent+0x25d/0x280 [<c02c83fb>] ext4_ext_get_blocks+0x122b/0x13c0 [<c016bca2>] ? sched_clock_local+0xd2/0x170 [<c016be60>] ? sched_clock_cpu+0x120/0x160 [<c016beef>] ? cpu_clock+0x4f/0x60 [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0 [<c052712c>] ? down_write+0x8c/0xa0 [<c02a5966>] ext4_get_blocks+0x226/0x450 [<c016be60>] ? sched_clock_cpu+0x120/0x160 [<c016beef>] ? cpu_clock+0x4f/0x60 [<c017908b>] ? trace_hardirqs_off+0xb/0x10 [<c02a5ff3>] mpage_da_map_blocks+0xc3/0xaa0 [<c01d69cc>] ? find_get_pages_tag+0x16c/0x180 [<c01d6860>] ? find_get_pages_tag+0x0/0x180 [<c02a73bd>] ? __mpage_da_writepage+0x16d/0x1a0 [<c01dfc4e>] ? pagevec_lookup_tag+0x2e/0x40 [<c01ddf1b>] ? write_cache_pages+0xdb/0x3d0 [<c02a7250>] ? __mpage_da_writepage+0x0/0x1a0 [<c02a6ed6>] ext4_da_writepages+0x506/0x790 [<c016beef>] ? cpu_clock+0x4f/0x60 [<c016bca2>] ? sched_clock_local+0xd2/0x170 [<c016be60>] ? sched_clock_cpu+0x120/0x160 [<c016be60>] ? sched_clock_cpu+0x120/0x160 [<c02a69d0>] ? ext4_da_writepages+0x0/0x790 [<c01de272>] do_writepages+0x22/0x50 [<c01d766d>] __filemap_fdatawrite_range+0x6d/0x80 [<c01d7b9b>] filemap_flush+0x2b/0x30 [<c02a40ac>] ext4_alloc_da_blocks+0x5c/0x60 [<c029e595>] ext4_release_file+0x75/0xb0 [<c0216b59>] __fput+0xf9/0x210 [<c0216c97>] fput+0x27/0x30 [<c02122dc>] filp_close+0x4c/0x80 [<c014510e>] put_files_struct+0x6e/0xd0 [<c01451b7>] exit_files+0x47/0x60 [<c0146a24>] do_exit+0x144/0x710 [<c017b163>] ? 
lock_release_holdtime+0x33/0x210 [<c0528137>] ? _spin_unlock_irq+0x27/0x30 [<c0147028>] do_group_exit+0x38/0xa0 [<c017babb>] ? trace_hardirqs_on+0xb/0x10 [<c0159abc>] get_signal_to_deliver+0x2ac/0x410 [<c0102849>] do_notify_resume+0xb9/0x890 [<c0178fd0>] ? trace_hardirqs_off_caller+0x20/0xd0 [<c017b163>] ? lock_release_holdtime+0x33/0x210 [<c0165b50>] ? autoremove_wake_function+0x0/0x50 [<c017ba54>] ? trace_hardirqs_on_caller+0x134/0x190 [<c017babb>] ? trace_hardirqs_on+0xb/0x10 [<c0300ba4>] ? security_file_permission+0x14/0x20 [<c0215761>] ? vfs_write+0x131/0x190 [<c0214f50>] ? do_sync_write+0x0/0x120 [<c0103115>] ? sysenter_do_call+0x27/0x32 [<c01032d2>] work_notifysig+0x13/0x21 CC: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc)
  2009-12-28  3:51 ` tytso
@ 2009-12-30  5:37 ` tytso
  2009-12-30 13:18 ` Dmitry Monakhov
  0 siblings, 1 reply; 16+ messages in thread
From: tytso @ 2009-12-30  5:37 UTC
To: Alexander Beregalov, Dmitry Monakhov, Linux Kernel Mailing List, linux-ext4, Jens Axboe, dmitry.torokhov, Jan Kara, aanisimov, pl4nkton

On Sun, Dec 27, 2009 at 10:51:59PM -0500, tytso@MIT.EDU wrote:
> OK, i've been able to reproduce the problem using xfsqa test #74
> (fstest) when an ext3 file system is mounted the ext4 file system
> driver.  I was then able to bisect it down to commit d21cd8f6, which
> was introduced between 2.6.33-rc1 and 2.6.33-rc2, as part of
> quota/ext4 patch series pushed by Jan.

OK, here's a patch which I think should avoid the BUG in
fs/ext4/inode.c.  It should fix the regression, but in the long run we
need to seriously rethink how we account for the potential need for
new metadata blocks when doing delayed allocation.

The remaining problem with this machinery is that
ext4_da_update_reserve_space() and ext4_da_release_space() both try to
calculate how many metadata blocks will potentially be required by
calling ext4_calc_metadata_amount() on the number of delayed
allocation blocks found in i_reserved_data_blocks.  However,
ext4_calc_metadata_amount() assumes that the blocks passed to it are
contiguous, and what remains to be written in the page cache could be
anything but contiguous.  This problem has always been there, so it's
not a regression per se; just a design flaw.

The patch below should fix the regression caused by commit d21cd8f,
but we need to look much more closely to find a better way of
accounting for the potential need for metadata for inodes facing
delayed allocation.  Could people who are having problems with the BUG
at line 1063 of fs/ext4/inode.c try this patch?

Thanks!!
						- Ted

commit 48b71e562ecd35ab12f6b6420a92fb3c9145da92
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Wed Dec 30 00:04:04 2009 -0500

    ext4: Patch up how we claim metadata blocks for quota purposes

    Commit d21cd8f triggered a BUG in the function
    ext4_da_update_reserve_space() found in fs/ext4/inode.c, which was
    caused by fact that ext4_calc_metadata_amount() can over-estimate how
    many metadata blocks will be needed, especially when using direct
    block-mapped files.  Work around this by not claiming any excess
    metadata blocks than we are prepared to claim at this point.

    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3e3b454..d6e84b4 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1058,14 +1058,23 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used)
 	mdb_free = EXT4_I(inode)->i_reserved_meta_blocks - mdb;
 
 	if (mdb_free) {
-		/* Account for allocated meta_blocks */
+		/*
+		 * Account for allocated meta_blocks; it is possible
+		 * for us to have allocated more meta blocks than we
+		 * are prepared to free at this point.  This is
+		 * because ext4_calc_metadata_amount can over-estimate
+		 * how many blocks are still needed.  So we may not be
+		 * able to claim all of the allocated meta blocks
+		 * right away.  The accounting will work out in the end...
+		 */
 		mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
-		BUG_ON(mdb_free < mdb_claim);
+		if (mdb_free < mdb_claim)
+			mdb_claim = mdb_free;
 		mdb_free -= mdb_claim;
 
 		/* update fs dirty blocks counter */
 		percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
-		EXT4_I(inode)->i_allocated_meta_blocks = 0;
+		EXT4_I(inode)->i_allocated_meta_blocks -= mdb_claim;
 		EXT4_I(inode)->i_reserved_meta_blocks = mdb;
 	}
 
@@ -1845,7 +1854,7 @@ repeat:
 static void ext4_da_release_space(struct inode *inode, int to_free)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
-	int total, mdb, mdb_free, release;
+	int total, mdb, mdb_free, mdb_claim, release;
 
 	if (!to_free)
 		return;		/* Nothing to release, exit */
@@ -1874,6 +1883,16 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
 	BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks);
 	mdb_free = EXT4_I(inode)->i_reserved_meta_blocks - mdb;
 
+	if (mdb_free) {
+		/* Account for allocated meta_blocks */
+		mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
+		if (mdb_free < mdb_claim)
+			mdb_claim = mdb_free;
+		mdb_free -= mdb_claim;
+
+		EXT4_I(inode)->i_allocated_meta_blocks -= mdb_claim;
+	}
+
 	release = to_free + mdb_free;
 
 	/* update fs dirty blocks counter for truncate case */
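[Illustrative note, not part of the posted patch.] The clamp that replaces the BUG_ON() in the first hunk above can be tried out in user space. What follows is only a minimal sketch: struct md_acct and claim_meta_blocks() are hypothetical names invented here, the two fields merely stand in for i_reserved_meta_blocks and i_allocated_meta_blocks, and locking, the dirty-blocks counter and the quota calls are left out entirely.

```c
#include <assert.h>

/* Hypothetical user-space stand-in for the per-inode counters. */
struct md_acct {
	int reserved_meta;	/* models i_reserved_meta_blocks */
	int allocated_meta;	/* models i_allocated_meta_blocks */
};

/*
 * Models the patched branch of ext4_da_update_reserve_space().
 * still_needed plays the role of "mdb" in the patch.  Returns the
 * number of metadata blocks claimed; *to_release receives what is
 * left of mdb_free after the claim.
 */
static int claim_meta_blocks(struct md_acct *a, int still_needed,
			     int *to_release)
{
	int mdb_free, mdb_claim = 0;

	/* Mirrors BUG_ON(mdb > i_reserved_meta_blocks) in the original. */
	assert(still_needed <= a->reserved_meta);
	mdb_free = a->reserved_meta - still_needed;

	if (mdb_free) {
		mdb_claim = a->allocated_meta;
		if (mdb_free < mdb_claim)	/* was: BUG_ON(mdb_free < mdb_claim) */
			mdb_claim = mdb_free;
		mdb_free -= mdb_claim;

		/* Keep the unclaimed excess for a later call. */
		a->allocated_meta -= mdb_claim;
		a->reserved_meta = still_needed;
	}
	*to_release = mdb_free;
	return mdb_claim;
}
```

With mdb_free = 1 and two allocated metadata blocks (the "mdb_free (1) < mdb_claim (2)" case from the debugging output earlier in the thread), the old code hit the BUG_ON(); in this model one block is claimed and the other stays in allocated_meta. The starting values reserved_meta = 3 and still_needed = 2 used to get there are made up for the illustration.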
* Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc) 2009-12-30 5:37 ` tytso @ 2009-12-30 13:18 ` Dmitry Monakhov 2009-12-30 17:45 ` tytso 2009-12-30 17:48 ` tytso 0 siblings, 2 replies; 16+ messages in thread From: Dmitry Monakhov @ 2009-12-30 13:18 UTC (permalink / raw) To: tytso Cc: Alexander Beregalov, Linux Kernel Mailing List, linux-ext4, Jens Axboe, dmitry.torokhov, Jan Kara, aanisimov, pl4nkton tytso@mit.edu writes: > On Sun, Dec 27, 2009 at 10:51:59PM -0500, tytso@MIT.EDU wrote: >> OK, i've been able to reproduce the problem using xfsqa test #74 >> (fstest) when an ext3 file system is mounted the ext4 file system >> driver. I was then able to bisect it down to commit d21cd8f6, which >> was introduced between 2.6.33-rc1 and 2.6.33-rc2, as part of >> quota/ext4 patch series pushed by Jan. > > OK, here's a patch which I think should avoid the BUG in > fs/ext4/inode.c. It should fix the regression, but in the long run we > need to pretty seriously rethink how we account for the need for > potentially new meta-data blocks when doing delayed allocation. > > The remaining problem with this machinery is that > ext4_da_update_reserve_space() and ext4_da_release_space() is that > they both try to calculate how many metadata blocks will potentially > required by calculating ext4_calc_metadata_amount() based on the > number of delayed allocation blocks found in i_reserved_data_blocks. > The problem is that ext4_calc_metadata_amount() assumes that the > number of blocks passed to it is contiguous, and what might be left > remaining to be written in the page cache could be anything but > contiguous. This is a problem which has always been there, so it's > not a regression per se; just a design flaw. Hello, I've finally able to reproduce the issue. I'm agree with your diagnose. But while looking in to code i've found some questions see late in the message. 
> > The patch below should fixes the regression caused by commit d21cd8f, > but we need to look much more closely to find a better way of > accounting for the potential need for metadata for inodes facing > delayed allocation. Could people who are having problems with the BUG > in line 1063 of fs/ext4/inode.c try this patch? > > Thanks!! > > - Ted > > > commit 48b71e562ecd35ab12f6b6420a92fb3c9145da92 > Author: Theodore Ts'o <tytso@mit.edu> > Date: Wed Dec 30 00:04:04 2009 -0500 > > ext4: Patch up how we claim metadata blocks for quota purposes > > Commit d21cd8f triggered a BUG in the function > ext4_da_update_reserve_space() found in fs/ext4/inode.c, which was > caused by fact that ext4_calc_metadata_amount() can over-estimate how > many metadata blocks will be needed, especially when using direct > block-mapped files. Work around this by not claiming any excess > metadata blocks than we are prepared to claim at this point. > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 3e3b454..d6e84b4 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -1058,14 +1058,23 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used) > mdb_free = EXT4_I(inode)->i_reserved_meta_blocks - mdb; > > if (mdb_free) { > - /* Account for allocated meta_blocks */ > + /* > + * Account for allocated meta_blocks; it is possible > + * for us to have allocated more meta blocks than we > + * are prepared to free at this point. This is > + * because ext4_calc_metadata_amount can over-estimate > + * how many blocks are still needed. So we may not be > + * able to claim all of the allocated meta blocks > + * right away. The accounting will work out in the end... 
> +		 */
>  		mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
> -		BUG_ON(mdb_free < mdb_claim);
> +		if (mdb_free < mdb_claim)
> +			mdb_claim = mdb_free;
>  		mdb_free -= mdb_claim;
>
>  		/* update fs dirty blocks counter */
>  		percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
> -		EXT4_I(inode)->i_allocated_meta_blocks = 0;
> +		EXT4_I(inode)->i_allocated_meta_blocks -= mdb_claim;
>  		EXT4_I(inode)->i_reserved_meta_blocks = mdb;
>  	}
>
> @@ -1845,7 +1854,7 @@ repeat:
>  static void ext4_da_release_space(struct inode *inode, int to_free)
>  {
>  	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> -	int total, mdb, mdb_free, release;
> +	int total, mdb, mdb_free, mdb_claim, release;
>
>  	if (!to_free)
>  		return;		/* Nothing to release, exit */
> @@ -1874,6 +1883,16 @@ static void ext4_da_release_space(struct inode *inode, int to_free)
>  	BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks);
>  	mdb_free = EXT4_I(inode)->i_reserved_meta_blocks - mdb;
>
> +	if (mdb_free) {
> +		/* Account for allocated meta_blocks */
> +		mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
> +		if (mdb_free < mdb_claim)
> +			mdb_claim = mdb_free;
> +		mdb_free -= mdb_claim;
> +
> +		EXT4_I(inode)->i_allocated_meta_blocks -= mdb_claim;
> +	}
> +

It seems this is not enough.  Imagine the following call trace:

userspace pwrite(fd, d, 1000, off)
  ->ext4_da_reserve_space(inode, 1000)
    ->dq_reserve_space(1000 + md_needed)
userspace ftruncate(fd, off) /* "off" is the same as in pwrite call */
  ->ext4_da_invalidatepage()
    ->ext4_da_page_release_reservation()
      ->ext4_da_release_space()
      <<< And we decrease ->i_allocated_meta_blocks only if (mdb_free > 0)
userspace close(fd)

So reserved metadata blocks will leak.  I'm able to reproduce it like this:

quotacheck -cu /mnt
quotaon /mnt
fsstress -p 16 -d /mnt -l999999999 -n99999999 &
sleep 180
killall -9 fsstress
sync; sync;
cp /mnt/aquota.user q1
quotaoff /mnt
quotacheck -cu /mnt/   # recalculate real quota usage.
cp /mnt/aquota.user q2
diff -up q1 q2
# in my case I found 1 block leaked.

IMHO we could drop i_allocated_meta_blocks in ext4_release_file().
But while looking into this function I found another question, about
locking:

static int ext4_release_file(struct inode *inode, struct file *filp)
{
	if (EXT4_I(inode)->i_state & EXT4_STATE_DA_ALLOC_CLOSE) {
		ext4_alloc_da_blocks(inode);
		EXT4_I(inode)->i_state &= ~EXT4_STATE_DA_ALLOC_CLOSE;
		<<< It seems the i_state modification must be protected by i_mutex,
		    but currently the caller doesn't have to hold it.
	.....
}

>  	release = to_free + mdb_free;
>
>  	/* update fs dirty blocks counter for truncate case */
* Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc)
  2009-12-30 13:18 ` Dmitry Monakhov
@ 2009-12-30 17:45 ` tytso
  2009-12-30 17:48 ` tytso
  1 sibling, 0 replies; 16+ messages in thread
From: tytso @ 2009-12-30 17:45 UTC
To: Dmitry Monakhov
Cc: Alexander Beregalov, Linux Kernel Mailing List, linux-ext4, Jens Axboe, dmitry.torokhov, Jan Kara, aanisimov, pl4nkton

On Wed, Dec 30, 2009 at 04:18:09PM +0300, Dmitry Monakhov wrote:
> Hello, I've finally able to reproduce the issue. I'm agree with your
> diagnose. But while looking in to code i've found some questions
> see late in the message.

The simplest way of reproducing the BUG() that I've found is:

mke2fs -t ext3 /dev/XXX
mount /dev/XXX /mnt
dd if=/dev/zero of=/mnt/big-file bs=1024k count=16
sync

Unfortunately, this is all too easy for users to stumble across, so we
need to fix this ASAP.  :-(

> Seems what this is not enough.
> Just imagine, we may have following call-trace:
>
> userspace pwrite(fd, d, 1000, off)
>   ->ext4_da_reserve_space(inode, 1000)
>     ->dq_reserve_space(1000 + md_needed)
> userspace ftruncate(fd, off) /* "off" is the same as in pwrite call */
>   ->ext4_da_invalidatepage()
>     ->ext4_da_page_release_reservation()
>       ->ext4_da_release_space()
>       <<< And we decrease ->i_allocated_meta_blocks only if (mdb_free > 0)
> userspace close(fd)

I don't think this is the problem.  After we do the truncate, we will
be calling ext4_da_release_space() with a value that should cause us to
call ext4_calc_metadata_amount() with 0, so it will return 0.  At that
point, we will have some i_reserved_meta_blocks to free.

The problem is that in ext4_da_release_space(), I forgot to call
vfs_dq_claim_block(mdb_claim).  That was probably the cause of the
leak.

In any case, here's a patch which also fixes the blatant
under-estimation of the number of metadata blocks that could be needed
if the process is writing random blocks into a sparse file.
Unfortunately, especially for non-extent mapped files, we *very* badly
over-estimate how many indirect blocks will be necessary: we assume
each data block requires 2-3 indirect blocks(!!!).

Guessing exactly how many metadata blocks will be necessary when doing
delayed allocation is painful, and I'm very tempted to simply change
the quota system to not include metadata blocks at all.  The only thing
stopping me from doing this is that we'd also need to make synchronized
changes to userspace programs like checkquota.

Care to give this a spin?

BTW, are you testing with lockdep enabled?  I'm reliably getting a
LOCKDEP complaint any time I use quotas, either normal quotas or
journalled quotas, and if I use normal quotas, I get a lot of complaints
from the JBD layer about dirty metadata buffers that aren't part of a
transaction belonging to the aquota.user file.  (What's up with that?  I
thought if you weren't using journalled quotas the file that should be
used is quota.user?)

In any case, it looks like the quota code in ext4 needs more attention,
and we may want to check whether any of these bugs are also turning up
in the ext3 code, or were introduced as part of the ext4 enhancements.
(Clearly the problems associated with quota and delalloc are ext4
specific.)

						- Ted

commit ef627929781c98113e6ae93f159dd3c12a884ad8
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Wed Dec 30 00:04:04 2009 -0500

    ext4: Patch up how we claim metadata blocks for quota purposes

    Commit d21cd8f triggered a BUG in the function
    ext4_da_update_reserve_space() found in fs/ext4/inode.c.  The root
    cause of this BUG() is caused by the fact that
    ext4_calc_metadata_amount() can severely over-estimate how many
    metadata blocks will be needed, especially when using direct
    block-mapped files.

    In addition, it can also badly *under* estimate how much space is
    needed, since ext4_calc_metadata_amount() assumes that the blocks are
    contiguous, and this is not always true.
If the application is writing blocks to a sparse file, the number of metadata blocks necessary can be severly underestimated by the functions ext4_da_reserve_space(), ext4_da_update_reserve_space() and ext4_da_release_space(). Unfortunately, doing this right means that we need to massively over-estimate the amount of free space needed. So in some cases we may need to force the inode to be written to disk asynchronously in the hope that we don't get spurious quota failures. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 3e3b454..84eeb8f 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1043,43 +1043,47 @@ static int ext4_calc_metadata_amount(struct inode *inode, int blocks) return ext4_indirect_calc_metadata_amount(inode, blocks); } +/* + * Called with i_data_sem down, which is important since we can call + * ext4_discard_preallocations() from here. + */ static void ext4_da_update_reserve_space(struct inode *inode, int used) { struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); - int total, mdb, mdb_free, mdb_claim = 0; - - spin_lock(&EXT4_I(inode)->i_block_reservation_lock); - /* recalculate the number of metablocks still need to be reserved */ - total = EXT4_I(inode)->i_reserved_data_blocks - used; - mdb = ext4_calc_metadata_amount(inode, total); - - /* figure out how many metablocks to release */ - BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks); - mdb_free = EXT4_I(inode)->i_reserved_meta_blocks - mdb; - - if (mdb_free) { - /* Account for allocated meta_blocks */ - mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks; - BUG_ON(mdb_free < mdb_claim); - mdb_free -= mdb_claim; - - /* update fs dirty blocks counter */ + struct ext4_inode_info *ei = EXT4_I(inode); + int mdb_free = 0; + + spin_lock(&ei->i_block_reservation_lock); + if (unlikely(used > ei->i_reserved_data_blocks)) { + ext4_msg(inode->i_sb, KERN_NOTICE, "%s: ino %lu, used %d " + "with only %d reserved data blocks\n", + __func__, inode->i_ino, 
used, + ei->i_reserved_data_blocks); + WARN_ON(1); + used = ei->i_reserved_data_blocks; + } + + /* Update per-inode reservations */ + ei->i_reserved_data_blocks -= used; + used += ei->i_allocated_meta_blocks; + ei->i_reserved_meta_blocks -= ei->i_allocated_meta_blocks; + ei->i_allocated_meta_blocks = 0; + percpu_counter_sub(&sbi->s_dirtyblocks_counter, used); + + if (ei->i_reserved_data_blocks == 0) { + /* + * We can release all of the reserved metadata blocks + * only when we have written all of the delayed + * allocation blocks. + */ + mdb_free = ei->i_allocated_meta_blocks; percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free); - EXT4_I(inode)->i_allocated_meta_blocks = 0; - EXT4_I(inode)->i_reserved_meta_blocks = mdb; + ei->i_allocated_meta_blocks = 0; } - - /* update per-inode reservations */ - BUG_ON(used > EXT4_I(inode)->i_reserved_data_blocks); - EXT4_I(inode)->i_reserved_data_blocks -= used; - percpu_counter_sub(&sbi->s_dirtyblocks_counter, used + mdb_claim); spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); - vfs_dq_claim_block(inode, used + mdb_claim); - - /* - * free those over-booking quota for metadata blocks - */ + /* Update quota subsystem */ + vfs_dq_claim_block(inode, used); if (mdb_free) vfs_dq_release_reservation_block(inode, mdb_free); @@ -1088,7 +1092,8 @@ static void ext4_da_update_reserve_space(struct inode *inode, int used) * there aren't any writers on the inode, we can discard the * inode's preallocations. 
*/ - if (!total && (atomic_read(&inode->i_writecount) == 0)) + if ((ei->i_reserved_data_blocks == 0) && + (atomic_read(&inode->i_writecount) == 0)) ext4_discard_preallocations(inode); } @@ -1801,7 +1806,8 @@ static int ext4_da_reserve_space(struct inode *inode, int nrblocks) { int retries = 0; struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); - unsigned long md_needed, mdblocks, total = 0; + struct ext4_inode_info *ei = EXT4_I(inode); + unsigned long md_needed, md_reserved, total = 0; /* * recalculate the amount of metadata blocks to reserve @@ -1809,35 +1815,44 @@ static int ext4_da_reserve_space(struct inode *inode, int nrblocks) * worse case is one extent per block */ repeat: - spin_lock(&EXT4_I(inode)->i_block_reservation_lock); - total = EXT4_I(inode)->i_reserved_data_blocks + nrblocks; - mdblocks = ext4_calc_metadata_amount(inode, total); - BUG_ON(mdblocks < EXT4_I(inode)->i_reserved_meta_blocks); - - md_needed = mdblocks - EXT4_I(inode)->i_reserved_meta_blocks; + spin_lock(&ei->i_block_reservation_lock); + md_reserved = ei->i_reserved_meta_blocks; + md_needed = ext4_calc_metadata_amount(inode, nrblocks); total = md_needed + nrblocks; - spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); + spin_unlock(&ei->i_block_reservation_lock); /* * Make quota reservation here to prevent quota overflow * later. Real quota accounting is done at pages writeout * time. */ - if (vfs_dq_reserve_block(inode, total)) + if (vfs_dq_reserve_block(inode, total)) { + /* + * We tend to badly over-estimate the amount of + * metadata blocks which are needed, so if we have + * reserved any metadata blocks, try to force out the + * inode and see if we have any better luck. 
+ */ + if (md_reserved && retries++ <= 3) + goto retry; return -EDQUOT; + } if (ext4_claim_free_blocks(sbi, total)) { vfs_dq_release_reservation_block(inode, total); if (ext4_should_retry_alloc(inode->i_sb, &retries)) { + retry: + if (md_reserved) + write_inode_now(inode, (retries == 3)); yield(); goto repeat; } return -ENOSPC; } - spin_lock(&EXT4_I(inode)->i_block_reservation_lock); - EXT4_I(inode)->i_reserved_data_blocks += nrblocks; - EXT4_I(inode)->i_reserved_meta_blocks += md_needed; - spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); + spin_lock(&ei->i_block_reservation_lock); + ei->i_reserved_data_blocks += nrblocks; + ei->i_reserved_meta_blocks += md_needed; + spin_unlock(&ei->i_block_reservation_lock); return 0; /* success */ } @@ -1845,49 +1860,45 @@ repeat: static void ext4_da_release_space(struct inode *inode, int to_free) { struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); - int total, mdb, mdb_free, release; + struct ext4_inode_info *ei = EXT4_I(inode); if (!to_free) return; /* Nothing to release, exit */ spin_lock(&EXT4_I(inode)->i_block_reservation_lock); - if (!EXT4_I(inode)->i_reserved_data_blocks) { + if (unlikely(to_free > ei->i_reserved_data_blocks)) { /* - * if there is no reserved blocks, but we try to free some - * then the counter is messed up somewhere. - * but since this function is called from invalidate - * page, it's harmless to return without any action + * if there aren't enough reserved blocks, then the + * counter is messed up somewhere. Since this + * function is called from invalidate page, it's + * harmless to return without any action. 
*/ - printk(KERN_INFO "ext4 delalloc try to release %d reserved " - "blocks for inode %lu, but there is no reserved " - "data blocks\n", to_free, inode->i_ino); - spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); - return; + ext4_msg(inode->i_sb, KERN_NOTICE, "ext4_da_release_space: " + "ino %lu, to_free %d with only %d reserved " + "data blocks\n", inode->i_ino, to_free, + ei->i_reserved_data_blocks); + WARN_ON(1); + to_free = ei->i_reserved_data_blocks; } + ei->i_reserved_data_blocks -= to_free; - /* recalculate the number of metablocks still need to be reserved */ - total = EXT4_I(inode)->i_reserved_data_blocks - to_free; - mdb = ext4_calc_metadata_amount(inode, total); - - /* figure out how many metablocks to release */ - BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks); - mdb_free = EXT4_I(inode)->i_reserved_meta_blocks - mdb; - - release = to_free + mdb_free; - - /* update fs dirty blocks counter for truncate case */ - percpu_counter_sub(&sbi->s_dirtyblocks_counter, release); + if (ei->i_reserved_data_blocks == 0) { + /* + * We can release all of the reserved metadata blocks + * only when we have written all of the delayed + * allocation blocks. + */ + to_free += ei->i_allocated_meta_blocks; + ei->i_allocated_meta_blocks = 0; + } - /* update per-inode reservations */ - BUG_ON(to_free > EXT4_I(inode)->i_reserved_data_blocks); - EXT4_I(inode)->i_reserved_data_blocks -= to_free; + /* update fs dirty blocks counter */ + percpu_counter_sub(&sbi->s_dirtyblocks_counter, to_free); - BUG_ON(mdb > EXT4_I(inode)->i_reserved_meta_blocks); - EXT4_I(inode)->i_reserved_meta_blocks = mdb; spin_unlock(&EXT4_I(inode)->i_block_reservation_lock); - vfs_dq_release_reservation_block(inode, release); + vfs_dq_release_reservation_block(inode, to_free); } static void ext4_da_page_release_reservation(struct page *page, ^ permalink raw reply related [flat|nested] 16+ messages in thread
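[Illustrative note, not part of the posted patch.] The point in the commit message above, that an estimate which assumes contiguous blocks can badly under-count for a sparse file, can be made concrete with a small user-space sketch. It does not model ext4_calc_metadata_amount() or any kernel function. It only counts the distinct mapping blocks in the classic indirect-block layout that non-extent files use. Everything named here is an assumption made for illustration: the geometry (12 direct slots, 1024 pointers per indirect block, as with 4 KiB blocks and 4-byte pointers) and the helpers path_of() and mapping_blocks().

```c
#include <assert.h>
#include <stddef.h>

/* Assumed geometry: 4 KiB blocks with 4-byte block pointers. */
#define NDIR	12UL	/* direct slots in the inode */
#define PTRS	1024UL	/* pointers per indirect block */

struct map_path {
	int depth;		/* mapping blocks between inode and data block */
	unsigned long id[3];	/* id[0]: tree root; id[1..]: interior block index */
};

/* Which mapping blocks does logical block lblk hang off? */
static struct map_path path_of(unsigned long lblk)
{
	struct map_path p = { 0, { 0, 0, 0 } };

	if (lblk < NDIR)
		return p;			/* direct: no mapping block */
	lblk -= NDIR;
	if (lblk < PTRS) {
		p.depth = 1;			/* single indirect block */
		return p;
	}
	lblk -= PTRS;
	if (lblk < PTRS * PTRS) {
		p.depth = 2;			/* double indirect tree */
		p.id[1] = lblk / PTRS;
		return p;
	}
	lblk -= PTRS * PTRS;
	p.depth = 3;				/* triple indirect tree */
	p.id[1] = lblk / (PTRS * PTRS);
	p.id[2] = lblk / PTRS;
	return p;
}

/*
 * Count the distinct mapping blocks needed for n data blocks at
 * start, start + stride, start + 2 * stride, ...  Because the blocks
 * are visited in increasing order, a mapping block that is not shared
 * with the previous data block has not been seen before.
 */
static unsigned long mapping_blocks(unsigned long start,
				    unsigned long stride, size_t n)
{
	struct map_path prev = { 0, { 0, 0, 0 } };
	unsigned long total = 0;
	size_t i;

	assert(stride >= 1);
	for (i = 0; i < n; i++) {
		struct map_path cur = path_of(start + i * stride);
		int shared = 0;

		if (cur.depth == prev.depth)
			while (shared < cur.depth &&
			       cur.id[shared] == prev.id[shared])
				shared++;
		total += (unsigned long)(cur.depth - shared);
		prev = cur;
	}
	return total;
}
```

Under these assumptions, mapping_blocks(0, 1, 16) counts a single indirect block for 16 contiguous blocks at the start of a file, while mapping_blocks(1036, 1024, 16) counts 17 for 16 blocks spaced 1024 apart in the double-indirect range. The number of data blocks is the same in both cases, which is why a reservation derived only from a block count cannot be right for both.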
* Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc)
  2009-12-30 13:18 ` Dmitry Monakhov
  2009-12-30 17:45 ` tytso
@ 2009-12-30 17:48 ` tytso
  1 sibling, 0 replies; 16+ messages in thread
From: tytso @ 2009-12-30 17:48 UTC
To: Dmitry Monakhov
Cc: Alexander Beregalov, Linux Kernel Mailing List, linux-ext4, Jens Axboe, dmitry.torokhov, Jan Kara, aanisimov, pl4nkton

On Wed, Dec 30, 2009 at 04:18:09PM +0300, Dmitry Monakhov wrote:
>
> IMHO we may drop i_allocated_meta_block in ext4_release_file()
> But while looking in to this function i've found another question
> about locking
> static int ext4_release_file(struct inode *inode, struct file *filp)
> {
>	if (EXT4_I(inode)->i_state & EXT4_STATE_DA_ALLOC_CLOSE) {
>		ext4_alloc_da_blocks(inode);
>		EXT4_I(inode)->i_state &= ~EXT4_STATE_DA_ALLOC_CLOSE;
> <<< Seems what i_state modification must being protected by i_mutex,
> but currently caller don't have to hold it.

(I'm answering this in a separate message since it really is a
separate question.)

Yeah, that looks like a problem --- and it exists in more than just
this one place.  Unfortunately, using i_mutex to protect updates to
i_state is a bit heavyweight.  What I'm thinking about doing is
converting all of the references to the i_state flags to use set_bit,
clear_bit, and test_bit, since this will allow us to safely and cleanly
set/clear/test individual bits.

A quick audit of ext3 seems to show this is potentially a problem with
ext3 as well (specifically, in fs/ext3/xattr.c's use of
EXT3_STATE_XATTR).

						- Ted
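[Illustrative note, not kernel code.] The reason per-bit atomic operations help here is that "i_state &= ~FLAG" is a plain read-modify-write, so two unlocked updaters touching different flags can overwrite each other's change. A minimal user-space analogue using C11 atomics is sketched below. The helper names state_set_bit(), state_clear_bit() and state_test_bit() and the bit number are hypothetical; they only imitate the shape of the kernel's set_bit/clear_bit/test_bit and are not the kernel API.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number, standing in for a flag such as
 * EXT4_STATE_DA_ALLOC_CLOSE. */
enum { STATE_BIT_DA_ALLOC_CLOSE = 2 };

/* Atomically set one bit; other bits are left untouched even if
 * another thread updates them at the same time. */
static void state_set_bit(atomic_uint *state, unsigned int bit)
{
	assert(bit < sizeof(unsigned int) * CHAR_BIT);
	atomic_fetch_or(state, 1u << bit);
}

/* Atomically clear one bit. */
static void state_clear_bit(atomic_uint *state, unsigned int bit)
{
	assert(bit < sizeof(unsigned int) * CHAR_BIT);
	atomic_fetch_and(state, ~(1u << bit));
}

/* Read one bit. */
static bool state_test_bit(atomic_uint *state, unsigned int bit)
{
	assert(bit < sizeof(unsigned int) * CHAR_BIT);
	return (atomic_load(state) & (1u << bit)) != 0;
}
```

Each helper is a single atomic operation on the whole word, so no mutex is needed just to flip a flag. The trade-off is that a test followed by a separate clear is still two steps; code that must act on "was it set?" would need a combined test-and-clear instead.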
* Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc)
  2009-12-24 22:28 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc) Alexander Beregalov
  2009-12-24 22:49 ` Alexander Beregalov
@ 2009-12-24 23:05 ` tytso
  2009-12-24 23:15 ` tytso
  1 sibling, 1 reply; 16+ messages in thread
From: tytso @ 2009-12-24 23:05 UTC
To: Alexander Beregalov; +Cc: Linux Kernel Mailing List, linux-ext4, Jens Axboe

On Fri, Dec 25, 2009 at 01:28:34AM +0300, Alexander Beregalov wrote:
>
> Kernel is 2.6.33-rc1-00366-g2f99f5c
> Ext4 mounts ext3 filesystem
>
> kernel BUG at fs/ext4/inode.c:1063!

OK, that's this BUG which is triggering:

	if (mdb_free) {
		/* Account for allocated meta_blocks */
		mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
		BUG_ON(mdb_free < mdb_claim);    <------- BUG triggered
		mdb_free -= mdb_claim;

Can you replicate this?  If so, I'd like to ask you to replicate with
the following debugging patch applied:

--- fs/ext4/inode.c	2009-12-24 17:55:03.736366001 -0500
+++ fs/ext4/inode.c.new	2009-12-24 18:02:58.556366024 -0500
@@ -1060,6 +1060,10 @@
 	if (mdb_free) {
 		/* Account for allocated meta_blocks */
 		mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
+		if (mdb_free < mdb_claim)
+			ext4_msg(inode->i_sb, KERN_ERR, "inode #%lu: "
+				 "mdb_free (%d) < mdb_claim (%d) BUG\n",
+				 inode->i_ino, mdb_free, mdb_claim);
 		BUG_ON(mdb_free < mdb_claim);
 		mdb_free -= mdb_claim;
 

Then once you get the inode number (suppose it's 12345), please send
the output of the following debugfs commands:

debugfs: stat <12345>
debugfs: ncheck 12345

Thanks!!

						- Ted
* Re: 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc)
  2009-12-24 23:05 ` tytso
@ 2009-12-24 23:15 ` tytso
  0 siblings, 0 replies; 16+ messages in thread
From: tytso @ 2009-12-24 23:15 UTC
To: Alexander Beregalov, Linux Kernel Mailing List, linux-ext4, Jens Axboe

On Thu, Dec 24, 2009 at 06:05:12PM -0500, tytso@MIT.EDU wrote:
> On Fri, Dec 25, 2009 at 01:28:34AM +0300, Alexander Beregalov wrote:
> >
> > Kernel is 2.6.33-rc1-00366-g2f99f5c
> > Ext4 mounts ext3 filesystem
> >
> > kernel BUG at fs/ext4/inode.c:1063!
>
> OK, that's this BUG which is triggering:
>
>	if (mdb_free) {
>		/* Account for allocated meta_blocks */
>		mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
>		BUG_ON(mdb_free < mdb_claim);    <------- BUG triggered
>		mdb_free -= mdb_claim;
>
> Can you replicate this?  If so, I'd like to ask you to replicate with
> the following debugging patch applied:

Here's a revised version of the patch which avoids the BUG_ON, which
should make it less annoying.  We should really figure out what's going
on and fix it, though.  It may be fixed by the recently pushed quota
race fixes, or at least there's a good chance that it's related to an
ext4 quota-related WARN_ON that people have been complaining about.

						- Ted

--- /tmp/inode.c	2009-12-24 17:55:03.736366001 -0500
+++ /tmp/inode.c.new	2009-12-24 18:13:07.716366002 -0500
@@ -1060,8 +1060,14 @@
 	if (mdb_free) {
 		/* Account for allocated meta_blocks */
 		mdb_claim = EXT4_I(inode)->i_allocated_meta_blocks;
-		BUG_ON(mdb_free < mdb_claim);
-		mdb_free -= mdb_claim;
+		if (mdb_free < mdb_claim) {
+			ext4_msg(inode->i_sb, KERN_ERR, "inode #%lu: "
+				 "mdb_free (%d) < mdb_claim (%d) BUG\n",
+				 inode->i_ino, mdb_free, mdb_claim);
+			WARN_ON(1);
+			mdb_free = 0;
+		} else
+			mdb_free -= mdb_claim;
 
 		/* update fs dirty blocks counter */
 		percpu_counter_sub(&sbi->s_dirtyblocks_counter, mdb_free);
end of thread (last message 2009-12-30 17:48 UTC)

Thread overview: 16+ messages
2009-12-24 22:28 2.6.33-rc1: kernel BUG at fs/ext4/inode.c:1063 (sparc) Alexander Beregalov
2009-12-24 22:49 ` Alexander Beregalov
2009-12-25 12:31 ` Dmitry Monakhov
2009-12-25 19:33 ` Alexander Beregalov
2009-12-25 23:47 ` Dmitry Monakhov
2009-12-27 20:32 ` Alexander Beregalov
2009-12-27 21:38 ` Dmitry Torokhov
2009-12-27 22:52 ` tytso
2009-12-27 23:02 ` Alexander Beregalov
2009-12-28  3:51 ` tytso
2009-12-30  5:37 ` tytso
2009-12-30 13:18 ` Dmitry Monakhov
2009-12-30 17:45 ` tytso
2009-12-30 17:48 ` tytso
2009-12-24 23:05 ` tytso
2009-12-24 23:15 ` tytso