* 3.15-rc5 deadlocked a 2nd time after I was copying photos from an sdcard + common code path that deadlocks all btrfs filesystems
From: Marc MERLIN @ 2014-05-19 13:49 UTC
To: linux-btrfs
0 siblings, 1 reply; 12+ messages in thread

Ok, that's 2 out of 2.

I was copying pictures from an sdcard (through mmcblk0), and the
filesystem deadlocked.

Unfortunately, when this happens, I copied my pictures (which were still
in RAM) to my 2nd drive which was also btrfs.

I had to reboot, and of course the last pictures didn't get committed to
disk, but more annoyingly the copy I did to the second drive didn't work
either.
All the filenames got copied to the 2nd drive, some ended up with data,
and others ended up empty.
Why does a deadlock on drive 1 also cause btrfs to fail to write to
drive #2?
This is not the first time, there seem to be common codepaths across all
drives (just like disk array #1 having problems causing failure of
syslog to work on the boot drive with btrfs).

I tried to capture sysrq+w, but it didn't make it to disk because of that bug.
I do have remote syslog of the hangs before that though, but the capture of sysrq+w
has too much missing data to be useful
http://marc.merlins.org/tmp/btrfs-hang.txt

Mmmh, maybe the deadlock is more complicated. I had a 2nd syslog stream
going to an ext4 filesystem, exactly to get around that btrfs master
deadlock, and now I see that didn't work either.

If sync hangs, and logging to an ext4 filesystem didn't work, am I
hitting another bug/hardware problem?

Here's what I got at the end?

[194790.138156] FAT-fs (mmcblk0p1): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
[194790.140892] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[194932.445153] INFO: task IndexedDB:29612 blocked for more than 120 seconds.
[194932.445161] Tainted: G W 3.15.0-rc5-amd64-i915-preempt-20140216s1 #2
[194932.445163] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[194932.445166] IndexedDB D ffff8800ccde8bc0 0 29612 5570 0x00000080
[194932.445172] ffff8801b521fc30 0000000000000086 ffff8801b521fc00 ffff8801b521ffd8
[194932.445178] ffff8801d622a450 00000000000141c0 ffff88041e3941c0 ffff8801d622a450
[194932.445182] ffff8801b521fcd0 0000000000000002 ffffffff810fda1a ffff8801b521fc40
[194932.445188] Call Trace:
[194932.445198] [<ffffffff810fda1a>] ? wait_on_page_read+0x3c/0x3c
[194932.445209] [<ffffffff8161ca1b>] io_schedule+0x60/0x7a
[194932.445214] [<ffffffff810fda28>] sleep_on_page+0xe/0x12
[194932.445219] [<ffffffff8161cdab>] __wait_on_bit_lock+0x46/0x8a
[194932.445223] [<ffffffff810fdae3>] __lock_page+0x69/0x6b
[194932.445228] [<ffffffff81084771>] ? autoremove_wake_function+0x34/0x34
[194932.445232] [<ffffffff81240c41>] lock_page+0x1e/0x21
[194932.445237] [<ffffffff81244779>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c3
[194932.445243] [<ffffffff8161d2d4>] ? mutex_unlock+0x16/0x18
[194932.445248] [<ffffffff81239c74>] ? btrfs_file_aio_write+0x3e9/0x4b6
[194932.445251] [<ffffffff81244bd4>] extent_writepages+0x4b/0x5c
[194932.445255] [<ffffffff8122ee1f>] ? btrfs_submit_direct+0x3f4/0x3f4
[194932.445262] [<ffffffff8122d3fa>] btrfs_writepages+0x28/0x2a
[194932.445267] [<ffffffff811082b1>] do_writepages+0x1e/0x2c
[194932.445272] [<ffffffff810ff179>] __filemap_fdatawrite_range+0x55/0x57
[194932.445277] [<ffffffff810ff1ef>] filemap_fdatawrite_range+0x13/0x15
[194932.445280] [<ffffffff8123885a>] btrfs_sync_file+0xa8/0x2b3
[194932.445286] [<ffffffff8132048f>] ? __percpu_counter_add+0x8c/0xa6
[194932.445292] [<ffffffff8117a1a7>] vfs_fsync_range+0x18/0x22
[194932.445296] [<ffffffff8117a1cd>] vfs_fsync+0x1c/0x1e
[194932.445299] [<ffffffff8117a3d9>] do_fsync+0x2c/0x4c
[194932.445303] [<ffffffff8117a5f9>] SyS_fdatasync+0x13/0x17
[194932.445308] [<ffffffff81625bad>] system_call_fastpath+0x1a/0x1f
[194932.445395] INFO: task kworker/u16:35:3812 blocked for more than 120 seconds.
[194932.445398] Tainted: G W 3.15.0-rc5-amd64-i915-preempt-20140216s1 #2
[194932.445400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[194932.445403] kworker/u16:35 D 0000000000000000 0 3812 2 0x00000080
[194932.445410] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1)
[194932.445414] ffff88003b647a00 0000000000000046 ffff88003b6479d0 ffff88003b647fd8
[194932.445419] ffff88003b8ca590 00000000000141c0 ffff88041e3941c0 ffff88003b8ca590
[194932.445423] ffff88003b647aa0 0000000000000002 ffffffff810fda1a ffff88003b647a10
[194932.445427] Call Trace:
[194932.445432] [<ffffffff810fda1a>] ? wait_on_page_read+0x3c/0x3c
[194932.445437] [<ffffffff8161c876>] schedule+0x73/0x75
[194932.445441] [<ffffffff8161ca1b>] io_schedule+0x60/0x7a
[194932.445445] [<ffffffff810fda28>] sleep_on_page+0xe/0x12
[194932.445450] [<ffffffff8161cdab>] __wait_on_bit_lock+0x46/0x8a
[194932.445454] [<ffffffff810fdae3>] __lock_page+0x69/0x6b
[194932.445458] [<ffffffff81084771>] ? autoremove_wake_function+0x34/0x34
[194932.445461] [<ffffffff81240c41>] lock_page+0x1e/0x21
[194932.445465] [<ffffffff81244779>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c3
[194932.445470] [<ffffffff81244bd4>] extent_writepages+0x4b/0x5c
[194932.445473] [<ffffffff8122ee1f>] ? btrfs_submit_direct+0x3f4/0x3f4
[194932.445479] [<ffffffff8162280c>] ? preempt_count_add+0x77/0x8d
[194932.445483] [<ffffffff8122d3fa>] btrfs_writepages+0x28/0x2a
[194932.445488] [<ffffffff811082b1>] do_writepages+0x1e/0x2c
[194932.445492] [<ffffffff81175ef2>] __writeback_single_inode+0x7d/0x238
[194932.445495] [<ffffffff81176c2a>] writeback_sb_inodes+0x1eb/0x339
[194932.445499] [<ffffffff81176dec>] __writeback_inodes_wb+0x74/0xb7
[194932.445503] [<ffffffff81176f67>] wb_writeback+0x138/0x293
[194932.445507] [<ffffffff8117759f>] bdi_writeback_workfn+0x19a/0x329
[194932.445513] [<ffffffff8100d047>] ? load_TLS+0xb/0xf
[194932.445519] [<ffffffff81065d2e>] process_one_work+0x195/0x2d2
[194932.445523] [<ffffffff8106624a>] worker_thread+0x136/0x205
[194932.445526] [<ffffffff81066114>] ? rescuer_thread+0x27a/0x27a
[194932.445530] [<ffffffff8106b467>] kthread+0xae/0xb6
[194932.445534] [<ffffffff8106b3b9>] ? __kthread_parkme+0x61/0x61
[194932.445537] [<ffffffff81625afc>] ret_from_fork+0x7c/0xb0
[194932.445540] [<ffffffff8106b3b9>] ? __kthread_parkme+0x61/0x61

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
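
Editor's note: the two hung-task reports in the log above can be compared mechanically. What follows is a minimal Python sketch, not a tool anyone in this thread used. It assumes the log keeps the `INFO: task NAME:PID blocked for more than N seconds` and `[<address>] symbol+0xoff/0xlen` shapes seen in this capture; the helper names `summarize_hung_tasks` and `shared_frames` are hypothetical, and `SAMPLE` is a shortened excerpt of the trace above.

```python
import re

# Header of a hung-task report, e.g. "INFO: task kworker/u16:35:3812 blocked ..."
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# One stack frame, e.g. "[<ffffffff81240c41>] lock_page+0x1e/0x21".
# A leading "?" marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

SAMPLE = """\
[194932.445153] INFO: task IndexedDB:29612 blocked for more than 120 seconds.
[194932.445198] [<ffffffff810fda1a>] ? wait_on_page_read+0x3c/0x3c
[194932.445209] [<ffffffff8161ca1b>] io_schedule+0x60/0x7a
[194932.445232] [<ffffffff81240c41>] lock_page+0x1e/0x21
[194932.445262] [<ffffffff8122d3fa>] btrfs_writepages+0x28/0x2a
[194932.445395] INFO: task kworker/u16:35:3812 blocked for more than 120 seconds.
[194932.445441] [<ffffffff8161ca1b>] io_schedule+0x60/0x7a
[194932.445461] [<ffffffff81240c41>] lock_page+0x1e/0x21
"""


def summarize_hung_tasks(log_text):
    """Return one dict per hung-task report: task name, pid, timeout, reliable frames."""
    headers = list(HUNG_RE.finditer(log_text))
    reports = []
    for i, header in enumerate(headers):
        # A report runs from its header to the next header (or the end of the log).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(log_text)
        section = log_text[header.end():end]
        frames = [
            f.group("sym")
            for f in FRAME_RE.finditer(section)
            if not f.group("unreliable")
        ]
        reports.append({
            "comm": header.group("comm"),
            "pid": int(header.group("pid")),
            "secs": int(header.group("secs")),
            "frames": frames,
        })
    return reports


def shared_frames(reports):
    """Frames present in every report, in the order of the first report."""
    if not reports:
        return []
    common = set(reports[0]["frames"])
    for report in reports[1:]:
        common &= set(report["frames"])
    return [sym for sym in reports[0]["frames"] if sym in common]


if __name__ == "__main__":
    for report in summarize_hung_tasks(SAMPLE):
        print(report["comm"], report["pid"], " <- ".join(report["frames"]))
```

The regular expressions are applied to the whole text rather than line by line, so the sketch does not depend on how the capture was wrapped. Shared frames only show that the tasks wait in the same place; they do not by themselves identify who holds the page lock.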
* Re: 3.15-rc5 deadlocked a 2nd time after I was copying photos from an sdcard + common code path that deadlocks all btrfs filesystems
From: Satoru Takeuchi @ 2014-06-17 6:29 UTC
To: Marc MERLIN, linux-btrfs
0 siblings, 2 replies; 12+ messages in thread

Hi Marc,

(2014/05/19 22:49), Marc MERLIN wrote:
> Ok, that's 2 out of 2.
>
> I was copying pictures from an sdcard (through mmcblk0), and the
> filesystem deadlocked.
>
> Unfortunately, when this happens, I copied my pictures (which were still
> in RAM) to my 2nd drive which was also btrfs.

Judging from your sysrq capture, your SD card is formatted as VFAT. Is that correct?

===
[194790.138156] FAT-fs (mmcblk0p1): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
===

> I had to reboot, and of course the last pictures didn't get committed to
> disk, but more annoyingly the copy I did to the second drive didn't work
> either.
> All the filenames got copied to the 2nd drive, some ended up with data,
> and others ended up empty.
> Why does a deadlock on drive 1 also cause btrfs to fail to write to
> drive #2?
> This is not the first time, there seem to be common codepaths across all
> drives (just like disk array #1 having problems causing failure of
> syslog to work on the boot drive with btrfs).
>
> I tried to capture sysrq+w, but it didn't make it to disk because of that bug.
> I do have remote syslog of the hangs before that though, but the capture of sysrq+w
> has too much missing data to be useful
> http://marc.merlins.org/tmp/btrfs-hang.txt

Quoted from btrfs-hang.txt:

===
[194790.140892] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
===

Did you try fsck.vfat? Also, does the problem still happen
after that? It would be desirable to try reproducing it with 3.16-rc1.

If it's easy to reproduce, the following would help to find the root cause:

 - running fsck.vfat (as I described above),
 - changing the SD card,
 - changing the copy target to a filesystem other than btrfs.

Thanks,
Satoru
* Re: 3.15-rc5 deadlocked a 2nd time after I was copying photos from an sdcard + common code path that deadlocks all btrfs filesystems
From: Marc MERLIN @ 2014-06-17 14:40 UTC
To: Satoru Takeuchi; +Cc: linux-btrfs
1 sibling, 0 replies; 12+ messages in thread

On Tue, Jun 17, 2014 at 03:29:19PM +0900, Satoru Takeuchi wrote:
> Hi Marc,
>
> (2014/05/19 22:49), Marc MERLIN wrote:
> >Ok, that's 2 out of 2.
> >
> >I was copying pictures from an sdcard (through mmcblk0), and the
> >filesystem deadlocked.
> >
> >Unfortunately, when this happens, I copied my pictures (which were still
> >in RAM) to my 2nd drive which was also btrfs.
>
> From your sysrq capture, your sd card is formatted as VFAT, is it correct?

Yes, typical camera sdcard :)

> ===
> [194790.140892] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some
> data may be corrupt. Please run fsck.
> ===
>
> Did you try fsck.vfat? In addition, does this problem happen
> after that? Here try to reproduce with 3.16-rc1 is desirable.

That was almost a month ago. The card has been reformatted since then, but
the problem was not with the sdcard or vfat FS. All the data was read
fine, ended up in the page cache, and btrfs failed to actually commit it to
disk.

> If it's easy to reproduce,
>
> - run fsck.vfat (as I described before),
> - change SD card,
> - change copy target to other filesystem than btrfs
>
> is useful to find out the root cause.

I wish I could reproduce this at will, but I can't. In some way, that's good
since I lost actual pictures (from Japan at the time) each time this
happened.

Either way, thanks for having a look. I'll answer the rest in another
message since it warrants another thread.

Marc
* frustrations with handling of crash reports
From: Marc MERLIN @ 2014-06-17 14:59 UTC
To: Satoru Takeuchi; +Cc: linux-btrfs, Chris Mason
1 sibling, 1 reply; 12+ messages in thread

On Tue, Jun 17, 2014 at 03:29:19PM +0900, Satoru Takeuchi wrote:
> after that? Here try to reproduce with 3.16-rc1 is desirable.

As for 3.16-rc1, here is my problem. This is not targeted at you; it is
just a general, unfortunate observation from my last months of btrfs
testing and reports.

I use btrfs on real systems. I have backups, but having crashes is both
inconvenient and time consuming since it's my real data and systems I need
and use, and where a crash at the wrong time can be quite inconvenient, as
well as cost me hours of time to recover.

I get to pick between staying 1 or 2 kernel versions back and being told
that all my reports are useless because the kernel is too old, or running
unstable kernels and unfortunately still having at least 50% of my reports
not looked at.

I signed up for that when using btrfs, but at least 50% of the time, when I
went through this and reported the problems (which took more time since
it's not a test system in a VM or with serial console), the reports were
ignored, or looked like they were.
The other half of the time, I was indeed told to use an even more
unstable/unproven version of btrfs, assuming the one I was running wasn't
already unstable/unproven enough.

There is no right answer here. I understand that in their limited time
developers are working on new code or maybe fixing existing code.
However, if they are interested in real users with real data brave enough to
run recent code, my wish would be for more support and timely interest when
severe hang problems or corruption are reported.

Case in point: I just reported a FS last week that oopsed btrfs, and worse
that crashed 3.15 (the problem is still there, but the symptom is worse in
3.15), and got no answer from anyone caring about the filesystem.
I asked a 2nd time before deleting it, and no one answered.
It took me 2h during my work day (which isn't supposed to be related to
testing btrfs) to even capture that, and now I regret even having bothered
because it looks like no one cared.
Next time it happens, I likely won't waste my time reporting it and will
just get back to work, potentially reverting the FS to something other than
btrfs :(

Similarly, I've seen other posts from people reporting corruption or data
loss and getting no answer or feedback at all.

I realize that the developers can't put hours of personal time into chasing
each (sometimes incomplete) report sent to the list, but my point is that if
user reports of crashes and data loss seemingly get so little attention,
this is going to put off a lot of early users who will get burnt, remember
the bad experience, and not come back.
I already know some who have, and some of them have even told me "why do you
even still bother using btrfs for your data".

Btrfs is labelled as experimental, so no one has a right to complain if data
is lost, but my suggestion is for developers to allocate a bit more time to
looking at user reports, especially if the users spent time getting the
crash data and trying to give useful information.

It is also ok to answer "Any FS created or used before kernel 3.x can be
corrupted due to bugs we fixed in 3.y, thank you for your report but it's
not a good use of our time to investigate this"
(although newer kernels should not just crash with BUG(xxx) on unexpected
data, they should remount the FS read only).

Maybe it would make sense for some developers to clean those up, and do some
kind of unofficial rotation of list monitoring to gather important reports
from users and act on the ones that have useful data, or at least get back
to users who reported a crash on code that is known to have corruption or
deadlock problems that were fixed in newer kernels.

Again, this was not targeted at your answer, I do thank you for trying to
help.

Thanks to all for reading.
Marc
* Re: frustrations with handling of crash reports
From: Marc MERLIN @ 2014-06-17 18:27 UTC
To: Satoru Takeuchi; +Cc: linux-btrfs, Chris Mason
0 siblings, 1 reply; 12+ messages in thread

On Tue, Jun 17, 2014 at 07:59:57AM -0700, Marc MERLIN wrote:
> It is also ok to answer "Any FS created or used before kernel 3.x can be
> corrupted due to bugs we fixed in 3.y, thank you for your report but it's
> not a good use of our time to investigate this"
> (although newer kernels should not just crash with BUG(xxx) on unexpected
> data, they should remount the FS read only).

I was thinking about this some more, and I know I have no right to tell
others what to do, so take this as a mere suggestion :)

How about doing a release with cleanups and stabilization and better state
reporting when things go wrong?

This would give a good known version for users who have actual data and
backups that can take many hours or days to restore (never mind downtime).

A few things I was thinking about:
1) Wouldn't it be a good time to replace all the BUG_ON statements with
appropriate error handling? Unexpected data can happen, and the kernel
shouldn't crash on that.
At the very least it should remount read only and maybe give the user a wiki
link on what to do next (some bug reporting and recovery page).

2) On unexpected cases, output basic information on the filesystem, or printk
instructions to the user on how to gather data that would be sent to the
list to be reviewed.
This would include information on how old the filesystem is when it's
possible to detect, and the instruction page could say "sorry, anything
older than X, we don't want to hear about, we already fixed corruption bugs
since then".

3) Getting printk data on an end user machine when it just started refusing
to write to disk can be challenging and cause useful debug info to be lost.
Things I am thinking about:
a) make sure most btrfs bugs do not just hang the kernel
b) recommend to users to send kernel syslog messages to an ext4 partition

How does that sound?

Thanks,
Marc
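
Editor's note: point 2 above asks for basic filesystem information to accompany a report. As a rough userspace illustration only (the thread proposes that the kernel print this itself), the sketch below builds a short report header from the text of `/proc/mounts` and a kernel release string. The helper names `btrfs_mounts` and `report_header` and the devices in `SAMPLE_MOUNTS` are made up for the example; the field order (device, mount point, type, options) is the one `/proc/mounts` uses.

```python
import platform

# Hypothetical contents of /proc/mounts, for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / btrfs rw,noatime,ssd,space_cache 0 0
/dev/sdb1 /var/log ext4 rw,relatime 0 0
/dev/sdc1 /mnt/photos btrfs rw,relatime 0 0
"""


def btrfs_mounts(mounts_text):
    """Return the btrfs entries found in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[2] == "btrfs":
            found.append({
                "device": fields[0],
                "mountpoint": fields[1],
                "options": fields[3].split(","),
            })
    return found


def report_header(mounts_text, kernel_release):
    """Format the kernel release and mounted btrfs filesystems as plain text."""
    lines = ["kernel: " + kernel_release]
    for mount in btrfs_mounts(mounts_text):
        lines.append("btrfs: {} on {} ({})".format(
            mount["device"], mount["mountpoint"], ",".join(mount["options"])))
    return "\n".join(lines)


if __name__ == "__main__":
    # On a real machine one would read /proc/mounts instead of SAMPLE_MOUNTS.
    print(report_header(SAMPLE_MOUNTS, platform.release()))
```

This does not detect the age of a filesystem, which the message also asks for; that would need information this sketch does not attempt to read.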
* Re: frustrations with handling of crash reports
From: Konstantinos Skarlatos @ 2014-06-18 13:23 UTC
To: Marc MERLIN, Satoru Takeuchi; +Cc: linux-btrfs, Chris Mason, torvalds
0 siblings, 1 reply; 12+ messages in thread

On 17/6/2014 9:27 PM, Marc MERLIN wrote:
> On Tue, Jun 17, 2014 at 07:59:57AM -0700, Marc MERLIN wrote:
>> It is also ok to answer "Any FS created or used before kernel 3.x can be
>> corrupted due to bugs we fixed in 3.y, thank you for your report but it's
>> not a good use of our time to investigate this"
>> (although newer kernels should not just crash with BUG(xxx) on unexpected
>> data, they should remount the FS read only).
> I was thinking about this some more, and I know I have no right to tell
> others what to do, so take this as a mere suggestion :)
>
> How about doing a release with cleanups and stabilization and better state
> reporting when things go wrong?
>
> This would give a good known version for users who have actual data and
> backups that can take many hours or days to restore (never mind downtime).
>
> A few things I was thinking about:
> 1) Wouldn't it be a good time to replace all the BUG_ON statements with
> appropriate error handling? Unexpected data can happen, and the kernel
> shouldn't crash on that.
> At the very least it should remount read only and maybe give the user a wiki
> link on what to do next (some bug reporting and recovery page).
>
> 2) On unexpected cases, output basic information on the filesystem, or printk
> instructions to the user on how to gather data that would be sent to the
> list to be reviewed.
> This would include information on how old the filesystem is when it's
> possible to detect, and the instruction page could say "sorry, anything
> older than X, we don't want to hear about, we already fixed corruption bugs
> since then".
>
> 3) Getting printk data on an end user machine when it just started refusing
> to write to disk can be challenging and cause useful debug info to be lost.
> Things I am thinking about:
> a) make sure most btrfs bugs do not just hang the kernel
> b) recommend to users to send kernel syslog messages to an ext4 partition
>
> How does that sound?

I 100% agree with this.

I also have a problem where btrfs decides to BUG_ON and force a kernel
panic because it has found an unexpected type of metadata. Although in
my case I was luckier and had help and test patches from Liu Bo, I am
still of the opinion that btrfs should not take down a whole system
because it found something unexpected.

I guess that btrfs developers have put these BUG_ONs so that they get
reports from users when btrfs gets in these unexpected situations. But
if most of these reports are ignored or not resolved, then maybe there
is no use for these BUG_ONs and they should be replaced with something
milder.

Keep in mind that if a system panics, then the only way to get logs from
it is with serial or netconsole, so BUG_ON really makes it much harder
for users to know what happened and send reports, and only the most
technical and determined users will manage to send reports here. So I
can guess that the real number of kernel panics due to btrfs is much
higher, and most people are unable to report them, because they _never
know_ that it was btrfs that caused their crash.

I know btrfs is still experimental, but it has been in the kernel since
2009-01-09, so I think most users have some expectation of stability
after something has been in the mainline kernel for 5.5 years.

So my suggestion is basically the same as Marc's:

These BUG_ONs should be replaced with something that does not crash the
system and gives out as much info as possible, so that users do not have
to come here and ask for a debugging patch. After all, btrfs is still
experimental, right? :)

Furthermore, these problems should either remount the fs as readonly, or
try to make the file that is implicated readonly, and report the
filename, so users can delete it and continue with their lives without
having to mkfs every few months. Or even make fsck able to fix these,
and not choke on a few TB filesystem because it wants to use ridiculous
amounts of RAM.

In general, btrfs must get _much_ better at reporting what happened,
which file was implicated and, if it is a multiple disk fs, the disk
where the problem is and the sector where it occurred.

PS. I am not a kernel developer, so please be kind if I have said
something completely wrong :)

-- 
Konstantinos Skarlatos
* Re: frustrations with handling of crash reports
From: Duncan @ 2014-06-18 21:22 UTC
To: linux-btrfs
0 siblings, 2 replies; 12+ messages in thread

Konstantinos Skarlatos posted on Wed, 18 Jun 2014 16:23:04 +0300 as
excerpted:

> I guess that btrfs developers have put these BUG_ONs so that they get
> reports from users when btrfs gets in these unexpected situations. But
> if most of these reports are ignored or not resolved, then maybe there
> is no use for these BUG_ONs and they should be replaced with something
> more mild.
>
> Keep in mind that if a system panics, then the only way to get logs from
> it is with serial or netconsole, so BUG_ON really makes it much harder
> for users to know what happened and send reports, and only the most
> technical and determined users will manage to send reports here.

In terms of the BUGONs, they've been converting them to WARNONs recently,
exactly due to the point you and Marc have made. Not being a dev and
simply based on the patch-flow I've seen as btrfs has been basically
behaving itself so far here[1], I had /thought/ that was more or less
done (perhaps some really bad bug-ons left but only a few, and basically
only where the kernel couldn't be sure it was in a logical enough state
to continue writing to other filesystems too, so bugon being logical in
that case), but based on you guys' comments there's apparently more to go.

So at least for BUGONs they agree. I guess it's simply a matter of
getting them all converted.

Tho at least in Marc's case, he's running kernels a couple back in some
cases and they may still have BUGONs already replaced in the most current
kernel.

As for experimental, they've been toning down and removing the warnings
recently. Yes, the on-device format may come with some level of
compatibility guarantee now so I do agree with that bit, but IMO anyway,
that warning should be being replaced with a more explicit
"on-device-format is now stable but the code is not yet entirely so, so
keep your backups and be prepared to use them, and run current kernels",
language, and that's not happening, they're mostly just toning it down
without the still explicit warnings, ATM.

---
[1] Btrfs (so far) behaving itself here: Possibly because my filesystems
are relatively small and I don't use snapshots much and prefer several
smaller independent filesystems rather than doing subvolumes, thus
keeping the number of eggs in a single basket small. Plus, with small
filesystems on SSD, I can balance reasonably regularly, and I do full
fresh mkfs.btrfs rounds every few kernels as well to take advantage of
newer features, which may well have the result of killing smaller
problems that aren't yet showing up before they get big enough to cause
real issues. Anyway, I'm not complaining! =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
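
Editor's note: how far the BUG_ON conversion described above has progressed in a given kernel can be estimated from a source checkout. The following is a rough Python sketch under stated assumptions: it counts textual occurrences of `BUG_ON(`, `WARN_ON(` and `WARN_ON_ONCE(` in `.c` and `.h` files, so occurrences in comments or strings are counted too and error-handling paths that replaced a BUG_ON without any WARN are not visible. The helper names `count_assert_macros` and `count_tree` are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Match the macro name only when it starts at a word boundary and is
# followed by "(", so WARN_ON_ONCE( is not also counted as WARN_ON(.
MACRO_RE = re.compile(r"\b(BUG_ON|WARN_ON|WARN_ON_ONCE)\s*\(")


def count_assert_macros(source_text):
    """Count BUG_ON / WARN_ON / WARN_ON_ONCE call sites in C source text."""
    return Counter(match.group(1) for match in MACRO_RE.finditer(source_text))


def count_tree(root):
    """Sum the counts over every .c and .h file below the directory root."""
    total = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in (".c", ".h"):
            total += count_assert_macros(path.read_text(errors="replace"))
    return total


if __name__ == "__main__":
    import sys
    # Example: python count_macros.py /path/to/linux/fs/btrfs
    if len(sys.argv) > 1:
        for name, count in sorted(count_tree(sys.argv[1]).items()):
            print(name, count)
```

Running it against `fs/btrfs` in two kernel versions gives a crude before/after comparison; the numbers say nothing about which of the remaining BUG_ONs are reachable with on-disk corruption.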
* Re: frustrations with handling of crash reports 2014-06-18 21:22 ` Duncan @ 2014-06-19 8:56 ` Konstantinos Skarlatos 2014-06-19 15:06 ` Duncan 2014-06-19 17:37 ` Chris Murphy 2014-06-19 15:13 ` Marc MERLIN 1 sibling, 2 replies; 12+ messages in thread From: Konstantinos Skarlatos @ 2014-06-19 8:56 UTC (permalink / raw) To: Duncan, linux-btrfs On 19/6/2014 12:22 πμ, Duncan wrote: > Konstantinos Skarlatos posted on Wed, 18 Jun 2014 16:23:04 +0300 as > excerpted: > >> I guess that btrfs developers have put these BUG_ONs so that they get >> reports from users when btrfs gets in these unexpected situations. But >> if most of these reports are ignored or not resolved, then maybe there >> is no use for these BUG_ONs and they should be replaced with something >> more mild. >> >> Keep in mind that if a system panics, then the only way to get logs from >> it is with serial or netconsole, so BUG_ON really makes it much harder >> for users to know what happened and send reports, and only the most >> technical and determined users will manage to send reports here. > In terms of the BUGONs, they've been converting them to WARNONs recently, > exactly due to the point you and Marc have made. Not being a dev and > simply based on the patch-flow I've seen as btrfs has been basically > behaving itself so far here[1], I had /thought/ that was more or less > done (perhaps some really bad bug-ons left but only a few, and basically > only where the kernel couldn't be sure it was in a logical enough state > to continue writing to other filesystems too, so bugon being logical in > that case), but based on you guys' comments there's apparently more to go. > > So at least for BUGONs they agree. I guess it's simply a matter of > getting them all converted. Thats good to hear. 
But we should have a way to recover from these kinds of problems, first of all having btrfs report the exact location, disk and file name that is affected, and then make scrub fix or at least report about it, and finally make fsck work for this.

My filesystem that consistently kernel panics when a specific logical address is read, passes scrub without anything bad reported. What's the use of scrub if it can't deal with this?

> Tho at least in Marc's case, he's running kernels a couple back in some cases and they may still have BUGONs already replaced in the most current kernel.
>
> As for experimental, they've been toning down and removing the warnings recently. Yes, the on-device format may come with some level of compatibility guarantee now so I do agree with that bit, but IMO anyway, that warning should be being replaced with a more explicit "on-device-format is now stable but the code is not yet entirely so, so keep your backups and be prepared to use them, and run current kernels", language, and that's not happening, they're mostly just toning it down without the still explicit warnings, ATM.
>
> ---
> [1] Btrfs (so far) behaving itself here: Possibly because my filesystems are relatively small and I don't use snapshots much and prefer several smaller independent filesystems rather than doing subvolumes, thus keeping the number of eggs in a single basket small. Plus, with small filesystems on SSD, I can balance reasonably regularly, and I do full fresh mkfs.btrfs rounds every few kernels as well to take advantage of newer features, which may well have the result of killing smaller problems that aren't yet showing up before they get big enough to cause real issues. Anyway, I'm not complaining! =:^)

Well, my use case is about 25 filesystems on rotating disks, 20 of them on single disks, and the rest are multiple disk filesystems, either raid1 or single.
I have many subvolumes and in some cases thousands of snapshots, but no databases, systemd and the like on them. Of course I have everything backed up, </nag mode on> but I believe that after all those years of development I shouldn't still be forced to do mkfs every 6 months or so, when I use no new features. </nag mode off>

> --

Konstantinos Skarlatos
* Re: frustrations with handling of crash reports
  2014-06-19 8:56 ` Konstantinos Skarlatos
@ 2014-06-19 15:06 ` Duncan
  2014-06-19 15:19 ` Duncan
  2014-06-19 17:37 ` Chris Murphy
  1 sibling, 1 reply; 12+ messages in thread
From: Duncan @ 2014-06-19 15:06 UTC
To: linux-btrfs

Konstantinos Skarlatos posted on Thu, 19 Jun 2014 11:56:59 +0300 as excerpted:

> That's good to hear. But we should have a way to recover from these kinds of problems, first of all having btrfs report the exact location, disk and file name that is affected, and then make scrub fix or at least report about it, and finally make fsck work for this.
>
> My filesystem that consistently kernel panics when a specific logical address is read, passes scrub without anything bad reported. What's the use of scrub if it can't deal with this?

Scrub detects (and potentially fixes) exactly one sort of problem (tho that one can definitely cause others), and your problem is not that sort.

On btrfs, what scrub does is exactly this:

(a) Scrub calculates the checksums for all data and metadata blocks and matches them against the recorded checksums, reporting any no-match cases.

(b) Where the checksums don't match up, if there's another copy of the data that /does/ checksum-validate, scrub will "scrub" the bad copy, replacing it with a duplicate of the good one.

As it happens, on a (non-ssd) single-device filesystem, btrfs defaults to single data, dup metadata. In that case there's a second, hopefully valid, copy of the metadata blocks that can be used to correct a bad copy. But there's only a single copy of data blocks so while scrub can detect data-block errors, it won't be able to fix them.
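As a rough illustration of steps (a) and (b), here is a minimal Python sketch. It is a toy in-memory model, not btrfs code: the names `Extent`, `csum` and `scrub` are made up for this example, and `zlib.crc32` is only a stand-in for the crc32c checksum btrfs actually uses. The last extent shows a block whose contents are garbage but whose checksum was computed over that garbage, which a checksum pass has no way to notice.

```python
import zlib
from dataclasses import dataclass


def csum(block: bytes) -> int:
    # Stand-in checksum (assumption): btrfs uses crc32c, zlib.crc32 is a
    # different CRC-32 variant and is used here purely for illustration.
    return zlib.crc32(block)


@dataclass
class Extent:
    """Hypothetical block: one recorded checksum plus its stored copies."""
    recorded_csum: int
    copies: list  # 1 entry for "single", 2 entries for "dup"/"raid1"


def scrub(extents):
    """Verify every copy; overwrite a bad copy if a good one exists."""
    report = {"ok": 0, "repaired": 0, "uncorrectable": 0}
    for ext in extents:
        # Step (a): find any copy whose checksum matches the recorded one.
        good = next((c for c in ext.copies if csum(c) == ext.recorded_csum), None)
        for i, copy in enumerate(ext.copies):
            if csum(copy) == ext.recorded_csum:
                report["ok"] += 1
            elif good is not None:
                # Step (b): replace the bad copy with the validated one.
                ext.copies[i] = good
                report["repaired"] += 1
            else:
                # Mismatch detected, but no valid copy to repair from.
                report["uncorrectable"] += 1
    return report


payload = b"metadata block"
dup_meta = Extent(csum(payload), [payload, b"metadata blocK"])  # one bad mirror
single_data = Extent(csum(b"file data"), [b"file dat\x00"])     # only copy is bad
garbage = b"nonsense written by a buggy code path"
logic_bug = Extent(csum(garbage), [garbage])                    # bad content, valid checksum

result = scrub([dup_meta, single_data, logic_bug])
print(result)  # {'ok': 2, 'repaired': 1, 'uncorrectable': 1}
```

The dup extent gets its bad mirror rewritten, the single extent is reported but cannot be fixed, and the "logic bug" extent is counted as ok because its checksum matches what was written.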
On a multi-device filesystem, btrfs defaults to raid1 metadata (with only two copies regardless of the number of devices present, N-way-mirroring is roadmapped but not yet implemented), single data, so again, hopefully the second copy of a bad metadata block is valid and can be used to scrub the bad one, but just as with the single-device case, it can detect but not fix data checksum errors.

Tho of course in the multi-device case it's possible to set data to raid1 as well, and that's what I've done here so it too can be error-corrected from a hopefully good second copy.

(Raid10 is similarly protected. Raid5/6 should work a bit differently, with parity, but last I knew raid56 scrub and recovery wasn't fully implemented yet, leaving raid1 and raid10, along with dup mode for single-device metadata only, as the error-correcting choices.)

But if the problem is a btrfs logic error, such that the (meta)data that was actually checksummed and written out was bad before it was ever checksummed in the first place, then scrub won't do a thing for it, because the checksum validates just fine, it's just that it's a perfectly valid checksum on perfectly invalid (meta)data.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: frustrations with handling of crash reports
  2014-06-19 15:06 ` Duncan
@ 2014-06-19 15:19 ` Duncan
  0 siblings, 0 replies; 12+ messages in thread
From: Duncan @ 2014-06-19 15:19 UTC
To: linux-btrfs

Duncan posted on Thu, 19 Jun 2014 15:06:00 +0000 as excerpted:

> Scrub detects (and potentially fixes) exactly one sort of problem (tho that one can definitely cause others), and that's not it.

Hmm. Last phrase was ambiguous. What I meant was, that problem (your problem) is not the sort of problem scrub detects and potentially fixes.

NOT: that's not /all/ of what scrub does.

(... Which wouldn't make sense in context, but that's how I initially tried to read it when I reread what I posted, thus confusing myself, and if even *I* get confused reading my own writing...! =:^( )

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: frustrations with handling of crash reports
  2014-06-19 8:56 ` Konstantinos Skarlatos
  2014-06-19 15:06 ` Duncan
@ 2014-06-19 17:37 ` Chris Murphy
  1 sibling, 0 replies; 12+ messages in thread
From: Chris Murphy @ 2014-06-19 17:37 UTC
To: Btrfs BTRFS

On Jun 19, 2014, at 2:56 AM, Konstantinos Skarlatos <k.skarlatos@gmail.com> wrote:

> My filesystem that consistently kernel panics when a specific logical address is read, passes scrub without anything bad reported. What's the use of scrub if it can't deal with this?

The myriad repair tools (automatic at mount, recovery mount option, scrub, check/repair, btrfs-zero-log, chunk-recover, super-recover) certainly make Btrfs significantly more challenging to troubleshoot for the user familiar with other file systems. I think this is just a maturity issue, and as the necessary logic of repairing a file system reveals itself I think we'll see consolidation and more automation of these repair methods.

The comments in fs/btrfs/scrub.c say what it does now, plus future enhancements:

"… reads all extent and super block and verifies the checksums. In case a bad checksum is found or the extent cannot be read, good data will be written back if any can be found."

Scrub is pretty much just about checksum verification. It doesn't check file system consistency. So the file system could be inconsistent and a scrub still comes up clean.

> Well my use case is about 25 filesystems on rotating disks, 20 of them on single disks, and the rest are multiple disk filesystems, either raid1 or single. I have many subvolumes and in some cases thousands of snapshots, but no databases, systemd and the like on them.

That's a lot of subvolumes and snapshots. I don't know that this is expected to work really well right now (?). Hundreds, yes, but with thousands there have been some known problems in the recent past at least.
> Of course I have everything backed up, </nag mode on> but I believe that after all those years of development I shouldn't still be forced to do mkfs every 6 months or so, when I use no new features. </nag mode off>

The problem is that an old file system implies many kernels doing different kinds of reads and writes over time, making a given file system rather non-deterministic compared to any other. So the possible problems aren't all known and therefore the way to fix them may not be known yet either.

Chris Murphy
* Re: frustrations with handling of crash reports
  2014-06-18 21:22 ` Duncan
  2014-06-19 8:56 ` Konstantinos Skarlatos
@ 2014-06-19 15:13 ` Marc MERLIN
  1 sibling, 0 replies; 12+ messages in thread
From: Marc MERLIN @ 2014-06-19 15:13 UTC
To: Duncan; +Cc: linux-btrfs

On Wed, Jun 18, 2014 at 09:22:50PM +0000, Duncan wrote:
> Tho at least in Marc's case, he's running kernels a couple back in some cases and they may still have BUGONs already replaced in the most current kernel.

The machine I originally had that one last bug on (balance crash) was an ubuntu kernel (oldish 3.13), but I reproduced with 3.15.1 where it got worse (it seemed like WARN on 3.13 and BUG_ON in 3.15 since with 3.13 I got syslog output and the system kept running and with 3.15.1 it just crashed and I had to have netconsole ready to catch the output).

Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
Thread overview: 12+ messages
2014-05-19 13:49 3.15-rc5 deadlocked a 2nd time after I was copying photos from an sdcard + common code path that deadlocks all btrfs filesystems Marc MERLIN
2014-06-17 6:29 ` Satoru Takeuchi
2014-06-17 14:40 ` Marc MERLIN
2014-06-17 14:59 ` frustrations with handling of crash reports Marc MERLIN
2014-06-17 18:27 ` Marc MERLIN
2014-06-18 13:23 ` Konstantinos Skarlatos
2014-06-18 21:22 ` Duncan
2014-06-19 8:56 ` Konstantinos Skarlatos
2014-06-19 15:06 ` Duncan
2014-06-19 15:19 ` Duncan
2014-06-19 17:37 ` Chris Murphy
2014-06-19 15:13 ` Marc MERLIN