linux-btrfs.vger.kernel.org archive mirror
* 3.15-rc5 deadlocked a 2nd time after I was copying photos from an sdcard + common code path that deadlocks all btrfs filesystems
@ 2014-05-19 13:49 Marc MERLIN
  2014-06-17  6:29 ` Satoru Takeuchi
  0 siblings, 1 reply; 12+ messages in thread
From: Marc MERLIN @ 2014-05-19 13:49 UTC (permalink / raw)
  To: linux-btrfs

Ok, that's 2 out of 2.

I was copying pictures from an sdcard (through mmcblk0), and the
filesystem deadlocked.

Unfortunately, when this happened, I copied my pictures (which were still
in RAM) to my 2nd drive which was also btrfs.
I had to reboot, and of course the last pictures didn't get committed to
disk, but more annoyingly the copy I did to the second drive didn't work
either.
All the filenames got copied to the 2nd drive, some ended up with data,
and others ended up empty.
Why does a deadlock on drive 1 also cause btrfs to fail to write to
drive #2?
This is not the first time; there seem to be common code paths across all
drives (just like disk array #1 having problems causing failure of
syslog to work on the boot drive with btrfs).

I tried to capture sysrq+w, but it didn't make it to disk because of that bug.
I do have remote syslog of the hangs before that though, but the capture of sysrq+w
has too much missing data to be useful:
http://marc.merlins.org/tmp/btrfs-hang.txt

Mmmh, maybe the deadlock is more complicated. I had a 2nd syslog stream
going to an ext4 filesystem, exactly to get around that btrfs master
deadlock, and now I see that didn't work either.

If sync hangs, and logging to an ext4 filesystem didn't work, am I
hitting another bug/hardware problem?

Here's what I got at the end:


[194790.138156] FAT-fs (mmcblk0p1): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
[194790.140892] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
[194932.445153] INFO: task IndexedDB:29612 blocked for more than 120 seconds.
[194932.445161]       Tainted: G        W     3.15.0-rc5-amd64-i915-preempt-20140216s1 #2
[194932.445163] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[194932.445166] IndexedDB       D ffff8800ccde8bc0     0 29612   5570 0x00000080
[194932.445172]  ffff8801b521fc30 0000000000000086 ffff8801b521fc00 ffff8801b521ffd8
[194932.445178]  ffff8801d622a450 00000000000141c0 ffff88041e3941c0 ffff8801d622a450
[194932.445182]  ffff8801b521fcd0 0000000000000002 ffffffff810fda1a ffff8801b521fc40
[194932.445188] Call Trace:
[194932.445198]  [<ffffffff810fda1a>] ? wait_on_page_read+0x3c/0x3c
[194932.445209]  [<ffffffff8161ca1b>] io_schedule+0x60/0x7a
[194932.445214]  [<ffffffff810fda28>] sleep_on_page+0xe/0x12
[194932.445219]  [<ffffffff8161cdab>] __wait_on_bit_lock+0x46/0x8a
[194932.445223]  [<ffffffff810fdae3>] __lock_page+0x69/0x6b
[194932.445228]  [<ffffffff81084771>] ? autoremove_wake_function+0x34/0x34
[194932.445232]  [<ffffffff81240c41>] lock_page+0x1e/0x21
[194932.445237]  [<ffffffff81244779>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c3
[194932.445243]  [<ffffffff8161d2d4>] ? mutex_unlock+0x16/0x18
[194932.445248]  [<ffffffff81239c74>] ? btrfs_file_aio_write+0x3e9/0x4b6
[194932.445251]  [<ffffffff81244bd4>] extent_writepages+0x4b/0x5c
[194932.445255]  [<ffffffff8122ee1f>] ? btrfs_submit_direct+0x3f4/0x3f4
[194932.445262]  [<ffffffff8122d3fa>] btrfs_writepages+0x28/0x2a
[194932.445267]  [<ffffffff811082b1>] do_writepages+0x1e/0x2c
[194932.445272]  [<ffffffff810ff179>] __filemap_fdatawrite_range+0x55/0x57
[194932.445277]  [<ffffffff810ff1ef>] filemap_fdatawrite_range+0x13/0x15
[194932.445280]  [<ffffffff8123885a>] btrfs_sync_file+0xa8/0x2b3
[194932.445286]  [<ffffffff8132048f>] ? __percpu_counter_add+0x8c/0xa6
[194932.445292]  [<ffffffff8117a1a7>] vfs_fsync_range+0x18/0x22
[194932.445296]  [<ffffffff8117a1cd>] vfs_fsync+0x1c/0x1e
[194932.445299]  [<ffffffff8117a3d9>] do_fsync+0x2c/0x4c
[194932.445303]  [<ffffffff8117a5f9>] SyS_fdatasync+0x13/0x17
[194932.445308]  [<ffffffff81625bad>] system_call_fastpath+0x1a/0x1f
[194932.445395] INFO: task kworker/u16:35:3812 blocked for more than 120 seconds.
[194932.445398]       Tainted: G        W     3.15.0-rc5-amd64-i915-preempt-20140216s1 #2
[194932.445400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[194932.445403] kworker/u16:35  D 0000000000000000     0  3812      2 0x00000080
[194932.445410] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1)
[194932.445414]  ffff88003b647a00 0000000000000046 ffff88003b6479d0 ffff88003b647fd8
[194932.445419]  ffff88003b8ca590 00000000000141c0 ffff88041e3941c0 ffff88003b8ca590
[194932.445423]  ffff88003b647aa0 0000000000000002 ffffffff810fda1a ffff88003b647a10
[194932.445427] Call Trace:
[194932.445432]  [<ffffffff810fda1a>] ? wait_on_page_read+0x3c/0x3c
[194932.445437]  [<ffffffff8161c876>] schedule+0x73/0x75
[194932.445441]  [<ffffffff8161ca1b>] io_schedule+0x60/0x7a
[194932.445445]  [<ffffffff810fda28>] sleep_on_page+0xe/0x12
[194932.445450]  [<ffffffff8161cdab>] __wait_on_bit_lock+0x46/0x8a
[194932.445454]  [<ffffffff810fdae3>] __lock_page+0x69/0x6b
[194932.445458]  [<ffffffff81084771>] ? autoremove_wake_function+0x34/0x34
[194932.445461]  [<ffffffff81240c41>] lock_page+0x1e/0x21
[194932.445465]  [<ffffffff81244779>] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c3
[194932.445470]  [<ffffffff81244bd4>] extent_writepages+0x4b/0x5c
[194932.445473]  [<ffffffff8122ee1f>] ? btrfs_submit_direct+0x3f4/0x3f4
[194932.445479]  [<ffffffff8162280c>] ? preempt_count_add+0x77/0x8d
[194932.445483]  [<ffffffff8122d3fa>] btrfs_writepages+0x28/0x2a
[194932.445488]  [<ffffffff811082b1>] do_writepages+0x1e/0x2c
[194932.445492]  [<ffffffff81175ef2>] __writeback_single_inode+0x7d/0x238
[194932.445495]  [<ffffffff81176c2a>] writeback_sb_inodes+0x1eb/0x339
[194932.445499]  [<ffffffff81176dec>] __writeback_inodes_wb+0x74/0xb7
[194932.445503]  [<ffffffff81176f67>] wb_writeback+0x138/0x293
[194932.445507]  [<ffffffff8117759f>] bdi_writeback_workfn+0x19a/0x329
[194932.445513]  [<ffffffff8100d047>] ? load_TLS+0xb/0xf
[194932.445519]  [<ffffffff81065d2e>] process_one_work+0x195/0x2d2
[194932.445523]  [<ffffffff8106624a>] worker_thread+0x136/0x205
[194932.445526]  [<ffffffff81066114>] ? rescuer_thread+0x27a/0x27a
[194932.445530]  [<ffffffff8106b467>] kthread+0xae/0xb6
[194932.445534]  [<ffffffff8106b3b9>] ? __kthread_parkme+0x61/0x61
[194932.445537]  [<ffffffff81625afc>] ret_from_fork+0x7c/0xb0
[194932.445540]  [<ffffffff8106b3b9>] ? __kthread_parkme+0x61/0x61
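
Both hung-task reports above end up in lock_page() called from extent_write_cache_pages() under btrfs_writepages(): one via fdatasync(), the other via the writeback worker. When a capture like this is long or partial, a small script can pull out each blocked task and the call-trace functions the tasks have in common. The following is a minimal Python sketch, assuming the `INFO: task NAME:PID blocked for more than N seconds` header and `[<address>] function+offset/size` frame format shown above; `parse_hung_tasks()` and `common_frames()` are hypothetical helpers, not an existing kernel tool. Frames prefixed with `?` are unreliable stack guesses and are skipped.

```python
import re
from dataclasses import dataclass, field

# Header line, e.g.
# [194932.445153] INFO: task IndexedDB:29612 blocked for more than 120 seconds.
# The greedy (?P<comm>.+) backtracks to the last ":PID", so task names such as
# "kworker/u16:35" that contain a colon are kept intact.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# Call-trace frame, e.g.
# [194932.445232]  [<ffffffff81240c41>] lock_page+0x1e/0x21
# A "? " before the function name marks an unreliable frame.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\] (?P<unreliable>\? )?(?P<func>[\w.]+)\+0x")


@dataclass
class HungTask:
    comm: str
    pid: int
    seconds: int
    frames: list = field(default_factory=list)  # innermost call first


def parse_hung_tasks(lines):
    """Group reliable call-trace frames under the most recent hung-task header."""
    tasks = []
    current = None
    for line in lines:
        header = HUNG_RE.search(line)
        if header:
            current = HungTask(header["comm"], int(header["pid"]), int(header["secs"]))
            tasks.append(current)
            continue
        frame = FRAME_RE.search(line)
        if frame and current is not None and not frame["unreliable"]:
            current.frames.append(frame["func"])
    return tasks


def common_frames(tasks):
    """Functions present in every task's trace, in the first task's order."""
    if not tasks:
        return []
    shared = set(tasks[0].frames)
    for task in tasks[1:]:
        shared &= set(task.frames)
    return [func for func in tasks[0].frames if func in shared]
```

Run over the two reports above, `common_frames()` should return the shared path from `io_schedule` through `lock_page`, `extent_writepages` and `btrfs_writepages` down to `do_writepages`, i.e. the code path both tasks were stuck in.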

-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: 3.15-rc5 deadlocked a 2nd time after I was copying photos from an sdcard + common code path that deadlocks all btrfs filesystems
  2014-05-19 13:49 3.15-rc5 deadlocked a 2nd time after I was copying photos from an sdcard + common code path that deadlocks all btrfs filesystems Marc MERLIN
@ 2014-06-17  6:29 ` Satoru Takeuchi
  2014-06-17 14:40   ` Marc MERLIN
  2014-06-17 14:59   ` frustrations with handling of crash reports Marc MERLIN
  0 siblings, 2 replies; 12+ messages in thread
From: Satoru Takeuchi @ 2014-06-17  6:29 UTC (permalink / raw)
  To: Marc MERLIN, linux-btrfs

Hi Marc,

(2014/05/19 22:49), Marc MERLIN wrote:
> Ok, that's 2 out of 2.
>
> I was copying pictures from an sdcard (through mmcblk0), and the
> filesystem deadlocked.
>
> Unfortunately, when this happened, I copied my pictures (which were still
> in RAM) to my 2nd drive which was also btrfs.

From your sysrq capture, your SD card appears to be formatted as VFAT; is that correct?

===
[194790.138156] FAT-fs (mmcblk0p1): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!
===

> I had to reboot, and of course the last pictures didn't get committed to
> disk, but more annoyingly the copy I did to the second drive didn't work
> either.
> All the filenames got copied to the 2nd drive, some ended up with data,
> and others ended up empty.
> Why does a deadlock on drive 1 also cause btrfs to fail to write to
> drive #2?
> This is not the first time; there seem to be common code paths across all
> drives (just like disk array #1 having problems causing failure of
> syslog to work on the boot drive with btrfs).
>
> I tried to capture sysrq+w, but it didn't make it to disk because of that bug.
> I do have remote syslog of the hangs before that though, but the capture of sysrq+w
> has too much missing data to be useful:
> http://marc.merlins.org/tmp/btrfs-hang.txt

quoted from btrfs-hang.txt:
===
[194790.140892] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
===

Did you try fsck.vfat? In addition, does this problem still happen
after that? Trying to reproduce it with 3.16-rc1 would also be desirable.

If it's easy to reproduce, trying the following would be useful for
finding the root cause:

  - run fsck.vfat (as I described before),
  - change the SD card,
  - change the copy target to a filesystem other than btrfs.

Thanks,
Satoru




* Re: 3.15-rc5 deadlocked a 2nd time after I was copying photos from an sdcard + common code path that deadlocks all btrfs filesystems
  2014-06-17  6:29 ` Satoru Takeuchi
@ 2014-06-17 14:40   ` Marc MERLIN
  2014-06-17 14:59   ` frustrations with handling of crash reports Marc MERLIN
  1 sibling, 0 replies; 12+ messages in thread
From: Marc MERLIN @ 2014-06-17 14:40 UTC (permalink / raw)
  To: Satoru Takeuchi; +Cc: linux-btrfs

On Tue, Jun 17, 2014 at 03:29:19PM +0900, Satoru Takeuchi wrote:
> Hi Marc,
> 
> (2014/05/19 22:49), Marc MERLIN wrote:
> >Ok, that's 2 out of 2.
> >
> >I was copying pictures from an sdcard (through mmcblk0), and the
> >filesystem deadlocked.
> >
> >Unfortunately, when this happened, I copied my pictures (which were still
> >in RAM) to my 2nd drive which was also btrfs.
> 
> From your sysrq capture, your SD card appears to be formatted as VFAT; is that correct?

Yes, typical camera sdcard :)
 
> ===
> [194790.140892] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some 
> data may be corrupt. Please run fsck.
> ===
> 
> Did you try fsck.vfat? In addition, does this problem still happen
> after that? Trying to reproduce it with 3.16-rc1 would also be desirable.

That was almost a month ago. The card has been reformatted since then, but
the problem was not with the sdcard or vfat FS. All the data was read fine,
ended up in the page cache, and btrfs failed to actually commit it to disk.
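
The symptom described at the start of the thread (all file names present on the second drive, but some files empty) is consistent with the new directory entries reaching disk while the file data was still only in the page cache. A copy can only be treated as safe once fsync() on the file, and on the directory holding the new name, has returned. Below is a minimal Python sketch of that check using only the standard library; `durable_copy()` and `copy_all()` are hypothetical names. It does not help when fsync() itself never returns, as with the fdatasync() call blocked in the first trace of this thread, but it does leave a record of which copies were confirmed before a hang.

```python
import os
import shutil


def durable_copy(src, dst):
    """Copy src to dst; return only after the data and the new name are on disk."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()             # move Python's buffer into the kernel page cache
        os.fsync(fout.fileno())  # ask the kernel to write the file data out
    # The directory entry (the file name) is separate metadata, so sync the
    # parent directory too. Opening a directory like this works on Linux/POSIX.
    dir_fd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def copy_all(pairs):
    """Copy (src, dst) pairs and return the destinations confirmed on disk."""
    confirmed = []
    for src, dst in pairs:
        try:
            durable_copy(src, dst)
        except OSError as exc:
            print(f"NOT confirmed: {dst}: {exc}")
            continue
        confirmed.append(dst)
        # Printed as we go, so a later hang still leaves a record of what finished.
        print(f"synced: {dst}", flush=True)
    return confirmed
```

The design choice is to report success per file only after both syncs return, rather than after the whole batch, since a single stuck writeback would otherwise hide which files were already safe.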
 
> If it's easy to reproduce, trying the following would be useful for
> finding the root cause:
> 
>  - run fsck.vfat (as I described before),
>  - change the SD card,
>  - change the copy target to a filesystem other than btrfs.

I wish I could reproduce this at will, but I can't. In some way, that's good
since I lost actual pictures (from Japan at the time) each time this
happened.

Either way, thanks for having a look.

I'll answer the rest in another message since it warrants another thread.

Marc


* frustrations with handling of crash reports
  2014-06-17  6:29 ` Satoru Takeuchi
  2014-06-17 14:40   ` Marc MERLIN
@ 2014-06-17 14:59   ` Marc MERLIN
  2014-06-17 18:27     ` Marc MERLIN
  1 sibling, 1 reply; 12+ messages in thread
From: Marc MERLIN @ 2014-06-17 14:59 UTC (permalink / raw)
  To: Satoru Takeuchi; +Cc: linux-btrfs, Chris Mason

On Tue, Jun 17, 2014 at 03:29:19PM +0900, Satoru Takeuchi wrote:
> after that? Trying to reproduce it with 3.16-rc1 would also be desirable.

As for 3.16-rc1, here is my problem. This is not targeted at you; it's just a
general, unfortunate observation from my last few months of btrfs testing and
reports.

I use btrfs on real systems. I have backups, but crashes are both
inconvenient and time-consuming, since these are my real data and systems
that I need and use: a crash at the wrong time can be quite inconvenient and
can cost me hours of recovery time.
I get to pick between staying 1 or 2 kernel versions back and being told
that all my reports are useless because the kernel is too old, and running
unstable kernels and unfortunately still having at least 50% of my reports
not looked at.

I signed up for that when using btrfs, but at least 50% of the time, when I
went through this and reported the problems (which took more time since
it's not a test system in a VM or with a serial console), the reports were
ignored, or looked like they were.
The other half of the time, I was indeed told to use an even more
unstable/unproven version of btrfs, assuming the one I was running wasn't
already unstable/unproven enough.

There is no right answer here; I understand that in their limited time
developers are working on new code or maybe fixing existing code.
However, if they are interested in real users with real data who are brave
enough to run recent code, my wish would be for more support and timely
interest when severe hangs or corruption are reported.

Case in point: last week I reported a FS that oopsed btrfs and, worse,
crashed 3.15 (the problem is still there, but the symptom is worse in
3.15), and got no answer from anyone caring about the filesystem.
I asked a 2nd time before deleting it; no one answered.

It took me 2h during my work day (which isn't supposed to be related to
testing btrfs) to even capture that, and now I regret having bothered at
all, because it looks like no one cared.
Next time it happens, I likely won't waste my time reporting it; I'll just
get back to work, potentially reverting the FS to something other than btrfs
:(

Similarly I've seen other posts from people reporting corruption, data loss,
and getting no answer or feedback at all.

I realize that the developers can't spend hours of personal time chasing each
(sometimes incomplete) report sent to the list, but my point is that if user
reports of crashes and data loss seemingly get so little attention, this is
going to put off a lot of early users who will get burnt, remember the bad
experience, and not come back. I already know some who have, some of whom
have even told me "why do you even still bother using btrfs for your data".

Btrfs is labelled as experimental; no one has a right to complain if data is
lost, but my suggestion is for developers to allocate a bit more time to
looking at user reports, especially when the users spent time getting the
crash data and trying to give useful information.

It is also ok to answer "Any FS created or used before kernel 3.x can be
corrupted due to bugs we fixed in 3.y, thank you for your report but it's
not a good use of our time to investigate this"
(although newer kernels should not just crash with BUG(xxx) on unexpected
data, they should remount the FS read only).

Maybe it would make sense for some developers to clean those up, and do some
kind of unofficial rotation of list monitoring to gather important reports
from users and act on the ones that have useful data or at least get back to
users who reported a crash on code that is known to have corruption or
deadlock problems that were fixed in newer kernels.

Again, this was not targeted at your answer, I do thank you for trying to
help.
Thanks to all for reading.

Marc


* Re: frustrations with handling of crash reports
  2014-06-17 14:59   ` frustrations with handling of crash reports Marc MERLIN
@ 2014-06-17 18:27     ` Marc MERLIN
  2014-06-18 13:23       ` Konstantinos Skarlatos
  0 siblings, 1 reply; 12+ messages in thread
From: Marc MERLIN @ 2014-06-17 18:27 UTC (permalink / raw)
  To: Satoru Takeuchi; +Cc: linux-btrfs, Chris Mason

On Tue, Jun 17, 2014 at 07:59:57AM -0700, Marc MERLIN wrote:
> It is also ok to answer "Any FS created or used before kernel 3.x can be
> corrupted due to bugs we fixed in 3.y, thank you for your report but it's
> not a good use of our time to investigate this"
> (although newer kernels should not just crash with BUG(xxx) on unexpected
> data, they should remount the FS read only).

I was thinking about this some more, and I know I have no right to tell
others what to do, so take this as a mere suggestion :)

How about doing a release with cleanups and stabilization and better state
reporting when things go wrong?

This would give a known-good version for users who have actual data and
backups that can take many hours or days to restore (never mind downtime).

A few things I was thinking about:
1) Wouldn't it be a good time to replace all the BUG_ON statements with
appropriate error handling? Unexpected data can happen; the kernel shouldn't
crash because of it.
At the very least it should remount read-only and maybe give the user a wiki
link on what to do next (some bug reporting and recovery page).

2) In unexpected cases, output basic information about the filesystem, or printk
instructions to the user on how to gather data that would be sent to the
list to be reviewed.
This would include information on how old the filesystem is when it's
possible to detect, and the instruction page could say "sorry, anything
older than X, we don't want to hear about, we already fixed corruption bugs
since then"

3) Getting printk data on an end-user machine when it has just started refusing
to write to disk can be challenging and can cause useful debug info to be lost.
Things I was thinking about:
a) make sure most btrfs bugs do not just hang the kernel
b) recommend that users send kernel syslog messages to an ext4 partition
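
To make suggestions 1) and 2) concrete, here is a toy userspace model of the difference between BUG_ON() and the proposed behaviour. It is a Python sketch, not btrfs or kernel code; `ToyFs`, `check_item()` and `EXPECTED_TYPES` are hypothetical. Instead of halting on unexpected metadata, it records what was found and where (device, sector and, if known, the file), then switches itself to read-only, so later writes fail with EROFS while the rest of the system keeps running.

```python
import errno
from dataclasses import dataclass, field

# Hypothetical set of metadata item types this toy filesystem understands.
EXPECTED_TYPES = {"inode", "dir_item", "extent_data"}


@dataclass
class ToyFs:
    read_only: bool = False
    reports: list = field(default_factory=list)

    def check_item(self, item_type, device, sector, path=None):
        """Validate one metadata item.

        A BUG_ON()-style check would halt here. This model instead records
        what was found and where, then degrades to read-only.
        """
        if item_type in EXPECTED_TYPES:
            return True
        where = f"{device} sector {sector}"
        if path is not None:
            where += f" (file {path})"
        self.reports.append(f"unexpected item type {item_type!r} on {where}")
        self.read_only = True
        return False

    def write(self, path, data):
        """Pretend to write; refuse with EROFS once the fs is read-only."""
        if self.read_only:
            raise OSError(errno.EROFS, "filesystem forced read-only", path)
        return len(data)
```

In a real kernel the equivalent means returning an error up the call chain and marking the filesystem read-only, which is more work than a one-line BUG_ON() because every caller has to handle the error.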

How does that sound?

Thanks,
Marc


* Re: frustrations with handling of crash reports
  2014-06-17 18:27     ` Marc MERLIN
@ 2014-06-18 13:23       ` Konstantinos Skarlatos
  2014-06-18 21:22         ` Duncan
  0 siblings, 1 reply; 12+ messages in thread
From: Konstantinos Skarlatos @ 2014-06-18 13:23 UTC (permalink / raw)
  To: Marc MERLIN, Satoru Takeuchi; +Cc: linux-btrfs, Chris Mason, torvalds

On 17/6/2014 9:27 PM, Marc MERLIN wrote:
> On Tue, Jun 17, 2014 at 07:59:57AM -0700, Marc MERLIN wrote:
>> It is also ok to answer "Any FS created or used before kernel 3.x can be
>> corrupted due to bugs we fixed in 3.y, thank you for your report but it's
>> not a good use of our time to investigate this"
>> (although newer kernels should not just crash with BUG(xxx) on unexpected
>> data, they should remount the FS read only).
> I was thinking about this some more, and I know I have no right to tell
> others what to do, so take this as a mere suggestion :)
>
> How about doing a release with cleanups and stabilization and better state
> reporting when things go wrong?
>
> This would give a known-good version for users who have actual data and
> backups that can take many hours or days to restore (never mind downtime).
>
> A few things I was thinking about:
> 1) Wouldn't it be a good time to replace all the BUG_ON statements with
> appropriate error handling? Unexpected data can happen; the kernel shouldn't
> crash because of it.
> At the very least it should remount read-only and maybe give the user a wiki
> link on what to do next (some bug reporting and recovery page).
>
> 2) In unexpected cases, output basic information about the filesystem, or printk
> instructions to the user on how to gather data that would be sent to the
> list to be reviewed.
> This would include information on how old the filesystem is when it's
> possible to detect, and the instruction page could say "sorry, anything
> older than X, we don't want to hear about, we already fixed corruption bugs
> since then"
>
> 3) Getting printk data on an end-user machine when it has just started refusing
> to write to disk can be challenging and can cause useful debug info to be lost.
> Things I was thinking about:
> a) make sure most btrfs bugs do not just hang the kernel
> b) recommend that users send kernel syslog messages to an ext4 partition
>
> How does that sound?
I 100% agree with this. I also have a problem where btrfs decides to 
BUG_ON and force a kernel panic because it has found an unexpected type 
of metadata. Although in my case I was luckier and had help and test
patches from Liu Bo, I am still of the opinion that btrfs should not 
take down a whole system because it found something unexpected.

I guess that btrfs developers have put these BUG_ONs so that they get 
reports from users when btrfs gets into these unexpected situations. But
if most of these reports are ignored or not resolved, then maybe there
is no use for these BUG_ONs and they should be replaced with something
milder.

Keep in mind that if a system panics, then the only way to get logs from 
it is with serial or netconsole, so BUG_ON really makes it much harder 
for users to know what happened and send reports, and only the most 
technical and determined users will manage to send reports here. So I
would guess that the real number of kernel panics due to btrfs is much
higher, and most people are unable to report them, because they _never 
know_ that it was btrfs that caused their crash.

I know btrfs is still experimental, but it has been in the kernel since
2009-01-09, so I think most users have some expectation of stability
after something has been in the mainline kernel for 5.5 years.

So my suggestion is basically the same as Marc's:

These BUG_ONs should be replaced with something that does not crash the 
system and gives out as much info as possible, so that users do not have 
to get here and ask for a debugging patch.  After all, btrfs is still 
experimental, right? :)

Furthermore, these problems should either remount the fs read-only, or
try to make the implicated file read-only and report the filename, so
users can delete it and continue with their lives without having to mkfs
every few months. Or even make fsck able to fix these, without choking on
a filesystem of a few TB because it wants to use ridiculous amounts of RAM.

In general, btrfs must get _much_ better at reporting what happened, 
which file was implicated and if it is a multiple disk fs, the disk 
where the problem is and the sector where that occurred.

PS.
I am not a kernel developer, so please be kind if I have said something 
completely wrong :)

>
> Thanks,
> Marc


-- 
Konstantinos Skarlatos



* Re: frustrations with handling of crash reports
  2014-06-18 13:23       ` Konstantinos Skarlatos
@ 2014-06-18 21:22         ` Duncan
  2014-06-19  8:56           ` Konstantinos Skarlatos
  2014-06-19 15:13           ` Marc MERLIN
  0 siblings, 2 replies; 12+ messages in thread
From: Duncan @ 2014-06-18 21:22 UTC (permalink / raw)
  To: linux-btrfs

Konstantinos Skarlatos posted on Wed, 18 Jun 2014 16:23:04 +0300 as
excerpted:

> I guess that btrfs developers have put these BUG_ONs so that they get
> reports from users when btrfs gets into these unexpected situations. But
> if most of these reports are ignored or not resolved, then maybe there
> is no use for these BUG_ONs and they should be replaced with something
> milder.
> 
> Keep in mind that if a system panics, then the only way to get logs from
> it is with serial or netconsole, so BUG_ON really makes it much harder
> for users to know what happened and send reports, and only the most
> technical and determined users will manage to send reports here.

In terms of the BUGONs, they've been converting them to WARNONs recently, 
exactly due to the point you and Marc have made.  Not being a dev and 
simply based on the patch-flow I've seen as btrfs has been basically 
behaving itself so far here[1], I had /thought/ that was more or less 
done (perhaps some really bad bug-ons left but only a few, and basically 
only where the kernel couldn't be sure it was in a logical enough state 
to continue writing to other filesystems too, so bugon being logical in 
that case), but based on you guys' comments there's apparently more to go.

So at least for BUGONs they agree.  I guess it's simply a matter of 
getting them all converted.

Tho at least in Marc's case, he's running kernels a couple back in some 
cases and they may still have BUGONs already replaced in the most current 
kernel.

As for experimental, they've been toning down and removing the warnings
recently.  Yes, the on-device format may come with some level of
compatibility guarantee now, so I do agree with that bit, but IMO anyway,
that warning should be replaced with more explicit language along the lines
of "the on-device format is now stable but the code is not yet entirely so,
so keep your backups, be prepared to use them, and run current kernels",
and that's not happening; ATM they're mostly just toning it down without
the explicit warning that is still needed.

---
[1] Btrfs (so far) behaving itself here: Possibly because my filesystems 
are relatively small and I don't use snapshots much and prefer several 
smaller independent filesystems rather than doing subvolumes, thus 
keeping the number of eggs in a single basket small.  Plus, with small 
filesystems on SSD, I can balance reasonably regularly, and I do full 
fresh mkfs.btrfs rounds every few kernels as well to take advantage of 
newer features, which may well have the result of killing smaller 
problems that aren't yet showing up before they get big enough to cause 
real issues.  Anyway, I'm not complaining! =:^)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: frustrations with handling of crash reports
  2014-06-18 21:22         ` Duncan
@ 2014-06-19  8:56           ` Konstantinos Skarlatos
  2014-06-19 15:06             ` Duncan
  2014-06-19 17:37             ` Chris Murphy
  2014-06-19 15:13           ` Marc MERLIN
  1 sibling, 2 replies; 12+ messages in thread
From: Konstantinos Skarlatos @ 2014-06-19  8:56 UTC
  To: Duncan, linux-btrfs

On 19/6/2014 12:22 AM, Duncan wrote:
> Konstantinos Skarlatos posted on Wed, 18 Jun 2014 16:23:04 +0300 as
> excerpted:
>
>> I guess that btrfs developers have put these BUG_ONs so that they get
>> reports from users when btrfs gets in these unexpected situations. But
>> if most of these reports are ignored or not resolved, then maybe there
>> is no use for these BUG_ONs and they should be replaced with something
>> more mild.
>>
>> Keep in mind that if a system panics, then the only way to get logs from
>> it is with serial or netconsole, so BUG_ON really makes it much harder
>> for users to know what happened and send reports, and only the most
>> technical and determined users will manage to send reports here.
> In terms of the BUGONs, they've been converting them to WARNONs recently,
> exactly due to the point you and Marc have made.  Not being a dev and
> simply based on the patch-flow I've seen as btrfs has been basically
> behaving itself so far here[1], I had /thought/ that was more or less
> done (perhaps some really bad bug-ons left but only a few, and basically
> only where the kernel couldn't be sure it was in a logical enough state
> to continue writing to other filesystems too, so a BUGON is logical in
> that case), but based on you guys' comments there's apparently more to go.
>
> So at least for BUGONs they agree.  I guess it's simply a matter of
> getting them all converted.
That's good to hear. But we should have a way to recover from these kinds 
of problems: first of all, having btrfs report the exact location, disk 
and file name that is affected, then making scrub fix it or at least 
report it, and finally making fsck work for this.

My filesystem, which consistently kernel panics when a specific logical 
address is read, passes scrub without anything bad reported. What's the 
use of scrub if it can't deal with this?

>
> Tho at least in Marc's case, he's in some cases running kernels a couple
> of releases back, and those may still have BUGONs that have already been
> replaced in the most current kernel.
>
> As for experimental, they've been toning down and removing the warnings
> recently.  Yes, the on-device format may come with some level of
> compatibility guarantee now, so I do agree with that bit, but IMO anyway,
> that warning should be replaced with more explicit language along the
> lines of "the on-device format is now stable but the code is not yet
> entirely so, so keep your backups and be prepared to use them, and run
> current kernels", and that's not happening.  ATM they're mostly just
> toning it down without keeping any explicit warnings.
>
> ---
> [1] Btrfs (so far) behaving itself here: Possibly because my filesystems
> are relatively small and I don't use snapshots much and prefer several
> smaller independent filesystems rather than doing subvolumes, thus
> keeping the number of eggs in a single basket small.  Plus, with small
> filesystems on SSD, I can balance reasonably regularly, and I do full
> fresh mkfs.btrfs rounds every few kernels as well to take advantage of
> newer features, which may well have the result of killing smaller
> problems that aren't yet showing up before they get big enough to cause
> real issues.  Anyway, I'm not complaining! =:^)
Well, my use case is about 25 filesystems on rotating disks, 20 of them 
on single disks, and the rest multiple-disk filesystems, either 
raid1 or single. I have many subvolumes and in some cases thousands of 
snapshots, but no databases, systemd and the like on them. Of course I 
have everything backed up, <nag mode on> but I believe that after all 
those years of development I shouldn't still be forced to do mkfs every 6 
months or so when I use no new features. </nag mode off>
>


-- 
Konstantinos Skarlatos



* Re: frustrations with handling of crash reports
  2014-06-19  8:56           ` Konstantinos Skarlatos
@ 2014-06-19 15:06             ` Duncan
  2014-06-19 15:19               ` Duncan
  2014-06-19 17:37             ` Chris Murphy
  1 sibling, 1 reply; 12+ messages in thread
From: Duncan @ 2014-06-19 15:06 UTC
  To: linux-btrfs

Konstantinos Skarlatos posted on Thu, 19 Jun 2014 11:56:59 +0300 as
excerpted:

> That's good to hear. But we should have a way to recover from these kinds
> of problems: first of all, having btrfs report the exact location, disk
> and file name that is affected, then making scrub fix it or at least
> report it, and finally making fsck work for this.
> 
> My filesystem, which consistently kernel panics when a specific logical
> address is read, passes scrub without anything bad reported. What's the
> use of scrub if it can't deal with this?

Scrub detects (and potentially fixes) exactly one sort of problem (tho 
that one can definitely cause others), and that's not it.

On btrfs, what scrub does is exactly this:  (a) Scrub calculates the 
checksums for all data and metadata blocks and compares them against the 
recorded checksums, reporting any mismatches. (b) Where the checksums 
don't match up, if there's another copy of the data that /does/ checksum-
validate, scrub will "scrub" the bad copy, replacing it with a duplicate 
of the good one.
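The (a)/(b) description above can be sketched as a toy Python model. Everything here is hypothetical: `Block`, `scrub` and the in-memory copies are illustrative stand-ins, not the kernel's structures (the real code lives in fs/btrfs/scrub.c), and zlib.crc32 substitutes for the crc32c checksum btrfs actually uses.

```python
import zlib
from dataclasses import dataclass


def checksum(data: bytes) -> int:
    # Stand-in only: btrfs uses crc32c, but Python's standard library
    # offers just the plain CRC-32 via zlib.crc32.
    return zlib.crc32(data)


@dataclass
class Block:
    # Hypothetical toy model: one logical block, its recorded checksum,
    # and one bytearray per physical copy (1 for single, 2 for dup/raid1).
    recorded_csum: int
    copies: list


def scrub(blocks):
    """Return (errors_found, errors_corrected, uncorrectable) for the blocks."""
    found = corrected = uncorrectable = 0
    for blk in blocks:
        # Step (a): recompute each copy's checksum and compare to the record.
        good = [c for c in blk.copies if checksum(bytes(c)) == blk.recorded_csum]
        bad = [c for c in blk.copies if checksum(bytes(c)) != blk.recorded_csum]
        found += len(bad)
        if not bad:
            continue
        if good:
            # Step (b): overwrite each bad copy in place with a good one.
            for c in bad:
                c[:] = good[0]
            corrected += len(bad)
        else:
            # No copy validates: the error is detected but cannot be fixed.
            uncorrectable += len(bad)
    return found, corrected, uncorrectable


payload = b"some metadata"
csum = checksum(payload)
dup_block = Block(csum, [bytearray(payload), bytearray(b"some metadXta")])
single_block = Block(csum, [bytearray(b"some metadXta")])
print(scrub([dup_block, single_block]))  # (2, 1, 1)
```

With two copies the bad one gets rewritten; with a single copy the mismatch is only reported, which is the single-data limitation discussed next.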

As it happens, on a (non-ssd) single-device filesystem, btrfs defaults to 
single data, dup metadata.  In that case there's a second, hopefully 
valid, copy of the metadata blocks that can be used to correct a bad 
copy.  But there's only a single copy of data blocks so while scrub can 
detect data-block errors, it won't be able to fix them.

On a multi-device filesystem, btrfs defaults to raid1 metadata (with only 
two copies regardless of the number of devices present; N-way mirroring 
is roadmapped but not yet implemented) and single data.  So again, hopefully 
the second copy of a bad metadata block is valid and can be used to scrub 
the bad one, but just as with the single-device case, scrub can detect but 
not fix data checksum errors.

Tho of course in the multi-device case it's possible to set data to raid1 
as well, and that's what I've done here so it too can be error-corrected 
from a hopefully good second copy.  (Raid10 is similarly protected.  
Raid5/6 should work a bit differently, with parity, but last I knew raid56 
scrub and recovery weren't fully implemented yet, leaving raid1 and raid10, 
along with dup mode for single-device metadata only, as the error-
correcting choices.)
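The profile rules above boil down to counting copies. A small Python sketch of that reasoning follows; the `COPIES` table and both functions are hypothetical helpers written for illustration, not btrfs or btrfs-progs APIs, and they encode only the cases described in this message.

```python
# Full copies kept by each profile, as described above. raid5/6 are left
# out: they use parity rather than full copies, and per the thread their
# scrub/recovery support was not complete at the time.
COPIES = {"single": 1, "dup": 2, "raid1": 2, "raid10": 2}


def default_profiles(num_devices: int) -> dict:
    """Defaults as described in the thread (mid-2014 btrfs).

    Assumption: the single-device case is a non-SSD device; the SSD
    default is not modelled here.
    """
    if num_devices < 1:
        raise ValueError("need at least one device")
    if num_devices == 1:
        return {"data": "single", "metadata": "dup"}
    return {"data": "single", "metadata": "raid1"}


def scrub_can_repair(profile: str) -> bool:
    # Scrub can only replace a bad copy if a second copy exists to draw on.
    return COPIES[profile] >= 2


for n in (1, 2):
    prof = default_profiles(n)
    print(n, {kind: scrub_can_repair(p) for kind, p in prof.items()})
# 1 {'data': False, 'metadata': True}
# 2 {'data': False, 'metadata': True}
```

In both default layouts only metadata is repairable; explicitly choosing raid1 or raid10 for data on a multi-device filesystem is what makes `scrub_can_repair` true for data as well.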

But if the problem is a btrfs logic error, such that the (meta)data that 
was actually checksummed and written out was bad before it was ever 
checksummed in the first place, then scrub won't do a thing for it, 
because the checksum validates just fine, it's just that it's a perfectly 
valid checksum on perfectly invalid (meta)data.
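A toy Python illustration of "a perfectly valid checksum on perfectly invalid (meta)data" follows. The one-field header layout and the `write_header`, `scrub_ok` and `consistent` functions are hypothetical; zlib.crc32 again stands in for btrfs's crc32c.

```python
import struct
import zlib


def write_header(num_items: int):
    """Serialize a toy metadata header and checksum it, as a writer would."""
    payload = struct.pack("<I", num_items)
    return payload, zlib.crc32(payload)


def scrub_ok(payload: bytes, csum: int) -> bool:
    # What scrub verifies: the bytes still match their recorded checksum.
    return zlib.crc32(payload) == csum


def consistent(payload: bytes, actual_items: int) -> bool:
    # What a consistency checker would verify: the header agrees with
    # the rest of the (toy) filesystem.
    (num_items,) = struct.unpack("<I", payload)
    return num_items == actual_items


actual_items = 3
# Simulated logic bug: the wrong count is computed *before* checksumming.
payload, csum = write_header(actual_items + 1)
print(scrub_ok(payload, csum))            # True: the checksum validates
print(consistent(payload, actual_items))  # False: the contents are wrong
```

Because the checksum is computed over bytes that were already wrong, only a check of the contents against the rest of the filesystem can catch the error.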




* Re: frustrations with handling of crash reports
  2014-06-18 21:22         ` Duncan
  2014-06-19  8:56           ` Konstantinos Skarlatos
@ 2014-06-19 15:13           ` Marc MERLIN
  1 sibling, 0 replies; 12+ messages in thread
From: Marc MERLIN @ 2014-06-19 15:13 UTC
  To: Duncan; +Cc: linux-btrfs

On Wed, Jun 18, 2014 at 09:22:50PM +0000, Duncan wrote:
> Tho at least in Marc's case, he's in some cases running kernels a couple 
> of releases back, and those may still have BUGONs that have already been 
> replaced in the most current kernel.
 
The machine I originally had that one last bug on (balance crash) was running an
Ubuntu kernel (oldish 3.13), but I reproduced it with 3.15.1, where it got worse.
It seemed like a WARN on 3.13 and a BUG_ON in 3.15: with 3.13 I got syslog
output and the system kept running, whereas with 3.15.1 it just crashed and I had
to have netconsole ready to catch the output.

Marc 
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/  


* Re: frustrations with handling of crash reports
  2014-06-19 15:06             ` Duncan
@ 2014-06-19 15:19               ` Duncan
  0 siblings, 0 replies; 12+ messages in thread
From: Duncan @ 2014-06-19 15:19 UTC
  To: linux-btrfs

Duncan posted on Thu, 19 Jun 2014 15:06:00 +0000 as excerpted:

> Scrub detects (and potentially fixes) exactly one sort of problem (tho
> that one can definitely cause others), and that's not it.

Hmm. Last phrase was ambiguous.

What I meant was, that problem (your problem) is not the sort of problem 
scrub detects and potentially fixes.

NOT: that's not /all/ of what scrub does. (... Which wouldn't make sense 
in context, but that's how I initially tried to read it when I reread 
what I posted, thus confusing myself, and if even *I* get confused 
reading my own writing...! =:^( )




* Re: frustrations with handling of crash reports
  2014-06-19  8:56           ` Konstantinos Skarlatos
  2014-06-19 15:06             ` Duncan
@ 2014-06-19 17:37             ` Chris Murphy
  1 sibling, 0 replies; 12+ messages in thread
From: Chris Murphy @ 2014-06-19 17:37 UTC
  To: Btrfs BTRFS


On Jun 19, 2014, at 2:56 AM, Konstantinos Skarlatos <k.skarlatos@gmail.com> wrote:
> 
> My filesystem, which consistently kernel panics when a specific logical address is read, passes scrub without anything bad reported. What's the use of scrub if it can't deal with this?

The myriad repair tools (automatic repair at mount, the recovery mount option, scrub, check/repair, btrfs-zero-log, chunk-recover, super-recover) certainly make Btrfs significantly more challenging to troubleshoot for the user familiar with other file systems. I think this is just a maturity issue, and as the necessary logic of repairing a file system reveals itself, I think we'll see consolidation and more automation of these repair methods.

The comments in fs/btrfs/scrub.c describe what it does now, as well as future enhancements.

" … reads all
 * extent and super block and verifies the checksums. In case a bad checksum
 * is found or the extent cannot be read, good data will be written back if
 * any can be found."

Scrub is pretty much just about checksum verification. It doesn't check file system consistency. So the file system could be inconsistent and a scrub still comes up clean.

> Well, my use case is about 25 filesystems on rotating disks, 20 of them on single disks, and the rest multiple-disk filesystems, either raid1 or single. I have many subvolumes and in some cases thousands of snapshots, but no databases, systemd and the like on them.

That's a lot of subvolumes and snapshots. I'm not sure this is expected to work really well right now; hundreds, yes, but with thousands there have been some known problems, at least in the recent past.


> Of course I have everything backed up, <nag mode on> but I believe that after all those years of development I shouldn't still be forced to do mkfs every 6 months or so when I use no new features. </nag mode off>

The problem is that an old file system implies many kernels doing different kinds of reads and writes over time, making a given file system rather non-deterministic compared to any other. So the possible problems aren't all known and therefore the way to fix them may not be known yet either.


Chris Murphy



Thread overview: 12+ messages
2014-05-19 13:49 3.15-rc5 deadlocked a 2nd time after I was copying photos from an sdcard + common code path that deadlocks all btrfs filesystems Marc MERLIN
2014-06-17  6:29 ` Satoru Takeuchi
2014-06-17 14:40   ` Marc MERLIN
2014-06-17 14:59   ` frustrations with handling of crash reports Marc MERLIN
2014-06-17 18:27     ` Marc MERLIN
2014-06-18 13:23       ` Konstantinos Skarlatos
2014-06-18 21:22         ` Duncan
2014-06-19  8:56           ` Konstantinos Skarlatos
2014-06-19 15:06             ` Duncan
2014-06-19 15:19               ` Duncan
2014-06-19 17:37             ` Chris Murphy
2014-06-19 15:13           ` Marc MERLIN
