From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from fgwmail2.fujitsu.co.jp ([164.71.1.135]:40460 "EHLO fgwmail2.fujitsu.co.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751127AbaFQG3u (ORCPT ); Tue, 17 Jun 2014 02:29:50 -0400 Received: from kw-mxq.gw.nic.fujitsu.com (unknown [10.0.237.131]) by fgwmail2.fujitsu.co.jp (Postfix) with ESMTP id 37E533EE0CD for ; Tue, 17 Jun 2014 15:29:49 +0900 (JST) Received: from s2.gw.fujitsu.co.jp (s2.gw.nic.fujitsu.com [10.0.50.92]) by kw-mxq.gw.nic.fujitsu.com (Postfix) with ESMTP id 65D36AC0920 for ; Tue, 17 Jun 2014 15:29:48 +0900 (JST) Received: from g01jpfmpwkw03.exch.g01.fujitsu.local (g01jpfmpwkw03.exch.g01.fujitsu.local [10.0.193.57]) by s2.gw.fujitsu.co.jp (Postfix) with ESMTP id E13E71DB803C for ; Tue, 17 Jun 2014 15:29:47 +0900 (JST) Message-ID: <539FE03F.5030306@jp.fujitsu.com> Date: Tue, 17 Jun 2014 15:29:19 +0900 From: Satoru Takeuchi MIME-Version: 1.0 To: Marc MERLIN , Subject: Re: 3.15-rc5 deadlocked a 2nd time after I was copying photos from an sdcard + common code path that deadlocks all btrfs filesystems References: <20140519134915.GA27432@merlins.org> In-Reply-To: <20140519134915.GA27432@merlins.org> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed Sender: linux-btrfs-owner@vger.kernel.org List-ID: Hi Marc, (2014/05/19 22:49), Marc MERLIN wrote: > Ok, that's 2 out of 2. > > I was copying pictures from an sdcard (through mmcblk0), and the > filesystem deadlocked. > > Unfortunately, when this happens, I copied my pictures (which were still > in RAM) to my 2nd drive which was also btrfs. From your sysrq capture, your sd card is formatted as VFAT, is it correct? === [194790.138156] FAT-fs (mmcblk0p1): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive! === > I had to reboot, and of course the last pictures didn't get committed to > disk, but more annoyingly the copy I did to the second drive didn't work > either. > All the filenames got copied to the 2nd drive, some ended up with data, > and others ended up empty. > Why does a deadlock on drive 1 also cause btrfs to fail to write to > drive #2? > This is not the first time, there seem to be common codepaths across all > drives (just like disk array #1 having problems causing failure of > syslog to work on the boot drive with btrfs). > > I tried to capture sysrq+w, but it didn't make it to disk because of that bug. > I do have remote syslog of the hangs before that though, but the capture of sysrq+w > has too much missing data to be useful > http://marc.merlins.org/tmp/btrfs-hang.txt quoted from btrfs-hang.txt: === [194790.140892] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck. === Did you try mkfs.fsck? In addition, does this problem happen after that? Here try to reproduce with 3.16-rc1 is desirable. If it's easy to reproduce, - run fsck.vfat (as I described before), - change SD card, - change copy target to other filesystem than btrfs is useful to find out the root cause. Thanks, Satoru > > Mmmh, maybe the deadlock is more complicated. I had a 2nd syslog stream > going to an ext4 filesystem, exactly to get around that btrfs master > deadlock, and now I see that didn't work either. > > If sync hangs, and logging to an ext4 filesystem didn't work, am I > hitting another bug/hardware problem? > > Here's what I got at the end? > > > [194790.138156] FAT-fs (mmcblk0p1): utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive! > [194790.140892] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck. > [194932.445153] INFO: task IndexedDB:29612 blocked for more than 120 seconds. > [194932.445161] Tainted: G W 3.15.0-rc5-amd64-i915-preempt-20140216s1 #2 > [194932.445163] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [194932.445166] IndexedDB D ffff8800ccde8bc0 0 29612 5570 0x00000080 > [194932.445172] ffff8801b521fc30 0000000000000086 ffff8801b521fc00 ffff8801b521ffd8 > [194932.445178] ffff8801d622a450 00000000000141c0 ffff88041e3941c0 ffff8801d622a450 > [194932.445182] ffff8801b521fcd0 0000000000000002 ffffffff810fda1a ffff8801b521fc40 > [194932.445188] Call Trace: > [194932.445198] [] ? wait_on_page_read+0x3c/0x3c > [194932.445209] [] io_schedule+0x60/0x7a > [194932.445214] [] sleep_on_page+0xe/0x12 > [194932.445219] [] __wait_on_bit_lock+0x46/0x8a > [194932.445223] [] __lock_page+0x69/0x6b > [194932.445228] [] ? autoremove_wake_function+0x34/0x34 > [194932.445232] [] lock_page+0x1e/0x21 > [194932.445237] [] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c3 > [194932.445243] [] ? mutex_unlock+0x16/0x18 > [194932.445248] [] ? btrfs_file_aio_write+0x3e9/0x4b6 > [194932.445251] [] extent_writepages+0x4b/0x5c > [194932.445255] [] ? btrfs_submit_direct+0x3f4/0x3f4 > [194932.445262] [] btrfs_writepages+0x28/0x2a > [194932.445267] [] do_writepages+0x1e/0x2c > [194932.445272] [] __filemap_fdatawrite_range+0x55/0x57 > [194932.445277] [] filemap_fdatawrite_range+0x13/0x15 > [194932.445280] [] btrfs_sync_file+0xa8/0x2b3 > [194932.445286] [] ? __percpu_counter_add+0x8c/0xa6 > [194932.445292] [] vfs_fsync_range+0x18/0x22 > [194932.445296] [] vfs_fsync+0x1c/0x1e > [194932.445299] [] do_fsync+0x2c/0x4c > [194932.445303] [] SyS_fdatasync+0x13/0x17 > [194932.445308] [] system_call_fastpath+0x1a/0x1f > [194932.445395] INFO: task kworker/u16:35:3812 blocked for more than 120 seconds. > [194932.445398] Tainted: G W 3.15.0-rc5-amd64-i915-preempt-20140216s1 #2 > [194932.445400] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [194932.445403] kworker/u16:35 D 0000000000000000 0 3812 2 0x00000080 > [194932.445410] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1) > [194932.445414] ffff88003b647a00 0000000000000046 ffff88003b6479d0 ffff88003b647fd8 > [194932.445419] ffff88003b8ca590 00000000000141c0 ffff88041e3941c0 ffff88003b8ca590 > [194932.445423] ffff88003b647aa0 0000000000000002 ffffffff810fda1a ffff88003b647a10 > [194932.445427] Call Trace: > [194932.445432] [] ? wait_on_page_read+0x3c/0x3c > [194932.445437] [] schedule+0x73/0x75 > [194932.445441] [] io_schedule+0x60/0x7a > [194932.445445] [] sleep_on_page+0xe/0x12 > [194932.445450] [] __wait_on_bit_lock+0x46/0x8a > [194932.445454] [] __lock_page+0x69/0x6b > [194932.445458] [] ? autoremove_wake_function+0x34/0x34 > [194932.445461] [] lock_page+0x1e/0x21 > [194932.445465] [] extent_write_cache_pages.isra.16.constprop.32+0x10e/0x2c3 > [194932.445470] [] extent_writepages+0x4b/0x5c > [194932.445473] [] ? btrfs_submit_direct+0x3f4/0x3f4 > [194932.445479] [] ? preempt_count_add+0x77/0x8d > [194932.445483] [] btrfs_writepages+0x28/0x2a > [194932.445488] [] do_writepages+0x1e/0x2c > [194932.445492] [] __writeback_single_inode+0x7d/0x238 > [194932.445495] [] writeback_sb_inodes+0x1eb/0x339 > [194932.445499] [] __writeback_inodes_wb+0x74/0xb7 > [194932.445503] [] wb_writeback+0x138/0x293 > [194932.445507] [] bdi_writeback_workfn+0x19a/0x329 > [194932.445513] [] ? load_TLS+0xb/0xf > [194932.445519] [] process_one_work+0x195/0x2d2 > [194932.445523] [] worker_thread+0x136/0x205 > [194932.445526] [] ? rescuer_thread+0x27a/0x27a > [194932.445530] [] kthread+0xae/0xb6 > [194932.445534] [] ? __kthread_parkme+0x61/0x61 > [194932.445537] [] ret_from_fork+0x7c/0xb0 > [194932.445540] [] ? __kthread_parkme+0x61/0x61 >