* Re: Applications using fsync cause hangs for several seconds every few minutes
@ 2011-08-09 21:29 Andrew Guertin
  2011-08-12  1:13 ` Andrew Guertin
  2011-08-17 14:24 ` Andrew Guertin
  0 siblings, 2 replies; 24+ messages in thread
From: Andrew Guertin @ 2011-08-09 21:29 UTC (permalink / raw)
  To: linux-btrfs

On 06/21/2011 01:15 PM, Jan Stilow wrote:
> Hello,
> 
> Nirbheek Chauhan <nirbheek <at> gentoo.org> writes:
>> [...]
>>
>> Every few minutes, (I guess) when applications do fsync (firefox,
>> xchat, vim, etc), all applications that use fsync() hang for several
>> seconds, and applications that use general IO suffer extreme
>> slowdowns. iotop shows various combinations of the processes listed
>> below doing writes, and the total write as 2-3MB/s.
>>
>> [btrfs-dealloc-]
>> [btrfs-submit-0]
>> [btrfs-transacti]
>> [btrfs-endio-wri]
>> [flush-btrfs-1]
> 
> I'm using btrfs under a 2.6.39-ARCH kernel and run into the same issue.
> 
> In my case [btrfs-submit-0] and [btrfs-transacti] show up in iotop
> and produce 99% of the IO while an application is frozen, for something
> like 10 to 30 seconds.
> 
> [...]

I see the same issue. I have bisected it to

4e69b598f6cfb0940b75abf7e179d6020e94ad1e is the first bad commit
commit 4e69b598f6cfb0940b75abf7e179d6020e94ad1e
Author: Josef Bacik <josef@redhat.com>
Date:   Mon Mar 21 10:11:24 2011 -0400

Btrfs: cleanup how we setup free space clusters

...which came in between 2.6.38 and 2.6.39.

The newest kernel I have tried was 3.0-rc7, which still had the bug. I
have not tried 3.1-rc1, but plan to soon.

--Andrew

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-09 21:29 Applications using fsync cause hangs for several seconds every few minutes Andrew Guertin
@ 2011-08-12  1:13 ` Andrew Guertin
  2011-08-18 14:38   ` Chris Mason
  2011-08-17 14:24 ` Andrew Guertin
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Guertin @ 2011-08-12  1:13 UTC (permalink / raw)
  To: linux-btrfs

On 08/09/2011 05:29 PM, Andrew Guertin wrote:
> I have not tried 3.1-rc1, but plan to soon.

I've tested now, this does still occur in 3.1-rc1.

--Andrew


* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-09 21:29 Applications using fsync cause hangs for several seconds every few minutes Andrew Guertin
  2011-08-12  1:13 ` Andrew Guertin
@ 2011-08-17 14:24 ` Andrew Guertin
  2011-08-17 14:29   ` Michael Cronenworth
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Guertin @ 2011-08-17 14:24 UTC (permalink / raw)
  To: linux-btrfs

On 08/09/2011 05:29 PM, Andrew Guertin wrote:
> On 06/21/2011 01:15 PM, Jan Stilow wrote:
>> Hello,
>>
>> Nirbheek Chauhan <nirbheek <at> gentoo.org> writes:
>>> [...]
>>>
>>> Every few minutes, (I guess) when applications do fsync (firefox,
>>> xchat, vim, etc), all applications that use fsync() hang for several
>>> seconds, and applications that use general IO suffer extreme
>>> slowdowns. iotop shows various combinations of the processes listed
>>> below doing writes, and the total write as 2-3MB/s.
>>>
>>> [btrfs-dealloc-]
>>> [btrfs-submit-0]
>>> [btrfs-transacti]
>>> [btrfs-endio-wri]
>>> [flush-btrfs-1]
>>
>> I'm using btrfs under a 2.6.39-ARCH kernel and run into the same issue.
>>
>> In my case [btrfs-submit-0] and [btrfs-transacti] show up in iotop
>> and produce 99% of the IO while an application is frozen, for something
>> like 10 to 30 seconds.
>>
>> [...]
> 
> I see the same issue. I have bisected it to
> 
> 4e69b598f6cfb0940b75abf7e179d6020e94ad1e is the first bad commit
> commit 4e69b598f6cfb0940b75abf7e179d6020e94ad1e
> Author: Josef Bacik <josef@redhat.com>
> Date:   Mon Mar 21 10:11:24 2011 -0400
> 
> Btrfs: cleanup how we setup free space clusters
> 
> ...which came in between 2.6.38 and 2.6.39.

Any chance of someone looking at this? I (and presumably others) haven't
been able to upgrade my kernel past 2.6.38 because of this.

--Andrew




* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-17 14:24 ` Andrew Guertin
@ 2011-08-17 14:29   ` Michael Cronenworth
  2011-08-17 14:38     ` Andrew Guertin
  2011-08-18  6:47     ` Chris Samuel
  0 siblings, 2 replies; 24+ messages in thread
From: Michael Cronenworth @ 2011-08-17 14:29 UTC (permalink / raw)
  To: Andrew Guertin; +Cc: linux-btrfs

Andrew Guertin on 08/17/2011 09:24 AM wrote:
> I (and presumably others) haven't
> been able to upgrade my kernel past 2.6.38 because of this.

I'm running kernel 3.0 (Fedora 15's 2.6.40) on two boxes and I have not 
seen slow downs or hangs. I use Firefox.


* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-17 14:29   ` Michael Cronenworth
@ 2011-08-17 14:38     ` Andrew Guertin
  2011-08-17 14:55       ` Dave
  2011-08-18  6:47     ` Chris Samuel
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Guertin @ 2011-08-17 14:38 UTC (permalink / raw)
  To: Michael Cronenworth; +Cc: linux-btrfs

On 08/17/2011 10:29 AM, Michael Cronenworth wrote:
> Andrew Guertin on 08/17/2011 09:24 AM wrote:
>> I (and presumably others) haven't
>> been able to upgrade my kernel past 2.6.38 because of this.
> 
> I'm running kernel 3.0 (Fedora 15's 2.6.40) on two boxes and I have not
> seen slow downs or hangs. I use Firefox.

Well I'd expect it to be somewhat uncommon, or it wouldn't survive 3
kernel versions :) But at least 3 people have reported it, and for me at
least it's reliably reproducible enough to bisect, so I'm quite certain
there's something going on.

--Andrew


* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-17 14:38     ` Andrew Guertin
@ 2011-08-17 14:55       ` Dave
  2011-08-18  2:41         ` Anand Jain
  0 siblings, 1 reply; 24+ messages in thread
From: Dave @ 2011-08-17 14:55 UTC (permalink / raw)
  To: Andrew Guertin; +Cc: Michael Cronenworth, linux-btrfs

On Wed, Aug 17, 2011 at 10:38:42AM -0400, Andrew Guertin wrote:
> Well I'd expect it to be somewhat uncommon, or it wouldn't survive 3
> kernel versions :) But at least 3 people have reported it, and for me at
> least it's reliably reproducible enough to bisect, so I'm quite certain
> there's something going on.

I've been simply living with this issue.  I can reproduce it by rsyncing very
large files to a btrfs volume.  My entire desktop will freeze for up to three
minutes and no amount of nice/ionice can temper this.

Once I've finished the rsync certain apps will periodically hang (Firefox in
particular).  This behavior goes away after a reboot.

I'm running kernel version 3.0.
-- 
-=[dave]=-

Entropy isn't what it used to be.


* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-17 14:55       ` Dave
@ 2011-08-18  2:41         ` Anand Jain
  2011-08-18  6:44           ` youagree
  2011-08-18  7:41           ` Andrew Guertin
  0 siblings, 2 replies; 24+ messages in thread
From: Anand Jain @ 2011-08-18  2:41 UTC (permalink / raw)
  To: Dave; +Cc: Andrew Guertin, Michael Cronenworth, linux-btrfs


Dave,

  Good to have a test case on the 3.0 kernel. Do you have btrfs as
  root fs? And can you show how you are using btrfs? Mainly I would
  need 'btrfs fi show'. Let me see if I can reproduce.

Thanks, Anand


> I've been simply living with this issue.  I can reproduce it by rsyncing very
> large files to a btrfs volume.  My entire desktop will freeze for up to three
> minutes and no amount of nice/ionice can temper this.
>
> Once I've finished the rsync certain apps will periodically hang (Firefox in
> particular).  This behavior goes away after a reboot.
>
> I'm running kernel version 3.0.


* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-18  2:41         ` Anand Jain
@ 2011-08-18  6:44           ` youagree
  2011-08-18  7:29             ` Andrew Guertin
  2011-08-18  7:41           ` Andrew Guertin
  1 sibling, 1 reply; 24+ messages in thread
From: youagree @ 2011-08-18  6:44 UTC (permalink / raw)
  To: linux-btrfs

This is most probably related to the same regression seen after 2.6.38.
My blocked comment of 3 August indicated that the behavior was present
in my distro's 2.6.38 kernel too; it just appeared after a considerably
longer uptime (on my desktop system using btrfs as rootfs on an Intel
ICH10-driven SATA HDD).

I have since reverted my / to ext4, and I'm okay with it, although I
would be very happy to see some improvement on this serious-for-me issue.


Btrfs slowdown

news://news.gmane.org:119/CAO47_-9BLKWUGDEuzaLqHSq9tZkAUaO8FMQEy1pPk9A2Hb+5AQ@mail.gmail.com

Also, a patch by Josef Bacik attempted to fix this, but no one has
reported testing it on an affected system. It did not eliminate the
slowdowns for me:

PLEASE TEST: Everybody who is seeing weird and long hangs
news://news.gmane.org:119/4E36C47E.70309@redhat.com


My comment was meant as an answer to mck's post in the "Btrfs slowdown"
thread, where I reported this in a little more detail, but it never
appeared on the list.

I'll try including it now:

________________________________________________________________________

I'm confirming this too. Following advice given on the #btrfs IRC
channel, I applied Josef's second patch for fs/btrfs/extent_io.c, and
I'm reporting that it did NOT make the slowdowns disappear on 3.0
kernels (even with some rather different configs).

The HDD thrashing appeared on all other kernel versions I tried that
were higher than 2.6.37.
Initially, I was looking for the latest known-good kernel (to prepare a
proper git bisect, as cwillu advised), and at first I also felt that
2.6.38 did not show this miserable behaviour. But later it turned out
that this held only for approximately 2 days of uptime. Given enough
time, the lock-ups appeared on 2.6.38 too, although they were not as
apparent as on later kernel versions, and the individual lockups took
much less time with 2.6.38 running for 2 days (binary Sabayon Linux
repository kernel).

My HDD, with btrfs as / on it, emits very distinct (and loud enough)
noises with a slightly different character for reads and writes, and I
can actually hear the disk's repetitive seek pattern during such a
thrashing period.

Based on that, I guess it must be the exact same thing happening with
2.6.38 as with later kernels, because they sound very similar. The
lockups on 2.6.38 are much shorter, but they have a similarly
repetitive seeking nature with other I/O severely throttled, and I
believe it is mostly writes that happen during a lockup. So I concluded
that I have failed to identify a known-good version so far. I didn't
have time to get into kernels earlier than .38. (I tried .37, but for
too brief an uptime to claim the lockups did not appear on it.)

Similar with my current kernel. It started happening after about 12
hours of running the machine using
# uname -a
Linux insula 3.0.0-git15genseed #2 SMP PREEMPT Tue Aug 2 20:10:05 CEST
2011 x86_64 Intel(R) Core(TM)2 Duo CPU E4500 @ 2.20GHz GenuineIntel
GNU/Linux

As the appended string reflects, it is a custom kernel; it has Josef's
patch applied, with the config attached. I also tried patching my
distro's 3.0 kernel, and no change was experienced with regard to the
issue (IIRC it was even a lot worse).

Let me know if I can contribute with anything that would be valuable for
the developers towards elimination of this very nasty bug.

Now, after 23 hours of uptime, my PC has become almost unusable.
Currently there's about 8 seconds thrashing, 10 seconds not thrashing,
and during thrashing, all other (disk) I/O is practically blocked.

SysRq+W under thrashing (dunno how informative it is, but here's one):

[62279.779382] SysRq : Show Blocked State
[62279.779389]   task                        PC stack   pid father
[62279.779404] btrfs-submit-0  D 0000000000000000  5616  4678      2
0x00000000
[62279.779413]  ffff88012b1370d0 0000000000000046 ffff880100000000
ffffffff8182c020
[62279.779422]  ffff880128d39fd8 0000000000010480 0000000000004000
ffff880128d38000
[62279.779429]  ffff880128d39fd8 0000000000010480 ffff88012b1370d0
0000000000010480
[62279.779437] Call Trace:
[62279.779449]  [<ffffffff812779c6>] ? cfq_set_request+0x33e/0x37e
[62279.779456]  [<ffffffff81277063>] ? cfq_cic_lookup+0x35/0x139
[62279.779462]  [<ffffffff812773a2>] ? cfq_may_queue+0x51/0x6e
[62279.779470]  [<ffffffff8143ed81>] ? io_schedule+0x4e/0x63
[62279.779477]  [<ffffffff8126b276>] ? get_request_wait+0xaa/0x10e
[62279.779484]  [<ffffffff8104f2ad>] ? wake_up_bit+0x23/0x23
[62279.779490]  [<ffffffff8126c2a6>] ? __make_request+0x175/0x26b
[62279.779496]  [<ffffffff8126a267>] ? generic_make_request+0x224/0x289
[62279.779502]  [<ffffffff8126a37f>] ? submit_bio+0xb3/0xbc
[62279.779509]  [<ffffffff81372238>] ? dm_any_congested+0x4f/0x57
[62279.779516]  [<ffffffff81206de6>] ? run_scheduled_bios+0x246/0x3b1
[62279.779523]  [<ffffffff8120c791>] ? worker_loop+0x180/0x4bb
[62279.779529]  [<ffffffff8120c611>] ? btrfs_queue_worker+0x24e/0x24e
[62279.779535]  [<ffffffff8104eee7>] ? kthread+0x7a/0x82
[62279.779542]  [<ffffffff81442554>] ? kernel_thread_helper+0x4/0x10
[62279.779548]  [<ffffffff8104ee6d>] ? kthread_worker_fn+0x149/0x149
[62279.779554]  [<ffffffff81442550>] ? gs_change+0xb/0xb
[62279.779560] btrfs-transacti D 0000000000000001  3856  4689      2
0x00000000
[62279.779568]  ffff88012b205320 0000000000000046 0000000000000000
ffff88012b06d320
[62279.779576]  ffff880128d97fd8 0000000000010480 0000000000004000
ffff880128d96000
[62279.779583]  ffff880128d97fd8 0000000000010480 ffff88012b205320
0000000000010480
[62279.779591] Call Trace:
[62279.779597]  [<ffffffff8120152f>] ? alloc_extent_state+0x12/0x55
[62279.779605]  [<ffffffff810aefbe>] ? kmem_cache_free+0x87/0x8e
[62279.779611]  [<ffffffff8127e2ab>] ? rb_erase+0x134/0x26f
[62279.779617]  [<ffffffff81081326>] ? __lock_page+0x63/0x63
[62279.779622]  [<ffffffff8143ed81>] ? io_schedule+0x4e/0x63
[62279.779628]  [<ffffffff8108132f>] ? sleep_on_page+0x9/0x10
[62279.779633]  [<ffffffff81081326>] ? __lock_page+0x63/0x63
[62279.779638]  [<ffffffff8143f36c>] ? __wait_on_bit+0x3e/0x71
[62279.779644]  [<ffffffff810814c9>] ? wait_on_page_bit+0x6a/0x70
[62279.779650]  [<ffffffff8104f2d7>] ? autoremove_wake_function+0x2a/0x2a
[62279.779657]  [<ffffffff811ebb53>] ? btrfs_wait_marked_extents+0xf5/0x12f
[62279.779664]  [<ffffffff811ebbb6>] ?
btrfs_write_and_wait_marked_extents+0x29/0x3d
[62279.779670]  [<ffffffff811ec2b0>] ? btrfs_commit_transaction+0x5c7/0x6e8
[62279.779677]  [<ffffffff810433c4>] ? del_timer_sync+0x34/0x3e
[62279.779682]  [<ffffffff8143f1bd>] ? schedule_timeout+0x182/0x1a0
[62279.779688]  [<ffffffff8104f2ad>] ? wake_up_bit+0x23/0x23
[62279.779694]  [<ffffffff811ec801>] ? start_transaction+0x1e0/0x21a
[62279.779700]  [<ffffffff811e66c4>] ? transaction_kthread+0x180/0x238
[62279.779706]  [<ffffffff811e6544>] ? btrfs_congested_fn+0x87/0x87
[62279.779712]  [<ffffffff811e6544>] ? btrfs_congested_fn+0x87/0x87
[62279.779718]  [<ffffffff8104eee7>] ? kthread+0x7a/0x82
[62279.779724]  [<ffffffff81442554>] ? kernel_thread_helper+0x4/0x10
[62279.779730]  [<ffffffff8104ee6d>] ? kthread_worker_fn+0x149/0x149
[62279.779736]  [<ffffffff81442550>] ? gs_change+0xb/0xb
[62279.779759] btrfs-endio-wri D 0000000000000000  4208 11320      2
0x00000000
[62279.779767]  ffff88012b173570 0000000000000046 0000000000000000
ffffffff8182c020
[62279.779775]  ffff88011afa9fd8 0000000000010480 0000000000004000
ffff88011afa8000
[62279.779782]  ffff88011afa9fd8 0000000000010480 ffff88012b173570
0000000000010480
[62279.779789] Call Trace:
[62279.779796]  [<ffffffff8126a267>] ? generic_make_request+0x224/0x289
[62279.779802]  [<ffffffff811faaeb>] ? lookup_extent_mapping+0x37/0xb3
[62279.779808]  [<ffffffff81081326>] ? __lock_page+0x63/0x63
[62279.779813]  [<ffffffff8143ed81>] ? io_schedule+0x4e/0x63
[62279.779818]  [<ffffffff8108132f>] ? sleep_on_page+0x9/0x10
[62279.779823]  [<ffffffff81081326>] ? __lock_page+0x63/0x63
[62279.779828]  [<ffffffff8143f36c>] ? __wait_on_bit+0x3e/0x71
[62279.779834]  [<ffffffff810814c9>] ? wait_on_page_bit+0x6a/0x70
[62279.779840]  [<ffffffff8104f2d7>] ? autoremove_wake_function+0x2a/0x2a
[62279.779846]  [<ffffffff81205835>] ? read_extent_buffer_pages+0x318/0x39b
[62279.779852]  [<ffffffff811e5a9e>] ? verify_parent_transid+0x1d9/0x1d9
[62279.779859]  [<ffffffff811e6c95>] ?
btree_read_extent_buffer_pages.clone.66+0x58/0xb2
[62279.779865]  [<ffffffff811e78b7>] ? read_tree_block+0x31/0x44
[62279.779871]  [<ffffffff811d1a8a>] ?
read_block_for_search.clone.41+0x309/0x33f
[62279.779878]  [<ffffffff812115fa>] ? btrfs_tree_read_unlock+0x9/0x33
[62279.779884]  [<ffffffff811cd235>] ? unlock_up+0x114/0x140
[62279.779890]  [<ffffffff811d4203>] ? btrfs_search_slot+0x7e7/0xa5e
[62279.779897]  [<ffffffff811d54fc>] ? btrfs_insert_empty_items+0x62/0xb3
[62279.779904]  [<ffffffff811da616>] ?
alloc_reserved_file_extent.clone.68+0x9b/0x213
[62279.779911]  [<ffffffff811dd08c>] ? run_clustered_refs+0x61f/0x70b
[62279.779918]  [<ffffffff811dd241>] ? btrfs_run_delayed_refs+0xc9/0x1cd
[62279.779924]  [<ffffffff811ec46f>] ? __btrfs_end_transaction+0x83/0x1e2
[62279.779931]  [<ffffffff811f171d>] ? btrfs_finish_ordered_io+0x280/0x2a5
[62279.779937]  [<ffffffff81202316>] ? end_bio_extent_writepage+0xa0/0x14a
[62279.779943]  [<ffffffff8120c791>] ? worker_loop+0x180/0x4bb
[62279.779949]  [<ffffffff8120c611>] ? btrfs_queue_worker+0x24e/0x24e
[62279.779955]  [<ffffffff8104eee7>] ? kthread+0x7a/0x82
[62279.779962]  [<ffffffff81442554>] ? kernel_thread_helper+0x4/0x10
[62279.779968]  [<ffffffff8104ee6d>] ? kthread_worker_fn+0x149/0x149
[62279.779974]  [<ffffffff81442550>] ? gs_change+0xb/0xb


# mount | grep btrfs
/dev/mapper/vg0-rootvol on / type btrfs (rw,relatime)


Thanks for all efforts.




* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-17 14:29   ` Michael Cronenworth
  2011-08-17 14:38     ` Andrew Guertin
@ 2011-08-18  6:47     ` Chris Samuel
  2011-08-18  6:58       ` youagree
  1 sibling, 1 reply; 24+ messages in thread
From: Chris Samuel @ 2011-08-18  6:47 UTC (permalink / raw)
  To: Michael Cronenworth; +Cc: Andrew Guertin, linux-btrfs

On 18/08/11 00:29, Michael Cronenworth wrote:

> I'm running kernel 3.0 (Fedora 15's 2.6.40) on two boxes
> and I have not seen slow downs or hangs. I use Firefox.

I've got btrfs on an external USB drive with the 3.0.1 kernel and
I see that sync seems to take an age, according to iotop it seems
that the btrfs processes are hitting it quite hard, IIRC.

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-18  6:47     ` Chris Samuel
@ 2011-08-18  6:58       ` youagree
  2011-08-19  7:34         ` Chris Samuel
  0 siblings, 1 reply; 24+ messages in thread
From: youagree @ 2011-08-18  6:58 UTC (permalink / raw)
  To: linux-btrfs


Are these processes principally btrfs-submit and btrfs-transacti in
particular?

Then it may be related to my very similar issue reported earlier.

On 08/18/2011 08:47 AM, Chris Samuel wrote:
> On 18/08/11 00:29, Michael Cronenworth wrote:
> 
>> I'm running kernel 3.0 (Fedora 15's 2.6.40) on two boxes
>> and I have not seen slow downs or hangs. I use Firefox.
> 
> I've got btrfs on an external USB drive with the 3.0.1 kernel and
> I see that sync seems to take an age, according to iotop it seems
> that the btrfs processes are hitting it quite hard, IIRC.
> 
> cheers,
> Chris



* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-18  6:44           ` youagree
@ 2011-08-18  7:29             ` Andrew Guertin
  2011-08-18  7:55               ` youagree
  2011-08-18 11:45               ` Andrew Guertin
  0 siblings, 2 replies; 24+ messages in thread
From: Andrew Guertin @ 2011-08-18  7:29 UTC (permalink / raw)
  To: youagree, linux-btrfs

On 08/18/2011 02:44 AM, youagree wrote:
> Also, a patch by Josef Bacik was an attempt for fixing this, but no one
> reported about testing it on an affected system, it did not eliminate
> the slowdowns for me:
>
> PLEASE TEST: Everybody who is seeing weird and long hangs
> news://news.gmane.org:119/4E36C47E.70309@redhat.com

I had not seen this (actually, I had skimmed it but not thought it was 
relevant). I will try it as soon as I get a chance.

> The HDD thrashing appeared on all other kernel versions I tried, higher
> than 2.6.37.
> Initially, I had been into looking for a latest known good kernel (to
> prepare a proper git bisect as cwillu advised) and at first I also felt
> like 2.6.38 does not show this miserable behaviour. But later it turned
> out this was only for approximately 2 days of uptime. Given enough time,
> the lock-ups appeared on 2.6.38 too. Although they were not that
> apparent than on later kernel versions, and the individual lockups took
> much less time with 2.6.38 running for 2 days (binary Sabayon Linux
> repository kernel).

I have not seen slowdowns on 2.6.38. More specifically, I observe the following 
behaviors after commit 4e69b59:

* Many processes occasionally hang for a short time
* When this happens, my cpu monitor shows a short burst of cpu activity (100% of 
1 core) followed by a longer period of IO
* When this happens, iotop shows [btrfs-submit-0] and [btrfs-transacti] at the 
top of the list
* Behavior slowly increases in duration (and frequency?) over time, and goes 
away with a reboot
* Heavy IO makes behavior appear faster

... and the following behaviors before commit 4e69b59:

* Occasional spikes of IO on cpu monitor concurrent with [btrfs-submit-0] and 
[btrfs-transacti] at top of iotop
* No hangs, even when that occurs

I wasn't taking notes or anything though, so I'm not 100% certain I was 
observing or interpreting or remembering everything correctly.

--Andrew
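
The symptom lists above rely on watching iotop by hand. As a rough aid,
the sketch below lists tasks that are in uninterruptible sleep (state
"D"), which is the state the blocked btrfs threads show in a SysRq+W
dump. It assumes a Linux /proc filesystem; the function names
parse_stat and blocked_tasks are made up for illustration and are not
part of any tool mentioned in this thread.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, _, right = stat_text.rpartition(")")
    pid_str, _, comm = left.partition("(")
    state = right.split()[0]
    return int(pid_str), comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for tasks in uninterruptible sleep (state 'D')."""
    if not os.path.isdir(proc_root):
        return []  # not a Linux system, nothing to inspect
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # task exited while being read, or unexpected format
        if state == "D":
            blocked.append((pid, comm))
    return blocked


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Running this in a loop during a hang and noting which names keep
appearing would give roughly the same list of suspects as iotop, though
it says nothing about how much IO each task is doing.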


* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-18  2:41         ` Anand Jain
  2011-08-18  6:44           ` youagree
@ 2011-08-18  7:41           ` Andrew Guertin
  1 sibling, 0 replies; 24+ messages in thread
From: Andrew Guertin @ 2011-08-18  7:41 UTC (permalink / raw)
  To: Anand Jain; +Cc: linux-btrfs

On 08/17/2011 10:41 PM, Anand Jain wrote:
> Dave,
>
> good to have a test case on the 3.0 kernel. do you have btrfs as
> root fs ? and
> can you show how are you using the btrfs mainly I would need
> 'btrfs fi show' let me try if I can reproduce.
>
> Thanks, Anand

Personally, I find that large compiles are very "useful" in making the issue 
occur sooner. I'm on gentoo, so when I was bisecting, I'd often just emerge 
openoffice and let it run for a while.

For observing, the best way I found was to run JOSM (Java OpenStreetMap editor). 
Browsing around a map is very interactive, so it's immediately noticeable when 
it hangs, and downloading map tiles all the time uses a lot of IO. In-browser 
map applications would probably work too.

My filesystem is partitioned with a small ext2 /boot as sda1, a 2GB swap as 
sda2, and the remaining space as btrfs / on sda3.
btrfs fi show gives:
Label: none  uuid: 28559ad8-7db8-402b-a93d-27ec9c5e943b
	Total devices 1 FS bytes used 102.83GB
	devid    1 size 144.90GB used 144.90GB path /dev/sda3

Btrfs v0.19-35-g1b444cd-dirty

--Andrew


* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-18  7:29             ` Andrew Guertin
@ 2011-08-18  7:55               ` youagree
  2011-08-18 11:45               ` Andrew Guertin
  1 sibling, 0 replies; 24+ messages in thread
From: youagree @ 2011-08-18  7:55 UTC (permalink / raw)
  To: linux-btrfs

On 08/18/2011 09:29 AM, Andrew Guertin wrote:
> * Many processes occasionally hang for a short time
> * When this happens, my cpu monitor shows a short burst of cpu activity
> (100% of 1 core) followed by a longer period of IO
> * When this happens, iotop shows [btrfs-submit-0] and [btrfs-transacti]
> at the top of the list
> * Behavior slowly increases in duration (and frequency?) over time, and
> goes away with a reboot
> * Heavy IO makes behavior appear faster
> 
> ... and the following behaviors before commit 4e69b59:
> 
> * Occasional spikes of IO on cpu monitor concurrent with
> [btrfs-submit-0] and [btrfs-transacti] at top of iotop
> * No hangs, even when that occurs

Yes, exactly that happened in my case too. Yours is a much more precise
description! I did not diagnose 2.6.38 further because I just wanted to
establish a known-good version and at first sight (2 days uptime) my HDD
behavior showed that it cannot be good if _any_ HDD thrashing appears at
all in the first place...

I was able to work with the computer during those IO spikes on 2.6.38
too, although it was noticeable that the HDD was being thrashed
(meanwhile, the LED was almost constantly lit). But it didn't cause
other programs to become unresponsive, I confirm...


> I wasn't taking notes or anything though, so I'm not 100% certain I was
> observing or interpreting or remembering everything correctly.
> 
> --Andrew
> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 




* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-18  7:29             ` Andrew Guertin
  2011-08-18  7:55               ` youagree
@ 2011-08-18 11:45               ` Andrew Guertin
  2011-08-19  9:58                 ` Anand Jain
  1 sibling, 1 reply; 24+ messages in thread
From: Andrew Guertin @ 2011-08-18 11:45 UTC (permalink / raw)
  To: linux-btrfs

On 08/18/2011 03:29 AM, Andrew Guertin wrote:
> I have not seen slowdowns on 2.6.38. More specifically, I observe the
> following behaviors after commit 4e69b59:
> 
> * Many processes occasionally hang for a short time
> * When this happens, my cpu monitor shows a short burst of cpu activity
> (100% of 1 core) followed by a longer period of IO
> * When this happens, iotop shows [btrfs-submit-0] and [btrfs-transacti]
> at the top of the list
> * Behavior slowly increases in duration (and frequency?) over time, and
> goes away with a reboot
> * Heavy IO makes behavior appear faster
> 
> ... and the following behaviors before commit 4e69b59:
> 
> * Occasional spikes of IO on cpu monitor concurrent with
> [btrfs-submit-0] and [btrfs-transacti] at top of iotop
> * No hangs, even when that occurs
> 
> I wasn't taking notes or anything though, so I'm not 100% certain I was
> observing or interpreting or remembering everything correctly.

I've investigated a little more, and have a few things to add:

Before commit 4e69b59:

* In the IO spikes where [btrfs-submit-0] and [btrfs-transacti] are at
the top of iotop, there is no short burst of cpu activity preceding them

* When running gentoo's emerge --sync (which IIRC is mainly an rsync of
~200MB of small files), output appears to pause during these spikes. I
wasn't able to tell if output stopped entirely or just slowed down.

--Andrew


* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-12  1:13 ` Andrew Guertin
@ 2011-08-18 14:38   ` Chris Mason
  2011-08-20 17:18     ` Andrew Guertin
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Mason @ 2011-08-18 14:38 UTC (permalink / raw)
  To: Andrew Guertin; +Cc: linux-btrfs

Excerpts from Andrew Guertin's message of 2011-08-11 21:13:18 -0400:
> On 08/09/2011 05:29 PM, Andrew Guertin wrote:
> > I have not tried 3.1-rc1, but plan to soon.
> 
> I've tested now, this does still occur in 3.1-rc1.

Ok, I had high hopes that the btrfs changes in rc1 would fix this.

Could you please try with the deadline elevator instead of the cfq
default?

-chris
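
For anyone repeating the elevator test: the active IO scheduler for a
disk is the bracketed name in /sys/block/<dev>/queue/scheduler, and
writing a name to that file as root switches it. A minimal sketch of
checking and switching it follows, assuming the disk is sda as in the
partition layout posted earlier in the thread; the function names are
illustrative only.

```python
def active_scheduler(sysfs_text):
    """Return the scheduler name shown in [brackets], or None if absent."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def scheduler_path(device="sda"):
    """Path of the sysfs attribute that lists the available schedulers."""
    return "/sys/block/%s/queue/scheduler" % device


def read_scheduler(device="sda"):
    """Read the currently active scheduler for the given block device."""
    with open(scheduler_path(device)) as f:
        return active_scheduler(f.read())


def set_scheduler(name, device="sda"):
    """Switch the scheduler; needs root, and does not persist across reboots."""
    with open(scheduler_path(device), "w") as f:
        f.write(name)
```

On a kernel of this era, read_scheduler() would be expected to report
cfq by default, and set_scheduler("deadline") selects the elevator
asked for above.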


* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-18  6:58       ` youagree
@ 2011-08-19  7:34         ` Chris Samuel
  0 siblings, 0 replies; 24+ messages in thread
From: Chris Samuel @ 2011-08-19  7:34 UTC (permalink / raw)
  To: linux-btrfs

On 18/08/11 16:58, youagree wrote:

> Are these processes principally btrfs-submit and btrfs-transacti
> in particular?
> 
> Then it may be related to my very similar issue reported earlier.

I spent a little bit of time last night looking at it and it
seems that what I'm seeing also affects ext4 on my local SATA
mirror too, so whatever is going on doesn't appear to be
related to btrfs.  So ignore my comment.. ;-)

cheers,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC


* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-18 11:45               ` Andrew Guertin
@ 2011-08-19  9:58                 ` Anand Jain
  0 siblings, 0 replies; 24+ messages in thread
From: Anand Jain @ 2011-08-19  9:58 UTC (permalink / raw)
  To: Andrew Guertin; +Cc: linux-btrfs


Andrew,

  I'm facing some challenges testing this. If you have a chance to test
  it again, the following output would be interesting to observe:
    iostat -ctx -p sda 3 >  /tmp/iostat.out

  Also note your system time when this problem occurs (iostat prints
  timestamps; I wish to see the waitQ and activeQ at that time, hopefully
  captured in the file /tmp/iostat.out above).

  We need more clarity on the test case that can reproduce this issue.
  As far as I know, you are writing into the btrfs. However, is that a
  large number of small files (you are creating new files), or are you
  writing a few large new files?

  If there is any script to test this, that will make understanding
  a lot easier.

  PS: Does anybody know of a Linux equivalent of Solaris lockstat(1M)?

Thanks, Anand
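
As a possible starting point for such a script: the sketch below
appends a small block to a file, calls fsync, and records how long the
fsync took, with a wall-clock time that can be lined up against the
timestamps in the iostat output. It is only an assumption that this
shows the stalls described in this thread; nobody here has confirmed
it reproduces them. The names timed_fsync and probe and the default
thresholds are invented for illustration.

```python
import os
import time


def timed_fsync(path, payload=b"x" * 4096):
    """Append payload to path, fsync it, and return the fsync time in seconds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, payload)
        start = time.monotonic()
        os.fsync(fd)
        return time.monotonic() - start
    finally:
        os.close(fd)


def probe(path, rounds=10, interval=1.0, threshold=1.0):
    """Call timed_fsync repeatedly; return (wall_clock, seconds) for slow calls."""
    stalls = []
    for _ in range(rounds):
        elapsed = timed_fsync(path)
        if elapsed >= threshold:
            # Wall-clock time makes it possible to match a stall to iostat -t.
            stalls.append((time.strftime("%H:%M:%S"), elapsed))
        time.sleep(interval)
    return stalls
```

The file should live on the affected btrfs volume, and the probe would
need to run alongside a heavier workload (a large compile or rsync, as
described earlier) for long enough that the hangs have started.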


* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-08-18 14:38   ` Chris Mason
@ 2011-08-20 17:18     ` Andrew Guertin
  0 siblings, 0 replies; 24+ messages in thread
From: Andrew Guertin @ 2011-08-20 17:18 UTC (permalink / raw)
  To: Chris Mason; +Cc: linux-btrfs

On 08/18/2011 10:38 AM, Chris Mason wrote:
> Excerpts from Andrew Guertin's message of 2011-08-11 21:13:18 -0400:
>> On 08/09/2011 05:29 PM, Andrew Guertin wrote:
>>> I have not tried 3.1-rc1, but plan to soon.
>>
>> I've tested now, this does still occur in 3.1-rc1.
>
> Ok, I had high hopes that the btrfs changes in rc1 would fix this.
>
> Could you please try with the deadline elevator instead of the cfq
> default?

The deadline elevator does not fix it (tested with 3.1-rc2)

Sorry for taking a long time with this.

--Andrew


* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-07-18 18:17 ` Josef Bacik
  2011-07-20 20:59   ` Nirbheek Chauhan
@ 2011-08-03 15:50   ` mck
  1 sibling, 0 replies; 24+ messages in thread
From: mck @ 2011-08-03 15:50 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 557 bytes --]

On Mon, 2011-07-18 at 14:17 -0400, Josef Bacik wrote:
> I've been looking into this and I have a suspicion.  Would you run
> with this patch and see if the problem goes away? 

Didn't help me.

2.6.39 is not usable. 3.0.0 is OK for a few hours, then it too becomes
unusable. This is discussed in later threads, e.g. "Btrfs slowdown".

Dummying out fsync (libeatmydata) gives only marginal improvements...

~mck

-- 
Bombing for peace is like F***ing for virginity | www.semb.wever.org |
www.sesat.no | tech.finn.no | http://xss-http-filter.sf.net

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-07-18 18:17 ` Josef Bacik
@ 2011-07-20 20:59   ` Nirbheek Chauhan
  2011-08-03 15:50   ` mck
  1 sibling, 0 replies; 24+ messages in thread
From: Nirbheek Chauhan @ 2011-07-20 20:59 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs

On Mon, Jul 18, 2011 at 11:47 PM, Josef Bacik <josef@redhat.com> wrote:
> On 06/06/2011 06:58 PM, Nirbheek Chauhan wrote:
>> What can I do to debug this issue? What other information should I
>> supply? Could someone guide me on how to figure out why my machine is
>> unusable now?
>
> I've been looking into this and I have a suspicion.  Would you run with this
> patch and see if the problem goes away?  If so I'm on the right track
> and I'll
> have more test patches for you to try :).  Thanks,
>

Hello!

Thanks for taking a look! I'm currently travelling and as soon as I
get a bit closer to my backups, I'll try the patch out and report back
for further testing :)

Cheers,

-- 
~Nirbheek Chauhan

Gentoo GNOME+Mozilla Team
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-06-06 22:58 Nirbheek Chauhan
  2011-07-18 17:37 ` Mck
@ 2011-07-18 18:17 ` Josef Bacik
  2011-07-20 20:59   ` Nirbheek Chauhan
  2011-08-03 15:50   ` mck
  1 sibling, 2 replies; 24+ messages in thread
From: Josef Bacik @ 2011-07-18 18:17 UTC (permalink / raw)
  To: Nirbheek Chauhan; +Cc: linux-btrfs

On 06/06/2011 06:58 PM, Nirbheek Chauhan wrote:
> Hello list,
> 
> I've been using btrfs on my personal machines for about two years now,
> and on this machine for about a year with absolutely no problems.
> In fact, it has held up better than ext4 with regard to reliability.
> 
> However, recently, perhaps with 2.6.39, or after I quickly started
> filling up my disk again, it has become impossible for me to work for
> long periods on my machine.
> 
> Every few minutes, (I guess) when applications do fsync (firefox,
> xchat, vim, etc), all applications that use fsync() hang for several
> seconds, and applications that use general IO suffer extreme
> slowdowns. iotop shows various combinations of the processes listed
> below doing writes, and the total write as 2-3MB/s.
> 
> [btrfs-dealloc-]
> [btrfs-submit-0]
> [btrfs-transacti]
> [btrfs-endio-wri]
> [flush-btrfs-1]
> 
> In some extreme cases, I've had hangs for 5 whole minutes. I'm really
> beginning to appreciate how little I/O GNOME Shell does since it
> remains completely responsive throughout this. I have a feeling that
> the cause for this is extreme fragmentation.
> 
> My hard disk is a 500GB SATA hdd, my btrfs partition details are:
> 
> # btrfs filesystem show
> Label: 'gentoo'  uuid: 6f539d7f-f70f-4216-a4a9-6f7a2117a04a
> 	Total devices 1 FS bytes used 246.37GB
> 	devid    1 size 345.13GB used 345.13GB path /dev/sda7
> 
> Btrfs v0.19-35-g1b444cd-dirty
> 
> What can I do to debug this issue? What other information should I
> supply? Could someone guide me on how to figure out why my machine is
> unusable now?
> 
> Thanks in advance,
> 

Hello,

I've been looking into this and I have a suspicion.  Would you run with this
patch and see if the problem goes away?  If so I'm on the right track and I'll
have more test patches for you to try :).  Thanks,

Josef

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 19450bc..2e30350 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -150,7 +150,6 @@ static noinline int run_scheduled_bios(struct btrfs_device *device)
 	 * another device without first sending all of these down.
 	 * So, setup a plug here and finish it off before we return
 	 */
-	blk_start_plug(&plug);

 	bdi = blk_get_backing_dev_info(device->bdev);
 	fs_info = device->dev_root->fs_info;
@@ -290,7 +289,6 @@ loop_lock:
 	spin_unlock(&device->io_lock);

 done:
-	blk_finish_plug(&plug);
 	return 0;
 }


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: Applications using fsync cause hangs for several seconds every few minutes
  2011-06-06 22:58 Nirbheek Chauhan
@ 2011-07-18 17:37 ` Mck
  2011-07-18 18:17 ` Josef Bacik
  1 sibling, 0 replies; 24+ messages in thread
From: Mck @ 2011-07-18 17:37 UTC (permalink / raw)
  To: Nirbheek Chauhan; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 970 bytes --]

On Tue, 2011-06-07 at 04:28 +0530, Nirbheek Chauhan wrote:
> Every few minutes, (I guess) when applications do fsync (firefox,
> xchat, vim, etc), all applications that use fsync() hang for several
> seconds, and applications that use general IO suffer extreme
> slowdowns. iotop shows various combinations of the processes listed
> below doing writes, and the total write as 2-3MB/s. 

I have experienced this too. Removing a lot of snapshots /seemed/ to
help (I had hundreds that I didn't really need).

Would it be stupid to try disabling fsync as described at
 http://ubuntuforums.org/archive/index.php/t-1103926.html ?

I don't know the consequences... but would it prove your theory?

~mck

-- 
“Don’t worry about people stealing your ideas. If your ideas are any
good, you’ll have to ram them down people’s throats.” - Howard Aiken 
| http://semb.wever.org | http://sesat.no
| http://tech.finn.no       | Java XSS Filter

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Applications using fsync cause hangs for several seconds every few minutes
@ 2011-06-21 11:15 Jan Stilow
  0 siblings, 0 replies; 24+ messages in thread
From: Jan Stilow @ 2011-06-21 11:15 UTC (permalink / raw)
  To: linux-btrfs

Hello,

Nirbheek Chauhan <nirbheek <at> gentoo.org> writes:
> However, recently, perhaps with 2.6.39, or after I quickly started
> filling up my disk again, it has become impossible for me to work for
> long periods on my machine.
>
> Every few minutes, (I guess) when applications do fsync (firefox,
> xchat, vim, etc), all applications that use fsync() hang for several
> seconds, and applications that use general IO suffer extreme
> slowdowns. iotop shows various combinations of the processes listed
> below doing writes, and the total write as 2-3MB/s.
>
> [btrfs-dealloc-]
> [btrfs-submit-0]
> [btrfs-transacti]
> [btrfs-endio-wri]
> [flush-btrfs-1]

I'm using btrfs under a 2.6.39-ARCH kernel and run into the same issue.

In my case [btrfs-submit-0] and [btrfs-transacti] show up in iotop
and produce 99% of the IO while an application is frozen, for something
like 10 to 30 seconds.

After a fresh boot everything runs quite acceptably, then gets worse over
the day. If I reboot, the unmount takes a long time, up to 8 minutes.
After the reboot everything is back to normal, at least for a while.

I take snapshots on a daily basis and have 20 active snapshots at all
times. Maybe this is the reason for the performance impact?

> What can I do to debug this issue? What other information should I
> supply? Could someone guide me on how to figure out why my machine is
> unusable now?

I have no solution for this problem. A reboot helps, but only
temporarily. I also changed the IO scheduler, switched to laptop mode
and remounted the device. None of these helped.

Maybe someone else has a suggestion?

Thanks.

Jan Stilow

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Applications using fsync cause hangs for several seconds every few minutes
@ 2011-06-06 22:58 Nirbheek Chauhan
  2011-07-18 17:37 ` Mck
  2011-07-18 18:17 ` Josef Bacik
  0 siblings, 2 replies; 24+ messages in thread
From: Nirbheek Chauhan @ 2011-06-06 22:58 UTC (permalink / raw)
  To: linux-btrfs

Hello list,

I've been using btrfs on my personal machines for about two years now,
and on this machine for about a year with absolutely no problems.
In fact, it has held up better than ext4 with regard to reliability.

However, recently, perhaps with 2.6.39, or after I quickly started
filling up my disk again, it has become impossible for me to work for
long periods on my machine.

Every few minutes, (I guess) when applications do fsync (firefox,
xchat, vim, etc), all applications that use fsync() hang for several
seconds, and applications that use general IO suffer extreme
slowdowns. iotop shows various combinations of the processes listed
below doing writes, and the total write as 2-3MB/s.

[btrfs-dealloc-]
[btrfs-submit-0]
[btrfs-transacti]
[btrfs-endio-wri]
[flush-btrfs-1]

In some extreme cases, I've had hangs for 5 whole minutes. I'm really
beginning to appreciate how little I/O GNOME Shell does since it
remains completely responsive throughout this. I have a feeling that
the cause for this is extreme fragmentation.

My hard disk is a 500GB SATA hdd, my btrfs partition details are:

# btrfs filesystem show
Label: 'gentoo'  uuid: 6f539d7f-f70f-4216-a4a9-6f7a2117a04a
	Total devices 1 FS bytes used 246.37GB
	devid    1 size 345.13GB used 345.13GB path /dev/sda7

Btrfs v0.19-35-g1b444cd-dirty
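
For reference, a rough reading of the figures above: the per-device
"used" value equals the device size, so the whole partition is already
allocated to chunks, while the filesystem itself reports 246.37GB in
use, roughly 71% of the partition. A small sketch of that arithmetic
follows; the helper name is hypothetical, and the parsing assumes the
output format shown above with sizes in GB:

```python
import re

# Output copied from the report above.
SHOW_OUTPUT = (
    "Label: 'gentoo'  uuid: 6f539d7f-f70f-4216-a4a9-6f7a2117a04a\n"
    "\tTotal devices 1 FS bytes used 246.37GB\n"
    "\tdevid    1 size 345.13GB used 345.13GB path /dev/sda7\n"
)


def summarize_show_output(text):
    """Extract usage figures from single-device 'btrfs filesystem show' output.

    Hypothetical helper; it assumes one device and sizes printed in GB.
    """
    fs_used = float(re.search(r"FS bytes used ([\d.]+)GB", text).group(1))
    device = re.search(r"size ([\d.]+)GB used ([\d.]+)GB", text)
    size = float(device.group(1))
    allocated = float(device.group(2))
    return {
        "fs_used_gb": fs_used,
        "device_size_gb": size,
        "allocated_gb": allocated,
        "unallocated_gb": size - allocated,   # space not yet assigned to any chunk
        "used_fraction": fs_used / size,      # data and metadata actually in use
    }


if __name__ == "__main__":
    for key, value in summarize_show_output(SHOW_OUTPUT).items():
        print("%-16s %.2f" % (key, value))
```

Whether the fully allocated device is related to the stalls is not
established here; the sketch only makes the numbers easier to compare.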

What can I do to debug this issue? What other information should I
supply? Could someone guide me on how to figure out why my machine is
unusable now?

Thanks in advance,

-- 
~Nirbheek Chauhan

Gentoo GNOME+Mozilla Team

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2011-08-20 17:18 UTC | newest]

Thread overview: 24+ messages
2011-08-09 21:29 Applications using fsync cause hangs for several seconds every few minutes Andrew Guertin
2011-08-12  1:13 ` Andrew Guertin
2011-08-18 14:38   ` Chris Mason
2011-08-20 17:18     ` Andrew Guertin
2011-08-17 14:24 ` Andrew Guertin
2011-08-17 14:29   ` Michael Cronenworth
2011-08-17 14:38     ` Andrew Guertin
2011-08-17 14:55       ` Dave
2011-08-18  2:41         ` Anand Jain
2011-08-18  6:44           ` youagree
2011-08-18  7:29             ` Andrew Guertin
2011-08-18  7:55               ` youagree
2011-08-18 11:45               ` Andrew Guertin
2011-08-19  9:58                 ` Anand Jain
2011-08-18  7:41           ` Andrew Guertin
2011-08-18  6:47     ` Chris Samuel
2011-08-18  6:58       ` youagree
2011-08-19  7:34         ` Chris Samuel
  -- strict thread matches above, loose matches on Subject: below --
2011-06-21 11:15 Jan Stilow
2011-06-06 22:58 Nirbheek Chauhan
2011-07-18 17:37 ` Mck
2011-07-18 18:17 ` Josef Bacik
2011-07-20 20:59   ` Nirbheek Chauhan
2011-08-03 15:50   ` mck
