Re: [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync

All of lore.kernel.org
 help / color / mirror / Atom feed

* Re: [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync
       [not found] <bug-15426-10286@http.bugzilla.kernel.org/>
@ 2010-03-03  0:16 ` Andrew Morton
  2010-03-03 12:09   ` John Hughes
  0 siblings, 1 reply; 13+ messages in thread
From: Andrew Morton @ 2010-03-03  0:16 UTC (permalink / raw)
  To: john; +Cc: bugzilla-daemon, bugme-daemon, linux-kernel, Jens Axboe


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Tue, 2 Mar 2010 16:31:03 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> http://bugzilla.kernel.org/show_bug.cgi?id=15426
> 
>            Summary: Running many copies of bonnie++ on different
>                     filesystems seems to deadlock in sync
>            Product: File System
>            Version: 2.5
>     Kernel Version: 2.6.32
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: VFS
>         AssignedTo: fs_vfs@kernel-bugs.osdl.org
>         ReportedBy: john@calva.com
>         Regression: No
> 
> 
> I create 8 ext3 filesystems on 8 disks and run a copy of bonnie++ on each
> filesystem.  All but two or three of the Bonnie++ processes seem to hang in
> sync and I get log messages like:
> 
> [ 1557.377626] INFO: task bonnie:2503 blocked for more than 120 seconds.
> [ 1557.377693] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
> message.
> [ 1557.377775] bonnie        D 0000000000000000     0  2503   2489 0x00000000
> [ 1557.377778]  ffffffff8144b1f0 0000000000000082 0000000000000000
> ffff88034ac155c0
> [ 1557.377780]  ffff88033c079880 ffffffff81047f43 000000000000f8a0
> ffff88063b1e9fd8
> [ 1557.377783]  00000000000155c0 00000000000155c0 ffff88063bbe3f90
> ffff88063bbe4288
> [ 1557.377785] Call Trace:
> [ 1557.377787]  [<ffffffff81047f43>] ? finish_task_switch+0x3a/0xa7
> [ 1557.377790]  [<ffffffff812ee653>] ? __mutex_lock_common+0x10b/0x17b
> [ 1557.377792]  [<ffffffff812ee77b>] ? mutex_lock+0x1a/0x31
> [ 1557.377794]  [<ffffffff8110a3cd>] ? sync_filesystems+0x13/0xe3
> [ 1557.377796]  [<ffffffff8110a4d8>] ? sys_sync+0x12/0x2e
> [ 1557.377799]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
> 

When the system is stuck, please run

	echo w > /proc/sysrq-trigger

or, equivalently, hit `alt-sysrq-w'.

This will emit huge amounts of stack traces into the system logs and
onto the console.  There's no need to send all of them - please
identify typical-looking ones and then send them in reply to this
email.  Email clients like to wordwrap these traces - please try to
prevent that.

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync
  2010-03-03 12:09   ` John Hughes
@ 2010-03-03 11:50     ` Jens Axboe
  2010-03-03 12:37       ` John Hughes
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2010-03-03 11:50 UTC (permalink / raw)
  To: John Hughes; +Cc: Andrew Morton, bugzilla-daemon, bugme-daemon, linux-kernel

On Wed, Mar 03 2010, John Hughes wrote:
> Andrew Morton wrote:
>> When the system is stuck, please run
>>
>> 	echo w > /proc/sysrq-trigger
>>   
> Ok, we have:

Is IO still going on, or does it appear to be stuck? From the traces
below, we have various procs caught in waiting for a request. So if
things are totally stuck, it could be some race in there.


>
> [ 1128.964403] kjournald     D 0000000000000000     0  2196      2 0x00000000
> [ 1128.964407]  ffffffff8144b1f0 0000000000000046 0000000000000000 ffffffff811837bb
> [ 1128.964409]  ffff88063d1762e0 ffff880340002c80 000000000000f8a0 ffff88063c6e9fd8
> [ 1128.964412]  00000000000155c0 00000000000155c0 ffff88063d1762e0 ffff88063d1765d8
> [ 1128.964415] Call Trace:
> [ 1128.964423]  [<ffffffff811837bb>] ? cfq_set_request+0x2af/0x319
> [ 1128.964427]  [<ffffffff811780d7>] ? get_request+0x7e/0x2ba
> [ 1128.964431]  [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
> [ 1128.964433]  [<ffffffff81178403>] ? get_request_wait+0xf0/0x188
> [ 1128.964437]  [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
> [ 1128.964441]  [<ffffffff8117b28d>] ? blk_recount_segments+0x17/0x27
> [ 1128.964443]  [<ffffffff81178792>] ? __make_request+0x2f7/0x428
> [ 1128.964447]  [<ffffffff81176f6f>] ? generic_make_request+0x299/0x2f9
> [ 1128.964450]  [<ffffffff811770a5>] ? submit_bio+0xd6/0xf2
> [ 1128.964455]  [<ffffffff8110b50b>] ? submit_bh+0xf5/0x115
> [ 1128.964463]  [<ffffffffa017f5a2>] ? journal_commit_transaction+0x8d9/0xe2b [jbd]
> [ 1128.964467]  [<ffffffffa01823fb>] ? kjournald+0xdf/0x226 [jbd]
> [ 1128.964469]  [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
> [ 1128.964472]  [<ffffffffa018231c>] ? kjournald+0x0/0x226 [jbd]
> [ 1128.964474]  [<ffffffff81064789>] ? kthread+0x79/0x81
> [ 1128.964479]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
> [ 1128.964481]  [<ffffffff81064710>] ? kthread+0x0/0x81
> [ 1128.964483]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
>
> And three of (flush-8:32, flush-8:48 and flush-8:64):
>
> [ 1128.964485] flush-8:32    D 0000000000000000     0  2221      2 0x00000000
> [ 1128.964488]  ffff88033e44b880 0000000000000046 0000000000000000 ffffffff811837bb
> [ 1128.964490]  ffff88063d177100 0000001000022c80 000000000000f8a0 ffff88063b66dfd8
> [ 1128.964493]  00000000000155c0 00000000000155c0 ffff88063d177100 ffff88063d1773f8
> [ 1128.964495] Call Trace:
> [ 1128.964497]  [<ffffffff811837bb>] ? cfq_set_request+0x2af/0x319
> [ 1128.964500]  [<ffffffff811780d7>] ? get_request+0x7e/0x2ba
> [ 1128.964502]  [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
> [ 1128.964504]  [<ffffffff81178403>] ? get_request_wait+0xf0/0x188
> [ 1128.964506]  [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
> [ 1128.964508]  [<ffffffff81178792>] ? __make_request+0x2f7/0x428
> [ 1128.964511]  [<ffffffff81176f6f>] ? generic_make_request+0x299/0x2f9
> [ 1128.964514]  [<ffffffff810114ce>] ? common_interrupt+0xe/0x13
> [ 1128.964516]  [<ffffffff811770a5>] ? submit_bio+0xd6/0xf2
> [ 1128.964518]  [<ffffffff8110b50b>] ? submit_bh+0xf5/0x115
> [ 1128.964522]  [<ffffffff8110da7c>] ? __block_write_full_page+0x1d6/0x2ac
> [ 1128.964525]  [<ffffffff8110c801>] ? end_buffer_async_write+0x0/0x13b
> [ 1128.964528]  [<ffffffff8110fabc>] ? blkdev_get_block+0x0/0x57
> [ 1128.964532]  [<ffffffff810b9572>] ? __writepage+0xa/0x25
> [ 1128.964534]  [<ffffffff810b9bf9>] ? write_cache_pages+0x20b/0x327
> [ 1128.964536]  [<ffffffff810b9568>] ? __writepage+0x0/0x25
> [ 1128.964541]  [<ffffffff8110642e>] ? writeback_single_inode+0xe7/0x2da
> [ 1128.964543]  [<ffffffff81107107>] ? writeback_inodes_wb+0x423/0x4fe
> [ 1128.964546]  [<ffffffff8110730e>] ? wb_writeback+0x12c/0x1ab
> [ 1128.964549]  [<ffffffff81107574>] ? wb_do_writeback+0x145/0x15b
> [ 1128.964551]  [<ffffffff811075bb>] ? bdi_writeback_task+0x31/0x9d
> [ 1128.964556]  [<ffffffff810c7cc6>] ? bdi_start_fn+0x0/0xca
> [ 1128.964558]  [<ffffffff810c7d36>] ? bdi_start_fn+0x70/0xca
> [ 1128.964560]  [<ffffffff810c7cc6>] ? bdi_start_fn+0x0/0xca
> [ 1128.964562]  [<ffffffff81064789>] ? kthread+0x79/0x81
> [ 1128.964564]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
> [ 1128.964566]  [<ffffffff81064710>] ? kthread+0x0/0x81
> [ 1128.964568]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20
>
>
> And a few copies of bonnie (14 in total):
>
> Two like this:
>
> [ 1128.964735] bonnie        D 0000000000000000     0  2240   2236 0x00000000
> [ 1128.964737]  ffff88033e44b880 0000000000000086 0000000000000000 00000000000009ef
> [ 1128.964740]  ffff8802bac52000 0000000000430bf5 000000000000f8a0 ffff88033d031fd8
> [ 1128.964743]  00000000000155c0 00000000000155c0 ffff88033d811c40 ffff88033d811f38
> [ 1128.964745] Call Trace:
> [ 1128.964747]  [<ffffffff8110c637>] ? sync_buffer+0x0/0x40
> [ 1128.964749]  [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
> [ 1128.964752]  [<ffffffff8110c672>] ? sync_buffer+0x3b/0x40
> [ 1128.964754]  [<ffffffff812ee339>] ? __wait_on_bit_lock+0x3f/0x84
> [ 1128.964756]  [<ffffffff8110c637>] ? sync_buffer+0x0/0x40
> [ 1128.964758]  [<ffffffff812ee3e9>] ? out_of_line_wait_on_bit_lock+0x6b/0x77
> [ 1128.964760]  [<ffffffff81064a84>] ? wake_bit_function+0x0/0x23
> [ 1128.964764]  [<ffffffffa017e068>] ? do_get_write_access+0x82/0x441 [jbd]
> [ 1128.964766]  [<ffffffff8110bbed>] ? __getblk+0x26/0x29a
> [ 1128.964769]  [<ffffffffa017e449>] ? journal_get_write_access+0x22/0x33 [jbd]
> [ 1128.964774]  [<ffffffffa01a040f>] ? __ext3_journal_get_write_access+0x1e/0x47 [ext3]
> [ 1128.964778]  [<ffffffffa019418e>] ? ext3_reserve_inode_write+0x3e/0x75 [ext3]
> [ 1128.964782]  [<ffffffffa0199d86>] ? ext3_orphan_add+0x94/0x15b [ext3]
> [ 1128.964786]  [<ffffffffa01941f8>] ? ext3_mark_inode_dirty+0x33/0x3c [ext3]
> [ 1128.964790]  [<ffffffffa019a6cb>] ? ext3_unlink+0x166/0x1b4 [ext3]
> [ 1128.964793]  [<ffffffff810f645d>] ? vfs_unlink+0x72/0xa6
> [ 1128.964795]  [<ffffffff810f7cb5>] ? do_unlinkat+0xad/0x134
> [ 1128.964797]  [<ffffffff810f9b11>] ? vfs_readdir+0x92/0xa7
> [ 1128.964800]  [<ffffffff810f9c9d>] ? sys_getdents+0xb9/0xc7
> [ 1128.964803]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
>
> One like this:
>
> [ 1128.964804] bonnie        D 0000000000000000     0  2241   2236 0x00000000
> [ 1128.964807]  ffff88033e44b880 0000000000000082 0000000000000000 ffffffff810b90a9
> [ 1128.964809]  ffff880000022c80 0000000000000000 000000000000f8a0 ffff88033c3abfd8
> [ 1128.964812]  00000000000155c0 00000000000155c0 ffff88033d813f90 ffff88033d814288
> [ 1128.964814] Call Trace:
> [ 1128.964816]  [<ffffffff810b90a9>] ? __alloc_pages_nodemask+0x10f/0x5ce
> [ 1128.964819]  [<ffffffff81064977>] ? bit_waitqueue+0x10/0xa0
> [ 1128.964822]  [<ffffffffa017e212>] ? do_get_write_access+0x22c/0x441 [jbd]
> [ 1128.964824]  [<ffffffff81064a84>] ? wake_bit_function+0x0/0x23
> [ 1128.964826]  [<ffffffff8110bbed>] ? __getblk+0x26/0x29a
> [ 1128.964829]  [<ffffffff810170cd>] ? read_tsc+0xa/0x20
> [ 1128.964832]  [<ffffffffa017e449>] ? journal_get_write_access+0x22/0x33 [jbd]
> [ 1128.964836]  [<ffffffffa01a040f>] ? __ext3_journal_get_write_access+0x1e/0x47 [ext3]
> [ 1128.964840]  [<ffffffffa019418e>] ? ext3_reserve_inode_write+0x3e/0x75 [ext3]
> [ 1128.964844]  [<ffffffffa01941e6>] ? ext3_mark_inode_dirty+0x21/0x3c [ext3]
> [ 1128.964847]  [<ffffffffa019a6b4>] ? ext3_unlink+0x14f/0x1b4 [ext3]
> [ 1128.964850]  [<ffffffff810f645d>] ? vfs_unlink+0x72/0xa6
> [ 1128.964852]  [<ffffffff810f7cb5>] ? do_unlinkat+0xad/0x134
> [ 1128.964854]  [<ffffffff810f9b11>] ? vfs_readdir+0x92/0xa7
> [ 1128.964856]  [<ffffffff810f9c9d>] ? sys_getdents+0xb9/0xc7
> [ 1128.964859]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
>
>
> One like this:
>
> [ 1128.964955] bonnie        D 0000000000000000     0  2244   2236 0x00000000
> [ 1128.964958]  ffffffff8144b1f0 0000000000000082 0000000000000000 ffffffff811837bb
> [ 1128.964960]  ffff88033d813170 00000010a00fc7ab 000000000000f8a0 ffff88033ceadfd8
> [ 1128.964963]  00000000000155c0 00000000000155c0 ffff88033d813170 ffff88033d813468
> [ 1128.964965] Call Trace:
> [ 1128.964967]  [<ffffffff811837bb>] ? cfq_set_request+0x2af/0x319
> [ 1128.964969]  [<ffffffff811780d7>] ? get_request+0x7e/0x2ba
> [ 1128.964971]  [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
> [ 1128.964973]  [<ffffffff81178403>] ? get_request_wait+0xf0/0x188
> [ 1128.964976]  [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
> [ 1128.964978]  [<ffffffff81178792>] ? __make_request+0x2f7/0x428
> [ 1128.964980]  [<ffffffff81176f6f>] ? generic_make_request+0x299/0x2f9
> [ 1128.964983]  [<ffffffff811770a5>] ? submit_bio+0xd6/0xf2
> [ 1128.964985]  [<ffffffff8110b50b>] ? submit_bh+0xf5/0x115
> [ 1128.964987]  [<ffffffff8110da7c>] ? __block_write_full_page+0x1d6/0x2ac
> [ 1128.964990]  [<ffffffff8110c801>] ? end_buffer_async_write+0x0/0x13b
> [ 1128.964992]  [<ffffffff8110fabc>] ? blkdev_get_block+0x0/0x57
> [ 1128.964994]  [<ffffffff810b9572>] ? __writepage+0xa/0x25
> [ 1128.964997]  [<ffffffff810b9bf9>] ? write_cache_pages+0x20b/0x327
> [ 1128.964999]  [<ffffffff810b9568>] ? __writepage+0x0/0x25
> [ 1128.965003]  [<ffffffff810b3cda>] ? __filemap_fdatawrite_range+0x4b/0x54
> [ 1128.965007]  [<ffffffffa019b5bb>] ? ext3_sync_fs+0x42/0x4b [ext3]
> [ 1128.965009]  [<ffffffff810b4998>] ? filemap_write_and_wait+0x17/0x32
> [ 1128.965013]  [<ffffffff8112d015>] ? sync_quota_sb+0x5d/0xf6
> [ 1128.965015]  [<ffffffff8110a38d>] ? __sync_filesystem+0x43/0x70
> [ 1128.965017]  [<ffffffff8110a454>] ? sync_filesystems+0x9a/0xe3
> [ 1128.965019]  [<ffffffff8110a4e2>] ? sys_sync+0x1c/0x2e
> [ 1128.965021]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
>
> Seven bonnies like this:
>
> [ 1128.965231] bonnie        D 0000000000000000     0  2253   2236 0x00000000
> [ 1128.965233]  ffffffff8144b1f0 0000000000000082 0000000000000000 ffff88034ac155c0
> [ 1128.965236]  ffff88033ce7f700 ffffffff81047f43 000000000000f8a0 ffff88033c249fd8
> [ 1128.965239]  00000000000155c0 00000000000155c0 ffff88033d814db0 ffff88033d8150a8
> [ 1128.965241] Call Trace:
> [ 1128.965243]  [<ffffffff81047f43>] ? finish_task_switch+0x3a/0xa7
> [ 1128.965246]  [<ffffffff812ee653>] ? __mutex_lock_common+0x10b/0x17b
> [ 1128.965248]  [<ffffffff812ee77b>] ? mutex_lock+0x1a/0x31
> [ 1128.965250]  [<ffffffff8110a3cd>] ? sync_filesystems+0x13/0xe3
> [ 1128.965252]  [<ffffffff8110a4d8>] ? sys_sync+0x12/0x2e
> [ 1128.965255]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
>
> And three bonnies like this:
>
> [ 1128.965023] bonnie        D 0000000000000000     0  2245   2236 0x00000000
> [ 1128.965026]  ffff88033e44b880 0000000000000082 0000000000000000 00000000000155c0
> [ 1128.965028]  0000000000000005 00000000000155c0 000000000000f8a0 ffff88033cfc3fd8
> [ 1128.965031]  00000000000155c0 00000000000155c0 ffff88033d817100 ffff88033d8173f8
> [ 1128.965033] Call Trace:
> [ 1128.965035]  [<ffffffff812ee653>] ? __mutex_lock_common+0x10b/0x17b
> [ 1128.965038]  [<ffffffff8104a196>] ? try_to_wake_up+0x249/0x259
> [ 1128.965040]  [<ffffffff812ee77b>] ? mutex_lock+0x1a/0x31
> [ 1128.965042]  [<ffffffff8110a3cd>] ? sync_filesystems+0x13/0xe3
> [ 1128.965044]  [<ffffffff8110a4d8>] ? sys_sync+0x12/0x2e
> [ 1128.965047]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
>
>
>

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync
  2010-03-03 12:37       ` John Hughes
@ 2010-03-03 12:03         ` Jens Axboe
  2010-03-03 12:45           ` John Hughes
                             ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Jens Axboe @ 2010-03-03 12:03 UTC (permalink / raw)
  To: John Hughes; +Cc: Andrew Morton, bugzilla-daemon, bugme-daemon, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 908 bytes --]

On Wed, Mar 03 2010, John Hughes wrote:
> Jens Axboe wrote:
>> Is IO still going on, or does it appear to be stuck? From the traces
>> below, we have various procs caught in waiting for a request. So if
>> things are totally stuck, it could be some race in there.
>>   
> I see I/O happening on three or four of the disks.
>
> Just a thought.  What exactly is sync(2) supposed to do - block until  
> there are no more dirty pages, or block until all pages that were dirty  
> when the sync was done are clean?  In other words is the problem simply  
> that pages are being dirtied faster than the sync is writing them out?

Our sync is currently broken in that regard, since it'll wait for too
long. We have a debated patch going, I have included it below. Any
chance you could give it a whirl?

The semantics of sync are supposed to be 'wait for dirty IO generated
BEFORE this sync call'.

-- 
Jens Axboe


[-- Attachment #2: writeback-fix-broken-sync-2.6.32.patch --]
[-- Type: text/x-diff, Size: 2409 bytes --]

commit 057226ca7447880e4e766a82cf32197e492ba963
Author: Jens Axboe <jens.axboe@oracle.com>
Date:   Fri Feb 12 10:14:34 2010 +0100

    writeback: Fix broken sync writeback
    
    There's currently a writeback bug in the 2.6.32 and 2.6.33-rc kernels
    that prevent proper writeback when sync(1) is being run. Instead of
    flushing everything older than the sync run, it will do chunks of normal
    MAX_WRITEBACK_PAGES writeback and restart over and over. This results in
    very suboptimal writeback for many files, see the initial report from
    Jan Engelhardt:
    
    http://lkml.org/lkml/2010/1/22/382
    
    This fixes it by using the passed in page writeback count, instead of
    doing MAX_WRITEBACK_PAGES batches, which gets us much better performance
    (Jan reports it's up from ~400KB/sec to 10MB/sec) and makes sync(1)
    finish properly even when new pages are being dirted.
    
    Thanks to Jan Kara <jack@suse.cz> for spotting this problem!
    
    Cc: stable@kernel.org
    Reported-by: Jan Engelhardt <jengelh@medozas.de>
    Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 9d5360c..8a46c67 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -773,6 +773,8 @@ static long wb_writeback(struct bdi_writeback *wb,
 	}
 
 	for (;;) {
+		long to_write = 0;
+
 		/*
 		 * Stop writeback when nr_pages has been consumed
 		 */
@@ -786,13 +788,18 @@ static long wb_writeback(struct bdi_writeback *wb,
 		if (args->for_background && !over_bground_thresh())
 			break;
 
+		if (args->sync_mode == WB_SYNC_ALL)
+			to_write = args->nr_pages;
+		if (!to_write)
+			to_write = MAX_WRITEBACK_PAGES;
+
 		wbc.more_io = 0;
 		wbc.encountered_congestion = 0;
-		wbc.nr_to_write = MAX_WRITEBACK_PAGES;
+		wbc.nr_to_write = to_write;
 		wbc.pages_skipped = 0;
 		writeback_inodes_wb(wb, &wbc);
-		args->nr_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
-		wrote += MAX_WRITEBACK_PAGES - wbc.nr_to_write;
+		args->nr_pages -= to_write - wbc.nr_to_write;
+		wrote += to_write - wbc.nr_to_write;
 
 		/*
 		 * If we consumed everything, see if we have more
@@ -807,7 +814,7 @@ static long wb_writeback(struct bdi_writeback *wb,
 		/*
 		 * Did we write something? Try for more
 		 */
-		if (wbc.nr_to_write < MAX_WRITEBACK_PAGES)
+		if (wbc.nr_to_write < to_write)
 			continue;
 		/*
 		 * Nothing written. Wait for some inode to

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync
  2010-03-03 12:45           ` John Hughes
@ 2010-03-03 12:09             ` Jens Axboe
  0 siblings, 0 replies; 13+ messages in thread
From: Jens Axboe @ 2010-03-03 12:09 UTC (permalink / raw)
  To: John Hughes; +Cc: Andrew Morton, bugzilla-daemon, bugme-daemon, linux-kernel

On Wed, Mar 03 2010, John Hughes wrote:
> Jens Axboe wrote:
>> On Wed, Mar 03 2010, John Hughes wrote:
>>   
>>> Just a thought.  What exactly is sync(2) supposed to do - block until 
>>>  there are no more dirty pages, or block until all pages that were 
>>> dirty  when the sync was done are clean?  In other words is the 
>>> problem simply  that pages are being dirtied faster than the sync is 
>>> writing them out?
>>>     
>>
>> Our sync is currently broken in that regard, since it'll wait for too
>> long. We have a debated patch going, I have included it below. Any
>> chance you could give it a whirl?
>>
>> The semantics of sync are supposed to be 'wait for dirty IO generated
>> BEFORE this sync call'.
>>   
> Ok, I will try this patch and get back to you.

Looking at your bug report, I saw that you were on 2.6.32. If you are
testing 2.6.33, let me know, the patch wont apply (I have another
variant for that).

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync
  2010-03-03  0:16 ` [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync Andrew Morton
@ 2010-03-03 12:09   ` John Hughes
  2010-03-03 11:50     ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: John Hughes @ 2010-03-03 12:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: bugzilla-daemon, bugme-daemon, linux-kernel, Jens Axboe, John Hughes

Andrew Morton wrote:
> When the system is stuck, please run
>
> 	echo w > /proc/sysrq-trigger
>   
Ok, we have:

[ 1128.964403] kjournald     D 0000000000000000     0  2196      2 0x00000000
[ 1128.964407]  ffffffff8144b1f0 0000000000000046 0000000000000000 ffffffff811837bb
[ 1128.964409]  ffff88063d1762e0 ffff880340002c80 000000000000f8a0 ffff88063c6e9fd8
[ 1128.964412]  00000000000155c0 00000000000155c0 ffff88063d1762e0 ffff88063d1765d8
[ 1128.964415] Call Trace:
[ 1128.964423]  [<ffffffff811837bb>] ? cfq_set_request+0x2af/0x319
[ 1128.964427]  [<ffffffff811780d7>] ? get_request+0x7e/0x2ba
[ 1128.964431]  [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
[ 1128.964433]  [<ffffffff81178403>] ? get_request_wait+0xf0/0x188
[ 1128.964437]  [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
[ 1128.964441]  [<ffffffff8117b28d>] ? blk_recount_segments+0x17/0x27
[ 1128.964443]  [<ffffffff81178792>] ? __make_request+0x2f7/0x428
[ 1128.964447]  [<ffffffff81176f6f>] ? generic_make_request+0x299/0x2f9
[ 1128.964450]  [<ffffffff811770a5>] ? submit_bio+0xd6/0xf2
[ 1128.964455]  [<ffffffff8110b50b>] ? submit_bh+0xf5/0x115
[ 1128.964463]  [<ffffffffa017f5a2>] ? journal_commit_transaction+0x8d9/0xe2b [jbd]
[ 1128.964467]  [<ffffffffa01823fb>] ? kjournald+0xdf/0x226 [jbd]
[ 1128.964469]  [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
[ 1128.964472]  [<ffffffffa018231c>] ? kjournald+0x0/0x226 [jbd]
[ 1128.964474]  [<ffffffff81064789>] ? kthread+0x79/0x81
[ 1128.964479]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 1128.964481]  [<ffffffff81064710>] ? kthread+0x0/0x81
[ 1128.964483]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20

And three of (flush-8:32, flush-8:48 and flush-8:64):

[ 1128.964485] flush-8:32    D 0000000000000000     0  2221      2 0x00000000
[ 1128.964488]  ffff88033e44b880 0000000000000046 0000000000000000 ffffffff811837bb
[ 1128.964490]  ffff88063d177100 0000001000022c80 000000000000f8a0 ffff88063b66dfd8
[ 1128.964493]  00000000000155c0 00000000000155c0 ffff88063d177100 ffff88063d1773f8
[ 1128.964495] Call Trace:
[ 1128.964497]  [<ffffffff811837bb>] ? cfq_set_request+0x2af/0x319
[ 1128.964500]  [<ffffffff811780d7>] ? get_request+0x7e/0x2ba
[ 1128.964502]  [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
[ 1128.964504]  [<ffffffff81178403>] ? get_request_wait+0xf0/0x188
[ 1128.964506]  [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
[ 1128.964508]  [<ffffffff81178792>] ? __make_request+0x2f7/0x428
[ 1128.964511]  [<ffffffff81176f6f>] ? generic_make_request+0x299/0x2f9
[ 1128.964514]  [<ffffffff810114ce>] ? common_interrupt+0xe/0x13
[ 1128.964516]  [<ffffffff811770a5>] ? submit_bio+0xd6/0xf2
[ 1128.964518]  [<ffffffff8110b50b>] ? submit_bh+0xf5/0x115
[ 1128.964522]  [<ffffffff8110da7c>] ? __block_write_full_page+0x1d6/0x2ac
[ 1128.964525]  [<ffffffff8110c801>] ? end_buffer_async_write+0x0/0x13b
[ 1128.964528]  [<ffffffff8110fabc>] ? blkdev_get_block+0x0/0x57
[ 1128.964532]  [<ffffffff810b9572>] ? __writepage+0xa/0x25
[ 1128.964534]  [<ffffffff810b9bf9>] ? write_cache_pages+0x20b/0x327
[ 1128.964536]  [<ffffffff810b9568>] ? __writepage+0x0/0x25
[ 1128.964541]  [<ffffffff8110642e>] ? writeback_single_inode+0xe7/0x2da
[ 1128.964543]  [<ffffffff81107107>] ? writeback_inodes_wb+0x423/0x4fe
[ 1128.964546]  [<ffffffff8110730e>] ? wb_writeback+0x12c/0x1ab
[ 1128.964549]  [<ffffffff81107574>] ? wb_do_writeback+0x145/0x15b
[ 1128.964551]  [<ffffffff811075bb>] ? bdi_writeback_task+0x31/0x9d
[ 1128.964556]  [<ffffffff810c7cc6>] ? bdi_start_fn+0x0/0xca
[ 1128.964558]  [<ffffffff810c7d36>] ? bdi_start_fn+0x70/0xca
[ 1128.964560]  [<ffffffff810c7cc6>] ? bdi_start_fn+0x0/0xca
[ 1128.964562]  [<ffffffff81064789>] ? kthread+0x79/0x81
[ 1128.964564]  [<ffffffff81011baa>] ? child_rip+0xa/0x20
[ 1128.964566]  [<ffffffff81064710>] ? kthread+0x0/0x81
[ 1128.964568]  [<ffffffff81011ba0>] ? child_rip+0x0/0x20


And a few copies of bonnie (14 in total):

Two like this:

[ 1128.964735] bonnie        D 0000000000000000     0  2240   2236 0x00000000
[ 1128.964737]  ffff88033e44b880 0000000000000086 0000000000000000 00000000000009ef
[ 1128.964740]  ffff8802bac52000 0000000000430bf5 000000000000f8a0 ffff88033d031fd8
[ 1128.964743]  00000000000155c0 00000000000155c0 ffff88033d811c40 ffff88033d811f38
[ 1128.964745] Call Trace:
[ 1128.964747]  [<ffffffff8110c637>] ? sync_buffer+0x0/0x40
[ 1128.964749]  [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
[ 1128.964752]  [<ffffffff8110c672>] ? sync_buffer+0x3b/0x40
[ 1128.964754]  [<ffffffff812ee339>] ? __wait_on_bit_lock+0x3f/0x84
[ 1128.964756]  [<ffffffff8110c637>] ? sync_buffer+0x0/0x40
[ 1128.964758]  [<ffffffff812ee3e9>] ? out_of_line_wait_on_bit_lock+0x6b/0x77
[ 1128.964760]  [<ffffffff81064a84>] ? wake_bit_function+0x0/0x23
[ 1128.964764]  [<ffffffffa017e068>] ? do_get_write_access+0x82/0x441 [jbd]
[ 1128.964766]  [<ffffffff8110bbed>] ? __getblk+0x26/0x29a
[ 1128.964769]  [<ffffffffa017e449>] ? journal_get_write_access+0x22/0x33 [jbd]
[ 1128.964774]  [<ffffffffa01a040f>] ? __ext3_journal_get_write_access+0x1e/0x47 [ext3]
[ 1128.964778]  [<ffffffffa019418e>] ? ext3_reserve_inode_write+0x3e/0x75 [ext3]
[ 1128.964782]  [<ffffffffa0199d86>] ? ext3_orphan_add+0x94/0x15b [ext3]
[ 1128.964786]  [<ffffffffa01941f8>] ? ext3_mark_inode_dirty+0x33/0x3c [ext3]
[ 1128.964790]  [<ffffffffa019a6cb>] ? ext3_unlink+0x166/0x1b4 [ext3]
[ 1128.964793]  [<ffffffff810f645d>] ? vfs_unlink+0x72/0xa6
[ 1128.964795]  [<ffffffff810f7cb5>] ? do_unlinkat+0xad/0x134
[ 1128.964797]  [<ffffffff810f9b11>] ? vfs_readdir+0x92/0xa7
[ 1128.964800]  [<ffffffff810f9c9d>] ? sys_getdents+0xb9/0xc7
[ 1128.964803]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b

One like this:

[ 1128.964804] bonnie        D 0000000000000000     0  2241   2236 0x00000000
[ 1128.964807]  ffff88033e44b880 0000000000000082 0000000000000000 ffffffff810b90a9
[ 1128.964809]  ffff880000022c80 0000000000000000 000000000000f8a0 ffff88033c3abfd8
[ 1128.964812]  00000000000155c0 00000000000155c0 ffff88033d813f90 ffff88033d814288
[ 1128.964814] Call Trace:
[ 1128.964816]  [<ffffffff810b90a9>] ? __alloc_pages_nodemask+0x10f/0x5ce
[ 1128.964819]  [<ffffffff81064977>] ? bit_waitqueue+0x10/0xa0
[ 1128.964822]  [<ffffffffa017e212>] ? do_get_write_access+0x22c/0x441 [jbd]
[ 1128.964824]  [<ffffffff81064a84>] ? wake_bit_function+0x0/0x23
[ 1128.964826]  [<ffffffff8110bbed>] ? __getblk+0x26/0x29a
[ 1128.964829]  [<ffffffff810170cd>] ? read_tsc+0xa/0x20
[ 1128.964832]  [<ffffffffa017e449>] ? journal_get_write_access+0x22/0x33 [jbd]
[ 1128.964836]  [<ffffffffa01a040f>] ? __ext3_journal_get_write_access+0x1e/0x47 [ext3]
[ 1128.964840]  [<ffffffffa019418e>] ? ext3_reserve_inode_write+0x3e/0x75 [ext3]
[ 1128.964844]  [<ffffffffa01941e6>] ? ext3_mark_inode_dirty+0x21/0x3c [ext3]
[ 1128.964847]  [<ffffffffa019a6b4>] ? ext3_unlink+0x14f/0x1b4 [ext3]
[ 1128.964850]  [<ffffffff810f645d>] ? vfs_unlink+0x72/0xa6
[ 1128.964852]  [<ffffffff810f7cb5>] ? do_unlinkat+0xad/0x134
[ 1128.964854]  [<ffffffff810f9b11>] ? vfs_readdir+0x92/0xa7
[ 1128.964856]  [<ffffffff810f9c9d>] ? sys_getdents+0xb9/0xc7
[ 1128.964859]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b


One like this:

[ 1128.964955] bonnie        D 0000000000000000     0  2244   2236 0x00000000
[ 1128.964958]  ffffffff8144b1f0 0000000000000082 0000000000000000 ffffffff811837bb
[ 1128.964960]  ffff88033d813170 00000010a00fc7ab 000000000000f8a0 ffff88033ceadfd8
[ 1128.964963]  00000000000155c0 00000000000155c0 ffff88033d813170 ffff88033d813468
[ 1128.964965] Call Trace:
[ 1128.964967]  [<ffffffff811837bb>] ? cfq_set_request+0x2af/0x319
[ 1128.964969]  [<ffffffff811780d7>] ? get_request+0x7e/0x2ba
[ 1128.964971]  [<ffffffff812edf36>] ? io_schedule+0x73/0xb7
[ 1128.964973]  [<ffffffff81178403>] ? get_request_wait+0xf0/0x188
[ 1128.964976]  [<ffffffff81064a56>] ? autoremove_wake_function+0x0/0x2e
[ 1128.964978]  [<ffffffff81178792>] ? __make_request+0x2f7/0x428
[ 1128.964980]  [<ffffffff81176f6f>] ? generic_make_request+0x299/0x2f9
[ 1128.964983]  [<ffffffff811770a5>] ? submit_bio+0xd6/0xf2
[ 1128.964985]  [<ffffffff8110b50b>] ? submit_bh+0xf5/0x115
[ 1128.964987]  [<ffffffff8110da7c>] ? __block_write_full_page+0x1d6/0x2ac
[ 1128.964990]  [<ffffffff8110c801>] ? end_buffer_async_write+0x0/0x13b
[ 1128.964992]  [<ffffffff8110fabc>] ? blkdev_get_block+0x0/0x57
[ 1128.964994]  [<ffffffff810b9572>] ? __writepage+0xa/0x25
[ 1128.964997]  [<ffffffff810b9bf9>] ? write_cache_pages+0x20b/0x327
[ 1128.964999]  [<ffffffff810b9568>] ? __writepage+0x0/0x25
[ 1128.965003]  [<ffffffff810b3cda>] ? __filemap_fdatawrite_range+0x4b/0x54
[ 1128.965007]  [<ffffffffa019b5bb>] ? ext3_sync_fs+0x42/0x4b [ext3]
[ 1128.965009]  [<ffffffff810b4998>] ? filemap_write_and_wait+0x17/0x32
[ 1128.965013]  [<ffffffff8112d015>] ? sync_quota_sb+0x5d/0xf6
[ 1128.965015]  [<ffffffff8110a38d>] ? __sync_filesystem+0x43/0x70
[ 1128.965017]  [<ffffffff8110a454>] ? sync_filesystems+0x9a/0xe3
[ 1128.965019]  [<ffffffff8110a4e2>] ? sys_sync+0x1c/0x2e
[ 1128.965021]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b

Seven bonnies like this:

[ 1128.965231] bonnie        D 0000000000000000     0  2253   2236 0x00000000
[ 1128.965233]  ffffffff8144b1f0 0000000000000082 0000000000000000 ffff88034ac155c0
[ 1128.965236]  ffff88033ce7f700 ffffffff81047f43 000000000000f8a0 ffff88033c249fd8
[ 1128.965239]  00000000000155c0 00000000000155c0 ffff88033d814db0 ffff88033d8150a8
[ 1128.965241] Call Trace:
[ 1128.965243]  [<ffffffff81047f43>] ? finish_task_switch+0x3a/0xa7
[ 1128.965246]  [<ffffffff812ee653>] ? __mutex_lock_common+0x10b/0x17b
[ 1128.965248]  [<ffffffff812ee77b>] ? mutex_lock+0x1a/0x31
[ 1128.965250]  [<ffffffff8110a3cd>] ? sync_filesystems+0x13/0xe3
[ 1128.965252]  [<ffffffff8110a4d8>] ? sys_sync+0x12/0x2e
[ 1128.965255]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b

And three bonnies like this:

[ 1128.965023] bonnie        D 0000000000000000     0  2245   2236 0x00000000
[ 1128.965026]  ffff88033e44b880 0000000000000082 0000000000000000 00000000000155c0
[ 1128.965028]  0000000000000005 00000000000155c0 000000000000f8a0 ffff88033cfc3fd8
[ 1128.965031]  00000000000155c0 00000000000155c0 ffff88033d817100 ffff88033d8173f8
[ 1128.965033] Call Trace:
[ 1128.965035]  [<ffffffff812ee653>] ? __mutex_lock_common+0x10b/0x17b
[ 1128.965038]  [<ffffffff8104a196>] ? try_to_wake_up+0x249/0x259
[ 1128.965040]  [<ffffffff812ee77b>] ? mutex_lock+0x1a/0x31
[ 1128.965042]  [<ffffffff8110a3cd>] ? sync_filesystems+0x13/0xe3
[ 1128.965044]  [<ffffffff8110a4d8>] ? sys_sync+0x12/0x2e
[ 1128.965047]  [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync
  2010-03-03 11:50     ` Jens Axboe
@ 2010-03-03 12:37       ` John Hughes
  2010-03-03 12:03         ` Jens Axboe
  0 siblings, 1 reply; 13+ messages in thread
From: John Hughes @ 2010-03-03 12:37 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, bugzilla-daemon, bugme-daemon, linux-kernel

Jens Axboe wrote:
> Is IO still going on, or does it appear to be stuck? From the traces
> below, we have various procs caught in waiting for a request. So if
> things are totally stuck, it could be some race in there.
>   
I see I/O happening on three or four of the disks.

Just a thought.  What exactly is sync(2) supposed to do - block until 
there are no more dirty pages, or block until all pages that were dirty 
when the sync was done are clean?  In other words is the problem simply 
that pages are being dirtied faster than the sync is writing them out?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync
  2010-03-03 12:03         ` Jens Axboe
@ 2010-03-03 12:45           ` John Hughes
  2010-03-03 12:09             ` Jens Axboe
  2010-03-03 14:42           ` Andre Noll
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: John Hughes @ 2010-03-03 12:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, bugzilla-daemon, bugme-daemon, linux-kernel

Jens Axboe wrote:
> On Wed, Mar 03 2010, John Hughes wrote:
>   
>> Just a thought.  What exactly is sync(2) supposed to do - block until  
>> there are no more dirty pages, or block until all pages that were dirty  
>> when the sync was done are clean?  In other words is the problem simply  
>> that pages are being dirtied faster than the sync is writing them out?
>>     
>
> Our sync is currently broken in that regard, since it'll wait for too
> long. We have a debated patch going, I have included it below. Any
> chance you could give it a whirl?
>
> The semantics of sync are supposed to be 'wait for dirty IO generated
> BEFORE this sync call'.
>   
Ok, I will try this patch and get back to you.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync
  2010-03-03 12:03         ` Jens Axboe
  2010-03-03 12:45           ` John Hughes
@ 2010-03-03 14:42           ` Andre Noll
  2010-03-04 14:55             ` John Hughes
  2010-03-03 18:33           ` John Hughes
  2010-03-04 11:16           ` John Hughes
  3 siblings, 1 reply; 13+ messages in thread
From: Andre Noll @ 2010-03-03 14:42 UTC (permalink / raw)
  To: Jens Axboe
  Cc: John Hughes, Andrew Morton, bugzilla-daemon, bugme-daemon, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1577 bytes --]

On 13:03, Jens Axboe wrote:

> Our sync is currently broken in that regard, since it'll wait for too
> long. We have a debated patch going, I have included it below. Any
> chance you could give it a whirl?

That's the same patch you posted two weeks ago and on which Linus
commented he thinks it is broken. He also proposed another, simpler
patch (included below) which I'm running for a few days and which
fixes the sync problem for me.

However, it is not clear at all whether that simpler patch is
correct. Any comments on that?

Thanks
Andre

---
 fs/fs-writeback.c |   15 ++++++++++-----
 1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1a7c42c..a0a8424 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -738,11 +738,16 @@ static long wb_writeback(struct bdi_writeback *wb,
 	long wrote = 0;
 	struct inode *inode;
 
-	if (wbc.for_kupdate) {
-		wbc.older_than_this = &oldest_jif;
-		oldest_jif = jiffies -
-				msecs_to_jiffies(dirty_expire_interval * 10);
-	}
+	/*
+	 * We never write back data that was dirtied _after_ we
+	 * started writeback. But kupdate doesn't even want to
+	 * write back recently dirtied stuff, only older data.
+	 */
+	oldest_jif = jiffies-1;
+	wbc.older_than_this = &oldest_jif;
+	if (wbc.for_kupdate)
+		oldest_jif -= msecs_to_jiffies(dirty_expire_interval * 10);
+
 	if (!wbc.range_cyclic) {
 		wbc.range_start = 0;
 		wbc.range_end = LLONG_MAX;
-- 
The only person who always got his work done by Friday was Robinson Crusoe

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync
  2010-03-03 12:03         ` Jens Axboe
  2010-03-03 12:45           ` John Hughes
  2010-03-03 14:42           ` Andre Noll
@ 2010-03-03 18:33           ` John Hughes
  2010-03-04 11:16           ` John Hughes
  3 siblings, 0 replies; 13+ messages in thread
From: John Hughes @ 2010-03-03 18:33 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, bugzilla-daemon, bugme-daemon, linux-kernel

Jens Axboe wrote:
> Our sync is currently broken in that regard, since it'll wait for too
> long. We have a debated patch going, I have included it below. Any
> chance you could give it a whirl?
>
> The semantics of sync are supposed to be 'wait for dirty IO generated
> BEFORE this sync call'.
>
>   
This patch doesn't seem to change things.  It's a bit late here, I'll 
check my results tomorrow.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync
  2010-03-03 12:03         ` Jens Axboe
                             ` (2 preceding siblings ...)
  2010-03-03 18:33           ` John Hughes
@ 2010-03-04 11:16           ` John Hughes
  3 siblings, 0 replies; 13+ messages in thread
From: John Hughes @ 2010-03-04 11:16 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Andrew Morton, bugzilla-daemon, bugme-daemon, linux-kernel

Jens Axboe wrote:
> Our sync is currently broken in that regard, since it'll wait for too
> long. We have a debated patch going, I have included it below. Any
> chance you could give it a whirl?
>   
Ok, I confirm that this patch does not fix my problem.

I'll try the "simpler" patch.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync
  2010-03-03 14:42           ` Andre Noll
@ 2010-03-04 14:55             ` John Hughes
  2010-03-04 17:42               ` Andre Noll
  0 siblings, 1 reply; 13+ messages in thread
From: John Hughes @ 2010-03-04 14:55 UTC (permalink / raw)
  To: Andre Noll
  Cc: Jens Axboe, Andrew Morton, bugzilla-daemon, bugme-daemon, linux-kernel

Andre Noll wrote:
> On 13:03, Jens Axboe wrote:
>
>   
>> Our sync is currently broken in that regard, since it'll wait for too
>> long. We have a debated patch going, I have included it below. Any
>> chance you could give it a whirl?
>>     
>
> That's the same patch you posted two weeks ago and on which Linus
> commented he thinks it is broken. He also proposed another, simpler
> patch (included below) which I'm running for a few days and which
> fixes the sync problem for me.
>   
I've tried the "simpler patch" on my system and it makes no difference 
for me.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync
  2010-03-04 14:55             ` John Hughes
@ 2010-03-04 17:42               ` Andre Noll
  2010-03-05 10:44                 ` John Hughes
  0 siblings, 1 reply; 13+ messages in thread
From: Andre Noll @ 2010-03-04 17:42 UTC (permalink / raw)
  To: John Hughes
  Cc: Jens Axboe, Andrew Morton, bugzilla-daemon, bugme-daemon, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 639 bytes --]

On 15:55, John Hughes wrote:
> >That's the same patch you posted two weeks ago and on which Linus
> >commented he thinks it is broken. He also proposed another, simpler
> >patch (included below) which I'm running for a few days and which
> >fixes the sync problem for me.
> >  
> I've tried the "simpler patch" on my system and it makes no difference 
> for me.

Does it make no difference to Jens' patch, or no difference to an
unpatched kernel?

BTW, switching from cfq to noop or deadline also improved things a
bit for me.

Thanks
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync
  2010-03-04 17:42               ` Andre Noll
@ 2010-03-05 10:44                 ` John Hughes
  0 siblings, 0 replies; 13+ messages in thread
From: John Hughes @ 2010-03-05 10:44 UTC (permalink / raw)
  To: Andre Noll
  Cc: Jens Axboe, Andrew Morton, bugzilla-daemon, bugme-daemon, linux-kernel

Andre Noll wrote:
> On 15:55, John Hughes wrote:
>   
>>> That's the same patch you posted two weeks ago and on which Linus
>>> commented he thinks it is broken. He also proposed another, simpler
>>> patch (included below) which I'm running for a few days and which
>>> fixes the sync problem for me.
>>>  
>>>       
>> I've tried the "simpler patch" on my system and it makes no difference 
>> for me.
>>     
>
> Does it make no difference to Jens' patch, or no difference to an
> unpatched kernel?
>   
No difference to an unpatched kernel.  I didn't think of stacking the 
patches.
> BTW, switching from cfq to noop or deadline also improved things a
> bit for me.
>   
Will try that.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-03-05 10:44 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <bug-15426-10286@http.bugzilla.kernel.org/>
2010-03-03  0:16 ` [Bugme-new] [Bug 15426] New: Running many copies of bonnie++ on different filesystems seems to deadlock in sync Andrew Morton
2010-03-03 12:09   ` John Hughes
2010-03-03 11:50     ` Jens Axboe
2010-03-03 12:37       ` John Hughes
2010-03-03 12:03         ` Jens Axboe
2010-03-03 12:45           ` John Hughes
2010-03-03 12:09             ` Jens Axboe
2010-03-03 14:42           ` Andre Noll
2010-03-04 14:55             ` John Hughes
2010-03-04 17:42               ` Andre Noll
2010-03-05 10:44                 ` John Hughes
2010-03-03 18:33           ` John Hughes
2010-03-04 11:16           ` John Hughes

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.