* [PATCH 0/11] Per-bdi writeback flusher threads v9
@ 2009-05-28 11:46 Jens Axboe
2009-05-28 11:46 ` [PATCH 01/11] ntfs: remove old debug check for dirty data in ntfs_put_super() Jens Axboe
` (16 more replies)
0 siblings, 17 replies; 70+ messages in thread
From: Jens Axboe @ 2009-05-28 11:46 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel, tytso
Cc: chris.mason, david, hch, akpm, jack, yanmin_zhang, richard, damien.wyart
Hi,
Here's the 9th version of the writeback patches. Changes since v8:
- Fix a bdi_work on-stack allocation hang. I hope this fixes Ted's
issue.
- Get rid of the explicit wait queues, we can just use wake_up_process()
since it's just for that one task.
- Add separate "sync_supers" thread that makes sure that the dirty
super blocks get written. We cannot safely do this from bdi_forker_task(),
as that risks deadlocking on ->s_umount. Artem, I implemented this
by doing the wake ups from a timer so that it would be easier for you
to just deactivate the timer when there are no super blocks.
For ease of patching, I've put the full diff here:
http://kernel.dk/writeback-v9.patch
and also stored this in a writeback-v9 branch that will not change,
you can pull that into Linus tree from here:
git://git.kernel.dk/linux-2.6-block.git writeback-v9
block/blk-core.c | 1 +
drivers/block/aoe/aoeblk.c | 1 +
drivers/char/mem.c | 1 +
fs/btrfs/disk-io.c | 24 +-
fs/buffer.c | 2 +-
fs/char_dev.c | 1 +
fs/configfs/inode.c | 1 +
fs/fs-writeback.c | 804 ++++++++++++++++++++++++++++-------
fs/fuse/inode.c | 1 +
fs/hugetlbfs/inode.c | 1 +
fs/nfs/client.c | 1 +
fs/ntfs/super.c | 33 +--
fs/ocfs2/dlm/dlmfs.c | 1 +
fs/ramfs/inode.c | 1 +
fs/super.c | 3 -
fs/sync.c | 2 +-
fs/sysfs/inode.c | 1 +
fs/ubifs/super.c | 1 +
include/linux/backing-dev.h | 73 ++++-
include/linux/fs.h | 11 +-
include/linux/writeback.h | 15 +-
kernel/cgroup.c | 1 +
mm/Makefile | 2 +-
mm/backing-dev.c | 518 ++++++++++++++++++++++-
mm/page-writeback.c | 151 +------
mm/pdflush.c | 269 ------------
mm/swap_state.c | 1 +
mm/vmscan.c | 2 +-
28 files changed, 1286 insertions(+), 637 deletions(-)
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* [PATCH 01/11] ntfs: remove old debug check for dirty data in ntfs_put_super()
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
@ 2009-05-28 11:46 ` Jens Axboe
2009-05-28 11:46 ` [PATCH 02/11] btrfs: properly register fs backing device Jens Axboe
` (15 subsequent siblings)
16 siblings, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-05-28 11:46 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel, tytso
Cc: chris.mason, david, hch, akpm, jack, yanmin_zhang, richard,
damien.wyart, Jens Axboe
This should not trigger anymore, so kill it.
Acked-by: Anton Altaparmakov <aia21@cam.ac.uk>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
fs/ntfs/super.c | 33 +++------------------------------
1 files changed, 3 insertions(+), 30 deletions(-)
diff --git a/fs/ntfs/super.c b/fs/ntfs/super.c
index f76951d..3fc03bd 100644
--- a/fs/ntfs/super.c
+++ b/fs/ntfs/super.c
@@ -2373,39 +2373,12 @@ static void ntfs_put_super(struct super_block *sb)
vol->mftmirr_ino = NULL;
}
/*
- * If any dirty inodes are left, throw away all mft data page cache
- * pages to allow a clean umount. This should never happen any more
- * due to mft.c::ntfs_mft_writepage() cleaning all the dirty pages as
- * the underlying mft records are written out and cleaned. If it does,
- * happen anyway, we want to know...
+ * We should have no dirty inodes left, due to
+ * mft.c::ntfs_mft_writepage() cleaning all the dirty pages as
+ * the underlying mft records are written out and cleaned.
*/
ntfs_commit_inode(vol->mft_ino);
write_inode_now(vol->mft_ino, 1);
- if (sb_has_dirty_inodes(sb)) {
- const char *s1, *s2;
-
- mutex_lock(&vol->mft_ino->i_mutex);
- truncate_inode_pages(vol->mft_ino->i_mapping, 0);
- mutex_unlock(&vol->mft_ino->i_mutex);
- write_inode_now(vol->mft_ino, 1);
- if (sb_has_dirty_inodes(sb)) {
- static const char *_s1 = "inodes";
- static const char *_s2 = "";
- s1 = _s1;
- s2 = _s2;
- } else {
- static const char *_s1 = "mft pages";
- static const char *_s2 = "They have been thrown "
- "away. ";
- s1 = _s1;
- s2 = _s2;
- }
- ntfs_error(sb, "Dirty %s found at umount time. %sYou should "
- "run chkdsk. Please email "
- "linux-ntfs-dev@lists.sourceforge.net and say "
- "that you saw this message. Thank you.", s1,
- s2);
- }
#endif /* NTFS_RW */
iput(vol->mft_ino);
--
1.6.3.rc0.1.gf800
^ permalink raw reply related [flat|nested] 70+ messages in thread
* [PATCH 02/11] btrfs: properly register fs backing device
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
2009-05-28 11:46 ` [PATCH 01/11] ntfs: remove old debug check for dirty data in ntfs_put_super() Jens Axboe
@ 2009-05-28 11:46 ` Jens Axboe
2009-05-28 11:46 ` [PATCH 03/11] writeback: move dirty inodes from super_block to backing_dev_info Jens Axboe
` (14 subsequent siblings)
16 siblings, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-05-28 11:46 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel, tytso
Cc: chris.mason, david, hch, akpm, jack, yanmin_zhang, richard,
damien.wyart, Jens Axboe
btrfs assigns this bdi to all inodes on that file system, so make
sure it's registered. This isn't really important now, but will be
when we put dirty inodes there. Even now, we miss the stats when the
bdi isn't visible.
Also fixes failure to check bdi_init() return value, and bad inherit of
->capabilities flags from the default bdi.
Acked-by: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
fs/btrfs/disk-io.c | 23 ++++++++++++++++++-----
1 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 4b0ea0b..2dc19c9 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1345,12 +1345,24 @@ static void btrfs_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
free_extent_map(em);
}
+/*
+ * If this fails, caller must call bdi_destroy() to get rid of the
+ * bdi again.
+ */
static int setup_bdi(struct btrfs_fs_info *info, struct backing_dev_info *bdi)
{
- bdi_init(bdi);
+ int err;
+
+ bdi->capabilities = BDI_CAP_MAP_COPY;
+ err = bdi_init(bdi);
+ if (err)
+ return err;
+
+ err = bdi_register(bdi, NULL, "btrfs");
+ if (err)
+ return err;
+
bdi->ra_pages = default_backing_dev_info.ra_pages;
- bdi->state = 0;
- bdi->capabilities = default_backing_dev_info.capabilities;
bdi->unplug_io_fn = btrfs_unplug_io_fn;
bdi->unplug_io_data = info;
bdi->congested_fn = btrfs_congested_fn;
@@ -1574,7 +1586,8 @@ struct btrfs_root *open_ctree(struct super_block *sb,
fs_info->sb = sb;
fs_info->max_extent = (u64)-1;
fs_info->max_inline = 8192 * 1024;
- setup_bdi(fs_info, &fs_info->bdi);
+ if (setup_bdi(fs_info, &fs_info->bdi))
+ goto fail_bdi;
fs_info->btree_inode = new_inode(sb);
fs_info->btree_inode->i_ino = 1;
fs_info->btree_inode->i_nlink = 1;
@@ -1931,8 +1944,8 @@ fail_iput:
btrfs_close_devices(fs_info->fs_devices);
btrfs_mapping_tree_free(&fs_info->mapping_tree);
+fail_bdi:
bdi_destroy(&fs_info->bdi);
-
fail:
kfree(extent_root);
kfree(tree_root);
--
1.6.3.rc0.1.gf800
^ permalink raw reply related [flat|nested] 70+ messages in thread
* [PATCH 03/11] writeback: move dirty inodes from super_block to backing_dev_info
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
2009-05-28 11:46 ` [PATCH 01/11] ntfs: remove old debug check for dirty data in ntfs_put_super() Jens Axboe
2009-05-28 11:46 ` [PATCH 02/11] btrfs: properly register fs backing device Jens Axboe
@ 2009-05-28 11:46 ` Jens Axboe
2009-05-28 11:46 ` [PATCH 04/11] writeback: switch to per-bdi threads for flushing data Jens Axboe
` (13 subsequent siblings)
16 siblings, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-05-28 11:46 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel, tytso
Cc: chris.mason, david, hch, akpm, jack, yanmin_zhang, richard,
damien.wyart, Jens Axboe
This is a first step at introducing per-bdi flusher threads. We should
have no change in behaviour, although sb_has_dirty_inodes() is now
ridiculously expensive, as there's no easy way to answer that question.
Not a huge problem, since it'll be deleted in subsequent patches.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
fs/fs-writeback.c | 196 +++++++++++++++++++++++++++---------------
fs/super.c | 3 -
include/linux/backing-dev.h | 9 ++
include/linux/fs.h | 5 +-
mm/backing-dev.c | 24 +++++
mm/page-writeback.c | 11 +--
6 files changed, 164 insertions(+), 84 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 91013ff..1137408 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -25,6 +25,7 @@
#include <linux/buffer_head.h>
#include "internal.h"
+#define inode_to_bdi(inode) ((inode)->i_mapping->backing_dev_info)
/**
* writeback_acquire - attempt to get exclusive writeback access to a device
@@ -158,12 +159,13 @@ void __mark_inode_dirty(struct inode *inode, int flags)
goto out;
/*
- * If the inode was already on s_dirty/s_io/s_more_io, don't
- * reposition it (that would break s_dirty time-ordering).
+ * If the inode was already on b_dirty/b_io/b_more_io, don't
+ * reposition it (that would break b_dirty time-ordering).
*/
if (!was_dirty) {
inode->dirtied_when = jiffies;
- list_move(&inode->i_list, &sb->s_dirty);
+ list_move(&inode->i_list,
+ &inode_to_bdi(inode)->b_dirty);
}
}
out:
@@ -184,31 +186,30 @@ static int write_inode(struct inode *inode, int sync)
* furthest end of its superblock's dirty-inode list.
*
* Before stamping the inode's ->dirtied_when, we check to see whether it is
- * already the most-recently-dirtied inode on the s_dirty list. If that is
+ * already the most-recently-dirtied inode on the b_dirty list. If that is
* the case then the inode must have been redirtied while it was being written
* out and we don't reset its dirtied_when.
*/
static void redirty_tail(struct inode *inode)
{
- struct super_block *sb = inode->i_sb;
+ struct backing_dev_info *bdi = inode_to_bdi(inode);
- if (!list_empty(&sb->s_dirty)) {
- struct inode *tail_inode;
+ if (!list_empty(&bdi->b_dirty)) {
+ struct inode *tail;
- tail_inode = list_entry(sb->s_dirty.next, struct inode, i_list);
- if (time_before(inode->dirtied_when,
- tail_inode->dirtied_when))
+ tail = list_entry(bdi->b_dirty.next, struct inode, i_list);
+ if (time_before(inode->dirtied_when, tail->dirtied_when))
inode->dirtied_when = jiffies;
}
- list_move(&inode->i_list, &sb->s_dirty);
+ list_move(&inode->i_list, &bdi->b_dirty);
}
/*
- * requeue inode for re-scanning after sb->s_io list is exhausted.
+ * requeue inode for re-scanning after bdi->b_io list is exhausted.
*/
static void requeue_io(struct inode *inode)
{
- list_move(&inode->i_list, &inode->i_sb->s_more_io);
+ list_move(&inode->i_list, &inode_to_bdi(inode)->b_more_io);
}
static void inode_sync_complete(struct inode *inode)
@@ -255,18 +256,50 @@ static void move_expired_inodes(struct list_head *delaying_queue,
/*
* Queue all expired dirty inodes for io, eldest first.
*/
-static void queue_io(struct super_block *sb,
- unsigned long *older_than_this)
+static void queue_io(struct backing_dev_info *bdi,
+ unsigned long *older_than_this)
+{
+ list_splice_init(&bdi->b_more_io, bdi->b_io.prev);
+ move_expired_inodes(&bdi->b_dirty, &bdi->b_io, older_than_this);
+}
+
+static int sb_on_inode_list(struct super_block *sb, struct list_head *list)
{
- list_splice_init(&sb->s_more_io, sb->s_io.prev);
- move_expired_inodes(&sb->s_dirty, &sb->s_io, older_than_this);
+ struct inode *inode;
+ int ret = 0;
+
+ spin_lock(&inode_lock);
+ list_for_each_entry(inode, list, i_list) {
+ if (inode->i_sb == sb) {
+ ret = 1;
+ break;
+ }
+ }
+ spin_unlock(&inode_lock);
+ return ret;
}
int sb_has_dirty_inodes(struct super_block *sb)
{
- return !list_empty(&sb->s_dirty) ||
- !list_empty(&sb->s_io) ||
- !list_empty(&sb->s_more_io);
+ struct backing_dev_info *bdi;
+ int ret = 0;
+
+ /*
+ * This is REALLY expensive right now, but it'll go away
+ * when the bdi writeback is introduced
+ */
+ mutex_lock(&bdi_lock);
+ list_for_each_entry(bdi, &bdi_list, bdi_list) {
+ if (sb_on_inode_list(sb, &bdi->b_dirty) ||
+ sb_on_inode_list(sb, &bdi->b_io) ||
+ sb_on_inode_list(sb, &bdi->b_more_io)) {
+ ret = 1;
+ break;
+ }
+ }
+ mutex_unlock(&bdi_lock);
+
+ return ret;
}
EXPORT_SYMBOL(sb_has_dirty_inodes);
@@ -322,11 +355,11 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc)
/*
* We didn't write back all the pages. nfs_writepages()
* sometimes bales out without doing anything. Redirty
- * the inode; Move it from s_io onto s_more_io/s_dirty.
+ * the inode; Move it from b_io onto b_more_io/b_dirty.
*/
/*
* akpm: if the caller was the kupdate function we put
- * this inode at the head of s_dirty so it gets first
+ * this inode at the head of b_dirty so it gets first
* consideration. Otherwise, move it to the tail, for
* the reasons described there. I'm not really sure
* how much sense this makes. Presumably I had a good
@@ -336,7 +369,7 @@ __sync_single_inode(struct inode *inode, struct writeback_control *wbc)
if (wbc->for_kupdate) {
/*
* For the kupdate function we move the inode
- * to s_more_io so it will get more writeout as
+ * to b_more_io so it will get more writeout as
* soon as the queue becomes uncongested.
*/
inode->i_state |= I_DIRTY_PAGES;
@@ -402,10 +435,10 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
if ((wbc->sync_mode != WB_SYNC_ALL) && (inode->i_state & I_SYNC)) {
/*
* We're skipping this inode because it's locked, and we're not
- * doing writeback-for-data-integrity. Move it to s_more_io so
- * that writeback can proceed with the other inodes on s_io.
+ * doing writeback-for-data-integrity. Move it to b_more_io so
+ * that writeback can proceed with the other inodes on b_io.
* We'll have another go at writing back this inode when we
- * completed a full scan of s_io.
+ * completed a full scan of b_io.
*/
requeue_io(inode);
return 0;
@@ -428,51 +461,34 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
return __sync_single_inode(inode, wbc);
}
-/*
- * Write out a superblock's list of dirty inodes. A wait will be performed
- * upon no inodes, all inodes or the final one, depending upon sync_mode.
- *
- * If older_than_this is non-NULL, then only write out inodes which
- * had their first dirtying at a time earlier than *older_than_this.
- *
- * If we're a pdflush thread, then implement pdflush collision avoidance
- * against the entire list.
- *
- * If `bdi' is non-zero then we're being asked to writeback a specific queue.
- * This function assumes that the blockdev superblock's inodes are backed by
- * a variety of queues, so all inodes are searched. For other superblocks,
- * assume that all inodes are backed by the same queue.
- *
- * FIXME: this linear search could get expensive with many fileystems. But
- * how to fix? We need to go from an address_space to all inodes which share
- * a queue with that address_space. (Easy: have a global "dirty superblocks"
- * list).
- *
- * The inodes to be written are parked on sb->s_io. They are moved back onto
- * sb->s_dirty as they are selected for writing. This way, none can be missed
- * on the writer throttling path, and we get decent balancing between many
- * throttled threads: we don't want them all piling up on inode_sync_wait.
- */
-void generic_sync_sb_inodes(struct super_block *sb,
- struct writeback_control *wbc)
+static void generic_sync_bdi_inodes(struct backing_dev_info *bdi,
+ struct writeback_control *wbc,
+ struct super_block *sb,
+ int is_blkdev_sb)
{
const unsigned long start = jiffies; /* livelock avoidance */
- int sync = wbc->sync_mode == WB_SYNC_ALL;
spin_lock(&inode_lock);
- if (!wbc->for_kupdate || list_empty(&sb->s_io))
- queue_io(sb, wbc->older_than_this);
- while (!list_empty(&sb->s_io)) {
- struct inode *inode = list_entry(sb->s_io.prev,
+ if (!wbc->for_kupdate || list_empty(&bdi->b_io))
+ queue_io(bdi, wbc->older_than_this);
+
+ while (!list_empty(&bdi->b_io)) {
+ struct inode *inode = list_entry(bdi->b_io.prev,
struct inode, i_list);
- struct address_space *mapping = inode->i_mapping;
- struct backing_dev_info *bdi = mapping->backing_dev_info;
long pages_skipped;
+ /*
+ * super block given and doesn't match, skip this inode
+ */
+ if (sb && sb != inode->i_sb) {
+ redirty_tail(inode);
+ continue;
+ }
+
if (!bdi_cap_writeback_dirty(bdi)) {
redirty_tail(inode);
- if (sb_is_blkdev_sb(sb)) {
+ if (is_blkdev_sb) {
/*
* Dirty memory-backed blockdev: the ramdisk
* driver does this. Skip just this inode
@@ -494,14 +510,14 @@ void generic_sync_sb_inodes(struct super_block *sb,
if (wbc->nonblocking && bdi_write_congested(bdi)) {
wbc->encountered_congestion = 1;
- if (!sb_is_blkdev_sb(sb))
+ if (!is_blkdev_sb)
break; /* Skip a congested fs */
requeue_io(inode);
continue; /* Skip a congested blockdev */
}
if (wbc->bdi && bdi != wbc->bdi) {
- if (!sb_is_blkdev_sb(sb))
+ if (!is_blkdev_sb)
break; /* fs has the wrong queue */
requeue_io(inode);
continue; /* blockdev has wrong queue */
@@ -539,13 +555,55 @@ void generic_sync_sb_inodes(struct super_block *sb,
wbc->more_io = 1;
break;
}
- if (!list_empty(&sb->s_more_io))
+ if (!list_empty(&bdi->b_more_io))
wbc->more_io = 1;
}
- if (sync) {
+ spin_unlock(&inode_lock);
+ /* Leave any unwritten inodes on b_io */
+}
+
+/*
+ * Write out a superblock's list of dirty inodes. A wait will be performed
+ * upon no inodes, all inodes or the final one, depending upon sync_mode.
+ *
+ * If older_than_this is non-NULL, then only write out inodes which
+ * had their first dirtying at a time earlier than *older_than_this.
+ *
+ * If we're a pdlfush thread, then implement pdflush collision avoidance
+ * against the entire list.
+ *
+ * If `bdi' is non-zero then we're being asked to writeback a specific queue.
+ * This function assumes that the blockdev superblock's inodes are backed by
+ * a variety of queues, so all inodes are searched. For other superblocks,
+ * assume that all inodes are backed by the same queue.
+ *
+ * FIXME: this linear search could get expensive with many fileystems. But
+ * how to fix? We need to go from an address_space to all inodes which share
+ * a queue with that address_space. (Easy: have a global "dirty superblocks"
+ * list).
+ *
+ * The inodes to be written are parked on bdi->b_io. They are moved back onto
+ * bdi->b_dirty as they are selected for writing. This way, none can be missed
+ * on the writer throttling path, and we get decent balancing between many
+ * throttled threads: we don't want them all piling up on inode_sync_wait.
+ */
+void generic_sync_sb_inodes(struct super_block *sb,
+ struct writeback_control *wbc)
+{
+ const int is_blkdev_sb = sb_is_blkdev_sb(sb);
+ struct backing_dev_info *bdi;
+
+ mutex_lock(&bdi_lock);
+ list_for_each_entry(bdi, &bdi_list, bdi_list)
+ generic_sync_bdi_inodes(bdi, wbc, sb, is_blkdev_sb);
+ mutex_unlock(&bdi_lock);
+
+ if (wbc->sync_mode == WB_SYNC_ALL) {
struct inode *inode, *old_inode = NULL;
+ spin_lock(&inode_lock);
+
/*
* Data integrity sync. Must wait for all pages under writeback,
* because there may have been pages dirtied before our sync
@@ -583,10 +641,8 @@ void generic_sync_sb_inodes(struct super_block *sb,
}
spin_unlock(&inode_lock);
iput(old_inode);
- } else
- spin_unlock(&inode_lock);
+ }
- return; /* Leave any unwritten inodes on s_io */
}
EXPORT_SYMBOL_GPL(generic_sync_sb_inodes);
@@ -601,8 +657,8 @@ static void sync_sb_inodes(struct super_block *sb,
*
* Note:
* We don't need to grab a reference to superblock here. If it has non-empty
- * ->s_dirty it's hadn't been killed yet and kill_super() won't proceed
- * past sync_inodes_sb() until the ->s_dirty/s_io/s_more_io lists are all
+ * ->b_dirty it's hadn't been killed yet and kill_super() won't proceed
+ * past sync_inodes_sb() until the ->b_dirty/b_io/b_more_io lists are all
* empty. Since __sync_single_inode() regains inode_lock before it finally moves
* inode from superblock lists we are OK.
*
diff --git a/fs/super.c b/fs/super.c
index 1943fdf..76dd5b2 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -64,9 +64,6 @@ static struct super_block *alloc_super(struct file_system_type *type)
s = NULL;
goto out;
}
- INIT_LIST_HEAD(&s->s_dirty);
- INIT_LIST_HEAD(&s->s_io);
- INIT_LIST_HEAD(&s->s_more_io);
INIT_LIST_HEAD(&s->s_files);
INIT_LIST_HEAD(&s->s_instances);
INIT_HLIST_HEAD(&s->s_anon);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 0ec2c59..8719c87 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -40,6 +40,8 @@ enum bdi_stat_item {
#define BDI_STAT_BATCH (8*(1+ilog2(nr_cpu_ids)))
struct backing_dev_info {
+ struct list_head bdi_list;
+
unsigned long ra_pages; /* max readahead in PAGE_CACHE_SIZE units */
unsigned long state; /* Always use atomic bitops on this */
unsigned int capabilities; /* Device capabilities */
@@ -58,6 +60,10 @@ struct backing_dev_info {
struct device *dev;
+ struct list_head b_dirty; /* dirty inodes */
+ struct list_head b_io; /* parked for writeback */
+ struct list_head b_more_io; /* parked for more writeback */
+
#ifdef CONFIG_DEBUG_FS
struct dentry *debug_dir;
struct dentry *debug_stats;
@@ -72,6 +78,9 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
void bdi_unregister(struct backing_dev_info *bdi);
+extern struct mutex bdi_lock;
+extern struct list_head bdi_list;
+
static inline void __add_bdi_stat(struct backing_dev_info *bdi,
enum bdi_stat_item item, s64 amount)
{
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3b534e5..6b475d4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -712,7 +712,7 @@ static inline int mapping_writably_mapped(struct address_space *mapping)
struct inode {
struct hlist_node i_hash;
- struct list_head i_list;
+ struct list_head i_list; /* backing dev IO list */
struct list_head i_sb_list;
struct list_head i_dentry;
unsigned long i_ino;
@@ -1329,9 +1329,6 @@ struct super_block {
struct xattr_handler **s_xattr;
struct list_head s_inodes; /* all inodes */
- struct list_head s_dirty; /* dirty inodes */
- struct list_head s_io; /* parked for writeback */
- struct list_head s_more_io; /* parked for more writeback */
struct hlist_head s_anon; /* anonymous dentries for (nfs) exporting */
struct list_head s_files;
/* s_dentry_lru and s_nr_dentry_unused are protected by dcache_lock */
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 493b468..de0bbfe 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -22,6 +22,8 @@ struct backing_dev_info default_backing_dev_info = {
EXPORT_SYMBOL_GPL(default_backing_dev_info);
static struct class *bdi_class;
+DEFINE_MUTEX(bdi_lock);
+LIST_HEAD(bdi_list);
#ifdef CONFIG_DEBUG_FS
#include <linux/debugfs.h>
@@ -211,6 +213,10 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
goto exit;
}
+ mutex_lock(&bdi_lock);
+ list_add_tail(&bdi->bdi_list, &bdi_list);
+ mutex_unlock(&bdi_lock);
+
bdi->dev = dev;
bdi_debug_register(bdi, dev_name(dev));
@@ -225,9 +231,17 @@ int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev)
}
EXPORT_SYMBOL(bdi_register_dev);
+static void bdi_remove_from_list(struct backing_dev_info *bdi)
+{
+ mutex_lock(&bdi_lock);
+ list_del(&bdi->bdi_list);
+ mutex_unlock(&bdi_lock);
+}
+
void bdi_unregister(struct backing_dev_info *bdi)
{
if (bdi->dev) {
+ bdi_remove_from_list(bdi);
bdi_debug_unregister(bdi);
device_unregister(bdi->dev);
bdi->dev = NULL;
@@ -245,6 +259,10 @@ int bdi_init(struct backing_dev_info *bdi)
bdi->min_ratio = 0;
bdi->max_ratio = 100;
bdi->max_prop_frac = PROP_FRAC_BASE;
+ INIT_LIST_HEAD(&bdi->bdi_list);
+ INIT_LIST_HEAD(&bdi->b_io);
+ INIT_LIST_HEAD(&bdi->b_dirty);
+ INIT_LIST_HEAD(&bdi->b_more_io);
for (i = 0; i < NR_BDI_STAT_ITEMS; i++) {
err = percpu_counter_init(&bdi->bdi_stat[i], 0);
@@ -259,6 +277,8 @@ int bdi_init(struct backing_dev_info *bdi)
err:
while (i--)
percpu_counter_destroy(&bdi->bdi_stat[i]);
+
+ bdi_remove_from_list(bdi);
}
return err;
@@ -269,6 +289,10 @@ void bdi_destroy(struct backing_dev_info *bdi)
{
int i;
+ WARN_ON(!list_empty(&bdi->b_dirty));
+ WARN_ON(!list_empty(&bdi->b_io));
+ WARN_ON(!list_empty(&bdi->b_more_io));
+
bdi_unregister(bdi);
for (i = 0; i < NR_BDI_STAT_ITEMS; i++)
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index bb553c3..7c44314 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -319,15 +319,13 @@ static void task_dirty_limit(struct task_struct *tsk, long *pdirty)
/*
*
*/
-static DEFINE_SPINLOCK(bdi_lock);
static unsigned int bdi_min_ratio;
int bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio)
{
int ret = 0;
- unsigned long flags;
- spin_lock_irqsave(&bdi_lock, flags);
+ mutex_lock(&bdi_lock);
if (min_ratio > bdi->max_ratio) {
ret = -EINVAL;
} else {
@@ -339,27 +337,26 @@ int bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio)
ret = -EINVAL;
}
}
- spin_unlock_irqrestore(&bdi_lock, flags);
+ mutex_unlock(&bdi_lock);
return ret;
}
int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned max_ratio)
{
- unsigned long flags;
int ret = 0;
if (max_ratio > 100)
return -EINVAL;
- spin_lock_irqsave(&bdi_lock, flags);
+ mutex_lock(&bdi_lock);
if (bdi->min_ratio > max_ratio) {
ret = -EINVAL;
} else {
bdi->max_ratio = max_ratio;
bdi->max_prop_frac = (PROP_FRAC_BASE * max_ratio) / 100;
}
- spin_unlock_irqrestore(&bdi_lock, flags);
+ mutex_unlock(&bdi_lock);
return ret;
}
--
1.6.3.rc0.1.gf800
^ permalink raw reply related [flat|nested] 70+ messages in thread
* [PATCH 04/11] writeback: switch to per-bdi threads for flushing data
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
` (2 preceding siblings ...)
2009-05-28 11:46 ` [PATCH 03/11] writeback: move dirty inodes from super_block to backing_dev_info Jens Axboe
@ 2009-05-28 11:46 ` Jens Axboe
2009-05-28 14:13 ` Artem Bityutskiy
2009-05-28 11:46 ` [PATCH 05/11] writeback: get rid of pdflush completely Jens Axboe
` (12 subsequent siblings)
16 siblings, 1 reply; 70+ messages in thread
From: Jens Axboe @ 2009-05-28 11:46 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel, tytso
Cc: chris.mason, david, hch, akpm, jack, yanmin_zhang, richard,
damien.wyart, Jens Axboe
This gets rid of pdflush for bdi writeout and kupdated style cleaning.
This is an experiment to see if we get better writeout behaviour with
per-bdi flushing. Some initial tests look pretty encouraging. A sample
ffsb workload that does random writes to files is about 8% faster here
on a simple SATA drive during the benchmark phase. File layout also seems
a LOT more smooth in vmstat:
r b swpd free buff cache si so bi bo in cs us sy id wa
0 1 0 608848 2652 375372 0 0 0 71024 604 24 1 10 48 42
0 1 0 549644 2712 433736 0 0 0 60692 505 27 1 8 48 44
1 0 0 476928 2784 505192 0 0 4 29540 553 24 0 9 53 37
0 1 0 457972 2808 524008 0 0 0 54876 331 16 0 4 38 58
0 1 0 366128 2928 614284 0 0 4 92168 710 58 0 13 53 34
0 1 0 295092 3000 684140 0 0 0 62924 572 23 0 9 53 37
0 1 0 236592 3064 741704 0 0 4 58256 523 17 0 8 48 44
0 1 0 165608 3132 811464 0 0 0 57460 560 21 0 8 54 38
0 1 0 102952 3200 873164 0 0 4 74748 540 29 1 10 48 41
0 1 0 48604 3252 926472 0 0 0 53248 469 29 0 7 47 45
where vanilla tends to fluctuate a lot in the creation phase:
r b swpd free buff cache si so bi bo in cs us sy id wa
1 1 0 678716 5792 303380 0 0 0 74064 565 50 1 11 52 36
1 0 0 662488 5864 319396 0 0 4 352 302 329 0 2 47 51
0 1 0 599312 5924 381468 0 0 0 78164 516 55 0 9 51 40
0 1 0 519952 6008 459516 0 0 4 78156 622 56 1 11 52 37
1 1 0 436640 6092 541632 0 0 0 82244 622 54 0 11 48 41
0 1 0 436640 6092 541660 0 0 0 8 152 39 0 0 51 49
0 1 0 332224 6200 644252 0 0 4 102800 728 46 1 13 49 36
1 0 0 274492 6260 701056 0 0 4 12328 459 49 0 7 50 43
0 1 0 211220 6324 763356 0 0 0 106940 515 37 1 10 51 39
1 0 0 160412 6376 813468 0 0 0 8224 415 43 0 6 49 45
1 1 0 85980 6452 886556 0 0 4 113516 575 39 1 11 54 34
0 2 0 85968 6452 886620 0 0 0 1640 158 211 0 0 46 54
So apart from seemingly behaving better for buffered writeout, this also
allows us to potentially have more than one bdi thread flushing out data.
This may be useful for NUMA type setups.
A 10 disk test with btrfs performs 26% faster with per-bdi flushing. Other
tests pending. mmap heavy writing also improves considerably.
A separate thread is added to sync the super blocks. In the long term,
adding sync_supers_bdi() functionality could get rid of this thread again.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
fs/buffer.c | 2 +-
fs/fs-writeback.c | 309 ++++++++++++++++++++++++++-----------------
fs/sync.c | 2 +-
include/linux/backing-dev.h | 28 ++++
include/linux/fs.h | 3 +-
include/linux/writeback.h | 2 +-
mm/backing-dev.c | 231 +++++++++++++++++++++++++++++++-
mm/page-writeback.c | 140 +------------------
mm/vmscan.c | 2 +-
9 files changed, 452 insertions(+), 267 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index aed2977..14f0802 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -281,7 +281,7 @@ static void free_more_memory(void)
struct zone *zone;
int nid;
- wakeup_pdflush(1024);
+ wakeup_flusher_threads(1024);
yield();
for_each_online_node(nid) {
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 1137408..aa0b560 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -19,6 +19,8 @@
#include <linux/sched.h>
#include <linux/fs.h>
#include <linux/mm.h>
+#include <linux/kthread.h>
+#include <linux/freezer.h>
#include <linux/writeback.h>
#include <linux/blkdev.h>
#include <linux/backing-dev.h>
@@ -61,10 +63,186 @@ int writeback_in_progress(struct backing_dev_info *bdi)
*/
static void writeback_release(struct backing_dev_info *bdi)
{
- BUG_ON(!writeback_in_progress(bdi));
+ WARN_ON_ONCE(!writeback_in_progress(bdi));
+ bdi->wb_arg.nr_pages = 0;
+ bdi->wb_arg.sb = NULL;
clear_bit(BDI_pdflush, &bdi->state);
}
+int bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
+ long nr_pages, enum writeback_sync_modes sync_mode)
+{
+ /*
+ * This only happens the first time someone kicks this bdi, so put
+ * it out-of-line.
+ */
+ if (unlikely(!bdi->task)) {
+ bdi_add_default_flusher_task(bdi);
+ return 1;
+ }
+
+ if (writeback_acquire(bdi)) {
+ bdi->wb_arg.nr_pages = nr_pages;
+ bdi->wb_arg.sb = sb;
+ bdi->wb_arg.sync_mode = sync_mode;
+
+ if (bdi->task)
+ wake_up_process(bdi->task);
+ }
+
+ return 0;
+}
+
+/*
+ * The maximum number of pages to writeout in a single bdi flush/kupdate
+ * operation. We do this so we don't hold I_SYNC against an inode for
+ * enormous amounts of time, which would block a userspace task which has
+ * been forced to throttle against that inode. Also, the code reevaluates
+ * the dirty each time it has written this many pages.
+ */
+#define MAX_WRITEBACK_PAGES 1024
+
+/*
+ * Periodic writeback of "old" data.
+ *
+ * Define "old": the first time one of an inode's pages is dirtied, we mark the
+ * dirtying-time in the inode's address_space. So this periodic writeback code
+ * just walks the superblock inode list, writing back any inodes which are
+ * older than a specific point in time.
+ *
+ * Try to run once per dirty_writeback_interval. But if a writeback event
+ * takes longer than a dirty_writeback_interval interval, then leave a
+ * one-second gap.
+ *
+ * older_than_this takes precedence over nr_to_write. So we'll only write back
+ * all dirty pages if they are all attached to "old" mappings.
+ */
+static void bdi_kupdated(struct backing_dev_info *bdi)
+{
+ unsigned long oldest_jif;
+ long nr_to_write;
+ struct writeback_control wbc = {
+ .bdi = bdi,
+ .sync_mode = WB_SYNC_NONE,
+ .older_than_this = &oldest_jif,
+ .nr_to_write = 0,
+ .for_kupdate = 1,
+ .range_cyclic = 1,
+ };
+
+ oldest_jif = jiffies - msecs_to_jiffies(dirty_expire_interval * 10);
+
+ nr_to_write = global_page_state(NR_FILE_DIRTY) +
+ global_page_state(NR_UNSTABLE_NFS) +
+ (inodes_stat.nr_inodes - inodes_stat.nr_unused);
+
+ while (nr_to_write > 0) {
+ wbc.more_io = 0;
+ wbc.encountered_congestion = 0;
+ wbc.nr_to_write = MAX_WRITEBACK_PAGES;
+ generic_sync_bdi_inodes(NULL, &wbc);
+ if (wbc.nr_to_write > 0)
+ break; /* All the old data is written */
+ nr_to_write -= MAX_WRITEBACK_PAGES;
+ }
+}
+
+static inline bool over_bground_thresh(void)
+{
+ unsigned long background_thresh, dirty_thresh;
+
+ get_dirty_limits(&background_thresh, &dirty_thresh, NULL, NULL);
+
+ return (global_page_state(NR_FILE_DIRTY) +
+ global_page_state(NR_UNSTABLE_NFS) >= background_thresh);
+}
+
+static void bdi_pdflush(struct backing_dev_info *bdi)
+{
+ struct writeback_control wbc = {
+ .bdi = bdi,
+ .sync_mode = bdi->wb_arg.sync_mode,
+ .older_than_this = NULL,
+ .range_cyclic = 1,
+ };
+ long nr_pages = bdi->wb_arg.nr_pages;
+
+ for (;;) {
+ if (wbc.sync_mode == WB_SYNC_NONE && nr_pages <= 0 &&
+ !over_bground_thresh())
+ break;
+
+ wbc.more_io = 0;
+ wbc.encountered_congestion = 0;
+ wbc.nr_to_write = MAX_WRITEBACK_PAGES;
+ wbc.pages_skipped = 0;
+ generic_sync_bdi_inodes(bdi->wb_arg.sb, &wbc);
+ nr_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
+ /*
+ * If we ran out of stuff to write, bail unless more_io got set
+ */
+ if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
+ if (wbc.more_io)
+ continue;
+ break;
+ }
+ }
+}
+
+/*
+ * Handle writeback of dirty data for the device backed by this bdi. Also
+ * wakes up periodically and does kupdated style flushing.
+ */
+int bdi_writeback_task(struct backing_dev_info *bdi)
+{
+ while (!kthread_should_stop()) {
+ unsigned long wait_jiffies;
+
+ wait_jiffies = msecs_to_jiffies(dirty_writeback_interval * 10);
+ set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(wait_jiffies);
+ try_to_freeze();
+
+ /*
+ * We get here in two cases:
+ *
+ * schedule_timeout() returned because the dirty writeback
+ * interval has elapsed. If that happens, we will be able
+ * to acquire the writeback lock and will proceed to do
+ * kupdated style writeout.
+ *
+ * Someone called bdi_start_writeback(), which will acquire
+ * the writeback lock. This means our writeback_acquire()
+ * below will fail and we call into bdi_pdflush() for
+ * pdflush style writeout.
+ *
+ */
+ if (writeback_acquire(bdi))
+ bdi_kupdated(bdi);
+ else
+ bdi_pdflush(bdi);
+
+ writeback_release(bdi);
+ }
+
+ return 0;
+}
+
+void bdi_writeback_all(struct super_block *sb, struct writeback_control *wbc)
+{
+ struct backing_dev_info *bdi, *tmp;
+
+ mutex_lock(&bdi_lock);
+
+ list_for_each_entry_safe(bdi, tmp, &bdi_list, bdi_list) {
+ if (!bdi_has_dirty_io(bdi))
+ continue;
+ bdi_start_writeback(bdi, sb, wbc->nr_to_write, wbc->sync_mode);
+ }
+
+ mutex_unlock(&bdi_lock);
+}
+
/**
* __mark_inode_dirty - internal function
* @inode: inode to mark
@@ -263,46 +441,6 @@ static void queue_io(struct backing_dev_info *bdi,
move_expired_inodes(&bdi->b_dirty, &bdi->b_io, older_than_this);
}
-static int sb_on_inode_list(struct super_block *sb, struct list_head *list)
-{
- struct inode *inode;
- int ret = 0;
-
- spin_lock(&inode_lock);
- list_for_each_entry(inode, list, i_list) {
- if (inode->i_sb == sb) {
- ret = 1;
- break;
- }
- }
- spin_unlock(&inode_lock);
- return ret;
-}
-
-int sb_has_dirty_inodes(struct super_block *sb)
-{
- struct backing_dev_info *bdi;
- int ret = 0;
-
- /*
- * This is REALLY expensive right now, but it'll go away
- * when the bdi writeback is introduced
- */
- mutex_lock(&bdi_lock);
- list_for_each_entry(bdi, &bdi_list, bdi_list) {
- if (sb_on_inode_list(sb, &bdi->b_dirty) ||
- sb_on_inode_list(sb, &bdi->b_io) ||
- sb_on_inode_list(sb, &bdi->b_more_io)) {
- ret = 1;
- break;
- }
- }
- mutex_unlock(&bdi_lock);
-
- return ret;
-}
-EXPORT_SYMBOL(sb_has_dirty_inodes);
-
/*
* Write a single inode's dirty pages and inode data out to disk.
* If `wait' is set, wait on the writeout.
@@ -461,11 +599,11 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
return __sync_single_inode(inode, wbc);
}
-static void generic_sync_bdi_inodes(struct backing_dev_info *bdi,
- struct writeback_control *wbc,
- struct super_block *sb,
- int is_blkdev_sb)
+void generic_sync_bdi_inodes(struct super_block *sb,
+ struct writeback_control *wbc)
{
+ const int is_blkdev_sb = sb_is_blkdev_sb(sb);
+ struct backing_dev_info *bdi = wbc->bdi;
const unsigned long start = jiffies; /* livelock avoidance */
spin_lock(&inode_lock);
@@ -516,13 +654,6 @@ static void generic_sync_bdi_inodes(struct backing_dev_info *bdi,
continue; /* Skip a congested blockdev */
}
- if (wbc->bdi && bdi != wbc->bdi) {
- if (!is_blkdev_sb)
- break; /* fs has the wrong queue */
- requeue_io(inode);
- continue; /* blockdev has wrong queue */
- }
-
/*
* Was this inode dirtied after sync_sb_inodes was called?
* This keeps sync from extra jobs and livelock.
@@ -530,16 +661,10 @@ static void generic_sync_bdi_inodes(struct backing_dev_info *bdi,
if (inode_dirtied_after(inode, start))
break;
- /* Is another pdflush already flushing this queue? */
- if (current_is_pdflush() && !writeback_acquire(bdi))
- break;
-
BUG_ON(inode->i_state & I_FREEING);
__iget(inode);
pages_skipped = wbc->pages_skipped;
__writeback_single_inode(inode, wbc);
- if (current_is_pdflush())
- writeback_release(bdi);
if (wbc->pages_skipped != pages_skipped) {
/*
* writeback is not making progress due to locked
@@ -578,11 +703,6 @@ static void generic_sync_bdi_inodes(struct backing_dev_info *bdi,
* a variety of queues, so all inodes are searched. For other superblocks,
* assume that all inodes are backed by the same queue.
*
- * FIXME: this linear search could get expensive with many fileystems. But
- * how to fix? We need to go from an address_space to all inodes which share
- * a queue with that address_space. (Easy: have a global "dirty superblocks"
- * list).
- *
* The inodes to be written are parked on bdi->b_io. They are moved back onto
* bdi->b_dirty as they are selected for writing. This way, none can be missed
* on the writer throttling path, and we get decent balancing between many
@@ -591,13 +711,10 @@ static void generic_sync_bdi_inodes(struct backing_dev_info *bdi,
void generic_sync_sb_inodes(struct super_block *sb,
struct writeback_control *wbc)
{
- const int is_blkdev_sb = sb_is_blkdev_sb(sb);
- struct backing_dev_info *bdi;
-
- mutex_lock(&bdi_lock);
- list_for_each_entry(bdi, &bdi_list, bdi_list)
- generic_sync_bdi_inodes(bdi, wbc, sb, is_blkdev_sb);
- mutex_unlock(&bdi_lock);
+ if (wbc->bdi)
+ generic_sync_bdi_inodes(sb, wbc);
+ else
+ bdi_writeback_all(sb, wbc);
if (wbc->sync_mode == WB_SYNC_ALL) {
struct inode *inode, *old_inode = NULL;
@@ -653,58 +770,6 @@ static void sync_sb_inodes(struct super_block *sb,
}
/*
- * Start writeback of dirty pagecache data against all unlocked inodes.
- *
- * Note:
- * We don't need to grab a reference to superblock here. If it has non-empty
- * ->b_dirty it's hadn't been killed yet and kill_super() won't proceed
- * past sync_inodes_sb() until the ->b_dirty/b_io/b_more_io lists are all
- * empty. Since __sync_single_inode() regains inode_lock before it finally moves
- * inode from superblock lists we are OK.
- *
- * If `older_than_this' is non-zero then only flush inodes which have a
- * flushtime older than *older_than_this.
- *
- * If `bdi' is non-zero then we will scan the first inode against each
- * superblock until we find the matching ones. One group will be the dirty
- * inodes against a filesystem. Then when we hit the dummy blockdev superblock,
- * sync_sb_inodes will seekout the blockdev which matches `bdi'. Maybe not
- * super-efficient but we're about to do a ton of I/O...
- */
-void
-writeback_inodes(struct writeback_control *wbc)
-{
- struct super_block *sb;
-
- might_sleep();
- spin_lock(&sb_lock);
-restart:
- list_for_each_entry_reverse(sb, &super_blocks, s_list) {
- if (sb_has_dirty_inodes(sb)) {
- /* we're making our own get_super here */
- sb->s_count++;
- spin_unlock(&sb_lock);
- /*
- * If we can't get the readlock, there's no sense in
- * waiting around, most of the time the FS is going to
- * be unmounted by the time it is released.
- */
- if (down_read_trylock(&sb->s_umount)) {
- if (sb->s_root)
- sync_sb_inodes(sb, wbc);
- up_read(&sb->s_umount);
- }
- spin_lock(&sb_lock);
- if (__put_super_and_need_restart(sb))
- goto restart;
- }
- if (wbc->nr_to_write <= 0)
- break;
- }
- spin_unlock(&sb_lock);
-}
-
-/*
* writeback and wait upon the filesystem's dirty inodes. The caller will
* do this in two passes - one to write, and one to wait.
*
diff --git a/fs/sync.c b/fs/sync.c
index 7abc65f..3887f10 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -23,7 +23,7 @@
*/
static void do_sync(unsigned long wait)
{
- wakeup_pdflush(0);
+ wakeup_flusher_threads(0);
sync_inodes(0); /* All mappings, inodes and their blockdevs */
vfs_dq_sync(NULL);
sync_supers(); /* Write the superblocks */
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 8719c87..4a312e9 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -13,6 +13,7 @@
#include <linux/proportions.h>
#include <linux/kernel.h>
#include <linux/fs.h>
+#include <linux/writeback.h>
#include <asm/atomic.h>
struct page;
@@ -24,6 +25,7 @@ struct dentry;
*/
enum bdi_state {
BDI_pdflush, /* A pdflush thread is working this device */
+ BDI_pending, /* On its way to being activated */
BDI_async_congested, /* The async (write) queue is getting full */
BDI_sync_congested, /* The sync queue is getting full */
BDI_unused, /* Available bits start here */
@@ -39,6 +41,12 @@ enum bdi_stat_item {
#define BDI_STAT_BATCH (8*(1+ilog2(nr_cpu_ids)))
+struct bdi_writeback_arg {
+ unsigned long nr_pages;
+ struct super_block *sb;
+ enum writeback_sync_modes sync_mode;
+};
+
struct backing_dev_info {
struct list_head bdi_list;
@@ -60,6 +68,8 @@ struct backing_dev_info {
struct device *dev;
+ struct task_struct *task; /* writeback task */
+ struct bdi_writeback_arg wb_arg; /* protected by BDI_pdflush */
struct list_head b_dirty; /* dirty inodes */
struct list_head b_io; /* parked for writeback */
struct list_head b_more_io; /* parked for more writeback */
@@ -77,10 +87,22 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
const char *fmt, ...);
int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
void bdi_unregister(struct backing_dev_info *bdi);
+int bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
+ long nr_pages, enum writeback_sync_modes sync_mode);
+int bdi_writeback_task(struct backing_dev_info *bdi);
+void bdi_writeback_all(struct super_block *sb, struct writeback_control *wbc);
+void bdi_add_default_flusher_task(struct backing_dev_info *bdi);
extern struct mutex bdi_lock;
extern struct list_head bdi_list;
+static inline int bdi_has_dirty_io(struct backing_dev_info *bdi)
+{
+ return !list_empty(&bdi->b_dirty) ||
+ !list_empty(&bdi->b_io) ||
+ !list_empty(&bdi->b_more_io);
+}
+
static inline void __add_bdi_stat(struct backing_dev_info *bdi,
enum bdi_stat_item item, s64 amount)
{
@@ -196,6 +218,7 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
#define BDI_CAP_EXEC_MAP 0x00000040
#define BDI_CAP_NO_ACCT_WB 0x00000080
#define BDI_CAP_SWAP_BACKED 0x00000100
+#define BDI_CAP_FLUSH_FORKER 0x00000200
#define BDI_CAP_VMFLAGS \
(BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP)
@@ -265,6 +288,11 @@ static inline bool bdi_cap_swap_backed(struct backing_dev_info *bdi)
return bdi->capabilities & BDI_CAP_SWAP_BACKED;
}
+static inline bool bdi_cap_flush_forker(struct backing_dev_info *bdi)
+{
+ return bdi->capabilities & BDI_CAP_FLUSH_FORKER;
+}
+
static inline bool mapping_cap_writeback_dirty(struct address_space *mapping)
{
return bdi_cap_writeback_dirty(mapping->backing_dev_info);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6b475d4..ecdc544 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2063,6 +2063,8 @@ extern int invalidate_inode_pages2_range(struct address_space *mapping,
pgoff_t start, pgoff_t end);
extern void generic_sync_sb_inodes(struct super_block *sb,
struct writeback_control *wbc);
+extern void generic_sync_bdi_inodes(struct super_block *sb,
+ struct writeback_control *);
extern int write_inode_now(struct inode *, int);
extern int filemap_fdatawrite(struct address_space *);
extern int filemap_flush(struct address_space *);
@@ -2180,7 +2182,6 @@ extern int bdev_read_only(struct block_device *);
extern int set_blocksize(struct block_device *, int);
extern int sb_set_blocksize(struct super_block *, int);
extern int sb_min_blocksize(struct super_block *, int);
-extern int sb_has_dirty_inodes(struct super_block *);
extern int generic_file_mmap(struct file *, struct vm_area_struct *);
extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *);
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 9344547..a8e9f78 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -99,7 +99,7 @@ static inline void inode_sync_wait(struct inode *inode)
/*
* mm/page-writeback.c
*/
-int wakeup_pdflush(long nr_pages);
+void wakeup_flusher_threads(long nr_pages);
void laptop_io_completion(void);
void laptop_sync_completion(void);
void throttle_vm_writeout(gfp_t gfp_mask);
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index de0bbfe..3dbfc76 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -1,8 +1,11 @@
#include <linux/wait.h>
#include <linux/backing-dev.h>
+#include <linux/kthread.h>
+#include <linux/freezer.h>
#include <linux/fs.h>
#include <linux/pagemap.h>
+#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/module.h>
#include <linux/writeback.h>
@@ -16,7 +19,7 @@ EXPORT_SYMBOL(default_unplug_io_fn);
struct backing_dev_info default_backing_dev_info = {
.ra_pages = VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE,
.state = 0,
- .capabilities = BDI_CAP_MAP_COPY,
+ .capabilities = BDI_CAP_MAP_COPY | BDI_CAP_FLUSH_FORKER,
.unplug_io_fn = default_unplug_io_fn,
};
EXPORT_SYMBOL_GPL(default_backing_dev_info);
@@ -24,6 +27,14 @@ EXPORT_SYMBOL_GPL(default_backing_dev_info);
static struct class *bdi_class;
DEFINE_MUTEX(bdi_lock);
LIST_HEAD(bdi_list);
+LIST_HEAD(bdi_pending_list);
+
+static struct task_struct *sync_supers_tsk;
+static struct timer_list sync_supers_timer;
+
+static int bdi_sync_supers(void *);
+static void sync_supers_timer_fn(unsigned long);
+static void arm_supers_timer(void);
#ifdef CONFIG_DEBUG_FS
#include <linux/debugfs.h>
@@ -187,6 +198,13 @@ static int __init default_bdi_init(void)
{
int err;
+ sync_supers_tsk = kthread_run(bdi_sync_supers, NULL, "sync_supers");
+ BUG_ON(!sync_supers_tsk);
+
+ init_timer(&sync_supers_timer);
+ setup_timer(&sync_supers_timer, sync_supers_timer_fn, 0);
+ arm_supers_timer();
+
err = bdi_init(&default_backing_dev_info);
if (!err)
bdi_register(&default_backing_dev_info, NULL, "default");
@@ -195,6 +213,172 @@ static int __init default_bdi_init(void)
}
subsys_initcall(default_bdi_init);
+static int bdi_start_fn(void *ptr)
+{
+ struct backing_dev_info *bdi = ptr;
+ struct task_struct *tsk = current;
+
+ /*
+ * Add us to the active bdi_list
+ */
+ mutex_lock(&bdi_lock);
+ list_add(&bdi->bdi_list, &bdi_list);
+ mutex_unlock(&bdi_lock);
+
+ tsk->flags |= PF_FLUSHER | PF_SWAPWRITE;
+ set_freezable();
+
+ /*
+ * Our parent may run at a different priority, just set us to normal
+ */
+ set_user_nice(tsk, 0);
+
+ /*
+ * Clear pending bit and wakeup anybody waiting to tear us down
+ */
+ clear_bit(BDI_pending, &bdi->state);
+ smp_mb__after_clear_bit();
+ wake_up_bit(&bdi->state, BDI_pending);
+
+ return bdi_writeback_task(bdi);
+}
+
+static void bdi_flush_io(struct backing_dev_info *bdi)
+{
+ struct writeback_control wbc = {
+ .bdi = bdi,
+ .sync_mode = WB_SYNC_NONE,
+ .older_than_this = NULL,
+ .range_cyclic = 1,
+ .nr_to_write = 1024,
+ };
+
+ generic_sync_bdi_inodes(NULL, &wbc);
+}
+
+/*
+ * kupdated() used to do this. We cannot do it from the bdi_forker_task()
+ * or we risk deadlocking on ->s_umount. The longer term solution would be
+ * to implement sync_supers_bdi() or similar and simply do it from the
+ * bdi writeback tasks individually.
+ */
+static int bdi_sync_supers(void *unused)
+{
+ set_user_nice(current, 0);
+
+ while (!kthread_should_stop()) {
+ set_current_state(TASK_INTERRUPTIBLE);
+ schedule();
+
+ /*
+ * Do this periodically, like kupdated() did before.
+ */
+ sync_supers();
+ }
+
+ return 0;
+}
+
+static void arm_supers_timer(void)
+{
+ unsigned long next;
+
+ next = msecs_to_jiffies(dirty_writeback_interval * 10) + jiffies;
+ mod_timer(&sync_supers_timer, round_jiffies_up(next));
+}
+
+static void sync_supers_timer_fn(unsigned long unused)
+{
+ wake_up_process(sync_supers_tsk);
+ arm_supers_timer();
+}
+
+static int bdi_forker_task(void *ptr)
+{
+ struct backing_dev_info *me = ptr;
+
+ for (;;) {
+ struct backing_dev_info *bdi, *tmp;
+
+ /*
+ * Temporary measure, we want to make sure we don't see
+ * dirty data on the default backing_dev_info
+ */
+ if (bdi_has_dirty_io(me))
+ bdi_flush_io(me);
+
+ mutex_lock(&bdi_lock);
+
+ /*
+ * Check if any existing bdi's have dirty data without
+ * a thread registered. If so, set that up.
+ */
+ list_for_each_entry_safe(bdi, tmp, &bdi_list, bdi_list) {
+ if (bdi->task || !bdi_has_dirty_io(bdi))
+ continue;
+
+ bdi_add_default_flusher_task(bdi);
+ }
+
+ if (list_empty(&bdi_pending_list)) {
+ unsigned long wait;
+
+ mutex_unlock(&bdi_lock);
+ wait = msecs_to_jiffies(dirty_writeback_interval * 10);
+ set_current_state(TASK_INTERRUPTIBLE);
+ schedule_timeout(wait);
+ try_to_freeze();
+ continue;
+ }
+
+ /*
+ * This is our real job - check for pending entries in
+ * bdi_pending_list, and create the tasks that got added
+ */
+ bdi = list_entry(bdi_pending_list.next, struct backing_dev_info,
+ bdi_list);
+ list_del_init(&bdi->bdi_list);
+ mutex_unlock(&bdi_lock);
+
+ BUG_ON(bdi->task);
+
+ bdi->task = kthread_run(bdi_start_fn, bdi, "bdi-%s",
+ dev_name(bdi->dev));
+ /*
+ * If task creation fails, then readd the bdi to
+ * the pending list and force writeout of the bdi
+ * from this forker thread. That will free some memory
+ * and we can try again.
+ */
+ if (!bdi->task) {
+ /*
+ * Add this 'bdi' to the back, so we get
+ * a chance to flush other bdi's to free
+ * memory.
+ */
+ mutex_lock(&bdi_lock);
+ list_add_tail(&bdi->bdi_list, &bdi_pending_list);
+ mutex_unlock(&bdi_lock);
+
+ bdi_flush_io(bdi);
+ }
+ }
+
+ return 0;
+}
+
+void bdi_add_default_flusher_task(struct backing_dev_info *bdi)
+{
+ if (test_and_set_bit(BDI_pending, &bdi->state))
+ return;
+
+ mutex_lock(&bdi_lock);
+ list_move_tail(&bdi->bdi_list, &bdi_pending_list);
+ mutex_unlock(&bdi_lock);
+
+ wake_up_process(default_backing_dev_info.task);
+}
+
int bdi_register(struct backing_dev_info *bdi, struct device *parent,
const char *fmt, ...)
{
@@ -218,8 +402,25 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
mutex_unlock(&bdi_lock);
bdi->dev = dev;
- bdi_debug_register(bdi, dev_name(dev));
+ /*
+ * Just start the forker thread for our default backing_dev_info,
+ * and add other bdi's to the list. They will get a thread created
+ * on-demand when they need it.
+ */
+ if (bdi_cap_flush_forker(bdi)) {
+ bdi->task = kthread_run(bdi_forker_task, bdi, "bdi-%s",
+ dev_name(dev));
+ if (!bdi->task) {
+ mutex_lock(&bdi_lock);
+ list_del(&bdi->bdi_list);
+ mutex_unlock(&bdi_lock);
+ ret = -ENOMEM;
+ goto exit;
+ }
+ }
+
+ bdi_debug_register(bdi, dev_name(dev));
exit:
return ret;
}
@@ -231,8 +432,19 @@ int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev)
}
EXPORT_SYMBOL(bdi_register_dev);
-static void bdi_remove_from_list(struct backing_dev_info *bdi)
+static int sched_wait(void *word)
{
+ schedule();
+ return 0;
+}
+
+static void bdi_wb_shutdown(struct backing_dev_info *bdi)
+{
+ /*
+ * If setup is pending, wait for that to complete first
+ */
+ wait_on_bit(&bdi->state, BDI_pending, sched_wait, TASK_UNINTERRUPTIBLE);
+
mutex_lock(&bdi_lock);
list_del(&bdi->bdi_list);
mutex_unlock(&bdi_lock);
@@ -241,7 +453,13 @@ static void bdi_remove_from_list(struct backing_dev_info *bdi)
void bdi_unregister(struct backing_dev_info *bdi)
{
if (bdi->dev) {
- bdi_remove_from_list(bdi);
+ if (!bdi_cap_flush_forker(bdi)) {
+ bdi_wb_shutdown(bdi);
+ if (bdi->task) {
+ kthread_stop(bdi->task);
+ bdi->task = NULL;
+ }
+ }
bdi_debug_unregister(bdi);
device_unregister(bdi->dev);
bdi->dev = NULL;
@@ -251,8 +469,7 @@ EXPORT_SYMBOL(bdi_unregister);
int bdi_init(struct backing_dev_info *bdi)
{
- int i;
- int err;
+ int i, err;
bdi->dev = NULL;
@@ -277,8 +494,6 @@ int bdi_init(struct backing_dev_info *bdi)
err:
while (i--)
percpu_counter_destroy(&bdi->bdi_stat[i]);
-
- bdi_remove_from_list(bdi);
}
return err;
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7c44314..46c62b0 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -36,15 +36,6 @@
#include <linux/pagevec.h>
/*
- * The maximum number of pages to writeout in a single bdflush/kupdate
- * operation. We do this so we don't hold I_SYNC against an inode for
- * enormous amounts of time, which would block a userspace task which has
- * been forced to throttle against that inode. Also, the code reevaluates
- * the dirty each time it has written this many pages.
- */
-#define MAX_WRITEBACK_PAGES 1024
-
-/*
* After a CPU has dirtied this many pages, balance_dirty_pages_ratelimited
* will look to see if it needs to force writeback or throttling.
*/
@@ -117,8 +108,6 @@ EXPORT_SYMBOL(laptop_mode);
/* End of sysctl-exported parameters */
-static void background_writeout(unsigned long _min_pages);
-
/*
* Scale the writeback cache size proportional to the relative writeout speeds.
*
@@ -539,7 +528,7 @@ static void balance_dirty_pages(struct address_space *mapping)
* been flushed to permanent storage.
*/
if (bdi_nr_reclaimable) {
- writeback_inodes(&wbc);
+ generic_sync_bdi_inodes(NULL, &wbc);
pages_written += write_chunk - wbc.nr_to_write;
get_dirty_limits(&background_thresh, &dirty_thresh,
&bdi_thresh, bdi);
@@ -590,7 +579,7 @@ static void balance_dirty_pages(struct address_space *mapping)
(!laptop_mode && (global_page_state(NR_FILE_DIRTY)
+ global_page_state(NR_UNSTABLE_NFS)
> background_thresh)))
- pdflush_operation(background_writeout, 0);
+ bdi_start_writeback(bdi, NULL, 0, WB_SYNC_NONE);
}
void set_page_dirty_balance(struct page *page, int page_mkwrite)
@@ -675,152 +664,41 @@ void throttle_vm_writeout(gfp_t gfp_mask)
}
/*
- * writeback at least _min_pages, and keep writing until the amount of dirty
- * memory is less than the background threshold, or until we're all clean.
+ * Start writeback of `nr_pages' pages. If `nr_pages' is zero, write back
+ * the whole world.
*/
-static void background_writeout(unsigned long _min_pages)
+void wakeup_flusher_threads(long nr_pages)
{
- long min_pages = _min_pages;
struct writeback_control wbc = {
- .bdi = NULL,
.sync_mode = WB_SYNC_NONE,
.older_than_this = NULL,
- .nr_to_write = 0,
- .nonblocking = 1,
.range_cyclic = 1,
};
- for ( ; ; ) {
- unsigned long background_thresh;
- unsigned long dirty_thresh;
-
- get_dirty_limits(&background_thresh, &dirty_thresh, NULL, NULL);
- if (global_page_state(NR_FILE_DIRTY) +
- global_page_state(NR_UNSTABLE_NFS) < background_thresh
- && min_pages <= 0)
- break;
- wbc.more_io = 0;
- wbc.encountered_congestion = 0;
- wbc.nr_to_write = MAX_WRITEBACK_PAGES;
- wbc.pages_skipped = 0;
- writeback_inodes(&wbc);
- min_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
- if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
- /* Wrote less than expected */
- if (wbc.encountered_congestion || wbc.more_io)
- congestion_wait(WRITE, HZ/10);
- else
- break;
- }
- }
-}
-
-/*
- * Start writeback of `nr_pages' pages. If `nr_pages' is zero, write back
- * the whole world. Returns 0 if a pdflush thread was dispatched. Returns
- * -1 if all pdflush threads were busy.
- */
-int wakeup_pdflush(long nr_pages)
-{
if (nr_pages == 0)
nr_pages = global_page_state(NR_FILE_DIRTY) +
global_page_state(NR_UNSTABLE_NFS);
- return pdflush_operation(background_writeout, nr_pages);
+ wbc.nr_to_write = nr_pages;
+ bdi_writeback_all(NULL, &wbc);
}
-static void wb_timer_fn(unsigned long unused);
static void laptop_timer_fn(unsigned long unused);
-static DEFINE_TIMER(wb_timer, wb_timer_fn, 0, 0);
static DEFINE_TIMER(laptop_mode_wb_timer, laptop_timer_fn, 0, 0);
/*
- * Periodic writeback of "old" data.
- *
- * Define "old": the first time one of an inode's pages is dirtied, we mark the
- * dirtying-time in the inode's address_space. So this periodic writeback code
- * just walks the superblock inode list, writing back any inodes which are
- * older than a specific point in time.
- *
- * Try to run once per dirty_writeback_interval. But if a writeback event
- * takes longer than a dirty_writeback_interval interval, then leave a
- * one-second gap.
- *
- * older_than_this takes precedence over nr_to_write. So we'll only write back
- * all dirty pages if they are all attached to "old" mappings.
- */
-static void wb_kupdate(unsigned long arg)
-{
- unsigned long oldest_jif;
- unsigned long start_jif;
- unsigned long next_jif;
- long nr_to_write;
- struct writeback_control wbc = {
- .bdi = NULL,
- .sync_mode = WB_SYNC_NONE,
- .older_than_this = &oldest_jif,
- .nr_to_write = 0,
- .nonblocking = 1,
- .for_kupdate = 1,
- .range_cyclic = 1,
- };
-
- sync_supers();
-
- oldest_jif = jiffies - msecs_to_jiffies(dirty_expire_interval * 10);
- start_jif = jiffies;
- next_jif = start_jif + msecs_to_jiffies(dirty_writeback_interval * 10);
- nr_to_write = global_page_state(NR_FILE_DIRTY) +
- global_page_state(NR_UNSTABLE_NFS) +
- (inodes_stat.nr_inodes - inodes_stat.nr_unused);
- while (nr_to_write > 0) {
- wbc.more_io = 0;
- wbc.encountered_congestion = 0;
- wbc.nr_to_write = MAX_WRITEBACK_PAGES;
- writeback_inodes(&wbc);
- if (wbc.nr_to_write > 0) {
- if (wbc.encountered_congestion || wbc.more_io)
- congestion_wait(WRITE, HZ/10);
- else
- break; /* All the old data is written */
- }
- nr_to_write -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
- }
- if (time_before(next_jif, jiffies + HZ))
- next_jif = jiffies + HZ;
- if (dirty_writeback_interval)
- mod_timer(&wb_timer, next_jif);
-}
-
-/*
* sysctl handler for /proc/sys/vm/dirty_writeback_centisecs
*/
int dirty_writeback_centisecs_handler(ctl_table *table, int write,
struct file *file, void __user *buffer, size_t *length, loff_t *ppos)
{
proc_dointvec(table, write, file, buffer, length, ppos);
- if (dirty_writeback_interval)
- mod_timer(&wb_timer, jiffies +
- msecs_to_jiffies(dirty_writeback_interval * 10));
- else
- del_timer(&wb_timer);
return 0;
}
-static void wb_timer_fn(unsigned long unused)
-{
- if (pdflush_operation(wb_kupdate, 0) < 0)
- mod_timer(&wb_timer, jiffies + HZ); /* delay 1 second */
-}
-
-static void laptop_flush(unsigned long unused)
-{
- sys_sync();
-}
-
static void laptop_timer_fn(unsigned long unused)
{
- pdflush_operation(laptop_flush, 0);
+ wakeup_flusher_threads(0);
}
/*
@@ -903,8 +781,6 @@ void __init page_writeback_init(void)
{
int shift;
- mod_timer(&wb_timer,
- jiffies + msecs_to_jiffies(dirty_writeback_interval * 10));
writeback_set_ratelimit();
register_cpu_notifier(&ratelimit_nb);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5fa3eda..e37fd38 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1654,7 +1654,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
*/
if (total_scanned > sc->swap_cluster_max +
sc->swap_cluster_max / 2) {
- wakeup_pdflush(laptop_mode ? 0 : total_scanned);
+ wakeup_flusher_threads(laptop_mode ? 0 : total_scanned);
sc->may_writepage = 1;
}
--
1.6.3.rc0.1.gf800
^ permalink raw reply related [flat|nested] 70+ messages in thread
* [PATCH 05/11] writeback: get rid of pdflush completely
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
` (3 preceding siblings ...)
2009-05-28 11:46 ` [PATCH 04/11] writeback: switch to per-bdi threads for flushing data Jens Axboe
@ 2009-05-28 11:46 ` Jens Axboe
2009-05-28 11:46 ` [PATCH 06/11] writeback: separate the flushing state/task from the bdi Jens Axboe
` (11 subsequent siblings)
16 siblings, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-05-28 11:46 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel, tytso
Cc: chris.mason, david, hch, akpm, jack, yanmin_zhang, richard,
damien.wyart, Jens Axboe
It is now unused, so kill it off.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
fs/fs-writeback.c | 5 +
include/linux/writeback.h | 12 --
mm/Makefile | 2 +-
mm/pdflush.c | 269 ---------------------------------------------
4 files changed, 6 insertions(+), 282 deletions(-)
delete mode 100644 mm/pdflush.c
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index aa0b560..5ae0dd4 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -29,6 +29,11 @@
#define inode_to_bdi(inode) ((inode)->i_mapping->backing_dev_info)
+/*
+ * We don't actually have pdflush, but this one is exported though /proc...
+ */
+int nr_pdflush_threads;
+
/**
* writeback_acquire - attempt to get exclusive writeback access to a device
* @bdi: the device's backing_dev_info structure
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index a8e9f78..baf04a9 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -14,17 +14,6 @@ extern struct list_head inode_in_use;
extern struct list_head inode_unused;
/*
- * Yes, writeback.h requires sched.h
- * No, sched.h is not included from here.
- */
-static inline int task_is_pdflush(struct task_struct *task)
-{
- return task->flags & PF_FLUSHER;
-}
-
-#define current_is_pdflush() task_is_pdflush(current)
-
-/*
* fs/fs-writeback.c
*/
enum writeback_sync_modes {
@@ -151,7 +140,6 @@ balance_dirty_pages_ratelimited(struct address_space *mapping)
typedef int (*writepage_t)(struct page *page, struct writeback_control *wbc,
void *data);
-int pdflush_operation(void (*fn)(unsigned long), unsigned long arg0);
int generic_writepages(struct address_space *mapping,
struct writeback_control *wbc);
int write_cache_pages(struct address_space *mapping,
diff --git a/mm/Makefile b/mm/Makefile
index ec73c68..2adb811 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -8,7 +8,7 @@ mmu-$(CONFIG_MMU) := fremap.o highmem.o madvise.o memory.o mincore.o \
vmalloc.o
obj-y := bootmem.o filemap.o mempool.o oom_kill.o fadvise.o \
- maccess.o page_alloc.o page-writeback.o pdflush.o \
+ maccess.o page_alloc.o page-writeback.o \
readahead.o swap.o truncate.o vmscan.o shmem.o \
prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
page_isolation.o mm_init.o $(mmu-y)
diff --git a/mm/pdflush.c b/mm/pdflush.c
deleted file mode 100644
index 235ac44..0000000
--- a/mm/pdflush.c
+++ /dev/null
@@ -1,269 +0,0 @@
-/*
- * mm/pdflush.c - worker threads for writing back filesystem data
- *
- * Copyright (C) 2002, Linus Torvalds.
- *
- * 09Apr2002 Andrew Morton
- * Initial version
- * 29Feb2004 kaos@sgi.com
- * Move worker thread creation to kthread to avoid chewing
- * up stack space with nested calls to kernel_thread.
- */
-
-#include <linux/sched.h>
-#include <linux/list.h>
-#include <linux/signal.h>
-#include <linux/spinlock.h>
-#include <linux/gfp.h>
-#include <linux/init.h>
-#include <linux/module.h>
-#include <linux/fs.h> /* Needed by writeback.h */
-#include <linux/writeback.h> /* Prototypes pdflush_operation() */
-#include <linux/kthread.h>
-#include <linux/cpuset.h>
-#include <linux/freezer.h>
-
-
-/*
- * Minimum and maximum number of pdflush instances
- */
-#define MIN_PDFLUSH_THREADS 2
-#define MAX_PDFLUSH_THREADS 8
-
-static void start_one_pdflush_thread(void);
-
-
-/*
- * The pdflush threads are worker threads for writing back dirty data.
- * Ideally, we'd like one thread per active disk spindle. But the disk
- * topology is very hard to divine at this level. Instead, we take
- * care in various places to prevent more than one pdflush thread from
- * performing writeback against a single filesystem. pdflush threads
- * have the PF_FLUSHER flag set in current->flags to aid in this.
- */
-
-/*
- * All the pdflush threads. Protected by pdflush_lock
- */
-static LIST_HEAD(pdflush_list);
-static DEFINE_SPINLOCK(pdflush_lock);
-
-/*
- * The count of currently-running pdflush threads. Protected
- * by pdflush_lock.
- *
- * Readable by sysctl, but not writable. Published to userspace at
- * /proc/sys/vm/nr_pdflush_threads.
- */
-int nr_pdflush_threads = 0;
-
-/*
- * The time at which the pdflush thread pool last went empty
- */
-static unsigned long last_empty_jifs;
-
-/*
- * The pdflush thread.
- *
- * Thread pool management algorithm:
- *
- * - The minimum and maximum number of pdflush instances are bound
- * by MIN_PDFLUSH_THREADS and MAX_PDFLUSH_THREADS.
- *
- * - If there have been no idle pdflush instances for 1 second, create
- * a new one.
- *
- * - If the least-recently-went-to-sleep pdflush thread has been asleep
- * for more than one second, terminate a thread.
- */
-
-/*
- * A structure for passing work to a pdflush thread. Also for passing
- * state information between pdflush threads. Protected by pdflush_lock.
- */
-struct pdflush_work {
- struct task_struct *who; /* The thread */
- void (*fn)(unsigned long); /* A callback function */
- unsigned long arg0; /* An argument to the callback */
- struct list_head list; /* On pdflush_list, when idle */
- unsigned long when_i_went_to_sleep;
-};
-
-static int __pdflush(struct pdflush_work *my_work)
-{
- current->flags |= PF_FLUSHER | PF_SWAPWRITE;
- set_freezable();
- my_work->fn = NULL;
- my_work->who = current;
- INIT_LIST_HEAD(&my_work->list);
-
- spin_lock_irq(&pdflush_lock);
- for ( ; ; ) {
- struct pdflush_work *pdf;
-
- set_current_state(TASK_INTERRUPTIBLE);
- list_move(&my_work->list, &pdflush_list);
- my_work->when_i_went_to_sleep = jiffies;
- spin_unlock_irq(&pdflush_lock);
- schedule();
- try_to_freeze();
- spin_lock_irq(&pdflush_lock);
- if (!list_empty(&my_work->list)) {
- /*
- * Someone woke us up, but without removing our control
- * structure from the global list. swsusp will do this
- * in try_to_freeze()->refrigerator(). Handle it.
- */
- my_work->fn = NULL;
- continue;
- }
- if (my_work->fn == NULL) {
- printk("pdflush: bogus wakeup\n");
- continue;
- }
- spin_unlock_irq(&pdflush_lock);
-
- (*my_work->fn)(my_work->arg0);
-
- spin_lock_irq(&pdflush_lock);
-
- /*
- * Thread creation: For how long have there been zero
- * available threads?
- *
- * To throttle creation, we reset last_empty_jifs.
- */
- if (time_after(jiffies, last_empty_jifs + 1 * HZ)) {
- if (list_empty(&pdflush_list)) {
- if (nr_pdflush_threads < MAX_PDFLUSH_THREADS) {
- last_empty_jifs = jiffies;
- nr_pdflush_threads++;
- spin_unlock_irq(&pdflush_lock);
- start_one_pdflush_thread();
- spin_lock_irq(&pdflush_lock);
- }
- }
- }
-
- my_work->fn = NULL;
-
- /*
- * Thread destruction: For how long has the sleepiest
- * thread slept?
- */
- if (list_empty(&pdflush_list))
- continue;
- if (nr_pdflush_threads <= MIN_PDFLUSH_THREADS)
- continue;
- pdf = list_entry(pdflush_list.prev, struct pdflush_work, list);
- if (time_after(jiffies, pdf->when_i_went_to_sleep + 1 * HZ)) {
- /* Limit exit rate */
- pdf->when_i_went_to_sleep = jiffies;
- break; /* exeunt */
- }
- }
- nr_pdflush_threads--;
- spin_unlock_irq(&pdflush_lock);
- return 0;
-}
-
-/*
- * Of course, my_work wants to be just a local in __pdflush(). It is
- * separated out in this manner to hopefully prevent the compiler from
- * performing unfortunate optimisations against the auto variables. Because
- * these are visible to other tasks and CPUs. (No problem has actually
- * been observed. This is just paranoia).
- */
-static int pdflush(void *dummy)
-{
- struct pdflush_work my_work;
- cpumask_var_t cpus_allowed;
-
- /*
- * Since the caller doesn't even check kthread_run() worked, let's not
- * freak out too much if this fails.
- */
- if (!alloc_cpumask_var(&cpus_allowed, GFP_KERNEL)) {
- printk(KERN_WARNING "pdflush failed to allocate cpumask\n");
- return 0;
- }
-
- /*
- * pdflush can spend a lot of time doing encryption via dm-crypt. We
- * don't want to do that at keventd's priority.
- */
- set_user_nice(current, 0);
-
- /*
- * Some configs put our parent kthread in a limited cpuset,
- * which kthread() overrides, forcing cpus_allowed == cpu_all_mask.
- * Our needs are more modest - cut back to our cpusets cpus_allowed.
- * This is needed as pdflush's are dynamically created and destroyed.
- * The boottime pdflush's are easily placed w/o these 2 lines.
- */
- cpuset_cpus_allowed(current, cpus_allowed);
- set_cpus_allowed_ptr(current, cpus_allowed);
- free_cpumask_var(cpus_allowed);
-
- return __pdflush(&my_work);
-}
-
-/*
- * Attempt to wake up a pdflush thread, and get it to do some work for you.
- * Returns zero if it indeed managed to find a worker thread, and passed your
- * payload to it.
- */
-int pdflush_operation(void (*fn)(unsigned long), unsigned long arg0)
-{
- unsigned long flags;
- int ret = 0;
-
- BUG_ON(fn == NULL); /* Hard to diagnose if it's deferred */
-
- spin_lock_irqsave(&pdflush_lock, flags);
- if (list_empty(&pdflush_list)) {
- ret = -1;
- } else {
- struct pdflush_work *pdf;
-
- pdf = list_entry(pdflush_list.next, struct pdflush_work, list);
- list_del_init(&pdf->list);
- if (list_empty(&pdflush_list))
- last_empty_jifs = jiffies;
- pdf->fn = fn;
- pdf->arg0 = arg0;
- wake_up_process(pdf->who);
- }
- spin_unlock_irqrestore(&pdflush_lock, flags);
-
- return ret;
-}
-
-static void start_one_pdflush_thread(void)
-{
- struct task_struct *k;
-
- k = kthread_run(pdflush, NULL, "pdflush");
- if (unlikely(IS_ERR(k))) {
- spin_lock_irq(&pdflush_lock);
- nr_pdflush_threads--;
- spin_unlock_irq(&pdflush_lock);
- }
-}
-
-static int __init pdflush_init(void)
-{
- int i;
-
- /*
- * Pre-set nr_pdflush_threads... If we fail to create,
- * the count will be decremented.
- */
- nr_pdflush_threads = MIN_PDFLUSH_THREADS;
-
- for (i = 0; i < MIN_PDFLUSH_THREADS; i++)
- start_one_pdflush_thread();
- return 0;
-}
-
-module_init(pdflush_init);
--
1.6.3.rc0.1.gf800
^ permalink raw reply related [flat|nested] 70+ messages in thread
* [PATCH 06/11] writeback: separate the flushing state/task from the bdi
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
` (4 preceding siblings ...)
2009-05-28 11:46 ` [PATCH 05/11] writeback: get rid of pdflush completely Jens Axboe
@ 2009-05-28 11:46 ` Jens Axboe
2009-05-28 11:46 ` [PATCH 07/11] writeback: support > 1 flusher thread per bdi Jens Axboe
` (10 subsequent siblings)
16 siblings, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-05-28 11:46 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel, tytso
Cc: chris.mason, david, hch, akpm, jack, yanmin_zhang, richard,
damien.wyart, Jens Axboe
Add a struct bdi_writeback for tracking and handling dirty IO. This
is in preparation for adding > 1 flusher task per bdi.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
fs/fs-writeback.c | 136 +++++++++++++++++++++++++++----------------
include/linux/backing-dev.h | 38 +++++++-----
mm/backing-dev.c | 126 ++++++++++++++++++++++++++++++++--------
3 files changed, 208 insertions(+), 92 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 5ae0dd4..ed242d5 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -46,9 +46,11 @@ int nr_pdflush_threads;
* unless they implement their own. Which is somewhat inefficient, as this
* may prevent concurrent writeback against multiple devices.
*/
-static int writeback_acquire(struct backing_dev_info *bdi)
+static int writeback_acquire(struct bdi_writeback *wb)
{
- return !test_and_set_bit(BDI_pdflush, &bdi->state);
+ struct backing_dev_info *bdi = wb->bdi;
+
+ return !test_and_set_bit(wb->nr, &bdi->wb_active);
}
/**
@@ -59,19 +61,37 @@ static int writeback_acquire(struct backing_dev_info *bdi)
*/
int writeback_in_progress(struct backing_dev_info *bdi)
{
- return test_bit(BDI_pdflush, &bdi->state);
+ return bdi->wb_active != 0;
}
/**
* writeback_release - relinquish exclusive writeback access against a device.
* @bdi: the device's backing_dev_info structure
*/
-static void writeback_release(struct backing_dev_info *bdi)
+static void writeback_release(struct bdi_writeback *wb)
{
- WARN_ON_ONCE(!writeback_in_progress(bdi));
- bdi->wb_arg.nr_pages = 0;
- bdi->wb_arg.sb = NULL;
- clear_bit(BDI_pdflush, &bdi->state);
+ struct backing_dev_info *bdi = wb->bdi;
+
+ wb->nr_pages = 0;
+ wb->sb = NULL;
+ clear_bit(wb->nr, &bdi->wb_active);
+}
+
+static void wb_start_writeback(struct bdi_writeback *wb, struct super_block *sb,
+ long nr_pages,
+ enum writeback_sync_modes sync_mode)
+{
+ if (!wb_has_dirty_io(wb))
+ return;
+
+ if (writeback_acquire(wb)) {
+ wb->nr_pages = nr_pages;
+ wb->sb = sb;
+ wb->sync_mode = sync_mode;
+
+ if (wb->task)
+ wake_up_process(wb->task);
+ }
}
int bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
@@ -81,20 +101,12 @@ int bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
* This only happens the first time someone kicks this bdi, so put
* it out-of-line.
*/
- if (unlikely(!bdi->task)) {
+ if (unlikely(!bdi->wb.task)) {
bdi_add_default_flusher_task(bdi);
return 1;
}
- if (writeback_acquire(bdi)) {
- bdi->wb_arg.nr_pages = nr_pages;
- bdi->wb_arg.sb = sb;
- bdi->wb_arg.sync_mode = sync_mode;
-
- if (bdi->task)
- wake_up_process(bdi->task);
- }
-
+ wb_start_writeback(&bdi->wb, sb, nr_pages, sync_mode);
return 0;
}
@@ -122,12 +134,12 @@ int bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
* older_than_this takes precedence over nr_to_write. So we'll only write back
* all dirty pages if they are all attached to "old" mappings.
*/
-static void bdi_kupdated(struct backing_dev_info *bdi)
+static void wb_kupdated(struct bdi_writeback *wb)
{
unsigned long oldest_jif;
long nr_to_write;
struct writeback_control wbc = {
- .bdi = bdi,
+ .bdi = wb->bdi,
.sync_mode = WB_SYNC_NONE,
.older_than_this = &oldest_jif,
.nr_to_write = 0,
@@ -162,15 +174,19 @@ static inline bool over_bground_thresh(void)
global_page_state(NR_UNSTABLE_NFS) >= background_thresh);
}
-static void bdi_pdflush(struct backing_dev_info *bdi)
+static void generic_sync_wb_inodes(struct bdi_writeback *wb,
+ struct super_block *sb,
+ struct writeback_control *wbc);
+
+static void wb_writeback(struct bdi_writeback *wb)
{
struct writeback_control wbc = {
- .bdi = bdi,
- .sync_mode = bdi->wb_arg.sync_mode,
+ .bdi = wb->bdi,
+ .sync_mode = wb->sync_mode,
.older_than_this = NULL,
.range_cyclic = 1,
};
- long nr_pages = bdi->wb_arg.nr_pages;
+ long nr_pages = wb->nr_pages;
for (;;) {
if (wbc.sync_mode == WB_SYNC_NONE && nr_pages <= 0 &&
@@ -181,7 +197,7 @@ static void bdi_pdflush(struct backing_dev_info *bdi)
wbc.encountered_congestion = 0;
wbc.nr_to_write = MAX_WRITEBACK_PAGES;
wbc.pages_skipped = 0;
- generic_sync_bdi_inodes(bdi->wb_arg.sb, &wbc);
+ generic_sync_wb_inodes(wb, wb->sb, &wbc);
nr_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
/*
* If we ran out of stuff to write, bail unless more_io got set
@@ -198,7 +214,7 @@ static void bdi_pdflush(struct backing_dev_info *bdi)
* Handle writeback of dirty data for the device backed by this bdi. Also
* wakes up periodically and does kupdated style flushing.
*/
-int bdi_writeback_task(struct backing_dev_info *bdi)
+int bdi_writeback_task(struct bdi_writeback *wb)
{
while (!kthread_should_stop()) {
unsigned long wait_jiffies;
@@ -222,12 +238,12 @@ int bdi_writeback_task(struct backing_dev_info *bdi)
* pdflush style writeout.
*
*/
- if (writeback_acquire(bdi))
- bdi_kupdated(bdi);
+ if (writeback_acquire(wb))
+ wb_kupdated(wb);
else
- bdi_pdflush(bdi);
+ wb_writeback(wb);
- writeback_release(bdi);
+ writeback_release(wb);
}
return 0;
@@ -248,6 +264,14 @@ void bdi_writeback_all(struct super_block *sb, struct writeback_control *wbc)
mutex_unlock(&bdi_lock);
}
+/*
+ * We have only a single wb per bdi, so just return that.
+ */
+static inline struct bdi_writeback *inode_get_wb(struct inode *inode)
+{
+ return &inode_to_bdi(inode)->wb;
+}
+
/**
* __mark_inode_dirty - internal function
* @inode: inode to mark
@@ -346,9 +370,10 @@ void __mark_inode_dirty(struct inode *inode, int flags)
* reposition it (that would break b_dirty time-ordering).
*/
if (!was_dirty) {
+ struct bdi_writeback *wb = inode_get_wb(inode);
+
inode->dirtied_when = jiffies;
- list_move(&inode->i_list,
- &inode_to_bdi(inode)->b_dirty);
+ list_move(&inode->i_list, &wb->b_dirty);
}
}
out:
@@ -375,16 +400,16 @@ static int write_inode(struct inode *inode, int sync)
*/
static void redirty_tail(struct inode *inode)
{
- struct backing_dev_info *bdi = inode_to_bdi(inode);
+ struct bdi_writeback *wb = inode_get_wb(inode);
- if (!list_empty(&bdi->b_dirty)) {
+ if (!list_empty(&wb->b_dirty)) {
struct inode *tail;
- tail = list_entry(bdi->b_dirty.next, struct inode, i_list);
+ tail = list_entry(wb->b_dirty.next, struct inode, i_list);
if (time_before(inode->dirtied_when, tail->dirtied_when))
inode->dirtied_when = jiffies;
}
- list_move(&inode->i_list, &bdi->b_dirty);
+ list_move(&inode->i_list, &wb->b_dirty);
}
/*
@@ -392,7 +417,9 @@ static void redirty_tail(struct inode *inode)
*/
static void requeue_io(struct inode *inode)
{
- list_move(&inode->i_list, &inode_to_bdi(inode)->b_more_io);
+ struct bdi_writeback *wb = inode_get_wb(inode);
+
+ list_move(&inode->i_list, &wb->b_more_io);
}
static void inode_sync_complete(struct inode *inode)
@@ -439,11 +466,10 @@ static void move_expired_inodes(struct list_head *delaying_queue,
/*
* Queue all expired dirty inodes for io, eldest first.
*/
-static void queue_io(struct backing_dev_info *bdi,
- unsigned long *older_than_this)
+static void queue_io(struct bdi_writeback *wb, unsigned long *older_than_this)
{
- list_splice_init(&bdi->b_more_io, bdi->b_io.prev);
- move_expired_inodes(&bdi->b_dirty, &bdi->b_io, older_than_this);
+ list_splice_init(&wb->b_more_io, wb->b_io.prev);
+ move_expired_inodes(&wb->b_dirty, &wb->b_io, older_than_this);
}
/*
@@ -604,20 +630,20 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
return __sync_single_inode(inode, wbc);
}
-void generic_sync_bdi_inodes(struct super_block *sb,
- struct writeback_control *wbc)
+static void generic_sync_wb_inodes(struct bdi_writeback *wb,
+ struct super_block *sb,
+ struct writeback_control *wbc)
{
const int is_blkdev_sb = sb_is_blkdev_sb(sb);
- struct backing_dev_info *bdi = wbc->bdi;
const unsigned long start = jiffies; /* livelock avoidance */
spin_lock(&inode_lock);
- if (!wbc->for_kupdate || list_empty(&bdi->b_io))
- queue_io(bdi, wbc->older_than_this);
+ if (!wbc->for_kupdate || list_empty(&wb->b_io))
+ queue_io(wb, wbc->older_than_this);
- while (!list_empty(&bdi->b_io)) {
- struct inode *inode = list_entry(bdi->b_io.prev,
+ while (!list_empty(&wb->b_io)) {
+ struct inode *inode = list_entry(wb->b_io.prev,
struct inode, i_list);
long pages_skipped;
@@ -629,7 +655,7 @@ void generic_sync_bdi_inodes(struct super_block *sb,
continue;
}
- if (!bdi_cap_writeback_dirty(bdi)) {
+ if (!bdi_cap_writeback_dirty(wb->bdi)) {
redirty_tail(inode);
if (is_blkdev_sb) {
/*
@@ -651,7 +677,7 @@ void generic_sync_bdi_inodes(struct super_block *sb,
continue;
}
- if (wbc->nonblocking && bdi_write_congested(bdi)) {
+ if (wbc->nonblocking && bdi_write_congested(wb->bdi)) {
wbc->encountered_congestion = 1;
if (!is_blkdev_sb)
break; /* Skip a congested fs */
@@ -685,7 +711,7 @@ void generic_sync_bdi_inodes(struct super_block *sb,
wbc->more_io = 1;
break;
}
- if (!list_empty(&bdi->b_more_io))
+ if (!list_empty(&wb->b_more_io))
wbc->more_io = 1;
}
@@ -693,6 +719,14 @@ void generic_sync_bdi_inodes(struct super_block *sb,
/* Leave any unwritten inodes on b_io */
}
+void generic_sync_bdi_inodes(struct super_block *sb,
+ struct writeback_control *wbc)
+{
+ struct backing_dev_info *bdi = wbc->bdi;
+
+ generic_sync_wb_inodes(&bdi->wb, sb, wbc);
+}
+
/*
* Write out a superblock's list of dirty inodes. A wait will be performed
* upon no inodes, all inodes or the final one, depending upon sync_mode.
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 4a312e9..59f88e5 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -24,8 +24,8 @@ struct dentry;
* Bits in backing_dev_info.state
*/
enum bdi_state {
- BDI_pdflush, /* A pdflush thread is working this device */
BDI_pending, /* On its way to being activated */
+ BDI_wb_alloc, /* Default embedded wb allocated */
BDI_async_congested, /* The async (write) queue is getting full */
BDI_sync_congested, /* The sync queue is getting full */
BDI_unused, /* Available bits start here */
@@ -41,15 +41,22 @@ enum bdi_stat_item {
#define BDI_STAT_BATCH (8*(1+ilog2(nr_cpu_ids)))
-struct bdi_writeback_arg {
- unsigned long nr_pages;
- struct super_block *sb;
+struct bdi_writeback {
+ struct backing_dev_info *bdi; /* our parent bdi */
+ unsigned int nr;
+
+ struct task_struct *task; /* writeback task */
+ struct list_head b_dirty; /* dirty inodes */
+ struct list_head b_io; /* parked for writeback */
+ struct list_head b_more_io; /* parked for more writeback */
+
+ unsigned long nr_pages;
+ struct super_block *sb;
enum writeback_sync_modes sync_mode;
};
struct backing_dev_info {
struct list_head bdi_list;
-
unsigned long ra_pages; /* max readahead in PAGE_CACHE_SIZE units */
unsigned long state; /* Always use atomic bitops on this */
unsigned int capabilities; /* Device capabilities */
@@ -66,13 +73,11 @@ struct backing_dev_info {
unsigned int min_ratio;
unsigned int max_ratio, max_prop_frac;
- struct device *dev;
+ struct bdi_writeback wb; /* default writeback info for this bdi */
+ unsigned long wb_active; /* bitmap of active tasks */
+ unsigned long wb_mask; /* number of registered tasks */
- struct task_struct *task; /* writeback task */
- struct bdi_writeback_arg wb_arg; /* protected by BDI_pdflush */
- struct list_head b_dirty; /* dirty inodes */
- struct list_head b_io; /* parked for writeback */
- struct list_head b_more_io; /* parked for more writeback */
+ struct device *dev;
#ifdef CONFIG_DEBUG_FS
struct dentry *debug_dir;
@@ -89,18 +94,19 @@ int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
void bdi_unregister(struct backing_dev_info *bdi);
int bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
long nr_pages, enum writeback_sync_modes sync_mode);
-int bdi_writeback_task(struct backing_dev_info *bdi);
+int bdi_writeback_task(struct bdi_writeback *wb);
void bdi_writeback_all(struct super_block *sb, struct writeback_control *wbc);
void bdi_add_default_flusher_task(struct backing_dev_info *bdi);
+int bdi_has_dirty_io(struct backing_dev_info *bdi);
extern struct mutex bdi_lock;
extern struct list_head bdi_list;
-static inline int bdi_has_dirty_io(struct backing_dev_info *bdi)
+static inline int wb_has_dirty_io(struct bdi_writeback *wb)
{
- return !list_empty(&bdi->b_dirty) ||
- !list_empty(&bdi->b_io) ||
- !list_empty(&bdi->b_more_io);
+ return !list_empty(&wb->b_dirty) ||
+ !list_empty(&wb->b_io) ||
+ !list_empty(&wb->b_more_io);
}
static inline void __add_bdi_stat(struct backing_dev_info *bdi,
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 3dbfc76..75c9054 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -213,10 +213,45 @@ static int __init default_bdi_init(void)
}
subsys_initcall(default_bdi_init);
+static void bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
+{
+ memset(wb, 0, sizeof(*wb));
+
+ wb->bdi = bdi;
+ INIT_LIST_HEAD(&wb->b_dirty);
+ INIT_LIST_HEAD(&wb->b_io);
+ INIT_LIST_HEAD(&wb->b_more_io);
+}
+
+static int wb_assign_nr(struct backing_dev_info *bdi, struct bdi_writeback *wb)
+{
+ set_bit(0, &bdi->wb_mask);
+ wb->nr = 0;
+ return 0;
+}
+
+static void bdi_put_wb(struct backing_dev_info *bdi, struct bdi_writeback *wb)
+{
+ clear_bit(wb->nr, &bdi->wb_mask);
+ clear_bit(BDI_wb_alloc, &bdi->state);
+}
+
+static struct bdi_writeback *bdi_new_wb(struct backing_dev_info *bdi)
+{
+ struct bdi_writeback *wb;
+
+ set_bit(BDI_wb_alloc, &bdi->state);
+ wb = &bdi->wb;
+ wb_assign_nr(bdi, wb);
+ return wb;
+}
+
static int bdi_start_fn(void *ptr)
{
- struct backing_dev_info *bdi = ptr;
+ struct bdi_writeback *wb = ptr;
+ struct backing_dev_info *bdi = wb->bdi;
struct task_struct *tsk = current;
+ int ret;
/*
* Add us to the active bdi_list
@@ -240,7 +275,15 @@ static int bdi_start_fn(void *ptr)
smp_mb__after_clear_bit();
wake_up_bit(&bdi->state, BDI_pending);
- return bdi_writeback_task(bdi);
+ ret = bdi_writeback_task(wb);
+
+ bdi_put_wb(bdi, wb);
+ return ret;
+}
+
+int bdi_has_dirty_io(struct backing_dev_info *bdi)
+{
+ return wb_has_dirty_io(&bdi->wb);
}
static void bdi_flush_io(struct backing_dev_info *bdi)
@@ -295,17 +338,18 @@ static void sync_supers_timer_fn(unsigned long unused)
static int bdi_forker_task(void *ptr)
{
- struct backing_dev_info *me = ptr;
+ struct bdi_writeback *me = ptr;
for (;;) {
struct backing_dev_info *bdi, *tmp;
+ struct bdi_writeback *wb;
/*
* Temporary measure, we want to make sure we don't see
* dirty data on the default backing_dev_info
*/
- if (bdi_has_dirty_io(me))
- bdi_flush_io(me);
+ if (wb_has_dirty_io(me))
+ bdi_flush_io(me->bdi);
mutex_lock(&bdi_lock);
@@ -314,7 +358,7 @@ static int bdi_forker_task(void *ptr)
* a thread registered. If so, set that up.
*/
list_for_each_entry_safe(bdi, tmp, &bdi_list, bdi_list) {
- if (bdi->task || !bdi_has_dirty_io(bdi))
+ if (bdi->wb.task || !bdi_has_dirty_io(bdi))
continue;
bdi_add_default_flusher_task(bdi);
@@ -340,17 +384,22 @@ static int bdi_forker_task(void *ptr)
list_del_init(&bdi->bdi_list);
mutex_unlock(&bdi_lock);
- BUG_ON(bdi->task);
+ wb = bdi_new_wb(bdi);
+ if (!wb)
+ goto readd_flush;
- bdi->task = kthread_run(bdi_start_fn, bdi, "bdi-%s",
+ wb->task = kthread_run(bdi_start_fn, wb, "bdi-%s",
dev_name(bdi->dev));
+
/*
* If task creation fails, then readd the bdi to
* the pending list and force writeout of the bdi
* from this forker thread. That will free some memory
* and we can try again.
*/
- if (!bdi->task) {
+ if (!wb->task) {
+ bdi_put_wb(bdi, wb);
+readd_flush:
/*
* Add this 'bdi' to the back, so we get
* a chance to flush other bdi's to free
@@ -367,8 +416,18 @@ static int bdi_forker_task(void *ptr)
return 0;
}
+/*
+ * Add a new flusher task that gets created for any bdi
+ * that has dirty data pending writeout
+ */
void bdi_add_default_flusher_task(struct backing_dev_info *bdi)
{
+ if (!bdi_cap_writeback_dirty(bdi))
+ return;
+
+ /*
+ * Someone already marked this pending for task creation
+ */
if (test_and_set_bit(BDI_pending, &bdi->state))
return;
@@ -376,7 +435,7 @@ void bdi_add_default_flusher_task(struct backing_dev_info *bdi)
list_move_tail(&bdi->bdi_list, &bdi_pending_list);
mutex_unlock(&bdi_lock);
- wake_up_process(default_backing_dev_info.task);
+ wake_up_process(default_backing_dev_info.wb.task);
}
int bdi_register(struct backing_dev_info *bdi, struct device *parent,
@@ -409,13 +468,23 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
* on-demand when they need it.
*/
if (bdi_cap_flush_forker(bdi)) {
- bdi->task = kthread_run(bdi_forker_task, bdi, "bdi-%s",
+ struct bdi_writeback *wb;
+
+ wb = bdi_new_wb(bdi);
+ if (!wb) {
+ ret = -ENOMEM;
+ goto remove_err;
+ }
+
+ wb->task = kthread_run(bdi_forker_task, wb, "bdi-%s",
dev_name(dev));
- if (!bdi->task) {
+ if (!wb->task) {
+ bdi_put_wb(bdi, wb);
+ ret = -ENOMEM;
+remove_err:
mutex_lock(&bdi_lock);
list_del(&bdi->bdi_list);
mutex_unlock(&bdi_lock);
- ret = -ENOMEM;
goto exit;
}
}
@@ -438,28 +507,37 @@ static int sched_wait(void *word)
return 0;
}
+/*
+ * Remove bdi from global list and shutdown any threads we have running
+ */
static void bdi_wb_shutdown(struct backing_dev_info *bdi)
{
+ if (!bdi_cap_writeback_dirty(bdi))
+ return;
+
/*
* If setup is pending, wait for that to complete first
*/
wait_on_bit(&bdi->state, BDI_pending, sched_wait, TASK_UNINTERRUPTIBLE);
+ /*
+ * Make sure nobody finds us on the bdi_list anymore
+ */
mutex_lock(&bdi_lock);
list_del(&bdi->bdi_list);
mutex_unlock(&bdi_lock);
+
+ /*
+ * Finally, kill the kernel thread
+ */
+ kthread_stop(bdi->wb.task);
}
void bdi_unregister(struct backing_dev_info *bdi)
{
if (bdi->dev) {
- if (!bdi_cap_flush_forker(bdi)) {
+ if (!bdi_cap_flush_forker(bdi))
bdi_wb_shutdown(bdi);
- if (bdi->task) {
- kthread_stop(bdi->task);
- bdi->task = NULL;
- }
- }
bdi_debug_unregister(bdi);
device_unregister(bdi->dev);
bdi->dev = NULL;
@@ -477,9 +555,9 @@ int bdi_init(struct backing_dev_info *bdi)
bdi->max_ratio = 100;
bdi->max_prop_frac = PROP_FRAC_BASE;
INIT_LIST_HEAD(&bdi->bdi_list);
- INIT_LIST_HEAD(&bdi->b_io);
- INIT_LIST_HEAD(&bdi->b_dirty);
- INIT_LIST_HEAD(&bdi->b_more_io);
+ bdi->wb_mask = bdi->wb_active = 0;
+
+ bdi_wb_init(&bdi->wb, bdi);
for (i = 0; i < NR_BDI_STAT_ITEMS; i++) {
err = percpu_counter_init(&bdi->bdi_stat[i], 0);
@@ -504,9 +582,7 @@ void bdi_destroy(struct backing_dev_info *bdi)
{
int i;
- WARN_ON(!list_empty(&bdi->b_dirty));
- WARN_ON(!list_empty(&bdi->b_io));
- WARN_ON(!list_empty(&bdi->b_more_io));
+ WARN_ON(bdi_has_dirty_io(bdi));
bdi_unregister(bdi);
--
1.6.3.rc0.1.gf800
^ permalink raw reply related [flat|nested] 70+ messages in thread
* [PATCH 07/11] writeback: support > 1 flusher thread per bdi
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
` (5 preceding siblings ...)
2009-05-28 11:46 ` [PATCH 06/11] writeback: separate the flushing state/task from the bdi Jens Axboe
@ 2009-05-28 11:46 ` Jens Axboe
2009-05-28 11:46 ` [PATCH 08/11] writeback: allow sleepy exit of default writeback task Jens Axboe
` (9 subsequent siblings)
16 siblings, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-05-28 11:46 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel, tytso
Cc: chris.mason, david, hch, akpm, jack, yanmin_zhang, richard,
damien.wyart, Jens Axboe
Build on the bdi_writeback support by allowing registration of
more than 1 flusher thread. File systems can call bdi_add_flusher_task(bdi)
to add more flusher threads to the device. If they do so, they must also
provide a super_operations function to return the suitable bdi_writeback
struct from any given inode.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
fs/fs-writeback.c | 445 +++++++++++++++++++++++++++++++++++--------
include/linux/backing-dev.h | 34 +++-
include/linux/fs.h | 3 +
include/linux/writeback.h | 1 +
mm/backing-dev.c | 242 +++++++++++++++++++-----
5 files changed, 592 insertions(+), 133 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index ed242d5..f3db578 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -34,80 +34,249 @@
*/
int nr_pdflush_threads;
-/**
- * writeback_acquire - attempt to get exclusive writeback access to a device
- * @bdi: the device's backing_dev_info structure
- *
- * It is a waste of resources to have more than one pdflush thread blocked on
- * a single request queue. Exclusion at the request_queue level is obtained
- * via a flag in the request_queue's backing_dev_info.state.
- *
- * Non-request_queue-backed address_spaces will share default_backing_dev_info,
- * unless they implement their own. Which is somewhat inefficient, as this
- * may prevent concurrent writeback against multiple devices.
+static void generic_sync_wb_inodes(struct bdi_writeback *wb,
+ struct super_block *sb,
+ struct writeback_control *wbc);
+
+/*
+ * Work items for the bdi_writeback threads
*/
-static int writeback_acquire(struct bdi_writeback *wb)
+struct bdi_work {
+ struct list_head list;
+ struct list_head wait_list;
+ struct rcu_head rcu_head;
+
+ unsigned long seen;
+ atomic_t pending;
+
+ unsigned long sb_data;
+ unsigned long nr_pages;
+ enum writeback_sync_modes sync_mode;
+
+ unsigned long state;
+};
+
+static struct super_block *bdi_work_sb(struct bdi_work *work)
{
- struct backing_dev_info *bdi = wb->bdi;
+ return (struct super_block *) (work->sb_data & ~1UL);
+}
+
+static inline bool bdi_work_on_stack(struct bdi_work *work)
+{
+ return work->sb_data & 1UL;
+}
- return !test_and_set_bit(wb->nr, &bdi->wb_active);
+static inline void bdi_work_init(struct bdi_work *work, struct super_block *sb,
+ unsigned long nr_pages,
+ enum writeback_sync_modes sync_mode)
+{
+ INIT_RCU_HEAD(&work->rcu_head);
+ work->sb_data = (unsigned long) sb;
+ work->nr_pages = nr_pages;
+ work->sync_mode = sync_mode;
+ work->state = 1;
+}
+
+static inline void bdi_work_init_on_stack(struct bdi_work *work,
+ struct super_block *sb,
+ unsigned long nr_pages,
+ enum writeback_sync_modes sync_mode)
+{
+ bdi_work_init(work, sb, nr_pages, sync_mode);
+ work->sb_data |= 1UL;
}
/**
* writeback_in_progress - determine whether there is writeback in progress
* @bdi: the device's backing_dev_info structure.
*
- * Determine whether there is writeback in progress against a backing device.
+ * Determine whether there is writeback waiting to be handled against a
+ * backing device.
*/
int writeback_in_progress(struct backing_dev_info *bdi)
{
- return bdi->wb_active != 0;
+ return !list_empty(&bdi->work_list);
}
-/**
- * writeback_release - relinquish exclusive writeback access against a device.
- * @bdi: the device's backing_dev_info structure
- */
-static void writeback_release(struct bdi_writeback *wb)
+static void bdi_work_clear(struct bdi_work *work)
{
- struct backing_dev_info *bdi = wb->bdi;
+ clear_bit(0, &work->state);
+ smp_mb__after_clear_bit();
+ wake_up_bit(&work->state, 0);
+}
- wb->nr_pages = 0;
- wb->sb = NULL;
- clear_bit(wb->nr, &bdi->wb_active);
+static void bdi_work_free(struct rcu_head *head)
+{
+ struct bdi_work *work = container_of(head, struct bdi_work, rcu_head);
+
+ if (!bdi_work_on_stack(work))
+ kfree(work);
+ else
+ bdi_work_clear(work);
}
-static void wb_start_writeback(struct bdi_writeback *wb, struct super_block *sb,
- long nr_pages,
- enum writeback_sync_modes sync_mode)
+static void wb_work_complete(struct bdi_work *work)
{
- if (!wb_has_dirty_io(wb))
- return;
+ const enum writeback_sync_modes sync_mode = work->sync_mode;
- if (writeback_acquire(wb)) {
- wb->nr_pages = nr_pages;
- wb->sb = sb;
- wb->sync_mode = sync_mode;
+ /*
+ * For allocated work, we can clear the done/seen bit right here.
+ * For on-stack work, we need to postpone both the clear and free
+ * to after the RCU grace period, since the stack could be invalidated
+ * as soon as bdi_work_clear() has done the wakeup.
+ */
+ if (!bdi_work_on_stack(work))
+ bdi_work_clear(work);
+ if (sync_mode == WB_SYNC_NONE || bdi_work_on_stack(work))
+ call_rcu(&work->rcu_head, bdi_work_free);
+}
- if (wb->task)
- wake_up_process(wb->task);
+static void wb_clear_pending(struct bdi_writeback *wb, struct bdi_work *work)
+{
+ /*
+ * The caller has retrieved the work arguments from this work,
+ * drop our reference. If this is the last ref, delete and free it
+ */
+ if (atomic_dec_and_test(&work->pending)) {
+ struct backing_dev_info *bdi = wb->bdi;
+
+ spin_lock(&bdi->wb_lock);
+ list_del_rcu(&work->list);
+ spin_unlock(&bdi->wb_lock);
+
+ wb_work_complete(work);
}
}
-int bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
- long nr_pages, enum writeback_sync_modes sync_mode)
+static void wb_start_writeback(struct bdi_writeback *wb, struct bdi_work *work)
{
/*
- * This only happens the first time someone kicks this bdi, so put
- * it out-of-line.
+ * If we failed allocating the bdi work item, wake up the wb thread
+ * always. As a safety precaution, it'll flush out everything
*/
- if (unlikely(!bdi->wb.task)) {
+ if (!wb_has_dirty_io(wb) && work)
+ wb_clear_pending(wb, work);
+ else if (wb->task)
+ wake_up_process(wb->task);
+}
+
+static void bdi_queue_work(struct backing_dev_info *bdi, struct bdi_work *work)
+{
+ if (work) {
+ work->seen = bdi->wb_mask;
+ BUG_ON(!work->seen);
+ atomic_set(&work->pending, bdi->wb_cnt);
+ BUG_ON(!bdi->wb_cnt);
+
+ /*
+ * Make sure stores are seen before it appears on the list
+ */
+ smp_mb();
+
+ spin_lock(&bdi->wb_lock);
+ list_add_tail_rcu(&work->list, &bdi->work_list);
+ spin_unlock(&bdi->wb_lock);
+ }
+}
+
+static void bdi_sched_work(struct backing_dev_info *bdi, struct bdi_work *work)
+{
+ if (!bdi_wblist_needs_lock(bdi))
+ wb_start_writeback(&bdi->wb, work);
+ else {
+ struct bdi_writeback *wb;
+ int idx;
+
+ idx = srcu_read_lock(&bdi->srcu);
+
+ list_for_each_entry_rcu(wb, &bdi->wb_list, list)
+ wb_start_writeback(wb, work);
+
+ srcu_read_unlock(&bdi->srcu, idx);
+ }
+}
+
+static void __bdi_start_work(struct backing_dev_info *bdi,
+ struct bdi_work *work)
+{
+ /*
+ * If the default thread isn't there, make sure we add it. When
+ * it gets created and wakes up, we'll run this work.
+ */
+ if (unlikely(list_empty_careful(&bdi->wb_list)))
bdi_add_default_flusher_task(bdi);
- return 1;
+ else
+ bdi_sched_work(bdi, work);
+}
+
+static void bdi_start_work(struct backing_dev_info *bdi, struct bdi_work *work)
+{
+ /*
+ * If the default thread isn't there, make sure we add it. When
+ * it gets created and wakes up, we'll run this work.
+ */
+ if (unlikely(list_empty_careful(&bdi->wb_list))) {
+ mutex_lock(&bdi_lock);
+ bdi_add_default_flusher_task(bdi);
+ mutex_unlock(&bdi_lock);
+ } else
+ bdi_sched_work(bdi, work);
+}
+
+/*
+ * Used for on-stack allocated work items. The caller needs to wait until
+ * the wb threads have acked the work before it's safe to continue.
+ */
+static void bdi_wait_on_work_clear(struct bdi_work *work)
+{
+ wait_on_bit(&work->state, 0, bdi_sched_wait, TASK_UNINTERRUPTIBLE);
+}
+
+static struct bdi_work *bdi_alloc_work(struct super_block *sb, long nr_pages,
+ enum writeback_sync_modes sync_mode)
+{
+ struct bdi_work *work;
+
+ work = kmalloc(sizeof(*work), GFP_ATOMIC);
+ if (work)
+ bdi_work_init(work, sb, nr_pages, sync_mode);
+
+ return work;
+}
+
+void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
+ long nr_pages, enum writeback_sync_modes sync_mode)
+{
+ const bool must_wait = sync_mode == WB_SYNC_ALL;
+ struct bdi_work work_stack, *work = NULL;
+
+ if (!must_wait)
+ work = bdi_alloc_work(sb, nr_pages, sync_mode);
+
+ if (!work) {
+ work = &work_stack;
+ bdi_work_init_on_stack(work, sb, nr_pages, sync_mode);
}
- wb_start_writeback(&bdi->wb, sb, nr_pages, sync_mode);
- return 0;
+ bdi_queue_work(bdi, work);
+ bdi_start_work(bdi, work);
+
+ /*
+ * If the sync mode is WB_SYNC_ALL, block waiting for the work to
+ * complete. If not, we only need to wait for the work to be started,
+ * if we allocated it on-stack. We use the same mechanism, if the
+ * wait bit is set in the bdi_work struct, then threads will not
+ * clear pending until after they are done.
+ *
+ * Note that work == &work_stack if must_wait is true, so we don't
+ * need to do call_rcu() here ever, since the completion path will
+ * have done that for us.
+ */
+ if (must_wait || work == &work_stack) {
+ bdi_wait_on_work_clear(work);
+ if (work != &work_stack)
+ call_rcu(&work->rcu_head, bdi_work_free);
+ }
}
/*
@@ -157,7 +326,7 @@ static void wb_kupdated(struct bdi_writeback *wb)
wbc.more_io = 0;
wbc.encountered_congestion = 0;
wbc.nr_to_write = MAX_WRITEBACK_PAGES;
- generic_sync_bdi_inodes(NULL, &wbc);
+ generic_sync_wb_inodes(wb, NULL, &wbc);
if (wbc.nr_to_write > 0)
break; /* All the old data is written */
nr_to_write -= MAX_WRITEBACK_PAGES;
@@ -174,22 +343,19 @@ static inline bool over_bground_thresh(void)
global_page_state(NR_UNSTABLE_NFS) >= background_thresh);
}
-static void generic_sync_wb_inodes(struct bdi_writeback *wb,
- struct super_block *sb,
- struct writeback_control *wbc);
-
-static void wb_writeback(struct bdi_writeback *wb)
+static void __wb_writeback(struct bdi_writeback *wb, long nr_pages,
+ struct super_block *sb,
+ enum writeback_sync_modes sync_mode)
{
struct writeback_control wbc = {
.bdi = wb->bdi,
- .sync_mode = wb->sync_mode,
+ .sync_mode = sync_mode,
.older_than_this = NULL,
.range_cyclic = 1,
};
- long nr_pages = wb->nr_pages;
for (;;) {
- if (wbc.sync_mode == WB_SYNC_NONE && nr_pages <= 0 &&
+ if (sync_mode == WB_SYNC_NONE && nr_pages <= 0 &&
!over_bground_thresh())
break;
@@ -197,7 +363,7 @@ static void wb_writeback(struct bdi_writeback *wb)
wbc.encountered_congestion = 0;
wbc.nr_to_write = MAX_WRITEBACK_PAGES;
wbc.pages_skipped = 0;
- generic_sync_wb_inodes(wb, wb->sb, &wbc);
+ generic_sync_wb_inodes(wb, sb, &wbc);
nr_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
/*
* If we ran out of stuff to write, bail unless more_io got set
@@ -211,6 +377,82 @@ static void wb_writeback(struct bdi_writeback *wb)
}
/*
+ * Return the next bdi_work struct that hasn't been processed by this
+ * wb thread yet
+ */
+static struct bdi_work *get_next_work_item(struct backing_dev_info *bdi,
+ struct bdi_writeback *wb)
+{
+ struct bdi_work *work, *ret = NULL;
+
+ rcu_read_lock();
+
+ list_for_each_entry_rcu(work, &bdi->work_list, list) {
+ if (!test_and_clear_bit(wb->nr, &work->seen))
+ continue;
+
+ ret = work;
+ break;
+ }
+
+ rcu_read_unlock();
+ return ret;
+}
+
+/*
+ * Retrieve work items and do the writeback they describe
+ */
+static void wb_writeback(struct bdi_writeback *wb)
+{
+ struct backing_dev_info *bdi = wb->bdi;
+ struct bdi_work *work;
+
+ while ((work = get_next_work_item(bdi, wb)) != NULL) {
+ struct super_block *sb = bdi_work_sb(work);
+ long nr_pages = work->nr_pages;
+ enum writeback_sync_modes sync_mode = work->sync_mode;
+
+ /*
+ * If this isn't a data integrity operation, just notify
+ * that we have seen this work and we are now starting it.
+ */
+ if (sync_mode == WB_SYNC_NONE)
+ wb_clear_pending(wb, work);
+
+ __wb_writeback(wb, nr_pages, sb, sync_mode);
+
+ /*
+ * This is a data integrity writeback, so only do the
+ * notification when we have completed the work.
+ */
+ if (sync_mode == WB_SYNC_ALL)
+ wb_clear_pending(wb, work);
+ }
+}
+
+/*
+ * This will be inlined in bdi_writeback_task() once we get rid of any
+ * dirty inodes on the default_backing_dev_info
+ */
+void wb_do_writeback(struct bdi_writeback *wb)
+{
+ /*
+ * We get here in two cases:
+ *
+ * schedule_timeout() returned because the dirty writeback
+ * interval has elapsed. If that happens, the work item list
+ * will be empty and we will proceed to do kupdated style writeout.
+ *
+ * Someone called bdi_start_writeback(), which put one/more work
+ * items on the work_list. Process those.
+ */
+ if (list_empty(&wb->bdi->work_list))
+ wb_kupdated(wb);
+ else
+ wb_writeback(wb);
+}
+
+/*
* Handle writeback of dirty data for the device backed by this bdi. Also
* wakes up periodically and does kupdated style flushing.
*/
@@ -219,57 +461,84 @@ int bdi_writeback_task(struct bdi_writeback *wb)
while (!kthread_should_stop()) {
unsigned long wait_jiffies;
+ wb_do_writeback(wb);
+
wait_jiffies = msecs_to_jiffies(dirty_writeback_interval * 10);
set_current_state(TASK_INTERRUPTIBLE);
schedule_timeout(wait_jiffies);
try_to_freeze();
-
- /*
- * We get here in two cases:
- *
- * schedule_timeout() returned because the dirty writeback
- * interval has elapsed. If that happens, we will be able
- * to acquire the writeback lock and will proceed to do
- * kupdated style writeout.
- *
- * Someone called bdi_start_writeback(), which will acquire
- * the writeback lock. This means our writeback_acquire()
- * below will fail and we call into bdi_pdflush() for
- * pdflush style writeout.
- *
- */
- if (writeback_acquire(wb))
- wb_kupdated(wb);
- else
- wb_writeback(wb);
-
- writeback_release(wb);
}
return 0;
}
+/*
+ * Schedule writeback for all backing devices. Expensive! If this is a data
+ * integrity operation, writeback will be complete when this returns. If
+ * we are simply called for WB_SYNC_NONE, then writeback will merely be
+ * scheduled to run.
+ */
void bdi_writeback_all(struct super_block *sb, struct writeback_control *wbc)
{
+ const bool must_wait = wbc->sync_mode == WB_SYNC_ALL;
struct backing_dev_info *bdi, *tmp;
+ struct bdi_work *work;
+ LIST_HEAD(list);
mutex_lock(&bdi_lock);
list_for_each_entry_safe(bdi, tmp, &bdi_list, bdi_list) {
+ struct bdi_work *work;
+
if (!bdi_has_dirty_io(bdi))
continue;
- bdi_start_writeback(bdi, sb, wbc->nr_to_write, wbc->sync_mode);
+
+ /*
+ * If work allocation fails, do the writes inline. An
+ * alternative approach would be too fall back to an on-stack
+ * allocation of work. For that we need to drop the bdi_lock
+ * and restart the scan afterwards, though.
+ */
+ work = bdi_alloc_work(sb, wbc->nr_to_write, wbc->sync_mode);
+ if (!work) {
+ wbc->bdi = bdi;
+ generic_sync_bdi_inodes(sb, wbc);
+ continue;
+ }
+ if (must_wait)
+ list_add_tail(&work->wait_list, &list);
+
+ bdi_queue_work(bdi, work);
+ __bdi_start_work(bdi, work);
}
mutex_unlock(&bdi_lock);
+
+ /*
+ * If this is for WB_SYNC_ALL, wait for pending work to complete
+ * before returning.
+ */
+ while (!list_empty(&list)) {
+ work = list_entry(list.next, struct bdi_work, wait_list);
+ list_del(&work->wait_list);
+ bdi_wait_on_work_clear(work);
+ call_rcu(&work->rcu_head, bdi_work_free);
+ }
}
/*
- * We have only a single wb per bdi, so just return that.
+ * If the filesystem didn't provide a way to map an inode to a dedicated
+ * flusher thread, it doesn't support more than 1 thread. So we know it's
+ * the default thread, return that.
*/
static inline struct bdi_writeback *inode_get_wb(struct inode *inode)
{
- return &inode_to_bdi(inode)->wb;
+ const struct super_operations *sop = inode->i_sb->s_op;
+
+ if (!sop->inode_get_wb)
+ return &inode_to_bdi(inode)->wb;
+
+ return sop->inode_get_wb(inode);
}
/**
@@ -723,8 +992,24 @@ void generic_sync_bdi_inodes(struct super_block *sb,
struct writeback_control *wbc)
{
struct backing_dev_info *bdi = wbc->bdi;
+ struct bdi_writeback *wb;
- generic_sync_wb_inodes(&bdi->wb, sb, wbc);
+ /*
+ * Common case is just a single wb thread and that is embedded in
+ * the bdi, so it doesn't need locking
+ */
+ if (!bdi_wblist_needs_lock(bdi))
+ generic_sync_wb_inodes(&bdi->wb, sb, wbc);
+ else {
+ int idx;
+
+ idx = srcu_read_lock(&bdi->srcu);
+
+ list_for_each_entry_rcu(wb, &bdi->wb_list, list)
+ generic_sync_wb_inodes(wb, sb, wbc);
+
+ srcu_read_unlock(&bdi->srcu, idx);
+ }
}
/*
@@ -751,7 +1036,7 @@ void generic_sync_sb_inodes(struct super_block *sb,
struct writeback_control *wbc)
{
if (wbc->bdi)
- generic_sync_bdi_inodes(sb, wbc);
+ bdi_start_writeback(wbc->bdi, sb, wbc->nr_to_write, wbc->sync_mode);
else
bdi_writeback_all(sb, wbc);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 59f88e5..8584438 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -13,6 +13,8 @@
#include <linux/proportions.h>
#include <linux/kernel.h>
#include <linux/fs.h>
+#include <linux/sched.h>
+#include <linux/srcu.h>
#include <linux/writeback.h>
#include <asm/atomic.h>
@@ -26,6 +28,7 @@ struct dentry;
enum bdi_state {
BDI_pending, /* On its way to being activated */
BDI_wb_alloc, /* Default embedded wb allocated */
+ BDI_wblist_lock, /* bdi->wb_list now needs locking */
BDI_async_congested, /* The async (write) queue is getting full */
BDI_sync_congested, /* The sync queue is getting full */
BDI_unused, /* Available bits start here */
@@ -42,6 +45,8 @@ enum bdi_stat_item {
#define BDI_STAT_BATCH (8*(1+ilog2(nr_cpu_ids)))
struct bdi_writeback {
+ struct list_head list; /* hangs off the bdi */
+
struct backing_dev_info *bdi; /* our parent bdi */
unsigned int nr;
@@ -49,13 +54,12 @@ struct bdi_writeback {
struct list_head b_dirty; /* dirty inodes */
struct list_head b_io; /* parked for writeback */
struct list_head b_more_io; /* parked for more writeback */
-
- unsigned long nr_pages;
- struct super_block *sb;
- enum writeback_sync_modes sync_mode;
};
+#define BDI_MAX_FLUSHERS 32
+
struct backing_dev_info {
+ struct srcu_struct srcu; /* for wb_list read side protection */
struct list_head bdi_list;
unsigned long ra_pages; /* max readahead in PAGE_CACHE_SIZE units */
unsigned long state; /* Always use atomic bitops on this */
@@ -74,8 +78,12 @@ struct backing_dev_info {
unsigned int max_ratio, max_prop_frac;
struct bdi_writeback wb; /* default writeback info for this bdi */
- unsigned long wb_active; /* bitmap of active tasks */
- unsigned long wb_mask; /* number of registered tasks */
+ spinlock_t wb_lock; /* protects update side of wb_list */
+ struct list_head wb_list; /* the flusher threads hanging off this bdi */
+ unsigned long wb_mask; /* bitmask of registered tasks */
+ unsigned int wb_cnt; /* number of registered tasks */
+
+ struct list_head work_list;
struct device *dev;
@@ -92,16 +100,22 @@ int bdi_register(struct backing_dev_info *bdi, struct device *parent,
const char *fmt, ...);
int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
void bdi_unregister(struct backing_dev_info *bdi);
-int bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
+void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
long nr_pages, enum writeback_sync_modes sync_mode);
int bdi_writeback_task(struct bdi_writeback *wb);
void bdi_writeback_all(struct super_block *sb, struct writeback_control *wbc);
void bdi_add_default_flusher_task(struct backing_dev_info *bdi);
+void bdi_add_flusher_task(struct backing_dev_info *bdi);
int bdi_has_dirty_io(struct backing_dev_info *bdi);
extern struct mutex bdi_lock;
extern struct list_head bdi_list;
+static inline int bdi_wblist_needs_lock(struct backing_dev_info *bdi)
+{
+ return test_bit(BDI_wblist_lock, &bdi->state);
+}
+
static inline int wb_has_dirty_io(struct bdi_writeback *wb)
{
return !list_empty(&wb->b_dirty) ||
@@ -314,4 +328,10 @@ static inline bool mapping_cap_swap_backed(struct address_space *mapping)
return bdi_cap_swap_backed(mapping->backing_dev_info);
}
+static inline int bdi_sched_wait(void *word)
+{
+ schedule();
+ return 0;
+}
+
#endif /* _LINUX_BACKING_DEV_H */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ecdc544..d3bda5d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1550,11 +1550,14 @@ extern ssize_t vfs_readv(struct file *, const struct iovec __user *,
extern ssize_t vfs_writev(struct file *, const struct iovec __user *,
unsigned long, loff_t *);
+struct bdi_writeback;
+
struct super_operations {
struct inode *(*alloc_inode)(struct super_block *sb);
void (*destroy_inode)(struct inode *);
void (*dirty_inode) (struct inode *);
+ struct bdi_writeback *(*inode_get_wb) (struct inode *);
int (*write_inode) (struct inode *, int);
void (*drop_inode) (struct inode *);
void (*delete_inode) (struct inode *);
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index baf04a9..e414702 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -69,6 +69,7 @@ void writeback_inodes(struct writeback_control *wbc);
int inode_wait(void *);
void sync_inodes_sb(struct super_block *, int wait);
void sync_inodes(int wait);
+void wb_do_writeback(struct bdi_writeback *wb);
/* writeback.h requires fs.h; it, too, is not included from here. */
static inline void wait_on_inode(struct inode *inode)
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 75c9054..8980f6f 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -213,52 +213,100 @@ static int __init default_bdi_init(void)
}
subsys_initcall(default_bdi_init);
-static void bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
+static int wb_assign_nr(struct backing_dev_info *bdi, struct bdi_writeback *wb)
{
- memset(wb, 0, sizeof(*wb));
+ unsigned long mask = BDI_MAX_FLUSHERS - 1;
+ unsigned int nr;
- wb->bdi = bdi;
- INIT_LIST_HEAD(&wb->b_dirty);
- INIT_LIST_HEAD(&wb->b_io);
- INIT_LIST_HEAD(&wb->b_more_io);
-}
+ do {
+ if ((bdi->wb_mask & mask) == mask)
+ return 1;
+
+ nr = find_first_zero_bit(&bdi->wb_mask, BDI_MAX_FLUSHERS);
+ } while (test_and_set_bit(nr, &bdi->wb_mask));
+
+ wb->nr = nr;
+
+ spin_lock(&bdi->wb_lock);
+ bdi->wb_cnt++;
+ spin_unlock(&bdi->wb_lock);
-static int wb_assign_nr(struct backing_dev_info *bdi, struct bdi_writeback *wb)
-{
- set_bit(0, &bdi->wb_mask);
- wb->nr = 0;
return 0;
}
static void bdi_put_wb(struct backing_dev_info *bdi, struct bdi_writeback *wb)
{
- clear_bit(wb->nr, &bdi->wb_mask);
- clear_bit(BDI_wb_alloc, &bdi->state);
+ /*
+ * If this is the default wb thread exiting, leave the bit set
+ * in the wb mask as we set that before it's created as well. This
+ * is done to make sure that assigned work with no thread has at
+ * least one receipient.
+ */
+ if (wb == &bdi->wb)
+ clear_bit(BDI_wb_alloc, &bdi->state);
+ else {
+ clear_bit(wb->nr, &bdi->wb_mask);
+ kfree(wb);
+ spin_lock(&bdi->wb_lock);
+ bdi->wb_cnt--;
+ spin_unlock(&bdi->wb_lock);
+ }
+}
+
+static int bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
+{
+ memset(wb, 0, sizeof(*wb));
+
+ wb->bdi = bdi;
+ INIT_LIST_HEAD(&wb->b_dirty);
+ INIT_LIST_HEAD(&wb->b_io);
+ INIT_LIST_HEAD(&wb->b_more_io);
+
+ return wb_assign_nr(bdi, wb);
}
static struct bdi_writeback *bdi_new_wb(struct backing_dev_info *bdi)
{
struct bdi_writeback *wb;
- set_bit(BDI_wb_alloc, &bdi->state);
- wb = &bdi->wb;
- wb_assign_nr(bdi, wb);
+ /*
+ * Default bdi->wb is already assigned, so just return it
+ */
+ if (!test_and_set_bit(BDI_wb_alloc, &bdi->state))
+ wb = &bdi->wb;
+ else {
+ wb = kmalloc(sizeof(struct bdi_writeback), GFP_KERNEL);
+ if (wb) {
+ if (bdi_wb_init(wb, bdi)) {
+ kfree(wb);
+ wb = NULL;
+ }
+ }
+ }
+
return wb;
}
-static int bdi_start_fn(void *ptr)
+static void bdi_task_init(struct backing_dev_info *bdi,
+ struct bdi_writeback *wb)
{
- struct bdi_writeback *wb = ptr;
- struct backing_dev_info *bdi = wb->bdi;
struct task_struct *tsk = current;
- int ret;
+ int was_empty;
/*
- * Add us to the active bdi_list
+ * Add us to the active bdi_list. If we are adding threads beyond
+ * the default embedded bdi_writeback, then we need to start using
+ * proper locking. Check the list for empty first, then set the
+ * BDI_wblist_lock flag if there's > 1 entry on the list now
*/
- mutex_lock(&bdi_lock);
- list_add(&bdi->bdi_list, &bdi_list);
- mutex_unlock(&bdi_lock);
+ spin_lock(&bdi->wb_lock);
+
+ was_empty = list_empty(&bdi->wb_list);
+ list_add_tail_rcu(&wb->list, &bdi->wb_list);
+ if (!was_empty)
+ set_bit(BDI_wblist_lock, &bdi->state);
+
+ spin_unlock(&bdi->wb_lock);
tsk->flags |= PF_FLUSHER | PF_SWAPWRITE;
set_freezable();
@@ -267,6 +315,22 @@ static int bdi_start_fn(void *ptr)
* Our parent may run at a different priority, just set us to normal
*/
set_user_nice(tsk, 0);
+}
+
+static int bdi_start_fn(void *ptr)
+{
+ struct bdi_writeback *wb = ptr;
+ struct backing_dev_info *bdi = wb->bdi;
+ int ret;
+
+ /*
+ * Add us to the active bdi_list
+ */
+ mutex_lock(&bdi_lock);
+ list_add(&bdi->bdi_list, &bdi_list);
+ mutex_unlock(&bdi_lock);
+
+ bdi_task_init(bdi, wb);
/*
* Clear pending bit and wakeup anybody waiting to tear us down
@@ -277,13 +341,44 @@ static int bdi_start_fn(void *ptr)
ret = bdi_writeback_task(wb);
+ /*
+ * Remove us from the list
+ */
+ spin_lock(&bdi->wb_lock);
+ list_del_rcu(&wb->list);
+ spin_unlock(&bdi->wb_lock);
+
+ /*
+ * wait for rcu grace period to end, so we can free wb
+ */
+ synchronize_srcu(&bdi->srcu);
+
bdi_put_wb(bdi, wb);
return ret;
}
int bdi_has_dirty_io(struct backing_dev_info *bdi)
{
- return wb_has_dirty_io(&bdi->wb);
+ struct bdi_writeback *wb;
+ int ret = 0;
+
+ if (!bdi_wblist_needs_lock(bdi))
+ ret = wb_has_dirty_io(&bdi->wb);
+ else {
+ int idx;
+
+ idx = srcu_read_lock(&bdi->srcu);
+
+ list_for_each_entry_rcu(wb, &bdi->wb_list, list) {
+ ret = wb_has_dirty_io(wb);
+ if (ret)
+ break;
+ }
+
+ srcu_read_unlock(&bdi->srcu, idx);
+ }
+
+ return ret;
}
static void bdi_flush_io(struct backing_dev_info *bdi)
@@ -340,6 +435,8 @@ static int bdi_forker_task(void *ptr)
{
struct bdi_writeback *me = ptr;
+ bdi_task_init(me->bdi, me);
+
for (;;) {
struct backing_dev_info *bdi, *tmp;
struct bdi_writeback *wb;
@@ -348,8 +445,8 @@ static int bdi_forker_task(void *ptr)
* Temporary measure, we want to make sure we don't see
* dirty data on the default backing_dev_info
*/
- if (wb_has_dirty_io(me))
- bdi_flush_io(me->bdi);
+ if (wb_has_dirty_io(me) || !list_empty(&me->bdi->work_list))
+ wb_do_writeback(me);
mutex_lock(&bdi_lock);
@@ -417,27 +514,70 @@ readd_flush:
}
/*
- * Add a new flusher task that gets created for any bdi
- * that has dirty data pending writeout
+ * bdi_lock held on entry
*/
-void bdi_add_default_flusher_task(struct backing_dev_info *bdi)
+static void bdi_add_one_flusher_task(struct backing_dev_info *bdi,
+ int(*func)(struct backing_dev_info *))
{
if (!bdi_cap_writeback_dirty(bdi))
return;
/*
- * Someone already marked this pending for task creation
+ * Check with the helper whether to proceed adding a task. Will only
+ * abort if we two or more simultanous calls to
+ * bdi_add_default_flusher_task() occured, further additions will block
+ * waiting for previous additions to finish.
*/
- if (test_and_set_bit(BDI_pending, &bdi->state))
- return;
+ if (!func(bdi)) {
+ list_move_tail(&bdi->bdi_list, &bdi_pending_list);
- mutex_lock(&bdi_lock);
- list_move_tail(&bdi->bdi_list, &bdi_pending_list);
+ /*
+ * We are now on the pending list, wake up bdi_forker_task()
+ * to finish the job and add us back to the active bdi_list
+ */
+ wake_up_process(default_backing_dev_info.wb.task);
+ }
+}
+
+static int flusher_add_helper_block(struct backing_dev_info *bdi)
+{
mutex_unlock(&bdi_lock);
+ wait_on_bit_lock(&bdi->state, BDI_pending, bdi_sched_wait,
+ TASK_UNINTERRUPTIBLE);
+ mutex_lock(&bdi_lock);
+ return 0;
+}
- wake_up_process(default_backing_dev_info.wb.task);
+static int flusher_add_helper_test(struct backing_dev_info *bdi)
+{
+ return test_and_set_bit(BDI_pending, &bdi->state);
+}
+
+/*
+ * Add the default flusher task that gets created for any bdi
+ * that has dirty data pending writeout
+ */
+void bdi_add_default_flusher_task(struct backing_dev_info *bdi)
+{
+ bdi_add_one_flusher_task(bdi, flusher_add_helper_test);
}
+/**
+ * bdi_add_flusher_task - add one more flusher task to this @bdi
+ * @bdi: the bdi
+ *
+ * Add an additional flusher task to this @bdi. Will block waiting on
+ * previous additions, if any.
+ *
+ */
+void bdi_add_flusher_task(struct backing_dev_info *bdi)
+{
+ mutex_lock(&bdi_lock);
+ bdi_add_one_flusher_task(bdi, flusher_add_helper_block);
+ mutex_unlock(&bdi_lock);
+}
+EXPORT_SYMBOL(bdi_add_flusher_task);
+
int bdi_register(struct backing_dev_info *bdi, struct device *parent,
const char *fmt, ...)
{
@@ -501,24 +641,21 @@ int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev)
}
EXPORT_SYMBOL(bdi_register_dev);
-static int sched_wait(void *word)
-{
- schedule();
- return 0;
-}
-
/*
* Remove bdi from global list and shutdown any threads we have running
*/
static void bdi_wb_shutdown(struct backing_dev_info *bdi)
{
+ struct bdi_writeback *wb;
+
if (!bdi_cap_writeback_dirty(bdi))
return;
/*
* If setup is pending, wait for that to complete first
*/
- wait_on_bit(&bdi->state, BDI_pending, sched_wait, TASK_UNINTERRUPTIBLE);
+ wait_on_bit(&bdi->state, BDI_pending, bdi_sched_wait,
+ TASK_UNINTERRUPTIBLE);
/*
* Make sure nobody finds us on the bdi_list anymore
@@ -528,9 +665,11 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
mutex_unlock(&bdi_lock);
/*
- * Finally, kill the kernel thread
+ * Finally, kill the kernel threads. We don't need to be RCU
+ * safe anymore, since the bdi is gone from visibility.
*/
- kthread_stop(bdi->wb.task);
+ list_for_each_entry(wb, &bdi->wb_list, list)
+ kthread_stop(wb->task);
}
void bdi_unregister(struct backing_dev_info *bdi)
@@ -554,8 +693,12 @@ int bdi_init(struct backing_dev_info *bdi)
bdi->min_ratio = 0;
bdi->max_ratio = 100;
bdi->max_prop_frac = PROP_FRAC_BASE;
+ spin_lock_init(&bdi->wb_lock);
+ bdi->wb_mask = 0;
+ bdi->wb_cnt = 0;
INIT_LIST_HEAD(&bdi->bdi_list);
- bdi->wb_mask = bdi->wb_active = 0;
+ INIT_LIST_HEAD(&bdi->wb_list);
+ INIT_LIST_HEAD(&bdi->work_list);
bdi_wb_init(&bdi->wb, bdi);
@@ -565,10 +708,15 @@ int bdi_init(struct backing_dev_info *bdi)
goto err;
}
+ err = init_srcu_struct(&bdi->srcu);
+ if (err)
+ goto err;
+
bdi->dirty_exceeded = 0;
err = prop_local_init_percpu(&bdi->completions);
if (err) {
+ cleanup_srcu_struct(&bdi->srcu);
err:
while (i--)
percpu_counter_destroy(&bdi->bdi_stat[i]);
@@ -586,6 +734,8 @@ void bdi_destroy(struct backing_dev_info *bdi)
bdi_unregister(bdi);
+ cleanup_srcu_struct(&bdi->srcu);
+
for (i = 0; i < NR_BDI_STAT_ITEMS; i++)
percpu_counter_destroy(&bdi->bdi_stat[i]);
--
1.6.3.rc0.1.gf800
^ permalink raw reply related [flat|nested] 70+ messages in thread
* [PATCH 08/11] writeback: allow sleepy exit of default writeback task
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
` (6 preceding siblings ...)
2009-05-28 11:46 ` [PATCH 07/11] writeback: support > 1 flusher thread per bdi Jens Axboe
@ 2009-05-28 11:46 ` Jens Axboe
2009-05-28 11:46 ` [PATCH 09/11] writeback: add some debug inode list counters to bdi stats Jens Axboe
` (8 subsequent siblings)
16 siblings, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-05-28 11:46 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel, tytso
Cc: chris.mason, david, hch, akpm, jack, yanmin_zhang, richard,
damien.wyart, Jens Axboe
Since we do lazy create of default writeback tasks for a bdi, we can
allow sleepy exit if it has been completely idle for 5 minutes.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
fs/fs-writeback.c | 54 ++++++++++++++++++++++++++++++++++--------
include/linux/backing-dev.h | 5 ++++
include/linux/writeback.h | 2 +-
3 files changed, 49 insertions(+), 12 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index f3db578..d1d47c0 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -303,10 +303,10 @@ void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
* older_than_this takes precedence over nr_to_write. So we'll only write back
* all dirty pages if they are all attached to "old" mappings.
*/
-static void wb_kupdated(struct bdi_writeback *wb)
+static long wb_kupdated(struct bdi_writeback *wb)
{
unsigned long oldest_jif;
- long nr_to_write;
+ long nr_to_write, wrote = 0;
struct writeback_control wbc = {
.bdi = wb->bdi,
.sync_mode = WB_SYNC_NONE,
@@ -327,10 +327,13 @@ static void wb_kupdated(struct bdi_writeback *wb)
wbc.encountered_congestion = 0;
wbc.nr_to_write = MAX_WRITEBACK_PAGES;
generic_sync_wb_inodes(wb, NULL, &wbc);
+ wrote += MAX_WRITEBACK_PAGES - wbc.nr_to_write;
if (wbc.nr_to_write > 0)
break; /* All the old data is written */
nr_to_write -= MAX_WRITEBACK_PAGES;
}
+
+ return wrote;
}
static inline bool over_bground_thresh(void)
@@ -343,7 +346,7 @@ static inline bool over_bground_thresh(void)
global_page_state(NR_UNSTABLE_NFS) >= background_thresh);
}
-static void __wb_writeback(struct bdi_writeback *wb, long nr_pages,
+static long __wb_writeback(struct bdi_writeback *wb, long nr_pages,
struct super_block *sb,
enum writeback_sync_modes sync_mode)
{
@@ -353,6 +356,7 @@ static void __wb_writeback(struct bdi_writeback *wb, long nr_pages,
.older_than_this = NULL,
.range_cyclic = 1,
};
+ long wrote = 0;
for (;;) {
if (sync_mode == WB_SYNC_NONE && nr_pages <= 0 &&
@@ -365,6 +369,7 @@ static void __wb_writeback(struct bdi_writeback *wb, long nr_pages,
wbc.pages_skipped = 0;
generic_sync_wb_inodes(wb, sb, &wbc);
nr_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
+ wrote += MAX_WRITEBACK_PAGES - wbc.nr_to_write;
/*
* If we ran out of stuff to write, bail unless more_io got set
*/
@@ -374,6 +379,8 @@ static void __wb_writeback(struct bdi_writeback *wb, long nr_pages,
break;
}
}
+
+ return wrote;
}
/*
@@ -402,10 +409,11 @@ static struct bdi_work *get_next_work_item(struct backing_dev_info *bdi,
/*
* Retrieve work items and do the writeback they describe
*/
-static void wb_writeback(struct bdi_writeback *wb)
+static long wb_writeback(struct bdi_writeback *wb)
{
struct backing_dev_info *bdi = wb->bdi;
struct bdi_work *work;
+ long wrote = 0;
while ((work = get_next_work_item(bdi, wb)) != NULL) {
struct super_block *sb = bdi_work_sb(work);
@@ -419,7 +427,7 @@ static void wb_writeback(struct bdi_writeback *wb)
if (sync_mode == WB_SYNC_NONE)
wb_clear_pending(wb, work);
- __wb_writeback(wb, nr_pages, sb, sync_mode);
+ wrote += __wb_writeback(wb, nr_pages, sb, sync_mode);
/*
* This is a data integrity writeback, so only do the
@@ -428,14 +436,18 @@ static void wb_writeback(struct bdi_writeback *wb)
if (sync_mode == WB_SYNC_ALL)
wb_clear_pending(wb, work);
}
+
+ return wrote;
}
/*
* This will be inlined in bdi_writeback_task() once we get rid of any
* dirty inodes on the default_backing_dev_info
*/
-void wb_do_writeback(struct bdi_writeback *wb)
+long wb_do_writeback(struct bdi_writeback *wb)
{
+ long wrote;
+
/*
* We get here in two cases:
*
@@ -447,9 +459,11 @@ void wb_do_writeback(struct bdi_writeback *wb)
* items on the work_list. Process those.
*/
if (list_empty(&wb->bdi->work_list))
- wb_kupdated(wb);
+ wrote = wb_kupdated(wb);
else
- wb_writeback(wb);
+ wrote = wb_writeback(wb);
+
+ return wrote;
}
/*
@@ -458,10 +472,28 @@ void wb_do_writeback(struct bdi_writeback *wb)
*/
int bdi_writeback_task(struct bdi_writeback *wb)
{
+ unsigned long last_active = jiffies;
+ unsigned long wait_jiffies = -1UL;
+ long pages_written;
+
while (!kthread_should_stop()) {
- unsigned long wait_jiffies;
+ pages_written = wb_do_writeback(wb);
+
+ if (pages_written)
+ last_active = jiffies;
+ else if (wait_jiffies != -1UL) {
+ unsigned long max_idle;
- wb_do_writeback(wb);
+ /*
+ * Longest period of inactivity that we tolerate. If we
+ * see dirty data again later, the task will get
+ * recreated automatically.
+ */
+ max_idle = max(5UL * 60 * HZ, wait_jiffies);
+ if (time_after(jiffies, max_idle + last_active) &&
+ wb_is_default_task(wb))
+ break;
+ }
wait_jiffies = msecs_to_jiffies(dirty_writeback_interval * 10);
set_current_state(TASK_INTERRUPTIBLE);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 8584438..d55553d 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -111,6 +111,11 @@ int bdi_has_dirty_io(struct backing_dev_info *bdi);
extern struct mutex bdi_lock;
extern struct list_head bdi_list;
+static inline int wb_is_default_task(struct bdi_writeback *wb)
+{
+ return wb == &wb->bdi->wb;
+}
+
static inline int bdi_wblist_needs_lock(struct backing_dev_info *bdi)
{
return test_bit(BDI_wblist_lock, &bdi->state);
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index e414702..30e318b 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -69,7 +69,7 @@ void writeback_inodes(struct writeback_control *wbc);
int inode_wait(void *);
void sync_inodes_sb(struct super_block *, int wait);
void sync_inodes(int wait);
-void wb_do_writeback(struct bdi_writeback *wb);
+long wb_do_writeback(struct bdi_writeback *wb);
/* writeback.h requires fs.h; it, too, is not included from here. */
static inline void wait_on_inode(struct inode *inode)
--
1.6.3.rc0.1.gf800
^ permalink raw reply related [flat|nested] 70+ messages in thread
* [PATCH 09/11] writeback: add some debug inode list counters to bdi stats
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
` (7 preceding siblings ...)
2009-05-28 11:46 ` [PATCH 08/11] writeback: allow sleepy exit of default writeback task Jens Axboe
@ 2009-05-28 11:46 ` Jens Axboe
2009-05-28 11:46 ` [PATCH 10/11] writeback: add name to backing_dev_info Jens Axboe
` (7 subsequent siblings)
16 siblings, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-05-28 11:46 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel, tytso
Cc: chris.mason, david, hch, akpm, jack, yanmin_zhang, richard,
damien.wyart, Jens Axboe
Add some debug entries to be able to inspect the internal state of
the writeback details.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
mm/backing-dev.c | 38 ++++++++++++++++++++++++++++++++++----
1 files changed, 34 insertions(+), 4 deletions(-)
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 8980f6f..b981118 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -50,9 +50,29 @@ static void bdi_debug_init(void)
static int bdi_debug_stats_show(struct seq_file *m, void *v)
{
struct backing_dev_info *bdi = m->private;
+ struct bdi_writeback *wb;
unsigned long background_thresh;
unsigned long dirty_thresh;
unsigned long bdi_thresh;
+ unsigned long nr_dirty, nr_io, nr_more_io, nr_wb;
+ struct inode *inode;
+
+ /*
+ * inode lock is enough here, the bdi->wb_list is protected by
+ * RCU on the reader side
+ */
+ nr_wb = nr_dirty = nr_io = nr_more_io = 0;
+ spin_lock(&inode_lock);
+ list_for_each_entry(wb, &bdi->wb_list, list) {
+ nr_wb++;
+ list_for_each_entry(inode, &wb->b_dirty, i_list)
+ nr_dirty++;
+ list_for_each_entry(inode, &wb->b_io, i_list)
+ nr_io++;
+ list_for_each_entry(inode, &wb->b_more_io, i_list)
+ nr_more_io++;
+ }
+ spin_unlock(&inode_lock);
get_dirty_limits(&background_thresh, &dirty_thresh, &bdi_thresh, bdi);
@@ -62,12 +82,22 @@ static int bdi_debug_stats_show(struct seq_file *m, void *v)
"BdiReclaimable: %8lu kB\n"
"BdiDirtyThresh: %8lu kB\n"
"DirtyThresh: %8lu kB\n"
- "BackgroundThresh: %8lu kB\n",
+ "BackgroundThresh: %8lu kB\n"
+ "WriteBack threads:%8lu\n"
+ "b_dirty: %8lu\n"
+ "b_io: %8lu\n"
+ "b_more_io: %8lu\n"
+ "bdi_list: %8u\n"
+ "state: %8lx\n"
+ "wb_mask: %8lx\n"
+ "wb_list: %8u\n"
+ "wb_cnt: %8u\n",
(unsigned long) K(bdi_stat(bdi, BDI_WRITEBACK)),
(unsigned long) K(bdi_stat(bdi, BDI_RECLAIMABLE)),
- K(bdi_thresh),
- K(dirty_thresh),
- K(background_thresh));
+ K(bdi_thresh), K(dirty_thresh),
+ K(background_thresh), nr_wb, nr_dirty, nr_io, nr_more_io,
+ !list_empty(&bdi->bdi_list), bdi->state, bdi->wb_mask,
+ !list_empty(&bdi->wb_list), bdi->wb_cnt);
#undef K
return 0;
--
1.6.3.rc0.1.gf800
^ permalink raw reply related [flat|nested] 70+ messages in thread
* [PATCH 10/11] writeback: add name to backing_dev_info
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
` (8 preceding siblings ...)
2009-05-28 11:46 ` [PATCH 09/11] writeback: add some debug inode list counters to bdi stats Jens Axboe
@ 2009-05-28 11:46 ` Jens Axboe
2009-05-28 11:46 ` [PATCH 11/11] writeback: check for registered bdi in flusher add and inode dirty Jens Axboe
` (6 subsequent siblings)
16 siblings, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-05-28 11:46 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel, tytso
Cc: chris.mason, david, hch, akpm, jack, yanmin_zhang, richard,
damien.wyart, Jens Axboe
This enables us to track who does what and print info. Its main use
is catching dirty inodes on the default_backing_dev_info, so we can
fix that up.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
block/blk-core.c | 1 +
drivers/block/aoe/aoeblk.c | 1 +
drivers/char/mem.c | 1 +
fs/btrfs/disk-io.c | 1 +
fs/char_dev.c | 1 +
fs/configfs/inode.c | 1 +
fs/fuse/inode.c | 1 +
fs/hugetlbfs/inode.c | 1 +
fs/nfs/client.c | 1 +
fs/ocfs2/dlm/dlmfs.c | 1 +
fs/ramfs/inode.c | 1 +
fs/sysfs/inode.c | 1 +
fs/ubifs/super.c | 1 +
include/linux/backing-dev.h | 2 ++
kernel/cgroup.c | 1 +
mm/backing-dev.c | 1 +
mm/swap_state.c | 1 +
17 files changed, 18 insertions(+), 0 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index c89883b..d3f18b5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -517,6 +517,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
q->backing_dev_info.unplug_io_fn = blk_backing_dev_unplug;
q->backing_dev_info.unplug_io_data = q;
+ q->backing_dev_info.name = "block";
err = bdi_init(&q->backing_dev_info);
if (err) {
kmem_cache_free(blk_requestq_cachep, q);
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 2307a27..0efb8fc 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -265,6 +265,7 @@ aoeblk_gdalloc(void *vp)
}
blk_queue_make_request(&d->blkq, aoeblk_make_request);
+ d->blkq.backing_dev_info.name = "aoe";
if (bdi_init(&d->blkq.backing_dev_info))
goto err_mempool;
spin_lock_irqsave(&d->lock, flags);
diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index 8f05c38..3b38093 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -820,6 +820,7 @@ static const struct file_operations zero_fops = {
* - permits private mappings, "copies" are taken of the source of zeros
*/
static struct backing_dev_info zero_bdi = {
+ .name = "char/mem",
.capabilities = BDI_CAP_MAP_COPY,
};
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2dc19c9..eff2a82 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1353,6 +1353,7 @@ static int setup_bdi(struct btrfs_fs_info *info, struct backing_dev_info *bdi)
{
int err;
+ bdi->name = "btrfs";
bdi->capabilities = BDI_CAP_MAP_COPY;
err = bdi_init(bdi);
if (err)
diff --git a/fs/char_dev.c b/fs/char_dev.c
index 38f7122..350ef9c 100644
--- a/fs/char_dev.c
+++ b/fs/char_dev.c
@@ -32,6 +32,7 @@
* - no readahead or I/O queue unplugging required
*/
struct backing_dev_info directly_mappable_cdev_bdi = {
+ .name = "char",
.capabilities = (
#ifdef CONFIG_MMU
/* permit private copies of the data to be taken */
diff --git a/fs/configfs/inode.c b/fs/configfs/inode.c
index 5d349d3..9a266cd 100644
--- a/fs/configfs/inode.c
+++ b/fs/configfs/inode.c
@@ -46,6 +46,7 @@ static const struct address_space_operations configfs_aops = {
};
static struct backing_dev_info configfs_backing_dev_info = {
+ .name = "configfs",
.ra_pages = 0, /* No readahead */
.capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK,
};
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 91f7c85..e5e8b03 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -484,6 +484,7 @@ int fuse_conn_init(struct fuse_conn *fc, struct super_block *sb)
INIT_LIST_HEAD(&fc->bg_queue);
INIT_LIST_HEAD(&fc->entry);
atomic_set(&fc->num_waiting, 0);
+ fc->bdi.name = "fuse";
fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
fc->bdi.unplug_io_fn = default_unplug_io_fn;
/* fuse does it's own writeback accounting */
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index c1462d4..db1e537 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -43,6 +43,7 @@ static const struct inode_operations hugetlbfs_dir_inode_operations;
static const struct inode_operations hugetlbfs_inode_operations;
static struct backing_dev_info hugetlbfs_backing_dev_info = {
+ .name = "hugetlbfs",
.ra_pages = 0, /* No readahead */
.capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK,
};
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 75c9cd2..3a26d06 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -836,6 +836,7 @@ static void nfs_server_set_fsinfo(struct nfs_server *server, struct nfs_fsinfo *
server->rsize = NFS_MAX_FILE_IO_SIZE;
server->rpages = (server->rsize + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
+ server->backing_dev_info.name = "nfs";
server->backing_dev_info.ra_pages = server->rpages * NFS_MAX_READAHEAD;
if (server->wsize > max_rpc_payload)
diff --git a/fs/ocfs2/dlm/dlmfs.c b/fs/ocfs2/dlm/dlmfs.c
index 1c9efb4..02bf178 100644
--- a/fs/ocfs2/dlm/dlmfs.c
+++ b/fs/ocfs2/dlm/dlmfs.c
@@ -325,6 +325,7 @@ clear_fields:
}
static struct backing_dev_info dlmfs_backing_dev_info = {
+ .name = "ocfs2-dlmfs",
.ra_pages = 0, /* No readahead */
.capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK,
};
diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
index 3a6b193..5a24199 100644
--- a/fs/ramfs/inode.c
+++ b/fs/ramfs/inode.c
@@ -46,6 +46,7 @@ static const struct super_operations ramfs_ops;
static const struct inode_operations ramfs_dir_inode_operations;
static struct backing_dev_info ramfs_backing_dev_info = {
+ .name = "ramfs",
.ra_pages = 0, /* No readahead */
.capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK |
BDI_CAP_MAP_DIRECT | BDI_CAP_MAP_COPY |
diff --git a/fs/sysfs/inode.c b/fs/sysfs/inode.c
index 555f0ff..e57f98e 100644
--- a/fs/sysfs/inode.c
+++ b/fs/sysfs/inode.c
@@ -29,6 +29,7 @@ static const struct address_space_operations sysfs_aops = {
};
static struct backing_dev_info sysfs_backing_dev_info = {
+ .name = "sysfs",
.ra_pages = 0, /* No readahead */
.capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK,
};
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index e9f7a75..2349e2c 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -1923,6 +1923,7 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
*
* Read-ahead will be disabled because @c->bdi.ra_pages is 0.
*/
+ c->bdi.name = "ubifs",
c->bdi.capabilities = BDI_CAP_MAP_COPY;
c->bdi.unplug_io_fn = default_unplug_io_fn;
err = bdi_init(&c->bdi);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index d55553d..653a652 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -69,6 +69,8 @@ struct backing_dev_info {
void (*unplug_io_fn)(struct backing_dev_info *, struct page *);
void *unplug_io_data;
+ char *name;
+
struct percpu_counter bdi_stat[NR_BDI_STAT_ITEMS];
struct prop_local_percpu completions;
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index a7267bf..0863c5f 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -598,6 +598,7 @@ static struct inode_operations cgroup_dir_inode_operations;
static struct file_operations proc_cgroupstats_operations;
static struct backing_dev_info cgroup_backing_dev_info = {
+ .name = "cgroup",
.capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK,
};
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index b981118..e6991d6 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -17,6 +17,7 @@ void default_unplug_io_fn(struct backing_dev_info *bdi, struct page *page)
EXPORT_SYMBOL(default_unplug_io_fn);
struct backing_dev_info default_backing_dev_info = {
+ .name = "default",
.ra_pages = VM_MAX_READAHEAD * 1024 / PAGE_CACHE_SIZE,
.state = 0,
.capabilities = BDI_CAP_MAP_COPY | BDI_CAP_FLUSH_FORKER,
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 3ecea98..323da00 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -34,6 +34,7 @@ static const struct address_space_operations swap_aops = {
};
static struct backing_dev_info swap_backing_dev_info = {
+ .name = "swap",
.capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK | BDI_CAP_SWAP_BACKED,
.unplug_io_fn = swap_unplug_io_fn,
};
--
1.6.3.rc0.1.gf800
^ permalink raw reply related [flat|nested] 70+ messages in thread
* [PATCH 11/11] writeback: check for registered bdi in flusher add and inode dirty
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
` (9 preceding siblings ...)
2009-05-28 11:46 ` [PATCH 10/11] writeback: add name to backing_dev_info Jens Axboe
@ 2009-05-28 11:46 ` Jens Axboe
2009-05-28 13:56 ` [PATCH 0/11] Per-bdi writeback flusher threads v9 Peter Zijlstra
` (5 subsequent siblings)
16 siblings, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-05-28 11:46 UTC (permalink / raw)
To: linux-kernel, linux-fsdevel, tytso
Cc: chris.mason, david, hch, akpm, jack, yanmin_zhang, richard,
damien.wyart, Jens Axboe
Also a debugging aid. We want to catch dirty inodes being added to
backing devices that don't do writeback.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
---
fs/fs-writeback.c | 7 +++++++
include/linux/backing-dev.h | 1 +
mm/backing-dev.c | 6 ++++++
3 files changed, 14 insertions(+), 0 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index d1d47c0..d6fbfa7 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -672,6 +672,13 @@ void __mark_inode_dirty(struct inode *inode, int flags)
*/
if (!was_dirty) {
struct bdi_writeback *wb = inode_get_wb(inode);
+ struct backing_dev_info *bdi = wb->bdi;
+
+ if (bdi_cap_writeback_dirty(bdi) &&
+ !test_bit(BDI_registered, &bdi->state)) {
+ WARN_ON(1);
+ printk("bdi-%s not registered\n", bdi->name);
+ }
inode->dirtied_when = jiffies;
list_move(&inode->i_list, &wb->b_dirty);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 653a652..2831c81 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -31,6 +31,7 @@ enum bdi_state {
BDI_wblist_lock, /* bdi->wb_list now needs locking */
BDI_async_congested, /* The async (write) queue is getting full */
BDI_sync_congested, /* The sync queue is getting full */
+ BDI_registered, /* bdi_register() was done */
BDI_unused, /* Available bits start here */
};
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index e6991d6..3882ac3 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -553,6 +553,11 @@ static void bdi_add_one_flusher_task(struct backing_dev_info *bdi,
if (!bdi_cap_writeback_dirty(bdi))
return;
+ if (WARN_ON(!test_bit(BDI_registered, &bdi->state))) {
+ printk("bdi %p/%s is not registered!\n", bdi, bdi->name);
+ return;
+ }
+
/*
* Check with the helper whether to proceed adding a task. Will only
* abort if we two or more simultanous calls to
@@ -661,6 +666,7 @@ remove_err:
}
bdi_debug_register(bdi, dev_name(dev));
+ set_bit(BDI_registered, &bdi->state);
exit:
return ret;
}
--
1.6.3.rc0.1.gf800
^ permalink raw reply related [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
` (10 preceding siblings ...)
2009-05-28 11:46 ` [PATCH 11/11] writeback: check for registered bdi in flusher add and inode dirty Jens Axboe
@ 2009-05-28 13:56 ` Peter Zijlstra
2009-05-28 22:28 ` Jens Axboe
2009-05-28 14:17 ` Artem Bityutskiy
` (4 subsequent siblings)
16 siblings, 1 reply; 70+ messages in thread
From: Peter Zijlstra @ 2009-05-28 13:56 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
On Thu, 2009-05-28 at 13:46 +0200, Jens Axboe wrote:
> - Get rid of the explicit wait queues, we can just use wake_up_process()
> since it's just for that one task.
Ah, good, should clean up those funny prepare/finish_wait thingies that
looked odd.
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 04/11] writeback: switch to per-bdi threads for flushing data
2009-05-28 11:46 ` [PATCH 04/11] writeback: switch to per-bdi threads for flushing data Jens Axboe
@ 2009-05-28 14:13 ` Artem Bityutskiy
2009-05-28 22:28 ` Jens Axboe
0 siblings, 1 reply; 70+ messages in thread
From: Artem Bityutskiy @ 2009-05-28 14:13 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
Jens Axboe wrote:
> +#define BDI_CAP_FLUSH_FORKER 0x00000200
Would it please be possible to add a comment about
what this flag is, and whether it is for internal
usage or not. Not immediately obvious for me.
Artem.
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
` (11 preceding siblings ...)
2009-05-28 13:56 ` [PATCH 0/11] Per-bdi writeback flusher threads v9 Peter Zijlstra
@ 2009-05-28 14:17 ` Artem Bityutskiy
2009-05-28 14:19 ` Artem Bityutskiy
2009-05-28 14:41 ` Theodore Tso
` (3 subsequent siblings)
16 siblings, 1 reply; 70+ messages in thread
From: Artem Bityutskiy @ 2009-05-28 14:17 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
Jens Axboe wrote:
> Here's the 9th version of the writeback patches. Changes since v8:
>
> - Fix a bdi_work on-stack allocation hang. I hope this fixes Ted's
> issue.
> - Get rid of the explicit wait queues, we can just use wake_up_process()
> since it's just for that one task.
> - Add separate "sync_supers" thread that makes sure that the dirty
> super blocks get written. We cannot safely do this from bdi_forker_task(),
> as that risks deadlocking on ->s_umount. Artem, I implemented this
> by doing the wake ups from a timer so that it would be easier for you
> to just deactivate the timer when there are no super blocks.
Thanks.
I've just tried to test UBIFS with your patches (writeback-v9)
and got lots of these warnings:
------------[ cut here ]------------
WARNING: at fs/fs-writeback.c:679 __mark_inode_dirty+0x1b6/0x212()
Hardware name: HP xw6600 Workstation
Modules linked in: deflate zlib_deflate lzo lzo_decompress lzo_compress ubifs crc16 ubi nandsim nand nand_ids nand_ecc mtd cpufreq_ondemand acpi_cpufreq freq_table iTCO_wdt iTCO_vendor_support tg3 libphy wmi mptsas mptscsih mptbase scsi_transport_sas [last unloaded: microcode]
Pid: 2210, comm: integck Tainted: G W 2.6.30-rc7-block-2.6 #1
Call Trace:
[<ffffffff810ecf78>] ? __mark_inode_dirty+0x1b6/0x212
[<ffffffff8103ffe2>] warn_slowpath_common+0x77/0xa4
[<ffffffff8104001e>] warn_slowpath_null+0xf/0x11
[<ffffffff810ecf78>] __mark_inode_dirty+0x1b6/0x212
[<ffffffff810a4faa>] __set_page_dirty_nobuffers+0xf5/0x105
[<ffffffffa00c4399>] ubifs_write_end+0x1a9/0x236 [ubifs]
[<ffffffff8109c7c1>] ? pagefault_enable+0x28/0x33
[<ffffffff8109cc8f>] ? iov_iter_copy_from_user_atomic+0xfb/0x10a
[<ffffffff8109e2da>] generic_file_buffered_write+0x18c/0x2d9
[<ffffffff8109e828>] __generic_file_aio_write_nolock+0x261/0x295
[<ffffffff8109f09f>] generic_file_aio_write+0x69/0xc5
[<ffffffffa00c39d6>] ubifs_aio_write+0x14c/0x19e [ubifs]
[<ffffffff810d1a89>] do_sync_write+0xe7/0x12d
[<ffffffff812f51c5>] ? __mutex_lock_common+0x36f/0x419
[<ffffffff812f5218>] ? __mutex_lock_common+0x3c2/0x419
[<ffffffff81054bd4>] ? autoremove_wake_function+0x0/0x38
[<ffffffff812f4cae>] ? __mutex_unlock_slowpath+0x10d/0x13c
[<ffffffff8106211f>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff812f4ccb>] ? __mutex_unlock_slowpath+0x12a/0x13c
[<ffffffff811578d0>] ? security_file_permission+0x11/0x13
[<ffffffff810d24ae>] vfs_write+0xab/0x105
[<ffffffff810d25cc>] sys_write+0x47/0x70
[<ffffffff8100bc2b>] system_call_fastpath+0x16/0x1b
---[ end trace 7205fe43ac3aa184 ]---
And then eventually my test failed. It yells at this code:
if (bdi_cap_writeback_dirty(bdi) &&
!test_bit(BDI_registered, &bdi->state)) {
WARN_ON(1);
printk("bdi-%s not registered\n", bdi->name);
}
UBIFS is flash file-system. It works on top of MTD devices,
not block devices. Well, to be correct, it works on top of
UBI volumes, which sit on top of MTD devices, which represent
raw flash.
UBIFS needs write-back, but it does not need a full BDI
device. So we used-to have a fake BDI device. Also, UBIFS
wants to disable read-ahead. We do not need anything else
from the block sub-system.
I guess the reason for the complaint is that UBIFS does
not call 'bdi_register()' or 'bdi_register_dev()'. The
question is - should it? 'bdi_register()' a block device,
but we do not have one.
Suggestions?
Artem.
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-28 14:17 ` Artem Bityutskiy
@ 2009-05-28 14:19 ` Artem Bityutskiy
2009-05-28 20:35 ` Peter Zijlstra
0 siblings, 1 reply; 70+ messages in thread
From: Artem Bityutskiy @ 2009-05-28 14:19 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
Artem Bityutskiy wrote:
> question is - should it? 'bdi_register()' a block device,
> but we do not have one.
Sorry, wanted to say: 'bdi_register()' registers a block
device.
Artem.
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
` (12 preceding siblings ...)
2009-05-28 14:17 ` Artem Bityutskiy
@ 2009-05-28 14:41 ` Theodore Tso
2009-05-29 16:07 ` Artem Bityutskiy
` (2 subsequent siblings)
16 siblings, 0 replies; 70+ messages in thread
From: Theodore Tso @ 2009-05-28 14:41 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-kernel, linux-fsdevel, chris.mason, david, hch, akpm, jack,
yanmin_zhang, richard, damien.wyart
On Thu, May 28, 2009 at 01:46:33PM +0200, Jens Axboe wrote:
> Hi,
>
> Here's the 9th version of the writeback patches. Changes since v8:
>
> - Fix a bdi_work on-stack allocation hang. I hope this fixes Ted's
> issue.
It appears to have fixed the soft lockup hang when running fsstress,
thanks!!
- Ted
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-28 14:19 ` Artem Bityutskiy
@ 2009-05-28 20:35 ` Peter Zijlstra
2009-05-28 22:27 ` Jens Axboe
2009-05-29 15:37 ` Artem Bityutskiy
0 siblings, 2 replies; 70+ messages in thread
From: Peter Zijlstra @ 2009-05-28 20:35 UTC (permalink / raw)
To: Artem Bityutskiy
Cc: Jens Axboe, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, akpm, jack, yanmin_zhang, richard, damien.wyart
On Thu, 2009-05-28 at 17:19 +0300, Artem Bityutskiy wrote:
> Artem Bityutskiy wrote:
> > question is - should it? 'bdi_register()' a block device,
> > but we do not have one.
>
> Sorry, wanted to say: 'bdi_register()' registers a block
> device.
BDI stands for backing device info and is not related to block devices
other than that block devices can also be (ok, always are) backing
devices.
If you want to do writeback, you need a backing device to write to. The
BDI is the device abstraction for writeback.
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-28 20:35 ` Peter Zijlstra
@ 2009-05-28 22:27 ` Jens Axboe
2009-05-29 15:37 ` Artem Bityutskiy
1 sibling, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-05-28 22:27 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Artem Bityutskiy, linux-kernel, linux-fsdevel, tytso,
chris.mason, david, hch, akpm, jack, yanmin_zhang, richard,
damien.wyart
On Thu, May 28 2009, Peter Zijlstra wrote:
> On Thu, 2009-05-28 at 17:19 +0300, Artem Bityutskiy wrote:
> > Artem Bityutskiy wrote:
> > > question is - should it? 'bdi_register()' a block device,
> > > but we do not have one.
> >
> > Sorry, wanted to say: 'bdi_register()' registers a block
> > device.
>
> BDI stands for backing device info and is not related to block devices
> other than that block devices can also be (ok, always are) backing
> devices.
>
> If you want to do writeback, you need a backing device to write to. The
> BDI is the device abstraction for writeback.
Precisely. Apparently ubifs doesn't register its backing device. I fixed
a similar issue in btrfs, I'll do an audit of the file systems and fix
that up.
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 04/11] writeback: switch to per-bdi threads for flushing data
2009-05-28 14:13 ` Artem Bityutskiy
@ 2009-05-28 22:28 ` Jens Axboe
0 siblings, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-05-28 22:28 UTC (permalink / raw)
To: Artem Bityutskiy
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
On Thu, May 28 2009, Artem Bityutskiy wrote:
> Jens Axboe wrote:
>> +#define BDI_CAP_FLUSH_FORKER 0x00000200
>
> Would it please be possible to add a comment about
> what this flag is, and whether it is for internal
> usage or not. Not immediately obvious for me.
It's internal, probably I should just replace it with a check for
&default_backing_dev_info. If not, I'll add a comment.
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-28 13:56 ` [PATCH 0/11] Per-bdi writeback flusher threads v9 Peter Zijlstra
@ 2009-05-28 22:28 ` Jens Axboe
0 siblings, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-05-28 22:28 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
On Thu, May 28 2009, Peter Zijlstra wrote:
> On Thu, 2009-05-28 at 13:46 +0200, Jens Axboe wrote:
> > - Get rid of the explicit wait queues, we can just use wake_up_process()
> > since it's just for that one task.
>
> Ah, good, should clean up those funny prepare/finish_wait thingies that
> looked odd.
Precisely, they are gone now :-)
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-28 20:35 ` Peter Zijlstra
@ 2009-05-29 15:37 ` Artem Bityutskiy
2009-05-29 15:37 ` Artem Bityutskiy
1 sibling, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-05-29 15:37 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jens Axboe, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, akpm, jack, yanmin_zhang, richard, damien.wyart
Peter Zijlstra wrote:
> On Thu, 2009-05-28 at 17:19 +0300, Artem Bityutskiy wrote:
>> Artem Bityutskiy wrote:
>>> question is - should it? 'bdi_register()' a block device,
>>> but we do not have one.
>> Sorry, wanted to say: 'bdi_register()' registers a block
>> device.
>
> BDI stands for backing device info and is not related to block devices
> other than that block devices can also be (ok, always are) backing
> devices.
>
> If you want to do writeback, you need a backing device to write to. The
> BDI is the device abstraction for writeback.
I see, thanks. The below UBIFS patch fixes the issue. I'll
push it to ubifs-2.6.git tree, unless there are objections.
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Subject: [PATCH] UBIFS: do not forget to register BDI device
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
fs/ubifs/super.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 2349e2c..d1ac967 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -1929,6 +1929,9 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
err = bdi_init(&c->bdi);
if (err)
goto out_close;
+ err = bdi_register(&c->bdi, NULL, "ubifs");
+ if (err)
+ goto out_close;
err = ubifs_parse_options(c, data, 0);
if (err)
--
1.6.0.6
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
^ permalink raw reply related [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
@ 2009-05-29 15:37 ` Artem Bityutskiy
0 siblings, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-05-29 15:37 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Jens Axboe, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, akpm, jack, yanmin_zhang, richard, damien.wyart
Peter Zijlstra wrote:
> On Thu, 2009-05-28 at 17:19 +0300, Artem Bityutskiy wrote:
>> Artem Bityutskiy wrote:
>>> question is - should it? 'bdi_register()' a block device,
>>> but we do not have one.
>> Sorry, wanted to say: 'bdi_register()' registers a block
>> device.
>
> BDI stands for backing device info and is not related to block devices
> other than that block devices can also be (ok, always are) backing
> devices.
>
> If you want to do writeback, you need a backing device to write to. The
> BDI is the device abstraction for writeback.
I see, thanks. The below UBIFS patch fixes the issue. I'll
push it to ubifs-2.6.git tree, unless there are objections.
From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Subject: [PATCH] UBIFS: do not forget to register BDI device
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
---
fs/ubifs/super.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index 2349e2c..d1ac967 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -1929,6 +1929,9 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
err = bdi_init(&c->bdi);
if (err)
goto out_close;
+ err = bdi_register(&c->bdi, NULL, "ubifs");
+ if (err)
+ goto out_close;
err = ubifs_parse_options(c, data, 0);
if (err)
--
1.6.0.6
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-29 15:37 ` Artem Bityutskiy
(?)
@ 2009-05-29 15:50 ` Jens Axboe
2009-05-29 16:02 ` Artem Bityutskiy
-1 siblings, 1 reply; 70+ messages in thread
From: Jens Axboe @ 2009-05-29 15:50 UTC (permalink / raw)
To: Artem Bityutskiy
Cc: Peter Zijlstra, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, akpm, jack, yanmin_zhang, richard, damien.wyart
On Fri, May 29 2009, Artem Bityutskiy wrote:
> Peter Zijlstra wrote:
>> On Thu, 2009-05-28 at 17:19 +0300, Artem Bityutskiy wrote:
>>> Artem Bityutskiy wrote:
>>>> question is - should it? 'bdi_register()' a block device,
>>>> but we do not have one.
>>> Sorry, wanted to say: 'bdi_register()' registers a block
>>> device.
>>
>> BDI stands for backing device info and is not related to block devices
>> other than that block devices can also be (ok, always are) backing
>> devices.
>>
>> If you want to do writeback, you need a backing device to write to. The
>> BDI is the device abstraction for writeback.
>
> I see, thanks. The below UBIFS patch fixes the issue. I'll
> push it to ubifs-2.6.git tree, unless there are objections.
>
> From: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
> Subject: [PATCH] UBIFS: do not forget to register BDI device
>
> Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
> ---
> fs/ubifs/super.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
> index 2349e2c..d1ac967 100644
> --- a/fs/ubifs/super.c
> +++ b/fs/ubifs/super.c
> @@ -1929,6 +1929,9 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
> err = bdi_init(&c->bdi);
> if (err)
> goto out_close;
> + err = bdi_register(&c->bdi, NULL, "ubifs");
> + if (err)
> + goto out_close;
Not quite right, you need to call bdi_destroy() if you have done the
init.
I committed this one this morning:
http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=570a2fe1df85741988ad0ca22aa406744436e281
But feel free to commit/submit to the ubifs tree directly, then it'll
disappear from my tree once it is merged.
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-29 15:50 ` Jens Axboe
@ 2009-05-29 16:02 ` Artem Bityutskiy
0 siblings, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-05-29 16:02 UTC (permalink / raw)
To: Jens Axboe
Cc: Peter Zijlstra, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, akpm, jack, yanmin_zhang, richard, damien.wyart
Jens Axboe wrote:
>> diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
>> index 2349e2c..d1ac967 100644
>> --- a/fs/ubifs/super.c
>> +++ b/fs/ubifs/super.c
>> @@ -1929,6 +1929,9 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
>> err = bdi_init(&c->bdi);
>> if (err)
>> goto out_close;
>> + err = bdi_register(&c->bdi, NULL, "ubifs");
>> + if (err)
>> + goto out_close;
>
> Not quite right, you need to call bdi_destroy() if you have done the
> init.
Right, bdi_destroy() is already there for long time.
I'm confused.
> I committed this one this morning:
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=570a2fe1df85741988ad0ca22aa406744436e281
Hmm, it is the same as my patch, but you do
+ err = bdi_register(&c->bdi);
while I do
+ err = bdi_register(&c->bdi, NULL, "ubifs");
> But feel free to commit/submit to the ubifs tree directly, then it'll
> disappear from my tree once it is merged.
Yeah, I think it can go via my tree. I'd merge it at
2.6.31 window. This change does not depend on your
work anyway.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
@ 2009-05-29 16:02 ` Artem Bityutskiy
0 siblings, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-05-29 16:02 UTC (permalink / raw)
To: Jens Axboe
Cc: Peter Zijlstra, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, akpm, jack, yanmin_zhang, richard, damien.wyart
Jens Axboe wrote:
>> diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
>> index 2349e2c..d1ac967 100644
>> --- a/fs/ubifs/super.c
>> +++ b/fs/ubifs/super.c
>> @@ -1929,6 +1929,9 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
>> err = bdi_init(&c->bdi);
>> if (err)
>> goto out_close;
>> + err = bdi_register(&c->bdi, NULL, "ubifs");
>> + if (err)
>> + goto out_close;
>
> Not quite right, you need to call bdi_destroy() if you have done the
> init.
Right, bdi_destroy() is already there for long time.
I'm confused.
> I committed this one this morning:
>
> http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=570a2fe1df85741988ad0ca22aa406744436e281
Hmm, it is the same as my patch, but you do
+ err = bdi_register(&c->bdi);
while I do
+ err = bdi_register(&c->bdi, NULL, "ubifs");
> But feel free to commit/submit to the ubifs tree directly, then it'll
> disappear from my tree once it is merged.
Yeah, I think it can go via my tree. I'd merge it at
2.6.31 window. This change does not depend on your
work anyway.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
` (13 preceding siblings ...)
2009-05-28 14:41 ` Theodore Tso
@ 2009-05-29 16:07 ` Artem Bityutskiy
2009-05-29 16:20 ` Artem Bityutskiy
2009-05-29 17:08 ` Jens Axboe
2009-06-03 11:12 ` Artem Bityutskiy
2009-06-04 15:20 ` Frederic Weisbecker
16 siblings, 2 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-05-29 16:07 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
Jens Axboe wrote:
> Hi,
>
> Here's the 9th version of the writeback patches. Changes since v8:
>
> - Fix a bdi_work on-stack allocation hang. I hope this fixes Ted's
> issue.
> - Get rid of the explicit wait queues, we can just use wake_up_process()
> since it's just for that one task.
> - Add separate "sync_supers" thread that makes sure that the dirty
> super blocks get written. We cannot safely do this from bdi_forker_task(),
> as that risks deadlocking on ->s_umount. Artem, I implemented this
> by doing the wake ups from a timer so that it would be easier for you
> to just deactivate the timer when there are no super blocks.
>
> For ease of patching, I've put the full diff here:
>
> http://kernel.dk/writeback-v9.patch
>
> and also stored this in a writeback-v9 branch that will not change,
> you can pull that into Linus tree from here:
>
> git://git.kernel.dk/linux-2.6-block.git writeback-v9
I'm working with the above branch. Got the following twice.
Not sure what triggers this, probably if I do nothing and
cpufreq starts doing its magic, this is triggered.
And I'm not sure it has something to do with your changes,
it is just that I saw this only with your tree. Please,
ignore if this is not relevant.
=======================================================
scaling: [ INFO: possible circular locking dependency detected ]
2.6.30-rc7-block-2.6 #1
-------------------------------------------------------
K99cpuspeed/9923 is trying to acquire lock:
(&(&dbs_info->work)->work){+.+...}, at: [<ffffffff81051155>] __cancel_work_timer+0xd9/0x21d
but task is already holding lock:
(dbs_mutex){+.+.+.}, at: [<ffffffffa0073aa8>] cpufreq_governor_dbs+0x23c/0x2cc [cpufreq_ondemand]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (dbs_mutex){+.+.+.}:
[<ffffffff81063529>] __lock_acquire+0xa63/0xbeb
[<ffffffff8106379f>] lock_acquire+0xee/0x112
[<ffffffff812f4eb0>] __mutex_lock_common+0x5a/0x419
[<ffffffff812f5309>] mutex_lock_nested+0x30/0x35
[<ffffffffa00738f2>] cpufreq_governor_dbs+0x86/0x2cc [cpufreq_ondemand]
[<ffffffff8125eaa4>] __cpufreq_governor+0x84/0xc2
[<ffffffff8125ecae>] __cpufreq_set_policy+0x195/0x211
[<ffffffff8125f6fb>] store_scaling_governor+0x1e7/0x223
[<ffffffff8126038f>] store+0x5f/0x83
[<ffffffff81125107>] sysfs_write_file+0xe4/0x119
[<ffffffff810d24ae>] vfs_write+0xab/0x105
[<ffffffff810d25cc>] sys_write+0x47/0x70
[<ffffffff8100bc2b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
-> #1 (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}:
[<ffffffff81063529>] __lock_acquire+0xa63/0xbeb
[<ffffffff8106379f>] lock_acquire+0xee/0x112
[<ffffffff812f5561>] down_write+0x3d/0x49
[<ffffffff8125fc31>] lock_policy_rwsem_write+0x48/0x78
[<ffffffffa007364c>] do_dbs_timer+0x5f/0x27f [cpufreq_ondemand]
[<ffffffff81050869>] worker_thread+0x24b/0x367
[<ffffffff810547c1>] kthread+0x56/0x83
[<ffffffff8100cd3a>] child_rip+0xa/0x20
[<ffffffffffffffff>] 0xffffffffffffffff
-> #0 (&(&dbs_info->work)->work){+.+...}:
[<ffffffff8106341d>] __lock_acquire+0x957/0xbeb
[<ffffffff8106379f>] lock_acquire+0xee/0x112
[<ffffffff81051189>] __cancel_work_timer+0x10d/0x21d
[<ffffffff810512a6>] cancel_delayed_work_sync+0xd/0xf
[<ffffffffa0073abb>] cpufreq_governor_dbs+0x24f/0x2cc [cpufreq_ondemand]
[<ffffffff8125eaa4>] __cpufreq_governor+0x84/0xc2
[<ffffffff8125ec98>] __cpufreq_set_policy+0x17f/0x211
[<ffffffff8125f6fb>] store_scaling_governor+0x1e7/0x223
[<ffffffff8126038f>] store+0x5f/0x83
[<ffffffff81125107>] sysfs_write_file+0xe4/0x119
[<ffffffff810d24ae>] vfs_write+0xab/0x105
[<ffffffff810d25cc>] sys_write+0x47/0x70
[<ffffffff8100bc2b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
other info that might help us debug this:
3 locks held by K99cpuspeed/9923:
#0: (&buffer->mutex){+.+.+.}, at: [<ffffffff8112505b>] sysfs_write_file+0x38/0x119
#1: (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}, at: [<ffffffff8125fc31>] lock_policy_rwsem_write+0x48/0x78
#2: (dbs_mutex){+.+.+.}, at: [<ffffffffa0073aa8>] cpufreq_governor_dbs+0x23c/0x2cc [cpufreq_ondemand]
stack backtrace:
Pid: 9923, comm: K99cpuspeed Not tainted 2.6.30-rc7-block-2.6 #1
Call Trace:
[<ffffffff81062750>] print_circular_bug_tail+0x71/0x7c
[<ffffffff8106341d>] __lock_acquire+0x957/0xbeb
[<ffffffff8106379f>] lock_acquire+0xee/0x112
[<ffffffff81051155>] ? __cancel_work_timer+0xd9/0x21d
[<ffffffff81051189>] __cancel_work_timer+0x10d/0x21d
[<ffffffff81051155>] ? __cancel_work_timer+0xd9/0x21d
[<ffffffff812f5218>] ? __mutex_lock_common+0x3c2/0x419
[<ffffffffa0073aa8>] ? cpufreq_governor_dbs+0x23c/0x2cc [cpufreq_ondemand]
[<ffffffff81061e66>] ? mark_held_locks+0x4d/0x6b
[<ffffffffa0073aa8>] ? cpufreq_governor_dbs+0x23c/0x2cc [cpufreq_ondemand]
[<ffffffff810512a6>] cancel_delayed_work_sync+0xd/0xf
[<ffffffffa0073abb>] cpufreq_governor_dbs+0x24f/0x2cc [cpufreq_ondemand]
[<ffffffff810580f1>] ? up_read+0x26/0x2b
[<ffffffff8125eaa4>] __cpufreq_governor+0x84/0xc2
[<ffffffff8125ec98>] __cpufreq_set_policy+0x17f/0x211
[<ffffffff8125f6fb>] store_scaling_governor+0x1e7/0x223
[<ffffffff812604dc>] ? handle_update+0x0/0x33
[<ffffffff812f5569>] ? down_write+0x45/0x49
[<ffffffff8126038f>] store+0x5f/0x83
[<ffffffff81125107>] sysfs_write_file+0xe4/0x119
[<ffffffff810d24ae>] vfs_write+0xab/0x105
[<ffffffff810d25cc>] sys_write+0x47/0x70
[<ffffffff8100bc2b>] system_call_fastpath+0x16/0x1b
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-29 16:07 ` Artem Bityutskiy
@ 2009-05-29 16:20 ` Artem Bityutskiy
2009-05-29 17:08 ` Jens Axboe
1 sibling, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-05-29 16:20 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
Artem Bityutskiy wrote:
> Jens Axboe wrote:
>> Hi,
>>
>> Here's the 9th version of the writeback patches. Changes since v8:
>>
>> - Fix a bdi_work on-stack allocation hang. I hope this fixes Ted's
>> issue.
>> - Get rid of the explicit wait queues, we can just use wake_up_process()
>> since it's just for that one task.
>> - Add separate "sync_supers" thread that makes sure that the dirty
>> super blocks get written. We cannot safely do this from
>> bdi_forker_task(),
>> as that risks deadlocking on ->s_umount. Artem, I implemented this
>> by doing the wake ups from a timer so that it would be easier for you
>> to just deactivate the timer when there are no super blocks.
>>
>> For ease of patching, I've put the full diff here:
>>
>> http://kernel.dk/writeback-v9.patch
>>
>> and also stored this in a writeback-v9 branch that will not change,
>> you can pull that into Linus tree from here:
>>
>> git://git.kernel.dk/linux-2.6-block.git writeback-v9
>
> I'm working with the above branch. Got the following twice.
> Not sure what triggers this, probably if I do nothing and
> cpufreq starts doing its magic, this is triggered.
>
> And I'm not sure it has something to do with your changes,
> it is just that I saw this only with your tree. Please,
> ignore if this is not relevant.
Sorry, probably I shouldn't have reported this before looking
closer. I'll investigate this later and fine out whether it
is related to your work or not. Sorry for too early and probably
false alarm.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
@ 2009-05-29 16:20 ` Artem Bityutskiy
0 siblings, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-05-29 16:20 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
Artem Bityutskiy wrote:
> Jens Axboe wrote:
>> Hi,
>>
>> Here's the 9th version of the writeback patches. Changes since v8:
>>
>> - Fix a bdi_work on-stack allocation hang. I hope this fixes Ted's
>> issue.
>> - Get rid of the explicit wait queues, we can just use wake_up_process()
>> since it's just for that one task.
>> - Add separate "sync_supers" thread that makes sure that the dirty
>> super blocks get written. We cannot safely do this from
>> bdi_forker_task(),
>> as that risks deadlocking on ->s_umount. Artem, I implemented this
>> by doing the wake ups from a timer so that it would be easier for you
>> to just deactivate the timer when there are no super blocks.
>>
>> For ease of patching, I've put the full diff here:
>>
>> http://kernel.dk/writeback-v9.patch
>>
>> and also stored this in a writeback-v9 branch that will not change,
>> you can pull that into Linus tree from here:
>>
>> git://git.kernel.dk/linux-2.6-block.git writeback-v9
>
> I'm working with the above branch. Got the following twice.
> Not sure what triggers this, probably if I do nothing and
> cpufreq starts doing its magic, this is triggered.
>
> And I'm not sure it has something to do with your changes,
> it is just that I saw this only with your tree. Please,
> ignore if this is not relevant.
Sorry, probably I shouldn't have reported this before looking
closer. I'll investigate this later and fine out whether it
is related to your work or not. Sorry for too early and probably
false alarm.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-29 16:02 ` Artem Bityutskiy
(?)
@ 2009-05-29 17:07 ` Jens Axboe
2009-06-03 7:39 ` Artem Bityutskiy
-1 siblings, 1 reply; 70+ messages in thread
From: Jens Axboe @ 2009-05-29 17:07 UTC (permalink / raw)
To: Artem Bityutskiy
Cc: Peter Zijlstra, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, akpm, jack, yanmin_zhang, richard, damien.wyart
On Fri, May 29 2009, Artem Bityutskiy wrote:
> Jens Axboe wrote:
>>> diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
>>> index 2349e2c..d1ac967 100644
>>> --- a/fs/ubifs/super.c
>>> +++ b/fs/ubifs/super.c
>>> @@ -1929,6 +1929,9 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
>>> err = bdi_init(&c->bdi);
>>> if (err)
>>> goto out_close;
>>> + err = bdi_register(&c->bdi, NULL, "ubifs");
>>> + if (err)
>>> + goto out_close;
>>
>> Not quite right, you need to call bdi_destroy() if you have done the
>> init.
>
> Right, bdi_destroy() is already there for long time.
> I'm confused.
>
>> I committed this one this morning:
>>
>> http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=570a2fe1df85741988ad0ca22aa406744436e281
>
> Hmm, it is the same as my patch, but you do
> + err = bdi_register(&c->bdi);
> while I do
> + err = bdi_register(&c->bdi, NULL, "ubifs");
Oops, that's my bad. If you combine the two, we should have a working
patch :-)
>> But feel free to commit/submit to the ubifs tree directly, then it'll
>> disappear from my tree once it is merged.
>
> Yeah, I think it can go via my tree. I'd merge it at
> 2.6.31 window. This change does not depend on your
> work anyway.
Right, I'll just carry the fixup patches meanwhile as well, but wont
upstream them.
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-29 16:07 ` Artem Bityutskiy
2009-05-29 16:20 ` Artem Bityutskiy
@ 2009-05-29 17:08 ` Jens Axboe
1 sibling, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-05-29 17:08 UTC (permalink / raw)
To: Artem Bityutskiy
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
On Fri, May 29 2009, Artem Bityutskiy wrote:
> Jens Axboe wrote:
>> Hi,
>>
>> Here's the 9th version of the writeback patches. Changes since v8:
>>
>> - Fix a bdi_work on-stack allocation hang. I hope this fixes Ted's
>> issue.
>> - Get rid of the explicit wait queues, we can just use wake_up_process()
>> since it's just for that one task.
>> - Add separate "sync_supers" thread that makes sure that the dirty
>> super blocks get written. We cannot safely do this from bdi_forker_task(),
>> as that risks deadlocking on ->s_umount. Artem, I implemented this
>> by doing the wake ups from a timer so that it would be easier for you
>> to just deactivate the timer when there are no super blocks.
>>
>> For ease of patching, I've put the full diff here:
>>
>> http://kernel.dk/writeback-v9.patch
>>
>> and also stored this in a writeback-v9 branch that will not change,
>> you can pull that into Linus tree from here:
>>
>> git://git.kernel.dk/linux-2.6-block.git writeback-v9
>
> I'm working with the above branch. Got the following twice.
> Not sure what triggers this, probably if I do nothing and
> cpufreq starts doing its magic, this is triggered.
>
> And I'm not sure it has something to do with your changes,
> it is just that I saw this only with your tree. Please,
> ignore if this is not relevant.
OK, doesn't look related, but if it only triggers with the writeback
patches, something fishy is going on. I'll check up on it.
>
> =======================================================
> scaling: [ INFO: possible circular locking dependency detected ]
> 2.6.30-rc7-block-2.6 #1
> -------------------------------------------------------
> K99cpuspeed/9923 is trying to acquire lock:
> (&(&dbs_info->work)->work){+.+...}, at: [<ffffffff81051155>]
> __cancel_work_timer+0xd9/0x21d
>
> but task is already holding lock:
> (dbs_mutex){+.+.+.}, at: [<ffffffffa0073aa8>] cpufreq_governor_dbs+0x23c/0x2cc [cpufreq_ondemand]
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (dbs_mutex){+.+.+.}:
> [<ffffffff81063529>] __lock_acquire+0xa63/0xbeb
> [<ffffffff8106379f>] lock_acquire+0xee/0x112
> [<ffffffff812f4eb0>] __mutex_lock_common+0x5a/0x419
> [<ffffffff812f5309>] mutex_lock_nested+0x30/0x35
> [<ffffffffa00738f2>] cpufreq_governor_dbs+0x86/0x2cc [cpufreq_ondemand]
> [<ffffffff8125eaa4>] __cpufreq_governor+0x84/0xc2
> [<ffffffff8125ecae>] __cpufreq_set_policy+0x195/0x211
> [<ffffffff8125f6fb>] store_scaling_governor+0x1e7/0x223
> [<ffffffff8126038f>] store+0x5f/0x83
> [<ffffffff81125107>] sysfs_write_file+0xe4/0x119
> [<ffffffff810d24ae>] vfs_write+0xab/0x105
> [<ffffffff810d25cc>] sys_write+0x47/0x70
> [<ffffffff8100bc2b>]
> system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> -> #1 (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}:
> [<ffffffff81063529>] __lock_acquire+0xa63/0xbeb
> [<ffffffff8106379f>] lock_acquire+0xee/0x112
> [<ffffffff812f5561>] down_write+0x3d/0x49 [<ffffffff8125fc31>]
> lock_policy_rwsem_write+0x48/0x78
> [<ffffffffa007364c>] do_dbs_timer+0x5f/0x27f [cpufreq_ondemand]
> [<ffffffff81050869>] worker_thread+0x24b/0x367
> [<ffffffff810547c1>] kthread+0x56/0x83
> [<ffffffff8100cd3a>] child_rip+0xa/0x20
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> -> #0 (&(&dbs_info->work)->work){+.+...}:
> [<ffffffff8106341d>] __lock_acquire+0x957/0xbeb
> [<ffffffff8106379f>] lock_acquire+0xee/0x112
> [<ffffffff81051189>] __cancel_work_timer+0x10d/0x21d
> [<ffffffff810512a6>] cancel_delayed_work_sync+0xd/0xf
> [<ffffffffa0073abb>] cpufreq_governor_dbs+0x24f/0x2cc [cpufreq_ondemand]
> [<ffffffff8125eaa4>] __cpufreq_governor+0x84/0xc2
> [<ffffffff8125ec98>] __cpufreq_set_policy+0x17f/0x211
> [<ffffffff8125f6fb>] store_scaling_governor+0x1e7/0x223
> [<ffffffff8126038f>] store+0x5f/0x83
> [<ffffffff81125107>] sysfs_write_file+0xe4/0x119
> [<ffffffff810d24ae>] vfs_write+0xab/0x105
> [<ffffffff810d25cc>] sys_write+0x47/0x70
> [<ffffffff8100bc2b>]
> system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> other info that might help us debug this:
>
> 3 locks held by K99cpuspeed/9923:
> #0: (&buffer->mutex){+.+.+.}, at: [<ffffffff8112505b>] sysfs_write_file+0x38/0x119
> #1: (&per_cpu(cpu_policy_rwsem, cpu)){+++++.}, at: [<ffffffff8125fc31>] lock_policy_rwsem_write+0x48/0x78
> #2: (dbs_mutex){+.+.+.}, at: [<ffffffffa0073aa8>] cpufreq_governor_dbs+0x23c/0x2cc [cpufreq_ondemand]
>
> stack backtrace:
> Pid: 9923, comm: K99cpuspeed Not tainted 2.6.30-rc7-block-2.6 #1
> Call Trace:
> [<ffffffff81062750>] print_circular_bug_tail+0x71/0x7c
> [<ffffffff8106341d>] __lock_acquire+0x957/0xbeb
> [<ffffffff8106379f>] lock_acquire+0xee/0x112
> [<ffffffff81051155>] ? __cancel_work_timer+0xd9/0x21d
> [<ffffffff81051189>] __cancel_work_timer+0x10d/0x21d
> [<ffffffff81051155>] ? __cancel_work_timer+0xd9/0x21d
> [<ffffffff812f5218>] ? __mutex_lock_common+0x3c2/0x419
> [<ffffffffa0073aa8>] ? cpufreq_governor_dbs+0x23c/0x2cc [cpufreq_ondemand]
> [<ffffffff81061e66>] ? mark_held_locks+0x4d/0x6b
> [<ffffffffa0073aa8>] ? cpufreq_governor_dbs+0x23c/0x2cc [cpufreq_ondemand]
> [<ffffffff810512a6>] cancel_delayed_work_sync+0xd/0xf
> [<ffffffffa0073abb>] cpufreq_governor_dbs+0x24f/0x2cc [cpufreq_ondemand]
> [<ffffffff810580f1>] ? up_read+0x26/0x2b
> [<ffffffff8125eaa4>] __cpufreq_governor+0x84/0xc2
> [<ffffffff8125ec98>] __cpufreq_set_policy+0x17f/0x211
> [<ffffffff8125f6fb>] store_scaling_governor+0x1e7/0x223
> [<ffffffff812604dc>] ? handle_update+0x0/0x33
> [<ffffffff812f5569>] ? down_write+0x45/0x49
> [<ffffffff8126038f>] store+0x5f/0x83
> [<ffffffff81125107>] sysfs_write_file+0xe4/0x119
> [<ffffffff810d24ae>] vfs_write+0xab/0x105
> [<ffffffff810d25cc>] sys_write+0x47/0x70
> [<ffffffff8100bc2b>] system_call_fastpath+0x16/0x1b
>
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-29 16:20 ` Artem Bityutskiy
(?)
@ 2009-05-29 17:09 ` Jens Axboe
2009-06-03 8:11 ` Artem Bityutskiy
-1 siblings, 1 reply; 70+ messages in thread
From: Jens Axboe @ 2009-05-29 17:09 UTC (permalink / raw)
To: Artem Bityutskiy
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
On Fri, May 29 2009, Artem Bityutskiy wrote:
> Artem Bityutskiy wrote:
>> Jens Axboe wrote:
>>> Hi,
>>>
>>> Here's the 9th version of the writeback patches. Changes since v8:
>>>
>>> - Fix a bdi_work on-stack allocation hang. I hope this fixes Ted's
>>> issue.
>>> - Get rid of the explicit wait queues, we can just use wake_up_process()
>>> since it's just for that one task.
>>> - Add separate "sync_supers" thread that makes sure that the dirty
>>> super blocks get written. We cannot safely do this from
>>> bdi_forker_task(),
>>> as that risks deadlocking on ->s_umount. Artem, I implemented this
>>> by doing the wake ups from a timer so that it would be easier for you
>>> to just deactivate the timer when there are no super blocks.
>>>
>>> For ease of patching, I've put the full diff here:
>>>
>>> http://kernel.dk/writeback-v9.patch
>>>
>>> and also stored this in a writeback-v9 branch that will not change,
>>> you can pull that into Linus tree from here:
>>>
>>> git://git.kernel.dk/linux-2.6-block.git writeback-v9
>>
>> I'm working with the above branch. Got the following twice.
>> Not sure what triggers this, probably if I do nothing and
>> cpufreq starts doing its magic, this is triggered.
>>
>> And I'm not sure it has something to do with your changes,
>> it is just that I saw this only with your tree. Please,
>> ignore if this is not relevant.
>
> Sorry, probably I shouldn't have reported this before looking
> closer. I'll investigate this later and fine out whether it
> is related to your work or not. Sorry for too early and probably
> false alarm.
No problem. If it does turn out to have some relation to the writeback
stuff, let me know.
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-29 17:07 ` Jens Axboe
@ 2009-06-03 7:39 ` Artem Bityutskiy
2009-06-03 7:44 ` Jens Axboe
0 siblings, 1 reply; 70+ messages in thread
From: Artem Bityutskiy @ 2009-06-03 7:39 UTC (permalink / raw)
To: Jens Axboe
Cc: Peter Zijlstra, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, akpm, jack, yanmin_zhang, richard, damien.wyart
Jens Axboe wrote:
> On Fri, May 29 2009, Artem Bityutskiy wrote:
>> Jens Axboe wrote:
>>>> diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
>>>> index 2349e2c..d1ac967 100644
>>>> --- a/fs/ubifs/super.c
>>>> +++ b/fs/ubifs/super.c
>>>> @@ -1929,6 +1929,9 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
>>>> err = bdi_init(&c->bdi);
>>>> if (err)
>>>> goto out_close;
>>>> + err = bdi_register(&c->bdi, NULL, "ubifs");
>>>> + if (err)
>>>> + goto out_close;
>>> Not quite right, you need to call bdi_destroy() if you have done the
>>> init.
>> Right, bdi_destroy() is already there for long time.
>> I'm confused.
>>
>>> I committed this one this morning:
>>>
>>> http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=570a2fe1df85741988ad0ca22aa406744436e281
>> Hmm, it is the same as my patch, but you do
>> + err = bdi_register(&c->bdi);
>> while I do
>> + err = bdi_register(&c->bdi, NULL, "ubifs");
>
> Oops, that's my bad. If you combine the two, we should have a working
> patch :-)
>
>>> But feel free to commit/submit to the ubifs tree directly, then it'll
>>> disappear from my tree once it is merged.
>> Yeah, I think it can go via my tree. I'd merge it at
>> 2.6.31 window. This change does not depend on your
>> work anyway.
>
> Right, I'll just carry the fixup patches meanwhile as well, but wont
> upstream them.
Just to make sure I understood you correctly. I assume my original
patch is fine (because there is bdi_destroy()) and merge it to
ubifs tree.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-03 7:39 ` Artem Bityutskiy
@ 2009-06-03 7:44 ` Jens Axboe
2009-06-03 7:46 ` Artem Bityutskiy
2009-06-03 7:59 ` Artem Bityutskiy
0 siblings, 2 replies; 70+ messages in thread
From: Jens Axboe @ 2009-06-03 7:44 UTC (permalink / raw)
To: Artem Bityutskiy
Cc: Peter Zijlstra, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, akpm, jack, yanmin_zhang, richard, damien.wyart
On Wed, Jun 03 2009, Artem Bityutskiy wrote:
> Jens Axboe wrote:
>> On Fri, May 29 2009, Artem Bityutskiy wrote:
>>> Jens Axboe wrote:
>>>>> diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
>>>>> index 2349e2c..d1ac967 100644
>>>>> --- a/fs/ubifs/super.c
>>>>> +++ b/fs/ubifs/super.c
>>>>> @@ -1929,6 +1929,9 @@ static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
>>>>> err = bdi_init(&c->bdi);
>>>>> if (err)
>>>>> goto out_close;
>>>>> + err = bdi_register(&c->bdi, NULL, "ubifs");
>>>>> + if (err)
>>>>> + goto out_close;
>>>> Not quite right, you need to call bdi_destroy() if you have done the
>>>> init.
>>> Right, bdi_destroy() is already there for long time.
>>> I'm confused.
>>>
>>>> I committed this one this morning:
>>>>
>>>> http://git.kernel.dk/?p=linux-2.6-block.git;a=commit;h=570a2fe1df85741988ad0ca22aa406744436e281
>>> Hmm, it is the same as my patch, but you do
>>> + err = bdi_register(&c->bdi);
>>> while I do
>>> + err = bdi_register(&c->bdi, NULL, "ubifs");
>>
>> Oops, that's my bad. If you combine the two, we should have a working
>> patch :-)
>>
>>>> But feel free to commit/submit to the ubifs tree directly, then it'll
>>>> disappear from my tree once it is merged.
>>> Yeah, I think it can go via my tree. I'd merge it at
>>> 2.6.31 window. This change does not depend on your
>>> work anyway.
>>
>> Right, I'll just carry the fixup patches meanwhile as well, but wont
>> upstream them.
>
> Just to make sure I understood you correctly. I assume my original
> patch is fine (because there is bdi_destroy()) and merge it to
> ubifs tree.
It needs to be:
err = bdi_register(&c->bdi, NULL, "ubifs");
if (err)
goto out_bdi;
so you hit the bdi_destroy() for that failure, not goto out_close;
Otherwise it was fine.
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-03 7:44 ` Jens Axboe
@ 2009-06-03 7:46 ` Artem Bityutskiy
2009-06-03 7:59 ` Artem Bityutskiy
1 sibling, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-06-03 7:46 UTC (permalink / raw)
To: Jens Axboe
Cc: Artem Bityutskiy, Peter Zijlstra, linux-kernel, linux-fsdevel,
tytso, chris.mason, david, hch, akpm, jack, yanmin_zhang,
richard, damien.wyart
Jens Axboe wrote:
>> Just to make sure I understood you correctly. I assume my original
>> patch is fine (because there is bdi_destroy()) and merge it to
>> ubifs tree.
>
> It needs to be:
>
> err = bdi_register(&c->bdi, NULL, "ubifs");
> if (err)
> goto out_bdi;
>
> so you hit the bdi_destroy() for that failure, not goto out_close;
> Otherwise it was fine.
Ah, I see. Rather non-typical convention though. I expected
bdi_register() to clean-up stuff in case of failure. Isn't
it a better interface?
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
@ 2009-06-03 7:46 ` Artem Bityutskiy
0 siblings, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-06-03 7:46 UTC (permalink / raw)
To: Jens Axboe
Cc: Artem Bityutskiy, Peter Zijlstra, linux-kernel, linux-fsdevel,
tytso, chris.mason, david, hch, akpm, jack, yanmin_zhang,
richard, damien.wyart
Jens Axboe wrote:
>> Just to make sure I understood you correctly. I assume my original
>> patch is fine (because there is bdi_destroy()) and merge it to
>> ubifs tree.
>
> It needs to be:
>
> err = bdi_register(&c->bdi, NULL, "ubifs");
> if (err)
> goto out_bdi;
>
> so you hit the bdi_destroy() for that failure, not goto out_close;
> Otherwise it was fine.
Ah, I see. Rather non-typical convention though. I expected
bdi_register() to clean-up stuff in case of failure. Isn't
it a better interface?
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-03 7:46 ` Artem Bityutskiy
(?)
@ 2009-06-03 7:50 ` Jens Axboe
2009-06-03 7:54 ` Artem Bityutskiy
-1 siblings, 1 reply; 70+ messages in thread
From: Jens Axboe @ 2009-06-03 7:50 UTC (permalink / raw)
To: Artem Bityutskiy
Cc: Peter Zijlstra, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, akpm, jack, yanmin_zhang, richard, damien.wyart
On Wed, Jun 03 2009, Artem Bityutskiy wrote:
> Jens Axboe wrote:
>>> Just to make sure I understood you correctly. I assume my original
>>> patch is fine (because there is bdi_destroy()) and merge it to
>>> ubifs tree.
>>
>> It needs to be:
>>
>> err = bdi_register(&c->bdi, NULL, "ubifs");
>> if (err)
>> goto out_bdi;
>>
>> so you hit the bdi_destroy() for that failure, not goto out_close;
>> Otherwise it was fine.
>
> Ah, I see. Rather non-typical convention though. I expected
> bdi_register() to clean-up stuff in case of failure. Isn't
> it a better interface?
You already did a bdi_init() at that point. bdi_destroy() must be used
to clean up after both bdi_init() and/or bdi_register().
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-03 7:50 ` Jens Axboe
@ 2009-06-03 7:54 ` Artem Bityutskiy
0 siblings, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-06-03 7:54 UTC (permalink / raw)
To: ext Jens Axboe
Cc: Peter Zijlstra, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, akpm, jack, yanmin_zhang, richard, damien.wyart
Jens Axboe wrote:
>> Ah, I see. Rather non-typical convention though. I expected
>> bdi_register() to clean-up stuff in case of failure. Isn't
>> it a better interface?
>
> You already did a bdi_init() at that point. bdi_destroy() must be used
> to clean up after both bdi_init() and/or bdi_register().
Right, silly me.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
@ 2009-06-03 7:54 ` Artem Bityutskiy
0 siblings, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-06-03 7:54 UTC (permalink / raw)
To: ext Jens Axboe
Cc: Peter Zijlstra, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, akpm, jack, yanmin_zhang, richard, damien.wyart
Jens Axboe wrote:
>> Ah, I see. Rather non-typical convention though. I expected
>> bdi_register() to clean-up stuff in case of failure. Isn't
>> it a better interface?
>
> You already did a bdi_init() at that point. bdi_destroy() must be used
> to clean up after both bdi_init() and/or bdi_register().
Right, silly me.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-03 7:44 ` Jens Axboe
@ 2009-06-03 7:59 ` Artem Bityutskiy
2009-06-03 7:59 ` Artem Bityutskiy
1 sibling, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-06-03 7:59 UTC (permalink / raw)
To: Jens Axboe
Cc: Artem Bityutskiy, Peter Zijlstra, linux-kernel, linux-fsdevel,
tytso, chris.mason, david, hch, akpm, jack, yanmin_zhang,
richard, damien.wyart
Jens Axboe wrote:
>> Just to make sure I understood you correctly. I assume my original
>> patch is fine (because there is bdi_destroy()) and merge it to
>> ubifs tree.
>
> It needs to be:
>
> err = bdi_register(&c->bdi, NULL, "ubifs");
> if (err)
> goto out_bdi;
>
> so you hit the bdi_destroy() for that failure, not goto out_close;
> Otherwise it was fine.
Did this, also added a
Reviewed-by: Jens Axboe <jens.axboe@oracle.com>
http://git.infradead.org/ubifs-2.6.git?a=commit;h=813fdc16ad591e79d0c1b88d31970dcd1c2aa3f1
Thanks.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
@ 2009-06-03 7:59 ` Artem Bityutskiy
0 siblings, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-06-03 7:59 UTC (permalink / raw)
To: Jens Axboe
Cc: Artem Bityutskiy, Peter Zijlstra, linux-kernel, linux-fsdevel,
tytso, chris.mason, david, hch, akpm, jack, yanmin_zhang,
richard, damien.wyart
Jens Axboe wrote:
>> Just to make sure I understood you correctly. I assume my original
>> patch is fine (because there is bdi_destroy()) and merge it to
>> ubifs tree.
>
> It needs to be:
>
> err = bdi_register(&c->bdi, NULL, "ubifs");
> if (err)
> goto out_bdi;
>
> so you hit the bdi_destroy() for that failure, not goto out_close;
> Otherwise it was fine.
Did this, also added a
Reviewed-by: Jens Axboe <jens.axboe@oracle.com>
http://git.infradead.org/ubifs-2.6.git?a=commit;h=813fdc16ad591e79d0c1b88d31970dcd1c2aa3f1
Thanks.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-03 7:59 ` Artem Bityutskiy
(?)
@ 2009-06-03 8:07 ` Jens Axboe
-1 siblings, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-06-03 8:07 UTC (permalink / raw)
To: Artem Bityutskiy
Cc: Peter Zijlstra, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, akpm, jack, yanmin_zhang, richard, damien.wyart
On Wed, Jun 03 2009, Artem Bityutskiy wrote:
> Jens Axboe wrote:
>>> Just to make sure I understood you correctly. I assume my original
>>> patch is fine (because there is bdi_destroy()) and merge it to
>>> ubifs tree.
>>
>> It needs to be:
>>
>> err = bdi_register(&c->bdi, NULL, "ubifs");
>> if (err)
>> goto out_bdi;
>>
>> so you hit the bdi_destroy() for that failure, not goto out_close;
>> Otherwise it was fine.
>
> Did this, also added a
> Reviewed-by: Jens Axboe <jens.axboe@oracle.com>
>
> http://git.infradead.org/ubifs-2.6.git?a=commit;h=813fdc16ad591e79d0c1b88d31970dcd1c2aa3f1
Looks good!
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-29 17:09 ` Jens Axboe
@ 2009-06-03 8:11 ` Artem Bityutskiy
0 siblings, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-06-03 8:11 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
ext Jens Axboe wrote:
> On Fri, May 29 2009, Artem Bityutskiy wrote:
>> Artem Bityutskiy wrote:
>>> Jens Axboe wrote:
>>>> Hi,
>>>>
>>>> Here's the 9th version of the writeback patches. Changes since v8:
>>>>
>>>> - Fix a bdi_work on-stack allocation hang. I hope this fixes Ted's
>>>> issue.
>>>> - Get rid of the explicit wait queues, we can just use wake_up_process()
>>>> since it's just for that one task.
>>>> - Add separate "sync_supers" thread that makes sure that the dirty
>>>> super blocks get written. We cannot safely do this from
>>>> bdi_forker_task(),
>>>> as that risks deadlocking on ->s_umount. Artem, I implemented this
>>>> by doing the wake ups from a timer so that it would be easier for you
>>>> to just deactivate the timer when there are no super blocks.
>>>>
>>>> For ease of patching, I've put the full diff here:
>>>>
>>>> http://kernel.dk/writeback-v9.patch
>>>>
>>>> and also stored this in a writeback-v9 branch that will not change,
>>>> you can pull that into Linus tree from here:
>>>>
>>>> git://git.kernel.dk/linux-2.6-block.git writeback-v9
>>> I'm working with the above branch. Got the following twice.
>>> Not sure what triggers this, probably if I do nothing and
>>> cpufreq starts doing its magic, this is triggered.
>>>
>>> And I'm not sure it has something to do with your changes,
>>> it is just that I saw this only with your tree. Please,
>>> ignore if this is not relevant.
>> Sorry, probably I shouldn't have reported this before looking
>> closer. I'll investigate this later and fine out whether it
>> is related to your work or not. Sorry for too early and probably
>> false alarm.
>
> No problem. If it does turn out to have some relation to the writeback
> stuff, let me know.
OK, I'm confirming that I observe this also with pure 2.6.30-rc7
as well.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
@ 2009-06-03 8:11 ` Artem Bityutskiy
0 siblings, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-06-03 8:11 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
ext Jens Axboe wrote:
> On Fri, May 29 2009, Artem Bityutskiy wrote:
>> Artem Bityutskiy wrote:
>>> Jens Axboe wrote:
>>>> Hi,
>>>>
>>>> Here's the 9th version of the writeback patches. Changes since v8:
>>>>
>>>> - Fix a bdi_work on-stack allocation hang. I hope this fixes Ted's
>>>> issue.
>>>> - Get rid of the explicit wait queues, we can just use wake_up_process()
>>>> since it's just for that one task.
>>>> - Add separate "sync_supers" thread that makes sure that the dirty
>>>> super blocks get written. We cannot safely do this from
>>>> bdi_forker_task(),
>>>> as that risks deadlocking on ->s_umount. Artem, I implemented this
>>>> by doing the wake ups from a timer so that it would be easier for you
>>>> to just deactivate the timer when there are no super blocks.
>>>>
>>>> For ease of patching, I've put the full diff here:
>>>>
>>>> http://kernel.dk/writeback-v9.patch
>>>>
>>>> and also stored this in a writeback-v9 branch that will not change,
>>>> you can pull that into Linus tree from here:
>>>>
>>>> git://git.kernel.dk/linux-2.6-block.git writeback-v9
>>> I'm working with the above branch. Got the following twice.
>>> Not sure what triggers this, probably if I do nothing and
>>> cpufreq starts doing its magic, this is triggered.
>>>
>>> And I'm not sure it has something to do with your changes,
>>> it is just that I saw this only with your tree. Please,
>>> ignore if this is not relevant.
>> Sorry, probably I shouldn't have reported this before looking
>> closer. I'll investigate this later and fine out whether it
>> is related to your work or not. Sorry for too early and probably
>> false alarm.
>
> No problem. If it does turn out to have some relation to the writeback
> stuff, let me know.
OK, I'm confirming that I observe this also with pure 2.6.30-rc7
as well.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
@ 2009-06-03 11:12 ` Artem Bityutskiy
2009-05-28 11:46 ` [PATCH 02/11] btrfs: properly register fs backing device Jens Axboe
` (15 subsequent siblings)
16 siblings, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-06-03 11:12 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
Jens Axboe wrote:
> Here's the 9th version of the writeback patches. Changes since v8:
>
> - Fix a bdi_work on-stack allocation hang. I hope this fixes Ted's
> issue.
> - Get rid of the explicit wait queues, we can just use wake_up_process()
> since it's just for that one task.
> - Add separate "sync_supers" thread that makes sure that the dirty
> super blocks get written. We cannot safely do this from bdi_forker_task(),
> as that risks deadlocking on ->s_umount. Artem, I implemented this
> by doing the wake ups from a timer so that it would be easier for you
> to just deactivate the timer when there are no super blocks.
I wonder if you would consider to work on top of the latest VFS changes:
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git for-next
For me the problem is that my original patches were created against
the VFS tree, and they do not apply nicely to your tree. So what I've
tried to do - I applied your patches on top of the VFS tree. But they
did not apply cleanly either. I'm currently working on merging them,
but I thought it is better to ask if you already did this.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
@ 2009-06-03 11:12 ` Artem Bityutskiy
0 siblings, 0 replies; 70+ messages in thread
From: Artem Bityutskiy @ 2009-06-03 11:12 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
Jens Axboe wrote:
> Here's the 9th version of the writeback patches. Changes since v8:
>
> - Fix a bdi_work on-stack allocation hang. I hope this fixes Ted's
> issue.
> - Get rid of the explicit wait queues, we can just use wake_up_process()
> since it's just for that one task.
> - Add separate "sync_supers" thread that makes sure that the dirty
> super blocks get written. We cannot safely do this from bdi_forker_task(),
> as that risks deadlocking on ->s_umount. Artem, I implemented this
> by doing the wake ups from a timer so that it would be easier for you
> to just deactivate the timer when there are no super blocks.
I wonder if you would consider to work on top of the latest VFS changes:
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git for-next
For me the problem is that my original patches were created against
the VFS tree, and they do not apply nicely to your tree. So what I've
tried to do - I applied your patches on top of the VFS tree. But they
did not apply cleanly either. I'm currently working on merging them,
but I thought it is better to ask if you already did this.
--
Best Regards,
Artem Bityutskiy (Артём Битюцкий)
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-03 11:12 ` Artem Bityutskiy
(?)
@ 2009-06-03 11:42 ` Jens Axboe
-1 siblings, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-06-03 11:42 UTC (permalink / raw)
To: Artem Bityutskiy
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
On Wed, Jun 03 2009, Artem Bityutskiy wrote:
> Jens Axboe wrote:
>> Here's the 9th version of the writeback patches. Changes since v8:
>>
>> - Fix a bdi_work on-stack allocation hang. I hope this fixes Ted's
>> issue.
>> - Get rid of the explicit wait queues, we can just use wake_up_process()
>> since it's just for that one task.
>> - Add separate "sync_supers" thread that makes sure that the dirty
>> super blocks get written. We cannot safely do this from bdi_forker_task(),
>> as that risks deadlocking on ->s_umount. Artem, I implemented this
>> by doing the wake ups from a timer so that it would be easier for you
>> to just deactivate the timer when there are no super blocks.
>
> I wonder if you would consider to work on top of the latest VFS changes:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git for-next
>
> For me the problem is that my original patches were created against
> the VFS tree, and they do not apply nicely to your tree. So what I've
> tried to do - I applied your patches on top of the VFS tree. But they
> did not apply cleanly either. I'm currently working on merging them,
> but I thought it is better to ask if you already did this.
Al, what's the time frame for submitting these vfs changes? I'm assuming
2.6.31 since it's called for-next. If that is the case, then it would be
for the best if I rebase on top of those.
So, to answer your other ping mail as well, my writeback changes will
then be based on top off the vfs tree and then your 0-17 patches. Then
we should have a joint base to work from.
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
` (15 preceding siblings ...)
2009-06-03 11:12 ` Artem Bityutskiy
@ 2009-06-04 15:20 ` Frederic Weisbecker
2009-06-04 19:07 ` Andrew Morton
2009-06-05 1:14 ` Zhang, Yanmin
16 siblings, 2 replies; 70+ messages in thread
From: Frederic Weisbecker @ 2009-06-04 15:20 UTC (permalink / raw)
To: Jens Axboe
Cc: linux-kernel, linux-fsdevel, tytso, chris.mason, david, hch,
akpm, jack, yanmin_zhang, richard, damien.wyart
[-- Attachment #1: Type: text/plain, Size: 3380 bytes --]
Hi,
On Thu, May 28, 2009 at 01:46:33PM +0200, Jens Axboe wrote:
> Hi,
>
> Here's the 9th version of the writeback patches. Changes since v8:
>
> - Fix a bdi_work on-stack allocation hang. I hope this fixes Ted's
> issue.
> - Get rid of the explicit wait queues, we can just use wake_up_process()
> since it's just for that one task.
> - Add separate "sync_supers" thread that makes sure that the dirty
> super blocks get written. We cannot safely do this from bdi_forker_task(),
> as that risks deadlocking on ->s_umount. Artem, I implemented this
> by doing the wake ups from a timer so that it would be easier for you
> to just deactivate the timer when there are no super blocks.
>
> For ease of patching, I've put the full diff here:
>
> http://kernel.dk/writeback-v9.patch
>
> and also stored this in a writeback-v9 branch that will not change,
> you can pull that into Linus tree from here:
>
> git://git.kernel.dk/linux-2.6-block.git writeback-v9
>
> block/blk-core.c | 1 +
> drivers/block/aoe/aoeblk.c | 1 +
> drivers/char/mem.c | 1 +
> fs/btrfs/disk-io.c | 24 +-
> fs/buffer.c | 2 +-
> fs/char_dev.c | 1 +
> fs/configfs/inode.c | 1 +
> fs/fs-writeback.c | 804 ++++++++++++++++++++++++++++-------
> fs/fuse/inode.c | 1 +
> fs/hugetlbfs/inode.c | 1 +
> fs/nfs/client.c | 1 +
> fs/ntfs/super.c | 33 +--
> fs/ocfs2/dlm/dlmfs.c | 1 +
> fs/ramfs/inode.c | 1 +
> fs/super.c | 3 -
> fs/sync.c | 2 +-
> fs/sysfs/inode.c | 1 +
> fs/ubifs/super.c | 1 +
> include/linux/backing-dev.h | 73 ++++-
> include/linux/fs.h | 11 +-
> include/linux/writeback.h | 15 +-
> kernel/cgroup.c | 1 +
> mm/Makefile | 2 +-
> mm/backing-dev.c | 518 ++++++++++++++++++++++-
> mm/page-writeback.c | 151 +------
> mm/pdflush.c | 269 ------------
> mm/swap_state.c | 1 +
> mm/vmscan.c | 2 +-
> 28 files changed, 1286 insertions(+), 637 deletions(-)
>
I've just tested it on UP in a single disk.
I've run two parallels dbench tests on two partitions and
tried it with this patch and without.
I used 30 proc each during 600 secs.
You can see the result in attachment.
And also there:
http://kernel.org/pub/linux/kernel/people/frederic/dbench.pdf
http://kernel.org/pub/linux/kernel/people/frederic/bdi-writeback-hda1.log
http://kernel.org/pub/linux/kernel/people/frederic/bdi-writeback-hda3.log
http://kernel.org/pub/linux/kernel/people/frederic/pdflush-hda1.log
http://kernel.org/pub/linux/kernel/people/frederic/pdflush-hda3.log
As you can see, bdi writeback is faster than pdflush on hda1 and slower
on hda3. But, well that's not the point.
What I can observe here is the difference on the standard deviation
for the rate between two parallel writers on a same device (but
two different partitions, then superblocks).
With pdflush, the distributed rate is much better balanced than
with bdi writeback in a single device.
I'm not sure why. Is there something in these patches that makes
several bdi flusher threads for a same bdi not well balanced
between them?
Frederic.
[-- Attachment #2: dbench.pdf --]
[-- Type: application/pdf, Size: 21887 bytes --]
[-- Attachment #3: bdi-writeback-hda1.log --]
[-- Type: text/plain, Size: 26598 bytes --]
dbench version 3.04 - Copyright Andrew Tridgell 1999-2004
Running for 600 seconds with load '/usr/share/dbench/client.txt' and minimum warmup 120 secs
30 clients started
30 48 47.25 MB/sec warmup 1 sec
30 48 25.73 MB/sec warmup 2 sec
30 48 17.67 MB/sec warmup 3 sec
30 51 14.77 MB/sec warmup 4 sec
30 55 5.01 MB/sec warmup 14 sec
30 57 2.47 MB/sec warmup 29 sec
30 60 2.29 MB/sec warmup 33 sec
30 61 1.83 MB/sec warmup 42 sec
30 66 1.90 MB/sec warmup 45 sec
30 66 1.86 MB/sec warmup 46 sec
30 66 1.82 MB/sec warmup 47 sec
30 66 1.78 MB/sec warmup 48 sec
30 94 2.43 MB/sec warmup 52 sec
30 94 2.39 MB/sec warmup 53 sec
30 99 2.08 MB/sec warmup 64 sec
30 126 2.40 MB/sec warmup 68 sec
30 126 2.37 MB/sec warmup 69 sec
30 171 2.49 MB/sec warmup 84 sec
30 171 2.46 MB/sec warmup 85 sec
30 171 2.43 MB/sec warmup 86 sec
30 171 2.40 MB/sec warmup 87 sec
30 179 2.23 MB/sec warmup 97 sec
30 186 2.15 MB/sec warmup 104 sec
30 186 2.02 MB/sec warmup 111 sec
30 189 1.93 MB/sec warmup 117 sec
30 256 2.44 MB/sec warmup 119 sec
30 256 2.42 MB/sec warmup 120 sec
30 261 0.00 MB/sec execute 1 sec
30 261 0.00 MB/sec execute 2 sec
30 272 0.45 MB/sec execute 23 sec
30 299 1.20 MB/sec execute 30 sec
30 299 1.16 MB/sec execute 31 sec
30 299 1.12 MB/sec execute 32 sec
30 299 1.09 MB/sec execute 33 sec
30 299 1.06 MB/sec execute 34 sec
30 335 1.83 MB/sec execute 38 sec
30 350 2.14 MB/sec execute 39 sec
30 418 3.10 MB/sec execute 48 sec
30 430 3.22 MB/sec execute 49 sec
30 430 3.16 MB/sec execute 50 sec
30 493 3.61 MB/sec execute 59 sec
30 499 3.61 MB/sec execute 60 sec
30 617 3.94 MB/sec execute 67 sec
30 720 4.14 MB/sec execute 68 sec
30 839 4.46 MB/sec execute 69 sec
30 1171 5.26 MB/sec execute 70 sec
30 1185 5.25 MB/sec execute 71 sec
30 1185 5.17 MB/sec execute 72 sec
30 1493 5.44 MB/sec execute 81 sec
30 1493 5.37 MB/sec execute 82 sec
30 1505 5.33 MB/sec execute 83 sec
30 1559 5.39 MB/sec execute 84 sec
30 1563 5.33 MB/sec execute 85 sec
30 1646 5.43 MB/sec execute 86 sec
30 1677 5.43 MB/sec execute 87 sec
30 2030 6.06 MB/sec execute 88 sec
30 2381 6.61 MB/sec execute 89 sec
30 2738 7.17 MB/sec execute 90 sec
30 3127 7.72 MB/sec execute 91 sec
30 3451 8.32 MB/sec execute 92 sec
30 3837 8.78 MB/sec execute 93 sec
30 4188 9.35 MB/sec execute 94 sec
30 4521 9.80 MB/sec execute 95 sec
30 4903 10.34 MB/sec execute 96 sec
30 5267 10.88 MB/sec execute 97 sec
30 5376 11.02 MB/sec execute 98 sec
30 5587 11.17 MB/sec execute 99 sec
30 5868 11.52 MB/sec execute 100 sec
30 6039 11.68 MB/sec execute 101 sec
30 6047 11.57 MB/sec execute 102 sec
30 6078 11.46 MB/sec execute 103 sec
30 6170 11.46 MB/sec execute 104 sec
30 6224 11.42 MB/sec execute 105 sec
30 6374 11.56 MB/sec execute 106 sec
30 6601 11.84 MB/sec execute 107 sec
30 6839 12.08 MB/sec execute 108 sec
30 7078 12.32 MB/sec execute 109 sec
30 7320 12.56 MB/sec execute 110 sec
30 7634 12.88 MB/sec execute 111 sec
30 8001 13.31 MB/sec execute 112 sec
30 8290 13.69 MB/sec execute 113 sec
30 8638 14.13 MB/sec execute 114 sec
30 9038 14.41 MB/sec execute 115 sec
30 9367 14.86 MB/sec execute 116 sec
30 9697 15.19 MB/sec execute 117 sec
30 10052 15.52 MB/sec execute 118 sec
30 10412 15.91 MB/sec execute 119 sec
30 10613 16.01 MB/sec execute 120 sec
30 10640 15.93 MB/sec execute 121 sec
30 10838 16.08 MB/sec execute 122 sec
30 11211 16.38 MB/sec execute 123 sec
30 11558 16.69 MB/sec execute 124 sec
30 11899 17.09 MB/sec execute 125 sec
30 12267 17.39 MB/sec execute 126 sec
30 12619 17.67 MB/sec execute 127 sec
30 12891 17.94 MB/sec execute 128 sec
30 13072 18.01 MB/sec execute 129 sec
30 13263 18.05 MB/sec execute 130 sec
30 13425 18.16 MB/sec execute 131 sec
30 13572 18.23 MB/sec execute 132 sec
30 13761 18.29 MB/sec execute 133 sec
30 13901 18.29 MB/sec execute 134 sec
30 14035 18.36 MB/sec execute 135 sec
30 14129 18.37 MB/sec execute 136 sec
30 14212 18.31 MB/sec execute 137 sec
30 14279 18.26 MB/sec execute 138 sec
30 14374 18.20 MB/sec execute 139 sec
30 14460 18.12 MB/sec execute 140 sec
30 14552 18.14 MB/sec execute 141 sec
30 14565 18.02 MB/sec execute 142 sec
30 14567 17.90 MB/sec execute 143 sec
30 14567 17.77 MB/sec execute 144 sec
30 14567 17.65 MB/sec execute 145 sec
30 14567 17.53 MB/sec execute 146 sec
30 14567 17.41 MB/sec execute 147 sec
30 14728 17.51 MB/sec execute 148 sec
30 14957 17.69 MB/sec execute 149 sec
30 15027 17.61 MB/sec execute 150 sec
30 15378 17.90 MB/sec execute 151 sec
30 15742 18.13 MB/sec execute 152 sec
30 16100 18.42 MB/sec execute 153 sec
30 16466 18.68 MB/sec execute 154 sec
30 16790 18.93 MB/sec execute 155 sec
30 17055 19.08 MB/sec execute 156 sec
30 17138 19.06 MB/sec execute 157 sec
30 17230 19.03 MB/sec execute 158 sec
30 17332 18.99 MB/sec execute 159 sec
30 17522 19.07 MB/sec execute 160 sec
30 17736 19.16 MB/sec execute 161 sec
30 17840 18.18 MB/sec execute 171 sec
30 17840 18.07 MB/sec execute 172 sec
30 17851 17.97 MB/sec execute 173 sec
30 17851 17.87 MB/sec execute 174 sec
30 17851 17.77 MB/sec execute 175 sec
30 17852 17.67 MB/sec execute 176 sec
30 17858 17.58 MB/sec execute 177 sec
30 17858 17.48 MB/sec execute 178 sec
30 17899 17.43 MB/sec execute 179 sec
30 18946 17.39 MB/sec execute 189 sec
30 18946 17.29 MB/sec execute 190 sec
30 19249 17.30 MB/sec execute 192 sec
30 19415 17.35 MB/sec execute 193 sec
30 19517 17.33 MB/sec execute 194 sec
30 19589 17.31 MB/sec execute 195 sec
30 19669 17.28 MB/sec execute 196 sec
30 19709 17.24 MB/sec execute 197 sec
30 19773 17.19 MB/sec execute 198 sec
30 19847 17.18 MB/sec execute 199 sec
30 19947 17.14 MB/sec execute 200 sec
30 20045 17.14 MB/sec execute 201 sec
30 20136 17.14 MB/sec execute 202 sec
30 20203 17.10 MB/sec execute 203 sec
30 20294 17.10 MB/sec execute 204 sec
30 20316 17.05 MB/sec execute 205 sec
30 20422 17.03 MB/sec execute 206 sec
30 20470 16.97 MB/sec execute 207 sec
30 20480 16.90 MB/sec execute 208 sec
30 20480 16.82 MB/sec execute 209 sec
30 20480 16.74 MB/sec execute 210 sec
30 20480 16.66 MB/sec execute 211 sec
30 20526 16.62 MB/sec execute 212 sec
30 20555 16.56 MB/sec execute 213 sec
30 20555 16.48 MB/sec execute 214 sec
30 20768 16.56 MB/sec execute 215 sec
30 21073 16.75 MB/sec execute 216 sec
30 21427 16.92 MB/sec execute 217 sec
30 21778 17.10 MB/sec execute 218 sec
30 22150 17.27 MB/sec execute 219 sec
30 22494 17.47 MB/sec execute 220 sec
30 22837 17.63 MB/sec execute 221 sec
30 23200 17.81 MB/sec execute 222 sec
30 23552 18.00 MB/sec execute 223 sec
30 23886 18.17 MB/sec execute 224 sec
30 24037 18.21 MB/sec execute 225 sec
30 24060 18.14 MB/sec execute 226 sec
30 24293 17.72 MB/sec execute 234 sec
30 24293 17.64 MB/sec execute 235 sec
30 24293 17.57 MB/sec execute 236 sec
30 24321 17.53 MB/sec execute 237 sec
30 24547 17.58 MB/sec execute 238 sec
30 24602 17.56 MB/sec execute 239 sec
30 24950 17.71 MB/sec execute 240 sec
30 25300 17.86 MB/sec execute 241 sec
30 25654 18.03 MB/sec execute 242 sec
30 26001 18.20 MB/sec execute 243 sec
30 26340 18.34 MB/sec execute 244 sec
30 27206 18.57 MB/sec execute 248 sec
30 27288 18.56 MB/sec execute 249 sec
30 27288 18.49 MB/sec execute 250 sec
30 27290 18.41 MB/sec execute 251 sec
30 27290 18.34 MB/sec execute 252 sec
30 27347 18.32 MB/sec execute 253 sec
30 27347 18.25 MB/sec execute 254 sec
30 27347 18.18 MB/sec execute 255 sec
30 27454 18.17 MB/sec execute 256 sec
30 27728 18.28 MB/sec execute 257 sec
30 28097 18.43 MB/sec execute 258 sec
30 28464 18.59 MB/sec execute 259 sec
30 28795 18.73 MB/sec execute 260 sec
30 29002 18.83 MB/sec execute 261 sec
30 29209 18.84 MB/sec execute 262 sec
30 29428 18.88 MB/sec execute 263 sec
30 29577 18.92 MB/sec execute 264 sec
30 29725 18.94 MB/sec execute 265 sec
30 29802 18.93 MB/sec execute 266 sec
30 29835 18.88 MB/sec execute 267 sec
30 29938 18.88 MB/sec execute 268 sec
30 30150 18.93 MB/sec execute 269 sec
30 30487 19.08 MB/sec execute 270 sec
30 30853 19.22 MB/sec execute 271 sec
30 31222 19.35 MB/sec execute 272 sec
30 31579 19.49 MB/sec execute 273 sec
30 31936 19.64 MB/sec execute 274 sec
30 32085 19.67 MB/sec execute 275 sec
30 32232 19.68 MB/sec execute 276 sec
30 32399 19.71 MB/sec execute 277 sec
30 32513 19.70 MB/sec execute 278 sec
30 33554 19.35 MB/sec execute 291 sec
30 33577 19.29 MB/sec execute 292 sec
30 33577 19.23 MB/sec execute 293 sec
30 33577 19.16 MB/sec execute 294 sec
30 33577 19.10 MB/sec execute 295 sec
30 33577 19.03 MB/sec execute 296 sec
30 33577 18.97 MB/sec execute 297 sec
30 33577 18.91 MB/sec execute 298 sec
30 33577 18.84 MB/sec execute 299 sec
30 33577 18.78 MB/sec execute 300 sec
30 33577 18.72 MB/sec execute 301 sec
30 33588 18.66 MB/sec execute 302 sec
30 33667 18.64 MB/sec execute 303 sec
30 33843 18.66 MB/sec execute 304 sec
30 33872 18.62 MB/sec execute 305 sec
30 34209 18.76 MB/sec execute 306 sec
30 34558 18.88 MB/sec execute 307 sec
30 34883 19.00 MB/sec execute 308 sec
30 35233 19.13 MB/sec execute 309 sec
30 35571 19.24 MB/sec execute 310 sec
30 35939 19.36 MB/sec execute 311 sec
30 36268 19.52 MB/sec execute 312 sec
30 36588 19.65 MB/sec execute 313 sec
30 36887 19.70 MB/sec execute 314 sec
30 36887 19.64 MB/sec execute 315 sec
30 36889 19.58 MB/sec execute 316 sec
30 37176 19.68 MB/sec execute 317 sec
30 37289 19.64 MB/sec execute 318 sec
30 37321 19.59 MB/sec execute 319 sec
30 37452 19.58 MB/sec execute 320 sec
30 37677 19.62 MB/sec execute 321 sec
30 38025 19.74 MB/sec execute 322 sec
30 38379 19.86 MB/sec execute 323 sec
30 38741 19.98 MB/sec execute 324 sec
30 39109 20.09 MB/sec execute 325 sec
30 39465 20.22 MB/sec execute 326 sec
30 39831 20.33 MB/sec execute 327 sec
30 40194 20.45 MB/sec execute 328 sec
30 40530 20.54 MB/sec execute 329 sec
30 40741 20.59 MB/sec execute 330 sec
30 40882 20.59 MB/sec execute 331 sec
30 40967 20.58 MB/sec execute 332 sec
30 41068 20.58 MB/sec execute 333 sec
30 41191 20.57 MB/sec execute 334 sec
30 41249 20.53 MB/sec execute 335 sec
30 41249 20.47 MB/sec execute 336 sec
30 41249 20.40 MB/sec execute 337 sec
30 41249 20.34 MB/sec execute 338 sec
30 41249 20.28 MB/sec execute 339 sec
30 41261 20.24 MB/sec execute 340 sec
30 41261 20.18 MB/sec execute 341 sec
30 41262 20.12 MB/sec execute 342 sec
30 41279 20.06 MB/sec execute 343 sec
30 41375 20.00 MB/sec execute 345 sec
30 41375 19.94 MB/sec execute 346 sec
30 41416 19.90 MB/sec execute 347 sec
30 41704 19.98 MB/sec execute 348 sec
30 42073 20.10 MB/sec execute 349 sec
30 42437 20.21 MB/sec execute 350 sec
30 42788 20.31 MB/sec execute 351 sec
30 43159 20.42 MB/sec execute 352 sec
30 43528 20.53 MB/sec execute 353 sec
30 43878 20.64 MB/sec execute 354 sec
30 44254 20.73 MB/sec execute 355 sec
30 44585 20.85 MB/sec execute 356 sec
30 44944 20.94 MB/sec execute 357 sec
30 45246 21.01 MB/sec execute 358 sec
30 45453 21.06 MB/sec execute 359 sec
30 45662 21.12 MB/sec execute 360 sec
30 45873 21.15 MB/sec execute 361 sec
30 46057 21.18 MB/sec execute 362 sec
30 46289 21.20 MB/sec execute 363 sec
30 46469 21.22 MB/sec execute 364 sec
30 46611 21.24 MB/sec execute 365 sec
30 46719 21.25 MB/sec execute 366 sec
30 46869 21.22 MB/sec execute 367 sec
30 46898 21.17 MB/sec execute 368 sec
30 46930 21.12 MB/sec execute 369 sec
30 46960 21.08 MB/sec execute 370 sec
30 47021 21.06 MB/sec execute 371 sec
30 47043 21.01 MB/sec execute 372 sec
30 47166 21.00 MB/sec execute 373 sec
30 47219 20.97 MB/sec execute 374 sec
30 47219 20.91 MB/sec execute 375 sec
30 47219 20.86 MB/sec execute 376 sec
30 47219 20.80 MB/sec execute 377 sec
30 47219 20.75 MB/sec execute 378 sec
30 47219 20.69 MB/sec execute 379 sec
30 47219 20.64 MB/sec execute 380 sec
30 47245 20.60 MB/sec execute 381 sec
30 47296 20.56 MB/sec execute 382 sec
30 47461 20.57 MB/sec execute 383 sec
30 47678 20.61 MB/sec execute 384 sec
30 48044 20.71 MB/sec execute 385 sec
30 48370 20.82 MB/sec execute 386 sec
30 48993 20.63 MB/sec execute 395 sec
30 49009 20.58 MB/sec execute 396 sec
30 49075 20.55 MB/sec execute 397 sec
30 49075 20.50 MB/sec execute 398 sec
30 49075 20.45 MB/sec execute 399 sec
30 49075 20.40 MB/sec execute 400 sec
30 49075 20.35 MB/sec execute 401 sec
30 49075 20.30 MB/sec execute 402 sec
30 49274 20.33 MB/sec execute 403 sec
30 49623 20.43 MB/sec execute 404 sec
30 49993 20.51 MB/sec execute 405 sec
30 50324 20.62 MB/sec execute 406 sec
30 50700 20.69 MB/sec execute 407 sec
30 50845 20.70 MB/sec execute 408 sec
30 52557 20.77 MB/sec execute 420 sec
30 52924 20.85 MB/sec execute 421 sec
30 53287 20.94 MB/sec execute 422 sec
30 53631 21.02 MB/sec execute 423 sec
30 53997 21.12 MB/sec execute 424 sec
30 54316 21.19 MB/sec execute 425 sec
30 54659 21.27 MB/sec execute 426 sec
30 54845 21.31 MB/sec execute 427 sec
30 55031 21.32 MB/sec execute 428 sec
30 55175 21.33 MB/sec execute 429 sec
30 55317 21.34 MB/sec execute 430 sec
30 55437 21.33 MB/sec execute 431 sec
30 55518 21.30 MB/sec execute 432 sec
30 55518 21.26 MB/sec execute 433 sec
30 55633 21.25 MB/sec execute 434 sec
30 55734 21.24 MB/sec execute 435 sec
30 55757 21.20 MB/sec execute 436 sec
30 55780 21.16 MB/sec execute 437 sec
30 55862 21.14 MB/sec execute 438 sec
30 56199 21.22 MB/sec execute 439 sec
30 56559 21.30 MB/sec execute 440 sec
30 56920 21.39 MB/sec execute 441 sec
30 57279 21.47 MB/sec execute 442 sec
30 57642 21.55 MB/sec execute 443 sec
30 58017 21.63 MB/sec execute 444 sec
30 58374 21.72 MB/sec execute 445 sec
30 58736 21.78 MB/sec execute 446 sec
30 59070 21.86 MB/sec execute 447 sec
30 59434 21.95 MB/sec execute 448 sec
30 59619 21.98 MB/sec execute 449 sec
30 59654 21.94 MB/sec execute 450 sec
30 59983 22.05 MB/sec execute 451 sec
30 60218 22.08 MB/sec execute 452 sec
30 60495 22.13 MB/sec execute 453 sec
30 60506 22.08 MB/sec execute 454 sec
30 60584 22.06 MB/sec execute 455 sec
30 60662 22.03 MB/sec execute 456 sec
30 60854 22.02 MB/sec execute 457 sec
30 61212 22.09 MB/sec execute 458 sec
30 61523 22.16 MB/sec execute 459 sec
30 61533 22.11 MB/sec execute 460 sec
30 61536 22.06 MB/sec execute 461 sec
30 61537 22.01 MB/sec execute 462 sec
30 61538 21.97 MB/sec execute 463 sec
30 61550 21.93 MB/sec execute 464 sec
30 61550 21.88 MB/sec execute 465 sec
30 61555 21.84 MB/sec execute 466 sec
30 61555 21.79 MB/sec execute 467 sec
30 61555 21.74 MB/sec execute 468 sec
30 61556 21.70 MB/sec execute 469 sec
30 61663 21.69 MB/sec execute 470 sec
30 62013 21.76 MB/sec execute 471 sec
30 62299 21.87 MB/sec execute 472 sec
30 62632 21.95 MB/sec execute 473 sec
30 62971 22.00 MB/sec execute 474 sec
30 63260 22.05 MB/sec execute 475 sec
30 63646 21.86 MB/sec execute 480 sec
30 63711 21.83 MB/sec execute 481 sec
30 63711 21.79 MB/sec execute 482 sec
30 63825 21.78 MB/sec execute 483 sec
30 64186 21.86 MB/sec execute 484 sec
30 64537 21.93 MB/sec execute 485 sec
30 64897 22.00 MB/sec execute 486 sec
30 65243 22.09 MB/sec execute 487 sec
30 65600 22.15 MB/sec execute 488 sec
30 65961 22.23 MB/sec execute 489 sec
30 66314 22.30 MB/sec execute 490 sec
30 66410 22.29 MB/sec execute 491 sec
30 66662 22.32 MB/sec execute 492 sec
30 66870 22.35 MB/sec execute 493 sec
30 67106 22.39 MB/sec execute 494 sec
30 67305 22.40 MB/sec execute 495 sec
30 67475 22.41 MB/sec execute 496 sec
30 67482 22.37 MB/sec execute 497 sec
30 67520 22.33 MB/sec execute 498 sec
30 67530 22.29 MB/sec execute 499 sec
30 67623 22.27 MB/sec execute 500 sec
30 67679 22.24 MB/sec execute 501 sec
30 67768 22.22 MB/sec execute 502 sec
30 68060 22.28 MB/sec execute 503 sec
30 68262 22.31 MB/sec execute 504 sec
30 68335 22.28 MB/sec execute 505 sec
30 68346 22.24 MB/sec execute 506 sec
30 68395 22.21 MB/sec execute 507 sec
30 68438 22.18 MB/sec execute 508 sec
30 68440 22.14 MB/sec execute 509 sec
30 68600 22.15 MB/sec execute 510 sec
30 68952 22.22 MB/sec execute 511 sec
30 69262 22.27 MB/sec execute 512 sec
30 69504 22.31 MB/sec execute 513 sec
30 69774 22.35 MB/sec execute 514 sec
30 70182 22.41 MB/sec execute 515 sec
30 70510 22.50 MB/sec execute 516 sec
30 70834 22.58 MB/sec execute 517 sec
30 71147 22.64 MB/sec execute 518 sec
30 71177 22.61 MB/sec execute 519 sec
30 71337 22.61 MB/sec execute 520 sec
30 71354 22.57 MB/sec execute 521 sec
30 71354 22.53 MB/sec execute 522 sec
30 71372 22.49 MB/sec execute 523 sec
30 71372 22.44 MB/sec execute 524 sec
30 71483 22.44 MB/sec execute 525 sec
30 71641 22.44 MB/sec execute 526 sec
30 71823 22.46 MB/sec execute 527 sec
30 72045 22.47 MB/sec execute 528 sec
30 72211 22.48 MB/sec execute 529 sec
30 72417 22.50 MB/sec execute 530 sec
30 72778 22.57 MB/sec execute 531 sec
30 73142 22.64 MB/sec execute 532 sec
30 73511 22.70 MB/sec execute 533 sec
30 73572 22.67 MB/sec execute 534 sec
30 73671 22.66 MB/sec execute 535 sec
30 73909 22.63 MB/sec execute 538 sec
30 74121 22.68 MB/sec execute 539 sec
30 74351 22.72 MB/sec execute 540 sec
30 74443 22.69 MB/sec execute 541 sec
30 74453 22.65 MB/sec execute 542 sec
30 74532 22.62 MB/sec execute 543 sec
30 74612 22.60 MB/sec execute 544 sec
30 74721 22.50 MB/sec execute 547 sec
30 74721 22.46 MB/sec execute 548 sec
30 74735 22.42 MB/sec execute 549 sec
30 74735 22.38 MB/sec execute 550 sec
30 74750 22.34 MB/sec execute 551 sec
30 74903 22.34 MB/sec execute 552 sec
30 75279 22.41 MB/sec execute 553 sec
30 75573 22.46 MB/sec execute 554 sec
30 75873 22.50 MB/sec execute 555 sec
30 76203 22.57 MB/sec execute 556 sec
30 76562 22.64 MB/sec execute 557 sec
30 76899 22.68 MB/sec execute 558 sec
30 77253 22.74 MB/sec execute 559 sec
30 77614 22.80 MB/sec execute 560 sec
30 77942 22.87 MB/sec execute 561 sec
30 78282 22.94 MB/sec execute 562 sec
30 78566 22.99 MB/sec execute 563 sec
30 78917 23.05 MB/sec execute 564 sec
30 79191 23.08 MB/sec execute 565 sec
30 79232 23.04 MB/sec execute 566 sec
30 79232 23.00 MB/sec execute 567 sec
30 79239 22.96 MB/sec execute 568 sec
30 79239 22.92 MB/sec execute 569 sec
30 79593 22.97 MB/sec execute 570 sec
30 79954 23.03 MB/sec execute 571 sec
30 80319 23.10 MB/sec execute 572 sec
30 80680 23.16 MB/sec execute 573 sec
30 81044 23.22 MB/sec execute 574 sec
30 81404 23.28 MB/sec execute 575 sec
30 81765 23.34 MB/sec execute 576 sec
30 82122 23.41 MB/sec execute 577 sec
30 82226 23.37 MB/sec execute 578 sec
30 82226 23.33 MB/sec execute 579 sec
30 82226 23.29 MB/sec execute 580 sec
30 82226 23.25 MB/sec execute 581 sec
30 82226 23.21 MB/sec execute 582 sec
30 82226 23.17 MB/sec execute 583 sec
30 82226 23.13 MB/sec execute 584 sec
30 82226 23.09 MB/sec execute 585 sec
30 82226 23.05 MB/sec execute 586 sec
30 82226 23.01 MB/sec execute 587 sec
30 82325 23.00 MB/sec execute 588 sec
30 82358 22.96 MB/sec execute 589 sec
30 82395 22.93 MB/sec execute 590 sec
30 82485 22.91 MB/sec execute 591 sec
30 82495 22.88 MB/sec execute 592 sec
30 82682 22.89 MB/sec execute 593 sec
30 83043 22.95 MB/sec execute 594 sec
30 83407 23.00 MB/sec execute 595 sec
30 83772 23.06 MB/sec execute 596 sec
30 84137 23.12 MB/sec execute 597 sec
30 84392 23.15 MB/sec execute 598 sec
30 84523 23.16 MB/sec execute 599 sec
30 84692 23.16 MB/sec cleanup 600 sec
30 84692 23.12 MB/sec cleanup 601 sec
30 84692 23.08 MB/sec cleanup 602 sec
30 84692 23.05 MB/sec cleanup 603 sec
30 84692 23.01 MB/sec cleanup 604 sec
30 84692 22.97 MB/sec cleanup 605 sec
30 84692 22.93 MB/sec cleanup 606 sec
30 84692 22.87 MB/sec cleanup 608 sec
30 84692 22.83 MB/sec cleanup 609 sec
30 84692 22.79 MB/sec cleanup 610 sec
30 84692 22.76 MB/sec cleanup 611 sec
30 84692 22.72 MB/sec cleanup 612 sec
30 84692 22.68 MB/sec cleanup 613 sec
30 84692 22.64 MB/sec cleanup 614 sec
30 84692 22.61 MB/sec cleanup 615 sec
30 84692 22.57 MB/sec cleanup 616 sec
30 84692 22.53 MB/sec cleanup 617 sec
30 84692 22.50 MB/sec cleanup 618 sec
30 84692 22.46 MB/sec cleanup 619 sec
30 84692 22.44 MB/sec cleanup 620 sec
Throughput 23.1628 MB/sec 30 procs
[-- Attachment #4: bdi-writeback-hda3.log --]
[-- Type: text/plain, Size: 23517 bytes --]
dbench version 3.04 - Copyright Andrew Tridgell 1999-2004
Running for 600 seconds with load '/usr/share/dbench/client.txt' and minimum warmup 120 secs
30 clients started
30 13 0.00 MB/sec warmup 1 sec
30 14 0.00 MB/sec warmup 2 sec
30 14 0.00 MB/sec warmup 3 sec
30 14 0.00 MB/sec warmup 4 sec
30 20 2.14 MB/sec warmup 5 sec
30 26 3.29 MB/sec warmup 6 sec
30 30 3.86 MB/sec warmup 7 sec
30 34 4.19 MB/sec warmup 8 sec
30 35 3.93 MB/sec warmup 9 sec
30 44 4.58 MB/sec warmup 10 sec
30 49 4.88 MB/sec warmup 11 sec
30 52 4.88 MB/sec warmup 12 sec
30 60 5.19 MB/sec warmup 13 sec
30 68 5.65 MB/sec warmup 14 sec
30 68 5.28 MB/sec warmup 15 sec
30 83 5.94 MB/sec warmup 16 sec
30 92 6.43 MB/sec warmup 17 sec
30 96 6.35 MB/sec warmup 18 sec
30 101 6.29 MB/sec warmup 19 sec
30 112 6.62 MB/sec warmup 20 sec
30 116 6.60 MB/sec warmup 21 sec
30 123 6.69 MB/sec warmup 22 sec
30 123 6.41 MB/sec warmup 23 sec
30 142 6.92 MB/sec warmup 24 sec
30 149 7.02 MB/sec warmup 25 sec
30 158 7.13 MB/sec warmup 26 sec
30 169 7.29 MB/sec warmup 27 sec
30 177 7.33 MB/sec warmup 28 sec
30 190 7.51 MB/sec warmup 29 sec
30 225 7.94 MB/sec warmup 32 sec
30 237 8.04 MB/sec warmup 33 sec
30 250 8.16 MB/sec warmup 34 sec
30 254 8.04 MB/sec warmup 35 sec
30 269 8.25 MB/sec warmup 36 sec
30 275 8.24 MB/sec warmup 37 sec
30 288 8.35 MB/sec warmup 38 sec
30 301 8.45 MB/sec warmup 39 sec
30 315 8.55 MB/sec warmup 40 sec
30 339 8.58 MB/sec warmup 41 sec
30 347 8.57 MB/sec warmup 42 sec
30 356 8.57 MB/sec warmup 43 sec
30 374 8.62 MB/sec warmup 44 sec
30 535 9.14 MB/sec warmup 45 sec
30 606 9.09 MB/sec warmup 46 sec
30 631 8.68 MB/sec warmup 51 sec
30 631 8.52 MB/sec warmup 52 sec
30 671 8.56 MB/sec warmup 53 sec
30 949 9.24 MB/sec warmup 54 sec
30 1102 9.86 MB/sec warmup 55 sec
30 1196 9.99 MB/sec warmup 56 sec
30 1339 10.34 MB/sec warmup 57 sec
30 1460 10.46 MB/sec warmup 58 sec
30 1544 10.67 MB/sec warmup 59 sec
30 1593 10.76 MB/sec warmup 60 sec
30 1621 10.73 MB/sec warmup 61 sec
30 1644 10.73 MB/sec warmup 62 sec
30 1661 10.79 MB/sec warmup 63 sec
30 1689 10.85 MB/sec warmup 64 sec
30 1740 10.46 MB/sec warmup 68 sec
30 1761 10.36 MB/sec warmup 69 sec
30 1817 10.44 MB/sec warmup 70 sec
30 2138 10.95 MB/sec warmup 71 sec
30 2446 11.50 MB/sec warmup 72 sec
30 2532 11.76 MB/sec warmup 73 sec
30 2538 11.69 MB/sec warmup 74 sec
30 2572 11.58 MB/sec warmup 75 sec
30 2731 11.87 MB/sec warmup 76 sec
30 3047 12.36 MB/sec warmup 77 sec
30 3104 12.33 MB/sec warmup 78 sec
30 3107 12.15 MB/sec warmup 79 sec
30 3124 12.01 MB/sec warmup 80 sec
30 3143 11.96 MB/sec warmup 81 sec
30 3231 11.82 MB/sec warmup 84 sec
30 3231 11.69 MB/sec warmup 85 sec
30 3231 11.55 MB/sec warmup 86 sec
30 3231 11.42 MB/sec warmup 87 sec
30 3231 11.29 MB/sec warmup 88 sec
30 3428 11.29 MB/sec warmup 92 sec
30 3663 11.70 MB/sec warmup 93 sec
30 3785 11.85 MB/sec warmup 94 sec
30 3923 11.94 MB/sec warmup 95 sec
30 3937 11.86 MB/sec warmup 96 sec
30 3980 11.78 MB/sec warmup 97 sec
30 4298 12.17 MB/sec warmup 98 sec
30 4616 12.59 MB/sec warmup 99 sec
30 4905 12.95 MB/sec warmup 100 sec
30 5228 13.32 MB/sec warmup 101 sec
30 5513 13.58 MB/sec warmup 102 sec
30 5826 13.97 MB/sec warmup 103 sec
30 6116 14.29 MB/sec warmup 104 sec
30 6409 14.58 MB/sec warmup 105 sec
30 6674 14.89 MB/sec warmup 106 sec
30 6913 15.14 MB/sec warmup 107 sec
30 7043 15.19 MB/sec warmup 108 sec
30 7052 15.05 MB/sec warmup 109 sec
30 7052 14.91 MB/sec warmup 110 sec
30 7052 14.78 MB/sec warmup 111 sec
30 7052 14.65 MB/sec warmup 112 sec
30 7155 13.85 MB/sec warmup 120 sec
30 7658 49.68 MB/sec execute 1 sec
30 7972 51.10 MB/sec execute 2 sec
30 8295 51.55 MB/sec execute 3 sec
30 8611 51.36 MB/sec execute 4 sec
30 8916 51.52 MB/sec execute 5 sec
30 9246 51.10 MB/sec execute 6 sec
30 9544 51.43 MB/sec execute 7 sec
30 9848 50.67 MB/sec execute 8 sec
30 10068 49.09 MB/sec execute 9 sec
30 10386 46.18 MB/sec execute 11 sec
30 10695 47.04 MB/sec execute 12 sec
30 10875 45.13 MB/sec execute 13 sec
30 11177 45.71 MB/sec execute 14 sec
30 11484 46.09 MB/sec execute 15 sec
30 11807 46.10 MB/sec execute 16 sec
30 12103 46.36 MB/sec execute 17 sec
30 12405 46.66 MB/sec execute 18 sec
30 12735 46.73 MB/sec execute 19 sec
30 13569 46.71 MB/sec execute 22 sec
30 13878 47.14 MB/sec execute 23 sec
30 14149 47.36 MB/sec execute 24 sec
30 14442 47.34 MB/sec execute 25 sec
30 14759 47.20 MB/sec execute 26 sec
30 14961 46.57 MB/sec execute 27 sec
30 15008 43.28 MB/sec execute 29 sec
30 15234 33.64 MB/sec execute 38 sec
30 15234 32.77 MB/sec execute 39 sec
30 15395 26.75 MB/sec execute 49 sec
30 15421 26.24 MB/sec execute 50 sec
30 15421 25.73 MB/sec execute 51 sec
30 15559 22.94 MB/sec execute 58 sec
30 15578 22.63 MB/sec execute 59 sec
30 15665 22.51 MB/sec execute 60 sec
30 15833 22.59 MB/sec execute 61 sec
30 15833 22.22 MB/sec execute 62 sec
30 15833 21.87 MB/sec execute 63 sec
30 15920 19.76 MB/sec execute 71 sec
30 15923 19.49 MB/sec execute 72 sec
30 16229 19.88 MB/sec execute 73 sec
30 16549 20.32 MB/sec execute 74 sec
30 16806 20.58 MB/sec execute 75 sec
30 16935 20.60 MB/sec execute 76 sec
30 16942 20.34 MB/sec execute 77 sec
30 17042 20.27 MB/sec execute 78 sec
30 17125 20.14 MB/sec execute 79 sec
30 17160 19.95 MB/sec execute 80 sec
30 17201 19.81 MB/sec execute 81 sec
30 17201 19.57 MB/sec execute 82 sec
30 17212 19.35 MB/sec execute 83 sec
30 17228 19.14 MB/sec execute 84 sec
30 17376 19.22 MB/sec execute 85 sec
30 17412 19.06 MB/sec execute 86 sec
30 17624 17.30 MB/sec execute 97 sec
30 17734 17.33 MB/sec execute 98 sec
30 17817 17.32 MB/sec execute 99 sec
30 17872 17.24 MB/sec execute 100 sec
30 17912 17.09 MB/sec execute 101 sec
30 17920 16.94 MB/sec execute 102 sec
30 17922 16.76 MB/sec execute 103 sec
30 18159 16.94 MB/sec execute 104 sec
30 18298 16.98 MB/sec execute 105 sec
30 18405 16.98 MB/sec execute 106 sec
30 18512 17.00 MB/sec execute 107 sec
30 18618 17.00 MB/sec execute 108 sec
30 18715 16.98 MB/sec execute 109 sec
30 18758 15.53 MB/sec execute 119 sec
30 18758 14.94 MB/sec execute 124 sec
30 18758 14.82 MB/sec execute 125 sec
30 18758 14.70 MB/sec execute 126 sec
30 18851 14.78 MB/sec execute 127 sec
30 19019 14.87 MB/sec execute 128 sec
30 19143 14.93 MB/sec execute 129 sec
30 19301 15.00 MB/sec execute 130 sec
30 19458 15.02 MB/sec execute 131 sec
30 19636 15.10 MB/sec execute 132 sec
30 19814 15.25 MB/sec execute 133 sec
30 20000 15.36 MB/sec execute 134 sec
30 20210 15.49 MB/sec execute 135 sec
30 20419 15.64 MB/sec execute 136 sec
30 20664 15.81 MB/sec execute 137 sec
30 20910 15.95 MB/sec execute 138 sec
30 21142 16.11 MB/sec execute 139 sec
30 21377 16.29 MB/sec execute 140 sec
30 21487 16.28 MB/sec execute 141 sec
30 21516 16.20 MB/sec execute 142 sec
30 21566 16.14 MB/sec execute 143 sec
30 21752 16.24 MB/sec execute 144 sec
30 22045 16.45 MB/sec execute 145 sec
30 22285 15.60 MB/sec execute 155 sec
30 22506 15.74 MB/sec execute 156 sec
30 22740 15.79 MB/sec execute 157 sec
30 22761 15.71 MB/sec execute 158 sec
30 22761 15.61 MB/sec execute 159 sec
30 22807 15.57 MB/sec execute 160 sec
30 22854 15.56 MB/sec execute 161 sec
30 24444 15.92 MB/sec execute 174 sec
30 24744 16.05 MB/sec execute 175 sec
30 25033 16.29 MB/sec execute 176 sec
30 25233 16.37 MB/sec execute 177 sec
30 25413 15.58 MB/sec execute 188 sec
30 25413 15.50 MB/sec execute 189 sec
30 25510 15.50 MB/sec execute 190 sec
30 25575 15.46 MB/sec execute 191 sec
30 25780 15.58 MB/sec execute 192 sec
30 26020 15.66 MB/sec execute 193 sec
30 26228 15.77 MB/sec execute 194 sec
30 26496 15.89 MB/sec execute 195 sec
30 26758 16.04 MB/sec execute 196 sec
30 26976 16.16 MB/sec execute 197 sec
30 27206 16.25 MB/sec execute 198 sec
30 27417 16.35 MB/sec execute 199 sec
30 27651 16.42 MB/sec execute 200 sec
30 27875 16.53 MB/sec execute 201 sec
30 28118 16.68 MB/sec execute 202 sec
30 28283 16.71 MB/sec execute 203 sec
30 28285 16.63 MB/sec execute 204 sec
30 28299 16.55 MB/sec execute 205 sec
30 28381 16.53 MB/sec execute 206 sec
30 28429 16.49 MB/sec execute 207 sec
30 28484 16.46 MB/sec execute 208 sec
30 28563 16.45 MB/sec execute 209 sec
30 28563 16.38 MB/sec execute 210 sec
30 28563 16.30 MB/sec execute 211 sec
30 28710 15.47 MB/sec execute 224 sec
30 28740 15.42 MB/sec execute 225 sec
30 28756 15.37 MB/sec execute 226 sec
30 28808 15.33 MB/sec execute 227 sec
30 28993 15.36 MB/sec execute 228 sec
30 28993 15.29 MB/sec execute 229 sec
30 29136 15.32 MB/sec execute 230 sec
30 29136 15.26 MB/sec execute 231 sec
30 29136 15.19 MB/sec execute 232 sec
30 29136 15.12 MB/sec execute 233 sec
30 29136 15.06 MB/sec execute 234 sec
30 29136 15.00 MB/sec execute 235 sec
30 29273 14.34 MB/sec execute 248 sec
30 29292 14.31 MB/sec execute 249 sec
30 29295 14.25 MB/sec execute 250 sec
30 29327 14.21 MB/sec execute 251 sec
30 29568 14.31 MB/sec execute 252 sec
30 29837 14.43 MB/sec execute 253 sec
30 30141 14.56 MB/sec execute 254 sec
30 30309 14.62 MB/sec execute 255 sec
30 30346 14.58 MB/sec execute 256 sec
30 30347 14.51 MB/sec execute 257 sec
30 30349 14.45 MB/sec execute 258 sec
30 30423 14.46 MB/sec execute 259 sec
30 30542 14.49 MB/sec execute 260 sec
30 30656 14.51 MB/sec execute 261 sec
30 30845 14.53 MB/sec execute 262 sec
30 31008 14.60 MB/sec execute 263 sec
30 31127 14.62 MB/sec execute 264 sec
30 31170 14.58 MB/sec execute 265 sec
30 31386 14.24 MB/sec execute 274 sec
30 31521 14.27 MB/sec execute 275 sec
30 31680 14.32 MB/sec execute 276 sec
30 31895 14.38 MB/sec execute 277 sec
30 32031 14.40 MB/sec execute 278 sec
30 32168 14.40 MB/sec execute 279 sec
30 32414 14.51 MB/sec execute 280 sec
30 32667 14.59 MB/sec execute 281 sec
30 32945 14.69 MB/sec execute 282 sec
30 33242 14.81 MB/sec execute 283 sec
30 33536 14.92 MB/sec execute 284 sec
30 33735 15.00 MB/sec execute 285 sec
30 33965 15.07 MB/sec execute 286 sec
30 34200 15.13 MB/sec execute 287 sec
30 34455 15.21 MB/sec execute 288 sec
30 34524 15.20 MB/sec execute 289 sec
30 34528 15.15 MB/sec execute 290 sec
30 34544 15.11 MB/sec execute 291 sec
30 34632 15.09 MB/sec execute 292 sec
30 34901 15.19 MB/sec execute 293 sec
30 35218 15.32 MB/sec execute 294 sec
30 35515 15.43 MB/sec execute 295 sec
30 35825 15.54 MB/sec execute 296 sec
30 36138 15.67 MB/sec execute 297 sec
30 36433 15.78 MB/sec execute 298 sec
30 36724 15.90 MB/sec execute 299 sec
30 37025 16.01 MB/sec execute 300 sec
30 37336 16.09 MB/sec execute 301 sec
30 37372 16.05 MB/sec execute 302 sec
30 37625 15.57 MB/sec execute 314 sec
30 37637 15.52 MB/sec execute 315 sec
30 37652 15.48 MB/sec execute 316 sec
30 37654 15.43 MB/sec execute 317 sec
30 37654 15.38 MB/sec execute 318 sec
30 37654 15.33 MB/sec execute 319 sec
30 37654 15.29 MB/sec execute 320 sec
30 37654 15.24 MB/sec execute 321 sec
30 37654 15.19 MB/sec execute 322 sec
30 37654 15.14 MB/sec execute 323 sec
30 37654 15.10 MB/sec execute 324 sec
30 37654 15.05 MB/sec execute 325 sec
30 37654 15.00 MB/sec execute 326 sec
30 37654 14.95 MB/sec execute 327 sec
30 37691 14.92 MB/sec execute 328 sec
30 37779 14.94 MB/sec execute 329 sec
30 37963 14.99 MB/sec execute 330 sec
30 38159 15.05 MB/sec execute 331 sec
30 38352 15.11 MB/sec execute 332 sec
30 38549 15.18 MB/sec execute 333 sec
30 38733 15.20 MB/sec execute 334 sec
30 38873 15.21 MB/sec execute 335 sec
30 39050 15.25 MB/sec execute 336 sec
30 39197 15.28 MB/sec execute 337 sec
30 39289 15.27 MB/sec execute 338 sec
30 39297 15.23 MB/sec execute 339 sec
30 39554 15.25 MB/sec execute 342 sec
30 39584 15.21 MB/sec execute 343 sec
30 39587 15.16 MB/sec execute 344 sec
30 39587 15.12 MB/sec execute 345 sec
30 39587 15.08 MB/sec execute 346 sec
30 39587 15.03 MB/sec execute 347 sec
30 39587 14.99 MB/sec execute 348 sec
30 39587 14.95 MB/sec execute 349 sec
30 39696 14.67 MB/sec execute 358 sec
30 39801 14.68 MB/sec execute 359 sec
30 39927 14.71 MB/sec execute 360 sec
30 40050 14.71 MB/sec execute 361 sec
30 40162 14.72 MB/sec execute 362 sec
30 40289 14.73 MB/sec execute 363 sec
30 40451 14.78 MB/sec execute 364 sec
30 40666 14.81 MB/sec execute 365 sec
30 40864 14.86 MB/sec execute 366 sec
30 41030 14.89 MB/sec execute 367 sec
30 41073 14.86 MB/sec execute 368 sec
30 41095 14.83 MB/sec execute 369 sec
30 41107 14.79 MB/sec execute 370 sec
30 41107 14.75 MB/sec execute 371 sec
30 41107 14.71 MB/sec execute 372 sec
30 41107 14.67 MB/sec execute 373 sec
30 41107 14.64 MB/sec execute 374 sec
30 41107 14.60 MB/sec execute 375 sec
30 41107 14.56 MB/sec execute 376 sec
30 41107 14.52 MB/sec execute 377 sec
30 41107 14.48 MB/sec execute 378 sec
30 41107 14.44 MB/sec execute 379 sec
30 41107 14.40 MB/sec execute 380 sec
30 41232 14.19 MB/sec execute 388 sec
30 41458 14.10 MB/sec execute 392 sec
30 41549 14.10 MB/sec execute 393 sec
30 41549 14.06 MB/sec execute 394 sec
30 41549 14.03 MB/sec execute 395 sec
30 41549 13.99 MB/sec execute 396 sec
30 41549 13.96 MB/sec execute 397 sec
30 42290 13.74 MB/sec execute 412 sec
30 42290 13.71 MB/sec execute 413 sec
30 42290 13.67 MB/sec execute 414 sec
30 42290 13.64 MB/sec execute 415 sec
30 42290 13.61 MB/sec execute 416 sec
30 42458 13.37 MB/sec execute 426 sec
30 42607 13.39 MB/sec execute 427 sec
30 42765 13.43 MB/sec execute 428 sec
30 42961 13.45 MB/sec execute 429 sec
30 43148 13.50 MB/sec execute 430 sec
30 43339 13.55 MB/sec execute 431 sec
30 43438 13.54 MB/sec execute 432 sec
30 43451 13.51 MB/sec execute 433 sec
30 43469 13.49 MB/sec execute 434 sec
30 43532 13.48 MB/sec execute 435 sec
30 43534 13.45 MB/sec execute 436 sec
30 43534 13.42 MB/sec execute 437 sec
30 43534 13.39 MB/sec execute 438 sec
30 43534 13.36 MB/sec execute 439 sec
30 43534 13.33 MB/sec execute 440 sec
30 43653 13.03 MB/sec execute 451 sec
30 43707 13.02 MB/sec execute 452 sec
30 43727 13.01 MB/sec execute 453 sec
30 43729 12.98 MB/sec execute 454 sec
30 43756 12.96 MB/sec execute 455 sec
30 43763 12.93 MB/sec execute 456 sec
30 43789 12.92 MB/sec execute 457 sec
30 43828 12.90 MB/sec execute 458 sec
30 43875 12.89 MB/sec execute 459 sec
30 43879 12.86 MB/sec execute 460 sec
30 43879 12.84 MB/sec execute 461 sec
30 43879 12.81 MB/sec execute 462 sec
30 43879 12.78 MB/sec execute 463 sec
30 43879 12.75 MB/sec execute 464 sec
30 44099 12.54 MB/sec execute 475 sec
30 44193 12.55 MB/sec execute 476 sec
30 44252 12.55 MB/sec execute 477 sec
30 44302 12.53 MB/sec execute 478 sec
30 44313 12.51 MB/sec execute 479 sec
30 44313 12.48 MB/sec execute 480 sec
30 44313 12.45 MB/sec execute 481 sec
30 44313 12.43 MB/sec execute 482 sec
30 44511 12.29 MB/sec execute 490 sec
30 44610 12.30 MB/sec execute 491 sec
30 44691 12.30 MB/sec execute 492 sec
30 44772 12.32 MB/sec execute 493 sec
30 44921 12.34 MB/sec execute 494 sec
30 45064 12.36 MB/sec execute 495 sec
30 45169 12.36 MB/sec execute 496 sec
30 45266 12.36 MB/sec execute 497 sec
30 45377 12.37 MB/sec execute 498 sec
30 45418 12.36 MB/sec execute 499 sec
30 45419 12.33 MB/sec execute 500 sec
30 45419 12.31 MB/sec execute 501 sec
30 45479 12.30 MB/sec execute 502 sec
30 45584 12.31 MB/sec execute 503 sec
30 45584 12.28 MB/sec execute 504 sec
30 45584 12.26 MB/sec execute 505 sec
30 45584 12.23 MB/sec execute 506 sec
30 45584 12.21 MB/sec execute 507 sec
30 45584 12.19 MB/sec execute 508 sec
30 45584 12.16 MB/sec execute 509 sec
30 45584 12.14 MB/sec execute 510 sec
30 45613 12.13 MB/sec execute 511 sec
30 45756 11.96 MB/sec execute 520 sec
30 45783 11.95 MB/sec execute 521 sec
30 45795 11.93 MB/sec execute 522 sec
30 45795 11.91 MB/sec execute 523 sec
30 45814 11.89 MB/sec execute 524 sec
30 45927 11.90 MB/sec execute 525 sec
30 46076 11.93 MB/sec execute 526 sec
30 46229 11.95 MB/sec execute 527 sec
30 46367 11.97 MB/sec execute 528 sec
30 46510 11.99 MB/sec execute 529 sec
30 46531 11.98 MB/sec execute 530 sec
30 46685 11.87 MB/sec execute 537 sec
30 47084 11.79 MB/sec execute 546 sec
30 47084 11.77 MB/sec execute 547 sec
30 47084 11.75 MB/sec execute 548 sec
30 47084 11.73 MB/sec execute 549 sec
30 47084 11.71 MB/sec execute 550 sec
30 47084 11.69 MB/sec execute 551 sec
30 47108 11.67 MB/sec execute 552 sec
30 47170 11.67 MB/sec execute 553 sec
30 47258 11.47 MB/sec execute 564 sec
30 47335 11.46 MB/sec execute 565 sec
30 47346 11.44 MB/sec execute 566 sec
30 47346 11.42 MB/sec execute 567 sec
30 47346 11.40 MB/sec execute 568 sec
30 47346 11.38 MB/sec execute 569 sec
30 47346 11.36 MB/sec execute 570 sec
30 47346 11.34 MB/sec execute 571 sec
30 47527 11.27 MB/sec execute 577 sec
30 47610 11.28 MB/sec execute 578 sec
30 47658 11.29 MB/sec execute 579 sec
30 47778 11.30 MB/sec execute 580 sec
30 47912 11.32 MB/sec execute 581 sec
30 48003 11.34 MB/sec execute 582 sec
30 48125 11.36 MB/sec execute 583 sec
30 48236 11.36 MB/sec execute 584 sec
30 48445 11.38 MB/sec execute 585 sec
30 48705 11.44 MB/sec execute 586 sec
30 48750 11.43 MB/sec execute 587 sec
30 48774 11.41 MB/sec execute 588 sec
30 48817 11.40 MB/sec execute 589 sec
30 48949 11.28 MB/sec execute 597 sec
30 49124 11.31 MB/sec execute 598 sec
30 49341 11.35 MB/sec execute 599 sec
30 49649 11.41 MB/sec cleanup 600 sec
30 49649 11.39 MB/sec cleanup 601 sec
30 49649 11.37 MB/sec cleanup 602 sec
30 49649 11.35 MB/sec cleanup 603 sec
30 49649 11.33 MB/sec cleanup 604 sec
30 49649 11.31 MB/sec cleanup 605 sec
Throughput 11.4073 MB/sec 30 procs
[-- Attachment #5: pdflush-hda1.log --]
[-- Type: text/plain, Size: 28719 bytes --]
dbench version 3.04 - Copyright Andrew Tridgell 1999-2004
Running for 600 seconds with load '/usr/share/dbench/client.txt' and minimum warmup 120 secs
30 clients started
30 48 56.54 MB/sec warmup 1 sec
30 48 28.00 MB/sec warmup 2 sec
30 48 18.73 MB/sec warmup 3 sec
30 48 14.07 MB/sec warmup 4 sec
30 48 11.26 MB/sec warmup 5 sec
30 48 9.39 MB/sec warmup 6 sec
30 50 4.38 MB/sec warmup 13 sec
30 50 2.57 MB/sec warmup 23 sec
30 59 2.65 MB/sec warmup 27 sec
30 59 2.56 MB/sec warmup 28 sec
30 65 2.17 MB/sec warmup 37 sec
30 68 2.06 MB/sec warmup 42 sec
30 69 1.67 MB/sec warmup 52 sec
30 72 1.52 MB/sec warmup 60 sec
30 81 1.51 MB/sec warmup 70 sec
30 86 1.48 MB/sec warmup 76 sec
30 104 1.54 MB/sec warmup 88 sec
30 121 1.71 MB/sec warmup 89 sec
30 121 1.69 MB/sec warmup 90 sec
30 125 1.72 MB/sec warmup 91 sec
30 370 2.26 MB/sec warmup 92 sec
30 681 2.90 MB/sec warmup 93 sec
30 868 3.29 MB/sec warmup 94 sec
30 879 3.38 MB/sec warmup 95 sec
30 886 3.40 MB/sec warmup 96 sec
30 902 3.41 MB/sec warmup 97 sec
30 902 3.37 MB/sec warmup 98 sec
30 902 3.34 MB/sec warmup 99 sec
30 902 3.31 MB/sec warmup 100 sec
30 902 3.27 MB/sec warmup 101 sec
30 902 3.24 MB/sec warmup 102 sec
30 961 3.67 MB/sec warmup 103 sec
30 983 3.72 MB/sec warmup 104 sec
30 983 3.69 MB/sec warmup 105 sec
30 983 3.65 MB/sec warmup 106 sec
30 983 3.62 MB/sec warmup 107 sec
30 983 3.59 MB/sec warmup 108 sec
30 990 3.61 MB/sec warmup 109 sec
30 1073 3.95 MB/sec warmup 110 sec
30 1078 3.95 MB/sec warmup 111 sec
30 1078 3.91 MB/sec warmup 112 sec
30 1078 3.88 MB/sec warmup 113 sec
30 1078 3.84 MB/sec warmup 114 sec
30 1078 3.81 MB/sec warmup 115 sec
30 1078 3.78 MB/sec warmup 116 sec
30 1079 3.77 MB/sec warmup 117 sec
30 1079 3.74 MB/sec warmup 118 sec
30 1079 3.71 MB/sec warmup 119 sec
30 1079 0.00 MB/sec execute 1 sec
30 1089 4.10 MB/sec execute 2 sec
30 1099 5.52 MB/sec execute 3 sec
30 1107 6.18 MB/sec execute 4 sec
30 1148 7.56 MB/sec execute 5 sec
30 1151 6.70 MB/sec execute 6 sec
30 1151 5.74 MB/sec execute 7 sec
30 1151 5.02 MB/sec execute 8 sec
30 1151 4.46 MB/sec execute 9 sec
30 1151 4.02 MB/sec execute 10 sec
30 1151 3.66 MB/sec execute 11 sec
30 1164 3.91 MB/sec execute 12 sec
30 1172 4.23 MB/sec execute 13 sec
30 1305 7.88 MB/sec execute 15 sec
30 1324 7.53 MB/sec execute 16 sec
30 1428 7.96 MB/sec execute 17 sec
30 1793 10.90 MB/sec execute 18 sec
30 2150 13.32 MB/sec execute 19 sec
30 2514 15.80 MB/sec execute 20 sec
30 2891 17.63 MB/sec execute 21 sec
30 3109 18.79 MB/sec execute 22 sec
30 3173 19.45 MB/sec execute 23 sec
30 3392 20.10 MB/sec execute 24 sec
30 3435 19.51 MB/sec execute 25 sec
30 3435 18.76 MB/sec execute 26 sec
30 3537 18.72 MB/sec execute 27 sec
30 3742 19.16 MB/sec execute 28 sec
30 3790 18.80 MB/sec execute 29 sec
30 3801 18.19 MB/sec execute 30 sec
30 3801 17.61 MB/sec execute 31 sec
30 3801 17.06 MB/sec execute 32 sec
30 3802 16.54 MB/sec execute 33 sec
30 3838 16.27 MB/sec execute 34 sec
30 3863 15.94 MB/sec execute 35 sec
30 4080 16.43 MB/sec execute 36 sec
30 4185 16.61 MB/sec execute 37 sec
30 4254 16.47 MB/sec execute 38 sec
30 4254 16.05 MB/sec execute 39 sec
30 4254 15.65 MB/sec execute 40 sec
30 4261 15.41 MB/sec execute 41 sec
30 4266 15.16 MB/sec execute 42 sec
30 6902 19.95 MB/sec execute 56 sec
30 6997 20.12 MB/sec execute 57 sec
30 6998 19.79 MB/sec execute 58 sec
30 7060 19.63 MB/sec execute 59 sec
30 7234 19.72 MB/sec execute 60 sec
30 7580 20.44 MB/sec execute 61 sec
30 7943 21.01 MB/sec execute 62 sec
30 8298 21.60 MB/sec execute 63 sec
30 8659 22.19 MB/sec execute 64 sec
30 8986 22.73 MB/sec execute 65 sec
30 9347 23.33 MB/sec execute 66 sec
30 9713 23.84 MB/sec execute 67 sec
30 9968 24.12 MB/sec execute 68 sec
30 9968 23.77 MB/sec execute 69 sec
30 9968 23.43 MB/sec execute 70 sec
30 9968 23.10 MB/sec execute 71 sec
30 9968 22.77 MB/sec execute 72 sec
30 10001 22.57 MB/sec execute 73 sec
30 10013 22.31 MB/sec execute 74 sec
30 10078 22.19 MB/sec execute 75 sec
30 10109 22.04 MB/sec execute 76 sec
30 10113 21.76 MB/sec execute 77 sec
30 10197 21.58 MB/sec execute 78 sec
30 10198 21.31 MB/sec execute 79 sec
30 10209 21.05 MB/sec execute 80 sec
30 10209 20.78 MB/sec execute 81 sec
30 10510 19.28 MB/sec execute 91 sec
30 10698 19.43 MB/sec execute 92 sec
30 10746 19.31 MB/sec execute 93 sec
30 10944 19.44 MB/sec execute 94 sec
30 11068 19.47 MB/sec execute 95 sec
30 11077 19.27 MB/sec execute 96 sec
30 11166 19.22 MB/sec execute 97 sec
30 11166 19.02 MB/sec execute 98 sec
30 11166 18.83 MB/sec execute 99 sec
30 11166 18.64 MB/sec execute 100 sec
30 11272 18.62 MB/sec execute 101 sec
30 11438 18.71 MB/sec execute 102 sec
30 11633 18.92 MB/sec execute 103 sec
30 11783 18.96 MB/sec execute 104 sec
30 12021 19.17 MB/sec execute 105 sec
30 13712 19.37 MB/sec execute 117 sec
30 13943 19.49 MB/sec execute 118 sec
30 13993 19.37 MB/sec execute 119 sec
30 14021 19.23 MB/sec execute 120 sec
30 14031 19.10 MB/sec execute 121 sec
30 14037 18.95 MB/sec execute 122 sec
30 14070 18.85 MB/sec execute 123 sec
30 14413 19.15 MB/sec execute 124 sec
30 14729 19.37 MB/sec execute 125 sec
30 15089 19.68 MB/sec execute 126 sec
30 15451 20.00 MB/sec execute 127 sec
30 15814 20.30 MB/sec execute 128 sec
30 16178 20.60 MB/sec execute 129 sec
30 16212 20.50 MB/sec execute 130 sec
30 16270 20.42 MB/sec execute 131 sec
30 16278 20.28 MB/sec execute 132 sec
30 16632 20.56 MB/sec execute 133 sec
30 16981 20.85 MB/sec execute 134 sec
30 17315 21.11 MB/sec execute 135 sec
30 17595 21.27 MB/sec execute 136 sec
30 17920 21.57 MB/sec execute 137 sec
30 18231 21.77 MB/sec execute 138 sec
30 18605 22.02 MB/sec execute 139 sec
30 18977 22.24 MB/sec execute 140 sec
30 19181 22.34 MB/sec execute 141 sec
30 19232 22.20 MB/sec execute 142 sec
30 19481 22.34 MB/sec execute 143 sec
30 19724 22.45 MB/sec execute 144 sec
30 19831 22.43 MB/sec execute 145 sec
30 19943 22.38 MB/sec execute 146 sec
30 20025 22.31 MB/sec execute 147 sec
30 20091 22.24 MB/sec execute 148 sec
30 20175 22.18 MB/sec execute 149 sec
30 20252 22.12 MB/sec execute 150 sec
30 20348 22.08 MB/sec execute 151 sec
30 20454 22.03 MB/sec execute 152 sec
30 20543 21.99 MB/sec execute 153 sec
30 20653 21.97 MB/sec execute 154 sec
30 20706 21.88 MB/sec execute 155 sec
30 20758 21.80 MB/sec execute 156 sec
30 20790 21.70 MB/sec execute 157 sec
30 20828 21.61 MB/sec execute 158 sec
30 21111 21.76 MB/sec execute 159 sec
30 21312 21.79 MB/sec execute 160 sec
30 21414 20.90 MB/sec execute 168 sec
30 21414 20.78 MB/sec execute 169 sec
30 21414 20.66 MB/sec execute 170 sec
30 21414 20.54 MB/sec execute 171 sec
30 21414 20.42 MB/sec execute 172 sec
30 21414 20.30 MB/sec execute 173 sec
30 21414 20.18 MB/sec execute 174 sec
30 21414 20.07 MB/sec execute 175 sec
30 21414 19.95 MB/sec execute 176 sec
30 21443 19.89 MB/sec execute 177 sec
30 21539 19.89 MB/sec execute 178 sec
30 21630 19.88 MB/sec execute 179 sec
30 21677 19.81 MB/sec execute 180 sec
30 21906 19.89 MB/sec execute 181 sec
30 22088 19.95 MB/sec execute 182 sec
30 22242 19.98 MB/sec execute 183 sec
30 22461 20.08 MB/sec execute 184 sec
30 22547 20.05 MB/sec execute 185 sec
30 22566 19.95 MB/sec execute 186 sec
30 22572 19.85 MB/sec execute 187 sec
30 22572 19.74 MB/sec execute 188 sec
30 22572 19.64 MB/sec execute 189 sec
30 22573 19.53 MB/sec execute 190 sec
30 22727 19.58 MB/sec execute 191 sec
30 22952 19.66 MB/sec execute 192 sec
30 23302 19.83 MB/sec execute 193 sec
30 23326 19.75 MB/sec execute 194 sec
30 23326 19.65 MB/sec execute 195 sec
30 23639 19.28 MB/sec execute 201 sec
30 23914 19.43 MB/sec execute 202 sec
30 24221 19.58 MB/sec execute 203 sec
30 24323 19.55 MB/sec execute 204 sec
30 24361 19.49 MB/sec execute 205 sec
30 24361 19.39 MB/sec execute 206 sec
30 24361 19.30 MB/sec execute 207 sec
30 24387 19.23 MB/sec execute 208 sec
30 24387 19.14 MB/sec execute 209 sec
30 24387 19.05 MB/sec execute 210 sec
30 24387 18.96 MB/sec execute 211 sec
30 24387 18.87 MB/sec execute 212 sec
30 24387 18.78 MB/sec execute 213 sec
30 24407 18.69 MB/sec execute 214 sec
30 24517 18.69 MB/sec execute 215 sec
30 24647 18.69 MB/sec execute 216 sec
30 24907 18.80 MB/sec execute 217 sec
30 25262 18.97 MB/sec execute 218 sec
30 25610 19.15 MB/sec execute 219 sec
30 25698 19.15 MB/sec execute 220 sec
30 25804 19.10 MB/sec execute 221 sec
30 26017 19.19 MB/sec execute 222 sec
30 26179 19.27 MB/sec execute 223 sec
30 26291 19.26 MB/sec execute 224 sec
30 26448 19.26 MB/sec execute 225 sec
30 26801 19.45 MB/sec execute 226 sec
30 26998 19.51 MB/sec execute 227 sec
30 27023 19.44 MB/sec execute 228 sec
30 27057 19.36 MB/sec execute 230 sec
30 27057 19.28 MB/sec execute 231 sec
30 27057 19.20 MB/sec execute 232 sec
30 27173 19.15 MB/sec execute 233 sec
30 27173 19.07 MB/sec execute 234 sec
30 27182 18.99 MB/sec execute 235 sec
30 27182 18.91 MB/sec execute 236 sec
30 27182 18.83 MB/sec execute 237 sec
30 27182 18.75 MB/sec execute 238 sec
30 27182 18.67 MB/sec execute 239 sec
30 27182 18.60 MB/sec execute 240 sec
30 27182 18.52 MB/sec execute 241 sec
30 27182 18.44 MB/sec execute 242 sec
30 27182 18.37 MB/sec execute 243 sec
30 27182 18.29 MB/sec execute 244 sec
30 27183 18.22 MB/sec execute 245 sec
30 27185 18.14 MB/sec execute 246 sec
30 27185 18.07 MB/sec execute 247 sec
30 27185 18.00 MB/sec execute 248 sec
30 27185 17.93 MB/sec execute 249 sec
30 27185 17.86 MB/sec execute 250 sec
30 27185 17.79 MB/sec execute 251 sec
30 27187 17.72 MB/sec execute 252 sec
30 27187 17.65 MB/sec execute 253 sec
30 27199 17.58 MB/sec execute 254 sec
30 27228 17.53 MB/sec execute 255 sec
30 27261 17.48 MB/sec execute 256 sec
30 27296 17.14 MB/sec execute 261 sec
30 27296 17.07 MB/sec execute 262 sec
30 27296 17.01 MB/sec execute 263 sec
30 27296 16.94 MB/sec execute 264 sec
30 27296 16.88 MB/sec execute 265 sec
30 27300 16.82 MB/sec execute 266 sec
30 27308 16.78 MB/sec execute 267 sec
30 27308 16.71 MB/sec execute 268 sec
30 27308 16.65 MB/sec execute 269 sec
30 27423 16.64 MB/sec execute 270 sec
30 27720 16.77 MB/sec execute 271 sec
30 27951 16.85 MB/sec execute 272 sec
30 28036 16.88 MB/sec execute 273 sec
30 28182 16.91 MB/sec execute 274 sec
30 28259 16.89 MB/sec execute 275 sec
30 28308 16.86 MB/sec execute 276 sec
30 28330 16.81 MB/sec execute 277 sec
30 28330 16.75 MB/sec execute 278 sec
30 28377 16.72 MB/sec execute 279 sec
30 28539 16.74 MB/sec execute 280 sec
30 28762 16.80 MB/sec execute 281 sec
30 29077 16.92 MB/sec execute 282 sec
30 29372 17.03 MB/sec execute 283 sec
30 29642 17.14 MB/sec execute 284 sec
30 29906 17.20 MB/sec execute 285 sec
30 30263 17.34 MB/sec execute 286 sec
30 30622 17.48 MB/sec execute 287 sec
30 30981 17.63 MB/sec execute 288 sec
30 31325 17.75 MB/sec execute 289 sec
30 31685 17.90 MB/sec execute 290 sec
30 32046 18.05 MB/sec execute 291 sec
30 32185 18.06 MB/sec execute 292 sec
30 32267 18.04 MB/sec execute 293 sec
30 32415 18.06 MB/sec execute 294 sec
30 32783 18.19 MB/sec execute 295 sec
30 33145 18.32 MB/sec execute 296 sec
30 33481 18.47 MB/sec execute 297 sec
30 33666 18.53 MB/sec execute 298 sec
30 33667 18.46 MB/sec execute 299 sec
30 33785 18.46 MB/sec execute 300 sec
30 33973 18.49 MB/sec execute 301 sec
30 34219 18.54 MB/sec execute 302 sec
30 34401 18.57 MB/sec execute 303 sec
30 34428 18.53 MB/sec execute 304 sec
30 34547 18.52 MB/sec execute 305 sec
30 34559 18.46 MB/sec execute 306 sec
30 34567 18.40 MB/sec execute 307 sec
30 34579 18.34 MB/sec execute 308 sec
30 34643 18.32 MB/sec execute 309 sec
30 34794 18.33 MB/sec execute 310 sec
30 34794 18.27 MB/sec execute 311 sec
30 34794 18.22 MB/sec execute 312 sec
30 34794 18.16 MB/sec execute 313 sec
30 34854 18.15 MB/sec execute 314 sec
30 35074 18.18 MB/sec execute 315 sec
30 35264 18.22 MB/sec execute 316 sec
30 35560 18.30 MB/sec execute 317 sec
30 35832 18.40 MB/sec execute 318 sec
30 36135 18.47 MB/sec execute 319 sec
30 36442 18.58 MB/sec execute 320 sec
30 36734 18.68 MB/sec execute 321 sec
30 37050 18.76 MB/sec execute 322 sec
30 37130 18.74 MB/sec execute 323 sec
30 37131 18.68 MB/sec execute 324 sec
30 37168 18.65 MB/sec execute 325 sec
30 37261 18.63 MB/sec execute 326 sec
30 37315 18.61 MB/sec execute 327 sec
30 37608 18.53 MB/sec execute 332 sec
30 37945 18.66 MB/sec execute 333 sec
30 38283 18.76 MB/sec execute 334 sec
30 38574 18.84 MB/sec execute 335 sec
30 38649 18.83 MB/sec execute 336 sec
30 38649 18.77 MB/sec execute 337 sec
30 38987 18.67 MB/sec execute 342 sec
30 39347 18.78 MB/sec execute 343 sec
30 39666 18.89 MB/sec execute 344 sec
30 39939 18.96 MB/sec execute 345 sec
30 40084 18.98 MB/sec execute 346 sec
30 40101 18.93 MB/sec execute 347 sec
30 40101 18.88 MB/sec execute 348 sec
30 40101 18.82 MB/sec execute 349 sec
30 40101 18.77 MB/sec execute 350 sec
30 40101 18.71 MB/sec execute 351 sec
30 40103 18.67 MB/sec execute 352 sec
30 40104 18.61 MB/sec execute 353 sec
30 40715 18.12 MB/sec execute 367 sec
30 40740 18.08 MB/sec execute 368 sec
30 40758 18.04 MB/sec execute 369 sec
30 40758 17.99 MB/sec execute 370 sec
30 40758 17.94 MB/sec execute 371 sec
30 40760 17.89 MB/sec execute 372 sec
30 40837 17.88 MB/sec execute 373 sec
30 41050 17.92 MB/sec execute 374 sec
30 41051 17.88 MB/sec execute 375 sec
30 41085 17.84 MB/sec execute 376 sec
30 41428 17.94 MB/sec execute 377 sec
30 41802 18.03 MB/sec execute 378 sec
30 42136 18.15 MB/sec execute 379 sec
30 42520 18.24 MB/sec execute 380 sec
30 42832 18.34 MB/sec execute 381 sec
30 43128 18.40 MB/sec execute 382 sec
30 43397 18.48 MB/sec execute 383 sec
30 43671 18.54 MB/sec execute 384 sec
30 43902 18.59 MB/sec execute 385 sec
30 43943 18.55 MB/sec execute 386 sec
30 44051 18.55 MB/sec execute 387 sec
30 44098 18.53 MB/sec execute 388 sec
30 44099 18.48 MB/sec execute 389 sec
30 44183 18.46 MB/sec execute 390 sec
30 44223 18.43 MB/sec execute 391 sec
30 44223 18.38 MB/sec execute 392 sec
30 44226 18.33 MB/sec execute 393 sec
30 44226 18.29 MB/sec execute 394 sec
30 44226 18.24 MB/sec execute 395 sec
30 44226 18.19 MB/sec execute 396 sec
30 44226 18.15 MB/sec execute 397 sec
30 44228 18.10 MB/sec execute 398 sec
30 44229 18.06 MB/sec execute 399 sec
30 44233 18.02 MB/sec execute 400 sec
30 44461 18.09 MB/sec execute 401 sec
30 44750 18.17 MB/sec execute 402 sec
30 44835 18.17 MB/sec execute 403 sec
30 44890 18.15 MB/sec execute 404 sec
30 45089 18.18 MB/sec execute 405 sec
30 45109 18.14 MB/sec execute 406 sec
30 45322 18.18 MB/sec execute 407 sec
30 45326 18.14 MB/sec execute 408 sec
30 45353 18.12 MB/sec execute 409 sec
30 45718 18.21 MB/sec execute 410 sec
30 45791 18.19 MB/sec execute 411 sec
30 45947 18.04 MB/sec execute 416 sec
30 46023 18.02 MB/sec execute 417 sec
30 46140 18.02 MB/sec execute 418 sec
30 46384 18.07 MB/sec execute 419 sec
30 46621 18.12 MB/sec execute 420 sec
30 46858 18.18 MB/sec execute 421 sec
30 47095 18.23 MB/sec execute 422 sec
30 47334 18.28 MB/sec execute 423 sec
30 47557 18.31 MB/sec execute 424 sec
30 47799 18.36 MB/sec execute 425 sec
30 47934 18.37 MB/sec execute 426 sec
30 47934 18.33 MB/sec execute 427 sec
30 47934 18.29 MB/sec execute 428 sec
30 47934 18.24 MB/sec execute 429 sec
30 47935 18.20 MB/sec execute 430 sec
30 47945 18.17 MB/sec execute 431 sec
30 47974 18.13 MB/sec execute 432 sec
30 47982 18.10 MB/sec execute 433 sec
30 48020 18.07 MB/sec execute 434 sec
30 48362 18.11 MB/sec execute 436 sec
30 48627 18.16 MB/sec execute 437 sec
30 48794 18.19 MB/sec execute 438 sec
30 49137 18.27 MB/sec execute 439 sec
30 49429 18.34 MB/sec execute 440 sec
30 49483 18.31 MB/sec execute 441 sec
30 49564 18.30 MB/sec execute 442 sec
30 49564 18.26 MB/sec execute 443 sec
30 49564 18.22 MB/sec execute 444 sec
30 49745 18.22 MB/sec execute 445 sec
30 49869 18.24 MB/sec execute 446 sec
30 49869 18.19 MB/sec execute 447 sec
30 49869 18.15 MB/sec execute 448 sec
30 49869 18.11 MB/sec execute 449 sec
30 49869 18.07 MB/sec execute 450 sec
30 49869 18.03 MB/sec execute 451 sec
30 49966 18.01 MB/sec execute 453 sec
30 50077 18.01 MB/sec execute 454 sec
30 50436 18.10 MB/sec execute 455 sec
30 50787 18.18 MB/sec execute 456 sec
30 51046 18.23 MB/sec execute 457 sec
30 51046 18.19 MB/sec execute 458 sec
30 51046 18.15 MB/sec execute 459 sec
30 51119 18.14 MB/sec execute 460 sec
30 51262 18.15 MB/sec execute 461 sec
30 51450 18.18 MB/sec execute 462 sec
30 51692 18.21 MB/sec execute 463 sec
30 51743 18.19 MB/sec execute 464 sec
30 52015 18.27 MB/sec execute 465 sec
30 52213 18.29 MB/sec execute 466 sec
30 52388 18.33 MB/sec execute 467 sec
30 52592 18.34 MB/sec execute 468 sec
30 52843 18.38 MB/sec execute 469 sec
30 53055 18.42 MB/sec execute 470 sec
30 53346 18.45 MB/sec execute 472 sec
30 53626 18.52 MB/sec execute 473 sec
30 53893 18.55 MB/sec execute 474 sec
30 54125 18.59 MB/sec execute 475 sec
30 54148 18.55 MB/sec execute 476 sec
30 54148 18.51 MB/sec execute 477 sec
30 54215 18.50 MB/sec execute 478 sec
30 54215 18.46 MB/sec execute 479 sec
30 54215 18.42 MB/sec execute 480 sec
30 54215 18.38 MB/sec execute 481 sec
30 54215 18.34 MB/sec execute 482 sec
30 54215 18.30 MB/sec execute 483 sec
30 54215 18.27 MB/sec execute 484 sec
30 54215 18.23 MB/sec execute 485 sec
30 54215 18.19 MB/sec execute 486 sec
30 54228 18.15 MB/sec execute 487 sec
30 54408 17.36 MB/sec execute 511 sec
30 54460 17.36 MB/sec execute 512 sec
30 54488 17.34 MB/sec execute 513 sec
30 54556 17.33 MB/sec execute 514 sec
30 54608 17.30 MB/sec execute 515 sec
30 54608 17.27 MB/sec execute 516 sec
30 54608 17.24 MB/sec execute 517 sec
30 54616 17.20 MB/sec execute 518 sec
30 54616 17.17 MB/sec execute 519 sec
30 54616 17.14 MB/sec execute 520 sec
30 54616 17.10 MB/sec execute 521 sec
30 54616 17.07 MB/sec execute 522 sec
30 54616 17.04 MB/sec execute 523 sec
30 54616 17.01 MB/sec execute 524 sec
30 54682 17.00 MB/sec execute 525 sec
30 54683 16.97 MB/sec execute 526 sec
30 54683 16.93 MB/sec execute 527 sec
30 54683 16.90 MB/sec execute 528 sec
30 54686 16.87 MB/sec execute 529 sec
30 54687 16.84 MB/sec execute 530 sec
30 54789 16.82 MB/sec execute 531 sec
30 54849 16.81 MB/sec execute 532 sec
30 54930 16.79 MB/sec execute 533 sec
30 55042 16.80 MB/sec execute 534 sec
30 55161 16.80 MB/sec execute 535 sec
30 55244 16.80 MB/sec execute 536 sec
30 55273 16.77 MB/sec execute 537 sec
30 55427 16.76 MB/sec execute 539 sec
30 55443 16.73 MB/sec execute 540 sec
30 55443 16.70 MB/sec execute 541 sec
30 55443 16.67 MB/sec execute 542 sec
30 55591 16.69 MB/sec execute 543 sec
30 55748 16.69 MB/sec execute 544 sec
30 55751 16.66 MB/sec execute 545 sec
30 55751 16.63 MB/sec execute 546 sec
30 55751 16.60 MB/sec execute 547 sec
30 55751 16.57 MB/sec execute 548 sec
30 55751 16.54 MB/sec execute 549 sec
30 55751 16.51 MB/sec execute 550 sec
30 55751 16.48 MB/sec execute 551 sec
30 55751 16.45 MB/sec execute 552 sec
30 55760 16.42 MB/sec execute 553 sec
30 55896 16.43 MB/sec execute 554 sec
30 56111 16.47 MB/sec execute 555 sec
30 56459 16.53 MB/sec execute 556 sec
30 56802 16.61 MB/sec execute 557 sec
30 57107 16.66 MB/sec execute 558 sec
30 57195 16.66 MB/sec execute 559 sec
30 57302 16.66 MB/sec execute 560 sec
30 57313 16.63 MB/sec execute 561 sec
30 57314 16.60 MB/sec execute 562 sec
30 57336 16.58 MB/sec execute 563 sec
30 57346 16.55 MB/sec execute 564 sec
30 57346 16.52 MB/sec execute 565 sec
30 57351 16.50 MB/sec execute 566 sec
30 57373 16.47 MB/sec execute 567 sec
30 57390 16.45 MB/sec execute 568 sec
30 57422 16.43 MB/sec execute 569 sec
30 57451 16.41 MB/sec execute 570 sec
30 57501 16.39 MB/sec execute 571 sec
30 57520 16.38 MB/sec execute 572 sec
30 57538 16.35 MB/sec execute 573 sec
30 57893 16.36 MB/sec execute 576 sec
30 58139 16.42 MB/sec execute 577 sec
30 58157 16.39 MB/sec execute 578 sec
30 58157 16.37 MB/sec execute 579 sec
30 58195 16.36 MB/sec execute 580 sec
30 58317 16.36 MB/sec execute 581 sec
30 58644 16.41 MB/sec execute 582 sec
30 58646 16.39 MB/sec execute 583 sec
30 58661 16.36 MB/sec execute 584 sec
30 58902 16.33 MB/sec execute 587 sec
30 59116 16.36 MB/sec execute 588 sec
30 59333 16.38 MB/sec execute 589 sec
30 59561 16.43 MB/sec execute 590 sec
30 59788 16.47 MB/sec execute 591 sec
30 60014 16.50 MB/sec execute 592 sec
30 60236 16.53 MB/sec execute 593 sec
30 60461 16.57 MB/sec execute 594 sec
30 60625 16.58 MB/sec execute 595 sec
30 60681 16.57 MB/sec execute 596 sec
30 60685 16.54 MB/sec execute 597 sec
30 60713 16.52 MB/sec execute 598 sec
30 60735 16.50 MB/sec execute 599 sec
30 60751 16.48 MB/sec cleanup 600 sec
30 60751 16.45 MB/sec cleanup 601 sec
30 60751 16.42 MB/sec cleanup 603 sec
30 60751 16.39 MB/sec cleanup 604 sec
30 60751 16.37 MB/sec cleanup 605 sec
30 60751 16.34 MB/sec cleanup 606 sec
30 60751 16.31 MB/sec cleanup 607 sec
30 60751 16.29 MB/sec cleanup 608 sec
30 60751 16.26 MB/sec cleanup 609 sec
30 60751 16.23 MB/sec cleanup 610 sec
30 60751 16.21 MB/sec cleanup 611 sec
30 60751 16.18 MB/sec cleanup 612 sec
30 60751 16.15 MB/sec cleanup 613 sec
30 60751 16.13 MB/sec cleanup 614 sec
30 60751 16.10 MB/sec cleanup 615 sec
30 60751 16.08 MB/sec cleanup 616 sec
30 60751 15.99 MB/sec cleanup 619 sec
30 60751 15.96 MB/sec cleanup 620 sec
30 60751 15.94 MB/sec cleanup 621 sec
30 60751 15.93 MB/sec cleanup 621 sec
Throughput 16.4806 MB/sec 30 procs
[-- Attachment #6: pdflush-hda3.log --]
[-- Type: text/plain, Size: 27799 bytes --]
dbench version 3.04 - Copyright Andrew Tridgell 1999-2004
Running for 600 seconds with load '/usr/share/dbench/client.txt' and minimum warmup 120 secs
30 clients started
30 13 2.82 MB/sec warmup 1 sec
30 20 4.27 MB/sec warmup 3 sec
30 20 3.05 MB/sec warmup 4 sec
30 20 2.37 MB/sec warmup 5 sec
30 20 1.94 MB/sec warmup 6 sec
30 51 6.86 MB/sec warmup 7 sec
30 64 7.10 MB/sec warmup 8 sec
30 69 7.26 MB/sec warmup 9 sec
30 74 7.22 MB/sec warmup 10 sec
30 78 7.09 MB/sec warmup 11 sec
30 82 6.71 MB/sec warmup 12 sec
30 93 7.23 MB/sec warmup 13 sec
30 161 7.83 MB/sec warmup 14 sec
30 169 8.17 MB/sec warmup 15 sec
30 169 7.64 MB/sec warmup 16 sec
30 177 7.82 MB/sec warmup 17 sec
30 183 7.81 MB/sec warmup 18 sec
30 188 7.73 MB/sec warmup 19 sec
30 190 7.46 MB/sec warmup 20 sec
30 221 7.60 MB/sec warmup 21 sec
30 225 7.52 MB/sec warmup 22 sec
30 228 7.43 MB/sec warmup 23 sec
30 232 7.29 MB/sec warmup 24 sec
30 232 6.99 MB/sec warmup 25 sec
30 245 7.37 MB/sec warmup 26 sec
30 252 7.34 MB/sec warmup 27 sec
30 256 7.29 MB/sec warmup 28 sec
30 265 7.38 MB/sec warmup 29 sec
30 265 7.13 MB/sec warmup 30 sec
30 282 7.43 MB/sec warmup 31 sec
30 292 7.51 MB/sec warmup 32 sec
30 304 7.51 MB/sec warmup 33 sec
30 306 7.33 MB/sec warmup 34 sec
30 306 7.12 MB/sec warmup 35 sec
30 324 7.40 MB/sec warmup 36 sec
30 336 7.54 MB/sec warmup 37 sec
30 347 7.62 MB/sec warmup 38 sec
30 347 7.43 MB/sec warmup 39 sec
30 359 7.55 MB/sec warmup 40 sec
30 379 7.57 MB/sec warmup 41 sec
30 387 7.59 MB/sec warmup 42 sec
30 400 7.70 MB/sec warmup 43 sec
30 408 7.71 MB/sec warmup 44 sec
30 421 7.79 MB/sec warmup 45 sec
30 433 7.75 MB/sec warmup 46 sec
30 444 7.83 MB/sec warmup 47 sec
30 448 7.75 MB/sec warmup 48 sec
30 456 7.76 MB/sec warmup 49 sec
30 480 7.93 MB/sec warmup 50 sec
30 483 7.84 MB/sec warmup 51 sec
30 498 7.87 MB/sec warmup 52 sec
30 506 7.86 MB/sec warmup 53 sec
30 531 8.06 MB/sec warmup 54 sec
30 544 8.14 MB/sec warmup 55 sec
30 575 8.17 MB/sec warmup 56 sec
30 606 8.15 MB/sec warmup 57 sec
30 613 8.06 MB/sec warmup 58 sec
30 802 8.63 MB/sec warmup 59 sec
30 971 9.04 MB/sec warmup 60 sec
30 988 9.00 MB/sec warmup 61 sec
30 998 9.02 MB/sec warmup 62 sec
30 1082 9.09 MB/sec warmup 63 sec
30 1223 9.30 MB/sec warmup 64 sec
30 1472 9.81 MB/sec warmup 65 sec
30 1629 9.43 MB/sec warmup 72 sec
30 1661 9.45 MB/sec warmup 73 sec
30 1792 9.78 MB/sec warmup 74 sec
30 1926 9.94 MB/sec warmup 75 sec
30 2252 10.47 MB/sec warmup 76 sec
30 2564 11.02 MB/sec warmup 77 sec
30 2881 11.53 MB/sec warmup 78 sec
30 3138 12.11 MB/sec warmup 79 sec
30 3270 12.36 MB/sec warmup 80 sec
30 3360 11.97 MB/sec warmup 85 sec
30 3390 11.46 MB/sec warmup 89 sec
30 3430 10.80 MB/sec warmup 95 sec
30 3435 10.67 MB/sec warmup 96 sec
30 3461 10.62 MB/sec warmup 97 sec
30 3474 10.51 MB/sec warmup 99 sec
30 3485 10.39 MB/sec warmup 100 sec
30 3542 10.35 MB/sec warmup 101 sec
30 3574 10.31 MB/sec warmup 102 sec
30 3789 10.53 MB/sec warmup 103 sec
30 4105 10.90 MB/sec warmup 104 sec
30 4416 11.30 MB/sec warmup 105 sec
30 4736 11.70 MB/sec warmup 106 sec
30 4903 11.82 MB/sec warmup 107 sec
30 4934 11.75 MB/sec warmup 108 sec
30 4984 11.70 MB/sec warmup 109 sec
30 4992 11.63 MB/sec warmup 110 sec
30 4992 11.52 MB/sec warmup 111 sec
30 4992 11.42 MB/sec warmup 112 sec
30 5033 11.38 MB/sec warmup 113 sec
30 5149 11.47 MB/sec warmup 114 sec
30 5284 11.61 MB/sec warmup 115 sec
30 5450 11.76 MB/sec warmup 116 sec
30 5618 11.90 MB/sec warmup 117 sec
30 5783 12.00 MB/sec warmup 118 sec
30 5783 11.90 MB/sec warmup 119 sec
30 5802 11.80 MB/sec warmup 120 sec
30 5948 11.54 MB/sec execute 1 sec
30 5956 7.62 MB/sec execute 2 sec
30 6053 9.94 MB/sec execute 3 sec
30 6218 14.81 MB/sec execute 4 sec
30 6275 5.18 MB/sec execute 13 sec
30 6275 4.81 MB/sec execute 14 sec
30 6275 4.48 MB/sec execute 15 sec
30 6275 4.20 MB/sec execute 16 sec
30 6275 3.95 MB/sec execute 17 sec
30 6275 3.73 MB/sec execute 18 sec
30 6278 3.57 MB/sec execute 19 sec
30 6380 4.11 MB/sec execute 20 sec
30 6473 5.05 MB/sec execute 21 sec
30 6474 4.82 MB/sec execute 22 sec
30 6480 4.85 MB/sec execute 23 sec
30 6586 5.57 MB/sec execute 24 sec
30 6788 6.49 MB/sec execute 25 sec
30 7070 8.13 MB/sec execute 26 sec
30 7355 9.76 MB/sec execute 27 sec
30 7684 10.88 MB/sec execute 28 sec
30 7965 12.52 MB/sec execute 29 sec
30 8027 12.57 MB/sec execute 30 sec
30 8027 12.16 MB/sec execute 31 sec
30 8027 11.78 MB/sec execute 32 sec
30 8027 11.42 MB/sec execute 33 sec
30 8081 11.46 MB/sec execute 34 sec
30 8234 11.74 MB/sec execute 35 sec
30 8402 12.12 MB/sec execute 36 sec
30 8680 13.02 MB/sec execute 37 sec
30 8911 13.77 MB/sec execute 38 sec
30 9069 14.05 MB/sec execute 39 sec
30 9396 15.01 MB/sec execute 40 sec
30 9636 15.58 MB/sec execute 41 sec
30 9775 15.73 MB/sec execute 42 sec
30 9870 15.74 MB/sec execute 43 sec
30 9934 12.59 MB/sec execute 54 sec
30 9934 12.36 MB/sec execute 55 sec
30 9934 12.14 MB/sec execute 56 sec
30 9934 11.93 MB/sec execute 57 sec
30 9934 11.72 MB/sec execute 58 sec
30 9934 11.53 MB/sec execute 59 sec
30 9934 11.34 MB/sec execute 60 sec
30 9934 11.15 MB/sec execute 61 sec
30 9934 10.97 MB/sec execute 62 sec
30 9934 10.80 MB/sec execute 63 sec
30 10006 10.83 MB/sec execute 64 sec
30 10316 11.34 MB/sec execute 65 sec
30 10584 11.96 MB/sec execute 66 sec
30 10871 12.60 MB/sec execute 67 sec
30 11147 13.17 MB/sec execute 68 sec
30 11383 13.43 MB/sec execute 69 sec
30 11676 13.82 MB/sec execute 70 sec
30 11896 14.12 MB/sec execute 71 sec
30 12149 14.53 MB/sec execute 72 sec
30 12234 14.49 MB/sec execute 73 sec
30 12273 14.44 MB/sec execute 74 sec
30 12399 14.55 MB/sec execute 75 sec
30 12410 14.37 MB/sec execute 76 sec
30 12476 14.28 MB/sec execute 77 sec
30 12758 14.73 MB/sec execute 78 sec
30 13081 15.17 MB/sec execute 79 sec
30 13393 15.60 MB/sec execute 80 sec
30 13701 16.05 MB/sec execute 81 sec
30 14015 16.43 MB/sec execute 82 sec
30 14291 16.74 MB/sec execute 83 sec
30 14327 16.61 MB/sec execute 84 sec
30 14469 15.54 MB/sec execute 92 sec
30 14498 15.40 MB/sec execute 93 sec
30 14498 15.24 MB/sec execute 94 sec
30 14498 15.08 MB/sec execute 95 sec
30 14531 14.95 MB/sec execute 96 sec
30 14736 15.17 MB/sec execute 97 sec
30 14880 15.22 MB/sec execute 98 sec
30 15007 15.30 MB/sec execute 99 sec
30 15159 15.39 MB/sec execute 100 sec
30 15237 15.37 MB/sec execute 101 sec
30 15342 15.42 MB/sec execute 102 sec
30 15567 15.56 MB/sec execute 103 sec
30 15879 15.89 MB/sec execute 104 sec
30 16075 16.11 MB/sec execute 105 sec
30 16323 16.31 MB/sec execute 106 sec
30 16556 16.49 MB/sec execute 107 sec
30 16817 16.74 MB/sec execute 108 sec
30 17654 17.09 MB/sec execute 114 sec
30 17766 17.10 MB/sec execute 115 sec
30 17856 17.03 MB/sec execute 116 sec
30 17889 16.93 MB/sec execute 117 sec
30 17959 16.84 MB/sec execute 118 sec
30 18024 16.77 MB/sec execute 119 sec
30 18036 16.66 MB/sec execute 120 sec
30 18036 16.53 MB/sec execute 121 sec
30 18067 16.45 MB/sec execute 122 sec
30 18179 14.64 MB/sec execute 138 sec
30 19236 15.07 MB/sec execute 145 sec
30 19498 15.27 MB/sec execute 146 sec
30 19687 15.41 MB/sec execute 147 sec
30 19928 15.50 MB/sec execute 148 sec
30 20154 15.70 MB/sec execute 149 sec
30 20341 15.79 MB/sec execute 150 sec
30 20577 15.91 MB/sec execute 151 sec
30 20813 16.14 MB/sec execute 152 sec
30 21060 16.32 MB/sec execute 153 sec
30 21308 16.47 MB/sec execute 154 sec
30 21542 16.64 MB/sec execute 155 sec
30 21872 16.63 MB/sec execute 158 sec
30 22137 16.76 MB/sec execute 159 sec
30 22206 16.69 MB/sec execute 160 sec
30 22330 16.68 MB/sec execute 161 sec
30 22472 16.73 MB/sec execute 162 sec
30 22529 16.68 MB/sec execute 163 sec
30 22541 16.60 MB/sec execute 164 sec
30 22541 16.43 MB/sec execute 166 sec
30 22541 16.33 MB/sec execute 167 sec
30 22541 16.23 MB/sec execute 168 sec
30 22541 16.14 MB/sec execute 169 sec
30 22541 16.04 MB/sec execute 170 sec
30 22556 15.97 MB/sec execute 171 sec
30 22689 16.01 MB/sec execute 172 sec
30 23014 16.21 MB/sec execute 173 sec
30 23191 16.30 MB/sec execute 174 sec
30 23423 16.42 MB/sec execute 175 sec
30 23680 16.55 MB/sec execute 176 sec
30 23846 16.60 MB/sec execute 177 sec
30 23965 16.61 MB/sec execute 178 sec
30 24135 16.69 MB/sec execute 179 sec
30 24259 16.71 MB/sec execute 180 sec
30 24481 16.83 MB/sec execute 181 sec
30 24729 16.96 MB/sec execute 182 sec
30 24991 17.11 MB/sec execute 183 sec
30 25281 17.31 MB/sec execute 184 sec
30 25579 17.46 MB/sec execute 185 sec
30 25838 17.62 MB/sec execute 186 sec
30 25903 17.57 MB/sec execute 187 sec
30 26263 17.35 MB/sec execute 192 sec
30 26350 17.33 MB/sec execute 193 sec
30 26371 17.25 MB/sec execute 194 sec
30 26569 17.34 MB/sec execute 195 sec
30 26892 17.50 MB/sec execute 196 sec
30 27050 17.55 MB/sec execute 197 sec
30 27088 17.49 MB/sec execute 198 sec
30 27135 17.43 MB/sec execute 199 sec
30 27151 17.37 MB/sec execute 200 sec
30 27151 17.28 MB/sec execute 201 sec
30 27157 17.21 MB/sec execute 202 sec
30 27300 17.23 MB/sec execute 203 sec
30 27628 17.41 MB/sec execute 204 sec
30 27928 17.55 MB/sec execute 205 sec
30 28244 17.72 MB/sec execute 206 sec
30 28571 17.89 MB/sec execute 207 sec
30 28897 18.05 MB/sec execute 208 sec
30 29208 18.21 MB/sec execute 209 sec
30 29535 18.38 MB/sec execute 210 sec
30 29785 18.48 MB/sec execute 211 sec
30 29998 18.56 MB/sec execute 212 sec
30 30092 18.54 MB/sec execute 213 sec
30 30092 18.45 MB/sec execute 214 sec
30 30092 18.37 MB/sec execute 215 sec
30 30092 18.28 MB/sec execute 216 sec
30 30092 18.20 MB/sec execute 217 sec
30 30169 18.19 MB/sec execute 218 sec
30 30258 18.17 MB/sec execute 219 sec
30 30411 18.20 MB/sec execute 220 sec
30 30595 18.28 MB/sec execute 221 sec
30 30662 18.26 MB/sec execute 222 sec
30 30960 18.09 MB/sec execute 227 sec
30 31247 18.20 MB/sec execute 228 sec
30 31316 18.16 MB/sec execute 229 sec
30 31350 18.10 MB/sec execute 230 sec
30 31427 18.08 MB/sec execute 231 sec
30 31497 18.06 MB/sec execute 232 sec
30 31674 18.09 MB/sec execute 233 sec
30 31936 18.18 MB/sec execute 234 sec
30 32246 18.34 MB/sec execute 235 sec
30 32560 18.46 MB/sec execute 236 sec
30 32897 18.56 MB/sec execute 237 sec
30 33175 18.74 MB/sec execute 238 sec
30 33513 18.82 MB/sec execute 239 sec
30 33784 18.99 MB/sec execute 240 sec
30 34125 19.10 MB/sec execute 241 sec
30 37031 20.11 MB/sec execute 253 sec
30 37316 20.23 MB/sec execute 254 sec
30 37603 20.29 MB/sec execute 255 sec
30 37918 20.41 MB/sec execute 256 sec
30 38232 20.47 MB/sec execute 257 sec
30 38526 20.58 MB/sec execute 258 sec
30 38718 20.64 MB/sec execute 259 sec
30 38718 20.56 MB/sec execute 260 sec
30 38718 20.48 MB/sec execute 261 sec
30 38755 20.42 MB/sec execute 262 sec
30 38824 20.39 MB/sec execute 263 sec
30 38824 20.31 MB/sec execute 264 sec
30 38824 20.23 MB/sec execute 265 sec
30 38824 20.16 MB/sec execute 266 sec
30 38824 20.08 MB/sec execute 267 sec
30 38824 20.01 MB/sec execute 268 sec
30 38824 19.93 MB/sec execute 269 sec
30 38824 19.86 MB/sec execute 270 sec
30 38824 19.78 MB/sec execute 271 sec
30 38824 19.71 MB/sec execute 272 sec
30 38824 19.64 MB/sec execute 273 sec
30 38824 19.57 MB/sec execute 274 sec
30 38824 19.50 MB/sec execute 275 sec
30 38824 19.43 MB/sec execute 276 sec
30 38824 19.36 MB/sec execute 277 sec
30 38824 19.29 MB/sec execute 278 sec
30 38824 19.22 MB/sec execute 279 sec
30 38878 19.19 MB/sec execute 280 sec
30 38962 19.15 MB/sec execute 281 sec
30 39021 19.14 MB/sec execute 282 sec
30 39180 18.29 MB/sec execute 297 sec
30 39329 18.35 MB/sec execute 298 sec
30 39422 18.35 MB/sec execute 299 sec
30 39551 18.34 MB/sec execute 300 sec
30 39623 18.32 MB/sec execute 301 sec
30 39689 18.29 MB/sec execute 302 sec
30 39729 18.24 MB/sec execute 303 sec
30 39763 18.20 MB/sec execute 304 sec
30 39766 18.14 MB/sec execute 305 sec
30 39766 18.08 MB/sec execute 306 sec
30 39784 18.03 MB/sec execute 307 sec
30 39884 18.02 MB/sec execute 308 sec
30 40163 18.11 MB/sec execute 309 sec
30 40489 18.22 MB/sec execute 310 sec
30 40727 18.29 MB/sec execute 311 sec
30 40859 18.30 MB/sec execute 312 sec
30 41000 18.31 MB/sec execute 313 sec
30 41057 18.27 MB/sec execute 314 sec
30 41100 18.25 MB/sec execute 315 sec
30 41140 18.21 MB/sec execute 316 sec
30 41186 18.17 MB/sec execute 317 sec
30 41225 18.14 MB/sec execute 318 sec
30 41292 18.11 MB/sec execute 319 sec
30 41540 18.18 MB/sec execute 320 sec
30 41849 18.28 MB/sec execute 321 sec
30 42124 18.37 MB/sec execute 322 sec
30 42341 18.41 MB/sec execute 323 sec
30 42596 18.49 MB/sec execute 324 sec
30 42915 18.59 MB/sec execute 325 sec
30 43192 18.67 MB/sec execute 326 sec
30 43474 18.72 MB/sec execute 327 sec
30 43655 18.48 MB/sec execute 332 sec
30 43679 18.41 MB/sec execute 334 sec
30 43679 18.36 MB/sec execute 335 sec
30 43679 18.30 MB/sec execute 336 sec
30 43679 18.25 MB/sec execute 337 sec
30 43679 18.19 MB/sec execute 338 sec
30 43679 18.14 MB/sec execute 339 sec
30 43679 18.09 MB/sec execute 340 sec
30 43679 18.03 MB/sec execute 341 sec
30 43679 17.98 MB/sec execute 342 sec
30 43679 17.93 MB/sec execute 343 sec
30 43679 17.88 MB/sec execute 344 sec
30 43679 17.82 MB/sec execute 345 sec
30 43679 17.77 MB/sec execute 346 sec
30 43679 17.72 MB/sec execute 347 sec
30 43679 17.67 MB/sec execute 348 sec
30 44430 17.86 MB/sec execute 352 sec
30 44741 17.93 MB/sec execute 353 sec
30 45043 18.03 MB/sec execute 354 sec
30 45343 18.10 MB/sec execute 355 sec
30 45671 18.21 MB/sec execute 356 sec
30 45932 18.28 MB/sec execute 357 sec
30 46248 18.35 MB/sec execute 358 sec
30 46541 18.43 MB/sec execute 359 sec
30 46711 18.47 MB/sec execute 360 sec
30 46909 18.50 MB/sec execute 361 sec
30 47073 18.54 MB/sec execute 362 sec
30 47345 18.42 MB/sec execute 367 sec
30 47438 18.40 MB/sec execute 368 sec
30 47526 18.38 MB/sec execute 369 sec
30 47744 18.43 MB/sec execute 370 sec
30 47817 18.42 MB/sec execute 371 sec
30 47817 18.37 MB/sec execute 372 sec
30 47817 18.32 MB/sec execute 373 sec
30 47817 18.27 MB/sec execute 374 sec
30 47817 18.22 MB/sec execute 375 sec
30 47817 18.17 MB/sec execute 376 sec
30 47817 18.12 MB/sec execute 377 sec
30 47821 18.08 MB/sec execute 378 sec
30 47882 18.05 MB/sec execute 379 sec
30 47930 18.03 MB/sec execute 380 sec
30 48008 18.01 MB/sec execute 381 sec
30 48115 18.01 MB/sec execute 382 sec
30 48377 18.08 MB/sec execute 383 sec
30 48586 18.11 MB/sec execute 384 sec
30 48801 18.16 MB/sec execute 385 sec
30 49112 18.24 MB/sec execute 386 sec
30 49364 18.30 MB/sec execute 387 sec
30 49606 18.37 MB/sec execute 388 sec
30 49817 18.40 MB/sec execute 389 sec
30 50071 18.45 MB/sec execute 390 sec
30 50392 18.52 MB/sec execute 391 sec
30 50665 18.62 MB/sec execute 392 sec
30 50969 18.69 MB/sec execute 393 sec
30 51267 18.76 MB/sec execute 394 sec
30 51576 18.82 MB/sec execute 395 sec
30 51863 18.90 MB/sec execute 396 sec
30 52156 18.96 MB/sec execute 397 sec
30 52441 18.57 MB/sec execute 408 sec
30 52710 18.63 MB/sec execute 409 sec
30 52913 18.67 MB/sec execute 410 sec
30 52963 18.64 MB/sec execute 411 sec
30 52969 18.59 MB/sec execute 412 sec
30 52969 18.55 MB/sec execute 413 sec
30 52969 18.50 MB/sec execute 414 sec
30 53011 18.48 MB/sec execute 415 sec
30 53101 18.47 MB/sec execute 416 sec
30 53166 18.45 MB/sec execute 417 sec
30 53257 18.44 MB/sec execute 418 sec
30 53360 18.44 MB/sec execute 419 sec
30 53471 18.44 MB/sec execute 420 sec
30 53570 18.43 MB/sec execute 421 sec
30 53719 18.44 MB/sec execute 422 sec
30 54034 18.52 MB/sec execute 423 sec
30 54325 18.58 MB/sec execute 424 sec
30 54634 18.65 MB/sec execute 425 sec
30 54910 18.73 MB/sec execute 426 sec
30 55216 18.79 MB/sec execute 427 sec
30 55486 18.86 MB/sec execute 428 sec
30 55768 18.91 MB/sec execute 429 sec
30 56018 18.88 MB/sec execute 431 sec
30 56327 18.45 MB/sec execute 444 sec
30 56419 18.44 MB/sec execute 445 sec
30 56471 18.41 MB/sec execute 446 sec
30 56520 18.39 MB/sec execute 447 sec
30 56520 18.35 MB/sec execute 448 sec
30 56520 18.31 MB/sec execute 449 sec
30 56520 18.27 MB/sec execute 450 sec
30 56520 18.23 MB/sec execute 451 sec
30 56520 18.19 MB/sec execute 452 sec
30 56520 18.15 MB/sec execute 453 sec
30 56520 18.11 MB/sec execute 454 sec
30 56520 18.07 MB/sec execute 455 sec
30 56520 18.03 MB/sec execute 456 sec
30 56520 17.99 MB/sec execute 457 sec
30 56640 17.99 MB/sec execute 458 sec
30 56751 17.99 MB/sec execute 459 sec
30 57011 18.04 MB/sec execute 460 sec
30 57116 18.03 MB/sec execute 461 sec
30 57206 18.03 MB/sec execute 462 sec
30 57325 18.04 MB/sec execute 463 sec
30 57473 18.04 MB/sec execute 464 sec
30 57552 18.04 MB/sec execute 465 sec
30 57623 18.03 MB/sec execute 466 sec
30 57883 18.07 MB/sec execute 467 sec
30 57914 18.05 MB/sec execute 468 sec
30 57967 18.03 MB/sec execute 469 sec
30 58052 18.03 MB/sec execute 470 sec
30 58147 18.00 MB/sec execute 471 sec
30 58195 17.98 MB/sec execute 472 sec
30 58221 17.95 MB/sec execute 473 sec
30 58225 17.91 MB/sec execute 474 sec
30 58352 17.56 MB/sec execute 485 sec
30 58474 17.58 MB/sec execute 486 sec
30 58630 17.60 MB/sec execute 487 sec
30 58712 17.59 MB/sec execute 488 sec
30 58982 17.64 MB/sec execute 489 sec
30 59303 17.69 MB/sec execute 490 sec
30 59613 17.76 MB/sec execute 491 sec
30 59885 17.82 MB/sec execute 492 sec
30 60158 17.90 MB/sec execute 493 sec
30 60464 17.94 MB/sec execute 494 sec
30 60770 18.00 MB/sec execute 495 sec
30 61069 18.06 MB/sec execute 496 sec
30 61384 18.12 MB/sec execute 497 sec
30 61687 18.19 MB/sec execute 498 sec
30 61905 18.24 MB/sec execute 499 sec
30 62204 18.25 MB/sec execute 501 sec
30 62480 18.32 MB/sec execute 502 sec
30 62776 18.40 MB/sec execute 503 sec
30 63086 18.43 MB/sec execute 504 sec
30 63402 18.48 MB/sec execute 505 sec
30 63412 18.45 MB/sec execute 506 sec
30 63726 18.27 MB/sec execute 513 sec
30 63727 18.24 MB/sec execute 514 sec
30 63900 18.25 MB/sec execute 515 sec
30 64195 18.32 MB/sec execute 516 sec
30 64520 18.38 MB/sec execute 517 sec
30 64827 18.44 MB/sec execute 518 sec
30 65161 18.50 MB/sec execute 519 sec
30 65456 18.57 MB/sec execute 520 sec
30 65696 18.61 MB/sec execute 521 sec
30 65995 18.66 MB/sec execute 522 sec
30 66315 18.72 MB/sec execute 523 sec
30 66594 18.79 MB/sec execute 524 sec
30 66867 18.86 MB/sec execute 525 sec
30 67175 18.90 MB/sec execute 526 sec
30 67401 18.92 MB/sec execute 527 sec
30 67613 18.96 MB/sec execute 528 sec
30 67872 19.00 MB/sec execute 529 sec
30 68048 19.04 MB/sec execute 530 sec
30 68234 19.05 MB/sec execute 531 sec
30 68431 19.08 MB/sec execute 532 sec
30 68685 19.11 MB/sec execute 533 sec
30 68994 19.15 MB/sec execute 534 sec
30 69074 19.14 MB/sec execute 535 sec
30 69134 19.12 MB/sec execute 536 sec
30 69434 19.17 MB/sec execute 537 sec
30 69551 19.16 MB/sec execute 538 sec
30 69551 19.13 MB/sec execute 539 sec
30 69551 19.09 MB/sec execute 540 sec
30 69551 19.06 MB/sec execute 541 sec
30 69551 19.02 MB/sec execute 542 sec
30 69551 18.99 MB/sec execute 543 sec
30 69782 18.61 MB/sec execute 557 sec
30 70043 18.66 MB/sec execute 558 sec
30 70329 18.71 MB/sec execute 559 sec
30 70594 18.74 MB/sec execute 560 sec
30 70873 18.78 MB/sec execute 561 sec
30 71176 18.83 MB/sec execute 562 sec
30 71444 18.87 MB/sec execute 563 sec
30 71722 18.93 MB/sec execute 564 sec
30 72012 18.96 MB/sec execute 565 sec
30 72257 19.01 MB/sec execute 566 sec
30 72509 19.05 MB/sec execute 567 sec
30 72745 19.09 MB/sec execute 568 sec
30 72949 19.12 MB/sec execute 569 sec
30 73154 18.85 MB/sec execute 580 sec
30 73184 18.82 MB/sec execute 581 sec
30 73195 18.79 MB/sec execute 582 sec
30 73506 18.84 MB/sec execute 583 sec
30 73681 18.86 MB/sec execute 584 sec
30 73774 18.86 MB/sec execute 585 sec
30 73905 18.85 MB/sec execute 586 sec
30 74020 18.86 MB/sec execute 587 sec
30 74103 18.85 MB/sec execute 588 sec
30 74206 18.85 MB/sec execute 589 sec
30 74337 18.85 MB/sec execute 590 sec
30 74442 18.85 MB/sec execute 591 sec
30 74581 18.85 MB/sec execute 592 sec
30 74801 18.89 MB/sec execute 593 sec
30 75102 18.93 MB/sec execute 594 sec
30 75360 18.97 MB/sec execute 595 sec
30 75676 19.01 MB/sec execute 596 sec
30 75916 19.06 MB/sec execute 597 sec
30 76208 19.09 MB/sec execute 598 sec
30 76497 19.15 MB/sec execute 599 sec
30 76721 19.18 MB/sec execute 600 sec
30 77016 19.22 MB/sec cleanup 601 sec
30 77016 19.19 MB/sec cleanup 602 sec
30 77016 18.84 MB/sec cleanup 612 sec
30 77016 18.69 MB/sec cleanup 617 sec
30 77016 18.64 MB/sec cleanup 619 sec
Throughput 19.219 MB/sec 30 procs
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-04 15:20 ` Frederic Weisbecker
@ 2009-06-04 19:07 ` Andrew Morton
2009-06-04 19:13 ` Frederic Weisbecker
2009-06-05 1:14 ` Zhang, Yanmin
1 sibling, 1 reply; 70+ messages in thread
From: Andrew Morton @ 2009-06-04 19:07 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Jens Axboe, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, jack, yanmin_zhang, richard, damien.wyart
On Thu, 4 Jun 2009 17:20:44 +0200 Frederic Weisbecker <fweisbec@gmail.com> wrote:
> I've just tested it on UP in a single disk.
I must say, I'm stunned at the amount of testing which people are
performing on this patchset. Normally when someone sends out a
patchset it just sort of lands with a dull thud.
I'm not sure what Jens did right to make all this happen, but thanks!
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-04 19:07 ` Andrew Morton
@ 2009-06-04 19:13 ` Frederic Weisbecker
2009-06-04 19:50 ` Jens Axboe
0 siblings, 1 reply; 70+ messages in thread
From: Frederic Weisbecker @ 2009-06-04 19:13 UTC (permalink / raw)
To: Andrew Morton
Cc: Jens Axboe, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, jack, yanmin_zhang, richard, damien.wyart
On Thu, Jun 04, 2009 at 12:07:26PM -0700, Andrew Morton wrote:
> On Thu, 4 Jun 2009 17:20:44 +0200 Frederic Weisbecker <fweisbec@gmail.com> wrote:
>
> > I've just tested it on UP in a single disk.
>
> I must say, I'm stunned at the amount of testing which people are
> performing on this patchset. Normally when someone sends out a
> patchset it just sort of lands with a dull thud.
>
> I'm not sure what Jens did right to make all this happen, but thanks!
I don't know how he did either. I was reading theses patches and *something*
pushed me to my testbox, and then I tested...
Jens, how do you do that?
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-04 19:13 ` Frederic Weisbecker
@ 2009-06-04 19:50 ` Jens Axboe
2009-06-04 20:10 ` Jens Axboe
2009-06-04 21:37 ` Frederic Weisbecker
0 siblings, 2 replies; 70+ messages in thread
From: Jens Axboe @ 2009-06-04 19:50 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Andrew Morton, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, jack, yanmin_zhang, richard, damien.wyart
On Thu, Jun 04 2009, Frederic Weisbecker wrote:
> On Thu, Jun 04, 2009 at 12:07:26PM -0700, Andrew Morton wrote:
> > On Thu, 4 Jun 2009 17:20:44 +0200 Frederic Weisbecker <fweisbec@gmail.com> wrote:
> >
> > > I've just tested it on UP in a single disk.
> >
> > I must say, I'm stunned at the amount of testing which people are
> > performing on this patchset. Normally when someone sends out a
> > patchset it just sort of lands with a dull thud.
> >
> > I'm not sure what Jens did right to make all this happen, but thanks!
>
>
> I don't know how he did either. I was reading theses patches and *something*
> pushed me to my testbox, and then I tested...
>
> Jens, how do you do that?
Heh, not sure :-)
But indeed, thanks for the testing. It looks quite interesting. I'm
guessing it probably has to do with who ends up doing the balancing and
that the flusher threads block, it may change the picture a bit. So it
may just be that it'll require a few vm tweaks. I'll definitely look
into it and try and reproduce your results.
Did you run it a 2nd time on each drive and check if the results were
(approximately) consistent on the two drives?
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-04 19:50 ` Jens Axboe
@ 2009-06-04 20:10 ` Jens Axboe
2009-06-04 22:34 ` Frederic Weisbecker
2009-06-04 21:37 ` Frederic Weisbecker
1 sibling, 1 reply; 70+ messages in thread
From: Jens Axboe @ 2009-06-04 20:10 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Andrew Morton, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, jack, yanmin_zhang, richard, damien.wyart
On Thu, Jun 04 2009, Jens Axboe wrote:
> On Thu, Jun 04 2009, Frederic Weisbecker wrote:
> > On Thu, Jun 04, 2009 at 12:07:26PM -0700, Andrew Morton wrote:
> > > On Thu, 4 Jun 2009 17:20:44 +0200 Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > >
> > > > I've just tested it on UP in a single disk.
> > >
> > > I must say, I'm stunned at the amount of testing which people are
> > > performing on this patchset. Normally when someone sends out a
> > > patchset it just sort of lands with a dull thud.
> > >
> > > I'm not sure what Jens did right to make all this happen, but thanks!
> >
> >
> > I don't know how he did either. I was reading theses patches and *something*
> > pushed me to my testbox, and then I tested...
> >
> > Jens, how do you do that?
>
> Heh, not sure :-)
>
> But indeed, thanks for the testing. It looks quite interesting. I'm
> guessing it probably has to do with who ends up doing the balancing and
> that the flusher threads block, it may change the picture a bit. So it
> may just be that it'll require a few vm tweaks. I'll definitely look
> into it and try and reproduce your results.
>
> Did you run it a 2nd time on each drive and check if the results were
> (approximately) consistent on the two drives?
each partition... What IO scheduler did you use on hda?
The main difference with this test case is that before we had two super
blocks, each with lists of dirty inodes. pdflush would attack those. Now
we have both the inodes from the two supers on a single set of lists on
the bdi. So either we have some ordering issue there (which is causing
the unfairness), or something else is.
So perhaps you can try with noop on hda to see if that changes the
picture?
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-04 19:50 ` Jens Axboe
2009-06-04 20:10 ` Jens Axboe
@ 2009-06-04 21:37 ` Frederic Weisbecker
1 sibling, 0 replies; 70+ messages in thread
From: Frederic Weisbecker @ 2009-06-04 21:37 UTC (permalink / raw)
To: Jens Axboe
Cc: Andrew Morton, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, jack, yanmin_zhang, richard, damien.wyart
On Thu, Jun 04, 2009 at 09:50:13PM +0200, Jens Axboe wrote:
> On Thu, Jun 04 2009, Frederic Weisbecker wrote:
> > On Thu, Jun 04, 2009 at 12:07:26PM -0700, Andrew Morton wrote:
> > > On Thu, 4 Jun 2009 17:20:44 +0200 Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > >
> > > > I've just tested it on UP in a single disk.
> > >
> > > I must say, I'm stunned at the amount of testing which people are
> > > performing on this patchset. Normally when someone sends out a
> > > patchset it just sort of lands with a dull thud.
> > >
> > > I'm not sure what Jens did right to make all this happen, but thanks!
> >
> >
> > I don't know how he did either. I was reading theses patches and *something*
> > pushed me to my testbox, and then I tested...
> >
> > Jens, how do you do that?
>
> Heh, not sure :-)
>
> But indeed, thanks for the testing. It looks quite interesting. I'm
> guessing it probably has to do with who ends up doing the balancing and
> that the flusher threads block, it may change the picture a bit. So it
> may just be that it'll require a few vm tweaks. I'll definitely look
> into it and try and reproduce your results.
>
> Did you run it a 2nd time on each drive and check if the results were
> (approximately) consistent on the two drives?
Another snapshot, only with bdi-writeback this time.
http://kernel.org/pub/linux/kernel/people/frederic/dbench2.pdf
Looks like the same effect but the difference is more quiet this time.
I guess there is a good bunch of entropy inside, so it's hard to tell :)
I'll test with no op scheduler.
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-04 20:10 ` Jens Axboe
@ 2009-06-04 22:34 ` Frederic Weisbecker
2009-06-05 19:15 ` Jens Axboe
0 siblings, 1 reply; 70+ messages in thread
From: Frederic Weisbecker @ 2009-06-04 22:34 UTC (permalink / raw)
To: Jens Axboe
Cc: Andrew Morton, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, jack, yanmin_zhang, richard, damien.wyart
On Thu, Jun 04, 2009 at 10:10:12PM +0200, Jens Axboe wrote:
> On Thu, Jun 04 2009, Jens Axboe wrote:
> > On Thu, Jun 04 2009, Frederic Weisbecker wrote:
> > > On Thu, Jun 04, 2009 at 12:07:26PM -0700, Andrew Morton wrote:
> > > > On Thu, 4 Jun 2009 17:20:44 +0200 Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > > >
> > > > > I've just tested it on UP in a single disk.
> > > >
> > > > I must say, I'm stunned at the amount of testing which people are
> > > > performing on this patchset. Normally when someone sends out a
> > > > patchset it just sort of lands with a dull thud.
> > > >
> > > > I'm not sure what Jens did right to make all this happen, but thanks!
> > >
> > >
> > > I don't know how he did either. I was reading theses patches and *something*
> > > pushed me to my testbox, and then I tested...
> > >
> > > Jens, how do you do that?
> >
> > Heh, not sure :-)
> >
> > But indeed, thanks for the testing. It looks quite interesting. I'm
> > guessing it probably has to do with who ends up doing the balancing and
> > that the flusher threads block, it may change the picture a bit. So it
> > may just be that it'll require a few vm tweaks. I'll definitely look
> > into it and try and reproduce your results.
> >
> > Did you run it a 2nd time on each drive and check if the results were
> > (approximately) consistent on the two drives?
>
> each partition... What IO scheduler did you use on hda?
CFQ.
> The main difference with this test case is that before we had two super
> blocks, each with lists of dirty inodes. pdflush would attack those. Now
> we have both the inodes from the two supers on a single set of lists on
> the bdi. So either we have some ordering issue there (which is causing
> the unfairness), or something else is.
Yeah.
But although these flushers are per-bdi, with a single list (well, three)
of dirty inodes, it looks like the writeback is still performed per
superblock, I mean the bdi work gives the concerned superblock
and the bdi list is iterated in generic_sync_wb_inodes() which
only processes the inodes for the given superblock. So there is
a bit of a per superblock serialization there and....
(Note, the above is just written for myself in the secret hope I could
understand better these patches by writing my brainstorming...)
> So perhaps you can try with noop on hda to see if that changes the
> picture?
The result with noop is even more impressive.
See: http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop.pdf
Also a comparison, noop with pdflush against noop with bdi writeback:
http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop-cmp.pdf
Frederic.
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-04 15:20 ` Frederic Weisbecker
@ 2009-06-05 1:14 ` Zhang, Yanmin
2009-06-05 1:14 ` Zhang, Yanmin
1 sibling, 0 replies; 70+ messages in thread
From: Zhang, Yanmin @ 2009-06-05 1:14 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Jens Axboe, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, akpm, jack, richard, damien.wyart
On Thu, 2009-06-04 at 17:20 +0200, Frederic Weisbecker wrote:
> Hi,
>
>
> On Thu, May 28, 2009 at 01:46:33PM +0200, Jens Axboe wrote:
> > Hi,
> >
> > Here's the 9th version of the writeback patches. Changes since v8:
> I've just tested it on UP in a single disk.
>
> I've run two parallels dbench tests on two partitions and
> tried it with this patch and without.
I also tested V9 with multiple-dbench workload by starting multiple
dbench tasks and every task has 4 processes to do I/O on one partition (file
system). Mostly I use JBODs which have 7/11/13 disks.
I didn't find result regression between vanilla and V9 kernel on this workload.
>
> I used 30 proc each during 600 secs.
>
> You can see the result in attachment.
> And also there:
>
> http://kernel.org/pub/linux/kernel/people/frederic/dbench.pdf
> http://kernel.org/pub/linux/kernel/people/frederic/bdi-writeback-hda1.log
> http://kernel.org/pub/linux/kernel/people/frederic/bdi-writeback-hda3.log
> http://kernel.org/pub/linux/kernel/people/frederic/pdflush-hda1.log
> http://kernel.org/pub/linux/kernel/people/frederic/pdflush-hda3.log
>
>
> As you can see, bdi writeback is faster than pdflush on hda1 and slower
> on hda3. But, well that's not the point.
>
> What I can observe here is the difference on the standard deviation
> for the rate between two parallel writers on a same device (but
> two different partitions, then superblocks).
>
> With pdflush, the distributed rate is much better balanced than
> with bdi writeback in a single device.
>
> I'm not sure why. Is there something in these patches that makes
> several bdi flusher threads for a same bdi not well balanced
> between them?
>
> Frederic.
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
@ 2009-06-05 1:14 ` Zhang, Yanmin
0 siblings, 0 replies; 70+ messages in thread
From: Zhang, Yanmin @ 2009-06-05 1:14 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Jens Axboe, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, akpm, jack, richard, damien.wyart
On Thu, 2009-06-04 at 17:20 +0200, Frederic Weisbecker wrote:
> Hi,
>
>
> On Thu, May 28, 2009 at 01:46:33PM +0200, Jens Axboe wrote:
> > Hi,
> >
> > Here's the 9th version of the writeback patches. Changes since v8:
> I've just tested it on UP in a single disk.
>
> I've run two parallels dbench tests on two partitions and
> tried it with this patch and without.
I also tested V9 with multiple-dbench workload by starting multiple
dbench tasks and every task has 4 processes to do I/O on one partition (file
system). Mostly I use JBODs which have 7/11/13 disks.
I didn't find result regression between vanilla and V9 kernel on this workload.
>
> I used 30 proc each during 600 secs.
>
> You can see the result in attachment.
> And also there:
>
> http://kernel.org/pub/linux/kernel/people/frederic/dbench.pdf
> http://kernel.org/pub/linux/kernel/people/frederic/bdi-writeback-hda1.log
> http://kernel.org/pub/linux/kernel/people/frederic/bdi-writeback-hda3.log
> http://kernel.org/pub/linux/kernel/people/frederic/pdflush-hda1.log
> http://kernel.org/pub/linux/kernel/people/frederic/pdflush-hda3.log
>
>
> As you can see, bdi writeback is faster than pdflush on hda1 and slower
> on hda3. But, well that's not the point.
>
> What I can observe here is the difference on the standard deviation
> for the rate between two parallel writers on a same device (but
> two different partitions, then superblocks).
>
> With pdflush, the distributed rate is much better balanced than
> with bdi writeback in a single device.
>
> I'm not sure why. Is there something in these patches that makes
> several bdi flusher threads for a same bdi not well balanced
> between them?
>
> Frederic.
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-04 22:34 ` Frederic Weisbecker
@ 2009-06-05 19:15 ` Jens Axboe
2009-06-05 21:14 ` Jan Kara
2009-06-06 0:35 ` Frederic Weisbecker
0 siblings, 2 replies; 70+ messages in thread
From: Jens Axboe @ 2009-06-05 19:15 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Andrew Morton, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, jack, yanmin_zhang, richard, damien.wyart
On Fri, Jun 05 2009, Frederic Weisbecker wrote:
> On Thu, Jun 04, 2009 at 10:10:12PM +0200, Jens Axboe wrote:
> > On Thu, Jun 04 2009, Jens Axboe wrote:
> > > On Thu, Jun 04 2009, Frederic Weisbecker wrote:
> > > > On Thu, Jun 04, 2009 at 12:07:26PM -0700, Andrew Morton wrote:
> > > > > On Thu, 4 Jun 2009 17:20:44 +0200 Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > > > >
> > > > > > I've just tested it on UP in a single disk.
> > > > >
> > > > > I must say, I'm stunned at the amount of testing which people are
> > > > > performing on this patchset. Normally when someone sends out a
> > > > > patchset it just sort of lands with a dull thud.
> > > > >
> > > > > I'm not sure what Jens did right to make all this happen, but thanks!
> > > >
> > > >
> > > > I don't know how he did either. I was reading theses patches and *something*
> > > > pushed me to my testbox, and then I tested...
> > > >
> > > > Jens, how do you do that?
> > >
> > > Heh, not sure :-)
> > >
> > > But indeed, thanks for the testing. It looks quite interesting. I'm
> > > guessing it probably has to do with who ends up doing the balancing and
> > > that the flusher threads block, it may change the picture a bit. So it
> > > may just be that it'll require a few vm tweaks. I'll definitely look
> > > into it and try and reproduce your results.
> > >
> > > Did you run it a 2nd time on each drive and check if the results were
> > > (approximately) consistent on the two drives?
> >
> > each partition... What IO scheduler did you use on hda?
>
>
> CFQ.
>
>
> > The main difference with this test case is that before we had two super
> > blocks, each with lists of dirty inodes. pdflush would attack those. Now
> > we have both the inodes from the two supers on a single set of lists on
> > the bdi. So either we have some ordering issue there (which is causing
> > the unfairness), or something else is.
>
>
> Yeah.
> But although these flushers are per-bdi, with a single list (well, three)
> of dirty inodes, it looks like the writeback is still performed per
> superblock, I mean the bdi work gives the concerned superblock
> and the bdi list is iterated in generic_sync_wb_inodes() which
> only processes the inodes for the given superblock. So there is
> a bit of a per superblock serialization there and....
But in most cases sb == NULL, which means that the writeback does not
care. It should only pass in a valid sb if someone explicitly wants to
sync that sb.
But the way that the lists are organized now does definitely open some
windows of unfairness for a test like yours. It's on the top of the
investigate list for monday.
> > So perhaps you can try with noop on hda to see if that changes the
> > picture?
>
>
>
> The result with noop is even more impressive.
>
> See: http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop.pdf
>
> Also a comparison, noop with pdflush against noop with bdi writeback:
>
> http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop-cmp.pdf
OK, so things aren't exactly peachy here to begin with. It may not
actually BE an issue, or at least now a new one, but that doesn't mean
that we should not attempt to quantify the impact.
How are you starting these runs? With a test like this, even a small
difference in start time can make a huge difference.
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-05 1:14 ` Zhang, Yanmin
(?)
@ 2009-06-05 19:16 ` Jens Axboe
-1 siblings, 0 replies; 70+ messages in thread
From: Jens Axboe @ 2009-06-05 19:16 UTC (permalink / raw)
To: Zhang, Yanmin
Cc: Frederic Weisbecker, linux-kernel, linux-fsdevel, tytso,
chris.mason, david, hch, akpm, jack, richard, damien.wyart
On Fri, Jun 05 2009, Zhang, Yanmin wrote:
> On Thu, 2009-06-04 at 17:20 +0200, Frederic Weisbecker wrote:
> > Hi,
> >
> >
> > On Thu, May 28, 2009 at 01:46:33PM +0200, Jens Axboe wrote:
> > > Hi,
> > >
> > > Here's the 9th version of the writeback patches. Changes since v8:
>
> > I've just tested it on UP in a single disk.
> >
> > I've run two parallels dbench tests on two partitions and
> > tried it with this patch and without.
> I also tested V9 with multiple-dbench workload by starting multiple
> dbench tasks and every task has 4 processes to do I/O on one partition (file
> system). Mostly I use JBODs which have 7/11/13 disks.
>
> I didn't find result regression between ???vanilla and V9 kernel on
> this workload.
Ah that's good, thanks for that result as well :-)
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-05 19:15 ` Jens Axboe
@ 2009-06-05 21:14 ` Jan Kara
2009-06-06 0:18 ` Chris Mason
2009-06-06 1:00 ` Frederic Weisbecker
2009-06-06 0:35 ` Frederic Weisbecker
1 sibling, 2 replies; 70+ messages in thread
From: Jan Kara @ 2009-06-05 21:14 UTC (permalink / raw)
To: Jens Axboe
Cc: Frederic Weisbecker, Andrew Morton, linux-kernel, linux-fsdevel,
tytso, chris.mason, david, hch, jack, yanmin_zhang, richard,
damien.wyart
On Fri 05-06-09 21:15:28, Jens Axboe wrote:
> On Fri, Jun 05 2009, Frederic Weisbecker wrote:
> > The result with noop is even more impressive.
> >
> > See: http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop.pdf
> >
> > Also a comparison, noop with pdflush against noop with bdi writeback:
> >
> > http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop-cmp.pdf
>
> OK, so things aren't exactly peachy here to begin with. It may not
> actually BE an issue, or at least now a new one, but that doesn't mean
> that we should not attempt to quantify the impact.
What looks interesting is also the overall throughput. With pdflush we
get to 2.5 MB/s + 26 MB/s while with per-bdi we get to 2.7 MB/s + 13 MB/s.
So per-bdi seems to be *more* fair but throughput suffers a lot (which
might be inevitable due to incurred seeks).
Frederic, how much does dbench achieve for you just on one partition
(test both consecutively if possible) with as many threads as have those
two dbench instances together? Thanks.
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-05 21:14 ` Jan Kara
@ 2009-06-06 0:18 ` Chris Mason
2009-06-06 0:23 ` Jan Kara
2009-06-06 1:00 ` Frederic Weisbecker
1 sibling, 1 reply; 70+ messages in thread
From: Chris Mason @ 2009-06-06 0:18 UTC (permalink / raw)
To: Jan Kara
Cc: Jens Axboe, Frederic Weisbecker, Andrew Morton, linux-kernel,
linux-fsdevel, tytso, david, hch, yanmin_zhang, richard,
damien.wyart
On Fri, Jun 05, 2009 at 11:14:38PM +0200, Jan Kara wrote:
> On Fri 05-06-09 21:15:28, Jens Axboe wrote:
> > On Fri, Jun 05 2009, Frederic Weisbecker wrote:
> > > The result with noop is even more impressive.
> > >
> > > See: http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop.pdf
> > >
> > > Also a comparison, noop with pdflush against noop with bdi writeback:
> > >
> > > http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop-cmp.pdf
> >
> > OK, so things aren't exactly peachy here to begin with. It may not
> > actually BE an issue, or at least now a new one, but that doesn't mean
> > that we should not attempt to quantify the impact.
> What looks interesting is also the overall throughput. With pdflush we
> get to 2.5 MB/s + 26 MB/s while with per-bdi we get to 2.7 MB/s + 13 MB/s.
> So per-bdi seems to be *more* fair but throughput suffers a lot (which
> might be inevitable due to incurred seeks).
> Frederic, how much does dbench achieve for you just on one partition
> (test both consecutively if possible) with as many threads as have those
> two dbench instances together? Thanks.
Is the graph showing us dbench tput or disk tput? I'm assuming it is
disk tput, so bdi may just be writing less?
-chris
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-06 0:18 ` Chris Mason
@ 2009-06-06 0:23 ` Jan Kara
0 siblings, 0 replies; 70+ messages in thread
From: Jan Kara @ 2009-06-06 0:23 UTC (permalink / raw)
To: Chris Mason, Jan Kara, Jens Axboe, Frederic Weisbecker,
Andrew Morton, linux-kernel, linux-fsdevel, tytso, david, hch,
yanmin_zhang, richard, damien.wyart
On Fri 05-06-09 20:18:15, Chris Mason wrote:
> On Fri, Jun 05, 2009 at 11:14:38PM +0200, Jan Kara wrote:
> > On Fri 05-06-09 21:15:28, Jens Axboe wrote:
> > > On Fri, Jun 05 2009, Frederic Weisbecker wrote:
> > > > The result with noop is even more impressive.
> > > >
> > > > See: http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop.pdf
> > > >
> > > > Also a comparison, noop with pdflush against noop with bdi writeback:
> > > >
> > > > http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop-cmp.pdf
> > >
> > > OK, so things aren't exactly peachy here to begin with. It may not
> > > actually BE an issue, or at least now a new one, but that doesn't mean
> > > that we should not attempt to quantify the impact.
> > What looks interesting is also the overall throughput. With pdflush we
> > get to 2.5 MB/s + 26 MB/s while with per-bdi we get to 2.7 MB/s + 13 MB/s.
> > So per-bdi seems to be *more* fair but throughput suffers a lot (which
> > might be inevitable due to incurred seeks).
> > Frederic, how much does dbench achieve for you just on one partition
> > (test both consecutively if possible) with as many threads as have those
> > two dbench instances together? Thanks.
>
> Is the graph showing us dbench tput or disk tput? I'm assuming it is
> disk tput, so bdi may just be writing less?
Good, question. I was assuming dbench throughput :).
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
@ 2009-06-06 0:23 ` Jan Kara
0 siblings, 0 replies; 70+ messages in thread
From: Jan Kara @ 2009-06-06 0:23 UTC (permalink / raw)
To: Chris Mason, Jan Kara, Jens Axboe, Frederic Weisbecker, Andrew Morton
On Fri 05-06-09 20:18:15, Chris Mason wrote:
> On Fri, Jun 05, 2009 at 11:14:38PM +0200, Jan Kara wrote:
> > On Fri 05-06-09 21:15:28, Jens Axboe wrote:
> > > On Fri, Jun 05 2009, Frederic Weisbecker wrote:
> > > > The result with noop is even more impressive.
> > > >
> > > > See: http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop.pdf
> > > >
> > > > Also a comparison, noop with pdflush against noop with bdi writeback:
> > > >
> > > > http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop-cmp.pdf
> > >
> > > OK, so things aren't exactly peachy here to begin with. It may not
> > > actually BE an issue, or at least now a new one, but that doesn't mean
> > > that we should not attempt to quantify the impact.
> > What looks interesting is also the overall throughput. With pdflush we
> > get to 2.5 MB/s + 26 MB/s while with per-bdi we get to 2.7 MB/s + 13 MB/s.
> > So per-bdi seems to be *more* fair but throughput suffers a lot (which
> > might be inevitable due to incurred seeks).
> > Frederic, how much does dbench achieve for you just on one partition
> > (test both consecutively if possible) with as many threads as have those
> > two dbench instances together? Thanks.
>
> Is the graph showing us dbench tput or disk tput? I'm assuming it is
> disk tput, so bdi may just be writing less?
Good, question. I was assuming dbench throughput :).
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-05 19:15 ` Jens Axboe
2009-06-05 21:14 ` Jan Kara
@ 2009-06-06 0:35 ` Frederic Weisbecker
1 sibling, 0 replies; 70+ messages in thread
From: Frederic Weisbecker @ 2009-06-06 0:35 UTC (permalink / raw)
To: Jens Axboe
Cc: Andrew Morton, linux-kernel, linux-fsdevel, tytso, chris.mason,
david, hch, jack, yanmin_zhang, richard, damien.wyart
On Fri, Jun 05, 2009 at 09:15:28PM +0200, Jens Axboe wrote:
> On Fri, Jun 05 2009, Frederic Weisbecker wrote:
> > On Thu, Jun 04, 2009 at 10:10:12PM +0200, Jens Axboe wrote:
> > > On Thu, Jun 04 2009, Jens Axboe wrote:
> > > > On Thu, Jun 04 2009, Frederic Weisbecker wrote:
> > > > > On Thu, Jun 04, 2009 at 12:07:26PM -0700, Andrew Morton wrote:
> > > > > > On Thu, 4 Jun 2009 17:20:44 +0200 Frederic Weisbecker <fweisbec@gmail.com> wrote:
> > > > > >
> > > > > > > I've just tested it on UP in a single disk.
> > > > > >
> > > > > > I must say, I'm stunned at the amount of testing which people are
> > > > > > performing on this patchset. Normally when someone sends out a
> > > > > > patchset it just sort of lands with a dull thud.
> > > > > >
> > > > > > I'm not sure what Jens did right to make all this happen, but thanks!
> > > > >
> > > > >
> > > > > I don't know how he did either. I was reading theses patches and *something*
> > > > > pushed me to my testbox, and then I tested...
> > > > >
> > > > > Jens, how do you do that?
> > > >
> > > > Heh, not sure :-)
> > > >
> > > > But indeed, thanks for the testing. It looks quite interesting. I'm
> > > > guessing it probably has to do with who ends up doing the balancing and
> > > > that the flusher threads block, it may change the picture a bit. So it
> > > > may just be that it'll require a few vm tweaks. I'll definitely look
> > > > into it and try and reproduce your results.
> > > >
> > > > Did you run it a 2nd time on each drive and check if the results were
> > > > (approximately) consistent on the two drives?
> > >
> > > each partition... What IO scheduler did you use on hda?
> >
> >
> > CFQ.
> >
> >
> > > The main difference with this test case is that before we had two super
> > > blocks, each with lists of dirty inodes. pdflush would attack those. Now
> > > we have both the inodes from the two supers on a single set of lists on
> > > the bdi. So either we have some ordering issue there (which is causing
> > > the unfairness), or something else is.
> >
> >
> > Yeah.
> > But although these flushers are per-bdi, with a single list (well, three)
> > of dirty inodes, it looks like the writeback is still performed per
> > superblock, I mean the bdi work gives the concerned superblock
> > and the bdi list is iterated in generic_sync_wb_inodes() which
> > only processes the inodes for the given superblock. So there is
> > a bit of a per superblock serialization there and....
>
> But in most cases sb == NULL, which means that the writeback does not
> care. It should only pass in a valid sb if someone explicitly wants to
> sync that sb.
Ah ok.
> But the way that the lists are organized now does definitely open some
> windows of unfairness for a test like yours. It's on the top of the
> investigate list for monday.
I stay tuned.
> > > So perhaps you can try with noop on hda to see if that changes the
> > > picture?
> >
> >
> >
> > The result with noop is even more impressive.
> >
> > See: http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop.pdf
> >
> > Also a comparison, noop with pdflush against noop with bdi writeback:
> >
> > http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop-cmp.pdf
>
> OK, so things aren't exactly peachy here to begin with. It may not
> actually BE an issue, or at least now a new one, but that doesn't mean
> that we should not attempt to quantify the impact.
>
> How are you starting these runs? With a test like this, even a small
> difference in start time can make a huge difference.
Hmm, in a kind of draft way :)
I pre-write the command on two consoles, each on a concerned
partition, then I type enter for each one.
So there is always one that is started before the other with
some delay. And it looks like the first often win the race.
Frederic.
> --
> Jens Axboe
>
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-05 21:14 ` Jan Kara
2009-06-06 0:18 ` Chris Mason
@ 2009-06-06 1:00 ` Frederic Weisbecker
1 sibling, 0 replies; 70+ messages in thread
From: Frederic Weisbecker @ 2009-06-06 1:00 UTC (permalink / raw)
To: Jan Kara
Cc: Jens Axboe, Andrew Morton, linux-kernel, linux-fsdevel, tytso,
chris.mason, david, hch, yanmin_zhang, richard, damien.wyart
On Fri, Jun 05, 2009 at 11:14:38PM +0200, Jan Kara wrote:
> On Fri 05-06-09 21:15:28, Jens Axboe wrote:
> > On Fri, Jun 05 2009, Frederic Weisbecker wrote:
> > > The result with noop is even more impressive.
> > >
> > > See: http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop.pdf
> > >
> > > Also a comparison, noop with pdflush against noop with bdi writeback:
> > >
> > > http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop-cmp.pdf
> >
> > OK, so things aren't exactly peachy here to begin with. It may not
> > actually BE an issue, or at least now a new one, but that doesn't mean
> > that we should not attempt to quantify the impact.
> What looks interesting is also the overall throughput. With pdflush we
> get to 2.5 MB/s + 26 MB/s while with per-bdi we get to 2.7 MB/s + 13 MB/s.
> So per-bdi seems to be *more* fair but throughput suffers a lot (which
> might be inevitable due to incurred seeks).
Heh indeed, I was confused with the colors here but yes pdflush has
a faster total and a higher unfairness with noop, at least with this test.
> Frederic, how much does dbench achieve for you just on one partition
> (test both consecutively if possible) with as many threads as have those
> two dbench instances together? Thanks.
Good idea, I'll try it out so that there wouldn't have any per superblock
ordering there, or whathever that could be.
Thanks.
> Honza
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-06 0:23 ` Jan Kara
(?)
@ 2009-06-06 1:06 ` Frederic Weisbecker
2009-06-08 9:23 ` Jens Axboe
-1 siblings, 1 reply; 70+ messages in thread
From: Frederic Weisbecker @ 2009-06-06 1:06 UTC (permalink / raw)
To: Jan Kara
Cc: Chris Mason, Jens Axboe, Andrew Morton, linux-kernel,
linux-fsdevel, tytso, david, hch, yanmin_zhang, richard,
damien.wyart
On Sat, Jun 06, 2009 at 02:23:40AM +0200, Jan Kara wrote:
> On Fri 05-06-09 20:18:15, Chris Mason wrote:
> > On Fri, Jun 05, 2009 at 11:14:38PM +0200, Jan Kara wrote:
> > > On Fri 05-06-09 21:15:28, Jens Axboe wrote:
> > > > On Fri, Jun 05 2009, Frederic Weisbecker wrote:
> > > > > The result with noop is even more impressive.
> > > > >
> > > > > See: http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop.pdf
> > > > >
> > > > > Also a comparison, noop with pdflush against noop with bdi writeback:
> > > > >
> > > > > http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop-cmp.pdf
> > > >
> > > > OK, so things aren't exactly peachy here to begin with. It may not
> > > > actually BE an issue, or at least now a new one, but that doesn't mean
> > > > that we should not attempt to quantify the impact.
> > > What looks interesting is also the overall throughput. With pdflush we
> > > get to 2.5 MB/s + 26 MB/s while with per-bdi we get to 2.7 MB/s + 13 MB/s.
> > > So per-bdi seems to be *more* fair but throughput suffers a lot (which
> > > might be inevitable due to incurred seeks).
> > > Frederic, how much does dbench achieve for you just on one partition
> > > (test both consecutively if possible) with as many threads as have those
> > > two dbench instances together? Thanks.
> >
> > Is the graph showing us dbench tput or disk tput? I'm assuming it is
> > disk tput, so bdi may just be writing less?
> Good, question. I was assuming dbench throughput :).
>
> Honza
Yeah it's dbench. May be that's not the right tool to measure the writeback
layer, even though dbench results are necessarily influenced by the writeback
behaviour.
May be I should use something else?
Note that if you want I can put some surgicals trace_printk()
in fs/fs-writeback.c
>
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-06 1:06 ` Frederic Weisbecker
@ 2009-06-08 9:23 ` Jens Axboe
2009-06-08 12:23 ` Jan Kara
0 siblings, 1 reply; 70+ messages in thread
From: Jens Axboe @ 2009-06-08 9:23 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Jan Kara, Chris Mason, Andrew Morton, linux-kernel,
linux-fsdevel, tytso, david, hch, yanmin_zhang, richard,
damien.wyart
On Sat, Jun 06 2009, Frederic Weisbecker wrote:
> On Sat, Jun 06, 2009 at 02:23:40AM +0200, Jan Kara wrote:
> > On Fri 05-06-09 20:18:15, Chris Mason wrote:
> > > On Fri, Jun 05, 2009 at 11:14:38PM +0200, Jan Kara wrote:
> > > > On Fri 05-06-09 21:15:28, Jens Axboe wrote:
> > > > > On Fri, Jun 05 2009, Frederic Weisbecker wrote:
> > > > > > The result with noop is even more impressive.
> > > > > >
> > > > > > See: http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop.pdf
> > > > > >
> > > > > > Also a comparison, noop with pdflush against noop with bdi writeback:
> > > > > >
> > > > > > http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop-cmp.pdf
> > > > >
> > > > > OK, so things aren't exactly peachy here to begin with. It may not
> > > > > actually BE an issue, or at least now a new one, but that doesn't mean
> > > > > that we should not attempt to quantify the impact.
> > > > What looks interesting is also the overall throughput. With pdflush we
> > > > get to 2.5 MB/s + 26 MB/s while with per-bdi we get to 2.7 MB/s + 13 MB/s.
> > > > So per-bdi seems to be *more* fair but throughput suffers a lot (which
> > > > might be inevitable due to incurred seeks).
> > > > Frederic, how much does dbench achieve for you just on one partition
> > > > (test both consecutively if possible) with as many threads as have those
> > > > two dbench instances together? Thanks.
> > >
> > > Is the graph showing us dbench tput or disk tput? I'm assuming it is
> > > disk tput, so bdi may just be writing less?
> > Good, question. I was assuming dbench throughput :).
> >
> > Honza
>
>
> Yeah it's dbench. May be that's not the right tool to measure the writeback
> layer, even though dbench results are necessarily influenced by the writeback
> behaviour.
>
> May be I should use something else?
>
> Note that if you want I can put some surgicals trace_printk()
> in fs/fs-writeback.c
FWIW, I ran a similar test here just now. CFQ was used, two partitions
on an (otherwise) idle drive. I used 30 clients per dbench and 600s
runtime. Results are nearly identical, both throughout the run and
total:
/dev/sdb1
Throughput 165.738 MB/sec 30 clients 30 procs max_latency=459.002 ms
/dev/sdb2
Throughput 165.773 MB/sec 30 clients 30 procs max_latency=607.198 ms
The flusher threads see very little exercise here.
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-08 9:23 ` Jens Axboe
@ 2009-06-08 12:23 ` Jan Kara
2009-06-08 12:28 ` Jens Axboe
0 siblings, 1 reply; 70+ messages in thread
From: Jan Kara @ 2009-06-08 12:23 UTC (permalink / raw)
To: Jens Axboe
Cc: Frederic Weisbecker, Jan Kara, Chris Mason, Andrew Morton,
linux-kernel, linux-fsdevel, tytso, david, hch, yanmin_zhang,
richard, damien.wyart
On Mon 08-06-09 11:23:38, Jens Axboe wrote:
> On Sat, Jun 06 2009, Frederic Weisbecker wrote:
> > On Sat, Jun 06, 2009 at 02:23:40AM +0200, Jan Kara wrote:
> > > On Fri 05-06-09 20:18:15, Chris Mason wrote:
> > > > On Fri, Jun 05, 2009 at 11:14:38PM +0200, Jan Kara wrote:
> > > > > On Fri 05-06-09 21:15:28, Jens Axboe wrote:
> > > > > > On Fri, Jun 05 2009, Frederic Weisbecker wrote:
> > > > > > > The result with noop is even more impressive.
> > > > > > >
> > > > > > > See: http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop.pdf
> > > > > > >
> > > > > > > Also a comparison, noop with pdflush against noop with bdi writeback:
> > > > > > >
> > > > > > > http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop-cmp.pdf
> > > > > >
> > > > > > OK, so things aren't exactly peachy here to begin with. It may not
> > > > > > actually BE an issue, or at least now a new one, but that doesn't mean
> > > > > > that we should not attempt to quantify the impact.
> > > > > What looks interesting is also the overall throughput. With pdflush we
> > > > > get to 2.5 MB/s + 26 MB/s while with per-bdi we get to 2.7 MB/s + 13 MB/s.
> > > > > So per-bdi seems to be *more* fair but throughput suffers a lot (which
> > > > > might be inevitable due to incurred seeks).
> > > > > Frederic, how much does dbench achieve for you just on one partition
> > > > > (test both consecutively if possible) with as many threads as have those
> > > > > two dbench instances together? Thanks.
> > > >
> > > > Is the graph showing us dbench tput or disk tput? I'm assuming it is
> > > > disk tput, so bdi may just be writing less?
> > > Good, question. I was assuming dbench throughput :).
> > >
> > > Honza
> >
> >
> > Yeah it's dbench. May be that's not the right tool to measure the writeback
> > layer, even though dbench results are necessarily influenced by the writeback
> > behaviour.
> >
> > May be I should use something else?
> >
> > Note that if you want I can put some surgicals trace_printk()
> > in fs/fs-writeback.c
>
> FWIW, I ran a similar test here just now. CFQ was used, two partitions
> on an (otherwise) idle drive. I used 30 clients per dbench and 600s
> runtime. Results are nearly identical, both throughout the run and
> total:
>
> /dev/sdb1
> Throughput 165.738 MB/sec 30 clients 30 procs max_latency=459.002 ms
>
> /dev/sdb2
> Throughput 165.773 MB/sec 30 clients 30 procs max_latency=607.198 ms
Hmm, interesting. 165 MB/sec (in fact 330 MB/sec for that drive) sounds
like quite a lot ;). This usually happens with dbench when the processes
manage to delete / redirty data before writeback thread gets to them (so
some IO happens in memory only and throughput is bound by the CPU / memory
speed). So I think you are on a different part of the performance curve
than Frederic. Probably you have to run with more threads so that dbench
threads get throttled because of total amount of dirty data generated...
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-08 12:23 ` Jan Kara
@ 2009-06-08 12:28 ` Jens Axboe
2009-06-08 13:01 ` Jan Kara
2009-06-09 18:39 ` Frederic Weisbecker
0 siblings, 2 replies; 70+ messages in thread
From: Jens Axboe @ 2009-06-08 12:28 UTC (permalink / raw)
To: Jan Kara
Cc: Frederic Weisbecker, Chris Mason, Andrew Morton, linux-kernel,
linux-fsdevel, tytso, david, hch, yanmin_zhang, richard,
damien.wyart
On Mon, Jun 08 2009, Jan Kara wrote:
> On Mon 08-06-09 11:23:38, Jens Axboe wrote:
> > On Sat, Jun 06 2009, Frederic Weisbecker wrote:
> > > On Sat, Jun 06, 2009 at 02:23:40AM +0200, Jan Kara wrote:
> > > > On Fri 05-06-09 20:18:15, Chris Mason wrote:
> > > > > On Fri, Jun 05, 2009 at 11:14:38PM +0200, Jan Kara wrote:
> > > > > > On Fri 05-06-09 21:15:28, Jens Axboe wrote:
> > > > > > > On Fri, Jun 05 2009, Frederic Weisbecker wrote:
> > > > > > > > The result with noop is even more impressive.
> > > > > > > >
> > > > > > > > See: http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop.pdf
> > > > > > > >
> > > > > > > > Also a comparison, noop with pdflush against noop with bdi writeback:
> > > > > > > >
> > > > > > > > http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop-cmp.pdf
> > > > > > >
> > > > > > > OK, so things aren't exactly peachy here to begin with. It may not
> > > > > > > actually BE an issue, or at least now a new one, but that doesn't mean
> > > > > > > that we should not attempt to quantify the impact.
> > > > > > What looks interesting is also the overall throughput. With pdflush we
> > > > > > get to 2.5 MB/s + 26 MB/s while with per-bdi we get to 2.7 MB/s + 13 MB/s.
> > > > > > So per-bdi seems to be *more* fair but throughput suffers a lot (which
> > > > > > might be inevitable due to incurred seeks).
> > > > > > Frederic, how much does dbench achieve for you just on one partition
> > > > > > (test both consecutively if possible) with as many threads as have those
> > > > > > two dbench instances together? Thanks.
> > > > >
> > > > > Is the graph showing us dbench tput or disk tput? I'm assuming it is
> > > > > disk tput, so bdi may just be writing less?
> > > > Good, question. I was assuming dbench throughput :).
> > > >
> > > > Honza
> > >
> > >
> > > Yeah it's dbench. May be that's not the right tool to measure the writeback
> > > layer, even though dbench results are necessarily influenced by the writeback
> > > behaviour.
> > >
> > > May be I should use something else?
> > >
> > > Note that if you want I can put some surgicals trace_printk()
> > > in fs/fs-writeback.c
> >
> > FWIW, I ran a similar test here just now. CFQ was used, two partitions
> > on an (otherwise) idle drive. I used 30 clients per dbench and 600s
> > runtime. Results are nearly identical, both throughout the run and
> > total:
> >
> > /dev/sdb1
> > Throughput 165.738 MB/sec 30 clients 30 procs max_latency=459.002 ms
> >
> > /dev/sdb2
> > Throughput 165.773 MB/sec 30 clients 30 procs max_latency=607.198 ms
> Hmm, interesting. 165 MB/sec (in fact 330 MB/sec for that drive) sounds
> like quite a lot ;). This usually happens with dbench when the processes
> manage to delete / redirty data before writeback thread gets to them (so
> some IO happens in memory only and throughput is bound by the CPU / memory
> speed). So I think you are on a different part of the performance curve
> than Frederic. Probably you have to run with more threads so that dbench
> threads get throttled because of total amount of dirty data generated...
Certainly, the actual disk data rate was consistenctly in the
60-70MB/sec region. The issue is likely that the box has 6GB of RAM, if
I boot with less than 30 clients will do.
But unless the situation changes radically with memory pressure, it
still shows a fair distribution of IO between the two. Since they have
identical results throughout, it should be safe to assume that the have
equal bandwidth distribution at the disk end. A fast dbench run is one
that doesn't touch the disk at all, once you start touching disk you
lose :-)
--
Jens Axboe
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-08 12:28 ` Jens Axboe
@ 2009-06-08 13:01 ` Jan Kara
2009-06-09 18:39 ` Frederic Weisbecker
1 sibling, 0 replies; 70+ messages in thread
From: Jan Kara @ 2009-06-08 13:01 UTC (permalink / raw)
To: Jens Axboe
Cc: Frederic Weisbecker, Chris Mason, Andrew Morton, linux-kernel,
linux-fsdevel, tytso, david, hch, yanmin_zhang, richard,
damien.wyart
On Mon 08-06-09 14:28:34, Jens Axboe wrote:
> On Mon, Jun 08 2009, Jan Kara wrote:
> > On Mon 08-06-09 11:23:38, Jens Axboe wrote:
> > > On Sat, Jun 06 2009, Frederic Weisbecker wrote:
> > > > On Sat, Jun 06, 2009 at 02:23:40AM +0200, Jan Kara wrote:
> > > > > On Fri 05-06-09 20:18:15, Chris Mason wrote:
> > > > > > On Fri, Jun 05, 2009 at 11:14:38PM +0200, Jan Kara wrote:
> > > > > > > On Fri 05-06-09 21:15:28, Jens Axboe wrote:
> > > > > > > > On Fri, Jun 05 2009, Frederic Weisbecker wrote:
> > > > > > > > > The result with noop is even more impressive.
> > > > > > > > >
> > > > > > > > > See: http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop.pdf
> > > > > > > > >
> > > > > > > > > Also a comparison, noop with pdflush against noop with bdi writeback:
> > > > > > > > >
> > > > > > > > > http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop-cmp.pdf
> > > > > > > >
> > > > > > > > OK, so things aren't exactly peachy here to begin with. It may not
> > > > > > > > actually BE an issue, or at least now a new one, but that doesn't mean
> > > > > > > > that we should not attempt to quantify the impact.
> > > > > > > What looks interesting is also the overall throughput. With pdflush we
> > > > > > > get to 2.5 MB/s + 26 MB/s while with per-bdi we get to 2.7 MB/s + 13 MB/s.
> > > > > > > So per-bdi seems to be *more* fair but throughput suffers a lot (which
> > > > > > > might be inevitable due to incurred seeks).
> > > > > > > Frederic, how much does dbench achieve for you just on one partition
> > > > > > > (test both consecutively if possible) with as many threads as have those
> > > > > > > two dbench instances together? Thanks.
> > > > > >
> > > > > > Is the graph showing us dbench tput or disk tput? I'm assuming it is
> > > > > > disk tput, so bdi may just be writing less?
> > > > > Good, question. I was assuming dbench throughput :).
> > > > >
> > > > > Honza
> > > >
> > > >
> > > > Yeah it's dbench. May be that's not the right tool to measure the writeback
> > > > layer, even though dbench results are necessarily influenced by the writeback
> > > > behaviour.
> > > >
> > > > May be I should use something else?
> > > >
> > > > Note that if you want I can put some surgicals trace_printk()
> > > > in fs/fs-writeback.c
> > >
> > > FWIW, I ran a similar test here just now. CFQ was used, two partitions
> > > on an (otherwise) idle drive. I used 30 clients per dbench and 600s
> > > runtime. Results are nearly identical, both throughout the run and
> > > total:
> > >
> > > /dev/sdb1
> > > Throughput 165.738 MB/sec 30 clients 30 procs max_latency=459.002 ms
> > >
> > > /dev/sdb2
> > > Throughput 165.773 MB/sec 30 clients 30 procs max_latency=607.198 ms
> > Hmm, interesting. 165 MB/sec (in fact 330 MB/sec for that drive) sounds
> > like quite a lot ;). This usually happens with dbench when the processes
> > manage to delete / redirty data before writeback thread gets to them (so
> > some IO happens in memory only and throughput is bound by the CPU / memory
> > speed). So I think you are on a different part of the performance curve
> > than Frederic. Probably you have to run with more threads so that dbench
> > threads get throttled because of total amount of dirty data generated...
>
> Certainly, the actual disk data rate was consistenctly in the
> 60-70MB/sec region. The issue is likely that the box has 6GB of RAM, if
> I boot with less than 30 clients will do.
Yes, that would do as well.
> But unless the situation changes radically with memory pressure, it
> still shows a fair distribution of IO between the two. Since they have
> identical results throughout, it should be safe to assume that the have
> equal bandwidth distribution at the disk end. A fast dbench run is one
Yes, I agree. Your previous test indirectly shows fair distribution
on the disk end (with blktrace you could actually confirm it directly).
> that doesn't touch the disk at all, once you start touching disk you
> lose :-)
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: [PATCH 0/11] Per-bdi writeback flusher threads v9
2009-06-08 12:28 ` Jens Axboe
2009-06-08 13:01 ` Jan Kara
@ 2009-06-09 18:39 ` Frederic Weisbecker
1 sibling, 0 replies; 70+ messages in thread
From: Frederic Weisbecker @ 2009-06-09 18:39 UTC (permalink / raw)
To: Jens Axboe
Cc: Jan Kara, Chris Mason, Andrew Morton, linux-kernel,
linux-fsdevel, tytso, david, hch, yanmin_zhang, richard,
damien.wyart
On Mon, Jun 08, 2009 at 02:28:34PM +0200, Jens Axboe wrote:
> On Mon, Jun 08 2009, Jan Kara wrote:
> > On Mon 08-06-09 11:23:38, Jens Axboe wrote:
> > > On Sat, Jun 06 2009, Frederic Weisbecker wrote:
> > > > On Sat, Jun 06, 2009 at 02:23:40AM +0200, Jan Kara wrote:
> > > > > On Fri 05-06-09 20:18:15, Chris Mason wrote:
> > > > > > On Fri, Jun 05, 2009 at 11:14:38PM +0200, Jan Kara wrote:
> > > > > > > On Fri 05-06-09 21:15:28, Jens Axboe wrote:
> > > > > > > > On Fri, Jun 05 2009, Frederic Weisbecker wrote:
> > > > > > > > > The result with noop is even more impressive.
> > > > > > > > >
> > > > > > > > > See: http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop.pdf
> > > > > > > > >
> > > > > > > > > Also a comparison, noop with pdflush against noop with bdi writeback:
> > > > > > > > >
> > > > > > > > > http://kernel.org/pub/linux/kernel/people/frederic/dbench-noop-cmp.pdf
> > > > > > > >
> > > > > > > > OK, so things aren't exactly peachy here to begin with. It may not
> > > > > > > > actually BE an issue, or at least now a new one, but that doesn't mean
> > > > > > > > that we should not attempt to quantify the impact.
> > > > > > > What looks interesting is also the overall throughput. With pdflush we
> > > > > > > get to 2.5 MB/s + 26 MB/s while with per-bdi we get to 2.7 MB/s + 13 MB/s.
> > > > > > > So per-bdi seems to be *more* fair but throughput suffers a lot (which
> > > > > > > might be inevitable due to incurred seeks).
> > > > > > > Frederic, how much does dbench achieve for you just on one partition
> > > > > > > (test both consecutively if possible) with as many threads as have those
> > > > > > > two dbench instances together? Thanks.
> > > > > >
> > > > > > Is the graph showing us dbench tput or disk tput? I'm assuming it is
> > > > > > disk tput, so bdi may just be writing less?
> > > > > Good, question. I was assuming dbench throughput :).
> > > > >
> > > > > Honza
> > > >
> > > >
> > > > Yeah it's dbench. May be that's not the right tool to measure the writeback
> > > > layer, even though dbench results are necessarily influenced by the writeback
> > > > behaviour.
> > > >
> > > > May be I should use something else?
> > > >
> > > > Note that if you want I can put some surgicals trace_printk()
> > > > in fs/fs-writeback.c
> > >
> > > FWIW, I ran a similar test here just now. CFQ was used, two partitions
> > > on an (otherwise) idle drive. I used 30 clients per dbench and 600s
> > > runtime. Results are nearly identical, both throughout the run and
> > > total:
> > >
> > > /dev/sdb1
> > > Throughput 165.738 MB/sec 30 clients 30 procs max_latency=459.002 ms
> > >
> > > /dev/sdb2
> > > Throughput 165.773 MB/sec 30 clients 30 procs max_latency=607.198 ms
> > Hmm, interesting. 165 MB/sec (in fact 330 MB/sec for that drive) sounds
> > like quite a lot ;). This usually happens with dbench when the processes
> > manage to delete / redirty data before writeback thread gets to them (so
> > some IO happens in memory only and throughput is bound by the CPU / memory
> > speed). So I think you are on a different part of the performance curve
> > than Frederic. Probably you have to run with more threads so that dbench
> > threads get throttled because of total amount of dirty data generated...
>
> Certainly, the actual disk data rate was consistenctly in the
> 60-70MB/sec region. The issue is likely that the box has 6GB of RAM, if
> I boot with less than 30 clients will do.
>
> But unless the situation changes radically with memory pressure, it
> still shows a fair distribution of IO between the two. Since they have
> identical results throughout, it should be safe to assume that the have
> equal bandwidth distribution at the disk end. A fast dbench run is one
> that doesn't touch the disk at all, once you start touching disk you
> lose :-)
When I ran my tests, I only had 384 MB of memory, 100 threads and
only one CPU. So I was in a constant writeback, which should
be smoother with 6 GB of memory and 30 threads.
May be that's why you had a so well balanced result... Or may
be there is too much entropy in my testbox :)
> --
> Jens Axboe
>
^ permalink raw reply [flat|nested] 70+ messages in thread
end of thread, other threads:[~2009-06-09 18:40 UTC | newest]
Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-28 11:46 [PATCH 0/11] Per-bdi writeback flusher threads v9 Jens Axboe
2009-05-28 11:46 ` [PATCH 01/11] ntfs: remove old debug check for dirty data in ntfs_put_super() Jens Axboe
2009-05-28 11:46 ` [PATCH 02/11] btrfs: properly register fs backing device Jens Axboe
2009-05-28 11:46 ` [PATCH 03/11] writeback: move dirty inodes from super_block to backing_dev_info Jens Axboe
2009-05-28 11:46 ` [PATCH 04/11] writeback: switch to per-bdi threads for flushing data Jens Axboe
2009-05-28 14:13 ` Artem Bityutskiy
2009-05-28 22:28 ` Jens Axboe
2009-05-28 11:46 ` [PATCH 05/11] writeback: get rid of pdflush completely Jens Axboe
2009-05-28 11:46 ` [PATCH 06/11] writeback: separate the flushing state/task from the bdi Jens Axboe
2009-05-28 11:46 ` [PATCH 07/11] writeback: support > 1 flusher thread per bdi Jens Axboe
2009-05-28 11:46 ` [PATCH 08/11] writeback: allow sleepy exit of default writeback task Jens Axboe
2009-05-28 11:46 ` [PATCH 09/11] writeback: add some debug inode list counters to bdi stats Jens Axboe
2009-05-28 11:46 ` [PATCH 10/11] writeback: add name to backing_dev_info Jens Axboe
2009-05-28 11:46 ` [PATCH 11/11] writeback: check for registered bdi in flusher add and inode dirty Jens Axboe
2009-05-28 13:56 ` [PATCH 0/11] Per-bdi writeback flusher threads v9 Peter Zijlstra
2009-05-28 22:28 ` Jens Axboe
2009-05-28 14:17 ` Artem Bityutskiy
2009-05-28 14:19 ` Artem Bityutskiy
2009-05-28 20:35 ` Peter Zijlstra
2009-05-28 22:27 ` Jens Axboe
2009-05-29 15:37 ` Artem Bityutskiy
2009-05-29 15:37 ` Artem Bityutskiy
2009-05-29 15:50 ` Jens Axboe
2009-05-29 16:02 ` Artem Bityutskiy
2009-05-29 16:02 ` Artem Bityutskiy
2009-05-29 17:07 ` Jens Axboe
2009-06-03 7:39 ` Artem Bityutskiy
2009-06-03 7:44 ` Jens Axboe
2009-06-03 7:46 ` Artem Bityutskiy
2009-06-03 7:46 ` Artem Bityutskiy
2009-06-03 7:50 ` Jens Axboe
2009-06-03 7:54 ` Artem Bityutskiy
2009-06-03 7:54 ` Artem Bityutskiy
2009-06-03 7:59 ` Artem Bityutskiy
2009-06-03 7:59 ` Artem Bityutskiy
2009-06-03 8:07 ` Jens Axboe
2009-05-28 14:41 ` Theodore Tso
2009-05-29 16:07 ` Artem Bityutskiy
2009-05-29 16:20 ` Artem Bityutskiy
2009-05-29 16:20 ` Artem Bityutskiy
2009-05-29 17:09 ` Jens Axboe
2009-06-03 8:11 ` Artem Bityutskiy
2009-06-03 8:11 ` Artem Bityutskiy
2009-05-29 17:08 ` Jens Axboe
2009-06-03 11:12 ` Artem Bityutskiy
2009-06-03 11:12 ` Artem Bityutskiy
2009-06-03 11:42 ` Jens Axboe
2009-06-04 15:20 ` Frederic Weisbecker
2009-06-04 19:07 ` Andrew Morton
2009-06-04 19:13 ` Frederic Weisbecker
2009-06-04 19:50 ` Jens Axboe
2009-06-04 20:10 ` Jens Axboe
2009-06-04 22:34 ` Frederic Weisbecker
2009-06-05 19:15 ` Jens Axboe
2009-06-05 21:14 ` Jan Kara
2009-06-06 0:18 ` Chris Mason
2009-06-06 0:23 ` Jan Kara
2009-06-06 0:23 ` Jan Kara
2009-06-06 1:06 ` Frederic Weisbecker
2009-06-08 9:23 ` Jens Axboe
2009-06-08 12:23 ` Jan Kara
2009-06-08 12:28 ` Jens Axboe
2009-06-08 13:01 ` Jan Kara
2009-06-09 18:39 ` Frederic Weisbecker
2009-06-06 1:00 ` Frederic Weisbecker
2009-06-06 0:35 ` Frederic Weisbecker
2009-06-04 21:37 ` Frederic Weisbecker
2009-06-05 1:14 ` Zhang, Yanmin
2009-06-05 1:14 ` Zhang, Yanmin
2009-06-05 19:16 ` Jens Axboe
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.