* [PATCH 0/2] [RFC] blkdev: flush generation optimization
@ 2013-04-14 19:31 Dmitry Monakhov
  2013-04-14 19:34 ` [PATCH 1/2] blkdev: add flush generation counter Dmitry Monakhov
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Monakhov @ 2013-04-14 19:31 UTC (permalink / raw)
  To: ext4 development, LKML; +Cc: Jan Kara, axboe

[-- Attachment #1: Type: text/plain, Size: 487 bytes --]


Some filesystems try to optimize barrier flushes by maintaining
fs-specific generation counters, but if we introduce a generic flush
generation counter for the block device, filesystems may use it for
fdatasync(2) optimization. The optimization should work when userspace
performs multi-threaded IO with a lot of fdatasync() calls.
Here are graphs for a test where each task performs random buffered
writes to a dedicated file and calls fdatasync(2) after each operation.

Axes: x = nr_tasks, y = write_iops

[-- Attachment #2: fsync_test.fio --]
[-- Type: text/plain, Size: 322 bytes --]

# Chunk server simulation workload
# Files 'chunk.$NUM_JOB.0' should be precreated before the test
# 
[global]
bs=4k
ioengine=psync
filesize=64M
size=8G
direct=0
runtime=30
directory=/mnt
fdatasync=1
group_reporting=1

[chunk]
overwrite=1
new_group=1
write_bw_log=bw.log
rw=randwrite
numjobs=${NUM_JOBS}
fsync=1
stonewall
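
# The thread count comes from the environment; fio expands ${NUM_JOBS}
# when loading the job file. An assumed invocation (not part of the
# original posting) for a single point on the x axis above:
#
#   NUM_JOBS=16 fio fsync_test.fio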

[-- Attachment #3: ssd-fsync.png --]
[-- Type: image/png, Size: 5313 bytes --]

[-- Attachment #4: hdd-fsync.png --]
[-- Type: image/png, Size: 5566 bytes --]

[-- Attachment #5: Type: text/plain, Size: 103 bytes --]



TOC:
0001 blkdev: add flush generation counter
0002 ext4: Add fdatasync scalability optimization






* [PATCH 1/2] blkdev: add flush generation counter
  2013-04-14 19:31 [PATCH 0/2] [RFC] blkdev: flush generation optimization Dmitry Monakhov
@ 2013-04-14 19:34 ` Dmitry Monakhov
  2013-04-14 19:34   ` [PATCH 2/2] ext4: Add fdatasync scalability optimization Dmitry Monakhov
  2013-04-15 14:14   ` [PATCH 1/2] blkdev: add flush generation counter Jan Kara
  0 siblings, 2 replies; 6+ messages in thread
From: Dmitry Monakhov @ 2013-04-14 19:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-ext4, jack, axboe, Dmitry Monakhov

Callers may use this counter to optimize flushes

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 block/blk-core.c       |    1 +
 block/blk-flush.c      |    3 ++-
 include/linux/blkdev.h |    1 +
 3 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 074b758..afb5a4b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -537,6 +537,7 @@ void blk_cleanup_queue(struct request_queue *q)
 	spin_unlock_irq(lock);
 	mutex_unlock(&q->sysfs_lock);
 
+	atomic_set(&q->flush_tag, 0);
 	/*
 	 * Drain all requests queued before DYING marking. Set DEAD flag to
 	 * prevent that q->request_fn() gets invoked after draining finished.
diff --git a/block/blk-flush.c b/block/blk-flush.c
index cc2b827..b1adc75 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -203,7 +203,7 @@ static void flush_end_io(struct request *flush_rq, int error)
 	/* account completion of the flush request */
 	q->flush_running_idx ^= 1;
 	elv_completed_request(q, flush_rq);
-
+	atomic_inc(&q->flush_tag);
 	/* and push the waiting requests to the next stage */
 	list_for_each_entry_safe(rq, n, running, flush.list) {
 		unsigned int seq = blk_flush_cur_seq(rq);
@@ -268,6 +268,7 @@ static bool blk_kick_flush(struct request_queue *q)
 	q->flush_rq.end_io = flush_end_io;
 
 	q->flush_pending_idx ^= 1;
+	atomic_inc(&q->flush_tag);
 	list_add_tail(&q->flush_rq.queuelist, &q->queue_head);
 	return true;
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 78feda9..e079fbd 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -416,6 +416,7 @@ struct request_queue {
 	unsigned int		flush_queue_delayed:1;
 	unsigned int		flush_pending_idx:1;
 	unsigned int		flush_running_idx:1;
+	atomic_t		flush_tag;
 	unsigned long		flush_pending_since;
 	struct list_head	flush_queue[2];
 	struct list_head	flush_data_in_flight;
-- 
1.7.1



* [PATCH 2/2] ext4: Add fdatasync scalability optimization
  2013-04-14 19:34 ` [PATCH 1/2] blkdev: add flush generation counter Dmitry Monakhov
@ 2013-04-14 19:34   ` Dmitry Monakhov
  2013-04-15 14:14   ` [PATCH 1/2] blkdev: add flush generation counter Jan Kara
  1 sibling, 0 replies; 6+ messages in thread
From: Dmitry Monakhov @ 2013-04-14 19:34 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-ext4, jack, axboe, Dmitry Monakhov

Track the blkdev's flush generation counter on a per-inode basis and
update it inside end_io. If the inode's flush generation counter is older
than the blkdev's current flush counter, the inode's data has already
been flushed to stable media, so we can skip the explicit barrier. The
optimization is safe only when the inode's end_io was called before the
flush request was QUEUED and COMPLETED.

With that optimization we no longer need the jbd2 flush optimization.
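
A worked example of the resulting check (the values are made up; the
counter semantics are those introduced in patch 1/2): a data bio
completes while q->flush_tag == 4, so i_flush_tag is set to 4; a later
fdatasync() sees q->flush_tag == 7. Then (7 & ~1) == 6 > ((4 + 1) & ~1)
== 4, so a flush request was queued and completed after the data
reached the device cache and the explicit barrier can be skipped.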

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/ext4.h      |    1 +
 fs/ext4/ext4_jbd2.h |   10 +++++++++-
 fs/ext4/fsync.c     |   16 +++++++++++-----
 fs/ext4/inode.c     |    3 ++-
 fs/ext4/page-io.c   |    2 +-
 5 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 75b2326..e2ec980 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -932,6 +932,7 @@ struct ext4_inode_info {
 	 */
 	tid_t i_sync_tid;
 	tid_t i_datasync_tid;
+	atomic_t i_flush_tag;
 
 	/* Precomputed uuid+inum+igen checksum for seeding inode checksums */
 	__u32 i_csum_seed;
diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index c8c6885..46943ed 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -365,7 +365,15 @@ static inline void ext4_update_inode_fsync_trans(handle_t *handle,
 		ei->i_sync_tid = handle->h_transaction->t_tid;
 		if (datasync)
 			ei->i_datasync_tid = handle->h_transaction->t_tid;
-	}
+	} else {
+		struct request_queue *q = bdev_get_queue(inode->i_sb->s_bdev);
+		if (q)
+			atomic_set(&EXT4_I(inode)->i_flush_tag,
+				   atomic_read(&q->flush_tag));
+		else
+			atomic_set(&EXT4_I(inode)->i_flush_tag, UINT_MAX);
+ 	}
+
 }
 
 /* super.c */
diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c
index 8a0dee8..b02d1ec 100644
--- a/fs/ext4/fsync.c
+++ b/fs/ext4/fsync.c
@@ -116,10 +116,10 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	struct inode *inode = file->f_mapping->host;
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
+	bool needs_barrier = journal->j_flags & JBD2_BARRIER;
+	struct request_queue *q = bdev_get_queue(inode->i_sb->s_bdev);
 	int ret, err;
 	tid_t commit_tid;
-	bool needs_barrier = false;
-
 	J_ASSERT(ext4_journal_current_handle() == NULL);
 
 	trace_ext4_sync_file_enter(file, datasync);
@@ -163,10 +163,16 @@ int ext4_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
 	}
 
 	commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid;
-	if (journal->j_flags & JBD2_BARRIER &&
-	    !jbd2_trans_will_send_data_barrier(journal, &commit_tid))
-		needs_barrier = true;
 	ret = jbd2_complete_transaction(journal, commit_tid);
+	/*
+	 * We must send a barrier unless we can guarantee that:
+	 * the latest io-request for the given inode was completed before
+	 * a new flush request was QUEUED and COMPLETED by the blkdev.
+	 */
+	if (q && ((unsigned int)atomic_read(&q->flush_tag) & ~1U)
+	    > (((unsigned int)atomic_read(&ei->i_flush_tag) + 1U) & (~1U)))
+		needs_barrier = 0;
+
 	if (needs_barrier) {
 		err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);
 		if (!ret)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1be5827..761513c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3073,11 +3073,12 @@ static void ext4_end_io_dio(struct kiocb *iocb, loff_t offset,
 		  size);
 
 	iocb->private = NULL;
-
 	/* if not aio dio with unwritten extents, just free io and return */
 	if (!(io_end->flag & EXT4_IO_END_UNWRITTEN)) {
 		ext4_free_io_end(io_end);
 out:
+		if (size)
+			ext4_update_inode_fsync_trans(NULL, inode, 1);
 		inode_dio_done(inode);
 		if (is_async)
 			aio_complete(iocb, ret, 0);
diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 047a6de..8a2a09b 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -282,7 +282,7 @@ static void ext4_end_bio(struct bio *bio, int error)
 	}
 	io_end->num_io_pages = 0;
 	inode = io_end->inode;
-
+	ext4_update_inode_fsync_trans(NULL, inode, 1);
 	if (error) {
 		io_end->flag |= EXT4_IO_END_ERROR;
 		ext4_warning(inode->i_sb, "I/O error writing to inode %lu "
-- 
1.7.1



* Re: [PATCH 1/2] blkdev: add flush generation counter
  2013-04-14 19:34 ` [PATCH 1/2] blkdev: add flush generation counter Dmitry Monakhov
  2013-04-14 19:34   ` [PATCH 2/2] ext4: Add fdatasync scalability optimization Dmitry Monakhov
@ 2013-04-15 14:14   ` Jan Kara
  2013-04-17  8:46     ` Dmitry Monakhov
  1 sibling, 1 reply; 6+ messages in thread
From: Jan Kara @ 2013-04-15 14:14 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: linux-kernel, linux-ext4, jack, axboe

On Sun 14-04-13 23:34:27, Dmitry Monakhov wrote:
> Callers may use this counter to optimize flushes
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  block/blk-core.c       |    1 +
>  block/blk-flush.c      |    3 ++-
>  include/linux/blkdev.h |    1 +
>  3 files changed, 4 insertions(+), 1 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 074b758..afb5a4b 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -537,6 +537,7 @@ void blk_cleanup_queue(struct request_queue *q)
>  	spin_unlock_irq(lock);
>  	mutex_unlock(&q->sysfs_lock);
>  
> +	atomic_set(&q->flush_tag, 0);
>  	/*
>  	 * Drain all requests queued before DYING marking. Set DEAD flag to
>  	 * prevent that q->request_fn() gets invoked after draining finished.
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index cc2b827..b1adc75 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -203,7 +203,7 @@ static void flush_end_io(struct request *flush_rq, int error)
>  	/* account completion of the flush request */
>  	q->flush_running_idx ^= 1;
>  	elv_completed_request(q, flush_rq);
> -
> +	atomic_inc(&q->flush_tag);
>  	/* and push the waiting requests to the next stage */
>  	list_for_each_entry_safe(rq, n, running, flush.list) {
>  		unsigned int seq = blk_flush_cur_seq(rq);
> @@ -268,6 +268,7 @@ static bool blk_kick_flush(struct request_queue *q)
>  	q->flush_rq.end_io = flush_end_io;
>  
>  	q->flush_pending_idx ^= 1;
> +	atomic_inc(&q->flush_tag);
>  	list_add_tail(&q->flush_rq.queuelist, &q->queue_head);
>  	return true;
>  }
  But this doesn't quite work, does it? When the fs reads the flush_tag
counter, a CACHE_FLUSH command can already be issued, so you are not sure
how the IO you are issuing relates to the cache flush in flight. If you
want to make this work, you really need two counters - one for flush
submissions and one for flush completions. The fs would need to read the
submission counter before issuing the IO and the completion counter when
it wants to make sure the data is on stable storage. But it all gets
somewhat complex and I'm not sure it's worth it given that the block
layer tries to merge flush requests by itself. But you can try for a
correct implementation and measure the performance impact, of course...
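
A minimal sketch of such a two-counter scheme (every name here is
hypothetical and purely illustrative, not an existing kernel API; the
sample is taken when the data IO completes, which is the point the ext4
patch in this series uses):

struct flush_gen {
	atomic_t submitted;	/* bumped when a CACHE_FLUSH is issued */
	atomic_t completed;	/* bumped when a CACHE_FLUSH completes */
};

/*
 * Sample in the data bio's end_io: at that moment exactly 'submitted'
 * flushes have been issued, so flush number sample+1 can only be
 * issued after our data has reached the device cache.
 */
static inline unsigned int flush_gen_sample(struct flush_gen *fg)
{
	return atomic_read(&fg->submitted);
}

/*
 * The data is on stable storage once a flush issued after the sample
 * has completed, i.e. the completion counter moved past the sample.
 */
static inline bool flush_gen_data_stable(struct flush_gen *fg,
					 unsigned int sample)
{
	return atomic_read(&fg->completed) > sample;
}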

								Honza
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 78feda9..e079fbd 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -416,6 +416,7 @@ struct request_queue {
>  	unsigned int		flush_queue_delayed:1;
>  	unsigned int		flush_pending_idx:1;
>  	unsigned int		flush_running_idx:1;
> +	atomic_t		flush_tag;
>  	unsigned long		flush_pending_since;
>  	struct list_head	flush_queue[2];
>  	struct list_head	flush_data_in_flight;
> -- 
> 1.7.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


* Re: [PATCH 1/2] blkdev: add flush generation counter
  2013-04-15 14:14   ` [PATCH 1/2] blkdev: add flush generation counter Jan Kara
@ 2013-04-17  8:46     ` Dmitry Monakhov
  2013-04-18 14:34       ` Jan Kara
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry Monakhov @ 2013-04-17  8:46 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-ext4, jack, axboe

On Mon, 15 Apr 2013 16:14:39 +0200, Jan Kara <jack@suse.cz> wrote:
> On Sun 14-04-13 23:34:27, Dmitry Monakhov wrote:
> > Callers may use this counter to optimize flushes
> > 
> > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> > ---
> >  block/blk-core.c       |    1 +
> >  block/blk-flush.c      |    3 ++-
> >  include/linux/blkdev.h |    1 +
> >  3 files changed, 4 insertions(+), 1 deletions(-)
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 074b758..afb5a4b 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -537,6 +537,7 @@ void blk_cleanup_queue(struct request_queue *q)
> >  	spin_unlock_irq(lock);
> >  	mutex_unlock(&q->sysfs_lock);
> >  
> > +	atomic_set(&q->flush_tag, 0);
> >  	/*
> >  	 * Drain all requests queued before DYING marking. Set DEAD flag to
> >  	 * prevent that q->request_fn() gets invoked after draining finished.
> > diff --git a/block/blk-flush.c b/block/blk-flush.c
> > index cc2b827..b1adc75 100644
> > --- a/block/blk-flush.c
> > +++ b/block/blk-flush.c
> > @@ -203,7 +203,7 @@ static void flush_end_io(struct request *flush_rq, int error)
> >  	/* account completion of the flush request */
> >  	q->flush_running_idx ^= 1;
> >  	elv_completed_request(q, flush_rq);
> > -
> > +	atomic_inc(&q->flush_tag);
> >  	/* and push the waiting requests to the next stage */
> >  	list_for_each_entry_safe(rq, n, running, flush.list) {
> >  		unsigned int seq = blk_flush_cur_seq(rq);
> > @@ -268,6 +268,7 @@ static bool blk_kick_flush(struct request_queue *q)
> >  	q->flush_rq.end_io = flush_end_io;
> >  
> >  	q->flush_pending_idx ^= 1;
> > +	atomic_inc(&q->flush_tag);
> >  	list_add_tail(&q->flush_rq.queuelist, &q->queue_head);
> >  	return true;
> >  }
>   But this doesn't quite work, does it? When the fs reads the flush_tag
> counter, a CACHE_FLUSH command can already be issued, so you are not sure
> how the IO you are issuing relates to the cache flush in flight. If you
> want to make this work, you really need two counters - one for flush
> submissions and one for flush completions. The fs would need to read the
> submission counter before issuing the IO and the completion counter when
> it wants to make sure the data is on stable storage. But it all gets
> somewhat complex and I'm not sure it's worth it given that the block
> layer tries to merge flush requests by itself. But you can try for a
> correct implementation and measure the performance impact, of course...

Probably I've missed some good comments, but I've tried to make things
correct. Here are the flush implementation assumptions from
block/blk-flush.c:

Currently, the following conditions are used to determine when to issue
a flush.

C1. At any given time, only one flush shall be in progress.  This makes
    double buffering sufficient.
C2. Flush is deferred if any request is executing DATA of its sequence.
    This avoids issuing separate POSTFLUSHes for requests which shared
    PREFLUSH.
C3. The second condition is ignored if there is a request which has
    waited longer than FLUSH_PENDING_TIMEOUT.  This is to avoid
    starvation in the unlikely case where there is a continuous stream
    of FUA (without FLUSH) requests.

So if we increment the flush_tag counter in two places:
blk_kick_flush: the place where the flush request is issued
flush_end_io:   the place where the flush is completed
then by using rule (C1) we can guarantee that:

if (flush_tag & 0x1 == 1) then a flush is in progress
if (flush_tag & 0x1 == 0) then flush (flush_tag & ~0x1) has completed

In other words we can define:

FLUSH_TAG_IDX   = (flush_tag + 1) & ~0x1
FLUSH_TAG_STATE = flush_tag & 0x1 ? IN_PROGRESS : COMPLETED

After that we can define the rule for the flush optimization.
We can be sure that our data was flushed only if:
1) the data's bio was completed before a flush request was QUEUED
   and COMPLETED
In terms of formulas we can write it as follows:
is_data_flushed = (blkdev->flush_tag & ~(0x1)) >
                  ((data->flush_tag + 0x1) & (~0x1))
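
Wrapped as a helper, that condition might look like the following
sketch (the function name is made up; the test is exactly the one
above, which patch 2/2 open-codes in ext4_sync_file()):

/*
 * blkdev_tag: current value of q->flush_tag
 * data_tag:   q->flush_tag sampled when the data bio completed
 *
 * True iff some flush request was QUEUED and then COMPLETED strictly
 * after the data bio completed, i.e. the data is on stable media.
 */
static inline bool blk_data_flushed(unsigned int blkdev_tag,
				    unsigned int data_tag)
{
	return (blkdev_tag & ~1U) > ((data_tag + 1U) & ~1U);
}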

> 
> 								Honza
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index 78feda9..e079fbd 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -416,6 +416,7 @@ struct request_queue {
> >  	unsigned int		flush_queue_delayed:1;
> >  	unsigned int		flush_pending_idx:1;
> >  	unsigned int		flush_running_idx:1;
> > +	atomic_t		flush_tag;
> >  	unsigned long		flush_pending_since;
> >  	struct list_head	flush_queue[2];
> >  	struct list_head	flush_data_in_flight;
> > -- 
> > 1.7.1
> > 
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR


* Re: [PATCH 1/2] blkdev: add flush generation counter
  2013-04-17  8:46     ` Dmitry Monakhov
@ 2013-04-18 14:34       ` Jan Kara
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Kara @ 2013-04-18 14:34 UTC (permalink / raw)
  To: Dmitry Monakhov; +Cc: Jan Kara, linux-kernel, linux-ext4, axboe

On Wed 17-04-13 12:46:43, Dmitry Monakhov wrote:
> On Mon, 15 Apr 2013 16:14:39 +0200, Jan Kara <jack@suse.cz> wrote:
> > On Sun 14-04-13 23:34:27, Dmitry Monakhov wrote:
> > > Callers may use this counter to optimize flushes
> > > 
> > > Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> > > ---
> > >  block/blk-core.c       |    1 +
> > >  block/blk-flush.c      |    3 ++-
> > >  include/linux/blkdev.h |    1 +
> > >  3 files changed, 4 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 074b758..afb5a4b 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -537,6 +537,7 @@ void blk_cleanup_queue(struct request_queue *q)
> > >  	spin_unlock_irq(lock);
> > >  	mutex_unlock(&q->sysfs_lock);
> > >  
> > > +	atomic_set(&q->flush_tag, 0);
> > >  	/*
> > >  	 * Drain all requests queued before DYING marking. Set DEAD flag to
> > >  	 * prevent that q->request_fn() gets invoked after draining finished.
> > > diff --git a/block/blk-flush.c b/block/blk-flush.c
> > > index cc2b827..b1adc75 100644
> > > --- a/block/blk-flush.c
> > > +++ b/block/blk-flush.c
> > > @@ -203,7 +203,7 @@ static void flush_end_io(struct request *flush_rq, int error)
> > >  	/* account completion of the flush request */
> > >  	q->flush_running_idx ^= 1;
> > >  	elv_completed_request(q, flush_rq);
> > > -
> > > +	atomic_inc(&q->flush_tag);
> > >  	/* and push the waiting requests to the next stage */
> > >  	list_for_each_entry_safe(rq, n, running, flush.list) {
> > >  		unsigned int seq = blk_flush_cur_seq(rq);
> > > @@ -268,6 +268,7 @@ static bool blk_kick_flush(struct request_queue *q)
> > >  	q->flush_rq.end_io = flush_end_io;
> > >  
> > >  	q->flush_pending_idx ^= 1;
> > > +	atomic_inc(&q->flush_tag);
> > >  	list_add_tail(&q->flush_rq.queuelist, &q->queue_head);
> > >  	return true;
> > >  }
> >   But this doesn't quite work, does it? When the fs reads the flush_tag
> > counter, a CACHE_FLUSH command can already be issued, so you are not sure
> > how the IO you are issuing relates to the cache flush in flight. If you
> > want to make this work, you really need two counters - one for flush
> > submissions and one for flush completions. The fs would need to read the
> > submission counter before issuing the IO and the completion counter when
> > it wants to make sure the data is on stable storage. But it all gets
> > somewhat complex and I'm not sure it's worth it given that the block
> > layer tries to merge flush requests by itself. But you can try for a
> > correct implementation and measure the performance impact, of course...
> 
> Probably I've missed some good comments, but I've tried to make things
> correct. Here are the flush implementation assumptions from
> block/blk-flush.c:
> 
> Currently, the following conditions are used to determine when to issue
> a flush.
> 
> C1. At any given time, only one flush shall be in progress.  This makes
>     double buffering sufficient.
> C2. Flush is deferred if any request is executing DATA of its sequence.
>     This avoids issuing separate POSTFLUSHes for requests which shared
>     PREFLUSH.
> C3. The second condition is ignored if there is a request which has
>     waited longer than FLUSH_PENDING_TIMEOUT.  This is to avoid
>     starvation in the unlikely case where there is a continuous stream
>     of FUA (without FLUSH) requests.
> 
> So if we increment the flush_tag counter in two places:
> blk_kick_flush: the place where the flush request is issued
> flush_end_io:   the place where the flush is completed
> then by using rule (C1) we can guarantee that:
> 
> if (flush_tag & 0x1 == 1) then a flush is in progress
> if (flush_tag & 0x1 == 0) then flush (flush_tag & ~0x1) has completed
> 
> In other words we can define:
> 
> FLUSH_TAG_IDX   = (flush_tag + 1) & ~0x1
> FLUSH_TAG_STATE = flush_tag & 0x1 ? IN_PROGRESS : COMPLETED
> 
> After that we can define the rule for the flush optimization.
> We can be sure that our data was flushed only if:
> 1) the data's bio was completed before a flush request was QUEUED
>    and COMPLETED
> In terms of formulas we can write it as follows:
> is_data_flushed = (blkdev->flush_tag & ~(0x1)) >
>                   ((data->flush_tag + 0x1) & (~0x1))
  Ah, clever! But then please
a) create a function with the condition so that it doesn't have to be
   opencoded in every place using the counter
b) add a detailed comment before that function explaining why it works.
   Something like what you wrote in this email...

  Thanks!

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
