* [patch]btrfs: finish read pages in the order they are submitted
From: Shaohua Li @ 2010-02-03 7:45 UTC (permalink / raw)
To: linux-btrfs
The endio is done in reverse order of the bio vectors. That means that for a
sequential read, the page submitted first finishes last within a bio.
Considering we do a checksum (which makes the cache hot) for every page, this
introduces a delay (and a chance of evicting cache that will be used soon) for
pages submitted at the beginning. I don't observe an obvious performance
difference with the patch below in my simple test, but it seems more natural to
finish reads in the order they are submitted.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 96577e8..4df0c56 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1750,7 +1750,8 @@ static void end_bio_extent_writepage(struct bio *bio, int err)
 static void end_bio_extent_readpage(struct bio *bio, int err)
 {
 	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bvec = bio->bi_io_vec;
 	struct extent_io_tree *tree;
 	u64 start;
 	u64 end;
@@ -1773,7 +1774,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 		else
 			whole_page = 0;

-		if (--bvec >= bio->bi_io_vec)
+		if (++bvec <= bvec_end)
 			prefetchw(&bvec->bv_page->flags);

 		if (uptodate && tree->ops && tree->ops->readpage_end_io_hook) {
@@ -1818,7 +1819,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 			}
 			check_page_locked(tree, page);
 		}
-	} while (bvec >= bio->bi_io_vec);
+	} while (bvec <= bvec_end);

 	bio_put(bio);
 }
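
For readers who want to see the ordering change in isolation, here is a minimal userspace sketch of the two loop shapes. It is not kernel code: `struct fake_vec`, `complete_forward()` and `complete_reverse()` are hypothetical names made up for illustration, and the per-page work (checksum, unlock) is reduced to recording the page index. The reverse walk uses an index rather than a pointer so the sketch never forms a pointer before the start of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec: it only remembers its page. */
struct fake_vec {
	int page_index;
};

/*
 * Patched shape: start at the first vector and walk up to the last one.
 * out[] receives the page indices in the order they are "completed".
 */
static size_t complete_forward(const struct fake_vec *vecs, size_t n, int *out)
{
	const struct fake_vec *vec = vecs;
	const struct fake_vec *vec_end = vecs + n - 1;
	size_t done = 0;

	assert(n > 0);	/* assumed: at least one vector per bio */
	do {
		const struct fake_vec *cur = vec;

		++vec;	/* the patch prefetches the next vector at this point */
		out[done++] = cur->page_index;
	} while (vec <= vec_end);

	return done;
}

/* Unpatched shape: start at the last vector and walk back to the first. */
static size_t complete_reverse(const struct fake_vec *vecs, size_t n, int *out)
{
	size_t i = n;
	size_t done = 0;

	assert(n > 0);
	do {
		--i;
		out[done++] = vecs[i].page_index;
	} while (i > 0);

	return done;
}
```

Called on three vectors holding pages 10, 11 and 12, `complete_forward()` fills `out` with 10, 11, 12 and `complete_reverse()` fills it with 12, 11, 10, which is the difference the patch is about.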
* Re: [patch]btrfs: finish read pages in the order they are submitted
From: Chris Mason @ 2010-02-03 18:18 UTC (permalink / raw)
To: Shaohua Li; +Cc: linux-btrfs, jens.axboe
On Wed, Feb 03, 2010 at 03:45:11PM +0800, Shaohua Li wrote:
> The endio is done in reverse order of the bio vectors. That means that for a
> sequential read, the page submitted first finishes last within a bio.
> Considering we do a checksum (which makes the cache hot) for every page, this
> introduces a delay (and a chance of evicting cache that will be used soon) for
> pages submitted at the beginning. I don't observe an obvious performance
> difference with the patch below in my simple test, but it seems more natural to
> finish reads in the order they are submitted.
Interesting, I wonder if we'd be able to see this on a higher throughput
system. Jens, care to give it a shot (patch below)?
-chris
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 96577e8..4df0c56 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1750,7 +1750,8 @@ static void end_bio_extent_writepage(struct bio *bio, int err)
 static void end_bio_extent_readpage(struct bio *bio, int err)
 {
 	int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
-	struct bio_vec *bvec = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bvec_end = bio->bi_io_vec + bio->bi_vcnt - 1;
+	struct bio_vec *bvec = bio->bi_io_vec;
 	struct extent_io_tree *tree;
 	u64 start;
 	u64 end;
@@ -1773,7 +1774,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 		else
 			whole_page = 0;

-		if (--bvec >= bio->bi_io_vec)
+		if (++bvec <= bvec_end)
 			prefetchw(&bvec->bv_page->flags);

 		if (uptodate && tree->ops && tree->ops->readpage_end_io_hook) {
@@ -1818,7 +1819,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 			}
 			check_page_locked(tree, page);
 		}
-	} while (bvec >= bio->bi_io_vec);
+	} while (bvec <= bvec_end);

 	bio_put(bio);
 }
* Re: [patch]btrfs: finish read pages in the order they are submitted
From: Jens Axboe @ 2010-02-08 10:59 UTC (permalink / raw)
To: Chris Mason; +Cc: Shaohua Li, linux-btrfs
On Wed, Feb 03 2010, Chris Mason wrote:
> On Wed, Feb 03, 2010 at 03:45:11PM +0800, Shaohua Li wrote:
> > The endio is done in reverse order of the bio vectors. That means that for a
> > sequential read, the page submitted first finishes last within a bio.
> > Considering we do a checksum (which makes the cache hot) for every page, this
> > introduces a delay (and a chance of evicting cache that will be used soon) for
> > pages submitted at the beginning. I don't observe an obvious performance
> > difference with the patch below in my simple test, but it seems more natural to
> > finish reads in the order they are submitted.
>
> Interesting, I wonder if we'd be able to see this on a higher throughput
> system. Jens, care to give it a shot (patch below)?
Sure, I gave it a spin. Baseline is current -git (-rc7'ish), and the
workload is just streaming reads of eight 16GB files. I used large
streaming reads as the bigger ios would hopefully help show the effect of
doing the reverse completions. The run takes ~1 minute, and the results
are averaged over 3 runs.
Throughput:

Kernel      Slowest       Fastest       Average
-------------------------------------------------------
baseline    2041MB/sec    2229MB/sec    2155MB/sec
patched     2052MB/sec    2071MB/sec    2062MB/sec

Completion latency average (msecs):

Kernel      Best    Worst   Average
-------------------------------------------------------
baseline    1.72    1.89    1.79
patched     1.83    1.89    1.85

We'd probably need a LOT more runs to get a statistically significant
number here. It would be nice if O_DIRECT worked (hint, hint!), which
usually makes these things easier to test. If I look at the throughput
over the course of the runs, the baseline usually starts a little slower
(1.8GB/sec or so) and gets faster, while the patched run starts much
higher (close to 3.0GB/sec) and drops to 2.0GB/sec for the rest of the run.
So I did some perf stat checks too, to see if we see an improvement for
cache utilization. Results below.
Cache stats (millions)

Kernel      References    Misses
----------------------------------------------
baseline    3547          2387
patched     3822          2351

These numbers are very stable, the above were also averaged over 3 runs,
but variability was very low.
My feeling is that the patch should be included. Cache misses are
provably down and the patch makes a lot of sense just logically. The
patched runs seemed more stable, and my gut tells me that the unpatched
runs may have been a bit flukey (one fast run, should probably be
excluded).
Let me know if you want more tests.
--
Jens Axboe
* Re: [patch]btrfs: finish read pages in the order they are submitted
From: Jens Axboe @ 2010-02-08 11:44 UTC (permalink / raw)
To: Chris Mason; +Cc: Shaohua Li, linux-btrfs
On Mon, Feb 08 2010, Jens Axboe wrote:
> Cache stats (millions)
>
> Kernel      References    Misses
> ----------------------------------------------
> baseline    3547          2387
> patched     3822          2351
>
> These numbers are very stable, the above were also averaged over 3 runs,
> but variability was very low.
Update on this. I set up the storage system for more stable runs and
repeated the above test. It runs a bit faster as well, completing the
workload at 2.5GB/sec on average.
Cache stats (millions)

Kernel          References    Misses
----------------------------------------------
baseline        3384          2318
baseline        3417          2313
baseline        3382          2323
baseline avg    3394          2318

patched         3518          2258
patched         3428          2201
patched         3536          2274
patched avg     3494          2244

So for those runs, ~3% more references and ~3% fewer misses. Even with the
variability here, that looks like a win in my book.
--
Jens Axboe
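
The "~3%" figures follow directly from the per-run numbers in the table above. As a small worked check, the helpers below compute a mean and a relative change; `mean()` and `percent_change()` are hypothetical names used only for this illustration and are not part of any tool used in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];

	return sum / (double)n;
}

/* Relative change from base to patched, in percent (negative = decrease). */
static double percent_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}
```

Feeding in the three baseline and three patched runs (in millions), `percent_change(mean(base_refs, 3), mean(patched_refs, 3))` comes out a little under +3 for references, and the same calculation on the miss counts comes out a little over -3.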