* [Qemu-devel] [PATCH 1/2] raw-posix: Use pread/pwrite instead of lseek+read/write
@ 2010-04-19 12:34 Stefan Hajnoczi
2010-04-19 12:34 ` [Qemu-devel] [PATCH 2/2] block: Cache total_sectors to reduce bdrv_getlength calls Stefan Hajnoczi
2010-04-20 8:37 ` [Qemu-devel] [PATCH 1/2] raw-posix: Use pread/pwrite instead of lseek+read/write Kevin Wolf
0 siblings, 2 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2010-04-19 12:34 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Jan Kiszka, Christoph Hellwig, Stefan Hajnoczi
This patch combines the lseek+read/write calls to use pread/pwrite
instead. This will result in fewer system calls and is already used by
AIO.
Thanks to Jan Kiszka <jan.kiszka@siemens.com> for identifying excessive
lseek and Christoph Hellwig <hch@lst.de> for confirming that this
approach should work.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
block/raw-posix.c | 37 ++++---------------------------------
1 files changed, 4 insertions(+), 33 deletions(-)
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 598ea19..7541ed2 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -105,7 +105,6 @@
typedef struct BDRVRawState {
int fd;
int type;
- unsigned int lseek_err_cnt;
int open_flags;
#if defined(__linux__)
/* linux floppy specific */
@@ -134,8 +133,6 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
BDRVRawState *s = bs->opaque;
int fd, ret;
- s->lseek_err_cnt = 0;
-
s->open_flags = open_flags | O_BINARY;
s->open_flags &= ~O_ACCMODE;
if (bdrv_flags & BDRV_O_RDWR) {
@@ -243,19 +240,7 @@ static int raw_pread_aligned(BlockDriverState *bs, int64_t offset,
if (ret < 0)
return ret;
- if (offset >= 0 && lseek(s->fd, offset, SEEK_SET) == (off_t)-1) {
- ++(s->lseek_err_cnt);
- if(s->lseek_err_cnt <= 10) {
- DEBUG_BLOCK_PRINT("raw_pread(%d:%s, %" PRId64 ", %p, %d) [%" PRId64
- "] lseek failed : %d = %s\n",
- s->fd, bs->filename, offset, buf, count,
- bs->total_sectors, errno, strerror(errno));
- }
- return -1;
- }
- s->lseek_err_cnt=0;
-
- ret = read(s->fd, buf, count);
+ ret = pread(s->fd, buf, count, offset);
if (ret == count)
goto label__raw_read__success;
@@ -276,12 +261,10 @@ static int raw_pread_aligned(BlockDriverState *bs, int64_t offset,
/* Try harder for CDrom. */
if (bs->type == BDRV_TYPE_CDROM) {
- lseek(s->fd, offset, SEEK_SET);
- ret = read(s->fd, buf, count);
+ ret = pread(s->fd, buf, count, offset);
if (ret == count)
goto label__raw_read__success;
- lseek(s->fd, offset, SEEK_SET);
- ret = read(s->fd, buf, count);
+ ret = pread(s->fd, buf, count, offset);
if (ret == count)
goto label__raw_read__success;
@@ -313,19 +296,7 @@ static int raw_pwrite_aligned(BlockDriverState *bs, int64_t offset,
if (ret < 0)
return -errno;
- if (offset >= 0 && lseek(s->fd, offset, SEEK_SET) == (off_t)-1) {
- ++(s->lseek_err_cnt);
- if(s->lseek_err_cnt) {
- DEBUG_BLOCK_PRINT("raw_pwrite(%d:%s, %" PRId64 ", %p, %d) [%"
- PRId64 "] lseek failed : %d = %s\n",
- s->fd, bs->filename, offset, buf, count,
- bs->total_sectors, errno, strerror(errno));
- }
- return -EIO;
- }
- s->lseek_err_cnt = 0;
-
- ret = write(s->fd, buf, count);
+ ret = pwrite(s->fd, buf, count, offset);
if (ret == count)
goto label__raw_write__success;
--
1.7.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [Qemu-devel] [PATCH 2/2] block: Cache total_sectors to reduce bdrv_getlength calls
2010-04-19 12:34 [Qemu-devel] [PATCH 1/2] raw-posix: Use pread/pwrite instead of lseek+read/write Stefan Hajnoczi
@ 2010-04-19 12:34 ` Stefan Hajnoczi
2010-04-19 14:10 ` [Qemu-devel] " Kevin Wolf
2010-04-20 8:37 ` [Qemu-devel] [PATCH 1/2] raw-posix: Use pread/pwrite instead of lseek+read/write Kevin Wolf
1 sibling, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2010-04-19 12:34 UTC (permalink / raw)
To: qemu-devel; +Cc: Kevin Wolf, Jan Kiszka, Christoph Hellwig, Stefan Hajnoczi
The BlockDriver bdrv_getlength function is called from the I/O code path
when checking that the request falls within the device. Unfortunately
this involves an lseek system call in the raw protocol; every read or
write request will incur this lseek cost.
Jan Kiszka <jan.kiszka@siemens.com> identified this issue and its
latency overhead. This patch caches device length in the existing
total_sectors variable so lseek calls can be avoided for fixed size
devices.
Growable devices fall back to the full bdrv_getlength code path because
I have not added logic to detect extending the size of the device in a
write.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
block.c | 28 ++++++++++++++++++++++------
1 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/block.c b/block.c
index def3400..d5a3ba7 100644
--- a/block.c
+++ b/block.c
@@ -363,6 +363,7 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
assert(drv != NULL);
bs->file = NULL;
+ bs->total_sectors = 0;
bs->is_temporary = 0;
bs->encrypted = 0;
bs->valid_key = 0;
@@ -416,9 +417,7 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
}
bs->keep_read_only = bs->read_only = !(open_flags & BDRV_O_RDWR);
- if (drv->bdrv_getlength) {
- bs->total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
- }
+ bs->total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
#ifndef _WIN32
if (bs->is_temporary) {
unlink(filename);
@@ -959,13 +958,26 @@ int bdrv_pwrite(BlockDriverState *bs, int64_t offset,
int bdrv_truncate(BlockDriverState *bs, int64_t offset)
{
BlockDriver *drv = bs->drv;
+ int ret;
if (!drv)
return -ENOMEDIUM;
if (!drv->bdrv_truncate)
return -ENOTSUP;
if (bs->read_only)
return -EACCES;
- return drv->bdrv_truncate(bs, offset);
+ ret = drv->bdrv_truncate(bs, offset);
+ if (ret < 0) {
+ return ret;
+ }
+
+ /* refresh total sectors */
+ if (drv->bdrv_getlength) {
+ bs->total_sectors = 0; /* discard cached value */
+ bs->total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
+ } else {
+ bs->total_sectors = offset >> BDRV_SECTOR_BITS;
+ }
+ return ret;
}
/**
@@ -976,8 +988,12 @@ int64_t bdrv_getlength(BlockDriverState *bs)
BlockDriver *drv = bs->drv;
if (!drv)
return -ENOMEDIUM;
- if (!drv->bdrv_getlength) {
- /* legacy mode */
+
+ /* Fixed size devices use the total_sectors value for speed instead of
+ issuing a length query (like lseek) on each call. Also, legacy block
+ drivers don't provide a bdrv_getlength function and must use
+ total_sectors. */
+ if ((bs->total_sectors && !bs->growable) || !drv->bdrv_getlength) {
return bs->total_sectors * BDRV_SECTOR_SIZE;
}
return drv->bdrv_getlength(bs);
--
1.7.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [Qemu-devel] Re: [PATCH 2/2] block: Cache total_sectors to reduce bdrv_getlength calls
2010-04-19 12:34 ` [Qemu-devel] [PATCH 2/2] block: Cache total_sectors to reduce bdrv_getlength calls Stefan Hajnoczi
@ 2010-04-19 14:10 ` Kevin Wolf
2010-04-19 14:26 ` Stefan Hajnoczi
0 siblings, 1 reply; 6+ messages in thread
From: Kevin Wolf @ 2010-04-19 14:10 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Jan Kiszka, qemu-devel, Christoph Hellwig
Am 19.04.2010 14:34, schrieb Stefan Hajnoczi:
> The BlockDriver bdrv_getlength function is called from the I/O code path
> when checking that the request falls within the device. Unfortunately
> this involves an lseek system call in the raw protocol; every read or
> write request will incur this lseek cost.
>
> Jan Kiszka <jan.kiszka@siemens.com> identified this issue and its
> latency overhead. This patch caches device length in the existing
> total_sectors variable so lseek calls can be avoided for fixed size
> devices.
>
> Growable devices fall back to the full bdrv_getlength code path because
> I have not added logic to detect extending the size of the device in a
> write.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
> block.c | 28 ++++++++++++++++++++++------
> 1 files changed, 22 insertions(+), 6 deletions(-)
>
> diff --git a/block.c b/block.c
> index def3400..d5a3ba7 100644
> --- a/block.c
> +++ b/block.c
> @@ -363,6 +363,7 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
> assert(drv != NULL);
>
> bs->file = NULL;
> + bs->total_sectors = 0;
> bs->is_temporary = 0;
> bs->encrypted = 0;
> bs->valid_key = 0;
> @@ -416,9 +417,7 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
> }
>
> bs->keep_read_only = bs->read_only = !(open_flags & BDRV_O_RDWR);
> - if (drv->bdrv_getlength) {
> - bs->total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
> - }
> + bs->total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
Does this hunk make a difference? If drv->bdrv_getlength == NULL, we'll
just get back the current value.
But now that you sent this hunk for review, one thing about the existing
code: We should probably check the return value of bdrv_getlength.
Otherwise both patches look good to me.
Kevin
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 2/2] block: Cache total_sectors to reduce bdrv_getlength calls
2010-04-19 14:10 ` [Qemu-devel] " Kevin Wolf
@ 2010-04-19 14:26 ` Stefan Hajnoczi
2010-04-19 14:31 ` Kevin Wolf
0 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2010-04-19 14:26 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Jan Kiszka, Christoph Hellwig, Stefan Hajnoczi, qemu-devel
On Mon, Apr 19, 2010 at 3:10 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> @@ -416,9 +417,7 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
>> }
>>
>> bs->keep_read_only = bs->read_only = !(open_flags & BDRV_O_RDWR);
>> - if (drv->bdrv_getlength) {
>> - bs->total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
>> - }
>> + bs->total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
>
> Does this hunk make a difference? If drv->bdrv_getlength == NULL, we'll
> just get back the current value.
The if statement could be left as is. I removed it to reduce the
number of places where if (drv->bdrv_getlength) is explicitly checked.
If callers don't know the internals of bdrv_getlength() then it is
easier to extend it without auditing and changing callers.
Having said that, I did add an if (drv->bdrv_getlength) check into
bdrv_truncate...
> But now that you sent this hunk for review, one thing about the existing
> code: We should probably check the return value of bdrv_getlength.
You're right.
I'll clean this up and send a v2.
Stefan
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] Re: [PATCH 2/2] block: Cache total_sectors to reduce bdrv_getlength calls
2010-04-19 14:26 ` Stefan Hajnoczi
@ 2010-04-19 14:31 ` Kevin Wolf
0 siblings, 0 replies; 6+ messages in thread
From: Kevin Wolf @ 2010-04-19 14:31 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: Jan Kiszka, Christoph Hellwig, Stefan Hajnoczi, qemu-devel
Am 19.04.2010 16:26, schrieb Stefan Hajnoczi:
> On Mon, Apr 19, 2010 at 3:10 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> @@ -416,9 +417,7 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
>>> }
>>>
>>> bs->keep_read_only = bs->read_only = !(open_flags & BDRV_O_RDWR);
>>> - if (drv->bdrv_getlength) {
>>> - bs->total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
>>> - }
>>> + bs->total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
>>
>> Does this hunk make a difference? If drv->bdrv_getlength == NULL, we'll
>> just get back the current value.
>
> The if statement could be left as is. I removed it to reduce the
> number of places where if (drv->bdrv_getlength) is explicitly checked.
> If callers don't know the internals of bdrv_getlength() then it is
> easier to extend it without auditing and changing callers.
Makes sense, I'm not opposed to it.
> Having said that, I did add an if (drv->bdrv_getlength) check into
> bdrv_truncate...
Well, you probably can't do much about it there.
Kevin
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH 1/2] raw-posix: Use pread/pwrite instead of lseek+read/write
2010-04-19 12:34 [Qemu-devel] [PATCH 1/2] raw-posix: Use pread/pwrite instead of lseek+read/write Stefan Hajnoczi
2010-04-19 12:34 ` [Qemu-devel] [PATCH 2/2] block: Cache total_sectors to reduce bdrv_getlength calls Stefan Hajnoczi
@ 2010-04-20 8:37 ` Kevin Wolf
1 sibling, 0 replies; 6+ messages in thread
From: Kevin Wolf @ 2010-04-20 8:37 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Jan Kiszka, qemu-devel, Christoph Hellwig
Am 19.04.2010 14:34, schrieb Stefan Hajnoczi:
> This patch combines the lseek+read/write calls to use pread/pwrite
> instead. This will result in fewer system calls and is already used by
> AIO.
>
> Thanks to Jan Kiszka <jan.kiszka@siemens.com> for identifying excessive
> lseek and Christoph Hellwig <hch@lst.de> for confirming that this
> approach should work.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Thanks, applied to the block branch.
Kevin
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2010-04-20 8:37 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-19 12:34 [Qemu-devel] [PATCH 1/2] raw-posix: Use pread/pwrite instead of lseek+read/write Stefan Hajnoczi
2010-04-19 12:34 ` [Qemu-devel] [PATCH 2/2] block: Cache total_sectors to reduce bdrv_getlength calls Stefan Hajnoczi
2010-04-19 14:10 ` [Qemu-devel] " Kevin Wolf
2010-04-19 14:26 ` Stefan Hajnoczi
2010-04-19 14:31 ` Kevin Wolf
2010-04-20 8:37 ` [Qemu-devel] [PATCH 1/2] raw-posix: Use pread/pwrite instead of lseek+read/write Kevin Wolf
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.