* [Qemu-devel] [PATCH 0/1] block: change default memory alignment for block requests
From: Denis V. Lunev @ 2015-01-28 18:49 UTC
Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi
The following sequence
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    for (i = 0; i < 100000; i++)
            write(fd, buf, 4096);
performs 10% better if buf is aligned to 4096 bytes rather than to
512 bytes on an HDD with 512/4096 logical/physical sector size.

The difference is quite reliable.

I have used the following program to test:
#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <malloc.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    void *buf;
    int i = 0;

    /* Retry until the buffer is 512-byte but not 4096-byte aligned. */
    do {
        buf = memalign(512, 4096);    /* <--- replace 512 with 4096 */
        /* With 4096 this check never fires; skip the loop for that run. */
        if ((unsigned long)buf & 4095)
            break;
        i++;
    } while (1);
    printf("%d\n", i);    /* number of discarded allocations */

    memset(buf, 0x11, 4096);
    for (i = 0; i < 100000; i++)
        write(fd, buf, 4096);
    close(fd);
    return 0;
}
time for i in `seq 1 30` ; do a.out aa ; done
The file was placed on an 8 GB partition of the HDD described below to avoid
speed variations caused by different offsets on the disk. Results are reliable:
- 189 vs 180 seconds (512-byte vs 4096-byte aligned buffer) on Linux 3.16
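
The test program gets its "512 but not 4096" buffer by retrying memalign()
until the result happens to be misaligned. A minimal sketch of a
deterministic alternative, assuming it is acceptable to over-allocate by
512 bytes: take a 4096-aligned block and step 512 bytes into it. The helper
name alloc_512_only() is hypothetical and is not part of the program above.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the original test program): return a
 * pointer that is 512-byte aligned but deliberately NOT 4096-byte aligned.
 * *base receives the pointer that must later be passed to free().
 */
static void *alloc_512_only(size_t size, void **base)
{
    void *p = NULL;

    /* Over-allocate by 512 bytes so the shifted buffer still fits. */
    if (posix_memalign(&p, 4096, size + 512) != 0) {
        return NULL;
    }
    *base = p;
    /* 4096-aligned address + 512 is a multiple of 512 but not of 4096. */
    return (char *)p + 512;
}
```

For the 4096-aligned run the same allocation can be used without the offset,
so both variants of the benchmark share one code path.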
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
hades ~/src/qemu # hdparm -I /dev/sdg
/dev/sdg:
ATA device, with non-removable media
Model Number: WDC WD20EZRX-07D8PB0
Serial Number: WD-WCC4M5LVSAEP
Firmware Revision: 80.00A80
Transport: Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Supported: 9 8 7 6 5
Likely used: 9
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 3907029168
Logical Sector size: 512 bytes
Physical Sector size: 4096 bytes
device size with M = 1024*1024: 1907729 MBytes
device size with M = 1000*1000: 2000398 MBytes (2000 GB)
cache/buffer size = unknown
Nominal Media Rotation Rate: 5400
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, with device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
Power-Up In Standby feature set
* SET_FEATURES required to spinup after power up
SET_MAX security extension
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* 64-bit World wide name
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
* NCQ priority information
* READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
* DMA Setup Auto-Activate optimization
Device-initiated interface power management
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT Write Same (AC2)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
unknown 206[12] (vendor specific)
unknown 206[13] (vendor specific)
unknown 206[14] (vendor specific)
Security:
Master password revision code = 65534
supported
not enabled
not locked
frozen
not expired: security count
supported: enhanced erase
276min for SECURITY ERASE UNIT. 276min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50014ee2b5da838c
NAA : 5
IEEE OUI : 0014ee
Unique ID : 2b5da838c
Checksum: correct
hades ~/src/qemu #
* [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
From: Denis V. Lunev @ 2015-01-28 18:49 UTC
Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi
The following sequence
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    for (i = 0; i < 100000; i++)
            write(fd, buf, 4096);
performs 10% better if buf is aligned to 4096 bytes rather than to
512 bytes on an HDD with 512/4096 logical/physical sector size.

The difference is quite reliable.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
block.c | 4 ++--
block/raw-posix.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/block.c b/block.c
index d45e4dd..bc5d1e7 100644
--- a/block.c
+++ b/block.c
@@ -543,7 +543,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
         bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
         bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
     } else {
-        bs->bl.opt_mem_alignment = 512;
+        bs->bl.opt_mem_alignment = 4096;
     }
 
     if (bs->backing_hd) {
@@ -966,7 +966,7 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
 
     bs->open_flags = flags;
     bs->guest_block_size = 512;
-    bs->request_alignment = 512;
+    bs->request_alignment = 4096;
     bs->zero_beyond_eof = true;
     open_flags = bdrv_open_flags(bs, flags);
     bs->read_only = !(open_flags & BDRV_O_RDWR);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index ec38fee..d1b3388 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -266,7 +266,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
     if (!s->buf_align) {
         size_t align;
         buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
-        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
             if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
                 s->buf_align = align;
                 break;
@@ -278,7 +278,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
     if (!bs->request_alignment) {
         size_t align;
         buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
-        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
             if (pread(fd, buf, align, 0) >= 0) {
                 bs->request_alignment = align;
                 break;
--
1.9.1
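
Both loops touched in raw_probe_alignment() follow the same pattern: try a
pread() with increasing powers of two and keep the first value that works.
Below is a minimal standalone sketch of the request-size probe, assuming
plain POSIX calls in place of qemu_memalign(). The names
probe_request_alignment() and PROBE_MAX_BLOCKSIZE are made up for
illustration and do not exist in QEMU.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

#define PROBE_MAX_BLOCKSIZE 4096   /* assumed upper bound for this sketch */

/*
 * Return the smallest request size in [min_align, PROBE_MAX_BLOCKSIZE],
 * doubling from min_align, for which pread() at offset 0 succeeds.
 * Returns 0 if no size works, if min_align is 0, or if allocation fails.
 */
static size_t probe_request_alignment(int fd, size_t min_align)
{
    void *buf = NULL;
    size_t align, found = 0;

    if (min_align == 0) {
        return 0;
    }
    if (posix_memalign(&buf, PROBE_MAX_BLOCKSIZE, PROBE_MAX_BLOCKSIZE) != 0) {
        return 0;
    }
    for (align = min_align; align <= PROBE_MAX_BLOCKSIZE; align <<= 1) {
        if (pread(fd, buf, align, 0) >= 0) {
            found = align;   /* first size the file accepts */
            break;
        }
    }
    free(buf);
    return found;
}
```

Called with a lower bound of 4096, such a loop can never report 512, even
for a file that accepts 512-byte requests.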
* Re: [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
From: Denis V. Lunev @ 2015-01-28 19:59 UTC
Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi
On 28/01/15 21:49, Denis V. Lunev wrote:
> The following sequence
> int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
> for (i = 0; i < 100000; i++)
> write(fd, buf, 4096);
> performs 10% better if buf is aligned to 4096 bytes rather than to
> 512 bytes on HDD with 512/4096 logical/physical sector size.
>
> The difference is quite reliable.
Sorry, the patch is wrong: it breaks 'make check-block'.
I will redo it and perform more testing.

The request_alignment-related changes are wrong :(
I ran the tests without them, but added them as
an "obvious" last-minute addition.
* Re: [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
From: Paolo Bonzini @ 2015-01-28 20:07 UTC
To: Denis V. Lunev; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi
On 28/01/2015 19:49, Denis V. Lunev wrote:
> The following sequence
> int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
> for (i = 0; i < 100000; i++)
> write(fd, buf, 4096);
> performs 10% better if buf is aligned to 4096 bytes rather than to
> 512 bytes on HDD with 512/4096 logical/physical sector size.
>
> The difference is quite reliable.
The 10% difference, however, is probably not enough to cover the cost of
providing a bounce buffer if a guest is (rightfully) using a 512-byte
aligned buffer: bs->bl.opt_mem_alignment is in fact badly named and it
should be bs->bl.min_mem_alignment instead.
Instead, you probably should patch bdrv_opt_mem_align to return at least
4096, and leave the detection logic intact. This will let
qemu_blockalign return a properly aligned buffer to qemu-img and other
in-process allocations, without negatively affecting the guest.
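
The idea of returning "at least 4096" while leaving detection alone can be
sketched as a simple clamp. This is only an illustration, not the actual
QEMU code: struct limits_sketch, opt_mem_align_sketch() and
PREFERRED_MEM_ALIGN are hypothetical names standing in for the block limits
and for bdrv_opt_mem_align.

```c
#include <assert.h>
#include <stddef.h>

#define PREFERRED_MEM_ALIGN 4096   /* value discussed in this thread */

/* Hypothetical stand-in for the per-device limits kept by the block layer. */
struct limits_sketch {
    size_t opt_mem_alignment;   /* detected minimum; 0 if not probed */
};

/*
 * Alignment to use for buffers allocated inside the process: never below
 * 4096, but larger if detection found a stricter requirement. The detected
 * value itself is left untouched.
 */
static size_t opt_mem_align_sketch(const struct limits_sketch *bl)
{
    if (bl == NULL || bl->opt_mem_alignment < PREFERRED_MEM_ALIGN) {
        return PREFERRED_MEM_ALIGN;
    }
    return bl->opt_mem_alignment;
}
```

Because only the value handed to the allocator changes, a guest buffer that
is 512-byte aligned is still judged against the detected minimum.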
Thanks,
Paolo
* Re: [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
From: Denis V. Lunev @ 2015-01-28 20:13 UTC
To: Paolo Bonzini; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi
On 28/01/15 23:07, Paolo Bonzini wrote:
>
> On 28/01/2015 19:49, Denis V. Lunev wrote:
>> The following sequence
>> int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>> for (i = 0; i < 100000; i++)
>> write(fd, buf, 4096);
>> performs 10% better if buf is aligned to 4096 bytes rather than to
>> 512 bytes on HDD with 512/4096 logical/physical sector size.
>>
>> The difference is quite reliable.
> The 10% difference, however, is probably not enough to cover the cost of
> providing a bounce buffer if a guest is (rightfully) using a 512-byte
> aligned buffer: bs->bl.opt_mem_alignment is in fact badly named and it
> should be bs->bl.min_mem_alignment instead.
>
> Instead, you probably should patch bdrv_opt_mem_align to return at least
> 4096, and leave the detection logic intact. This will let
> qemu_blockalign return a properly aligned buffer to qemu-img and other
> in-process allocations, without negatively affecting the guest.
>
> Thanks,
>
> Paolo
ok, this looks good to me :)