* [Qemu-devel] [PATCH 0/1] block: change default memory alignment for block requests
@ 2015-01-28 18:49 Denis V. Lunev
  2015-01-28 18:49 ` [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096 Denis V. Lunev
  0 siblings, 1 reply; 5+ messages in thread
From: Denis V. Lunev @ 2015-01-28 18:49 UTC
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

The following sequence
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    for (i = 0; i < 100000; i++)
            write(fd, buf, 4096);
performs 10% better if buf is aligned to 4096 bytes rather than to
512 bytes on an HDD with 512/4096 logical/physical sector size.

The difference is quite reliable.

I used the following program for testing:
#define _GNU_SOURCE

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <malloc.h>
#include <string.h>

int main(int argc, char *argv[])
{
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    void *buf;
    int i = 0;

    do {
        buf = memalign(512, 4096); /* replace 512 with 4096 for the comparison run */
        if ((unsigned long)buf & 4095)
            break;
        i++;
    } while (1);
    printf("%d\n", i);

    memset(buf, 0x11, 4096);

    for (i = 0; i < 100000; i++)
        write(fd, buf, 4096);

    close(fd);
    return 0;
}
time for i in `seq 1 30` ; do a.out aa ; done

The file was placed on an 8 GB partition of the HDD described below, to
avoid speed variation due to different offsets on the disk. The results
are reproducible:
- 189 vs 180 seconds on Linux 3.16
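
A variant of the buffer setup that avoids the allocation loop can be
sketched as follows. This is only an illustration, not part of the patch:
the helper names is_aligned() and alloc_io_buffer() are made up here, and
posix_memalign() is used in place of memalign(). The idea is to allocate
one 4096-aligned region and, for the 512-byte case, offset into it by 512
bytes, so that the buffer is 512-aligned but deliberately not 4096-aligned.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Return nonzero if ptr is a multiple of align (a power of two). */
static int is_aligned(const void *ptr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)ptr & (align - 1)) == 0;
}

/*
 * Hypothetical helper: return a 4096-byte I/O buffer.
 *
 * The backing region is 8192 bytes and 4096-aligned. If want_4k_aligned
 * is zero, the returned pointer is shifted by 512 bytes, which makes it
 * 512-aligned but not 4096-aligned. *base_out receives the pointer that
 * must later be passed to free().
 */
static void *alloc_io_buffer(int want_4k_aligned, void **base_out)
{
    void *base = NULL;

    if (posix_memalign(&base, 4096, 2 * 4096) != 0) {
        return NULL;
    }
    *base_out = base;
    return want_4k_aligned ? base : (void *)((char *)base + 512);
}
```

The returned pointer would be passed to write(fd, buf, 4096) on the
O_DIRECT descriptor exactly as in the program above, and the base pointer
freed afterwards.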

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>

hades ~/src/qemu # hdparm -I /dev/sdg

/dev/sdg:

ATA device, with non-removable media
    Model Number:       WDC WD20EZRX-07D8PB0
    Serial Number:      WD-WCC4M5LVSAEP
    Firmware Revision:  80.00A80
    Transport:          Serial, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
    Supported: 9 8 7 6 5
    Likely used: 9
Configuration:
    Logical     max current
    cylinders   16383   16383
    heads       16  16
    sectors/track   63  63
    --
    CHS current addressable sectors:   16514064
    LBA    user addressable sectors:  268435455
    LBA48  user addressable sectors: 3907029168
    Logical  Sector size:                   512 bytes
    Physical Sector size:                  4096 bytes
    device size with M = 1024*1024:     1907729 MBytes
    device size with M = 1000*1000:     2000398 MBytes (2000 GB)
    cache/buffer size  = unknown
    Nominal Media Rotation Rate: 5400
Capabilities:
    LBA, IORDY(can be disabled)
    Queue depth: 32
    Standby timer values: spec'd by Standard, with device specific minimum
    R/W multiple sector transfer: Max = 16  Current = 16
    DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
         Cycle time: min=120ns recommended=120ns
    PIO: pio0 pio1 pio2 pio3 pio4
         Cycle time: no flow control=120ns  IORDY flow control=120ns
Commands/features:
    Enabled Supported:
       *    SMART feature set
            Security Mode feature set
       *    Power Management feature set
       *    Write cache
       *    Look-ahead
       *    Host Protected Area feature set
       *    WRITE_BUFFER command
       *    READ_BUFFER command
       *    NOP cmd
       *    DOWNLOAD_MICROCODE
            Power-Up In Standby feature set
       *    SET_FEATURES required to spinup after power up
            SET_MAX security extension
       *    48-bit Address feature set
       *    Device Configuration Overlay feature set
       *    Mandatory FLUSH_CACHE
       *    FLUSH_CACHE_EXT
       *    SMART error logging
       *    SMART self-test
       *    General Purpose Logging feature set
       *    64-bit World wide name
       *    WRITE_UNCORRECTABLE_EXT command
       *    {READ,WRITE}_DMA_EXT_GPL commands
       *    Segmented DOWNLOAD_MICROCODE
       *    Gen1 signaling speed (1.5Gb/s)
       *    Gen2 signaling speed (3.0Gb/s)
       *    Gen3 signaling speed (6.0Gb/s)
       *    Native Command Queueing (NCQ)
       *    Host-initiated interface power management
       *    Phy event counters
       *    NCQ priority information
       *    READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
       *    DMA Setup Auto-Activate optimization
            Device-initiated interface power management
       *    Software settings preservation
       *    SMART Command Transport (SCT) feature set
       *    SCT Write Same (AC2)
       *    SCT Features Control (AC4)
       *    SCT Data Tables (AC5)
            unknown 206[12] (vendor specific)
            unknown 206[13] (vendor specific)
            unknown 206[14] (vendor specific)
Security:
    Master password revision code = 65534
        supported
    not enabled
    not locked
        frozen
    not expired: security count
        supported: enhanced erase
    276min for SECURITY ERASE UNIT. 276min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 50014ee2b5da838c
    NAA     : 5
    IEEE OUI    : 0014ee
    Unique ID   : 2b5da838c
Checksum: correct
hades ~/src/qemu #


* [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
  2015-01-28 18:49 [Qemu-devel] [PATCH 0/1] block: change default memory alignment for block requests Denis V. Lunev
@ 2015-01-28 18:49 ` Denis V. Lunev
  2015-01-28 19:59   ` Denis V. Lunev
  2015-01-28 20:07   ` Paolo Bonzini
  0 siblings, 2 replies; 5+ messages in thread
From: Denis V. Lunev @ 2015-01-28 18:49 UTC
  Cc: Kevin Wolf, Denis V. Lunev, qemu-devel, Stefan Hajnoczi

The following sequence
    int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
    for (i = 0; i < 100000; i++)
            write(fd, buf, 4096);
performs 10% better if buf is aligned to 4096 bytes rather than to
512 bytes on an HDD with 512/4096 logical/physical sector size.

The difference is quite reliable.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
---
 block.c           | 4 ++--
 block/raw-posix.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index d45e4dd..bc5d1e7 100644
--- a/block.c
+++ b/block.c
@@ -543,7 +543,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
         bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
         bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
     } else {
-        bs->bl.opt_mem_alignment = 512;
+        bs->bl.opt_mem_alignment = 4096;
     }
 
     if (bs->backing_hd) {
@@ -966,7 +966,7 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
 
     bs->open_flags = flags;
     bs->guest_block_size = 512;
-    bs->request_alignment = 512;
+    bs->request_alignment = 4096;
     bs->zero_beyond_eof = true;
     open_flags = bdrv_open_flags(bs, flags);
     bs->read_only = !(open_flags & BDRV_O_RDWR);
diff --git a/block/raw-posix.c b/block/raw-posix.c
index ec38fee..d1b3388 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -266,7 +266,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
     if (!s->buf_align) {
         size_t align;
         buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
-        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
             if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
                 s->buf_align = align;
                 break;
@@ -278,7 +278,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
     if (!bs->request_alignment) {
         size_t align;
         buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
-        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
             if (pread(fd, buf, align, 0) >= 0) {
                 bs->request_alignment = align;
                 break;
-- 
1.9.1


* Re: [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
  2015-01-28 18:49 ` [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096 Denis V. Lunev
@ 2015-01-28 19:59   ` Denis V. Lunev
  2015-01-28 20:07   ` Paolo Bonzini
  1 sibling, 0 replies; 5+ messages in thread
From: Denis V. Lunev @ 2015-01-28 19:59 UTC
  Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 28/01/15 21:49, Denis V. Lunev wrote:
> The following sequence
>      int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>      for (i = 0; i < 100000; i++)
>              write(fd, buf, 4096);
> performs 10% better if buf is aligned to 4096 bytes rather than to
> 512 bytes on HDD with 512/4096 logical/physical sector size.
>
> The difference is quite reliable.
>
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   block.c           | 4 ++--
>   block/raw-posix.c | 4 ++--
>   2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/block.c b/block.c
> index d45e4dd..bc5d1e7 100644
> --- a/block.c
> +++ b/block.c
> @@ -543,7 +543,7 @@ void bdrv_refresh_limits(BlockDriverState *bs, Error **errp)
>           bs->bl.max_transfer_length = bs->file->bl.max_transfer_length;
>           bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
>       } else {
> -        bs->bl.opt_mem_alignment = 512;
> +        bs->bl.opt_mem_alignment = 4096;
>       }
>   
>       if (bs->backing_hd) {
> @@ -966,7 +966,7 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
>   
>       bs->open_flags = flags;
>       bs->guest_block_size = 512;
> -    bs->request_alignment = 512;
> +    bs->request_alignment = 4096;
>       bs->zero_beyond_eof = true;
>       open_flags = bdrv_open_flags(bs, flags);
>       bs->read_only = !(open_flags & BDRV_O_RDWR);
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index ec38fee..d1b3388 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -266,7 +266,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>       if (!s->buf_align) {
>           size_t align;
>           buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
> -        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
> +        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
>               if (pread(fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
>                   s->buf_align = align;
>                   break;
> @@ -278,7 +278,7 @@ static void raw_probe_alignment(BlockDriverState *bs, int fd, Error **errp)
>       if (!bs->request_alignment) {
>           size_t align;
>           buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
> -        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
> +        for (align = 4096; align <= MAX_BLOCKSIZE; align <<= 1) {
>               if (pread(fd, buf, align, 0) >= 0) {
>                   bs->request_alignment = align;
>                   break;
Sorry, the patch is wrong: it breaks 'make check-block'.
I will redo it and perform more testing.

The request_alignment-related changes are wrong :(
I ran the tests without them, but then added them as
an "obvious" last-minute addition.


* Re: [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
  2015-01-28 18:49 ` [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096 Denis V. Lunev
  2015-01-28 19:59   ` Denis V. Lunev
@ 2015-01-28 20:07   ` Paolo Bonzini
  2015-01-28 20:13     ` Denis V. Lunev
  1 sibling, 1 reply; 5+ messages in thread
From: Paolo Bonzini @ 2015-01-28 20:07 UTC
  To: Denis V. Lunev; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi



On 28/01/2015 19:49, Denis V. Lunev wrote:
> The following sequence
>     int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>     for (i = 0; i < 100000; i++)
>             write(fd, buf, 4096);
> performs 10% better if buf is aligned to 4096 bytes rather than to
> 512 bytes on HDD with 512/4096 logical/physical sector size.
> 
> The difference is quite reliable.

The 10% difference, however, is probably not enough to cover the cost of
providing a bounce buffer if a guest is (rightfully) using a 512-byte
aligned buffer: bs->bl.opt_mem_alignment is in fact badly named and it
should be bs->bl.min_mem_alignment instead.

Instead, you probably should patch bdrv_opt_mem_align to return at least
4096, and leave the detection logic intact.  This will let
qemu_blockalign return a properly aligned buffer to qemu-img and other
in-process allocations, without negatively affecting the guest.
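
One way to read this suggestion, as a rough sketch only: the probed
per-device value stays untouched, and only the accessor used for
in-process allocations applies a floor of 4096. The types and names below
(struct limits_sketch, struct bds_sketch, opt_mem_align_sketch) are
hypothetical stand-ins, not QEMU's real definitions, and returning 4096
for a NULL state is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; only the single field used here is modeled. */
struct limits_sketch {
    size_t opt_mem_alignment;   /* value found by the detection logic */
};

struct bds_sketch {
    struct limits_sketch bl;
};

#define PREFERRED_MEM_ALIGN ((size_t)4096)

/*
 * Return the alignment to use for buffers allocated inside the process:
 * the probed value, but never less than 4096. The probed value itself is
 * not modified, so a guest buffer aligned to 512 bytes is still judged
 * against the original, smaller requirement.
 */
static size_t opt_mem_align_sketch(const struct bds_sketch *bs)
{
    size_t probed;

    if (bs == NULL) {
        return PREFERRED_MEM_ALIGN;   /* assumption: safe default */
    }
    probed = bs->bl.opt_mem_alignment;
    return probed > PREFERRED_MEM_ALIGN ? probed : PREFERRED_MEM_ALIGN;
}
```

With a probed value of 512 the accessor yields 4096; a larger probed
value is passed through unchanged.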

Thanks,

Paolo



* Re: [Qemu-devel] [PATCH 1/1] block: change default memory alignment for block requests to 4096
  2015-01-28 20:07   ` Paolo Bonzini
@ 2015-01-28 20:13     ` Denis V. Lunev
  0 siblings, 0 replies; 5+ messages in thread
From: Denis V. Lunev @ 2015-01-28 20:13 UTC
  To: Paolo Bonzini; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On 28/01/15 23:07, Paolo Bonzini wrote:
>
> On 28/01/2015 19:49, Denis V. Lunev wrote:
>> The following sequence
>>      int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644);
>>      for (i = 0; i < 100000; i++)
>>              write(fd, buf, 4096);
>> performs 10% better if buf is aligned to 4096 bytes rather than to
>> 512 bytes on HDD with 512/4096 logical/physical sector size.
>>
>> The difference is quite reliable.
> The 10% difference, however, is probably not enough to cover the cost of
> providing a bounce buffer if a guest is (rightfully) using a 512-byte
> aligned buffer: bs->bl.opt_mem_alignment is in fact badly named and it
> should be bs->bl.min_mem_alignment instead.
>
> Instead, you probably should patch bdrv_opt_mem_align to return at least
> 4096, and leave the detection logic intact.  This will let
> qemu_blockalign return a properly aligned buffer to qemu-img and other
> in-process allocations, without negatively affecting the guest.
>
> Thanks,
>
> Paolo
ok, this looks good to me :)



