All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option
@ 2014-06-23  9:17 Chunyan Liu
  2014-06-27  3:36 ` Chun Yan Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Chunyan Liu @ 2014-06-23  9:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: Chunyan Liu, stefanha

Add 'nocow' option so that users could have a chance to set NOCOW flag to
newly created files. It's useful on btrfs file system to enhance performance.

Btrfs has low performance when hosting VM images, even more when the guest
in those VM are also using btrfs as file system. One way to mitigate this bad
performance is to turn off COW attributes on VM files. Generally, there are
two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then
all newly created files will be NOCOW. b) per file. Add the NOCOW file
attribute. It could only be done to empty or new files.

This patch tries the second way, according to the option, it could add NOCOW
per file.

For most block drivers, since the create file step is in raw-posix.c, so we
can do setting NOCOW flag ioctl in raw-posix.c only.

But there are some exceptions, like block/vpc.c and block/vdi.c, they are
creating file by calling qemu_open directly. For them, do the same setting
NOCOW flag ioctl work in them separately.

Signed-off-by: Chunyan Liu <cyliu@suse.com>
---
Changes to v2:
  * based on QemuOpts instead of old QEMUOptionParameters
  * add nocow description in man page and html doc

  Old v2 is here:
  http://lists.gnu.org/archive/html/qemu-devel/2013-11/msg02429.html

---
 block/cow.c               |  5 +++++
 block/qcow.c              |  5 +++++
 block/qcow2.c             |  5 +++++
 block/qed.c               | 11 ++++++++---
 block/raw-posix.c         | 25 +++++++++++++++++++++++++
 block/vdi.c               | 29 +++++++++++++++++++++++++++++
 block/vhdx.c              |  5 +++++
 block/vmdk.c              | 11 ++++++++---
 block/vpc.c               | 29 +++++++++++++++++++++++++++++
 include/block/block_int.h |  1 +
 qemu-doc.texi             | 16 ++++++++++++++++
 qemu-img.texi             | 16 ++++++++++++++++
 12 files changed, 152 insertions(+), 6 deletions(-)

diff --git a/block/cow.c b/block/cow.c
index a05a92c..43b537c 100644
--- a/block/cow.c
+++ b/block/cow.c
@@ -401,6 +401,11 @@ static QemuOptsList cow_create_opts = {
             .type = QEMU_OPT_STRING,
             .help = "File name of a base image"
         },
+        {
+            .name = BLOCK_OPT_NOCOW,
+            .type = QEMU_OPT_BOOL,
+            .help = "Turn off copy-on-write (valid only on btrfs)"
+        },
         { /* end of list */ }
     }
 };
diff --git a/block/qcow.c b/block/qcow.c
index 1f2bac8..5b23540 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -928,6 +928,11 @@ static QemuOptsList qcow_create_opts = {
             .help = "Encrypt the image",
             .def_value_str = "off"
         },
+        {
+            .name = BLOCK_OPT_NOCOW,
+            .type = QEMU_OPT_BOOL,
+            .help = "Turn off copy-on-write (valid only on btrfs)"
+        },
         { /* end of list */ }
     }
 };
diff --git a/block/qcow2.c b/block/qcow2.c
index b9d2fa6..3a4cc8a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2382,6 +2382,11 @@ static QemuOptsList qcow2_create_opts = {
             .help = "Postpone refcount updates",
             .def_value_str = "off"
         },
+        {
+            .name = BLOCK_OPT_NOCOW,
+            .type = QEMU_OPT_BOOL,
+            .help = "Turn off copy-on-write (valid only on btrfs)"
+        },
         { /* end of list */ }
     }
 };
diff --git a/block/qed.c b/block/qed.c
index 092e6fb..460ac92 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -567,7 +567,7 @@ static void bdrv_qed_close(BlockDriverState *bs)
 static int qed_create(const char *filename, uint32_t cluster_size,
                       uint64_t image_size, uint32_t table_size,
                       const char *backing_file, const char *backing_fmt,
-                      Error **errp)
+                      QemuOpts *opts, Error **errp)
 {
     QEDHeader header = {
         .magic = QED_MAGIC,
@@ -586,7 +586,7 @@ static int qed_create(const char *filename, uint32_t cluster_size,
     int ret = 0;
     BlockDriverState *bs;
 
-    ret = bdrv_create_file(filename, NULL, &local_err);
+    ret = bdrv_create_file(filename, opts, &local_err);
     if (ret < 0) {
         error_propagate(errp, local_err);
         return ret;
@@ -682,7 +682,7 @@ static int bdrv_qed_create(const char *filename, QemuOpts *opts, Error **errp)
     }
 
     ret = qed_create(filename, cluster_size, image_size, table_size,
-                     backing_file, backing_fmt, errp);
+                     backing_file, backing_fmt, opts, errp);
 
 finish:
     g_free(backing_file);
@@ -1644,6 +1644,11 @@ static QemuOptsList qed_create_opts = {
             .type = QEMU_OPT_SIZE,
             .help = "L1/L2 table size (in clusters)"
         },
+        {
+            .name = BLOCK_OPT_NOCOW,
+            .type = QEMU_OPT_BOOL,
+            .help = "Turn off copy-on-write (valid only on btrfs)"
+        },
         { /* end of list */ }
     }
 };
diff --git a/block/raw-posix.c b/block/raw-posix.c
index dacf4fb..825a0c8 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -55,6 +55,9 @@
 #include <linux/cdrom.h>
 #include <linux/fd.h>
 #include <linux/fs.h>
+#ifndef FS_NOCOW_FL
+#define FS_NOCOW_FL                     0x00800000 /* Do not cow file */
+#endif
 #endif
 #ifdef CONFIG_FIEMAP
 #include <linux/fiemap.h>
@@ -1278,12 +1281,14 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
     int fd;
     int result = 0;
     int64_t total_size = 0;
+    bool nocow = false;
 
     strstart(filename, "file:", &filename);
 
     /* Read out options */
     total_size =
         qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0) / BDRV_SECTOR_SIZE;
+    nocow = qemu_opt_get_bool(opts, BLOCK_OPT_NOCOW, false);
 
     fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
                    0644);
@@ -1291,6 +1296,21 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
         result = -errno;
         error_setg_errno(errp, -result, "Could not create file");
     } else {
+        if (nocow) {
+#ifdef __linux__
+            /* Set NOCOW flag to solve performance issue on fs like btrfs.
+             * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value
+             * will be ignored since any failure of this operation should not
+             * block the left work.
+             */
+            int attr;
+            if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
+                attr |= FS_NOCOW_FL;
+                ioctl(fd, FS_IOC_SETFLAGS, &attr);
+            }
+#endif
+        }
+
         if (ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) {
             result = -errno;
             error_setg_errno(errp, -result, "Could not resize file");
@@ -1477,6 +1497,11 @@ static QemuOptsList raw_create_opts = {
             .type = QEMU_OPT_SIZE,
             .help = "Virtual disk size"
         },
+        {
+            .name = BLOCK_OPT_NOCOW,
+            .type = QEMU_OPT_BOOL,
+            .help = "Turn off copy-on-write (valid only on btrfs)"
+        },
         { /* end of list */ }
     }
 };
diff --git a/block/vdi.c b/block/vdi.c
index 01fe22e..197bd77 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -53,6 +53,13 @@
 #include "block/block_int.h"
 #include "qemu/module.h"
 #include "migration/migration.h"
+#ifdef __linux__
+#include <linux/fs.h>
+#include <sys/ioctl.h>
+#ifndef FS_NOCOW_FL
+#define FS_NOCOW_FL                     0x00800000 /* Do not cow file */
+#endif
+#endif
 
 #if defined(CONFIG_UUID)
 #include <uuid/uuid.h>
@@ -683,6 +690,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
     VdiHeader header;
     size_t i;
     size_t bmap_size;
+    bool nocow = false;
 
     logout("\n");
 
@@ -699,6 +707,7 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
         image_type = VDI_TYPE_STATIC;
     }
 #endif
+    nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false);
 
     if (bytes > VDI_DISK_SIZE_MAX) {
         result = -ENOTSUP;
@@ -716,6 +725,21 @@ static int vdi_create(const char *filename, QemuOpts *opts, Error **errp)
         goto exit;
     }
 
+    if (nocow) {
+#ifdef __linux__
+        /* Set NOCOW flag to solve performance issue on fs like btrfs.
+         * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value will
+         * be ignored since any failure of this operation should not block the
+         * left work.
+         */
+        int attr;
+        if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
+            attr |= FS_NOCOW_FL;
+            ioctl(fd, FS_IOC_SETFLAGS, &attr);
+        }
+#endif
+    }
+
     /* We need enough blocks to store the given disk size,
        so always round up. */
     blocks = (bytes + block_size - 1) / block_size;
@@ -818,6 +842,11 @@ static QemuOptsList vdi_create_opts = {
             .def_value_str = "off"
         },
 #endif
+        {
+            .name = BLOCK_OPT_NOCOW,
+            .type = QEMU_OPT_BOOL,
+            .help = "Turn off copy-on-write (valid only on btrfs)"
+        },
         /* TODO: An additional option to set UUID values might be useful. */
         { /* end of list */ }
     }
diff --git a/block/vhdx.c b/block/vhdx.c
index fedcf9f..7bdb456 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1909,6 +1909,11 @@ static QemuOptsList vhdx_create_opts = {
            .type = QEMU_OPT_BOOL,
            .help = "Force use of payload blocks of type 'ZERO'.  Non-standard."
        },
+       {
+           .name = BLOCK_OPT_NOCOW,
+           .type = QEMU_OPT_BOOL,
+           .help = "Turn off copy-on-write (valid only on btrfs)"
+       },
        { NULL }
     }
 };
diff --git a/block/vmdk.c b/block/vmdk.c
index 83dd6fe..94e1ff7 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1529,7 +1529,7 @@ static int coroutine_fn vmdk_co_write_zeroes(BlockDriverState *bs,
 
 static int vmdk_create_extent(const char *filename, int64_t filesize,
                               bool flat, bool compress, bool zeroed_grain,
-                              Error **errp)
+                              QemuOpts *opts, Error **errp)
 {
     int ret, i;
     BlockDriverState *bs = NULL;
@@ -1539,7 +1539,7 @@ static int vmdk_create_extent(const char *filename, int64_t filesize,
     uint32_t *gd_buf = NULL;
     int gd_buf_size;
 
-    ret = bdrv_create_file(filename, NULL, &local_err);
+    ret = bdrv_create_file(filename, opts, &local_err);
     if (ret < 0) {
         error_propagate(errp, local_err);
         goto exit;
@@ -1845,7 +1845,7 @@ static int vmdk_create(const char *filename, QemuOpts *opts, Error **errp)
                 path, desc_filename);
 
         if (vmdk_create_extent(ext_filename, size,
-                               flat, compress, zeroed_grain, errp)) {
+                               flat, compress, zeroed_grain, opts, errp)) {
             ret = -EINVAL;
             goto exit;
         }
@@ -2153,6 +2153,11 @@ static QemuOptsList vmdk_create_opts = {
             .help = "Enable efficient zero writes "
                     "using the zeroed-grain GTE feature"
         },
+        {
+            .name = BLOCK_OPT_NOCOW,
+            .type = QEMU_OPT_BOOL,
+            .help = "Turn off copy-on-write (valid only on btrfs)"
+        },
         { /* end of list */ }
     }
 };
diff --git a/block/vpc.c b/block/vpc.c
index 798d854..8b376a4 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -29,6 +29,13 @@
 #if defined(CONFIG_UUID)
 #include <uuid/uuid.h>
 #endif
+#ifdef __linux__
+#include <linux/fs.h>
+#include <sys/ioctl.h>
+#ifndef FS_NOCOW_FL
+#define FS_NOCOW_FL                     0x00800000 /* Do not cow file */
+#endif
+#endif
 
 /**************************************************************/
 
@@ -751,6 +758,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
     int64_t total_size;
     int disk_type;
     int ret = -EIO;
+    bool nocow = false;
 
     /* Read out options */
     total_size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0);
@@ -767,6 +775,7 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
     } else {
         disk_type = VHD_DYNAMIC;
     }
+    nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false);
 
     /* Create the file */
     fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0644);
@@ -775,6 +784,21 @@ static int vpc_create(const char *filename, QemuOpts *opts, Error **errp)
         goto out;
     }
 
+    if (nocow) {
+#ifdef __linux__
+        /* Set NOCOW flag to solve performance issue on fs like btrfs.
+         * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value will
+         * be ignored since any failure of this operation should not block the
+         * left work.
+         */
+        int attr;
+        if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
+            attr |= FS_NOCOW_FL;
+            ioctl(fd, FS_IOC_SETFLAGS, &attr);
+        }
+#endif
+    }
+
     /*
      * Calculate matching total_size and geometry. Increase the number of
      * sectors requested until we get enough (or fail). This ensures that
@@ -884,6 +908,11 @@ static QemuOptsList vpc_create_opts = {
                 "Type of virtual hard disk format. Supported formats are "
                 "{dynamic (default) | fixed} "
         },
+        {
+            .name = BLOCK_OPT_NOCOW,
+            .type = QEMU_OPT_BOOL,
+            .help = "Turn off copy-on-write (valid only on btrfs)"
+        },
         { /* end of list */ }
     }
 };
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 7aa2213..4e5022a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -54,6 +54,7 @@
 #define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts"
 #define BLOCK_OPT_ADAPTER_TYPE      "adapter_type"
 #define BLOCK_OPT_REDUNDANCY        "redundancy"
+#define BLOCK_OPT_NOCOW             "nocow"
 
 typedef struct BdrvTrackedRequest {
     BlockDriverState *bs;
diff --git a/qemu-doc.texi b/qemu-doc.texi
index 88ec9bb..ad92c85 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -589,6 +589,22 @@ check -r all} is required, which may take some time.
 
 This option can only be enabled if @code{compat=1.1} is specified.
 
+@item nocow
+If this option is set to @code{on}, it will trun off COW of the file. It's only
+valid on btrfs, no effect on other file systems.
+
+Btrfs has low performance when hosting a VM image file, even more when the guest
+on the VM also using btrfs as file system. Turning off COW is a way to mitigate
+this bad performance. Generally there are two ways to turn off COW on btrfs:
+a) Disable it by mounting with nodatacow, then all newly created files will be
+NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this option
+does.
+
+Note: this option is only valid to new or empty files. If there is an existing
+file which is COW and has data blocks already, it couldn't be changed to NOCOW
+by setting @code{nocow=on}. One can issue @code{lsattr filename} to check if
+the NOCOW flag is set or not (Capitabl 'C' is NOCOW flag).
+
 @end table
 
 @item qed
diff --git a/qemu-img.texi b/qemu-img.texi
index c68b541..8496f3b 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -474,6 +474,22 @@ check -r all} is required, which may take some time.
 
 This option can only be enabled if @code{compat=1.1} is specified.
 
+@item nocow
+If this option is set to @code{on}, it will trun off COW of the file. It's only
+valid on btrfs, no effect on other file systems.
+
+Btrfs has low performance when hosting a VM image file, even more when the guest
+on the VM also using btrfs as file system. Turning off COW is a way to mitigate
+this bad performance. Generally there are two ways to turn off COW on btrfs:
+a) Disable it by mounting with nodatacow, then all newly created files will be
+NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this option
+does.
+
+Note: this option is only valid to new or empty files. If there is an existing
+file which is COW and has data blocks already, it couldn't be changed to NOCOW
+by setting @code{nocow=on}. One can issue @code{lsattr filename} to check if
+the NOCOW flag is set or not (Capitabl 'C' is NOCOW flag).
+
 @end table
 
 @item Other
-- 
1.8.4.5

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option
  2014-06-23  9:17 [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option Chunyan Liu
@ 2014-06-27  3:36 ` Chun Yan Liu
  2014-06-27 11:48 ` Stefan Hajnoczi
  2014-07-01 21:02 ` Eric Blake
  2 siblings, 0 replies; 6+ messages in thread
From: Chun Yan Liu @ 2014-06-27  3:36 UTC (permalink / raw)
  To: Stefan Hajnoczi, kwolf; +Cc: qemu-devel

Hi, Stefan & Kevin,

Could you help to have a look at this version? We've discussed about
this last November and now switch it to QemuOpts.

Thanks,
Chunyan

>>> On 6/23/2014 at 05:17 PM, in message
<1403515022-24802-1-git-send-email-cyliu@suse.com>, Chunyan Liu
<cyliu@suse.com> wrote: 
> Add 'nocow' option so that users could have a chance to set NOCOW flag to 
> newly created files. It's useful on btrfs file system to enhance  
> performance. 
>  
> Btrfs has low performance when hosting VM images, even more when the guest 
> in those VM are also using btrfs as file system. One way to mitigate this  
> bad 
> performance is to turn off COW attributes on VM files. Generally, there are 
> two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then 
> all newly created files will be NOCOW. b) per file. Add the NOCOW file 
> attribute. It could only be done to empty or new files. 
>  
> This patch tries the second way, according to the option, it could add NOCOW 
> per file. 
>  
> For most block drivers, since the create file step is in raw-posix.c, so we 
> can do setting NOCOW flag ioctl in raw-posix.c only. 
>  
> But there are some exceptions, like block/vpc.c and block/vdi.c, they are 
> creating file by calling qemu_open directly. For them, do the same setting 
> NOCOW flag ioctl work in them separately. 
>  
> Signed-off-by: Chunyan Liu <cyliu@suse.com> 
> --- 
> Changes to v2: 
>   * based on QemuOpts instead of old QEMUOptionParameters 
>   * add nocow description in man page and html doc 
>  
>   Old v2 is here: 
>   http://lists.gnu.org/archive/html/qemu-devel/2013-11/msg02429.html 
>  
> --- 
>  block/cow.c               |  5 +++++ 
>  block/qcow.c              |  5 +++++ 
>  block/qcow2.c             |  5 +++++ 
>  block/qed.c               | 11 ++++++++--- 
>  block/raw-posix.c         | 25 +++++++++++++++++++++++++ 
>  block/vdi.c               | 29 +++++++++++++++++++++++++++++ 
>  block/vhdx.c              |  5 +++++ 
>  block/vmdk.c              | 11 ++++++++--- 
>  block/vpc.c               | 29 +++++++++++++++++++++++++++++ 
>  include/block/block_int.h |  1 + 
>  qemu-doc.texi             | 16 ++++++++++++++++ 
>  qemu-img.texi             | 16 ++++++++++++++++ 
>  12 files changed, 152 insertions(+), 6 deletions(-) 
>  
> diff --git a/block/cow.c b/block/cow.c 
> index a05a92c..43b537c 100644 
> --- a/block/cow.c 
> +++ b/block/cow.c 
> @@ -401,6 +401,11 @@ static QemuOptsList cow_create_opts = { 
>              .type = QEMU_OPT_STRING, 
>              .help = "File name of a base image" 
>          }, 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          { /* end of list */ } 
>      } 
>  }; 
> diff --git a/block/qcow.c b/block/qcow.c 
> index 1f2bac8..5b23540 100644 
> --- a/block/qcow.c 
> +++ b/block/qcow.c 
> @@ -928,6 +928,11 @@ static QemuOptsList qcow_create_opts = { 
>              .help = "Encrypt the image", 
>              .def_value_str = "off" 
>          }, 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          { /* end of list */ } 
>      } 
>  }; 
> diff --git a/block/qcow2.c b/block/qcow2.c 
> index b9d2fa6..3a4cc8a 100644 
> --- a/block/qcow2.c 
> +++ b/block/qcow2.c 
> @@ -2382,6 +2382,11 @@ static QemuOptsList qcow2_create_opts = { 
>              .help = "Postpone refcount updates", 
>              .def_value_str = "off" 
>          }, 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          { /* end of list */ } 
>      } 
>  }; 
> diff --git a/block/qed.c b/block/qed.c 
> index 092e6fb..460ac92 100644 
> --- a/block/qed.c 
> +++ b/block/qed.c 
> @@ -567,7 +567,7 @@ static void bdrv_qed_close(BlockDriverState *bs) 
>  static int qed_create(const char *filename, uint32_t cluster_size, 
>                        uint64_t image_size, uint32_t table_size, 
>                        const char *backing_file, const char *backing_fmt, 
> -                      Error **errp) 
> +                      QemuOpts *opts, Error **errp) 
>  { 
>      QEDHeader header = { 
>          .magic = QED_MAGIC, 
> @@ -586,7 +586,7 @@ static int qed_create(const char *filename, uint32_t  
> cluster_size, 
>      int ret = 0; 
>      BlockDriverState *bs; 
>   
> -    ret = bdrv_create_file(filename, NULL, &local_err); 
> +    ret = bdrv_create_file(filename, opts, &local_err); 
>      if (ret < 0) { 
>          error_propagate(errp, local_err); 
>          return ret; 
> @@ -682,7 +682,7 @@ static int bdrv_qed_create(const char *filename, QemuOpts  
> *opts, Error **errp) 
>      } 
>   
>      ret = qed_create(filename, cluster_size, image_size, table_size, 
> -                     backing_file, backing_fmt, errp); 
> +                     backing_file, backing_fmt, opts, errp); 
>   
>  finish: 
>      g_free(backing_file); 
> @@ -1644,6 +1644,11 @@ static QemuOptsList qed_create_opts = { 
>              .type = QEMU_OPT_SIZE, 
>              .help = "L1/L2 table size (in clusters)" 
>          }, 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          { /* end of list */ } 
>      } 
>  }; 
> diff --git a/block/raw-posix.c b/block/raw-posix.c 
> index dacf4fb..825a0c8 100644 
> --- a/block/raw-posix.c 
> +++ b/block/raw-posix.c 
> @@ -55,6 +55,9 @@ 
>  #include <linux/cdrom.h> 
>  #include <linux/fd.h> 
>  #include <linux/fs.h> 
> +#ifndef FS_NOCOW_FL 
> +#define FS_NOCOW_FL                     0x00800000 /* Do not cow file */ 
> +#endif 
>  #endif 
>  #ifdef CONFIG_FIEMAP 
>  #include <linux/fiemap.h> 
> @@ -1278,12 +1281,14 @@ static int raw_create(const char *filename, QemuOpts  
> *opts, Error **errp) 
>      int fd; 
>      int result = 0; 
>      int64_t total_size = 0; 
> +    bool nocow = false; 
>   
>      strstart(filename, "file:", &filename); 
>   
>      /* Read out options */ 
>      total_size = 
>          qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0) / BDRV_SECTOR_SIZE; 
> +    nocow = qemu_opt_get_bool(opts, BLOCK_OPT_NOCOW, false); 
>   
>      fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 
>                     0644); 
> @@ -1291,6 +1296,21 @@ static int raw_create(const char *filename, QemuOpts  
> *opts, Error **errp) 
>          result = -errno; 
>          error_setg_errno(errp, -result, "Could not create file"); 
>      } else { 
> +        if (nocow) { 
> +#ifdef __linux__ 
> +            /* Set NOCOW flag to solve performance issue on fs like btrfs. 
> +             * This is an optimisation. The FS_IOC_SETFLAGS ioctl return  
> value 
> +             * will be ignored since any failure of this operation should  
> not 
> +             * block the left work. 
> +             */ 
> +            int attr; 
> +            if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) { 
> +                attr |= FS_NOCOW_FL; 
> +                ioctl(fd, FS_IOC_SETFLAGS, &attr); 
> +            } 
> +#endif 
> +        } 
> + 
>          if (ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) { 
>              result = -errno; 
>              error_setg_errno(errp, -result, "Could not resize file"); 
> @@ -1477,6 +1497,11 @@ static QemuOptsList raw_create_opts = { 
>              .type = QEMU_OPT_SIZE, 
>              .help = "Virtual disk size" 
>          }, 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          { /* end of list */ } 
>      } 
>  }; 
> diff --git a/block/vdi.c b/block/vdi.c 
> index 01fe22e..197bd77 100644 
> --- a/block/vdi.c 
> +++ b/block/vdi.c 
> @@ -53,6 +53,13 @@ 
>  #include "block/block_int.h" 
>  #include "qemu/module.h" 
>  #include "migration/migration.h" 
> +#ifdef __linux__ 
> +#include <linux/fs.h> 
> +#include <sys/ioctl.h> 
> +#ifndef FS_NOCOW_FL 
> +#define FS_NOCOW_FL                     0x00800000 /* Do not cow file */ 
> +#endif 
> +#endif 
>   
>  #if defined(CONFIG_UUID) 
>  #include <uuid/uuid.h> 
> @@ -683,6 +690,7 @@ static int vdi_create(const char *filename, QemuOpts  
> *opts, Error **errp) 
>      VdiHeader header; 
>      size_t i; 
>      size_t bmap_size; 
> +    bool nocow = false; 
>   
>      logout("\n"); 
>   
> @@ -699,6 +707,7 @@ static int vdi_create(const char *filename, QemuOpts  
> *opts, Error **errp) 
>          image_type = VDI_TYPE_STATIC; 
>      } 
>  #endif 
> +    nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false); 
>   
>      if (bytes > VDI_DISK_SIZE_MAX) { 
>          result = -ENOTSUP; 
> @@ -716,6 +725,21 @@ static int vdi_create(const char *filename, QemuOpts  
> *opts, Error **errp) 
>          goto exit; 
>      } 
>   
> +    if (nocow) { 
> +#ifdef __linux__ 
> +        /* Set NOCOW flag to solve performance issue on fs like btrfs. 
> +         * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value  
> will 
> +         * be ignored since any failure of this operation should not block  
> the 
> +         * left work. 
> +         */ 
> +        int attr; 
> +        if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) { 
> +            attr |= FS_NOCOW_FL; 
> +            ioctl(fd, FS_IOC_SETFLAGS, &attr); 
> +        } 
> +#endif 
> +    } 
> + 
>      /* We need enough blocks to store the given disk size, 
>         so always round up. */ 
>      blocks = (bytes + block_size - 1) / block_size; 
> @@ -818,6 +842,11 @@ static QemuOptsList vdi_create_opts = { 
>              .def_value_str = "off" 
>          }, 
>  #endif 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          /* TODO: An additional option to set UUID values might be useful.  
> */ 
>          { /* end of list */ } 
>      } 
> diff --git a/block/vhdx.c b/block/vhdx.c 
> index fedcf9f..7bdb456 100644 
> --- a/block/vhdx.c 
> +++ b/block/vhdx.c 
> @@ -1909,6 +1909,11 @@ static QemuOptsList vhdx_create_opts = { 
>             .type = QEMU_OPT_BOOL, 
>             .help = "Force use of payload blocks of type 'ZERO'.   
> Non-standard." 
>         }, 
> +       { 
> +           .name = BLOCK_OPT_NOCOW, 
> +           .type = QEMU_OPT_BOOL, 
> +           .help = "Turn off copy-on-write (valid only on btrfs)" 
> +       }, 
>         { NULL } 
>      } 
>  }; 
> diff --git a/block/vmdk.c b/block/vmdk.c 
> index 83dd6fe..94e1ff7 100644 
> --- a/block/vmdk.c 
> +++ b/block/vmdk.c 
> @@ -1529,7 +1529,7 @@ static int coroutine_fn  
> vmdk_co_write_zeroes(BlockDriverState *bs, 
>   
>  static int vmdk_create_extent(const char *filename, int64_t filesize, 
>                                bool flat, bool compress, bool zeroed_grain, 
> -                              Error **errp) 
> +                              QemuOpts *opts, Error **errp) 
>  { 
>      int ret, i; 
>      BlockDriverState *bs = NULL; 
> @@ -1539,7 +1539,7 @@ static int vmdk_create_extent(const char *filename,  
> int64_t filesize, 
>      uint32_t *gd_buf = NULL; 
>      int gd_buf_size; 
>   
> -    ret = bdrv_create_file(filename, NULL, &local_err); 
> +    ret = bdrv_create_file(filename, opts, &local_err); 
>      if (ret < 0) { 
>          error_propagate(errp, local_err); 
>          goto exit; 
> @@ -1845,7 +1845,7 @@ static int vmdk_create(const char *filename, QemuOpts  
> *opts, Error **errp) 
>                  path, desc_filename); 
>   
>          if (vmdk_create_extent(ext_filename, size, 
> -                               flat, compress, zeroed_grain, errp)) { 
> +                               flat, compress, zeroed_grain, opts, errp)) { 
>              ret = -EINVAL; 
>              goto exit; 
>          } 
> @@ -2153,6 +2153,11 @@ static QemuOptsList vmdk_create_opts = { 
>              .help = "Enable efficient zero writes " 
>                      "using the zeroed-grain GTE feature" 
>          }, 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          { /* end of list */ } 
>      } 
>  }; 
> diff --git a/block/vpc.c b/block/vpc.c 
> index 798d854..8b376a4 100644 
> --- a/block/vpc.c 
> +++ b/block/vpc.c 
> @@ -29,6 +29,13 @@ 
>  #if defined(CONFIG_UUID) 
>  #include <uuid/uuid.h> 
>  #endif 
> +#ifdef __linux__ 
> +#include <linux/fs.h> 
> +#include <sys/ioctl.h> 
> +#ifndef FS_NOCOW_FL 
> +#define FS_NOCOW_FL                     0x00800000 /* Do not cow file */ 
> +#endif 
> +#endif 
>   
>  /**************************************************************/ 
>   
> @@ -751,6 +758,7 @@ static int vpc_create(const char *filename, QemuOpts  
> *opts, Error **errp) 
>      int64_t total_size; 
>      int disk_type; 
>      int ret = -EIO; 
> +    bool nocow = false; 
>   
>      /* Read out options */ 
>      total_size = qemu_opt_get_size_del(opts, BLOCK_OPT_SIZE, 0); 
> @@ -767,6 +775,7 @@ static int vpc_create(const char *filename, QemuOpts  
> *opts, Error **errp) 
>      } else { 
>          disk_type = VHD_DYNAMIC; 
>      } 
> +    nocow = qemu_opt_get_bool_del(opts, BLOCK_OPT_NOCOW, false); 
>   
>      /* Create the file */ 
>      fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,  
> 0644); 
> @@ -775,6 +784,21 @@ static int vpc_create(const char *filename, QemuOpts  
> *opts, Error **errp) 
>          goto out; 
>      } 
>   
> +    if (nocow) { 
> +#ifdef __linux__ 
> +        /* Set NOCOW flag to solve performance issue on fs like btrfs. 
> +         * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value  
> will 
> +         * be ignored since any failure of this operation should not block  
> the 
> +         * left work. 
> +         */ 
> +        int attr; 
> +        if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) { 
> +            attr |= FS_NOCOW_FL; 
> +            ioctl(fd, FS_IOC_SETFLAGS, &attr); 
> +        } 
> +#endif 
> +    } 
> + 
>      /* 
>       * Calculate matching total_size and geometry. Increase the number of 
>       * sectors requested until we get enough (or fail). This ensures that 
> @@ -884,6 +908,11 @@ static QemuOptsList vpc_create_opts = { 
>                  "Type of virtual hard disk format. Supported formats are " 
>                  "{dynamic (default) | fixed} " 
>          }, 
> +        { 
> +            .name = BLOCK_OPT_NOCOW, 
> +            .type = QEMU_OPT_BOOL, 
> +            .help = "Turn off copy-on-write (valid only on btrfs)" 
> +        }, 
>          { /* end of list */ } 
>      } 
>  }; 
> diff --git a/include/block/block_int.h b/include/block/block_int.h 
> index 7aa2213..4e5022a 100644 
> --- a/include/block/block_int.h 
> +++ b/include/block/block_int.h 
> @@ -54,6 +54,7 @@ 
>  #define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts" 
>  #define BLOCK_OPT_ADAPTER_TYPE      "adapter_type" 
>  #define BLOCK_OPT_REDUNDANCY        "redundancy" 
> +#define BLOCK_OPT_NOCOW             "nocow" 
>   
>  typedef struct BdrvTrackedRequest { 
>      BlockDriverState *bs; 
> diff --git a/qemu-doc.texi b/qemu-doc.texi 
> index 88ec9bb..ad92c85 100644 
> --- a/qemu-doc.texi 
> +++ b/qemu-doc.texi 
> @@ -589,6 +589,22 @@ check -r all} is required, which may take some time. 
>   
>  This option can only be enabled if @code{compat=1.1} is specified. 
>   
> +@item nocow 
> +If this option is set to @code{on}, it will trun off COW of the file. It's  
> only 
> +valid on btrfs, no effect on other file systems. 
> + 
> +Btrfs has low performance when hosting a VM image file, even more when the  
> guest 
> +on the VM also using btrfs as file system. Turning off COW is a way to  
> mitigate 
> +this bad performance. Generally there are two ways to turn off COW on  
> btrfs: 
> +a) Disable it by mounting with nodatacow, then all newly created files will  
> be 
> +NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this  
> option 
> +does. 
> + 
> +Note: this option is only valid to new or empty files. If there is an  
> existing 
> +file which is COW and has data blocks already, it couldn't be changed to  
> NOCOW 
> +by setting @code{nocow=on}. One can issue @code{lsattr filename} to check  
> if 
> +the NOCOW flag is set or not (Capitabl 'C' is NOCOW flag). 
> + 
>  @end table 
>   
>  @item qed 
> diff --git a/qemu-img.texi b/qemu-img.texi 
> index c68b541..8496f3b 100644 
> --- a/qemu-img.texi 
> +++ b/qemu-img.texi 
> @@ -474,6 +474,22 @@ check -r all} is required, which may take some time. 
>   
>  This option can only be enabled if @code{compat=1.1} is specified. 
>   
> +@item nocow 
> +If this option is set to @code{on}, it will trun off COW of the file. It's  
> only 
> +valid on btrfs, no effect on other file systems. 
> + 
> +Btrfs has low performance when hosting a VM image file, even more when the  
> guest 
> +on the VM also using btrfs as file system. Turning off COW is a way to  
> mitigate 
> +this bad performance. Generally there are two ways to turn off COW on  
> btrfs: 
> +a) Disable it by mounting with nodatacow, then all newly created files will  
> be 
> +NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this  
> option 
> +does. 
> + 
> +Note: this option is only valid to new or empty files. If there is an  
> existing 
> +file which is COW and has data blocks already, it couldn't be changed to  
> NOCOW 
> +by setting @code{nocow=on}. One can issue @code{lsattr filename} to check  
> if 
> +the NOCOW flag is set or not (Capitabl 'C' is NOCOW flag). 
> + 
>  @end table 
>   
>  @item Other 
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option
  2014-06-23  9:17 [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option Chunyan Liu
  2014-06-27  3:36 ` Chun Yan Liu
@ 2014-06-27 11:48 ` Stefan Hajnoczi
  2014-06-30  4:50   ` Chun Yan Liu
  2014-07-01 21:02 ` Eric Blake
  2 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2014-06-27 11:48 UTC (permalink / raw)
  To: Chunyan Liu; +Cc: qemu-devel

[-- Attachment #1: Type: text/plain, Size: 3150 bytes --]

On Mon, Jun 23, 2014 at 05:17:02PM +0800, Chunyan Liu wrote:
> Add 'nocow' option so that users could have a chance to set NOCOW flag to
> newly created files. It's useful on btrfs file system to enhance performance.
> 
> Btrfs has low performance when hosting VM images, even more when the guest
> in those VM are also using btrfs as file system. One way to mitigate this bad
> performance is to turn off COW attributes on VM files. Generally, there are
> two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then
> all newly created files will be NOCOW. b) per file. Add the NOCOW file
> attribute. It could only be done to empty or new files.
> 
> This patch tries the second way, according to the option, it could add NOCOW
> per file.
> 
> For most block drivers, since the create file step is in raw-posix.c, so we
> can do setting NOCOW flag ioctl in raw-posix.c only.
> 
> But there are some exceptions, like block/vpc.c and block/vdi.c, they are
> creating file by calling qemu_open directly. For them, do the same setting
> NOCOW flag ioctl work in them separately.
> 
> Signed-off-by: Chunyan Liu <cyliu@suse.com>
> ---
> Changes to v2:
>   * based on QemuOpts instead of old QEMUOptionParameters
>   * add nocow description in man page and html doc
> 
>   Old v2 is here:
>   http://lists.gnu.org/archive/html/qemu-devel/2013-11/msg02429.html
> 
> ---
>  block/cow.c               |  5 +++++
>  block/qcow.c              |  5 +++++
>  block/qcow2.c             |  5 +++++
>  block/qed.c               | 11 ++++++++---
>  block/raw-posix.c         | 25 +++++++++++++++++++++++++
>  block/vdi.c               | 29 +++++++++++++++++++++++++++++
>  block/vhdx.c              |  5 +++++
>  block/vmdk.c              | 11 ++++++++---
>  block/vpc.c               | 29 +++++++++++++++++++++++++++++
>  include/block/block_int.h |  1 +
>  qemu-doc.texi             | 16 ++++++++++++++++
>  qemu-img.texi             | 16 ++++++++++++++++
>  12 files changed, 152 insertions(+), 6 deletions(-)

Are you sure it's necessary to touch all image formats in order to pass
through the nocow option?  Looking at bdrv_img_create() I think it will
work without touching all image formats since both drv and
proto_drv->create_opts are appended:

void bdrv_img_create(const char *filename, const char *fmt,
                     const char *base_filename, const char *base_fmt,
                     char *options, uint64_t img_size, int flags,
                     Error **errp, bool quiet)
{
    QemuOptsList *create_opts = NULL;
...
    create_opts = qemu_opts_append(create_opts, drv->create_opts);
    create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);

    /* Create parameter list with default values */
    opts = qemu_opts_create(create_opts, NULL, 0, &error_abort);
    qemu_opt_set_number(opts, BLOCK_OPT_SIZE, img_size);

    /* Parse -o options */
    if (options) {
        if (qemu_opts_do_parse(opts, options, NULL) != 0) {
            error_setg(errp, "Invalid options for file format '%s'", fmt);
            goto out;
        }
    }

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option
  2014-06-27 11:48 ` Stefan Hajnoczi
@ 2014-06-30  4:50   ` Chun Yan Liu
  0 siblings, 0 replies; 6+ messages in thread
From: Chun Yan Liu @ 2014-06-30  4:50 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel



>>> On 6/27/2014 at 07:48 PM, in message
<20140627114806.GM12061@stefanha-thinkpad.muc.redhat.com>, Stefan Hajnoczi
<stefanha@redhat.com> wrote: 
> On Mon, Jun 23, 2014 at 05:17:02PM +0800, Chunyan Liu wrote: 
> > Add 'nocow' option so that users could have a chance to set NOCOW flag to 
> > newly created files. It's useful on btrfs file system to enhance  
> performance. 
> >  
> > Btrfs has low performance when hosting VM images, even more when the guest 
> > in those VM are also using btrfs as file system. One way to mitigate this  
> bad 
> > performance is to turn off COW attributes on VM files. Generally, there are 
> > two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then 
> > all newly created files will be NOCOW. b) per file. Add the NOCOW file 
> > attribute. It could only be done to empty or new files. 
> >  
> > This patch tries the second way, according to the option, it could add  
> NOCOW 
> > per file. 
> >  
> > For most block drivers, since the create file step is in raw-posix.c, so we 
> > can do setting NOCOW flag ioctl in raw-posix.c only. 
> >  
> > But there are some exceptions, like block/vpc.c and block/vdi.c, they are 
> > creating file by calling qemu_open directly. For them, do the same setting 
> > NOCOW flag ioctl work in them separately. 
> >  
> > Signed-off-by: Chunyan Liu <cyliu@suse.com> 
> > --- 
> > Changes to v2: 
> >   * based on QemuOpts instead of old QEMUOptionParameters 
> >   * add nocow description in man page and html doc 
> >  
> >   Old v2 is here: 
> >   http://lists.gnu.org/archive/html/qemu-devel/2013-11/msg02429.html 
> >  
> > --- 
> >  block/cow.c               |  5 +++++ 
> >  block/qcow.c              |  5 +++++ 
> >  block/qcow2.c             |  5 +++++ 
> >  block/qed.c               | 11 ++++++++--- 
> >  block/raw-posix.c         | 25 +++++++++++++++++++++++++ 
> >  block/vdi.c               | 29 +++++++++++++++++++++++++++++ 
> >  block/vhdx.c              |  5 +++++ 
> >  block/vmdk.c              | 11 ++++++++--- 
> >  block/vpc.c               | 29 +++++++++++++++++++++++++++++ 
> >  include/block/block_int.h |  1 + 
> >  qemu-doc.texi             | 16 ++++++++++++++++ 
> >  qemu-img.texi             | 16 ++++++++++++++++ 
> >  12 files changed, 152 insertions(+), 6 deletions(-) 
>  
> Are you sure it's necessary to touch all image formats in order to pass 
> through the nocow option?  Looking at bdrv_img_create() I think it will 
> work without touching all image formats since both drv and 
> proto_drv->create_opts are appended: 

Right. For those calling bdrv_create_file to create file, it's not necessary
to add NOCOW option to their .create_opts. Adding NOCOW to raw-posix.c
is enough. There will be no difference to users when they do:
qemu-img create -f fmt name size -o nocow=on
or
qemu-img create -f fmt name size -o ?

>  
> void bdrv_img_create(const char *filename, const char *fmt, 
>                      const char *base_filename, const char *base_fmt, 
>                      char *options, uint64_t img_size, int flags, 
>                      Error **errp, bool quiet) 
> { 
>     QemuOptsList *create_opts = NULL; 
> ... 
>     create_opts = qemu_opts_append(create_opts, drv->create_opts); 
>     create_opts = qemu_opts_append(create_opts, proto_drv->create_opts); 
>  
>     /* Create parameter list with default values */ 
>     opts = qemu_opts_create(create_opts, NULL, 0, &error_abort); 
>     qemu_opt_set_number(opts, BLOCK_OPT_SIZE, img_size); 
>  
>     /* Parse -o options */ 
>     if (options) { 
>         if (qemu_opts_do_parse(opts, options, NULL) != 0) { 
>             error_setg(errp, "Invalid options for file format '%s'", fmt); 
>             goto out; 
>         } 
>     } 
>  

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option
  2014-06-23  9:17 [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option Chunyan Liu
  2014-06-27  3:36 ` Chun Yan Liu
  2014-06-27 11:48 ` Stefan Hajnoczi
@ 2014-07-01 21:02 ` Eric Blake
  2014-07-02  3:03   ` Chun Yan Liu
  2 siblings, 1 reply; 6+ messages in thread
From: Eric Blake @ 2014-07-01 21:02 UTC (permalink / raw)
  To: Chunyan Liu, qemu-devel; +Cc: stefanha

[-- Attachment #1: Type: text/plain, Size: 5376 bytes --]

On 06/23/2014 03:17 AM, Chunyan Liu wrote:
> Add 'nocow' option so that users could have a chance to set NOCOW flag to
> newly created files. It's useful on btrfs file system to enhance performance.
> 
> Btrfs has low performance when hosting VM images, even more when the guest
> in those VM are also using btrfs as file system. One way to mitigate this bad
> performance is to turn off COW attributes on VM files. Generally, there are
> two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then
> all newly created files will be NOCOW. b) per file. Add the NOCOW file
> attribute. It could only be done to empty or new files.
> 
> This patch tries the second way, according to the option, it could add NOCOW
> per file.
> 
> For most block drivers, since the create file step is in raw-posix.c, so we
> can do setting NOCOW flag ioctl in raw-posix.c only.
> 
> But there are some exceptions, like block/vpc.c and block/vdi.c, they are
> creating file by calling qemu_open directly. For them, do the same setting
> NOCOW flag ioctl work in them separately.

Design question (not a patch review):

It looks like your patch allows one to set the NOCOW flag via ioctl when
requested.  But how does one learn if the flag is already set?  Can you
update 'qemu-img info' to show whether a file currently has the flag
set?  Can 'qemu-img amend' be taught to set and/or clear the flag on an
already existing file?


> @@ -1291,6 +1296,21 @@ static int raw_create(const char *filename, QemuOpts *opts, Error **errp)
>          result = -errno;
>          error_setg_errno(errp, -result, "Could not create file");
>      } else {
> +        if (nocow) {
> +#ifdef __linux__
> +            /* Set NOCOW flag to solve performance issue on fs like btrfs.
> +             * This is an optimisation. The FS_IOC_SETFLAGS ioctl return value
> +             * will be ignored since any failure of this operation should not
> +             * block the left work.
> +             */
> +            int attr;
> +            if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) {
> +                attr |= FS_NOCOW_FL;
> +                ioctl(fd, FS_IOC_SETFLAGS, &attr);
> +            }
> +#endif
> +        }

This silently ignores the nocow flag on non-Linux.  Wouldn't it be
better to reject the option as unsupported?

What happens if the ioctl fails?  Would it be better to fail the
qemu-img creation if the flag is requested but can't be honored?


> +++ b/qemu-doc.texi
> @@ -589,6 +589,22 @@ check -r all} is required, which may take some time.
>  
>  This option can only be enabled if @code{compat=1.1} is specified.
>  
> +@item nocow
> +If this option is set to @code{on}, it will trun off COW of the file. It's only

s/trun/turn/

> +valid on btrfs, no effect on other file systems.

This sort of statement may get stale, if other file systems learn to
honor the same ioctl as btrfs.

> +
> +Btrfs has low performance when hosting a VM image file, even more when the guest
> +on the VM also using btrfs as file system. Turning off COW is a way to mitigate
> +this bad performance. Generally there are two ways to turn off COW on btrfs:
> +a) Disable it by mounting with nodatacow, then all newly created files will be
> +NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this option
> +does.
> +
> +Note: this option is only valid to new or empty files. If there is an existing
> +file which is COW and has data blocks already, it couldn't be changed to NOCOW
> +by setting @code{nocow=on}. One can issue @code{lsattr filename} to check if
> +the NOCOW flag is set or not (Capitabl 'C' is NOCOW flag).

s/Capitabl/Capital/

Oh, so it looks like setting the attribute is one-way, and can't be
undone once something is written?  Or is it that it can only be set on
an empty file, but can be cleared at any time?

Again, making people refer to lsattr to learn if the flag is already set
seems painful; can qemu-img info be taught to expose this information,
so that one tool is sufficient to manage the entire experience?

> +++ b/qemu-img.texi
> @@ -474,6 +474,22 @@ check -r all} is required, which may take some time.
>  
>  This option can only be enabled if @code{compat=1.1} is specified.
>  
> +@item nocow
> +If this option is set to @code{on}, it will trun off COW of the file. It's only

s/trun/turn/

> +valid on btrfs, no effect on other file systems.
> +
> +Btrfs has low performance when hosting a VM image file, even more when the guest
> +on the VM also using btrfs as file system. Turning off COW is a way to mitigate
> +this bad performance. Generally there are two ways to turn off COW on btrfs:
> +a) Disable it by mounting with nodatacow, then all newly created files will be
> +NOCOW. b) For an empty file, add the NOCOW file attribute. That's what this option
> +does.
> +
> +Note: this option is only valid to new or empty files. If there is an existing
> +file which is COW and has data blocks already, it couldn't be changed to NOCOW
> +by setting @code{nocow=on}. One can issue @code{lsattr filename} to check if
> +the NOCOW flag is set or not (Capitabl 'C' is NOCOW flag).

s/Capitabl/Capital/

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option
  2014-07-01 21:02 ` Eric Blake
@ 2014-07-02  3:03   ` Chun Yan Liu
  0 siblings, 0 replies; 6+ messages in thread
From: Chun Yan Liu @ 2014-07-02  3:03 UTC (permalink / raw)
  To: qemu-devel, Eric Blake; +Cc: stefanha



>>> On 7/2/2014 at 05:02 AM, in message <53B321CC.7070300@redhat.com>, Eric Blake
<eblake@redhat.com> wrote: 
> On 06/23/2014 03:17 AM, Chunyan Liu wrote: 
> > Add 'nocow' option so that users could have a chance to set NOCOW flag to 
> > newly created files. It's useful on btrfs file system to enhance  
> performance. 
> >  
> > Btrfs has low performance when hosting VM images, even more when the guest 
> > in those VM are also using btrfs as file system. One way to mitigate this  
> bad 
> > performance is to turn off COW attributes on VM files. Generally, there are 
> > two ways to turn off NOCOW on btrfs: a) by mounting fs with nodatacow, then 
> > all newly created files will be NOCOW. b) per file. Add the NOCOW file 
> > attribute. It could only be done to empty or new files. 
> >  
> > This patch tries the second way, according to the option, it could add  
> NOCOW 
> > per file. 
> >  
> > For most block drivers, since the create file step is in raw-posix.c, so we 
> > can do setting NOCOW flag ioctl in raw-posix.c only. 
> >  
> > But there are some exceptions, like block/vpc.c and block/vdi.c, they are 
> > creating file by calling qemu_open directly. For them, do the same setting 
> > NOCOW flag ioctl work in them separately. 
>  
> Design question (not a patch review): 
>  
> It looks like your patch allows one to set the NOCOW flag via ioctl when 
> requested.  But how does one learn if the flag is already set?  Can you 
> update 'qemu-img info' to show whether a file currently has the flag 
> set? 

The reliable way to check if the file is really NOCOW is 'lsattr filename'.
-o nocow=on not really means NOCOW successfully set. We didn't block
file creation even if ioctl fails (see below).
Maybe 'qemu-img info' can call 'lsattr filename' to judge if the file is NOCOW.

> Can 'qemu-img amend' be taught to set and/or clear the flag on an 
> already existing file? 

No. It's one way. A COW file can be not changed to NOCOW, vice versa.

>  
>  
> > @@ -1291,6 +1296,21 @@ static int raw_create(const char *filename, QemuOpts  
> *opts, Error **errp) 
> >          result = -errno; 
> >          error_setg_errno(errp, -result, "Could not create file"); 
> >      } else { 
> > +        if (nocow) { 
> > +#ifdef __linux__ 
> > +            /* Set NOCOW flag to solve performance issue on fs like btrfs. 
> > +             * This is an optimisation. The FS_IOC_SETFLAGS ioctl return  
> value 
> > +             * will be ignored since any failure of this operation should  
> not 
> > +             * block the left work. 
> > +             */ 
> > +            int attr; 
> > +            if (ioctl(fd, FS_IOC_GETFLAGS, &attr) == 0) { 
> > +                attr |= FS_NOCOW_FL; 
> > +                ioctl(fd, FS_IOC_SETFLAGS, &attr); 
> > +            } 
> > +#endif 
> > +        } 
>  
> This silently ignores the nocow flag on non-Linux.  Wouldn't it be 
> better to reject the option as unsupported? 

First, other file systems (at least ext3 ext4 as I checked ) not supporting that
ioctl would simply return 0.  So we couldn't rely on the ioctl return value
to check if the option is unsupported.
 
> What happens if the ioctl fails?  Would it be better to fail the 
> qemu-img creation if the flag is requested but can't be honored? 
>  

NOCOW is an optimization for performance, we don't want it to block the file
creation. In other words, if ioctl fails, we still hope the file image is created.

Another reason for doing this is the ioctl return value is not consistent in
different file systems, ioctl return value:
0: not always means SUCCESS, could be UNSUPPORTED 
!0: could be supported but not successfuly (like in btrfs);
     could be unsupported at all but other error. (other fs. In this case, we don't
      want to block the file creation.)

>  
> > +++ b/qemu-doc.texi 
> > @@ -589,6 +589,22 @@ check -r all} is required, which may take some time. 
> >   
> >  This option can only be enabled if @code{compat=1.1} is specified. 
> >   
> > +@item nocow 
> > +If this option is set to @code{on}, it will trun off COW of the file. It's  
> only 
>  
> s/trun/turn/ 
>  
> > +valid on btrfs, no effect on other file systems. 
>  
> This sort of statement may get stale, if other file systems learn to 
> honor the same ioctl as btrfs. 

Yes. Currently only btrfs supports this ioctl. Writting here because we didn't
check what current fs is, and we didn't have a reliable way to report the ioctl
is unsupported (as above). So to warn users in manpage.

>  
> > + 
> > +Btrfs has low performance when hosting a VM image file, even more when the  
> guest 
> > +on the VM also using btrfs as file system. Turning off COW is a way to  
> mitigate 
> > +this bad performance. Generally there are two ways to turn off COW on  
> btrfs: 
> > +a) Disable it by mounting with nodatacow, then all newly created files  
> will be 
> > +NOCOW. b) For an empty file, add the NOCOW file attribute. That's what  
> this option 
> > +does. 
> > + 
> > +Note: this option is only valid to new or empty files. If there is an  
> existing 
> > +file which is COW and has data blocks already, it couldn't be changed to  
> NOCOW 
> > +by setting @code{nocow=on}. One can issue @code{lsattr filename} to check  
> if 
> > +the NOCOW flag is set or not (Capitabl 'C' is NOCOW flag). 
>  
> s/Capitabl/Capital/ 
>  
> Oh, so it looks like setting the attribute is one-way, and can't be 
> undone once something is written?  Or is it that it can only be set on 
> an empty file, but can be cleared at any time? 

Yes, it's one-way. A COW file can be not changed to NOCOW, vice versa.
To change, one can create a new file with expected COW or NOCOW, and
copy data to that new file.

-Chunyan

>  
> Again, making people refer to lsattr to learn if the flag is already set 
> seems painful; can qemu-img info be taught to expose this information, 
> so that one tool is sufficient to manage the entire experience? 
>  
> > +++ b/qemu-img.texi 
> > @@ -474,6 +474,22 @@ check -r all} is required, which may take some time. 
> >   
> >  This option can only be enabled if @code{compat=1.1} is specified. 
> >   
> > +@item nocow 
> > +If this option is set to @code{on}, it will trun off COW of the file. It's  
> only 
>  
> s/trun/turn/ 
>  
> > +valid on btrfs, no effect on other file systems. 
> > + 
> > +Btrfs has low performance when hosting a VM image file, even more when the  
> guest 
> > +on the VM also using btrfs as file system. Turning off COW is a way to  
> mitigate 
> > +this bad performance. Generally there are two ways to turn off COW on  
> btrfs: 
> > +a) Disable it by mounting with nodatacow, then all newly created files  
> will be 
> > +NOCOW. b) For an empty file, add the NOCOW file attribute. That's what  
> this option 
> > +does. 
> > + 
> > +Note: this option is only valid to new or empty files. If there is an  
> existing 
> > +file which is COW and has data blocks already, it couldn't be changed to  
> NOCOW 
> > +by setting @code{nocow=on}. One can issue @code{lsattr filename} to check  
> if 
> > +the NOCOW flag is set or not (Capitabl 'C' is NOCOW flag). 
>  
> s/Capitabl/Capital/ 
>  
> --  
> Eric Blake   eblake redhat com    +1-919-301-3266 
> Libvirt virtualization library http://libvirt.org 
>  
>  

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-07-02  3:04 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-06-23  9:17 [Qemu-devel] [PATCH V3] qemu-img create: add 'nocow' option Chunyan Liu
2014-06-27  3:36 ` Chun Yan Liu
2014-06-27 11:48 ` Stefan Hajnoczi
2014-06-30  4:50   ` Chun Yan Liu
2014-07-01 21:02 ` Eric Blake
2014-07-02  3:03   ` Chun Yan Liu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.