* [PATCH 0/9] kvm tools, qcow: Improve QCOW performance
@ 2011-07-09 13:02 Pekka Enberg
  2011-07-09 13:02 ` [PATCH 1/9] kvm tools, qcow: Rename struct qcow_l2_cache to struct qcow_l2_table Pekka Enberg
                   ` (9 more replies)
  0 siblings, 10 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-07-09 13:02 UTC (permalink / raw)
  To: kvm
  Cc: Pekka Enberg, Asias He, Cyrill Gorcunov, Ingo Molnar,
	Prasad Joshi, Sasha Levin

This series fixes QCOW locking issues and implements delayed metadata writeout.
This improves writeout performance for QCOW2 images that don't yet have their
clusters and L2 tables allocated on-disk.
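
For reference, by the end of the series (patches 5, 8 and 9 combined) the flush
path ends up roughly as below; this is a condensed sketch, the individual
patches contain the authoritative diffs:

  static int qcow_disk_flush(struct disk_image *disk)
  {
  	struct qcow *q = disk->priv;
  	struct qcow_header *header = q->header;
  	struct qcow_table *table = &q->table;
  	struct list_head *pos, *n;

  	mutex_lock(&q->mutex);

  	/* Write out cached L2 tables (only dirty ones after patch 9/9) */
  	list_for_each_safe(pos, n, &q->lru_list) {
  		struct qcow_l2_table *c = list_entry(pos, struct qcow_l2_table, list);

  		if (qcow_l2_cache_write(q, c) < 0)
  			goto error_unlock;
  	}

  	/* Data and L2 tables must be stable before the L1 table references them */
  	if (fdatasync(disk->fd) < 0)
  		goto error_unlock;

  	/* Only now publish the metadata by writing out the in-core L1 table */
  	if (pwrite_in_full(disk->fd, table->l1_table, table->table_size * sizeof(u64), header->l1_table_offset) < 0)
  		goto error_unlock;

  	mutex_unlock(&q->mutex);

  	return fsync(disk->fd);

  error_unlock:
  	mutex_unlock(&q->mutex);
  	return -1;
  }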

I tested the series by running 

  mount -t ext4 /dev/vdb /mnt
  dd if=/dev/zero of=/mnt/tmp

in the guest multiple times against a freshly generated QCOW2 image:

  dd if=/dev/zero of=fs.ext4 bs=1024k count=512 && mkfs.ext4 -F fs.ext4 && qemu-img convert -O qcow2 fs.ext4 fs.qcow2

which causes worst-case behavior for the current code.

Before:

  [ seekwatcher: http://userweb.kernel.org/~penberg/kvm-qcow-delayed/kvm-qcow2-master.png ]

  511229952 bytes (511 MB) copied, 19.906 s, 25.7 MB/s
  511229952 bytes (511 MB) copied, 20.3168 s, 25.2 MB/s
  511229952 bytes (511 MB) copied, 20.8078 s, 24.6 MB/s
  511229952 bytes (511 MB) copied, 21.0889 s, 24.2 MB/s
  511229952 bytes (511 MB) copied, 20.7833 s, 24.6 MB/s
  511229952 bytes (511 MB) copied, 20.7536 s, 24.6 MB/s
  511229952 bytes (511 MB) copied, 20.0312 s, 25.5 MB/s

After:

  [ seekwatcher: http://userweb.kernel.org/~penberg/kvm-qcow-delayed/kvm-qcow2-delayed.png ]

  511229952 bytes (511 MB) copied, 7.68312 s, 66.5 MB/s
  511229952 bytes (511 MB) copied, 7.54065 s, 67.8 MB/s
  511229952 bytes (511 MB) copied, 9.34749 s, 54.7 MB/s
  511229952 bytes (511 MB) copied, 9.2421 s, 55.3 MB/s
  511229952 bytes (511 MB) copied, 9.9364 s, 51.5 MB/s
  511229952 bytes (511 MB) copied, 10.0337 s, 51.0 MB/s
  511229952 bytes (511 MB) copied, 9.39502 s, 54.4 MB/s

For comparison, the results for raw ext4 images are as follows:

  [ seekwatcher: http://userweb.kernel.org/~penberg/kvm-qcow-delayed/kvm-raw.png ]

  511229952 bytes (511 MB) copied, 8.75403 s, 58.4 MB/s
  511229952 bytes (511 MB) copied, 10.3326 s, 49.5 MB/s
  511229952 bytes (511 MB) copied, 6.8494 s, 74.6 MB/s
  511229952 bytes (511 MB) copied, 9.21538 s, 55.5 MB/s
  511229952 bytes (511 MB) copied, 8.52883 s, 59.9 MB/s
  511229952 bytes (511 MB) copied, 8.13298 s, 62.9 MB/s
  511229952 bytes (511 MB) copied, 5.03801 s, 101 MB/s

			Pekka

Pekka Enberg (9):
  kvm tools, qcow: Rename struct qcow_l2_cache to struct qcow_l2_table
  kvm tools, qcow: Use 'struct qcow_l2_table' instead of untyped array
  kvm tools, qcow: Fix locking issues
  kvm tools, qcow: Introduce qcow_disk_flush()
  kvm tools, qcow: Delayed L1 table writeout
  kvm tools, qcow: Don't fdatasync() L2 table writeout
  kvm tools, qcow: Use big endian order for L2 table entries
  kvm tools, qcow: Delayed L2 table writeout
  kvm tools, qcow: Flush only dirty L2 tables

 tools/kvm/disk/qcow.c         |  248 ++++++++++++++++++++++-------------------
 tools/kvm/include/kvm/mutex.h |    6 +
 tools/kvm/include/kvm/qcow.h  |    6 +-
 3 files changed, 145 insertions(+), 115 deletions(-)



* [PATCH 1/9] kvm tools, qcow: Rename struct qcow_l2_cache to struct qcow_l2_table
  2011-07-09 13:02 [PATCH 0/9] kvm tools, qcow: Improve QCOW performance Pekka Enberg
@ 2011-07-09 13:02 ` Pekka Enberg
  2011-07-09 13:02 ` [PATCH 2/9] kvm tools, qcow: Use 'struct qcow_l2_table' instead of untyped array Pekka Enberg
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-07-09 13:02 UTC (permalink / raw)
  To: kvm
  Cc: Pekka Enberg, Asias He, Cyrill Gorcunov, Ingo Molnar,
	Prasad Joshi, Sasha Levin

This patch renames 'struct qcow_l2_cache' to 'struct qcow_l2_table' in
preparation for replacing the untyped L2 table arrays with the struct.

Cc: Asias He <asias.hejun@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
---
 tools/kvm/disk/qcow.c        |   32 ++++++++++++++++----------------
 tools/kvm/include/kvm/qcow.h |    2 +-
 2 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c
index 3f8c52d..a346c3d 100644
--- a/tools/kvm/disk/qcow.c
+++ b/tools/kvm/disk/qcow.c
@@ -16,16 +16,16 @@
 #include <linux/kernel.h>
 #include <linux/types.h>
 
-static int insert(struct rb_root *root, struct qcow_l2_cache *new)
+static int insert(struct rb_root *root, struct qcow_l2_table *new)
 {
 	struct rb_node **link = &(root->rb_node), *parent = NULL;
 	u64 offset = new->offset;
 
 	/* search the tree */
 	while (*link) {
-		struct qcow_l2_cache *t;
+		struct qcow_l2_table *t;
 
-		t = rb_entry(*link, struct qcow_l2_cache, node);
+		t = rb_entry(*link, struct qcow_l2_table, node);
 		if (!t)
 			goto error;
 
@@ -48,14 +48,14 @@ error:
 	return -1;
 }
 
-static struct qcow_l2_cache *search(struct rb_root *root, u64 offset)
+static struct qcow_l2_table *search(struct rb_root *root, u64 offset)
 {
 	struct rb_node *link = root->rb_node;
 
 	while (link) {
-		struct qcow_l2_cache *t;
+		struct qcow_l2_table *t;
 
-		t = rb_entry(link, struct qcow_l2_cache, node);
+		t = rb_entry(link, struct qcow_l2_table, node);
 		if (!t)
 			goto out;
 
@@ -73,13 +73,13 @@ out:
 static void free_cache(struct qcow *q)
 {
 	struct list_head *pos, *n;
-	struct qcow_l2_cache *t;
+	struct qcow_l2_table *t;
 	struct rb_root *r = &q->root;
 
 	list_for_each_safe(pos, n, &q->lru_list) {
 		/* Remove cache table from the list and RB tree */
 		list_del(pos);
-		t = list_entry(pos, struct qcow_l2_cache, list);
+		t = list_entry(pos, struct qcow_l2_table, list);
 		rb_erase(&t->node, r);
 
 		/* Free the cached node */
@@ -87,17 +87,17 @@ static void free_cache(struct qcow *q)
 	}
 }
 
-static int cache_table(struct qcow *q, struct qcow_l2_cache *c)
+static int cache_table(struct qcow *q, struct qcow_l2_table *c)
 {
 	struct rb_root *r = &q->root;
-	struct qcow_l2_cache *lru;
+	struct qcow_l2_table *lru;
 
 	if (q->nr_cached == MAX_CACHE_NODES) {
 		/*
 		 * The node at the head of the list is least recently used
 		 * node. Remove it from the list and replaced with a new node.
 		 */
-		lru = list_first_entry(&q->lru_list, struct qcow_l2_cache, list);
+		lru = list_first_entry(&q->lru_list, struct qcow_l2_table, list);
 
 		/* Remove the node from the cache */
 		rb_erase(&lru->node, r);
@@ -123,7 +123,7 @@ error:
 
 static int search_table(struct qcow *q, u64 **table, u64 offset)
 {
-	struct qcow_l2_cache *c;
+	struct qcow_l2_table *c;
 
 	*table = NULL;
 
@@ -139,10 +139,10 @@ static int search_table(struct qcow *q, u64 **table, u64 offset)
 }
 
 /* Allocates a new node for caching L2 table */
-static struct qcow_l2_cache *new_cache_table(struct qcow *q, u64 offset)
+static struct qcow_l2_table *new_cache_table(struct qcow *q, u64 offset)
 {
 	struct qcow_header *header = q->header;
-	struct qcow_l2_cache *c;
+	struct qcow_l2_table *c;
 	u64 l2t_sz;
 	u64 size;
 
@@ -183,7 +183,7 @@ static inline u64 get_cluster_offset(struct qcow *q, u64 offset)
 static int qcow_read_l2_table(struct qcow *q, u64 **table, u64 offset)
 {
 	struct qcow_header *header = q->header;
-	struct qcow_l2_cache *c;
+	struct qcow_l2_table *c;
 	u64 size;
 	u64 i;
 	u64 *t;
@@ -367,7 +367,7 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 {
 	struct qcow_header *header = q->header;
 	struct qcow_table  *table  = &q->table;
-	struct qcow_l2_cache *c;
+	struct qcow_l2_table *c;
 	bool update_meta;
 	u64 clust_start;
 	u64 clust_off;
diff --git a/tools/kvm/include/kvm/qcow.h b/tools/kvm/include/kvm/qcow.h
index 973d9f3..12247e0 100644
--- a/tools/kvm/include/kvm/qcow.h
+++ b/tools/kvm/include/kvm/qcow.h
@@ -21,7 +21,7 @@
 
 #define MAX_CACHE_NODES         32
 
-struct qcow_l2_cache {
+struct qcow_l2_table {
 	u64                     offset;
 	struct rb_node          node;
 	struct list_head        list;
-- 
1.7.0.4



* [PATCH 2/9] kvm tools, qcow: Use 'struct qcow_l2_table' instead of untyped array
  2011-07-09 13:02 [PATCH 0/9] kvm tools, qcow: Improve QCOW performance Pekka Enberg
  2011-07-09 13:02 ` [PATCH 1/9] kvm tools, qcow: Rename struct qcow_l2_cache to struct qcow_l2_table Pekka Enberg
@ 2011-07-09 13:02 ` Pekka Enberg
  2011-07-09 13:02 ` [PATCH 3/9] kvm tools, qcow: Fix locking issues Pekka Enberg
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-07-09 13:02 UTC (permalink / raw)
  To: kvm
  Cc: Pekka Enberg, Asias He, Cyrill Gorcunov, Ingo Molnar,
	Prasad Joshi, Sasha Levin

This patch converts disk/qcow.c to use 'struct qcow_l2_table' throughout so
that dirty L2 tables can be tracked later in this series.

Cc: Asias He <asias.hejun@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
---
 tools/kvm/disk/qcow.c |   80 ++++++++++++++++++++++---------------------------
 1 files changed, 36 insertions(+), 44 deletions(-)

diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c
index a346c3d..a1f6ef3 100644
--- a/tools/kvm/disk/qcow.c
+++ b/tools/kvm/disk/qcow.c
@@ -121,21 +121,18 @@ error:
 	return -1;
 }
 
-static int search_table(struct qcow *q, u64 **table, u64 offset)
+static struct qcow_l2_table *search_table(struct qcow *q, u64 offset)
 {
-	struct qcow_l2_table *c;
-
-	*table = NULL;
+	struct qcow_l2_table *l2t;
 
-	c = search(&q->root, offset);
-	if (!c)
-		return -1;
+	l2t = search(&q->root, offset);
+	if (!l2t)
+		return NULL;
 
 	/* Update the LRU state, by moving the searched node to list tail */
-	list_move_tail(&c->list, &q->lru_list);
+	list_move_tail(&l2t->list, &q->lru_list);
 
-	*table = c->table;
-	return 0;
+	return l2t;
 }
 
 /* Allocates a new node for caching L2 table */
@@ -180,62 +177,57 @@ static inline u64 get_cluster_offset(struct qcow *q, u64 offset)
 	return offset & ((1 << header->cluster_bits)-1);
 }
 
-static int qcow_read_l2_table(struct qcow *q, u64 **table, u64 offset)
+static struct qcow_l2_table *qcow_read_l2_table(struct qcow *q, u64 offset)
 {
 	struct qcow_header *header = q->header;
-	struct qcow_l2_table *c;
+	struct qcow_l2_table *l2t;
 	u64 size;
 	u64 i;
-	u64 *t;
 
-	c      = NULL;
-	*table = NULL;
-	size   = 1 << header->l2_bits;
+	size = 1 << header->l2_bits;
 
 	/* search an entry for offset in cache */
-	if (search_table(q, table, offset) >= 0)
-		return 0;
+	l2t = search_table(q, offset);
+	if (l2t)
+		return l2t;
 
 	/* allocate new node for caching l2 table */
-	c = new_cache_table(q, offset);
-	if (!c)
+	l2t = new_cache_table(q, offset);
+	if (!l2t)
 		goto error;
-	t = c->table;
 
 	/* table not cached: read from the disk */
-	if (pread_in_full(q->fd, t, size * sizeof(u64), offset) < 0)
+	if (pread_in_full(q->fd, l2t->table, size * sizeof(u64), offset) < 0)
 		goto error;
 
 	/* cache the table */
-	if (cache_table(q, c) < 0)
+	if (cache_table(q, l2t) < 0)
 		goto error;
 
 	/* change cached table to CPU's byte-order */
 	for (i = 0; i < size; i++)
-		be64_to_cpus(&t[i]);
+		be64_to_cpus(&l2t->table[i]);
 
-	*table = t;
-	return 0;
+	return l2t;
 error:
-	free(c);
-	return -1;
+	free(l2t);
+	return NULL;
 }
 
 static ssize_t qcow_read_cluster(struct qcow *q, u64 offset, void *dst, u32 dst_len)
 {
 	struct qcow_header *header = q->header;
 	struct qcow_table *table  = &q->table;
+	struct qcow_l2_table *l2_table;
 	u64 l2_table_offset;
 	u64 l2_table_size;
 	u64 cluster_size;
 	u64 clust_offset;
 	u64 clust_start;
 	size_t length;
-	u64 *l2_table;
 	u64 l1_idx;
 	u64 l2_idx;
 
-
 	cluster_size = 1 << header->cluster_bits;
 
 	l1_idx = get_l1_index(q, offset);
@@ -257,14 +249,15 @@ static ssize_t qcow_read_cluster(struct qcow *q, u64 offset, void *dst, u32 dst_
 	l2_table_size = 1 << header->l2_bits;
 
 	/* read and cache level 2 table */
-	if (qcow_read_l2_table(q, &l2_table, l2_table_offset) < 0)
+	l2_table = qcow_read_l2_table(q, l2_table_offset);
+	if (!l2_table)
 		goto out_error;
 
 	l2_idx = get_l2_index(q, offset);
 	if (l2_idx >= l2_table_size)
 		goto out_error;
 
-	clust_start = l2_table[l2_idx] & ~header->oflag_mask;
+	clust_start = l2_table->table[l2_idx] & ~header->oflag_mask;
 	if (!clust_start)
 		goto zero_cluster;
 
@@ -367,7 +360,7 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 {
 	struct qcow_header *header = q->header;
 	struct qcow_table  *table  = &q->table;
-	struct qcow_l2_table *c;
+	struct qcow_l2_table *l2t;
 	bool update_meta;
 	u64 clust_start;
 	u64 clust_off;
@@ -376,12 +369,11 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 	u64 l2t_idx;
 	u64 l2t_off;
 	u64 l2t_sz;
-	u64 *l2t;
 	u64 f_sz;
 	u64 len;
 	u64 t;
 
-	c               = NULL;
+	l2t		= NULL;
 	l2t_sz		= 1 << header->l2_bits;
 	clust_sz	= 1 << header->cluster_bits;
 
@@ -404,13 +396,13 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 	l2t_off		= table->l1_table[l1t_idx] & ~header->oflag_mask;
 	if (l2t_off) {
 		/* read and cache l2 table */
-		if (qcow_read_l2_table(q, &l2t, l2t_off) < 0)
+		l2t = qcow_read_l2_table(q, l2t_off);
+		if (!l2t)
 			goto error;
 	} else {
-		c = new_cache_table(q, l2t_off);
-		if (!c)
+		l2t = new_cache_table(q, l2t_off);
+		if (!l2t)
 			goto error;
-		l2t = c->table;
 
 		/* Capture the state of the consistent QCOW image */
 		f_sz		= file_size(q->fd);
@@ -418,7 +410,7 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 			goto free_cache;
 
 		/* Write the l2 table of 0's at the end of the file */
-		l2t_off		= qcow_write_l2_table(q, l2t);
+		l2t_off		= qcow_write_l2_table(q, l2t->table);
 		if (!l2t_off)
 			goto free_cache;
 
@@ -433,7 +425,7 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 			goto free_cache;
 		}
 
-		if (cache_table(q, c) < 0) {
+		if (cache_table(q, l2t) < 0) {
 			if (ftruncate(q->fd, f_sz) < 0)
 				goto free_cache;
 
@@ -449,7 +441,7 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 	if (!f_sz)
 		goto error;
 
-	clust_start	= l2t[l2t_idx] & ~header->oflag_mask;
+	clust_start	= l2t->table[l2t_idx] & ~header->oflag_mask;
 	if (!clust_start) {
 		clust_start	= ALIGN(f_sz, clust_sz);
 		update_meta	= true;
@@ -471,13 +463,13 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 		}
 
 		/* Update the cached level2 entry */
-		l2t[l2t_idx] = clust_start;
+		l2t->table[l2t_idx] = clust_start;
 	}
 
 	return len;
 
 free_cache:
-	free(c);
+	free(l2t);
 error:
 	return -1;
 }
-- 
1.7.0.4



* [PATCH 3/9] kvm tools, qcow: Fix locking issues
  2011-07-09 13:02 [PATCH 0/9] kvm tools, qcow: Improve QCOW performance Pekka Enberg
  2011-07-09 13:02 ` [PATCH 1/9] kvm tools, qcow: Rename struct qcow_l2_cache to struct qcow_l2_table Pekka Enberg
  2011-07-09 13:02 ` [PATCH 2/9] kvm tools, qcow: Use 'struct qcow_l2_table' instead of untyped array Pekka Enberg
@ 2011-07-09 13:02 ` Pekka Enberg
  2011-07-09 13:02 ` [PATCH 4/9] kvm tools, qcow: Introduce qcow_disk_flush() Pekka Enberg
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-07-09 13:02 UTC (permalink / raw)
  To: kvm
  Cc: Pekka Enberg, Asias He, Cyrill Gorcunov, Ingo Molnar,
	Prasad Joshi, Sasha Levin

The virtio_blk_do_io() function can enter the QCOW code through
disk_image__{read,write,flush}() from multiple threads because it uses a thread
pool for I/O requests. Thus, use locking to make the QCOW2 code thread-safe.
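
The pattern applied below is to hold q->mutex around L1/L2 metadata lookups and
cache updates and to drop it before the actual cluster data I/O; condensed from
the read path in the diff:

  	mutex_lock(&q->mutex);
  	l2_table_offset = table->l1_table[l1_idx] & ~header->oflag_mask;
  	if (!l2_table_offset)
  		goto zero_cluster;	/* unlocks and returns a zero-filled buffer */
  	...
  	clust_start = l2_table->table[l2_idx] & ~header->oflag_mask;
  	mutex_unlock(&q->mutex);

  	/* cluster data I/O happens outside the lock */
  	if (pread_in_full(q->fd, dst, length, clust_start + clust_offset) < 0)
  		return -1;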

Cc: Asias He <asias.hejun@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
---
 tools/kvm/disk/qcow.c         |   30 +++++++++++++++++++++---------
 tools/kvm/include/kvm/mutex.h |    6 ++++++
 tools/kvm/include/kvm/qcow.h  |    3 +++
 3 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c
index a1f6ef3..939bc61 100644
--- a/tools/kvm/disk/qcow.c
+++ b/tools/kvm/disk/qcow.c
@@ -2,6 +2,7 @@
 
 #include "kvm/disk-image.h"
 #include "kvm/read-write.h"
+#include "kvm/mutex.h"
 #include "kvm/util.h"
 
 #include <sys/types.h>
@@ -232,16 +233,17 @@ static ssize_t qcow_read_cluster(struct qcow *q, u64 offset, void *dst, u32 dst_
 
 	l1_idx = get_l1_index(q, offset);
 	if (l1_idx >= table->table_size)
-		goto out_error;
+		return -1;
 
 	clust_offset = get_cluster_offset(q, offset);
 	if (clust_offset >= cluster_size)
-		goto out_error;
+		return -1;
 
 	length = cluster_size - clust_offset;
 	if (length > dst_len)
 		length = dst_len;
 
+	mutex_lock(&q->mutex);
 	l2_table_offset = table->l1_table[l1_idx] & ~header->oflag_mask;
 	if (!l2_table_offset)
 		goto zero_cluster;
@@ -261,19 +263,22 @@ static ssize_t qcow_read_cluster(struct qcow *q, u64 offset, void *dst, u32 dst_
 	if (!clust_start)
 		goto zero_cluster;
 
+	mutex_unlock(&q->mutex);
+
 	if (pread_in_full(q->fd, dst, length, clust_start + clust_offset) < 0)
-		goto out_error;
+		return -1;
 
-out:
 	return length;
 
 zero_cluster:
+	mutex_unlock(&q->mutex);
 	memset(dst, 0, length);
-	goto out;
+	return length;
 
 out_error:
+	mutex_unlock(&q->mutex);
 	length = -1;
-	goto out;
+	return -1;
 }
 
 static ssize_t qcow_read_sector(struct disk_image *disk, u64 sector, void *dst, u32 dst_len)
@@ -379,20 +384,22 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 
 	l1t_idx		= get_l1_index(q, offset);
 	if (l1t_idx >= table->table_size)
-		goto error;
+		return -1;
 
 	l2t_idx		= get_l2_index(q, offset);
 	if (l2t_idx >= l2t_sz)
-		goto error;
+		return -1;
 
 	clust_off	= get_cluster_offset(q, offset);
 	if (clust_off >= clust_sz)
-		goto error;
+		return -1;
 
 	len		= clust_sz - clust_off;
 	if (len > src_len)
 		len = src_len;
 
+	mutex_lock(&q->mutex);
+
 	l2t_off		= table->l1_table[l1t_idx] & ~header->oflag_mask;
 	if (l2t_off) {
 		/* read and cache l2 table */
@@ -466,11 +473,14 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 		l2t->table[l2t_idx] = clust_start;
 	}
 
+	mutex_unlock(&q->mutex);
+
 	return len;
 
 free_cache:
 	free(l2t);
 error:
+	mutex_unlock(&q->mutex);
 	return -1;
 }
 
@@ -611,6 +621,7 @@ static struct disk_image *qcow2_probe(int fd, bool readonly)
 	if (!q)
 		goto error;
 
+	mutex_init(&q->mutex);
 	q->fd = fd;
 	q->root = RB_ROOT;
 	INIT_LIST_HEAD(&q->lru_list);
@@ -710,6 +721,7 @@ static struct disk_image *qcow1_probe(int fd, bool readonly)
 	if (!q)
 		goto error;
 
+	mutex_init(&q->mutex);
 	q->fd = fd;
 	q->root = RB_ROOT;
 	INIT_LIST_HEAD(&q->lru_list);
diff --git a/tools/kvm/include/kvm/mutex.h b/tools/kvm/include/kvm/mutex.h
index bd765c4..3286cea 100644
--- a/tools/kvm/include/kvm/mutex.h
+++ b/tools/kvm/include/kvm/mutex.h
@@ -12,6 +12,12 @@
 
 #define DEFINE_MUTEX(mutex) pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER
 
+static inline void mutex_init(pthread_mutex_t *mutex)
+{
+	if (pthread_mutex_init(mutex, NULL) != 0)
+		die("unexpected pthread_mutex_init() failure!");
+}
+
 static inline void mutex_lock(pthread_mutex_t *mutex)
 {
 	if (pthread_mutex_lock(mutex) != 0)
diff --git a/tools/kvm/include/kvm/qcow.h b/tools/kvm/include/kvm/qcow.h
index 12247e0..d44c64a 100644
--- a/tools/kvm/include/kvm/qcow.h
+++ b/tools/kvm/include/kvm/qcow.h
@@ -1,6 +1,8 @@
 #ifndef KVM__QCOW_H
 #define KVM__QCOW_H
 
+#include "kvm/mutex.h"
+
 #include <linux/types.h>
 #include <stdbool.h>
 #include <linux/rbtree.h>
@@ -34,6 +36,7 @@ struct qcow_table {
 };
 
 struct qcow {
+	pthread_mutex_t		mutex;
 	void			*header;
 	struct qcow_table	table;
 	int			fd;
-- 
1.7.0.4



* [PATCH 4/9] kvm tools, qcow: Introduce qcow_disk_flush()
  2011-07-09 13:02 [PATCH 0/9] kvm tools, qcow: Improve QCOW performance Pekka Enberg
                   ` (2 preceding siblings ...)
  2011-07-09 13:02 ` [PATCH 3/9] kvm tools, qcow: Fix locking issues Pekka Enberg
@ 2011-07-09 13:02 ` Pekka Enberg
  2011-07-09 13:02 ` [PATCH 5/9] kvm tools, qcow: Delayed L1 table writeout Pekka Enberg
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-07-09 13:02 UTC (permalink / raw)
  To: kvm
  Cc: Pekka Enberg, Asias He, Cyrill Gorcunov, Ingo Molnar,
	Prasad Joshi, Sasha Levin

This patch introduces a QCOW specific qcow_disk_flush() in preparation for
delaying QCOW metadata writeout until VIRTIO_BLK_T_FLUSH time.
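
For now the new callback is a plain fsync() of the image file; later patches
move the delayed L1/L2 writeout here. Condensed from the diff below:

  static int qcow_disk_flush(struct disk_image *disk)
  {
  	return fsync(disk->fd);
  }

  static struct disk_image_operations qcow_disk_ops = {
  	.read_sector	= qcow_read_sector,
  	.write_sector	= qcow_write_sector,
  	.flush		= qcow_disk_flush,	/* reached via disk_image__flush() */
  	.close		= qcow_disk_close,
  };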

Cc: Asias He <asias.hejun@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
---
 tools/kvm/disk/qcow.c |   22 ++++++++++------------
 1 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c
index 939bc61..1a6a969 100644
--- a/tools/kvm/disk/qcow.c
+++ b/tools/kvm/disk/qcow.c
@@ -244,7 +244,7 @@ static ssize_t qcow_read_cluster(struct qcow *q, u64 offset, void *dst, u32 dst_
 		length = dst_len;
 
 	mutex_lock(&q->mutex);
-	l2_table_offset = table->l1_table[l1_idx] & ~header->oflag_mask;
+	l2_table_offset = be64_to_cpu(table->l1_table[l1_idx]) & ~header->oflag_mask;
 	if (!l2_table_offset)
 		goto zero_cluster;
 
@@ -400,7 +400,7 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 
 	mutex_lock(&q->mutex);
 
-	l2t_off		= table->l1_table[l1t_idx] & ~header->oflag_mask;
+	l2t_off		= be64_to_cpu(table->l1_table[l1t_idx]) & ~header->oflag_mask;
 	if (l2t_off) {
 		/* read and cache l2 table */
 		l2t = qcow_read_l2_table(q, l2t_off);
@@ -440,7 +440,7 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 		}
 
 		/* Update the in-core entry */
-		table->l1_table[l1t_idx] = l2t_off;
+		table->l1_table[l1t_idx] = cpu_to_be64(l2t_off);
 	}
 
 	/* Capture the state of the consistent QCOW image */
@@ -520,6 +520,11 @@ static ssize_t qcow_nowrite_sector(struct disk_image *disk, u64 sector, void *sr
 	return -1;
 }
 
+static int qcow_disk_flush(struct disk_image *disk)
+{
+	return fsync(disk->fd);
+}
+
 static int qcow_disk_close(struct disk_image *disk)
 {
 	struct qcow *q;
@@ -546,6 +551,7 @@ static struct disk_image_operations qcow_disk_readonly_ops = {
 static struct disk_image_operations qcow_disk_ops = {
 	.read_sector		= qcow_read_sector,
 	.write_sector		= qcow_write_sector,
+	.flush			= qcow_disk_flush,
 	.close			= qcow_disk_close,
 };
 
@@ -553,7 +559,6 @@ static int qcow_read_l1_table(struct qcow *q)
 {
 	struct qcow_header *header = q->header;
 	struct qcow_table *table = &q->table;
-	u64 i;
 
 	table->table_size	= header->l1_size;
 
@@ -561,14 +566,7 @@ static int qcow_read_l1_table(struct qcow *q)
 	if (!table->l1_table)
 		return -1;
 
-	if (pread_in_full(q->fd, table->l1_table, sizeof(u64) *
-				table->table_size, header->l1_table_offset) < 0)
-		return -1;
-
-	for (i = 0; i < table->table_size; i++)
-		be64_to_cpus(&table->l1_table[i]);
-
-	return 0;
+	return pread_in_full(q->fd, table->l1_table, sizeof(u64) * table->table_size, header->l1_table_offset);
 }
 
 static void *qcow2_read_header(int fd)
-- 
1.7.0.4



* [PATCH 5/9] kvm tools, qcow: Delayed L1 table writeout
  2011-07-09 13:02 [PATCH 0/9] kvm tools, qcow: Improve QCOW performance Pekka Enberg
                   ` (3 preceding siblings ...)
  2011-07-09 13:02 ` [PATCH 4/9] kvm tools, qcow: Introduce qcow_disk_flush() Pekka Enberg
@ 2011-07-09 13:02 ` Pekka Enberg
  2011-07-09 13:02 ` [PATCH 6/9] kvm tools, qcow: Don't fdatasync() L2 " Pekka Enberg
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-07-09 13:02 UTC (permalink / raw)
  To: kvm
  Cc: Pekka Enberg, Asias He, Cyrill Gorcunov, Ingo Molnar,
	Prasad Joshi, Sasha Levin

This patch moves L1 table writeout to qcow_disk_flush(). The rationale is that
while writes to clusters that don't have an L2 table allocated on-disk are
volatile until VIRTIO_BLK_T_FLUSH is issued, we never corrupt the disk image.
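
Concretely, the on-disk L1 update moves out of the cluster write path and into
the flush path (condensed from the diff below):

  	if (fdatasync(disk->fd) < 0)
  		return -1;

  	if (pwrite_in_full(disk->fd, table->l1_table, table->table_size * sizeof(u64), header->l1_table_offset) < 0)
  		return -1;

  	return fsync(disk->fd);

so a new L2 table only becomes reachable from the on-disk L1 table after
everything it points to has been synced.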

Cc: Asias He <asias.hejun@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
---
 tools/kvm/disk/qcow.c |   24 +++++++++++++-----------
 1 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c
index 1a6a969..0fef92f 100644
--- a/tools/kvm/disk/qcow.c
+++ b/tools/kvm/disk/qcow.c
@@ -421,17 +421,6 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 		if (!l2t_off)
 			goto free_cache;
 
-		/* Metadata update: update on disk level 1 table */
-		t		= cpu_to_be64(l2t_off);
-
-		if (qcow_pwrite_sync(q->fd, &t, sizeof(t), header->l1_table_offset + l1t_idx * sizeof(u64)) < 0) {
-			/* restore file to consistent state */
-			if (ftruncate(q->fd, f_sz) < 0)
-				goto free_cache;
-
-			goto free_cache;
-		}
-
 		if (cache_table(q, l2t) < 0) {
 			if (ftruncate(q->fd, f_sz) < 0)
 				goto free_cache;
@@ -522,6 +511,19 @@ static ssize_t qcow_nowrite_sector(struct disk_image *disk, u64 sector, void *sr
 
 static int qcow_disk_flush(struct disk_image *disk)
 {
+	struct qcow *q = disk->priv;
+	struct qcow_header *header;
+	struct qcow_table *table;
+
+	if (fdatasync(disk->fd) < 0)
+		return -1;
+
+	header	= q->header;
+	table	= &q->table;
+
+	if (pwrite_in_full(disk->fd, table->l1_table, table->table_size * sizeof(u64), header->l1_table_offset) < 0)
+		return -1;
+
 	return fsync(disk->fd);
 }
 
-- 
1.7.0.4



* [PATCH 6/9] kvm tools, qcow: Don't fdatasync() L2 table writeout
  2011-07-09 13:02 [PATCH 0/9] kvm tools, qcow: Improve QCOW performance Pekka Enberg
                   ` (4 preceding siblings ...)
  2011-07-09 13:02 ` [PATCH 5/9] kvm tools, qcow: Delayed L1 table writeout Pekka Enberg
@ 2011-07-09 13:02 ` Pekka Enberg
  2011-07-09 13:02 ` [PATCH 7/9] kvm tools, qcow: Use big endian order for L2 table entries Pekka Enberg
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-07-09 13:02 UTC (permalink / raw)
  To: kvm
  Cc: Pekka Enberg, Asias He, Cyrill Gorcunov, Ingo Molnar,
	Prasad Joshi, Sasha Levin

There's no point in making sure new L2 tables actually hit the disk before we
write out data to clusters, because they are not visible on-disk until
qcow_disk_flush() is called, which writes out the L1 table.

Cc: Asias He <asias.hejun@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
---
 tools/kvm/disk/qcow.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c
index 0fef92f..35408ab 100644
--- a/tools/kvm/disk/qcow.c
+++ b/tools/kvm/disk/qcow.c
@@ -345,7 +345,7 @@ static u64 qcow_write_l2_table(struct qcow *q, u64 *table)
 	clust_sz	= 1 << header->cluster_bits;
 	off		= ALIGN(f_sz, clust_sz);
 
-	if (qcow_pwrite_sync(q->fd, table, sz * sizeof(u64), off) < 0)
+	if (pwrite_in_full(q->fd, table, sz * sizeof(u64), off) < 0)
 		return 0;
 
 	return off;
-- 
1.7.0.4



* [PATCH 7/9] kvm tools, qcow: Use big endian order for L2 table entries
  2011-07-09 13:02 [PATCH 0/9] kvm tools, qcow: Improve QCOW performance Pekka Enberg
                   ` (5 preceding siblings ...)
  2011-07-09 13:02 ` [PATCH 6/9] kvm tools, qcow: Don't fdatasync() L2 " Pekka Enberg
@ 2011-07-09 13:02 ` Pekka Enberg
  2011-07-09 13:02 ` [PATCH 8/9] kvm tools, qcow: Delayed L2 table writeout Pekka Enberg
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-07-09 13:02 UTC (permalink / raw)
  To: kvm
  Cc: Pekka Enberg, Asias He, Cyrill Gorcunov, Ingo Molnar,
	Prasad Joshi, Sasha Levin

Keep the in-memory L2 table entries in big endian (on-disk) byte order instead
of converting them to CPU byte order. This simplifies delayed L2 table writeout
because cached tables can be written back to disk verbatim.
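
The byte order conversion now happens only at the point of use; condensed from
the diff below:

  	/* reading an entry */
  	clust_start = be64_to_cpu(l2t->table[l2t_idx]) & ~header->oflag_mask;

  	/* updating an entry */
  	l2t->table[l2t_idx] = cpu_to_be64(clust_start);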

Cc: Asias He <asias.hejun@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
---
 tools/kvm/disk/qcow.c |   11 +++--------
 1 files changed, 3 insertions(+), 8 deletions(-)

diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c
index 35408ab..c851e7f 100644
--- a/tools/kvm/disk/qcow.c
+++ b/tools/kvm/disk/qcow.c
@@ -183,7 +183,6 @@ static struct qcow_l2_table *qcow_read_l2_table(struct qcow *q, u64 offset)
 	struct qcow_header *header = q->header;
 	struct qcow_l2_table *l2t;
 	u64 size;
-	u64 i;
 
 	size = 1 << header->l2_bits;
 
@@ -205,10 +204,6 @@ static struct qcow_l2_table *qcow_read_l2_table(struct qcow *q, u64 offset)
 	if (cache_table(q, l2t) < 0)
 		goto error;
 
-	/* change cached table to CPU's byte-order */
-	for (i = 0; i < size; i++)
-		be64_to_cpus(&l2t->table[i]);
-
 	return l2t;
 error:
 	free(l2t);
@@ -259,7 +254,7 @@ static ssize_t qcow_read_cluster(struct qcow *q, u64 offset, void *dst, u32 dst_
 	if (l2_idx >= l2_table_size)
 		goto out_error;
 
-	clust_start = l2_table->table[l2_idx] & ~header->oflag_mask;
+	clust_start = be64_to_cpu(l2_table->table[l2_idx]) & ~header->oflag_mask;
 	if (!clust_start)
 		goto zero_cluster;
 
@@ -437,7 +432,7 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 	if (!f_sz)
 		goto error;
 
-	clust_start	= l2t->table[l2t_idx] & ~header->oflag_mask;
+	clust_start	= be64_to_cpu(l2t->table[l2t_idx]) & ~header->oflag_mask;
 	if (!clust_start) {
 		clust_start	= ALIGN(f_sz, clust_sz);
 		update_meta	= true;
@@ -459,7 +454,7 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 		}
 
 		/* Update the cached level2 entry */
-		l2t->table[l2t_idx] = clust_start;
+		l2t->table[l2t_idx] = cpu_to_be64(clust_start);
 	}
 
 	mutex_unlock(&q->mutex);
-- 
1.7.0.4



* [PATCH 8/9] kvm tools, qcow: Delayed L2 table writeout
  2011-07-09 13:02 [PATCH 0/9] kvm tools, qcow: Improve QCOW performance Pekka Enberg
                   ` (6 preceding siblings ...)
  2011-07-09 13:02 ` [PATCH 7/9] kvm tools, qcow: Use big endian order for L2 table entries Pekka Enberg
@ 2011-07-09 13:02 ` Pekka Enberg
  2011-07-09 13:02 ` [PATCH 9/9] kvm tools, qcow: Flush only dirty L2 tables Pekka Enberg
  2011-07-10 17:15 ` [PATCH 0/9] kvm tools, qcow: Improve QCOW performance Ingo Molnar
  9 siblings, 0 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-07-09 13:02 UTC (permalink / raw)
  To: kvm
  Cc: Pekka Enberg, Asias He, Cyrill Gorcunov, Ingo Molnar,
	Prasad Joshi, Sasha Levin

This patch delays writeout for new L2 tables like we do for L1 tables. When an
L2 table gains newly allocated clusters, we update the in-memory L2 table but
don't write it to disk until the table is evicted from the LRU cache or
qcow_disk_flush() is called. This makes writes to new clusters volatile until
VIRTIO_BLK_T_FLUSH is issued, without corrupting the QCOW image on I/O error.
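
Concretely (condensed from the diff below), a newly allocated cluster is only
recorded in the cached table at write time:

  	clust_start		= ALIGN(f_sz, clust_sz);
  	l2t->table[l2t_idx]	= cpu_to_be64(clust_start);

and the table reaches the disk from qcow_l2_cache_write(), which is called when
the table is evicted from the LRU cache and from qcow_disk_flush().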

Cc: Asias He <asias.hejun@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
---
 tools/kvm/disk/qcow.c |   66 +++++++++++++++++++++++++++++--------------------
 1 files changed, 39 insertions(+), 27 deletions(-)

diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c
index c851e7f..b71762f 100644
--- a/tools/kvm/disk/qcow.c
+++ b/tools/kvm/disk/qcow.c
@@ -88,6 +88,16 @@ static void free_cache(struct qcow *q)
 	}
 }
 
+static int qcow_l2_cache_write(struct qcow *q, struct qcow_l2_table *c)
+{
+	struct qcow_header *header = q->header;
+	u64 size;
+
+	size = 1 << header->l2_bits;
+
+	return pwrite_in_full(q->fd, c->table, size * sizeof(u64), c->offset);
+}
+
 static int cache_table(struct qcow *q, struct qcow_l2_table *c)
 {
 	struct rb_root *r = &q->root;
@@ -100,6 +110,9 @@ static int cache_table(struct qcow *q, struct qcow_l2_table *c)
 		 */
 		lru = list_first_entry(&q->lru_list, struct qcow_l2_table, list);
 
+		if (qcow_l2_cache_write(q, lru) < 0)
+			goto error;
+
 		/* Remove the node from the cache */
 		rb_erase(&lru->node, r);
 		list_del_init(&lru->list);
@@ -361,7 +374,6 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 	struct qcow_header *header = q->header;
 	struct qcow_table  *table  = &q->table;
 	struct qcow_l2_table *l2t;
-	bool update_meta;
 	u64 clust_start;
 	u64 clust_off;
 	u64 clust_sz;
@@ -371,7 +383,6 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 	u64 l2t_sz;
 	u64 f_sz;
 	u64 len;
-	u64 t;
 
 	l2t		= NULL;
 	l2t_sz		= 1 << header->l2_bits;
@@ -434,31 +445,16 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 
 	clust_start	= be64_to_cpu(l2t->table[l2t_idx]) & ~header->oflag_mask;
 	if (!clust_start) {
-		clust_start	= ALIGN(f_sz, clust_sz);
-		update_meta	= true;
-	} else
-		update_meta	= false;
-
-	/* Write actual data */
-	if (pwrite_in_full(q->fd, buf, len, clust_start + clust_off) < 0)
-		goto error;
-
-	if (update_meta) {
-		t = cpu_to_be64(clust_start);
-		if (qcow_pwrite_sync(q->fd, &t, sizeof(t), l2t_off + l2t_idx * sizeof(u64)) < 0) {
-			/* Restore the file to consistent state */
-			if (ftruncate(q->fd, f_sz) < 0)
-				goto error;
-
-			goto error;
-		}
-
-		/* Update the cached level2 entry */
-		l2t->table[l2t_idx] = cpu_to_be64(clust_start);
+		clust_start		= ALIGN(f_sz, clust_sz);
+		l2t->table[l2t_idx]	= cpu_to_be64(clust_start);
 	}
 
 	mutex_unlock(&q->mutex);
 
+	/* Write actual data */
+	if (pwrite_in_full(q->fd, buf, len, clust_start + clust_off) < 0)
+		return -1;
+
 	return len;
 
 free_cache:
@@ -508,18 +504,34 @@ static int qcow_disk_flush(struct disk_image *disk)
 {
 	struct qcow *q = disk->priv;
 	struct qcow_header *header;
+	struct list_head *pos, *n;
 	struct qcow_table *table;
 
-	if (fdatasync(disk->fd) < 0)
-		return -1;
-
 	header	= q->header;
 	table	= &q->table;
 
+	mutex_lock(&q->mutex);
+
+	list_for_each_safe(pos, n, &q->lru_list) {
+		struct qcow_l2_table *c = list_entry(pos, struct qcow_l2_table, list);
+
+		if (qcow_l2_cache_write(q, c) < 0)
+			goto error_unlock;
+	}
+
+	if (fdatasync(disk->fd) < 0)
+		goto error_unlock;
+
 	if (pwrite_in_full(disk->fd, table->l1_table, table->table_size * sizeof(u64), header->l1_table_offset) < 0)
-		return -1;
+		goto error_unlock;
+
+	mutex_unlock(&q->mutex);
 
 	return fsync(disk->fd);
+
+error_unlock:
+	mutex_unlock(&q->mutex);
+	return -1;
 }
 
 static int qcow_disk_close(struct disk_image *disk)
-- 
1.7.0.4



* [PATCH 9/9] kvm tools, qcow: Flush only dirty L2 tables
  2011-07-09 13:02 [PATCH 0/9] kvm tools, qcow: Improve QCOW performance Pekka Enberg
                   ` (7 preceding siblings ...)
  2011-07-09 13:02 ` [PATCH 8/9] kvm tools, qcow: Delayed L2 table writeout Pekka Enberg
@ 2011-07-09 13:02 ` Pekka Enberg
  2011-07-10 17:15 ` [PATCH 0/9] kvm tools, qcow: Improve QCOW performance Ingo Molnar
  9 siblings, 0 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-07-09 13:02 UTC (permalink / raw)
  To: kvm
  Cc: Pekka Enberg, Asias He, Cyrill Gorcunov, Ingo Molnar,
	Prasad Joshi, Sasha Levin

This patch changes qcow_l2_cache_write() to write out only dirty L2 tables.

Cc: Asias He <asias.hejun@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Prasad Joshi <prasadjoshi124@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
---
 tools/kvm/disk/qcow.c        |   11 ++++++++++-
 tools/kvm/include/kvm/qcow.h |    1 +
 2 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/disk/qcow.c b/tools/kvm/disk/qcow.c
index b71762f..13c8bea 100644
--- a/tools/kvm/disk/qcow.c
+++ b/tools/kvm/disk/qcow.c
@@ -93,9 +93,17 @@ static int qcow_l2_cache_write(struct qcow *q, struct qcow_l2_table *c)
 	struct qcow_header *header = q->header;
 	u64 size;
 
+	if (!c->dirty)
+		return 0;
+
 	size = 1 << header->l2_bits;
 
-	return pwrite_in_full(q->fd, c->table, size * sizeof(u64), c->offset);
+	if (pwrite_in_full(q->fd, c->table, size * sizeof(u64), c->offset) < 0)
+		return -1;
+
+	c->dirty = 0;
+
+	return 0;
 }
 
 static int cache_table(struct qcow *q, struct qcow_l2_table *c)
@@ -447,6 +455,7 @@ static ssize_t qcow_write_cluster(struct qcow *q, u64 offset, void *buf, u32 src
 	if (!clust_start) {
 		clust_start		= ALIGN(f_sz, clust_sz);
 		l2t->table[l2t_idx]	= cpu_to_be64(clust_start);
+		l2t->dirty		= 1;
 	}
 
 	mutex_unlock(&q->mutex);
diff --git a/tools/kvm/include/kvm/qcow.h b/tools/kvm/include/kvm/qcow.h
index d44c64a..650d3c2 100644
--- a/tools/kvm/include/kvm/qcow.h
+++ b/tools/kvm/include/kvm/qcow.h
@@ -27,6 +27,7 @@ struct qcow_l2_table {
 	u64                     offset;
 	struct rb_node          node;
 	struct list_head        list;
+	u8			dirty;
 	u64                     table[];
 };
 
-- 
1.7.0.4



* Re: [PATCH 0/9] kvm tools, qcow: Improve QCOW performance
  2011-07-09 13:02 [PATCH 0/9] kvm tools, qcow: Improve QCOW performance Pekka Enberg
                   ` (8 preceding siblings ...)
  2011-07-09 13:02 ` [PATCH 9/9] kvm tools, qcow: Flush only dirty L2 tables Pekka Enberg
@ 2011-07-10 17:15 ` Ingo Molnar
  2011-07-10 18:08   ` Pekka Enberg
  9 siblings, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2011-07-10 17:15 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: kvm, Asias He, Cyrill Gorcunov, Prasad Joshi, Sasha Levin


* Pekka Enberg <penberg@kernel.org> wrote:

> This series fixes QCOW locking issues and implements delayed metadata writeout.
> This improves performance of writeout to QCOW2 images that don't have clusters
> and L2 tables allocated on-disk.
> 
> I tested the series by running 
> 
>   mount -t ext4 /dev/vdb /mnt
>   dd if=/dev/zero of=/mnt/tmp
> 
> in the guest multiple times for fresly generated QCOW2 image:
> 
>   dd if=/dev/zero of=fs.ext4 bs=1024k count=512 && mkfs.ext4 -F fs.ext4 && qemu-img convert -O qcow2 fs.ext4 fs.qcow2
> 
> which causes worst-case behavior for the current code.
> 
> Before:
> 
>   [ seekwatcher: http://userweb.kernel.org/~penberg/kvm-qcow-delayed/kvm-qcow2-master.png ]
> 
>   511229952 bytes (511 MB) copied, 19.906 s, 25.7 MB/s
>   511229952 bytes (511 MB) copied, 20.3168 s, 25.2 MB/s
>   511229952 bytes (511 MB) copied, 20.8078 s, 24.6 MB/s
>   511229952 bytes (511 MB) copied, 21.0889 s, 24.2 MB/s
>   511229952 bytes (511 MB) copied, 20.7833 s, 24.6 MB/s
>   511229952 bytes (511 MB) copied, 20.7536 s, 24.6 MB/s
>   511229952 bytes (511 MB) copied, 20.0312 s, 25.5 MB/s
> 
> After:
> 
>   [ seekwatcher: http://userweb.kernel.org/~penberg/kvm-qcow-delayed/kvm-qcow2-delayed.png ]
> 
>   511229952 bytes (511 MB) copied, 7.68312 s, 66.5 MB/s
>   511229952 bytes (511 MB) copied, 7.54065 s, 67.8 MB/s
>   511229952 bytes (511 MB) copied, 9.34749 s, 54.7 MB/s
>   511229952 bytes (511 MB) copied, 9.2421 s, 55.3 MB/s
>   511229952 bytes (511 MB) copied, 9.9364 s, 51.5 MB/s
>   511229952 bytes (511 MB) copied, 10.0337 s, 51.0 MB/s
>   511229952 bytes (511 MB) copied, 9.39502 s, 54.4 MB/s

Just wondering, how does Qemu perform on the same system using the 
same image, with comparable settings?

Thanks,

	Ingo


* Re: [PATCH 0/9] kvm tools, qcow: Improve QCOW performance
  2011-07-10 17:15 ` [PATCH 0/9] kvm tools, qcow: Improve QCOW performance Ingo Molnar
@ 2011-07-10 18:08   ` Pekka Enberg
  2011-07-10 18:17     ` Ingo Molnar
  2011-07-11  9:31     ` Kevin Wolf
  0 siblings, 2 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-07-10 18:08 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: kvm, Asias He, Cyrill Gorcunov, Prasad Joshi, Sasha Levin

Hi Ingo,

* Pekka Enberg <penberg@kernel.org> wrote:
>> This series fixes QCOW locking issues and implements delayed metadata writeout.
>> This improves performance of writeout to QCOW2 images that don't have clusters
>> and L2 tables allocated on-disk.
>>
>> I tested the series by running
>>
>>   mount -t ext4 /dev/vdb /mnt
>>   dd if=/dev/zero of=/mnt/tmp
>>
>> in the guest multiple times for fresly generated QCOW2 image:
>>
>>   dd if=/dev/zero of=fs.ext4 bs=1024k count=512 && mkfs.ext4 -F fs.ext4 && qemu-img convert -O qcow2 fs.ext4 fs.qcow2
>>
>> which causes worst-case behavior for the current code.
>>
>> Before:
>>
>>   [ seekwatcher: http://userweb.kernel.org/~penberg/kvm-qcow-delayed/kvm-qcow2-master.png ]
>>
>>   511229952 bytes (511 MB) copied, 19.906 s, 25.7 MB/s
>>   511229952 bytes (511 MB) copied, 20.3168 s, 25.2 MB/s
>>   511229952 bytes (511 MB) copied, 20.8078 s, 24.6 MB/s
>>   511229952 bytes (511 MB) copied, 21.0889 s, 24.2 MB/s
>>   511229952 bytes (511 MB) copied, 20.7833 s, 24.6 MB/s
>>   511229952 bytes (511 MB) copied, 20.7536 s, 24.6 MB/s
>>   511229952 bytes (511 MB) copied, 20.0312 s, 25.5 MB/s
>>
>> After:
>>
>>   [ seekwatcher: http://userweb.kernel.org/~penberg/kvm-qcow-delayed/kvm-qcow2-delayed.png ]
>>
>>   511229952 bytes (511 MB) copied, 7.68312 s, 66.5 MB/s
>>   511229952 bytes (511 MB) copied, 7.54065 s, 67.8 MB/s
>>   511229952 bytes (511 MB) copied, 9.34749 s, 54.7 MB/s
>>   511229952 bytes (511 MB) copied, 9.2421 s, 55.3 MB/s
>>   511229952 bytes (511 MB) copied, 9.9364 s, 51.5 MB/s
>>   511229952 bytes (511 MB) copied, 10.0337 s, 51.0 MB/s
>>   511229952 bytes (511 MB) copied, 9.39502 s, 54.4 MB/s

On Sun, Jul 10, 2011 at 8:15 PM, Ingo Molnar <mingo@elte.hu> wrote:
> Just wondering, how does Qemu perform on the same system using the
> same image, with comparable settings?

Freshly built from qemu-kvm.git:

$ /home/penberg/qemu-kvm/x86_64-softmmu/qemu-system-x86_64 --version
QEMU emulator version 0.14.50 (qemu-kvm-devel), Copyright (c)
2003-2008 Fabrice Bellard

Tests were run with this configuration:

$ /home/penberg/qemu-kvm/x86_64-softmmu/qemu-system-x86_64 -kernel
/boot/vmlinuz-3.0.0-rc5+ -drive
file=/home/penberg/images/debian_squeeze_amd64_standard.img,if=virtio,boot=on
-drive file=fs.qcow2,if=virtio -nographic -m 320 -smp 2 -append
"root=/dev/vda1 console=ttyS0 init=/root/iobench-write"

Not sure if those are 100% comparable settings but anyway, the results
look as follows:

  511229952 bytes (511 MB) copied, 12.5543 s, 40.7 MB/s
  511229952 bytes (511 MB) copied, 9.50382 s, 53.8 MB/s
  511229952 bytes (511 MB) copied, 12.1092 s, 42.2 MB/s
  511229952 bytes (511 MB) copied, 13.2981 s, 38.4 MB/s
  511229952 bytes (511 MB) copied, 11.3314 s, 45.1 MB/s
  511229952 bytes (511 MB) copied, 12.7505 s, 40.1 MB/s
  511229952 bytes (511 MB) copied, 11.2921 s, 45.3 MB/s

This is what I'd expect as tools/kvm has much more relaxed sync()
guarantees than qemu-kvm. We treat all writes to QCOW2 images as
volatile until VIRTIO_BLK_T_FLUSH is issued. Furthermore, for this
particular (special case) load, it's pretty much append-only to the
backing file, which is why QCOW is so close to raw image performance
here.

                        Pekka


* Re: [PATCH 0/9] kvm tools, qcow: Improve QCOW performance
  2011-07-10 18:08   ` Pekka Enberg
@ 2011-07-10 18:17     ` Ingo Molnar
  2011-07-10 18:38       ` Pekka Enberg
  2011-07-11  9:31     ` Kevin Wolf
  1 sibling, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2011-07-10 18:17 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: kvm, Asias He, Cyrill Gorcunov, Prasad Joshi, Sasha Levin


* Pekka Enberg <penberg@kernel.org> wrote:

> Hi Ingo,
> 
> * Pekka Enberg <penberg@kernel.org> wrote:
> >> This series fixes QCOW locking issues and implements delayed metadata writeout.
> >> This improves performance of writeout to QCOW2 images that don't have clusters
> >> and L2 tables allocated on-disk.
> >>
> >> I tested the series by running
> >>
> >>   mount -t ext4 /dev/vdb /mnt
> >>   dd if=/dev/zero of=/mnt/tmp
> >>
> >> in the guest multiple times for fresly generated QCOW2 image:
> >>
> >>   dd if=/dev/zero of=fs.ext4 bs=1024k count=512 && mkfs.ext4 -F fs.ext4 && qemu-img convert -O qcow2 fs.ext4 fs.qcow2
> >>
> >> which causes worst-case behavior for the current code.
> >>
> >> Before:
> >>
> >>   [ seekwatcher: http://userweb.kernel.org/~penberg/kvm-qcow-delayed/kvm-qcow2-master.png ]
> >>
> >>   511229952 bytes (511 MB) copied, 19.906 s, 25.7 MB/s
> >>   511229952 bytes (511 MB) copied, 20.3168 s, 25.2 MB/s
> >>   511229952 bytes (511 MB) copied, 20.8078 s, 24.6 MB/s
> >>   511229952 bytes (511 MB) copied, 21.0889 s, 24.2 MB/s
> >>   511229952 bytes (511 MB) copied, 20.7833 s, 24.6 MB/s
> >>   511229952 bytes (511 MB) copied, 20.7536 s, 24.6 MB/s
> >>   511229952 bytes (511 MB) copied, 20.0312 s, 25.5 MB/s
> >>
> >> After:
> >>
> >>   [ seekwatcher: http://userweb.kernel.org/~penberg/kvm-qcow-delayed/kvm-qcow2-delayed.png ]
> >>
> >>   511229952 bytes (511 MB) copied, 7.68312 s, 66.5 MB/s
> >>   511229952 bytes (511 MB) copied, 7.54065 s, 67.8 MB/s
> >>   511229952 bytes (511 MB) copied, 9.34749 s, 54.7 MB/s
> >>   511229952 bytes (511 MB) copied, 9.2421 s, 55.3 MB/s
> >>   511229952 bytes (511 MB) copied, 9.9364 s, 51.5 MB/s
> >>   511229952 bytes (511 MB) copied, 10.0337 s, 51.0 MB/s
> >>   511229952 bytes (511 MB) copied, 9.39502 s, 54.4 MB/s
> 
> On Sun, Jul 10, 2011 at 8:15 PM, Ingo Molnar <mingo@elte.hu> wrote:
> > Just wondering, how does Qemu perform on the same system using the
> > same image, with comparable settings?
> 
> Freshly built from qemu-kvm.git:
> 
> $ /home/penberg/qemu-kvm/x86_64-softmmu/qemu-system-x86_64 --version
> QEMU emulator version 0.14.50 (qemu-kvm-devel), Copyright (c)
> 2003-2008 Fabrice Bellard
> 
> Tests were run with this configuration:
> 
> $ /home/penberg/qemu-kvm/x86_64-softmmu/qemu-system-x86_64 -kernel
> /boot/vmlinuz-3.0.0-rc5+ -drive
> file=/home/penberg/images/debian_squeeze_amd64_standard.img,if=virtio,boot=on
> -drive file=fs.qcow2,if=virtio -nographic -m 320 -smp 2 -append
> "root=/dev/vda1 console=ttyS0 init=/root/iobench-write"
> 
> Not sure if that's 100% comparable settings but anyway. The results
> looks as follows:
> 
>   511229952 bytes (511 MB) copied, 12.5543 s, 40.7 MB/s
>   511229952 bytes (511 MB) copied, 9.50382 s, 53.8 MB/s
>   511229952 bytes (511 MB) copied, 12.1092 s, 42.2 MB/s
>   511229952 bytes (511 MB) copied, 13.2981 s, 38.4 MB/s
>   511229952 bytes (511 MB) copied, 11.3314 s, 45.1 MB/s
>   511229952 bytes (511 MB) copied, 12.7505 s, 40.1 MB/s
>   511229952 bytes (511 MB) copied, 11.2921 s, 45.3 MB/s
> 
> This is what I'd expect as tools/kvm has much more relaxed sync()
> guarantees than qemu-kvm. We treat all writes to QCOW2 images as
> volatile until VIRTIO_BLK_T_FLUSH is issued. Furthemore, for this
> particular (special case) load, it's pretty much append-only to the
> backing file which is why QCOW is so close to raw image performance
> here.

Pretty impressive numbers!

To relax Qemu's caching guarantees you can append ,cache=writeback to 
your -drive option, i.e. something like:

  -drive file=/dev/shm/test.qcow2,cache=writeback,if=virtio

Does that improve the Qemu results?

Thanks,

	Ingo


* Re: [PATCH 0/9] kvm tools, qcow: Improve QCOW performance
  2011-07-10 18:17     ` Ingo Molnar
@ 2011-07-10 18:38       ` Pekka Enberg
  0 siblings, 0 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-07-10 18:38 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: kvm, Asias He, Cyrill Gorcunov, Prasad Joshi, Sasha Levin

On Sun, Jul 10, 2011 at 9:17 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Pekka Enberg <penberg@kernel.org> wrote:
>
>> Hi Ingo,
>>
>> * Pekka Enberg <penberg@kernel.org> wrote:
>> >> This series fixes QCOW locking issues and implements delayed metadata writeout.
>> >> This improves performance of writeout to QCOW2 images that don't have clusters
>> >> and L2 tables allocated on-disk.
>> >>
>> >> I tested the series by running
>> >>
>> >>   mount -t ext4 /dev/vdb /mnt
>> >>   dd if=/dev/zero of=/mnt/tmp
>> >>
>> >> in the guest multiple times for fresly generated QCOW2 image:
>> >>
>> >>   dd if=/dev/zero of=fs.ext4 bs=1024k count=512 && mkfs.ext4 -F fs.ext4 && qemu-img convert -O qcow2 fs.ext4 fs.qcow2
>> >>
>> >> which causes worst-case behavior for the current code.
>> >>
>> >> Before:
>> >>
>> >>   [ seekwatcher: http://userweb.kernel.org/~penberg/kvm-qcow-delayed/kvm-qcow2-master.png ]
>> >>
>> >>   511229952 bytes (511 MB) copied, 19.906 s, 25.7 MB/s
>> >>   511229952 bytes (511 MB) copied, 20.3168 s, 25.2 MB/s
>> >>   511229952 bytes (511 MB) copied, 20.8078 s, 24.6 MB/s
>> >>   511229952 bytes (511 MB) copied, 21.0889 s, 24.2 MB/s
>> >>   511229952 bytes (511 MB) copied, 20.7833 s, 24.6 MB/s
>> >>   511229952 bytes (511 MB) copied, 20.7536 s, 24.6 MB/s
>> >>   511229952 bytes (511 MB) copied, 20.0312 s, 25.5 MB/s
>> >>
>> >> After:
>> >>
>> >>   [ seekwatcher: http://userweb.kernel.org/~penberg/kvm-qcow-delayed/kvm-qcow2-delayed.png ]
>> >>
>> >>   511229952 bytes (511 MB) copied, 7.68312 s, 66.5 MB/s
>> >>   511229952 bytes (511 MB) copied, 7.54065 s, 67.8 MB/s
>> >>   511229952 bytes (511 MB) copied, 9.34749 s, 54.7 MB/s
>> >>   511229952 bytes (511 MB) copied, 9.2421 s, 55.3 MB/s
>> >>   511229952 bytes (511 MB) copied, 9.9364 s, 51.5 MB/s
>> >>   511229952 bytes (511 MB) copied, 10.0337 s, 51.0 MB/s
>> >>   511229952 bytes (511 MB) copied, 9.39502 s, 54.4 MB/s
>>
>> On Sun, Jul 10, 2011 at 8:15 PM, Ingo Molnar <mingo@elte.hu> wrote:
>> > Just wondering, how does Qemu perform on the same system using the
>> > same image, with comparable settings?
>>
>> Freshly built from qemu-kvm.git:
>>
>> $ /home/penberg/qemu-kvm/x86_64-softmmu/qemu-system-x86_64 --version
>> QEMU emulator version 0.14.50 (qemu-kvm-devel), Copyright (c)
>> 2003-2008 Fabrice Bellard
>>
>> Tests were run with this configuration:
>>
>> $ /home/penberg/qemu-kvm/x86_64-softmmu/qemu-system-x86_64 -kernel
>> /boot/vmlinuz-3.0.0-rc5+ -drive
>> file=/home/penberg/images/debian_squeeze_amd64_standard.img,if=virtio,boot=on
>> -drive file=fs.qcow2,if=virtio -nographic -m 320 -smp 2 -append
>> "root=/dev/vda1 console=ttyS0 init=/root/iobench-write"
>>
>> Not sure if that's 100% comparable settings but anyway. The results
>> looks as follows:
>>
>>   511229952 bytes (511 MB) copied, 12.5543 s, 40.7 MB/s
>>   511229952 bytes (511 MB) copied, 9.50382 s, 53.8 MB/s
>>   511229952 bytes (511 MB) copied, 12.1092 s, 42.2 MB/s
>>   511229952 bytes (511 MB) copied, 13.2981 s, 38.4 MB/s
>>   511229952 bytes (511 MB) copied, 11.3314 s, 45.1 MB/s
>>   511229952 bytes (511 MB) copied, 12.7505 s, 40.1 MB/s
>>   511229952 bytes (511 MB) copied, 11.2921 s, 45.3 MB/s
>>
>> This is what I'd expect as tools/kvm has much more relaxed sync()
>> guarantees than qemu-kvm. We treat all writes to QCOW2 images as
>> volatile until VIRTIO_BLK_T_FLUSH is issued. Furthemore, for this
>> particular (special case) load, it's pretty much append-only to the
>> backing file which is why QCOW is so close to raw image performance
>> here.
>
> Pretty impressive numbers!
>
> To relax Qemu's caching guarantees you can append ,cache=writeback to
> your -drive option, i.e. something like:
>
>  -drive file=/dev/shm/test.qcow2,cache=writeback,if=virtio
>
> Does that improve the Qemu results?

Yes, it seems so:

  511229952 bytes (511 MB) copied, 10.0879 s, 50.7 MB/s
  511229952 bytes (511 MB) copied, 4.92686 s, 104 MB/s
  511229952 bytes (511 MB) copied, 13.1955 s, 38.7 MB/s
  511229952 bytes (511 MB) copied, 10.7322 s, 47.6 MB/s
  511229952 bytes (511 MB) copied, 9.46115 s, 54.0 MB/s
  511229952 bytes (511 MB) copied, 14.9963 s, 34.1 MB/s
  511229952 bytes (511 MB) copied, 11.1701 s, 45.8 MB/s

The numbers seem much more unstable from run to run with 'writeback'
so it's pretty difficult to say how much it helps. I'm doing
'drop_caches' after image creation so I don't quite understand why
they are so unstable.

                                Pekka


* Re: [PATCH 0/9] kvm tools, qcow: Improve QCOW performance
  2011-07-10 18:08   ` Pekka Enberg
  2011-07-10 18:17     ` Ingo Molnar
@ 2011-07-11  9:31     ` Kevin Wolf
  2011-07-11  9:41       ` Pekka Enberg
  1 sibling, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2011-07-11  9:31 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Ingo Molnar, kvm, Asias He, Cyrill Gorcunov, Prasad Joshi, Sasha Levin

Am 10.07.2011 20:08, schrieb Pekka Enberg:
> Hi Ingo,
> 
> * Pekka Enberg <penberg@kernel.org> wrote:
>>> This series fixes QCOW locking issues and implements delayed metadata writeout.
>>> This improves performance of writeout to QCOW2 images that don't have clusters
>>> and L2 tables allocated on-disk.
>>>
>>> I tested the series by running
>>>
>>>   mount -t ext4 /dev/vdb /mnt
>>>   dd if=/dev/zero of=/mnt/tmp
>>>
>>> in the guest multiple times for fresly generated QCOW2 image:
>>>
>>>   dd if=/dev/zero of=fs.ext4 bs=1024k count=512 && mkfs.ext4 -F fs.ext4 && qemu-img convert -O qcow2 fs.ext4 fs.qcow2
>>>
>>> which causes worst-case behavior for the current code.
>>>
>>> Before:
>>>
>>>   [ seekwatcher: http://userweb.kernel.org/~penberg/kvm-qcow-delayed/kvm-qcow2-master.png ]
>>>
>>>   511229952 bytes (511 MB) copied, 19.906 s, 25.7 MB/s
>>>   511229952 bytes (511 MB) copied, 20.3168 s, 25.2 MB/s
>>>   511229952 bytes (511 MB) copied, 20.8078 s, 24.6 MB/s
>>>   511229952 bytes (511 MB) copied, 21.0889 s, 24.2 MB/s
>>>   511229952 bytes (511 MB) copied, 20.7833 s, 24.6 MB/s
>>>   511229952 bytes (511 MB) copied, 20.7536 s, 24.6 MB/s
>>>   511229952 bytes (511 MB) copied, 20.0312 s, 25.5 MB/s
>>>
>>> After:
>>>
>>>   [ seekwatcher: http://userweb.kernel.org/~penberg/kvm-qcow-delayed/kvm-qcow2-delayed.png ]
>>>
>>>   511229952 bytes (511 MB) copied, 7.68312 s, 66.5 MB/s
>>>   511229952 bytes (511 MB) copied, 7.54065 s, 67.8 MB/s
>>>   511229952 bytes (511 MB) copied, 9.34749 s, 54.7 MB/s
>>>   511229952 bytes (511 MB) copied, 9.2421 s, 55.3 MB/s
>>>   511229952 bytes (511 MB) copied, 9.9364 s, 51.5 MB/s
>>>   511229952 bytes (511 MB) copied, 10.0337 s, 51.0 MB/s
>>>   511229952 bytes (511 MB) copied, 9.39502 s, 54.4 MB/s
> 
> On Sun, Jul 10, 2011 at 8:15 PM, Ingo Molnar <mingo@elte.hu> wrote:
>> Just wondering, how does Qemu perform on the same system using the
>> same image, with comparable settings?
> 
> Freshly built from qemu-kvm.git:
> 
> $ /home/penberg/qemu-kvm/x86_64-softmmu/qemu-system-x86_64 --version
> QEMU emulator version 0.14.50 (qemu-kvm-devel), Copyright (c)
> 2003-2008 Fabrice Bellard
> 
> Tests were run with this configuration:
> 
> $ /home/penberg/qemu-kvm/x86_64-softmmu/qemu-system-x86_64 -kernel
> /boot/vmlinuz-3.0.0-rc5+ -drive
> file=/home/penberg/images/debian_squeeze_amd64_standard.img,if=virtio,boot=on
> -drive file=fs.qcow2,if=virtio -nographic -m 320 -smp 2 -append
> "root=/dev/vda1 console=ttyS0 init=/root/iobench-write"
> 
> Not sure if that's 100% comparable settings but anyway. The results
> looks as follows:

I would love to try out your code occasionally myself, but so far I have
been too lazy to build a guest kernel only to be able to test it. Having
to deal with the huge kernel git tree just for a small program doesn't
really make it more fun either... Anyway, what I'm trying to say is that
everything in my mails is from a purely theoretical POV. I have only
looked at the code, but never really tried it.

As Ingo already said, the cache mode is probably the major difference.
From what I can see in your code, cache=writeback would be the
equivalent of what tools/kvm is doing; however, cache=none (i.e.
O_DIRECT) is what people usually use with qemu.
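
For reference, a minimal sketch of what the two cache modes boil down to
on the host side; this is a generic illustration, not code from either
project, and the file name is just an example:

  #define _GNU_SOURCE             /* for O_DIRECT */
  #include <fcntl.h>

  int main(void)
  {
          /* cache=writeback equivalent: writes land in the host page cache */
          int fd_writeback = open("fs.qcow2", O_RDWR);

          /*
           * cache=none equivalent: O_DIRECT bypasses the host page cache,
           * so buffers, offsets and lengths must be block-aligned.
           */
          int fd_direct = open("fs.qcow2", O_RDWR | O_DIRECT);

          return (fd_writeback < 0 || fd_direct < 0);
  }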

And then there seems to be another big difference. I hope I'm not
missing anything, but you seem to be completely lacking refcount
handling for qcow2. This is okay for read-only images, but with write
access to the image, you're corrupting it if you don't update the
refcounts. Have you checked qcow2 images with qemu-img check after
tools/kvm has written to them?

Maintaining the right order between L2 writes and refcount block writes
is another source of flushes in qemu, which of course makes a difference
for performance.
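
As a sketch of that ordering constraint (illustrative only; the offsets,
the big-endian values and the helper name are made up, not the real qemu
or tools/kvm code, and error handling is kept minimal):

  #include <stdint.h>
  #include <unistd.h>

  /*
   * The refcount update for a newly allocated cluster must be stable on
   * disk before any L2 entry is allowed to point at that cluster, hence
   * the flush in the middle.
   */
  static int alloc_cluster_ordered(int fd, off_t refblock_off,
                                   uint16_t refcount_be,
                                   off_t l2_entry_off, uint64_t l2_entry_be)
  {
          /* 1. Mark the cluster as in use in its refcount block. */
          if (pwrite(fd, &refcount_be, sizeof(refcount_be), refblock_off) < 0)
                  return -1;

          /* 2. Flush so the refcount hits the disk first... */
          if (fdatasync(fd) < 0)
                  return -1;

          /* 3. ...and only then make the L2 entry reference the cluster. */
          if (pwrite(fd, &l2_entry_be, sizeof(l2_entry_be), l2_entry_off) < 0)
                  return -1;

          return 0;
  }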

Kevin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] kvm tools, qcow: Improve QCOW performance
  2011-07-11  9:31     ` Kevin Wolf
@ 2011-07-11  9:41       ` Pekka Enberg
  2011-07-11 10:29         ` Kevin Wolf
  2011-07-11 10:36         ` Ingo Molnar
  0 siblings, 2 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-07-11  9:41 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Ingo Molnar, kvm, Asias He, Cyrill Gorcunov, Prasad Joshi, Sasha Levin

Hi Kevin,

On Mon, Jul 11, 2011 at 12:31 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> I would love to try out your code occasionally myself, but so far I have
> been to lazy to build a guest kernel only to be able to test it. Having
> to deal with the huge kernel git tree just for a small program doesn't
> really make it more fun either... Anyway, what I'm trying to say is that
> everything in my mails is from a purely theoretical POV. I have only
> looked at the code, but never really tried it.

Most distro kernels boot just fine, AFAIK. If you have a kernel tree
lying around, you can use

  git remote add kvm-tool git://github.com/penberg/linux-kvm.git
  git remote update kvm-tool

to fetch the sources.

> As Ingo already said, the cache mode is probably the major difference.
> From what I can see in your code, cache=writeback would be the
> equivalent for what tools/kvm is doing, however cache=none (i.e.
> O_DIRECT) is what people usually do with qemu.

Yup, I posted 'cache=writeback' results too which are much closer to
tools/kvm numbers.

> And then there seems to be another big difference. I hope I'm not
> missing anything, but you seem to be completely lacking refcount
> handling for qcow2. This is okay for read-only image, but with write
> access to the image, you're corrupting the images if you don't update
> the refcounts. Have you checked qcow2 images with qemu-img check after
> tools/kvm having written to it?
>
> Maintaining the right order between L2 writes and refcount block writes
> is another source of flushes in qemu, which of course makes a difference
> for performance.

Yes, you're absolutely correct. We don't support copy-on-write images
and I didn't realize until yesterday evening that we don't even check
the 'copied' bit to make sure writes are safe.

However, for these particular numbers, it doesn't matter that much
because it's all append-only and thus shouldn't trigger any of the
copy-on-write paths.
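
For readers following along: the 'copied' flag is the top bit of a qcow2
L2 entry; when it is set, the cluster's refcount is known to be 1 and the
cluster may be rewritten in place. Roughly, with the constant and function
names made up for the sketch rather than taken from either code base:

  #include <stdbool.h>
  #include <stdint.h>

  #define QCOW2_OFLAG_COPIED      (1ULL << 63)  /* refcount == 1, in-place write is safe */
  #define QCOW2_OFLAG_COMPRESSED  (1ULL << 62)

  /* l2_entry must already have been converted from big endian to host order */
  static bool cluster_writable_in_place(uint64_t l2_entry)
  {
          if (l2_entry & QCOW2_OFLAG_COMPRESSED)
                  return false;   /* compressed clusters always need a new cluster */

          return (l2_entry & QCOW2_OFLAG_COPIED) != 0;
  }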

                                Pekka

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] kvm tools, qcow: Improve QCOW performance
  2011-07-11  9:41       ` Pekka Enberg
@ 2011-07-11 10:29         ` Kevin Wolf
  2011-07-11 10:32           ` Pekka Enberg
  2011-07-11 10:36         ` Ingo Molnar
  1 sibling, 1 reply; 20+ messages in thread
From: Kevin Wolf @ 2011-07-11 10:29 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Ingo Molnar, kvm, Asias He, Cyrill Gorcunov, Prasad Joshi, Sasha Levin

On 11.07.2011 11:41, Pekka Enberg wrote:
> Hi Kevin,
> 
> On Mon, Jul 11, 2011 at 12:31 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>> I would love to try out your code occasionally myself, but so far I have
>> been to lazy to build a guest kernel only to be able to test it. Having
>> to deal with the huge kernel git tree just for a small program doesn't
>> really make it more fun either... Anyway, what I'm trying to say is that
>> everything in my mails is from a purely theoretical POV. I have only
>> looked at the code, but never really tried it.
> 
> Most distro kernels boot just fine, AFAIK. If you have a kernel tree
> laying around, you can use
> 
>   git remote add kvm-tool git://github.com/penberg/linux-kvm.git
>   git remote update kvm-tool
> 
> to fetch the sources.

Yeah, I do have the source and I read some parts of it. Just running it
didn't seem to work with the standard Fedora kernel last time. Seems to
work now, so it was probably my fault.

Not sure what I did differently last time; maybe I relied on it to pick up
the kernel and initrd automatically from the host (it finds the kernel, but
not the initrd).

>> As Ingo already said, the cache mode is probably the major difference.
>> From what I can see in your code, cache=writeback would be the
>> equivalent for what tools/kvm is doing, however cache=none (i.e.
>> O_DIRECT) is what people usually do with qemu.
> 
> Yup, I posted 'cache=writeback' results too which are much closer to
> tools/kvm numbers.

Saw it. cache=none would probably help with the stability, but of course
you would also have to add O_DIRECT to tools/kvm to make it fair.

>> And then there seems to be another big difference. I hope I'm not
>> missing anything, but you seem to be completely lacking refcount
>> handling for qcow2. This is okay for read-only image, but with write
>> access to the image, you're corrupting the images if you don't update
>> the refcounts. Have you checked qcow2 images with qemu-img check after
>> tools/kvm having written to it?
>>
>> Maintaining the right order between L2 writes and refcount block writes
>> is another source of flushes in qemu, which of course makes a difference
>> for performance.
> 
> Yes, you're absolutely correct. We don't support copy-on-write images
> and I didn't realize until yesterday evening that we don't even check
> the 'copied' bit to make sure writes are safe.
> 
> However, for these particular numbers, it doesn't matter that much
> because it's all append-only and thus shouldn't trigger any of the
> copy-on-write paths.

It has nothing to do with copy on write. Well, of course COW is the
reason why the refcounts exist at all, but for a correct qcow2 image
they must be consistent even when you don't do COW.

The problem is that when you run an image in which tools/kvm has
allocated new clusters under qemu, qemu will consult the refcount table
and still see those clusters as free. So you'll end up with two guest disk
clusters mapped to the same cluster in the image file, and that obviously
means you'll get data corruption.

qemu-img check would tell you about such inconsistencies.
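
To make the failure mode concrete, a simplified sketch of how allocation
from the refcount table goes wrong when refcounts are never written
(in-memory array and made-up naming, not the actual qemu allocator):

  #include <stddef.h>
  #include <stdint.h>

  /*
   * Simplified free-cluster search. If another writer used cluster i but
   * left refcounts[i] at 0, this hands out i a second time, so two L2
   * entries end up pointing at the same cluster in the image file.
   */
  static int64_t find_free_cluster(const uint16_t *refcounts, size_t nb_clusters)
  {
          for (size_t i = 0; i < nb_clusters; i++) {
                  if (refcounts[i] == 0)
                          return (int64_t)i;
          }

          return -1;      /* no free cluster, the image file has to grow */
  }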

Kevin

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] kvm tools, qcow: Improve QCOW performance
  2011-07-11 10:29         ` Kevin Wolf
@ 2011-07-11 10:32           ` Pekka Enberg
  0 siblings, 0 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-07-11 10:32 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Ingo Molnar, kvm, Asias He, Cyrill Gorcunov, Prasad Joshi, Sasha Levin

On Mon, Jul 11, 2011 at 1:29 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 11.07.2011 11:41, Pekka Enberg wrote:
>> Hi Kevin,
>>
>> On Mon, Jul 11, 2011 at 12:31 PM, Kevin Wolf <kwolf@redhat.com> wrote:
>>> I would love to try out your code occasionally myself, but so far I have
>>> been to lazy to build a guest kernel only to be able to test it. Having
>>> to deal with the huge kernel git tree just for a small program doesn't
>>> really make it more fun either... Anyway, what I'm trying to say is that
>>> everything in my mails is from a purely theoretical POV. I have only
>>> looked at the code, but never really tried it.
>>
>> Most distro kernels boot just fine, AFAIK. If you have a kernel tree
>> laying around, you can use
>>
>>   git remote add kvm-tool git://github.com/penberg/linux-kvm.git
>>   git remote update kvm-tool
>>
>> to fetch the sources.
>
> Yeah, I do have the source and I read some parts of it. Just running it
> didn't seem to work with the standard Fedora kernel last time. Seems to
> work now, so it was probably my fault.
>
> Not sure what I did different last time, maybe I relied on it to pick up
> kernel and initrd automatically from the host (it finds the kernel, but
> not the initrd).

Yeah, we should really add automatic initrd detection too.
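
Something along these lines might already cover the common case; the path
patterns and the function are just a sketch, not existing tools/kvm code:

  #include <stdio.h>
  #include <unistd.h>
  #include <sys/utsname.h>

  /* Probe the usual distro locations for the running host kernel's initrd. */
  static int detect_initrd(char *path, size_t len)
  {
          static const char *patterns[] = {
                  "/boot/initrd.img-%s",          /* Debian/Ubuntu style */
                  "/boot/initramfs-%s.img",       /* Fedora/RHEL style */
          };
          struct utsname uts;
          size_t i;

          if (uname(&uts) < 0)
                  return -1;

          for (i = 0; i < sizeof(patterns) / sizeof(patterns[0]); i++) {
                  snprintf(path, len, patterns[i], uts.release);
                  if (access(path, R_OK) == 0)
                          return 0;
          }

          return -1;
  }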

>>> As Ingo already said, the cache mode is probably the major difference.
>>> From what I can see in your code, cache=writeback would be the
>>> equivalent for what tools/kvm is doing, however cache=none (i.e.
>>> O_DIRECT) is what people usually do with qemu.
>>
>> Yup, I posted 'cache=writeback' results too which are much closer to
>> tools/kvm numbers.
>
> Saw it. cache=none would probably help with the stability, but of course
> you would also have to add O_DIRECT to tools/kvm to make it fair.
>
>>> And then there seems to be another big difference. I hope I'm not
>>> missing anything, but you seem to be completely lacking refcount
>>> handling for qcow2. This is okay for read-only image, but with write
>>> access to the image, you're corrupting the images if you don't update
>>> the refcounts. Have you checked qcow2 images with qemu-img check after
>>> tools/kvm having written to it?
>>>
>>> Maintaining the right order between L2 writes and refcount block writes
>>> is another source of flushes in qemu, which of course makes a difference
>>> for performance.
>>
>> Yes, you're absolutely correct. We don't support copy-on-write images
>> and I didn't realize until yesterday evening that we don't even check
>> the 'copied' bit to make sure writes are safe.
>>
>> However, for these particular numbers, it doesn't matter that much
>> because it's all append-only and thus shouldn't trigger any of the
>> copy-on-write paths.
>
> It has nothing to do with copy on write. Well, of course COW is the
> reason why the refcounts exist at all, but for a correct qcow2 image
> they must be consistent even when you don't do COW.
>
> The problem is that when you run an image, in which tools/kvm has
> allocated new clusters, in qemu, it will use the refcount table and
> still see the clusters as free. So you'll end up with two guest disk
> clusters mapped to the same cluster in the image file and that obviously
> means that you'll get data corruption.
>
> qemu-img check would tell you about such inconsistencies.

Aah, OK, we need to fix that. Thanks!

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] kvm tools, qcow: Improve QCOW performance
  2011-07-11  9:41       ` Pekka Enberg
  2011-07-11 10:29         ` Kevin Wolf
@ 2011-07-11 10:36         ` Ingo Molnar
  2011-07-11 10:44           ` Pekka Enberg
  1 sibling, 1 reply; 20+ messages in thread
From: Ingo Molnar @ 2011-07-11 10:36 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Kevin Wolf, kvm, Asias He, Cyrill Gorcunov, Prasad Joshi, Sasha Levin


* Pekka Enberg <penberg@kernel.org> wrote:

> > As Ingo already said, the cache mode is probably the major 
> > difference. From what I can see in your code, cache=writeback 
> > would be the equivalent for what tools/kvm is doing, however 
> > cache=none (i.e. O_DIRECT) is what people usually do with qemu.
> 
> Yup, I posted 'cache=writeback' results too which are much closer 
> to tools/kvm numbers.

tools/kvm/ seems to be about 20% faster on average:

  511229952 bytes (511 MB) copied, 7.68312 s, 66.5 MB/s
  511229952 bytes (511 MB) copied, 7.54065 s, 67.8 MB/s
  511229952 bytes (511 MB) copied, 9.34749 s, 54.7 MB/s
  511229952 bytes (511 MB) copied, 9.2421 s, 55.3 MB/s
  511229952 bytes (511 MB) copied, 9.9364 s, 51.5 MB/s
  511229952 bytes (511 MB) copied, 10.0337 s, 51.0 MB/s
  511229952 bytes (511 MB) copied, 9.39502 s, 54.4 MB/s

versus the qemu numbers:

  511229952 bytes (511 MB) copied, 10.0879 s, 50.7 MB/s
  511229952 bytes (511 MB) copied, 4.92686 s, 104 MB/s
  511229952 bytes (511 MB) copied, 13.1955 s, 38.7 MB/s
  511229952 bytes (511 MB) copied, 10.7322 s, 47.6 MB/s
  511229952 bytes (511 MB) copied, 9.46115 s, 54.0 MB/s
  511229952 bytes (511 MB) copied, 14.9963 s, 34.1 MB/s
  511229952 bytes (511 MB) copied, 11.1701 s, 45.8 MB/s

but indeed there's (much) more variability in the Qemu numbers, 
suggesting some cache artifact.
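
(Back-of-the-envelope from the run times quoted above: tools/kvm averages
(7.68+7.54+9.35+9.24+9.94+10.03+9.40)/7 ≈ 9.0 s per 511 MB, while the qemu
runs average (10.09+4.93+13.20+10.73+9.46+15.00+11.17)/7 ≈ 10.7 s, i.e.
qemu takes roughly 18% longer per run on average.)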

Are all of these measurements done via /dev/shm, to stabilize the 
numbers and to remove disk IO delay artifacts?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 0/9] kvm tools, qcow: Improve QCOW performance
  2011-07-11 10:36         ` Ingo Molnar
@ 2011-07-11 10:44           ` Pekka Enberg
  0 siblings, 0 replies; 20+ messages in thread
From: Pekka Enberg @ 2011-07-11 10:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kevin Wolf, kvm, Asias He, Cyrill Gorcunov, Prasad Joshi, Sasha Levin

On Mon, Jul 11, 2011 at 1:36 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Pekka Enberg <penberg@kernel.org> wrote:
>
>> > As Ingo already said, the cache mode is probably the major
>> > difference. From what I can see in your code, cache=writeback
>> > would be the equivalent for what tools/kvm is doing, however
>> > cache=none (i.e. O_DIRECT) is what people usually do with qemu.
>>
>> Yup, I posted 'cache=writeback' results too which are much closer
>> to tools/kvm numbers.
>
> tools/kvm/ seems to be about 20% faster on average:
>
>   511229952 bytes (511 MB) copied, 7.68312 s, 66.5 MB/s
>   511229952 bytes (511 MB) copied, 7.54065 s, 67.8 MB/s
>   511229952 bytes (511 MB) copied, 9.34749 s, 54.7 MB/s
>   511229952 bytes (511 MB) copied, 9.2421 s, 55.3 MB/s
>   511229952 bytes (511 MB) copied, 9.9364 s, 51.5 MB/s
>   511229952 bytes (511 MB) copied, 10.0337 s, 51.0 MB/s
>   511229952 bytes (511 MB) copied, 9.39502 s, 54.4 MB/s
>
> versus the qemu numbers:
>
>  511229952 bytes (511 MB) copied, 10.0879 s, 50.7 MB/s
>  511229952 bytes (511 MB) copied, 4.92686 s, 104 MB/s
>  511229952 bytes (511 MB) copied, 13.1955 s, 38.7 MB/s
>  511229952 bytes (511 MB) copied, 10.7322 s, 47.6 MB/s
>  511229952 bytes (511 MB) copied, 9.46115 s, 54.0 MB/s
>  511229952 bytes (511 MB) copied, 14.9963 s, 34.1 MB/s
>  511229952 bytes (511 MB) copied, 11.1701 s, 45.8 MB/s
>
> but indeed there's (much) more variability in the Qemu numbers,
> suggesting some cache artifact.
>
> Are all of these measurements done via /dev/shm, to stabilize the
> numbers and to remove disk IO delay artifacts?

No, I wanted to include disk IO delay artifacts because I was
comparing tools/kvm to itself using seekwatcher to see what's really
happening. And as Kevin pointed out, we're still missing refcount table
support in tools/kvm, so it's not a proper comparison anyway. It does
show that tools/kvm QCOW performance improved significantly, though.

                                Pekka

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2011-07-11 10:44 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-09 13:02 [PATCH 0/9] kvm tools, qcow: Improve QCOW performance Pekka Enberg
2011-07-09 13:02 ` [PATCH 1/9] kvm tools, qcow: Rename struct qcow_l2_cache to struct qcow_l2_table Pekka Enberg
2011-07-09 13:02 ` [PATCH 2/9] kvm tools, qcow: Use 'struct qcow_l2_table' instead of untyped array Pekka Enberg
2011-07-09 13:02 ` [PATCH 3/9] kvm tools, qcow: Fix locking issues Pekka Enberg
2011-07-09 13:02 ` [PATCH 4/9] kvm tools, qcow: Introduce qcow_disk_flush() Pekka Enberg
2011-07-09 13:02 ` [PATCH 5/9] kvm tools, qcow: Delayed L1 table writeout Pekka Enberg
2011-07-09 13:02 ` [PATCH 6/9] kvm tools, qcow: Don't fdatasync() L2 " Pekka Enberg
2011-07-09 13:02 ` [PATCH 7/9] kvm tools, qcow: Use big endian order for L2 table entries Pekka Enberg
2011-07-09 13:02 ` [PATCH 8/9] kvm tools, qcow: Delayed L2 table writeout Pekka Enberg
2011-07-09 13:02 ` [PATCH 9/9] kvm tools, qcow: Flush only dirty L2 tables Pekka Enberg
2011-07-10 17:15 ` [PATCH 0/9] kvm tools, qcow: Improve QCOW performance Ingo Molnar
2011-07-10 18:08   ` Pekka Enberg
2011-07-10 18:17     ` Ingo Molnar
2011-07-10 18:38       ` Pekka Enberg
2011-07-11  9:31     ` Kevin Wolf
2011-07-11  9:41       ` Pekka Enberg
2011-07-11 10:29         ` Kevin Wolf
2011-07-11 10:32           ` Pekka Enberg
2011-07-11 10:36         ` Ingo Molnar
2011-07-11 10:44           ` Pekka Enberg
