* [PATCH] Rbd: copy-on-read support for kernel rbd client
@ 2015-05-21  3:11 Li Wang
  2015-05-21  9:07 ` Ilya Dryomov
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Li Wang @ 2015-05-21  3:11 UTC (permalink / raw)
  To: Sage Weil, Ilya Dryomov, Alex Elder, Josh Durgin
  Cc: ceph-devel, Min Chen, Yunchuan Wen, Li Wang

This adds a new feature to rbd layering: when a read finds that an
object does not exist in the child image, the kernel rbd client not
only fetches the object from the parent, but also writes it back to
the child, and this write-back is done asynchronously. Subsequent
accesses to the object therefore hit the child without touching the
parent. This avoids the extra latency of going to the parent,
especially when the child and parent are geographically separated,
and it can also help avoid overloading the parent.
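
In outline, the added decision in the read completion path looks like
the sketch below (illustrative only, error handling omitted; the real
code is in patch 4/4 and the helpers are the ones introduced by this
series):

static void child_read_done_sketch(struct rbd_obj_request *obj_request)
{
	struct rbd_img_request *img_request = obj_request->img_request;
	struct rbd_device *rbd_dev = img_request->rbd_dev;

	if (obj_request->result == -ENOENT &&
	    obj_request->img_offset < rbd_dev->parent_overlap) {
		/* Object missing in the child: satisfy the read from the parent. */
		rbd_img_parent_read(obj_request);
		/* Then asynchronously copy the whole object down into the child. */
		if (is_copy_on_read(rbd_dev))
			rbd_img_copyup_start(img_request,
					     obj_request->object_name);
	}
}

The behaviour is enabled per mapping via the new 'copy_on_read' (or
'cor') map option added in patch 1/4, and is skipped for read-only
mappings.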

The patches: 
https://github.com/ceph/ceph-client/pull/11

Min Chen (4):
  Rbd: add an option for copy-on-read
  Rbd: add a new request: rbd_copyup_request
  Rbd: helper functions to manipulate rbd_copyup_request
  Rbd: implement the copy-on-read logic

 drivers/block/rbd.c | 385 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 382 insertions(+), 3 deletions(-)

-- 
1.9.1



* Re: [PATCH] Rbd: copy-on-read support for kernel rbd client
  2015-05-21  3:11 [PATCH] Rbd: copy-on-read support for kernel rbd client Li Wang
@ 2015-05-21  9:07 ` Ilya Dryomov
  2015-05-21  9:19 ` [PATCH 1/4] Rbd: add an option for copy-on-read Li Wang
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Ilya Dryomov @ 2015-05-21  9:07 UTC (permalink / raw)
  To: Li Wang
  Cc: Sage Weil, Alex Elder, Josh Durgin, Ceph Development, Min Chen,
	Yunchuan Wen

On Thu, May 21, 2015 at 6:11 AM, Li Wang <liwang@ubuntukylin.com> wrote:
> This adds a new feature to rbd layering: when a read finds that an
> object does not exist in the child image, the kernel rbd client not
> only fetches the object from the parent, but also writes it back to
> the child, and this write-back is done asynchronously. Subsequent
> accesses to the object therefore hit the child without touching the
> parent. This avoids the extra latency of going to the parent,
> especially when the child and parent are geographically separated,
> and it can also help avoid overloading the parent.
>
> The patches:
> https://github.com/ceph/ceph-client/pull/11
>
> Min Chen (4):
>   Rbd: add an option for copy-on-read
>   Rbd: add a new request: rbd_copyup_request
>   Rbd: helper functions to manipulate rbd_copyup_request
>   Rbd: implement the copy-on-read logic
>
>  drivers/block/rbd.c | 385 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 382 insertions(+), 3 deletions(-)

Kernel client patches should be posted to ceph-devel.  While I can
certainly do my review on github, others won't.  Please post those
patches as a reply to your cover letter.

Thanks,

                Ilya


* [PATCH 1/4] Rbd: add an option for copy-on-read
  2015-05-21  3:11 [PATCH] Rbd: copy-on-read support for kernel rbd client Li Wang
  2015-05-21  9:07 ` Ilya Dryomov
@ 2015-05-21  9:19 ` Li Wang
  2015-05-21  9:19 ` [PATCH 2/4] Rbd: add a new request: rbd_copyup_request Li Wang
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Li Wang @ 2015-05-21  9:19 UTC (permalink / raw)
  To: Sage Weil, Ilya Dryomov, Alex Elder, Josh Durgin
  Cc: ceph-devel, Min Chen, Yunchuan Wen, Li Wang

From: Min Chen <minchen@ubuntukylin.com>

Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
---
 drivers/block/rbd.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index ec6c5c6..446204b 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -324,6 +324,7 @@ struct rbd_mapping {
 	u64                     size;
 	u64                     features;
 	bool			read_only;
+	bool			copy_on_read;	/* rbd map option: copy on read */
 };
 
 /*
@@ -733,6 +734,7 @@ enum {
 	/* string args above */
 	Opt_read_only,
 	Opt_read_write,
+	Opt_copy_on_read,
 	/* Boolean args above */
 	Opt_last_bool,
 };
@@ -744,15 +746,19 @@ static match_table_t rbd_opts_tokens = {
 	{Opt_read_only, "ro"},		/* Alternate spelling */
 	{Opt_read_write, "read_write"},
 	{Opt_read_write, "rw"},		/* Alternate spelling */
+	{Opt_copy_on_read, "copy_on_read"},
+	{Opt_copy_on_read, "cor"},	/* Alternate spelling */
 	/* Boolean args above */
 	{-1, NULL}
 };
 
 struct rbd_options {
 	bool	read_only;
+	bool	copy_on_read;
 };
 
 #define RBD_READ_ONLY_DEFAULT	false
+#define RBD_COPY_ON_READ_DEFAULT false
 
 static int parse_rbd_opts_token(char *c, void *private)
 {
@@ -788,6 +794,9 @@ static int parse_rbd_opts_token(char *c, void *private)
 	case Opt_read_write:
 		rbd_opts->read_only = false;
 		break;
+	case Opt_copy_on_read:
+		rbd_opts->copy_on_read = true;
+		break;
 	default:
 		rbd_assert(false);
 		break;
@@ -1736,6 +1745,13 @@ static void rbd_osd_trivial_callback(struct rbd_obj_request *obj_request)
 	obj_request_done_set(obj_request);
 }
 
+static inline bool is_copy_on_read(struct rbd_device *rbd_dev)
+{
+	rbd_assert(rbd_dev);
+	rbd_assert(&rbd_dev->mapping);
+	return !rbd_dev->mapping.read_only && rbd_dev->mapping.copy_on_read;
+}
+
 static void rbd_osd_read_callback(struct rbd_obj_request *obj_request)
 {
 	struct rbd_img_request *img_request = NULL;
@@ -4933,6 +4949,7 @@ static int rbd_add_parse_args(const char *buf,
 		goto out_mem;
 
 	rbd_opts->read_only = RBD_READ_ONLY_DEFAULT;
+	rbd_opts->copy_on_read = RBD_COPY_ON_READ_DEFAULT;
 
 	copts = ceph_parse_options(options, mon_addrs,
 					mon_addrs + mon_addrs_size - 1,
@@ -5385,6 +5402,7 @@ static ssize_t do_rbd_add(struct bus_type *bus,
 	struct rbd_spec *spec = NULL;
 	struct rbd_client *rbdc;
 	bool read_only;
+	bool copy_on_read;
 	int rc = -ENOMEM;
 
 	if (!try_module_get(THIS_MODULE))
@@ -5395,6 +5413,7 @@ static ssize_t do_rbd_add(struct bus_type *bus,
 	if (rc < 0)
 		goto err_out_module;
 	read_only = rbd_opts->read_only;
+	copy_on_read = rbd_opts->copy_on_read;
 	kfree(rbd_opts);
 	rbd_opts = NULL;	/* done with this */
 
@@ -5436,7 +5455,9 @@ static ssize_t do_rbd_add(struct bus_type *bus,
 
 	if (rbd_dev->spec->snap_id != CEPH_NOSNAP)
 		read_only = true;
+
 	rbd_dev->mapping.read_only = read_only;
+	rbd_dev->mapping.copy_on_read = copy_on_read;
 
 	rc = rbd_dev_device_setup(rbd_dev);
 	if (rc) {
-- 
1.9.1



* [PATCH 2/4] Rbd: add a new request: rbd_copyup_request
  2015-05-21  3:11 [PATCH] Rbd: copy-on-read support for kernel rbd client Li Wang
  2015-05-21  9:07 ` Ilya Dryomov
  2015-05-21  9:19 ` [PATCH 1/4] Rbd: add an option for copy-on-read Li Wang
@ 2015-05-21  9:19 ` Li Wang
  2015-05-21  9:19 ` [PATCH 3/4] Rbd: helper functions to manipulate rbd_copyup_request Li Wang
  2015-05-21  9:19 ` [PATCH 4/4] Rbd: implement the copy-on-read logic Li Wang
  4 siblings, 0 replies; 7+ messages in thread
From: Li Wang @ 2015-05-21  9:19 UTC (permalink / raw)
  To: Sage Weil, Ilya Dryomov, Alex Elder, Josh Durgin
  Cc: ceph-devel, Min Chen, Yunchuan Wen, Li Wang

From: Min Chen <minchen@ubuntukylin.com>

rbd_copyup_request is used only when copy_on_read is enabled for
RBD child images. It is independent of rbd_obj_request.

Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
---
 drivers/block/rbd.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 446204b..581cb7b 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -207,6 +207,9 @@ typedef void (*rbd_img_callback_t)(struct rbd_img_request *);
 struct rbd_obj_request;
 typedef void (*rbd_obj_callback_t)(struct rbd_obj_request *);
 
+struct rbd_copyup_request;
+typedef void (*rbd_copyup_callback_t)(struct rbd_copyup_request *);
+
 enum obj_request_type {
 	OBJ_REQUEST_NODATA, OBJ_REQUEST_BIO, OBJ_REQUEST_PAGES
 };
@@ -279,6 +282,27 @@ struct rbd_obj_request {
 	struct kref		kref;
 };
 
+struct rbd_copyup_request {
+	const char *	object_name;
+	u64		object_no;
+
+	struct page 	**copyup_pages;
+	u32		copyup_page_count;
+
+	struct rbd_img_request *img_request;
+	/* links for img_request->copyup_requests list */
+	struct list_head	links;
+
+	struct ceph_osd_request		*osd_req;
+
+	rbd_copyup_callback_t 	callback;
+	struct completion	completion;
+
+	int result;
+
+	struct kref	kref;
+};
+
 enum img_req_flags {
 	IMG_REQ_WRITE,		/* I/O direction: read = 0, write = 1 */
 	IMG_REQ_CHILD,		/* initiator: block = 0, child image = 1 */
@@ -310,6 +334,9 @@ struct rbd_img_request {
 	u32			obj_request_count;
 	struct list_head	obj_requests;	/* rbd_obj_request structs */
 
+	struct list_head	copyup_list;	/* rbd_copyup_request list */
+	spinlock_t		copyup_list_lock; /* protects copyup list */
+
 	struct kref		kref;
 };
 
@@ -320,6 +347,9 @@ struct rbd_img_request {
 #define for_each_obj_request_safe(ireq, oreq, n) \
 	list_for_each_entry_safe_reverse(oreq, n, &(ireq)->obj_requests, links)
 
+#define for_each_copyup_request(ireq, cpreq) \
+	list_for_each_entry(cpreq, &(ireq)->copyup_list, links)
+
 struct rbd_mapping {
 	u64                     size;
 	u64                     features;
@@ -399,6 +429,7 @@ static DEFINE_SPINLOCK(rbd_client_list_lock);
 
 static struct kmem_cache	*rbd_img_request_cache;
 static struct kmem_cache	*rbd_obj_request_cache;
+static struct kmem_cache	*rbd_copyup_request_cache;
 static struct kmem_cache	*rbd_segment_name_cache;
 
 static int rbd_major;
@@ -2178,6 +2209,8 @@ static struct rbd_img_request *rbd_img_request_create(
 	img_request->result = 0;
 	img_request->obj_request_count = 0;
 	INIT_LIST_HEAD(&img_request->obj_requests);
+	INIT_LIST_HEAD(&img_request->copyup_list);
+	spin_lock_init(&img_request->copyup_list_lock);
 	kref_init(&img_request->kref);
 
 	dout("%s: rbd_dev %p %s %llu/%llu -> img %p\n", __func__, rbd_dev,
@@ -5666,6 +5699,14 @@ static int rbd_slab_init(void)
 	if (!rbd_obj_request_cache)
 		goto out_err;
 
+	rbd_assert(!rbd_copyup_request_cache);
+	rbd_copyup_request_cache = kmem_cache_create("rbd_copyup_request",
+					sizeof (struct rbd_copyup_request),
+					__alignof__(struct rbd_copyup_request),
+					0, NULL);
+	if (!rbd_copyup_request_cache)
+		goto out_err;
+
 	rbd_assert(!rbd_segment_name_cache);
 	rbd_segment_name_cache = kmem_cache_create("rbd_segment_name",
 					CEPH_MAX_OID_NAME_LEN + 1, 1, 0, NULL);
@@ -5677,6 +5718,11 @@ out_err:
 		rbd_obj_request_cache = NULL;
 	}
 
+	if (rbd_copyup_request_cache) {
+		kmem_cache_destroy(rbd_copyup_request_cache);
+		rbd_copyup_request_cache = NULL;
+	}
+
 	kmem_cache_destroy(rbd_img_request_cache);
 	rbd_img_request_cache = NULL;
 
@@ -5693,6 +5739,10 @@ static void rbd_slab_exit(void)
 	kmem_cache_destroy(rbd_obj_request_cache);
 	rbd_obj_request_cache = NULL;
 
+	rbd_assert(rbd_copyup_request_cache);
+	kmem_cache_destroy(rbd_copyup_request_cache);
+	rbd_copyup_request_cache = NULL;
+
 	rbd_assert(rbd_img_request_cache);
 	kmem_cache_destroy(rbd_img_request_cache);
 	rbd_img_request_cache = NULL;
-- 
1.9.1



* [PATCH 3/4] Rbd: helper functions to manipulate rbd_copyup_request
  2015-05-21  3:11 [PATCH] Rbd: copy-on-read support for kernel rbd client Li Wang
                   ` (2 preceding siblings ...)
  2015-05-21  9:19 ` [PATCH 2/4] Rbd: add a new request: rbd_copyup_request Li Wang
@ 2015-05-21  9:19 ` Li Wang
  2015-05-21  9:19 ` [PATCH 4/4] Rbd: implement the copy-on-read logic Li Wang
  4 siblings, 0 replies; 7+ messages in thread
From: Li Wang @ 2015-05-21  9:19 UTC (permalink / raw)
  To: Sage Weil, Ilya Dryomov, Alex Elder, Josh Durgin
  Cc: ceph-devel, Min Chen, Yunchuan Wen, Li Wang

From: Min Chen <minchen@ubuntukylin.com>

Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
---
 drivers/block/rbd.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 128 insertions(+)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 581cb7b..99a3a556 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1202,6 +1202,26 @@ static void rbd_dev_mapping_clear(struct rbd_device *rbd_dev)
 	rbd_dev->mapping.features = 0;
 }
 
+static inline u64 rbd_object_no(struct rbd_device *rbd_dev, const char *object_name)
+{
+	const char *ptr = NULL;
+	size_t len = 0;
+	u64 offset_width = 10;
+	u64 obj_no = (u64)-1;
+
+	rbd_assert(rbd_dev);
+	rbd_assert(object_name);
+
+	ptr = object_name;
+	len = strlen(object_name);
+	if (rbd_dev->image_format == 2)
+		offset_width = 16;
+	rbd_assert(len >= offset_width);
+	ptr += len - offset_width;
+	obj_no = simple_strtoull(ptr, NULL, 16);
+	return obj_no;
+}
+
 static void rbd_segment_name_free(const char *name)
 {
 	/* The explicit cast here is needed to drop the const qualifier */
@@ -1518,6 +1538,8 @@ static void rbd_obj_request_put(struct rbd_obj_request *obj_request)
 	kref_put(&obj_request->kref, rbd_obj_request_destroy);
 }
 
+static void rbd_copyup_request_destroy(struct kref *kref);
+
 static void rbd_img_request_get(struct rbd_img_request *img_request)
 {
 	dout("%s: img %p (was %d)\n", __func__, img_request,
@@ -1575,6 +1597,34 @@ static inline void rbd_img_obj_request_del(struct rbd_img_request *img_request,
 	rbd_obj_request_put(obj_request);
 }
 
+static inline void rbd_img_copyup_request_add(struct rbd_img_request *img_request,
+					struct rbd_copyup_request *copyup_request)
+{
+	rbd_assert(copyup_request->img_request == NULL);
+	copyup_request->img_request = img_request;
+	/*
+	* For object locality, the new request is more likely to
+	* access the last inserted object, so, just insert copyup_request
+	* after head of list_head(copyup_list)
+	*/
+	spin_lock(&img_request->copyup_list_lock);
+	list_add(&copyup_request->links, &img_request->copyup_list);
+	spin_unlock(&img_request->copyup_list_lock);
+}
+
+static inline void rbd_img_copyup_request_del(struct rbd_img_request *img_request,
+					struct rbd_copyup_request *copyup_request)
+{
+	rbd_assert(copyup_request != NULL);
+	spin_lock(&img_request->copyup_list_lock);
+	list_del(&copyup_request->links);
+	spin_unlock(&img_request->copyup_list_lock);
+
+	rbd_assert(copyup_request->img_request == img_request);
+	copyup_request->img_request = NULL;
+	copyup_request->callback = NULL;
+}
+
 static bool obj_request_type_valid(enum obj_request_type type)
 {
 	switch (type) {
@@ -1726,6 +1776,8 @@ rbd_img_request_op_type(struct rbd_img_request *img_request)
 		return OBJ_OP_READ;
 }
 
+static void rbd_img_copyup_start(struct rbd_img_request *img_request, const char *object_name);
+
 static void
 rbd_img_obj_request_read_callback(struct rbd_obj_request *obj_request)
 {
@@ -2108,6 +2160,82 @@ static void rbd_obj_request_destroy(struct kref *kref)
 	kmem_cache_free(rbd_obj_request_cache, obj_request);
 }
 
+static struct rbd_copyup_request *rbd_copyup_request_create(const char *object_name,
+						struct rbd_device *rbd_dev)
+{
+	struct rbd_copyup_request *copyup_request = NULL;
+	size_t size = 0;
+	u64 length = 0;
+	char *name = NULL;
+	struct page **pages = NULL;
+	u32	page_count = 0;
+
+	rbd_assert(rbd_dev);
+	rbd_assert(object_name);
+
+	/* Allocate memory for object_name */
+	size = strlen(object_name) + 1;
+	name = kmalloc(size, GFP_KERNEL);
+	if(!name)
+		goto out_name;
+
+	/* Allocate memory for entire object */
+	length = (u64)1 << rbd_dev->header.obj_order;
+	page_count = (u32)calc_pages_for(0,length);
+	pages = ceph_alloc_page_vector(page_count, GFP_KERNEL);
+	if (IS_ERR(pages))
+		goto out_pages;
+
+	/* Allocate memory for struct rbd_copyup_request */
+	copyup_request = kmem_cache_zalloc(rbd_copyup_request_cache, GFP_KERNEL);
+	if(!copyup_request)
+		goto out_request;
+
+	/* Init all members of struct rbd_copyup_request */
+	copyup_request->object_name = memcpy(name, object_name, size);
+	copyup_request->object_no = rbd_object_no(rbd_dev, object_name);
+	copyup_request->copyup_pages = pages;
+	copyup_request->copyup_page_count = page_count;
+
+	INIT_LIST_HEAD(&copyup_request->links);
+
+	init_completion(&copyup_request->completion);
+
+	return copyup_request;
+out_request:
+	if (copyup_request)
+		kmem_cache_free(rbd_copyup_request_cache, copyup_request);
+out_pages:
+	if (pages)
+		ceph_release_page_vector(pages, page_count);
+out_name:
+	if (name)
+		kfree(name);
+	return NULL;
+}
+
+static void rbd_copyup_request_destroy(struct kref *kref)
+{
+	struct rbd_copyup_request *copyup_request;
+	copyup_request = container_of(kref, struct rbd_copyup_request, kref);
+
+	if (copyup_request->osd_req) {
+		rbd_osd_req_destroy(copyup_request->osd_req);
+		copyup_request->osd_req = NULL;
+	}
+
+	if (copyup_request->copyup_pages) {
+		ceph_release_page_vector(copyup_request->copyup_pages, copyup_request->copyup_page_count);
+		copyup_request->copyup_pages = NULL;
+	}
+
+	if (copyup_request->object_name) {
+		kfree(copyup_request->object_name);
+		copyup_request->object_name = NULL;
+	}
+	kmem_cache_free(rbd_copyup_request_cache, copyup_request);
+}
+
 /* It's OK to call this for a device with no parent */
 
 static void rbd_spec_put(struct rbd_spec *spec);
-- 
1.9.1



* [PATCH 4/4] Rbd: implement the copy-on-read logic
  2015-05-21  3:11 [PATCH] Rbd: copy-on-read support for kernel rbd client Li Wang
                   ` (3 preceding siblings ...)
  2015-05-21  9:19 ` [PATCH 3/4] Rbd: helper functions to manipulate rbd_copyup_request Li Wang
@ 2015-05-21  9:19 ` Li Wang
  2015-06-24  8:50   ` Ilya Dryomov
  4 siblings, 1 reply; 7+ messages in thread
From: Li Wang @ 2015-05-21  9:19 UTC (permalink / raw)
  To: Sage Weil, Ilya Dryomov, Alex Elder, Josh Durgin
  Cc: ceph-devel, Min Chen, Yunchuan Wen, Li Wang

From: Min Chen <minchen@ubuntukylin.com>

Signed-off-by: Min Chen <minchen@ubuntukylin.com>
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
---
 drivers/block/rbd.c | 186 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 183 insertions(+), 3 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 99a3a556..51d8398 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1851,12 +1851,16 @@ static void rbd_osd_read_callback(struct rbd_obj_request *obj_request)
 		obj_request, img_request, obj_request->result,
 		obj_request->xferred, obj_request->length);
 	if (layered && obj_request->result == -ENOENT &&
-			obj_request->img_offset < rbd_dev->parent_overlap)
+			obj_request->img_offset < rbd_dev->parent_overlap) {
 		rbd_img_parent_read(obj_request);
-	else if (img_request)
+		rbd_assert(obj_request->img_request);
+		if(is_copy_on_read(obj_request->img_request->rbd_dev))
+			rbd_img_copyup_start(obj_request->img_request, obj_request->object_name);
+	} else if (img_request) {
 		rbd_img_obj_request_read_callback(obj_request);
-	else
+	} else {
 		obj_request_done_set(obj_request);
+	}
 }
 
 static void rbd_osd_write_callback(struct rbd_obj_request *obj_request)
@@ -2915,6 +2919,182 @@ out_err:
 	return result;
 }
 
+static void rbd_img_copyup_end(struct rbd_copyup_request *copyup_request)
+{
+	struct rbd_img_request *img_request = NULL;
+	rbd_assert(copyup_request);
+	img_request = copyup_request->img_request;
+	rbd_img_copyup_request_del(img_request, copyup_request);
+	rbd_copyup_request_destroy(&copyup_request->kref);
+	rbd_img_request_put(img_request);
+}
+
+static void rbd_osd_req_copyup_callback(struct ceph_osd_request *osd_req,
+					struct ceph_msg *msg)
+{
+	struct rbd_copyup_request *copyup_request = NULL;
+	rbd_assert(osd_req);
+	copyup_request = osd_req->r_priv;
+	copyup_request->result = osd_req->r_result;
+	if(copyup_request->callback)
+		copyup_request->callback(copyup_request);
+	else
+		complete_all(&copyup_request->completion);
+}
+
+static void rbd_img_copyup_write_async(struct rbd_copyup_request *copyup_request)
+{
+	struct rbd_img_request *img_request = NULL;
+	struct ceph_snap_context *snapc = NULL;
+	struct ceph_osd_request *osd_req = NULL;
+	struct ceph_osd_client *osdc = NULL;
+	struct rbd_device *rbd_dev = NULL;
+	struct page **pages = NULL;
+	struct timespec mtime = CURRENT_TIME;
+	u32 page_count = 0;
+	u64 object_size = 0;
+	int result = 0;
+
+	/* if copyup_request read from parent failed, just end it */
+	if (copyup_request->result < 0) {
+		rbd_img_copyup_end(copyup_request);
+		goto out;
+	}
+
+	img_request = copyup_request->img_request;
+	rbd_assert(img_request);
+	rbd_dev = img_request->rbd_dev;
+	rbd_assert(rbd_dev);
+	osdc = &rbd_dev->rbd_client->client->osdc;
+	rbd_assert(osdc);
+	snapc = rbd_dev->header.snapc;
+
+	ceph_osdc_put_request(copyup_request->osd_req);
+
+	copyup_request->osd_req = NULL;
+	osd_req = ceph_osdc_alloc_request(osdc, snapc, 1, false, GFP_ATOMIC);
+	if (!osd_req)
+		goto out;
+
+	pages = copyup_request->copyup_pages;
+	page_count = copyup_request->copyup_page_count;
+	object_size = (u64)1 << rbd_dev->header.obj_order;
+
+	/* Initialize copyup op */
+	osd_req_op_cls_init(osd_req, 0, CEPH_OSD_OP_CALL, "rbd", "copyup");
+	osd_req_op_cls_request_data_pages(osd_req, 0, pages, object_size, 0, false, false);
+	osd_req->r_flags = CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK;
+	osd_req->r_callback = rbd_osd_req_copyup_callback;
+	osd_req->r_priv = copyup_request;
+
+	osd_req->r_base_oloc.pool = ceph_file_layout_pg_pool(rbd_dev->layout);
+	ceph_oid_set_name(&osd_req->r_base_oid, copyup_request->object_name);
+
+	copyup_request->osd_req = osd_req;
+	copyup_request->callback = rbd_img_copyup_end;
+
+	ceph_osdc_build_request(osd_req, 0, snapc, CEPH_NOSNAP, &mtime);
+	result = ceph_osdc_start_request(osdc, osd_req, false);
+	if(!result)
+		goto out;
+
+	ceph_osdc_put_request(osd_req);
+out:
+	return;
+}
+
+static void rbd_img_copyup_start(struct rbd_img_request *img_request,
+				const char *object_name)
+{
+	struct rbd_copyup_request *copyup_request = NULL;
+	struct rbd_device *rbd_dev = NULL;
+	struct ceph_snap_context *snapc = NULL;
+	struct ceph_osd_client *osdc = NULL;
+	struct ceph_osd_request *osd_req = NULL;
+	const char *parent_object_name = NULL;
+
+	int result = 0;
+	u64 object_no = (u64)-1;
+	u64 object_size = 0;
+	u64 snap_id = 0;
+	__u8 obj_order = 0;
+	bool is_read = false;
+
+	rbd_assert(img_request != NULL);
+	rbd_assert(object_name != NULL);
+
+	rbd_dev = img_request->rbd_dev;
+	rbd_assert(rbd_dev != NULL);
+
+	is_read = !img_request_write_test(img_request) &&
+			!img_request_discard_test(img_request);
+
+	object_no = rbd_object_no(rbd_dev, object_name);
+	obj_order = rbd_dev->header.obj_order;
+	object_size = (u64)1 << obj_order;
+
+	spin_lock(&img_request->copyup_list_lock);
+	/* Find if object_no exists in copyup_list */
+	for_each_copyup_request(img_request, copyup_request) {
+		/* Found, just return */
+		if(copyup_request->object_no == object_no) {
+			spin_unlock(&img_request->copyup_list_lock);
+			return;
+		}
+	}
+	spin_unlock(&img_request->copyup_list_lock);
+
+	/* Not found, send new copyup request */
+	copyup_request = NULL;
+	osdc = &rbd_dev->rbd_client->client->osdc;
+	parent_object_name = rbd_segment_name(rbd_dev->parent, object_no << obj_order);
+	if (!parent_object_name)
+		goto out;
+	osd_req = ceph_osdc_alloc_request(osdc, snapc, 1, false, GFP_ATOMIC);
+	if (!osd_req)
+		goto out;
+	copyup_request = rbd_copyup_request_create(object_name, rbd_dev);
+	if (!copyup_request) {
+		ceph_osdc_put_request(osd_req);
+		goto out;
+	}
+
+	/* Init osd_req */
+	osd_req_op_extent_init(osd_req, 0, CEPH_OSD_OP_READ, 0, object_size, 0, 0);
+	osd_req_op_extent_osd_data_pages(osd_req, 0, copyup_request->copyup_pages, object_size,
+					0, false, false);
+
+	osd_req->r_flags = CEPH_OSD_FLAG_READ;
+	osd_req->r_callback = rbd_osd_req_copyup_callback;
+	osd_req->r_priv = copyup_request;
+
+	osd_req->r_base_oloc.pool = ceph_file_layout_pg_pool(rbd_dev->parent->layout);
+	ceph_oid_set_name(&osd_req->r_base_oid, parent_object_name);
+	rbd_segment_name_free(parent_object_name);
+
+	/* Init copyup request */
+	rbd_assert(copyup_request->osd_req == NULL);
+	copyup_request->osd_req = osd_req;
+	copyup_request->callback = rbd_img_copyup_write_async;
+
+	/* Encode osd_req data */
+	snap_id = img_request ? img_request->snap_id : CEPH_NOSNAP;
+	ceph_osdc_build_request(osd_req, 0, NULL, snap_id, NULL);
+
+	/* Add copyup request to img_request->copyup_list */
+	rbd_img_copyup_request_add(img_request, copyup_request);
+
+	rbd_img_request_get(img_request);
+
+	/* Send osd_req */
+	result = ceph_osdc_start_request(osdc, osd_req, false);
+	if (!result)
+		goto out;
+out:
+	return;
+}
+
+
 static void rbd_img_obj_exists_callback(struct rbd_obj_request *obj_request)
 {
 	struct rbd_obj_request *orig_request;
-- 
1.9.1



* Re: [PATCH 4/4] Rbd: implement the copy-on-read logic
  2015-05-21  9:19 ` [PATCH 4/4] Rbd: implement the copy-on-read logic Li Wang
@ 2015-06-24  8:50   ` Ilya Dryomov
  0 siblings, 0 replies; 7+ messages in thread
From: Ilya Dryomov @ 2015-06-24  8:50 UTC (permalink / raw)
  To: Li Wang
  Cc: Sage Weil, Alex Elder, Josh Durgin, Ceph Development, Min Chen,
	Yunchuan Wen

On Thu, May 21, 2015 at 12:19 PM, Li Wang <liwang@ubuntukylin.com> wrote:
> From: Min Chen <minchen@ubuntukylin.com>
>
> Signed-off-by: Min Chen <minchen@ubuntukylin.com>
> Signed-off-by: Li Wang <liwang@ubuntukylin.com>
> Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
> ---
>  drivers/block/rbd.c | 186 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 183 insertions(+), 3 deletions(-)
>
> [...]

Hi,

Sorry for the late feedback, I thought I had sent this...

I have a number of high-level issues with this.  First and foremost,
the lifetime rules are unclear.  It looks to me like there is nothing
preventing a bunch of async copyup requests from hanging around after
rbd unmap completes.  These copyups bump img_request refcount, which
means that img_requests along with rbd_device, rbd_client and
ceph_client will also hang around after rbd unmap, waiting for late
replies.  This is wrong: if there are no images mapped, there should be
no rbd or libceph state around.  I'm pretty sure async copyup requests
don't need an osd_client callback at all - you can just send and
forget.

Second, I think sync (copy-on-write) and async (copy-on-read) copyups
should be coordinated with each other.  If there is an async copyup in
progress, a sync copyup can just wait for it to complete.

Related to the above is the copyup request list.  I think it should be
a per image rather than a per img_request thing - copyup_list in struct
rbd_img_request doesn't make much sense to me.  What exactly is its
use there?
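
For illustration only (rbd_copyup_state and rbd_copyup_in_flight are
hypothetical names, not from the posted series; object_no and links
are the fields added in patch 2/4), per-image tracking would mean the
in-flight list and the dedup check hang off the device rather than off
one particular img_request:

struct rbd_copyup_state {
	struct list_head	copyup_list;		/* rbd_copyup_request list */
	spinlock_t		copyup_list_lock;	/* protects copyup_list */
};

static bool rbd_copyup_in_flight(struct rbd_copyup_state *state, u64 object_no)
{
	struct rbd_copyup_request *cr;
	bool found = false;

	spin_lock(&state->copyup_list_lock);
	list_for_each_entry(cr, &state->copyup_list, links) {
		if (cr->object_no == object_no) {
			found = true;
			break;
		}
	}
	spin_unlock(&state->copyup_list_lock);
	return found;
}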

Fourth, unless I'm missing something, the following

    rbd_img_parent_read(obj_request);
    rbd_assert(obj_request->img_request);
    if(is_copy_on_read(obj_request->img_request->rbd_dev))
        rbd_img_copyup_start(...);

in rbd_osd_read_callback() means that async copyup requests will be
issued for objects that don't exist in the parent.  I think the correct
way to handle this is to wait for a parent read request to complete and
issue a copyup only if it completes successfully.
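
A minimal sketch of that ordering (the completion hook shown here is
hypothetical; is_copy_on_read() and rbd_img_copyup_start() are the
helpers from the posted patches):

/* Hypothetical: run when the parent read for obj_request completes. */
static void rbd_parent_read_done_sketch(struct rbd_obj_request *obj_request)
{
	struct rbd_img_request *img_request = obj_request->img_request;

	/* Copy up only if the parent actually had the object. */
	if (obj_request->result == 0 && is_copy_on_read(img_request->rbd_dev))
		rbd_img_copyup_start(img_request, obj_request->object_name);
}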

Thanks,

                Ilya


