* [PATCH 00/11] rbd: another set of patches
@ 2012-08-24 16:26 Alex Elder
  2012-08-24 16:32 ` [PATCH 01/11] rbd: handle locking inside __rbd_client_find() Alex Elder
                   ` (12 more replies)
  0 siblings, 13 replies; 35+ messages in thread
From: Alex Elder @ 2012-08-24 16:26 UTC (permalink / raw)
  To: ceph-devel

I keep moving simple and simplifying patches to the front of
my patch queue.  This is my latest set of things that are
ready to get reviewed and committed.

					-Alex

[PATCH 01/11] rbd: handle locking inside __rbd_client_find()
    This simplifies a little code.
[PATCH 02/11] rbd: don't over-allocate space for object prefix
    Don't allocate the max required space for the object prefix.
    This also ensures that the object prefix is either terminated
    with '\0' or else is maximally-sized.
[PATCH 03/11] rbd: kill incore snap_names_len
    Eliminate an unnecessary field in struct rbd_image_header.
[PATCH 04/11] rbd: more cleanup in rbd_header_from_disk()
    Rearrange some code, for better logical grouping.
[PATCH 05/11] rbd: move rbd_opts to struct rbd_device
    Tie rbd-specific options to the rbd_device, not the client.
[PATCH 06/11] rbd: add read_only rbd map option
[PATCH 07/11] rbd: kill notify_timeout option
    These two were done together, in this order, to preserve the
    argument parsing code.  The first adds the ability to map an
    rbd image read-only.  The second eliminates the only other
    rbd option, which is otherwise unused.
[PATCH 08/11] rbd: bio_chain_clone() cleanups
    Simple refactoring.
[PATCH 09/11] rbd: drop needless test in rbd_rq_fn()
    Trivial.
[PATCH 10/11] rbd: check for overflow in rbd_get_num_segments()
    Make sure we don't exceed 64-bit maximum when adding offset
    and length.
[PATCH 11/11] rbd: split up rbd_get_segment()
    Refactor rbd_get_segment() into three separate functions,
    which return a segment name, offset, and length.


* [PATCH 01/11] rbd: handle locking inside __rbd_client_find()
  2012-08-24 16:26 [PATCH 00/11] rbd: another set of patches Alex Elder
@ 2012-08-24 16:32 ` Alex Elder
  2012-08-30 16:09   ` Yehuda Sadeh
  2012-08-24 16:32 ` [PATCH 02/11] rbd: don't over-allocate space for object prefix Alex Elder
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Alex Elder @ 2012-08-24 16:32 UTC (permalink / raw)
  To: ceph-devel

There is only one caller of __rbd_client_find(), and it somewhat
clumsily takes the appropriate lock and gets a reference to the
existing rbd_client structure if one is found.

Instead, have that function handle its own locking, and acquire the
reference if found while it holds the lock.  Drop the underscores
from the name because there's no need to signify anything special
about this function.

Signed-off-by: Alex Elder <elder@inktank.com>
---
 drivers/block/rbd.c |   29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 8e6e29e..81b5344 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -322,19 +322,28 @@ out_opt:
 }

 /*
- * Find a ceph client with specific addr and configuration.
+ * Find a ceph client with specific addr and configuration.  If
+ * found, bump its reference count.
  */
-static struct rbd_client *__rbd_client_find(struct ceph_options *ceph_opts)
+static struct rbd_client *rbd_client_find(struct ceph_options *ceph_opts)
 {
 	struct rbd_client *client_node;
+	bool found = false;

 	if (ceph_opts->flags & CEPH_OPT_NOSHARE)
 		return NULL;

-	list_for_each_entry(client_node, &rbd_client_list, node)
-		if (!ceph_compare_options(ceph_opts, client_node->client))
-			return client_node;
-	return NULL;
+	spin_lock(&rbd_client_list_lock);
+	list_for_each_entry(client_node, &rbd_client_list, node) {
+		if (!ceph_compare_options(ceph_opts, client_node->client)) {
+			kref_get(&client_node->kref);
+			found = true;
+			break;
+		}
+	}
+	spin_unlock(&rbd_client_list_lock);
+
+	return found ? client_node : NULL;
 }

 /*
@@ -416,22 +425,16 @@ static struct rbd_client *rbd_get_client(const char *mon_addr,
 		return ERR_CAST(ceph_opts);
 	}

-	spin_lock(&rbd_client_list_lock);
-	rbdc = __rbd_client_find(ceph_opts);
+	rbdc = rbd_client_find(ceph_opts);
 	if (rbdc) {
 		/* using an existing client */
-		kref_get(&rbdc->kref);
-		spin_unlock(&rbd_client_list_lock);
-
 		ceph_destroy_options(ceph_opts);
 		kfree(rbd_opts);

 		return rbdc;
 	}
-	spin_unlock(&rbd_client_list_lock);

 	rbdc = rbd_client_create(ceph_opts, rbd_opts);
-
 	if (IS_ERR(rbdc))
 		kfree(rbd_opts);

-- 
1.7.9.5



* [PATCH 02/11] rbd: don't over-allocate space for object prefix
  2012-08-24 16:26 [PATCH 00/11] rbd: another set of patches Alex Elder
  2012-08-24 16:32 ` [PATCH 01/11] rbd: handle locking inside __rbd_client_find() Alex Elder
@ 2012-08-24 16:32 ` Alex Elder
  2012-08-30 16:18   ` Yehuda Sadeh
  2012-08-24 16:33 ` [PATCH 03/11] rbd: kill incore snap_names_len Alex Elder
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Alex Elder @ 2012-08-24 16:32 UTC (permalink / raw)
  To: ceph-devel

In rbd_header_from_disk() the object prefix buffer is sized based on
the maximum size its block_name equivalent on disk could be.

Instead, only allocate enough to hold a NUL-terminated copy of the
string from the on-disk header--or the maximum size if no NUL is
found.

Signed-off-by: Alex Elder <elder@inktank.com>
---
 drivers/block/rbd.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 81b5344..a8a4cba 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -519,18 +519,19 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
 				 struct rbd_image_header_ondisk *ondisk)
 {
 	u32 snap_count;
+	size_t len;
 	size_t size;

 	memset(header, 0, sizeof (*header));

 	snap_count = le32_to_cpu(ondisk->snap_count);

-	size = sizeof (ondisk->block_name) + 1;
-	header->object_prefix = kmalloc(size, GFP_KERNEL);
+	len = strnlen(ondisk->block_name, sizeof (ondisk->block_name));
+	header->object_prefix = kmalloc(len + 1, GFP_KERNEL);
 	if (!header->object_prefix)
 		return -ENOMEM;
-	memcpy(header->object_prefix, ondisk->block_name, size - 1);
-	header->object_prefix[size - 1] = '\0';
+	memcpy(header->object_prefix, ondisk->block_name, len);
+	header->object_prefix[len] = '\0';

 	if (snap_count) {
 		header->snap_names_len = le64_to_cpu(ondisk->snap_names_len);
-- 
1.7.9.5



* [PATCH 03/11] rbd: kill incore snap_names_len
  2012-08-24 16:26 [PATCH 00/11] rbd: another set of patches Alex Elder
  2012-08-24 16:32 ` [PATCH 01/11] rbd: handle locking inside __rbd_client_find() Alex Elder
  2012-08-24 16:32 ` [PATCH 02/11] rbd: don't over-allocate space for object prefix Alex Elder
@ 2012-08-24 16:33 ` Alex Elder
  2012-08-30 16:24   ` Yehuda Sadeh
  2012-09-06 15:36   ` [PATCH, v2 " Alex Elder
  2012-08-24 16:33 ` [PATCH 04/11] rbd: more cleanup in rbd_header_from_disk() Alex Elder
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 35+ messages in thread
From: Alex Elder @ 2012-08-24 16:33 UTC (permalink / raw)
  To: ceph-devel

The only thing the on-disk snap_names_len field is used for is to
size the buffer allocated to hold a copy of the snapshot names for
an rbd image.

Don't bother saving it in the in-core rbd_image_header structure.
Just use a local variable to hold the required buffer size while
it's needed.

Move the code that actually copies the snapshot names up closer
to where the required length is saved.

Signed-off-by: Alex Elder <elder@inktank.com>
---
 drivers/block/rbd.c |   19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index a8a4cba..7b3d861 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -81,7 +81,6 @@ struct rbd_image_header {
 	__u8 crypt_type;
 	__u8 comp_type;
 	struct ceph_snap_context *snapc;
-	u64 snap_names_len;
 	u32 total_snaps;

 	char *snap_names;
@@ -534,12 +533,14 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
 	header->object_prefix[len] = '\0';

 	if (snap_count) {
-		header->snap_names_len = le64_to_cpu(ondisk->snap_names_len);
-		BUG_ON(header->snap_names_len > (u64) SIZE_MAX);
-		header->snap_names = kmalloc(header->snap_names_len,
-					     GFP_KERNEL);
+		u64 snap_names_len = le64_to_cpu(ondisk->snap_names_len);
+
+		BUG_ON(snap_names_len > (u64) SIZE_MAX);
+		header->snap_names = kmalloc(snap_names_len, GFP_KERNEL);
 		if (!header->snap_names)
 			goto out_err;
+		memcpy(header->snap_names, &ondisk->snaps[snap_count],
+			snap_names_len);

 		size = snap_count * sizeof (*header->snap_sizes);
 		header->snap_sizes = kmalloc(size, GFP_KERNEL);
@@ -547,7 +548,6 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
 			goto out_err;
 	} else {
 		WARN_ON(ondisk->snap_names_len);
-		header->snap_names_len = 0;
 		header->snap_names = NULL;
 		header->snap_sizes = NULL;
 	}
@@ -579,10 +579,6 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
 			header->snap_sizes[i] =
 				le64_to_cpu(ondisk->snaps[i].image_size);
 		}
-
-		/* copy snapshot names */
-		memcpy(header->snap_names, &ondisk->snaps[snap_count],
-			header->snap_names_len);
 	}

 	return 0;
@@ -592,7 +588,6 @@ out_err:
 	header->snap_sizes = NULL;
 	kfree(header->snap_names);
 	header->snap_names = NULL;
-	header->snap_names_len = 0;
 	kfree(header->object_prefix);
 	header->object_prefix = NULL;

@@ -660,7 +655,6 @@ static void rbd_header_free(struct rbd_image_header *header)
 	header->snap_sizes = NULL;
 	kfree(header->snap_names);
 	header->snap_names = NULL;
-	header->snap_names_len = 0;
 	ceph_put_snap_context(header->snapc);
 	header->snapc = NULL;
 }
@@ -1800,7 +1794,6 @@ static int __rbd_refresh_header(struct rbd_device *rbd_dev, u64 *hver)
 	rbd_dev->header.total_snaps = h.total_snaps;
 	rbd_dev->header.snapc = h.snapc;
 	rbd_dev->header.snap_names = h.snap_names;
-	rbd_dev->header.snap_names_len = h.snap_names_len;
 	rbd_dev->header.snap_sizes = h.snap_sizes;
 	/* Free the extra copy of the object prefix */
 	WARN_ON(strcmp(rbd_dev->header.object_prefix, h.object_prefix));
-- 
1.7.9.5



* [PATCH 04/11] rbd: more cleanup in rbd_header_from_disk()
  2012-08-24 16:26 [PATCH 00/11] rbd: another set of patches Alex Elder
                   ` (2 preceding siblings ...)
  2012-08-24 16:33 ` [PATCH 03/11] rbd: kill incore snap_names_len Alex Elder
@ 2012-08-24 16:33 ` Alex Elder
  2012-08-30 16:48   ` Yehuda Sadeh
  2012-08-24 16:33 ` [PATCH 05/11] rbd: move rbd_opts to struct rbd_device Alex Elder
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Alex Elder @ 2012-08-24 16:33 UTC (permalink / raw)
  To: ceph-devel

This just rearranges things a bit more in rbd_header_from_disk()
so that the snapshot sizes are initialized right after the buffer
to hold them is allocated, and does a little further consolidation
that follows from that.  It also adds a few simple comments.

Signed-off-by: Alex Elder <elder@inktank.com>
---
 drivers/block/rbd.c |   26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 7b3d861..130dbc2 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -520,6 +520,7 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
 	u32 snap_count;
 	size_t len;
 	size_t size;
+	u32 i;

 	memset(header, 0, sizeof (*header));

@@ -535,6 +536,8 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
 	if (snap_count) {
 		u64 snap_names_len = le64_to_cpu(ondisk->snap_names_len);

+		/* Save a copy of the snapshot names */
+
 		BUG_ON(snap_names_len > (u64) SIZE_MAX);
 		header->snap_names = kmalloc(snap_names_len, GFP_KERNEL);
 		if (!header->snap_names)
@@ -542,10 +545,15 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
 		memcpy(header->snap_names, &ondisk->snaps[snap_count],
 			snap_names_len);

+		/* Record each snapshot's size */
+
 		size = snap_count * sizeof (*header->snap_sizes);
 		header->snap_sizes = kmalloc(size, GFP_KERNEL);
 		if (!header->snap_sizes)
 			goto out_err;
+		for (i = 0; i < snap_count; i++)
+			header->snap_sizes[i] =
+				le64_to_cpu(ondisk->snaps[i].image_size);
 	} else {
 		WARN_ON(ondisk->snap_names_len);
 		header->snap_names = NULL;
@@ -558,6 +566,8 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
 	header->comp_type = ondisk->options.comp_type;
 	header->total_snaps = snap_count;

+	/* Allocate and fill in the snapshot context */
+
 	size = sizeof (struct ceph_snap_context);
 	size += snap_count * sizeof (header->snapc->snaps[0]);
 	header->snapc = kzalloc(size, GFP_KERNEL);
@@ -567,19 +577,9 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
 	atomic_set(&header->snapc->nref, 1);
 	header->snapc->seq = le64_to_cpu(ondisk->snap_seq);
 	header->snapc->num_snaps = snap_count;
-
-	/* Fill in the snapshot information */
-
-	if (snap_count) {
-		u32 i;
-
-		for (i = 0; i < snap_count; i++) {
-			header->snapc->snaps[i] =
-				le64_to_cpu(ondisk->snaps[i].id);
-			header->snap_sizes[i] =
-				le64_to_cpu(ondisk->snaps[i].image_size);
-		}
-	}
+	for (i = 0; i < snap_count; i++)
+		header->snapc->snaps[i] =
+			le64_to_cpu(ondisk->snaps[i].id);

 	return 0;

-- 
1.7.9.5



* [PATCH 05/11] rbd: move rbd_opts to struct rbd_device
  2012-08-24 16:26 [PATCH 00/11] rbd: another set of patches Alex Elder
                   ` (3 preceding siblings ...)
  2012-08-24 16:33 ` [PATCH 04/11] rbd: more cleanup in rbd_header_from_disk() Alex Elder
@ 2012-08-24 16:33 ` Alex Elder
  2012-08-30 17:07   ` Yehuda Sadeh
  2012-08-24 16:34 ` [PATCH 06/11] rbd: add read_only rbd map option Alex Elder
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Alex Elder @ 2012-08-24 16:33 UTC (permalink / raw)
  To: ceph-devel

The rbd options don't really apply to the ceph client.  So don't
store a pointer to them in the rbd_client structure, and put them
(a struct, not a pointer) into the rbd_device structure proper.

Pass the rbd device structure to rbd_get_client() so it can
assign rbd_dev->rbd_client if successful, and have it return an
error code instead of the rbd client pointer.

Signed-off-by: Alex Elder <elder@inktank.com>
---
 drivers/block/rbd.c |   49 ++++++++++++++++++---------------------------------
 1 file changed, 16 insertions(+), 33 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 130dbc2..b40f553 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -98,7 +98,6 @@ struct rbd_options {
  */
 struct rbd_client {
 	struct ceph_client	*client;
-	struct rbd_options	*rbd_opts;
 	struct kref		kref;
 	struct list_head	node;
 };
@@ -152,6 +151,7 @@ struct rbd_device {
 	struct gendisk		*disk;		/* blkdev's gendisk and rq */
 	struct request_queue	*q;

+	struct rbd_options	rbd_opts;
 	struct rbd_client	*rbd_client;

 	char			name[DEV_NAME_LEN]; /* blkdev name, e.g. rbd3 */
@@ -273,8 +273,7 @@ static const struct block_device_operations rbd_bd_ops = {
  * Initialize an rbd client instance.
  * We own *ceph_opts.
  */
-static struct rbd_client *rbd_client_create(struct ceph_options *ceph_opts,
-					    struct rbd_options *rbd_opts)
+static struct rbd_client *rbd_client_create(struct ceph_options *ceph_opts)
 {
 	struct rbd_client *rbdc;
 	int ret = -ENOMEM;
@@ -298,8 +297,6 @@ static struct rbd_client *rbd_client_create(struct ceph_options *ceph_opts,
 	if (ret < 0)
 		goto out_err;

-	rbdc->rbd_opts = rbd_opts;
-
 	spin_lock(&rbd_client_list_lock);
 	list_add_tail(&rbdc->node, &rbd_client_list);
 	spin_unlock(&rbd_client_list_lock);
@@ -402,42 +399,33 @@ static int parse_rbd_opts_token(char *c, void *private)
  * Get a ceph client with specific addr and configuration, if one does
  * not exist create it.
  */
-static struct rbd_client *rbd_get_client(const char *mon_addr,
-					 size_t mon_addr_len,
-					 char *options)
+static int rbd_get_client(struct rbd_device *rbd_dev, const char *mon_addr,
+				size_t mon_addr_len, char *options)
 {
-	struct rbd_client *rbdc;
+	struct rbd_options *rbd_opts = &rbd_dev->rbd_opts;
 	struct ceph_options *ceph_opts;
-	struct rbd_options *rbd_opts;
-
-	rbd_opts = kzalloc(sizeof(*rbd_opts), GFP_KERNEL);
-	if (!rbd_opts)
-		return ERR_PTR(-ENOMEM);
+	struct rbd_client *rbdc;

 	rbd_opts->notify_timeout = RBD_NOTIFY_TIMEOUT_DEFAULT;

 	ceph_opts = ceph_parse_options(options, mon_addr,
 					mon_addr + mon_addr_len,
 					parse_rbd_opts_token, rbd_opts);
-	if (IS_ERR(ceph_opts)) {
-		kfree(rbd_opts);
-		return ERR_CAST(ceph_opts);
-	}
+	if (IS_ERR(ceph_opts))
+		return PTR_ERR(ceph_opts);

 	rbdc = rbd_client_find(ceph_opts);
 	if (rbdc) {
 		/* using an existing client */
 		ceph_destroy_options(ceph_opts);
-		kfree(rbd_opts);
-
-		return rbdc;
+	} else {
+		rbdc = rbd_client_create(ceph_opts);
+		if (IS_ERR(rbdc))
+			return PTR_ERR(rbdc);
 	}
+	rbd_dev->rbd_client = rbdc;

-	rbdc = rbd_client_create(ceph_opts, rbd_opts);
-	if (IS_ERR(rbdc))
-		kfree(rbd_opts);
-
-	return rbdc;
+	return 0;
 }

 /*
@@ -455,7 +443,6 @@ static void rbd_client_release(struct kref *kref)
 	spin_unlock(&rbd_client_list_lock);

 	ceph_destroy_client(rbdc->client);
-	kfree(rbdc->rbd_opts);
 	kfree(rbdc);
 }

@@ -2526,13 +2513,9 @@ static ssize_t rbd_add(struct bus_type *bus,
 	if (rc)
 		goto err_put_id;

-	rbd_dev->rbd_client = rbd_get_client(mon_addrs, mon_addrs_size - 1,
-						options);
-	if (IS_ERR(rbd_dev->rbd_client)) {
-		rc = PTR_ERR(rbd_dev->rbd_client);
-		rbd_dev->rbd_client = NULL;
+	rc = rbd_get_client(rbd_dev, mon_addrs, mon_addrs_size - 1, options);
+	if (rc < 0)
 		goto err_put_id;
-	}

 	/* pick the pool */
 	osdc = &rbd_dev->rbd_client->client->osdc;
-- 
1.7.9.5



* [PATCH 06/11] rbd: add read_only rbd map option
  2012-08-24 16:26 [PATCH 00/11] rbd: another set of patches Alex Elder
                   ` (4 preceding siblings ...)
  2012-08-24 16:33 ` [PATCH 05/11] rbd: move rbd_opts to struct rbd_device Alex Elder
@ 2012-08-24 16:34 ` Alex Elder
  2012-08-30 17:29   ` Yehuda Sadeh
  2012-09-06 15:36   ` [PATCH, v2 " Alex Elder
  2012-08-24 16:34 ` [PATCH 07/11] rbd: kill notify_timeout option Alex Elder
                   ` (6 subsequent siblings)
  12 siblings, 2 replies; 35+ messages in thread
From: Alex Elder @ 2012-08-24 16:34 UTC (permalink / raw)
  To: ceph-devel

Add the ability to map an rbd image read-only, by specifying either
"read_only" or "ro" as an option on the rbd "command line."

Signed-off-by: Alex Elder <elder@inktank.com>
---
 drivers/block/rbd.c |   22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index b40f553..5e22bb5 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -69,7 +69,8 @@
 #define DEV_NAME_LEN		32
 #define MAX_INT_FORMAT_WIDTH	((5 * sizeof (int)) / 2 + 1)

-#define RBD_NOTIFY_TIMEOUT_DEFAULT 10
+#define RBD_NOTIFY_TIMEOUT_DEFAULT	10
+#define RBD_READ_ONLY_DEFAULT		false

 /*
  * block device image metadata (in-memory version)
@@ -91,6 +92,7 @@ struct rbd_image_header {

 struct rbd_options {
 	int	notify_timeout;
+	bool	read_only;
 };

 /*
@@ -176,7 +178,7 @@ struct rbd_device {
 	u64                     snap_id;	/* current snapshot id */
 	/* whether the snap_id this device reads from still exists */
 	bool                    snap_exists;
-	int                     read_only;
+	bool			read_only;

 	struct list_head	node;

@@ -351,12 +353,18 @@ enum {
 	/* int args above */
 	Opt_last_string,
 	/* string args above */
+	Opt_read_only,
+	/* Boolean args above */
+	Opt_last_bool,
 };

 static match_table_t rbd_opts_tokens = {
 	{Opt_notify_timeout, "notify_timeout=%d"},
 	/* int args above */
 	/* string args above */
+	{Opt_read_only, "read_only"},
+	{Opt_read_only, "ro"},		/* Alternate spelling */
+	/* Boolean args above */
 	{-1, NULL}
 };

@@ -381,6 +389,8 @@ static int parse_rbd_opts_token(char *c, void *private)
 	} else if (token > Opt_last_int && token < Opt_last_string) {
 		dout("got string token %d val %s\n", token,
 		     argstr[0].from);
+	} else if (token > Opt_last_string && token < Opt_last_bool) {
+		dout("got Boolean token %d\n", token);
 	} else {
 		dout("got token %d\n", token);
 	}
@@ -389,6 +399,9 @@ static int parse_rbd_opts_token(char *c, void *private)
 	case Opt_notify_timeout:
 		rbd_opts->notify_timeout = intval;
 		break;
+	case Opt_read_only:
+		rbd_opts->read_only = !rbd_opts->read_only;
+		break;
 	default:
 		BUG_ON(token);
 	}
@@ -407,6 +420,7 @@ static int rbd_get_client(struct rbd_device *rbd_dev, const char *mon_addr,
 	struct rbd_client *rbdc;

 	rbd_opts->notify_timeout = RBD_NOTIFY_TIMEOUT_DEFAULT;
+	rbd_opts->read_only = RBD_READ_ONLY_DEFAULT;

 	ceph_opts = ceph_parse_options(options, mon_addr,
 					mon_addr + mon_addr_len,
@@ -613,7 +627,7 @@ static int rbd_header_set_snap(struct rbd_device *rbd_dev, u64 *size)
 		    sizeof (RBD_SNAP_HEAD_NAME))) {
 		rbd_dev->snap_id = CEPH_NOSNAP;
 		rbd_dev->snap_exists = false;
-		rbd_dev->read_only = 0;
+		rbd_dev->read_only = rbd_dev->rbd_opts.read_only;
 		if (size)
 			*size = rbd_dev->header.image_size;
 	} else {
@@ -625,7 +639,7 @@ static int rbd_header_set_snap(struct rbd_device *rbd_dev, u64 *size)
 			goto done;
 		rbd_dev->snap_id = snap_id;
 		rbd_dev->snap_exists = true;
-		rbd_dev->read_only = 1;
+		rbd_dev->read_only = true;	/* No choice for snapshots */
 	}

 	ret = 0;
-- 
1.7.9.5



* [PATCH 07/11] rbd: kill notify_timeout option
  2012-08-24 16:26 [PATCH 00/11] rbd: another set of patches Alex Elder
                   ` (5 preceding siblings ...)
  2012-08-24 16:34 ` [PATCH 06/11] rbd: add read_only rbd map option Alex Elder
@ 2012-08-24 16:34 ` Alex Elder
  2012-08-30 17:31   ` Yehuda Sadeh
  2012-08-24 16:35 ` [PATCH 08/11] rbd: bio_chain_clone() cleanups Alex Elder
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Alex Elder @ 2012-08-24 16:34 UTC (permalink / raw)
  To: ceph-devel

The "notify_timeout" rbd device option is never used, so get rid of
it.

Signed-off-by: Alex Elder <elder@inktank.com>
---
 drivers/block/rbd.c |    8 --------
 1 file changed, 8 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 5e22bb5..e94336d 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -69,7 +69,6 @@
 #define DEV_NAME_LEN		32
 #define MAX_INT_FORMAT_WIDTH	((5 * sizeof (int)) / 2 + 1)

-#define RBD_NOTIFY_TIMEOUT_DEFAULT	10
 #define RBD_READ_ONLY_DEFAULT		false

 /*
@@ -91,7 +90,6 @@ struct rbd_image_header {
 };

 struct rbd_options {
-	int	notify_timeout;
 	bool	read_only;
 };

@@ -348,7 +346,6 @@ static struct rbd_client *rbd_client_find(struct ceph_options *ceph_opts)
  * mount options
  */
 enum {
-	Opt_notify_timeout,
 	Opt_last_int,
 	/* int args above */
 	Opt_last_string,
@@ -359,7 +356,6 @@ enum {
 };

 static match_table_t rbd_opts_tokens = {
-	{Opt_notify_timeout, "notify_timeout=%d"},
 	/* int args above */
 	/* string args above */
 	{Opt_read_only, "read_only"},
@@ -396,9 +392,6 @@ static int parse_rbd_opts_token(char *c, void *private)
 	}

 	switch (token) {
-	case Opt_notify_timeout:
-		rbd_opts->notify_timeout = intval;
-		break;
 	case Opt_read_only:
 		rbd_opts->read_only = !rbd_opts->read_only;
 		break;
@@ -419,7 +412,6 @@ static int rbd_get_client(struct rbd_device *rbd_dev, const char *mon_addr,
 	struct ceph_options *ceph_opts;
 	struct rbd_client *rbdc;

-	rbd_opts->notify_timeout = RBD_NOTIFY_TIMEOUT_DEFAULT;
 	rbd_opts->read_only = RBD_READ_ONLY_DEFAULT;

 	ceph_opts = ceph_parse_options(options, mon_addr,
-- 
1.7.9.5



* [PATCH 08/11] rbd: bio_chain_clone() cleanups
  2012-08-24 16:26 [PATCH 00/11] rbd: another set of patches Alex Elder
                   ` (6 preceding siblings ...)
  2012-08-24 16:34 ` [PATCH 07/11] rbd: kill notify_timeout option Alex Elder
@ 2012-08-24 16:35 ` Alex Elder
  2012-08-30 17:40   ` Yehuda Sadeh
  2012-08-24 16:35 ` [PATCH 09/11] rbd: drop needless test in rbd_rq_fn() Alex Elder
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Alex Elder @ 2012-08-24 16:35 UTC (permalink / raw)
  To: ceph-devel

In bio_chain_clone(), at the end of the function the bi_next field
of the tail of the new bio chain is nulled.  This isn't necessary,
because if "tail" is non-null, its value will be the last bio
structure allocated at the top of the while loop in that function.
And before that structure is added to the end of the new chain, its
bi_next pointer is always made null.

While touching that function, clean up a few other things:
    - define each local variable on its own line
    - move the definition of "tmp" to an inner scope
    - move the modification of gfpmask closer to where it's used
    - rearrange the logic that sets the chain's tail pointer

Signed-off-by: Alex Elder <elder@inktank.com>
---
 drivers/block/rbd.c |   21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index e94336d..bb8dad7 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -741,7 +741,9 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
 				   struct bio_pair **bp,
 				   int len, gfp_t gfpmask)
 {
-	struct bio *tmp, *old_chain = *old, *new_chain = NULL, *tail = NULL;
+	struct bio *old_chain = *old;
+	struct bio *new_chain = NULL;
+	struct bio *tail;
 	int total = 0;

 	if (*bp) {
@@ -750,9 +752,12 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
 	}

 	while (old_chain && (total < len)) {
+		struct bio *tmp;
+
 		tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
 		if (!tmp)
 			goto err_out;
+		gfpmask &= ~__GFP_WAIT;	/* can't wait after the first */

 		if (total + old_chain->bi_size > len) {
 			struct bio_pair *bp;
@@ -780,15 +785,12 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
 		}

 		tmp->bi_bdev = NULL;
-		gfpmask &= ~__GFP_WAIT;
 		tmp->bi_next = NULL;
-
-		if (!new_chain) {
-			new_chain = tail = tmp;
-		} else {
+		if (new_chain)
 			tail->bi_next = tmp;
-			tail = tmp;
-		}
+		else
+			new_chain = tmp;
+		tail = tmp;
 		old_chain = old_chain->bi_next;

 		total += tmp->bi_size;
@@ -796,9 +798,6 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,

 	BUG_ON(total < len);

-	if (tail)
-		tail->bi_next = NULL;
-
 	*old = old_chain;

 	return new_chain;
-- 
1.7.9.5



* [PATCH 09/11] rbd: drop needless test in rbd_rq_fn()
  2012-08-24 16:26 [PATCH 00/11] rbd: another set of patches Alex Elder
                   ` (7 preceding siblings ...)
  2012-08-24 16:35 ` [PATCH 08/11] rbd: bio_chain_clone() cleanups Alex Elder
@ 2012-08-24 16:35 ` Alex Elder
  2012-08-30 17:41   ` Yehuda Sadeh
  2012-08-24 16:36 ` [PATCH 10/11] rbd: check for overflow in rbd_get_num_segments() Alex Elder
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Alex Elder @ 2012-08-24 16:35 UTC (permalink / raw)
  To: ceph-devel

There's a test for a null rq pointer inside the while loop in
rbd_rq_fn() that's not needed.  That same test already occurs
in the immediately preceding loop condition.

Signed-off-by: Alex Elder <elder@inktank.com>
---
 drivers/block/rbd.c |    4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index bb8dad7..fad4ecb 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -1462,10 +1462,6 @@ static void rbd_rq_fn(struct request_queue *q)
 		struct rbd_req_coll *coll;
 		struct ceph_snap_context *snapc;

-		/* peek at request from block layer */
-		if (!rq)
-			break;
-
 		dout("fetched request\n");

 		/* filter out block requests we don't understand */
-- 
1.7.9.5



* [PATCH 10/11] rbd: check for overflow in rbd_get_num_segments()
  2012-08-24 16:26 [PATCH 00/11] rbd: another set of patches Alex Elder
                   ` (8 preceding siblings ...)
  2012-08-24 16:35 ` [PATCH 09/11] rbd: drop needless test in rbd_rq_fn() Alex Elder
@ 2012-08-24 16:36 ` Alex Elder
  2012-08-30 17:50   ` Yehuda Sadeh
  2012-08-24 16:36 ` [PATCH 11/11] rbd: split up rbd_get_segment() Alex Elder
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 35+ messages in thread
From: Alex Elder @ 2012-08-24 16:36 UTC (permalink / raw)
  To: ceph-devel

It is possible in rbd_get_num_segments() for an overflow to occur
when adding the offset and length.  This is easily avoided.

Since the function returns an int and the one caller is already
prepared to handle errors, have it return -ERANGE if overflow would
occur.

The overflow check would not work if a zero-length request was
being tested, so short-circuit that case, returning 0 for the
number of segments required.  (This condition might be avoided
elsewhere already, I don't know.)

Have the caller end the request if either an error or 0 is returned.
The returned value is passed to __blk_end_request_all(), meaning
a 0-length request is not treated as an error.
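
The heart of the check, pulled out as a standalone sketch (the
helper name here is made up; U64_MAX is the macro this patch
defines):

    /* True if ofs + len - 1 would wrap a u64.  The caller has
     * already ruled out len == 0, so len - 1 cannot itself wrap.
     */
    static bool rbd_range_overflows(u64 ofs, u64 len)
    {
    	return len - 1 > U64_MAX - ofs;
    }

For example, ofs == U64_MAX and len == 2 gives 1 > 0, correctly
flagging a range whose last byte offset would wrap to 0.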

Signed-off-by: Alex Elder <elder@inktank.com>
---
 drivers/block/rbd.c |   23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index fad4ecb..b649446 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -50,6 +50,10 @@
 #define	SECTOR_SHIFT	9
 #define	SECTOR_SIZE	(1ULL << SECTOR_SHIFT)

+/* It might be useful to have this defined elsewhere too */
+
+#define	U64_MAX	((u64) (~0ULL))
+
 #define RBD_DRV_NAME "rbd"
 #define RBD_DRV_NAME_LONG "rbd (rados block device)"

@@ -678,8 +682,17 @@ static u64 rbd_get_segment(struct rbd_image_header *header,
 static int rbd_get_num_segments(struct rbd_image_header *header,
 				u64 ofs, u64 len)
 {
-	u64 start_seg = ofs >> header->obj_order;
-	u64 end_seg = (ofs + len - 1) >> header->obj_order;
+	u64 start_seg;
+	u64 end_seg;
+
+	if (!len)
+		return 0;
+	if (len - 1 > U64_MAX - ofs)
+		return -ERANGE;
+
+	start_seg = ofs >> header->obj_order;
+	end_seg = (ofs + len - 1) >> header->obj_order;
+
 	return end_seg - start_seg + 1;
 }

@@ -1502,6 +1515,12 @@ static void rbd_rq_fn(struct request_queue *q)
 		     size, (unsigned long long) blk_rq_pos(rq) * SECTOR_SIZE);

 		num_segs = rbd_get_num_segments(&rbd_dev->header, ofs, size);
+		if (num_segs <= 0) {
+			spin_lock_irq(q->queue_lock);
+			__blk_end_request_all(rq, num_segs);
+			ceph_put_snap_context(snapc);
+			continue;
+		}
 		coll = rbd_alloc_coll(num_segs);
 		if (!coll) {
 			spin_lock_irq(q->queue_lock);
-- 
1.7.9.5



* [PATCH 11/11] rbd: split up rbd_get_segment()
  2012-08-24 16:26 [PATCH 00/11] rbd: another set of patches Alex Elder
                   ` (9 preceding siblings ...)
  2012-08-24 16:36 ` [PATCH 10/11] rbd: check for overflow in rbd_get_num_segments() Alex Elder
@ 2012-08-24 16:36 ` Alex Elder
  2012-08-30 18:03   ` Yehuda Sadeh
  2012-08-30 12:32 ` [PATCH 00/11] rbd: another set of patches Alex Elder
  2012-09-06 15:34 ` Alex Elder
  12 siblings, 1 reply; 35+ messages in thread
From: Alex Elder @ 2012-08-24 16:36 UTC (permalink / raw)
  To: ceph-devel

There are two places where rbd_get_segment() is called.  One, in
rbd_rq_fn(), only needs to know the length within a segment to
which an I/O request should be limited.  The other, in rbd_do_op(),
also needs the name of the object and the offset within it for the
I/O request.

Split rbd_get_segment() into three dedicated functions:
    - rbd_segment_name() allocates and formats the name of the
      object for a segment containing a given rbd image offset
    - rbd_segment_offset() computes the offset within a segment for
      a given rbd image offset
    - rbd_segment_length() computes the length to use for I/O within
      a segment for a request, not to exceed the end of a segment
      object.

In the new functions, be a bit more careful, checking for possible
error conditions:
    - watch for errors or overflows returned by snprintf()
    - catch (using BUG_ON()) potential overflow conditions
      when computing segment length
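
As a worked example of the arithmetic these helpers perform (the
numbers are made up; obj_order 22 means 4 MiB segment objects):

    /* With obj_order = 22, an image offset of 0x00c01234 gives:
     *
     *    segment              = 0x00c01234 >> 22      = 3
     *    rbd_segment_offset() = 0x00c01234 & 0x3fffff = 0x1234
     *
     * so the I/O maps to object "<object_prefix>.000000000003" at
     * offset 0x1234, and rbd_segment_length() caps a request
     * starting there at 0x400000 - 0x1234 bytes.
     */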

Signed-off-by: Alex Elder <elder@inktank.com>
---
 drivers/block/rbd.c |   66 +++++++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 40 insertions(+), 26 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index b649446..3a79e58 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -656,27 +656,47 @@ static void rbd_header_free(struct rbd_image_header *header)
 	header->snapc = NULL;
 }

-/*
- * get the actual striped segment name, offset and length
- */
-static u64 rbd_get_segment(struct rbd_image_header *header,
-			   const char *object_prefix,
-			   u64 ofs, u64 len,
-			   char *seg_name, u64 *segofs)
+static char *rbd_segment_name(struct rbd_device *rbd_dev, u64 offset)
+{
+	char *name;
+	u64 segment;
+	int ret;
+
+	name = kmalloc(RBD_MAX_SEG_NAME_LEN + 1, GFP_NOIO);
+	if (!name)
+		return NULL;
+	segment = offset >> rbd_dev->header.obj_order;
+	ret = snprintf(name, RBD_MAX_SEG_NAME_LEN, "%s.%012llx",
+			rbd_dev->header.object_prefix, segment);
+	if (ret < 0 || ret >= RBD_MAX_SEG_NAME_LEN) {
+		pr_err("error formatting segment name for #%llu (%d)\n",
+			segment, ret);
+		kfree(name);
+		name = NULL;
+	}
+
+	return name;
+}
+
+static u64 rbd_segment_offset(struct rbd_device *rbd_dev, u64 offset)
 {
-	u64 seg = ofs >> header->obj_order;
+	u64 segment_size = (u64) 1 << rbd_dev->header.obj_order;

-	if (seg_name)
-		snprintf(seg_name, RBD_MAX_SEG_NAME_LEN,
-			 "%s.%012llx", object_prefix, seg);
+	return offset & (segment_size - 1);
+}
+
+static u64 rbd_segment_length(struct rbd_device *rbd_dev,
+				u64 offset, u64 length)
+{
+	u64 segment_size = (u64) 1 << rbd_dev->header.obj_order;

-	ofs = ofs & ((1 << header->obj_order) - 1);
-	len = min_t(u64, len, (1 << header->obj_order) - ofs);
+	offset &= segment_size - 1;

-	if (segofs)
-		*segofs = ofs;
+	BUG_ON(length > U64_MAX - offset);
+	if (offset + length > segment_size)
+		length = segment_size - offset;

-	return len;
+	return length;
 }

 static int rbd_get_num_segments(struct rbd_image_header *header,
@@ -1114,14 +1134,11 @@ static int rbd_do_op(struct request *rq,
 	struct ceph_osd_req_op *ops;
 	u32 payload_len;

-	seg_name = kmalloc(RBD_MAX_SEG_NAME_LEN + 1, GFP_NOIO);
+	seg_name = rbd_segment_name(rbd_dev, ofs);
 	if (!seg_name)
 		return -ENOMEM;
-
-	seg_len = rbd_get_segment(&rbd_dev->header,
-				  rbd_dev->header.object_prefix,
-				  ofs, len,
-				  seg_name, &seg_ofs);
+	seg_len = rbd_segment_length(rbd_dev, ofs, len);
+	seg_ofs = rbd_segment_offset(rbd_dev, ofs);

 	payload_len = (flags & CEPH_OSD_FLAG_WRITE ? seg_len : 0);

@@ -1532,10 +1549,7 @@ static void rbd_rq_fn(struct request_queue *q)
 		do {
 			/* a bio clone to be passed down to OSD req */
 			dout("rq->bio->bi_vcnt=%hu\n", rq->bio->bi_vcnt);
-			op_size = rbd_get_segment(&rbd_dev->header,
-						  rbd_dev->header.object_prefix,
-						  ofs, size,
-						  NULL, NULL);
+			op_size = rbd_segment_length(rbd_dev, ofs, size);
 			kref_get(&coll->kref);
 			bio = bio_chain_clone(&rq_bio, &next_bio, &bp,
 					      op_size, GFP_ATOMIC);
-- 
1.7.9.5



* Re: [PATCH 00/11] rbd: another set of patches
  2012-08-24 16:26 [PATCH 00/11] rbd: another set of patches Alex Elder
                   ` (10 preceding siblings ...)
  2012-08-24 16:36 ` [PATCH 11/11] rbd: split up rbd_get_segment() Alex Elder
@ 2012-08-30 12:32 ` Alex Elder
  2012-09-06 15:34 ` Alex Elder
  12 siblings, 0 replies; 35+ messages in thread
From: Alex Elder @ 2012-08-30 12:32 UTC (permalink / raw)
  To: ceph-devel

On 08/24/2012 11:26 AM, Alex Elder wrote:
> I keep moving simple and simplifying patches to the front of
> my patch queue.  This is my latest set of things that are
> ready to get reviewed and committed.

Anyone have time to review these?  They're mostly very simple.
Thanks.

					-Alex

> [PATCH 01/11] rbd: handle locking inside __rbd_client_find()
>     This simplifies a little code.
> [PATCH 02/11] rbd: don't over-allocate space for object prefix
>     Don't allocate the max required space for the object prefix.
>     This also ensures that the object prefix is either terminated
>     with '\0' or else is maximally-sized.
> [PATCH 03/11] rbd: kill incore snap_names_len
>     Eliminate an unnecessary field in struct rbd_image_header.
> [PATCH 04/11] rbd: more cleanup in rbd_header_from_disk()
>     Rearrange some code, for better logical grouping.
> [PATCH 05/11] rbd: move rbd_opts to struct rbd_device
>     Tie rbd-specific options to the rbd_device, not the client.
> [PATCH 06/11] rbd: add read_only rbd map option
> [PATCH 07/11] rbd: kill notify_timeout option
>     These two were done together, in this order, to preserve the
>     argument parsing code.  The first adds the ability to map an
>     rbd image read-only.  The second eliminates the only other
>     rbd option, which is otherwise unused.
> [PATCH 08/11] rbd: bio_chain_clone() cleanups
>     Simple refactoring.
> [PATCH 09/11] rbd: drop needless test in rbd_rq_fn()
>     Trivial.
> [PATCH 10/11] rbd: check for overflow in rbd_get_num_segments()
>     Make sure we don't exceed 64-bit maximum when adding offset
>     and length.
> [PATCH 11/11] rbd: split up rbd_get_segment()
>     Refactor rbd_get_segment() into three separate functions,
>     which return a segment name, offset, and length.
> 



* Re: [PATCH 01/11] rbd: handle locking inside __rbd_client_find()
  2012-08-24 16:32 ` [PATCH 01/11] rbd: handle locking inside __rbd_client_find() Alex Elder
@ 2012-08-30 16:09   ` Yehuda Sadeh
  0 siblings, 0 replies; 35+ messages in thread
From: Yehuda Sadeh @ 2012-08-30 16:09 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

On Fri, Aug 24, 2012 at 9:32 AM, Alex Elder <elder@inktank.com> wrote:
>
> There is only one caller of __rbd_client_find(), and it somewhat
> clumsily takes the appropriate lock and gets a reference to the
> existing rbd_client structure if one is found.
>
> Instead, have that function handle its own locking, and acquire the
> reference if found while it holds the lock.  Drop the underscores
> from the name because there's no need to signify anything special
> about this function.
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
>  drivers/block/rbd.c |   29 ++++++++++++++++-------------
>  1 file changed, 16 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 8e6e29e..81b5344 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -322,19 +322,28 @@ out_opt:
>  }
>
>  /*
> - * Find a ceph client with specific addr and configuration.
> + * Find a ceph client with specific addr and configuration.  If
> + * found, bump its reference count.
>   */
> -static struct rbd_client *__rbd_client_find(struct ceph_options *ceph_opts)
> +static struct rbd_client *rbd_client_find(struct ceph_options *ceph_opts)
>  {
>         struct rbd_client *client_node;
> +       bool found = false;
>
>         if (ceph_opts->flags & CEPH_OPT_NOSHARE)
>                 return NULL;
>
> -       list_for_each_entry(client_node, &rbd_client_list, node)
> -               if (!ceph_compare_options(ceph_opts, client_node->client))
> -                       return client_node;
> -       return NULL;
> +       spin_lock(&rbd_client_list_lock);
> +       list_for_each_entry(client_node, &rbd_client_list, node) {
> +               if (!ceph_compare_options(ceph_opts, client_node->client)) {
> +                       kref_get(&client_node->kref);
> +                       found = true;
> +                       break;
> +               }
> +       }
> +       spin_unlock(&rbd_client_list_lock);
> +
> +       return found ? client_node : NULL;
>  }
>
>  /*
> @@ -416,22 +425,16 @@ static struct rbd_client *rbd_get_client(const char *mon_addr,
>                 return ERR_CAST(ceph_opts);
>         }
>
> -       spin_lock(&rbd_client_list_lock);
> -       rbdc = __rbd_client_find(ceph_opts);
> +       rbdc = rbd_client_find(ceph_opts);
>         if (rbdc) {
>                 /* using an existing client */
> -               kref_get(&rbdc->kref);
> -               spin_unlock(&rbd_client_list_lock);
> -
>                 ceph_destroy_options(ceph_opts);
>                 kfree(rbd_opts);
>
>                 return rbdc;
>         }
> -       spin_unlock(&rbd_client_list_lock);
>
>         rbdc = rbd_client_create(ceph_opts, rbd_opts);
> -
>         if (IS_ERR(rbdc))
>                 kfree(rbd_opts);
>
> --
> 1.7.9.5
>


* Re: [PATCH 02/11] rbd: don't over-allocate space for object prefix
  2012-08-24 16:32 ` [PATCH 02/11] rbd: don't over-allocate space for object prefix Alex Elder
@ 2012-08-30 16:18   ` Yehuda Sadeh
  0 siblings, 0 replies; 35+ messages in thread
From: Yehuda Sadeh @ 2012-08-30 16:18 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

On Fri, Aug 24, 2012 at 9:32 AM, Alex Elder <elder@inktank.com> wrote:
> In rbd_header_from_disk() the object prefix buffer is sized based on
> the maximum size its block_name equivalent on disk could be.
>
> Instead, only allocate enough to hold a NUL-terminated copy of the
> string from the on-disk header--or the maximum size if no NUL is
> found.
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
>  drivers/block/rbd.c |    9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 81b5344..a8a4cba 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -519,18 +519,19 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
>                                  struct rbd_image_header_ondisk *ondisk)
>  {
>         u32 snap_count;
> +       size_t len;
>         size_t size;
>
>         memset(header, 0, sizeof (*header));
>
>         snap_count = le32_to_cpu(ondisk->snap_count);
>
> -       size = sizeof (ondisk->block_name) + 1;
> -       header->object_prefix = kmalloc(size, GFP_KERNEL);
> +       len = strnlen(ondisk->block_name, sizeof (ondisk->block_name));
> +       header->object_prefix = kmalloc(len + 1, GFP_KERNEL);
>         if (!header->object_prefix)
>                 return -ENOMEM;
> -       memcpy(header->object_prefix, ondisk->block_name, size - 1);
> -       header->object_prefix[size - 1] = '\0';
> +       memcpy(header->object_prefix, ondisk->block_name, len);
> +       header->object_prefix[len] = '\0';
>
>         if (snap_count) {
>                 header->snap_names_len = le64_to_cpu(ondisk->snap_names_len);
> --
> 1.7.9.5
>


* Re: [PATCH 03/11] rbd: kill incore snap_names_len
  2012-08-24 16:33 ` [PATCH 03/11] rbd: kill incore snap_names_len Alex Elder
@ 2012-08-30 16:24   ` Yehuda Sadeh
  2012-08-30 16:41     ` Alex Elder
  2012-09-06 15:36   ` [PATCH, v2 " Alex Elder
  1 sibling, 1 reply; 35+ messages in thread
From: Yehuda Sadeh @ 2012-08-30 16:24 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

On Fri, Aug 24, 2012 at 9:33 AM, Alex Elder <elder@inktank.com> wrote:
> The only thing the on-disk snap_names_len field is used for is to
> size the buffer allocated to hold a copy of the snapshot names for
> an rbd image.
>
> Don't bother saving it in the in-core rbd_image_header structure.
> Just use a local variable to hold the required buffer size while
> it's needed.
>
> Move the code that actually copies the snapshot names up closer
> to where the required length is saved.
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
>  drivers/block/rbd.c |   19 ++++++-------------
>  1 file changed, 6 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index a8a4cba..7b3d861 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -81,7 +81,6 @@ struct rbd_image_header {
>         __u8 crypt_type;
>         __u8 comp_type;
>         struct ceph_snap_context *snapc;
> -       u64 snap_names_len;
>         u32 total_snaps;
>
>         char *snap_names;
> @@ -534,12 +533,14 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
>         header->object_prefix[len] = '\0';
>
>         if (snap_count) {
> -               header->snap_names_len = le64_to_cpu(ondisk->snap_names_len);
> -               BUG_ON(header->snap_names_len > (u64) SIZE_MAX);
> -               header->snap_names = kmalloc(header->snap_names_len,
> -                                            GFP_KERNEL);
> +               u64 snap_names_len = le64_to_cpu(ondisk->snap_names_len);
> +
> +               BUG_ON(snap_names_len > (u64) SIZE_MAX);

Should we get rid of this BUG_ON and return -EIO instead?

> +               header->snap_names = kmalloc(snap_names_len, GFP_KERNEL);
>                 if (!header->snap_names)
>                         goto out_err;
> +               memcpy(header->snap_names, &ondisk->snaps[snap_count],
> +                       snap_names_len);

I think we're missing a check here to verify that we don't exceed the
ondisk buffer.


>
>                 size = snap_count * sizeof (*header->snap_sizes);
>                 header->snap_sizes = kmalloc(size, GFP_KERNEL);
> @@ -547,7 +548,6 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
>                         goto out_err;
>         } else {
>                 WARN_ON(ondisk->snap_names_len);
> -               header->snap_names_len = 0;
>                 header->snap_names = NULL;
>                 header->snap_sizes = NULL;
>         }
> @@ -579,10 +579,6 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
>                         header->snap_sizes[i] =
>                                 le64_to_cpu(ondisk->snaps[i].image_size);
>                 }
> -
> -               /* copy snapshot names */
> -               memcpy(header->snap_names, &ondisk->snaps[snap_count],
> -                       header->snap_names_len);
>         }
>
>         return 0;
> @@ -592,7 +588,6 @@ out_err:
>         header->snap_sizes = NULL;
>         kfree(header->snap_names);
>         header->snap_names = NULL;
> -       header->snap_names_len = 0;
>         kfree(header->object_prefix);
>         header->object_prefix = NULL;
>
> @@ -660,7 +655,6 @@ static void rbd_header_free(struct rbd_image_header *header)
>         header->snap_sizes = NULL;
>         kfree(header->snap_names);
>         header->snap_names = NULL;
> -       header->snap_names_len = 0;
>         ceph_put_snap_context(header->snapc);
>         header->snapc = NULL;
>  }
> @@ -1800,7 +1794,6 @@ static int __rbd_refresh_header(struct rbd_device *rbd_dev, u64 *hver)
>         rbd_dev->header.total_snaps = h.total_snaps;
>         rbd_dev->header.snapc = h.snapc;
>         rbd_dev->header.snap_names = h.snap_names;
> -       rbd_dev->header.snap_names_len = h.snap_names_len;
>         rbd_dev->header.snap_sizes = h.snap_sizes;
>         /* Free the extra copy of the object prefix */
>         WARN_ON(strcmp(rbd_dev->header.object_prefix, h.object_prefix));
> --
> 1.7.9.5
>


* Re: [PATCH 03/11] rbd: kill incore snap_names_len
  2012-08-30 16:24   ` Yehuda Sadeh
@ 2012-08-30 16:41     ` Alex Elder
  0 siblings, 0 replies; 35+ messages in thread
From: Alex Elder @ 2012-08-30 16:41 UTC (permalink / raw)
  To: Yehuda Sadeh; +Cc: ceph-devel

On 08/30/2012 11:24 AM, Yehuda Sadeh wrote:
> On Fri, Aug 24, 2012 at 9:33 AM, Alex Elder <elder@inktank.com> wrote:
>> The only thing the on-disk snap_names_len field is used for is to
>> size the buffer allocated to hold a copy of the snapshot names for
>> an rbd image.
>>
>> Don't bother saving it in the in-core rbd_image_header structure.
>> Just use a local variable to hold the required buffer size while
>> it's needed.
>>
>> Move the code that actually copies the snapshot names up closer
>> to where the required length is saved.
>>
>> Signed-off-by: Alex Elder <elder@inktank.com>
>> ---
>>  drivers/block/rbd.c |   19 ++++++-------------
>>  1 file changed, 6 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>> index a8a4cba..7b3d861 100644
>> --- a/drivers/block/rbd.c
>> +++ b/drivers/block/rbd.c
>> @@ -81,7 +81,6 @@ struct rbd_image_header {
>>         __u8 crypt_type;
>>         __u8 comp_type;
>>         struct ceph_snap_context *snapc;
>> -       u64 snap_names_len;
>>         u32 total_snaps;
>>
>>         char *snap_names;
>> @@ -534,12 +533,14 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
>>         header->object_prefix[len] = '\0';
>>
>>         if (snap_count) {
>> -               header->snap_names_len = le64_to_cpu(ondisk->snap_names_len);
>> -               BUG_ON(header->snap_names_len > (u64) SIZE_MAX);
>> -               header->snap_names = kmalloc(header->snap_names_len,
>> -                                            GFP_KERNEL);
>> +               u64 snap_names_len = le64_to_cpu(ondisk->snap_names_len);
>> +
>> +               BUG_ON(snap_names_len > (u64) SIZE_MAX);
> 
> Should we get rid of this BUG_ON and return -EIO instead?

Yes.  I've been using BUG_ON() a bit freely and I'd really like
to define a CEPH_ASSERT() or something instead.  In this case I
can return an error value so I will.

I'll define CEPH_ASSERT() at some point, separately so I can
state assumptions without potentially killing the kernel.
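
Roughly something like this, perhaps (just a sketch, nothing
committed anywhere):

    /* Hypothetical: report a violated assumption without
     * killing the machine the way BUG_ON() does.
     */
    #define CEPH_ASSERT(expr)						\
    	do {								\
    		if (unlikely(!(expr)))					\
    			WARN(1, "ceph: assertion \"%s\" failed\n",	\
    				#expr);					\
    	} while (0)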


>> +               header->snap_names = kmalloc(snap_names_len, GFP_KERNEL);
>>                 if (!header->snap_names)
>>                         goto out_err;
>> +               memcpy(header->snap_names, &ondisk->snaps[snap_count],
>> +                       snap_names_len);
> 
> I think we're missing a check here to verify that we don't exceed the
> ondisk buffer

I believe in this case we're OK, though it's not obvious looking
only at this code.  (I'm looking at code with all these patches
applied at the moment, so I may need to reorder the patches for the
following explanation to be valid.)

The ondisk buffer passed in here came from a successful call to
what's now called rbd_dev_v1_header_read(), which reads the
header in *only*, and only succeeds if it got everything it
was looking for--namely the snapshot context, plus the size
of the set of names that the context says will follow.  It
will furthermore re-read the header if the count of snapshots
found in what got read differed from what we wanted it to be.

Therefore the ondisk header+names buffer here will always be
self-consistent--there will be room in the buffer for exactly
the number of bytes to hold the set of NUL-terminated snapshot
names.

If I could find an easy way to verify this here I would, but
right now the size of the ondisk buffer passed in is not
available.  If you think it's important I can make changes
so it is though.
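
For illustration, if the allocated size were passed in alongside
the buffer (the "buf_size" parameter below is hypothetical), the
check could look something like:

    u64 names_len = le64_to_cpu(ondisk->snap_names_len);
    u64 needed = sizeof (*ondisk) +
    		(u64) snap_count * sizeof (ondisk->snaps[0]) +
    		names_len;

    if (needed > (u64) buf_size)
    	return -EIO;	/* on-disk header is inconsistent */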

And in any case, before I commit anything I'll take another
look at this patch and maybe reorder things so the above
argument holds water.

					-Alex

>>
>>                 size = snap_count * sizeof (*header->snap_sizes);
>>                 header->snap_sizes = kmalloc(size, GFP_KERNEL);
>> @@ -547,7 +548,6 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
>>                         goto out_err;
>>         } else {
>>                 WARN_ON(ondisk->snap_names_len);
>> -               header->snap_names_len = 0;
>>                 header->snap_names = NULL;
>>                 header->snap_sizes = NULL;
>>         }
>> @@ -579,10 +579,6 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
>>                         header->snap_sizes[i] =
>>                                 le64_to_cpu(ondisk->snaps[i].image_size);
>>                 }
>> -
>> -               /* copy snapshot names */
>> -               memcpy(header->snap_names, &ondisk->snaps[snap_count],
>> -                       header->snap_names_len);
>>         }
>>
>>         return 0;
>> @@ -592,7 +588,6 @@ out_err:
>>         header->snap_sizes = NULL;
>>         kfree(header->snap_names);
>>         header->snap_names = NULL;
>> -       header->snap_names_len = 0;
>>         kfree(header->object_prefix);
>>         header->object_prefix = NULL;
>>
>> @@ -660,7 +655,6 @@ static void rbd_header_free(struct rbd_image_header *header)
>>         header->snap_sizes = NULL;
>>         kfree(header->snap_names);
>>         header->snap_names = NULL;
>> -       header->snap_names_len = 0;
>>         ceph_put_snap_context(header->snapc);
>>         header->snapc = NULL;
>>  }
>> @@ -1800,7 +1794,6 @@ static int __rbd_refresh_header(struct rbd_device *rbd_dev, u64 *hver)
>>         rbd_dev->header.total_snaps = h.total_snaps;
>>         rbd_dev->header.snapc = h.snapc;
>>         rbd_dev->header.snap_names = h.snap_names;
>> -       rbd_dev->header.snap_names_len = h.snap_names_len;
>>         rbd_dev->header.snap_sizes = h.snap_sizes;
>>         /* Free the extra copy of the object prefix */
>>         WARN_ON(strcmp(rbd_dev->header.object_prefix, h.object_prefix));
>> --
>> 1.7.9.5
>>
> 
> 



* Re: [PATCH 04/11] rbd: more cleanup in rbd_header_from_disk()
  2012-08-24 16:33 ` [PATCH 04/11] rbd: more cleanup in rbd_header_from_disk() Alex Elder
@ 2012-08-30 16:48   ` Yehuda Sadeh
  0 siblings, 0 replies; 35+ messages in thread
From: Yehuda Sadeh @ 2012-08-30 16:48 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

On Fri, Aug 24, 2012 at 9:33 AM, Alex Elder <elder@inktank.com> wrote:
> This just rearranges things a bit more in rbd_header_from_disk()
> so that the snapshot sizes are initialized right after the buffer
> to hold them is allocated, and does a little further consolidation
> that follows from that.  It also adds a few simple comments.
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
>  drivers/block/rbd.c |   26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 7b3d861..130dbc2 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -520,6 +520,7 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
>         u32 snap_count;
>         size_t len;
>         size_t size;
> +       u32 i;
>
>         memset(header, 0, sizeof (*header));
>
> @@ -535,6 +536,8 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
>         if (snap_count) {
>                 u64 snap_names_len = le64_to_cpu(ondisk->snap_names_len);
>
> +               /* Save a copy of the snapshot names */
> +
>                 BUG_ON(snap_names_len > (u64) SIZE_MAX);
>                 header->snap_names = kmalloc(snap_names_len, GFP_KERNEL);
>                 if (!header->snap_names)
> @@ -542,10 +545,15 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
>                 memcpy(header->snap_names, &ondisk->snaps[snap_count],
>                         snap_names_len);
>
> +               /* Record each snapshot's size */
> +
>                 size = snap_count * sizeof (*header->snap_sizes);
>                 header->snap_sizes = kmalloc(size, GFP_KERNEL);
>                 if (!header->snap_sizes)
>                         goto out_err;
> +               for (i = 0; i < snap_count; i++)
> +                       header->snap_sizes[i] =
> +                               le64_to_cpu(ondisk->snaps[i].image_size);
>         } else {
>                 WARN_ON(ondisk->snap_names_len);
>                 header->snap_names = NULL;
> @@ -558,6 +566,8 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
>         header->comp_type = ondisk->options.comp_type;
>         header->total_snaps = snap_count;
>
> +       /* Allocate and fill in the snapshot context */
> +
>         size = sizeof (struct ceph_snap_context);
>         size += snap_count * sizeof (header->snapc->snaps[0]);
>         header->snapc = kzalloc(size, GFP_KERNEL);
> @@ -567,19 +577,9 @@ static int rbd_header_from_disk(struct rbd_image_header *header,
>         atomic_set(&header->snapc->nref, 1);
>         header->snapc->seq = le64_to_cpu(ondisk->snap_seq);
>         header->snapc->num_snaps = snap_count;
> -
> -       /* Fill in the snapshot information */
> -
> -       if (snap_count) {
> -               u32 i;
> -
> -               for (i = 0; i < snap_count; i++) {
> -                       header->snapc->snaps[i] =
> -                               le64_to_cpu(ondisk->snaps[i].id);
> -                       header->snap_sizes[i] =
> -                               le64_to_cpu(ondisk->snaps[i].image_size);
> -               }
> -       }
> +       for (i = 0; i < snap_count; i++)
> +               header->snapc->snaps[i] =
> +                       le64_to_cpu(ondisk->snaps[i].id);
>
>         return 0;
>
> --
> 1.7.9.5
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 05/11] rbd: move rbd_opts to struct rbd_device
  2012-08-24 16:33 ` [PATCH 05/11] rbd: move rbd_opts to struct rbd_device Alex Elder
@ 2012-08-30 17:07   ` Yehuda Sadeh
  2012-09-06 14:21     ` Alex Elder
  0 siblings, 1 reply; 35+ messages in thread
From: Yehuda Sadeh @ 2012-08-30 17:07 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

On Fri, Aug 24, 2012 at 9:33 AM, Alex Elder <elder@inktank.com> wrote:
> The rbd options don't really apply to the ceph client.  So don't
> store a pointer to it in the ceph_client structure, and put them
> (a struct, not a pointer) into the rbd_dev structure proper.
>
> Pass the rbd device structure to rbd_client_create() so it can
> assign rbd_dev->rbdc if successful, and have it return an error code
> instead of the rbd client pointer.
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
>  drivers/block/rbd.c |   49 ++++++++++++++++---------------------------------
>  1 file changed, 16 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 130dbc2..b40f553 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -98,7 +98,6 @@ struct rbd_options {
>   */
>  struct rbd_client {
>         struct ceph_client      *client;
> -       struct rbd_options      *rbd_opts;
>         struct kref             kref;
>         struct list_head        node;
>  };
> @@ -152,6 +151,7 @@ struct rbd_device {
>         struct gendisk          *disk;          /* blkdev's gendisk and rq */
>         struct request_queue    *q;
>
> +       struct rbd_options      rbd_opts;
>         struct rbd_client       *rbd_client;
>
>         char                    name[DEV_NAME_LEN]; /* blkdev name, e.g. rbd3 */
> @@ -273,8 +273,7 @@ static const struct block_device_operations rbd_bd_ops = {
>   * Initialize an rbd client instance.
>   * We own *ceph_opts.
>   */
> -static struct rbd_client *rbd_client_create(struct ceph_options *ceph_opts,
> -                                           struct rbd_options *rbd_opts)
> +static struct rbd_client *rbd_client_create(struct ceph_options *ceph_opts)
>  {
>         struct rbd_client *rbdc;
>         int ret = -ENOMEM;
> @@ -298,8 +297,6 @@ static struct rbd_client *rbd_client_create(struct ceph_options *ceph_opts,
>         if (ret < 0)
>                 goto out_err;
>
> -       rbdc->rbd_opts = rbd_opts;
> -
>         spin_lock(&rbd_client_list_lock);
>         list_add_tail(&rbdc->node, &rbd_client_list);
>         spin_unlock(&rbd_client_list_lock);
> @@ -402,42 +399,33 @@ static int parse_rbd_opts_token(char *c, void *private)
>   * Get a ceph client with specific addr and configuration, if one does
>   * not exist create it.
>   */
> -static struct rbd_client *rbd_get_client(const char *mon_addr,
> -                                        size_t mon_addr_len,
> -                                        char *options)
> +static int rbd_get_client(struct rbd_device *rbd_dev, const char *mon_addr,
> +                               size_t mon_addr_len, char *options)
>  {
> -       struct rbd_client *rbdc;
> +       struct rbd_options *rbd_opts = &rbd_dev->rbd_opts;
>         struct ceph_options *ceph_opts;
> -       struct rbd_options *rbd_opts;
> -
> -       rbd_opts = kzalloc(sizeof(*rbd_opts), GFP_KERNEL);
> -       if (!rbd_opts)
> -               return ERR_PTR(-ENOMEM);
> +       struct rbd_client *rbdc;
>
>         rbd_opts->notify_timeout = RBD_NOTIFY_TIMEOUT_DEFAULT;
>
>         ceph_opts = ceph_parse_options(options, mon_addr,
>                                         mon_addr + mon_addr_len,
>                                         parse_rbd_opts_token, rbd_opts);
> -       if (IS_ERR(ceph_opts)) {
> -               kfree(rbd_opts);
> -               return ERR_CAST(ceph_opts);
> -       }
> +       if (IS_ERR(ceph_opts))
> +               return PTR_ERR(ceph_opts);
>
>         rbdc = rbd_client_find(ceph_opts);
>         if (rbdc) {
>                 /* using an existing client */
>                 ceph_destroy_options(ceph_opts);

It'll probably be better to use a reference count to control ceph_opts lifecycle
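
For illustration, the kind of refcounted wrapper being suggested
might look something like this (a hypothetical sketch built on the
kernel's kref primitive, not code that exists in the driver):

	struct rbd_opts_ref {			/* hypothetical */
		struct ceph_options	*ceph_opts;
		struct kref		kref;
	};

	static void rbd_opts_ref_release(struct kref *kref)
	{
		struct rbd_opts_ref *ref =
			container_of(kref, struct rbd_opts_ref, kref);

		ceph_destroy_options(ref->ceph_opts);
		kfree(ref);
	}

Each user would take a reference with kref_get(&ref->kref) and drop
it with kref_put(&ref->kref, rbd_opts_ref_release), so the options
would be destroyed exactly once, when the last user goes away.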

> -               kfree(rbd_opts);
> -
> -               return rbdc;
> +       } else {
> +               rbdc = rbd_client_create(ceph_opts);
> +               if (IS_ERR(rbdc))
> +                       return PTR_ERR(rbdc);
>         }
> +       rbd_dev->rbd_client = rbdc;
>
> -       rbdc = rbd_client_create(ceph_opts, rbd_opts);
> -       if (IS_ERR(rbdc))
> -               kfree(rbd_opts);
> -
> -       return rbdc;
> +       return 0;
>  }
>
>  /*
> @@ -455,7 +443,6 @@ static void rbd_client_release(struct kref *kref)
>         spin_unlock(&rbd_client_list_lock);
>
>         ceph_destroy_client(rbdc->client);
> -       kfree(rbdc->rbd_opts);
>         kfree(rbdc);
>  }
>
> @@ -2526,13 +2513,9 @@ static ssize_t rbd_add(struct bus_type *bus,
>         if (rc)
>                 goto err_put_id;
>
> -       rbd_dev->rbd_client = rbd_get_client(mon_addrs, mon_addrs_size - 1,
> -                                               options);
> -       if (IS_ERR(rbd_dev->rbd_client)) {
> -               rc = PTR_ERR(rbd_dev->rbd_client);
> -               rbd_dev->rbd_client = NULL;
> +       rc = rbd_get_client(rbd_dev, mon_addrs, mon_addrs_size - 1, options);
> +       if (rc < 0)
>                 goto err_put_id;
> -       }
>
>         /* pick the pool */
>         osdc = &rbd_dev->rbd_client->client->osdc;
> --
> 1.7.9.5
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 06/11] rbd: add read_only rbd map option
  2012-08-24 16:34 ` [PATCH 06/11] rbd: add read_only rbd map option Alex Elder
@ 2012-08-30 17:29   ` Yehuda Sadeh
  2012-08-30 17:39     ` Alex Elder
  2012-09-06 15:36   ` [PATCH, v2 " Alex Elder
  1 sibling, 1 reply; 35+ messages in thread
From: Yehuda Sadeh @ 2012-08-30 17:29 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

On Fri, Aug 24, 2012 at 9:34 AM, Alex Elder <elder@inktank.com> wrote:
> Add the ability to map an rbd image read-only, by specifying either
> "read_only" or "ro" as an option on the rbd "command line."
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
>  drivers/block/rbd.c |   22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index b40f553..5e22bb5 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -69,7 +69,8 @@
>  #define DEV_NAME_LEN           32
>  #define MAX_INT_FORMAT_WIDTH   ((5 * sizeof (int)) / 2 + 1)
>
> -#define RBD_NOTIFY_TIMEOUT_DEFAULT 10
> +#define RBD_NOTIFY_TIMEOUT_DEFAULT     10
> +#define RBD_READ_ONLY_DEFAULT          false
>
>  /*
>   * block device image metadata (in-memory version)
> @@ -91,6 +92,7 @@ struct rbd_image_header {
>
>  struct rbd_options {
>         int     notify_timeout;
> +       bool    read_only;
>  };
>
>  /*
> @@ -176,7 +178,7 @@ struct rbd_device {
>         u64                     snap_id;        /* current snapshot id */
>         /* whether the snap_id this device reads from still exists */
>         bool                    snap_exists;
> -       int                     read_only;
> +       bool                    read_only;
>
>         struct list_head        node;
>
> @@ -351,12 +353,18 @@ enum {
>         /* int args above */
>         Opt_last_string,
>         /* string args above */
> +       Opt_read_only,
> +       /* Boolean args above */
> +       Opt_last_bool,
>  };
>
>  static match_table_t rbd_opts_tokens = {
>         {Opt_notify_timeout, "notify_timeout=%d"},
>         /* int args above */
>         /* string args above */
> +       {Opt_read_only, "read_only"},
> +       {Opt_read_only, "ro"},          /* Alternate spelling */
> +       /* Boolean args above */
>         {-1, NULL}
>  };
>
> @@ -381,6 +389,8 @@ static int parse_rbd_opts_token(char *c, void *private)
>         } else if (token > Opt_last_int && token < Opt_last_string) {
>                 dout("got string token %d val %s\n", token,
>                      argstr[0].from);
> +       } else if (token > Opt_last_string && token < Opt_last_bool) {
> +               dout("got Boolean token %d\n", token);
>         } else {
>                 dout("got token %d\n", token);
>         }
> @@ -389,6 +399,9 @@ static int parse_rbd_opts_token(char *c, void *private)
>         case Opt_notify_timeout:
>                 rbd_opts->notify_timeout = intval;
>                 break;
> +       case Opt_read_only:
> +               rbd_opts->read_only = !rbd_opts->read_only;

Should this be a flip-flop?  Better to have "rw" and "read_write"
options instead, or just set it to true here.
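
As written, repeating the option toggles it back: with

	rbd_opts->read_only = !rbd_opts->read_only;

mapping with "-o ro,ro" would flip read_only to true and then right
back to false.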

> +               break;
>         default:
>                 BUG_ON(token);
>         }
> @@ -407,6 +420,7 @@ static int rbd_get_client(struct rbd_device *rbd_dev, const char *mon_addr,
>         struct rbd_client *rbdc;
>
>         rbd_opts->notify_timeout = RBD_NOTIFY_TIMEOUT_DEFAULT;
> +       rbd_opts->read_only = RBD_READ_ONLY_DEFAULT;
>
>         ceph_opts = ceph_parse_options(options, mon_addr,
>                                         mon_addr + mon_addr_len,
> @@ -613,7 +627,7 @@ static int rbd_header_set_snap(struct rbd_device *rbd_dev, u64 *size)
>                     sizeof (RBD_SNAP_HEAD_NAME))) {
>                 rbd_dev->snap_id = CEPH_NOSNAP;
>                 rbd_dev->snap_exists = false;
> -               rbd_dev->read_only = 0;
> +               rbd_dev->read_only = rbd_dev->rbd_opts.read_only;
>                 if (size)
>                         *size = rbd_dev->header.image_size;
>         } else {
> @@ -625,7 +639,7 @@ static int rbd_header_set_snap(struct rbd_device *rbd_dev, u64 *size)
>                         goto done;
>                 rbd_dev->snap_id = snap_id;
>                 rbd_dev->snap_exists = true;
> -               rbd_dev->read_only = 1;
> +               rbd_dev->read_only = true;      /* No choice for snapshots */
>         }
>
>         ret = 0;
> --
> 1.7.9.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 07/11] rbd: kill notify_timeout option
  2012-08-24 16:34 ` [PATCH 07/11] rbd: kill notify_timeout option Alex Elder
@ 2012-08-30 17:31   ` Yehuda Sadeh
  0 siblings, 0 replies; 35+ messages in thread
From: Yehuda Sadeh @ 2012-08-30 17:31 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

On Fri, Aug 24, 2012 at 9:34 AM, Alex Elder <elder@inktank.com> wrote:
> The "notify_timeout" rbd device option is never used, so get rid of
> it.
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
>  drivers/block/rbd.c |    8 --------
>  1 file changed, 8 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 5e22bb5..e94336d 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -69,7 +69,6 @@
>  #define DEV_NAME_LEN           32
>  #define MAX_INT_FORMAT_WIDTH   ((5 * sizeof (int)) / 2 + 1)
>
> -#define RBD_NOTIFY_TIMEOUT_DEFAULT     10
>  #define RBD_READ_ONLY_DEFAULT          false
>
>  /*
> @@ -91,7 +90,6 @@ struct rbd_image_header {
>  };
>
>  struct rbd_options {
> -       int     notify_timeout;
>         bool    read_only;
>  };
>
> @@ -348,7 +346,6 @@ static struct rbd_client *rbd_client_find(struct ceph_options *ceph_opts)
>   * mount options
>   */
>  enum {
> -       Opt_notify_timeout,
>         Opt_last_int,
>         /* int args above */
>         Opt_last_string,
> @@ -359,7 +356,6 @@ enum {
>  };
>
>  static match_table_t rbd_opts_tokens = {
> -       {Opt_notify_timeout, "notify_timeout=%d"},
>         /* int args above */
>         /* string args above */
>         {Opt_read_only, "read_only"},
> @@ -396,9 +392,6 @@ static int parse_rbd_opts_token(char *c, void *private)
>         }
>
>         switch (token) {
> -       case Opt_notify_timeout:
> -               rbd_opts->notify_timeout = intval;
> -               break;
>         case Opt_read_only:
>                 rbd_opts->read_only = !rbd_opts->read_only;
>                 break;
> @@ -419,7 +412,6 @@ static int rbd_get_client(struct rbd_device *rbd_dev, const char *mon_addr,
>         struct ceph_options *ceph_opts;
>         struct rbd_client *rbdc;
>
> -       rbd_opts->notify_timeout = RBD_NOTIFY_TIMEOUT_DEFAULT;
>         rbd_opts->read_only = RBD_READ_ONLY_DEFAULT;
>
>         ceph_opts = ceph_parse_options(options, mon_addr,
> --
> 1.7.9.5
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 06/11] rbd: add read_only rbd map option
  2012-08-30 17:29   ` Yehuda Sadeh
@ 2012-08-30 17:39     ` Alex Elder
  0 siblings, 0 replies; 35+ messages in thread
From: Alex Elder @ 2012-08-30 17:39 UTC (permalink / raw)
  To: Yehuda Sadeh; +Cc: ceph-devel

On 08/30/2012 12:29 PM, Yehuda Sadeh wrote:
> On Fri, Aug 24, 2012 at 9:34 AM, Alex Elder <elder@inktank.com> wrote:
>> Add the ability to map an rbd image read-only, by specifying either
>> "read_only" or "ro" as an option on the rbd "command line."
>>
>> Signed-off-by: Alex Elder <elder@inktank.com>
>> ---
>>  drivers/block/rbd.c |   22 ++++++++++++++++++----
>>  1 file changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>> index b40f553..5e22bb5 100644
>> --- a/drivers/block/rbd.c
>> +++ b/drivers/block/rbd.c
>> @@ -69,7 +69,8 @@
>>  #define DEV_NAME_LEN           32
>>  #define MAX_INT_FORMAT_WIDTH   ((5 * sizeof (int)) / 2 + 1)
>>
>> -#define RBD_NOTIFY_TIMEOUT_DEFAULT 10
>> +#define RBD_NOTIFY_TIMEOUT_DEFAULT     10
>> +#define RBD_READ_ONLY_DEFAULT          false
>>
>>  /*
>>   * block device image metadata (in-memory version)
>> @@ -91,6 +92,7 @@ struct rbd_image_header {
>>
>>  struct rbd_options {
>>         int     notify_timeout;
>> +       bool    read_only;
>>  };
>>
>>  /*
>> @@ -176,7 +178,7 @@ struct rbd_device {
>>         u64                     snap_id;        /* current snapshot id */
>>         /* whether the snap_id this device reads from still exists */
>>         bool                    snap_exists;
>> -       int                     read_only;
>> +       bool                    read_only;
>>
>>         struct list_head        node;
>>
>> @@ -351,12 +353,18 @@ enum {
>>         /* int args above */
>>         Opt_last_string,
>>         /* string args above */
>> +       Opt_read_only,
>> +       /* Boolean args above */
>> +       Opt_last_bool,
>>  };
>>
>>  static match_table_t rbd_opts_tokens = {
>>         {Opt_notify_timeout, "notify_timeout=%d"},
>>         /* int args above */
>>         /* string args above */
>> +       {Opt_read_only, "read_only"},
>> +       {Opt_read_only, "ro"},          /* Alternate spelling */
>> +       /* Boolean args above */
>>         {-1, NULL}
>>  };
>>
>> @@ -381,6 +389,8 @@ static int parse_rbd_opts_token(char *c, void *private)
>>         } else if (token > Opt_last_int && token < Opt_last_string) {
>>                 dout("got string token %d val %s\n", token,
>>                      argstr[0].from);
>> +       } else if (token > Opt_last_string && token < Opt_last_bool) {
>> +               dout("got Boolean token %d\n", token);
>>         } else {
>>                 dout("got token %d\n", token);
>>         }
>> @@ -389,6 +399,9 @@ static int parse_rbd_opts_token(char *c, void *private)
>>         case Opt_notify_timeout:
>>                 rbd_opts->notify_timeout = intval;
>>                 break;
>> +       case Opt_read_only:
>> +               rbd_opts->read_only = !rbd_opts->read_only;
> 
> should this be a flip-flop? better have "rw" and "read_write" options
> instead, or just set it to true here.

Yes, I think you have a good point.  I'll set it to true.  I'll also
add a "rw" option while I'm at it; the mount command offers both and
I do prefer offering both.

					-Alex


>> +               break;
>>         default:
>>                 BUG_ON(token);
>>         }
>> @@ -407,6 +420,7 @@ static int rbd_get_client(struct rbd_device *rbd_dev, const char *mon_addr,
>>         struct rbd_client *rbdc;
>>
>>         rbd_opts->notify_timeout = RBD_NOTIFY_TIMEOUT_DEFAULT;
>> +       rbd_opts->read_only = RBD_READ_ONLY_DEFAULT;
>>
>>         ceph_opts = ceph_parse_options(options, mon_addr,
>>                                         mon_addr + mon_addr_len,
>> @@ -613,7 +627,7 @@ static int rbd_header_set_snap(struct rbd_device *rbd_dev, u64 *size)
>>                     sizeof (RBD_SNAP_HEAD_NAME))) {
>>                 rbd_dev->snap_id = CEPH_NOSNAP;
>>                 rbd_dev->snap_exists = false;
>> -               rbd_dev->read_only = 0;
>> +               rbd_dev->read_only = rbd_dev->rbd_opts.read_only;
>>                 if (size)
>>                         *size = rbd_dev->header.image_size;
>>         } else {
>> @@ -625,7 +639,7 @@ static int rbd_header_set_snap(struct rbd_device *rbd_dev, u64 *size)
>>                         goto done;
>>                 rbd_dev->snap_id = snap_id;
>>                 rbd_dev->snap_exists = true;
>> -               rbd_dev->read_only = 1;
>> +               rbd_dev->read_only = true;      /* No choice for snapshots */
>>         }
>>
>>         ret = 0;
>> --
>> 1.7.9.5
>>
> 
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 08/11] rbd: bio_chain_clone() cleanups
  2012-08-24 16:35 ` [PATCH 08/11] rbd: bio_chain_clone() cleanups Alex Elder
@ 2012-08-30 17:40   ` Yehuda Sadeh
  0 siblings, 0 replies; 35+ messages in thread
From: Yehuda Sadeh @ 2012-08-30 17:40 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

we still need to fix the bio_pair leak though

On Fri, Aug 24, 2012 at 9:35 AM, Alex Elder <elder@inktank.com> wrote:
> In bio_chain_clone(), at the end of the function the bi_next field
> of the tail of the new bio chain is nulled.  This isn't necessary,
> because if "tail" is non-null, its value will be the last bio
> structure allocated at the top of the while loop in that function.
> And before that structure is added to the end of the new chain, its
> bi_next pointer is always made null.
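
(The invariant, in miniature -- these lines are from the loop body
below:

	tmp->bi_next = NULL;	/* every clone is pre-terminated */
	...
	tail = tmp;		/* so tail->bi_next is already NULL */

which is why the trailing NULL assignment can simply go.)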
>
> While touching that function, clean a few other things:
>     - define each local variable on its own line
>     - move the definition of "tmp" to an inner scope
>     - move the modification of gfpmask closer to where it's used
>     - rearrange the logic that sets the chain's tail pointer
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
>  drivers/block/rbd.c |   21 ++++++++++-----------
>  1 file changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index e94336d..bb8dad7 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -741,7 +741,9 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
>                                    struct bio_pair **bp,
>                                    int len, gfp_t gfpmask)
>  {
> -       struct bio *tmp, *old_chain = *old, *new_chain = NULL, *tail = NULL;
> +       struct bio *old_chain = *old;
> +       struct bio *new_chain = NULL;
> +       struct bio *tail;
>         int total = 0;
>
>         if (*bp) {
> @@ -750,9 +752,12 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
>         }
>
>         while (old_chain && (total < len)) {
> +               struct bio *tmp;
> +
>                 tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
>                 if (!tmp)
>                         goto err_out;
> +               gfpmask &= ~__GFP_WAIT; /* can't wait after the first */
>
>                 if (total + old_chain->bi_size > len) {
>                         struct bio_pair *bp;
> @@ -780,15 +785,12 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
>                 }
>
>                 tmp->bi_bdev = NULL;
> -               gfpmask &= ~__GFP_WAIT;
>                 tmp->bi_next = NULL;
> -
> -               if (!new_chain) {
> -                       new_chain = tail = tmp;
> -               } else {
> +               if (new_chain)
>                         tail->bi_next = tmp;
> -                       tail = tmp;
> -               }
> +               else
> +                       new_chain = tmp;
> +               tail = tmp;
>                 old_chain = old_chain->bi_next;
>
>                 total += tmp->bi_size;
> @@ -796,9 +798,6 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
>
>         BUG_ON(total < len);
>
> -       if (tail)
> -               tail->bi_next = NULL;
> -
>         *old = old_chain;
>
>         return new_chain;
> --
> 1.7.9.5
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 09/11] rbd: drop needless test in rbd_rq_fn()
  2012-08-24 16:35 ` [PATCH 09/11] rbd: drop needless test in rbd_rq_fn() Alex Elder
@ 2012-08-30 17:41   ` Yehuda Sadeh
  0 siblings, 0 replies; 35+ messages in thread
From: Yehuda Sadeh @ 2012-08-30 17:41 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

On Fri, Aug 24, 2012 at 9:35 AM, Alex Elder <elder@inktank.com> wrote:
> There's a test for a null rq pointer inside the while loop in
> rbd_rq_fn() that's not needed.  That same test already occurs
> in the immediately preceding loop condition test.
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
>  drivers/block/rbd.c |    4 ----
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index bb8dad7..fad4ecb 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -1462,10 +1462,6 @@ static void rbd_rq_fn(struct request_queue *q)
>                 struct rbd_req_coll *coll;
>                 struct ceph_snap_context *snapc;
>
> -               /* peek at request from block layer */
> -               if (!rq)
> -                       break;
> -
>                 dout("fetched request\n");
>
>                 /* filter out block requests we don't understand */
> --
> 1.7.9.5
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 10/11] rbd: check for overflow in rbd_get_num_segments()
  2012-08-24 16:36 ` [PATCH 10/11] rbd: check for overflow in rbd_get_num_segments() Alex Elder
@ 2012-08-30 17:50   ` Yehuda Sadeh
  0 siblings, 0 replies; 35+ messages in thread
From: Yehuda Sadeh @ 2012-08-30 17:50 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

On Fri, Aug 24, 2012 at 9:36 AM, Alex Elder <elder@inktank.com> wrote:
> It is possible in rbd_get_num_segments() for an overflow to occur
> when adding the offset and length.  This is easily avoided.
>
> Since the function returns an int and the one caller is already
> prepared to handle errors, have it return -ERANGE if overflow would
> occur.
>
> The overflow check would not work if a zero-length request was
> being tested, so short-circuit that case, returning 0 for the
> number of segments required.  (This condition might be avoided
> elsewhere already, I don't know.)
>
> Have the caller end the request if either an error or 0 is returned.
> The returned value is passed to __blk_end_request_all(), meaning
> a 0 length request is not treated as an error.
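
(A quick standalone check of that test, using the userspace
UINT64_MAX in place of the patch's U64_MAX:

	#include <assert.h>
	#include <stdint.h>

	int main(void)
	{
		uint64_t ofs = UINT64_MAX - 10;
		uint64_t len = 12;	/* ofs + len - 1 would wrap */

		/* len - 1 > U64_MAX - ofs  <=>  ofs + len - 1 overflows */
		assert(len - 1 > UINT64_MAX - ofs);
		return 0;
	}

With len = 11 the sum is exactly UINT64_MAX and the test correctly
stays false.)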
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
>  drivers/block/rbd.c |   23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index fad4ecb..b649446 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -50,6 +50,10 @@
>  #define        SECTOR_SHIFT    9
>  #define        SECTOR_SIZE     (1ULL << SECTOR_SHIFT)
>
> +/* It might be useful to have this defined elsewhere too */
> +
> +#define        U64_MAX ((u64) (~0ULL))

ULLONG_MAX defined in include/linux/kernel.h

> +
>  #define RBD_DRV_NAME "rbd"
>  #define RBD_DRV_NAME_LONG "rbd (rados block device)"
>
> @@ -678,8 +682,17 @@ static u64 rbd_get_segment(struct rbd_image_header *header,
>  static int rbd_get_num_segments(struct rbd_image_header *header,
>                                 u64 ofs, u64 len)
>  {
> -       u64 start_seg = ofs >> header->obj_order;
> -       u64 end_seg = (ofs + len - 1) >> header->obj_order;
> +       u64 start_seg;
> +       u64 end_seg;
> +
> +       if (!len)
> +               return 0;
> +       if (len - 1 > U64_MAX - ofs)
> +               return -ERANGE;
> +
> +       start_seg = ofs >> header->obj_order;
> +       end_seg = (ofs + len - 1) >> header->obj_order;
> +
>         return end_seg - start_seg + 1;
>  }
>
> @@ -1502,6 +1515,12 @@ static void rbd_rq_fn(struct request_queue *q)
>                      size, (unsigned long long) blk_rq_pos(rq) * SECTOR_SIZE);
>
>                 num_segs = rbd_get_num_segments(&rbd_dev->header, ofs, size);
> +               if (num_segs <= 0) {
> +                       spin_lock_irq(q->queue_lock);
> +                       __blk_end_request_all(rq, num_segs);
> +                       ceph_put_snap_context(snapc);
> +                       continue;
> +               }
>                 coll = rbd_alloc_coll(num_segs);
>                 if (!coll) {
>                         spin_lock_irq(q->queue_lock);
> --
> 1.7.9.5
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 11/11] rbd: split up rbd_get_segment()
  2012-08-24 16:36 ` [PATCH 11/11] rbd: split up rbd_get_segment() Alex Elder
@ 2012-08-30 18:03   ` Yehuda Sadeh
  0 siblings, 0 replies; 35+ messages in thread
From: Yehuda Sadeh @ 2012-08-30 18:03 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

On Fri, Aug 24, 2012 at 9:36 AM, Alex Elder <elder@inktank.com> wrote:
> There are two places where rbd_get_segment() is called.  One, in
> rbd_rq_fn(), needs only the length within a segment that an I/O
> request should use.  The other, in rbd_do_op(), also needs the
> name of the object and the offset within it for the I/O request.
>
> Split rbd_get_segment() out into three dedicated functions:
>     - rbd_segment_name() allocates and formats the name of the
>       object for a segment containing a given rbd image offset
>     - rbd_segment_offset() computes the offset within a segment for
>       a given rbd image offset
>     - rbd_segment_length() computes the length to use for I/O within
>       a segment for a request, not to exceed the end of a segment
>       object.
>
> The new functions are a bit more careful, checking for possible
> error conditions:
>     - watch for errors or overflows returned by snprintf()
>     - catch (using BUG_ON()) potential overflow conditions
>       when computing segment length
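
As a worked example of the segment arithmetic (obj_order = 22, i.e.
4 MiB objects, is just an illustrative value):

	segment_size   = 1 << 22                = 0x400000
	image offset   = 0xd23456
	segment number = 0xd23456 >> 22         = 3
	segment offset = 0xd23456 & 0x3fffff    = 0x123456
	a 0x300000-byte request starting there is clipped to
	segment length = 0x400000 - 0x123456    = 0x2dcbaa

so the object name would end in ".000000000003" and the I/O would be
split at the segment boundary.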
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
>  drivers/block/rbd.c |   66 +++++++++++++++++++++++++++++++--------------------
>  1 file changed, 40 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index b649446..3a79e58 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -656,27 +656,47 @@ static void rbd_header_free(struct rbd_image_header *header)
>         header->snapc = NULL;
>  }
>
> -/*
> - * get the actual striped segment name, offset and length
> - */
> -static u64 rbd_get_segment(struct rbd_image_header *header,
> -                          const char *object_prefix,
> -                          u64 ofs, u64 len,
> -                          char *seg_name, u64 *segofs)
> +static char *rbd_segment_name(struct rbd_device *rbd_dev, u64 offset)
> +{
> +       char *name;
> +       u64 segment;
> +       int ret;
> +
> +       name = kmalloc(RBD_MAX_SEG_NAME_LEN + 1, GFP_NOIO);
> +       if (!name)
> +               return NULL;
> +       segment = offset >> rbd_dev->header.obj_order;
> +       ret = snprintf(name, RBD_MAX_SEG_NAME_LEN, "%s.%012llx",
> +                       rbd_dev->header.object_prefix, segment);
> +       if (ret < 0 || ret >= RBD_MAX_SEG_NAME_LEN) {
> +               pr_err("error formatting segment name for #%llu (%d)\n",
> +                       segment, ret);
> +               kfree(name);
> +               name = NULL;
> +       }
> +
> +       return name;
> +}
> +
> +static u64 rbd_segment_offset(struct rbd_device *rbd_dev, u64 offset)
>  {
> -       u64 seg = ofs >> header->obj_order;
> +       u64 segment_size = (u64) 1 << rbd_dev->header.obj_order;
>
> -       if (seg_name)
> -               snprintf(seg_name, RBD_MAX_SEG_NAME_LEN,
> -                        "%s.%012llx", object_prefix, seg);
> +       return offset & (segment_size - 1);
> +}
> +
> +static u64 rbd_segment_length(struct rbd_device *rbd_dev,
> +                               u64 offset, u64 length)
> +{
> +       u64 segment_size = (u64) 1 << rbd_dev->header.obj_order;
>
> -       ofs = ofs & ((1 << header->obj_order) - 1);
> -       len = min_t(u64, len, (1 << header->obj_order) - ofs);
> +       offset &= segment_size - 1;
>
> -       if (segofs)
> -               *segofs = ofs;
> +       BUG_ON(length > U64_MAX - offset);
> +       if (offset + length > segment_size)
> +               length = segment_size - offset;
>
> -       return len;
> +       return length;
>  }
>
>  static int rbd_get_num_segments(struct rbd_image_header *header,
> @@ -1114,14 +1134,11 @@ static int rbd_do_op(struct request *rq,
>         struct ceph_osd_req_op *ops;
>         u32 payload_len;
>
> -       seg_name = kmalloc(RBD_MAX_SEG_NAME_LEN + 1, GFP_NOIO);
> +       seg_name = rbd_segment_name(rbd_dev, ofs);
>         if (!seg_name)
>                 return -ENOMEM;
> -
> -       seg_len = rbd_get_segment(&rbd_dev->header,
> -                                 rbd_dev->header.object_prefix,
> -                                 ofs, len,
> -                                 seg_name, &seg_ofs);
> +       seg_len = rbd_segment_length(rbd_dev, ofs, len);
> +       seg_ofs = rbd_segment_offset(rbd_dev, ofs);
>
>         payload_len = (flags & CEPH_OSD_FLAG_WRITE ? seg_len : 0);
>
> @@ -1532,10 +1549,7 @@ static void rbd_rq_fn(struct request_queue *q)
>                 do {
>                         /* a bio clone to be passed down to OSD req */
>                         dout("rq->bio->bi_vcnt=%hu\n", rq->bio->bi_vcnt);
> -                       op_size = rbd_get_segment(&rbd_dev->header,
> -                                                 rbd_dev->header.object_prefix,
> -                                                 ofs, size,
> -                                                 NULL, NULL);
> +                       op_size = rbd_segment_length(rbd_dev, ofs, size);
>                         kref_get(&coll->kref);
>                         bio = bio_chain_clone(&rq_bio, &next_bio, &bp,
>                                               op_size, GFP_ATOMIC);
> --
> 1.7.9.5
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 05/11] rbd: move rbd_opts to struct rbd_device
  2012-08-30 17:07   ` Yehuda Sadeh
@ 2012-09-06 14:21     ` Alex Elder
  2012-09-07 21:40       ` Yehuda Sadeh
  0 siblings, 1 reply; 35+ messages in thread
From: Alex Elder @ 2012-09-06 14:21 UTC (permalink / raw)
  To: Yehuda Sadeh; +Cc: ceph-devel

On 08/30/2012 12:07 PM, Yehuda Sadeh wrote:
> On Fri, Aug 24, 2012 at 9:33 AM, Alex Elder <elder@inktank.com> wrote:
>> The rbd options don't really apply to the ceph client.  So don't
>> store a pointer to it in the ceph_client structure, and put them
>> (a struct, not a pointer) into the rbd_dev structure proper.
>>
>> Pass the rbd device structure to rbd_client_create() so it can
>> assign rbd_dev->rbdc if successful, and have it return an error code
>> instead of the rbd client pointer.
>>
>> Signed-off-by: Alex Elder <elder@inktank.com>
>> ---
>>  drivers/block/rbd.c |   49 ++++++++++++++++---------------------------------
>>  1 file changed, 16 insertions(+), 33 deletions(-)
>>
>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>> index 130dbc2..b40f553 100644
>> --- a/drivers/block/rbd.c
>> +++ b/drivers/block/rbd.c
>> @@ -98,7 +98,6 @@ struct rbd_options {
>>   */
>>  struct rbd_client {
>>         struct ceph_client      *client;
>> -       struct rbd_options      *rbd_opts;
>>         struct kref             kref;
>>         struct list_head        node;
>>  };
>> @@ -152,6 +151,7 @@ struct rbd_device {
>>         struct gendisk          *disk;          /* blkdev's gendisk and rq */
>>         struct request_queue    *q;
>>
>> +       struct rbd_options      rbd_opts;
>>         struct rbd_client       *rbd_client;
>>
>>         char                    name[DEV_NAME_LEN]; /* blkdev name, e.g. rbd3 */
>> @@ -273,8 +273,7 @@ static const struct block_device_operations rbd_bd_ops = {
>>   * Initialize an rbd client instance.
>>   * We own *ceph_opts.
>>   */
>> -static struct rbd_client *rbd_client_create(struct ceph_options *ceph_opts,
>> -                                           struct rbd_options *rbd_opts)
>> +static struct rbd_client *rbd_client_create(struct ceph_options *ceph_opts)
>>  {
>>         struct rbd_client *rbdc;
>>         int ret = -ENOMEM;
>> @@ -298,8 +297,6 @@ static struct rbd_client *rbd_client_create(struct ceph_options *ceph_opts,
>>         if (ret < 0)
>>                 goto out_err;
>>
>> -       rbdc->rbd_opts = rbd_opts;
>> -
>>         spin_lock(&rbd_client_list_lock);
>>         list_add_tail(&rbdc->node, &rbd_client_list);
>>         spin_unlock(&rbd_client_list_lock);
>> @@ -402,42 +399,33 @@ static int parse_rbd_opts_token(char *c, void *private)
>>   * Get a ceph client with specific addr and configuration, if one does
>>   * not exist create it.
>>   */
>> -static struct rbd_client *rbd_get_client(const char *mon_addr,
>> -                                        size_t mon_addr_len,
>> -                                        char *options)
>> +static int rbd_get_client(struct rbd_device *rbd_dev, const char *mon_addr,
>> +                               size_t mon_addr_len, char *options)
>>  {
>> -       struct rbd_client *rbdc;
>> +       struct rbd_options *rbd_opts = &rbd_dev->rbd_opts;
>>         struct ceph_options *ceph_opts;
>> -       struct rbd_options *rbd_opts;
>> -
>> -       rbd_opts = kzalloc(sizeof(*rbd_opts), GFP_KERNEL);
>> -       if (!rbd_opts)
>> -               return ERR_PTR(-ENOMEM);
>> +       struct rbd_client *rbdc;
>>
>>         rbd_opts->notify_timeout = RBD_NOTIFY_TIMEOUT_DEFAULT;
>>
>>         ceph_opts = ceph_parse_options(options, mon_addr,
>>                                         mon_addr + mon_addr_len,
>>                                         parse_rbd_opts_token, rbd_opts);
>> -       if (IS_ERR(ceph_opts)) {
>> -               kfree(rbd_opts);
>> -               return ERR_CAST(ceph_opts);
>> -       }
>> +       if (IS_ERR(ceph_opts))
>> +               return PTR_ERR(ceph_opts);
>>
>>         rbdc = rbd_client_find(ceph_opts);
>>         if (rbdc) {
>>                 /* using an existing client */
>>                 ceph_destroy_options(ceph_opts);
> 
> It'll probably be better to use a reference count to control ceph_opts lifecycle

I looked at this.

In this case, ceph_opts is directly tied to a ceph_client.
If a ceph client exists whose options match the one that
has been created here, that client will be used, and its
options get destroyed when the ceph_client gets destroyed
(and the ceph_client is refcounted).  The ceph_opts
created here will never be referenced by anything else
so it's safe to just destroy it.

If no existing rbd_client has a ceph_client with matching
ceph_options, then one is created below and it will use
the ceph_opts created here and thus its life cycle will
be managed by that ceph_client's reference count.

I believe this means there is no reason to reference count
ceph_opts.
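
To restate the ownership rule this implies:

	/*
	 * ceph_opts ownership after rbd_get_client():
	 *  - existing client found:  the caller's ceph_opts is a
	 *    duplicate; destroy it now (the client keeps its own copy)
	 *  - no match:  rbd_client_create() takes ownership, and the
	 *    options are freed later via ceph_destroy_client()
	 * There is exactly one owner at all times.
	 */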

Yehuda, please let me know if you disagree with this.  If
not, would you sign off on this?



There is a different problem here though, which I will
address separately.  rbd_client_find() searches the list
under rbd_client_list_lock, and inserting a new rbd
client into the list in rbd_client_create() takes the
same lock, but there's a race between the two
that could lead to the creation of multiple clients for
the same set of options.  This may still produce correct
behavior but it's not the way it should work.

I've created this bug to track getting this fixed at
some (near) future date:

    http://tracker.newdream.net/issues/3094
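
For reference, the usual shape of such a fix is to recheck for an
existing client while still holding the list lock, before inserting
the new one.  A simplified sketch, with a hypothetical locked-lookup
helper (not the actual fix):

	spin_lock(&rbd_client_list_lock);
	rbdc = __rbd_client_find_locked(ceph_opts);	/* hypothetical */
	if (rbdc)
		kref_get(&rbdc->kref);	/* reuse; discard the new one */
	else
		list_add_tail(&new_rbdc->node, &rbd_client_list);
	spin_unlock(&rbd_client_list_lock);

The window between dropping the lock after the find and retaking it
for the insert is what allows two clients with identical options to
be created today.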

					-Alex

>> -               kfree(rbd_opts);
>> -
>> -               return rbdc;
>> +       } else {
>> +               rbdc = rbd_client_create(ceph_opts);
>> +               if (IS_ERR(rbdc))
>> +                       return PTR_ERR(rbdc);
>>         }
>> +       rbd_dev->rbd_client = rbdc;
>>
>> -       rbdc = rbd_client_create(ceph_opts, rbd_opts);
>> -       if (IS_ERR(rbdc))
>> -               kfree(rbd_opts);
>> -
>> -       return rbdc;
>> +       return 0;
>>  }
>>
>>  /*
>> @@ -455,7 +443,6 @@ static void rbd_client_release(struct kref *kref)
>>         spin_unlock(&rbd_client_list_lock);
>>
>>         ceph_destroy_client(rbdc->client);
>> -       kfree(rbdc->rbd_opts);
>>         kfree(rbdc);
>>  }
>>
>> @@ -2526,13 +2513,9 @@ static ssize_t rbd_add(struct bus_type *bus,
>>         if (rc)
>>                 goto err_put_id;
>>
>> -       rbd_dev->rbd_client = rbd_get_client(mon_addrs, mon_addrs_size - 1,
>> -                                               options);
>> -       if (IS_ERR(rbd_dev->rbd_client)) {
>> -               rc = PTR_ERR(rbd_dev->rbd_client);
>> -               rbd_dev->rbd_client = NULL;
>> +       rc = rbd_get_client(rbd_dev, mon_addrs, mon_addrs_size - 1, options);
>> +       if (rc < 0)
>>                 goto err_put_id;
>> -       }
>>
>>         /* pick the pool */
>>         osdc = &rbd_dev->rbd_client->client->osdc;
>> --
>> 1.7.9.5
>>
> 
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 00/11] rbd: another set of patches
  2012-08-24 16:26 [PATCH 00/11] rbd: another set of patches Alex Elder
                   ` (11 preceding siblings ...)
  2012-08-30 12:32 ` [PATCH 00/11] rbd: another set of patches Alex Elder
@ 2012-09-06 15:34 ` Alex Elder
  12 siblings, 0 replies; 35+ messages in thread
From: Alex Elder @ 2012-09-06 15:34 UTC (permalink / raw)
  To: ceph-devel

On 08/24/2012 11:26 AM, Alex Elder wrote:
> I keep moving simple and simplifying patches to the front of
> my patch queue.  This is my latest set of things that are
> ready to get reviewed and committed.
> 
> 					-Alex

I'm about to send updates to two patches in this series, based
on comments from Yehuda's review.  I'm not going to re-send the
entire series.

Updated are patches 03 and 06.  I am also still awaiting
confirmation of review for patch 05 (which has not changed
and I am not re-sending).

					-Alex

> 
> [PATCH 01/11] rbd: handle locking inside __rbd_client_find()
>     This simplifies a little code.
> [PATCH 02/11] rbd: don't over-allocate space for object prefix
>     Don't allocate the max required space for the object prefix.
>     This also ensures that the object prefix is either terminated
>     with '\0' or else is maximally-sized.
> [PATCH 03/11] rbd: kill incore snap_names_len
>     Eliminate an unnecessary field in struct rbd_image_header.
> [PATCH 04/11] rbd: more cleanup in rbd_header_from_disk()
>     Rearrange some code, for better logical grouping.
> [PATCH 05/11] rbd: move rbd_opts to struct rbd_device
>     Tie rbd-specific options to the rbd_device, not the client.
> [PATCH 06/11] rbd: add read_only rbd map option
> [PATCH 07/11] rbd: kill notify_timeout option
>     These two were done together, in this order, to preserve the
>     argument parsing code.  The first adds the ability to map an
>     rbd image read-only.  The second eliminates the only other
>     rbd option, which is otherwise unused.
> [PATCH 08/11] rbd: bio_chain_clone() cleanups
>     Simple refactoring.
> [PATCH 09/11] rbd: drop needless test in rbd_rq_fn()
>     Trivial.
> [PATCH 10/11] rbd: check for overflow in rbd_get_num_segments()
>     Make sure we don't exceed 64-bit maximum when adding offset
>     and length.
> [PATCH 11/11] rbd: split up rbd_get_segment()
>     Refactor rbd_get_segment() into three separate functions,
>     which return a segment name, offset, and length.
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH, v2 03/11] rbd: kill incore snap_names_len
  2012-08-24 16:33 ` [PATCH 03/11] rbd: kill incore snap_names_len Alex Elder
  2012-08-30 16:24   ` Yehuda Sadeh
@ 2012-09-06 15:36   ` Alex Elder
  2012-09-07 21:22     ` Yehuda Sadeh
  1 sibling, 1 reply; 35+ messages in thread
From: Alex Elder @ 2012-09-06 15:36 UTC (permalink / raw)
  To: ceph-devel

The only thing the on-disk snap_names_len field is needed for is to
size the buffer allocated to hold a copy of the snapshot names
for an rbd image.

So don't bother saving it in the in-core rbd_image_header structure.
Just use a local variable to hold the required buffer size while
it's needed.

Move the code that actually copies the snapshot names up closer
to where the required length is saved.

Signed-off-by: Alex Elder <elder@inktank.com>
---
v2: - Return -EIO rather than BUG_ON() as suggested by Yehuda.
    - Added a comment explaining why a memcpy() will not exceed
      the length of the on-disk buffer, in response to Yehuda's
      concern.

 drivers/block/rbd.c |   26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

Index: b/drivers/block/rbd.c
===================================================================
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -81,7 +81,6 @@ struct rbd_image_header {
 	__u8 crypt_type;
 	__u8 comp_type;
 	struct ceph_snap_context *snapc;
-	u64 snap_names_len;
 	u32 total_snaps;

 	char *snap_names;
@@ -534,12 +533,21 @@ static int rbd_header_from_disk(struct r
 	header->object_prefix[len] = '\0';

 	if (snap_count) {
-		header->snap_names_len = le64_to_cpu(ondisk->snap_names_len);
-		BUG_ON(header->snap_names_len > (u64) SIZE_MAX);
-		header->snap_names = kmalloc(header->snap_names_len,
-					     GFP_KERNEL);
+		u64 snap_names_len = le64_to_cpu(ondisk->snap_names_len);
+
+		if (snap_names_len > (u64) SIZE_MAX)
+			return -EIO;
+		header->snap_names = kmalloc(snap_names_len, GFP_KERNEL);
 		if (!header->snap_names)
 			goto out_err;
+		/*
+		 * Note that rbd_dev_v1_header_read() guarantees
+		 * the ondisk buffer we're working with has
+		 * snap_names_len bytes beyond the end of the
+		 * snapshot id array, so this memcpy() is safe.
+		 */
+		memcpy(header->snap_names, &ondisk->snaps[snap_count],
+			snap_names_len);

 		size = snap_count * sizeof (*header->snap_sizes);
 		header->snap_sizes = kmalloc(size, GFP_KERNEL);
@@ -547,7 +555,6 @@ static int rbd_header_from_disk(struct r
 			goto out_err;
 	} else {
 		WARN_ON(ondisk->snap_names_len);
-		header->snap_names_len = 0;
 		header->snap_names = NULL;
 		header->snap_sizes = NULL;
 	}
@@ -579,10 +586,6 @@ static int rbd_header_from_disk(struct r
 			header->snap_sizes[i] =
 				le64_to_cpu(ondisk->snaps[i].image_size);
 		}
-
-		/* copy snapshot names */
-		memcpy(header->snap_names, &ondisk->snaps[snap_count],
-			header->snap_names_len);
 	}

 	return 0;
@@ -592,7 +595,6 @@ out_err:
 	header->snap_sizes = NULL;
 	kfree(header->snap_names);
 	header->snap_names = NULL;
-	header->snap_names_len = 0;
 	kfree(header->object_prefix);
 	header->object_prefix = NULL;

@@ -660,7 +662,6 @@ static void rbd_header_free(struct rbd_i
 	header->snap_sizes = NULL;
 	kfree(header->snap_names);
 	header->snap_names = NULL;
-	header->snap_names_len = 0;
 	ceph_put_snap_context(header->snapc);
 	header->snapc = NULL;
 }
@@ -1800,7 +1801,6 @@ static int __rbd_refresh_header(struct r
 	rbd_dev->header.total_snaps = h.total_snaps;
 	rbd_dev->header.snapc = h.snapc;
 	rbd_dev->header.snap_names = h.snap_names;
-	rbd_dev->header.snap_names_len = h.snap_names_len;
 	rbd_dev->header.snap_sizes = h.snap_sizes;
 	/* Free the extra copy of the object prefix */
 	WARN_ON(strcmp(rbd_dev->header.object_prefix, h.object_prefix));


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH, v2 06/11] rbd: add read_only rbd map option
  2012-08-24 16:34 ` [PATCH 06/11] rbd: add read_only rbd map option Alex Elder
  2012-08-30 17:29   ` Yehuda Sadeh
@ 2012-09-06 15:36   ` Alex Elder
  2012-09-07 15:45     ` Sage Weil
  2012-09-07 21:26     ` Yehuda Sadeh Weinraub
  1 sibling, 2 replies; 35+ messages in thread
From: Alex Elder @ 2012-09-06 15:36 UTC (permalink / raw)
  To: ceph-devel

Add the ability to map an rbd image read-only, by specifying either
"read_only" or "ro" as an option on the rbd "command line."  Also
allow the inverse to be explicitly specified using "read_write" or
"rw".

Signed-off-by: Alex Elder <elder@inktank.com>
---
v2: - Made the read_only/ro flag explicitly make the mapping
      read-only rather than having it toggle.
    - Added read_write/rw flags to allow a writable mapping to
      be explicitly requested.  Snapshot mappings are silently
      mapped read-only (even if read_write was requested).

 drivers/block/rbd.c |   28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

Index: b/drivers/block/rbd.c
===================================================================
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -69,7 +69,8 @@
 #define DEV_NAME_LEN		32
 #define MAX_INT_FORMAT_WIDTH	((5 * sizeof (int)) / 2 + 1)

-#define RBD_NOTIFY_TIMEOUT_DEFAULT 10
+#define RBD_NOTIFY_TIMEOUT_DEFAULT	10
+#define RBD_READ_ONLY_DEFAULT		false

 /*
  * block device image metadata (in-memory version)
@@ -91,6 +92,7 @@ struct rbd_image_header {

 struct rbd_options {
 	int	notify_timeout;
+	bool	read_only;
 };

 /*
@@ -176,7 +178,7 @@ struct rbd_device {
 	u64                     snap_id;	/* current snapshot id */
 	/* whether the snap_id this device reads from still exists */
 	bool                    snap_exists;
-	int                     read_only;
+	bool			read_only;

 	struct list_head	node;

@@ -351,12 +353,21 @@ enum {
 	/* int args above */
 	Opt_last_string,
 	/* string args above */
+	Opt_read_only,
+	Opt_read_write,
+	/* Boolean args above */
+	Opt_last_bool,
 };

 static match_table_t rbd_opts_tokens = {
 	{Opt_notify_timeout, "notify_timeout=%d"},
 	/* int args above */
 	/* string args above */
+	{Opt_read_only, "read_only"},
+	{Opt_read_only, "ro"},		/* Alternate spelling */
+	{Opt_read_write, "read_write"},
+	{Opt_read_write, "rw"},		/* Alternate spelling */
+	/* Boolean args above */
 	{-1, NULL}
 };

@@ -381,6 +392,8 @@ static int parse_rbd_opts_token(char *c,
 	} else if (token > Opt_last_int && token < Opt_last_string) {
 		dout("got string token %d val %s\n", token,
 		     argstr[0].from);
+	} else if (token > Opt_last_string && token < Opt_last_bool) {
+		dout("got Boolean token %d\n", token);
 	} else {
 		dout("got token %d\n", token);
 	}
@@ -389,6 +402,12 @@ static int parse_rbd_opts_token(char *c,
 	case Opt_notify_timeout:
 		rbd_opts->notify_timeout = intval;
 		break;
+	case Opt_read_only:
+		rbd_opts->read_only = true;
+		break;
+	case Opt_read_write:
+		rbd_opts->read_only = false;
+		break;
 	default:
 		BUG_ON(token);
 	}
@@ -407,6 +426,7 @@ static int rbd_get_client(struct rbd_dev
 	struct rbd_client *rbdc;

 	rbd_opts->notify_timeout = RBD_NOTIFY_TIMEOUT_DEFAULT;
+	rbd_opts->read_only = RBD_READ_ONLY_DEFAULT;

 	ceph_opts = ceph_parse_options(options, mon_addr,
 					mon_addr + mon_addr_len,
@@ -620,7 +640,7 @@ static int rbd_header_set_snap(struct rb
 		    sizeof (RBD_SNAP_HEAD_NAME))) {
 		rbd_dev->snap_id = CEPH_NOSNAP;
 		rbd_dev->snap_exists = false;
-		rbd_dev->read_only = 0;
+		rbd_dev->read_only = rbd_dev->rbd_opts.read_only;
 		if (size)
 			*size = rbd_dev->header.image_size;
 	} else {
@@ -632,7 +652,7 @@ static int rbd_header_set_snap(struct rb
 			goto done;
 		rbd_dev->snap_id = snap_id;
 		rbd_dev->snap_exists = true;
-		rbd_dev->read_only = 1;
+		rbd_dev->read_only = true;	/* No choice for snapshots */
 	}

 	ret = 0;

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH, v2 06/11] rbd: add read_only rbd map option
  2012-09-06 15:36   ` [PATCH, v2 " Alex Elder
@ 2012-09-07 15:45     ` Sage Weil
  2012-09-07 20:36       ` Alex Elder
  2012-09-07 21:26     ` Yehuda Sadeh Weinraub
  1 sibling, 1 reply; 35+ messages in thread
From: Sage Weil @ 2012-09-07 15:45 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

On Thu, 6 Sep 2012, Alex Elder wrote:
> Add the ability to map an rbd image read-only, by specifying either
> "read_only" or "ro" as an option on the rbd "command line."  Also
> allow the inverse to be explicitly specified using "read_write" or
> "rw".
> 
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
> v2: - Made the read_only/ro flag explicitly make the mapping
>       read-only rather than having it toggle.
>     - Added read_write/rw flags to allow a writable mapping to
>       be explicitly requested.  Snapshot mappings are silently
>       mapped read-only (even if read_write was requested).

If rw is the default, and rw is silently ignored if the image itself is a 
snapshot, what is the value in rw?  Would it make more sense to fail with 
EROFS if you specify rw on a snapshot (or otherwise read-only image)?

sage

> 
>  drivers/block/rbd.c |   28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
> 
> Index: b/drivers/block/rbd.c
> ===================================================================
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -69,7 +69,8 @@
>  #define DEV_NAME_LEN		32
>  #define MAX_INT_FORMAT_WIDTH	((5 * sizeof (int)) / 2 + 1)
> 
> -#define RBD_NOTIFY_TIMEOUT_DEFAULT 10
> +#define RBD_NOTIFY_TIMEOUT_DEFAULT	10
> +#define RBD_READ_ONLY_DEFAULT		false
> 
>  /*
>   * block device image metadata (in-memory version)
> @@ -91,6 +92,7 @@ struct rbd_image_header {
> 
>  struct rbd_options {
>  	int	notify_timeout;
> +	bool	read_only;
>  };
> 
>  /*
> @@ -176,7 +178,7 @@ struct rbd_device {
>  	u64                     snap_id;	/* current snapshot id */
>  	/* whether the snap_id this device reads from still exists */
>  	bool                    snap_exists;
> -	int                     read_only;
> +	bool			read_only;
> 
>  	struct list_head	node;
> 
> @@ -351,12 +353,21 @@ enum {
>  	/* int args above */
>  	Opt_last_string,
>  	/* string args above */
> +	Opt_read_only,
> +	Opt_read_write,
> +	/* Boolean args above */
> +	Opt_last_bool,
>  };
> 
>  static match_table_t rbd_opts_tokens = {
>  	{Opt_notify_timeout, "notify_timeout=%d"},
>  	/* int args above */
>  	/* string args above */
> +	{Opt_read_only, "read_only"},
> +	{Opt_read_only, "ro"},		/* Alternate spelling */
> +	{Opt_read_write, "read_write"},
> +	{Opt_read_write, "rw"},		/* Alternate spelling */
> +	/* Boolean args above */
>  	{-1, NULL}
>  };
> 
> @@ -381,6 +392,8 @@ static int parse_rbd_opts_token(char *c,
>  	} else if (token > Opt_last_int && token < Opt_last_string) {
>  		dout("got string token %d val %s\n", token,
>  		     argstr[0].from);
> +	} else if (token > Opt_last_string && token < Opt_last_bool) {
> +		dout("got Boolean token %d\n", token);
>  	} else {
>  		dout("got token %d\n", token);
>  	}
> @@ -389,6 +402,12 @@ static int parse_rbd_opts_token(char *c,
>  	case Opt_notify_timeout:
>  		rbd_opts->notify_timeout = intval;
>  		break;
> +	case Opt_read_only:
> +		rbd_opts->read_only = true;
> +		break;
> +	case Opt_read_write:
> +		rbd_opts->read_only = false;
> +		break;
>  	default:
>  		BUG_ON(token);
>  	}
> @@ -407,6 +426,7 @@ static int rbd_get_client(struct rbd_dev
>  	struct rbd_client *rbdc;
> 
>  	rbd_opts->notify_timeout = RBD_NOTIFY_TIMEOUT_DEFAULT;
> +	rbd_opts->read_only = RBD_READ_ONLY_DEFAULT;
> 
>  	ceph_opts = ceph_parse_options(options, mon_addr,
>  					mon_addr + mon_addr_len,
> @@ -620,7 +640,7 @@ static int rbd_header_set_snap(struct rb
>  		    sizeof (RBD_SNAP_HEAD_NAME))) {
>  		rbd_dev->snap_id = CEPH_NOSNAP;
>  		rbd_dev->snap_exists = false;
> -		rbd_dev->read_only = 0;
> +		rbd_dev->read_only = rbd_dev->rbd_opts.read_only;
>  		if (size)
>  			*size = rbd_dev->header.image_size;
>  	} else {
> @@ -632,7 +652,7 @@ static int rbd_header_set_snap(struct rb
>  			goto done;
>  		rbd_dev->snap_id = snap_id;
>  		rbd_dev->snap_exists = true;
> -		rbd_dev->read_only = 1;
> +		rbd_dev->read_only = true;	/* No choice for snapshots */
>  	}
> 
>  	ret = 0;
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH, v2 06/11] rbd: add read_only rbd map option
  2012-09-07 15:45     ` Sage Weil
@ 2012-09-07 20:36       ` Alex Elder
  0 siblings, 0 replies; 35+ messages in thread
From: Alex Elder @ 2012-09-07 20:36 UTC (permalink / raw)
  To: Sage Weil; +Cc: ceph-devel

On 09/07/2012 10:45 AM, Sage Weil wrote:
> On Thu, 6 Sep 2012, Alex Elder wrote:
>> Add the ability to map an rbd image read-only, by specifying either
>> "read_only" or "ro" as an option on the rbd "command line."  Also
>> allow the inverse to be explicitly specified using "read_write" or
>> "rw".
>>
>> Signed-off-by: Alex Elder <elder@inktank.com>
>> ---
>> v2: - Made the read_only/ro flag explicitly make the mapping
>>       read-only rather than having it toggle.
>>     - Added read_write/rw flags to allow a writable mapping to
>>       be explicitly requested.  Snapshot mappings are silently
>>       mapped read-only (even if read_write was requested).
> 
> If rw is the default, and rw is silently ignored if the image itself is a 
> snapshot, what is the value in rw?  Would it make more sense to fail with 
> EROFS if you specify rw on a snapshot (or otherwise read-only image)?

I explicitly called this out because what I wanted to do was simply too
hard.

Namely:
- default for base image map would be read-write
- default for snapshot map would be read-only
- attempt to map a snapshot read-write would be met with an "ignoring"
  warning

But the way the argument parsing is structured, you never have all the
information you want at the right time to do the above.  So I punted
and just decided attempts to read-write map a snapshot would be silently
ignored.

I'll take a look at simply failing on a read-write map of a snapshot,
and if it's workable I'll do that.
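
A minimal sketch of how that failure might look, assuming a new
(hypothetical) rbd_opts.read_only_set flag recording whether ro/rw was
passed explicitly on the command line; none of this is from the posted
patch:

	/* Inside the snapshot branch of rbd_header_set_snap() */
	if (rbd_dev->rbd_opts.read_only_set &&	/* hypothetical flag */
	    !rbd_dev->rbd_opts.read_only) {
		/* Explicit rw requested on a snapshot: fail rather
		 * than silently downgrading the mapping. */
		ret = -EROFS;
		goto done;
	}
	rbd_dev->snap_id = snap_id;
	rbd_dev->snap_exists = true;
	rbd_dev->read_only = true;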

					-Alex

PS  As I told Josh separately, one of the reasons I even put in the
    read-only option in the first place is to have a reason to preserve
    the argument parsing infrastructure.  And that's why it gets added
    at this position in the series--so that code remains after the next
    patch, which removes the only other option (notify_timeout).  But it
    also seemed like a (marginally) useful option either way.

> sage
> 
>>
>>  drivers/block/rbd.c |   28 ++++++++++++++++++++++++----
>>  1 file changed, 24 insertions(+), 4 deletions(-)
>>
>> Index: b/drivers/block/rbd.c
>> ===================================================================
>> --- a/drivers/block/rbd.c
>> +++ b/drivers/block/rbd.c
>> @@ -69,7 +69,8 @@
>>  #define DEV_NAME_LEN		32
>>  #define MAX_INT_FORMAT_WIDTH	((5 * sizeof (int)) / 2 + 1)
>>
>> -#define RBD_NOTIFY_TIMEOUT_DEFAULT 10
>> +#define RBD_NOTIFY_TIMEOUT_DEFAULT	10
>> +#define RBD_READ_ONLY_DEFAULT		false
>>
>>  /*
>>   * block device image metadata (in-memory version)
>> @@ -91,6 +92,7 @@ struct rbd_image_header {
>>
>>  struct rbd_options {
>>  	int	notify_timeout;
>> +	bool	read_only;
>>  };
>>
>>  /*
>> @@ -176,7 +178,7 @@ struct rbd_device {
>>  	u64                     snap_id;	/* current snapshot id */
>>  	/* whether the snap_id this device reads from still exists */
>>  	bool                    snap_exists;
>> -	int                     read_only;
>> +	bool			read_only;
>>
>>  	struct list_head	node;
>>
>> @@ -351,12 +353,21 @@ enum {
>>  	/* int args above */
>>  	Opt_last_string,
>>  	/* string args above */
>> +	Opt_read_only,
>> +	Opt_read_write,
>> +	/* Boolean args above */
>> +	Opt_last_bool,
>>  };
>>
>>  static match_table_t rbd_opts_tokens = {
>>  	{Opt_notify_timeout, "notify_timeout=%d"},
>>  	/* int args above */
>>  	/* string args above */
>> +	{Opt_read_only, "read_only"},
>> +	{Opt_read_only, "ro"},		/* Alternate spelling */
>> +	{Opt_read_write, "read_write"},
>> +	{Opt_read_write, "rw"},		/* Alternate spelling */
>> +	/* Boolean args above */
>>  	{-1, NULL}
>>  };
>>
>> @@ -381,6 +392,8 @@ static int parse_rbd_opts_token(char *c,
>>  	} else if (token > Opt_last_int && token < Opt_last_string) {
>>  		dout("got string token %d val %s\n", token,
>>  		     argstr[0].from);
>> +	} else if (token > Opt_last_string && token < Opt_last_bool) {
>> +		dout("got Boolean token %d\n", token);
>>  	} else {
>>  		dout("got token %d\n", token);
>>  	}
>> @@ -389,6 +402,12 @@ static int parse_rbd_opts_token(char *c,
>>  	case Opt_notify_timeout:
>>  		rbd_opts->notify_timeout = intval;
>>  		break;
>> +	case Opt_read_only:
>> +		rbd_opts->read_only = true;
>> +		break;
>> +	case Opt_read_write:
>> +		rbd_opts->read_only = false;
>> +		break;
>>  	default:
>>  		BUG_ON(token);
>>  	}
>> @@ -407,6 +426,7 @@ static int rbd_get_client(struct rbd_dev
>>  	struct rbd_client *rbdc;
>>
>>  	rbd_opts->notify_timeout = RBD_NOTIFY_TIMEOUT_DEFAULT;
>> +	rbd_opts->read_only = RBD_READ_ONLY_DEFAULT;
>>
>>  	ceph_opts = ceph_parse_options(options, mon_addr,
>>  					mon_addr + mon_addr_len,
>> @@ -620,7 +640,7 @@ static int rbd_header_set_snap(struct rb
>>  		    sizeof (RBD_SNAP_HEAD_NAME))) {
>>  		rbd_dev->snap_id = CEPH_NOSNAP;
>>  		rbd_dev->snap_exists = false;
>> -		rbd_dev->read_only = 0;
>> +		rbd_dev->read_only = rbd_dev->rbd_opts.read_only;
>>  		if (size)
>>  			*size = rbd_dev->header.image_size;
>>  	} else {
>> @@ -632,7 +652,7 @@ static int rbd_header_set_snap(struct rb
>>  			goto done;
>>  		rbd_dev->snap_id = snap_id;
>>  		rbd_dev->snap_exists = true;
>> -		rbd_dev->read_only = 1;
>> +		rbd_dev->read_only = true;	/* No choice for snapshots */
>>  	}
>>
>>  	ret = 0;
>> --


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH, v2 03/11] rbd: kill incore snap_names_len
  2012-09-06 15:36   ` [PATCH, v2 " Alex Elder
@ 2012-09-07 21:22     ` Yehuda Sadeh
  0 siblings, 0 replies; 35+ messages in thread
From: Yehuda Sadeh @ 2012-09-07 21:22 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

On Thu, Sep 6, 2012 at 8:36 AM, Alex Elder <elder@inktank.com> wrote:
> The only thing the on-disk snap_names_len field is needed for is
> to size the buffer allocated to hold a copy of the snapshot names
> for an rbd image.
>
> So don't bother saving it in the in-core rbd_image_header structure.
> Just use a local variable to hold the required buffer size while
> it's needed.
>
> Move the code that actually copies the snapshot names up closer
> to where the required length is saved.
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
> v2: - Return -EIO rather than BUG_ON() as suggested by Yehuda.
>     - Added a comment explaining why a memcpy() will not exceed
>       the length of the on-disk buffer, in response to Yehuda's
>       concern.
>
>  drivers/block/rbd.c |   26 +++++++++++++-------------
>  1 file changed, 13 insertions(+), 13 deletions(-)
>
> Index: b/drivers/block/rbd.c
> ===================================================================
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -81,7 +81,6 @@ struct rbd_image_header {
>         __u8 crypt_type;
>         __u8 comp_type;
>         struct ceph_snap_context *snapc;
> -       u64 snap_names_len;
>         u32 total_snaps;
>
>         char *snap_names;
> @@ -534,12 +533,21 @@ static int rbd_header_from_disk(struct r
>         header->object_prefix[len] = '\0';
>
>         if (snap_count) {
> -               header->snap_names_len = le64_to_cpu(ondisk->snap_names_len);
> -               BUG_ON(header->snap_names_len > (u64) SIZE_MAX);
> -               header->snap_names = kmalloc(header->snap_names_len,
> -                                            GFP_KERNEL);
> +               u64 snap_names_len = le64_to_cpu(ondisk->snap_names_len);
> +
> +               if (snap_names_len > (u64) SIZE_MAX)
> +                       return -EIO;
> +               header->snap_names = kmalloc(snap_names_len, GFP_KERNEL);
>                 if (!header->snap_names)
>                         goto out_err;
> +               /*
> +                * Note that rbd_dev_v1_header_read() guarantees
> +                * the ondisk buffer we're working with has
> +                * snap_names_len bytes beyond the end of the
> +                * snapshot id array, so this memcpy() is safe.
> +                */
> +               memcpy(header->snap_names, &ondisk->snaps[snap_count],
> +                       snap_names_len);
>
>                 size = snap_count * sizeof (*header->snap_sizes);
>                 header->snap_sizes = kmalloc(size, GFP_KERNEL);
> @@ -547,7 +555,6 @@ static int rbd_header_from_disk(struct r
>                         goto out_err;
>         } else {
>                 WARN_ON(ondisk->snap_names_len);
> -               header->snap_names_len = 0;
>                 header->snap_names = NULL;
>                 header->snap_sizes = NULL;
>         }
> @@ -579,10 +586,6 @@ static int rbd_header_from_disk(struct r
>                         header->snap_sizes[i] =
>                                 le64_to_cpu(ondisk->snaps[i].image_size);
>                 }
> -
> -               /* copy snapshot names */
> -               memcpy(header->snap_names, &ondisk->snaps[snap_count],
> -                       header->snap_names_len);
>         }
>
>         return 0;
> @@ -592,7 +595,6 @@ out_err:
>         header->snap_sizes = NULL;
>         kfree(header->snap_names);
>         header->snap_names = NULL;
> -       header->snap_names_len = 0;
>         kfree(header->object_prefix);
>         header->object_prefix = NULL;
>
> @@ -660,7 +662,6 @@ static void rbd_header_free(struct rbd_i
>         header->snap_sizes = NULL;
>         kfree(header->snap_names);
>         header->snap_names = NULL;
> -       header->snap_names_len = 0;
>         ceph_put_snap_context(header->snapc);
>         header->snapc = NULL;
>  }
> @@ -1800,7 +1801,6 @@ static int __rbd_refresh_header(struct r
>         rbd_dev->header.total_snaps = h.total_snaps;
>         rbd_dev->header.snapc = h.snapc;
>         rbd_dev->header.snap_names = h.snap_names;
> -       rbd_dev->header.snap_names_len = h.snap_names_len;
>         rbd_dev->header.snap_sizes = h.snap_sizes;
>         /* Free the extra copy of the object prefix */
>         WARN_ON(strcmp(rbd_dev->header.object_prefix, h.object_prefix));
>
> --

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH, v2 06/11] rbd: add read_only rbd map option
  2012-09-06 15:36   ` [PATCH, v2 " Alex Elder
  2012-09-07 15:45     ` Sage Weil
@ 2012-09-07 21:26     ` Yehuda Sadeh Weinraub
  1 sibling, 0 replies; 35+ messages in thread
From: Yehuda Sadeh Weinraub @ 2012-09-07 21:26 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

On Thu, Sep 6, 2012 at 8:36 AM, Alex Elder <elder@inktank.com> wrote:
> Add the ability to map an rbd image read-only, by specifying either
> "read_only" or "ro" as an option on the rbd "command line."  Also
> allow the inverse to be explicitly specified using "read_write" or
> "rw".
>
> Signed-off-by: Alex Elder <elder@inktank.com>
> ---
> v2: - Made the read_only/ro flag explicitly make the mapping
>       read-only rather than having it toggle.
>     - Added read_write/rw flags to allow a writable mapping to
>       be explicitly requested.  Snapshot mappings are silently
>       mapped read-only (even if read_write was requested).
>
>  drivers/block/rbd.c |   28 ++++++++++++++++++++++++----
>  1 file changed, 24 insertions(+), 4 deletions(-)
>
> Index: b/drivers/block/rbd.c
> ===================================================================
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -69,7 +69,8 @@
>  #define DEV_NAME_LEN           32
>  #define MAX_INT_FORMAT_WIDTH   ((5 * sizeof (int)) / 2 + 1)
>
> -#define RBD_NOTIFY_TIMEOUT_DEFAULT 10
> +#define RBD_NOTIFY_TIMEOUT_DEFAULT     10
> +#define RBD_READ_ONLY_DEFAULT          false
>
>  /*
>   * block device image metadata (in-memory version)
> @@ -91,6 +92,7 @@ struct rbd_image_header {
>
>  struct rbd_options {
>         int     notify_timeout;
> +       bool    read_only;
>  };
>
>  /*
> @@ -176,7 +178,7 @@ struct rbd_device {
>         u64                     snap_id;        /* current snapshot id */
>         /* whether the snap_id this device reads from still exists */
>         bool                    snap_exists;
> -       int                     read_only;
> +       bool                    read_only;
>
>         struct list_head        node;
>
> @@ -351,12 +353,21 @@ enum {
>         /* int args above */
>         Opt_last_string,
>         /* string args above */
> +       Opt_read_only,
> +       Opt_read_write,
> +       /* Boolean args above */
> +       Opt_last_bool,
>  };
>
>  static match_table_t rbd_opts_tokens = {
>         {Opt_notify_timeout, "notify_timeout=%d"},
>         /* int args above */
>         /* string args above */
> +       {Opt_read_only, "read_only"},
> +       {Opt_read_only, "ro"},          /* Alternate spelling */
> +       {Opt_read_write, "read_write"},
> +       {Opt_read_write, "rw"},         /* Alternate spelling */
> +       /* Boolean args above */
>         {-1, NULL}
>  };
>
> @@ -381,6 +392,8 @@ static int parse_rbd_opts_token(char *c,
>         } else if (token > Opt_last_int && token < Opt_last_string) {
>                 dout("got string token %d val %s\n", token,
>                      argstr[0].from);
> +       } else if (token > Opt_last_string && token < Opt_last_bool) {
> +               dout("got Boolean token %d\n", token);
>         } else {
>                 dout("got token %d\n", token);
>         }
> @@ -389,6 +402,12 @@ static int parse_rbd_opts_token(char *c,
>         case Opt_notify_timeout:
>                 rbd_opts->notify_timeout = intval;
>                 break;
> +       case Opt_read_only:
> +               rbd_opts->read_only = true;
> +               break;
> +       case Opt_read_write:
> +               rbd_opts->read_only = false;
> +               break;
>         default:
>                 BUG_ON(token);
>         }
> @@ -407,6 +426,7 @@ static int rbd_get_client(struct rbd_dev
>         struct rbd_client *rbdc;
>
>         rbd_opts->notify_timeout = RBD_NOTIFY_TIMEOUT_DEFAULT;
> +       rbd_opts->read_only = RBD_READ_ONLY_DEFAULT;
>
>         ceph_opts = ceph_parse_options(options, mon_addr,
>                                         mon_addr + mon_addr_len,
> @@ -620,7 +640,7 @@ static int rbd_header_set_snap(struct rb
>                     sizeof (RBD_SNAP_HEAD_NAME))) {
>                 rbd_dev->snap_id = CEPH_NOSNAP;
>                 rbd_dev->snap_exists = false;
> -               rbd_dev->read_only = 0;
> +               rbd_dev->read_only = rbd_dev->rbd_opts.read_only;
>                 if (size)
>                         *size = rbd_dev->header.image_size;
>         } else {
> @@ -632,7 +652,7 @@ static int rbd_header_set_snap(struct rb
>                         goto done;
>                 rbd_dev->snap_id = snap_id;
>                 rbd_dev->snap_exists = true;
> -               rbd_dev->read_only = 1;
> +               rbd_dev->read_only = true;      /* No choice for snapshots */
>         }
>
>         ret = 0;
> --

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 05/11] rbd: move rbd_opts to struct rbd_device
  2012-09-06 14:21     ` Alex Elder
@ 2012-09-07 21:40       ` Yehuda Sadeh
  0 siblings, 0 replies; 35+ messages in thread
From: Yehuda Sadeh @ 2012-09-07 21:40 UTC (permalink / raw)
  To: Alex Elder; +Cc: ceph-devel

On Thu, Sep 6, 2012 at 7:21 AM, Alex Elder <elder@inktank.com> wrote:
> On 08/30/2012 12:07 PM, Yehuda Sadeh wrote:
>> On Fri, Aug 24, 2012 at 9:33 AM, Alex Elder <elder@inktank.com> wrote:
>>> The rbd options don't really apply to the ceph client.  So don't
>>> store a pointer to it in the ceph_client structure, and put them
>>> (a struct, not a pointer) into the rbd_dev structure proper.
>>>
>>> Pass the rbd device structure to rbd_client_create() so it can
>>> assign rbd_dev->rbdc if successful, and have it return an error code
>>> instead of the rbd client pointer.
>>>
>>> Signed-off-by: Alex Elder <elder@inktank.com>
>>> ---
>>>  drivers/block/rbd.c |   49 ++++++++++++++++---------------------------------
>>>  1 file changed, 16 insertions(+), 33 deletions(-)
>>>
>>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>>> index 130dbc2..b40f553 100644
>>> --- a/drivers/block/rbd.c
>>> +++ b/drivers/block/rbd.c
>>> @@ -98,7 +98,6 @@ struct rbd_options {
>>>   */
>>>  struct rbd_client {
>>>         struct ceph_client      *client;
>>> -       struct rbd_options      *rbd_opts;
>>>         struct kref             kref;
>>>         struct list_head        node;
>>>  };
>>> @@ -152,6 +151,7 @@ struct rbd_device {
>>>         struct gendisk          *disk;          /* blkdev's gendisk and rq */
>>>         struct request_queue    *q;
>>>
>>> +       struct rbd_options      rbd_opts;
>>>         struct rbd_client       *rbd_client;
>>>
>>>         char                    name[DEV_NAME_LEN]; /* blkdev name, e.g. rbd3 */
>>> @@ -273,8 +273,7 @@ static const struct block_device_operations rbd_bd_ops = {
>>>   * Initialize an rbd client instance.
>>>   * We own *ceph_opts.
>>>   */
>>> -static struct rbd_client *rbd_client_create(struct ceph_options *ceph_opts,
>>> -                                           struct rbd_options *rbd_opts)
>>> +static struct rbd_client *rbd_client_create(struct ceph_options *ceph_opts)
>>>  {
>>>         struct rbd_client *rbdc;
>>>         int ret = -ENOMEM;
>>> @@ -298,8 +297,6 @@ static struct rbd_client *rbd_client_create(struct ceph_options *ceph_opts,
>>>         if (ret < 0)
>>>                 goto out_err;
>>>
>>> -       rbdc->rbd_opts = rbd_opts;
>>> -
>>>         spin_lock(&rbd_client_list_lock);
>>>         list_add_tail(&rbdc->node, &rbd_client_list);
>>>         spin_unlock(&rbd_client_list_lock);
>>> @@ -402,42 +399,33 @@ static int parse_rbd_opts_token(char *c, void *private)
>>>   * Get a ceph client with specific addr and configuration, if one does
>>>   * not exist create it.
>>>   */
>>> -static struct rbd_client *rbd_get_client(const char *mon_addr,
>>> -                                        size_t mon_addr_len,
>>> -                                        char *options)
>>> +static int rbd_get_client(struct rbd_device *rbd_dev, const char *mon_addr,
>>> +                               size_t mon_addr_len, char *options)
>>>  {
>>> -       struct rbd_client *rbdc;
>>> +       struct rbd_options *rbd_opts = &rbd_dev->rbd_opts;
>>>         struct ceph_options *ceph_opts;
>>> -       struct rbd_options *rbd_opts;
>>> -
>>> -       rbd_opts = kzalloc(sizeof(*rbd_opts), GFP_KERNEL);
>>> -       if (!rbd_opts)
>>> -               return ERR_PTR(-ENOMEM);
>>> +       struct rbd_client *rbdc;
>>>
>>>         rbd_opts->notify_timeout = RBD_NOTIFY_TIMEOUT_DEFAULT;
>>>
>>>         ceph_opts = ceph_parse_options(options, mon_addr,
>>>                                         mon_addr + mon_addr_len,
>>>                                         parse_rbd_opts_token, rbd_opts);
>>> -       if (IS_ERR(ceph_opts)) {
>>> -               kfree(rbd_opts);
>>> -               return ERR_CAST(ceph_opts);
>>> -       }
>>> +       if (IS_ERR(ceph_opts))
>>> +               return PTR_ERR(ceph_opts);
>>>
>>>         rbdc = rbd_client_find(ceph_opts);
>>>         if (rbdc) {
>>>                 /* using an existing client */
>>>                 ceph_destroy_options(ceph_opts);
>>
>> It'll probably be better to use a reference count to control ceph_opts lifecycle
>
> I looked at this.
>
> In this case, ceph_opts is directly tied to a ceph_client.
> If a ceph client exists whose options match the one that
> has been created here, that client will be used, and its
> options get destroyed when the ceph_client gets destroyed
> (and the ceph_client is refcounted).  The ceph_opts
> created here will never be referenced by anything else
> so it's safe to just destroy it.
>
> If no existing rbd_client has a ceph_client with matching
> ceph_options, then one is created below and it will use
> the ceph_opts created here and thus its life cycle will
> be managed by that ceph_client's reference count.
>
> I believe this means there is no reason to reference count
> ceph_opts.
>
> Yehuda, please let me know if you disagree with this.  If
> not, would you sign off on this?
>

Yeah, I see it now. I'm not sure the way it is now is the cleanest
approach, but it'll work for now. Maybe we should keep the existing
ceph_opts in libceph, and have ceph_parse_options() test for a match
and increase its reference count if one exists. Though it might be
that that's what we did before, and it was problematic because rbd
was sharing configs with cephfs.

Anyway,

Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2012-09-07 21:40 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-24 16:26 [PATCH 00/11] rbd: another set of patches Alex Elder
2012-08-24 16:32 ` [PATCH 01/11] rbd: handle locking inside __rbd_client_find() Alex Elder
2012-08-30 16:09   ` Yehuda Sadeh
2012-08-24 16:32 ` [PATCH 02/11] rbd: don't over-allocate space for object prefix Alex Elder
2012-08-30 16:18   ` Yehuda Sadeh
2012-08-24 16:33 ` [PATCH 03/11] rbd: kill incore snap_names_len Alex Elder
2012-08-30 16:24   ` Yehuda Sadeh
2012-08-30 16:41     ` Alex Elder
2012-09-06 15:36   ` [PATCH, v2 " Alex Elder
2012-09-07 21:22     ` Yehuda Sadeh
2012-08-24 16:33 ` [PATCH 04/11] rbd: more cleanup in rbd_header_from_disk() Alex Elder
2012-08-30 16:48   ` Yehuda Sadeh
2012-08-24 16:33 ` [PATCH 05/11] rbd: move rbd_opts to struct rbd_device Alex Elder
2012-08-30 17:07   ` Yehuda Sadeh
2012-09-06 14:21     ` Alex Elder
2012-09-07 21:40       ` Yehuda Sadeh
2012-08-24 16:34 ` [PATCH 06/11] rbd: add read_only rbd map option Alex Elder
2012-08-30 17:29   ` Yehuda Sadeh
2012-08-30 17:39     ` Alex Elder
2012-09-06 15:36   ` [PATCH, v2 " Alex Elder
2012-09-07 15:45     ` Sage Weil
2012-09-07 20:36       ` Alex Elder
2012-09-07 21:26     ` Yehuda Sadeh Weinraub
2012-08-24 16:34 ` [PATCH 07/11] rbd: kill notify_timeout option Alex Elder
2012-08-30 17:31   ` Yehuda Sadeh
2012-08-24 16:35 ` [PATCH 08/11] rbd: bio_chain_clone() cleanups Alex Elder
2012-08-30 17:40   ` Yehuda Sadeh
2012-08-24 16:35 ` [PATCH 09/11] rbd: drop needless test in rbd_rq_fn() Alex Elder
2012-08-30 17:41   ` Yehuda Sadeh
2012-08-24 16:36 ` [PATCH 10/11] rbd: check for overflow in rbd_get_num_segments() Alex Elder
2012-08-30 17:50   ` Yehuda Sadeh
2012-08-24 16:36 ` [PATCH 11/11] rbd: split up rbd_get_segment() Alex Elder
2012-08-30 18:03   ` Yehuda Sadeh
2012-08-30 12:32 ` [PATCH 00/11] rbd: another set of patches Alex Elder
2012-09-06 15:34 ` Alex Elder
