All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] wip-overlap
@ 2014-07-24  8:42 Ilya Dryomov
  2014-07-24  8:42 ` [PATCH 1/8] rbd: show the entire chain of parent images Ilya Dryomov
                   ` (7 more replies)
  0 siblings, 8 replies; 24+ messages in thread
From: Ilya Dryomov @ 2014-07-24  8:42 UTC (permalink / raw)
  To: ceph-devel

Hi,

This series fixes a bug reported by Sandisk in the "Read from clones"
thread about a week ago.  The description along with a simplified
reproducer are in 7/8 and 8/8.  1/8 touches on our sysfs interface.
The rest are cleanups.

Thanks,

                Ilya


Ilya Dryomov (8):
  rbd: show the entire chain of parent images
  rbd: introduce rbd_dev_header_info()
  rbd: remove unnecessary asserts in rbd_dev_image_probe()
  rbd: split rbd_dev_spec_update() into two functions
  rbd: harden rbd_dev_refresh() caller
  rbd: update mapping size only on refresh
  rbd: do not read in parent info before snap context
  rbd: take snap_id into account when reading in parent info

 Documentation/ABI/testing/sysfs-bus-rbd |    4 +-
 drivers/block/rbd.c                     |  261 ++++++++++++++++---------------
 2 files changed, 137 insertions(+), 128 deletions(-)

-- 
1.7.10.4


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/8] rbd: show the entire chain of parent images
  2014-07-24  8:42 [PATCH 0/8] wip-overlap Ilya Dryomov
@ 2014-07-24  8:42 ` Ilya Dryomov
  2014-07-24 12:31   ` Alex Elder
  2014-07-24  8:42 ` [PATCH 2/8] rbd: introduce rbd_dev_header_info() Ilya Dryomov
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Ilya Dryomov @ 2014-07-24  8:42 UTC (permalink / raw)
  To: ceph-devel

Make /sys/bus/rbd/devices/<id>/parent show the entire chain of parent
images.  While at it, kernel sprintf() doesn't return negative values,
casting to unsigned long long is no longer necessary and there is no
good reason to split into multiple sprintf() calls.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
---
 Documentation/ABI/testing/sysfs-bus-rbd |    4 +--
 drivers/block/rbd.c                     |   56 +++++++++++++------------------
 2 files changed, 25 insertions(+), 35 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-bus-rbd b/Documentation/ABI/testing/sysfs-bus-rbd
index 501adc2a9ec7..2ddd680929d8 100644
--- a/Documentation/ABI/testing/sysfs-bus-rbd
+++ b/Documentation/ABI/testing/sysfs-bus-rbd
@@ -94,5 +94,5 @@ current_snap
 
 parent
 
-	Information identifying the pool, image, and snapshot id for
-	the parent image in a layered rbd image (format 2 only).
+	Information identifying the chain of parent images in a layered rbd
+	image.  Entries are separated by empty lines.
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 703b728e05fa..7847fbb949ff 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3685,46 +3685,36 @@ static ssize_t rbd_snap_show(struct device *dev,
 }
 
 /*
- * For an rbd v2 image, shows the pool id, image id, and snapshot id
- * for the parent image.  If there is no parent, simply shows
- * "(no parent image)".
+ * For a v2 image, shows the chain of parent images, separated by empty
+ * lines.  For v1 images or if there is no parent, shows "(no parent
+ * image)".
  */
 static ssize_t rbd_parent_show(struct device *dev,
-			     struct device_attribute *attr,
-			     char *buf)
+			       struct device_attribute *attr,
+			       char *buf)
 {
 	struct rbd_device *rbd_dev = dev_to_rbd_dev(dev);
-	struct rbd_spec *spec = rbd_dev->parent_spec;
-	int count;
-	char *bufp = buf;
+	ssize_t count = 0;
 
-	if (!spec)
+	if (!rbd_dev->parent)
 		return sprintf(buf, "(no parent image)\n");
 
-	count = sprintf(bufp, "pool_id %llu\npool_name %s\n",
-			(unsigned long long) spec->pool_id, spec->pool_name);
-	if (count < 0)
-		return count;
-	bufp += count;
-
-	count = sprintf(bufp, "image_id %s\nimage_name %s\n", spec->image_id,
-			spec->image_name ? spec->image_name : "(unknown)");
-	if (count < 0)
-		return count;
-	bufp += count;
-
-	count = sprintf(bufp, "snap_id %llu\nsnap_name %s\n",
-			(unsigned long long) spec->snap_id, spec->snap_name);
-	if (count < 0)
-		return count;
-	bufp += count;
-
-	count = sprintf(bufp, "overlap %llu\n", rbd_dev->parent_overlap);
-	if (count < 0)
-		return count;
-	bufp += count;
-
-	return (ssize_t) (bufp - buf);
+	for ( ; rbd_dev->parent; rbd_dev = rbd_dev->parent) {
+		struct rbd_spec *spec = rbd_dev->parent_spec;
+
+		count += sprintf(&buf[count], "%s"
+			    "pool_id %llu\npool_name %s\n"
+			    "image_id %s\nimage_name %s\n"
+			    "snap_id %llu\nsnap_name %s\n"
+			    "overlap %llu\n",
+			    !count ? "" : "\n", /* first? */
+			    spec->pool_id, spec->pool_name,
+			    spec->image_id, spec->image_name ?: "(unknown)",
+			    spec->snap_id, spec->snap_name,
+			    rbd_dev->parent_overlap);
+	}
+
+	return count;
 }
 
 static ssize_t rbd_image_refresh(struct device *dev,
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/8] rbd: introduce rbd_dev_header_info()
  2014-07-24  8:42 [PATCH 0/8] wip-overlap Ilya Dryomov
  2014-07-24  8:42 ` [PATCH 1/8] rbd: show the entire chain of parent images Ilya Dryomov
@ 2014-07-24  8:42 ` Ilya Dryomov
  2014-07-24 12:34   ` Alex Elder
  2014-07-24  8:42 ` [PATCH 3/8] rbd: remove unnecessary asserts in rbd_dev_image_probe() Ilya Dryomov
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Ilya Dryomov @ 2014-07-24  8:42 UTC (permalink / raw)
  To: ceph-devel

A wrapper around rbd_dev_v{1,2}_header_info() to reduce duplication.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
---
 drivers/block/rbd.c |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 7847fbb949ff..0d3be608f16f 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -514,7 +514,7 @@ static void rbd_dev_remove_parent(struct rbd_device *rbd_dev);
 
 static int rbd_dev_refresh(struct rbd_device *rbd_dev);
 static int rbd_dev_v2_header_onetime(struct rbd_device *rbd_dev);
-static int rbd_dev_v2_header_info(struct rbd_device *rbd_dev);
+static int rbd_dev_header_info(struct rbd_device *rbd_dev);
 static const char *rbd_dev_v2_snap_name(struct rbd_device *rbd_dev,
 					u64 snap_id);
 static int _rbd_dev_v2_snap_size(struct rbd_device *rbd_dev, u64 snap_id,
@@ -3506,13 +3506,10 @@ static int rbd_dev_refresh(struct rbd_device *rbd_dev)
 	u64 mapping_size;
 	int ret;
 
-	rbd_assert(rbd_image_format_valid(rbd_dev->image_format));
 	down_write(&rbd_dev->header_rwsem);
 	mapping_size = rbd_dev->mapping.size;
-	if (rbd_dev->image_format == 1)
-		ret = rbd_dev_v1_header_info(rbd_dev);
-	else
-		ret = rbd_dev_v2_header_info(rbd_dev);
+
+	ret = rbd_dev_header_info(rbd_dev);
 
 	/* If it's a mapped snapshot, validate its EXISTS flag */
 
@@ -4501,6 +4498,16 @@ static int rbd_dev_v2_header_info(struct rbd_device *rbd_dev)
 	return ret;
 }
 
+static int rbd_dev_header_info(struct rbd_device *rbd_dev)
+{
+	rbd_assert(rbd_image_format_valid(rbd_dev->image_format));
+
+	if (rbd_dev->image_format == 1)
+		return rbd_dev_v1_header_info(rbd_dev);
+
+	return rbd_dev_v2_header_info(rbd_dev);
+}
+
 static int rbd_bus_add_dev(struct rbd_device *rbd_dev)
 {
 	struct device *dev;
@@ -5149,10 +5156,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
 			goto out_header_name;
 	}
 
-	if (rbd_dev->image_format == 1)
-		ret = rbd_dev_v1_header_info(rbd_dev);
-	else
-		ret = rbd_dev_v2_header_info(rbd_dev);
+	ret = rbd_dev_header_info(rbd_dev);
 	if (ret)
 		goto err_out_watch;
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/8] rbd: remove unnecessary asserts in rbd_dev_image_probe()
  2014-07-24  8:42 [PATCH 0/8] wip-overlap Ilya Dryomov
  2014-07-24  8:42 ` [PATCH 1/8] rbd: show the entire chain of parent images Ilya Dryomov
  2014-07-24  8:42 ` [PATCH 2/8] rbd: introduce rbd_dev_header_info() Ilya Dryomov
@ 2014-07-24  8:42 ` Ilya Dryomov
  2014-07-24 12:40   ` Alex Elder
  2014-07-24  8:42 ` [PATCH 4/8] rbd: split rbd_dev_spec_update() into two functions Ilya Dryomov
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Ilya Dryomov @ 2014-07-24  8:42 UTC (permalink / raw)
  To: ceph-devel

spec->image_id assert doesn't buy us much and image_format is asserted
in rbd_dev_header_name() and rbd_dev_header_info() anyway.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
---
 drivers/block/rbd.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 0d3be608f16f..4541f6027e4a 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -5143,8 +5143,6 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
 	ret = rbd_dev_image_id(rbd_dev);
 	if (ret)
 		return ret;
-	rbd_assert(rbd_dev->spec->image_id);
-	rbd_assert(rbd_image_format_valid(rbd_dev->image_format));
 
 	ret = rbd_dev_header_name(rbd_dev);
 	if (ret)
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/8] rbd: split rbd_dev_spec_update() into two functions
  2014-07-24  8:42 [PATCH 0/8] wip-overlap Ilya Dryomov
                   ` (2 preceding siblings ...)
  2014-07-24  8:42 ` [PATCH 3/8] rbd: remove unnecessary asserts in rbd_dev_image_probe() Ilya Dryomov
@ 2014-07-24  8:42 ` Ilya Dryomov
  2014-07-24 12:55   ` Alex Elder
  2014-07-24  8:42 ` [PATCH 5/8] rbd: harden rbd_dev_refresh() caller Ilya Dryomov
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Ilya Dryomov @ 2014-07-24  8:42 UTC (permalink / raw)
  To: ceph-devel

rbd_dev_spec_update() has two modes of operation, with nothing in
common between them.  Split it into two functions, one for each mode
and make our expectations more clear.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
---
 drivers/block/rbd.c |   79 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 48 insertions(+), 31 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 4541f6027e4a..23df1773ef77 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3798,6 +3798,9 @@ static struct rbd_spec *rbd_spec_alloc(void)
 	spec = kzalloc(sizeof (*spec), GFP_KERNEL);
 	if (!spec)
 		return NULL;
+
+	spec->pool_id = CEPH_NOPOOL;
+	spec->snap_id = CEPH_NOSNAP;
 	kref_init(&spec->kref);
 
 	return spec;
@@ -4257,18 +4260,38 @@ static u64 rbd_snap_id_by_name(struct rbd_device *rbd_dev, const char *name)
 }
 
 /*
- * When an rbd image has a parent image, it is identified by the
- * pool, image, and snapshot ids (not names).  This function fills
- * in the names for those ids.  (It's OK if we can't figure out the
- * name for an image id, but the pool and snapshot ids should always
- * exist and have names.)  All names in an rbd spec are dynamically
- * allocated.
+ * An image being mapped will have everything but the snap id.
+ */
+static int rbd_spec_fill_snap_id(struct rbd_device *rbd_dev)
+{
+	struct rbd_spec *spec = rbd_dev->spec;
+
+	rbd_assert(spec->pool_id != CEPH_NOPOOL && spec->pool_name);
+	rbd_assert(spec->image_id && spec->image_name);
+	rbd_assert(spec->snap_name);
+
+	if (strcmp(spec->snap_name, RBD_SNAP_HEAD_NAME)) {
+		u64 snap_id;
+
+		snap_id = rbd_snap_id_by_name(rbd_dev, spec->snap_name);
+		if (snap_id == CEPH_NOSNAP)
+			return -ENOENT;
+
+		spec->snap_id = snap_id;
+	} else {
+		spec->snap_id = CEPH_NOSNAP;
+	}
+
+	return 0;
+}
+
+/*
+ * A parent image will have all ids but none of the names.
  *
- * When an image being mapped (not a parent) is probed, we have the
- * pool name and pool id, image name and image id, and the snapshot
- * name.  The only thing we're missing is the snapshot id.
+ * All names in an rbd spec are dynamically allocated.  It's OK if we
+ * can't figure out the name for an image id.
  */
-static int rbd_dev_spec_update(struct rbd_device *rbd_dev)
+static int rbd_spec_fill_names(struct rbd_device *rbd_dev)
 {
 	struct ceph_osd_client *osdc = &rbd_dev->rbd_client->client->osdc;
 	struct rbd_spec *spec = rbd_dev->spec;
@@ -4277,24 +4300,9 @@ static int rbd_dev_spec_update(struct rbd_device *rbd_dev)
 	const char *snap_name;
 	int ret;
 
-	/*
-	 * An image being mapped will have the pool name (etc.), but
-	 * we need to look up the snapshot id.
-	 */
-	if (spec->pool_name) {
-		if (strcmp(spec->snap_name, RBD_SNAP_HEAD_NAME)) {
-			u64 snap_id;
-
-			snap_id = rbd_snap_id_by_name(rbd_dev, spec->snap_name);
-			if (snap_id == CEPH_NOSNAP)
-				return -ENOENT;
-			spec->snap_id = snap_id;
-		} else {
-			spec->snap_id = CEPH_NOSNAP;
-		}
-
-		return 0;
-	}
+	rbd_assert(spec->pool_id != CEPH_NOPOOL);
+	rbd_assert(spec->image_id);
+	rbd_assert(spec->snap_id != CEPH_NOSNAP);
 
 	/* Get the pool name; we have to make our own copy of this */
 
@@ -4313,7 +4321,7 @@ static int rbd_dev_spec_update(struct rbd_device *rbd_dev)
 	if (!image_name)
 		rbd_warn(rbd_dev, "unable to get image name");
 
-	/* Look up the snapshot name, and make a copy */
+	/* Fetch the snapshot name */
 
 	snap_name = rbd_snap_name(rbd_dev, spec->snap_id);
 	if (IS_ERR(snap_name)) {
@@ -4326,10 +4334,10 @@ static int rbd_dev_spec_update(struct rbd_device *rbd_dev)
 	spec->snap_name = snap_name;
 
 	return 0;
+
 out_err:
 	kfree(image_name);
 	kfree(pool_name);
-
 	return ret;
 }
 
@@ -5158,7 +5166,16 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
 	if (ret)
 		goto err_out_watch;
 
-	ret = rbd_dev_spec_update(rbd_dev);
+	/*
+	 * If this image is the one being mapped, we have pool name and
+	 * id, image name and id, and snap name - need to fill snap id.
+	 * Otherwise this is a parent image, identified by pool, image
+	 * and snap ids - need to fill in names for those ids.
+	 */
+	if (mapping)
+		ret = rbd_spec_fill_snap_id(rbd_dev);
+	else
+		ret = rbd_spec_fill_names(rbd_dev);
 	if (ret)
 		goto err_out_probe;
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/8] rbd: harden rbd_dev_refresh() caller
  2014-07-24  8:42 [PATCH 0/8] wip-overlap Ilya Dryomov
                   ` (3 preceding siblings ...)
  2014-07-24  8:42 ` [PATCH 4/8] rbd: split rbd_dev_spec_update() into two functions Ilya Dryomov
@ 2014-07-24  8:42 ` Ilya Dryomov
  2014-07-24 13:09   ` Alex Elder
  2014-07-24  8:42 ` [PATCH 6/8] rbd: update mapping size only on refresh Ilya Dryomov
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 24+ messages in thread
From: Ilya Dryomov @ 2014-07-24  8:42 UTC (permalink / raw)
  To: ceph-devel

Recently discovered watch/notify problems showed that we really can't
ignore errors in anything refresh related.  Alas, currently there is
not much we can do in response to those errors, except print warnings.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
---
 drivers/block/rbd.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 23df1773ef77..5dd6f5057ef4 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -2963,11 +2963,20 @@ static void rbd_watch_cb(u64 ver, u64 notify_id, u8 opcode, void *data)
 	dout("%s: \"%s\" notify_id %llu opcode %u\n", __func__,
 		rbd_dev->header_name, (unsigned long long)notify_id,
 		(unsigned int)opcode);
+
+	/*
+	 * Until adequate refresh error handling is in place, there is
+	 * not much we can do here, except warn.
+	 *
+	 * See http://tracker.ceph.com/issues/5040
+	 */
 	ret = rbd_dev_refresh(rbd_dev);
 	if (ret)
-		rbd_warn(rbd_dev, "header refresh error (%d)\n", ret);
+		rbd_warn(rbd_dev, "refresh failed: %d\n", ret);
 
-	rbd_obj_notify_ack_sync(rbd_dev, notify_id);
+	ret = rbd_obj_notify_ack_sync(rbd_dev, notify_id);
+	if (ret)
+		rbd_warn(rbd_dev, "notify_ack ret %d\n", ret);
 }
 
 /*
@@ -3724,9 +3733,9 @@ static ssize_t rbd_image_refresh(struct device *dev,
 
 	ret = rbd_dev_refresh(rbd_dev);
 	if (ret)
-		rbd_warn(rbd_dev, ": manual header refresh error (%d)\n", ret);
+		return ret;
 
-	return ret < 0 ? ret : size;
+	return size;
 }
 
 static DEVICE_ATTR(size, S_IRUGO, rbd_size_show, NULL);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 6/8] rbd: update mapping size only on refresh
  2014-07-24  8:42 [PATCH 0/8] wip-overlap Ilya Dryomov
                   ` (4 preceding siblings ...)
  2014-07-24  8:42 ` [PATCH 5/8] rbd: harden rbd_dev_refresh() caller Ilya Dryomov
@ 2014-07-24  8:42 ` Ilya Dryomov
  2014-07-24 13:25   ` Alex Elder
  2014-07-24  8:42 ` [PATCH 7/8] rbd: do not read in parent info before snap context Ilya Dryomov
  2014-07-24  8:42 ` [PATCH 8/8] rbd: take snap_id into account when reading in parent info Ilya Dryomov
  7 siblings, 1 reply; 24+ messages in thread
From: Ilya Dryomov @ 2014-07-24  8:42 UTC (permalink / raw)
  To: ceph-devel

There is no sense in trying to update the mapping size before it's even
been set.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
---
 drivers/block/rbd.c |   19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 5dd6f5057ef4..16f388f2960b 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -971,12 +971,6 @@ static int rbd_header_from_disk(struct rbd_device *rbd_dev,
 	header->snap_names = snap_names;
 	header->snap_sizes = snap_sizes;
 
-	/* Make sure mapping size is consistent with header info */
-
-	if (rbd_dev->spec->snap_id == CEPH_NOSNAP || first_time)
-		if (rbd_dev->mapping.size != header->image_size)
-			rbd_dev->mapping.size = header->image_size;
-
 	return 0;
 out_2big:
 	ret = -EIO;
@@ -3520,9 +3514,14 @@ static int rbd_dev_refresh(struct rbd_device *rbd_dev)
 
 	ret = rbd_dev_header_info(rbd_dev);
 
-	/* If it's a mapped snapshot, validate its EXISTS flag */
+	if (rbd_dev->spec->snap_id == CEPH_NOSNAP) {
+		if (rbd_dev->mapping.size != rbd_dev->header.image_size)
+			rbd_dev->mapping.size = rbd_dev->header.image_size;
+	} else {
+		/* validate mapped snapshot's EXISTS flag */
+		rbd_exists_validate(rbd_dev);
+	}
 
-	rbd_exists_validate(rbd_dev);
 	up_write(&rbd_dev->header_rwsem);
 
 	if (mapping_size != rbd_dev->mapping.size) {
@@ -4505,10 +4504,6 @@ static int rbd_dev_v2_header_info(struct rbd_device *rbd_dev)
 					"is EXPERIMENTAL!");
 	}
 
-	if (rbd_dev->spec->snap_id == CEPH_NOSNAP)
-		if (rbd_dev->mapping.size != rbd_dev->header.image_size)
-			rbd_dev->mapping.size = rbd_dev->header.image_size;
-
 	ret = rbd_dev_v2_snap_context(rbd_dev);
 	dout("rbd_dev_v2_snap_context returned %d\n", ret);
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 7/8] rbd: do not read in parent info before snap context
  2014-07-24  8:42 [PATCH 0/8] wip-overlap Ilya Dryomov
                   ` (5 preceding siblings ...)
  2014-07-24  8:42 ` [PATCH 6/8] rbd: update mapping size only on refresh Ilya Dryomov
@ 2014-07-24  8:42 ` Ilya Dryomov
  2014-07-25  8:14   ` Alex Elder
  2014-07-24  8:42 ` [PATCH 8/8] rbd: take snap_id into account when reading in parent info Ilya Dryomov
  7 siblings, 1 reply; 24+ messages in thread
From: Ilya Dryomov @ 2014-07-24  8:42 UTC (permalink / raw)
  To: ceph-devel

Currently rbd_dev_v2_header_info() reads in parent info before the snap
context is read in.  This is wrong, because we may need to look at the
the parent_overlap value of the snapshot instead of that of the base
image, for example when mapping a snapshot - see next commit.  (When
mapping a snapshot, all we got is its name and we need the snap context
to translate that name into an id to know which parent info to look
for).

The approach taken here is to make sure rbd_dev_v2_parent_info() is
called after the snap context has been read in.  The other approach
would be to add a parent_overlap field to struct rbd_mapping and
maintain it the same way rbd_mapping::size is maintained.  The reason
I chose the first approach is that the value of keeping around both
base image values and the actual mapping values is unclear to me.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
---
 drivers/block/rbd.c |   64 ++++++++++++++++++++++++---------------------------
 1 file changed, 30 insertions(+), 34 deletions(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 16f388f2960b..c4606987e9d1 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -515,6 +515,7 @@ static void rbd_dev_remove_parent(struct rbd_device *rbd_dev);
 static int rbd_dev_refresh(struct rbd_device *rbd_dev);
 static int rbd_dev_v2_header_onetime(struct rbd_device *rbd_dev);
 static int rbd_dev_header_info(struct rbd_device *rbd_dev);
+static int rbd_dev_v2_parent_info(struct rbd_device *rbd_dev);
 static const char *rbd_dev_v2_snap_name(struct rbd_device *rbd_dev,
 					u64 snap_id);
 static int _rbd_dev_v2_snap_size(struct rbd_device *rbd_dev, u64 snap_id,
@@ -3513,6 +3514,18 @@ static int rbd_dev_refresh(struct rbd_device *rbd_dev)
 	mapping_size = rbd_dev->mapping.size;
 
 	ret = rbd_dev_header_info(rbd_dev);
+	if (ret)
+		return ret;
+
+	/*
+	 * If there is a parent, see if it has disappeared due to the
+	 * mapped image getting flattened.
+	 */
+	if (rbd_dev->parent) {
+		ret = rbd_dev_v2_parent_info(rbd_dev);
+		if (ret)
+			return ret;
+	}
 
 	if (rbd_dev->spec->snap_id == CEPH_NOSNAP) {
 		if (rbd_dev->mapping.size != rbd_dev->header.image_size)
@@ -3524,11 +3537,10 @@ static int rbd_dev_refresh(struct rbd_device *rbd_dev)
 
 	up_write(&rbd_dev->header_rwsem);
 
-	if (mapping_size != rbd_dev->mapping.size) {
+	if (mapping_size != rbd_dev->mapping.size)
 		rbd_dev_update_size(rbd_dev);
-	}
 
-	return ret;
+	return 0;
 }
 
 static int rbd_init_disk(struct rbd_device *rbd_dev)
@@ -4477,33 +4489,6 @@ static int rbd_dev_v2_header_info(struct rbd_device *rbd_dev)
 			return ret;
 	}
 
-	/*
-	 * If the image supports layering, get the parent info.  We
-	 * need to probe the first time regardless.  Thereafter we
-	 * only need to if there's a parent, to see if it has
-	 * disappeared due to the mapped image getting flattened.
-	 */
-	if (rbd_dev->header.features & RBD_FEATURE_LAYERING &&
-			(first_time || rbd_dev->parent_spec)) {
-		bool warn;
-
-		ret = rbd_dev_v2_parent_info(rbd_dev);
-		if (ret)
-			return ret;
-
-		/*
-		 * Print a warning if this is the initial probe and
-		 * the image has a parent.  Don't print it if the
-		 * image now being probed is itself a parent.  We
-		 * can tell at this point because we won't know its
-		 * pool name yet (just its pool id).
-		 */
-		warn = rbd_dev->parent_spec && rbd_dev->spec->pool_name;
-		if (first_time && warn)
-			rbd_warn(rbd_dev, "WARNING: kernel layering "
-					"is EXPERIMENTAL!");
-	}
-
 	ret = rbd_dev_v2_snap_context(rbd_dev);
 	dout("rbd_dev_v2_snap_context returned %d\n", ret);
 
@@ -5183,14 +5168,28 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
 	if (ret)
 		goto err_out_probe;
 
+	if (rbd_dev->header.features & RBD_FEATURE_LAYERING) {
+		ret = rbd_dev_v2_parent_info(rbd_dev);
+		if (ret)
+			goto err_out_probe;
+
+		/*
+		 * Need to warn users if this image is the one being
+		 * mapped and has a parent.
+		 */
+		if (mapping && rbd_dev->parent_spec)
+			rbd_warn(rbd_dev,
+				 "WARNING: kernel layering is EXPERIMENTAL!");
+	}
+
 	ret = rbd_dev_probe_parent(rbd_dev);
 	if (ret)
 		goto err_out_probe;
 
 	dout("discovered format %u image, header name is %s\n",
 		rbd_dev->image_format, rbd_dev->header_name);
-
 	return 0;
+
 err_out_probe:
 	rbd_dev_unprobe(rbd_dev);
 err_out_watch:
@@ -5203,9 +5202,6 @@ err_out_format:
 	rbd_dev->image_format = 0;
 	kfree(rbd_dev->spec->image_id);
 	rbd_dev->spec->image_id = NULL;
-
-	dout("probe failed, returning %d\n", ret);
-
 	return ret;
 }
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 8/8] rbd: take snap_id into account when reading in parent info
  2014-07-24  8:42 [PATCH 0/8] wip-overlap Ilya Dryomov
                   ` (6 preceding siblings ...)
  2014-07-24  8:42 ` [PATCH 7/8] rbd: do not read in parent info before snap context Ilya Dryomov
@ 2014-07-24  8:42 ` Ilya Dryomov
  2014-07-24 18:43   ` Alex Elder
  7 siblings, 1 reply; 24+ messages in thread
From: Ilya Dryomov @ 2014-07-24  8:42 UTC (permalink / raw)
  To: ceph-devel

If we are mapping a snapshot, we must read in the parent_overlap value
of that snapshot instead of that of the base image.  Not doing so may
in particular result in us returning zeros instead of user data:

    # cat overlap-snap.sh
    #!/bin/bash
    rbd create --size 10 --image-format 2 foo
    FOO_DEV=$(rbd map foo)
    dd if=/dev/urandom of=/dev/rbd0 bs=1M &>/dev/null
    echo "Base image"
    dd if=$FOO_DEV bs=1 count=16 skip=$(((4 << 20) - 8)) 2>/dev/null | xxd
    rbd snap create foo@snap
    rbd snap protect foo@snap
    rbd clone foo@snap bar
    rbd snap create bar@snap
    BAR_DEV=$(rbd map bar@snap)
    echo "Snapshot"
    dd if=$BAR_DEV bs=1 count=16 skip=$(((4 << 20) - 8)) 2>/dev/null | xxd
    rbd resize --allow-shrink --size 4 bar
    echo "Snapshot after base image resize"
    dd if=$BAR_DEV bs=1 count=16 skip=$(((4 << 20) - 8)) 2>/dev/null | xxd

    # ./overlap-snap.sh
    Base image
    0000000: e781 e33b d34b 2225 6034 2845 a2e3 36ed  ...;.K"%`4(E..6.
    Snapshot
    0000000: e781 e33b d34b 2225 6034 2845 a2e3 36ed  ...;.K"%`4(E..6.
    Resizing image: 100% complete...done.
    Snapshot after base image resize
    0000000: e781 e33b d34b 2225 0000 0000 0000 0000  ...;.K"%........

Even though bar@snap was taken with the old bar parent_overlap (8M),
reads from bar@snap beyond the new bar parent_overlap (4M) return
zeroes.  Fix it.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
---
 drivers/block/rbd.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index c4606987e9d1..cbc89fa9a677 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4020,7 +4020,7 @@ static int rbd_dev_v2_parent_info(struct rbd_device *rbd_dev)
 		goto out_err;
 	}
 
-	snapid = cpu_to_le64(CEPH_NOSNAP);
+	snapid = cpu_to_le64(rbd_dev->spec->snap_id);
 	ret = rbd_obj_method_sync(rbd_dev, rbd_dev->header_name,
 				"rbd", "get_parent",
 				&snapid, sizeof (snapid),
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/8] rbd: show the entire chain of parent images
  2014-07-24  8:42 ` [PATCH 1/8] rbd: show the entire chain of parent images Ilya Dryomov
@ 2014-07-24 12:31   ` Alex Elder
  2014-07-24 12:45     ` Ilya Dryomov
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Elder @ 2014-07-24 12:31 UTC (permalink / raw)
  To: Ilya Dryomov, ceph-devel

On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
> Make /sys/bus/rbd/devices/<id>/parent show the entire chain of parent
> images.  While at it, kernel sprintf() doesn't return negative values,
> casting to unsigned long long is no longer necessary and there is no
> good reason to split into multiple sprintf() calls.

I like the use of a single snprintf() call, it is much less
cumbersome.

The reason I always cast u64 values to (unsigned long long)
in printf() calls is because on some 64-bit architectures,
u64 is defined as simply (unsigned long), so without the
cast they spit out warnings.  I hate this, but it is reality
(see include/asm-generic/{int-l64.h,int-ll64.h}).  You can
alternatively use %PRIu64 rather than %llu in your format
strings, but I think I hate that more.  Anyway, if you want
to avoid warnings on all architectures you should fix that.

As another aside, I've been too timid to use the ?: conditional
expression without its middle operand.  I have no objection
to it at all, I like it.  I bring it up because I recently
got burned for using a gcc feature that wasn't supported
on older compilers (unnamed struct/union fields)--specifically
a version newer than gcc 3.2, which is the minimum supported
version for the kernel (see Documentation/Changes).

But fear not!  That extension is supported in gcc 3.2:

https://gcc.gnu.org/onlinedocs/gcc-3.2/gcc/Conditionals.html#Conditionals
Just FYI...

Reviewed-by: Alex Elder <elder@linaro.org>

> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
> ---
>  Documentation/ABI/testing/sysfs-bus-rbd |    4 +--
>  drivers/block/rbd.c                     |   56 +++++++++++++------------------
>  2 files changed, 25 insertions(+), 35 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-rbd b/Documentation/ABI/testing/sysfs-bus-rbd
> index 501adc2a9ec7..2ddd680929d8 100644
> --- a/Documentation/ABI/testing/sysfs-bus-rbd
> +++ b/Documentation/ABI/testing/sysfs-bus-rbd
> @@ -94,5 +94,5 @@ current_snap
>  
>  parent
>  
> -	Information identifying the pool, image, and snapshot id for
> -	the parent image in a layered rbd image (format 2 only).
> +	Information identifying the chain of parent images in a layered rbd
> +	image.  Entries are separated by empty lines.
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 703b728e05fa..7847fbb949ff 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -3685,46 +3685,36 @@ static ssize_t rbd_snap_show(struct device *dev,
>  }
>  
>  /*
> - * For an rbd v2 image, shows the pool id, image id, and snapshot id
> - * for the parent image.  If there is no parent, simply shows
> - * "(no parent image)".
> + * For a v2 image, shows the chain of parent images, separated by empty
> + * lines.  For v1 images or if there is no parent, shows "(no parent
> + * image)".
>   */
>  static ssize_t rbd_parent_show(struct device *dev,
> -			     struct device_attribute *attr,
> -			     char *buf)
> +			       struct device_attribute *attr,
> +			       char *buf)
>  {
>  	struct rbd_device *rbd_dev = dev_to_rbd_dev(dev);
> -	struct rbd_spec *spec = rbd_dev->parent_spec;
> -	int count;
> -	char *bufp = buf;
> +	ssize_t count = 0;
>  
> -	if (!spec)
> +	if (!rbd_dev->parent)
>  		return sprintf(buf, "(no parent image)\n");
>  
> -	count = sprintf(bufp, "pool_id %llu\npool_name %s\n",
> -			(unsigned long long) spec->pool_id, spec->pool_name);
> -	if (count < 0)
> -		return count;
> -	bufp += count;
> -
> -	count = sprintf(bufp, "image_id %s\nimage_name %s\n", spec->image_id,
> -			spec->image_name ? spec->image_name : "(unknown)");
> -	if (count < 0)
> -		return count;
> -	bufp += count;
> -
> -	count = sprintf(bufp, "snap_id %llu\nsnap_name %s\n",
> -			(unsigned long long) spec->snap_id, spec->snap_name);
> -	if (count < 0)
> -		return count;
> -	bufp += count;
> -
> -	count = sprintf(bufp, "overlap %llu\n", rbd_dev->parent_overlap);
> -	if (count < 0)
> -		return count;
> -	bufp += count;
> -
> -	return (ssize_t) (bufp - buf);
> +	for ( ; rbd_dev->parent; rbd_dev = rbd_dev->parent) {
> +		struct rbd_spec *spec = rbd_dev->parent_spec;
> +
> +		count += sprintf(&buf[count], "%s"
> +			    "pool_id %llu\npool_name %s\n"
> +			    "image_id %s\nimage_name %s\n"
> +			    "snap_id %llu\nsnap_name %s\n"
> +			    "overlap %llu\n",
> +			    !count ? "" : "\n", /* first? */
> +			    spec->pool_id, spec->pool_name,
> +			    spec->image_id, spec->image_name ?: "(unknown)",
> +			    spec->snap_id, spec->snap_name,
> +			    rbd_dev->parent_overlap);
> +	}
> +
> +	return count;
>  }
>  
>  static ssize_t rbd_image_refresh(struct device *dev,
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/8] rbd: introduce rbd_dev_header_info()
  2014-07-24  8:42 ` [PATCH 2/8] rbd: introduce rbd_dev_header_info() Ilya Dryomov
@ 2014-07-24 12:34   ` Alex Elder
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Elder @ 2014-07-24 12:34 UTC (permalink / raw)
  To: Ilya Dryomov, ceph-devel

On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
> A wrapper around rbd_dev_v{1,2}_header_info() to reduce duplication.

Looks good.

Reviewed-by: Alex Elder <elder@linaro.org>

> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
> ---
>  drivers/block/rbd.c |   24 ++++++++++++++----------
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 7847fbb949ff..0d3be608f16f 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -514,7 +514,7 @@ static void rbd_dev_remove_parent(struct rbd_device *rbd_dev);
>  
>  static int rbd_dev_refresh(struct rbd_device *rbd_dev);
>  static int rbd_dev_v2_header_onetime(struct rbd_device *rbd_dev);
> -static int rbd_dev_v2_header_info(struct rbd_device *rbd_dev);
> +static int rbd_dev_header_info(struct rbd_device *rbd_dev);
>  static const char *rbd_dev_v2_snap_name(struct rbd_device *rbd_dev,
>  					u64 snap_id);
>  static int _rbd_dev_v2_snap_size(struct rbd_device *rbd_dev, u64 snap_id,
> @@ -3506,13 +3506,10 @@ static int rbd_dev_refresh(struct rbd_device *rbd_dev)
>  	u64 mapping_size;
>  	int ret;
>  
> -	rbd_assert(rbd_image_format_valid(rbd_dev->image_format));
>  	down_write(&rbd_dev->header_rwsem);
>  	mapping_size = rbd_dev->mapping.size;
> -	if (rbd_dev->image_format == 1)
> -		ret = rbd_dev_v1_header_info(rbd_dev);
> -	else
> -		ret = rbd_dev_v2_header_info(rbd_dev);
> +
> +	ret = rbd_dev_header_info(rbd_dev);
>  
>  	/* If it's a mapped snapshot, validate its EXISTS flag */
>  
> @@ -4501,6 +4498,16 @@ static int rbd_dev_v2_header_info(struct rbd_device *rbd_dev)
>  	return ret;
>  }
>  
> +static int rbd_dev_header_info(struct rbd_device *rbd_dev)
> +{
> +	rbd_assert(rbd_image_format_valid(rbd_dev->image_format));
> +
> +	if (rbd_dev->image_format == 1)
> +		return rbd_dev_v1_header_info(rbd_dev);
> +
> +	return rbd_dev_v2_header_info(rbd_dev);
> +}
> +
>  static int rbd_bus_add_dev(struct rbd_device *rbd_dev)
>  {
>  	struct device *dev;
> @@ -5149,10 +5156,7 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
>  			goto out_header_name;
>  	}
>  
> -	if (rbd_dev->image_format == 1)
> -		ret = rbd_dev_v1_header_info(rbd_dev);
> -	else
> -		ret = rbd_dev_v2_header_info(rbd_dev);
> +	ret = rbd_dev_header_info(rbd_dev);
>  	if (ret)
>  		goto err_out_watch;
>  
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/8] rbd: remove unnecessary asserts in rbd_dev_image_probe()
  2014-07-24  8:42 ` [PATCH 3/8] rbd: remove unnecessary asserts in rbd_dev_image_probe() Ilya Dryomov
@ 2014-07-24 12:40   ` Alex Elder
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Elder @ 2014-07-24 12:40 UTC (permalink / raw)
  To: Ilya Dryomov, ceph-devel

On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
> spec->image_id assert doesn't buy us much and image_format is asserted
> in rbd_dev_header_name() and rbd_dev_header_info() anyway.

Looks good.  Over time I'm sure there are more assertions
that can go away; they were most useful during rapid development
when confidence in certain things was lower.

Reviewed-by: Alex Elder <elder@linaro.org>


> 
> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
> ---
>  drivers/block/rbd.c |    2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 0d3be608f16f..4541f6027e4a 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -5143,8 +5143,6 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
>  	ret = rbd_dev_image_id(rbd_dev);
>  	if (ret)
>  		return ret;
> -	rbd_assert(rbd_dev->spec->image_id);
> -	rbd_assert(rbd_image_format_valid(rbd_dev->image_format));
>  
>  	ret = rbd_dev_header_name(rbd_dev);
>  	if (ret)
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/8] rbd: show the entire chain of parent images
  2014-07-24 12:31   ` Alex Elder
@ 2014-07-24 12:45     ` Ilya Dryomov
  0 siblings, 0 replies; 24+ messages in thread
From: Ilya Dryomov @ 2014-07-24 12:45 UTC (permalink / raw)
  To: Alex Elder; +Cc: Ceph Development

On Thu, Jul 24, 2014 at 4:31 PM, Alex Elder <elder@ieee.org> wrote:
> On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
>> Make /sys/bus/rbd/devices/<id>/parent show the entire chain of parent
>> images.  While at it, kernel sprintf() doesn't return negative values,
>> casting to unsigned long long is no longer necessary and there is no
>> good reason to split into multiple sprintf() calls.
>
> I like the use of a single snprintf() call, it is much less
> cumbersome.
>
> The reason I always cast u64 values to (unsigned long long)
> in printf() calls is because on some 64-bit architectures,
> u64 is defined as simply (unsigned long), so without the
> cast they spit out warnings.  I hate this, but it is reality
> (see include/asm-generic/{int-l64.h,int-ll64.h}).  You can
> alternatively use %PRIu64 rather than %llu in your format
> strings, but I think I hate that more.  Anyway, if you want
> to avoid warnings on all architectures you should fix that.

IIRC all architectures nowadays use int-ll64.h.  int-l64.h is used only
in userspace for compatibility.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/8] rbd: split rbd_dev_spec_update() into two functions
  2014-07-24  8:42 ` [PATCH 4/8] rbd: split rbd_dev_spec_update() into two functions Ilya Dryomov
@ 2014-07-24 12:55   ` Alex Elder
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Elder @ 2014-07-24 12:55 UTC (permalink / raw)
  To: Ilya Dryomov, ceph-devel

On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
> rbd_dev_spec_update() has two modes of operation, with nothing in
> common between them.  Split it into two functions, one for each mode
> and make our expectations more clear.
> 
> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>

Looks good.

Reviewed-by: Alex Elder <elder@linaro.org>

> ---
>  drivers/block/rbd.c |   79 +++++++++++++++++++++++++++++++--------------------
>  1 file changed, 48 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 4541f6027e4a..23df1773ef77 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -3798,6 +3798,9 @@ static struct rbd_spec *rbd_spec_alloc(void)
>  	spec = kzalloc(sizeof (*spec), GFP_KERNEL);
>  	if (!spec)
>  		return NULL;
> +
> +	spec->pool_id = CEPH_NOPOOL;
> +	spec->snap_id = CEPH_NOSNAP;
>  	kref_init(&spec->kref);
>  
>  	return spec;
> @@ -4257,18 +4260,38 @@ static u64 rbd_snap_id_by_name(struct rbd_device *rbd_dev, const char *name)
>  }
>  
>  /*
> - * When an rbd image has a parent image, it is identified by the
> - * pool, image, and snapshot ids (not names).  This function fills
> - * in the names for those ids.  (It's OK if we can't figure out the
> - * name for an image id, but the pool and snapshot ids should always
> - * exist and have names.)  All names in an rbd spec are dynamically
> - * allocated.
> + * An image being mapped will have everything but the snap id.
> + */
> +static int rbd_spec_fill_snap_id(struct rbd_device *rbd_dev)
> +{
> +	struct rbd_spec *spec = rbd_dev->spec;
> +
> +	rbd_assert(spec->pool_id != CEPH_NOPOOL && spec->pool_name);
> +	rbd_assert(spec->image_id && spec->image_name);
> +	rbd_assert(spec->snap_name);
> +
> +	if (strcmp(spec->snap_name, RBD_SNAP_HEAD_NAME)) {
> +		u64 snap_id;
> +
> +		snap_id = rbd_snap_id_by_name(rbd_dev, spec->snap_name);
> +		if (snap_id == CEPH_NOSNAP)
> +			return -ENOENT;
> +
> +		spec->snap_id = snap_id;
> +	} else {
> +		spec->snap_id = CEPH_NOSNAP;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
> + * A parent image will have all ids but none of the names.
>   *
> - * When an image being mapped (not a parent) is probed, we have the
> - * pool name and pool id, image name and image id, and the snapshot
> - * name.  The only thing we're missing is the snapshot id.
> + * All names in an rbd spec are dynamically allocated.  It's OK if we
> + * can't figure out the name for an image id.
>   */
> -static int rbd_dev_spec_update(struct rbd_device *rbd_dev)
> +static int rbd_spec_fill_names(struct rbd_device *rbd_dev)
>  {
>  	struct ceph_osd_client *osdc = &rbd_dev->rbd_client->client->osdc;
>  	struct rbd_spec *spec = rbd_dev->spec;
> @@ -4277,24 +4300,9 @@ static int rbd_dev_spec_update(struct rbd_device *rbd_dev)
>  	const char *snap_name;
>  	int ret;
>  
> -	/*
> -	 * An image being mapped will have the pool name (etc.), but
> -	 * we need to look up the snapshot id.
> -	 */
> -	if (spec->pool_name) {
> -		if (strcmp(spec->snap_name, RBD_SNAP_HEAD_NAME)) {
> -			u64 snap_id;
> -
> -			snap_id = rbd_snap_id_by_name(rbd_dev, spec->snap_name);
> -			if (snap_id == CEPH_NOSNAP)
> -				return -ENOENT;
> -			spec->snap_id = snap_id;
> -		} else {
> -			spec->snap_id = CEPH_NOSNAP;
> -		}
> -
> -		return 0;
> -	}
> +	rbd_assert(spec->pool_id != CEPH_NOPOOL);
> +	rbd_assert(spec->image_id);
> +	rbd_assert(spec->snap_id != CEPH_NOSNAP);
>  
>  	/* Get the pool name; we have to make our own copy of this */
>  
> @@ -4313,7 +4321,7 @@ static int rbd_dev_spec_update(struct rbd_device *rbd_dev)
>  	if (!image_name)
>  		rbd_warn(rbd_dev, "unable to get image name");
>  
> -	/* Look up the snapshot name, and make a copy */
> +	/* Fetch the snapshot name */
>  
>  	snap_name = rbd_snap_name(rbd_dev, spec->snap_id);
>  	if (IS_ERR(snap_name)) {
> @@ -4326,10 +4334,10 @@ static int rbd_dev_spec_update(struct rbd_device *rbd_dev)
>  	spec->snap_name = snap_name;
>  
>  	return 0;
> +
>  out_err:
>  	kfree(image_name);
>  	kfree(pool_name);
> -
>  	return ret;
>  }
>  
> @@ -5158,7 +5166,16 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
>  	if (ret)
>  		goto err_out_watch;
>  
> -	ret = rbd_dev_spec_update(rbd_dev);
> +	/*
> +	 * If this image is the one being mapped, we have pool name and
> +	 * id, image name and id, and snap name - need to fill snap id.
> +	 * Otherwise this is a parent image, identified by pool, image
> +	 * and snap ids - need to fill in names for those ids.
> +	 */
> +	if (mapping)
> +		ret = rbd_spec_fill_snap_id(rbd_dev);
> +	else
> +		ret = rbd_spec_fill_names(rbd_dev);
>  	if (ret)
>  		goto err_out_probe;
>  
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/8] rbd: harden rbd_dev_refresh() caller
  2014-07-24  8:42 ` [PATCH 5/8] rbd: harden rbd_dev_refresh() caller Ilya Dryomov
@ 2014-07-24 13:09   ` Alex Elder
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Elder @ 2014-07-24 13:09 UTC (permalink / raw)
  To: Ilya Dryomov, ceph-devel

On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
> Recently discovered watch/notify problems showed that we really can't
> ignore errors in anything refresh related.  Alas, currently there is
> not much we can do in response to those errors, except print warnings.

Looks good.

Reviewed-by: Alex Elder <elder@linaro.org>


> 
> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
> ---
>  drivers/block/rbd.c |   17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 23df1773ef77..5dd6f5057ef4 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -2963,11 +2963,20 @@ static void rbd_watch_cb(u64 ver, u64 notify_id, u8 opcode, void *data)
>  	dout("%s: \"%s\" notify_id %llu opcode %u\n", __func__,
>  		rbd_dev->header_name, (unsigned long long)notify_id,
>  		(unsigned int)opcode);
> +
> +	/*
> +	 * Until adequate refresh error handling is in place, there is
> +	 * not much we can do here, except warn.
> +	 *
> +	 * See http://tracker.ceph.com/issues/5040
> +	 */
>  	ret = rbd_dev_refresh(rbd_dev);
>  	if (ret)
> -		rbd_warn(rbd_dev, "header refresh error (%d)\n", ret);
> +		rbd_warn(rbd_dev, "refresh failed: %d\n", ret);
>  
> -	rbd_obj_notify_ack_sync(rbd_dev, notify_id);
> +	ret = rbd_obj_notify_ack_sync(rbd_dev, notify_id);
> +	if (ret)
> +		rbd_warn(rbd_dev, "notify_ack ret %d\n", ret);
>  }
>  
>  /*
> @@ -3724,9 +3733,9 @@ static ssize_t rbd_image_refresh(struct device *dev,
>  
>  	ret = rbd_dev_refresh(rbd_dev);
>  	if (ret)
> -		rbd_warn(rbd_dev, ": manual header refresh error (%d)\n", ret);
> +		return ret;
>  
> -	return ret < 0 ? ret : size;
> +	return size;
>  }
>  
>  static DEVICE_ATTR(size, S_IRUGO, rbd_size_show, NULL);
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] rbd: update mapping size only on refresh
  2014-07-24  8:42 ` [PATCH 6/8] rbd: update mapping size only on refresh Ilya Dryomov
@ 2014-07-24 13:25   ` Alex Elder
  2014-07-24 13:46     ` Ilya Dryomov
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Elder @ 2014-07-24 13:25 UTC (permalink / raw)
  To: Ilya Dryomov, ceph-devel

On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
> There is no sense in trying to update the mapping size before it's even
> been set.

It took me a bit to follow this.  But basically there is
no mapping unless it's mapped.  So previously this was
updating the mapping information even for unmapped
parent (or other) images.  There's no need to update
the mapping size for a snapshot--it'll never change.

Is that right?  If not, please advise; otherwise:

Reviewed-by: Alex Elder <elder@linaro.org>

> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
> ---
>  drivers/block/rbd.c |   19 +++++++------------
>  1 file changed, 7 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 5dd6f5057ef4..16f388f2960b 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -971,12 +971,6 @@ static int rbd_header_from_disk(struct rbd_device *rbd_dev,
>  	header->snap_names = snap_names;
>  	header->snap_sizes = snap_sizes;
>  
> -	/* Make sure mapping size is consistent with header info */
> -
> -	if (rbd_dev->spec->snap_id == CEPH_NOSNAP || first_time)
> -		if (rbd_dev->mapping.size != header->image_size)
> -			rbd_dev->mapping.size = header->image_size;
> -
>  	return 0;
>  out_2big:
>  	ret = -EIO;
> @@ -3520,9 +3514,14 @@ static int rbd_dev_refresh(struct rbd_device *rbd_dev)
>  
>  	ret = rbd_dev_header_info(rbd_dev);
>  
> -	/* If it's a mapped snapshot, validate its EXISTS flag */
> +	if (rbd_dev->spec->snap_id == CEPH_NOSNAP) {
> +		if (rbd_dev->mapping.size != rbd_dev->header.image_size)
> +			rbd_dev->mapping.size = rbd_dev->header.image_size;
> +	} else {
> +		/* validate mapped snapshot's EXISTS flag */
> +		rbd_exists_validate(rbd_dev);
> +	}
>  
> -	rbd_exists_validate(rbd_dev);
>  	up_write(&rbd_dev->header_rwsem);
>  
>  	if (mapping_size != rbd_dev->mapping.size) {
> @@ -4505,10 +4504,6 @@ static int rbd_dev_v2_header_info(struct rbd_device *rbd_dev)
>  					"is EXPERIMENTAL!");
>  	}
>  
> -	if (rbd_dev->spec->snap_id == CEPH_NOSNAP)
> -		if (rbd_dev->mapping.size != rbd_dev->header.image_size)
> -			rbd_dev->mapping.size = rbd_dev->header.image_size;
> -
>  	ret = rbd_dev_v2_snap_context(rbd_dev);
>  	dout("rbd_dev_v2_snap_context returned %d\n", ret);
>  
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] rbd: update mapping size only on refresh
  2014-07-24 13:25   ` Alex Elder
@ 2014-07-24 13:46     ` Ilya Dryomov
  2014-07-24 15:10       ` Ilya Dryomov
  2014-07-24 17:59       ` Alex Elder
  0 siblings, 2 replies; 24+ messages in thread
From: Ilya Dryomov @ 2014-07-24 13:46 UTC (permalink / raw)
  To: Alex Elder; +Cc: Ceph Development

On Thu, Jul 24, 2014 at 5:25 PM, Alex Elder <elder@ieee.org> wrote:
> On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
>> There is no sense in trying to update the mapping size before it's even
>> been set.
>
> It took me a bit to follow this.  But basically there is
> no mapping unless it's mapped.  So previously this was
> updating the mapping information even for unmapped
> parent (or other) images.  There's no need to update
> the mapping size for a snapshot--it'll never change.
>
> Is that right?  If not, please advise; otherwise:

No.  Previously it was updating the mapping size both on the inital map
and on refresh (of the base image).  Doing that on the inital map
doesn't make sense: not only struct rbd_mapping fields aren't properly
initialized at that point - rbd_dev_mapping_set() is called much later
in the map sequence, snap_id isn't initialized either (at least in the
format 2 case, I haven't looked too closely at the format 1 case).  And
just in general, trying to update something before it had a chance to
change is bogus..

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] rbd: update mapping size only on refresh
  2014-07-24 13:46     ` Ilya Dryomov
@ 2014-07-24 15:10       ` Ilya Dryomov
  2014-07-25 13:31         ` Alex Elder
  2014-07-24 17:59       ` Alex Elder
  1 sibling, 1 reply; 24+ messages in thread
From: Ilya Dryomov @ 2014-07-24 15:10 UTC (permalink / raw)
  To: Alex Elder; +Cc: Ceph Development

On Thu, Jul 24, 2014 at 5:46 PM, Ilya Dryomov <ilya.dryomov@inktank.com> wrote:
> On Thu, Jul 24, 2014 at 5:25 PM, Alex Elder <elder@ieee.org> wrote:
>> On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
>>> There is no sense in trying to update the mapping size before it's even
>>> been set.
>>
>> It took me a bit to follow this.  But basically there is
>> no mapping unless it's mapped.  So previously this was
>> updating the mapping information even for unmapped
>> parent (or other) images.  There's no need to update
>> the mapping size for a snapshot--it'll never change.
>>
>> Is that right?  If not, please advise; otherwise:
>
> No.  Previously it was updating the mapping size both on the inital map
> and on refresh (of the base image).  Doing that on the inital map
> doesn't make sense: not only struct rbd_mapping fields aren't properly
> initialized at that point - rbd_dev_mapping_set() is called much later
> in the map sequence, snap_id isn't initialized either (at least in the
> format 2 case, I haven't looked too closely at the format 1 case).  And
> just in general, trying to update something before it had a chance to
> change is bogus..

BTW, while we are on the subject of struct rbd_mapping, is there any
particular reason to keep around both the base image values and actual
mapping values?  I am tempted to change the mapping sequence so that 1)
snap context is read in immediately after watch setup, before anything
else is done, 2) supplied snap name is resolved into an id and 3) the
actual (i.e. based on snap_id) mapping size, features, parent_overlap,
etc are read in directly into struct rbd_image_header.  That would
allow to rip struct rbd_mapping entirely and make the code more clear.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] rbd: update mapping size only on refresh
  2014-07-24 13:46     ` Ilya Dryomov
  2014-07-24 15:10       ` Ilya Dryomov
@ 2014-07-24 17:59       ` Alex Elder
  1 sibling, 0 replies; 24+ messages in thread
From: Alex Elder @ 2014-07-24 17:59 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: Ceph Development

On 07/24/2014 08:46 AM, Ilya Dryomov wrote:
> On Thu, Jul 24, 2014 at 5:25 PM, Alex Elder <elder@ieee.org> wrote:
>> On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
>>> There is no sense in trying to update the mapping size before it's even
>>> been set.
>>
>> It took me a bit to follow this.  But basically there is
>> no mapping unless it's mapped.  So previously this was
>> updating the mapping information even for unmapped
>> parent (or other) images.  There's no need to update
>> the mapping size for a snapshot--it'll never change.
>>
>> Is that right?  If not, please advise; otherwise:
> 
> No.  Previously it was updating the mapping size both on the inital map
> and on refresh (of the base image).  Doing that on the inital map
> doesn't make sense: not only struct rbd_mapping fields aren't properly
> initialized at that point - rbd_dev_mapping_set() is called much later
> in the map sequence, snap_id isn't initialized either (at least in the
> format 2 case, I haven't looked too closely at the format 1 case).  And
> just in general, trying to update something before it had a chance to
> change is bogus..

OK, now I see it.  Hopefully this makes sense.  Here is how it
was previously structured:

  rbd_add()
    do_rbd_add()
     |rbd_dev_image_probe()
     |  rbd_dev_header_info()
     |    rbd_dev_v1_header_info()    or    rbd_dev_v2_header_info()
     |      rbd_header_from_disk()            <update mapping>
     |        <update mapping>
     |  . . .
     |rbd_dev_device_setup()
     |  rbd_dev_mapping_set()

So neither rbd_header_from_disk() nor rbd_dev_v2_header_info()
should be using the values in the rbd_dev->mapping field during
initial image probe, because they have not yet been initialized
at that point.

Meanwhile, the only reason we need to *update* a mapping size
is if we learn it may have changed, which will be covered by a
refresh, so doing so in rbd_dev_refresh() makes sense.  And
as long as you know whether it's mapping the base image you
might as well do the rbd_exists_validate() call conditionally.

OK, this all looks good and makes good sense to me.
Thank you for explaining it.

Reviewed-by: Alex Elder <elder@linaro.org>


> 
> Thanks,
> 
>                 Ilya
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 8/8] rbd: take snap_id into account when reading in parent info
  2014-07-24  8:42 ` [PATCH 8/8] rbd: take snap_id into account when reading in parent info Ilya Dryomov
@ 2014-07-24 18:43   ` Alex Elder
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Elder @ 2014-07-24 18:43 UTC (permalink / raw)
  To: Ilya Dryomov, ceph-devel

On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
> If we are mapping a snapshot, we must read in the parent_overlap value
> of that snapshot instead of that of the base image.  Not doing so may
> in particular result in us returning zeros instead of user data:
> 
>     # cat overlap-snap.sh
>     #!/bin/bash
>     rbd create --size 10 --image-format 2 foo
>     FOO_DEV=$(rbd map foo)
>     dd if=/dev/urandom of=/dev/rbd0 bs=1M &>/dev/null
>     echo "Base image"
>     dd if=$FOO_DEV bs=1 count=16 skip=$(((4 << 20) - 8)) 2>/dev/null | xxd
>     rbd snap create foo@snap
>     rbd snap protect foo@snap
>     rbd clone foo@snap bar
>     rbd snap create bar@snap
>     BAR_DEV=$(rbd map bar@snap)
>     echo "Snapshot"
>     dd if=$BAR_DEV bs=1 count=16 skip=$(((4 << 20) - 8)) 2>/dev/null | xxd
>     rbd resize --allow-shrink --size 4 bar
>     echo "Snapshot after base image resize"
>     dd if=$BAR_DEV bs=1 count=16 skip=$(((4 << 20) - 8)) 2>/dev/null | xxd
> 
>     # ./overlap-snap.sh
>     Base image
>     0000000: e781 e33b d34b 2225 6034 2845 a2e3 36ed  ...;.K"%`4(E..6.
>     Snapshot
>     0000000: e781 e33b d34b 2225 6034 2845 a2e3 36ed  ...;.K"%`4(E..6.
>     Resizing image: 100% complete...done.
>     Snapshot after base image resize
>     0000000: e781 e33b d34b 2225 0000 0000 0000 0000  ...;.K"%........
> 
> Even though bar@snap was taken with the old bar parent_overlap (8M),
> reads from bar@snap beyond the new bar parent_overlap (4M) return
> zeroes.  Fix it.
> 
> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
> ---
>  drivers/block/rbd.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index c4606987e9d1..cbc89fa9a677 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -4020,7 +4020,7 @@ static int rbd_dev_v2_parent_info(struct rbd_device *rbd_dev)
>  		goto out_err;
>  	}
>  
> -	snapid = cpu_to_le64(CEPH_NOSNAP);
> +	snapid = cpu_to_le64(rbd_dev->spec->snap_id);

Well that's just an outright bug.  It's been there since the
original commit that added parent support:
    86b00e0 rbd: get parent spec for version 2 images
Parent images *must* be snapshots, so this was never
right.

I bet that was hard to figure out...

Looks good.

Reviewed-by: Alex Elder <elder@linaro.org>


>  	ret = rbd_obj_method_sync(rbd_dev, rbd_dev->header_name,
>  				"rbd", "get_parent",
>  				&snapid, sizeof (snapid),
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 7/8] rbd: do not read in parent info before snap context
  2014-07-24  8:42 ` [PATCH 7/8] rbd: do not read in parent info before snap context Ilya Dryomov
@ 2014-07-25  8:14   ` Alex Elder
  2014-07-25  8:36     ` Ilya Dryomov
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Elder @ 2014-07-25  8:14 UTC (permalink / raw)
  To: Ilya Dryomov, ceph-devel

On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
> Currently rbd_dev_v2_header_info() reads in parent info before the snap
> context is read in.  This is wrong, because we may need to look at the
> the parent_overlap value of the snapshot instead of that of the base

I had some trouble understanding this.

The parent image, snapshot id and overlap, are together a property
of a particular image (and associated with its header object).
Nothing about that is dependent on the child image snapshots
or snapshot id.

However...

> image, for example when mapping a snapshot - see next commit.  (When
> mapping a snapshot, all we got is its name and we need the snap context
> to translate that name into an id to know which parent info to look
> for).

I finally figured out what path through the code you
were talking about.  Here's what I see.

On the initial probe (previously), we have:
  rbd_add()
    do_rbd_add()
      rbd_dev_image_probe()
       |rbd_dev_header_info()
       | |rbd_dev_v2_header_info()
       | | |rbd_dev_v2_parent_info()
       | | | --> expects rbd_dev->spec->snap_id to be valid
       | | |rbd_dev_v2_snap_context()
       | | | --> this fills in the snapshot context
       |rbd_spec_fill_snap_id()
       | | --> fills rbd_dev->spec->snap_id
       |rbd_dev_probe_parent()

So clearly, at least when mapping a snapshot, we would not
get the desired result.  We'll be unable to look up the id
for the named snapshot, so would get ENOENT.

Now you've pulled getting the parent info back out
into rbd_dev_image_probe():
  rbd_add()
    do_rbd_add()
      rbd_dev_image_probe()
       |rbd_dev_header_info()
       | |rbd_dev_v2_header_info()
       | | |rbd_dev_v2_snap_context()
       | | | --> this fills in the snapshot context
       |rbd_spec_fill_snap_id()
       | | --> fills rbd_dev->spec->snap_id
       |rbd_dev_v2_parent_info()
       | | --> rbd_dev->spec->snap_id will be valid
       |rbd_dev_probe_parent()

In the refresh path it's similar.  You move the
rbd_dev_v2_parent_info() call into rbd_dev_refresh()
instead of it happening in rbd_dev_v2_header_info().
Missing the ordering problem here might have caused
even more subtle problems (due to using an apparently
valid but possibly out-of-date snapshot context).

Given this understanding I'd say your change looks
good.

> The approach taken here is to make sure rbd_dev_v2_parent_info() is
> called after the snap context has been read in.  The other approach
> would be to add a parent_overlap field to struct rbd_mapping and
> maintain it the same way rbd_mapping::size is maintained.  The reason
> I chose the first approach is that the value of keeping around both
> base image values and the actual mapping values is unclear to me.

I'll think about this and respond to your followup e-mail.

Reviewed-by: Alex Elder <elder@linaro.org>

> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
> ---
>  drivers/block/rbd.c |   64 ++++++++++++++++++++++++---------------------------
>  1 file changed, 30 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
> index 16f388f2960b..c4606987e9d1 100644
> --- a/drivers/block/rbd.c
> +++ b/drivers/block/rbd.c
> @@ -515,6 +515,7 @@ static void rbd_dev_remove_parent(struct rbd_device *rbd_dev);
>  static int rbd_dev_refresh(struct rbd_device *rbd_dev);
>  static int rbd_dev_v2_header_onetime(struct rbd_device *rbd_dev);
>  static int rbd_dev_header_info(struct rbd_device *rbd_dev);
> +static int rbd_dev_v2_parent_info(struct rbd_device *rbd_dev);
>  static const char *rbd_dev_v2_snap_name(struct rbd_device *rbd_dev,
>  					u64 snap_id);
>  static int _rbd_dev_v2_snap_size(struct rbd_device *rbd_dev, u64 snap_id,
> @@ -3513,6 +3514,18 @@ static int rbd_dev_refresh(struct rbd_device *rbd_dev)
>  	mapping_size = rbd_dev->mapping.size;
>  
>  	ret = rbd_dev_header_info(rbd_dev);

This looked odd.  But I guess I didn't notice you
didn't check the return value when you first added
this line a few patches ago...

> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * If there is a parent, see if it has disappeared due to the
> +	 * mapped image getting flattened.
> +	 */
> +	if (rbd_dev->parent) {
> +		ret = rbd_dev_v2_parent_info(rbd_dev);
> +		if (ret)
> +			return ret;
> +	}
>  
>  	if (rbd_dev->spec->snap_id == CEPH_NOSNAP) {
>  		if (rbd_dev->mapping.size != rbd_dev->header.image_size)
> @@ -3524,11 +3537,10 @@ static int rbd_dev_refresh(struct rbd_device *rbd_dev)
>  
>  	up_write(&rbd_dev->header_rwsem);
>  
> -	if (mapping_size != rbd_dev->mapping.size) {
> +	if (mapping_size != rbd_dev->mapping.size)
>  		rbd_dev_update_size(rbd_dev);
> -	}
>  
> -	return ret;
> +	return 0;
>  }
>  
>  static int rbd_init_disk(struct rbd_device *rbd_dev)
> @@ -4477,33 +4489,6 @@ static int rbd_dev_v2_header_info(struct rbd_device *rbd_dev)
>  			return ret;
>  	}
>  
> -	/*
> -	 * If the image supports layering, get the parent info.  We
> -	 * need to probe the first time regardless.  Thereafter we
> -	 * only need to if there's a parent, to see if it has
> -	 * disappeared due to the mapped image getting flattened.
> -	 */
> -	if (rbd_dev->header.features & RBD_FEATURE_LAYERING &&
> -			(first_time || rbd_dev->parent_spec)) {
> -		bool warn;
> -
> -		ret = rbd_dev_v2_parent_info(rbd_dev);
> -		if (ret)
> -			return ret;
> -
> -		/*
> -		 * Print a warning if this is the initial probe and
> -		 * the image has a parent.  Don't print it if the
> -		 * image now being probed is itself a parent.  We
> -		 * can tell at this point because we won't know its
> -		 * pool name yet (just its pool id).
> -		 */
> -		warn = rbd_dev->parent_spec && rbd_dev->spec->pool_name;
> -		if (first_time && warn)
> -			rbd_warn(rbd_dev, "WARNING: kernel layering "
> -					"is EXPERIMENTAL!");
> -	}
> -
>  	ret = rbd_dev_v2_snap_context(rbd_dev);
>  	dout("rbd_dev_v2_snap_context returned %d\n", ret);
>  
> @@ -5183,14 +5168,28 @@ static int rbd_dev_image_probe(struct rbd_device *rbd_dev, bool mapping)
>  	if (ret)
>  		goto err_out_probe;
>  
> +	if (rbd_dev->header.features & RBD_FEATURE_LAYERING) {
> +		ret = rbd_dev_v2_parent_info(rbd_dev);
> +		if (ret)
> +			goto err_out_probe;
> +
> +		/*
> +		 * Need to warn users if this image is the one being
> +		 * mapped and has a parent.
> +		 */
> +		if (mapping && rbd_dev->parent_spec)
> +			rbd_warn(rbd_dev,
> +				 "WARNING: kernel layering is EXPERIMENTAL!");
> +	}
> +
>  	ret = rbd_dev_probe_parent(rbd_dev);
>  	if (ret)
>  		goto err_out_probe;
>  
>  	dout("discovered format %u image, header name is %s\n",
>  		rbd_dev->image_format, rbd_dev->header_name);
> -
>  	return 0;
> +
>  err_out_probe:
>  	rbd_dev_unprobe(rbd_dev);
>  err_out_watch:
> @@ -5203,9 +5202,6 @@ err_out_format:
>  	rbd_dev->image_format = 0;
>  	kfree(rbd_dev->spec->image_id);
>  	rbd_dev->spec->image_id = NULL;
> -
> -	dout("probe failed, returning %d\n", ret);
> -
>  	return ret;
>  }
>  
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 7/8] rbd: do not read in parent info before snap context
  2014-07-25  8:14   ` Alex Elder
@ 2014-07-25  8:36     ` Ilya Dryomov
  2014-07-25 12:46       ` Alex Elder
  0 siblings, 1 reply; 24+ messages in thread
From: Ilya Dryomov @ 2014-07-25  8:36 UTC (permalink / raw)
  To: Alex Elder; +Cc: Ceph Development

On Fri, Jul 25, 2014 at 12:14 PM, Alex Elder <elder@ieee.org> wrote:
> On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
>> Currently rbd_dev_v2_header_info() reads in parent info before the snap
>> context is read in.  This is wrong, because we may need to look at the
>> the parent_overlap value of the snapshot instead of that of the base
>
> I had some trouble understanding this.
>
> The parent image, snapshot id and overlap, are together a property
> of a particular image (and associated with its header object).
> Nothing about that is dependent on the child image snapshots
> or snapshot id.
>
> However...
>
>> image, for example when mapping a snapshot - see next commit.  (When
>> mapping a snapshot, all we got is its name and we need the snap context
>> to translate that name into an id to know which parent info to look
>> for).
>
> I finally figured out what path through the code you
> were talking about.  Here's what I see.
>
> On the initial probe (previously), we have:
>   rbd_add()
>     do_rbd_add()
>       rbd_dev_image_probe()
>        |rbd_dev_header_info()
>        | |rbd_dev_v2_header_info()
>        | | |rbd_dev_v2_parent_info()
>        | | | --> expects rbd_dev->spec->snap_id to be valid
>        | | |rbd_dev_v2_snap_context()
>        | | | --> this fills in the snapshot context
>        |rbd_spec_fill_snap_id()
>        | | --> fills rbd_dev->spec->snap_id
>        |rbd_dev_probe_parent()
>
> So clearly, at least when mapping a snapshot, we would not
> get the desired result.  We'll be unable to look up the id
> for the named snapshot, so would get ENOENT.
>
> Now you've pulled getting the parent info back out
> into rbd_dev_image_probe():
>   rbd_add()
>     do_rbd_add()
>       rbd_dev_image_probe()
>        |rbd_dev_header_info()
>        | |rbd_dev_v2_header_info()
>        | | |rbd_dev_v2_snap_context()
>        | | | --> this fills in the snapshot context
>        |rbd_spec_fill_snap_id()
>        | | --> fills rbd_dev->spec->snap_id
>        |rbd_dev_v2_parent_info()
>        | | --> rbd_dev->spec->snap_id will be valid
>        |rbd_dev_probe_parent()
>
> In the refresh path it's similar.  You move the
> rbd_dev_v2_parent_info() call into rbd_dev_refresh()
> instead of it happening in rbd_dev_v2_header_info().
> Missing the ordering problem here might have caused
> even more subtle problems (due to using an apparently
> valid but possibly out-of-date snapshot context).
>
> Given this understanding I'd say your change looks
> good.

Yeah, basically any time we read in snapshot metadata (both when
mapping a snapshot or following a chain of parent images) we need to
look at the parent_overlap of the snapshot (i.e. the parent_overlap of
the base image at the time the snapshot was taken).  That requires
having snap_id at hand when the call to rbd_dev_v2_parent_info() is
made.

>
>> The approach taken here is to make sure rbd_dev_v2_parent_info() is
>> called after the snap context has been read in.  The other approach
>> would be to add a parent_overlap field to struct rbd_mapping and
>> maintain it the same way rbd_mapping::size is maintained.  The reason
>> I chose the first approach is that the value of keeping around both
>> base image values and the actual mapping values is unclear to me.
>
> I'll think about this and respond to your followup e-mail.
>
> Reviewed-by: Alex Elder <elder@linaro.org>
>
>> Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
>> ---
>>  drivers/block/rbd.c |   64 ++++++++++++++++++++++++---------------------------
>>  1 file changed, 30 insertions(+), 34 deletions(-)
>>
>> diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
>> index 16f388f2960b..c4606987e9d1 100644
>> --- a/drivers/block/rbd.c
>> +++ b/drivers/block/rbd.c
>> @@ -515,6 +515,7 @@ static void rbd_dev_remove_parent(struct rbd_device *rbd_dev);
>>  static int rbd_dev_refresh(struct rbd_device *rbd_dev);
>>  static int rbd_dev_v2_header_onetime(struct rbd_device *rbd_dev);
>>  static int rbd_dev_header_info(struct rbd_device *rbd_dev);
>> +static int rbd_dev_v2_parent_info(struct rbd_device *rbd_dev);
>>  static const char *rbd_dev_v2_snap_name(struct rbd_device *rbd_dev,
>>                                       u64 snap_id);
>>  static int _rbd_dev_v2_snap_size(struct rbd_device *rbd_dev, u64 snap_id,
>> @@ -3513,6 +3514,18 @@ static int rbd_dev_refresh(struct rbd_device *rbd_dev)
>>       mapping_size = rbd_dev->mapping.size;
>>
>>       ret = rbd_dev_header_info(rbd_dev);
>
> This looked odd.  But I guess I didn't notice you
> didn't check the return value when you first added
> this line a few patches ago...

This is what rbd_dev_refresh() did before any of my changes.  It would
go ahead with trying to update mapping size and validating snapshot
existance even if rbd_dev_v{1,2}_header_info() had failed.

>
>> +     if (ret)
>> +             return ret;

I'll move this bit to "harden rbd_dev_refresh()" commit.

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 7/8] rbd: do not read in parent info before snap context
  2014-07-25  8:36     ` Ilya Dryomov
@ 2014-07-25 12:46       ` Alex Elder
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Elder @ 2014-07-25 12:46 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: Ceph Development

On 07/25/2014 03:36 AM, Ilya Dryomov wrote:
>> >
>>> >> +     if (ret)
>>> >> +             return ret;
> I'll move this bit to "harden rbd_dev_refresh()" commit.

Sounds good.	-Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] rbd: update mapping size only on refresh
  2014-07-24 15:10       ` Ilya Dryomov
@ 2014-07-25 13:31         ` Alex Elder
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Elder @ 2014-07-25 13:31 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: Ceph Development

On 07/24/2014 10:10 AM, Ilya Dryomov wrote:
> On Thu, Jul 24, 2014 at 5:46 PM, Ilya Dryomov <ilya.dryomov@inktank.com> wrote:
>> On Thu, Jul 24, 2014 at 5:25 PM, Alex Elder <elder@ieee.org> wrote:
>>> On 07/24/2014 03:42 AM, Ilya Dryomov wrote:
>>>> There is no sense in trying to update the mapping size before it's even
>>>> been set.
>>>
>>> It took me a bit to follow this.  But basically there is
>>> no mapping unless it's mapped.  So previously this was
>>> updating the mapping information even for unmapped
>>> parent (or other) images.  There's no need to update
>>> the mapping size for a snapshot--it'll never change.
>>>
>>> Is that right?  If not, please advise; otherwise:
>>
>> No.  Previously it was updating the mapping size both on the inital map
>> and on refresh (of the base image).  Doing that on the inital map
>> doesn't make sense: not only struct rbd_mapping fields aren't properly
>> initialized at that point - rbd_dev_mapping_set() is called much later
>> in the map sequence, snap_id isn't initialized either (at least in the
>> format 2 case, I haven't looked too closely at the format 1 case).  And
>> just in general, trying to update something before it had a chance to
>> change is bogus..
> 
> BTW, while we are on the subject of struct rbd_mapping, is there any
> particular reason to keep around both the base image values and actual
> mapping values?  I am tempted to change the mapping sequence so that 1)
> snap context is read in immediately after watch setup, before anything
> else is done, 2) supplied snap name is resolved into an id and 3) the
> actual (i.e. based on snap_id) mapping size, features, parent_overlap,
> etc are read in directly into struct rbd_image_header.  That would
> allow to rip struct rbd_mapping entirely and make the code more clear.

The rbd_mapping structure started out well-intentioned but
over time it seems clear it hasn't added the value it was
intended to add.  Here's where it got created:
    f84344f rbd: separate mapping info in rbd_dev
The only original field that survives is read_only.  There's
no harm at all in just moving the fields in that structure
out into the rbd_device structure.

Preserving the base image size in the header structure is
an artifact of the version 1 code, where it held the last
version of the on-disk header data.  The version 2 code
maintains it for consistency.

You could eliminate that I suppose.  I think the result
would require rbd_header_from_disk() to know about the
mapping rather than doing a simple conversion of data
from one form to another.

I say try it, and if you like the result I probably will too...

					-Alex


> Thanks,
> 
>                 Ilya
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2014-07-25 13:31 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-24  8:42 [PATCH 0/8] wip-overlap Ilya Dryomov
2014-07-24  8:42 ` [PATCH 1/8] rbd: show the entire chain of parent images Ilya Dryomov
2014-07-24 12:31   ` Alex Elder
2014-07-24 12:45     ` Ilya Dryomov
2014-07-24  8:42 ` [PATCH 2/8] rbd: introduce rbd_dev_header_info() Ilya Dryomov
2014-07-24 12:34   ` Alex Elder
2014-07-24  8:42 ` [PATCH 3/8] rbd: remove unnecessary asserts in rbd_dev_image_probe() Ilya Dryomov
2014-07-24 12:40   ` Alex Elder
2014-07-24  8:42 ` [PATCH 4/8] rbd: split rbd_dev_spec_update() into two functions Ilya Dryomov
2014-07-24 12:55   ` Alex Elder
2014-07-24  8:42 ` [PATCH 5/8] rbd: harden rbd_dev_refresh() caller Ilya Dryomov
2014-07-24 13:09   ` Alex Elder
2014-07-24  8:42 ` [PATCH 6/8] rbd: update mapping size only on refresh Ilya Dryomov
2014-07-24 13:25   ` Alex Elder
2014-07-24 13:46     ` Ilya Dryomov
2014-07-24 15:10       ` Ilya Dryomov
2014-07-25 13:31         ` Alex Elder
2014-07-24 17:59       ` Alex Elder
2014-07-24  8:42 ` [PATCH 7/8] rbd: do not read in parent info before snap context Ilya Dryomov
2014-07-25  8:14   ` Alex Elder
2014-07-25  8:36     ` Ilya Dryomov
2014-07-25 12:46       ` Alex Elder
2014-07-24  8:42 ` [PATCH 8/8] rbd: take snap_id into account when reading in parent info Ilya Dryomov
2014-07-24 18:43   ` Alex Elder

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.