All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] Support setting the array size from userspace
@ 2009-03-06  0:24 Dan Williams
  2009-03-06  0:24 ` [PATCH 1/3] md: add 'size' as a personality method Dan Williams
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Dan Williams @ 2009-03-06  0:24 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, ed.ciechanowski, jacek.danecki

This series, against the 'md/for-next' branch, adds a sysfs attribute
for pinning the array size.  As discussed earlier this is being
initiated to support the extra size rounding expected for Intel(R)
Matrix metadata arrays.  For example a 20GB 4-disk raid5 array created
in the orom will have the per-device size set to 13981448 sectors.  MD
calculates the array size as 41944320 sectors while the metadata records
the size as 41943040 sectors.

Tested against a raid5 grow from 4 to 5 disks, and a grow that attempts
to set the size smaller than the userspace pinned size.  The attribute
displays 'default' when the size is unpinned, to aid debug.

---

Dan Williams (3):
      md: 'array_size' sysfs attribute
      md: centralize ->array_sectors modifications
      md: add 'size' as a personality method


 drivers/md/faulty.c       |   14 ++++++
 drivers/md/linear.c       |   15 ++++++-
 drivers/md/md.c           |   97 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/md/multipath.c    |   11 +++++
 drivers/md/raid0.c        |   22 ++++++++--
 drivers/md/raid1.c        |   19 +++++++--
 drivers/md/raid10.c       |   27 ++++++++++++-
 drivers/md/raid5.c        |   49 +++++++++++++++++------
 include/linux/raid/md.h   |    2 +
 include/linux/raid/md_k.h |    2 +
 10 files changed, 229 insertions(+), 29 deletions(-)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] md: add 'size' as a personality method
  2009-03-06  0:24 [PATCH 0/3] Support setting the array size from userspace Dan Williams
@ 2009-03-06  0:24 ` Dan Williams
  2009-03-06 16:15   ` Andre Noll
  2009-03-06  0:24 ` [PATCH 2/3] md: centralize ->array_sectors modifications Dan Williams
  2009-03-06  0:24 ` [PATCH 3/3] md: 'array_size' sysfs attribute Dan Williams
  2 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2009-03-06  0:24 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, ed.ciechanowski, jacek.danecki

In preparation for giving userspace control over ->array_sectors we need
to be able to retrieve the 'default' size, and the 'anticipated' size
when a reshape is requested.  For personalities that do not reshape emit
a warning if anything but the default size is requested.

In the raid5 case we need to update ->previous_raid_disks to make the
new 'default' size available.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/md/faulty.c       |   14 +++++++++++++-
 drivers/md/linear.c       |   15 +++++++++++++--
 drivers/md/multipath.c    |   11 ++++++++++-
 drivers/md/raid0.c        |   20 ++++++++++++++++----
 drivers/md/raid1.c        |   13 +++++++++++--
 drivers/md/raid10.c       |   27 +++++++++++++++++++++++++--
 drivers/md/raid5.c        |   36 ++++++++++++++++++++++++++----------
 include/linux/raid/md_k.h |    1 +
 8 files changed, 115 insertions(+), 22 deletions(-)

diff --git a/drivers/md/faulty.c b/drivers/md/faulty.c
index b3b0437..9bf3629 100644
--- a/drivers/md/faulty.c
+++ b/drivers/md/faulty.c
@@ -280,6 +280,17 @@ static int reconfig(mddev_t *mddev, int layout, int chunk_size)
 	return 0;
 }
 
+static sector_t faulty_size(mddev_t *mddev, sector_t sectors, int raid_disks)
+{
+	WARN_ONCE(raid_disks,
+		  "%s does not support generic reshape\n", __func__);
+
+	if (sectors == 0)
+		return mddev->dev_sectors;
+
+	return sectors;
+}
+
 static int run(mddev_t *mddev)
 {
 	mdk_rdev_t *rdev;
@@ -298,7 +309,7 @@ static int run(mddev_t *mddev)
 	list_for_each_entry(rdev, &mddev->disks, same_set)
 		conf->rdev = rdev;
 
-	mddev->array_sectors = mddev->dev_sectors;
+	mddev->array_sectors = faulty_size(mddev, 0, 0);
 	mddev->private = conf;
 
 	reconfig(mddev, mddev->layout, -1);
@@ -325,6 +336,7 @@ static struct mdk_personality faulty_personality =
 	.stop		= stop,
 	.status		= status,
 	.reconfig	= reconfig,
+	.size		= faulty_size,
 };
 
 static int __init raid_init(void)
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 83110f8..4a7397e 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -97,6 +97,16 @@ static int linear_congested(void *data, int bits)
 	return ret;
 }
 
+static sector_t linear_size(mddev_t *mddev, sector_t sectors, int raid_disks)
+{
+	linear_conf_t *conf = mddev_to_conf(mddev);
+
+	WARN_ONCE(sectors || raid_disks,
+		  "%s does not support generic reshape\n", __func__);
+
+	return conf->array_sectors;
+}
+
 static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
 {
 	linear_conf_t *conf;
@@ -249,7 +259,7 @@ static int linear_run (mddev_t *mddev)
 	if (!conf)
 		return 1;
 	mddev->private = conf;
-	mddev->array_sectors = conf->array_sectors;
+	mddev->array_sectors = linear_size(mddev, 0, 0);
 
 	blk_queue_merge_bvec(mddev->queue, linear_mergeable_bvec);
 	mddev->queue->unplug_fn = linear_unplug;
@@ -283,7 +293,7 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev)
 	newconf->prev = mddev_to_conf(mddev);
 	mddev->private = newconf;
 	mddev->raid_disks++;
-	mddev->array_sectors = newconf->array_sectors;
+	mddev->array_sectors = linear_size(mddev, 0, 0);
 	set_capacity(mddev->gendisk, mddev->array_sectors);
 	return 0;
 }
@@ -381,6 +391,7 @@ static struct mdk_personality linear_personality =
 	.stop		= linear_stop,
 	.status		= linear_status,
 	.hot_add_disk	= linear_add,
+	.size		= linear_size,
 };
 
 static int __init linear_init (void)
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index a21084b..1ca72ed 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -402,6 +402,14 @@ static void multipathd (mddev_t *mddev)
 	spin_unlock_irqrestore(&conf->device_lock, flags);
 }
 
+static sector_t multipath_size(mddev_t *mddev, sector_t sectors, int raid_disks)
+{
+	WARN_ONCE(sectors || raid_disks,
+		  "%s does not support generic reshape\n", __func__);
+
+	return mddev->dev_sectors;
+}
+
 static int multipath_run (mddev_t *mddev)
 {
 	multipath_conf_t *conf;
@@ -498,7 +506,7 @@ static int multipath_run (mddev_t *mddev)
 	/*
 	 * Ok, everything is just fine now
 	 */
-	mddev->array_sectors = mddev->dev_sectors;
+	mddev->array_sectors = multipath_size(mddev, 0, 0);
 
 	mddev->queue->unplug_fn = multipath_unplug;
 	mddev->queue->backing_dev_info.congested_fn = multipath_congested;
@@ -543,6 +551,7 @@ static struct mdk_personality multipath_personality =
 	.error_handler	= multipath_error,
 	.hot_add_disk	= multipath_add_disk,
 	.hot_remove_disk= multipath_remove_disk,
+	.size		= multipath_size,
 };
 
 static int __init multipath_init (void)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index a5037de..d5c8ea9 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -260,12 +260,25 @@ static int raid0_mergeable_bvec(struct request_queue *q,
 		return max;
 }
 
+static sector_t raid0_size(mddev_t *mddev, sector_t sectors, int raid_disks)
+{
+	sector_t array_sectors = 0;
+	mdk_rdev_t *rdev;
+
+	WARN_ONCE(sectors || raid_disks,
+		  "%s does not support generic reshape\n", __func__);
+
+	list_for_each_entry(rdev, &mddev->disks, same_set)
+		array_sectors += rdev->sectors;
+
+	return array_sectors;
+}
+
 static int raid0_run (mddev_t *mddev)
 {
 	unsigned  cur=0, i=0, nb_zone;
 	s64 sectors;
 	raid0_conf_t *conf;
-	mdk_rdev_t *rdev;
 
 	if (mddev->chunk_size == 0) {
 		printk(KERN_ERR "md/raid0: non-zero chunk size required.\n");
@@ -290,9 +303,7 @@ static int raid0_run (mddev_t *mddev)
 		goto out_free_conf;
 
 	/* calculate array device size */
-	mddev->array_sectors = 0;
-	list_for_each_entry(rdev, &mddev->disks, same_set)
-		mddev->array_sectors += rdev->sectors;
+	mddev->array_sectors = raid0_size(mddev, 0, 0);
 
 	printk(KERN_INFO "raid0 : md_size is %llu sectors.\n",
 		(unsigned long long)mddev->array_sectors);
@@ -508,6 +519,7 @@ static struct mdk_personality raid0_personality=
 	.run		= raid0_run,
 	.stop		= raid0_stop,
 	.status		= raid0_status,
+	.size		= raid0_size,
 };
 
 static int __init raid0_init (void)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 9778bef..5f62d42 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1919,6 +1919,14 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 	return nr_sectors;
 }
 
+static sector_t raid1_size(mddev_t *mddev, sector_t sectors, int raid_disks)
+{
+	if (sectors)
+		return sectors;
+
+	return mddev->dev_sectors;
+}
+
 static int run(mddev_t *mddev)
 {
 	conf_t *conf;
@@ -2048,7 +2056,7 @@ static int run(mddev_t *mddev)
 	/*
 	 * Ok, everything is just fine now
 	 */
-	mddev->array_sectors = mddev->dev_sectors;
+	mddev->array_sectors = raid1_size(mddev, 0, 0);
 
 	mddev->queue->unplug_fn = raid1_unplug;
 	mddev->queue->backing_dev_info.congested_fn = raid1_congested;
@@ -2113,7 +2121,7 @@ static int raid1_resize(mddev_t *mddev, sector_t sectors)
 	 * any io in the removed space completes, but it hardly seems
 	 * worth it.
 	 */
-	mddev->array_sectors = sectors;
+	mddev->array_sectors = raid1_size(mddev, sectors, 0);
 	set_capacity(mddev->gendisk, mddev->array_sectors);
 	mddev->changed = 1;
 	if (mddev->array_sectors > mddev->dev_sectors &&
@@ -2267,6 +2275,7 @@ static struct mdk_personality raid1_personality =
 	.spare_active	= raid1_spare_active,
 	.sync_request	= sync_request,
 	.resize		= raid1_resize,
+	.size		= raid1_size,
 	.check_reshape	= raid1_reshape,
 	.quiesce	= raid1_quiesce,
 };
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 5579a26..1c28679 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2020,6 +2020,28 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *skipped, i
 	goto skipped;
 }
 
+static sector_t
+raid10_size(mddev_t *mddev, sector_t sectors, int raid_disks)
+{
+	sector_t size;
+	int chunk_shift;
+	conf_t *conf = mddev_to_conf(mddev);
+	int chunk_size = mddev->chunk_size;
+
+	if (!raid_disks)
+		raid_disks = mddev->raid_disks;
+	if (!sectors)
+		sectors = mddev->dev_sectors;
+
+	chunk_shift = ffz(~chunk_size) - 9;
+	size = sectors >> chunk_shift;
+	sector_div(size, conf->far_copies);
+	size = size * raid_disks;
+	sector_div(size, conf->near_copies);
+
+	return size << chunk_shift;
+}
+
 static int run(mddev_t *mddev)
 {
 	conf_t *conf;
@@ -2171,8 +2193,8 @@ static int run(mddev_t *mddev)
 	/*
 	 * Ok, everything is just fine now
 	 */
-	mddev->array_sectors = size << conf->chunk_shift;
-	mddev->resync_max_sectors = size << conf->chunk_shift;
+	mddev->array_sectors = raid10_size(mddev, 0, 0);
+	mddev->resync_max_sectors = mddev->array_sectors;
 
 	mddev->queue->unplug_fn = raid10_unplug;
 	mddev->queue->backing_dev_info.congested_fn = raid10_congested;
@@ -2258,6 +2280,7 @@ static struct mdk_personality raid10_personality =
 	.spare_active	= raid10_spare_active,
 	.sync_request	= sync_request,
 	.quiesce	= raid10_quiesce,
+	.size		= raid10_size,
 };
 
 static int __init raid_init(void)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 46b097a..3cc570b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4298,6 +4298,21 @@ static raid5_conf_t *setup_conf(mddev_t *mddev)
 		return ERR_PTR(-ENOMEM);
 }
 
+static sector_t
+raid5_size(mddev_t *mddev, sector_t sectors, int raid_disks)
+{
+	raid5_conf_t *conf = mddev_to_conf(mddev);
+	int chunk_size = mddev->chunk_size;
+
+	if (!sectors)
+		sectors = mddev->dev_sectors;
+	if (!raid_disks)
+		raid_disks = conf->previous_raid_disks;
+
+	sectors &= ~(chunk_size / 512 - 1);
+	return sectors * (raid_disks - conf->max_degraded);
+}
+
 static int run(mddev_t *mddev)
 {
 	raid5_conf_t *conf;
@@ -4457,8 +4472,7 @@ static int run(mddev_t *mddev)
 	mddev->queue->backing_dev_info.congested_data = mddev;
 	mddev->queue->backing_dev_info.congested_fn = raid5_congested;
 
-	mddev->array_sectors = mddev->dev_sectors *
-		(conf->previous_raid_disks - conf->max_degraded);
+	mddev->array_sectors = raid5_size(mddev, 0, 0);
 
 	blk_queue_merge_bvec(mddev->queue, raid5_mergeable_bvec);
 
@@ -4679,11 +4693,8 @@ static int raid5_resize(mddev_t *mddev, sector_t sectors)
 	 * any io in the removed space completes, but it hardly seems
 	 * worth it.
 	 */
-	raid5_conf_t *conf = mddev_to_conf(mddev);
-
 	sectors &= ~((sector_t)mddev->chunk_size/512 - 1);
-	mddev->array_sectors = sectors * (mddev->raid_disks
-					  - conf->max_degraded);
+	mddev->array_sectors = raid5_size(mddev, sectors, mddev->raid_disks);
 	set_capacity(mddev->gendisk, mddev->array_sectors);
 	mddev->changed = 1;
 	if (sectors > mddev->dev_sectors && mddev->recovery_cp == MaxSector) {
@@ -4819,10 +4830,12 @@ static void end_reshape(raid5_conf_t *conf)
 	struct block_device *bdev;
 
 	if (!test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery)) {
-		conf->mddev->array_sectors = conf->mddev->dev_sectors *
-			(conf->raid_disks - conf->max_degraded);
-		set_capacity(conf->mddev->gendisk, conf->mddev->array_sectors);
-		conf->mddev->changed = 1;
+		mddev_t *mddev = conf->mddev;
+
+		mddev->array_sectors = raid5_size(mddev, 0, conf->raid_disks);
+		set_capacity(mddev->gendisk, mddev->array_sectors);
+		mddev->changed = 1;
+		conf->previous_raid_disks = conf->raid_disks;
 
 		bdev = bdget_disk(conf->mddev->gendisk, 0);
 		if (bdev) {
@@ -5071,6 +5084,7 @@ static struct mdk_personality raid6_personality =
 	.spare_active	= raid5_spare_active,
 	.sync_request	= sync_request,
 	.resize		= raid5_resize,
+	.size		= raid5_size,
 #ifdef CONFIG_MD_RAID5_RESHAPE
 	.check_reshape	= raid5_check_reshape,
 	.start_reshape  = raid5_start_reshape,
@@ -5093,6 +5107,7 @@ static struct mdk_personality raid5_personality =
 	.spare_active	= raid5_spare_active,
 	.sync_request	= sync_request,
 	.resize		= raid5_resize,
+	.size		= raid5_size,
 #ifdef CONFIG_MD_RAID5_RESHAPE
 	.check_reshape	= raid5_check_reshape,
 	.start_reshape  = raid5_start_reshape,
@@ -5117,6 +5132,7 @@ static struct mdk_personality raid4_personality =
 	.spare_active	= raid5_spare_active,
 	.sync_request	= sync_request,
 	.resize		= raid5_resize,
+	.size		= raid5_size,
 #ifdef CONFIG_MD_RAID5_RESHAPE
 	.check_reshape	= raid5_check_reshape,
 	.start_reshape  = raid5_start_reshape,
diff --git a/include/linux/raid/md_k.h b/include/linux/raid/md_k.h
index 68a6ac0..564ce81 100644
--- a/include/linux/raid/md_k.h
+++ b/include/linux/raid/md_k.h
@@ -325,6 +325,7 @@ struct mdk_personality
 	int (*spare_active) (mddev_t *mddev);
 	sector_t (*sync_request)(mddev_t *mddev, sector_t sector_nr, int *skipped, int go_faster);
 	int (*resize) (mddev_t *mddev, sector_t sectors);
+	sector_t (*size) (mddev_t *mddev, sector_t sectors, int raid_disks);
 	int (*check_reshape) (mddev_t *mddev);
 	int (*start_reshape) (mddev_t *mddev);
 	int (*reconfig) (mddev_t *mddev, int layout, int chunk_size);


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] md: centralize ->array_sectors modifications
  2009-03-06  0:24 [PATCH 0/3] Support setting the array size from userspace Dan Williams
  2009-03-06  0:24 ` [PATCH 1/3] md: add 'size' as a personality method Dan Williams
@ 2009-03-06  0:24 ` Dan Williams
  2009-03-06  0:24 ` [PATCH 3/3] md: 'array_size' sysfs attribute Dan Williams
  2 siblings, 0 replies; 10+ messages in thread
From: Dan Williams @ 2009-03-06  0:24 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, ed.ciechanowski, jacek.danecki

Get personalities out of the business of directly modifying
->array_sectors.  Lays groundwork to introduce policy on when
->array_sectors can be modified.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/md/faulty.c     |    2 +-
 drivers/md/linear.c     |    4 ++--
 drivers/md/md.c         |    6 ++++++
 drivers/md/multipath.c  |    2 +-
 drivers/md/raid0.c      |    2 +-
 drivers/md/raid1.c      |    4 ++--
 drivers/md/raid10.c     |    2 +-
 drivers/md/raid5.c      |    6 +++---
 include/linux/raid/md.h |    1 +
 9 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/md/faulty.c b/drivers/md/faulty.c
index 9bf3629..93ff27b 100644
--- a/drivers/md/faulty.c
+++ b/drivers/md/faulty.c
@@ -309,7 +309,7 @@ static int run(mddev_t *mddev)
 	list_for_each_entry(rdev, &mddev->disks, same_set)
 		conf->rdev = rdev;
 
-	mddev->array_sectors = faulty_size(mddev, 0, 0);
+	md_set_size(mddev, faulty_size(mddev, 0, 0));
 	mddev->private = conf;
 
 	reconfig(mddev, mddev->layout, -1);
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 4a7397e..694f857 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -259,7 +259,7 @@ static int linear_run (mddev_t *mddev)
 	if (!conf)
 		return 1;
 	mddev->private = conf;
-	mddev->array_sectors = linear_size(mddev, 0, 0);
+	md_set_size(mddev, linear_size(mddev, 0, 0));
 
 	blk_queue_merge_bvec(mddev->queue, linear_mergeable_bvec);
 	mddev->queue->unplug_fn = linear_unplug;
@@ -293,7 +293,7 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev)
 	newconf->prev = mddev_to_conf(mddev);
 	mddev->private = newconf;
 	mddev->raid_disks++;
-	mddev->array_sectors = linear_size(mddev, 0, 0);
+	md_set_size(mddev, linear_size(mddev, 0, 0));
 	set_capacity(mddev->gendisk, mddev->array_sectors);
 	return 0;
 }
diff --git a/drivers/md/md.c b/drivers/md/md.c
index f538c3d..031411b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4975,6 +4975,12 @@ static int set_array_info(mddev_t * mddev, mdu_array_info_t *info)
 	return 0;
 }
 
+void md_set_size(mddev_t *mddev, sector_t array_sectors)
+{
+	mddev->array_sectors = array_sectors;
+}
+EXPORT_SYMBOL(md_set_size);
+
 static int update_size(mddev_t *mddev, sector_t num_sectors)
 {
 	mdk_rdev_t *rdev;
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 1ca72ed..f03d8ea 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -506,7 +506,7 @@ static int multipath_run (mddev_t *mddev)
 	/*
 	 * Ok, everything is just fine now
 	 */
-	mddev->array_sectors = multipath_size(mddev, 0, 0);
+	md_set_size(mddev, multipath_size(mddev, 0, 0));
 
 	mddev->queue->unplug_fn = multipath_unplug;
 	mddev->queue->backing_dev_info.congested_fn = multipath_congested;
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index d5c8ea9..64c62bc 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -303,7 +303,7 @@ static int raid0_run (mddev_t *mddev)
 		goto out_free_conf;
 
 	/* calculate array device size */
-	mddev->array_sectors = raid0_size(mddev, 0, 0);
+	md_set_size(mddev, raid0_size(mddev, 0, 0));
 
 	printk(KERN_INFO "raid0 : md_size is %llu sectors.\n",
 		(unsigned long long)mddev->array_sectors);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 5f62d42..a9f19b7 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2056,7 +2056,7 @@ static int run(mddev_t *mddev)
 	/*
 	 * Ok, everything is just fine now
 	 */
-	mddev->array_sectors = raid1_size(mddev, 0, 0);
+	md_set_size(mddev, raid1_size(mddev, 0, 0));
 
 	mddev->queue->unplug_fn = raid1_unplug;
 	mddev->queue->backing_dev_info.congested_fn = raid1_congested;
@@ -2121,7 +2121,7 @@ static int raid1_resize(mddev_t *mddev, sector_t sectors)
 	 * any io in the removed space completes, but it hardly seems
 	 * worth it.
 	 */
-	mddev->array_sectors = raid1_size(mddev, sectors, 0);
+	md_set_size(mddev, raid1_size(mddev, sectors, 0));
 	set_capacity(mddev->gendisk, mddev->array_sectors);
 	mddev->changed = 1;
 	if (mddev->array_sectors > mddev->dev_sectors &&
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 1c28679..571a3c8 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2193,7 +2193,7 @@ static int run(mddev_t *mddev)
 	/*
 	 * Ok, everything is just fine now
 	 */
-	mddev->array_sectors = raid10_size(mddev, 0, 0);
+	md_set_size(mddev, raid10_size(mddev, 0, 0));
 	mddev->resync_max_sectors = mddev->array_sectors;
 
 	mddev->queue->unplug_fn = raid10_unplug;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 3cc570b..41a6ec9 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4472,7 +4472,7 @@ static int run(mddev_t *mddev)
 	mddev->queue->backing_dev_info.congested_data = mddev;
 	mddev->queue->backing_dev_info.congested_fn = raid5_congested;
 
-	mddev->array_sectors = raid5_size(mddev, 0, 0);
+	md_set_size(mddev, raid5_size(mddev, 0, 0));
 
 	blk_queue_merge_bvec(mddev->queue, raid5_mergeable_bvec);
 
@@ -4694,7 +4694,7 @@ static int raid5_resize(mddev_t *mddev, sector_t sectors)
 	 * worth it.
 	 */
 	sectors &= ~((sector_t)mddev->chunk_size/512 - 1);
-	mddev->array_sectors = raid5_size(mddev, sectors, mddev->raid_disks);
+	md_set_size(mddev, raid5_size(mddev, sectors, mddev->raid_disks));
 	set_capacity(mddev->gendisk, mddev->array_sectors);
 	mddev->changed = 1;
 	if (sectors > mddev->dev_sectors && mddev->recovery_cp == MaxSector) {
@@ -4832,7 +4832,7 @@ static void end_reshape(raid5_conf_t *conf)
 	if (!test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery)) {
 		mddev_t *mddev = conf->mddev;
 
-		mddev->array_sectors = raid5_size(mddev, 0, conf->raid_disks);
+		md_set_size(mddev, raid5_size(mddev, 0, conf->raid_disks));
 		set_capacity(mddev->gendisk, mddev->array_sectors);
 		mddev->changed = 1;
 		conf->previous_raid_disks = conf->raid_disks;
diff --git a/include/linux/raid/md.h b/include/linux/raid/md.h
index 82bea14..de9bec2 100644
--- a/include/linux/raid/md.h
+++ b/include/linux/raid/md.h
@@ -75,6 +75,7 @@ extern void md_do_sync(mddev_t *mddev);
 extern void md_new_event(mddev_t *mddev);
 extern int md_allow_write(mddev_t *mddev);
 extern void md_wait_for_blocked_rdev(mdk_rdev_t *rdev, mddev_t *mddev);
+extern void md_set_size(mddev_t *mddev, sector_t array_sectors);
 
 #endif /* CONFIG_MD */
 #endif 


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] md: 'array_size' sysfs attribute
  2009-03-06  0:24 [PATCH 0/3] Support setting the array size from userspace Dan Williams
  2009-03-06  0:24 ` [PATCH 1/3] md: add 'size' as a personality method Dan Williams
  2009-03-06  0:24 ` [PATCH 2/3] md: centralize ->array_sectors modifications Dan Williams
@ 2009-03-06  0:24 ` Dan Williams
  2009-03-06 16:15   ` Andre Noll
  2 siblings, 1 reply; 10+ messages in thread
From: Dan Williams @ 2009-03-06  0:24 UTC (permalink / raw)
  To: neilb; +Cc: linux-raid, ed.ciechanowski, jacek.danecki

Allow userspace to set the size of the array according to the following
semantics:

1/ size must be <= to the size returned by mddev->pers->size(mddev, 0, 0)
   a) If size is set before the array is running, do_md_run will fail
      if size is greater than the default size
   b) A reshape attempt that reduces the default size to less than the set
      array size should be blocked
2/ once userspace sets the size the kernel will not change it
3/ writing 'default' to this attribute returns control of the size to the
   kernel and reverts to the size reported by the personality

Also, convert locations that need to know the default size from directly
reading ->array_sectors to <pers>_size.  Resync/reshape operations
always follow the default size.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 drivers/md/md.c           |   91 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/md/raid0.c        |    2 -
 drivers/md/raid1.c        |    6 ++-
 drivers/md/raid10.c       |    2 -
 drivers/md/raid5.c        |   15 +++++--
 include/linux/raid/md.h   |    1 
 include/linux/raid/md_k.h |    1 
 7 files changed, 109 insertions(+), 9 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 031411b..de55758 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -388,6 +388,11 @@ static inline int mddev_lock(mddev_t * mddev)
 	return mutex_lock_interruptible(&mddev->reconfig_mutex);
 }
 
+static inline int mddev_is_locked(mddev_t *mddev)
+{
+	return mutex_is_locked(&mddev->reconfig_mutex);
+}
+
 static inline int mddev_trylock(mddev_t * mddev)
 {
 	return mutex_trylock(&mddev->reconfig_mutex);
@@ -3628,6 +3633,65 @@ static struct md_sysfs_entry md_reshape_position =
 __ATTR(reshape_position, S_IRUGO|S_IWUSR, reshape_position_show,
        reshape_position_store);
 
+static ssize_t
+array_size_show(mddev_t *mddev, char *page)
+{
+	if (mddev->external_size)
+		return sprintf(page, "%llu\n",
+			       (unsigned long long)mddev->array_sectors/2);
+	else
+		return sprintf(page, "default\n");
+}
+
+static ssize_t
+array_size_store(mddev_t *mddev, const char *buf, size_t len)
+{
+	unsigned long long sectors;
+	struct block_device *bdev;
+
+	if (strncmp(buf, "default", 7) == 0) {
+		if (mddev->pers)
+			sectors = mddev->pers->size(mddev, 0, 0);
+		else
+			sectors = mddev->array_sectors;
+
+		mddev->external_size = 0;
+	} else {
+		int err;
+		sector_t new;
+
+		err = strict_strtoull(buf, 10, &sectors);
+		if (err < 0)
+			return err;
+		sectors *= 2;
+		new = sectors;
+		if (new != sectors) /* overflow */
+			return -EINVAL;
+		if (mddev->pers && mddev->pers->size(mddev, 0, 0) < sectors)
+			return -EINVAL;
+
+		mddev->external_size = 1;
+	}
+
+	mddev->array_sectors = sectors;
+	set_capacity(mddev->gendisk, mddev->array_sectors);
+	if (mddev->pers) {
+		bdev = bdget_disk(mddev->gendisk, 0);
+		if (bdev) {
+			mutex_lock(&bdev->bd_inode->i_mutex);
+			i_size_write(bdev->bd_inode,
+				     (loff_t)mddev->array_sectors << 9);
+			mutex_unlock(&bdev->bd_inode->i_mutex);
+			bdput(bdev);
+		}
+	}
+
+	return len;
+}
+
+static struct md_sysfs_entry md_array_size =
+__ATTR(array_size, S_IRUGO|S_IWUSR, array_size_show,
+       array_size_store);
 
 static struct attribute *md_default_attrs[] = {
 	&md_level.attr,
@@ -3641,6 +3705,7 @@ static struct attribute *md_default_attrs[] = {
 	&md_safe_delay.attr,
 	&md_array_state.attr,
 	&md_reshape_position.attr,
+	&md_array_size.attr,
 	NULL,
 };
 
@@ -4043,7 +4108,17 @@ static int do_md_run(mddev_t * mddev)
 	err = mddev->pers->run(mddev);
 	if (err)
 		printk(KERN_ERR "md: pers->run() failed ...\n");
-	else if (mddev->pers->sync_request) {
+	else if (mddev->pers->size(mddev, 0, 0) < mddev->array_sectors) {
+		WARN_ONCE(!mddev->external_size, "%s: default size too small,"
+			  " but 'external_size' not in effect?\n", __func__);
+		printk(KERN_ERR
+		       "md: invalid array_size %llu > default size %llu\n",
+		       (unsigned long long)mddev->array_sectors / 2,
+		       (unsigned long long)mddev->pers->size(mddev, 0, 0) / 2);
+		err = -EINVAL;
+		mddev->pers->stop(mddev);
+	}
+	if (err == 0 && mddev->pers->sync_request) {
 		err = bitmap_create(mddev);
 		if (err) {
 			printk(KERN_ERR "%s: failed to create bitmap (%d)\n",
@@ -4279,6 +4354,7 @@ static int do_md_stop(mddev_t * mddev, int mode, int is_open)
 		export_array(mddev);
 
 		mddev->array_sectors = 0;
+		mddev->external_size = 0;
 		mddev->dev_sectors = 0;
 		mddev->raid_disks = 0;
 		mddev->recovery_cp = 0;
@@ -4977,10 +5053,23 @@ static int set_array_info(mddev_t * mddev, mdu_array_info_t *info)
 
 void md_set_size(mddev_t *mddev, sector_t array_sectors)
 {
+	WARN(!mddev_is_locked(mddev), "%s: unlocked mddev!\n", __func__);
+
+	if (mddev->external_size)
+		return;
+
 	mddev->array_sectors = array_sectors;
 }
 EXPORT_SYMBOL(md_set_size);
 
+void md_set_size_lock(mddev_t *mddev, sector_t array_sectors)
+{
+	mddev_lock(mddev);
+	md_set_size(mddev, array_sectors);
+	mddev_unlock(mddev);
+}
+EXPORT_SYMBOL(md_set_size_lock);
+
 static int update_size(mddev_t *mddev, sector_t num_sectors)
 {
 	mdk_rdev_t *rdev;
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 64c62bc..6547d9c 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -310,7 +310,7 @@ static int raid0_run (mddev_t *mddev)
 	printk(KERN_INFO "raid0 : conf->spacing is %llu sectors.\n",
 		(unsigned long long)conf->spacing);
 	{
-		sector_t s = mddev->array_sectors;
+		sector_t s = raid0_size(mddev, 0, 0);
 		sector_t space = conf->spacing;
 		int round;
 		conf->sector_shift = 0;
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a9f19b7..4d0df53 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2122,14 +2122,16 @@ static int raid1_resize(mddev_t *mddev, sector_t sectors)
 	 * worth it.
 	 */
 	md_set_size(mddev, raid1_size(mddev, sectors, 0));
+	if (mddev->array_sectors > raid1_size(mddev, sectors, 0))
+		return -EINVAL;
 	set_capacity(mddev->gendisk, mddev->array_sectors);
 	mddev->changed = 1;
-	if (mddev->array_sectors > mddev->dev_sectors &&
+	if (sectors > mddev->dev_sectors &&
 	    mddev->recovery_cp == MaxSector) {
 		mddev->recovery_cp = mddev->dev_sectors;
 		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	}
-	mddev->dev_sectors = mddev->array_sectors;
+	mddev->dev_sectors = sectors;
 	mddev->resync_max_sectors = sectors;
 	return 0;
 }
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 571a3c8..4c30d03 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2194,7 +2194,7 @@ static int run(mddev_t *mddev)
 	 * Ok, everything is just fine now
 	 */
 	md_set_size(mddev, raid10_size(mddev, 0, 0));
-	mddev->resync_max_sectors = mddev->array_sectors;
+	mddev->resync_max_sectors = raid10_size(mddev, 0, 0);
 
 	mddev->queue->unplug_fn = raid10_unplug;
 	mddev->queue->backing_dev_info.congested_fn = raid10_congested;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 41a6ec9..017c948 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3700,6 +3700,8 @@ static int make_request(struct request_queue *q, struct bio * bi)
 	return 0;
 }
 
+static sector_t raid5_size(mddev_t *mddev, sector_t sectors, int raid_disks);
+
 static sector_t reshape_request(mddev_t *mddev, sector_t sector_nr, int *skipped)
 {
 	/* reshaping is quite different to recovery/resync so it is
@@ -3778,7 +3780,7 @@ static sector_t reshape_request(mddev_t *mddev, sector_t sector_nr, int *skipped
 			    j == sh->qd_idx)
 				continue;
 			s = compute_blocknr(sh, j);
-			if (s < mddev->array_sectors) {
+			if (s < raid5_size(mddev, 0, 0)) {
 				skipped = 1;
 				continue;
 			}
@@ -4695,6 +4697,9 @@ static int raid5_resize(mddev_t *mddev, sector_t sectors)
 	 */
 	sectors &= ~((sector_t)mddev->chunk_size/512 - 1);
 	md_set_size(mddev, raid5_size(mddev, sectors, mddev->raid_disks));
+	if (mddev->array_sectors >
+	    raid5_size(mddev, sectors, mddev->raid_disks))
+		return -EINVAL;
 	set_capacity(mddev->gendisk, mddev->array_sectors);
 	mddev->changed = 1;
 	if (sectors > mddev->dev_sectors && mddev->recovery_cp == MaxSector) {
@@ -4832,7 +4837,7 @@ static void end_reshape(raid5_conf_t *conf)
 	if (!test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery)) {
 		mddev_t *mddev = conf->mddev;
 
-		md_set_size(mddev, raid5_size(mddev, 0, conf->raid_disks));
+		md_set_size_lock(mddev, raid5_size(mddev, 0, conf->raid_disks));
 		set_capacity(mddev->gendisk, mddev->array_sectors);
 		mddev->changed = 1;
 		conf->previous_raid_disks = conf->raid_disks;
@@ -4959,8 +4964,10 @@ static int raid5_reconfig(mddev_t *mddev, int new_layout, int new_chunk)
 	/* Currently the layout and chunk size can only be changed
 	 * for a 2-drive raid array, as in that case no data shuffling
 	 * is required.
-	 * Later we might validate these and set new_* so a reshape
-	 * can complete the change.
+	 * Later we might validate these and set new_* so a reshape can
+	 * complete the change (in which case raid5_size would need to
+	 * be updated to allow validating the new geometry does not
+	 * reduce the size below the user specified array_size)
 	 */
 	raid5_conf_t *conf = mddev_to_conf(mddev);
 
diff --git a/include/linux/raid/md.h b/include/linux/raid/md.h
index de9bec2..e4ae183 100644
--- a/include/linux/raid/md.h
+++ b/include/linux/raid/md.h
@@ -76,6 +76,7 @@ extern void md_new_event(mddev_t *mddev);
 extern int md_allow_write(mddev_t *mddev);
 extern void md_wait_for_blocked_rdev(mdk_rdev_t *rdev, mddev_t *mddev);
 extern void md_set_size(mddev_t *mddev, sector_t array_sectors);
+extern void md_set_size_lock(mddev_t *mddev, sector_t array_sectors);
 
 #endif /* CONFIG_MD */
 #endif 
diff --git a/include/linux/raid/md_k.h b/include/linux/raid/md_k.h
index 564ce81..fd28152 100644
--- a/include/linux/raid/md_k.h
+++ b/include/linux/raid/md_k.h
@@ -160,6 +160,7 @@ struct mddev_s
 	sector_t			dev_sectors; 	/* used size of
 							 * component devices */
 	sector_t			array_sectors; /* exported array size */
+	int				external_size; /* size managed externallly */
 	__u64				events;
 
 	char				uuid[16];


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] md: add 'size' as a personality method
  2009-03-06  0:24 ` [PATCH 1/3] md: add 'size' as a personality method Dan Williams
@ 2009-03-06 16:15   ` Andre Noll
  2009-03-06 17:55     ` Dan Williams
  0 siblings, 1 reply; 10+ messages in thread
From: Andre Noll @ 2009-03-06 16:15 UTC (permalink / raw)
  To: Dan Williams; +Cc: neilb, linux-raid, ed.ciechanowski, jacek.danecki

[-- Attachment #1: Type: text/plain, Size: 1166 bytes --]

On 17:24, Dan Williams wrote:
> +raid10_size(mddev_t *mddev, sector_t sectors, int raid_disks)
> +{
> +	sector_t size;
> +	int chunk_shift;
> +	conf_t *conf = mddev_to_conf(mddev);
> +	int chunk_size = mddev->chunk_size;
> +
> +	if (!raid_disks)
> +		raid_disks = mddev->raid_disks;
> +	if (!sectors)
> +		sectors = mddev->dev_sectors;
> +
> +	chunk_shift = ffz(~chunk_size) - 9;
> +	size = sectors >> chunk_shift;
> +	sector_div(size, conf->far_copies);
> +	size = size * raid_disks;
> +	sector_div(size, conf->near_copies);
> +
> +	return size << chunk_shift;
> +}

Is there a reason you are recomputing chunk_shift instead of using
conf->chunk_shift.

> +static sector_t
> +raid5_size(mddev_t *mddev, sector_t sectors, int raid_disks)
> +{
> +	raid5_conf_t *conf = mddev_to_conf(mddev);
> +	int chunk_size = mddev->chunk_size;
> +
> +	if (!sectors)
> +		sectors = mddev->dev_sectors;
> +	if (!raid_disks)
> +		raid_disks = conf->previous_raid_disks;
> +
> +	sectors &= ~(chunk_size / 512 - 1);

Don't we need a cast to sector_t here?

Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] md: 'array_size' sysfs attribute
  2009-03-06  0:24 ` [PATCH 3/3] md: 'array_size' sysfs attribute Dan Williams
@ 2009-03-06 16:15   ` Andre Noll
  2009-03-06 18:20     ` Dan Williams
  2009-03-07  6:28     ` Dan Williams
  0 siblings, 2 replies; 10+ messages in thread
From: Andre Noll @ 2009-03-06 16:15 UTC (permalink / raw)
  To: Dan Williams; +Cc: neilb, linux-raid, ed.ciechanowski, jacek.danecki

[-- Attachment #1: Type: text/plain, Size: 3203 bytes --]

On 17:24, Dan Williams wrote:
> Allow userspace to set the size of the array according to the following
> semantics:
> 
> 1/ size must be <= to the size returned by mddev->pers->size(mddev, 0, 0)
>    a) If size is set before the array is running, do_md_run will fail
>       if size is greater than the default size
>    b) A reshape attempt that reduces the default size to less than the set
>       array size should be blocked
> 2/ once userspace sets the size the kernel will not change it

This holds only until the array is stopped and reassembled (for
example due to a reboot). Is that correct and intended?

> +static ssize_t
> +array_size_store(mddev_t *mddev, const char *buf, size_t len)
> +{
> +	unsigned long long sectors;
> +	struct block_device *bdev;
> +
> +	if (strncmp(buf, "default", 7) == 0) {
> +		if (mddev->pers)
> +			sectors = mddev->pers->size(mddev, 0, 0);
> +		else
> +			sectors = mddev->array_sectors;
> +
> +		mddev->external_size = 0;
> +	} else {
> +		int err;
> +		sector_t new;
> +
> +		err = strict_strtoull(buf, 10, &sectors);
> +		if (err < 0)
> +			return err;
> +		sectors *= 2;
> +		new = sectors;
> +		if (new != sectors) /* overflow */
> +			return -EINVAL;

Is it possible that already the "sectors *= 2" overflows? If this
happens a much too small value is going to be stored by set_capacity()
later.

> +	mddev->array_sectors = sectors;
> +	set_capacity(mddev->gendisk, mddev->array_sectors);
> +	if (mddev->pers) {
> +		bdev = bdget_disk(mddev->gendisk, 0);
> +		if (bdev) {
> +			mutex_lock(&bdev->bd_inode->i_mutex);
> +			i_size_write(bdev->bd_inode,
> +				     (loff_t)mddev->array_sectors << 9);
> +			mutex_unlock(&bdev->bd_inode->i_mutex);
> +			bdput(bdev);
> +		}
> +	}

bdev is only used inside the if (mddev->pers). No need to define it at
the top of the function.

> @@ -4043,7 +4108,17 @@ static int do_md_run(mddev_t * mddev)
>  	err = mddev->pers->run(mddev);
>  	if (err)
>  		printk(KERN_ERR "md: pers->run() failed ...\n");
> -	else if (mddev->pers->sync_request) {
> +	else if (mddev->pers->size(mddev, 0, 0) < mddev->array_sectors) {
> +		WARN_ONCE(!mddev->external_size, "%s: default size too small,"
> +			  " but 'external_size' not in effect?\n", __func__);
> +		printk(KERN_ERR
> +		       "md: invalid array_size %llu > default size %llu\n",
> +		       (unsigned long long)mddev->array_sectors / 2,
> +		       (unsigned long long)mddev->pers->size(mddev, 0, 0) / 2);
> +		err = -EINVAL;
> +		mddev->pers->stop(mddev);
> +	}

What's the point of the WARN_ONCE() if it is followed by a printk() that
is always being printed?

>  void md_set_size(mddev_t *mddev, sector_t array_sectors)
>  {
> +	WARN(!mddev_is_locked(mddev), "%s: unlocked mddev!\n", __func__);

An unlocked mddev would actually be a bug, no? Also, the name
"md_set_size" is a bit unfortunate as an exported function name because
it's not clear from the name that the size should be specified in
sectors. "md_set_sector_count" or "md_set_array_sectors" is probably
clearer.

Regards
Andre

-- 
The only person who always got his work done by Friday was Robinson Crusoe

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] md: add 'size' as a personality method
  2009-03-06 16:15   ` Andre Noll
@ 2009-03-06 17:55     ` Dan Williams
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Williams @ 2009-03-06 17:55 UTC (permalink / raw)
  To: Andre Noll; +Cc: neilb, linux-raid, ed.ciechanowski, jacek.danecki

On Fri, Mar 6, 2009 at 9:15 AM, Andre Noll <maan@systemlinux.org> wrote:
> On 17:24, Dan Williams wrote:
>> +raid10_size(mddev_t *mddev, sector_t sectors, int raid_disks)
>> +{
>> +     sector_t size;
>> +     int chunk_shift;
>> +     conf_t *conf = mddev_to_conf(mddev);
>> +     int chunk_size = mddev->chunk_size;
>> +
>> +     if (!raid_disks)
>> +             raid_disks = mddev->raid_disks;
>> +     if (!sectors)
>> +             sectors = mddev->dev_sectors;
>> +
>> +     chunk_shift = ffz(~chunk_size) - 9;
>> +     size = sectors >> chunk_shift;
>> +     sector_div(size, conf->far_copies);
>> +     size = size * raid_disks;
>> +     sector_div(size, conf->near_copies);
>> +
>> +     return size << chunk_shift;
>> +}
>
> Is there a reason you are recomputing chunk_shift instead of using
> conf->chunk_shift.
>

This is a leftover from an earlier version of the patch where I was
allowing chunk_size to specified.  Since we do not support chunk_size
reshaping I dropped the extra argument, will clean this up as well.

>> +static sector_t
>> +raid5_size(mddev_t *mddev, sector_t sectors, int raid_disks)
>> +{
>> +     raid5_conf_t *conf = mddev_to_conf(mddev);
>> +     int chunk_size = mddev->chunk_size;
>> +
>> +     if (!sectors)
>> +             sectors = mddev->dev_sectors;
>> +     if (!raid_disks)
>> +             raid_disks = conf->previous_raid_disks;
>> +
>> +     sectors &= ~(chunk_size / 512 - 1);
>
> Don't we need a cast to sector_t here?
>

Yes.

Thanks,
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] md: 'array_size' sysfs attribute
  2009-03-06 16:15   ` Andre Noll
@ 2009-03-06 18:20     ` Dan Williams
  2009-03-07  6:28     ` Dan Williams
  1 sibling, 0 replies; 10+ messages in thread
From: Dan Williams @ 2009-03-06 18:20 UTC (permalink / raw)
  To: Andre Noll; +Cc: neilb, linux-raid, ed.ciechanowski, jacek.danecki

On Fri, Mar 6, 2009 at 9:15 AM, Andre Noll <maan@systemlinux.org> wrote:
> On 17:24, Dan Williams wrote:
>> Allow userspace to set the size of the array according to the following
>> semantics:
>>
>> 1/ size must be <= to the size returned by mddev->pers->size(mddev, 0, 0)
>>    a) If size is set before the array is running, do_md_run will fail
>>       if size is greater than the default size
>>    b) A reshape attempt that reduces the default size to less than the set
>>       array size should be blocked
>> 2/ once userspace sets the size the kernel will not change it
>
> This holds only until the array is stopped and reassembled (for
> example due to a reboot). Is that correct and intended?

Yes.  For arrays that need this the size is recorded in the metadata,
so it will be persistent across reboots.  For the reshape-to-smaller
size cases mdadm will need to remember to set this accordingly.

>
>> +static ssize_t
>> +array_size_store(mddev_t *mddev, const char *buf, size_t len)
>> +{
>> +     unsigned long long sectors;
>> +     struct block_device *bdev;
>> +
>> +     if (strncmp(buf, "default", 7) == 0) {
>> +             if (mddev->pers)
>> +                     sectors = mddev->pers->size(mddev, 0, 0);
>> +             else
>> +                     sectors = mddev->array_sectors;
>> +
>> +             mddev->external_size = 0;
>> +     } else {
>> +             int err;
>> +             sector_t new;
>> +
>> +             err = strict_strtoull(buf, 10, &sectors);
>> +             if (err < 0)
>> +                     return err;
>> +             sectors *= 2;
>> +             new = sectors;
>> +             if (new != sectors) /* overflow */
>> +                     return -EINVAL;
>
> Is it possible that already the "sectors *= 2" overflows? If this
> happens a much too small value is going to be stored by set_capacity()
> later.

That is true.  If we overflow the blocks->sectors conversion we should
be returning an error.  This is also needs to be fixed up in
rdev_size_store and size_store.

>
>> +     mddev->array_sectors = sectors;
>> +     set_capacity(mddev->gendisk, mddev->array_sectors);
>> +     if (mddev->pers) {
>> +             bdev = bdget_disk(mddev->gendisk, 0);
>> +             if (bdev) {
>> +                     mutex_lock(&bdev->bd_inode->i_mutex);
>> +                     i_size_write(bdev->bd_inode,
>> +                                  (loff_t)mddev->array_sectors << 9);
>> +                     mutex_unlock(&bdev->bd_inode->i_mutex);
>> +                     bdput(bdev);
>> +             }
>> +     }
>
> bdev is only used inside the if (mddev->pers). No need to define it at
> the top of the function.
>

ok

>> @@ -4043,7 +4108,17 @@ static int do_md_run(mddev_t * mddev)
>>       err = mddev->pers->run(mddev);
>>       if (err)
>>               printk(KERN_ERR "md: pers->run() failed ...\n");
>> -     else if (mddev->pers->sync_request) {
>> +     else if (mddev->pers->size(mddev, 0, 0) < mddev->array_sectors) {
>> +             WARN_ONCE(!mddev->external_size, "%s: default size too small,"
>> +                       " but 'external_size' not in effect?\n", __func__);
>> +             printk(KERN_ERR
>> +                    "md: invalid array_size %llu > default size %llu\n",
>> +                    (unsigned long long)mddev->array_sectors / 2,
>> +                    (unsigned long long)mddev->pers->size(mddev, 0, 0) / 2);
>> +             err = -EINVAL;
>> +             mddev->pers->stop(mddev);
>> +     }
>
> What's the point of the WARN_ONCE() if it is followed by a printk() that
> is always being printed?
>

There is a subtle difference.  In both cases it is an error, but the
WARN_ONCE is checking for a kernel coding bug while the printk will
also trigger on a bad value from userspace.

>>  void md_set_size(mddev_t *mddev, sector_t array_sectors)
>>  {
>> +     WARN(!mddev_is_locked(mddev), "%s: unlocked mddev!\n", __func__);
>
> An unlocked mddev would actually be a bug, no?

A bug, yes, but not a fatal one.  The user can now report the warning
and fix up the size because the system stayed running.

> Also, the name
> "md_set_size" is a bit unfortunate as an exported function name because
> it's not clear from the name that the size should be specified in
> sectors. "md_set_sector_count" or "md_set_array_sectors" is probably
> clearer.

Admittedly a minor issue, not much different than the lack of clarity
for set_capacity(), but I will go ahead and make this
md_set_array_sectors.

Thanks,
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] md: 'array_size' sysfs attribute
  2009-03-06 16:15   ` Andre Noll
  2009-03-06 18:20     ` Dan Williams
@ 2009-03-07  6:28     ` Dan Williams
  2009-03-09 10:12       ` Andre Noll
  1 sibling, 1 reply; 10+ messages in thread
From: Dan Williams @ 2009-03-07  6:28 UTC (permalink / raw)
  To: Andre Noll, neilb; +Cc: linux-raid, Ciechanowski, Ed, Danecki, Jacek

Subject: md: fixups to the userspace array size changes

From: Dan Williams <dan.j.williams@intel.com>

As identified by Andre Noll <maan@systemlinux.org>:
1/ No need to recalculate chunk shift in raid10_size
2/ Check for overflow in conversions from blocks to sectors
3/ Missing cast to sector_t in raid5_size

Cc: Andre Noll <maan@systemlinux.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
On Fri, 2009-03-06 at 09:15 -0700, Andre Noll wrote: 
> This holds only until the array is stopped and reassembled (for
> example due to a reboot). Is that correct and intended?
[..] 
> Is it possible that already the "sectors *= 2" overflows? If this
> happens a much too small value is going to be stored by set_capacity()
> later.
[..]
> bdev is only used inside the if (mddev->pers). No need to define it at
> the top of the function.
[..]

Here is a fixup patch to be folded into the existing series.  Do you
want the fixed series in a git tree?

Thanks,
Dan

 drivers/md/faulty.c    |    2 +-
 drivers/md/linear.c    |    4 ++-
 drivers/md/md.c        |   55 ++++++++++++++++++++++++++++--------------------
 drivers/md/md.h        |    4 ++-
 drivers/md/multipath.c |    2 +-
 drivers/md/raid0.c     |    2 +-
 drivers/md/raid1.c     |    4 ++-
 drivers/md/raid10.c    |    9 +++-----
 drivers/md/raid5.c     |    9 +++-----
 9 files changed, 48 insertions(+), 43 deletions(-)

diff --git a/drivers/md/faulty.c b/drivers/md/faulty.c
index 24656c2..8695809 100644
--- a/drivers/md/faulty.c
+++ b/drivers/md/faulty.c
@@ -312,7 +312,7 @@ static int run(mddev_t *mddev)
 	list_for_each_entry(rdev, &mddev->disks, same_set)
 		conf->rdev = rdev;
 
-	md_set_size(mddev, faulty_size(mddev, 0, 0));
+	md_set_array_sectors(mddev, faulty_size(mddev, 0, 0));
 	mddev->private = conf;
 
 	reconfig(mddev, mddev->layout, -1);
diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 5b982a0..7a36e38 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -263,7 +263,7 @@ static int linear_run (mddev_t *mddev)
 	if (!conf)
 		return 1;
 	mddev->private = conf;
-	md_set_size(mddev, linear_size(mddev, 0, 0));
+	md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
 
 	blk_queue_merge_bvec(mddev->queue, linear_mergeable_bvec);
 	mddev->queue->unplug_fn = linear_unplug;
@@ -297,7 +297,7 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev)
 	newconf->prev = mddev_to_conf(mddev);
 	mddev->private = newconf;
 	mddev->raid_disks++;
-	md_set_size(mddev, linear_size(mddev, 0, 0));
+	md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
 	set_capacity(mddev->gendisk, mddev->array_sectors);
 	return 0;
 }
diff --git a/drivers/md/md.c b/drivers/md/md.c
index cd1f6f0..b28db42 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2287,16 +2287,34 @@ static int overlaps(sector_t s1, sector_t l1, sector_t s2, sector_t l2)
 	return 1;
 }
 
+static int strict_blocks_to_sectors(const char *buf, sector_t *sectors)
+{
+	unsigned long long blocks;
+	sector_t new;
+
+	if (strict_strtoull(buf, 10, &blocks) < 0)
+		return -EINVAL;
+
+	if (blocks & 1ULL << (8 * sizeof(blocks) - 1))
+		return -EINVAL; /* sector conversion overflow */
+
+	new = blocks * 2;
+	if (new != blocks * 2)
+		return -EINVAL; /* unsigned long long to sector_t overflow */
+
+	*sectors = new;
+	return 0;
+}
+
 static ssize_t
 rdev_size_store(mdk_rdev_t *rdev, const char *buf, size_t len)
 {
 	mddev_t *my_mddev = rdev->mddev;
 	sector_t oldsectors = rdev->sectors;
-	unsigned long long sectors;
+	sector_t sectors;
 
-	if (strict_strtoull(buf, 10, &sectors) < 0)
+	if (strict_blocks_to_sectors(buf, &sectors) < 0)
 		return -EINVAL;
-	sectors *= 2;
 	if (my_mddev->pers && rdev->raid_disk >= 0) {
 		if (my_mddev->persistent) {
 			sectors = super_types[my_mddev->major_version].
@@ -3187,12 +3205,11 @@ size_store(mddev_t *mddev, const char *buf, size_t len)
 	 * not increase it (except from 0).
 	 * If array is active, we can try an on-line resize
 	 */
-	unsigned long long sectors;
-	int err = strict_strtoull(buf, 10, &sectors);
+	sector_t sectors;
+	int err = strict_blocks_to_sectors(buf, &sectors);
 
 	if (err < 0)
 		return err;
-	sectors *= 2;
 	if (mddev->pers) {
 		err = update_size(mddev, sectors);
 		md_update_sb(mddev, 1);
@@ -3645,8 +3662,7 @@ array_size_show(mddev_t *mddev, char *page)
 static ssize_t
 array_size_store(mddev_t *mddev, const char *buf, size_t len)
 {
-	unsigned long long sectors;
-	struct block_device *bdev;
+	sector_t sectors;
 
 	if (strncmp(buf, "default", 7) == 0) {
 		if (mddev->pers)
@@ -3656,15 +3672,7 @@ array_size_store(mddev_t *mddev, const char *buf, size_t len)
 
 		mddev->external_size = 0;
 	} else {
-		int err;
-		sector_t new;
-
-		err = strict_strtoull(buf, 10, &sectors);
-		if (err < 0)
-			return err;
-		sectors *= 2;
-		new = sectors;
-		if (new != sectors) /* overflow */
+		if (strict_blocks_to_sectors(buf, &sectors) < 0)
 			return -EINVAL;
 		if (mddev->pers && mddev->pers->size(mddev, 0, 0) < sectors)
 			return -EINVAL;
@@ -3675,7 +3683,8 @@ array_size_store(mddev_t *mddev, const char *buf, size_t len)
 	mddev->array_sectors = sectors;
 	set_capacity(mddev->gendisk, mddev->array_sectors);
 	if (mddev->pers) {
-		bdev = bdget_disk(mddev->gendisk, 0);
+		struct block_device *bdev = bdget_disk(mddev->gendisk, 0);
+
 		if (bdev) {
 			mutex_lock(&bdev->bd_inode->i_mutex);
 			i_size_write(bdev->bd_inode,
@@ -5050,7 +5059,7 @@ static int set_array_info(mddev_t * mddev, mdu_array_info_t *info)
 	return 0;
 }
 
-void md_set_size(mddev_t *mddev, sector_t array_sectors)
+void md_set_array_sectors(mddev_t *mddev, sector_t array_sectors)
 {
 	WARN(!mddev_is_locked(mddev), "%s: unlocked mddev!\n", __func__);
 
@@ -5059,15 +5068,15 @@ void md_set_size(mddev_t *mddev, sector_t array_sectors)
 
 	mddev->array_sectors = array_sectors;
 }
-EXPORT_SYMBOL(md_set_size);
+EXPORT_SYMBOL(md_set_array_sectors);
 
-void md_set_size_lock(mddev_t *mddev, sector_t array_sectors)
+void md_set_array_sectors_lock(mddev_t *mddev, sector_t array_sectors)
 {
 	mddev_lock(mddev);
-	md_set_size(mddev, array_sectors);
+	md_set_array_sectors(mddev, array_sectors);
 	mddev_unlock(mddev);
 }
-EXPORT_SYMBOL(md_set_size_lock);
+EXPORT_SYMBOL(md_set_array_sectors_lock);
 
 static int update_size(mddev_t *mddev, sector_t num_sectors)
 {
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 5ef9625..614329d 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -431,5 +431,5 @@ extern void md_do_sync(mddev_t *mddev);
 extern void md_new_event(mddev_t *mddev);
 extern int md_allow_write(mddev_t *mddev);
 extern void md_wait_for_blocked_rdev(mdk_rdev_t *rdev, mddev_t *mddev);
-extern void md_set_size(mddev_t *mddev, sector_t array_sectors);
-extern void md_set_size_lock(mddev_t *mddev, sector_t array_sectors);
+extern void md_set_array_sectors(mddev_t *mddev, sector_t array_sectors);
+extern void md_set_array_sectors_lock(mddev_t *mddev, sector_t array_sectors);
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 273abca..41ced0c 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -510,7 +510,7 @@ static int multipath_run (mddev_t *mddev)
 	/*
 	 * Ok, everything is just fine now
 	 */
-	md_set_size(mddev, multipath_size(mddev, 0, 0));
+	md_set_array_sectors(mddev, multipath_size(mddev, 0, 0));
 
 	mddev->queue->unplug_fn = multipath_unplug;
 	mddev->queue->backing_dev_info.congested_fn = multipath_congested;
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index e0603e2..c08d755 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -306,7 +306,7 @@ static int raid0_run (mddev_t *mddev)
 		goto out_free_conf;
 
 	/* calculate array device size */
-	md_set_size(mddev, raid0_size(mddev, 0, 0));
+	md_set_array_sectors(mddev, raid0_size(mddev, 0, 0));
 
 	printk(KERN_INFO "raid0 : md_size is %llu sectors.\n",
 		(unsigned long long)mddev->array_sectors);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index e22fca0..b4f4bad 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2059,7 +2059,7 @@ static int run(mddev_t *mddev)
 	/*
 	 * Ok, everything is just fine now
 	 */
-	md_set_size(mddev, raid1_size(mddev, 0, 0));
+	md_set_array_sectors(mddev, raid1_size(mddev, 0, 0));
 
 	mddev->queue->unplug_fn = raid1_unplug;
 	mddev->queue->backing_dev_info.congested_fn = raid1_congested;
@@ -2124,7 +2124,7 @@ static int raid1_resize(mddev_t *mddev, sector_t sectors)
 	 * any io in the removed space completes, but it hardly seems
 	 * worth it.
 	 */
-	md_set_size(mddev, raid1_size(mddev, sectors, 0));
+	md_set_array_sectors(mddev, raid1_size(mddev, sectors, 0));
 	if (mddev->array_sectors > raid1_size(mddev, sectors, 0))
 		return -EINVAL;
 	set_capacity(mddev->gendisk, mddev->array_sectors);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index dd9850f..efbfbfa 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2026,22 +2026,19 @@ static sector_t
 raid10_size(mddev_t *mddev, sector_t sectors, int raid_disks)
 {
 	sector_t size;
-	int chunk_shift;
 	conf_t *conf = mddev_to_conf(mddev);
-	int chunk_size = mddev->chunk_size;
 
 	if (!raid_disks)
 		raid_disks = mddev->raid_disks;
 	if (!sectors)
 		sectors = mddev->dev_sectors;
 
-	chunk_shift = ffz(~chunk_size) - 9;
-	size = sectors >> chunk_shift;
+	size = sectors >> conf->chunk_shift;
 	sector_div(size, conf->far_copies);
 	size = size * raid_disks;
 	sector_div(size, conf->near_copies);
 
-	return size << chunk_shift;
+	return size << conf->chunk_shift;
 }
 
 static int run(mddev_t *mddev)
@@ -2195,7 +2192,7 @@ static int run(mddev_t *mddev)
 	/*
 	 * Ok, everything is just fine now
 	 */
-	md_set_size(mddev, raid10_size(mddev, 0, 0));
+	md_set_array_sectors(mddev, raid10_size(mddev, 0, 0));
 	mddev->resync_max_sectors = raid10_size(mddev, 0, 0);
 
 	mddev->queue->unplug_fn = raid10_unplug;
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 0e9e867..582a87e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4172,14 +4172,13 @@ static sector_t
 raid5_size(mddev_t *mddev, sector_t sectors, int raid_disks)
 {
 	raid5_conf_t *conf = mddev_to_conf(mddev);
-	int chunk_size = mddev->chunk_size;
 
 	if (!sectors)
 		sectors = mddev->dev_sectors;
 	if (!raid_disks)
 		raid_disks = conf->previous_raid_disks;
 
-	sectors &= ~(chunk_size / 512 - 1);
+	sectors &= ~((sector_t)mddev->chunk_size/512 - 1);
 	return sectors * (raid_disks - conf->max_degraded);
 }
 
@@ -4477,7 +4476,7 @@ static int run(mddev_t *mddev)
 	mddev->queue->backing_dev_info.congested_data = mddev;
 	mddev->queue->backing_dev_info.congested_fn = raid5_congested;
 
-	md_set_size(mddev, raid5_size(mddev, 0, 0));
+	md_set_array_sectors(mddev, raid5_size(mddev, 0, 0));
 
 	blk_queue_merge_bvec(mddev->queue, raid5_mergeable_bvec);
 
@@ -4699,7 +4698,7 @@ static int raid5_resize(mddev_t *mddev, sector_t sectors)
 	 * worth it.
 	 */
 	sectors &= ~((sector_t)mddev->chunk_size/512 - 1);
-	md_set_size(mddev, raid5_size(mddev, sectors, mddev->raid_disks));
+	md_set_array_sectors(mddev, raid5_size(mddev, sectors, mddev->raid_disks));
 	if (mddev->array_sectors >
 	    raid5_size(mddev, sectors, mddev->raid_disks))
 		return -EINVAL;
@@ -4840,7 +4839,7 @@ static void end_reshape(raid5_conf_t *conf)
 	if (!test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery)) {
 		mddev_t *mddev = conf->mddev;
 
-		md_set_size_lock(mddev, raid5_size(mddev, 0, conf->raid_disks));
+		md_set_array_sectors_lock(mddev, raid5_size(mddev, 0, conf->raid_disks));
 		set_capacity(mddev->gendisk, mddev->array_sectors);
 		mddev->changed = 1;
 		conf->previous_raid_disks = conf->raid_disks;



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 3/3] md: 'array_size' sysfs attribute
  2009-03-07  6:28     ` Dan Williams
@ 2009-03-09 10:12       ` Andre Noll
  0 siblings, 0 replies; 10+ messages in thread
From: Andre Noll @ 2009-03-09 10:12 UTC (permalink / raw)
  To: Dan Williams; +Cc: neilb, linux-raid, Ciechanowski, Ed, Danecki, Jacek

[-- Attachment #1: Type: text/plain, Size: 675 bytes --]

On 23:28, Dan Williams wrote:
> Subject: md: fixups to the userspace array size changes
> 
> From: Dan Williams <dan.j.williams@intel.com>
> 
> As identified by Andre Noll <maan@systemlinux.org>:
> 1/ No need to recalculate chunk shift in raid10_size
> 2/ Check for overflow in conversions from blocks to sectors
> 3/ Missing cast to sector_t in raid5_size
> 
> Cc: Andre Noll <maan@systemlinux.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Yes, these fixes all look fine to me. Feel free to add

Reviewed-by: Andre Noll <maan@systemlinux.org>

Thanks
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2009-03-09 10:12 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-06  0:24 [PATCH 0/3] Support setting the array size from userspace Dan Williams
2009-03-06  0:24 ` [PATCH 1/3] md: add 'size' as a personality method Dan Williams
2009-03-06 16:15   ` Andre Noll
2009-03-06 17:55     ` Dan Williams
2009-03-06  0:24 ` [PATCH 2/3] md: centralize ->array_sectors modifications Dan Williams
2009-03-06  0:24 ` [PATCH 3/3] md: 'array_size' sysfs attribute Dan Williams
2009-03-06 16:15   ` Andre Noll
2009-03-06 18:20     ` Dan Williams
2009-03-07  6:28     ` Dan Williams
2009-03-09 10:12       ` Andre Noll

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.