All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch 0/2] [V4] block: Support online resize of disk partitions
@ 2012-07-09 21:34 vgoyal
  2012-07-09 21:34 ` [patch 1/2] block: add partition resize function to blkpg ioctl vgoyal
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: vgoyal @ 2012-07-09 21:34 UTC (permalink / raw)
  To: linux-kernel, axboe, dm-devel, kzak; +Cc: psusi, vgoyal, maxim.patlasov

Hi,

Few people have pinged me in rencent past about status of this patch, hence,
this is V4 of patch which adds support for online resizing of a partition.
This patch is based on previously posted patches by Phillip Susi. 

There are two patches. Out of which one is kernel patch and other one is
util-linux patch to add support of a user space utility "resizepart" to
allow resizing the partition.

This ioctl only resizes the partition size in kenrel and does not change
the size on disk. A user needs to make sure that corresponding changes
are made to disk data structures also using fdisk(or partx), if changes
are to be retained across reboot.

Changes since V3
----------------
- Do bdput() in error path as per the Maxim's review comments.

Changes since V2
----------------
- Do not ignore the "start" parameter in RESIZE ioctl.
- Change resizepart utility to parse sysfs to get to partition start.

Changes since V1
----------------
Following are changes since the version Phillip posted.
- RESIZE ioctl ignores the partition "start" and does not expect user to
  specify one. Caller needs to just specify "device", "partition number" and
  "size" of new partition.

- Got rid of part_nr_sects_write_begin/part_nr_sects_write_end functions
  and replaced these with single part_nr_sects_write().

- Some sequence counter related changes are simply lifted from i_size_write().

- Initialized part->nr_sects_seq using seqcount_init().

Phillip, do let me know if I should put your signed-off-by also in the
patch.

Any review feedback is welcome.

I did following test.

- Create a partition of 10MB on a disk using fdisk.
- Add this partition to a volume group
- Use fdisk to increase the partition size to 20MB. (First delete the
  partition and then create a new one of 20MB size).
- Use resizepart to extend partition size in kernel.
        resizepart /dev/sdc 1 40960
- Do pvresize on partition so that physical volume can be incrased in
  size online.
        pvresize /dev/sda1

pvresize does recognize the new size. Also lsblk and /proc/partitions
report the new size of partition.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [patch 1/2] block: add partition resize function to blkpg ioctl
  2012-07-09 21:34 [patch 0/2] [V4] block: Support online resize of disk partitions vgoyal
@ 2012-07-09 21:34 ` vgoyal
  2012-07-09 21:34 ` [patch 2/2] util-linux: resizepart: Utility to resize a partition vgoyal
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: vgoyal @ 2012-07-09 21:34 UTC (permalink / raw)
  To: linux-kernel, axboe, dm-devel, kzak; +Cc: psusi, vgoyal, maxim.patlasov

[-- Attachment #1: 0001-block-add-partition-resize-function-to-blkpg-ioctl.patch --]
[-- Type: text/plain, Size: 9882 bytes --]


Add a new operation code (BLKPG_RESIZE_PARTITION) to the BLKPG ioctl that
allows altering the size of an existing partition, even if it is currently
in use.

This patch converts hd_struct->nr_sects into sequence counter because
One might extend a partition while IO is happening to it and update of
nr_sects can be non-atomic on 32bit machines with 64bit sector_t. This
can lead to issues like reading inconsistent size of a partition. Sequence
counter have been used so that readers don't have to take bdev mutex lock
as we call sector_in_part() very frequently.

Now all the access to hd_struct->nr_sects should happen using sequence
counter read/update helper functions part_nr_sects_read/part_nr_sects_write.
There is one exception though, set_capacity()/get_capacity(). I think
theoritically race should exist there too but this patch does not
modify set_capacity()/get_capacity() due to sheer number of call sites
and I am afraid that change might break something. I have left that as a
TODO item. We can handle it later if need be. This patch does not introduce
any new races as such w.r.t set_capacity()/get_capacity().

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 block/genhd.c             |   20 +++++++++++----
 block/ioctl.c             |   59 ++++++++++++++++++++++++++++++++++++++++++--
 block/partition-generic.c |    4 ++-
 include/linux/blkpg.h     |    1 +
 include/linux/genhd.h     |   57 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 132 insertions(+), 9 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 9cf5583..cac7366 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -154,7 +154,7 @@ struct hd_struct *disk_part_iter_next(struct disk_part_iter *piter)
 		part = rcu_dereference(ptbl->part[piter->idx]);
 		if (!part)
 			continue;
-		if (!part->nr_sects &&
+		if (!part_nr_sects_read(part) &&
 		    !(piter->flags & DISK_PITER_INCL_EMPTY) &&
 		    !(piter->flags & DISK_PITER_INCL_EMPTY_PART0 &&
 		      piter->idx == 0))
@@ -191,7 +191,7 @@ EXPORT_SYMBOL_GPL(disk_part_iter_exit);
 static inline int sector_in_part(struct hd_struct *part, sector_t sector)
 {
 	return part->start_sect <= sector &&
-		sector < part->start_sect + part->nr_sects;
+		sector < part->start_sect + part_nr_sects_read(part);
 }
 
 /**
@@ -769,8 +769,8 @@ void __init printk_all_partitions(void)
 
 			printk("%s%s %10llu %s %s", is_part0 ? "" : "  ",
 			       bdevt_str(part_devt(part), devt_buf),
-			       (unsigned long long)part->nr_sects >> 1,
-			       disk_name(disk, part->partno, name_buf),
+			       (unsigned long long)part_nr_sects_read(part) >> 1
+			       , disk_name(disk, part->partno, name_buf),
 			       uuid_buf);
 			if (is_part0) {
 				if (disk->driverfs_dev != NULL &&
@@ -862,7 +862,7 @@ static int show_partition(struct seq_file *seqf, void *v)
 	while ((part = disk_part_iter_next(&piter)))
 		seq_printf(seqf, "%4d  %7d %10llu %s\n",
 			   MAJOR(part_devt(part)), MINOR(part_devt(part)),
-			   (unsigned long long)part->nr_sects >> 1,
+			   (unsigned long long)part_nr_sects_read(part) >> 1,
 			   disk_name(sgp, part->partno, buf));
 	disk_part_iter_exit(&piter);
 
@@ -1268,6 +1268,16 @@ struct gendisk *alloc_disk_node(int minors, int node_id)
 		}
 		disk->part_tbl->part[0] = &disk->part0;
 
+		/*
+		 * set_capacity() and get_capacity() currently don't use
+		 * seqcounter to read/update the part0->nr_sects. Still init
+		 * the counter as we can read the sectors in IO submission
+		 * patch using seqence counters.
+		 *
+		 * TODO: Ideally set_capacity() and get_capacity() should be
+		 * converted to make use of bd_mutex and sequence counters.
+		 */
+		seqcount_init(&disk->part0.nr_sects_seq);
 		hd_ref_init(&disk->part0);
 
 		disk->minors = minors;
diff --git a/block/ioctl.c b/block/ioctl.c
index ba15b2d..4476e0e8 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -13,7 +13,7 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 {
 	struct block_device *bdevp;
 	struct gendisk *disk;
-	struct hd_struct *part;
+	struct hd_struct *part, *lpart;
 	struct blkpg_ioctl_arg a;
 	struct blkpg_partition p;
 	struct disk_part_iter piter;
@@ -36,8 +36,8 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 		case BLKPG_ADD_PARTITION:
 			start = p.start >> 9;
 			length = p.length >> 9;
-			/* check for fit in a hd_struct */ 
-			if (sizeof(sector_t) == sizeof(long) && 
+			/* check for fit in a hd_struct */
+			if (sizeof(sector_t) == sizeof(long) &&
 			    sizeof(long long) > sizeof(long)) {
 				long pstart = start, plength = length;
 				if (pstart != start || plength != length
@@ -92,6 +92,59 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 			bdput(bdevp);
 
 			return 0;
+		case BLKPG_RESIZE_PARTITION:
+			start = p.start >> 9;
+			/* new length of partition in bytes */
+			length = p.length >> 9;
+			/* check for fit in a hd_struct */
+			if (sizeof(sector_t) == sizeof(long) &&
+			    sizeof(long long) > sizeof(long)) {
+				long pstart = start, plength = length;
+				if (pstart != start || plength != length
+				    || pstart < 0 || plength < 0)
+					return -EINVAL;
+			}
+			part = disk_get_part(disk, partno);
+			if (!part)
+				return -ENXIO;
+			bdevp = bdget(part_devt(part));
+			if (!bdevp) {
+				disk_put_part(part);
+				return -ENOMEM;
+			}
+			mutex_lock(&bdevp->bd_mutex);
+			mutex_lock_nested(&bdev->bd_mutex, 1);
+			if (start != part->start_sect) {
+				mutex_unlock(&bdevp->bd_mutex);
+				mutex_unlock(&bdev->bd_mutex);
+				bdput(bdevp);
+				disk_put_part(part);
+				return -EINVAL;
+			}
+			/* overlap? */
+			disk_part_iter_init(&piter, disk,
+					    DISK_PITER_INCL_EMPTY);
+			while ((lpart = disk_part_iter_next(&piter))) {
+				if (lpart->partno != partno &&
+				   !(start + length <= lpart->start_sect ||
+				   start >= lpart->start_sect + lpart->nr_sects)
+				   ) {
+					disk_part_iter_exit(&piter);
+					mutex_unlock(&bdevp->bd_mutex);
+					mutex_unlock(&bdev->bd_mutex);
+					bdput(bdevp);
+					disk_put_part(part);
+					return -EBUSY;
+				}
+			}
+			disk_part_iter_exit(&piter);
+			part_nr_sects_write(part, (sector_t)length);
+			i_size_write(bdevp->bd_inode, p.length);
+			mutex_unlock(&bdevp->bd_mutex);
+			mutex_unlock(&bdev->bd_mutex);
+			bdput(bdevp);
+			disk_put_part(part);
+			return 0;
 		default:
 			return -EINVAL;
 	}
diff --git a/block/partition-generic.c b/block/partition-generic.c
index 6df5d69..f1d1451 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -84,7 +84,7 @@ ssize_t part_size_show(struct device *dev,
 		       struct device_attribute *attr, char *buf)
 {
 	struct hd_struct *p = dev_to_part(dev);
-	return sprintf(buf, "%llu\n",(unsigned long long)p->nr_sects);
+	return sprintf(buf, "%llu\n",(unsigned long long)part_nr_sects_read(p));
 }
 
 static ssize_t part_ro_show(struct device *dev,
@@ -294,6 +294,8 @@ struct hd_struct *add_partition(struct gendisk *disk, int partno,
 		err = -ENOMEM;
 		goto out_free;
 	}
+
+	seqcount_init(&p->nr_sects_seq);
 	pdev = part_to_dev(p);
 
 	p->start_sect = start;
diff --git a/include/linux/blkpg.h b/include/linux/blkpg.h
index faf8a45..a851944 100644
--- a/include/linux/blkpg.h
+++ b/include/linux/blkpg.h
@@ -40,6 +40,7 @@ struct blkpg_ioctl_arg {
 /* The subfunctions (for the op field) */
 #define BLKPG_ADD_PARTITION	1
 #define BLKPG_DEL_PARTITION	2
+#define BLKPG_RESIZE_PARTITION	3
 
 /* Sizes of name fields. Unused at present. */
 #define BLKPG_DEVNAMELTH	64
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 017a7fb..ee8e688 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -98,7 +98,13 @@ struct partition_meta_info {
 
 struct hd_struct {
 	sector_t start_sect;
+	/*
+	 * nr_sects is protected by sequence counter. One might extend a
+	 * partition while IO is happening to it and update of nr_sects
+	 * can be non-atomic on 32bit machines with 64bit sector_t.
+	 */
 	sector_t nr_sects;
+	seqcount_t nr_sects_seq;
 	sector_t alignment_offset;
 	unsigned int discard_alignment;
 	struct device __dev;
@@ -648,6 +654,57 @@ static inline void hd_struct_put(struct hd_struct *part)
 		__delete_partition(part);
 }
 
+/*
+ * Any access of part->nr_sects which is not protected by partition
+ * bd_mutex or gendisk bdev bd_mutex, should be done using this
+ * accessor function.
+ *
+ * Code written along the lines of i_size_read() and i_size_write().
+ * CONFIG_PREEMPT case optimizes the case of UP kernel with preemption
+ * on.
+ */
+static inline sector_t part_nr_sects_read(struct hd_struct *part)
+{
+#if BITS_PER_LONG==32 && defined(CONFIG_LBDAF) && defined(CONFIG_SMP)
+	sector_t nr_sects;
+	unsigned seq;
+	do {
+		seq = read_seqcount_begin(&part->nr_sects_seq);
+		nr_sects = part->nr_sects;
+	} while (read_seqcount_retry(&part->nr_sects_seq, seq));
+	return nr_sects;
+#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT)
+	sector_t nr_sects;
+
+	preempt_disable();
+	nr_sects = part->nr_sects;
+	preempt_enable();
+	return nr_sects;
+#else
+	return part->nr_sects;
+#endif
+}
+
+/*
+ * Should be called with mutex lock held (typically bd_mutex) of partition
+ * to provide mutual exlusion among writers otherwise seqcount might be
+ * left in wrong state leaving the readers spinning infinitely.
+ */
+static inline void part_nr_sects_write(struct hd_struct *part, sector_t size)
+{
+#if BITS_PER_LONG==32 && defined(CONFIG_LBDAF) && defined(CONFIG_SMP)
+	write_seqcount_begin(&part->nr_sects_seq);
+	part->nr_sects = size;
+	write_seqcount_end(&part->nr_sects_seq);
+#elif BITS_PER_LONG==32 && defined(CONFIG_LBDAF) && defined(CONFIG_PREEMPT)
+	preempt_disable();
+	part->nr_sects = size;
+	preempt_enable();
+#else
+	part->nr_sects = size;
+#endif
+}
+
 #else /* CONFIG_BLOCK */
 
 static inline void printk_all_partitions(void) { }
-- 
1.7.7.6



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [patch 2/2] util-linux: resizepart: Utility to resize a partition
  2012-07-09 21:34 [patch 0/2] [V4] block: Support online resize of disk partitions vgoyal
  2012-07-09 21:34 ` [patch 1/2] block: add partition resize function to blkpg ioctl vgoyal
@ 2012-07-09 21:34 ` vgoyal
  2012-08-13 20:20     ` Karel Zak
  2012-07-09 22:40 ` [patch 0/2] [V4] block: Support online resize of disk partitions Phillip Susi
  2012-08-01 14:07 ` Vivek Goyal
  3 siblings, 1 reply; 13+ messages in thread
From: vgoyal @ 2012-07-09 21:34 UTC (permalink / raw)
  To: linux-kernel, axboe, dm-devel, kzak; +Cc: psusi, vgoyal, maxim.patlasov

[-- Attachment #1: 0001-util-linux-resizepart-Utility-to-resize-a-partition.patch --]
[-- Type: text/plain, Size: 5326 bytes --]


A simple user space utility to resize an existing partition. It tries to read
the start of partiton from sysfs.

This is a real quick dirty patch I used for my testing. I am sure there
are better and faster ways of getting to partition "start" from device
and partition number.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 disk-utils/Makemodule.am |    7 +++-
 disk-utils/partx.h       |   19 +++++++++
 disk-utils/resizepart.8  |   38 ++++++++++++++++++
 disk-utils/resizepart.c  |   98 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 161 insertions(+), 1 deletions(-)
 create mode 100644 disk-utils/resizepart.8
 create mode 100644 disk-utils/resizepart.c

diff --git a/disk-utils/Makemodule.am b/disk-utils/Makemodule.am
index 830e8f7..b544e87 100644
--- a/disk-utils/Makemodule.am
+++ b/disk-utils/Makemodule.am
@@ -113,7 +113,7 @@ endif # LINUX
 
 
 if BUILD_PARTX
-usrsbin_exec_PROGRAMS += partx addpart delpart
+usrsbin_exec_PROGRAMS += partx addpart delpart resizepart
 dist_man_MANS += \
 	disk-utils/addpart.8 \
 	disk-utils/delpart.8 \
@@ -130,6 +130,11 @@ delpart_SOURCES = \
 	disk-utils/partx.h
 delpart_LDADD = $(LDADD) libcommon.la
 
+resizepart_SOURCES = \
+	disk-utils/resizepart.c \
+	disk-utils/partx.h
+resizepart_LDADD = $(LDADD) libcommon.la
+
 partx_SOURCES = \
 	disk-utils/partx.c \
 	disk-utils/partx.h
diff --git a/disk-utils/partx.h b/disk-utils/partx.h
index ed0fd0a..02e273e 100644
--- a/disk-utils/partx.h
+++ b/disk-utils/partx.h
@@ -41,4 +41,23 @@ static inline int partx_add_partition(int fd, int partno,
 	return ioctl(fd, BLKPG, &a);
 }
 
+static inline int partx_resize_partition(int fd, int partno,
+			long long start, long long size)
+{
+	struct blkpg_ioctl_arg a;
+	struct blkpg_partition p;
+
+	p.pno = partno;
+	p.start = start << 9;
+	p.length = size << 9;
+	p.devname[0] = 0;
+	p.volname[0] = 0;
+	a.op = BLKPG_RESIZE_PARTITION;
+	a.flags = 0;
+	a.datalen = sizeof(p);
+	a.data = &p;
+
+	return ioctl(fd, BLKPG, &a);
+}
+
 #endif /*  UTIL_LINUX_PARTX_H */
diff --git a/disk-utils/resizepart.8 b/disk-utils/resizepart.8
new file mode 100644
index 0000000..c009cc3
--- /dev/null
+++ b/disk-utils/resizepart.8
@@ -0,0 +1,38 @@
+.\" resizepart.8 --
+.\" Copyright 2012 Vivek Goyal <vgoyal@redhat.com>
+.\" Copyright 2012 Red Hat, Inc.
+.\" May be distributed under the GNU General Public License
+.TH RESIZEPART 8 "February 2012" "util-linux" "System Administration"
+.SH NAME
+resizepart \-
+simple wrapper around the "resize partition" ioctl
+.SH SYNOPSIS
+.B resizepart
+.I device partition length
+.SH DESCRIPTION
+.B resizepart
+is a program that informs the Linux kernel of new partition size.
+
+This command doesn't manipulate partitions on hard drive.
+
+.SH PARAMETERS
+.TP
+.I device
+Specify the disk device.
+.TP
+.I partition
+Specify the partition number.
+.TP
+.I length
+Specify the length of the partition (in 512-byte sectors).
+
+.SH SEE ALSO
+.BR addpart (8),
+.BR delpart (8),
+.BR fdisk (8),
+.BR parted (8),
+.BR partprobe (8),
+.BR partx (8)
+.SH AVAILABILITY
+The resizepart command is part of the util-linux package and is available from
+ftp://ftp.kernel.org/pub/linux/utils/util-linux/.
diff --git a/disk-utils/resizepart.c b/disk-utils/resizepart.c
new file mode 100644
index 0000000..4f9e9ce
--- /dev/null
+++ b/disk-utils/resizepart.c
@@ -0,0 +1,98 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <ctype.h>
+#include "canonicalize.h"
+#include "sysfs.h"
+#include "partx.h"
+
+char *
+get_devname_from_canonical_path(char *path)
+{
+	struct sysfs_cxt cxt;
+	dev_t devno;
+	char name[PATH_MAX];
+	char *devname;
+
+	devno = sysfs_devname_to_devno(path, NULL);
+	if (!devno) {
+		fprintf(stderr, "failed to read devno. \n");
+		exit(1);
+	}
+
+	if (sysfs_init(&cxt, devno, NULL)) {
+		fprintf(stderr, "failed to initialize sysfs. \n");
+		exit(1);
+	}
+	devname = sysfs_get_devname(&cxt, name, sizeof(name));
+	return strdup(devname);
+}
+
+char *
+get_partname_from_devname(char *devname, int partno)
+{
+	char partname[PATH_MAX];
+
+	if (isdigit(devname[strlen(devname) - 1]))
+		snprintf(partname, PATH_MAX, "%sp%d", devname, partno);
+	else
+		snprintf(partname, PATH_MAX, "%s%d", devname, partno);
+
+	return strdup(partname);
+}
+
+
+int
+main(int argc, char **argv)
+{
+	int fd;
+	char *real_path, *devname, *partname, *pstart;
+	char part_sysfs_path[PATH_MAX], part_start[30];
+	FILE *fp;
+
+	if (argc != 4) {
+		fprintf(stderr,
+			"usage: %s diskdevice partitionnr length\n",
+			argv[0]);
+		exit(1);
+	}
+	if ((fd = open(argv[1], O_RDONLY)) < 0) {
+		perror(argv[1]);
+		exit(1);
+	}
+
+	real_path = canonicalize_path(argv[1]);
+
+	if (real_path == NULL) {
+		fprintf(stderr, "canonicalize_path(%s) failed. \n", argv[1]);
+		exit(1);
+	}
+
+	devname = get_devname_from_canonical_path(real_path);
+	partname = get_partname_from_devname(devname, atoi(argv[2]));
+
+	snprintf(part_sysfs_path, PATH_MAX, "/sys/block/%s/%s/start",
+			devname, partname);
+
+	fp = fopen(part_sysfs_path, "r");
+
+	if (!fp) {
+		perror("BLKPG");
+		exit(1);
+	}
+
+	pstart = fgets(part_start, 30, fp);
+
+	if (!pstart) {
+		perror("BLKPG");
+		exit(1);
+	}
+
+	if (partx_resize_partition(fd, atoi(argv[2]), atoll(pstart),
+				atoll(argv[3]))) {
+		perror("BLKPG");
+		exit(1);
+	}
+
+	return 0;
+}
-- 
1.7.7.6



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] [V4] block: Support online resize of disk partitions
  2012-07-09 21:34 [patch 0/2] [V4] block: Support online resize of disk partitions vgoyal
  2012-07-09 21:34 ` [patch 1/2] block: add partition resize function to blkpg ioctl vgoyal
  2012-07-09 21:34 ` [patch 2/2] util-linux: resizepart: Utility to resize a partition vgoyal
@ 2012-07-09 22:40 ` Phillip Susi
  2012-07-10 13:57   ` Vivek Goyal
  2012-08-01 14:07 ` Vivek Goyal
  3 siblings, 1 reply; 13+ messages in thread
From: Phillip Susi @ 2012-07-09 22:40 UTC (permalink / raw)
  To: vgoyal; +Cc: linux-kernel, axboe, dm-devel, kzak, maxim.patlasov

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/09/2012 05:34 PM, vgoyal@redhat.com wrote:
> Phillip, do let me know if I should put your signed-off-by also in the
> patch.

Sure, kernel side looks good.  My original util-linux patches also added a -u update mode to kpartx, which I think is the more useful interface than the lower level resizepart command, but I suppose I can rebase it to apply on top of this patch.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJP+13DAAoJEJrBOlT6nu75cQIIAJvm6ZFxpNNvgkXq0I6blvIj
Q3s5YbzJYecouHPZdy06UXIdfucHKO7WAaMvpmPnDk+JgtltNljVpA50d21NN2lY
k2j2oU9mFHGEKLDnnYobnr6cO2UShaZkrcMtC29S4LaAdgAgPNyD8aTTWS9w0frv
+p2ko+HKvp3neRpOBwfnYXq/rTBLUmOn0k7XsG8QjnNb3aMnMyYp/crV9Kzeb4YX
uCbnIkzN++oDhmnqsLDGt82/VGXZdnA1umISbV9vZw+Q7FfeaiJVMneFxKe5w/FZ
wiCixoCtGhbqtz1hSsUR+rs2ZaDozL0iygSB15Z71aLPim0TLumtRTpfpu+JR3w=
=Vtl+
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] [V4] block: Support online resize of disk partitions
  2012-07-09 22:40 ` [patch 0/2] [V4] block: Support online resize of disk partitions Phillip Susi
@ 2012-07-10 13:57   ` Vivek Goyal
  2012-07-10 14:13     ` Phillip Susi
  0 siblings, 1 reply; 13+ messages in thread
From: Vivek Goyal @ 2012-07-10 13:57 UTC (permalink / raw)
  To: Phillip Susi; +Cc: linux-kernel, axboe, dm-devel, kzak, maxim.patlasov

On Mon, Jul 09, 2012 at 06:40:03PM -0400, Phillip Susi wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 07/09/2012 05:34 PM, vgoyal@redhat.com wrote:
> > Phillip, do let me know if I should put your signed-off-by also in the
> > patch.
> 
> Sure, kernel side looks good.  My original util-linux patches also added a -u update mode to kpartx, which I think is the more useful interface than the lower level resizepart command, but I suppose I can rebase it to apply on top of this patch.

I wrote user space part only to do testing. We definitely need better user
space support. Especially a utility which changes partition details both
in kernel as well as on disk. Asking user to call two different utilities
is error prone and irritating.

Resending this patch again with your Signed-off-by.

Add a new operation code (BLKPG_RESIZE_PARTITION) to the BLKPG ioctl that
allows altering the size of an existing partition, even if it is currently
in use.

This patch converts hd_struct->nr_sects into sequence counter because
One might extend a partition while IO is happening to it and update of
nr_sects can be non-atomic on 32bit machines with 64bit sector_t. This
can lead to issues like reading inconsistent size of a partition. Sequence
counter have been used so that readers don't have to take bdev mutex lock
as we call sector_in_part() very frequently.

Now all the access to hd_struct->nr_sects should happen using sequence
counter read/update helper functions part_nr_sects_read/part_nr_sects_write.
There is one exception though, set_capacity()/get_capacity(). I think
theoritically race should exist there too but this patch does not
modify set_capacity()/get_capacity() due to sheer number of call sites
and I am afraid that change might break something. I have left that as a
TODO item. We can handle it later if need be. This patch does not introduce
any new races as such w.r.t set_capacity()/get_capacity().

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Phillip Susi <psusi@ubuntu.com>
---
 block/genhd.c             |   20 +++++++++++----
 block/ioctl.c             |   59 ++++++++++++++++++++++++++++++++++++++++++--
 block/partition-generic.c |    4 ++-
 include/linux/blkpg.h     |    1 +
 include/linux/genhd.h     |   57 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 132 insertions(+), 9 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 9cf5583..cac7366 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -154,7 +154,7 @@ struct hd_struct *disk_part_iter_next(struct disk_part_iter *piter)
 		part = rcu_dereference(ptbl->part[piter->idx]);
 		if (!part)
 			continue;
-		if (!part->nr_sects &&
+		if (!part_nr_sects_read(part) &&
 		    !(piter->flags & DISK_PITER_INCL_EMPTY) &&
 		    !(piter->flags & DISK_PITER_INCL_EMPTY_PART0 &&
 		      piter->idx == 0))
@@ -191,7 +191,7 @@ EXPORT_SYMBOL_GPL(disk_part_iter_exit);
 static inline int sector_in_part(struct hd_struct *part, sector_t sector)
 {
 	return part->start_sect <= sector &&
-		sector < part->start_sect + part->nr_sects;
+		sector < part->start_sect + part_nr_sects_read(part);
 }
 
 /**
@@ -769,8 +769,8 @@ void __init printk_all_partitions(void)
 
 			printk("%s%s %10llu %s %s", is_part0 ? "" : "  ",
 			       bdevt_str(part_devt(part), devt_buf),
-			       (unsigned long long)part->nr_sects >> 1,
-			       disk_name(disk, part->partno, name_buf),
+			       (unsigned long long)part_nr_sects_read(part) >> 1
+			       , disk_name(disk, part->partno, name_buf),
 			       uuid_buf);
 			if (is_part0) {
 				if (disk->driverfs_dev != NULL &&
@@ -862,7 +862,7 @@ static int show_partition(struct seq_file *seqf, void *v)
 	while ((part = disk_part_iter_next(&piter)))
 		seq_printf(seqf, "%4d  %7d %10llu %s\n",
 			   MAJOR(part_devt(part)), MINOR(part_devt(part)),
-			   (unsigned long long)part->nr_sects >> 1,
+			   (unsigned long long)part_nr_sects_read(part) >> 1,
 			   disk_name(sgp, part->partno, buf));
 	disk_part_iter_exit(&piter);
 
@@ -1268,6 +1268,16 @@ struct gendisk *alloc_disk_node(int minors, int node_id)
 		}
 		disk->part_tbl->part[0] = &disk->part0;
 
+		/*
+		 * set_capacity() and get_capacity() currently don't use
+		 * seqcounter to read/update the part0->nr_sects. Still init
+		 * the counter as we can read the sectors in IO submission
+		 * patch using seqence counters.
+		 *
+		 * TODO: Ideally set_capacity() and get_capacity() should be
+		 * converted to make use of bd_mutex and sequence counters.
+		 */
+		seqcount_init(&disk->part0.nr_sects_seq);
 		hd_ref_init(&disk->part0);
 
 		disk->minors = minors;
diff --git a/block/ioctl.c b/block/ioctl.c
index ba15b2d..4476e0e8 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -13,7 +13,7 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 {
 	struct block_device *bdevp;
 	struct gendisk *disk;
-	struct hd_struct *part;
+	struct hd_struct *part, *lpart;
 	struct blkpg_ioctl_arg a;
 	struct blkpg_partition p;
 	struct disk_part_iter piter;
@@ -36,8 +36,8 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 		case BLKPG_ADD_PARTITION:
 			start = p.start >> 9;
 			length = p.length >> 9;
-			/* check for fit in a hd_struct */ 
-			if (sizeof(sector_t) == sizeof(long) && 
+			/* check for fit in a hd_struct */
+			if (sizeof(sector_t) == sizeof(long) &&
 			    sizeof(long long) > sizeof(long)) {
 				long pstart = start, plength = length;
 				if (pstart != start || plength != length
@@ -92,6 +92,59 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user
 			bdput(bdevp);
 
 			return 0;
+		case BLKPG_RESIZE_PARTITION:
+			start = p.start >> 9;
+			/* new length of partition in bytes */
+			length = p.length >> 9;
+			/* check for fit in a hd_struct */
+			if (sizeof(sector_t) == sizeof(long) &&
+			    sizeof(long long) > sizeof(long)) {
+				long pstart = start, plength = length;
+				if (pstart != start || plength != length
+				    || pstart < 0 || plength < 0)
+					return -EINVAL;
+			}
+			part = disk_get_part(disk, partno);
+			if (!part)
+				return -ENXIO;
+			bdevp = bdget(part_devt(part));
+			if (!bdevp) {
+				disk_put_part(part);
+				return -ENOMEM;
+			}
+			mutex_lock(&bdevp->bd_mutex);
+			mutex_lock_nested(&bdev->bd_mutex, 1);
+			if (start != part->start_sect) {
+				mutex_unlock(&bdevp->bd_mutex);
+				mutex_unlock(&bdev->bd_mutex);
+				bdput(bdevp);
+				disk_put_part(part);
+				return -EINVAL;
+			}
+			/* overlap? */
+			disk_part_iter_init(&piter, disk,
+					    DISK_PITER_INCL_EMPTY);
+			while ((lpart = disk_part_iter_next(&piter))) {
+				if (lpart->partno != partno &&
+				   !(start + length <= lpart->start_sect ||
+				   start >= lpart->start_sect + lpart->nr_sects)
+				   ) {
+					disk_part_iter_exit(&piter);
+					mutex_unlock(&bdevp->bd_mutex);
+					mutex_unlock(&bdev->bd_mutex);
+					bdput(bdevp);
+					disk_put_part(part);
+					return -EBUSY;
+				}
+			}
+			disk_part_iter_exit(&piter);
+			part_nr_sects_write(part, (sector_t)length);
+			i_size_write(bdevp->bd_inode, p.length);
+			mutex_unlock(&bdevp->bd_mutex);
+			mutex_unlock(&bdev->bd_mutex);
+			bdput(bdevp);
+			disk_put_part(part);
+			return 0;
 		default:
 			return -EINVAL;
 	}
diff --git a/block/partition-generic.c b/block/partition-generic.c
index 6df5d69..f1d1451 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -84,7 +84,7 @@ ssize_t part_size_show(struct device *dev,
 		       struct device_attribute *attr, char *buf)
 {
 	struct hd_struct *p = dev_to_part(dev);
-	return sprintf(buf, "%llu\n",(unsigned long long)p->nr_sects);
+	return sprintf(buf, "%llu\n",(unsigned long long)part_nr_sects_read(p));
 }
 
 static ssize_t part_ro_show(struct device *dev,
@@ -294,6 +294,8 @@ struct hd_struct *add_partition(struct gendisk *disk, int partno,
 		err = -ENOMEM;
 		goto out_free;
 	}
+
+	seqcount_init(&p->nr_sects_seq);
 	pdev = part_to_dev(p);
 
 	p->start_sect = start;
diff --git a/include/linux/blkpg.h b/include/linux/blkpg.h
index faf8a45..a851944 100644
--- a/include/linux/blkpg.h
+++ b/include/linux/blkpg.h
@@ -40,6 +40,7 @@ struct blkpg_ioctl_arg {
 /* The subfunctions (for the op field) */
 #define BLKPG_ADD_PARTITION	1
 #define BLKPG_DEL_PARTITION	2
+#define BLKPG_RESIZE_PARTITION	3
 
 /* Sizes of name fields. Unused at present. */
 #define BLKPG_DEVNAMELTH	64
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 017a7fb..ee8e688 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -98,7 +98,13 @@ struct partition_meta_info {
 
 struct hd_struct {
 	sector_t start_sect;
+	/*
+	 * nr_sects is protected by sequence counter. One might extend a
+	 * partition while IO is happening to it and update of nr_sects
+	 * can be non-atomic on 32bit machines with 64bit sector_t.
+	 */
 	sector_t nr_sects;
+	seqcount_t nr_sects_seq;
 	sector_t alignment_offset;
 	unsigned int discard_alignment;
 	struct device __dev;
@@ -648,6 +654,57 @@ static inline void hd_struct_put(struct hd_struct *part)
 		__delete_partition(part);
 }
 
+/*
+ * Any access of part->nr_sects which is not protected by partition
+ * bd_mutex or gendisk bdev bd_mutex, should be done using this
+ * accessor function.
+ *
+ * Code written along the lines of i_size_read() and i_size_write().
+ * CONFIG_PREEMPT case optimizes the case of UP kernel with preemption
+ * on.
+ */
+static inline sector_t part_nr_sects_read(struct hd_struct *part)
+{
+#if BITS_PER_LONG==32 && defined(CONFIG_LBDAF) && defined(CONFIG_SMP)
+	sector_t nr_sects;
+	unsigned seq;
+	do {
+		seq = read_seqcount_begin(&part->nr_sects_seq);
+		nr_sects = part->nr_sects;
+	} while (read_seqcount_retry(&part->nr_sects_seq, seq));
+	return nr_sects;
+#elif BITS_PER_LONG==32 && defined(CONFIG_PREEMPT)
+	sector_t nr_sects;
+
+	preempt_disable();
+	nr_sects = part->nr_sects;
+	preempt_enable();
+	return nr_sects;
+#else
+	return part->nr_sects;
+#endif
+}
+
+/*
+ * Should be called with mutex lock held (typically bd_mutex) of partition
+ * to provide mutual exlusion among writers otherwise seqcount might be
+ * left in wrong state leaving the readers spinning infinitely.
+ */
+static inline void part_nr_sects_write(struct hd_struct *part, sector_t size)
+{
+#if BITS_PER_LONG==32 && defined(CONFIG_LBDAF) && defined(CONFIG_SMP)
+	write_seqcount_begin(&part->nr_sects_seq);
+	part->nr_sects = size;
+	write_seqcount_end(&part->nr_sects_seq);
+#elif BITS_PER_LONG==32 && defined(CONFIG_LBDAF) && defined(CONFIG_PREEMPT)
+	preempt_disable();
+	part->nr_sects = size;
+	preempt_enable();
+#else
+	part->nr_sects = size;
+#endif
+}
+
 #else /* CONFIG_BLOCK */
 
 static inline void printk_all_partitions(void) { }
-- 
1.7.7.6


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] [V4] block: Support online resize of disk partitions
  2012-07-10 13:57   ` Vivek Goyal
@ 2012-07-10 14:13     ` Phillip Susi
  2012-07-10 14:20       ` Vivek Goyal
  0 siblings, 1 reply; 13+ messages in thread
From: Phillip Susi @ 2012-07-10 14:13 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-kernel, axboe, dm-devel, kzak, maxim.patlasov

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 7/10/2012 9:57 AM, Vivek Goyal wrote:
> +static inline sector_t part_nr_sects_read(struct hd_struct *part) 
> +{ +#if BITS_PER_LONG==32 && defined(CONFIG_LBDAF) &&
> defined(CONFIG_SMP) +	sector_t nr_sects; +	unsigned seq; +	do { +
> seq = read_seqcount_begin(&part->nr_sects_seq); +		nr_sects =
> part->nr_sects; +	} while (read_seqcount_retry(&part->nr_sects_seq,
> seq)); +	return nr_sects; +#elif BITS_PER_LONG==32 &&
> defined(CONFIG_PREEMPT)

Shouldn't this be BITS_PER_LONG==32 && defined(CONFIG_LBDAF) &&
defined(CONFIG_PREEMPT)?  No sense disabling preemption when the
sector size is also 32 bits.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.17 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJP/Dh/AAoJEJrBOlT6nu75ussIANfCKaObXMJ6JhCNLSeiHUqr
FfBd6p2lJIFd73NSymyigZZ1/rPsy9TpMBlcMwpuFh2erxQ7q3rFnP0O52fEgm2+
20c9+oeUGkzYx62fh0KtfrzBpzhiHOivKR/muAZcfNbb75iKTGrZSZUFrdqAqHci
4zCuu8T37BRo4G9TGdIXD1WT3sltZ7yOk4I7RBhAIDkbt82vVakZ6mlW9hyWmyvD
/sMnXMmkNjwTDdhQj2ho9mn9SFcnDA+aAtJfPXjVAT0W9CKNDYbXw28ud+ARxo5T
rkTjoewdxKffZsBmkkSiNCuLWwpkZng+nTchQjyZbmp/pl4UPlQOfcUb6YxpDX4=
=1tUj
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] [V4] block: Support online resize of disk partitions
  2012-07-10 14:13     ` Phillip Susi
@ 2012-07-10 14:20       ` Vivek Goyal
  2012-07-10 14:59         ` Vivek Goyal
  0 siblings, 1 reply; 13+ messages in thread
From: Vivek Goyal @ 2012-07-10 14:20 UTC (permalink / raw)
  To: Phillip Susi; +Cc: linux-kernel, axboe, dm-devel, kzak, maxim.patlasov

On Tue, Jul 10, 2012 at 10:13:19AM -0400, Phillip Susi wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 7/10/2012 9:57 AM, Vivek Goyal wrote:
> > +static inline sector_t part_nr_sects_read(struct hd_struct *part) 
> > +{ +#if BITS_PER_LONG==32 && defined(CONFIG_LBDAF) &&
> > defined(CONFIG_SMP) +	sector_t nr_sects; +	unsigned seq; +	do { +
> > seq = read_seqcount_begin(&part->nr_sects_seq); +		nr_sects =
> > part->nr_sects; +	} while (read_seqcount_retry(&part->nr_sects_seq,
> > seq)); +	return nr_sects; +#elif BITS_PER_LONG==32 &&
> > defined(CONFIG_PREEMPT)
> 
> Shouldn't this be BITS_PER_LONG==32 && defined(CONFIG_LBDAF) &&
> defined(CONFIG_PREEMPT)?  No sense disabling preemption when the
> sector size is also 32 bits.

Yes. Good catch. We don't want to disable/enable preemption for 32bit UP 
kernels with sector size 32bit. I will modify the patch and repost soon.

BTW, what happened to all the new lines in the code above. Looks like you
mailer chewed these up.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] [V4] block: Support online resize of disk partitions
  2012-07-10 14:20       ` Vivek Goyal
@ 2012-07-10 14:59         ` Vivek Goyal
  0 siblings, 0 replies; 13+ messages in thread
From: Vivek Goyal @ 2012-07-10 14:59 UTC (permalink / raw)
  To: Phillip Susi; +Cc: linux-kernel, axboe, dm-devel, kzak, maxim.patlasov

On Tue, Jul 10, 2012 at 10:20:01AM -0400, Vivek Goyal wrote:

[..]
> > 
> > Shouldn't this be BITS_PER_LONG==32 && defined(CONFIG_LBDAF) &&
> > defined(CONFIG_PREEMPT)?  No sense disabling preemption when the
> > sector size is also 32 bits.
> 
> Yes. Good catch. We don't want to disable/enable preemption for 32bit UP 
> kernels with sector size 32bit. I will modify the patch and repost soon.
> 
> BTW, what happened to all the new lines in the code above. Looks like you
> mailer chewed these up.

Here is the V2 of the patch with CONFIG_LBDAF test added to UP, PREEMPT
case.

Thanks
Vivek


block: add partition resize function to blkpg ioctl

Add a new operation code (BLKPG_RESIZE_PARTITION) to the BLKPG ioctl that
allows altering the size of an existing partition, even if it is currently
in use.

This patch converts hd_struct->nr_sects into sequence counter because
One might extend a partition while IO is happening to it and update of
nr_sects can be non-atomic on 32bit machines with 64bit sector_t. This
can lead to issues like reading inconsistent size of a partition. Sequence
counter have been used so that readers don't have to take bdev mutex lock
as we call sector_in_part() very frequently.

Now all the access to hd_struct->nr_sects should happen using sequence
counter read/update helper functions part_nr_sects_read/part_nr_sects_write.
There is one exception though, set_capacity()/get_capacity(). I think
theoritically race should exist there too but this patch does not
modify set_capacity()/get_capacity() due to sheer number of call sites
and I am afraid that change might break something. I have left that as a
TODO item. We can handle it later if need be. This patch does not introduce
any new races as such w.r.t set_capacity()/get_capacity().

v2: Add CONFIG_LBDAF test to UP preempt case as suggested by Phillip. 

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Phillip Susi <psusi@ubuntu.com>
---
 block/genhd.c             |   20 +++++++++++----
 block/ioctl.c             |   59 +++++++++++++++++++++++++++++++++++++++++++---
 block/partition-generic.c |    4 ++-
 include/linux/blkpg.h     |    1 
 include/linux/genhd.h     |   57 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 132 insertions(+), 9 deletions(-)

Index: linux-2.6/block/genhd.c
===================================================================
--- linux-2.6.orig/block/genhd.c	2012-07-10 21:03:52.458190498 -0400
+++ linux-2.6/block/genhd.c	2012-07-10 21:03:57.065190639 -0400
@@ -154,7 +154,7 @@ struct hd_struct *disk_part_iter_next(st
 		part = rcu_dereference(ptbl->part[piter->idx]);
 		if (!part)
 			continue;
-		if (!part->nr_sects &&
+		if (!part_nr_sects_read(part) &&
 		    !(piter->flags & DISK_PITER_INCL_EMPTY) &&
 		    !(piter->flags & DISK_PITER_INCL_EMPTY_PART0 &&
 		      piter->idx == 0))
@@ -191,7 +191,7 @@ EXPORT_SYMBOL_GPL(disk_part_iter_exit);
 static inline int sector_in_part(struct hd_struct *part, sector_t sector)
 {
 	return part->start_sect <= sector &&
-		sector < part->start_sect + part->nr_sects;
+		sector < part->start_sect + part_nr_sects_read(part);
 }
 
 /**
@@ -769,8 +769,8 @@ void __init printk_all_partitions(void)
 
 			printk("%s%s %10llu %s %s", is_part0 ? "" : "  ",
 			       bdevt_str(part_devt(part), devt_buf),
-			       (unsigned long long)part->nr_sects >> 1,
-			       disk_name(disk, part->partno, name_buf),
+			       (unsigned long long)part_nr_sects_read(part) >> 1
+			       , disk_name(disk, part->partno, name_buf),
 			       uuid_buf);
 			if (is_part0) {
 				if (disk->driverfs_dev != NULL &&
@@ -862,7 +862,7 @@ static int show_partition(struct seq_fil
 	while ((part = disk_part_iter_next(&piter)))
 		seq_printf(seqf, "%4d  %7d %10llu %s\n",
 			   MAJOR(part_devt(part)), MINOR(part_devt(part)),
-			   (unsigned long long)part->nr_sects >> 1,
+			   (unsigned long long)part_nr_sects_read(part) >> 1,
 			   disk_name(sgp, part->partno, buf));
 	disk_part_iter_exit(&piter);
 
@@ -1268,6 +1268,16 @@ struct gendisk *alloc_disk_node(int mino
 		}
 		disk->part_tbl->part[0] = &disk->part0;
 
+		/*
+		 * set_capacity() and get_capacity() currently don't use
+		 * seqcounter to read/update the part0->nr_sects. Still init
+		 * the counter as we can read the sectors in IO submission
+		 * patch using seqence counters.
+		 *
+		 * TODO: Ideally set_capacity() and get_capacity() should be
+		 * converted to make use of bd_mutex and sequence counters.
+		 */
+		seqcount_init(&disk->part0.nr_sects_seq);
 		hd_ref_init(&disk->part0);
 
 		disk->minors = minors;
Index: linux-2.6/block/ioctl.c
===================================================================
--- linux-2.6.orig/block/ioctl.c	2012-07-10 21:03:52.459190498 -0400
+++ linux-2.6/block/ioctl.c	2012-07-10 21:03:57.067190639 -0400
@@ -13,7 +13,7 @@ static int blkpg_ioctl(struct block_devi
 {
 	struct block_device *bdevp;
 	struct gendisk *disk;
-	struct hd_struct *part;
+	struct hd_struct *part, *lpart;
 	struct blkpg_ioctl_arg a;
 	struct blkpg_partition p;
 	struct disk_part_iter piter;
@@ -36,8 +36,8 @@ static int blkpg_ioctl(struct block_devi
 		case BLKPG_ADD_PARTITION:
 			start = p.start >> 9;
 			length = p.length >> 9;
-			/* check for fit in a hd_struct */ 
-			if (sizeof(sector_t) == sizeof(long) && 
+			/* check for fit in a hd_struct */
+			if (sizeof(sector_t) == sizeof(long) &&
 			    sizeof(long long) > sizeof(long)) {
 				long pstart = start, plength = length;
 				if (pstart != start || plength != length
@@ -92,6 +92,59 @@ static int blkpg_ioctl(struct block_devi
 			bdput(bdevp);
 
 			return 0;
+		case BLKPG_RESIZE_PARTITION:
+			start = p.start >> 9;
+			/* new length of partition in bytes */
+			length = p.length >> 9;
+			/* check for fit in a hd_struct */
+			if (sizeof(sector_t) == sizeof(long) &&
+			    sizeof(long long) > sizeof(long)) {
+				long pstart = start, plength = length;
+				if (pstart != start || plength != length
+				    || pstart < 0 || plength < 0)
+					return -EINVAL;
+			}
+			part = disk_get_part(disk, partno);
+			if (!part)
+				return -ENXIO;
+			bdevp = bdget(part_devt(part));
+			if (!bdevp) {
+				disk_put_part(part);
+				return -ENOMEM;
+			}
+			mutex_lock(&bdevp->bd_mutex);
+			mutex_lock_nested(&bdev->bd_mutex, 1);
+			if (start != part->start_sect) {
+				mutex_unlock(&bdevp->bd_mutex);
+				mutex_unlock(&bdev->bd_mutex);
+				bdput(bdevp);
+				disk_put_part(part);
+				return -EINVAL;
+			}
+			/* overlap? */
+			disk_part_iter_init(&piter, disk,
+					    DISK_PITER_INCL_EMPTY);
+			while ((lpart = disk_part_iter_next(&piter))) {
+				if (lpart->partno != partno &&
+				   !(start + length <= lpart->start_sect ||
+				   start >= lpart->start_sect + lpart->nr_sects)
+				   ) {
+					disk_part_iter_exit(&piter);
+					mutex_unlock(&bdevp->bd_mutex);
+					mutex_unlock(&bdev->bd_mutex);
+					bdput(bdevp);
+					disk_put_part(part);
+					return -EBUSY;
+				}
+			}
+			disk_part_iter_exit(&piter);
+			part_nr_sects_write(part, (sector_t)length);
+			i_size_write(bdevp->bd_inode, p.length);
+			mutex_unlock(&bdevp->bd_mutex);
+			mutex_unlock(&bdev->bd_mutex);
+			bdput(bdevp);
+			disk_put_part(part);
+			return 0;
 		default:
 			return -EINVAL;
 	}
Index: linux-2.6/block/partition-generic.c
===================================================================
--- linux-2.6.orig/block/partition-generic.c	2012-07-10 21:03:52.460190498 -0400
+++ linux-2.6/block/partition-generic.c	2012-07-10 21:03:57.069190639 -0400
@@ -84,7 +84,7 @@ ssize_t part_size_show(struct device *de
 		       struct device_attribute *attr, char *buf)
 {
 	struct hd_struct *p = dev_to_part(dev);
-	return sprintf(buf, "%llu\n",(unsigned long long)p->nr_sects);
+	return sprintf(buf, "%llu\n",(unsigned long long)part_nr_sects_read(p));
 }
 
 static ssize_t part_ro_show(struct device *dev,
@@ -294,6 +294,8 @@ struct hd_struct *add_partition(struct g
 		err = -ENOMEM;
 		goto out_free;
 	}
+
+	seqcount_init(&p->nr_sects_seq);
 	pdev = part_to_dev(p);
 
 	p->start_sect = start;
Index: linux-2.6/include/linux/blkpg.h
===================================================================
--- linux-2.6.orig/include/linux/blkpg.h	2012-07-10 21:03:52.462190498 -0400
+++ linux-2.6/include/linux/blkpg.h	2012-07-10 21:03:57.072190639 -0400
@@ -40,6 +40,7 @@ struct blkpg_ioctl_arg {
 /* The subfunctions (for the op field) */
 #define BLKPG_ADD_PARTITION	1
 #define BLKPG_DEL_PARTITION	2
+#define BLKPG_RESIZE_PARTITION	3
 
 /* Sizes of name fields. Unused at present. */
 #define BLKPG_DEVNAMELTH	64
Index: linux-2.6/include/linux/genhd.h
===================================================================
--- linux-2.6.orig/include/linux/genhd.h	2012-07-10 21:03:52.463190498 -0400
+++ linux-2.6/include/linux/genhd.h	2012-07-10 21:05:47.178194202 -0400
@@ -98,7 +98,13 @@ struct partition_meta_info {
 
 struct hd_struct {
 	sector_t start_sect;
+	/*
+	 * nr_sects is protected by sequence counter. One might extend a
+	 * partition while IO is happening to it and update of nr_sects
+	 * can be non-atomic on 32bit machines with 64bit sector_t.
+	 */
 	sector_t nr_sects;
+	seqcount_t nr_sects_seq;
 	sector_t alignment_offset;
 	unsigned int discard_alignment;
 	struct device __dev;
@@ -648,6 +654,57 @@ static inline void hd_struct_put(struct 
 		__delete_partition(part);
 }
 
+/*
+ * Any access of part->nr_sects which is not protected by partition
+ * bd_mutex or gendisk bdev bd_mutex, should be done using this
+ * accessor function.
+ *
+ * Code written along the lines of i_size_read() and i_size_write().
+ * CONFIG_PREEMPT case optimizes the case of UP kernel with preemption
+ * on.
+ */
+static inline sector_t part_nr_sects_read(struct hd_struct *part)
+{
+#if BITS_PER_LONG==32 && defined(CONFIG_LBDAF) && defined(CONFIG_SMP)
+	sector_t nr_sects;
+	unsigned seq;
+	do {
+		seq = read_seqcount_begin(&part->nr_sects_seq);
+		nr_sects = part->nr_sects;
+	} while (read_seqcount_retry(&part->nr_sects_seq, seq));
+	return nr_sects;
+#elif BITS_PER_LONG==32 && defined(CONFIG_LBDAF) && defined(CONFIG_PREEMPT)
+	sector_t nr_sects;
+
+	preempt_disable();
+	nr_sects = part->nr_sects;
+	preempt_enable();
+	return nr_sects;
+#else
+	return part->nr_sects;
+#endif
+}
+
+/*
+ * Should be called with mutex lock held (typically bd_mutex) of partition
+ * to provide mutual exlusion among writers otherwise seqcount might be
+ * left in wrong state leaving the readers spinning infinitely.
+ */
+static inline void part_nr_sects_write(struct hd_struct *part, sector_t size)
+{
+#if BITS_PER_LONG==32 && defined(CONFIG_LBDAF) && defined(CONFIG_SMP)
+	write_seqcount_begin(&part->nr_sects_seq);
+	part->nr_sects = size;
+	write_seqcount_end(&part->nr_sects_seq);
+#elif BITS_PER_LONG==32 && defined(CONFIG_LBDAF) && defined(CONFIG_PREEMPT)
+	preempt_disable();
+	part->nr_sects = size;
+	preempt_enable();
+#else
+	part->nr_sects = size;
+#endif
+}
+
 #else /* CONFIG_BLOCK */
 
 static inline void printk_all_partitions(void) { }

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] [V4] block: Support online resize of disk partitions
  2012-07-09 21:34 [patch 0/2] [V4] block: Support online resize of disk partitions vgoyal
                   ` (2 preceding siblings ...)
  2012-07-09 22:40 ` [patch 0/2] [V4] block: Support online resize of disk partitions Phillip Susi
@ 2012-08-01 14:07 ` Vivek Goyal
  2012-08-01 15:49   ` Jens Axboe
  3 siblings, 1 reply; 13+ messages in thread
From: Vivek Goyal @ 2012-08-01 14:07 UTC (permalink / raw)
  To: axboe; +Cc: linux-kernel, dm-devel, kzak, psusi, vgoyal, maxim.patlasov

On Mon, Jul 09, 2012 at 05:34:18PM -0400, vivek.goyal2008@gmail.com wrote:
> Hi,
> 
> Few people have pinged me in rencent past about status of this patch, hence,
> this is V4 of patch which adds support for online resizing of a partition.
> This patch is based on previously posted patches by Phillip Susi. 
> 

Hi Jens,

Can you please consider partition resize patches for inclusion.

Thanks
Vivek

> There are two patches. Out of which one is kernel patch and other one is
> util-linux patch to add support of a user space utility "resizepart" to
> allow resizing the partition.
> 
> This ioctl only resizes the partition size in kenrel and does not change
> the size on disk. A user needs to make sure that corresponding changes
> are made to disk data structures also using fdisk(or partx), if changes
> are to be retained across reboot.
> 
> Changes since V3
> ----------------
> - Do bdput() in error path as per the Maxim's review comments.
> 
> Changes since V2
> ----------------
> - Do not ignore the "start" parameter in RESIZE ioctl.
> - Change resizepart utility to parse sysfs to get to partition start.
> 
> Changes since V1
> ----------------
> Following are changes since the version Phillip posted.
> - RESIZE ioctl ignores the partition "start" and does not expect user to
>   specify one. Caller needs to just specify "device", "partition number" and
>   "size" of new partition.
> 
> - Got rid of part_nr_sects_write_begin/part_nr_sects_write_end functions
>   and replaced these with single part_nr_sects_write().
> 
> - Some sequence counter related changes are simply lifted from i_size_write().
> 
> - Initialized part->nr_sects_seq using seqcount_init().
> 
> Phillip, do let me know if I should put your signed-off-by also in the
> patch.
> 
> Any review feedback is welcome.
> 
> I did following test.
> 
> - Create a partition of 10MB on a disk using fdisk.
> - Add this partition to a volume group
> - Use fdisk to increase the partition size to 20MB. (First delete the
>   partition and then create a new one of 20MB size).
> - Use resizepart to extend partition size in kernel.
>         resizepart /dev/sdc 1 40960
> - Do pvresize on partition so that physical volume can be incrased in
>   size online.
>         pvresize /dev/sda1
> 
> pvresize does recognize the new size. Also lsblk and /proc/partitions
> report the new size of partition.
> 
> Thanks
> Vivek
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] [V4] block: Support online resize of disk partitions
  2012-08-01 14:07 ` Vivek Goyal
@ 2012-08-01 15:49   ` Jens Axboe
  2012-08-01 15:59     ` Vivek Goyal
  0 siblings, 1 reply; 13+ messages in thread
From: Jens Axboe @ 2012-08-01 15:49 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-kernel, dm-devel, kzak, psusi, maxim.patlasov

On 08/01/2012 04:07 PM, Vivek Goyal wrote:
> On Mon, Jul 09, 2012 at 05:34:18PM -0400, vivek.goyal2008@gmail.com wrote:
>> Hi,
>>
>> Few people have pinged me in rencent past about status of this patch, hence,
>> this is V4 of patch which adds support for online resizing of a partition.
>> This patch is based on previously posted patches by Phillip Susi. 
>>
> 
> Hi Jens,
> 
> Can you please consider partition resize patches for inclusion.

Vivek, they are in and were in the pull request sent out earlier today!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 0/2] [V4] block: Support online resize of disk partitions
  2012-08-01 15:49   ` Jens Axboe
@ 2012-08-01 15:59     ` Vivek Goyal
  0 siblings, 0 replies; 13+ messages in thread
From: Vivek Goyal @ 2012-08-01 15:59 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, dm-devel, kzak, psusi, maxim.patlasov

On Wed, Aug 01, 2012 at 05:49:11PM +0200, Jens Axboe wrote:
> On 08/01/2012 04:07 PM, Vivek Goyal wrote:
> > On Mon, Jul 09, 2012 at 05:34:18PM -0400, vivek.goyal2008@gmail.com wrote:
> >> Hi,
> >>
> >> Few people have pinged me in rencent past about status of this patch, hence,
> >> this is V4 of patch which adds support for online resizing of a partition.
> >> This patch is based on previously posted patches by Phillip Susi. 
> >>
> > 
> > Hi Jens,
> > 
> > Can you please consider partition resize patches for inclusion.
> 
> Vivek, they are in and were in the pull request sent out earlier today!

Great. I missed that. Thanks for including the patch.

Vivek

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 2/2] util-linux: resizepart: Utility to resize a partition
  2012-07-09 21:34 ` [patch 2/2] util-linux: resizepart: Utility to resize a partition vgoyal
@ 2012-08-13 20:20     ` Karel Zak
  0 siblings, 0 replies; 13+ messages in thread
From: Karel Zak @ 2012-08-13 20:20 UTC (permalink / raw)
  To: vgoyal
  Cc: linux-kernel, axboe, dm-devel, psusi, vgoyal, maxim.patlasov, util-linux

On Mon, Jul 09, 2012 at 05:34:20PM -0400, vgoyal@redhat.com wrote:
> A simple user space utility to resize an existing partition. It tries to read
> the start of partiton from sysfs.

Applied with some changes -- will be in the next release util-linux 2.22-rc2

Thanks!

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [patch 2/2] util-linux: resizepart: Utility to resize a partition
@ 2012-08-13 20:20     ` Karel Zak
  0 siblings, 0 replies; 13+ messages in thread
From: Karel Zak @ 2012-08-13 20:20 UTC (permalink / raw)
  Cc: axboe, util-linux, psusi, linux-kernel, maxim.patlasov, dm-devel, vgoyal

On Mon, Jul 09, 2012 at 05:34:20PM -0400, vgoyal@redhat.com wrote:
> A simple user space utility to resize an existing partition. It tries to read
> the start of partiton from sysfs.

Applied with some changes -- will be in the next release util-linux 2.22-rc2

Thanks!

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-08-13 20:20 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-09 21:34 [patch 0/2] [V4] block: Support online resize of disk partitions vgoyal
2012-07-09 21:34 ` [patch 1/2] block: add partition resize function to blkpg ioctl vgoyal
2012-07-09 21:34 ` [patch 2/2] util-linux: resizepart: Utility to resize a partition vgoyal
2012-08-13 20:20   ` Karel Zak
2012-08-13 20:20     ` Karel Zak
2012-07-09 22:40 ` [patch 0/2] [V4] block: Support online resize of disk partitions Phillip Susi
2012-07-10 13:57   ` Vivek Goyal
2012-07-10 14:13     ` Phillip Susi
2012-07-10 14:20       ` Vivek Goyal
2012-07-10 14:59         ` Vivek Goyal
2012-08-01 14:07 ` Vivek Goyal
2012-08-01 15:49   ` Jens Axboe
2012-08-01 15:59     ` Vivek Goyal

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.