* [PATCH 1/2] Add partition resize function to BLKPG ioctl [not found] <cover.1322709471.git.psusi@cfl.rr.com> @ 2011-12-01 3:23 ` Phillip Susi 2011-12-08 12:30 ` Karel Zak 2011-12-01 3:23 ` Phillip Susi 1 sibling, 1 reply; 36+ messages in thread From: Phillip Susi @ 2011-12-01 3:23 UTC (permalink / raw) To: linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Add a new operation code ( BLKPG_RES_PARTITION ) to the BLKPG ioctl that allows altering the size of an existing partition, even if it is currently in use. Signed-off-by: Phillip Susi <psusi@cfl.rr.com> - --- block/ioctl.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++- include/linux/blkpg.h | 1 + 2 files changed, 50 insertions(+), 2 deletions(-) diff --git a/block/ioctl.c b/block/ioctl.c index ca939fc..f97e6a4 100644 - --- a/block/ioctl.c +++ b/block/ioctl.c @@ -36,8 +36,8 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user case BLKPG_ADD_PARTITION: start = p.start >> 9; length = p.length >> 9; - - /* check for fit in a hd_struct */ - - if (sizeof(sector_t) == sizeof(long) && + /* check for fit in a hd_struct */ + if (sizeof(sector_t) == sizeof(long) && sizeof(long long) > sizeof(long)) { long pstart = start, plength = length; if (pstart != start || plength != length @@ -92,6 +92,53 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user bdput(bdevp); return 0; + case BLKPG_RES_PARTITION: + start = p.start >> 9; + length = p.length >> 9; + /* check for fit in a hd_struct */ + if (sizeof(sector_t) == sizeof(long) && + sizeof(long long) > sizeof(long)) { + long pstart = start, plength = length; + if (pstart != start || plength != length + || pstart < 0 || plength < 0) + return -EINVAL; + } + + mutex_lock(&bdev->bd_mutex); + + /* overlap? */ + disk_part_iter_init(&piter, disk, + DISK_PITER_INCL_EMPTY); + while ((part = disk_part_iter_next(&piter))) { + if (part->partno != partno && !(start + length <= part->start_sect || + start >= part->start_sect + part->nr_sects)) { + disk_part_iter_exit(&piter); + mutex_unlock(&bdev->bd_mutex); + return -EBUSY; + } + } + disk_part_iter_exit(&piter); + part = disk_get_part(disk, partno); + if (!part) + { + mutex_unlock(&bdev->bd_mutex); + return -ENXIO; + } + if (start != part->start_sect) + { + mutex_unlock(&bdev->bd_mutex); + disk_put_part(part); + return -EINVAL; + } + part->nr_sects = length; + bdevp = bdget(part_devt(part)); + mutex_lock(&bdevp->bd_mutex); + i_size_write(bdevp->bd_inode, p.length); + mutex_unlock(&bdevp->bd_mutex); + bdput(bdevp); + disk_put_part(part); + mutex_unlock(&bdev->bd_mutex); + return 0; default: return -EINVAL; } diff --git a/include/linux/blkpg.h b/include/linux/blkpg.h index faf8a45..103da38 100644 - --- a/include/linux/blkpg.h +++ b/include/linux/blkpg.h @@ -40,6 +40,7 @@ struct blkpg_ioctl_arg { /* The subfunctions (for the op field) */ #define BLKPG_ADD_PARTITION 1 #define BLKPG_DEL_PARTITION 2 +#define BLKPG_RES_PARTITION 3 /* Sizes of name fields. Unused at present. */ #define BLKPG_DEVNAMELTH 64 - -- 1.7.5.4 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk7W8yAACgkQJ4UciIs+XuKpiwCfd8YC2241ScOcruAoGvBnTyVZ p6UAoI5JT/UX4gkilpjmTKbuBLzeCTv4 =K/d3 -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-01 3:23 ` [PATCH 1/2] Add partition resize function to BLKPG ioctl Phillip Susi @ 2011-12-08 12:30 ` Karel Zak 2011-12-08 14:22 ` Phillip Susi 0 siblings, 1 reply; 36+ messages in thread From: Karel Zak @ 2011-12-08 12:30 UTC (permalink / raw) To: Phillip Susi; +Cc: linux-kernel, Vivek Goyal, Jens Axboe On Wed, Nov 30, 2011 at 10:23:12PM -0500, Phillip Susi wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Add a new operation code ( BLKPG_RES_PARTITION ) to the > BLKPG ioctl that allows altering the size of an existing > partition, even if it is currently in use. It seems very close to Vivek's BLKPG_EXTEND_PARTITION idea, see: https://lkml.org/lkml/2011/9/1/351 It's evident that we (userspace) need something better than BLKPG_{ADD,DEL}_* and BLKRRPART, but no response from kernel side :-( Karel -- Karel Zak <kzak@redhat.com> http://karelzak.blogspot.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-08 12:30 ` Karel Zak @ 2011-12-08 14:22 ` Phillip Susi 2011-12-08 15:16 ` Karel Zak 0 siblings, 1 reply; 36+ messages in thread From: Phillip Susi @ 2011-12-08 14:22 UTC (permalink / raw) To: Karel Zak; +Cc: linux-kernel, Vivek Goyal, Jens Axboe On 12/8/2011 7:30 AM, Karel Zak wrote: > It seems very close to Vivek's BLKPG_EXTEND_PARTITION idea, see: > > https://lkml.org/lkml/2011/9/1/351 > > It's evident that we (userspace) need something better than > BLKPG_{ADD,DEL}_* and BLKRRPART, but no response from kernel side :-( Yes, this looks very similar to mine, except that I also wanted to allow shrinking the partition, not just extending, since btrfs can handle this ( tested ). ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-08 14:22 ` Phillip Susi @ 2011-12-08 15:16 ` Karel Zak 2011-12-08 15:25 ` Phillip Susi 0 siblings, 1 reply; 36+ messages in thread From: Karel Zak @ 2011-12-08 15:16 UTC (permalink / raw) To: Phillip Susi; +Cc: linux-kernel, Vivek Goyal, Jens Axboe On Thu, Dec 08, 2011 at 09:22:12AM -0500, Phillip Susi wrote: > On 12/8/2011 7:30 AM, Karel Zak wrote: >> It seems very close to Vivek's BLKPG_EXTEND_PARTITION idea, see: >> >> https://lkml.org/lkml/2011/9/1/351 >> >> It's evident that we (userspace) need something better than >> BLKPG_{ADD,DEL}_* and BLKRRPART, but no response from kernel side :-( > > Yes, this looks very similar to mine, except that I also wanted to allow > shrinking the partition, not just extending, since btrfs can handle this > ( tested ). Is it safe to alter the partition size in arbitrary way if the partition is used by any process? Vivek's BLKPG_EXTEND_PARTITIONV seems more safety, because it extends a partition size only, so all offsets in all running stuff are still valid. Maybe you need to check bdevp->bd_openers and returns -EBUSY if you want to alter the begin of the partition. Karel -- Karel Zak <kzak@redhat.com> http://karelzak.blogspot.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-08 15:16 ` Karel Zak @ 2011-12-08 15:25 ` Phillip Susi 2011-12-08 15:58 ` Vivek Goyal 0 siblings, 1 reply; 36+ messages in thread From: Phillip Susi @ 2011-12-08 15:25 UTC (permalink / raw) To: Karel Zak; +Cc: linux-kernel, Vivek Goyal, Jens Axboe On 12/8/2011 10:16 AM, Karel Zak wrote: > Is it safe to alter the partition size in arbitrary way if the > partition is used by any process? > > Vivek's BLKPG_EXTEND_PARTITIONV seems more safety, because it extends > a partition size only, so all offsets in all running stuff are still > valid. > > Maybe you need to check bdevp->bd_openers and returns -EBUSY if you > want to alter the begin of the partition. I disallowed altering the start of the partition ( that would just be crazy ), but altering the end has been supported on lvm for years now. Ext4 can not perform an online shrink, but btrfs can, and I was able to successfully have btrfs shrink the fs and then use BLKPG_RES_PARTITION to shrink the partition. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-08 15:25 ` Phillip Susi @ 2011-12-08 15:58 ` Vivek Goyal 2011-12-08 16:06 ` Phillip Susi 0 siblings, 1 reply; 36+ messages in thread From: Vivek Goyal @ 2011-12-08 15:58 UTC (permalink / raw) To: Phillip Susi; +Cc: Karel Zak, linux-kernel, Jens Axboe On Thu, Dec 08, 2011 at 10:25:26AM -0500, Phillip Susi wrote: > On 12/8/2011 10:16 AM, Karel Zak wrote: > > Is it safe to alter the partition size in arbitrary way if the > > partition is used by any process? > > > > Vivek's BLKPG_EXTEND_PARTITIONV seems more safety, because it extends > > a partition size only, so all offsets in all running stuff are still > > valid. > > > > Maybe you need to check bdevp->bd_openers and returns -EBUSY if you > > want to alter the begin of the partition. > > I disallowed altering the start of the partition ( that would just > be crazy ), but altering the end has been supported on lvm for years > now. Ext4 can not perform an online shrink, but btrfs can, and I was > able to successfully have btrfs shrink the fs and then use > BLKPG_RES_PARTITION to shrink the partition. So, if there is an IO in flight while partition shrinking is happening, then IO can end up happening outside the partition? part->nr_sects can be 64 bits on 32bit machines and update will be non-atomic. I had used sequence counter to make sure read is able to get to intermediate value. May be it is a good idea to address this concenrn. Thanks Vivek ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-08 15:58 ` Vivek Goyal @ 2011-12-08 16:06 ` Phillip Susi 2011-12-08 16:28 ` Vivek Goyal 0 siblings, 1 reply; 36+ messages in thread From: Phillip Susi @ 2011-12-08 16:06 UTC (permalink / raw) To: Vivek Goyal; +Cc: Karel Zak, linux-kernel, Jens Axboe On 12/8/2011 10:58 AM, Vivek Goyal wrote: > So, if there is an IO in flight while partition shrinking is happening, > then IO can end up happening outside the partition? Yes. Of course if there was an in flight IO at the time you truncate the partition, then you did something wrong ( didn't shrink the fs first ). > part->nr_sects can be 64 bits on 32bit machines and update will be > non-atomic. I had used sequence counter to make sure read is able > to get to intermediate value. May be it is a good idea to address > this concenrn. Isn't that what the mutex is for? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-08 16:06 ` Phillip Susi @ 2011-12-08 16:28 ` Vivek Goyal 2011-12-08 16:55 ` Phillip Susi 2011-12-09 2:53 ` Phillip Susi 0 siblings, 2 replies; 36+ messages in thread From: Vivek Goyal @ 2011-12-08 16:28 UTC (permalink / raw) To: Phillip Susi; +Cc: Karel Zak, linux-kernel, Jens Axboe On Thu, Dec 08, 2011 at 11:06:29AM -0500, Phillip Susi wrote: > On 12/8/2011 10:58 AM, Vivek Goyal wrote: > >So, if there is an IO in flight while partition shrinking is happening, > >then IO can end up happening outside the partition? > > Yes. Of course if there was an in flight IO at the time you > truncate the partition, then you did something wrong ( didn't shrink > the fs first ). Ok, so one needs to shrink filesystem. So if partition is part of a volume group one is supposed to first reduce the size using pvresize and then shrink actual partition? > > >part->nr_sects can be 64 bits on 32bit machines and update will be > >non-atomic. I had used sequence counter to make sure read is able > >to get to intermediate value. May be it is a good idea to address > >this concenrn. > > Isn't that what the mutex is for? I thought I have found couple of places where we don't take mutex. For example. drive_stat_acct() disk_map_sector_rcu() sector_in_part() --> read part->nr_sects without mutex. printk_all_partitions() ---> read part->nr_sects without mutex. show_partition() ---> read part->nr_sects without mutex. Thanks Vivek ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-08 16:28 ` Vivek Goyal @ 2011-12-08 16:55 ` Phillip Susi 2011-12-09 2:53 ` Phillip Susi 1 sibling, 0 replies; 36+ messages in thread From: Phillip Susi @ 2011-12-08 16:55 UTC (permalink / raw) To: Vivek Goyal; +Cc: Karel Zak, linux-kernel, Jens Axboe On 12/8/2011 11:28 AM, Vivek Goyal wrote: > Ok, so one needs to shrink filesystem. So if partition is part of a > volume group one is supposed to first reduce the size using pvresize > and then shrink actual partition? Yes. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-08 16:28 ` Vivek Goyal 2011-12-08 16:55 ` Phillip Susi @ 2011-12-09 2:53 ` Phillip Susi 2011-12-12 14:53 ` Vivek Goyal 1 sibling, 1 reply; 36+ messages in thread From: Phillip Susi @ 2011-12-09 2:53 UTC (permalink / raw) To: Vivek Goyal; +Cc: Karel Zak, linux-kernel, Jens Axboe [-- Attachment #1.1: Type: text/plain, Size: 199 bytes --] On 12/08/2011 11:28 AM, Vivek Goyal wrote: > I thought I have found couple of places where we don't take mutex. For > example. I merged your seq changes into my patch. How does this look? [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1.2: 0001-Add-partition-resize-function-to-BLKPG-ioctl.patch --] [-- Type: text/x-patch; name="0001-Add-partition-resize-function-to-BLKPG-ioctl.patch", Size: 8423 bytes --] From 8dbca9e824dd18eeb44079717234db0e853f8478 Mon Sep 17 00:00:00 2001 From: Phillip Susi <psusi@cfl.rr.com> Date: Wed, 30 Nov 2011 22:05:24 -0500 Subject: [PATCH 1/3] Add partition resize function to BLKPG ioctl Add a new operation code ( BLKPG_RES_PARTITION ) to the BLKPG ioctl that allows altering the size of an existing partition, even if it is currently in use. --- block/genhd.c | 10 ++++---- block/ioctl.c | 51 +++++++++++++++++++++++++++++++++++++- fs/partitions/check.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++++- include/linux/blkpg.h | 1 + include/linux/genhd.h | 7 +++++ 5 files changed, 126 insertions(+), 8 deletions(-) diff --git a/block/genhd.c b/block/genhd.c index 02e9fca..b34fb90 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -154,7 +154,7 @@ struct hd_struct *disk_part_iter_next(struct disk_part_iter *piter) part = rcu_dereference(ptbl->part[piter->idx]); if (!part) continue; - if (!part->nr_sects && + if (!part_nr_sects_read(part) && !(piter->flags & DISK_PITER_INCL_EMPTY) && !(piter->flags & DISK_PITER_INCL_EMPTY_PART0 && piter->idx == 0)) @@ -191,7 +191,7 @@ EXPORT_SYMBOL_GPL(disk_part_iter_exit); static inline int sector_in_part(struct hd_struct *part, sector_t sector) { return part->start_sect <= sector && - sector < part->start_sect + part->nr_sects; + sector < part->start_sect + part_nr_sects_read(part); } /** @@ -766,8 +766,8 @@ void __init printk_all_partitions(void) printk("%s%s %10llu %s %s", is_part0 ? "" : " ", bdevt_str(part_devt(part), devt_buf), - (unsigned long long)part->nr_sects >> 1, - disk_name(disk, part->partno, name_buf), uuid); + (unsigned long long)part_nr_sects_read(part) >> 1 + , disk_name(disk, part->partno, name_buf), uuid); if (is_part0) { if (disk->driverfs_dev != NULL && disk->driverfs_dev->driver != NULL) @@ -858,7 +858,7 @@ static int show_partition(struct seq_file *seqf, void *v) while ((part = disk_part_iter_next(&piter))) seq_printf(seqf, "%4d %7d %10llu %s\n", MAJOR(part_devt(part)), MINOR(part_devt(part)), - (unsigned long long)part->nr_sects >> 1, + (unsigned long long)part_nr_sects_read(part) >> 1, disk_name(sgp, part->partno, buf)); disk_part_iter_exit(&piter); diff --git a/block/ioctl.c b/block/ioctl.c index ca939fc..f97e6a4 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -36,8 +36,8 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user case BLKPG_ADD_PARTITION: start = p.start >> 9; length = p.length >> 9; - /* check for fit in a hd_struct */ - if (sizeof(sector_t) == sizeof(long) && + /* check for fit in a hd_struct */ + if (sizeof(sector_t) == sizeof(long) && sizeof(long long) > sizeof(long)) { long pstart = start, plength = length; if (pstart != start || plength != length @@ -92,6 +92,53 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user bdput(bdevp); return 0; + case BLKPG_RES_PARTITION: + start = p.start >> 9; + length = p.length >> 9; + /* check for fit in a hd_struct */ + if (sizeof(sector_t) == sizeof(long) && + sizeof(long long) > sizeof(long)) { + long pstart = start, plength = length; + if (pstart != start || plength != length + || pstart < 0 || plength < 0) + return -EINVAL; + } + + mutex_lock(&bdev->bd_mutex); + + /* overlap? */ + disk_part_iter_init(&piter, disk, + DISK_PITER_INCL_EMPTY); + while ((part = disk_part_iter_next(&piter))) { + if (part->partno != partno && !(start + length <= part->start_sect || + start >= part->start_sect + part->nr_sects)) { + disk_part_iter_exit(&piter); + mutex_unlock(&bdev->bd_mutex); + return -EBUSY; + } + } + disk_part_iter_exit(&piter); + part = disk_get_part(disk, partno); + if (!part) + { + mutex_unlock(&bdev->bd_mutex); + return -ENXIO; + } + if (start != part->start_sect) + { + mutex_unlock(&bdev->bd_mutex); + disk_put_part(part); + return -EINVAL; + } + part->nr_sects = length; + bdevp = bdget(part_devt(part)); + mutex_lock(&bdevp->bd_mutex); + i_size_write(bdevp->bd_inode, p.length); + mutex_unlock(&bdevp->bd_mutex); + bdput(bdevp); + disk_put_part(part); + mutex_unlock(&bdev->bd_mutex); + return 0; default: return -EINVAL; } diff --git a/fs/partitions/check.c b/fs/partitions/check.c index e3c63d1..090ee1a 100644 --- a/fs/partitions/check.c +++ b/fs/partitions/check.c @@ -234,7 +234,7 @@ ssize_t part_size_show(struct device *dev, struct device_attribute *attr, char *buf) { struct hd_struct *p = dev_to_part(dev); - return sprintf(buf, "%llu\n",(unsigned long long)p->nr_sects); + return sprintf(buf, "%llu\n",(unsigned long long)part_nr_sects_read(p)); } static ssize_t part_ro_show(struct device *dev, @@ -408,6 +408,69 @@ void delete_partition(struct gendisk *disk, int partno) hd_struct_put(part); } +#if BITS_PER_LONG == 32 && defined(CONFIG_LBDAF) +static inline void part_nr_sects_write_begin(struct seqcount_t *seq) +{ + write_seqcount_begin(&seq); +} + +static inline void part_nr_sects_write_end(struct seqcount_t *seq) +{ + write_seqcount_end(&seq); +} + +/* + * Any access of part->nr_sects which is not protected by partition + * bd_mutex or gendisk bdev bd_mutex, hould be done using this accessor + * function. + */ +sector_t part_nr_sects_read(struct hd_struct *part) +{ + sector_t nr_sects; + unsigned seq; + + do { + seq = read_seqcount_begin(&part->seq); + nr_sects = part->nr_sects; + } while (read_seqcount_retry(&part->seq, seq)); + + return nr_sects; +} +#else +static inline void part_nr_sects_write_begin(seqcount_t *seq) {} +static inline void part_nr_sects_write_end(seqcount_t *seq) {} +sector_t part_nr_sects_read(struct hd_struct *part) +{ + return part->nr_sects; +} +#endif + +int extend_partition(struct gendisk *disk, int partno, sector_t size) +{ + struct disk_part_tbl *ptbl = disk->part_tbl; + struct hd_struct *part; + unsigned long flags; + + if (partno >= ptbl->len) + return 1; + + part = ptbl->part[partno]; + if (!part) + return 1; + + /* + * It is called with mutex held for writer mutual exclusion. Disabling + * interrupts to protect against a reader in interrupt/softirq + * context. Is it not needed? + */ + local_irq_save(flags); + part_nr_sects_write_begin(&part->seq); + part->nr_sects += size; + part_nr_sects_write_end(&part->seq); + local_irq_restore(flags); + return 0; +} + static ssize_t whole_disk_show(struct device *dev, struct device_attribute *attr, char *buf) { diff --git a/include/linux/blkpg.h b/include/linux/blkpg.h index faf8a45..103da38 100644 --- a/include/linux/blkpg.h +++ b/include/linux/blkpg.h @@ -40,6 +40,7 @@ struct blkpg_ioctl_arg { /* The subfunctions (for the op field) */ #define BLKPG_ADD_PARTITION 1 #define BLKPG_DEL_PARTITION 2 +#define BLKPG_RES_PARTITION 3 /* Sizes of name fields. Unused at present. */ #define BLKPG_DEVNAMELTH 64 diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 6d18f35..a55ce2c 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -98,7 +98,13 @@ struct partition_meta_info { struct hd_struct { sector_t start_sect; + /* + * nr_sects is protected by sequence counter. One might extend a + * partition while IO is happening to it and update of nr_sects + * can be non-atomic on 32bit machines with 64bit sector_t. + */ sector_t nr_sects; + seqcount_t seq; sector_t alignment_offset; unsigned int discard_alignment; struct device __dev; @@ -604,6 +610,7 @@ extern struct hd_struct * __must_check add_partition(struct gendisk *disk, extern void __delete_partition(struct hd_struct *); extern void delete_partition(struct gendisk *, int); extern void printk_all_partitions(void); +extern sector_t part_nr_sects_read(struct hd_struct *part); extern struct gendisk *alloc_disk_node(int minors, int node_id); extern struct gendisk *alloc_disk(int minors); -- 1.7.5.4 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 262 bytes --] ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-09 2:53 ` Phillip Susi @ 2011-12-12 14:53 ` Vivek Goyal 2011-12-12 17:43 ` Phillip Susi 0 siblings, 1 reply; 36+ messages in thread From: Vivek Goyal @ 2011-12-12 14:53 UTC (permalink / raw) To: Phillip Susi; +Cc: Karel Zak, linux-kernel, Jens Axboe On Thu, Dec 08, 2011 at 09:53:09PM -0500, Phillip Susi wrote: > On 12/08/2011 11:28 AM, Vivek Goyal wrote: > > I thought I have found couple of places where we don't take mutex. For > > example. > > I merged your seq changes into my patch. How does this look? Can you please inline the patch so that reviewing becomes easier. Right now it is an attachment. Couple of quick comments. - I think during copy/paste tabs have been replaced by spaces at places. - You have retained function extend_partition() from my patch. I think you are not using it. - Can we rename BLKPG_RES_PARTITION to BLKPG_RESIZE_PARTITION. It is easier to read. Thanks Vivek ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-12 14:53 ` Vivek Goyal @ 2011-12-12 17:43 ` Phillip Susi 2011-12-12 17:49 ` Joe Perches 0 siblings, 1 reply; 36+ messages in thread From: Phillip Susi @ 2011-12-12 17:43 UTC (permalink / raw) To: Vivek Goyal; +Cc: Karel Zak, linux-kernel, Jens Axboe -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12/12/2011 09:53 AM, Vivek Goyal wrote: > Can you please inline the patch so that reviewing becomes easier. Right > now it is an attachment. Hrm... it shows up inline for me. I guess I'll copy/paste this time. > - I think during copy/paste tabs have been replaced by spaces at places. > - You have retained function extend_partition() from my patch. I think you > are not using it. Woops, fixed. > - Can we rename BLKPG_RES_PARTITION to BLKPG_RESIZE_PARTITION. It is > easier to read. BLKPG_DELETE_PARTITION is also easier to read, but the existing code went with DEL. I prefer to stay consistent with the existing code and also use a 3 letter name. - From de3a303233515b89c4d18869daca1bd5e6775895 Mon Sep 17 00:00:00 2001 From: Phillip Susi <psusi@cfl.rr.com> Date: Mon, 12 Dec 2011 12:33:36 -0500 Subject: [PATCH] Add partition resize function to BLKPG ioctl Add a new operation code ( BLKPG_RES_PARTITION ) to the BLKPG ioctl that allows altering the size of an existing partition, even if it is currently in use. Thanks to Vivek Goyal for the seq counter idea. - --- block/genhd.c | 10 ++++---- block/ioctl.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++- fs/partitions/check.c | 39 ++++++++++++++++++++++++++++++++++++- include/linux/blkpg.h | 1 + include/linux/genhd.h | 7 ++++++ 5 files changed, 100 insertions(+), 8 deletions(-) diff --git a/block/genhd.c b/block/genhd.c index 02e9fca..b34fb90 100644 - --- a/block/genhd.c +++ b/block/genhd.c @@ -154,7 +154,7 @@ struct hd_struct *disk_part_iter_next(struct disk_part_iter *piter) part = rcu_dereference(ptbl->part[piter->idx]); if (!part) continue; - - if (!part->nr_sects && + if (!part_nr_sects_read(part) && !(piter->flags & DISK_PITER_INCL_EMPTY) && !(piter->flags & DISK_PITER_INCL_EMPTY_PART0 && piter->idx == 0)) @@ -191,7 +191,7 @@ EXPORT_SYMBOL_GPL(disk_part_iter_exit); static inline int sector_in_part(struct hd_struct *part, sector_t sector) { return part->start_sect <= sector && - - sector < part->start_sect + part->nr_sects; + sector < part->start_sect + part_nr_sects_read(part); } /** @@ -766,8 +766,8 @@ void __init printk_all_partitions(void) printk("%s%s %10llu %s %s", is_part0 ? "" : " ", bdevt_str(part_devt(part), devt_buf), - - (unsigned long long)part->nr_sects >> 1, - - disk_name(disk, part->partno, name_buf), uuid); + (unsigned long long)part_nr_sects_read(part) >> 1 + , disk_name(disk, part->partno, name_buf), uuid); if (is_part0) { if (disk->driverfs_dev != NULL && disk->driverfs_dev->driver != NULL) @@ -858,7 +858,7 @@ static int show_partition(struct seq_file *seqf, void *v) while ((part = disk_part_iter_next(&piter))) seq_printf(seqf, "%4d %7d %10llu %s\n", MAJOR(part_devt(part)), MINOR(part_devt(part)), - - (unsigned long long)part->nr_sects >> 1, + (unsigned long long)part_nr_sects_read(part) >> 1, disk_name(sgp, part->partno, buf)); disk_part_iter_exit(&piter); diff --git a/block/ioctl.c b/block/ioctl.c index ca939fc..f97e6a4 100644 - --- a/block/ioctl.c +++ b/block/ioctl.c @@ -36,8 +36,8 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user case BLKPG_ADD_PARTITION: start = p.start >> 9; length = p.length >> 9; - - /* check for fit in a hd_struct */ - - if (sizeof(sector_t) == sizeof(long) && + /* check for fit in a hd_struct */ + if (sizeof(sector_t) == sizeof(long) && sizeof(long long) > sizeof(long)) { long pstart = start, plength = length; if (pstart != start || plength != length @@ -92,6 +92,53 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user bdput(bdevp); return 0; + case BLKPG_RES_PARTITION: + start = p.start >> 9; + length = p.length >> 9; + /* check for fit in a hd_struct */ + if (sizeof(sector_t) == sizeof(long) && + sizeof(long long) > sizeof(long)) { + long pstart = start, plength = length; + if (pstart != start || plength != length + || pstart < 0 || plength < 0) + return -EINVAL; + } + + mutex_lock(&bdev->bd_mutex); + + /* overlap? */ + disk_part_iter_init(&piter, disk, + DISK_PITER_INCL_EMPTY); + while ((part = disk_part_iter_next(&piter))) { + if (part->partno != partno && !(start + length <= part->start_sect || + start >= part->start_sect + part->nr_sects)) { + disk_part_iter_exit(&piter); + mutex_unlock(&bdev->bd_mutex); + return -EBUSY; + } + } + disk_part_iter_exit(&piter); + part = disk_get_part(disk, partno); + if (!part) + { + mutex_unlock(&bdev->bd_mutex); + return -ENXIO; + } + if (start != part->start_sect) + { + mutex_unlock(&bdev->bd_mutex); + disk_put_part(part); + return -EINVAL; + } + part->nr_sects = length; + bdevp = bdget(part_devt(part)); + mutex_lock(&bdevp->bd_mutex); + i_size_write(bdevp->bd_inode, p.length); + mutex_unlock(&bdevp->bd_mutex); + bdput(bdevp); + disk_put_part(part); + mutex_unlock(&bdev->bd_mutex); + return 0; default: return -EINVAL; } diff --git a/fs/partitions/check.c b/fs/partitions/check.c index e3c63d1..089e4bc 100644 - --- a/fs/partitions/check.c +++ b/fs/partitions/check.c @@ -234,7 +234,7 @@ ssize_t part_size_show(struct device *dev, struct device_attribute *attr, char *buf) { struct hd_struct *p = dev_to_part(dev); - - return sprintf(buf, "%llu\n",(unsigned long long)p->nr_sects); + return sprintf(buf, "%llu\n",(unsigned long long)part_nr_sects_read(p)); } static ssize_t part_ro_show(struct device *dev, @@ -408,6 +408,43 @@ void delete_partition(struct gendisk *disk, int partno) hd_struct_put(part); } +#if BITS_PER_LONG == 32 && defined(CONFIG_LBDAF) +static inline void part_nr_sects_write_begin(struct seqcount_t *seq) +{ + write_seqcount_begin(&seq); +} + +static inline void part_nr_sects_write_end(struct seqcount_t *seq) +{ + write_seqcount_end(&seq); +} + +/* + * Any access of part->nr_sects which is not protected by partition + * bd_mutex or gendisk bdev bd_mutex, hould be done using this accessor + * function. + */ +sector_t part_nr_sects_read(struct hd_struct *part) +{ + sector_t nr_sects; + unsigned seq; + + do { + seq = read_seqcount_begin(&part->seq); + nr_sects = part->nr_sects; + } while (read_seqcount_retry(&part->seq, seq)); + + return nr_sects; +} +#else +static inline void part_nr_sects_write_begin(seqcount_t *seq) {} +static inline void part_nr_sects_write_end(seqcount_t *seq) {} +sector_t part_nr_sects_read(struct hd_struct *part) +{ + return part->nr_sects; +} +#endif + static ssize_t whole_disk_show(struct device *dev, struct device_attribute *attr, char *buf) { diff --git a/include/linux/blkpg.h b/include/linux/blkpg.h index faf8a45..103da38 100644 - --- a/include/linux/blkpg.h +++ b/include/linux/blkpg.h @@ -40,6 +40,7 @@ struct blkpg_ioctl_arg { /* The subfunctions (for the op field) */ #define BLKPG_ADD_PARTITION 1 #define BLKPG_DEL_PARTITION 2 +#define BLKPG_RES_PARTITION 3 /* Sizes of name fields. Unused at present. */ #define BLKPG_DEVNAMELTH 64 diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 6d18f35..a55ce2c 100644 - --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -98,7 +98,13 @@ struct partition_meta_info { struct hd_struct { sector_t start_sect; + /* + * nr_sects is protected by sequence counter. One might extend a + * partition while IO is happening to it and update of nr_sects + * can be non-atomic on 32bit machines with 64bit sector_t. + */ sector_t nr_sects; + seqcount_t seq; sector_t alignment_offset; unsigned int discard_alignment; struct device __dev; @@ -604,6 +610,7 @@ extern struct hd_struct * __must_check add_partition(struct gendisk *disk, extern void __delete_partition(struct hd_struct *); extern void delete_partition(struct gendisk *, int); extern void printk_all_partitions(void); +extern sector_t part_nr_sects_read(struct hd_struct *part); extern struct gendisk *alloc_disk_node(int minors, int node_id); extern struct gendisk *alloc_disk(int minors); - -- 1.7.5.4 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk7mPT4ACgkQJ4UciIs+XuJs4QCfS4zI3KeVsI1trJ/135mZMz6F 6J8AnRy1hH++V32jUgait02EYeHfuI+c =6guc -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-12 17:43 ` Phillip Susi @ 2011-12-12 17:49 ` Joe Perches 2011-12-12 18:04 ` Vivek Goyal 0 siblings, 1 reply; 36+ messages in thread From: Joe Perches @ 2011-12-12 17:49 UTC (permalink / raw) To: Phillip Susi; +Cc: Vivek Goyal, Karel Zak, linux-kernel, Jens Axboe On Mon, 2011-12-12 at 12:43 -0500, Phillip Susi wrote: > On 12/12/2011 09:53 AM, Vivek Goyal wrote: [] > > - Can we rename BLKPG_RES_PARTITION to BLKPG_RESIZE_PARTITION. It is > > easier to read. > BLKPG_DELETE_PARTITION is also easier to read, but the existing code > went with DEL. I prefer to stay consistent with the existing code and > also use a 3 letter name. RES is most commonly used for RESOURCE. If it really needs to be 3 letters, and I don't think it does, perhaps BLKPG_RSZ_PARTITION > diff --git a/include/linux/blkpg.h b/include/linux/blkpg.h > index faf8a45..103da38 100644 > --- a/include/linux/blkpg.h > +++ b/include/linux/blkpg.h > @@ -40,6 +40,7 @@ struct blkpg_ioctl_arg { > /* The subfunctions (for the op field) */ > #define BLKPG_ADD_PARTITION 1 > #define BLKPG_DEL_PARTITION 2 > +#define BLKPG_RES_PARTITION 3 ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-12 17:49 ` Joe Perches @ 2011-12-12 18:04 ` Vivek Goyal 2011-12-13 0:15 ` Phillip Susi 0 siblings, 1 reply; 36+ messages in thread From: Vivek Goyal @ 2011-12-12 18:04 UTC (permalink / raw) To: Joe Perches; +Cc: Phillip Susi, Karel Zak, linux-kernel, Jens Axboe On Mon, Dec 12, 2011 at 09:49:07AM -0800, Joe Perches wrote: > On Mon, 2011-12-12 at 12:43 -0500, Phillip Susi wrote: > > On 12/12/2011 09:53 AM, Vivek Goyal wrote: > [] > > > - Can we rename BLKPG_RES_PARTITION to BLKPG_RESIZE_PARTITION. It is > > > easier to read. > > BLKPG_DELETE_PARTITION is also easier to read, but the existing code > > went with DEL. I prefer to stay consistent with the existing code and > > also use a 3 letter name. > > RES is most commonly used for RESOURCE. > > If it really needs to be 3 letters, and I don't think > it does, perhaps BLKPG_RSZ_PARTITION Agreed. RES does not occur to me as resize but resources. ADD and DEL are very clear. Thanks Vivek ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-12 18:04 ` Vivek Goyal @ 2011-12-13 0:15 ` Phillip Susi 2011-12-13 0:16 ` Phillip Susi 0 siblings, 1 reply; 36+ messages in thread From: Phillip Susi @ 2011-12-13 0:15 UTC (permalink / raw) To: Vivek Goyal; +Cc: Joe Perches, Karel Zak, linux-kernel, Jens Axboe -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12/12/2011 01:04 PM, Vivek Goyal wrote: >> If it really needs to be 3 letters, and I don't think >> it does, perhaps BLKPG_RSZ_PARTITION > > Agreed. RES does not occur to me as resize but resources. ADD and DEL > are very clear. Ok, I suppose RESIZE is fine. Here come both patches, hopefully final. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk7mmQoACgkQJ4UciIs+XuIDkwCdEvqpBaxWw5jmgj2PBs/0pStB IyQAoJHW2U4GeBRy7f/WdKWlN0Vk6pgC =x4de -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-13 0:15 ` Phillip Susi @ 2011-12-13 0:16 ` Phillip Susi 2011-12-13 0:16 ` [PATCH 2/2] Add BLKPG_GET_PARTITION operation Phillip Susi 2011-12-19 20:25 ` [PATCH 1/2] Add partition resize function to BLKPG ioctl Vivek Goyal 0 siblings, 2 replies; 36+ messages in thread From: Phillip Susi @ 2011-12-13 0:16 UTC (permalink / raw) To: vgoyal; +Cc: joe, kzak, linux-kernel, jaxboe, psusi Add a new operation code ( BLKPG_RESIZE_PARTITION ) to the BLKPG ioctl that allows altering the size of an existing partition, even if it is currently in use. Thanks to Vivek Goyal for the seq counter idea. Signed-off-by: Phillip Susi <psusi@cfl.rr.com> --- block/genhd.c | 10 ++++---- block/ioctl.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++- fs/partitions/check.c | 39 ++++++++++++++++++++++++++++++++++++- include/linux/blkpg.h | 1 + include/linux/genhd.h | 7 ++++++ 5 files changed, 100 insertions(+), 8 deletions(-) diff --git a/block/genhd.c b/block/genhd.c index 02e9fca..b34fb90 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -154,7 +154,7 @@ struct hd_struct *disk_part_iter_next(struct disk_part_iter *piter) part = rcu_dereference(ptbl->part[piter->idx]); if (!part) continue; - if (!part->nr_sects && + if (!part_nr_sects_read(part) && !(piter->flags & DISK_PITER_INCL_EMPTY) && !(piter->flags & DISK_PITER_INCL_EMPTY_PART0 && piter->idx == 0)) @@ -191,7 +191,7 @@ EXPORT_SYMBOL_GPL(disk_part_iter_exit); static inline int sector_in_part(struct hd_struct *part, sector_t sector) { return part->start_sect <= sector && - sector < part->start_sect + part->nr_sects; + sector < part->start_sect + part_nr_sects_read(part); } /** @@ -766,8 +766,8 @@ void __init printk_all_partitions(void) printk("%s%s %10llu %s %s", is_part0 ? "" : " ", bdevt_str(part_devt(part), devt_buf), - (unsigned long long)part->nr_sects >> 1, - disk_name(disk, part->partno, name_buf), uuid); + (unsigned long long)part_nr_sects_read(part) >> 1 + , disk_name(disk, part->partno, name_buf), uuid); if (is_part0) { if (disk->driverfs_dev != NULL && disk->driverfs_dev->driver != NULL) @@ -858,7 +858,7 @@ static int show_partition(struct seq_file *seqf, void *v) while ((part = disk_part_iter_next(&piter))) seq_printf(seqf, "%4d %7d %10llu %s\n", MAJOR(part_devt(part)), MINOR(part_devt(part)), - (unsigned long long)part->nr_sects >> 1, + (unsigned long long)part_nr_sects_read(part) >> 1, disk_name(sgp, part->partno, buf)); disk_part_iter_exit(&piter); diff --git a/block/ioctl.c b/block/ioctl.c index ca939fc..e4e9198 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -36,8 +36,8 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user case BLKPG_ADD_PARTITION: start = p.start >> 9; length = p.length >> 9; - /* check for fit in a hd_struct */ - if (sizeof(sector_t) == sizeof(long) && + /* check for fit in a hd_struct */ + if (sizeof(sector_t) == sizeof(long) && sizeof(long long) > sizeof(long)) { long pstart = start, plength = length; if (pstart != start || plength != length @@ -92,6 +92,53 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user bdput(bdevp); return 0; + case BLKPG_RESIZE_PARTITION: + start = p.start >> 9; + length = p.length >> 9; + /* check for fit in a hd_struct */ + if (sizeof(sector_t) == sizeof(long) && + sizeof(long long) > sizeof(long)) { + long pstart = start, plength = length; + if (pstart != start || plength != length + || pstart < 0 || plength < 0) + return -EINVAL; + } + + mutex_lock(&bdev->bd_mutex); + + /* overlap? */ + disk_part_iter_init(&piter, disk, + DISK_PITER_INCL_EMPTY); + while ((part = disk_part_iter_next(&piter))) { + if (part->partno != partno && !(start + length <= part->start_sect || + start >= part->start_sect + part->nr_sects)) { + disk_part_iter_exit(&piter); + mutex_unlock(&bdev->bd_mutex); + return -EBUSY; + } + } + disk_part_iter_exit(&piter); + part = disk_get_part(disk, partno); + if (!part) + { + mutex_unlock(&bdev->bd_mutex); + return -ENXIO; + } + if (start != part->start_sect) + { + mutex_unlock(&bdev->bd_mutex); + disk_put_part(part); + return -EINVAL; + } + part->nr_sects = length; + bdevp = bdget(part_devt(part)); + mutex_lock(&bdevp->bd_mutex); + i_size_write(bdevp->bd_inode, p.length); + mutex_unlock(&bdevp->bd_mutex); + bdput(bdevp); + disk_put_part(part); + mutex_unlock(&bdev->bd_mutex); + return 0; default: return -EINVAL; } diff --git a/fs/partitions/check.c b/fs/partitions/check.c index e3c63d1..089e4bc 100644 --- a/fs/partitions/check.c +++ b/fs/partitions/check.c @@ -234,7 +234,7 @@ ssize_t part_size_show(struct device *dev, struct device_attribute *attr, char *buf) { struct hd_struct *p = dev_to_part(dev); - return sprintf(buf, "%llu\n",(unsigned long long)p->nr_sects); + return sprintf(buf, "%llu\n",(unsigned long long)part_nr_sects_read(p)); } static ssize_t part_ro_show(struct device *dev, @@ -408,6 +408,43 @@ void delete_partition(struct gendisk *disk, int partno) hd_struct_put(part); } +#if BITS_PER_LONG == 32 && defined(CONFIG_LBDAF) +static inline void part_nr_sects_write_begin(struct seqcount_t *seq) +{ + write_seqcount_begin(&seq); +} + +static inline void part_nr_sects_write_end(struct seqcount_t *seq) +{ + write_seqcount_end(&seq); +} + +/* + * Any access of part->nr_sects which is not protected by partition + * bd_mutex or gendisk bdev bd_mutex, hould be done using this accessor + * function. + */ +sector_t part_nr_sects_read(struct hd_struct *part) +{ + sector_t nr_sects; + unsigned seq; + + do { + seq = read_seqcount_begin(&part->seq); + nr_sects = part->nr_sects; + } while (read_seqcount_retry(&part->seq, seq)); + + return nr_sects; +} +#else +static inline void part_nr_sects_write_begin(seqcount_t *seq) {} +static inline void part_nr_sects_write_end(seqcount_t *seq) {} +sector_t part_nr_sects_read(struct hd_struct *part) +{ + return part->nr_sects; +} +#endif + static ssize_t whole_disk_show(struct device *dev, struct device_attribute *attr, char *buf) { diff --git a/include/linux/blkpg.h b/include/linux/blkpg.h index faf8a45..a851944 100644 --- a/include/linux/blkpg.h +++ b/include/linux/blkpg.h @@ -40,6 +40,7 @@ struct blkpg_ioctl_arg { /* The subfunctions (for the op field) */ #define BLKPG_ADD_PARTITION 1 #define BLKPG_DEL_PARTITION 2 +#define BLKPG_RESIZE_PARTITION 3 /* Sizes of name fields. Unused at present. */ #define BLKPG_DEVNAMELTH 64 diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 6d18f35..a55ce2c 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -98,7 +98,13 @@ struct partition_meta_info { struct hd_struct { sector_t start_sect; + /* + * nr_sects is protected by sequence counter. One might extend a + * partition while IO is happening to it and update of nr_sects + * can be non-atomic on 32bit machines with 64bit sector_t. + */ sector_t nr_sects; + seqcount_t seq; sector_t alignment_offset; unsigned int discard_alignment; struct device __dev; @@ -604,6 +610,7 @@ extern struct hd_struct * __must_check add_partition(struct gendisk *disk, extern void __delete_partition(struct hd_struct *); extern void delete_partition(struct gendisk *, int); extern void printk_all_partitions(void); +extern sector_t part_nr_sects_read(struct hd_struct *part); extern struct gendisk *alloc_disk_node(int minors, int node_id); extern struct gendisk *alloc_disk(int minors); -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 2/2] Add BLKPG_GET_PARTITION operation 2011-12-13 0:16 ` Phillip Susi @ 2011-12-13 0:16 ` Phillip Susi 2011-12-19 20:25 ` [PATCH 1/2] Add partition resize function to BLKPG ioctl Vivek Goyal 1 sibling, 0 replies; 36+ messages in thread From: Phillip Susi @ 2011-12-13 0:16 UTC (permalink / raw) To: vgoyal; +Cc: joe, kzak, linux-kernel, jaxboe, psusi Add a new operation to the BLKPG ioctl to read the partition. This allows user space to find the current start and length of a partition without having to open the partition and use the long depreciated HDIO_GETGEO ioctl. Signed-off-by: Phillip Susi <psusi@cfl.rr.com> --- block/ioctl.c | 15 +++++++++++++++ include/linux/blkpg.h | 1 + 2 files changed, 16 insertions(+), 0 deletions(-) diff --git a/block/ioctl.c b/block/ioctl.c index e4e9198..c84a949 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -139,6 +139,21 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user disk_put_part(part); mutex_unlock(&bdev->bd_mutex); return 0; + case BLKPG_GET_PARTITION: + mutex_lock(&bdev->bd_mutex); + part = disk_get_part(disk, partno); + if (!part) + { + mutex_unlock(&bdev->bd_mutex); + return -ENXIO; + } + p.start = part->start_sect << 9; + p.length = part->nr_sects << 9; + disk_put_part(part); + mutex_unlock(&bdev->bd_mutex); + if (copy_to_user(a.data, &p, sizeof(struct blkpg_partition))) + return -EFAULT; + return 0; default: return -EINVAL; } diff --git a/include/linux/blkpg.h b/include/linux/blkpg.h index a851944..72bf5e2 100644 --- a/include/linux/blkpg.h +++ b/include/linux/blkpg.h @@ -41,6 +41,7 @@ struct blkpg_ioctl_arg { #define BLKPG_ADD_PARTITION 1 #define BLKPG_DEL_PARTITION 2 #define BLKPG_RESIZE_PARTITION 3 +#define BLKPG_GET_PARTITION 4 /* Sizes of name fields. Unused at present. */ #define BLKPG_DEVNAMELTH 64 -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-13 0:16 ` Phillip Susi 2011-12-13 0:16 ` [PATCH 2/2] Add BLKPG_GET_PARTITION operation Phillip Susi @ 2011-12-19 20:25 ` Vivek Goyal 2011-12-21 1:53 ` Phillip Susi 1 sibling, 1 reply; 36+ messages in thread From: Vivek Goyal @ 2011-12-19 20:25 UTC (permalink / raw) To: Phillip Susi; +Cc: joe, kzak, linux-kernel, jaxboe On Mon, Dec 12, 2011 at 07:16:21PM -0500, Phillip Susi wrote: [..] > + case BLKPG_RESIZE_PARTITION: > + start = p.start >> 9; > + length = p.length >> 9; > + /* check for fit in a hd_struct */ > + if (sizeof(sector_t) == sizeof(long) && > + sizeof(long long) > sizeof(long)) { > + long pstart = start, plength = length; > + if (pstart != start || plength != length > + || pstart < 0 || plength < 0) > + return -EINVAL; > + } > + > + mutex_lock(&bdev->bd_mutex); > + > + /* overlap? */ > + disk_part_iter_init(&piter, disk, > + DISK_PITER_INCL_EMPTY); > + while ((part = disk_part_iter_next(&piter))) { > + if (part->partno != partno && !(start + length <= part->start_sect || > + start >= part->start_sect + part->nr_sects)) { > + disk_part_iter_exit(&piter); > + mutex_unlock(&bdev->bd_mutex); > + return -EBUSY; > + } > + } > + disk_part_iter_exit(&piter); > + part = disk_get_part(disk, partno); > + if (!part) > + { > + mutex_unlock(&bdev->bd_mutex); > + return -ENXIO; > + } > + if (start != part->start_sect) > + { > + mutex_unlock(&bdev->bd_mutex); > + disk_put_part(part); > + return -EINVAL; > + } > + part->nr_sects = length; > + bdevp = bdget(part_devt(part)); > + mutex_lock(&bdevp->bd_mutex); Are there any restrictions on order of partition bdev and disk bdev locking. I see that DEL_PARTITION ioctl first takes partition mutex and then disk mutex using mutex_lock_nested(). In your implementation you take disk mutex first and the partition bdev mutex. - Is that a problem from deadlock point of view? Two threads calling ioctl on same fd. - Is it an issue from lock validator point of view. I am not familiar with mutex_lock_nested(). So thought of asking. Thanks Vivek ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-19 20:25 ` [PATCH 1/2] Add partition resize function to BLKPG ioctl Vivek Goyal @ 2011-12-21 1:53 ` Phillip Susi 2011-12-21 1:54 ` Phillip Susi 2011-12-21 20:46 ` [PATCH 1/2] Add partition resize function to BLKPG ioctl Vivek Goyal 0 siblings, 2 replies; 36+ messages in thread From: Phillip Susi @ 2011-12-21 1:53 UTC (permalink / raw) To: Vivek Goyal; +Cc: joe, kzak, linux-kernel, jaxboe -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12/19/2011 03:25 PM, Vivek Goyal wrote: > Are there any restrictions on order of partition bdev and disk bdev > locking. Good point, yes, it seems you need to take the partition first. Now that I think about it though, I don't think we need the disk mutex at all. I thought you needed to hold it in order to iterate over the partitions, but it looks like you don't since that uses an RCU, right? I guess we'll go for another try. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJO8TwsAAoJEJrBOlT6nu75PIwIAJiPhnaRKJokVxzgzg3UQLb+ jPLgzl3a/lD9PcDdg50ol8rYSwqTXUgWjncFwV752nIEsTV/3kGRGfvgkmO25PsS vjkPjL3rubGmYk2n3QPCU5SNIm1iH7CtyTxHcU2e/qUdfccF5jwo38mPd3xYw/IY GEngNnXwmDDoVttHFK+WDuHpxltZb60quIeVtBhdJQxff6npY78G3dNdKnbvwAb6 v97ZNPDMt5vXOCqA7vWYMLZbJMW5FzDCEr3ehTUOaegT3HgXyRa3tpWDnih3/nyT WI1ngR9qou2qpu13FxS4h8GI4zWyPxgd4EOVqD7JlAE5Z6hTne5Z0fo2VJkVK3A= =ZC+O -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-21 1:53 ` Phillip Susi @ 2011-12-21 1:54 ` Phillip Susi 2011-12-21 1:54 ` [PATCH 2/2] Add BLKPG_GET_PARTITION operation Phillip Susi 2011-12-21 20:46 ` [PATCH 1/2] Add partition resize function to BLKPG ioctl Vivek Goyal 1 sibling, 1 reply; 36+ messages in thread From: Phillip Susi @ 2011-12-21 1:54 UTC (permalink / raw) To: vgoyal; +Cc: joe, kzak, linux-kernel, jaxboe, Phillip Susi Add a new operation code ( BLKPG_RESIZE_PARTITION ) to the BLKPG ioctl that allows altering the size of an existing partition, even if it is currently in use. Thanks to Vivek Goyal for the seq counter idea. Signed-off-by: Phillip Susi <psusi@cfl.rr.com> --- block/genhd.c | 10 +++++----- block/ioctl.c | 41 +++++++++++++++++++++++++++++++++++++++-- fs/partitions/check.c | 39 ++++++++++++++++++++++++++++++++++++++- include/linux/blkpg.h | 1 + include/linux/genhd.h | 7 +++++++ 5 files changed, 90 insertions(+), 8 deletions(-) diff --git a/block/genhd.c b/block/genhd.c index 02e9fca..b34fb90 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -154,7 +154,7 @@ struct hd_struct *disk_part_iter_next(struct disk_part_iter *piter) part = rcu_dereference(ptbl->part[piter->idx]); if (!part) continue; - if (!part->nr_sects && + if (!part_nr_sects_read(part) && !(piter->flags & DISK_PITER_INCL_EMPTY) && !(piter->flags & DISK_PITER_INCL_EMPTY_PART0 && piter->idx == 0)) @@ -191,7 +191,7 @@ EXPORT_SYMBOL_GPL(disk_part_iter_exit); static inline int sector_in_part(struct hd_struct *part, sector_t sector) { return part->start_sect <= sector && - sector < part->start_sect + part->nr_sects; + sector < part->start_sect + part_nr_sects_read(part); } /** @@ -766,8 +766,8 @@ void __init printk_all_partitions(void) printk("%s%s %10llu %s %s", is_part0 ? "" : " ", bdevt_str(part_devt(part), devt_buf), - (unsigned long long)part->nr_sects >> 1, - disk_name(disk, part->partno, name_buf), uuid); + (unsigned long long)part_nr_sects_read(part) >> 1 + , disk_name(disk, part->partno, name_buf), uuid); if (is_part0) { if (disk->driverfs_dev != NULL && disk->driverfs_dev->driver != NULL) @@ -858,7 +858,7 @@ static int show_partition(struct seq_file *seqf, void *v) while ((part = disk_part_iter_next(&piter))) seq_printf(seqf, "%4d %7d %10llu %s\n", MAJOR(part_devt(part)), MINOR(part_devt(part)), - (unsigned long long)part->nr_sects >> 1, + (unsigned long long)part_nr_sects_read(part) >> 1, disk_name(sgp, part->partno, buf)); disk_part_iter_exit(&piter); diff --git a/block/ioctl.c b/block/ioctl.c index ca939fc..4a097db 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -36,8 +36,8 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user case BLKPG_ADD_PARTITION: start = p.start >> 9; length = p.length >> 9; - /* check for fit in a hd_struct */ - if (sizeof(sector_t) == sizeof(long) && + /* check for fit in a hd_struct */ + if (sizeof(sector_t) == sizeof(long) && sizeof(long long) > sizeof(long)) { long pstart = start, plength = length; if (pstart != start || plength != length @@ -92,6 +92,43 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user bdput(bdevp); return 0; + case BLKPG_RESIZE_PARTITION: + start = p.start >> 9; + length = p.length >> 9; + /* check for fit in a hd_struct */ + if (sizeof(sector_t) == sizeof(long) && + sizeof(long long) > sizeof(long)) { + long pstart = start, plength = length; + if (pstart != start || plength != length + || pstart < 0 || plength < 0) + return -EINVAL; + } + /* overlap? */ + disk_part_iter_init(&piter, disk, + DISK_PITER_INCL_EMPTY); + while ((part = disk_part_iter_next(&piter))) { + if (part->partno != partno && !(start + length <= part->start_sect || + start >= part->start_sect + part->nr_sects)) { + disk_part_iter_exit(&piter); + return -EBUSY; + } + } + disk_part_iter_exit(&piter); + part = disk_get_part(disk, partno); + if (!part) + return -ENXIO; + if (start != part->start_sect) { + disk_put_part(part); + return -EINVAL; + } + part->nr_sects = length; + bdevp = bdget(part_devt(part)); + mutex_lock(&bdevp->bd_mutex); + i_size_write(bdevp->bd_inode, p.length); + mutex_unlock(&bdevp->bd_mutex); + bdput(bdevp); + disk_put_part(part); + return 0; default: return -EINVAL; } diff --git a/fs/partitions/check.c b/fs/partitions/check.c index e3c63d1..089e4bc 100644 --- a/fs/partitions/check.c +++ b/fs/partitions/check.c @@ -234,7 +234,7 @@ ssize_t part_size_show(struct device *dev, struct device_attribute *attr, char *buf) { struct hd_struct *p = dev_to_part(dev); - return sprintf(buf, "%llu\n",(unsigned long long)p->nr_sects); + return sprintf(buf, "%llu\n",(unsigned long long)part_nr_sects_read(p)); } static ssize_t part_ro_show(struct device *dev, @@ -408,6 +408,43 @@ void delete_partition(struct gendisk *disk, int partno) hd_struct_put(part); } +#if BITS_PER_LONG == 32 && defined(CONFIG_LBDAF) +static inline void part_nr_sects_write_begin(struct seqcount_t *seq) +{ + write_seqcount_begin(&seq); +} + +static inline void part_nr_sects_write_end(struct seqcount_t *seq) +{ + write_seqcount_end(&seq); +} + +/* + * Any access of part->nr_sects which is not protected by partition + * bd_mutex or gendisk bdev bd_mutex, hould be done using this accessor + * function. + */ +sector_t part_nr_sects_read(struct hd_struct *part) +{ + sector_t nr_sects; + unsigned seq; + + do { + seq = read_seqcount_begin(&part->seq); + nr_sects = part->nr_sects; + } while (read_seqcount_retry(&part->seq, seq)); + + return nr_sects; +} +#else +static inline void part_nr_sects_write_begin(seqcount_t *seq) {} +static inline void part_nr_sects_write_end(seqcount_t *seq) {} +sector_t part_nr_sects_read(struct hd_struct *part) +{ + return part->nr_sects; +} +#endif + static ssize_t whole_disk_show(struct device *dev, struct device_attribute *attr, char *buf) { diff --git a/include/linux/blkpg.h b/include/linux/blkpg.h index faf8a45..a851944 100644 --- a/include/linux/blkpg.h +++ b/include/linux/blkpg.h @@ -40,6 +40,7 @@ struct blkpg_ioctl_arg { /* The subfunctions (for the op field) */ #define BLKPG_ADD_PARTITION 1 #define BLKPG_DEL_PARTITION 2 +#define BLKPG_RESIZE_PARTITION 3 /* Sizes of name fields. Unused at present. */ #define BLKPG_DEVNAMELTH 64 diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 6d18f35..a55ce2c 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -98,7 +98,13 @@ struct partition_meta_info { struct hd_struct { sector_t start_sect; + /* + * nr_sects is protected by sequence counter. One might extend a + * partition while IO is happening to it and update of nr_sects + * can be non-atomic on 32bit machines with 64bit sector_t. + */ sector_t nr_sects; + seqcount_t seq; sector_t alignment_offset; unsigned int discard_alignment; struct device __dev; @@ -604,6 +610,7 @@ extern struct hd_struct * __must_check add_partition(struct gendisk *disk, extern void __delete_partition(struct hd_struct *); extern void delete_partition(struct gendisk *, int); extern void printk_all_partitions(void); +extern sector_t part_nr_sects_read(struct hd_struct *part); extern struct gendisk *alloc_disk_node(int minors, int node_id); extern struct gendisk *alloc_disk(int minors); -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 2/2] Add BLKPG_GET_PARTITION operation 2011-12-21 1:54 ` Phillip Susi @ 2011-12-21 1:54 ` Phillip Susi 0 siblings, 0 replies; 36+ messages in thread From: Phillip Susi @ 2011-12-21 1:54 UTC (permalink / raw) To: vgoyal; +Cc: joe, kzak, linux-kernel, jaxboe, Phillip Susi Add a new operation to the BLKPG ioctl to read the partition. This allows user space to find the current start and length of a partition without having to open the partition and use the long depreciated HDIO_GETGEO ioctl. Signed-off-by: Phillip Susi <psusi@cfl.rr.com> --- block/ioctl.c | 15 +++++++++++++++ include/linux/blkpg.h | 1 + 2 files changed, 16 insertions(+), 0 deletions(-) diff --git a/block/ioctl.c b/block/ioctl.c index 4a097db..245024a 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -129,6 +129,21 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user bdput(bdevp); disk_put_part(part); return 0; + case BLKPG_GET_PARTITION: + mutex_lock(&bdev->bd_mutex); + part = disk_get_part(disk, partno); + if (!part) + { + mutex_unlock(&bdev->bd_mutex); + return -ENXIO; + } + p.start = part->start_sect << 9; + p.length = part->nr_sects << 9; + disk_put_part(part); + mutex_unlock(&bdev->bd_mutex); + if (copy_to_user(a.data, &p, sizeof(struct blkpg_partition))) + return -EFAULT; + return 0; default: return -EINVAL; } diff --git a/include/linux/blkpg.h b/include/linux/blkpg.h index a851944..72bf5e2 100644 --- a/include/linux/blkpg.h +++ b/include/linux/blkpg.h @@ -41,6 +41,7 @@ struct blkpg_ioctl_arg { #define BLKPG_ADD_PARTITION 1 #define BLKPG_DEL_PARTITION 2 #define BLKPG_RESIZE_PARTITION 3 +#define BLKPG_GET_PARTITION 4 /* Sizes of name fields. Unused at present. */ #define BLKPG_DEVNAMELTH 64 -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-21 1:53 ` Phillip Susi 2011-12-21 1:54 ` Phillip Susi @ 2011-12-21 20:46 ` Vivek Goyal 2011-12-24 21:36 ` Phillip Susi 1 sibling, 1 reply; 36+ messages in thread From: Vivek Goyal @ 2011-12-21 20:46 UTC (permalink / raw) To: Phillip Susi; +Cc: joe, kzak, linux-kernel, jaxboe On Tue, Dec 20, 2011 at 08:53:52PM -0500, Phillip Susi wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 12/19/2011 03:25 PM, Vivek Goyal wrote: > > Are there any restrictions on order of partition bdev and disk bdev > > locking. > > Good point, yes, it seems you need to take the partition first. Now that I think about it though, I don't think we need the disk mutex at all. I thought you needed to hold it in order to iterate over the partitions, but it looks like you don't since that uses an RCU, right? > I guess taking disk mutex will help us providing mutual exclusion between multiple ioctls or multiple instances of same ioctl. Like, while you are extending a partition and checking for overlap, you don't want another call making progress on next partition which might overlap with this partition. Thanks Vivek ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-21 20:46 ` [PATCH 1/2] Add partition resize function to BLKPG ioctl Vivek Goyal @ 2011-12-24 21:36 ` Phillip Susi 2011-12-24 22:21 ` Phillip Susi 0 siblings, 1 reply; 36+ messages in thread From: Phillip Susi @ 2011-12-24 21:36 UTC (permalink / raw) To: Vivek Goyal; +Cc: joe, kzak, linux-kernel, jaxboe -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12/21/2011 03:46 PM, Vivek Goyal wrote: > I guess taking disk mutex will help us providing mutual exclusion between > multiple ioctls or multiple instances of same ioctl. Like, while you are > extending a partition and checking for overlap, you don't want another call > making progress on next partition which might overlap with this partition. Good point; I will put it back and grab it in the correct order. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJO9kXcAAoJEJrBOlT6nu75NcMH/jC8YXfpB6jnCjKtPxP2Hw8G vRfCULFi3h8WQbRmhR4AvayMJfwTYNkA/QB7TxZ0bYEm8SawesJuVyAKWLe1qutB AFUEFL6d8+lt8vMcpsK1LWFBACEPd2TIN+oMttbaLTm2J+SOotnCdkKk691cIX8g PtxEIeAB3NBBxfJ8hJUee3CQb4E0yg7zefdBEctLUC5Y3bKzAHtD5OvboyWf6Ycj 4+VONmaBwHS10hvXnk1CjLJXgjVs5lG+gKniW29SqdnKFbZ3zGl7Ye+QyUyjH9FU xltg/a+qV5Zx2e3EAi3uTkqaSUgsgo4dCKdCIAmR5+FyhbooJV8xV4d4rdMJvFU= =taVc -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-24 21:36 ` Phillip Susi @ 2011-12-24 22:21 ` Phillip Susi 2011-12-24 22:21 ` [PATCH 2/2] Add BLKPG_GET_PARTITION operation Phillip Susi 0 siblings, 1 reply; 36+ messages in thread From: Phillip Susi @ 2011-12-24 22:21 UTC (permalink / raw) To: vgoyal; +Cc: joe, kzak, linux-kernel, jaxboe, Phillip Susi Add a new operation code ( BLKPG_RESIZE_PARTITION ) to the BLKPG ioctl that allows altering the size of an existing partition, even if it is currently in use. Thanks to Vivek Goyal for the seq counter idea. Signed-off-by: Phillip Susi <psusi@cfl.rr.com> --- block/genhd.c | 10 ++++---- block/ioctl.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++- fs/partitions/check.c | 39 +++++++++++++++++++++++++++++++++++- include/linux/blkpg.h | 1 + include/linux/genhd.h | 7 ++++++ 5 files changed, 102 insertions(+), 8 deletions(-) diff --git a/block/genhd.c b/block/genhd.c index 02e9fca..b34fb90 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -154,7 +154,7 @@ struct hd_struct *disk_part_iter_next(struct disk_part_iter *piter) part = rcu_dereference(ptbl->part[piter->idx]); if (!part) continue; - if (!part->nr_sects && + if (!part_nr_sects_read(part) && !(piter->flags & DISK_PITER_INCL_EMPTY) && !(piter->flags & DISK_PITER_INCL_EMPTY_PART0 && piter->idx == 0)) @@ -191,7 +191,7 @@ EXPORT_SYMBOL_GPL(disk_part_iter_exit); static inline int sector_in_part(struct hd_struct *part, sector_t sector) { return part->start_sect <= sector && - sector < part->start_sect + part->nr_sects; + sector < part->start_sect + part_nr_sects_read(part); } /** @@ -766,8 +766,8 @@ void __init printk_all_partitions(void) printk("%s%s %10llu %s %s", is_part0 ? "" : " ", bdevt_str(part_devt(part), devt_buf), - (unsigned long long)part->nr_sects >> 1, - disk_name(disk, part->partno, name_buf), uuid); + (unsigned long long)part_nr_sects_read(part) >> 1 + , disk_name(disk, part->partno, name_buf), uuid); if (is_part0) { if (disk->driverfs_dev != NULL && disk->driverfs_dev->driver != NULL) @@ -858,7 +858,7 @@ static int show_partition(struct seq_file *seqf, void *v) while ((part = disk_part_iter_next(&piter))) seq_printf(seqf, "%4d %7d %10llu %s\n", MAJOR(part_devt(part)), MINOR(part_devt(part)), - (unsigned long long)part->nr_sects >> 1, + (unsigned long long)part_nr_sects_read(part) >> 1, disk_name(sgp, part->partno, buf)); disk_part_iter_exit(&piter); diff --git a/block/ioctl.c b/block/ioctl.c index ca939fc..08b2bd6 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -36,8 +36,8 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user case BLKPG_ADD_PARTITION: start = p.start >> 9; length = p.length >> 9; - /* check for fit in a hd_struct */ - if (sizeof(sector_t) == sizeof(long) && + /* check for fit in a hd_struct */ + if (sizeof(sector_t) == sizeof(long) && sizeof(long long) > sizeof(long)) { long pstart = start, plength = length; if (pstart != start || plength != length @@ -92,6 +92,55 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user bdput(bdevp); return 0; + case BLKPG_RESIZE_PARTITION: + start = p.start >> 9; + length = p.length >> 9; + /* check for fit in a hd_struct */ + if (sizeof(sector_t) == sizeof(long) && + sizeof(long long) > sizeof(long)) { + long pstart = start, plength = length; + if (pstart != start || plength != length + || pstart < 0 || plength < 0) + return -EINVAL; + } + part = disk_get_part(disk, partno); + if (!part) + return -ENXIO; + bdevp = bdget(part_devt(part)); + if (!bdevp) { + disk_put_part(part); + return -ENOMEM; + } + mutex_lock(&bdevp->bd_mutex); + mutex_lock_nested(&bdev->bd_mutex, 1); + if (start != part->start_sect) { + mutex_unlock(&bdevp->bd_mutex); + mutex_unlock(&bdev->bd_mutex); + disk_put_part(part); + return -EINVAL; + } + /* overlap? */ + disk_part_iter_init(&piter, disk, + DISK_PITER_INCL_EMPTY); + struct hd_struct *lpart; + while ((lpart = disk_part_iter_next(&piter))) { + if (lpart->partno != partno && !(start + length <= lpart->start_sect || + start >= lpart->start_sect + lpart->nr_sects)) { + disk_part_iter_exit(&piter); + mutex_unlock(&bdevp->bd_mutex); + mutex_unlock(&bdev->bd_mutex); + disk_put_part(part); + return -EBUSY; + } + } + disk_part_iter_exit(&piter); + part->nr_sects = length; + i_size_write(bdevp->bd_inode, p.length); + mutex_unlock(&bdevp->bd_mutex); + mutex_unlock(&bdev->bd_mutex); + bdput(bdevp); + disk_put_part(part); + return 0; default: return -EINVAL; } diff --git a/fs/partitions/check.c b/fs/partitions/check.c index e3c63d1..089e4bc 100644 --- a/fs/partitions/check.c +++ b/fs/partitions/check.c @@ -234,7 +234,7 @@ ssize_t part_size_show(struct device *dev, struct device_attribute *attr, char *buf) { struct hd_struct *p = dev_to_part(dev); - return sprintf(buf, "%llu\n",(unsigned long long)p->nr_sects); + return sprintf(buf, "%llu\n",(unsigned long long)part_nr_sects_read(p)); } static ssize_t part_ro_show(struct device *dev, @@ -408,6 +408,43 @@ void delete_partition(struct gendisk *disk, int partno) hd_struct_put(part); } +#if BITS_PER_LONG == 32 && defined(CONFIG_LBDAF) +static inline void part_nr_sects_write_begin(struct seqcount_t *seq) +{ + write_seqcount_begin(&seq); +} + +static inline void part_nr_sects_write_end(struct seqcount_t *seq) +{ + write_seqcount_end(&seq); +} + +/* + * Any access of part->nr_sects which is not protected by partition + * bd_mutex or gendisk bdev bd_mutex, hould be done using this accessor + * function. + */ +sector_t part_nr_sects_read(struct hd_struct *part) +{ + sector_t nr_sects; + unsigned seq; + + do { + seq = read_seqcount_begin(&part->seq); + nr_sects = part->nr_sects; + } while (read_seqcount_retry(&part->seq, seq)); + + return nr_sects; +} +#else +static inline void part_nr_sects_write_begin(seqcount_t *seq) {} +static inline void part_nr_sects_write_end(seqcount_t *seq) {} +sector_t part_nr_sects_read(struct hd_struct *part) +{ + return part->nr_sects; +} +#endif + static ssize_t whole_disk_show(struct device *dev, struct device_attribute *attr, char *buf) { diff --git a/include/linux/blkpg.h b/include/linux/blkpg.h index faf8a45..a851944 100644 --- a/include/linux/blkpg.h +++ b/include/linux/blkpg.h @@ -40,6 +40,7 @@ struct blkpg_ioctl_arg { /* The subfunctions (for the op field) */ #define BLKPG_ADD_PARTITION 1 #define BLKPG_DEL_PARTITION 2 +#define BLKPG_RESIZE_PARTITION 3 /* Sizes of name fields. Unused at present. */ #define BLKPG_DEVNAMELTH 64 diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 6d18f35..a55ce2c 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -98,7 +98,13 @@ struct partition_meta_info { struct hd_struct { sector_t start_sect; + /* + * nr_sects is protected by sequence counter. One might extend a + * partition while IO is happening to it and update of nr_sects + * can be non-atomic on 32bit machines with 64bit sector_t. + */ sector_t nr_sects; + seqcount_t seq; sector_t alignment_offset; unsigned int discard_alignment; struct device __dev; @@ -604,6 +610,7 @@ extern struct hd_struct * __must_check add_partition(struct gendisk *disk, extern void __delete_partition(struct hd_struct *); extern void delete_partition(struct gendisk *, int); extern void printk_all_partitions(void); +extern sector_t part_nr_sects_read(struct hd_struct *part); extern struct gendisk *alloc_disk_node(int minors, int node_id); extern struct gendisk *alloc_disk(int minors); -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 2/2] Add BLKPG_GET_PARTITION operation 2011-12-24 22:21 ` Phillip Susi @ 2011-12-24 22:21 ` Phillip Susi 0 siblings, 0 replies; 36+ messages in thread From: Phillip Susi @ 2011-12-24 22:21 UTC (permalink / raw) To: vgoyal; +Cc: joe, kzak, linux-kernel, jaxboe, Phillip Susi Add a new operation to the BLKPG ioctl to read the partition. This allows user space to find the current start and length of a partition without having to open the partition and use the long depreciated HDIO_GETGEO ioctl. Signed-off-by: Phillip Susi <psusi@cfl.rr.com> --- block/ioctl.c | 15 +++++++++++++++ include/linux/blkpg.h | 1 + 2 files changed, 16 insertions(+), 0 deletions(-) diff --git a/block/ioctl.c b/block/ioctl.c index 08b2bd6..fad9064 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -141,6 +141,21 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user bdput(bdevp); disk_put_part(part); return 0; + case BLKPG_GET_PARTITION: + mutex_lock(&bdev->bd_mutex); + part = disk_get_part(disk, partno); + if (!part) + { + mutex_unlock(&bdev->bd_mutex); + return -ENXIO; + } + p.start = part->start_sect << 9; + p.length = part->nr_sects << 9; + disk_put_part(part); + mutex_unlock(&bdev->bd_mutex); + if (copy_to_user(a.data, &p, sizeof(struct blkpg_partition))) + return -EFAULT; + return 0; default: return -EINVAL; } diff --git a/include/linux/blkpg.h b/include/linux/blkpg.h index a851944..72bf5e2 100644 --- a/include/linux/blkpg.h +++ b/include/linux/blkpg.h @@ -41,6 +41,7 @@ struct blkpg_ioctl_arg { #define BLKPG_ADD_PARTITION 1 #define BLKPG_DEL_PARTITION 2 #define BLKPG_RESIZE_PARTITION 3 +#define BLKPG_GET_PARTITION 4 /* Sizes of name fields. Unused at present. */ #define BLKPG_DEVNAMELTH 64 -- 1.7.5.4 ^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH 2/2] Add BLKPG_GET_PARTITION operation [not found] <cover.1322709471.git.psusi@cfl.rr.com> 2011-12-01 3:23 ` [PATCH 1/2] Add partition resize function to BLKPG ioctl Phillip Susi @ 2011-12-01 3:23 ` Phillip Susi 2011-12-08 12:35 ` Karel Zak 1 sibling, 1 reply; 36+ messages in thread From: Phillip Susi @ 2011-12-01 3:23 UTC (permalink / raw) To: linux-kernel -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Add a new operation to the BLKPG ioctl to read the partition. This allows user space to find the current start and length of a partition without having to open the partition and use the long depreciated HDIO_GETGEO ioctl. Signed-off-by: Phillip Susi <psusi@cfl.rr.com> - --- block/ioctl.c | 15 +++++++++++++++ include/linux/blkpg.h | 1 + 2 files changed, 16 insertions(+), 0 deletions(-) diff --git a/block/ioctl.c b/block/ioctl.c index f97e6a4..3e92f0f 100644 - --- a/block/ioctl.c +++ b/block/ioctl.c @@ -139,6 +139,21 @@ static int blkpg_ioctl(struct block_device *bdev, struct blkpg_ioctl_arg __user disk_put_part(part); mutex_unlock(&bdev->bd_mutex); return 0; + case BLKPG_GET_PARTITION: + mutex_lock(&bdev->bd_mutex); + part = disk_get_part(disk, partno); + if (!part) + { + mutex_unlock(&bdev->bd_mutex); + return -ENXIO; + } + p.start = part->start_sect << 9; + p.length = part->nr_sects << 9; + disk_put_part(part); + mutex_unlock(&bdev->bd_mutex); + if (copy_to_user(a.data, &p, sizeof(struct blkpg_partition))) + return -EFAULT; + return 0; default: return -EINVAL; } diff --git a/include/linux/blkpg.h b/include/linux/blkpg.h index 103da38..26b1aef 100644 - --- a/include/linux/blkpg.h +++ b/include/linux/blkpg.h @@ -41,6 +41,7 @@ struct blkpg_ioctl_arg { #define BLKPG_ADD_PARTITION 1 #define BLKPG_DEL_PARTITION 2 #define BLKPG_RES_PARTITION 3 +#define BLKPG_GET_PARTITION 4 /* Sizes of name fields. Unused at present. */ #define BLKPG_DEVNAMELTH 64 - -- 1.7.5.4 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk7W80AACgkQJ4UciIs+XuJHtQCgjsMlhoVgn2ZDOv2nJ2pJNM3j 0psAoIDeUgJKw+eCAeleFNAGuuHy+HBO =mBqW -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] Add BLKPG_GET_PARTITION operation 2011-12-01 3:23 ` Phillip Susi @ 2011-12-08 12:35 ` Karel Zak 2011-12-08 14:25 ` Phillip Susi 0 siblings, 1 reply; 36+ messages in thread From: Karel Zak @ 2011-12-08 12:35 UTC (permalink / raw) To: Phillip Susi; +Cc: linux-kernel On Wed, Nov 30, 2011 at 10:23:44PM -0500, Phillip Susi wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Add a new operation to the BLKPG ioctl to read the partition. > This allows user space to find the current start and length > of a partition without having to open the partition and use > the long depreciated HDIO_GETGEO ioctl. $ cat /sys/block/sda/sda1/{size,start} 4096000 2048 ...is probably the best way if you want to avoid problems with permissions and deprecated (or all) ioctls :-) Karel -- Karel Zak <kzak@redhat.com> http://karelzak.blogspot.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 2/2] Add BLKPG_GET_PARTITION operation 2011-12-08 12:35 ` Karel Zak @ 2011-12-08 14:25 ` Phillip Susi 0 siblings, 0 replies; 36+ messages in thread From: Phillip Susi @ 2011-12-08 14:25 UTC (permalink / raw) To: Karel Zak; +Cc: linux-kernel On 12/8/2011 7:35 AM, Karel Zak wrote: > On Wed, Nov 30, 2011 at 10:23:44PM -0500, Phillip Susi wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Add a new operation to the BLKPG ioctl to read the partition. >> This allows user space to find the current start and length >> of a partition without having to open the partition and use >> the long depreciated HDIO_GETGEO ioctl. > > $ cat /sys/block/sda/sda1/{size,start} > 4096000 > 2048 > > ...is probably the best way if you want to avoid problems with > permissions and deprecated (or all) ioctls :-) Yes, I patched parted to grab the values from sysfs and only use HDIO_GETGEO if that fails, but I thought an ioctl would be preferable ( no need to muck about with appending partition numbers to paths and parsing text and such ), and it seemed an orthogonal operation to the other BLKPG ops. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl
@ 2011-12-29 17:06 Maxim Patlasov
2011-12-30 0:09 ` Phillip Susi
0 siblings, 1 reply; 36+ messages in thread
From: Maxim Patlasov @ 2011-12-29 17:06 UTC (permalink / raw)
To: psusi; +Cc: vgoyal, joe, kzak, linux-kernel, jaxboe
Phillip,
The patch seems to have some problems:
1. Typo in part_nr_sects_write_begin:
> + write_seqcount_begin(&seq);
It should be write_seqcount_begin(seq). Similar issue in
part_nr_sects_write_end().
2. part_nr_sects_write_begin/end is never called.
3. part->seq should be initialized with seqcount_init().
Btw, do we really need both part_nr_sects_write_begin and
part_nr_sects_write_end? What about:
#if BITS_PER_LONG == 32 && defined(CONFIG_LBDAF)
static inline void part_nr_sects_write(struct hd_struct *part,
sector_t nr_sects)
{
write_seqcount_begin(&part->seq);
part->nr_sects = nr_sects;
write_seqcount_end(&part->seq);
}
#else
static inline void part_nr_sects_write(struct hd_struct *part,
sector_t nr_sects)
{
part->nr_sects = nr_sects;
}
#endif
and use part_nr_sects_write(part, length) instead of part->nr_sects =
length in case BLKPG_RESIZE_PARTITION?
Thanks,
Maxim
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-29 17:06 [PATCH 1/2] Add partition resize function to BLKPG ioctl Maxim Patlasov @ 2011-12-30 0:09 ` Phillip Susi 2012-01-01 21:49 ` Phillip Susi 2012-01-26 19:01 ` Vivek Goyal 0 siblings, 2 replies; 36+ messages in thread From: Phillip Susi @ 2011-12-30 0:09 UTC (permalink / raw) To: Maxim Patlasov; +Cc: vgoyal, joe, kzak, linux-kernel, jaxboe -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12/29/2011 12:06 PM, Maxim Patlasov wrote: > Phillip, > > The patch seems to have some problems: > > 1. Typo in part_nr_sects_write_begin: > >> + write_seqcount_begin(&seq); > > It should be write_seqcount_begin(seq). Similar issue in > part_nr_sects_write_end(). > > 2. part_nr_sects_write_begin/end is never called. > > 3. part->seq should be initialized with seqcount_init(). > > Btw, do we really need both part_nr_sects_write_begin and > part_nr_sects_write_end? What about: Good points. I also noticed that the read/write functions were only being called when not holding the mutex. If anyone is touching nr_sects without the mutex, then everyone must use the read/write functions, whether they hold the mutex or not. Otherwise, a mutex holder that touches it directly will race with a non mutex holder using the seqcounter. Vivek, rather than fix the rest of the references to nr_sects to use the read/write functions, why not just fix the few sites that were accessing it without the mutex to take the mutex fist instead of using a seqcounter? -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJO/QEtAAoJEJrBOlT6nu756LYH/jPZrhv9svf9TX0orH0eZJSY UmF3UYffWOexarz5Xppb42so+LedZLtJ5Ya7oQvcWQ5UuSRW6LNLH3tOoutmRVp5 6q5ReaEAp61neE7D2dkqJ9XjSBz7v+I/+CfNMRKhkVNq1n41UpV3iw4qYFBR3DR7 BSyfNPJlXXNU2tSJi1hT/ZsEM1/9bHKiB3yeoipln7CvRjNFjSBVyiDIFX/xl5wD QlLKFesgxSGkYCVfdYtkdT4XybDydUIKWzVXIllFvZgqrBNnLEvwOWlhiCV93YJe jYKHk2oWt7pSxxn3Vz8WlUbf5Jt1k/oresb6I7oVtR8c5Xutmk/HsLOT8K43YlM= =TIs3 -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-30 0:09 ` Phillip Susi @ 2012-01-01 21:49 ` Phillip Susi 2012-01-26 19:01 ` Vivek Goyal 1 sibling, 0 replies; 36+ messages in thread From: Phillip Susi @ 2012-01-01 21:49 UTC (permalink / raw) To: Maxim Patlasov; +Cc: vgoyal, joe, kzak, linux-kernel, jaxboe -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 12/29/2011 07:09 PM, Phillip Susi wrote: > Good points. I also noticed that the read/write functions were only > being called when not holding the mutex. If anyone is touching > nr_sects without the mutex, then everyone must use the read/write > functions, whether they hold the mutex or not. Otherwise, a mutex > holder that touches it directly will race with a non mutex holder > using the seqcounter. Vivek, rather than fix the rest of the > references to nr_sects to use the read/write functions, why not just > fix the few sites that were accessing it without the mutex to take > the mutex fist instead of using a seqcounter? I've been wondering why device-mapper and loop have been able to change device size for years, yet didn't add a seqcounter to do this. It looks to me like there are plenty of places ( like xen-blkback ) that either call get_capacity() or directly reference nr_sects without holding the bd_mutex. Does this mean that device-mapper or loop changing capacity on i386 with large block support already has a race condition? -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJPANTRAAoJEJrBOlT6nu75FjAIAMR6B/Q4o8VDdZGIMw7RoutP MHdCFPwcxTjc06061E2oRk4NYHGQFhg9NfKK6vmI2enjB+ctVIFyK2qC+Ar13mUp KkKbqrVbjOM8AOXgEb4VTQMQ6inTUFEvTOx+EGvAw0flJzpfq7zdjw+1P+BM/PrY LPhxjb8dPeooHmv2TEFFpW0XavkqBM1JHuL9sM2f2g3U9QKU3clNFnrHGY7p1Wxn FBO2ncqqSdbJKsw2xskdRfMq1x6jXfUeIyXCoNoTrHHPz+80AeHOMWy2Y54dvZMX J9iKuQlURG5zyQ1eGh96YlRj4f4Z+zFY8h84K3z2i+xR45J4IgNxXD4ewbIPqg8= =XZQc -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2011-12-30 0:09 ` Phillip Susi 2012-01-01 21:49 ` Phillip Susi @ 2012-01-26 19:01 ` Vivek Goyal 2012-01-26 20:35 ` Phillip Susi 1 sibling, 1 reply; 36+ messages in thread From: Vivek Goyal @ 2012-01-26 19:01 UTC (permalink / raw) To: Phillip Susi; +Cc: Maxim Patlasov, joe, kzak, linux-kernel, jaxboe On Thu, Dec 29, 2011 at 07:09:17PM -0500, Phillip Susi wrote: [..] > > Btw, do we really need both part_nr_sects_write_begin and > > part_nr_sects_write_end? What about: > > Good points. I also noticed that the read/write functions were only being called when not holding the mutex. If anyone is touching nr_sects without the mutex, then everyone must use the read/write functions, whether they hold the mutex or not. Otherwise, a mutex holder that touches it directly will race with a non mutex holder using the seqcounter. Sorry for a very late reply. I was on vacation. I thought update will always happen with mutex lock held. That's what sequence counter expects so that two updaters don't race. Just that while updating under mutex lock, we still need to use sequence counter mecahinsm to update values so that any readers out there not holding mutex don't get confused. Did I miss some code where nr_sects is being read without mutex and I have not use sequence counter version to read it? > Vivek, rather than fix the rest of the references to nr_sects to use the read/write functions, why not just fix the few sites that were accessing it without the mutex to take the mutex fist instead of using a seqcounter? Right now readers can afford not to take lock. Introducing mutex on read side with just add to the cost. Especially IO submission path where we map IO to a partitiona and we wouldn't want to take mutexes there? Are you still pursuing this pathset? Sounds like a useful functionality to have. Thanks Vivek ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2012-01-26 19:01 ` Vivek Goyal @ 2012-01-26 20:35 ` Phillip Susi 2012-01-26 21:04 ` Vivek Goyal 0 siblings, 1 reply; 36+ messages in thread From: Phillip Susi @ 2012-01-26 20:35 UTC (permalink / raw) To: Vivek Goyal; +Cc: Maxim Patlasov, joe, kzak, linux-kernel, jaxboe -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 1/26/2012 2:01 PM, Vivek Goyal wrote: > I thought update will always happen with mutex lock held. That's > what sequence counter expects so that two updaters don't race. Just > that while updating under mutex lock, we still need to use sequence > counter mecahinsm to update values so that any readers out there > not holding mutex don't get confused. Yes, but holding the mutex while writing does no good for the reader. When the writer doesn't use the seqcounter, then the reader that is using it is not actually protected. > Right now readers can afford not to take lock. Introducing mutex on > read side with just add to the cost. Especially IO submission path > where we map IO to a partitiona and we wouldn't want to take > mutexes there? Yes, it does look like readers can't afford that overhead. > Are you still pursuing this pathset? Sounds like a useful > functionality to have. Yes, but I hadn't yet heard back about my question about this being a broader issue that is already a bug in the kernel because things like loop and md already change nr_sects ( on the whole disk partition ) without any protection. Maybe what we need is a read/write lock on struct genhd, then all readers need to acquire the read lock, which should only slow them down if they collide with a writer. Another idea that I had but have not yet checked to see if it is actually feasible is to copy the struct genhd, change the size of the copy, and replace the existing one since updating the pointer will be atomic. -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJPIbkeAAoJEJrBOlT6nu75Bx8IAKgTgA4czpng+JaQkja7n8j9 8B+e+9Asnyp/ND3qkxdUHwmRwJxmC+fJV9PVT84neoBHugN63rX3afc4KGJp5elg k21kziCy46cbIOOeYQDkTZnkHoqHL3dd1sJhO6tv7bDjtRKW0PVY855sKQKW8cNk zjX0WA+iaePs0+Yhd921MwKisRyWtpUr5Sm3Ib0h4kTjRKL7Nyk6cHNH516HpuD2 mXTDCbK5eRhljiadd7igCbbUQNVyFNHm3JGcgLLw35dnarzobZFDlrN1ybXoPExC 31FhDG2Vbo5kLlUNibIsuuf9DoYzBilMTyE/aHN1EtjkTk8bKMg3lZoUi1sVV7c= =TMsc -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2012-01-26 20:35 ` Phillip Susi @ 2012-01-26 21:04 ` Vivek Goyal 2012-01-26 21:48 ` Phillip Susi 0 siblings, 1 reply; 36+ messages in thread From: Vivek Goyal @ 2012-01-26 21:04 UTC (permalink / raw) To: Phillip Susi; +Cc: Maxim Patlasov, joe, kzak, linux-kernel, jaxboe On Thu, Jan 26, 2012 at 03:35:42PM -0500, Phillip Susi wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 1/26/2012 2:01 PM, Vivek Goyal wrote: > > I thought update will always happen with mutex lock held. That's > > what sequence counter expects so that two updaters don't race. Just > > that while updating under mutex lock, we still need to use sequence > > counter mecahinsm to update values so that any readers out there > > not holding mutex don't get confused. > > Yes, but holding the mutex while writing does no good for the reader. > When the writer doesn't use the seqcounter, then the reader that is > using it is not actually protected. Ok, so you are worried about updates to nr_sects by other code. Makes sense. [..] > > Are you still pursuing this pathset? Sounds like a useful > > functionality to have. > > Yes, but I hadn't yet heard back about my question about this being a > broader issue that is already a bug in the kernel because things like > loop and md already change nr_sects ( on the whole disk partition ) > without any protection. Interesting. I do see that set_capacity() changes the nr_sects without any protection. Sounds like it is racy on 32bit machines with 64bit sector_t. > > Maybe what we need is a read/write lock on struct genhd, then all > readers need to acquire the read lock, which should only slow them > down if they collide with a writer. But taking lock will mean atomic operation irrespective of the fact whether lock is taken by somebody else or not. I am assuming it will still turn out to be expensive. > > Another idea that I had but have not yet checked to see if it is > actually feasible is to copy the struct genhd, change the size of the > copy, and replace the existing one since updating the pointer will be > atomic. You will run into issues if somebody has a pointer stored to genhd. I think simpler thing would be to stick with sequence counter approach which keeps read side lockless. We can fix other writers of nr_sects over a period of time. If nobody has complained so far, that means we don't run into issues frequently and it is not a huge concern. Thanks Vivek ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2012-01-26 21:04 ` Vivek Goyal @ 2012-01-26 21:48 ` Phillip Susi 2012-01-30 15:49 ` Vivek Goyal 0 siblings, 1 reply; 36+ messages in thread From: Phillip Susi @ 2012-01-26 21:48 UTC (permalink / raw) To: Vivek Goyal; +Cc: Maxim Patlasov, joe, kzak, linux-kernel, jaxboe -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 1/26/2012 4:04 PM, Vivek Goyal wrote: >> Another idea that I had but have not yet checked to see if it is >> actually feasible is to copy the struct genhd, change the size >> of the copy, and replace the existing one since updating the >> pointer will be atomic. > > You will run into issues if somebody has a pointer stored to > genhd. They are already kept in an RCU list which has the same problem. Doesn't that deal with it by using reference counters, so the reader can keep and use the pointer to the old structure just fine, and it will be cleaned up when they release the reference. > I think simpler thing would be to stick with sequence counter > approach which keeps read side lockless. We can fix other writers > of nr_sects over a period of time. If nobody has complained so > far, that means we don't run into issues frequently and it is not a > huge concern. So you think the patch is fine the way it is? -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.17 (MingW32) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iQEcBAEBAgAGBQJPIcoWAAoJEJrBOlT6nu75SfYIALuzdfVKVKBdXa16nrKj4XtU T2PHkbNRRJhFjRbtxfyLkAWL//yEz4S1z08z+RCpP8YcuQ47bKa8WnoamTYSkzuV SLjKjjec5a2v/SpEBl6QSHUxg73kBXi0pvsQDEm1+j0ZinZk7Sy+x2uZ8fKwYYYp Z8dOShzSR5toIdpi42SbgmtO/qrdqbRcxANvCVwtybeAyVqmIiOY4DTdBm4YhTp5 GeVTUYTsfjuS3P+i1JJaVmUPMtMzOegXLKI775gWTn+x90TIMnoI+lXa2h1QeZOL ZCsk6x7BF9t3lAkk60E8BePFFGoYpz3rHrAsco2qizrXz4Z0WVlh6KGIad0xDF4= =SifF -----END PGP SIGNATURE----- ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH 1/2] Add partition resize function to BLKPG ioctl 2012-01-26 21:48 ` Phillip Susi @ 2012-01-30 15:49 ` Vivek Goyal 0 siblings, 0 replies; 36+ messages in thread From: Vivek Goyal @ 2012-01-30 15:49 UTC (permalink / raw) To: Phillip Susi; +Cc: Maxim Patlasov, joe, kzak, linux-kernel, jaxboe On Thu, Jan 26, 2012 at 04:48:06PM -0500, Phillip Susi wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 1/26/2012 4:04 PM, Vivek Goyal wrote: > >> Another idea that I had but have not yet checked to see if it is > >> actually feasible is to copy the struct genhd, change the size > >> of the copy, and replace the existing one since updating the > >> pointer will be atomic. > > > > You will run into issues if somebody has a pointer stored to > > genhd. > > They are already kept in an RCU list which has the same problem. Yes, using rcu is one option. > Doesn't that deal with it by using reference counters, so the reader > can keep and use the pointer to the old structure just fine, and it > will be cleaned up when they release the reference. Then reader needs to take reference count before every read and that makes it heavy weight solution for reader. Sequence counter keeps it lightweight for reader in fast path. > > > I think simpler thing would be to stick with sequence counter > > approach which keeps read side lockless. We can fix other writers > > of nr_sects over a period of time. If nobody has complained so > > far, that means we don't run into issues frequently and it is not a > > huge concern. > > So you think the patch is fine the way it is? If you can fix other use cases now, it would be good. Otherwise, I think we can leave a big fat comment in the code and if somebody runs into issues, we can go fix the individiual cases then. As we are not introducing any new races, I will not be too concerned about that. Thanks Vivek ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2012-01-30 15:49 UTC | newest] Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <cover.1322709471.git.psusi@cfl.rr.com> 2011-12-01 3:23 ` [PATCH 1/2] Add partition resize function to BLKPG ioctl Phillip Susi 2011-12-08 12:30 ` Karel Zak 2011-12-08 14:22 ` Phillip Susi 2011-12-08 15:16 ` Karel Zak 2011-12-08 15:25 ` Phillip Susi 2011-12-08 15:58 ` Vivek Goyal 2011-12-08 16:06 ` Phillip Susi 2011-12-08 16:28 ` Vivek Goyal 2011-12-08 16:55 ` Phillip Susi 2011-12-09 2:53 ` Phillip Susi 2011-12-12 14:53 ` Vivek Goyal 2011-12-12 17:43 ` Phillip Susi 2011-12-12 17:49 ` Joe Perches 2011-12-12 18:04 ` Vivek Goyal 2011-12-13 0:15 ` Phillip Susi 2011-12-13 0:16 ` Phillip Susi 2011-12-13 0:16 ` [PATCH 2/2] Add BLKPG_GET_PARTITION operation Phillip Susi 2011-12-19 20:25 ` [PATCH 1/2] Add partition resize function to BLKPG ioctl Vivek Goyal 2011-12-21 1:53 ` Phillip Susi 2011-12-21 1:54 ` Phillip Susi 2011-12-21 1:54 ` [PATCH 2/2] Add BLKPG_GET_PARTITION operation Phillip Susi 2011-12-21 20:46 ` [PATCH 1/2] Add partition resize function to BLKPG ioctl Vivek Goyal 2011-12-24 21:36 ` Phillip Susi 2011-12-24 22:21 ` Phillip Susi 2011-12-24 22:21 ` [PATCH 2/2] Add BLKPG_GET_PARTITION operation Phillip Susi 2011-12-01 3:23 ` Phillip Susi 2011-12-08 12:35 ` Karel Zak 2011-12-08 14:25 ` Phillip Susi 2011-12-29 17:06 [PATCH 1/2] Add partition resize function to BLKPG ioctl Maxim Patlasov 2011-12-30 0:09 ` Phillip Susi 2012-01-01 21:49 ` Phillip Susi 2012-01-26 19:01 ` Vivek Goyal 2012-01-26 20:35 ` Phillip Susi 2012-01-26 21:04 ` Vivek Goyal 2012-01-26 21:48 ` Phillip Susi 2012-01-30 15:49 ` Vivek Goyal
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).