All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET 0/4] brd: partition fixes
@ 2014-08-06 11:27 Boaz Harrosh
  2014-08-06 11:29 ` [PATCH 1/4] Change direct_access calling convention Boaz Harrosh
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Boaz Harrosh @ 2014-08-06 11:27 UTC (permalink / raw)
  To: Jens Axboe, Ross Zwisler, Matthew Wilcox, linux-kernel; +Cc: linux-fsdevel

Hi

Jens for me this should go into current 3.17-rcX Kernel. Current situation is
that any attempt to use partitions with brd device would create the partition
but then any use will trash the data.
(But after review from Ross Zwisler)

See: http://www.spinics.net/lists/linux-scsi/msg76737.html

So these patches fixes up all the problems we saw with the code, but not sacrificing
any of the old fixtures. See [patch 3/4] for more explanations.

Ross please review.

list of patches:
[PATCH 1/4] Change direct_access calling convention

    This is a Matthew's patch from the DAX series which fixes the interface to
    direct_access taking into account the partition offset. It must be applied
    here for partitions to work with direct_access() API.

[PATCH 2/4] brd: Add getgeo to block ops

    Ross I put this patch on your name please confirm that you agree and
    that you allow the sign-off. Though your original patch was for prd,
    this here is the exact same one for brd.

[PATCH 3/4] brd: Fix all partitions BUGs
[PATCH 4/4] brd: Request from fdisk 4k alignment

Thanks
Boaz

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/4] Change direct_access calling convention
  2014-08-06 11:27 [PATCHSET 0/4] brd: partition fixes Boaz Harrosh
@ 2014-08-06 11:29 ` Boaz Harrosh
  2014-08-06 11:30 ` [PATCH 2/4] brd: Add getgeo to block ops Boaz Harrosh
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Boaz Harrosh @ 2014-08-06 11:29 UTC (permalink / raw)
  To: Jens Axboe, Ross Zwisler, Matthew Wilcox, linux-kernel; +Cc: linux-fsdevel

From: Matthew Wilcox <matthew.r.wilcox@intel.com>

In order to support accesses to larger chunks of memory, pass in a
'size' parameter (counted in bytes), and return the amount available at
that address.

Add a new helper function, bdev_direct_access(), to handle common
functionality including partition handling, checking for the sector
being page-aligned, and checking the length of the request does not
pass the end of the partition.

[Boaz] checkpatch fixes, add prints.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
---
 Documentation/filesystems/xip.txt | 15 ++++++++------
 arch/powerpc/sysdev/axonram.c     | 17 ++++------------
 drivers/block/brd.c               | 12 +++++------
 drivers/s390/block/dcssblk.c      | 21 ++++++++-----------
 fs/block_dev.c                    | 43 +++++++++++++++++++++++++++++++++++++++
 fs/ext2/xip.c                     | 30 ++++++++++++---------------
 include/linux/blkdev.h            |  6 ++++--
 7 files changed, 87 insertions(+), 57 deletions(-)

diff --git a/Documentation/filesystems/xip.txt b/Documentation/filesystems/xip.txt
index 0466ee5..b62eabf 100644
--- a/Documentation/filesystems/xip.txt
+++ b/Documentation/filesystems/xip.txt
@@ -28,12 +28,15 @@ Implementation
 Execute-in-place is implemented in three steps: block device operation,
 address space operation, and file operations.
 
-A block device operation named direct_access is used to retrieve a
-reference (pointer) to a block on-disk. The reference is supposed to be
-cpu-addressable, physical address and remain valid until the release operation
-is performed. A struct block_device reference is used to address the device,
-and a sector_t argument is used to identify the individual block. As an
-alternative, memory technology devices can be used for this.
+A block device operation named direct_access is used to translate the
+block device sector number to a page frame number (pfn) that identifies
+the physical page for the memory.  It also returns a kernel virtual
+address that can be used to access the memory.
+
+The direct_access method takes a 'size' parameter that indicates the
+number of bytes being requested.  The function should return the number
+of bytes that it can provide, although it must not exceed the number of
+bytes requested.  It may also return a negative errno if an error occurs.
 
 The block device operation is optional, these block devices support it as of
 today:
diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c
index 47b6b9f..142892b 100644
--- a/arch/powerpc/sysdev/axonram.c
+++ b/arch/powerpc/sysdev/axonram.c
@@ -139,26 +139,17 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio)
  * axon_ram_direct_access - direct_access() method for block device
  * @device, @sector, @data: see block_device_operations method
  */
-static int
+static long
 axon_ram_direct_access(struct block_device *device, sector_t sector,
-		       void **kaddr, unsigned long *pfn)
+		       void **kaddr, unsigned long *pfn, long size)
 {
 	struct axon_ram_bank *bank = device->bd_disk->private_data;
-	loff_t offset;
-
-	offset = sector;
-	if (device->bd_part != NULL)
-		offset += device->bd_part->start_sect;
-	offset <<= AXON_RAM_SECTOR_SHIFT;
-	if (offset >= bank->size) {
-		dev_err(&bank->device->dev, "Access outside of address space\n");
-		return -ERANGE;
-	}
+	loff_t offset = (loff_t)sector << AXON_RAM_SECTOR_SHIFT;
 
 	*kaddr = (void *)(bank->ph_addr + offset);
 	*pfn = virt_to_phys(kaddr) >> PAGE_SHIFT;
 
-	return 0;
+	return min_t(long, size, bank->size - offset);
 }
 
 static const struct block_device_operations axon_ram_devops = {
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index c7d138e..a10a0a9 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -370,25 +370,23 @@ static int brd_rw_page(struct block_device *bdev, sector_t sector,
 }
 
 #ifdef CONFIG_BLK_DEV_XIP
-static int brd_direct_access(struct block_device *bdev, sector_t sector,
-			void **kaddr, unsigned long *pfn)
+static long brd_direct_access(struct block_device *bdev, sector_t sector,
+			void **kaddr, unsigned long *pfn, long size)
 {
 	struct brd_device *brd = bdev->bd_disk->private_data;
 	struct page *page;
 
 	if (!brd)
 		return -ENODEV;
-	if (sector & (PAGE_SECTORS-1))
-		return -EINVAL;
-	if (sector + PAGE_SECTORS > get_capacity(bdev->bd_disk))
-		return -ERANGE;
 	page = brd_insert_page(brd, sector);
 	if (!page)
 		return -ENOSPC;
 	*kaddr = page_address(page);
 	*pfn = page_to_pfn(page);
 
-	return 0;
+	/* Could optimistically check to see if the next page in the
+	 * file is mapped to the next page of physical RAM */
+	return min_t(long, PAGE_SIZE, size);
 }
 #endif
 
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 0f47175..2ee5556 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -28,8 +28,8 @@
 static int dcssblk_open(struct block_device *bdev, fmode_t mode);
 static void dcssblk_release(struct gendisk *disk, fmode_t mode);
 static void dcssblk_make_request(struct request_queue *q, struct bio *bio);
-static int dcssblk_direct_access(struct block_device *bdev, sector_t secnum,
-				 void **kaddr, unsigned long *pfn);
+static long dcssblk_direct_access(struct block_device *bdev, sector_t secnum,
+				 void **kaddr, unsigned long *pfn, long size);
 
 static char dcssblk_segments[DCSSBLK_PARM_LEN] = "\0";
 
@@ -866,25 +866,22 @@ fail:
 	bio_io_error(bio);
 }
 
-static int
+static long
 dcssblk_direct_access (struct block_device *bdev, sector_t secnum,
-			void **kaddr, unsigned long *pfn)
+			void **kaddr, unsigned long *pfn, long size)
 {
 	struct dcssblk_dev_info *dev_info;
-	unsigned long pgoff;
+	unsigned long offset, dev_sz;
 
 	dev_info = bdev->bd_disk->private_data;
 	if (!dev_info)
 		return -ENODEV;
-	if (secnum % (PAGE_SIZE/512))
-		return -EINVAL;
-	pgoff = secnum / (PAGE_SIZE / 512);
-	if ((pgoff+1)*PAGE_SIZE-1 > dev_info->end - dev_info->start)
-		return -ERANGE;
-	*kaddr = (void *) (dev_info->start+pgoff*PAGE_SIZE);
+	dev_sz = dev_info->end - dev_info->start;
+	offset = secnum * 512;
+	*kaddr = (void *) (dev_info->start + offset);
 	*pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT;
 
-	return 0;
+	return min_t(long, size, dev_sz - offset);
 }
 
 static void
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 6d72746..002d5d6 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -427,6 +427,49 @@ int bdev_write_page(struct block_device *bdev, sector_t sector,
 }
 EXPORT_SYMBOL_GPL(bdev_write_page);
 
+/**
+ * bdev_direct_access() - Get the address for directly-accessibly memory
+ * @bdev: The device containing the memory
+ * @sector: The offset within the device
+ * @addr: Where to put the address of the memory
+ * @pfn: The Page Frame Number for the memory
+ * @size: The number of bytes requested
+ *
+ * If a block device is made up of directly addressable memory, this function
+ * will tell the caller the PFN and the address of the memory.  The address
+ * may be directly dereferenced within the kernel without the need to call
+ * ioremap(), kmap() or similar.  THe PFN is suitable for inserting into
+ * page tables.
+ *
+ * Return: negative errno if an error occurs, otherwise the number of bytes
+ * accessible at this address.
+ */
+long bdev_direct_access(struct block_device *bdev, sector_t sector,
+			void **addr, unsigned long *pfn, long size)
+{
+	const struct block_device_operations *ops = bdev->bd_disk->fops;
+
+	if (!ops->direct_access)
+		return -EOPNOTSUPP;
+	if ((sector + DIV_ROUND_UP(size, 512)) >
+					part_nr_sects_read(bdev->bd_part))
+		return -ERANGE;
+	sector += get_start_sect(bdev);
+	if (sector % (PAGE_SIZE / 512)) {
+		if (get_start_sect(bdev) % (PAGE_SIZE / 512)) {
+			pr_err_ratelimited(
+				"%s: [%s:%d,%d] partition not 4K aligned!!!\n",
+				__func__, bdev->bd_disk->disk_name,
+				MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev));
+		}
+		return -EINVAL;
+	}
+
+	size = ops->direct_access(bdev, sector, addr, pfn, size);
+	return size ? size : -ERANGE;
+}
+EXPORT_SYMBOL_GPL(bdev_direct_access);
+
 /*
  * pseudo-fs
  */
diff --git a/fs/ext2/xip.c b/fs/ext2/xip.c
index e98171a..a4d5435 100644
--- a/fs/ext2/xip.c
+++ b/fs/ext2/xip.c
@@ -13,18 +13,13 @@
 #include "ext2.h"
 #include "xip.h"
 
-static inline int
-__inode_direct_access(struct inode *inode, sector_t block,
-		      void **kaddr, unsigned long *pfn)
+static inline long __inode_direct_access(struct inode *inode, sector_t block,
+				void **kaddr, unsigned long *pfn, long size)
 {
 	struct block_device *bdev = inode->i_sb->s_bdev;
-	const struct block_device_operations *ops = bdev->bd_disk->fops;
-	sector_t sector;
-
-	sector = block * (PAGE_SIZE / 512); /* ext2 block to bdev sector */
+	sector_t sector = block * (PAGE_SIZE / 512);
 
-	BUG_ON(!ops->direct_access);
-	return ops->direct_access(bdev, sector, kaddr, pfn);
+	return bdev_direct_access(bdev, sector, kaddr, pfn, size);
 }
 
 static inline int
@@ -53,12 +48,13 @@ ext2_clear_xip_target(struct inode *inode, sector_t block)
 {
 	void *kaddr;
 	unsigned long pfn;
-	int rc;
+	long size;
 
-	rc = __inode_direct_access(inode, block, &kaddr, &pfn);
-	if (!rc)
-		clear_page(kaddr);
-	return rc;
+	size = __inode_direct_access(inode, block, &kaddr, &pfn, PAGE_SIZE);
+	if (size < 0)
+		return size;
+	clear_page(kaddr);
+	return 0;
 }
 
 void ext2_xip_verify_sb(struct super_block *sb)
@@ -77,7 +73,7 @@ void ext2_xip_verify_sb(struct super_block *sb)
 int ext2_get_xip_mem(struct address_space *mapping, pgoff_t pgoff, int create,
 				void **kmem, unsigned long *pfn)
 {
-	int rc;
+	long rc;
 	sector_t block;
 
 	/* first, retrieve the sector number */
@@ -86,6 +82,6 @@ int ext2_get_xip_mem(struct address_space *mapping, pgoff_t pgoff, int create,
 		return rc;
 
 	/* retrieve address of the target data */
-	rc = __inode_direct_access(mapping->host, block, kmem, pfn);
-	return rc;
+	rc = __inode_direct_access(mapping->host, block, kmem, pfn, PAGE_SIZE);
+	return (rc < 0) ? rc : 0;
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 8699bcf..bc5ea9e 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1613,8 +1613,8 @@ struct block_device_operations {
 	int (*rw_page)(struct block_device *, sector_t, struct page *, int rw);
 	int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
 	int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long);
-	int (*direct_access) (struct block_device *, sector_t,
-						void **, unsigned long *);
+	long (*direct_access) (struct block_device *, sector_t,
+					void **, unsigned long *pfn, long size);
 	unsigned int (*check_events) (struct gendisk *disk,
 				      unsigned int clearing);
 	/* ->media_changed() is DEPRECATED, use ->check_events() instead */
@@ -1632,6 +1632,8 @@ extern int __blkdev_driver_ioctl(struct block_device *, fmode_t, unsigned int,
 extern int bdev_read_page(struct block_device *, sector_t, struct page *);
 extern int bdev_write_page(struct block_device *, sector_t, struct page *,
 						struct writeback_control *);
+extern long bdev_direct_access(struct block_device *, sector_t, void **addr,
+						unsigned long *pfn, long size);
 #else /* CONFIG_BLOCK */
 
 struct block_device;
-- 
1.9.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/4] brd: Add getgeo to block ops
  2014-08-06 11:27 [PATCHSET 0/4] brd: partition fixes Boaz Harrosh
  2014-08-06 11:29 ` [PATCH 1/4] Change direct_access calling convention Boaz Harrosh
@ 2014-08-06 11:30 ` Boaz Harrosh
  2014-08-06 17:52   ` Ross Zwisler
  2014-08-07 14:03   ` [PATCH 2/4 v2] " Boaz Harrosh
  2014-08-06 11:33 ` [PATCH 3/4] brd: Fix all partitions BUGs Boaz Harrosh
  2014-08-06 11:35 ` [PATCH 4/4] brd: Request from fdisk 4k alignment Boaz Harrosh
  3 siblings, 2 replies; 22+ messages in thread
From: Boaz Harrosh @ 2014-08-06 11:30 UTC (permalink / raw)
  To: Jens Axboe, Ross Zwisler, Matthew Wilcox, linux-kernel; +Cc: linux-fsdevel

From: Ross Zwisler <ross.zwisler@linux.intel.com>

Some programs require HDIO_GETGEO work, which requires we implement
getgeo.  Based off of the work done to the NVMe driver in this commit:

4cc09e2dc4cb NVMe: Add getgeo to block ops

[Boaz] Converted original work done for prd.c for here.
       This is needed if we want to support partitions, fdisk
       calls this.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
---
 drivers/block/brd.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index a10a0a9..3f07cb4 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -19,6 +19,7 @@
 #include <linux/radix-tree.h>
 #include <linux/fs.h>
 #include <linux/slab.h>
+#include <linux/hdreg.h>
 
 #include <asm/uaccess.h>
 
@@ -424,6 +425,15 @@ static int brd_ioctl(struct block_device *bdev, fmode_t mode,
 	return error;
 }
 
+static int brd_getgeo(struct block_device *bd, struct hd_geometry *geo)
+{
+	/* some standard values */
+	geo->heads = 1 << 6;
+	geo->sectors = 1 << 5;
+	geo->cylinders = get_capacity(bd->bd_disk) >> 11;
+	return 0;
+}
+
 static const struct block_device_operations brd_fops = {
 	.owner =		THIS_MODULE,
 	.rw_page =		brd_rw_page,
@@ -431,6 +441,7 @@ static const struct block_device_operations brd_fops = {
 #ifdef CONFIG_BLK_DEV_XIP
 	.direct_access =	brd_direct_access,
 #endif
+	.getgeo =		brd_getgeo,
 };
 
 /*
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/4] brd: Fix all partitions BUGs
  2014-08-06 11:27 [PATCHSET 0/4] brd: partition fixes Boaz Harrosh
  2014-08-06 11:29 ` [PATCH 1/4] Change direct_access calling convention Boaz Harrosh
  2014-08-06 11:30 ` [PATCH 2/4] brd: Add getgeo to block ops Boaz Harrosh
@ 2014-08-06 11:33 ` Boaz Harrosh
  2014-08-06 23:06   ` Ross Zwisler
  2014-08-07 18:53   ` Ross Zwisler
  2014-08-06 11:35 ` [PATCH 4/4] brd: Request from fdisk 4k alignment Boaz Harrosh
  3 siblings, 2 replies; 22+ messages in thread
From: Boaz Harrosh @ 2014-08-06 11:33 UTC (permalink / raw)
  To: Jens Axboe, Ross Zwisler, Matthew Wilcox, linux-kernel; +Cc: linux-fsdevel


This patch fixes up brd's partitions scheme, now enjoying all worlds.

The MAIN fix here is that currently if one fdisks some partitions,
a BAD bug will make all partitions point to the same start-end sector
ie: 0 - brd_size And an mkfs of any partition would trash the partition
table and the other partition.

Another fix is that "mount -U uuid" did not work, because of the
GENHD_FL_SUPPRESS_PARTITION_INFO flag.

So NOW the logic goes like this:
* max_part - Just says how many minors to reserve between devices
  But in any way, there can be as many partition as requested.
  If minors between devices ends, then dynamic 259-major ids will
  be allocated on the fly.
  The default is now max_part=1, which means all partitions devt
  will be from the dynamic major-range.
  (If persistent partition minors is needed use max_part=)

* Creation of new devices on the fly still/always work:
  mknod /path/devnod b 1 X
  fdisk -l /path/devnod
  Will create a new device if (X / max_part) was not already
  created before. (Just as before)

  partitions on the dynamically created device will work as well
  Same logic applies with minors as with the pre-created ones.

TODO: dynamic grow of device size, maybe through sysfs. So each
      device can have it's own size.

Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
---
 drivers/block/brd.c | 93 +++++++++++++++++++++--------------------------------
 1 file changed, 36 insertions(+), 57 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 3f07cb4..9673704 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -447,16 +447,18 @@ static const struct block_device_operations brd_fops = {
 /*
  * And now the modules code and kernel interface.
  */
-static int rd_nr;
-int rd_size = CONFIG_BLK_DEV_RAM_SIZE;
-static int max_part;
-static int part_shift;
+static int rd_nr = CONFIG_BLK_DEV_RAM_COUNT;
 module_param(rd_nr, int, S_IRUGO);
 MODULE_PARM_DESC(rd_nr, "Maximum number of brd devices");
+
+int rd_size = CONFIG_BLK_DEV_RAM_SIZE;
 module_param(rd_size, int, S_IRUGO);
 MODULE_PARM_DESC(rd_size, "Size of each RAM disk in kbytes.");
+
+static int max_part = 1;
 module_param(max_part, int, S_IRUGO);
-MODULE_PARM_DESC(max_part, "Maximum number of partitions per RAM disk");
+MODULE_PARM_DESC(max_part, "Num Minors to reserve between devices");
+
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_BLOCKDEV_MAJOR(RAMDISK_MAJOR);
 MODULE_ALIAS("rd");
@@ -502,15 +504,15 @@ static struct brd_device *brd_alloc(int i)
 	brd->brd_queue->limits.discard_zeroes_data = 1;
 	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, brd->brd_queue);
 
-	disk = brd->brd_disk = alloc_disk(1 << part_shift);
+	disk = brd->brd_disk = alloc_disk(max_part);
 	if (!disk)
 		goto out_free_queue;
 	disk->major		= RAMDISK_MAJOR;
-	disk->first_minor	= i << part_shift;
+	disk->first_minor	= i * max_part;
 	disk->fops		= &brd_fops;
 	disk->private_data	= brd;
 	disk->queue		= brd->brd_queue;
-	disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO;
+	disk->flags		= GENHD_FL_EXT_DEVT;
 	sprintf(disk->disk_name, "ram%d", i);
 	set_capacity(disk, rd_size * 2);
 
@@ -532,10 +534,11 @@ static void brd_free(struct brd_device *brd)
 	kfree(brd);
 }
 
-static struct brd_device *brd_init_one(int i)
+static struct brd_device *brd_init_one(int i, bool *new)
 {
 	struct brd_device *brd;
 
+	*new = false;
 	list_for_each_entry(brd, &brd_devices, brd_list) {
 		if (brd->brd_number == i)
 			goto out;
@@ -546,6 +549,7 @@ static struct brd_device *brd_init_one(int i)
 		add_disk(brd->brd_disk);
 		list_add_tail(&brd->brd_list, &brd_devices);
 	}
+	*new = true;
 out:
 	return brd;
 }
@@ -561,70 +565,46 @@ static struct kobject *brd_probe(dev_t dev, int *part, void *data)
 {
 	struct brd_device *brd;
 	struct kobject *kobj;
+	bool new;
 
 	mutex_lock(&brd_devices_mutex);
-	brd = brd_init_one(MINOR(dev) >> part_shift);
+	brd = brd_init_one(MINOR(dev) / max_part, &new);
 	kobj = brd ? get_disk(brd->brd_disk) : NULL;
 	mutex_unlock(&brd_devices_mutex);
 
-	*part = 0;
+	if (new)
+		*part = 0;
+
 	return kobj;
 }
 
 static int __init brd_init(void)
 {
-	int i, nr;
-	unsigned long range;
 	struct brd_device *brd, *next;
+	int i;
 
 	/*
 	 * brd module now has a feature to instantiate underlying device
 	 * structure on-demand, provided that there is an access dev node.
-	 * However, this will not work well with user space tool that doesn't
-	 * know about such "feature".  In order to not break any existing
-	 * tool, we do the following:
 	 *
-	 * (1) if rd_nr is specified, create that many upfront, and this
-	 *     also becomes a hard limit.
-	 * (2) if rd_nr is not specified, create CONFIG_BLK_DEV_RAM_COUNT
-	 *     (default 16) rd device on module load, user can further
-	 *     extend brd device by create dev node themselves and have
-	 *     kernel automatically instantiate actual device on-demand.
+	 * (1) if rd_nr is specified, create that many upfront. else
+	 *     it defaults to CONFIG_BLK_DEV_RAM_COUNT
+	 * (2) User can further extend brd devices by create dev node themselves
+	 *     and have kernel automatically instantiate actual device
+	 *     on-demand. Example:
+	 *		mknod /path/devnod_name b 1 X	# 1 is the rd major
+	 *		fdisk -l /path/devnod_name
+	 *	If (X / max_part) was not already created it will be created
+	 *	dynamically.
 	 */
 
-	part_shift = 0;
-	if (max_part > 0) {
-		part_shift = fls(max_part);
-
-		/*
-		 * Adjust max_part according to part_shift as it is exported
-		 * to user space so that user can decide correct minor number
-		 * if [s]he want to create more devices.
-		 *
-		 * Note that -1 is required because partition 0 is reserved
-		 * for the whole disk.
-		 */
-		max_part = (1UL << part_shift) - 1;
-	}
-
-	if ((1UL << part_shift) > DISK_MAX_PARTS)
-		return -EINVAL;
-
-	if (rd_nr > 1UL << (MINORBITS - part_shift))
-		return -EINVAL;
-
-	if (rd_nr) {
-		nr = rd_nr;
-		range = rd_nr << part_shift;
-	} else {
-		nr = CONFIG_BLK_DEV_RAM_COUNT;
-		range = 1UL << MINORBITS;
-	}
-
 	if (register_blkdev(RAMDISK_MAJOR, "ramdisk"))
 		return -EIO;
 
-	for (i = 0; i < nr; i++) {
+	if (unlikely(!max_part))
+		max_part = 1;
+
+	for (i = 0; i < rd_nr; i++) {
 		brd = brd_alloc(i);
 		if (!brd)
 			goto out_free;
@@ -636,7 +616,7 @@ static int __init brd_init(void)
 	list_for_each_entry(brd, &brd_devices, brd_list)
 		add_disk(brd->brd_disk);
 
-	blk_register_region(MKDEV(RAMDISK_MAJOR, 0), range,
+	blk_register_region(MKDEV(RAMDISK_MAJOR, 0), 1UL << MINORBITS,
 				  THIS_MODULE, brd_probe, NULL, NULL);
 
 	printk(KERN_INFO "brd: module loaded\n");
@@ -649,21 +629,20 @@ out_free:
 	}
 	unregister_blkdev(RAMDISK_MAJOR, "ramdisk");
 
+	printk(KERN_INFO "brd: module NOT loaded !!!\n");
 	return -ENOMEM;
 }
 
 static void __exit brd_exit(void)
 {
-	unsigned long range;
 	struct brd_device *brd, *next;
 
-	range = rd_nr ? rd_nr << part_shift : 1UL << MINORBITS;
-
 	list_for_each_entry_safe(brd, next, &brd_devices, brd_list)
 		brd_del_one(brd);
 
-	blk_unregister_region(MKDEV(RAMDISK_MAJOR, 0), range);
+	blk_unregister_region(MKDEV(RAMDISK_MAJOR, 0), 1UL << MINORBITS);
 	unregister_blkdev(RAMDISK_MAJOR, "ramdisk");
+	printk(KERN_INFO "brd: module unloaded\n");
 }
 
 module_init(brd_init);
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/4] brd: Request from fdisk 4k alignment
  2014-08-06 11:27 [PATCHSET 0/4] brd: partition fixes Boaz Harrosh
                   ` (2 preceding siblings ...)
  2014-08-06 11:33 ` [PATCH 3/4] brd: Fix all partitions BUGs Boaz Harrosh
@ 2014-08-06 11:35 ` Boaz Harrosh
  2014-08-06 22:03   ` Ross Zwisler
  3 siblings, 1 reply; 22+ messages in thread
From: Boaz Harrosh @ 2014-08-06 11:35 UTC (permalink / raw)
  To: Jens Axboe, Ross Zwisler, Matthew Wilcox, linux-kernel; +Cc: linux-fsdevel


Because of the direct_access() API which returns a PFN. partitions
better start on 4K boundary, else offset ZERO of a partition will
not be aligned and blk_direct_access() will fail the call.

By setting blk_queue_physical_block_size(PAGE_SIZE) we can communicate
this to fdisk and friends.
Note that blk_queue_physical_block_size() also trashes io_min, but
we can leave this one to be 512.

Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
---
 drivers/block/brd.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 9673704..514cfe1 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -495,10 +495,17 @@ static struct brd_device *brd_alloc(int i)
 	brd->brd_queue = blk_alloc_queue(GFP_KERNEL);
 	if (!brd->brd_queue)
 		goto out_free_dev;
+
 	blk_queue_make_request(brd->brd_queue, brd_make_request);
 	blk_queue_max_hw_sectors(brd->brd_queue, 1024);
 	blk_queue_bounce_limit(brd->brd_queue, BLK_BOUNCE_ANY);
 
+	/* This is so fdisk will align partitions on 4k, because of
+	 * direct_access API needing 4k alignment, returning a PFN
+	 */
+	blk_queue_physical_block_size(brd->brd_queue, PAGE_SIZE);
+	brd->brd_queue->limits.io_min = 512; /* Don't use the accessor */
+
 	brd->brd_queue->limits.discard_granularity = PAGE_SIZE;
 	brd->brd_queue->limits.max_discard_sectors = UINT_MAX;
 	brd->brd_queue->limits.discard_zeroes_data = 1;
-- 
1.9.3



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] brd: Add getgeo to block ops
  2014-08-06 11:30 ` [PATCH 2/4] brd: Add getgeo to block ops Boaz Harrosh
@ 2014-08-06 17:52   ` Ross Zwisler
  2014-08-07  9:20     ` Boaz Harrosh
  2014-08-07 14:03   ` [PATCH 2/4 v2] " Boaz Harrosh
  1 sibling, 1 reply; 22+ messages in thread
From: Ross Zwisler @ 2014-08-06 17:52 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Jens Axboe, Ross Zwisler, Matthew Wilcox, linux-kernel, linux-fsdevel

On Wed, 6 Aug 2014, Boaz Harrosh wrote:
> From: Ross Zwisler <ross.zwisler@linux.intel.com>
> 
> Some programs require HDIO_GETGEO work, which requires we implement
> getgeo.  Based off of the work done to the NVMe driver in this commit:
> 
> 4cc09e2dc4cb NVMe: Add getgeo to block ops
> 
> [Boaz] Converted original work done for prd.c for here.
>        This is needed if we want to support partitions, fdisk
>        calls this.
> 
> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
> Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
> ---
>  drivers/block/brd.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/block/brd.c b/drivers/block/brd.c
> index a10a0a9..3f07cb4 100644
> --- a/drivers/block/brd.c
> +++ b/drivers/block/brd.c
> @@ -19,6 +19,7 @@
>  #include <linux/radix-tree.h>
>  #include <linux/fs.h>
>  #include <linux/slab.h>
> +#include <linux/hdreg.h>
>  
>  #include <asm/uaccess.h>
>  
> @@ -424,6 +425,15 @@ static int brd_ioctl(struct block_device *bdev, fmode_t mode,
>  	return error;
>  }
>  
> +static int brd_getgeo(struct block_device *bd, struct hd_geometry *geo)
> +{
> +	/* some standard values */
> +	geo->heads = 1 << 6;
> +	geo->sectors = 1 << 5;
> +	geo->cylinders = get_capacity(bd->bd_disk) >> 11;
> +	return 0;
> +}
> +
>  static const struct block_device_operations brd_fops = {
>  	.owner =		THIS_MODULE,
>  	.rw_page =		brd_rw_page,
> @@ -431,6 +441,7 @@ static const struct block_device_operations brd_fops = {
>  #ifdef CONFIG_BLK_DEV_XIP
>  	.direct_access =	brd_direct_access,
>  #endif
> +	.getgeo =		brd_getgeo,
>  };
>  
>  /*
> -- 
> 1.9.3

This looks good.

- Ross

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] brd: Request from fdisk 4k alignment
  2014-08-06 11:35 ` [PATCH 4/4] brd: Request from fdisk 4k alignment Boaz Harrosh
@ 2014-08-06 22:03   ` Ross Zwisler
  2014-08-07 12:17     ` Boaz Harrosh
  0 siblings, 1 reply; 22+ messages in thread
From: Ross Zwisler @ 2014-08-06 22:03 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Jens Axboe, Matthew Wilcox, linux-kernel, linux-fsdevel

On Wed, 2014-08-06 at 14:35 +0300, Boaz Harrosh wrote:
> Because of the direct_access() API which returns a PFN. partitions
> better start on 4K boundary, else offset ZERO of a partition will
> not be aligned and blk_direct_access() will fail the call.
> 
> By setting blk_queue_physical_block_size(PAGE_SIZE) we can communicate
> this to fdisk and friends.
> Note that blk_queue_physical_block_size() also trashes io_min, but
> we can leave this one to be 512.
> 
> Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
> ---
>  drivers/block/brd.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/block/brd.c b/drivers/block/brd.c
> index 9673704..514cfe1 100644
> --- a/drivers/block/brd.c
> +++ b/drivers/block/brd.c
> @@ -495,10 +495,17 @@ static struct brd_device *brd_alloc(int i)
>  	brd->brd_queue = blk_alloc_queue(GFP_KERNEL);
>  	if (!brd->brd_queue)
>  		goto out_free_dev;
> +
>  	blk_queue_make_request(brd->brd_queue, brd_make_request);
>  	blk_queue_max_hw_sectors(brd->brd_queue, 1024);
>  	blk_queue_bounce_limit(brd->brd_queue, BLK_BOUNCE_ANY);
>  
> +	/* This is so fdisk will align partitions on 4k, because of
> +	 * direct_access API needing 4k alignment, returning a PFN
> +	 */
> +	blk_queue_physical_block_size(brd->brd_queue, PAGE_SIZE);
> +	brd->brd_queue->limits.io_min = 512; /* Don't use the accessor */
> +
>  	brd->brd_queue->limits.discard_granularity = PAGE_SIZE;
>  	brd->brd_queue->limits.max_discard_sectors = UINT_MAX;
>  	brd->brd_queue->limits.discard_zeroes_data = 1;

Is there an error case that this patch fixes?  I've had page alignment checks
in my PRD direct_access code forever, and I don't know if they've ever
tripped.  

Also, blk_queue_physical_block_size() seems wrong - here's the comment for
that function:

	/**
	 * blk_queue_physical_block_size - set physical block size for the queue
	 * @q:  the request queue for the device
	 * @size:  the physical block size, in bytes
	 *
	 * Description:
	 *   This should be set to the lowest possible sector size that the
	 *   hardware can operate on without reverting to read-modify-write
	 *   operations.
	 */

It doesn't sound like this is what you're after?  It sounds like instead you
want to control the alignment, not minimum natural I/O size?  It seems like if
you did want to do this, blk_queue_alignment_offset() would be more what you
were after?

- Ross



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] brd: Fix all partitions BUGs
  2014-08-06 11:33 ` [PATCH 3/4] brd: Fix all partitions BUGs Boaz Harrosh
@ 2014-08-06 23:06   ` Ross Zwisler
  2014-08-07  9:11     ` Boaz Harrosh
  2014-08-07 18:53   ` Ross Zwisler
  1 sibling, 1 reply; 22+ messages in thread
From: Ross Zwisler @ 2014-08-06 23:06 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Jens Axboe, Matthew Wilcox, linux-kernel, linux-fsdevel

On Wed, 2014-08-06 at 14:33 +0300, Boaz Harrosh wrote:
> This patch fixes up brd's partitions scheme, now enjoying all worlds.
> 
> The MAIN fix here is that currently if one fdisks some partitions,
> a BAD bug will make all partitions point to the same start-end sector
> ie: 0 - brd_size And an mkfs of any partition would trash the partition
> table and the other partition.
> 
> Another fix is that "mount -U uuid" did not work, because of the
> GENHD_FL_SUPPRESS_PARTITION_INFO flag.
> 
> So NOW the logic goes like this:
> * max_part - Just says how many minors to reserve between devices
>   But in any way, there can be as many partition as requested.
>   If minors between devices ends, then dynamic 259-major ids will
>   be allocated on the fly.
>   The default is now max_part=1, which means all partitions devt
>   will be from the dynamic major-range.
>   (If persistent partition minors is needed use max_part=)
> 
> * Creation of new devices on the fly still/always work:
>   mknod /path/devnod b 1 X
>   fdisk -l /path/devnod
>   Will create a new device if (X / max_part) was not already
>   created before. (Just as before)
> 
>   partitions on the dynamically created device will work as well
>   Same logic applies with minors as with the pre-created ones.
> 
> TODO: dynamic grow of device size, maybe through sysfs. So each
>       device can have it's own size.

With this patch we end up in what feels like a weird place where we're half
using the old scheme of major/minor allocation, and half in the world of
dynamic major/minors.  Devices have a major of 1 and minors that increment by
1, but partitions have a major of 259 (BLOCK_EXT_MAJOR):

	brw-rw---- 1 root disk   1, 0 Aug  6 14:10 /dev/ram0
	brw-rw---- 1 root disk   1, 1 Aug  6 14:13 /dev/ram1
	brw-rw---- 1 root disk 259, 0 Aug  6 14:14 /dev/ram1p1
	brw-rw---- 1 root disk 259, 1 Aug  6 14:13 /dev/ram1p2
	brw-rw---- 1 root disk 259, 2 Aug  6 14:14 /dev/ram1p51

For NVMe and PRD you get a major of 259 all around:

	brw-rw---- 1 root disk 259, 0 Aug  6 16:55 /dev/pmem0
	brw-rw---- 1 root disk 259, 2 Aug  6 16:55 /dev/pmem0p1
	brw-rw---- 1 root disk 259, 3 Aug  6 16:55 /dev/pmem0p2
	brw-rw---- 1 root disk 259, 1 Aug  6 16:54 /dev/pmem1

It could be that this is fine, but it just smells fishy to me I guess.

Also, it looks like you can still create a new device with this patch, but you
can't create partitions on that device.  Not sure if this is just what you get
when you dynamically create a device on the fly, or if it's a symptom of
something larger.

- Ross

> 
> Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
> ---
>  drivers/block/brd.c | 93 +++++++++++++++++++++--------------------------------
>  1 file changed, 36 insertions(+), 57 deletions(-)
> 
> diff --git a/drivers/block/brd.c b/drivers/block/brd.c
> index 3f07cb4..9673704 100644
> --- a/drivers/block/brd.c
> +++ b/drivers/block/brd.c
> @@ -447,16 +447,18 @@ static const struct block_device_operations brd_fops = {
>  /*
>   * And now the modules code and kernel interface.
>   */
> -static int rd_nr;
> -int rd_size = CONFIG_BLK_DEV_RAM_SIZE;
> -static int max_part;
> -static int part_shift;
> +static int rd_nr = CONFIG_BLK_DEV_RAM_COUNT;
>  module_param(rd_nr, int, S_IRUGO);
>  MODULE_PARM_DESC(rd_nr, "Maximum number of brd devices");
> +
> +int rd_size = CONFIG_BLK_DEV_RAM_SIZE;
>  module_param(rd_size, int, S_IRUGO);
>  MODULE_PARM_DESC(rd_size, "Size of each RAM disk in kbytes.");
> +
> +static int max_part = 1;
>  module_param(max_part, int, S_IRUGO);
> -MODULE_PARM_DESC(max_part, "Maximum number of partitions per RAM disk");
> +MODULE_PARM_DESC(max_part, "Num Minors to reserve between devices");
> +
>  MODULE_LICENSE("GPL");
>  MODULE_ALIAS_BLOCKDEV_MAJOR(RAMDISK_MAJOR);
>  MODULE_ALIAS("rd");
> @@ -502,15 +504,15 @@ static struct brd_device *brd_alloc(int i)
>  	brd->brd_queue->limits.discard_zeroes_data = 1;
>  	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, brd->brd_queue);
>  
> -	disk = brd->brd_disk = alloc_disk(1 << part_shift);
> +	disk = brd->brd_disk = alloc_disk(max_part);
>  	if (!disk)
>  		goto out_free_queue;
>  	disk->major		= RAMDISK_MAJOR;
> -	disk->first_minor	= i << part_shift;
> +	disk->first_minor	= i * max_part;
>  	disk->fops		= &brd_fops;
>  	disk->private_data	= brd;
>  	disk->queue		= brd->brd_queue;
> -	disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO;
> +	disk->flags		= GENHD_FL_EXT_DEVT;
>  	sprintf(disk->disk_name, "ram%d", i);
>  	set_capacity(disk, rd_size * 2);
>  
> @@ -532,10 +534,11 @@ static void brd_free(struct brd_device *brd)
>  	kfree(brd);
>  }
>  
> -static struct brd_device *brd_init_one(int i)
> +static struct brd_device *brd_init_one(int i, bool *new)
>  {
>  	struct brd_device *brd;
>  
> +	*new = false;
>  	list_for_each_entry(brd, &brd_devices, brd_list) {
>  		if (brd->brd_number == i)
>  			goto out;
> @@ -546,6 +549,7 @@ static struct brd_device *brd_init_one(int i)
>  		add_disk(brd->brd_disk);
>  		list_add_tail(&brd->brd_list, &brd_devices);
>  	}
> +	*new = true;
>  out:
>  	return brd;
>  }
> @@ -561,70 +565,46 @@ static struct kobject *brd_probe(dev_t dev, int *part, void *data)
>  {
>  	struct brd_device *brd;
>  	struct kobject *kobj;
> +	bool new;
>  
>  	mutex_lock(&brd_devices_mutex);
> -	brd = brd_init_one(MINOR(dev) >> part_shift);
> +	brd = brd_init_one(MINOR(dev) / max_part, &new);
>  	kobj = brd ? get_disk(brd->brd_disk) : NULL;
>  	mutex_unlock(&brd_devices_mutex);
>  
> -	*part = 0;
> +	if (new)
> +		*part = 0;
> +
>  	return kobj;
>  }
>  
>  static int __init brd_init(void)
>  {
> -	int i, nr;
> -	unsigned long range;
>  	struct brd_device *brd, *next;
> +	int i;
>  
>  	/*
>  	 * brd module now has a feature to instantiate underlying device
>  	 * structure on-demand, provided that there is an access dev node.
> -	 * However, this will not work well with user space tool that doesn't
> -	 * know about such "feature".  In order to not break any existing
> -	 * tool, we do the following:
>  	 *
> -	 * (1) if rd_nr is specified, create that many upfront, and this
> -	 *     also becomes a hard limit.
> -	 * (2) if rd_nr is not specified, create CONFIG_BLK_DEV_RAM_COUNT
> -	 *     (default 16) rd device on module load, user can further
> -	 *     extend brd device by create dev node themselves and have
> -	 *     kernel automatically instantiate actual device on-demand.
> +	 * (1) if rd_nr is specified, create that many upfront. else
> +	 *     it defaults to CONFIG_BLK_DEV_RAM_COUNT
> +	 * (2) User can further extend brd devices by create dev node themselves
> +	 *     and have kernel automatically instantiate actual device
> +	 *     on-demand. Example:
> +	 *		mknod /path/devnod_name b 1 X	# 1 is the rd major
> +	 *		fdisk -l /path/devnod_name
> +	 *	If (X / max_part) was not already created it will be created
> +	 *	dynamically.
>  	 */
>  
> -	part_shift = 0;
> -	if (max_part > 0) {
> -		part_shift = fls(max_part);
> -
> -		/*
> -		 * Adjust max_part according to part_shift as it is exported
> -		 * to user space so that user can decide correct minor number
> -		 * if [s]he want to create more devices.
> -		 *
> -		 * Note that -1 is required because partition 0 is reserved
> -		 * for the whole disk.
> -		 */
> -		max_part = (1UL << part_shift) - 1;
> -	}
> -
> -	if ((1UL << part_shift) > DISK_MAX_PARTS)
> -		return -EINVAL;
> -
> -	if (rd_nr > 1UL << (MINORBITS - part_shift))
> -		return -EINVAL;
> -
> -	if (rd_nr) {
> -		nr = rd_nr;
> -		range = rd_nr << part_shift;
> -	} else {
> -		nr = CONFIG_BLK_DEV_RAM_COUNT;
> -		range = 1UL << MINORBITS;
> -	}
> -
>  	if (register_blkdev(RAMDISK_MAJOR, "ramdisk"))
>  		return -EIO;
>  
> -	for (i = 0; i < nr; i++) {
> +	if (unlikely(!max_part))
> +		max_part = 1;
> +
> +	for (i = 0; i < rd_nr; i++) {
>  		brd = brd_alloc(i);
>  		if (!brd)
>  			goto out_free;
> @@ -636,7 +616,7 @@ static int __init brd_init(void)
>  	list_for_each_entry(brd, &brd_devices, brd_list)
>  		add_disk(brd->brd_disk);
>  
> -	blk_register_region(MKDEV(RAMDISK_MAJOR, 0), range,
> +	blk_register_region(MKDEV(RAMDISK_MAJOR, 0), 1UL << MINORBITS,
>  				  THIS_MODULE, brd_probe, NULL, NULL);
>  
>  	printk(KERN_INFO "brd: module loaded\n");
> @@ -649,21 +629,20 @@ out_free:
>  	}
>  	unregister_blkdev(RAMDISK_MAJOR, "ramdisk");
>  
> +	printk(KERN_INFO "brd: module NOT loaded !!!\n");
>  	return -ENOMEM;
>  }
>  
>  static void __exit brd_exit(void)
>  {
> -	unsigned long range;
>  	struct brd_device *brd, *next;
>  
> -	range = rd_nr ? rd_nr << part_shift : 1UL << MINORBITS;
> -
>  	list_for_each_entry_safe(brd, next, &brd_devices, brd_list)
>  		brd_del_one(brd);
>  
> -	blk_unregister_region(MKDEV(RAMDISK_MAJOR, 0), range);
> +	blk_unregister_region(MKDEV(RAMDISK_MAJOR, 0), 1UL << MINORBITS);
>  	unregister_blkdev(RAMDISK_MAJOR, "ramdisk");
> +	printk(KERN_INFO "brd: module unloaded\n");
>  }
>  
>  module_init(brd_init);





^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] brd: Fix all partitions BUGs
  2014-08-06 23:06   ` Ross Zwisler
@ 2014-08-07  9:11     ` Boaz Harrosh
  2014-08-07 18:50       ` Ross Zwisler
  0 siblings, 1 reply; 22+ messages in thread
From: Boaz Harrosh @ 2014-08-07  9:11 UTC (permalink / raw)
  To: Ross Zwisler, Boaz Harrosh
  Cc: Jens Axboe, Matthew Wilcox, linux-kernel, linux-fsdevel

On 08/07/2014 02:06 AM, Ross Zwisler wrote:
> On Wed, 2014-08-06 at 14:33 +0300, Boaz Harrosh wrote:
> 
> With this patch we end up in what feels like a weird place where we're half
> using the old scheme of major/minor allocation, and half in the world of
> dynamic major/minors.  Devices have a major of 1 and minors that increment by
> 1, but partitions have a major of 259 (BLOCK_EXT_MAJOR):
> 
> 	brw-rw---- 1 root disk   1, 0 Aug  6 14:10 /dev/ram0
> 	brw-rw---- 1 root disk   1, 1 Aug  6 14:13 /dev/ram1
> 	brw-rw---- 1 root disk 259, 0 Aug  6 14:14 /dev/ram1p1
> 	brw-rw---- 1 root disk 259, 1 Aug  6 14:13 /dev/ram1p2
> 	brw-rw---- 1 root disk 259, 2 Aug  6 14:14 /dev/ram1p51
> 

This is when you use max_part = 1. If someone really cares, why
would he care, but if he cares like if he wants persistent
partition minors than he can use  max_part = 8, say and it will
all go nice. (And if he creates 9 partitions on a device then
the last one will be from the 259 dynamic range)

So you see it is all in the user choice. If he cares at all
because I do not see why it matters

> For NVMe and PRD you get a major of 259 all around:
> 
> 	brw-rw---- 1 root disk 259, 0 Aug  6 16:55 /dev/pmem0
> 	brw-rw---- 1 root disk 259, 2 Aug  6 16:55 /dev/pmem0p1
> 	brw-rw---- 1 root disk 259, 3 Aug  6 16:55 /dev/pmem0p2
> 	brw-rw---- 1 root disk 259, 1 Aug  6 16:54 /dev/pmem1
> 
> It could be that this is fine, but it just smells fishy to me I guess.
> 

If you go this way then dynamic creation with the mknod trick does not work
because there is no association in the system with your driver. Whats so fishy
I have read the all partitions code this is simple math really. This is the way
it works.

> Also, it looks like you can still create a new device with this patch, but you
> can't create partitions on that device.  Not sure if this is just what you get
> when you dynamically create a device on the fly, or if it's a symptom of
> something larger.
> 

What? I just tried again this all works fine for me, here with fdisk.
$ modprobe brd      # will create ram0-7
$ mknod /dev/ram8 b 1 8
$ fdisk /dev/ram8
  g, n, , , +2M, n, , , , , w 

I create 2 partitions 2M each and press w and it is all there.

What numbers did you use ? rd_nr, max_part, and the mknod numbers. Here it
just works fine. What did you try?

Thanks
Boaz


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4] brd: Add getgeo to block ops
  2014-08-06 17:52   ` Ross Zwisler
@ 2014-08-07  9:20     ` Boaz Harrosh
  0 siblings, 0 replies; 22+ messages in thread
From: Boaz Harrosh @ 2014-08-07  9:20 UTC (permalink / raw)
  To: Ross Zwisler, Boaz Harrosh
  Cc: Jens Axboe, Matthew Wilcox, linux-kernel, linux-fsdevel

On 08/06/2014 08:52 PM, Ross Zwisler wrote:
> On Wed, 6 Aug 2014, Boaz Harrosh wrote:
>> From: Ross Zwisler <ross.zwisler@linux.intel.com>
>>
>> Some programs require HDIO_GETGEO work, which requires we implement
>> getgeo.  Based off of the work done to the NVMe driver in this commit:
>>
<>
>> +static int brd_getgeo(struct block_device *bd, struct hd_geometry *geo)
>> +{
>> +	/* some standard values */
>> +	geo->heads = 1 << 6;
>> +	geo->sectors = 1 << 5;
>> +	geo->cylinders = get_capacity(bd->bd_disk) >> 11;
>> +	return 0;
>> +}
>> +
<> 
> This looks good.
> 

Actually this one is wrong. When I go into fdisk it always gives me
34 as the first possible sector. (And with my 4k patch it gives me
40)

But what I really want is just plain 8. All I want is to waist first
4k for partition table and enough is enough. So actually this above
is wrong.

All we need is :
+	geo->heads = 1 ;
+	geo->sectors = 1 ;
+	geo->cylinders = 1;

Now do not worry the capacity is never extracted from this 32bit structure
for a long time. The capacity is fetched else where, in all programs. This
here just gives you that wired HD math calculation for this 34 sectors
alignment crap.

I just tried it works perfectly with fdisk and gpart. So I'll post a patch.

Thanks
Boaz


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] brd: Request from fdisk 4k alignment
  2014-08-06 22:03   ` Ross Zwisler
@ 2014-08-07 12:17     ` Boaz Harrosh
  2014-08-07 13:00       ` Karel Zak
  0 siblings, 1 reply; 22+ messages in thread
From: Boaz Harrosh @ 2014-08-07 12:17 UTC (permalink / raw)
  To: Ross Zwisler, Boaz Harrosh
  Cc: Jens Axboe, Matthew Wilcox, linux-kernel, linux-fsdevel

On 08/07/2014 01:03 AM, Ross Zwisler wrote:
> On Wed, 2014-08-06 at 14:35 +0300, Boaz Harrosh wrote:
>> Because of the direct_access() API which returns a PFN. partitions
>> better start on 4K boundary, else offset ZERO of a partition will
>> not be aligned and blk_direct_access() will fail the call.
>>
>> By setting blk_queue_physical_block_size(PAGE_SIZE) we can communicate
>> this to fdisk and friends.
>> Note that blk_queue_physical_block_size() also trashes io_min, but
>> we can leave this one to be 512.
>>
>> Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
>> ---
>>  drivers/block/brd.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/block/brd.c b/drivers/block/brd.c
>> index 9673704..514cfe1 100644
>> --- a/drivers/block/brd.c
>> +++ b/drivers/block/brd.c
>> @@ -495,10 +495,17 @@ static struct brd_device *brd_alloc(int i)
>>  	brd->brd_queue = blk_alloc_queue(GFP_KERNEL);
>>  	if (!brd->brd_queue)
>>  		goto out_free_dev;
>> +
>>  	blk_queue_make_request(brd->brd_queue, brd_make_request);
>>  	blk_queue_max_hw_sectors(brd->brd_queue, 1024);
>>  	blk_queue_bounce_limit(brd->brd_queue, BLK_BOUNCE_ANY);
>>  
>> +	/* This is so fdisk will align partitions on 4k, because of
>> +	 * direct_access API needing 4k alignment, returning a PFN
>> +	 */
>> +	blk_queue_physical_block_size(brd->brd_queue, PAGE_SIZE);
>> +	brd->brd_queue->limits.io_min = 512; /* Don't use the accessor */
>> +
>>  	brd->brd_queue->limits.discard_granularity = PAGE_SIZE;
>>  	brd->brd_queue->limits.max_discard_sectors = UINT_MAX;
>>  	brd->brd_queue->limits.discard_zeroes_data = 1;
> 
> Is there an error case that this patch fixes?  I've had page alignment checks
> in my PRD direct_access code forever, and I don't know if they've ever
> tripped.  
> 

Yes! as I said above fix fdisk. You never tripped on it because partitions never
worked and you never tried them. With current code fdisk is very trigger happy
to miss-align my partitions. Depending on size maybe not the very first one but the
consecutive ones easily.
With this patch user can still do wrong, but all the default values offered
by fdisk are correct, specially for start-sector which is the one we care about
most. So yes this is an important fix. 

> Also, blk_queue_physical_block_size() seems wrong - here's the comment for
> that function:
> 
> 	/**
> 	 * blk_queue_physical_block_size - set physical block size for the queue
> 	 * @q:  the request queue for the device
> 	 * @size:  the physical block size, in bytes
> 	 *
> 	 * Description:
> 	 *   This should be set to the lowest possible sector size that the
> 	 *   hardware can operate on without reverting to read-modify-write
> 	 *   operations.
> 	 */
> 

This text is just text. But in practice all this is used for is alignment.
Actually at Kernel it is not used, it is just exported through the disk limits
sysfs info to user space.

I can see in code that there are bunch of fast flash devices like Micron that do the
same, and also like zram and others, this is the only one that gets the job done, that
I can find.

And then this is the only one that caused fdisk to jump from 34 to 40.
And with my new getgeo patch jump to 8, and align all partition starts on
8. try it for your self.

> It doesn't sound like this is what you're after?  It sounds like instead you
> want to control the alignment, not minimum natural I/O size?  It seems like if
> you did want to do this, blk_queue_alignment_offset() would be more what you
> were after?
> 

No. I tried that one. It only helps with first partition start. And not so much.
(And it is actually masked by limits->physical_block_size)

fdisk will not look at it only Kernel try's to do something with it after it
is too late and the partition table is received wrong.

With what I searched in code and what I tried with fdisk this does exactly what I
want, scares everyone to align my sectors on 8!

I would like to see some bad code doing because of this one? for me and with
DAX access it can do no harm whats so ever. What are you afraid that can happen
with this? Do you imagine applications doing extra copy because of that? That
does not make sense. The read-modify-write is faster at the device then at
application so the application should let it be as is and not do anything extra

> - Ross
> 
> 

Please play with this for a while. I have for a week now. It looks very good
and the only one that does what I want. Actually I'm very happy that I found
this, I was afraid at first that nothing will work, and fdisk will keep give
me wrong start sectors.

Thanks
Boaz


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] brd: Request from fdisk 4k alignment
  2014-08-07 12:17     ` Boaz Harrosh
@ 2014-08-07 13:00       ` Karel Zak
  2014-08-07 13:51         ` Karel Zak
  2014-08-07 13:57         ` Boaz Harrosh
  0 siblings, 2 replies; 22+ messages in thread
From: Karel Zak @ 2014-08-07 13:00 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Ross Zwisler, Boaz Harrosh, Jens Axboe, Matthew Wilcox,
	linux-kernel, linux-fsdevel

On Thu, Aug 07, 2014 at 03:17:23PM +0300, Boaz Harrosh wrote:

> > Is there an error case that this patch fixes?  I've had page alignment checks
> > in my PRD direct_access code forever, and I don't know if they've ever
> > tripped.  
> > 
> 
> Yes! as I said above fix fdisk. You never tripped on it because partitions never
> worked and you never tried them. With current code fdisk is very trigger happy

 What do you mean with fdisk? which version?

 The current fdisk (and cfdisk) follows I/O limits it has no problem
 with 4K devices. All you need is to provide all necessary information 
 by /sys (or ioctls).

> to miss-align my partitions. Depending on size maybe not the very first one but the
> consecutive ones easily.

 it would be nice to have usable bug report...

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] brd: Request from fdisk 4k alignment
  2014-08-07 13:00       ` Karel Zak
@ 2014-08-07 13:51         ` Karel Zak
  2014-08-07 13:57         ` Boaz Harrosh
  1 sibling, 0 replies; 22+ messages in thread
From: Karel Zak @ 2014-08-07 13:51 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Ross Zwisler, Boaz Harrosh, Jens Axboe, Matthew Wilcox,
	linux-kernel, linux-fsdevel

On Thu, Aug 07, 2014 at 03:00:42PM +0200, Karel Zak wrote:
> On Thu, Aug 07, 2014 at 03:17:23PM +0300, Boaz Harrosh wrote:
> 
> > > Is there an error case that this patch fixes?  I've had page alignment checks
> > > in my PRD direct_access code forever, and I don't know if they've ever
> > > tripped.  
> > > 
> > 
> > Yes! as I said above fix fdisk. You never tripped on it because partitions never
> > worked and you never tried them. With current code fdisk is very trigger happy
> 
>  What do you mean with fdisk? which version?

Oh, I read all your email more carefully now, and if I good understand
the problem is what I/O limits the device provides to userspace rather
than with fdisk.

Anyway, I think you're right that alignment_offset is not the right
thing. It was introduced for backward compatibility with DOS-like
partitioning tools (~magical sector 63). I have doubts it's usable for
something else. For normal use-case should be enough to set proper
phy-sector size or min/optimal I/O limits (e.g. zram has all the
limits set to 4K(PAGE_SIZE)).

$ lsblk --topology /dev/zram0
NAME  ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
zram0         0   4096   4096    4096    4096    0           128 128 0B

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] brd: Request from fdisk 4k alignment
  2014-08-07 13:00       ` Karel Zak
  2014-08-07 13:51         ` Karel Zak
@ 2014-08-07 13:57         ` Boaz Harrosh
  2014-08-07 15:21           ` Karel Zak
  1 sibling, 1 reply; 22+ messages in thread
From: Boaz Harrosh @ 2014-08-07 13:57 UTC (permalink / raw)
  To: Karel Zak, Boaz Harrosh
  Cc: Ross Zwisler, Jens Axboe, Matthew Wilcox, linux-kernel, linux-fsdevel

On 08/07/2014 04:00 PM, Karel Zak wrote:
> On Thu, Aug 07, 2014 at 03:17:23PM +0300, Boaz Harrosh wrote:
> 
>>> Is there an error case that this patch fixes?  I've had page alignment checks
>>> in my PRD direct_access code forever, and I don't know if they've ever
>>> tripped.  
>>>
>>
>> Yes! as I said above fix fdisk. You never tripped on it because partitions never
>> worked and you never tried them. With current code fdisk is very trigger happy
> 
>  What do you mean with fdisk? which version?
> 

fdisk from util-linux 2.24.2


>  The current fdisk (and cfdisk) follows I/O limits it has no problem
>  with 4K devices. All you need is to provide all necessary information 
>  by /sys (or ioctls).
> 

I was not saying that fdisk is wrong. I was saying that if my block driver
was *not* exporting 4K physical sectors through limits.physical_block_size
then fdisk would be happy to not align my partition start on 4k and would
give me funny values like 34 for first sector which makes my device unusable
because in direct_access() API we must absolutely have 4K aligned partitions.

>> to miss-align my partitions. Depending on size maybe not the very first one but the
>> consecutive ones easily.
> 
>  it would be nice to have usable bug report...
> 

Hi Karel

Setting limits.physical_block_size = 4k; was the only way I found that could cause
fdisk to default to 4k alignment.

I was trying to play with the  heads, sectors, cylinders; values but none I tried
would cause an alignment of 4k, not even of the first partition start.

Please advise what I can do?

Thanks
Boaz

>     Karel
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 2/4 v2] brd: Add getgeo to block ops
  2014-08-06 11:30 ` [PATCH 2/4] brd: Add getgeo to block ops Boaz Harrosh
  2014-08-06 17:52   ` Ross Zwisler
@ 2014-08-07 14:03   ` Boaz Harrosh
  2014-08-07 18:20     ` One Thousand Gnomes
  1 sibling, 1 reply; 22+ messages in thread
From: Boaz Harrosh @ 2014-08-07 14:03 UTC (permalink / raw)
  To: Jens Axboe, Ross Zwisler, Matthew Wilcox, linux-kernel; +Cc: linux-fsdevel

From: Boaz Harrosh <boaz@plexistor.com>

Some programs like fdisk, require HDIO_GETGEO to work, which requires we
implement getgeo.

We set all hd_geometry members to 1, because this way fdisk
math will not try its crazy geometry math and get stuff totally wrong.

I was trying to get some values that will make fdisk Want to align
first sector on 4K (like 8, 16, 20, ... sectors) but nothing worked,
I searched the net the math is not your regular simple multiplication
at all.

If you managed to get these magic values please tell me. I would love to
solve this.

But for now we use 4k physical_sector for fixing fdisk alignment
issues, and setting these here to something that will not make
fdisk serve us with crazy numbers.

Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
---
 drivers/block/brd.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index a10a0a9..8be3747 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -19,6 +19,7 @@
 #include <linux/radix-tree.h>
 #include <linux/fs.h>
 #include <linux/slab.h>
+#include <linux/hdreg.h>
 
 #include <asm/uaccess.h>
 
@@ -424,6 +425,23 @@ static int brd_ioctl(struct block_device *bdev, fmode_t mode,
 	return error;
 }
 
+static int brd_getgeo(struct block_device *bd, struct hd_geometry *geo)
+{
+	/* Just tell fdisk to get out of the way. The math here is so
+	 * convoluted and does not make any sense at all. With all 1s
+	 * The math just gets out of the way.
+	 * NOTE: I was trying to get some values that will make fdisk
+	 * Want to align first sector on 4K (like 8, 16, 20, ... sectors) but
+	 * nothing worked, I searched the net the math is not your regular
+	 * simple multiplication at all. If you managed to get these please
+	 * fix here. For now we use 4k physical sectors for this
+	 */
+	geo->heads = 1;
+	geo->sectors = 1;
+	geo->cylinders = 1;
+	return 0;
+}
+
 static const struct block_device_operations brd_fops = {
 	.owner =		THIS_MODULE,
 	.rw_page =		brd_rw_page,
@@ -431,6 +449,7 @@ static const struct block_device_operations brd_fops = {
 #ifdef CONFIG_BLK_DEV_XIP
 	.direct_access =	brd_direct_access,
 #endif
+	.getgeo =		brd_getgeo,
 };
 
 /*
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] brd: Request from fdisk 4k alignment
  2014-08-07 13:57         ` Boaz Harrosh
@ 2014-08-07 15:21           ` Karel Zak
  2014-08-07 15:40             ` Boaz Harrosh
  0 siblings, 1 reply; 22+ messages in thread
From: Karel Zak @ 2014-08-07 15:21 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Boaz Harrosh, Ross Zwisler, Jens Axboe, Matthew Wilcox,
	linux-kernel, linux-fsdevel

On Thu, Aug 07, 2014 at 04:57:18PM +0300, Boaz Harrosh wrote:

> I was not saying that fdisk is wrong. I was saying that if my block driver
> was *not* exporting 4K physical sectors through limits.physical_block_size

 yep, sorry (it's probably bad idea to read emails and listen to talks on
 conference..)

> then fdisk would be happy to not align my partition start on 4k and would
> give me funny values like 34 for first sector which makes my device unusable
> because in direct_access() API we must absolutely have 4K aligned partitions.
> 
> >> to miss-align my partitions. Depending on size maybe not the very first one but the
> >> consecutive ones easily.
> > 
> >  it would be nice to have usable bug report...
> > 
> 
> Setting limits.physical_block_size = 4k; was the only way I found that could cause
> fdisk to default to 4k alignment.

 fdisk uses physical sector size or minimal I/O size (greater value wins)

> I was trying to play with the  heads, sectors, cylinders; values but none I tried

 don't play with CHS, that's waste of time and it's completely ignored
 by fdisk by default

> would cause an alignment of 4k, not even of the first partition start.
> 
> Please advise what I can do?

 IMHO you're right with your patch (alignment offset is IMHO bad way).
 It's all (brd) about pages, is there any reason to use something else
 for I/O limits?

 It would be also nice to set minimal and optimal io size, zero values
 in this case means (for userspace) that the device does not provide
 any I/O information to system. It's normal for old hw disks and then
 we use some built-in defaults, but I don't see a reason to do the
 same for virtual devices. 

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/4] brd: Request from fdisk 4k alignment
  2014-08-07 15:21           ` Karel Zak
@ 2014-08-07 15:40             ` Boaz Harrosh
  0 siblings, 0 replies; 22+ messages in thread
From: Boaz Harrosh @ 2014-08-07 15:40 UTC (permalink / raw)
  To: Karel Zak
  Cc: Boaz Harrosh, Ross Zwisler, Jens Axboe, Matthew Wilcox,
	linux-kernel, linux-fsdevel

On 08/07/2014 06:21 PM, Karel Zak wrote:
<>
> 
>  fdisk uses physical sector size or minimal I/O size (greater value wins)
> 

OK

>> I was trying to play with the  heads, sectors, cylinders; values but none I tried
> 
>  don't play with CHS, that's waste of time and it's completely ignored
>  by fdisk by default
> 
>> would cause an alignment of 4k, not even of the first partition start.
>>
>> Please advise what I can do?
> 
>  IMHO you're right with your patch (alignment offset is IMHO bad way).
>  It's all (brd) about pages, is there any reason to use something else
>  for I/O limits?
> 
>  It would be also nice to set minimal and optimal io size, zero values
>  in this case means (for userspace) that the device does not provide
>  any I/O information to system. It's normal for old hw disks and then
>  we use some built-in defaults, but I don't see a reason to do the
>  same for virtual devices. 
> 

Hi Ross

I have by now read the all code, and Karel also confirms this from fdisk
side. The best for us is the use of "physical sector size" but with
our "minimal I/O size" set to 512. The later has actual bad effects in
the Kernel code itself. But the "physical sector size" has no effect on
Kernel code, and actually has a very good affect on fdisk which now works
the way we would like it.

Please send your review-by so Jens can pick these up for mainline

>     Karel
> 

Thanks
Boaz


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4 v2] brd: Add getgeo to block ops
  2014-08-07 14:03   ` [PATCH 2/4 v2] " Boaz Harrosh
@ 2014-08-07 18:20     ` One Thousand Gnomes
  2014-08-08  6:52       ` Boaz Harrosh
  0 siblings, 1 reply; 22+ messages in thread
From: One Thousand Gnomes @ 2014-08-07 18:20 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Jens Axboe, Ross Zwisler, Matthew Wilcox, linux-kernel, linux-fsdevel

On Thu, 07 Aug 2014 17:03:08 +0300
Boaz Harrosh <boaz@plexistor.com> wrote:

> From: Boaz Harrosh <boaz@plexistor.com>
> 
> Some programs like fdisk, require HDIO_GETGEO to work, which requires we
> implement getgeo.
> 
> We set all hd_geometry members to 1, because this way fdisk
> math will not try its crazy geometry math and get stuff totally wrong.

If you are running a new storage system for god sake don't use DOS
partitioning, use GPT or something sane.

> +static int brd_getgeo(struct block_device *bd, struct hd_geometry *geo)
> +{
> +	/* Just tell fdisk to get out of the way. The math here is so
> +	 * convoluted and does not make any sense at all. With all 1s
> +	 * The math just gets out of the way.
> +	 * NOTE: I was trying to get some values that will make fdisk
> +	 * Want to align first sector on 4K (like 8, 16, 20, ... sectors) but
> +	 * nothing worked, I searched the net the math is not your regular
> +	 * simple multiplication at all. If you managed to get these please
> +	 * fix here. For now we use 4k physical sectors for this
> +	 */
> +	geo->heads = 1;
> +	geo->sectors = 1;
> +	geo->cylinders = 1;
> +	return 0;

This is then going to blow up on your with some other tool. Fix fdisk
instead. Lying to apps generally ends up like children lying to parents -
the lie gets more complicated to keep up each case you find until it
breaks.

Alan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] brd: Fix all partitions BUGs
  2014-08-07  9:11     ` Boaz Harrosh
@ 2014-08-07 18:50       ` Ross Zwisler
  0 siblings, 0 replies; 22+ messages in thread
From: Ross Zwisler @ 2014-08-07 18:50 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Boaz Harrosh, Jens Axboe, Matthew Wilcox, linux-kernel, linux-fsdevel

On Thu, 2014-08-07 at 12:11 +0300, Boaz Harrosh wrote:
> On 08/07/2014 02:06 AM, Ross Zwisler wrote:
> > Also, it looks like you can still create a new device with this patch, but you
> > can't create partitions on that device.  Not sure if this is just what you get
> > when you dynamically create a device on the fly, or if it's a symptom of
> > something larger.
> > 
> 
> What? I just tried again this all works fine for me, here with fdisk.
> $ modprobe brd      # will create ram0-7
> $ mknod /dev/ram8 b 1 8
> $ fdisk /dev/ram8
>   g, n, , , +2M, n, , , , , w 
> 
> I create 2 partitions 2M each and press w and it is all there.
> 
> What numbers did you use ? rd_nr, max_part, and the mknod numbers. Here it
> just works fine. What did you try?

Ah - it turns out the issue was that I wasn't following the naming scheme
"ramX" where X is your new device name.  Here's the sequence:

	# mknod /dev/ram_new b 1 6
	# fdisk /dev/ram_new 
		< create some partitions>

This ends up creating a "ram_new" and a "ram6", which have the same
major/minor.  The partitions do show up, but they live under ram6:

	brw-rw---- 1 root disk   1, 6 Aug  7 12:36 ram6
	brw-rw---- 1 root disk 259, 0 Aug  7 12:36 ram6p1
	brw-rw---- 1 root disk 259, 1 Aug  7 12:36 ram6p2
	brw-r--r-- 1 root root   1, 6 Aug  7 12:36 ram_new

You can run fdisk -l, etc, on ram_new, and it'll show you the partitions, they
just won't be surfaced in /dev.  ram6 and ram_new seem to be alaises:

	# fdisk -l /dev/ram_new

	Disk /dev/ram_new: 8589 MB, 8589934592 bytes
	64 heads, 32 sectors/track, 8192 cylinders
	Units = cylinders of 2048 * 512 = 1048576 bytes
	Sector size (logical/physical): 512 bytes / 4096 bytes
	I/O size (minimum/optimal): 512 bytes / 512 bytes
	Disk identifier: 0x2812942c

	       Device Boot      Start         End      Blocks   Id  System
	/dev/ram_new1               1        2049     2098160   83  Linux
	/dev/ram_new2            2050        8192     6290432   83  Linux

This device aliasing happened with the old BRD code as well, so this isn't new
behavior.

- Ross



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] brd: Fix all partitions BUGs
  2014-08-06 11:33 ` [PATCH 3/4] brd: Fix all partitions BUGs Boaz Harrosh
  2014-08-06 23:06   ` Ross Zwisler
@ 2014-08-07 18:53   ` Ross Zwisler
  1 sibling, 0 replies; 22+ messages in thread
From: Ross Zwisler @ 2014-08-07 18:53 UTC (permalink / raw)
  To: Boaz Harrosh; +Cc: Jens Axboe, Matthew Wilcox, linux-kernel, linux-fsdevel

On Wed, 2014-08-06 at 14:33 +0300, Boaz Harrosh wrote:
> This patch fixes up brd's partitions scheme, now enjoying all worlds.
> 
> The MAIN fix here is that currently if one fdisks some partitions,
> a BAD bug will make all partitions point to the same start-end sector
> ie: 0 - brd_size And an mkfs of any partition would trash the partition
> table and the other partition.
> 
> Another fix is that "mount -U uuid" did not work, because of the
> GENHD_FL_SUPPRESS_PARTITION_INFO flag.
> 
> So NOW the logic goes like this:
> * max_part - Just says how many minors to reserve between devices
>   But in any way, there can be as many partition as requested.
>   If minors between devices ends, then dynamic 259-major ids will
>   be allocated on the fly.
>   The default is now max_part=1, which means all partitions devt
>   will be from the dynamic major-range.
>   (If persistent partition minors is needed use max_part=)
> 
> * Creation of new devices on the fly still/always work:
>   mknod /path/devnod b 1 X
>   fdisk -l /path/devnod
>   Will create a new device if (X / max_part) was not already
>   created before. (Just as before)
> 
>   partitions on the dynamically created device will work as well
>   Same logic applies with minors as with the pre-created ones.
> 
> TODO: dynamic grow of device size, maybe through sysfs. So each
>       device can have it's own size.
> 
> Signed-off-by: Boaz Harrosh <boaz@plexistor.com>


Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com>



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4 v2] brd: Add getgeo to block ops
  2014-08-07 18:20     ` One Thousand Gnomes
@ 2014-08-08  6:52       ` Boaz Harrosh
  2014-08-08  6:58         ` Davidlohr Bueso
  0 siblings, 1 reply; 22+ messages in thread
From: Boaz Harrosh @ 2014-08-08  6:52 UTC (permalink / raw)
  To: One Thousand Gnomes
  Cc: Jens Axboe, Ross Zwisler, Matthew Wilcox, linux-kernel, linux-fsdevel

On Thu, Aug 7, 2014 at 9:20 PM, One Thousand Gnomes
<gnomes@lxorguk.ukuu.org.uk> wrote:
> On Thu, 07 Aug 2014 17:03:08 +0300
> Boaz Harrosh <boaz@plexistor.com> wrote:
>
>> From: Boaz Harrosh <boaz@plexistor.com>
>>
>> Some programs like fdisk, require HDIO_GETGEO to work, which requires we
>> implement getgeo.
>>
>> We set all hd_geometry members to 1, because this way fdisk
>> math will not try its crazy geometry math and get stuff totally wrong.
>
> If you are running a new storage system for god sake don't use DOS
> partitioning, use GPT or something sane.
>

I am always using GPT, on all my systems for few years now.
Fdisk  works exactly the same, regardless of format, regarding alignment
and start sectors.

Please try before you comment

>> +static int brd_getgeo(struct block_device *bd, struct hd_geometry *geo)
>> +{
>> +     /* Just tell fdisk to get out of the way. The math here is so
>> +      * convoluted and does not make any sense at all. With all 1s
>> +      * The math just gets out of the way.
>> +      * NOTE: I was trying to get some values that will make fdisk
>> +      * Want to align first sector on 4K (like 8, 16, 20, ... sectors) but
>> +      * nothing worked, I searched the net the math is not your regular
>> +      * simple multiplication at all. If you managed to get these please
>> +      * fix here. For now we use 4k physical sectors for this
>> +      */
>> +     geo->heads = 1;
>> +     geo->sectors = 1;
>> +     geo->cylinders = 1;
>> +     return 0;
>
> This is then going to blow up on your with some other tool. Fix fdisk
> instead. Lying to apps generally ends up like children lying to parents -
> the lie gets more complicated to keep up each case you find until it
> breaks.
>

Exactly my point sir. What is heads sectors and cylinders for a RAM
disk. Nothing. We have been lying for  years fearing the world of storage
will explode if we tell the truth. Finally, as you say, our lies came byte us in
the arss. Best is to tell the truth and the math at fdisk will do what it is
suppose to.

And the psychology talk aside, please talk code. Do you know of any
code that will break because of these values above?

I have by now looked at fdisk's code. It will calculate several alignment
constrains and finally pick the biggest one. Also it has code for very
fist sector and different one for other partition-start sectors. This HSC
math above being one of them. With these values HSC becomes very
small and gets out of the way, for the other considerations.
That's fdisk gparted (and parted) will just use 1Mib for alignment which
is very big, and be done with it. I would imagine all other tools use one
library or the other. Do you know otherwise?

Boaz

> Alan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/4 v2] brd: Add getgeo to block ops
  2014-08-08  6:52       ` Boaz Harrosh
@ 2014-08-08  6:58         ` Davidlohr Bueso
  0 siblings, 0 replies; 22+ messages in thread
From: Davidlohr Bueso @ 2014-08-08  6:58 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: One Thousand Gnomes, Jens Axboe, Ross Zwisler, Matthew Wilcox,
	linux-kernel, linux-fsdevel

On Fri, 2014-08-08 at 09:52 +0300, Boaz Harrosh wrote:
> On Thu, Aug 7, 2014 at 9:20 PM, One Thousand Gnomes
> <gnomes@lxorguk.ukuu.org.uk> wrote:
> > On Thu, 07 Aug 2014 17:03:08 +0300
> > Boaz Harrosh <boaz@plexistor.com> wrote:
> >
> >> From: Boaz Harrosh <boaz@plexistor.com>
> >>
> >> Some programs like fdisk, require HDIO_GETGEO to work, which requires we
> >> implement getgeo.
> >>
> >> We set all hd_geometry members to 1, because this way fdisk
> >> math will not try its crazy geometry math and get stuff totally wrong.
> >
> > If you are running a new storage system for god sake don't use DOS
> > partitioning, use GPT or something sane.
> >
> 
> I am always using GPT, on all my systems for few years now.
> Fdisk  works exactly the same, regardless of format, regarding alignment
> and start sectors.

Yes, fdisk now supports GPT, for about the past two years now. In fact
it has been completely rewritten.


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2014-08-08  6:58 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-06 11:27 [PATCHSET 0/4] brd: partition fixes Boaz Harrosh
2014-08-06 11:29 ` [PATCH 1/4] Change direct_access calling convention Boaz Harrosh
2014-08-06 11:30 ` [PATCH 2/4] brd: Add getgeo to block ops Boaz Harrosh
2014-08-06 17:52   ` Ross Zwisler
2014-08-07  9:20     ` Boaz Harrosh
2014-08-07 14:03   ` [PATCH 2/4 v2] " Boaz Harrosh
2014-08-07 18:20     ` One Thousand Gnomes
2014-08-08  6:52       ` Boaz Harrosh
2014-08-08  6:58         ` Davidlohr Bueso
2014-08-06 11:33 ` [PATCH 3/4] brd: Fix all partitions BUGs Boaz Harrosh
2014-08-06 23:06   ` Ross Zwisler
2014-08-07  9:11     ` Boaz Harrosh
2014-08-07 18:50       ` Ross Zwisler
2014-08-07 18:53   ` Ross Zwisler
2014-08-06 11:35 ` [PATCH 4/4] brd: Request from fdisk 4k alignment Boaz Harrosh
2014-08-06 22:03   ` Ross Zwisler
2014-08-07 12:17     ` Boaz Harrosh
2014-08-07 13:00       ` Karel Zak
2014-08-07 13:51         ` Karel Zak
2014-08-07 13:57         ` Boaz Harrosh
2014-08-07 15:21           ` Karel Zak
2014-08-07 15:40             ` Boaz Harrosh

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.