All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/4] block: uImage.FIT filesystem image mapper
@ 2022-11-17  0:42 Daniel Golle
  2022-11-17  0:42 ` [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c Daniel Golle
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Daniel Golle @ 2022-11-17  0:42 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Martin K. Petersen,
	Chaitanya Kulkarni, Wolfram Sang, linux-block, linux-kernel

uImage.FIT is the default image format of Das U-Boot bootloader which
is used on the great majority of embedded Linux devices.

Using uImage.FIT to also store the root filesystem besides kernel and
dtb has several obvious advantages which are hard to obtain in any
other way:
 * single image accross different storage types (mmcblk, mtdblock, ubiblock)
 * dynamically sized partitions for kernel and rootfs
 * hash also for rootfs checked by U-Boot before launching kernel
 * images may include additional filesystems e.g. for localization or
   branding

Add uImage.FIT partition parser, this time implemented as a tiny block
driver which can be built as a module. Being an early RFC, module
unloading by now has only been implemented rudimentary without removing
partitions added by the driver on unload. If the overall approach looks
acceptable, this can of course be improved.

For this to work, the image has to be created with external data and
sub-images aligned to the system's memory page boundaries, ie.
 mkimage -E -B 0x1000 -p 0x1000 ...

Booting such images has been supported by Das U-Boot since v2018.01.

A previous version of this partition parser is in production use on some
OpenWrt devices, eg. the BananaPi R64 where using the FIT parser allows
booting the very same image from eMMC, SD Card or SPI-NAND/UBI and also
using it as a firmware-upgrade image at the same time.
The Ubiquiti UniFi 6 LR access point served as a reference board with
SPI-NOR flash and use of the partition parser on top of a mtdblock
device.

As Das U-Boot by now also passes down the selected configuration node
name via device tree this allows the partition parser (or userspace
process via sysfs) to identify the selected image configuration.

Device Tree schema for that has also been merged already[1].

In most cases this partition parser can be used without relying on the
bootloader to pass-down the configuration node name. The default
configuration node is used then in that case.

A check of the chosen/u-boot,version node is used to make sure that
Das U-Boot has been used to boot the system and the user (or the
bootloader) is required to specify the lower block device to be used
by the parser. In case any of the two is missing, this driver won't
start.

This RFC presents an alternative implementation of the previously
posted patch series "partition parser for U-Boot's uImage.FIT"[1].

[1]: https://github.com/devicetree-org/dt-schema/commit/a24d97d43491e55d4def006213213a6c4045b646
[2]: https://patchwork.kernel.org/project/linux-block/list/?series=695709

Daniel Golle (4):
  init: move block device helpers from init/do_mounts.c
  block: add new flag to add partitions read-only
  blkdev: add function to add named read-only partitions
  block: add uImage.FIT block partition driver

 MAINTAINERS             |   6 +
 block/bdev.c            | 166 ++++++++++++++++++
 block/blk.h             |   1 +
 block/partitions/core.c |  37 ++++
 drivers/block/Kconfig   |   6 +
 drivers/block/Makefile  |   2 +
 drivers/block/fitblk.c  | 364 ++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h  |  22 +++
 init/do_mounts.c        | 166 +-----------------
 9 files changed, 606 insertions(+), 164 deletions(-)
 create mode 100644 drivers/block/fitblk.c

-- 
2.38.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c
  2022-11-17  0:42 [RFC PATCH 0/4] block: uImage.FIT filesystem image mapper Daniel Golle
@ 2022-11-17  0:42 ` Daniel Golle
  2022-11-17  5:55   ` Christoph Hellwig
  2022-11-17  0:43 ` [RFC PATCH 2/4] block: add new flag to add partitions read-only Daniel Golle
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Daniel Golle @ 2022-11-17  0:42 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Martin K. Petersen,
	Chaitanya Kulkarni, Wolfram Sang, linux-block, linux-kernel

In init/do_mounts.c there are helper functions devt_from_partuuid,
devt_from_partlabel and devt_from_devname. In order to make these
functions available to other users, move them to block/bdev.c.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
 block/bdev.c           | 166 +++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h |  15 ++++
 init/do_mounts.c       | 166 +----------------------------------------
 3 files changed, 183 insertions(+), 164 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index d699ecdb3260..92e3e5c81337 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -1092,3 +1092,169 @@ void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
 
 	blkdev_put_no_open(bdev);
 }
+
+struct uuidcmp {
+	const char *uuid;
+	int len;
+};
+
+/**
+ * match_dev_by_uuid - callback for finding a partition using its uuid
+ * @dev:	device passed in by the caller
+ * @data:	opaque pointer to the desired struct uuidcmp to match
+ *
+ * Returns 1 if the device matches, and 0 otherwise.
+ */
+static int match_dev_by_uuid(struct device *dev, const void *data)
+{
+	struct block_device *bdev = dev_to_bdev(dev);
+	const struct uuidcmp *cmp = data;
+
+	if (!bdev->bd_meta_info ||
+	    strncasecmp(cmp->uuid, bdev->bd_meta_info->uuid, cmp->len))
+		return 0;
+	return 1;
+}
+
+/**
+ * devt_from_partuuid - looks up the dev_t of a partition by its UUID
+ * @uuid_str:	char array containing ascii UUID
+ *
+ * The function will return the first partition which contains a matching
+ * UUID value in its partition_meta_info struct.  This does not search
+ * by filesystem UUIDs.
+ *
+ * If @uuid_str is followed by a "/PARTNROFF=%d", then the number will be
+ * extracted and used as an offset from the partition identified by the UUID.
+ *
+ * Returns the matching dev_t on success or 0 on failure.
+ */
+dev_t devt_from_partuuid(const char *uuid_str, int *root_wait)
+{
+	struct uuidcmp cmp;
+	struct device *dev = NULL;
+	dev_t devt = 0;
+	int offset = 0;
+	char *slash;
+
+	cmp.uuid = uuid_str;
+
+	slash = strchr(uuid_str, '/');
+	/* Check for optional partition number offset attributes. */
+	if (slash) {
+		char c = 0;
+
+		/* Explicitly fail on poor PARTUUID syntax. */
+		if (sscanf(slash + 1, "PARTNROFF=%d%c", &offset, &c) != 1)
+			goto clear_root_wait;
+		cmp.len = slash - uuid_str;
+	} else {
+		cmp.len = strlen(uuid_str);
+	}
+
+	if (!cmp.len)
+		goto clear_root_wait;
+
+	dev = class_find_device(&block_class, NULL, &cmp, &match_dev_by_uuid);
+	if (!dev)
+		return 0;
+
+	if (offset) {
+		/*
+		 * Attempt to find the requested partition by adding an offset
+		 * to the partition number found by UUID.
+		 */
+		devt = part_devt(dev_to_disk(dev),
+				 dev_to_bdev(dev)->bd_partno + offset);
+	} else {
+		devt = dev->devt;
+	}
+
+	put_device(dev);
+	return devt;
+
+clear_root_wait:
+	pr_err("VFS: PARTUUID= is invalid.\n"
+	       "Expected PARTUUID=<valid-uuid-id>[/PARTNROFF=%%d]\n");
+	if (root_wait && *root_wait) {
+		pr_err("Disabling rootwait; root= is invalid.\n");
+		*root_wait = 0;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devt_from_partuuid);
+
+/**
+ * match_dev_by_label - callback for finding a partition using its label
+ * @dev:	device passed in by the caller
+ * @data:	opaque pointer to the label to match
+ *
+ * Returns 1 if the device matches, and 0 otherwise.
+ */
+static int match_dev_by_label(struct device *dev, const void *data)
+{
+	struct block_device *bdev = dev_to_bdev(dev);
+	const char *label = data;
+
+	if (!bdev->bd_meta_info || strcmp(label, bdev->bd_meta_info->volname))
+		return 0;
+	return 1;
+}
+
+dev_t devt_from_partlabel(const char *label)
+{
+	struct device *dev;
+	dev_t devt = 0;
+
+	dev = class_find_device(&block_class, NULL, label, &match_dev_by_label);
+	if (dev) {
+		devt = dev->devt;
+		put_device(dev);
+	}
+
+	return devt;
+}
+EXPORT_SYMBOL_GPL(devt_from_partlabel);
+
+dev_t devt_from_devname(const char *name)
+{
+	dev_t devt = 0;
+	int part;
+	char s[32];
+	char *p;
+
+	if (strlen(name) > 31)
+		return 0;
+	strcpy(s, name);
+	for (p = s; *p; p++) {
+		if (*p == '/')
+			*p = '!';
+	}
+
+	devt = blk_lookup_devt(s, 0);
+	if (devt)
+		return devt;
+
+	/*
+	 * Try non-existent, but valid partition, which may only exist after
+	 * opening the device, like partitioned md devices.
+	 */
+	while (p > s && isdigit(p[-1]))
+		p--;
+	if (p == s || !*p || *p == '0')
+		return 0;
+
+	/* try disk name without <part number> */
+	part = simple_strtoul(p, NULL, 10);
+	*p = '\0';
+	devt = blk_lookup_devt(s, part);
+	if (devt)
+		return devt;
+
+	/* try disk name without p<part number> */
+	if (p < s + 2 || !isdigit(p[-2]) || p[-1] != 'p')
+		return 0;
+	p[-1] = '\0';
+	return blk_lookup_devt(s, part);
+}
+EXPORT_SYMBOL_GPL(devt_from_devname);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index fb27dfaaaf85..b45cdcdccc6d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1494,6 +1494,9 @@ int truncate_bdev_range(struct block_device *bdev, fmode_t mode, loff_t lstart,
 		loff_t lend);
 
 #ifdef CONFIG_BLOCK
+dev_t devt_from_partuuid(const char *uuid_str, int *root_wait);
+dev_t devt_from_partlabel(const char *label);
+dev_t devt_from_devname(const char *name);
 void invalidate_bdev(struct block_device *bdev);
 int sync_blockdev(struct block_device *bdev);
 int sync_blockdev_range(struct block_device *bdev, loff_t lstart, loff_t lend);
@@ -1502,6 +1505,18 @@ void sync_bdevs(bool wait);
 void bdev_statx_dioalign(struct inode *inode, struct kstat *stat);
 void printk_all_partitions(void);
 #else
+static inline dev_t devt_from_partuuid(const char *uuid_str, int *root_wait)
+{
+	return MKDEV(0, 0);
+}
+static inline dev_t devt_from_partlabel(const char *label);
+{
+	return MKDEV(0, 0);
+}
+static inline dev_t devt_from_devname(const char *name);
+{
+	return MKDEV(0, 0);
+}
 static inline void invalidate_bdev(struct block_device *bdev)
 {
 }
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 811e94daf0a8..d55e34994f18 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 #include <linux/module.h>
 #include <linux/sched.h>
+#include <linux/blkdev.h>
 #include <linux/ctype.h>
 #include <linux/fd.h>
 #include <linux/tty.h>
@@ -60,169 +61,6 @@ static int __init readwrite(char *str)
 __setup("ro", readonly);
 __setup("rw", readwrite);
 
-#ifdef CONFIG_BLOCK
-struct uuidcmp {
-	const char *uuid;
-	int len;
-};
-
-/**
- * match_dev_by_uuid - callback for finding a partition using its uuid
- * @dev:	device passed in by the caller
- * @data:	opaque pointer to the desired struct uuidcmp to match
- *
- * Returns 1 if the device matches, and 0 otherwise.
- */
-static int match_dev_by_uuid(struct device *dev, const void *data)
-{
-	struct block_device *bdev = dev_to_bdev(dev);
-	const struct uuidcmp *cmp = data;
-
-	if (!bdev->bd_meta_info ||
-	    strncasecmp(cmp->uuid, bdev->bd_meta_info->uuid, cmp->len))
-		return 0;
-	return 1;
-}
-
-/**
- * devt_from_partuuid - looks up the dev_t of a partition by its UUID
- * @uuid_str:	char array containing ascii UUID
- *
- * The function will return the first partition which contains a matching
- * UUID value in its partition_meta_info struct.  This does not search
- * by filesystem UUIDs.
- *
- * If @uuid_str is followed by a "/PARTNROFF=%d", then the number will be
- * extracted and used as an offset from the partition identified by the UUID.
- *
- * Returns the matching dev_t on success or 0 on failure.
- */
-static dev_t devt_from_partuuid(const char *uuid_str)
-{
-	struct uuidcmp cmp;
-	struct device *dev = NULL;
-	dev_t devt = 0;
-	int offset = 0;
-	char *slash;
-
-	cmp.uuid = uuid_str;
-
-	slash = strchr(uuid_str, '/');
-	/* Check for optional partition number offset attributes. */
-	if (slash) {
-		char c = 0;
-
-		/* Explicitly fail on poor PARTUUID syntax. */
-		if (sscanf(slash + 1, "PARTNROFF=%d%c", &offset, &c) != 1)
-			goto clear_root_wait;
-		cmp.len = slash - uuid_str;
-	} else {
-		cmp.len = strlen(uuid_str);
-	}
-
-	if (!cmp.len)
-		goto clear_root_wait;
-
-	dev = class_find_device(&block_class, NULL, &cmp, &match_dev_by_uuid);
-	if (!dev)
-		return 0;
-
-	if (offset) {
-		/*
-		 * Attempt to find the requested partition by adding an offset
-		 * to the partition number found by UUID.
-		 */
-		devt = part_devt(dev_to_disk(dev),
-				 dev_to_bdev(dev)->bd_partno + offset);
-	} else {
-		devt = dev->devt;
-	}
-
-	put_device(dev);
-	return devt;
-
-clear_root_wait:
-	pr_err("VFS: PARTUUID= is invalid.\n"
-	       "Expected PARTUUID=<valid-uuid-id>[/PARTNROFF=%%d]\n");
-	if (root_wait)
-		pr_err("Disabling rootwait; root= is invalid.\n");
-	root_wait = 0;
-	return 0;
-}
-
-/**
- * match_dev_by_label - callback for finding a partition using its label
- * @dev:	device passed in by the caller
- * @data:	opaque pointer to the label to match
- *
- * Returns 1 if the device matches, and 0 otherwise.
- */
-static int match_dev_by_label(struct device *dev, const void *data)
-{
-	struct block_device *bdev = dev_to_bdev(dev);
-	const char *label = data;
-
-	if (!bdev->bd_meta_info || strcmp(label, bdev->bd_meta_info->volname))
-		return 0;
-	return 1;
-}
-
-static dev_t devt_from_partlabel(const char *label)
-{
-	struct device *dev;
-	dev_t devt = 0;
-
-	dev = class_find_device(&block_class, NULL, label, &match_dev_by_label);
-	if (dev) {
-		devt = dev->devt;
-		put_device(dev);
-	}
-
-	return devt;
-}
-
-static dev_t devt_from_devname(const char *name)
-{
-	dev_t devt = 0;
-	int part;
-	char s[32];
-	char *p;
-
-	if (strlen(name) > 31)
-		return 0;
-	strcpy(s, name);
-	for (p = s; *p; p++) {
-		if (*p == '/')
-			*p = '!';
-	}
-
-	devt = blk_lookup_devt(s, 0);
-	if (devt)
-		return devt;
-
-	/*
-	 * Try non-existent, but valid partition, which may only exist after
-	 * opening the device, like partitioned md devices.
-	 */
-	while (p > s && isdigit(p[-1]))
-		p--;
-	if (p == s || !*p || *p == '0')
-		return 0;
-
-	/* try disk name without <part number> */
-	part = simple_strtoul(p, NULL, 10);
-	*p = '\0';
-	devt = blk_lookup_devt(s, part);
-	if (devt)
-		return devt;
-
-	/* try disk name without p<part number> */
-	if (p < s + 2 || !isdigit(p[-2]) || p[-1] != 'p')
-		return 0;
-	p[-1] = '\0';
-	return blk_lookup_devt(s, part);
-}
-#endif /* CONFIG_BLOCK */
 
 static dev_t devt_from_devnum(const char *name)
 {
@@ -284,7 +122,7 @@ dev_t name_to_dev_t(const char *name)
 		return Root_RAM0;
 #ifdef CONFIG_BLOCK
 	if (strncmp(name, "PARTUUID=", 9) == 0)
-		return devt_from_partuuid(name + 9);
+		return devt_from_partuuid(name + 9, &root_wait);
 	if (strncmp(name, "PARTLABEL=", 10) == 0)
 		return devt_from_partlabel(name + 10);
 	if (strncmp(name, "/dev/", 5) == 0)
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 2/4] block: add new flag to add partitions read-only
  2022-11-17  0:42 [RFC PATCH 0/4] block: uImage.FIT filesystem image mapper Daniel Golle
  2022-11-17  0:42 ` [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c Daniel Golle
@ 2022-11-17  0:43 ` Daniel Golle
  2022-11-17  0:44 ` [RFC PATCH 3/4] blkdev: add function to add named read-only partitions Daniel Golle
  2022-11-17  0:45 ` [RFC PATCH 4/4] block: add uImage.FIT block partition driver Daniel Golle
  3 siblings, 0 replies; 20+ messages in thread
From: Daniel Golle @ 2022-11-17  0:43 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Martin K. Petersen,
	Chaitanya Kulkarni, Wolfram Sang, linux-block, linux-kernel

Add flag ADDPART_FLAG_READONLY to allow partition parsers marking a
partition to be set read-only.
This is needed for the uImage.FIT partition parser added by a follow-up
commit: we need to be sure the contents of uImage.FIT sub-images
remain unaltered they are validated using a hash within the uImage.FIT
structure which also serves as partition table.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
 block/blk.h             | 1 +
 block/partitions/core.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/block/blk.h b/block/blk.h
index e85703ae81dd..05ac426350b2 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -414,6 +414,7 @@ void blk_free_ext_minor(unsigned int minor);
 #define ADDPART_FLAG_NONE	0
 #define ADDPART_FLAG_RAID	1
 #define ADDPART_FLAG_WHOLEDISK	2
+#define ADDPART_FLAG_READONLY	4
 int bdev_add_partition(struct gendisk *disk, int partno, sector_t start,
 		sector_t length);
 int bdev_del_partition(struct gendisk *disk, int partno);
diff --git a/block/partitions/core.c b/block/partitions/core.c
index b8112f52d388..355646b0707d 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -398,6 +398,9 @@ static struct block_device *add_partition(struct gendisk *disk, int partno,
 			goto out_del;
 	}
 
+	if (flags & ADDPART_FLAG_READONLY)
+		bdev->bd_read_only = true;
+
 	/* everything is up and running, commence */
 	err = xa_insert(&disk->part_tbl, partno, bdev, GFP_KERNEL);
 	if (err)
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 3/4] blkdev: add function to add named read-only partitions
  2022-11-17  0:42 [RFC PATCH 0/4] block: uImage.FIT filesystem image mapper Daniel Golle
  2022-11-17  0:42 ` [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c Daniel Golle
  2022-11-17  0:43 ` [RFC PATCH 2/4] block: add new flag to add partitions read-only Daniel Golle
@ 2022-11-17  0:44 ` Daniel Golle
  2022-11-17  5:56   ` Christoph Hellwig
  2022-11-17  0:45 ` [RFC PATCH 4/4] block: add uImage.FIT block partition driver Daniel Golle
  3 siblings, 1 reply; 20+ messages in thread
From: Daniel Golle @ 2022-11-17  0:44 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Martin K. Petersen,
	Chaitanya Kulkarni, Wolfram Sang, linux-block, linux-kernel

Add function bdev_add_partition_ro() which can be used by drivers to
register named read-only partitions on a disk device.
Unlike the existing bdev_add_partition() function, there is also no
check for overlapping partitions.
This new function is going to be used by the uImage.FIT parser.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
 block/partitions/core.c | 34 ++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h  |  7 +++++++
 2 files changed, 41 insertions(+)

diff --git a/block/partitions/core.c b/block/partitions/core.c
index 355646b0707d..060a6585a387 100644
--- a/block/partitions/core.c
+++ b/block/partitions/core.c
@@ -469,6 +469,40 @@ int bdev_add_partition(struct gendisk *disk, int partno, sector_t start,
 	return ret;
 }
 
+int bdev_add_partition_ro(struct gendisk *disk, int partno, sector_t start,
+			  sector_t length, const char *volname)
+{
+	struct block_device *part;
+	struct partition_meta_info *info;
+	int ret;
+
+	mutex_lock(&disk->open_mutex);
+	if (!disk_live(disk)) {
+		ret = -ENXIO;
+		goto out;
+	}
+
+	part = add_partition(disk, partno, start, length,
+			     ADDPART_FLAG_READONLY, NULL);
+	ret = PTR_ERR_OR_ZERO(part);
+	if (ret)
+		goto out;
+
+	if (volname) {
+		info = kzalloc(sizeof(struct partition_meta_info), GFP_KERNEL);
+		if (!info) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		strscpy(info->volname, volname, sizeof(info->volname));
+		part->bd_meta_info = info;
+	}
+out:
+	mutex_unlock(&disk->open_mutex);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(bdev_add_partition_ro);
+
 int bdev_del_partition(struct gendisk *disk, int partno)
 {
 	struct block_device *part = NULL;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b45cdcdccc6d..6e468a2fc4ec 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1504,6 +1504,8 @@ int sync_blockdev_nowait(struct block_device *bdev);
 void sync_bdevs(bool wait);
 void bdev_statx_dioalign(struct inode *inode, struct kstat *stat);
 void printk_all_partitions(void);
+int bdev_add_partition_ro(struct gendisk *disk, int partno, sector_t start,
+			  sector_t length, const char *volname);
 #else
 static inline dev_t devt_from_partuuid(const char *uuid_str, int *root_wait)
 {
@@ -1537,6 +1539,11 @@ static inline void bdev_statx_dioalign(struct inode *inode, struct kstat *stat)
 static inline void printk_all_partitions(void)
 {
 }
+static inline int bdev_add_partition_ro(struct gendisk *disk, int partno, sector_t start,
+			  sector_t length, const char *volname)
+{
+	return 0;
+}
 #endif /* CONFIG_BLOCK */
 
 int fsync_bdev(struct block_device *bdev);
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC PATCH 4/4] block: add uImage.FIT block partition driver
  2022-11-17  0:42 [RFC PATCH 0/4] block: uImage.FIT filesystem image mapper Daniel Golle
                   ` (2 preceding siblings ...)
  2022-11-17  0:44 ` [RFC PATCH 3/4] blkdev: add function to add named read-only partitions Daniel Golle
@ 2022-11-17  0:45 ` Daniel Golle
  2022-11-17  5:58   ` Christoph Hellwig
  3 siblings, 1 reply; 20+ messages in thread
From: Daniel Golle @ 2022-11-17  0:45 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe, Martin K. Petersen,
	Chaitanya Kulkarni, Wolfram Sang, linux-block, linux-kernel

Add a small block driver which allows exposing filesystem sub-images
contained in U-Boot uImage.FIT images as block partitions.
The driver is intended for system using the U-Boot bootloader and
requires the user to specify the lower block device to be specified
as module parameter 'lower_dev'.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
 MAINTAINERS            |   6 +
 drivers/block/Kconfig  |   6 +
 drivers/block/Makefile |   2 +
 drivers/block/fitblk.c | 364 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 378 insertions(+)
 create mode 100644 drivers/block/fitblk.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 08b67532e374..39402d6d0b9c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -21184,6 +21184,12 @@ F:	Documentation/filesystems/ubifs-authentication.rst
 F:	Documentation/filesystems/ubifs.rst
 F:	fs/ubifs/
 
+U-BOOT UIMAGE.FIT PARSER
+M:	Daniel Golle <daniel@makrotopia.org>
+L:	linux-block@vger.kernel.org
+S:	Maintained
+F:	drivers/block/fitblk.c
+
 UBLK USERSPACE BLOCK DRIVER
 M:	Ming Lei <ming.lei@redhat.com>
 L:	linux-block@vger.kernel.org
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index a41145d52de9..c96c5b9964c7 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -383,6 +383,12 @@ config VIRTIO_BLK
 	  This is the virtual block driver for virtio.  It can be used with
           QEMU based VMMs (like KVM or Xen).  Say Y or M.
 
+config UIMAGE_FIT_BLK
+	tristate "uImage.FIT block driver"
+	help
+	  This is driver allows using filesystems contained in uImage.FIT images
+	  by mapping them as block partitions.
+
 config BLK_DEV_RBD
 	tristate "Rados block device (RBD)"
 	depends on INET && BLOCK
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 101612cba303..60ab4fc1442d 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -39,4 +39,6 @@ obj-$(CONFIG_BLK_DEV_NULL_BLK)	+= null_blk/
 
 obj-$(CONFIG_BLK_DEV_UBLK)			+= ublk_drv.o
 
+obj-$(CONFIG_UIMAGE_FIT_BLK)	+= fitblk.o
+
 swim_mod-y	:= swim.o swim_asm.o
diff --git a/drivers/block/fitblk.c b/drivers/block/fitblk.c
new file mode 100644
index 000000000000..7f959de8c5b2
--- /dev/null
+++ b/drivers/block/fitblk.c
@@ -0,0 +1,364 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * uImage.FIT virtual block device driver.
+ *
+ * Copyright (C) 2022 Daniel Golle
+ * Copyright (C) 2007 Nick Piggin
+ * Copyright (C) 2007 Novell Inc.
+ *
+ * Derived from drivers/block/brd.c which is in parts derived from
+ * drivers/block/rd.c, and drivers/block/loop.c, copyright of their respective
+ * owners.
+ *
+ * uImage.FIT headers extracted from U-Boot mkimage sources
+ *  (C) Copyright 2008 Semihalf
+ *  (C) Copyright 2000-2005
+ *  Wolfgang Denk, DENX Software Engineering, wd@denx.de.
+ */
+
+#include <linux/init.h>
+#include <linux/initrd.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/major.h>
+#include <linux/blkdev.h>
+#include <linux/blk-mq.h>
+#include <linux/ctype.h>
+#include <linux/hdreg.h>
+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/of_fdt.h>
+#include <linux/pagemap.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+#include <linux/libfdt.h>
+
+#define FIT_IMAGES_PATH		"/images"
+#define FIT_CONFS_PATH		"/configurations"
+
+/* hash/signature/key node */
+#define FIT_HASH_NODENAME	"hash"
+#define FIT_ALGO_PROP		"algo"
+#define FIT_VALUE_PROP		"value"
+#define FIT_IGNORE_PROP		"uboot-ignore"
+#define FIT_SIG_NODENAME	"signature"
+#define FIT_KEY_REQUIRED	"required"
+#define FIT_KEY_HINT		"key-name-hint"
+
+/* cipher node */
+#define FIT_CIPHER_NODENAME	"cipher"
+#define FIT_ALGO_PROP		"algo"
+
+/* image node */
+#define FIT_DATA_PROP		"data"
+#define FIT_DATA_POSITION_PROP	"data-position"
+#define FIT_DATA_OFFSET_PROP	"data-offset"
+#define FIT_DATA_SIZE_PROP	"data-size"
+#define FIT_TIMESTAMP_PROP	"timestamp"
+#define FIT_DESC_PROP		"description"
+#define FIT_ARCH_PROP		"arch"
+#define FIT_TYPE_PROP		"type"
+#define FIT_OS_PROP		"os"
+#define FIT_COMP_PROP		"compression"
+#define FIT_ENTRY_PROP		"entry"
+#define FIT_LOAD_PROP		"load"
+
+/* configuration node */
+#define FIT_KERNEL_PROP		"kernel"
+#define FIT_FILESYSTEM_PROP	"filesystem"
+#define FIT_RAMDISK_PROP	"ramdisk"
+#define FIT_FDT_PROP		"fdt"
+#define FIT_LOADABLE_PROP	"loadables"
+#define FIT_DEFAULT_PROP	"default"
+#define FIT_SETUP_PROP		"setup"
+#define FIT_FPGA_PROP		"fpga"
+#define FIT_FIRMWARE_PROP	"firmware"
+#define FIT_STANDALONE_PROP	"standalone"
+
+#define MIN_FREE_SECT		16
+#define REMAIN_VOLNAME		"rootfs_data"
+#define MAX_FIT_LOADABLES	16
+
+static char *lower_dev = NULL;
+static const char *ubootver = NULL;
+static LIST_HEAD(fitblk_devices);
+static DEFINE_MUTEX(devices_mutex);
+
+module_param(lower_dev, charp, 0444);
+MODULE_PARM_DESC(lower_dev, "block device to parse uImage.FIT");
+
+static int parse_fit_on_dev(struct device *dev)
+{
+	struct block_device *bdev = dev_to_bdev(dev);
+	struct gendisk *disk = dev_to_disk(dev);
+	struct address_space *mapping = disk->part0->bd_inode->i_mapping;
+	struct folio *folio;
+	void *fit;
+	u64 dsize, dsectors, imgmaxsect = 0;
+	u32 size, image_pos, image_len;
+	const __be32 *image_offset_be, *image_len_be, *image_pos_be;
+	int ret = 0, node, images, config;
+	const char *image_name, *image_type, *image_description,
+		*config_default, *config_description, *config_loadables;
+	u32 image_name_len, image_type_len, image_description_len,
+		bootconf_len, config_default_len, config_description_len,
+		config_loadables_len;
+	sector_t fit_start_sect = bdev->bd_start_sect, start_sect, nr_sects;
+	struct device_node *np = NULL;
+	const char *bootconf;
+	const char *loadable;
+	bool found;
+	int loadables_rem_len, loadable_len;
+	u16 loadcnt;
+	bool lower_is_part = !(bdev == disk->part0);
+	unsigned int slot = lower_is_part?65:1;
+
+	/* uImage.FIT should be aligned to page boundaries on disk */
+	if (fit_start_sect % (1 << (PAGE_SHIFT - SECTOR_SHIFT)))
+		return -ESPIPE;
+
+	/* map first page */
+	folio = read_mapping_folio(mapping,
+				   fit_start_sect >> (PAGE_SHIFT - SECTOR_SHIFT),
+				   NULL);
+
+	if (IS_ERR(folio))
+		return PTR_ERR(folio);
+
+	fit = folio_address(folio) +
+	      offset_in_folio(folio, fit_start_sect * SECTOR_SIZE);
+
+	/* uImage.FIT is based on flattened device tree structure */
+	if (fdt_check_header(fit)) {
+		ret = -EINVAL;
+		goto ret_out;
+	}
+
+	/* acquire disk size */
+	dsectors = get_capacity(bdev->bd_disk);
+	dsize = dsectors << SECTOR_SHIFT;
+
+	/* silently skip non-external-data legacy uImage.FIT */
+	size = fdt_totalsize(fit);
+	if (size > PAGE_SIZE) {
+		ret = -ENOTSUPP;
+		goto ret_out;
+	}
+
+	/* abort if FIT structure is larger than disk or partition size */
+	if (size >= dsize) {
+		ret = -EFBIG;
+		goto ret_out;
+	}
+
+	/* set boot config node name U-Boot may have added to the device tree */
+	np = of_find_node_by_path("/chosen");
+	if (np)
+		bootconf = of_get_property(np, "u-boot,bootconf", &bootconf_len);
+	else
+		bootconf = NULL;
+
+	/* find configuration path in uImage.FIT */
+	config = fdt_path_offset(fit, FIT_CONFS_PATH);
+	if (config < 0) {
+		pr_err("FIT: Cannot find %s node: %d\n",
+		       FIT_CONFS_PATH, config);
+		ret = -ENOENT;
+		goto ret_out;
+	}
+
+	/* get default configuration node name */
+	config_default =
+		fdt_getprop(fit, config, FIT_DEFAULT_PROP, &config_default_len);
+
+	/* make sure we got either default or selected boot config node name */
+	if (!config_default && !bootconf) {
+		pr_err("FIT: Cannot find default configuration\n");
+		ret = -ENOENT;
+		goto ret_out;
+	}
+
+	/* find selected boot config node, fallback on default config node */
+	node = fdt_subnode_offset(fit, config, bootconf ?: config_default);
+	if (node < 0) {
+		pr_err("FIT: Cannot find %s node: %d\n",
+		       bootconf ?: config_default, node);
+		ret = -ENOENT;
+		goto ret_out;
+	}
+
+	pr_info("FIT: Detected U-Boot %s\n", ubootver);
+
+	/* get selected configuration data */
+	config_description =
+		fdt_getprop(fit, node, FIT_DESC_PROP, &config_description_len);
+	config_loadables = fdt_getprop(fit, node, FIT_LOADABLE_PROP,
+				       &config_loadables_len);
+
+	pr_info("FIT: %s configuration: \"%.*s\"%s%.*s%s\n",
+		bootconf ? "Selected" : "Default",
+		bootconf ? bootconf_len : config_default_len,
+		bootconf ?: config_default,
+		config_description ? " (" : "",
+		config_description ? config_description_len : 0,
+		config_description ?: "",
+		config_description ? ")" : "");
+
+	if (!config_loadables || !config_loadables_len) {
+		pr_err("FIT: No loadables configured in \"%s\"\n",
+		       bootconf ?: config_default);
+		ret = -ENOENT;
+		goto ret_out;
+	}
+
+	/* get images path in uImage.FIT */
+	images = fdt_path_offset(fit, FIT_IMAGES_PATH);
+	if (images < 0) {
+		pr_err("FIT: Cannot find %s node: %d\n", FIT_IMAGES_PATH, images);
+		ret = -EINVAL;
+		goto ret_out;
+	}
+
+	/* iterate over images in uImage.FIT */
+	fdt_for_each_subnode(node, fit, images) {
+		image_name = fdt_get_name(fit, node, &image_name_len);
+		image_type = fdt_getprop(fit, node, FIT_TYPE_PROP, &image_type_len);
+		image_offset_be = fdt_getprop(fit, node, FIT_DATA_OFFSET_PROP, NULL);
+		image_pos_be = fdt_getprop(fit, node, FIT_DATA_POSITION_PROP, NULL);
+		image_len_be = fdt_getprop(fit, node, FIT_DATA_SIZE_PROP, NULL);
+
+		if (!image_name || !image_type || !image_len_be)
+			continue;
+
+		image_len = be32_to_cpu(*image_len_be);
+		if (!image_len)
+			continue;
+
+		if (image_offset_be)
+			image_pos = be32_to_cpu(*image_offset_be) + size;
+		else if (image_pos_be)
+			image_pos = be32_to_cpu(*image_pos_be);
+		else
+			continue;
+
+		image_description = fdt_getprop(fit, node, FIT_DESC_PROP,
+						&image_description_len);
+
+		pr_info("FIT: %16s sub-image 0x%08x..0x%08x \"%.*s\"%s%.*s%s\n",
+			image_type, image_pos, image_pos + image_len - 1,
+			image_name_len, image_name, image_description ? " (" : "",
+			image_description ? image_description_len : 0,
+			image_description ?: "", image_description ? ") " : "");
+
+		/* only 'filesystem' images should be mapped as partitions */
+		if (strncmp(image_type, FIT_FILESYSTEM_PROP, image_type_len))
+			continue;
+
+		/* check if sub-image is part of configured loadables */
+		found = false;
+		loadable = config_loadables;
+		loadables_rem_len = config_loadables_len;
+		for (loadcnt = 0; loadables_rem_len > 1 &&
+				  loadcnt < MAX_FIT_LOADABLES; ++loadcnt) {
+			loadable_len =
+				strnlen(loadable, loadables_rem_len - 1) + 1;
+			loadables_rem_len -= loadable_len;
+			if (!strncmp(image_name, loadable, loadable_len)) {
+				found = true;
+				break;
+			}
+			loadable += loadable_len;
+		}
+		if (!found)
+			continue;
+
+		if (image_pos % (1 << PAGE_SHIFT)) {
+			pr_err("FIT: image %.*s start not aligned to page boundaries, skipping\n",
+			       image_name_len, image_name);
+			continue;
+		}
+
+		if (image_len % (1 << PAGE_SHIFT)) {
+			pr_err("FIT: sub-image %.*s end not aligned to page boundaries, skipping\n",
+			       image_name_len, image_name);
+			continue;
+		}
+
+		start_sect = image_pos >> SECTOR_SHIFT;
+		nr_sects = image_len >> SECTOR_SHIFT;
+		imgmaxsect = max_t(sector_t, imgmaxsect, start_sect + nr_sects);
+
+		if (start_sect + nr_sects > dsectors) {
+			pr_err("FIT: sub-image %.*s disk access beyond EOD\n",
+			       image_name_len, image_name);
+			continue;
+		}
+
+		bdev_add_partition_ro(disk, slot++, fit_start_sect + start_sect,
+				      nr_sects, image_name);
+	}
+
+	/* in case uImage.FIT is stored in a partition, map the remaining space */
+	if (lower_is_part && (imgmaxsect + MIN_FREE_SECT) < dsectors)
+		bdev_add_partition_ro(disk, slot++, fit_start_sect + imgmaxsect,
+				      dsectors - imgmaxsect, REMAIN_VOLNAME);
+
+ret_out:
+	folio_put(folio);
+	return ret;
+}
+
+static int fitblk_probe(struct platform_device *pdev)
+{
+	dev_t devt = devt_from_devname(lower_dev);
+	struct device *dev;
+	if (!devt)
+		return -EPROBE_DEFER;
+
+	dev = class_find_device_by_devt(&block_class, devt);
+	if (!dev)
+		return -EPROBE_DEFER;
+
+	return parse_fit_on_dev(dev);
+}
+
+static struct platform_driver fitblk_driver = {
+	.probe		= fitblk_probe,
+	.driver		= {
+		.name   = "fitblk",
+		.owner   = THIS_MODULE,
+	},
+};
+
+static int __init fitblk_init(void)
+{
+	struct device_node *np;
+
+	if (!lower_dev)
+		return 0;
+
+	/* detect U-Boot firmware */
+	np = of_find_node_by_path("/chosen");
+	if (np)
+		ubootver = of_get_property(np, "u-boot,version", NULL);
+
+	if (!ubootver)
+		return 0;
+
+	platform_device_register_simple("fitblk", -1, NULL, 0);
+	return platform_driver_register(&fitblk_driver);
+}
+module_init(fitblk_init);
+
+static void __exit fitblk_exit(void)
+{
+	platform_driver_unregister(&fitblk_driver);
+}
+module_exit(fitblk_exit);
+
+MODULE_AUTHOR("Daniel Golle");
+MODULE_DESCRIPTION("uImage.FIT virtual block driver");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("platform:fitblk");
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c
  2022-11-17  0:42 ` [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c Daniel Golle
@ 2022-11-17  5:55   ` Christoph Hellwig
  2022-11-19 16:03     ` Daniel Golle
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2022-11-17  5:55 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Christoph Hellwig, Jens Axboe, Martin K. Petersen,
	Chaitanya Kulkarni, Wolfram Sang, linux-block, linux-kernel

On Thu, Nov 17, 2022 at 12:42:43AM +0000, Daniel Golle wrote:
> In init/do_mounts.c there are helper functions devt_from_partuuid,
> devt_from_partlabel and devt_from_devname. In order to make these
> functions available to other users, move them to block/bdev.c.

Yes, they should be moved and I have an old patch to do this a bit
smarter.  But that doesn't mean anyone else should call them.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 3/4] blkdev: add function to add named read-only partitions
  2022-11-17  0:44 ` [RFC PATCH 3/4] blkdev: add function to add named read-only partitions Daniel Golle
@ 2022-11-17  5:56   ` Christoph Hellwig
  2022-11-17 17:12     ` Daniel Golle
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2022-11-17  5:56 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Christoph Hellwig, Jens Axboe, Martin K. Petersen,
	Chaitanya Kulkarni, Wolfram Sang, linux-block, linux-kernel

On Thu, Nov 17, 2022 at 12:44:40AM +0000, Daniel Golle wrote:
> Add function bdev_add_partition_ro() which can be used by drivers to
> register named read-only partitions on a disk device.
> Unlike the existing bdev_add_partition() function, there is also no
> check for overlapping partitions.
> This new function is going to be used by the uImage.FIT parser.

Err, no.  No on has any business adding partitions to the block device
except for the partition parser.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 4/4] block: add uImage.FIT block partition driver
  2022-11-17  0:45 ` [RFC PATCH 4/4] block: add uImage.FIT block partition driver Daniel Golle
@ 2022-11-17  5:58   ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2022-11-17  5:58 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Christoph Hellwig, Jens Axboe, Martin K. Petersen,
	Chaitanya Kulkarni, Wolfram Sang, linux-block, linux-kernel

On Thu, Nov 17, 2022 at 12:45:05AM +0000, Daniel Golle wrote:
> Add a small block driver which allows exposing filesystem sub-images
> contained in U-Boot uImage.FIT images as block partitions.
> The driver is intended for system using the U-Boot bootloader and
> requires the user to specify the lower block device to be specified
> as module parameter 'lower_dev'.

Umm, this is is not a block device driver.  A block device driver
would take in bios and the upper devices it creates and then submit
them to the lower devices.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 3/4] blkdev: add function to add named read-only partitions
  2022-11-17  5:56   ` Christoph Hellwig
@ 2022-11-17 17:12     ` Daniel Golle
  0 siblings, 0 replies; 20+ messages in thread
From: Daniel Golle @ 2022-11-17 17:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Martin K. Petersen, Chaitanya Kulkarni, Wolfram Sang,
	linux-block, linux-kernel

On Wed, Nov 16, 2022 at 09:56:37PM -0800, Christoph Hellwig wrote:
> On Thu, Nov 17, 2022 at 12:44:40AM +0000, Daniel Golle wrote:
> > Add function bdev_add_partition_ro() which can be used by drivers to
> > register named read-only partitions on a disk device.
> > Unlike the existing bdev_add_partition() function, there is also no
> > check for overlapping partitions.
> > This new function is going to be used by the uImage.FIT parser.
> 
> Err, no.  No on has any business adding partitions to the block device
> except for the partition parser.

Well, there is a user-space ioctl for this as well[1], just that won't
set the partition to read-only and also doesn't allow naming it...

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/block/ioctl.c#n40

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c
  2022-11-17  5:55   ` Christoph Hellwig
@ 2022-11-19 16:03     ` Daniel Golle
  2022-11-22 12:37       ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Golle @ 2022-11-19 16:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Richard Weinberger, Matthew Wilcox, Jens Axboe,
	Martin K. Petersen, Chaitanya Kulkarni, Wolfram Sang,
	linux-block, linux-kernel

Hi Christoph,

On Wed, Nov 16, 2022 at 09:55:55PM -0800, Christoph Hellwig wrote:
> On Thu, Nov 17, 2022 at 12:42:43AM +0000, Daniel Golle wrote:
> > In init/do_mounts.c there are helper functions devt_from_partuuid,
> > devt_from_partlabel and devt_from_devname. In order to make these
> > functions available to other users, move them to block/bdev.c.
> 
> Yes, they should be moved and I have an old patch to do this a bit
> smarter.  But that doesn't mean anyone else should call them.

I have followed your advise and implemented a "real" block driver
stacking on top of an existing block device to expose uImage.FIT
filesystem subimages as block devices. It mangles any bio submitted to
it and re-submits the bio with changed bi_bdev and sector offset added
to bi_iter.bi_sector for all list elements.

That works, but has slightly less utility value than the partition
parser approach as in this way I cannot easily populate the PARTNAME
uevent which can later help userspace to identify a device by the FIT
subimage name -- I'd have to either invent a new bus_type or
device_type, both seem inappropriate and have unwanted side effects.
Or am I missing something and there is a way to use add_uevent_var()
for a disk_type device?

In exchange, instead of using the PARTNAME= uevent var, I can now
freely name the device node by setting the disk_name string in
struct gendisk. While this doesn't offer the same amount of freedom, it
is sufficient for our use-case.

However, I don't see a way to avoid using (or duplicating)
devt_from_devname() to select the lower device somehow without having
to probe and parse *all* block devices present (which is, from what I
understood, what you want to avoid, and I agree that it is more safe to
not do that...)

Can you or anyone give some advise on how this should be done?

As block partitions are not present in device tree, using a cmdline
argument (with similar semantics to 'root=') specifying the lower block
device seems unavoidable.

Also note that due to libfdt not exporting symbols, the driver
currently cannot be built as a module. Hence, I would either need to
export most of libfdt API or only allow the driver to be built-into the
kernel. I don't see a problem with having it always built-in.

Yet another (imho not terrible) problem is removal of the lower device.
Many of the supported SBC use a micro SD card to boot, which can be
removed by the user while the system is running (which is generally not
a good idea, but anyway). For partitions this is handled automatically
by blk_drop_partitions() called directly from genhd.c.
I'm currently playing with doing something similar using the bus device
removal notification, but it doesn't seem to work for all cases, e.g.
mmcblk device do not seem to have the ->bus pointer populated at all
(ie. disk_to_dev(disk)->bus == NULL for mmcblk devices).

Any ideas regarding that?


Thank you for your advise!


Daniel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c
  2022-11-19 16:03     ` Daniel Golle
@ 2022-11-22 12:37       ` Christoph Hellwig
  2022-12-09 17:00         ` Daniel Golle
  2023-03-30 14:43         ` Daniel Golle
  0 siblings, 2 replies; 20+ messages in thread
From: Christoph Hellwig @ 2022-11-22 12:37 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Christoph Hellwig, Richard Weinberger, Matthew Wilcox,
	Jens Axboe, Martin K. Petersen, Chaitanya Kulkarni, Wolfram Sang,
	linux-block, linux-kernel

On Sat, Nov 19, 2022 at 04:03:11PM +0000, Daniel Golle wrote:
> That works, but has slightly less utility value than the partition
> parser approach as in this way I cannot easily populate the PARTNAME
> uevent which can later help userspace to identify a device by the FIT
> subimage name -- I'd have to either invent a new bus_type or
> device_type, both seem inappropriate and have unwanted side effects.
> Or am I missing something and there is a way to use add_uevent_var()
> for a disk_type device?

You're not exposing a partition here - this is an image format that
sits in a partition and we should not pretend that is a partition.

> However, I don't see a way to avoid using (or duplicating)
> devt_from_devname() to select the lower device somehow without having
> to probe and parse *all* block devices present (which is, from what I
> understood, what you want to avoid, and I agree that it is more safe to
> not do that...)
> 
> Can you or anyone give some advise on how this should be done?

Just set the block driver up from an initramfs, like we do for all
modern stackable drivers.

> Yet another (imho not terrible) problem is removal of the lower device.
> Many of the supported SBC use a micro SD card to boot, which can be
> removed by the user while the system is running (which is generally not
> a good idea, but anyway). For partitions this is handled automatically
> by blk_drop_partitions() called directly from genhd.c.
> I'm currently playing with doing something similar using the bus device
> removal notification, but it doesn't seem to work for all cases, e.g.
> mmcblk device do not seem to have the ->bus pointer populated at all
> (ie. disk_to_dev(disk)->bus == NULL for mmcblk devices).

I have WIP patches that allow the claimer of a block device get
resize and removal notification.  It's not going to land for 6.2,
but I hope I have it ready in time for the next merge window.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c
  2022-11-22 12:37       ` Christoph Hellwig
@ 2022-12-09 17:00         ` Daniel Golle
  2022-12-12  9:03           ` Christoph Hellwig
  2023-03-30 14:43         ` Daniel Golle
  1 sibling, 1 reply; 20+ messages in thread
From: Daniel Golle @ 2022-12-09 17:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Richard Weinberger, Matthew Wilcox, Jens Axboe,
	Martin K. Petersen, Chaitanya Kulkarni, Wolfram Sang,
	linux-block, linux-kernel

Hi Christoph,

On Tue, Nov 22, 2022 at 04:37:08AM -0800, Christoph Hellwig wrote:
> On Sat, Nov 19, 2022 at 04:03:11PM +0000, Daniel Golle wrote:
> > That works, but has slightly less utility value than the partition
> > parser approach as in this way I cannot easily populate the PARTNAME
> > uevent which can later help userspace to identify a device by the FIT
> > subimage name -- I'd have to either invent a new bus_type or
> > device_type, both seem inappropriate and have unwanted side effects.
> > Or am I missing something and there is a way to use add_uevent_var()
> > for a disk_type device?
> 
> You're not exposing a partition here - this is an image format that
> sits in a partition and we should not pretend that is a partition.

It doesn't need to be literally the PARTNAME uevent, just any way to
communicate the names of mapped subimages to userspace.

My understanding by now is that there is no way around introducing a
new device_type and then mitigate the unwanted side effects by
follow-up changes, ie. make it possible to use that new device_type
when specifying the rootfs= cmdline variable (currently only disks and
partitions are considered there).

Or give up on the idea that uImage.FIT subimages mapped by the new
driver can be identified by userspace by poking uevent from sysfs and
just rely on convention and ordering.

> 
> > However, I don't see a way to avoid using (or duplicating)
> > devt_from_devname() to select the lower device somehow without having
> > to probe and parse *all* block devices present (which is, from what I
> > understood, what you want to avoid, and I agree that it is more safe to
> > not do that...)
> > 
> > Can you or anyone give some advise on how this should be done?
> 
> Just set the block driver up from an initramfs, like we do for all
> modern stackable drivers.

Instead of using a kernel cmdline parameter we could also have the
bootloader embed that information as string in the 'chosen' section in
the device tree blob, right next to the cmdline. However, as there is
no representation of block partitions in device tree, also in that case
the lower device will have to be referenced by a string somehow, ie.
devt_from_devname() or the like will be needed.

Needing an initramfs, even if it boils down to just one statically
compile executable, is a massive bloat and complication when building
embedded device firmware and none of the over 1580 devices currently
supported by OpenWrt need an intermediate initramfs to mount their
on-flash squashfs rootfs (some, however, already use this uImage.FIT
partition parser, and not needing a downstream patch for that would be
nice).

uImage.FIT typically contains the complete firmware used on an embedded
device, ie. at least a Linux kernel, device tree blob and a filesystem.
The main use of this whole uImage.FIT-parsing-in-Linux approach I'm
trying to get across here is to expose one or more 'filesystem'-type
subimages of such an image as block devices, also so that one of them
can directly be mounted as rootfs by the kernel.

This *replaces* the use of 'ramdisk' type sub-images which need to
remain allocated at runtime, while using a squashfs 'filesystem' type
sub-image allows freeing the filesystem cache if ram is becomes scarce.

As both, storage and memory, are often very limited on small embedded
devices, OpenWrt has always been using a squashfs as rootfs with a
storage-type specific filesytem used as r/w overlay on top. Up to now,
the rootfs is often stored in platform-specific ways, ie. an additional
partition on block devices, MTD partition on NOR flash or UBI volume on
NAND flash.

Carrying the read-only squashfs filesystem inside the uImage.FIT
structure has the advantage of being agnostic regarding the
storage-type (NOR/mtdblockX, NAND/ubiblockX, MMC/mmcblkXpY) and allows
the bootloader to validate the filesystem hash before starting the
kernel, ie. ensuring integrity of the firmware as-a-whole which
includes the root filesystem.


> 
> > Yet another (imho not terrible) problem is removal of the lower device.
> > Many of the supported SBC use a micro SD card to boot, which can be
> > removed by the user while the system is running (which is generally not
> > a good idea, but anyway). For partitions this is handled automatically
> > by blk_drop_partitions() called directly from genhd.c.
> > I'm currently playing with doing something similar using the bus device
> > removal notification, but it doesn't seem to work for all cases, e.g.
> > mmcblk device do not seem to have the ->bus pointer populated at all
> > (ie. disk_to_dev(disk)->bus == NULL for mmcblk devices).
> 
> I have WIP patches that allow the claimer of a block device get
> resize and removal notification.  It's not going to land for 6.2,
> but I hope I have it ready in time for the next merge window.

I'm looking forward to integrate that in the uImage.FIT block driver
I've been working on. In the meantime, should I already post my current
draft so we can start discussing if that solution could be acceptable?


Best regards


Daniel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c
  2022-12-09 17:00         ` Daniel Golle
@ 2022-12-12  9:03           ` Christoph Hellwig
  2022-12-12 11:02             ` Daniel Golle
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2022-12-12  9:03 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Christoph Hellwig, Richard Weinberger, Matthew Wilcox,
	Jens Axboe, Martin K. Petersen, Chaitanya Kulkarni, Wolfram Sang,
	linux-block, linux-kernel

On Fri, Dec 09, 2022 at 05:00:34PM +0000, Daniel Golle wrote:
> It doesn't need to be literally the PARTNAME uevent, just any way to
> communicate the names of mapped subimages to userspace.
> 
> My understanding by now is that there is no way around introducing a
> new device_type and then mitigate the unwanted side effects by
> follow-up changes, ie. make it possible to use that new device_type
> when specifying the rootfs= cmdline variable (currently only disks and
> partitions are considered there).
> 
> Or give up on the idea that uImage.FIT subimages mapped by the new
> driver can be identified by userspace by poking uevent from sysfs and
> just rely on convention and ordering.

To be honest I'm still really confused by the patch set.  What is
the problem with simply using an initramfs, which is the way to
deal with any non-trivial init issue for about 20 years now, predated
by the somewhat more awkware initrd...

> Needing an initramfs, even if it boils down to just one statically
> compile executable, is a massive bloat and complication when building
> embedded device firmware and none of the over 1580 devices currently
> supported by OpenWrt need an intermediate initramfs to mount their
> on-flash squashfs rootfs (some, however, already use this uImage.FIT
> partition parser, and not needing a downstream patch for that would be
> nice).

You can always build the initrams into the kernel image, I don't
see how that is bloat.

> Carrying the read-only squashfs filesystem inside the uImage.FIT
> structure has the advantage of being agnostic regarding the
> storage-type (NOR/mtdblockX, NAND/ubiblockX, MMC/mmcblkXpY) and allows
> the bootloader to validate the filesystem hash before starting the
> kernel, ie. ensuring integrity of the firmware as-a-whole which
> includes the root filesystem.

What is the point of the uImage.FIT?  Why can't you use a sane
format?  Or at least not use it on top of partitions, which is just
braindead.  Who actually came up with this amazingly convoluted and
annoying scheme and why?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c
  2022-12-12  9:03           ` Christoph Hellwig
@ 2022-12-12 11:02             ` Daniel Golle
  2022-12-13  6:48               ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Golle @ 2022-12-12 11:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Richard Weinberger, Matthew Wilcox, Jens Axboe,
	Martin K. Petersen, Chaitanya Kulkarni, Wolfram Sang,
	linux-block, linux-kernel

On Mon, Dec 12, 2022 at 01:03:09AM -0800, Christoph Hellwig wrote:
> On Fri, Dec 09, 2022 at 05:00:34PM +0000, Daniel Golle wrote:
> > It doesn't need to be literally the PARTNAME uevent, just any way to
> > communicate the names of mapped subimages to userspace.
> > 
> > My understanding by now is that there is no way around introducing a
> > new device_type and then mitigate the unwanted side effects by
> > follow-up changes, ie. make it possible to use that new device_type
> > when specifying the rootfs= cmdline variable (currently only disks and
> > partitions are considered there).
> > 
> > Or give up on the idea that uImage.FIT subimages mapped by the new
> > driver can be identified by userspace by poking uevent from sysfs and
> > just rely on convention and ordering.
> 
> To be honest I'm still really confused by the patch set.  What is
> the problem with simply using an initramfs, which is the way to
> deal with any non-trivial init issue for about 20 years now, predated
> by the somewhat more awkware initrd...

The thing is that there isn't anything extraordinarily complex here,
just dynamically partitioning a block device based on a well-known
format.

> 
> > Needing an initramfs, even if it boils down to just one statically
> > compile executable, is a massive bloat and complication when building
> > embedded device firmware and none of the over 1580 devices currently
> > supported by OpenWrt need an intermediate initramfs to mount their
> > on-flash squashfs rootfs (some, however, already use this uImage.FIT
> > partition parser, and not needing a downstream patch for that would be
> > nice).
> 
> You can always build the initrams into the kernel image, I don't
> see how that is bloat.

Using initramfs implies that we would need a 2nd copy of the standard C
library and libfdt, both alone will already occupy more than just a
single 64kB block of flash. I understand that from the point of view of
classic x86 servers or even embedded devices with eMMC this seems
negligible. However, keep in mind that a huge number of existing
devices and also new designs of embedded devices often boot from just a
few megabytes of NOR flash, and there every byte counts.

> 
> > Carrying the read-only squashfs filesystem inside the uImage.FIT
> > structure has the advantage of being agnostic regarding the
> > storage-type (NOR/mtdblockX, NAND/ubiblockX, MMC/mmcblkXpY) and allows
> > the bootloader to validate the filesystem hash before starting the
> > kernel, ie. ensuring integrity of the firmware as-a-whole which
> > includes the root filesystem.
> 
> What is the point of the uImage.FIT?

It is the format used by Das U-Boot, which is by far the most common
bootloader found on small embedded devices running Linux.
Is is already used by Das U-Boot to validate and load kernel,
devicetree, initramfs, ... to RAM before launching Linux.
I've included a link to the documentation[1] which gives insights
regarding the motivation to create such a format.

> Why can't you use a sane format?

Define "sane format". :) And tell that to the people over at Das U-Boot
please. I'm just trying to make best use of the status-quo which is
that uImage.FIT is the de-facto standard to boot Linux on small
embedded devices.

> Or at least not use it on top of partitions, which is just
> braindead.

In fact, that's only one out of three possible uses in which parsing
the contained sub-image boundaries can be useful:
 * On devices with NOR flash uImage.FIT is stored in an MTD partition,
   hence the uImage.FIT partition parser (or small stackable block
   driver) would then operate on top of /dev/mtdblockX.

 * On devices with NAND flash uImage.FIT is stored in a UBI volume,
   hence in this case /dev/ubiblockX needs to be processed.

 * On devices with block storage (like eMMC or SD card) the image is
   stored on a (GPT) partition (/dev/mmcblkXpY).

Generally speaking, when it comes to storing the rootfs (typically
squashfs), this can be solved in two ways:
 a) keep the rootfs inside the uImage.FIT structure (to be then dealt
    with by this patch series)

 b) store the rootfs on an additional partition or volume:
    - additional GPT partition on block devices
    - additional MTD partition on devices with NOR flash
    - additional UBI volume on devices with NAND flash

Now lets look at the consequences of each approach:

I.
It is often a requirement that the bootloader shall decide about the
integrity of the firmware before starting it (ie. verify hashes of
kernel, dtb, **and rootfs**). In case of approch b) this will have to
be implemented in a custom, platform-specific way; many vendors do
things like this, resulting in huge variety of different, customs ways
which let the bootloader validate the rootfs. As a huge majority of
devices actually use Das U-Boot which already supports validating
uImage.FIT content, this becomes a no-brainer when using approach a).

II.
Updating the firmware in case of approach a) being used is very simple:
Just write the whole image to a storage volume. In case of approach b)
an archive or tarball has to be used, allowing kernel.uImage and rootfs
to each be written to different storage volumes. And for the special
case of NOR flash, developers came up with a complete out-of-tree
subsystem handling only the splitting of MTD partitions in the various
ways used by vendor-modified bootloaders [2].

III.
Approach a) allows the very same uImage.FIT binary to be read by the
bootloader, used as firmware-upgrade format within Linux userland and
also serve as a recovery image format which can be loaded e.g. via TFTP
and stored by the bootloader. Approach b) will require different
formats for firmware upgrades (eg. a tarball) and storage on-disk,
depending on the platform and type of storage media. And when it comes
to bootloader-level recovery via TFTP, one would have to implement
extraction of that tarball **by the bootloader** or use yet another
format.

IV.
When it comes to devices allowing to boot from different media,
approach a) allows using the exact same firmware image independently of
the storage media used for booting. When using approach b), different
firmware images have to be provided **for the same device**, depending
on whether a block device, NAND/UBI or NOR/MTD is used for booting.
The BananaPi BPi-R64 and more recent BananaPi BPi-R3 boards are good
examples for boards which allow booting of different media, and are
already using uImage.FIT as a unified format.

I hope this explains my motivation. Please ask should there by any
doubts or if any of my explainations above are not clear.

> Who actually came up with this amazingly convoluted and
> annoying scheme and why?

This may answer your question:
[1]: https://source.denx.de/u-boot/u-boot/-/blob/master/doc/uImage.FIT/howto.txt

[2]: https://git.openwrt.org/?p=openwrt/openwrt.git;a=tree;f=target/linux/generic/files/drivers/mtd/mtdsplit

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c
  2022-12-12 11:02             ` Daniel Golle
@ 2022-12-13  6:48               ` Christoph Hellwig
  2022-12-13 12:45                 ` Daniel Golle
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2022-12-13  6:48 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Christoph Hellwig, Richard Weinberger, Matthew Wilcox,
	Jens Axboe, Martin K. Petersen, Chaitanya Kulkarni, Wolfram Sang,
	linux-block, linux-kernel

On Mon, Dec 12, 2022 at 11:02:33AM +0000, Daniel Golle wrote:
> The thing is that there isn't anything extraordinarily complex here,
> just dynamically partitioning a block device based on a well-known
> format.

Yes, but a completely non-standard format that nests inside an
partition.

> Using initramfs implies that we would need a 2nd copy of the standard C
> library and libfdt, both alone will already occupy more than just a
> single 64kB block of flash.

Why do you need libfdt?  And with a simple statically linked kpartx
you won't pull in much of libc either.

> I understand that from the point of view of
> classic x86 servers or even embedded devices with eMMC this seems
> negligible. However, keep in mind that a huge number of existing
> devices and also new designs of embedded devices often boot from just a
> few megabytes of NOR flash, and there every byte counts.

So I've worked quite a bit on really small deeply embedded systems,
and for those I wouldn't even think of using strange image formats
or the rather wasteful GPT partition format.
There we wouldn't dare to use paritions or weird image formats, but

> > What is the point of the uImage.FIT?
> 
> It is the format used by Das U-Boot, which is by far the most common
> bootloader found on small embedded devices running Linux.
> Is is already used by Das U-Boot to validate and load kernel,
> devicetree, initramfs, ... to RAM before launching Linux.
> I've included a link to the documentation[1] which gives insights
> regarding the motivation to create such a format.

That doesn't explain why you'd want to use it.  Nor how people
came up with it.

> In fact, that's only one out of three possible uses in which parsing
> the contained sub-image boundaries can be useful:
>  * On devices with NOR flash uImage.FIT is stored in an MTD partition,
>    hence the uImage.FIT partition parser (or small stackable block
>    driver) would then operate on top of /dev/mtdblockX.
> 
>  * On devices with NAND flash uImage.FIT is stored in a UBI volume,
>    hence in this case /dev/ubiblockX needs to be processed.

And all the mtdblock / ubiblock is due to the lack of a native
mtd/ubi backend for squashfs?  Why can't we take the block layer
out of the loop entirely?

> I hope this explains my motivation. Please ask should there by any
> doubts or if any of my explainations above are not clear.

None of this explains the silly nesting inside the GPT partition.
It is not needed for the any use cases and the root probem here.
Without that you could simply implement a parition format, with that
you get into crazy nesting behavior.  Note that it would have
any benefit over just not doing this silly image.

Maybe someone just needs to go back and come up wit ha scheme that
actually works and implement that in uboot as well.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c
  2022-12-13  6:48               ` Christoph Hellwig
@ 2022-12-13 12:45                 ` Daniel Golle
  2022-12-14 16:43                   ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Golle @ 2022-12-13 12:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Richard Weinberger, Matthew Wilcox, Jens Axboe,
	Martin K. Petersen, Chaitanya Kulkarni, Wolfram Sang,
	linux-block, linux-kernel

On Mon, Dec 12, 2022 at 10:48:12PM -0800, Christoph Hellwig wrote:
> On Mon, Dec 12, 2022 at 11:02:33AM +0000, Daniel Golle wrote:
> > The thing is that there isn't anything extraordinarily complex here,
> > just dynamically partitioning a block device based on a well-known
> > format.
> 
> Yes, but a completely non-standard format that nests inside an
> partition.

The reason for this current discussion (see subject line) is exactly
that you didn't like the newly introduced partition type GUID which
then calls the newly introduced partition parser taking care of the
uImage.FIT content of a partition.
Instead you suggested to implement it as a small stackable block
driver, which I did (and will post as RFC in a moment from now).

This block driver (if built-into the kernel and relied upon to expose
the block device used as root filesystem) will need to identify the
lower device it should work on. And for that the helper functions such
as devt_from_devname() need to be available for that driver.

> 
> > Using initramfs implies that we would need a 2nd copy of the standard C
> > library and libfdt, both alone will already occupy more than just a
> > single 64kB block of flash.
> 
> Why do you need libfdt?

uImage.FIT is based on FDT, so the easiest way to parse this format
without reinventing the wheel is using libfdt.

> And with a simple statically linked kpartx you won't pull in much of
> libc either.

Well, enough to have kpartx and libfdt covered. I still assume that's
more than 64KiB of compressed bytecode in total.

> 
> > I understand that from the point of view of
> > classic x86 servers or even embedded devices with eMMC this seems
> > negligible. However, keep in mind that a huge number of existing
> > devices and also new designs of embedded devices often boot from just a
> > few megabytes of NOR flash, and there every byte counts.
> 
> So I've worked quite a bit on really small deeply embedded systems,
> and for those I wouldn't even think of using strange image formats
> or the rather wasteful GPT partition format.
> There we wouldn't dare to use paritions or weird image formats, but

Well, the reality as of today is that even the smallest of all embedded
devices will need updated firmware at some point.

Using an image format which allows to dynamically partition a firmware
image into kernel and rootfs parts is needed **especially** on devices
with very little storage space -- it's there that we cannot afford to
use hard-coded MTD partition sizes either limiting or overestimating
the future kernel size.

And having a unified format which serves a whole range of devices does
make that easier -- surely, on NOR flash, there would not be a GPT
partition; mtdblock on top of an MTD partition is used in that case.

> 
> > > What is the point of the uImage.FIT?
> > 
> > It is the format used by Das U-Boot, which is by far the most common
> > bootloader found on small embedded devices running Linux.
> > Is is already used by Das U-Boot to validate and load kernel,
> > devicetree, initramfs, ... to RAM before launching Linux.
> > I've included a link to the documentation[1] which gives insights
> > regarding the motivation to create such a format.
> 
> That doesn't explain why you'd want to use it.  Nor how people
> came up with it.

I didn't want to quote all the history here, but it is explained rather
well in their documentation I had linked there.

> 
> > In fact, that's only one out of three possible uses in which parsing
> > the contained sub-image boundaries can be useful:
> >  * On devices with NOR flash uImage.FIT is stored in an MTD partition,
> >    hence the uImage.FIT partition parser (or small stackable block
> >    driver) would then operate on top of /dev/mtdblockX.
> > 
> >  * On devices with NAND flash uImage.FIT is stored in a UBI volume,
> >    hence in this case /dev/ubiblockX needs to be processed.
> 
> And all the mtdblock / ubiblock is due to the lack of a native
> mtd/ubi backend for squashfs?  Why can't we take the block layer
> out of the loop entirely?

A block representation is the common denominator of all the
above. Sure, I could implement splitting MTD devices according to
uImage.FIT and then add MTD support to squashfs. Then implement
splitting of UBI volumes and add UBI support to squashfs.

However, simply using mtdblock on NOR and ubiblock on NAND seem
well-suited and hence there wasn't ever a need to implement
MTD or UBI support for any **read-only** filesystem.
Afaik that is what mtdblock and ubiblock are intended to be used
for.

> > I hope this explains my motivation. Please ask should there by any
> > doubts or if any of my explainations above are not clear.
> 
> None of this explains the silly nesting inside the GPT partition.
> It is not needed for the any use cases and the root probem here.

So where would you store the uImage (which will have to exist
even to just load kernel and DTB in U-Boot, even without containing
the root filesystem) on devices with eMMC then?

I did consider block2mtd on eMMC and gluebi (which is deprecated) on
UBI, but in order to mount the filesystem I'd then use mtdblock further
down the road, and that'd actually be silly.

> Without that you could simply implement a parition format, with that
> you get into crazy nesting behavior.  Note that it would have
> any benefit over just not doing this silly image.

Are you suggesting to come up with an entirely new type of partition
table only for that purpose? Which will require its own tools and
implementation in both, U-Boot and Linux? What would be the benefit
over just using GPT partitioning?

I'm not saying that this cannot be done (and I will do it if you
convince me that it'd be needed).

> 
> Maybe someone just needs to go back and come up wit ha scheme that
> actually works and implement that in uboot as well.

In terms of "works" both are working very well, the uImage.FIT
partition parser as well as the tiny stackable block driver you had
asked me to write instead. I've tested both on multiple devices by now,
the former is even used in production on one of the currently most
popular devices running OpenWrt (according to update backend stats).


Thank you for your patience!


Daniel

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c
  2022-12-13 12:45                 ` Daniel Golle
@ 2022-12-14 16:43                   ` Christoph Hellwig
  2022-12-14 20:01                     ` Daniel Golle
  0 siblings, 1 reply; 20+ messages in thread
From: Christoph Hellwig @ 2022-12-14 16:43 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Christoph Hellwig, Richard Weinberger, Matthew Wilcox,
	Jens Axboe, Martin K. Petersen, Chaitanya Kulkarni, Wolfram Sang,
	linux-block, linux-kernel

On Tue, Dec 13, 2022 at 12:45:27PM +0000, Daniel Golle wrote:
> > Yes, but a completely non-standard format that nests inside an
> > partition.
> 
> The reason for this current discussion (see subject line) is exactly
> that you didn't like the newly introduced partition type GUID which
> then calls the newly introduced partition parser taking care of the
> uImage.FIT content of a partition.

Which is the exact nesting I'm complaining about.  Why do you need
to use your format inside a GPT partition table?  What you're doing
is bascially nesting a partition table format inside another one,
which doesn't make any sense at all.

> This block driver (if built-into the kernel and relied upon to expose
> the block device used as root filesystem) will need to identify the
> lower device it should work on. And for that the helper functions such
> as devt_from_devname() need to be available for that driver.

And devt_from_devname must not be used by more non-init code.  It is
bad it got exposed at all, but new users are not acceptable.

> A block representation is the common denominator of all the
> above. Sure, I could implement splitting MTD devices according to
> uImage.FIT and then add MTD support to squashfs. Then implement
> splitting of UBI volumes and add UBI support to squashfs.

Implementing MTD and/or UBI support would allow you to build a
kernel without CONFIG_BLOCK, which will save you a lot more than
the 64k you were whining about above.

> > None of this explains the silly nesting inside the GPT partition.
> > It is not needed for the any use cases and the root probem here.
> 
> So where would you store the uImage (which will have to exist
> even to just load kernel and DTB in U-Boot, even without containing
> the root filesystem) on devices with eMMC then?

Straight on the block device, where else?

> Are you suggesting to come up with an entirely new type of partition
> table only for that purpose? Which will require its own tools and
> implementation in both, U-Boot and Linux? What would be the benefit
> over just using GPT partitioning?

Why do you need another layer of partitioning instead of storing
all your information either in the uImage, or in some other
partition format of your choice?

See, if you have GPT, DOS or whatever partitions, you just use
partitions and store all the bits your care about in them.
If you want a fancy not invented here syndrome image format you use that.
But don't use both.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c
  2022-12-14 16:43                   ` Christoph Hellwig
@ 2022-12-14 20:01                     ` Daniel Golle
  2022-12-15  8:09                       ` Christoph Hellwig
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Golle @ 2022-12-14 20:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Richard Weinberger, Matthew Wilcox, Jens Axboe,
	Martin K. Petersen, Chaitanya Kulkarni, Wolfram Sang,
	linux-block, linux-kernel

On Wed, Dec 14, 2022 at 08:43:29AM -0800, Christoph Hellwig wrote:
> On Tue, Dec 13, 2022 at 12:45:27PM +0000, Daniel Golle wrote:
> > > Yes, but a completely non-standard format that nests inside an
> > > partition.
> > 
> > The reason for this current discussion (see subject line) is exactly
> > that you didn't like the newly introduced partition type GUID which
> > then calls the newly introduced partition parser taking care of the
> > uImage.FIT content of a partition.
> 
> Which is the exact nesting I'm complaining about.  Why do you need
> to use your format inside a GPT partition table?

The GPT partition table is typically written only once to an eMMC-
based device in factory. Firmware images (typically uImage.FIT) are
stored in partitions because there are sometimes two of them (for
A/B dual-boot, or recovery/production dual-boot).

As a working device firmware consists of kernel, dtb and rootfs, all
these three things have to be written and used together, typically
they also come together in one file for firmware upgrade (ie.
rootfs appended to kernel, tarballs, or uImage.FIT containing all three
of them).

As the size of kernel and rootfs cannot be determined accurately at
the time the device is made, having individual GPT partitions for
kernel and rootfs ends up to either being a limit to future groth of
the kernel image or wastes space by overestimating the kernel size.

Changing the GPT partitioning when updating the device to match the
exact sizes is also not an option as a damage to the GPT would then
present a single point of failure (backup GPT also wouldn't help
much here), so for dual-boot to actually be meaningful, we shouldn't
ever write to any parts of the disk/flash which affect more than one
of the dual-boot options.

> What you're doing is bascially nesting a partition table format
> inside another one, which doesn't make any sense at all.

See the last paragraph of this message for good reasons why one would
want to do exactly that.

> 
> > This block driver (if built-into the kernel and relied upon to expose
> > the block device used as root filesystem) will need to identify the
> > lower device it should work on. And for that the helper functions such
> > as devt_from_devname() need to be available for that driver.
> 
> And devt_from_devname must not be used by more non-init code.  It is
> bad it got exposed at all, but new users are not acceptable.

I assume that implementing anything similar using blk_lookup_devt in
the driver itself is then also not acceptable, right?

Yet another option would be to implement a way to acquire this
information from device tree. Ie. have a reference to the disk device
as well as an unsigned integer in the 'chosen' node which the
bootloader can use to communicate this to the kernel. Example:
chosen {
	bootdev = <&mmc0 3>;
};

It's a bit more tricky for ubiblock or mtdblock devices because they
don't have *any* representation in device tree at all at this point.

In case of an MTD partition (for mtdblockX) we would just reference
the mtd partition in the same way.

To do this cleanly with NAND/UBI, I'd start with adding
device-tree-based attaching of an MTD partition to UBI using a
device-tree attribute 'compatible = "linux,ubi"' on the MTD partition.
We could then have sub-nodes referencing specific UBI volumes, to
select them for use with ubiblock but also for those nodes then
being a valid reference for use with the to-be-introduced 'bootdev'
attribute in 'chosen'.

Does that sound acceptable from your perspective?

> 
> > A block representation is the common denominator of all the
> > above. Sure, I could implement splitting MTD devices according to
> > uImage.FIT and then add MTD support to squashfs. Then implement
> > splitting of UBI volumes and add UBI support to squashfs.
> 
> Implementing MTD and/or UBI support would allow you to build a
> kernel without CONFIG_BLOCK, which will save you a lot more than
> the 64k you were whining about above.

Even devices with NOR flash may still want support for removable
block devices like USB pendrives or SD cards... Many home-routers
got only 8MiB of NOR flash and yet come with USB 2.0 ports intended
for a pendrive which is then shared via Samba.

Also, heavily customzied per-device kernel builds would never
scale up to support thousands of devices -- hence OpenWrt uses the
exact same kernel build for many devices, which makes both, the
build process and also debugging kernel bugs, much much easier (or
even doable at all).

> 
> > > None of this explains the silly nesting inside the GPT partition.
> > > It is not needed for the any use cases and the root probem here.
> > 
> > So where would you store the uImage (which will have to exist
> > even to just load kernel and DTB in U-Boot, even without containing
> > the root filesystem) on devices with eMMC then?
> 
> Straight on the block device, where else?

As the first few blocks are typically used for bootloader code and
bootloader environment, we would then need to hard-code the offset(s)
of the uImage.FIT on the block device. Imho this becomes messy and just
using partitions seemed like a straight forward solution.

And what about dual-boot systems where you have more than one firmware
image? Hard-code more offsets? For each device?

In a way, I was considering this by using blkdevparts= cmdline option
instead of GPT, but 

> 
> > Are you suggesting to come up with an entirely new type of partition
> > table only for that purpose? Which will require its own tools and
> > implementation in both, U-Boot and Linux? What would be the benefit
> > over just using GPT partitioning?
> 
> Why do you need another layer of partitioning instead of storing
> all your information either in the uImage, or in some other
> partition format of your choice?

The reason is the different life-cycle of the device main partition
table, bootloader, bootloader environment, ... on one hand and each
firmware image on a dual boot system on the other hand.
Hence there is more than just one uImage: typically bootloader,
bootloader environment, firmware A (uImage.FIT) and firmware B.
Relace "A" and "B" with "recovery" and "production", depending on the
dual-boot style implemented.

Therefore re-writing the whole disk during firmware upgrades is not an
option because it is risky, eg. in case of a sudden power failure we
could end up with a hard-bricked system. So to me it makes sense that
for a firmware upgrade, we write only to one partition and don't touch
GPT or anything else on the device. So in case something goes wrong,
the device will still boot, the bootloader will realize that the
uImage.FIT in one partition is broken (uImage.FIT also comes with
hashes to ensure image integrity) and it will load something else (from
another partition) instead.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c
  2022-12-14 20:01                     ` Daniel Golle
@ 2022-12-15  8:09                       ` Christoph Hellwig
  0 siblings, 0 replies; 20+ messages in thread
From: Christoph Hellwig @ 2022-12-15  8:09 UTC (permalink / raw)
  To: Daniel Golle
  Cc: Christoph Hellwig, Richard Weinberger, Matthew Wilcox,
	Jens Axboe, Martin K. Petersen, Chaitanya Kulkarni, Wolfram Sang,
	linux-block, linux-kernel

On Wed, Dec 14, 2022 at 08:01:48PM +0000, Daniel Golle wrote:
> I assume that implementing anything similar using blk_lookup_devt in
> the driver itself is then also not acceptable, right?

No.  blk_lookup_devt is a workaround for the early init code where
the file system is not available yet.  The only reasonable use case
is the early init code, but even that is better handled by an initramfs.

> Yet another option would be to implement a way to acquire this
> information from device tree. Ie. have a reference to the disk device
> as well as an unsigned integer in the 'chosen' node which the
> bootloader can use to communicate this to the kernel. Example:
> chosen {
> 	bootdev = <&mmc0 3>;
> };
> 
> It's a bit more tricky for ubiblock or mtdblock devices because they
> don't have *any* representation in device tree at all at this point.
> 
> In case of an MTD partition (for mtdblockX) we would just reference
> the mtd partition in the same way.
> 
> To do this cleanly with NAND/UBI, I'd start with adding
> device-tree-based attaching of an MTD partition to UBI using a
> device-tree attribute 'compatible = "linux,ubi"' on the MTD partition.
> We could then have sub-nodes referencing specific UBI volumes, to
> select them for use with ubiblock but also for those nodes then
> being a valid reference for use with the to-be-introduced 'bootdev'
> attribute in 'chosen'.
> 
> Does that sound acceptable from your perspective?

Device tree questions are really out of my knowledge and domain, the
right people to talk to would be the device-tree and relevant driver
maintainers. 

> Even devices with NOR flash may still want support for removable
> block devices like USB pendrives or SD cards... Many home-routers
> got only 8MiB of NOR flash and yet come with USB 2.0 ports intended
> for a pendrive which is then shared via Samba.

Ok, so we're not _that_ resource contrained at all, because for the
really small devices with just a few MB of ram, not having the block
layer makes a huge difference.

> As the first few blocks are typically used for bootloader code and
> bootloader environment, we would then need to hard-code the offset(s)
> of the uImage.FIT on the block device. Imho this becomes messy and just
> using partitions seemed like a straight forward solution.

Ok, so the whole DOS-World boot load nightmare has spread to these
devices as well.

> And what about dual-boot systems where you have more than one firmware
> image? Hard-code more offsets? For each device?

Is this for a fail save use case where on image is update while the
other on is in use?  I've usually seen people use two different NOR
chips for that to have full error isolation.

> The reason is the different life-cycle of the device main partition
> table, bootloader, bootloader environment, ... on one hand and each
> firmware image on a dual boot system on the other hand.

Oh, so the firmware image doesn't even include the bootloader, but the
bootload is on the same device?  That just seems like a very odd
setup, but I'll take your word for it being used.

> Therefore re-writing the whole disk during firmware upgrades is not an
> option because it is risky, eg. in case of a sudden power failure we
> could end up with a hard-bricked system. So to me it makes sense that
> for a firmware upgrade, we write only to one partition and don't touch
> GPT or anything else on the device. So in case something goes wrong,
> the device will still boot, the bootloader will realize that the
> uImage.FIT in one partition is broken (uImage.FIT also comes with
> hashes to ensure image integrity) and it will load something else (from
> another partition) instead.

I'm just really confused about this whole setup.  Maybe it's just
because what I've worked with, which is either really deeply embededd
devices where the everything is one image, that you might in some case
have twice, or a full blown UEFI-like setup where the boot loader is
separate, but can simply load a kernel from the actual root file system.

I have to say even after all these round trip I'm still not sure
what problem where really solving. In basically all other environments
that have a powerful enough bootloader to read file systems your rootfs
would simply store the kernel image and dtb if they are interdependent.
Is the root cause that uboot doesn't support reading files from squashfs
despite supporting half a dozen other file systems?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c
  2022-11-22 12:37       ` Christoph Hellwig
  2022-12-09 17:00         ` Daniel Golle
@ 2023-03-30 14:43         ` Daniel Golle
  1 sibling, 0 replies; 20+ messages in thread
From: Daniel Golle @ 2023-03-30 14:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Richard Weinberger, Matthew Wilcox, Jens Axboe,
	Martin K. Petersen, Chaitanya Kulkarni, Wolfram Sang,
	linux-block, linux-kernel

Hi Christoph,

On Tue, Nov 22, 2022 at 04:37:08AM -0800, Christoph Hellwig wrote:
> On Sat, Nov 19, 2022 at 04:03:11PM +0000, Daniel Golle wrote:
> > [...]
> > Yet another (imho not terrible) problem is removal of the lower device.
> > Many of the supported SBC use a micro SD card to boot, which can be
> > removed by the user while the system is running (which is generally not
> > a good idea, but anyway). For partitions this is handled automatically
> > by blk_drop_partitions() called directly from genhd.c.
> > I'm currently playing with doing something similar using the bus device
> > removal notification, but it doesn't seem to work for all cases, e.g.
> > mmcblk device do not seem to have the ->bus pointer populated at all
> > (ie. disk_to_dev(disk)->bus == NULL for mmcblk devices).
> 
> I have WIP patches that allow the claimer of a block device get
> resize and removal notification.  It's not going to land for 6.2,
> but I hope I have it ready in time for the next merge window.

Any news about that patchset? I'd happily review, test and use it ;)

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2023-03-30 14:43 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-17  0:42 [RFC PATCH 0/4] block: uImage.FIT filesystem image mapper Daniel Golle
2022-11-17  0:42 ` [RFC PATCH 1/4] init: move block device helpers from init/do_mounts.c Daniel Golle
2022-11-17  5:55   ` Christoph Hellwig
2022-11-19 16:03     ` Daniel Golle
2022-11-22 12:37       ` Christoph Hellwig
2022-12-09 17:00         ` Daniel Golle
2022-12-12  9:03           ` Christoph Hellwig
2022-12-12 11:02             ` Daniel Golle
2022-12-13  6:48               ` Christoph Hellwig
2022-12-13 12:45                 ` Daniel Golle
2022-12-14 16:43                   ` Christoph Hellwig
2022-12-14 20:01                     ` Daniel Golle
2022-12-15  8:09                       ` Christoph Hellwig
2023-03-30 14:43         ` Daniel Golle
2022-11-17  0:43 ` [RFC PATCH 2/4] block: add new flag to add partitions read-only Daniel Golle
2022-11-17  0:44 ` [RFC PATCH 3/4] blkdev: add function to add named read-only partitions Daniel Golle
2022-11-17  5:56   ` Christoph Hellwig
2022-11-17 17:12     ` Daniel Golle
2022-11-17  0:45 ` [RFC PATCH 4/4] block: add uImage.FIT block partition driver Daniel Golle
2022-11-17  5:58   ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.