* [PATCH v4 0/5] cramfs refresh for embedded usage
From: Nicolas Pitre @ 2017-09-27 23:32 UTC (permalink / raw)
  To: Alexander Viro, linux-mm
  Cc: linux-fsdevel, linux-embedded, linux-kernel, Chris Brandt

To memory management people: please review patch #4 of this series.

This series brings a nice refresh to the cramfs filesystem, adding the
following capabilities:

- Direct memory access, bypassing the block and/or MTD layers entirely.

- Ability to store individual data blocks uncompressed.

- Ability to locate individual data blocks anywhere in the filesystem.

The end result is a very tight filesystem that can be accessed directly
from ROM without any other subsystem underneath. This also allows for
user space XIP which is a very important feature for tiny embedded
systems.

This series is also available based on v4.13 via git here:

  http://git.linaro.org/people/nicolas.pitre/linux xipcramfs

Why cramfs?

  Because cramfs is very simple and small. With CONFIG_CRAMFS_BLOCKDEV=n and
  CONFIG_CRAMFS_PHYSMEM=y the cramfs driver may use as little as 3704 bytes
  of code. That's many times smaller than squashfs. And the runtime memory
  usage is also much less with cramfs than squashfs. It packs very tightly
  already compared to romfs which has no compression support. And the cramfs
  format was simple to extend, allowing for both compressed and uncompressed
  blocks within the same file.

Why not access ROM via MTD?

  The MTD layer is nice and flexible. It also represents a huge overhead,
  considering that its core alone, with no other options enabled, weighs
  19KB. That's many times the size of the cramfs code for something that
  essentially boils down to a glorified argument parser and a call to
  memremap() in this case. And if someone still wants to use cramfs via
  MTD, that is already possible with mtdblock.
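
To make the "glorified argument parser" remark concrete, here is a minimal
user-space sketch of the kind of validation the physmem mount path applies
to its physaddr= option. It is not the kernel code: strtoull() stands in
for the kernel's memparse(), and parse_physaddr() and SKETCH_PAGE_SIZE are
names made up for this illustration.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 4 KiB pages; the kernel uses the real PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Hypothetical helper: extract and validate "physaddr=" from a mount
 * option string. Returns 0 and stores the address on success, or
 * -EINVAL if the option is missing, zero, or not page aligned.
 */
static int parse_physaddr(const char *opts, uint64_t *addr)
{
	const char *p;
	uint64_t val;

	if (!opts)
		return -EINVAL;
	p = strstr(opts, "physaddr=");
	if (!p)
		return -EINVAL;
	/* strtoull() stands in for the kernel's memparse() here. */
	val = strtoull(p + strlen("physaddr="), NULL, 0);
	if (!val)
		return -EINVAL;
	if (val & (SKETCH_PAGE_SIZE - 1))
		return -EINVAL;
	*addr = val;
	return 0;
}
```

Once the address passes these checks, the only remaining work is the
memremap() call itself.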

Why not use DAX?

  DAX stands for "Direct Access" and is a generic kernel layer helping
  with the necessary tasks involved with XIP. It is tailored for large
  writable filesystems and relies on the presence of an MMU. It also has
  the following shortcoming: "The DAX code does not work correctly on
  architectures which have virtually mapped caches such as ARM, MIPS and
  SPARC." That makes it unsuitable for a large portion of the intended
  targets for this series. And due to the read-only nature of cramfs, it is
  possible to achieve the intended result with a much simpler approach making
  DAX somewhat overkill in this context.

A cramfs image can't exceed 272MB, and in practice it is likely to be
much smaller. Given that this series is concerned with small memory
systems, even in the MMU case there is always plenty of vmalloc space
left to map it all, and even a 272MB memremap() wouldn't be a problem.
If it is, then your system probably has resources large enough that you
are unlikely to be using cramfs in the first place.
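
The 272MB figure is not derived in this message. One plausible reading,
assuming the field widths from include/uapi/linux/cramfs_fs.h (a 26-bit
offset counted in 4-byte units and a 24-bit file size), is that data may
start anywhere below 256MB and the last file may extend up to 16MB past
that point. The helper name cramfs_max_image_bytes() is made up for this
sketch.

```c
#include <stdint.h>

/* Field widths as defined in include/uapi/linux/cramfs_fs.h. */
#define CRAMFS_SIZE_WIDTH   24
#define CRAMFS_OFFSET_WIDTH 26

/* Upper bound on an image size implied by those widths, in bytes. */
static uint64_t cramfs_max_image_bytes(void)
{
	/* Offsets are stored in 4-byte units, hence the extra << 2. */
	uint64_t max_data_start = (UINT64_C(1) << CRAMFS_OFFSET_WIDTH) << 2;
	uint64_t max_file_size = UINT64_C(1) << CRAMFS_SIZE_WIDTH;

	return max_data_start + max_file_size;
}
```

With these widths the sum is 256MB + 16MB = 272MB.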

Of course, while this cramfs remains backward compatible with existing
filesystem images, a newer mkcramfs version is necessary to take advantage
of the extended data layout. I created a version of mkcramfs that
detects ELF files and marks text+rodata segments for XIP and compresses the
rest of those ELF files automatically.
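
How the modified mkcramfs makes that decision is not shown in this series.
As a rough sketch only, a tool could recognise ELF files by their magic
bytes and treat loadable segments that are not writable as XIP candidates.
The function names and the policy below are assumptions for illustration;
only the constant values come from the ELF specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard ELF constants (values from the ELF specification). */
#define SK_PT_LOAD 1u   /* loadable segment */
#define SK_PF_W    0x2u /* segment is writable */

/* True if the buffer starts with the ELF magic bytes 0x7f 'E' 'L' 'F'. */
static bool looks_like_elf(const unsigned char *buf, size_t len)
{
	return len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0;
}

/*
 * Hypothetical policy: a loadable segment that is never written to
 * (text, rodata) can stay uncompressed for XIP; anything else gets
 * compressed.
 */
static bool segment_is_xip_candidate(uint32_t p_type, uint32_t p_flags)
{
	return p_type == SK_PT_LOAD && !(p_flags & SK_PF_W);
}
```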

So here it is. I'm also willing to step up as cramfs maintainer, given
that there has been no sign of maintenance activity for years.


Changes from v3:

- Rebased on v4.13.
- Made direct access depend on cramfs not being modular due to unexported
  vma handling functions.
- Solicited comments from mm people explicitly.

Changes from v2:

- Plugged a few races in cramfs_vmasplit_fault(). Thanks to Al Viro for
  highlighting them.
- Fixed some checkpatch warnings.

Changes from v1:

- Improved mmap() support by adding the ability to partially populate a
  mapping and lazily split the pages that can't be mapped directly into
  a separate vma at fault time (thanks to Chris Brandt for testing).
- Clarified the documentation some more.


diffstat:

 Documentation/filesystems/cramfs.txt |  42 ++
 MAINTAINERS                          |   4 +-
 fs/cramfs/Kconfig                    |  38 +-
 fs/cramfs/README                     |  31 +-
 fs/cramfs/inode.c                    | 646 ++++++++++++++++++++++++++---
 include/uapi/linux/cramfs_fs.h       |  20 +-
 init/do_mounts.c                     |   8 +
 7 files changed, 712 insertions(+), 77 deletions(-)

* [PATCH v4 1/5] cramfs: direct memory access support
From: Nicolas Pitre @ 2017-09-27 23:32 UTC (permalink / raw)
  To: Alexander Viro, linux-mm
  Cc: linux-fsdevel, linux-embedded, linux-kernel, Chris Brandt

Small embedded systems typically execute the kernel code in place (XIP)
directly from flash to save on precious RAM usage. This adds the ability
to consume filesystem data directly from flash to the cramfs filesystem
as well. Cramfs is particularly well suited to this feature as it is
very simple and its RAM usage is already very low, and with this feature
it is possible to use it with no block device support and even lower RAM
usage.

This patch was inspired by a similar patch from Shane Nay dated 17 years
ago that used to be very popular in embedded circles but never made it
into mainline. This is a cleaned-up implementation that uses far fewer
memory address at run time when both methods are configured in. In the
context of small IoT deployments, this functionality has become relevant
and useful again.

To distinguish between both access types, the cramfs_physmem filesystem
type must be specified when using a memory accessible cramfs image, and
the physaddr argument must provide the actual filesystem image's physical
memory location.
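
For illustration, the same mount can be requested from C through mount(2).
This is a minimal sketch assuming a Linux system with sufficient privileges
and a kernel built with CONFIG_CRAMFS_PHYSMEM=y; format_physaddr_opt() and
mount_cramfs_physmem() are names invented here, and the address is whatever
the board actually uses.

```c
#include <stdio.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: format the "physaddr=" option string.
 * Returns the snprintf() result (length that was or would be written).
 */
static int format_physaddr_opt(char *buf, size_t size, unsigned long addr)
{
	return snprintf(buf, size, "physaddr=0x%lx", addr);
}

/*
 * Mount a memory-mapped cramfs image read-only on target.
 * Returns 0 on success, -1 on error (errno is set by mount(2)
 * when the failure comes from the system call).
 */
int mount_cramfs_physmem(const char *target, unsigned long addr)
{
	char opts[32];
	int n = format_physaddr_opt(opts, sizeof(opts), addr);

	if (n < 0 || (size_t)n >= sizeof(opts))
		return -1;
	/* "none" is a placeholder source, as in the Kconfig help example. */
	return mount("none", target, "cramfs_physmem", MS_RDONLY, opts);
}
```

Calling mount_cramfs_physmem("/mnt", 0x100000) corresponds to the command
line "mount -t cramfs_physmem -o physaddr=0x100000 none /mnt".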

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Chris Brandt <chris.brandt@renesas.com>
---
 fs/cramfs/Kconfig |  29 +++++-
 fs/cramfs/inode.c | 264 +++++++++++++++++++++++++++++++++++++++++++-----------
 2 files changed, 241 insertions(+), 52 deletions(-)

diff --git a/fs/cramfs/Kconfig b/fs/cramfs/Kconfig
index 11b29d491b..5b4e0b7e13 100644
--- a/fs/cramfs/Kconfig
+++ b/fs/cramfs/Kconfig
@@ -1,6 +1,5 @@
 config CRAMFS
 	tristate "Compressed ROM file system support (cramfs) (OBSOLETE)"
-	depends on BLOCK
 	select ZLIB_INFLATE
 	help
 	  Saying Y here includes support for CramFs (Compressed ROM File
@@ -20,3 +19,31 @@ config CRAMFS
 	  in terms of performance and features.
 
 	  If unsure, say N.
+
+config CRAMFS_BLOCKDEV
+	bool "Support CramFs image over a regular block device" if EXPERT
+	depends on CRAMFS && BLOCK
+	default y
+	help
+	  This option allows the CramFs driver to load data from a regular
+	  block device such as a disk partition or a ramdisk.
+
+config CRAMFS_PHYSMEM
+	bool "Support CramFs image directly mapped in physical memory"
+	depends on CRAMFS
+	default y if !CRAMFS_BLOCKDEV
+	help
+	  This option allows the CramFs driver to load data directly from
+	  a linearly addressed memory range (usually non-volatile memory
+	  like flash) instead of going through the block device layer.
+	  This saves some memory since no intermediate buffering is
+	  necessary.
+
+	  The filesystem type for this feature is "cramfs_physmem".
+	  The location of the CramFs image in memory is board
+	  dependent. Therefore, if you say Y, you must know the proper
+	  physical address where to store the CramFs image and specify
+	  it using the physaddr=0x******** mount option (for example:
+	  "mount -t cramfs_physmem -o physaddr=0x100000 none /mnt").
+
+	  If unsure, say N.
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 7919967488..19f464a214 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -24,6 +24,7 @@
 #include <linux/mutex.h>
 #include <uapi/linux/cramfs_fs.h>
 #include <linux/uaccess.h>
+#include <linux/io.h>
 
 #include "internal.h"
 
@@ -36,6 +37,8 @@ struct cramfs_sb_info {
 	unsigned long blocks;
 	unsigned long files;
 	unsigned long flags;
+	void *linear_virt_addr;
+	phys_addr_t linear_phys_addr;
 };
 
 static inline struct cramfs_sb_info *CRAMFS_SB(struct super_block *sb)
@@ -140,6 +143,9 @@ static struct inode *get_cramfs_inode(struct super_block *sb,
  * BLKS_PER_BUF*PAGE_SIZE, so that the caller doesn't need to
  * worry about end-of-buffer issues even when decompressing a full
  * page cache.
+ *
+ * Note: This is all optimized away at compile time when
+ *       CONFIG_CRAMFS_BLOCKDEV=n.
  */
 #define READ_BUFFERS (2)
 /* NEXT_BUFFER(): Loop over [0..(READ_BUFFERS-1)]. */
@@ -160,10 +166,10 @@ static struct super_block *buffer_dev[READ_BUFFERS];
 static int next_buffer;
 
 /*
- * Returns a pointer to a buffer containing at least LEN bytes of
- * filesystem starting at byte offset OFFSET into the filesystem.
+ * Populate our block cache and return a pointer from it.
  */
-static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len)
+static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
+				unsigned int len)
 {
 	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
 	struct page *pages[BLKS_PER_BUF];
@@ -239,7 +245,39 @@ static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned i
 	return read_buffers[buffer] + offset;
 }
 
-static void cramfs_kill_sb(struct super_block *sb)
+/*
+ * Return a pointer to the linearly addressed cramfs image in memory.
+ */
+static void *cramfs_direct_read(struct super_block *sb, unsigned int offset,
+				unsigned int len)
+{
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+
+	if (!len)
+		return NULL;
+	if (len > sbi->size || offset > sbi->size - len)
+		return page_address(ZERO_PAGE(0));
+	return sbi->linear_virt_addr + offset;
+}
+
+/*
+ * Returns a pointer to a buffer containing at least LEN bytes of
+ * filesystem starting at byte offset OFFSET into the filesystem.
+ */
+static void *cramfs_read(struct super_block *sb, unsigned int offset,
+			 unsigned int len)
+{
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+
+	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) && sbi->linear_virt_addr)
+		return cramfs_direct_read(sb, offset, len);
+	else if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
+		return cramfs_blkdev_read(sb, offset, len);
+	else
+		return NULL;
+}
+
+static void cramfs_blkdev_kill_sb(struct super_block *sb)
 {
 	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
 
@@ -247,6 +285,16 @@ static void cramfs_kill_sb(struct super_block *sb)
 	kfree(sbi);
 }
 
+static void cramfs_physmem_kill_sb(struct super_block *sb)
+{
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+
+	if (sbi->linear_virt_addr)
+		memunmap(sbi->linear_virt_addr);
+	kill_anon_super(sb);
+	kfree(sbi);
+}
+
 static int cramfs_remount(struct super_block *sb, int *flags, char *data)
 {
 	sync_filesystem(sb);
@@ -254,34 +302,24 @@ static int cramfs_remount(struct super_block *sb, int *flags, char *data)
 	return 0;
 }
 
-static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
+static int cramfs_read_super(struct super_block *sb,
+			     struct cramfs_super *super, int silent)
 {
-	int i;
-	struct cramfs_super super;
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
 	unsigned long root_offset;
-	struct cramfs_sb_info *sbi;
-	struct inode *root;
-
-	sb->s_flags |= MS_RDONLY;
-
-	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
-	if (!sbi)
-		return -ENOMEM;
-	sb->s_fs_info = sbi;
 
-	/* Invalidate the read buffers on mount: think disk change.. */
-	mutex_lock(&read_mutex);
-	for (i = 0; i < READ_BUFFERS; i++)
-		buffer_blocknr[i] = -1;
+	/* We don't know the real size yet */
+	sbi->size = PAGE_SIZE;
 
 	/* Read the first block and get the superblock from it */
-	memcpy(&super, cramfs_read(sb, 0, sizeof(super)), sizeof(super));
+	mutex_lock(&read_mutex);
+	memcpy(super, cramfs_read(sb, 0, sizeof(*super)), sizeof(*super));
 	mutex_unlock(&read_mutex);
 
 	/* Do sanity checks on the superblock */
-	if (super.magic != CRAMFS_MAGIC) {
+	if (super->magic != CRAMFS_MAGIC) {
 		/* check for wrong endianness */
-		if (super.magic == CRAMFS_MAGIC_WEND) {
+		if (super->magic == CRAMFS_MAGIC_WEND) {
 			if (!silent)
 				pr_err("wrong endianness\n");
 			return -EINVAL;
@@ -289,10 +327,10 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
 
 		/* check at 512 byte offset */
 		mutex_lock(&read_mutex);
-		memcpy(&super, cramfs_read(sb, 512, sizeof(super)), sizeof(super));
+		memcpy(super, cramfs_read(sb, 512, sizeof(*super)), sizeof(*super));
 		mutex_unlock(&read_mutex);
-		if (super.magic != CRAMFS_MAGIC) {
-			if (super.magic == CRAMFS_MAGIC_WEND && !silent)
+		if (super->magic != CRAMFS_MAGIC) {
+			if (super->magic == CRAMFS_MAGIC_WEND && !silent)
 				pr_err("wrong endianness\n");
 			else if (!silent)
 				pr_err("wrong magic\n");
@@ -301,34 +339,34 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
 	}
 
 	/* get feature flags first */
-	if (super.flags & ~CRAMFS_SUPPORTED_FLAGS) {
+	if (super->flags & ~CRAMFS_SUPPORTED_FLAGS) {
 		pr_err("unsupported filesystem features\n");
 		return -EINVAL;
 	}
 
 	/* Check that the root inode is in a sane state */
-	if (!S_ISDIR(super.root.mode)) {
+	if (!S_ISDIR(super->root.mode)) {
 		pr_err("root is not a directory\n");
 		return -EINVAL;
 	}
 	/* correct strange, hard-coded permissions of mkcramfs */
-	super.root.mode |= (S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
+	super->root.mode |= (S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
 
-	root_offset = super.root.offset << 2;
-	if (super.flags & CRAMFS_FLAG_FSID_VERSION_2) {
-		sbi->size = super.size;
-		sbi->blocks = super.fsid.blocks;
-		sbi->files = super.fsid.files;
+	root_offset = super->root.offset << 2;
+	if (super->flags & CRAMFS_FLAG_FSID_VERSION_2) {
+		sbi->size = super->size;
+		sbi->blocks = super->fsid.blocks;
+		sbi->files = super->fsid.files;
 	} else {
 		sbi->size = 1<<28;
 		sbi->blocks = 0;
 		sbi->files = 0;
 	}
-	sbi->magic = super.magic;
-	sbi->flags = super.flags;
+	sbi->magic = super->magic;
+	sbi->flags = super->flags;
 	if (root_offset == 0)
 		pr_info("empty filesystem");
-	else if (!(super.flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) &&
+	else if (!(super->flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) &&
 		 ((root_offset != sizeof(struct cramfs_super)) &&
 		  (root_offset != 512 + sizeof(struct cramfs_super))))
 	{
@@ -336,9 +374,18 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
 		return -EINVAL;
 	}
 
+	return 0;
+}
+
+static int cramfs_finalize_super(struct super_block *sb,
+				 struct cramfs_inode *cramfs_root)
+{
+	struct inode *root;
+
 	/* Set it all up.. */
+	sb->s_flags |= MS_RDONLY;
 	sb->s_op = &cramfs_ops;
-	root = get_cramfs_inode(sb, &super.root, 0);
+	root = get_cramfs_inode(sb, cramfs_root, 0);
 	if (IS_ERR(root))
 		return PTR_ERR(root);
 	sb->s_root = d_make_root(root);
@@ -347,6 +394,92 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
 	return 0;
 }
 
+static int cramfs_blkdev_fill_super(struct super_block *sb, void *data, int silent)
+{
+	struct cramfs_sb_info *sbi;
+	struct cramfs_super super;
+	int i, err;
+
+	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
+	if (!sbi)
+		return -ENOMEM;
+	sb->s_fs_info = sbi;
+
+	/* Invalidate the read buffers on mount: think disk change.. */
+	for (i = 0; i < READ_BUFFERS; i++)
+		buffer_blocknr[i] = -1;
+
+	err = cramfs_read_super(sb, &super, silent);
+	if (err)
+		return err;
+	return cramfs_finalize_super(sb, &super.root);
+}
+
+static int cramfs_physmem_fill_super(struct super_block *sb, void *data, int silent)
+{
+	struct cramfs_sb_info *sbi;
+	struct cramfs_super super;
+	char *p;
+	int err;
+
+	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
+	if (!sbi)
+		return -ENOMEM;
+	sb->s_fs_info = sbi;
+
+	/*
+	 * The physical location of the cramfs image is specified as
+	 * a mount parameter.  This parameter is mandatory for obvious
+	 * reasons.  Some validation is made on the phys address but this
+	 * is not exhaustive and we count on the fact that someone using
+	 * this feature is supposed to know what he/she's doing.
+	 */
+	if (!data || !(p = strstr((char *)data, "physaddr="))) {
+		pr_err("unknown physical address for linear cramfs image\n");
+		return -EINVAL;
+	}
+	sbi->linear_phys_addr = memparse(p + 9, NULL);
+	if (!sbi->linear_phys_addr) {
+		pr_err("bad value for cramfs image physical address\n");
+		return -EINVAL;
+	}
+	if (sbi->linear_phys_addr & (PAGE_SIZE-1)) {
+		pr_err("physical address %pap for linear cramfs isn't aligned to a page boundary\n",
+			&sbi->linear_phys_addr);
+		return -EINVAL;
+	}
+
+	/*
+	 * Map only one page for now.  Will remap it when fs size is known.
+	 * Although we'll only read from it, we want the CPU cache to
+	 * kick in for the higher throughput it provides, hence MEMREMAP_WB.
+	 */
+	pr_info("checking physical address %pap for linear cramfs image\n", &sbi->linear_phys_addr);
+	sbi->linear_virt_addr = memremap(sbi->linear_phys_addr, PAGE_SIZE,
+					 MEMREMAP_WB);
+	if (!sbi->linear_virt_addr) {
+		pr_err("ioremap of the linear cramfs image failed\n");
+		return -ENOMEM;
+	}
+
+	err = cramfs_read_super(sb, &super, silent);
+	if (err)
+		return err;
+
+	/* Remap the whole filesystem now */
+	pr_info("linear cramfs image appears to be %lu KB in size\n",
+		sbi->size/1024);
+	memunmap(sbi->linear_virt_addr);
+	sbi->linear_virt_addr = memremap(sbi->linear_phys_addr, sbi->size,
+					 MEMREMAP_WB);
+	if (!sbi->linear_virt_addr) {
+		pr_err("ioremap of the linear cramfs image failed\n");
+		return -ENOMEM;
+	}
+
+	return cramfs_finalize_super(sb, &super.root);
+}
+
 static int cramfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct super_block *sb = dentry->d_sb;
@@ -573,38 +706,67 @@ static const struct super_operations cramfs_ops = {
 	.statfs		= cramfs_statfs,
 };
 
-static struct dentry *cramfs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+static struct dentry *cramfs_blkdev_mount(struct file_system_type *fs_type,
+				int flags, const char *dev_name, void *data)
+{
+	return mount_bdev(fs_type, flags, dev_name, data, cramfs_blkdev_fill_super);
+}
+
+static struct dentry *cramfs_physmem_mount(struct file_system_type *fs_type,
+				int flags, const char *dev_name, void *data)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, cramfs_fill_super);
+	return mount_nodev(fs_type, flags, data, cramfs_physmem_fill_super);
 }
 
 static struct file_system_type cramfs_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "cramfs",
-	.mount		= cramfs_mount,
-	.kill_sb	= cramfs_kill_sb,
+	.mount		= cramfs_blkdev_mount,
+	.kill_sb	= cramfs_blkdev_kill_sb,
 	.fs_flags	= FS_REQUIRES_DEV,
 };
+
+static struct file_system_type cramfs_physmem_fs_type = {
+	.owner		= THIS_MODULE,
+	.name		= "cramfs_physmem",
+	.mount		= cramfs_physmem_mount,
+	.kill_sb	= cramfs_physmem_kill_sb,
+};
+
+#ifdef CONFIG_CRAMFS_BLOCKDEV
 MODULE_ALIAS_FS("cramfs");
+#endif
+#ifdef CONFIG_CRAMFS_PHYSMEM
+MODULE_ALIAS_FS("cramfs_physmem");
+#endif
 
 static int __init init_cramfs_fs(void)
 {
 	int rv;
 
-	rv = cramfs_uncompress_init();
-	if (rv < 0)
-		return rv;
-	rv = register_filesystem(&cramfs_fs_type);
-	if (rv < 0)
-		cramfs_uncompress_exit();
-	return rv;
+	if ((rv = cramfs_uncompress_init()) < 0)
+		goto err0;
+	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV) &&
+	    (rv = register_filesystem(&cramfs_fs_type)) < 0)
+		goto err1;
+	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) &&
+	    (rv = register_filesystem(&cramfs_physmem_fs_type)) < 0)
+		goto err2;
+	return 0;
+
+err2:	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
+		unregister_filesystem(&cramfs_fs_type);
+err1:	cramfs_uncompress_exit();
+err0:	return rv;
 }
 
 static void __exit exit_cramfs_fs(void)
 {
 	cramfs_uncompress_exit();
-	unregister_filesystem(&cramfs_fs_type);
+	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
+		unregister_filesystem(&cramfs_fs_type);
+	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM))
+		unregister_filesystem(&cramfs_physmem_fs_type);
 }
 
 module_init(init_cramfs_fs)
-- 
2.9.5
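
One detail in cramfs_direct_read() worth spelling out is the form of its
range check. Written as offset + len > size, the sum could wrap around for
large unsigned values and let an out-of-range access through. Testing len
against size first and then comparing offset with size - len avoids that.
A stand-alone sketch of the same test, with range_ok() as a made-up name:

```c
#include <stdbool.h>

/*
 * True if [offset, offset + len) lies within [0, size), without ever
 * computing offset + len (which could wrap around).
 */
static bool range_ok(unsigned long size, unsigned int offset,
		     unsigned int len)
{
	if (len > size)
		return false;
	/* size - len cannot underflow: len <= size was checked above. */
	return offset <= size - len;
}
```

An offset close to the top of the unsigned range is rejected here, whereas
a 32-bit offset + len would wrap to a small value and pass a naive
comparison.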

 	}
 	/* correct strange, hard-coded permissions of mkcramfs */
-	super.root.mode |= (S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
+	super->root.mode |= (S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
 
-	root_offset = super.root.offset << 2;
-	if (super.flags & CRAMFS_FLAG_FSID_VERSION_2) {
-		sbi->size = super.size;
-		sbi->blocks = super.fsid.blocks;
-		sbi->files = super.fsid.files;
+	root_offset = super->root.offset << 2;
+	if (super->flags & CRAMFS_FLAG_FSID_VERSION_2) {
+		sbi->size = super->size;
+		sbi->blocks = super->fsid.blocks;
+		sbi->files = super->fsid.files;
 	} else {
 		sbi->size = 1<<28;
 		sbi->blocks = 0;
 		sbi->files = 0;
 	}
-	sbi->magic = super.magic;
-	sbi->flags = super.flags;
+	sbi->magic = super->magic;
+	sbi->flags = super->flags;
 	if (root_offset == 0)
 		pr_info("empty filesystem");
-	else if (!(super.flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) &&
+	else if (!(super->flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) &&
 		 ((root_offset != sizeof(struct cramfs_super)) &&
 		  (root_offset != 512 + sizeof(struct cramfs_super))))
 	{
@@ -336,9 +374,18 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
 		return -EINVAL;
 	}
 
+	return 0;
+}
+
+static int cramfs_finalize_super(struct super_block *sb,
+				 struct cramfs_inode *cramfs_root)
+{
+	struct inode *root;
+
 	/* Set it all up.. */
+	sb->s_flags |= MS_RDONLY;
 	sb->s_op = &cramfs_ops;
-	root = get_cramfs_inode(sb, &super.root, 0);
+	root = get_cramfs_inode(sb, cramfs_root, 0);
 	if (IS_ERR(root))
 		return PTR_ERR(root);
 	sb->s_root = d_make_root(root);
@@ -347,6 +394,92 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
 	return 0;
 }
 
+static int cramfs_blkdev_fill_super(struct super_block *sb, void *data, int silent)
+{
+	struct cramfs_sb_info *sbi;
+	struct cramfs_super super;
+	int i, err;
+
+	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
+	if (!sbi)
+		return -ENOMEM;
+	sb->s_fs_info = sbi;
+
+	/* Invalidate the read buffers on mount: think disk change.. */
+	for (i = 0; i < READ_BUFFERS; i++)
+		buffer_blocknr[i] = -1;
+
+	err = cramfs_read_super(sb, &super, silent);
+	if (err)
+		return err;
+	return cramfs_finalize_super(sb, &super.root);
+}
+
+static int cramfs_physmem_fill_super(struct super_block *sb, void *data, int silent)
+{
+	struct cramfs_sb_info *sbi;
+	struct cramfs_super super;
+	char *p;
+	int err;
+
+	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
+	if (!sbi)
+		return -ENOMEM;
+	sb->s_fs_info = sbi;
+
+	/*
+	 * The physical location of the cramfs image is specified as
+	 * a mount parameter.  This parameter is mandatory for obvious
+	 * reasons.  Some validation is made on the phys address but this
+	 * is not exhaustive and we count on the fact that someone using
+	 * this feature is supposed to know what he/she's doing.
+	 */
+	if (!data || !(p = strstr((char *)data, "physaddr="))) {
+		pr_err("unknown physical address for linear cramfs image\n");
+		return -EINVAL;
+	}
+	sbi->linear_phys_addr = memparse(p + 9, NULL);
+	if (!sbi->linear_phys_addr) {
+		pr_err("bad value for cramfs image physical address\n");
+		return -EINVAL;
+	}
+	if (sbi->linear_phys_addr & (PAGE_SIZE-1)) {
+		pr_err("physical address %pap for linear cramfs isn't aligned to a page boundary\n",
+			&sbi->linear_phys_addr);
+		return -EINVAL;
+	}
+
+	/*
+	 * Map only one page for now.  Will remap it when fs size is known.
+	 * Although we'll only read from it, we want the CPU cache to
+	 * kick in for the higher throughput it provides, hence MEMREMAP_WB.
+	 */
+	pr_info("checking physical address %pap for linear cramfs image\n", &sbi->linear_phys_addr);
+	sbi->linear_virt_addr = memremap(sbi->linear_phys_addr, PAGE_SIZE,
+					 MEMREMAP_WB);
+	if (!sbi->linear_virt_addr) {
+		pr_err("memremap of the linear cramfs image failed\n");
+		return -ENOMEM;
+	}
+
+	err = cramfs_read_super(sb, &super, silent);
+	if (err)
+		return err;
+
+	/* Remap the whole filesystem now */
+	pr_info("linear cramfs image appears to be %lu KB in size\n",
+		sbi->size/1024);
+	memunmap(sbi->linear_virt_addr);
+	sbi->linear_virt_addr = memremap(sbi->linear_phys_addr, sbi->size,
+					 MEMREMAP_WB);
+	if (!sbi->linear_virt_addr) {
+		pr_err("memremap of the linear cramfs image failed\n");
+		return -ENOMEM;
+	}
+
+	return cramfs_finalize_super(sb, &super.root);
+}
+
 static int cramfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct super_block *sb = dentry->d_sb;
@@ -573,38 +706,67 @@ static const struct super_operations cramfs_ops = {
 	.statfs		= cramfs_statfs,
 };
 
-static struct dentry *cramfs_mount(struct file_system_type *fs_type,
-	int flags, const char *dev_name, void *data)
+static struct dentry *cramfs_blkdev_mount(struct file_system_type *fs_type,
+				int flags, const char *dev_name, void *data)
+{
+	return mount_bdev(fs_type, flags, dev_name, data, cramfs_blkdev_fill_super);
+}
+
+static struct dentry *cramfs_physmem_mount(struct file_system_type *fs_type,
+				int flags, const char *dev_name, void *data)
 {
-	return mount_bdev(fs_type, flags, dev_name, data, cramfs_fill_super);
+	return mount_nodev(fs_type, flags, data, cramfs_physmem_fill_super);
 }
 
 static struct file_system_type cramfs_fs_type = {
 	.owner		= THIS_MODULE,
 	.name		= "cramfs",
-	.mount		= cramfs_mount,
-	.kill_sb	= cramfs_kill_sb,
+	.mount		= cramfs_blkdev_mount,
+	.kill_sb	= cramfs_blkdev_kill_sb,
 	.fs_flags	= FS_REQUIRES_DEV,
 };
+
+static struct file_system_type cramfs_physmem_fs_type = {
+	.owner		= THIS_MODULE,
+	.name		= "cramfs_physmem",
+	.mount		= cramfs_physmem_mount,
+	.kill_sb	= cramfs_physmem_kill_sb,
+};
+
+#ifdef CONFIG_CRAMFS_BLOCKDEV
 MODULE_ALIAS_FS("cramfs");
+#endif
+#ifdef CONFIG_CRAMFS_PHYSMEM
+MODULE_ALIAS_FS("cramfs_physmem");
+#endif
 
 static int __init init_cramfs_fs(void)
 {
 	int rv;
 
-	rv = cramfs_uncompress_init();
-	if (rv < 0)
-		return rv;
-	rv = register_filesystem(&cramfs_fs_type);
-	if (rv < 0)
-		cramfs_uncompress_exit();
-	return rv;
+	if ((rv = cramfs_uncompress_init()) < 0)
+		goto err0;
+	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV) &&
+	    (rv = register_filesystem(&cramfs_fs_type)) < 0)
+		goto err1;
+	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) &&
+	    (rv = register_filesystem(&cramfs_physmem_fs_type)) < 0)
+		goto err2;
+	return 0;
+
+err2:	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
+		unregister_filesystem(&cramfs_fs_type);
+err1:	cramfs_uncompress_exit();
+err0:	return rv;
 }
 
 static void __exit exit_cramfs_fs(void)
 {
 	cramfs_uncompress_exit();
-	unregister_filesystem(&cramfs_fs_type);
+	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
+		unregister_filesystem(&cramfs_fs_type);
+	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM))
+		unregister_filesystem(&cramfs_physmem_fs_type);
 }
 
 module_init(init_cramfs_fs)
-- 
2.9.5
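
For reviewers: the range check in cramfs_direct_read() above tests len
against the image size before subtracting, so that neither side of the
comparison can wrap around. A minimal user-space sketch of that idiom
follows. The function names are hypothetical and exist only for this
illustration; the naive variant is shown purely for contrast.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model of the range check in
 * cramfs_direct_read(): true when [offset, offset + len) lies
 * entirely inside an image of `size` bytes.
 */
static bool range_in_image(uint32_t size, uint32_t offset, uint32_t len)
{
	/* Testing len first keeps size - len from wrapping around. */
	if (len > size)
		return false;
	return offset <= size - len;
}

/* Naive form, for contrast only: offset + len can wrap past zero. */
static bool range_in_image_naive(uint32_t size, uint32_t offset, uint32_t len)
{
	return offset + len <= size;
}
```

With a 4096-byte image, an offset of 0xfffffff0 and a length of 0x20,
the naive form accepts the request because the sum wraps to 0x10, while
the subtract-after-compare form rejects it. Unlike the patch, this
sketch does not special-case a zero length.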

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v4 2/5] cramfs: make cramfs_physmem usable as root fs
  2017-09-27 23:32 ` Nicolas Pitre
@ 2017-09-27 23:32   ` Nicolas Pitre
  -1 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-09-27 23:32 UTC (permalink / raw)
  To: Alexander Viro, linux-mm
  Cc: linux-fsdevel, linux-embedded, linux-kernel, Chris Brandt

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Chris Brandt <chris.brandt@renesas.com>
---
 init/do_mounts.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/init/do_mounts.c b/init/do_mounts.c
index c2de5104aa..43b5817f60 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -556,6 +556,14 @@ void __init prepare_namespace(void)
 		ssleep(root_delay);
 	}
 
+	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) && root_fs_names &&
+	    !strcmp(root_fs_names, "cramfs_physmem")) {
+		int err = do_mount_root("cramfs", "cramfs_physmem",
+					root_mountflags, root_mount_data);
+		if (!err)
+			goto out;
+	}
+
 	/*
 	 * wait for the known devices to complete their probing
 	 *
-- 
2.9.5

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v4 3/5] cramfs: implement uncompressed and arbitrary data block positioning
  2017-09-27 23:32 ` Nicolas Pitre
@ 2017-09-27 23:32   ` Nicolas Pitre
  -1 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-09-27 23:32 UTC (permalink / raw)
  To: Alexander Viro, linux-mm
  Cc: linux-fsdevel, linux-embedded, linux-kernel, Chris Brandt

Two new capabilities are introduced here:

- The ability to store some blocks uncompressed.

- The ability to locate blocks anywhere.

Those capabilities can be used independently, but the combination
opens the possibility for execute-in-place (XIP) of program text segments
that must remain uncompressed, and in the MMU case, must have a specific
alignment.  It is even possible to still have the writable data segments
from the same file compressed as they have to be copied into RAM anyway.

This is achieved by giving special meanings to some unused block pointer
bits while remaining compatible with legacy cramfs images.
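
As an illustration only, the block pointer encoding described above can
be modelled in user space as below. The struct and function names are
hypothetical and do not appear in the patch; the flag values mirror the
ones this patch adds to cramfs_fs.h, written with an unsigned constant
to keep the sketch free of signed shifts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as the flags added to cramfs_fs.h by this patch. */
#define BLK_FLAG_UNCOMPRESSED	(1u << 31)
#define BLK_FLAG_DIRECT_PTR	(1u << 30)
#define BLK_FLAGS		(BLK_FLAG_UNCOMPRESSED | BLK_FLAG_DIRECT_PTR)

/* Hypothetical decoded view of one 32-bit block pointer. */
struct blk_ptr {
	bool uncompressed;
	bool direct;
	uint32_t offset;	/* direct: block start; otherwise: block end */
};

static struct blk_ptr decode_blk_ptr(uint32_t raw)
{
	struct blk_ptr p;

	p.uncompressed = (raw & BLK_FLAG_UNCOMPRESSED) != 0;
	p.direct = (raw & BLK_FLAG_DIRECT_PTR) != 0;
	p.offset = raw & ~BLK_FLAGS;
	/* Direct pointers are stored shifted right by 2 bits. */
	if (p.direct)
		p.offset <<= 2;
	return p;
}
```

A legacy image never sets the two top bits, so every pointer decodes as
a compressed block whose offset is its end position, exactly as before.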

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Chris Brandt <chris.brandt@renesas.com>
---
 fs/cramfs/README               | 31 ++++++++++++++-
 fs/cramfs/inode.c              | 87 +++++++++++++++++++++++++++++++++---------
 include/uapi/linux/cramfs_fs.h | 20 +++++++++-
 3 files changed, 118 insertions(+), 20 deletions(-)

diff --git a/fs/cramfs/README b/fs/cramfs/README
index 9d4e7ea311..d71b27e0ff 100644
--- a/fs/cramfs/README
+++ b/fs/cramfs/README
@@ -49,17 +49,46 @@ same as the start of the (i+1)'th <block> if there is one).  The first
 <block> immediately follows the last <block_pointer> for the file.
 <block_pointer>s are each 32 bits long.
 
+When the CRAMFS_FLAG_EXT_BLOCK_POINTERS capability bit is set, each
+<block_pointer>'s top bits may contain special flags as follows:
+
+CRAMFS_BLK_FLAG_UNCOMPRESSED (bit 31):
+	The block data is not compressed and should be copied verbatim.
+
+CRAMFS_BLK_FLAG_DIRECT_PTR (bit 30):
+	The <block_pointer> stores the actual block start offset and not
+	its end, shifted right by 2 bits. The block must therefore be
+	aligned to a 4-byte boundary. The block size is blksize if
+	CRAMFS_BLK_FLAG_UNCOMPRESSED is also specified; otherwise the
+	compressed data length is stored in the first 2 bytes of the
+	block data. This is used to allow discontiguous data layout
+	and specific data block alignments e.g. for XIP applications.
+
+
 The order of <file_data>'s is a depth-first descent of the directory
 tree, i.e. the same order as `find -size +0 \( -type f -o -type l \)
 -print'.
 
 
 <block>: The i'th <block> is the output of zlib's compress function
-applied to the i'th blksize-sized chunk of the input data.
+applied to the i'th blksize-sized chunk of the input data if the
+corresponding CRAMFS_BLK_FLAG_UNCOMPRESSED <block_pointer> bit is not set,
+otherwise it is the input data directly.
 (For the last <block> of the file, the input may of course be smaller.)
 Each <block> may be a different size.  (See <block_pointer> above.)
+
 <block>s are merely byte-aligned, not generally u32-aligned.
 
+When CRAMFS_BLK_FLAG_DIRECT_PTR is specified then the corresponding
+<block> may be located anywhere and not necessarily contiguous with
+the previous/next blocks. In that case it is minimally u32-aligned.
+If CRAMFS_BLK_FLAG_UNCOMPRESSED is also specified then the size is always
+blksize except for the last block which is limited by the file length.
+If CRAMFS_BLK_FLAG_DIRECT_PTR is set and CRAMFS_BLK_FLAG_UNCOMPRESSED
+is not set then the first 2 bytes of the block contain the size of the
+remaining block data as this cannot be determined from the placement of
+logically adjacent blocks.
+
 
 Holes
 -----
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 19f464a214..2fc886092b 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -636,33 +636,84 @@ static int cramfs_readpage(struct file *file, struct page *page)
 	if (page->index < maxblock) {
 		struct super_block *sb = inode->i_sb;
 		u32 blkptr_offset = OFFSET(inode) + page->index*4;
-		u32 start_offset, compr_len;
+		u32 block_ptr, block_start, block_len;
+		bool uncompressed, direct;
 
-		start_offset = OFFSET(inode) + maxblock*4;
 		mutex_lock(&read_mutex);
-		if (page->index)
-			start_offset = *(u32 *) cramfs_read(sb, blkptr_offset-4,
-				4);
-		compr_len = (*(u32 *) cramfs_read(sb, blkptr_offset, 4) -
-			start_offset);
-		mutex_unlock(&read_mutex);
+		block_ptr = *(u32 *) cramfs_read(sb, blkptr_offset, 4);
+		uncompressed = (block_ptr & CRAMFS_BLK_FLAG_UNCOMPRESSED);
+		direct = (block_ptr & CRAMFS_BLK_FLAG_DIRECT_PTR);
+		block_ptr &= ~CRAMFS_BLK_FLAGS;
+
+		if (direct) {
+			/*
+			 * The block pointer is an absolute start pointer,
+			 * shifted by 2 bits. The size is included in the
+			 * first 2 bytes of the data block when compressed,
+			 * or PAGE_SIZE otherwise.
+			 */
+			block_start = block_ptr << 2;
+			if (uncompressed) {
+				block_len = PAGE_SIZE;
+				/* if last block: cap to file length */
+				if (page->index == maxblock - 1)
+					block_len = offset_in_page(inode->i_size - 1) + 1;
+			} else {
+				block_len = *(u16 *)
+					cramfs_read(sb, block_start, 2);
+				block_start += 2;
+			}
+		} else {
+			/*
+			 * The block pointer indicates one past the end of
+			 * the current block (start of next block). If this
+			 * is the first block then it starts where the block
+			 * pointer table ends, otherwise its start comes
+			 * from the previous block's pointer.
+			 */
+			block_start = OFFSET(inode) + maxblock*4;
+			if (page->index)
+				block_start = *(u32 *)
+					cramfs_read(sb, blkptr_offset-4, 4);
+			/* Beware... previous ptr might be a direct ptr */
+			if (unlikely(block_start & CRAMFS_BLK_FLAG_DIRECT_PTR)) {
+				/* See comments on earlier code. */
+				u32 prev_start = block_start;
+				block_start = prev_start & ~CRAMFS_BLK_FLAGS;
+				block_start <<= 2;
+				if (prev_start & CRAMFS_BLK_FLAG_UNCOMPRESSED) {
+					block_start += PAGE_SIZE;
+				} else {
+					block_len = *(u16 *)
+						cramfs_read(sb, block_start, 2);
+					block_start += 2 + block_len;
+				}
+			}
+			block_start &= ~CRAMFS_BLK_FLAGS;
+			block_len = block_ptr - block_start;
+		}
 
-		if (compr_len == 0)
+		if (block_len == 0)
 			; /* hole */
-		else if (unlikely(compr_len > (PAGE_SIZE << 1))) {
-			pr_err("bad compressed blocksize %u\n",
-				compr_len);
+		else if (unlikely(block_len > 2*PAGE_SIZE ||
+				  (uncompressed && block_len > PAGE_SIZE))) {
+			mutex_unlock(&read_mutex);
+			pr_err("bad data blocksize %u\n", block_len);
 			goto err;
+		} else if (uncompressed) {
+			memcpy(pgdata,
+			       cramfs_read(sb, block_start, block_len),
+			       block_len);
+			bytes_filled = block_len;
 		} else {
-			mutex_lock(&read_mutex);
 			bytes_filled = cramfs_uncompress_block(pgdata,
 				 PAGE_SIZE,
-				 cramfs_read(sb, start_offset, compr_len),
-				 compr_len);
-			mutex_unlock(&read_mutex);
-			if (unlikely(bytes_filled < 0))
-				goto err;
+				 cramfs_read(sb, block_start, block_len),
+				 block_len);
 		}
+		mutex_unlock(&read_mutex);
+		if (unlikely(bytes_filled < 0))
+			goto err;
 	}
 
 	memset(pgdata + bytes_filled, 0, PAGE_SIZE - bytes_filled);
diff --git a/include/uapi/linux/cramfs_fs.h b/include/uapi/linux/cramfs_fs.h
index e4611a9b92..c7a7883fab 100644
--- a/include/uapi/linux/cramfs_fs.h
+++ b/include/uapi/linux/cramfs_fs.h
@@ -73,6 +73,7 @@ struct cramfs_super {
 #define CRAMFS_FLAG_HOLES		0x00000100	/* support for holes */
 #define CRAMFS_FLAG_WRONG_SIGNATURE	0x00000200	/* reserved */
 #define CRAMFS_FLAG_SHIFTED_ROOT_OFFSET	0x00000400	/* shifted root fs */
+#define CRAMFS_FLAG_EXT_BLOCK_POINTERS	0x00000800	/* block pointer extensions */
 
 /*
  * Valid values in super.flags.  Currently we refuse to mount
@@ -82,7 +83,24 @@ struct cramfs_super {
 #define CRAMFS_SUPPORTED_FLAGS	( 0x000000ff \
 				| CRAMFS_FLAG_HOLES \
 				| CRAMFS_FLAG_WRONG_SIGNATURE \
-				| CRAMFS_FLAG_SHIFTED_ROOT_OFFSET )
+				| CRAMFS_FLAG_SHIFTED_ROOT_OFFSET \
+				| CRAMFS_FLAG_EXT_BLOCK_POINTERS )
 
+/*
+ * Block pointer flags
+ *
+ * The maximum block offset that needs to be represented is roughly:
+ *
+ *   (1 << CRAMFS_OFFSET_WIDTH) * 4 +
+ *   (1 << CRAMFS_SIZE_WIDTH) / PAGE_SIZE * (4 + PAGE_SIZE)
+ *   = 0x11004000
+ *
+ * That leaves room for 3 flag bits in the block pointer table.
+ */
+#define CRAMFS_BLK_FLAG_UNCOMPRESSED	(1 << 31)
+#define CRAMFS_BLK_FLAG_DIRECT_PTR	(1 << 30)
+
+#define CRAMFS_BLK_FLAGS	( CRAMFS_BLK_FLAG_UNCOMPRESSED \
+				| CRAMFS_BLK_FLAG_DIRECT_PTR )
 
 #endif /* _UAPI__CRAMFS_H */
-- 
2.9.5
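
For reviewers: the arithmetic in the "Block pointer flags" comment above
can be checked with the small sketch below. It assumes
CRAMFS_OFFSET_WIDTH is 26 and CRAMFS_SIZE_WIDTH is 24, as I read them in
cramfs_fs.h (they are not part of this diff), and a 4 KB page size. The
function names are hypothetical.

```c
#include <stdint.h>

/* Assumed values; see include/uapi/linux/cramfs_fs.h to confirm. */
#define OFFSET_WIDTH	26
#define SIZE_WIDTH	24
#define PAGE_SZ		4096u	/* assumes 4 KB pages */

/* Largest block offset, following the formula in the header comment. */
static uint32_t max_block_offset(void)
{
	uint32_t metadata = (1u << OFFSET_WIDTH) * 4;
	uint32_t blocks = (1u << SIZE_WIDTH) / PAGE_SZ;

	return metadata + blocks * (4 + PAGE_SZ);
}

/* Number of high bits of a 32-bit word left unused by the value v. */
static int spare_high_bits(uint32_t v)
{
	int n = 0;

	while (n < 32 && !(v & (1u << (31 - n))))
		n++;
	return n;
}
```

Under those assumptions the maximum comes out as 0x11004000, which fits
in 29 bits and leaves the top 3 bits free, matching the comment. Only
two of them are used as flags by this patch.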

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v4 4/5] cramfs: add mmap support
  2017-09-27 23:32 ` Nicolas Pitre
@ 2017-09-27 23:32   ` Nicolas Pitre
  -1 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-09-27 23:32 UTC (permalink / raw)
  To: Alexander Viro, linux-mm
  Cc: linux-fsdevel, linux-embedded, linux-kernel, Chris Brandt

When cramfs_physmem is used, files can be mapped directly from ROM into
user space, saving on RAM usage. This gives us Execute-In-Place (XIP)
support.

For a file to be mmap()-able, the map area has to correspond to a range
of uncompressed and contiguous blocks, and in the MMU case it also has
to be page aligned. A version of mkcramfs with appropriate support is
necessary to create such a filesystem image.
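
To illustrate the condition above, here is a user-space sketch of the
contiguity test, loosely modelled on cramfs_get_block_range() from this
patch. The function name, the plain array input and the 4 KB page size
are assumptions of the sketch, and it expects pages to be at least 1.

```c
#include <stdint.h>

#define PAGE_SZ			4096u	/* assumes 4 KB pages */
#define BLK_FLAG_UNCOMPRESSED	(1u << 31)
#define BLK_FLAG_DIRECT_PTR	(1u << 30)
#define BLK_FLAGS		(BLK_FLAG_UNCOMPRESSED | BLK_FLAG_DIRECT_PTR)

/*
 * Hypothetical model: find how many of the first `pages` block
 * pointers describe uncompressed, direct, back-to-back blocks.
 * Stores the run length in *run and returns the byte offset of the
 * first block, or 0 when even the first pointer does not qualify.
 */
static uint32_t contiguous_run(const uint32_t *ptrs, uint32_t pages,
			       uint32_t *run)
{
	uint32_t first = ptrs[0] & ~BLK_FLAGS;
	uint32_t i;

	for (i = 0; i < pages; i++) {
		/* Stored pointers are shifted by 2, hence PAGE_SZ >> 2. */
		uint32_t expect = (first + i * (PAGE_SZ >> 2)) | BLK_FLAGS;

		if (ptrs[i] != expect)
			break;
	}
	*run = i;
	return i ? first << 2 : 0;
}
```

A run shorter than the requested range corresponds to the partial
mapping case described in the next paragraph.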

In the MMU case, a vma structure may extend beyond the actual file size;
this is notably the case in binfmt_elf.c:elf_map(). The file's last block
may also be shared with other files, in which case it cannot be mapped as
is. Rather than refusing to mmap it, we do a partial map and set up a
special vm_ops fault handler that splits the vma in two: the direct
mapping vma and the memory-backed vma populated by the readpage method.
In practice the unmapped area is seldom accessed, so the split might never
occur before this area is discarded.

In the non-MMU case it is the get_unmapped_area method that is responsible
for providing the address where the actual data can be found. No mapping
is necessary of course.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Chris Brandt <chris.brandt@renesas.com>
---
 fs/cramfs/Kconfig |   2 +-
 fs/cramfs/inode.c | 295 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 296 insertions(+), 1 deletion(-)

diff --git a/fs/cramfs/Kconfig b/fs/cramfs/Kconfig
index 5b4e0b7e13..306549be25 100644
--- a/fs/cramfs/Kconfig
+++ b/fs/cramfs/Kconfig
@@ -30,7 +30,7 @@ config CRAMFS_BLOCKDEV
 
 config CRAMFS_PHYSMEM
 	bool "Support CramFs image directly mapped in physical memory"
-	depends on CRAMFS
+	depends on CRAMFS = y
 	default y if !CRAMFS_BLOCKDEV
 	help
 	  This option allows the CramFs driver to load data directly from
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 2fc886092b..1d7d61354b 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -15,7 +15,9 @@
 
 #include <linux/module.h>
 #include <linux/fs.h>
+#include <linux/file.h>
 #include <linux/pagemap.h>
+#include <linux/ramfs.h>
 #include <linux/init.h>
 #include <linux/string.h>
 #include <linux/blkdev.h>
@@ -49,6 +51,7 @@ static inline struct cramfs_sb_info *CRAMFS_SB(struct super_block *sb)
 static const struct super_operations cramfs_ops;
 static const struct inode_operations cramfs_dir_inode_operations;
 static const struct file_operations cramfs_directory_operations;
+static const struct file_operations cramfs_physmem_fops;
 static const struct address_space_operations cramfs_aops;
 
 static DEFINE_MUTEX(read_mutex);
@@ -96,6 +99,10 @@ static struct inode *get_cramfs_inode(struct super_block *sb,
 	case S_IFREG:
 		inode->i_fop = &generic_ro_fops;
 		inode->i_data.a_ops = &cramfs_aops;
+		if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) &&
+		    CRAMFS_SB(sb)->flags & CRAMFS_FLAG_EXT_BLOCK_POINTERS &&
+		    CRAMFS_SB(sb)->linear_phys_addr)
+			inode->i_fop = &cramfs_physmem_fops;
 		break;
 	case S_IFDIR:
 		inode->i_op = &cramfs_dir_inode_operations;
@@ -277,6 +284,294 @@ static void *cramfs_read(struct super_block *sb, unsigned int offset,
 		return NULL;
 }
 
+/*
+ * For a mapping to be possible, we need a range of uncompressed and
+ * contiguous blocks. Return the offset for the first block and number of
+ * valid blocks for which that is true, or zero otherwise.
+ */
+static u32 cramfs_get_block_range(struct inode *inode, u32 pgoff, u32 *pages)
+{
+	struct super_block *sb = inode->i_sb;
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+	int i;
+	u32 *blockptrs, blockaddr;
+
+	/*
+	 * We can dereference memory directly here as this code may be
+	 * reached only when there is a direct filesystem image mapping
+	 * available in memory.
+	 */
+	blockptrs = (u32 *)(sbi->linear_virt_addr + OFFSET(inode) + pgoff*4);
+	blockaddr = blockptrs[0] & ~CRAMFS_BLK_FLAGS;
+	i = 0;
+	do {
+		u32 expect = blockaddr + i * (PAGE_SIZE >> 2);
+		expect |= CRAMFS_BLK_FLAG_DIRECT_PTR|CRAMFS_BLK_FLAG_UNCOMPRESSED;
+		if (blockptrs[i] != expect) {
+			pr_debug("range: block %d/%d got %#x expects %#x\n",
+				 pgoff+i, pgoff+*pages-1, blockptrs[i], expect);
+			if (i == 0)
+				return 0;
+			break;
+		}
+	} while (++i < *pages);
+
+	*pages = i;
+
+	/* stored "direct" block ptrs are shifted down by 2 bits */
+	return blockaddr << 2;
+}
+
+/*
+ * It is possible for cramfs_physmem_mmap() to partially populate the mapping
+ * causing page faults in the unmapped area. When that happens, we need to
+ * split the vma so that the unmapped area gets its own vma that can be backed
+ * with actual memory pages and loaded normally. This is necessary because
+ * remap_pfn_range() overwrites vma->vm_pgoff with the pfn and filemap_fault()
+ * no longer works with it. Furthermore this makes /proc/x/maps right.
+ * Q: is there a way to do split vma at mmap() time?
+ */
+static const struct vm_operations_struct cramfs_vmasplit_ops;
+static int cramfs_vmasplit_fault(struct vm_fault *vmf)
+{
+	struct mm_struct *mm = vmf->vma->vm_mm;
+	struct vm_area_struct *vma, *new_vma;
+	struct file *vma_file = get_file(vmf->vma->vm_file);
+	unsigned long split_val, split_addr;
+	unsigned int split_pgoff;
+	int ret;
+
+	/* We have some vma surgery to do and need the write lock. */
+	up_read(&mm->mmap_sem);
+	if (down_write_killable(&mm->mmap_sem)) {
+		fput(vma_file);
+		return VM_FAULT_RETRY;
+	}
+
+	/* Make sure the vma didn't change between the locks */
+	ret = VM_FAULT_SIGSEGV;
+	vma = find_vma(mm, vmf->address);
+	if (!vma)
+		goto out_fput;
+
+	/*
+	 * Someone else might have raced with us and handled the fault,
+	 * changed the vma, etc. If so let it go back to user space and
+	 * fault again if necessary.
+	 */
+	ret = VM_FAULT_NOPAGE;
+	if (vma->vm_ops != &cramfs_vmasplit_ops || vma->vm_file != vma_file)
+		goto out_fput;
+	fput(vma_file);
+
+	/* Retrieve the vma split address and validate it */
+	split_val = (unsigned long)vma->vm_private_data;
+	split_pgoff = split_val & 0xfff;
+	split_addr = (split_val >> 12) << PAGE_SHIFT;
+	if (split_addr < vma->vm_start) {
+		/* bottom of vma was unmapped */
+		split_pgoff += (vma->vm_start - split_addr) >> PAGE_SHIFT;
+		split_addr = vma->vm_start;
+	}
+	pr_debug("fault: addr=%#lx vma=%#lx-%#lx split=%#lx\n",
+		 vmf->address, vma->vm_start, vma->vm_end, split_addr);
+	ret = VM_FAULT_SIGSEGV;
+	if (!split_val || split_addr > vmf->address || vma->vm_end <= vmf->address)
+		goto out;
+
+	if (unlikely(vma->vm_start == split_addr)) {
+		/* nothing to split */
+		new_vma = vma;
+	} else {
+		/* Split away the directly mapped area */
+		ret = VM_FAULT_OOM;
+		if (split_vma(mm, vma, split_addr, 0) != 0)
+			goto out;
+
+		/* The direct vma should no longer ever fault */
+		vma->vm_ops = NULL;
+
+		/* Retrieve the new vma covering the unmapped area */
+		new_vma = find_vma(mm, split_addr);
+		BUG_ON(new_vma == vma);
+		ret = VM_FAULT_SIGSEGV;
+		if (!new_vma)
+			goto out;
+	}
+
+	/*
+	 * Readjust the new vma with the actual file based pgoff and
+	 * process the fault normally on it.
+	 */
+	new_vma->vm_pgoff = split_pgoff;
+	new_vma->vm_ops = &generic_file_vm_ops;
+	new_vma->vm_flags &= ~(VM_IO | VM_PFNMAP | VM_DONTEXPAND);
+	vmf->vma = new_vma;
+	vmf->pgoff = split_pgoff;
+	vmf->pgoff += (vmf->address - new_vma->vm_start) >> PAGE_SHIFT;
+	downgrade_write(&mm->mmap_sem);
+	return filemap_fault(vmf);
+
+out_fput:
+	fput(vma_file);
+out:
+	downgrade_write(&mm->mmap_sem);
+	return ret;
+}
+
+static const struct vm_operations_struct cramfs_vmasplit_ops = {
+	.fault	= cramfs_vmasplit_fault,
+};
+
+static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct inode *inode = file_inode(file);
+	struct super_block *sb = inode->i_sb;
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+	unsigned int pages, vma_pages, max_pages, offset;
+	unsigned long address;
+	char *fail_reason;
+	int ret;
+
+	if (!IS_ENABLED(CONFIG_MMU))
+		return vma->vm_flags & (VM_SHARED | VM_MAYSHARE) ? 0 : -ENOSYS;
+
+	if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
+		return -EINVAL;
+
+	/* Could COW work here? */
+	fail_reason = "vma is writable";
+	if (vma->vm_flags & VM_WRITE)
+		goto fail;
+
+	vma_pages = (vma->vm_end - vma->vm_start + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	max_pages = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	fail_reason = "beyond file limit";
+	if (vma->vm_pgoff >= max_pages)
+		goto fail;
+	pages = vma_pages;
+	if (pages > max_pages - vma->vm_pgoff)
+		pages = max_pages - vma->vm_pgoff;
+
+	offset = cramfs_get_block_range(inode, vma->vm_pgoff, &pages);
+	fail_reason = "unsuitable block layout";
+	if (!offset)
+		goto fail;
+	address = sbi->linear_phys_addr + offset;
+	fail_reason = "data is not page aligned";
+	if (!PAGE_ALIGNED(address))
+		goto fail;
+
+	/* Don't map the last page if it contains some other data */
+	if (unlikely(vma->vm_pgoff + pages == max_pages)) {
+		unsigned int partial = offset_in_page(inode->i_size);
+		if (partial) {
+			char *data = sbi->linear_virt_addr + offset;
+			data += (pages - 1) * PAGE_SIZE + partial;
+			while ((unsigned long)data & 7)
+				if (*data++ != 0)
+					goto nonzero;
+			while (offset_in_page(data)) {
+				if (*(u64 *)data != 0) {
+					nonzero:
+					pr_debug("mmap: %s: last page is shared\n",
+						 file_dentry(file)->d_name.name);
+					pages--;
+					break;
+				}
+				data += 8;
+			}
+		}
+	}
+
+	if (pages) {
+		/*
+		 * If we can't map it all, page faults will occur if the
+		 * unmapped area is accessed. Let's handle them to split the
+		 * vma and let the normal paging machinery take care of the
+		 * rest through cramfs_readpage(). Because remap_pfn_range()
+		 * repurposes vma->vm_pgoff, we have to save it somewhere.
+		 * Let's use vma->vm_private_data to hold both the pgoff and
+		 * the actual address split point. Maximum file size is 16MB
+		 * (12 bits pgoff) and max 20 bits pfn where a long is 32 bits
+		 * so we can pack both together.
+		 */
+		if (pages != vma_pages) {
+			unsigned int split_pgoff = vma->vm_pgoff + pages;
+			unsigned long split_pfn = (vma->vm_start >> PAGE_SHIFT) + pages;
+			unsigned long split_val = split_pgoff | (split_pfn << 12);
+			vma->vm_private_data = (void *)split_val;
+			vma->vm_ops = &cramfs_vmasplit_ops;
+			/* to keep remap_pfn_range() happy */
+			vma->vm_end = vma->vm_start + pages * PAGE_SIZE;
+		}
+
+		ret = remap_pfn_range(vma, vma->vm_start, address >> PAGE_SHIFT,
+				      pages * PAGE_SIZE, vma->vm_page_prot);
+		/* restore vm_end in case we cheated it above */
+		vma->vm_end = vma->vm_start + vma_pages * PAGE_SIZE;
+		if (ret)
+			return ret;
+
+		pr_debug("mapped %s at 0x%08lx (%u/%u pages) to vma 0x%08lx, "
+			 "page_prot 0x%llx\n", file_dentry(file)->d_name.name,
+			 address, pages, vma_pages, vma->vm_start,
+			 (unsigned long long)pgprot_val(vma->vm_page_prot));
+		return 0;
+	}
+	fail_reason = "no suitable block remaining";
+
+fail:
+	pr_debug("%s: direct mmap failed: %s\n",
+		 file_dentry(file)->d_name.name, fail_reason);
+
+	/* We failed to do a direct map, but normal paging will do it */
+	vma->vm_ops = &generic_file_vm_ops;
+	return 0;
+}
+
+#ifndef CONFIG_MMU
+
+static unsigned long cramfs_physmem_get_unmapped_area(struct file *file,
+			unsigned long addr, unsigned long len,
+			unsigned long pgoff, unsigned long flags)
+{
+	struct inode *inode = file_inode(file);
+	struct super_block *sb = inode->i_sb;
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+	unsigned int pages, block_pages, max_pages, offset;
+
+	pages = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	max_pages = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	if (pgoff >= max_pages || pages > max_pages - pgoff)
+		return -EINVAL;
+	block_pages = pages;
+	offset = cramfs_get_block_range(inode, pgoff, &block_pages);
+	if (!offset || block_pages != pages)
+		return -ENOSYS;
+	addr = sbi->linear_phys_addr + offset;
+	pr_debug("get_unmapped for %s ofs %#lx siz %lu at 0x%08lx\n",
+		 file_dentry(file)->d_name.name, pgoff*PAGE_SIZE, len, addr);
+	return addr;
+}
+
+static unsigned cramfs_physmem_mmap_capabilities(struct file *file)
+{
+	return NOMMU_MAP_COPY | NOMMU_MAP_DIRECT | NOMMU_MAP_READ | NOMMU_MAP_EXEC;
+}
+#endif
+
+static const struct file_operations cramfs_physmem_fops = {
+	.llseek			= generic_file_llseek,
+	.read_iter		= generic_file_read_iter,
+	.splice_read		= generic_file_splice_read,
+	.mmap			= cramfs_physmem_mmap,
+#ifndef CONFIG_MMU
+	.get_unmapped_area	= cramfs_physmem_get_unmapped_area,
+	.mmap_capabilities	= cramfs_physmem_mmap_capabilities,
+#endif
+};
+
 static void cramfs_blkdev_kill_sb(struct super_block *sb)
 {
 	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
-- 
2.9.5
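
A note on the vm_private_data packing used above: the arithmetic is easy
to check in isolation. This is a minimal user-space sketch, assuming
4 KiB pages and a file page offset that fits in 12 bits, as the comment
in cramfs_physmem_mmap() states. The sketch_ names are made up for
illustration and do not exist in the patch.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT	12	/* assumes 4 KiB pages */
#define SKETCH_PGOFF_BITS	12	/* 16MB max file size in 4 KiB pages */
#define SKETCH_PGOFF_MASK	((1UL << SKETCH_PGOFF_BITS) - 1)

/* Pack a page-aligned split address and a file page offset in one long. */
unsigned long sketch_pack_split(unsigned long split_addr,
				unsigned int split_pgoff)
{
	unsigned long split_pfn = split_addr >> SKETCH_PAGE_SHIFT;

	/* both preconditions are assumptions of this sketch */
	assert((split_addr & ((1UL << SKETCH_PAGE_SHIFT) - 1)) == 0);
	assert(split_pgoff <= SKETCH_PGOFF_MASK);

	return split_pgoff | (split_pfn << SKETCH_PGOFF_BITS);
}

/* Recover the user address where the directly mapped area ends. */
unsigned long sketch_split_addr(unsigned long split_val)
{
	return (split_val >> SKETCH_PGOFF_BITS) << SKETCH_PAGE_SHIFT;
}

/* Recover the file page offset of the first page that was not mapped. */
unsigned int sketch_split_pgoff(unsigned long split_val)
{
	return split_val & SKETCH_PGOFF_MASK;
}
```

For example, a split at user address 0x40003000 with file page offset 5
packs to 0x40003005, and both fields come back unchanged. On a 32-bit
long the two fields use all 32 bits, which is why the patch relies on the
16MB file size limit.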

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v4 5/5] cramfs: rehabilitate it
  2017-09-27 23:32 ` Nicolas Pitre
@ 2017-09-27 23:32   ` Nicolas Pitre
  -1 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-09-27 23:32 UTC (permalink / raw)
  To: Alexander Viro, linux-mm
  Cc: linux-fsdevel, linux-embedded, linux-kernel, Chris Brandt

Update the documentation, add a pointer to the latest tools, and appoint
myself as maintainer. Given it's been unloved for so long, I don't expect
anyone to protest.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Chris Brandt <chris.brandt@renesas.com>
---
 Documentation/filesystems/cramfs.txt | 42 ++++++++++++++++++++++++++++++++++++
 MAINTAINERS                          |  4 ++--
 fs/cramfs/Kconfig                    |  9 +++++---
 3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/cramfs.txt b/Documentation/filesystems/cramfs.txt
index 4006298f67..8875d306bc 100644
--- a/Documentation/filesystems/cramfs.txt
+++ b/Documentation/filesystems/cramfs.txt
@@ -45,6 +45,48 @@ you can just change the #define in mkcramfs.c, so long as you don't
 mind the filesystem becoming unreadable to future kernels.
 
 
+Memory Mapped cramfs image
+--------------------------
+
+The CRAMFS_PHYSMEM Kconfig option adds support for loading data directly
+from a physical linear memory range (usually non-volatile memory like Flash)
+to cramfs instead of going through the block device layer. This saves some
+memory since no intermediate buffering is necessary to hold the data before
+decompressing.
+
+When data blocks are kept uncompressed and properly aligned, they are
+automatically mapped directly into user space whenever possible, providing
+eXecute-In-Place (XIP) from ROM for read-only segments. Data segments mapped
+read-write (which therefore have to be copied to RAM) may still be compressed
+in the cramfs image, in the same file as the uncompressed read-only
+segments. Both MMU and no-MMU systems are supported. This is particularly
+handy for tiny embedded systems with very tight memory constraints.
+
+The filesystem type for this feature is "cramfs_physmem" to distinguish it
+from the block device (or MTD) based access. The location of the cramfs
+image in memory is system dependent. You must know the proper physical
+address where the cramfs image is located and specify it using the
+physaddr=0x******** mount option. For example, if the physical address
+of the cramfs image is 0x80100000, the following command would mount it
+on /mnt:
+
+$ mount -t cramfs_physmem -o physaddr=0x80100000 none /mnt
+
+To boot such an image as the root filesystem, the following kernel
+command line parameters must be provided:
+
+	"rootfstype=cramfs_physmem rootflags=physaddr=0x80100000"
+
+
+Tools
+-----
+
+A version of mkcramfs that can take advantage of the latest capabilities
+described above can be found here:
+
+https://github.com/npitre/cramfs-tools
+
+
 For /usr/share/magic
 --------------------
 
diff --git a/MAINTAINERS b/MAINTAINERS
index 1c3feffb1c..f00aec6a66 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3612,8 +3612,8 @@ F:	drivers/cpuidle/*
 F:	include/linux/cpuidle.h
 
 CRAMFS FILESYSTEM
-W:	http://sourceforge.net/projects/cramfs/
-S:	Orphan / Obsolete
+M:	Nicolas Pitre <nico@linaro.org>
+S:	Maintained
 F:	Documentation/filesystems/cramfs.txt
 F:	fs/cramfs/
 
diff --git a/fs/cramfs/Kconfig b/fs/cramfs/Kconfig
index 306549be25..374d52e029 100644
--- a/fs/cramfs/Kconfig
+++ b/fs/cramfs/Kconfig
@@ -1,5 +1,5 @@
 config CRAMFS
-	tristate "Compressed ROM file system support (cramfs) (OBSOLETE)"
+	tristate "Compressed ROM file system support (cramfs)"
 	select ZLIB_INFLATE
 	help
 	  Saying Y here includes support for CramFs (Compressed ROM File
@@ -15,8 +15,11 @@ config CRAMFS
 	  cramfs.  Note that the root file system (the one containing the
 	  directory /) cannot be compiled as a module.
 
-	  This filesystem is obsoleted by SquashFS, which is much better
-	  in terms of performance and features.
+	  This filesystem is limited in capabilities and performance on
+	  purpose to remain small and low on RAM usage. It is most suitable
+	  for small embedded systems. For a more capable compressed filesystem
+	  you should look at SquashFS which is much better in terms of
+	  performance and features.
 
 	  If unsure, say N.
 
-- 
2.9.5
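
For anyone wanting to exercise this from user space once an image is
mounted as described in the documentation above, here is a minimal
sketch. The helper name map_file_readonly() is made up for illustration.
It asks for a read-only private mapping, which is the kind of request the
mmap patch in this series tries to satisfy directly from ROM; writable
mappings fall back to normal paging. Whether a given file really ends up
mapped directly depends on how the image was built, and this code cannot
tell the difference by itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a whole file read-only and private. Returns MAP_FAILED on error or
 * for an empty file; on success *len_out holds the mapping length.
 */
void *map_file_readonly(const char *path, size_t *len_out)
{
	struct stat st;
	void *p;
	int fd;

	assert(path != NULL && len_out != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return MAP_FAILED;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return MAP_FAILED;
	}

	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping remains valid after close() */
	if (p != MAP_FAILED)
		*len_out = (size_t)st.st_size;
	return p;
}
```

The caller is expected to munmap() the returned region with the length
stored in *len_out when done with it.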

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH v4 5/5] cramfs: rehabilitate it
@ 2017-09-27 23:32   ` Nicolas Pitre
  0 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-09-27 23:32 UTC (permalink / raw)
  To: Alexander Viro, linux-mm
  Cc: linux-fsdevel, linux-embedded, linux-kernel, Chris Brandt

Update documentation, pointer to latest tools, appoint myself as
maintainer. Given it's been unloved for so long, I don't expect anyone
will protest.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Chris Brandt <chris.brandt@renesas.com>
---
 Documentation/filesystems/cramfs.txt | 42 ++++++++++++++++++++++++++++++++++++
 MAINTAINERS                          |  4 ++--
 fs/cramfs/Kconfig                    |  9 +++++---
 3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/Documentation/filesystems/cramfs.txt b/Documentation/filesystems/cramfs.txt
index 4006298f67..8875d306bc 100644
--- a/Documentation/filesystems/cramfs.txt
+++ b/Documentation/filesystems/cramfs.txt
@@ -45,6 +45,48 @@ you can just change the #define in mkcramfs.c, so long as you don't
 mind the filesystem becoming unreadable to future kernels.
 
 
+Memory Mapped cramfs image
+--------------------------
+
+The CRAMFS_PHYSMEM Kconfig option adds support for loading data directly
+from a physical linear memory range (usually non volatile memory like Flash)
+to cramfs instead of going through the block device layer. This saves some
+memory since no intermediate buffering is necessary to hold the data before
+decompressing.
+
+And when data blocks are kept uncompressed and properly aligned, they will
+automatically be mapped directly into user space whenever possible providing
+eXecute-In-Place (XIP) from ROM of read-only segments. Data segments mapped
+read-write (hence they have to be copied to RAM) may still be compressed in
+the cramfs image in the same file along with non compressed read-only
+segments. Both MMU and no-MMU systems are supported. This is particularly
+handy for tiny embedded systems with very tight memory constraints.
+
+The filesystem type for this feature is "cramfs_physmem" to distinguish it
+from the block device (or MTD) based access. The location of the cramfs
+image in memory is system dependent. You must know the proper physical
+address where the cramfs image is located and specify it using the
+physaddr=0x******** mount option (for example, if the physical address
+of the cramfs image is 0x80100000, the following command would mount it
+on /mnt:
+
+$ mount -t cramfs_physmem -o physaddr=0x80100000 none /mnt
+
+To boot such an image as the root filesystem, the following kernel
+commandline parameters must be provided:
+
+	"rootfstype=cramfs_physmem rootflags=physaddr=0x80100000"
+
+
+Tools
+-----
+
+A version of mkcramfs that can take advantage of the latest capabilities
+described above can be found here:
+
+https://github.com/npitre/cramfs-tools
+
+
 For /usr/share/magic
 --------------------
 
diff --git a/MAINTAINERS b/MAINTAINERS
index 1c3feffb1c..f00aec6a66 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3612,8 +3612,8 @@ F:	drivers/cpuidle/*
 F:	include/linux/cpuidle.h
 
 CRAMFS FILESYSTEM
-W:	http://sourceforge.net/projects/cramfs/
-S:	Orphan / Obsolete
+M:	Nicolas Pitre <nico@linaro.org>
+S:	Maintained
 F:	Documentation/filesystems/cramfs.txt
 F:	fs/cramfs/
 
diff --git a/fs/cramfs/Kconfig b/fs/cramfs/Kconfig
index 306549be25..374d52e029 100644
--- a/fs/cramfs/Kconfig
+++ b/fs/cramfs/Kconfig
@@ -1,5 +1,5 @@
 config CRAMFS
-	tristate "Compressed ROM file system support (cramfs) (OBSOLETE)"
+	tristate "Compressed ROM file system support (cramfs)"
 	select ZLIB_INFLATE
 	help
 	  Saying Y here includes support for CramFs (Compressed ROM File
@@ -15,8 +15,11 @@ config CRAMFS
 	  cramfs.  Note that the root file system (the one containing the
 	  directory /) cannot be compiled as a module.
 
-	  This filesystem is obsoleted by SquashFS, which is much better
-	  in terms of performance and features.
+	  This filesystem is limited in capabilities and performance on
+	  purpose to remain small and low on RAM usage. It is most suitable
+	  for small embedded systems. For a more capable compressed filesystem
+	  you should look at SquashFS which is much better in terms of
+	  performance and features.
 
 	  If unsure, say N.
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 54+ messages in thread
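
The physaddr= handling documented in the patch above can be sketched in
user space as follows. This is only an illustration, not the code from the
series: parse_physaddr() and SKETCH_PAGE_SIZE are hypothetical names, a
4096-byte page is assumed, and strtoull() stands in for the kernel's
memparse(). The three rejections (option missing, zero address, address
not page aligned) follow the checks in cramfs_physmem_fill_super() from
patch 1/5.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 4 KiB pages; the kernel uses the architecture's PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL

/*
 * Hypothetical user-space helper mirroring the mount-option checks:
 * the option must be present, non-zero, and page aligned.
 * strtoull() replaces memparse(), so K/M/G suffixes are not handled.
 */
static int parse_physaddr(const char *opts, uint64_t *addr)
{
	const char *p;

	if (!opts || !(p = strstr(opts, "physaddr=")))
		return -EINVAL;		/* option missing */

	*addr = strtoull(p + strlen("physaddr="), NULL, 0);
	if (*addr == 0)
		return -EINVAL;		/* unparsable or zero address */
	if (*addr & (SKETCH_PAGE_SIZE - 1))
		return -EINVAL;		/* not page aligned */
	return 0;
}
```

With these assumptions, "physaddr=0x80100000" is accepted, while
"physaddr=0x80100001" and an option string without physaddr= are both
rejected with -EINVAL.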

* Re: [PATCH v4 1/5] cramfs: direct memory access support
  2017-09-27 23:32   ` Nicolas Pitre
@ 2017-10-01  8:29     ` Christoph Hellwig
  -1 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2017-10-01  8:29 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Alexander Viro, linux-mm, linux-fsdevel, linux-embedded,
	linux-kernel, Chris Brandt, linux-mtd, devicetree

On Wed, Sep 27, 2017 at 07:32:20PM -0400, Nicolas Pitre wrote:
> To distinguish between both access types, the cramfs_physmem filesystem
> type must be specified when using a memory accessible cramfs image, and
> the physaddr argument must provide the actual filesystem image's physical
> memory location.

Sorry, but this is still a complete no-go.  A physical address is not a
proper interface.  You still need to have some interface for your NOR, NAND,
or DRAM.  Usually that would be an MTD driver, but if you have a good
reason why that's not suitable for you (and please explain it well),
we'll need a little OF or similar layer to bind a thin driver.

> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> Tested-by: Chris Brandt <chris.brandt@renesas.com>
> ---
>  fs/cramfs/Kconfig |  29 +++++-
>  fs/cramfs/inode.c | 264 +++++++++++++++++++++++++++++++++++++++++++-----------
>  2 files changed, 241 insertions(+), 52 deletions(-)
> 
> diff --git a/fs/cramfs/Kconfig b/fs/cramfs/Kconfig
> index 11b29d491b..5b4e0b7e13 100644
> --- a/fs/cramfs/Kconfig
> +++ b/fs/cramfs/Kconfig
> @@ -1,6 +1,5 @@
>  config CRAMFS
>  	tristate "Compressed ROM file system support (cramfs) (OBSOLETE)"
> -	depends on BLOCK
>  	select ZLIB_INFLATE
>  	help
>  	  Saying Y here includes support for CramFs (Compressed ROM File
> @@ -20,3 +19,31 @@ config CRAMFS
>  	  in terms of performance and features.
>  
>  	  If unsure, say N.
> +
> +config CRAMFS_BLOCKDEV
> +	bool "Support CramFs image over a regular block device" if EXPERT
> +	depends on CRAMFS && BLOCK
> +	default y
> +	help
> +	  This option allows the CramFs driver to load data from a regular
> +	  block device such as a disk partition or a ramdisk.
> +
> +config CRAMFS_PHYSMEM
> +	bool "Support CramFs image directly mapped in physical memory"
> +	depends on CRAMFS
> +	default y if !CRAMFS_BLOCKDEV
> +	help
> +	  This option allows the CramFs driver to load data directly from
> +	  a linearly addressed memory range (usually non-volatile memory
> +	  like flash) instead of going through the block device layer.
> +	  This saves some memory since no intermediate buffering is
> +	  necessary.
> +
> +	  The filesystem type for this feature is "cramfs_physmem".
> +	  The location of the CramFs image in memory is board
> +	  dependent. Therefore, if you say Y, you must know the proper
> +	  physical address where to store the CramFs image and specify
> +	  it using the physaddr=0x******** mount option (for example:
> +	  "mount -t cramfs_physmem -o physaddr=0x100000 none /mnt").
> +
> +	  If unsure, say N.
> diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> index 7919967488..19f464a214 100644
> --- a/fs/cramfs/inode.c
> +++ b/fs/cramfs/inode.c
> @@ -24,6 +24,7 @@
>  #include <linux/mutex.h>
>  #include <uapi/linux/cramfs_fs.h>
>  #include <linux/uaccess.h>
> +#include <linux/io.h>
>  
>  #include "internal.h"
>  
> @@ -36,6 +37,8 @@ struct cramfs_sb_info {
>  	unsigned long blocks;
>  	unsigned long files;
>  	unsigned long flags;
> +	void *linear_virt_addr;
> +	phys_addr_t linear_phys_addr;
>  };
>  
>  static inline struct cramfs_sb_info *CRAMFS_SB(struct super_block *sb)
> @@ -140,6 +143,9 @@ static struct inode *get_cramfs_inode(struct super_block *sb,
>   * BLKS_PER_BUF*PAGE_SIZE, so that the caller doesn't need to
>   * worry about end-of-buffer issues even when decompressing a full
>   * page cache.
> + *
> + * Note: This is all optimized away at compile time when
> + *       CONFIG_CRAMFS_BLOCKDEV=n.
>   */
>  #define READ_BUFFERS (2)
>  /* NEXT_BUFFER(): Loop over [0..(READ_BUFFERS-1)]. */
> @@ -160,10 +166,10 @@ static struct super_block *buffer_dev[READ_BUFFERS];
>  static int next_buffer;
>  
>  /*
> - * Returns a pointer to a buffer containing at least LEN bytes of
> - * filesystem starting at byte offset OFFSET into the filesystem.
> + * Populate our block cache and return a pointer from it.
>   */
> -static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len)
> +static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
> +				unsigned int len)
>  {
>  	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
>  	struct page *pages[BLKS_PER_BUF];
> @@ -239,7 +245,39 @@ static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned i
>  	return read_buffers[buffer] + offset;
>  }
>  
> -static void cramfs_kill_sb(struct super_block *sb)
> +/*
> + * Return a pointer to the linearly addressed cramfs image in memory.
> + */
> +static void *cramfs_direct_read(struct super_block *sb, unsigned int offset,
> +				unsigned int len)
> +{
> +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> +
> +	if (!len)
> +		return NULL;
> +	if (len > sbi->size || offset > sbi->size - len)
> +	       return page_address(ZERO_PAGE(0));
> +	return sbi->linear_virt_addr + offset;
> +}
> +
> +/*
> + * Returns a pointer to a buffer containing at least LEN bytes of
> + * filesystem starting at byte offset OFFSET into the filesystem.
> + */
> +static void *cramfs_read(struct super_block *sb, unsigned int offset,
> +			 unsigned int len)
> +{
> +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> +
> +	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) && sbi->linear_virt_addr)
> +		return cramfs_direct_read(sb, offset, len);
> +	else if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
> +		return cramfs_blkdev_read(sb, offset, len);
> +	else
> +		return NULL;
> +}
> +
> +static void cramfs_blkdev_kill_sb(struct super_block *sb)
>  {
>  	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
>  
> @@ -247,6 +285,16 @@ static void cramfs_kill_sb(struct super_block *sb)
>  	kfree(sbi);
>  }
>  
> +static void cramfs_physmem_kill_sb(struct super_block *sb)
> +{
> +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> +
> +	if (sbi->linear_virt_addr)
> +		memunmap(sbi->linear_virt_addr);
> +	kill_anon_super(sb);
> +	kfree(sbi);
> +}
> +
>  static int cramfs_remount(struct super_block *sb, int *flags, char *data)
>  {
>  	sync_filesystem(sb);
> @@ -254,34 +302,24 @@ static int cramfs_remount(struct super_block *sb, int *flags, char *data)
>  	return 0;
>  }
>  
> -static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
> +static int cramfs_read_super(struct super_block *sb,
> +			     struct cramfs_super *super, int silent)
>  {
> -	int i;
> -	struct cramfs_super super;
> +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
>  	unsigned long root_offset;
> -	struct cramfs_sb_info *sbi;
> -	struct inode *root;
> -
> -	sb->s_flags |= MS_RDONLY;
> -
> -	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
> -	if (!sbi)
> -		return -ENOMEM;
> -	sb->s_fs_info = sbi;
>  
> -	/* Invalidate the read buffers on mount: think disk change.. */
> -	mutex_lock(&read_mutex);
> -	for (i = 0; i < READ_BUFFERS; i++)
> -		buffer_blocknr[i] = -1;
> +	/* We don't know the real size yet */
> +	sbi->size = PAGE_SIZE;
>  
>  	/* Read the first block and get the superblock from it */
> -	memcpy(&super, cramfs_read(sb, 0, sizeof(super)), sizeof(super));
> +	mutex_lock(&read_mutex);
> +	memcpy(super, cramfs_read(sb, 0, sizeof(*super)), sizeof(*super));
>  	mutex_unlock(&read_mutex);
>  
>  	/* Do sanity checks on the superblock */
> -	if (super.magic != CRAMFS_MAGIC) {
> +	if (super->magic != CRAMFS_MAGIC) {
>  		/* check for wrong endianness */
> -		if (super.magic == CRAMFS_MAGIC_WEND) {
> +		if (super->magic == CRAMFS_MAGIC_WEND) {
>  			if (!silent)
>  				pr_err("wrong endianness\n");
>  			return -EINVAL;
> @@ -289,10 +327,10 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
>  
>  		/* check at 512 byte offset */
>  		mutex_lock(&read_mutex);
> -		memcpy(&super, cramfs_read(sb, 512, sizeof(super)), sizeof(super));
> +		memcpy(super, cramfs_read(sb, 512, sizeof(*super)), sizeof(*super));
>  		mutex_unlock(&read_mutex);
> -		if (super.magic != CRAMFS_MAGIC) {
> -			if (super.magic == CRAMFS_MAGIC_WEND && !silent)
> +		if (super->magic != CRAMFS_MAGIC) {
> +			if (super->magic == CRAMFS_MAGIC_WEND && !silent)
>  				pr_err("wrong endianness\n");
>  			else if (!silent)
>  				pr_err("wrong magic\n");
> @@ -301,34 +339,34 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
>  	}
>  
>  	/* get feature flags first */
> -	if (super.flags & ~CRAMFS_SUPPORTED_FLAGS) {
> +	if (super->flags & ~CRAMFS_SUPPORTED_FLAGS) {
>  		pr_err("unsupported filesystem features\n");
>  		return -EINVAL;
>  	}
>  
>  	/* Check that the root inode is in a sane state */
> -	if (!S_ISDIR(super.root.mode)) {
> +	if (!S_ISDIR(super->root.mode)) {
>  		pr_err("root is not a directory\n");
>  		return -EINVAL;
>  	}
>  	/* correct strange, hard-coded permissions of mkcramfs */
> -	super.root.mode |= (S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
> +	super->root.mode |= (S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
>  
> -	root_offset = super.root.offset << 2;
> -	if (super.flags & CRAMFS_FLAG_FSID_VERSION_2) {
> -		sbi->size = super.size;
> -		sbi->blocks = super.fsid.blocks;
> -		sbi->files = super.fsid.files;
> +	root_offset = super->root.offset << 2;
> +	if (super->flags & CRAMFS_FLAG_FSID_VERSION_2) {
> +		sbi->size = super->size;
> +		sbi->blocks = super->fsid.blocks;
> +		sbi->files = super->fsid.files;
>  	} else {
>  		sbi->size = 1<<28;
>  		sbi->blocks = 0;
>  		sbi->files = 0;
>  	}
> -	sbi->magic = super.magic;
> -	sbi->flags = super.flags;
> +	sbi->magic = super->magic;
> +	sbi->flags = super->flags;
>  	if (root_offset == 0)
>  		pr_info("empty filesystem");
> -	else if (!(super.flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) &&
> +	else if (!(super->flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) &&
>  		 ((root_offset != sizeof(struct cramfs_super)) &&
>  		  (root_offset != 512 + sizeof(struct cramfs_super))))
>  	{
> @@ -336,9 +374,18 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
>  		return -EINVAL;
>  	}
>  
> +	return 0;
> +}
> +
> +static int cramfs_finalize_super(struct super_block *sb,
> +				 struct cramfs_inode *cramfs_root)
> +{
> +	struct inode *root;
> +
>  	/* Set it all up.. */
> +	sb->s_flags |= MS_RDONLY;
>  	sb->s_op = &cramfs_ops;
> -	root = get_cramfs_inode(sb, &super.root, 0);
> +	root = get_cramfs_inode(sb, cramfs_root, 0);
>  	if (IS_ERR(root))
>  		return PTR_ERR(root);
>  	sb->s_root = d_make_root(root);
> @@ -347,6 +394,92 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
>  	return 0;
>  }
>  
> +static int cramfs_blkdev_fill_super(struct super_block *sb, void *data, int silent)
> +{
> +	struct cramfs_sb_info *sbi;
> +	struct cramfs_super super;
> +	int i, err;
> +
> +	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
> +	if (!sbi)
> +		return -ENOMEM;
> +	sb->s_fs_info = sbi;
> +
> +	/* Invalidate the read buffers on mount: think disk change.. */
> +	for (i = 0; i < READ_BUFFERS; i++)
> +		buffer_blocknr[i] = -1;
> +
> +	err = cramfs_read_super(sb, &super, silent);
> +	if (err)
> +		return err;
> +	return cramfs_finalize_super(sb, &super.root);
> +}
> +
> +static int cramfs_physmem_fill_super(struct super_block *sb, void *data, int silent)
> +{
> +	struct cramfs_sb_info *sbi;
> +	struct cramfs_super super;
> +	char *p;
> +	int err;
> +
> +	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
> +	if (!sbi)
> +		return -ENOMEM;
> +	sb->s_fs_info = sbi;
> +
> +	/*
> +	 * The physical location of the cramfs image is specified as
> +	 * a mount parameter.  This parameter is mandatory for obvious
> +	 * reasons.  Some validation is made on the phys address but this
> +	 * is not exhaustive and we count on the fact that someone using
> +	 * this feature is supposed to know what he/she's doing.
> +	 */
> +	if (!data || !(p = strstr((char *)data, "physaddr="))) {
> +		pr_err("unknown physical address for linear cramfs image\n");
> +		return -EINVAL;
> +	}
> +	sbi->linear_phys_addr = memparse(p + 9, NULL);
> +	if (!sbi->linear_phys_addr) {
> +		pr_err("bad value for cramfs image physical address\n");
> +		return -EINVAL;
> +	}
> +	if (sbi->linear_phys_addr & (PAGE_SIZE-1)) {
> +		pr_err("physical address %pap for linear cramfs isn't aligned to a page boundary\n",
> +			&sbi->linear_phys_addr);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Map only one page for now.  Will remap it when fs size is known.
> +	 * Although we'll only read from it, we want the CPU cache to
> +	 * kick in for the higher throughput it provides, hence MEMREMAP_WB.
> +	 */
> +	pr_info("checking physical address %pap for linear cramfs image\n", &sbi->linear_phys_addr);
> +	sbi->linear_virt_addr = memremap(sbi->linear_phys_addr, PAGE_SIZE,
> +					 MEMREMAP_WB);
> +	if (!sbi->linear_virt_addr) {
> +		pr_err("ioremap of the linear cramfs image failed\n");
> +		return -ENOMEM;
> +	}
> +
> +	err = cramfs_read_super(sb, &super, silent);
> +	if (err)
> +		return err;
> +
> +	/* Remap the whole filesystem now */
> +	pr_info("linear cramfs image appears to be %lu KB in size\n",
> +		sbi->size/1024);
> +	memunmap(sbi->linear_virt_addr);
> +	sbi->linear_virt_addr = memremap(sbi->linear_phys_addr, sbi->size,
> +					 MEMREMAP_WB);
> +	if (!sbi->linear_virt_addr) {
> +		pr_err("ioremap of the linear cramfs image failed\n");
> +		return -ENOMEM;
> +	}
> +
> +	return cramfs_finalize_super(sb, &super.root);
> +}
> +
>  static int cramfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  {
>  	struct super_block *sb = dentry->d_sb;
> @@ -573,38 +706,67 @@ static const struct super_operations cramfs_ops = {
>  	.statfs		= cramfs_statfs,
>  };
>  
> -static struct dentry *cramfs_mount(struct file_system_type *fs_type,
> -	int flags, const char *dev_name, void *data)
> +static struct dentry *cramfs_blkdev_mount(struct file_system_type *fs_type,
> +				int flags, const char *dev_name, void *data)
> +{
> +	return mount_bdev(fs_type, flags, dev_name, data, cramfs_blkdev_fill_super);
> +}
> +
> +static struct dentry *cramfs_physmem_mount(struct file_system_type *fs_type,
> +				int flags, const char *dev_name, void *data)
>  {
> -	return mount_bdev(fs_type, flags, dev_name, data, cramfs_fill_super);
> +	return mount_nodev(fs_type, flags, data, cramfs_physmem_fill_super);
>  }
>  
>  static struct file_system_type cramfs_fs_type = {
>  	.owner		= THIS_MODULE,
>  	.name		= "cramfs",
> -	.mount		= cramfs_mount,
> -	.kill_sb	= cramfs_kill_sb,
> +	.mount		= cramfs_blkdev_mount,
> +	.kill_sb	= cramfs_blkdev_kill_sb,
>  	.fs_flags	= FS_REQUIRES_DEV,
>  };
> +
> +static struct file_system_type cramfs_physmem_fs_type = {
> +	.owner		= THIS_MODULE,
> +	.name		= "cramfs_physmem",
> +	.mount		= cramfs_physmem_mount,
> +	.kill_sb	= cramfs_physmem_kill_sb,
> +};
> +
> +#ifdef CONFIG_CRAMFS_BLOCKDEV
>  MODULE_ALIAS_FS("cramfs");
> +#endif
> +#ifdef CONFIG_CRAMFS_PHYSMEM
> +MODULE_ALIAS_FS("cramfs_physmem");
> +#endif
>  
>  static int __init init_cramfs_fs(void)
>  {
>  	int rv;
>  
> -	rv = cramfs_uncompress_init();
> -	if (rv < 0)
> -		return rv;
> -	rv = register_filesystem(&cramfs_fs_type);
> -	if (rv < 0)
> -		cramfs_uncompress_exit();
> -	return rv;
> +	if ((rv = cramfs_uncompress_init()) < 0)
> +		goto err0;
> +	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV) &&
> +	    (rv = register_filesystem(&cramfs_fs_type)) < 0)
> +		goto err1;
> +	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) &&
> +	    (rv = register_filesystem(&cramfs_physmem_fs_type)) < 0)
> +		goto err2;
> +	return 0;
> +
> +err2:	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
> +		unregister_filesystem(&cramfs_fs_type);
> +err1:	cramfs_uncompress_exit();
> +err0:	return rv;
>  }
>  
>  static void __exit exit_cramfs_fs(void)
>  {
>  	cramfs_uncompress_exit();
> -	unregister_filesystem(&cramfs_fs_type);
> +	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
> +		unregister_filesystem(&cramfs_fs_type);
> +	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM))
> +		unregister_filesystem(&cramfs_physmem_fs_type);
>  }
>  
>  module_init(init_cramfs_fs)
> -- 
> 2.9.5
> 
---end quoted text---

^ permalink raw reply	[flat|nested] 54+ messages in thread
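
For readers following the quoted cramfs_direct_read(), its range test can
be sketched in isolation as below. This is a minimal user-space
illustration under stated assumptions: range_in_image() is a hypothetical
name, and it folds the two failure cases of the quoted code (NULL for a
zero length, the zero page for an out-of-range request) into a single
"false" result.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: true when [offset, offset + len) lies inside an
 * image of `size` bytes. Comparing offset against size - len, after
 * checking len <= size, avoids the wrap-around that computing
 * offset + len could cause.
 */
static bool range_in_image(size_t size, size_t offset, size_t len)
{
	if (len == 0 || len > size)
		return false;
	return offset <= size - len;
}
```

The subtraction form matters because offset and len come from on-media
metadata: a huge offset would make offset + len wrap around and pass a
naive "offset + len <= size" comparison.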

* Re: [PATCH v4 1/5] cramfs: direct memory access support
@ 2017-10-01  8:29     ` Christoph Hellwig
  0 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2017-10-01  8:29 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Alexander Viro, linux-mm, linux-fsdevel, linux-embedded,
	linux-kernel, Chris Brandt, linux-mtd, devicetree

On Wed, Sep 27, 2017 at 07:32:20PM -0400, Nicolas Pitre wrote:
> To distinguish between both access types, the cramfs_physmem filesystem
> type must be specified when using a memory accessible cramfs image, and
> the physaddr argument must provide the actual filesystem image's physical
> memory location.

Sorry, but this still is a complete no-go.  A physical address is not a
proper interface.  You still need to have some interface for your NOR nand
or DRAM.  - usually that would be a mtd driver, but if you have a good
reason why that's not suitable for you (and please explain it well)
we'll need a little OF or similar layer to bind a thin driver.

> 
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
> Tested-by: Chris Brandt <chris.brandt@renesas.com>
> ---
>  fs/cramfs/Kconfig |  29 +++++-
>  fs/cramfs/inode.c | 264 +++++++++++++++++++++++++++++++++++++++++++-----------
>  2 files changed, 241 insertions(+), 52 deletions(-)
> 
> diff --git a/fs/cramfs/Kconfig b/fs/cramfs/Kconfig
> index 11b29d491b..5b4e0b7e13 100644
> --- a/fs/cramfs/Kconfig
> +++ b/fs/cramfs/Kconfig
> @@ -1,6 +1,5 @@
>  config CRAMFS
>  	tristate "Compressed ROM file system support (cramfs) (OBSOLETE)"
> -	depends on BLOCK
>  	select ZLIB_INFLATE
>  	help
>  	  Saying Y here includes support for CramFs (Compressed ROM File
> @@ -20,3 +19,31 @@ config CRAMFS
>  	  in terms of performance and features.
>  
>  	  If unsure, say N.
> +
> +config CRAMFS_BLOCKDEV
> +	bool "Support CramFs image over a regular block device" if EXPERT
> +	depends on CRAMFS && BLOCK
> +	default y
> +	help
> +	  This option allows the CramFs driver to load data from a regular
> +	  block device such a disk partition or a ramdisk.
> +
> +config CRAMFS_PHYSMEM
> +	bool "Support CramFs image directly mapped in physical memory"
> +	depends on CRAMFS
> +	default y if !CRAMFS_BLOCKDEV
> +	help
> +	  This option allows the CramFs driver to load data directly from
> +	  a linear adressed memory range (usually non volatile memory
> +	  like flash) instead of going through the block device layer.
> +	  This saves some memory since no intermediate buffering is
> +	  necessary.
> +
> +	  The filesystem type for this feature is "cramfs_physmem".
> +	  The location of the CramFs image in memory is board
> +	  dependent. Therefore, if you say Y, you must know the proper
> +	  physical address where to store the CramFs image and specify
> +	  it using the physaddr=0x******** mount option (for example:
> +	  "mount -t cramfs_physmem -o physaddr=0x100000 none /mnt").
> +
> +	  If unsure, say N.
> diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> index 7919967488..19f464a214 100644
> --- a/fs/cramfs/inode.c
> +++ b/fs/cramfs/inode.c
> @@ -24,6 +24,7 @@
>  #include <linux/mutex.h>
>  #include <uapi/linux/cramfs_fs.h>
>  #include <linux/uaccess.h>
> +#include <linux/io.h>
>  
>  #include "internal.h"
>  
> @@ -36,6 +37,8 @@ struct cramfs_sb_info {
>  	unsigned long blocks;
>  	unsigned long files;
>  	unsigned long flags;
> +	void *linear_virt_addr;
> +	phys_addr_t linear_phys_addr;
>  };
>  
>  static inline struct cramfs_sb_info *CRAMFS_SB(struct super_block *sb)
> @@ -140,6 +143,9 @@ static struct inode *get_cramfs_inode(struct super_block *sb,
>   * BLKS_PER_BUF*PAGE_SIZE, so that the caller doesn't need to
>   * worry about end-of-buffer issues even when decompressing a full
>   * page cache.
> + *
> + * Note: This is all optimized away at compile time when
> + *       CONFIG_CRAMFS_BLOCKDEV=n.
>   */
>  #define READ_BUFFERS (2)
>  /* NEXT_BUFFER(): Loop over [0..(READ_BUFFERS-1)]. */
> @@ -160,10 +166,10 @@ static struct super_block *buffer_dev[READ_BUFFERS];
>  static int next_buffer;
>  
>  /*
> - * Returns a pointer to a buffer containing at least LEN bytes of
> - * filesystem starting at byte offset OFFSET into the filesystem.
> + * Populate our block cache and return a pointer from it.
>   */
> -static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len)
> +static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
> +				unsigned int len)
>  {
>  	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
>  	struct page *pages[BLKS_PER_BUF];
> @@ -239,7 +245,39 @@ static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned i
>  	return read_buffers[buffer] + offset;
>  }
>  
> -static void cramfs_kill_sb(struct super_block *sb)
> +/*
> + * Return a pointer to the linearly addressed cramfs image in memory.
> + */
> +static void *cramfs_direct_read(struct super_block *sb, unsigned int offset,
> +				unsigned int len)
> +{
> +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> +
> +	if (!len)
> +		return NULL;
> +	if (len > sbi->size || offset > sbi->size - len)
> +	       return page_address(ZERO_PAGE(0));
> +	return sbi->linear_virt_addr + offset;
> +}
> +
> +/*
> + * Returns a pointer to a buffer containing at least LEN bytes of
> + * filesystem starting at byte offset OFFSET into the filesystem.
> + */
> +static void *cramfs_read(struct super_block *sb, unsigned int offset,
> +			 unsigned int len)
> +{
> +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> +
> +	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) && sbi->linear_virt_addr)
> +		return cramfs_direct_read(sb, offset, len);
> +	else if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
> +		return cramfs_blkdev_read(sb, offset, len);
> +	else
> +		return NULL;
> +}
> +
> +static void cramfs_blkdev_kill_sb(struct super_block *sb)
>  {
>  	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
>  
> @@ -247,6 +285,16 @@ static void cramfs_kill_sb(struct super_block *sb)
>  	kfree(sbi);
>  }
>  
> +static void cramfs_physmem_kill_sb(struct super_block *sb)
> +{
> +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> +
> +	if (sbi->linear_virt_addr)
> +		memunmap(sbi->linear_virt_addr);
> +	kill_anon_super(sb);
> +	kfree(sbi);
> +}
> +
>  static int cramfs_remount(struct super_block *sb, int *flags, char *data)
>  {
>  	sync_filesystem(sb);
> @@ -254,34 +302,24 @@ static int cramfs_remount(struct super_block *sb, int *flags, char *data)
>  	return 0;
>  }
>  
> -static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
> +static int cramfs_read_super(struct super_block *sb,
> +			     struct cramfs_super *super, int silent)
>  {
> -	int i;
> -	struct cramfs_super super;
> +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
>  	unsigned long root_offset;
> -	struct cramfs_sb_info *sbi;
> -	struct inode *root;
> -
> -	sb->s_flags |= MS_RDONLY;
> -
> -	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
> -	if (!sbi)
> -		return -ENOMEM;
> -	sb->s_fs_info = sbi;
>  
> -	/* Invalidate the read buffers on mount: think disk change.. */
> -	mutex_lock(&read_mutex);
> -	for (i = 0; i < READ_BUFFERS; i++)
> -		buffer_blocknr[i] = -1;
> +	/* We don't know the real size yet */
> +	sbi->size = PAGE_SIZE;
>  
>  	/* Read the first block and get the superblock from it */
> -	memcpy(&super, cramfs_read(sb, 0, sizeof(super)), sizeof(super));
> +	mutex_lock(&read_mutex);
> +	memcpy(super, cramfs_read(sb, 0, sizeof(*super)), sizeof(*super));
>  	mutex_unlock(&read_mutex);
>  
>  	/* Do sanity checks on the superblock */
> -	if (super.magic != CRAMFS_MAGIC) {
> +	if (super->magic != CRAMFS_MAGIC) {
>  		/* check for wrong endianness */
> -		if (super.magic == CRAMFS_MAGIC_WEND) {
> +		if (super->magic == CRAMFS_MAGIC_WEND) {
>  			if (!silent)
>  				pr_err("wrong endianness\n");
>  			return -EINVAL;
> @@ -289,10 +327,10 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
>  
>  		/* check at 512 byte offset */
>  		mutex_lock(&read_mutex);
> -		memcpy(&super, cramfs_read(sb, 512, sizeof(super)), sizeof(super));
> +		memcpy(super, cramfs_read(sb, 512, sizeof(*super)), sizeof(*super));
>  		mutex_unlock(&read_mutex);
> -		if (super.magic != CRAMFS_MAGIC) {
> -			if (super.magic == CRAMFS_MAGIC_WEND && !silent)
> +		if (super->magic != CRAMFS_MAGIC) {
> +			if (super->magic == CRAMFS_MAGIC_WEND && !silent)
>  				pr_err("wrong endianness\n");
>  			else if (!silent)
>  				pr_err("wrong magic\n");
> @@ -301,34 +339,34 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
>  	}
>  
>  	/* get feature flags first */
> -	if (super.flags & ~CRAMFS_SUPPORTED_FLAGS) {
> +	if (super->flags & ~CRAMFS_SUPPORTED_FLAGS) {
>  		pr_err("unsupported filesystem features\n");
>  		return -EINVAL;
>  	}
>  
>  	/* Check that the root inode is in a sane state */
> -	if (!S_ISDIR(super.root.mode)) {
> +	if (!S_ISDIR(super->root.mode)) {
>  		pr_err("root is not a directory\n");
>  		return -EINVAL;
>  	}
>  	/* correct strange, hard-coded permissions of mkcramfs */
> -	super.root.mode |= (S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
> +	super->root.mode |= (S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
>  
> -	root_offset = super.root.offset << 2;
> -	if (super.flags & CRAMFS_FLAG_FSID_VERSION_2) {
> -		sbi->size = super.size;
> -		sbi->blocks = super.fsid.blocks;
> -		sbi->files = super.fsid.files;
> +	root_offset = super->root.offset << 2;
> +	if (super->flags & CRAMFS_FLAG_FSID_VERSION_2) {
> +		sbi->size = super->size;
> +		sbi->blocks = super->fsid.blocks;
> +		sbi->files = super->fsid.files;
>  	} else {
>  		sbi->size = 1<<28;
>  		sbi->blocks = 0;
>  		sbi->files = 0;
>  	}
> -	sbi->magic = super.magic;
> -	sbi->flags = super.flags;
> +	sbi->magic = super->magic;
> +	sbi->flags = super->flags;
>  	if (root_offset == 0)
>  		pr_info("empty filesystem");
> -	else if (!(super.flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) &&
> +	else if (!(super->flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) &&
>  		 ((root_offset != sizeof(struct cramfs_super)) &&
>  		  (root_offset != 512 + sizeof(struct cramfs_super))))
>  	{
> @@ -336,9 +374,18 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
>  		return -EINVAL;
>  	}
>  
> +	return 0;
> +}
> +
> +static int cramfs_finalize_super(struct super_block *sb,
> +				 struct cramfs_inode *cramfs_root)
> +{
> +	struct inode *root;
> +
>  	/* Set it all up.. */
> +	sb->s_flags |= MS_RDONLY;
>  	sb->s_op = &cramfs_ops;
> -	root = get_cramfs_inode(sb, &super.root, 0);
> +	root = get_cramfs_inode(sb, cramfs_root, 0);
>  	if (IS_ERR(root))
>  		return PTR_ERR(root);
>  	sb->s_root = d_make_root(root);
> @@ -347,6 +394,92 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
>  	return 0;
>  }
>  
> +static int cramfs_blkdev_fill_super(struct super_block *sb, void *data, int silent)
> +{
> +	struct cramfs_sb_info *sbi;
> +	struct cramfs_super super;
> +	int i, err;
> +
> +	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
> +	if (!sbi)
> +		return -ENOMEM;
> +	sb->s_fs_info = sbi;
> +
> +	/* Invalidate the read buffers on mount: think disk change.. */
> +	for (i = 0; i < READ_BUFFERS; i++)
> +		buffer_blocknr[i] = -1;
> +
> +	err = cramfs_read_super(sb, &super, silent);
> +	if (err)
> +		return err;
> +	return cramfs_finalize_super(sb, &super.root);
> +}
> +
> +static int cramfs_physmem_fill_super(struct super_block *sb, void *data, int silent)
> +{
> +	struct cramfs_sb_info *sbi;
> +	struct cramfs_super super;
> +	char *p;
> +	int err;
> +
> +	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
> +	if (!sbi)
> +		return -ENOMEM;
> +	sb->s_fs_info = sbi;
> +
> +	/*
> +	 * The physical location of the cramfs image is specified as
> +	 * a mount parameter.  This parameter is mandatory for obvious
> +	 * reasons.  Some validation is made on the phys address but this
> +	 * is not exhaustive and we count on the fact that someone using
> +	 * this feature is supposed to know what he/she's doing.
> +	 */
> +	if (!data || !(p = strstr((char *)data, "physaddr="))) {
> +		pr_err("unknown physical address for linear cramfs image\n");
> +		return -EINVAL;
> +	}
> +	sbi->linear_phys_addr = memparse(p + 9, NULL);
> +	if (!sbi->linear_phys_addr) {
> +		pr_err("bad value for cramfs image physical address\n");
> +		return -EINVAL;
> +	}
> +	if (sbi->linear_phys_addr & (PAGE_SIZE-1)) {
> +		pr_err("physical address %pap for linear cramfs isn't aligned to a page boundary\n",
> +			&sbi->linear_phys_addr);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Map only one page for now.  Will remap it when fs size is known.
> +	 * Although we'll only read from it, we want the CPU cache to
> +	 * kick in for the higher throughput it provides, hence MEMREMAP_WB.
> +	 */
> +	pr_info("checking physical address %pap for linear cramfs image\n", &sbi->linear_phys_addr);
> +	sbi->linear_virt_addr = memremap(sbi->linear_phys_addr, PAGE_SIZE,
> +					 MEMREMAP_WB);
> +	if (!sbi->linear_virt_addr) {
> +		pr_err("ioremap of the linear cramfs image failed\n");
> +		return -ENOMEM;
> +	}
> +
> +	err = cramfs_read_super(sb, &super, silent);
> +	if (err)
> +		return err;
> +
> +	/* Remap the whole filesystem now */
> +	pr_info("linear cramfs image appears to be %lu KB in size\n",
> +		sbi->size/1024);
> +	memunmap(sbi->linear_virt_addr);
> +	sbi->linear_virt_addr = memremap(sbi->linear_phys_addr, sbi->size,
> +					 MEMREMAP_WB);
> +	if (!sbi->linear_virt_addr) {
> +		pr_err("ioremap of the linear cramfs image failed\n");
> +		return -ENOMEM;
> +	}
> +
> +	return cramfs_finalize_super(sb, &super.root);
> +}
> +
>  static int cramfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  {
>  	struct super_block *sb = dentry->d_sb;
> @@ -573,38 +706,67 @@ static const struct super_operations cramfs_ops = {
>  	.statfs		= cramfs_statfs,
>  };
>  
> -static struct dentry *cramfs_mount(struct file_system_type *fs_type,
> -	int flags, const char *dev_name, void *data)
> +static struct dentry *cramfs_blkdev_mount(struct file_system_type *fs_type,
> +				int flags, const char *dev_name, void *data)
> +{
> +	return mount_bdev(fs_type, flags, dev_name, data, cramfs_blkdev_fill_super);
> +}
> +
> +static struct dentry *cramfs_physmem_mount(struct file_system_type *fs_type,
> +				int flags, const char *dev_name, void *data)
>  {
> -	return mount_bdev(fs_type, flags, dev_name, data, cramfs_fill_super);
> +	return mount_nodev(fs_type, flags, data, cramfs_physmem_fill_super);
>  }
>  
>  static struct file_system_type cramfs_fs_type = {
>  	.owner		= THIS_MODULE,
>  	.name		= "cramfs",
> -	.mount		= cramfs_mount,
> -	.kill_sb	= cramfs_kill_sb,
> +	.mount		= cramfs_blkdev_mount,
> +	.kill_sb	= cramfs_blkdev_kill_sb,
>  	.fs_flags	= FS_REQUIRES_DEV,
>  };
> +
> +static struct file_system_type cramfs_physmem_fs_type = {
> +	.owner		= THIS_MODULE,
> +	.name		= "cramfs_physmem",
> +	.mount		= cramfs_physmem_mount,
> +	.kill_sb	= cramfs_physmem_kill_sb,
> +};
> +
> +#ifdef CONFIG_CRAMFS_BLOCKDEV
>  MODULE_ALIAS_FS("cramfs");
> +#endif
> +#ifdef CONFIG_CRAMFS_PHYSMEM
> +MODULE_ALIAS_FS("cramfs_physmem");
> +#endif
>  
>  static int __init init_cramfs_fs(void)
>  {
>  	int rv;
>  
> -	rv = cramfs_uncompress_init();
> -	if (rv < 0)
> -		return rv;
> -	rv = register_filesystem(&cramfs_fs_type);
> -	if (rv < 0)
> -		cramfs_uncompress_exit();
> -	return rv;
> +	if ((rv = cramfs_uncompress_init()) < 0)
> +		goto err0;
> +	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV) &&
> +	    (rv = register_filesystem(&cramfs_fs_type)) < 0)
> +		goto err1;
> +	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) &&
> +	    (rv = register_filesystem(&cramfs_physmem_fs_type)) < 0)
> +		goto err2;
> +	return 0;
> +
> +err2:	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
> +		unregister_filesystem(&cramfs_fs_type);
> +err1:	cramfs_uncompress_exit();
> +err0:	return rv;
>  }
>  
>  static void __exit exit_cramfs_fs(void)
>  {
>  	cramfs_uncompress_exit();
> -	unregister_filesystem(&cramfs_fs_type);
> +	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
> +		unregister_filesystem(&cramfs_fs_type);
> +	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM))
> +		unregister_filesystem(&cramfs_physmem_fs_type);
>  }
>  
>  module_init(init_cramfs_fs)
> -- 
> 2.9.5
> 
---end quoted text---
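
The error handling in init_cramfs_fs() quoted above follows the usual kernel pattern of unwinding in reverse order through goto labels. A minimal userspace sketch of that pattern, assuming hypothetical stand-ins (sketch_setup(), sketch_teardown(), struct sketch_state) for the real init and registration calls:

```c
/* Hypothetical stand-ins for cramfs_uncompress_init() and the two
 * register_filesystem() calls; none of these names exist in the kernel. */
struct sketch_state {
	int fail_step;	/* which step should fail (0 = none) */
	int live;	/* bitmask of steps currently set up */
};

static int sketch_setup(struct sketch_state *s, int step)
{
	if (s->fail_step == step)
		return -1;
	s->live |= 1 << step;
	return 0;
}

static void sketch_teardown(struct sketch_state *s, int step)
{
	s->live &= ~(1 << step);
}

/* Set up steps 1..3; on failure undo the completed ones in reverse order. */
int sketch_init(struct sketch_state *s)
{
	int rv;

	rv = sketch_setup(s, 1);
	if (rv < 0)
		goto err0;
	rv = sketch_setup(s, 2);
	if (rv < 0)
		goto err1;
	rv = sketch_setup(s, 3);
	if (rv < 0)
		goto err2;
	return 0;

err2:
	sketch_teardown(s, 2);
err1:
	sketch_teardown(s, 1);
err0:
	return rv;
}
```

Each label undoes only the steps that completed before the failure, so a failure at any point leaves nothing set up.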

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 4/5] cramfs: add mmap support
  2017-09-27 23:32   ` Nicolas Pitre
@ 2017-10-01  8:30     ` Christoph Hellwig
  -1 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2017-10-01  8:30 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Alexander Viro, linux-mm, linux-fsdevel, linux-embedded,
	linux-kernel, Chris Brandt

up_read(&mm->mmap_sem) in the fault path is still a complete
no-go.

NAK

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 1/5] cramfs: direct memory access support
  2017-10-01  8:29     ` Christoph Hellwig
@ 2017-10-01 22:27       ` Nicolas Pitre
  -1 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-10-01 22:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alexander Viro, linux-mm, linux-fsdevel, linux-embedded,
	linux-kernel, Chris Brandt, linux-mtd, devicetree

On Sun, 1 Oct 2017, Christoph Hellwig wrote:

> On Wed, Sep 27, 2017 at 07:32:20PM -0400, Nicolas Pitre wrote:
> > To distinguish between both access types, the cramfs_physmem filesystem
> > type must be specified when using a memory accessible cramfs image, and
> > the physaddr argument must provide the actual filesystem image's physical
> > memory location.
> 
> Sorry, but this still is a complete no-go.  A physical address is not a
> proper interface.  You still need to have some interface for your NOR nand
> or DRAM.  - usually that would be a mtd driver, but if you have a good
> reason why that's not suitable for you (and please explain it well)
> we'll need a little OF or similar layer to bind a thin driver.

The primary use case for this is to run Linux on a small microcontroller 
with some amount of RAM and ROM on chip. And this is not theoretical -- 
I already have it running here. The ROM is some kind of flash that 
appears in the direct memory address space and requires no access layer 
whatsoever given it is meant to execute code from it. The flash is 
programmed with an external programmer through some debug port. It can't 
be programmed from the microcontroller itself, not even probed, as that 
would make the running code unavailable (unless the probe code is copied 
elsewhere but what would be the point?). Persistent state is typically 
kept in NVRAM or external flash, not in _that_ flash.

The MTD subsystem provides a lot of features and flexibility, but almost 
none of it would be usable here, so it would only be a useless kernel 
size increase.

The kernel itself runs XIP from that ROM. It has to be linked for the 
exact address where it is flashed. The link address is therefore not a 
variable that can be changed at run time. It is the same for the 
filesystem image: it is related to the way things are laid out in ROM, 
and typically depends on the actual size of the kernel when ROM is 
tight.
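
As an illustration of how the image address follows from the ROM layout, here is a minimal sketch of the arithmetic, assuming the image is placed at the first page boundary after the kernel. The 4096-byte page size, the function names and the example addresses are all hypothetical; the only constraint taken from the patch is that the address must be page aligned.

```c
#include <stdint.h>

/* Hypothetical page size; the real value comes from the target's config. */
#define SKETCH_PAGE_SIZE 4096u

/* Round addr up to the next multiple of SKETCH_PAGE_SIZE (a power of two). */
uint32_t sketch_page_align(uint32_t addr)
{
	return (addr + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/*
 * First page-aligned address past a kernel linked at kernel_base that
 * occupies kernel_size bytes of ROM.
 */
uint32_t sketch_fs_image_addr(uint32_t kernel_base, uint32_t kernel_size)
{
	return sketch_page_align(kernel_base + kernel_size);
}
```

With a kernel linked at 0x08000000 and 0xe1234 bytes long, this yields 0x080e2000, which is the value that would then be passed as physaddr.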

You fundamentally need to know the address of the kernel _and_ the 
address of the fs image. Those addresses are properties of your kernel 
config. So having to specify one in Kconfig and bury the other in DT 
doesn't make sense to me as this is just an extra file to edit and 
compile, and an extra binary to write into flash, for something that 
isn't a property of the hardware. The bootloader and DT should remain 
stable as much as possible with invariant data.

If you prefer, the physical address could be specified with a Kconfig 
symbol just like the kernel link address. Personally I think it is best 
to keep it along with the other root mount args. But going all the way 
with a dynamic driver binding interface and a dummy intermediate name is 
like using a sledgehammer to kill an ant: it will work of course, but 
given the context it is prone to errors due to the added manipulations 
mentioned previously ... and a tad overkill.
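
For reference, the handling of the physaddr mount argument in the quoted patch can be approximated in userspace as follows. This is only a sketch: it uses strtoull() where the patch uses the kernel's memparse(), so size suffixes are not handled, and the page size and function name are assumptions.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/*
 * Extract the physaddr= value from a mount option string.
 * Returns 0 and stores the address on success, -EINVAL otherwise.
 */
int sketch_parse_physaddr(const char *opts, uint64_t *addr)
{
	static const char key[] = "physaddr=";
	const char *p;
	uint64_t val;

	if (!opts || !(p = strstr(opts, key)))
		return -EINVAL;		/* the option is mandatory */
	val = strtoull(p + sizeof(key) - 1, NULL, 0);
	if (!val)
		return -EINVAL;		/* zero or unparsable value */
	if (val & (SKETCH_PAGE_SIZE - 1))
		return -EINVAL;		/* not page aligned */
	*addr = val;
	return 0;
}
```

The three rejections mirror the checks in cramfs_physmem_fill_super(): missing option, zero value, and an address that is not on a page boundary.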


> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > Tested-by: Chris Brandt <chris.brandt@renesas.com>
> > ---
> >  fs/cramfs/Kconfig |  29 +++++-
> >  fs/cramfs/inode.c | 264 +++++++++++++++++++++++++++++++++++++++++++-----------
> >  2 files changed, 241 insertions(+), 52 deletions(-)
> > 
> > diff --git a/fs/cramfs/Kconfig b/fs/cramfs/Kconfig
> > index 11b29d491b..5b4e0b7e13 100644
> > --- a/fs/cramfs/Kconfig
> > +++ b/fs/cramfs/Kconfig
> > @@ -1,6 +1,5 @@
> >  config CRAMFS
> >  	tristate "Compressed ROM file system support (cramfs) (OBSOLETE)"
> > -	depends on BLOCK
> >  	select ZLIB_INFLATE
> >  	help
> >  	  Saying Y here includes support for CramFs (Compressed ROM File
> > @@ -20,3 +19,31 @@ config CRAMFS
> >  	  in terms of performance and features.
> >  
> >  	  If unsure, say N.
> > +
> > +config CRAMFS_BLOCKDEV
> > +	bool "Support CramFs image over a regular block device" if EXPERT
> > +	depends on CRAMFS && BLOCK
> > +	default y
> > +	help
> > +	  This option allows the CramFs driver to load data from a regular
> > +	  block device such as a disk partition or a ramdisk.
> > +
> > +config CRAMFS_PHYSMEM
> > +	bool "Support CramFs image directly mapped in physical memory"
> > +	depends on CRAMFS
> > +	default y if !CRAMFS_BLOCKDEV
> > +	help
> > +	  This option allows the CramFs driver to load data directly from
> > +	  a linearly addressed memory range (usually non-volatile memory
> > +	  like flash) instead of going through the block device layer.
> > +	  This saves some memory since no intermediate buffering is
> > +	  necessary.
> > +
> > +	  The filesystem type for this feature is "cramfs_physmem".
> > +	  The location of the CramFs image in memory is board
> > +	  dependent. Therefore, if you say Y, you must know the proper
> > +	  physical address where to store the CramFs image and specify
> > +	  it using the physaddr=0x******** mount option (for example:
> > +	  "mount -t cramfs_physmem -o physaddr=0x100000 none /mnt").
> > +
> > +	  If unsure, say N.
> > diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> > index 7919967488..19f464a214 100644
> > --- a/fs/cramfs/inode.c
> > +++ b/fs/cramfs/inode.c
> > @@ -24,6 +24,7 @@
> >  #include <linux/mutex.h>
> >  #include <uapi/linux/cramfs_fs.h>
> >  #include <linux/uaccess.h>
> > +#include <linux/io.h>
> >  
> >  #include "internal.h"
> >  
> > @@ -36,6 +37,8 @@ struct cramfs_sb_info {
> >  	unsigned long blocks;
> >  	unsigned long files;
> >  	unsigned long flags;
> > +	void *linear_virt_addr;
> > +	phys_addr_t linear_phys_addr;
> >  };
> >  
> >  static inline struct cramfs_sb_info *CRAMFS_SB(struct super_block *sb)
> > @@ -140,6 +143,9 @@ static struct inode *get_cramfs_inode(struct super_block *sb,
> >   * BLKS_PER_BUF*PAGE_SIZE, so that the caller doesn't need to
> >   * worry about end-of-buffer issues even when decompressing a full
> >   * page cache.
> > + *
> > + * Note: This is all optimized away at compile time when
> > + *       CONFIG_CRAMFS_BLOCKDEV=n.
> >   */
> >  #define READ_BUFFERS (2)
> >  /* NEXT_BUFFER(): Loop over [0..(READ_BUFFERS-1)]. */
> > @@ -160,10 +166,10 @@ static struct super_block *buffer_dev[READ_BUFFERS];
> >  static int next_buffer;
> >  
> >  /*
> > - * Returns a pointer to a buffer containing at least LEN bytes of
> > - * filesystem starting at byte offset OFFSET into the filesystem.
> > + * Populate our block cache and return a pointer from it.
> >   */
> > -static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len)
> > +static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
> > +				unsigned int len)
> >  {
> >  	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
> >  	struct page *pages[BLKS_PER_BUF];
> > @@ -239,7 +245,39 @@ static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned i
> >  	return read_buffers[buffer] + offset;
> >  }
> >  
> > -static void cramfs_kill_sb(struct super_block *sb)
> > +/*
> > + * Return a pointer to the linearly addressed cramfs image in memory.
> > + */
> > +static void *cramfs_direct_read(struct super_block *sb, unsigned int offset,
> > +				unsigned int len)
> > +{
> > +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> > +
> > +	if (!len)
> > +		return NULL;
> > +	if (len > sbi->size || offset > sbi->size - len)
> > +	       return page_address(ZERO_PAGE(0));
> > +	return sbi->linear_virt_addr + offset;
> > +}
> > +
> > +/*
> > + * Returns a pointer to a buffer containing at least LEN bytes of
> > + * filesystem starting at byte offset OFFSET into the filesystem.
> > + */
> > +static void *cramfs_read(struct super_block *sb, unsigned int offset,
> > +			 unsigned int len)
> > +{
> > +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> > +
> > +	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) && sbi->linear_virt_addr)
> > +		return cramfs_direct_read(sb, offset, len);
> > +	else if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
> > +		return cramfs_blkdev_read(sb, offset, len);
> > +	else
> > +		return NULL;
> > +}
> > +
> > +static void cramfs_blkdev_kill_sb(struct super_block *sb)
> >  {
> >  	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> >  
> > @@ -247,6 +285,16 @@ static void cramfs_kill_sb(struct super_block *sb)
> >  	kfree(sbi);
> >  }
> >  
> > +static void cramfs_physmem_kill_sb(struct super_block *sb)
> > +{
> > +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> > +
> > +	if (sbi->linear_virt_addr)
> > +		memunmap(sbi->linear_virt_addr);
> > +	kill_anon_super(sb);
> > +	kfree(sbi);
> > +}
> > +
> >  static int cramfs_remount(struct super_block *sb, int *flags, char *data)
> >  {
> >  	sync_filesystem(sb);
> > @@ -254,34 +302,24 @@ static int cramfs_remount(struct super_block *sb, int *flags, char *data)
> >  	return 0;
> >  }
> >  
> > -static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
> > +static int cramfs_read_super(struct super_block *sb,
> > +			     struct cramfs_super *super, int silent)
> >  {
> > -	int i;
> > -	struct cramfs_super super;
> > +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> >  	unsigned long root_offset;
> > -	struct cramfs_sb_info *sbi;
> > -	struct inode *root;
> > -
> > -	sb->s_flags |= MS_RDONLY;
> > -
> > -	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
> > -	if (!sbi)
> > -		return -ENOMEM;
> > -	sb->s_fs_info = sbi;
> >  
> > -	/* Invalidate the read buffers on mount: think disk change.. */
> > -	mutex_lock(&read_mutex);
> > -	for (i = 0; i < READ_BUFFERS; i++)
> > -		buffer_blocknr[i] = -1;
> > +	/* We don't know the real size yet */
> > +	sbi->size = PAGE_SIZE;
> >  
> >  	/* Read the first block and get the superblock from it */
> > -	memcpy(&super, cramfs_read(sb, 0, sizeof(super)), sizeof(super));
> > +	mutex_lock(&read_mutex);
> > +	memcpy(super, cramfs_read(sb, 0, sizeof(*super)), sizeof(*super));
> >  	mutex_unlock(&read_mutex);
> >  
> >  	/* Do sanity checks on the superblock */
> > -	if (super.magic != CRAMFS_MAGIC) {
> > +	if (super->magic != CRAMFS_MAGIC) {
> >  		/* check for wrong endianness */
> > -		if (super.magic == CRAMFS_MAGIC_WEND) {
> > +		if (super->magic == CRAMFS_MAGIC_WEND) {
> >  			if (!silent)
> >  				pr_err("wrong endianness\n");
> >  			return -EINVAL;
> > @@ -289,10 +327,10 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
> >  
> >  		/* check at 512 byte offset */
> >  		mutex_lock(&read_mutex);
> > -		memcpy(&super, cramfs_read(sb, 512, sizeof(super)), sizeof(super));
> > +		memcpy(super, cramfs_read(sb, 512, sizeof(*super)), sizeof(*super));
> >  		mutex_unlock(&read_mutex);
> > -		if (super.magic != CRAMFS_MAGIC) {
> > -			if (super.magic == CRAMFS_MAGIC_WEND && !silent)
> > +		if (super->magic != CRAMFS_MAGIC) {
> > +			if (super->magic == CRAMFS_MAGIC_WEND && !silent)
> >  				pr_err("wrong endianness\n");
> >  			else if (!silent)
> >  				pr_err("wrong magic\n");
> > @@ -301,34 +339,34 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
> >  	}
> >  
> >  	/* get feature flags first */
> > -	if (super.flags & ~CRAMFS_SUPPORTED_FLAGS) {
> > +	if (super->flags & ~CRAMFS_SUPPORTED_FLAGS) {
> >  		pr_err("unsupported filesystem features\n");
> >  		return -EINVAL;
> >  	}
> >  
> >  	/* Check that the root inode is in a sane state */
> > -	if (!S_ISDIR(super.root.mode)) {
> > +	if (!S_ISDIR(super->root.mode)) {
> >  		pr_err("root is not a directory\n");
> >  		return -EINVAL;
> >  	}
> >  	/* correct strange, hard-coded permissions of mkcramfs */
> > -	super.root.mode |= (S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
> > +	super->root.mode |= (S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
> >  
> > -	root_offset = super.root.offset << 2;
> > -	if (super.flags & CRAMFS_FLAG_FSID_VERSION_2) {
> > -		sbi->size = super.size;
> > -		sbi->blocks = super.fsid.blocks;
> > -		sbi->files = super.fsid.files;
> > +	root_offset = super->root.offset << 2;
> > +	if (super->flags & CRAMFS_FLAG_FSID_VERSION_2) {
> > +		sbi->size = super->size;
> > +		sbi->blocks = super->fsid.blocks;
> > +		sbi->files = super->fsid.files;
> >  	} else {
> >  		sbi->size = 1<<28;
> >  		sbi->blocks = 0;
> >  		sbi->files = 0;
> >  	}
> > -	sbi->magic = super.magic;
> > -	sbi->flags = super.flags;
> > +	sbi->magic = super->magic;
> > +	sbi->flags = super->flags;
> >  	if (root_offset == 0)
> >  		pr_info("empty filesystem");
> > -	else if (!(super.flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) &&
> > +	else if (!(super->flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) &&
> >  		 ((root_offset != sizeof(struct cramfs_super)) &&
> >  		  (root_offset != 512 + sizeof(struct cramfs_super))))
> >  	{
> > @@ -336,9 +374,18 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
> >  		return -EINVAL;
> >  	}
> >  
> > +	return 0;
> > +}
> > +
> > +static int cramfs_finalize_super(struct super_block *sb,
> > +				 struct cramfs_inode *cramfs_root)
> > +{
> > +	struct inode *root;
> > +
> >  	/* Set it all up.. */
> > +	sb->s_flags |= MS_RDONLY;
> >  	sb->s_op = &cramfs_ops;
> > -	root = get_cramfs_inode(sb, &super.root, 0);
> > +	root = get_cramfs_inode(sb, cramfs_root, 0);
> >  	if (IS_ERR(root))
> >  		return PTR_ERR(root);
> >  	sb->s_root = d_make_root(root);
> > @@ -347,6 +394,92 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
> >  	return 0;
> >  }
> >  
> > +static int cramfs_blkdev_fill_super(struct super_block *sb, void *data, int silent)
> > +{
> > +	struct cramfs_sb_info *sbi;
> > +	struct cramfs_super super;
> > +	int i, err;
> > +
> > +	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
> > +	if (!sbi)
> > +		return -ENOMEM;
> > +	sb->s_fs_info = sbi;
> > +
> > +	/* Invalidate the read buffers on mount: think disk change.. */
> > +	for (i = 0; i < READ_BUFFERS; i++)
> > +		buffer_blocknr[i] = -1;
> > +
> > +	err = cramfs_read_super(sb, &super, silent);
> > +	if (err)
> > +		return err;
> > +	return cramfs_finalize_super(sb, &super.root);
> > +}
> > +
> > +static int cramfs_physmem_fill_super(struct super_block *sb, void *data, int silent)
> > +{
> > +	struct cramfs_sb_info *sbi;
> > +	struct cramfs_super super;
> > +	char *p;
> > +	int err;
> > +
> > +	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
> > +	if (!sbi)
> > +		return -ENOMEM;
> > +	sb->s_fs_info = sbi;
> > +
> > +	/*
> > +	 * The physical location of the cramfs image is specified as
> > +	 * a mount parameter.  This parameter is mandatory for obvious
> > +	 * reasons.  Some validation is made on the phys address but this
> > +	 * is not exhaustive and we count on the fact that someone using
> > +	 * this feature is supposed to know what he/she's doing.
> > +	 */
> > +	if (!data || !(p = strstr((char *)data, "physaddr="))) {
> > +		pr_err("unknown physical address for linear cramfs image\n");
> > +		return -EINVAL;
> > +	}
> > +	sbi->linear_phys_addr = memparse(p + 9, NULL);
> > +	if (!sbi->linear_phys_addr) {
> > +		pr_err("bad value for cramfs image physical address\n");
> > +		return -EINVAL;
> > +	}
> > +	if (sbi->linear_phys_addr & (PAGE_SIZE-1)) {
> > +		pr_err("physical address %pap for linear cramfs isn't aligned to a page boundary\n",
> > +			&sbi->linear_phys_addr);
> > +		return -EINVAL;
> > +	}
> > +
> > +	/*
> > +	 * Map only one page for now.  Will remap it when fs size is known.
> > +	 * Although we'll only read from it, we want the CPU cache to
> > +	 * kick in for the higher throughput it provides, hence MEMREMAP_WB.
> > +	 */
> > +	pr_info("checking physical address %pap for linear cramfs image\n", &sbi->linear_phys_addr);
> > +	sbi->linear_virt_addr = memremap(sbi->linear_phys_addr, PAGE_SIZE,
> > +					 MEMREMAP_WB);
> > +	if (!sbi->linear_virt_addr) {
> > +		pr_err("ioremap of the linear cramfs image failed\n");
> > +		return -ENOMEM;
> > +	}
> > +
> > +	err = cramfs_read_super(sb, &super, silent);
> > +	if (err)
> > +		return err;
> > +
> > +	/* Remap the whole filesystem now */
> > +	pr_info("linear cramfs image appears to be %lu KB in size\n",
> > +		sbi->size/1024);
> > +	memunmap(sbi->linear_virt_addr);
> > +	sbi->linear_virt_addr = memremap(sbi->linear_phys_addr, sbi->size,
> > +					 MEMREMAP_WB);
> > +	if (!sbi->linear_virt_addr) {
> > +		pr_err("ioremap of the linear cramfs image failed\n");
> > +		return -ENOMEM;
> > +	}
> > +
> > +	return cramfs_finalize_super(sb, &super.root);
> > +}
> > +
> >  static int cramfs_statfs(struct dentry *dentry, struct kstatfs *buf)
> >  {
> >  	struct super_block *sb = dentry->d_sb;
> > @@ -573,38 +706,67 @@ static const struct super_operations cramfs_ops = {
> >  	.statfs		= cramfs_statfs,
> >  };
> >  
> > -static struct dentry *cramfs_mount(struct file_system_type *fs_type,
> > -	int flags, const char *dev_name, void *data)
> > +static struct dentry *cramfs_blkdev_mount(struct file_system_type *fs_type,
> > +				int flags, const char *dev_name, void *data)
> > +{
> > +	return mount_bdev(fs_type, flags, dev_name, data, cramfs_blkdev_fill_super);
> > +}
> > +
> > +static struct dentry *cramfs_physmem_mount(struct file_system_type *fs_type,
> > +				int flags, const char *dev_name, void *data)
> >  {
> > -	return mount_bdev(fs_type, flags, dev_name, data, cramfs_fill_super);
> > +	return mount_nodev(fs_type, flags, data, cramfs_physmem_fill_super);
> >  }
> >  
> >  static struct file_system_type cramfs_fs_type = {
> >  	.owner		= THIS_MODULE,
> >  	.name		= "cramfs",
> > -	.mount		= cramfs_mount,
> > -	.kill_sb	= cramfs_kill_sb,
> > +	.mount		= cramfs_blkdev_mount,
> > +	.kill_sb	= cramfs_blkdev_kill_sb,
> >  	.fs_flags	= FS_REQUIRES_DEV,
> >  };
> > +
> > +static struct file_system_type cramfs_physmem_fs_type = {
> > +	.owner		= THIS_MODULE,
> > +	.name		= "cramfs_physmem",
> > +	.mount		= cramfs_physmem_mount,
> > +	.kill_sb	= cramfs_physmem_kill_sb,
> > +};
> > +
> > +#ifdef CONFIG_CRAMFS_BLOCKDEV
> >  MODULE_ALIAS_FS("cramfs");
> > +#endif
> > +#ifdef CONFIG_CRAMFS_PHYSMEM
> > +MODULE_ALIAS_FS("cramfs_physmem");
> > +#endif
> >  
> >  static int __init init_cramfs_fs(void)
> >  {
> >  	int rv;
> >  
> > -	rv = cramfs_uncompress_init();
> > -	if (rv < 0)
> > -		return rv;
> > -	rv = register_filesystem(&cramfs_fs_type);
> > -	if (rv < 0)
> > -		cramfs_uncompress_exit();
> > -	return rv;
> > +	if ((rv = cramfs_uncompress_init()) < 0)
> > +		goto err0;
> > +	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV) &&
> > +	    (rv = register_filesystem(&cramfs_fs_type)) < 0)
> > +		goto err1;
> > +	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) &&
> > +	    (rv = register_filesystem(&cramfs_physmem_fs_type)) < 0)
> > +		goto err2;
> > +	return 0;
> > +
> > +err2:	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
> > +		unregister_filesystem(&cramfs_fs_type);
> > +err1:	cramfs_uncompress_exit();
> > +err0:	return rv;
> >  }
> >  
> >  static void __exit exit_cramfs_fs(void)
> >  {
> >  	cramfs_uncompress_exit();
> > -	unregister_filesystem(&cramfs_fs_type);
> > +	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
> > +		unregister_filesystem(&cramfs_fs_type);
> > +	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM))
> > +		unregister_filesystem(&cramfs_physmem_fs_type);
> >  }
> >  
> >  module_init(init_cramfs_fs)
> > -- 
> > 2.9.5
> > 
> ---end quoted text---
> 
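
The range check in cramfs_direct_read() in the quoted patch is written so that offset + len is never computed, since that sum could wrap around. A minimal standalone sketch of the same test, with a hypothetical function name:

```c
/*
 * Return nonzero if [offset, offset + len) lies within an image of
 * size bytes.  The sum offset + len is deliberately never formed.
 */
int sketch_range_ok(unsigned int offset, unsigned int len, unsigned long size)
{
	if (len > size)
		return 0;	/* also keeps size - len from underflowing */
	return offset <= size - len;
}
```

Checking len against size first is what makes the subtraction safe; the patch returns a pointer to the zero page when the check fails.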

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 1/5] cramfs: direct memory access support
@ 2017-10-01 22:27       ` Nicolas Pitre
  0 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-10-01 22:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alexander Viro, linux-mm, linux-fsdevel, linux-embedded,
	linux-kernel, Chris Brandt, linux-mtd, devicetree

On Sun, 1 Oct 2017, Christoph Hellwig wrote:

> On Wed, Sep 27, 2017 at 07:32:20PM -0400, Nicolas Pitre wrote:
> > To distinguish between both access types, the cramfs_physmem filesystem
> > type must be specified when using a memory accessible cramfs image, and
> > the physaddr argument must provide the actual filesystem image's physical
> > memory location.
> 
> Sorry, but this still is a complete no-go.  A physical address is not a
> proper interface.  You still need to have some interface for your NOR nand
> or DRAM.  - usually that would be a mtd driver, but if you have a good
> reason why that's not suitable for you (and please explain it well)
> we'll need a little OF or similar layer to bind a thin driver.

The primary use case for this is to run Linux on a small microcontroller 
with some amount of RAM and ROM on chip. And this is not theoretical -- 
I already have it running here. The ROM is some kind of flash that 
appears in the direct memory address space and requires no access layer 
whatsoever given it is meant to execute code from it. The flash is 
programmed with an external programmer through some debug port. It can't 
be programmed from the microcontroller itself, not even probed, as that 
would make the running code unavailable (unless the probe code is copied 
elsewhere but what would be the point?). Persistent state is typically 
kept in NVRAM or external flash, not in _that_ flash.

The MTD subsystem provides a lot of features and flexibility, but almost 
none of it would be usable here, so it would only be a useless kernel 
size increase.

The kernel itself runs XIP from that ROM. It has to be linked for the 
exact address where it is flashed. The link address is therefore not a 
variable that can be changed at run time. It is the same for the 
filesystem image: it is related to the way things are laid out in ROM, 
and typically depends on the actual size of the kernel when ROM is 
tight.

You fundamentally need to know the address of the kernel _and_ the 
address of the fs image. Those addresses are properties of your kernel 
config. So having to specify one in Kconfig and bury the other in DT 
doesn't make sense to me as this is just an extra file to edit and 
compile, and an extra binary to write into flash, for something that 
isn't a property of the hardware. The bootloader and DT should remain 
stable as much as possible with invariant data.

If you prefer, the physical address could be specified with a Kconfig 
symbol just like the kernel link address. Personally I think it is best 
to keep it along with the other root mount args. But going all the way 
with a dynamic driver binding interface and a dummy intermediate name is 
like using a sledgehammer to kill an ant: it will work of course, but 
given the context it is prone to errors due to the added manipulations 
mentioned previously ... and a tad overkill.


> > Signed-off-by: Nicolas Pitre <nico@linaro.org>
> > Tested-by: Chris Brandt <chris.brandt@renesas.com>
> > ---
> >  fs/cramfs/Kconfig |  29 +++++-
> >  fs/cramfs/inode.c | 264 +++++++++++++++++++++++++++++++++++++++++++-----------
> >  2 files changed, 241 insertions(+), 52 deletions(-)
> > 
> > diff --git a/fs/cramfs/Kconfig b/fs/cramfs/Kconfig
> > index 11b29d491b..5b4e0b7e13 100644
> > --- a/fs/cramfs/Kconfig
> > +++ b/fs/cramfs/Kconfig
> > @@ -1,6 +1,5 @@
> >  config CRAMFS
> >  	tristate "Compressed ROM file system support (cramfs) (OBSOLETE)"
> > -	depends on BLOCK
> >  	select ZLIB_INFLATE
> >  	help
> >  	  Saying Y here includes support for CramFs (Compressed ROM File
> > @@ -20,3 +19,31 @@ config CRAMFS
> >  	  in terms of performance and features.
> >  
> >  	  If unsure, say N.
> > +
> > +config CRAMFS_BLOCKDEV
> > +	bool "Support CramFs image over a regular block device" if EXPERT
> > +	depends on CRAMFS && BLOCK
> > +	default y
> > +	help
> > +	  This option allows the CramFs driver to load data from a regular
> > +	  block device such as a disk partition or a ramdisk.
> > +
> > +config CRAMFS_PHYSMEM
> > +	bool "Support CramFs image directly mapped in physical memory"
> > +	depends on CRAMFS
> > +	default y if !CRAMFS_BLOCKDEV
> > +	help
> > +	  This option allows the CramFs driver to load data directly from
> > +	  a linearly addressed memory range (usually non-volatile memory
> > +	  like flash) instead of going through the block device layer.
> > +	  This saves some memory since no intermediate buffering is
> > +	  necessary.
> > +
> > +	  The filesystem type for this feature is "cramfs_physmem".
> > +	  The location of the CramFs image in memory is board
> > +	  dependent. Therefore, if you say Y, you must know the proper
> > +	  physical address where to store the CramFs image and specify
> > +	  it using the physaddr=0x******** mount option (for example:
> > +	  "mount -t cramfs_physmem -o physaddr=0x100000 none /mnt").
> > +
> > +	  If unsure, say N.
> > diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
> > index 7919967488..19f464a214 100644
> > --- a/fs/cramfs/inode.c
> > +++ b/fs/cramfs/inode.c
> > @@ -24,6 +24,7 @@
> >  #include <linux/mutex.h>
> >  #include <uapi/linux/cramfs_fs.h>
> >  #include <linux/uaccess.h>
> > +#include <linux/io.h>
> >  
> >  #include "internal.h"
> >  
> > @@ -36,6 +37,8 @@ struct cramfs_sb_info {
> >  	unsigned long blocks;
> >  	unsigned long files;
> >  	unsigned long flags;
> > +	void *linear_virt_addr;
> > +	phys_addr_t linear_phys_addr;
> >  };
> >  
> >  static inline struct cramfs_sb_info *CRAMFS_SB(struct super_block *sb)
> > @@ -140,6 +143,9 @@ static struct inode *get_cramfs_inode(struct super_block *sb,
> >   * BLKS_PER_BUF*PAGE_SIZE, so that the caller doesn't need to
> >   * worry about end-of-buffer issues even when decompressing a full
> >   * page cache.
> > + *
> > + * Note: This is all optimized away at compile time when
> > + *       CONFIG_CRAMFS_BLOCKDEV=n.
> >   */
> >  #define READ_BUFFERS (2)
> >  /* NEXT_BUFFER(): Loop over [0..(READ_BUFFERS-1)]. */
> > @@ -160,10 +166,10 @@ static struct super_block *buffer_dev[READ_BUFFERS];
> >  static int next_buffer;
> >  
> >  /*
> > - * Returns a pointer to a buffer containing at least LEN bytes of
> > - * filesystem starting at byte offset OFFSET into the filesystem.
> > + * Populate our block cache and return a pointer from it.
> >   */
> > -static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned int len)
> > +static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset,
> > +				unsigned int len)
> >  {
> >  	struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping;
> >  	struct page *pages[BLKS_PER_BUF];
> > @@ -239,7 +245,39 @@ static void *cramfs_read(struct super_block *sb, unsigned int offset, unsigned i
> >  	return read_buffers[buffer] + offset;
> >  }
> >  
> > -static void cramfs_kill_sb(struct super_block *sb)
> > +/*
> > + * Return a pointer to the linearly addressed cramfs image in memory.
> > + */
> > +static void *cramfs_direct_read(struct super_block *sb, unsigned int offset,
> > +				unsigned int len)
> > +{
> > +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> > +
> > +	if (!len)
> > +		return NULL;
> > +	if (len > sbi->size || offset > sbi->size - len)
> > +	       return page_address(ZERO_PAGE(0));
> > +	return sbi->linear_virt_addr + offset;
> > +}
> > +
> > +/*
> > + * Returns a pointer to a buffer containing at least LEN bytes of
> > + * filesystem starting at byte offset OFFSET into the filesystem.
> > + */
> > +static void *cramfs_read(struct super_block *sb, unsigned int offset,
> > +			 unsigned int len)
> > +{
> > +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> > +
> > +	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) && sbi->linear_virt_addr)
> > +		return cramfs_direct_read(sb, offset, len);
> > +	else if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
> > +		return cramfs_blkdev_read(sb, offset, len);
> > +	else
> > +		return NULL;
> > +}
> > +
> > +static void cramfs_blkdev_kill_sb(struct super_block *sb)
> >  {
> >  	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> >  
> > @@ -247,6 +285,16 @@ static void cramfs_kill_sb(struct super_block *sb)
> >  	kfree(sbi);
> >  }
> >  
> > +static void cramfs_physmem_kill_sb(struct super_block *sb)
> > +{
> > +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> > +
> > +	if (sbi->linear_virt_addr)
> > +		memunmap(sbi->linear_virt_addr);
> > +	kill_anon_super(sb);
> > +	kfree(sbi);
> > +}
> > +
> >  static int cramfs_remount(struct super_block *sb, int *flags, char *data)
> >  {
> >  	sync_filesystem(sb);
> > @@ -254,34 +302,24 @@ static int cramfs_remount(struct super_block *sb, int *flags, char *data)
> >  	return 0;
> >  }
> >  
> > -static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
> > +static int cramfs_read_super(struct super_block *sb,
> > +			     struct cramfs_super *super, int silent)
> >  {
> > -	int i;
> > -	struct cramfs_super super;
> > +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> >  	unsigned long root_offset;
> > -	struct cramfs_sb_info *sbi;
> > -	struct inode *root;
> > -
> > -	sb->s_flags |= MS_RDONLY;
> > -
> > -	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
> > -	if (!sbi)
> > -		return -ENOMEM;
> > -	sb->s_fs_info = sbi;
> >  
> > -	/* Invalidate the read buffers on mount: think disk change.. */
> > -	mutex_lock(&read_mutex);
> > -	for (i = 0; i < READ_BUFFERS; i++)
> > -		buffer_blocknr[i] = -1;
> > +	/* We don't know the real size yet */
> > +	sbi->size = PAGE_SIZE;
> >  
> >  	/* Read the first block and get the superblock from it */
> > -	memcpy(&super, cramfs_read(sb, 0, sizeof(super)), sizeof(super));
> > +	mutex_lock(&read_mutex);
> > +	memcpy(super, cramfs_read(sb, 0, sizeof(*super)), sizeof(*super));
> >  	mutex_unlock(&read_mutex);
> >  
> >  	/* Do sanity checks on the superblock */
> > -	if (super.magic != CRAMFS_MAGIC) {
> > +	if (super->magic != CRAMFS_MAGIC) {
> >  		/* check for wrong endianness */
> > -		if (super.magic == CRAMFS_MAGIC_WEND) {
> > +		if (super->magic == CRAMFS_MAGIC_WEND) {
> >  			if (!silent)
> >  				pr_err("wrong endianness\n");
> >  			return -EINVAL;
> > @@ -289,10 +327,10 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
> >  
> >  		/* check at 512 byte offset */
> >  		mutex_lock(&read_mutex);
> > -		memcpy(&super, cramfs_read(sb, 512, sizeof(super)), sizeof(super));
> > +		memcpy(super, cramfs_read(sb, 512, sizeof(*super)), sizeof(*super));
> >  		mutex_unlock(&read_mutex);
> > -		if (super.magic != CRAMFS_MAGIC) {
> > -			if (super.magic == CRAMFS_MAGIC_WEND && !silent)
> > +		if (super->magic != CRAMFS_MAGIC) {
> > +			if (super->magic == CRAMFS_MAGIC_WEND && !silent)
> >  				pr_err("wrong endianness\n");
> >  			else if (!silent)
> >  				pr_err("wrong magic\n");
> > @@ -301,34 +339,34 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
> >  	}
> >  
> >  	/* get feature flags first */
> > -	if (super.flags & ~CRAMFS_SUPPORTED_FLAGS) {
> > +	if (super->flags & ~CRAMFS_SUPPORTED_FLAGS) {
> >  		pr_err("unsupported filesystem features\n");
> >  		return -EINVAL;
> >  	}
> >  
> >  	/* Check that the root inode is in a sane state */
> > -	if (!S_ISDIR(super.root.mode)) {
> > +	if (!S_ISDIR(super->root.mode)) {
> >  		pr_err("root is not a directory\n");
> >  		return -EINVAL;
> >  	}
> >  	/* correct strange, hard-coded permissions of mkcramfs */
> > -	super.root.mode |= (S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
> > +	super->root.mode |= (S_IRUSR | S_IXUSR | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH);
> >  
> > -	root_offset = super.root.offset << 2;
> > -	if (super.flags & CRAMFS_FLAG_FSID_VERSION_2) {
> > -		sbi->size = super.size;
> > -		sbi->blocks = super.fsid.blocks;
> > -		sbi->files = super.fsid.files;
> > +	root_offset = super->root.offset << 2;
> > +	if (super->flags & CRAMFS_FLAG_FSID_VERSION_2) {
> > +		sbi->size = super->size;
> > +		sbi->blocks = super->fsid.blocks;
> > +		sbi->files = super->fsid.files;
> >  	} else {
> >  		sbi->size = 1<<28;
> >  		sbi->blocks = 0;
> >  		sbi->files = 0;
> >  	}
> > -	sbi->magic = super.magic;
> > -	sbi->flags = super.flags;
> > +	sbi->magic = super->magic;
> > +	sbi->flags = super->flags;
> >  	if (root_offset == 0)
> >  		pr_info("empty filesystem");
> > -	else if (!(super.flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) &&
> > +	else if (!(super->flags & CRAMFS_FLAG_SHIFTED_ROOT_OFFSET) &&
> >  		 ((root_offset != sizeof(struct cramfs_super)) &&
> >  		  (root_offset != 512 + sizeof(struct cramfs_super))))
> >  	{
> > @@ -336,9 +374,18 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
> >  		return -EINVAL;
> >  	}
> >  
> > +	return 0;
> > +}
> > +
> > +static int cramfs_finalize_super(struct super_block *sb,
> > +				 struct cramfs_inode *cramfs_root)
> > +{
> > +	struct inode *root;
> > +
> >  	/* Set it all up.. */
> > +	sb->s_flags |= MS_RDONLY;
> >  	sb->s_op = &cramfs_ops;
> > -	root = get_cramfs_inode(sb, &super.root, 0);
> > +	root = get_cramfs_inode(sb, cramfs_root, 0);
> >  	if (IS_ERR(root))
> >  		return PTR_ERR(root);
> >  	sb->s_root = d_make_root(root);
> > @@ -347,6 +394,92 @@ static int cramfs_fill_super(struct super_block *sb, void *data, int silent)
> >  	return 0;
> >  }
> >  
> > +static int cramfs_blkdev_fill_super(struct super_block *sb, void *data, int silent)
> > +{
> > +	struct cramfs_sb_info *sbi;
> > +	struct cramfs_super super;
> > +	int i, err;
> > +
> > +	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
> > +	if (!sbi)
> > +		return -ENOMEM;
> > +	sb->s_fs_info = sbi;
> > +
> > +	/* Invalidate the read buffers on mount: think disk change.. */
> > +	for (i = 0; i < READ_BUFFERS; i++)
> > +		buffer_blocknr[i] = -1;
> > +
> > +	err = cramfs_read_super(sb, &super, silent);
> > +	if (err)
> > +		return err;
> > +	return cramfs_finalize_super(sb, &super.root);
> > +}
> > +
> > +static int cramfs_physmem_fill_super(struct super_block *sb, void *data, int silent)
> > +{
> > +	struct cramfs_sb_info *sbi;
> > +	struct cramfs_super super;
> > +	char *p;
> > +	int err;
> > +
> > +	sbi = kzalloc(sizeof(struct cramfs_sb_info), GFP_KERNEL);
> > +	if (!sbi)
> > +		return -ENOMEM;
> > +	sb->s_fs_info = sbi;
> > +
> > +	/*
> > +	 * The physical location of the cramfs image is specified as
> > +	 * a mount parameter.  This parameter is mandatory for obvious
> > +	 * reasons.  Some validation is made on the phys address but this
> > +	 * is not exhaustive and we count on the fact that someone using
> > +	 * this feature is supposed to know what he/she's doing.
> > +	 */
> > +	if (!data || !(p = strstr((char *)data, "physaddr="))) {
> > +		pr_err("unknown physical address for linear cramfs image\n");
> > +		return -EINVAL;
> > +	}
> > +	sbi->linear_phys_addr = memparse(p + 9, NULL);
> > +	if (!sbi->linear_phys_addr) {
> > +		pr_err("bad value for cramfs image physical address\n");
> > +		return -EINVAL;
> > +	}
> > +	if (sbi->linear_phys_addr & (PAGE_SIZE-1)) {
> > +		pr_err("physical address %pap for linear cramfs isn't aligned to a page boundary\n",
> > +			&sbi->linear_phys_addr);
> > +		return -EINVAL;
> > +	}
> > +
> > +	/*
> > +	 * Map only one page for now.  Will remap it when fs size is known.
> > +	 * Although we'll only read from it, we want the CPU cache to
> > +	 * kick in for the higher throughput it provides, hence MEMREMAP_WB.
> > +	 */
> > +	pr_info("checking physical address %pap for linear cramfs image\n", &sbi->linear_phys_addr);
> > +	sbi->linear_virt_addr = memremap(sbi->linear_phys_addr, PAGE_SIZE,
> > +					 MEMREMAP_WB);
> > +	if (!sbi->linear_virt_addr) {
> > +		pr_err("ioremap of the linear cramfs image failed\n");
> > +		return -ENOMEM;
> > +	}
> > +
> > +	err = cramfs_read_super(sb, &super, silent);
> > +	if (err)
> > +		return err;
> > +
> > +	/* Remap the whole filesystem now */
> > +	pr_info("linear cramfs image appears to be %lu KB in size\n",
> > +		sbi->size/1024);
> > +	memunmap(sbi->linear_virt_addr);
> > +	sbi->linear_virt_addr = memremap(sbi->linear_phys_addr, sbi->size,
> > +					 MEMREMAP_WB);
> > +	if (!sbi->linear_virt_addr) {
> > +		pr_err("ioremap of the linear cramfs image failed\n");
> > +		return -ENOMEM;
> > +	}
> > +
> > +	return cramfs_finalize_super(sb, &super.root);
> > +}
> > +
> >  static int cramfs_statfs(struct dentry *dentry, struct kstatfs *buf)
> >  {
> >  	struct super_block *sb = dentry->d_sb;
> > @@ -573,38 +706,67 @@ static const struct super_operations cramfs_ops = {
> >  	.statfs		= cramfs_statfs,
> >  };
> >  
> > -static struct dentry *cramfs_mount(struct file_system_type *fs_type,
> > -	int flags, const char *dev_name, void *data)
> > +static struct dentry *cramfs_blkdev_mount(struct file_system_type *fs_type,
> > +				int flags, const char *dev_name, void *data)
> > +{
> > +	return mount_bdev(fs_type, flags, dev_name, data, cramfs_blkdev_fill_super);
> > +}
> > +
> > +static struct dentry *cramfs_physmem_mount(struct file_system_type *fs_type,
> > +				int flags, const char *dev_name, void *data)
> >  {
> > -	return mount_bdev(fs_type, flags, dev_name, data, cramfs_fill_super);
> > +	return mount_nodev(fs_type, flags, data, cramfs_physmem_fill_super);
> >  }
> >  
> >  static struct file_system_type cramfs_fs_type = {
> >  	.owner		= THIS_MODULE,
> >  	.name		= "cramfs",
> > -	.mount		= cramfs_mount,
> > -	.kill_sb	= cramfs_kill_sb,
> > +	.mount		= cramfs_blkdev_mount,
> > +	.kill_sb	= cramfs_blkdev_kill_sb,
> >  	.fs_flags	= FS_REQUIRES_DEV,
> >  };
> > +
> > +static struct file_system_type cramfs_physmem_fs_type = {
> > +	.owner		= THIS_MODULE,
> > +	.name		= "cramfs_physmem",
> > +	.mount		= cramfs_physmem_mount,
> > +	.kill_sb	= cramfs_physmem_kill_sb,
> > +};
> > +
> > +#ifdef CONFIG_CRAMFS_BLOCKDEV
> >  MODULE_ALIAS_FS("cramfs");
> > +#endif
> > +#ifdef CONFIG_CRAMFS_PHYSMEM
> > +MODULE_ALIAS_FS("cramfs_physmem");
> > +#endif
> >  
> >  static int __init init_cramfs_fs(void)
> >  {
> >  	int rv;
> >  
> > -	rv = cramfs_uncompress_init();
> > -	if (rv < 0)
> > -		return rv;
> > -	rv = register_filesystem(&cramfs_fs_type);
> > -	if (rv < 0)
> > -		cramfs_uncompress_exit();
> > -	return rv;
> > +	if ((rv = cramfs_uncompress_init()) < 0)
> > +		goto err0;
> > +	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV) &&
> > +	    (rv = register_filesystem(&cramfs_fs_type)) < 0)
> > +		goto err1;
> > +	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) &&
> > +	    (rv = register_filesystem(&cramfs_physmem_fs_type)) < 0)
> > +		goto err2;
> > +	return 0;
> > +
> > +err2:	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
> > +		unregister_filesystem(&cramfs_fs_type);
> > +err1:	cramfs_uncompress_exit();
> > +err0:	return rv;
> >  }
> >  
> >  static void __exit exit_cramfs_fs(void)
> >  {
> >  	cramfs_uncompress_exit();
> > -	unregister_filesystem(&cramfs_fs_type);
> > +	if (IS_ENABLED(CONFIG_CRAMFS_BLOCKDEV))
> > +		unregister_filesystem(&cramfs_fs_type);
> > +	if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM))
> > +		unregister_filesystem(&cramfs_physmem_fs_type);
> >  }
> >  
> >  module_init(init_cramfs_fs)
> > -- 
> > 2.9.5
> > 
> ---end quoted text---
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 4/5] cramfs: add mmap support
  2017-10-01  8:30     ` Christoph Hellwig
@ 2017-10-01 22:29       ` Nicolas Pitre
  -1 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-10-01 22:29 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alexander Viro, linux-mm, linux-fsdevel, linux-embedded,
	linux-kernel, Chris Brandt

On Sun, 1 Oct 2017, Christoph Hellwig wrote:

> up_read(&mm->mmap_sem) in the fault path is still a complete
> no-go.
> 
> NAK

Care to elaborate?

What about mm/filemap.c:__lock_page_or_retry() then?

Why the special handling on mm->mmap_sem with VM_FAULT_RETRY?

What are the potential problems with my approach I didn't cover yet?

Seriously: I'm simply looking for solutions here.


Nicolas

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 4/5] cramfs: add mmap support
  2017-10-01 22:29       ` Nicolas Pitre
@ 2017-10-02 22:45         ` Richard Weinberger
  -1 siblings, 0 replies; 54+ messages in thread
From: Richard Weinberger @ 2017-10-02 22:45 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Christoph Hellwig, Alexander Viro, linux-mm, linux-fsdevel,
	linux-embedded, LKML, Chris Brandt

On Mon, Oct 2, 2017 at 12:29 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> On Sun, 1 Oct 2017, Christoph Hellwig wrote:
>
>> up_read(&mm->mmap_sem) in the fault path is still a complete
>> no-go.
>>
>> NAK
>
> Care to elaborate?
>
> What about mm/filemap.c:__lock_page_or_retry() then?

As soon as you up_read() in the page fault path, other tasks will race
with you before you're able to grab the write lock.

HTH

-- 
Thanks,
//richard

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 4/5] cramfs: add mmap support
  2017-10-02 22:45         ` Richard Weinberger
@ 2017-10-02 23:33           ` Nicolas Pitre
  -1 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-10-02 23:33 UTC (permalink / raw)
  To: Richard Weinberger
  Cc: Christoph Hellwig, Alexander Viro, linux-mm, linux-fsdevel,
	linux-embedded, LKML, Chris Brandt

On Tue, 3 Oct 2017, Richard Weinberger wrote:

> On Mon, Oct 2, 2017 at 12:29 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > On Sun, 1 Oct 2017, Christoph Hellwig wrote:
> >
> > >> up_read(&mm->mmap_sem) in the fault path is still a complete
> > >> no-go.
> >>
> >> NAK
> >
> > Care to elaborate?
> >
> > What about mm/filemap.c:__lock_page_or_retry() then?
> 
> As soon as you up_read() in the page fault path, other tasks will race
> with you before you're able to grab the write lock.

But I _know_ that.

Could you highlight an area in my code where this is not accounted for?


Nicolas

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 1/5] cramfs: direct memory access support
  2017-10-01  8:29     ` Christoph Hellwig
@ 2017-10-03 14:43       ` Rob Herring
  -1 siblings, 0 replies; 54+ messages in thread
From: Rob Herring @ 2017-10-03 14:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Nicolas Pitre, Alexander Viro, linux-mm, linux-fsdevel,
	linux-embedded, linux-kernel, Chris Brandt, linux-mtd,
	devicetree

On Sun, Oct 1, 2017 at 3:29 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Wed, Sep 27, 2017 at 07:32:20PM -0400, Nicolas Pitre wrote:
>> To distinguish between both access types, the cramfs_physmem filesystem
>> type must be specified when using a memory accessible cramfs image, and
>> the physaddr argument must provide the actual filesystem image's physical
>> memory location.
>
> Sorry, but this still is a complete no-go.  A physical address is not a
> proper interface.  You still need to have some interface for your NOR nand
> or DRAM.  - usually that would be a mtd driver, but if you have a good
> reason why that's not suitable for you (and please explain it well)
> we'll need a little OF or similar layer to bind a thin driver.

I don't disagree that we may need DT binding here, but DT bindings are
h/w description and not a mechanism to bind Linux kernel drivers. It can
be a subtle distinction, but it is an important one.

I can see the case where we have no driver. For RAM we don't have a
driver, yet pretty much all hardware has a DRAM controller which we
just rely on the firmware to set up. I could also envision that we have
hardware we do need to configure in the kernel. Perhaps the boot
settings are not optimal or we want/need to manage the clocks. That
seems somewhat unlikely if the kernel is also XIP from the same flash
as it is in Nico's case.

We do often describe the flash layout in DT when partitions are not
discoverable. I don't know if that would be needed here. Would the ROM
here ever be updateable from within Linux? If we're talking about a
single address to pass to the kernel, DT seems like overkill and
kernel cmdline is perfectly valid IMO.

Rob

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 4/5] cramfs: add mmap support
  2017-10-02 23:33           ` Nicolas Pitre
@ 2017-10-03 14:57             ` Christoph Hellwig
  -1 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2017-10-03 14:57 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Richard Weinberger, Christoph Hellwig, Alexander Viro, linux-mm,
	linux-fsdevel, linux-embedded, LKML, Chris Brandt

On Mon, Oct 02, 2017 at 07:33:29PM -0400, Nicolas Pitre wrote:
> On Tue, 3 Oct 2017, Richard Weinberger wrote:
> 
> > On Mon, Oct 2, 2017 at 12:29 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > > On Sun, 1 Oct 2017, Christoph Hellwig wrote:
> > >
> > >> up_read(&mm->mmap_sem) in the fault path is still a complete
> > >> no-go.
> > >>
> > >> NAK
> > >
> > > Care to elaborate?
> > >
> > > What about mm/filemap.c:__lock_page_or_retry() then?
> > 
> > As soon as you up_read() in the page fault path, other tasks will race
> > with you before you're able to grab the write lock.
> 
> But I _know_ that.
> 
> Could you highlight an area in my code where this is not accounted for?

Existing users of lock_page_or_retry return VM_FAULT_RETRY right after
up()ing mmap_sem, and they must already have a reference to the page
which is the only thing touched until then.

Your patch instead goes for an exclusive mmap_sem if it can, and
even if there is nothing that breaks with that scheme right now
there's nothing documenting that this is actually safe, and we are
way down in the complex page fault path.
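
The "drop the lock, then return VM_FAULT_RETRY right away" pattern described
above can be sketched in user space. This is only a single-threaded model
under stated assumptions, not kernel code: toy_rwsem, toy_mm and every
toy_* function are hypothetical names invented for the illustration, the
"semaphore" merely counts holders instead of blocking, and the page
reference mentioned above is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for mm->mmap_sem: it only counts holders, it never blocks. */
struct toy_rwsem {
	int readers;
	bool writer;
};

static void toy_down_read(struct toy_rwsem *sem)
{
	assert(!sem->writer);
	sem->readers++;
}

static void toy_up_read(struct toy_rwsem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

enum toy_fault { TOY_FAULT_DONE, TOY_FAULT_RETRY };

struct toy_mm {
	struct toy_rwsem mmap_sem;
	bool page_ready;	/* models "the page could be locked at once" */
};

/*
 * Entered with mmap_sem held for read.  On TOY_FAULT_RETRY the lock has
 * been dropped and nothing looked up under it is touched afterwards.
 */
static enum toy_fault toy_handle_fault(struct toy_mm *mm)
{
	if (!mm->page_ready) {
		toy_up_read(&mm->mmap_sem);
		return TOY_FAULT_RETRY;	/* return immediately after the unlock */
	}
	return TOY_FAULT_DONE;
}

/* Caller side: retake the lock and redo the whole lookup on retry. */
static int toy_fault_loop(struct toy_mm *mm)
{
	int attempts = 0;

	for (;;) {
		attempts++;
		toy_down_read(&mm->mmap_sem);
		if (toy_handle_fault(mm) == TOY_FAULT_DONE) {
			toy_up_read(&mm->mmap_sem);
			return attempts;
		}
		/* Lock is not held here; other tasks may have changed the mm. */
		assert(mm->mmap_sem.readers == 0);
		mm->page_ready = true;	/* pretend the wait for the page ended */
	}
}
```

The point of the sketch is that the handler does no further work between
the unlock and the return, so the caller restarts from scratch with the lock
retaken. Trying to upgrade to an exclusive lock at that point, as the patch
under discussion does, is a different scheme and is not modelled here.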

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v4 1/5] cramfs: direct memory access support
  2017-10-03 14:43       ` Rob Herring
@ 2017-10-03 14:58         ` Chris Brandt
  -1 siblings, 0 replies; 54+ messages in thread
From: Chris Brandt @ 2017-10-03 14:58 UTC (permalink / raw)
  To: Rob Herring, Christoph Hellwig
  Cc: Nicolas Pitre, Alexander Viro, linux-mm, linux-fsdevel,
	linux-embedded, linux-kernel, linux-mtd, devicetree

On Tuesday, October 03, 2017, Rob Herring wrote:
> On Sun, Oct 1, 2017 at 3:29 AM, Christoph Hellwig <hch@infradead.org> wrote:
> > On Wed, Sep 27, 2017 at 07:32:20PM -0400, Nicolas Pitre wrote:
> >> To distinguish between both access types, the cramfs_physmem filesystem
> >> type must be specified when using a memory accessible cramfs image, and
> >> the physaddr argument must provide the actual filesystem image's physical
> >> memory location.
> >
> > Sorry, but this still is a complete no-go.  A physical address is not a
> > proper interface.  You still need to have some interface for your NOR nand
> > or DRAM.  - usually that would be a mtd driver, but if you have a good
> > reason why that's not suitable for you (and please explain it well)
> > we'll need a little OF or similar layer to bind a thin driver.
> 
> I don't disagree that we may need DT binding here, but DT bindings are
> h/w description and not a mechanism to bind Linux kernel drivers. It can
> be a subtle distinction, but it is an important one.
> 
> I can see the case where we have no driver. For RAM we don't have a
> driver, yet pretty much all hardware has a DRAM controller which we
> just rely on the firmware to set up. I could also envision that we have
> hardware we do need to configure in the kernel. Perhaps the boot
> settings are not optimal or we want/need to manage the clocks. That
> seems somewhat unlikely if the kernel is also XIP from the same flash
> as it is in Nico's case.
> 
> We do often describe the flash layout in DT when partitions are not
> discoverable. I don't know if that would be needed here. Would the ROM
> here ever be updateable from within Linux? If we're talking about a
> single address to pass to the kernel, DT seems like overkill and
> kernel cmdline is perfectly valid IMO.


As someone who's been using an XIP file system for a while now (AXFS,
obviously not xip-cramfs), I can say there is a way (in my system at
least) to write to the same Flash that the kernel and file system are
currently executing from in place (think jumping to RAM, doing a small
flash operation, then jumping back to Flash).

The use case is this: if you've logically partitioned your flash such that
you keep your application in a separate XIP filesystem image, you can
remotely download an updated version to some unused portion of Flash, then
simply unmount what you have been using and mount the new image, since you
can pass in the physical address of where you wrote your new image.

So in that case, I guess you could do some type of DT overlay or
something, but at the moment, just having the physical address as a
parameter to the mount command makes it pretty darn easy.

Chris
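
For reference, the checks that the quoted patch applies to the physaddr
mount parameter (the option must be present, the value non-zero and page
aligned) can be approximated in user space as below. This is a rough sketch,
not the kernel code: sketch_memparse() is a simplified stand-in for the
kernel's memparse() that only handles K/M/G suffixes, SKETCH_PAGE_SIZE
assumes 4 KiB pages, and all sketch_* names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/* Rough user-space stand-in for memparse(): a number plus K/M/G. */
static uint64_t sketch_memparse(const char *s)
{
	char *end;
	uint64_t val = strtoull(s, &end, 0);

	switch (*end) {
	case 'G': case 'g':
		val <<= 10;
		/* fall through */
	case 'M': case 'm':
		val <<= 10;
		/* fall through */
	case 'K': case 'k':
		val <<= 10;
		break;
	default:
		break;
	}
	return val;
}

/*
 * Returns 0 and stores the address on success, -1 if the option is
 * missing, zero, or not aligned to a page boundary.
 */
static int sketch_parse_physaddr(const char *opts, uint64_t *addr)
{
	const char *p;
	uint64_t val;

	if (!opts || !(p = strstr(opts, "physaddr=")))
		return -1;
	val = sketch_memparse(p + strlen("physaddr="));
	if (!val || (val & (SKETCH_PAGE_SIZE - 1)))
		return -1;
	*addr = val;
	return 0;
}
```

With the patch as posted, the option string would reach the filesystem
through a mount of type cramfs_physmem; the sketch only shows why a missing
or unaligned address is rejected before anything is mapped.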

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 1/5] cramfs: direct memory access support
  2017-10-01 22:27       ` Nicolas Pitre
@ 2017-10-03 14:59         ` Christoph Hellwig
  -1 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2017-10-03 14:59 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Christoph Hellwig, Alexander Viro, linux-mm, linux-fsdevel,
	linux-embedded, linux-kernel, Chris Brandt, linux-mtd,
	devicetree

On Sun, Oct 01, 2017 at 06:27:11PM -0400, Nicolas Pitre wrote:
> If you prefer, the physical address could be specified with a Kconfig 
> symbol just like the kernel link address. Personally I think it is best 
> to keep it along with the other root mount args. But going all the way 
> with a dynamic driver binding interface and a dummy intermediate name is 
> like using a sledge hammer to kill an ant: it will work of course, but 
> given the context it is prone to errors due to the added manipulations 
> mentioned previously ... and a tad overkill.

As soon as a kernel enables CRAMFS_PHYSMEM this mount option is
available, so you need to think beyond just your own use case.

The normal way of doing this would be to specify it in the device
tree.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 1/5] cramfs: direct memory access support
  2017-10-03 14:59         ` Christoph Hellwig
@ 2017-10-03 15:06           ` Nicolas Pitre
  -1 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-10-03 15:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Alexander Viro, linux-mm, linux-fsdevel, linux-embedded,
	linux-kernel, Chris Brandt, linux-mtd, devicetree

On Tue, 3 Oct 2017, Christoph Hellwig wrote:

> On Sun, Oct 01, 2017 at 06:27:11PM -0400, Nicolas Pitre wrote:
> > If you prefer, the physical address could be specified with a Kconfig 
> > symbol just like the kernel link address. Personally I think it is best 
> > to keep it along with the other root mount args. But going all the way 
> > with a dynamic driver binding interface and a dummy intermediate name is 
> > like using a sledge hammer to kill an ant: it will work of course, but 
> > given the context it is prone to errors due to the added manipulations 
> > mentioned previously ... and a tad overkill.
> 
> As soon as a kernel enables CRAMFS_PHYSMEM this mount option is
> available, so you don't just need to think of your use case.

What other use cases do you have in mind?

> The normal way of doing this would be to specify it in the device
> tree.

And specify it how? Creating a pseudo device and passing that instead of 
the actual physical address? What is the advantage?

And what about targets that don't use DT? Yes, there are still quite a 
few out there.


Nicolas

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 4/5] cramfs: add mmap support
  2017-10-03 14:57             ` Christoph Hellwig
@ 2017-10-03 15:30               ` Nicolas Pitre
  -1 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-10-03 15:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Richard Weinberger, Alexander Viro, linux-mm, linux-fsdevel,
	linux-embedded, LKML, Chris Brandt

On Tue, 3 Oct 2017, Christoph Hellwig wrote:

> On Mon, Oct 02, 2017 at 07:33:29PM -0400, Nicolas Pitre wrote:
> > On Tue, 3 Oct 2017, Richard Weinberger wrote:
> > 
> > > On Mon, Oct 2, 2017 at 12:29 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > > > On Sun, 1 Oct 2017, Christoph Hellwig wrote:
> > > >
> > > >> up_read(&mm->mmap_sem) in the fault path is still a complete
> > > >> no-go,
> > > >>
> > > >> NAK
> > > >
> > > > Care to elaborate?
> > > >
> > > > What about mm/filemap.c:__lock_page_or_retry() then?
> > > 
> > > As soon as you up_read() in the page fault path other tasks will race
> > > with you before you're able to grab the write lock.
> > 
> > But I _know_ that.
> > 
> > Could you highlight an area in my code where this is not accounted for?
> 
> Existing users of lock_page_or_retry return VM_FAULT_RETRY right after
> up()ing mmap_sem, and they must already have a reference to the page
> which is the only thing touched until then.
> 
> Your patch instead goes for an exclusive mmap_sem if it can, and
> even if there is nothing that breaks with that scheme right now
> there's nothing documenting that this is actually safe, and we are
> way down in the complex page fault path.

It is pretty obvious looking at the existing code that if you want to 
safely manipulate a vma you need the write lock. There are many things 
in the kernel tree that are not explicitly documented. Did that stop 
people from adding new code?

I agree that the fault path is quite complex. I've studied it carefully 
before coming up with this scheme. This is not something that came about 
just because the sunshine felt good when I woke up one day.

So if you agree that I've done a reasonable job creating a scheme that 
currently doesn't break then IMHO this should be good enough, 
*especially* for such an isolated and specialized use case with zero 
impact on anyone else. And if things break in the future then I will be
the one working out the pieces, not you, and _that_ can be written down
somewhere if necessary so nobody has an obligation to bend over backward
to avoid breaking it.

Unless you have a better scheme altogether to suggest of course, given
the existing constraints.


Nicolas

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 4/5] cramfs: add mmap support
  2017-10-03 15:30               ` Nicolas Pitre
@ 2017-10-03 15:37                 ` Christoph Hellwig
  -1 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2017-10-03 15:37 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Christoph Hellwig, Richard Weinberger, Alexander Viro, linux-mm,
	linux-fsdevel, linux-embedded, LKML, Chris Brandt

On Tue, Oct 03, 2017 at 11:30:50AM -0400, Nicolas Pitre wrote:
> Unless you have a better scheme altogether to suggest of course, given
> the existing constraints.

I still can't understand why this convoluted fault path that finds
vma, attempts with all kinds of races and then tries to update things
like vm_ops is even necessary.

We have direct mappings of physical address perfectly working in the
DAX code (even with write support!) or in drivers using remap_pfn_range
so a really good explanation why neither scheme can be used is needed
first.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 4/5] cramfs: add mmap support
  2017-10-03 15:37                 ` Christoph Hellwig
@ 2017-10-03 15:40                   ` Nicolas Pitre
  -1 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-10-03 15:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Richard Weinberger, Alexander Viro, linux-mm, linux-fsdevel,
	linux-embedded, LKML, Chris Brandt

On Tue, 3 Oct 2017, Christoph Hellwig wrote:

> On Tue, Oct 03, 2017 at 11:30:50AM -0400, Nicolas Pitre wrote:
> > Unless you have a better scheme altogether to suggest of course, given
> > the existing constraints.
>
> I still can't understand why this convoluted fault path that finds
> vma, attempts with all kinds of races and then tries to update things
> like vm_ops is even necessary.
> 
> We have direct mappings of physical address perfectly working in the
> DAX code (even with write support!) or in drivers using remap_pfn_range
> so a really good explanation why neither scheme can be used is needed
> first.

I provided that explanation several times by now in my cover letter. And 
separately even to you directly at least once.  What else should I do?


Nicolas

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 4/5] cramfs: add mmap support
  2017-10-03 15:40                   ` Nicolas Pitre
@ 2017-10-04  7:25                     ` Christoph Hellwig
  -1 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2017-10-04  7:25 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Christoph Hellwig, Richard Weinberger, Alexander Viro, linux-mm,
	linux-fsdevel, linux-embedded, LKML, Chris Brandt

On Tue, Oct 03, 2017 at 11:40:28AM -0400, Nicolas Pitre wrote:
> I provided that explanation several times by now in my cover letter. And 
> separately even to you directly at least once.  What else should I do?

You should do the right things instead of stating irrelevant things
in your cover letter.  As said in my last mail: look at the VM_MIXEDMAP
flag and how it is used by DAX, and you'll get out of the vma splitting
business in the fault path.

If the fs/dax.c code scares you, take a look at drivers/dax/device.c
instead.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 4/5] cramfs: add mmap support
  2017-10-04  7:25                     ` Christoph Hellwig
@ 2017-10-04 20:47                       ` Nicolas Pitre
  -1 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-10-04 20:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Richard Weinberger, Alexander Viro, linux-mm, linux-fsdevel,
	linux-embedded, LKML, Chris Brandt

On Wed, 4 Oct 2017, Christoph Hellwig wrote:

> As said in my last mail: look at the VM_MIXEDMAP flag and how it is 
> used by DAX, and you'll get out of the vma splitting business in the 
> fault path.

Alright, it appears to work.

The only downside so far is the lack of visibility from user space to 
confirm it actually works as intended. With the vma splitting approach 
you clearly see what gets directly mapped in /proc/*/maps thanks to 
remap_pfn_range() storing the actual physical address in vma->vm_pgoff. 
With VM_MIXEDMAP things are no longer visible. Any opinion for the best 
way to overcome this?
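
For reference, a user-space check of this kind could look like the
sketch below. It is only an illustration and not part of the series:
find_maps_entry() and report_mapping() are hypothetical helper names,
and the code merely locates the /proc/self/maps line covering a mapped
address so that its offset column can be inspected by hand.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Copy the /proc/self/maps line covering addr into buf.
 * Returns 0 when such a line is found, -1 otherwise.
 * (Hypothetical helper, for illustration only.)
 */
int find_maps_entry(const void *addr, char *buf, size_t buflen)
{
	FILE *f;
	char line[512];
	int ret = -1;

	assert(buf && buflen > 0);
	f = fopen("/proc/self/maps", "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		unsigned long start, end;

		/* Each entry starts with "start-end" in hexadecimal. */
		if (sscanf(line, "%lx-%lx", &start, &end) != 2)
			continue;
		if ((unsigned long)addr >= start && (unsigned long)addr < end) {
			snprintf(buf, buflen, "%s", line);
			ret = 0;
			break;
		}
	}
	fclose(f);
	return ret;
}

/*
 * Map the first page of path read-only and private, then print the
 * maps line that covers it. Returns 0 on success, -1 on any failure.
 * (Hypothetical helper, for illustration only.)
 */
int report_mapping(const char *path)
{
	char line[512];
	long pagesz = sysconf(_SC_PAGESIZE);
	int fd = open(path, O_RDONLY);
	void *p;
	int ret;

	if (fd < 0)
		return -1;
	p = mmap(NULL, pagesz, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);
	if (p == MAP_FAILED)
		return -1;
	ret = find_maps_entry(p, line, sizeof(line));
	if (ret == 0)
		fputs(line, stdout);
	munmap(p, pagesz);
	return ret;
}
```

Calling report_mapping() on a file stored in the cramfs image prints
the line whose third field is the offset column discussed above.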

Anyway, here's a replacement for patch 4/5 below:

----- >8
Subject: cramfs: add mmap support

When cramfs_physmem is used then we have the opportunity to map files
directly from ROM, directly into user space, saving on RAM usage.
This gives us Execute-In-Place (XIP) support.

For a file to be mmap()-able, the map area has to correspond to a range
of uncompressed and contiguous blocks, and in the MMU case it also has
to be page aligned. A version of mkcramfs with appropriate support is
necessary to create such a filesystem image.

In the MMU case it may happen for a vma structure to extend beyond the
actual file size. This is notably the case in binfmt_elf.c:elf_map().
Or the file's last block is shared with other files and cannot be mapped
as is. Rather than refusing to mmap it, we do a "mixed" map and let the
regular fault handler populate the unmapped area with RAM-backed pages.
In practice the unmapped area is seldom accessed so page faults might
never occur before this area is discarded.

In the non-MMU case it is the get_unmapped_area method that is responsible
for providing the address where the actual data can be found. No mapping
is necessary of course.

Signed-off-by: Nicolas Pitre <nico@linaro.org>

diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 2fc886092b..9d5d0c1f7d 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -15,7 +15,10 @@
 
 #include <linux/module.h>
 #include <linux/fs.h>
+#include <linux/file.h>
 #include <linux/pagemap.h>
+#include <linux/pfn_t.h>
+#include <linux/ramfs.h>
 #include <linux/init.h>
 #include <linux/string.h>
 #include <linux/blkdev.h>
@@ -49,6 +52,7 @@ static inline struct cramfs_sb_info *CRAMFS_SB(struct super_block *sb)
 static const struct super_operations cramfs_ops;
 static const struct inode_operations cramfs_dir_inode_operations;
 static const struct file_operations cramfs_directory_operations;
+static const struct file_operations cramfs_physmem_fops;
 static const struct address_space_operations cramfs_aops;
 
 static DEFINE_MUTEX(read_mutex);
@@ -96,6 +100,10 @@ static struct inode *get_cramfs_inode(struct super_block *sb,
 	case S_IFREG:
 		inode->i_fop = &generic_ro_fops;
 		inode->i_data.a_ops = &cramfs_aops;
+		if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) &&
+		    CRAMFS_SB(sb)->flags & CRAMFS_FLAG_EXT_BLOCK_POINTERS &&
+		    CRAMFS_SB(sb)->linear_phys_addr)
+			inode->i_fop = &cramfs_physmem_fops;
 		break;
 	case S_IFDIR:
 		inode->i_op = &cramfs_dir_inode_operations;
@@ -277,6 +285,188 @@ static void *cramfs_read(struct super_block *sb, unsigned int offset,
 		return NULL;
 }
 
+/*
+ * For a mapping to be possible, we need a range of uncompressed and
+ * contiguous blocks. Return the offset for the first block and number of
+ * valid blocks for which that is true, or zero otherwise.
+ */
+static u32 cramfs_get_block_range(struct inode *inode, u32 pgoff, u32 *pages)
+{
+	struct super_block *sb = inode->i_sb;
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+	int i;
+	u32 *blockptrs, blockaddr;
+
+	/*
+	 * We can dereference memory directly here as this code may be
+	 * reached only when there is a direct filesystem image mapping
+	 * available in memory.
+	 */
+	blockptrs = (u32 *)(sbi->linear_virt_addr + OFFSET(inode) + pgoff*4);
+	blockaddr = blockptrs[0] & ~CRAMFS_BLK_FLAGS;
+	i = 0;
+	do {
+		u32 expect = blockaddr + i * (PAGE_SIZE >> 2);
+		expect |= CRAMFS_BLK_FLAG_DIRECT_PTR|CRAMFS_BLK_FLAG_UNCOMPRESSED;
+		if (blockptrs[i] != expect) {
+			pr_debug("range: block %d/%d got %#x expects %#x\n",
+				 pgoff+i, pgoff+*pages-1, blockptrs[i], expect);
+			if (i == 0)
+				return 0;
+			break;
+		}
+	} while (++i < *pages);
+
+	*pages = i;
+
+	/* stored "direct" block ptrs are shifted down by 2 bits */
+	return blockaddr << 2;
+}
+
+static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct inode *inode = file_inode(file);
+	struct super_block *sb = inode->i_sb;
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+	unsigned int pages, vma_pages, max_pages, offset;
+	unsigned long address;
+	char *fail_reason;
+	int ret;
+
+	if (!IS_ENABLED(CONFIG_MMU))
+		return vma->vm_flags & (VM_SHARED | VM_MAYSHARE) ? 0 : -ENOSYS;
+
+	if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
+		return -EINVAL;
+
+	/* Could COW work here? */
+	fail_reason = "vma is writable";
+	if (vma->vm_flags & VM_WRITE)
+		goto fail;
+
+	vma_pages = (vma->vm_end - vma->vm_start + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	max_pages = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	fail_reason = "beyond file limit";
+	if (vma->vm_pgoff >= max_pages)
+		goto fail;
+	pages = vma_pages;
+	if (pages > max_pages - vma->vm_pgoff)
+		pages = max_pages - vma->vm_pgoff;
+
+	offset = cramfs_get_block_range(inode, vma->vm_pgoff, &pages);
+	fail_reason = "unsuitable block layout";
+	if (!offset)
+		goto fail;
+	address = sbi->linear_phys_addr + offset;
+	fail_reason = "data is not page aligned";
+	if (!PAGE_ALIGNED(address))
+		goto fail;
+
+	/* Don't map the last page if it contains some other data */
+	if (unlikely(vma->vm_pgoff + pages == max_pages)) {
+		unsigned int partial = offset_in_page(inode->i_size);
+		if (partial) {
+			char *data = sbi->linear_virt_addr + offset;
+			data += (max_pages - 1) * PAGE_SIZE + partial;
+			while ((unsigned long)data & 7)
+				if (*data++ != 0)
+					goto nonzero;
+			while (offset_in_page(data)) {
+				if (*(u64 *)data != 0) {
+					nonzero:
+					pr_debug("mmap: %s: last page is shared\n",
+						 file_dentry(file)->d_name.name);
+					pages--;
+					break;
+				}
+				data += 8;
+			}
+		}
+	}
+
+	if (!pages) {
+		fail_reason = "no suitable block remaining";
+		goto fail;
+	} else if (pages != vma_pages) {
+		/*
+		 * Let's create a mixed map if we can't map it all.
+		 * The normal paging machinery will take care of the
+		 * unpopulated vma via cramfs_readpage().
+		 */
+		int i;
+		vma->vm_flags |= VM_MIXEDMAP;
+		for (i = 0; i < pages; i++) {
+			unsigned long vaddr = vma->vm_start + i*PAGE_SIZE;
+			pfn_t pfn = phys_to_pfn_t(address + i*PAGE_SIZE, PFN_DEV);
+			ret = vm_insert_mixed(vma, vaddr, pfn);
+			if (ret)
+				return ret;
+		}
+		vma->vm_ops = &generic_file_vm_ops;
+	} else {
+		ret = remap_pfn_range(vma, vma->vm_start, address >> PAGE_SHIFT,
+				      pages * PAGE_SIZE, vma->vm_page_prot);
+		if (ret)
+			return ret;
+	}
+
+	pr_debug("mapped %s at 0x%08lx (%u/%u pages) to vma 0x%08lx, "
+		 "page_prot 0x%llx\n", file_dentry(file)->d_name.name,
+		 address, pages, vma_pages, vma->vm_start,
+		 (unsigned long long)pgprot_val(vma->vm_page_prot));
+	return 0;
+
+fail:
+	pr_debug("%s: direct mmap failed: %s\n",
+		 file_dentry(file)->d_name.name, fail_reason);
+
+	/* We failed to do a direct map, but normal paging is still possible */
+	vma->vm_ops = &generic_file_vm_ops;
+	return 0;
+}
+
+#ifndef CONFIG_MMU
+
+static unsigned long cramfs_physmem_get_unmapped_area(struct file *file,
+			unsigned long addr, unsigned long len,
+			unsigned long pgoff, unsigned long flags)
+{
+	struct inode *inode = file_inode(file);
+	struct super_block *sb = inode->i_sb;
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+	unsigned int pages, block_pages, max_pages, offset;
+
+	pages = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	max_pages = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	if (pgoff >= max_pages || pages > max_pages - pgoff)
+		return -EINVAL;
+	block_pages = pages;
+	offset = cramfs_get_block_range(inode, pgoff, &block_pages);
+	if (!offset || block_pages != pages)
+		return -ENOSYS;
+	addr = sbi->linear_phys_addr + offset;
+	pr_debug("get_unmapped for %s ofs %#lx siz %lu at 0x%08lx\n",
+		 file_dentry(file)->d_name.name, pgoff*PAGE_SIZE, len, addr);
+	return addr;
+}
+
+static unsigned cramfs_physmem_mmap_capabilities(struct file *file)
+{
+	return NOMMU_MAP_COPY | NOMMU_MAP_DIRECT | NOMMU_MAP_READ | NOMMU_MAP_EXEC;
+}
+#endif
+
+static const struct file_operations cramfs_physmem_fops = {
+	.llseek			= generic_file_llseek,
+	.read_iter		= generic_file_read_iter,
+	.splice_read		= generic_file_splice_read,
+	.mmap			= cramfs_physmem_mmap,
+#ifndef CONFIG_MMU
+	.get_unmapped_area	= cramfs_physmem_get_unmapped_area,
+	.mmap_capabilities	= cramfs_physmem_mmap_capabilities,
+#endif
+};
+
 static void cramfs_blkdev_kill_sb(struct super_block *sb)
 {
 	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
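
The contiguity test performed by cramfs_get_block_range() in the patch
above can be tried out in user space with a sketch like the following.
The flag values, the 4096-byte page size and the names block_range(),
BLK_FLAG_* and SKETCH_PAGE_SIZE are assumptions made for illustration;
the real definitions are those used by the cramfs code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define BLK_FLAG_UNCOMPRESSED	(1u << 31)
#define BLK_FLAG_DIRECT_PTR	(1u << 30)
#define BLK_FLAGS		(BLK_FLAG_UNCOMPRESSED | BLK_FLAG_DIRECT_PTR)
#define SKETCH_PAGE_SIZE	4096u

/*
 * Walk up to *pages block pointers and count how many describe
 * uncompressed, directly addressed blocks laid out back to back.
 * On return *pages holds that count and the result is the byte offset
 * of the first block, or 0 if even the first block is unsuitable.
 */
uint32_t block_range(const uint32_t *blockptrs, uint32_t *pages)
{
	uint32_t blockaddr, i = 0;

	assert(blockptrs && pages);
	blockaddr = blockptrs[0] & ~BLK_FLAGS;
	do {
		/* "direct" pointers are stored shifted down by 2 bits */
		uint32_t expect = blockaddr + i * (SKETCH_PAGE_SIZE >> 2);

		expect |= BLK_FLAGS;
		if (blockptrs[i] != expect) {
			if (i == 0)
				return 0;
			break;
		}
	} while (++i < *pages);

	*pages = i;
	return blockaddr << 2;
}
```

With three consecutive direct pointers followed by a compressed block,
the function reports three mappable pages, which is the situation where
the mmap method falls back to a mixed map for the remainder.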

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 4/5] cramfs: add mmap support
@ 2017-10-04 20:47                       ` Nicolas Pitre
  0 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-10-04 20:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Richard Weinberger, Alexander Viro, linux-mm, linux-fsdevel,
	linux-embedded, LKML, Chris Brandt

On Wed, 4 Oct 2017, Christoph Hellwig wrote:

> As said in my last mail: look at the VM_MIXEDMAP flag and how it is 
> used by DAX, and you'll get out of the vma splitting business in the 
> fault path.

Alright, it appears to work.

The only downside so far is the lack of visibility from user space to 
confirm it actually works as intended. With the vma splitting approach 
you clearly see what gets directly mapped in /proc/*/maps thanks to 
remap_pfn_range() storing the actual physical address in vma->vm_pgoff. 
With VM_MIXEDMAP things are no longer visible. Any opinion for the best 
way to overcome this?

Anyway, here's a replacement for patch 4/5 below:

----- >8
Subject: cramfs: add mmap support

When cramfs_physmem is used then we have the opportunity to map files
directly from ROM, directly into user space, saving on RAM usage.
This gives us Execute-In-Place (XIP) support.

For a file to be mmap()-able, the map area has to correspond to a range
of uncompressed and contiguous blocks, and in the MMU case it also has
to be page aligned. A version of mkcramfs with appropriate support is
necessary to create such a filesystem image.

In the MMU case it may happen for a vma structure to extend beyond the
actual file size. This is notably the case in binfmt_elf.c:elf_map().
Or the file's last block is shared with other files and cannot be mapped
as is. Rather than refusing to mmap it, we do a "mixed" map and let the
regular fault handler populate the unmapped area with RAM-backed pages.
In practice the unmapped area is seldom accessed so page faults might
never occur before this area is discarded.

In the non-MMU case it is the get_unmapped_area method that is responsible
for providing the address where the actual data can be found. No mapping
is necessary of course.

Signed-off-by: Nicolas Pitre <nico@linaro.org>

diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c
index 2fc886092b..9d5d0c1f7d 100644
--- a/fs/cramfs/inode.c
+++ b/fs/cramfs/inode.c
@@ -15,7 +15,10 @@
 
 #include <linux/module.h>
 #include <linux/fs.h>
+#include <linux/file.h>
 #include <linux/pagemap.h>
+#include <linux/pfn_t.h>
+#include <linux/ramfs.h>
 #include <linux/init.h>
 #include <linux/string.h>
 #include <linux/blkdev.h>
@@ -49,6 +52,7 @@ static inline struct cramfs_sb_info *CRAMFS_SB(struct super_block *sb)
 static const struct super_operations cramfs_ops;
 static const struct inode_operations cramfs_dir_inode_operations;
 static const struct file_operations cramfs_directory_operations;
+static const struct file_operations cramfs_physmem_fops;
 static const struct address_space_operations cramfs_aops;
 
 static DEFINE_MUTEX(read_mutex);
@@ -96,6 +100,10 @@ static struct inode *get_cramfs_inode(struct super_block *sb,
 	case S_IFREG:
 		inode->i_fop = &generic_ro_fops;
 		inode->i_data.a_ops = &cramfs_aops;
+		if (IS_ENABLED(CONFIG_CRAMFS_PHYSMEM) &&
+		    CRAMFS_SB(sb)->flags & CRAMFS_FLAG_EXT_BLOCK_POINTERS &&
+		    CRAMFS_SB(sb)->linear_phys_addr)
+			inode->i_fop = &cramfs_physmem_fops;
 		break;
 	case S_IFDIR:
 		inode->i_op = &cramfs_dir_inode_operations;
@@ -277,6 +285,188 @@ static void *cramfs_read(struct super_block *sb, unsigned int offset,
 		return NULL;
 }
 
+/*
+ * For a mapping to be possible, we need a range of uncompressed and
+ * contiguous blocks. Return the offset for the first block and number of
+ * valid blocks for which that is true, or zero otherwise.
+ */
+static u32 cramfs_get_block_range(struct inode *inode, u32 pgoff, u32 *pages)
+{
+	struct super_block *sb = inode->i_sb;
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+	int i;
+	u32 *blockptrs, blockaddr;
+
+	/*
+	 * We can dereference memory directly here as this code may be
+	 * reached only when there is a direct filesystem image mapping
+	 * available in memory.
+	 */
+	blockptrs = (u32 *)(sbi->linear_virt_addr + OFFSET(inode) + pgoff*4);
+	blockaddr = blockptrs[0] & ~CRAMFS_BLK_FLAGS;
+	i = 0;
+	do {
+		u32 expect = blockaddr + i * (PAGE_SIZE >> 2);
+		expect |= CRAMFS_BLK_FLAG_DIRECT_PTR|CRAMFS_BLK_FLAG_UNCOMPRESSED;
+		if (blockptrs[i] != expect) {
+			pr_debug("range: block %d/%d got %#x expects %#x\n",
+				 pgoff+i, pgoff+*pages-1, blockptrs[i], expect);
+			if (i == 0)
+				return 0;
+			break;
+		}
+	} while (++i < *pages);
+
+	*pages = i;
+
+	/* stored "direct" block ptrs are shifted down by 2 bits */
+	return blockaddr << 2;
+}
+
+static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma)
+{
+	struct inode *inode = file_inode(file);
+	struct super_block *sb = inode->i_sb;
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+	unsigned int pages, vma_pages, max_pages, offset;
+	unsigned long address;
+	char *fail_reason;
+	int ret;
+
+	if (!IS_ENABLED(CONFIG_MMU))
+		return vma->vm_flags & (VM_SHARED | VM_MAYSHARE) ? 0 : -ENOSYS;
+
+	if ((vma->vm_flags & VM_SHARED) && (vma->vm_flags & VM_MAYWRITE))
+		return -EINVAL;
+
+	/* Could COW work here? */
+	fail_reason = "vma is writable";
+	if (vma->vm_flags & VM_WRITE)
+		goto fail;
+
+	vma_pages = (vma->vm_end - vma->vm_start + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	max_pages = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	fail_reason = "beyond file limit";
+	if (vma->vm_pgoff >= max_pages)
+		goto fail;
+	pages = vma_pages;
+	if (pages > max_pages - vma->vm_pgoff)
+		pages = max_pages - vma->vm_pgoff;
+
+	offset = cramfs_get_block_range(inode, vma->vm_pgoff, &pages);
+	fail_reason = "unsuitable block layout";
+	if (!offset)
+		goto fail;
+	address = sbi->linear_phys_addr + offset;
+	fail_reason = "data is not page aligned";
+	if (!PAGE_ALIGNED(address))
+		goto fail;
+
+	/* Don't map the last page if it contains some other data */
+	if (unlikely(vma->vm_pgoff + pages == max_pages)) {
+		unsigned int partial = offset_in_page(inode->i_size);
+		if (partial) {
+			char *data = sbi->linear_virt_addr + offset;
+			data += (max_pages - 1) * PAGE_SIZE + partial;
+			while ((unsigned long)data & 7)
+				if (*data++ != 0)
+					goto nonzero;
+			while (offset_in_page(data)) {
+				if (*(u64 *)data != 0) {
+					nonzero:
+					pr_debug("mmap: %s: last page is shared\n",
+						 file_dentry(file)->d_name.name);
+					pages--;
+					break;
+				}
+				data += 8;
+			}
+		}
+	}
+
+	if (!pages) {
+		fail_reason = "no suitable block remaining";
+		goto fail;
+	} else if (pages != vma_pages) {
+		/*
+		 * Let's create a mixed map if we can't map it all.
+		 * The normal paging machinery will take care of the
+		 * unpopulated vma via cramfs_readpage().
+		 */
+		int i;
+		vma->vm_flags |= VM_MIXEDMAP;
+		for (i = 0; i < pages; i++) {
+			unsigned long vaddr = vma->vm_start + i*PAGE_SIZE;
+			pfn_t pfn = phys_to_pfn_t(address + i*PAGE_SIZE, PFN_DEV);
+			ret = vm_insert_mixed(vma, vaddr, pfn);
+			if (ret)
+				return ret;
+		}
+		vma->vm_ops = &generic_file_vm_ops;
+	} else {
+		ret = remap_pfn_range(vma, vma->vm_start, address >> PAGE_SHIFT,
+				      pages * PAGE_SIZE, vma->vm_page_prot);
+		if (ret)
+			return ret;
+	}
+
+	pr_debug("mapped %s at 0x%08lx (%u/%u pages) to vma 0x%08lx, "
+		 "page_prot 0x%llx\n", file_dentry(file)->d_name.name,
+		 address, pages, vma_pages, vma->vm_start,
+		 (unsigned long long)pgprot_val(vma->vm_page_prot));
+	return 0;
+
+fail:
+	pr_debug("%s: direct mmap failed: %s\n",
+		 file_dentry(file)->d_name.name, fail_reason);
+
+	/* We failed to do a direct map, but normal paging is still possible */
+	vma->vm_ops = &generic_file_vm_ops;
+	return 0;
+}
+
+#ifndef CONFIG_MMU
+
+static unsigned long cramfs_physmem_get_unmapped_area(struct file *file,
+			unsigned long addr, unsigned long len,
+			unsigned long pgoff, unsigned long flags)
+{
+	struct inode *inode = file_inode(file);
+	struct super_block *sb = inode->i_sb;
+	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
+	unsigned int pages, block_pages, max_pages, offset;
+
+	pages = (len + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	max_pages = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	if (pgoff >= max_pages || pages > max_pages - pgoff)
+		return -EINVAL;
+	block_pages = pages;
+	offset = cramfs_get_block_range(inode, pgoff, &block_pages);
+	if (!offset || block_pages != pages)
+		return -ENOSYS;
+	addr = sbi->linear_phys_addr + offset;
+	pr_debug("get_unmapped for %s ofs %#lx siz %lu at 0x%08lx\n",
+		 file_dentry(file)->d_name.name, pgoff*PAGE_SIZE, len, addr);
+	return addr;
+}
+
+static unsigned cramfs_physmem_mmap_capabilities(struct file *file)
+{
+	return NOMMU_MAP_COPY | NOMMU_MAP_DIRECT | NOMMU_MAP_READ | NOMMU_MAP_EXEC;
+}
+#endif
+
+static const struct file_operations cramfs_physmem_fops = {
+	.llseek			= generic_file_llseek,
+	.read_iter		= generic_file_read_iter,
+	.splice_read		= generic_file_splice_read,
+	.mmap			= cramfs_physmem_mmap,
+#ifndef CONFIG_MMU
+	.get_unmapped_area	= cramfs_physmem_get_unmapped_area,
+	.mmap_capabilities	= cramfs_physmem_mmap_capabilities,
+#endif
+};
+
 static void cramfs_blkdev_kill_sb(struct super_block *sb)
 {
 	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 4/5] cramfs: add mmap support
  2017-10-04 20:47                       ` Nicolas Pitre
@ 2017-10-05  7:15                         ` Christoph Hellwig
  -1 siblings, 0 replies; 54+ messages in thread
From: Christoph Hellwig @ 2017-10-05  7:15 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Christoph Hellwig, Richard Weinberger, Alexander Viro, linux-mm,
	linux-fsdevel, linux-embedded, LKML, Chris Brandt

On Wed, Oct 04, 2017 at 04:47:52PM -0400, Nicolas Pitre wrote:
> The only downside so far is the lack of visibility from user space to 
> confirm it actually works as intended. With the vma splitting approach 
> you clearly see what gets directly mapped in /proc/*/maps thanks to 
> remap_pfn_range() storing the actual physical address in vma->vm_pgoff. 
> With VM_MIXEDMAP things are no longer visible. Any opinion for the best 
> way to overcome this?

Add trace points that allow you to trace it using trace-cmd, perf
or just tracefs?

> 
> Anyway, here's a replacement for patch 4/5 below:

This looks much better, and is about 100 lines less than the previous
version.  More (mostly cosmetic) comments below:

> +	blockptrs = (u32 *)(sbi->linear_virt_addr + OFFSET(inode) + pgoff*4);

Missing spaces around the *.

>
> +	blockaddr = blockptrs[0] & ~CRAMFS_BLK_FLAGS;
> +	i = 0;
> +	do {
> +		u32 expect = blockaddr + i * (PAGE_SIZE >> 2);

There are a lot of magic numbers in here.  It seems like that's standard
for cramfs, but if you really plan to bring it back to life it would be
great to sort that out.



> +		expect |= CRAMFS_BLK_FLAG_DIRECT_PTR|CRAMFS_BLK_FLAG_UNCOMPRESSED;

Too long line.

Just turn this into:

		u32 expect = (blockaddr + i * (PAGE_SIZE >> 2)) |
				CRAMFS_BLK_FLAG_DIRECT_PTR |
				CRAMFS_BLK_FLAG_UNCOMPRESSED;

and it will be a lot more readable.
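
For illustration, the contiguity check under discussion can be sketched in
user space with the shift given a name instead of a bare 2.  This is only a
sketch under stated assumptions: the flag values, BLK_PTR_SHIFT and
SKETCH_PAGE_SIZE are placeholders chosen for the example, and block_range()
is a hypothetical stand-in for cramfs_get_block_range(), not the patch code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones belong to the cramfs headers. */
#define BLK_FLAG_UNCOMPRESSED	(1u << 31)
#define BLK_FLAG_DIRECT_PTR	(1u << 30)
#define BLK_FLAGS		(BLK_FLAG_UNCOMPRESSED | BLK_FLAG_DIRECT_PTR)
#define BLK_PTR_SHIFT		2	/* direct pointers are stored shifted down */
#define SKETCH_PAGE_SIZE	4096u

/*
 * Count how many leading entries of blockptrs[] describe uncompressed,
 * physically contiguous pages.  Returns the byte offset of the first page,
 * or 0 if even the first entry is unsuitable (*pages is then left alone).
 */
static uint32_t block_range(const uint32_t *blockptrs, unsigned int *pages)
{
	uint32_t blockaddr = blockptrs[0] & ~BLK_FLAGS;
	unsigned int i;

	assert(*pages > 0);
	for (i = 0; i < *pages; i++) {
		uint32_t expect = (blockaddr +
				   i * (SKETCH_PAGE_SIZE >> BLK_PTR_SHIFT)) |
				  BLK_FLAG_DIRECT_PTR |
				  BLK_FLAG_UNCOMPRESSED;

		if (blockptrs[i] != expect)
			break;
	}
	if (i == 0)
		return 0;

	*pages = i;
	return blockaddr << BLK_PTR_SHIFT;
}
```

With three pointers of which only the first two are contiguous direct
pointers, the function trims *pages to 2 and returns the byte offset of the
first block.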

> +static int cramfs_physmem_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct inode *inode = file_inode(file);
> +	struct super_block *sb = inode->i_sb;
> +	struct cramfs_sb_info *sbi = CRAMFS_SB(sb);
> +	unsigned int pages, vma_pages, max_pages, offset;
> +	unsigned long address;
> +	char *fail_reason;
> +	int ret;
> +
> +	if (!IS_ENABLED(CONFIG_MMU))
> +		return vma->vm_flags & (VM_SHARED | VM_MAYSHARE) ? 0 : -ENOSYS;

Given that you have a separate #ifndef CONFIG_MMU section below just
have a separate implementation of cramfs_physmem_mmap for it, which
makes the code a lot more obvious.

> +	/* Could COW work here? */
> +	fail_reason = "vma is writable";
> +	if (vma->vm_flags & VM_WRITE)
> +		goto fail;

The fail_reason is a rather unusual style, is there any good reason
why you need it here?  We generally don't add a debug printk for every
possible failure case.

> +	vma_pages = (vma->vm_end - vma->vm_start + PAGE_SIZE - 1) >> PAGE_SHIFT;

Just use vma_pages() - the definition is different, but given that vm_end
and vm_start must be page aligned anyway it should not make a difference.

> +	if (pages > max_pages - vma->vm_pgoff)
> +		pages = max_pages - vma->vm_pgoff;

Use min() or min_t().
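
As a rough user-space sketch of the page arithmetic these comments refer to
(not the patch itself): mappable_pages(), SKETCH_MIN and the SKETCH_PAGE_*
constants are hypothetical names, and a 4 KiB page size is assumed.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1ul << SKETCH_PAGE_SHIFT)
#define SKETCH_MIN(a, b)	((a) < (b) ? (a) : (b))

/*
 * Number of pages of a file of i_size bytes that a mapping of vma_pages
 * pages starting at page offset pgoff can cover; 0 if pgoff lies beyond
 * the end of the file.
 */
static unsigned long mappable_pages(unsigned long i_size, unsigned long pgoff,
				    unsigned long vma_pages)
{
	unsigned long max_pages =
		(i_size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;

	assert(vma_pages > 0);	/* a vma always spans at least one page */
	if (pgoff >= max_pages)
		return 0;	/* the "beyond file limit" case */
	return SKETCH_MIN(vma_pages, max_pages - pgoff);
}
```

The single min-style expression replaces the if/assignment pair quoted above
without changing the result.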

> +	/* Don't map the last page if it contains some other data */
> +	if (unlikely(vma->vm_pgoff + pages == max_pages)) {
> +		unsigned int partial = offset_in_page(inode->i_size);
> +		if (partial) {
> +			char *data = sbi->linear_virt_addr + offset;
> +			data += (max_pages - 1) * PAGE_SIZE + partial;
> +			while ((unsigned long)data & 7)
> +				if (*data++ != 0)
> +					goto nonzero;
> +			while (offset_in_page(data)) {
> +				if (*(u64 *)data != 0) {
> +					nonzero:
> +					pr_debug("mmap: %s: last page is shared\n",
> +						 file_dentry(file)->d_name.name);
> +					pages--;
> +					break;
> +				}
> +				data += 8;
> +			}

The nonzero label is in a rather unusual place, both having weird
indentation and being in the middle of the loop.

It seems like this whole partial section should just go into a little
helper where the nonzero case is at the end of said helper to make it
readable.  Also lots of magic numbers again, and generally a little
too much magic for the code to be easily understandable: why do you
operate on pointers casted to longs, increment in 8-byte steps?
Why is offset_in_page used for an operation that doesn't operate on
struct page at all?  Any reason you can't just use memchr_inv?
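
One possible shape for such a helper, sketched in user space.  The kernel's
memchr_inv() returns a pointer to the first byte that differs from the given
value, or NULL; memchr_inv_sketch() below is a stand-in with that behaviour,
and last_page_tail_is_zero() and SKETCH_PAGE_SIZE are hypothetical names
introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u

/* Stand-in for the kernel's memchr_inv(): first byte != c, or NULL. */
static const void *memchr_inv_sketch(const void *start, int c, size_t bytes)
{
	const unsigned char *p = start;

	for (; bytes; bytes--, p++)
		if (*p != (unsigned char)c)
			return p;
	return NULL;
}

/*
 * Given the start of a file's last page and the number of bytes of that
 * page the file actually uses, report whether the remainder of the page is
 * zero filled, i.e. holds no other data and may be mapped directly.
 */
static bool last_page_tail_is_zero(const char *last_page, size_t partial)
{
	assert(partial < SKETCH_PAGE_SIZE);
	if (!partial)
		return true;	/* file ends on a page boundary */
	return !memchr_inv_sketch(last_page + partial, 0,
				  SKETCH_PAGE_SIZE - partial);
}
```

This keeps the "nonzero" outcome as the plain return value of the helper, so
no label inside a loop is needed.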

> +	if (!pages) {
> +		fail_reason = "no suitable block remaining";
> +		goto fail;
> +	} else if (pages != vma_pages) {

No else after a branch that ends in a goto, please; that just confuses
the reader.

> +		/*
> +		 * Let's create a mixed map if we can't map it all.
> +		 * The normal paging machinery will take care of the
> +		 * unpopulated vma via cramfs_readpage().
> +		 */
> +		int i;
> +		vma->vm_flags |= VM_MIXEDMAP;
> +		for (i = 0; i < pages; i++) {
> +			unsigned long vaddr = vma->vm_start + i*PAGE_SIZE;
> +			pfn_t pfn = phys_to_pfn_t(address + i*PAGE_SIZE, PFN_DEV);
> +			ret = vm_insert_mixed(vma, vaddr, pfn);

Please use spaces around the * operator, and don't use overly long
lines.

A local variable might help doing that in a readable way:

			unsigned long off = i * PAGE_SIZE;

			ret = vm_insert_mixed(vma, vma->vm_start + off,
					phys_to_pfn_t(address + off, PFN_DEV));

> +	/* We failed to do a direct map, but normal paging is still possible */
> +	vma->vm_ops = &generic_file_vm_ops;

Maybe let the mixedmap case fall through to this instead of having
a duplicate vm_ops assignment.

> +static unsigned cramfs_physmem_mmap_capabilities(struct file *file)
> +{
> +	return NOMMU_MAP_COPY | NOMMU_MAP_DIRECT | NOMMU_MAP_READ | NOMMU_MAP_EXEC;

Too long line.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH v4 4/5] cramfs: add mmap support
  2017-10-05  7:15                         ` Christoph Hellwig
@ 2017-10-05 17:52                           ` Nicolas Pitre
  -1 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-10-05 17:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Richard Weinberger, Alexander Viro, linux-mm, linux-fsdevel,
	linux-embedded, LKML, Chris Brandt

On Thu, 5 Oct 2017, Christoph Hellwig wrote:

> On Wed, Oct 04, 2017 at 04:47:52PM -0400, Nicolas Pitre wrote:
> > The only downside so far is the lack of visibility from user space to 
> > confirm it actually works as intended. With the vma splitting approach 
> > you clearly see what gets directly mapped in /proc/*/maps thanks to 
> > remap_pfn_range() storing the actual physical address in vma->vm_pgoff. 
> > With VM_MIXEDMAP things are no longer visible. Any opinion for the best 
> > way to overcome this?
> 
> Add trace points that allow you to trace it using trace-cmd, perf
> or just tracefs?

In memory-constrained embedded environments those facilities are
sometimes too big to be practical. And the /proc/*/maps content is
static, i.e. it is always there regardless of how many tasks you have and
how long they've been running, which makes it extremely handy.

> > Anyway, here's a replacement for patch 4/5 below:
> 
> This looks much better, and is about 100 lines less than the previous
> version.  More (mostly cosmetic) comments below:
> 
[...]
> > +	fail_reason = "vma is writable";
> > +	if (vma->vm_flags & VM_WRITE)
> > +		goto fail;
> 
> The fail_reason is a rather unusual style, is there any good reason
> why you need it here?  We generally don't add a debug printk for every
> possible failure case.

There are many things that might make your files not XIP and they're 
mostly related to how the file is mmap'd or how mkcramfs was used. When 
looking where some of your memory has gone because some files are not 
directly mapped it is nice to have a hint as to why at run time. Doing 
it that way also works as comments for someone reading the code, and the 
compiler optimizes those strings away when DEBUG is not defined anyway. 

I did s/fail/bailout/ though, as those are not hard failures. The hard 
failures have no such debugging messages.
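
A minimal user-space sketch of that pattern, assuming a pr_debug-like macro
that expands to dead code unless DEBUG is defined.  pr_debug_sketch() and
try_direct_map() are hypothetical names for this example only; whether the
strings are really dropped depends on the compiler eliminating the dead
branch.

```c
#include <assert.h>
#include <stdio.h>

#ifdef DEBUG
#define pr_debug_sketch(...)	fprintf(stderr, __VA_ARGS__)
#else
/* Arguments are still type checked, but the call is dead code. */
#define pr_debug_sketch(...)	do { if (0) fprintf(stderr, __VA_ARGS__); } while (0)
#endif

/* Returns 1 if a direct mapping is possible, 0 to fall back to paging. */
static int try_direct_map(int writable, unsigned long pgoff,
			  unsigned long max_pages)
{
	const char *bailout_reason;

	bailout_reason = "vma is writable";
	if (writable)
		goto bailout;

	bailout_reason = "beyond file limit";
	if (pgoff >= max_pages)
		goto bailout;

	return 1;

bailout:
	/* Every path that reaches here has set a reason first. */
	assert(bailout_reason);
	pr_debug_sketch("direct mmap impossible: %s\n", bailout_reason);
	return 0;
}
```

Each assignment doubles as a comment for the check that follows it, and a
bailout is a soft fallback rather than an error return.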

[...]
> It seems like this whole partial section should just go into a little
> helper where the nonzero case is at the end of said helper to make it
> readable.  Also lots of magic numbers again, and generally a little
> too much magic for the code to be easily understandable: why do you
> operate on pointers casted to longs, increment in 8-byte steps?
> Why is offset_in_page used for an operation that doesn't operate on
> struct page at all?  Any reason you can't just use memchr_inv?

Ahhh... memchr_inv() is in fact exactly what I was looking for.
Learn something every day.

[...]
> > +	/* We failed to do a direct map, but normal paging is still possible */
> > +	vma->vm_ops = &generic_file_vm_ops;
> 
> Maybe let the mixedmap case fall through to this instead of having
> a duplicate vm_ops assignment.

The code flow is different and that makes it hard to have a common 
assignment in this case.

Otherwise I've applied all your suggestions.

Thanks for your comments. Much appreciated.


Nicolas

^ permalink raw reply	[flat|nested] 54+ messages in thread

* RE: [PATCH v4 4/5] cramfs: add mmap support
  2017-10-04 20:47                       ` Nicolas Pitre
@ 2017-10-05 20:00                         ` Chris Brandt
  -1 siblings, 0 replies; 54+ messages in thread
From: Chris Brandt @ 2017-10-05 20:00 UTC (permalink / raw)
  To: Nicolas Pitre, Christoph Hellwig
  Cc: Richard Weinberger, Alexander Viro, linux-mm, linux-fsdevel,
	linux-embedded, LKML

On Wednesday, October 04, 2017, Nicolas Pitre wrote:
> On Wed, 4 Oct 2017, Christoph Hellwig wrote:
> 
> > As said in my last mail: look at the VM_MIXEDMAP flag and how it is
> > used by DAX, and you'll get out of the vma splitting business in the
> > fault path.
> 
> Alright, it appears to work.
> 
> The only downside so far is the lack of visibility from user space to
> confirm it actually works as intended. With the vma splitting approach
> you clearly see what gets directly mapped in /proc/*/maps thanks to
> remap_pfn_range() storing the actual physical address in vma->vm_pgoff.
> With VM_MIXEDMAP things are no longer visible. Any opinion for the best
> way to overcome this?
> 
> Anyway, here's a replacement for patch 4/5 below:
> 
> ----- >8
> Subject: cramfs: add mmap support
> 
> When cramfs_physmem is used then we have the opportunity to map files
> directly from ROM, directly into user space, saving on RAM usage.
> This gives us Execute-In-Place (XIP) support.


Tested on my setup:
 * Cortex A9 (with MMU)
 * CONFIG_XIP_KERNEL=y
 * booted with XIP CRAMFS as my rootfs 
 * all apps and libraries marked as XIP in my cramfs image



So far, functionally it seems to work the same as [PATCH v4 4/5].

As Nicolas said, before you could easily see that all my apps and 
libraries were XIP from Flash:

$ cat /proc/self/maps
00008000-000a1000 r-xp 1b005000 00:0c 18192      /bin/busybox
000a9000-000aa000 rw-p 00099000 00:0c 18192      /bin/busybox
000aa000-000ac000 rw-p 00000000 00:00 0          [heap]
b6e69000-b6f42000 r-xp 1b0bc000 00:0c 766540     /lib/libc-2.18-2013.10.so
b6f42000-b6f4a000 ---p 1b195000 00:0c 766540     /lib/libc-2.18-2013.10.so
b6f4a000-b6f4c000 r--p 000d9000 00:0c 766540     /lib/libc-2.18-2013.10.so
b6f4c000-b6f4d000 rw-p 000db000 00:0c 766540     /lib/libc-2.18-2013.10.so
b6f4d000-b6f50000 rw-p 00000000 00:00 0
b6f50000-b6f67000 r-xp 1b0a4000 00:0c 670372     /lib/ld-2.18-2013.10.so
b6f6a000-b6f6b000 rw-p 00000000 00:00 0
b6f6c000-b6f6e000 rw-p 00000000 00:00 0
b6f6e000-b6f6f000 r--p 00016000 00:0c 670372     /lib/ld-2.18-2013.10.so
b6f6f000-b6f70000 rw-p 00017000 00:0c 670372     /lib/ld-2.18-2013.10.so
beac0000-beae1000 rw-p 00000000 00:00 0          [stack]
bebc9000-bebca000 r-xp 00000000 00:00 0          [sigpage]
ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]



But now just busybox looks like it's XIP:

$ cat /proc/self/maps
00008000-000a1000 r-xp 1b005000 00:0c 18192      /bin/busybox
000a9000-000aa000 rw-p 00099000 00:0c 18192      /bin/busybox
000aa000-000ac000 rw-p 00000000 00:00 0          [heap]
b6e4d000-b6f26000 r-xp 00000000 00:0c 766540     /lib/libc-2.18-2013.10.so
b6f26000-b6f2e000 ---p 000d9000 00:0c 766540     /lib/libc-2.18-2013.10.so
b6f2e000-b6f30000 r--p 000d9000 00:0c 766540     /lib/libc-2.18-2013.10.so
b6f30000-b6f31000 rw-p 000db000 00:0c 766540     /lib/libc-2.18-2013.10.so
b6f31000-b6f34000 rw-p 00000000 00:00 0
b6f34000-b6f4b000 r-xp 00000000 00:0c 670372     /lib/ld-2.18-2013.10.so
b6f4e000-b6f4f000 rw-p 00000000 00:00 0
b6f50000-b6f52000 rw-p 00000000 00:00 0
b6f52000-b6f53000 r--p 00016000 00:0c 670372     /lib/ld-2.18-2013.10.so
b6f53000-b6f54000 rw-p 00017000 00:0c 670372     /lib/ld-2.18-2013.10.so
bec93000-becb4000 rw-p 00000000 00:00 0          [stack]
befad000-befae000 r-xp 00000000 00:00 0          [sigpage]
ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]
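
The difference between the two listings is the offset column: with
remap_pfn_range() it shows a physical address (e.g. 1b0bc000), while with
VM_MIXEDMAP it shows the ordinary file offset.  A rough sketch of a check
for this follows; it is a heuristic only, the function names are
hypothetical, and flash_base is an assumed board-specific value that the
caller has to supply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Parse the offset column of one /proc/<pid>/maps line.
 * Returns true and stores the offset on success.
 */
static bool maps_line_offset(const char *line, unsigned long long *offset)
{
	unsigned long long start, end;
	char perms[5];

	assert(line && offset);
	return sscanf(line, "%llx-%llx %4s %llx",
		      &start, &end, perms, offset) == 4;
}

/*
 * Heuristic: an offset at or above the flash base suggests the vma was set
 * up with remap_pfn_range().  A large file offset could fool this check.
 */
static bool looks_directly_mapped(const char *line,
				  unsigned long long flash_base)
{
	unsigned long long offset;

	return maps_line_offset(line, &offset) && offset >= flash_base;
}
```

Applied to the listings above with an assumed flash base of 0x18000000, the
libc text mapping is flagged in the first listing but not in the second,
which is the loss of visibility being discussed.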


Regardless, from a functional standpoint:

Tested-by: Chris Brandt <chris.brandt@renesas.com>




Just FYI, the previous [PATCH v4 4/5] also included this (which was the 
only real difference between v3 and v4):


diff --git a/fs/cramfs/Kconfig b/fs/cramfs/Kconfig
index 5b4e0b7e13..306549be25 100644
--- a/fs/cramfs/Kconfig
+++ b/fs/cramfs/Kconfig
@@ -30,7 +30,7 @@ config CRAMFS_BLOCKDEV
 
 config CRAMFS_PHYSMEM
 	bool "Support CramFs image directly mapped in physical memory"
-	depends on CRAMFS
+	depends on CRAMFS = y
 	default y if !CRAMFS_BLOCKDEV
 	help
 	  This option allows the CramFs driver to load data directly from


Chris

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* RE: [PATCH v4 4/5] cramfs: add mmap support
  2017-10-05 20:00                         ` Chris Brandt
@ 2017-10-05 21:15                           ` Nicolas Pitre
  -1 siblings, 0 replies; 54+ messages in thread
From: Nicolas Pitre @ 2017-10-05 21:15 UTC (permalink / raw)
  To: Chris Brandt
  Cc: Christoph Hellwig, Richard Weinberger, Alexander Viro, linux-mm,
	linux-fsdevel, linux-embedded, LKML

On Thu, 5 Oct 2017, Chris Brandt wrote:

> On Wednesday, October 04, 2017, Nicolas Pitre wrote:
> > Anyway, here's a replacement for patch 4/5 below:
> > 
> > ----- >8
> > Subject: cramfs: add mmap support
> > 
> > When cramfs_physmem is used then we have the opportunity to map files
> > directly from ROM, directly into user space, saving on RAM usage.
> > This gives us Execute-In-Place (XIP) support.
> 
> 
> Tested on my setup:
>  * Cortex A9 (with MMU)
>  * CONFIG_XIP_KERNEL=y
>  * booted with XIP CRAMFS as my rootfs 
>  * all apps and libraries marked as XIP in my cramfs image
> 
> 
> 
> So far, functionally it seems to work the same as [PATCH v4 4/5].
> 
> As Nicolas said, before you could easily see that all my apps and 
> libraries were XIP from Flash:
> 
> $ cat /proc/self/maps
> 00008000-000a1000 r-xp 1b005000 00:0c 18192      /bin/busybox
> 000a9000-000aa000 rw-p 00099000 00:0c 18192      /bin/busybox
> 000aa000-000ac000 rw-p 00000000 00:00 0          [heap]
> b6e69000-b6f42000 r-xp 1b0bc000 00:0c 766540     /lib/libc-2.18-2013.10.so
> b6f42000-b6f4a000 ---p 1b195000 00:0c 766540     /lib/libc-2.18-2013.10.so
> b6f4a000-b6f4c000 r--p 000d9000 00:0c 766540     /lib/libc-2.18-2013.10.so
> b6f4c000-b6f4d000 rw-p 000db000 00:0c 766540     /lib/libc-2.18-2013.10.so
> b6f4d000-b6f50000 rw-p 00000000 00:00 0
> b6f50000-b6f67000 r-xp 1b0a4000 00:0c 670372     /lib/ld-2.18-2013.10.so
> b6f6a000-b6f6b000 rw-p 00000000 00:00 0
> b6f6c000-b6f6e000 rw-p 00000000 00:00 0
> b6f6e000-b6f6f000 r--p 00016000 00:0c 670372     /lib/ld-2.18-2013.10.so
> b6f6f000-b6f70000 rw-p 00017000 00:0c 670372     /lib/ld-2.18-2013.10.so
> beac0000-beae1000 rw-p 00000000 00:00 0          [stack]
> bebc9000-bebca000 r-xp 00000000 00:00 0          [sigpage]
> ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]
> 
> 
> 
> But now just busybox looks like it's XIP:
> 
> $ cat /proc/self/maps
> 00008000-000a1000 r-xp 1b005000 00:0c 18192      /bin/busybox
> 000a9000-000aa000 rw-p 00099000 00:0c 18192      /bin/busybox
> 000aa000-000ac000 rw-p 00000000 00:00 0          [heap]
> b6e4d000-b6f26000 r-xp 00000000 00:0c 766540     /lib/libc-2.18-2013.10.so
> b6f26000-b6f2e000 ---p 000d9000 00:0c 766540     /lib/libc-2.18-2013.10.so
> b6f2e000-b6f30000 r--p 000d9000 00:0c 766540     /lib/libc-2.18-2013.10.so
> b6f30000-b6f31000 rw-p 000db000 00:0c 766540     /lib/libc-2.18-2013.10.so
> b6f31000-b6f34000 rw-p 00000000 00:00 0
> b6f34000-b6f4b000 r-xp 00000000 00:0c 670372     /lib/ld-2.18-2013.10.so
> b6f4e000-b6f4f000 rw-p 00000000 00:00 0
> b6f50000-b6f52000 rw-p 00000000 00:00 0
> b6f52000-b6f53000 r--p 00016000 00:0c 670372     /lib/ld-2.18-2013.10.so
> b6f53000-b6f54000 rw-p 00017000 00:0c 670372     /lib/ld-2.18-2013.10.so
> bec93000-becb4000 rw-p 00000000 00:00 0          [stack]
> befad000-befae000 r-xp 00000000 00:00 0          [sigpage]
> ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]
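
The two listings above differ mainly in the offset column of the file-backed mappings. As a rough sketch only (it is not part of the patch), the comparison can be automated by parsing the `/proc/<pid>/maps` line format. The names `Mapping`, `parse_maps` and `changed_offsets` are hypothetical, and reading a changed offset as a sign that a file is mapped differently is just the interpretation suggested in this thread, not something the script can confirm.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    path: str  # empty string for anonymous mappings


def parse_maps(text):
    """Parse /proc/<pid>/maps text, tolerating leading '> ' quote markers."""
    mappings = []
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()
        fields = line.split(None, 5)
        # Expected fields: address range, perms, offset, dev, inode, [path]
        if len(fields) < 5 or "-" not in fields[0]:
            continue
        try:
            start_s, end_s = fields[0].split("-", 1)
            start, end = int(start_s, 16), int(end_s, 16)
            offset = int(fields[2], 16)
        except ValueError:
            continue  # not a maps line (e.g. a shell prompt)
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(Mapping(start, end, fields[1], offset, path))
    return mappings


def changed_offsets(before, after):
    """Return {(path, perms): (old_offset, new_offset)} for file-backed
    mappings whose offset differs between the two listings."""
    old = {(m.path, m.perms): m.offset for m in before if m.path.startswith("/")}
    changes = {}
    for m in after:
        key = (m.path, m.perms)
        if key in old and old[key] != m.offset:
            changes[key] = (old[key], m.offset)
    return changes
```

Feeding the "before" and "now" listings to `changed_offsets(parse_maps(before), parse_maps(now))` lists only the mappings whose offset column changed. Keying on `(path, perms)` is a simplification that assumes each file has at most one mapping per permission string, which holds for the listings quoted here.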

Do you have the same amount of free memory once booted in both cases?
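
One way to make that comparison, sketched under assumptions: save the contents of `/proc/meminfo` after boot with each variant of the patch and compare the `MemFree` field. The function names `parse_meminfo` and `free_delta_kb` are hypothetical, and `MemFree` alone is only a coarse indicator because page cache and boot-time noise also affect it.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field_name: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total)
    are plain counts and carry no unit suffix.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def free_delta_kb(before_text, after_text, field="MemFree"):
    """Return the change in `field` between two meminfo snapshots."""
    return parse_meminfo(after_text)[field] - parse_meminfo(before_text)[field]
```

A result close to zero from `free_delta_kb(old_snapshot, new_snapshot)` would suggest that both versions keep a similar amount of data out of RAM.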

> Regardless, from a functional standpoint:
> 
> Tested-by: Chris Brandt <chris.brandt@renesas.com>

Thanks.

> Just FYI, the previous [PATCH v4 4/5] also included this (which was the 
> only real difference between v3 and v4):
> 
> 
> diff --git a/fs/cramfs/Kconfig b/fs/cramfs/Kconfig
> index 5b4e0b7e13..306549be25 100644
> --- a/fs/cramfs/Kconfig
> +++ b/fs/cramfs/Kconfig
> @@ -30,7 +30,7 @@ config CRAMFS_BLOCKDEV
>  
>  config CRAMFS_PHYSMEM
>  	bool "Support CramFs image directly mapped in physical memory"
> -	depends on CRAMFS
> +	depends on CRAMFS = y

Yeah, that was necessary because split_vma() wasn't exported to modules. 
Now split_vma() is no longer used so the no-module restriction has also 
been removed.


Nicolas


* RE: [PATCH v4 4/5] cramfs: add mmap support
  2017-10-05 21:15                           ` Nicolas Pitre
@ 2017-10-05 23:49                             ` Chris Brandt
  -1 siblings, 0 replies; 54+ messages in thread
From: Chris Brandt @ 2017-10-05 23:49 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Christoph Hellwig, Richard Weinberger, Alexander Viro, linux-mm,
	linux-fsdevel, linux-embedded, LKML

On Thursday, October 05, 2017, Nicolas Pitre wrote:
> Do you have the same amount of free memory once booted in both cases?

Yes, almost exactly the same, so obviously it must be working the same
in both cases. That's enough evidence for me.

Thanks.

Chris


end of thread, other threads:[~2017-10-05 23:49 UTC | newest]

Thread overview: 54+ messages (download: mbox.gz / follow: Atom feed)
2017-09-27 23:32 [PATCH v4 0/5] cramfs refresh for embedded usage Nicolas Pitre
2017-09-27 23:32 ` Nicolas Pitre
2017-09-27 23:32 ` [PATCH v4 1/5] cramfs: direct memory access support Nicolas Pitre
2017-09-27 23:32   ` Nicolas Pitre
2017-10-01  8:29   ` Christoph Hellwig
2017-10-01  8:29     ` Christoph Hellwig
2017-10-01 22:27     ` Nicolas Pitre
2017-10-01 22:27       ` Nicolas Pitre
2017-10-03 14:59       ` Christoph Hellwig
2017-10-03 14:59         ` Christoph Hellwig
2017-10-03 15:06         ` Nicolas Pitre
2017-10-03 15:06           ` Nicolas Pitre
2017-10-03 14:43     ` Rob Herring
2017-10-03 14:43       ` Rob Herring
2017-10-03 14:58       ` Chris Brandt
2017-10-03 14:58         ` Chris Brandt
2017-09-27 23:32 ` [PATCH v4 2/5] cramfs: make cramfs_physmem usable as root fs Nicolas Pitre
2017-09-27 23:32   ` Nicolas Pitre
2017-09-27 23:32 ` [PATCH v4 3/5] cramfs: implement uncompressed and arbitrary data block positioning Nicolas Pitre
2017-09-27 23:32   ` Nicolas Pitre
2017-09-27 23:32 ` [PATCH v4 4/5] cramfs: add mmap support Nicolas Pitre
2017-09-27 23:32   ` Nicolas Pitre
2017-10-01  8:30   ` Christoph Hellwig
2017-10-01  8:30     ` Christoph Hellwig
2017-10-01 22:29     ` Nicolas Pitre
2017-10-01 22:29       ` Nicolas Pitre
2017-10-02 22:45       ` Richard Weinberger
2017-10-02 22:45         ` Richard Weinberger
2017-10-02 23:33         ` Nicolas Pitre
2017-10-02 23:33           ` Nicolas Pitre
2017-10-03 14:57           ` Christoph Hellwig
2017-10-03 14:57             ` Christoph Hellwig
2017-10-03 15:30             ` Nicolas Pitre
2017-10-03 15:30               ` Nicolas Pitre
2017-10-03 15:37               ` Christoph Hellwig
2017-10-03 15:37                 ` Christoph Hellwig
2017-10-03 15:40                 ` Nicolas Pitre
2017-10-03 15:40                   ` Nicolas Pitre
2017-10-04  7:25                   ` Christoph Hellwig
2017-10-04  7:25                     ` Christoph Hellwig
2017-10-04 20:47                     ` Nicolas Pitre
2017-10-04 20:47                       ` Nicolas Pitre
2017-10-05  7:15                       ` Christoph Hellwig
2017-10-05  7:15                         ` Christoph Hellwig
2017-10-05 17:52                         ` Nicolas Pitre
2017-10-05 17:52                           ` Nicolas Pitre
2017-10-05 20:00                       ` Chris Brandt
2017-10-05 20:00                         ` Chris Brandt
2017-10-05 21:15                         ` Nicolas Pitre
2017-10-05 21:15                           ` Nicolas Pitre
2017-10-05 23:49                           ` Chris Brandt
2017-10-05 23:49                             ` Chris Brandt
2017-09-27 23:32 ` [PATCH v4 5/5] cramfs: rehabilitate it Nicolas Pitre
2017-09-27 23:32   ` Nicolas Pitre
