mm-commits Archive on lore.kernel.org
* incoming
@ 2021-02-09 21:41 Andrew Morton
  2021-02-09 21:41 ` [patch 01/14] squashfs: avoid out of bounds writes in decompressors Andrew Morton
                   ` (14 more replies)
  0 siblings, 15 replies; 24+ messages in thread
From: Andrew Morton @ 2021-02-09 21:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-mm, mm-commits

14 patches, based on e0756cfc7d7cd08c98a53b6009c091a3f6a50be6.

Subsystems affected by this patch series:

  squashfs
  mm/kasan
  firmware
  mm/mremap
  mm/tmpfs
  mm/selftests
  MAINTAINERS
  mm/memcg
  mm/slub
  nilfs2

Subsystem: squashfs

    Phillip Lougher <phillip@squashfs.org.uk>:
    Patch series "Squashfs: fix BIO migration regression and add sanity checks":
      squashfs: avoid out of bounds writes in decompressors
      squashfs: add more sanity checks in id lookup
      squashfs: add more sanity checks in inode lookup
      squashfs: add more sanity checks in xattr id lookup

Subsystem: mm/kasan

    Andrey Konovalov <andreyknvl@google.com>:
      kasan: fix stack traces dependency for HW_TAGS

Subsystem: firmware

    Fangrui Song <maskray@google.com>:
      firmware_loader: align .builtin_fw to 8

Subsystem: mm/mremap

    Arnd Bergmann <arnd@arndb.de>:
      mm/mremap: fix BUILD_BUG_ON() error in get_extent

Subsystem: mm/tmpfs

    Seth Forshee <seth.forshee@canonical.com>:
      tmpfs: disallow CONFIG_TMPFS_INODE64 on s390
      tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha

Subsystem: mm/selftests

    Rong Chen <rong.a.chen@intel.com>:
      selftests/vm: rename file run_vmtests to run_vmtests.sh

Subsystem: MAINTAINERS

    Andrey Ryabinin <ryabinin.a.a@gmail.com>:
      MAINTAINERS: update Andrey Ryabinin's email address

Subsystem: mm/memcg

    Johannes Weiner <hannes@cmpxchg.org>:
      Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"

Subsystem: mm/slub

    Vlastimil Babka <vbabka@suse.cz>:
      mm, slub: better heuristic for number of cpus when calculating slab order

Subsystem: nilfs2

    Joachim Henke <joachim.henke@t-systems.com>:
      nilfs2: make splice write available again

 .mailmap                          |    1 
 Documentation/dev-tools/kasan.rst |    3 -
 MAINTAINERS                       |    2 -
 fs/Kconfig                        |    4 +-
 fs/nilfs2/file.c                  |    1 
 fs/squashfs/block.c               |    8 ++++
 fs/squashfs/export.c              |   41 +++++++++++++++++++----
 fs/squashfs/id.c                  |   40 ++++++++++++++++++-----
 fs/squashfs/squashfs_fs_sb.h      |    1 
 fs/squashfs/super.c               |    6 +--
 fs/squashfs/xattr.h               |   10 +++++
 fs/squashfs/xattr_id.c            |   66 ++++++++++++++++++++++++++++++++------
 include/asm-generic/vmlinux.lds.h |    2 -
 mm/kasan/hw_tags.c                |    8 +---
 mm/memcontrol.c                   |    5 +-
 mm/mremap.c                       |    5 +-
 mm/slub.c                         |   18 +++++++++-
 17 files changed, 172 insertions(+), 49 deletions(-)



* [patch 01/14] squashfs: avoid out of bounds writes in decompressors
  2021-02-09 21:41 incoming Andrew Morton
@ 2021-02-09 21:41 ` Andrew Morton
  2021-02-09 21:41 ` [patch 02/14] squashfs: add more sanity checks in id lookup Andrew Morton
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2021-02-09 21:41 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, phillip, pliard, stable, torvalds

From: Phillip Lougher <phillip@squashfs.org.uk>
Subject: squashfs: avoid out of bounds writes in decompressors

Patch series "Squashfs: fix BIO migration regression and add sanity checks".

Patch [1/4] fixes a regression introduced by the "migrate from ll_rw_block
usage to BIO" patch, which has produced a number of syzbot/syzkaller
reports.

Patches [2/4], [3/4], and [4/4] fix a number of filesystem corruption
issues which have produced syzbot reports in the id, inode and xattr
lookup code.

Each patch has been tested against the syzbot reproducers using the given
kernel configuration.  They have the appropriate "Reported-by:" lines
added.

Additionally, all of the reproducer filesystems are indirectly fixed by
patch [4/4] due to the fact they all have xattr corruption which is now
detected there.

Additional testing with other configurations and architectures (32bit, big
endian), and normal filesystems has also been done to trap any inadvertent
regressions caused by the additional sanity checks.


This patch (of 4):

This is a regression introduced by the patch "migrate from ll_rw_block
usage to BIO".

syzbot/syzkaller has reported a number of "out of bounds writes" and
"unable to handle kernel paging request in squashfs_decompress" errors
which have been identified as a regression introduced by the above patch.

Specifically, the patch removed the following sanity check:

if (length < 0 || length > output->length ||
		(index + length) > msblk->bytes_used)

This check did two things:

1. It ensured any reads were not beyond the end of the filesystem

2. It ensured that the "length" field read from the filesystem
   was within the expected maximum length.  Without this any
   corrupted values can over-run allocated buffers.

Link: https://lkml.kernel.org/r/20210204130249.4495-1-phillip@squashfs.org.uk
Link: https://lkml.kernel.org/r/20210204130249.4495-2-phillip@squashfs.org.uk
Fixes: 93e72b3c612adc ("squashfs: migrate from ll_rw_block usage to BIO")
Reported-by: syzbot+6fba78f99b9afd4b5634@syzkaller.appspotmail.com
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Philippe Liard <pliard@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/squashfs/block.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/fs/squashfs/block.c~squashfs-avoid-out-of-bounds-writes-in-decompressors
+++ a/fs/squashfs/block.c
@@ -196,9 +196,15 @@ int squashfs_read_data(struct super_bloc
 		length = SQUASHFS_COMPRESSED_SIZE(length);
 		index += 2;
 
-		TRACE("Block @ 0x%llx, %scompressed size %d\n", index,
+		TRACE("Block @ 0x%llx, %scompressed size %d\n", index - 2,
 		      compressed ? "" : "un", length);
 	}
+	if (length < 0 || length > output->length ||
+			(index + length) > msblk->bytes_used) {
+		res = -EIO;
+		goto out;
+	}
+
 	if (next_index)
 		*next_index = index + length;
 
_
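
The restored check can be illustrated outside the kernel.  The following
is a minimal userspace sketch, not the kernel code: block_in_bounds() and
its parameter names are hypothetical stand-ins for the "length",
"output->length", "index" and "msblk->bytes_used" values used in
squashfs_read_data().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace sketch of the restored sanity check.  The function and
 * parameter names are hypothetical; only the comparisons mirror the patch.
 */
static bool block_in_bounds(int length, int output_len,
			    uint64_t index, uint64_t bytes_used)
{
	/* A corrupted length must not over-run the output buffer. */
	if (length < 0 || length > output_len)
		return false;

	/* length is non-negative here, so the widening cast is safe. */
	return index + (uint64_t)length <= bytes_used;
}
```

For example, block_in_bounds(100, 8192, 4000, 4096) is false because the
read would run past the end of a 4096 byte filesystem, while a length of 96
at the same index is accepted.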


* [patch 02/14] squashfs: add more sanity checks in id lookup
  2021-02-09 21:41 incoming Andrew Morton
  2021-02-09 21:41 ` [patch 01/14] squashfs: avoid out of bounds writes in decompressors Andrew Morton
@ 2021-02-09 21:41 ` Andrew Morton
  2021-02-09 21:41 ` [patch 03/14] squashfs: add more sanity checks in inode lookup Andrew Morton
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2021-02-09 21:41 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, phillip, stable, torvalds

From: Phillip Lougher <phillip@squashfs.org.uk>
Subject: squashfs: add more sanity checks in id lookup

syzbot has reported a number of "slab-out-of-bounds read" and
"use-after-free read" errors which have been identified as being caused by
a corrupted index value read from the inode.  This could be because the
metadata block is uncompressed, or because the "compression" bit has been
corrupted (turning a compressed block into an uncompressed block).

This patch adds additional sanity checks to detect this, and the
following corruption.

1. It checks against corruption of the ids count.  This can lead to
   either a larger or a smaller table than expected being read.

   In the case of a too large ids count, this would often have been
   trapped by the existing sanity checks, but this patch introduces
   a more exact check, which can identify too small values.

2. It checks the contents of the index table for corruption.

Link: https://lkml.kernel.org/r/20210204130249.4495-3-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: syzbot+b06d57ba83f604522af2@syzkaller.appspotmail.com
Reported-by: syzbot+c021ba012da41ee9807c@syzkaller.appspotmail.com
Reported-by: syzbot+5024636e8b5fd19f0f19@syzkaller.appspotmail.com
Reported-by: syzbot+bcbc661df46657d0fa4f@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/squashfs/id.c             |   40 ++++++++++++++++++++++++++-------
 fs/squashfs/squashfs_fs_sb.h |    1 
 fs/squashfs/super.c          |    6 ++--
 fs/squashfs/xattr.h          |   10 +++++++-
 4 files changed, 45 insertions(+), 12 deletions(-)

--- a/fs/squashfs/id.c~squashfs-add-more-sanity-checks-in-id-lookup
+++ a/fs/squashfs/id.c
@@ -35,10 +35,15 @@ int squashfs_get_id(struct super_block *
 	struct squashfs_sb_info *msblk = sb->s_fs_info;
 	int block = SQUASHFS_ID_BLOCK(index);
 	int offset = SQUASHFS_ID_BLOCK_OFFSET(index);
-	u64 start_block = le64_to_cpu(msblk->id_table[block]);
+	u64 start_block;
 	__le32 disk_id;
 	int err;
 
+	if (index >= msblk->ids)
+		return -EINVAL;
+
+	start_block = le64_to_cpu(msblk->id_table[block]);
+
 	err = squashfs_read_metadata(sb, &disk_id, &start_block, &offset,
 							sizeof(disk_id));
 	if (err < 0)
@@ -56,7 +61,10 @@ __le64 *squashfs_read_id_index_table(str
 		u64 id_table_start, u64 next_table, unsigned short no_ids)
 {
 	unsigned int length = SQUASHFS_ID_BLOCK_BYTES(no_ids);
+	unsigned int indexes = SQUASHFS_ID_BLOCKS(no_ids);
+	int n;
 	__le64 *table;
+	u64 start, end;
 
 	TRACE("In read_id_index_table, length %d\n", length);
 
@@ -67,20 +75,36 @@ __le64 *squashfs_read_id_index_table(str
 		return ERR_PTR(-EINVAL);
 
 	/*
-	 * length bytes should not extend into the next table - this check
-	 * also traps instances where id_table_start is incorrectly larger
-	 * than the next table start
+	 * The computed size of the index table (length bytes) should exactly
+	 * match the table start and end points
 	 */
-	if (id_table_start + length > next_table)
+	if (length != (next_table - id_table_start))
 		return ERR_PTR(-EINVAL);
 
 	table = squashfs_read_table(sb, id_table_start, length);
+	if (IS_ERR(table))
+		return table;
 
 	/*
-	 * table[0] points to the first id lookup table metadata block, this
-	 * should be less than id_table_start
+	 * table[0], table[1], ... table[indexes - 1] store the locations
+	 * of the compressed id blocks.   Each entry should be less than
+	 * the next (i.e. table[0] < table[1]), and the difference between them
+	 * should be SQUASHFS_METADATA_SIZE or less.  table[indexes - 1]
+	 * should be less than id_table_start, and again the difference
+	 * should be SQUASHFS_METADATA_SIZE or less
 	 */
-	if (!IS_ERR(table) && le64_to_cpu(table[0]) >= id_table_start) {
+	for (n = 0; n < (indexes - 1); n++) {
+		start = le64_to_cpu(table[n]);
+		end = le64_to_cpu(table[n + 1]);
+
+		if (start >= end || (end - start) > SQUASHFS_METADATA_SIZE) {
+			kfree(table);
+			return ERR_PTR(-EINVAL);
+		}
+	}
+
+	start = le64_to_cpu(table[indexes - 1]);
+	if (start >= id_table_start || (id_table_start - start) > SQUASHFS_METADATA_SIZE) {
 		kfree(table);
 		return ERR_PTR(-EINVAL);
 	}
--- a/fs/squashfs/squashfs_fs_sb.h~squashfs-add-more-sanity-checks-in-id-lookup
+++ a/fs/squashfs/squashfs_fs_sb.h
@@ -64,5 +64,6 @@ struct squashfs_sb_info {
 	unsigned int				inodes;
 	unsigned int				fragments;
 	int					xattr_ids;
+	unsigned int				ids;
 };
 #endif
--- a/fs/squashfs/super.c~squashfs-add-more-sanity-checks-in-id-lookup
+++ a/fs/squashfs/super.c
@@ -166,6 +166,7 @@ static int squashfs_fill_super(struct su
 	msblk->directory_table = le64_to_cpu(sblk->directory_table_start);
 	msblk->inodes = le32_to_cpu(sblk->inodes);
 	msblk->fragments = le32_to_cpu(sblk->fragments);
+	msblk->ids = le16_to_cpu(sblk->no_ids);
 	flags = le16_to_cpu(sblk->flags);
 
 	TRACE("Found valid superblock on %pg\n", sb->s_bdev);
@@ -177,7 +178,7 @@ static int squashfs_fill_super(struct su
 	TRACE("Block size %d\n", msblk->block_size);
 	TRACE("Number of inodes %d\n", msblk->inodes);
 	TRACE("Number of fragments %d\n", msblk->fragments);
-	TRACE("Number of ids %d\n", le16_to_cpu(sblk->no_ids));
+	TRACE("Number of ids %d\n", msblk->ids);
 	TRACE("sblk->inode_table_start %llx\n", msblk->inode_table);
 	TRACE("sblk->directory_table_start %llx\n", msblk->directory_table);
 	TRACE("sblk->fragment_table_start %llx\n",
@@ -236,8 +237,7 @@ static int squashfs_fill_super(struct su
 allocate_id_index_table:
 	/* Allocate and read id index table */
 	msblk->id_table = squashfs_read_id_index_table(sb,
-		le64_to_cpu(sblk->id_table_start), next_table,
-		le16_to_cpu(sblk->no_ids));
+		le64_to_cpu(sblk->id_table_start), next_table, msblk->ids);
 	if (IS_ERR(msblk->id_table)) {
 		errorf(fc, "unable to read id index table");
 		err = PTR_ERR(msblk->id_table);
--- a/fs/squashfs/xattr.h~squashfs-add-more-sanity-checks-in-id-lookup
+++ a/fs/squashfs/xattr.h
@@ -17,8 +17,16 @@ extern int squashfs_xattr_lookup(struct
 static inline __le64 *squashfs_read_xattr_id_table(struct super_block *sb,
 		u64 start, u64 *xattr_table_start, int *xattr_ids)
 {
+	struct squashfs_xattr_id_table *id_table;
+
+	id_table = squashfs_read_table(sb, start, sizeof(*id_table));
+	if (IS_ERR(id_table))
+		return (__le64 *) id_table;
+
+	*xattr_table_start = le64_to_cpu(id_table->xattr_table_start);
+	kfree(id_table);
+
 	ERROR("Xattrs in filesystem, these will be ignored\n");
-	*xattr_table_start = start;
 	return ERR_PTR(-ENOTSUPP);
 }
 
_
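
The index table walk added above can be sketched in userspace as follows.
This is only an illustration under stated assumptions: index_table_valid()
is a hypothetical name, METADATA_SIZE assumes SQUASHFS_METADATA_SIZE is
8192, and the table entries are assumed to be already converted to host
byte order (the kernel uses le64_to_cpu() for that).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define METADATA_SIZE 8192u	/* assumed value of SQUASHFS_METADATA_SIZE */

/*
 * Hypothetical helper mirroring the checks in this patch: entries must be
 * strictly increasing, no more than METADATA_SIZE apart, and the last
 * entry must lie below table_start by no more than METADATA_SIZE.
 */
static bool index_table_valid(const uint64_t *table, size_t indexes,
			      uint64_t table_start)
{
	size_t n;
	uint64_t last;

	if (indexes == 0)
		return false;

	for (n = 0; n + 1 < indexes; n++) {
		uint64_t start = table[n];
		uint64_t end = table[n + 1];

		if (start >= end || end - start > METADATA_SIZE)
			return false;
	}

	last = table[indexes - 1];
	return last < table_start && table_start - last <= METADATA_SIZE;
}
```

A table of { 96, 4000, 9000 } with the index table starting at 12000
passes, whereas { 4000, 96 } fails the ordering test and { 96, 8289 } fails
the distance test.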


* [patch 03/14] squashfs: add more sanity checks in inode lookup
  2021-02-09 21:41 incoming Andrew Morton
  2021-02-09 21:41 ` [patch 01/14] squashfs: avoid out of bounds writes in decompressors Andrew Morton
  2021-02-09 21:41 ` [patch 02/14] squashfs: add more sanity checks in id lookup Andrew Morton
@ 2021-02-09 21:41 ` Andrew Morton
  2021-02-09 21:42 ` [patch 04/14] squashfs: add more sanity checks in xattr id lookup Andrew Morton
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2021-02-09 21:41 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, phillip, stable, torvalds

From: Phillip Lougher <phillip@squashfs.org.uk>
Subject: squashfs: add more sanity checks in inode lookup

syzbot has reported a "slab-out-of-bounds read" error which has been
identified as being caused by a corrupted "ino_num" value read from the
inode.  This could be because the metadata block is uncompressed, or
because the "compression" bit has been corrupted (turning a compressed
block into an uncompressed block).

This patch adds additional sanity checks to detect this, and the following
corruption.

1. It checks against corruption of the inodes count.  This can lead to
   either a larger or a smaller table than expected being read.

   In the case of a too large inodes count, this would often have been
   trapped by the existing sanity checks, but this patch introduces
   a more exact check, which can identify too small values.

2. It checks the contents of the index table for corruption.

[phillip@squashfs.org.uk: fix checkpatch issue]
  Link: https://lkml.kernel.org/r/527909353.754618.1612769948607@webmail.123-reg.co.uk
Link: https://lkml.kernel.org/r/20210204130249.4495-4-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: syzbot+04419e3ff19d2970ea28@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/squashfs/export.c |   41 +++++++++++++++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 8 deletions(-)

--- a/fs/squashfs/export.c~squashfs-add-more-sanity-checks-in-inode-lookup
+++ a/fs/squashfs/export.c
@@ -41,12 +41,17 @@ static long long squashfs_inode_lookup(s
 	struct squashfs_sb_info *msblk = sb->s_fs_info;
 	int blk = SQUASHFS_LOOKUP_BLOCK(ino_num - 1);
 	int offset = SQUASHFS_LOOKUP_BLOCK_OFFSET(ino_num - 1);
-	u64 start = le64_to_cpu(msblk->inode_lookup_table[blk]);
+	u64 start;
 	__le64 ino;
 	int err;
 
 	TRACE("Entered squashfs_inode_lookup, inode_number = %d\n", ino_num);
 
+	if (ino_num == 0 || (ino_num - 1) >= msblk->inodes)
+		return -EINVAL;
+
+	start = le64_to_cpu(msblk->inode_lookup_table[blk]);
+
 	err = squashfs_read_metadata(sb, &ino, &start, &offset, sizeof(ino));
 	if (err < 0)
 		return err;
@@ -111,7 +116,10 @@ __le64 *squashfs_read_inode_lookup_table
 		u64 lookup_table_start, u64 next_table, unsigned int inodes)
 {
 	unsigned int length = SQUASHFS_LOOKUP_BLOCK_BYTES(inodes);
+	unsigned int indexes = SQUASHFS_LOOKUP_BLOCKS(inodes);
+	int n;
 	__le64 *table;
+	u64 start, end;
 
 	TRACE("In read_inode_lookup_table, length %d\n", length);
 
@@ -121,20 +129,37 @@ __le64 *squashfs_read_inode_lookup_table
 	if (inodes == 0)
 		return ERR_PTR(-EINVAL);
 
-	/* length bytes should not extend into the next table - this check
-	 * also traps instances where lookup_table_start is incorrectly larger
-	 * than the next table start
+	/*
+	 * The computed size of the lookup table (length bytes) should exactly
+	 * match the table start and end points
 	 */
-	if (lookup_table_start + length > next_table)
+	if (length != (next_table - lookup_table_start))
 		return ERR_PTR(-EINVAL);
 
 	table = squashfs_read_table(sb, lookup_table_start, length);
+	if (IS_ERR(table))
+		return table;
 
 	/*
-	 * table[0] points to the first inode lookup table metadata block,
-	 * this should be less than lookup_table_start
+	 * table[0], table[1], ... table[indexes - 1] store the locations
+	 * of the compressed inode lookup blocks.  Each entry should be
+	 * less than the next (i.e. table[0] < table[1]), and the difference
+	 * between them should be SQUASHFS_METADATA_SIZE or less.
+	 * table[indexes - 1] should be less than lookup_table_start, and
+	 * again the difference should be SQUASHFS_METADATA_SIZE or less
 	 */
-	if (!IS_ERR(table) && le64_to_cpu(table[0]) >= lookup_table_start) {
+	for (n = 0; n < (indexes - 1); n++) {
+		start = le64_to_cpu(table[n]);
+		end = le64_to_cpu(table[n + 1]);
+
+		if (start >= end || (end - start) > SQUASHFS_METADATA_SIZE) {
+			kfree(table);
+			return ERR_PTR(-EINVAL);
+		}
+	}
+
+	start = le64_to_cpu(table[indexes - 1]);
+	if (start >= lookup_table_start || (lookup_table_start - start) > SQUASHFS_METADATA_SIZE) {
 		kfree(table);
 		return ERR_PTR(-EINVAL);
 	}
_
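
Why the exact size check catches counts that are too small can be shown
with a little arithmetic.  The sketch below is a userspace approximation:
the helper names are hypothetical, and it assumes that each lookup entry is
8 bytes, that SQUASHFS_METADATA_SIZE is 8192, and that the index table
holds one 8 byte pointer per metadata block.

```c
#include <stdbool.h>
#include <stdint.h>

#define METADATA_SIZE 8192u	/* assumed value of SQUASHFS_METADATA_SIZE */

/* Number of metadata blocks needed to hold "inodes" 8 byte entries. */
static unsigned int lookup_blocks(unsigned int inodes)
{
	uint64_t bytes = (uint64_t)inodes * sizeof(uint64_t);

	return (unsigned int)((bytes + METADATA_SIZE - 1) / METADATA_SIZE);
}

/* Size in bytes of the index table: one 8 byte pointer per block. */
static unsigned int lookup_index_bytes(unsigned int inodes)
{
	return lookup_blocks(inodes) * (unsigned int)sizeof(uint64_t);
}

/* The new check: the computed size must match the on-disk extent exactly. */
static bool lookup_table_size_ok(unsigned int inodes, uint64_t table_start,
				 uint64_t next_table)
{
	return next_table - table_start == lookup_index_bytes(inodes);
}

/* The new lookup check: inode numbers are 1-based and bounded by inodes. */
static bool ino_num_ok(unsigned int ino_num, unsigned int inodes)
{
	return ino_num != 0 && ino_num - 1 < inodes;
}
```

With 1025 inodes the index table needs two pointers (16 bytes).  A table
extent of 24 bytes would have passed the old "does not extend into the next
table" test, but it is rejected by the exact comparison.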


* [patch 04/14] squashfs: add more sanity checks in xattr id lookup
  2021-02-09 21:41 incoming Andrew Morton
                   ` (2 preceding siblings ...)
  2021-02-09 21:41 ` [patch 03/14] squashfs: add more sanity checks in inode lookup Andrew Morton
@ 2021-02-09 21:42 ` Andrew Morton
  2021-02-09 21:42 ` [patch 05/14] kasan: fix stack traces dependency for HW_TAGS Andrew Morton
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2021-02-09 21:42 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, phillip, stable, torvalds

From: Phillip Lougher <phillip@squashfs.org.uk>
Subject: squashfs: add more sanity checks in xattr id lookup

syzbot has reported a warning where a kmalloc() attempt exceeds the
maximum limit.  This has been identified as corruption of the xattr_ids
count when reading the xattr id lookup table.

This patch adds a number of additional sanity checks to detect this
corruption and others.

1. It checks for a corrupted xattr index read from the inode.  This could
   be because the metadata block is uncompressed, or because the
   "compression" bit has been corrupted (turning a compressed block
   into an uncompressed block).  This would cause an out of bounds read.

2. It checks against corruption of the xattr_ids count.  This can either
   lead to the above kmalloc failure, or a smaller than expected
   table to be read.

3. It checks the contents of the index table for corruption.

[phillip@squashfs.org.uk: fix checkpatch issue]
  Link: https://lkml.kernel.org/r/270245655.754655.1612770082682@webmail.123-reg.co.uk
Link: https://lkml.kernel.org/r/20210204130249.4495-5-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
Reported-by: syzbot+2ccea6339d368360800d@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/squashfs/xattr_id.c |   66 +++++++++++++++++++++++++++++++++------
 1 file changed, 57 insertions(+), 9 deletions(-)

--- a/fs/squashfs/xattr_id.c~squashfs-add-more-sanity-checks-in-xattr-id-lookup
+++ a/fs/squashfs/xattr_id.c
@@ -31,10 +31,15 @@ int squashfs_xattr_lookup(struct super_b
 	struct squashfs_sb_info *msblk = sb->s_fs_info;
 	int block = SQUASHFS_XATTR_BLOCK(index);
 	int offset = SQUASHFS_XATTR_BLOCK_OFFSET(index);
-	u64 start_block = le64_to_cpu(msblk->xattr_id_table[block]);
+	u64 start_block;
 	struct squashfs_xattr_id id;
 	int err;
 
+	if (index >= msblk->xattr_ids)
+		return -EINVAL;
+
+	start_block = le64_to_cpu(msblk->xattr_id_table[block]);
+
 	err = squashfs_read_metadata(sb, &id, &start_block, &offset,
 							sizeof(id));
 	if (err < 0)
@@ -50,13 +55,17 @@ int squashfs_xattr_lookup(struct super_b
 /*
  * Read uncompressed xattr id lookup table indexes from disk into memory
  */
-__le64 *squashfs_read_xattr_id_table(struct super_block *sb, u64 start,
+__le64 *squashfs_read_xattr_id_table(struct super_block *sb, u64 table_start,
 		u64 *xattr_table_start, int *xattr_ids)
 {
-	unsigned int len;
+	struct squashfs_sb_info *msblk = sb->s_fs_info;
+	unsigned int len, indexes;
 	struct squashfs_xattr_id_table *id_table;
+	__le64 *table;
+	u64 start, end;
+	int n;
 
-	id_table = squashfs_read_table(sb, start, sizeof(*id_table));
+	id_table = squashfs_read_table(sb, table_start, sizeof(*id_table));
 	if (IS_ERR(id_table))
 		return (__le64 *) id_table;
 
@@ -70,13 +79,52 @@ __le64 *squashfs_read_xattr_id_table(str
 	if (*xattr_ids == 0)
 		return ERR_PTR(-EINVAL);
 
-	/* xattr_table should be less than start */
-	if (*xattr_table_start >= start)
+	len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids);
+	indexes = SQUASHFS_XATTR_BLOCKS(*xattr_ids);
+
+	/*
+	 * The computed size of the index table (len bytes) should exactly
+	 * match the table start and end points
+	 */
+	start = table_start + sizeof(*id_table);
+	end = msblk->bytes_used;
+
+	if (len != (end - start))
 		return ERR_PTR(-EINVAL);
 
-	len = SQUASHFS_XATTR_BLOCK_BYTES(*xattr_ids);
+	table = squashfs_read_table(sb, start, len);
+	if (IS_ERR(table))
+		return table;
+
+	/* table[0], table[1], ... table[indexes - 1] store the locations
+	 * of the compressed xattr id blocks.  Each entry should be less than
+	 * the next (i.e. table[0] < table[1]), and the difference between them
+	 * should be SQUASHFS_METADATA_SIZE or less.  table[indexes - 1]
+	 * should be less than table_start, and again the difference
+	 * should be SQUASHFS_METADATA_SIZE or less.
+	 *
+	 * Finally xattr_table_start should be less than table[0].
+	 */
+	for (n = 0; n < (indexes - 1); n++) {
+		start = le64_to_cpu(table[n]);
+		end = le64_to_cpu(table[n + 1]);
+
+		if (start >= end || (end - start) > SQUASHFS_METADATA_SIZE) {
+			kfree(table);
+			return ERR_PTR(-EINVAL);
+		}
+	}
+
+	start = le64_to_cpu(table[indexes - 1]);
+	if (start >= table_start || (table_start - start) > SQUASHFS_METADATA_SIZE) {
+		kfree(table);
+		return ERR_PTR(-EINVAL);
+	}
 
-	TRACE("In read_xattr_index_table, length %d\n", len);
+	if (*xattr_table_start >= le64_to_cpu(table[0])) {
+		kfree(table);
+		return ERR_PTR(-EINVAL);
+	}
 
-	return squashfs_read_table(sb, start + sizeof(*id_table), len);
+	return table;
 }
_
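
The way the exact length comparison stops an oversized allocation can be
sketched as below.  This is a userspace illustration with hypothetical
names; it assumes a 16 byte struct squashfs_xattr_id, a 16 byte struct
squashfs_xattr_id_table header, and a SQUASHFS_METADATA_SIZE of 8192.

```c
#include <stdbool.h>
#include <stdint.h>

#define METADATA_SIZE		8192u	/* assumed SQUASHFS_METADATA_SIZE */
#define XATTR_ID_ENTRY_SIZE	16u	/* assumed sizeof(struct squashfs_xattr_id) */
#define XATTR_HEADER_SIZE	16u	/* assumed sizeof(struct squashfs_xattr_id_table) */

/*
 * Hypothetical helper: the index table follows the header and must end
 * exactly at bytes_used.  A corrupted xattr_ids count produces a length
 * that cannot match, so it is rejected before any table is allocated.
 */
static bool xattr_index_len_ok(uint32_t xattr_ids, uint64_t table_start,
			       uint64_t bytes_used)
{
	uint64_t bytes = (uint64_t)xattr_ids * XATTR_ID_ENTRY_SIZE;
	uint64_t indexes = (bytes + METADATA_SIZE - 1) / METADATA_SIZE;
	uint64_t len = indexes * sizeof(uint64_t);
	uint64_t start = table_start + XATTR_HEADER_SIZE;

	return xattr_ids != 0 && bytes_used - start == len;
}
```

With the header at offset 1000 and the filesystem ending at 1024 there is
room for exactly one index entry, so a count of 512 is accepted while a
corrupted count of 0xffffffff is refused.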


* [patch 05/14] kasan: fix stack traces dependency for HW_TAGS
  2021-02-09 21:41 incoming Andrew Morton
                   ` (3 preceding siblings ...)
  2021-02-09 21:42 ` [patch 04/14] squashfs: add more sanity checks in xattr id lookup Andrew Morton
@ 2021-02-09 21:42 ` Andrew Morton
  2021-02-09 21:42 ` [patch 06/14] firmware_loader: align .builtin_fw to 8 Andrew Morton
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2021-02-09 21:42 UTC (permalink / raw)
  To: akpm, andreyknvl, aryabinin, Branislav.Rankov, catalin.marinas,
	dvyukov, elver, eugenis, glider, kevin.brodsky, linux-mm,
	mm-commits, pcc, torvalds, vincenzo.frascino, will.deacon

From: Andrey Konovalov <andreyknvl@google.com>
Subject: kasan: fix stack traces dependency for HW_TAGS

Currently, whether the alloc/free stack traces collection is enabled by
default for hardware tag-based KASAN depends on CONFIG_DEBUG_KERNEL.  The
intention for this dependency was to only enable collection on slow debug
kernels due to a significant perf and memory impact.

As it turns out, CONFIG_DEBUG_KERNEL is not considered a debug option and
is enabled on many production kernels including Android and Ubuntu.  As
a result, this dependency is pointless and only complicates the code and
documentation.

Having stack traces collection disabled by default would make the hardware
mode work differently to the software ones, which is confusing.

This change removes the dependency and enables stack traces collection by
default.

Looking into the future, this default might make sense for production
kernels, assuming we implement a fast stack trace collection approach.

Link: https://lkml.kernel.org/r/6678d77ceffb71f1cff2cf61560e2ffe7bb6bfe9.1612808820.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Marco Elver <elver@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Evgenii Stepanov <eugenis@google.com>
Cc: Branislav Rankov <Branislav.Rankov@arm.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 Documentation/dev-tools/kasan.rst |    3 +--
 mm/kasan/hw_tags.c                |    8 ++------
 2 files changed, 3 insertions(+), 8 deletions(-)

--- a/Documentation/dev-tools/kasan.rst~kasan-fix-stack-traces-dependency-for-hw_tags
+++ a/Documentation/dev-tools/kasan.rst
@@ -163,8 +163,7 @@ particular KASAN features.
 - ``kasan=off`` or ``=on`` controls whether KASAN is enabled (default: ``on``).
 
 - ``kasan.stacktrace=off`` or ``=on`` disables or enables alloc and free stack
-  traces collection (default: ``on`` for ``CONFIG_DEBUG_KERNEL=y``, otherwise
-  ``off``).
+  traces collection (default: ``on``).
 
 - ``kasan.fault=report`` or ``=panic`` controls whether to only print a KASAN
   report or also panic the kernel (default: ``report``).
--- a/mm/kasan/hw_tags.c~kasan-fix-stack-traces-dependency-for-hw_tags
+++ a/mm/kasan/hw_tags.c
@@ -134,12 +134,8 @@ void __init kasan_init_hw_tags(void)
 
 	switch (kasan_arg_stacktrace) {
 	case KASAN_ARG_STACKTRACE_DEFAULT:
-		/*
-		 * Default to enabling stack trace collection for
-		 * debug kernels.
-		 */
-		if (IS_ENABLED(CONFIG_DEBUG_KERNEL))
-			static_branch_enable(&kasan_flag_stacktrace);
+		/* Default to enabling stack trace collection. */
+		static_branch_enable(&kasan_flag_stacktrace);
 		break;
 	case KASAN_ARG_STACKTRACE_OFF:
 		/* Do nothing, kasan_flag_stacktrace keeps its default value. */
_


* [patch 06/14] firmware_loader: align .builtin_fw to 8
  2021-02-09 21:41 incoming Andrew Morton
                   ` (4 preceding siblings ...)
  2021-02-09 21:42 ` [patch 05/14] kasan: fix stack traces dependency for HW_TAGS Andrew Morton
@ 2021-02-09 21:42 ` Andrew Morton
  2021-02-09 21:42 ` [patch 07/14] mm/mremap: fix BUILD_BUG_ON() error in get_extent Andrew Morton
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2021-02-09 21:42 UTC (permalink / raw)
  To: akpm, arnd, dianders, linux-mm, lkp, maskray, mm-commits, nathan,
	ndesaulniers, torvalds

From: Fangrui Song <maskray@google.com>
Subject: firmware_loader: align .builtin_fw to 8

arm64 references the start address of .builtin_fw (__start_builtin_fw)
with a pair of R_AARCH64_ADR_PREL_PG_HI21/R_AARCH64_LDST64_ABS_LO12_NC
relocations.  The compiler is allowed to emit the
R_AARCH64_LDST64_ABS_LO12_NC relocation because struct builtin_fw in
include/linux/firmware.h is 8-byte aligned.

The R_AARCH64_LDST64_ABS_LO12_NC relocation requires the address to be a
multiple of 8, which may not be the case if .builtin_fw is empty. 
Unconditionally align .builtin_fw to fix the linker error.  32-bit
architectures could use ALIGN(4) but that would add unnecessary
complexity, so just use ALIGN(8).

Link: https://lkml.kernel.org/r/20201208054646.2913063-1-maskray@google.com
Link: https://github.com/ClangBuiltLinux/linux/issues/1204
Fixes: 5658c76 ("firmware: allow firmware files to be built into kernel image")
Signed-off-by: Fangrui Song <maskray@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/asm-generic/vmlinux.lds.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/include/asm-generic/vmlinux.lds.h~firmware_loader-align-builtin_fw-to-8
+++ a/include/asm-generic/vmlinux.lds.h
@@ -459,7 +459,7 @@
 	}								\
 									\
 	/* Built-in firmware blobs */					\
-	.builtin_fw        : AT(ADDR(.builtin_fw) - LOAD_OFFSET) {	\
+	.builtin_fw : AT(ADDR(.builtin_fw) - LOAD_OFFSET) ALIGN(8) {	\
 		__start_builtin_fw = .;					\
 		KEEP(*(.builtin_fw))					\
 		__end_builtin_fw = .;					\
_
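
The alignment requirement can be shown with plain address arithmetic.  The
sketch below is illustrative only: align_up() and ldst64_lo12_encodable()
are hypothetical helpers, and the second one models the constraint that an
R_AARCH64_LDST64_ABS_LO12_NC relocation needs the low 12 bits of the
address to be a multiple of 8.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align is a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Hypothetical model of the relocation constraint: a 64-bit load/store
 * immediate is scaled by 8, so the page offset must be 8-byte aligned.
 */
static bool ldst64_lo12_encodable(uint64_t addr)
{
	return ((addr & 0xfff) % 8) == 0;
}
```

An empty section that happens to start at 0x1004 is not encodable, but
after align_up(0x1004, 8) it starts at 0x1008 and is.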


* [patch 07/14] mm/mremap: fix BUILD_BUG_ON() error in get_extent
  2021-02-09 21:41 incoming Andrew Morton
                   ` (5 preceding siblings ...)
  2021-02-09 21:42 ` [patch 06/14] firmware_loader: align .builtin_fw to 8 Andrew Morton
@ 2021-02-09 21:42 ` Andrew Morton
  2021-02-09 21:42 ` [patch 08/14] tmpfs: disallow CONFIG_TMPFS_INODE64 on s390 Andrew Morton
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2021-02-09 21:42 UTC (permalink / raw)
  To: 0x7f454c46, akpm, arnd, bgeffon, kirill.shutemov, linux-mm,
	mm-commits, natechancellor, ndesaulniers, richard.weiyang,
	sedat.dilek, torvalds, vbabka

From: Arnd Bergmann <arnd@arndb.de>
Subject: mm/mremap: fix BUILD_BUG_ON() error in get_extent

clang can't evaluate this function argument at compile time when the
function is not inlined, which leads to a link time failure:

ld.lld: error: undefined symbol: __compiletime_assert_414
>>> referenced by mremap.c
>>>               mremap.o:(get_extent) in archive mm/built-in.a

Mark the function as __always_inline to avoid it.

Link: https://lkml.kernel.org/r/20201230154104.522605-1-arnd@kernel.org
Fixes: 9ad9718bfa41 ("mm/mremap: calculate extent in one place")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/mremap.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/mm/mremap.c~mm-mremap-fix-build_bug_on-error-in-get_extent
+++ a/mm/mremap.c
@@ -336,8 +336,9 @@ enum pgt_entry {
  * valid. Else returns a smaller extent bounded by the end of the source and
  * destination pgt_entry.
  */
-static unsigned long get_extent(enum pgt_entry entry, unsigned long old_addr,
-			unsigned long old_end, unsigned long new_addr)
+static __always_inline unsigned long get_extent(enum pgt_entry entry,
+			unsigned long old_addr, unsigned long old_end,
+			unsigned long new_addr)
 {
 	unsigned long next, extent, mask, size;
 
_


* [patch 08/14] tmpfs: disallow CONFIG_TMPFS_INODE64 on s390
  2021-02-09 21:41 incoming Andrew Morton
                   ` (6 preceding siblings ...)
  2021-02-09 21:42 ` [patch 07/14] mm/mremap: fix BUILD_BUG_ON() error in get_extent Andrew Morton
@ 2021-02-09 21:42 ` Andrew Morton
  2021-02-09 21:42 ` [patch 09/14] tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha Andrew Morton
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2021-02-09 21:42 UTC (permalink / raw)
  To: akpm, amir73il, borntraeger, chris, gor, hca, hughd, linux-mm,
	mm-commits, seth.forshee, stable, torvalds

From: Seth Forshee <seth.forshee@canonical.com>
Subject: tmpfs: disallow CONFIG_TMPFS_INODE64 on s390

Currently there is an assumption in tmpfs that 64-bit architectures also
have a 64-bit ino_t.  This is not true on s390 which has a 32-bit ino_t. 
With CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and
display "inode64" in the mount options, but passing the "inode64" mount
option will fail.  This leads to the following behavior:

 # mkdir mnt
 # mount -t tmpfs nodev mnt
 # mount -o remount,rw mnt
 mount: /home/ubuntu/mnt: mount point not mounted or bad option.

The remount fails because mount sees "inode64" in the mount options and
thus passes it in the options for the remount.


So prevent CONFIG_TMPFS_INODE64 from being selected on s390.
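
The mismatch can be modeled with a small user-space sketch. Everything
here is hypothetical (struct sketch_arch and the sketch_* helpers are
invented for the example), and the rule that an explicit "inode64" is
accepted only when ino_t is 8 bytes wide is an assumption used to
reproduce the behaviour described above, not a quote of the tmpfs code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of an architecture, for illustration only. */
struct sketch_arch {
	bool is_64bit;		/* would satisfy "depends on 64BIT" */
	size_t ino_t_size;	/* sizeof(ino_t) in bytes */
};

/*
 * Would a default tmpfs mount advertise "inode64"? Before the fix only
 * the 64BIT dependency gated the config option.
 */
static bool sketch_shows_inode64(const struct sketch_arch *arch,
				 bool config_inode64)
{
	return config_inode64 && arch->is_64bit;
}

/* Assumed rule: an explicit "inode64" needs an 8-byte ino_t. */
static bool sketch_accepts_inode64(const struct sketch_arch *arch)
{
	return arch->ino_t_size >= 8;
}

/*
 * A remount passes the displayed options back in, so it fails exactly
 * when "inode64" is shown but would not be accepted.
 */
static bool sketch_remount_ok(const struct sketch_arch *arch,
			      bool config_inode64)
{
	if (!sketch_shows_inode64(arch, config_inode64))
		return true;
	return sketch_accepts_inode64(arch);
}
```

With a 64-bit architecture and a 4-byte ino_t, the remount only succeeds
when the config option is off, which is what the Kconfig change enforces.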

Link: https://lkml.kernel.org/r/20210205230620.518245-1-seth.forshee@canonical.com
Fixes: ea3271f7196c ("tmpfs: support 64-bit inums per-sb")
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: <stable@vger.kernel.org>	[5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/Kconfig~tmpfs-disallow-config_tmpfs_inode64-on-s390
+++ a/fs/Kconfig
@@ -203,7 +203,7 @@ config TMPFS_XATTR
 
 config TMPFS_INODE64
 	bool "Use 64-bit ino_t by default in tmpfs"
-	depends on TMPFS && 64BIT
+	depends on TMPFS && 64BIT && !S390
 	default n
 	help
 	  tmpfs has historically used only inode numbers as wide as an unsigned
_


* [patch 09/14] tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
  2021-02-09 21:41 incoming Andrew Morton
                   ` (7 preceding siblings ...)
  2021-02-09 21:42 ` [patch 08/14] tmpfs: disallow CONFIG_TMPFS_INODE64 on s390 Andrew Morton
@ 2021-02-09 21:42 ` Andrew Morton
  2021-02-09 22:03   ` Linus Torvalds
  2021-02-09 21:42 ` [patch 10/14] selftests/vm: rename file run_vmtests to run_vmtests.sh Andrew Morton
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2021-02-09 21:42 UTC (permalink / raw)
  To: akpm, amir73il, chris, hughd, ink, linux-mm, mattst88,
	mm-commits, rth, seth.forshee, stable, torvalds

From: Seth Forshee <seth.forshee@canonical.com>
Subject: tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha

As with s390, alpha is a 64-bit architecture with a 32-bit ino_t.  With
CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and
display "inode64" in the mount options, whereas passing "inode64" in the
mount options will fail.  This leads to erroneous behaviours such as this:

 # mkdir mnt
 # mount -t tmpfs nodev mnt
 # mount -o remount,rw mnt
 mount: /home/ubuntu/mnt: mount point not mounted or bad option.

Prevent CONFIG_TMPFS_INODE64 from being selected on alpha.

Link: https://lkml.kernel.org/r/20210208215726.608197-1-seth.forshee@canonical.com
Fixes: ea3271f7196c ("tmpfs: support 64-bit inums per-sb")
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Chris Down <chris@chrisdown.name>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: <stable@vger.kernel.org>	[5.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/Kconfig |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/Kconfig~tmpfs-disallow-config_tmpfs_inode64-on-alpha
+++ a/fs/Kconfig
@@ -203,7 +203,7 @@ config TMPFS_XATTR
 
 config TMPFS_INODE64
 	bool "Use 64-bit ino_t by default in tmpfs"
-	depends on TMPFS && 64BIT && !S390
+	depends on TMPFS && 64BIT && !(S390 || ALPHA)
 	default n
 	help
 	  tmpfs has historically used only inode numbers as wide as an unsigned
_


* [patch 10/14] selftests/vm: rename file run_vmtests to run_vmtests.sh
  2021-02-09 21:41 incoming Andrew Morton
                   ` (8 preceding siblings ...)
  2021-02-09 21:42 ` [patch 09/14] tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha Andrew Morton
@ 2021-02-09 21:42 ` Andrew Morton
  2021-02-09 21:42 ` [patch 11/14] MAINTAINERS: update Andrey Ryabinin's email address Andrew Morton
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2021-02-09 21:42 UTC (permalink / raw)
  To: akpm, jhubbard, linux-mm, lkp, mm-commits, rong.a.chen, torvalds

From: Rong Chen <rong.a.chen@intel.com>
Subject: selftests/vm: rename file run_vmtests to run_vmtests.sh

Commit c2aa8afc36fa renamed run_vmtests to run_vmtests.sh in the Makefile,
but the file itself still has the old name.

The kernel test robot reported the following issue:

 # selftests: vm: run_vmtests.sh
 # Warning: file run_vmtests.sh is missing!
 not ok 1 selftests: vm: run_vmtests.sh

Link: https://lkml.kernel.org/r/20210205085507.1479894-1-rong.a.chen@intel.com
Fixes: c2aa8afc36fa ("selftests/vm: rename run_vmtests --> run_vmtests.sh")
Signed-off-by: Rong Chen <rong.a.chen@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 tools/testing/selftests/vm/{run_vmtests => run_vmtests.sh} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename tools/testing/selftests/vm/{run_vmtests => run_vmtests.sh} (100%)

diff --git a/tools/testing/selftests/vm/run_vmtests b/tools/testing/selftests/vm/run_vmtests.sh
similarity index 100%
rename from tools/testing/selftests/vm/run_vmtests
rename to tools/testing/selftests/vm/run_vmtests.sh



* [patch 11/14] MAINTAINERS: update Andrey Ryabinin's email address
  2021-02-09 21:41 incoming Andrew Morton
                   ` (9 preceding siblings ...)
  2021-02-09 21:42 ` [patch 10/14] selftests/vm: rename file run_vmtests to run_vmtests.sh Andrew Morton
@ 2021-02-09 21:42 ` Andrew Morton
  2021-02-09 21:42 ` [patch 12/14] Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" Andrew Morton
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2021-02-09 21:42 UTC (permalink / raw)
  To: akpm, linux-mm, mm-commits, ryabinin.a.a, torvalds

From: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Subject: MAINTAINERS: update Andrey Ryabinin's email address

Update my email, @virtuozzo.com will stop working shortly.

Link: https://lkml.kernel.org/r/20210204223904.3824-1-ryabinin.a.a@gmail.com
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 .mailmap    |    1 +
 MAINTAINERS |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

--- a/.mailmap~maintainers-update-andrey-ryabinins-email-address
+++ a/.mailmap
@@ -37,6 +37,7 @@ Andrew Murray <amurray@thegoodpenguin.co
 Andrew Murray <amurray@thegoodpenguin.co.uk> <andrew.murray@arm.com>
 Andrew Vasquez <andrew.vasquez@qlogic.com>
 Andrey Ryabinin <ryabinin.a.a@gmail.com> <a.ryabinin@samsung.com>
+Andrey Ryabinin <ryabinin.a.a@gmail.com> <aryabinin@virtuozzo.com>
 Andy Adamson <andros@citi.umich.edu>
 Antoine Tenart <atenart@kernel.org> <antoine.tenart@bootlin.com>
 Antoine Tenart <atenart@kernel.org> <antoine.tenart@free-electrons.com>
--- a/MAINTAINERS~maintainers-update-andrey-ryabinins-email-address
+++ a/MAINTAINERS
@@ -9559,7 +9559,7 @@ F:	Documentation/hwmon/k8temp.rst
 F:	drivers/hwmon/k8temp.c
 
 KASAN
-M:	Andrey Ryabinin <aryabinin@virtuozzo.com>
+M:	Andrey Ryabinin <ryabinin.a.a@gmail.com>
 R:	Alexander Potapenko <glider@google.com>
 R:	Dmitry Vyukov <dvyukov@google.com>
 L:	kasan-dev@googlegroups.com
_


* [patch 12/14] Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"
  2021-02-09 21:41 incoming Andrew Morton
                   ` (10 preceding siblings ...)
  2021-02-09 21:42 ` [patch 11/14] MAINTAINERS: update Andrey Ryabinin's email address Andrew Morton
@ 2021-02-09 21:42 ` Andrew Morton
  2021-02-09 21:42 ` [patch 13/14] mm, slub: better heuristic for number of cpus when calculating slab order Andrew Morton
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2021-02-09 21:42 UTC (permalink / raw)
  To: akpm, chris, guro, hannes, linux-mm, mhocko, mkoutny, mm-commits,
	shakeelb, stable, tj, torvalds

From: Johannes Weiner <hannes@cmpxchg.org>
Subject: Revert "mm: memcontrol: avoid workload stalls when lowering memory.high"

This reverts commit 536d3bf261a2fc3b05b3e91e7eef7383443015cf, as it can
cause writers to memory.high to get stuck in the kernel forever,
performing page reclaim and consuming excessive amounts of CPU cycles.

Before the patch, a write to memory.high would first put the new limit in
place for the workload, and then reclaim the requested delta.  After the
patch, the kernel tries to reclaim the delta before putting the new limit
into place, in order to not overwhelm the workload with a sudden, large
excess over the limit.  However, if reclaim is actively racing with new
allocations from the uncurbed workload, it can keep the write() working
inside the kernel indefinitely.

This is causing problems in Facebook production.  A privileged
system-level daemon that adjusts memory.high for various workloads running
on a host can get unexpectedly stuck in the kernel and essentially turn
into a sort of involuntary kswapd for one of the workloads.  We've
observed that daemon busy-spin in a write() for minutes at a time,
neglecting its other duties on the system, and expending privileged system
resources on behalf of a workload.

To remedy this, we have first considered changing the reclaim logic to
break out after a couple of loops - whether the workload has converged to
the new limit or not - and bound the write() call this way.  However, the
root cause that inspired the sequence change in the first place has been
fixed through other means, and so a revert back to the proven
limit-setting sequence, also used by memory.max, is preferable.

The sequence was changed to avoid extreme latencies in the workload when
the limit was lowered: the sudden, large excess created by the limit
lowering would erroneously trigger the penalty sleeping code that is meant
to throttle excessive growth from below.  Allocating threads could end up
sleeping long after the write() had already reclaimed the delta for which
they were being punished.

However, erroneous throttling also caused problems in other scenarios at
around the same time.  This resulted in commit b3ff92916af3 ("mm, memcg:
reclaim more aggressively before high allocator throttling"), included in
the same release as the offending commit.  When allocating threads now
encounter large excess caused by a racing write() to memory.high, instead
of entering punitive sleeps, they will simply be tasked with helping
reclaim down the excess, and will be held no longer than it takes to
accomplish that.  This is in line with regular limit enforcement - i.e. 
if the workload allocates up against or over an otherwise unchanged limit
from below.

With the patch breaking userspace, and the root cause addressed by other
means already, revert it again.
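
The difference between the two orderings can be seen in a toy model. The
sketch below is a deterministic user-space simulation under loud
assumptions: the workload allocates a fixed number of pages per round
whenever it is below the limit currently in force, reclaim frees a fixed
batch per round, and max_rounds stands in for "indefinitely". All names
(struct sketch_memcg, sketch_write_high, and so on) are hypothetical, and
the real loop has further exit conditions that are not modeled.

```c
#include <stdbool.h>

struct sketch_memcg {
	unsigned long usage;	/* pages currently charged */
	unsigned long high;	/* limit currently in force, in pages */
};

/* Toy workload: it keeps allocating as long as it is below the limit. */
static void sketch_workload_step(struct sketch_memcg *cg, unsigned long alloc)
{
	if (cg->usage < cg->high)
		cg->usage += alloc;
}

/* Toy reclaim: frees up to 'batch' pages of the excess over 'target'. */
static void sketch_reclaim_step(struct sketch_memcg *cg, unsigned long target,
				unsigned long batch)
{
	unsigned long excess = cg->usage > target ? cg->usage - target : 0;

	cg->usage -= excess < batch ? excess : batch;
}

/*
 * Models a write to memory.high. Returns the number of reclaim rounds
 * spent; a return value of max_rounds means the loop never converged.
 */
static unsigned int sketch_write_high(struct sketch_memcg *cg,
				      unsigned long new_high,
				      bool set_limit_first,
				      unsigned long alloc,
				      unsigned long batch,
				      unsigned int max_rounds)
{
	unsigned int rounds = 0;

	if (set_limit_first)
		cg->high = new_high;	/* ordering restored by the revert */

	while (cg->usage > new_high && rounds < max_rounds) {
		sketch_reclaim_step(cg, new_high, batch);
		sketch_workload_step(cg, alloc);	/* racing allocations */
		rounds++;
	}

	cg->high = new_high;	/* ordering used by the reverted commit */
	return rounds;
}
```

When the limit goes in first, the workload is curbed and the loop drains
the excess in a bounded number of rounds. When the limit goes in last and
the workload allocates as fast as reclaim frees, the loop in this model
runs until the round cap.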

Link: https://lkml.kernel.org/r/20210122184341.292461-1-hannes@cmpxchg.org
Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Acked-by: Chris Down <chris@chrisdown.name>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: <stable@vger.kernel.org>	[5.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/memcontrol.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

--- a/mm/memcontrol.c~revert-mm-memcontrol-avoid-workload-stalls-when-lowering-memoryhigh
+++ a/mm/memcontrol.c
@@ -6271,6 +6271,8 @@ static ssize_t memory_high_write(struct
 	if (err)
 		return err;
 
+	page_counter_set_high(&memcg->memory, high);
+
 	for (;;) {
 		unsigned long nr_pages = page_counter_read(&memcg->memory);
 		unsigned long reclaimed;
@@ -6294,10 +6296,7 @@ static ssize_t memory_high_write(struct
 			break;
 	}
 
-	page_counter_set_high(&memcg->memory, high);
-
 	memcg_wb_domain_size_changed(memcg);
-
 	return nbytes;
 }
 
_


* [patch 13/14] mm, slub: better heuristic for number of cpus when calculating slab order
  2021-02-09 21:41 incoming Andrew Morton
                   ` (11 preceding siblings ...)
  2021-02-09 21:42 ` [patch 12/14] Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" Andrew Morton
@ 2021-02-09 21:42 ` Andrew Morton
  2021-02-10 14:34   ` Vlastimil Babka
  2021-02-09 21:42 ` [patch 14/14] nilfs2: make splice write available again Andrew Morton
  2021-02-10 19:30 ` incoming Linus Torvalds
  14 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2021-02-09 21:42 UTC (permalink / raw)
  To: akpm, aneesh.kumar, bharata, catalin.marinas, cl, guro, hannes,
	iamjoonsoo.kim, jannh, linux-mm, mgorman, mhocko, mm-commits,
	rientjes, shakeelb, stable, torvalds, vbabka, vincent.guittot,
	will

From: Vlastimil Babka <vbabka@suse.cz>
Subject: mm, slub: better heuristic for number of cpus when calculating slab order

When creating a new kmem cache, SLUB determines how large the slab pages will
be based on a number of inputs, including the number of CPUs in the system. Larger
slab pages mean that more objects can be allocated/freed from per-cpu slabs
before accessing shared structures, but also potentially more memory can be
wasted due to low slab usage and fragmentation.
The rough idea of using number of CPUs is that larger systems will be more
likely to benefit from reduced contention, and also should have enough memory
to spare.

Number of CPUs used to be determined as nr_cpu_ids, which is number of possible
cpus, but on some systems many will never be onlined, thus commit 045ab8c9487b
("mm/slub: let number of online CPUs determine the slub page order") changed it
to num_online_cpus(). However, for kmem caches created early before CPUs are
onlined, this may lead to permanently low slab page sizes.

Vincent reports a regression [1] of hackbench on arm64 systems:

> I'm facing significant performances regression on a large arm64 server
> system (224 CPUs). Regressions is also present on small arm64 system
> (8 CPUs) but in a far smaller order of magnitude

> On 224 CPUs system : 9 iterations of hackbench -l 16000 -g 16
> v5.11-rc4 : 9.135sec (+/- 0.45%)
> v5.11-rc4 + revert this patch: 3.173sec (+/- 0.48%)
> v5.10: 3.136sec (+/- 0.40%)

Mel reports a regression [2] of hackbench on x86_64, with lockstat suggesting
page allocator contention:

> i.e. the patch incurs a 7% to 32% performance penalty. This bisected
> cleanly yesterday when I was looking for the regression and then found
> the thread.

> Numerous caches change size. For example, kmalloc-512 goes from order-0
> (vanilla) to order-2 with the revert.

> So mostly this is down to the number of times SLUB calls into the page
> allocator which only caches order-0 pages on a per-cpu basis.

Clearly num_online_cpus() doesn't work too early in bootup. We could change
the order dynamically in a memory hotplug callback, but runtime order changing
for existing kmem caches has been already shown as dangerous, and removed in
32a6f409b693 ("mm, slub: remove runtime allocation order changes"). It could be
resurrected in a safe manner with some effort, but to fix the regression we
need something simpler.

We could use num_present_cpus() that should be the number of physically
present CPUs even before they are onlined.  That would work for PowerPC
[3], which triggered the original commit, but that still doesn't work on
arm64 [4] as explained in [5].

So this patch tries to determine the best available value without specific
arch knowledge.

- num_present_cpus() if the number is larger than 1, as that means the
  arch is likely setting it properly

- nr_cpu_ids otherwise

This should fix the reported regressions while also keeping the effect of
045ab8c9487b for PowerPC systems.  It's possible there are configurations
where num_present_cpus() is 1 during boot while nr_cpu_ids is at the same
time bloated, so these (if they exist) would keep the large orders based
on nr_cpu_ids as was before 045ab8c9487b.
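
The selection rule and the resulting default for min_objects can be
worked through in a standalone sketch. The helper names (sketch_fls,
sketch_min_objects) are hypothetical, sketch_fls is a plain
reimplementation of the "find last set bit" operation assumed to match
the kernel's fls(), and the slub_min_objects override and the later
clamping against max_objects are left out.

```c
/* 1-based position of the most significant set bit; 0 for x == 0. */
static unsigned int sketch_fls(unsigned int x)
{
	unsigned int pos = 0;

	while (x) {
		pos++;
		x >>= 1;
	}
	return pos;
}

/*
 * Default minimum number of objects per slab: trust the present-CPU count
 * unless it is just 1, in which case fall back to the possible-CPU count.
 */
static unsigned int sketch_min_objects(unsigned int present_cpus,
				       unsigned int possible_cpus)
{
	unsigned int nr_cpus = present_cpus;

	if (nr_cpus <= 1)
		nr_cpus = possible_cpus;
	return 4 * (sketch_fls(nr_cpus) + 1);
}
```

For example, 224 present CPUs give 4 * (8 + 1) = 36 objects, and a system
that reports 1 present CPU at boot with 224 possible CPUs gets the same
value, whereas a count of 1 for both yields 4 * (1 + 1) = 8.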

[1] https://lore.kernel.org/linux-mm/CAKfTPtA_JgMf_+zdFbcb_V9rM7JBWNPjAz9irgwFj7Rou=xzZg@mail.gmail.com/
[2] https://lore.kernel.org/linux-mm/20210128134512.GF3592@techsingularity.net/
[3] https://lore.kernel.org/linux-mm/20210123051607.GC2587010@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/CAKfTPtAjyVmS5VYvU6DBxg4-JEo5bdmWbngf-03YsY18cmWv_g@mail.gmail.com/
[5] https://lore.kernel.org/linux-mm/20210126230305.GD30941@willie-the-truck/

Link: https://lkml.kernel.org/r/20210208134108.22286-1-vbabka@suse.cz
Fixes: 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page order")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jann Horn <jannh@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Will Deacon <will@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/slub.c |   18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

--- a/mm/slub.c~mm-slub-better-heuristic-for-number-of-cpus-when-calculating-slab-order
+++ a/mm/slub.c
@@ -3423,6 +3423,7 @@ static inline int calculate_order(unsign
 	unsigned int order;
 	unsigned int min_objects;
 	unsigned int max_objects;
+	unsigned int nr_cpus;
 
 	/*
 	 * Attempt to find best configuration for a slab. This
@@ -3433,8 +3434,21 @@ static inline int calculate_order(unsign
 	 * we reduce the minimum objects required in a slab.
 	 */
 	min_objects = slub_min_objects;
-	if (!min_objects)
-		min_objects = 4 * (fls(num_online_cpus()) + 1);
+	if (!min_objects) {
+		/*
+		 * Some architectures will only update present cpus when
+		 * onlining them, so don't trust the number if it's just 1. But
+		 * we also don't want to use nr_cpu_ids always, as on some other
+		 * architectures, there can be many possible cpus, but never
+		 * onlined. Here we compromise between trying to avoid too high
+		 * order on systems that appear larger than they are, and too
+		 * low order on systems that appear smaller than they are.
+		 */
+		nr_cpus = num_present_cpus();
+		if (nr_cpus <= 1)
+			nr_cpus = nr_cpu_ids;
+		min_objects = 4 * (fls(nr_cpus) + 1);
+	}
 	max_objects = order_objects(slub_max_order, size);
 	min_objects = min(min_objects, max_objects);
 
_


* [patch 14/14] nilfs2: make splice write available again
  2021-02-09 21:41 incoming Andrew Morton
                   ` (12 preceding siblings ...)
  2021-02-09 21:42 ` [patch 13/14] mm, slub: better heuristic for number of cpus when calculating slab order Andrew Morton
@ 2021-02-09 21:42 ` Andrew Morton
  2021-02-10 19:30 ` incoming Linus Torvalds
  14 siblings, 0 replies; 24+ messages in thread
From: Andrew Morton @ 2021-02-09 21:42 UTC (permalink / raw)
  To: akpm, joachim.henke, konishi.ryusuke, linux-mm, mm-commits,
	stable, torvalds

From: Joachim Henke <joachim.henke@t-systems.com>
Subject: nilfs2: make splice write available again

Since 5.10, splice() or sendfile() to NILFS2 return EINVAL.  This was
caused by commit 36e2c7421f02 ("fs: don't allow splice read/write without
explicit ops").

This patch initializes the splice_write field in file_operations, like
most file systems do, to restore the functionality.

Link: https://lkml.kernel.org/r/1612784101-14353-1-git-send-email-konishi.ryusuke@gmail.com
Signed-off-by: Joachim Henke <joachim.henke@t-systems.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Cc: <stable@vger.kernel.org>	[5.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 fs/nilfs2/file.c |    1 +
 1 file changed, 1 insertion(+)

--- a/fs/nilfs2/file.c~nilfs2-make-splice-write-available-again
+++ a/fs/nilfs2/file.c
@@ -141,6 +141,7 @@ const struct file_operations nilfs_file_
 	/* .release	= nilfs_release_file, */
 	.fsync		= nilfs_sync_file,
 	.splice_read	= generic_file_splice_read,
+	.splice_write   = iter_file_splice_write,
 };
 
 const struct inode_operations nilfs_file_inode_operations = {
_


* Re: [patch 09/14] tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
  2021-02-09 21:42 ` [patch 09/14] tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha Andrew Morton
@ 2021-02-09 22:03   ` Linus Torvalds
  2021-02-10 13:34     ` Heiko Carstens
  0 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2021-02-09 22:03 UTC (permalink / raw)
  To: Andrew Morton, Christian Borntraeger, Heiko Carstens, Vasily Gorbik
  Cc: Amir Goldstein, Chris Down, Hugh Dickins, Ivan Kokshaysky,
	Linux-MM, Matt Turner, mm-commits, Richard Henderson,
	Seth Forshee, stable

On Tue, Feb 9, 2021 at 1:42 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> As with s390, alpha is a 64-bit architecture with a 32-bit ino_t.  With
> CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and
> display "inode64" in the mount options, whereas passing "inode64" in the
> mount options will fail.

Ugh.

The two patches for s390 and alpha are obviously the right thing to
do, but I do wonder if we could strive to make __kernel_ino_t go away
entirely.

It's actually not used very much, because it's such a nasty type, and
s390 and alpha are the only ones that override it from the default
"word length" version (and honestly, even that default is not a great
type).

The main use of it is for "ino_t" and for "struct ustat".

And yes, "ino_t" is widely used, but I think pretty much all uses of
it are entirely internal to the kernel, and we could just make it be
"unsigned long".

Does anybody see any actual user interfaces that depend on
"__kernel_ino_t", aka "ino_t" (apart from that "struct ustat")?

I guess this is mostly a question for s390, which is actively maintained?

           Linus


* Re: [patch 09/14] tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
  2021-02-09 22:03   ` Linus Torvalds
@ 2021-02-10 13:34     ` Heiko Carstens
  2021-02-10 17:27       ` Heiko Carstens
  2021-02-10 19:17       ` Linus Torvalds
  0 siblings, 2 replies; 24+ messages in thread
From: Heiko Carstens @ 2021-02-10 13:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Christian Borntraeger, Vasily Gorbik,
	Amir Goldstein, Chris Down, Hugh Dickins, Ivan Kokshaysky,
	Linux-MM, Matt Turner, mm-commits, Richard Henderson,
	Seth Forshee, stable, Arnd Bergmann, Ulrich Weigand, Tuan Hoang1

On Tue, Feb 09, 2021 at 02:03:19PM -0800, Linus Torvalds wrote:
> On Tue, Feb 9, 2021 at 1:42 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > As with s390, alpha is a 64-bit architecture with a 32-bit ino_t.  With
> > CONFIG_TMPFS_INODE64=y tmpfs mounts will get 64-bit inode numbers and
> > display "inode64" in the mount options, whereas passing "inode64" in the
> > mount options will fail.
> 
> Ugh.
> 
> The two patches for s390 and alpha are obviously the right thing to
> do, but I do wonder if we could strive to make __kernel_ino_t go away
> entirely.
> 
> It's actually not used very much, because it's such a nasty type, and
> s390 and alpha are the only ones that override it from the default
> "word length" version (and honestly, even that default is not a great
> type).
> 
> The main use of it is for "ino_t" and for "struct ustat".
> 
> And yes, "ino_t" is widely used, but I think pretty much all uses of
> it are entirely internal to the kernel, and we could just make it be
> "unsigned long".
> 
> Does anybody see any actual user interfaces that depend on
> "__kernel_ino_t", aka "ino_t" (apart from that "struct ustat")?
> 
> I guess this is mostly a question for s390, which is actively maintained?

I couldn't spot any and also gave the patch below a try and my system
still boots without any errors.
So, as far as I can tell it _should_ be ok to change this.

Note that the unusual 32 bit ino_t also recently caused a bug on
s390. See commit ebce3eb2f7ef ("ceph: fix inode number handling on
arches with 32-bit ino_t"). So getting rid of this would be a good
thing.
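
To make the fs/statfs.c part of the patch easier to follow: once the
source value is 64 bits wide and the ustat field stays 32 bits, a plain
assignment would silently wrap, so the value is clamped instead. The
following user-space sketch contrasts the two; the function names are
made up for illustration and only standard C is used.

```c
#include <limits.h>
#include <stdint.h>

/* Plain narrowing assignment: the upper 32 bits are silently dropped. */
static unsigned int sketch_truncate_f_tinode(uint64_t f_ffree)
{
	return (unsigned int)f_ffree;
}

/* Clamped variant: anything that does not fit saturates at UINT_MAX. */
static unsigned int sketch_clamp_f_tinode(uint64_t f_ffree)
{
	return f_ffree > UINT_MAX ? UINT_MAX : (unsigned int)f_ffree;
}
```

A free-inode count just above 2^32 would be reported as a small number by
the truncating version, while the clamped version reports the largest
representable value.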

diff --git a/arch/Kconfig b/arch/Kconfig
index 24862d15f3a3..383c98e86a70 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -327,6 +327,10 @@ config ARCH_32BIT_OFF_T
 	  still support 32-bit off_t. This option is enabled for all such
 	  architectures explicitly.
 
+# Selected by 64 bit architectures which have a 32 bit f_tinode in struct ustat
+config ARCH_32BIT_USTAT_F_TINODE
+	bool
+
 config HAVE_ASM_MODVERSIONS
 	bool
 	help
diff --git a/arch/alpha/Kconfig b/arch/alpha/Kconfig
index 1f51437d5765..96ce6565890e 100644
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -2,6 +2,7 @@
 config ALPHA
 	bool
 	default y
+	select ARCH_32BIT_USTAT_F_TINODE
 	select ARCH_MIGHT_HAVE_PC_PARPORT
 	select ARCH_MIGHT_HAVE_PC_SERIO
 	select ARCH_NO_PREEMPT
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index c72874f09741..434efd9ca0c5 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -58,6 +58,7 @@ config S390
 	# Note: keep this list sorted alphabetically
 	#
 	imply IMA_SECURE_AND_OR_TRUSTED_BOOT
+	select ARCH_32BIT_USTAT_F_TINODE
 	select ARCH_BINFMT_ELF_STATE
 	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DEBUG_WX
diff --git a/fs/statfs.c b/fs/statfs.c
index 68cb07788750..0ba34c135593 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -255,7 +255,10 @@ SYSCALL_DEFINE2(ustat, unsigned, dev, struct ustat __user *, ubuf)
 
 	memset(&tmp,0,sizeof(struct ustat));
 	tmp.f_tfree = sbuf.f_bfree;
-	tmp.f_tinode = sbuf.f_ffree;
+	if (IS_ENABLED(CONFIG_ARCH_32BIT_USTAT_F_TINODE))
+		tmp.f_tinode = min_t(u64, sbuf.f_ffree, UINT_MAX);
+	else
+		tmp.f_tinode = sbuf.f_ffree;
 
 	return copy_to_user(ubuf, &tmp, sizeof(struct ustat)) ? -EFAULT : 0;
 }
diff --git a/include/linux/types.h b/include/linux/types.h
index a147977602b5..1e9d0a2c1dba 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -14,7 +14,7 @@ typedef u32 __kernel_dev_t;
 
 typedef __kernel_fd_set		fd_set;
 typedef __kernel_dev_t		dev_t;
-typedef __kernel_ino_t		ino_t;
+typedef __kernel_ulong_t	ino_t;
 typedef __kernel_mode_t		mode_t;
 typedef unsigned short		umode_t;
 typedef u32			nlink_t;
@@ -189,7 +189,11 @@ struct hlist_node {
 
 struct ustat {
 	__kernel_daddr_t	f_tfree;
-	__kernel_ino_t		f_tinode;
+#ifdef ARCH_HAS_32BIT_F_TINODE
+	unsigned int		f_tinode;
+#else
+	unsigned long		f_tinode;
+#endif
 	char			f_fname[6];
 	char			f_fpack[6];
 };


* Re: [patch 13/14] mm, slub: better heuristic for number of cpus when calculating slab order
  2021-02-09 21:42 ` [patch 13/14] mm, slub: better heuristic for number of cpus when calculating slab order Andrew Morton
@ 2021-02-10 14:34   ` Vlastimil Babka
  2021-02-10 19:22     ` Linus Torvalds
  0 siblings, 1 reply; 24+ messages in thread
From: Vlastimil Babka @ 2021-02-10 14:34 UTC (permalink / raw)
  To: Andrew Morton, aneesh.kumar, bharata, catalin.marinas, cl, guro,
	hannes, iamjoonsoo.kim, jannh, linux-mm, mgorman, mhocko,
	mm-commits, rientjes, shakeelb, stable, torvalds,
	vincent.guittot, will

On 2/9/21 10:42 PM, Andrew Morton wrote:
> From: Vlastimil Babka <vbabka@suse.cz>
> Subject: mm, slub: better heuristic for number of cpus when calculating slab order
> 

...

> Link: https://lkml.kernel.org/r/20210208134108.22286-1-vbabka@suse.cz
> Fixes: 045ab8c9487b ("mm/slub: let number of online CPUs determine the slub page order")
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
> Reported-by: Mel Gorman <mgorman@techsingularity.net>
> Tested-by: Vincent Guittot <vincent.guittot@linaro.org>

As Andrew's incoming series might not have been merged yet, I will point to
Mel's Tested-by:

https://lore.kernel.org/linux-mm/20210210140712.GB3697@techsingularity.net/

Thanks, Mel!

> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> Cc: Bharata B Rao <bharata@linux.ibm.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Jann Horn <jannh@google.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Shakeel Butt <shakeelb@google.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  mm/slub.c |   18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 


* Re: [patch 09/14] tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
  2021-02-10 13:34     ` Heiko Carstens
@ 2021-02-10 17:27       ` Heiko Carstens
  2021-02-10 19:17       ` Linus Torvalds
  1 sibling, 0 replies; 24+ messages in thread
From: Heiko Carstens @ 2021-02-10 17:27 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Linus Torvalds, Andrew Morton, Christian Borntraeger,
	Vasily Gorbik, Amir Goldstein, Chris Down, Hugh Dickins,
	Ivan Kokshaysky, Linux-MM, Matt Turner, mm-commits,
	Richard Henderson, Seth Forshee, stable, Arnd Bergmann,
	Ulrich Weigand, Tuan Hoang1

On Wed, Feb 10, 2021 at 02:34:08PM +0100, Heiko Carstens wrote:
> diff --git a/include/linux/types.h b/include/linux/types.h
> index a147977602b5..1e9d0a2c1dba 100644
> --- a/include/linux/types.h
> +++ b/include/linux/types.h
> @@ -14,7 +14,7 @@ typedef u32 __kernel_dev_t;
>  
>  typedef __kernel_fd_set		fd_set;
>  typedef __kernel_dev_t		dev_t;
> -typedef __kernel_ino_t		ino_t;
> +typedef __kernel_ulong_t	ino_t;
>  typedef __kernel_mode_t		mode_t;
>  typedef unsigned short		umode_t;
>  typedef u32			nlink_t;
> @@ -189,7 +189,11 @@ struct hlist_node {
>  
>  struct ustat {
>  	__kernel_daddr_t	f_tfree;
> -	__kernel_ino_t		f_tinode;
> +#ifdef ARCH_HAS_32BIT_F_TINODE
> +	unsigned int		f_tinode;
> +#else
> +	unsigned long		f_tinode;
> +#endif

Of course that should have been CONFIG_ARCH_32BIT_USTAT_F_TINODE in
order to not break the existing ABI for alpha and s390.

* Re: [patch 09/14] tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
  2021-02-10 13:34     ` Heiko Carstens
  2021-02-10 17:27       ` Heiko Carstens
@ 2021-02-10 19:17       ` Linus Torvalds
  2021-02-10 19:55         ` Arnd Bergmann
  2021-02-11 18:45         ` Heiko Carstens
  1 sibling, 2 replies; 24+ messages in thread
From: Linus Torvalds @ 2021-02-10 19:17 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Andrew Morton, Christian Borntraeger, Vasily Gorbik,
	Amir Goldstein, Chris Down, Hugh Dickins, Ivan Kokshaysky,
	Linux-MM, Matt Turner, mm-commits, Richard Henderson,
	Seth Forshee, stable, Arnd Bergmann, Ulrich Weigand, Tuan Hoang1

On Wed, Feb 10, 2021 at 5:39 AM Heiko Carstens <hca@linux.ibm.com> wrote:
>
> I couldn't spot any and also gave the patch below a try and my system
> still boots without any errors.
> So, as far as I can tell it _should_ be ok to change this.

So your patch (with the fix on top) looks sane to me.

I'm not entirely sure it is worth it, but the fact that we've had bugs
wrt this before does seem to imply that we should do this.

I'd remove the __kernel_ino_t type entirely, but I wonder if user
space might depend on it. I do find

   #ifndef __kernel_ino_t
   typedef __kernel_ulong_t __kernel_ino_t;
   #endif

in the GNU libc headers I have, but then I don't find any actual use
of that, so it looks like it may be just a "we copied things for other
reasons".

On the whole I think this would be the right thing to do, but I'm a
bit worried that it's more pain than it might be worth.

Heiko, I think I'll leave this decision entirely to you. If you think
it's worth it to avoid any possible future pain wrt this odd inode
number thing for s390, just add it to the s390 tree with my ack.
Because honestly, I think s390 is the only architecture that really
cares by now.

               Linus

* Re: [patch 13/14] mm, slub: better heuristic for number of cpus when calculating slab order
  2021-02-10 14:34   ` Vlastimil Babka
@ 2021-02-10 19:22     ` Linus Torvalds
  0 siblings, 0 replies; 24+ messages in thread
From: Linus Torvalds @ 2021-02-10 19:22 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Aneesh Kumar K.V, bharata, Catalin Marinas,
	Christoph Lameter, Roman Gushchin, Johannes Weiner, Joonsoo Kim,
	Jann Horn, Linux-MM, Mel Gorman, Michal Hocko, mm-commits,
	David Rientjes, Shakeel Butt, stable, Vincent Guittot,
	Will Deacon

On Wed, Feb 10, 2021 at 6:34 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> As Andrew's incoming series might not have been merged yet, I will point to
> Mel's Tested-by:

Thanks, added.


           Linus

* Re: incoming
  2021-02-09 21:41 incoming Andrew Morton
                   ` (13 preceding siblings ...)
  2021-02-09 21:42 ` [patch 14/14] nilfs2: make splice write available again Andrew Morton
@ 2021-02-10 19:30 ` Linus Torvalds
  14 siblings, 0 replies; 24+ messages in thread
From: Linus Torvalds @ 2021-02-10 19:30 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux-MM, mm-commits

Hah. This series shows a small deficiency in your scripting wrt the diffstat:

On Tue, Feb 9, 2021 at 1:41 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
>  .mailmap                          |    1
...
>  mm/slub.c                         |   18 +++++++++-
>  17 files changed, 172 insertions(+), 49 deletions(-)

It actually has 18 files changed, but one of them is a pure rename (no
change to the content), and apparently your diffstat tool can't handle
that case.

It *should* have ended with

 ...
 mm/slub.c                                          | 18 +++++-
 .../selftests/vm/{run_vmtests => run_vmtests.sh}   |  0
 18 files changed, 172 insertions(+), 49 deletions(-)
 rename tools/testing/selftests/vm/{run_vmtests => run_vmtests.sh} (100%)

if you'd done a proper "git diff -M --stat --summary" of the series.

[ Ok, by default git would actually have said

    18 files changed, 171 insertions(+), 48 deletions(-)

  but it looks like you use the patience diff option, which gives that
extra insertion/deletion line because it generates the diff a bit
differently ]

Not a big deal, but it made me briefly wonder "why doesn't my
diffstat match yours".

           Linus

* Re: [patch 09/14] tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
  2021-02-10 19:17       ` Linus Torvalds
@ 2021-02-10 19:55         ` Arnd Bergmann
  2021-02-11 18:45         ` Heiko Carstens
  1 sibling, 0 replies; 24+ messages in thread
From: Arnd Bergmann @ 2021-02-10 19:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Heiko Carstens, Andrew Morton, Christian Borntraeger,
	Vasily Gorbik, Amir Goldstein, Chris Down, Hugh Dickins,
	Ivan Kokshaysky, Linux-MM, Matt Turner, mm-commits,
	Richard Henderson, Seth Forshee, stable, Arnd Bergmann,
	Ulrich Weigand, Tuan Hoang1, Debian

On Wed, Feb 10, 2021 at 8:17 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Feb 10, 2021 at 5:39 AM Heiko Carstens <hca@linux.ibm.com> wrote:
> >
> > I couldn't spot any and also gave the patch below a try and my system
> > still boots without any errors.
> > So, as far as I can tell it _should_ be ok to change this.
>
> So your patch (with the fix on top) looks sane to me.
>
> I'm not entirely sure it is worth it, but the fact that we've had bugs
> wrt this before does seem to imply that we should do this.
>
> I'd remove the __kernel_ino_t type entirely, but I wonder if user
> space might depend on it. I do find
>
>    #ifndef __kernel_ino_t
>    typedef __kernel_ulong_t __kernel_ino_t;
>    #endif
>
> in the GNU libc headers I have, but then I don't find any actual use
> of that, so it looks like it may be just a "we copied things for other
> reasons".

I checked Debian codesearch to see if there are any users in
distro source code and found exactly one instance that will
definitely break at compile time:

https://sources.debian.org/src/nfs-utils/1:1.3.4-4/support/include/nfs/nfs.h/?hl=99#L99

This is a copy of a kernel header that was removed ten years ago
with commit c152292f9ee7 ("nfsd: remove include/linux/nfsd/syscall.h").

The mainline version of that package removed the contents in 2016, in
the following release (2.1.1), but Debian is still on the previous
version (1.3.4):
http://git.linux-nfs.org/?p=steved/nfs-utils.git;a=commitdiff;h=fc1127d754578cd1

Someone will have to update the package for Debian, but it seems
that would be a good idea anyway.

      Arnd

* Re: [patch 09/14] tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha
  2021-02-10 19:17       ` Linus Torvalds
  2021-02-10 19:55         ` Arnd Bergmann
@ 2021-02-11 18:45         ` Heiko Carstens
  1 sibling, 0 replies; 24+ messages in thread
From: Heiko Carstens @ 2021-02-11 18:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrew Morton, Christian Borntraeger, Vasily Gorbik,
	Amir Goldstein, Chris Down, Hugh Dickins, Ivan Kokshaysky,
	Linux-MM, Matt Turner, mm-commits, Richard Henderson,
	Seth Forshee, stable, Arnd Bergmann, Ulrich Weigand, Tuan Hoang1

On Wed, Feb 10, 2021 at 11:17:10AM -0800, Linus Torvalds wrote:
> On Wed, Feb 10, 2021 at 5:39 AM Heiko Carstens <hca@linux.ibm.com> wrote:
> >
> > I couldn't spot any and also gave the patch below a try and my system
> > still boots without any errors.
> > So, as far as I can tell it _should_ be ok to change this.
> 
> So your patch (with the fix on top) looks sane to me.
> 
> I'm not entirely sure it is worth it, but the fact that we've had bugs
> wrt this before does seem to imply that we should do this.
> 
> I'd remove the __kernel_ino_t type entirely, but I wonder if user
> space might depend on it. I do find
> 
>    #ifndef __kernel_ino_t
>    typedef __kernel_ulong_t __kernel_ino_t;
>    #endif
> 
> in the GNU libc headers I have, but then I don't find any actual use
> of that, so it looks like it may be just a "we copied things for other
> reasons".
> 
> On the whole I think this would be the right thing to do, but I'm a
> bit worried that it's more pain than it might be worth.
> 
> Heiko, I think I'll leave this decision entirely to you. If you think
> it's worth it to avoid any possible future pain wrt this odd inode
> number thing for s390, just add it to the s390 tree with my ack.
> Because honestly, I think s390 is the only architecture that really
> cares by now.

So, yes. We will go ahead and change this to hopefully avoid future
problems. The patch is supposed to be part of the next merge
window and converts both s390 and alpha, unless somebody objects.

After that has been merged I'll provide a follow-on patch which
enables TMPFS_INODE64 for alpha and s390 again, and yet another one
which removes __kernel_ino_t as suggested by you.

Thread overview: 24+ messages
2021-02-09 21:41 incoming Andrew Morton
2021-02-09 21:41 ` [patch 01/14] squashfs: avoid out of bounds writes in decompressors Andrew Morton
2021-02-09 21:41 ` [patch 02/14] squashfs: add more sanity checks in id lookup Andrew Morton
2021-02-09 21:41 ` [patch 03/14] squashfs: add more sanity checks in inode lookup Andrew Morton
2021-02-09 21:42 ` [patch 04/14] squashfs: add more sanity checks in xattr id lookup Andrew Morton
2021-02-09 21:42 ` [patch 05/14] kasan: fix stack traces dependency for HW_TAGS Andrew Morton
2021-02-09 21:42 ` [patch 06/14] firmware_loader: align .builtin_fw to 8 Andrew Morton
2021-02-09 21:42 ` [patch 07/14] mm/mremap: fix BUILD_BUG_ON() error in get_extent Andrew Morton
2021-02-09 21:42 ` [patch 08/14] tmpfs: disallow CONFIG_TMPFS_INODE64 on s390 Andrew Morton
2021-02-09 21:42 ` [patch 09/14] tmpfs: disallow CONFIG_TMPFS_INODE64 on alpha Andrew Morton
2021-02-09 22:03   ` Linus Torvalds
2021-02-10 13:34     ` Heiko Carstens
2021-02-10 17:27       ` Heiko Carstens
2021-02-10 19:17       ` Linus Torvalds
2021-02-10 19:55         ` Arnd Bergmann
2021-02-11 18:45         ` Heiko Carstens
2021-02-09 21:42 ` [patch 10/14] selftests/vm: rename file run_vmtests to run_vmtests.sh Andrew Morton
2021-02-09 21:42 ` [patch 11/14] MAINTAINERS: update Andrey Ryabinin's email address Andrew Morton
2021-02-09 21:42 ` [patch 12/14] Revert "mm: memcontrol: avoid workload stalls when lowering memory.high" Andrew Morton
2021-02-09 21:42 ` [patch 13/14] mm, slub: better heuristic for number of cpus when calculating slab order Andrew Morton
2021-02-10 14:34   ` Vlastimil Babka
2021-02-10 19:22     ` Linus Torvalds
2021-02-09 21:42 ` [patch 14/14] nilfs2: make splice write available again Andrew Morton
2021-02-10 19:30 ` incoming Linus Torvalds
