Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
* Re: [vfs]  8bb3c61baf:  vm-scalability.median -23.7% regression
       [not found] <20190903084122.GH15734@shao2-debian>
@ 2019-09-08 21:46 ` Al Viro
  2019-09-08 23:47   ` Al Viro
  0 siblings, 1 reply; 6+ messages in thread
From: Al Viro @ 2019-09-08 21:46 UTC (permalink / raw)
  To: kernel test robot; +Cc: David Howells, Hugh Dickins, LKML, linux-fsdevel, lkp

On Tue, Sep 03, 2019 at 04:41:22PM +0800, kernel test robot wrote:
> Greeting,
> 
> FYI, we noticed a -23.7% regression of vm-scalability.median due to commit:
> 
> 
> commit: 8bb3c61bafa8c1cd222ada602bb94ff23119e738 ("vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API")
> https://kernel.googlesource.com/pub/scm/linux/kernel/git/viro/vfs.git work.mount
> 
> in testcase: vm-scalability
> on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory
> with following parameters:
> 
> 	runtime: 300s
> 	size: 16G
> 	test: shm-pread-rand
> 	cpufreq_governor: performance
> 	ucode: 0xb000036

That thing loses size=... option.  Both size= and nr_blocks= affect the
same thing (->max_blocks), but the parser keeps track of the options
it has seen and applying the parsed data to superblock checks only
whether nr_blocks= had been there.  IOW, size= gets parsed, but the
result goes nowhere.

I'm not sure whether it's better to fix the patch up or redo it from
scratch - it needs to be carved up anyway and it's highly non-transparent,
so I'm probably going to replace the damn thing entirely with something
that would be easier to follow.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [vfs]  8bb3c61baf:  vm-scalability.median -23.7% regression
  2019-09-08 21:46 ` [vfs] 8bb3c61baf: vm-scalability.median -23.7% regression Al Viro
@ 2019-09-08 23:47   ` Al Viro
  2019-09-09  3:10     ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Al Viro @ 2019-09-08 23:47 UTC (permalink / raw)
  To: kernel test robot; +Cc: David Howells, Hugh Dickins, LKML, linux-fsdevel, lkp

On Sun, Sep 08, 2019 at 10:46:01PM +0100, Al Viro wrote:
> On Tue, Sep 03, 2019 at 04:41:22PM +0800, kernel test robot wrote:
> > Greeting,
> > 
> > FYI, we noticed a -23.7% regression of vm-scalability.median due to commit:
> > 
> > 
> > commit: 8bb3c61bafa8c1cd222ada602bb94ff23119e738 ("vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API")
> > https://kernel.googlesource.com/pub/scm/linux/kernel/git/viro/vfs.git work.mount
> > 
> > in testcase: vm-scalability
> > on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory
> > with following parameters:
> > 
> > 	runtime: 300s
> > 	size: 16G
> > 	test: shm-pread-rand
> > 	cpufreq_governor: performance
> > 	ucode: 0xb000036
> 
> That thing loses size=... option.  Both size= and nr_blocks= affect the
> same thing (->max_blocks), but the parser keeps track of the options
> it has seen and applying the parsed data to superblock checks only
> whether nr_blocks= had been there.  IOW, size= gets parsed, but the
> result goes nowhere.
> 
> I'm not sure whether it's better to fix the patch up or redo it from
> scratch - it needs to be carved up anyway and it's highly non-transparent,
> so I'm probably going to replace the damn thing entirely with something
> that would be easier to follow.

... and this
+       { Opt_huge,     "deny",         SHMEM_HUGE_DENY },
+       { Opt_huge,     "force",        SHMEM_HUGE_FORCE },
had been wrong - huge=deny and huge=force should not be accepted _and_
fs_parameter_enum is not suitable for negative constants right now
anyway.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [vfs]  8bb3c61baf:  vm-scalability.median -23.7% regression
  2019-09-08 23:47   ` Al Viro
@ 2019-09-09  3:10     ` Hugh Dickins
  2019-09-09  3:56       ` Al Viro
  0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2019-09-09  3:10 UTC (permalink / raw)
  To: Al Viro
  Cc: kernel test robot, David Howells, Hugh Dickins, Andrew Morton,
	LKML, linux-fsdevel, lkp

On Mon, 9 Sep 2019, Al Viro wrote:
> On Sun, Sep 08, 2019 at 10:46:01PM +0100, Al Viro wrote:
> > On Tue, Sep 03, 2019 at 04:41:22PM +0800, kernel test robot wrote:
> > > Greeting,
> > > 
> > > FYI, we noticed a -23.7% regression of vm-scalability.median due to commit:
> > > 
> > > 
> > > commit: 8bb3c61bafa8c1cd222ada602bb94ff23119e738 ("vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API")
> > > https://kernel.googlesource.com/pub/scm/linux/kernel/git/viro/vfs.git work.mount
> > > 
> > > in testcase: vm-scalability
> > > on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory
> > > with following parameters:
> > > 
> > > 	runtime: 300s
> > > 	size: 16G
> > > 	test: shm-pread-rand
> > > 	cpufreq_governor: performance
> > > 	ucode: 0xb000036
> > 
> > That thing loses size=... option.  Both size= and nr_blocks= affect the
> > same thing (->max_blocks), but the parser keeps track of the options
> > it has seen and applying the parsed data to superblock checks only
> > whether nr_blocks= had been there.  IOW, size= gets parsed, but the
> > result goes nowhere.
> > 
> > I'm not sure whether it's better to fix the patch up or redo it from
> > scratch - it needs to be carved up anyway and it's highly non-transparent,
> > so I'm probably going to replace the damn thing entirely with something
> > that would be easier to follow.
> 
> ... and this
> +       { Opt_huge,     "deny",         SHMEM_HUGE_DENY },
> +       { Opt_huge,     "force",        SHMEM_HUGE_FORCE },
> had been wrong - huge=deny and huge=force should not be accepted _and_
> fs_parameter_enum is not suitable for negative constants right now
> anyway.

Sorry you've been spending time redisovering these, Al: I sent David
the tmpfs fixes (Cc'ing you and Andrew and lists) a couple of weeks
ago - but had no idea until your mail that the "loss of size" was
behind this vm-scalability regression report.

Ah, not for the first time, I missed saying "[PATCH]" in the subject:
sorry, that may have rendered it invisible to many eyes.

Here's what Andrew has been carrying in the mmotm tree since I sent it:
I'm sure we'd both be happy for you to take it into your tree.  I had
expected it to percolate through from mmotm to linux-next by now,
but apparently not.

From: Hugh Dickins <hughd@google.com>
Subject: tmpfs: fixups to use of the new mount API

Several fixups to shmem_parse_param() and tmpfs use of new mount API:

mm/shmem.c manages filesystem named "tmpfs": revert "shmem" to "tmpfs"
in its mount error messages.

/sys/kernel/mm/transparent_hugepage/shmem_enabled has valid options
"deny" and "force", but they are not valid as tmpfs "huge" options.

The "size" param is an alternative to "nr_blocks", and needs to be
recognized as changing max_blocks.  And where there's ambiguity, it's
better to mention "size" than "nr_blocks" in messages, since "size" is
the variant shown in /proc/mounts.

shmem_apply_options() left ctx->mpol as the new mpol, so then it was
freed in shmem_free_fc(), and the filesystem went on to use-after-free.

shmem_parse_param() issue "tmpfs: Bad value for '%s'" messages just
like fs_parse() would, instead of a different wording.  Where config
disables "mpol" or "huge", say "tmpfs: Unsupported parameter '%s'".

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1908191503290.1253@eggly.anvils
Fixes: 144df3b288c41 ("vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/shmem.c |   80 ++++++++++++++++++++++++++-------------------------
 1 file changed, 42 insertions(+), 38 deletions(-)

--- a/mm/shmem.c~tmpfs-fixups-to-use-of-the-new-mount-api
+++ a/mm/shmem.c
@@ -3432,13 +3432,11 @@ static const struct fs_parameter_enum sh
 	{ Opt_huge,	"always",	SHMEM_HUGE_ALWAYS },
 	{ Opt_huge,	"within_size",	SHMEM_HUGE_WITHIN_SIZE },
 	{ Opt_huge,	"advise",	SHMEM_HUGE_ADVISE },
-	{ Opt_huge,	"deny",		SHMEM_HUGE_DENY },
-	{ Opt_huge,	"force",	SHMEM_HUGE_FORCE },
 	{}
 };
 
 const struct fs_parameter_description shmem_fs_parameters = {
-	.name		= "shmem",
+	.name		= "tmpfs",
 	.specs		= shmem_param_specs,
 	.enums		= shmem_param_enums,
 };
@@ -3448,9 +3446,9 @@ static void shmem_apply_options(struct s
 				unsigned long inodes_in_use)
 {
 	struct shmem_fs_context *ctx = fc->fs_private;
-	struct mempolicy *old = NULL;
 
-	if (test_bit(Opt_nr_blocks, &ctx->changes))
+	if (test_bit(Opt_nr_blocks, &ctx->changes) ||
+	    test_bit(Opt_size, &ctx->changes))
 		sbinfo->max_blocks = ctx->max_blocks;
 	if (test_bit(Opt_nr_inodes, &ctx->changes)) {
 		sbinfo->max_inodes = ctx->max_inodes;
@@ -3459,8 +3457,11 @@ static void shmem_apply_options(struct s
 	if (test_bit(Opt_huge, &ctx->changes))
 		sbinfo->huge = ctx->huge;
 	if (test_bit(Opt_mpol, &ctx->changes)) {
-		old = sbinfo->mpol;
-		sbinfo->mpol = ctx->mpol;
+		/*
+		 * Update sbinfo->mpol now while stat_lock is held.
+		 * Leave shmem_free_fc() to free the old mpol if any.
+		 */
+		swap(sbinfo->mpol, ctx->mpol);
 	}
 
 	if (fc->purpose != FS_CONTEXT_FOR_RECONFIGURE) {
@@ -3471,8 +3472,6 @@ static void shmem_apply_options(struct s
 		if (test_bit(Opt_mode, &ctx->changes))
 			sbinfo->mode = ctx->mode;
 	}
-
-	mpol_put(old);
 }
 
 static int shmem_parse_param(struct fs_context *fc, struct fs_parameter *param)
@@ -3498,7 +3497,7 @@ static int shmem_parse_param(struct fs_c
 			rest++;
 		}
 		if (*rest)
-			return invalf(fc, "shmem: Invalid size");
+			goto bad_value;
 		ctx->max_blocks = DIV_ROUND_UP(size, PAGE_SIZE);
 		break;
 
@@ -3506,55 +3505,59 @@ static int shmem_parse_param(struct fs_c
 		rest = param->string;
 		ctx->max_blocks = memparse(param->string, &rest);
 		if (*rest)
-			return invalf(fc, "shmem: Invalid nr_blocks");
+			goto bad_value;
 		break;
+
 	case Opt_nr_inodes:
 		rest = param->string;
 		ctx->max_inodes = memparse(param->string, &rest);
 		if (*rest)
-			return invalf(fc, "shmem: Invalid nr_inodes");
+			goto bad_value;
 		break;
+
 	case Opt_mode:
 		ctx->mode = result.uint_32 & 07777;
 		break;
+
 	case Opt_uid:
 		ctx->uid = make_kuid(current_user_ns(), result.uint_32);
 		if (!uid_valid(ctx->uid))
-			return invalf(fc, "shmem: Invalid uid");
+			goto bad_value;
 		break;
 
 	case Opt_gid:
 		ctx->gid = make_kgid(current_user_ns(), result.uint_32);
 		if (!gid_valid(ctx->gid))
-			return invalf(fc, "shmem: Invalid gid");
+			goto bad_value;
 		break;
 
 	case Opt_huge:
-#ifdef CONFIG_TRANSPARENT_HUGE_PAGECACHE
-		if (!has_transparent_hugepage() &&
-		    result.uint_32 != SHMEM_HUGE_NEVER)
-			return invalf(fc, "shmem: Huge pages disabled");
-
 		ctx->huge = result.uint_32;
+		if (ctx->huge != SHMEM_HUGE_NEVER &&
+		    !(IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE) &&
+		      has_transparent_hugepage()))
+			goto unsupported_parameter;
 		break;
-#else
-		return invalf(fc, "shmem: huge= option disabled");
-#endif
-
-	case Opt_mpol: {
-#ifdef CONFIG_NUMA
-		struct mempolicy *mpol;
-		if (mpol_parse_str(param->string, &mpol))
-			return invalf(fc, "shmem: Invalid mpol=");
-		mpol_put(ctx->mpol);
-		ctx->mpol = mpol;
-#endif
-		break;
-	}
+
+	case Opt_mpol:
+		if (IS_ENABLED(CONFIG_NUMA)) {
+			struct mempolicy *mpol;
+			if (mpol_parse_str(param->string, &mpol))
+				goto bad_value;
+			mpol_put(ctx->mpol);
+			ctx->mpol = mpol;
+			break;
+		}
+		goto unsupported_parameter;
 	}
 
 	__set_bit(opt, &ctx->changes);
 	return 0;
+
+unsupported_parameter:
+	return invalf(fc, "tmpfs: Unsupported parameter '%s'", param->key);
+bad_value:
+	return invalf(fc, "tmpfs: Bad value for '%s'", param->key);
 }
 
 /*
@@ -3572,14 +3575,15 @@ static int shmem_reconfigure(struct fs_c
 	unsigned long inodes_in_use;
 
 	spin_lock(&sbinfo->stat_lock);
-	if (test_bit(Opt_nr_blocks, &ctx->changes)) {
+	if (test_bit(Opt_nr_blocks, &ctx->changes) ||
+	    test_bit(Opt_size, &ctx->changes)) {
 		if (ctx->max_blocks && !sbinfo->max_blocks) {
 			spin_unlock(&sbinfo->stat_lock);
-			return invalf(fc, "shmem: Can't retroactively limit nr_blocks");
+			return invalf(fc, "tmpfs: Cannot retroactively limit size");
 		}
 		if (percpu_counter_compare(&sbinfo->used_blocks, ctx->max_blocks) > 0) {
 			spin_unlock(&sbinfo->stat_lock);
-			return invalf(fc, "shmem: Too few blocks for current use");
+			return invalf(fc, "tmpfs: Too small a size for current use");
 		}
 	}
 
@@ -3587,11 +3591,11 @@ static int shmem_reconfigure(struct fs_c
 	if (test_bit(Opt_nr_inodes, &ctx->changes)) {
 		if (ctx->max_inodes && !sbinfo->max_inodes) {
 			spin_unlock(&sbinfo->stat_lock);
-			return invalf(fc, "shmem: Can't retroactively limit nr_inodes");
+			return invalf(fc, "tmpfs: Cannot retroactively limit inodes");
 		}
 		if (ctx->max_inodes < inodes_in_use) {
 			spin_unlock(&sbinfo->stat_lock);
-			return invalf(fc, "shmem: Too few inodes for current use");
+			return invalf(fc, "tmpfs: Too few inodes for current use");
 		}
 	}
 
_

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [vfs]  8bb3c61baf:  vm-scalability.median -23.7% regression
  2019-09-09  3:10     ` Hugh Dickins
@ 2019-09-09  3:56       ` Al Viro
  2019-09-10  6:27         ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Al Viro @ 2019-09-09  3:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: kernel test robot, David Howells, Andrew Morton, LKML,
	linux-fsdevel, lkp

On Sun, Sep 08, 2019 at 08:10:17PM -0700, Hugh Dickins wrote:

Hmm...  FWIW, I'd ended up redoing most of the thing, with
hopefully sane carve-up.  Differences:
	* we really care only about three things having
been set - ->max_blocks, ->max_inodes and ->huge.  This
__set_bit() hack is cute, but asking for trouble (and getting
it).  Explicit ctx->seen & SHMEM_SEEN_BLOCKS, etc. is
cleaner.
	*
>  const struct fs_parameter_description shmem_fs_parameters = {
> -	.name		= "shmem",
> +	.name		= "tmpfs",
>  	.specs		= shmem_param_specs,
>  	.enums		= shmem_param_enums,
>  };

	Missed that one, will fold.

	*
> @@ -3448,9 +3446,9 @@ static void shmem_apply_options(struct s

	The whole "apply" thing is useless - in remount we
need to copy max_inode/max_blocks/huge/mpol under the lock after
checks, and we can do that manually just fine.  Other options
(uid/gid/mode) get ignored.  There very little overlap
with fill_super case, really.

> -		old = sbinfo->mpol;
> -		sbinfo->mpol = ctx->mpol;
> +		/*
> +		 * Update sbinfo->mpol now while stat_lock is held.
> +		 * Leave shmem_free_fc() to free the old mpol if any.
> +		 */
> +		swap(sbinfo->mpol, ctx->mpol);

Umm...  Missed that use-after-free due to destructor, TBH (in
remount, that is).  Fixed (in a slightly different way).

>  		}
>  		if (*rest)
> -			return invalf(fc, "shmem: Invalid size");
> +			goto bad_value;
>  		ctx->max_blocks = DIV_ROUND_UP(size, PAGE_SIZE);
>  		break;

FWIW, I had those with s/shmem/tmpfs/, no problem with merging like
that.  Will fold.

[snip]
>  	case Opt_huge:
> -#ifdef CONFIG_TRANSPARENT_HUGE_PAGECACHE
> -		if (!has_transparent_hugepage() &&
> -		    result.uint_32 != SHMEM_HUGE_NEVER)
> -			return invalf(fc, "shmem: Huge pages disabled");
> -
>  		ctx->huge = result.uint_32;
> +		if (ctx->huge != SHMEM_HUGE_NEVER &&
> +		    !(IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE) &&
> +		      has_transparent_hugepage()))
> +			goto unsupported_parameter;
>  		break;
> -#else
> -		return invalf(fc, "shmem: huge= option disabled");
> -#endif
> -
> -	case Opt_mpol: {
> -#ifdef CONFIG_NUMA
> -		struct mempolicy *mpol;
> -		if (mpol_parse_str(param->string, &mpol))
> -			return invalf(fc, "shmem: Invalid mpol=");
> -		mpol_put(ctx->mpol);
> -		ctx->mpol = mpol;
> -#endif
> -		break;
> -	}

OK...

> +	case Opt_mpol:
> +		if (IS_ENABLED(CONFIG_NUMA)) {
> +			struct mempolicy *mpol;
> +			if (mpol_parse_str(param->string, &mpol))
> +				goto bad_value;
> +			mpol_put(ctx->mpol);
> +			ctx->mpol = mpol;
> +			break;
> +		}
> +		goto unsupported_parameter;

Slightly different here - I'd done that bit as
                mpol_put(ctx->mpol);
                ctx->mpol = NULL;
                if (mpol_parse_str(param->string, &ctx->mpol))
                        return invalf (goto bad_value now)
		


> +unsupported_parameter:
> +	return invalf(fc, "tmpfs: Unsupported parameter '%s'", param->key);
> +bad_value:
> +	return invalf(fc, "tmpfs: Bad value for '%s'", param->key);


> -			return invalf(fc, "shmem: Can't retroactively limit nr_blocks");
> +			return invalf(fc, "tmpfs: Cannot retroactively limit size");
>  		}
>  		if (percpu_counter_compare(&sbinfo->used_blocks, ctx->max_blocks) > 0) {
>  			spin_unlock(&sbinfo->stat_lock);
> -			return invalf(fc, "shmem: Too few blocks for current use");
> +			return invalf(fc, "tmpfs: Too small a size for current use");
>  		}
>  	}
>  
> @@ -3587,11 +3591,11 @@ static int shmem_reconfigure(struct fs_c
>  	if (test_bit(Opt_nr_inodes, &ctx->changes)) {
>  		if (ctx->max_inodes && !sbinfo->max_inodes) {
>  			spin_unlock(&sbinfo->stat_lock);
> -			return invalf(fc, "shmem: Can't retroactively limit nr_inodes");
> +			return invalf(fc, "tmpfs: Cannot retroactively limit inodes");
>  		}
>  		if (ctx->max_inodes < inodes_in_use) {
>  			spin_unlock(&sbinfo->stat_lock);
> -			return invalf(fc, "shmem: Too few inodes for current use");
> +			return invalf(fc, "tmpfs: Too few inodes for current use");
>  		}
>  	}

s/Can't/Cannot/ and s/few blocks/small a size/?  No problem, except that I'd done
			err = "Too few inodes for current use";
			goto out;
...
out:
        return invalf(fc, "tmpfs: %s", err);


Anyway, see vfs.git#uncertain.shmem for what I've got with those folded in.
Do you see any problems with that one?  That's the last 5 commits in there...

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [vfs]  8bb3c61baf:  vm-scalability.median -23.7% regression
  2019-09-09  3:56       ` Al Viro
@ 2019-09-10  6:27         ` Hugh Dickins
  2019-09-11  3:05           ` Hugh Dickins
  0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2019-09-10  6:27 UTC (permalink / raw)
  To: Al Viro
  Cc: Hugh Dickins, kernel test robot, David Howells, Andrew Morton,
	LKML, linux-fsdevel, lkp

On Mon, 9 Sep 2019, Al Viro wrote:
> 
> Anyway, see vfs.git#uncertain.shmem for what I've got with those folded in.
> Do you see any problems with that one?  That's the last 5 commits in there...

It's mostly fine, I've no problem with going your way instead of what
we had in mmotm; but I have seen some problems with it, and had been
intending to send you a fixup patch tonight (shmem_reconfigure() missing
unlock on error is the main problem, but there are other fixes needed).

But I'm growing tired. I've a feeling my "swap" of the mpols, instead
of immediate mpol_put(), was necessary to protect against a race with
shmem_get_sbmpol(), but I'm not clear-headed enough to trust myself on
that now.  And I've a mystery to solve, that shmem_reconfigure() gets
stuck into showing the wrong error message.

Tomorrow....

Oh, and my first attempt to build and boot that series over 5.3-rc5
wouldn't boot. Luckily there was a tell-tale "i915" in the stacktrace,
which reminded me of the drivers/gpu/drm/i915/gem/i915_gemfs.c fix
we discussed earlier in the cycle.  That is of course in linux-next
by now, but I wonder if your branch ought to contain a duplicate of
that fix, so that people with i915 doing bisections on 5.4-rc do not
fall into an unbootable hole between vfs and gpu merges.

Hugh

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [vfs]  8bb3c61baf:  vm-scalability.median -23.7% regression
  2019-09-10  6:27         ` Hugh Dickins
@ 2019-09-11  3:05           ` Hugh Dickins
  0 siblings, 0 replies; 6+ messages in thread
From: Hugh Dickins @ 2019-09-11  3:05 UTC (permalink / raw)
  To: Al Viro
  Cc: Hugh Dickins, kernel test robot, David Howells, Andrew Morton,
	LKML, linux-fsdevel, lkp

On Mon, 9 Sep 2019, Hugh Dickins wrote:
> On Mon, 9 Sep 2019, Al Viro wrote:
> > 
> > Anyway, see vfs.git#uncertain.shmem for what I've got with those folded in.
> > Do you see any problems with that one?  That's the last 5 commits in there...
> 
> It's mostly fine, I've no problem with going your way instead of what
> we had in mmotm; but I have seen some problems with it, and had been
> intending to send you a fixup patch tonight (shmem_reconfigure() missing
> unlock on error is the main problem, but there are other fixes needed).
> 
> But I'm growing tired. I've a feeling my "swap" of the mpols, instead
> of immediate mpol_put(), was necessary to protect against a race with
> shmem_get_sbmpol(), but I'm not clear-headed enough to trust myself on
> that now.  And I've a mystery to solve, that shmem_reconfigure() gets
> stuck into showing the wrong error message.

On my "swap" for the mpol_put(): no, the race against shmem_get_sbmpol()
is safe enough without that, and what you have matches what was always
done before. I rather like my "swap", which the previous double-free had
led me to, but it's fine if you prefer the ordinary way. I was probably
coming down from some over-exposure to iput() under spinlock, but there's
no such complications here.

> 
> Tomorrow....
> 
> Oh, and my first attempt to build and boot that series over 5.3-rc5
> wouldn't boot. Luckily there was a tell-tale "i915" in the stacktrace,
> which reminded me of the drivers/gpu/drm/i915/gem/i915_gemfs.c fix
> we discussed earlier in the cycle.  That is of course in linux-next
> by now, but I wonder if your branch ought to contain a duplicate of
> that fix, so that people with i915 doing bisections on 5.4-rc do not
> fall into an unbootable hole between vfs and gpu merges.

Below are the fixups I arrived at last night (I've not rechecked your
tree today, to see if you made any changes since).  But they're not
enough: I now understand why shmem_reconfigure() got stuck showing
the wrong error message, but I'll have to leave it to you to decide
what to do about it, because I don't know whether it's just a mistake,
or different filesystem types have different needs there.

My /etc/fstab has a line in for one of my test mounts:
tmpfs                /tlo                 tmpfs      size=4G               0 0
and that "size=4G" is what causes the problem: because each time
shmem_parse_options(fc, data) is called for a remount, data (that is,
options) points to a string starting with "size=4G,", followed by
what's actually been asked for in the remount options.

So if I try
mount -o remount,size=0 /tlo
that succeeds, setting the filesystem size to 0 meaning unlimited.
So if then as a test I try
mount -o remount,size=1M /tlo
that correctly fails with "Cannot retroactively limit size".
But then when I try
mount -o remount,nr_inodes=0 /tlo
I again get "Cannot retroactively limit size",
when it should have succeeded (again, 0 here meaning unlimited).

That's because the options in shmem_parse_options() are
"size=4G,nr_inodes=0", which indeed looks like an attempt to
retroactively limit size; but the user never asked "size=4G" there.

I think this problem, and some of what's fixed below, predate your
rework, and would equally affect the version in mmotm: I just didn't
discover these issues when I was testing that before.

Hugh

--- aviro/mm/shmem.c	2019-09-09 14:10:34.379832855 -0700
+++ hughd/mm/shmem.c	2019-09-09 23:29:28.467037895 -0700
@@ -3456,7 +3456,7 @@ static int shmem_parse_one(struct fs_con
 		ctx->huge = result.uint_32;
 		if (ctx->huge != SHMEM_HUGE_NEVER &&
 		    !(IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE) &&
-		    has_transparent_hugepage()))
+		      has_transparent_hugepage()))
 			goto unsupported_parameter;
 		ctx->seen |= SHMEM_SEEN_HUGE;
 		break;
@@ -3532,26 +3532,26 @@ static int shmem_reconfigure(struct fs_c
 
 	spin_lock(&sbinfo->stat_lock);
 	inodes = sbinfo->max_inodes - sbinfo->free_inodes;
-	if (ctx->seen & SHMEM_SEEN_BLOCKS) {
+	if ((ctx->seen & SHMEM_SEEN_BLOCKS) && ctx->blocks) {
+		if (!sbinfo->max_blocks) {
+			err = "Cannot retroactively limit size";
+			goto out;
+		}
 		if (percpu_counter_compare(&sbinfo->used_blocks,
 					   ctx->blocks) > 0) {
 			err = "Too small a size for current use";
 			goto out;
 		}
-		if (ctx->blocks && !sbinfo->max_blocks) {
-			err = "Cannot retroactively limit nr_blocks";
+	}
+	if ((ctx->seen & SHMEM_SEEN_INODES) && ctx->inodes) {
+		if (!sbinfo->max_inodes) {
+			err = "Cannot retroactively limit inodes";
 			goto out;
 		}
-	}
-	if (ctx->seen & SHMEM_SEEN_INODES) {
 		if (ctx->inodes < inodes) {
 			err = "Too few inodes for current use";
 			goto out;
 		}
-		if (ctx->inodes && !sbinfo->max_inodes) {
-			err = "Cannot retroactively limit nr_inodes";
-			goto out;
-		}
 	}
 
 	if (ctx->seen & SHMEM_SEEN_HUGE)
@@ -3574,6 +3574,7 @@ static int shmem_reconfigure(struct fs_c
 	spin_unlock(&sbinfo->stat_lock);
 	return 0;
 out:
+	spin_unlock(&sbinfo->stat_lock);
 	return invalf(fc, "tmpfs: %s", err);
 }
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, back to index

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20190903084122.GH15734@shao2-debian>
2019-09-08 21:46 ` [vfs] 8bb3c61baf: vm-scalability.median -23.7% regression Al Viro
2019-09-08 23:47   ` Al Viro
2019-09-09  3:10     ` Hugh Dickins
2019-09-09  3:56       ` Al Viro
2019-09-10  6:27         ` Hugh Dickins
2019-09-11  3:05           ` Hugh Dickins

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org linux-fsdevel@archiver.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox