linux-unionfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case
@ 2020-12-13 13:27 Jeff Layton
  2020-12-13 13:27 ` [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags Jeff Layton
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Jeff Layton @ 2020-12-13 13:27 UTC (permalink / raw)
  To: Amir Goldstein, Sargun Dhillon
  Cc: Miklos Szeredi, Vivek Goyal, overlayfs,
	Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara

What about this as an alternate approach to the problem that Sargun has
been working on? I have some minor concerns about the complexity of
managing a stateful object across two different words. That can be
done, but I think this may be simpler.

This set steals an extra flag bit from the errseq_t counter so that we
have two flags: one indicating whether to increment the counter at set
time, and another to indicate whether the error has been reported to
userland.

This should give you the semantics you want in the syncfs case, no?  If
this does look like it's a suitable approach, then I'll plan to clean up
the comments and docs.

I have a vague feeling that this might help us eventually kill the
AS_EIO and AS_ENOSPC bits too, but that would require a bit more work to
plumb in "since" samples at appropriate places.

Jeff Layton (2):
  errseq: split the SEEN flag into two new flags
  overlayfs: propagate errors from upper to overlay sb in sync_fs

 fs/overlayfs/ovl_entry.h |  1 +
 fs/overlayfs/super.c     | 14 +++++++--
 include/linux/errseq.h   |  2 ++
 lib/errseq.c             | 64 +++++++++++++++++++++++++++++++++-------
 4 files changed, 67 insertions(+), 14 deletions(-)

-- 
2.29.2


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags
  2020-12-13 13:27 [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case Jeff Layton
@ 2020-12-13 13:27 ` Jeff Layton
  2020-12-13 23:35   ` NeilBrown
  2020-12-13 13:27 ` [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs Jeff Layton
  2020-12-13 20:31 ` [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case Sargun Dhillon
  2 siblings, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2020-12-13 13:27 UTC (permalink / raw)
  To: Amir Goldstein, Sargun Dhillon
  Cc: Miklos Szeredi, Vivek Goyal, overlayfs,
	Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara

Overlayfs's volatile mounts want to be able to sample an error for
their own purposes, without preventing a later opener from potentially
seeing the error.

The original reason for the SEEN flag was to make it so that we didn't
need to increment the counter if nothing had observed the latest value
and the error was the same. Eventually, a regression was reported in
the errseq_t conversion, and we fixed that by using the SEEN flag to
also mean that the error had been reported to userland at least once
somewhere.

Those are two different states, however. If we instead take a second
flag bit from the counter, we can track these two things separately,
and accomodate the overlayfs volatile mount use-case.

Add a new MUSTINC flag that indicates that the counter must be
incremented the next time an error is set, and rework the errseq
functions to set and clear that flag whenever the SEEN bit is set or
cleared.

Test only for the MUSTINC bit when deciding whether to increment the
counter and only for the SEEN bit when deciding what to return in
errseq_sample.

Add a new errseq_peek function to allow for the overlayfs use-case.
This just grabs the latest counter and sets the MUSTINC bit, leaving
the SEEN bit untouched.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 include/linux/errseq.h |  2 ++
 lib/errseq.c           | 64 ++++++++++++++++++++++++++++++++++--------
 2 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/include/linux/errseq.h b/include/linux/errseq.h
index fc2777770768..6d4b9bc629ac 100644
--- a/include/linux/errseq.h
+++ b/include/linux/errseq.h
@@ -9,6 +9,8 @@ typedef u32	errseq_t;
 
 errseq_t errseq_set(errseq_t *eseq, int err);
 errseq_t errseq_sample(errseq_t *eseq);
+errseq_t errseq_peek(errseq_t *eseq);
+errseq_t errseq_sample_advance(errseq_t *eseq);
 int errseq_check(errseq_t *eseq, errseq_t since);
 int errseq_check_and_advance(errseq_t *eseq, errseq_t *since);
 #endif
diff --git a/lib/errseq.c b/lib/errseq.c
index 81f9e33aa7e7..5cc830f0361b 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -38,8 +38,11 @@
 /* This bit is used as a flag to indicate whether the value has been seen */
 #define ERRSEQ_SEEN		(1 << ERRSEQ_SHIFT)
 
+/* This bit indicates that value must be incremented even when error is same */
+#define ERRSEQ_MUSTINC		(1 << (ERRSEQ_SHIFT + 1))
+
 /* The lowest bit of the counter */
-#define ERRSEQ_CTR_INC		(1 << (ERRSEQ_SHIFT + 1))
+#define ERRSEQ_CTR_INC		(1 << (ERRSEQ_SHIFT + 2))
 
 /**
  * errseq_set - set a errseq_t for later reporting
@@ -77,11 +80,11 @@ errseq_t errseq_set(errseq_t *eseq, int err)
 	for (;;) {
 		errseq_t new;
 
-		/* Clear out error bits and set new error */
-		new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN)) | -err;
+		/* Clear out flag bits and set new error */
+		new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN|ERRSEQ_MUSTINC)) | -err;
 
-		/* Only increment if someone has looked at it */
-		if (old & ERRSEQ_SEEN)
+		/* Only increment if we have to */
+		if (old & ERRSEQ_MUSTINC)
 			new += ERRSEQ_CTR_INC;
 
 		/* If there would be no change, then call it done */
@@ -122,14 +125,50 @@ EXPORT_SYMBOL(errseq_set);
 errseq_t errseq_sample(errseq_t *eseq)
 {
 	errseq_t old = READ_ONCE(*eseq);
+	errseq_t new = old;
 
-	/* If nobody has seen this error yet, then we can be the first. */
-	if (!(old & ERRSEQ_SEEN))
-		old = 0;
-	return old;
+	/*
+	 * For the common case of no errors ever having been set, we can skip
+	 * marking the SEEN|MUSTINC bits. Once an error has been set, the value
+	 * will never go back to zero.
+	 */
+	if (old != 0) {
+		new |= ERRSEQ_SEEN|ERRSEQ_MUSTINC;
+		if (old != new)
+			cmpxchg(eseq, old, new);
+		if (!(old & ERRSEQ_SEEN))
+			return 0;
+	}
+	return new;
 }
 EXPORT_SYMBOL(errseq_sample);
 
+/**
+ * errseq_peek - Grab current errseq_t value, but don't mark it SEEN
+ * @eseq: Pointer to errseq_t to be sampled.
+ *
+ * In some cases, we need to be able to sample the errseq_t, but we're not
+ * in a situation where we can report the value to userland. Use this
+ * function to do that. This ensures that later errors will be recorded,
+ * and that any current errors are reported at least once.
+ *
+ * Context: Any context.
+ * Return: The current errseq value.
+ */
+errseq_t errseq_peek(errseq_t *eseq)
+{
+	errseq_t old = READ_ONCE(*eseq);
+	errseq_t new = old;
+
+	if (old != 0) {
+		new |= ERRSEQ_MUSTINC;
+		if (old != new)
+			cmpxchg(eseq, old, new);
+	}
+	return new;
+}
+EXPORT_SYMBOL(errseq_peek);
+
 /**
  * errseq_check() - Has an error occurred since a particular sample point?
  * @eseq: Pointer to errseq_t value to be checked.
@@ -143,7 +182,10 @@ EXPORT_SYMBOL(errseq_sample);
  */
 int errseq_check(errseq_t *eseq, errseq_t since)
 {
-	errseq_t cur = READ_ONCE(*eseq);
+	errseq_t cur = READ_ONCE(*eseq) & ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN);
+
+	/* Clear the flag bits for comparison */
+	since &= ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN);
 
 	if (likely(cur == since))
 		return 0;
@@ -195,7 +237,7 @@ int errseq_check_and_advance(errseq_t *eseq, errseq_t *since)
 		 * can advance "since" and return an error based on what we
 		 * have.
 		 */
-		new = old | ERRSEQ_SEEN;
+		new = old | ERRSEQ_SEEN | ERRSEQ_MUSTINC;
 		if (new != old)
 			cmpxchg(eseq, old, new);
 		*since = new;
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs
  2020-12-13 13:27 [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case Jeff Layton
  2020-12-13 13:27 ` [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags Jeff Layton
@ 2020-12-13 13:27 ` Jeff Layton
  2020-12-14 21:38   ` Vivek Goyal
  2020-12-17 19:28   ` Sargun Dhillon
  2020-12-13 20:31 ` [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case Sargun Dhillon
  2 siblings, 2 replies; 18+ messages in thread
From: Jeff Layton @ 2020-12-13 13:27 UTC (permalink / raw)
  To: Amir Goldstein, Sargun Dhillon
  Cc: Miklos Szeredi, Vivek Goyal, overlayfs,
	Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara

Peek at the upper layer's errseq_t at mount time for volatile mounts,
and record it in the per-sb info. In sync_fs, check for an error since
the recorded point and set it in the overlayfs superblock if there was
one.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 fs/overlayfs/ovl_entry.h |  1 +
 fs/overlayfs/super.c     | 14 +++++++++++---
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
index 1b5a2094df8e..fcfcc3951973 100644
--- a/fs/overlayfs/ovl_entry.h
+++ b/fs/overlayfs/ovl_entry.h
@@ -79,6 +79,7 @@ struct ovl_fs {
 	atomic_long_t last_ino;
 	/* Whiteout dentry cache */
 	struct dentry *whiteout;
+	errseq_t err_mark;
 };
 
 static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 290983bcfbb3..2985d2752970 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -264,8 +264,13 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
 	if (!ovl_upper_mnt(ofs))
 		return 0;
 
-	if (!ovl_should_sync(ofs))
-		return 0;
+	if (!ovl_should_sync(ofs)) {
+		/* Propagate errors from upper to overlayfs */
+		ret = errseq_check(&upper_sb->s_wb_err, ofs->err_mark);
+		errseq_set(&sb->s_wb_err, ret);
+		return ret;
+	}
+
 	/*
 	 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
 	 * All the super blocks will be iterated, including upper_sb.
@@ -1945,8 +1950,11 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
 
 		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
 		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
-
 	}
+
+	if (ofs->config.ovl_volatile)
+		ofs->err_mark = errseq_peek(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);
+
 	oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
 	err = PTR_ERR(oe);
 	if (IS_ERR(oe))
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case
  2020-12-13 13:27 [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case Jeff Layton
  2020-12-13 13:27 ` [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags Jeff Layton
  2020-12-13 13:27 ` [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs Jeff Layton
@ 2020-12-13 20:31 ` Sargun Dhillon
  2 siblings, 0 replies; 18+ messages in thread
From: Sargun Dhillon @ 2020-12-13 20:31 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Amir Goldstein, Miklos Szeredi, Vivek Goyal, overlayfs,
	Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara

On Sun, Dec 13, 2020 at 08:27:11AM -0500, Jeff Layton wrote:
> What about this as an alternate approach to the problem that Sargun has
> been working on? I have some minor concerns about the complexity of
> managing a stateful object across two different words. That can be
> done, but I think this may be simpler.
> 
> This set steals an extra flag bit from the errseq_t counter so that we
> have two flags: one indicating whether to increment the counter at set
> time, and another to indicate whether the error has been reported to
> userland.
> 

This approach works, and I believe you suggested it early on, but I was unsure
whether it was okay to use another bit for state information.

> This should give you the semantics you want in the syncfs case, no?  If
> this does look like it's a suitable approach, then I'll plan to clean up
> the comments and docs.
> 
From a raw semantics perspective, this looks correct, and it looks like we could
stash it as well for later reference (there's no going backwards, and....well,
2**19 errors is unlikely.). We do ~10s of overlayfs mounts / sec at peak,
but even then we usually see a single disk error on a machine before it fails,
I'm not sure if in the field people get more churn out of the errseq than that.


> I have a vague feeling that this might help us eventually kill the
> AS_EIO and AS_ENOSPC bits too, but that would require a bit more work to
> plumb in "since" samples at appropriate places.
> 
> Jeff Layton (2):
>   errseq: split the SEEN flag into two new flags
>   overlayfs: propagate errors from upper to overlay sb in sync_fs
> 
>  fs/overlayfs/ovl_entry.h |  1 +
>  fs/overlayfs/super.c     | 14 +++++++--
>  include/linux/errseq.h   |  2 ++
>  lib/errseq.c             | 64 +++++++++++++++++++++++++++++++++-------
>  4 files changed, 67 insertions(+), 14 deletions(-)
> 
> -- 
> 2.29.2
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags
  2020-12-13 13:27 ` [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags Jeff Layton
@ 2020-12-13 23:35   ` NeilBrown
  2020-12-14 13:37     ` Jeffrey Layton
  0 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2020-12-13 23:35 UTC (permalink / raw)
  To: Jeff Layton, Amir Goldstein, Sargun Dhillon
  Cc: Miklos Szeredi, Vivek Goyal, overlayfs,
	Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara

[-- Attachment #1: Type: text/plain, Size: 6688 bytes --]

On Sun, Dec 13 2020, Jeff Layton wrote:

> Overlayfs's volatile mounts want to be able to sample an error for
> their own purposes, without preventing a later opener from potentially
> seeing the error.
>
> The original reason for the SEEN flag was to make it so that we didn't
> need to increment the counter if nothing had observed the latest value
> and the error was the same. Eventually, a regression was reported in
> the errseq_t conversion, and we fixed that by using the SEEN flag to
> also mean that the error had been reported to userland at least once
> somewhere.
>
> Those are two different states, however. If we instead take a second
> flag bit from the counter, we can track these two things separately,
> and accomodate the overlayfs volatile mount use-case.
>
> Add a new MUSTINC flag that indicates that the counter must be
> incremented the next time an error is set, and rework the errseq
> functions to set and clear that flag whenever the SEEN bit is set or
> cleared.
>
> Test only for the MUSTINC bit when deciding whether to increment the
> counter and only for the SEEN bit when deciding what to return in
> errseq_sample.
>
> Add a new errseq_peek function to allow for the overlayfs use-case.
> This just grabs the latest counter and sets the MUSTINC bit, leaving
> the SEEN bit untouched.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  include/linux/errseq.h |  2 ++
>  lib/errseq.c           | 64 ++++++++++++++++++++++++++++++++++--------
>  2 files changed, 55 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/errseq.h b/include/linux/errseq.h
> index fc2777770768..6d4b9bc629ac 100644
> --- a/include/linux/errseq.h
> +++ b/include/linux/errseq.h
> @@ -9,6 +9,8 @@ typedef u32	errseq_t;
>  
>  errseq_t errseq_set(errseq_t *eseq, int err);
>  errseq_t errseq_sample(errseq_t *eseq);
> +errseq_t errseq_peek(errseq_t *eseq);
> +errseq_t errseq_sample_advance(errseq_t *eseq);
>  int errseq_check(errseq_t *eseq, errseq_t since);
>  int errseq_check_and_advance(errseq_t *eseq, errseq_t *since);
>  #endif
> diff --git a/lib/errseq.c b/lib/errseq.c
> index 81f9e33aa7e7..5cc830f0361b 100644
> --- a/lib/errseq.c
> +++ b/lib/errseq.c
> @@ -38,8 +38,11 @@
>  /* This bit is used as a flag to indicate whether the value has been seen */
>  #define ERRSEQ_SEEN		(1 << ERRSEQ_SHIFT)

Would this look nicer using the BIT() macro?

  #define ERRSEQ_SEEN		BIT(ERRSEQ_SHIFT)

>  
> +/* This bit indicates that value must be incremented even when error is same */
> +#define ERRSEQ_MUSTINC		(1 << (ERRSEQ_SHIFT + 1))

 #define ERRSEQ_MUSTINC		BIT(ERRSEQ_SHIFT+1)

or if you don't like the BIT macro (not everyone does), then maybe

 #define ERR_SEQ_MUSTINC	(ERRSEQ_SEEN << 1 )

??

> +
>  /* The lowest bit of the counter */
> -#define ERRSEQ_CTR_INC		(1 << (ERRSEQ_SHIFT + 1))
> +#define ERRSEQ_CTR_INC		(1 << (ERRSEQ_SHIFT + 2))

Ditto.

>  
>  /**
>   * errseq_set - set a errseq_t for later reporting
> @@ -77,11 +80,11 @@ errseq_t errseq_set(errseq_t *eseq, int err)
>  	for (;;) {
>  		errseq_t new;
>  
> -		/* Clear out error bits and set new error */
> -		new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN)) | -err;
> +		/* Clear out flag bits and set new error */
> +		new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN|ERRSEQ_MUSTINC)) | -err;

This is starting to look clumsy (or maybe, this already looked clumsy,
but now that is hard to ignore).

		new = (old & (ERRSEQ_CTR_INC - 1)) | -err

Also this assumes MAX_ERRNO is a mask, which it is .. today.

	BUILD_BUG_ON(MAX_ERRNO & (MAX_ERRNO + 1));
??

>  
> -		/* Only increment if someone has looked at it */
> -		if (old & ERRSEQ_SEEN)
> +		/* Only increment if we have to */
> +		if (old & ERRSEQ_MUSTINC)
>  			new += ERRSEQ_CTR_INC;
>  
>  		/* If there would be no change, then call it done */
> @@ -122,14 +125,50 @@ EXPORT_SYMBOL(errseq_set);
>  errseq_t errseq_sample(errseq_t *eseq)
>  {
>  	errseq_t old = READ_ONCE(*eseq);
> +	errseq_t new = old;
>  
> -	/* If nobody has seen this error yet, then we can be the first. */
> -	if (!(old & ERRSEQ_SEEN))
> -		old = 0;
> -	return old;
> +	/*
> +	 * For the common case of no errors ever having been set, we can skip
> +	 * marking the SEEN|MUSTINC bits. Once an error has been set, the value
> +	 * will never go back to zero.
> +	 */
> +	if (old != 0) {
> +		new |= ERRSEQ_SEEN|ERRSEQ_MUSTINC;

You lose me here.  Why is ERRSEQ_SEEN being set, where it wasn't before?

The ERRSEQ_SEEN flag not means precisely "The error has been reported to
userspace".
This operations isn't used to report errors - that is errseq_check().

I'm not saying the code it wrong - I really cannot tell.
I'm just saying that I cannot see why it might be right.

Thanks,
NeilBrown



> +		if (old != new)
> +			cmpxchg(eseq, old, new);
> +		if (!(old & ERRSEQ_SEEN))
> +			return 0;
> +	}
> +	return new;
>  }
>  EXPORT_SYMBOL(errseq_sample);
>  
> +/**
> + * errseq_peek - Grab current errseq_t value, but don't mark it SEEN
> + * @eseq: Pointer to errseq_t to be sampled.
> + *
> + * In some cases, we need to be able to sample the errseq_t, but we're not
> + * in a situation where we can report the value to userland. Use this
> + * function to do that. This ensures that later errors will be recorded,
> + * and that any current errors are reported at least once.
> + *
> + * Context: Any context.
> + * Return: The current errseq value.
> + */
> +errseq_t errseq_peek(errseq_t *eseq)
> +{
> +	errseq_t old = READ_ONCE(*eseq);
> +	errseq_t new = old;
> +
> +	if (old != 0) {
> +		new |= ERRSEQ_MUSTINC;
> +		if (old != new)
> +			cmpxchg(eseq, old, new);
> +	}
> +	return new;
> +}
> +EXPORT_SYMBOL(errseq_peek);
> +
>  /**
>   * errseq_check() - Has an error occurred since a particular sample point?
>   * @eseq: Pointer to errseq_t value to be checked.
> @@ -143,7 +182,10 @@ EXPORT_SYMBOL(errseq_sample);
>   */
>  int errseq_check(errseq_t *eseq, errseq_t since)
>  {
> -	errseq_t cur = READ_ONCE(*eseq);
> +	errseq_t cur = READ_ONCE(*eseq) & ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN);
> +
> +	/* Clear the flag bits for comparison */
> +	since &= ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN);
>  
>  	if (likely(cur == since))
>  		return 0;
> @@ -195,7 +237,7 @@ int errseq_check_and_advance(errseq_t *eseq, errseq_t *since)
>  		 * can advance "since" and return an error based on what we
>  		 * have.
>  		 */
> -		new = old | ERRSEQ_SEEN;
> +		new = old | ERRSEQ_SEEN | ERRSEQ_MUSTINC;
>  		if (new != old)
>  			cmpxchg(eseq, old, new);
>  		*since = new;
> -- 
> 2.29.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 853 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags
  2020-12-13 23:35   ` NeilBrown
@ 2020-12-14 13:37     ` Jeffrey Layton
  2020-12-14 22:00       ` NeilBrown
  0 siblings, 1 reply; 18+ messages in thread
From: Jeffrey Layton @ 2020-12-14 13:37 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jeff Layton, Amir Goldstein, Sargun Dhillon, Miklos Szeredi,
	Vivek Goyal, overlayfs, Linux FS-devel Mailing List,
	Matthew Wilcox, NeilBrown, Jan Kara

On Mon, Dec 14, 2020 at 10:35:56AM +1100, NeilBrown wrote:
> On Sun, Dec 13 2020, Jeff Layton wrote:
> 
> > Overlayfs's volatile mounts want to be able to sample an error for
> > their own purposes, without preventing a later opener from potentially
> > seeing the error.
> >
> > The original reason for the SEEN flag was to make it so that we didn't
> > need to increment the counter if nothing had observed the latest value
> > and the error was the same. Eventually, a regression was reported in
> > the errseq_t conversion, and we fixed that by using the SEEN flag to
> > also mean that the error had been reported to userland at least once
> > somewhere.
> >
> > Those are two different states, however. If we instead take a second
> > flag bit from the counter, we can track these two things separately,
> > and accomodate the overlayfs volatile mount use-case.
> >
> > Add a new MUSTINC flag that indicates that the counter must be
> > incremented the next time an error is set, and rework the errseq
> > functions to set and clear that flag whenever the SEEN bit is set or
> > cleared.
> >
> > Test only for the MUSTINC bit when deciding whether to increment the
> > counter and only for the SEEN bit when deciding what to return in
> > errseq_sample.
> >
> > Add a new errseq_peek function to allow for the overlayfs use-case.
> > This just grabs the latest counter and sets the MUSTINC bit, leaving
> > the SEEN bit untouched.
> >
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  include/linux/errseq.h |  2 ++
> >  lib/errseq.c           | 64 ++++++++++++++++++++++++++++++++++--------
> >  2 files changed, 55 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/linux/errseq.h b/include/linux/errseq.h
> > index fc2777770768..6d4b9bc629ac 100644
> > --- a/include/linux/errseq.h
> > +++ b/include/linux/errseq.h
> > @@ -9,6 +9,8 @@ typedef u32	errseq_t;
> >  
> >  errseq_t errseq_set(errseq_t *eseq, int err);
> >  errseq_t errseq_sample(errseq_t *eseq);
> > +errseq_t errseq_peek(errseq_t *eseq);
> > +errseq_t errseq_sample_advance(errseq_t *eseq);
> >  int errseq_check(errseq_t *eseq, errseq_t since);
> >  int errseq_check_and_advance(errseq_t *eseq, errseq_t *since);
> >  #endif
> > diff --git a/lib/errseq.c b/lib/errseq.c
> > index 81f9e33aa7e7..5cc830f0361b 100644
> > --- a/lib/errseq.c
> > +++ b/lib/errseq.c
> > @@ -38,8 +38,11 @@
> >  /* This bit is used as a flag to indicate whether the value has been seen */
> >  #define ERRSEQ_SEEN		(1 << ERRSEQ_SHIFT)
> 
> Would this look nicer using the BIT() macro?
> 
>   #define ERRSEQ_SEEN		BIT(ERRSEQ_SHIFT)
> 
> >  
> > +/* This bit indicates that value must be incremented even when error is same */
> > +#define ERRSEQ_MUSTINC		(1 << (ERRSEQ_SHIFT + 1))
> 
>  #define ERRSEQ_MUSTINC		BIT(ERRSEQ_SHIFT+1)
> 
> or if you don't like the BIT macro (not everyone does), then maybe
> 
>  #define ERR_SEQ_MUSTINC	(ERRSEQ_SEEN << 1 )
> 
> ??
> 
> > +
> >  /* The lowest bit of the counter */
> > -#define ERRSEQ_CTR_INC		(1 << (ERRSEQ_SHIFT + 1))
> > +#define ERRSEQ_CTR_INC		(1 << (ERRSEQ_SHIFT + 2))
> 
> Ditto.
> 

Yes, I can make that change. The BIT macro is much easier to read.

> >  
> >  /**
> >   * errseq_set - set a errseq_t for later reporting
> > @@ -77,11 +80,11 @@ errseq_t errseq_set(errseq_t *eseq, int err)
> >  	for (;;) {
> >  		errseq_t new;
> >  
> > -		/* Clear out error bits and set new error */
> > -		new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN)) | -err;
> > +		/* Clear out flag bits and set new error */
> > +		new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN|ERRSEQ_MUSTINC)) | -err;
> 
> This is starting to look clumsy (or maybe, this already looked clumsy,
> but now that is hard to ignore).
> 
> 		new = (old & (ERRSEQ_CTR_INC - 1)) | -err
> 

I think you mean:

		new = (old & ~(ERRSEQ_CTR_INC - 1)) | -err;

Maybe I can add a new ERRSEQ_CTR_MASK value though which makes it more
evident.

> Also this assumes MAX_ERRNO is a mask, which it is .. today.
> 
> 	BUILD_BUG_ON(MAX_ERRNO & (MAX_ERRNO + 1));
> ??
> 

We already have this in errseq_set:

        BUILD_BUG_ON_NOT_POWER_OF_2(MAX_ERRNO + 1);

> >  
> > -		/* Only increment if someone has looked at it */
> > -		if (old & ERRSEQ_SEEN)
> > +		/* Only increment if we have to */
> > +		if (old & ERRSEQ_MUSTINC)
> >  			new += ERRSEQ_CTR_INC;
> >  
> >  		/* If there would be no change, then call it done */
> > @@ -122,14 +125,50 @@ EXPORT_SYMBOL(errseq_set);
> >  errseq_t errseq_sample(errseq_t *eseq)
> >  {
> >  	errseq_t old = READ_ONCE(*eseq);
> > +	errseq_t new = old;
> >  
> > -	/* If nobody has seen this error yet, then we can be the first. */
> > -	if (!(old & ERRSEQ_SEEN))
> > -		old = 0;
> > -	return old;
> > +	/*
> > +	 * For the common case of no errors ever having been set, we can skip
> > +	 * marking the SEEN|MUSTINC bits. Once an error has been set, the value
> > +	 * will never go back to zero.
> > +	 */
> > +	if (old != 0) {
> > +		new |= ERRSEQ_SEEN|ERRSEQ_MUSTINC;
> 
> You lose me here.  Why is ERRSEQ_SEEN being set, where it wasn't before?
> 
> The ERRSEQ_SEEN flag not means precisely "The error has been reported to
> userspace".
> This operations isn't used to report errors - that is errseq_check().
> 
> I'm not saying the code it wrong - I really cannot tell.
> I'm just saying that I cannot see why it might be right.
> 

I think you're right. We should not be setting SEEN here, but we do
need to set MUSTINC if it's not already set. I'll fix (and re-test).

Thanks for the review!

> 
> 
> 
> > +		if (old != new)
> > +			cmpxchg(eseq, old, new);
> > +		if (!(old & ERRSEQ_SEEN))
> > +			return 0;
> > +	}
> > +	return new;
> >  }
> >  EXPORT_SYMBOL(errseq_sample);
> >  
> > +/**
> > + * errseq_peek - Grab current errseq_t value, but don't mark it SEEN
> > + * @eseq: Pointer to errseq_t to be sampled.
> > + *
> > + * In some cases, we need to be able to sample the errseq_t, but we're not
> > + * in a situation where we can report the value to userland. Use this
> > + * function to do that. This ensures that later errors will be recorded,
> > + * and that any current errors are reported at least once.
> > + *
> > + * Context: Any context.
> > + * Return: The current errseq value.
> > + */
> > +errseq_t errseq_peek(errseq_t *eseq)
> > +{
> > +	errseq_t old = READ_ONCE(*eseq);
> > +	errseq_t new = old;
> > +
> > +	if (old != 0) {
> > +		new |= ERRSEQ_MUSTINC;
> > +		if (old != new)
> > +			cmpxchg(eseq, old, new);
> > +	}
> > +	return new;
> > +}
> > +EXPORT_SYMBOL(errseq_peek);
> > +
> >  /**
> >   * errseq_check() - Has an error occurred since a particular sample point?
> >   * @eseq: Pointer to errseq_t value to be checked.
> > @@ -143,7 +182,10 @@ EXPORT_SYMBOL(errseq_sample);
> >   */
> >  int errseq_check(errseq_t *eseq, errseq_t since)
> >  {
> > -	errseq_t cur = READ_ONCE(*eseq);
> > +	errseq_t cur = READ_ONCE(*eseq) & ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN);
> > +
> > +	/* Clear the flag bits for comparison */
> > +	since &= ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN);
> >  
> >  	if (likely(cur == since))
> >  		return 0;
> > @@ -195,7 +237,7 @@ int errseq_check_and_advance(errseq_t *eseq, errseq_t *since)
> >  		 * can advance "since" and return an error based on what we
> >  		 * have.
> >  		 */
> > -		new = old | ERRSEQ_SEEN;
> > +		new = old | ERRSEQ_SEEN | ERRSEQ_MUSTINC;
> >  		if (new != old)
> >  			cmpxchg(eseq, old, new);
> >  		*since = new;
> > -- 
> > 2.29.2



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs
  2020-12-13 13:27 ` [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs Jeff Layton
@ 2020-12-14 21:38   ` Vivek Goyal
  2020-12-14 22:04     ` Sargun Dhillon
  2020-12-14 23:53     ` Jeff Layton
  2020-12-17 19:28   ` Sargun Dhillon
  1 sibling, 2 replies; 18+ messages in thread
From: Vivek Goyal @ 2020-12-14 21:38 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, overlayfs,
	Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara

On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote:
> Peek at the upper layer's errseq_t at mount time for volatile mounts,
> and record it in the per-sb info. In sync_fs, check for an error since
> the recorded point and set it in the overlayfs superblock if there was
> one.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---

While we are solving problem for non-volatile overlay mount, I also
started thinking, what about non-volatile overlay syncfs() writeback errors.
Looks like these will not be reported to user space at all as of now
(because we never update overlay_sb->s_wb_err ever).

A patch like this might fix it. (compile tested only).

overlayfs: Report syncfs() errors to user space

Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the
return code. But certain writeback errors can still be reported on 
syncfs() by checking errors on super block.

ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);

For the case of overlayfs, we never set overlayfs super block s_wb_err. That
means sync() will never report writeback errors on overlayfs uppon syncfs().

Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that
should mean that user space syncfs() call should see writeback errors.

ovl_fsync() does not need anything special because if there are writeback
errors underlying filesystem will report it through vfs_fsync_range() return
code and user space will see it.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/overlayfs/ovl_entry.h |    1 +
 fs/overlayfs/super.c     |   14 +++++++++++---
 2 files changed, 12 insertions(+), 3 deletions(-)

Index: redhat-linux/fs/overlayfs/super.c
===================================================================
--- redhat-linux.orig/fs/overlayfs/super.c	2020-12-14 15:33:43.934400880 -0500
+++ redhat-linux/fs/overlayfs/super.c	2020-12-14 16:15:07.127400880 -0500
@@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc
 {
 	struct ovl_fs *ofs = sb->s_fs_info;
 	struct super_block *upper_sb;
-	int ret;
+	int ret, ret2;
 
 	if (!ovl_upper_mnt(ofs))
 		return 0;
@@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc
 	ret = sync_filesystem(upper_sb);
 	up_read(&upper_sb->s_umount);
 
-	return ret;
+	if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
+		/* Upper sb has errors since last time */
+		spin_lock(&ofs->errseq_lock);
+		ret2 = errseq_check_and_advance(&upper_sb->s_wb_err,
+						&sb->s_wb_err);
+		spin_unlock(&ofs->errseq_lock);
+	}
+	return ret ? ret : ret2;
 }
 
 /**
@@ -1873,6 +1880,7 @@ static int ovl_fill_super(struct super_b
 	if (!cred)
 		goto out_err;
 
+	spin_lock_init(&ofs->errseq_lock);
 	/* Is there a reason anyone would want not to share whiteouts? */
 	ofs->share_whiteout = true;
 
@@ -1945,7 +1953,7 @@ static int ovl_fill_super(struct super_b
 
 		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
 		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
-
+		sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);
 	}
 	oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
 	err = PTR_ERR(oe);
Index: redhat-linux/fs/overlayfs/ovl_entry.h
===================================================================
--- redhat-linux.orig/fs/overlayfs/ovl_entry.h	2020-12-14 15:33:43.934400880 -0500
+++ redhat-linux/fs/overlayfs/ovl_entry.h	2020-12-14 15:34:13.509400880 -0500
@@ -79,6 +79,7 @@ struct ovl_fs {
 	atomic_long_t last_ino;
 	/* Whiteout dentry cache */
 	struct dentry *whiteout;
+	spinlock_t errseq_lock;
 };
 
 static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags
  2020-12-14 13:37     ` Jeffrey Layton
@ 2020-12-14 22:00       ` NeilBrown
  2020-12-14 23:32         ` Jeff Layton
  0 siblings, 1 reply; 18+ messages in thread
From: NeilBrown @ 2020-12-14 22:00 UTC (permalink / raw)
  To: Jeffrey Layton
  Cc: Jeff Layton, Amir Goldstein, Sargun Dhillon, Miklos Szeredi,
	Vivek Goyal, overlayfs, Linux FS-devel Mailing List,
	Matthew Wilcox, NeilBrown, Jan Kara

[-- Attachment #1: Type: text/plain, Size: 8678 bytes --]

On Mon, Dec 14 2020, Jeffrey Layton wrote:

> On Mon, Dec 14, 2020 at 10:35:56AM +1100, NeilBrown wrote:
>> On Sun, Dec 13 2020, Jeff Layton wrote:
>> 
>> > Overlayfs's volatile mounts want to be able to sample an error for
>> > their own purposes, without preventing a later opener from potentially
>> > seeing the error.
>> >
>> > The original reason for the SEEN flag was to make it so that we didn't
>> > need to increment the counter if nothing had observed the latest value
>> > and the error was the same. Eventually, a regression was reported in
>> > the errseq_t conversion, and we fixed that by using the SEEN flag to
>> > also mean that the error had been reported to userland at least once
>> > somewhere.
>> >
>> > Those are two different states, however. If we instead take a second
>> > flag bit from the counter, we can track these two things separately,
>> > and accomodate the overlayfs volatile mount use-case.
>> >
>> > Add a new MUSTINC flag that indicates that the counter must be
>> > incremented the next time an error is set, and rework the errseq
>> > functions to set and clear that flag whenever the SEEN bit is set or
>> > cleared.
>> >
>> > Test only for the MUSTINC bit when deciding whether to increment the
>> > counter and only for the SEEN bit when deciding what to return in
>> > errseq_sample.
>> >
>> > Add a new errseq_peek function to allow for the overlayfs use-case.
>> > This just grabs the latest counter and sets the MUSTINC bit, leaving
>> > the SEEN bit untouched.
>> >
>> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> > ---
>> >  include/linux/errseq.h |  2 ++
>> >  lib/errseq.c           | 64 ++++++++++++++++++++++++++++++++++--------
>> >  2 files changed, 55 insertions(+), 11 deletions(-)
>> >
>> > diff --git a/include/linux/errseq.h b/include/linux/errseq.h
>> > index fc2777770768..6d4b9bc629ac 100644
>> > --- a/include/linux/errseq.h
>> > +++ b/include/linux/errseq.h
>> > @@ -9,6 +9,8 @@ typedef u32	errseq_t;
>> >  
>> >  errseq_t errseq_set(errseq_t *eseq, int err);
>> >  errseq_t errseq_sample(errseq_t *eseq);
>> > +errseq_t errseq_peek(errseq_t *eseq);
>> > +errseq_t errseq_sample_advance(errseq_t *eseq);
>> >  int errseq_check(errseq_t *eseq, errseq_t since);
>> >  int errseq_check_and_advance(errseq_t *eseq, errseq_t *since);
>> >  #endif
>> > diff --git a/lib/errseq.c b/lib/errseq.c
>> > index 81f9e33aa7e7..5cc830f0361b 100644
>> > --- a/lib/errseq.c
>> > +++ b/lib/errseq.c
>> > @@ -38,8 +38,11 @@
>> >  /* This bit is used as a flag to indicate whether the value has been seen */
>> >  #define ERRSEQ_SEEN		(1 << ERRSEQ_SHIFT)
>> 
>> Would this look nicer using the BIT() macro?
>> 
>>   #define ERRSEQ_SEEN		BIT(ERRSEQ_SHIFT)
>> 
>> >  
>> > +/* This bit indicates that value must be incremented even when error is same */
>> > +#define ERRSEQ_MUSTINC		(1 << (ERRSEQ_SHIFT + 1))
>> 
>>  #define ERRSEQ_MUSTINC		BIT(ERRSEQ_SHIFT+1)
>> 
>> or if you don't like the BIT macro (not everyone does), then maybe
>> 
>>  #define ERR_SEQ_MUSTINC	(ERRSEQ_SEEN << 1 )
>> 
>> ??
>> 
>> > +
>> >  /* The lowest bit of the counter */
>> > -#define ERRSEQ_CTR_INC		(1 << (ERRSEQ_SHIFT + 1))
>> > +#define ERRSEQ_CTR_INC		(1 << (ERRSEQ_SHIFT + 2))
>> 
>> Ditto.
>> 
>
> Yes, I can make that change. The BIT macro is much easier to read.
>
>> >  
>> >  /**
>> >   * errseq_set - set a errseq_t for later reporting
>> > @@ -77,11 +80,11 @@ errseq_t errseq_set(errseq_t *eseq, int err)
>> >  	for (;;) {
>> >  		errseq_t new;
>> >  
>> > -		/* Clear out error bits and set new error */
>> > -		new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN)) | -err;
>> > +		/* Clear out flag bits and set new error */
>> > +		new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN|ERRSEQ_MUSTINC)) | -err;
>> 
>> This is starting to look clumsy (or maybe, this already looked clumsy,
>> but now that is hard to ignore).
>> 
>> 		new = (old & (ERRSEQ_CTR_INC - 1)) | -err
>> 
>
> I think you mean:
>
> 		new = (old & ~(ERRSEQ_CTR_INC - 1)) | -err;
>
> Maybe I can add a new ERRSEQ_CTR_MASK value though which makes it more
> evident.

Sounds good.

>
>> Also this assumes MAX_ERRNO is a mask, which it is .. today.
>> 
>> 	BUILD_BUG_ON(MAX_ERRNO & (MAX_ERRNO + 1));
>> ??
>> 
>
> We already have this in errseq_set:
>
>         BUILD_BUG_ON_NOT_POWER_OF_2(MAX_ERRNO + 1);

Oh good - I didn't see.

>
>> >  
>> > -		/* Only increment if someone has looked at it */
>> > -		if (old & ERRSEQ_SEEN)
>> > +		/* Only increment if we have to */
>> > +		if (old & ERRSEQ_MUSTINC)
>> >  			new += ERRSEQ_CTR_INC;
>> >  
>> >  		/* If there would be no change, then call it done */
>> > @@ -122,14 +125,50 @@ EXPORT_SYMBOL(errseq_set);
>> >  errseq_t errseq_sample(errseq_t *eseq)
>> >  {
>> >  	errseq_t old = READ_ONCE(*eseq);
>> > +	errseq_t new = old;
>> >  
>> > -	/* If nobody has seen this error yet, then we can be the first. */
>> > -	if (!(old & ERRSEQ_SEEN))
>> > -		old = 0;
>> > -	return old;
>> > +	/*
>> > +	 * For the common case of no errors ever having been set, we can skip
>> > +	 * marking the SEEN|MUSTINC bits. Once an error has been set, the value
>> > +	 * will never go back to zero.
>> > +	 */
>> > +	if (old != 0) {
>> > +		new |= ERRSEQ_SEEN|ERRSEQ_MUSTINC;
>> 
>> You lose me here.  Why is ERRSEQ_SEEN being set, where it wasn't before?
>> 
>> The ERRSEQ_SEEN flag not means precisely "The error has been reported to
>> userspace".
>> This operations isn't used to report errors - that is errseq_check().
>> 
>> I'm not saying the code it wrong - I really cannot tell.
>> I'm just saying that I cannot see why it might be right.
>> 
>
> I think you're right. We should not be setting SEEN here, but we do
> need to set MUSTINC if it's not already set. I'll fix (and re-test).

Thanks.  Though it isn't clear to me why MUSTINC needs to be set there,
so if you could make that clear, it would help me.

Also, the two flags seem similar in how they are handled, only tracking
different states, but their names don't reflect that.
I imagine changing "SEEN" to "MUST_REPORT" or similar, so both flags are
"MUST_XXX".
Only I think we would then need to invert "SEEN" - as it currently means
"MUSTN'T_REPORT" .. approximately.

Or maybe we could replace MUST_INC by DID_INC, so it says what has been
done, rather than what must be done.

Or maybe not.  Certainly it would be useful to have a clear picture of
how the two flags are similar, and how they are different.

Thanks,
NeilBrown


>
> Thanks for the review!
>
>> 
>> 
>> 
>> > +		if (old != new)
>> > +			cmpxchg(eseq, old, new);
>> > +		if (!(old & ERRSEQ_SEEN))
>> > +			return 0;
>> > +	}
>> > +	return new;
>> >  }
>> >  EXPORT_SYMBOL(errseq_sample);
>> >  
>> > +/**
>> > + * errseq_peek - Grab current errseq_t value, but don't mark it SEEN
>> > + * @eseq: Pointer to errseq_t to be sampled.
>> > + *
>> > + * In some cases, we need to be able to sample the errseq_t, but we're not
>> > + * in a situation where we can report the value to userland. Use this
>> > + * function to do that. This ensures that later errors will be recorded,
>> > + * and that any current errors are reported at least once.
>> > + *
>> > + * Context: Any context.
>> > + * Return: The current errseq value.
>> > + */
>> > +errseq_t errseq_peek(errseq_t *eseq)
>> > +{
>> > +	errseq_t old = READ_ONCE(*eseq);
>> > +	errseq_t new = old;
>> > +
>> > +	if (old != 0) {
>> > +		new |= ERRSEQ_MUSTINC;
>> > +		if (old != new)
>> > +			cmpxchg(eseq, old, new);
>> > +	}
>> > +	return new;
>> > +}
>> > +EXPORT_SYMBOL(errseq_peek);
>> > +
>> >  /**
>> >   * errseq_check() - Has an error occurred since a particular sample point?
>> >   * @eseq: Pointer to errseq_t value to be checked.
>> > @@ -143,7 +182,10 @@ EXPORT_SYMBOL(errseq_sample);
>> >   */
>> >  int errseq_check(errseq_t *eseq, errseq_t since)
>> >  {
>> > -	errseq_t cur = READ_ONCE(*eseq);
>> > +	errseq_t cur = READ_ONCE(*eseq) & ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN);
>> > +
>> > +	/* Clear the flag bits for comparison */
>> > +	since &= ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN);
>> >  
>> >  	if (likely(cur == since))
>> >  		return 0;
>> > @@ -195,7 +237,7 @@ int errseq_check_and_advance(errseq_t *eseq, errseq_t *since)
>> >  		 * can advance "since" and return an error based on what we
>> >  		 * have.
>> >  		 */
>> > -		new = old | ERRSEQ_SEEN;
>> > +		new = old | ERRSEQ_SEEN | ERRSEQ_MUSTINC;
>> >  		if (new != old)
>> >  			cmpxchg(eseq, old, new);
>> >  		*since = new;
>> > -- 
>> > 2.29.2

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 853 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs
  2020-12-14 21:38   ` Vivek Goyal
@ 2020-12-14 22:04     ` Sargun Dhillon
  2020-12-14 23:01       ` Vivek Goyal
  2020-12-14 23:53     ` Jeff Layton
  1 sibling, 1 reply; 18+ messages in thread
From: Sargun Dhillon @ 2020-12-14 22:04 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Jeff Layton, Amir Goldstein, Miklos Szeredi, overlayfs,
	Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara

On Mon, Dec 14, 2020 at 04:38:43PM -0500, Vivek Goyal wrote:
> On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote:
> > Peek at the upper layer's errseq_t at mount time for volatile mounts,
> > and record it in the per-sb info. In sync_fs, check for an error since
> > the recorded point and set it in the overlayfs superblock if there was
> > one.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> 
> While we are solving problem for non-volatile overlay mount, I also
> started thinking, what about non-volatile overlay syncfs() writeback errors.
> Looks like these will not be reported to user space at all as of now
> (because we never update overlay_sb->s_wb_err ever).
> 
> A patch like this might fix it. (compile tested only).
> 
> overlayfs: Report syncfs() errors to user space
> 
> Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the
> return code. But certain writeback errors can still be reported on 
> syncfs() by checking errors on super block.
> 
> ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
> 
> For the case of overlayfs, we never set overlayfs super block s_wb_err. That
> means sync() will never report writeback errors on overlayfs uppon syncfs().
> 
> Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that
> should mean that user space syncfs() call should see writeback errors.
> 
> ovl_fsync() does not need anything special because if there are writeback
> errors underlying filesystem will report it through vfs_fsync_range() return
> code and user space will see it.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/ovl_entry.h |    1 +
>  fs/overlayfs/super.c     |   14 +++++++++++---
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> Index: redhat-linux/fs/overlayfs/super.c
> ===================================================================
> --- redhat-linux.orig/fs/overlayfs/super.c	2020-12-14 15:33:43.934400880 -0500
> +++ redhat-linux/fs/overlayfs/super.c	2020-12-14 16:15:07.127400880 -0500
> @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc
>  {
>  	struct ovl_fs *ofs = sb->s_fs_info;
>  	struct super_block *upper_sb;
> -	int ret;
> +	int ret, ret2;
>  
>  	if (!ovl_upper_mnt(ofs))
>  		return 0;
> @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc
>  	ret = sync_filesystem(upper_sb);
>  	up_read(&upper_sb->s_umount);
>  
> -	return ret;
> +	if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
> +		/* Upper sb has errors since last time */
> +		spin_lock(&ofs->errseq_lock);
> +		ret2 = errseq_check_and_advance(&upper_sb->s_wb_err,
> +						&sb->s_wb_err);
> +		spin_unlock(&ofs->errseq_lock);
> +	}
> +	return ret ? ret : ret2;
>  }
>  
>  /**
> @@ -1873,6 +1880,7 @@ static int ovl_fill_super(struct super_b
>  	if (!cred)
>  		goto out_err;
>  
> +	spin_lock_init(&ofs->errseq_lock);
>  	/* Is there a reason anyone would want not to share whiteouts? */
>  	ofs->share_whiteout = true;
>  
> @@ -1945,7 +1953,7 @@ static int ovl_fill_super(struct super_b
>  
>  		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
>  		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> -
> +		sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);
>  	}
>  	oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
>  	err = PTR_ERR(oe);
> Index: redhat-linux/fs/overlayfs/ovl_entry.h
> ===================================================================
> --- redhat-linux.orig/fs/overlayfs/ovl_entry.h	2020-12-14 15:33:43.934400880 -0500
> +++ redhat-linux/fs/overlayfs/ovl_entry.h	2020-12-14 15:34:13.509400880 -0500
> @@ -79,6 +79,7 @@ struct ovl_fs {
>  	atomic_long_t last_ino;
>  	/* Whiteout dentry cache */
>  	struct dentry *whiteout;
> +	spinlock_t errseq_lock;
>  };
>  
>  static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> 

This was on my list of things to look at. I don't think we can / should use 
errseq_check_and_advance because it will hide errors from userspace. I think we 
need something like:

At startup, call errseq_peek and stash that value somewhere. This sets the 
MUSTINC flag.

At syncfs time: call errseq check, if it says there is an error, call 
errseq_peek again, and store the error in our superblock. Take the error value 
from the differenceb between the previous one and the new one, and copy it up to 
the superblock.

Either way, I think Jeff's work of making it so other kernel subsytems can 
interact with errseq on a superblock bears fruit elsewhere. If the first patch 
gets merged, I can put together the patches to do the standard error bubble
up for normal syncfs, volatile syncfs, and volatile remount.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs
  2020-12-14 22:04     ` Sargun Dhillon
@ 2020-12-14 23:01       ` Vivek Goyal
  0 siblings, 0 replies; 18+ messages in thread
From: Vivek Goyal @ 2020-12-14 23:01 UTC (permalink / raw)
  To: Sargun Dhillon
  Cc: Jeff Layton, Amir Goldstein, Miklos Szeredi, overlayfs,
	Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara

On Mon, Dec 14, 2020 at 10:04:14PM +0000, Sargun Dhillon wrote:
> On Mon, Dec 14, 2020 at 04:38:43PM -0500, Vivek Goyal wrote:
> > On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote:
> > > Peek at the upper layer's errseq_t at mount time for volatile mounts,
> > > and record it in the per-sb info. In sync_fs, check for an error since
> > > the recorded point and set it in the overlayfs superblock if there was
> > > one.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > 
> > While we are solving problem for non-volatile overlay mount, I also
> > started thinking, what about non-volatile overlay syncfs() writeback errors.
> > Looks like these will not be reported to user space at all as of now
> > (because we never update overlay_sb->s_wb_err ever).
> > 
> > A patch like this might fix it. (compile tested only).
> > 
> > overlayfs: Report syncfs() errors to user space
> > 
> > Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the
> > return code. But certain writeback errors can still be reported on 
> > syncfs() by checking errors on super block.
> > 
> > ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
> > 
> > For the case of overlayfs, we never set overlayfs super block s_wb_err. That
> > means sync() will never report writeback errors on overlayfs uppon syncfs().
> > 
> > Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that
> > should mean that user space syncfs() call should see writeback errors.
> > 
> > ovl_fsync() does not need anything special because if there are writeback
> > errors underlying filesystem will report it through vfs_fsync_range() return
> > code and user space will see it.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/ovl_entry.h |    1 +
> >  fs/overlayfs/super.c     |   14 +++++++++++---
> >  2 files changed, 12 insertions(+), 3 deletions(-)
> > 
> > Index: redhat-linux/fs/overlayfs/super.c
> > ===================================================================
> > --- redhat-linux.orig/fs/overlayfs/super.c	2020-12-14 15:33:43.934400880 -0500
> > +++ redhat-linux/fs/overlayfs/super.c	2020-12-14 16:15:07.127400880 -0500
> > @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc
> >  {
> >  	struct ovl_fs *ofs = sb->s_fs_info;
> >  	struct super_block *upper_sb;
> > -	int ret;
> > +	int ret, ret2;
> >  
> >  	if (!ovl_upper_mnt(ofs))
> >  		return 0;
> > @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc
> >  	ret = sync_filesystem(upper_sb);
> >  	up_read(&upper_sb->s_umount);
> >  
> > -	return ret;
> > +	if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
> > +		/* Upper sb has errors since last time */
> > +		spin_lock(&ofs->errseq_lock);
> > +		ret2 = errseq_check_and_advance(&upper_sb->s_wb_err,
> > +						&sb->s_wb_err);
> > +		spin_unlock(&ofs->errseq_lock);
> > +	}
> > +	return ret ? ret : ret2;
> >  }
> >  
> >  /**
> > @@ -1873,6 +1880,7 @@ static int ovl_fill_super(struct super_b
> >  	if (!cred)
> >  		goto out_err;
> >  
> > +	spin_lock_init(&ofs->errseq_lock);
> >  	/* Is there a reason anyone would want not to share whiteouts? */
> >  	ofs->share_whiteout = true;
> >  
> > @@ -1945,7 +1953,7 @@ static int ovl_fill_super(struct super_b
> >  
> >  		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
> >  		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> > -
> > +		sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);
> >  	}
> >  	oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
> >  	err = PTR_ERR(oe);
> > Index: redhat-linux/fs/overlayfs/ovl_entry.h
> > ===================================================================
> > --- redhat-linux.orig/fs/overlayfs/ovl_entry.h	2020-12-14 15:33:43.934400880 -0500
> > +++ redhat-linux/fs/overlayfs/ovl_entry.h	2020-12-14 15:34:13.509400880 -0500
> > @@ -79,6 +79,7 @@ struct ovl_fs {
> >  	atomic_long_t last_ino;
> >  	/* Whiteout dentry cache */
> >  	struct dentry *whiteout;
> > +	spinlock_t errseq_lock;
> >  };
> >  
> >  static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> > 
> 
> This was on my list of things to look at. I don't think we can / should use 
> errseq_check_and_advance because it will hide errors from userspace.

Hi Sargun,

I have been struggling to figure out when to use
errseq_check_and_advance() and when to use this error_check() and
errorseq_set() combination.

My rational for using errseq_check_and_advance() is that say there
is an unseen error on upper super block, and if overlayfs calls
syncfs(), then this call should set SEEN flag on upper super
block, isn't it. This is equivalent of an app directly calling
syncfs() on upper.

If we use error_check() and errseq_set() combination, then we
are just reading state of upper superblock but really not impacting
it despite the fact overlay apps are calling syncfs().

Comapre it with fsync(). For non-volatile overlay, an fsync() overlay
call will set SEEN flag on upper file. And I believe same thing
should happen for ovl_syncfs() call as well. It makes more sense to me.

Wondering how will it hide errors from user space. ovl "struct file"
will have its own f->f_sb_err initialized from overlay superblock. And
if overlay super block gets updated with error, a later
errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err) should
still return an error.

What am I missing?

Vivek


> I think we 
> need something like:
> 
> At startup, call errseq_peek and stash that value somewhere. This sets the 
> MUSTINC flag.

So this errseq_peek() stuff is required only if we don't want to 
consume error while checking for error. In fact consuming error
is not bad as long as we do it only ovl_syncfs() path. In fact
it will match current semantics of syncfs().

But issue we have is that we want to check for error outside syncfs()
path too and don't want to consume it (otherwise it breaks the semantics
that it will bee seen marked in syncfs() path).

So that's why this notion of checking error without consuming it
so tha we can check it in remount path.

But in syncfs() path, it should be ok to consume unseen error and
we should be able to call errseq_check_and_advance(), both for
volatile and non-volatile mounts, isn't it?

For ther paths, like remount, we probably can stash away on persistent
storage and compare that value on remount and fail remount without
actually consuming unseen error (because it is not syncfs path). This
possibly can be used in other paths like read/write as well to
make sure we can notice error without consuming it.

IOW, we seem to have to paths we want to check errors in. In ovl_syncfs()
path we should be able consume exisiting unseen error on upper, so
errseq_check_and_advance() makes sense. In rest of the paths, we
should use new semantics to check for errors.

Vivek

> 
> At syncfs time: call errseq check, if it says there is an error, call 
> errseq_peek again, and store the error in our superblock. Take the error value 
> from the differenceb between the previous one and the new one, and copy it up to 
> the superblock.
> 
> Either way, I think Jeff's work of making it so other kernel subsytems can 
> interact with errseq on a superblock bears fruit elsewhere. If the first patch 
> gets merged, I can put together the patches to do the standard error bubble
> up for normal syncfs, volatile syncfs, and volatile remount.
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags
  2020-12-14 22:00       ` NeilBrown
@ 2020-12-14 23:32         ` Jeff Layton
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Layton @ 2020-12-14 23:32 UTC (permalink / raw)
  To: NeilBrown
  Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, Vivek Goyal,
	overlayfs, Linux FS-devel Mailing List, Matthew Wilcox,
	NeilBrown, Jan Kara

On Tue, 2020-12-15 at 09:00 +1100, NeilBrown wrote:
> On Mon, Dec 14 2020, Jeffrey Layton wrote:
> 
> > On Mon, Dec 14, 2020 at 10:35:56AM +1100, NeilBrown wrote:
> > > On Sun, Dec 13 2020, Jeff Layton wrote:
> > > 
> > > > Overlayfs's volatile mounts want to be able to sample an error for
> > > > their own purposes, without preventing a later opener from potentially
> > > > seeing the error.
> > > > 
> > > > The original reason for the SEEN flag was to make it so that we didn't
> > > > need to increment the counter if nothing had observed the latest value
> > > > and the error was the same. Eventually, a regression was reported in
> > > > the errseq_t conversion, and we fixed that by using the SEEN flag to
> > > > also mean that the error had been reported to userland at least once
> > > > somewhere.
> > > > 
> > > > Those are two different states, however. If we instead take a second
> > > > flag bit from the counter, we can track these two things separately,
> > > > and accomodate the overlayfs volatile mount use-case.
> > > > 
> > > > Add a new MUSTINC flag that indicates that the counter must be
> > > > incremented the next time an error is set, and rework the errseq
> > > > functions to set and clear that flag whenever the SEEN bit is set or
> > > > cleared.
> > > > 
> > > > Test only for the MUSTINC bit when deciding whether to increment the
> > > > counter and only for the SEEN bit when deciding what to return in
> > > > errseq_sample.
> > > > 
> > > > Add a new errseq_peek function to allow for the overlayfs use-case.
> > > > This just grabs the latest counter and sets the MUSTINC bit, leaving
> > > > the SEEN bit untouched.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > >  include/linux/errseq.h |  2 ++
> > > >  lib/errseq.c           | 64 ++++++++++++++++++++++++++++++++++--------
> > > >  2 files changed, 55 insertions(+), 11 deletions(-)
> > > > 
> > > > diff --git a/include/linux/errseq.h b/include/linux/errseq.h
> > > > index fc2777770768..6d4b9bc629ac 100644
> > > > --- a/include/linux/errseq.h
> > > > +++ b/include/linux/errseq.h
> > > > @@ -9,6 +9,8 @@ typedef u32	errseq_t;
> > > >  
> > > > 
> > > > 
> > > > 
> > > >  errseq_t errseq_set(errseq_t *eseq, int err);
> > > >  errseq_t errseq_sample(errseq_t *eseq);
> > > > +errseq_t errseq_peek(errseq_t *eseq);
> > > > +errseq_t errseq_sample_advance(errseq_t *eseq);
> > > >  int errseq_check(errseq_t *eseq, errseq_t since);
> > > >  int errseq_check_and_advance(errseq_t *eseq, errseq_t *since);
> > > >  #endif
> > > > diff --git a/lib/errseq.c b/lib/errseq.c
> > > > index 81f9e33aa7e7..5cc830f0361b 100644
> > > > --- a/lib/errseq.c
> > > > +++ b/lib/errseq.c
> > > > @@ -38,8 +38,11 @@
> > > >  /* This bit is used as a flag to indicate whether the value has been seen */
> > > >  #define ERRSEQ_SEEN		(1 << ERRSEQ_SHIFT)
> > > 
> > > Would this look nicer using the BIT() macro?
> > > 
> > >   #define ERRSEQ_SEEN		BIT(ERRSEQ_SHIFT)
> > > 
> > > >  
> > > > 
> > > > 
> > > > 
> > > > +/* This bit indicates that value must be incremented even when error is same */
> > > > +#define ERRSEQ_MUSTINC		(1 << (ERRSEQ_SHIFT + 1))
> > > 
> > >  #define ERRSEQ_MUSTINC		BIT(ERRSEQ_SHIFT+1)
> > > 
> > > or if you don't like the BIT macro (not everyone does), then maybe
> > > 
> > >  #define ERR_SEQ_MUSTINC	(ERRSEQ_SEEN << 1 )
> > > 
> > > ??
> > > 
> > > > +
> > > >  /* The lowest bit of the counter */
> > > > -#define ERRSEQ_CTR_INC		(1 << (ERRSEQ_SHIFT + 1))
> > > > +#define ERRSEQ_CTR_INC		(1 << (ERRSEQ_SHIFT + 2))
> > > 
> > > Ditto.
> > > 
> > 
> > Yes, I can make that change. The BIT macro is much easier to read.
> > 
> > > >  
> > > > 
> > > > 
> > > > 
> > > >  /**
> > > >   * errseq_set - set a errseq_t for later reporting
> > > > @@ -77,11 +80,11 @@ errseq_t errseq_set(errseq_t *eseq, int err)
> > > >  	for (;;) {
> > > >  		errseq_t new;
> > > >  
> > > > 
> > > > 
> > > > 
> > > > -		/* Clear out error bits and set new error */
> > > > -		new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN)) | -err;
> > > > +		/* Clear out flag bits and set new error */
> > > > +		new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN|ERRSEQ_MUSTINC)) | -err;
> > > 
> > > This is starting to look clumsy (or maybe, this already looked clumsy,
> > > but now that is hard to ignore).
> > > 
> > > 		new = (old & (ERRSEQ_CTR_INC - 1)) | -err
> > > 
> > 
> > I think you mean:
> > 
> > 		new = (old & ~(ERRSEQ_CTR_INC - 1)) | -err;
> > 
> > Maybe I can add a new ERRSEQ_CTR_MASK value though which makes it more
> > evident.
> 
> Sounds good.
> 
> > 
> > > Also this assumes MAX_ERRNO is a mask, which it is .. today.
> > > 
> > > 	BUILD_BUG_ON(MAX_ERRNO & (MAX_ERRNO + 1));
> > > ??
> > > 
> > 
> > We already have this in errseq_set:
> > 
> >         BUILD_BUG_ON_NOT_POWER_OF_2(MAX_ERRNO + 1);
> 
> Oh good - I didn't see.
> 
> > 
> > > >  
> > > > 
> > > > 
> > > > 
> > > > -		/* Only increment if someone has looked at it */
> > > > -		if (old & ERRSEQ_SEEN)
> > > > +		/* Only increment if we have to */
> > > > +		if (old & ERRSEQ_MUSTINC)
> > > >  			new += ERRSEQ_CTR_INC;
> > > >  
> > > > 
> > > > 
> > > > 
> > > >  		/* If there would be no change, then call it done */
> > > > @@ -122,14 +125,50 @@ EXPORT_SYMBOL(errseq_set);
> > > >  errseq_t errseq_sample(errseq_t *eseq)
> > > >  {
> > > >  	errseq_t old = READ_ONCE(*eseq);
> > > > +	errseq_t new = old;
> > > >  
> > > > 
> > > > 
> > > > 
> > > > -	/* If nobody has seen this error yet, then we can be the first. */
> > > > -	if (!(old & ERRSEQ_SEEN))
> > > > -		old = 0;
> > > > -	return old;
> > > > +	/*
> > > > +	 * For the common case of no errors ever having been set, we can skip
> > > > +	 * marking the SEEN|MUSTINC bits. Once an error has been set, the value
> > > > +	 * will never go back to zero.
> > > > +	 */
> > > > +	if (old != 0) {
> > > > +		new |= ERRSEQ_SEEN|ERRSEQ_MUSTINC;
> > > 
> > > You lose me here.  Why is ERRSEQ_SEEN being set, where it wasn't before?
> > > 
> > > The ERRSEQ_SEEN flag not means precisely "The error has been reported to
> > > userspace".
> > > This operations isn't used to report errors - that is errseq_check().
> > > 
> > > I'm not saying the code it wrong - I really cannot tell.
> > > I'm just saying that I cannot see why it might be right.
> > > 
> > 
> > I think you're right. We should not be setting SEEN here, but we do
> > need to set MUSTINC if it's not already set. I'll fix (and re-test).
> 
> Thanks.  Though it isn't clear to me why MUSTINC needs to be set there,
> so if you could make that clear, it would help me.
> 
> Also, the two flags seem similar in how they are handled, only tracking
> different states, but their names don't reflect that.
> I imagine changing "SEEN" to "MUST_REPORT" or similar, so both flags are
> "MUST_XXX".
> Only I think we would then need to invert "SEEN" - as it currently means
> "MUSTN'T_REPORT" .. approximately.
> 
> Or maybe we could replace MUST_INC by DID_INC, so it says what has been
> done, rather than what must be done.
> 
> Or maybe not.  Certainly it would be useful to have a clear picture of
> how the two flags are similar, and how they are different.
> 


You need to set MUSTINC in errseq_peek to ensure that the next error
that occurs will be recorded, via the counter being bumped. Otherwise
that increment may be skipped (if no one else observed the last error).

I sent a v2 set before I saw your mail. Hopefully it addresses some of
your concerns.

You're right that the flag naming is a bit awkward. I'm open to
suggestions for names, but I'd probably like to keep the "sense" of the
flags so that I don't need to sort out the logic again. It also works
better with 0 being a special value that way.



> Thanks,
> NeilBrown
> 
> 
> > 
> > Thanks for the review!
> > 
> > > 
> > > 
> > > 
> > > > +		if (old != new)
> > > > +			cmpxchg(eseq, old, new);
> > > > +		if (!(old & ERRSEQ_SEEN))
> > > > +			return 0;
> > > > +	}
> > > > +	return new;
> > > >  }
> > > >  EXPORT_SYMBOL(errseq_sample);
> > > >  
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > +/**
> > > > + * errseq_peek - Grab current errseq_t value, but don't mark it SEEN
> > > > + * @eseq: Pointer to errseq_t to be sampled.
> > > > + *
> > > > + * In some cases, we need to be able to sample the errseq_t, but we're not
> > > > + * in a situation where we can report the value to userland. Use this
> > > > + * function to do that. This ensures that later errors will be recorded,
> > > > + * and that any current errors are reported at least once.
> > > > + *
> > > > + * Context: Any context.
> > > > + * Return: The current errseq value.
> > > > + */
> > > > +errseq_t errseq_peek(errseq_t *eseq)
> > > > +{
> > > > +	errseq_t old = READ_ONCE(*eseq);
> > > > +	errseq_t new = old;
> > > > +
> > > > +	if (old != 0) {
> > > > +		new |= ERRSEQ_MUSTINC;
> > > > +		if (old != new)
> > > > +			cmpxchg(eseq, old, new);
> > > > +	}
> > > > +	return new;
> > > > +}
> > > > +EXPORT_SYMBOL(errseq_peek);
> > > > +
> > > >  /**
> > > >   * errseq_check() - Has an error occurred since a particular sample point?
> > > >   * @eseq: Pointer to errseq_t value to be checked.
> > > > @@ -143,7 +182,10 @@ EXPORT_SYMBOL(errseq_sample);
> > > >   */
> > > >  int errseq_check(errseq_t *eseq, errseq_t since)
> > > >  {
> > > > -	errseq_t cur = READ_ONCE(*eseq);
> > > > +	errseq_t cur = READ_ONCE(*eseq) & ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN);
> > > > +
> > > > +	/* Clear the flag bits for comparison */
> > > > +	since &= ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN);
> > > >  
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > >  	if (likely(cur == since))
> > > >  		return 0;
> > > > @@ -195,7 +237,7 @@ int errseq_check_and_advance(errseq_t *eseq, errseq_t *since)
> > > >  		 * can advance "since" and return an error based on what we
> > > >  		 * have.
> > > >  		 */
> > > > -		new = old | ERRSEQ_SEEN;
> > > > +		new = old | ERRSEQ_SEEN | ERRSEQ_MUSTINC;
> > > >  		if (new != old)
> > > >  			cmpxchg(eseq, old, new);
> > > >  		*since = new;
> > > > -- 
> > > > 2.29.2

-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs
  2020-12-14 21:38   ` Vivek Goyal
  2020-12-14 22:04     ` Sargun Dhillon
@ 2020-12-14 23:53     ` Jeff Layton
  2020-12-15 13:16       ` Jeff Layton
  2020-12-15 15:06       ` Vivek Goyal
  1 sibling, 2 replies; 18+ messages in thread
From: Jeff Layton @ 2020-12-14 23:53 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, overlayfs,
	Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara

On Mon, 2020-12-14 at 16:38 -0500, Vivek Goyal wrote:
> On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote:
> > Peek at the upper layer's errseq_t at mount time for volatile mounts,
> > and record it in the per-sb info. In sync_fs, check for an error since
> > the recorded point and set it in the overlayfs superblock if there was
> > one.
> > 
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> 
> While we are solving problem for non-volatile overlay mount, I also
> started thinking, what about non-volatile overlay syncfs() writeback errors.
> Looks like these will not be reported to user space at all as of now
> (because we never update overlay_sb->s_wb_err ever).
> 
> A patch like this might fix it. (compile tested only).
> 
> overlayfs: Report syncfs() errors to user space
> 
> Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the
> return code. But certain writeback errors can still be reported on 
> syncfs() by checking errors on super block.
> 
> ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
> 
> For the case of overlayfs, we never set overlayfs super block s_wb_err. That
> means sync() will never report writeback errors on overlayfs uppon syncfs().
> 
> Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that
> should mean that user space syncfs() call should see writeback errors.
> 
> ovl_fsync() does not need anything special because if there are writeback
> errors underlying filesystem will report it through vfs_fsync_range() return
> code and user space will see it.
> 
> Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> ---
>  fs/overlayfs/ovl_entry.h |    1 +
>  fs/overlayfs/super.c     |   14 +++++++++++---
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> Index: redhat-linux/fs/overlayfs/super.c
> ===================================================================
> --- redhat-linux.orig/fs/overlayfs/super.c	2020-12-14 15:33:43.934400880 -0500
> +++ redhat-linux/fs/overlayfs/super.c	2020-12-14 16:15:07.127400880 -0500
> @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc
>  {
>  	struct ovl_fs *ofs = sb->s_fs_info;
>  	struct super_block *upper_sb;
> -	int ret;
> +	int ret, ret2;
>  
> 
> 
> 
>  	if (!ovl_upper_mnt(ofs))
>  		return 0;
> @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc
>  	ret = sync_filesystem(upper_sb);
>  	up_read(&upper_sb->s_umount);
>  
> 
> 
> 
> -	return ret;
> +	if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
> +		/* Upper sb has errors since last time */
> +		spin_lock(&ofs->errseq_lock);
> +		ret2 = errseq_check_and_advance(&upper_sb->s_wb_err,
> +						&sb->s_wb_err);
> +		spin_unlock(&ofs->errseq_lock);
> +	}
> +	return ret ? ret : ret2;

I think this is probably not quite right.

The problem I think is that the SEEN flag is always going to end up
being set in sb->s_wb_err, and that is going to violate the desired
semantics. If the writeback error occurred after all fd's were closed,
then the next opener wouldn't see it and you'd lose the error.

We probably need a function to cleanly propagate the error from one
errseq_t to another so that that doesn't occur. I'll have to think about
it.

>  }
>  
> 
> 
> 
>  /**
> @@ -1873,6 +1880,7 @@ static int ovl_fill_super(struct super_b
>  	if (!cred)
>  		goto out_err;
>  
> 
> 
> 
> +	spin_lock_init(&ofs->errseq_lock);
>  	/* Is there a reason anyone would want not to share whiteouts? */
>  	ofs->share_whiteout = true;
>  
> 
> 
> 
> @@ -1945,7 +1953,7 @@ static int ovl_fill_super(struct super_b
>  
> 
> 
> 
>  		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
>  		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> -
> +		sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);
>  	}
>  	oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
>  	err = PTR_ERR(oe);
> Index: redhat-linux/fs/overlayfs/ovl_entry.h
> ===================================================================
> --- redhat-linux.orig/fs/overlayfs/ovl_entry.h	2020-12-14 15:33:43.934400880 -0500
> +++ redhat-linux/fs/overlayfs/ovl_entry.h	2020-12-14 15:34:13.509400880 -0500
> @@ -79,6 +79,7 @@ struct ovl_fs {
>  	atomic_long_t last_ino;
>  	/* Whiteout dentry cache */
>  	struct dentry *whiteout;
> +	spinlock_t errseq_lock;
>  };
>  
> 
> 
> 
>  static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> 

-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs
  2020-12-14 23:53     ` Jeff Layton
@ 2020-12-15 13:16       ` Jeff Layton
  2020-12-15 14:59         ` Vivek Goyal
  2020-12-15 15:06       ` Vivek Goyal
  1 sibling, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2020-12-15 13:16 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, overlayfs,
	Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara

On Mon, 2020-12-14 at 18:53 -0500, Jeff Layton wrote:
> On Mon, 2020-12-14 at 16:38 -0500, Vivek Goyal wrote:
> > On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote:
> > > Peek at the upper layer's errseq_t at mount time for volatile mounts,
> > > and record it in the per-sb info. In sync_fs, check for an error since
> > > the recorded point and set it in the overlayfs superblock if there was
> > > one.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > 
> > While we are solving problem for non-volatile overlay mount, I also
> > started thinking, what about non-volatile overlay syncfs() writeback errors.
> > Looks like these will not be reported to user space at all as of now
> > (because we never update overlay_sb->s_wb_err ever).
> > 
> > A patch like this might fix it. (compile tested only).
> > 
> > overlayfs: Report syncfs() errors to user space
> > 
> > Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the
> > return code. But certain writeback errors can still be reported on 
> > syncfs() by checking errors on super block.
> > 
> > ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
> > 
> > For the case of overlayfs, we never set overlayfs super block s_wb_err. That
> > means sync() will never report writeback errors on overlayfs uppon syncfs().
> > 
> > Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that
> > should mean that user space syncfs() call should see writeback errors.
> > 
> > ovl_fsync() does not need anything special because if there are writeback
> > errors underlying filesystem will report it through vfs_fsync_range() return
> > code and user space will see it.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/ovl_entry.h |    1 +
> >  fs/overlayfs/super.c     |   14 +++++++++++---
> >  2 files changed, 12 insertions(+), 3 deletions(-)
> > 
> > Index: redhat-linux/fs/overlayfs/super.c
> > ===================================================================
> > --- redhat-linux.orig/fs/overlayfs/super.c	2020-12-14 15:33:43.934400880 -0500
> > +++ redhat-linux/fs/overlayfs/super.c	2020-12-14 16:15:07.127400880 -0500
> > @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc
> >  {
> >  	struct ovl_fs *ofs = sb->s_fs_info;
> >  	struct super_block *upper_sb;
> > -	int ret;
> > +	int ret, ret2;
> >  
> > 
> > 
> > 
> >  	if (!ovl_upper_mnt(ofs))
> >  		return 0;
> > @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc
> >  	ret = sync_filesystem(upper_sb);
> >  	up_read(&upper_sb->s_umount);
> >  
> > 
> > 
> > 
> > -	return ret;
> > +	if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
> > +		/* Upper sb has errors since last time */
> > +		spin_lock(&ofs->errseq_lock);
> > +		ret2 = errseq_check_and_advance(&upper_sb->s_wb_err,
> > +						&sb->s_wb_err);
> > +		spin_unlock(&ofs->errseq_lock);
> > +	}
> > +	return ret ? ret : ret2;
> 
> I think this is probably not quite right.
> 
> The problem I think is that the SEEN flag is always going to end up
> being set in sb->s_wb_err, and that is going to violate the desired
> semantics. If the writeback error occurred after all fd's were closed,
> then the next opener wouldn't see it and you'd lose the error.
> 
> We probably need a function to cleanly propagate the error from one
> errseq_t to another so that that doesn't occur. I'll have to think about
> it.
> 

So, the problem is that we can't guarantee that we'll have an open file
when sync_fs is called. So if you do the check_and_advance in the
context of a sync() syscall, you'll effectively ensure that a later
opener on the upper layer won't see the error (since the upper_sb's
errseq_t will be marked SEEN.

It's not clear to me what semantics you want in the following situation:

mount upper layer
mount overlayfs with non-volatile upper layer
do "stuff" on overlayfs, and close all files on overlayfs
get a writeback error on upper layer
call sync() (sync_fs gets run)
open file on upper layer mount
call syncfs() on upper-layer fd

Should that last syncfs error report an error?

Also, suppose if at the end we instead opened a file on overlayfs and
issued the syncfs() there -- should we see the error in that case? 

> >  }
> >  
> > 
> > 
> > 
> >  /**
> > @@ -1873,6 +1880,7 @@ static int ovl_fill_super(struct super_b
> >  	if (!cred)
> >  		goto out_err;
> >  
> > 
> > 
> > 
> > +	spin_lock_init(&ofs->errseq_lock);
> >  	/* Is there a reason anyone would want not to share whiteouts? */
> >  	ofs->share_whiteout = true;
> >  
> > 
> > 
> > 
> > @@ -1945,7 +1953,7 @@ static int ovl_fill_super(struct super_b
> >  
> > 
> > 
> > 
> >  		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
> >  		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> > -
> > +		sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);
> >  	}
> >  	oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
> >  	err = PTR_ERR(oe);
> > Index: redhat-linux/fs/overlayfs/ovl_entry.h
> > ===================================================================
> > --- redhat-linux.orig/fs/overlayfs/ovl_entry.h	2020-12-14 15:33:43.934400880 -0500
> > +++ redhat-linux/fs/overlayfs/ovl_entry.h	2020-12-14 15:34:13.509400880 -0500
> > @@ -79,6 +79,7 @@ struct ovl_fs {
> >  	atomic_long_t last_ino;
> >  	/* Whiteout dentry cache */
> >  	struct dentry *whiteout;
> > +	spinlock_t errseq_lock;
> >  };
> >  
> > 
> > 
> > 
> >  static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> > 
> 

-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs
  2020-12-15 13:16       ` Jeff Layton
@ 2020-12-15 14:59         ` Vivek Goyal
  2020-12-15 15:23           ` Jeff Layton
  0 siblings, 1 reply; 18+ messages in thread
From: Vivek Goyal @ 2020-12-15 14:59 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, overlayfs,
	Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara

On Tue, Dec 15, 2020 at 08:16:12AM -0500, Jeff Layton wrote:
> On Mon, 2020-12-14 at 18:53 -0500, Jeff Layton wrote:
> > On Mon, 2020-12-14 at 16:38 -0500, Vivek Goyal wrote:
> > > On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote:
> > > > Peek at the upper layer's errseq_t at mount time for volatile mounts,
> > > > and record it in the per-sb info. In sync_fs, check for an error since
> > > > the recorded point and set it in the overlayfs superblock if there was
> > > > one.
> > > > 
> > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > ---
> > > 
> > > While we are solving problem for non-volatile overlay mount, I also
> > > started thinking, what about non-volatile overlay syncfs() writeback errors.
> > > Looks like these will not be reported to user space at all as of now
> > > (because we never update overlay_sb->s_wb_err ever).
> > > 
> > > A patch like this might fix it. (compile tested only).
> > > 
> > > overlayfs: Report syncfs() errors to user space
> > > 
> > > Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the
> > > return code. But certain writeback errors can still be reported on 
> > > syncfs() by checking errors on super block.
> > > 
> > > ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
> > > 
> > > For the case of overlayfs, we never set overlayfs super block s_wb_err. That
> > > means sync() will never report writeback errors on overlayfs uppon syncfs().
> > > 
> > > Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that
> > > should mean that user space syncfs() call should see writeback errors.
> > > 
> > > ovl_fsync() does not need anything special because if there are writeback
> > > errors underlying filesystem will report it through vfs_fsync_range() return
> > > code and user space will see it.
> > > 
> > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > ---
> > >  fs/overlayfs/ovl_entry.h |    1 +
> > >  fs/overlayfs/super.c     |   14 +++++++++++---
> > >  2 files changed, 12 insertions(+), 3 deletions(-)
> > > 
> > > Index: redhat-linux/fs/overlayfs/super.c
> > > ===================================================================
> > > --- redhat-linux.orig/fs/overlayfs/super.c	2020-12-14 15:33:43.934400880 -0500
> > > +++ redhat-linux/fs/overlayfs/super.c	2020-12-14 16:15:07.127400880 -0500
> > > @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc
> > >  {
> > >  	struct ovl_fs *ofs = sb->s_fs_info;
> > >  	struct super_block *upper_sb;
> > > -	int ret;
> > > +	int ret, ret2;
> > >  
> > > 
> > > 
> > > 
> > >  	if (!ovl_upper_mnt(ofs))
> > >  		return 0;
> > > @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc
> > >  	ret = sync_filesystem(upper_sb);
> > >  	up_read(&upper_sb->s_umount);
> > >  
> > > 
> > > 
> > > 
> > > -	return ret;
> > > +	if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
> > > +		/* Upper sb has errors since last time */
> > > +		spin_lock(&ofs->errseq_lock);
> > > +		ret2 = errseq_check_and_advance(&upper_sb->s_wb_err,
> > > +						&sb->s_wb_err);
> > > +		spin_unlock(&ofs->errseq_lock);
> > > +	}
> > > +	return ret ? ret : ret2;
> > 
> > I think this is probably not quite right.
> > 
> > The problem I think is that the SEEN flag is always going to end up
> > being set in sb->s_wb_err, and that is going to violate the desired
> > semantics. If the writeback error occurred after all fd's were closed,
> > then the next opener wouldn't see it and you'd lose the error.
> > 
> > We probably need a function to cleanly propagate the error from one
> > errseq_t to another so that that doesn't occur. I'll have to think about
> > it.
> > 
> 
> So, the problem is that we can't guarantee that we'll have an open file
> when sync_fs is called. So if you do the check_and_advance in the
> context of a sync() syscall, you'll effectively ensure that a later
> opener on the upper layer won't see the error (since the upper_sb's
> errseq_t will be marked SEEN.

Aha.., I assumed that when ->sync_fs() is called, we always have a
valid fd open. But that's only true if ->sync_fs() is being called
through syncfs(fd) syscall. For the case of plain sync() syscall,
this is not true.

So it leads us back to need of passing "struct file" in ->sync_fs().
And fetching the writeback error from upper can be done only
if a file is open on which syncfs() has been called.

> 
> It's not clear to me what semantics you want in the following situation:
> 
> mount upper layer
> mount overlayfs with non-volatile upper layer
> do "stuff" on overlayfs, and close all files on overlayfs
> get a writeback error on upper layer
> call sync() (sync_fs gets run)
> open file on upper layer mount
> call syncfs() on upper-layer fd
> 
> Should that last syncfs error report an error?

Actually, I was thinking of following.
- mount upper layer
- mount overlayfs (non-volatile)
- Do bunch of writes.
- A writeback error happens on upper file and gets recorded in
  upper fs sb.
- overlay application calls syncfs(fd) and gets the error back. IIUC,
  the way currently things are written, syncfs(fd) will not return
  writeback errors on overlayfs.

> 
> Also, suppose if at the end we instead opened a file on overlayfs and
> issued the syncfs() there -- should we see the error in that case? 

I am thinking that behavior should be similar to as if two file
descriptors have been opened on a regular filesystem. So if I open
one fd1 on overlay and one fd2 on upper and they both were opened
before writeback error happend, then syncfs(fd1) and syncfs(fd2),
both should see the error.

And any of syncfs(fd1) and syncfs(fd2) should set the SEEN flag in 
upper_sb so that new errors can continue to be reported.

IOW, so looks like major problem with this patch is that we need
to propagate error from upper_sb to overlaysb only if a valid
file descriptor is open. IOW, do this in syncfs(fd) path and not
sync() path. And to distinguish between two, we probably need to
pass additional parameter in ->sync_fs().

Am I missing somehting. Just trying to make sure that if we are
solving the problem of syncfs error propagation in overlay, lets
solve it both for volatile as well as non-volatile case so that
there is less confusion later.

Vivek


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs
  2020-12-14 23:53     ` Jeff Layton
  2020-12-15 13:16       ` Jeff Layton
@ 2020-12-15 15:06       ` Vivek Goyal
  1 sibling, 0 replies; 18+ messages in thread
From: Vivek Goyal @ 2020-12-15 15:06 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, overlayfs,
	Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara

On Mon, Dec 14, 2020 at 06:53:10PM -0500, Jeff Layton wrote:
> On Mon, 2020-12-14 at 16:38 -0500, Vivek Goyal wrote:
> > On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote:
> > > Peek at the upper layer's errseq_t at mount time for volatile mounts,
> > > and record it in the per-sb info. In sync_fs, check for an error since
> > > the recorded point and set it in the overlayfs superblock if there was
> > > one.
> > > 
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > 
> > While we are solving problem for non-volatile overlay mount, I also
> > started thinking, what about non-volatile overlay syncfs() writeback errors.
> > Looks like these will not be reported to user space at all as of now
> > (because we never update overlay_sb->s_wb_err ever).
> > 
> > A patch like this might fix it. (compile tested only).
> > 
> > overlayfs: Report syncfs() errors to user space
> > 
> > Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the
> > return code. But certain writeback errors can still be reported on 
> > syncfs() by checking errors on super block.
> > 
> > ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
> > 
> > For the case of overlayfs, we never set overlayfs super block s_wb_err. That
> > means sync() will never report writeback errors on overlayfs uppon syncfs().
> > 
> > Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that
> > should mean that user space syncfs() call should see writeback errors.
> > 
> > ovl_fsync() does not need anything special because if there are writeback
> > errors underlying filesystem will report it through vfs_fsync_range() return
> > code and user space will see it.
> > 
> > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > ---
> >  fs/overlayfs/ovl_entry.h |    1 +
> >  fs/overlayfs/super.c     |   14 +++++++++++---
> >  2 files changed, 12 insertions(+), 3 deletions(-)
> > 
> > Index: redhat-linux/fs/overlayfs/super.c
> > ===================================================================
> > --- redhat-linux.orig/fs/overlayfs/super.c	2020-12-14 15:33:43.934400880 -0500
> > +++ redhat-linux/fs/overlayfs/super.c	2020-12-14 16:15:07.127400880 -0500
> > @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc
> >  {
> >  	struct ovl_fs *ofs = sb->s_fs_info;
> >  	struct super_block *upper_sb;
> > -	int ret;
> > +	int ret, ret2;
> >  
> > 
> > 
> > 
> >  	if (!ovl_upper_mnt(ofs))
> >  		return 0;
> > @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc
> >  	ret = sync_filesystem(upper_sb);
> >  	up_read(&upper_sb->s_umount);
> >  
> > 
> > 
> > 
> > -	return ret;
> > +	if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
> > +		/* Upper sb has errors since last time */
> > +		spin_lock(&ofs->errseq_lock);
> > +		ret2 = errseq_check_and_advance(&upper_sb->s_wb_err,
> > +						&sb->s_wb_err);
> > +		spin_unlock(&ofs->errseq_lock);
> > +	}
> > +	return ret ? ret : ret2;
> 
> I think this is probably not quite right.
> 
> The problem I think is that the SEEN flag is always going to end up
> being set in sb->s_wb_err, and that is going to violate the desired
> semantics. If the writeback error occurred after all fd's were closed,
> then the next opener wouldn't see it and you'd lose the error.

So this will happen only due to sync() path and not syncfs() path, right?
If we avoid calling errseq_check_and_advance() in sync() path in
ovleryafs(), then we always have a valid fd and marking error SEEN
on upper is perfectly valid?

> 
> We probably need a function to cleanly propagate the error from one
> errseq_t to another so that that doesn't occur. I'll have to think about
> it.

Thanks
Vivek

> 
> >  }
> >  
> > 
> > 
> > 
> >  /**
> > @@ -1873,6 +1880,7 @@ static int ovl_fill_super(struct super_b
> >  	if (!cred)
> >  		goto out_err;
> >  
> > 
> > 
> > 
> > +	spin_lock_init(&ofs->errseq_lock);
> >  	/* Is there a reason anyone would want not to share whiteouts? */
> >  	ofs->share_whiteout = true;
> >  
> > 
> > 
> > 
> > @@ -1945,7 +1953,7 @@ static int ovl_fill_super(struct super_b
> >  
> > 
> > 
> > 
> >  		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
> >  		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> > -
> > +		sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);
> >  	}
> >  	oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
> >  	err = PTR_ERR(oe);
> > Index: redhat-linux/fs/overlayfs/ovl_entry.h
> > ===================================================================
> > --- redhat-linux.orig/fs/overlayfs/ovl_entry.h	2020-12-14 15:33:43.934400880 -0500
> > +++ redhat-linux/fs/overlayfs/ovl_entry.h	2020-12-14 15:34:13.509400880 -0500
> > @@ -79,6 +79,7 @@ struct ovl_fs {
> >  	atomic_long_t last_ino;
> >  	/* Whiteout dentry cache */
> >  	struct dentry *whiteout;
> > +	spinlock_t errseq_lock;
> >  };
> >  
> > 
> > 
> > 
> >  static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> > 
> 
> -- 
> Jeff Layton <jlayton@kernel.org>
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs
  2020-12-15 14:59         ` Vivek Goyal
@ 2020-12-15 15:23           ` Jeff Layton
  2020-12-15 15:39             ` Vivek Goyal
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Layton @ 2020-12-15 15:23 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, overlayfs,
	Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara

On Tue, 2020-12-15 at 09:59 -0500, Vivek Goyal wrote:
> On Tue, Dec 15, 2020 at 08:16:12AM -0500, Jeff Layton wrote:
> > On Mon, 2020-12-14 at 18:53 -0500, Jeff Layton wrote:
> > > On Mon, 2020-12-14 at 16:38 -0500, Vivek Goyal wrote:
> > > > On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote:
> > > > > Peek at the upper layer's errseq_t at mount time for volatile mounts,
> > > > > and record it in the per-sb info. In sync_fs, check for an error since
> > > > > the recorded point and set it in the overlayfs superblock if there was
> > > > > one.
> > > > > 
> > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > ---
> > > > 
> > > > While we are solving problem for non-volatile overlay mount, I also
> > > > started thinking, what about non-volatile overlay syncfs() writeback errors.
> > > > Looks like these will not be reported to user space at all as of now
> > > > (because we never update overlay_sb->s_wb_err ever).
> > > > 
> > > > A patch like this might fix it. (compile tested only).
> > > > 
> > > > overlayfs: Report syncfs() errors to user space
> > > > 
> > > > Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the
> > > > return code. But certain writeback errors can still be reported on 
> > > > syncfs() by checking errors on super block.
> > > > 
> > > > ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
> > > > 
> > > > For the case of overlayfs, we never set overlayfs super block s_wb_err. That
> > > > means sync() will never report writeback errors on overlayfs uppon syncfs().
> > > > 
> > > > Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that
> > > > should mean that user space syncfs() call should see writeback errors.
> > > > 
> > > > ovl_fsync() does not need anything special because if there are writeback
> > > > errors underlying filesystem will report it through vfs_fsync_range() return
> > > > code and user space will see it.
> > > > 
> > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > > ---
> > > >  fs/overlayfs/ovl_entry.h |    1 +
> > > >  fs/overlayfs/super.c     |   14 +++++++++++---
> > > >  2 files changed, 12 insertions(+), 3 deletions(-)
> > > > 
> > > > Index: redhat-linux/fs/overlayfs/super.c
> > > > ===================================================================
> > > > --- redhat-linux.orig/fs/overlayfs/super.c	2020-12-14 15:33:43.934400880 -0500
> > > > +++ redhat-linux/fs/overlayfs/super.c	2020-12-14 16:15:07.127400880 -0500
> > > > @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc
> > > >  {
> > > >  	struct ovl_fs *ofs = sb->s_fs_info;
> > > >  	struct super_block *upper_sb;
> > > > -	int ret;
> > > > +	int ret, ret2;
> > > >  
> > > > 
> > > > 
> > > > 
> > > >  	if (!ovl_upper_mnt(ofs))
> > > >  		return 0;
> > > > @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc
> > > >  	ret = sync_filesystem(upper_sb);
> > > >  	up_read(&upper_sb->s_umount);
> > > >  
> > > > 
> > > > 
> > > > 
> > > > -	return ret;
> > > > +	if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
> > > > +		/* Upper sb has errors since last time */
> > > > +		spin_lock(&ofs->errseq_lock);
> > > > +		ret2 = errseq_check_and_advance(&upper_sb->s_wb_err,
> > > > +						&sb->s_wb_err);
> > > > +		spin_unlock(&ofs->errseq_lock);
> > > > +	}
> > > > +	return ret ? ret : ret2;
> > > 
> > > I think this is probably not quite right.
> > > 
> > > The problem I think is that the SEEN flag is always going to end up
> > > being set in sb->s_wb_err, and that is going to violate the desired
> > > semantics. If the writeback error occurred after all fd's were closed,
> > > then the next opener wouldn't see it and you'd lose the error.
> > > 
> > > We probably need a function to cleanly propagate the error from one
> > > errseq_t to another so that that doesn't occur. I'll have to think about
> > > it.
> > > 
> > 
> > So, the problem is that we can't guarantee that we'll have an open file
> > when sync_fs is called. So if you do the check_and_advance in the
> > context of a sync() syscall, you'll effectively ensure that a later
> > opener on the upper layer won't see the error (since the upper_sb's
> > errseq_t will be marked SEEN.
> 
> Aha.., I assumed that when ->sync_fs() is called, we always have a
> valid fd open. But that's only true if ->sync_fs() is being called
> through syncfs(fd) syscall. For the case of plain sync() syscall,
> this is not true.
> 
> So it leads us back to need of passing "struct file" in ->sync_fs().
> And fetching the writeback error from upper can be done only
> if a file is open on which syncfs() has been called.
> 
> > 
> > It's not clear to me what semantics you want in the following situation:
> > 
> > mount upper layer
> > mount overlayfs with non-volatile upper layer
> > do "stuff" on overlayfs, and close all files on overlayfs
> > get a writeback error on upper layer
> > call sync() (sync_fs gets run)
> > open file on upper layer mount
> > call syncfs() on upper-layer fd
> > 
> > Should that last syncfs error report an error?
> 
> Actually, I was thinking of following.
> - mount upper layer
> - mount overlayfs (non-volatile)
> - Do bunch of writes.
> - A writeback error happens on upper file and gets recorded in
>   upper fs sb.
> - overlay application calls syncfs(fd) and gets the error back. IIUC,
>   the way currently things are written, syncfs(fd) will not return
>   writeback errors on overlayfs.
> 
> > 
> > Also, suppose if at the end we instead opened a file on overlayfs and
> > issued the syncfs() there -- should we see the error in that case? 
> 
> I am thinking that behavior should be similar to as if two file
> descriptors have been opened on a regular filesystem. So if I open
> one fd1 on overlay and one fd2 on upper and they both were opened
> before writeback error happend, then syncfs(fd1) and syncfs(fd2),
> both should see the error.
> 


Yes, that will happen as a matter of course.

> And any of syncfs(fd1) and syncfs(fd2) should set the SEEN flag in 
> upper_sb so that new errors can continue to be reported.
> 

The SEEN flag indicates whether a later opener should see an error that
predated the open. Currently, it will iff no one else has scraped the
error when the open is done.

Once we start dealing with overlayfs though, things are a bit more
murky. If someone issues a sync on the upper sb and that triggers a
writeback error. If I then do an open+syncfs on the overlay, should I
see the error?

What about in the reverse case?

> IOW, so looks like major problem with this patch is that we need
> to propagate error from upper_sb to overlaysb only if a valid
> file descriptor is open. IOW, do this in syncfs(fd) path and not
> sync() path. And to distinguish between two, we probably need to
> pass additional parameter in ->sync_fs().
> 
> Am I missing somehting. Just trying to make sure that if we are
> solving the problem of syncfs error propagation in overlay, lets
> solve it both for volatile as well as non-volatile case so that
> there is less confusion later.
> 

It may be possible to propagate the errors in some fashion, but it's
starting to sound pretty complex. I think we'd probably be better served
by cleaning things up so that overlayfs can just return an error of its
choosing to syncfs().

What may actually be best is to add a new ->syncfs op to struct
file_operations, and turn the current syncfs syscall wrapper into a
generic_syncfs or something. Then you could just define a syncfs op for
overlayfs and do what you like in there.

-- 
Jeff Layton <jlayton@kernel.org>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs
  2020-12-15 15:23           ` Jeff Layton
@ 2020-12-15 15:39             ` Vivek Goyal
  0 siblings, 0 replies; 18+ messages in thread
From: Vivek Goyal @ 2020-12-15 15:39 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, overlayfs,
	Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara

On Tue, Dec 15, 2020 at 10:23:08AM -0500, Jeff Layton wrote:
> On Tue, 2020-12-15 at 09:59 -0500, Vivek Goyal wrote:
> > On Tue, Dec 15, 2020 at 08:16:12AM -0500, Jeff Layton wrote:
> > > On Mon, 2020-12-14 at 18:53 -0500, Jeff Layton wrote:
> > > > On Mon, 2020-12-14 at 16:38 -0500, Vivek Goyal wrote:
> > > > > On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote:
> > > > > > Peek at the upper layer's errseq_t at mount time for volatile mounts,
> > > > > > and record it in the per-sb info. In sync_fs, check for an error since
> > > > > > the recorded point and set it in the overlayfs superblock if there was
> > > > > > one.
> > > > > > 
> > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > > > > ---
> > > > > 
> > > > > While we are solving problem for non-volatile overlay mount, I also
> > > > > started thinking, what about non-volatile overlay syncfs() writeback errors.
> > > > > Looks like these will not be reported to user space at all as of now
> > > > > (because we never update overlay_sb->s_wb_err ever).
> > > > > 
> > > > > A patch like this might fix it. (compile tested only).
> > > > > 
> > > > > overlayfs: Report syncfs() errors to user space
> > > > > 
> > > > > Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the
> > > > > return code. But certain writeback errors can still be reported on 
> > > > > syncfs() by checking errors on super block.
> > > > > 
> > > > > ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
> > > > > 
> > > > > For the case of overlayfs, we never set overlayfs super block s_wb_err. That
> > > > > means sync() will never report writeback errors on overlayfs uppon syncfs().
> > > > > 
> > > > > Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that
> > > > > should mean that user space syncfs() call should see writeback errors.
> > > > > 
> > > > > ovl_fsync() does not need anything special because if there are writeback
> > > > > errors underlying filesystem will report it through vfs_fsync_range() return
> > > > > code and user space will see it.
> > > > > 
> > > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > > > ---
> > > > >  fs/overlayfs/ovl_entry.h |    1 +
> > > > >  fs/overlayfs/super.c     |   14 +++++++++++---
> > > > >  2 files changed, 12 insertions(+), 3 deletions(-)
> > > > > 
> > > > > Index: redhat-linux/fs/overlayfs/super.c
> > > > > ===================================================================
> > > > > --- redhat-linux.orig/fs/overlayfs/super.c	2020-12-14 15:33:43.934400880 -0500
> > > > > +++ redhat-linux/fs/overlayfs/super.c	2020-12-14 16:15:07.127400880 -0500
> > > > > @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc
> > > > >  {
> > > > >  	struct ovl_fs *ofs = sb->s_fs_info;
> > > > >  	struct super_block *upper_sb;
> > > > > -	int ret;
> > > > > +	int ret, ret2;
> > > > >  
> > > > > 
> > > > > 
> > > > > 
> > > > >  	if (!ovl_upper_mnt(ofs))
> > > > >  		return 0;
> > > > > @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc
> > > > >  	ret = sync_filesystem(upper_sb);
> > > > >  	up_read(&upper_sb->s_umount);
> > > > >  
> > > > > 
> > > > > 
> > > > > 
> > > > > -	return ret;
> > > > > +	if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) {
> > > > > +		/* Upper sb has errors since last time */
> > > > > +		spin_lock(&ofs->errseq_lock);
> > > > > +		ret2 = errseq_check_and_advance(&upper_sb->s_wb_err,
> > > > > +						&sb->s_wb_err);
> > > > > +		spin_unlock(&ofs->errseq_lock);
> > > > > +	}
> > > > > +	return ret ? ret : ret2;
> > > > 
> > > > I think this is probably not quite right.
> > > > 
> > > > The problem I think is that the SEEN flag is always going to end up
> > > > being set in sb->s_wb_err, and that is going to violate the desired
> > > > semantics. If the writeback error occurred after all fd's were closed,
> > > > then the next opener wouldn't see it and you'd lose the error.
> > > > 
> > > > We probably need a function to cleanly propagate the error from one
> > > > errseq_t to another so that that doesn't occur. I'll have to think about
> > > > it.
> > > > 
> > > 
> > > So, the problem is that we can't guarantee that we'll have an open file
> > > when sync_fs is called. So if you do the check_and_advance in the
> > > context of a sync() syscall, you'll effectively ensure that a later
> > > opener on the upper layer won't see the error (since the upper_sb's
> > > errseq_t will be marked SEEN.
> > 
> > Aha.., I assumed that when ->sync_fs() is called, we always have a
> > valid fd open. But that's only true if ->sync_fs() is being called
> > through syncfs(fd) syscall. For the case of plain sync() syscall,
> > this is not true.
> > 
> > So it leads us back to need of passing "struct file" in ->sync_fs().
> > And fetching the writeback error from upper can be done only
> > if a file is open on which syncfs() has been called.
> > 
> > > 
> > > It's not clear to me what semantics you want in the following situation:
> > > 
> > > mount upper layer
> > > mount overlayfs with non-volatile upper layer
> > > do "stuff" on overlayfs, and close all files on overlayfs
> > > get a writeback error on upper layer
> > > call sync() (sync_fs gets run)
> > > open file on upper layer mount
> > > call syncfs() on upper-layer fd
> > > 
> > > Should that last syncfs error report an error?
> > 
> > Actually, I was thinking of following.
> > - mount upper layer
> > - mount overlayfs (non-volatile)
> > - Do bunch of writes.
> > - A writeback error happens on upper file and gets recorded in
> >   upper fs sb.
> > - overlay application calls syncfs(fd) and gets the error back. IIUC,
> >   the way currently things are written, syncfs(fd) will not return
> >   writeback errors on overlayfs.
> > 
> > > 
> > > Also, suppose if at the end we instead opened a file on overlayfs and
> > > issued the syncfs() there -- should we see the error in that case? 
> > 
> > I am thinking that behavior should be similar to as if two file
> > descriptors have been opened on a regular filesystem. So if I open
> > one fd1 on overlay and one fd2 on upper and they both were opened
> > before writeback error happend, then syncfs(fd1) and syncfs(fd2),
> > both should see the error.
> > 
> 
> 
> Yes, that will happen as a matter of course.
> 
> > And any of syncfs(fd1) and syncfs(fd2) should set the SEEN flag in 
> > upper_sb so that new errors can continue to be reported.
> > 
> 
> The SEEN flag indicates whether a later opener should see an error that
> predated the open. Currently, it will iff no one else has scraped the
> error when the open is done.
> 
> Once we start dealing with overlayfs though, things are a bit more
> murky. If someone issues a sync on the upper sb and that triggers a
> writeback error. If I then do an open+syncfs on the overlay, should I
> see the error?

I think that yes, open+syncfs on the overlay should see this UNSEEN error.
IOW, this will be similar to as if somebody did an open+syncfs on upper
and scrapped UNSEEN error.

> 
> What about in the reverse case?

Same for reverse case. If overlayfs triggered sync and resulted in
in unseen error on upper sb, then a later open+syncfs on upper should
see the error.

Thanks
Vivek

> 
> > IOW, so looks like major problem with this patch is that we need
> > to propagate error from upper_sb to overlaysb only if a valid
> > file descriptor is open. IOW, do this in syncfs(fd) path and not
> > sync() path. And to distinguish between two, we probably need to
> > pass additional parameter in ->sync_fs().
> > 
> > Am I missing somehting. Just trying to make sure that if we are
> > solving the problem of syncfs error propagation in overlay, lets
> > solve it both for volatile as well as non-volatile case so that
> > there is less confusion later.
> > 
> 
> It may be possible to propagate the errors in some fashion, but it's
> starting to sound pretty complex. I think we'd probably be better served
> by cleaning things up so that overlayfs can just return an error of its
> choosing to syncfs().
> 
> What may actually be best is to add a new ->syncfs op to struct
> file_operations, and turn the current syncfs syscall wrapper into a
> generic_syncfs or something. Then you could just define a syncfs op for
> overlayfs and do what you like in there.
> 
> -- 
> Jeff Layton <jlayton@kernel.org>
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs
  2020-12-13 13:27 ` [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs Jeff Layton
  2020-12-14 21:38   ` Vivek Goyal
@ 2020-12-17 19:28   ` Sargun Dhillon
  1 sibling, 0 replies; 18+ messages in thread
From: Sargun Dhillon @ 2020-12-17 19:28 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Amir Goldstein, Miklos Szeredi, Vivek Goyal, overlayfs,
	Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara

On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote:
> Peek at the upper layer's errseq_t at mount time for volatile mounts,
> and record it in the per-sb info. In sync_fs, check for an error since
> the recorded point and set it in the overlayfs superblock if there was
> one.
> 
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  fs/overlayfs/ovl_entry.h |  1 +
>  fs/overlayfs/super.c     | 14 +++++++++++---
>  2 files changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h
> index 1b5a2094df8e..fcfcc3951973 100644
> --- a/fs/overlayfs/ovl_entry.h
> +++ b/fs/overlayfs/ovl_entry.h
> @@ -79,6 +79,7 @@ struct ovl_fs {
>  	atomic_long_t last_ino;
>  	/* Whiteout dentry cache */
>  	struct dentry *whiteout;
> +	errseq_t err_mark;
>  };
>  
>  static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs)
> diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
> index 290983bcfbb3..2985d2752970 100644
> --- a/fs/overlayfs/super.c
> +++ b/fs/overlayfs/super.c
> @@ -264,8 +264,13 @@ static int ovl_sync_fs(struct super_block *sb, int wait)
>  	if (!ovl_upper_mnt(ofs))
>  		return 0;
>  
> -	if (!ovl_should_sync(ofs))
> -		return 0;
> +	if (!ovl_should_sync(ofs)) {
> +		/* Propagate errors from upper to overlayfs */
> +		ret = errseq_check(&upper_sb->s_wb_err, ofs->err_mark);
> +		errseq_set(&sb->s_wb_err, ret);
> +		return ret;
> +	}
> +
>  	/*
>  	 * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC).
>  	 * All the super blocks will be iterated, including upper_sb.
> @@ -1945,8 +1950,11 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
>  
>  		sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth;
>  		sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran;
> -
>  	}
> +
> +	if (ofs->config.ovl_volatile)
> +		ofs->err_mark = errseq_peek(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err);
> +
>  	oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers);
>  	err = PTR_ERR(oe);
>  	if (IS_ERR(oe))
> -- 
> 2.29.2
> 

I've tested this with the following scenarios, seems to work:
Test:
1. Mount ext2 on /mnt/loop, and cause a writeback error
2. Verify syncfs on /mnt/loop shows error
3. Mount volatile filesystem  
4. Create file on volatile filesystem, and verify that I can syncfs it without error
---
Fork:

5a. Create a file on overlayfs, and generate a writeback error
6a. Syncfs overlayfs.
7a. Create a new file on overlayfs, and syncfs, and verify it returns error

---
5b. Create a file on loop back, and generate a writeback error
6b. Sync said file
7b. Verify syncfs on loop returns error once, and then success on next attempts
8b. Verify all syncfs on overlayfs now fail

---
5c. Create file on overlayfs, and generate a writeback error
6c. Sync overlayfs, and verify all syncs are failures               
7c. Verify syncfs on loop fails once.





^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-12-17 19:29 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-12-13 13:27 [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case Jeff Layton
2020-12-13 13:27 ` [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags Jeff Layton
2020-12-13 23:35   ` NeilBrown
2020-12-14 13:37     ` Jeffrey Layton
2020-12-14 22:00       ` NeilBrown
2020-12-14 23:32         ` Jeff Layton
2020-12-13 13:27 ` [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs Jeff Layton
2020-12-14 21:38   ` Vivek Goyal
2020-12-14 22:04     ` Sargun Dhillon
2020-12-14 23:01       ` Vivek Goyal
2020-12-14 23:53     ` Jeff Layton
2020-12-15 13:16       ` Jeff Layton
2020-12-15 14:59         ` Vivek Goyal
2020-12-15 15:23           ` Jeff Layton
2020-12-15 15:39             ` Vivek Goyal
2020-12-15 15:06       ` Vivek Goyal
2020-12-17 19:28   ` Sargun Dhillon
2020-12-13 20:31 ` [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case Sargun Dhillon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).