* [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case @ 2020-12-13 13:27 Jeff Layton 2020-12-13 13:27 ` [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags Jeff Layton ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Jeff Layton @ 2020-12-13 13:27 UTC (permalink / raw) To: Amir Goldstein, Sargun Dhillon Cc: Miklos Szeredi, Vivek Goyal, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara What about this as an alternate approach to the problem that Sargun has been working on? I have some minor concerns about the complexity of managing a stateful object across two different words. That can be done, but I think this may be simpler. This set steals an extra flag bit from the errseq_t counter so that we have two flags: one indicating whether to increment the counter at set time, and another to indicate whether the error has been reported to userland. This should give you the semantics you want in the syncfs case, no? If this does look like it's a suitable approach, then I'll plan to clean up the comments and docs. I have a vague feeling that this might help us eventually kill the AS_EIO and AS_ENOSPC bits too, but that would require a bit more work to plumb in "since" samples at appropriate places. Jeff Layton (2): errseq: split the SEEN flag into two new flags overlayfs: propagate errors from upper to overlay sb in sync_fs fs/overlayfs/ovl_entry.h | 1 + fs/overlayfs/super.c | 14 +++++++-- include/linux/errseq.h | 2 ++ lib/errseq.c | 64 +++++++++++++++++++++++++++++++++------- 4 files changed, 67 insertions(+), 14 deletions(-) -- 2.29.2 ^ permalink raw reply [flat|nested] 18+ messages in thread
* [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags 2020-12-13 13:27 [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case Jeff Layton @ 2020-12-13 13:27 ` Jeff Layton 2020-12-13 23:35 ` NeilBrown 2020-12-13 13:27 ` [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs Jeff Layton 2020-12-13 20:31 ` [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case Sargun Dhillon 2 siblings, 1 reply; 18+ messages in thread From: Jeff Layton @ 2020-12-13 13:27 UTC (permalink / raw) To: Amir Goldstein, Sargun Dhillon Cc: Miklos Szeredi, Vivek Goyal, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara Overlayfs's volatile mounts want to be able to sample an error for their own purposes, without preventing a later opener from potentially seeing the error. The original reason for the SEEN flag was to make it so that we didn't need to increment the counter if nothing had observed the latest value and the error was the same. Eventually, a regression was reported in the errseq_t conversion, and we fixed that by using the SEEN flag to also mean that the error had been reported to userland at least once somewhere. Those are two different states, however. If we instead take a second flag bit from the counter, we can track these two things separately, and accomodate the overlayfs volatile mount use-case. Add a new MUSTINC flag that indicates that the counter must be incremented the next time an error is set, and rework the errseq functions to set and clear that flag whenever the SEEN bit is set or cleared. Test only for the MUSTINC bit when deciding whether to increment the counter and only for the SEEN bit when deciding what to return in errseq_sample. Add a new errseq_peek function to allow for the overlayfs use-case. This just grabs the latest counter and sets the MUSTINC bit, leaving the SEEN bit untouched. Signed-off-by: Jeff Layton <jlayton@kernel.org> --- include/linux/errseq.h | 2 ++ lib/errseq.c | 64 ++++++++++++++++++++++++++++++++++-------- 2 files changed, 55 insertions(+), 11 deletions(-) diff --git a/include/linux/errseq.h b/include/linux/errseq.h index fc2777770768..6d4b9bc629ac 100644 --- a/include/linux/errseq.h +++ b/include/linux/errseq.h @@ -9,6 +9,8 @@ typedef u32 errseq_t; errseq_t errseq_set(errseq_t *eseq, int err); errseq_t errseq_sample(errseq_t *eseq); +errseq_t errseq_peek(errseq_t *eseq); +errseq_t errseq_sample_advance(errseq_t *eseq); int errseq_check(errseq_t *eseq, errseq_t since); int errseq_check_and_advance(errseq_t *eseq, errseq_t *since); #endif diff --git a/lib/errseq.c b/lib/errseq.c index 81f9e33aa7e7..5cc830f0361b 100644 --- a/lib/errseq.c +++ b/lib/errseq.c @@ -38,8 +38,11 @@ /* This bit is used as a flag to indicate whether the value has been seen */ #define ERRSEQ_SEEN (1 << ERRSEQ_SHIFT) +/* This bit indicates that value must be incremented even when error is same */ +#define ERRSEQ_MUSTINC (1 << (ERRSEQ_SHIFT + 1)) + /* The lowest bit of the counter */ -#define ERRSEQ_CTR_INC (1 << (ERRSEQ_SHIFT + 1)) +#define ERRSEQ_CTR_INC (1 << (ERRSEQ_SHIFT + 2)) /** * errseq_set - set a errseq_t for later reporting @@ -77,11 +80,11 @@ errseq_t errseq_set(errseq_t *eseq, int err) for (;;) { errseq_t new; - /* Clear out error bits and set new error */ - new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN)) | -err; + /* Clear out flag bits and set new error */ + new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN|ERRSEQ_MUSTINC)) | -err; - /* Only increment if someone has looked at it */ - if (old & ERRSEQ_SEEN) + /* Only increment if we have to */ + if (old & ERRSEQ_MUSTINC) new += ERRSEQ_CTR_INC; /* If there would be no change, then call it done */ @@ -122,14 +125,50 @@ EXPORT_SYMBOL(errseq_set); errseq_t errseq_sample(errseq_t *eseq) { errseq_t old = READ_ONCE(*eseq); + errseq_t new = old; - /* If nobody has seen this error yet, then we can be the first. */ - if (!(old & ERRSEQ_SEEN)) - old = 0; - return old; + /* + * For the common case of no errors ever having been set, we can skip + * marking the SEEN|MUSTINC bits. Once an error has been set, the value + * will never go back to zero. + */ + if (old != 0) { + new |= ERRSEQ_SEEN|ERRSEQ_MUSTINC; + if (old != new) + cmpxchg(eseq, old, new); + if (!(old & ERRSEQ_SEEN)) + return 0; + } + return new; } EXPORT_SYMBOL(errseq_sample); +/** + * errseq_peek - Grab current errseq_t value, but don't mark it SEEN + * @eseq: Pointer to errseq_t to be sampled. + * + * In some cases, we need to be able to sample the errseq_t, but we're not + * in a situation where we can report the value to userland. Use this + * function to do that. This ensures that later errors will be recorded, + * and that any current errors are reported at least once. + * + * Context: Any context. + * Return: The current errseq value. + */ +errseq_t errseq_peek(errseq_t *eseq) +{ + errseq_t old = READ_ONCE(*eseq); + errseq_t new = old; + + if (old != 0) { + new |= ERRSEQ_MUSTINC; + if (old != new) + cmpxchg(eseq, old, new); + } + return new; +} +EXPORT_SYMBOL(errseq_peek); + /** * errseq_check() - Has an error occurred since a particular sample point? * @eseq: Pointer to errseq_t value to be checked. @@ -143,7 +182,10 @@ EXPORT_SYMBOL(errseq_sample); */ int errseq_check(errseq_t *eseq, errseq_t since) { - errseq_t cur = READ_ONCE(*eseq); + errseq_t cur = READ_ONCE(*eseq) & ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN); + + /* Clear the flag bits for comparison */ + since &= ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN); if (likely(cur == since)) return 0; @@ -195,7 +237,7 @@ int errseq_check_and_advance(errseq_t *eseq, errseq_t *since) * can advance "since" and return an error based on what we * have. */ - new = old | ERRSEQ_SEEN; + new = old | ERRSEQ_SEEN | ERRSEQ_MUSTINC; if (new != old) cmpxchg(eseq, old, new); *since = new; -- 2.29.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags 2020-12-13 13:27 ` [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags Jeff Layton @ 2020-12-13 23:35 ` NeilBrown 2020-12-14 13:37 ` Jeffrey Layton 0 siblings, 1 reply; 18+ messages in thread From: NeilBrown @ 2020-12-13 23:35 UTC (permalink / raw) To: Jeff Layton, Amir Goldstein, Sargun Dhillon Cc: Miklos Szeredi, Vivek Goyal, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara [-- Attachment #1: Type: text/plain, Size: 6688 bytes --] On Sun, Dec 13 2020, Jeff Layton wrote: > Overlayfs's volatile mounts want to be able to sample an error for > their own purposes, without preventing a later opener from potentially > seeing the error. > > The original reason for the SEEN flag was to make it so that we didn't > need to increment the counter if nothing had observed the latest value > and the error was the same. Eventually, a regression was reported in > the errseq_t conversion, and we fixed that by using the SEEN flag to > also mean that the error had been reported to userland at least once > somewhere. > > Those are two different states, however. If we instead take a second > flag bit from the counter, we can track these two things separately, > and accomodate the overlayfs volatile mount use-case. > > Add a new MUSTINC flag that indicates that the counter must be > incremented the next time an error is set, and rework the errseq > functions to set and clear that flag whenever the SEEN bit is set or > cleared. > > Test only for the MUSTINC bit when deciding whether to increment the > counter and only for the SEEN bit when deciding what to return in > errseq_sample. > > Add a new errseq_peek function to allow for the overlayfs use-case. > This just grabs the latest counter and sets the MUSTINC bit, leaving > the SEEN bit untouched. > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- > include/linux/errseq.h | 2 ++ > lib/errseq.c | 64 ++++++++++++++++++++++++++++++++++-------- > 2 files changed, 55 insertions(+), 11 deletions(-) > > diff --git a/include/linux/errseq.h b/include/linux/errseq.h > index fc2777770768..6d4b9bc629ac 100644 > --- a/include/linux/errseq.h > +++ b/include/linux/errseq.h > @@ -9,6 +9,8 @@ typedef u32 errseq_t; > > errseq_t errseq_set(errseq_t *eseq, int err); > errseq_t errseq_sample(errseq_t *eseq); > +errseq_t errseq_peek(errseq_t *eseq); > +errseq_t errseq_sample_advance(errseq_t *eseq); > int errseq_check(errseq_t *eseq, errseq_t since); > int errseq_check_and_advance(errseq_t *eseq, errseq_t *since); > #endif > diff --git a/lib/errseq.c b/lib/errseq.c > index 81f9e33aa7e7..5cc830f0361b 100644 > --- a/lib/errseq.c > +++ b/lib/errseq.c > @@ -38,8 +38,11 @@ > /* This bit is used as a flag to indicate whether the value has been seen */ > #define ERRSEQ_SEEN (1 << ERRSEQ_SHIFT) Would this look nicer using the BIT() macro? #define ERRSEQ_SEEN BIT(ERRSEQ_SHIFT) > > +/* This bit indicates that value must be incremented even when error is same */ > +#define ERRSEQ_MUSTINC (1 << (ERRSEQ_SHIFT + 1)) #define ERRSEQ_MUSTINC BIT(ERRSEQ_SHIFT+1) or if you don't like the BIT macro (not everyone does), then maybe #define ERR_SEQ_MUSTINC (ERRSEQ_SEEN << 1 ) ?? > + > /* The lowest bit of the counter */ > -#define ERRSEQ_CTR_INC (1 << (ERRSEQ_SHIFT + 1)) > +#define ERRSEQ_CTR_INC (1 << (ERRSEQ_SHIFT + 2)) Ditto. > > /** > * errseq_set - set a errseq_t for later reporting > @@ -77,11 +80,11 @@ errseq_t errseq_set(errseq_t *eseq, int err) > for (;;) { > errseq_t new; > > - /* Clear out error bits and set new error */ > - new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN)) | -err; > + /* Clear out flag bits and set new error */ > + new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN|ERRSEQ_MUSTINC)) | -err; This is starting to look clumsy (or maybe, this already looked clumsy, but now that is hard to ignore). new = (old & (ERRSEQ_CTR_INC - 1)) | -err Also this assumes MAX_ERRNO is a mask, which it is .. today. BUILD_BUG_ON(MAX_ERRNO & (MAX_ERRNO + 1)); ?? > > - /* Only increment if someone has looked at it */ > - if (old & ERRSEQ_SEEN) > + /* Only increment if we have to */ > + if (old & ERRSEQ_MUSTINC) > new += ERRSEQ_CTR_INC; > > /* If there would be no change, then call it done */ > @@ -122,14 +125,50 @@ EXPORT_SYMBOL(errseq_set); > errseq_t errseq_sample(errseq_t *eseq) > { > errseq_t old = READ_ONCE(*eseq); > + errseq_t new = old; > > - /* If nobody has seen this error yet, then we can be the first. */ > - if (!(old & ERRSEQ_SEEN)) > - old = 0; > - return old; > + /* > + * For the common case of no errors ever having been set, we can skip > + * marking the SEEN|MUSTINC bits. Once an error has been set, the value > + * will never go back to zero. > + */ > + if (old != 0) { > + new |= ERRSEQ_SEEN|ERRSEQ_MUSTINC; You lose me here. Why is ERRSEQ_SEEN being set, where it wasn't before? The ERRSEQ_SEEN flag not means precisely "The error has been reported to userspace". This operations isn't used to report errors - that is errseq_check(). I'm not saying the code it wrong - I really cannot tell. I'm just saying that I cannot see why it might be right. Thanks, NeilBrown > + if (old != new) > + cmpxchg(eseq, old, new); > + if (!(old & ERRSEQ_SEEN)) > + return 0; > + } > + return new; > } > EXPORT_SYMBOL(errseq_sample); > > +/** > + * errseq_peek - Grab current errseq_t value, but don't mark it SEEN > + * @eseq: Pointer to errseq_t to be sampled. > + * > + * In some cases, we need to be able to sample the errseq_t, but we're not > + * in a situation where we can report the value to userland. Use this > + * function to do that. This ensures that later errors will be recorded, > + * and that any current errors are reported at least once. > + * > + * Context: Any context. > + * Return: The current errseq value. > + */ > +errseq_t errseq_peek(errseq_t *eseq) > +{ > + errseq_t old = READ_ONCE(*eseq); > + errseq_t new = old; > + > + if (old != 0) { > + new |= ERRSEQ_MUSTINC; > + if (old != new) > + cmpxchg(eseq, old, new); > + } > + return new; > +} > +EXPORT_SYMBOL(errseq_peek); > + > /** > * errseq_check() - Has an error occurred since a particular sample point? > * @eseq: Pointer to errseq_t value to be checked. > @@ -143,7 +182,10 @@ EXPORT_SYMBOL(errseq_sample); > */ > int errseq_check(errseq_t *eseq, errseq_t since) > { > - errseq_t cur = READ_ONCE(*eseq); > + errseq_t cur = READ_ONCE(*eseq) & ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN); > + > + /* Clear the flag bits for comparison */ > + since &= ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN); > > if (likely(cur == since)) > return 0; > @@ -195,7 +237,7 @@ int errseq_check_and_advance(errseq_t *eseq, errseq_t *since) > * can advance "since" and return an error based on what we > * have. > */ > - new = old | ERRSEQ_SEEN; > + new = old | ERRSEQ_SEEN | ERRSEQ_MUSTINC; > if (new != old) > cmpxchg(eseq, old, new); > *since = new; > -- > 2.29.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 853 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags 2020-12-13 23:35 ` NeilBrown @ 2020-12-14 13:37 ` Jeffrey Layton 2020-12-14 22:00 ` NeilBrown 0 siblings, 1 reply; 18+ messages in thread From: Jeffrey Layton @ 2020-12-14 13:37 UTC (permalink / raw) To: NeilBrown Cc: Jeff Layton, Amir Goldstein, Sargun Dhillon, Miklos Szeredi, Vivek Goyal, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara On Mon, Dec 14, 2020 at 10:35:56AM +1100, NeilBrown wrote: > On Sun, Dec 13 2020, Jeff Layton wrote: > > > Overlayfs's volatile mounts want to be able to sample an error for > > their own purposes, without preventing a later opener from potentially > > seeing the error. > > > > The original reason for the SEEN flag was to make it so that we didn't > > need to increment the counter if nothing had observed the latest value > > and the error was the same. Eventually, a regression was reported in > > the errseq_t conversion, and we fixed that by using the SEEN flag to > > also mean that the error had been reported to userland at least once > > somewhere. > > > > Those are two different states, however. If we instead take a second > > flag bit from the counter, we can track these two things separately, > > and accomodate the overlayfs volatile mount use-case. > > > > Add a new MUSTINC flag that indicates that the counter must be > > incremented the next time an error is set, and rework the errseq > > functions to set and clear that flag whenever the SEEN bit is set or > > cleared. > > > > Test only for the MUSTINC bit when deciding whether to increment the > > counter and only for the SEEN bit when deciding what to return in > > errseq_sample. > > > > Add a new errseq_peek function to allow for the overlayfs use-case. > > This just grabs the latest counter and sets the MUSTINC bit, leaving > > the SEEN bit untouched. > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > --- > > include/linux/errseq.h | 2 ++ > > lib/errseq.c | 64 ++++++++++++++++++++++++++++++++++-------- > > 2 files changed, 55 insertions(+), 11 deletions(-) > > > > diff --git a/include/linux/errseq.h b/include/linux/errseq.h > > index fc2777770768..6d4b9bc629ac 100644 > > --- a/include/linux/errseq.h > > +++ b/include/linux/errseq.h > > @@ -9,6 +9,8 @@ typedef u32 errseq_t; > > > > errseq_t errseq_set(errseq_t *eseq, int err); > > errseq_t errseq_sample(errseq_t *eseq); > > +errseq_t errseq_peek(errseq_t *eseq); > > +errseq_t errseq_sample_advance(errseq_t *eseq); > > int errseq_check(errseq_t *eseq, errseq_t since); > > int errseq_check_and_advance(errseq_t *eseq, errseq_t *since); > > #endif > > diff --git a/lib/errseq.c b/lib/errseq.c > > index 81f9e33aa7e7..5cc830f0361b 100644 > > --- a/lib/errseq.c > > +++ b/lib/errseq.c > > @@ -38,8 +38,11 @@ > > /* This bit is used as a flag to indicate whether the value has been seen */ > > #define ERRSEQ_SEEN (1 << ERRSEQ_SHIFT) > > Would this look nicer using the BIT() macro? > > #define ERRSEQ_SEEN BIT(ERRSEQ_SHIFT) > > > > > +/* This bit indicates that value must be incremented even when error is same */ > > +#define ERRSEQ_MUSTINC (1 << (ERRSEQ_SHIFT + 1)) > > #define ERRSEQ_MUSTINC BIT(ERRSEQ_SHIFT+1) > > or if you don't like the BIT macro (not everyone does), then maybe > > #define ERR_SEQ_MUSTINC (ERRSEQ_SEEN << 1 ) > > ?? > > > + > > /* The lowest bit of the counter */ > > -#define ERRSEQ_CTR_INC (1 << (ERRSEQ_SHIFT + 1)) > > +#define ERRSEQ_CTR_INC (1 << (ERRSEQ_SHIFT + 2)) > > Ditto. > Yes, I can make that change. The BIT macro is much easier to read. > > > > /** > > * errseq_set - set a errseq_t for later reporting > > @@ -77,11 +80,11 @@ errseq_t errseq_set(errseq_t *eseq, int err) > > for (;;) { > > errseq_t new; > > > > - /* Clear out error bits and set new error */ > > - new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN)) | -err; > > + /* Clear out flag bits and set new error */ > > + new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN|ERRSEQ_MUSTINC)) | -err; > > This is starting to look clumsy (or maybe, this already looked clumsy, > but now that is hard to ignore). > > new = (old & (ERRSEQ_CTR_INC - 1)) | -err > I think you mean: new = (old & ~(ERRSEQ_CTR_INC - 1)) | -err; Maybe I can add a new ERRSEQ_CTR_MASK value though which makes it more evident. > Also this assumes MAX_ERRNO is a mask, which it is .. today. > > BUILD_BUG_ON(MAX_ERRNO & (MAX_ERRNO + 1)); > ?? > We already have this in errseq_set: BUILD_BUG_ON_NOT_POWER_OF_2(MAX_ERRNO + 1); > > > > - /* Only increment if someone has looked at it */ > > - if (old & ERRSEQ_SEEN) > > + /* Only increment if we have to */ > > + if (old & ERRSEQ_MUSTINC) > > new += ERRSEQ_CTR_INC; > > > > /* If there would be no change, then call it done */ > > @@ -122,14 +125,50 @@ EXPORT_SYMBOL(errseq_set); > > errseq_t errseq_sample(errseq_t *eseq) > > { > > errseq_t old = READ_ONCE(*eseq); > > + errseq_t new = old; > > > > - /* If nobody has seen this error yet, then we can be the first. */ > > - if (!(old & ERRSEQ_SEEN)) > > - old = 0; > > - return old; > > + /* > > + * For the common case of no errors ever having been set, we can skip > > + * marking the SEEN|MUSTINC bits. Once an error has been set, the value > > + * will never go back to zero. > > + */ > > + if (old != 0) { > > + new |= ERRSEQ_SEEN|ERRSEQ_MUSTINC; > > You lose me here. Why is ERRSEQ_SEEN being set, where it wasn't before? > > The ERRSEQ_SEEN flag not means precisely "The error has been reported to > userspace". > This operations isn't used to report errors - that is errseq_check(). > > I'm not saying the code it wrong - I really cannot tell. > I'm just saying that I cannot see why it might be right. > I think you're right. We should not be setting SEEN here, but we do need to set MUSTINC if it's not already set. I'll fix (and re-test). Thanks for the review! > > > > > + if (old != new) > > + cmpxchg(eseq, old, new); > > + if (!(old & ERRSEQ_SEEN)) > > + return 0; > > + } > > + return new; > > } > > EXPORT_SYMBOL(errseq_sample); > > > > +/** > > + * errseq_peek - Grab current errseq_t value, but don't mark it SEEN > > + * @eseq: Pointer to errseq_t to be sampled. > > + * > > + * In some cases, we need to be able to sample the errseq_t, but we're not > > + * in a situation where we can report the value to userland. Use this > > + * function to do that. This ensures that later errors will be recorded, > > + * and that any current errors are reported at least once. > > + * > > + * Context: Any context. > > + * Return: The current errseq value. > > + */ > > +errseq_t errseq_peek(errseq_t *eseq) > > +{ > > + errseq_t old = READ_ONCE(*eseq); > > + errseq_t new = old; > > + > > + if (old != 0) { > > + new |= ERRSEQ_MUSTINC; > > + if (old != new) > > + cmpxchg(eseq, old, new); > > + } > > + return new; > > +} > > +EXPORT_SYMBOL(errseq_peek); > > + > > /** > > * errseq_check() - Has an error occurred since a particular sample point? > > * @eseq: Pointer to errseq_t value to be checked. > > @@ -143,7 +182,10 @@ EXPORT_SYMBOL(errseq_sample); > > */ > > int errseq_check(errseq_t *eseq, errseq_t since) > > { > > - errseq_t cur = READ_ONCE(*eseq); > > + errseq_t cur = READ_ONCE(*eseq) & ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN); > > + > > + /* Clear the flag bits for comparison */ > > + since &= ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN); > > > > if (likely(cur == since)) > > return 0; > > @@ -195,7 +237,7 @@ int errseq_check_and_advance(errseq_t *eseq, errseq_t *since) > > * can advance "since" and return an error based on what we > > * have. > > */ > > - new = old | ERRSEQ_SEEN; > > + new = old | ERRSEQ_SEEN | ERRSEQ_MUSTINC; > > if (new != old) > > cmpxchg(eseq, old, new); > > *since = new; > > -- > > 2.29.2 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags 2020-12-14 13:37 ` Jeffrey Layton @ 2020-12-14 22:00 ` NeilBrown 2020-12-14 23:32 ` Jeff Layton 0 siblings, 1 reply; 18+ messages in thread From: NeilBrown @ 2020-12-14 22:00 UTC (permalink / raw) To: Jeffrey Layton Cc: Jeff Layton, Amir Goldstein, Sargun Dhillon, Miklos Szeredi, Vivek Goyal, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara [-- Attachment #1: Type: text/plain, Size: 8678 bytes --] On Mon, Dec 14 2020, Jeffrey Layton wrote: > On Mon, Dec 14, 2020 at 10:35:56AM +1100, NeilBrown wrote: >> On Sun, Dec 13 2020, Jeff Layton wrote: >> >> > Overlayfs's volatile mounts want to be able to sample an error for >> > their own purposes, without preventing a later opener from potentially >> > seeing the error. >> > >> > The original reason for the SEEN flag was to make it so that we didn't >> > need to increment the counter if nothing had observed the latest value >> > and the error was the same. Eventually, a regression was reported in >> > the errseq_t conversion, and we fixed that by using the SEEN flag to >> > also mean that the error had been reported to userland at least once >> > somewhere. >> > >> > Those are two different states, however. If we instead take a second >> > flag bit from the counter, we can track these two things separately, >> > and accomodate the overlayfs volatile mount use-case. >> > >> > Add a new MUSTINC flag that indicates that the counter must be >> > incremented the next time an error is set, and rework the errseq >> > functions to set and clear that flag whenever the SEEN bit is set or >> > cleared. >> > >> > Test only for the MUSTINC bit when deciding whether to increment the >> > counter and only for the SEEN bit when deciding what to return in >> > errseq_sample. >> > >> > Add a new errseq_peek function to allow for the overlayfs use-case. >> > This just grabs the latest counter and sets the MUSTINC bit, leaving >> > the SEEN bit untouched. >> > >> > Signed-off-by: Jeff Layton <jlayton@kernel.org> >> > --- >> > include/linux/errseq.h | 2 ++ >> > lib/errseq.c | 64 ++++++++++++++++++++++++++++++++++-------- >> > 2 files changed, 55 insertions(+), 11 deletions(-) >> > >> > diff --git a/include/linux/errseq.h b/include/linux/errseq.h >> > index fc2777770768..6d4b9bc629ac 100644 >> > --- a/include/linux/errseq.h >> > +++ b/include/linux/errseq.h >> > @@ -9,6 +9,8 @@ typedef u32 errseq_t; >> > >> > errseq_t errseq_set(errseq_t *eseq, int err); >> > errseq_t errseq_sample(errseq_t *eseq); >> > +errseq_t errseq_peek(errseq_t *eseq); >> > +errseq_t errseq_sample_advance(errseq_t *eseq); >> > int errseq_check(errseq_t *eseq, errseq_t since); >> > int errseq_check_and_advance(errseq_t *eseq, errseq_t *since); >> > #endif >> > diff --git a/lib/errseq.c b/lib/errseq.c >> > index 81f9e33aa7e7..5cc830f0361b 100644 >> > --- a/lib/errseq.c >> > +++ b/lib/errseq.c >> > @@ -38,8 +38,11 @@ >> > /* This bit is used as a flag to indicate whether the value has been seen */ >> > #define ERRSEQ_SEEN (1 << ERRSEQ_SHIFT) >> >> Would this look nicer using the BIT() macro? >> >> #define ERRSEQ_SEEN BIT(ERRSEQ_SHIFT) >> >> > >> > +/* This bit indicates that value must be incremented even when error is same */ >> > +#define ERRSEQ_MUSTINC (1 << (ERRSEQ_SHIFT + 1)) >> >> #define ERRSEQ_MUSTINC BIT(ERRSEQ_SHIFT+1) >> >> or if you don't like the BIT macro (not everyone does), then maybe >> >> #define ERR_SEQ_MUSTINC (ERRSEQ_SEEN << 1 ) >> >> ?? >> >> > + >> > /* The lowest bit of the counter */ >> > -#define ERRSEQ_CTR_INC (1 << (ERRSEQ_SHIFT + 1)) >> > +#define ERRSEQ_CTR_INC (1 << (ERRSEQ_SHIFT + 2)) >> >> Ditto. >> > > Yes, I can make that change. The BIT macro is much easier to read. > >> > >> > /** >> > * errseq_set - set a errseq_t for later reporting >> > @@ -77,11 +80,11 @@ errseq_t errseq_set(errseq_t *eseq, int err) >> > for (;;) { >> > errseq_t new; >> > >> > - /* Clear out error bits and set new error */ >> > - new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN)) | -err; >> > + /* Clear out flag bits and set new error */ >> > + new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN|ERRSEQ_MUSTINC)) | -err; >> >> This is starting to look clumsy (or maybe, this already looked clumsy, >> but now that is hard to ignore). >> >> new = (old & (ERRSEQ_CTR_INC - 1)) | -err >> > > I think you mean: > > new = (old & ~(ERRSEQ_CTR_INC - 1)) | -err; > > Maybe I can add a new ERRSEQ_CTR_MASK value though which makes it more > evident. Sounds good. > >> Also this assumes MAX_ERRNO is a mask, which it is .. today. >> >> BUILD_BUG_ON(MAX_ERRNO & (MAX_ERRNO + 1)); >> ?? >> > > We already have this in errseq_set: > > BUILD_BUG_ON_NOT_POWER_OF_2(MAX_ERRNO + 1); Oh good - I didn't see. > >> > >> > - /* Only increment if someone has looked at it */ >> > - if (old & ERRSEQ_SEEN) >> > + /* Only increment if we have to */ >> > + if (old & ERRSEQ_MUSTINC) >> > new += ERRSEQ_CTR_INC; >> > >> > /* If there would be no change, then call it done */ >> > @@ -122,14 +125,50 @@ EXPORT_SYMBOL(errseq_set); >> > errseq_t errseq_sample(errseq_t *eseq) >> > { >> > errseq_t old = READ_ONCE(*eseq); >> > + errseq_t new = old; >> > >> > - /* If nobody has seen this error yet, then we can be the first. */ >> > - if (!(old & ERRSEQ_SEEN)) >> > - old = 0; >> > - return old; >> > + /* >> > + * For the common case of no errors ever having been set, we can skip >> > + * marking the SEEN|MUSTINC bits. Once an error has been set, the value >> > + * will never go back to zero. >> > + */ >> > + if (old != 0) { >> > + new |= ERRSEQ_SEEN|ERRSEQ_MUSTINC; >> >> You lose me here. Why is ERRSEQ_SEEN being set, where it wasn't before? >> >> The ERRSEQ_SEEN flag not means precisely "The error has been reported to >> userspace". >> This operations isn't used to report errors - that is errseq_check(). >> >> I'm not saying the code it wrong - I really cannot tell. >> I'm just saying that I cannot see why it might be right. >> > > I think you're right. We should not be setting SEEN here, but we do > need to set MUSTINC if it's not already set. I'll fix (and re-test). Thanks. Though it isn't clear to me why MUSTINC needs to be set there, so if you could make that clear, it would help me. Also, the two flags seem similar in how they are handled, only tracking different states, but their names don't reflect that. I imagine changing "SEEN" to "MUST_REPORT" or similar, so both flags are "MUST_XXX". Only I think we would then need to invert "SEEN" - as it currently means "MUSTN'T_REPORT" .. approximately. Or maybe we could replace MUST_INC by DID_INC, so it says what has been done, rather than what must be done. Or maybe not. Certainly it would be useful to have a clear picture of how the two flags are similar, and how they are different. Thanks, NeilBrown > > Thanks for the review! > >> >> >> >> > + if (old != new) >> > + cmpxchg(eseq, old, new); >> > + if (!(old & ERRSEQ_SEEN)) >> > + return 0; >> > + } >> > + return new; >> > } >> > EXPORT_SYMBOL(errseq_sample); >> > >> > +/** >> > + * errseq_peek - Grab current errseq_t value, but don't mark it SEEN >> > + * @eseq: Pointer to errseq_t to be sampled. >> > + * >> > + * In some cases, we need to be able to sample the errseq_t, but we're not >> > + * in a situation where we can report the value to userland. Use this >> > + * function to do that. This ensures that later errors will be recorded, >> > + * and that any current errors are reported at least once. >> > + * >> > + * Context: Any context. >> > + * Return: The current errseq value. >> > + */ >> > +errseq_t errseq_peek(errseq_t *eseq) >> > +{ >> > + errseq_t old = READ_ONCE(*eseq); >> > + errseq_t new = old; >> > + >> > + if (old != 0) { >> > + new |= ERRSEQ_MUSTINC; >> > + if (old != new) >> > + cmpxchg(eseq, old, new); >> > + } >> > + return new; >> > +} >> > +EXPORT_SYMBOL(errseq_peek); >> > + >> > /** >> > * errseq_check() - Has an error occurred since a particular sample point? >> > * @eseq: Pointer to errseq_t value to be checked. >> > @@ -143,7 +182,10 @@ EXPORT_SYMBOL(errseq_sample); >> > */ >> > int errseq_check(errseq_t *eseq, errseq_t since) >> > { >> > - errseq_t cur = READ_ONCE(*eseq); >> > + errseq_t cur = READ_ONCE(*eseq) & ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN); >> > + >> > + /* Clear the flag bits for comparison */ >> > + since &= ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN); >> > >> > if (likely(cur == since)) >> > return 0; >> > @@ -195,7 +237,7 @@ int errseq_check_and_advance(errseq_t *eseq, errseq_t *since) >> > * can advance "since" and return an error based on what we >> > * have. >> > */ >> > - new = old | ERRSEQ_SEEN; >> > + new = old | ERRSEQ_SEEN | ERRSEQ_MUSTINC; >> > if (new != old) >> > cmpxchg(eseq, old, new); >> > *since = new; >> > -- >> > 2.29.2 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 853 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags 2020-12-14 22:00 ` NeilBrown @ 2020-12-14 23:32 ` Jeff Layton 0 siblings, 0 replies; 18+ messages in thread From: Jeff Layton @ 2020-12-14 23:32 UTC (permalink / raw) To: NeilBrown Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, Vivek Goyal, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara On Tue, 2020-12-15 at 09:00 +1100, NeilBrown wrote: > On Mon, Dec 14 2020, Jeffrey Layton wrote: > > > On Mon, Dec 14, 2020 at 10:35:56AM +1100, NeilBrown wrote: > > > On Sun, Dec 13 2020, Jeff Layton wrote: > > > > > > > Overlayfs's volatile mounts want to be able to sample an error for > > > > their own purposes, without preventing a later opener from potentially > > > > seeing the error. > > > > > > > > The original reason for the SEEN flag was to make it so that we didn't > > > > need to increment the counter if nothing had observed the latest value > > > > and the error was the same. Eventually, a regression was reported in > > > > the errseq_t conversion, and we fixed that by using the SEEN flag to > > > > also mean that the error had been reported to userland at least once > > > > somewhere. > > > > > > > > Those are two different states, however. If we instead take a second > > > > flag bit from the counter, we can track these two things separately, > > > > and accomodate the overlayfs volatile mount use-case. > > > > > > > > Add a new MUSTINC flag that indicates that the counter must be > > > > incremented the next time an error is set, and rework the errseq > > > > functions to set and clear that flag whenever the SEEN bit is set or > > > > cleared. > > > > > > > > Test only for the MUSTINC bit when deciding whether to increment the > > > > counter and only for the SEEN bit when deciding what to return in > > > > errseq_sample. > > > > > > > > Add a new errseq_peek function to allow for the overlayfs use-case. > > > > This just grabs the latest counter and sets the MUSTINC bit, leaving > > > > the SEEN bit untouched. > > > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > --- > > > > include/linux/errseq.h | 2 ++ > > > > lib/errseq.c | 64 ++++++++++++++++++++++++++++++++++-------- > > > > 2 files changed, 55 insertions(+), 11 deletions(-) > > > > > > > > diff --git a/include/linux/errseq.h b/include/linux/errseq.h > > > > index fc2777770768..6d4b9bc629ac 100644 > > > > --- a/include/linux/errseq.h > > > > +++ b/include/linux/errseq.h > > > > @@ -9,6 +9,8 @@ typedef u32 errseq_t; > > > > > > > > > > > > > > > > > > > > errseq_t errseq_set(errseq_t *eseq, int err); > > > > errseq_t errseq_sample(errseq_t *eseq); > > > > +errseq_t errseq_peek(errseq_t *eseq); > > > > +errseq_t errseq_sample_advance(errseq_t *eseq); > > > > int errseq_check(errseq_t *eseq, errseq_t since); > > > > int errseq_check_and_advance(errseq_t *eseq, errseq_t *since); > > > > #endif > > > > diff --git a/lib/errseq.c b/lib/errseq.c > > > > index 81f9e33aa7e7..5cc830f0361b 100644 > > > > --- a/lib/errseq.c > > > > +++ b/lib/errseq.c > > > > @@ -38,8 +38,11 @@ > > > > /* This bit is used as a flag to indicate whether the value has been seen */ > > > > #define ERRSEQ_SEEN (1 << ERRSEQ_SHIFT) > > > > > > Would this look nicer using the BIT() macro? > > > > > > #define ERRSEQ_SEEN BIT(ERRSEQ_SHIFT) > > > > > > > > > > > > > > > > > > > > > > > +/* This bit indicates that value must be incremented even when error is same */ > > > > +#define ERRSEQ_MUSTINC (1 << (ERRSEQ_SHIFT + 1)) > > > > > > #define ERRSEQ_MUSTINC BIT(ERRSEQ_SHIFT+1) > > > > > > or if you don't like the BIT macro (not everyone does), then maybe > > > > > > #define ERR_SEQ_MUSTINC (ERRSEQ_SEEN << 1 ) > > > > > > ?? > > > > > > > + > > > > /* The lowest bit of the counter */ > > > > -#define ERRSEQ_CTR_INC (1 << (ERRSEQ_SHIFT + 1)) > > > > +#define ERRSEQ_CTR_INC (1 << (ERRSEQ_SHIFT + 2)) > > > > > > Ditto. > > > > > > > Yes, I can make that change. The BIT macro is much easier to read. > > > > > > > > > > > > > > > > > > > > > > /** > > > > * errseq_set - set a errseq_t for later reporting > > > > @@ -77,11 +80,11 @@ errseq_t errseq_set(errseq_t *eseq, int err) > > > > for (;;) { > > > > errseq_t new; > > > > > > > > > > > > > > > > > > > > - /* Clear out error bits and set new error */ > > > > - new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN)) | -err; > > > > + /* Clear out flag bits and set new error */ > > > > + new = (old & ~(MAX_ERRNO|ERRSEQ_SEEN|ERRSEQ_MUSTINC)) | -err; > > > > > > This is starting to look clumsy (or maybe, this already looked clumsy, > > > but now that is hard to ignore). > > > > > > new = (old & (ERRSEQ_CTR_INC - 1)) | -err > > > > > > > I think you mean: > > > > new = (old & ~(ERRSEQ_CTR_INC - 1)) | -err; > > > > Maybe I can add a new ERRSEQ_CTR_MASK value though which makes it more > > evident. > > Sounds good. > > > > > > Also this assumes MAX_ERRNO is a mask, which it is .. today. > > > > > > BUILD_BUG_ON(MAX_ERRNO & (MAX_ERRNO + 1)); > > > ?? > > > > > > > We already have this in errseq_set: > > > > BUILD_BUG_ON_NOT_POWER_OF_2(MAX_ERRNO + 1); > > Oh good - I didn't see. > > > > > > > > > > > > > > > > > > > > > > > - /* Only increment if someone has looked at it */ > > > > - if (old & ERRSEQ_SEEN) > > > > + /* Only increment if we have to */ > > > > + if (old & ERRSEQ_MUSTINC) > > > > new += ERRSEQ_CTR_INC; > > > > > > > > > > > > > > > > > > > > /* If there would be no change, then call it done */ > > > > @@ -122,14 +125,50 @@ EXPORT_SYMBOL(errseq_set); > > > > errseq_t errseq_sample(errseq_t *eseq) > > > > { > > > > errseq_t old = READ_ONCE(*eseq); > > > > + errseq_t new = old; > > > > > > > > > > > > > > > > > > > > - /* If nobody has seen this error yet, then we can be the first. */ > > > > - if (!(old & ERRSEQ_SEEN)) > > > > - old = 0; > > > > - return old; > > > > + /* > > > > + * For the common case of no errors ever having been set, we can skip > > > > + * marking the SEEN|MUSTINC bits. Once an error has been set, the value > > > > + * will never go back to zero. > > > > + */ > > > > + if (old != 0) { > > > > + new |= ERRSEQ_SEEN|ERRSEQ_MUSTINC; > > > > > > You lose me here. Why is ERRSEQ_SEEN being set, where it wasn't before? > > > > > > The ERRSEQ_SEEN flag not means precisely "The error has been reported to > > > userspace". > > > This operations isn't used to report errors - that is errseq_check(). > > > > > > I'm not saying the code it wrong - I really cannot tell. > > > I'm just saying that I cannot see why it might be right. > > > > > > > I think you're right. We should not be setting SEEN here, but we do > > need to set MUSTINC if it's not already set. I'll fix (and re-test). > > Thanks. Though it isn't clear to me why MUSTINC needs to be set there, > so if you could make that clear, it would help me. > > Also, the two flags seem similar in how they are handled, only tracking > different states, but their names don't reflect that. > I imagine changing "SEEN" to "MUST_REPORT" or similar, so both flags are > "MUST_XXX". > Only I think we would then need to invert "SEEN" - as it currently means > "MUSTN'T_REPORT" .. approximately. > > Or maybe we could replace MUST_INC by DID_INC, so it says what has been > done, rather than what must be done. > > Or maybe not. Certainly it would be useful to have a clear picture of > how the two flags are similar, and how they are different. > You need to set MUSTINC in errseq_peek to ensure that the next error that occurs will be recorded, via the counter being bumped. Otherwise that increment may be skipped (if no one else observed the last error). I sent a v2 set before I saw your mail. Hopefully it addresses some of your concerns. You're right that the flag naming is a bit awkward. I'm open to suggestions for names, but I'd probably like to keep the "sense" of the flags so that I don't need to sort out the logic again. It also works better with 0 being a special value that way. > Thanks, > NeilBrown > > > > > > Thanks for the review! > > > > > > > > > > > > > > > + if (old != new) > > > > + cmpxchg(eseq, old, new); > > > > + if (!(old & ERRSEQ_SEEN)) > > > > + return 0; > > > > + } > > > > + return new; > > > > } > > > > EXPORT_SYMBOL(errseq_sample); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > +/** > > > > + * errseq_peek - Grab current errseq_t value, but don't mark it SEEN > > > > + * @eseq: Pointer to errseq_t to be sampled. > > > > + * > > > > + * In some cases, we need to be able to sample the errseq_t, but we're not > > > > + * in a situation where we can report the value to userland. Use this > > > > + * function to do that. This ensures that later errors will be recorded, > > > > + * and that any current errors are reported at least once. > > > > + * > > > > + * Context: Any context. > > > > + * Return: The current errseq value. > > > > + */ > > > > +errseq_t errseq_peek(errseq_t *eseq) > > > > +{ > > > > + errseq_t old = READ_ONCE(*eseq); > > > > + errseq_t new = old; > > > > + > > > > + if (old != 0) { > > > > + new |= ERRSEQ_MUSTINC; > > > > + if (old != new) > > > > + cmpxchg(eseq, old, new); > > > > + } > > > > + return new; > > > > +} > > > > +EXPORT_SYMBOL(errseq_peek); > > > > + > > > > /** > > > > * errseq_check() - Has an error occurred since a particular sample point? > > > > * @eseq: Pointer to errseq_t value to be checked. > > > > @@ -143,7 +182,10 @@ EXPORT_SYMBOL(errseq_sample); > > > > */ > > > > int errseq_check(errseq_t *eseq, errseq_t since) > > > > { > > > > - errseq_t cur = READ_ONCE(*eseq); > > > > + errseq_t cur = READ_ONCE(*eseq) & ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN); > > > > + > > > > + /* Clear the flag bits for comparison */ > > > > + since &= ~(ERRSEQ_MUSTINC|ERRSEQ_SEEN); > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > if (likely(cur == since)) > > > > return 0; > > > > @@ -195,7 +237,7 @@ int errseq_check_and_advance(errseq_t *eseq, errseq_t *since) > > > > * can advance "since" and return an error based on what we > > > > * have. > > > > */ > > > > - new = old | ERRSEQ_SEEN; > > > > + new = old | ERRSEQ_SEEN | ERRSEQ_MUSTINC; > > > > if (new != old) > > > > cmpxchg(eseq, old, new); > > > > *since = new; > > > > -- > > > > 2.29.2 -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 18+ messages in thread
* [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs 2020-12-13 13:27 [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case Jeff Layton 2020-12-13 13:27 ` [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags Jeff Layton @ 2020-12-13 13:27 ` Jeff Layton 2020-12-14 21:38 ` Vivek Goyal 2020-12-17 19:28 ` Sargun Dhillon 2020-12-13 20:31 ` [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case Sargun Dhillon 2 siblings, 2 replies; 18+ messages in thread From: Jeff Layton @ 2020-12-13 13:27 UTC (permalink / raw) To: Amir Goldstein, Sargun Dhillon Cc: Miklos Szeredi, Vivek Goyal, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara Peek at the upper layer's errseq_t at mount time for volatile mounts, and record it in the per-sb info. In sync_fs, check for an error since the recorded point and set it in the overlayfs superblock if there was one. Signed-off-by: Jeff Layton <jlayton@kernel.org> --- fs/overlayfs/ovl_entry.h | 1 + fs/overlayfs/super.c | 14 +++++++++++--- 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h index 1b5a2094df8e..fcfcc3951973 100644 --- a/fs/overlayfs/ovl_entry.h +++ b/fs/overlayfs/ovl_entry.h @@ -79,6 +79,7 @@ struct ovl_fs { atomic_long_t last_ino; /* Whiteout dentry cache */ struct dentry *whiteout; + errseq_t err_mark; }; static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs) diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c index 290983bcfbb3..2985d2752970 100644 --- a/fs/overlayfs/super.c +++ b/fs/overlayfs/super.c @@ -264,8 +264,13 @@ static int ovl_sync_fs(struct super_block *sb, int wait) if (!ovl_upper_mnt(ofs)) return 0; - if (!ovl_should_sync(ofs)) - return 0; + if (!ovl_should_sync(ofs)) { + /* Propagate errors from upper to overlayfs */ + ret = errseq_check(&upper_sb->s_wb_err, ofs->err_mark); + errseq_set(&sb->s_wb_err, ret); + return ret; + } + /* * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC). * All the super blocks will be iterated, including upper_sb. @@ -1945,8 +1950,11 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth; sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran; - } + + if (ofs->config.ovl_volatile) + ofs->err_mark = errseq_peek(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err); + oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers); err = PTR_ERR(oe); if (IS_ERR(oe)) -- 2.29.2 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs 2020-12-13 13:27 ` [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs Jeff Layton @ 2020-12-14 21:38 ` Vivek Goyal 2020-12-14 22:04 ` Sargun Dhillon 2020-12-14 23:53 ` Jeff Layton 2020-12-17 19:28 ` Sargun Dhillon 1 sibling, 2 replies; 18+ messages in thread From: Vivek Goyal @ 2020-12-14 21:38 UTC (permalink / raw) To: Jeff Layton Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote: > Peek at the upper layer's errseq_t at mount time for volatile mounts, > and record it in the per-sb info. In sync_fs, check for an error since > the recorded point and set it in the overlayfs superblock if there was > one. > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- While we are solving problem for non-volatile overlay mount, I also started thinking, what about non-volatile overlay syncfs() writeback errors. Looks like these will not be reported to user space at all as of now (because we never update overlay_sb->s_wb_err ever). A patch like this might fix it. (compile tested only). overlayfs: Report syncfs() errors to user space Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the return code. But certain writeback errors can still be reported on syncfs() by checking errors on super block. ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err); For the case of overlayfs, we never set overlayfs super block s_wb_err. That means sync() will never report writeback errors on overlayfs uppon syncfs(). Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that should mean that user space syncfs() call should see writeback errors. ovl_fsync() does not need anything special because if there are writeback errors underlying filesystem will report it through vfs_fsync_range() return code and user space will see it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> --- fs/overlayfs/ovl_entry.h | 1 + fs/overlayfs/super.c | 14 +++++++++++--- 2 files changed, 12 insertions(+), 3 deletions(-) Index: redhat-linux/fs/overlayfs/super.c =================================================================== --- redhat-linux.orig/fs/overlayfs/super.c 2020-12-14 15:33:43.934400880 -0500 +++ redhat-linux/fs/overlayfs/super.c 2020-12-14 16:15:07.127400880 -0500 @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc { struct ovl_fs *ofs = sb->s_fs_info; struct super_block *upper_sb; - int ret; + int ret, ret2; if (!ovl_upper_mnt(ofs)) return 0; @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc ret = sync_filesystem(upper_sb); up_read(&upper_sb->s_umount); - return ret; + if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) { + /* Upper sb has errors since last time */ + spin_lock(&ofs->errseq_lock); + ret2 = errseq_check_and_advance(&upper_sb->s_wb_err, + &sb->s_wb_err); + spin_unlock(&ofs->errseq_lock); + } + return ret ? ret : ret2; } /** @@ -1873,6 +1880,7 @@ static int ovl_fill_super(struct super_b if (!cred) goto out_err; + spin_lock_init(&ofs->errseq_lock); /* Is there a reason anyone would want not to share whiteouts? */ ofs->share_whiteout = true; @@ -1945,7 +1953,7 @@ static int ovl_fill_super(struct super_b sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth; sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran; - + sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err); } oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers); err = PTR_ERR(oe); Index: redhat-linux/fs/overlayfs/ovl_entry.h =================================================================== --- redhat-linux.orig/fs/overlayfs/ovl_entry.h 2020-12-14 15:33:43.934400880 -0500 +++ redhat-linux/fs/overlayfs/ovl_entry.h 2020-12-14 15:34:13.509400880 -0500 @@ -79,6 +79,7 @@ struct ovl_fs { atomic_long_t last_ino; /* Whiteout dentry cache */ struct dentry *whiteout; + spinlock_t errseq_lock; }; static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs 2020-12-14 21:38 ` Vivek Goyal @ 2020-12-14 22:04 ` Sargun Dhillon 2020-12-14 23:01 ` Vivek Goyal 2020-12-14 23:53 ` Jeff Layton 1 sibling, 1 reply; 18+ messages in thread From: Sargun Dhillon @ 2020-12-14 22:04 UTC (permalink / raw) To: Vivek Goyal Cc: Jeff Layton, Amir Goldstein, Miklos Szeredi, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara On Mon, Dec 14, 2020 at 04:38:43PM -0500, Vivek Goyal wrote: > On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote: > > Peek at the upper layer's errseq_t at mount time for volatile mounts, > > and record it in the per-sb info. In sync_fs, check for an error since > > the recorded point and set it in the overlayfs superblock if there was > > one. > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > --- > > While we are solving problem for non-volatile overlay mount, I also > started thinking, what about non-volatile overlay syncfs() writeback errors. > Looks like these will not be reported to user space at all as of now > (because we never update overlay_sb->s_wb_err ever). > > A patch like this might fix it. (compile tested only). > > overlayfs: Report syncfs() errors to user space > > Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the > return code. But certain writeback errors can still be reported on > syncfs() by checking errors on super block. > > ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err); > > For the case of overlayfs, we never set overlayfs super block s_wb_err. That > means sync() will never report writeback errors on overlayfs uppon syncfs(). > > Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that > should mean that user space syncfs() call should see writeback errors. > > ovl_fsync() does not need anything special because if there are writeback > errors underlying filesystem will report it through vfs_fsync_range() return > code and user space will see it. > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com> > --- > fs/overlayfs/ovl_entry.h | 1 + > fs/overlayfs/super.c | 14 +++++++++++--- > 2 files changed, 12 insertions(+), 3 deletions(-) > > Index: redhat-linux/fs/overlayfs/super.c > =================================================================== > --- redhat-linux.orig/fs/overlayfs/super.c 2020-12-14 15:33:43.934400880 -0500 > +++ redhat-linux/fs/overlayfs/super.c 2020-12-14 16:15:07.127400880 -0500 > @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc > { > struct ovl_fs *ofs = sb->s_fs_info; > struct super_block *upper_sb; > - int ret; > + int ret, ret2; > > if (!ovl_upper_mnt(ofs)) > return 0; > @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc > ret = sync_filesystem(upper_sb); > up_read(&upper_sb->s_umount); > > - return ret; > + if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) { > + /* Upper sb has errors since last time */ > + spin_lock(&ofs->errseq_lock); > + ret2 = errseq_check_and_advance(&upper_sb->s_wb_err, > + &sb->s_wb_err); > + spin_unlock(&ofs->errseq_lock); > + } > + return ret ? ret : ret2; > } > > /** > @@ -1873,6 +1880,7 @@ static int ovl_fill_super(struct super_b > if (!cred) > goto out_err; > > + spin_lock_init(&ofs->errseq_lock); > /* Is there a reason anyone would want not to share whiteouts? */ > ofs->share_whiteout = true; > > @@ -1945,7 +1953,7 @@ static int ovl_fill_super(struct super_b > > sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth; > sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran; > - > + sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err); > } > oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers); > err = PTR_ERR(oe); > Index: redhat-linux/fs/overlayfs/ovl_entry.h > =================================================================== > --- redhat-linux.orig/fs/overlayfs/ovl_entry.h 2020-12-14 15:33:43.934400880 -0500 > +++ redhat-linux/fs/overlayfs/ovl_entry.h 2020-12-14 15:34:13.509400880 -0500 > @@ -79,6 +79,7 @@ struct ovl_fs { > atomic_long_t last_ino; > /* Whiteout dentry cache */ > struct dentry *whiteout; > + spinlock_t errseq_lock; > }; > > static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs) > This was on my list of things to look at. I don't think we can / should use errseq_check_and_advance because it will hide errors from userspace. I think we need something like: At startup, call errseq_peek and stash that value somewhere. This sets the MUSTINC flag. At syncfs time: call errseq check, if it says there is an error, call errseq_peek again, and store the error in our superblock. Take the error value from the differenceb between the previous one and the new one, and copy it up to the superblock. Either way, I think Jeff's work of making it so other kernel subsytems can interact with errseq on a superblock bears fruit elsewhere. If the first patch gets merged, I can put together the patches to do the standard error bubble up for normal syncfs, volatile syncfs, and volatile remount. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs 2020-12-14 22:04 ` Sargun Dhillon @ 2020-12-14 23:01 ` Vivek Goyal 0 siblings, 0 replies; 18+ messages in thread From: Vivek Goyal @ 2020-12-14 23:01 UTC (permalink / raw) To: Sargun Dhillon Cc: Jeff Layton, Amir Goldstein, Miklos Szeredi, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara On Mon, Dec 14, 2020 at 10:04:14PM +0000, Sargun Dhillon wrote: > On Mon, Dec 14, 2020 at 04:38:43PM -0500, Vivek Goyal wrote: > > On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote: > > > Peek at the upper layer's errseq_t at mount time for volatile mounts, > > > and record it in the per-sb info. In sync_fs, check for an error since > > > the recorded point and set it in the overlayfs superblock if there was > > > one. > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > --- > > > > While we are solving problem for non-volatile overlay mount, I also > > started thinking, what about non-volatile overlay syncfs() writeback errors. > > Looks like these will not be reported to user space at all as of now > > (because we never update overlay_sb->s_wb_err ever). > > > > A patch like this might fix it. (compile tested only). > > > > overlayfs: Report syncfs() errors to user space > > > > Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the > > return code. But certain writeback errors can still be reported on > > syncfs() by checking errors on super block. > > > > ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err); > > > > For the case of overlayfs, we never set overlayfs super block s_wb_err. That > > means sync() will never report writeback errors on overlayfs uppon syncfs(). > > > > Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that > > should mean that user space syncfs() call should see writeback errors. > > > > ovl_fsync() does not need anything special because if there are writeback > > errors underlying filesystem will report it through vfs_fsync_range() return > > code and user space will see it. > > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com> > > --- > > fs/overlayfs/ovl_entry.h | 1 + > > fs/overlayfs/super.c | 14 +++++++++++--- > > 2 files changed, 12 insertions(+), 3 deletions(-) > > > > Index: redhat-linux/fs/overlayfs/super.c > > =================================================================== > > --- redhat-linux.orig/fs/overlayfs/super.c 2020-12-14 15:33:43.934400880 -0500 > > +++ redhat-linux/fs/overlayfs/super.c 2020-12-14 16:15:07.127400880 -0500 > > @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc > > { > > struct ovl_fs *ofs = sb->s_fs_info; > > struct super_block *upper_sb; > > - int ret; > > + int ret, ret2; > > > > if (!ovl_upper_mnt(ofs)) > > return 0; > > @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc > > ret = sync_filesystem(upper_sb); > > up_read(&upper_sb->s_umount); > > > > - return ret; > > + if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) { > > + /* Upper sb has errors since last time */ > > + spin_lock(&ofs->errseq_lock); > > + ret2 = errseq_check_and_advance(&upper_sb->s_wb_err, > > + &sb->s_wb_err); > > + spin_unlock(&ofs->errseq_lock); > > + } > > + return ret ? ret : ret2; > > } > > > > /** > > @@ -1873,6 +1880,7 @@ static int ovl_fill_super(struct super_b > > if (!cred) > > goto out_err; > > > > + spin_lock_init(&ofs->errseq_lock); > > /* Is there a reason anyone would want not to share whiteouts? */ > > ofs->share_whiteout = true; > > > > @@ -1945,7 +1953,7 @@ static int ovl_fill_super(struct super_b > > > > sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth; > > sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran; > > - > > + sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err); > > } > > oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers); > > err = PTR_ERR(oe); > > Index: redhat-linux/fs/overlayfs/ovl_entry.h > > =================================================================== > > --- redhat-linux.orig/fs/overlayfs/ovl_entry.h 2020-12-14 15:33:43.934400880 -0500 > > +++ redhat-linux/fs/overlayfs/ovl_entry.h 2020-12-14 15:34:13.509400880 -0500 > > @@ -79,6 +79,7 @@ struct ovl_fs { > > atomic_long_t last_ino; > > /* Whiteout dentry cache */ > > struct dentry *whiteout; > > + spinlock_t errseq_lock; > > }; > > > > static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs) > > > > This was on my list of things to look at. I don't think we can / should use > errseq_check_and_advance because it will hide errors from userspace. Hi Sargun, I have been struggling to figure out when to use errseq_check_and_advance() and when to use this error_check() and errorseq_set() combination. My rational for using errseq_check_and_advance() is that say there is an unseen error on upper super block, and if overlayfs calls syncfs(), then this call should set SEEN flag on upper super block, isn't it. This is equivalent of an app directly calling syncfs() on upper. If we use error_check() and errseq_set() combination, then we are just reading state of upper superblock but really not impacting it despite the fact overlay apps are calling syncfs(). Comapre it with fsync(). For non-volatile overlay, an fsync() overlay call will set SEEN flag on upper file. And I believe same thing should happen for ovl_syncfs() call as well. It makes more sense to me. Wondering how will it hide errors from user space. ovl "struct file" will have its own f->f_sb_err initialized from overlay superblock. And if overlay super block gets updated with error, a later errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err) should still return an error. What am I missing? Vivek > I think we > need something like: > > At startup, call errseq_peek and stash that value somewhere. This sets the > MUSTINC flag. So this errseq_peek() stuff is required only if we don't want to consume error while checking for error. In fact consuming error is not bad as long as we do it only ovl_syncfs() path. In fact it will match current semantics of syncfs(). But issue we have is that we want to check for error outside syncfs() path too and don't want to consume it (otherwise it breaks the semantics that it will bee seen marked in syncfs() path). So that's why this notion of checking error without consuming it so tha we can check it in remount path. But in syncfs() path, it should be ok to consume unseen error and we should be able to call errseq_check_and_advance(), both for volatile and non-volatile mounts, isn't it? For ther paths, like remount, we probably can stash away on persistent storage and compare that value on remount and fail remount without actually consuming unseen error (because it is not syncfs path). This possibly can be used in other paths like read/write as well to make sure we can notice error without consuming it. IOW, we seem to have to paths we want to check errors in. In ovl_syncfs() path we should be able consume exisiting unseen error on upper, so errseq_check_and_advance() makes sense. In rest of the paths, we should use new semantics to check for errors. Vivek > > At syncfs time: call errseq check, if it says there is an error, call > errseq_peek again, and store the error in our superblock. Take the error value > from the differenceb between the previous one and the new one, and copy it up to > the superblock. > > Either way, I think Jeff's work of making it so other kernel subsytems can > interact with errseq on a superblock bears fruit elsewhere. If the first patch > gets merged, I can put together the patches to do the standard error bubble > up for normal syncfs, volatile syncfs, and volatile remount. > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs 2020-12-14 21:38 ` Vivek Goyal 2020-12-14 22:04 ` Sargun Dhillon @ 2020-12-14 23:53 ` Jeff Layton 2020-12-15 13:16 ` Jeff Layton 2020-12-15 15:06 ` Vivek Goyal 1 sibling, 2 replies; 18+ messages in thread From: Jeff Layton @ 2020-12-14 23:53 UTC (permalink / raw) To: Vivek Goyal Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara On Mon, 2020-12-14 at 16:38 -0500, Vivek Goyal wrote: > On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote: > > Peek at the upper layer's errseq_t at mount time for volatile mounts, > > and record it in the per-sb info. In sync_fs, check for an error since > > the recorded point and set it in the overlayfs superblock if there was > > one. > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > --- > > While we are solving problem for non-volatile overlay mount, I also > started thinking, what about non-volatile overlay syncfs() writeback errors. > Looks like these will not be reported to user space at all as of now > (because we never update overlay_sb->s_wb_err ever). > > A patch like this might fix it. (compile tested only). > > overlayfs: Report syncfs() errors to user space > > Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the > return code. But certain writeback errors can still be reported on > syncfs() by checking errors on super block. > > ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err); > > For the case of overlayfs, we never set overlayfs super block s_wb_err. That > means sync() will never report writeback errors on overlayfs uppon syncfs(). > > Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that > should mean that user space syncfs() call should see writeback errors. > > ovl_fsync() does not need anything special because if there are writeback > errors underlying filesystem will report it through vfs_fsync_range() return > code and user space will see it. > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com> > --- > fs/overlayfs/ovl_entry.h | 1 + > fs/overlayfs/super.c | 14 +++++++++++--- > 2 files changed, 12 insertions(+), 3 deletions(-) > > Index: redhat-linux/fs/overlayfs/super.c > =================================================================== > --- redhat-linux.orig/fs/overlayfs/super.c 2020-12-14 15:33:43.934400880 -0500 > +++ redhat-linux/fs/overlayfs/super.c 2020-12-14 16:15:07.127400880 -0500 > @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc > { > struct ovl_fs *ofs = sb->s_fs_info; > struct super_block *upper_sb; > - int ret; > + int ret, ret2; > > > > > if (!ovl_upper_mnt(ofs)) > return 0; > @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc > ret = sync_filesystem(upper_sb); > up_read(&upper_sb->s_umount); > > > > > - return ret; > + if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) { > + /* Upper sb has errors since last time */ > + spin_lock(&ofs->errseq_lock); > + ret2 = errseq_check_and_advance(&upper_sb->s_wb_err, > + &sb->s_wb_err); > + spin_unlock(&ofs->errseq_lock); > + } > + return ret ? ret : ret2; I think this is probably not quite right. The problem I think is that the SEEN flag is always going to end up being set in sb->s_wb_err, and that is going to violate the desired semantics. If the writeback error occurred after all fd's were closed, then the next opener wouldn't see it and you'd lose the error. We probably need a function to cleanly propagate the error from one errseq_t to another so that that doesn't occur. I'll have to think about it. > } > > > > > /** > @@ -1873,6 +1880,7 @@ static int ovl_fill_super(struct super_b > if (!cred) > goto out_err; > > > > > + spin_lock_init(&ofs->errseq_lock); > /* Is there a reason anyone would want not to share whiteouts? */ > ofs->share_whiteout = true; > > > > > @@ -1945,7 +1953,7 @@ static int ovl_fill_super(struct super_b > > > > > sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth; > sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran; > - > + sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err); > } > oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers); > err = PTR_ERR(oe); > Index: redhat-linux/fs/overlayfs/ovl_entry.h > =================================================================== > --- redhat-linux.orig/fs/overlayfs/ovl_entry.h 2020-12-14 15:33:43.934400880 -0500 > +++ redhat-linux/fs/overlayfs/ovl_entry.h 2020-12-14 15:34:13.509400880 -0500 > @@ -79,6 +79,7 @@ struct ovl_fs { > atomic_long_t last_ino; > /* Whiteout dentry cache */ > struct dentry *whiteout; > + spinlock_t errseq_lock; > }; > > > > > static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs) > -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs 2020-12-14 23:53 ` Jeff Layton @ 2020-12-15 13:16 ` Jeff Layton 2020-12-15 14:59 ` Vivek Goyal 2020-12-15 15:06 ` Vivek Goyal 1 sibling, 1 reply; 18+ messages in thread From: Jeff Layton @ 2020-12-15 13:16 UTC (permalink / raw) To: Vivek Goyal Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara On Mon, 2020-12-14 at 18:53 -0500, Jeff Layton wrote: > On Mon, 2020-12-14 at 16:38 -0500, Vivek Goyal wrote: > > On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote: > > > Peek at the upper layer's errseq_t at mount time for volatile mounts, > > > and record it in the per-sb info. In sync_fs, check for an error since > > > the recorded point and set it in the overlayfs superblock if there was > > > one. > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > --- > > > > While we are solving problem for non-volatile overlay mount, I also > > started thinking, what about non-volatile overlay syncfs() writeback errors. > > Looks like these will not be reported to user space at all as of now > > (because we never update overlay_sb->s_wb_err ever). > > > > A patch like this might fix it. (compile tested only). > > > > overlayfs: Report syncfs() errors to user space > > > > Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the > > return code. But certain writeback errors can still be reported on > > syncfs() by checking errors on super block. > > > > ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err); > > > > For the case of overlayfs, we never set overlayfs super block s_wb_err. That > > means sync() will never report writeback errors on overlayfs uppon syncfs(). > > > > Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that > > should mean that user space syncfs() call should see writeback errors. > > > > ovl_fsync() does not need anything special because if there are writeback > > errors underlying filesystem will report it through vfs_fsync_range() return > > code and user space will see it. > > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com> > > --- > > fs/overlayfs/ovl_entry.h | 1 + > > fs/overlayfs/super.c | 14 +++++++++++--- > > 2 files changed, 12 insertions(+), 3 deletions(-) > > > > Index: redhat-linux/fs/overlayfs/super.c > > =================================================================== > > --- redhat-linux.orig/fs/overlayfs/super.c 2020-12-14 15:33:43.934400880 -0500 > > +++ redhat-linux/fs/overlayfs/super.c 2020-12-14 16:15:07.127400880 -0500 > > @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc > > { > > struct ovl_fs *ofs = sb->s_fs_info; > > struct super_block *upper_sb; > > - int ret; > > + int ret, ret2; > > > > > > > > > > if (!ovl_upper_mnt(ofs)) > > return 0; > > @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc > > ret = sync_filesystem(upper_sb); > > up_read(&upper_sb->s_umount); > > > > > > > > > > - return ret; > > + if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) { > > + /* Upper sb has errors since last time */ > > + spin_lock(&ofs->errseq_lock); > > + ret2 = errseq_check_and_advance(&upper_sb->s_wb_err, > > + &sb->s_wb_err); > > + spin_unlock(&ofs->errseq_lock); > > + } > > + return ret ? ret : ret2; > > I think this is probably not quite right. > > The problem I think is that the SEEN flag is always going to end up > being set in sb->s_wb_err, and that is going to violate the desired > semantics. If the writeback error occurred after all fd's were closed, > then the next opener wouldn't see it and you'd lose the error. > > We probably need a function to cleanly propagate the error from one > errseq_t to another so that that doesn't occur. I'll have to think about > it. > So, the problem is that we can't guarantee that we'll have an open file when sync_fs is called. So if you do the check_and_advance in the context of a sync() syscall, you'll effectively ensure that a later opener on the upper layer won't see the error (since the upper_sb's errseq_t will be marked SEEN. It's not clear to me what semantics you want in the following situation: mount upper layer mount overlayfs with non-volatile upper layer do "stuff" on overlayfs, and close all files on overlayfs get a writeback error on upper layer call sync() (sync_fs gets run) open file on upper layer mount call syncfs() on upper-layer fd Should that last syncfs error report an error? Also, suppose if at the end we instead opened a file on overlayfs and issued the syncfs() there -- should we see the error in that case? > > } > > > > > > > > > > /** > > @@ -1873,6 +1880,7 @@ static int ovl_fill_super(struct super_b > > if (!cred) > > goto out_err; > > > > > > > > > > + spin_lock_init(&ofs->errseq_lock); > > /* Is there a reason anyone would want not to share whiteouts? */ > > ofs->share_whiteout = true; > > > > > > > > > > @@ -1945,7 +1953,7 @@ static int ovl_fill_super(struct super_b > > > > > > > > > > sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth; > > sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran; > > - > > + sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err); > > } > > oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers); > > err = PTR_ERR(oe); > > Index: redhat-linux/fs/overlayfs/ovl_entry.h > > =================================================================== > > --- redhat-linux.orig/fs/overlayfs/ovl_entry.h 2020-12-14 15:33:43.934400880 -0500 > > +++ redhat-linux/fs/overlayfs/ovl_entry.h 2020-12-14 15:34:13.509400880 -0500 > > @@ -79,6 +79,7 @@ struct ovl_fs { > > atomic_long_t last_ino; > > /* Whiteout dentry cache */ > > struct dentry *whiteout; > > + spinlock_t errseq_lock; > > }; > > > > > > > > > > static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs) > > > -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs 2020-12-15 13:16 ` Jeff Layton @ 2020-12-15 14:59 ` Vivek Goyal 2020-12-15 15:23 ` Jeff Layton 0 siblings, 1 reply; 18+ messages in thread From: Vivek Goyal @ 2020-12-15 14:59 UTC (permalink / raw) To: Jeff Layton Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara On Tue, Dec 15, 2020 at 08:16:12AM -0500, Jeff Layton wrote: > On Mon, 2020-12-14 at 18:53 -0500, Jeff Layton wrote: > > On Mon, 2020-12-14 at 16:38 -0500, Vivek Goyal wrote: > > > On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote: > > > > Peek at the upper layer's errseq_t at mount time for volatile mounts, > > > > and record it in the per-sb info. In sync_fs, check for an error since > > > > the recorded point and set it in the overlayfs superblock if there was > > > > one. > > > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > --- > > > > > > While we are solving problem for non-volatile overlay mount, I also > > > started thinking, what about non-volatile overlay syncfs() writeback errors. > > > Looks like these will not be reported to user space at all as of now > > > (because we never update overlay_sb->s_wb_err ever). > > > > > > A patch like this might fix it. (compile tested only). > > > > > > overlayfs: Report syncfs() errors to user space > > > > > > Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the > > > return code. But certain writeback errors can still be reported on > > > syncfs() by checking errors on super block. > > > > > > ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err); > > > > > > For the case of overlayfs, we never set overlayfs super block s_wb_err. That > > > means sync() will never report writeback errors on overlayfs uppon syncfs(). > > > > > > Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that > > > should mean that user space syncfs() call should see writeback errors. > > > > > > ovl_fsync() does not need anything special because if there are writeback > > > errors underlying filesystem will report it through vfs_fsync_range() return > > > code and user space will see it. > > > > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com> > > > --- > > > fs/overlayfs/ovl_entry.h | 1 + > > > fs/overlayfs/super.c | 14 +++++++++++--- > > > 2 files changed, 12 insertions(+), 3 deletions(-) > > > > > > Index: redhat-linux/fs/overlayfs/super.c > > > =================================================================== > > > --- redhat-linux.orig/fs/overlayfs/super.c 2020-12-14 15:33:43.934400880 -0500 > > > +++ redhat-linux/fs/overlayfs/super.c 2020-12-14 16:15:07.127400880 -0500 > > > @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc > > > { > > > struct ovl_fs *ofs = sb->s_fs_info; > > > struct super_block *upper_sb; > > > - int ret; > > > + int ret, ret2; > > > > > > > > > > > > > > > if (!ovl_upper_mnt(ofs)) > > > return 0; > > > @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc > > > ret = sync_filesystem(upper_sb); > > > up_read(&upper_sb->s_umount); > > > > > > > > > > > > > > > - return ret; > > > + if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) { > > > + /* Upper sb has errors since last time */ > > > + spin_lock(&ofs->errseq_lock); > > > + ret2 = errseq_check_and_advance(&upper_sb->s_wb_err, > > > + &sb->s_wb_err); > > > + spin_unlock(&ofs->errseq_lock); > > > + } > > > + return ret ? ret : ret2; > > > > I think this is probably not quite right. > > > > The problem I think is that the SEEN flag is always going to end up > > being set in sb->s_wb_err, and that is going to violate the desired > > semantics. If the writeback error occurred after all fd's were closed, > > then the next opener wouldn't see it and you'd lose the error. > > > > We probably need a function to cleanly propagate the error from one > > errseq_t to another so that that doesn't occur. I'll have to think about > > it. > > > > So, the problem is that we can't guarantee that we'll have an open file > when sync_fs is called. So if you do the check_and_advance in the > context of a sync() syscall, you'll effectively ensure that a later > opener on the upper layer won't see the error (since the upper_sb's > errseq_t will be marked SEEN. Aha.., I assumed that when ->sync_fs() is called, we always have a valid fd open. But that's only true if ->sync_fs() is being called through syncfs(fd) syscall. For the case of plain sync() syscall, this is not true. So it leads us back to need of passing "struct file" in ->sync_fs(). And fetching the writeback error from upper can be done only if a file is open on which syncfs() has been called. > > It's not clear to me what semantics you want in the following situation: > > mount upper layer > mount overlayfs with non-volatile upper layer > do "stuff" on overlayfs, and close all files on overlayfs > get a writeback error on upper layer > call sync() (sync_fs gets run) > open file on upper layer mount > call syncfs() on upper-layer fd > > Should that last syncfs error report an error? Actually, I was thinking of following. - mount upper layer - mount overlayfs (non-volatile) - Do bunch of writes. - A writeback error happens on upper file and gets recorded in upper fs sb. - overlay application calls syncfs(fd) and gets the error back. IIUC, the way currently things are written, syncfs(fd) will not return writeback errors on overlayfs. > > Also, suppose if at the end we instead opened a file on overlayfs and > issued the syncfs() there -- should we see the error in that case? I am thinking that behavior should be similar to as if two file descriptors have been opened on a regular filesystem. So if I open one fd1 on overlay and one fd2 on upper and they both were opened before writeback error happend, then syncfs(fd1) and syncfs(fd2), both should see the error. And any of syncfs(fd1) and syncfs(fd2) should set the SEEN flag in upper_sb so that new errors can continue to be reported. IOW, so looks like major problem with this patch is that we need to propagate error from upper_sb to overlaysb only if a valid file descriptor is open. IOW, do this in syncfs(fd) path and not sync() path. And to distinguish between two, we probably need to pass additional parameter in ->sync_fs(). Am I missing somehting. Just trying to make sure that if we are solving the problem of syncfs error propagation in overlay, lets solve it both for volatile as well as non-volatile case so that there is less confusion later. Vivek ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs 2020-12-15 14:59 ` Vivek Goyal @ 2020-12-15 15:23 ` Jeff Layton 2020-12-15 15:39 ` Vivek Goyal 0 siblings, 1 reply; 18+ messages in thread From: Jeff Layton @ 2020-12-15 15:23 UTC (permalink / raw) To: Vivek Goyal Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara On Tue, 2020-12-15 at 09:59 -0500, Vivek Goyal wrote: > On Tue, Dec 15, 2020 at 08:16:12AM -0500, Jeff Layton wrote: > > On Mon, 2020-12-14 at 18:53 -0500, Jeff Layton wrote: > > > On Mon, 2020-12-14 at 16:38 -0500, Vivek Goyal wrote: > > > > On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote: > > > > > Peek at the upper layer's errseq_t at mount time for volatile mounts, > > > > > and record it in the per-sb info. In sync_fs, check for an error since > > > > > the recorded point and set it in the overlayfs superblock if there was > > > > > one. > > > > > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > > --- > > > > > > > > While we are solving problem for non-volatile overlay mount, I also > > > > started thinking, what about non-volatile overlay syncfs() writeback errors. > > > > Looks like these will not be reported to user space at all as of now > > > > (because we never update overlay_sb->s_wb_err ever). > > > > > > > > A patch like this might fix it. (compile tested only). > > > > > > > > overlayfs: Report syncfs() errors to user space > > > > > > > > Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the > > > > return code. But certain writeback errors can still be reported on > > > > syncfs() by checking errors on super block. > > > > > > > > ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err); > > > > > > > > For the case of overlayfs, we never set overlayfs super block s_wb_err. That > > > > means sync() will never report writeback errors on overlayfs uppon syncfs(). > > > > > > > > Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that > > > > should mean that user space syncfs() call should see writeback errors. > > > > > > > > ovl_fsync() does not need anything special because if there are writeback > > > > errors underlying filesystem will report it through vfs_fsync_range() return > > > > code and user space will see it. > > > > > > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com> > > > > --- > > > > fs/overlayfs/ovl_entry.h | 1 + > > > > fs/overlayfs/super.c | 14 +++++++++++--- > > > > 2 files changed, 12 insertions(+), 3 deletions(-) > > > > > > > > Index: redhat-linux/fs/overlayfs/super.c > > > > =================================================================== > > > > --- redhat-linux.orig/fs/overlayfs/super.c 2020-12-14 15:33:43.934400880 -0500 > > > > +++ redhat-linux/fs/overlayfs/super.c 2020-12-14 16:15:07.127400880 -0500 > > > > @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc > > > > { > > > > struct ovl_fs *ofs = sb->s_fs_info; > > > > struct super_block *upper_sb; > > > > - int ret; > > > > + int ret, ret2; > > > > > > > > > > > > > > > > > > > > if (!ovl_upper_mnt(ofs)) > > > > return 0; > > > > @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc > > > > ret = sync_filesystem(upper_sb); > > > > up_read(&upper_sb->s_umount); > > > > > > > > > > > > > > > > > > > > - return ret; > > > > + if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) { > > > > + /* Upper sb has errors since last time */ > > > > + spin_lock(&ofs->errseq_lock); > > > > + ret2 = errseq_check_and_advance(&upper_sb->s_wb_err, > > > > + &sb->s_wb_err); > > > > + spin_unlock(&ofs->errseq_lock); > > > > + } > > > > + return ret ? ret : ret2; > > > > > > I think this is probably not quite right. > > > > > > The problem I think is that the SEEN flag is always going to end up > > > being set in sb->s_wb_err, and that is going to violate the desired > > > semantics. If the writeback error occurred after all fd's were closed, > > > then the next opener wouldn't see it and you'd lose the error. > > > > > > We probably need a function to cleanly propagate the error from one > > > errseq_t to another so that that doesn't occur. I'll have to think about > > > it. > > > > > > > So, the problem is that we can't guarantee that we'll have an open file > > when sync_fs is called. So if you do the check_and_advance in the > > context of a sync() syscall, you'll effectively ensure that a later > > opener on the upper layer won't see the error (since the upper_sb's > > errseq_t will be marked SEEN. > > Aha.., I assumed that when ->sync_fs() is called, we always have a > valid fd open. But that's only true if ->sync_fs() is being called > through syncfs(fd) syscall. For the case of plain sync() syscall, > this is not true. > > So it leads us back to need of passing "struct file" in ->sync_fs(). > And fetching the writeback error from upper can be done only > if a file is open on which syncfs() has been called. > > > > > It's not clear to me what semantics you want in the following situation: > > > > mount upper layer > > mount overlayfs with non-volatile upper layer > > do "stuff" on overlayfs, and close all files on overlayfs > > get a writeback error on upper layer > > call sync() (sync_fs gets run) > > open file on upper layer mount > > call syncfs() on upper-layer fd > > > > Should that last syncfs error report an error? > > Actually, I was thinking of following. > - mount upper layer > - mount overlayfs (non-volatile) > - Do bunch of writes. > - A writeback error happens on upper file and gets recorded in > upper fs sb. > - overlay application calls syncfs(fd) and gets the error back. IIUC, > the way currently things are written, syncfs(fd) will not return > writeback errors on overlayfs. > > > > > Also, suppose if at the end we instead opened a file on overlayfs and > > issued the syncfs() there -- should we see the error in that case? > > I am thinking that behavior should be similar to as if two file > descriptors have been opened on a regular filesystem. So if I open > one fd1 on overlay and one fd2 on upper and they both were opened > before writeback error happend, then syncfs(fd1) and syncfs(fd2), > both should see the error. > Yes, that will happen as a matter of course. > And any of syncfs(fd1) and syncfs(fd2) should set the SEEN flag in > upper_sb so that new errors can continue to be reported. > The SEEN flag indicates whether a later opener should see an error that predated the open. Currently, it will iff no one else has scraped the error when the open is done. Once we start dealing with overlayfs though, things are a bit more murky. If someone issues a sync on the upper sb and that triggers a writeback error. If I then do an open+syncfs on the overlay, should I see the error? What about in the reverse case? > IOW, so looks like major problem with this patch is that we need > to propagate error from upper_sb to overlaysb only if a valid > file descriptor is open. IOW, do this in syncfs(fd) path and not > sync() path. And to distinguish between two, we probably need to > pass additional parameter in ->sync_fs(). > > Am I missing somehting. Just trying to make sure that if we are > solving the problem of syncfs error propagation in overlay, lets > solve it both for volatile as well as non-volatile case so that > there is less confusion later. > It may be possible to propagate the errors in some fashion, but it's starting to sound pretty complex. I think we'd probably be better served by cleaning things up so that overlayfs can just return an error of its choosing to syncfs(). What may actually be best is to add a new ->syncfs op to struct file_operations, and turn the current syncfs syscall wrapper into a generic_syncfs or something. Then you could just define a syncfs op for overlayfs and do what you like in there. -- Jeff Layton <jlayton@kernel.org> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs 2020-12-15 15:23 ` Jeff Layton @ 2020-12-15 15:39 ` Vivek Goyal 0 siblings, 0 replies; 18+ messages in thread From: Vivek Goyal @ 2020-12-15 15:39 UTC (permalink / raw) To: Jeff Layton Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara On Tue, Dec 15, 2020 at 10:23:08AM -0500, Jeff Layton wrote: > On Tue, 2020-12-15 at 09:59 -0500, Vivek Goyal wrote: > > On Tue, Dec 15, 2020 at 08:16:12AM -0500, Jeff Layton wrote: > > > On Mon, 2020-12-14 at 18:53 -0500, Jeff Layton wrote: > > > > On Mon, 2020-12-14 at 16:38 -0500, Vivek Goyal wrote: > > > > > On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote: > > > > > > Peek at the upper layer's errseq_t at mount time for volatile mounts, > > > > > > and record it in the per-sb info. In sync_fs, check for an error since > > > > > > the recorded point and set it in the overlayfs superblock if there was > > > > > > one. > > > > > > > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > > > > --- > > > > > > > > > > While we are solving problem for non-volatile overlay mount, I also > > > > > started thinking, what about non-volatile overlay syncfs() writeback errors. > > > > > Looks like these will not be reported to user space at all as of now > > > > > (because we never update overlay_sb->s_wb_err ever). > > > > > > > > > > A patch like this might fix it. (compile tested only). > > > > > > > > > > overlayfs: Report syncfs() errors to user space > > > > > > > > > > Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the > > > > > return code. But certain writeback errors can still be reported on > > > > > syncfs() by checking errors on super block. > > > > > > > > > > ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err); > > > > > > > > > > For the case of overlayfs, we never set overlayfs super block s_wb_err. That > > > > > means sync() will never report writeback errors on overlayfs uppon syncfs(). > > > > > > > > > > Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that > > > > > should mean that user space syncfs() call should see writeback errors. > > > > > > > > > > ovl_fsync() does not need anything special because if there are writeback > > > > > errors underlying filesystem will report it through vfs_fsync_range() return > > > > > code and user space will see it. > > > > > > > > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com> > > > > > --- > > > > > fs/overlayfs/ovl_entry.h | 1 + > > > > > fs/overlayfs/super.c | 14 +++++++++++--- > > > > > 2 files changed, 12 insertions(+), 3 deletions(-) > > > > > > > > > > Index: redhat-linux/fs/overlayfs/super.c > > > > > =================================================================== > > > > > --- redhat-linux.orig/fs/overlayfs/super.c 2020-12-14 15:33:43.934400880 -0500 > > > > > +++ redhat-linux/fs/overlayfs/super.c 2020-12-14 16:15:07.127400880 -0500 > > > > > @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc > > > > > { > > > > > struct ovl_fs *ofs = sb->s_fs_info; > > > > > struct super_block *upper_sb; > > > > > - int ret; > > > > > + int ret, ret2; > > > > > > > > > > > > > > > > > > > > > > > > > if (!ovl_upper_mnt(ofs)) > > > > > return 0; > > > > > @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc > > > > > ret = sync_filesystem(upper_sb); > > > > > up_read(&upper_sb->s_umount); > > > > > > > > > > > > > > > > > > > > > > > > > - return ret; > > > > > + if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) { > > > > > + /* Upper sb has errors since last time */ > > > > > + spin_lock(&ofs->errseq_lock); > > > > > + ret2 = errseq_check_and_advance(&upper_sb->s_wb_err, > > > > > + &sb->s_wb_err); > > > > > + spin_unlock(&ofs->errseq_lock); > > > > > + } > > > > > + return ret ? ret : ret2; > > > > > > > > I think this is probably not quite right. > > > > > > > > The problem I think is that the SEEN flag is always going to end up > > > > being set in sb->s_wb_err, and that is going to violate the desired > > > > semantics. If the writeback error occurred after all fd's were closed, > > > > then the next opener wouldn't see it and you'd lose the error. > > > > > > > > We probably need a function to cleanly propagate the error from one > > > > errseq_t to another so that that doesn't occur. I'll have to think about > > > > it. > > > > > > > > > > So, the problem is that we can't guarantee that we'll have an open file > > > when sync_fs is called. So if you do the check_and_advance in the > > > context of a sync() syscall, you'll effectively ensure that a later > > > opener on the upper layer won't see the error (since the upper_sb's > > > errseq_t will be marked SEEN. > > > > Aha.., I assumed that when ->sync_fs() is called, we always have a > > valid fd open. But that's only true if ->sync_fs() is being called > > through syncfs(fd) syscall. For the case of plain sync() syscall, > > this is not true. > > > > So it leads us back to need of passing "struct file" in ->sync_fs(). > > And fetching the writeback error from upper can be done only > > if a file is open on which syncfs() has been called. > > > > > > > > It's not clear to me what semantics you want in the following situation: > > > > > > mount upper layer > > > mount overlayfs with non-volatile upper layer > > > do "stuff" on overlayfs, and close all files on overlayfs > > > get a writeback error on upper layer > > > call sync() (sync_fs gets run) > > > open file on upper layer mount > > > call syncfs() on upper-layer fd > > > > > > Should that last syncfs error report an error? > > > > Actually, I was thinking of following. > > - mount upper layer > > - mount overlayfs (non-volatile) > > - Do bunch of writes. > > - A writeback error happens on upper file and gets recorded in > > upper fs sb. > > - overlay application calls syncfs(fd) and gets the error back. IIUC, > > the way currently things are written, syncfs(fd) will not return > > writeback errors on overlayfs. > > > > > > > > Also, suppose if at the end we instead opened a file on overlayfs and > > > issued the syncfs() there -- should we see the error in that case? > > > > I am thinking that behavior should be similar to as if two file > > descriptors have been opened on a regular filesystem. So if I open > > one fd1 on overlay and one fd2 on upper and they both were opened > > before writeback error happend, then syncfs(fd1) and syncfs(fd2), > > both should see the error. > > > > > Yes, that will happen as a matter of course. > > > And any of syncfs(fd1) and syncfs(fd2) should set the SEEN flag in > > upper_sb so that new errors can continue to be reported. > > > > The SEEN flag indicates whether a later opener should see an error that > predated the open. Currently, it will iff no one else has scraped the > error when the open is done. > > Once we start dealing with overlayfs though, things are a bit more > murky. If someone issues a sync on the upper sb and that triggers a > writeback error. If I then do an open+syncfs on the overlay, should I > see the error? I think that yes, open+syncfs on the overlay should see this UNSEEN error. IOW, this will be similar to as if somebody did an open+syncfs on upper and scrapped UNSEEN error. > > What about in the reverse case? Same for reverse case. If overlayfs triggered sync and resulted in in unseen error on upper sb, then a later open+syncfs on upper should see the error. Thanks Vivek > > > IOW, so looks like major problem with this patch is that we need > > to propagate error from upper_sb to overlaysb only if a valid > > file descriptor is open. IOW, do this in syncfs(fd) path and not > > sync() path. And to distinguish between two, we probably need to > > pass additional parameter in ->sync_fs(). > > > > Am I missing somehting. Just trying to make sure that if we are > > solving the problem of syncfs error propagation in overlay, lets > > solve it both for volatile as well as non-volatile case so that > > there is less confusion later. > > > > It may be possible to propagate the errors in some fashion, but it's > starting to sound pretty complex. I think we'd probably be better served > by cleaning things up so that overlayfs can just return an error of its > choosing to syncfs(). > > What may actually be best is to add a new ->syncfs op to struct > file_operations, and turn the current syncfs syscall wrapper into a > generic_syncfs or something. Then you could just define a syncfs op for > overlayfs and do what you like in there. > > -- > Jeff Layton <jlayton@kernel.org> > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs 2020-12-14 23:53 ` Jeff Layton 2020-12-15 13:16 ` Jeff Layton @ 2020-12-15 15:06 ` Vivek Goyal 1 sibling, 0 replies; 18+ messages in thread From: Vivek Goyal @ 2020-12-15 15:06 UTC (permalink / raw) To: Jeff Layton Cc: Amir Goldstein, Sargun Dhillon, Miklos Szeredi, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara On Mon, Dec 14, 2020 at 06:53:10PM -0500, Jeff Layton wrote: > On Mon, 2020-12-14 at 16:38 -0500, Vivek Goyal wrote: > > On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote: > > > Peek at the upper layer's errseq_t at mount time for volatile mounts, > > > and record it in the per-sb info. In sync_fs, check for an error since > > > the recorded point and set it in the overlayfs superblock if there was > > > one. > > > > > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > > > --- > > > > While we are solving problem for non-volatile overlay mount, I also > > started thinking, what about non-volatile overlay syncfs() writeback errors. > > Looks like these will not be reported to user space at all as of now > > (because we never update overlay_sb->s_wb_err ever). > > > > A patch like this might fix it. (compile tested only). > > > > overlayfs: Report syncfs() errors to user space > > > > Currently, syncfs(), calls filesystem ->sync_fs() method but ignores the > > return code. But certain writeback errors can still be reported on > > syncfs() by checking errors on super block. > > > > ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err); > > > > For the case of overlayfs, we never set overlayfs super block s_wb_err. That > > means sync() will never report writeback errors on overlayfs uppon syncfs(). > > > > Fix this by updating overlay sb->sb_wb_err upon ->sync_fs() call. And that > > should mean that user space syncfs() call should see writeback errors. > > > > ovl_fsync() does not need anything special because if there are writeback > > errors underlying filesystem will report it through vfs_fsync_range() return > > code and user space will see it. > > > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com> > > --- > > fs/overlayfs/ovl_entry.h | 1 + > > fs/overlayfs/super.c | 14 +++++++++++--- > > 2 files changed, 12 insertions(+), 3 deletions(-) > > > > Index: redhat-linux/fs/overlayfs/super.c > > =================================================================== > > --- redhat-linux.orig/fs/overlayfs/super.c 2020-12-14 15:33:43.934400880 -0500 > > +++ redhat-linux/fs/overlayfs/super.c 2020-12-14 16:15:07.127400880 -0500 > > @@ -259,7 +259,7 @@ static int ovl_sync_fs(struct super_bloc > > { > > struct ovl_fs *ofs = sb->s_fs_info; > > struct super_block *upper_sb; > > - int ret; > > + int ret, ret2; > > > > > > > > > > if (!ovl_upper_mnt(ofs)) > > return 0; > > @@ -283,7 +283,14 @@ static int ovl_sync_fs(struct super_bloc > > ret = sync_filesystem(upper_sb); > > up_read(&upper_sb->s_umount); > > > > > > > > > > - return ret; > > + if (errseq_check(&upper_sb->s_wb_err, sb->s_wb_err)) { > > + /* Upper sb has errors since last time */ > > + spin_lock(&ofs->errseq_lock); > > + ret2 = errseq_check_and_advance(&upper_sb->s_wb_err, > > + &sb->s_wb_err); > > + spin_unlock(&ofs->errseq_lock); > > + } > > + return ret ? ret : ret2; > > I think this is probably not quite right. > > The problem I think is that the SEEN flag is always going to end up > being set in sb->s_wb_err, and that is going to violate the desired > semantics. If the writeback error occurred after all fd's were closed, > then the next opener wouldn't see it and you'd lose the error. So this will happen only due to sync() path and not syncfs() path, right? If we avoid calling errseq_check_and_advance() in sync() path in ovleryafs(), then we always have a valid fd and marking error SEEN on upper is perfectly valid? > > We probably need a function to cleanly propagate the error from one > errseq_t to another so that that doesn't occur. I'll have to think about > it. Thanks Vivek > > > } > > > > > > > > > > /** > > @@ -1873,6 +1880,7 @@ static int ovl_fill_super(struct super_b > > if (!cred) > > goto out_err; > > > > > > > > > > + spin_lock_init(&ofs->errseq_lock); > > /* Is there a reason anyone would want not to share whiteouts? */ > > ofs->share_whiteout = true; > > > > > > > > > > @@ -1945,7 +1953,7 @@ static int ovl_fill_super(struct super_b > > > > > > > > > > sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth; > > sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran; > > - > > + sb->s_wb_err = errseq_sample(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err); > > } > > oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers); > > err = PTR_ERR(oe); > > Index: redhat-linux/fs/overlayfs/ovl_entry.h > > =================================================================== > > --- redhat-linux.orig/fs/overlayfs/ovl_entry.h 2020-12-14 15:33:43.934400880 -0500 > > +++ redhat-linux/fs/overlayfs/ovl_entry.h 2020-12-14 15:34:13.509400880 -0500 > > @@ -79,6 +79,7 @@ struct ovl_fs { > > atomic_long_t last_ino; > > /* Whiteout dentry cache */ > > struct dentry *whiteout; > > + spinlock_t errseq_lock; > > }; > > > > > > > > > > static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs) > > > > -- > Jeff Layton <jlayton@kernel.org> > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs 2020-12-13 13:27 ` [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs Jeff Layton 2020-12-14 21:38 ` Vivek Goyal @ 2020-12-17 19:28 ` Sargun Dhillon 1 sibling, 0 replies; 18+ messages in thread From: Sargun Dhillon @ 2020-12-17 19:28 UTC (permalink / raw) To: Jeff Layton Cc: Amir Goldstein, Miklos Szeredi, Vivek Goyal, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara On Sun, Dec 13, 2020 at 08:27:13AM -0500, Jeff Layton wrote: > Peek at the upper layer's errseq_t at mount time for volatile mounts, > and record it in the per-sb info. In sync_fs, check for an error since > the recorded point and set it in the overlayfs superblock if there was > one. > > Signed-off-by: Jeff Layton <jlayton@kernel.org> > --- > fs/overlayfs/ovl_entry.h | 1 + > fs/overlayfs/super.c | 14 +++++++++++--- > 2 files changed, 12 insertions(+), 3 deletions(-) > > diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h > index 1b5a2094df8e..fcfcc3951973 100644 > --- a/fs/overlayfs/ovl_entry.h > +++ b/fs/overlayfs/ovl_entry.h > @@ -79,6 +79,7 @@ struct ovl_fs { > atomic_long_t last_ino; > /* Whiteout dentry cache */ > struct dentry *whiteout; > + errseq_t err_mark; > }; > > static inline struct vfsmount *ovl_upper_mnt(struct ovl_fs *ofs) > diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c > index 290983bcfbb3..2985d2752970 100644 > --- a/fs/overlayfs/super.c > +++ b/fs/overlayfs/super.c > @@ -264,8 +264,13 @@ static int ovl_sync_fs(struct super_block *sb, int wait) > if (!ovl_upper_mnt(ofs)) > return 0; > > - if (!ovl_should_sync(ofs)) > - return 0; > + if (!ovl_should_sync(ofs)) { > + /* Propagate errors from upper to overlayfs */ > + ret = errseq_check(&upper_sb->s_wb_err, ofs->err_mark); > + errseq_set(&sb->s_wb_err, ret); > + return ret; > + } > + > /* > * Not called for sync(2) call or an emergency sync (SB_I_SKIP_SYNC). > * All the super blocks will be iterated, including upper_sb. > @@ -1945,8 +1950,11 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent) > > sb->s_stack_depth = ovl_upper_mnt(ofs)->mnt_sb->s_stack_depth; > sb->s_time_gran = ovl_upper_mnt(ofs)->mnt_sb->s_time_gran; > - > } > + > + if (ofs->config.ovl_volatile) > + ofs->err_mark = errseq_peek(&ovl_upper_mnt(ofs)->mnt_sb->s_wb_err); > + > oe = ovl_get_lowerstack(sb, splitlower, numlower, ofs, layers); > err = PTR_ERR(oe); > if (IS_ERR(oe)) > -- > 2.29.2 > I've tested this with the following scenarios, seems to work: Test: 1. Mount ext2 on /mnt/loop, and cause a writeback error 2. Verify syncfs on /mnt/loop shows error 3. Mount volatile filesystem 4. Create file on volatile filesystem, and verify that I can syncfs it without error --- Fork: 5a. Create a file on overlayfs, and generate a writeback error 6a. Syncfs overlayfs. 7a. Create a new file on overlayfs, and syncfs, and verify it returns error --- 5b. Create a file on loop back, and generate a writeback error 6b. Sync said file 7b. Verify syncfs on loop returns error once, and then success on next attempts 8b. Verify all syncfs on overlayfs now fail --- 5c. Create file on overlayfs, and generate a writeback error 6c. Sync overlayfs, and verify all syncs are failures 7c. Verify syncfs on loop fails once. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case 2020-12-13 13:27 [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case Jeff Layton 2020-12-13 13:27 ` [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags Jeff Layton 2020-12-13 13:27 ` [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs Jeff Layton @ 2020-12-13 20:31 ` Sargun Dhillon 2 siblings, 0 replies; 18+ messages in thread From: Sargun Dhillon @ 2020-12-13 20:31 UTC (permalink / raw) To: Jeff Layton Cc: Amir Goldstein, Miklos Szeredi, Vivek Goyal, overlayfs, Linux FS-devel Mailing List, Matthew Wilcox, NeilBrown, Jan Kara On Sun, Dec 13, 2020 at 08:27:11AM -0500, Jeff Layton wrote: > What about this as an alternate approach to the problem that Sargun has > been working on? I have some minor concerns about the complexity of > managing a stateful object across two different words. That can be > done, but I think this may be simpler. > > This set steals an extra flag bit from the errseq_t counter so that we > have two flags: one indicating whether to increment the counter at set > time, and another to indicate whether the error has been reported to > userland. > This approach works, and I believe you suggested it early on, but I was unsure whether it was okay to use another bit for state information. > This should give you the semantics you want in the syncfs case, no? If > this does look like it's a suitable approach, then I'll plan to clean up > the comments and docs. > From a raw semantics perspective, this looks correct, and it looks like we could stash it as well for later reference (there's no going backwards, and....well, 2**19 errors is unlikely.). We do ~10s of overlayfs mounts / sec at peak, but even then we usually see a single disk error on a machine before it fails, I'm not sure if in the field people get more churn out of the errseq than that. > I have a vague feeling that this might help us eventually kill the > AS_EIO and AS_ENOSPC bits too, but that would require a bit more work to > plumb in "since" samples at appropriate places. > > Jeff Layton (2): > errseq: split the SEEN flag into two new flags > overlayfs: propagate errors from upper to overlay sb in sync_fs > > fs/overlayfs/ovl_entry.h | 1 + > fs/overlayfs/super.c | 14 +++++++-- > include/linux/errseq.h | 2 ++ > lib/errseq.c | 64 +++++++++++++++++++++++++++++++++------- > 4 files changed, 67 insertions(+), 14 deletions(-) > > -- > 2.29.2 > ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2020-12-17 19:29 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-12-13 13:27 [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case Jeff Layton 2020-12-13 13:27 ` [RFC PATCH 1/2] errseq: split the SEEN flag into two new flags Jeff Layton 2020-12-13 23:35 ` NeilBrown 2020-12-14 13:37 ` Jeffrey Layton 2020-12-14 22:00 ` NeilBrown 2020-12-14 23:32 ` Jeff Layton 2020-12-13 13:27 ` [RFC PATCH 2/2] overlayfs: propagate errors from upper to overlay sb in sync_fs Jeff Layton 2020-12-14 21:38 ` Vivek Goyal 2020-12-14 22:04 ` Sargun Dhillon 2020-12-14 23:01 ` Vivek Goyal 2020-12-14 23:53 ` Jeff Layton 2020-12-15 13:16 ` Jeff Layton 2020-12-15 14:59 ` Vivek Goyal 2020-12-15 15:23 ` Jeff Layton 2020-12-15 15:39 ` Vivek Goyal 2020-12-15 15:06 ` Vivek Goyal 2020-12-17 19:28 ` Sargun Dhillon 2020-12-13 20:31 ` [RFC PATCH 0/2] errseq+overlayfs: accomodate the volatile upper layer use-case Sargun Dhillon
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).