All of lore.kernel.org
 help / color / mirror / Atom feed
* help converting btrfs to new writeback error tracking?
@ 2017-05-04 11:26 Jeff Layton
  2017-05-05 19:21 ` Liu Bo
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2017-05-04 11:26 UTC (permalink / raw)
  To: linux-btrfs

I've been working on set of patches to clean up how writeback errors are
tracked and handled in the kernel:

http://marc.info/?l=linux-fsdevel&m=149304074111261&w=2

The basic idea is that rather than having a set of flags that are
cleared whenever they are checked, we have a sequence counter and error
that are tracked on a per-mapping basis, and can then use that sequence
counter to tell whether the error should be reported.

This changes the way that things like filemap_write_and_wait work.
Rather than having to ensure that AS_EIO/AS_ENOSPC are not cleared
inappropriately (and thus losing errors that should be reported), you
can now tell whether there has been a writeback error since a certain
point in time, irrespective of whether anyone else is checking for
errors.

I've been doing some conversions of the existing code to the new scheme,
but btrfs has _really_ complicated error handling. I think it could
probably be simplified with this new scheme, but I could use some help
here.

What I think we probably want to do is to sample the error sequence in
the mapping at well-defined points in time (probably when starting a
transaction?) and then use that to determine whether writeback errors
have occurred since then. Is there anyone in the btrfs community who
could help me here?

Thanks,
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: help converting btrfs to new writeback error tracking?
  2017-05-04 11:26 help converting btrfs to new writeback error tracking? Jeff Layton
@ 2017-05-05 19:21 ` Liu Bo
  2017-05-05 20:11   ` Jeff Layton
  0 siblings, 1 reply; 5+ messages in thread
From: Liu Bo @ 2017-05-05 19:21 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-btrfs

Hi Jeff,

On Thu, May 04, 2017 at 07:26:17AM -0400, Jeff Layton wrote:
> I've been working on set of patches to clean up how writeback errors are
> tracked and handled in the kernel:
> 
> http://marc.info/?l=linux-fsdevel&m=149304074111261&w=2
> 
> The basic idea is that rather than having a set of flags that are
> cleared whenever they are checked, we have a sequence counter and error
> that are tracked on a per-mapping basis, and can then use that sequence
> counter to tell whether the error should be reported.
> 
> This changes the way that things like filemap_write_and_wait work.
> Rather than having to ensure that AS_EIO/AS_ENOSPC are not cleared
> inappropriately (and thus losing errors that should be reported), you
> can now tell whether there has been a writeback error since a certain
> point in time, irrespective of whether anyone else is checking for
> errors.
> 
> I've been doing some conversions of the existing code to the new scheme,
> but btrfs has _really_ complicated error handling. I think it could
> probably be simplified with this new scheme, but I could use some help
> here.
> 
> What I think we probably want to do is to sample the error sequence in
> the mapping at well-defined points in time (probably when starting a
> transaction?) and then use that to determine whether writeback errors
> have occurred since then. Is there anyone in the btrfs community who
> could help me here?
>

I went through the patch set and reviewed the btrfs part particular in
[PATCH v3 14/20] fs: retrofit old error reporting API onto new infrastructure

It looks good to me.

In btrfs ->writepage(), it sets PG_error whenever an error
(-EIO/-ENOSPC/-ENOMEM) occurs and it sets mapping's error as well in
end_extent_writepage().  And the special case is the compression code, where it
only sets mapping's error when there is any error during processing compression
bytes.

Similar to ext4, btrfs tracks the IO error by setting mapping's error in
writepage_endio and other places (eg. compression code), and around tree-log.c
it's checking BTRFS_ORDERED_IOERR from ordered_extent->flags, which is usually
set in writepage_endio and sometimes in some error handling code where it
couldn't call endio.

So the conversion in btrfs's fsync() seems to be good enough, did I miss
anything?

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: help converting btrfs to new writeback error tracking?
  2017-05-05 19:21 ` Liu Bo
@ 2017-05-05 20:11   ` Jeff Layton
  2017-05-08 18:39     ` Liu Bo
  0 siblings, 1 reply; 5+ messages in thread
From: Jeff Layton @ 2017-05-05 20:11 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs

On Fri, 2017-05-05 at 12:21 -0700, Liu Bo wrote:
> Hi Jeff,
> 
> On Thu, May 04, 2017 at 07:26:17AM -0400, Jeff Layton wrote:
> > I've been working on set of patches to clean up how writeback errors are
> > tracked and handled in the kernel:
> > 
> > http://marc.info/?l=linux-fsdevel&m=149304074111261&w=2
> > 
> > The basic idea is that rather than having a set of flags that are
> > cleared whenever they are checked, we have a sequence counter and error
> > that are tracked on a per-mapping basis, and can then use that sequence
> > counter to tell whether the error should be reported.
> > 
> > This changes the way that things like filemap_write_and_wait work.
> > Rather than having to ensure that AS_EIO/AS_ENOSPC are not cleared
> > inappropriately (and thus losing errors that should be reported), you
> > can now tell whether there has been a writeback error since a certain
> > point in time, irrespective of whether anyone else is checking for
> > errors.
> > 
> > I've been doing some conversions of the existing code to the new scheme,
> > but btrfs has _really_ complicated error handling. I think it could
> > probably be simplified with this new scheme, but I could use some help
> > here.
> > 
> > What I think we probably want to do is to sample the error sequence in
> > the mapping at well-defined points in time (probably when starting a
> > transaction?) and then use that to determine whether writeback errors
> > have occurred since then. Is there anyone in the btrfs community who
> > could help me here?
> > 
> 
> I went through the patch set and reviewed the btrfs part particular in
> [PATCH v3 14/20] fs: retrofit old error reporting API onto new infrastructure
> 
> It looks good to me.
> 
> In btrfs ->writepage(), it sets PG_error whenever an error
> (-EIO/-ENOSPC/-ENOMEM) occurs and it sets mapping's error as well in
> end_extent_writepage().  And the special case is the compression code, where it
> only sets mapping's error when there is any error during processing compression
> bytes.
> 
> Similar to ext4, btrfs tracks the IO error by setting mapping's error in
> writepage_endio and other places (eg. compression code), and around tree-log.c
> it's checking BTRFS_ORDERED_IOERR from ordered_extent->flags, which is usually
> set in writepage_endio and sometimes in some error handling code where it
> couldn't call endio.
> 
> So the conversion in btrfs's fsync() seems to be good enough, did I miss
> anything?
> 

Many thanks for taking a look:

There are a number of calls in btrfs to filemap_fdatawait_range that
check the return code. That function will wait for writeback on all of
the pages in the mapping range and return an error if there has been
one. Note too that there are also some places that ignore the return
code.

These patches change how filemap_fdatawait_range (and some similar
functions) work. Before this set, you'd get an error if one had occurred
since anyone last checked it. Now, you only get an error there if one
occurred since you started waiting. If the failed writeback occurred
before that function was called, you won't get an error back.

For fsync, it shouldn't matter. You'll get an error back there if one
occurred since the last fsync since you're setting it in the mapping.
The bigger question is whether other callers in this code do anything
with that error return.

If they do, then the next question is: from what point do you want to
detect errors that have occurred?

What sort of makes sense to me (in a handwavy way) would be to sample
the errseq_t in the mapping when you start a transaction, and then check
vs. that for errors. Then, even if you have parallel transactions going
on the same inode (is that even possible?) then you can tell whether
they all succeded or not.

Thoughts?
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: help converting btrfs to new writeback error tracking?
  2017-05-05 20:11   ` Jeff Layton
@ 2017-05-08 18:39     ` Liu Bo
  2017-05-09 11:15       ` Jeff Layton
  0 siblings, 1 reply; 5+ messages in thread
From: Liu Bo @ 2017-05-08 18:39 UTC (permalink / raw)
  To: Jeff Layton; +Cc: linux-btrfs

Hi Jeff,

On Fri, May 05, 2017 at 04:11:18PM -0400, Jeff Layton wrote:
> On Fri, 2017-05-05 at 12:21 -0700, Liu Bo wrote:
> > Hi Jeff,
> > 
> > On Thu, May 04, 2017 at 07:26:17AM -0400, Jeff Layton wrote:
> > > I've been working on set of patches to clean up how writeback errors are
> > > tracked and handled in the kernel:
> > > 
> > > http://marc.info/?l=linux-fsdevel&m=149304074111261&w=2
> > > 
> > > The basic idea is that rather than having a set of flags that are
> > > cleared whenever they are checked, we have a sequence counter and error
> > > that are tracked on a per-mapping basis, and can then use that sequence
> > > counter to tell whether the error should be reported.
> > > 
> > > This changes the way that things like filemap_write_and_wait work.
> > > Rather than having to ensure that AS_EIO/AS_ENOSPC are not cleared
> > > inappropriately (and thus losing errors that should be reported), you
> > > can now tell whether there has been a writeback error since a certain
> > > point in time, irrespective of whether anyone else is checking for
> > > errors.
> > > 
> > > I've been doing some conversions of the existing code to the new scheme,
> > > but btrfs has _really_ complicated error handling. I think it could
> > > probably be simplified with this new scheme, but I could use some help
> > > here.
> > > 
> > > What I think we probably want to do is to sample the error sequence in
> > > the mapping at well-defined points in time (probably when starting a
> > > transaction?) and then use that to determine whether writeback errors
> > > have occurred since then. Is there anyone in the btrfs community who
> > > could help me here?
> > > 
> > 
> > I went through the patch set and reviewed the btrfs part particular in
> > [PATCH v3 14/20] fs: retrofit old error reporting API onto new infrastructure
> > 
> > It looks good to me.
> > 
> > In btrfs ->writepage(), it sets PG_error whenever an error
> > (-EIO/-ENOSPC/-ENOMEM) occurs and it sets mapping's error as well in
> > end_extent_writepage().  And the special case is the compression code, where it
> > only sets mapping's error when there is any error during processing compression
> > bytes.
> > 
> > Similar to ext4, btrfs tracks the IO error by setting mapping's error in
> > writepage_endio and other places (eg. compression code), and around tree-log.c
> > it's checking BTRFS_ORDERED_IOERR from ordered_extent->flags, which is usually
> > set in writepage_endio and sometimes in some error handling code where it
> > couldn't call endio.
> > 
> > So the conversion in btrfs's fsync() seems to be good enough, did I miss
> > anything?
> > 
> 
> Many thanks for taking a look:
> 
> There are a number of calls in btrfs to filemap_fdatawait_range that
> check the return code. That function will wait for writeback on all of
> the pages in the mapping range and return an error if there has been
> one. Note too that there are also some places that ignore the return
> code.
> 
> These patches change how filemap_fdatawait_range (and some similar
> functions) work. Before this set, you'd get an error if one had occurred
> since anyone last checked it. Now, you only get an error there if one
> occurred since you started waiting. If the failed writeback occurred
> before that function was called, you won't get an error back.
> 

Since all filemap_fdatawait_range() called in btrfs checked the return value, it
is supposed to catch any errors that are occured from filemap_fdatawrite_range()
which is called twice by btrfs_fdatawrite_range()[1], so with this set, it's
possible to fail to detect errors if only calling filemap_fdatawait_range().

[1]: filemap_fdatawrite_range() needs to be called twice to make sure compressed
data is flushed.

> For fsync, it shouldn't matter. You'll get an error back there if one
> occurred since the last fsync since you're setting it in the mapping.
> The bigger question is whether other callers in this code do anything
> with that error return.
> 
> If they do, then the next question is: from what point do you want to
> detect errors that have occurred?
> 
> What sort of makes sense to me (in a handwavy way) would be to sample
> the errseq_t in the mapping when you start a transaction, and then check
> vs. that for errors. Then, even if you have parallel transactions going
> on the same inode (is that even possible?) then you can tell whether
> they all succeded or not.
> 
> Thoughts?

That may not work, a transaction here is kind of unrelated to mapping and
writeback of a mapping can be processed accross two transactions.

Is it possible to provide another version of filemap_fdatawait_range() to take
@since as an argument?  If so, we can sample wb_err before calling
filemap_fdatawrite_range() (or btrfs_fdatawrite_range()) and collect any wb_err
after filemap_fdatawait_range().

Thanks,

-liubo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: help converting btrfs to new writeback error tracking?
  2017-05-08 18:39     ` Liu Bo
@ 2017-05-09 11:15       ` Jeff Layton
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff Layton @ 2017-05-09 11:15 UTC (permalink / raw)
  To: bo.li.liu; +Cc: linux-btrfs

On Mon, 2017-05-08 at 11:39 -0700, Liu Bo wrote:
> Hi Jeff,
> 
> On Fri, May 05, 2017 at 04:11:18PM -0400, Jeff Layton wrote:
> > On Fri, 2017-05-05 at 12:21 -0700, Liu Bo wrote:
> > > Hi Jeff,
> > > 
> > > On Thu, May 04, 2017 at 07:26:17AM -0400, Jeff Layton wrote:
> > > > I've been working on set of patches to clean up how writeback errors are
> > > > tracked and handled in the kernel:
> > > > 
> > > > http://marc.info/?l=linux-fsdevel&m=149304074111261&w=2
> > > > 
> > > > The basic idea is that rather than having a set of flags that are
> > > > cleared whenever they are checked, we have a sequence counter and error
> > > > that are tracked on a per-mapping basis, and can then use that sequence
> > > > counter to tell whether the error should be reported.
> > > > 
> > > > This changes the way that things like filemap_write_and_wait work.
> > > > Rather than having to ensure that AS_EIO/AS_ENOSPC are not cleared
> > > > inappropriately (and thus losing errors that should be reported), you
> > > > can now tell whether there has been a writeback error since a certain
> > > > point in time, irrespective of whether anyone else is checking for
> > > > errors.
> > > > 
> > > > I've been doing some conversions of the existing code to the new scheme,
> > > > but btrfs has _really_ complicated error handling. I think it could
> > > > probably be simplified with this new scheme, but I could use some help
> > > > here.
> > > > 
> > > > What I think we probably want to do is to sample the error sequence in
> > > > the mapping at well-defined points in time (probably when starting a
> > > > transaction?) and then use that to determine whether writeback errors
> > > > have occurred since then. Is there anyone in the btrfs community who
> > > > could help me here?
> > > > 
> > > 
> > > I went through the patch set and reviewed the btrfs part particular in
> > > [PATCH v3 14/20] fs: retrofit old error reporting API onto new infrastructure
> > > 
> > > It looks good to me.
> > > 
> > > In btrfs ->writepage(), it sets PG_error whenever an error
> > > (-EIO/-ENOSPC/-ENOMEM) occurs and it sets mapping's error as well in
> > > end_extent_writepage().  And the special case is the compression code, where it
> > > only sets mapping's error when there is any error during processing compression
> > > bytes.
> > > 
> > > Similar to ext4, btrfs tracks the IO error by setting mapping's error in
> > > writepage_endio and other places (eg. compression code), and around tree-log.c
> > > it's checking BTRFS_ORDERED_IOERR from ordered_extent->flags, which is usually
> > > set in writepage_endio and sometimes in some error handling code where it
> > > couldn't call endio.
> > > 
> > > So the conversion in btrfs's fsync() seems to be good enough, did I miss
> > > anything?
> > > 
> > 
> > Many thanks for taking a look:
> > 
> > There are a number of calls in btrfs to filemap_fdatawait_range that
> > check the return code. That function will wait for writeback on all of
> > the pages in the mapping range and return an error if there has been
> > one. Note too that there are also some places that ignore the return
> > code.
> > 
> > These patches change how filemap_fdatawait_range (and some similar
> > functions) work. Before this set, you'd get an error if one had occurred
> > since anyone last checked it. Now, you only get an error there if one
> > occurred since you started waiting. If the failed writeback occurred
> > before that function was called, you won't get an error back.
> > 
> 
> Since all filemap_fdatawait_range() called in btrfs checked the return value, it
> is supposed to catch any errors that are occured from filemap_fdatawrite_range()
> which is called twice by btrfs_fdatawrite_range()[1], so with this set, it's
> possible to fail to detect errors if only calling filemap_fdatawait_range().
> 
> [1]: filemap_fdatawrite_range() needs to be called twice to make sure compressed
> data is flushed.
> 
> > For fsync, it shouldn't matter. You'll get an error back there if one
> > occurred since the last fsync since you're setting it in the mapping.
> > The bigger question is whether other callers in this code do anything
> > with that error return.
> > 
> > If they do, then the next question is: from what point do you want to
> > detect errors that have occurred?
> > 
> > What sort of makes sense to me (in a handwavy way) would be to sample
> > the errseq_t in the mapping when you start a transaction, and then check
> > vs. that for errors. Then, even if you have parallel transactions going
> > on the same inode (is that even possible?) then you can tell whether
> > they all succeded or not.
> > 
> > Thoughts?
> 
> That may not work, a transaction here is kind of unrelated to mapping and
> writeback of a mapping can be processed accross two transactions.
> 

Got it, thanks.

> Is it possible to provide another version of filemap_fdatawait_range() to take
> @since as an argument?  If so, we can sample wb_err before calling
> filemap_fdatawrite_range() (or btrfs_fdatawrite_range()) and collect any wb_err
> after filemap_fdatawait_range().
> 

We can absolutely add a filemap_fdatawait_range_since() or something
along those lines. I figured we'd need something like that but didn't
want to add it until we had a clear idea of how it would be used. I'll
spin something up along those lines for the next iteration of the
patchset.

Sampling wb_err just before calling filemap_fdatawrite_range() however
may not be sufficient. It's always possible with the pagecache that
memory pressure or something could cause writeback to start between when
you dirty the page and when you sample wb_err. If that happens and the
writeback fails before you sample the error, then you wouldn't see it.

I think you may want to aim for sampling wb_err before dirtying the
pages that will eventually be written out to close the potential for
that race.

Thanks,
-- 
Jeff Layton <jlayton@redhat.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-05-09 11:15 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-04 11:26 help converting btrfs to new writeback error tracking? Jeff Layton
2017-05-05 19:21 ` Liu Bo
2017-05-05 20:11   ` Jeff Layton
2017-05-08 18:39     ` Liu Bo
2017-05-09 11:15       ` Jeff Layton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.