On Wed, Apr 12 2017, Jeff Layton wrote:

> On Wed, 2017-04-12 at 07:38 -0700, Matthew Wilcox wrote:
>> On Wed, Apr 12, 2017 at 09:01:34AM -0400, Jeff Layton wrote:
>> > On Wed, 2017-04-12 at 08:06 -0400, Jeff Layton wrote:
>> > > Not sure what to do here just yet.
>> > > 
>> > > Signed-off-by: Jeff Layton <jlayton@redhat.com>
>> > > ---
>> > >  mm/page-writeback.c | 6 ++++++
>> > >  1 file changed, 6 insertions(+)
>> > > 
>> > > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
>> > > index de0dbf12e2c1..3ac8399dc984 100644
>> > > --- a/mm/page-writeback.c
>> > > +++ b/mm/page-writeback.c
>> > > @@ -2388,6 +2388,12 @@ int write_one_page(struct page *page)
>> > >  		ret = mapping->a_ops->writepage(page, &wbc);
>> > >  		if (ret == 0) {
>> > >  			wait_on_page_writeback(page);
>> > > +			/*
>> > > +			 * FIXME: is this racy? What guarantees that PG_error
>> > > +			 * will still be set once we get around to checking it?
>> > > +			 * What if writeback fails, but then a read is issued
>> > > +			 * before we check this, and that calls ClearPageError?
>> > > +			 */
>> > >  			if (PageError(page))
>> > >  				ret = -EIO;
>> > >  		}
>> > 
>> > Ahh, we are always under the page lock here, and this is generally used
>> > for writing out directory pages anyway. I'm fine with dropping this
>> > patch unless someone else sees a problem here.
>> 
>> ->writepage drops the page lock.  We're still holding a refcount on this
>> page, but that's not going to prevent read being called.  But maybe the
>> filesystem won't call read on a page that's marked as PageError?
>
> Hard to be sure there. I really wonder if that check is needed at all,
> the more I look at it. After all, we are calling writepage with
> WB_SYNC_ALL so we should get an error there.

WB_SYNC_ALL doesn't cause writepage to wait.  It might case it to ask
for REQ_SYNC, so the write requests gets priority in the block layer.
WB_SYNC_ALL does cause writepages (with an 's') to wait.
(At least, that is how I read the code).

>
> Is it also possible these pages could be written back before that point
> (due to memory pressure or something) and that fail?

Probably, in which case clear_page_dirty_for_io() will fail and
write_one_page() will just unlock the page.

>
> Maybe we should just have a call to filemap_check_errors on exiting
> this function?

I'm leaning in that direction.

>
> With the the wb_err_t based stuff, we could change it to sample the
> wb_err early, and then use that to see if an error has occurred since
> then. Maybe we should even allow callers to pass a wb_err_t in here, so
> we can report errors that have occurred since a known point?

That feels to me like over-engineering.  We would need to
unconditionally call writepage() for that to work.

We seem to be agreed that write errors for buffered writes are reported
per-address-space.  To get per-page errors you have to use direct IO.
Let's focus on that policy and make it work.

Thanks,
NeilBrown