From: NeilBrown <neilb@suse.com> To: Jeff Layton <jlayton@poochiereds.net>, Trond Myklebust <trondmy@primarydata.com>, "kwolf\@redhat.com" <kwolf@redhat.com>, "tytso\@mit.edu" <tytso@mit.edu> Cc: "lsf-pc\@lists.linux-foundation.org" <lsf-pc@lists.linux-foundation.org>, "hch\@infradead.org" <hch@infradead.org>, "riel\@redhat.com" <riel@redhat.com>, "rwheeler\@redhat.com" <rwheeler@redhat.com>, "linux-mm\@kvack.org" <linux-mm@kvack.org>, "linux-fsdevel\@vger.kernel.org" <linux-fsdevel@vger.kernel.org> Subject: Re: [Lsf-pc] [LSF/MM TOPIC] I/O error handling and fsync() Date: Wed, 25 Jan 2017 08:58:28 +1100 [thread overview] Message-ID: <87y3y0glx7.fsf@notabene.neil.brown.name> (raw) In-Reply-To: <1485218787.2786.23.camel@poochiereds.net> [-- Attachment #1: Type: text/plain, Size: 4250 bytes --] On Mon, Jan 23 2017, Jeff Layton wrote: > On Tue, 2017-01-24 at 11:16 +1100, NeilBrown wrote: >> On Mon, Jan 23 2017, Trond Myklebust wrote: >> >> > On Mon, 2017-01-23 at 17:35 -0500, Jeff Layton wrote: >> > > On Mon, 2017-01-23 at 11:09 +0100, Kevin Wolf wrote: >> > > > >> > > > However, if we look at the greater problem of hanging requests that >> > > > came >> > > > up in the more recent emails of this thread, it is only moved >> > > > rather >> > > > than solved. Chances are that already write() would hang now >> > > > instead of >> > > > only fsync(), but we still have a hard time dealing with this. >> > > > >> > > >> > > Well, it _is_ better with O_DIRECT as you can usually at least break >> > > out >> > > of the I/O with SIGKILL. >> > > >> > > When I last looked at this, the problem with buffered I/O was that >> > > you >> > > often end up waiting on page bits to clear (usually PG_writeback or >> > > PG_dirty), in non-killable sleeps for the most part. >> > > >> > > Maybe the fix here is as simple as changing that? >> > >> > At the risk of kicking off another O_PONIES discussion: Add an >> > open(O_TIMEOUT) flag that would let the kernel know that the >> > application is prepared to handle timeouts from operations such as >> > read(), write() and fsync(), then add an ioctl() or syscall to allow >> > said application to set the timeout value. >> >> I was thinking on very similar lines, though I'd use 'fcntl()' if >> possible because it would be a per-"file description" option. >> This would be a function of the page cache, and a filesystem wouldn't >> need to know about it at all. Once enable, 'read', 'write', or 'fsync' >> would return EWOULDBLOCK rather than waiting indefinitely. >> It might be nice if 'select' could then be used on page-cache file >> descriptors, but I think that is much harder. Support O_TIMEOUT would >> be a practical first step - if someone agreed to actually try to use it. >> > > Yeah, that does seem like it might be worth exploring. > > That said, I think there's something even simpler we can do to make > things better for a lot of cases, and it may even help pave the way for > the proposal above. > > Looking closer and remembering more, I think the main problem area when > the pages are stuck in writeback is the wait_on_page_writeback call in > places like wait_for_stable_page and __filemap_fdatawait_range. I can't see wait_for_stable_page() being very relevant. That only blocks on backing devices which have requested stable pages. raid5 sometimes does that. Some scsi/sata devices can somehow. And rbd (part of ceph) sometimes does. I don't think NFS ever will. wait_for_stable_page() doesn't currently return an error, so getting to abort in SIGKILL would be a lot of work. filemap_fdatawait_range() is much easier. diff --git a/mm/filemap.c b/mm/filemap.c index b772a33ef640..2773f6dde1da 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -401,7 +401,9 @@ static int __filemap_fdatawait_range(struct address_space *mapping, if (page->index > end) continue; - wait_on_page_writeback(page); + if (PageWriteback(page)) + if (wait_on_page_bit_killable(page, PG_writeback)) + err = -ERESTARTSYS; if (TestClearPageError(page)) ret = -EIO; } That isn't a complete solution. There is code in f2fs which doesn't check the return value and probably should. And gfs2 calls mapping_set_error(mapping, error); with the return value, with we probably don't want in the ERESTARTSYS case. There are some usages in btrfs that I'd need to double-check too. But it looks to be manageable. Thanks, NeilBrown > > That uses an uninterruptible sleep and it's common to see applications > stuck there in these situations. They're unkillable too so your only > recourse is to hard reset the box when you can't reestablish > connectivity. > > I think it might be good to consider making some of those sleeps > TASK_KILLABLE. For instance, both of the above callers of those > functions are int return functions. It may be possible to return > ERESTARTSYS when the task catches a signal. > > -- > Jeff Layton <jlayton@poochiereds.net> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --]
WARNING: multiple messages have this Message-ID (diff)
From: NeilBrown <neilb@suse.com> To: Jeff Layton <jlayton@poochiereds.net>, Trond Myklebust <trondmy@primarydata.com>, "kwolf@redhat.com" <kwolf@redhat.com>, "tytso@mit.edu" <tytso@mit.edu> Cc: "lsf-pc@lists.linux-foundation.org" <lsf-pc@lists.linux-foundation.org>, "hch@infradead.org" <hch@infradead.org>, "riel@redhat.com" <riel@redhat.com>, "rwheeler@redhat.com" <rwheeler@redhat.com>, "linux-mm@kvack.org" <linux-mm@kvack.org>, "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org> Subject: Re: [Lsf-pc] [LSF/MM TOPIC] I/O error handling and fsync() Date: Wed, 25 Jan 2017 08:58:28 +1100 [thread overview] Message-ID: <87y3y0glx7.fsf@notabene.neil.brown.name> (raw) In-Reply-To: <1485218787.2786.23.camel@poochiereds.net> [-- Attachment #1: Type: text/plain, Size: 4250 bytes --] On Mon, Jan 23 2017, Jeff Layton wrote: > On Tue, 2017-01-24 at 11:16 +1100, NeilBrown wrote: >> On Mon, Jan 23 2017, Trond Myklebust wrote: >> >> > On Mon, 2017-01-23 at 17:35 -0500, Jeff Layton wrote: >> > > On Mon, 2017-01-23 at 11:09 +0100, Kevin Wolf wrote: >> > > > >> > > > However, if we look at the greater problem of hanging requests that >> > > > came >> > > > up in the more recent emails of this thread, it is only moved >> > > > rather >> > > > than solved. Chances are that already write() would hang now >> > > > instead of >> > > > only fsync(), but we still have a hard time dealing with this. >> > > > >> > > >> > > Well, it _is_ better with O_DIRECT as you can usually at least break >> > > out >> > > of the I/O with SIGKILL. >> > > >> > > When I last looked at this, the problem with buffered I/O was that >> > > you >> > > often end up waiting on page bits to clear (usually PG_writeback or >> > > PG_dirty), in non-killable sleeps for the most part. >> > > >> > > Maybe the fix here is as simple as changing that? >> > >> > At the risk of kicking off another O_PONIES discussion: Add an >> > open(O_TIMEOUT) flag that would let the kernel know that the >> > application is prepared to handle timeouts from operations such as >> > read(), write() and fsync(), then add an ioctl() or syscall to allow >> > said application to set the timeout value. >> >> I was thinking on very similar lines, though I'd use 'fcntl()' if >> possible because it would be a per-"file description" option. >> This would be a function of the page cache, and a filesystem wouldn't >> need to know about it at all. Once enable, 'read', 'write', or 'fsync' >> would return EWOULDBLOCK rather than waiting indefinitely. >> It might be nice if 'select' could then be used on page-cache file >> descriptors, but I think that is much harder. Support O_TIMEOUT would >> be a practical first step - if someone agreed to actually try to use it. >> > > Yeah, that does seem like it might be worth exploring. > > That said, I think there's something even simpler we can do to make > things better for a lot of cases, and it may even help pave the way for > the proposal above. > > Looking closer and remembering more, I think the main problem area when > the pages are stuck in writeback is the wait_on_page_writeback call in > places like wait_for_stable_page and __filemap_fdatawait_range. I can't see wait_for_stable_page() being very relevant. That only blocks on backing devices which have requested stable pages. raid5 sometimes does that. Some scsi/sata devices can somehow. And rbd (part of ceph) sometimes does. I don't think NFS ever will. wait_for_stable_page() doesn't currently return an error, so getting to abort in SIGKILL would be a lot of work. filemap_fdatawait_range() is much easier. diff --git a/mm/filemap.c b/mm/filemap.c index b772a33ef640..2773f6dde1da 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -401,7 +401,9 @@ static int __filemap_fdatawait_range(struct address_space *mapping, if (page->index > end) continue; - wait_on_page_writeback(page); + if (PageWriteback(page)) + if (wait_on_page_bit_killable(page, PG_writeback)) + err = -ERESTARTSYS; if (TestClearPageError(page)) ret = -EIO; } That isn't a complete solution. There is code in f2fs which doesn't check the return value and probably should. And gfs2 calls mapping_set_error(mapping, error); with the return value, with we probably don't want in the ERESTARTSYS case. There are some usages in btrfs that I'd need to double-check too. But it looks to be manageable. Thanks, NeilBrown > > That uses an uninterruptible sleep and it's common to see applications > stuck there in these situations. They're unkillable too so your only > recourse is to hard reset the box when you can't reestablish > connectivity. > > I think it might be good to consider making some of those sleeps > TASK_KILLABLE. For instance, both of the above callers of those > functions are int return functions. It may be possible to return > ERESTARTSYS when the task catches a signal. > > -- > Jeff Layton <jlayton@poochiereds.net> [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 832 bytes --]
next prev parent reply other threads:[~2017-01-24 21:58 UTC|newest] Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-01-10 16:02 [LSF/MM TOPIC] I/O error handling and fsync() Kevin Wolf 2017-01-11 0:41 ` NeilBrown 2017-01-13 11:09 ` Kevin Wolf 2017-01-13 14:21 ` Theodore Ts'o 2017-01-13 16:00 ` Kevin Wolf 2017-01-13 22:28 ` NeilBrown 2017-01-14 6:18 ` Darrick J. Wong 2017-01-16 12:14 ` [Lsf-pc] " Jeff Layton 2017-01-22 22:44 ` NeilBrown 2017-01-22 23:31 ` Jeff Layton 2017-01-23 0:21 ` Theodore Ts'o 2017-01-23 10:09 ` Kevin Wolf 2017-01-23 12:10 ` Jeff Layton 2017-01-23 12:10 ` Jeff Layton 2017-01-23 17:25 ` Theodore Ts'o 2017-01-23 17:53 ` Chuck Lever 2017-01-23 22:40 ` Jeff Layton 2017-01-23 22:40 ` Jeff Layton 2017-01-23 22:35 ` Jeff Layton 2017-01-23 22:35 ` Jeff Layton 2017-01-23 23:09 ` Trond Myklebust 2017-01-24 0:16 ` NeilBrown 2017-01-24 0:16 ` NeilBrown 2017-01-24 0:46 ` Jeff Layton 2017-01-24 0:46 ` Jeff Layton 2017-01-24 21:58 ` NeilBrown [this message] 2017-01-24 21:58 ` NeilBrown 2017-01-25 13:00 ` Jeff Layton 2017-01-25 13:00 ` Jeff Layton 2017-01-30 5:30 ` NeilBrown 2017-01-30 5:30 ` NeilBrown 2017-01-24 3:34 ` Trond Myklebust 2017-01-25 18:35 ` Theodore Ts'o 2017-01-26 0:36 ` NeilBrown 2017-01-26 0:36 ` NeilBrown 2017-01-26 9:25 ` Jan Kara 2017-01-26 22:19 ` NeilBrown 2017-01-26 22:19 ` NeilBrown 2017-01-27 3:23 ` Theodore Ts'o 2017-01-27 6:03 ` NeilBrown 2017-01-27 6:03 ` NeilBrown 2017-01-30 16:04 ` Jan Kara 2017-01-13 18:40 ` Al Viro 2017-01-13 19:06 ` Kevin Wolf 2017-01-11 5:03 ` Theodore Ts'o 2017-01-11 9:47 ` [Lsf-pc] " Jan Kara 2017-01-11 15:45 ` Theodore Ts'o 2017-01-11 10:55 ` Chris Vest 2017-01-11 11:40 ` Kevin Wolf 2017-01-13 4:51 ` NeilBrown 2017-01-13 11:51 ` Kevin Wolf 2017-01-13 21:55 ` NeilBrown 2017-01-11 12:14 ` Chris Vest
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=87y3y0glx7.fsf@notabene.neil.brown.name \ --to=neilb@suse.com \ --cc=hch@infradead.org \ --cc=jlayton@poochiereds.net \ --cc=kwolf@redhat.com \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=lsf-pc@lists.linux-foundation.org \ --cc=riel@redhat.com \ --cc=rwheeler@redhat.com \ --cc=trondmy@primarydata.com \ --cc=tytso@mit.edu \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.