* [PATCH] xfs: fix iclog release error check race with shutdown
From: Brian Foster @ 2020-02-14 18:15 UTC (permalink / raw)
To: linux-xfs; +Cc: Zorro Lang
Prior to commit df732b29c8 ("xfs: call xlog_state_release_iclog with
l_icloglock held"), xlog_state_release_iclog() always performed a
locked check of the iclog error state before proceeding into the
sync state processing code. As of this commit, part of
xlog_state_release_iclog() was open-coded into
xfs_log_release_iclog() and as a result the locked error state check
was lost.
The lockless check still exists, but this doesn't account for the
possibility of a race with a shutdown being performed by another
task causing the iclog state to change while the original task waits
on ->l_icloglock. This has reproduced very rarely via generic/475
and manifests as an assert failure in __xlog_state_release_iclog()
due to an unexpected iclog state.
Restore the locked error state check in xlog_state_release_iclog()
to ensure that an iclog state update via shutdown doesn't race with
the iclog release state processing code.
Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
fs/xfs/xfs_log.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index f6006d94a581..f38fc492a14d 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -611,6 +611,10 @@ xfs_log_release_iclog(
 	}
 
 	if (atomic_dec_and_lock(&iclog->ic_refcnt, &log->l_icloglock)) {
+		if (iclog->ic_state == XLOG_STATE_IOERROR) {
+			spin_unlock(&log->l_icloglock);
+			return -EIO;
+		}
 		sync = __xlog_state_release_iclog(log, iclog);
 		spin_unlock(&log->l_icloglock);
 		if (sync)
--
2.21.1
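
The race described in the commit message can be modeled in userspace. What
follows is only a minimal sketch under stated assumptions, not the kernel
code: model_iclog, model_shutdown() and model_release() are hypothetical
stand-ins, the spinlock is reduced to a flag, and the "racer" callback
stands in for another task shutting the log down while the caller waits on
->l_icloglock. MODEL_BAD_STATE represents the point at which the state
processing code would see an unexpected iclog state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are the real XFS types. */
enum model_state { MODEL_ACTIVE, MODEL_IOERROR };
enum model_result { MODEL_OK, MODEL_EIO, MODEL_BAD_STATE };

struct model_iclog {
	enum model_state state;	/* models iclog->ic_state */
	int refcnt;		/* models iclog->ic_refcnt */
	bool lock_held;		/* models log->l_icloglock */
};

/* Models a shutdown by another task: it flips the iclog to the error state. */
static void model_shutdown(struct model_iclog *iclog)
{
	iclog->state = MODEL_IOERROR;
}

/*
 * Models the release path. 'racer', if non-NULL, runs in the window between
 * the lockless check and lock acquisition. 'locked_recheck' selects whether
 * the state is tested again once the lock is held.
 */
static enum model_result model_release(struct model_iclog *iclog,
				       void (*racer)(struct model_iclog *),
				       bool locked_recheck)
{
	enum model_result ret = MODEL_OK;

	/* Lockless check: may be stale by the time the lock is held. */
	if (iclog->state == MODEL_IOERROR)
		return MODEL_EIO;

	if (racer)
		racer(iclog);

	/* Like atomic_dec_and_lock(): lock only when dropping the last ref. */
	if (--iclog->refcnt != 0)
		return MODEL_OK;
	assert(!iclog->lock_held);
	iclog->lock_held = true;

	/* Without the locked recheck, state processing sees a bad state. */
	if (iclog->state == MODEL_IOERROR)
		ret = locked_recheck ? MODEL_EIO : MODEL_BAD_STATE;

	iclog->lock_held = false;
	return ret;
}
```

Calling model_release() with model_shutdown as the racer and locked_recheck
set to false reaches the bad-state path; with locked_recheck set to true the
same sequence returns the error result instead.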
* Re: [PATCH] xfs: fix iclog release error check race with shutdown
From: Eric Sandeen @ 2020-02-14 19:38 UTC (permalink / raw)
To: Brian Foster, linux-xfs; +Cc: Zorro Lang
On 2/14/20 12:15 PM, Brian Foster wrote:
> Prior to commit df732b29c8 ("xfs: call xlog_state_release_iclog with
> l_icloglock held"), xlog_state_release_iclog() always performed a
> locked check of the iclog error state before proceeding into the
> sync state processing code. As of this commit, part of
> xlog_state_release_iclog() was open-coded into
> xfs_log_release_iclog() and as a result the locked error state check
> was lost.
>
> The lockless check still exists, but this doesn't account for the
> possibility of a race with a shutdown being performed by another
> task causing the iclog state to change while the original task waits
> on ->l_icloglock. This has reproduced very rarely via generic/475
> and manifests as an assert failure in __xlog_state_release_iclog()
> due to an unexpected iclog state.
>
> Restore the locked error state check in xlog_state_release_iclog()
> to ensure that an iclog state update via shutdown doesn't race with
> the iclog release state processing code.
>
> Reported-by: Zorro Lang <zlang@redhat.com>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
On vacation* today so not thinking hard about reviews but if this goes in,
Darrick can you please add a:
Fixes: df732b29c8 ("xfs: call xlog_state_release_iclog with l_icloglock held")
Thanks,
-Eric
* Re: [PATCH] xfs: fix iclog release error check race with shutdown
From: Christoph Hellwig @ 2020-02-17 13:33 UTC (permalink / raw)
To: Brian Foster; +Cc: linux-xfs, Zorro Lang
On Fri, Feb 14, 2020 at 01:15:28PM -0500, Brian Foster wrote:
> Prior to commit df732b29c8 ("xfs: call xlog_state_release_iclog with
> l_icloglock held"), xlog_state_release_iclog() always performed a
> locked check of the iclog error state before proceeding into the
> sync state processing code. As of this commit, part of
> xlog_state_release_iclog() was open-coded into
> xfs_log_release_iclog() and as a result the locked error state check
> was lost.
>
> The lockless check still exists, but this doesn't account for the
> possibility of a race with a shutdown being performed by another
> task causing the iclog state to change while the original task waits
> on ->l_icloglock. This has reproduced very rarely via generic/475
> and manifests as an assert failure in __xlog_state_release_iclog()
> due to an unexpected iclog state.
>
> Restore the locked error state check in xlog_state_release_iclog()
> to ensure that an iclog state update via shutdown doesn't race with
> the iclog release state processing code.
>
> Reported-by: Zorro Lang <zlang@redhat.com>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> fs/xfs/xfs_log.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> index f6006d94a581..f38fc492a14d 100644
> --- a/fs/xfs/xfs_log.c
> +++ b/fs/xfs/xfs_log.c
> @@ -611,6 +611,10 @@ xfs_log_release_iclog(
>  	}
>
>  	if (atomic_dec_and_lock(&iclog->ic_refcnt, &log->l_icloglock)) {
> +		if (iclog->ic_state == XLOG_STATE_IOERROR) {
> +			spin_unlock(&log->l_icloglock);
> +			return -EIO;
> +		}
So the check just above also shuts the file system down. Any reason to
do that in one case and not the other?
* Re: [PATCH] xfs: fix iclog release error check race with shutdown
From: Brian Foster @ 2020-02-17 15:29 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-xfs, Zorro Lang
On Mon, Feb 17, 2020 at 05:33:14AM -0800, Christoph Hellwig wrote:
> On Fri, Feb 14, 2020 at 01:15:28PM -0500, Brian Foster wrote:
> > Prior to commit df732b29c8 ("xfs: call xlog_state_release_iclog with
> > l_icloglock held"), xlog_state_release_iclog() always performed a
> > locked check of the iclog error state before proceeding into the
> > sync state processing code. As of this commit, part of
> > xlog_state_release_iclog() was open-coded into
> > xfs_log_release_iclog() and as a result the locked error state check
> > was lost.
> >
> > The lockless check still exists, but this doesn't account for the
> > possibility of a race with a shutdown being performed by another
> > task causing the iclog state to change while the original task waits
> > on ->l_icloglock. This has reproduced very rarely via generic/475
> > and manifests as an assert failure in __xlog_state_release_iclog()
> > due to an unexpected iclog state.
> >
> > Restore the locked error state check in xlog_state_release_iclog()
> > to ensure that an iclog state update via shutdown doesn't race with
> > the iclog release state processing code.
> >
> > Reported-by: Zorro Lang <zlang@redhat.com>
> > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > ---
> > fs/xfs/xfs_log.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > index f6006d94a581..f38fc492a14d 100644
> > --- a/fs/xfs/xfs_log.c
> > +++ b/fs/xfs/xfs_log.c
> > @@ -611,6 +611,10 @@ xfs_log_release_iclog(
> >  	}
> >
> >  	if (atomic_dec_and_lock(&iclog->ic_refcnt, &log->l_icloglock)) {
> > +		if (iclog->ic_state == XLOG_STATE_IOERROR) {
> > +			spin_unlock(&log->l_icloglock);
> > +			return -EIO;
> > +		}
>
> So the check just above also shuts the file system down. Any reason to
> do that in one case and not the other?
>
The initial check (with the shutdown) was originally associated with the
return from xlog_state_release_iclog(). That covers both state checks,
as they were both originally within that function. My impression was
there isn't a need to shutdown in the second check because the only way
the iclog state changes to IOERROR across that lock cycle is due to a
shutdown already in progress.
Brian
* Re: [PATCH] xfs: fix iclog release error check race with shutdown
From: Christoph Hellwig @ 2020-02-18 15:53 UTC (permalink / raw)
To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, Zorro Lang
On Mon, Feb 17, 2020 at 10:29:15AM -0500, Brian Foster wrote:
> On Mon, Feb 17, 2020 at 05:33:14AM -0800, Christoph Hellwig wrote:
> > On Fri, Feb 14, 2020 at 01:15:28PM -0500, Brian Foster wrote:
> > > Prior to commit df732b29c8 ("xfs: call xlog_state_release_iclog with
> > > l_icloglock held"), xlog_state_release_iclog() always performed a
> > > locked check of the iclog error state before proceeding into the
> > > sync state processing code. As of this commit, part of
> > > xlog_state_release_iclog() was open-coded into
> > > xfs_log_release_iclog() and as a result the locked error state check
> > > was lost.
> > >
> > > The lockless check still exists, but this doesn't account for the
> > > possibility of a race with a shutdown being performed by another
> > > task causing the iclog state to change while the original task waits
> > > on ->l_icloglock. This has reproduced very rarely via generic/475
> > > and manifests as an assert failure in __xlog_state_release_iclog()
> > > due to an unexpected iclog state.
> > >
> > > Restore the locked error state check in xlog_state_release_iclog()
> > > to ensure that an iclog state update via shutdown doesn't race with
> > > the iclog release state processing code.
> > >
> > > Reported-by: Zorro Lang <zlang@redhat.com>
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > > fs/xfs/xfs_log.c | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > index f6006d94a581..f38fc492a14d 100644
> > > --- a/fs/xfs/xfs_log.c
> > > +++ b/fs/xfs/xfs_log.c
> > > @@ -611,6 +611,10 @@ xfs_log_release_iclog(
> > >  	}
> > >
> > >  	if (atomic_dec_and_lock(&iclog->ic_refcnt, &log->l_icloglock)) {
> > > +		if (iclog->ic_state == XLOG_STATE_IOERROR) {
> > > +			spin_unlock(&log->l_icloglock);
> > > +			return -EIO;
> > > +		}
> >
> > So the check just above also shuts the file system down. Any reason to
> > do that in one case and not the other?
> >
>
> The initial check (with the shutdown) was originally associated with the
> return from xlog_state_release_iclog(). That covers both state checks,
> as they were both originally within that function. My impression was
> there isn't a need to shutdown in the second check because the only way
> the iclog state changes to IOERROR across that lock cycle is due to a
> shutdown already in progress.
The original code did the force shutdown for both cases. So unless we
have a good reason to do it differently I'd just add a goto label and
merge the two cases to restore the old behavior.
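
The "goto label" suggestion above can be sketched in userspace as follows.
This is only an illustration under stated assumptions, not the v2 patch:
merged_iclog, merged_force_shutdown() and merged_release() are hypothetical
names, the spinlock is reduced to a flag, and the 'race' argument stands in
for another task shutting the log down before the lock is taken. Both the
lockless and the locked error checks jump to one label that performs the
forced shutdown and returns -EIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are the real XFS types. */
enum merged_state { MERGED_ACTIVE, MERGED_IOERROR };

struct merged_iclog {
	enum merged_state state;	/* models iclog->ic_state */
	int refcnt;			/* models iclog->ic_refcnt */
	bool lock_held;			/* models log->l_icloglock */
	int shutdown_calls;		/* counts modeled forced shutdowns */
};

/* Models a forced shutdown; here it only records that it was requested. */
static void merged_force_shutdown(struct merged_iclog *iclog)
{
	iclog->shutdown_calls++;
	iclog->state = MERGED_IOERROR;
}

static int merged_release(struct merged_iclog *iclog, bool race)
{
	/* Lockless check. */
	if (iclog->state == MERGED_IOERROR)
		goto error;

	/* Another task shuts the log down while we wait for the lock. */
	if (race)
		iclog->state = MERGED_IOERROR;

	/* Like atomic_dec_and_lock(): lock only when dropping the last ref. */
	if (--iclog->refcnt == 0) {
		assert(!iclog->lock_held);
		iclog->lock_held = true;

		/* Locked check: shares the error path with the lockless one. */
		if (iclog->state == MERGED_IOERROR) {
			iclog->lock_held = false;
			goto error;
		}

		/* State processing would happen here in the real code. */
		iclog->lock_held = false;
	}
	return 0;

error:
	merged_force_shutdown(iclog);
	return -EIO;
}
```

In this model the forced shutdown is requested exactly once per failing
call, whichever of the two checks detects the error state.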
* Re: [PATCH] xfs: fix iclog release error check race with shutdown
From: Brian Foster @ 2020-02-18 17:47 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: linux-xfs, Zorro Lang
On Tue, Feb 18, 2020 at 07:53:13AM -0800, Christoph Hellwig wrote:
> On Mon, Feb 17, 2020 at 10:29:15AM -0500, Brian Foster wrote:
> > On Mon, Feb 17, 2020 at 05:33:14AM -0800, Christoph Hellwig wrote:
> > > On Fri, Feb 14, 2020 at 01:15:28PM -0500, Brian Foster wrote:
> > > > Prior to commit df732b29c8 ("xfs: call xlog_state_release_iclog with
> > > > l_icloglock held"), xlog_state_release_iclog() always performed a
> > > > locked check of the iclog error state before proceeding into the
> > > > sync state processing code. As of this commit, part of
> > > > xlog_state_release_iclog() was open-coded into
> > > > xfs_log_release_iclog() and as a result the locked error state check
> > > > was lost.
> > > >
> > > > The lockless check still exists, but this doesn't account for the
> > > > possibility of a race with a shutdown being performed by another
> > > > task causing the iclog state to change while the original task waits
> > > > on ->l_icloglock. This has reproduced very rarely via generic/475
> > > > and manifests as an assert failure in __xlog_state_release_iclog()
> > > > due to an unexpected iclog state.
> > > >
> > > > Restore the locked error state check in xlog_state_release_iclog()
> > > > to ensure that an iclog state update via shutdown doesn't race with
> > > > the iclog release state processing code.
> > > >
> > > > Reported-by: Zorro Lang <zlang@redhat.com>
> > > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > > ---
> > > > fs/xfs/xfs_log.c | 4 ++++
> > > > 1 file changed, 4 insertions(+)
> > > >
> > > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > > index f6006d94a581..f38fc492a14d 100644
> > > > --- a/fs/xfs/xfs_log.c
> > > > +++ b/fs/xfs/xfs_log.c
> > > > @@ -611,6 +611,10 @@ xfs_log_release_iclog(
> > > >  	}
> > > >
> > > >  	if (atomic_dec_and_lock(&iclog->ic_refcnt, &log->l_icloglock)) {
> > > > +		if (iclog->ic_state == XLOG_STATE_IOERROR) {
> > > > +			spin_unlock(&log->l_icloglock);
> > > > +			return -EIO;
> > > > +		}
> > >
> > > So the check just above also shuts the file system down. Any reason to
> > > do that in one case and not the other?
> > >
> >
> > The initial check (with the shutdown) was originally associated with the
> > return from xlog_state_release_iclog(). That covers both state checks,
> > as they were both originally within that function. My impression was
> > there isn't a need to shutdown in the second check because the only way
> > the iclog state changes to IOERROR across that lock cycle is due to a
> > shutdown already in progress.
>
> The original code did the force shutdown for both cases. So unless we
> have a good reason to do it differently I'd just add a goto label and
> merge the two cases to restore the old behavior.
>
Ok. I'm not sure I see the point, but it's harmless and I can make
Eric's fix as well, so I'll post a v2.
Brian