Linux-XFS Archive on lore.kernel.org
* [PATCH] xfs: fix iclog release error check race with shutdown
From: Brian Foster @ 2020-02-14 18:15 UTC
  To: linux-xfs; +Cc: Zorro Lang

Prior to commit df732b29c8 ("xfs: call xlog_state_release_iclog with
l_icloglock held"), xlog_state_release_iclog() always performed a
locked check of the iclog error state before proceeding into the
sync state processing code. As of this commit, part of
xlog_state_release_iclog() was open-coded into
xfs_log_release_iclog() and as a result the locked error state check
was lost.

The lockless check still exists, but this doesn't account for the
possibility of a race with a shutdown being performed by another
task causing the iclog state to change while the original task waits
on ->l_icloglock. This has reproduced very rarely via generic/475
and manifests as an assert failure in __xlog_state_release_iclog()
due to an unexpected iclog state.

Restore the locked error state check in xfs_log_release_iclog()
to ensure that an iclog state update via shutdown doesn't race with
the iclog release state processing code.

Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_log.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index f6006d94a581..f38fc492a14d 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -611,6 +611,10 @@ xfs_log_release_iclog(
 	}
 
 	if (atomic_dec_and_lock(&iclog->ic_refcnt, &log->l_icloglock)) {
+		if (iclog->ic_state == XLOG_STATE_IOERROR) {
+			spin_unlock(&log->l_icloglock);
+			return -EIO;
+		}
 		sync = __xlog_state_release_iclog(log, iclog);
 		spin_unlock(&log->l_icloglock);
 		if (sync)
-- 
2.21.1
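
The race described in the commit message can be replayed deterministically in a small single-threaded userspace model. This is only a sketch under stated assumptions: every name below (model_iclog, model_log, model_release, the race hook) is hypothetical and stands in for the kernel structures, the lock is a plain flag rather than a spinlock, and a return value of 1 is used to mark the point where the kernel code would trip its assert.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real XFS definitions. */
enum model_state { MODEL_ACTIVE, MODEL_IOERROR };

struct model_iclog {
	int refcnt;
	enum model_state state;
};

struct model_log {
	bool icloglock_held;	/* models ->l_icloglock in a single-threaded replay */
};

static void model_lock(struct model_log *log)
{
	assert(!log->icloglock_held);
	log->icloglock_held = true;
}

static void model_unlock(struct model_log *log)
{
	assert(log->icloglock_held);
	log->icloglock_held = false;
}

/* Models another task's shutdown: flips the iclog state under the lock. */
static void model_shutdown(struct model_log *log, struct model_iclog *iclog)
{
	model_lock(log);
	iclog->state = MODEL_IOERROR;
	model_unlock(log);
}

/* Models atomic_dec_and_lock(): true, with the lock held, only at refcount zero. */
static bool model_dec_and_lock(struct model_iclog *iclog, struct model_log *log)
{
	if (--iclog->refcnt > 0)
		return false;
	model_lock(log);
	return true;
}

typedef void (*race_hook)(struct model_log *, struct model_iclog *);

/*
 * Returns -EIO when an error state is caught, 0 on a normal release, and
 * 1 when release processing is reached with an IOERROR iclog (the case
 * that shows up as an assert failure in the report).
 */
static int model_release(struct model_log *log, struct model_iclog *iclog,
			 race_hook hook, bool locked_check)
{
	int ret;

	if (iclog->state == MODEL_IOERROR)	/* lockless check */
		return -EIO;

	if (hook)
		hook(log, iclog);	/* window while this task waits on the lock */

	if (model_dec_and_lock(iclog, log)) {
		if (locked_check && iclog->state == MODEL_IOERROR) {
			model_unlock(log);
			return -EIO;
		}
		ret = (iclog->state == MODEL_IOERROR) ? 1 : 0;
		model_unlock(log);
		return ret;
	}
	return 0;
}
```

Calling model_release() with model_shutdown as the hook and locked_check set to false reaches release processing with an IOERROR iclog, while the same call with locked_check set to true returns -EIO, which is the behaviour the patch aims to restore.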


* Re: [PATCH] xfs: fix iclog release error check race with shutdown
From: Eric Sandeen @ 2020-02-14 19:38 UTC
  To: Brian Foster, linux-xfs; +Cc: Zorro Lang

On 2/14/20 12:15 PM, Brian Foster wrote:
> Prior to commit df732b29c8 ("xfs: call xlog_state_release_iclog with
> l_icloglock held"), xlog_state_release_iclog() always performed a
> locked check of the iclog error state before proceeding into the
> sync state processing code. As of this commit, part of
> xlog_state_release_iclog() was open-coded into
> xfs_log_release_iclog() and as a result the locked error state check
> was lost.
> 
> The lockless check still exists, but this doesn't account for the
> possibility of a race with a shutdown being performed by another
> task causing the iclog state to change while the original task waits
> on ->l_icloglock. This has reproduced very rarely via generic/475
> and manifests as an assert failure in __xlog_state_release_iclog()
> due to an unexpected iclog state.
> 
> Restore the locked error state check in xfs_log_release_iclog()
> to ensure that an iclog state update via shutdown doesn't race with
> the iclog release state processing code.
> 
> Reported-by: Zorro Lang <zlang@redhat.com>
> Signed-off-by: Brian Foster <bfoster@redhat.com>

On vacation* today so not thinking hard about reviews but if this goes in,
Darrick can you please add a:

Fixes: df732b29c8 ("xfs: call xlog_state_release_iclog with l_icloglock held")

Thanks,
-Eric

* Re: [PATCH] xfs: fix iclog release error check race with shutdown
From: Christoph Hellwig @ 2020-02-17 13:33 UTC
  To: Brian Foster; +Cc: linux-xfs, Zorro Lang

On Fri, Feb 14, 2020 at 01:15:28PM -0500, Brian Foster wrote:
>  	if (atomic_dec_and_lock(&iclog->ic_refcnt, &log->l_icloglock)) {
> +		if (iclog->ic_state == XLOG_STATE_IOERROR) {
> +			spin_unlock(&log->l_icloglock);
> +			return -EIO;
> +		}

So the check just above also shuts the file system down.  Any reason to
do that in one case and not the other?

* Re: [PATCH] xfs: fix iclog release error check race with shutdown
From: Brian Foster @ 2020-02-17 15:29 UTC
  To: Christoph Hellwig; +Cc: linux-xfs, Zorro Lang

On Mon, Feb 17, 2020 at 05:33:14AM -0800, Christoph Hellwig wrote:
> On Fri, Feb 14, 2020 at 01:15:28PM -0500, Brian Foster wrote:
> >  	if (atomic_dec_and_lock(&iclog->ic_refcnt, &log->l_icloglock)) {
> > +		if (iclog->ic_state == XLOG_STATE_IOERROR) {
> > +			spin_unlock(&log->l_icloglock);
> > +			return -EIO;
> > +		}
> 
> So the check just above also shuts the file system down.  Any reason to
> do that in one case and not the other?
> 

The initial check (with the shutdown) was originally associated with the
return from xlog_state_release_iclog(). That covers both state checks,
as they were both originally within that function. My impression was
there isn't a need to shut down in the second check because the only way
the iclog state changes to IOERROR across that lock cycle is due to a
shutdown already in progress.

Brian


* Re: [PATCH] xfs: fix iclog release error check race with shutdown
From: Christoph Hellwig @ 2020-02-18 15:53 UTC
  To: Brian Foster; +Cc: Christoph Hellwig, linux-xfs, Zorro Lang

On Mon, Feb 17, 2020 at 10:29:15AM -0500, Brian Foster wrote:
> On Mon, Feb 17, 2020 at 05:33:14AM -0800, Christoph Hellwig wrote:
> > On Fri, Feb 14, 2020 at 01:15:28PM -0500, Brian Foster wrote:
> > >  	if (atomic_dec_and_lock(&iclog->ic_refcnt, &log->l_icloglock)) {
> > > +		if (iclog->ic_state == XLOG_STATE_IOERROR) {
> > > +			spin_unlock(&log->l_icloglock);
> > > +			return -EIO;
> > > +		}
> > 
> > So the check just above also shuts the file system down.  Any reason to
> > do that in one case and not the other?
> > 
> 
> The initial check (with the shutdown) was originally associated with the
> return from xlog_state_release_iclog(). That covers both state checks,
> as they were both originally within that function. My impression was
> there isn't a need to shutdown in the second check because the only way
> the iclog state changes to IOERROR across that lock cycle is due to a
> shutdown already in progress.

The original code did the force shutdown for both cases.  So unless we
have a good reason to do it differently I'd just add a goto label and
merge the two cases to restore the old behavior.
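
One way the suggested goto-based merge could be shaped is sketched below as a single-threaded userspace model. It is not the v2 patch: all names (sketch_log, sketch_force_shutdown, sketch_release) are hypothetical, the lock is a plain flag, and the race parameter simply stands in for another task marking the iclog IOERROR between the lockless check and the lock acquisition.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of these are the real XFS definitions. */
enum sketch_state { SKETCH_ACTIVE, SKETCH_IOERROR };

struct sketch_log {
	enum sketch_state iclog_state;
	int iclog_refcnt;
	bool lock_held;		/* models ->l_icloglock */
	int shutdown_calls;	/* how often the error path forced a shutdown */
};

/* Stand-in for the forced shutdown; here it only counts invocations. */
static void sketch_force_shutdown(struct sketch_log *log)
{
	log->shutdown_calls++;
}

/*
 * race == true models another task marking the iclog IOERROR after the
 * lockless check but before this task obtains the lock.
 */
static int sketch_release(struct sketch_log *log, bool race)
{
	if (log->iclog_state == SKETCH_IOERROR)		/* lockless check */
		goto error;

	if (race)
		log->iclog_state = SKETCH_IOERROR;

	if (--log->iclog_refcnt == 0) {			/* models atomic_dec_and_lock() */
		log->lock_held = true;
		if (log->iclog_state == SKETCH_IOERROR) {	/* locked recheck */
			log->lock_held = false;
			goto error;
		}
		/* normal release/sync processing would run here */
		log->lock_held = false;
	}
	return 0;

error:
	/* Both checks share one exit, so both force a shutdown. */
	sketch_force_shutdown(log);
	return -EIO;
}
```

With a shared error label, the lock is always dropped before the shutdown stand-in runs, and the early and raced error cases behave identically from the caller's point of view.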

* Re: [PATCH] xfs: fix iclog release error check race with shutdown
From: Brian Foster @ 2020-02-18 17:47 UTC
  To: Christoph Hellwig; +Cc: linux-xfs, Zorro Lang

On Tue, Feb 18, 2020 at 07:53:13AM -0800, Christoph Hellwig wrote:
> On Mon, Feb 17, 2020 at 10:29:15AM -0500, Brian Foster wrote:
> > On Mon, Feb 17, 2020 at 05:33:14AM -0800, Christoph Hellwig wrote:
> > > On Fri, Feb 14, 2020 at 01:15:28PM -0500, Brian Foster wrote:
> > > >  	if (atomic_dec_and_lock(&iclog->ic_refcnt, &log->l_icloglock)) {
> > > > +		if (iclog->ic_state == XLOG_STATE_IOERROR) {
> > > > +			spin_unlock(&log->l_icloglock);
> > > > +			return -EIO;
> > > > +		}
> > > 
> > > So the check just above also shuts the file system down.  Any reason to
> > > do that in one case and not the other?
> > > 
> > 
> > The initial check (with the shutdown) was originally associated with the
> > return from xlog_state_release_iclog(). That covers both state checks,
> > as they were both originally within that function. My impression was
> > there isn't a need to shutdown in the second check because the only way
> > the iclog state changes to IOERROR across that lock cycle is due to a
> > shutdown already in progress.
> 
> The original code did the force shutdown for both cases.  So unless we
> have a good reason to do it differently I'd just add a goto label and
> merge the two cases to restore the old behavior.
> 

Ok. I'm not sure I see the point, but it's harmless and I can add
Eric's Fixes: tag as well, so I'll post a v2.

Brian

