All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] xfs: handle inconsistent log item formatting state correctly
@ 2018-03-01 22:35 Dave Chinner
  2018-03-02 17:18 ` Darrick J. Wong
  2018-03-05 14:40 ` Brian Foster
  0 siblings, 2 replies; 16+ messages in thread
From: Dave Chinner @ 2018-03-01 22:35 UTC (permalink / raw)
  To: linux-xfs

From: Dave Chinner <dchinner@redhat.com>

Brian Foster reported an oops like:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: xlog_cil_push+0x184/0x400 [xfs]
....
Workqueue: xfs-cil/dm-3 xlog_cil_push_work [xfs]
RIP: 0010:xlog_cil_push+0x184/0x400 [xfs]
....

This was caused by the log item size calculation return that there
were no log vectors to write on a normal buffer. This should only
occur for ordered buffers - the exception handling had never been
executed for a non-ordered buffer. This mean we ended up with
a log item marked dirty (XFS_LID_DIRTY) but with no log vectors
attached that got added to the CIL. When the CIL was pushed, it
tripped over the null log vector on the dirty log item when trying
to chain the log vectors that needed to be written to the log.

Fix this by clearing the XFS_LID_DIRTY flag on the log item if
nothing gets formatted. This will prevent the log item from being
added incorrectly to the CIL, and hence avoid the crash

Reported-by: Brian Foster <bfoster@redhat.com>
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_cil.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index 61ab5c0a4c45..c1ecd7698100 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -345,9 +345,16 @@ xlog_cil_insert_format_items(
 		if (shadow->lv_buf_len == XFS_LOG_VEC_ORDERED)
 			ordered = true;
 
-		/* Skip items that do not have any vectors for writing */
-		if (!shadow->lv_niovecs && !ordered)
+		/*
+		 * Skip items that do not have any vectors for writing. These
+		 * log items must now be considered clean in this transaction,
+		 * so clear the XFS_LID_DIRTY flag to ensure it doesn't get
+		 * added to the CIL by mistake.
+		 */
+		if (!shadow->lv_niovecs && !ordered) {
+			lidp->lid_flags &= ~XFS_LID_DIRTY;
 			continue;
+		}
 
 		/* compare to existing item size */
 		old_lv = lip->li_lv;
@@ -486,6 +493,14 @@ xlog_cil_insert_items(
 		if (!(lidp->lid_flags & XFS_LID_DIRTY))
 			continue;
 
+		/*
+		 * Log items added to the CIL must have a formatted log vector
+		 * attached to them. This is a requirement of the log vector
+		 * chaining used separate the log items from the changes that
+		 * need to be written to the log in xlog_cil_push().
+		 */
+		ASSERT(lip->li_lv);
+
 		/*
 		 * Only move the item if it isn't already at the tail. This is
 		 * to prevent a transient list_empty() state when reinserting
@@ -655,6 +670,7 @@ xlog_cil_push(
 	struct xfs_log_vec	lvhdr = { NULL };
 	xfs_lsn_t		commit_lsn;
 	xfs_lsn_t		push_seq;
+	struct xfs_log_item	*item;
 
 	if (!cil)
 		return 0;
@@ -722,11 +738,8 @@ xlog_cil_push(
 	 */
 	lv = NULL;
 	num_iovecs = 0;
-	while (!list_empty(&cil->xc_cil)) {
-		struct xfs_log_item	*item;
-
-		item = list_first_entry(&cil->xc_cil,
-					struct xfs_log_item, li_cil);
+	while ((item = list_first_entry_or_null(&cil->xc_cil,
+					struct xfs_log_item, li_cil))) {
 		list_del_init(&item->li_cil);
 		if (!ctx->lv_chain)
 			ctx->lv_chain = item->li_lv;
-- 
2.16.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: handle inconsistent log item formatting state correctly
  2018-03-01 22:35 [PATCH] xfs: handle inconsistent log item formatting state correctly Dave Chinner
@ 2018-03-02 17:18 ` Darrick J. Wong
  2018-03-02 21:52   ` Dave Chinner
  2018-03-05 14:40 ` Brian Foster
  1 sibling, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2018-03-02 17:18 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Mar 02, 2018 at 09:35:27AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Brian Foster reported an oops like:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: xlog_cil_push+0x184/0x400 [xfs]
> ....
> Workqueue: xfs-cil/dm-3 xlog_cil_push_work [xfs]
> RIP: 0010:xlog_cil_push+0x184/0x400 [xfs]
> ....
> 
> This was caused by the log item size calculation return that there
> were no log vectors to write on a normal buffer. This should only
> occur for ordered buffers - the exception handling had never been
> executed for a non-ordered buffer. This mean we ended up with
> a log item marked dirty (XFS_LID_DIRTY) but with no log vectors
> attached that got added to the CIL. When the CIL was pushed, it
> tripped over the null log vector on the dirty log item when trying
> to chain the log vectors that needed to be written to the log.
> 
> Fix this by clearing the XFS_LID_DIRTY flag on the log item if
> nothing gets formatted. This will prevent the log item from being
> added incorrectly to the CIL, and hence avoid the crash

Is there a test case for this, or just intermittent crashes?

> Reported-by: Brian Foster <bfoster@redhat.com>
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log_cil.c | 27 ++++++++++++++++++++-------
>  1 file changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 61ab5c0a4c45..c1ecd7698100 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -345,9 +345,16 @@ xlog_cil_insert_format_items(
>  		if (shadow->lv_buf_len == XFS_LOG_VEC_ORDERED)
>  			ordered = true;
>  
> -		/* Skip items that do not have any vectors for writing */
> -		if (!shadow->lv_niovecs && !ordered)
> +		/*
> +		 * Skip items that do not have any vectors for writing. These
> +		 * log items must now be considered clean in this transaction,
> +		 * so clear the XFS_LID_DIRTY flag to ensure it doesn't get
> +		 * added to the CIL by mistake.
> +		 */
> +		if (!shadow->lv_niovecs && !ordered) {
> +			lidp->lid_flags &= ~XFS_LID_DIRTY;
>  			continue;
> +		}
>  
>  		/* compare to existing item size */
>  		old_lv = lip->li_lv;
> @@ -486,6 +493,14 @@ xlog_cil_insert_items(
>  		if (!(lidp->lid_flags & XFS_LID_DIRTY))
>  			continue;
>  
> +		/*
> +		 * Log items added to the CIL must have a formatted log vector
> +		 * attached to them. This is a requirement of the log vector
> +		 * chaining used separate the log items from the changes that
                         ^^^^^^^^^^^^^^^^^^
/me trips over this part of the comment, is this supposed to say
'...chaining using separate log items...' or '...used to separate the log
items...'?

Codewise I /think/ I understand what this is trying to do...

--D

> +		 * need to be written to the log in xlog_cil_push().
> +		 */
> +		ASSERT(lip->li_lv);
> +
>  		/*
>  		 * Only move the item if it isn't already at the tail. This is
>  		 * to prevent a transient list_empty() state when reinserting
> @@ -655,6 +670,7 @@ xlog_cil_push(
>  	struct xfs_log_vec	lvhdr = { NULL };
>  	xfs_lsn_t		commit_lsn;
>  	xfs_lsn_t		push_seq;
> +	struct xfs_log_item	*item;
>  
>  	if (!cil)
>  		return 0;
> @@ -722,11 +738,8 @@ xlog_cil_push(
>  	 */
>  	lv = NULL;
>  	num_iovecs = 0;
> -	while (!list_empty(&cil->xc_cil)) {
> -		struct xfs_log_item	*item;
> -
> -		item = list_first_entry(&cil->xc_cil,
> -					struct xfs_log_item, li_cil);
> +	while ((item = list_first_entry_or_null(&cil->xc_cil,
> +					struct xfs_log_item, li_cil))) {
>  		list_del_init(&item->li_cil);
>  		if (!ctx->lv_chain)
>  			ctx->lv_chain = item->li_lv;
> -- 
> 2.16.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: handle inconsistent log item formatting state correctly
  2018-03-02 17:18 ` Darrick J. Wong
@ 2018-03-02 21:52   ` Dave Chinner
  0 siblings, 0 replies; 16+ messages in thread
From: Dave Chinner @ 2018-03-02 21:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Fri, Mar 02, 2018 at 09:18:33AM -0800, Darrick J. Wong wrote:
> On Fri, Mar 02, 2018 at 09:35:27AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Brian Foster reported an oops like:
> > 
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > IP: xlog_cil_push+0x184/0x400 [xfs]
> > ....
> > Workqueue: xfs-cil/dm-3 xlog_cil_push_work [xfs]
> > RIP: 0010:xlog_cil_push+0x184/0x400 [xfs]
> > ....
> > 
> > This was caused by the log item size calculation return that there
> > were no log vectors to write on a normal buffer. This should only
> > occur for ordered buffers - the exception handling had never been
> > executed for a non-ordered buffer. This mean we ended up with
> > a log item marked dirty (XFS_LID_DIRTY) but with no log vectors
> > attached that got added to the CIL. When the CIL was pushed, it
> > tripped over the null log vector on the dirty log item when trying
> > to chain the log vectors that needed to be written to the log.
> > 
> > Fix this by clearing the XFS_LID_DIRTY flag on the log item if
> > nothing gets formatted. This will prevent the log item from being
> > added incorrectly to the CIL, and hence avoid the crash
> 
> Is there a test case for this, or just intermittent crashes?

Nope, we can't hit this case right now - it's something that was
only triggered by a bug in my ranged buffer logging patch. It's
clearly a bug, though, so it's a landmine we should remove.

> > Reported-by: Brian Foster <bfoster@redhat.com>
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > ---
> >  fs/xfs/xfs_log_cil.c | 27 ++++++++++++++++++++-------
> >  1 file changed, 20 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index 61ab5c0a4c45..c1ecd7698100 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -345,9 +345,16 @@ xlog_cil_insert_format_items(
> >  		if (shadow->lv_buf_len == XFS_LOG_VEC_ORDERED)
> >  			ordered = true;
> >  
> > -		/* Skip items that do not have any vectors for writing */
> > -		if (!shadow->lv_niovecs && !ordered)
> > +		/*
> > +		 * Skip items that do not have any vectors for writing. These
> > +		 * log items must now be considered clean in this transaction,
> > +		 * so clear the XFS_LID_DIRTY flag to ensure it doesn't get
> > +		 * added to the CIL by mistake.
> > +		 */
> > +		if (!shadow->lv_niovecs && !ordered) {
> > +			lidp->lid_flags &= ~XFS_LID_DIRTY;
> >  			continue;
> > +		}
> >  
> >  		/* compare to existing item size */
> >  		old_lv = lip->li_lv;
> > @@ -486,6 +493,14 @@ xlog_cil_insert_items(
> >  		if (!(lidp->lid_flags & XFS_LID_DIRTY))
> >  			continue;
> >  
> > +		/*
> > +		 * Log items added to the CIL must have a formatted log vector
> > +		 * attached to them. This is a requirement of the log vector
> > +		 * chaining used separate the log items from the changes that
>                          ^^^^^^^^^^^^^^^^^^
> /me trips over this part of the comment, is this supposed to say
> '...chaining using separate log items...' or '...used to separate the log
> items...'?

The latter.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: handle inconsistent log item formatting state correctly
  2018-03-01 22:35 [PATCH] xfs: handle inconsistent log item formatting state correctly Dave Chinner
  2018-03-02 17:18 ` Darrick J. Wong
@ 2018-03-05 14:40 ` Brian Foster
  2018-03-05 22:18   ` Dave Chinner
  1 sibling, 1 reply; 16+ messages in thread
From: Brian Foster @ 2018-03-05 14:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Fri, Mar 02, 2018 at 09:35:27AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Brian Foster reported an oops like:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> IP: xlog_cil_push+0x184/0x400 [xfs]
> ....
> Workqueue: xfs-cil/dm-3 xlog_cil_push_work [xfs]
> RIP: 0010:xlog_cil_push+0x184/0x400 [xfs]
> ....
> 
> This was caused by the log item size calculation return that there
> were no log vectors to write on a normal buffer. This should only
> occur for ordered buffers - the exception handling had never been
> executed for a non-ordered buffer. This mean we ended up with
> a log item marked dirty (XFS_LID_DIRTY) but with no log vectors
> attached that got added to the CIL. When the CIL was pushed, it
> tripped over the null log vector on the dirty log item when trying
> to chain the log vectors that needed to be written to the log.
> 
> Fix this by clearing the XFS_LID_DIRTY flag on the log item if
> nothing gets formatted. This will prevent the log item from being
> added incorrectly to the CIL, and hence avoid the crash
> 
> Reported-by: Brian Foster <bfoster@redhat.com>
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> ---

Refamiliarizing myself with the problem... we commit a dirty buffer
without any logged ranges, the commit time code manages to skip through
the formatting and whatnot, yet still inserts the item because it is
dirty. A push occurs sometime later and expects to find a log vector,
but runs into a NULL deref because the associated item wasn't prepared.

>  fs/xfs/xfs_log_cil.c | 27 ++++++++++++++++++++-------
>  1 file changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index 61ab5c0a4c45..c1ecd7698100 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -345,9 +345,16 @@ xlog_cil_insert_format_items(
>  		if (shadow->lv_buf_len == XFS_LOG_VEC_ORDERED)
>  			ordered = true;
>  
> -		/* Skip items that do not have any vectors for writing */
> -		if (!shadow->lv_niovecs && !ordered)
> +		/*
> +		 * Skip items that do not have any vectors for writing. These
> +		 * log items must now be considered clean in this transaction,
> +		 * so clear the XFS_LID_DIRTY flag to ensure it doesn't get
> +		 * added to the CIL by mistake.
> +		 */
> +		if (!shadow->lv_niovecs && !ordered) {
> +			lidp->lid_flags &= ~XFS_LID_DIRTY;
>  			continue;
> +		}

So to address the problem, we clear the dirty bit at format time and
thus prevent it from being added to the CIL. That looks effective to me
wrt to avoiding the crash, but I'm not sure it sufficiently addresses
the root problem of not logging a dirty item...

Hitting this state in this context means the transaction either set the
dirty state incorrectly or failed to log any data for an item that was
legitimately dirty (the latter being the variant we actually hit via the
buffer dirty range tracking patch). If we do nothing else here (but not
crash), what state have we left the filesystem in? We've presumably
modified the buffer, but will it ever be written back (unless some other
transaction modifies/commits it) if it doesn't make it to the AIL? ISTM
that this is a corruption vector with at least a couple different
interesting paths (e.g. unmount losing data or a crash -> log recovery
inconsistency).

Unless I'm mistaken in the above, I think we need to be a bit more
aggressive in handling this condition. We could obviously assert/warn,
but IMO a shutdown is appropriate given that we may have to assume that
clearing the dirty state made the fs inconsistent (and we already
shutdown for slightly more innocuous things, like tx overrun, in
comparison). The problem is that we need to make sure the current
transaction is not partially committed so we don't actually corrupt the
fs by shutting down, which leads to one last thought...

AFAICT the transaction commit path expects to handle items that are
either not dirty or otherwise must be logged. It looks like the earliest
we identify this problem is xlog_cil_alloc_shadow_bufs() after we've
called ->iop_size(). Any reason we couldn't handle this problem there?
There's no reason to allocate a vector we already know we aren't going
to use (and that is clearly based on invalid parameters), after all.

Given both of the above, a clean way to handle the error may be to allow
the lv allocation code to return an error up through xfs_trans_commit()
and let the caller abort the transaction, since we haven't committed any
part of it at that point. Thoughts?

Brian

>  
>  		/* compare to existing item size */
>  		old_lv = lip->li_lv;
> @@ -486,6 +493,14 @@ xlog_cil_insert_items(
>  		if (!(lidp->lid_flags & XFS_LID_DIRTY))
>  			continue;
>  
> +		/*
> +		 * Log items added to the CIL must have a formatted log vector
> +		 * attached to them. This is a requirement of the log vector
> +		 * chaining used separate the log items from the changes that
> +		 * need to be written to the log in xlog_cil_push().
> +		 */
> +		ASSERT(lip->li_lv);
> +
>  		/*
>  		 * Only move the item if it isn't already at the tail. This is
>  		 * to prevent a transient list_empty() state when reinserting
> @@ -655,6 +670,7 @@ xlog_cil_push(
>  	struct xfs_log_vec	lvhdr = { NULL };
>  	xfs_lsn_t		commit_lsn;
>  	xfs_lsn_t		push_seq;
> +	struct xfs_log_item	*item;
>  
>  	if (!cil)
>  		return 0;
> @@ -722,11 +738,8 @@ xlog_cil_push(
>  	 */
>  	lv = NULL;
>  	num_iovecs = 0;
> -	while (!list_empty(&cil->xc_cil)) {
> -		struct xfs_log_item	*item;
> -
> -		item = list_first_entry(&cil->xc_cil,
> -					struct xfs_log_item, li_cil);
> +	while ((item = list_first_entry_or_null(&cil->xc_cil,
> +					struct xfs_log_item, li_cil))) {
>  		list_del_init(&item->li_cil);
>  		if (!ctx->lv_chain)
>  			ctx->lv_chain = item->li_lv;
> -- 
> 2.16.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: handle inconsistent log item formatting state correctly
  2018-03-05 14:40 ` Brian Foster
@ 2018-03-05 22:18   ` Dave Chinner
  2018-03-06 12:31     ` Brian Foster
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2018-03-05 22:18 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Mon, Mar 05, 2018 at 09:40:01AM -0500, Brian Foster wrote:
> On Fri, Mar 02, 2018 at 09:35:27AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Brian Foster reported an oops like:
> > 
> > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > IP: xlog_cil_push+0x184/0x400 [xfs]
> > ....
> > Workqueue: xfs-cil/dm-3 xlog_cil_push_work [xfs]
> > RIP: 0010:xlog_cil_push+0x184/0x400 [xfs]
> > ....
> > 
> > This was caused by the log item size calculation return that there
> > were no log vectors to write on a normal buffer. This should only
> > occur for ordered buffers - the exception handling had never been
> > executed for a non-ordered buffer. This mean we ended up with
> > a log item marked dirty (XFS_LID_DIRTY) but with no log vectors
> > attached that got added to the CIL. When the CIL was pushed, it
> > tripped over the null log vector on the dirty log item when trying
> > to chain the log vectors that needed to be written to the log.
> > 
> > Fix this by clearing the XFS_LID_DIRTY flag on the log item if
> > nothing gets formatted. This will prevent the log item from being
> > added incorrectly to the CIL, and hence avoid the crash
> > 
> > Reported-by: Brian Foster <bfoster@redhat.com>
> > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > ---
> 
> Refamiliarizing myself with the problem... we commit a dirty buffer
> without any logged ranges, the commit time code manages to skip through
> the formatting and whatnot, yet still inserts the item because it is
> dirty. A push occurs sometime later and expects to find a log vector,
> but runs into a NULL deref because the associated item wasn't prepared.
> 
> >  fs/xfs/xfs_log_cil.c | 27 ++++++++++++++++++++-------
> >  1 file changed, 20 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index 61ab5c0a4c45..c1ecd7698100 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -345,9 +345,16 @@ xlog_cil_insert_format_items(
> >  		if (shadow->lv_buf_len == XFS_LOG_VEC_ORDERED)
> >  			ordered = true;
> >  
> > -		/* Skip items that do not have any vectors for writing */
> > -		if (!shadow->lv_niovecs && !ordered)
> > +		/*
> > +		 * Skip items that do not have any vectors for writing. These
> > +		 * log items must now be considered clean in this transaction,
> > +		 * so clear the XFS_LID_DIRTY flag to ensure it doesn't get
> > +		 * added to the CIL by mistake.
> > +		 */
> > +		if (!shadow->lv_niovecs && !ordered) {
> > +			lidp->lid_flags &= ~XFS_LID_DIRTY;
> >  			continue;
> > +		}
> 
> So to address the problem, we clear the dirty bit at format time and
> thus prevent it from being added to the CIL. That looks effective to me
> wrt to avoiding the crash, but I'm not sure it sufficiently addresses
> the root problem of not logging a dirty item...

I don't think there's a problem we need to solve with clean objects
that are marked dirty in the transaction.  I'm just fixing an
obvious (now) logic bug in the original code that did not handle
objects that were incorrectly marked dirty safely.

> Hitting this state in this context means the transaction either set the
> dirty state incorrectly or failed to log any data for an item that was
> legitimately dirty (the latter being the variant we actually hit via the
> buffer dirty range tracking patch). If we do nothing else here (but not
> crash), what state have we left the filesystem in?

A perfectly fine state. It's no different to having a clean object
attached to a transaction, and having the transaction commit release
it.

> We've presumably
> modified the buffer,

Not according to the log item, which says there is nothing to write
to the log. It's clean, so treat it that way.

> but will it ever be written back (unless some other
> transaction modifies/commits it) if it doesn't make it to the AIL? ISTM
> that this is a corruption vector with at least a couple different
> interesting paths (e.g. unmount losing data or a crash -> log recovery
> inconsistency).
> 
> Unless I'm mistaken in the above, I think we need to be a bit more
> aggressive in handling this condition. We could obviously assert/warn,
> but IMO a shutdown is appropriate given that we may have to assume that
> clearing the dirty state made the fs inconsistent (and we already
> shutdown for slightly more innocuous things, like tx overrun, in
> comparison). The problem is that we need to make sure the current
> transaction is not partially committed so we don't actually corrupt the
> fs by shutting down, which leads to one last thought...
> 
> AFAICT the transaction commit path expects to handle items that are
> either not dirty or otherwise must be logged. It looks like the earliest
> we identify this problem is xlog_cil_alloc_shadow_bufs() after we've
> called ->iop_size(). Any reason we couldn't handle this problem there?

Because that code can validly be passed a dirty object that ends up
with no logged regions. IMO, shadow buffer allocation is the wrong
place to be determining if a dirty item needs to be added to the CIL
or not - that's the job of the formatting code....

> There's no reason to allocate a vector we already know we aren't going
> to use (and that is clearly based on invalid parameters), after all.

Such objects still need a log vector attached to them so they can be
processed through the CIL and entered into the AIL via the log
vector chain.

> Given both of the above, a clean way to handle the error may be to allow
> the lv allocation code to return an error up through xfs_trans_commit()
> and let the caller abort the transaction, since we haven't committed any
> part of it at that point. Thoughts?

I don't think there's any problem at all with marking an object that
is dirty in a transaction but otherwise unmodified as "clean" in the
transaction and treating it as a clean object....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: handle inconsistent log item formatting state correctly
  2018-03-05 22:18   ` Dave Chinner
@ 2018-03-06 12:31     ` Brian Foster
  2018-03-06 22:14       ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Brian Foster @ 2018-03-06 12:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Tue, Mar 06, 2018 at 09:18:57AM +1100, Dave Chinner wrote:
> On Mon, Mar 05, 2018 at 09:40:01AM -0500, Brian Foster wrote:
> > On Fri, Mar 02, 2018 at 09:35:27AM +1100, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@redhat.com>
> > > 
> > > Brian Foster reported an oops like:
> > > 
> > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
> > > IP: xlog_cil_push+0x184/0x400 [xfs]
> > > ....
> > > Workqueue: xfs-cil/dm-3 xlog_cil_push_work [xfs]
> > > RIP: 0010:xlog_cil_push+0x184/0x400 [xfs]
> > > ....
> > > 
> > > This was caused by the log item size calculation return that there
> > > were no log vectors to write on a normal buffer. This should only
> > > occur for ordered buffers - the exception handling had never been
> > > executed for a non-ordered buffer. This mean we ended up with
> > > a log item marked dirty (XFS_LID_DIRTY) but with no log vectors
> > > attached that got added to the CIL. When the CIL was pushed, it
> > > tripped over the null log vector on the dirty log item when trying
> > > to chain the log vectors that needed to be written to the log.
> > > 
> > > Fix this by clearing the XFS_LID_DIRTY flag on the log item if
> > > nothing gets formatted. This will prevent the log item from being
> > > added incorrectly to the CIL, and hence avoid the crash
> > > 
> > > Reported-by: Brian Foster <bfoster@redhat.com>
> > > Signed-Off-By: Dave Chinner <dchinner@redhat.com>
> > > ---
> > 
> > Refamiliarizing myself with the problem... we commit a dirty buffer
> > without any logged ranges, the commit time code manages to skip through
> > the formatting and whatnot, yet still inserts the item because it is
> > dirty. A push occurs sometime later and expects to find a log vector,
> > but runs into a NULL deref because the associated item wasn't prepared.
> > 
> > >  fs/xfs/xfs_log_cil.c | 27 ++++++++++++++++++++-------
> > >  1 file changed, 20 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > > index 61ab5c0a4c45..c1ecd7698100 100644
> > > --- a/fs/xfs/xfs_log_cil.c
> > > +++ b/fs/xfs/xfs_log_cil.c
> > > @@ -345,9 +345,16 @@ xlog_cil_insert_format_items(
> > >  		if (shadow->lv_buf_len == XFS_LOG_VEC_ORDERED)
> > >  			ordered = true;
> > >  
> > > -		/* Skip items that do not have any vectors for writing */
> > > -		if (!shadow->lv_niovecs && !ordered)
> > > +		/*
> > > +		 * Skip items that do not have any vectors for writing. These
> > > +		 * log items must now be considered clean in this transaction,
> > > +		 * so clear the XFS_LID_DIRTY flag to ensure it doesn't get
> > > +		 * added to the CIL by mistake.
> > > +		 */
> > > +		if (!shadow->lv_niovecs && !ordered) {
> > > +			lidp->lid_flags &= ~XFS_LID_DIRTY;
> > >  			continue;
> > > +		}
> > 
> > So to address the problem, we clear the dirty bit at format time and
> > thus prevent it from being added to the CIL. That looks effective to me
> > wrt to avoiding the crash, but I'm not sure it sufficiently addresses
> > the root problem of not logging a dirty item...
> 
> I don't think there's a problem we need to solve with clean objects
> that are marked dirty in the transaction.  I'm just fixing an
> obvious (now) logic bug in the original code that did not handle
> objects that were incorrectly marked dirty safely.
> 
> > Hitting this state in this context means the transaction either set the
> > dirty state incorrectly or failed to log any data for an item that was
> > legitimately dirty (the latter being the variant we actually hit via the
> > buffer dirty range tracking patch). If we do nothing else here (but not
> > crash), what state have we left the filesystem in?
> 
> A perfectly fine state. It's no different to having a clean object
> attached to a transaction, and having the transaction commit release
> it.
> 

We don't allocate log vectors for items that aren't dirty in the
transaction, we disallow release of buffers that are dirty in
transactions and AFAICT anywhere we set LID_DIRTY on a lidp we also
dirty the associated transaction, which means we shut down if the
transaction cancels.

> > We've presumably
> > modified the buffer,
> 
> Not according to the log item, which says there is nothing to write
> to the log. It's clean, so treat it that way.
> 

Pretty much any code I refer to suggests XFS_LID_DIRTY is indicative of
a modified object.

> > but will it ever be written back (unless some other
> > transaction modifies/commits it) if it doesn't make it to the AIL? ISTM
> > that this is a corruption vector with at least a couple different
> > interesting paths (e.g. unmount losing data or a crash -> log recovery
> > inconsistency).
> > 
> > Unless I'm mistaken in the above, I think we need to be a bit more
> > aggressive in handling this condition. We could obviously assert/warn,
> > but IMO a shutdown is appropriate given that we may have to assume that
> > clearing the dirty state made the fs inconsistent (and we already
> > shutdown for slightly more innocuous things, like tx overrun, in
> > comparison). The problem is that we need to make sure the current
> > transaction is not partially committed so we don't actually corrupt the
> > fs by shutting down, which leads to one last thought...
> > 
> > AFAICT the transaction commit path expects to handle items that are
> > either not dirty or otherwise must be logged. It looks like the earliest
> > we identify this problem is xlog_cil_alloc_shadow_bufs() after we've
> > called ->iop_size(). Any reason we couldn't handle this problem there?
> 
> Because that code can validly be passed a dirty object that ends up
> with no logged regions. IMO, shadow buffer allocation is the wrong
> place to be determining if a dirty item needs to be added to the CIL
> or not - that's the job of the formatting code....
> 

I don't think the lidp dirty state necessarily means an item has logged
regions or not (and so lack of dirty regions doesn't necessarily justify
clearing the dirty flag). We still log for an invalidated buffer, for
example, and/or an item may have been dirtied by a previous transaction
and end up clean in a subsequent one.

FWIW, I think it would make sense to handle this case at format time if
it were a valid state, not so much if it isn't. I'm simply not following
how it is a valid state. AFAICT we have the following valid states at
transaction commit time:

- lid not dirty, lip state is a 'don't care'
- lid dirty, lip ordered, niovecs == 0
- lid dirty, lip !ordered, niovecs != 0

... and anything else is essentially a bug. Am I missing some other
case?

> > There's no reason to allocate a vector we already know we aren't going
> > to use (and that is clearly based on invalid parameters), after all.
> 
> Such objects still need a log vector attached to them so they can be
> processed through the CIL and entered into the AIL via the log
> vector chain.
> 

Confused. This patch prevents that from happening on such objects (which
you just noted above wrt to the appropriate place to determine whether
to insert into the CIL). Hm?

Brian

> > Given both of the above, a clean way to handle the error may be to allow
> > the lv allocation code to return an error up through xfs_trans_commit()
> > and let the caller abort the transaction, since we haven't committed any
> > part of it at that point. Thoughts?
> 
> I don't think there's any problem at all with marking an object that
> is dirty in a transaction but otherwise unmodified as "clean" in the
> transaction and treating it as a clean object....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: handle inconsistent log item formatting state correctly
  2018-03-06 12:31     ` Brian Foster
@ 2018-03-06 22:14       ` Dave Chinner
  2018-03-07  1:16         ` Brian Foster
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2018-03-06 22:14 UTC (permalink / raw)
  To: Brian Foster; +Cc: linux-xfs

On Tue, Mar 06, 2018 at 07:31:56AM -0500, Brian Foster wrote:
> On Tue, Mar 06, 2018 at 09:18:57AM +1100, Dave Chinner wrote:
> > On Mon, Mar 05, 2018 at 09:40:01AM -0500, Brian Foster wrote:
> > > On Fri, Mar 02, 2018 at 09:35:27AM +1100, Dave Chinner wrote:
> > > but will it ever be written back (unless some other
> > > transaction modifies/commits it) if it doesn't make it to the AIL? ISTM
> > > that this is a corruption vector with at least a couple different
> > > interesting paths (e.g. unmount losing data or a crash -> log recovery
> > > inconsistency).
> > > 
> > > Unless I'm mistaken in the above, I think we need to be a bit more
> > > aggressive in handling this condition. We could obviously assert/warn,
> > > but IMO a shutdown is appropriate given that we may have to assume that
> > > clearing the dirty state made the fs inconsistent (and we already
> > > shutdown for slightly more innocuous things, like tx overrun, in
> > > comparison). The problem is that we need to make sure the current
> > > transaction is not partially committed so we don't actually corrupt the
> > > fs by shutting down, which leads to one last thought...
> > > 
> > > AFAICT the transaction commit path expects to handle items that are
> > > either not dirty or otherwise must be logged. It looks like the earliest
> > > we identify this problem is xlog_cil_alloc_shadow_bufs() after we've
> > > called ->iop_size(). Any reason we couldn't handle this problem there?
> > 
> > Because that code can validly be passed a dirty object that ends up
> > with no logged regions. IMO, shadow buffer allocation is the wrong
> > place to be determining if a dirty item needs to be added to the CIL
> > or not - that's the job of the formatting code....
> 
> I don't think the lidp dirty state necessarily means an item has logged
> regions or not (and so lack of dirty regions doesn't necessarily justify
> clearing the dirty flag). We still log for an invalidated buffer, for
> example, and/or an item may have been dirtied by a previous transaction
> and end up clean in a subsequent one.

An invalidated buffer has the BLI_STALE state, and this gets
formatted into the log on commit in the buf log format structure so
that log recovery can correctly cancel invalidated buffers instead
of replaying them. IOWs, they *definitely* have log vectors
associated with them.

That's the thing - anything that is dirtied and needs to be written
to the log will write a log item format structure into a log vector.
The only time this will not occur is if the object is actually clean
and a log item format structure does not need to be written. That's
the case where niovecs = 0.

The only exception to this case is ordered buffers - they have a
dirty lidp, but a clean log item format structure. They need a log
vector structure allocated because they have to pass through the CIL
into the AIL so they can be correctly ordered. But the point here is
that we've passed a "dirty object that ends up with no logged
regions" to the formatting code, and it's special cased this because
it's been marked as an "ordered" log item.

> FWIW, I think it would make sense to handle this case at format time if
> it were a valid state, not so much if it isn't. I'm simply not following
> how it is a valid state.

The object state is valid. putting such objects into the CIL is what
is not valid, and that's where the bug is.

> AFAICT we have the following valid states at
> transaction commit time:
> 
> - lid not dirty, lip state is a 'don't care'
> - lid dirty, lip ordered, niovecs == 0
> - lid dirty, lip !ordered, niovecs != 0
> 
> ... and anything else is essentially a bug. Am I missing some other
> case?

No. We handle all 3 cases you mention above, but we must keep in
mind that there's only one other case here:

- lid dirty, lip !ordered, niovecs == 0

And that's exactly the case that the code currently tries to handle
and gets wrong. It leaves this log item in a bad state (dirty, but
with no attached xfs_log_vec) and adds it to the CIL, and that gets
tripped over later when pushing the CIL.


That's all I'm fixing in this patch. I'm ignoring whether it's
possible or not (the original code thought it necessary to handle),
and if the log item format information says the object is clean,
then it's clean and we should reflect that in the transaction we are
committing. That's why I cleared the lid dirty flag - it's clean,
and we should make sure we treat it consistently as a clean object.

All the transaction cleanup will work correctly when it's committed
and clean log items are released (and freed) because that doesn't
care what the lid dirty state is, just what the log item state is...

> > > There's no reason to allocate a vector we already know we aren't going
> > > to use (and that is clearly based on invalid parameters), after all.
> > 
> > Such objects still need a log vector attached to them so they can be
> > processed through the CIL and entered into the AIL via the log
> > vector chain.
> 
> Confused. This patch prevents that from happening on such objects (which
> you just noted above wrt to the appropriate place to determine whether
> to insert into the CIL). Hm?

See my comments about ordered buffers above. The only difference
between a clean log item we should drop from the transaction and a
clean log item we should consider as dirty and pass through the CIL
into the AIL is the ordered flag....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: handle inconsistent log item formatting state correctly
  2018-03-06 22:14       ` Dave Chinner
@ 2018-03-07  1:16         ` Brian Foster
  2018-03-07  2:12           ` Darrick J. Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Brian Foster @ 2018-03-07  1:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Wed, Mar 07, 2018 at 09:14:41AM +1100, Dave Chinner wrote:
> On Tue, Mar 06, 2018 at 07:31:56AM -0500, Brian Foster wrote:
> > On Tue, Mar 06, 2018 at 09:18:57AM +1100, Dave Chinner wrote:
> > > On Mon, Mar 05, 2018 at 09:40:01AM -0500, Brian Foster wrote:
> > > > On Fri, Mar 02, 2018 at 09:35:27AM +1100, Dave Chinner wrote:
> > > > but will it ever be written back (unless some other
> > > > transaction modifies/commits it) if it doesn't make it to the AIL? ISTM
> > > > that this is a corruption vector with at least a couple different
> > > > interesting paths (e.g. unmount losing data or a crash -> log recovery
> > > > inconsistency).
> > > > 
> > > > Unless I'm mistaken in the above, I think we need to be a bit more
> > > > aggressive in handling this condition. We could obviously assert/warn,
> > > > but IMO a shutdown is appropriate given that we may have to assume that
> > > > clearing the dirty state made the fs inconsistent (and we already
> > > > shutdown for slightly more innocuous things, like tx overrun, in
> > > > comparison). The problem is that we need to make sure the current
> > > > transaction is not partially committed so we don't actually corrupt the
> > > > fs by shutting down, which leads to one last thought...
> > > > 
> > > > AFAICT the transaction commit path expects to handle items that are
> > > > either not dirty or otherwise must be logged. It looks like the earliest
> > > > we identify this problem is xlog_cil_alloc_shadow_bufs() after we've
> > > > called ->iop_size(). Any reason we couldn't handle this problem there?
> > > 
> > > Because that code can validly be passed a dirty object that ends up
> > > with no logged regions. IMO, shadow buffer allocation is the wrong
> > > place to be determining if a dirty item needs to be added to the CIL
> > > or not - that's the job of the formatting code....
> > 
> > I don't think the lidp dirty state necessarily means an item has logged
> > regions or not (and so lack of dirty regions doesn't necessarily justify
> > clearing the dirty flag). We still log for an invalidated buffer, for
> > example, and/or an item may have been dirtied by a previous transaction
> > and end up clean in a subsequent one.
> 
> An invalidated buffer has the BLI_STALE state, and this gets
> formatted into the log on commit in the buf log format structure so
> that log recovery can correctly cancel invalidated buffers instead
> of replaying them. IOWs, they *definitely* have log vectors
> associated with them.
> 

Yep, pretty much what I stated above.

> That's the thing - anything that is dirtied and needs to be written
> to the log will write a log item format structure into a log vector.
> The only time this will not occur is if the object is actually clean
> and a log item format structure does not need to be written. That's
> the case where niovecs = 0.
> 
> The only exception to this case is ordered buffers - they have a
> dirty lidp, but a clean log item format structure. They need a log
> vector structure allocated because they have to pass through the CIL
> into the AIL so they can be correctly ordered. But the point here is
> that we've passed a "dirty object that ends up with no logged
> regions" to the formatting code, and it's special cased this because
> it's been marked as an "ordered" log item.
> 

Right, and both of these fall under the categories I had listed below
(lidp clean or lidp dirty+ordered)...

> > FWIW, I think it would make sense to handle this case at format time if
> > it were a valid state, not so much if it isn't. I'm simply not following
> > how it is a valid state.
> 
> The object state is valid. putting such objects into the CIL is what
> is not valid, and that's where the bug is.
> 
> > AFAICT we have the following valid states at
> > transaction commit time:
> > 
> > - lid not dirty, lip state is a 'don't care'
> > - lid dirty, lip ordered, niovecs == 0
> > - lid dirty, lip !ordered, niovecs != 0
> > 
> > ... and anything else is essentially a bug. Am I missing some other
> > case?
> 
> No. We handle all 3 cases you mention above, but we must keep in
> mind that there's only one other case here:
> 
> - lid dirty, lip !ordered, niovecs == 0
> 
> And that's exactly the case that the code currently tries to handle
> and gets wrong. It leaves this log item in a bad state (dirty, but
> with no attached xfs_log_vec) and adds it to the CIL, and that gets
> tripped over later when pushing the CIL.
> 
> 
> That's all I'm fixing in this patch. I'm ignoring whether it's
> possible or not (the original code thought it necessary to handle),
> and if the log item format information says the object is clean,
> then it's clean and we should reflect that in the transaction we are
> committing. That's why I cleared the lid dirty flag - it's clean,
> and we should make sure we treat it consistently as a clean object.
>

My argument is not that it's not possible. My argument is that the
semantics of XFS_LID_DIRTY suggest the transaction modified the object
in memory. Today, that means we log a range of the object, log an
invalidation or order a buffer. E.g., we modify an actual metadata
object, dirty the item (log or order), dirty the lidp and dirty the
transaction. If we didn't ever modify a particular object, then there's
no reason to dirty it that I can see. Failing to log/order the object
properly doesn't justify assuming it hasn't been modified IMO, we don't
really know either way.

The way the flag is used seems to bear that out. It's set when an object
has been modified by a transaction. The way the flag alters behavior
suggests the same. Objects that have been dirtied by a transaction
cannot be released from that transaction and a cancel of a transaction
with a dirty lidp causes fs shutdown (because we dirty the transaction
once we dirty a lidp).

> All the transaction cleanup will work correctly when it's committed
> and clean log items are released (and freed) because that doesn't
> care what the lid dirty state is, just what the log item state is...
> 

A cancel of that same transaction would shutdown the fs because it
dirtied (i.e., presumably modified) an object. So we can't cancel the
transaction for risk of corruption, we can't release the object from the
transaction, yet this patch proposes behavior where a commit of that
transaction can silently undirty the lidp, commit whatever else might be
in the transaction and carry on as if nothing were wrong.

At the very least, this is inconsistent with how this flag is used
everywhere else. How do you explain that?

> > > > There's no reason to allocate a vector we already know we aren't going
> > > > to use (and that is clearly based on invalid parameters), after all.
> > > 
> > > Such objects still need a log vector attached to them so they can be
> > > processed through the CIL and entered into the AIL via the log
> > > vector chain.
> > 
> > Confused. This patch prevents that from happening on such objects (which
> > you just noted above wrt to the appropriate place to determine whether
> > to insert into the CIL). Hm?
> 
> See my comments about ordered buffers above. The only difference
> between a clean log item we should drop from the transaction and a
> clean log item we should consider as dirty and pass through the CIL
> into the AIL is the ordered flag....
> 

Sure, but that doesn't answer my question. Why does this patch allocate
log vectors for log items that are not going to be inserted to the CIL
by the committing transaction? I don't see a reason to do that
regardless of the validity question.

Brian

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: handle inconsistent log item formatting state correctly
  2018-03-07  1:16         ` Brian Foster
@ 2018-03-07  2:12           ` Darrick J. Wong
  2018-03-07 14:05             ` Brian Foster
  2018-03-08  4:40             ` Dave Chinner
  0 siblings, 2 replies; 16+ messages in thread
From: Darrick J. Wong @ 2018-03-07  2:12 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, linux-xfs

On Tue, Mar 06, 2018 at 08:16:02PM -0500, Brian Foster wrote:
> On Wed, Mar 07, 2018 at 09:14:41AM +1100, Dave Chinner wrote:
> > On Tue, Mar 06, 2018 at 07:31:56AM -0500, Brian Foster wrote:
> > > On Tue, Mar 06, 2018 at 09:18:57AM +1100, Dave Chinner wrote:
> > > > On Mon, Mar 05, 2018 at 09:40:01AM -0500, Brian Foster wrote:
> > > > > On Fri, Mar 02, 2018 at 09:35:27AM +1100, Dave Chinner wrote:
> > > > > but will it ever be written back (unless some other
> > > > > transaction modifies/commits it) if it doesn't make it to the AIL? ISTM
> > > > > that this is a corruption vector with at least a couple different
> > > > > interesting paths (e.g. unmount losing data or a crash -> log recovery
> > > > > inconsistency).
> > > > > 
> > > > > Unless I'm mistaken in the above, I think we need to be a bit more
> > > > > aggressive in handling this condition. We could obviously assert/warn,
> > > > > but IMO a shutdown is appropriate given that we may have to assume that
> > > > > clearing the dirty state made the fs inconsistent (and we already
> > > > > shutdown for slightly more innocuous things, like tx overrun, in
> > > > > comparison). The problem is that we need to make sure the current
> > > > > transaction is not partially committed so we don't actually corrupt the
> > > > > fs by shutting down, which leads to one last thought...
> > > > > 
> > > > > AFAICT the transaction commit path expects to handle items that are
> > > > > either not dirty or otherwise must be logged. It looks like the earliest
> > > > > we identify this problem is xlog_cil_alloc_shadow_bufs() after we've
> > > > > called ->iop_size(). Any reason we couldn't handle this problem there?
> > > > 
> > > > Because that code can validly be passed a dirty object that ends up
> > > > with no logged regions. IMO, shadow buffer allocation is the wrong
> > > > place to be determining if a dirty item needs to be added to the CIL
> > > > or not - that's the job of the formatting code....
> > > 
> > > I don't think the lidp dirty state necessarily means an item has logged
> > > regions or not (and so lack of dirty regions doesn't necessarily justify
> > > clearing the dirty flag). We still log for an invalidated buffer, for
> > > example, and/or an item may have been dirtied by a previous transaction
> > > and end up clean in a subsequent one.
> > 
> > An invalidated buffer has the BLI_STALE state, and this gets
> > formatted into the log on commit in the buf log format structure so
> > that log recovery can correctly cancel invalidated buffers instead
> > of replaying them. IOWs, they *definitely* have log vectors
> > associated with them.
> > 
> 
> Yep, pretty much what I stated above.
> 
> > That's the thing - anything that is dirtied and needs to be written
> > to the log will write a log item format structure into a log vector.
> > The only time this will not occur is if the object is actually clean
> > and a log item format structure does not need to be written. That's
> > the case where niovecs = 0.
> > 
> > The only exception to this case is ordered buffers - they have a
> > dirty lidp, but a clean log item format structure. They need a log
> > vector structure allocated because they have to pass through the CIL
> > into the AIL so they can be correctly ordered. But the point here is
> > that we've passed a "dirty object that ends up with no logged
> > regions" to the formatting code, and it's special cased this because
> > it's been marked as an "ordered" log item.
> > 
> 
> Right, and both of these fall under the categories I had listed below
> (lidp clean or lidp dirty+ordered)...
> 
> > > FWIW, I think it would make sense to handle this case at format time if
> > > it were a valid state, not so much if it isn't. I'm simply not following
> > > how it is a valid state.
> > 
> > The object state is valid. putting such objects into the CIL is what
> > is not valid, and that's where the bug is.
> > 
> > > AFAICT we have the following valid states at
> > > transaction commit time:
> > > 
> > > - lid not dirty, lip state is a 'don't care'
> > > - lid dirty, lip ordered, niovecs == 0
> > > - lid dirty, lip !ordered, niovecs != 0
> > > 
> > > ... and anything else is essentially a bug. Am I missing some other
> > > case?
> > 
> > No. We handle all 3 cases you mention above, but we must keep in
> > mind that there's only one other case here:
> > 
> > - lid dirty, lip !ordered, niovecs == 0
> > 
> > And that's exactly the case that the code currently tries to handle
> > and gets wrong. It leaves this log item in a bad state (dirty, but
> > with no attached xfs_log_vec) and adds it to the CIL, and that gets
> > tripped over later when pushing the CIL.
> > 
> > 
> > That's all I'm fixing in this patch. I'm ignoring whether it's
> > possible or not (the original code thought it necessary to handle),
> > and if the log item format information says the object is clean,
> > then it's clean and we should reflect that in the transaction we are
> > committing. That's why I cleared the lid dirty flag - it's clean,
> > and we should make sure we treat it consistently as a clean object.
> >
> 
> My argument is not that it's not possible. My argument is that the
> semantics of XFS_LID_DIRTY suggest the transaction modified the object
> in memory. Today, that means we log a range of the object, log an
> invalidation or order a buffer. E.g., we modify an actual metadata
> object, dirty the item (log or order), dirty the lidp and dirty the
> transaction. If we didn't ever modify a particular object, then there's
> no reason to dirty it that I can see. Failing to log/order the object
> properly doesn't justify assuming it hasn't been modified IMO, we don't
> really know either way.
> 
> The way the flag is used seems to bear that out. It's set when an object
> has been modified by a transaction. The way the flag alters behavior
> suggests the same. Objects that have been dirtied by a transaction
> cannot be released from that transaction and a cancel of a transaction
> with a dirty lidp causes fs shutdown (because we dirty the transaction
> once we dirty a lidp).

At this point your disheveled maintainer stumbles in with stale replies
that took three days to get to him, and realizes that the crash
mentioned in the commit message only happened if Dave's buffer range
logging patch was applied and it miscalculated the log item size. :(

> > All the transaction cleanup will work correctly when it's committed
> > and clean log items are released (and freed) because that doesn't
> > care what the lid dirty state is, just what the log item state is...
> > 
> 
> A cancel of that same transaction would shutdown the fs because it
> dirtied (i.e., presumably modified) an object. So we can't cancel the
> transaction for risk of corruption, we can't release the object from the
> transaction, yet this patch proposes behavior where a commit of that
> transaction can silently undirty the lidp, commit whatever else might be
> in the transaction and carry on as if nothing were wrong.
> 
> At the very least, this is inconsistent with how this flag is used
> everywhere else. How do you explain that?

So the state "li dirty, lip !ordered, niovecs == 0" is an invalid state,
and this patch proposes that if we ever see this invalid state then we
decide that no the buffer isn't dirty since there are zero iovecs.  This
prevents the log from allocating anything for this item since there's no
evidence of anything being dirty.

I think I'd be more comfortable with this patch if there was an ASSERT
to that effect so that if some future patch screws up and fails to
attach any iovecs to a dirty buffer (or sets the dirty flag without
marking anything dirty) then it's clear that this isn't supposed to
happen.

As for clearing LID_DIRTY, someone handed us a log item in an
inconsistent state, so perhaps we should shut down the fs instead?  I
get that LID_DIRTY is a flag that is a proxy for counting the iovecs,
but if the iovec array and the flag disagree then something funny is
going on.  Then again I guess it's mysterious if something scribbles on
our incore log and the fs just shuts down... so I guess I'd be ok with
clearing the flag so long as /something/ gets logged about in-memory
state being weird even on production kernels.

--D

> > > > > There's no reason to allocate a vector we already know we aren't going
> > > > > to use (and that is clearly based on invalid parameters), after all.
> > > > 
> > > > Such objects still need a log vector attached to them so they can be
> > > > processed through the CIL and entered into the AIL via the log
> > > > vector chain.
> > > 
> > > Confused. This patch prevents that from happening on such objects (which
> > > you just noted above wrt to the appropriate place to determine whether
> > > to insert into the CIL). Hm?
> > 
> > See my comments about ordered buffers above. The only difference
> > between a clean log item we should drop from the transaction and a
> > clean log item we should consider as dirty and pass through the CIL
> > into the AIL is the ordered flag....
> > 
> 
> Sure, but that doesn't answer my question. Why does this patch allocate
> log vectors for log items that are not going to be inserted to the CIL
> by the committing transaction? I don't see a reason to do that
> regardless of the validity question.
> 
> Brian
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: handle inconsistent log item formatting state correctly
  2018-03-07  2:12           ` Darrick J. Wong
@ 2018-03-07 14:05             ` Brian Foster
  2018-03-08  4:40             ` Dave Chinner
  1 sibling, 0 replies; 16+ messages in thread
From: Brian Foster @ 2018-03-07 14:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs

On Tue, Mar 06, 2018 at 06:12:45PM -0800, Darrick J. Wong wrote:
> On Tue, Mar 06, 2018 at 08:16:02PM -0500, Brian Foster wrote:
> > On Wed, Mar 07, 2018 at 09:14:41AM +1100, Dave Chinner wrote:
> > > On Tue, Mar 06, 2018 at 07:31:56AM -0500, Brian Foster wrote:
> > > > On Tue, Mar 06, 2018 at 09:18:57AM +1100, Dave Chinner wrote:
> > > > > On Mon, Mar 05, 2018 at 09:40:01AM -0500, Brian Foster wrote:
> > > > > > On Fri, Mar 02, 2018 at 09:35:27AM +1100, Dave Chinner wrote:
> > > > > > but will it ever be written back (unless some other
> > > > > > transaction modifies/commits it) if it doesn't make it to the AIL? ISTM
> > > > > > that this is a corruption vector with at least a couple different
> > > > > > interesting paths (e.g. unmount losing data or a crash -> log recovery
> > > > > > inconsistency).
> > > > > > 
> > > > > > Unless I'm mistaken in the above, I think we need to be a bit more
> > > > > > aggressive in handling this condition. We could obviously assert/warn,
> > > > > > but IMO a shutdown is appropriate given that we may have to assume that
> > > > > > clearing the dirty state made the fs inconsistent (and we already
> > > > > > shutdown for slightly more innocuous things, like tx overrun, in
> > > > > > comparison). The problem is that we need to make sure the current
> > > > > > transaction is not partially committed so we don't actually corrupt the
> > > > > > fs by shutting down, which leads to one last thought...
> > > > > > 
> > > > > > AFAICT the transaction commit path expects to handle items that are
> > > > > > either not dirty or otherwise must be logged. It looks like the earliest
> > > > > > we identify this problem is xlog_cil_alloc_shadow_bufs() after we've
> > > > > > called ->iop_size(). Any reason we couldn't handle this problem there?
> > > > > 
> > > > > Because that code can validly be passed a dirty object that ends up
> > > > > with no logged regions. IMO, shadow buffer allocation is the wrong
> > > > > place to be determining if a dirty item needs to be added to the CIL
> > > > > or not - that's the job of the formatting code....
> > > > 
> > > > I don't think the lidp dirty state necessarily means an item has logged
> > > > regions or not (and so lack of dirty regions doesn't necessarily justify
> > > > clearing the dirty flag). We still log for an invalidated buffer, for
> > > > example, and/or an item may have been dirtied by a previous transaction
> > > > and end up clean in a subsequent one.
> > > 
> > > An invalidated buffer has the BLI_STALE state, and this gets
> > > formatted into the log on commit in the buf log format structure so
> > > that log recovery can correctly cancel invalidated buffers instead
> > > of replaying them. IOWs, they *definitely* have log vectors
> > > associated with them.
> > > 
> > 
> > Yep, pretty much what I stated above.
> > 
> > > That's the thing - anything that is dirtied and needs to be written
> > > to the log will write a log item format structure into a log vector.
> > > The only time this will not occur is if the object is actually clean
> > > and a log item format structure does not need to be written. That's
> > > the case where niovecs = 0.
> > > 
> > > The only exception to this case is ordered buffers - they have a
> > > dirty lidp, but a clean log item format structure. They need a log
> > > vector structure allocated because they have to pass through the CIL
> > > into the AIL so they can be correctly ordered. But the point here is
> > > that we've passed a "dirty object that ends up with no logged
> > > regions" to the formatting code, and it's special cased this because
> > > it's been marked as an "ordered" log item.
> > > 
> > 
> > Right, and both of these fall under the categories I had listed below
> > (lidp clean or lidp dirty+ordered)...
> > 
> > > > FWIW, I think it would make sense to handle this case at format time if
> > > > it were a valid state, not so much if it isn't. I'm simply not following
> > > > how it is a valid state.
> > > 
> > > The object state is valid. putting such objects into the CIL is what
> > > is not valid, and that's where the bug is.
> > > 
> > > > AFAICT we have the following valid states at
> > > > transaction commit time:
> > > > 
> > > > - lid not dirty, lip state is a 'don't care'
> > > > - lid dirty, lip ordered, niovecs == 0
> > > > - lid dirty, lip !ordered, niovecs != 0
> > > > 
> > > > ... and anything else is essentially a bug. Am I missing some other
> > > > case?
> > > 
> > > No. We handle all 3 cases you mention above, but we must keep in
> > > mind that there's only one other case here:
> > > 
> > > - lid dirty, lip !ordered, niovecs == 0
> > > 
> > > And that's exactly the case that the code currently tries to handle
> > > and gets wrong. It leaves this log item in a bad state (dirty, but
> > > with no attached xfs_log_vec) and adds it to the CIL, and that gets
> > > tripped over later when pushing the CIL.
> > > 
> > > 
> > > That's all I'm fixing in this patch. I'm ignoring whether it's
> > > possible or not (the original code thought it necessary to handle),
> > > and if the log item format information says the object is clean,
> > > then it's clean and we should reflect that in the transaction we are
> > > committing. That's why I cleared the lid dirty flag - it's clean,
> > > and we should make sure we treat it consistently as a clean object.
> > >
> > 
> > My argument is not that it's not possible. My argument is that the
> > semantics of XFS_LID_DIRTY suggest the transaction modified the object
> > in memory. Today, that means we log a range of the object, log an
> > invalidation or order a buffer. E.g., we modify an actual metadata
> > object, dirty the item (log or order), dirty the lidp and dirty the
> > transaction. If we didn't ever modify a particular object, then there's
> > no reason to dirty it that I can see. Failing to log/order the object
> > properly doesn't justify assuming it hasn't been modified IMO, we don't
> > really know either way.
> > 
> > The way the flag is used seems to bear that out. It's set when an object
> > has been modified by a transaction. The way the flag alters behavior
> > suggests the same. Objects that have been dirtied by a transaction
> > cannot be released from that transaction and a cancel of a transaction
> > with a dirty lidp causes fs shutdown (because we dirty the transaction
> > once we dirty a lidp).
> 
> At this point your disheveled maintainer stumbles in with stale replies
> that took three days to get to him, and realizes that the crash
> mentioned in the commit message only happened if Dave's buffer range
> logging patch was applied and it miscalculated the log item size. :(
> 

;) Yeah, this originated from a bug in an under-development patch that
failed to correctly track a range to log in a buffer. This doesn't
happen in the current code, but using that bug as an example, this patch
would convert that bug from a crash/panic into a log recovery corruption
vector [1].

Obviously crashing is not good, but I think something like a shutdown
might be more appropriate.

> > > All the transaction cleanup will work correctly when it's committed
> > > and clean log items are released (and freed) because that doesn't
> > > care what the lid dirty state is, just what the log item state is...
> > > 
> > 
> > A cancel of that same transaction would shutdown the fs because it
> > dirtied (i.e., presumably modified) an object. So we can't cancel the
> > transaction for risk of corruption, we can't release the object from the
> > transaction, yet this patch proposes behavior where a commit of that
> > transaction can silently undirty the lidp, commit whatever else might be
> > in the transaction and carry on as if nothing were wrong.
> > 
> > At the very least, this is inconsistent with how this flag is used
> > everywhere else. How do you explain that?
> 
> So the state "li dirty, lip !ordered, niovecs == 0" is an invalid state,
> and this patch proposes that if we ever see this invalid state then we
> decide that no the buffer isn't dirty since there are zero iovecs.  This
> prevents the log from allocating anything for this item since there's no
> evidence of anything being dirty.
> 

Well, AFAICT we still allocate a log vector to attach to the log item
(if one doesn't exist already), but that might not be your point.
Indeed, we essentially wouldn't log anything or otherwise insert the log
item if ->iop_size() tells us there's nothing to log (or order).

> I think I'd be more comfortable with this patch if there was an ASSERT
> to that effect so that if some future patch screws up and fails to
> attach any iovecs to a dirty buffer (or sets the dirty flag without
> marking anything dirty) then it's clear that this isn't supposed to
> happen.
> 

... or if some future patch modifies a buffer and somehow dirties it
without logging it, or if the logging code breaks and fails to track a
request to log an object. ;)

Yeah, I agree on an assert as a minimum contingency.

> As for clearing LID_DIRTY, someone handed us a log item in an
> inconsistent state, so perhaps we should shut down the fs instead?  I
> get that LID_DIRTY is a flag that is a proxy for counting the iovecs,
> but if the iovec array and the flag disagree then something funny is
> going on.  Then again I guess it's mysterious if something scribbles on
> our incore log and the fs just shuts down... so I guess I'd be ok with
> clearing the flag so long as /something/ gets logged about in-memory
> state being weird even on production kernels.
> 

As I see it, it's really not so much about the log item as it is about
the implications of implicitly (partially) clearing a dirty transaction.
We definitely don't want to insert a clean (niovecs == 0, ->li_lv ==
NULL) log item into the CIL. That's part of what this patch does and
certainly makes sense.

However, what it also does is assume the transaction that set
the lidp dirty did not physically modify the associated metadata item
based on what was(n't) logged. If the metadata wasn't modified, then
everything is fine, this causes no problems and shutting down would be
overkill for what was likely a code bug to erroneously dirty a lidp. If
the metadata was modified, then the transaction commit probably corrupts
the filesystem (consider how whatever else is in the transaction may
relate to the modified/unlogged buffer).

Essentially all I'm really suggesting is an earlier error check on the
transaction during the commit process. E.g., if a lidp is dirty but
points to a clean lip, return an error rather than assume whatever bug
that lead to dirtying the lip didn't modify the in-core metadata so that
in addition to not inserting to the CIL, we also minimize risk of silent
corruption.

Brian

[1] The specific bug was a case where we create a multi-block symlink
such that each block is an independent buffer. The last buffer holds the
last byte of the symlink, but the xfs_trans_log_buf() code failed to log
that byte, which essentially means that nothing in the buffer was
logged.

If I remember all the details correctly, this resulted in the following
sequence at tx commit:

- allocate a niovecs == 0 log vector (via ->iop_size())
- the log format code skips the associated log item because niovecs == 0
	- we skip xfs_cil_prepare_item(), therefore lip->li_lv == NULL
- the log item ends up in the CIL because the lidp is dirty
- CIL push occurs sometime later, iterates the CIL log items and crashes
  into a NULL lip->li_lv

With this patch, the sequence for that same bug looks more like:

- allocate a niovecs == 0 log vector
- log format code skips the associated log item and clears the dirty
  lidp state
- the log item is not inserted to the CIL because the lidp is clean

This all seems fine from a CIL/log item perspective because all that
matters from the CIL perspective is that we don't insert the log item
with a NULL log vector. However, if we step off CIL island for a moment
and consider the bigger picture ramifications of essentially cancelling
a dirty part of the transaction in this example, I think you'll have an
idea of my question/concern. HTH.

> --D
> 
> > > > > > There's no reason to allocate a vector we already know we aren't going
> > > > > > to use (and that is clearly based on invalid parameters), after all.
> > > > > 
> > > > > Such objects still need a log vector attached to them so they can be
> > > > > processed through the CIL and entered into the AIL via the log
> > > > > vector chain.
> > > > 
> > > > Confused. This patch prevents that from happening on such objects (which
> > > > you just noted above wrt to the appropriate place to determine whether
> > > > to insert into the CIL). Hm?
> > > 
> > > See my comments about ordered buffers above. The only difference
> > > between a clean log item we should drop from the transaction and a
> > > clean log item we should consider as dirty and pass through the CIL
> > > into the AIL is the ordered flag....
> > > 
> > 
> > Sure, but that doesn't answer my question. Why does this patch allocate
> > log vectors for log items that are not going to be inserted to the CIL
> > by the committing transaction? I don't see a reason to do that
> > regardless of the validity question.
> > 
> > Brian
> > 
> > > Cheers,
> > > 
> > > Dave.
> > > -- 
> > > Dave Chinner
> > > david@fromorbit.com
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: handle inconsistent log item formatting state correctly
  2018-03-07  2:12           ` Darrick J. Wong
  2018-03-07 14:05             ` Brian Foster
@ 2018-03-08  4:40             ` Dave Chinner
  2018-03-08 19:11               ` Brian Foster
  1 sibling, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2018-03-08  4:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Brian Foster, linux-xfs

On Tue, Mar 06, 2018 at 06:12:45PM -0800, Darrick J. Wong wrote:
> On Tue, Mar 06, 2018 at 08:16:02PM -0500, Brian Foster wrote:
> > On Wed, Mar 07, 2018 at 09:14:41AM +1100, Dave Chinner wrote:
> > > On Tue, Mar 06, 2018 at 07:31:56AM -0500, Brian Foster wrote:
> > My argument is not that it's not possible. My argument is that the
> > semantics of XFS_LID_DIRTY suggest the transaction modified the object
> > in memory. Today, that means we log a range of the object, log an
> > invalidation or order a buffer. E.g., we modify an actual metadata
> > object, dirty the item (log or order), dirty the lidp and dirty the
> > transaction. If we didn't ever modify a particular object, then there's
> > no reason to dirty it that I can see. Failing to log/order the object
> > properly doesn't justify assuming it hasn't been modified IMO, we don't
> > really know either way.
> > 
> > The way the flag is used seems to bear that out. It's set when an object
> > has been modified by a transaction. The way the flag alters behavior
> > suggests the same. Objects that have been dirtied by a transaction
> > cannot be released from that transaction and a cancel of a transaction
> > with a dirty lidp causes fs shutdown (because we dirty the transaction
> > once we dirty a lidp).

I'll point out that I've just stumbled onto a series of bugs where
log items are multiply joined to a single transaction, in which case
the lidp state may not reflect the current state of the log item
because the log item no longer points to the lidp in question.

This also raises questions about what happens when we process a log
item twice in the cil commit infrastructure - two formatting passes,
multiple inserts into the CIL list, multiple calls to
iop_committing/iop_unlock when the transaction is freed, etc.
There's lots of shit that could go wrong as a result of this type of
bug...

> > A cancel of that same transaction would shutdown the fs because it
> > dirtied (i.e., presumably modified) an object. So we can't cancel the
> > transaction for risk of corruption, we can't release the object from the
> > transaction, yet this patch proposes behavior where a commit of that
> > transaction can silently undirty the lidp, commit whatever else might be
> > in the transaction and carry on as if nothing were wrong.
> > 
> > At the very least, this is inconsistent with how this flag is used
> > everywhere else. How do you explain that?
> 
> So the state "li dirty, lip !ordered, niovecs == 0" is an invalid state,
> and this patch proposes that if we ever see this invalid state then we
> decide that no the buffer isn't dirty since there are zero iovecs.  This
> prevents the log from allocating anything for this item since there's no
> evidence of anything being dirty.

Essentially. The log item dirty state is the thing we trust right
through the log item life cycle. It's fundamental to the relogging
algorithm we use to keep dirty metadata moving forwards through the
log.

The log item descriptor, OTOH, was just an abstraction that allowed
the transaction commit to couple the log item formatting to the
xlog_write() vector calls, which is something that went away with
delayed logging about 8 years ago. The only piece of the log item
descriptor that remained was the dirty flag, and the issue here
boils down to one simple question: which dirty state do we trust -
the log item or the descriptor?

That, as an architectural question, is a no brainer. It's the log
item state that matters. The log item descriptor is an abstraction
long past it's use-by date, so I'm going to resolve this problem
simply by removing it (if you are wondering how I found the
mulitply-joined log item bugs...).

In doing this, we no longer have the question of which one to trust
- all the "dirty in transaction" state is carried on the log item
itself and is valid only during the life of a transaction.  At
which point, there should be no possibility of the log item dirty
flag getting out of step with it's dirty state, and we can simply
add asserts in the write place to validate this.

> As for clearing LID_DIRTY, someone handed us a log item in an
> inconsistent state, so perhaps we should shut down the fs instead?  I
> get that LID_DIRTY is a flag that is a proxy for counting the iovecs,
> but if the iovec array and the flag disagree then something funny is
> going on.

Yup, like we have a multiply joined log item and stale log item
descriptors. :/

> Then again I guess it's mysterious if something scribbles on
> our incore log and the fs just shuts down... so I guess I'd be ok with
> clearing the flag so long as /something/ gets logged about in-memory
> state being weird even on production kernels.

I'm just going to make the coherency problem go away completely.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: handle inconsistent log item formatting state correctly
  2018-03-08  4:40             ` Dave Chinner
@ 2018-03-08 19:11               ` Brian Foster
  2018-03-08 22:10                 ` Dave Chinner
  0 siblings, 1 reply; 16+ messages in thread
From: Brian Foster @ 2018-03-08 19:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, linux-xfs

On Thu, Mar 08, 2018 at 03:40:17PM +1100, Dave Chinner wrote:
> On Tue, Mar 06, 2018 at 06:12:45PM -0800, Darrick J. Wong wrote:
> > On Tue, Mar 06, 2018 at 08:16:02PM -0500, Brian Foster wrote:
> > > On Wed, Mar 07, 2018 at 09:14:41AM +1100, Dave Chinner wrote:
> > > > On Tue, Mar 06, 2018 at 07:31:56AM -0500, Brian Foster wrote:
> > > My argument is not that it's not possible. My argument is that the
> > > semantics of XFS_LID_DIRTY suggest the transaction modified the object
> > > in memory. Today, that means we log a range of the object, log an
> > > invalidation or order a buffer. E.g., we modify an actual metadata
> > > object, dirty the item (log or order), dirty the lidp and dirty the
> > > transaction. If we didn't ever modify a particular object, then there's
> > > no reason to dirty it that I can see. Failing to log/order the object
> > > properly doesn't justify assuming it hasn't been modified IMO, we don't
> > > really know either way.
> > > 
> > > The way the flag is used seems to bear that out. It's set when an object
> > > has been modified by a transaction. The way the flag alters behavior
> > > suggests the same. Objects that have been dirtied by a transaction
> > > cannot be released from that transaction and a cancel of a transaction
> > > with a dirty lidp causes fs shutdown (because we dirty the transaction
> > > once we dirty a lidp).
> 
> I'll point out that I've just stumbled onto a series of bugs where
> log items are multiply joined to a single transaction, in which case
> the lidp state may not reflect the current state of the log item
> because the log item no longer points to the lidp in question.
> 
> This also raises questions about what happens when we process a log
> item twice in the cil commit infrastructure - two formatting passes,
> multiple inserts into the CIL list, multiple calls to
> iop_committing/iop_unlock when the transaction is freed, etc.
> There's lots of shit that could go wrong as a result of this type of
> bug...
> 
> > > A cancel of that same transaction would shutdown the fs because it
> > > dirtied (i.e., presumably modified) an object. So we can't cancel the
> > > transaction for risk of corruption, we can't release the object from the
> > > transaction, yet this patch proposes behavior where a commit of that
> > > transaction can silently undirty the lidp, commit whatever else might be
> > > in the transaction and carry on as if nothing were wrong.
> > > 
> > > At the very least, this is inconsistent with how this flag is used
> > > everywhere else. How do you explain that?
> > 
> > So the state "li dirty, lip !ordered, niovecs == 0" is an invalid state,
> > and this patch proposes that if we ever see this invalid state then we
> > decide that no the buffer isn't dirty since there are zero iovecs.  This
> > prevents the log from allocating anything for this item since there's no
> > evidence of anything being dirty.
> 
> Essentially. The log item dirty state is the thing we trust right
> through the log item life cycle. It's fundamental to the relogging
> algorithm we use to keep dirty metadata moving forwards through the
> log.
> 
> The log item descriptor, OTOH, was just an abstraction that allowed
> the transaction commit to couple the log item formatting to the
> xlog_write() vector calls, which is something that went away with
> delayed logging about 8 years ago. The only piece of the log item
> descriptor that remained was the dirty flag, and the issue here
> boils down to one simple question: which dirty state do we trust -
> the log item or the descriptor?
> 
> That, as an architectural question, is a no brainer. It's the log
> item state that matters. The log item descriptor is an abstraction
> long past it's use-by date, so I'm going to resolve this problem
> simply by removing it (if you are wondering how I found the
> mulitply-joined log item bugs...).
> 

Ooh, we're in danger of making some progress here... :P

So this all suggests to me that you see the lidp dirty state as
duplicative with the log item dirty state. That is quite different from
the perception I have from reading the code (a perception which I tried
my best to describe so you, or somebody, could set me straight if
necessary). Then again...

> In doing this, we no longer have the question of which one to trust
> - all the "dirty in transaction" state is carried on the log item
> itself and is valid only during the life of a transaction.  At
> which point, there should be no possibility of the log item dirty
> flag getting out of step with it's dirty state, and we can simply
> add asserts in the write place to validate this.
> 

This more describes the lidp object as duplicative, while the dirty
state it supports may still be unique with respect to the lip dirty
state (which is what I thought to begin with). In other words, we still
need to differentiate between a dirty lip that might be in the log
pipeline and a log item that is "dirty in a transaction," because afaict
it's still possible to have a dirty log item in a clean transaction.

So which of the above is more accurate? If the latter, doesn't the
"dirty in transaction" state we'd move over to the lip reflect the same
meaning as XFS_LID_DIRTY today? If so, ISTM that we should still not
expect to have a "dirty in transaction" log item unless the log item
itself is dirty. Hm?

(All the other bits around multi-join bugs and whatnot certainly sound
legitimately borked, I'm just still trying to make sense of the bit that
relates to this patch. I'll try and make sense of the other stuff when
patches are available.).

Brian (who will be online intermittently for the next several days due
to power outages...)

> > As for clearing LID_DIRTY, someone handed us a log item in an
> > inconsistent state, so perhaps we should shut down the fs instead?  I
> > get that LID_DIRTY is a flag that is a proxy for counting the iovecs,
> > but if the iovec array and the flag disagree then something funny is
> > going on.
> 
> Yup, like we have a multiply joined log item and stale log item
> descriptors. :/
> 
> > Then again I guess it's mysterious if something scribbles on
> > our incore log and the fs just shuts down... so I guess I'd be ok with
> > clearing the flag so long as /something/ gets logged about in-memory
> > state being weird even on production kernels.
> 
> I'm just going to make the coherency problem go away completely.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: handle inconsistent log item formatting state correctly
  2018-03-08 19:11               ` Brian Foster
@ 2018-03-08 22:10                 ` Dave Chinner
  2018-03-09 15:47                   ` Brian Foster
  0 siblings, 1 reply; 16+ messages in thread
From: Dave Chinner @ 2018-03-08 22:10 UTC (permalink / raw)
  To: Brian Foster; +Cc: Darrick J. Wong, linux-xfs

On Thu, Mar 08, 2018 at 02:11:13PM -0500, Brian Foster wrote:
> On Thu, Mar 08, 2018 at 03:40:17PM +1100, Dave Chinner wrote:
> > On Tue, Mar 06, 2018 at 06:12:45PM -0800, Darrick J. Wong wrote:
> > > On Tue, Mar 06, 2018 at 08:16:02PM -0500, Brian Foster wrote:
> > > > On Wed, Mar 07, 2018 at 09:14:41AM +1100, Dave Chinner wrote:
> > > > > On Tue, Mar 06, 2018 at 07:31:56AM -0500, Brian Foster wrote:
> > > > My argument is not that it's not possible. My argument is that the
> > > > semantics of XFS_LID_DIRTY suggest the transaction modified the object
> > > > in memory. Today, that means we log a range of the object, log an
> > > > invalidation or order a buffer. E.g., we modify an actual metadata
> > > > object, dirty the item (log or order), dirty the lidp and dirty the
> > > > transaction. If we didn't ever modify a particular object, then there's
> > > > no reason to dirty it that I can see. Failing to log/order the object
> > > > properly doesn't justify assuming it hasn't been modified IMO, we don't
> > > > really know either way.
> > > > 
> > > > The way the flag is used seems to bear that out. It's set when an object
> > > > has been modified by a transaction. The way the flag alters behavior
> > > > suggests the same. Objects that have been dirtied by a transaction
> > > > cannot be released from that transaction and a cancel of a transaction
> > > > with a dirty lidp causes fs shutdown (because we dirty the transaction
> > > > once we dirty a lidp).
> > 
> > I'll point out that I've just stumbled onto a series of bugs where
> > log items are multiply joined to a single transaction, in which case
> > the lidp state may not reflect the current state of the log item
> > because the log item no longer points to the lidp in question.
> > 
> > This also raises questions about what happens when we process a log
> > item twice in the cil commit infrastructure - two formatting passes,
> > multiple inserts into the CIL list, multiple calls to
> > iop_committing/iop_unlock when the transaction is freed, etc.
> > There's lots of shit that could go wrong as a result of this type of
> > bug...
> > 
> > > > A cancel of that same transaction would shutdown the fs because it
> > > > dirtied (i.e., presumably modified) an object. So we can't cancel the
> > > > transaction for risk of corruption, we can't release the object from the
> > > > transaction, yet this patch proposes behavior where a commit of that
> > > > transaction can silently undirty the lidp, commit whatever else might be
> > > > in the transaction and carry on as if nothing were wrong.
> > > > 
> > > > At the very least, this is inconsistent with how this flag is used
> > > > everywhere else. How do you explain that?
> > > 
> > > So the state "li dirty, lip !ordered, niovecs == 0" is an invalid state,
> > > and this patch proposes that if we ever see this invalid state then we
> > > decide that no the buffer isn't dirty since there are zero iovecs.  This
> > > prevents the log from allocating anything for this item since there's no
> > > evidence of anything being dirty.
> > 
> > Essentially. The log item dirty state is the thing we trust right
> > through the log item life cycle. It's fundamental to the relogging
> > algorithm we use to keep dirty metadata moving forwards through the
> > log.
> > 
> > The log item descriptor, OTOH, was just an abstraction that allowed
> > the transaction commit to couple the log item formatting to the
> > xlog_write() vector calls, which is something that went away with
> > delayed logging about 8 years ago. The only piece of the log item
> > descriptor that remained was the dirty flag, and the issue here
> > boils down to one simple question: which dirty state do we trust -
> > the log item or the descriptor?
> > 
> > That, as an architectural question, is a no brainer. It's the log
> > item state that matters. The log item descriptor is an abstraction
> > long past it's use-by date, so I'm going to resolve this problem
> > simply by removing it (if you are wondering how I found the
> > mulitply-joined log item bugs...).
> > 
> 
> Ooh, we're in danger of making some progress here... :P
> 
> So this all suggests to me that you see the lidp dirty state as
> duplicative with the log item dirty state.

Well, sort of.

> That is quite different from
> the perception I have from reading the code (a perception which I tried
> my best to describe so you, or somebody, could set me straight if
> necessary). Then again...

The log item descriptor used to link the transaction to the log item
as it passed though the journal. Transactions used to be freed on
log IO completion - their life cycle changed drastically with
delayed logging, such that they are now freed by xfs_trans_commit(),
rather than being attached to the iclogbuf (as the CIL checkpoint
transaction now is) and freed on log IO completion. i.e. their
functionality was effective replaced by the xfs_log_vec that we now
allocate and format during transaction commit.

IOWs, if we can't/don't create a xfs_log_vec that will be written to
the log during formatting, then it doesn't matter what the lid says
- there's nothing to write to the log, so we don't add the object to
the CIL....

> > In doing this, we no longer have the question of which one to trust
> > - all the "dirty in transaction" state is carried on the log item
> > itself and is valid only during the life of a transaction.  At
> > which point, there should be no possibility of the log item dirty
> > flag getting out of step with it's dirty state, and we can simply
> > add asserts in the write place to validate this.
> > 
> 
> This more describes the lidp object as duplicative, while the dirty
> state it supports may still be unique with respect to the lip dirty
> state (which is what I thought to begin with). In other words, we still
> need to differentiate between a dirty lip that might be in the log
> pipeline and a log item that is "dirty in a transaction," because afaict
> it's still possible to have a dirty log item in a clean transaction.

Yes, that is possible. In which case, it doesn't get formatted again
by the current transaction commit because it's already been
formatted and tracked by the CIL/AIL correctly. It's essentially
just an optimisation to avoid redundant relogging of log items.

> So which of the above is more accurate? If the latter, doesn't the
> "dirty in transaction" state we'd move over to the lip reflect the same
> meaning as XFS_LID_DIRTY today? If so, ISTM that we should still not
> expect to have a "dirty in transaction" log item unless the log item
> itself is dirty. Hm?

Yup, but now it's all held in the one object and there's no
possibility of getting that out of sync anymore as the state is
managed directly when the log item is either dirtied or removed
from the transaction. We're not reliant on managing lids correctly
for it to be correct.

/me realises that the addition of the XFS_LI_DIRTY flag to the log
item means that the XFS_BLI_LOGGED flag is now redundant - it
mirrors the state of the XFS_LID_DIRTY flag and was only ever used
to assert that the buffer was actually logged by the transaction
that is trying to format the item...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: handle inconsistent log item formatting state correctly
  2018-03-08 22:10                 ` Dave Chinner
@ 2018-03-09 15:47                   ` Brian Foster
  2018-03-24 17:05                     ` Darrick J. Wong
  0 siblings, 1 reply; 16+ messages in thread
From: Brian Foster @ 2018-03-09 15:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, linux-xfs

On Fri, Mar 09, 2018 at 09:10:38AM +1100, Dave Chinner wrote:
> On Thu, Mar 08, 2018 at 02:11:13PM -0500, Brian Foster wrote:
> > On Thu, Mar 08, 2018 at 03:40:17PM +1100, Dave Chinner wrote:
> > > On Tue, Mar 06, 2018 at 06:12:45PM -0800, Darrick J. Wong wrote:
> > > > On Tue, Mar 06, 2018 at 08:16:02PM -0500, Brian Foster wrote:
> > > > > On Wed, Mar 07, 2018 at 09:14:41AM +1100, Dave Chinner wrote:
> > > > > > On Tue, Mar 06, 2018 at 07:31:56AM -0500, Brian Foster wrote:
> > > > > My argument is not that it's not possible. My argument is that the
> > > > > semantics of XFS_LID_DIRTY suggest the transaction modified the object
> > > > > in memory. Today, that means we log a range of the object, log an
> > > > > invalidation or order a buffer. E.g., we modify an actual metadata
> > > > > object, dirty the item (log or order), dirty the lidp and dirty the
> > > > > transaction. If we didn't ever modify a particular object, then there's
> > > > > no reason to dirty it that I can see. Failing to log/order the object
> > > > > properly doesn't justify assuming it hasn't been modified IMO, we don't
> > > > > really know either way.
> > > > > 
> > > > > The way the flag is used seems to bear that out. It's set when an object
> > > > > has been modified by a transaction. The way the flag alters behavior
> > > > > suggests the same. Objects that have been dirtied by a transaction
> > > > > cannot be released from that transaction and a cancel of a transaction
> > > > > with a dirty lidp causes fs shutdown (because we dirty the transaction
> > > > > once we dirty a lidp).
> > > 
> > > I'll point out that I've just stumbled onto a series of bugs where
> > > log items are multiply joined to a single transaction, in which case
> > > the lidp state may not reflect the current state of the log item
> > > because the log item no longer points to the lidp in question.
> > > 
> > > This also raises questions about what happens when we process a log
> > > item twice in the cil commit infrastructure - two formatting passes,
> > > multiple inserts into the CIL list, multiple calls to
> > > iop_committing/iop_unlock when the transaction is freed, etc.
> > > There's lots of shit that could go wrong as a result of this type of
> > > bug...
> > > 
> > > > > A cancel of that same transaction would shutdown the fs because it
> > > > > dirtied (i.e., presumably modified) an object. So we can't cancel the
> > > > > transaction for risk of corruption, we can't release the object from the
> > > > > transaction, yet this patch proposes behavior where a commit of that
> > > > > transaction can silently undirty the lidp, commit whatever else might be
> > > > > in the transaction and carry on as if nothing were wrong.
> > > > > 
> > > > > At the very least, this is inconsistent with how this flag is used
> > > > > everywhere else. How do you explain that?
> > > > 
> > > > So the state "li dirty, lip !ordered, niovecs == 0" is an invalid state,
> > > > and this patch proposes that if we ever see this invalid state then we
> > > > decide that no the buffer isn't dirty since there are zero iovecs.  This
> > > > prevents the log from allocating anything for this item since there's no
> > > > evidence of anything being dirty.
> > > 
> > > Essentially. The log item dirty state is the thing we trust right
> > > through the log item life cycle. It's fundamental to the relogging
> > > algorithm we use to keep dirty metadata moving forwards through the
> > > log.
> > > 
> > > The log item descriptor, OTOH, was just an abstraction that allowed
> > > the transaction commit to couple the log item formatting to the
> > > xlog_write() vector calls, which is something that went away with
> > > delayed logging about 8 years ago. The only piece of the log item
> > > descriptor that remained was the dirty flag, and the issue here
> > > boils down to one simple question: which dirty state do we trust -
> > > the log item or the descriptor?
> > > 
> > > That, as an architectural question, is a no brainer. It's the log
> > > item state that matters. The log item descriptor is an abstraction
> > > long past it's use-by date, so I'm going to resolve this problem
> > > simply by removing it (if you are wondering how I found the
> > > mulitply-joined log item bugs...).
> > > 
> > 
> > Ooh, we're in danger of making some progress here... :P
> > 
> > So this all suggests to me that you see the lidp dirty state as
> > duplicative with the log item dirty state.
> 
> Well, sort of.
> 
> > That is quite different from
> > the perception I have from reading the code (a perception which I tried
> > my best to describe so you, or somebody, could set me straight if
> > necessary). Then again...
> 
> The log item descriptor used to link the transaction to the log item
> as it passed though the journal. Transactions used to be freed on
> log IO completion - their life cycle changed drastically with
> delayed logging, such that they are now freed by xfs_trans_commit(),
> rather than being attached to the iclogbuf (as the CIL checkpoint
> transaction now is) and freed on log IO completion. i.e. their
> functionality was effective replaced by the xfs_log_vec that we now
> allocate and format during transaction commit.
> 

Ok, that makes sense. Pre-CIL is before my time, but we essentially
disconnected the transaction from the full lifecycle up through log I/O
completion and replaced it with the log vector, so we can reformat items
that are relogged before the vectors are written out during a
checkpoint.

> IOWs, if we can't/don't create a xfs_log_vec that will be written to
> the log during formatting, then it doesn't matter what the lid says
> - there's nothing to write to the log, so we don't add the object to
> the CIL....
> 

Yes, there is no point in adding a physically unlogged log item to the
CIL. My question is bigger picture than the CIL..

> > > In doing this, we no longer have the question of which one to trust
> > > - all the "dirty in transaction" state is carried on the log item
> > > itself and is valid only during the life of a transaction.  At
> > > which point, there should be no possibility of the log item dirty
> > > flag getting out of step with it's dirty state, and we can simply
> > > add asserts in the write place to validate this.
> > > 
> > 
> > This more describes the lidp object as duplicative, while the dirty
> > state it supports may still be unique with respect to the lip dirty
> > state (which is what I thought to begin with). In other words, we still
> > need to differentiate between a dirty lip that might be in the log
> > pipeline and a log item that is "dirty in a transaction," because afaict
> > it's still possible to have a dirty log item in a clean transaction.
> 
> Yes, that is possible. In which case, it doesn't get formatted again
> by the current transaction commit because it's already been
> formatted and tracked by the CIL/AIL correctly. It's essentially
> just an optimisation to avoid redundant relogging of log items.
> 

Ok, that's pretty much how I understood it. Thanks for describing some
of this background/viewpoint. At the very least this starts to isolate
where we are looking at this differently.

> > So which of the above is more accurate? If the latter, doesn't the
> > "dirty in transaction" state we'd move over to the lip reflect the same
> > meaning as XFS_LID_DIRTY today? If so, ISTM that we should still not
> > expect to have a "dirty in transaction" log item unless the log item
> > itself is dirty. Hm?
> 
> Yup, but now it's all held in the one object and there's no
> possibility of getting that out of sync anymore as the state is
> managed directly when the log item is either dirtied or removed
> from the transaction. We're not reliant on managing lids correctly
> for it to be correct.
> 

Ok. FWIW, I think you're completely missing the point I'm trying to make
with regard to this patch. It really has nothing to do with the CIL or
the log item, or really the lidp for that matter if we think about in
terms of the extra lidp background you've documented. Just assume the
lidp is gone.. all that matters here are the fundamental "log item
dirty" and "log item dirty in transaction" states and the distinction
between them.

The point is that if we have a transaction that holds an item "dirty in
the transaction" but the item itself is not dirty, we should probably
consider that the in-core metadata had been modified and treat the
transaction as inconsistent because we only dirty an item in a
transaction when it's associated metadata has been physically modified.
Of course, we don't know that for sure in this "shouldn't currently
happen" case, but we do already consider it an error to cancel a dirty
transaction clearing the "dirty in transaction" state from a particular
item is logically equivalent to cancelling that subset of the
transaction.

I completely agree that we should not insert that log vector in the CIL.
That part seems obvious to me. I'm simply suggesting that rather than
being so focused on skipping the particular lidp/lip, we probably
shouldn't commit the entire transaction. Rather, force it into the error
path that is analogous to a cancel (i.e., shutdown the fs).

This is technically a trivial change to the current patch so I'm quite
curious what drives resistance to the point of motivating elimination of
the lidp. Of course all that work might be independently
correct/overdue/worthwhile, but it's presented here as if it washes away
the fundamental discrepancy I'm describing. In reality, so long as we
maintain distinct "dirty log item" and "dirty in transaction" states, it
doesn't change the point.

Brian

P.S., IOW, fold something like the appended into the original patch. As
easy as it is to add, it can just as easily be removed if "dirty in tx"
+ "clean item" turns into an explicitly handled state in the future. In
the meantime, this turns potential bugs like the symlink bug in the buf
log range patch into a shutdown that prevents the silent corruption that
would occur if the transaction had committed.

--- 8< ---

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index f190d1b84c0d..d5d8b88a8aaa 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -170,6 +170,15 @@ xlog_cil_alloc_shadow_bufs(
 		}
 
 		/*
+		 * If the transaction dirtied an item but didn't tell us how to
+		 * log it, something is seriously wrong. We have to assume the
+		 * associated in-core metadata was modified. Don't risk
+		 * corruption by committing the transaction.
+		 */
+		if (!niovecs && !ordered)
+			xfs_force_shutdown(log->l_mp, SHUTDOWN_CORRUPT_INCORE);
+
+		/*
 		 * We 64-bit align the length of each iovec so that the start
 		 * of the next one is naturally aligned.  We'll need to
 		 * account for that slack space here. Then round nbytes up
@@ -352,6 +361,7 @@ xlog_cil_insert_format_items(
 		 * added to the CIL by mistake.
 		 */
 		if (!shadow->lv_niovecs && !ordered) {
+			ASSERT(XFS_FORCED_SHUTDOWN(log->l_mp));
 			lidp->lid_flags &= ~XFS_LID_DIRTY;
 			continue;
 		}

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: handle inconsistent log item formatting state correctly
  2018-03-09 15:47                   ` Brian Foster
@ 2018-03-24 17:05                     ` Darrick J. Wong
  2018-03-26 11:36                       ` Brian Foster
  0 siblings, 1 reply; 16+ messages in thread
From: Darrick J. Wong @ 2018-03-24 17:05 UTC (permalink / raw)
  To: Brian Foster; +Cc: Dave Chinner, linux-xfs

On Fri, Mar 09, 2018 at 10:47:04AM -0500, Brian Foster wrote:
> On Fri, Mar 09, 2018 at 09:10:38AM +1100, Dave Chinner wrote:
> > On Thu, Mar 08, 2018 at 02:11:13PM -0500, Brian Foster wrote:
> > > On Thu, Mar 08, 2018 at 03:40:17PM +1100, Dave Chinner wrote:
> > > > On Tue, Mar 06, 2018 at 06:12:45PM -0800, Darrick J. Wong wrote:
> > > > > On Tue, Mar 06, 2018 at 08:16:02PM -0500, Brian Foster wrote:
> > > > > > On Wed, Mar 07, 2018 at 09:14:41AM +1100, Dave Chinner wrote:
> > > > > > > On Tue, Mar 06, 2018 at 07:31:56AM -0500, Brian Foster wrote:
> > > > > > My argument is not that it's not possible. My argument is that the
> > > > > > semantics of XFS_LID_DIRTY suggest the transaction modified the object
> > > > > > in memory. Today, that means we log a range of the object, log an
> > > > > > invalidation or order a buffer. E.g., we modify an actual metadata
> > > > > > object, dirty the item (log or order), dirty the lidp and dirty the
> > > > > > transaction. If we didn't ever modify a particular object, then there's
> > > > > > no reason to dirty it that I can see. Failing to log/order the object
> > > > > > properly doesn't justify assuming it hasn't been modified IMO, we don't
> > > > > > really know either way.
> > > > > > 
> > > > > > The way the flag is used seems to bear that out. It's set when an object
> > > > > > has been modified by a transaction. The way the flag alters behavior
> > > > > > suggests the same. Objects that have been dirtied by a transaction
> > > > > > cannot be released from that transaction and a cancel of a transaction
> > > > > > with a dirty lidp causes fs shutdown (because we dirty the transaction
> > > > > > once we dirty a lidp).
> > > > 
> > > > I'll point out that I've just stumbled onto a series of bugs where
> > > > log items are multiply joined to a single transaction, in which case
> > > > the lidp state may not reflect the current state of the log item
> > > > because the log item no longer points to the lidp in question.
> > > > 
> > > > This also raises questions about what happens when we process a log
> > > > item twice in the cil commit infrastructure - two formatting passes,
> > > > multiple inserts into the CIL list, multiple calls to
> > > > iop_committing/iop_unlock when the transaction is freed, etc.
> > > > There's lots of shit that could go wrong as a result of this type of
> > > > bug...
> > > > 
> > > > > > A cancel of that same transaction would shutdown the fs because it
> > > > > > dirtied (i.e., presumably modified) an object. So we can't cancel the
> > > > > > transaction for risk of corruption, we can't release the object from the
> > > > > > transaction, yet this patch proposes behavior where a commit of that
> > > > > > transaction can silently undirty the lidp, commit whatever else might be
> > > > > > in the transaction and carry on as if nothing were wrong.
> > > > > > 
> > > > > > At the very least, this is inconsistent with how this flag is used
> > > > > > everywhere else. How do you explain that?
> > > > > 
> > > > > So the state "li dirty, lip !ordered, niovecs == 0" is an invalid state,
> > > > > and this patch proposes that if we ever see this invalid state then we
> > > > > decide that no the buffer isn't dirty since there are zero iovecs.  This
> > > > > prevents the log from allocating anything for this item since there's no
> > > > > evidence of anything being dirty.
> > > > 
> > > > Essentially. The log item dirty state is the thing we trust right
> > > > through the log item life cycle. It's fundamental to the relogging
> > > > algorithm we use to keep dirty metadata moving forwards through the
> > > > log.
> > > > 
> > > > The log item descriptor, OTOH, was just an abstraction that allowed
> > > > the transaction commit to couple the log item formatting to the
> > > > xlog_write() vector calls, which is something that went away with
> > > > delayed logging about 8 years ago. The only piece of the log item
> > > > descriptor that remained was the dirty flag, and the issue here
> > > > boils down to one simple question: which dirty state do we trust -
> > > > the log item or the descriptor?
> > > > 
> > > > That, as an architectural question, is a no brainer. It's the log
> > > > item state that matters. The log item descriptor is an abstraction
> > > > long past it's use-by date, so I'm going to resolve this problem
> > > > simply by removing it (if you are wondering how I found the
> > > > mulitply-joined log item bugs...).
> > > > 
> > > 
> > > Ooh, we're in danger of making some progress here... :P
> > > 
> > > So this all suggests to me that you see the lidp dirty state as
> > > duplicative with the log item dirty state.
> > 
> > Well, sort of.
> > 
> > > That is quite different from
> > > the perception I have from reading the code (a perception which I tried
> > > my best to describe so you, or somebody, could set me straight if
> > > necessary). Then again...
> > 
> > The log item descriptor used to link the transaction to the log item
> > as it passed though the journal. Transactions used to be freed on
> > log IO completion - their life cycle changed drastically with
> > delayed logging, such that they are now freed by xfs_trans_commit(),
> > rather than being attached to the iclogbuf (as the CIL checkpoint
> > transaction now is) and freed on log IO completion. i.e. their
> > functionality was effective replaced by the xfs_log_vec that we now
> > allocate and format during transaction commit.
> > 
> 
> Ok, that makes sense. Pre-CIL is before my time, but we essentially
> disconnected the transaction from the full lifecycle up through log I/O
> completion and replaced it with the log vector, so we can reformat items
> that are relogged before the vectors are written out during a
> checkpoint.
> 
> > IOWs, if we can't/don't create a xfs_log_vec that will be written to
> > the log during formatting, then it doesn't matter what the lid says
> > - there's nothing to write to the log, so we don't add the object to
> > the CIL....
> > 
> 
> Yes, there is no point in adding a physically unlogged log item to the
> CIL. My question is bigger picture than the CIL..
> 
> > > > In doing this, we no longer have the question of which one to trust
> > > > - all the "dirty in transaction" state is carried on the log item
> > > > itself and is valid only during the life of a transaction.  At
> > > > which point, there should be no possibility of the log item dirty
> > > > flag getting out of step with it's dirty state, and we can simply
> > > > add asserts in the write place to validate this.
> > > > 
> > > 
> > > This more describes the lidp object as duplicative, while the dirty
> > > state it supports may still be unique with respect to the lip dirty
> > > state (which is what I thought to begin with). In other words, we still
> > > need to differentiate between a dirty lip that might be in the log
> > > pipeline and a log item that is "dirty in a transaction," because afaict
> > > it's still possible to have a dirty log item in a clean transaction.
> > 
> > Yes, that is possible. In which case, it doesn't get formatted again
> > by the current transaction commit because it's already been
> > formatted and tracked by the CIL/AIL correctly. It's essentially
> > just an optimisation to avoid redundant relogging of log items.
> > 
> 
> Ok, that's pretty much how I understood it. Thanks for describing some
> of this background/viewpoint. At the very least this starts to isolate
> where we are looking at this differently.
> 
> > > So which of the above is more accurate? If the latter, doesn't the
> > > "dirty in transaction" state we'd move over to the lip reflect the same
> > > meaning as XFS_LID_DIRTY today? If so, ISTM that we should still not
> > > expect to have a "dirty in transaction" log item unless the log item
> > > itself is dirty. Hm?
> > 
> > Yup, but now it's all held in the one object and there's no
> > possibility of getting that out of sync anymore as the state is
> > managed directly when the log item is either dirtied or removed
> > from the transaction. We're not reliant on managing lids correctly
> > for it to be correct.
> > 
> 
> Ok. FWIW, I think you're completely missing the point I'm trying to make
> with regard to this patch. It really has nothing to do with the CIL or
> the log item, or really the lidp for that matter if we think about in
> terms of the extra lidp background you've documented. Just assume the
> lidp is gone.. all that matters here are the fundamental "log item
> dirty" and "log item dirty in transaction" states and the distinction
> between them.
> 
> The point is that if we have a transaction that holds an item "dirty in
> the transaction" but the item itself is not dirty, we should probably
> consider that the in-core metadata had been modified and treat the
> transaction as inconsistent because we only dirty an item in a
> transaction when it's associated metadata has been physically modified.
> Of course, we don't know that for sure in this "shouldn't currently
> happen" case, but we do already consider it an error to cancel a dirty
> transaction clearing the "dirty in transaction" state from a particular
> item is logically equivalent to cancelling that subset of the
> transaction.
> 
> I completely agree that we should not insert that log vector in the CIL.
> That part seems obvious to me. I'm simply suggesting that rather than
> being so focused on skipping the particular lidp/lip, we probably
> shouldn't commit the entire transaction. Rather, force it into the error
> path that is analogous to a cancel (i.e., shutdown the fs).
> 
> This is technically a trivial change to the current patch so I'm quite
> curious what drives resistance to the point of motivating elimination of
> the lidp. Of course all that work might be independently
> correct/overdue/worthwhile, but it's presented here as if it washes away
> the fundamental discrepancy I'm describing. In reality, so long as we
> maintain distinct "dirty log item" and "dirty in transaction" states, it
> doesn't change the point.

Sooo... afaict, the upstream kernel doesn't seem to be at a loss for not
having either of these two patches.  Dave's patch makes it so that if we
screw up the dirty state between the log item & log item descriptor
we'll trust the log item and not stumble into a crash.  Brian's patch
below seems to have the viewpoint that if this happens it's evidence of
either a software bug or memory corruption, so let's go offline, at
least until we decide that we actually want to support that situation
because some upper level xfs code wants it, or someone rips out the log
item descriptors.

I think the main reason to take Dave's patch is a defensive one: if
anyone else screws up logging in this manner the kernel will stay up.
This seems reasonable to me.

Moving on to the second patch (below), is there any reason /not/ to
clean it up and pull it in at the same time?  Nobody has come up with a
use case that requires a dirty log item descriptor and a clean log item,
so is there any reason why we shouldn't consider that state symptomatic
of /something/ being wrong in the kernel or memory and shutdown as a
precaution?

Ah well, I'll dump both of them in my tree and see if generic/388 has
anything interesting to say. :P

--D

> Brian
> 
> P.S., IOW, fold something like the appended into the original patch. As
> easy as it is to add, it can just as easily be removed if "dirty in tx"
> + "clean item" turns into an explicitly handled state in the future. In
> the meantime, this turns potential bugs like the symlink bug in the buf
> log range patch into a shutdown that prevents the silent corruption that
> would occur if the transaction had committed.
> 
> --- 8< ---
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index f190d1b84c0d..d5d8b88a8aaa 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -170,6 +170,15 @@ xlog_cil_alloc_shadow_bufs(
>  		}
>  
>  		/*
> +		 * If the transaction dirtied an item but didn't tell us how to
> +		 * log it, something is seriously wrong. We have to assume the
> +		 * associated in-core metadata was modified. Don't risk
> +		 * corruption by committing the transaction.
> +		 */
> +		if (!niovecs && !ordered)
> +			xfs_force_shutdown(log->l_mp, SHUTDOWN_CORRUPT_INCORE);
> +
> +		/*
>  		 * We 64-bit align the length of each iovec so that the start
>  		 * of the next one is naturally aligned.  We'll need to
>  		 * account for that slack space here. Then round nbytes up
> @@ -352,6 +361,7 @@ xlog_cil_insert_format_items(
>  		 * added to the CIL by mistake.
>  		 */
>  		if (!shadow->lv_niovecs && !ordered) {
> +			ASSERT(XFS_FORCED_SHUTDOWN(log->l_mp));
>  			lidp->lid_flags &= ~XFS_LID_DIRTY;
>  			continue;
>  		}
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] xfs: handle inconsistent log item formatting state correctly
  2018-03-24 17:05                     ` Darrick J. Wong
@ 2018-03-26 11:36                       ` Brian Foster
  0 siblings, 0 replies; 16+ messages in thread
From: Brian Foster @ 2018-03-26 11:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs

On Sat, Mar 24, 2018 at 10:05:42AM -0700, Darrick J. Wong wrote:
> On Fri, Mar 09, 2018 at 10:47:04AM -0500, Brian Foster wrote:
> > On Fri, Mar 09, 2018 at 09:10:38AM +1100, Dave Chinner wrote:
> > > On Thu, Mar 08, 2018 at 02:11:13PM -0500, Brian Foster wrote:
> > > > On Thu, Mar 08, 2018 at 03:40:17PM +1100, Dave Chinner wrote:
> > > > > On Tue, Mar 06, 2018 at 06:12:45PM -0800, Darrick J. Wong wrote:
> > > > > > On Tue, Mar 06, 2018 at 08:16:02PM -0500, Brian Foster wrote:
> > > > > > > On Wed, Mar 07, 2018 at 09:14:41AM +1100, Dave Chinner wrote:
> > > > > > > > On Tue, Mar 06, 2018 at 07:31:56AM -0500, Brian Foster wrote:
> > > > > > > My argument is not that it's not possible. My argument is that the
> > > > > > > semantics of XFS_LID_DIRTY suggest the transaction modified the object
> > > > > > > in memory. Today, that means we log a range of the object, log an
> > > > > > > invalidation or order a buffer. E.g., we modify an actual metadata
> > > > > > > object, dirty the item (log or order), dirty the lidp and dirty the
> > > > > > > transaction. If we didn't ever modify a particular object, then there's
> > > > > > > no reason to dirty it that I can see. Failing to log/order the object
> > > > > > > properly doesn't justify assuming it hasn't been modified IMO, we don't
> > > > > > > really know either way.
> > > > > > > 
> > > > > > > The way the flag is used seems to bear that out. It's set when an object
> > > > > > > has been modified by a transaction. The way the flag alters behavior
> > > > > > > suggests the same. Objects that have been dirtied by a transaction
> > > > > > > cannot be released from that transaction and a cancel of a transaction
> > > > > > > with a dirty lidp causes fs shutdown (because we dirty the transaction
> > > > > > > once we dirty a lidp).
> > > > > 
> > > > > I'll point out that I've just stumbled onto a series of bugs where
> > > > > log items are multiply joined to a single transaction, in which case
> > > > > the lidp state may not reflect the current state of the log item
> > > > > because the log item no longer points to the lidp in question.
> > > > > 
> > > > > This also raises questions about what happens when we process a log
> > > > > item twice in the cil commit infrastructure - two formatting passes,
> > > > > multiple inserts into the CIL list, multiple calls to
> > > > > iop_committing/iop_unlock when the transaction is freed, etc.
> > > > > There's lots of shit that could go wrong as a result of this type of
> > > > > bug...
> > > > > 
> > > > > > > A cancel of that same transaction would shutdown the fs because it
> > > > > > > dirtied (i.e., presumably modified) an object. So we can't cancel the
> > > > > > > transaction for risk of corruption, we can't release the object from the
> > > > > > > transaction, yet this patch proposes behavior where a commit of that
> > > > > > > transaction can silently undirty the lidp, commit whatever else might be
> > > > > > > in the transaction and carry on as if nothing were wrong.
> > > > > > > 
> > > > > > > At the very least, this is inconsistent with how this flag is used
> > > > > > > everywhere else. How do you explain that?
> > > > > > 
> > > > > > So the state "li dirty, lip !ordered, niovecs == 0" is an invalid state,
> > > > > > and this patch proposes that if we ever see this invalid state then we
> > > > > > decide that no the buffer isn't dirty since there are zero iovecs.  This
> > > > > > prevents the log from allocating anything for this item since there's no
> > > > > > evidence of anything being dirty.
> > > > > 
> > > > > Essentially. The log item dirty state is the thing we trust right
> > > > > through the log item life cycle. It's fundamental to the relogging
> > > > > algorithm we use to keep dirty metadata moving forwards through the
> > > > > log.
> > > > > 
> > > > > The log item descriptor, OTOH, was just an abstraction that allowed
> > > > > the transaction commit to couple the log item formatting to the
> > > > > xlog_write() vector calls, which is something that went away with
> > > > > delayed logging about 8 years ago. The only piece of the log item
> > > > > descriptor that remained was the dirty flag, and the issue here
> > > > > boils down to one simple question: which dirty state do we trust -
> > > > > the log item or the descriptor?
> > > > > 
> > > > > That, as an architectural question, is a no brainer. It's the log
> > > > > item state that matters. The log item descriptor is an abstraction
> > > > > long past it's use-by date, so I'm going to resolve this problem
> > > > > simply by removing it (if you are wondering how I found the
> > > > > mulitply-joined log item bugs...).
> > > > > 
> > > > 
> > > > Ooh, we're in danger of making some progress here... :P
> > > > 
> > > > So this all suggests to me that you see the lidp dirty state as
> > > > duplicative with the log item dirty state.
> > > 
> > > Well, sort of.
> > > 
> > > > That is quite different from
> > > > the perception I have from reading the code (a perception which I tried
> > > > my best to describe so you, or somebody, could set me straight if
> > > > necessary). Then again...
> > > 
> > > The log item descriptor used to link the transaction to the log item
> > > as it passed though the journal. Transactions used to be freed on
> > > log IO completion - their life cycle changed drastically with
> > > delayed logging, such that they are now freed by xfs_trans_commit(),
> > > rather than being attached to the iclogbuf (as the CIL checkpoint
> > > transaction now is) and freed on log IO completion. i.e. their
> > > functionality was effective replaced by the xfs_log_vec that we now
> > > allocate and format during transaction commit.
> > > 
> > 
> > Ok, that makes sense. Pre-CIL is before my time, but we essentially
> > disconnected the transaction from the full lifecycle up through log I/O
> > completion and replaced it with the log vector, so we can reformat items
> > that are relogged before the vectors are written out during a
> > checkpoint.
> > 
> > > IOWs, if we can't/don't create a xfs_log_vec that will be written to
> > > the log during formatting, then it doesn't matter what the lid says
> > > - there's nothing to write to the log, so we don't add the object to
> > > the CIL....
> > > 
> > 
> > Yes, there is no point in adding a physically unlogged log item to the
> > CIL. My question is bigger picture than the CIL..
> > 
> > > > > In doing this, we no longer have the question of which one to trust
> > > > > - all the "dirty in transaction" state is carried on the log item
> > > > > itself and is valid only during the life of a transaction.  At
> > > > > which point, there should be no possibility of the log item dirty
> > > > > flag getting out of step with it's dirty state, and we can simply
> > > > > add asserts in the write place to validate this.
> > > > > 
> > > > 
> > > > This more describes the lidp object as duplicative, while the dirty
> > > > state it supports may still be unique with respect to the lip dirty
> > > > state (which is what I thought to begin with). In other words, we still
> > > > need to differentiate between a dirty lip that might be in the log
> > > > pipeline and a log item that is "dirty in a transaction," because afaict
> > > > it's still possible to have a dirty log item in a clean transaction.
> > > 
> > > Yes, that is possible. In which case, it doesn't get formatted again
> > > by the current transaction commit because it's already been
> > > formatted and tracked by the CIL/AIL correctly. It's essentially
> > > just an optimisation to avoid redundant relogging of log items.
> > > 
> > 
> > Ok, that's pretty much how I understood it. Thanks for describing some
> > of this background/viewpoint. At the very least this starts to isolate
> > where we are looking at this differently.
> > 
> > > > So which of the above is more accurate? If the latter, doesn't the
> > > > "dirty in transaction" state we'd move over to the lip reflect the same
> > > > meaning as XFS_LID_DIRTY today? If so, ISTM that we should still not
> > > > expect to have a "dirty in transaction" log item unless the log item
> > > > itself is dirty. Hm?
> > > 
> > > Yup, but now it's all held in the one object and there's no
> > > possibility of getting that out of sync anymore as the state is
> > > managed directly when the log item is either dirtied or removed
> > > from the transaction. We're not reliant on managing lids correctly
> > > for it to be correct.
> > > 
> > 
> > Ok. FWIW, I think you're completely missing the point I'm trying to make
> > with regard to this patch. It really has nothing to do with the CIL or
> > the log item, or really the lidp for that matter if we think about in
> > terms of the extra lidp background you've documented. Just assume the
> > lidp is gone.. all that matters here are the fundamental "log item
> > dirty" and "log item dirty in transaction" states and the distinction
> > between them.
> > 
> > The point is that if we have a transaction that holds an item "dirty in
> > the transaction" but the item itself is not dirty, we should probably
> > consider that the in-core metadata had been modified and treat the
> > transaction as inconsistent because we only dirty an item in a
> > transaction when it's associated metadata has been physically modified.
> > Of course, we don't know that for sure in this "shouldn't currently
> > happen" case, but we do already consider it an error to cancel a dirty
> > transaction clearing the "dirty in transaction" state from a particular
> > item is logically equivalent to cancelling that subset of the
> > transaction.
> > 
> > I completely agree that we should not insert that log vector in the CIL.
> > That part seems obvious to me. I'm simply suggesting that rather than
> > being so focused on skipping the particular lidp/lip, we probably
> > shouldn't commit the entire transaction. Rather, force it into the error
> > path that is analogous to a cancel (i.e., shutdown the fs).
> > 
> > This is technically a trivial change to the current patch so I'm quite
> > curious what drives resistance to the point of motivating elimination of
> > the lidp. Of course all that work might be independently
> > correct/overdue/worthwhile, but it's presented here as if it washes away
> > the fundamental discrepancy I'm describing. In reality, so long as we
> > maintain distinct "dirty log item" and "dirty in transaction" states, it
> > doesn't change the point.
> 
> Sooo... afaict, the upstream kernel doesn't seem to be at a loss for not
> having either of these two patches.  Dave's patch makes it so that if we
> screw up the dirty state between the log item & log item descriptor
> we'll trust the log item and not stumble into a crash.  Brian's patch
> below seems to have the viewpoint that if this happens it's evidence of
> either a software bug or memory corruption, so let's go offline, at
> least until we decide that we actually want to support that situation
> because some upper level xfs code wants it, or someone rips out the log
> item descriptors.
> 

A software bug... sure, not so sure about a memory corruption. I suppose
that is always possible but the point is more that a dirty in
transaction item implies the in-core metadata structure may have been
modified. We don't know for sure, but there is non-zero risk of
filesystem corruption if we assume that it wasn't, leave it clean
in-core and commit the rest of the transaction.

> I think the main reason to take Dave's patch is a defensive one: if
> anyone else screws up logging in this manner the kernel will stay up.
> This seems reasonable to me.
> 

Indeed.

> Moving on to the second patch (below), is there any reason /not/ to
> clean it up and pull it in at the same time?  Nobody has come up with a
> use case that requires a dirty log item descriptor and a clean log item,
> so is there any reason why we shouldn't consider that state symptomatic
> of /something/ being wrong in the kernel or memory and shutdown as a
> precaution?
> 

If so (if not? if there is a reason not to..? :P), I'm clearly not aware
of it. The only state I can think of that is remotely close is an
ordered item. Ordered items have a flag that essentially allows them to
pass through to the AIL without being logged.

Also note that the hunk below was not intended as an independent patch.
It was a suggestion on how to address my feedback in the currently
proposed patch.

Brian

> Ah well, I'll dump both of them in my tree and see if generic/388 has
> anything interesting to say. :P
> 
> --D
> 
> > Brian
> > 
> > P.S., IOW, fold something like the appended into the original patch. As
> > easy as it is to add, it can just as easily be removed if "dirty in tx"
> > + "clean item" turns into an explicitly handled state in the future. In
> > the meantime, this turns potential bugs like the symlink bug in the buf
> > log range patch into a shutdown that prevents the silent corruption that
> > would occur if the transaction had committed.
> > 
> > --- 8< ---
> > 
> > diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> > index f190d1b84c0d..d5d8b88a8aaa 100644
> > --- a/fs/xfs/xfs_log_cil.c
> > +++ b/fs/xfs/xfs_log_cil.c
> > @@ -170,6 +170,15 @@ xlog_cil_alloc_shadow_bufs(
> >  		}
> >  
> >  		/*
> > +		 * If the transaction dirtied an item but didn't tell us how to
> > +		 * log it, something is seriously wrong. We have to assume the
> > +		 * associated in-core metadata was modified. Don't risk
> > +		 * corruption by committing the transaction.
> > +		 */
> > +		if (!niovecs && !ordered)
> > +			xfs_force_shutdown(log->l_mp, SHUTDOWN_CORRUPT_INCORE);
> > +
> > +		/*
> >  		 * We 64-bit align the length of each iovec so that the start
> >  		 * of the next one is naturally aligned.  We'll need to
> >  		 * account for that slack space here. Then round nbytes up
> > @@ -352,6 +361,7 @@ xlog_cil_insert_format_items(
> >  		 * added to the CIL by mistake.
> >  		 */
> >  		if (!shadow->lv_niovecs && !ordered) {
> > +			ASSERT(XFS_FORCED_SHUTDOWN(log->l_mp));
> >  			lidp->lid_flags &= ~XFS_LID_DIRTY;
> >  			continue;
> >  		}
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2018-03-26 11:36 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-01 22:35 [PATCH] xfs: handle inconsistent log item formatting state correctly Dave Chinner
2018-03-02 17:18 ` Darrick J. Wong
2018-03-02 21:52   ` Dave Chinner
2018-03-05 14:40 ` Brian Foster
2018-03-05 22:18   ` Dave Chinner
2018-03-06 12:31     ` Brian Foster
2018-03-06 22:14       ` Dave Chinner
2018-03-07  1:16         ` Brian Foster
2018-03-07  2:12           ` Darrick J. Wong
2018-03-07 14:05             ` Brian Foster
2018-03-08  4:40             ` Dave Chinner
2018-03-08 19:11               ` Brian Foster
2018-03-08 22:10                 ` Dave Chinner
2018-03-09 15:47                   ` Brian Foster
2018-03-24 17:05                     ` Darrick J. Wong
2018-03-26 11:36                       ` Brian Foster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.