All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH, RFC] xfs: don't break from growfs ag update loop on error
@ 2013-08-15 18:15 Eric Sandeen
  2013-09-09 20:36 ` Eric Sandeen
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Sandeen @ 2013-08-15 18:15 UTC (permalink / raw)
  To: 'linux-xfs@oss.sgi.com'

When xfs_growfs_data_private() is updating backup superblocks,
it bails out on the first error encountered, whether reading or
writing:

* If we get an error writing out the alternate superblocks,
* just issue a warning and continue.  The real work is
* already done and committed.

This can cause a problem later during repair, because repair
looks at all superblocks, and picks the most prevalent one
as correct.  If we bail out early in the backup superblock
loop, we can end up with more "bad" matching superblocks than
good, and a post-growfs repair may revert the filesystem to
the old geometry.

With the combination of superblock verifiers and old bugs,
we're more likely to encounter read errors due to verification.

And perhaps even worse, we don't even properly write any of the
newly-added superblocks in the new AGs.

Even with this change, growfs will still say:

  xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning
  data blocks changed from 319815680 to 335216640

which might be confusing to the user, but it at least communicates
that something has gone wrong, and dmesg will probably highlight
the need for an xfs_repair.

And this is still best-effort; if verifiers fail on more than
half the backup supers, they may still "win" - but that's probably
best left to repair to more gracefully handle by doing its own
strict verification as part of the backup super "voting."

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 614eb0c..70714bb 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -153,7 +153,7 @@ xfs_growfs_data_private(
 	xfs_buf_t		*bp;
 	int			bucket;
 	int			dpct;
-	int			error;
+	int			error, saved_error = 0;
 	xfs_agnumber_t		nagcount;
 	xfs_agnumber_t		nagimax = 0;
 	xfs_rfsblock_t		nb, nb_mod;
@@ -495,29 +495,33 @@ xfs_growfs_data_private(
 				error = ENOMEM;
 		}
 
+		/*
+		 * If we get an error reading or writing alternate superblocks,
+		 * continue.  xfs_repair chooses the "best" superblock based
+		 * on most matches; if we break early, we'll leave more
+		 * superblocks un-updated than updated, and xfs_repair may
+		 * pick them over the properly-updated primary.
+		 */
 		if (error) {
 			xfs_warn(mp,
 		"error %d reading secondary superblock for ag %d",
 				error, agno);
-			break;
+			saved_error = error;
+			continue;
 		}
 		xfs_sb_to_disk(XFS_BUF_TO_SBP(bp), &mp->m_sb, XFS_SB_ALL_BITS);
 
-		/*
-		 * If we get an error writing out the alternate superblocks,
-		 * just issue a warning and continue.  The real work is
-		 * already done and committed.
-		 */
 		error = xfs_bwrite(bp);
 		xfs_buf_relse(bp);
 		if (error) {
 			xfs_warn(mp,
 		"write error %d updating secondary superblock for ag %d",
 				error, agno);
-			break; /* no point in continuing */
+			saved_error = error;
+			continue;
 		}
 	}
-	return error;
+	return saved_error ? saved_error : error;
 
  error0:
 	xfs_trans_cancel(tp, XFS_TRANS_ABORT);


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH, RFC] xfs: don't break from growfs ag update loop on error
  2013-08-15 18:15 [PATCH, RFC] xfs: don't break from growfs ag update loop on error Eric Sandeen
@ 2013-09-09 20:36 ` Eric Sandeen
  2013-09-09 22:08   ` Dave Chinner
  2013-09-10 15:21   ` Mark Tinguely
  0 siblings, 2 replies; 5+ messages in thread
From: Eric Sandeen @ 2013-09-09 20:36 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: 'linux-xfs@oss.sgi.com'

On 8/15/13 1:15 PM, Eric Sandeen wrote:
> When xfs_growfs_data_private() is updating backup superblocks,
> it bails out on the first error encountered, whether reading or
> writing:

Any thoughts on this one?  W/ the verifiers, we have a higher
chance of encountering an error, and leaving the rest of the
supers un-updated.  Repair will then possibly revert the fs to
it's pre-growfs state, and data loss will ensue...

Thanks,
-Eric

> * If we get an error writing out the alternate superblocks,
> * just issue a warning and continue.  The real work is
> * already done and committed.
> 
> This can cause a problem later during repair, because repair
> looks at all superblocks, and picks the most prevalent one
> as correct.  If we bail out early in the backup superblock
> loop, we can end up with more "bad" matching superblocks than
> good, and a post-growfs repair may revert the filesystem to
> the old geometry.
> 
> With the combination of superblock verifiers and old bugs,
> we're more likely to encounter read errors due to verification.
> 
> And perhaps even worse, we don't even properly write any of the
> newly-added superblocks in the new AGs.
> 
> Even with this change, growfs will still say:
> 
>   xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning
>   data blocks changed from 319815680 to 335216640
> 
> which might be confusing to the user, but it at least communicates
> that something has gone wrong, and dmesg will probably highlight
> the need for an xfs_repair.
> 
> And this is still best-effort; if verifiers fail on more than
> half the backup supers, they may still "win" - but that's probably
> best left to repair to more gracefully handle by doing its own
> strict verification as part of the backup super "voting."
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> ---
> 
> diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
> index 614eb0c..70714bb 100644
> --- a/fs/xfs/xfs_fsops.c
> +++ b/fs/xfs/xfs_fsops.c
> @@ -153,7 +153,7 @@ xfs_growfs_data_private(
>  	xfs_buf_t		*bp;
>  	int			bucket;
>  	int			dpct;
> -	int			error;
> +	int			error, saved_error = 0;
>  	xfs_agnumber_t		nagcount;
>  	xfs_agnumber_t		nagimax = 0;
>  	xfs_rfsblock_t		nb, nb_mod;
> @@ -495,29 +495,33 @@ xfs_growfs_data_private(
>  				error = ENOMEM;
>  		}
>  
> +		/*
> +		 * If we get an error reading or writing alternate superblocks,
> +		 * continue.  xfs_repair chooses the "best" superblock based
> +		 * on most matches; if we break early, we'll leave more
> +		 * superblocks un-updated than updated, and xfs_repair may
> +		 * pick them over the properly-updated primary.
> +		 */
>  		if (error) {
>  			xfs_warn(mp,
>  		"error %d reading secondary superblock for ag %d",
>  				error, agno);
> -			break;
> +			saved_error = error;
> +			continue;
>  		}
>  		xfs_sb_to_disk(XFS_BUF_TO_SBP(bp), &mp->m_sb, XFS_SB_ALL_BITS);
>  
> -		/*
> -		 * If we get an error writing out the alternate superblocks,
> -		 * just issue a warning and continue.  The real work is
> -		 * already done and committed.
> -		 */
>  		error = xfs_bwrite(bp);
>  		xfs_buf_relse(bp);
>  		if (error) {
>  			xfs_warn(mp,
>  		"write error %d updating secondary superblock for ag %d",
>  				error, agno);
> -			break; /* no point in continuing */
> +			saved_error = error;
> +			continue;
>  		}
>  	}
> -	return error;
> +	return saved_error ? saved_error : error;
>  
>   error0:
>  	xfs_trans_cancel(tp, XFS_TRANS_ABORT);
> 
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH, RFC] xfs: don't break from growfs ag update loop on error
  2013-09-09 20:36 ` Eric Sandeen
@ 2013-09-09 22:08   ` Dave Chinner
  2013-09-10 15:21   ` Mark Tinguely
  1 sibling, 0 replies; 5+ messages in thread
From: Dave Chinner @ 2013-09-09 22:08 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: 'linux-xfs@oss.sgi.com', Eric Sandeen

On Mon, Sep 09, 2013 at 03:36:18PM -0500, Eric Sandeen wrote:
> On 8/15/13 1:15 PM, Eric Sandeen wrote:
> > When xfs_growfs_data_private() is updating backup superblocks,
> > it bails out on the first error encountered, whether reading or
> > writing:
> 
> Any thoughts on this one?  W/ the verifiers, we have a higher
> chance of encountering an error, and leaving the rest of the
> supers un-updated.  Repair will then possibly revert the fs to
> it's pre-growfs state, and data loss will ensue...

Sorry, I must have missed this. I remember discussing it with you.
The change looks fine to me, but I haven't tested it at all.

Acked-by: Dave Chinner <david@fromorbit.com>

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH, RFC] xfs: don't break from growfs ag update loop on error
  2013-09-09 20:36 ` Eric Sandeen
  2013-09-09 22:08   ` Dave Chinner
@ 2013-09-10 15:21   ` Mark Tinguely
  2013-09-10 15:23     ` Eric Sandeen
  1 sibling, 1 reply; 5+ messages in thread
From: Mark Tinguely @ 2013-09-10 15:21 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: 'linux-xfs@oss.sgi.com', Eric Sandeen

On 09/09/13 15:36, Eric Sandeen wrote:
> On 8/15/13 1:15 PM, Eric Sandeen wrote:
>> When xfs_growfs_data_private() is updating backup superblocks,
>> it bails out on the first error encountered, whether reading or
>> writing:
>
> Any thoughts on this one?  W/ the verifiers, we have a higher
> chance of encountering an error, and leaving the rest of the
> supers un-updated.  Repair will then possibly revert the fs to
> it's pre-growfs state, and data loss will ensue...
>
> Thanks,
> -Eric
>
>> * If we get an error writing out the alternate superblocks,
>> * just issue a warning and continue.  The real work is
>> * already done and committed.
>>
>> This can cause a problem later during repair, because repair
>> looks at all superblocks, and picks the most prevalent one
>> as correct.  If we bail out early in the backup superblock
>> loop, we can end up with more "bad" matching superblocks than
>> good, and a post-growfs repair may revert the filesystem to
>> the old geometry.
>>
>> With the combination of superblock verifiers and old bugs,
>> we're more likely to encounter read errors due to verification.
>>
>> And perhaps even worse, we don't even properly write any of the
>> newly-added superblocks in the new AGs.
>>
>> Even with this change, growfs will still say:
>>
>>    xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning
>>    data blocks changed from 319815680 to 335216640
>>
>> which might be confusing to the user, but it at least communicates
>> that something has gone wrong, and dmesg will probably highlight
>> the need for an xfs_repair.
>>
>> And this is still best-effort; if verifiers fail on more than
>> half the backup supers, they may still "win" - but that's probably
>> best left to repair to more gracefully handle by doing its own
>> strict verification as part of the backup super "voting."
>>
>> Signed-off-by: Eric Sandeen<sandeen@redhat.com>
>> ---

Make sense to me - it could have been any kind of error including not 
being able to get a xfs_buf for the new secondary (a temp ENOMEM).

I wonder if it could be possible to fix corrupt entries rather than just 
skip them...

Probably could test this patch by corrupting a v5 secondary superblock 
and verifying with xfs_db.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH, RFC] xfs: don't break from growfs ag update loop on error
  2013-09-10 15:21   ` Mark Tinguely
@ 2013-09-10 15:23     ` Eric Sandeen
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Sandeen @ 2013-09-10 15:23 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: 'linux-xfs@oss.sgi.com', Eric Sandeen

On 9/10/13 10:21 AM, Mark Tinguely wrote:
> On 09/09/13 15:36, Eric Sandeen wrote:
>> On 8/15/13 1:15 PM, Eric Sandeen wrote:
>>> When xfs_growfs_data_private() is updating backup superblocks,
>>> it bails out on the first error encountered, whether reading or
>>> writing:
>>
>> Any thoughts on this one?  W/ the verifiers, we have a higher
>> chance of encountering an error, and leaving the rest of the
>> supers un-updated.  Repair will then possibly revert the fs to
>> it's pre-growfs state, and data loss will ensue...
>>
>> Thanks,
>> -Eric
>>
>>> * If we get an error writing out the alternate superblocks,
>>> * just issue a warning and continue.  The real work is
>>> * already done and committed.
>>>
>>> This can cause a problem later during repair, because repair
>>> looks at all superblocks, and picks the most prevalent one
>>> as correct.  If we bail out early in the backup superblock
>>> loop, we can end up with more "bad" matching superblocks than
>>> good, and a post-growfs repair may revert the filesystem to
>>> the old geometry.
>>>
>>> With the combination of superblock verifiers and old bugs,
>>> we're more likely to encounter read errors due to verification.
>>>
>>> And perhaps even worse, we don't even properly write any of the
>>> newly-added superblocks in the new AGs.
>>>
>>> Even with this change, growfs will still say:
>>>
>>>    xfs_growfs: XFS_IOC_FSGROWFSDATA xfsctl failed: Structure needs cleaning
>>>    data blocks changed from 319815680 to 335216640
>>>
>>> which might be confusing to the user, but it at least communicates
>>> that something has gone wrong, and dmesg will probably highlight
>>> the need for an xfs_repair.
>>>
>>> And this is still best-effort; if verifiers fail on more than
>>> half the backup supers, they may still "win" - but that's probably
>>> best left to repair to more gracefully handle by doing its own
>>> strict verification as part of the backup super "voting."
>>>
>>> Signed-off-by: Eric Sandeen<sandeen@redhat.com>
>>> ---
> 
> Make sense to me - it could have been any kind of error including not being able to get a xfs_buf for the new secondary (a temp ENOMEM).
> 
> I wonder if it could be possible to fix corrupt entries rather than just skip them...

Well, that should be xfs_repair's job right - and I think it does?  (Esp. w/ my other patch,
[PATCH] xfs_repair: zero out unused parts of superblocks)

but we'd want growfs to alert the user of the problem one way or another...

-Eric
 
> Probably could test this patch by corrupting a v5 secondary superblock and verifying with xfs_db.
> 
> Reviewed-by: Mark Tinguely <tinguely@sgi.com>
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2013-09-10 15:23 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-15 18:15 [PATCH, RFC] xfs: don't break from growfs ag update loop on error Eric Sandeen
2013-09-09 20:36 ` Eric Sandeen
2013-09-09 22:08   ` Dave Chinner
2013-09-10 15:21   ` Mark Tinguely
2013-09-10 15:23     ` Eric Sandeen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.