* trying to avoid a lengthy quotacheck by deleting all quota data
@ 2015-02-24 15:15 Harry
  2015-02-24 16:39 ` Harry
                   ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Harry @ 2015-02-24 15:15 UTC (permalink / raw)
  To: xfs

Hi there,

We've got a moderately large disk (~2TB) into an inconsistent state, 
such that it's going to want a quotacheck the next time we mount it 
(it's currently mounted with quota accounting inactive).  Our tests 
suggest this is going to take several hours, and cause an outage we 
can't afford.

We're wondering whether there's a 'nuke the site from orbit' option that 
will let us avoid it.  The plan would be to:
- switch off quotas and delete them completely, using the xfs_quota 
commands (sketched below):
   -- disable
   -- off
   -- remove
- remount the drive with -o prjquota, hoping that there will not be a 
quotacheck, because we've deleted all the old quota data
- run a script to gradually restore all the quotas, one by one and in 
good time, from our own external backups (we've got the quotas in a 
database basically).
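
Concretely, we imagine the xfs_quota invocations would look something 
like this (untested, and the mountpoint is a placeholder for ours):

    xfs_quota -x -c 'disable -p' /mnt/point   # stop project quota accounting
    xfs_quota -x -c 'off -p' /mnt/point       # turn it off until next remount
    xfs_quota -x -c 'remove -p' /mnt/point    # free the on-disk quota metadata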

So the questions are:
- is there a way to remove all quota information from a mounted drive?
(the current mount status seems to be that it tried to mount it with -o 
prjquota but that quota accounting is *not* active)
- will it work and let us remount the drive with -o prjquota without 
causing a quotacheck?

Answers on a postcard, received with the utmost gratitude.

Rgds,
Harry + the PythonAnywhere team.

-- 
Harry Percival
Developer
harry@pythonanywhere.com

PythonAnywhere - a fully browser-based Python development and hosting environment
<http://www.pythonanywhere.com/>

PythonAnywhere LLP
17a Clerkenwell Road, London EC1M 5RD, UK
VAT No.: GB 893 5643 79
Registered in England and Wales as company number OC378414.
Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK


* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-02-24 15:15 trying to avoid a lengthy quotacheck by deleting all quota data Harry
@ 2015-02-24 16:39 ` Harry
  2015-02-24 17:33 ` Ben Myers
  2015-02-24 21:59 ` Dave Chinner
  2 siblings, 0 replies; 20+ messages in thread
From: Harry @ 2015-02-24 16:39 UTC (permalink / raw)
  To: xfs

Hi there,

Initial experiments suggest that the disable/off/remove commands are not 
available when a drive is mounted without quota accounting switched on.  
So it looks like we won't be able to use them.

Is there another way to clear all the old quota information from a drive?

rgds,
Harry

On 24/02/15 15:15, Harry wrote:
> Hi there,
>
> We've got a moderately large disk (~2TB) into an inconsistent state, 
> such that it's going to want a quotacheck the next time we mount it 
> (it's currently mounted with quota accounting inactive). Our tests 
> suggest this is going to take several hours, and cause an outage we 
> can't afford.
>
> We're wondering whether there's a 'nuke the site from orbit' option 
> that will let us avoid it.  The plan would be to:
> - switch off quotas and delete them completely, using the commands:
>   -- disable
>   -- off
>   -- remove
> - remount the drive with -o prjquota, hoping that there will not be a 
> quotacheck, because we've deleted all the old quota data
> - run a script to gradually restore all the quotas, one by one and in 
> good time, from our own external backups (we've got the quotas in a 
> database basically).
>
> So the questions are:
> - is there a way to remove all quota information from a mounted drive?
> (the current mount status seems to be that it tried to mount it with 
> -o prjquota but that quota accounting is *not* active)
> - will it work and let us remount the drive with -o prjquota without 
> causing a quotacheck?
>
> Answers on a postcard, received with the utmost gratitude.
>
> Rgds,
> Harry + the PythonAnywhere team.
>

Rgds,
Harry + the PythonAnywhere team.

-- 
Harry Percival
Developer
harry@pythonanywhere.com

PythonAnywhere - a fully browser-based Python development and hosting environment
<http://www.pythonanywhere.com/>

PythonAnywhere LLP
17a Clerkenwell Road, London EC1M 5RD, UK
VAT No.: GB 893 5643 79
Registered in England and Wales as company number OC378414.
Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK


* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-02-24 15:15 trying to avoid a lengthy quotacheck by deleting all quota data Harry
  2015-02-24 16:39 ` Harry
@ 2015-02-24 17:33 ` Ben Myers
  2015-02-24 17:59   ` Harry Percival
  2015-02-24 21:59 ` Dave Chinner
  2 siblings, 1 reply; 20+ messages in thread
From: Ben Myers @ 2015-02-24 17:33 UTC (permalink / raw)
  To: Harry; +Cc: xfs

Hi Harry,

On Tue, Feb 24, 2015 at 03:15:26PM +0000, Harry wrote:
> Hi there,
> 
> We've got a moderately large disk (~2TB) into an inconsistent state,
> such that it's going to want a quotacheck the next time we mount it
> (it's currently mounted with quota accounting inactive).  Our tests
> suggest this is going to take several hours, and cause an outage we
> can't afford.

The 'noquota' mount option will disable quotacheck at mount time.  That
may do what you need.
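
For example, with placeholders for your device and mountpoint:

    mount -o noquota /dev/sdX1 /mnt/point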
 
> We're wondering whether there's a 'nuke the site from orbit' option
> that will let us avoid it.  The plan would be to:
> - switch off quotas and delete them completely, using the commands:
>   -- disable
>   -- off
>   -- remove
> - remount the drive with -o prjquota, hoping that there will not be
> a quotacheck, because we've deleted all the old quota data
> - run a script to gradually restore all the quotas, one by one and in
> good time, from our own external backups (we've got the quotas in a
> database basically).
> 
> So the questions are:
> - is there a way to remove all quota information from a mounted drive?
> (the current mount status seems to be that it tried to mount it with
> -o prjquota but that quota accounting is *not* active)
> - will it work and let us remount the drive with -o prjquota without
> causing a quotacheck?

Quotacheck is implemented by truncating the quota inode and then
rebuilding the dquots from scratch as it traverses all the inodes in the
filesystem.  Unfortunately the filesystem needs to be idle during this
process or the accounting could be incorrect, so there is no gradual
option for restoring quotas.
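
If you want a rough feel for how long that traversal might take, a very 
crude userspace approximation (stat every inode once; not what the kernel 
actually does, but it touches the same inodes) would be:

    # time one pass over every inode under the mountpoint (path is yours)
    time find /mnt/point -xdev -print0 | xargs -0 stat > /dev/null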

Regards,
	Ben


* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-02-24 17:33 ` Ben Myers
@ 2015-02-24 17:59   ` Harry Percival
  2015-02-24 18:12     ` Ben Myers
  0 siblings, 1 reply; 20+ messages in thread
From: Harry Percival @ 2015-02-24 17:59 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

Hi Ben,  thanks for replying.

We're using project quotas, and we'd be prepared to wipe the slate clean 
and say there are no projects and no quotas at all.  Is there still no 
way of starting from scratch and avoiding that quotacheck?

HP

On 24/02/15 17:33, Ben Myers wrote:
> Hi Harry,
>
> On Tue, Feb 24, 2015 at 03:15:26PM +0000, Harry wrote:
>> Hi there,
>>
>> We've got a moderately large disk (~2TB) into an inconsistent state,
>> such that it's going to want a quotacheck the next time we mount it
>> (it's currently mounted with quota accounting inactive).  Our tests
>> suggest this is going to take several hours, and cause an outage we
>> can't afford.
> The 'noquota' mount option will disable quotacheck at mount time.  That
> may do what you need.
>   
>> We're wondering whether there's a 'nuke the site from orbit' option
>> that will let us avoid it.  The plan would be to:
>> - switch off quotas and delete them completely, using the commands:
>>    -- disable
>>    -- off
>>    -- remove
>> - remount the drive with -o prjquota, hoping that there will not be
>> a quotacheck, because we've deleted all the old quota data
>> - run a script to gradually restore all the quotas, one by one and in
>> good time, from our own external backups (we've got the quotas in a
>> database basically).
>>
>> So the questions are:
>> - is there a way to remove all quota information from a mounted drive?
>> (the current mount status seems to be that it tried to mount it with
>> -o prjquota but that quota accounting is *not* active)
>> - will it work and let us remount the drive with -o prjquota without
>> causing a quotacheck?
> Quotacheck is implemented by truncating the quota inode and then
> rebuilding the dquots from scratch as it traverses all the inodes in the
> filesystem.  Unfortunately the filesystem needs to be idle during this
> process or the accounting could be incorrect, so there is no gradual
> option for restoring quotas.
>
> Regards,
> 	Ben

-- 
Harry Percival
Developer
harry@pythonanywhere.com

PythonAnywhere - a fully browser-based Python development and hosting environment
<http://www.pythonanywhere.com/>

PythonAnywhere LLP
17a Clerkenwell Road, London EC1M 5RD, UK
VAT No.: GB 893 5643 79
Registered in England and Wales as company number OC378414.
Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK


* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-02-24 17:59   ` Harry Percival
@ 2015-02-24 18:12     ` Ben Myers
  0 siblings, 0 replies; 20+ messages in thread
From: Ben Myers @ 2015-02-24 18:12 UTC (permalink / raw)
  To: Harry Percival; +Cc: xfs

Hi Harry,

On Tue, Feb 24, 2015 at 05:59:44PM +0000, Harry Percival wrote:
> Hi Ben,  thanks for replying.
> 
> We're using project quotas, and we'd be prepared to wipe the slate
> clean and say there are no projects and no quotas at all.  Is there
> still no way of starting from scratch and avoiding that quotacheck?

You can use the 'noquota' mount option or turn off accounting entirely
and delay the quotacheck until a more convenient time, but I'm afraid I
don't know of a way to avoid running the quotacheck altogether.
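
For example (device and mountpoint are placeholders):

    mount -o noquota /dev/sdX1 /mnt/point     # mounts now with no quotacheck
    # later, at a convenient time:
    umount /mnt/point
    mount -o prjquota /dev/sdX1 /mnt/point    # the quotacheck runs here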

Regards,
	Ben


* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-02-24 15:15 trying to avoid a lengthy quotacheck by deleting all quota data Harry
  2015-02-24 16:39 ` Harry
  2015-02-24 17:33 ` Ben Myers
@ 2015-02-24 21:59 ` Dave Chinner
  2015-02-26 13:07   ` Harry
  2 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2015-02-24 21:59 UTC (permalink / raw)
  To: Harry; +Cc: xfs

On Tue, Feb 24, 2015 at 03:15:26PM +0000, Harry wrote:
> Hi there,
> 
> We've got a moderately large disk (~2TB) into an inconsistent state,
> such that it's going to want a quotacheck the next time we mount it
> (it's currently mounted with quota accounting inactive).  Our tests
> suggest this is going to take several hours, and cause an outage we
> can't afford.

What tests are you performing to suggest a quotacheck of a small
filesystem will take hours? (yes, 2TB is a *small* filesystem).

(xfs_info, df -i, df -h, storage hardware, etc are all relevant
here).

> We're wondering whether there's a 'nuke the site from orbit' option
> that will let us avoid it.  The plan would be to:
> - switch off quotas and delete them completely, using the commands:
>   -- disable
>   -- off
>   -- remove
> - remount the drive with -o prjquota, hoping that there will not be
> a quotacheck, because we've deleted all the old quota data

Mounting with a quota enabled *forces* a quota check if quotas
aren't currently enabled. You cannot avoid it; it's the way quota
consistency is created.

> - run a script to gradually restore all the quotas, one by one and in
> good time, from our own external backups (we've got the quotas in a
> database basically).

Can't be done - quotas need to be consistent with what is currently
on disk, not what you have in a backup somewhere.

> So the questions are:
> - is there a way to remove all quota information from a mounted drive?
> (the current mount status seems to be that it tried to mount it with

mount with quotas on and turn them off via xfs_quota, or mount
without quota options at all. Then run the remove command in
xfs_quota.
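
i.e. one of these two variants, with a placeholder device and mountpoint:

    # variant 1: mount with quotas on (the quotacheck runs), then off + remove
    mount -o prjquota /dev/XXX /mnt/point
    xfs_quota -x -c 'off -p' -c 'remove -p' /mnt/point

    # variant 2: mount with no quota options at all, then remove
    mount /dev/XXX /mnt/point
    xfs_quota -x -c 'remove -p' /mnt/point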

> -o prjquota but that quota accounting is *not* active)

Not possible.

> - will it work and let us remount the drive with -o prjquota without
> causing a quotacheck?

No.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-02-24 21:59 ` Dave Chinner
@ 2015-02-26 13:07   ` Harry
  2015-03-05 13:15     ` Harry
  0 siblings, 1 reply; 20+ messages in thread
From: Harry @ 2015-02-26 13:07 UTC (permalink / raw)
  To: xfs; +Cc: developers


Thanks Dave,

* The main filesystem is currently online and seems ok, but quotas are 
not active.
* We want to estimate how long the quotacheck will take when we 
reboot/remount
* We're even a bit worried the disk might be in a broken state, such 
that the quotacheck won't actually complete successfully at all.

A brief description of our setup:
- we're on AWS
- using mdadm to make a raid array out of 8x 200GB SSD EBS drives (and lvm)
- we're using DRBD to make a live backup of all writes to another 
instance with a similar raid array

We're not doing our experiments on our live system.  Instead, we're 
using the drives from the DRBD target system.  We take DRBD offline, so 
it's no longer writing, then we take snapshots of the drives, then 
remount those elsewhere so we can experiment without disturbing the live 
system.

We've managed to mount the backup drives ok, with the 'noquota' option.  
Files look ok.  But, so far, we haven't been able to get a quotacheck to 
complete.  We've waited 12 hours+. Do you think it's possible DRBD is 
giving us copies of the live disks that are inconsistent somehow?

How can we reassure ourselves that this live disk *will* mount 
successfully if we reboot the machine, and can we estimate how long it 
will take?

    mount | grep log_storage
    /dev/drbd0 on /mnt/log_storage type xfs
    (rw,prjquota,allocsize=64k,_netdev)

    df -i /mnt/log_storage/
    Filesystem        Inodes    IUsed     IFree IUse% Mounted on
    /dev/drbd0     938210704 72929413 865281291    8% /mnt/log_storage

    df -h /mnt/log_storage/
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/drbd0      1.6T  1.4T  207G  88% /mnt/log_storage

    xfs_info /mnt/log_storage/
    <lots of errors re: cannot find mount point path `xyz`>
    meta-data=/dev/drbd0             isize=256    agcount=64, agsize=6553600 blks
             =                       sectsz=512   attr=2
    data     =                       bsize=4096   blocks=418906112, imaxpct=25
             =                       sunit=0      swidth=0 blks
    naming   =version 2              bsize=4096   ascii-ci=0
    log      =internal               bsize=4096   blocks=12800, version=2
             =                       sectsz=512   sunit=0 blks, lazy-count=1
    realtime =none                   extsz=4096   blocks=0, rtextents=0

The missing-path errors are, I think, from folders we've deleted but 
not yet removed from the projid/projects files.  I *think* they're a 
red herring here.

We've also tried running xfs_repair on the backup drives.  It takes 
about 3 hours, and shows a lot of errors about incorrect directory flags 
on inodes.  Here's one from the bottom of the log of a recent attempt:

    directory flags set on non-directory inode 268702898
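
(For reference, we run it read-only first, roughly like this, against the 
unmounted snapshot device -- the device name is whatever the snapshot 
appears as on our test box:)

    xfs_repair -n /dev/<snapshot-device>   # -n: no-modify mode, report only
    xfs_repair /dev/<snapshot-device>      # the actual repair pass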


rgds,
Confused in London.



On 24/02/15 21:59, Dave Chinner wrote:
> On Tue, Feb 24, 2015 at 03:15:26PM +0000, Harry wrote:
>> Hi there,
>>
>> We've got a moderately large disk (~2TB) into an inconsistent state,
>> such that it's going to want a quotacheck the next time we mount it
>> (it's currently mounted with quota accounting inactive).  Our tests
>> suggest this is going to take several hours, and cause an outage we
>> can't afford.
> What tests are you performing to suggest a quotacheck of a small
> filesystem will take hours? (yes, 2TB is a *small* filesystem).
>
> (xfs_info, df -i, df -h, storage hardware, etc are all relevant
> here).
>
>> We're wondering whether there's a 'nuke the site from orbit' option
>> that will let us avoid it.  The plan would be to:
>> - switch off quotas and delete them completely, using the commands:
>>    -- disable
>>    -- off
>>    -- remove
>> - remount the drive with -o prjquota, hoping that there will not be
>> a quotacheck, because we've deleted all the old quota data
> Mounting with a quota enabled *forces* a quota check if quotas
> aren't currently enabled. You cannot avoid it; it's the way quota
> consistency is created.
>
>> - run a script to gradually restore all the quotas, one by one and in
>> good time, from our own external backups (we've got the quotas in a
>> database basically).
> Can't be done - quotas need to be consistent with what is currently
> on disk, not what you have in a backup somewhere.
>
>> So the questions are:
>> - is there a way to remove all quota information from a mounted drive?
>> (the current mount status seems to be that it tried to mount it with
> mount with quotas on and turn them off via xfs_quota, or mount
> without quota options at all. Then run the remove command in
> xfs_quota.
>
>> -o prjquota but that quota accounting is *not* active)
> Not possible.
>
>> - will it work and let us remount the drive with -o prjquota without
>> causing a quotacheck?
> No.
>
> Cheers,
>
> Dave.

Rgds,
Harry + the PythonAnywhere team.

-- 
Harry Percival
Developer
harry@pythonanywhere.com

PythonAnywhere - a fully browser-based Python development and hosting environment
<http://www.pythonanywhere.com/>

PythonAnywhere LLP
17a Clerkenwell Road, London EC1M 5RD, UK
VAT No.: GB 893 5643 79
Registered in England and Wales as company number OC378414.
Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK



* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-02-26 13:07   ` Harry
@ 2015-03-05 13:15     ` Harry
  2015-03-05 15:53       ` Eric Sandeen
  0 siblings, 1 reply; 20+ messages in thread
From: Harry @ 2015-03-05 13:15 UTC (permalink / raw)
  To: xfs; +Cc: developers


Update -- so far, we've not managed to gain any confidence that we'll 
ever be able to re-mount that disk. The general consensus seems to be to 
fish all the data off the disk using rsync, and then move off XFS to ext4.

Not a very helpful message for y'all to hear, I know.  But if it's any 
help in prioritising your future work, I think the dealbreaker for us 
was the inescapable quotacheck on mount, which means that any time a 
fileserver goes down unexpectedly, we have an unavoidable, 
indeterminate-but-long period of downtime...

hp

On 26/02/15 13:07, Harry wrote:
> Thanks Dave,
>
> * The main filesystem is currently online and seems ok, but quotas are 
> not active.
> * We want to estimate how long the quotacheck will take when we 
> reboot/remount
> * We're even a bit worried the disk might be in a broken state, such 
> that the quotacheck won't actually complete successfully at all.
>
> A brief description of our setup:
> - we're on AWS
> - using mdadm to make a raid array out of 8x 200GB SSD EBS drives (and 
> lvm)
> - we're using DRBD to make a live backup of all writes to another 
> instance with a similar raid array
>
> We're not doing our experiments on our live system.  Instead, we're 
> using the drives from the DRBD target system.  We take DRBD offline, 
> so it's no longer writing, then we take snapshots of the drives, then 
> remount those elsewhere so we can experiment without disturbing the 
> live system.
>
> We've managed to mount the backup drives ok, with the 'noquota' 
> option.  Files look ok.  But, so far, we haven't been able to get a 
> quotacheck to complete.  We've waited 12 hours+. Do you think it's 
> possible DRBD is giving us copies of the live disks that are 
> inconsistent somehow?
>
> How can we reassure ourselves that this live disk *will* mount 
> successfully if we reboot the machine, and can we estimate how long it 
> will take?
>
>     mount | grep log_storage
>     /dev/drbd0 on /mnt/log_storage type xfs
>     (rw,prjquota,allocsize=64k,_netdev)
>
>     df -i /mnt/log_storage/
>     Filesystem        Inodes    IUsed     IFree IUse% Mounted on
>     /dev/drbd0     938210704 72929413 865281291    8% /mnt/log_storage
>
>     df -h /mnt/log_storage/
>     Filesystem      Size  Used Avail Use% Mounted on
>     /dev/drbd0      1.6T  1.4T  207G  88% /mnt/log_storage
>
>     xfs_info /mnt/log_storage/
>     <lots of errors re: cannot find mount point path `xyz`>
>     meta-data=/dev/drbd0             isize=256    agcount=64, agsize=6553600 blks
>              =                       sectsz=512   attr=2
>     data     =                       bsize=4096   blocks=418906112, imaxpct=25
>              =                       sunit=0      swidth=0 blks
>     naming   =version 2              bsize=4096   ascii-ci=0
>     log      =internal               bsize=4096   blocks=12800, version=2
>              =                       sectsz=512   sunit=0 blks, lazy-count=1
>     realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> The missing-path errors are, I think, from folders we've deleted but 
> not yet removed from the projid/projects files.  I *think* they're a 
> red herring here.
>
> We've also tried running xfs_repair on the backup drives.  It takes 
> about 3 hours, and shows a lot of errors about incorrect directory 
> flags on inodes.  Here's one from the bottom of the log of a recent 
> attempt:
>
>     directory flags set on non-directory inode 268702898
>
>
> rgds,
> Confused in London.
>
>
>
> On 24/02/15 21:59, Dave Chinner wrote:
>> On Tue, Feb 24, 2015 at 03:15:26PM +0000, Harry wrote:
>>> Hi there,
>>>
>>> We've got a moderately large disk (~2TB) into an inconsistent state,
>>> such that it's going to want a quotacheck the next time we mount it
>>> (it's currently mounted with quota accounting inactive).  Our tests
>>> suggest this is going to take several hours, and cause an outage we
>>> can't afford.
>> What tests are you performing to suggest a quotacheck of a small
>> filesystem will take hours? (yes, 2TB is a *small* filesystem).
>>
>> (xfs_info, df -i, df -h, storage hardware, etc are all relevant
>> here).
>>
>>> We're wondering whether there's a 'nuke the site from orbit' option
>>> that will let us avoid it.  The plan would be to:
>>> - switch off quotas and delete them completely, using the commands:
>>>    -- disable
>>>    -- off
>>>    -- remove
>>> - remount the drive with -o prjquota, hoping that there will not be
>>> a quotacheck, because we've deleted all the old quota data
>> Mounting with a quota enabled *forces* a quota check if quotas
>> aren't currently enabled. You cannot avoid it; it's the way quota
>> consistency is created.
>>
>>> - run a script to gradually restore all the quotas, one by one and in
>>> good time, from our own external backups (we've got the quotas in a
>>> database basically).
>> Can't be done - quotas need to be consistent with what is currently
>> on disk, not what you have in a backup somewhere.
>>
>>> So the questions are:
>>> - is there a way to remove all quota information from a mounted drive?
>>> (the current mount status seems to be that it tried to mount it with
>> mount with quotas on and turn them off via xfs_quota, or mount
>> without quota options at all. Then run the remove command in
>> xfs_quota.
>>
>>> -o prjquota but that quota accounting is *not* active)
>> Not possible.
>>
>>> - will it work and let us remount the drive with -o prjquota without
>>> causing a quotacheck?
>> No.
>>
>> Cheers,
>>
>> Dave.
>
> Rgds,
> Harry + the PythonAnywhere team.
>
> -- 
> Harry Percival
> Developer
> harry@pythonanywhere.com
>
> PythonAnywhere - a fully browser-based Python development and hosting environment
> <http://www.pythonanywhere.com/>
>
> PythonAnywhere LLP
> 17a Clerkenwell Road, London EC1M 5RD, UK
> VAT No.: GB 893 5643 79
> Registered in England and Wales as company number OC378414.
> Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK

Rgds,
Harry + the PythonAnywhere team.

-- 
Harry Percival
Developer
harry@pythonanywhere.com

PythonAnywhere - a fully browser-based Python development and hosting environment
<http://www.pythonanywhere.com/>

PythonAnywhere LLP
17a Clerkenwell Road, London EC1M 5RD, UK
VAT No.: GB 893 5643 79
Registered in England and Wales as company number OC378414.
Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK



* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-03-05 13:15     ` Harry
@ 2015-03-05 15:53       ` Eric Sandeen
  2015-03-05 17:05         ` Harry
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Sandeen @ 2015-03-05 15:53 UTC (permalink / raw)
  To: Harry, xfs; +Cc: developers

On 3/5/15 7:15 AM, Harry wrote:
> Update -- so far, we've not managed to gain any confidence that we'll
> ever be able to re-mount that disk. The general consensus seems to be
> to fish all the data off the disk using rsync, and then move off XFS
> to ext4.
> 
> Not a very helpful message for y'all to hear, I know. But if it's any
> help in prioritising your future work, I think the dealbreaker for us
> was the inescapable quotacheck on mount, which means that any time a
> fileserver goes down unexpectedly, we have an unavoidable,
> indeterminate-but-long period of downtime...
> 
> hp

What you decide to use is up to you of course, and causes us no
heartbreak.  :)  But I think you fundamentally misunderstand the situation;
an unexpected fileserver failure should not result in a lengthy quotacheck
on xfs, because xfs quota is journaled, and will simply be replayed along with
the rest of the log.

I honestly don't know what has led you to the conclusion that remounting
the filesystem will lead to any quotacheck at all, let alone a lengthy one.

> * We're even a bit worried the disk might be in a broken state, such
> that the quotacheck won't actually complete successfully at all.

If your disk is broken, that's not a filesystem issue.  It seems possible
that whatever drbd manipulation you're doing is causing an issue, but because
you haven't really explained it in detail, I don't know.

> We take DRBD offline, so it's no longer writing, then we take
> snapshots of the drives, then remount those elsewhere so we can
> experiment without disturbing the live system.

Did you quiesce the filesystem first with e.g. xfs_freeze?
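
Something along these lines around the snapshot step (mountpoint yours):

    xfs_freeze -f /mnt/log_storage    # flush dirty data and block new writes
    # ... take the drbd/lvm snapshot here ...
    xfs_freeze -u /mnt/log_storage    # thaw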

So far this thread has been long on prose and speculation, and short
on actual analysis, log messages, etc.  Feel free to use ext4 or whatever
suits you, but given that nothing in this thread has implicated misbehavior
by xfs, I don't think that switching filesystems will solve the perceived
problem.

-Eric


* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-03-05 15:53       ` Eric Sandeen
@ 2015-03-05 17:05         ` Harry
  2015-03-05 17:09           ` Harry
  2015-03-05 17:27           ` Eric Sandeen
  0 siblings, 2 replies; 20+ messages in thread
From: Harry @ 2015-03-05 17:05 UTC (permalink / raw)
  To: xfs; +Cc: developers


Thanks for the reply Eric.

One of our problems is that we're limited in terms of what manipulations 
we can apply to the live system, so instead we've been running our 
experiments against the backup system.  You're quite right that DRBD may 
be introducing some weirdness of its own, so those experiments may not 
be safe to draw conclusions from.

Here's what we know about the live system
-> it had an outage, equivalent to having its power cable yanked, or 
doing an 'echo b > /proc/sysrq-trigger'
-> when it came back, it decided to mount the drive without quotas.
-> we saw a message in syslog saying " Failed to initialize disk quotas"
-> last time we had to run a quotacheck (several months ago) it took 
about 2 hours.

We can repro the quotacheck issue on our test clusters, as follows:
-> kick off a job that writes to the disk
-> hard reboot with "echo b > /proc/sysrq-trigger"
-> on next boot, see "Failed to initialize disk quotas" message, xfs 
mounts without quotas
-> soft reboot with "reboot"
-> on next boot, see "Quotacheck needed: Please wait." message.
-> Quotacheck completes some time later.

So our best-case scenario is that, next time we reboot, we'll have an 
outage of about 2 hours.  And our paranoid worst-case scenario, induced 
by our experiments with our drbd backup drives, is that the disk will 
actually turn out not to be mountable at all.

Is that "quotacheck always required after hard reboot" behaviour that 
we're observing something you expected?  You seemed to be saying that 
the fact that quotas are journaled should mean it's not needed?

HP

On 05/03/15 15:53, Eric Sandeen wrote:
> On 3/5/15 7:15 AM, Harry wrote:
>> Update -- so far, we've not managed to gain any confidence that we'll
>> ever be able to re-mount that disk. The general consensus seems to be
>> to fish all the data off the disk using rsync, and then move off XFS
>> to ext4.
>>
>> Not a very helpful message for y'all to hear, I know. But if it's any
>> help in prioritising your future work, I think the dealbreaker for us
>> was the inescapable quotacheck on mount, which means that any time a
>> fileserver goes down unexpectedly, we have an unavoidable,
>> indeterminate-but-long period of downtime...
>>
>> hp
> What you decide to use is up to you of course, and causes us no
> heartbreak.  :)  But I think you fundamentally misunderstand the situation;
> an unexpected fileserver failure should not result in a lengthy quotacheck
> on xfs, because xfs quota is journaled, and will simply be replayed along with
> the rest of the log.
>
> I honestly don't know what has led you to the conclusion that remounting
> the filesystem will lead to any quotacheck at all, let alone a lengthy one.
>
>> * We're even a bit worried the disk might be in a broken state, such
>> that the quotacheck won't actually complete successfully at all.
> If your disk is broken, that's not a filesystem issue.  It seems possible
> that whatever drbd manipulation you're doing is causing an issue, but because
> you haven't really explained it in detail, I don't know.
>
>> We take DRBD offline, so it's no longer writing, then we take
>> snapshots of the drives, then remount those elsewhere so we can
>> experiment without disturbing the live system.
> Did you quiesce the filesystem first with e.g. xfs_freeze?
>
> So far this thread has been long on prose and speculation, and short
> on actual analysis, log messages, etc.  Feel free to use ext4 or whatever
> suits you, but given that nothing in this thread has implicated misbehavior
> by xfs, I don't think that switching filesystems will solve the perceived
> problem.
>
> -Eric

Rgds,
Harry + the PythonAnywhere team.

-- 
Harry Percival
Developer
harry@pythonanywhere.com

PythonAnywhere - a fully browser-based Python development and hosting environment
<http://www.pythonanywhere.com/>

PythonAnywhere LLP
17a Clerkenwell Road, London EC1M 5RD, UK
VAT No.: GB 893 5643 79
Registered in England and Wales as company number OC378414.
Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK



* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-03-05 17:05         ` Harry
@ 2015-03-05 17:09           ` Harry
  2015-03-05 17:27           ` Eric Sandeen
  1 sibling, 0 replies; 20+ messages in thread
From: Harry @ 2015-03-05 17:09 UTC (permalink / raw)
  To: xfs; +Cc: developers


PS.  We might be interested in getting a better estimate of how long a 
quotacheck would take.  From an old thread on the mailing list, we see 
this suggestion:

xfstests:src/bstat

We're hesitant to run this on the live system, because we're worried it 
will impact its performance substantially.  Is that an unfounded worry?  
I presume it's a read-only operation, so it would be safe to kill it if 
we see performance degradation?

rgds,
Harry + the team.

On 05/03/15 17:05, Harry wrote:
> Thanks for the reply Eric.
>
> One of our problems is that we're limited in terms of what 
> manipulations we can apply to the live system, so instead we've been 
> running our experiments against the backup system.  You're quite right 
> that DRBD may be introducing some weirdness of its own, so those 
> experiments may not be safe to draw conclusions from.
>
> Here's what we know about the live system
> -> it had an outage, equivalent to having its power cable yanked, or 
> doing an 'echo b > /proc/sysrq-trigger'
> -> when it came back, it decided to mount the drive without quotas.
> -> we saw a message in syslog saying " Failed to initialize disk quotas"
> -> last time we had to run a quotacheck (several months ago) it took 
> about 2 hours.
>
> We can repro the quotacheck issue on our test clusters, as follows:
> -> kick off a job that writes to the disk
> -> hard reboot with "echo b > /proc/sysrq-trigger"
> -> on next boot, see "Failed to initialize disk quotas" message, xfs 
> mounts without quotas
> -> soft reboot with "reboot"
> -> on next boot, see "Quotacheck needed: Please wait." message.
> -> Quotacheck completes some time later.
>
> So our best-case scenario is that, next time we reboot, we'll have an 
> outage of about 2 hours.  And our paranoid worst-case scenario, 
> induced by our experiments with our drbd backup drives, is that the 
> disk will actually turn out not to be mountable at all.
>
> Is that "quotacheck always required after hard reboot" behaviour that 
> we're observing something you expected?  You seemed to be saying that 
> the fact that quotas are journaled should mean it's not needed?
>
> HP
>
> On 05/03/15 15:53, Eric Sandeen wrote:
>> On 3/5/15 7:15 AM, Harry wrote:
>>> Update -- so far, we've not managed to gain any confidence that we'll
>>> ever be able to re-mount that disk. The general consensus seems to be
>>> to fish all the data off the disk using rsync, and then move off XFS
>>> to ext4.
>>>
>>> Not a very helpful message for y'all to hear, I know. But if it's any
>>> help in prioritising your future work, I think the dealbreaker for us
>>> was the inescapable quotacheck on mount, which means that any time a
>>> fileserver goes down unexpectedly, we have an unavoidable,
>>> indeterminate-but-long period of downtime...
>>>
>>> hp
>> What you decide to use is up to you of course, and causes us no
>> heartbreak.  :)  But I think you fundamentally misunderstand the situation;
>> an unexpected fileserver failure should not result in a lengthy quotacheck
>> on xfs, because xfs quota is journaled, and will simply be replayed along with
>> the rest of the log.
>>
>> I honestly don't know what has led you to the conclusion that remounting
>> the filesystem will lead to any quotacheck at all, let alone a lengthy one.
>>
>>> * We're even a bit worried the disk might be in a broken state, such
>>> that the quotacheck won't actually complete successfully at all.
>> If your disk is broken, that's not a filesystem issue.  It seems possible
>> that whatever drbd manipulation you're doing is causing an issue, but because
>> you haven't really explained it in detail, I don't know.
>>
>>> We take DRBD offline, so it's no longer writing, then we take
>>> snapshots of the drives, then remount those elsewhere so we can
>>> experiment without disturbing the live system.
>> Did you quiesce the filesystem first with e.g. xfs_freeze?
>>
>> So far this thread has been long on prose and speculation, and short
>> on actual analysis, log messages, etc.  Feel free to use ext4 or whatever
>> suits you, but given that nothing in this thread has implicated misbehavior
>> by xfs, I don't think that switching filesystems will solve the perceived
>> problem.
>>
>> -Eric
>
> Rgds,
> Harry + the PythonAnywhere team.
>
> -- 
> Harry Percival
> Developer
> harry@pythonanywhere.com
>
> PythonAnywhere - a fully browser-based Python development and hosting environment
> <http://www.pythonanywhere.com/>
>
> PythonAnywhere LLP
> 17a Clerkenwell Road, London EC1M 5RD, UK
> VAT No.: GB 893 5643 79
> Registered in England and Wales as company number OC378414.
> Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK

Rgds,
Harry + the PythonAnywhere team.

-- 
Harry Percival
Developer
harry@pythonanywhere.com

PythonAnywhere - a fully browser-based Python development and hosting environment
<http://www.pythonanywhere.com/>

PythonAnywhere LLP
17a Clerkenwell Road, London EC1M 5RD, UK
VAT No.: GB 893 5643 79
Registered in England and Wales as company number OC378414.
Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK



* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-03-05 17:05         ` Harry
  2015-03-05 17:09           ` Harry
@ 2015-03-05 17:27           ` Eric Sandeen
  2015-03-05 17:34             ` Harry
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Sandeen @ 2015-03-05 17:27 UTC (permalink / raw)
  To: Harry, xfs; +Cc: developers

On 3/5/15 11:05 AM, Harry wrote:
> Thanks for the reply Eric.
> 
> One of our problems is that we're limited in terms of what
> manipulations we can apply to the live system, so instead we've
> been running our experiments against the backup system.  You're
> quite right that DRBD may be introducing some weirdness of its own,
> so those experiments may not be safe to draw conclusions from.
> 
> Here's what we know about the live system
> -> it had an outage, equivalent to having its power cable yanked, or doing an 'echo b > /proc/sysrq-trigger'
> -> when it came back, it decided to mount the drive without quotas.
> -> we saw a message in syslog saying " Failed to initialize disk quotas"
> -> last time we had to run a quotacheck (several months ago) it took about 2 hours.
> 
> We can repro the quotacheck issue on our test clusters, as follows:
> -> kick off a job that writes to the disk
> -> hard reboot with "echo b > /proc/sysrq-trigger"
> -> on next boot, see "Failed to initialize disk quotas" message, xfs mounts without quotas
> -> soft reboot with "reboot"
> -> on next boot, see "Quotacheck needed: Please wait." message.
> -> Quotacheck completes some time later.
> 
> So our best-case scenario is that, next time we reboot, we'll have an
> outage of about 2 hours. And our paranoid worst-case scenario,
> induced by our experiments with our drbd backup drives, is that the
> disk will actually turn out not to be mountable at all.
> 
> Is that "quotacheck always required after hard reboot" behaviour that
> we're observing something you expected? You seemed to be saying that
> the fact that quotas are journaled should mean it's not needed?

In general, that's correct.  It's not clear why "Failed to initialize disk quotas"
appeared; that seems closer to the root cause.  But again, we don't have your
full logs to look at, so I don't know if anything else offers a clue.  (For that
matter, we don't even know what kernel version you're on...)

here, on a recent 4.0-rc1 kernel:

# mount -o quota /dev/sdc6 /mnt/test
# cp -aR /lib/modules/ /mnt/test
# echo b > /proc/sysrq-trigger

[152807.209688] sysrq: SysRq : Resetting
...
<reboots>

# mount -o quota /dev/sdc6 /mnt/test
# dmesg | tail -n 3
[   90.822601] XFS (sdc6): Mounting V4 Filesystem
[   90.921346] XFS (sdc6): Starting recovery (logdev: internal)
[   93.399133] XFS (sdc6): Ending recovery (logdev: internal)
#

-Eric


* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-03-05 17:27           ` Eric Sandeen
@ 2015-03-05 17:34             ` Harry
  2015-03-05 17:44               ` Eric Sandeen
  0 siblings, 1 reply; 20+ messages in thread
From: Harry @ 2015-03-05 17:34 UTC (permalink / raw)
  To: xfs; +Cc: developers

We're on 3.13.0-39 (Ubuntu Trusty).

If you're interested in looking into it further, I'd be happy to provide 
any extra info you'd like?

But just to make sure I'm not wasting any of your time -- I think the 
team have pretty much decided to make the switch no matter what.  The 
quotacheck issue is one thing, but actually the switch to ext4 
simplifies lots of other aspects of our quota system (one of the reasons 
we picked xfs was to be able to use project quotas, but it turns out we 
don't need them any more, so user quotas are simpler...)



On 05/03/15 17:27, Eric Sandeen wrote:
> On 3/5/15 11:05 AM, Harry wrote:
>> Thanks for the reply Eric.
>>
>> One of our problems is that we're limited in terms of what
>> manipulations we can apply to the live system, so instead we've
>> been running our experiments against the backup system.  You're
>> quite right that DRBD may be introducing some weirdness of its own,
>> so those experiments may not be safe to draw conclusions from.
>>
>> Here's what we know about the live system
>> -> it had an outage, equivalent to having its power cable yanked, or doing an 'echo b > /proc/sysrq-trigger'
>> -> when it came back, it decided to mount the drive without quotas.
>> -> we saw a message in syslog saying " Failed to initialize disk quotas"
>> -> last time we had to run a quotacheck (several months ago) it took about 2 hours.
>>
>> We can repro the quotacheck issue on our test clusters, as follows:
>> -> kick off a job that writes to the disk
>> -> hard reboot with "echo b > /proc/sysrq-trigger"
>> -> on next boot, see "Failed to initialize disk quotas" message, xfs mounts without quotas
>> -> soft reboot with "reboot"
>> -> on next boot, see "Quotacheck needed: Please wait." message.
>> -> Quotacheck completes some time later.
>>
>> So our best-case scenario is that, next time we reboot, we'll have an
>> outage of about 2 hours. And our paranoid worst-case scenario,
>> induced by our experiments with our drbd backup drives, is that the
>> disk will actually turn out not to be mountable at all.
>>
>> Is that "quotacheck always required after hard reboot" behaviour that
>> we're observing something you expected? You seemed to be saying that
>> the fact that quotas are journaled should mean it's not needed?
> In general, that's correct.  It's not clear why "Failed to initialize disk quotas"
> appeared; that seems closer to the root cause.  But again, we don't have your
> full logs to look at, so I don't know if anything else offers a clue.  (For that
> matter, we don't even know what kernel version you're on...)
>
> here, on a recent 4.0-rc1 kernel:
>
> # mount -o quota /dev/sdc6 /mnt/test
> # cp -aR /lib/modules/ /mnt/test
> # echo b > /proc/sysrq-trigger
>
> [152807.209688] sysrq: SysRq : Resetting
> ...
> <reboots>
>
> # mount -o quota /dev/sdc6 /mnt/test
> # dmesg | tail -n 3
> [   90.822601] XFS (sdc6): Mounting V4 Filesystem
> [   90.921346] XFS (sdc6): Starting recovery (logdev: internal)
> [   93.399133] XFS (sdc6): Ending recovery (logdev: internal)
> #
>
> -Eric

Rgds,
Harry + the PythonAnywhere team.

-- 
Harry Percival
Developer
harry@pythonanywhere.com

PythonAnywhere - a fully browser-based Python development and hosting environment
<http://www.pythonanywhere.com/>

PythonAnywhere LLP
17a Clerkenwell Road, London EC1M 5RD, UK
VAT No.: GB 893 5643 79
Registered in England and Wales as company number OC378414.
Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK


* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-03-05 17:34             ` Harry
@ 2015-03-05 17:44               ` Eric Sandeen
  2015-03-05 18:07                 ` Harry
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Sandeen @ 2015-03-05 17:44 UTC (permalink / raw)
  To: Harry, xfs; +Cc: developers

On 3/5/15 11:34 AM, Harry wrote:
> We're on 3.13.0-39 (Ubuntu Trusty).
> 
> If you're interested in looking into it further, I'd be happy to provide any extra info you'd like?

Well, not really.  It all works here, and you have an ... interesting
setup, so if you've decided that somehow ext4 will save you from
quotachecks in the future, I'm not going to dig a lot further here.

I did already ask for logs, which might tell us why the original quota init
failed, but ... 

> But just to make sure I'm not wasting any of your time -- I think the
> team have pretty much decided to make the switch no matter what. The
> quotacheck issue is one thing, but actually the switch to ext4
> simplifies lots of other aspects of our quota system (one of the
> reasons we picked xfs was to be able to use project quotas, but it
> turns out we don't need them any more, so user quotas are simpler...)

... it sounds like you've already picked your solution to this AFAICT
not-well-understood problem.

*shrug* knock yourself out.  :)  You should use what best meets your
needs, of course.

-Eric


* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-03-05 17:44               ` Eric Sandeen
@ 2015-03-05 18:07                 ` Harry
  2015-03-05 20:08                   ` Eric Sandeen
  2015-03-07 13:41                   ` Arkadiusz Miśkiewicz
  0 siblings, 2 replies; 20+ messages in thread
From: Harry @ 2015-03-05 18:07 UTC (permalink / raw)
  To: xfs

Here's the syslog, if you're curious.

http://pastebin.com/raw.php?i=kKvWJcze

Search for "Failed to initialize"

So your best guess is that it's the drbd layer that's causing the 
quotacheck?  Out of curiosity, I may try mounting a non-drbd drive with 
xfs, and seeing if we can still repro the hard-reboot-causes-quotacheck 
thing...  Unless you think it's just an old behaviour that's more to do 
with the version of the kernel we're using?

HP

On 05/03/15 17:44, Eric Sandeen wrote:
> On 3/5/15 11:34 AM, Harry wrote:
>> We're on 3.13.0-39 (Ubuntu Trusty).
>>
>> If you're interested in looking into it further, I'd be happy to provide any extra info you'd like?
> Well, not really.  It all works here, and you have an ... interesting
> setup, so if you've decided that somehow ext4 will save you from
> quotachecks in the future, I'm not going to dig a lot further here.
>
> I did already ask for logs, which might tell us why the original quota init
> failed, but ...
>
>> But just to make sure I'm not wasting any of your time -- I think the
>> team have pretty much decided to make the switch no matter what. The
>> quotacheck issue is one thing, but actually the switch to ext4
>> simplifies lots of other aspects of our quota system (one of the
>> reasons we picked xfs was to be able to use project quotas, but it
>> turns out we don't need them any more, so user quotas are simpler...)
> ... it sounds like you've already picked your solution to this AFAICT
> not-well-understood problem.
>
> *shrug* knock yourself out.  :)  You should use what best meets your
> needs, of course.
>
> -Eric

Rgds,
Harry + the PythonAnywhere team.

-- 
Harry Percival
Developer
harry@pythonanywhere.com

PythonAnywhere - a fully browser-based Python development and hosting environment
<http://www.pythonanywhere.com/>

PythonAnywhere LLP
17a Clerkenwell Road, London EC1M 5RD, UK
VAT No.: GB 893 5643 79
Registered in England and Wales as company number OC378414.
Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK


* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-03-05 18:07                 ` Harry
@ 2015-03-05 20:08                   ` Eric Sandeen
  2015-03-06 11:27                     ` Harry Percival
  2015-03-07 13:41                   ` Arkadiusz Miśkiewicz
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Sandeen @ 2015-03-05 20:08 UTC (permalink / raw)
  To: Harry, xfs

On 3/5/15 12:07 PM, Harry wrote:
> Here's the syslog, if you're curious.
> 
> http://pastebin.com/raw.php?i=kKvWJcze
> 
> Search for "Failed to initialize"

Ok, there is no other message offering more info, sadly.

> So your best guess is that it's the drbd layer that's causing the
> quotacheck?  Out of curiosity, I may try mounting a non-drbd drive
> with xfs, and seeing if we can still repro the
> hard-reboot-causes-quotacheck thing...  Unless you think it's just an
> old behaviour that's more to do with the version of the kernel we're
> using?

I really don't have a good guess at this point..... oh, wait, finally,
a bell goes off:

commit 5ef828c4152726f56751c78ea844f08d2b2a4fa3
Author: Eric Sandeen <sandeen@sandeen.net>
Date:   Mon Aug 4 11:35:44 2014 +1000

    xfs: avoid false quotacheck after unclean shutdown
    
    The commit
    
    83e782e xfs: Remove incore use of XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD
    
    added a new function xfs_sb_quota_from_disk() which swaps
    on-disk XFS_OQUOTA_* flags for in-core XFS_GQUOTA_* and XFS_PQUOTA_*
    flags after the superblock is read.
    
    However, if log recovery is required, the superblock is read again,
    and the modified in-core flags are re-read from disk, so we have
    XFS_OQUOTA_* flags in memory again.  This causes the
    XFS_QM_NEED_QUOTACHECK() test to be true, because the XFS_OQUOTA_CHKD
    is still set, and not XFS_GQUOTA_CHKD or XFS_PQUOTA_CHKD.
    
    Change xfs_sb_from_disk to call xfs_sb_quota_from disk and always
    convert the disk flags to in-memory flags.
    
    Add a lower-level function which can be called with "false" to
    not convert the flags, so that the sb verifier can verify
    exactly what was on disk, per Brian Foster's suggestion.
    
    Reported-by: Cyril B. <cbay@excellency.fr>
    Signed-off-by: Eric Sandeen <sandeen@redhat.com>


83e782e went in at v3.11; the above commit hit v3.17, so it was broken
for a while.

I still can't explain the "quota init failed" bit, but the above
probably explains the unexpected quotacheck problem.

-Eric

> HP


* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-03-05 20:08                   ` Eric Sandeen
@ 2015-03-06 11:27                     ` Harry Percival
  2015-03-06 21:11                       ` Dave Chinner
  0 siblings, 1 reply; 20+ messages in thread
From: Harry Percival @ 2015-03-06 11:27 UTC (permalink / raw)
  To: xfs

Glad we managed to nail down a probable culprit!   Here's hoping Debian 
and Ubuntu pull in a new kernel :)

In other news, any advice on running this

     xfstests:src/bstat

command as a way of estimating how long a quotacheck will take? Would it 
still be a useful estimator?  Do you think it would significantly affect 
the performance of a disk that's under fairly heavy use?

hp

On 05/03/15 20:08, Eric Sandeen wrote:
> On 3/5/15 12:07 PM, Harry wrote:
>> Here's the syslog, if you're curious.
>>
>> http://pastebin.com/raw.php?i=kKvWJcze
>>
>> Search for "Failed to initialize"
> Ok, there is no other message offering more info, sadly.
>
>> So your best guess is that it's the drbd layer that's causing the
>> quotacheck?  Out of curiosity, I may try mounting a non-drbd drive
>> with xfs, and seeing if we can still repro the
>> hard-reboot-causes-quotacheck thing...  Unless you think it's just an
>> old behaviour that's more to do with the version of the kernel we're
>> using?
> I really don't have a good guess at this point..... oh, wait, finally,
> a bell goes off:
>
> commit 5ef828c4152726f56751c78ea844f08d2b2a4fa3
> Author: Eric Sandeen <sandeen@sandeen.net>
> Date:   Mon Aug 4 11:35:44 2014 +1000
>
>      xfs: avoid false quotacheck after unclean shutdown
>      
>      The commit
>      
>      83e782e xfs: Remove incore use of XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD
>      
>      added a new function xfs_sb_quota_from_disk() which swaps
>      on-disk XFS_OQUOTA_* flags for in-core XFS_GQUOTA_* and XFS_PQUOTA_*
>      flags after the superblock is read.
>      
>      However, if log recovery is required, the superblock is read again,
>      and the modified in-core flags are re-read from disk, so we have
>      XFS_OQUOTA_* flags in memory again.  This causes the
>      XFS_QM_NEED_QUOTACHECK() test to be true, because the XFS_OQUOTA_CHKD
>      is still set, and not XFS_GQUOTA_CHKD or XFS_PQUOTA_CHKD.
>      
>      Change xfs_sb_from_disk to call xfs_sb_quota_from disk and always
>      convert the disk flags to in-memory flags.
>      
>      Add a lower-level function which can be called with "false" to
>      not convert the flags, so that the sb verifier can verify
>      exactly what was on disk, per Brian Foster's suggestion.
>      
>      Reported-by: Cyril B. <cbay@excellency.fr>
>      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>
>
> 83e782e went in at v3.11; the above commit hit v3.17, so it was broken
> for a while.
>
> I still can't explain the "quota init failed" bit, but the above
> probably explains the unexpected quotacheck problem.
>
> -Eric
>
>> HP

-- 
Harry Percival
Developer
harry@pythonanywhere.com

PythonAnywhere - a fully browser-based Python development and hosting environment
<http://www.pythonanywhere.com/>

PythonAnywhere LLP
17a Clerkenwell Road, London EC1M 5RD, UK
VAT No.: GB 893 5643 79
Registered in England and Wales as company number OC378414.
Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK


* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-03-06 11:27                     ` Harry Percival
@ 2015-03-06 21:11                       ` Dave Chinner
  2015-03-25 12:34                         ` Harry Percival
  0 siblings, 1 reply; 20+ messages in thread
From: Dave Chinner @ 2015-03-06 21:11 UTC (permalink / raw)
  To: Harry Percival; +Cc: xfs

On Fri, Mar 06, 2015 at 11:27:28AM +0000, Harry Percival wrote:
> Glad we managed to nail down a probable culprit!   Here's hoping
> Debian and Ubuntu pull in a new kernel :)
> 
> In other news, any advice on running this
> 
>     xfstests:src/bstat
> 
> command as a way of estimating how long a quotacheck will take?

It will give you an idea - quotacheck uses bulkstat, too.

> Would it still be a useful estimator?  Do you think it would
> significantly affect the performance of a disk that's under fairly
> heavy use?

Of course. Bulkstat drives the disks as hard as they will go.
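
A rough timing run, assuming you've built xfstests first (the exact 
arguments may differ between versions):

    # discard the per-inode output; the wall-clock time is your estimate
    time src/bstat /mnt/log_storage > /dev/null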

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-03-05 18:07                 ` Harry
  2015-03-05 20:08                   ` Eric Sandeen
@ 2015-03-07 13:41                   ` Arkadiusz Miśkiewicz
  1 sibling, 0 replies; 20+ messages in thread
From: Arkadiusz Miśkiewicz @ 2015-03-07 13:41 UTC (permalink / raw)
  To: xfs

On Thursday 05 of March 2015, Harry wrote:
> Here's the syslog, if you're curious.
> 
> http://pastebin.com/raw.php?i=kKvWJcze
> 
> Search for "Failed to initialize"

What does

xfs_db /dev/drbd0 -c "sb 0" -c "print" |grep quot

show?

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )


* Re: trying to avoid a lengthy quotacheck by deleting all quota data
  2015-03-06 21:11                       ` Dave Chinner
@ 2015-03-25 12:34                         ` Harry Percival
  0 siblings, 0 replies; 20+ messages in thread
From: Harry Percival @ 2015-03-25 12:34 UTC (permalink / raw)
  Cc: xfs

We've written up a sort of post-mortem blog post describing the whole saga:

http://blog.pythonanywhere.com/110/

We've tried hard to avoid kicking off some kind of filesystem flamewar 
while describing the whys and wherefores of our move from xfs to ext4, 
but if you feel we've misrepresented anything, do let us know -- I'm 
sure we can adjust the post.

Thanks again to everyone for your help debugging this stuff, and for a 
filesystem which served us excellently for many years.

rgds,
Harry + the team.


On 06/03/15 21:11, Dave Chinner wrote:
> On Fri, Mar 06, 2015 at 11:27:28AM +0000, Harry Percival wrote:
>> Glad we managed to nail down a probable culprit!   Here's hoping
>> Debian and Ubuntu pull in a new kernel :)
>>
>> In other news, any advice on running this
>>
>>      xfstests:src/bstat
>>
>> command as a way of estimating how long a quotacheck will take?
> It will give you an idea - quotacheck uses bulkstat, too.
>
>> Would it still be a useful estimator?  Do you think it would
>> significantly affect the performance of a disk that's under fairly
>> heavy use?
> Of course. Bulkstat drives the disks as hard as they will go.
>
> Cheers,
>
> Dave.

-- 
Harry Percival
Developer
harry@pythonanywhere.com

PythonAnywhere - a fully browser-based Python development and hosting environment
<http://www.pythonanywhere.com/>

PythonAnywhere LLP
17a Clerkenwell Road, London EC1M 5RD, UK
VAT No.: GB 893 5643 79
Registered in England and Wales as company number OC378414.
Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK

