* Power loss and zero-length files
@ 2013-08-23 14:59 Robert Widmer
  2013-08-23 15:20 ` Ben Myers
  2013-08-24 11:15 ` Christoph Hellwig
  0 siblings, 2 replies; 8+ messages in thread
From: Robert Widmer @ 2013-08-23 14:59 UTC (permalink / raw)
  To: xfs



I had a script that updated several files on an XFS filesystem using "sed
-i", and someone decided to power cycle the box without a sync after
running the script, and found that all the files that were updated were now
zero-length.

Curious, I ran the following script to try and isolate the behavior:


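#!/usr/bin/perl
use strict;
use warnings;

# Repeatedly rewrite 100 files via a temp file and rename, with no fsync.
my $dir = "/home/$ENV{USER}/XFSTest";
mkdir $dir;    # may already exist
chdir $dir or die "chdir $dir: $!";

my $filecount = 100;
my $tmpfile   = 'file.tmp';

while (1) {
    for (my $i = 0; $i < $filecount; $i++) {
        my $filename = "file.$i";
        open(my $out, ">", $tmpfile) or die "open $tmpfile: $!";
        print $out "Time:" . localtime() . "\n";
        close $out;
        rename $tmpfile, $filename;
    }
}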


On the following release/kernels in a VM:

Fedora 16 w/kernel 3.1.0-7.fc16.x86_64
Fedora 16 w/kernel 3.6.11-4.fc16.x86_64
Fedora 19 w/kernel 3.10-7.200.fc19.x86_64
Ubuntu 13.04 w/kernel 3.8.0-19-generic


And after a power cycle, all the files are zero-length with no extents.

(CentOS 6.4 w/kernel 2.6.32-358.14.1.el6.centos.plus.x86_64 has the binary
NULLS)
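
For anyone repeating the test, a minimal sketch of how the damage could be counted after reboot. The helper name zero_length_files is made up for illustration, and it only checks file size; it does not inspect extents.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the sorted names of zero-length regular files in $dir.
# (Hypothetical helper; checks size only, not extent maps.)
sub zero_length_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir $dir: $!";
    my @empty = sort grep { -f "$dir/$_" && -z _ } readdir($dh);
    closedir($dh);
    return @empty;
}

# Usage: pass the test directory as the first argument.
if (@ARGV) {
    my @empty = zero_length_files($ARGV[0]);
    printf "%d zero-length file(s)\n", scalar @empty;
    print "  $_\n" for @empty;
}
```

Run against the XFSTest directory after the power cycle, this would list every file.N that came back empty.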

Barriers are not disabled and drive cache:
[    2.145011] sd 2:0:0:0: [sda] Cache data unavailable
[    2.145013] sd 2:0:0:0: [sda] Assuming drive cache: write through
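
The same cache setting can also be read at runtime instead of from the boot log. The sketch below assumes the Linux sd driver's cache_type attribute under /sys/class/scsi_disk; the function name scsi_cache_types and its optional base-directory argument are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map each SCSI disk address (e.g. "2:0:0:0") to its reported cache type.
# $base defaults to the sysfs directory assumed above; it is a parameter
# only so the function can be pointed at another tree.
sub scsi_cache_types {
    my ($base) = @_;
    $base = '/sys/class/scsi_disk' unless defined $base;
    my %types;
    for my $path (glob("$base/*/cache_type")) {
        open(my $fh, '<', $path) or next;    # skip unreadable entries
        my $type = <$fh>;
        close($fh);
        next unless defined $type;
        chomp $type;
        my ($addr) = $path =~ m{([^/]+)/cache_type$};
        $types{$addr} = $type;
    }
    return \%types;
}

my $types = scsi_cache_types();
print "$_: $types->{$_}\n" for sort keys %$types;
```

On a machine without SCSI disks the glob matches nothing and the script prints nothing.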


The closest thing I can find in the documentation is the XFS FAQ which
mentions "you are looking at an inode which was flushed out, but whose data
was not", which seems to indicate that the inode writes and data writes are
not done in order, but nothing explicitly documents this.

Is this expected behavior?

I've added a sync to the end of my script to try and ensure this does not
happen again, and losing some amount of data after a power loss is
expected, but it seems counter-intuitive that the inode/data writes are not
done in order and that rapid file changes can result in such a large number
of files being zero-length.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: Power loss and zero-length files
  2013-08-23 14:59 Power loss and zero-length files Robert Widmer
@ 2013-08-23 15:20 ` Ben Myers
  2013-08-23 15:27   ` Robert Widmer
  2013-08-24 11:15 ` Christoph Hellwig
  1 sibling, 1 reply; 8+ messages in thread
From: Ben Myers @ 2013-08-23 15:20 UTC (permalink / raw)
  To: Robert Widmer; +Cc: xfs

Hey Robert,

On Fri, Aug 23, 2013 at 10:59:50AM -0400, Robert Widmer wrote:
> I had a script that updated several files on an XFS filesystem using "sed
> -i", and someone decided to power cycle the box without a sync after
> running the script, and found that all the files that were updated were now
> zero-length.

How did they power cycle the box?  With a 'shutdown -h now' you shouldn't have
this behavior, but resetting or unplugging the machine is a different matter.

> Curious, I ran the following script to try and isolate the behavior:
> 
> 
> #!/usr/bin/perl
> 
> my $dir = "/home/$ENV{USER}/XFSTest";
> mkdir $dir;
> chdir $dir;
> 
> my $filecount = 100;
> my $tmpfile = 'file.tmp';
> 
> while (1) {
>     for (my $i=0; $i<$filecount; $i++) {
> my $filename = "file.$i";
> open(OUT, ">", $tmpfile);
>         print OUT "Time:".localtime."\n";
>         close OUT;
> rename $tmpfile, $filename;
>     }
> }
> 
> 
> On the following release/kernels in a VM:
> 
> Fedora 16 w/kernel 3.1.0-7.fc16.x86_64
> Fedora 16 w/kernel 3.6.11-4.fc16.x86_64
> Fedora 19 w/kernel 3.10-7.200.fc19.x86_64
> Ubuntu 13.04 w/kernel 3.8.0-19-generic
> 
> 
> And after a power cycle, all the files are zero-length with no extents.
> 
> (CentOS 6.4 w/kernel 2.6.32-358.14.1.el6.centos.plus.x86_64 has the binary
> NULLS)
> 
> Barriers are not disabled and drive cache:
> [    2.145011] sd 2:0:0:0: [sda] Cache data unavailable
> [    2.145013] sd 2:0:0:0: [sda] Assuming drive cache: write through
> 
> 
> The closest thing I can find in the documentation is the XFS FAQ which
> mentions "you are looking at an inode which was flushed out, but whose data
> was not", which seems to indicate that the inode writes and data writes are
> not done in order, but nothing explicitly documents this.

You have it correct.  The inode writes are separate from the data writes.

> Is this expected behavior?
> 
> I've added a sync to the end of my script to try and ensure this does not
> happen again, and losing some amount of data after a power loss is
> expected, but it seems counter-intuitive that the inode/data writes are not
> done in order and that rapid file changes can result in such a large number
> of files being zero-length.

For a reset or hard power cycle this is the expected behavior.  The inode will
have been logged when it was created and is likely to be written out before the
data.  Unless you issue an fsync, the data will be sitting around in cache
until the kernel decides to write the pages out, and only then is the size
updated.  Adding the fsync is the right thing to do.  ;)

Regards,
	Ben


* Re: Power loss and zero-length files
  2013-08-23 15:20 ` Ben Myers
@ 2013-08-23 15:27   ` Robert Widmer
  2013-08-23 15:45     ` Ben Myers
  0 siblings, 1 reply; 8+ messages in thread
From: Robert Widmer @ 2013-08-23 15:27 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Fri, Aug 23, 2013 at 11:20 AM, Ben Myers <bpm@sgi.com> wrote:
>
> Hey Robert,
>
> On Fri, Aug 23, 2013 at 10:59:50AM -0400, Robert Widmer wrote:
> > I had a script that updated several files on an XFS filesystem using "sed
> > -i", and someone decided to power cycle the box without a sync after
> > running the script, and found that all the files that were updated were now
> > zero-length.
>
> How did they power cycle the box?  With a 'shutdown -h now' you shouldn't have
> this behavior, but resetting or unplugging the machine is a different matter.

The person ran the script, unplugged the machine (instead of shutting
it down like they were told), and boxed it up.


> > Curious, I ran the following script to try and isolate the behavior:
> >
> >
> > #!/usr/bin/perl
> >
> > my $dir = "/home/$ENV{USER}/XFSTest";
> > mkdir $dir;
> > chdir $dir;
> >
> > my $filecount = 100;
> > my $tmpfile = 'file.tmp';
> >
> > while (1) {
> >     for (my $i=0; $i<$filecount; $i++) {
> > my $filename = "file.$i";
> > open(OUT, ">", $tmpfile);
> >         print OUT "Time:".localtime."\n";
> >         close OUT;
> > rename $tmpfile, $filename;
> >     }
> > }
> >
> >
> > On the following release/kernels in a VM:
> >
> > Fedora 16 w/kernel 3.1.0-7.fc16.x86_64
> > Fedora 16 w/kernel 3.6.11-4.fc16.x86_64
> > Fedora 19 w/kernel 3.10-7.200.fc19.x86_64
> > Ubuntu 13.04 w/kernel 3.8.0-19-generic
> >
> >
> > And after a power cycle, all the files are zero-length with no extents.
> >
> > (CentOS 6.4 w/kernel 2.6.32-358.14.1.el6.centos.plus.x86_64 has the binary
> > NULLS)
> >
> > Barriers are not disabled and drive cache:
> > [    2.145011] sd 2:0:0:0: [sda] Cache data unavailable
> > [    2.145013] sd 2:0:0:0: [sda] Assuming drive cache: write through
> >
> >
> > The closest thing I can find in the documentation is the XFS FAQ which
> > mentions "you are looking at an inode which was flushed out, but whose data
> > was not", which seems to indicate that the inode writes and data writes are
> > not done in order, but nothing explicitly documents this.
>
> You have it correct.  The inode writes are separate from the data writes.
>
> > Is this expected behavior?
> >
> > I've added a sync to the end of my script to try and ensure this does not
> > happen again, and losing some amount of data after a power loss is
> > expected, but it seems counter-intuitive that the inode/data writes are not
> > done in order and that rapid file changes can result in such a large number
> > of files being zero-length.
>
> For a reset or hard power cycle this is the expected behavior.  The inode will
> have been logged when it was created and is likely to be written out before the
> data.  Unless you issue an fsync, the data will be sitting around in cache
> until the kernel decides to write the pages out, and only then is the size
> updated.  Adding the fsync is the right thing to do.  ;)

Okey dokey, I'll be more vigilant in making sure my changes are
synced. Thanks for the quick response.


> Regards,
>         Ben


* Re: Power loss and zero-length files
  2013-08-23 15:27   ` Robert Widmer
@ 2013-08-23 15:45     ` Ben Myers
  2013-08-23 22:32       ` Stan Hoeppner
  0 siblings, 1 reply; 8+ messages in thread
From: Ben Myers @ 2013-08-23 15:45 UTC (permalink / raw)
  To: Robert Widmer; +Cc: xfs

On Fri, Aug 23, 2013 at 11:27:19AM -0400, Robert Widmer wrote:
> On Fri, Aug 23, 2013 at 11:20 AM, Ben Myers <bpm@sgi.com> wrote:
> >
> > Hey Robert,
> >
> > On Fri, Aug 23, 2013 at 10:59:50AM -0400, Robert Widmer wrote:
> > > I had a script that updated several files on an XFS filesystem using "sed
> > > -i", and someone decided to power cycle the box without a sync after
> > > running the script, and found that all the files that were updated were now
> > > zero-length.
> >
> > How did they power cycle the box?  With a 'shutdown -h now' you shouldn't have
> > this behavior, but resetting or unplugging the machine is a different matter.
> 
> The person ran the script, unplugged the machine (instead of shutting
> it down like they were told), and boxed it up.

lol  ;)

> > > Curious, I ran the following script to try and isolate the behavior:
> > >
> > >
> > > #!/usr/bin/perl
> > >
> > > my $dir = "/home/$ENV{USER}/XFSTest";
> > > mkdir $dir;
> > > chdir $dir;
> > >
> > > my $filecount = 100;
> > > my $tmpfile = 'file.tmp';
> > >
> > > while (1) {
> > >     for (my $i=0; $i<$filecount; $i++) {
> > > my $filename = "file.$i";
> > > open(OUT, ">", $tmpfile);
> > >         print OUT "Time:".localtime."\n";
> > >         close OUT;
> > > rename $tmpfile, $filename;
> > >     }
> > > }
> > >
> > >
> > > On the following release/kernels in a VM:
> > >
> > > Fedora 16 w/kernel 3.1.0-7.fc16.x86_64
> > > Fedora 16 w/kernel 3.6.11-4.fc16.x86_64
> > > Fedora 19 w/kernel 3.10-7.200.fc19.x86_64
> > > Ubuntu 13.04 w/kernel 3.8.0-19-generic
> > >
> > >
> > > And after a power cycle, all the files are zero-length with no extents.
> > >
> > > (CentOS 6.4 w/kernel 2.6.32-358.14.1.el6.centos.plus.x86_64 has the binary
> > > NULLS)
> > >
> > > Barriers are not disabled and drive cache:
> > > [    2.145011] sd 2:0:0:0: [sda] Cache data unavailable
> > > [    2.145013] sd 2:0:0:0: [sda] Assuming drive cache: write through
> > >
> > >
> > > The closest thing I can find in the documentation is the XFS FAQ which
> > > mentions "you are looking at an inode which was flushed out, but whose data
> > > was not", which seems to indicate that the inode writes and data writes are
> > > not done in order, but nothing explicitly documents this.
> >
> > You have it correct.  The inode writes are separate from the data writes.
> >
> > > Is this expected behavior?
> > >
> > > I've added a sync to the end of my script to try and ensure this does not
> > > happen again, and losing some amount of data after a power loss is
> > > expected, but it seems counter-intuitive that the inode/data writes are not
> > > done in order and that rapid file changes can result in such a large number
> > > of files being zero-length.
> >
> > For a reset or hard power cycle this is the expected behavior.  The inode will
> > have been logged when it was created and is likely to be written out before the
> > data.  Unless you issue an fsync, the data will be sitting around in cache
> > until the kernel decides to write the pages out, and only then is the size
> > updated.  Adding the fsync is the right thing to do.  ;)
> 
> Okey dokey, I'll be more vigilant in making sure my changes are
> synced. Thanks for the quick response.

No problem.  Good luck!  

-Ben


* Re: Power loss and zero-length files
  2013-08-23 15:45     ` Ben Myers
@ 2013-08-23 22:32       ` Stan Hoeppner
  2013-08-23 22:50         ` Ben Myers
  0 siblings, 1 reply; 8+ messages in thread
From: Stan Hoeppner @ 2013-08-23 22:32 UTC (permalink / raw)
  To: Ben Myers; +Cc: Robert Widmer, xfs

On 8/23/2013 10:45 AM, Ben Myers wrote:
> On Fri, Aug 23, 2013 at 11:27:19AM -0400, Robert Widmer wrote:
>> On Fri, Aug 23, 2013 at 11:20 AM, Ben Myers <bpm@sgi.com> wrote:

>> The person ran the script, unplugged the machine (instead of shutting
>> it down like they were told), and boxed it up.
> 
> lol  ;)

Yeah, that's pretty ignorant.  Reminds me of this thread in March:

http://oss.sgi.com/archives/xfs/2013-03/msg00152.html

But "we" (IT industry) shot ourselves in the foot when we began using
the word "appliance".  No wonder then that some people literally treat
them as such.

-- 
Stan


* Re: Power loss and zero-length files
  2013-08-23 22:32       ` Stan Hoeppner
@ 2013-08-23 22:50         ` Ben Myers
  2013-08-24  2:35           ` Stan Hoeppner
  0 siblings, 1 reply; 8+ messages in thread
From: Ben Myers @ 2013-08-23 22:50 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: Robert Widmer, xfs

Hey Stan,

On Fri, Aug 23, 2013 at 05:32:26PM -0500, Stan Hoeppner wrote:
> On 8/23/2013 10:45 AM, Ben Myers wrote:
> > On Fri, Aug 23, 2013 at 11:27:19AM -0400, Robert Widmer wrote:
> >> On Fri, Aug 23, 2013 at 11:20 AM, Ben Myers <bpm@sgi.com> wrote:
> 
> >> The person ran the script, unplugged the machine (instead of shutting
> >> it down like they were told), and boxed it up.
> > 
> > lol  ;)
> 
> Yeah, that's pretty ignorant.  Reminds me of this thread in March:
> 
> http://oss.sgi.com/archives/xfs/2013-03/msg00152.html
> 
> But "we" (IT industry) shot ourselves in the foot when we began using
> the word "appliance".  No wonder then that some people literally treat
> them as such.

On the other foot, it's probably not such an unreasonable expectation for
someone coming from a different background to expect that once it *looks* done
it actually *is* done.

It's as if you were to take me scuba diving, or drag racing... I have zero
expertise and I'd probably be doing all the silly ignorant things too.

Or... maybe the tech was just in a hurry and didn't think all the way through
the consequences.  I've been there.  ;)

Regards,
	Ben


* Re: Power loss and zero-length files
  2013-08-23 22:50         ` Ben Myers
@ 2013-08-24  2:35           ` Stan Hoeppner
  0 siblings, 0 replies; 8+ messages in thread
From: Stan Hoeppner @ 2013-08-24  2:35 UTC (permalink / raw)
  To: Ben Myers; +Cc: Robert Widmer, xfs

On 8/23/2013 5:50 PM, Ben Myers wrote:
> Hey Stan,
> 
> On Fri, Aug 23, 2013 at 05:32:26PM -0500, Stan Hoeppner wrote:
>> On 8/23/2013 10:45 AM, Ben Myers wrote:
>>> On Fri, Aug 23, 2013 at 11:27:19AM -0400, Robert Widmer wrote:
>>>> On Fri, Aug 23, 2013 at 11:20 AM, Ben Myers <bpm@sgi.com> wrote:
>>
>>>> The person ran the script, unplugged the machine (instead of shutting
>>>> it down like they were told), and boxed it up.
>>>
>>> lol  ;)
>>
>> Yeah, that's pretty ignorant.  Reminds me of this thread in March:
>>
>> http://oss.sgi.com/archives/xfs/2013-03/msg00152.html
>>
>> But "we" (IT industry) shot ourselves in the foot when we began using
>> the word "appliance".  No wonder then that some people literally treat
>> them as such.
> 
> On the other foot, it's probably not such an unreasonable expectation for
> someone coming from a different background to expect that once it *looks* done
> it actually *is* done.
> 
> It's as if you were to take me scuba diving, or drag racing... I have zero
> expertise and I'd probably be doing all the silly ignorant things too.
> 
> Or... maybe the tech was just in a hurry and didn't think all the way through
> the consequences.  I've been there.  ;)

I think the problem in many of these cases is that the people using the
hardware are not technicians, at least in the traditional sense.  So
when 'we' use the word appliance they treat the hardware like a TV or
DVD player.  With either, pulling the plug while it's running has no
detrimental effect.

The SCUBA and drag racing analogies don't really fit here as you're
completely out of your element.  And acquiring the skills to do either
requires many weeks of training, time in the water or behind the wheel.
 "shutdown -h now <ENTER>" requires no training, but just reading one
line on an instruction sheet and typing it in. ;)

-- 
Stan


* Re: Power loss and zero-length files
  2013-08-23 14:59 Power loss and zero-length files Robert Widmer
  2013-08-23 15:20 ` Ben Myers
@ 2013-08-24 11:15 ` Christoph Hellwig
  1 sibling, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2013-08-24 11:15 UTC (permalink / raw)
  To: Robert Widmer; +Cc: xfs

On Fri, Aug 23, 2013 at 10:59:50AM -0400, Robert Widmer wrote:
> I had a script that updated several files on an XFS filesystem using "sed
> -i", and someone decided to power cycle the box without a sync after
> running the script, and found that all the files that were updated were now
> zero-length.

From looking at the scripts, this looks expected.

> Curious, I ran the following script to try and isolate the behavior:
> 
> 
> #!/usr/bin/perl
> 
> my $dir = "/home/$ENV{USER}/XFSTest";
> mkdir $dir;
> chdir $dir;
> 
> my $filecount = 100;
> my $tmpfile = 'file.tmp';
> 
> while (1) {
>     for (my $i=0; $i<$filecount; $i++) {
> my $filename = "file.$i";
> open(OUT, ">", $tmpfile);
>         print OUT "Time:".localtime."\n";
>         close OUT;
> rename $tmpfile, $filename;
>     }
> }

There is nothing flushing the data out to disk, so if the XFS metadata
commit interval is shorter than the VM dirty writeback time, the above
is what you get.  Try doing the Perl equivalent of an fsync/fdatasync on
the OUT fd and things should be on disk.
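
A minimal sketch of what that could look like for the loop body above, assuming the core IO::Handle module's flush and sync methods (sync calls fsync on the underlying descriptor). The function name replace_file and the ".tmp" suffix are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;    # provides flush() and sync() on file handles

# Write $data to a temporary file, force it to stable storage, then
# rename it over $filename. The ".tmp" suffix is an arbitrary choice.
sub replace_file {
    my ($filename, $data) = @_;
    my $tmpfile = "$filename.tmp";

    open(my $out, '>', $tmpfile) or die "open $tmpfile: $!";
    print {$out} $data           or die "print $tmpfile: $!";
    $out->flush                  or die "flush $tmpfile: $!";  # Perl buffer -> kernel
    $out->sync                   or die "fsync $tmpfile: $!";  # kernel cache -> disk
    close($out)                  or die "close $tmpfile: $!";

    rename($tmpfile, $filename)  or die "rename $tmpfile: $!";
}

# Example call, mirroring one iteration of the original loop:
#   replace_file("file.0", "Time:" . localtime() . "\n");
```

Because the data is synced before the rename, the new name should never point at an empty file. This sketch does not fsync the containing directory, so the rename itself could still be lost on power failure, in which case the old contents would remain under the old name.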

> Barriers are not disabled and drive cache:
> [    2.145011] sd 2:0:0:0: [sda] Cache data unavailable
> [    2.145013] sd 2:0:0:0: [sda] Assuming drive cache: write through

What kind of disk is this?  You said VM above, so I'd be curious what
kind of VM doesn't support the scsi caching mode pages.

