* Intel SSD data loss: Any possible way this is user / software error?
@ 2010-08-12 21:02 Evan Jones
  2010-08-13 11:57 ` Eric Sandeen
  0 siblings, 1 reply; 4+ messages in thread
From: Evan Jones @ 2010-08-12 21:02 UTC (permalink / raw)
  To: linux-ext4

I'm testing a few systems that attempt to log data to disk reliably. I 
bought a brand new Intel SSD (X25-M G2) for this purpose. It appears to 
me that this disk does *not* store data reliably when there are power 
failures, even with write barriers, even with the cache disabled. I'm 
surprised that this disk might be this broken (possible), but it may 
also mean I've made a mistake. Is there any possible way that I have a 
bug in the test described below? The test works as expected with a 
couple SATA magnetic disks.


Configuration:

* Linux 2.6.32 (as distributed with Ubuntu 10.04)
* SATA SSD directly attached to the system's built-in controller (Intel 
N10/ICH7)
* ext4 with default options (meaning barrier=1)
* Disable the write cache (hdparm -W 0 /dev/sdb)


The test:

1. Write a 64 MB file of zeros (first use fallocate, then zero fill)
2. fsync()
3. write() blocks of this file with a sequence number.
4. fdatasync()
5. Send UDP packet reporting the sequence number written.
6. Go to 3.
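
The steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual test program: the record layout (the sequence number repeated across the whole block), the function names, and the `size` and `max_records` parameters are all assumptions made for the example, and the sketch simply stops at the end of the file rather than wrapping around.

```python
import os
import socket
import struct

FILE_SIZE = 64 * 1024 * 1024  # step 1: 64 MB log file


def make_record(seq, record_size):
    # Hypothetical layout: the 8-byte little-endian sequence number repeated
    # to fill the block, so a torn write is easy to spot afterwards.
    # record_size is assumed to be a multiple of 8.
    return struct.pack("<Q", seq) * (record_size // 8)


def write_all(fd, data, offset):
    # os.pwrite may write fewer bytes than requested; loop until done.
    view = memoryview(data)
    while view:
        written = os.pwrite(fd, view, offset)
        view = view[written:]
        offset += written


def preallocate(fd, size, chunk=1 << 20):
    # Steps 1-2: reserve space, zero-fill it, then fsync. size must be > 0.
    os.posix_fallocate(fd, 0, size)
    zeros = bytes(min(chunk, size))
    for offset in range(0, size, len(zeros)):
        write_all(fd, zeros[: size - offset], offset)
    os.fsync(fd)


def run(path, host, port, record_size, size=FILE_SIZE, max_records=None):
    """Write records until the file is full; return the last sequence number."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    seq = 0
    try:
        preallocate(fd, size)
        offset = 0
        while offset + record_size <= size and seq != max_records:
            seq += 1
            write_all(fd, make_record(seq, record_size), offset)  # step 3
            os.fdatasync(fd)                                      # step 4
            sock.sendto(struct.pack("<Q", seq), (host, port))     # step 5
            offset += record_size                                 # step 6
    finally:
        os.close(fd)
        sock.close()
    return seq
```

For example, `run("tmp", "workstation", 12345, 131072)` would write 128 kB records and report each one to UDP port 12345 on a host named `workstation`. The report is only sent after fdatasync() returns, so any sequence number the receiver has seen should be on stable storage.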

While this test is running, I pull the power out of the drive to 
simulate a hard failure. On the magnetic disks I have, this works as 
expected: On reboot, the log file contains the complete record that was 
reported as last written (it may also contain part of the next record).

On the X25-M, when I use large writes (128 kB), it loses data fairly
frequently (every couple of attempts): I either see the last log record as
being before the reported one, or occasionally I get a media error when
reading back the file.

I'm surprised that this disk could be this broken, but I suppose it is 
possible. Any help is welcomed. Thanks,

Evan Jones

-- 
Evan Jones
http://evanjones.ca/


* Re: Intel SSD data loss: Any possible way this is user / software error?
  2010-08-12 21:02 Intel SSD data loss: Any possible way this is user / software error? Evan Jones
@ 2010-08-13 11:57 ` Eric Sandeen
  2010-08-13 16:07   ` Evan Jones
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Sandeen @ 2010-08-13 11:57 UTC (permalink / raw)
  To: Evan Jones; +Cc: linux-ext4

Evan Jones wrote:
> I'm testing a few systems that attempt to log data to disk reliably. I
> bought a brand new Intel SSD (X25-M G2) for this purpose. It appears to
> me that this disk does *not* store data reliably when there are power
> failures, even with write barriers, even with the cache disabled. I'm
> surprised that this disk might be this broken (possible), but it may
> also mean I've made a mistake. Is there any possible way that I have a
> bug in the test described below? The test works as expected with a
> couple SATA magnetic disks.
> 
> 
> Configuration:
> 
> * Linux 2.6.32 (as distributed with Ubuntu 10.04)
> * SATA SSD directly attached to the system's built-in controller (Intel
> N10/ICH7)
> * ext4 with default options (meaning barrier=1)
> * Disable the write cache (hdparm -W 0 /dev/sdb)

Just out of curiosity, what do you see when the write cache is on?
Seems counter-intuitive that it'd work better, but talking w/
Ric Wheeler, he was curious... maybe Intel didn't test with the
write cache off?

Also, would you be willing to publish the test you're using?

Thanks,
-Eric

> 
> The test:
> 
> 1. Write a 64 MB file of zeros (first use fallocate, then zero fill)
> 2. fsync()
> 3. write() blocks of this file with a sequence number.
> 4. fdatasync()
> 5. Send UDP packet reporting the sequence number written.
> 6. Go to 3.
> 
> While this test is running, I pull the power out of the drive to
> simulate a hard failure. On the magnetic disks I have, this works as
> expected: On reboot, the log file contains the complete record that was
> reported as last written (it may also contain part of the next record).
> 
> On the X25-M, when I use large writes (128 kB), it loses data fairly
> frequently (every couple of attempts): I either see the last log record as
> being before the reported one, or occasionally I get a media error when
> reading back the file.
> 
> I'm surprised that this disk could be this broken, but I suppose it is
> possible. Any help is welcomed. Thanks,
> 
> Evan Jones
> 



* Re: Intel SSD data loss: Any possible way this is user / software error?
  2010-08-13 11:57 ` Eric Sandeen
@ 2010-08-13 16:07   ` Evan Jones
  2010-08-15 16:18     ` Eric Sandeen
  0 siblings, 1 reply; 4+ messages in thread
From: Evan Jones @ 2010-08-13 16:07 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: linux-ext4

On Aug 13, 2010, at 7:57, Eric Sandeen wrote:
> Just out of curiosity, what do you see when the write cache is on?
> Seems counter-intuitive that it'd work better, but talking w/
> Ric Wheeler, he was curious... maybe Intel didn't test with the
> write cache off?

Data loss is much easier to trigger with the write cache on. It
happens to me on the first try. With the write cache off, I've only
been able to get it to occur with large writes (64 kB or larger), and
only about once every 3 times.

Others have observed data loss with the write cache enabled using
Intel SSDs (links below). However, no one else seems to report data
loss with the cache disabled, which makes me wonder if I am doing
something wrong. With the X25-E:

http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/

And with the X25-M G2:

http://thread.gmane.org/gmane.os.solaris.opensolaris.zfs/33472


> Also, would you be willing to publish the test you're using?

The programs I have been using are here (but see below):

http://people.csail.mit.edu/evanj/hg/index.cgi/hstore/file/tip/logging/minlogcrash.c
http://people.csail.mit.edu/evanj/hg/index.cgi/hstore/file/tip/logging/logfilecrashserver.cc

minlogcrash.c is actually a simplified version of my *real* test  
program (below). However, that program has a lot of dependencies and  
unrelated crap. Unfortunately, I'm away from my hardware for the next  
10 days or so, so minlogcrash has not actually been crash tested. I  
think it should be equivalent, but just in case, the crash tested  
version is here:

http://people.csail.mit.edu/evanj/hg/index.cgi/hstore/file/tip/logging/logfilecrash.cc


My test procedure:

1. Start logfilecrashserver on a workstation:

./logfilecrashserver 12345

2. Start minlogcrash on the system under test (using large writes is  
more likely to lose data: 128 kB or so):

./minlogcrash tmp workstation 12345 131072

3. Once the workstation starts receiving log records, pull the power
from the back of the SSD.
4. Power off the system (my system doesn't support hotplug, so losing
the power on the SSD makes it unhappy).
5. Reconnect power to the SSD.
6. Power the server back on.
7. Observe the output of logfilecrash using hexdump.
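
For step 1, the receiving side only needs to record the highest sequence number it has seen. A minimal Python sketch of such a receiver follows; it is not the real logfilecrashserver, and the 8-byte little-endian payload format and the function names are assumptions made for the example.

```python
import socket
import struct


def open_server(port, host=""):
    # Bind a UDP socket on the given port (all interfaces by default).
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_reports(sock, limit=None):
    """Print each reported sequence number; return the highest one seen.

    Runs forever when limit is None, otherwise stops after `limit`
    well-formed packets.
    """
    highest = 0
    received = 0
    while limit is None or received < limit:
        payload, _addr = sock.recvfrom(64)
        if len(payload) != 8:
            continue  # ignore anything that is not an 8-byte report
        (seq,) = struct.unpack("<Q", payload)
        highest = max(highest, seq)
        received += 1
        print(f"received {seq} (highest so far {highest})", flush=True)
    return highest
```

Calling `receive_reports(open_server(12345))` would correspond to the `./logfilecrashserver 12345` invocation above. The last line printed before the power is pulled gives the sequence number that the log file must contain after reboot.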

You should find that the file has *at least* the last record reported  
by logfilecrashserver. It may have (part of) the next record. Error  
modes I have observed: it is missing the last reported record  
entirely; it has a truncated record; occasionally I get some sort of  
media error in the kernel and I can't read the entire file.
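
The check described above can be sketched in Python as follows. This is a hedged illustration rather than part of the published test programs: it assumes a hypothetical record layout in which each fixed-size record is its 8-byte little-endian sequence number repeated across the whole block, with sequence numbers starting at 1 over a zero-filled file. The names `scan_log` and `classify` are made up for the example.

```python
import struct

WORD = struct.Struct("<Q")  # assumed: 8-byte little-endian sequence number


def scan_log(data, record_size):
    """Return (last complete sequence number, partial next record present?)."""
    words_per_record = record_size // WORD.size
    last = 0
    for offset in range(0, len(data) - record_size + 1, record_size):
        block = data[offset:offset + record_size]
        want = WORD.pack(last + 1)
        words = [block[i:i + WORD.size] for i in range(0, record_size, WORD.size)]
        hits = words.count(want)
        if hits == words_per_record:
            last += 1  # record is complete, move on to the next one
            continue
        # First block that is not a complete record: it is either untouched
        # (no matching words) or a partially written next record.
        return last, hits > 0
    return last, False


def classify(data, record_size, reported_seq):
    """Compare the log contents against the last sequence number reported."""
    last, partial = scan_log(data, record_size)
    if last >= reported_seq:
        return "ok"  # may also hold (part of) the next record
    if last == reported_seq - 1 and partial:
        return "truncated"  # the reported record is only partly on disk
    return "missing"  # the reported record is absent entirely
```

With the log file read into memory, `classify(data, 131072, n)` reports how the file compares against the last sequence number `n` seen by the server. A media error would surface earlier, as an I/O error while reading the file, so it is not covered by this sketch.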

Finally, full disclosure: I tested this a lot more with the Intel SSD
than with my magnetic disks. With the magnetic disks and barrier=0, I
was able to very easily see "lost writes", but with barrier=1 it
seemed to work. However, I still need to go back and re-test the
magnetic disks multiple times, to ensure they are behaving the way I
expect.

Evan

--
Evan Jones
http://evanjones.ca/



* Re: Intel SSD data loss: Any possible way this is user / software error?
  2010-08-13 16:07   ` Evan Jones
@ 2010-08-15 16:18     ` Eric Sandeen
  0 siblings, 0 replies; 4+ messages in thread
From: Eric Sandeen @ 2010-08-15 16:18 UTC (permalink / raw)
  To: Evan Jones; +Cc: linux-ext4

Evan Jones wrote:
> On Aug 13, 2010, at 7:57, Eric Sandeen wrote:
>> Just out of curiosity, what do you see when the write cache is on?
>> Seems counter-intuitive that it'd work better, but talking w/
>> Ric Wheeler, he was curious... maybe Intel didn't test with the
>> write cache off?
> 
> Data loss is much easier to trigger with the write cache on. It happens
> to me on the first try. With the write cache off, I've only been able to
> get it to occur with large writes (64 kB or larger), and only about once
> every 3 times.

Ok, so working as expected then, really.

...

>> Also, would you be willing to publish the test you're using?
> 
> The programs I have been using are here (but see below):
> 
> http://people.csail.mit.edu/evanj/hg/index.cgi/hstore/file/tip/logging/minlogcrash.c
> 
> http://people.csail.mit.edu/evanj/hg/index.cgi/hstore/file/tip/logging/logfilecrashserver.cc

Cool, thanks for publishing all that info. A few people have done
power-loss testing, and it's always interesting to see what's been put
together.

I'll take a closer look at some point...

-Eric
