* Intel SSD data loss: Any possible way this is user / software error?
From: Evan Jones @ 2010-08-12 21:02 UTC
To: linux-ext4
I'm testing a few systems that attempt to log data to disk reliably. I
bought a brand new Intel SSD (X25-M G2) for this purpose. It appears to
me that this disk does *not* store data reliably when there are power
failures, even with write barriers, even with the cache disabled. I'm
surprised that this disk might be this broken (possible), but it may
also mean I've made a mistake. Is there any possible way that I have a
bug in the test described below? The test works as expected with a
couple SATA magnetic disks.
Configuration:
* Linux 2.6.32 (as distributed with Ubuntu 10.04)
* SATA SSD directly attached to the system's built-in controller (Intel
N10/ICH7)
* ext4 with default options (meaning barrier=1)
* Disable the write cache (hdparm -W 0 /dev/sdb)
The test:
1. Write a 64 MB file of zeros (first use fallocate, then zero fill)
2. fsync()
3. write() blocks of this file with a sequence number.
4. fdatasync()
5. Send UDP packet reporting the sequence number written.
6. Go to 3.
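For readers who want to reproduce the loop, here is a minimal C sketch of steps 1 through 5. It is not the actual test program. The record layout (every 8-byte word of a block holds the sequence number), the function names, and sending the sequence number in host byte order are all assumptions made for illustration, and the fallocate step is left out so that only the zero fill and fsync() remain.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all of buf at the given offset, retrying short or interrupted writes. */
int write_all(int fd, const unsigned char *buf, size_t len, off_t offset) {
    while (len > 0) {
        ssize_t n = pwrite(fd, buf, len, offset);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
        offset += n;
    }
    return 0;
}

/* Steps 1-2: fill the first `size` bytes with zeros, then fsync(). */
int zero_fill(int fd, size_t size) {
    unsigned char zeros[65536];
    memset(zeros, 0, sizeof(zeros));
    for (size_t done = 0; done < size;) {
        size_t chunk = size - done < sizeof(zeros) ? size - done : sizeof(zeros);
        if (write_all(fd, zeros, chunk, (off_t)done) != 0) return -1;
        done += chunk;
    }
    return fsync(fd);
}

/* Hypothetical record layout: each 8-byte word holds the sequence number. */
void fill_record(unsigned char *buf, size_t len, uint64_t seq) {
    for (size_t off = 0; off + sizeof(seq) <= len; off += sizeof(seq))
        memcpy(buf + off, &seq, sizeof(seq));
}

/* Steps 3-5 for one record: write(), fdatasync(), then report over UDP.
 * The sequence number is reported only after fdatasync() has returned.
 * Pass sock < 0 to skip the UDP report. */
int log_record(int fd, int sock, const struct sockaddr_in *dest,
               unsigned char *buf, size_t len, off_t offset, uint64_t seq) {
    fill_record(buf, len, seq);
    if (write_all(fd, buf, len, offset) != 0) return -1;
    if (fdatasync(fd) != 0) return -1;
    if (sock >= 0 &&
        sendto(sock, &seq, sizeof(seq), 0, (const struct sockaddr *)dest,
               sizeof(*dest)) < 0)
        return -1;
    return 0;
}
```

A driver would open the log file with O_RDWR | O_CREAT, call zero_fill(fd, 64 << 20) once, and then call log_record() in a loop with an increasing sequence number and offset (step 6).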
While this test is running, I pull the power out of the drive to
simulate a hard failure. On the magnetic disks I have, this works as
expected: On reboot, the log file contains the complete record that was
reported as last written (it may also contain part of the next record).
On the X25-M, when I use large writes (128 kB), it loses data fairly
frequently (every couple attempts): I either see the last log record as
being before the reported one, or occasionally I get a media error when
reading back the file.
I'm surprised that this disk could be this broken, but I suppose it is
possible. Any help is welcomed. Thanks,
Evan Jones
--
Evan Jones
http://evanjones.ca/
* Re: Intel SSD data loss: Any possible way this is user / software error?
From: Eric Sandeen @ 2010-08-13 11:57 UTC
To: Evan Jones; +Cc: linux-ext4
Evan Jones wrote:
> I'm testing a few systems that attempt to log data to disk reliably. I
> bought a brand new Intel SSD (X25-M G2) for this purpose. It appears to
> me that this disk does *not* store data reliably when there are power
> failures, even with write barriers, even with the cache disabled. I'm
> surprised that this disk might be this broken (possible), but it may
> also mean I've made a mistake. Is there any possible way that I have a
> bug in the test described below? The test works as expected with a
> couple SATA magnetic disks.
>
>
> Configuration:
>
> * Linux 2.6.32 (as distributed with Ubuntu 10.04)
> * SATA SSD directly attached to the system's built-in controller (Intel
> N10/ICH7)
> * ext4 with default options (meaning barrier=1)
> * Disable the write cache (hdparm -W 0 /dev/sdb)
Just out of curiosity, what do you see when the write cache is on?
Seems counter-intuitive that it'd work better, but talking w/
Ric Wheeler, he was curious... maybe Intel didn't test with the
write cache off?
Also, would you be willing to publish the test you're using?
Thanks,
-Eric
>
> The test:
>
> 1. Write a 64 MB file of zeros (first use fallocate, then zero fill)
> 2. fsync()
> 3. write() blocks of this file with a sequence number.
> 4. fdatasync()
> 5. Send UDP packet reporting the sequence number written.
> 6. Go to 3.
>
> While this test is running, I pull the power out of the drive to
> simulate a hard failure. On the magnetic disks I have, this works as
> expected: On reboot, the log file contains the complete record that was
> reported as last written (it may also contain part of the next record).
>
> On the X25-M, when I use large writes (128 kB), it loses data fairly
> frequently (every couple attempts): I either see the last log record as
> being before the reported one, or occasionally I get a media error when
> reading back the file.
>
> I'm surprised that this disk could be this broken, but I suppose it is
> possible. Any help is welcomed. Thanks,
>
> Evan Jones
>
* Re: Intel SSD data loss: Any possible way this is user / software error?
From: Evan Jones @ 2010-08-13 16:07 UTC
To: Eric Sandeen; +Cc: linux-ext4
On Aug 13, 2010, at 7:57 , Eric Sandeen wrote:
> Just out of curiosity, what do you see when the write cache is on?
> Seems counter-intuitive that it'd work better, but talking w/
> Ric Wheeler, he was curious... maybe Intel didn't test with the
> write cache off?
Data loss is much easier to trigger with the write cache on. It
happens to me on the first try. With the write cache off, I've only
been able to get it to occur with large writes (64 kB or larger), and
only about once every 3 times.
Others have observed data loss with the write cache enabled using
Intel SSDs. However, no one else seems to report data loss with the
cache disabled, which makes me wonder if I am doing something wrong.
With the X25-E:
http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/
And with the X25-M G2:
http://thread.gmane.org/gmane.os.solaris.opensolaris.zfs/33472
> Also, would you be willing to publish the test you're using?
The programs I have been using are here (but see below):
http://people.csail.mit.edu/evanj/hg/index.cgi/hstore/file/tip/logging/minlogcrash.c
http://people.csail.mit.edu/evanj/hg/index.cgi/hstore/file/tip/logging/logfilecrashserver.cc
minlogcrash.c is actually a simplified version of my *real* test
program (below). However, that program has a lot of dependencies and
unrelated crap. Unfortunately, I'm away from my hardware for the next
10 days or so, so minlogcrash has not actually been crash tested. I
think it should be equivalent, but just in case, the crash tested
version is here:
http://people.csail.mit.edu/evanj/hg/index.cgi/hstore/file/tip/logging/logfilecrash.cc
My test procedure:
1. Start logfilecrashserver on a workstation:
./logfilecrashserver 12345
2. Start minlogcrash on the system under test (using large writes is
more likely to lose data: 128 kB or so):
./minlogcrash tmp workstation 12345 131072
3. Once the workstation starts receiving log records, pull the power
from the back of the SSD.
4. Power off the system (my system doesn't support hotplug, so losing
the power on the SSD makes it unhappy).
5. Reconnect power to the SSD.
6. Power the server back on.
7. Observe the output of logfilecrash using hexdump.
You should find that the file has *at least* the last record reported
by logfilecrashserver. It may have (part of) the next record. Error
modes I have observed: it is missing the last reported record
entirely; it has a truncated record; occasionally I get some sort of
media error in the kernel and I can't read the entire file.
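Instead of reading the hexdump by eye, the check could be automated along the following lines. This is only a sketch, not one of the programs linked above: it assumes a hypothetical layout in which fixed-size records hold the same nonzero 8-byte sequence number in every word, and the function names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the record's sequence number if every 8-byte word matches the
 * first one; return 0 for a torn or never-written (all zero) record. */
uint64_t record_sequence(const unsigned char *rec, size_t len) {
    uint64_t first, word;
    if (len < sizeof(first)) return 0;
    memcpy(&first, rec, sizeof(first));
    for (size_t off = sizeof(first); off + sizeof(word) <= len;
         off += sizeof(word)) {
        memcpy(&word, rec + off, sizeof(word));
        if (word != first) return 0;
    }
    return first;
}

/* Scan the file record by record and store the highest sequence number
 * found in a complete record (0 if there is none). Returns 0 on success
 * and -1 on an open, allocation, or read error (e.g. a media error). */
int highest_complete_sequence(const char *path, size_t record_len,
                              uint64_t *highest) {
    if (record_len < sizeof(uint64_t)) return -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    unsigned char *rec = malloc(record_len);
    if (rec == NULL) {
        close(fd);
        return -1;
    }
    int status = 0;
    off_t offset = 0;
    *highest = 0;
    for (;;) {
        size_t got = 0;
        while (got < record_len) {
            ssize_t n = pread(fd, rec + got, record_len - got,
                              offset + (off_t)got);
            if (n < 0) {
                status = -1;
                break;
            }
            if (n == 0) break; /* end of file */
            got += (size_t)n;
        }
        if (status != 0 || got < record_len) break;
        uint64_t seq = record_sequence(rec, record_len);
        if (seq > *highest) *highest = seq;
        offset += (off_t)record_len;
    }
    free(rec);
    close(fd);
    return status;
}
```

The run passes if the call returns 0 and the stored value is at least the last sequence number the server reported. A smaller value corresponds to the missing or truncated record cases, and a return of -1 corresponds to the read failure case.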
Finally, full disclosure: I tested this a lot more with the Intel SSD
than with my magnetic disks. With the magnetic disks and barrier=0, I
was able to very easily see "lost writes", but with barrier=1 it
seemed to work. However, I still need to go back and re-test the
magnetic disks multiple times, to ensure they are behaving the way I
expect.
Evan
--
Evan Jones
http://evanjones.ca/
* Re: Intel SSD data loss: Any possible way this is user / software error?
From: Eric Sandeen @ 2010-08-15 16:18 UTC
To: Evan Jones; +Cc: linux-ext4
Evan Jones wrote:
> On Aug 13, 2010, at 7:57 , Eric Sandeen wrote:
>> Just out of curiosity, what do you see when the write cache is on?
>> Seems counter-intuitive that it'd work better, but talking w/
>> Ric Wheeler, he was curious... maybe Intel didn't test with the
>> write cache off?
>
> Data loss is much easier to trigger with the write cache on. It happens
> to me on the first try. With the write cache off, I've only been able to
> get it to occur with large writes (64 kB or larger), and only about once
> every 3 times.
Ok, so working as expected then, really.
...
>> Also, would you be willing to publish the test you're using?
>
> The programs I have been using are here (but see below):
>
> http://people.csail.mit.edu/evanj/hg/index.cgi/hstore/file/tip/logging/minlogcrash.c
>
> http://people.csail.mit.edu/evanj/hg/index.cgi/hstore/file/tip/logging/logfilecrashserver.cc
Cool, thanks for publishing all that info. A few people have done
power loss testing, and it's always interesting to see what's been put
together. I'll take a closer look at some point...
-Eric