On Mon, 2009-03-30 at 09:58 -0700, Linus Torvalds wrote:
>
> On Mon, 30 Mar 2009, Mark Lord wrote:
> >
> > I spent an entire day recently, trying to see if I could significantly fill
> > up the 32MB cache on a 750GB Hitachi SATA drive here.
> >
> > With deliberate/random write patterns, big and small, near and far,
> > I could not fill the drive with anything approaching a full second
> > of latent write-cache flush time.
> >
> > Not even close.  Which is a pity, because I really wanted to do some testing
> > related to a deep write cache.  But it just wouldn't happen.
> >
> > I tried this again on a 16MB cache of a Seagate drive, no difference.
> >
> > Bummer.  :)
>
> Try it with laptop drives. You might get to a second, or at least hundreds
> of ms (not counting the spinup delay if it went to sleep, obviously). You
> probably tested desktop drives (that 750GB Hitachi one is not a low end
> one, and I assume the Seagate one isn't either).

I had some fun trying things with this, and I've been able to reliably
trigger write-cache stalls of ~60 seconds on my Seagate 500GB SATA
drive.  The worst I saw was 214 seconds.

It took a little experimentation, and I had to switch to the noop
scheduler (no idea why).

Also, I had to watch vmstat closely.  When the test first started,
vmstat was reporting 500KB/s or so of write throughput.  After the test
ran for a few minutes, vmstat jumped up to 8MB/s.  My guess is that the
drive has some internal threshold for when it decides to only write in
cache.  The switch to 8MB/s is when it switched to cache-only goodness.

Or perhaps the attached program is buggy and I'll end up looking
silly... it was some quick coding.

The test forks two procs.  One proc does 4k writes to the first 26MB of
the test file (/dev/sdb for me).  These writes are O_DIRECT, and use a
block size of 4k.  The idea is that we fill the cache with work that is
very beneficial to keep in cache, but that the drive will tend to flush
out because it is filling up tracks.

The second proc O_DIRECT writes to two adjacent sectors far away from
the hot writes from the first proc, and it puts in a timestamp from
just before the write.  Every second or so, this timestamp is printed
to stderr.  The drive will want to keep these two sectors in cache
because we are constantly overwriting them.

(It's worth mentioning this is a destructive test.  Running it on
/dev/sdb will overwrite the first 64MB of the drive!!!!)

Sample output:

# ./wb-latency /dev/sdb
Found tv 1238434622.461527
starting hot writes run
starting tester run
current time 1238435045.529751
current time 1238435046.531250
...
current time 1238435063.772456
current time 1238435064.788639
current time 1238435065.814101
current time 1238435066.847704

Right here, I pull the power cord.  The box comes back up, and I run:

# ./wb-latency -c /dev/sdb
Found tv 1238435067.347829

When -c is passed, it just reads the timestamp out of the timestamp
block and exits.  You compare this value with the value printed just
before you pulled the plug.

For the run here, the two values are within .5s of each other.  The
tester only prints the time every one second, so anything that close is
very good.  I had pulled the plug before the drive got into that fast
8MB/s mode, so the drive was doing a pretty good job of fairly
servicing the cache.

My drive has a cache of 32MB.  Smaller caches probably need a smaller
hot zone.

-chris
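[Editor's note: the attached wb-latency program is not included in this
excerpt.  The sketch below is a minimal reconstruction from the
description in the mail, not Chris's actual code; the program name, the
26MB hot zone, and the 63MB offset of the timestamp sectors are
assumptions.  Like the original, it is destructive to the device you
point it at.]

/*
 * wb-latency (sketch): hammer a hot zone with 4k O_DIRECT writes while
 * repeatedly overwriting two far-away sectors with a timestamp.  After
 * pulling the power, run with -c to see the last timestamp that
 * actually reached the media.  DESTRUCTIVE to the given device.
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/time.h>

#define BLOCK_SIZE	4096
#define HOT_ZONE	(26 * 1024 * 1024)	/* hot random-write area */
#define STAMP_OFFSET	(63 * 1024 * 1024)	/* assumed: far from hot zone */
#define STAMP_SIZE	1024			/* two adjacent 512 byte sectors */

/* child: rewrite random 4k blocks inside the hot zone forever */
static void hot_writes(int fd, char *buf)
{
	fprintf(stderr, "starting hot writes run\n");
	while (1) {
		off_t block = random() % (HOT_ZONE / BLOCK_SIZE);
		if (pwrite(fd, buf, BLOCK_SIZE, block * BLOCK_SIZE) != BLOCK_SIZE) {
			perror("hot pwrite");
			exit(1);
		}
	}
}

/* parent: keep overwriting the far-away sectors with the current time */
static void stamp_writes(int fd, char *buf)
{
	struct timeval tv;
	time_t last = 0;

	fprintf(stderr, "starting tester run\n");
	while (1) {
		gettimeofday(&tv, NULL);
		memcpy(buf, &tv, sizeof(tv));
		if (pwrite(fd, buf, STAMP_SIZE, STAMP_OFFSET) != STAMP_SIZE) {
			perror("stamp pwrite");
			exit(1);
		}
		if (tv.tv_sec != last) {	/* print roughly once a second */
			fprintf(stderr, "current time %ld.%06ld\n",
				(long)tv.tv_sec, (long)tv.tv_usec);
			last = tv.tv_sec;
		}
	}
}

int main(int argc, char **argv)
{
	int check, fd;
	const char *dev;
	struct timeval tv;
	char *buf;

	if (argc < 2) {
		fprintf(stderr, "usage: wb-latency [-c] <device>\n");
		return 1;
	}
	check = argc > 2 && !strcmp(argv[1], "-c");
	dev = argv[argc - 1];

	if (posix_memalign((void **)&buf, BLOCK_SIZE, BLOCK_SIZE)) {
		perror("posix_memalign");
		return 1;
	}
	memset(buf, 0, BLOCK_SIZE);

	fd = open(dev, (check ? O_RDONLY : O_RDWR) | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* report the last timestamp that actually made it to the platter */
	if (pread(fd, buf, STAMP_SIZE, STAMP_OFFSET) != STAMP_SIZE) {
		perror("pread");
		return 1;
	}
	memcpy(&tv, buf, sizeof(tv));
	fprintf(stderr, "Found tv %ld.%06ld\n", (long)tv.tv_sec, (long)tv.tv_usec);
	if (check)
		return 0;

	if (fork() == 0)
		hot_writes(fd, buf);
	stamp_writes(fd, buf);
	return 0;
}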