* Hyper-Threading Vulnerability @ 2005-05-13  5:51 Gabor MICSKO
  2005-05-13 12:47 ` Barry K. Nathan
  2005-05-13 18:03 ` Andi Kleen
  0 siblings, 2 replies; 144+ messages in thread
From: Gabor MICSKO @ 2005-05-13  5:51 UTC (permalink / raw)
To: linux-kernel

Hi!

From http://kerneltrap.org/node/5103

``Hyper-Threading, as currently implemented on Intel Pentium Extreme
Edition, Pentium 4, Mobile Pentium 4, and Xeon processors, suffers from
a serious security flaw," Colin explains. "This flaw permits local
information disclosure, including allowing an unprivileged user to
steal an RSA private key being used on the same machine. Administrators
of multi-user systems are strongly advised to take action to disable
Hyper-Threading immediately."

``More'' info here:
http://www.daemonology.net/hyperthreading-considered-harmful/

Does this flaw affect the current stable Linux kernels? Is there a
workaround or a patch?

Thanks.

- MG
* Re: Hyper-Threading Vulnerability
From: Barry K. Nathan @ 2005-05-13 12:47 UTC (permalink / raw)
To: Gabor MICSKO; +Cc: linux-kernel

On Fri, May 13, 2005 at 07:51:20AM +0200, Gabor MICSKO wrote:
> Is this flaw affects the current stable Linux kernels? Workaround?
> Patch?

Some pages with relevant information:
http://www.ussg.iu.edu/hypermail/linux/kernel/0403.2/0920.html
http://bugzilla.kernel.org/show_bug.cgi?id=2317

AFAICT, the workaround is something like this:

1. If possible, disable HyperThreading in the BIOS.
2. If you have only one CPU, boot a UP kernel rather than SMP.
3. If you have 2 or more CPUs and you can't disable HT in the BIOS,
   boot with "maxcpus=n", where "n" is the number of physical CPUs in
   the computer (e.g. "maxcpus=2"). If you are running a kernel earlier
   than 2.6.5 or 2.4.26, this probably isn't going to work. If you try
   this, check dmesg afterward to make sure it worked properly (see the
   bugzilla.kernel.org URL for details).
4. If you would try #3 except you are running a 2.4.xx *vendor* kernel
   (not mainline), where xx < 26, try "noht".
5. If #3 and #4 don't work, try "acpi=off".

Option #3 ("maxcpus=2") is what I expect to be deploying in the next
several hours, FWIW...

-Barry K. Nathan <barryn@pobox.com>
* Re: Hyper-Threading Vulnerability
From: Jeff Garzik @ 2005-05-13 14:10 UTC (permalink / raw)
To: Barry K. Nathan; +Cc: Gabor MICSKO, linux-kernel

Barry K. Nathan wrote:
> On Fri, May 13, 2005 at 07:51:20AM +0200, Gabor MICSKO wrote:
>> Is this flaw affects the current stable Linux kernels? Workaround?
>> Patch?

Simple. Just boot a uniprocessor kernel, and/or disable HT in BIOS.

> Some pages with relevant information:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0403.2/0920.html
> http://bugzilla.kernel.org/show_bug.cgi?id=2317

These pages have zero information on the "flaw." In fact, I can see no
information at all proving that there is even a problem here.

Classic "I found a problem, but I'm keeping the info a secret" security
crapola.

	Jeff
* Re: Hyper-Threading Vulnerability
From: Daniel Jacobowitz @ 2005-05-13 14:23 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Barry K. Nathan, Gabor MICSKO, linux-kernel

On Fri, May 13, 2005 at 10:10:36AM -0400, Jeff Garzik wrote:
> Barry K. Nathan wrote:
>> On Fri, May 13, 2005 at 07:51:20AM +0200, Gabor MICSKO wrote:
>>> Is this flaw affects the current stable Linux kernels? Workaround?
>>> Patch?
>
> Simple. Just boot a uniprocessor kernel, and/or disable HT in BIOS.
>
>> Some pages with relevant information:
>> http://www.ussg.iu.edu/hypermail/linux/kernel/0403.2/0920.html
>> http://bugzilla.kernel.org/show_bug.cgi?id=2317
>
> These pages have zero information on the "flaw." In fact, I can see no
> information at all proving that there is even a problem here.
>
> Classic "I found a problem, but I'm keeping the info a secret"
> security crapola.

FYI:
http://www.daemonology.net/hyperthreading-considered-harmful/

I don't much agree with Colin about the severity of the problem, but
I've read his paper, which should be generally available later today.
It's definitely a legitimate issue.

-- 
Daniel Jacobowitz
CodeSourcery, LLC
* Re: Hyper-Threading Vulnerability
From: Jeff Garzik @ 2005-05-13 14:32 UTC (permalink / raw)
To: Daniel Jacobowitz; +Cc: Barry K. Nathan, Gabor MICSKO, linux-kernel

Daniel Jacobowitz wrote:
> On Fri, May 13, 2005 at 10:10:36AM -0400, Jeff Garzik wrote:
>> Barry K. Nathan wrote:
>>> On Fri, May 13, 2005 at 07:51:20AM +0200, Gabor MICSKO wrote:
>>>> Is this flaw affects the current stable Linux kernels? Workaround?
>>>> Patch?
>>
>> Simple. Just boot a uniprocessor kernel, and/or disable HT in BIOS.
>>
>>> Some pages with relevant information:
>>> http://www.ussg.iu.edu/hypermail/linux/kernel/0403.2/0920.html
>>> http://bugzilla.kernel.org/show_bug.cgi?id=2317
>>
>> These pages have zero information on the "flaw." In fact, I can see
>> no information at all proving that there is even a problem here.
>>
>> Classic "I found a problem, but I'm keeping the info a secret"
>> security crapola.
>
> FYI:
> http://www.daemonology.net/hyperthreading-considered-harmful/

Already read it. This link provides no more information than either of
the above links provide.

> I don't much agree with Colin about the severity of the problem, but
> I've read his paper, which should be generally available later today.
> It's definitely a legitimate issue.

We'll see...

As of this moment, there continues to be _zero_ information proving
that a problem exists.

	Jeff
* Re: Hyper-Threading Vulnerability
From: Andy Isaacson @ 2005-05-13 17:13 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Daniel Jacobowitz, Barry K. Nathan, Gabor MICSKO, linux-kernel

On Fri, May 13, 2005 at 10:32:48AM -0400, Jeff Garzik wrote:
> Daniel Jacobowitz wrote:
>> http://www.daemonology.net/hyperthreading-considered-harmful/
>
> Already read it. This link provides no more information than either
> of the above links provide.

He's posted his paper now.

http://www.daemonology.net/papers/htt.pdf

It's a side channel timing attack on data-dependent computation through
the L1 and L2 caches. Nice work. In-the-wild exploitation is difficult,
though; your timing gets screwed up if you get scheduled away from your
victim, and you don't even know, because you can't tell where you were
scheduled, so on any reasonably busy multiuser system it's not clear
that the attack is practical.

-andy
* Re: Hyper-Threading Vulnerability
From: Vadim Lobanov @ 2005-05-13 18:30 UTC (permalink / raw)
To: Andy Isaacson; +Cc: Jeff Garzik, Daniel Jacobowitz, Barry K. Nathan, Gabor MICSKO, linux-kernel

On Fri, 13 May 2005, Andy Isaacson wrote:
> On Fri, May 13, 2005 at 10:32:48AM -0400, Jeff Garzik wrote:
>> Daniel Jacobowitz wrote:
>>> http://www.daemonology.net/hyperthreading-considered-harmful/
>>
>> Already read it. This link provides no more information than either
>> of the above links provide.
>
> He's posted his paper now.
>
> http://www.daemonology.net/papers/htt.pdf
>
> It's a side channel timing attack on data-dependent computation
> through the L1 and L2 caches. Nice work. In-the-wild exploitation is
> difficult, though; your timing gets screwed up if you get scheduled
> away from your victim, and you don't even know, because you can't
> tell where you were scheduled, so on any reasonably busy multiuser
> system it's not clear that the attack is practical.
>
> -andy

Wouldn't scheduling appear as a rather big time delta (in measuring the
cache access times), so you would know to disregard that data point?

(Just wondering... :-) )

-Vadim
* Re: Hyper-Threading Vulnerability
From: Andy Isaacson @ 2005-05-13 19:02 UTC (permalink / raw)
To: Vadim Lobanov; +Cc: Jeff Garzik, Daniel Jacobowitz, Barry K. Nathan, Gabor MICSKO, linux-kernel

On Fri, May 13, 2005 at 11:30:27AM -0700, Vadim Lobanov wrote:
> On Fri, 13 May 2005, Andy Isaacson wrote:
>> It's a side channel timing attack on data-dependent computation
>> through the L1 and L2 caches. Nice work. In-the-wild exploitation is
>> difficult, though; your timing gets screwed up if you get scheduled
>> away from your victim, and you don't even know, because you can't
>> tell where you were scheduled, so on any reasonably busy multiuser
>> system it's not clear that the attack is practical.
>
> Wouldn't scheduling appear as a rather big time delta (in measuring
> the cache access times), so you would know to disregard that data
> point?
>
> (Just wondering... :-) )

Good question. Yes, you can probably filter the data. The question is,
how hard is it to set up the conditions to acquire the data? You have
to be scheduled on the same core as the target process (sibling
threads). And you don't know when the target is going to be scheduled,
and on a real-world system, there are other threads competing for
scheduling; if it's SMP (2 core, 4 thread) with perfect 100%
utilization then you've only got a 33% chance of being scheduled on the
right thread, and it gets worse if the machine is idle since the kernel
should schedule you and the OpenSSL process on different cores...

Getting the conditions right is challenging. Not impossible, but
neither is it a foregone conclusion.

-andy
* Re: Hyper-Threading Vulnerability
From: Adrian Bunk @ 2005-05-15 9:31 UTC (permalink / raw)
To: Andy Isaacson; +Cc: Vadim Lobanov, Jeff Garzik, Daniel Jacobowitz, Barry K. Nathan, Gabor MICSKO, linux-kernel

On Fri, May 13, 2005 at 12:02:44PM -0700, Andy Isaacson wrote:
> On Fri, May 13, 2005 at 11:30:27AM -0700, Vadim Lobanov wrote:
>> Wouldn't scheduling appear as a rather big time delta (in measuring
>> the cache access times), so you would know to disregard that data
>> point?
>>
>> (Just wondering... :-) )
>
> Good question. Yes, you can probably filter the data. The question
> is, how hard is it to set up the conditions to acquire the data? You
> have to be scheduled on the same core as the target process (sibling
> threads). And you don't know when the target is going to be
> scheduled, and on a real-world system, there are other threads
> competing for scheduling; if it's SMP (2 core, 4 thread) with perfect
> 100% utilization then you've only got a 33% chance of being scheduled
> on the right thread, and it gets worse if the machine is idle since
> the kernel should schedule you and the OpenSSL process on different
> cores...
> ...

But if you start 3 processes in the idle case you might get a 100%
chance?

> -andy

cu
Adrian

-- 
"Is there not promise of rain?" Ling Tan asked suddenly out of the
darkness. There had been need of rain for many days. "Only a promise,"
Lao Er said.
Pearl S. Buck - Dragon Seed
* Re: Hyper-Threading Vulnerability
From: Gabor MICSKO @ 2005-05-13 17:14 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Daniel Jacobowitz, Barry K. Nathan, linux-kernel

More info in this paper:

http://www.daemonology.net/papers/htt.pdf

>> FYI:
>> http://www.daemonology.net/hyperthreading-considered-harmful/
>
> Already read it. This link provides no more information than either
> of the above links provide.
* Re: Hyper-Threading Vulnerability
From: Barry K. Nathan @ 2005-05-13 20:23 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Barry K. Nathan, Gabor MICSKO, linux-kernel

On Fri, May 13, 2005 at 10:10:36AM -0400, Jeff Garzik wrote:
> Barry K. Nathan wrote:
>> On Fri, May 13, 2005 at 07:51:20AM +0200, Gabor MICSKO wrote:
>>> Is this flaw affects the current stable Linux kernels? Workaround?
>>> Patch?
>
> Simple. Just boot a uniprocessor kernel, and/or disable HT in BIOS.
>
>> Some pages with relevant information:
>> http://www.ussg.iu.edu/hypermail/linux/kernel/0403.2/0920.html
>> http://bugzilla.kernel.org/show_bug.cgi?id=2317
>
> These pages have zero information on the "flaw." In fact, I can see
> no information at all proving that there is even a problem here.

I meant that those two URLs have relevant information regarding
disabling HT for those of us who can't simply boot a UP kernel or
disable HT in the BIOS, not that they had information on the flaw.

-Barry K. Nathan <barryn@pobox.com>
* Re: Hyper-Threading Vulnerability
From: Andi Kleen @ 2005-05-13 18:03 UTC (permalink / raw)
To: Gabor MICSKO; +Cc: linux-kernel

Gabor MICSKO <gmicsko@szintezis.hu> writes:
> Hi!
>
> From http://kerneltrap.org/node/5103
>
> ``Hyper-Threading, as currently implemented on Intel Pentium Extreme
> Edition, Pentium 4, Mobile Pentium 4, and Xeon processors, suffers
> from a serious security flaw," Colin explains. "This flaw permits
> local information disclosure, including allowing an unprivileged user
> to steal an RSA private key being used on the same machine.
> Administrators of multi-user systems are strongly advised to take
> action to disable Hyper-Threading immediately."
>
> ``More'' info here:
> http://www.daemonology.net/hyperthreading-considered-harmful/
>
> Is this flaw affects the current stable Linux kernels? Workaround?
> Patch?

This is not a kernel problem, but a user space problem. The fix is to
change the user space crypto code to need the same number of cache line
accesses on all keys.

Disabling HT for this would be the totally wrong approach, like
throwing out the baby with the bath water.

-Andi
* Re: Hyper-Threading Vulnerability
From: Eric Rannaud @ 2005-05-13 18:34 UTC (permalink / raw)
To: Andi Kleen; +Cc: Gabor MICSKO, linux-kernel

On Fri, 2005-05-13 at 20:03 +0200, Andi Kleen wrote:
> This is not a kernel problem, but a user space problem. The fix is to
> change the user space crypto code to need the same number of cache
> line accesses on all keys.

Well, this might not be trivial in general, and as pointed out by Colin
Percival, this would require a major rewrite of the OpenSSL RSA key
generation procedure. He also notes that other applications, a priori
less sensitive, might also be targeted. And obviously, it would be
impractical to ensure this property in all application code.

> Disabling HT for this would the totally wrong approach, like throwing
> out the baby with the bath water.

Colin also mentions another work-around, at the level of the scheduler:

"[...] action must be taken to ensure that no pair of threads execute
simultaneously on the same processor core if they have different
privileges. Due to the complexities of performing such privilege checks
correctly and based on the principle that security fixes should be
chosen in such a way as to minimize the potential for new bugs to be
introduced, we recommend that existing operating systems provide the
necessary avoidance of inappropriate co-scheduling by never scheduling
any two threads on the same core, i.e., by only scheduling threads on
the first thread associated with each processor core. The more complex
solution of allowing certain "blessed" pairs of threads to be scheduled
on the same processor core is best delayed until future operating
systems where it can be extensively tested.

In light of the potential for information to be leaked across context
switches, especially via the L2 and larger cache(s), we also recommend
that operating systems provide some mechanism for processes to request
special "secure" treatment, which would include flushing all caches
upon a context switch. It is not immediately clear whether it is
possible to use the occupancy of the cache across context switches as a
side channel, but if an unprivileged user can cause his code to
pre-empt a cryptographic operation (e.g., by operating with a higher
scheduling priority and being repeatedly woken up by another process),
then there is certainly a strong possibility of a side channel existing
even in the absence of Hyper-Threading."

Is that relevant to the Linux kernel?

/er.

-- 
"Sleep, she is for the weak"
http://www.eleves.ens.fr/home/rannaud/
* Re: Hyper-Threading Vulnerability
From: Alan Cox @ 2005-05-13 18:35 UTC (permalink / raw)
To: Andi Kleen; +Cc: Gabor MICSKO, Linux Kernel Mailing List

> This is not a kernel problem, but a user space problem. The fix is to
> change the user space crypto code to need the same number of cache
> line accesses on all keys.

You actually also need to hit the same cache line sequence on all keys
if you take a bit more care about it.

> Disabling HT for this would the totally wrong approach, like throwing
> out the baby with the bath water.

HT for most users is pretty irrelevant, it's a neat idea but the
benchmarks don't suggest it's too big a hit.
* Re: Hyper-Threading Vulnerability
From: Scott Robert Ladd @ 2005-05-13 18:49 UTC (permalink / raw)
To: Alan Cox; +Cc: Andi Kleen, Gabor MICSKO, Linux Kernel Mailing List

Alan Cox wrote:
> HT for most users is pretty irrelevant, its a neat idea but the
> benchmarks don't suggest its too big a hit

On real-world applications, I haven't seen HT boost performance by more
than 15% on a Pentium 4 -- and the usual gain is around 5%, if anything
at all. HT is a nice idea, but I don't enable it on my systems.

..Scott
* Re: Hyper-Threading Vulnerability
From: Andi Kleen @ 2005-05-13 19:08 UTC (permalink / raw)
To: Scott Robert Ladd; +Cc: Alan Cox, Gabor MICSKO, Linux Kernel Mailing List

On Fri, May 13, 2005 at 02:49:25PM -0400, Scott Robert Ladd wrote:
> Alan Cox wrote:
>> HT for most users is pretty irrelevant, its a neat idea but the
>> benchmarks don't suggest its too big a hit
>
> On real-world applications, I haven't seen HT boost performance by
> more than 15% on a Pentium 4 -- and the usual gain is around 5%, if
> anything at all. HT is a nice idea, but I don't enable it on my
> systems.

I saw better improvement in some cases. It always depends on the
workload. And on the generation of HT (there are three around). And
lots of other factors.

Even for your workload only it does not seem to me to be very rational
to throw away a 15% speedup with open eyes.

-Andi
* Re: Hyper-Threading Vulnerability
From: Grant Coady @ 2005-05-13 19:36 UTC (permalink / raw)
To: Scott Robert Ladd; +Cc: Alan Cox, Andi Kleen, Gabor MICSKO, Linux Kernel Mailing List

On Fri, 13 May 2005 14:49:25 -0400, Scott Robert Ladd
<lkml@coyotegulch.com> wrote:
> Alan Cox wrote:
>> HT for most users is pretty irrelevant, its a neat idea but the
>> benchmarks don't suggest its too big a hit
>
> On real-world applications, I haven't seen HT boost performance by
> more than 15% on a Pentium 4 -- and the usual gain is around 5%, if
> anything at all. HT is a nice idea, but I don't enable it on my
> systems.

P4-HT is great for winxp, a runaway process only gets half the CPU
resources, keeps the system responsive. I like HT for that reason,
perhaps that's what it was designed for? Hardware fix for msft 'OS' :o)

Recently on single AMD CPU box, 2.6.latest-mm, diff got stuck, no disk
activity, 100% CPU, started another terminal, recompiled kernel with 8K
stacks and rebooted, the whole time the unkillable 'diff' was using
just over 1/2 of resources. top showed all 1GB RAM in use, no swap
activity, nothing odd in /proc/whatever -- only happened once. I
suspected 4k stacks as only change before 'crash' was turning on samba
server day before, but I didn't trace 'problem' as it wasn't really a
crash.

Impressive -- seeing 2.6 handling a stupid process, business as usual
for everything else. Haven't had a problem since changing to 8K stacks.
nfs, samba and ssh terminals on reiserfs 3.6 on via sata. May have had
nvidia driver installed at the time, I now load that only when X
running (rare), mostly headless use.

--Grant.
* Re: Hyper-Threading Vulnerability
From: Linus Torvalds @ 2005-05-16 17:00 UTC (permalink / raw)
To: Scott Robert Ladd; +Cc: Alan Cox, Andi Kleen, Gabor MICSKO, Linux Kernel Mailing List

On Fri, 13 May 2005, Scott Robert Ladd wrote:
> Alan Cox wrote:
>> HT for most users is pretty irrelevant, its a neat idea but the
>> benchmarks don't suggest its too big a hit
>
> On real-world applications, I haven't seen HT boost performance by
> more than 15% on a Pentium 4 -- and the usual gain is around 5%, if
> anything at all. HT is a nice idea, but I don't enable it on my
> systems.

HT is _wonderful_ for latency reduction. Why people think "performance"
means "throughput" is something I'll never understand. Throughput is
_always_ secondary to latency, and really only becomes interesting when
it becomes a latency number (ie "I need higher throughput in order to
process these jobs in 4 hours instead of 8" - notice how the real issue
was again about _latency_).

Now, Linux tends to have pretty good CPU latency anyway, so it's not
usually that big of a deal, but I definitely enjoyed having a HT
machine over a regular UP one. I'm told the effect was even more
pronounced on XP. Of course, these days I enjoy having dual cores more,
though, and with multiple cores, the latency advantages of HT become
much less pronounced.

As to the HT "vulnerability", it really seems to be not a whole lot
different than what people saw with early SMP and (small) direct-mapped
caches. Thank God those days are gone. I'd be really surprised if
somebody is actually able to get a real-world attack on a real-world
pgp key usage or similar out of it (and as to the covert channel,
nobody cares).

It's a fairly interesting approach, but it's certainly neither new nor
HT-specific, nor does it necessarily seem all that worrying in real
life.

(HT and modern CPU speeds just means that the covert channel is
_faster_ than it has been before, since you can test the L1 at core
speeds. I doubt it helps the key attack much, though, since faster in
that case cuts both ways: the speed of testing the cache eviction may
have gone up, but so has the speed of the operation you're trying to
follow, and you'd likely have a really hard time trying to catch things
in real life).

It does show that if you want to hide key operations, you want to be
careful. I don't think HT is at fault per se.

		Linus
* Re: Hyper-Threading Vulnerability
From: Tommy Reynolds @ 2005-05-16 12:37 UTC (permalink / raw)
To: linux-kernel

Uttered Linus Torvalds <torvalds@osdl.org>, spake thus:
> It does show that if you want to hide key operations, you want to be
> careful. I don't think HT is at fault per se.

Trivially easy when two processes share the same FS namespace.
Consider two files:

    $ ls -l /tmp/a /tmp/b
    -rw------- 1 owner owner xxxxx /tmp/a
    -rw------- 1 owner owner xxxxx /tmp/b

One file serves as a clock. Note that the permissions deny all access
to everyone except the owner. The owner user then does this,
intentionally or unintentionally:

    for x in 0 1 0 0 0 0 0 1
    do
        rm -f /tmp/a /tmp/b
        case "$x" in
            1 ) touch /tmp/a;;
        esac
        touch /tmp/b
        sleep 2
    done

And the baddie does this:

    char=0
    for n in 1 2 3 4 5 6 7 8
    do
        # wait for the clock file to appear
        while [ ! -f /tmp/b ]; do
            sleep 0.5
        done
        char=$((char << 1))
        if [ -f /tmp/a ]; then
            char=$((char + 1))
        fi
        # wait for the clock file to disappear before the next bit
        while [ -f /tmp/b ]; do
            sleep 0.5
        done
    done
    printf "The letter was: \\$(printf '%03o' "$char")\n"

The baddie never reads either file; the mere existence of /tmp/a while
the clock file /tmp/b exists leaks one bit per tick. This is one of the
classic TEMPEST problems that secure systems have long had to deal
with. See, at no time did HT ever raise its ugly head ;-)

Cheers
* Re: Hyper-Threading Vulnerability
From: Bill Davidsen @ 2005-05-18 19:07 UTC (permalink / raw)
To: Alan Cox; +Cc: Andi Kleen, Gabor MICSKO, Linux Kernel Mailing List

On Fri, 13 May 2005, Alan Cox wrote:
>> This is not a kernel problem, but a user space problem. The fix is
>> to change the user space crypto code to need the same number of
>> cache line accesses on all keys.
>
> You actually also need to hit the same cache line sequence on all
> keys if you take a bit more care about it.
>
>> Disabling HT for this would the totally wrong approach, like
>> throwing out the baby with the bath water.
>
> HT for most users is pretty irrelevant, its a neat idea but the
> benchmarks don't suggest its too big a hit

This is one of those things which can give any result depending on the
measurement. For kernel compiles I might see a 5-30% reduction in clock
time, for threaded applications like web/mail/news not much, and for
applications which communicate via shared memory up to 50% because some
blocking system calls can be avoided and cache impact is lower.

In general I have to agree with the "too big," but I haven't seen any
indication that the hole can be exploited without being able to run a
custom application on the machine, so for single user machines and
servers the risk level seems low.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: Hyper-Threading Vulnerability
From: Richard F. Rebel @ 2005-05-13 18:38 UTC (permalink / raw)
To: Andi Kleen; +Cc: Gabor MICSKO, linux-kernel

On Fri, 2005-05-13 at 20:03 +0200, Andi Kleen wrote:
> This is not a kernel problem, but a user space problem. The fix is to
> change the user space crypto code to need the same number of cache
> line accesses on all keys.
>
> Disabling HT for this would the totally wrong approach, like throwing
> out the baby with the bath water.
>
> -Andi

Why? It's certainly reasonable to disable it for the time being and
even prudent to do so.

-- 
Richard F. Rebel

cat /dev/null > `tty`
* Re: Hyper-Threading Vulnerability
From: Andi Kleen @ 2005-05-13 19:05 UTC (permalink / raw)
To: Richard F. Rebel; +Cc: Gabor MICSKO, linux-kernel

On Fri, May 13, 2005 at 02:38:03PM -0400, Richard F. Rebel wrote:
> On Fri, 2005-05-13 at 20:03 +0200, Andi Kleen wrote:
>> This is not a kernel problem, but a user space problem. The fix is
>> to change the user space crypto code to need the same number of
>> cache line accesses on all keys.
>>
>> Disabling HT for this would the totally wrong approach, like
>> throwing out the baby with the bath water.
>>
>> -Andi
>
> Why? It's certainly reasonable to disable it for the time being and
> even prudent to do so.

No, I strongly disagree on that. The reasonable thing to do is to fix
the crypto code which has this vulnerability, not break a useful
performance enhancement for everybody else.

-Andi
* Re: Hyper-Threading Vulnerability 2005-05-13 19:05 ` Andi Kleen @ 2005-05-13 21:26 ` Andy Isaacson 2005-05-13 21:59 ` Matt Mackall ` (4 more replies) 2005-05-13 23:32 ` Paul Jakma 1 sibling, 5 replies; 144+ messages in thread From: Andy Isaacson @ 2005-05-13 21:26 UTC (permalink / raw) To: Andi Kleen; +Cc: Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso On Fri, May 13, 2005 at 09:05:49PM +0200, Andi Kleen wrote: > On Fri, May 13, 2005 at 02:38:03PM -0400, Richard F. Rebel wrote: > > Why? It's certainly reasonable to disable it for the time being and > > even prudent to do so. > > No, i strongly disagree on that. The reasonable thing to do is > to fix the crypto code which has this vulnerability, not break > a useful performance enhancement for everybody else. Pardon me for saying so, but that's bullshit. You're asking the crypto guys to give up a 5x performance gain (that's my wild guess) by giving up all their data-dependent algorithms and contorting their code wildly, to avoid a microarchitectural problem with Intel's HT implementation. There are three places to cut off the side channel, none of which is obviously the right one. 1. The HT implementation could do the cache tricks Colin suggested in his paper. Fairly large performance hit to address a fairly small problem. 2. The OS could do the scheduler tricks to avoid scheduling unfriendly threads on the same core. You're leaving a lot of the benefit of HT on the floor by doing so. 3. Every security-sensitive app can be rigorously audited and re-written to avoid *ever* referencing memory with the address determined by private data. (3) is a complete non-starter. It's just not feasible to rewrite all that code. Furthermore, there's no way to know what code needs to be rewritten! (Until someone publishes an advisory, that is...) Hmm, I can't think of any reason that this technique wouldn't work to extract information from kernel secrets, as well... 
If SHA has plaintext-dependent memory references, Colin's technique would enable an adversary to extract the contents of the /dev/random pools. I don't *think* SHA does, based on a quick reading of lib/sha1.c, but someone with an actual clue should probably take a look. Andi, are you prepared to *require* that no code ever make a memory reference as a function of a secret? Because that's what you're suggesting the crypto people should do. -andy ^ permalink raw reply [flat|nested] 144+ messages in thread
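Andy's question about lib/sha1.c can be checked directly: the SHA-1 compression step is pure arithmetic. A simplified round step, modeled loosely on the kernel's lib/sha1.c (hypothetical helper names, not the actual kernel source) — note that every operation is a rotate, xor or add, and the workspace W[] is indexed only by the round counter, never by the data being hashed:

```c
#include <stdint.h>

static uint32_t rol32(uint32_t v, int bits)
{
    return (v << bits) | (v >> (32 - bits));
}

/* One round of the first 20 SHA-1 rounds.  No memory access here
 * depends on the message contents: W[] is indexed by t alone, so no
 * plaintext-dependent cache line is ever touched. */
static void sha1_round0(const uint32_t W[16], int t,
                        uint32_t *a, uint32_t *b, uint32_t *c,
                        uint32_t *d, uint32_t *e)
{
    uint32_t f = (*b & *c) | ((~*b) & *d);            /* Ch(b,c,d) */
    uint32_t tmp = rol32(*a, 5) + f + *e + 0x5a827999u + W[t & 15];

    *e = *d;
    *d = *c;
    *c = rol32(*b, 30);
    *b = *a;
    *a = tmp;
}
```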
* Re: Hyper-Threading Vulnerability 2005-05-13 21:26 ` Andy Isaacson @ 2005-05-13 21:59 ` Matt Mackall 2005-05-13 22:47 ` Alan Cox 2005-05-14 0:39 ` dean gaudet ` (3 subsequent siblings) 4 siblings, 1 reply; 144+ messages in thread From: Matt Mackall @ 2005-05-13 21:59 UTC (permalink / raw) To: Andy Isaacson Cc: Andi Kleen, Richard F. Rebel, Gabor MICSKO, linux-kernel, tytso On Fri, May 13, 2005 at 02:26:20PM -0700, Andy Isaacson wrote: > On Fri, May 13, 2005 at 09:05:49PM +0200, Andi Kleen wrote: > > On Fri, May 13, 2005 at 02:38:03PM -0400, Richard F. Rebel wrote: > > > Why? It's certainly reasonable to disable it for the time being and > > > even prudent to do so. > > > > No, i strongly disagree on that. The reasonable thing to do is > > to fix the crypto code which has this vulnerability, not break > > a useful performance enhancement for everybody else. > > Pardon me for saying so, but that's bullshit. You're asking the crypto > guys to give up a 5x performance gain (that's my wild guess) by giving > up all their data-dependent algorithms and contorting their code wildly, > to avoid a microarchitectural problem with Intel's HT implementation. > > There are three places to cut off the side channel, none of which is > obviously the right one. > 1. The HT implementation could do the cache tricks Colin suggested in > his paper. Fairly large performance hit to address a fairly small > problem. > 2. The OS could do the scheduler tricks to avoid scheduling unfriendly > threads on the same core. You're leaving a lot of the benefit of HT > on the floor by doing so. > 3. Every security-sensitive app can be rigorously audited and re-written > to avoid *ever* referencing memory with the address determined by > private data. > > (3) is a complete non-starter. It's just not feasible to rewrite all > that code. Furthermore, there's no way to know what code needs to be > rewritten! (Until someone publishes an advisory, that is...) 
> > Hmm, I can't think of any reason that this technique wouldn't work to > extract information from kernel secrets, as well... > > If SHA has plaintext-dependent memory references, Colin's technique > would enable an adversary to extract the contents of the /dev/random > pools. I don't *think* SHA does, based on a quick reading of > lib/sha1.c, but someone with an actual clue should probably take a look. SHA1 should be fine, as are the pool mixing bits. Much more problematic is the ability to do timing attacks against the entropy gathering itself. If an attacker can guess the TSC value that gets mixed into the pool, that's a problem. It might not be much of a problem though. If he's a bit off per guess (really impressive), he'll still be many bits off by the time there's enough entropy in the primary pool to reseed the secondary pool so he can check his guesswork. -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 144+ messages in thread
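Matt's compounding-error argument can be illustrated with a toy mixer (an assumption-laden sketch, nothing like the real drivers/char/random.c): if each guessed timestamp is one bit off, the rotation smears the errors across the pool, so the attacker's reconstruction diverges by many bits by the time the pool is used.

```c
#include <stdint.h>

static uint32_t rol32(uint32_t v, int bits)
{
    return (v << bits) | (v >> (32 - bits));
}

/* Toy pool mixer: rotate the pool and xor in each timing sample.
 * Purely illustrative -- the real input pool mixing is much heavier. */
static uint32_t mix(uint32_t pool, const uint32_t *samples, int n)
{
    int i;

    for (i = 0; i < n; i++)
        pool = rol32(pool, 7) ^ samples[i];
    return pool;
}
```

With eight samples each guessed one bit wrong, the real and reconstructed pools end up differing in several independent bit positions, not just one.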
* Re: Hyper-Threading Vulnerability 2005-05-13 21:59 ` Matt Mackall @ 2005-05-13 22:47 ` Alan Cox 2005-05-13 23:00 ` Lee Revell 0 siblings, 1 reply; 144+ messages in thread From: Alan Cox @ 2005-05-13 22:47 UTC (permalink / raw) To: Matt Mackall Cc: Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Gwe, 2005-05-13 at 22:59, Matt Mackall wrote: > It might not be much of a problem though. If he's a bit off per guess > (really impressive), he'll still be many bits off by the time there's > enough entropy in the primary pool to reseed the secondary pool so he > can check his guesswork. You can also disable the tsc to user space in the intel processors. That's something they anticipated as being necessary in secure environments long ago. This makes the attack much harder. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 22:47 ` Alan Cox @ 2005-05-13 23:00 ` Lee Revell 2005-05-13 23:27 ` Dave Jones 0 siblings, 1 reply; 144+ messages in thread From: Lee Revell @ 2005-05-13 23:00 UTC (permalink / raw) To: Alan Cox Cc: Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Fri, 2005-05-13 at 23:47 +0100, Alan Cox wrote: > On Gwe, 2005-05-13 at 22:59, Matt Mackall wrote: > > It might not be much of a problem though. If he's a bit off per guess > > (really impressive), he'll still be many bits off by the time there's > > enough entropy in the primary pool to reseed the secondary pool so he > > can check his guesswork. > > You can also disable the tsc to user space in the intel processors. > Thats something they anticipated as being neccessary in secure > environments long ago. This makes the attack much harder. And break the hundreds of apps that depend on rdtsc? Am I missing something? Lee ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 23:00 ` Lee Revell @ 2005-05-13 23:27 ` Dave Jones 2005-05-13 23:38 ` Lee Revell 0 siblings, 1 reply; 144+ messages in thread From: Dave Jones @ 2005-05-13 23:27 UTC (permalink / raw) To: Lee Revell Cc: Alan Cox, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Fri, May 13, 2005 at 07:00:12PM -0400, Lee Revell wrote: > On Fri, 2005-05-13 at 23:47 +0100, Alan Cox wrote: > > On Gwe, 2005-05-13 at 22:59, Matt Mackall wrote: > > > It might not be much of a problem though. If he's a bit off per guess > > > (really impressive), he'll still be many bits off by the time there's > > > enough entropy in the primary pool to reseed the secondary pool so he > > > can check his guesswork. > > > > You can also disable the tsc to user space in the intel processors. > > Thats something they anticipated as being neccessary in secure > > environments long ago. This makes the attack much harder. > > And break the hundreds of apps that depend on rdtsc? Am I missing > something? If those apps depend on rdtsc being a) present, and b) working without providing fallbacks, they're already broken. There's a reason it's displayed in /proc/cpuinfo's flags field, and visible through cpuid. Apps should be testing for presence before assuming features are present. Dave ^ permalink raw reply [flat|nested] 144+ messages in thread
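Dave's point in code: GCC's <cpuid.h> exposes the feature bits, and TSC is bit 4 of EDX from CPUID leaf 1. A sketch of the presence test an app should do before ever issuing rdtsc (a portable caller would fall back to gettimeofday() when this returns 0; the function name is made up):

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Check CPUID leaf 1, EDX bit 4 (TSC), mirroring the "tsc" flag shown
 * in /proc/cpuinfo, before assuming rdtsc exists. */
static int cpu_has_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx >> 4) & 1;   /* bit 4 == TSC */
#else
    return 0;                /* no rdtsc on this architecture */
#endif
}
```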
* Re: Hyper-Threading Vulnerability 2005-05-13 23:27 ` Dave Jones @ 2005-05-13 23:38 ` Lee Revell 2005-05-13 23:44 ` Dave Jones 2005-05-14 15:23 ` Alan Cox 0 siblings, 2 replies; 144+ messages in thread From: Lee Revell @ 2005-05-13 23:38 UTC (permalink / raw) To: Dave Jones Cc: Alan Cox, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Fri, 2005-05-13 at 19:27 -0400, Dave Jones wrote: > On Fri, May 13, 2005 at 07:00:12PM -0400, Lee Revell wrote: > > On Fri, 2005-05-13 at 23:47 +0100, Alan Cox wrote: > > > On Gwe, 2005-05-13 at 22:59, Matt Mackall wrote: > > > > It might not be much of a problem though. If he's a bit off per guess > > > > (really impressive), he'll still be many bits off by the time there's > > > > enough entropy in the primary pool to reseed the secondary pool so he > > > > can check his guesswork. > > > > > > You can also disable the tsc to user space in the intel processors. > > > Thats something they anticipated as being neccessary in secure > > > environments long ago. This makes the attack much harder. > > > > And break the hundreds of apps that depend on rdtsc? Am I missing > > something? > > If those apps depend on rdtsc being a) present, and b) working > without providing fallbacks, they're already broken. > > There's a reason its displayed in /proc/cpuinfo's flags field, > and visible through cpuid. Apps should be testing for presence > before assuming features are present. > Well yes but you would still have to recompile those apps. And take the big performance hit from using gettimeofday vs rdtsc. Disabling HT by default looks pretty good by comparison. Lee ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 23:38 ` Lee Revell @ 2005-05-13 23:44 ` Dave Jones 2005-05-14 7:37 ` Lee Revell 2005-05-14 15:23 ` Alan Cox 1 sibling, 1 reply; 144+ messages in thread From: Dave Jones @ 2005-05-13 23:44 UTC (permalink / raw) To: Lee Revell Cc: Alan Cox, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Fri, May 13, 2005 at 07:38:08PM -0400, Lee Revell wrote: > On Fri, 2005-05-13 at 19:27 -0400, Dave Jones wrote: > > On Fri, May 13, 2005 at 07:00:12PM -0400, Lee Revell wrote: > > > On Fri, 2005-05-13 at 23:47 +0100, Alan Cox wrote: > > > > On Gwe, 2005-05-13 at 22:59, Matt Mackall wrote: > > > > > It might not be much of a problem though. If he's a bit off per guess > > > > > (really impressive), he'll still be many bits off by the time there's > > > > > enough entropy in the primary pool to reseed the secondary pool so he > > > > > can check his guesswork. > > > > > > > > You can also disable the tsc to user space in the intel processors. > > > > Thats something they anticipated as being neccessary in secure > > > > environments long ago. This makes the attack much harder. > > > > > > And break the hundreds of apps that depend on rdtsc? Am I missing > > > something? > > > > If those apps depend on rdtsc being a) present, and b) working > > without providing fallbacks, they're already broken. > > > > There's a reason its displayed in /proc/cpuinfo's flags field, > > and visible through cpuid. Apps should be testing for presence > > before assuming features are present. > > > > Well yes but you would still have to recompile those apps. Not if the app is written correctly. See above. Dave ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 23:44 ` Dave Jones @ 2005-05-14 7:37 ` Lee Revell 2005-05-14 15:33 ` Andrea Arcangeli 0 siblings, 1 reply; 144+ messages in thread From: Lee Revell @ 2005-05-14 7:37 UTC (permalink / raw) To: Dave Jones Cc: Alan Cox, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Fri, 2005-05-13 at 19:44 -0400, Dave Jones wrote: > On Fri, May 13, 2005 at 07:38:08PM -0400, Lee Revell wrote: > > On Fri, 2005-05-13 at 19:27 -0400, Dave Jones wrote: > > > On Fri, May 13, 2005 at 07:00:12PM -0400, Lee Revell wrote: > > > > On Fri, 2005-05-13 at 23:47 +0100, Alan Cox wrote: > > > > > On Gwe, 2005-05-13 at 22:59, Matt Mackall wrote: > > > > > > It might not be much of a problem though. If he's a bit off per guess > > > > > > (really impressive), he'll still be many bits off by the time there's > > > > > > enough entropy in the primary pool to reseed the secondary pool so he > > > > > > can check his guesswork. > > > > > > > > > > You can also disable the tsc to user space in the intel processors. > > > > > Thats something they anticipated as being neccessary in secure > > > > > environments long ago. This makes the attack much harder. > > > > > > > > And break the hundreds of apps that depend on rdtsc? Am I missing > > > > something? > > > > > > If those apps depend on rdtsc being a) present, and b) working > > > without providing fallbacks, they're already broken. > > > > > > There's a reason its displayed in /proc/cpuinfo's flags field, > > > and visible through cpuid. Apps should be testing for presence > > > before assuming features are present. > > > > > > > Well yes but you would still have to recompile those apps. > > Not if the app is written correctly. See above. The apps that bother to use rdtsc vs. 
gettimeofday need a cheap high res timer more than a correct one anyway - it's not guaranteed that rdtsc provides a reliable time source at all, due to SMP and frequency scaling issues. I'll try to benchmark the difference. Maybe it's not that big a deal. Lee ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 7:37 ` Lee Revell @ 2005-05-14 15:33 ` Andrea Arcangeli 2005-05-15 1:07 ` Christer Weinigel 2005-05-15 9:48 ` Andi Kleen 0 siblings, 2 replies; 144+ messages in thread From: Andrea Arcangeli @ 2005-05-14 15:33 UTC (permalink / raw) To: Lee Revell Cc: Dave Jones, Alan Cox, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sat, May 14, 2005 at 03:37:18AM -0400, Lee Revell wrote: > The apps that bother to use rdtsc vs. gettimeofday need a cheap high res > timer more than a correct one anyway - it's not guaranteed that rdtsc > provides a reliable time source at all, due to SMP and frequency scaling > issues. On x86-64 the cost of gettimeofday is the same as the tsc, so turning off tsc on x86-64 is not nice (even if we usually have HPET there, so perhaps it wouldn't be too bad). TSC is something only the kernel (or a person with some kernel/hardware knowledge) can use safely knowing it'll work fine. But on x86-64 parts of the kernel run in userland... Preventing tasks with different uids from running on the same physical cpu was my first idea, disabled by default via sysctl, so only the paranoid would enable it. But before touching the kernel in any way it would be really nice if somebody could bother to demonstrate this is real, because I have a hard time believing this is not purely vapourware. In artificial environments a computer can recognize the difference between two faces too, no big deal, but that doesn't mean the same software is going to recognize millions of different faces at the airport too. So nothing has been demonstrated in practical terms yet. Nobody runs openssl -sign thousands of times in a row on a purely idle system without noticing the 100% load on the other cpu for months (and he's not root so he can't hide his runaway 100% process; if he were root and could modify the kernel or ps/top to hide the runaway process, he'd have faster ways to sniff). 
So to me this sounds like a purely theoretical problem. Cache covert channels are possible too, as the paper states; next time somebody will find how to sniff a letter out of a pdf document on a UP no-HT system by opening and closing it some millions of times on an otherwise idle system. We're sure not going to flush the l2 cache because of that (at least not by default ;). This was an interesting read, but in practice I'd rate this to have severity 1 on a 0-100 scale, unless somebody bothers to demonstrate it in a remotely realistic environment. Even if this were real and they sniffed an openssl key, unless they also crack the dns the browser will complain (not very different from not having a certificate authority signature on a fake key). And if the server is remotely serious they'll notice the 100% runaway process way before he can sniff the whole key (the 100% runaway load cannot be hidden). Most servers have some statistics, so a 100% load for weeks or months isn't very likely to be overlooked. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 15:33 ` Andrea Arcangeli @ 2005-05-15 1:07 ` Christer Weinigel 2005-05-15 9:48 ` Andi Kleen 1 sibling, 0 replies; 144+ messages in thread From: Christer Weinigel @ 2005-05-15 1:07 UTC (permalink / raw) To: Andrea Arcangeli Cc: Lee Revell, Dave Jones, Alan Cox, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso Andrea Arcangeli <andrea@suse.de> writes: > Nobody runs openssl -sign thousand of times in a row on a pure idle > system without noticing the 100% load on the other cpu for months Well, actually one does. On a normal https server, each https request results in an operation on the private key. So if the attacker shares the same web server as the victim it's probably rather easy for the attacker to see when the machine is idle and launch an attack giving him thousands of chances to spy on the victim. But I do agree that this probably isn't all that serious, for those who really have secrets to hide, they won't run their https server on a machine shared with anybody else. /Christer -- "Just how much can I get away with and still go to heaven?" Freelance consultant specializing in device driver programming for Linux Christer Weinigel <christer@weinigel.se> http://www.weinigel.se ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 15:33 ` Andrea Arcangeli 2005-05-15 1:07 ` Christer Weinigel @ 2005-05-15 9:48 ` Andi Kleen 1 sibling, 0 replies; 144+ messages in thread From: Andi Kleen @ 2005-05-15 9:48 UTC (permalink / raw) To: Andrea Arcangeli Cc: Lee Revell, Dave Jones, Alan Cox, Matt Mackall, Andy Isaacson, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sat, May 14, 2005 at 05:33:07PM +0200, Andrea Arcangeli wrote: > On Sat, May 14, 2005 at 03:37:18AM -0400, Lee Revell wrote: > > The apps that bother to use rdtsc vs. gettimeofday need a cheap high res > > timer more than a correct one anyway - it's not guaranteed that rdtsc > > provides a reliable time source at all, due to SMP and frequency scaling > > issues. > > On x86-64 the cost of gettimeofday is the same of the tsc, turning off It depends, on many systems it is more costly. e.g. on many SMP systems we have to use HPET or even the PM timer, because TSC is not reliable. > tsc on x86-64 is not nice (even if we usually have HPET there, so > perhaps it wouldn't be too bad). TSC is something only the kernel (or a > person with some kernel/hardware knowledge) can do safely knowing it'll > work fine. But on x86-64 parts of the kernel runs in userland... Agreed. It is quite complicated to decide if TSC is reliable or not and I would not recommend user space to do this. [hmm actually I already have constant_tsc fake cpuid bit, but it only refers to single CPUs. I wonder if I should add another one for SMP "synchronized_tsc". The latest mm code already has this information, but it does not export it yet] > > Preventing tasks with different uid to run on the same physical cpu was > my first idea, disabled by default via sysctl, so only if one is > paranoid can enable it. The paranoid should just fix their crypto code. And if they're clinically paranoid they can always boot with noht or disable it in the BIOS. But really I think they should just fix OpenSSL. 
> > But before touching the kernel in any way it would be really nice if > somebody could bother to demonstrate this is real because I've an hard > time to believe this is not purely vapourware. On artificial Similar feeling here. > Nobody runs openssl -sign thousand of times in a row on a pure idle > system without noticing the 100% load on the other cpu for months (and > he's not root so he can't hide his runaway 100% process, if he was root > and he could modify the kernel or ps/top to hide the runaway process, > he'd have faster ways to sniff). Exactly. > > So to me this sounds a purerly theoretical problem. Cache covert Perhaps not purely theoretical, but it is certainly not something that needs drastic action like disabling HT in general. > This was an interesting read, but in practice I'd rate this to have > severity 1 on a 0-100 scale, unless somebody bothers to demonstrate it > in a remotely realistic environment. Agreed. -Andi ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 23:38 ` Lee Revell 2005-05-13 23:44 ` Dave Jones @ 2005-05-14 15:23 ` Alan Cox 2005-05-14 15:45 ` andrea 2005-05-14 16:30 ` Lee Revell 1 sibling, 2 replies; 144+ messages in thread From: Alan Cox @ 2005-05-14 15:23 UTC (permalink / raw) To: Lee Revell Cc: Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sad, 2005-05-14 at 00:38, Lee Revell wrote: > Well yes but you would still have to recompile those apps. And take the > big performance hit from using gettimeofday vs rdtsc. Disabling HT by > default looks pretty good by comparison. You cannot use rdtsc for anything but rough instruction timing. The timers for different processors run at different speeds on some SMP systems, and the timer rates vary as processors change clock rate nowadays. Rdtsc may also jump dramatically on a suspend/resume. If the app uses rdtsc then generally speaking it's terminally broken. The only exception is some profiling tools. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 15:23 ` Alan Cox @ 2005-05-14 15:45 ` andrea 2005-05-15 13:38 ` Mikulas Patocka 2005-05-14 16:30 ` Lee Revell 1 sibling, 1 reply; 144+ messages in thread From: andrea @ 2005-05-14 15:45 UTC (permalink / raw) To: Alan Cox Cc: Lee Revell, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso, Andrew Morton On Sat, May 14, 2005 at 04:23:10PM +0100, Alan Cox wrote: > You cannot use rdtsc for anything but rough instruction timing. The > timers for different processors run at different speeds on some SMP > systems, the timer rates vary as processors change clock rate nowdays. > Rdtsc may also jump dramatically on a suspend/resume. x86-64 uses it for vgettimeofday very safely (i386 could do too but it doesn't). Anyway I believe at least for seccomp it's worth turning off the tsc, not just for HT but for the L2 cache too. So it's up to you: either you turn it off completely (which isn't very nice IMHO) or I recommend applying the patch below. This has been tested successfully on x86-64 against the current cogito repository (i686 compiles so I didn't bother testing ;). People selling the cpu through cpushare may appreciate this bit for peace of mind. There's no way to get any timing info anymore with this applied (gettimeofday is forbidden of course). The seccomp environment is completely deterministic so it can't be allowed to get timing info; it has to be deterministic so in the future I can enable a computing mode that does parallel computing for each task with server-side transparent checkpointing and verification that the output is the same from all the 2/3 seller computers for each task, without the buyer even noticing (for now the verification is left to the buyer client side and there's no checkpointing, since that would require more kernel changes to track the dirty bits, but it'll be easy to extend once the basic mode is finished). Thanks. 
Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>

Index: arch/i386/kernel/process.c
===================================================================
--- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/arch/i386/kernel/process.c (mode:100644)
+++ uncommitted/arch/i386/kernel/process.c (mode:100644)
@@ -561,6 +561,25 @@
 }
 
 /*
+ * This function selects if the context switch from prev to next
+ * has to tweak the TSC disable bit in the cr4.
+ */
+static void disable_tsc(struct thread_info *prev,
+			struct thread_info *next)
+{
+	if (unlikely(has_secure_computing(prev) ||
+		     has_secure_computing(next))) {
+		/* slow path here */
+		if (has_secure_computing(prev) &&
+		    !has_secure_computing(next)) {
+			clear_in_cr4(X86_CR4_TSD);
+		} else if (!has_secure_computing(prev) &&
+			   has_secure_computing(next))
+			set_in_cr4(X86_CR4_TSD);
+	}
+}
+
+/*
  * switch_to(x,yn) should switch tasks from x to y.
  *
  * We fsave/fwait so that an exception goes off at the right time
@@ -639,6 +658,8 @@
 	if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr))
 		handle_io_bitmap(next, tss);
 
+	disable_tsc(prev_p->thread_info, next_p->thread_info);
+
 	return prev_p;
 }

Index: arch/x86_64/kernel/process.c
===================================================================
--- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/arch/x86_64/kernel/process.c (mode:100644)
+++ uncommitted/arch/x86_64/kernel/process.c (mode:100644)
@@ -439,6 +439,25 @@
 }
 
 /*
+ * This function selects if the context switch from prev to next
+ * has to tweak the TSC disable bit in the cr4.
+ */
+static void disable_tsc(struct thread_info *prev,
+			struct thread_info *next)
+{
+	if (unlikely(has_secure_computing(prev) ||
+		     has_secure_computing(next))) {
+		/* slow path here */
+		if (has_secure_computing(prev) &&
+		    !has_secure_computing(next)) {
+			clear_in_cr4(X86_CR4_TSD);
+		} else if (!has_secure_computing(prev) &&
+			   has_secure_computing(next))
+			set_in_cr4(X86_CR4_TSD);
+	}
+}
+
+/*
  * This special macro can be used to load a debugging register
  */
 #define loaddebug(thread,r) set_debug(thread->debugreg ## r, r)
@@ -556,6 +575,8 @@
 		}
 	}
 
+	disable_tsc(prev_p->thread_info, next_p->thread_info);
+
 	return prev_p;
 }

Index: include/linux/seccomp.h
===================================================================
--- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/include/linux/seccomp.h (mode:100644)
+++ uncommitted/include/linux/seccomp.h (mode:100644)
@@ -19,6 +19,11 @@
 	__secure_computing(this_syscall);
 }
 
+static inline int has_secure_computing(struct thread_info *ti)
+{
+	return unlikely(test_ti_thread_flag(ti, TIF_SECCOMP));
+}
+
 #else /* CONFIG_SECCOMP */
 
 #if (__GNUC__ > 2)
@@ -28,6 +33,7 @@
 #endif
 
 #define secure_computing(x) do { } while (0)
+#define has_secure_computing(x) 0
 
 #endif /* CONFIG_SECCOMP */

^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 15:45 ` andrea @ 2005-05-15 13:38 ` Mikulas Patocka 2005-05-16 7:06 ` andrea 0 siblings, 1 reply; 144+ messages in thread From: Mikulas Patocka @ 2005-05-15 13:38 UTC (permalink / raw) To: andrea Cc: Alan Cox, Lee Revell, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso, Andrew Morton On Sat, 14 May 2005 andrea@cpushare.com wrote: > On Sat, May 14, 2005 at 04:23:10PM +0100, Alan Cox wrote: > > You cannot use rdtsc for anything but rough instruction timing. The > > timers for different processors run at different speeds on some SMP > > systems, the timer rates vary as processors change clock rate nowdays. > > Rdtsc may also jump dramatically on a suspend/resume. > > x86-64 uses it for vgettimeofday very safely (i386 could do too but it > doesn't). > > Anyway I believe at least for seccomp it's worth to turn off the tsc, > not just for HT but for the L2 cache too. So it's up to you, either you > turn it off completely (which isn't very nice IMHO) or I recommend to > apply this below patch. This has been tested successfully on x86-64 > against current cogito repository (i686 compiles so I didn't bother > testing ;). People selling the cpu through cpushare may appreciate this > bit for a peace of mind. There's no way to get any timing info anymore > with this applied (gettimeofday is forbidden of course). Another possibility to get timing is from direct-io --- i.e. initiate direct io read, wait until one cache line contains new data and you can be sure that the next will contain new data in certain time. IDE controller bus master operation acts here as a timer. 
Mikulas

> The seccomp
> environment is completely deterministic so it can't be allowed to get
> timing info, it has to be deterministic so in the future I can enable a
> computing mode that does a parallel computing for each task with server
> side transparent checkpointing and verification that the output is the
> same from all the 2/3 seller computers for each task, without the buyer
> even noticing (for now the verification is left to the buyer client
> side and there's no checkpointing, since that would require more kernel
> changes to track the dirty bits but it'll be easy to extend once the
> basic mode is finished).
>
> Thanks.
>
> Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
>
> Index: arch/i386/kernel/process.c
> ===================================================================
> --- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/arch/i386/kernel/process.c (mode:100644)
> +++ uncommitted/arch/i386/kernel/process.c (mode:100644)
> @@ -561,6 +561,25 @@
>  }
>
>  /*
> + * This function selects if the context switch from prev to next
> + * has to tweak the TSC disable bit in the cr4.
> + */
> +static void disable_tsc(struct thread_info *prev,
> +			struct thread_info *next)
> +{
> +	if (unlikely(has_secure_computing(prev) ||
> +		     has_secure_computing(next))) {
> +		/* slow path here */
> +		if (has_secure_computing(prev) &&
> +		    !has_secure_computing(next)) {
> +			clear_in_cr4(X86_CR4_TSD);
> +		} else if (!has_secure_computing(prev) &&
> +			   has_secure_computing(next))
> +			set_in_cr4(X86_CR4_TSD);
> +	}
> +}
> +
> +/*
>  * switch_to(x,yn) should switch tasks from x to y.
>  *
>  * We fsave/fwait so that an exception goes off at the right time
> @@ -639,6 +658,8 @@
>  	if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr))
>  		handle_io_bitmap(next, tss);
>
> +	disable_tsc(prev_p->thread_info, next_p->thread_info);
> +
>  	return prev_p;
>  }
>
> Index: arch/x86_64/kernel/process.c
> ===================================================================
> --- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/arch/x86_64/kernel/process.c (mode:100644)
> +++ uncommitted/arch/x86_64/kernel/process.c (mode:100644)
> @@ -439,6 +439,25 @@
>  }
>
>  /*
> + * This function selects if the context switch from prev to next
> + * has to tweak the TSC disable bit in the cr4.
> + */
> +static void disable_tsc(struct thread_info *prev,
> +			struct thread_info *next)
> +{
> +	if (unlikely(has_secure_computing(prev) ||
> +		     has_secure_computing(next))) {
> +		/* slow path here */
> +		if (has_secure_computing(prev) &&
> +		    !has_secure_computing(next)) {
> +			clear_in_cr4(X86_CR4_TSD);
> +		} else if (!has_secure_computing(prev) &&
> +			   has_secure_computing(next))
> +			set_in_cr4(X86_CR4_TSD);
> +	}
> +}
> +
> +/*
>  * This special macro can be used to load a debugging register
>  */
>  #define loaddebug(thread,r) set_debug(thread->debugreg ## r, r)
> @@ -556,6 +575,8 @@
>  		}
>  	}
>
> +	disable_tsc(prev_p->thread_info, next_p->thread_info);
> +
>  	return prev_p;
>  }
>
> Index: include/linux/seccomp.h
> ===================================================================
> --- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/include/linux/seccomp.h (mode:100644)
> +++ uncommitted/include/linux/seccomp.h (mode:100644)
> @@ -19,6 +19,11 @@
>  	__secure_computing(this_syscall);
>  }
>
> +static inline int has_secure_computing(struct thread_info *ti)
> +{
> +	return unlikely(test_ti_thread_flag(ti, TIF_SECCOMP));
> +}
> +
>  #else /* CONFIG_SECCOMP */
>
>  #if (__GNUC__ > 2)
> @@ -28,6 +33,7 @@
>  #endif
>
>  #define secure_computing(x) do { } while (0)
> +#define has_secure_computing(x) 0
>
>  #endif /* CONFIG_SECCOMP */
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-15 13:38 ` Mikulas Patocka @ 2005-05-16 7:06 ` andrea 0 siblings, 0 replies; 144+ messages in thread From: andrea @ 2005-05-16 7:06 UTC (permalink / raw) To: Mikulas Patocka Cc: Alan Cox, Lee Revell, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso, Andrew Morton On Sun, May 15, 2005 at 03:38:22PM +0200, Mikulas Patocka wrote: > Another possibility to get timing is from direct-io --- i.e. initiate > direct io read, wait until one cache line contains new data and you can be > sure that the next will contain new data in certain time. IDE controller > bus master operation acts here as a timer. There's no way to do direct-io through seccomp; all the fds are pipes, with twisted userland listening on the other side of the pipe. So disabling the tsc is more than enough to give CPUShare users peace of mind with HT enabled and without having to flush the l2 cache either. CPUShare is the only case I can imagine where an untrusted and random bytecode running at 100% system load is the normal behaviour. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 15:23 ` Alan Cox 2005-05-14 15:45 ` andrea @ 2005-05-14 16:30 ` Lee Revell 2005-05-14 16:44 ` Arjan van de Ven ` (2 more replies) 1 sibling, 3 replies; 144+ messages in thread From: Lee Revell @ 2005-05-14 16:30 UTC (permalink / raw) To: Alan Cox Cc: Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sat, 2005-05-14 at 16:23 +0100, Alan Cox wrote: > On Sad, 2005-05-14 at 00:38, Lee Revell wrote: > > Well yes but you would still have to recompile those apps. And take the > > big performance hit from using gettimeofday vs rdtsc. Disabling HT by > > default looks pretty good by comparison. > > You cannot use rdtsc for anything but rough instruction timing. The > timers for different processors run at different speeds on some SMP > systems, the timer rates vary as processors change clock rate nowdays. > Rdtsc may also jump dramatically on a suspend/resume. > > If the app uses rdtsc then generally speaking its terminally broken. The > only exception is some profiling tools. That is basically all JACK and mplayer use it for. They have RT constraints and the tsc is used to know if we got woken up too late and should just drop some frames. The developers are aware of the issues with rdtsc and have chosen to use it anyway because these apps need every ounce of CPU and cannot tolerate the overhead of gettimeofday(). Lee ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 16:30 ` Lee Revell @ 2005-05-14 16:44 ` Arjan van de Ven 2005-05-14 17:56 ` Lee Revell 2005-05-14 17:04 ` Jindrich Makovicka 2005-05-15 9:58 ` Andi Kleen 2 siblings, 1 reply; 144+ messages in thread From: Arjan van de Ven @ 2005-05-14 16:44 UTC (permalink / raw) To: Lee Revell Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sat, 2005-05-14 at 12:30 -0400, Lee Revell wrote: > On Sat, 2005-05-14 at 16:23 +0100, Alan Cox wrote: > > On Sad, 2005-05-14 at 00:38, Lee Revell wrote: > > > Well yes but you would still have to recompile those apps. And take the > > > big performance hit from using gettimeofday vs rdtsc. Disabling HT by > > > default looks pretty good by comparison. > > > > You cannot use rdtsc for anything but rough instruction timing. The > > timers for different processors run at different speeds on some SMP > > systems, the timer rates vary as processors change clock rate nowdays. > > Rdtsc may also jump dramatically on a suspend/resume. > > > > If the app uses rdtsc then generally speaking its terminally broken. The > > only exception is some profiling tools. > > That is basically all JACK and mplayer use it for. They have RT > constraints and the tsc is used to know if we got woken up too late and > should just drop some frames. The developers are aware of the issues > with rdtsc and have chosen to use it anyway because these apps need > every ounce of CPU and cannot tolerate the overhead of gettimeofday(). then JACK is terminally broken if it doesn't have a fallback for non- rdtsc cpus. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 16:44 ` Arjan van de Ven @ 2005-05-14 17:56 ` Lee Revell 2005-05-14 18:01 ` Arjan van de Ven 2005-05-15 9:33 ` Hyper-Threading Vulnerability Adrian Bunk 0 siblings, 2 replies; 144+ messages in thread From: Lee Revell @ 2005-05-14 17:56 UTC (permalink / raw) To: Arjan van de Ven Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sat, 2005-05-14 at 18:44 +0200, Arjan van de Ven wrote: > then JACK is terminally broken if it doesn't have a fallback for non- > rdtsc cpus. It does have a fallback, but the selection is done at compile time. It uses rdtsc for all x86 CPUs except pre-i586 SMP systems. Maybe we should check at runtime, but this has always worked. Lee ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 17:56 ` Lee Revell @ 2005-05-14 18:01 ` Arjan van de Ven 2005-05-14 19:21 ` Lee Revell 2005-05-15 10:01 ` Andi Kleen 2005-05-15 9:33 ` Hyper-Threading Vulnerability Adrian Bunk 1 sibling, 2 replies; 144+ messages in thread From: Arjan van de Ven @ 2005-05-14 18:01 UTC (permalink / raw) To: Lee Revell Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sat, 2005-05-14 at 13:56 -0400, Lee Revell wrote: > On Sat, 2005-05-14 at 18:44 +0200, Arjan van de Ven wrote: > > then JACK is terminally broken if it doesn't have a fallback for non- > > rdtsc cpus. > > It does have a fallback, but the selection is done at compile time. It > uses rdtsc for all x86 CPUs except pre-i586 SMP systems. > > Maybe we should check at runtime, it's probably a sign that JACK isn't used on SMP systems much, at least not on the bigger systems (like IBM's x440's) where the tsc *will* differ wildly between cpus... ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 18:01 ` Arjan van de Ven @ 2005-05-14 19:21 ` Lee Revell 2005-05-14 19:48 ` Arjan van de Ven 2005-05-15 10:01 ` Andi Kleen 1 sibling, 1 reply; 144+ messages in thread From: Lee Revell @ 2005-05-14 19:21 UTC (permalink / raw) To: Arjan van de Ven Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sat, 2005-05-14 at 20:01 +0200, Arjan van de Ven wrote: > On Sat, 2005-05-14 at 13:56 -0400, Lee Revell wrote: > > On Sat, 2005-05-14 at 18:44 +0200, Arjan van de Ven wrote: > > > then JACK is terminally broken if it doesn't have a fallback for non- > > > rdtsc cpus. > > > > It does have a fallback, but the selection is done at compile time. It > > uses rdtsc for all x86 CPUs except pre-i586 SMP systems. > > > > Maybe we should check at runtime, > > it's probably a sign that JACK isn't used on SMP systems much, at least > not on the bigger systems (like IBM's x440's) where the tsc *will* > differ wildly between cpus... Correct. The only bug reports we have seen related to the use of the TSC are due to CPU frequency scaling. The fix is to not use it - people who want to use their PC as a DSP for audio probably don't want their processor slowing down anyway. And JACK is targeted at desktop and smaller systems, it would be kind of crazy to run it on big iron. Well, maybe there are people who like to record sessions or practice guitar in the server room... If gettimeofday is really as cheap as rdtsc on x86_64, we should use it. But it's too expensive for slower x86 systems. Anyway, Andi's fix disables *all* high res timing including gettimeofday. Obviously no multimedia app can tolerate this, so discussing rdtsc is really a red herring. But multimedia apps aren't much used in seccomp environments either. Lee ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 19:21 ` Lee Revell @ 2005-05-14 19:48 ` Arjan van de Ven 2005-05-14 23:40 ` Lee Revell 2005-05-15 3:19 ` dean gaudet 0 siblings, 2 replies; 144+ messages in thread From: Arjan van de Ven @ 2005-05-14 19:48 UTC (permalink / raw) To: Lee Revell Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sat, 2005-05-14 at 15:21 -0400, Lee Revell wrote: > On Sat, 2005-05-14 at 20:01 +0200, Arjan van de Ven wrote: > > On Sat, 2005-05-14 at 13:56 -0400, Lee Revell wrote: > > > On Sat, 2005-05-14 at 18:44 +0200, Arjan van de Ven wrote: > > > > then JACK is terminally broken if it doesn't have a fallback for non- > > > > rdtsc cpus. > > > > > > It does have a fallback, but the selection is done at compile time. It > > > uses rdtsc for all x86 CPUs except pre-i586 SMP systems. > > > > > > Maybe we should check at runtime, > > > > it's probably a sign that JACK isn't used on SMP systems much, at least > > not on the bigger systems (like IBM's x440's) where the tsc *will* > > differ wildly between cpus... > > Correct. The only bug reports we have seen related to the use of the > TSC is due to CPU frequency scaling. The fix is to not use it - people > who want to use their PC as a DSP for audio probably don't want their > processor slowing down anyway. it's a matter of time (my estimate is a year or two) before processors get variable frequencies based on temperature targets etc... and then rdtsc is really useless for this kind of thing.. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 19:48 ` Arjan van de Ven @ 2005-05-14 23:40 ` Lee Revell 2005-05-15 7:30 ` Arjan van de Ven 2005-05-15 9:37 ` Andi Kleen 2005-05-15 3:19 ` dean gaudet 1 sibling, 2 replies; 144+ messages in thread From: Lee Revell @ 2005-05-14 23:40 UTC (permalink / raw) To: Arjan van de Ven Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sat, 2005-05-14 at 21:48 +0200, Arjan van de Ven wrote: > On Sat, 2005-05-14 at 15:21 -0400, Lee Revell wrote: > > On Sat, 2005-05-14 at 20:01 +0200, Arjan van de Ven wrote: > > > On Sat, 2005-05-14 at 13:56 -0400, Lee Revell wrote: > > > > On Sat, 2005-05-14 at 18:44 +0200, Arjan van de Ven wrote: > > > > > then JACK is terminally broken if it doesn't have a fallback for non- > > > > > rdtsc cpus. > > > > > > > > It does have a fallback, but the selection is done at compile time. It > > > > uses rdtsc for all x86 CPUs except pre-i586 SMP systems. > > > > > > > > Maybe we should check at runtime, > > > > > > it's probably a sign that JACK isn't used on SMP systems much, at least > > > not on the bigger systems (like IBM's x440's) where the tsc *will* > > > differ wildly between cpus... > > > > Correct. The only bug reports we have seen related to the use of the > > TSC is due to CPU frequency scaling. The fix is to not use it - people > > who want to use their PC as a DSP for audio probably don't want their > > processor slowing down anyway. > > it's a matter of time (my estimate is a year or two) before processors > get variable frequencies based on temperature targets etc... > and then rdtsc is really useless for this kind of thing.. I was under the impression that P4 and later processors do not vary the TSC rate when doing frequency scaling. This is mentioned in the documentation for the high res timers patch. Lee ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 23:40 ` Lee Revell @ 2005-05-15 7:30 ` Arjan van de Ven 2005-05-15 20:41 ` Alan Cox 2005-05-15 9:37 ` Andi Kleen 1 sibling, 1 reply; 144+ messages in thread From: Arjan van de Ven @ 2005-05-15 7:30 UTC (permalink / raw) To: Lee Revell Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sat, 2005-05-14 at 19:40 -0400, Lee Revell wrote: > > it's a matter of time (my estimate is a year or two) before processors > > get variable frequencies based on temperature targets etc... > > and then rdtsc is really useless for this kind of thing.. > > I was under the impression that P4 and later processors do not vary the > TSC rate when doing frequency scaling. This is mentioned in the > documentation for the high res timers patch. Seems not to be the case, and worse, during idle time the clock is allowed to stop entirely (and that is also happening more and more as Linux gains more aggressive idle support (e.g. no-timer-tick and such patches), which will trigger BIOS thresholds for this even more too). ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-15 7:30 ` Arjan van de Ven @ 2005-05-15 20:41 ` Alan Cox 2005-05-15 20:48 ` Arjan van de Ven 0 siblings, 1 reply; 144+ messages in thread From: Alan Cox @ 2005-05-15 20:41 UTC (permalink / raw) To: Arjan van de Ven Cc: Lee Revell, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sul, 2005-05-15 at 08:30, Arjan van de Ven wrote: > stop entirely.... (and that is also happening more and more and linux is > getting more agressive idle support (eg no timer tick and such patches) > which will trigger bios thresholds for this even more too. Cyrix did TSC stop on halt a long long time ago, back when it was worth the power difference. Alan ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-15 20:41 ` Alan Cox @ 2005-05-15 20:48 ` Arjan van de Ven 2005-05-15 21:10 ` Lee Revell 0 siblings, 1 reply; 144+ messages in thread From: Arjan van de Ven @ 2005-05-15 20:48 UTC (permalink / raw) To: Alan Cox Cc: Lee Revell, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sun, 2005-05-15 at 21:41 +0100, Alan Cox wrote: > On Sul, 2005-05-15 at 08:30, Arjan van de Ven wrote: > > stop entirely.... (and that is also happening more and more and linux is > > getting more agressive idle support (eg no timer tick and such patches) > > which will trigger bios thresholds for this even more too. > > Cyrix did TSC stop on halt a long long time ago, back when it was worth > the power difference. With linux going to ACPI C2 mode more... tsc is defined to halt in C2... ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-15 20:48 ` Arjan van de Ven @ 2005-05-15 21:10 ` Lee Revell 2005-05-15 22:55 ` Dave Jones 0 siblings, 1 reply; 144+ messages in thread From: Lee Revell @ 2005-05-15 21:10 UTC (permalink / raw) To: Arjan van de Ven Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sun, 2005-05-15 at 22:48 +0200, Arjan van de Ven wrote: > On Sun, 2005-05-15 at 21:41 +0100, Alan Cox wrote: > > On Sul, 2005-05-15 at 08:30, Arjan van de Ven wrote: > > > stop entirely.... (and that is also happening more and more and linux is > > > getting more agressive idle support (eg no timer tick and such patches) > > > which will trigger bios thresholds for this even more too. > > > > Cyrix did TSC stop on halt a long long time ago, back when it was worth > > the power difference. > > With linux going to ACPI C2 mode more... tsc is defined to halt in C2... JACK doesn't care about any of this now, the behavior when you suspend/resume with a running jackd is undefined. Eventually we should handle it, but there's no point until the ALSA drivers get proper suspend/resume support. Lee ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-15 21:10 ` Lee Revell @ 2005-05-15 22:55 ` Dave Jones 2005-05-15 23:10 ` Lee Revell 0 siblings, 1 reply; 144+ messages in thread From: Dave Jones @ 2005-05-15 22:55 UTC (permalink / raw) To: Lee Revell Cc: Arjan van de Ven, Alan Cox, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sun, May 15, 2005 at 05:10:59PM -0400, Lee Revell wrote: > On Sun, 2005-05-15 at 22:48 +0200, Arjan van de Ven wrote: > > On Sun, 2005-05-15 at 21:41 +0100, Alan Cox wrote: > > > On Sul, 2005-05-15 at 08:30, Arjan van de Ven wrote: > > > > stop entirely.... (and that is also happening more and more and linux is > > > > getting more agressive idle support (eg no timer tick and such patches) > > > > which will trigger bios thresholds for this even more too. > > > > > > Cyrix did TSC stop on halt a long long time ago, back when it was worth > > > the power difference. > > > > With linux going to ACPI C2 mode more... tsc is defined to halt in C2... > > JACK doesn't care about any of this now, the behavior when you > suspend/resume with a running jackd is undefined. Eventually we should > handle it, but there's no point until the ALSA drivers get proper > suspend/resume support. suspend/resume are S states, not C states. C states are occurring during runtime. Dave ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-15 22:55 ` Dave Jones @ 2005-05-15 23:10 ` Lee Revell 2005-05-16 7:25 ` Arjan van de Ven 0 siblings, 1 reply; 144+ messages in thread From: Lee Revell @ 2005-05-15 23:10 UTC (permalink / raw) To: Dave Jones Cc: Arjan van de Ven, Alan Cox, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sun, 2005-05-15 at 18:55 -0400, Dave Jones wrote: > On Sun, May 15, 2005 at 05:10:59PM -0400, Lee Revell wrote: > > On Sun, 2005-05-15 at 22:48 +0200, Arjan van de Ven wrote: > > > On Sun, 2005-05-15 at 21:41 +0100, Alan Cox wrote: > > > > On Sul, 2005-05-15 at 08:30, Arjan van de Ven wrote: > > > > > stop entirely.... (and that is also happening more and more and linux is > > > > > getting more agressive idle support (eg no timer tick and such patches) > > > > > which will trigger bios thresholds for this even more too. > > > > > > > > Cyrix did TSC stop on halt a long long time ago, back when it was worth > > > > the power difference. > > > > > > With linux going to ACPI C2 mode more... tsc is defined to halt in C2... > > > > JACK doesn't care about any of this now, the behavior when you > > suspend/resume with a running jackd is undefined. Eventually we should > > handle it, but there's no point until the ALSA drivers get proper > > suspend/resume support. > > suspend/resume are S states, not C states. C states are occuring > during runtime. It should never go into C2 if jackd is running, because you're getting interrupts from the audio interface at least every 100ms or so (usually much more often) which will wake up jackd and any clients. Lee ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-15 23:10 ` Lee Revell @ 2005-05-16 7:25 ` Arjan van de Ven 0 siblings, 0 replies; 144+ messages in thread From: Arjan van de Ven @ 2005-05-16 7:25 UTC (permalink / raw) To: Lee Revell Cc: Dave Jones, Alan Cox, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sun, 2005-05-15 at 19:10 -0400, Lee Revell wrote: > On Sun, 2005-05-15 at 18:55 -0400, Dave Jones wrote: > > On Sun, May 15, 2005 at 05:10:59PM -0400, Lee Revell wrote: > > > On Sun, 2005-05-15 at 22:48 +0200, Arjan van de Ven wrote: > > > > On Sun, 2005-05-15 at 21:41 +0100, Alan Cox wrote: > > > > > On Sul, 2005-05-15 at 08:30, Arjan van de Ven wrote: > > > > > > stop entirely.... (and that is also happening more and more and linux is > > > > > > getting more agressive idle support (eg no timer tick and such patches) > > > > > > which will trigger bios thresholds for this even more too. > > > > > > > > > > Cyrix did TSC stop on halt a long long time ago, back when it was worth > > > > > the power difference. > > > > > > > > With linux going to ACPI C2 mode more... tsc is defined to halt in C2... > > > > > > JACK doesn't care about any of this now, the behavior when you > > > suspend/resume with a running jackd is undefined. Eventually we should > > > handle it, but there's no point until the ALSA drivers get proper > > > suspend/resume support. > > > > suspend/resume are S states, not C states. C states are occuring > > during runtime. > > It should never go into C2 if jackd is running, because you're getting > interrupts from the audio interface at least every 100ms or so (usually > much more often) which will wake up jackd and any clients. you're not guaranteed to not enter C2 in that case. C2 can happen after a few ms already ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 23:40 ` Lee Revell 2005-05-15 7:30 ` Arjan van de Ven @ 2005-05-15 9:37 ` Andi Kleen 1 sibling, 0 replies; 144+ messages in thread From: Andi Kleen @ 2005-05-15 9:37 UTC (permalink / raw) To: Lee Revell Cc: Arjan van de Ven, Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso > I was under the impression that P4 and later processors do not vary the > TSC rate when doing frequency scaling. This is mentioned in the > documentation for the high res timers patch. Prescott and later do not vary TSC, but P4s before that do. On x86-64 it is true because only Nocona is supported which has a pstate invariant TSC. The latest x86-64 kernel has a special X86_CONSTANT_TSC internal CPUID bit, which is set in that case. If some other subsystem uses it I would recommend to port that to i386 too. -Andi ^ permalink raw reply [flat|nested] 144+ messages in thread
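The X86_CONSTANT_TSC bit Andi mentions surfaces in /proc/cpuinfo as the constant_tsc flag on kernels that set it, so user space can check for a p-state-invariant TSC before trusting the counter across frequency changes. A small probe (assumes a Linux /proc; returns -1 where it is unavailable):

```c
#include <stdio.h>
#include <string.h>

/* 1 if any /proc/cpuinfo flags line advertises constant_tsc, 0 if
 * none does, -1 if /proc/cpuinfo cannot be read (non-Linux, chroot). */
static int has_constant_tsc(void)
{
	FILE *f = fopen("/proc/cpuinfo", "r");
	char line[4096];
	int found = 0;

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (strstr(line, "constant_tsc")) {
			found = 1;
			break;
		}
	}
	fclose(f);
	return found;
}
```

A runtime check like this is what the "maybe we should check at runtime" part of the thread is asking for, instead of JACK's compile-time selection.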
* Re: Hyper-Threading Vulnerability 2005-05-14 19:48 ` Arjan van de Ven 2005-05-14 23:40 ` Lee Revell @ 2005-05-15 3:19 ` dean gaudet 1 sibling, 0 replies; 144+ messages in thread From: dean gaudet @ 2005-05-15 3:19 UTC (permalink / raw) To: Arjan van de Ven Cc: Lee Revell, Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sat, 14 May 2005, Arjan van de Ven wrote: > it's a matter of time (my estimate is a year or two) before processors > get variable frequencies based on temperature targets etc... > and then rdtsc is really useless for this kind of thing.. what do you mean "a year or two"? processors have been doing this for many years now. i'm biased, but i still think transmeta did this the right way... the tsc operates at the top frequency of the processor always. i do a hell of a lot of microbenchmarking on various processors and i always use tsc -- but i'm just smart enough to take multiple samples and i try to make each sample smaller than a time slice... which avoids most of the pitfalls, and would even work on smp boxes with tsc differences. -dean ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 18:01 ` Arjan van de Ven 2005-05-14 19:21 ` Lee Revell @ 2005-05-15 10:01 ` Andi Kleen 2005-05-15 10:23 ` 2.6.4 timer and helper functions kernel 1 sibling, 1 reply; 144+ messages in thread From: Andi Kleen @ 2005-05-15 10:01 UTC (permalink / raw) To: Arjan van de Ven Cc: Lee Revell, Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sat, May 14, 2005 at 08:01:33PM +0200, Arjan van de Ven wrote: > On Sat, 2005-05-14 at 13:56 -0400, Lee Revell wrote: > > On Sat, 2005-05-14 at 18:44 +0200, Arjan van de Ven wrote: > > > then JACK is terminally broken if it doesn't have a fallback for non- > > > rdtsc cpus. > > > > It does have a fallback, but the selection is done at compile time. It > > uses rdtsc for all x86 CPUs except pre-i586 SMP systems. > > > > Maybe we should check at runtime, > > it's probably a sign that JACK isn't used on SMP systems much, at least > not on the bigger systems (like IBM's x440's) where the tsc *will* > differ wildly between cpus... It does not even need SMP, just use a Centrino laptop. I suppose what the Jack guys are doing is to recommend disabling frequency scaling, and then the sound guys complain again that sound on Linux is so hard to use. I wonder where this comes from? :) -Andi ^ permalink raw reply [flat|nested] 144+ messages in thread
* 2.6.4 timer and helper functions 2005-05-15 10:01 ` Andi Kleen @ 2005-05-15 10:23 ` kernel 2005-05-19 0:38 ` George Anzinger 0 siblings, 1 reply; 144+ messages in thread From: kernel @ 2005-05-15 10:23 UTC (permalink / raw) To: linux-kernel [-- Attachment #1: Type: text/plain, Size: 1416 bytes --] Hi all, I am running a 2.6.4 kernel on my system, and I am playing a little bit with kernel time issues and helper functions, just to understand how things really work. While doing that on my x86 system, I loaded a module from LDD 3rd edition, jit.c, which uses a dynamic /proc file to return textual information. The info it returns uses the kernel functions do_gettimeofday, current_kernel_time and jiffies_to_timespec. The output format is: 0x0009073c 0x000000010009073c 1116162967.247441 1116162967.246530656 591.586065248 0x0009073c 0x000000010009073c 1116162967.247463 1116162967.246530656 591.586065248 0x0009073c 0x000000010009073c 1116162967.247476 1116162967.246530656 591.586065248 0x0009073c 0x000000010009073c 1116162967.247489 1116162967.246530656 591.586065248 where the first two values are jiffies and jiffies_64, the next two are do_gettimeofday and current_kernel_time, and the last value is jiffies_to_timespec. This output text was "recorded" after 16 minutes of uptime. Shouldn't the last value be the same as the uptime? I have attached an output file covering boot time until the function resets the struct and starts counting from the beginning. Is this a bug, or am I missing something here? Best regards, Chris. 
[-- Attachment #2: NAS --] [-- Type: application/octet-stream, Size: 8798 bytes --] 0xfffd544c 0x00000000fffd544c 1116162200.659770 1116162200.659069664 4294139.459575264 0xfffd544c 0x00000000fffd544c 1116162200.659793 1116162200.659069664 4294139.459575264 0xfffd544c 0x00000000fffd544c 1116162200.659807 1116162200.659069664 4294139.459575264 0xfffd544c 0x00000000fffd544c 1116162200.659820 1116162200.659069664 4294139.459575264 0xfffd575b 0x00000000fffd575b 1116162201.442085 1116162201.441950648 4294140.242456248 0xfffd575b 0x00000000fffd575b 1116162201.442109 1116162201.441950648 4294140.242456248 0xfffd575b 0x00000000fffd575b 1116162201.442122 1116162201.441950648 4294140.242456248 0xfffd575b 0x00000000fffd575b 1116162201.442135 1116162201.441950648 4294140.242456248 0xfffd5a71 0x00000000fffd5a71 1116162202.231974 1116162202.231830568 4294141.32336168 0xfffd5a71 0x00000000fffd5a71 1116162202.231996 1116162202.231830568 4294141.32336168 0xfffd5a71 0x00000000fffd5a71 1116162202.232010 1116162202.231830568 4294141.32336168 0xfffd5a71 0x00000000fffd5a71 1116162202.232023 1116162202.231830568 4294141.32336168 0xfffd5d63 0x00000000fffd5d63 1116162202.986007 1116162202.985715960 4294141.786221560 0xfffd5d63 0x00000000fffd5d63 1116162202.986030 1116162202.985715960 4294141.786221560 0xfffd5d63 0x00000000fffd5d63 1116162202.986043 1116162202.985715960 4294141.786221560 0xfffd5d63 0x00000000fffd5d63 1116162202.986056 1116162202.985715960 4294141.786221560 0xfffd71d6 0x00000000fffd71d6 1116162208.220317 1116162208.219920240 4294147.20425840 0xfffd71d6 0x00000000fffd71d6 1116162208.220341 1116162208.219920240 4294147.20425840 0xfffd71d6 0x00000000fffd71d6 1116162208.220354 1116162208.219920240 4294147.20425840 0xfffd71d6 0x00000000fffd71d6 1116162208.220367 1116162208.219920240 4294147.20425840 0xfffd7432 0x00000000fffd7432 1116162208.824141 1116162208.823828432 4294147.624334032 0xfffd7432 0x00000000fffd7432 1116162208.824165 1116162208.823828432 4294147.624334032 
0xfffd7432 0x00000000fffd7432 1116162208.824178 1116162208.823828432 4294147.624334032 0xfffd7432 0x00000000fffd7432 1116162208.824191 1116162208.823828432 4294147.624334032 0xfffd76dc 0x00000000fffd76dc 1116162209.506691 1116162209.505724768 4294148.306230368 0xfffd76dc 0x00000000fffd76dc 1116162209.506714 1116162209.505724768 4294148.306230368 0xfffd76dd 0x00000000fffd76dd 1116162209.506750 1116162209.506724616 4294148.307230216 0xfffd76dd 0x00000000fffd76dd 1116162209.506764 1116162209.506724616 4294148.307230216 0xfffd79f0 0x00000000fffd79f0 1116162210.293679 1116162210.293604992 4294149.94110592 0xfffd79f0 0x00000000fffd79f0 1116162210.293702 1116162210.293604992 4294149.94110592 0xfffd79f0 0x00000000fffd79f0 1116162210.293715 1116162210.293604992 4294149.94110592 0xfffd79f0 0x00000000fffd79f0 1116162210.293728 1116162210.293604992 4294149.94110592 0xfffd7c99 0x00000000fffd7c99 1116162210.974616 1116162210.974501480 4294149.775007080 0xfffd7c99 0x00000000fffd7c99 1116162210.974640 1116162210.974501480 4294149.775007080 0xfffd7c99 0x00000000fffd7c99 1116162210.974653 1116162210.974501480 4294149.775007080 0xfffd7c99 0x00000000fffd7c99 1116162210.974666 1116162210.974501480 4294149.775007080 0xfffd7fb0 0x00000000fffd7fb0 1116162211.766070 1116162211.765381248 4294150.565886848 0xfffd7fb0 0x00000000fffd7fb0 1116162211.766094 1116162211.765381248 4294150.565886848 0xfffd7fb0 0x00000000fffd7fb0 1116162211.766107 1116162211.765381248 4294150.565886848 0xfffd7fb0 0x00000000fffd7fb0 1116162211.766120 1116162211.765381248 4294150.565886848 0xfffd829c 0x00000000fffd829c 1116162212.513993 1116162212.513267552 4294151.313773152 0xfffd829c 0x00000000fffd829c 1116162212.514016 1116162212.513267552 4294151.313773152 0xfffd829c 0x00000000fffd829c 1116162212.514029 1116162212.513267552 4294151.313773152 0xfffd829c 0x00000000fffd829c 1116162212.514042 1116162212.513267552 4294151.313773152 0xfffd858d 0x00000000fffd858d 1116162213.266431 1116162213.266153096 4294152.66658696 
0xfffd858d 0x00000000fffd858d 1116162213.266453 1116162213.266153096 4294152.66658696 0xfffd858d 0x00000000fffd858d 1116162213.266467 1116162213.266153096 4294152.66658696 0xfffd858d 0x00000000fffd858d 1116162213.266480 1116162213.266153096 4294152.66658696 0xfffdaeb0 0x00000000fffdaeb0 1116162223.796156 1116162223.795552384 4294162.596057984 0xfffdaeb0 0x00000000fffdaeb0 1116162223.796180 1116162223.795552384 4294162.596057984 0xfffdaeb0 0x00000000fffdaeb0 1116162223.796193 1116162223.795552384 4294162.596057984 0xfffdaeb0 0x00000000fffdaeb0 1116162223.796206 1116162223.795552384 4294162.596057984 0xfffdb151 0x00000000fffdb151 1116162224.469209 1116162224.468450088 4294163.268955688 0xfffdb151 0x00000000fffdb151 1116162224.469233 1116162224.468450088 4294163.268955688 0xfffdb151 0x00000000fffdb151 1116162224.469247 1116162224.468450088 4294163.268955688 0xfffdb151 0x00000000fffdb151 1116162224.469260 1116162224.468450088 4294163.268955688 0xfffdb3b2 0x00000000fffdb3b2 1116162225.077922 1116162225.077357520 4294163.877863120 0xfffdb3b2 0x00000000fffdb3b2 1116162225.077946 1116162225.077357520 4294163.877863120 0xfffdb3b2 0x00000000fffdb3b2 1116162225.077959 1116162225.077357520 4294163.877863120 0xfffdb3b2 0x00000000fffdb3b2 1116162225.077972 1116162225.077357520 4294163.877863120 0xfffded9b 0x00000000fffded9b 1116162239.900231 1116162239.900104120 4294178.700609720 0xfffdfbd9 0x00000000fffdfbd9 1116162243.545937 1116162243.545549928 4294182.346055528 0xfffdfea5 0x00000000fffdfea5 1116162244.261748 1116162244.261441096 4294183.61946696 0xfffe014b 0x00000000fffe014b 1116162244.939810 1116162244.939338040 4294183.739843640 0xfffe03cb 0x00000000fffe03cb 1116162245.580168 1116162245.579240760 4294184.379746360 0xfffe0674 0x00000000fffe0674 1116162246.260663 1116162246.260137248 4294185.60642848 0xfffe08f5 0x00000000fffe08f5 1116162246.901559 1116162246.901039816 4294185.701545416 0xfffe0b99 0x00000000fffe0b99 1116162247.577274 1116162247.576937064 4294186.377442664 
0xfffe0e63 0x00000000fffe0e63 1116162248.291766 1116162248.290828536 4294187.91334136 0xfffe48ee 0x00000000fffe48ee 1116162263.276450 1116162263.275550512 4294202.76056112 ...................................................................... ...................................................................... .....................after 5 minutes exactly from the beginning ...... 0x0000002d 0x000000010000002d 1116162375.706265 1116162375.705458568 0.44993160 the counter resets from the beginning... shouldn't it be like that from the beginning? ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: 2.6.4 timer and helper functions 2005-05-15 10:23 ` 2.6.4 timer and helper functions kernel @ 2005-05-19 0:38 ` George Anzinger 0 siblings, 0 replies; 144+ messages in thread From: George Anzinger @ 2005-05-19 0:38 UTC (permalink / raw) To: kernel; +Cc: linux-kernel kernel@wired-net.gr wrote: > Hi all, > i am running a 2.6.4 kernel on my system , and i am playing a little bit > with kernel time issues and helper functions,just to understand how the > things really work. > While doing that on my x86 system and loaded a module from LDD 3rd > edition,jit.c, which uses a dynamic /proc file to return textual > information. > The info that returns is in this format and uses the kernel functions > ,do_gettimeofday,current_kernel_time and jiffies_to_timespec. > The output format is: > 0x0009073c 0x000000010009073c 1116162967.247441 > 1116162967.246530656 591.586065248 > 0x0009073c 0x000000010009073c 1116162967.247463 > 1116162967.246530656 591.586065248 > 0x0009073c 0x000000010009073c 1116162967.247476 > 1116162967.246530656 591.586065248 > 0x0009073c 0x000000010009073c 1116162967.247489 > 1116162967.246530656 591.586065248 > where the first two values are the jiffies and jiffies_64.The next two are > the do_gettimeofday and current_kernel_time and the last value is the > jiffies_to_timespec.This output text is "recorded" after 16 minutes of > uptime.Shouldnt the last value be the same as uptime.I have attached an > output file from the boot time until the time the function resets the > struct and starts count from the beggining.Is this a bug or i am missing > sth here??? You are assuming that jiffies starts at zero at boot time. This is clearly not so even from your print outs. (It starts at a value near overflow of the low order 32-bits to flush out problems with the roll over.) > -- George Anzinger george@mvista.com High-res-timers: http://sourceforge.net/projects/high-res-timers/ ^ permalink raw reply [flat|nested] 144+ messages in thread
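George's explanation can be checked with a little arithmetic: 2.6 kernels initialize jiffies to INITIAL_JIFFIES, defined as (unsigned long)(unsigned int)(-300 * HZ), so the low 32 bits wrap five minutes after boot — matching the "after 5 minutes exactly" reset in Chris's attached log. A sketch of the arithmetic (HZ=1000 is an assumption, the common x86 default in that era):

```c
#define HZ_ASSUMED 1000 /* assumption: typical 2.6-era x86 HZ */

/* Seconds from boot until the 32-bit jiffies value wraps, given the
 * 2.6 INITIAL_JIFFIES convention of starting at -300*HZ (mod 2^32).
 * Starting near the rollover flushes out wraparound bugs early. */
static unsigned int seconds_until_jiffies_wrap(void)
{
	unsigned int initial = (unsigned int)(-300 * HZ_ASSUMED);
	unsigned long long remaining = 0x100000000ULL - initial;

	return (unsigned int)(remaining / HZ_ASSUMED);
}
```

This is also why the first logged jiffies value is 0xfffd544c rather than something near zero: the counter was deliberately started close to the 32-bit limit.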
* Re: Hyper-Threading Vulnerability 2005-05-14 17:56 ` Lee Revell 2005-05-14 18:01 ` Arjan van de Ven @ 2005-05-15 9:33 ` Adrian Bunk 1 sibling, 0 replies; 144+ messages in thread From: Adrian Bunk @ 2005-05-15 9:33 UTC (permalink / raw) To: Lee Revell Cc: Arjan van de Ven, Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sat, May 14, 2005 at 01:56:36PM -0400, Lee Revell wrote: > On Sat, 2005-05-14 at 18:44 +0200, Arjan van de Ven wrote: > > then JACK is terminally broken if it doesn't have a fallback for non- > > rdtsc cpus. > > It does have a fallback, but the selection is done at compile time. It > uses rdtsc for all x86 CPUs except pre-i586 SMP systems. > > Maybe we should check at runtime, but this has always worked. If this is critical for JACK, runtime selection would be an improvement for distributions like Debian that support both pre-i586 SMP systems and current hardware. > Lee cu Adrian -- "Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 16:30 ` Lee Revell 2005-05-14 16:44 ` Arjan van de Ven @ 2005-05-14 17:04 ` Jindrich Makovicka 2005-05-14 18:27 ` Lee Revell 2005-05-15 9:58 ` Andi Kleen 2 siblings, 1 reply; 144+ messages in thread From: Jindrich Makovicka @ 2005-05-14 17:04 UTC (permalink / raw) To: linux-kernel Lee Revell wrote: > On Sat, 2005-05-14 at 16:23 +0100, Alan Cox wrote: > >>On Sad, 2005-05-14 at 00:38, Lee Revell wrote: >> >>>Well yes but you would still have to recompile those apps. And take the >>>big performance hit from using gettimeofday vs rdtsc. Disabling HT by >>>default looks pretty good by comparison. >> >>You cannot use rdtsc for anything but rough instruction timing. The >>timers for different processors run at different speeds on some SMP >>systems, the timer rates vary as processors change clock rate nowdays. >>Rdtsc may also jump dramatically on a suspend/resume. >> >>If the app uses rdtsc then generally speaking its terminally broken. The >>only exception is some profiling tools. > > > That is basically all JACK and mplayer use it for. They have RT > constraints and the tsc is used to know if we got woken up too late and > should just drop some frames. The developers are aware of the issues > with rdtsc and have chosen to use it anyway because these apps need > every ounce of CPU and cannot tolerate the overhead of gettimeofday(). AFAIK, mplayer actually uses gettimeofday(). rdtsc is used in some places for profiling and debugging purposes and not compiled in by default. -- Jindrich Makovicka ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 17:04 ` Jindrich Makovicka @ 2005-05-14 18:27 ` Lee Revell 0 siblings, 0 replies; 144+ messages in thread From: Lee Revell @ 2005-05-14 18:27 UTC (permalink / raw) To: Jindrich Makovicka; +Cc: linux-kernel On Sat, 2005-05-14 at 19:04 +0200, Jindrich Makovicka wrote: > AFAIK, mplayer actually uses gettimeofday(). rdtsc is used in some > places for profiling and debugging purposes and not compiled in by default. > OK. The comments in the JACK code say it was copied from mplayer. I guess the usage is not the same. Lee ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 16:30 ` Lee Revell 2005-05-14 16:44 ` Arjan van de Ven 2005-05-14 17:04 ` Jindrich Makovicka @ 2005-05-15 9:58 ` Andi Kleen 2 siblings, 0 replies; 144+ messages in thread From: Andi Kleen @ 2005-05-15 9:58 UTC (permalink / raw) To: Lee Revell Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso On Sat, May 14, 2005 at 12:30:28PM -0400, Lee Revell wrote: > On Sat, 2005-05-14 at 16:23 +0100, Alan Cox wrote: > > On Sad, 2005-05-14 at 00:38, Lee Revell wrote: > > > Well yes but you would still have to recompile those apps. And take the > > > big performance hit from using gettimeofday vs rdtsc. Disabling HT by > > > default looks pretty good by comparison. > > > > You cannot use rdtsc for anything but rough instruction timing. The > > timers for different processors run at different speeds on some SMP > > systems, the timer rates vary as processors change clock rate nowdays. > > Rdtsc may also jump dramatically on a suspend/resume. > > > > If the app uses rdtsc then generally speaking its terminally broken. The > > only exception is some profiling tools. > > That is basically all JACK and mplayer use it for. They have RT > constraints and the tsc is used to know if we got woken up too late and > should just drop some frames. The developers are aware of the issues > with rdtsc and have chosen to use it anyway because these apps need > every ounce of CPU and cannot tolerate the overhead of gettimeofday(). I would consider JACK broken then. For one, it breaks on Centrinos and on AMD systems with PowerNow! and some others, which all have frequency scaling with a non-p-state-invariant TSC. An additional problem: the modern Opterons which support SMP PowerNow! can even have completely different TSC frequencies on different CPUs. All I can recommend is to use gettimeofday() for this.
The kernel goes to considerable pains to make gettimeofday() fast, and when it is not fast then the system in general cannot do it better. -Andi ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 21:26 ` Andy Isaacson 2005-05-13 21:59 ` Matt Mackall @ 2005-05-14 0:39 ` dean gaudet 2005-05-16 13:41 ` Andrea Arcangeli 2005-05-15 9:43 ` Andi Kleen ` (2 subsequent siblings) 4 siblings, 1 reply; 144+ messages in thread From: dean gaudet @ 2005-05-14 0:39 UTC (permalink / raw) To: Andy Isaacson Cc: Andi Kleen, Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso On Fri, 13 May 2005, Andy Isaacson wrote: > On Fri, May 13, 2005 at 09:05:49PM +0200, Andi Kleen wrote: > > On Fri, May 13, 2005 at 02:38:03PM -0400, Richard F. Rebel wrote: > > > Why? It's certainly reasonable to disable it for the time being and > > > even prudent to do so. > > > > No, i strongly disagree on that. The reasonable thing to do is > > to fix the crypto code which has this vulnerability, not break > > a useful performance enhancement for everybody else. > > Pardon me for saying so, but that's bullshit. You're asking the crypto > guys to give up a 5x performance gain (that's my wild guess) by giving > up all their data-dependent algorithms and contorting their code wildly, > to avoid a microarchitectural problem with Intel's HT implementation. i think your wild guess is way off. i can think of several approaches to fix these problems which won't be anywhere near 5x. the problem is that an attacker can observe which cache indices (rows) are in use. one workaround is to overload the possible secrets which each index represents. you can overload the secrets in each cache line: for example when doing exponentiation there is an array of bignums x**(2*n). bignums themselves are arrays (which span multiple cache lines). do a "row/column transpose" on this array of arrays -- suddenly each cache line contains a number of possible secrets. if you're operating with 32-bit words in a 64 byte line then you've achieved a 16-fold reduction in exposed information by this transpose. there'll be almost no performance penalty. 
you can overload the secrets in each cache index: abuse the associativity of the cache. the affected processors are all 8-way associative. ideally you'd want to arrange your data so that it all collides within the same cache index -- and get an 8-fold reduction in exposure. the trick here is the L2 is physically indexed, and userland code can perform only virtual allocations. but it's not too hard to discover physical conflicts if you really want to (using rdtsc) -- it would be done early in the initialization of the program because it involves asking for enough memory until the kernel gives you enough colliding pages. (a system call could help with this if we really wanted it.) my not-so-wild guess is a 128-fold reduction for less than 10% perf hit... i think there's possibly another approach involving a permuted array of indirection pointers... which is going to affect perf a bit due to the extra indirection required, but we're talking <10% here. (i'm just not convinced yet you can select a permutation in a manner which doesn't leak information when the attacker can view multiple invocations of the crypto for example.) > If SHA has plaintext-dependent memory references, Colin's technique > would enable an adversary to extract the contents of the /dev/random > pools. I don't *think* SHA does, based on a quick reading of > lib/sha1.c, but someone with an actual clue should probably take a look. the SHA family do not have any data-dependencies in their memory access patterns. -dean ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-14 0:39 ` dean gaudet @ 2005-05-16 13:41 ` Andrea Arcangeli 0 siblings, 0 replies; 144+ messages in thread From: Andrea Arcangeli @ 2005-05-16 13:41 UTC (permalink / raw) To: dean gaudet Cc: Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso On Fri, May 13, 2005 at 05:39:25PM -0700, dean gaudet wrote: > same cache index -- and get an 8-fold reduction in exposure. the trick > here is the L2 is physically indexed, and userland code can perform only > virtual allocations. but it's not too hard to discover physical conflicts > if you really want to (using rdtsc) -- it would be done early in the > initialization of the program because it involves asking for enough memory > until the kernel gives you enough colliding pages. (a system call could > help with this if we really wanted it.) An 8-way set-associative 1M cache is guaranteed to go at L2 speed only up to 128K (no matter what the kernel does), but even if the secret payload is larger than 128K, as long as the load is still distributed evenly at each pass for each page, there's not going to be any covert channel; the process will simply run slower than it could if it had better page coloring. So I don't see the need for kernel support: all it needs to do is know the page size, and that's provided to userland already. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 21:26 ` Andy Isaacson 2005-05-13 21:59 ` Matt Mackall 2005-05-14 0:39 ` dean gaudet @ 2005-05-15 9:43 ` Andi Kleen 2005-05-15 18:42 ` David Schwartz 2005-05-16 7:10 ` Eric W. Biederman 2005-05-15 14:00 ` Mikulas Patocka 2005-05-15 14:26 ` Andi Kleen 4 siblings, 2 replies; 144+ messages in thread From: Andi Kleen @ 2005-05-15 9:43 UTC (permalink / raw) To: Andy Isaacson; +Cc: Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso On Fri, May 13, 2005 at 02:26:20PM -0700, Andy Isaacson wrote: > On Fri, May 13, 2005 at 09:05:49PM +0200, Andi Kleen wrote: > > On Fri, May 13, 2005 at 02:38:03PM -0400, Richard F. Rebel wrote: > > > Why? It's certainly reasonable to disable it for the time being and > > > even prudent to do so. > > > > No, i strongly disagree on that. The reasonable thing to do is > > to fix the crypto code which has this vulnerability, not break > > a useful performance enhancement for everybody else. > > Pardon me for saying so, but that's bullshit. You're asking the crypto > guys to give up a 5x performance gain (that's my wild guess) by giving > up all their data-dependent algorithms and contorting their code wildly, > to avoid a microarchitectural problem with Intel's HT implementation. And what you're doing is to ask all the non crypto guys to give up a useful optimization just to fix a problem in the crypto guys' code. The cache line information leak is just an information leak bug in the crypto code, not a general problem. There is much more non crypto code than crypto code around - you are proposing to screw the majority of code to solve a relatively obscure problem of only a few functions, which seems like the totally wrong approach to me. BTW the crypto guys are always free to check for hyperthreading themselves and use different functions.
However there is a catch there - the modern dual-core processors, which actually have separate L1 and L2 caches, set the HT feature flag too, to stay compatible with old code and license managers. -Andi ^ permalink raw reply [flat|nested] 144+ messages in thread
* RE: Hyper-Threading Vulnerability 2005-05-15 9:43 ` Andi Kleen @ 2005-05-15 18:42 ` David Schwartz 2005-05-15 18:56 ` Dr. David Alan Gilbert 2005-05-16 7:10 ` Eric W. Biederman 1 sibling, 1 reply; 144+ messages in thread From: David Schwartz @ 2005-05-15 18:42 UTC (permalink / raw) To: linux-kernel Andi Kleen wrote: > And what you're doing is to ask all the non crypto guys to give > up an useful optimization just to fix a problem in the crypto guy's > code. The cache line information leak is just a information leak > bug in the crypto code, not a general problem. Portable code shouldn't even have to know that there is such a thing as a cache line. It should be able to rely on the operating system not to let other tasks with a different security context spy on the details of its operation. > There is much more non crypto code than crypto code around - you > are proposing to screw the majority of codes to solve a relatively > obscure problem of only a few functions, which seems like the totally > wrong approach to me. That I do agree with. > BTW the crypto guys are always free to check for hyperthreading > themselves and use different functions. However there is a catch > there - the modern dual core processors which actually have > separated L1 and L2 caches set these too to stay compatible > with old code and license managers. This is just a recipe for making it impossible to write correct code. If you don't believe the operating system or the hardware is at all at fault for this problem, then it would follow that they could repeat this same problem with some new mechanism and still not be at fault. So even if the program checked for hyper-threading, it would still not be correct. 
It would have to check for every possible future way this same type of problem could arise and hide every type of trace that they could create, even if that trace is in optimization mechanisms and potential channels over which the programmer has no knowledge because they don't exist yet. Let's try a reductio ad absurdum. Surely you would agree that something other than the crypto software is at fault if the operating system or hardware allowed another process with a different security context to see every instruction the code executed. The crypto authors shouldn't be expected to make the instruction flows look identical. How different is monitoring the memory accesses? Portable, POSIX-compliant C software shouldn't even have to know that there is such a thing as a cache line. I'm not going to be unreasonable though. Hyper-threading is here, and now that we know the potential problems, it's not unreasonable to ask developers of crypto code to work around it. But it's not a bug in their code that they need to fix. In fact, they can't even fix it yet because there is no portable way to determine if you're on a machine that has hyper-threading or not. DS ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-15 18:42 ` David Schwartz @ 2005-05-15 18:56 ` Dr. David Alan Gilbert 0 siblings, 0 replies; 144+ messages in thread From: Dr. David Alan Gilbert @ 2005-05-15 18:56 UTC (permalink / raw) To: David Schwartz; +Cc: linux-kernel * David Schwartz (davids@webmaster.com) wrote: > > Andi Kleen wrote: > > > And what you're doing is to ask all the non crypto guys to give > > up an useful optimization just to fix a problem in the crypto guy's > > code. The cache line information leak is just a information leak > > bug in the crypto code, not a general problem. > > Portable code shouldn't even have to know that there is such a thing as a > cache line. It should be able to rely on the operating system not to let > other tasks with a different security context spy on the details of its > operation. I find it interesting to compare this thread with a thread from about a week ago talking about how /proc/cpuinfo wasn't consistent across architectures - where we come round to the view of whether the application writers shouldn't care/are too dumb/shouldn't need to know about/can't be trusted with knowing about what the real hardware is. Personally I think this is a good case of where the application should take care of it - with whatever support the OS can really give. (That is if this is actually a real problem and not just purely theoretical - my crypto knowledge isn't good enough to answer that - but it feels very very abstract). Dave -----Open up your eyes, open up your mind, open up your code ------- / Dr. David Alan Gilbert | Running GNU/Linux on Alpha,68K| Happy \ \ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex / \ _________________________|_____ http://www.treblig.org |_______/ ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-15 9:43 ` Andi Kleen 2005-05-15 18:42 ` David Schwartz @ 2005-05-16 7:10 ` Eric W. Biederman 2005-05-16 11:04 ` Andi Kleen 1 sibling, 1 reply; 144+ messages in thread From: Eric W. Biederman @ 2005-05-16 7:10 UTC (permalink / raw) To: Andi Kleen Cc: Andy Isaacson, Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso Andi Kleen <ak@muc.de> writes: > On Fri, May 13, 2005 at 02:26:20PM -0700, Andy Isaacson wrote: > > On Fri, May 13, 2005 at 09:05:49PM +0200, Andi Kleen wrote: > > > On Fri, May 13, 2005 at 02:38:03PM -0400, Richard F. Rebel wrote: > > > > Why? It's certainly reasonable to disable it for the time being and > > > > even prudent to do so. > > > > > > No, i strongly disagree on that. The reasonable thing to do is > > > to fix the crypto code which has this vulnerability, not break > > > a useful performance enhancement for everybody else. > > > > Pardon me for saying so, but that's bullshit. You're asking the crypto > > guys to give up a 5x performance gain (that's my wild guess) by giving > > up all their data-dependent algorithms and contorting their code wildly, > > to avoid a microarchitectural problem with Intel's HT implementation. > > And what you're doing is to ask all the non crypto guys to give > up an useful optimization just to fix a problem in the crypto guy's > code. The cache line information leak is just a information leak > bug in the crypto code, not a general problem. It is not a problem in the crypto code; it is a mis-feature of the hardware/kernel combination. As such, you must be intimately familiar with each and every flavor of the hardware to attempt to avoid it in software, and that way lies madness. First, this is a reminder that perfect security requires an audit of the hardware as well as the software. As we are neither auditing the hardware nor locking it down, we obviously will not achieve perfection.
The question then becomes: what can be done to decrease the likelihood that an application will inadvertently and unavoidably leak information through timing attacks due to unknown hardware optimizations? Attacks that do not result from hardware micro-architecture are another problem, and one an application can anticipate and avoid. Ideally a solution will be proposed that allows this problem to be avoided using the existing POSIX API, or at least the current Linux kernel API. But that may not be the case. The only solution I have seen proposed so far that seems to work is to not schedule untrusted processes simultaneously with the security code. With the current API that sounds like a root process killing off, or at least stopping, all non-root processes until the critical process has finished. Potentially the scheduler can be modified to do this at a finer grain, but I don't know if this would impact the scheduler fast path. Given the rarity and uncertainty of this, it should probably be something that the process that is worried about security asks for, instead of simply getting it by default. So it looks to me like the sanest way to handle this is to allocate a pool of threads/processes, one per cpu. Set the affinity of each process to a particular cpu. Set the priority of the threads to run at the highest priority. And during the critical time ensure none of the threads are sleeping. Can someone see a better way to prevent an accidental information leak due to hardware architecture details? I wish there were a better way to ensure all of the threads were running simultaneously other than giving them the highest priority in the system, but I don't currently see an alternative. > There is much more non crypto code than crypto code around - you > are proposing to screw the majority of codes to solve a relatively > obscure problem of only a few functions, which seems like the totally > wrong approach to me.
> > BTW the crypto guys are always free to check for hyperthreading > themselves and use different functions. However there is a catch > there - the modern dual core processors which actually have > separated L1 and L2 caches set these too to stay compatible > with old code and license managers. And those same processors will have the same problem if they share significant cpu resources. Ideally the entire problem set would fit in the cache and the cpu designers would allow cache blocks to be locked, but that is not currently the case. So a shared L3 cache with dual-core processors will have the same problem. In addition, a flavor of this attack may be made by repeatedly doing multiplies or other activities that access functional units and seeing how long they have to be waited for. So even hyperthreading without sharing an L2 cache may see this problem. Eric ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-16 7:10 ` Eric W. Biederman @ 2005-05-16 11:04 ` Andi Kleen 2005-05-16 19:14 ` Eric W. Biederman 0 siblings, 1 reply; 144+ messages in thread From: Andi Kleen @ 2005-05-16 11:04 UTC (permalink / raw) To: Eric W. Biederman Cc: Andy Isaacson, Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso > The only solution I have seen proposed so far that seems to work > is to not schedule untrusted processes simultaneously with > the security code. With the current API that sounds like > a root process killing off, or at least stopping all non-root > processes until the critical process has finished. With virtualization and a hypervisor freely scheduling, it is quite impossible to guarantee this. Of course, as always, the signal is quite noisy, so it is unclear if it is exploitable in practical settings. On virtualized environments you cannot use ps to see if a crypto process is running. > And those same processors will have the same problem if the share > significant cpu resources. Ideally the entire problem set > would fit in the cache and the cpu designers would allow cache > blocks to be locked but that is not currently the case. So a shared > L3 cache with dual core processors will have the same problem. At some point the signal gets noisy enough, and the assumptions an attacker has to make too great, for it to be a useful attack. For me it is not even clear it is a real attack on native Linux; at least the setup in the paper looked highly artificial and quite impractical. e.g. I suppose it would be quite difficult to really synchronize to the beginning and end of the RSA encryptions on a server that does other things too. -Andi ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-16 11:04 ` Andi Kleen @ 2005-05-16 19:14 ` Eric W. Biederman 2005-05-16 20:05 ` Valdis.Kletnieks 0 siblings, 1 reply; 144+ messages in thread From: Eric W. Biederman @ 2005-05-16 19:14 UTC (permalink / raw) To: Andi Kleen Cc: Andy Isaacson, Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso Andi Kleen <ak@muc.de> writes: > > The only solution I have seen proposed so far that seems to work > > is to not schedule untrusted processes simultaneously with > > the security code. With the current API that sounds like > > a root process killing off, or at least stopping all non-root > > processes until the critical process has finished. > > With virtualization and a hypervisor freely scheduling it is quite > impossible to guarantee this. Of course as always the signal > is quite noisy so it is unclear if it is exploitable in practical > settings. On virtualized environments you cannot use ps to see > if a crypto process is running. Interesting. I think that is a problem for the hypervisor maintainer. Although that is about enough to convince me to request an OS flag that says "please give me privacy" which later can be passed down to the hypervisor. My gut feel is that running under a hypervisor is when things will be at their most vulnerable. Where this is a threat is when there will be a lot of RSA key transactions. At which point it is likely that the attacker can reproduce enough of the setup to figure out the fine details. I think discovering a crypto process will simply be a matter of finding an https server. As for getting the timing, how about initiating an https connection? Getting rid of the noise will certainly be a challenge, but you will have multiple attempts. > > And those same processors will have the same problem if the share > > significant cpu resources. Ideally the entire problem set > > would fit in the cache and the cpu designers would allow cache > > blocks to be locked but that is not currently the case.
So a shared > > L3 cache with dual core processors will have the same problem. > > At some point the signal gets noisy enough and the assumptions > an attacker has to make too great for it being an useful attack. > For me it is not even clear it is a real attack on native Linux, at > least the setup in the paper looked highly artifical and quite impractical. > e.g. I suppose it would be quite difficult to really synchronize > to the beginning and end of the RSA encryptions on a server that > does other things too. Possibly. But then buffer overflow attacks when you don't know the exact stack layout are similarly difficult, and ways have been found. And if you have multiple chances things get easier. And if you are aiming at something easier than brute-forcing a private key, even the littlest bit is a help. When people mmap pages we zero them for the same reason, so that we don't have unintentional information leaks. I agree that for now, because little is known, this is a highly specialized attack. However the trend is now towards increasingly big SMPs. That increases the number of resources that can be shared, so the possibility of a problem increases. At the rate Intel's cpus are going, we may see throttling of one cpu core when the other one generates too much heat because it is busy doing something else cpu intensive. And other optimizations lead to much easier to imagine vulnerabilities. As for noise: with the area cpu designers are getting into, things are becoming increasingly fine grained, so information is leaking at an increasingly fine level. As the L2 cache issue has shown, information starts to leak below the level an application designer has control of. At which point things get very difficult to manage. Information leaks are more difficult than simply gaining root on the box, where you can simply take the information you want.
But that means that is exactly where a locked-down, well-administered box will be vulnerable if a way is not found to avoid the problem. I don't know what the consequences of having your private key discovered are, but I have never heard of a case where identity theft was something pleasant to fix. Eric ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-16 19:14 ` Eric W. Biederman @ 2005-05-16 20:05 ` Valdis.Kletnieks 0 siblings, 0 replies; 144+ messages in thread From: Valdis.Kletnieks @ 2005-05-16 20:05 UTC (permalink / raw) To: Eric W. Biederman Cc: Andi Kleen, Andy Isaacson, Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso [-- Attachment #1: Type: text/plain, Size: 713 bytes --] On Mon, 16 May 2005 13:14:23 MDT, Eric W. Biederman said: > Interesting. I think that is a problem for the hypervisor maintainer. > Although that is about enough to convince me to request a > OS flag that says "please give me privacy" and later that can be passed > down to the hypervisor. My gut feel is running under a hypervisor > is when things will at their most vulnerable. Not really, because.... > I think discovering a crypto process will simply be a matter > finding a https sever. As for getting the timing how about > initiating a https connection? Getting rid of the noise will certainly > be a challenge but you will have multiple attempts. And the hypervisor is, if anything, adding noise. [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 21:26 ` Andy Isaacson ` (2 preceding siblings ...) 2005-05-15 9:43 ` Andi Kleen @ 2005-05-15 14:00 ` Mikulas Patocka 2005-05-15 14:26 ` Andi Kleen 4 siblings, 0 replies; 144+ messages in thread From: Mikulas Patocka @ 2005-05-15 14:00 UTC (permalink / raw) To: Andy Isaacson Cc: Andi Kleen, Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso On Fri, 13 May 2005, Andy Isaacson wrote: > On Fri, May 13, 2005 at 09:05:49PM +0200, Andi Kleen wrote: > > On Fri, May 13, 2005 at 02:38:03PM -0400, Richard F. Rebel wrote: > > > Why? It's certainly reasonable to disable it for the time being and > > > even prudent to do so. > > > > No, i strongly disagree on that. The reasonable thing to do is > > to fix the crypto code which has this vulnerability, not break > > a useful performance enhancement for everybody else. > > Pardon me for saying so, but that's bullshit. You're asking the crypto > guys to give up a 5x performance gain (that's my wild guess) by giving > up all their data-dependent algorithms and contorting their code wildly, > to avoid a microarchitectural problem with Intel's HT implementation. That information leak can be exploited not only on HT or SMP, but on any CPU with L2 cache. Without HT it's much harder to get information about L2 cache footprint, but it's still possible. If an attacker can make unlimited number of connections to ssh or http server and manages to get 1 bit in 100 connections, it's still a problem. Possible solutions: 1) don't use branches and data-dependent memory accesses depending on secret data 2) flush cache completely when switching to process with different EUID (0.2ms on Pentium 4 with 1M cache, even worse on CPUs with more cache). Disabling HT/SMP is not a solution. 
A year later someone may come up with something like this: * prefill L2 cache with a known pattern * sleep on some precise timer * make a connection to a security application (ssh, https) * on wakeup, read what's in the L2 cache --- get one bit with small probability --- but when repeated many times, it's still a problem Mikulas > There are three places to cut off the side channel, none of which is > obviously the right one. > 1. The HT implementation could do the cache tricks Colin suggested in > his paper. Fairly large performance hit to address a fairly small > problem. > 2. The OS could do the scheduler tricks to avoid scheduling unfriendly > threads on the same core. You're leaving a lot of the benefit of HT > on the floor by doing so. > 3. Every security-sensitive app can be rigorously audited and re-written > to avoid *ever* referencing memory with the address determined by > private data. > > (3) is a complete non-starter. It's just not feasible to rewrite all > that code. Furthermore, there's no way to know what code needs to be > rewritten! (Until someone publishes an advisory, that is...) > > Hmm, I can't think of any reason that this technique wouldn't work to > extract information from kernel secrets, as well... > > If SHA has plaintext-dependent memory references, Colin's technique > would enable an adversary to extract the contents of the /dev/random > pools. I don't *think* SHA does, based on a quick reading of > lib/sha1.c, but someone with an actual clue should probably take a look. > > Andi, are you prepared to *require* that no code ever make a memory > reference as a function of a secret? Because that's what you're > suggesting the crypto people should do.
> -andy ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 21:26 ` Andy Isaacson ` (3 preceding siblings ...) 2005-05-15 14:00 ` Mikulas Patocka @ 2005-05-15 14:26 ` Andi Kleen 4 siblings, 0 replies; 144+ messages in thread From: Andi Kleen @ 2005-05-15 14:26 UTC (permalink / raw) To: Andy Isaacson; +Cc: Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso > There are three places to cut off the side channel, none of which is > obviously the right one. > 1. The HT implementation could do the cache tricks Colin suggested in > his paper. Fairly large performance hit to address a fairly small > problem. As Dean pointed out that is probably not true. > 2. The OS could do the scheduler tricks to avoid scheduling unfriendly > threads on the same core. You're leaving a lot of the benefit of HT > on the floor by doing so. And probably still lose badly in some workloads. > 3. Every security-sensitive app can be rigorously audited and re-written > to avoid *ever* referencing memory with the address determined by > private data. Sure, after it has been demonstrated that this attack is actually feasible in practice. If yes, then fix the crypto code; otherwise do nothing. I have no problem with crypto people being paranoid (that is their job after all), as long as they don't try to affect non-crypto code in the process. But the latter seems to be clearly the case here :-( > > (3) is a complete non-starter. It's just not feasible to rewrite all > that code. Furthermore, there's no way to know what code needs to be > rewritten! (Until someone publishes an advisory, that is...) > > Hmm, I can't think of any reason that this technique wouldn't work to > extract information from kernel secrets, as well... > > If SHA has plaintext-dependent memory references, Colin's technique > would enable an adversary to extract the contents of the /dev/random > pools. I don't *think* SHA does, based on a quick reading of > lib/sha1.c, but someone with an actual clue should probably take a look. 
> > Andi, are you prepared to *require* that no code ever make a memory > reference as a function of a secret? Because that's what you're > suggesting the crypto people should do. No, just don't do it frequently enough to leak enough data. Or add dummy memory references to blend your data. And then nobody said writing crypto code was easy. It just got a bit harder today. It is basically like writing smart card code, where you need to care about such side channels. The other crypto code writers just need to care about this too. They will probably avoid other timing attacks on cache misses with this approach as well. Some data is always leaked - e.g. if you time the responses of a network server doing RSA over the network - but the question is just whether that data is plentiful and accurate enough to aid an attacker. The paper has shown that it is feasible in some cases, but so far the jury is still out on whether this could actually be replicated under less controlled loads. With more noise in the data it becomes harder. And the question is whether the small amount of data leaked under a normal background workload is really useful enough to lead to real-world attacks. I have severe doubts about that. Certainly the evidence is not clear enough for a serious step like disabling a useful performance enhancement like HT. -Andi P.S.: My personal opinion is that we have a far bigger crypto security problem due to weak /dev/random seeding on many systems. If anything is done it would be better to attack that. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 19:05 ` Andi Kleen 2005-05-13 21:26 ` Andy Isaacson @ 2005-05-13 23:32 ` Paul Jakma 2005-05-14 16:29 ` Paul Jakma 1 sibling, 1 reply; 144+ messages in thread From: Paul Jakma @ 2005-05-13 23:32 UTC (permalink / raw) To: Andi Kleen; +Cc: Richard F. Rebel, Gabor MICSKO, linux-kernel On Fri, 13 May 2005, Andi Kleen wrote: > No, i strongly disagree on that. The reasonable thing to do is to > fix the crypto code which has this vulnerability, not break a > useful performance enhancement for everybody else. Already done: http://www.openssl.org/news/secadv_20030317.txt This is old news, it seems: a timing attack that has long been known about and fixed. regards, -- Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A Fortune: What happens when you cut back the jungle? It recedes. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 23:32 ` Paul Jakma @ 2005-05-14 16:29 ` Paul Jakma 0 siblings, 0 replies; 144+ messages in thread From: Paul Jakma @ 2005-05-14 16:29 UTC (permalink / raw) To: Andi Kleen; +Cc: Richard F. Rebel, Gabor MICSKO, linux-kernel On Sat, 14 May 2005, Paul Jakma wrote: > http://www.openssl.org/news/secadv_20030317.txt > > This is old news it seems, a timing attack that has long been known > about and fixed. I've now been told it's a new, more involved, timing attack to the one the URL above describes a defence against. regards, -- Paul Jakma paul@clubi.ie paul@jakma.org Key ID: 64A2FF6A Fortune: Weinberg's First Law: Progress is only made on alternate Fridays. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 18:38 ` Richard F. Rebel 2005-05-13 19:05 ` Andi Kleen @ 2005-05-13 19:14 ` Jim Crilly 2005-05-13 20:18 ` Barry K. Nathan 1 sibling, 1 reply; 144+ messages in thread From: Jim Crilly @ 2005-05-13 19:14 UTC (permalink / raw) To: Richard F. Rebel; +Cc: Andi Kleen, Gabor MICSKO, linux-kernel On 05/13/05 02:38:03PM -0400, Richard F. Rebel wrote: > On Fri, 2005-05-13 at 20:03 +0200, Andi Kleen wrote: > > This is not a kernel problem, but a user space problem. The fix > > is to change the user space crypto code to need the same number of cache line > > accesses on all keys. > > > > Disabling HT for this would the totally wrong approach, like throwing > > out the baby with the bath water. > > > > -Andi > > Why? It's certainly reasonable to disable it for the time being and > even prudent to do so. And what if you have more than one physical HT processor? AFAIK there's no way to disable HT and still run SMP at the same time. > > -- > Richard F. Rebel > > cat /dev/null > `tty` Jim. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 19:14 ` Jim Crilly @ 2005-05-13 20:18 ` Barry K. Nathan 2005-05-13 23:14 ` Jim Crilly 0 siblings, 1 reply; 144+ messages in thread From: Barry K. Nathan @ 2005-05-13 20:18 UTC (permalink / raw) To: Richard F. Rebel, Andi Kleen, Gabor MICSKO, linux-kernel On Fri, May 13, 2005 at 03:14:43PM -0400, Jim Crilly wrote: > And what if you have more than one physical HT processor? AFAIK there's no > way to disable HT and still run SMP at the same time. Actually, there is; read my post earlier in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=111598859708620&w=2 To elaborate on the "check dmesg" part of that e-mail: After you reboot with "maxcpus=2" (or however many physical CPU's you have), you need to make sure you have messages like this, which indicate that it really worked: WARNING: No sibling found for CPU 0. WARNING: No sibling found for CPU 1. (and so on, if you have more than 2 CPU's) -Barry K. Nathan <barryn@pobox.com> ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 20:18 ` Barry K. Nathan @ 2005-05-13 23:14 ` Jim Crilly 0 siblings, 0 replies; 144+ messages in thread From: Jim Crilly @ 2005-05-13 23:14 UTC (permalink / raw) To: Barry K. Nathan; +Cc: Richard F. Rebel, Andi Kleen, Gabor MICSKO, linux-kernel On 05/13/05 01:18:40PM -0700, Barry K. Nathan wrote: > On Fri, May 13, 2005 at 03:14:43PM -0400, Jim Crilly wrote: > > And what if you have more than one physical HT processor? AFAIK there's no > > way to disable HT and still run SMP at the same time. > > Actually, there is; read my post earlier in this thread: > http://marc.theaimsgroup.com/?l=linux-kernel&m=111598859708620&w=2 > > To elaborate on the "check dmesg" part of that e-mail: > > After you reboot with "maxcpus=2" (or however many physical CPU's you > have), you need to make sure you have messages like this, which indicate > that it really worked: > > WARNING: No sibling found for CPU 0. > WARNING: No sibling found for CPU 1. > > (and so on, if you have more than 2 CPU's) But what about machines that don't enumerate physical processors before logical ones? The comment in setup.c implies that the order in which the BIOS presents CPUs is undefined, and if you're unlucky enough to have a machine that presents the CPUs as physical, logical, physical, logical, etc., you're screwed. Jim. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 18:03 ` Andi Kleen ` (2 preceding siblings ...) 2005-05-13 18:38 ` Richard F. Rebel @ 2005-05-13 19:16 ` Diego Calleja 2005-05-13 19:42 ` Frank Denis (Jedi/Sector One) 2005-05-15 9:54 ` Andi Kleen 3 siblings, 2 replies; 144+ messages in thread From: Diego Calleja @ 2005-05-13 19:16 UTC (permalink / raw) To: Andi Kleen; +Cc: gmicsko, linux-kernel El Fri, 13 May 2005 20:03:58 +0200, Andi Kleen <ak@muc.de> escribió: > This is not a kernel problem, but a user space problem. The fix > is to change the user space crypto code to need the same number of cache line > accesses on all keys. However they've patched the FreeBSD kernel to "workaround?" it: ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 19:16 ` Diego Calleja @ 2005-05-13 19:42 ` Frank Denis (Jedi/Sector One) 2005-05-15 9:54 ` Andi Kleen 1 sibling, 0 replies; 144+ messages in thread From: Frank Denis (Jedi/Sector One) @ 2005-05-13 19:42 UTC (permalink / raw) To: Diego Calleja; +Cc: Andi Kleen, gmicsko, linux-kernel On Fri, May 13, 2005 at 09:16:09PM +0200, Diego Calleja wrote: > However they've patched the FreeBSD kernel to "workaround?" it: > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch This patch just disables hyperthreading by default. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 19:16 ` Diego Calleja 2005-05-13 19:42 ` Frank Denis (Jedi/Sector One) @ 2005-05-15 9:54 ` Andi Kleen 2005-05-15 13:51 ` Mikulas Patocka 1 sibling, 1 reply; 144+ messages in thread From: Andi Kleen @ 2005-05-15 9:54 UTC (permalink / raw) To: Diego Calleja; +Cc: gmicsko, linux-kernel On Fri, May 13, 2005 at 09:16:09PM +0200, Diego Calleja wrote: > El Fri, 13 May 2005 20:03:58 +0200, > Andi Kleen <ak@muc.de> escribió: > > > > This is not a kernel problem, but a user space problem. The fix > > is to change the user space crypto code to need the same number of cache line > > accesses on all keys. > > > However they've patched the FreeBSD kernel to "workaround?" it: > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch That's a similarly stupid idea to what they did with the disk write cache (lowering the MTBFs of their disks by considerable factors, which is much worse than the power-off data loss problem). Let's not go down this path, please. -Andi ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-15 9:54 ` Andi Kleen @ 2005-05-15 13:51 ` Mikulas Patocka 2005-05-15 14:12 ` Andi Kleen 0 siblings, 1 reply; 144+ messages in thread From: Mikulas Patocka @ 2005-05-15 13:51 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel On Sun, 15 May 2005, Andi Kleen wrote: > On Fri, May 13, 2005 at 09:16:09PM +0200, Diego Calleja wrote: > > El Fri, 13 May 2005 20:03:58 +0200, > > Andi Kleen <ak@muc.de> escribió: > > > > > > This is not a kernel problem, but a user space problem. The fix > > > is to change the user space crypto code to need the same number of cache line > > > accesses on all keys. > > > > > However they've patched the FreeBSD kernel to "workaround?" it: > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch > > That's a similar stupid idea as they did with the disk write > cache (lowering the MTBFs of their disks by considerable factors, > which is much worse than the power off data loss problem) > Let's not go down this path please. What did they do wrong with the disk write cache? Mikulas > -Andi ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-15 13:51 ` Mikulas Patocka @ 2005-05-15 14:12 ` Andi Kleen 2005-05-15 14:21 ` Mikulas Patocka 2005-05-15 14:52 ` Tomasz Torcz 0 siblings, 2 replies; 144+ messages in thread From: Andi Kleen @ 2005-05-15 14:12 UTC (permalink / raw) To: Mikulas Patocka; +Cc: linux-kernel On Sun, May 15, 2005 at 03:51:05PM +0200, Mikulas Patocka wrote: > > > On Sun, 15 May 2005, Andi Kleen wrote: > > > On Fri, May 13, 2005 at 09:16:09PM +0200, Diego Calleja wrote: > > > El Fri, 13 May 2005 20:03:58 +0200, > > > Andi Kleen <ak@muc.de> escribió: > > > > > > > > This is not a kernel problem, but a user space problem. The fix > > > > is to change the user space crypto code to need the same number of cache line > > > > accesses on all keys. > > > > > > > > > However they've patched the FreeBSD kernel to "workaround?" it: > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch > > > > That's a similar stupid idea as they did with the disk write > > cache (lowering the MTBFs of their disks by considerable factors, > > which is much worse than the power off data loss problem) > > Let's not go down this path please. > > What did they do wrong with the disk write cache? They turned it off by default, which according to disk vendors lowers the MTBF of your disk to a fraction of the original value. I bet the total amount of valuable data lost for FreeBSD users because of broken disks is much much bigger than what they gained from not losing data in the rather hard-to-hit power-off cases. -Andi ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-15 14:12 ` Andi Kleen @ 2005-05-15 14:21 ` Mikulas Patocka 0 siblings, 0 replies; 144+ messages in thread From: Mikulas Patocka @ 2005-05-15 14:21 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel On Sun, 15 May 2005, Andi Kleen wrote: > They turned it off by default, which according to disk vendors > lowers the MTBF of your disk to a fraction of the original value. > > I bet the total amount of valuable data lost for FreeBSD users because > of broken disks is much much bigger than what they gained from not losing > in the rather hard to hit power off cases. > > -Andi BTW, is there any blacklist of disks with a broken FLUSH CACHE command? Or a list of companies that cheat in their implementation of it? Mikulas ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-15 14:12 ` Andi Kleen 2005-05-15 14:21 ` Mikulas Patocka @ 2005-05-15 14:52 ` Tomasz Torcz 2005-05-15 15:00 ` Disk write cache (Was: Hyper-Threading Vulnerability) Mikulas Patocka 2005-05-15 15:00 ` Hyper-Threading Vulnerability Arjan van de Ven 1 sibling, 2 replies; 144+ messages in thread From: Tomasz Torcz @ 2005-05-15 14:52 UTC (permalink / raw) To: linux-kernel On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote: > > > > However they've patched the FreeBSD kernel to "workaround?" it: > > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch > > > > > > That's a similar stupid idea as they did with the disk write > > > cache (lowering the MTBFs of their disks by considerable factors, > > > which is much worse than the power off data loss problem) > > > Let's not go down this path please. > > > > What wrong did they do with disk write cache? > > They turned it off by default, which according to disk vendors > lowers the MTBF of your disk to a fraction of the original value. > > I bet the total amount of valuable data lost for FreeBSD users because > of broken disks is much much bigger than what they gained from not losing > in the rather hard to hit power off cases. Aren't I/O barriers a way to safely use write cache? -- Tomasz Torcz "God, root, what's the difference?" zdzichu@irc.-nie.spam-.pl "God is more forgiving." ^ permalink raw reply [flat|nested] 144+ messages in thread
* Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-15 14:52 ` Tomasz Torcz @ 2005-05-15 15:00 ` Mikulas Patocka 2005-05-15 15:21 ` Gene Heskett 2005-05-16 14:50 ` Alan Cox 2005-05-15 15:00 ` Hyper-Threading Vulnerability Arjan van de Ven 1 sibling, 2 replies; 144+ messages in thread From: Mikulas Patocka @ 2005-05-15 15:00 UTC (permalink / raw) To: Tomasz Torcz; +Cc: linux-kernel On Sun, 15 May 2005, Tomasz Torcz wrote: > On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote: > > > > > However they've patched the FreeBSD kernel to "workaround?" it: > > > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch > > > > > > > > That's a similar stupid idea as they did with the disk write > > > > cache (lowering the MTBFs of their disks by considerable factors, > > > > which is much worse than the power off data loss problem) > > > > Let's not go down this path please. > > > > > > What wrong did they do with disk write cache? > > > > They turned it off by default, which according to disk vendors > > lowers the MTBF of your disk to a fraction of the original value. > > > > I bet the total amount of valuable data lost for FreeBSD users because > > of broken disks is much much bigger than what they gained from not losing > > in the rather hard to hit power off cases. > > Aren't I/O barriers a way to safely use write cache? FreeBSD used these barriers (the FLUSH CACHE command) a long time ago. There are rumors that some disks ignore the FLUSH CACHE command just to get higher benchmark numbers under Windows. But I haven't heard of any proof. Does anybody know which companies fake this command? Mikulas ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-15 15:00 ` Disk write cache (Was: Hyper-Threading Vulnerability) Mikulas Patocka @ 2005-05-15 15:21 ` Gene Heskett 2005-05-15 15:29 ` Jeff Garzik ` (2 more replies) 2005-05-16 14:50 ` Alan Cox 1 sibling, 3 replies; 144+ messages in thread From: Gene Heskett @ 2005-05-15 15:21 UTC (permalink / raw) To: linux-kernel On Sunday 15 May 2005 11:00, Mikulas Patocka wrote: >On Sun, 15 May 2005, Tomasz Torcz wrote: >> On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote: >> > > > > However they've patched the FreeBSD kernel to >> > > > > "workaround?" it: >> > > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/ht >> > > > >t5.patch >> > > > >> > > > That's a similar stupid idea as they did with the disk write >> > > > cache (lowering the MTBFs of their disks by considerable >> > > > factors, which is much worse than the power off data loss >> > > > problem) Let's not go down this path please. >> > > >> > > What wrong did they do with disk write cache? >> > >> > They turned it off by default, which according to disk vendors >> > lowers the MTBF of your disk to a fraction of the original >> > value. >> > >> > I bet the total amount of valuable data lost for FreeBSD users >> > because of broken disks is much much bigger than what they >> > gained from not losing in the rather hard to hit power off >> > cases. >> >> Aren't I/O barriers a way to safely use write cache? > >FreeBSD used these barriers (FLUSH CACHE command) long time ago. > >There are rumors that some disks ignore FLUSH CACHE command just to > get higher benchmarks in Windows. But I haven't heart of any proof. > Does anybody know, what companies fake this command? > From a story I read elsewhere just a few days ago, this problem is virtually universal even in the umpty-bucks 15,000 rpm scsi server drives. It appears that this is just another way to crank up the numbers and make each drive seem faster than its competition. 
My gut feeling is that if this gets enough ink to get under the drive makers' skins, we will see the issuance of a utility from the makers that will re-program the drives, thereby enabling the proper handling of the FLUSH CACHE command. This would be an excellent chance, IMO, to make a bit of noise if the utility comes out but only runs on Windows. In that event, we hold their feet to the fire (the preferable method), or a wrapper is written that allows it to run on any OS with a bash-like shell manager. >Mikulas -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.34% setiathome rank, not too shabby for a WV hillbilly Yahoo.com and AOL/TW attorneys please note, additions to the above message by Gene Heskett are: Copyright 2005 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-15 15:21 ` Gene Heskett @ 2005-05-15 15:29 ` Jeff Garzik 2005-05-15 16:27 ` Disk write cache Kenichi Okuyama 2005-05-16 1:56 ` Disk write cache (Was: Hyper-Threading Vulnerability) Gene Heskett 2005-05-15 16:24 ` Mikulas Patocka 2005-05-15 21:38 ` Tomasz Torcz 2 siblings, 2 replies; 144+ messages in thread From: Jeff Garzik @ 2005-05-15 15:29 UTC (permalink / raw) To: Gene Heskett; +Cc: linux-kernel On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote: > On Sunday 15 May 2005 11:00, Mikulas Patocka wrote: > >On Sun, 15 May 2005, Tomasz Torcz wrote: > >> On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote: > >> > > > > However they've patched the FreeBSD kernel to > >> > > > > "workaround?" it: > >> > > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/ht > >> > > > >t5.patch > >> > > > > >> > > > That's a similar stupid idea as they did with the disk write > >> > > > cache (lowering the MTBFs of their disks by considerable > >> > > > factors, which is much worse than the power off data loss > >> > > > problem) Let's not go down this path please. > >> > > > >> > > What wrong did they do with disk write cache? > >> > > >> > They turned it off by default, which according to disk vendors > >> > lowers the MTBF of your disk to a fraction of the original > >> > value. > >> > > >> > I bet the total amount of valuable data lost for FreeBSD users > >> > because of broken disks is much much bigger than what they > >> > gained from not losing in the rather hard to hit power off > >> > cases. > >> > >> Aren't I/O barriers a way to safely use write cache? > > > >FreeBSD used these barriers (FLUSH CACHE command) long time ago. > > > >There are rumors that some disks ignore FLUSH CACHE command just to > > get higher benchmarks in Windows. But I haven't heart of any proof. > > Does anybody know, what companies fake this command? 
> > > >From a story I read elsewhere just a few days ago, this problem is > virtually universal even in the umpty-bucks 15,000 rpm scsi server > drives. It appears that this is just another way to crank up the > numbers and make each drive seem faster than its competition. > > My gut feeling is that if this gets enough ink to get under the drive > makers skins, we will see the issuance of a utility from the makers > that will re-program the drives therefore enabling the proper > handling of the FLUSH CACHE command. This would be an excellent > chance IMO, to make a bit of noise if the utility comes out, but only > runs on windows. In that event, we hold their feet to the fire (the > prefereable method), or a wrapper is written that allows it to run on > any os with a bash-like shell manager. There is a large amount of yammering and speculation in this thread. Most disks do seem to obey SYNC CACHE / FLUSH CACHE. Jeff ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache 2005-05-15 15:29 ` Jeff Garzik @ 2005-05-15 16:27 ` Kenichi Okuyama 2005-05-15 16:43 ` Jeff Garzik 2005-05-16 1:56 ` Disk write cache (Was: Hyper-Threading Vulnerability) Gene Heskett 1 sibling, 1 reply; 144+ messages in thread From: Kenichi Okuyama @ 2005-05-15 16:27 UTC (permalink / raw) To: jgarzik; +Cc: gene.heskett, linux-kernel >>>>> "Jeff" == Jeff Garzik <jgarzik@pobox.com> writes: Jeff> On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote: >> On Sunday 15 May 2005 11:00, Mikulas Patocka wrote: >> >On Sun, 15 May 2005, Tomasz Torcz wrote: >> >> On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote: >> >> > > > > However they've patched the FreeBSD kernel to >> >> > > > > "workaround?" it: >> >> > > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/ht >> >> > > > >t5.patch >> >> > > > >> >> > > > That's a similar stupid idea as they did with the disk write >> >> > > > cache (lowering the MTBFs of their disks by considerable >> >> > > > factors, which is much worse than the power off data loss >> >> > > > problem) Let's not go down this path please. >> >> > > >> >> > > What wrong did they do with disk write cache? >> >> > >> >> > They turned it off by default, which according to disk vendors >> >> > lowers the MTBF of your disk to a fraction of the original >> >> > value. >> >> > >> >> > I bet the total amount of valuable data lost for FreeBSD users >> >> > because of broken disks is much much bigger than what they >> >> > gained from not losing in the rather hard to hit power off >> >> > cases. >> >> >> >> Aren't I/O barriers a way to safely use write cache? >> > >> >FreeBSD used these barriers (FLUSH CACHE command) long time ago. >> > >> >There are rumors that some disks ignore FLUSH CACHE command just to >> > get higher benchmarks in Windows. But I haven't heart of any proof. >> > Does anybody know, what companies fake this command? 
>> > >> >From a story I read elsewhere just a few days ago, this problem is >> virtually universal even in the umpty-bucks 15,000 rpm scsi server >> drives. It appears that this is just another way to crank up the >> numbers and make each drive seem faster than its competition. >> >> My gut feeling is that if this gets enough ink to get under the drive >> makers skins, we will see the issuance of a utility from the makers >> that will re-program the drives therefore enabling the proper >> handling of the FLUSH CACHE command. This would be an excellent >> chance IMO, to make a bit of noise if the utility comes out, but only >> runs on windows. In that event, we hold their feet to the fire (the >> prefereable method), or a wrapper is written that allows it to run on >> any os with a bash-like shell manager. Jeff> There is a large amount of yammering and speculation in this thread. Jeff> Most disks do seem to obey SYNC CACHE / FLUSH CACHE. Then it must be the file system that is not controlling this properly. And because this is so widespread among Linux systems, there must be at least one bug in the VFS (or there was, and everyone copied it). At least, from: http://developer.osdl.jp/projects/doubt/ there is a project named "diskio" which does black-box testing of this: http://developer.osdl.jp/projects/doubt/diskio/index.html And if we assume read-after-write access semantics, i.e. that the HDD itself "SURELY" verifies the data image on the disk surface, then on both SCSI and ATA, NONE of the file systems pass the test. And I was wondering which is at fault. The file system? The device drivers of both SCSI and ATA? Or the test criterion? From Jeff's point, it seems to be the file system or the criterion... ---- Kenichi Okuyama ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache 2005-05-15 16:27 ` Disk write cache Kenichi Okuyama @ 2005-05-15 16:43 ` Jeff Garzik 2005-05-15 16:50 ` Kyle Moffett ` (4 more replies) 0 siblings, 5 replies; 144+ messages in thread From: Jeff Garzik @ 2005-05-15 16:43 UTC (permalink / raw) To: Kenichi Okuyama; +Cc: gene.heskett, linux-kernel Kenichi Okuyama wrote: >>>>>>"Jeff" == Jeff Garzik <jgarzik@pobox.com> writes: > > > Jeff> On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote: > >>>On Sunday 15 May 2005 11:00, Mikulas Patocka wrote: >>> >>>>On Sun, 15 May 2005, Tomasz Torcz wrote: >>>> >>>>>On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote: >>>>> >>>>>>>>>However they've patched the FreeBSD kernel to >>>>>>>>>"workaround?" it: >>>>>>>>>ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/ht >>>>>>>>>t5.patch >>>>>>>> >>>>>>>>That's a similar stupid idea as they did with the disk write >>>>>>>>cache (lowering the MTBFs of their disks by considerable >>>>>>>>factors, which is much worse than the power off data loss >>>>>>>>problem) Let's not go down this path please. >>>>>>> >>>>>>>What wrong did they do with disk write cache? >>>>>> >>>>>>They turned it off by default, which according to disk vendors >>>>>>lowers the MTBF of your disk to a fraction of the original >>>>>>value. >>>>>> >>>>>>I bet the total amount of valuable data lost for FreeBSD users >>>>>>because of broken disks is much much bigger than what they >>>>>>gained from not losing in the rather hard to hit power off >>>>>>cases. >>>>> >>>>> Aren't I/O barriers a way to safely use write cache? >>>> >>>>FreeBSD used these barriers (FLUSH CACHE command) long time ago. >>>> >>>>There are rumors that some disks ignore FLUSH CACHE command just to >>>>get higher benchmarks in Windows. But I haven't heart of any proof. >>>>Does anybody know, what companies fake this command? 
>>>> >>> >>>>From a story I read elsewhere just a few days ago, this problem is >>>virtually universal even in the umpty-bucks 15,000 rpm scsi server >>>drives. It appears that this is just another way to crank up the >>>numbers and make each drive seem faster than its competition. >>> >>>My gut feeling is that if this gets enough ink to get under the drive >>>makers skins, we will see the issuance of a utility from the makers >>>that will re-program the drives therefore enabling the proper >>>handling of the FLUSH CACHE command. This would be an excellent >>>chance IMO, to make a bit of noise if the utility comes out, but only >>>runs on windows. In that event, we hold their feet to the fire (the >>>prefereable method), or a wrapper is written that allows it to run on >>>any os with a bash-like shell manager. > > > > Jeff> There is a large amount of yammering and speculation in this thread. > > Jeff> Most disks do seem to obey SYNC CACHE / FLUSH CACHE. > > > Then it must be file system who's not controlling properly. And > because this is so widely spread among Linux, there must be at least > one bug existing in VFS ( or there was, and everyone copied it ). > > At least, from: > > http://developer.osdl.jp/projects/doubt/ > > there is project name "diskio" which does black box test about this: > > http://developer.osdl.jp/projects/doubt/diskio/index.html > > And if we assume for Read after Write access semantics of HDD for > "SURELY" checking the data image on disk surface ( by HDD, I mean ), > on both SCSI and ATA, ALL the file system does not pass the test. > > And I was wondering who's bad. File system? Device driver of both > SCSI and ATA? or criterion? From Jeff's point, it seems like file > system or criterion... The ability of a filesystem or fsync(2) to cause a [FLUSH|SYNC] CACHE command to be generated has only been present in the most recent 2.6.x kernels. See the "write barrier" stuff that people have been discussing. 
Furthermore, read-after-write implies nothing at all. The only way you can be assured that your data has "hit the platter" is (1) issuing [FLUSH|SYNC] CACHE, or (2) using FUA-style disk commands. It sounds like your test (or reasoning) is invalid. Jeff ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache 2005-05-15 16:43 ` Jeff Garzik @ 2005-05-15 16:50 ` Kyle Moffett 2005-05-15 16:56 ` Andi Kleen ` (3 subsequent siblings) 4 siblings, 0 replies; 144+ messages in thread From: Kyle Moffett @ 2005-05-15 16:50 UTC (permalink / raw) To: Jeff Garzik; +Cc: Kenichi Okuyama, gene.heskett, linux-kernel On May 15, 2005, at 12:43:07, Jeff Garzik wrote: > The only way you can be assured that your data has "hit the > platter" is > (1) issuing [FLUSH|SYNC] CACHE, or > (2) using FUA-style disk commands And even then, some battery-backed RAID controllers will completely ignore cache flushes, because in the event of a power failure they can maintain the cached data for anywhere from a couple of days to a month or two, depending on the quality of the card and the size of its battery. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache 2005-05-15 16:43 ` Jeff Garzik 2005-05-15 16:50 ` Kyle Moffett @ 2005-05-15 16:56 ` Andi Kleen 2005-05-15 20:44 ` Andrew Morton 2005-05-15 16:58 ` Disk write cache Mikulas Patocka ` (2 subsequent siblings) 4 siblings, 1 reply; 144+ messages in thread From: Andi Kleen @ 2005-05-15 16:56 UTC (permalink / raw) To: Jeff Garzik; +Cc: gene.heskett, linux-kernel, okuyamak Jeff Garzik <jgarzik@pobox.com> writes: > > The ability of a filesystem or fsync(2) to cause a [FLUSH|SYNC] CACHE > command to be generated has only been present in the most recent 2.6.x > kernels. See the "write barrier" stuff that people have been > discussing. Are you sure mainline does it for fsync() file data at all? IIRC it was only done for journal writes in reiserfs/xfs/jbd. However, since I suppose a lot of disks flush everything pending on a FLUSH CACHE command, it still works, assuming the file systems write the data to disk in fsync before syncing the journal. I don't know if they do that. -Andi ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache 2005-05-15 16:56 ` Andi Kleen @ 2005-05-15 20:44 ` Andrew Morton 2005-05-15 23:31 ` Cache based insecurity/CPU cache/Disk Cache Tradeoffs Brian O'Mahoney 0 siblings, 1 reply; 144+ messages in thread From: Andrew Morton @ 2005-05-15 20:44 UTC (permalink / raw) To: Andi Kleen; +Cc: jgarzik, gene.heskett, linux-kernel, okuyamak Andi Kleen <ak@muc.de> wrote: > > However since > I suppose a lot of disks flush everything pending on a flush cache > command it still works assuming the file systems write the > data to disk in fsync before syncing the journal. I don't know > if they do that. ext3 does, in data=journal and data=ordered modes. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Cache based insecurity/CPU cache/Disk Cache Tradeoffs 2005-05-15 20:44 ` Andrew Morton @ 2005-05-15 23:31 ` Brian O'Mahoney 0 siblings, 0 replies; 144+ messages in thread From: Brian O'Mahoney @ 2005-05-15 23:31 UTC (permalink / raw) To: Andrew Morton; +Cc: linux-kernel In principle, it is correct that CPU caches should _not_ permit or facilitate data-leakage attacks, and disk caches should _not_ prevent applications from ensuring that data is really transferred to non-volatile storage. But turning Hyper-Threading, multiple ALUs, or disk caching off in the OS is not a solution, it is a cop-out, and as other posters have pointed out, it simply invites other, more serious failure modes; thus the BSD knee-jerk reactions are simply wrong, and in fact counterproductive. The name of the game is a correct fix, not a fast one. Don't make things worse. So what really does need doing: (a) a power-is-failing hook which does a dirty-writeback and flushes the cache to disk; this is the best you can do, and it is very cheap to provide DC power hold-up for tens to hundreds of seconds, by which time even the crap disks will do an autonomous writeback anyway (1-10 F at +5v/+12v, ~12 USD say), or, on servers, use a UPS with, say, 30 minutes of hold-up. Well-designed servers and SAN disks have this built in. (b) CPU registers and caches are inherently insecure, and most hardware designers still do not have a good enough background to understand what the OS really needs to do this right in hardware: so secure apps need a way to tell the OS to do an _expensive_ context switch in which it is guaranteed to flush all leaky context, and since this is architecture-model-sub_architecture- ... mask-step dependent, it can only be done in the OS; but user-land needs a way to tell the OS to be paranoid, after the context save and before scheduling another real context (excluding the idle loop). This is an API extension, ulimit?
This keeps architecture-dependent code out of user-land and leaves most context switches no more expensive than they are now. Almost no applications need paranoid context flushes, and they can't know how to do it themselves, so this has to go in the model-dependent OS code, with a user-mode API to turn it on per-thread. -- with kind regards, Brian. Dr. Brian O'Mahoney Mobile +41 (0)79 334 8035 Email: omb@bluewin.ch Bleicherstrasse 25, CH-8953 Dietikon, Switzerland PGP Key fingerprint = 33 41 A2 DE 35 7C CE 5D F5 14 39 C9 6D 38 56 D5 ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache 2005-05-15 16:43 ` Jeff Garzik 2005-05-15 16:50 ` Kyle Moffett 2005-05-15 16:56 ` Andi Kleen @ 2005-05-15 16:58 ` Mikulas Patocka 2005-05-15 17:20 ` Kenichi Okuyama 2005-05-16 11:02 ` Linux does not care for data integrity (was: Disk write cache) Matthias Andree 4 siblings, 0 replies; 144+ messages in thread From: Mikulas Patocka @ 2005-05-15 16:58 UTC (permalink / raw) To: Jeff Garzik; +Cc: Kenichi Okuyama, gene.heskett, linux-kernel On Sun, 15 May 2005, Jeff Garzik wrote: > Kenichi Okuyama wrote: > >>>>>>"Jeff" == Jeff Garzik <jgarzik@pobox.com> writes: > > > > > > Jeff> On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote: > > > >>>On Sunday 15 May 2005 11:00, Mikulas Patocka wrote: > >>> > >>>>On Sun, 15 May 2005, Tomasz Torcz wrote: > >>>> > >>>>>On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote: > >>>>> > >>>>>>>>>However they've patched the FreeBSD kernel to > >>>>>>>>>"workaround?" it: > >>>>>>>>>ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/ht > >>>>>>>>>t5.patch > >>>>>>>> > >>>>>>>>That's a similar stupid idea as they did with the disk write > >>>>>>>>cache (lowering the MTBFs of their disks by considerable > >>>>>>>>factors, which is much worse than the power off data loss > >>>>>>>>problem) Let's not go down this path please. > >>>>>>> > >>>>>>>What wrong did they do with disk write cache? > >>>>>> > >>>>>>They turned it off by default, which according to disk vendors > >>>>>>lowers the MTBF of your disk to a fraction of the original > >>>>>>value. > >>>>>> > >>>>>>I bet the total amount of valuable data lost for FreeBSD users > >>>>>>because of broken disks is much much bigger than what they > >>>>>>gained from not losing in the rather hard to hit power off > >>>>>>cases. > >>>>> > >>>>> Aren't I/O barriers a way to safely use write cache? > >>>> > >>>>FreeBSD used these barriers (FLUSH CACHE command) long time ago. 
> >>>> > >>>>There are rumors that some disks ignore FLUSH CACHE command just to > >>>>get higher benchmarks in Windows. But I haven't heart of any proof. > >>>>Does anybody know, what companies fake this command? > >>>> > >>> > >>>>From a story I read elsewhere just a few days ago, this problem is > >>>virtually universal even in the umpty-bucks 15,000 rpm scsi server > >>>drives. It appears that this is just another way to crank up the > >>>numbers and make each drive seem faster than its competition. > >>> > >>>My gut feeling is that if this gets enough ink to get under the drive > >>>makers skins, we will see the issuance of a utility from the makers > >>>that will re-program the drives therefore enabling the proper > >>>handling of the FLUSH CACHE command. This would be an excellent > >>>chance IMO, to make a bit of noise if the utility comes out, but only > >>>runs on windows. In that event, we hold their feet to the fire (the > >>>prefereable method), or a wrapper is written that allows it to run on > >>>any os with a bash-like shell manager. > > > > > > > > Jeff> There is a large amount of yammering and speculation in this thread. > > > > Jeff> Most disks do seem to obey SYNC CACHE / FLUSH CACHE. > > > > > > Then it must be file system who's not controlling properly. And > > because this is so widely spread among Linux, there must be at least > > one bug existing in VFS ( or there was, and everyone copied it ). > > > > At least, from: > > > > http://developer.osdl.jp/projects/doubt/ > > > > there is project name "diskio" which does black box test about this: > > > > http://developer.osdl.jp/projects/doubt/diskio/index.html > > > > And if we assume for Read after Write access semantics of HDD for > > "SURELY" checking the data image on disk surface ( by HDD, I mean ), > > on both SCSI and ATA, ALL the file system does not pass the test. > > > > And I was wondering who's bad. File system? Device driver of both > > SCSI and ATA? or criterion? 
From Jeff's point, it seems like file > > system or criterion... > > The ability of a filesystem or fsync(2) to cause a [FLUSH|SYNC] CACHE > command to be generated has only been present in the most recent 2.6.x > kernels. See the "write barrier" stuff that people have been discussing. > > Furthermore, read-after-write implies nothing at all. The only way to > you can be assured that your data has "hit the platter" is > (1) issuing [FLUSH|SYNC] CACHE, or > (2) using FUA-style disk commands > > It sounds like your test (or reasoning) is invalid. The above program checks that write+[f[data]]sync took longer than time required to transmit data via IDE bus. It has nothing to do with FLUSH CACHE command at all. The results just show that ext3 used to have bug in f[data]sync in data-journal mode and that XFS still has bug in fdatasync on 2.4 kernels. Incorrect results in this test can't be caused by buggy disk. Mikulas > Jeff > > > > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache 2005-05-15 16:43 ` Jeff Garzik ` (2 preceding siblings ...) 2005-05-15 16:58 ` Disk write cache Mikulas Patocka @ 2005-05-15 17:20 ` Kenichi Okuyama 2005-05-16 11:02 ` Linux does not care for data integrity (was: Disk write cache) Matthias Andree 4 siblings, 0 replies; 144+ messages in thread From: Kenichi Okuyama @ 2005-05-15 17:20 UTC (permalink / raw) To: jgarzik; +Cc: gene.heskett, linux-kernel >>>>> "Jeff" == Jeff Garzik <jgarzik@pobox.com> writes: Jeff> Kenichi Okuyama wrote: >>>>>>> "Jeff" == Jeff Garzik <jgarzik@pobox.com> writes: >> >> Jeff> On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote: >> >>>> On Sunday 15 May 2005 11:00, Mikulas Patocka wrote: >>>> >>>>> On Sun, 15 May 2005, Tomasz Torcz wrote: >>>>> >>>>>> On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote: >>>>>> >>>>>>>>>> However they've patched the FreeBSD kernel to >>>>>>>>>> "workaround?" it: >>>>>>>>>> ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/ht >>>>>>>>>> t5.patch >>>>>>>>> >>>>>>>>> That's a similar stupid idea as they did with the disk write >>>>>>>>> cache (lowering the MTBFs of their disks by considerable >>>>>>>>> factors, which is much worse than the power off data loss >>>>>>>>> problem) Let's not go down this path please. >>>>>>>> >>>>>>>> What wrong did they do with disk write cache? >>>>>>> >>>>>>> They turned it off by default, which according to disk vendors >>>>>>> lowers the MTBF of your disk to a fraction of the original >>>>>>> value. >>>>>>> >>>>>>> I bet the total amount of valuable data lost for FreeBSD users >>>>>>> because of broken disks is much much bigger than what they >>>>>>> gained from not losing in the rather hard to hit power off >>>>>>> cases. >>>>>> >>>>> Aren't I/O barriers a way to safely use write cache? >>>>> >>>>> FreeBSD used these barriers (FLUSH CACHE command) long time ago. >>>>> >>>>> There are rumors that some disks ignore FLUSH CACHE command just to >>>>> get higher benchmarks in Windows. 
But I haven't heart of any proof. >>>>> Does anybody know, what companies fake this command? >>>>> >>>> >>>>> From a story I read elsewhere just a few days ago, this problem is >>>> virtually universal even in the umpty-bucks 15,000 rpm scsi server >>>> drives. It appears that this is just another way to crank up the >>>> numbers and make each drive seem faster than its competition. >>>> >>>> My gut feeling is that if this gets enough ink to get under the drive >>>> makers skins, we will see the issuance of a utility from the makers >>>> that will re-program the drives therefore enabling the proper >>>> handling of the FLUSH CACHE command. This would be an excellent >>>> chance IMO, to make a bit of noise if the utility comes out, but only >>>> runs on windows. In that event, we hold their feet to the fire (the >>>> prefereable method), or a wrapper is written that allows it to run on >>>> any os with a bash-like shell manager. >> >> >> Jeff> There is a large amount of yammering and speculation in this thread. >> Jeff> Most disks do seem to obey SYNC CACHE / FLUSH CACHE. >> >> >> Then it must be file system who's not controlling properly. And >> because this is so widely spread among Linux, there must be at least >> one bug existing in VFS ( or there was, and everyone copied it ). >> >> At least, from: >> >> http://developer.osdl.jp/projects/doubt/ >> >> there is project name "diskio" which does black box test about this: >> >> http://developer.osdl.jp/projects/doubt/diskio/index.html >> >> And if we assume for Read after Write access semantics of HDD for >> "SURELY" checking the data image on disk surface ( by HDD, I mean ), >> on both SCSI and ATA, ALL the file system does not pass the test. >> >> And I was wondering who's bad. File system? Device driver of both >> SCSI and ATA? or criterion? From Jeff's point, it seems like file >> system or criterion... 
Jeff> The ability of a filesystem or fsync(2) to cause a [FLUSH|SYNC] CACHE Jeff> command to be generated has only been present in the most recent 2.6.x Jeff> kernels. See the "write barrier" stuff that people have been discussing. Jeff> Furthermore, read-after-write implies nothing at all. The only way Jeff> you can be assured that your data has "hit the platter" is Jeff> (1) issuing [FLUSH|SYNC] CACHE, or Jeff> (2) using FUA-style disk commands Jeff> It sounds like your test (or reasoning) is invalid. Thank you for the information, Jeff. I didn't see why my reasoning was invalid, since these are black-box tests and don't care about the implementation. But with your explanation and some logs, I see where to look. I'll run the test on FreeBSD as soon as I get time. If FreeBSD fails, there must be something wrong with the reasoning. Thanks again for the great hint. regards, ---- Kenichi Okuyama ^ permalink raw reply [flat|nested] 144+ messages in thread
* Linux does not care for data integrity (was: Disk write cache) 2005-05-15 16:43 ` Jeff Garzik ` (3 preceding siblings ...) 2005-05-15 17:20 ` Kenichi Okuyama @ 2005-05-16 11:02 ` Matthias Andree 2005-05-16 11:12 ` Arjan van de Ven 2005-05-16 13:48 ` Linux does not care for data integrity Mark Lord 4 siblings, 2 replies; 144+ messages in thread From: Matthias Andree @ 2005-05-16 11:02 UTC (permalink / raw) To: linux-kernel On Sun, 15 May 2005, Jeff Garzik wrote: > The ability of a filesystem or fsync(2) to cause a [FLUSH|SYNC] CACHE > command to be generated has only been present in the most recent 2.6.x > kernels. See the "write barrier" stuff that people have been discussing. To make this explicit and unmistakable: Linux should be ashamed of having put its users' data at risk for as long as it has existed, and judging by how often I still get "barrier synch failed", it still does with the kernel SUSE Linux 9.3 shipped with. This has come up several times, from database and mail-server authors, but has found no reasonable solution to date. The documentation of which file systems request a cache flush for fsync, and which device drivers (SCSI and ATA) as well as chipset drivers pass this down properly, is still missing. I've asked for help with such a list several times over recent years, and I've offered to set up and maintain the list when sent the raw information, but no one cared to provide it. I will not try again; it's no good. Kernel hackers, with a handful of exceptions, don't care.
If they think they do in spite of my statement, they'll have to prove their point by growing up and documenting for which combinations of (file system, mount options, block device driver, hardware/chip driver) barrier synch is 100% reliable, and which file systems, chipset drivers, block drivers, and hardware drivers are missing links in the chain -- and by requesting that the kernel switch off the drive's write cache in all drives unless the whole fsync() chain works (unless defeated by a "benchmark" kernel boot parameter). Until then, my applications will have to recommend that users switch off drive caches for consistency. P. S.: Yes, the subject and this mail are provocative and exaggerated a tiny bit. I feel that's needed to raise the necessary motivation to finally address this issue after a decade or so. -- Matthias Andree ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity (was: Disk write cache) 2005-05-16 11:02 ` Linux does not care for data integrity (was: Disk write cache) Matthias Andree @ 2005-05-16 11:12 ` Arjan van de Ven 2005-05-16 11:29 ` Matthias Andree 2005-05-16 14:57 ` Linux does not care for data integrity (was: Disk write cache) Alan Cox 2005-05-16 13:48 ` Linux does not care for data integrity Mark Lord 1 sibling, 2 replies; 144+ messages in thread From: Arjan van de Ven @ 2005-05-16 11:12 UTC (permalink / raw) To: Matthias Andree; +Cc: linux-kernel > and > request that the kernel switches off the drive's write cache in all > drives unless the whole fsync() stuff works (unless defeated by a > "benchmark" kernel boot parameter). I think you missed the part where disabling the write cache decreases the MTBF of your disk by a factor of 100 or so. At which point your data-loss opportunity INCREASES by doing this. Sure you can wave rhetoric around, but the fact is that Linux is improving; there now is write-barrier support for ext3 (and I assume reiserfs) for at least IDE, and IIRC selected SCSI too. Let's repeat that again: disabling the write cache altogether is bad for your disk. Really bad. Barriers aren't brilliant for it either, but they are a heck of a lot better. Lacking barriers, it's probably safer for your data to have the write cache on than off. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity (was: Disk write cache) 2005-05-16 11:12 ` Arjan van de Ven @ 2005-05-16 11:29 ` Matthias Andree 2005-05-16 14:02 ` Arjan van de Ven 2005-05-16 14:57 ` Linux does not care for data integrity (was: Disk write cache) Alan Cox 1 sibling, 1 reply; 144+ messages in thread From: Matthias Andree @ 2005-05-16 11:29 UTC (permalink / raw) To: Arjan van de Ven; +Cc: Matthias Andree, linux-kernel On Mon, 16 May 2005, Arjan van de Ven wrote: > I think you missed the part where disabling the write cache decreases the > MTBF of your disk by a factor of 100 or so. At which point your > data-loss opportunity INCREASES by doing this. Nah, if that were a factor of 100, then it should have been in the OEM manuals, no? Besides that, although my small sample is not representative, I have older drives still alive and kicking - an MTBF of 1/100 of what the vendor stated would mean a chance of failure way above 90% by now; the drive has seen 22,000 POH with the write cache off and has been a system drive for some 14,000 POH. So? > Sure you can wave rhetoric around, but the fact is that Linux is > improving; there now is write-barrier support for ext3 (and I assume > reiserfs) for at least IDE, and IIRC selected SCSI too. See the problem: "I assume", "IIRC selected...". There is no list of corroborated facts about which systems work and which don't. I have made several attempts at compiling one, posting public calls for data here, with no response. I don't blame you personally, but rather the lack of documentation about such crucial facts, and of documentation in Linux environments in general. -- Matthias Andree ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity (was: Disk write cache) 2005-05-16 11:29 ` Matthias Andree @ 2005-05-16 14:02 ` Arjan van de Ven 2005-05-16 14:48 ` Matthias Andree 0 siblings, 1 reply; 144+ messages in thread From: Arjan van de Ven @ 2005-05-16 14:02 UTC (permalink / raw) To: Matthias Andree; +Cc: linux-kernel On Mon, 2005-05-16 at 13:29 +0200, Matthias Andree wrote: > On Mon, 16 May 2005, Arjan van de Ven wrote: > > > I think you missed the part where disabling the write cache decreases the > > MTBF of your disk by a factor of 100 or so. At which point your > > data-loss opportunity INCREASES by doing this. > > Nah, if that were a factor of 100, then it should have been in the OEM > manuals, no? Why would they? Windows doesn't do it. They only need to advertise MTBF in the default settings (and I guess in Windows). They do talk about this if you ask them. > So? One sample doesn't prove the statistics wrong. > > > Sure you can wave rhetoric around, but the fact is that Linux is > > improving; there now is write-barrier support for ext3 (and I assume > > reiserfs) for at least IDE, and IIRC selected SCSI too. > > See the problem: "I assume", "IIRC selected...". There is no > list of corroborated facts about which systems work and which don't. I have > made several attempts at compiling one, posting public calls for data > here, with no response. Well, what stops you from building that list by doing the actual work yourself? ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity (was: Disk write cache) 2005-05-16 14:02 ` Arjan van de Ven @ 2005-05-16 14:48 ` Matthias Andree 2005-05-16 15:06 ` Alan Cox 0 siblings, 1 reply; 144+ messages in thread From: Matthias Andree @ 2005-05-16 14:48 UTC (permalink / raw) To: Arjan van de Ven; +Cc: Matthias Andree, linux-kernel On Mon, 16 May 2005, Arjan van de Ven wrote: > > See the problem: "I assume", "IIRC selected...". There is no > > list of corroborated facts about which systems work and which don't. I have > > made several attempts at compiling one, posting public calls for data > > here, with no response. > > Well, what stops you from building that list by doing the actual > work yourself? Two things. #1, it's the subsystem maintainers' responsibility to arrange for such information. I searched Documentation/* to no avail; see below. #2, I would need to get acquainted with and understand several dozen subsystems, drivers and so on to be able to make a substantiated statement. Subsystem maintainers will usually know what shape their code is in and just need to state "not yet", "not planned", "not needed, different layer", "work in progress" or "working since kernel version 2.6.42". That takes a minute per maintainer, rather than wasting countless hours working through foreign code only to forget it all once I know what I wanted to know. Sounds like an unreasonable expectation? Not to me. I had hoped, several times, that asking here would give the first dozen answers as a starting point. It's not as though I could just take two weeks off and read all the common block device code... I still have insufficient information even for ext3 on traditional parallel ATA interfaces, so how do I start a list without information?
$ cd linux-2.6/Documentation/
$ find -iname '*barr*'
./arm/Sharp-LH/IOBarrier
$ head -4 ../Makefile
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 11
EXTRAVERSION = .9
$

Documentation/block/biodoc.txt has some information about what it could look like two years from now. filesystems/ext3 mentions that it requires a barrier=1 mount option. There is no information on which block interfaces support it. AIC7XXX was once reported to have it, experimentally; I don't know what has become of the code, and I don't have an AIC7XXX here - too expensive. -- Matthias Andree ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity (was: Disk write cache) 2005-05-16 14:48 ` Matthias Andree @ 2005-05-16 15:06 ` Alan Cox 2005-05-16 15:40 ` Matthias Andree 2005-05-29 21:02 ` Linux does not care for data integrity (was: Disk write cache) Greg Stark 0 siblings, 2 replies; 144+ messages in thread From: Alan Cox @ 2005-05-16 15:06 UTC (permalink / raw) To: Matthias Andree; +Cc: Arjan van de Ven, Linux Kernel Mailing List I think you need to get real if you want that degree of integrity with a PC. Your typical PC setup means your precious data:

Gets written to non-ECC-protected memory over an unprotected bus
Gets read back over the same
Each PATA command is sent without any CRC or error recovery/correction
The PATA data is pulled out of unprotected memory over PCI
It goes to the drive (with a CRC) and gets stored in memory
It's probably sitting in non-ECC RAM on the disk
It's probably fed through non-ECC DSP logic
It's mixed on the disk with other data and may get rewritten without you knowing

You might want to amuse yourself trying to get the bit error rates for the busses and RAM to start documenting the probabilities. I'd prefer Linux turned the write cache off on old drives, but Mark Lord has really good points even there. And for SCSI we do tagging, and the journals can be ordered depending on your need. You are storing 40 billion bits of information on a lump of metal and glass rotating at 10,000 rpm, pushing into areas of quantum theory in order to store your data. It should be no surprise it might not be there a month later. You also appear confused: it isn't the maintainer's responsibility to arrange for such info. It's the maintainer's responsibility to process contributed patches with such info. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity (was: Disk write cache) 2005-05-16 15:06 ` Alan Cox @ 2005-05-16 15:40 ` Matthias Andree 2005-05-16 18:04 ` Alan Cox 2005-05-29 21:02 ` Linux does not care for data integrity (was: Disk write cache) Greg Stark 1 sibling, 1 reply; 144+ messages in thread From: Matthias Andree @ 2005-05-16 15:40 UTC (permalink / raw) To: Alan Cox; +Cc: Matthias Andree, Arjan van de Ven, Linux Kernel Mailing List On Mon, 16 May 2005, Alan Cox wrote: > I'd prefer Linux turned the write cache off on old drives but Mark Lord has > really good points even there. And for SCSI we do tagging and the > journals can be ordered depending on your need. Is tagged command queueing (we'll need the ordered tag here) compatible with all SCSI adaptors that Linux supports? What if tagged command queueing is switched off for some reason (adaptor or hardware incapability, user override) and the drive still has write cache enable = true and queue algorithm modifier = 1 (which permits out-of-order execution of write requests except for ordered tags)? Is that something that would cause some notice to be logged? Or is it simply "do this at your own risk"? My recent SCSI drives have been shipping with WCE=1 and QAM=0. Am I missing a bit here? > You also appear confused: it isn't the maintainer's responsibility to > arrange for such info. It's the maintainer's responsibility to process > contributed patches with such info. I didn't mean arranging as in "write it himself". Who writes that info down doesn't matter, but I'd think that such documentation should always be committed alongside the code, except in code marked experimental (which, in turn, should only be promoted to non-experimental once it's properly documented). I understand that people who understand the code are eager to focus on the code, and even if that documentation is just an unordered list of statements with a kernel version attached, that'd be fine. But what is decent code without users?
-- Matthias Andree ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity (was: Disk write cache) 2005-05-16 15:40 ` Matthias Andree @ 2005-05-16 18:04 ` Alan Cox 2005-05-16 19:11 ` Linux does not care for data integrity Florian Weimer 0 siblings, 1 reply; 144+ messages in thread From: Alan Cox @ 2005-05-16 18:04 UTC (permalink / raw) To: Matthias Andree; +Cc: Arjan van de Ven, Linux Kernel Mailing List On Llu, 2005-05-16 at 16:40, Matthias Andree wrote: > On Mon, 16 May 2005, Alan Cox wrote: > Is tagged command queueing (we'll need the ordered tag here) compatible > with all SCSI adaptors that Linux supports? TCQ is a device not controller property. > What if tagged command queueing is switched off for some reason > (adaptor or HW incapability, user override) and the drive still has > write cache enable = true and queue algorithm modifier = 1 (which We turn the write back cache off if TCQ isn't available. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity 2005-05-16 18:04 ` Alan Cox @ 2005-05-16 19:11 ` Florian Weimer 0 siblings, 0 replies; 144+ messages in thread From: Florian Weimer @ 2005-05-16 19:11 UTC (permalink / raw) To: Alan Cox; +Cc: Matthias Andree, Arjan van de Ven, Linux Kernel Mailing List * Alan Cox: > On Llu, 2005-05-16 at 16:40, Matthias Andree wrote: >> On Mon, 16 May 2005, Alan Cox wrote: >> Is tagged command queueing (we'll need the ordered tag here) compatible >> with all SCSI adaptors that Linux supports? > > TCQ is a device not controller property. I suppose it's one in RAID controllers. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity (was: Disk write cache) 2005-05-16 15:06 ` Alan Cox 2005-05-16 15:40 ` Matthias Andree @ 2005-05-29 21:02 ` Greg Stark 2005-05-29 21:16 ` Matthias Andree 1 sibling, 1 reply; 144+ messages in thread From: Greg Stark @ 2005-05-29 21:02 UTC (permalink / raw) To: Alan Cox; +Cc: Matthias Andree, Arjan van de Ven, Linux Kernel Mailing List Alan Cox <alan@lxorguk.ukuu.org.uk> writes: > I think you need to get real if you want that degree of integrity with a > PC > > Your typical PC setup means your precious data ... All of your listed cases are low-probability events. You're quite right that low-probability errors will always be present -- you could have just listed cosmic rays and been finished; they're by far the most common such source of errors. But that doesn't mean we should just throw up our hands and say there's no way to make computers work right, let's go home. Making computer systems that don't randomly trash file systems in the case of power outages isn't a hard problem. It's been solved for decades. That's *why* fsync exists. Oracle, Sybase, Postgres, and other databases have hard requirements. They guarantee that when they acknowledge a transaction commit, the data has been written to non-volatile media and will be recoverable even in the face of a routine power loss. They meet this requirement just fine on SCSI drives (where write caching generally ships disabled) and on any OS where fsync issues a cache flush. If the OS doesn't successfully flush the data to disk on fsync, then it's quite likely that any routine power outage will mean lost transactions. That's just ridiculous. Worse, if the disk flushes the data to disk out of order, it's quite likely the entire database will be corrupted on any simple power outage. I'm not clear whether that's the case for any common drives. -- greg ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity (was: Disk write cache) 2005-05-29 21:02 ` Linux does not care for data integrity (was: Disk write cache) Greg Stark @ 2005-05-29 21:16 ` Matthias Andree 2005-05-30 6:04 ` Greg Stark 2005-06-01 19:02 ` Linux does not care for data integrity Bill Davidsen 0 siblings, 2 replies; 144+ messages in thread From: Matthias Andree @ 2005-05-29 21:16 UTC (permalink / raw) To: Greg Stark Cc: Alan Cox, Matthias Andree, Arjan van de Ven, Linux Kernel Mailing List On Sun, 29 May 2005, Greg Stark wrote: > Oracle, Sybase, Postgres, other databases have hard requirements. They > guarantee that when they acknowledge a transaction commit the data has been > written to non-volatile media and will be recoverable even in the face of a > routine power loss. > > They meet this requirement just fine on SCSI drives (where write caching > generally ships disabled) and on any OS where fsync issues a cache flush. If I don't know what facts "generally ships disabled" is based on; all of the more recent SCSI drives (non-SCA type, though) I acquired came with write cache enabled, and some also with the queue algorithm modifier set to 1. > Worse, if the disk flushes the data to disk out of order it's quite > likely the entire database will be corrupted on any simple power > outage. I'm not clear whether that's the case for any common drives. It's a matter of enforcing write order. To what extent such ordering constraints are propagated by file systems and the VFS layer down to the hardware is the grand question. -- Matthias Andree ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity (was: Disk write cache) 2005-05-29 21:16 ` Matthias Andree @ 2005-05-30 6:04 ` Greg Stark 2005-05-30 8:21 ` Matthias Andree 2005-06-01 19:02 ` Linux does not care for data integrity Bill Davidsen 1 sibling, 1 reply; 144+ messages in thread From: Greg Stark @ 2005-05-30 6:04 UTC (permalink / raw) To: Matthias Andree Cc: Greg Stark, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List Matthias Andree <matthias.andree@gmx.de> writes: > On Sun, 29 May 2005, Greg Stark wrote: > > > They meet this requirement just fine on SCSI drives (where write caching > > generally ships disabled) and on any OS where fsync issues a cache flush. If > > I don't know what facts "generally ships disabled" is based on, all of > the more recent SCSI drives (non SCA type though) I acquired came with > write cache enabled and some also with queue algorithm modifier set to 1. People routinely post "Why does this cheap IDE drive outperform my shiny new high end SCSI drive?" questions to the postgres mailing list. To which people point out that the IDE numbers presented are physically impossible for a 7200 RPM drive, while the SCSI numbers agree with the average rotational latency calculated from whatever speed their SCSI drives spin at. > > Worse, if the disk flushes the data to disk out of order it's quite > > likely the entire database will be corrupted on any simple power > > outage. I'm not clear whether that's the case for any common drives. > > It's a matter of enforcing write order. In how far such ordering > constraints are propagated by file systems, VFS layer, down to the > hardware, is the grand question. Well, guaranteeing write order will at least mean the database isn't complete garbage after a power event. It still means lost transactions, something that isn't going to be acceptable for any real-life business where those transactions are actual dollars. -- greg ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity (was: Disk write cache) 2005-05-30 6:04 ` Greg Stark @ 2005-05-30 8:21 ` Matthias Andree 0 siblings, 0 replies; 144+ messages in thread From: Matthias Andree @ 2005-05-30 8:21 UTC (permalink / raw) To: Greg Stark Cc: Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List On Mon, 30 May 2005, Greg Stark wrote: > Matthias Andree <matthias.andree@gmx.de> writes: > > > On Sun, 29 May 2005, Greg Stark wrote: > > > > > They meet this requirement just fine on SCSI drives (where write caching > > > generally ships disabled) and on any OS where fsync issues a cache flush. If > > > > I don't know what facts "generally ships disabled" is based on, all of > > the more recent SCSI drives (non SCA type though) I acquired came with > > write cache enabled and some also with queue algorithm modifier set to 1. > > People routinely post "Why does this cheap IDE drive outperform my shiny new > high end SCSI drive?" questions to the postgres mailing list. To which people > point out the IDE numbers they've presented are physically impossible for a > 7200 RPM drive and the SCSI numbers agree appropriately with an average > rotational latency calculated from whatever speed their SCSI drives are. This may have a cause other than the vendor default or the saved setting being WCE = 0, Queue Algorithm Modifier = 0... I would really appreciate it if the kernel printed a warning for every partition mounted that cannot both enforce write order and guarantee synchronous completion for f(data)sync, based on the drive's write cache, file system type, current write barrier support and all that. > > It's a matter of enforcing write order. In how far such ordering > > constraints are propagated by file systems, VFS layer, down to the > > hardware, is the grand question. > > Well guaranteeing write order will at least mean the database isn't complete > garbage after a power event. 
> > It still means lost transactions, something that isn't going to be acceptable > for any real-life business where those transactions are actual dollars. Right, synchronous completion is the other issue. I want the kernel to tell me if it's capable of doing that on a particular partition (given hardware settings WRT cache, drivers, file system, and all that). Either in the docs or, if that's too confusing, via dmesg. -- Matthias Andree ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity 2005-05-29 21:16 ` Matthias Andree 2005-05-30 6:04 ` Greg Stark @ 2005-06-01 19:02 ` Bill Davidsen 2005-06-01 22:02 ` Matthias Andree ` (2 more replies) 1 sibling, 3 replies; 144+ messages in thread From: Bill Davidsen @ 2005-06-01 19:02 UTC (permalink / raw) To: Matthias Andree; +Cc: Alan Cox, Arjan van de Ven, Linux Kernel Mailing List Matthias Andree wrote: > On Sun, 29 May 2005, Greg Stark wrote: > > >>Oracle, Sybase, Postgres, other databases have hard requirements. They >>guarantee that when they acknowledge a transaction commit the data has been >>written to non-volatile media and will be recoverable even in the face of a >>routine power loss. >> >>They meet this requirement just fine on SCSI drives (where write caching >>generally ships disabled) and on any OS where fsync issues a cache flush. If > > > I don't know what facts "generally ships disabled" is based on, all of > the more recent SCSI drives (non SCA type though) I acquired came with > write cache enabled and some also with queue algorithm modifier set to 1. > > >>Worse, if the disk flushes the data to disk out of order it's quite >>likely the entire database will be corrupted on any simple power >>outage. I'm not clear whether that's the case for any common drives. > > > It's a matter of enforcing write order. In how far such ordering > constraints are propagated by file systems, VFS layer, down to the > hardware, is the grand question. > The problem is that many of the options required to make that happen in the OS, hardware, and application are going to kill performance. And even if you can control write order, unless you can control the write to the final non-volatile media you can get a sane database but still lose transactions. If there were a way for the OS to know when a physical write was done other than using flushes to force completion, then overall performance could be higher, though individual transactions might have greater latency. 
And the app could use fsync to force write order as needed. In many cases groups of writes can be done in any order as long as they are all done before the next logical step takes place. This would change the meaning of fsync from "force out the data" to "wait for the data to be written" in some implementations. -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity 2005-06-01 19:02 ` Linux does not care for data integrity Bill Davidsen @ 2005-06-01 22:02 ` Matthias Andree 2005-06-02 0:12 ` Bill Davidsen 2005-06-02 0:36 ` Jeff Garzik 2005-06-02 8:53 ` Helge Hafting 2 siblings, 1 reply; 144+ messages in thread From: Matthias Andree @ 2005-06-01 22:02 UTC (permalink / raw) To: Bill Davidsen Cc: Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List On Wed, 01 Jun 2005, Bill Davidsen wrote: > >It's a matter of enforcing write order. In how far such ordering > >constraints are propagated by file systems, VFS layer, down to the > >hardware, is the grand question. > > The problem is that in many options required to make that happen in the > o/s, hardware, and application are going to kill performance. And even > if you can control order of write, unless you can get write to final > non-volatile media control you can get a sane database but still lose > transactions. > > If there was a way for the o/s to know when a physical write was done > other than using flushes to force completion, then overall performance > could be higher, but individual transaction might have greater latency. > And the app could use fsync to force order of write as needed. In many > cases groups of writes can be done in any order as long as they are all > done before the next logical step takes place. I have a sense of déjà vu; I believe this discussion has taken place on this list before, perhaps with a slightly different alignment, and likely in the context of mail transfer agents and perhaps synchronous directory (data) updates (file creation and such). Exposing a bit of the queueing to user space through new syscalls may be an interesting experiment, although I do not have the resources to provide code. 
Something like fsync() that doesn't flush the whole file system (which appears to be the most common implementation) but tracks what is needed, and that returns when data for a given file is on disk. > This would change the meaning of fsync from "force out the data" to > "wait for the data to be written" in some implementations. Naming suggestion: flazysync() -- Matthias Andree ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity 2005-06-01 22:02 ` Matthias Andree @ 2005-06-02 0:12 ` Bill Davidsen 0 siblings, 0 replies; 144+ messages in thread From: Bill Davidsen @ 2005-06-02 0:12 UTC (permalink / raw) To: Matthias Andree; +Cc: Alan Cox, Arjan van de Ven, Linux Kernel Mailing List Matthias Andree wrote: >On Wed, 01 Jun 2005, Bill Davidsen wrote: > > > >>>It's a matter of enforcing write order. In how far such ordering >>>constraints are propagated by file systems, VFS layer, down to the >>>hardware, is the grand question. >>> >>> >>The problem is that in many options required to make that happen in the >>o/s, hardware, and application are going to kill performance. And even >>if you can control order of write, unless you can get write to final >>non-volatile media control you can get a sane database but still lose >>transactions. >> >>If there was a way for the o/s to know when a physical write was done >>other than using flushes to force completion, then overall performance >>could be higher, but individual transaction might have greater latency. >>And the app could use fsync to force order of write as needed. In many >>cases groups of writes can be done in any order as long as they are all >>done before the next logical step takes place. >> >> > >I have a déjà-vu, and I do believe that this discussion has taken place >in this list before, perhaps with a slightly different alignment, and >likely in the context of mail transfer agents and perhaps synchronous >directory (data) updates (file creation and such). Exposing a bit of the >queueing to the user space through new syscalls may be an interesting >experiment, although I do not have the resources to provide code. >Something like fsync() that doesn't flush the whole file system (which >appears to be the most common implementation) but tracks what is needed, >and that returns when data for a given file is on disk. 
> > What I had in mind was not a "push" to flush anything anywhere, but rather a watch. As a hypothetical, I open a file and every time a write() is done a counter is incremented in the fd. That's the easy part. Then every time a physical write is completed the count is reduced. To allow for write combining, the count could be in bytes rather than syscalls and physical operations. That's the hard part; I don't think the hardware is telling. In addition, writes may obviously be combined between I/O related to several fds. But if that could be done, then fsync becomes "wait until my buffered byte count drops to zero," which could be an ioctl. Just having such a checkpoint would address some of the data coherency issues. AFAIK this isn't possible with common ATA devices, and it clearly doesn't address every desirable feature. In spite of that, if someone better qualified to assess the problems and benefits cares to comment, fine. If not, at least I think I explained what I was thinking more clearly. > > >>This would change the meaning of fsync from "force out the data" to >>"wait for the data to be written" in some implementations. >> >> > >Naming suggestion: flazysync() > > > -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity 2005-06-01 19:02 ` Linux does not care for data integrity Bill Davidsen 2005-06-01 22:02 ` Matthias Andree @ 2005-06-02 0:36 ` Jeff Garzik 2005-06-02 1:37 ` Bill Davidsen 2005-06-02 8:53 ` Helge Hafting 2 siblings, 1 reply; 144+ messages in thread From: Jeff Garzik @ 2005-06-02 0:36 UTC (permalink / raw) To: Bill Davidsen Cc: Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List Bill Davidsen wrote: > This would change the meaning of fsync from "force out the data" to > "wait for the data to be written" in some implementations. This is the meaning of fsync: copies all in-core parts of a file to disk, and waits until the device reports that all parts are on stable storage. Anything less is a bug. Jeff ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity 2005-06-02 0:36 ` Jeff Garzik @ 2005-06-02 1:37 ` Bill Davidsen 2005-06-02 1:54 ` Jeff Garzik 0 siblings, 1 reply; 144+ messages in thread From: Bill Davidsen @ 2005-06-02 1:37 UTC (permalink / raw) To: Jeff Garzik Cc: Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List Jeff Garzik wrote: > Bill Davidsen wrote: > >> This would change the meaning of fsync from "force out the data" to >> "wait for the data to be written" in some implementations. > > > This is the meaning of fsync: copies all in-core parts of a file to > disk, and waits until the device reports that all parts are on stable > storage. > > Anything less is a bug. How about anything more? The truth is that much common hardware doesn't really make the cache-to-disk move visible, and turning off the cache really hurts performance. And it would appear that fsync forces a lot more data out of memory than just the blocks for the file in question. However, the point I was making is that it would be useful to be able to tell when the write to non-volatile media took place, not to force that to happen. Not to do anything which would flush a lot of other stuff and busy the drive. What I suggest is NOT fsync, just a way to assure ordering. -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity 2005-06-02 1:37 ` Bill Davidsen @ 2005-06-02 1:54 ` Jeff Garzik 0 siblings, 0 replies; 144+ messages in thread From: Jeff Garzik @ 2005-06-02 1:54 UTC (permalink / raw) To: Bill Davidsen Cc: Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List Bill Davidsen wrote: > How about anything more? The truth is that much common hardware doesn't > really make the cache to disk move visible, and turning off cache really > hurts performance. And it would appear that fsync force a lot more data > out of memory than just the blocks for the file in question. Correct. That's the tradeoff with the ATA interface: you must be aware of the cache flush requirements when designing a solution such as a database that really cares about fsync(2), or a journalling filesystem. > However, the point I was making is that it would be useful to be able to > tell when the write to non-volatile took place, not to force that to > happen. Not to do anything which would flush a lot of other stuff and > busy the drive. What I suggest is NOT fsync, just a way to assure ordering. To make that possible, POSIX must become a transactional, async I/O API... :) Jeff ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity 2005-06-01 19:02 ` Linux does not care for data integrity Bill Davidsen 2005-06-01 22:02 ` Matthias Andree 2005-06-02 0:36 ` Jeff Garzik @ 2005-06-02 8:53 ` Helge Hafting 2005-06-02 12:00 ` Bill Davidsen 2 siblings, 1 reply; 144+ messages in thread From: Helge Hafting @ 2005-06-02 8:53 UTC (permalink / raw) To: Bill Davidsen Cc: Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List Bill Davidsen wrote: > Matthias Andree wrote: > >> On Sun, 29 May 2005, Greg Stark wrote: >> >> >>> Oracle, Sybase, Postgres, other databases have hard requirements. They >>> guarantee that when they acknowledge a transaction commit the data >>> has been >>> written to non-volatile media and will be recoverable even in the >>> face of a >>> routine power loss. >>> >>> They meet this requirement just fine on SCSI drives (where write >>> caching >>> generally ships disabled) and on any OS where fsync issues a cache >>> flush. If >> >> >> >> I don't know what facts "generally ships disabled" is based on, all of >> the more recent SCSI drives (non SCA type though) I acquired came with >> write cache enabled and some also with queue algorithm modifier set >> to 1. >> >> >>> Worse, if the disk flushes the data to disk out of order it's quite >>> likely the entire database will be corrupted on any simple power >>> outage. I'm not clear whether that's the case for any common drives. >> >> >> >> It's a matter of enforcing write order. In how far such ordering >> constraints are propagated by file systems, VFS layer, down to the >> hardware, is the grand question. >> > The problem is that in many options required to make that happen in > the o/s, hardware, and application are going to kill performance. And > even if you can control order of write, unless you can get write to > final non-volatile media control you can get a sane database but still > lose transactions. 
> > If there was a way for the o/s to know when a physical write was done > other than using flushes to force completion, then overall performance > could be higher, but individual transaction might have greater > latency. And the app could use fsync to force order of write as > needed. In many cases groups of writes can be done in any order as > long as they are all done before the next logical step takes place. There is a workaround. Get a UPS just for the disks. It doesn't have to be big, just enough to keep the disks going long enough to commit their caches after the rest of the machine dies from a power loss. Such a small unit could possibly fit inside the cabinet, avoiding the trouble with people stepping on the power cord. With this in place, any write that makes it from the controller to the disk is safely stored for all practical purposes. Helge Hafting ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity 2005-06-02 8:53 ` Helge Hafting @ 2005-06-02 12:00 ` Bill Davidsen 2005-06-02 13:33 ` Lennart Sorensen 0 siblings, 1 reply; 144+ messages in thread From: Bill Davidsen @ 2005-06-02 12:00 UTC (permalink / raw) To: Helge Hafting Cc: Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List Helge Hafting wrote: > Bill Davidsen wrote: > >> Matthias Andree wrote: >> >>> On Sun, 29 May 2005, Greg Stark wrote: >>> >>> >>>> Oracle, Sybase, Postgres, other databases have hard requirements. They >>>> guarantee that when they acknowledge a transaction commit the data >>>> has been >>>> written to non-volatile media and will be recoverable even in the >>>> face of a >>>> routine power loss. >>>> >>>> They meet this requirement just fine on SCSI drives (where write >>>> caching >>>> generally ships disabled) and on any OS where fsync issues a cache >>>> flush. If >>> >>> >>> >>> >>> I don't know what facts "generally ships disabled" is based on, all of >>> the more recent SCSI drives (non SCA type though) I acquired came with >>> write cache enabled and some also with queue algorithm modifier set >>> to 1. >>> >>> >>>> Worse, if the disk flushes the data to disk out of order it's quite >>>> likely the entire database will be corrupted on any simple power >>>> outage. I'm not clear whether that's the case for any common drives. >>> >>> >>> >>> >>> It's a matter of enforcing write order. In how far such ordering >>> constraints are propagated by file systems, VFS layer, down to the >>> hardware, is the grand question. >>> >> The problem is that in many options required to make that happen in >> the o/s, hardware, and application are going to kill performance. And >> even if you can control order of write, unless you can get write to >> final non-volatile media control you can get a sane database but >> still lose transactions. 
>> >> If there was a way for the o/s to know when a physical write was done >> other than using flushes to force completion, then overall >> performance could be higher, but individual transaction might have >> greater latency. And the app could use fsync to force order of write >> as needed. In many cases groups of writes can be done in any order as >> long as they are all done before the next logical step takes place. > > > There is a workaround. Get an UPS just for the disks. It don't have > to be > big, just enough to keep the disks going long enough to commit their > caches after the rest of the machine died from a power loss. Such a > small > unit could possibly fit inside the cabinet, avoiding the trouble with > people stepping on the power cord. > > With this in place, any write that makes it from the controller to the > disk is safely stored for all practical purposes. Unfortunately even drives in a dual power tray with redundant power from separate UPS sources will occasionally have a power failure. We proved that last month: the power strip in the rack failed, dumped all the load on the other leg, and the surge tripped a breaker. Had an APC UPS in my office fail in a mode where it dropped power, waited for the battery to trickle-charge a bit, then repeated. Looks to be losing half of a full-wave rectifier. The point is that power failures WILL HAPPEN, even with good backups. The goal should be to prevent excessive and avoidable data damage when it does. Shameless plug: for office use I changed from APC to Belkin on all new units; they have had Linux drivers for some time now, and I like to support those who support Linux. -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity 2005-06-02 12:00 ` Bill Davidsen @ 2005-06-02 13:33 ` Lennart Sorensen 2005-06-04 13:37 ` Bill Davidsen 0 siblings, 1 reply; 144+ messages in thread From: Lennart Sorensen @ 2005-06-02 13:33 UTC (permalink / raw) To: Bill Davidsen Cc: Helge Hafting, Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List On Thu, Jun 02, 2005 at 08:00:34AM -0400, Bill Davidsen wrote: > Unfortunately even drives in a dual power tray with redundant power from > separate UPS sources will occasionally have a power failure. Proved that > last month, the power strip in the rack failed, dumped all the load on > the other leg, the surge tripped a breaker. Had an APC UPS in my office > fail in a mode which dropped power, waited for the battery to trickle > charge to charge the battery a bit, then repeat. Looks to be losing half > of a full wave rectifier. > > The point is that power failures WILL HAPPEN, even with good backups. > The goal should be to prevent excessive and avoidable data damage when > it does. > > Shameless plug: for office use I changed from APC to Belkin on all new > units, they have had Linux drivers for some time now, and I like to > support those who support Linux. Hasn't apcupsd existed for at least a decade? Works rather well for me. Hard to imagine better Linux/Unix support than APC seems to have provided so far. For some reason Belkin screams cheap junk to me. Maybe that's because that is what you always see for sale with that brand on it. They may have nice stuff that I just haven't seen because it isn't carried by most stores. Len Sorensen ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity 2005-06-02 13:33 ` Lennart Sorensen @ 2005-06-04 13:37 ` Bill Davidsen 2005-06-04 15:31 ` Bernd Eckenfels 0 siblings, 1 reply; 144+ messages in thread From: Bill Davidsen @ 2005-06-04 13:37 UTC (permalink / raw) To: Lennart Sorensen Cc: Helge Hafting, Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List Lennart Sorensen wrote: >On Thu, Jun 02, 2005 at 08:00:34AM -0400, Bill Davidsen wrote: > > >>Unfortunately even drives in a dual power tray with redundant power from >>separate UPS sources will occasionally have a power failure. Proved that >>last month, the power strip in the rack failed, dumped all the load on >>the other leg, the surge tripped a breaker. Had an APC UPS in my office >>fail in a mode which dropped power, waited for the battery to trickle >>charge to charge the battery a bit, then repeat. Looks to be losing half >>of a full wave rectifier. >> >>The point is that power failures WILL HAPPEN, even with good backups. >>The goal should be to prevent excessive and avoidable data damage when >>it does. >> >>Shameless plug: for office use I changed from APC to Belkin on all new >>units, they have had Linux drivers for some time now, and I like to >>support those who support Linux. >> >> > >Hasn't apcupsd existed for at least a decade? Works rather well for me. >Hard to imagine better linux/unix support than APC seems to have >provided so far. > > I thought apcupsd was a third-party project; SourceForge shows it as one. Didn't know APC was actually "providing" anything; is the driver on the CD now? Sure wasn't on the APC CD I had. I did have it at one time, but it didn't come with the UPS (at that time). >For some reason Belkin screams cheap junk to me. Maybe that's because >that is what you always see for sale with that brand on it. They may >have nice stuff that I just haven't seen because it isn't carried by >most stores. > You don't have Staples or Wal-Mart? 
Office Max did drop the UPS; the local store manager said the issue was margin, as they hadn't had enough returns on either brand to be meaningful. -- bill davidsen <davidsen@tmr.com> CTO TMR Associates, Inc Doing interesting things with small computers since 1979 ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity 2005-06-04 13:37 ` Bill Davidsen @ 2005-06-04 15:31 ` Bernd Eckenfels 0 siblings, 0 replies; 144+ messages in thread From: Bernd Eckenfels @ 2005-06-04 15:31 UTC (permalink / raw) To: linux-kernel In article <42A1AE8B.5000907@tmr.com> you wrote: > I thought apcuspd was a third party project, sourceforce shows it as a > project. Didn't know APC was actually "providing" anything, is the > driver on the CD now? Sure wasn't on the APC CD I had, I did have it at > one time, but it didn't come with the UPS (at that time). And I haven't found a UPS daemon yet that can query the SNMP cards out of the box; one has to hack that up oneself. However, this is still no protection against data corruption on power loss. I mean: there is a fuse in your server. And UPSes are known to fail even with power attached. Regards, Bernd ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity (was: Disk write cache) 2005-05-16 11:12 ` Arjan van de Ven 2005-05-16 11:29 ` Matthias Andree @ 2005-05-16 14:57 ` Alan Cox 1 sibling, 0 replies; 144+ messages in thread From: Alan Cox @ 2005-05-16 14:57 UTC (permalink / raw) To: Arjan van de Ven; +Cc: Matthias Andree, Linux Kernel Mailing List On Llu, 2005-05-16 at 12:12, Arjan van de Ven wrote: > Sure you can waive rethorics around, but the fact is that linux is > improving; there now is write barrier support for ext3 (and I assume > reiserfs) for at least IDE and iirc selected scsi too. scsi supports tagging so ext3 at least is just fine. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity 2005-05-16 11:02 ` Linux does not care for data integrity (was: Disk write cache) Matthias Andree 2005-05-16 11:12 ` Arjan van de Ven @ 2005-05-16 13:48 ` Mark Lord 2005-05-16 14:59 ` Matthias Andree 1 sibling, 1 reply; 144+ messages in thread From: Mark Lord @ 2005-05-16 13:48 UTC (permalink / raw) To: Matthias Andree; +Cc: linux-kernel >To make this explicit and unmistakable, Linux should be ashamed of >having put its users' data at risk for as long as it has existed, and >looking at how often I still get "barrier synch failed", it still does >with the kernel SUSE Linux 9.3 shipped with. With ATA drives, this is strictly a userspace "policy" decision. Most of us want longer lifespan and 2X the performance from our hardware, and use UPSs to guarantee continuous power & survivability. Others want to live more dangerously on the power supply end, but still be safe on the filesystem end -- no guarantees there, even with "hdparm -W0" to disable the on-drive cache. Pulling power from a writing drive is ALWAYS a bad idea, and can permanently corrupt the track/cylinder that was being written. This will toast a filesystem regardless of how careful or proper the write flushes were done. Write caching on the drive is not as big an issue as good reliable power for this. Cheers ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Linux does not care for data integrity 2005-05-16 13:48 ` Linux does not care for data integrity Mark Lord @ 2005-05-16 14:59 ` Matthias Andree 0 siblings, 0 replies; 144+ messages in thread From: Matthias Andree @ 2005-05-16 14:59 UTC (permalink / raw) To: Mark Lord; +Cc: Matthias Andree, linux-kernel On Mon, 16 May 2005, Mark Lord wrote: > Most of us want longer lifespan and 2X the performance from our hardware, > and use UPSs to guarantee continuous power & survivability. Which is a different story and doesn't protect from dying power supply units. I have replaced several PSUs that died "in mid-flight" and that were not overloaded. A UPS isn't going to help in that case. Of course you can use a redundant PSU and redundant UPS - but that easily costs more than a battery-backed cache on a decent RAID controller - and drive failure will also toast file systems. > Others want to live more dangerously on the power supply end, > but still be safe on the filesystem end -- no guarantees there, > even with "hdparm -W0" to disable the on-drive cache. As long as one can rely on the kernel scheduling writes in the proper order, no problem that I'd see. ext3 has apparently been doing this for a long time in the default options, and I have yet to see ext3 corruption (except for massive hardware failure such as b0rked non-ECC RAM or a harddisk that crashed its heads). > Pulling power from a writing drive is ALWAYS a bad idea, > and can permanently corrupt the track/cylinder that was being > written. This will toast a filesystem regardless of how careful > or proper the write flushes were done. Most drive manufacturers make more extensive guarantees about what does NOT get damaged when a write is interrupted by power loss, and are careful to turn the write current off pretty soon on power loss. None of the OEM manuals I looked at advised that data that was already on disk would be damaged beyond the block that was being written. 
-- Matthias Andree ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-15 15:29 ` Jeff Garzik 2005-05-15 16:27 ` Disk write cache Kenichi Okuyama @ 2005-05-16 1:56 ` Gene Heskett 2005-05-16 2:11 ` Jeff Garzik ` (2 more replies) 1 sibling, 3 replies; 144+ messages in thread From: Gene Heskett @ 2005-05-16 1:56 UTC (permalink / raw) To: linux-kernel On Sunday 15 May 2005 11:29, Jeff Garzik wrote: >On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote: >> On Sunday 15 May 2005 11:00, Mikulas Patocka wrote: >> >On Sun, 15 May 2005, Tomasz Torcz wrote: >> >> On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote: >> >> > > > > However they've patched the FreeBSD kernel to >> >> > > > > "workaround?" it: >> >> > > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09 >> >> > > > >/ht t5.patch >> >> > > > >> >> > > > That's a similar stupid idea as they did with the disk >> >> > > > write cache (lowering the MTBFs of their disks by >> >> > > > considerable factors, which is much worse than the power >> >> > > > off data loss problem) Let's not go down this path >> >> > > > please. >> >> > > >> >> > > What wrong did they do with disk write cache? >> >> > >> >> > They turned it off by default, which according to disk >> >> > vendors lowers the MTBF of your disk to a fraction of the >> >> > original value. >> >> > >> >> > I bet the total amount of valuable data lost for FreeBSD >> >> > users because of broken disks is much much bigger than what >> >> > they gained from not losing in the rather hard to hit power >> >> > off cases. >> >> >> >> Aren't I/O barriers a way to safely use write cache? >> > >> >FreeBSD used these barriers (FLUSH CACHE command) long time ago. >> > >> >There are rumors that some disks ignore FLUSH CACHE command just >> > to get higher benchmarks in Windows. But I haven't heart of any >> > proof. Does anybody know, what companies fake this command? 
>> From a story I read elsewhere just a few days ago, this problem is >> virtually universal even in the umpty-bucks 15,000 rpm SCSI server >> drives. It appears that this is just another way to crank up the >> numbers and make each drive seem faster than its competition. >> >> My gut feeling is that if this gets enough ink to get under the >> drive makers' skins, we will see the issuance of a utility from the >> makers that will re-program the drives, thereby enabling the >> proper handling of the FLUSH CACHE command. This would be an >> excellent chance, IMO, to make a bit of noise if the utility comes >> out but only runs on Windows. In that event, we hold their feet >> to the fire (the preferable method), or a wrapper is written that >> allows it to run on any OS with a bash-like shell manager. > >There is a large amount of yammering and speculation in this thread. I agree, and frankly I'm just another of the yammerers, as I don't have the clout to be otherwise. >Most disks do seem to obey SYNC CACHE / FLUSH CACHE. > > Jeff I don't think I have any drives here that do obey that, Jeff. I got curious about this, oh, maybe a year back when this discussion first took place on another list, and wrote a test gizmo that copied a large file, then slept for 1 second and issued a sync command. There was no drive LED activity until the usual 5-second delay of the filesystem had expired. To me, that indicated that the sync command was being returned as completed without error, and I had my shell prompt back long before the drives' LEDs came on. Admittedly that may not be a 100% valid test, but I really did expect to see the LEDs come on as the sync command was executed. 
I also have some setup stuff for heyu that runs at various times of the day, reconfiguring how heyu and xtend run 3 times a day here, which depends on a valid disk file, and I've had to use sleeps to guarantee the proper sequencing, whereas if the sync command actually worked, I could get the job done quite a bit faster. Again, probably not a valid test of the sync command, but that's the evidence I have. I do not believe it works here, with any of the 5 drives currently spinning in these two boxes. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.34% setiathome rank, not too shabby for a WV hillbilly Yahoo.com and AOL/TW attorneys please note, additions to the above message by Gene Heskett are: Copyright 2005 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-16 1:56 ` Disk write cache (Was: Hyper-Threading Vulnerability) Gene Heskett @ 2005-05-16 2:11 ` Jeff Garzik 2005-05-16 2:24 ` Mikulas Patocka 2005-05-16 2:32 ` Mark Lord 2 siblings, 0 replies; 144+ messages in thread From: Jeff Garzik @ 2005-05-16 2:11 UTC (permalink / raw) To: Gene Heskett; +Cc: linux-kernel Gene Heskett wrote: > I don't think I have any drives here that do obey that, Jeff. I got > curious about this, oh, maybe a year back when this discussion first > took place on another list, and wrote a test gizmo that copied a > large file, then slept for 1 second and issued a sync command. No > drive led activity until the usual 5 second delay of the filesystem > had expired. To me, that indicated that the sync command was being > returned as completed without error and I had my shell prompt back > long before the drives leds came on. Admittedly that may not be a > 100% valid test, but I really did expect to see the leds come on as > the sync command was executed. > Again, probably not a valid test of the sync command, but thats the > evidence I have. I do not believe it works here, with any of the 5 > drives currently spinning in these two boxes. Correct, that's a pretty poor test. Jeff ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-16 1:56 ` Disk write cache (Was: Hyper-Threading Vulnerability) Gene Heskett 2005-05-16 2:11 ` Jeff Garzik @ 2005-05-16 2:24 ` Mikulas Patocka 2005-05-16 3:05 ` Gene Heskett 2005-05-16 2:32 ` Mark Lord 2 siblings, 1 reply; 144+ messages in thread From: Mikulas Patocka @ 2005-05-16 2:24 UTC (permalink / raw) To: Gene Heskett; +Cc: linux-kernel On Sun, 15 May 2005, Gene Heskett wrote: > >There is a large amount of yammering and speculation in this thread. > > I agree, and frankly I'm just another of the yammerers as I don't > have the clout to be otherwise. > > >Most disks do seem to obey SYNC CACHE / FLUSH CACHE. > > > > Jeff > > I don't think I have any drives here that do obey that, Jeff. I got > curious about this, oh, maybe a year back when this discussion first > took place on another list, and wrote a test gizmo that copied a > large file, then slept for 1 second and issued a sync command. No > drive led activity until the usual 5 second delay of the filesystem > had expired. To me, that indicated that the sync command was being > returned as completed without error and I had my shell prompt back > long before the drives leds came on. Admittedly that may not be a > 100% valid test, but I really did expect to see the leds come on as > the sync command was executed. > > I also have some setup stuff for heyu that runs at various times of > the day, reconfigureing how heyu and xtend run 3 times a day here, > which depends on a valid disk file, and I've had to use sleeps for > guaranteeing the proper sequencing, where if the sync command > actually worked, I could get the job done quite a bit faster. > > Again, probably not a valid test of the sync command, but thats the > evidence I have. I do not believe it works here, with any of the 5 > drives currently spinning in these two boxes. Note, that Linux can't send FLUSH CACHE command at all (until very recent 2.6 kernels). 
So the write cache is always dangerous under Linux, whether the disk is broken or not. Another note: according to POSIX, sync() is asynchronous --- i.e. it initiates the writes, but doesn't have to wait for them to complete. In Linux, sync() waits for the writes to complete, but it doesn't have to in other OSes. Mikulas > -- > Cheers, Gene > "There are four boxes to be used in defense of liberty: > soap, ballot, jury, and ammo. Please use in that order." > -Ed Howdershelt (Author) > 99.34% setiathome rank, not too shabby for a WV hillbilly > Yahoo.com and AOL/TW attorneys please note, additions to the above > message by Gene Heskett are: > Copyright 2005 by Maurice Eugene Heskett, all rights reserved. > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-16 2:24 ` Mikulas Patocka @ 2005-05-16 3:05 ` Gene Heskett 0 siblings, 0 replies; 144+ messages in thread From: Gene Heskett @ 2005-05-16 3:05 UTC (permalink / raw) To: linux-kernel On Sunday 15 May 2005 22:24, Mikulas Patocka wrote: >On Sun, 15 May 2005, Gene Heskett wrote: >> >There is a large amount of yammering and speculation in this >> > thread. >> >> I agree, and frankly I'm just another of the yammerers as I don't >> have the clout to be otherwise. >> >> >Most disks do seem to obey SYNC CACHE / FLUSH CACHE. >> > >> > Jeff >> >> I don't think I have any drives here that do obey that, Jeff. I >> got curious about this, oh, maybe a year back when this discussion >> first took place on another list, and wrote a test gizmo that >> copied a large file, then slept for 1 second and issued a sync >> command. No drive led activity until the usual 5 second delay of >> the filesystem had expired. To me, that indicated that the sync >> command was being returned as completed without error and I had my >> shell prompt back long before the drives leds came on. Admittedly >> that may not be a 100% valid test, but I really did expect to see >> the leds come on as the sync command was executed. >> >> I also have some setup stuff for heyu that runs at various times >> of the day, reconfigureing how heyu and xtend run 3 times a day >> here, which depends on a valid disk file, and I've had to use >> sleeps for guaranteeing the proper sequencing, where if the sync >> command actually worked, I could get the job done quite a bit >> faster. >> >> Again, probably not a valid test of the sync command, but thats >> the evidence I have. I do not believe it works here, with any of >> the 5 drives currently spinning in these two boxes. > >Note, that Linux can't send FLUSH CACHE command at all (until very > recent 2.6 kernels). So write cache is always dangerous under > Linux, no matter if disk is broken or not. 
> >Another note: according to posix, sync() is asynchronous --- i.e. it >initiates write, but doesn't have to wait for complete. In linux, > sync() waits for writes to complete, but it doesn't have to in > other OSes. > >Mikulas > Humm, I'm getting the impression I should rerun that test script if I can find it. I believe the last time I tried it, I was running a 2.4.x kernel, right now 2.6.12-rc1. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.34% setiathome rank, not too shabby for a WV hillbilly Yahoo.com and AOL/TW attorneys please note, additions to the above message by Gene Heskett are: Copyright 2005 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-16 1:56 ` Disk write cache (Was: Hyper-Threading Vulnerability) Gene Heskett 2005-05-16 2:11 ` Jeff Garzik 2005-05-16 2:24 ` Mikulas Patocka @ 2005-05-16 2:32 ` Mark Lord 2005-05-16 3:08 ` Gene Heskett 2005-05-18 4:03 ` Eric D. Mudama 2 siblings, 2 replies; 144+ messages in thread From: Mark Lord @ 2005-05-16 2:32 UTC (permalink / raw) To: Gene Heskett; +Cc: linux-kernel >took place on another list, and wrote a test gizmo that copied a >large file, then slept for 1 second and issued a sync command. No >drive led activity until the usual 5 second delay of the filesystem >had expired. To me, that indicated that the sync command was being There's your clue. The drive LEDs normally reflect activity over the ATA bus (the cable!). If they're not on, then the drive isn't receiving data/commands from the host. Cheers ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-16 2:32 ` Mark Lord @ 2005-05-16 3:08 ` Gene Heskett 2005-05-16 13:44 ` Mark Lord 2005-05-18 4:03 ` Eric D. Mudama 1 sibling, 1 reply; 144+ messages in thread From: Gene Heskett @ 2005-05-16 3:08 UTC (permalink / raw) To: linux-kernel On Sunday 15 May 2005 22:32, Mark Lord wrote: > >took place on another list, and wrote a test gizmo that copied a > >large file, then slept for 1 second and issued a sync command. No > >drive led activity until the usual 5 second delay of the > > filesystem had expired. To me, that indicated that the sync > > command was being >There's your clue. The drive LEDs normally reflect activity >over the ATA bus (the cable!). If they're not on, then the drive >isn't receiving data/commands from the host. > >Cheers That was my theory too, Mark, but Jeff G. says it's not a valid indicator. So who's right? -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.34% setiathome rank, not too shabby for a WV hillbilly Yahoo.com and AOL/TW attorneys please note, additions to the above message by Gene Heskett are: Copyright 2005 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-16 3:08 ` Gene Heskett @ 2005-05-16 13:44 ` Mark Lord 0 siblings, 0 replies; 144+ messages in thread From: Mark Lord @ 2005-05-16 13:44 UTC (permalink / raw) To: Gene Heskett; +Cc: linux-kernel Gene Heskett wrote: > >>There's your clue. The drive LEDs normally reflect activity >>over the ATA bus (the cable!). If they're not on, then the drive >>isn't receiving data/commands from the host. > > That was my theory too Mark, but Jeff G. says its not a valid > indicator. So who's right? If the LEDs are connected to the controller on the motherboard, then they are a strict indication of activity over the cable between the drive and controller (if they function at all). But it is possible for software to leave those LEDs permanently in the "on" state, depending on the register sequence used. If the LEDs are on the drive itself, they may indicate transfers over the connector (cable) -- usually always the case -- or they could indicate transfers to/from the media. Cheers ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-16 2:32 ` Mark Lord 2005-05-16 3:08 ` Gene Heskett @ 2005-05-18 4:03 ` Eric D. Mudama 1 sibling, 0 replies; 144+ messages in thread From: Eric D. Mudama @ 2005-05-18 4:03 UTC (permalink / raw) To: Mark Lord; +Cc: Gene Heskett, linux-kernel On 5/15/05, Mark Lord <lkml@rtr.ca> wrote: > There's your clue. The drive LEDs normally reflect activity > over the ATA bus (the cable!). If they're not on, then the drive > isn't receiving data/commands from the host. Mark is correct, activity indicators are associated with bus activity, not internal drive activity. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-15 15:21 ` Gene Heskett 2005-05-15 15:29 ` Jeff Garzik @ 2005-05-15 16:24 ` Mikulas Patocka 2005-05-16 11:18 ` Matthias Andree 2005-05-15 21:38 ` Tomasz Torcz 2 siblings, 1 reply; 144+ messages in thread From: Mikulas Patocka @ 2005-05-15 16:24 UTC (permalink / raw) To: Gene Heskett; +Cc: linux-kernel On Sun, 15 May 2005, Gene Heskett wrote: > On Sunday 15 May 2005 11:00, Mikulas Patocka wrote: > >On Sun, 15 May 2005, Tomasz Torcz wrote: > >> On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote: > >> > > > > However they've patched the FreeBSD kernel to > >> > > > > "workaround?" it: > >> > > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/ht > >> > > > >t5.patch > >> > > > > >> > > > That's a similar stupid idea as they did with the disk write > >> > > > cache (lowering the MTBFs of their disks by considerable > >> > > > factors, which is much worse than the power off data loss > >> > > > problem) Let's not go down this path please. > >> > > > >> > > What wrong did they do with disk write cache? > >> > > >> > They turned it off by default, which according to disk vendors > >> > lowers the MTBF of your disk to a fraction of the original > >> > value. > >> > > >> > I bet the total amount of valuable data lost for FreeBSD users > >> > because of broken disks is much much bigger than what they > >> > gained from not losing in the rather hard to hit power off > >> > cases. > >> > >> Aren't I/O barriers a way to safely use write cache? > > > >FreeBSD used these barriers (FLUSH CACHE command) long time ago. > > > >There are rumors that some disks ignore FLUSH CACHE command just to > > get higher benchmarks in Windows. But I haven't heart of any proof. > > Does anybody know, what companies fake this command? > > > From a story I read elsewhere just a few days ago, this problem is > virtually universal even in the umpty-bucks 15,000 rpm scsi server > drives. 
> It appears that this is just another way to crank up the > numbers and make each drive seem faster than its competition. I've just run a test on my Western Digital 40G IDE disk: just writes, without flush cache: 1min 33sec; same access pattern, but flush cache after each write: 20min 7sec (and the disk made more noise). (This testcase does many 1-sector writes to the same or adjacent sectors, so the cache helps a lot here.) So it's likely that this disk honours cache flushing. (But the disk contains another severe bug --- it corrupts its cache-coherency logic when 256-sector accesses are used --- I asked WD about it and got no response. 256 is represented as 0 in IDE registers --- that's probably where the bug came from.) I've also heard a lot of rumors about ignoring cache flush --- but I mean, has anybody actually proven that some disk corrupts data this way? i.e., make a program that repeatedly does this:

write some sector
issue the FLUSH CACHE command
send a packet describing what was written where

... then turn off the machine while this program runs, and see if the disk contains all the data from the packets. Or:

write many small sectors
issue FLUSH CACHE
turn off power via ACPI
on the next reboot, see if the disk contains all the data

Note that a disk could still ignore the FLUSH CACHE command if its cached data are small enough to be written out on power loss, so a short FLUSH CACHE time alone doesn't prove the disk is cheating. Mikulas > My gut feeling is that if this gets enough ink to get under the drive > makers' skins, we will see the issuance of a utility from the makers > that will re-program the drives, thereby enabling the proper > handling of the FLUSH CACHE command. This would be an excellent > chance, IMO, to make a bit of noise if the utility comes out but only > runs on Windows. In that event, we hold their feet to the fire (the > preferable method), or a wrapper is written that allows it to run on > any OS with a bash-like shell manager. 
> > >Mikulas > >- > >To unsubscribe from this list: send the line "unsubscribe > > linux-kernel" in the body of a message to majordomo@vger.kernel.org > >More majordomo info at http://vger.kernel.org/majordomo-info.html > >Please read the FAQ at http://www.tux.org/lkml/ > > -- > Cheers, Gene > "There are four boxes to be used in defense of liberty: > soap, ballot, jury, and ammo. Please use in that order." > -Ed Howdershelt (Author) > 99.34% setiathome rank, not too shabby for a WV hillbilly > Yahoo.com and AOL/TW attorneys please note, additions to the above > message by Gene Heskett are: > Copyright 2005 by Maurice Eugene Heskett, all rights reserved. > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-15 16:24 ` Mikulas Patocka @ 2005-05-16 11:18 ` Matthias Andree 2005-05-16 14:33 ` Jeff Garzik ` (2 more replies) 0 siblings, 3 replies; 144+ messages in thread From: Matthias Andree @ 2005-05-16 11:18 UTC (permalink / raw) To: linux-kernel On Sun, 15 May 2005, Mikulas Patocka wrote: > Note that a disk could still ignore the FLUSH CACHE command if its cached > data are small enough to be written out on power loss, so a short FLUSH > CACHE time doesn't prove the disk is cheating. Have you seen a drive yet that writes back blocks after power loss? I have heard rumors about this, but all OEM manuals I looked at for drives I bought or recommended simply stated that the block currently being written at power loss can become damaged (with write cache off), and that the drive can lose the full write cache at power loss (with write cache on), so this looks like daydreaming manifested as rumor. I've heard that drives would be taking rotational energy from their rotating platters and such, but never heard how the hardware compensates for the dilation as rotational frequency decreases, which would also require changed filter settings for the write channel, block encoding, delays, possibly stepping the heads, and so on. I don't believe these stories until I see evidence. These are corner cases that a vendor would hardly optimize for. If you know a disk drive (not battery-backed disk controller!) that flushes its cache to NVRAM, or uses rotational energy to save its cache on the platters, please name brand and model and where I can download the material that documents this behavior. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-16 11:18 ` Matthias Andree @ 2005-05-16 14:33 ` Jeff Garzik 2005-05-16 15:26 ` Richard B. Johnson 2005-05-16 18:11 ` Disk write cache (Was: Hyper-Threading Vulnerability) Valdis.Kletnieks 2005-05-16 14:54 ` Alan Cox 2005-05-18 4:06 ` Eric D. Mudama 2 siblings, 2 replies; 144+ messages in thread From: Jeff Garzik @ 2005-05-16 14:33 UTC (permalink / raw) To: Matthias Andree; +Cc: linux-kernel Matthias Andree wrote: > On Sun, 15 May 2005, Mikulas Patocka wrote: > > >>Note that disk can still ignore FLUSH CACHE command cached data are small >>enough to be written on power loss, so small FLUSH CACHE time doesn't >>prove disk cheating. > > > Have you seen a drive yet that writes back blocks after power loss? > > I have heard rumors about this, but all OEM manuals I looked at for > drives I bought or recommended simply stated that the block currently > being written at power loss can become damaged (with write cache off), > and that the drive can lose the full write cache at power loss (with > write cache on) so this looks like daydreaming manifested as rumor. Upon power loss, at least one ATA vendor's disks try to write out as much data as possible. Jeff ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-16 14:33 ` Jeff Garzik @ 2005-05-16 15:26 ` Richard B. Johnson 2005-05-16 16:00 ` [OT] drive behavior on power-off (was: Disk write cache) Matthias Andree 2005-05-16 18:11 ` Disk write cache (Was: Hyper-Threading Vulnerability) Valdis.Kletnieks 1 sibling, 1 reply; 144+ messages in thread From: Richard B. Johnson @ 2005-05-16 15:26 UTC (permalink / raw) To: Jeff Garzik; +Cc: Matthias Andree, linux-kernel On Mon, 16 May 2005, Jeff Garzik wrote: > Matthias Andree wrote: >> On Sun, 15 May 2005, Mikulas Patocka wrote: >> >> >>> Note that disk can still ignore FLUSH CACHE command cached data are small >>> enough to be written on power loss, so small FLUSH CACHE time doesn't >>> prove disk cheating. >> >> Have you seen a drive yet that writes back blocks after power loss? >> >> I have heard rumors about this, but all OEM manuals I looked at for >> drives I bought or recommended simply stated that the block currently >> being written at power loss can become damaged (with write cache off), >> and that the drive can lose the full write cache at power loss (with >> write cache on) so this looks like daydreaming manifested as rumor. > > Upon power loss, at least one ATA vendor's disks try to write out as > much data as possible. > > Jeff Then I suggest you never use such a drive. Anything that does this, will end up replacing a good track with garbage. Unless a disk drive has a built-in power source such as super-capacitors or batteries, what happens during a power-failure is that all electronics stops and the discs start coasting. Eventually the heads will crash onto the platter. Older discs had a magnetically released latch which would send the heads to an inside landing zone. Nobody bothers anymore. Many high-quality drives cache data. Fortunately, upon power loss these data are NOT attempted to be written. 
This means that, although you may have incomplete or even bad data on the physical medium, at least the medium can be read and written. The sectoring has not been corrupted (read: destroyed). If you think about the physical process necessary to write data to the medium, you will understand that without a large amount of energy storage capacity on the disk, it's just not possible. To write a sector, one needs to cache the data in a sector buffer, putting on a sector header and trailing CRC, wait for the write splice from the previous sector (could be almost one rotation), then write the data and sync to the sector. If the disc is spinning too slowly, the data will underwrite the sector. Also, if the disc is only 5 percent slow, the clock recovery on a subsequent read will be off by 5 percent, outside the range of PLL lock-in, so you write something that can never be read: a guaranteed bad block. Combining journaling on media that can be completely flushed with ordinary cache-intensive discs can result in reliable data storage. However, a single ATA or SCSI disk just isn't a perfectly reliable storage medium, although it's usually good enough. Cheers, Dick Johnson Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips). Notice : All mail here is now cached for review by Dictator Bush. 98.36% of all statistics are fiction. ^ permalink raw reply [flat|nested] 144+ messages in thread
* [OT] drive behavior on power-off (was: Disk write cache) 2005-05-16 15:26 ` Richard B. Johnson @ 2005-05-16 16:00 ` Matthias Andree 0 siblings, 0 replies; 144+ messages in thread From: Matthias Andree @ 2005-05-16 16:00 UTC (permalink / raw) To: Richard B. Johnson; +Cc: Jeff Garzik, Matthias Andree, linux-kernel On Mon, 16 May 2005, Richard B. Johnson wrote: > Then I suggest you never use such a drive. Anything that does this, > will end up replacing a good track with garbage. Unless a disk drive > has a built-in power source such as super-capacitors or batteries, what > happens during a power-failure is that all electronics stops and > the discs start coasting. Eventually the heads will crash onto > the platter. Older discs had a magnetically released latch which would > send the heads to an inside landing zone. Nobody bothers anymore. IBM/Hitachi hard disk drives still use a "load/unload ramp" that entirely moves the heads off the platters - I've known this since the DJNA, and it is still advertised in Deskstar 7K500 and Ultrastar 15K147 to name just two examples. -- Matthias Andree ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-16 14:33 ` Jeff Garzik 2005-05-16 15:26 ` Richard B. Johnson @ 2005-05-16 18:11 ` Valdis.Kletnieks 1 sibling, 0 replies; 144+ messages in thread From: Valdis.Kletnieks @ 2005-05-16 18:11 UTC (permalink / raw) To: Jeff Garzik; +Cc: Matthias Andree, linux-kernel [-- Attachment #1: Type: text/plain, Size: 332 bytes --] On Mon, 16 May 2005 10:33:30 EDT, Jeff Garzik said: > Upon power loss, at least one ATA vendor's disks try to write out as > much data as possible. Does the firmware for this vendor's disks have enough smarts to reserve that last little bit of power to park the heads so it's not actively writing when it finally loses entirely? [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-16 11:18 ` Matthias Andree 2005-05-16 14:33 ` Jeff Garzik @ 2005-05-16 14:54 ` Alan Cox 2005-05-17 13:15 ` Bill Davidsen 2005-05-18 4:06 ` Eric D. Mudama 2 siblings, 1 reply; 144+ messages in thread From: Alan Cox @ 2005-05-16 14:54 UTC (permalink / raw) To: Matthias Andree; +Cc: Linux Kernel Mailing List > I have heard rumors about this, but all OEM manuals I looked at for > drives I bought or recommended simply stated that the block currently > being written at power loss can become damaged (with write cache off), > and that the drive can lose the full write cache at power loss (with > write cache on) so this looks like daydreaming manifested as rumor. IBM drives definitely used to trash the sector in this case. The newer ones either don't, or recover from it, presumably because people took that to be a drive failure and returned it. Sometimes the people win ;) > flashes its cache to NVRAM, or uses rotational energy to save its cache > on the platters, please name brand and model and where I can download > the material that documents this behavior. I am not aware of any IDE drive with these properties. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-16 14:54 ` Alan Cox @ 2005-05-17 13:15 ` Bill Davidsen 2005-05-17 21:41 ` Kyle Moffett 0 siblings, 1 reply; 144+ messages in thread From: Bill Davidsen @ 2005-05-17 13:15 UTC (permalink / raw) To: Alan Cox; +Cc: Matthias Andree, Linux Kernel Mailing List On Mon, 16 May 2005, Alan Cox wrote: > > flashes its cache to NVRAM, or uses rotational energy to save its cache > > on the platters, please name brand and model and where I can download > > the material that documents this behavior. > > I am not aware of any IDE drive with these properties. I'm not sure I know of a SCSI drive which does that, either. It was a big thing a few decades ago to use rotational energy to park the heads, but I haven't seen discussion of save-to-NVRAM. Then again, I haven't been looking for it. What would be ideal is some cache which didn't depend on power to maintain state, like core (remember core?) or the bubble memory which spent almost a decade being just slightly too {slow,costly} to replace disk. There doesn't seem to be a cost-effective technology yet. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-17 13:15 ` Bill Davidsen @ 2005-05-17 21:41 ` Kyle Moffett 0 siblings, 0 replies; 144+ messages in thread From: Kyle Moffett @ 2005-05-17 21:41 UTC (permalink / raw) To: Bill Davidsen; +Cc: Alan Cox, Matthias Andree, Linux Kernel Mailing List On May 17, 2005, at 09:15:52, Bill Davidsen wrote: > What would be ideal is some cache which didn't depend on power to > maintain > state, like core (remember core?) or the bubble memory which spent > almost > a decade being just slightly too {slow,costly} to replace disk. There > doesn't seem to be a cost effective technology yet. I've seen some articles recently on a micro-punchcard technology that uses grids of thousands of miniature needles and sheets of polymer plastic that can be melted at somewhat low temperatures to create or remove indentations in the plastic. The device can read and write each position at a very high rate, and since there are several thousand bits per position, with one bit for each needle, the bandwidth is enormous. (And it scales linearly with the size of the device, too!) Purportedly these grids can be easily built with slight modifications to modern semiconductor etching technologies, and the polymer plastic is reasonably simple to manufacture, so the resultant cost per device is hundreds of times cheaper than today's drives. Likewise, they have significantly higher memory density than current hardware due to fewer relativistic and quantum effects (no magnetism). Cheers, Kyle Moffett -----BEGIN GEEK CODE BLOCK----- Version: 3.12 GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$ L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+ PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$ r !y?(-) ------END GEEK CODE BLOCK------ ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-16 11:18 ` Matthias Andree 2005-05-16 14:33 ` Jeff Garzik 2005-05-16 14:54 ` Alan Cox @ 2005-05-18 4:06 ` Eric D. Mudama 2 siblings, 0 replies; 144+ messages in thread From: Eric D. Mudama @ 2005-05-18 4:06 UTC (permalink / raw) To: linux-kernel On 5/16/05, Matthias Andree <matthias.andree@gmx.de> wrote: > I've heard that drives would be taking rotational energy from their > rotating platters and such, but never heard how the hardware compensates > the dilation with decreasing rotational frequency, which also requires > changed filter settings for the write channel, block encoding, delays, > possibly stepping the heads and so on. I don't believe these stories > until I see evidence. I'm pretty sure that most drives out there will immediately attempt to safely retract or park the heads the instant that a power loss is detected. There's too much potential for damage if the heads can't retract safely to a landing zone or ramp; trying to save "one more block of cached data" just isn't worth the risk. --eric ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-15 15:21 ` Gene Heskett 2005-05-15 15:29 ` Jeff Garzik 2005-05-15 16:24 ` Mikulas Patocka @ 2005-05-15 21:38 ` Tomasz Torcz 2 siblings, 0 replies; 144+ messages in thread From: Tomasz Torcz @ 2005-05-15 21:38 UTC (permalink / raw) To: linux-kernel On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote: > >FreeBSD used these barriers (FLUSH CACHE command) a long time ago. > > > >There are rumors that some disks ignore the FLUSH CACHE command just to > > get higher benchmarks in Windows. But I haven't heard of any proof. > > Does anybody know which companies fake this command? > > > >From a story I read elsewhere just a few days ago, this problem is > virtually universal even in the umpty-bucks 15,000 rpm scsi server > drives. It appears that this is just another way to crank up the > numbers and make each drive seem faster than its competition. You are probably talking about this: http://www.livejournal.com/~brad/2116715.html It hit Slashdot yesterday. -- Tomasz Torcz "God, root, what's the difference?" zdzichu@irc.-nie.spam-.pl "God is more forgiving." ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Disk write cache (Was: Hyper-Threading Vulnerability) 2005-05-15 15:00 ` Disk write cache (Was: Hyper-Threading Vulnerability) Mikulas Patocka 2005-05-15 15:21 ` Gene Heskett @ 2005-05-16 14:50 ` Alan Cox 1 sibling, 0 replies; 144+ messages in thread From: Alan Cox @ 2005-05-16 14:50 UTC (permalink / raw) To: Mikulas Patocka; +Cc: Tomasz Torcz, Linux Kernel Mailing List On Sul, 2005-05-15 at 16:00, Mikulas Patocka wrote: > There are rumors that some disks ignore the FLUSH CACHE command just to get > higher benchmarks in Windows. But I haven't heard of any proof. Does > anybody know which companies fake this command? The specification was intentionally written so that this command has to do what it is specified to do, or be unknown (and thus error) and absent from the ident info. That was done by people who wanted to be very sure that any vendor who tried to shortcut the command would have "sue me" written on their forehead. There are problems with a few older drives which have a write cache but don't support cache commands. Alan ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-15 14:52 ` Tomasz Torcz 2005-05-15 15:00 ` Disk write cache (Was: Hyper-Threading Vulnerability) Mikulas Patocka @ 2005-05-15 15:00 ` Arjan van de Ven 1 sibling, 0 replies; 144+ messages in thread From: Arjan van de Ven @ 2005-05-15 15:00 UTC (permalink / raw) To: Tomasz Torcz; +Cc: linux-kernel > > They turned it off by default, which according to disk vendors > > lowers the MTBF of your disk to a fraction of the original value. > > > > I bet the total amount of valuable data lost for FreeBSD users because > > of broken disks is much much bigger than what they gained from not losing > > in the rather hard to hit power off cases. > > Aren't I/O barriers a way to safely use write cache? yes they are. However of course they also decrease the mtbf somewhat, although less so than entirely disabling the cache.... ^ permalink raw reply [flat|nested] 144+ messages in thread
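[Editor's note] For concreteness, a sketch of how the barriers Arjan refers to were requested on 2.6 kernels of this period. The device names below are invented, and the options shown (barrier=1 for ext3, barrier=flush for reiserfs) are offered as an illustration of the era's conventions; whether they took effect depended on the kernel build and the low-level driver.

```
# /etc/fstab sketch -- device names and mount points are examples only.
# With barriers enabled, the drive's write cache can stay on while
# journal commits remain ordered via FLUSH CACHE.
/dev/hda2   /data   ext3       defaults,barrier=1       0 2
/dev/hda3   /srv    reiserfs   defaults,barrier=flush   0 2
```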
* Re: Hyper-Threading Vulnerability @ 2005-05-13 22:51 linux 2005-05-14 8:03 ` Arjan van de Ven 0 siblings, 1 reply; 144+ messages in thread From: linux @ 2005-05-13 22:51 UTC (permalink / raw) To: linux-kernel The problem is with the *combination* of fine-grained multithreading, a shared cache, *and* high-resolution timing via RDTSC. A far easier fix would be to disable RDTSC. (A third possibility would be to disable the cache, but I assume that's too horrible to contemplate.) When Intel implemented RDTSC, they were quite aware that it made a good covert channel and provided a disable bit (TSD, bit 2 of CR4) to control user-space access. This attack is just showing that, with the tight coupling provided by hyperthreading, it's possible to receive "interesting" data from a process that is *not* deliberately transmitting. (Whereas the classic problem in enforcing the Bell-LaPadula model comes from preventing *deliberate* transmission.) If you don't want to disable it universally, how about providing, at the OS level, a way for a task to request that RDTSC be disabled while it is running? If another task tries to use it, it traps and one of the two (doesn't matter which!) gets rescheduled later when the other is not running. If RDTSC is too annoying to disable, just interpret the same flag as a "schedule me solo" flag that prevents scheduling anything else (at least, not sharing the same ->mm) on the other virtual processor. (Of course, the time should count double for scheduler fairness purposes.) ^ permalink raw reply [flat|nested] 144+ messages in thread
* Re: Hyper-Threading Vulnerability 2005-05-13 22:51 linux @ 2005-05-14 8:03 ` Arjan van de Ven 0 siblings, 0 replies; 144+ messages in thread From: Arjan van de Ven @ 2005-05-14 8:03 UTC (permalink / raw) To: linux; +Cc: linux-kernel On Fri, 2005-05-13 at 22:51 +0000, linux@horizon.com wrote: > If RDTSC is too annoying to disable, just interpret the same flag as a > "schedule me solo" flag that prevents scheduling anything else (at least, > not sharing the same ->mm) on the other virtual processor. (Of course, > the time should count double for scheduler fairness purposes.) rdtsc is so unreliable on current hardware that no userspace app should be using it anyway; it's not synchronized across CPUs on SMP, power management changes the rate of the ticks all the time, etc. Basically it's worthless on modern machines for anything but in-kernel busy loops. ^ permalink raw reply [flat|nested] 144+ messages in thread
end of thread, other threads:[~2005-06-04 15:31 UTC | newest] Thread overview: 144+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2005-05-13 5:51 Hyper-Threading Vulnerability Gabor MICSKO 2005-05-13 12:47 ` Barry K. Nathan 2005-05-13 14:10 ` Jeff Garzik 2005-05-13 14:23 ` Daniel Jacobowitz 2005-05-13 14:32 ` Jeff Garzik 2005-05-13 17:13 ` Andy Isaacson 2005-05-13 18:30 ` Vadim Lobanov 2005-05-13 19:02 ` Andy Isaacson 2005-05-15 9:31 ` Adrian Bunk 2005-05-13 17:14 ` Gabor MICSKO 2005-05-13 20:23 ` Barry K. Nathan 2005-05-13 18:03 ` Andi Kleen 2005-05-13 18:34 ` Eric Rannaud 2005-05-13 18:35 ` Alan Cox 2005-05-13 18:49 ` Scott Robert Ladd 2005-05-13 19:08 ` Andi Kleen 2005-05-13 19:36 ` Grant Coady 2005-05-16 17:00 ` Linus Torvalds 2005-05-16 12:37 ` Tommy Reynolds 2005-05-18 19:07 ` Bill Davidsen 2005-05-13 18:38 ` Richard F. Rebel 2005-05-13 19:05 ` Andi Kleen 2005-05-13 21:26 ` Andy Isaacson 2005-05-13 21:59 ` Matt Mackall 2005-05-13 22:47 ` Alan Cox 2005-05-13 23:00 ` Lee Revell 2005-05-13 23:27 ` Dave Jones 2005-05-13 23:38 ` Lee Revell 2005-05-13 23:44 ` Dave Jones 2005-05-14 7:37 ` Lee Revell 2005-05-14 15:33 ` Andrea Arcangeli 2005-05-15 1:07 ` Christer Weinigel 2005-05-15 9:48 ` Andi Kleen 2005-05-14 15:23 ` Alan Cox 2005-05-14 15:45 ` andrea 2005-05-15 13:38 ` Mikulas Patocka 2005-05-16 7:06 ` andrea 2005-05-14 16:30 ` Lee Revell 2005-05-14 16:44 ` Arjan van de Ven 2005-05-14 17:56 ` Lee Revell 2005-05-14 18:01 ` Arjan van de Ven 2005-05-14 19:21 ` Lee Revell 2005-05-14 19:48 ` Arjan van de Ven 2005-05-14 23:40 ` Lee Revell 2005-05-15 7:30 ` Arjan van de Ven 2005-05-15 20:41 ` Alan Cox 2005-05-15 20:48 ` Arjan van de Ven 2005-05-15 21:10 ` Lee Revell 2005-05-15 22:55 ` Dave Jones 2005-05-15 23:10 ` Lee Revell 2005-05-16 7:25 ` Arjan van de Ven 2005-05-15 9:37 ` Andi Kleen 2005-05-15 3:19 ` dean gaudet 2005-05-15 10:01 ` Andi Kleen 2005-05-15 10:23 ` 2.6.4 timer and helper functions kernel 2005-05-19 
0:38 ` George Anzinger 2005-05-15 9:33 ` Hyper-Threading Vulnerability Adrian Bunk 2005-05-14 17:04 ` Jindrich Makovicka 2005-05-14 18:27 ` Lee Revell 2005-05-15 9:58 ` Andi Kleen 2005-05-14 0:39 ` dean gaudet 2005-05-16 13:41 ` Andrea Arcangeli 2005-05-15 9:43 ` Andi Kleen 2005-05-15 18:42 ` David Schwartz 2005-05-15 18:56 ` Dr. David Alan Gilbert 2005-05-16 7:10 ` Eric W. Biederman 2005-05-16 11:04 ` Andi Kleen 2005-05-16 19:14 ` Eric W. Biederman 2005-05-16 20:05 ` Valdis.Kletnieks 2005-05-15 14:00 ` Mikulas Patocka 2005-05-15 14:26 ` Andi Kleen 2005-05-13 23:32 ` Paul Jakma 2005-05-14 16:29 ` Paul Jakma 2005-05-13 19:14 ` Jim Crilly 2005-05-13 20:18 ` Barry K. Nathan 2005-05-13 23:14 ` Jim Crilly 2005-05-13 19:16 ` Diego Calleja 2005-05-13 19:42 ` Frank Denis (Jedi/Sector One) 2005-05-15 9:54 ` Andi Kleen 2005-05-15 13:51 ` Mikulas Patocka 2005-05-15 14:12 ` Andi Kleen 2005-05-15 14:21 ` Mikulas Patocka 2005-05-15 14:52 ` Tomasz Torcz 2005-05-15 15:00 ` Disk write cache (Was: Hyper-Threading Vulnerability) Mikulas Patocka 2005-05-15 15:21 ` Gene Heskett 2005-05-15 15:29 ` Jeff Garzik 2005-05-15 16:27 ` Disk write cache Kenichi Okuyama 2005-05-15 16:43 ` Jeff Garzik 2005-05-15 16:50 ` Kyle Moffett 2005-05-15 16:56 ` Andi Kleen 2005-05-15 20:44 ` Andrew Morton 2005-05-15 23:31 ` Cache based insecurity/CPU cache/Disk Cache Tradeoffs Brian O'Mahoney 2005-05-15 16:58 ` Disk write cache Mikulas Patocka 2005-05-15 17:20 ` Kenichi Okuyama 2005-05-16 11:02 ` Linux does not care for data integrity (was: Disk write cache) Matthias Andree 2005-05-16 11:12 ` Arjan van de Ven 2005-05-16 11:29 ` Matthias Andree 2005-05-16 14:02 ` Arjan van de Ven 2005-05-16 14:48 ` Matthias Andree 2005-05-16 15:06 ` Alan Cox 2005-05-16 15:40 ` Matthias Andree 2005-05-16 18:04 ` Alan Cox 2005-05-16 19:11 ` Linux does not care for data integrity Florian Weimer 2005-05-29 21:02 ` Linux does not care for data integrity (was: Disk write cache) Greg Stark 2005-05-29 21:16 ` Matthias Andree 
2005-05-30 6:04 ` Greg Stark 2005-05-30 8:21 ` Matthias Andree 2005-06-01 19:02 ` Linux does not care for data integrity Bill Davidsen 2005-06-01 22:02 ` Matthias Andree 2005-06-02 0:12 ` Bill Davidsen 2005-06-02 0:36 ` Jeff Garzik 2005-06-02 1:37 ` Bill Davidsen 2005-06-02 1:54 ` Jeff Garzik 2005-06-02 8:53 ` Helge Hafting 2005-06-02 12:00 ` Bill Davidsen 2005-06-02 13:33 ` Lennart Sorensen 2005-06-04 13:37 ` Bill Davidsen 2005-06-04 15:31 ` Bernd Eckenfels 2005-05-16 14:57 ` Linux does not care for data integrity (was: Disk write cache) Alan Cox 2005-05-16 13:48 ` Linux does not care for data integrity Mark Lord 2005-05-16 14:59 ` Matthias Andree 2005-05-16 1:56 ` Disk write cache (Was: Hyper-Threading Vulnerability) Gene Heskett 2005-05-16 2:11 ` Jeff Garzik 2005-05-16 2:24 ` Mikulas Patocka 2005-05-16 3:05 ` Gene Heskett 2005-05-16 2:32 ` Mark Lord 2005-05-16 3:08 ` Gene Heskett 2005-05-16 13:44 ` Mark Lord 2005-05-18 4:03 ` Eric D. Mudama 2005-05-15 16:24 ` Mikulas Patocka 2005-05-16 11:18 ` Matthias Andree 2005-05-16 14:33 ` Jeff Garzik 2005-05-16 15:26 ` Richard B. Johnson 2005-05-16 16:00 ` [OT] drive behavior on power-off (was: Disk write cache) Matthias Andree 2005-05-16 18:11 ` Disk write cache (Was: Hyper-Threading Vulnerability) Valdis.Kletnieks 2005-05-16 14:54 ` Alan Cox 2005-05-17 13:15 ` Bill Davidsen 2005-05-17 21:41 ` Kyle Moffett 2005-05-18 4:06 ` Eric D. Mudama 2005-05-15 21:38 ` Tomasz Torcz 2005-05-16 14:50 ` Alan Cox 2005-05-15 15:00 ` Hyper-Threading Vulnerability Arjan van de Ven 2005-05-13 22:51 linux 2005-05-14 8:03 ` Arjan van de Ven