linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Hyper-Threading Vulnerability
@ 2005-05-13  5:51 Gabor MICSKO
  2005-05-13 12:47 ` Barry K. Nathan
  2005-05-13 18:03 ` Andi Kleen
  0 siblings, 2 replies; 150+ messages in thread
From: Gabor MICSKO @ 2005-05-13  5:51 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 684 bytes --]

Hi!

From http://kerneltrap.org/node/5103

``Hyper-Threading, as currently implemented on Intel Pentium Extreme
Edition, Pentium 4, Mobile Pentium 4, and Xeon processors, suffers from
a serious security flaw," Colin explains. "This flaw permits local
information disclosure, including allowing an unprivileged user to steal
an RSA private key being used on the same machine. Administrators of
multi-user systems are strongly advised to take action to disable
Hyper-Threading immediately."

``More'' info here:
http://www.daemonology.net/hyperthreading-considered-harmful/

Does this flaw affect the current stable Linux kernels? Workaround?
Patch?

Thanks.

-
MG

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13  5:51 Hyper-Threading Vulnerability Gabor MICSKO
@ 2005-05-13 12:47 ` Barry K. Nathan
  2005-05-13 14:10   ` Jeff Garzik
  2005-05-13 18:03 ` Andi Kleen
  1 sibling, 1 reply; 150+ messages in thread
From: Barry K. Nathan @ 2005-05-13 12:47 UTC (permalink / raw)
  To: Gabor MICSKO; +Cc: linux-kernel

On Fri, May 13, 2005 at 07:51:20AM +0200, Gabor MICSKO wrote:
> Does this flaw affect the current stable Linux kernels? Workaround?
> Patch?

Some pages with relevant information:
http://www.ussg.iu.edu/hypermail/linux/kernel/0403.2/0920.html
http://bugzilla.kernel.org/show_bug.cgi?id=2317

AFAICT, the workaround is something like this:
1. If possible, disable HyperThreading in BIOS.
2. If you have only one CPU, boot a UP kernel rather than SMP.
3. If you have 2 or more CPUs and you can't disable HT in the BIOS,
   boot with "maxcpus=n", where "n" is the number of physical CPUs
   in the computer (e.g. "maxcpus=2"). If you are running a kernel
   earlier than 2.6.5 or 2.4.26, this probably isn't going to work.
   If you try this, check dmesg afterward to make sure it worked
   properly (see the bugzilla.kernel.org URL for details).
4. If you would try #3 except you are running a 2.4.xx *vendor* kernel
   (not mainline), where xx < 26, try "noht".
5. If #3 and #4 don't work, try "acpi=off".

Option #3 ("maxcpus=2") is what I expect to be deploying in the next
several hours, FWIW...

-Barry K. Nathan <barryn@pobox.com>

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 12:47 ` Barry K. Nathan
@ 2005-05-13 14:10   ` Jeff Garzik
  2005-05-13 14:23     ` Daniel Jacobowitz
  2005-05-13 20:23     ` Barry K. Nathan
  0 siblings, 2 replies; 150+ messages in thread
From: Jeff Garzik @ 2005-05-13 14:10 UTC (permalink / raw)
  To: Barry K. Nathan; +Cc: Gabor MICSKO, linux-kernel

Barry K. Nathan wrote:
> On Fri, May 13, 2005 at 07:51:20AM +0200, Gabor MICSKO wrote:
> 
>>Does this flaw affect the current stable Linux kernels? Workaround?
>>Patch?

Simple.  Just boot a uniprocessor kernel, and/or disable HT in BIOS.


> Some pages with relevant information:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0403.2/0920.html
> http://bugzilla.kernel.org/show_bug.cgi?id=2317

These pages have zero information on the "flaw."  In fact, I can see no 
information at all proving that there is even a problem here.

Classic "I found a problem, but I'm keeping the info a secret" security 
crapola.

	Jeff



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 14:10   ` Jeff Garzik
@ 2005-05-13 14:23     ` Daniel Jacobowitz
  2005-05-13 14:32       ` Jeff Garzik
  2005-05-13 20:23     ` Barry K. Nathan
  1 sibling, 1 reply; 150+ messages in thread
From: Daniel Jacobowitz @ 2005-05-13 14:23 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Barry K. Nathan, Gabor MICSKO, linux-kernel

On Fri, May 13, 2005 at 10:10:36AM -0400, Jeff Garzik wrote:
> Barry K. Nathan wrote:
> >On Fri, May 13, 2005 at 07:51:20AM +0200, Gabor MICSKO wrote:
> >
> >>Does this flaw affect the current stable Linux kernels? Workaround?
> >>Patch?
> 
> Simple.  Just boot a uniprocessor kernel, and/or disable HT in BIOS.
> 
> 
> >Some pages with relevant information:
> >http://www.ussg.iu.edu/hypermail/linux/kernel/0403.2/0920.html
> >http://bugzilla.kernel.org/show_bug.cgi?id=2317
> 
> These pages have zero information on the "flaw."  In fact, I can see no 
> information at all proving that there is even a problem here.
> 
> Classic "I found a problem, but I'm keeping the info a secret" security 
> crapola.

FYI:
  http://www.daemonology.net/hyperthreading-considered-harmful/

I don't much agree with Colin about the severity of the problem, but
I've read his paper, which should be generally available later today.
It's definitely a legitimate issue.

-- 
Daniel Jacobowitz
CodeSourcery, LLC

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 14:23     ` Daniel Jacobowitz
@ 2005-05-13 14:32       ` Jeff Garzik
  2005-05-13 17:13         ` Andy Isaacson
  2005-05-13 17:14         ` Gabor MICSKO
  0 siblings, 2 replies; 150+ messages in thread
From: Jeff Garzik @ 2005-05-13 14:32 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: Barry K. Nathan, Gabor MICSKO, linux-kernel

Daniel Jacobowitz wrote:
> On Fri, May 13, 2005 at 10:10:36AM -0400, Jeff Garzik wrote:
> 
>>Barry K. Nathan wrote:
>>
>>>On Fri, May 13, 2005 at 07:51:20AM +0200, Gabor MICSKO wrote:
>>>
>>>
>>>>Does this flaw affect the current stable Linux kernels? Workaround?
>>>>Patch?
>>
>>Simple.  Just boot a uniprocessor kernel, and/or disable HT in BIOS.
>>
>>
>>
>>>Some pages with relevant information:
>>>http://www.ussg.iu.edu/hypermail/linux/kernel/0403.2/0920.html
>>>http://bugzilla.kernel.org/show_bug.cgi?id=2317
>>
>>These pages have zero information on the "flaw."  In fact, I can see no 
>>information at all proving that there is even a problem here.
>>
>>Classic "I found a problem, but I'm keeping the info a secret" security 
>>crapola.
> 
> 
> FYI:
>   http://www.daemonology.net/hyperthreading-considered-harmful/

Already read it.  This link provides no more information than either of 
the above links provide.


> I don't much agree with Colin about the severity of the problem, but
> I've read his paper, which should be generally available later today.
> It's definitely a legitimate issue.

We'll see...

As of this moment, there continues to be _zero_ information proving that 
a problem exists.

	Jeff



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 14:32       ` Jeff Garzik
@ 2005-05-13 17:13         ` Andy Isaacson
  2005-05-13 18:30           ` Vadim Lobanov
  2005-05-13 17:14         ` Gabor MICSKO
  1 sibling, 1 reply; 150+ messages in thread
From: Andy Isaacson @ 2005-05-13 17:13 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Daniel Jacobowitz, Barry K. Nathan, Gabor MICSKO, linux-kernel

On Fri, May 13, 2005 at 10:32:48AM -0400, Jeff Garzik wrote:
> Daniel Jacobowitz wrote:
> >  http://www.daemonology.net/hyperthreading-considered-harmful/
> 
> Already read it.  This link provides no more information than either of 
> the above links provide.

He's posted his paper now.

http://www.daemonology.net/papers/htt.pdf

It's a side channel timing attack on data-dependent computation through
the L1 and L2 caches.  Nice work.  In-the-wild exploitation is
difficult, though; your timing gets screwed up if you get scheduled away
from your victim, and you don't even know, because you can't tell where
you were scheduled, so on any reasonably busy multiuser system it's not
clear that the attack is practical.

-andy

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 14:32       ` Jeff Garzik
  2005-05-13 17:13         ` Andy Isaacson
@ 2005-05-13 17:14         ` Gabor MICSKO
  1 sibling, 0 replies; 150+ messages in thread
From: Gabor MICSKO @ 2005-05-13 17:14 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Daniel Jacobowitz, Barry K. Nathan, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 267 bytes --]


More info in this paper:

http://www.daemonology.net/papers/htt.pdf

 
> > FYI:
> >   http://www.daemonology.net/hyperthreading-considered-harmful/
> 
> Already read it.  This link provides no more information than either of 
> the above links provide.


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13  5:51 Hyper-Threading Vulnerability Gabor MICSKO
  2005-05-13 12:47 ` Barry K. Nathan
@ 2005-05-13 18:03 ` Andi Kleen
  2005-05-13 18:34   ` Eric Rannaud
                     ` (3 more replies)
  1 sibling, 4 replies; 150+ messages in thread
From: Andi Kleen @ 2005-05-13 18:03 UTC (permalink / raw)
  To: Gabor MICSKO; +Cc: linux-kernel

Gabor MICSKO <gmicsko@szintezis.hu> writes:

> Hi!
>
> From http://kerneltrap.org/node/5103
>
> ``Hyper-Threading, as currently implemented on Intel Pentium Extreme
> Edition, Pentium 4, Mobile Pentium 4, and Xeon processors, suffers from
> a serious security flaw," Colin explains. "This flaw permits local
> information disclosure, including allowing an unprivileged user to steal
> an RSA private key being used on the same machine. Administrators of
> multi-user systems are strongly advised to take action to disable
> Hyper-Threading immediately."
>
> ``More'' info here:
> http://www.daemonology.net/hyperthreading-considered-harmful/
>
> Does this flaw affect the current stable Linux kernels? Workaround?
> Patch?

This is not a kernel problem, but a user space problem. The fix 
is to change the user space crypto code to need the same number of cache line
accesses on all keys. 

Disabling HT for this would be the totally wrong approach, like throwing
out the baby with the bath water.

-Andi

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 17:13         ` Andy Isaacson
@ 2005-05-13 18:30           ` Vadim Lobanov
  2005-05-13 19:02             ` Andy Isaacson
  0 siblings, 1 reply; 150+ messages in thread
From: Vadim Lobanov @ 2005-05-13 18:30 UTC (permalink / raw)
  To: Andy Isaacson
  Cc: Jeff Garzik, Daniel Jacobowitz, Barry K. Nathan, Gabor MICSKO,
	linux-kernel

On Fri, 13 May 2005, Andy Isaacson wrote:

> On Fri, May 13, 2005 at 10:32:48AM -0400, Jeff Garzik wrote:
> > Daniel Jacobowitz wrote:
> > >  http://www.daemonology.net/hyperthreading-considered-harmful/
> >
> > Already read it.  This link provides no more information than either of
> > the above links provide.
>
> He's posted his paper now.
>
> http://www.daemonology.net/papers/htt.pdf
>
> It's a side channel timing attack on data-dependent computation through
> the L1 and L2 caches.  Nice work.  In-the-wild exploitation is
> difficult, though; your timing gets screwed up if you get scheduled away
> from your victim, and you don't even know, because you can't tell where
> you were scheduled, so on any reasonably busy multiuser system it's not
> clear that the attack is practical.
>
> -andy
> -

Wouldn't scheduling appear as a rather big time delta (in measuring the
cache access times), so you would know to disregard that data point?

(Just wondering... :-) )

-Vadim

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 18:03 ` Andi Kleen
@ 2005-05-13 18:34   ` Eric Rannaud
  2005-05-13 18:35   ` Alan Cox
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 150+ messages in thread
From: Eric Rannaud @ 2005-05-13 18:34 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Gabor MICSKO, linux-kernel

On Fri, 2005-05-13 at 20:03 +0200, Andi Kleen wrote:
> This is not a kernel problem, but a user space problem. The fix 
> is to change the user space crypto code to need the same number of cache line
> accesses on all keys. 

Well, this might not be trivial in general, and as pointed out by Colin
Percival, this would require a major rewrite of OpenSSL's RSA key
generation procedure. He also notes that other applications, a priori
less sensitive ones, might also be targeted. And obviously, it would be
impractical to ensure this property in all application code.


> Disabling HT for this would be the totally wrong approach, like throwing
> out the baby with the bath water.

Colin also mentions another work-around, at the level of the scheduler:

"[...] action must be taken to ensure that no pair of threads execute
simultaneously on the same processor core if they have different
privileges. Due to the complexities of performing such privilege checks
correctly and based on the principle that security fixes should be
chosen in such a way as to minimize the potential for new bugs to be
introduced, we recommend that existing operating systems provide the
necessary avoidance of inappropriate co-scheduling by never scheduling
any two threads on the same core, i.e., by only scheduling threads on
the first thread associated with each processor core. The more complex
solution of allowing certain "blessed" pairs of threads to be scheduled
on the same processor core is best delayed until future operating
systems where it can be extensively tested. In light of the potential
for information to be leaked across context switches, especially via the
L2 and larger cache(s), we also recommend that operating systems provide
some mechanism for processes to request special "secure" treatment,
which would include flushing all caches upon a context switch. It is not
immediately clear whether it is possible to use the occupancy of the
cache across context switches as a side channel, but if an unprivileged
user can cause his code to pre-empt a cryptographic operation
(e.g., by operating with a higher scheduling priority and being
repeatedly woken up by another process), then there is certainly a
strong possibility of a side
channel existing even in the absence of Hyper-Threading."

Is that relevant to the Linux kernel?

  /er.
-- 
"Sleep, she is for the weak"
http://www.eleves.ens.fr/home/rannaud/


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 18:03 ` Andi Kleen
  2005-05-13 18:34   ` Eric Rannaud
@ 2005-05-13 18:35   ` Alan Cox
  2005-05-13 18:49     ` Scott Robert Ladd
  2005-05-18 19:07     ` Bill Davidsen
  2005-05-13 18:38   ` Richard F. Rebel
  2005-05-13 19:16   ` Diego Calleja
  3 siblings, 2 replies; 150+ messages in thread
From: Alan Cox @ 2005-05-13 18:35 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Gabor MICSKO, Linux Kernel Mailing List

> This is not a kernel problem, but a user space problem. The fix 
> is to change the user space crypto code to need the same number of cache line
> accesses on all keys. 

You actually also need to hit the same cache line sequence on all keys
if you take a bit more care about it.

> Disabling HT for this would be the totally wrong approach, like throwing
> out the baby with the bath water.

HT for most users is pretty irrelevant; it's a neat idea, but the
benchmarks don't suggest it's too big a hit.


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 18:03 ` Andi Kleen
  2005-05-13 18:34   ` Eric Rannaud
  2005-05-13 18:35   ` Alan Cox
@ 2005-05-13 18:38   ` Richard F. Rebel
  2005-05-13 19:05     ` Andi Kleen
  2005-05-13 19:14     ` Jim Crilly
  2005-05-13 19:16   ` Diego Calleja
  3 siblings, 2 replies; 150+ messages in thread
From: Richard F. Rebel @ 2005-05-13 18:38 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Gabor MICSKO, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 502 bytes --]

On Fri, 2005-05-13 at 20:03 +0200, Andi Kleen wrote:
> This is not a kernel problem, but a user space problem. The fix 
> is to change the user space crypto code to need the same number of cache line
> accesses on all keys. 
> 
> Disabling HT for this would be the totally wrong approach, like throwing
> out the baby with the bath water.
> 
> -Andi

Why?  It's certainly reasonable to disable it for the time being and
even prudent to do so.

-- 
Richard F. Rebel

cat /dev/null > `tty`

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 18:35   ` Alan Cox
@ 2005-05-13 18:49     ` Scott Robert Ladd
  2005-05-13 19:08       ` Andi Kleen
                         ` (2 more replies)
  2005-05-18 19:07     ` Bill Davidsen
  1 sibling, 3 replies; 150+ messages in thread
From: Scott Robert Ladd @ 2005-05-13 18:49 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andi Kleen, Gabor MICSKO, Linux Kernel Mailing List

Alan Cox wrote:
> HT for most users is pretty irrelevant; it's a neat idea, but the
> benchmarks don't suggest it's too big a hit.

On real-world applications, I haven't seen HT boost performance by more
than 15% on a Pentium 4 -- and the usual gain is around 5%, if anything
at all. HT is a nice idea, but I don't enable it on my systems.

..Scott

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 18:30           ` Vadim Lobanov
@ 2005-05-13 19:02             ` Andy Isaacson
  2005-05-15  9:31               ` Adrian Bunk
  0 siblings, 1 reply; 150+ messages in thread
From: Andy Isaacson @ 2005-05-13 19:02 UTC (permalink / raw)
  To: Vadim Lobanov
  Cc: Jeff Garzik, Daniel Jacobowitz, Barry K. Nathan, Gabor MICSKO,
	linux-kernel

On Fri, May 13, 2005 at 11:30:27AM -0700, Vadim Lobanov wrote:
> On Fri, 13 May 2005, Andy Isaacson wrote:
> > It's a side channel timing attack on data-dependent computation through
> > the L1 and L2 caches.  Nice work.  In-the-wild exploitation is
> > difficult, though; your timing gets screwed up if you get scheduled away
> > from your victim, and you don't even know, because you can't tell where
> > you were scheduled, so on any reasonably busy multiuser system it's not
> > clear that the attack is practical.
> 
> Wouldn't scheduling appear as a rather big time delta (in measuring the
> cache access times), so you would know to disregard that data point?
> 
> (Just wondering... :-) )

Good question.  Yes, you can probably filter the data.  The question is,
how hard is it to set up the conditions to acquire the data?  You have
to be scheduled on the same core as the target process (sibling
threads).  And you don't know when the target is going to be scheduled,
and on a real-world system, there are other threads competing for
scheduling; if it's SMP (2 core, 4 thread) with perfect 100% utilization
then you've only got a 33% chance of being scheduled on the right
thread, and it gets worse if the machine is idle since the kernel should
schedule you and the OpenSSL process on different cores...

Getting the conditions right is challenging.  Not impossible, but
neither is it a foregone conclusion.

-andy

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 18:38   ` Richard F. Rebel
@ 2005-05-13 19:05     ` Andi Kleen
  2005-05-13 21:26       ` Andy Isaacson
  2005-05-13 23:32       ` Paul Jakma
  2005-05-13 19:14     ` Jim Crilly
  1 sibling, 2 replies; 150+ messages in thread
From: Andi Kleen @ 2005-05-13 19:05 UTC (permalink / raw)
  To: Richard F. Rebel; +Cc: Gabor MICSKO, linux-kernel

On Fri, May 13, 2005 at 02:38:03PM -0400, Richard F. Rebel wrote:
> On Fri, 2005-05-13 at 20:03 +0200, Andi Kleen wrote:
> > This is not a kernel problem, but a user space problem. The fix 
> > is to change the user space crypto code to need the same number of cache line
> > accesses on all keys. 
> > 
> > Disabling HT for this would be the totally wrong approach, like throwing
> > out the baby with the bath water.
> > 
> > -Andi
> 
> Why?  It's certainly reasonable to disable it for the time being and
> even prudent to do so.

No, I strongly disagree on that. The reasonable thing to do is
to fix the crypto code which has this vulnerability, not break
a useful performance enhancement for everybody else.

-Andi

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 18:49     ` Scott Robert Ladd
@ 2005-05-13 19:08       ` Andi Kleen
  2005-05-13 19:36       ` Grant Coady
  2005-05-16 17:00       ` Linus Torvalds
  2 siblings, 0 replies; 150+ messages in thread
From: Andi Kleen @ 2005-05-13 19:08 UTC (permalink / raw)
  To: Scott Robert Ladd; +Cc: Alan Cox, Gabor MICSKO, Linux Kernel Mailing List

On Fri, May 13, 2005 at 02:49:25PM -0400, Scott Robert Ladd wrote:
> Alan Cox wrote:
> > HT for most users is pretty irrelevant; it's a neat idea, but the
> > benchmarks don't suggest it's too big a hit.
> 
> On real-world applications, I haven't seen HT boost performance by more
> than 15% on a Pentium 4 -- and the usual gain is around 5%, if anything
> at all. HT is a nice idea, but I don't enable it on my systems.

I saw better improvement in some cases.  It always depends on the
workload, on the generation of HT (there are three around), and on lots
of other factors.

Even considering only your workload, it does not seem very rational
to knowingly throw away a 15% speedup.

-Andi

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 18:38   ` Richard F. Rebel
  2005-05-13 19:05     ` Andi Kleen
@ 2005-05-13 19:14     ` Jim Crilly
  2005-05-13 20:18       ` Barry K. Nathan
  1 sibling, 1 reply; 150+ messages in thread
From: Jim Crilly @ 2005-05-13 19:14 UTC (permalink / raw)
  To: Richard F. Rebel; +Cc: Andi Kleen, Gabor MICSKO, linux-kernel

On 05/13/05 02:38:03PM -0400, Richard F. Rebel wrote:
> On Fri, 2005-05-13 at 20:03 +0200, Andi Kleen wrote:
> > This is not a kernel problem, but a user space problem. The fix 
> > is to change the user space crypto code to need the same number of cache line
> > accesses on all keys. 
> > 
> > Disabling HT for this would be the totally wrong approach, like throwing
> > out the baby with the bath water.
> > 
> > -Andi
> 
> Why?  It's certainly reasonable to disable it for the time being and
> even prudent to do so.

And what if you have more than one physical HT processor? AFAIK there's no
way to disable HT and still run SMP at the same time.

> 
> -- 
> Richard F. Rebel
> 
> cat /dev/null > `tty`

Jim.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 18:03 ` Andi Kleen
                     ` (2 preceding siblings ...)
  2005-05-13 18:38   ` Richard F. Rebel
@ 2005-05-13 19:16   ` Diego Calleja
  2005-05-13 19:42     ` Frank Denis (Jedi/Sector One)
  2005-05-15  9:54     ` Andi Kleen
  3 siblings, 2 replies; 150+ messages in thread
From: Diego Calleja @ 2005-05-13 19:16 UTC (permalink / raw)
  To: Andi Kleen; +Cc: gmicsko, linux-kernel

On Fri, 13 May 2005 20:03:58 +0200,
Andi Kleen <ak@muc.de> wrote:


> This is not a kernel problem, but a user space problem. The fix 
> is to change the user space crypto code to need the same number of cache line
> accesses on all keys. 


However, they've patched the FreeBSD kernel to "work around?" it:
ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 18:49     ` Scott Robert Ladd
  2005-05-13 19:08       ` Andi Kleen
@ 2005-05-13 19:36       ` Grant Coady
  2005-05-16 17:00       ` Linus Torvalds
  2 siblings, 0 replies; 150+ messages in thread
From: Grant Coady @ 2005-05-13 19:36 UTC (permalink / raw)
  To: Scott Robert Ladd
  Cc: Alan Cox, Andi Kleen, Gabor MICSKO, Linux Kernel Mailing List

On Fri, 13 May 2005 14:49:25 -0400, Scott Robert Ladd <lkml@coyotegulch.com> wrote:

>Alan Cox wrote:
>> HT for most users is pretty irrelevant; it's a neat idea, but the
>> benchmarks don't suggest it's too big a hit.
>
>On real-world applications, I haven't seen HT boost performance by more
>than 15% on a Pentium 4 -- and the usual gain is around 5%, if anything
>at all. HT is a nice idea, but I don't enable it on my systems.

P4-HT is great for WinXP: a runaway process only gets half the CPU
resources, which keeps the system responsive.  I like HT for that
reason; perhaps that's what it was designed for?  A hardware fix for
the msft 'OS' :o)

Recently on single AMD CPU box, 2.6.latest-mm, diff got stuck, no 
disk activity, 100% CPU, started another terminal, recompiled kernel 
with 8K stacks and rebooted, the whole time the unkillable 'diff' 
was using just over 1/2 of resources.  top showed all 1GB RAM in use, 
no swap activity, nothing odd in /proc/whatever -- only happened once.

I suspected 4k stacks as only change before 'crash' was turning on 
samba server day before, but I didn't trace 'problem' as it wasn't 
really a crash.  Impressive -- seeing 2.6 handling a stupid process, 
business as usual for everything else.  Haven't had a problem since 
changing to 8K stacks.  nfs, samba and ssh terminals on reiserfs 3.6
on via sata.  May have had nvidia driver installed at the time, I 
now load that only when X running (rare), mostly headless use.

--Grant.


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 19:16   ` Diego Calleja
@ 2005-05-13 19:42     ` Frank Denis (Jedi/Sector One)
  2005-05-15  9:54     ` Andi Kleen
  1 sibling, 0 replies; 150+ messages in thread
From: Frank Denis (Jedi/Sector One) @ 2005-05-13 19:42 UTC (permalink / raw)
  To: Diego Calleja; +Cc: Andi Kleen, gmicsko, linux-kernel

On Fri, May 13, 2005 at 09:16:09PM +0200, Diego Calleja wrote:
> However, they've patched the FreeBSD kernel to "work around?" it:
> ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch

  This patch just disables hyperthreading by default.
  

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 19:14     ` Jim Crilly
@ 2005-05-13 20:18       ` Barry K. Nathan
  2005-05-13 23:14         ` Jim Crilly
  0 siblings, 1 reply; 150+ messages in thread
From: Barry K. Nathan @ 2005-05-13 20:18 UTC (permalink / raw)
  To: Richard F. Rebel, Andi Kleen, Gabor MICSKO, linux-kernel

On Fri, May 13, 2005 at 03:14:43PM -0400, Jim Crilly wrote:
> And what if you have more than one physical HT processor? AFAIK there's no
> way to disable HT and still run SMP at the same time.

Actually, there is; read my post earlier in this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=111598859708620&w=2

To elaborate on the "check dmesg" part of that e-mail:

After you reboot with "maxcpus=2" (or however many physical CPUs you
have), you need to make sure you have messages like this, which indicate
that it really worked:

WARNING: No sibling found for CPU 0.
WARNING: No sibling found for CPU 1.

(and so on, if you have more than 2 CPUs)

-Barry K. Nathan <barryn@pobox.com>


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 14:10   ` Jeff Garzik
  2005-05-13 14:23     ` Daniel Jacobowitz
@ 2005-05-13 20:23     ` Barry K. Nathan
  1 sibling, 0 replies; 150+ messages in thread
From: Barry K. Nathan @ 2005-05-13 20:23 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Barry K. Nathan, Gabor MICSKO, linux-kernel

On Fri, May 13, 2005 at 10:10:36AM -0400, Jeff Garzik wrote:
> Barry K. Nathan wrote:
> >On Fri, May 13, 2005 at 07:51:20AM +0200, Gabor MICSKO wrote:
> >
> >>Does this flaw affect the current stable Linux kernels? Workaround?
> >>Patch?
> 
> Simple.  Just boot a uniprocessor kernel, and/or disable HT in BIOS.
> 
> 
> >Some pages with relevant information:
> >http://www.ussg.iu.edu/hypermail/linux/kernel/0403.2/0920.html
> >http://bugzilla.kernel.org/show_bug.cgi?id=2317
> 
> These pages have zero information on the "flaw."  In fact, I can see no 
> information at all proving that there is even a problem here.

I meant that those two URLs have relevant information regarding
disabling HT for those of us who can't simply boot a UP kernel or
disable HT in the BIOS, not that they had information on the flaw.

-Barry K. Nathan <barryn@pobox.com>

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 19:05     ` Andi Kleen
@ 2005-05-13 21:26       ` Andy Isaacson
  2005-05-13 21:59         ` Matt Mackall
                           ` (4 more replies)
  2005-05-13 23:32       ` Paul Jakma
  1 sibling, 5 replies; 150+ messages in thread
From: Andy Isaacson @ 2005-05-13 21:26 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso

On Fri, May 13, 2005 at 09:05:49PM +0200, Andi Kleen wrote:
> On Fri, May 13, 2005 at 02:38:03PM -0400, Richard F. Rebel wrote:
> > Why?  It's certainly reasonable to disable it for the time being and
> > even prudent to do so.
> 
> No, I strongly disagree on that. The reasonable thing to do is
> to fix the crypto code which has this vulnerability, not break
> a useful performance enhancement for everybody else.

Pardon me for saying so, but that's bullshit.  You're asking the crypto
guys to give up a 5x performance gain (that's my wild guess) by giving
up all their data-dependent algorithms and contorting their code wildly,
to avoid a microarchitectural problem with Intel's HT implementation.

There are three places to cut off the side channel, none of which is
obviously the right one.
1. The HT implementation could do the cache tricks Colin suggested in
   his paper.  Fairly large performance hit to address a fairly small
   problem.
2. The OS could do the scheduler tricks to avoid scheduling unfriendly
   threads on the same core.  You're leaving a lot of the benefit of HT
   on the floor by doing so.
3. Every security-sensitive app can be rigorously audited and re-written
   to avoid *ever* referencing memory with the address determined by
   private data.

(3) is a complete non-starter.  It's just not feasible to rewrite all
that code.  Furthermore, there's no way to know what code needs to be
rewritten!  (Until someone publishes an advisory, that is...)

Hmm, I can't think of any reason that this technique wouldn't work to
extract information from kernel secrets, as well... 

If SHA has plaintext-dependent memory references, Colin's technique
would enable an adversary to extract the contents of the /dev/random
pools.  I don't *think* SHA does, based on a quick reading of
lib/sha1.c, but someone with an actual clue should probably take a look.

Andi, are you prepared to *require* that no code ever make a memory
reference as a function of a secret?  Because that's what you're
suggesting the crypto people should do.

-andy

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 21:26       ` Andy Isaacson
@ 2005-05-13 21:59         ` Matt Mackall
  2005-05-13 22:47           ` Alan Cox
  2005-05-14  0:39         ` dean gaudet
                           ` (3 subsequent siblings)
  4 siblings, 1 reply; 150+ messages in thread
From: Matt Mackall @ 2005-05-13 21:59 UTC (permalink / raw)
  To: Andy Isaacson
  Cc: Andi Kleen, Richard F. Rebel, Gabor MICSKO, linux-kernel, tytso

On Fri, May 13, 2005 at 02:26:20PM -0700, Andy Isaacson wrote:
> On Fri, May 13, 2005 at 09:05:49PM +0200, Andi Kleen wrote:
> > On Fri, May 13, 2005 at 02:38:03PM -0400, Richard F. Rebel wrote:
> > > Why?  It's certainly reasonable to disable it for the time being and
> > > even prudent to do so.
> > 
> > No, i strongly disagree on that. The reasonable thing to do is
> > to fix the crypto code which has this vulnerability, not break
> > a useful performance enhancement for everybody else.
> 
> Pardon me for saying so, but that's bullshit.  You're asking the crypto
> guys to give up a 5x performance gain (that's my wild guess) by giving
> up all their data-dependent algorithms and contorting their code wildly,
> to avoid a microarchitectural problem with Intel's HT implementation.
> 
> There are three places to cut off the side channel, none of which is
> obviously the right one.
> 1. The HT implementation could do the cache tricks Colin suggested in
>    his paper.  Fairly large performance hit to address a fairly small
>    problem.
> 2. The OS could do the scheduler tricks to avoid scheduling unfriendly
>    threads on the same core.  You're leaving a lot of the benefit of HT
>    on the floor by doing so.
> 3. Every security-sensitive app can be rigorously audited and re-written
>    to avoid *ever* referencing memory with the address determined by
>    private data.
> 
> (3) is a complete non-starter.  It's just not feasible to rewrite all
> that code.  Furthermore, there's no way to know what code needs to be
> rewritten!  (Until someone publishes an advisory, that is...)
> 
> Hmm, I can't think of any reason that this technique wouldn't work to
> extract information from kernel secrets, as well... 
> 
> If SHA has plaintext-dependent memory references, Colin's technique
> would enable an adversary to extract the contents of the /dev/random
> pools.  I don't *think* SHA does, based on a quick reading of
> lib/sha1.c, but someone with an actual clue should probably take a look.

SHA1 should be fine, as are the pool mixing bits. Much more
problematic is the ability to do timing attacks against the entropy
gathering itself. If an attacker can guess the TSC value that gets
mixed into the pool, that's a problem.

It might not be much of a problem though. If he's a bit off per guess
(really impressive), he'll still be many bits off by the time there's
enough entropy in the primary pool to reseed the secondary pool so he
can check his guesswork.

-- 
Mathematics is the supreme nostalgia of our time.


* Re: Hyper-Threading Vulnerability
  2005-05-13 21:59         ` Matt Mackall
@ 2005-05-13 22:47           ` Alan Cox
  2005-05-13 23:00             ` Lee Revell
  0 siblings, 1 reply; 150+ messages in thread
From: Alan Cox @ 2005-05-13 22:47 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO,
	Linux Kernel Mailing List, tytso

On Gwe, 2005-05-13 at 22:59, Matt Mackall wrote:
> It might not be much of a problem though. If he's a bit off per guess
> (really impressive), he'll still be many bits off by the time there's
> enough entropy in the primary pool to reseed the secondary pool so he
> can check his guesswork.

You can also disable the tsc to user space in the intel processors.
That's something they anticipated as being necessary in secure
environments long ago. This makes the attack much harder.




* Re: Hyper-Threading Vulnerability
  2005-05-13 22:47           ` Alan Cox
@ 2005-05-13 23:00             ` Lee Revell
  2005-05-13 23:27               ` Dave Jones
  0 siblings, 1 reply; 150+ messages in thread
From: Lee Revell @ 2005-05-13 23:00 UTC (permalink / raw)
  To: Alan Cox
  Cc: Matt Mackall, Andy Isaacson, Andi Kleen, Richard F. Rebel,
	Gabor MICSKO, Linux Kernel Mailing List, tytso

On Fri, 2005-05-13 at 23:47 +0100, Alan Cox wrote:
> On Gwe, 2005-05-13 at 22:59, Matt Mackall wrote:
> > It might not be much of a problem though. If he's a bit off per guess
> > (really impressive), he'll still be many bits off by the time there's
> > enough entropy in the primary pool to reseed the secondary pool so he
> > can check his guesswork.
> 
> You can also disable the tsc to user space in the intel processors.
> That's something they anticipated as being necessary in secure
> environments long ago. This makes the attack much harder.

And break the hundreds of apps that depend on rdtsc?  Am I missing
something?

Lee



* Re: Hyper-Threading Vulnerability
  2005-05-13 20:18       ` Barry K. Nathan
@ 2005-05-13 23:14         ` Jim Crilly
  0 siblings, 0 replies; 150+ messages in thread
From: Jim Crilly @ 2005-05-13 23:14 UTC (permalink / raw)
  To: Barry K. Nathan; +Cc: Richard F. Rebel, Andi Kleen, Gabor MICSKO, linux-kernel

On 05/13/05 01:18:40PM -0700, Barry K. Nathan wrote:
> On Fri, May 13, 2005 at 03:14:43PM -0400, Jim Crilly wrote:
> > And what if you have more than one physical HT processor? AFAIK there's no
> > way to disable HT and still run SMP at the same time.
> 
> Actually, there is; read my post earlier in this thread:
> http://marc.theaimsgroup.com/?l=linux-kernel&m=111598859708620&w=2
> 
> To elaborate on the "check dmesg" part of that e-mail:
> 
> After you reboot with "maxcpus=2" (or however many physical CPU's you
> have), you need to make sure you have messages like this, which indicate
> that it really worked:
> 
> WARNING: No sibling found for CPU 0.
> WARNING: No sibling found for CPU 1.
> 
> (and so on, if you have more than 2 CPU's)

But what about machines that don't enumerate physical processors before
logical ones? The comment in setup.c implies that the order in which the
BIOS presents CPUs is undefined, and if you're unlucky enough to have a
machine that presents the CPUs as physical, logical, physical, logical,
etc., you're screwed.

Jim.


* Re: Hyper-Threading Vulnerability
  2005-05-13 23:00             ` Lee Revell
@ 2005-05-13 23:27               ` Dave Jones
  2005-05-13 23:38                 ` Lee Revell
  0 siblings, 1 reply; 150+ messages in thread
From: Dave Jones @ 2005-05-13 23:27 UTC (permalink / raw)
  To: Lee Revell
  Cc: Alan Cox, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Fri, May 13, 2005 at 07:00:12PM -0400, Lee Revell wrote:
 > On Fri, 2005-05-13 at 23:47 +0100, Alan Cox wrote:
 > > On Gwe, 2005-05-13 at 22:59, Matt Mackall wrote:
 > > > It might not be much of a problem though. If he's a bit off per guess
 > > > (really impressive), he'll still be many bits off by the time there's
 > > > enough entropy in the primary pool to reseed the secondary pool so he
 > > > can check his guesswork.
 > > 
 > > You can also disable the tsc to user space in the intel processors.
 > > That's something they anticipated as being necessary in secure
 > > environments long ago. This makes the attack much harder.
 > 
 > And break the hundreds of apps that depend on rdtsc?  Am I missing
 > something?

If those apps depend on rdtsc being a) present, and b) working
without providing fallbacks, they're already broken.

There's a reason it's displayed in /proc/cpuinfo's flags field,
and visible through cpuid. Apps should be testing for presence
before assuming features are present.

		Dave



* Re: Hyper-Threading Vulnerability
  2005-05-13 19:05     ` Andi Kleen
  2005-05-13 21:26       ` Andy Isaacson
@ 2005-05-13 23:32       ` Paul Jakma
  2005-05-14 16:29         ` Paul Jakma
  1 sibling, 1 reply; 150+ messages in thread
From: Paul Jakma @ 2005-05-13 23:32 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Richard F. Rebel, Gabor MICSKO, linux-kernel

On Fri, 13 May 2005, Andi Kleen wrote:

> No, i strongly disagree on that. The reasonable thing to do is to 
> fix the crypto code which has this vulnerability, not break a 
> useful performance enhancement for everybody else.

Already done:

http://www.openssl.org/news/secadv_20030317.txt

This is old news, it seems: a timing attack that has long been known
about and fixed.

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
What happens when you cut back the jungle?  It recedes.


* Re: Hyper-Threading Vulnerability
  2005-05-13 23:27               ` Dave Jones
@ 2005-05-13 23:38                 ` Lee Revell
  2005-05-13 23:44                   ` Dave Jones
  2005-05-14 15:23                   ` Alan Cox
  0 siblings, 2 replies; 150+ messages in thread
From: Lee Revell @ 2005-05-13 23:38 UTC (permalink / raw)
  To: Dave Jones
  Cc: Alan Cox, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Fri, 2005-05-13 at 19:27 -0400, Dave Jones wrote:
> On Fri, May 13, 2005 at 07:00:12PM -0400, Lee Revell wrote:
>  > On Fri, 2005-05-13 at 23:47 +0100, Alan Cox wrote:
>  > > On Gwe, 2005-05-13 at 22:59, Matt Mackall wrote:
>  > > > It might not be much of a problem though. If he's a bit off per guess
>  > > > (really impressive), he'll still be many bits off by the time there's
>  > > > enough entropy in the primary pool to reseed the secondary pool so he
>  > > > can check his guesswork.
>  > > 
>  > > You can also disable the tsc to user space in the intel processors.
>  > > That's something they anticipated as being necessary in secure
>  > > environments long ago. This makes the attack much harder.
>  > 
>  > And break the hundreds of apps that depend on rdtsc?  Am I missing
>  > something?
> 
> If those apps depend on rdtsc being a) present, and b) working
> without providing fallbacks, they're already broken.
> 
> There's a reason it's displayed in /proc/cpuinfo's flags field,
> and visible through cpuid. Apps should be testing for presence
> before assuming features are present.
> 

Well yes but you would still have to recompile those apps.  And take the
big performance hit from using gettimeofday vs rdtsc.  Disabling HT by
default looks pretty good by comparison.

Lee



* Re: Hyper-Threading Vulnerability
  2005-05-13 23:38                 ` Lee Revell
@ 2005-05-13 23:44                   ` Dave Jones
  2005-05-14  7:37                     ` Lee Revell
  2005-05-14 15:23                   ` Alan Cox
  1 sibling, 1 reply; 150+ messages in thread
From: Dave Jones @ 2005-05-13 23:44 UTC (permalink / raw)
  To: Lee Revell
  Cc: Alan Cox, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Fri, May 13, 2005 at 07:38:08PM -0400, Lee Revell wrote:
 > On Fri, 2005-05-13 at 19:27 -0400, Dave Jones wrote:
 > > On Fri, May 13, 2005 at 07:00:12PM -0400, Lee Revell wrote:
 > >  > On Fri, 2005-05-13 at 23:47 +0100, Alan Cox wrote:
 > >  > > On Gwe, 2005-05-13 at 22:59, Matt Mackall wrote:
 > >  > > > It might not be much of a problem though. If he's a bit off per guess
 > >  > > > (really impressive), he'll still be many bits off by the time there's
 > >  > > > enough entropy in the primary pool to reseed the secondary pool so he
 > >  > > > can check his guesswork.
 > >  > > 
 > >  > > You can also disable the tsc to user space in the intel processors.
 > >  > > That's something they anticipated as being necessary in secure
 > >  > > environments long ago. This makes the attack much harder.
 > >  > 
 > >  > And break the hundreds of apps that depend on rdtsc?  Am I missing
 > >  > something?
 > > 
 > > If those apps depend on rdtsc being a) present, and b) working
 > > without providing fallbacks, they're already broken.
 > > 
 > > There's a reason it's displayed in /proc/cpuinfo's flags field,
 > > and visible through cpuid. Apps should be testing for presence
 > > before assuming features are present.
 > > 
 > 
 > Well yes but you would still have to recompile those apps.

Not if the app is written correctly. See above.

		Dave



* Re: Hyper-Threading Vulnerability
  2005-05-13 21:26       ` Andy Isaacson
  2005-05-13 21:59         ` Matt Mackall
@ 2005-05-14  0:39         ` dean gaudet
  2005-05-16 13:41           ` Andrea Arcangeli
  2005-05-15  9:43         ` Andi Kleen
                           ` (2 subsequent siblings)
  4 siblings, 1 reply; 150+ messages in thread
From: dean gaudet @ 2005-05-14  0:39 UTC (permalink / raw)
  To: Andy Isaacson
  Cc: Andi Kleen, Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso

On Fri, 13 May 2005, Andy Isaacson wrote:

> On Fri, May 13, 2005 at 09:05:49PM +0200, Andi Kleen wrote:
> > On Fri, May 13, 2005 at 02:38:03PM -0400, Richard F. Rebel wrote:
> > > Why?  It's certainly reasonable to disable it for the time being and
> > > even prudent to do so.
> > 
> > No, i strongly disagree on that. The reasonable thing to do is
> > to fix the crypto code which has this vulnerability, not break
> > a useful performance enhancement for everybody else.
> 
> Pardon me for saying so, but that's bullshit.  You're asking the crypto
> guys to give up a 5x performance gain (that's my wild guess) by giving
> up all their data-dependent algorithms and contorting their code wildly,
> to avoid a microarchitectural problem with Intel's HT implementation.

i think your wild guess is way off.  i can think of several approaches to 
fix these problems which won't be anywhere near 5x.

the problem is that an attacker can observe which cache indices (rows) are 
in use.  one workaround is to overload the possible secrets which each 
index represents.

you can overload the secrets in each cache line:  for example when doing 
exponentiation there is an array of bignums x**(2*n).  bignums themselves 
are arrays (which span multiple cache lines).  do a "row/column transpose" 
on this array of arrays -- suddenly each cache line contains a number of 
possible secrets.  if you're operating with 32-bit words in a 64 byte line 
then you've achieved a 16-fold reduction in exposed information by this 
transpose.  there'll be almost no performance penalty.

you can overload the secrets in each cache index:  abuse the associativity 
of the cache.  the affected processors are all 8-way associative.  
ideally you'd want to arrange your data so that it all collides within the 
same cache index -- and get an 8-fold reduction in exposure.  the trick 
here is the L2 is physically indexed, and userland code can perform only 
virtual allocations.  but it's not too hard to discover physical conflicts 
if you really want to (using rdtsc) -- it would be done early in the 
initialization of the program because it involves asking for enough memory 
until the kernel gives you enough colliding pages.  (a system call could 
help with this if we really wanted it.)

my not-so-wild guess is a 128-fold reduction for less than 10% perf hit...

i think there's possibly another approach involving a permuted array of 
indirection pointers... which is going to affect perf a bit due to the 
extra indirection required, but we're talking <10% here.  (i'm just not 
convinced yet you can select a permutation in a manner which doesn't leak 
information when the attacker can view multiple invocations of the crypto 
for example.)


> If SHA has plaintext-dependent memory references, Colin's technique
> would enable an adversary to extract the contents of the /dev/random
> pools.  I don't *think* SHA does, based on a quick reading of
> lib/sha1.c, but someone with an actual clue should probably take a look.

the SHA family do not have any data-dependencies in their memory access 
patterns.

-dean


* Re: Hyper-Threading Vulnerability
  2005-05-13 23:44                   ` Dave Jones
@ 2005-05-14  7:37                     ` Lee Revell
  2005-05-14 15:33                       ` Andrea Arcangeli
  0 siblings, 1 reply; 150+ messages in thread
From: Lee Revell @ 2005-05-14  7:37 UTC (permalink / raw)
  To: Dave Jones
  Cc: Alan Cox, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Fri, 2005-05-13 at 19:44 -0400, Dave Jones wrote:
> On Fri, May 13, 2005 at 07:38:08PM -0400, Lee Revell wrote:
>  > On Fri, 2005-05-13 at 19:27 -0400, Dave Jones wrote:
>  > > On Fri, May 13, 2005 at 07:00:12PM -0400, Lee Revell wrote:
>  > >  > On Fri, 2005-05-13 at 23:47 +0100, Alan Cox wrote:
>  > >  > > On Gwe, 2005-05-13 at 22:59, Matt Mackall wrote:
>  > >  > > > It might not be much of a problem though. If he's a bit off per guess
>  > >  > > > (really impressive), he'll still be many bits off by the time there's
>  > >  > > > enough entropy in the primary pool to reseed the secondary pool so he
>  > >  > > > can check his guesswork.
>  > >  > > 
>  > >  > > You can also disable the tsc to user space in the intel processors.
>  > >  > > That's something they anticipated as being necessary in secure
>  > >  > > environments long ago. This makes the attack much harder.
>  > >  > 
>  > >  > And break the hundreds of apps that depend on rdtsc?  Am I missing
>  > >  > something?
>  > > 
>  > > If those apps depend on rdtsc being a) present, and b) working
>  > > without providing fallbacks, they're already broken.
>  > > 
 >  > > There's a reason it's displayed in /proc/cpuinfo's flags field,
>  > > and visible through cpuid. Apps should be testing for presence
>  > > before assuming features are present.
>  > > 
>  > 
>  > Well yes but you would still have to recompile those apps.
> 
> Not if the app is written correctly. See above.

The apps that bother to use rdtsc vs. gettimeofday need a cheap high res
timer more than a correct one anyway - it's not guaranteed that rdtsc
provides a reliable time source at all, due to SMP and frequency scaling
issues.

I'll try to benchmark the difference.  Maybe it's not that big a deal.

Lee



* Re: Hyper-Threading Vulnerability
  2005-05-13 23:38                 ` Lee Revell
  2005-05-13 23:44                   ` Dave Jones
@ 2005-05-14 15:23                   ` Alan Cox
  2005-05-14 15:45                     ` andrea
  2005-05-14 16:30                     ` Lee Revell
  1 sibling, 2 replies; 150+ messages in thread
From: Alan Cox @ 2005-05-14 15:23 UTC (permalink / raw)
  To: Lee Revell
  Cc: Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sad, 2005-05-14 at 00:38, Lee Revell wrote:
> Well yes but you would still have to recompile those apps.  And take the
> big performance hit from using gettimeofday vs rdtsc.  Disabling HT by
> default looks pretty good by comparison.

You cannot use rdtsc for anything but rough instruction timing. The
timers for different processors run at different speeds on some SMP
systems, the timer rates vary as processors change clock rate nowadays.
Rdtsc may also jump dramatically on a suspend/resume.

If the app uses rdtsc then generally speaking it's terminally broken. The
only exception is some profiling tools.



* Re: Hyper-Threading Vulnerability
  2005-05-14  7:37                     ` Lee Revell
@ 2005-05-14 15:33                       ` Andrea Arcangeli
  2005-05-15  1:07                         ` Christer Weinigel
  2005-05-15  9:48                         ` Andi Kleen
  0 siblings, 2 replies; 150+ messages in thread
From: Andrea Arcangeli @ 2005-05-14 15:33 UTC (permalink / raw)
  To: Lee Revell
  Cc: Dave Jones, Alan Cox, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sat, May 14, 2005 at 03:37:18AM -0400, Lee Revell wrote:
> The apps that bother to use rdtsc vs. gettimeofday need a cheap high res
> timer more than a correct one anyway - it's not guaranteed that rdtsc
> provides a reliable time source at all, due to SMP and frequency scaling
> issues.

On x86-64 the cost of gettimeofday is the same as the tsc; turning off
the tsc on x86-64 is not nice (even if we usually have HPET there, so
perhaps it wouldn't be too bad). Using the TSC is something only the
kernel (or a person with some kernel/hardware knowledge) can do safely,
knowing it'll work fine. But on x86-64 parts of the kernel run in
userland...

Preventing tasks with different uids from running on the same physical
cpu was my first idea, disabled by default via sysctl, so that only the
paranoid would enable it.

But before touching the kernel in any way it would be really nice if
somebody could bother to demonstrate that this is real, because I have a
hard time believing this is not purely vapourware. In artificial
environments a computer can recognize the difference between two faces
too, no big deal, but that doesn't mean the same software is going to
recognize millions of different faces at the airport too. So nothing has
been demonstrated in practical terms yet.

Nobody runs openssl -sign thousands of times in a row on a purely idle
system without noticing the 100% load on the other cpu for months (and
the attacker isn't root, so he can't hide his runaway 100% process; if
he were root and could modify the kernel or ps/top to hide the runaway
process, he'd have faster ways to sniff).

So to me this sounds like a purely theoretical problem. Cache covert
channels are possible too, as the paper states; next time somebody will
find out how to sniff a letter out of a pdf document on a UP no-HT
system by opening and closing it some millions of times on an otherwise
idle system. We're sure not going to flush the l2 cache because of that
(at least not by default ;).

This was an interesting read, but in practice I'd rate this as severity
1 on a 0-100 scale, unless somebody bothers to demonstrate it in a
remotely realistic environment.

Even if this were real and they sniffed an openssl key, unless they also
crack the dns the browser will complain (not very different from not
having a certificate authority signature on a fake key). And if the
server is remotely serious they'll notice the 100% runaway process way
before he can sniff the whole key (the 100% runaway load cannot be
hidden). Most servers have some statistics, so a 100% load for weeks or
months isn't very likely to be overlooked.


* Re: Hyper-Threading Vulnerability
  2005-05-14 15:23                   ` Alan Cox
@ 2005-05-14 15:45                     ` andrea
  2005-05-15 13:38                       ` Mikulas Patocka
  2005-05-14 16:30                     ` Lee Revell
  1 sibling, 1 reply; 150+ messages in thread
From: andrea @ 2005-05-14 15:45 UTC (permalink / raw)
  To: Alan Cox
  Cc: Lee Revell, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso,
	Andrew Morton

On Sat, May 14, 2005 at 04:23:10PM +0100, Alan Cox wrote:
> You cannot use rdtsc for anything but rough instruction timing. The
> timers for different processors run at different speeds on some SMP
> systems, the timer rates vary as processors change clock rate nowadays.
> Rdtsc may also jump dramatically on a suspend/resume.

x86-64 uses it for vgettimeofday very safely (i386 could too, but it
doesn't).

Anyway, I believe that at least for seccomp it's worth turning off the
tsc, not just for HT but for the L2 cache too. So it's up to you: either
you turn it off completely (which isn't very nice IMHO) or I recommend
applying the patch below. This has been tested successfully on x86-64
against the current cogito repository (i686 compiles, so I didn't bother
testing it ;). People selling cpu time through cpushare may appreciate
this bit for peace of mind. There's no way to get any timing info
anymore with this applied (gettimeofday is forbidden, of course). The
seccomp environment can't be allowed to get timing info: it has to be
completely deterministic, so that in the future I can enable a computing
mode that runs each task in parallel, with server-side transparent
checkpointing and verification that the output is the same from all the
2/3 seller computers for each task, without the buyer even noticing (for
now the verification is left to the buyer's client side and there's no
checkpointing, since that would require more kernel changes to track the
dirty bits, but it'll be easy to extend once the basic mode is
finished).

Thanks.

Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>

Index: arch/i386/kernel/process.c
===================================================================
--- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/arch/i386/kernel/process.c  (mode:100644)
+++ uncommitted/arch/i386/kernel/process.c  (mode:100644)
@@ -561,6 +561,25 @@
 }
 
 /*
+ * This function selects if the context switch from prev to next
+ * has to tweak the TSC disable bit in the cr4.
+ */
+static void disable_tsc(struct thread_info *prev,
+			struct thread_info *next)
+{
+	if (unlikely(has_secure_computing(prev) ||
+		     has_secure_computing(next))) {
+		/* slow path here */
+		if (has_secure_computing(prev) &&
+		    !has_secure_computing(next)) {
+			clear_in_cr4(X86_CR4_TSD);
+		} else if (!has_secure_computing(prev) &&
+			   has_secure_computing(next))
+			set_in_cr4(X86_CR4_TSD);
+	}
+}
+
+/*
  *	switch_to(x,yn) should switch tasks from x to y.
  *
  * We fsave/fwait so that an exception goes off at the right time
@@ -639,6 +658,8 @@
 	if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr))
 		handle_io_bitmap(next, tss);
 
+	disable_tsc(prev_p->thread_info, next_p->thread_info);
+
 	return prev_p;
 }
 
Index: arch/x86_64/kernel/process.c
===================================================================
--- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/arch/x86_64/kernel/process.c  (mode:100644)
+++ uncommitted/arch/x86_64/kernel/process.c  (mode:100644)
@@ -439,6 +439,25 @@
 }
 
 /*
+ * This function selects if the context switch from prev to next
+ * has to tweak the TSC disable bit in the cr4.
+ */
+static void disable_tsc(struct thread_info *prev,
+			struct thread_info *next)
+{
+	if (unlikely(has_secure_computing(prev) ||
+		     has_secure_computing(next))) {
+		/* slow path here */
+		if (has_secure_computing(prev) &&
+		    !has_secure_computing(next)) {
+			clear_in_cr4(X86_CR4_TSD);
+		} else if (!has_secure_computing(prev) &&
+			   has_secure_computing(next))
+			set_in_cr4(X86_CR4_TSD);
+	}
+}
+
+/*
  * This special macro can be used to load a debugging register
  */
 #define loaddebug(thread,r) set_debug(thread->debugreg ## r, r)
@@ -556,6 +575,8 @@
 		}
 	}
 
+	disable_tsc(prev_p->thread_info, next_p->thread_info);
+
 	return prev_p;
 }
 
Index: include/linux/seccomp.h
===================================================================
--- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/include/linux/seccomp.h  (mode:100644)
+++ uncommitted/include/linux/seccomp.h  (mode:100644)
@@ -19,6 +19,11 @@
 		__secure_computing(this_syscall);
 }
 
+static inline int has_secure_computing(struct thread_info *ti)
+{
+	return unlikely(test_ti_thread_flag(ti, TIF_SECCOMP));
+}
+
 #else /* CONFIG_SECCOMP */
 
 #if (__GNUC__ > 2)
@@ -28,6 +33,7 @@
 #endif
 
 #define secure_computing(x) do { } while (0)
+#define has_secure_computing(x) 0
 
 #endif /* CONFIG_SECCOMP */
 


* Re: Hyper-Threading Vulnerability
  2005-05-13 23:32       ` Paul Jakma
@ 2005-05-14 16:29         ` Paul Jakma
  0 siblings, 0 replies; 150+ messages in thread
From: Paul Jakma @ 2005-05-14 16:29 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Richard F. Rebel, Gabor MICSKO, linux-kernel

On Sat, 14 May 2005, Paul Jakma wrote:

> http://www.openssl.org/news/secadv_20030317.txt
>
> This is old news, it seems: a timing attack that has long been known
> about and fixed.

I've now been told it's a new, more involved timing attack than the
one the URL above describes a defence against.

regards,
-- 
Paul Jakma	paul@clubi.ie	paul@jakma.org	Key ID: 64A2FF6A
Fortune:
Weinberg's First Law:
 	Progress is only made on alternate Fridays.


* Re: Hyper-Threading Vulnerability
  2005-05-14 15:23                   ` Alan Cox
  2005-05-14 15:45                     ` andrea
@ 2005-05-14 16:30                     ` Lee Revell
  2005-05-14 16:44                       ` Arjan van de Ven
                                         ` (2 more replies)
  1 sibling, 3 replies; 150+ messages in thread
From: Lee Revell @ 2005-05-14 16:30 UTC (permalink / raw)
  To: Alan Cox
  Cc: Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sat, 2005-05-14 at 16:23 +0100, Alan Cox wrote:
> On Sad, 2005-05-14 at 00:38, Lee Revell wrote:
> > Well yes but you would still have to recompile those apps.  And take the
> > big performance hit from using gettimeofday vs rdtsc.  Disabling HT by
> > default looks pretty good by comparison.
> 
> You cannot use rdtsc for anything but rough instruction timing. The
> timers for different processors run at different speeds on some SMP
> systems, the timer rates vary as processors change clock rate nowadays.
> Rdtsc may also jump dramatically on a suspend/resume.
> 
> If the app uses rdtsc then generally speaking it's terminally broken. The
> only exception is some profiling tools.

That is basically all JACK and mplayer use it for.  They have RT
constraints and the tsc is used to know if we got woken up too late and
should just drop some frames.  The developers are aware of the issues
with rdtsc and have chosen to use it anyway because these apps need
every ounce of CPU and cannot tolerate the overhead of gettimeofday(). 

Lee



* Re: Hyper-Threading Vulnerability
  2005-05-14 16:30                     ` Lee Revell
@ 2005-05-14 16:44                       ` Arjan van de Ven
  2005-05-14 17:56                         ` Lee Revell
  2005-05-14 17:04                       ` Jindrich Makovicka
  2005-05-15  9:58                       ` Andi Kleen
  2 siblings, 1 reply; 150+ messages in thread
From: Arjan van de Ven @ 2005-05-14 16:44 UTC (permalink / raw)
  To: Lee Revell
  Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sat, 2005-05-14 at 12:30 -0400, Lee Revell wrote:
> On Sat, 2005-05-14 at 16:23 +0100, Alan Cox wrote:
> > On Sad, 2005-05-14 at 00:38, Lee Revell wrote:
> > > Well yes but you would still have to recompile those apps.  And take the
> > > big performance hit from using gettimeofday vs rdtsc.  Disabling HT by
> > > default looks pretty good by comparison.
> > 
> > You cannot use rdtsc for anything but rough instruction timing. The
> > timers for different processors run at different speeds on some SMP
> > systems, the timer rates vary as processors change clock rate nowadays.
> > Rdtsc may also jump dramatically on a suspend/resume.
> > 
> > If the app uses rdtsc then generally speaking it's terminally broken. The
> > only exception is some profiling tools.
> 
> That is basically all JACK and mplayer use it for.  They have RT
> constraints and the tsc is used to know if we got woken up too late and
> should just drop some frames.  The developers are aware of the issues
> with rdtsc and have chosen to use it anyway because these apps need
> every ounce of CPU and cannot tolerate the overhead of gettimeofday(). 


Then JACK is terminally broken if it doesn't have a fallback for
non-rdtsc cpus.



* Re: Hyper-Threading Vulnerability
  2005-05-14 16:30                     ` Lee Revell
  2005-05-14 16:44                       ` Arjan van de Ven
@ 2005-05-14 17:04                       ` Jindrich Makovicka
  2005-05-14 18:27                         ` Lee Revell
  2005-05-15  9:58                       ` Andi Kleen
  2 siblings, 1 reply; 150+ messages in thread
From: Jindrich Makovicka @ 2005-05-14 17:04 UTC (permalink / raw)
  To: linux-kernel

Lee Revell wrote:
> On Sat, 2005-05-14 at 16:23 +0100, Alan Cox wrote:
> 
>>On Sad, 2005-05-14 at 00:38, Lee Revell wrote:
>>
>>>Well yes but you would still have to recompile those apps.  And take the
>>>big performance hit from using gettimeofday vs rdtsc.  Disabling HT by
>>>default looks pretty good by comparison.
>>
>>You cannot use rdtsc for anything but rough instruction timing. The
>>timers for different processors run at different speeds on some SMP
>>systems, the timer rates vary as processors change clock rate nowdays.
>>Rdtsc may also jump dramatically on a suspend/resume.
>>
>>If the app uses rdtsc then generally speaking its terminally broken. The
>>only exception is some profiling tools.
> 
> 
> That is basically all JACK and mplayer use it for.  They have RT
> constraints and the tsc is used to know if we got woken up too late and
> should just drop some frames.  The developers are aware of the issues
> with rdtsc and have chosen to use it anyway because these apps need
> every ounce of CPU and cannot tolerate the overhead of gettimeofday(). 

AFAIK, mplayer actually uses gettimeofday(). rdtsc is used in some
places for profiling and debugging purposes and not compiled in by default.

-- 
Jindrich Makovicka


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14 16:44                       ` Arjan van de Ven
@ 2005-05-14 17:56                         ` Lee Revell
  2005-05-14 18:01                           ` Arjan van de Ven
  2005-05-15  9:33                           ` Hyper-Threading Vulnerability Adrian Bunk
  0 siblings, 2 replies; 150+ messages in thread
From: Lee Revell @ 2005-05-14 17:56 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sat, 2005-05-14 at 18:44 +0200, Arjan van de Ven wrote:
> then JACK is terminally broken if it doesn't have a fallback for non-
> rdtsc cpus. 

It does have a fallback, but the selection is done at compile time.  It
uses rdtsc for all x86 CPUs except pre-i586 SMP systems.

Maybe we should check at runtime, but this has always worked.

Lee


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14 17:56                         ` Lee Revell
@ 2005-05-14 18:01                           ` Arjan van de Ven
  2005-05-14 19:21                             ` Lee Revell
  2005-05-15 10:01                             ` Andi Kleen
  2005-05-15  9:33                           ` Hyper-Threading Vulnerability Adrian Bunk
  1 sibling, 2 replies; 150+ messages in thread
From: Arjan van de Ven @ 2005-05-14 18:01 UTC (permalink / raw)
  To: Lee Revell
  Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sat, 2005-05-14 at 13:56 -0400, Lee Revell wrote:
> On Sat, 2005-05-14 at 18:44 +0200, Arjan van de Ven wrote:
> > then JACK is terminally broken if it doesn't have a fallback for non-
> > rdtsc cpus. 
> 
> It does have a fallback, but the selection is done at compile time.  It
> uses rdtsc for all x86 CPUs except pre-i586 SMP systems.
> 
> Maybe we should check at runtime,

it's probably a sign that JACK isn't used on SMP systems much, at least
not on the bigger systems (like IBM's x440's) where the tsc *will*
differ wildly between cpus...



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14 17:04                       ` Jindrich Makovicka
@ 2005-05-14 18:27                         ` Lee Revell
  0 siblings, 0 replies; 150+ messages in thread
From: Lee Revell @ 2005-05-14 18:27 UTC (permalink / raw)
  To: Jindrich Makovicka; +Cc: linux-kernel

On Sat, 2005-05-14 at 19:04 +0200, Jindrich Makovicka wrote:
> AFAIK, mplayer actually uses gettimeofday(). rdtsc is used in some
> places for profiling and debugging purposes and not compiled in by default.
> 

OK.  The comments in the JACK code say it was copied from mplayer.  I
guess the usage is not the same.

Lee


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14 18:01                           ` Arjan van de Ven
@ 2005-05-14 19:21                             ` Lee Revell
  2005-05-14 19:48                               ` Arjan van de Ven
  2005-05-15 10:01                             ` Andi Kleen
  1 sibling, 1 reply; 150+ messages in thread
From: Lee Revell @ 2005-05-14 19:21 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sat, 2005-05-14 at 20:01 +0200, Arjan van de Ven wrote:
> On Sat, 2005-05-14 at 13:56 -0400, Lee Revell wrote:
> > On Sat, 2005-05-14 at 18:44 +0200, Arjan van de Ven wrote:
> > > then JACK is terminally broken if it doesn't have a fallback for non-
> > > rdtsc cpus. 
> > 
> > It does have a fallback, but the selection is done at compile time.  It
> > uses rdtsc for all x86 CPUs except pre-i586 SMP systems.
> > 
> > Maybe we should check at runtime,
> 
> it's probably a sign that JACK isn't used on SMP systems much, at least
> not on the bigger systems (like IBM's x440's) where the tsc *will*
> differ wildly between cpus...

Correct.  The only bug reports we have seen related to the use of the
TSC are due to CPU frequency scaling.  The fix is to not use it - people
who want to use their PC as a DSP for audio probably don't want their
processor slowing down anyway.  And JACK is targeted at desktop and
smaller systems; it would be kind of crazy to run it on big iron.
Well, maybe there are people who like to record sessions or practice
guitar in the server room...

If gettimeofday is really as cheap as rdtsc on x86_64, we should use it.
But it's too expensive for slower x86 systems.  Anyway, Andi's fix
disables *all* high res timing including gettimeofday.  Obviously no
multimedia app can tolerate this, so discussing rdtsc is really a red
herring.  But multimedia apps aren't much used in seccomp environments
either.


Lee


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14 19:21                             ` Lee Revell
@ 2005-05-14 19:48                               ` Arjan van de Ven
  2005-05-14 23:40                                 ` Lee Revell
  2005-05-15  3:19                                 ` dean gaudet
  0 siblings, 2 replies; 150+ messages in thread
From: Arjan van de Ven @ 2005-05-14 19:48 UTC (permalink / raw)
  To: Lee Revell
  Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sat, 2005-05-14 at 15:21 -0400, Lee Revell wrote:
> On Sat, 2005-05-14 at 20:01 +0200, Arjan van de Ven wrote:
> > On Sat, 2005-05-14 at 13:56 -0400, Lee Revell wrote:
> > > On Sat, 2005-05-14 at 18:44 +0200, Arjan van de Ven wrote:
> > > > then JACK is terminally broken if it doesn't have a fallback for non-
> > > > rdtsc cpus. 
> > > 
> > > It does have a fallback, but the selection is done at compile time.  It
> > > uses rdtsc for all x86 CPUs except pre-i586 SMP systems.
> > > 
> > > Maybe we should check at runtime,
> > 
> > it's probably a sign that JACK isn't used on SMP systems much, at least
> > not on the bigger systems (like IBM's x440's) where the tsc *will*
> > differ wildly between cpus...
> 
> Correct.  The only bug reports we have seen related to the use of the
> TSC is due to CPU frequency scaling.  The fix is to not use it - people
> who want to use their PC as a DSP for audio probably don't want their
> processor slowing down anyway. 

it's a matter of time (my estimate is a year or two) before processors
get variable frequencies based on temperature targets etc...
and then rdtsc is really useless for this kind of thing..



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14 19:48                               ` Arjan van de Ven
@ 2005-05-14 23:40                                 ` Lee Revell
  2005-05-15  7:30                                   ` Arjan van de Ven
  2005-05-15  9:37                                   ` Andi Kleen
  2005-05-15  3:19                                 ` dean gaudet
  1 sibling, 2 replies; 150+ messages in thread
From: Lee Revell @ 2005-05-14 23:40 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sat, 2005-05-14 at 21:48 +0200, Arjan van de Ven wrote:
> On Sat, 2005-05-14 at 15:21 -0400, Lee Revell wrote:
> > On Sat, 2005-05-14 at 20:01 +0200, Arjan van de Ven wrote:
> > > On Sat, 2005-05-14 at 13:56 -0400, Lee Revell wrote:
> > > > On Sat, 2005-05-14 at 18:44 +0200, Arjan van de Ven wrote:
> > > > > then JACK is terminally broken if it doesn't have a fallback for non-
> > > > > rdtsc cpus. 
> > > > 
> > > > It does have a fallback, but the selection is done at compile time.  It
> > > > uses rdtsc for all x86 CPUs except pre-i586 SMP systems.
> > > > 
> > > > Maybe we should check at runtime,
> > > 
> > > it's probably a sign that JACK isn't used on SMP systems much, at least
> > > not on the bigger systems (like IBM's x440's) where the tsc *will*
> > > differ wildly between cpus...
> > 
> > Correct.  The only bug reports we have seen related to the use of the
> > TSC is due to CPU frequency scaling.  The fix is to not use it - people
> > who want to use their PC as a DSP for audio probably don't want their
> > processor slowing down anyway. 
> 
> it's a matter of time (my estimate is a year or two) before processors
> get variable frequencies based on temperature targets etc...
> and then rdtsc is really useless for this kind of thing..

I was under the impression that P4 and later processors do not vary the
TSC rate when doing frequency scaling.  This is mentioned in the
documentation for the high res timers patch.

Lee


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14 15:33                       ` Andrea Arcangeli
@ 2005-05-15  1:07                         ` Christer Weinigel
  2005-05-15  9:48                         ` Andi Kleen
  1 sibling, 0 replies; 150+ messages in thread
From: Christer Weinigel @ 2005-05-15  1:07 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Lee Revell, Dave Jones, Alan Cox, Matt Mackall, Andy Isaacson,
	Andi Kleen, Richard F. Rebel, Gabor MICSKO,
	Linux Kernel Mailing List, tytso

Andrea Arcangeli <andrea@suse.de> writes:

> Nobody runs openssl -sign thousand of times in a row on a pure idle
> system without noticing the 100% load on the other cpu for months

Well, actually one does.  On a normal https server, each https request
results in an operation on the private key.  So if the attacker shares
the same web server as the victim it's probably rather easy for the
attacker to see when the machine is idle and launch an attack giving
him thousands of chances to spy on the victim.

But I do agree that this probably isn't all that serious; those who
really have secrets to hide won't run their https server on a machine
shared with anybody else.

  /Christer

-- 
"Just how much can I get away with and still go to heaven?"

Freelance consultant specializing in device driver programming for Linux 
Christer Weinigel <christer@weinigel.se>  http://www.weinigel.se

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14 19:48                               ` Arjan van de Ven
  2005-05-14 23:40                                 ` Lee Revell
@ 2005-05-15  3:19                                 ` dean gaudet
  1 sibling, 0 replies; 150+ messages in thread
From: dean gaudet @ 2005-05-15  3:19 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Lee Revell, Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson,
	Andi Kleen, Richard F. Rebel, Gabor MICSKO,
	Linux Kernel Mailing List, tytso

On Sat, 14 May 2005, Arjan van de Ven wrote:

> it's a matter of time (my estimate is a year or two) before processors
> get variable frequencies based on temperature targets etc...
> and then rdtsc is really useless for this kind of thing..

what do you mean "a year or two"?  processors have been doing this for 
many years now.

i'm biased, but i still think transmeta did this the right way... the tsc 
operates at the top frequency of the processor always.

i do a hell of a lot of microbenchmarking on various processors and i 
always use tsc -- but i'm just smart enough to take multiple samples and i 
try to make each sample smaller than a time slice... which avoids most of 
the pitfalls, and would even work on smp boxes with tsc differences.

-dean

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14 23:40                                 ` Lee Revell
@ 2005-05-15  7:30                                   ` Arjan van de Ven
  2005-05-15 20:41                                     ` Alan Cox
  2005-05-15  9:37                                   ` Andi Kleen
  1 sibling, 1 reply; 150+ messages in thread
From: Arjan van de Ven @ 2005-05-15  7:30 UTC (permalink / raw)
  To: Lee Revell
  Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sat, 2005-05-14 at 19:40 -0400, Lee Revell wrote:
> > it's a matter of time (my estimate is a year or two) before processors
> > get variable frequencies based on temperature targets etc...
> > and then rdtsc is really useless for this kind of thing..
> 
> I was under the impression that P4 and later processors do not vary the
> TSC rate when doing frequency scaling.  This is mentioned in the
> documentation for the high res timers patch.

seems not to be the case, and worse, during idle time the clock is
allowed to stop entirely...  (And that is happening more and more:
Linux is getting more aggressive idle support (e.g. the no-timer-tick
patches), which will trigger the BIOS thresholds for this even more.)




^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 19:02             ` Andy Isaacson
@ 2005-05-15  9:31               ` Adrian Bunk
  0 siblings, 0 replies; 150+ messages in thread
From: Adrian Bunk @ 2005-05-15  9:31 UTC (permalink / raw)
  To: Andy Isaacson
  Cc: Vadim Lobanov, Jeff Garzik, Daniel Jacobowitz, Barry K. Nathan,
	Gabor MICSKO, linux-kernel

On Fri, May 13, 2005 at 12:02:44PM -0700, Andy Isaacson wrote:
> On Fri, May 13, 2005 at 11:30:27AM -0700, Vadim Lobanov wrote:
> > On Fri, 13 May 2005, Andy Isaacson wrote:
> > > It's a side channel timing attack on data-dependent computation through
> > > the L1 and L2 caches.  Nice work.  In-the-wild exploitation is
> > > difficult, though; your timing gets screwed up if you get scheduled away
> > > from your victim, and you don't even know, because you can't tell where
> > > you were scheduled, so on any reasonably busy multiuser system it's not
> > > clear that the attack is practical.
> > 
> > Wouldn't scheduling appear as a rather big time delta (in measuring the
> > cache access times), so you would know to disregard that data point?
> > 
> > (Just wondering... :-) )
> 
> Good question.  Yes, you can probably filter the data.  The question is,
> how hard is it to set up the conditions to acquire the data?  You have
> to be scheduled on the same core as the target process (sibling
> threads).  And you don't know when the target is going to be scheduled,
> and on a real-world system, there are other threads competing for
> scheduling; if it's SMP (2 core, 4 thread) with perfect 100% utilization
> then you've only got a 33% chance of being scheduled on the right
> thread, and it gets worse if the machine is idle since the kernel should
> schedule you and the OpenSSL process on different cores...
>...

But if you start 3 processes in the idle case you might get a 100% 
chance?

> -andy

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14 17:56                         ` Lee Revell
  2005-05-14 18:01                           ` Arjan van de Ven
@ 2005-05-15  9:33                           ` Adrian Bunk
  1 sibling, 0 replies; 150+ messages in thread
From: Adrian Bunk @ 2005-05-15  9:33 UTC (permalink / raw)
  To: Lee Revell
  Cc: Arjan van de Ven, Alan Cox, Dave Jones, Matt Mackall,
	Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO,
	Linux Kernel Mailing List, tytso

On Sat, May 14, 2005 at 01:56:36PM -0400, Lee Revell wrote:
> On Sat, 2005-05-14 at 18:44 +0200, Arjan van de Ven wrote:
> > then JACK is terminally broken if it doesn't have a fallback for non-
> > rdtsc cpus. 
> 
> It does have a fallback, but the selection is done at compile time.  It
> uses rdtsc for all x86 CPUs except pre-i586 SMP systems.
> 
> Maybe we should check at runtime, but this has always worked.

If this is critical for JACK, runtime selection would be an improvement
for distributions like Debian that support both pre-i586 SMP systems and
current hardware.

> Lee

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14 23:40                                 ` Lee Revell
  2005-05-15  7:30                                   ` Arjan van de Ven
@ 2005-05-15  9:37                                   ` Andi Kleen
  1 sibling, 0 replies; 150+ messages in thread
From: Andi Kleen @ 2005-05-15  9:37 UTC (permalink / raw)
  To: Lee Revell
  Cc: Arjan van de Ven, Alan Cox, Dave Jones, Matt Mackall,
	Andy Isaacson, Richard F. Rebel, Gabor MICSKO,
	Linux Kernel Mailing List, tytso

> I was under the impression that P4 and later processors do not vary the
> TSC rate when doing frequency scaling.  This is mentioned in the
> documentation for the high res timers patch.

Prescott and later do not vary the TSC, but P4s before that do.
On x86-64 it is true because only Nocona is supported, which has a
pstate-invariant TSC.

The latest x86-64 kernel has a special X86_CONSTANT_TSC internal
CPUID bit, which is set in that case.  If some other subsystem
uses it I would recommend porting that to i386 too.

-Andi
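[Editorial note: in later kernels this bit surfaced in /proc/cpuinfo as the `constant_tsc` flag, so user space can check it before trusting the TSC across frequency changes. A sketch; the flag name is an assumption based on later kernel versions and did not exist in all 2005-era kernels.]

```shell
# Report whether the kernel believes this CPU's TSC rate is invariant
# under frequency scaling ("constant_tsc" flag; later kernels only).
if grep -qw constant_tsc /proc/cpuinfo; then
    echo "TSC is constant across frequency changes"
else
    echo "TSC rate may vary with CPU frequency"
fi
```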

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 21:26       ` Andy Isaacson
  2005-05-13 21:59         ` Matt Mackall
  2005-05-14  0:39         ` dean gaudet
@ 2005-05-15  9:43         ` Andi Kleen
  2005-05-15 18:42           ` David Schwartz
  2005-05-16  7:10           ` Eric W. Biederman
  2005-05-15 14:00         ` Mikulas Patocka
  2005-05-15 14:26         ` Andi Kleen
  4 siblings, 2 replies; 150+ messages in thread
From: Andi Kleen @ 2005-05-15  9:43 UTC (permalink / raw)
  To: Andy Isaacson; +Cc: Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso

On Fri, May 13, 2005 at 02:26:20PM -0700, Andy Isaacson wrote:
> On Fri, May 13, 2005 at 09:05:49PM +0200, Andi Kleen wrote:
> > On Fri, May 13, 2005 at 02:38:03PM -0400, Richard F. Rebel wrote:
> > > Why?  It's certainly reasonable to disable it for the time being and
> > > even prudent to do so.
> > 
> > No, i strongly disagree on that. The reasonable thing to do is
> > to fix the crypto code which has this vulnerability, not break
> > a useful performance enhancement for everybody else.
> 
> Pardon me for saying so, but that's bullshit.  You're asking the crypto
> guys to give up a 5x performance gain (that's my wild guess) by giving
> up all their data-dependent algorithms and contorting their code wildly,
> to avoid a microarchitectural problem with Intel's HT implementation.

And what you're doing is to ask all the non-crypto guys to give
up a useful optimization just to fix a problem in the crypto guys'
code.  The cache line leak is just an information disclosure
bug in the crypto code, not a general problem.

There is much more non-crypto code than crypto code around - you
are proposing to penalize the majority of code to solve a relatively
obscure problem in only a few functions, which seems like totally
the wrong approach to me.

BTW the crypto guys are always free to check for hyperthreading
themselves and use different functions.  However there is a catch
there - the modern dual core processors which actually have
separate L1 and L2 caches still set the HT feature bits, to stay
compatible with old code and license managers.

-Andi

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14 15:33                       ` Andrea Arcangeli
  2005-05-15  1:07                         ` Christer Weinigel
@ 2005-05-15  9:48                         ` Andi Kleen
  1 sibling, 0 replies; 150+ messages in thread
From: Andi Kleen @ 2005-05-15  9:48 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Lee Revell, Dave Jones, Alan Cox, Matt Mackall, Andy Isaacson,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sat, May 14, 2005 at 05:33:07PM +0200, Andrea Arcangeli wrote:
> On Sat, May 14, 2005 at 03:37:18AM -0400, Lee Revell wrote:
> > The apps that bother to use rdtsc vs. gettimeofday need a cheap high res
> > timer more than a correct one anyway - it's not guaranteed that rdtsc
> > provides a reliable time source at all, due to SMP and frequency scaling
> > issues.
> 
> On x86-64 the cost of gettimeofday is the same of the tsc, turning off

It depends, on many systems it is more costly. e.g. on many SMP
systems we have to use HPET or even the PM timer, because TSC is not
reliable.

> tsc on x86-64 is not nice (even if we usually have HPET there, so
> perhaps it wouldn't be too bad). TSC is something only the kernel (or a
> person with some kernel/hardware knowledge) can do safely knowing it'll
> work fine. But on x86-64 parts of the kernel runs in userland...

Agreed. It is quite complicated to decide if TSC is reliable or not
and I would not recommend user space to do this.

[hmm actually I already have constant_tsc fake cpuid bit, but 
it only refers to single CPUs. I wonder if I should add another
one for SMP "synchronized_tsc". The latest mm code already has
this information, but it does not export it yet] 


> 
> Preventing tasks with different uid to run on the same physical cpu was
> my first idea, disabled by default via sysctl, so only if one is
> paranoid can enable it.

The paranoid should just fix their crypto code. And if they're
clinically paranoid they can always boot with noht or disable
it in the BIOS. But really I think they should just fix OpenSSL. 

> 
> But before touching the kernel in any way it would be really nice if
> somebody could bother to demonstrate this is real because I've an hard
> time to believe this is not purely vapourware. On artificial

Similar feeling here.

> Nobody runs openssl -sign thousand of times in a row on a pure idle
> system without noticing the 100% load on the other cpu for months (and
> he's not root so he can't hide his runaway 100% process, if he was root
> and he could modify the kernel or ps/top to hide the runaway process,
> he'd have faster ways to sniff).

Exactly.

> 
> So to me this sounds a purerly theoretical problem. Cache covert

Perhaps not purely theoretical, but it is certainly not something
that needs drastic action like disabling HT in general.

> This was an interesting read, but in practice I'd rate this to have
> severity 1 on a 0-100 scale, unless somebody bothers to demonstrate it
> in a remotely realistic environment.

Agreed.

-Andi

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 19:16   ` Diego Calleja
  2005-05-13 19:42     ` Frank Denis (Jedi/Sector One)
@ 2005-05-15  9:54     ` Andi Kleen
  2005-05-15 13:51       ` Mikulas Patocka
  1 sibling, 1 reply; 150+ messages in thread
From: Andi Kleen @ 2005-05-15  9:54 UTC (permalink / raw)
  To: Diego Calleja; +Cc: gmicsko, linux-kernel

On Fri, May 13, 2005 at 09:16:09PM +0200, Diego Calleja wrote:
> El Fri, 13 May 2005 20:03:58 +0200,
> Andi Kleen <ak@muc.de> escribi?:
> 
> 
> > This is not a kernel problem, but a user space problem. The fix 
> > is to change the user space crypto code to need the same number of cache line
> > accesses on all keys. 
> 
> 
> However they've patched the FreeBSD kernel to "workaround?" it:
> ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch

That's a similarly stupid idea to what they did with the disk write
cache (lowering the MTBF of their disks by considerable factors,
which is much worse than the power-off data loss problem).
Let's not go down this path please.

-Andi


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14 16:30                     ` Lee Revell
  2005-05-14 16:44                       ` Arjan van de Ven
  2005-05-14 17:04                       ` Jindrich Makovicka
@ 2005-05-15  9:58                       ` Andi Kleen
  2 siblings, 0 replies; 150+ messages in thread
From: Andi Kleen @ 2005-05-15  9:58 UTC (permalink / raw)
  To: Lee Revell
  Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sat, May 14, 2005 at 12:30:28PM -0400, Lee Revell wrote:
> On Sat, 2005-05-14 at 16:23 +0100, Alan Cox wrote:
> > On Sad, 2005-05-14 at 00:38, Lee Revell wrote:
> > > Well yes but you would still have to recompile those apps.  And take the
> > > big performance hit from using gettimeofday vs rdtsc.  Disabling HT by
> > > default looks pretty good by comparison.
> > 
> > You cannot use rdtsc for anything but rough instruction timing. The
> > timers for different processors run at different speeds on some SMP
> > systems, the timer rates vary as processors change clock rate nowdays.
> > Rdtsc may also jump dramatically on a suspend/resume.
> > 
> > If the app uses rdtsc then generally speaking its terminally broken. The
> > only exception is some profiling tools.
> 
> That is basically all JACK and mplayer use it for.  They have RT
> constraints and the tsc is used to know if we got woken up too late and
> should just drop some frames.  The developers are aware of the issues
> with rdtsc and have chosen to use it anyway because these apps need
> every ounce of CPU and cannot tolerate the overhead of gettimeofday(). 

I would consider JACK broken then.  For one, it breaks
on Centrinos and on AMD systems with PowerNow! and some others, which all
have frequency scaling with a non-pstate-invariant TSC.

As an additional problem the modern Opterons which support SMP
powernow can even have completely different TSC frequencies
on different CPUs.

All I can recommend is to use gettimeofday() for this. The kernel
goes to considerable pains to make gettimeofday() fast, and when
it is not fast then the system in general cannot do it better.

-Andi

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14 18:01                           ` Arjan van de Ven
  2005-05-14 19:21                             ` Lee Revell
@ 2005-05-15 10:01                             ` Andi Kleen
  2005-05-15 10:23                               ` 2.6.4 timer and helper functions kernel
  1 sibling, 1 reply; 150+ messages in thread
From: Andi Kleen @ 2005-05-15 10:01 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Lee Revell, Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sat, May 14, 2005 at 08:01:33PM +0200, Arjan van de Ven wrote:
> On Sat, 2005-05-14 at 13:56 -0400, Lee Revell wrote:
> > On Sat, 2005-05-14 at 18:44 +0200, Arjan van de Ven wrote:
> > > then JACK is terminally broken if it doesn't have a fallback for non-
> > > rdtsc cpus. 
> > 
> > It does have a fallback, but the selection is done at compile time.  It
> > uses rdtsc for all x86 CPUs except pre-i586 SMP systems.
> > 
> > Maybe we should check at runtime,
> 
> it's probably a sign that JACK isn't used on SMP systems much, at least
> not on the bigger systems (like IBM's x440's) where the tsc *will*
> differ wildly between cpus...

It does not even need SMP, just use a Centrino laptop.

I suppose what the JACK guys are doing is to recommend disabling
frequency scaling, and then the sound guys complain again that sound
on Linux is so hard to use.  I wonder where that comes from? :)

-Andi

^ permalink raw reply	[flat|nested] 150+ messages in thread

* 2.6.4 timer and helper functions
  2005-05-15 10:01                             ` Andi Kleen
@ 2005-05-15 10:23                               ` kernel
  2005-05-19  0:38                                 ` George Anzinger
  0 siblings, 1 reply; 150+ messages in thread
From: kernel @ 2005-05-15 10:23 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1416 bytes --]

Hi all,
I am running a 2.6.4 kernel on my system, and I am playing a little bit
with kernel time issues and helper functions, just to understand how
things really work.
While doing that on my x86 system, I loaded a module from LDD 3rd
edition, jit.c, which uses a dynamic /proc file to return textual
information.
The returned info is in the format below and uses the kernel functions
do_gettimeofday, current_kernel_time and jiffies_to_timespec.
The output format is:
0x0009073c 0x000000010009073c 1116162967.247441
                              1116162967.246530656        591.586065248
0x0009073c 0x000000010009073c 1116162967.247463
                              1116162967.246530656        591.586065248
0x0009073c 0x000000010009073c 1116162967.247476
                              1116162967.246530656        591.586065248
0x0009073c 0x000000010009073c 1116162967.247489
                              1116162967.246530656        591.586065248
where the first two values are jiffies and jiffies_64.  The next two are
do_gettimeofday and current_kernel_time, and the last value is
jiffies_to_timespec.  This output was recorded after 16 minutes of
uptime.  Shouldn't the last value be the same as the uptime?  I have
attached an output file covering boot time until the point where the
function resets the struct and starts counting from the beginning.  Is
this a bug, or am I missing something here?

Best regards,
Chris.

[-- Attachment #2: NAS --]
[-- Type: application/octet-stream, Size: 8798 bytes --]

0xfffd544c 0x00000000fffd544c 1116162200.659770
                              1116162200.659069664    4294139.459575264
0xfffd544c 0x00000000fffd544c 1116162200.659793
                              1116162200.659069664    4294139.459575264
0xfffd544c 0x00000000fffd544c 1116162200.659807
                              1116162200.659069664    4294139.459575264
0xfffd544c 0x00000000fffd544c 1116162200.659820
                              1116162200.659069664    4294139.459575264
0xfffd575b 0x00000000fffd575b 1116162201.442085
                              1116162201.441950648    4294140.242456248
0xfffd575b 0x00000000fffd575b 1116162201.442109
                              1116162201.441950648    4294140.242456248
0xfffd575b 0x00000000fffd575b 1116162201.442122
                              1116162201.441950648    4294140.242456248
0xfffd575b 0x00000000fffd575b 1116162201.442135
                              1116162201.441950648    4294140.242456248
0xfffd5a71 0x00000000fffd5a71 1116162202.231974
                              1116162202.231830568    4294141.32336168
0xfffd5a71 0x00000000fffd5a71 1116162202.231996
                              1116162202.231830568    4294141.32336168
0xfffd5a71 0x00000000fffd5a71 1116162202.232010
                              1116162202.231830568    4294141.32336168
0xfffd5a71 0x00000000fffd5a71 1116162202.232023
                              1116162202.231830568    4294141.32336168
0xfffd5d63 0x00000000fffd5d63 1116162202.986007
                              1116162202.985715960    4294141.786221560
0xfffd5d63 0x00000000fffd5d63 1116162202.986030
                              1116162202.985715960    4294141.786221560
0xfffd5d63 0x00000000fffd5d63 1116162202.986043
                              1116162202.985715960    4294141.786221560
0xfffd5d63 0x00000000fffd5d63 1116162202.986056
                              1116162202.985715960    4294141.786221560
0xfffd71d6 0x00000000fffd71d6 1116162208.220317
                              1116162208.219920240    4294147.20425840
0xfffd71d6 0x00000000fffd71d6 1116162208.220341
                              1116162208.219920240    4294147.20425840
0xfffd71d6 0x00000000fffd71d6 1116162208.220354
                              1116162208.219920240    4294147.20425840
0xfffd71d6 0x00000000fffd71d6 1116162208.220367
                              1116162208.219920240    4294147.20425840
0xfffd7432 0x00000000fffd7432 1116162208.824141
                              1116162208.823828432    4294147.624334032
0xfffd7432 0x00000000fffd7432 1116162208.824165
                              1116162208.823828432    4294147.624334032
0xfffd7432 0x00000000fffd7432 1116162208.824178
                              1116162208.823828432    4294147.624334032
0xfffd7432 0x00000000fffd7432 1116162208.824191
                              1116162208.823828432    4294147.624334032
0xfffd76dc 0x00000000fffd76dc 1116162209.506691
                              1116162209.505724768    4294148.306230368
0xfffd76dc 0x00000000fffd76dc 1116162209.506714
                              1116162209.505724768    4294148.306230368
0xfffd76dd 0x00000000fffd76dd 1116162209.506750
                              1116162209.506724616    4294148.307230216
0xfffd76dd 0x00000000fffd76dd 1116162209.506764
                              1116162209.506724616    4294148.307230216
0xfffd79f0 0x00000000fffd79f0 1116162210.293679
                              1116162210.293604992    4294149.94110592
0xfffd79f0 0x00000000fffd79f0 1116162210.293702
                              1116162210.293604992    4294149.94110592
0xfffd79f0 0x00000000fffd79f0 1116162210.293715
                              1116162210.293604992    4294149.94110592
0xfffd79f0 0x00000000fffd79f0 1116162210.293728
                              1116162210.293604992    4294149.94110592
0xfffd7c99 0x00000000fffd7c99 1116162210.974616
                              1116162210.974501480    4294149.775007080
0xfffd7c99 0x00000000fffd7c99 1116162210.974640
                              1116162210.974501480    4294149.775007080
0xfffd7c99 0x00000000fffd7c99 1116162210.974653
                              1116162210.974501480    4294149.775007080
0xfffd7c99 0x00000000fffd7c99 1116162210.974666
                              1116162210.974501480    4294149.775007080
0xfffd7fb0 0x00000000fffd7fb0 1116162211.766070
                              1116162211.765381248    4294150.565886848
0xfffd7fb0 0x00000000fffd7fb0 1116162211.766094
                              1116162211.765381248    4294150.565886848
0xfffd7fb0 0x00000000fffd7fb0 1116162211.766107
                              1116162211.765381248    4294150.565886848
0xfffd7fb0 0x00000000fffd7fb0 1116162211.766120
                              1116162211.765381248    4294150.565886848
0xfffd829c 0x00000000fffd829c 1116162212.513993
                              1116162212.513267552    4294151.313773152
0xfffd829c 0x00000000fffd829c 1116162212.514016
                              1116162212.513267552    4294151.313773152
0xfffd829c 0x00000000fffd829c 1116162212.514029
                              1116162212.513267552    4294151.313773152
0xfffd829c 0x00000000fffd829c 1116162212.514042
                              1116162212.513267552    4294151.313773152
0xfffd858d 0x00000000fffd858d 1116162213.266431
                              1116162213.266153096    4294152.66658696
0xfffd858d 0x00000000fffd858d 1116162213.266453
                              1116162213.266153096    4294152.66658696
0xfffd858d 0x00000000fffd858d 1116162213.266467
                              1116162213.266153096    4294152.66658696
0xfffd858d 0x00000000fffd858d 1116162213.266480
                              1116162213.266153096    4294152.66658696
0xfffdaeb0 0x00000000fffdaeb0 1116162223.796156
                              1116162223.795552384    4294162.596057984
0xfffdaeb0 0x00000000fffdaeb0 1116162223.796180
                              1116162223.795552384    4294162.596057984
0xfffdaeb0 0x00000000fffdaeb0 1116162223.796193
                              1116162223.795552384    4294162.596057984
0xfffdaeb0 0x00000000fffdaeb0 1116162223.796206
                              1116162223.795552384    4294162.596057984
0xfffdb151 0x00000000fffdb151 1116162224.469209
                              1116162224.468450088    4294163.268955688
0xfffdb151 0x00000000fffdb151 1116162224.469233
                              1116162224.468450088    4294163.268955688
0xfffdb151 0x00000000fffdb151 1116162224.469247
                              1116162224.468450088    4294163.268955688
0xfffdb151 0x00000000fffdb151 1116162224.469260
                              1116162224.468450088    4294163.268955688
0xfffdb3b2 0x00000000fffdb3b2 1116162225.077922
                              1116162225.077357520    4294163.877863120
0xfffdb3b2 0x00000000fffdb3b2 1116162225.077946
                              1116162225.077357520    4294163.877863120
0xfffdb3b2 0x00000000fffdb3b2 1116162225.077959
                              1116162225.077357520    4294163.877863120
0xfffdb3b2 0x00000000fffdb3b2 1116162225.077972
                              1116162225.077357520    4294163.877863120
0xfffded9b 0x00000000fffded9b 1116162239.900231
                              1116162239.900104120    4294178.700609720
0xfffdfbd9 0x00000000fffdfbd9 1116162243.545937
                              1116162243.545549928    4294182.346055528
0xfffdfea5 0x00000000fffdfea5 1116162244.261748
                              1116162244.261441096    4294183.61946696
0xfffe014b 0x00000000fffe014b 1116162244.939810
                              1116162244.939338040    4294183.739843640
0xfffe03cb 0x00000000fffe03cb 1116162245.580168
                              1116162245.579240760    4294184.379746360
0xfffe0674 0x00000000fffe0674 1116162246.260663
                              1116162246.260137248    4294185.60642848
0xfffe08f5 0x00000000fffe08f5 1116162246.901559
                              1116162246.901039816    4294185.701545416
0xfffe0b99 0x00000000fffe0b99 1116162247.577274
                              1116162247.576937064    4294186.377442664
0xfffe0e63 0x00000000fffe0e63 1116162248.291766
                              1116162248.290828536    4294187.91334136
0xfffe48ee 0x00000000fffe48ee 1116162263.276450
                              1116162263.275550512    4294202.76056112
......................................................................
......................................................................
.....................after exactly 5 minutes from the beginning .......
0x0000002d 0x000000010000002d 1116162375.706265
                              1116162375.705458568          0.44993160


the counter resets and starts from zero... shouldn't it have been like that from the beginning???

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14 15:45                     ` andrea
@ 2005-05-15 13:38                       ` Mikulas Patocka
  2005-05-16  7:06                         ` andrea
  0 siblings, 1 reply; 150+ messages in thread
From: Mikulas Patocka @ 2005-05-15 13:38 UTC (permalink / raw)
  To: andrea
  Cc: Alan Cox, Lee Revell, Dave Jones, Matt Mackall, Andy Isaacson,
	Andi Kleen, Richard F. Rebel, Gabor MICSKO,
	Linux Kernel Mailing List, tytso, Andrew Morton



On Sat, 14 May 2005 andrea@cpushare.com wrote:

> On Sat, May 14, 2005 at 04:23:10PM +0100, Alan Cox wrote:
> > You cannot use rdtsc for anything but rough instruction timing. The
> > timers for different processors run at different speeds on some SMP
> > systems, the timer rates vary as processors change clock rate nowdays.
> > Rdtsc may also jump dramatically on a suspend/resume.
>
> x86-64 uses it for vgettimeofday very safely (i386 could do too but it
> doesn't).
>
> Anyway I believe at least for seccomp it's worth to turn off the tsc,
> not just for HT but for the L2 cache too. So it's up to you, either you
> turn it off completely (which isn't very nice IMHO) or I recommend to
> apply this below patch. This has been tested successfully on x86-64
> against current cogito repository (i686 compiles so I didn't bother
> testing ;). People selling the cpu through cpushare may appreciate this
> bit for a peace of mind. There's no way to get any timing info anymore
> with this applied (gettimeofday is forbidden of course).

Another possibility to get timing is from direct I/O --- i.e. initiate a
direct I/O read, wait until one cache line contains new data, and you can
then be sure that the next will contain new data within a certain time.
The IDE controller's bus-master operation acts as a timer here.

Mikulas

> The seccomp
> environment is completely deterministic so it can't be allowed to get
> timing info, it has to be deterministic so in the future I can enable a
> computing mode that does a parallel computing for each task with server
> side transparent checkpointing and verification that the output is the
> same from all the 2/3 seller computers for each task, without the buyer
> even noticing (for now the verification is left to the buyer client
> side and there's no checkpointing, since that would require more kernel
> changes to track the dirty bits but it'll be easy to extend once the
> basic mode is finished).
>
> Thanks.
>
> Signed-off-by: Andrea Arcangeli <andrea@cpushare.com>
>
> Index: arch/i386/kernel/process.c
> ===================================================================
> --- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/arch/i386/kernel/process.c  (mode:100644)
> +++ uncommitted/arch/i386/kernel/process.c  (mode:100644)
> @@ -561,6 +561,25 @@
>  }
>
>  /*
> + * This function selects if the context switch from prev to next
> + * has to tweak the TSC disable bit in the cr4.
> + */
> +static void disable_tsc(struct thread_info *prev,
> +			struct thread_info *next)
> +{
> +	if (unlikely(has_secure_computing(prev) ||
> +		     has_secure_computing(next))) {
> +		/* slow path here */
> +		if (has_secure_computing(prev) &&
> +		    !has_secure_computing(next)) {
> +			clear_in_cr4(X86_CR4_TSD);
> +		} else if (!has_secure_computing(prev) &&
> +			   has_secure_computing(next))
> +			set_in_cr4(X86_CR4_TSD);
> +	}
> +}
> +
> +/*
>   *	switch_to(x,yn) should switch tasks from x to y.
>   *
>   * We fsave/fwait so that an exception goes off at the right time
> @@ -639,6 +658,8 @@
>  	if (unlikely(prev->io_bitmap_ptr || next->io_bitmap_ptr))
>  		handle_io_bitmap(next, tss);
>
> +	disable_tsc(prev_p->thread_info, next_p->thread_info);
> +
>  	return prev_p;
>  }
>
> Index: arch/x86_64/kernel/process.c
> ===================================================================
> --- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/arch/x86_64/kernel/process.c  (mode:100644)
> +++ uncommitted/arch/x86_64/kernel/process.c  (mode:100644)
> @@ -439,6 +439,25 @@
>  }
>
>  /*
> + * This function selects if the context switch from prev to next
> + * has to tweak the TSC disable bit in the cr4.
> + */
> +static void disable_tsc(struct thread_info *prev,
> +			struct thread_info *next)
> +{
> +	if (unlikely(has_secure_computing(prev) ||
> +		     has_secure_computing(next))) {
> +		/* slow path here */
> +		if (has_secure_computing(prev) &&
> +		    !has_secure_computing(next)) {
> +			clear_in_cr4(X86_CR4_TSD);
> +		} else if (!has_secure_computing(prev) &&
> +			   has_secure_computing(next))
> +			set_in_cr4(X86_CR4_TSD);
> +	}
> +}
> +
> +/*
>   * This special macro can be used to load a debugging register
>   */
>  #define loaddebug(thread,r) set_debug(thread->debugreg ## r, r)
> @@ -556,6 +575,8 @@
>  		}
>  	}
>
> +	disable_tsc(prev_p->thread_info, next_p->thread_info);
> +
>  	return prev_p;
>  }
>
> Index: include/linux/seccomp.h
> ===================================================================
> --- eed337ef5e9ae7d62caa84b7974a11fddc7f06e0/include/linux/seccomp.h  (mode:100644)
> +++ uncommitted/include/linux/seccomp.h  (mode:100644)
> @@ -19,6 +19,11 @@
>  		__secure_computing(this_syscall);
>  }
>
> +static inline int has_secure_computing(struct thread_info *ti)
> +{
> +	return unlikely(test_ti_thread_flag(ti, TIF_SECCOMP));
> +}
> +
>  #else /* CONFIG_SECCOMP */
>
>  #if (__GNUC__ > 2)
> @@ -28,6 +33,7 @@
>  #endif
>
>  #define secure_computing(x) do { } while (0)
> +#define has_secure_computing(x) 0
>
>  #endif /* CONFIG_SECCOMP */
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-15  9:54     ` Andi Kleen
@ 2005-05-15 13:51       ` Mikulas Patocka
  2005-05-15 14:12         ` Andi Kleen
  0 siblings, 1 reply; 150+ messages in thread
From: Mikulas Patocka @ 2005-05-15 13:51 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel



On Sun, 15 May 2005, Andi Kleen wrote:

> On Fri, May 13, 2005 at 09:16:09PM +0200, Diego Calleja wrote:
> > El Fri, 13 May 2005 20:03:58 +0200,
> > Andi Kleen <ak@muc.de> escribió:
> >
> >
> > > This is not a kernel problem, but a user space problem. The fix
> > > is to change the user space crypto code to need the same number of cache line
> > > accesses on all keys.
> >
> >
> > However they've patched the FreeBSD kernel to "workaround?" it:
> > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch
>
> That's a similar stupid idea as they did with the disk write
> cache (lowering the MTBFs of their disks by considerable factors,
> which is much worse than the power off data loss problem)
> Let's not go down this path please.

What did they do wrong with the disk write cache?

Mikulas

> -Andi
>
>

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 21:26       ` Andy Isaacson
                           ` (2 preceding siblings ...)
  2005-05-15  9:43         ` Andi Kleen
@ 2005-05-15 14:00         ` Mikulas Patocka
  2005-05-15 14:26         ` Andi Kleen
  4 siblings, 0 replies; 150+ messages in thread
From: Mikulas Patocka @ 2005-05-15 14:00 UTC (permalink / raw)
  To: Andy Isaacson
  Cc: Andi Kleen, Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso



On Fri, 13 May 2005, Andy Isaacson wrote:

> On Fri, May 13, 2005 at 09:05:49PM +0200, Andi Kleen wrote:
> > On Fri, May 13, 2005 at 02:38:03PM -0400, Richard F. Rebel wrote:
> > > Why?  It's certainly reasonable to disable it for the time being and
> > > even prudent to do so.
> >
> > No, i strongly disagree on that. The reasonable thing to do is
> > to fix the crypto code which has this vulnerability, not break
> > a useful performance enhancement for everybody else.
>
> Pardon me for saying so, but that's bullshit.  You're asking the crypto
> guys to give up a 5x performance gain (that's my wild guess) by giving
> up all their data-dependent algorithms and contorting their code wildly,
> to avoid a microarchitectural problem with Intel's HT implementation.

That information leak can be exploited not only on HT or SMP, but on any
CPU with an L2 cache. Without HT it's much harder to get information about
the L2 cache footprint, but it's still possible. If an attacker can make an
unlimited number of connections to an ssh or http server and manages to get
1 bit per 100 connections, it's still a problem.

Possible solutions:
1) don't use branches and data-dependent memory accesses that depend on
secret data
2) flush the cache completely when switching to a process with a different
EUID (0.2 ms on a Pentium 4 with a 1M cache, even worse on CPUs with more
cache).

Disabling HT/SMP is not a solution. A year later someone may come up with
something like this:
* prefill the L2 cache with a known pattern
* sleep on some precise timer
* make a connection to a security application (ssh, https)
* on wakeup, read what's in the L2 cache --- this yields one bit with small
probability, but repeated many times it's still a problem

Mikulas

> There are three places to cut off the side channel, none of which is
> obviously the right one.
> 1. The HT implementation could do the cache tricks Colin suggested in
>    his paper.  Fairly large performance hit to address a fairly small
>    problem.
> 2. The OS could do the scheduler tricks to avoid scheduling unfriendly
>    threads on the same core.  You're leaving a lot of the benefit of HT
>    on the floor by doing so.
> 3. Every security-sensitive app can be rigorously audited and re-written
>    to avoid *ever* referencing memory with the address determined by
>    private data.
>
> (3) is a complete non-starter.  It's just not feasible to rewrite all
> that code.  Furthermore, there's no way to know what code needs to be
> rewritten!  (Until someone publishes an advisory, that is...)
>
> Hmm, I can't think of any reason that this technique wouldn't work to
> extract information from kernel secrets, as well...
>
> If SHA has plaintext-dependent memory references, Colin's technique
> would enable an adversary to extract the contents of the /dev/random
> pools.  I don't *think* SHA does, based on a quick reading of
> lib/sha1.c, but someone with an actual clue should probably take a look.
>
> Andi, are you prepared to *require* that no code ever make a memory
> reference as a function of a secret?  Because that's what you're
> suggesting the crypto people should do.
>
> -andy
>

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-15 13:51       ` Mikulas Patocka
@ 2005-05-15 14:12         ` Andi Kleen
  2005-05-15 14:21           ` Mikulas Patocka
  2005-05-15 14:52           ` Tomasz Torcz
  0 siblings, 2 replies; 150+ messages in thread
From: Andi Kleen @ 2005-05-15 14:12 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: linux-kernel

On Sun, May 15, 2005 at 03:51:05PM +0200, Mikulas Patocka wrote:
> 
> 
> On Sun, 15 May 2005, Andi Kleen wrote:
> 
> > On Fri, May 13, 2005 at 09:16:09PM +0200, Diego Calleja wrote:
> > > El Fri, 13 May 2005 20:03:58 +0200,
> > > Andi Kleen <ak@muc.de> escribió:
> > >
> > >
> > > > This is not a kernel problem, but a user space problem. The fix
> > > > is to change the user space crypto code to need the same number of cache line
> > > > accesses on all keys.
> > >
> > >
> > > However they've patched the FreeBSD kernel to "workaround?" it:
> > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch
> >
> > That's a similar stupid idea as they did with the disk write
> > cache (lowering the MTBFs of their disks by considerable factors,
> > which is much worse than the power off data loss problem)
> > Let's not go down this path please.
> 
> What wrong did they do with disk write cache?

They turned it off by default, which according to disk vendors
lowers the MTBF of your disk to a fraction of the original value.

I bet the total amount of valuable data lost for FreeBSD users because
of broken disks is much much bigger than what they gained from not losing
in the rather hard to hit power off cases.

-Andi

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-15 14:12         ` Andi Kleen
@ 2005-05-15 14:21           ` Mikulas Patocka
  2005-05-15 14:52           ` Tomasz Torcz
  1 sibling, 0 replies; 150+ messages in thread
From: Mikulas Patocka @ 2005-05-15 14:21 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel



On Sun, 15 May 2005, Andi Kleen wrote:

> On Sun, May 15, 2005 at 03:51:05PM +0200, Mikulas Patocka wrote:
> >
> >
> > On Sun, 15 May 2005, Andi Kleen wrote:
> >
> > > On Fri, May 13, 2005 at 09:16:09PM +0200, Diego Calleja wrote:
> > > > El Fri, 13 May 2005 20:03:58 +0200,
> > > > Andi Kleen <ak@muc.de> escribió:
> > > >
> > > >
> > > > > This is not a kernel problem, but a user space problem. The fix
> > > > > is to change the user space crypto code to need the same number of cache line
> > > > > accesses on all keys.
> > > >
> > > >
> > > > However they've patched the FreeBSD kernel to "workaround?" it:
> > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch
> > >
> > > That's a similar stupid idea as they did with the disk write
> > > cache (lowering the MTBFs of their disks by considerable factors,
> > > which is much worse than the power off data loss problem)
> > > Let's not go down this path please.
> >
> > What wrong did they do with disk write cache?
>
> They turned it off by default, which according to disk vendors
> lowers the MTBF of your disk to a fraction of the original value.
>
> I bet the total amount of valuable data lost for FreeBSD users because
> of broken disks is much much bigger than what they gained from not losing
> in the rather hard to hit power off cases.
>
> -Andi

BTW, is there any blacklist of disks with a broken FLUSH CACHE command, or
a list of companies that cheat in its implementation?

Mikulas

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 21:26       ` Andy Isaacson
                           ` (3 preceding siblings ...)
  2005-05-15 14:00         ` Mikulas Patocka
@ 2005-05-15 14:26         ` Andi Kleen
  4 siblings, 0 replies; 150+ messages in thread
From: Andi Kleen @ 2005-05-15 14:26 UTC (permalink / raw)
  To: Andy Isaacson; +Cc: Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso

> There are three places to cut off the side channel, none of which is
> obviously the right one.
> 1. The HT implementation could do the cache tricks Colin suggested in
>    his paper.  Fairly large performance hit to address a fairly small
>    problem.

As Dean pointed out, that is probably not true.

> 2. The OS could do the scheduler tricks to avoid scheduling unfriendly
>    threads on the same core.  You're leaving a lot of the benefit of HT
>    on the floor by doing so.

And probably still lose badly in some workloads.

> 3. Every security-sensitive app can be rigorously audited and re-written
>    to avoid *ever* referencing memory with the address determined by
>    private data.

Sure, once it has been demonstrated that this attack is actually feasible
in practice. If yes, then fix the crypto code; otherwise do nothing.

I have no problem with crypto people being paranoid (that is their
job, after all), as long as they don't try to affect non-crypto code
in the process. But the latter seems to be clearly the case here :-(
> 
> (3) is a complete non-starter.  It's just not feasible to rewrite all
> that code.  Furthermore, there's no way to know what code needs to be
> rewritten!  (Until someone publishes an advisory, that is...)
> 
> Hmm, I can't think of any reason that this technique wouldn't work to
> extract information from kernel secrets, as well... 
> 
> If SHA has plaintext-dependent memory references, Colin's technique
> would enable an adversary to extract the contents of the /dev/random
> pools.  I don't *think* SHA does, based on a quick reading of
> lib/sha1.c, but someone with an actual clue should probably take a look.
> 
> Andi, are you prepared to *require* that no code ever make a memory
> reference as a function of a secret?  Because that's what you're
> suggesting the crypto people should do.

No, just don't do it frequently enough to leak enough data.
Or add dummy memory references to blend your data.

And then nobody said writing crypto code was easy. It just got a bit
harder today.

It is basically like writing smart card code, where you need
to care about such side channels. The other crypto code writers
just need to care about this too. They will probably avoid
other timing attacks on cache misses with this approach as well.
It is doubtful that enough signal leaks this way --- e.g. if you time
the performance of a network server answering RSA requests over the
network, some data is always leaked, but the question is whether it is
enough, and accurate enough, to aid an attacker. The paper has shown
that it is feasible in some cases, but the proof is still out whether
this could actually be replicated under less controlled loads. With
more noise in the data it becomes harder. And the question is whether
the small amount of data obtainable under a normal background workload
is really useful enough to lead to real-world attacks. I have severe
doubts about that. Certainly the evidence is not clear enough for a
serious step like disabling a useful performance enhancement like HT.

-Andi

P.S.: My personal opinion is that on many systems we have a far bigger
crypto security problem due to weak /dev/random seeding.
If anything is done, it would be better to attack that.



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-15 14:12         ` Andi Kleen
  2005-05-15 14:21           ` Mikulas Patocka
@ 2005-05-15 14:52           ` Tomasz Torcz
  2005-05-15 15:00             ` Disk write cache (Was: Hyper-Threading Vulnerability) Mikulas Patocka
  2005-05-15 15:00             ` Hyper-Threading Vulnerability Arjan van de Ven
  1 sibling, 2 replies; 150+ messages in thread
From: Tomasz Torcz @ 2005-05-15 14:52 UTC (permalink / raw)
  To: linux-kernel

On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote:
> > > > However they've patched the FreeBSD kernel to "workaround?" it:
> > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch
> > >
> > > That's a similar stupid idea as they did with the disk write
> > > cache (lowering the MTBFs of their disks by considerable factors,
> > > which is much worse than the power off data loss problem)
> > > Let's not go down this path please.
> > 
> > What wrong did they do with disk write cache?
> 
> They turned it off by default, which according to disk vendors
> lowers the MTBF of your disk to a fraction of the original value.
> 
> I bet the total amount of valuable data lost for FreeBSD users because
> of broken disks is much much bigger than what they gained from not losing
> in the rather hard to hit power off cases.

 Aren't I/O barriers a way to safely use the write cache?

-- 
Tomasz Torcz                 "God, root, what's the difference?"
zdzichu@irc.-nie.spam-.pl         "God is more forgiving."


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-15 14:52           ` Tomasz Torcz
@ 2005-05-15 15:00             ` Mikulas Patocka
  2005-05-15 15:21               ` Gene Heskett
  2005-05-16 14:50               ` Alan Cox
  2005-05-15 15:00             ` Hyper-Threading Vulnerability Arjan van de Ven
  1 sibling, 2 replies; 150+ messages in thread
From: Mikulas Patocka @ 2005-05-15 15:00 UTC (permalink / raw)
  To: Tomasz Torcz; +Cc: linux-kernel



On Sun, 15 May 2005, Tomasz Torcz wrote:

> On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote:
> > > > > However they've patched the FreeBSD kernel to "workaround?" it:
> > > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/htt5.patch
> > > >
> > > > That's a similar stupid idea as they did with the disk write
> > > > cache (lowering the MTBFs of their disks by considerable factors,
> > > > which is much worse than the power off data loss problem)
> > > > Let's not go down this path please.
> > >
> > > What wrong did they do with disk write cache?
> >
> > They turned it off by default, which according to disk vendors
> > lowers the MTBF of your disk to a fraction of the original value.
> >
> > I bet the total amount of valuable data lost for FreeBSD users because
> > of broken disks is much much bigger than what they gained from not losing
> > in the rather hard to hit power off cases.
>
>  Aren't I/O barriers a way to safely use write cache?

FreeBSD used these barriers (the FLUSH CACHE command) a long time ago.

There are rumors that some disks ignore the FLUSH CACHE command just to get
higher benchmark numbers in Windows. But I haven't heard of any proof. Does
anybody know which companies fake this command?

Mikulas

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-15 14:52           ` Tomasz Torcz
  2005-05-15 15:00             ` Disk write cache (Was: Hyper-Threading Vulnerability) Mikulas Patocka
@ 2005-05-15 15:00             ` Arjan van de Ven
  1 sibling, 0 replies; 150+ messages in thread
From: Arjan van de Ven @ 2005-05-15 15:00 UTC (permalink / raw)
  To: Tomasz Torcz; +Cc: linux-kernel


> > They turned it off by default, which according to disk vendors
> > lowers the MTBF of your disk to a fraction of the original value.
> > 
> > I bet the total amount of valuable data lost for FreeBSD users because
> > of broken disks is much much bigger than what they gained from not losing
> > in the rather hard to hit power off cases.
> 
>  Aren't I/O barriers a way to safely use write cache?

Yes, they are. However, they do of course also decrease the MTBF somewhat,
although less so than disabling the cache entirely....



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-15 15:00             ` Disk write cache (Was: Hyper-Threading Vulnerability) Mikulas Patocka
@ 2005-05-15 15:21               ` Gene Heskett
  2005-05-15 15:29                 ` Jeff Garzik
                                   ` (2 more replies)
  2005-05-16 14:50               ` Alan Cox
  1 sibling, 3 replies; 150+ messages in thread
From: Gene Heskett @ 2005-05-15 15:21 UTC (permalink / raw)
  To: linux-kernel

On Sunday 15 May 2005 11:00, Mikulas Patocka wrote:
>On Sun, 15 May 2005, Tomasz Torcz wrote:
>> On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote:
>> > > > > However they've patched the FreeBSD kernel to
>> > > > > "workaround?" it:
>> > > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/ht
>> > > > >t5.patch
>> > > >
>> > > > That's a similar stupid idea as they did with the disk write
>> > > > cache (lowering the MTBFs of their disks by considerable
>> > > > factors, which is much worse than the power off data loss
>> > > > problem) Let's not go down this path please.
>> > >
>> > > What wrong did they do with disk write cache?
>> >
>> > They turned it off by default, which according to disk vendors
>> > lowers the MTBF of your disk to a fraction of the original
>> > value.
>> >
>> > I bet the total amount of valuable data lost for FreeBSD users
>> > because of broken disks is much much bigger than what they
>> > gained from not losing in the rather hard to hit power off
>> > cases.
>>
>>  Aren't I/O barriers a way to safely use write cache?
>
>FreeBSD used these barriers (FLUSH CACHE command) a long time ago.
>
>There are rumors that some disks ignore the FLUSH CACHE command just to
> get higher benchmarks in Windows. But I haven't heard of any proof.
> Does anybody know which companies fake this command?
>
From a story I read elsewhere just a few days ago, this problem is 
virtually universal even in the umpty-bucks 15,000 rpm SCSI server 
drives.  It appears that this is just another way to crank up the 
numbers and make each drive seem faster than its competition.

My gut feeling is that if this gets enough ink to get under the drive 
makers' skins, we will see the makers issue a utility that re-programs 
the drives, thereby enabling proper handling of the FLUSH CACHE 
command.  This would be an excellent chance, IMO, to make a bit of 
noise if the utility comes out but only runs on Windows.  In that 
event, we either hold their feet to the fire (the preferable method), 
or a wrapper is written that allows it to run on any OS with a 
bash-like shell.

>Mikulas
>-
>To unsubscribe from this list: send the line "unsubscribe
> linux-kernel" in the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
>Please read the FAQ at  http://www.tux.org/lkml/

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.34% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-15 15:21               ` Gene Heskett
@ 2005-05-15 15:29                 ` Jeff Garzik
  2005-05-15 16:27                   ` Disk write cache Kenichi Okuyama
  2005-05-16  1:56                   ` Disk write cache (Was: Hyper-Threading Vulnerability) Gene Heskett
  2005-05-15 16:24                 ` Mikulas Patocka
  2005-05-15 21:38                 ` Tomasz Torcz
  2 siblings, 2 replies; 150+ messages in thread
From: Jeff Garzik @ 2005-05-15 15:29 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel

On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote:
> On Sunday 15 May 2005 11:00, Mikulas Patocka wrote:
> >On Sun, 15 May 2005, Tomasz Torcz wrote:
> >> On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote:
> >> > > > > However they've patched the FreeBSD kernel to
> >> > > > > "workaround?" it:
> >> > > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/ht
> >> > > > >t5.patch
> >> > > >
> >> > > > That's a similar stupid idea as they did with the disk write
> >> > > > cache (lowering the MTBFs of their disks by considerable
> >> > > > factors, which is much worse than the power off data loss
> >> > > > problem) Let's not go down this path please.
> >> > >
> >> > > What wrong did they do with disk write cache?
> >> >
> >> > They turned it off by default, which according to disk vendors
> >> > lowers the MTBF of your disk to a fraction of the original
> >> > value.
> >> >
> >> > I bet the total amount of valuable data lost for FreeBSD users
> >> > because of broken disks is much much bigger than what they
> >> > gained from not losing in the rather hard to hit power off
> >> > cases.
> >>
> >>  Aren't I/O barriers a way to safely use write cache?
> >
> >FreeBSD used these barriers (FLUSH CACHE command) long time ago.
> >
> >There are rumors that some disks ignore FLUSH CACHE command just to
> > get higher benchmarks in Windows. But I haven't heart of any proof.
> > Does anybody know, what companies fake this command?
> >
> >From a story I read elsewhere just a few days ago, this problem is 
> virtually universal even in the umpty-bucks 15,000 rpm scsi server 
> drives.  It appears that this is just another way to crank up the 
> numbers and make each drive seem faster than its competition.
> 
> My gut feeling is that if this gets enough ink to get under the drive 
> makers skins, we will see the issuance of a utility from the makers 
> that will re-program the drives therefore enabling the proper 
> handling of the FLUSH CACHE command.  This would be an excellent 
> chance IMO, to make a bit of noise if the utility comes out, but only 
> runs on windows.  In that event, we hold their feet to the fire (the 
> prefereable method), or a wrapper is written that allows it to run on 
> any os with a bash-like shell manager.


There is a large amount of yammering and speculation in this thread.

Most disks do seem to obey SYNC CACHE / FLUSH CACHE.

	Jeff




^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-15 15:21               ` Gene Heskett
  2005-05-15 15:29                 ` Jeff Garzik
@ 2005-05-15 16:24                 ` Mikulas Patocka
  2005-05-16 11:18                   ` Matthias Andree
  2005-05-15 21:38                 ` Tomasz Torcz
  2 siblings, 1 reply; 150+ messages in thread
From: Mikulas Patocka @ 2005-05-15 16:24 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel



On Sun, 15 May 2005, Gene Heskett wrote:

> On Sunday 15 May 2005 11:00, Mikulas Patocka wrote:
> >On Sun, 15 May 2005, Tomasz Torcz wrote:
> >> On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote:
> >> > > > > However they've patched the FreeBSD kernel to
> >> > > > > "workaround?" it:
> >> > > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/ht
> >> > > > >t5.patch
> >> > > >
> >> > > > That's a similar stupid idea as they did with the disk write
> >> > > > cache (lowering the MTBFs of their disks by considerable
> >> > > > factors, which is much worse than the power off data loss
> >> > > > problem) Let's not go down this path please.
> >> > >
> >> > > What wrong did they do with disk write cache?
> >> >
> >> > They turned it off by default, which according to disk vendors
> >> > lowers the MTBF of your disk to a fraction of the original
> >> > value.
> >> >
> >> > I bet the total amount of valuable data lost for FreeBSD users
> >> > because of broken disks is much much bigger than what they
> >> > gained from not losing in the rather hard to hit power off
> >> > cases.
> >>
> >>  Aren't I/O barriers a way to safely use write cache?
> >
> >FreeBSD used these barriers (FLUSH CACHE command) long time ago.
> >
> >There are rumors that some disks ignore FLUSH CACHE command just to
> > get higher benchmarks in Windows. But I haven't heart of any proof.
> > Does anybody know, what companies fake this command?
> >
> From a story I read elsewhere just a few days ago, this problem is
> virtually universal even in the umpty-bucks 15,000 rpm scsi server
> drives.  It appears that this is just another way to crank up the
> numbers and make each drive seem faster than its competition.

I've just run a test on my Western Digital 40G IDE disk:

just writes without flush cache: 1min 33sec
same access pattern, but flush cache after each write: 20min 7sec (and
the disk made more noise)
(this testcase does many 1-sector writes to the same or adjacent sectors,
so the cache helps a lot here)

So it's likely that this disk honours cache flushing.
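
For anyone wanting to repeat this, the comparison can be sketched as a
small user-space program (a hedged illustration, not the actual test used
above; the file name and iteration count are invented):

```python
# Sketch of the timing comparison: rewrite the same 512-byte block many
# times, once without syncing and once calling fdatasync() after every
# write, then compare elapsed times.  If the drive honours the flush,
# the synced pass pays a media write (seek + rotation) per iteration and
# is far slower; near-identical times hint that the flush is being
# absorbed somewhere short of the platter.
import os
import time

def timed_pass(path, nwrites=200, sync_each=False):
    """Seconds taken to rewrite one 512-byte block nwrites times."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    buf = b"\xa5" * 512
    t0 = time.monotonic()
    try:
        for _ in range(nwrites):
            os.pwrite(fd, buf, 0)    # same offset: worst case for a real flush
            if sync_each:
                os.fdatasync(fd)     # ask for the data to be committed now
    finally:
        os.close(fd)
    return time.monotonic() - t0

if __name__ == "__main__":
    fast = timed_pass("flushtest.dat", sync_each=False)
    slow = timed_pass("flushtest.dat", sync_each=True)
    os.unlink("flushtest.dat")
    print("no sync: %.3fs   fdatasync: %.3fs   ratio: %.1f"
          % (fast, slow, slow / max(fast, 1e-9)))
```

Whether f[data]sync actually reaches the drive as a FLUSH CACHE depends on
the kernel's write-barrier support, so a suspiciously fast synced pass may
implicate the kernel path rather than the disk.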

(But the disk contains another severe bug --- it corrupts its
cache-coherency logic when 256-sector accesses are used --- I asked WD
about it and got no response. 256 is represented as 0 in the IDE
registers --- that's probably where the bug came from.)

I've also heard a lot of rumors about disks ignoring cache flush --- but I
mean, has anybody actually proven that some disk corrupts data this way?
I.e.:

make a program that repeatedly does this:
write some sector
issue the flush cache command
send a packet recording what was written where

... and turn off the machine while this program runs, then see if the disk
contains all the data from the packets.

or

write many small sectors
issue flush cache
turn off power via ACPI
on the next reboot, see if the disk contains all the data


Note that a disk can still ignore the FLUSH CACHE command if the cached
data are small enough to be written out on power loss, so a short FLUSH
CACHE time doesn't prove the disk is cheating.
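
The first experiment can be sketched like this (a hedged illustration: the
peer address, record layout, and offsets are invented, and the power cut
and post-mortem comparison remain manual steps):

```python
# Sketch of the power-off test: each round writes a recognisable record,
# asks for it to be flushed, and only then reports it over UDP to a
# second machine.  After cutting power mid-run, compare the receiver's
# log with the disk: any record that was reported but is absent from the
# disk was acknowledged as flushed before it was actually durable.
import os
import socket
import struct
import time

SECTOR = 512

def make_record(seq):
    """One 512-byte record: little-endian sequence number + timestamp, zero-padded."""
    return struct.pack("<Qd", seq, time.time()).ljust(SECTOR, b"\0")

def run(path, peer=("192.0.2.1", 9999), rounds=100000):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for seq in range(rounds):
            offset = (seq % 64) * SECTOR   # cycle over a small region of the file
            os.pwrite(fd, make_record(seq), offset)
            os.fdatasync(fd)               # claim of durability...
            sock.sendto(struct.pack("<Qq", seq, offset), peer)  # ...then report it
    finally:
        os.close(fd)
        sock.close()
```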

Mikulas

> My gut feeling is that if this gets enough ink to get under the drive
> makers skins, we will see the issuance of a utility from the makers
> that will re-program the drives therefore enabling the proper
> handling of the FLUSH CACHE command.  This would be an excellent
> chance IMO, to make a bit of noise if the utility comes out, but only
> runs on windows.  In that event, we hold their feet to the fire (the
> prefereable method), or a wrapper is written that allows it to run on
> any os with a bash-like shell manager.
>
> >Mikulas
> >-
> >To unsubscribe from this list: send the line "unsubscribe
> > linux-kernel" in the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >Please read the FAQ at  http://www.tux.org/lkml/
>
> --
> Cheers, Gene
> "There are four boxes to be used in defense of liberty:
>  soap, ballot, jury, and ammo. Please use in that order."
> -Ed Howdershelt (Author)
> 99.34% setiathome rank, not too shabby for a WV hillbilly
> Yahoo.com and AOL/TW attorneys please note, additions to the above
> message by Gene Heskett are:
> Copyright 2005 by Maurice Eugene Heskett, all rights reserved.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache
  2005-05-15 15:29                 ` Jeff Garzik
@ 2005-05-15 16:27                   ` Kenichi Okuyama
  2005-05-15 16:43                     ` Jeff Garzik
  2005-05-16  1:56                   ` Disk write cache (Was: Hyper-Threading Vulnerability) Gene Heskett
  1 sibling, 1 reply; 150+ messages in thread
From: Kenichi Okuyama @ 2005-05-15 16:27 UTC (permalink / raw)
  To: jgarzik; +Cc: gene.heskett, linux-kernel

>>>>> "Jeff" == Jeff Garzik <jgarzik@pobox.com> writes:

Jeff> On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote:
>> On Sunday 15 May 2005 11:00, Mikulas Patocka wrote:
>> >On Sun, 15 May 2005, Tomasz Torcz wrote:
>> >> On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote:
>> >> > > > > However they've patched the FreeBSD kernel to
>> >> > > > > "workaround?" it:
>> >> > > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/ht
>> >> > > > >t5.patch
>> >> > > >
>> >> > > > That's a similar stupid idea as they did with the disk write
>> >> > > > cache (lowering the MTBFs of their disks by considerable
>> >> > > > factors, which is much worse than the power off data loss
>> >> > > > problem) Let's not go down this path please.
>> >> > >
>> >> > > What wrong did they do with disk write cache?
>> >> >
>> >> > They turned it off by default, which according to disk vendors
>> >> > lowers the MTBF of your disk to a fraction of the original
>> >> > value.
>> >> >
>> >> > I bet the total amount of valuable data lost for FreeBSD users
>> >> > because of broken disks is much much bigger than what they
>> >> > gained from not losing in the rather hard to hit power off
>> >> > cases.
>> >>
>> >>  Aren't I/O barriers a way to safely use write cache?
>> >
>> >FreeBSD used these barriers (FLUSH CACHE command) long time ago.
>> >
>> >There are rumors that some disks ignore FLUSH CACHE command just to
>> > get higher benchmarks in Windows. But I haven't heart of any proof.
>> > Does anybody know, what companies fake this command?
>> >
>> >From a story I read elsewhere just a few days ago, this problem is 
>> virtually universal even in the umpty-bucks 15,000 rpm scsi server 
>> drives.  It appears that this is just another way to crank up the 
>> numbers and make each drive seem faster than its competition.
>> 
>> My gut feeling is that if this gets enough ink to get under the drive 
>> makers skins, we will see the issuance of a utility from the makers 
>> that will re-program the drives therefore enabling the proper 
>> handling of the FLUSH CACHE command.  This would be an excellent 
>> chance IMO, to make a bit of noise if the utility comes out, but only 
>> runs on windows.  In that event, we hold their feet to the fire (the 
>> prefereable method), or a wrapper is written that allows it to run on 
>> any os with a bash-like shell manager.


Jeff> There is a large amount of yammering and speculation in this thread.

Jeff> Most disks do seem to obey SYNC CACHE / FLUSH CACHE.


Then it must be the file system that's not controlling the cache
properly.  And because this is so widespread in Linux, there must be at
least one bug in the VFS (or there was, and everyone copied it).

At least, from:

	http://developer.osdl.jp/projects/doubt/

there is a project named "diskio" which runs a black-box test of this:

	http://developer.osdl.jp/projects/doubt/diskio/index.html

And if we take Read-after-Write semantics --- the HDD itself verifying
the data image on the disk surface --- as the criterion for "surely"
written, then on both SCSI and ATA, ALL the file systems fail the test.

And I was wondering which is at fault: the file system, the device
drivers (both SCSI and ATA), or the criterion itself?  From Jeff's
point, it seems to be the file system or the criterion...
---- 
Kenichi Okuyama

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache
  2005-05-15 16:27                   ` Disk write cache Kenichi Okuyama
@ 2005-05-15 16:43                     ` Jeff Garzik
  2005-05-15 16:50                       ` Kyle Moffett
                                         ` (4 more replies)
  0 siblings, 5 replies; 150+ messages in thread
From: Jeff Garzik @ 2005-05-15 16:43 UTC (permalink / raw)
  To: Kenichi Okuyama; +Cc: gene.heskett, linux-kernel

Kenichi Okuyama wrote:
>>>>>>"Jeff" == Jeff Garzik <jgarzik@pobox.com> writes:
> 
> 
> Jeff> On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote:
> 
>>>On Sunday 15 May 2005 11:00, Mikulas Patocka wrote:
>>>
>>>>On Sun, 15 May 2005, Tomasz Torcz wrote:
>>>>
>>>>>On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote:
>>>>>
>>>>>>>>>However they've patched the FreeBSD kernel to
>>>>>>>>>"workaround?" it:
>>>>>>>>>ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/ht
>>>>>>>>>t5.patch
>>>>>>>>
>>>>>>>>That's a similar stupid idea as they did with the disk write
>>>>>>>>cache (lowering the MTBFs of their disks by considerable
>>>>>>>>factors, which is much worse than the power off data loss
>>>>>>>>problem) Let's not go down this path please.
>>>>>>>
>>>>>>>What wrong did they do with disk write cache?
>>>>>>
>>>>>>They turned it off by default, which according to disk vendors
>>>>>>lowers the MTBF of your disk to a fraction of the original
>>>>>>value.
>>>>>>
>>>>>>I bet the total amount of valuable data lost for FreeBSD users
>>>>>>because of broken disks is much much bigger than what they
>>>>>>gained from not losing in the rather hard to hit power off
>>>>>>cases.
>>>>>
>>>>> Aren't I/O barriers a way to safely use write cache?
>>>>
>>>>FreeBSD used these barriers (FLUSH CACHE command) long time ago.
>>>>
>>>>There are rumors that some disks ignore FLUSH CACHE command just to
>>>>get higher benchmarks in Windows. But I haven't heart of any proof.
>>>>Does anybody know, what companies fake this command?
>>>>
>>>
>>>>From a story I read elsewhere just a few days ago, this problem is 
>>>virtually universal even in the umpty-bucks 15,000 rpm scsi server 
>>>drives.  It appears that this is just another way to crank up the 
>>>numbers and make each drive seem faster than its competition.
>>>
>>>My gut feeling is that if this gets enough ink to get under the drive 
>>>makers skins, we will see the issuance of a utility from the makers 
>>>that will re-program the drives therefore enabling the proper 
>>>handling of the FLUSH CACHE command.  This would be an excellent 
>>>chance IMO, to make a bit of noise if the utility comes out, but only 
>>>runs on windows.  In that event, we hold their feet to the fire (the 
>>>prefereable method), or a wrapper is written that allows it to run on 
>>>any os with a bash-like shell manager.
> 
> 
> 
> Jeff> There is a large amount of yammering and speculation in this thread.
> 
> Jeff> Most disks do seem to obey SYNC CACHE / FLUSH CACHE.
> 
> 
> Then it must be file system who's not controlling properly.  And
> because this is so widely spread among Linux, there must be at least
> one bug existing in VFS ( or there was, and everyone copied it ).
> 
> At least, from:
> 
> 	http://developer.osdl.jp/projects/doubt/
> 
> there is project name "diskio" which does black box test about this:
> 
> 	http://developer.osdl.jp/projects/doubt/diskio/index.html
> 
> And if we assume for Read after Write access semantics of HDD for
> "SURELY" checking the data image on disk surface ( by HDD, I mean ),
> on both SCSI and ATA, ALL the file system does not pass the test.
> 
> And I was wondering who's bad. File system? Device driver of both
> SCSI and ATA? or criterion? From Jeff's point, it seems like file
> system or criterion...

The ability of a filesystem or fsync(2) to cause a [FLUSH|SYNC] CACHE 
command to be generated has only been present in the most recent 2.6.x 
kernels.  See the "write barrier" stuff that people have been discussing.

Furthermore, read-after-write implies nothing at all.  The only way 
you can be assured that your data has "hit the platter" is
(1) issuing [FLUSH|SYNC] CACHE, or
(2) using FUA-style disk commands

It sounds like your test (or reasoning) is invalid.
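
For reference, option (1) can be issued by hand from user space the way
hdparm does, through the legacy HDIO_DRIVE_CMD ioctl. This is a sketch
only: it needs root and a real ATA device node, and the device path is a
placeholder.

```python
# Hand the drive an explicit ATA FLUSH CACHE (opcode 0xE7) through the
# legacy HDIO_DRIVE_CMD ioctl -- the same mechanism hdparm uses.
import fcntl
import os
import struct

HDIO_DRIVE_CMD = 0x031f   # from <linux/hdreg.h>
ATA_FLUSH_CACHE = 0xE7    # ATA command opcode

def flush_drive_cache(dev="/dev/hda"):
    """Block until the drive reports its write cache flushed (or errors)."""
    fd = os.open(dev, os.O_RDONLY)
    try:
        # args layout: command, sector number, feature, sector count
        args = bytearray(struct.pack("4B", ATA_FLUSH_CACHE, 0, 0, 0))
        fcntl.ioctl(fd, HDIO_DRIVE_CMD, args)
    finally:
        os.close(fd)
```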

	Jeff




^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache
  2005-05-15 16:43                     ` Jeff Garzik
@ 2005-05-15 16:50                       ` Kyle Moffett
  2005-05-15 16:56                       ` Andi Kleen
                                         ` (3 subsequent siblings)
  4 siblings, 0 replies; 150+ messages in thread
From: Kyle Moffett @ 2005-05-15 16:50 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Kenichi Okuyama, gene.heskett, linux-kernel

On May 15, 2005, at 12:43:07, Jeff Garzik wrote:
> The only way to you can be assured that your data has "hit the  
> platter" is
> (1) issuing [FLUSH|SYNC] CACHE, or
> (2) using FUA-style disk commands

And even then, some battery-backed RAID controllers will completely
ignore cache-flushes, because in the event of a power failure they
can maintain the cached data for anywhere from a couple days to a
month or two, depending on the quality of the card and the size of
its battery.


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache
  2005-05-15 16:43                     ` Jeff Garzik
  2005-05-15 16:50                       ` Kyle Moffett
@ 2005-05-15 16:56                       ` Andi Kleen
  2005-05-15 20:44                         ` Andrew Morton
  2005-05-15 16:58                       ` Disk write cache Mikulas Patocka
                                         ` (2 subsequent siblings)
  4 siblings, 1 reply; 150+ messages in thread
From: Andi Kleen @ 2005-05-15 16:56 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: gene.heskett, linux-kernel, okuyamak

Jeff Garzik <jgarzik@pobox.com> writes:
>
> The ability of a filesystem or fsync(2) to cause a [FLUSH|SYNC] CACHE
> command to be generated has only been present in the most recent 2.6.x
> kernels.  See the "write barrier" stuff that people have been
> discussing.

Are you sure mainline does it for fsync() file data at all? IIRC it
was only done for journal writes in reiserfs/xfs/jbd. However, since
I suppose a lot of disks flush everything pending on a flush cache
command, it still works, assuming the file systems write the 
data to disk in fsync() before syncing the journal. I don't know
if they do that.

-Andi

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache
  2005-05-15 16:43                     ` Jeff Garzik
  2005-05-15 16:50                       ` Kyle Moffett
  2005-05-15 16:56                       ` Andi Kleen
@ 2005-05-15 16:58                       ` Mikulas Patocka
  2005-05-15 17:20                       ` Kenichi Okuyama
  2005-05-16 11:02                       ` Linux does not care for data integrity (was: Disk write cache) Matthias Andree
  4 siblings, 0 replies; 150+ messages in thread
From: Mikulas Patocka @ 2005-05-15 16:58 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Kenichi Okuyama, gene.heskett, linux-kernel



On Sun, 15 May 2005, Jeff Garzik wrote:

> Kenichi Okuyama wrote:
> >>>>>>"Jeff" == Jeff Garzik <jgarzik@pobox.com> writes:
> >
> >
> > Jeff> On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote:
> >
> >>>On Sunday 15 May 2005 11:00, Mikulas Patocka wrote:
> >>>
> >>>>On Sun, 15 May 2005, Tomasz Torcz wrote:
> >>>>
> >>>>>On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote:
> >>>>>
> >>>>>>>>>However they've patched the FreeBSD kernel to
> >>>>>>>>>"workaround?" it:
> >>>>>>>>>ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/ht
> >>>>>>>>>t5.patch
> >>>>>>>>
> >>>>>>>>That's a similar stupid idea as they did with the disk write
> >>>>>>>>cache (lowering the MTBFs of their disks by considerable
> >>>>>>>>factors, which is much worse than the power off data loss
> >>>>>>>>problem) Let's not go down this path please.
> >>>>>>>
> >>>>>>>What wrong did they do with disk write cache?
> >>>>>>
> >>>>>>They turned it off by default, which according to disk vendors
> >>>>>>lowers the MTBF of your disk to a fraction of the original
> >>>>>>value.
> >>>>>>
> >>>>>>I bet the total amount of valuable data lost for FreeBSD users
> >>>>>>because of broken disks is much much bigger than what they
> >>>>>>gained from not losing in the rather hard to hit power off
> >>>>>>cases.
> >>>>>
> >>>>> Aren't I/O barriers a way to safely use write cache?
> >>>>
> >>>>FreeBSD used these barriers (FLUSH CACHE command) long time ago.
> >>>>
> >>>>There are rumors that some disks ignore FLUSH CACHE command just to
> >>>>get higher benchmarks in Windows. But I haven't heart of any proof.
> >>>>Does anybody know, what companies fake this command?
> >>>>
> >>>
> >>>>From a story I read elsewhere just a few days ago, this problem is
> >>>virtually universal even in the umpty-bucks 15,000 rpm scsi server
> >>>drives.  It appears that this is just another way to crank up the
> >>>numbers and make each drive seem faster than its competition.
> >>>
> >>>My gut feeling is that if this gets enough ink to get under the drive
> >>>makers skins, we will see the issuance of a utility from the makers
> >>>that will re-program the drives therefore enabling the proper
> >>>handling of the FLUSH CACHE command.  This would be an excellent
> >>>chance IMO, to make a bit of noise if the utility comes out, but only
> >>>runs on windows.  In that event, we hold their feet to the fire (the
> >>>prefereable method), or a wrapper is written that allows it to run on
> >>>any os with a bash-like shell manager.
> >
> >
> >
> > Jeff> There is a large amount of yammering and speculation in this thread.
> >
> > Jeff> Most disks do seem to obey SYNC CACHE / FLUSH CACHE.
> >
> >
> > Then it must be file system who's not controlling properly.  And
> > because this is so widely spread among Linux, there must be at least
> > one bug existing in VFS ( or there was, and everyone copied it ).
> >
> > At least, from:
> >
> > 	http://developer.osdl.jp/projects/doubt/
> >
> > there is project name "diskio" which does black box test about this:
> >
> > 	http://developer.osdl.jp/projects/doubt/diskio/index.html
> >
> > And if we assume for Read after Write access semantics of HDD for
> > "SURELY" checking the data image on disk surface ( by HDD, I mean ),
> > on both SCSI and ATA, ALL the file system does not pass the test.
> >
> > And I was wondering who's bad. File system? Device driver of both
> > SCSI and ATA? or criterion? From Jeff's point, it seems like file
> > system or criterion...
>
> The ability of a filesystem or fsync(2) to cause a [FLUSH|SYNC] CACHE
> command to be generated has only been present in the most recent 2.6.x
> kernels.  See the "write barrier" stuff that people have been discussing.
>
> Furthermore, read-after-write implies nothing at all.  The only way to
> you can be assured that your data has "hit the platter" is
> (1) issuing [FLUSH|SYNC] CACHE, or
> (2) using FUA-style disk commands
>
> It sounds like your test (or reasoning) is invalid.

The above program checks that write+[f[data]]sync took longer than the
time required to transmit the data over the IDE bus. It has nothing to do
with the FLUSH CACHE command at all.

The results just show that ext3 used to have a bug in f[data]sync in
data-journal mode and that XFS still has a bug in fdatasync on 2.4 kernels.

Incorrect results in this test can't be caused by a buggy disk.

Mikulas

> 	Jeff
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache
  2005-05-15 16:43                     ` Jeff Garzik
                                         ` (2 preceding siblings ...)
  2005-05-15 16:58                       ` Disk write cache Mikulas Patocka
@ 2005-05-15 17:20                       ` Kenichi Okuyama
  2005-05-16 11:02                       ` Linux does not care for data integrity (was: Disk write cache) Matthias Andree
  4 siblings, 0 replies; 150+ messages in thread
From: Kenichi Okuyama @ 2005-05-15 17:20 UTC (permalink / raw)
  To: jgarzik; +Cc: gene.heskett, linux-kernel

>>>>> "Jeff" == Jeff Garzik <jgarzik@pobox.com> writes:

Jeff> Kenichi Okuyama wrote:
>>>>>>> "Jeff" == Jeff Garzik <jgarzik@pobox.com> writes:
>> 
>> 
Jeff> On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote:
>> 
>>>> On Sunday 15 May 2005 11:00, Mikulas Patocka wrote:
>>>> 
>>>>> On Sun, 15 May 2005, Tomasz Torcz wrote:
>>>>> 
>>>>>> On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote:
>>>>>> 
>>>>>>>>>> However they've patched the FreeBSD kernel to
>>>>>>>>>> "workaround?" it:
>>>>>>>>>> ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09/ht
>>>>>>>>>> t5.patch
>>>>>>>>> 
>>>>>>>>> That's a similar stupid idea as they did with the disk write
>>>>>>>>> cache (lowering the MTBFs of their disks by considerable
>>>>>>>>> factors, which is much worse than the power off data loss
>>>>>>>>> problem) Let's not go down this path please.
>>>>>>>> 
>>>>>>>> What wrong did they do with disk write cache?
>>>>>>> 
>>>>>>> They turned it off by default, which according to disk vendors
>>>>>>> lowers the MTBF of your disk to a fraction of the original
>>>>>>> value.
>>>>>>> 
>>>>>>> I bet the total amount of valuable data lost for FreeBSD users
>>>>>>> because of broken disks is much much bigger than what they
>>>>>>> gained from not losing in the rather hard to hit power off
>>>>>>> cases.
>>>>>> 
>>>>> Aren't I/O barriers a way to safely use write cache?
>>>>> 
>>>>> FreeBSD used these barriers (FLUSH CACHE command) long time ago.
>>>>> 
>>>>> There are rumors that some disks ignore FLUSH CACHE command just to
>>>>> get higher benchmarks in Windows. But I haven't heart of any proof.
>>>>> Does anybody know, what companies fake this command?
>>>>> 
>>>> 
>>>>> From a story I read elsewhere just a few days ago, this problem is 
>>>> virtually universal even in the umpty-bucks 15,000 rpm scsi server 
>>>> drives.  It appears that this is just another way to crank up the 
>>>> numbers and make each drive seem faster than its competition.
>>>> 
>>>> My gut feeling is that if this gets enough ink to get under the drive 
>>>> makers skins, we will see the issuance of a utility from the makers 
>>>> that will re-program the drives therefore enabling the proper 
>>>> handling of the FLUSH CACHE command.  This would be an excellent 
>>>> chance IMO, to make a bit of noise if the utility comes out, but only 
>>>> runs on windows.  In that event, we hold their feet to the fire (the 
>>>> prefereable method), or a wrapper is written that allows it to run on 
>>>> any os with a bash-like shell manager.
>> 
>> 
>> 
Jeff> There is a large amount of yammering and speculation in this thread.
>> 
Jeff> Most disks do seem to obey SYNC CACHE / FLUSH CACHE.
>> 
>> 
>> Then it must be file system who's not controlling properly.  And
>> because this is so widely spread among Linux, there must be at least
>> one bug existing in VFS ( or there was, and everyone copied it ).
>> 
>> At least, from:
>> 
>> http://developer.osdl.jp/projects/doubt/
>> 
>> there is project name "diskio" which does black box test about this:
>> 
>> http://developer.osdl.jp/projects/doubt/diskio/index.html
>> 
>> And if we assume for Read after Write access semantics of HDD for
>> "SURELY" checking the data image on disk surface ( by HDD, I mean ),
>> on both SCSI and ATA, ALL the file system does not pass the test.
>> 
>> And I was wondering who's bad. File system? Device driver of both
>> SCSI and ATA? or criterion? From Jeff's point, it seems like file
>> system or criterion...

Jeff> The ability of a filesystem or fsync(2) to cause a [FLUSH|SYNC] CACHE 
Jeff> command to be generated has only been present in the most recent 2.6.x 
Jeff> kernels.  See the "write barrier" stuff that people have been discussing.

Jeff> Furthermore, read-after-write implies nothing at all.  The only way to 
Jeff> you can be assured that your data has "hit the platter" is
Jeff> (1) issuing [FLUSH|SYNC] CACHE, or
Jeff> (2) using FUA-style disk commands

Jeff> It sounds like your test (or reasoning) is invalid.

Thank you for your information, Jeff.

I didn't see why my reasoning would be invalid, since these are
black-box tests and don't care about the implementation.

But with your explanation and some logs, I see where to look.
I'll run the test on FreeBSD as soon as I get time.
If FreeBSD fails, there must be something wrong with the reasoning.

Thanks again for the great hint.
regards,
---- 
Kenichi Okuyama

^ permalink raw reply	[flat|nested] 150+ messages in thread

* RE: Hyper-Threading Vulnerability
  2005-05-15  9:43         ` Andi Kleen
@ 2005-05-15 18:42           ` David Schwartz
  2005-05-15 18:56             ` Dr. David Alan Gilbert
  2005-05-16  7:10           ` Eric W. Biederman
  1 sibling, 1 reply; 150+ messages in thread
From: David Schwartz @ 2005-05-15 18:42 UTC (permalink / raw)
  To: linux-kernel


Andi Kleen wrote:

> And what you're doing is to ask all the non crypto guys to give
> up an useful optimization just to fix a problem in the crypto guy's
> code. The cache line information leak is just a information leak
> bug in the crypto code, not a general problem.

	Portable code shouldn't even have to know that there is such a thing as a
cache line. It should be able to rely on the operating system not to let
other tasks with a different security context spy on the details of its
operation.

> There is much more non crypto code than crypto code around - you
> are proposing to screw the majority of codes to solve a relatively
> obscure problem of only a few functions, which seems like the totally
> wrong approach to me.

	That I do agree with.

> BTW the crypto guys are always free to check for hyperthreading
> themselves and use different functions.  However there is a catch
> there - the modern dual core processors which actually have
> separated L1 and L2 caches set these too to stay compatible
> with old code and license managers.

	This is just a recipe for making it impossible to write correct code. If
you don't believe the operating system or the hardware is at all at fault
for this problem, then it would follow that they could repeat this same
problem with some new mechanism and still not be at fault. So even if the
program checked for hyper-threading, it would still not be correct. It would
have to check for every possible future way this same type of problem could
arise and hide every type of trace that they could create, even if that
trace is in optimization mechanisms and potential channels that the
programmer cannot know about because they don't exist yet.

	Let's try a reductio ad absurdum. Surely you would agree that something
other than the crypto software is at fault if the operating system or
hardware allowed another process with a different security context to see
every instruction the code executed. The crypto authors shouldn't be
expected to make the instruction flows look identical. How different is
monitoring the memory accesses?

	Portable, POSIX-compliant C software shouldn't even have to know that there
is such a thing as a cache line.

	I'm not going to be unreasonable though. Hyper-threading is here, and now
that we know the potential problems, it's not unreasonable to ask developers
of crypto code to work around it. But it's not a bug in their code that they
need to fix. In fact, they can't even fix it yet because there is no
portable way to determine if you're on a machine that has hyper-threading or
not.

	DS




* Re: Hyper-Threading Vulnerability
  2005-05-15 18:42           ` David Schwartz
@ 2005-05-15 18:56             ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 150+ messages in thread
From: Dr. David Alan Gilbert @ 2005-05-15 18:56 UTC (permalink / raw)
  To: David Schwartz; +Cc: linux-kernel

* David Schwartz (davids@webmaster.com) wrote:
> 
> Andi Kleen wrote:
> 
> > And what you're doing is to ask all the non crypto guys to give
> > up an useful optimization just to fix a problem in the crypto guy's
> > code. The cache line information leak is just a information leak
> > bug in the crypto code, not a general problem.
> 
> 	Portable code shouldn't even have to know that there is such a thing as a
> cache line. It should be able to rely on the operating system not to let
> other tasks with a different security context spy on the details of its
> operation.

I find it interesting to compare this thread with one from about a
week ago about /proc/cpuinfo not being consistent across
architectures - where we went round the same question of whether
application writers shouldn't care about, are too dumb for, shouldn't
need to know about, or can't be trusted with knowing what the real
hardware is.

Personally I think this is a good case of where the application
should take care of it - with whatever support the OS can really
give.

(That is, if this is actually a real problem and not purely
theoretical - my crypto knowledge isn't good enough to answer
that, but it feels very abstract.)

Dave
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    | Running GNU/Linux on Alpha,68K| Happy  \ 
\ gro.gilbert @ treblig.org | MIPS,x86,ARM,SPARC,PPC & HPPA | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/


* Re: Hyper-Threading Vulnerability
  2005-05-15  7:30                                   ` Arjan van de Ven
@ 2005-05-15 20:41                                     ` Alan Cox
  2005-05-15 20:48                                       ` Arjan van de Ven
  0 siblings, 1 reply; 150+ messages in thread
From: Alan Cox @ 2005-05-15 20:41 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Lee Revell, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sul, 2005-05-15 at 08:30, Arjan van de Ven wrote:
> stop entirely.... (and that is also happening more and more and linux is
> getting more agressive idle support (eg no timer tick and such patches)
> which will trigger bios thresholds for this even more too.

Cyrix did TSC stop on halt a long long time ago, back when it was worth
the power difference.

Alan



* Re: Disk write cache
  2005-05-15 16:56                       ` Andi Kleen
@ 2005-05-15 20:44                         ` Andrew Morton
  2005-05-15 23:31                           ` Cache based insecurity/CPU cache/Disk Cache Tradeoffs Brian O'Mahoney
  0 siblings, 1 reply; 150+ messages in thread
From: Andrew Morton @ 2005-05-15 20:44 UTC (permalink / raw)
  To: Andi Kleen; +Cc: jgarzik, gene.heskett, linux-kernel, okuyamak

Andi Kleen <ak@muc.de> wrote:
>
> However since
>  I suppose a lot of disks flush everything pending on a flush cache
>  command it still works assuming the file systems write the 
>  data to disk in fsync before syncing the journal. I don't know
>  if they do that.


ext3 does, in data=journal and data=ordered modes.


* Re: Hyper-Threading Vulnerability
  2005-05-15 20:41                                     ` Alan Cox
@ 2005-05-15 20:48                                       ` Arjan van de Ven
  2005-05-15 21:10                                         ` Lee Revell
  0 siblings, 1 reply; 150+ messages in thread
From: Arjan van de Ven @ 2005-05-15 20:48 UTC (permalink / raw)
  To: Alan Cox
  Cc: Lee Revell, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sun, 2005-05-15 at 21:41 +0100, Alan Cox wrote:
> On Sul, 2005-05-15 at 08:30, Arjan van de Ven wrote:
> > stop entirely.... (and that is also happening more and more and linux is
> > getting more agressive idle support (eg no timer tick and such patches)
> > which will trigger bios thresholds for this even more too.
> 
> Cyrix did TSC stop on halt a long long time ago, back when it was worth
> the power difference.

With Linux going into ACPI C2 mode more... the TSC is defined to halt in C2...




* Re: Hyper-Threading Vulnerability
  2005-05-15 20:48                                       ` Arjan van de Ven
@ 2005-05-15 21:10                                         ` Lee Revell
  2005-05-15 22:55                                           ` Dave Jones
  0 siblings, 1 reply; 150+ messages in thread
From: Lee Revell @ 2005-05-15 21:10 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Alan Cox, Dave Jones, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sun, 2005-05-15 at 22:48 +0200, Arjan van de Ven wrote:
> On Sun, 2005-05-15 at 21:41 +0100, Alan Cox wrote:
> > On Sul, 2005-05-15 at 08:30, Arjan van de Ven wrote:
> > > stop entirely.... (and that is also happening more and more and linux is
> > > getting more agressive idle support (eg no timer tick and such patches)
> > > which will trigger bios thresholds for this even more too.
> > 
> > Cyrix did TSC stop on halt a long long time ago, back when it was worth
> > the power difference.
> 
> With linux going to ACPI C2 mode more... tsc is defined to halt in C2...

JACK doesn't care about any of this now; the behavior when you
suspend/resume with a running jackd is undefined.  Eventually we should
handle it, but there's no point until the ALSA drivers get proper
suspend/resume support.

Lee



* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-15 15:21               ` Gene Heskett
  2005-05-15 15:29                 ` Jeff Garzik
  2005-05-15 16:24                 ` Mikulas Patocka
@ 2005-05-15 21:38                 ` Tomasz Torcz
  2 siblings, 0 replies; 150+ messages in thread
From: Tomasz Torcz @ 2005-05-15 21:38 UTC (permalink / raw)
  To: linux-kernel

On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote:
> >FreeBSD used these barriers (FLUSH CACHE command) long time ago.
> >
> >There are rumors that some disks ignore FLUSH CACHE command just to
> > get higher benchmarks in Windows. But I haven't heart of any proof.
> > Does anybody know, what companies fake this command?
> >
> >From a story I read elsewhere just a few days ago, this problem is 
> virtually universal even in the umpty-bucks 15,000 rpm scsi server 
> drives.  It appears that this is just another way to crank up the 
> numbers and make each drive seem faster than its competition.

 You are probably talking about this: http://www.livejournal.com/~brad/2116715.html
It hit Slashdot yesterday.

-- 
Tomasz Torcz                 "God, root, what's the difference?"
zdzichu@irc.-nie.spam-.pl         "God is more forgiving."



* Re: Hyper-Threading Vulnerability
  2005-05-15 21:10                                         ` Lee Revell
@ 2005-05-15 22:55                                           ` Dave Jones
  2005-05-15 23:10                                             ` Lee Revell
  0 siblings, 1 reply; 150+ messages in thread
From: Dave Jones @ 2005-05-15 22:55 UTC (permalink / raw)
  To: Lee Revell
  Cc: Arjan van de Ven, Alan Cox, Matt Mackall, Andy Isaacson,
	Andi Kleen, Richard F. Rebel, Gabor MICSKO,
	Linux Kernel Mailing List, tytso

On Sun, May 15, 2005 at 05:10:59PM -0400, Lee Revell wrote:
 > On Sun, 2005-05-15 at 22:48 +0200, Arjan van de Ven wrote:
 > > On Sun, 2005-05-15 at 21:41 +0100, Alan Cox wrote:
 > > > On Sul, 2005-05-15 at 08:30, Arjan van de Ven wrote:
 > > > > stop entirely.... (and that is also happening more and more and linux is
 > > > > getting more agressive idle support (eg no timer tick and such patches)
 > > > > which will trigger bios thresholds for this even more too.
 > > > 
 > > > Cyrix did TSC stop on halt a long long time ago, back when it was worth
 > > > the power difference.
 > > 
 > > With linux going to ACPI C2 mode more... tsc is defined to halt in C2...
 > 
 > JACK doesn't care about any of this now, the behavior when you
 > suspend/resume with a running jackd is undefined.  Eventually we should
 > handle it, but there's no point until the ALSA drivers get proper
 > suspend/resume support.

Suspend/resume are S states, not C states. C states occur
during runtime.

		Dave



* Re: Hyper-Threading Vulnerability
  2005-05-15 22:55                                           ` Dave Jones
@ 2005-05-15 23:10                                             ` Lee Revell
  2005-05-16  7:25                                               ` Arjan van de Ven
  0 siblings, 1 reply; 150+ messages in thread
From: Lee Revell @ 2005-05-15 23:10 UTC (permalink / raw)
  To: Dave Jones
  Cc: Arjan van de Ven, Alan Cox, Matt Mackall, Andy Isaacson,
	Andi Kleen, Richard F. Rebel, Gabor MICSKO,
	Linux Kernel Mailing List, tytso

On Sun, 2005-05-15 at 18:55 -0400, Dave Jones wrote:
> On Sun, May 15, 2005 at 05:10:59PM -0400, Lee Revell wrote:
>  > On Sun, 2005-05-15 at 22:48 +0200, Arjan van de Ven wrote:
>  > > On Sun, 2005-05-15 at 21:41 +0100, Alan Cox wrote:
>  > > > On Sul, 2005-05-15 at 08:30, Arjan van de Ven wrote:
>  > > > > stop entirely.... (and that is also happening more and more and linux is
>  > > > > getting more agressive idle support (eg no timer tick and such patches)
>  > > > > which will trigger bios thresholds for this even more too.
>  > > > 
>  > > > Cyrix did TSC stop on halt a long long time ago, back when it was worth
>  > > > the power difference.
>  > > 
>  > > With linux going to ACPI C2 mode more... tsc is defined to halt in C2...
>  > 
>  > JACK doesn't care about any of this now, the behavior when you
>  > suspend/resume with a running jackd is undefined.  Eventually we should
>  > handle it, but there's no point until the ALSA drivers get proper
>  > suspend/resume support.
> 
> suspend/resume are S states, not C states. C states are occuring
> during runtime.

It should never go into C2 if jackd is running, because you're getting
interrupts from the audio interface at least every 100ms or so (usually
much more often) which will wake up jackd and any clients.

Lee



* Cache based insecurity/CPU cache/Disk Cache Tradeoffs
  2005-05-15 20:44                         ` Andrew Morton
@ 2005-05-15 23:31                           ` Brian O'Mahoney
  0 siblings, 0 replies; 150+ messages in thread
From: Brian O'Mahoney @ 2005-05-15 23:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

In principle, it is correct that CPU caches should _not_ permit, or
facilitate data leakage attacks and disk caches should _not_ prevent
applications from ensuring that data is really transferred to non-
volatile storage.

But turning Hyper-Threading, multiple ALUs, or disk caching off in the
OS is not a solution, it is a cop-out, and as other posters have pointed
out, simply invites other more serious failure modes; thus the BSD
knee-jerk reactions are simply wrong, and in fact counterproductive.
The name of the game is a correct fix, not a fast one. Don't make things
worse.

So what really does need doing:

(a) a power-is-failing hook which does a dirty writeback and flushes
the cache to disk; this is the best you can do, and it is very cheap
to provide DC power hold-up for tens to hundreds of seconds, by which
time even the crap disks will do an autonomous writeback anyway (1-10 F
at +5 V/+12 V, say ~12 USD), or, on servers, use a UPS with, say, 30
minutes of hold-up.

Well-designed servers and SAN disks have this built in.

(b) CPU registers and caches are inherently insecure, and most
hardware designers still do not have a good enough background to
understand what the OS really needs to do this right in hardware:

so secure apps need a way to tell the OS to do an _expensive_
context switch in which it is guaranteed to flush all leaky context,
and since this is architecture-, model-, sub-architecture-, even
mask-step-dependent, it can only be done in the OS; but user-land
needs a way to tell the OS to be paranoid after the context
save and before scheduling another real context (excluding
the idle loop). This is an API extension - ulimit?

This lets user-land avoid including architecture-dependent code,
and lets most context switches stay no more expensive than they are
now.

Almost no applications need paranoid context flushes, and they can't
know how to do them themselves, so this has to go in the
model-dependent OS code, with a user-mode API to turn it on per-thread.



-- 
With kind regards, Brian.

Dr. Brian O'Mahoney
Mobile +41 (0)79 334 8035 Email: omb@bluewin.ch
Bleicherstrasse 25, CH-8953 Dietikon, Switzerland
PGP Key fingerprint = 33 41 A2 DE 35 7C CE 5D  F5 14 39 C9 6D 38 56 D5


* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-15 15:29                 ` Jeff Garzik
  2005-05-15 16:27                   ` Disk write cache Kenichi Okuyama
@ 2005-05-16  1:56                   ` Gene Heskett
  2005-05-16  2:11                     ` Jeff Garzik
                                       ` (2 more replies)
  1 sibling, 3 replies; 150+ messages in thread
From: Gene Heskett @ 2005-05-16  1:56 UTC (permalink / raw)
  To: linux-kernel

On Sunday 15 May 2005 11:29, Jeff Garzik wrote:
>On Sun, May 15, 2005 at 11:21:36AM -0400, Gene Heskett wrote:
>> On Sunday 15 May 2005 11:00, Mikulas Patocka wrote:
>> >On Sun, 15 May 2005, Tomasz Torcz wrote:
>> >> On Sun, May 15, 2005 at 04:12:07PM +0200, Andi Kleen wrote:
>> >> > > > > However they've patched the FreeBSD kernel to
>> >> > > > > "workaround?" it:
>> >> > > > > ftp://ftp.freebsd.org/pub/FreeBSD/CERT/patches/SA-05:09
>> >> > > > >/ht t5.patch
>> >> > > >
>> >> > > > That's a similar stupid idea as they did with the disk
>> >> > > > write cache (lowering the MTBFs of their disks by
>> >> > > > considerable factors, which is much worse than the power
>> >> > > > off data loss problem) Let's not go down this path
>> >> > > > please.
>> >> > >
>> >> > > What wrong did they do with disk write cache?
>> >> >
>> >> > They turned it off by default, which according to disk
>> >> > vendors lowers the MTBF of your disk to a fraction of the
>> >> > original value.
>> >> >
>> >> > I bet the total amount of valuable data lost for FreeBSD
>> >> > users because of broken disks is much much bigger than what
>> >> > they gained from not losing in the rather hard to hit power
>> >> > off cases.
>> >>
>> >>  Aren't I/O barriers a way to safely use write cache?
>> >
>> >FreeBSD used these barriers (FLUSH CACHE command) long time ago.
>> >
>> >There are rumors that some disks ignore FLUSH CACHE command just
>> > to get higher benchmarks in Windows. But I haven't heart of any
>> > proof. Does anybody know, what companies fake this command?
>> >
>> >From a story I read elsewhere just a few days ago, this problem
>> > is
>>
>> virtually universal even in the umpty-bucks 15,000 rpm scsi server
>> drives.  It appears that this is just another way to crank up the
>> numbers and make each drive seem faster than its competition.
>>
>> My gut feeling is that if this gets enough ink to get under the
>> drive makers skins, we will see the issuance of a utility from the
>> makers that will re-program the drives therefore enabling the
>> proper handling of the FLUSH CACHE command.  This would be an
>> excellent chance IMO, to make a bit of noise if the utility comes
>> out, but only runs on windows.  In that event, we hold their feet
>> to the fire (the prefereable method), or a wrapper is written that
>> allows it to run on any os with a bash-like shell manager.
>
>There is a large amount of yammering and speculation in this thread.

I agree, and frankly I'm just another of the yammerers, as I don't 
have the clout to be otherwise.

>Most disks do seem to obey SYNC CACHE / FLUSH CACHE.
>
> Jeff

I don't think I have any drives here that do obey that, Jeff.  I got 
curious about this, oh, maybe a year back when this discussion first 
took place on another list, and wrote a test gizmo that copied a 
large file, then slept for 1 second and issued a sync command.  No 
drive LED activity until the usual 5-second delay of the filesystem 
had expired.  To me, that indicated that the sync command was being 
returned as completed without error, and I had my shell prompt back 
long before the drives' LEDs came on.  Admittedly that may not be a 
100% valid test, but I really did expect to see the LEDs come on as 
the sync command was executed.

I also have some setup stuff for heyu that runs at various times of 
the day, reconfiguring how heyu and xtend run 3 times a day here, 
which depends on a valid disk file, and I've had to use sleeps to 
guarantee the proper sequencing; if the sync command actually 
worked, I could get the job done quite a bit faster.

Again, probably not a valid test of the sync command, but that's the 
evidence I have.  I do not believe it works here with any of the 5 
drives currently spinning in these two boxes.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.34% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.


* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-16  1:56                   ` Disk write cache (Was: Hyper-Threading Vulnerability) Gene Heskett
@ 2005-05-16  2:11                     ` Jeff Garzik
  2005-05-16  2:24                     ` Mikulas Patocka
  2005-05-16  2:32                     ` Mark Lord
  2 siblings, 0 replies; 150+ messages in thread
From: Jeff Garzik @ 2005-05-16  2:11 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel

Gene Heskett wrote:
> I don't think I have any drives here that do obey that, Jeff.  I got 
> curious about this, oh, maybe a year back when this discussion first 
> took place on another list, and wrote a test gizmo that copied a 
> large file, then slept for 1 second and issued a sync command.  No 
> drive led activity until the usual 5 second delay of the filesystem 
> had expired.  To me, that indicated that the sync command was being 
> returned as completed without error and I had my shell prompt back 
> long before the drives leds came on.  Admittedly that may not be a 
> 100% valid test, but I really did expect to see the leds come on as 
> the sync command was executed.

> Again, probably not a valid test of the sync command, but thats the 
> evidence I have.  I do not believe it works here, with any of the 5 
> drives currently spinning in these two boxes.

Correct, that's a pretty poor test.

	Jeff




* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-16  1:56                   ` Disk write cache (Was: Hyper-Threading Vulnerability) Gene Heskett
  2005-05-16  2:11                     ` Jeff Garzik
@ 2005-05-16  2:24                     ` Mikulas Patocka
  2005-05-16  3:05                       ` Gene Heskett
  2005-05-16  2:32                     ` Mark Lord
  2 siblings, 1 reply; 150+ messages in thread
From: Mikulas Patocka @ 2005-05-16  2:24 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel



On Sun, 15 May 2005, Gene Heskett wrote:

> >There is a large amount of yammering and speculation in this thread.
>
> I agree, and frankly I'm just another  of the yammerers as I don't
> have the clout to be otherwise.
>
> >Most disks do seem to obey SYNC CACHE / FLUSH CACHE.
> >
> > Jeff
>
> I don't think I have any drives here that do obey that, Jeff.  I got
> curious about this, oh, maybe a year back when this discussion first
> took place on another list, and wrote a test gizmo that copied a
> large file, then slept for 1 second and issued a sync command.  No
> drive led activity until the usual 5 second delay of the filesystem
> had expired.  To me, that indicated that the sync command was being
> returned as completed without error and I had my shell prompt back
> long before the drives leds came on.  Admittedly that may not be a
> 100% valid test, but I really did expect to see the leds come on as
> the sync command was executed.
>
> I also have some setup stuff for heyu that runs at various times of
> the day, reconfigureing how heyu and xtend run 3 times a day here,
> which depends on a valid disk file, and I've had to use sleeps for
> guaranteeing the proper sequencing, where if the sync command
> actually worked, I could get the job done quite a bit faster.
>
> Again, probably not a valid test of the sync command, but thats the
> evidence I have.  I do not believe it works here, with any of the 5
> drives currently spinning in these two boxes.

Note that Linux couldn't send the FLUSH CACHE command at all until very
recent 2.6 kernels. So the write cache is always dangerous under Linux,
no matter whether the disk is broken or not.

Another note: according to POSIX, sync() is asynchronous --- i.e. it
initiates the writes but doesn't have to wait for them to complete. In
Linux, sync() waits for the writes to complete, but it doesn't have to
in other OSes.

Mikulas

> --
> Cheers, Gene
> "There are four boxes to be used in defense of liberty:
>  soap, ballot, jury, and ammo. Please use in that order."
> -Ed Howdershelt (Author)
> 99.34% setiathome rank, not too shabby for a WV hillbilly
> Yahoo.com and AOL/TW attorneys please note, additions to the above
> message by Gene Heskett are:
> Copyright 2005 by Maurice Eugene Heskett, all rights reserved.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>


* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-16  1:56                   ` Disk write cache (Was: Hyper-Threading Vulnerability) Gene Heskett
  2005-05-16  2:11                     ` Jeff Garzik
  2005-05-16  2:24                     ` Mikulas Patocka
@ 2005-05-16  2:32                     ` Mark Lord
  2005-05-16  3:08                       ` Gene Heskett
  2005-05-18  4:03                       ` Eric D. Mudama
  2 siblings, 2 replies; 150+ messages in thread
From: Mark Lord @ 2005-05-16  2:32 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel

 >took place on another list, and wrote a test gizmo that copied a
 >large file, then slept for 1 second and issued a sync command.  No
 >drive led activity until the usual 5 second delay of the filesystem
 >had expired.  To me, that indicated that the sync command was being

There's your clue.  The drive LEDs normally reflect activity
over the ATA bus (the cable!). If they're not on, then the drive
isn't receiving data/commands from the host.

Cheers


* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-16  2:24                     ` Mikulas Patocka
@ 2005-05-16  3:05                       ` Gene Heskett
  0 siblings, 0 replies; 150+ messages in thread
From: Gene Heskett @ 2005-05-16  3:05 UTC (permalink / raw)
  To: linux-kernel

On Sunday 15 May 2005 22:24, Mikulas Patocka wrote:
>On Sun, 15 May 2005, Gene Heskett wrote:
>> >There is a large amount of yammering and speculation in this
>> > thread.
>>
>> I agree, and frankly I'm just another  of the yammerers as I don't
>> have the clout to be otherwise.
>>
>> >Most disks do seem to obey SYNC CACHE / FLUSH CACHE.
>> >
>> > Jeff
>>
>> I don't think I have any drives here that do obey that, Jeff.  I
>> got curious about this, oh, maybe a year back when this discussion
>> first took place on another list, and wrote a test gizmo that
>> copied a large file, then slept for 1 second and issued a sync
>> command.  No drive led activity until the usual 5 second delay of
>> the filesystem had expired.  To me, that indicated that the sync
>> command was being returned as completed without error and I had my
>> shell prompt back long before the drives leds came on.  Admittedly
>> that may not be a 100% valid test, but I really did expect to see
>> the leds come on as the sync command was executed.
>>
>> I also have some setup stuff for heyu that runs at various times
>> of the day, reconfigureing how heyu and xtend run 3 times a day
>> here, which depends on a valid disk file, and I've had to use
>> sleeps for guaranteeing the proper sequencing, where if the sync
>> command actually worked, I could get the job done quite a bit
>> faster.
>>
>> Again, probably not a valid test of the sync command, but thats
>> the evidence I have.  I do not believe it works here, with any of
>> the 5 drives currently spinning in these two boxes.
>
>Note, that Linux can't send FLUSH CACHE command at all (until very
> recent 2.6 kernels). So write cache is always dangerous under
> Linux, no matter if disk is broken or not.
>
>Another note: according to posix, sync() is asynchronous --- i.e. it
>initiates write, but doesn't have to wait for complete. In linux,
> sync() waits for writes to complete, but it doesn't have to in
> other OSes.
>
>Mikulas
>
Hmm, I'm getting the impression I should rerun that test script if I 
can find it.  I believe the last time I tried it I was running a 
2.4.x kernel; right now it's 2.6.12-rc1.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.34% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.


* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-16  2:32                     ` Mark Lord
@ 2005-05-16  3:08                       ` Gene Heskett
  2005-05-16 13:44                         ` Mark Lord
  2005-05-18  4:03                       ` Eric D. Mudama
  1 sibling, 1 reply; 150+ messages in thread
From: Gene Heskett @ 2005-05-16  3:08 UTC (permalink / raw)
  To: linux-kernel

On Sunday 15 May 2005 22:32, Mark Lord wrote:
> >took place on another list, and wrote a test gizmo that copied a
> >large file, then slept for 1 second and issued a sync command.  No
> >drive led activity until the usual 5 second delay of the
> > filesystem had expired.  To me, that indicated that the sync
> > command was being
>
>There's your clue.  The drive LEDs normally reflect activity
>over the ATA bus (the cable!). If they're not on, then the drive
>isn't receiving data/commands from the host.
>
>Cheers

That was my theory too, Mark, but Jeff G. says it's not a valid 
indicator.  So who's right?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.34% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2005 by Maurice Eugene Heskett, all rights reserved.


* Re: Hyper-Threading Vulnerability
  2005-05-15 13:38                       ` Mikulas Patocka
@ 2005-05-16  7:06                         ` andrea
  0 siblings, 0 replies; 150+ messages in thread
From: andrea @ 2005-05-16  7:06 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: Alan Cox, Lee Revell, Dave Jones, Matt Mackall, Andy Isaacson,
	Andi Kleen, Richard F. Rebel, Gabor MICSKO,
	Linux Kernel Mailing List, tytso, Andrew Morton

On Sun, May 15, 2005 at 03:38:22PM +0200, Mikulas Patocka wrote:
> Another possibility to get timing is from direct-io --- i.e. initiate
> direct io read, wait until one cache line contains new data and you can be
> sure that the next will contain new data in certain time. IDE controller
> bus master operation acts here as a timer.

There's no way to do direct I/O through seccomp; all the fds are pipes
with a twisted userland listening on the other side of the pipe. So
disabling the TSC is more than enough to give CPUShare users peace of
mind with HT enabled, without having to flush the L2 cache either.
CPUShare is the only case I can imagine where untrusted, random
bytecode running at 100% system load is the normal behaviour.


* Re: Hyper-Threading Vulnerability
  2005-05-15  9:43         ` Andi Kleen
  2005-05-15 18:42           ` David Schwartz
@ 2005-05-16  7:10           ` Eric W. Biederman
  2005-05-16 11:04             ` Andi Kleen
  1 sibling, 1 reply; 150+ messages in thread
From: Eric W. Biederman @ 2005-05-16  7:10 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andy Isaacson, Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso

Andi Kleen <ak@muc.de> writes:

> On Fri, May 13, 2005 at 02:26:20PM -0700, Andy Isaacson wrote:
> > On Fri, May 13, 2005 at 09:05:49PM +0200, Andi Kleen wrote:
> > > On Fri, May 13, 2005 at 02:38:03PM -0400, Richard F. Rebel wrote:
> > > > Why?  It's certainly reasonable to disable it for the time being and
> > > > even prudent to do so.
> > > 
> > > No, i strongly disagree on that. The reasonable thing to do is
> > > to fix the crypto code which has this vulnerability, not break
> > > a useful performance enhancement for everybody else.
> > 
> > Pardon me for saying so, but that's bullshit.  You're asking the crypto
> > guys to give up a 5x performance gain (that's my wild guess) by giving
> > up all their data-dependent algorithms and contorting their code wildly,
> > to avoid a microarchitectural problem with Intel's HT implementation.
> 
> And what you're doing is to ask all the non crypto guys to give
> up a useful optimization just to fix a problem in the crypto guys'
> code. The cache line information leak is just an information leak
> bug in the crypto code, not a general problem.

It is not a problem in the crypto code; it is a mis-feature of
the hardware/kernel combination.  As such you must be intimately
familiar with each and every flavor of the hardware to attempt to avoid
it in the software, and that way lies madness.

First, this is a reminder that perfect security requires an audit
of the hardware as well as the software.  As we are neither
auditing the hardware nor locking it down, we obviously will not
achieve perfection.  The question then becomes: what can be done
to decrease the likelihood that an application will inadvertently
and unavoidably leak information from timing attacks due to unknown
hardware optimizations?  Attacks that do not result from hardware
micro-architecture are another problem, and one an application can
anticipate and avoid.

Ideally a solution will be proposed that allows this problem
to be avoided using the existing POSIX API, or at least the current
linux kernel API.  But that may not be possible.

The only solution I have seen proposed so far that seems to work
is to not schedule untrusted processes simultaneously with 
the security code.  With the current API that sounds like
a root process killing off, or at least stopping all non-root
processes until the critical process has finished.

Potentially the scheduler can be modified to do this at a finer
grain, but I don't know if this would impact the scheduler fast
path.  Given the rarity and uncertainty of this, it should probably
be something that a process worried about security asks for,
instead of simply getting it by default.

So it looks to me like the sanest way to handle this is to
allocate a pool of threads/processes one per cpu.  Set the
affinity of each process to a particular cpu.  And set priority
of the threads to run at the highest priority.  And during the
critical time ensure none of the threads are sleeping.
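
A hedged sketch of this thread-pool idea, using the Linux scheduling syscalls as exposed by Python's os module (SCHED_FIFO needs CAP_SYS_NICE, so the sketch degrades to plain pinned busy-threads when unprivileged; the busy-wait also contends on CPython's GIL, so this illustrates the API, not an efficient implementation):

```python
import os
import threading

def occupy_cpu(cpu, stop):
    """Pin the calling thread to one CPU and busy-spin so nothing
    untrusted can be co-scheduled there (Linux-only APIs)."""
    os.sched_setaffinity(0, {cpu})        # pid 0 == calling thread
    try:
        # Highest RT priority, as proposed above; needs CAP_SYS_NICE.
        prio = os.sched_get_priority_max(os.SCHED_FIFO)
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(prio))
    except PermissionError:
        pass                              # unprivileged: best effort only
    while not stop.is_set():
        pass                              # burn the sibling CPU

def shield_critical_section(work):
    """Run work() while one busy thread occupies every other CPU."""
    stop = threading.Event()
    cpus = sorted(os.sched_getaffinity(0))
    mine, others = cpus[0], cpus[1:]
    threads = [threading.Thread(target=occupy_cpu, args=(c, stop))
               for c in others]
    for t in threads:
        t.start()
    os.sched_setaffinity(0, {mine})       # keep the secret work here
    try:
        return work()
    finally:
        stop.set()
        for t in threads:
            t.join()
```

The weak point is exactly the one discussed here: without RT priority there is no guarantee the busy threads actually run for the whole critical window.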

Can someone see a better way to prevent an accidental information
leak due to hardware architecture details?

I wish there were a better way to ensure all of the threads are
running simultaneously, other than giving them the highest priority
in the system, but I don't currently see an alternative.

> There is much more non crypto code than crypto code around - you
> are proposing to screw the majority of code to solve a relatively
> obscure problem of only a few functions, which seems like the totally
> wrong approach to me.
> 
> BTW the crypto guys are always free to check for hyperthreading
> themselves and use different functions.  However there is a catch
> there - the modern dual core processors which actually have
> separated L1 and L2 caches set these too to stay compatible
> with old code and license managers.

And those same processors will have the same problem if they share
significant cpu resources.  Ideally the entire problem set
would fit in the cache and the cpu designers would allow cache
blocks to be locked, but that is not currently the case.  So a shared
L3 cache with dual core processors will have the same problem.

In addition, a flavor of this attack may be mounted by repeatedly doing
multiplies or other activities that exercise functional units and seeing
how long they must be waited for.  So even hyperthreading without
a shared L2 cache may see this problem.
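
The "check for hyperthreading themselves" approach quoted above can be sketched by parsing /proc/cpuinfo; the heuristic here ('ht' flag present and siblings > cpu cores) is an assumption about typical x86 Linux layouts, and, as the quote notes, dual-core CPUs may set the ht flag anyway for compatibility:

```python
def ht_active(cpuinfo_text):
    """Heuristic HT detection: 'ht' flag present and more hardware
    siblings than physical cores (typical x86 /proc/cpuinfo fields)."""
    flags, siblings, cores = set(), None, None
    for line in cpuinfo_text.splitlines():
        key, _, val = line.partition(":")
        key, val = key.strip(), val.strip()
        if key == "flags":
            flags = set(val.split())
        elif key == "siblings":
            siblings = int(val)
        elif key == "cpu cores":
            cores = int(val)
    if "ht" not in flags or siblings is None or cores is None:
        return False      # older kernels may omit the topology fields
    return siblings > cores

if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print("HT active:", ht_active(f.read()))
```

A crypto library could run such a check once at initialization and select a less cache-dependent code path when it returns True.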

Eric

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-15 23:10                                             ` Lee Revell
@ 2005-05-16  7:25                                               ` Arjan van de Ven
  0 siblings, 0 replies; 150+ messages in thread
From: Arjan van de Ven @ 2005-05-16  7:25 UTC (permalink / raw)
  To: Lee Revell
  Cc: Dave Jones, Alan Cox, Matt Mackall, Andy Isaacson, Andi Kleen,
	Richard F. Rebel, Gabor MICSKO, Linux Kernel Mailing List, tytso

On Sun, 2005-05-15 at 19:10 -0400, Lee Revell wrote:
> On Sun, 2005-05-15 at 18:55 -0400, Dave Jones wrote:
> > On Sun, May 15, 2005 at 05:10:59PM -0400, Lee Revell wrote:
> >  > On Sun, 2005-05-15 at 22:48 +0200, Arjan van de Ven wrote:
> >  > > On Sun, 2005-05-15 at 21:41 +0100, Alan Cox wrote:
> >  > > > On Sul, 2005-05-15 at 08:30, Arjan van de Ven wrote:
> >  > > > > stop entirely.... (and that is also happening more and more and linux is
> >  > > > > getting more agressive idle support (eg no timer tick and such patches)
> >  > > > > which will trigger bios thresholds for this even more too.
> >  > > > 
> >  > > > Cyrix did TSC stop on halt a long long time ago, back when it was worth
> >  > > > the power difference.
> >  > > 
> >  > > With linux going to ACPI C2 mode more... tsc is defined to halt in C2...
> >  > 
> >  > JACK doesn't care about any of this now, the behavior when you
> >  > suspend/resume with a running jackd is undefined.  Eventually we should
> >  > handle it, but there's no point until the ALSA drivers get proper
> >  > suspend/resume support.
> > 
> > suspend/resume are S states, not C states. C states are occuring
> > during runtime.
> 
> It should never go into C2 if jackd is running, because you're getting
> interrupts from the audio interface at least every 100ms or so (usually
> much more often) which will wake up jackd and any clients.

You're not guaranteed not to enter C2 in that case; C2 can happen after
only a few ms.



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Linux does not care for data integrity (was: Disk write cache)
  2005-05-15 16:43                     ` Jeff Garzik
                                         ` (3 preceding siblings ...)
  2005-05-15 17:20                       ` Kenichi Okuyama
@ 2005-05-16 11:02                       ` Matthias Andree
  2005-05-16 11:12                         ` Arjan van de Ven
  2005-05-16 13:48                         ` Linux does not care for data integrity Mark Lord
  4 siblings, 2 replies; 150+ messages in thread
From: Matthias Andree @ 2005-05-16 11:02 UTC (permalink / raw)
  To: linux-kernel

On Sun, 15 May 2005, Jeff Garzik wrote:

> The ability of a filesystem or fsync(2) to cause a [FLUSH|SYNC] CACHE 
> command to be generated has only been present in the most recent 2.6.x 
> kernels.  See the "write barrier" stuff that people have been discussing.
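
The fsync-to-cache-flush path from that quote can be exercised with a few lines of userspace code; a minimal sketch (whether the drive's cache is actually flushed still depends on the barrier support this thread is arguing about):

```python
import os

def durable_write(path, data):
    """Write data and push it toward stable storage.  os.fsync flushes
    the OS page cache; propagation past the drive's own write cache
    depends on the kernel/driver issuing FLUSH CACHE (the "write
    barrier" work referenced above)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)
```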

To make this explicit and unmistakable, Linux should be ashamed of
having put its users' data at risk for as long as it has existed, and
looking at how often I still get "barrier synch failed", it still does
with the kernel SUSE Linux 9.3 shipped with.

This has come up several times, from database and mailserver authors, but
has found no reasonable solution to date.

The documentation of which file systems request a cache flush for fsync,
and which device drivers (SCSI as well as ATA) and chipset adapters pass
this down properly, is still missing. I've asked for help with such a list
several times over the recent years, and I've offered my help in setting up
and maintaining the list when sent the raw information, but no-one cared
to provide this kind of information.

I will not try again; it's no good. Kernel hackers, with a handful of
exceptions, don't care.

If they think they do in spite of my statement, they'll have to prove
their point by growing up and documenting for which combinations of
(file system, mount options, block dev driver, hardware/chip driver)
barrier synch is 100% reliable, and which file systems, chipset drivers,
block drivers, and hardware drivers are missing links in the chain -- and
by requesting that the kernel switch off the drive's write cache in all
drives unless the whole fsync() path works (unless defeated by a
"benchmark" kernel boot parameter).

Until then, my applications will have to recommend that users switch off
drive caches for consistency.

P. S.: Yes, the subject and this mail are provoking and exaggerated a
       tiny bit. I feel that's needed to raise the necessary motivation
       to finally address this issue after a decade or so.

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-16  7:10           ` Eric W. Biederman
@ 2005-05-16 11:04             ` Andi Kleen
  2005-05-16 19:14               ` Eric W. Biederman
  0 siblings, 1 reply; 150+ messages in thread
From: Andi Kleen @ 2005-05-16 11:04 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andy Isaacson, Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso

> The only solution I have seen proposed so far that seems to work
> is to not schedule untrusted processes simultaneously with 
> the security code.  With the current API that sounds like
> a root process killing off, or at least stopping all non-root
> processes until the critical process has finished.

With virtualization and a hypervisor freely scheduling, it is quite
impossible to guarantee this. Of course, as always, the signal
is quite noisy, so it is unclear if it is exploitable in practical
settings. In virtualized environments you cannot use ps to see
if a crypto process is running.

> And those same processors will have the same problem if they share
> significant cpu resources.  Ideally the entire problem set
> would fit in the cache and the cpu designers would allow cache
> blocks to be locked but that is not currently the case.  So a shared
> L3 cache with dual core processors will have the same problem.

At some point the signal gets noisy enough, and the assumptions
an attacker has to make too great, for it to be a useful attack.
For me it is not even clear it is a real attack on native Linux; at
least the setup in the paper looked highly artificial and quite impractical.
E.g. I suppose it would be quite difficult to really synchronize
to the beginning and end of the RSA encryptions on a server that
does other things too.

-Andi


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Linux does not care for data integrity (was: Disk write cache)
  2005-05-16 11:02                       ` Linux does not care for data integrity (was: Disk write cache) Matthias Andree
@ 2005-05-16 11:12                         ` Arjan van de Ven
  2005-05-16 11:29                           ` Matthias Andree
  2005-05-16 14:57                           ` Linux does not care for data integrity (was: Disk write cache) Alan Cox
  2005-05-16 13:48                         ` Linux does not care for data integrity Mark Lord
  1 sibling, 2 replies; 150+ messages in thread
From: Arjan van de Ven @ 2005-05-16 11:12 UTC (permalink / raw)
  To: Matthias Andree; +Cc: linux-kernel

>  and
> request that the kernel switches off the drive's write cache in all
> drives unless the whole fsync() stuff works (unless defeated by a
> "benchmark" kernel boot parameter).

I think you missed the part where disabling the writecache decreases the
mtbf of your disk by a factor of 100 or so. At which point your
data-loss opportunity INCREASES by doing this.

Sure, you can wave rhetoric around, but the fact is that linux is
improving; there now is write barrier support for ext3 (and I assume
reiserfs) for at least IDE and iirc selected scsi too.

Let's repeat that: disabling the writecache altogether is bad for
your disk. Really bad. Barriers aren't brilliant for it either, but a
heck of a lot better. Lacking barriers, it's probably safer for your
data to have the write cache on than off.



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-15 16:24                 ` Mikulas Patocka
@ 2005-05-16 11:18                   ` Matthias Andree
  2005-05-16 14:33                     ` Jeff Garzik
                                       ` (2 more replies)
  0 siblings, 3 replies; 150+ messages in thread
From: Matthias Andree @ 2005-05-16 11:18 UTC (permalink / raw)
  To: linux-kernel

On Sun, 15 May 2005, Mikulas Patocka wrote:

> Note that a disk can still ignore the FLUSH CACHE command if the cached
> data are small enough to be written on power loss, so a small FLUSH CACHE
> time doesn't prove the disk is cheating.

Have you seen a drive yet that writes back blocks after power loss?

I have heard rumors about this, but all OEM manuals I looked at for
drives I bought or recommended simply stated that the block currently
being written at power loss can become damaged (with write cache off),
and that the drive can lose the full write cache at power loss (with
write cache on) so this looks like daydreaming manifested as rumor.

I've heard that drives would take rotational energy from their
spinning platters and such, but never heard how the hardware compensates
for the dilation as rotational frequency decreases, which also requires
changed filter settings for the write channel, block encoding, delays,
possibly stepping the heads and so on. I don't believe these stories
until I see evidence.

These are corner cases that a vendor would hardly optimize for.
If you know a disk drive (not battery-backed disk controller!) that
flashes its cache to NVRAM, or uses rotational energy to save its cache
on the platters, please name brand and model and where I can download
the material that documents this behavior.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Linux does not care for data integrity (was: Disk write cache)
  2005-05-16 11:12                         ` Arjan van de Ven
@ 2005-05-16 11:29                           ` Matthias Andree
  2005-05-16 14:02                             ` Arjan van de Ven
  2005-05-16 14:57                           ` Linux does not care for data integrity (was: Disk write cache) Alan Cox
  1 sibling, 1 reply; 150+ messages in thread
From: Matthias Andree @ 2005-05-16 11:29 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Matthias Andree, linux-kernel

On Mon, 16 May 2005, Arjan van de Ven wrote:

> I think you missed the part where disabling the writecache decreases the
> mtbf of your disk by like a factor 100 or so. At which point your
> dataloss opportunity INCREASES by doing this.

Nah, if it were a factor of 100, then it should have been in the OEM
manuals, no?

Besides that, although my small sample is not representative, I have
older drives still alive & kicking - an MTBF of 1/100 of what the vendor
stated would mean a chance of failure way above 90% by now. The drive
has seen 22,000 POH with write cache off and has been a system drive for
some 14,000 POH. So?

> Sure you can waive rethorics around, but the fact is that linux is
> improving; there now is write barrier support for ext3 (and I assume
> reiserfs) for at least IDE and iirc selected scsi too. 

See the problem: "I assume", "IIRC selected...". There is no
list of corroborated facts about which systems work and which don't. I
have made several attempts at compiling one, posting public calls for
data here, with no response.

I don't blame you personally, but rather the lack of documentation about
such crucial facts - and of documentation in Linux environments in general.

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-16 17:00       ` Linus Torvalds
@ 2005-05-16 12:37         ` Tommy Reynolds
  0 siblings, 0 replies; 150+ messages in thread
From: Tommy Reynolds @ 2005-05-16 12:37 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1099 bytes --]

Uttered Linus Torvalds <torvalds@osdl.org>, spake thus:

> It does show that if you want to hide key operations, you want to be 
> careful. I don't think HT is at fault per se.

Trivially easy when two processes share the same FS namespace.
Consider two files:

$ ls -l /tmp/a /tmp/b
-rw------- 1 owner owner xxxxx /tmp/a
-rw------- 1 owner owner xxxxx /tmp/b

One file serves as a clock.  Note that the permissions deny all
access to everyone except the owner.  The owner user then does this,
intentionally or unintentionally:

for x in 0 0 0 1 0 0 0 0 0 1
do
	rm -f /tmp/a /tmp/b
	case "$x" in
	0 )	rm -f /tmp/a;;
	1 )	touch /tmp/a;;
	esac
	touch /tmp/b
	sleep	2
done

And the baddie does this:

let n=1
let char=0
while (( n <= 8 ))
do
	while [ ! -f /tmp/b ]; do
		sleep 0.5
	done
	char=$(( char << 1 ))
	if [ -f /tmp/a ]; then
		char=$(( char + 1 ))
	fi
	while [ -f /tmp/b ]; do		# wait for the clock to reset
		sleep 0.5
	done
	let n=n+1
done
printf "The letter was: %b\n" "$(printf '\\%03o' "$char")"

This is one of the classic TEMPEST problems that secure systems have
long had to deal with.  See, at no time did HT ever raise its ugly
head ;-)

Cheers

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-14  0:39         ` dean gaudet
@ 2005-05-16 13:41           ` Andrea Arcangeli
  0 siblings, 0 replies; 150+ messages in thread
From: Andrea Arcangeli @ 2005-05-16 13:41 UTC (permalink / raw)
  To: dean gaudet
  Cc: Andy Isaacson, Andi Kleen, Richard F. Rebel, Gabor MICSKO,
	linux-kernel, mpm, tytso

On Fri, May 13, 2005 at 05:39:25PM -0700, dean gaudet wrote:
> same cache index -- and get an 8-fold reduction in exposure.  the trick 
> here is the L2 is physically indexed, and userland code can perform only 
> virtual allocations.  but it's not too hard to discover physical conflicts 
> if you really want to (using rdtsc) -- it would be done early in the 
> initialization of the program because it involves asking for enough memory 
> until the kernel gives you enough colliding pages.  (a system call could 
> help with this if we really wanted it.)

An 8-way set associative 1M cache is guaranteed to go at l2 speed only
up to 128K (no matter what the kernel does), but even if the secret
payload is larger than 128K, as long as the load is still distributed
evenly at each pass for each page, there's not going to be any covert
channel; the process will simply run slower than it could with better
page coloring.

So I don't see the need for kernel support; all userland needs to know
is the page size, and that's provided already.
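
The conflict probing dean describes can only be roughly illustrated from a high-level language; the sketch below uses time.perf_counter_ns as a coarse stand-in for rdtsc (a real probe would use the instruction itself), and the 4 KiB stride is just an assumed candidate for same-set collisions:

```python
import time

def avg_pass_ns(buf, indices, rounds=200):
    """Average nanoseconds per pass touching buf at the given indices.
    Once more same-set lines are touched than the associativity allows,
    they evict each other and the per-pass time rises."""
    start = time.perf_counter_ns()
    sink = 0
    for _ in range(rounds):
        for i in indices:
            sink ^= buf[i]          # keep the accesses from being elided
    return (time.perf_counter_ns() - start) / rounds

buf = bytearray(1 << 22)                          # 4 MiB scratch buffer
same_set_candidates = range(0, 1 << 22, 1 << 12)  # addresses 4 KiB apart
baseline = range(0, 64 * 1024, 64)                # sequential cache lines
conflict_time = avg_pass_ns(buf, same_set_candidates)
baseline_time = avg_pass_ns(buf, baseline)
```

In Python the interpreter overhead swamps the cache effect, so treat this purely as the shape of the measurement (compare the two timings), not as a working probe.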

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-16  3:08                       ` Gene Heskett
@ 2005-05-16 13:44                         ` Mark Lord
  0 siblings, 0 replies; 150+ messages in thread
From: Mark Lord @ 2005-05-16 13:44 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel

Gene Heskett wrote:
>
>>There's your clue.  The drive LEDs normally reflect activity
>>over the ATA bus (the cable!). If they're not on, then the drive
>>isn't receiving data/commands from the host.
>
> That was my theory too Mark, but Jeff G. says its not a valid 
> indicator.  So who's right?

If the LEDs are connected to the controller on the motherboard,
then they are a strict indication of activity over the cable
between the drive and controller (if they function at all).
But it is possible for software to leave those LEDs permanently
in the "on" state, depending on the register sequence used.

If the LEDs are on the drive itself, they may indicate transfers
over the connector (cable) -- usually the case -- or they
could indicate transfers to/from the media.

Cheers


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Linux does not care for data integrity
  2005-05-16 11:02                       ` Linux does not care for data integrity (was: Disk write cache) Matthias Andree
  2005-05-16 11:12                         ` Arjan van de Ven
@ 2005-05-16 13:48                         ` Mark Lord
  2005-05-16 14:59                           ` Matthias Andree
  1 sibling, 1 reply; 150+ messages in thread
From: Mark Lord @ 2005-05-16 13:48 UTC (permalink / raw)
  To: Matthias Andree; +Cc: linux-kernel

 >To make this explicit and unmistakable, Linux should be ashamed of
 >having put its users' data at risk for as long as it has existed, and
 >looking at how often I still get "barrier synch failed", it still does
 >with the kernel SUSE Linux 9.3 shipped with.

With ATA drives, this is strictly a userspace "policy" decision.

Most of us want longer lifespan and 2X the performance from our hardware,
and use UPSs to guarantee continuous power & survivability.

Others want to live more dangerously on the power supply end,
but still be safe on the filesystem end -- no guarantees there,
even with "hdparm -W0" to disable the on-drive cache.

Pulling power from a writing drive is ALWAYS a bad idea,
and can permanently corrupt the track/cylinder that was being
written.  This will toast a filesystem regardless of how carefully
or properly the write flushes were done.

Write caching on the drive is not as big an issue as
good reliable power for this.

Cheers

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Linux does not care for data integrity (was: Disk write cache)
  2005-05-16 11:29                           ` Matthias Andree
@ 2005-05-16 14:02                             ` Arjan van de Ven
  2005-05-16 14:48                               ` Matthias Andree
  0 siblings, 1 reply; 150+ messages in thread
From: Arjan van de Ven @ 2005-05-16 14:02 UTC (permalink / raw)
  To: Matthias Andree; +Cc: linux-kernel

On Mon, 2005-05-16 at 13:29 +0200, Matthias Andree wrote:
> On Mon, 16 May 2005, Arjan van de Ven wrote:
> 
> > I think you missed the part where disabling the writecache decreases the
> > mtbf of your disk by like a factor 100 or so. At which point your
> > dataloss opportunity INCREASES by doing this.
> 
> Nah, if that were a factor of 100, then it should have been in the OEM
> manuals, no?

Why would they? Windows doesn't do it.  They only need to advertise MTBF
in the default settings (and I guess in Windows).

They do talk about this if you ask them.

>  So?

one sample doesn't prove the statistics are wrong.

> 
> > Sure you can waive rethorics around, but the fact is that linux is
> > improving; there now is write barrier support for ext3 (and I assume
> > reiserfs) for at least IDE and iirc selected scsi too. 
> 
> See the problem: "I assume", "IIRC selected...". There is no
> list of corroborated facts which systems work and which don't. I have
> made several attempts in compiling one, posting public calls for data
> here, no response.

Well, what stops you from building that list by doing the actual
work yourself?



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-16 11:18                   ` Matthias Andree
@ 2005-05-16 14:33                     ` Jeff Garzik
  2005-05-16 15:26                       ` Richard B. Johnson
  2005-05-16 18:11                       ` Disk write cache (Was: Hyper-Threading Vulnerability) Valdis.Kletnieks
  2005-05-16 14:54                     ` Alan Cox
  2005-05-18  4:06                     ` Eric D. Mudama
  2 siblings, 2 replies; 150+ messages in thread
From: Jeff Garzik @ 2005-05-16 14:33 UTC (permalink / raw)
  To: Matthias Andree; +Cc: linux-kernel

Matthias Andree wrote:
> On Sun, 15 May 2005, Mikulas Patocka wrote:
> 
> 
>>Note that disk can still ignore FLUSH CACHE command cached data are small
>>enough to be written on power loss, so small FLUSH CACHE time doesn't
>>prove disk cheating.
> 
> 
> Have you seen a drive yet that writes back blocks after power loss?
> 
> I have heard rumors about this, but all OEM manuals I looked at for
> drives I bought or recommended simply stated that the block currently
> being written at power loss can become damaged (with write cache off),
> and that the drive can lose the full write cache at power loss (with
> write cache on) so this looks like daydreaming manifested as rumor.

Upon power loss, at least one ATA vendor's disks try to write out as 
much data as possible.

	Jeff




^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Linux does not care for data integrity (was: Disk write cache)
  2005-05-16 14:02                             ` Arjan van de Ven
@ 2005-05-16 14:48                               ` Matthias Andree
  2005-05-16 15:06                                 ` Alan Cox
  0 siblings, 1 reply; 150+ messages in thread
From: Matthias Andree @ 2005-05-16 14:48 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Matthias Andree, linux-kernel

On Mon, 16 May 2005, Arjan van de Ven wrote:

> > See the problem: "I assume", "IIRC selected...". There is no
> > list of corroborated facts which systems work and which don't. I have
> > made several attempts in compiling one, posting public calls for data
> > here, no response.
> 
> well what stops you from building that list yourself by doing the actual
> work yourself?

Two things.

#1 it's the subsystem maintainer's responsibility to arrange for such
information. I searched Documentation/* to no avail, see below.

#2 I would need to get acquainted with and understand several dozen
subsystems, drivers and so on to be able to make a substantiated
statement.

Subsystem maintainers will usually know the shape their code is in and
just need to state "not yet", "not planned", "not needed, different
layer", "work in progress" or "working since kernel version 2.6.42".

Takes a minute per maintainer, rather than wasting countless hours on
working through foreign code only to forget all this after I know what I
wanted to know. Sounds like an unreasonable expectation? Not to me. I
had hoped, several times, that asking here would give the first dozen of
answers as a starting point.

It's not as though I could just take two weeks off
and read all the common block device code...

I still have insufficient information even for ext3 on traditional
parallel ATA interfaces, so how do I start a list without information?

$ cd linux-2.6/Documentation/
$ find -iname '*barr*'
./arm/Sharp-LH/IOBarrier
$ head -4 ../Makefile
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 11
EXTRAVERSION = .9
$

Documentation/block/biodoc.txt has some information about how it might
look two years from now. filesystems/ext3 mentions it requires a
barrier=1 mount option. No information on which block interfaces support it.
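
For reference, that barrier=1 option is requested at mount time; a hypothetical /etc/fstab line (device name illustrative, and whether the barrier actually reaches the drive still depends on the driver stack -- which is exactly the missing documentation):

```
# illustrative fstab entry -- barrier=1 asks ext3 to use write barriers
/dev/hda1  /  ext3  defaults,barrier=1  1 1
```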

AIC7XXX was once reported to have it, experimentally; I don't know what
has become of the code, and I don't have AIC7XXX here - too expensive.

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-15 15:00             ` Disk write cache (Was: Hyper-Threading Vulnerability) Mikulas Patocka
  2005-05-15 15:21               ` Gene Heskett
@ 2005-05-16 14:50               ` Alan Cox
  1 sibling, 0 replies; 150+ messages in thread
From: Alan Cox @ 2005-05-16 14:50 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: Tomasz Torcz, Linux Kernel Mailing List

On Sul, 2005-05-15 at 16:00, Mikulas Patocka wrote:
> There are rumors that some disks ignore the FLUSH CACHE command just to
> get higher benchmarks in Windows. But I haven't heard of any proof. Does
> anybody know which companies fake this command?

The specification was intentionally written so that this command has to
do what it is specified to do, or be unknown (and thus error) and not
appear in the ident info.

That was done by people who wanted to be very sure that any vendor who
tried to shortcut the command would have "sue me" written on their
forehead.

There are problems with a few older drives which have a write cache but
don't support cache commands.

Alan


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-16 11:18                   ` Matthias Andree
  2005-05-16 14:33                     ` Jeff Garzik
@ 2005-05-16 14:54                     ` Alan Cox
  2005-05-17 13:15                       ` Bill Davidsen
  2005-05-18  4:06                     ` Eric D. Mudama
  2 siblings, 1 reply; 150+ messages in thread
From: Alan Cox @ 2005-05-16 14:54 UTC (permalink / raw)
  To: Matthias Andree; +Cc: Linux Kernel Mailing List

> I have heard rumors about this, but all OEM manuals I looked at for
> drives I bought or recommended simply stated that the block currently
> being written at power loss can become damaged (with write cache off),
> and that the drive can lose the full write cache at power loss (with
> write cache on) so this looks like daydreaming manifested as rumor.

IBM drives definitely used to trash the sector in this case. The newer
ones either don't, or recover from it, presumably because people took that
to be a drive failure and returned the drive. Sometimes the people win ;)

> flashes its cache to NVRAM, or uses rotational energy to save its cache
> on the platters, please name brand and model and where I can download
> the material that documents this behavior.

I am not aware of any IDE drive with these properties.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Linux does not care for data integrity (was: Disk write cache)
  2005-05-16 11:12                         ` Arjan van de Ven
  2005-05-16 11:29                           ` Matthias Andree
@ 2005-05-16 14:57                           ` Alan Cox
  1 sibling, 0 replies; 150+ messages in thread
From: Alan Cox @ 2005-05-16 14:57 UTC (permalink / raw)
  To: Arjan van de Ven; +Cc: Matthias Andree, Linux Kernel Mailing List

On Llu, 2005-05-16 at 12:12, Arjan van de Ven wrote:
> Sure you can waive rethorics around, but the fact is that linux is
> improving; there now is write barrier support for ext3 (and I assume
> reiserfs) for at least IDE and iirc selected scsi too. 

scsi supports tagging so ext3 at least is just fine.


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Linux does not care for data integrity
  2005-05-16 13:48                         ` Linux does not care for data integrity Mark Lord
@ 2005-05-16 14:59                           ` Matthias Andree
  0 siblings, 0 replies; 150+ messages in thread
From: Matthias Andree @ 2005-05-16 14:59 UTC (permalink / raw)
  To: Mark Lord; +Cc: Matthias Andree, linux-kernel

On Mon, 16 May 2005, Mark Lord wrote:

> Most of us want longer lifespan and 2X the performance from our hardware,
> and use UPSs to guarantee continuous power & survivability.

Which is a different story, and doesn't protect from dying power supply
units.  I have replaced several PSUs that died "in mid-flight" and that
were not overloaded; a UPS isn't going to help in that case. Of course you
can use a redundant PSU and a redundant UPS - but that's easily more than
a battery-backed cache on a decent RAID controller - since drive
failure will also toast file systems.

> Others want to live more dangerously on the power supply end,
> but still be safe on the filesystem end -- no guarantees there,
> even with "hdparm -W0" to disable the on-drive cache.

As long as one can rely on the kernel scheduling writes in the proper
order, no problem that I'd see. ext3 has apparently been doing this for
a long time in the default options, and I have yet to see ext3
corruption (except for massive hardware failure such as b0rked non-ECC
RAM or a harddisk that crashed its heads).

> Pulling power from a writing drive is ALWAYS a bad idea,
> and can permanently corrupt the track/cylinder that was being
> written.  This will toast a filesystem regardless of how careful
> or proper the write flushes were done.

Most drive manufacturers make more extensive guarantees about what does
NOT get damaged when a write is interrupted by power loss, and are careful
to turn the write current off promptly on power loss. None of the OEM
manuals I looked at warned that data already on disk could be
damaged beyond the block that was being written.

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Linux does not care for data integrity (was: Disk write cache)
  2005-05-16 14:48                               ` Matthias Andree
@ 2005-05-16 15:06                                 ` Alan Cox
  2005-05-16 15:40                                   ` Matthias Andree
  2005-05-29 21:02                                   ` Linux does not care for data integrity (was: Disk write cache) Greg Stark
  0 siblings, 2 replies; 150+ messages in thread
From: Alan Cox @ 2005-05-16 15:06 UTC (permalink / raw)
  To: Matthias Andree; +Cc: Arjan van de Ven, Linux Kernel Mailing List

I think you need to get real if you want that degree of integrity with a
PC.

Your typical PC setup means your precious data

	Gets written to non-ECC-protected memory over an unprotected bus
	Gets read back over the same
	Each PATA command is sent without any CRC or error recovery/correction
	The PATA data is pulled out of unprotected memory over PCI
	It goes to the drive (with a CRC) and gets stored in memory
	It's probably sitting in non-ECC RAM on the disk
	It's probably fed through non-ECC DSP logic
	It's mixed on the disk with other data and may get rewritten without
	you knowing

You might want to amuse yourself trying to get the bit error rates for
the busses and ram to start documenting the probabilities.
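As a rough sketch of that exercise (the bit error rate below is an assumed, purely illustrative figure, not a measured one):

```python
import math

def p_any_bit_error(bits: int, ber: float) -> float:
    """Probability of at least one error across `bits` independent bit
    transfers, using log1p/expm1 to stay accurate for tiny BERs."""
    return -math.expm1(bits * math.log1p(-ber))

# The 40 billion bits mentioned below, moved once across a bus with an
# assumed bit error rate of 1e-20 per bit:
bits = 40 * 10**9
p = p_any_bit_error(bits, 1e-20)   # roughly bits * ber for small values
```

With independent errors the result is essentially bits times BER; the point of the exercise is that each hop in the list above contributes its own such term.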

I'd prefer Linux turned writecache off on old drives but Mark Lord has
really good points even there. And for scsi we do tagging and the
journals can be ordered depending on your need.

You are storing 40 billion bits of information on a lump of metal and
glass rotating at 10,000rpm, pushing into areas of quantum theory in
order to store your data. It should be no surprise that it might not be
there a month later.

You also appear confused: It isn't the maintainer's responsibility to
arrange for such info. It's the maintainer's responsibility to process
contributed patches with such info.



^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-16 14:33                     ` Jeff Garzik
@ 2005-05-16 15:26                       ` Richard B. Johnson
  2005-05-16 16:00                         ` [OT] drive behavior on power-off (was: Disk write cache) Matthias Andree
  2005-05-16 18:11                       ` Disk write cache (Was: Hyper-Threading Vulnerability) Valdis.Kletnieks
  1 sibling, 1 reply; 150+ messages in thread
From: Richard B. Johnson @ 2005-05-16 15:26 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Matthias Andree, linux-kernel

On Mon, 16 May 2005, Jeff Garzik wrote:

> Matthias Andree wrote:
>> On Sun, 15 May 2005, Mikulas Patocka wrote:
>>
>>
>>> Note that the disk can still ignore the FLUSH CACHE command if cached
>>> data are small enough to be written on power loss, so a small FLUSH
>>> CACHE time doesn't prove the disk is cheating.
>>
>> Have you seen a drive yet that writes back blocks after power loss?
>>
>> I have heard rumors about this, but all OEM manuals I looked at for
>> drives I bought or recommended simply stated that the block currently
>> being written at power loss can become damaged (with write cache off),
>> and that the drive can lose the full write cache at power loss (with
>> write cache on) so this looks like daydreaming manifested as rumor.
>
> Upon power loss, at least one ATA vendor's disks try to write out as
> much data as possible.
>
> 	Jeff

Then I suggest you never use such a drive. Anything that does this
will end up replacing a good track with garbage. Unless a disk drive
has a built-in power source such as super-capacitors or batteries, what
happens during a power failure is that all electronics stops and
the discs start coasting. Eventually the heads will crash onto
the platter. Older discs had a magnetically released latch which would
send the heads to an inside landing zone. Nobody bothers anymore.

Many high-quality drives cache data. Fortunately, upon power loss
no attempt is made to write these data. This means that,
although you may have incomplete or even bad data on the physical
medium, at least the medium can still be read and written. The sectoring
has not been corrupted (read: destroyed).

If you think about the physical process necessary to write data to
the medium, you will understand that without a large amount of
energy storage capacity on the disk, it's just not possible.

To write a sector, one needs to cache the data in a sector buffer,
adding a sector header and trailing CRC, wait for the write
splice from the previous sector (which could take almost one rotation),
then write the data and sync to the sector. If the disc is too slow,
these data will underwrite the sector. Also, if the disc
was only 5 percent slow, the clock recovery on a subsequent
read will be off by 5 percent, outside the range of PLL lock-in,
so you write something that can never be read: a guaranteed bad block.
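The timing margins involved can be put in rough numbers (the PLL capture range below is an assumed figure for illustration, not from any drive datasheet):

```python
# One rotation at the 10,000rpm mentioned elsewhere in the thread, and
# the effect of writing while the spindle runs 5 percent slow.
RPM = 10_000
rotation_ms = 60_000 / RPM          # milliseconds per full rotation

spindle_error = 0.05                # bits written 5% "stretched"
pll_capture_range = 0.01            # assumed +/-1% read-PLL lock-in window

# The recorded bit clock is off by the spindle error; if that exceeds
# the PLL capture range, the sector can never be read back.
unreadable = spindle_error > pll_capture_range
```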

Combining journaling on media that can be completely flushed
with ordinary cache-intensive discs can result in reliable data
storage. However, a single ATA or SCSI disk just isn't a perfectly
reliable storage medium, although it's usually good enough.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by Dictator Bush.
                  98.36% of all statistics are fiction.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Linux does not care for data integrity (was: Disk write cache)
  2005-05-16 15:06                                 ` Alan Cox
@ 2005-05-16 15:40                                   ` Matthias Andree
  2005-05-16 18:04                                     ` Alan Cox
  2005-05-29 21:02                                   ` Linux does not care for data integrity (was: Disk write cache) Greg Stark
  1 sibling, 1 reply; 150+ messages in thread
From: Matthias Andree @ 2005-05-16 15:40 UTC (permalink / raw)
  To: Alan Cox; +Cc: Matthias Andree, Arjan van de Ven, Linux Kernel Mailing List

On Mon, 16 May 2005, Alan Cox wrote:

> I'd prefer Linux turned writecache off on old drives but Mark Lord has
> really good points even there. And for scsi we do tagging and the
> journals can be ordered depending on your need.

Is tagged command queueing (we'll need the ordered tag here) compatible
with all SCSI adaptors that Linux supports?

What if tagged command queueing is switched off for some reason
(adaptor or HW incapability, user override) and the drive still has
write cache enable = true and queue algorithm modifier = 1 (which
permits out-of-order execution of write requests except for ordered
tags)? Is that something that would cause some notice to be
logged? Or is it simply "do this at your own risk"? My recent SCSI
drives have been shipping with WCE=1 and QAM=0.

Am I missing a bit here?
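For reference, the two settings in question live in SCSI mode pages; here is a sketch of decoding them from raw MODE SENSE bytes (the offsets are my reading of SPC/SBC - WCE in bit 2 of byte 2 of the Caching page (08h), QAM in bits 7..4 of byte 3 of the Control page (0Ah) - so verify against the standard before relying on them):

```python
def wce(caching_page: bytes) -> bool:
    """Write Cache Enable: bit 2 of byte 2 of the Caching mode page."""
    return bool(caching_page[2] & 0x04)

def queue_algorithm_modifier(control_page: bytes) -> int:
    """QAM: bits 7..4 of byte 3 of the Control mode page."""
    return control_page[3] >> 4

# Example byte strings (made up, not from a real drive): a caching page
# with WCE=1 and a control page with QAM=1.
caching = bytes([0x08, 0x12, 0x04] + [0] * 17)
control = bytes([0x0A, 0x0A, 0x00, 0x10] + [0] * 8)
```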

> You also appear confused: It isn't the maintainers responsibility to
> arrange for such info. It's the maintainers responsibility to process
> contributed patches with such info.

I didn't think of arranging as in "write it himself". Who writes that info
down doesn't matter, but I'd think that such documentation should always
be committed alongside the code, except in code marked experimental
(which, in turn, should only be promoted to non-experimental once it's
properly documented).

I understand that people who understand the code are eager to focus on
the code, and even if that documentation is just an unordered list of
statements with a kernel version attached, that'd be fine. But what is
decent code without users?

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 150+ messages in thread

* [OT] drive behavior on power-off (was: Disk write cache)
  2005-05-16 15:26                       ` Richard B. Johnson
@ 2005-05-16 16:00                         ` Matthias Andree
  0 siblings, 0 replies; 150+ messages in thread
From: Matthias Andree @ 2005-05-16 16:00 UTC (permalink / raw)
  To: Richard B. Johnson; +Cc: Jeff Garzik, Matthias Andree, linux-kernel

On Mon, 16 May 2005, Richard B. Johnson wrote:

> Then I suggest you never use such a drive. Anything that does this,
> will end up replacing a good track with garbage. Unless a disk drive
> has a built-in power source such as super-capacitors or batteries, what
> happens during a power-failure is that all electronics stops and
> the discs start coasting. Eventually the heads will crash onto
> the platter. Older discs had a magnetically released latch which would
> send the heads to an inside landing zone. Nobody bothers anymore.

IBM/Hitachi hard disk drives still use a "load/unload ramp" that
entirely moves the heads off the platters - I've known this since the
DJNA, and it is still advertised in Deskstar 7K500 and Ultrastar 15K147
to name just two examples.

-- 
Matthias Andree

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 18:49     ` Scott Robert Ladd
  2005-05-13 19:08       ` Andi Kleen
  2005-05-13 19:36       ` Grant Coady
@ 2005-05-16 17:00       ` Linus Torvalds
  2005-05-16 12:37         ` Tommy Reynolds
  2 siblings, 1 reply; 150+ messages in thread
From: Linus Torvalds @ 2005-05-16 17:00 UTC (permalink / raw)
  To: Scott Robert Ladd
  Cc: Alan Cox, Andi Kleen, Gabor MICSKO, Linux Kernel Mailing List



On Fri, 13 May 2005, Scott Robert Ladd wrote:
>
> Alan Cox wrote:
> > HT for most users is pretty irrelevant; it's a neat idea but the
> > benchmarks don't suggest it's too big a hit
> 
> On real-world applications, I haven't seen HT boost performance by more
> than 15% on a Pentium 4 -- and the usual gain is around 5%, if anything
> at all. HT is a nice idea, but I don't enable it on my systems.

HT is _wonderful_ for latency reduction. 

Why people think "performance" means "throughput" is something I'll never
understand. Throughput is _always_ secondary to latency, and really only
becomes interesting when it becomes a latency number (ie "I need higher
throughput in order to process these jobs in 4 hours instead of 8" -
notice how the real issue was again about _latency_).

Now, Linux tends to have pretty good CPU latency anyway, so it's not
usually that big of a deal, but I definitely enjoyed having an HT machine
over a regular UP one. I'm told the effect was even more pronounced on
XP.

Of course, these days I enjoy having dual cores more, though, and with
multiple cores, the latency advantages of HT become much less pronounced.

As to the HT "vulnerability", it really seems not a whole lot
different from what people saw with early SMP and (small) direct-mapped
caches. Thank God those days are gone.

I'd be really surprised if somebody is actually able to get a real-world
attack on real-world pgp key usage or similar out of it (and as to the
covert channel, nobody cares). It's a fairly interesting approach, but
it's certainly neither new nor HT-specific, nor does it necessarily seem
all that worrying in real life.

(HT and modern CPU speeds just mean that the covert channel is _faster_
than it has been before, since you can test the L1 at core speeds. I doubt
it helps the key attack much, though, since faster in that case cuts both
ways: the speed of testing the cache eviction may have gone up, but so has
the speed of the operation you're trying to follow, and you'd likely have
a really hard time trying to catch things in real life.)
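For the curious, the cache-eviction channel described here can be illustrated with a toy simulation - a pretend direct-mapped cache with made-up hit/miss costs; real attacks have to fight the timing noise this leaves out:

```python
# Toy prime-and-probe: a spy fills a simulated direct-mapped cache, a
# victim touches one cache set depending on a secret bit, and the spy
# learns the secret from which probe access is slow.
N_SETS = 8
HIT, MISS = 1, 100      # arbitrary access-cost units

class ToyCache:
    def __init__(self):
        self.sets = [None] * N_SETS
    def access(self, owner: str, set_index: int) -> int:
        cost = HIT if self.sets[set_index] == owner else MISS
        self.sets[set_index] = owner     # the access evicts the old owner
        return cost

def spy_recovers_bit(secret_bit: int) -> int:
    cache = ToyCache()
    for s in range(N_SETS):              # prime: fill every set
        cache.access("spy", s)
    cache.access("victim", secret_bit)   # victim: key-dependent access
    costs = [cache.access("spy", s) for s in range(N_SETS)]  # probe
    return costs.index(max(costs))       # the slow set reveals the bit
```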

It does show that if you want to hide key operations, you want to be 
careful. I don't think HT is at fault per se. 

		Linus

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Linux does not care for data integrity (was: Disk write cache)
  2005-05-16 15:40                                   ` Matthias Andree
@ 2005-05-16 18:04                                     ` Alan Cox
  2005-05-16 19:11                                       ` Linux does not care for data integrity Florian Weimer
  0 siblings, 1 reply; 150+ messages in thread
From: Alan Cox @ 2005-05-16 18:04 UTC (permalink / raw)
  To: Matthias Andree; +Cc: Arjan van de Ven, Linux Kernel Mailing List

On Llu, 2005-05-16 at 16:40, Matthias Andree wrote:
> On Mon, 16 May 2005, Alan Cox wrote:
> Is tagged command queueing (we'll need the ordered tag here) compatible
> with all SCSI adaptors that Linux supports?

TCQ is a device not controller property.

> What if tagged command queueing is switched off for some reason
> (adaptor or HW incapability, user override) and the drive still has
> write cache enable = true and queue algorithm modifier = 1 (which

We turn the write back cache off if TCQ isn't available.


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-16 14:33                     ` Jeff Garzik
  2005-05-16 15:26                       ` Richard B. Johnson
@ 2005-05-16 18:11                       ` Valdis.Kletnieks
  1 sibling, 0 replies; 150+ messages in thread
From: Valdis.Kletnieks @ 2005-05-16 18:11 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Matthias Andree, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 332 bytes --]

On Mon, 16 May 2005 10:33:30 EDT, Jeff Garzik said:

> Upon power loss, at least one ATA vendor's disks try to write out as 
> much data as possible.

Does the firmware for this vendor's disks have enough smarts to reserve that
last little bit of power to park the heads, so it's not actively writing
when power finally fails entirely?

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Linux does not care for data integrity
  2005-05-16 18:04                                     ` Alan Cox
@ 2005-05-16 19:11                                       ` Florian Weimer
  0 siblings, 0 replies; 150+ messages in thread
From: Florian Weimer @ 2005-05-16 19:11 UTC (permalink / raw)
  To: Alan Cox; +Cc: Matthias Andree, Arjan van de Ven, Linux Kernel Mailing List

* Alan Cox:

> On Llu, 2005-05-16 at 16:40, Matthias Andree wrote:
>> On Mon, 16 May 2005, Alan Cox wrote:
>> Is tagged command queueing (we'll need the ordered tag here) compatible
>> with all SCSI adaptors that Linux supports?
>
> TCQ is a device not controller property.

I suppose it's a controller property in RAID controllers.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-16 11:04             ` Andi Kleen
@ 2005-05-16 19:14               ` Eric W. Biederman
  2005-05-16 20:05                 ` Valdis.Kletnieks
  0 siblings, 1 reply; 150+ messages in thread
From: Eric W. Biederman @ 2005-05-16 19:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andy Isaacson, Richard F. Rebel, Gabor MICSKO, linux-kernel, mpm, tytso

Andi Kleen <ak@muc.de> writes:

> > The only solution I have seen proposed so far that seems to work
> > is to not schedule untrusted processes simultaneously with 
> > the security code.  With the current API that sounds like
> > a root process killing off, or at least stopping all non-root
> > processes until the critical process has finished.
> 
> With virtualization and a hypervisor freely scheduling it is quite 
> impossible to guarantee this. Of course as always the signal 
> is quite noisy so it is unclear if it is exploitable in practical 
> settings. On virtualized environments you cannot use ps to see
> if a crypto process is running. 

Interesting.  I think that is a problem for the hypervisor maintainer.
Although that is about enough to convince me to request an
OS flag that says "please give me privacy" which can later be passed
down to the hypervisor.  My gut feeling is that running under a
hypervisor is when things will be at their most vulnerable.

Where this is a threat is when there are a lot of RSA
key transactions, at which point it is likely that the attacker
can reproduce enough of the setup to figure out the fine details.

I think discovering a crypto process will simply be a matter of
finding an https server.  As for getting the timing, how about
initiating an https connection?  Getting rid of the noise will certainly
be a challenge, but you will have multiple attempts.

> > And those same processors will have the same problem if they share
> > significant cpu resources.  Ideally the entire problem set
> > would fit in the cache and the cpu designers would allow cache
> > blocks to be locked, but that is not currently the case.  So a shared
> > L3 cache with dual core processors will have the same problem.
> 
> At some point the signal gets noisy enough and the assumptions
> an attacker has to make too great for it to be a useful attack.
> For me it is not even clear it is a real attack on native Linux; at
> least the setup in the paper looked highly artificial and quite impractical.
> e.g. I suppose it would be quite difficult to really synchronize
> to the beginning and end of the RSA encryptions on a server that
> does other things too.

Possibly.  But then buffer overflow attacks when you don't know the exact
stack layout are similarly difficult, and ways have been found.  And if
you have multiple chances, things get easier.  And if you are aiming
at something easier than brute-forcing a private key, even the littlest
bit is a help.

When people mmap pages we zero them for the same reason so that
we don't have unintentional information leaks.

I agree that for now, because little is known, this is a highly specialized
attack.  However, the trend is now towards increasingly big SMPs.
That increases the number of resources that can be shared, so the
possibility of a problem increases.  At the rate Intel's cpus are
going, we may see throttling of one cpu core when the other one
generates too much heat because it is busy doing something else cpu
intensive.  And other optimizations lead to much easier to imagine
vulnerabilities.

As for noise: in the area cpu designers are getting into, things
are becoming increasingly fine-grained, so information is leaking
at an increasingly fine level.  As the L2 cache issue has shown,
that information starts to leak below the level an application
designer has control of.  At which point things get very difficult
to manage.

Information leaks are more difficult to exploit than simply gaining root
on the box, where you can simply take the information you want.  But
that means this is exactly where a locked-down, well-administered
box will be vulnerable if a way is not found to avoid the problem.
I don't know what the consequences of having your private key
discovered are, but I have never heard of a case where identity theft
was something pleasant to fix.

Eric

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-16 19:14               ` Eric W. Biederman
@ 2005-05-16 20:05                 ` Valdis.Kletnieks
  0 siblings, 0 replies; 150+ messages in thread
From: Valdis.Kletnieks @ 2005-05-16 20:05 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Andi Kleen, Andy Isaacson, Richard F. Rebel, Gabor MICSKO,
	linux-kernel, mpm, tytso

[-- Attachment #1: Type: text/plain, Size: 713 bytes --]

On Mon, 16 May 2005 13:14:23 MDT, Eric W. Biederman said:

> Interesting.  I think that is a problem for the hypervisor maintainer.
> Although that is about enough to convince me to request a
> OS flag that says "please give me privacy" and later that can be passed
> down to the hypervisor.  My gut feel is running under a hypervisor
> is when things will at their most vulnerable.

Not really, because....

> I think discovering a crypto process will simply be a matter
> finding a https sever.  As for getting the timing how about
> initiating a https connection?  Getting rid of the noise will certainly
> be a challenge but you will have multiple attempts.

And the hypervisor is, if anything, adding noise.

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-16 14:54                     ` Alan Cox
@ 2005-05-17 13:15                       ` Bill Davidsen
  2005-05-17 21:41                         ` Kyle Moffett
  0 siblings, 1 reply; 150+ messages in thread
From: Bill Davidsen @ 2005-05-17 13:15 UTC (permalink / raw)
  To: Alan Cox; +Cc: Matthias Andree, Linux Kernel Mailing List

On Mon, 16 May 2005, Alan Cox wrote:

> > flushes its cache to NVRAM, or uses rotational energy to save its cache
> > on the platters, please name brand and model and where I can download
> > the material that documents this behavior.
> 
> I am not aware of any IDE drive with these properties.

I'm not sure I know of a SCSI drive which does that, either. It was a big
thing a few decades ago to use rotational energy to park the heads, but I
haven't seen discussion of saving to NVRAM. Then again, I haven't been
looking for it.

What would be ideal is some cache which didn't depend on power to maintain
state, like core (remember core?) or the bubble memory which spent almost
a decade being just slightly too {slow,costly} to replace disk. There
doesn't seem to be a cost effective technology yet.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-17 13:15                       ` Bill Davidsen
@ 2005-05-17 21:41                         ` Kyle Moffett
  0 siblings, 0 replies; 150+ messages in thread
From: Kyle Moffett @ 2005-05-17 21:41 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Alan Cox, Matthias Andree, Linux Kernel Mailing List

On May 17, 2005, at 09:15:52, Bill Davidsen wrote:
> What would be ideal is some cache which didn't depend on power to
> maintain state, like core (remember core?) or the bubble memory which
> spent almost a decade being just slightly too {slow,costly} to replace
> disk. There doesn't seem to be a cost effective technology yet.

I've seen some articles recently on a micro-punchcard technology that
uses grids of thousands of miniature needles and sheets of polymer
plastic that can be melted at somewhat low temperatures to create or
remove indentations in the plastic.  The device can read and write each
position at a very high rate, and since there are several thousand bits
per position, with one bit for each needle, the bandwidth is enormous.
(And it scales linearly with the size of the device, too!)  Purportedly
these grids can be easily built with slight modifications to modern
semiconductor etching technologies, and the polymer plastic is
reasonably simple to manufacture, so the resultant cost per device is
hundreds of times cheaper than today's drives.  Likewise, they have
significantly higher memory density than current hardware due to fewer
relativistic and quantum effects (no magnetism).


Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$  
r  !y?(-)
------END GEEK CODE BLOCK------




^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-16  2:32                     ` Mark Lord
  2005-05-16  3:08                       ` Gene Heskett
@ 2005-05-18  4:03                       ` Eric D. Mudama
  1 sibling, 0 replies; 150+ messages in thread
From: Eric D. Mudama @ 2005-05-18  4:03 UTC (permalink / raw)
  To: Mark Lord; +Cc: Gene Heskett, linux-kernel

On 5/15/05, Mark Lord <lkml@rtr.ca> wrote:
> There's your clue.  The drive LEDs normally reflect activity
> over the ATA bus (the cable!). If they're not on, then the drive
> isn't receiving data/commands from the host.

Mark is correct, activity indicators are associated with bus activity,
not internal drive activity.

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-16 11:18                   ` Matthias Andree
  2005-05-16 14:33                     ` Jeff Garzik
  2005-05-16 14:54                     ` Alan Cox
@ 2005-05-18  4:06                     ` Eric D. Mudama
  2 siblings, 0 replies; 150+ messages in thread
From: Eric D. Mudama @ 2005-05-18  4:06 UTC (permalink / raw)
  To: linux-kernel

On 5/16/05, Matthias Andree <matthias.andree@gmx.de> wrote:
> I've heard that drives would be taking rotational energy from their
> rotating platters and such, but never heard how the hardware compensates
> the dilation with decreasing rotational frequency, which also requires
> changed filter settings for the write channel, block encoding, delays,
> possibly stepping the heads and so on. I don't believe these stories
> until I see evidence.

I'm pretty sure that most drives out there will immediately attempt to
safely retract or park the heads the instant that a power loss is
detected.  There's too much potential damage if the heads aren't able
to safely retract to a landing zone or ramp; trying to save "one more
block of cached data" just isn't worth the risk.

--eric

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Hyper-Threading Vulnerability
  2005-05-13 18:35   ` Alan Cox
  2005-05-13 18:49     ` Scott Robert Ladd
@ 2005-05-18 19:07     ` Bill Davidsen
  1 sibling, 0 replies; 150+ messages in thread
From: Bill Davidsen @ 2005-05-18 19:07 UTC (permalink / raw)
  To: Alan Cox; +Cc: Andi Kleen, Gabor MICSKO, Linux Kernel Mailing List

On Fri, 13 May 2005, Alan Cox wrote:

> > This is not a kernel problem, but a user space problem. The fix 
> > is to change the user space crypto code to need the same number of cache line
> > accesses on all keys. 
> 
> You actually also need to hit the same cache line sequence on all keys
> if you take a bit more care about it.
> 
> > Disabling HT for this would the totally wrong approach, like throwing
> > out the baby with the bath water.
> 
> HT for most users is pretty irrelevant, its a neat idea but the
> benchmarks don't suggest its too big a hit

This is one of those things which can give any result depending on the
measurement. For kernel compiles I might see a 5-30% reduction in clock
time, for threaded applications like web/mail/news not much, and for
applications which communicate via shared memory up to 50% because some
blocking system calls can be avoided and cache impact is lower.

In general I have to agree with the "too big," but I haven't seen any
indication that the hole can be exploited without being able to run a
custom application on the machine, so for single users machines and
servers the risk level seems low.
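The user-space fix quoted above (the same cache line accesses for all keys) amounts to replacing secret-indexed lookups with a uniform scan. A sketch, noting that Python makes no constant-time guarantees and this only illustrates the access pattern:

```python
def lookup_leaky(table, secret_index):
    """Touches one key-dependent entry; its cache line leaks the index."""
    return table[secret_index]

def lookup_uniform(table, secret_index):
    """Touches every entry in the same order for every key, selecting
    the wanted one with arithmetic masking instead of branching."""
    result = 0
    for i, value in enumerate(table):
        mask = -(i == secret_index)   # -1 (all bits set) when selected
        result |= value & mask
    return result

table = list(range(100, 116))         # stand-in for an S-box or similar
```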

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: 2.6.4 timer and helper functions
  2005-05-15 10:23                               ` 2.6.4 timer and helper functions kernel
@ 2005-05-19  0:38                                 ` George Anzinger
  0 siblings, 0 replies; 150+ messages in thread
From: George Anzinger @ 2005-05-19  0:38 UTC (permalink / raw)
  To: kernel; +Cc: linux-kernel

kernel@wired-net.gr wrote:
> Hi all,
> I am running a 2.6.4 kernel on my system, and I am playing a little
> bit with kernel time issues and helper functions, just to understand
> how things really work.  On my x86 system I loaded a module from LDD
> 3rd edition, jit.c, which uses a dynamic /proc file to return textual
> information from the kernel functions do_gettimeofday,
> current_kernel_time and jiffies_to_timespec.  The output format is:
>
> 0x0009073c 0x000000010009073c 1116162967.247441 1116162967.246530656 591.586065248
> 0x0009073c 0x000000010009073c 1116162967.247463 1116162967.246530656 591.586065248
> 0x0009073c 0x000000010009073c 1116162967.247476 1116162967.246530656 591.586065248
> 0x0009073c 0x000000010009073c 1116162967.247489 1116162967.246530656 591.586065248
>
> where the first two values are jiffies and jiffies_64, the next two
> are do_gettimeofday and current_kernel_time, and the last value is
> jiffies_to_timespec.  This output was recorded after 16 minutes of
> uptime.  Shouldn't the last value be the same as the uptime?  I have
> attached an output file covering boot until the time the function
> resets the struct and starts counting from the beginning.  Is this a
> bug or am I missing something here?

You are assuming that jiffies starts at zero at boot time.  This is clearly
not so, even from your printouts.  (It starts at a value near the overflow
of the low-order 32 bits, to flush out problems with the rollover.)
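Concretely (assuming HZ=1000 and the INITIAL_JIFFIES definition as I recall it from 2.6-era jiffies.h - check your tree to confirm), the numbers in the report are consistent with this:

```python
HZ = 1000                                  # common 2.6-era x86 value
# (unsigned long)(unsigned int)(-300 * HZ), i.e. 300 seconds short of
# the 32-bit wrap, so wraparound bugs surface 5 minutes after boot:
INITIAL_JIFFIES = (-300 * HZ) & 0xFFFFFFFF

reported_jiffies = 0x0009073C              # from the output above
apparent_uptime = reported_jiffies / HZ    # what jiffies_to_timespec shows
true_uptime = ((reported_jiffies - INITIAL_JIFFIES) & 0xFFFFFFFF) / HZ
# apparent_uptime is ~591 s; true_uptime is ~891 s (about 15 minutes),
# roughly matching the "16 minutes of uptime" in the report.
```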
-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/

^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Linux does not care for data integrity (was: Disk write cache)
  2005-05-16 15:06                                 ` Alan Cox
  2005-05-16 15:40                                   ` Matthias Andree
@ 2005-05-29 21:02                                   ` Greg Stark
  2005-05-29 21:16                                     ` Matthias Andree
  1 sibling, 1 reply; 150+ messages in thread
From: Greg Stark @ 2005-05-29 21:02 UTC (permalink / raw)
  To: Alan Cox; +Cc: Matthias Andree, Arjan van de Ven, Linux Kernel Mailing List


Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> I think you need to get real if you want that degree of integrity with a
> PC
> 
> Your typical PC setup means your precious data
...

All of your listed cases are low-probability events. You're quite right that
low-probability errors will always be present -- you could have just listed
cosmic rays and been finished. They're by far the most common such source of
errors.

But that doesn't mean we should just throw up our hands and say there's no way
to make computers work right, let's go home.

Making computer systems that don't randomly trash file systems in the case of
power outages isn't a hard problem. It's been solved for decades. That's *why*
fsync exists.

Oracle, Sybase, Postgres, other databases have hard requirements. They
guarantee that when they acknowledge a transaction commit the data has been
written to non-volatile media and will be recoverable even in the face of a
routine power loss.

They meet this requirement just fine on SCSI drives (where write caching
generally ships disabled) and on any OS where fsync issues a cache flush. If
the OS doesn't successfully flush the data to disk on fsync then it's quite
likely that any routine power outage will mean transactions are lost. That's
just ridiculous.

Worse, if the disk flushes the data to disk out of order it's quite likely the
entire database will be corrupted on any simple power outage. I'm not clear
whether that's the case for any common drives.
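The commit contract described here reduces to one rule - don't acknowledge until fsync() has returned. A minimal sketch (assuming, as the whole thread debates, that the drive actually honors the flush):

```python
import os
import tempfile

def committed_write(path: str, record: bytes) -> None:
    """Append a record and return only after fsync(); a database would
    acknowledge the transaction commit only after this returns."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, record)
        os.fsync(fd)     # data must reach non-volatile media before ack
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "wal.log")   # hypothetical write-ahead log
    committed_write(log, b"BEGIN;INSERT;COMMIT\n")
    with open(log, "rb") as f:
        data = f.read()
```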

-- 
greg


^ permalink raw reply	[flat|nested] 150+ messages in thread

* Re: Linux does not care for data integrity (was: Disk write cache)
  2005-05-29 21:02                                   ` Linux does not care for data integrity (was: Disk write cache) Greg Stark
@ 2005-05-29 21:16                                     ` Matthias Andree
  2005-05-30  6:04                                       ` Greg Stark
  2005-06-01 19:02                                       ` Linux does not care for data integrity Bill Davidsen
  0 siblings, 2 replies; 150+ messages in thread
From: Matthias Andree @ 2005-05-29 21:16 UTC (permalink / raw)
  To: Greg Stark
  Cc: Alan Cox, Matthias Andree, Arjan van de Ven, Linux Kernel Mailing List

On Sun, 29 May 2005, Greg Stark wrote:

> Oracle, Sybase, Postgres, other databases have hard requirements. They
> guarantee that when they acknowledge a transaction commit the data has been
> written to non-volatile media and will be recoverable even in the face of a
> routine power loss.
> 
> They meet this requirement just fine on SCSI drives (where write caching
> generally ships disabled) and on any OS where fsync issues a cache flush. If

I don't know what facts "generally ships disabled" is based on; all of
the more recent SCSI drives (non-SCA type, though) I acquired came with
write cache enabled, and some also with the queue algorithm modifier set to 1.

> Worse, if the disk flushes the data to disk out of order it's quite
> likely the entire database will be corrupted on any simple power
> outage. I'm not clear whether that's the case for any common drives.

It's a matter of enforcing write order. To what extent such ordering
constraints are propagated by file systems and the VFS layer down to the
hardware is the grand question.

-- 
Matthias Andree


* Re: Linux does not care for data integrity (was: Disk write cache)
  2005-05-29 21:16                                     ` Matthias Andree
@ 2005-05-30  6:04                                       ` Greg Stark
  2005-05-30  8:21                                         ` Matthias Andree
  2005-06-01 19:02                                       ` Linux does not care for data integrity Bill Davidsen
  1 sibling, 1 reply; 150+ messages in thread
From: Greg Stark @ 2005-05-30  6:04 UTC (permalink / raw)
  To: Matthias Andree
  Cc: Greg Stark, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List

Matthias Andree <matthias.andree@gmx.de> writes:

> On Sun, 29 May 2005, Greg Stark wrote:
> 
> > They meet this requirement just fine on SCSI drives (where write caching
> > generally ships disabled) and on any OS where fsync issues a cache flush. If
> 
> I don't know what facts "generally ships disabled" is based on, all of
> the more recent SCSI drives (non SCA type though) I acquired came with
> write cache enabled and some also with queue algorithm modifier set to 1.

People routinely post "Why does this cheap IDE drive outperform my shiny new
high end SCSI drive?" questions to the postgres mailing list. To which people
point out that the IDE numbers they've presented are physically impossible for
a 7200 RPM drive, while the SCSI numbers agree with the average rotational
latency calculated from whatever speed their SCSI drives spin at.

> > Worse, if the disk flushes the data to disk out of order it's quite
> > likely the entire database will be corrupted on any simple power
> > outage. I'm not clear whether that's the case for any common drives.
> 
> It's a matter of enforcing write order. In how far such ordering
> constraints are propagated by file systems, VFS layer, down to the
> hardware, is the grand question.

Well, guaranteeing write order will at least mean the database isn't complete
garbage after a power event.

It still means lost transactions, something that isn't going to be acceptable
for any real-life business where those transactions are actual dollars.

-- 
greg



* Re: Linux does not care for data integrity (was: Disk write cache)
  2005-05-30  6:04                                       ` Greg Stark
@ 2005-05-30  8:21                                         ` Matthias Andree
  0 siblings, 0 replies; 150+ messages in thread
From: Matthias Andree @ 2005-05-30  8:21 UTC (permalink / raw)
  To: Greg Stark
  Cc: Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List

On Mon, 30 May 2005, Greg Stark wrote:

> Matthias Andree <matthias.andree@gmx.de> writes:
> 
> > On Sun, 29 May 2005, Greg Stark wrote:
> > 
> > > They meet this requirement just fine on SCSI drives (where write caching
> > > generally ships disabled) and on any OS where fsync issues a cache flush. If
> > 
> > I don't know what facts "generally ships disabled" is based on, all of
> > the more recent SCSI drives (non SCA type though) I acquired came with
> > write cache enabled and some also with queue algorithm modifier set to 1.
> 
> People routinely post "Why does this cheap IDE drive outperform my shiny new
> high end SCSI drive?" questions to the postgres mailing list. To which people
> point out the IDE numbers they've presented are physically impossible for a
> 7200 RPM drive and the SCSI numbers agree appropriately with an average
> rotational latency calculated from whatever speed their SCSI drives are.

This may have a cause other than the vendor default or the saved
setting being WCE = 0, Queue Algorithm Modifier = 0...

I would really appreciate it if the kernel printed a warning for every
partition mounted that cannot both enforce write order and guarantee
synchronous completion for f(data)sync, based on the drive's write
cache, file system type, current write barrier support and all that.

> > It's a matter of enforcing write order. In how far such ordering
> > constraints are propagated by file systems, VFS layer, down to the
> > hardware, is the grand question.
> 
> Well guaranteeing write order will at least mean the database isn't complete
> garbage after a power event.
> 
> It still means lost transactions, something that isn't going to be acceptable
> for any real-life business where those transactions are actual dollars.

Right, synchronous completion is the other issue. I want the kernel to
tell me if it's capable of doing that on a particular partition (given
hardware settings WRT cache, drivers, file system, and all that). Either
in the docs or if it's too confusing via dmesg.

-- 
Matthias Andree


* Re: Linux does not care for data integrity
  2005-05-29 21:16                                     ` Matthias Andree
  2005-05-30  6:04                                       ` Greg Stark
@ 2005-06-01 19:02                                       ` Bill Davidsen
  2005-06-01 22:02                                         ` Matthias Andree
                                                           ` (2 more replies)
  1 sibling, 3 replies; 150+ messages in thread
From: Bill Davidsen @ 2005-06-01 19:02 UTC (permalink / raw)
  To: Matthias Andree; +Cc: Alan Cox, Arjan van de Ven, Linux Kernel Mailing List

Matthias Andree wrote:
> On Sun, 29 May 2005, Greg Stark wrote:
> 
> 
>>Oracle, Sybase, Postgres, other databases have hard requirements. They
>>guarantee that when they acknowledge a transaction commit the data has been
>>written to non-volatile media and will be recoverable even in the face of a
>>routine power loss.
>>
>>They meet this requirement just fine on SCSI drives (where write caching
>>generally ships disabled) and on any OS where fsync issues a cache flush. If
> 
> 
> I don't know what facts "generally ships disabled" is based on, all of
> the more recent SCSI drives (non SCA type though) I acquired came with
> write cache enabled and some also with queue algorithm modifier set to 1.
> 
> 
>>Worse, if the disk flushes the data to disk out of order it's quite
>>likely the entire database will be corrupted on any simple power
>>outage. I'm not clear whether that's the case for any common drives.
> 
> 
> It's a matter of enforcing write order. In how far such ordering
> constraints are propagated by file systems, VFS layer, down to the
> hardware, is the grand question.
> 
The problem is that many of the options required to make that happen in the 
o/s, hardware, and application are going to kill performance. And even 
if you can control write order, unless you get confirmation that the write 
reached final non-volatile media you can get a sane database but still lose 
transactions.

If there was a way for the o/s to know when a physical write was done 
other than using flushes to force completion, then overall performance 
could be higher, but individual transactions might have greater latency. 
And the app could use fsync to force order of write as needed. In many 
cases groups of writes can be done in any order as long as they are all 
done before the next logical step takes place.

This would change the meaning of fsync from "force out the data" to 
"wait for the data to be written" in some implementations.

-- 
bill davidsen <davidsen@tmr.com>
   CTO TMR Associates, Inc
   Doing interesting things with small computers since 1979


* Re: Linux does not care for data integrity
  2005-06-01 19:02                                       ` Linux does not care for data integrity Bill Davidsen
@ 2005-06-01 22:02                                         ` Matthias Andree
  2005-06-02  0:12                                           ` Bill Davidsen
  2005-06-02  0:36                                         ` Jeff Garzik
  2005-06-02  8:53                                         ` Helge Hafting
  2 siblings, 1 reply; 150+ messages in thread
From: Matthias Andree @ 2005-06-01 22:02 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List

On Wed, 01 Jun 2005, Bill Davidsen wrote:

> >It's a matter of enforcing write order. In how far such ordering
> >constraints are propagated by file systems, VFS layer, down to the
> >hardware, is the grand question.
>
> The problem is that in many options required to make that happen in the 
> o/s, hardware, and application are going to kill performance. And even 
> if you can control order of write, unless you can get write to final 
> non-volatile media control you can get a sane database but still lose 
> transactions.
> 
> If there was a way for the o/s to know when a physical write was done 
> other than using flushes to force completion, then overall performance 
> could be higher, but individual transaction might have greater latency. 
> And the app could use fsync to force order of write as needed. In many 
> cases groups of writes can be done in any order as long as they are all 
> done before the next logical step takes place.

I have a déjà-vu, and I do believe that this discussion has taken place
in this list before, perhaps with a slightly different alignment, and
likely in the context of mail transfer agents and perhaps synchronous
directory (data) updates (file creation and such). Exposing a bit of the
queueing to the user space through new syscalls may be an interesting
experiment, although I do not have the resources to provide code.
Something like fsync() that doesn't flush the whole file system (which
appears to be the most common implementation) but tracks what is needed,
and that returns when data for a given file is on disk.

> This would change the meaning of fsync from "force out the data" to 
> "wait for the data to be written" in some implementations.

Naming suggestion: flazysync()

-- 
Matthias Andree


* Re: Linux does not care for data integrity
  2005-06-01 22:02                                         ` Matthias Andree
@ 2005-06-02  0:12                                           ` Bill Davidsen
  0 siblings, 0 replies; 150+ messages in thread
From: Bill Davidsen @ 2005-06-02  0:12 UTC (permalink / raw)
  To: Matthias Andree; +Cc: Alan Cox, Arjan van de Ven, Linux Kernel Mailing List

Matthias Andree wrote:

>On Wed, 01 Jun 2005, Bill Davidsen wrote:
>
>  
>
>>>It's a matter of enforcing write order. In how far such ordering
>>>constraints are propagated by file systems, VFS layer, down to the
>>>hardware, is the grand question.
>>>      
>>>
>>The problem is that in many options required to make that happen in the 
>>o/s, hardware, and application are going to kill performance. And even 
>>if you can control order of write, unless you can get write to final 
>>non-volatile media control you can get a sane database but still lose 
>>transactions.
>>
>>If there was a way for the o/s to know when a physical write was done 
>>other than using flushes to force completion, then overall performance 
>>could be higher, but individual transaction might have greater latency. 
>>And the app could use fsync to force order of write as needed. In many 
>>cases groups of writes can be done in any order as long as they are all 
>>done before the next logical step takes place.
>>    
>>
>
>I have a déjà-vu, and I do believe that this discussion has taken place
>in this list before, perhaps with a slightly different alignment, and
>likely in the context of mail transfer agents and perhaps synchronous
>directory (data) updates (file creation and such). Exposing a bit of the
>queueing to the user space through new syscalls may be an interesting
>experiment, although I do not have the resources to provide code.
>Something like fsync() that doesn't flush the whole file system (which
>appears to be the most common implementation) but tracks what is needed,
>and that returns when data for a given file is on disk.
>  
>

What I had in mind was not a "push" to flush anything anywhere, but 
rather a watch. As a hypothetical, I open a file and every time a 
write() is done a counter is incremented in the fd. That's the easy 
part. Then every time a physical write is completed the count is 
reduced. To allow for write combining the count could be in bytes rather 
than syscalls and physical operations. That's the hard part; I don't 
think the hardware is telling. In addition, writes may obviously be 
combined across i/o related to several fds. But if that could be done, 
then fsync becomes "wait until my buffered byte count drops to zero," 
which could be an ioctl. Just having such a checkpoint would address 
some of the data coherency issues.

AFAIK this isn't possible with common ATA devices, and it clearly 
doesn't address every desirable feature. In spite of that, if someone 
better qualified to assess the problems and benefits cares to comment, 
fine. If not, at least I think I explained what I was thinking more clearly.

>  
>
>>This would change the meaning of fsync from "force out the data" to 
>>"wait for the data to be written" in some implementations.
>>    
>>
>
>Naming suggestion: flazysync()
>
>  
>


-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979



* Re: Linux does not care for data integrity
  2005-06-01 19:02                                       ` Linux does not care for data integrity Bill Davidsen
  2005-06-01 22:02                                         ` Matthias Andree
@ 2005-06-02  0:36                                         ` Jeff Garzik
  2005-06-02  1:37                                           ` Bill Davidsen
  2005-06-02  8:53                                         ` Helge Hafting
  2 siblings, 1 reply; 150+ messages in thread
From: Jeff Garzik @ 2005-06-02  0:36 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List

Bill Davidsen wrote:
> This would change the meaning of fsync from "force out the data" to 
> "wait for the data to be written" in some implementations.

This is the meaning of fsync:  copies all in-core parts of a file to 
disk, and waits until the device reports that all parts are on stable 
storage.

Anything less is a bug.

	Jeff




* Re: Linux does not care for data integrity
  2005-06-02  0:36                                         ` Jeff Garzik
@ 2005-06-02  1:37                                           ` Bill Davidsen
  2005-06-02  1:54                                             ` Jeff Garzik
  0 siblings, 1 reply; 150+ messages in thread
From: Bill Davidsen @ 2005-06-02  1:37 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List

Jeff Garzik wrote:

> Bill Davidsen wrote:
>
>> This would change the meaning of fsync from "force out the data" to 
>> "wait for the data to be written" in some implementations.
>
>
> This is the meaning of fsync:  copies all in-core parts of a file to 
> disk, and waits until the device reports that all parts are on stable 
> storage.
>
> Anything less is a bug. 


How about anything more? The truth is that much common hardware doesn't 
really make the cache-to-disk move visible, and turning off the cache really 
hurts performance. And it would appear that fsync forces a lot more data 
out of memory than just the blocks for the file in question.

However, the point I was making is that it would be useful to be able to 
tell when the write to non-volatile took place, not to force that to 
happen. Not to do anything which would flush a lot of other stuff and 
busy the drive. What I suggest is NOT fsync, just a way to assure ordering.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979



* Re: Linux does not care for data integrity
  2005-06-02  1:37                                           ` Bill Davidsen
@ 2005-06-02  1:54                                             ` Jeff Garzik
  0 siblings, 0 replies; 150+ messages in thread
From: Jeff Garzik @ 2005-06-02  1:54 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List

Bill Davidsen wrote:
> How about anything more? The truth is that much common hardware doesn't 
> really make the cache to disk move visible, and turning off cache really 
> hurts performance. And it would appear that fsync force a lot more data 
> out of memory than just the blocks for the file in question.

Correct.  That's the tradeoff with the ATA interface:  you must be aware 
of the cache flush requirements when designing a solution such as a 
database that really cares about fsync(2), or a journalling filesystem.


> However, the point I was making is that it would be useful to be able to 
> tell when the write to non-volatile took place, not to force that to 
> happen. Not to do anything which would flush a lot of other stuff and 
> busy the drive. What I suggest is NOT fsync, just a way to assure ordering.

To make that possible, POSIX must become a transactional, async I/O 
API... :)

	Jeff




* Re: Linux does not care for data integrity
  2005-06-01 19:02                                       ` Linux does not care for data integrity Bill Davidsen
  2005-06-01 22:02                                         ` Matthias Andree
  2005-06-02  0:36                                         ` Jeff Garzik
@ 2005-06-02  8:53                                         ` Helge Hafting
  2005-06-02 12:00                                           ` Bill Davidsen
  2 siblings, 1 reply; 150+ messages in thread
From: Helge Hafting @ 2005-06-02  8:53 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List

Bill Davidsen wrote:

> Matthias Andree wrote:
>
>> On Sun, 29 May 2005, Greg Stark wrote:
>>
>>
>>> Oracle, Sybase, Postgres, other databases have hard requirements. They
>>> guarantee that when they acknowledge a transaction commit the data 
>>> has been
>>> written to non-volatile media and will be recoverable even in the 
>>> face of a
>>> routine power loss.
>>>
>>> They meet this requirement just fine on SCSI drives (where write 
>>> caching
>>> generally ships disabled) and on any OS where fsync issues a cache 
>>> flush. If
>>
>>
>>
>> I don't know what facts "generally ships disabled" is based on, all of
>> the more recent SCSI drives (non SCA type though) I acquired came with
>> write cache enabled and some also with queue algorithm modifier set 
>> to 1.
>>
>>
>>> Worse, if the disk flushes the data to disk out of order it's quite
>>> likely the entire database will be corrupted on any simple power
>>> outage. I'm not clear whether that's the case for any common drives.
>>
>>
>>
>> It's a matter of enforcing write order. In how far such ordering
>> constraints are propagated by file systems, VFS layer, down to the
>> hardware, is the grand question.
>>
> The problem is that in many options required to make that happen in 
> the o/s, hardware, and application are going to kill performance. And 
> even if you can control order of write, unless you can get write to 
> final non-volatile media control you can get a sane database but still 
> lose transactions.
>
> If there was a way for the o/s to know when a physical write was done 
> other than using flushes to force completion, then overall performance 
> could be higher, but individual transaction might have greater 
> latency. And the app could use fsync to force order of write as 
> needed. In many cases groups of writes can be done in any order as 
> long as they are all done before the next logical step takes place. 

There is a workaround.  Get a UPS just for the disks.  It doesn't have to be
big, just enough to keep the disks going long enough to commit their
caches after the rest of the machine dies from a power loss.  Such a small
unit could possibly fit inside the cabinet, avoiding the trouble with
people stepping on the power cord.

With this in place, any write that makes it from the controller to the
disk is safely stored for all practical purposes.

Helge Hafting


* Re: Linux does not care for data integrity
  2005-06-02  8:53                                         ` Helge Hafting
@ 2005-06-02 12:00                                           ` Bill Davidsen
  2005-06-02 13:33                                             ` Lennart Sorensen
  0 siblings, 1 reply; 150+ messages in thread
From: Bill Davidsen @ 2005-06-02 12:00 UTC (permalink / raw)
  To: Helge Hafting
  Cc: Matthias Andree, Alan Cox, Arjan van de Ven, Linux Kernel Mailing List

Helge Hafting wrote:

> Bill Davidsen wrote:
>
>> Matthias Andree wrote:
>>
>>> On Sun, 29 May 2005, Greg Stark wrote:
>>>
>>>
>>>> Oracle, Sybase, Postgres, other databases have hard requirements. They
>>>> guarantee that when they acknowledge a transaction commit the data 
>>>> has been
>>>> written to non-volatile media and will be recoverable even in the 
>>>> face of a
>>>> routine power loss.
>>>>
>>>> They meet this requirement just fine on SCSI drives (where write 
>>>> caching
>>>> generally ships disabled) and on any OS where fsync issues a cache 
>>>> flush. If
>>>
>>>
>>>
>>>
>>> I don't know what facts "generally ships disabled" is based on, all of
>>> the more recent SCSI drives (non SCA type though) I acquired came with
>>> write cache enabled and some also with queue algorithm modifier set 
>>> to 1.
>>>
>>>
>>>> Worse, if the disk flushes the data to disk out of order it's quite
>>>> likely the entire database will be corrupted on any simple power
>>>> outage. I'm not clear whether that's the case for any common drives.
>>>
>>>
>>>
>>>
>>> It's a matter of enforcing write order. In how far such ordering
>>> constraints are propagated by file systems, VFS layer, down to the
>>> hardware, is the grand question.
>>>
>> The problem is that in many options required to make that happen in 
>> the o/s, hardware, and application are going to kill performance. And 
>> even if you can control order of write, unless you can get write to 
>> final non-volatile media control you can get a sane database but 
>> still lose transactions.
>>
>> If there was a way for the o/s to know when a physical write was done 
>> other than using flushes to force completion, then overall 
>> performance could be higher, but individual transaction might have 
>> greater latency. And the app could use fsync to force order of write 
>> as needed. In many cases groups of writes can be done in any order as 
>> long as they are all done before the next logical step takes place. 
>
>
> There is a workaround.  Get an UPS just for the disks.  It don't have 
> to be
> big, just enough to keep the disks going long enough to commit their
> caches after the rest of the machine died from a power loss.  Such a 
> small
> unit could possibly fit inside the cabinet, avoiding the trouble with
> people stepping on the power cord.
>
> With this in place, any write that makes it from the controller to the
> disk is safely stored for all practical purposes.


Unfortunately even drives in a dual power tray with redundant power from 
separate UPS sources will occasionally have a power failure. Proved that 
last month: the power strip in the rack failed, dumped all the load on 
the other leg, and the surge tripped a breaker. Had an APC UPS in my office 
fail in a mode which dropped power, waited for the battery to 
trickle-charge a bit, then repeated. Looks to be losing half 
of a full-wave rectifier.

The point is that power failures WILL HAPPEN, even with good backups. 
The goal should be to prevent excessive and avoidable data damage when 
it does.

Shameless plug: for office use I changed from APC to Belkin on all new 
units, they have had Linux drivers for some time now, and I like to 
support those who support Linux.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979



* Re: Linux does not care for data integrity
  2005-06-02 12:00                                           ` Bill Davidsen
@ 2005-06-02 13:33                                             ` Lennart Sorensen
  2005-06-04 13:37                                               ` Bill Davidsen
  0 siblings, 1 reply; 150+ messages in thread
From: Lennart Sorensen @ 2005-06-02 13:33 UTC (permalink / raw)
  To: Bill Davidsen
  Cc: Helge Hafting, Matthias Andree, Alan Cox, Arjan van de Ven,
	Linux Kernel Mailing List

On Thu, Jun 02, 2005 at 08:00:34AM -0400, Bill Davidsen wrote:
> Unfortunately even drives in a dual power tray with redundany power from 
> separate UPS sources will occasionally have a power failure. Proved that 
> last month, the power strip in the rack failed, dumped all the load on 
> the other leg, the surge tripped a breaker. Had an APC UPS in my office 
> fail in a mode which dropped power, waited for the battery to trickle 
> charge to charge the battery a bit, then repeat. Looks to be losing half 
> of a full wave rectifier.
> 
> The point is that power failures WILL HAPPEN, even with good backups. 
> The goal should be to prevent excessive and avoidable data damage when 
> it does.
> 
> Shameless plug: for office use I changed from APC to Belkin on all new 
> units, they have had Linux drivers for some time now, and I like to 
> support those who support Linux.

Hasn't apcupsd existed for at least a decade?  Works rather well for me.
Hard to imagine better linux/unix support than APC seems to have
provided so far.

For some reason Belkin screams cheap junk to me.  Maybe that's because
that is what you always see for sale with that brand on it.  They may
have nice stuff that I just haven't seen because it isn't carried by
most stores.

Len Sorensen


* Re: Linux does not care for data integrity
  2005-06-02 13:33                                             ` Lennart Sorensen
@ 2005-06-04 13:37                                               ` Bill Davidsen
  2005-06-04 15:31                                                 ` Bernd Eckenfels
  0 siblings, 1 reply; 150+ messages in thread
From: Bill Davidsen @ 2005-06-04 13:37 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Helge Hafting, Matthias Andree, Alan Cox, Arjan van de Ven,
	Linux Kernel Mailing List

Lennart Sorensen wrote:

>On Thu, Jun 02, 2005 at 08:00:34AM -0400, Bill Davidsen wrote:
>  
>
>>Unfortunately even drives in a dual power tray with redundany power from 
>>separate UPS sources will occasionally have a power failure. Proved that 
>>last month, the power strip in the rack failed, dumped all the load on 
>>the other leg, the surge tripped a breaker. Had an APC UPS in my office 
>>fail in a mode which dropped power, waited for the battery to trickle 
>>charge to charge the battery a bit, then repeat. Looks to be losing half 
>>of a full wave rectifier.
>>
>>The point is that power failures WILL HAPPEN, even with good backups. 
>>The goal should be to prevent excessive and avoidable data damage when 
>>it does.
>>
>>Shameless plug: for office use I changed from APC to Belkin on all new 
>>units, they have had Linux drivers for some time now, and I like to 
>>support those who support Linux.
>>    
>>
>
>Hasn't apcupsd existed for at least a decade?  Works rather well for me.
>Hard to imagine better linux/unix support than APC seems to have
>provided so far.
>  
>

I thought apcupsd was a third-party project; SourceForge shows it as a 
project. Didn't know APC was actually "providing" anything; is the 
driver on the CD now? It sure wasn't on the APC CD I had. I did have it at 
one time, but it didn't come with the UPS (at that time).

>For some reason Belkin screms cheap junk to me.  Maybe that's because
>that is what you always see for sale with that brand on it.  They may
>have nice stuff that I just haven't seen because it isn't carried by
>most stores.
>
You don't have Staples or Wal-Mart? Office Max did drop the UPSes; the 
local store manager said the issue was margin, and they hadn't had enough 
returns on either brand to be meaningful.

-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979



* Re: Linux does not care for data integrity
  2005-06-04 13:37                                               ` Bill Davidsen
@ 2005-06-04 15:31                                                 ` Bernd Eckenfels
  0 siblings, 0 replies; 150+ messages in thread
From: Bernd Eckenfels @ 2005-06-04 15:31 UTC (permalink / raw)
  To: linux-kernel

In article <42A1AE8B.5000907@tmr.com> you wrote:
> I thought apcuspd was a third party project, sourceforce shows it as a 
> project. Didn't know APC was actually "providing" anything, is the 
> driver on the CD now? Sure wasn't on the APC CD I had, I did have it at 
> one time, but it didn't come with the UPS (at that time).

And I haven't found a UPS daemon yet that can query the SNMP cards out of
the box; one has to hack that oneself. However, this is still no defense
against data corruption on power loss. I mean: there is a fuse in your
server. And UPSes are known to fail even with power attached.

Gruss
Bernd


* RE: Disk write cache (Was: Hyper-Threading Vulnerability)
@ 2005-05-18 22:11 Lincoln Dale (ltd)
  0 siblings, 0 replies; 150+ messages in thread
From: Lincoln Dale (ltd) @ 2005-05-18 22:11 UTC (permalink / raw)
  To: John Stoffel; +Cc: Eric D. Mudama, Robert Hancock, linux-kernel

 

> -----Original Message-----
> From: John Stoffel [mailto:john@stoffel.org] 
> Sent: Wednesday, 18 May 2005 11:49 PM
> To: Lincoln Dale (ltd)
> Cc: Eric D. Mudama; Robert Hancock; linux-kernel
> Subject: RE: Disk write cache (Was: Hyper-Threading Vulnerability)
> 
> >>>>> "Lincoln" == Lincoln Dale \(ltd\) <Lincoln> writes:
> 
> Lincoln> why don't drive vendors create firmware which reserved a 
> Lincoln> cache-sized (e.g. 2MB) hole of internal drive space 
> somewhere 
> Lincoln> for such an event, and a "cache flush caused by hard-reset"
> Lincoln> simply caused it to write the cache to a fixed (contiguous) 
> Lincoln> area of disk.
> 
> Well, if you're losing power in the next Xmilliseconds, do 
> you have the time to seek to the cache holding area and 
> settle down the head (since you could have done a seek from 
> the edge of the disk to the middle), start writing, etc? 

I believe it's possible. Rationale:

 [1] ATX power specification (Google finds this for me at
http://www.formfactors.org/developer%5Cspecs%5CATX12V_1_3dg.pdf).
     Section 3.2.11 (Voltage Hold-up Time) states:

	The power supply should maintain output regulation per Section
	3.2.1 despite a loss of input power at the low-end nominal
	range (115 VAC / 57 Hz or 230 VAC / 47 Hz) at maximum
	continuous output load, as applicable, for a minimum of 17 ms.

     The assumption here is that T6 in figure 5 de-asserts the
     POWER_OK signal early in that "minimum of 17 ms"; the spec
     (unfortunately) only calls for >= 1 msec.

     Once again, I see that there could be a market for a combination
     of p/s & peripherals that could make use of it.
     Let's say that we DO have 17 msec.

 [2] Hard drive response times
     Picking a 'standard' high-end hard drive (Maxtor Atlas 10K V
     SCSI disk):

	Average seek + rotational latency is measured at 7.6 msec.
	Transfer rates are 89.5 MB/s at the beginning of the disk and
	53.9 MB/s at the end.
	(source:
	http://www.storagereview.com/articles/200411/200411028D300L0_2.html)

     Allowing 8 msec for seek time, and writing at the 'slow' side of
     the disk, writing 2 MB could take ~37 msec (2 / 53.9). Allow 50%
     overhead here, and we have 55 msec.

     55 + 8 = 63 msec.

OK, 63 msec doesn't fit into 17 msec. But as I say, a combination of a
p/s with longer hold-up and/or larger caps (and/or a more innovative
design by a case or p/s manufacturer which creates a dedicated
peripheral power bus) could close that gap.

> Seems better to have a cache sized flash ram instead where 
> you could just keep the data there in case of power loss.  
> 
> But that's expensive, and not something most people need...

Indeed, and that is what MS has been targeting. (Flash isn't that
expensive, but flash write times are...)



cheers,

lincoln.


> 
> John
> 


* RE: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-18  9:45 Lincoln Dale (ltd)
@ 2005-05-18 13:48 ` John Stoffel
  0 siblings, 0 replies; 150+ messages in thread
From: John Stoffel @ 2005-05-18 13:48 UTC (permalink / raw)
  To: Lincoln Dale (ltd); +Cc: Eric D. Mudama, Robert Hancock, linux-kernel

>>>>> "Lincoln" == Lincoln Dale \(ltd\) <Lincoln> writes:

Lincoln> why don't drive vendors create firmware which reserved a
Lincoln> cache-sized (e.g. 2MB) hole of internal drive space somewhere
Lincoln> for such an event, and a "cache flush caused by hard-reset"
Lincoln> simply caused it to write the cache to a fixed (contiguous)
Lincoln> area of disk.

Well, if you're losing power in the next Xmilliseconds, do you have
the time to seek to the cache holding area and settle down the head
(since you could have done a seek from the edge of the disk to the
middle), start writing, etc?  Seems better to have a cache sized flash
ram instead where you could just keep the data there in case of power
loss.  

But that's expensive, and not something most people need...

John


* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-18  7:16 Paul Zimmerman
  2005-05-18 11:10 ` Richard B. Johnson
@ 2005-05-18 12:47 ` Stephan Wonczak
  1 sibling, 0 replies; 150+ messages in thread
From: Stephan Wonczak @ 2005-05-18 12:47 UTC (permalink / raw)
  To: Paul Zimmerman; +Cc: linux-kernel, mrmacman_g4

Paul Zimmerman wrote:
> On May 17, 2005, at 21:41:39, Kyle Moffett wrote:
> 
>>I've seen some articles recently on a micro-punchcard technology that  uses
>>grids of thousands of miniature needles and sheets of polymer plastic
> 
> 
> Bwa-ha-ha! That's rich. You should have saved that one for next April
> 1st!
> Does it use micro-relay logic to drive the micro-punchcard reader? Or
> does it have nano-technology vacuum tube logic circuits?
> 
> Good one.

    No, actually. That one's for real. See:

    http://www.zurich.ibm.com/st/storage/millipede.html

    Looks like it will be the next generation of storage after rotating
discs.
    (*grumble* ... forgot to hit 'reply all'. Sorry!)

		Stephan Wonczak
		
     "I haven't lost my mind; I know exactly where I left it."
     "The meaning of my life is to make me crazy"



* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-18  7:16 Paul Zimmerman
@ 2005-05-18 11:10 ` Richard B. Johnson
  2005-05-18 12:47 ` Stephan Wonczak
  1 sibling, 0 replies; 150+ messages in thread
From: Richard B. Johnson @ 2005-05-18 11:10 UTC (permalink / raw)
  To: Paul Zimmerman; +Cc: linux-kernel, mrmacman_g4

On Wed, 18 May 2005, Paul Zimmerman wrote:

> On May 17, 2005, at 21:41:39, Kyle Moffett wrote:
>> I've seen some articles recently on a micro-punchcard technology that  uses
>> grids of thousands of miniature needles and sheets of polymer plastic
>
> Bwa-ha-ha! That's rich. You should have saved that one for next April
> 1st!
> Does it use micro-relay logic to drive the micro-punchcard reader? Or
> does it have nano-technology vacuum tube logic circuits?
>
> Good one.
>
> -Paul

Actually carbon nanotubes, vacuum tubes not needed! May need a
"filament" transformer, though :^)


Cheers,
Dick Johnson
Penguin : Linux version 2.6.11.9 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by Dictator Bush.
                  98.36% of all statistics are fiction.


* RE: Disk write cache (Was: Hyper-Threading Vulnerability)
@ 2005-05-18  9:45 Lincoln Dale (ltd)
  2005-05-18 13:48 ` John Stoffel
  0 siblings, 1 reply; 150+ messages in thread
From: Lincoln Dale (ltd) @ 2005-05-18  9:45 UTC (permalink / raw)
  To: Eric D. Mudama, Robert Hancock; +Cc: linux-kernel

Eric,

> On 5/16/05, Robert Hancock <hancockr@shaw.ca> wrote:
> > If the power to the drive is truly just cut, then this is basically 
> > what will happen. However, I have heard, for what it's 
> worth, that in 
> > many cases if you pull the AC power from a typical PC, the 
> Power Good 
> > signal from the PSU will be de-asserted, which triggers the 
> Reset line 
> > on all the buses, which triggers the ATA reset line, which triggers 
> > the drive to finish writing out the sector it is doing. There is 
> > likely enough capacitance in the power supply to do that 
> before the voltage drops off.
> 
> Yes, but as you said this isn't a power loss event.  It is a 
> hard reset with a full write cache, which all drives on the 
> market today respond to by flushing the cache.
> 
> According to the spec the time to flush can exceed 30s, so 
> your PSU better have some honkin caps on it to ensure data 
> integrity when you yank the power cord out of the wall.

Why don't drive vendors create firmware which reserves a cache-sized
(e.g. 2 MB) hole of internal drive space somewhere for such an event,
so that a "cache flush caused by hard-reset" simply causes it to write
the cache to a fixed (contiguous) area of disk?

The same drive firmware could, on power-on, check that area and
'write back' the data to the correct locations.
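The dump-on-reset / replay-on-power-on behaviour being proposed could
be sketched like this (purely hypothetical, in Python for readability;
real drive firmware obviously works nothing like this, and all the
names are made up):

```python
# Hypothetical sketch of the proposed firmware behaviour: on a
# hard-reset with a dirty write cache, dump the whole cache (data plus
# target LBAs) to one fixed, contiguous reserved area; on the next
# power-on, replay it to the real locations.

RESERVED_LBA = -1   # stands in for a reserved area outside user LBA space

class Drive:
    def __init__(self):
        self.media = {}   # lba -> data; stands in for the platters
        self.cache = {}   # dirty write cache: lba -> data

    def write_cached(self, lba, data):
        self.cache[lba] = data          # host write lands in cache only

    def hard_reset(self):
        # One contiguous write instead of many scattered seeks.
        if self.cache:
            self.media[RESERVED_LBA] = ("DUMP", list(self.cache.items()))
        self.cache.clear()

    def power_on(self):
        marker = self.media.get(RESERVED_LBA)
        if marker and marker[0] == "DUMP":
            for lba, data in marker[1]:   # replay to the real locations
                self.media[lba] = data
            self.media[RESERVED_LBA] = None   # invalidate the dump
```

The point of the sketch is that the reset-time work is a single
sequential write of known size, which is what makes the 17 msec budget
even worth discussing.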

All said and done, why wouldn't a vendor (let's just say "Maxtor" :) )
implement something like this and market it as a feature?
I'd happily spend a few extra bucks for something that, given a modern
PSU holding up power for a few AC cycles (e.g. 50 msec), provided
higher data reliability in case of power failure...


cheers,

lincoln.


* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
@ 2005-05-18  7:16 Paul Zimmerman
  2005-05-18 11:10 ` Richard B. Johnson
  2005-05-18 12:47 ` Stephan Wonczak
  0 siblings, 2 replies; 150+ messages in thread
From: Paul Zimmerman @ 2005-05-18  7:16 UTC (permalink / raw)
  To: linux-kernel; +Cc: mrmacman_g4

On May 17, 2005, at 21:41:39, Kyle Moffett wrote:
>I've seen some articles recently on a micro-punchcard technology that  uses
>grids of thousands of miniature needles and sheets of polymer plastic

Bwa-ha-ha! That's rich. You should have saved that one for next April
1st!
Does it use micro-relay logic to drive the micro-punchcard reader? Or
does it have nano-technology vacuum tube logic circuits?

Good one.

-Paul




* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
  2005-05-17  3:29               ` Disk write cache (Was: Hyper-Threading Vulnerability) Robert Hancock
@ 2005-05-18  4:11                 ` Eric D. Mudama
  0 siblings, 0 replies; 150+ messages in thread
From: Eric D. Mudama @ 2005-05-18  4:11 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel

On 5/16/05, Robert Hancock <hancockr@shaw.ca> wrote:
> If the power to the drive is truly just cut, then this is basically what
> will happen. However, I have heard, for what it's worth, that in many
> cases if you pull the AC power from a typical PC, the Power Good signal
> from the PSU will be de-asserted, which triggers the Reset line on all
> the buses, which triggers the ATA reset line, which triggers the drive
> to finish writing out the sector it is doing. There is likely enough
> capacitance in the power supply to do that before the voltage drops off.

Yes, but as you said this isn't a power loss event.  It is a hard
reset with a full write cache, which all drives on the market today
respond to by flushing the cache.

According to the spec the time to flush can exceed 30s, so your PSU
better have some honkin caps on it to ensure data integrity when you
yank the power cord out of the wall.

--eric


* Re: Disk write cache (Was: Hyper-Threading Vulnerability)
       [not found]             ` <44PRr-6mz-33@gated-at.bofh.it>
@ 2005-05-17  3:29               ` Robert Hancock
  2005-05-18  4:11                 ` Eric D. Mudama
  0 siblings, 1 reply; 150+ messages in thread
From: Robert Hancock @ 2005-05-17  3:29 UTC (permalink / raw)
  To: linux-kernel

Richard B. Johnson wrote:
> Then I suggest you never use such a drive. Anything that does this,
> will end up replacing a good track with garbage. Unless a disk drive
> has a built-in power source such as super-capacitors or batteries, what
> happens during a power-failure is that all electronics stops and
> the discs start coasting. Eventually the heads will crash onto

If the power to the drive is truly just cut, then this is basically what 
will happen. However, I have heard, for what it's worth, that in many 
cases if you pull the AC power from a typical PC, the Power Good signal 
from the PSU will be de-asserted, which triggers the Reset line on all 
the buses, which triggers the ATA reset line, which triggers the drive 
to finish writing out the sector it is doing. There is likely enough 
capacitance in the power supply to do that before the voltage drops off.

> the platter. Older discs had a magnetically released latch which would
> send the heads to an inside landing zone. Nobody bothers anymore.

Sure they do. All current or remotely recent drives (to my knowledge, 
anyway) will park the heads properly at the landing zone on power-off. 
If the drive is told to power off cleanly, this works as expected, and 
if the power is simply cut, the remaining energy in the spinning 
platters is used like a generator to provide power to move the head 
actuator to the park position.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



end of thread, other threads:[~2005-06-04 15:31 UTC | newest]

Thread overview: 150+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-05-13  5:51 Hyper-Threading Vulnerability Gabor MICSKO
2005-05-13 12:47 ` Barry K. Nathan
2005-05-13 14:10   ` Jeff Garzik
2005-05-13 14:23     ` Daniel Jacobowitz
2005-05-13 14:32       ` Jeff Garzik
2005-05-13 17:13         ` Andy Isaacson
2005-05-13 18:30           ` Vadim Lobanov
2005-05-13 19:02             ` Andy Isaacson
2005-05-15  9:31               ` Adrian Bunk
2005-05-13 17:14         ` Gabor MICSKO
2005-05-13 20:23     ` Barry K. Nathan
2005-05-13 18:03 ` Andi Kleen
2005-05-13 18:34   ` Eric Rannaud
2005-05-13 18:35   ` Alan Cox
2005-05-13 18:49     ` Scott Robert Ladd
2005-05-13 19:08       ` Andi Kleen
2005-05-13 19:36       ` Grant Coady
2005-05-16 17:00       ` Linus Torvalds
2005-05-16 12:37         ` Tommy Reynolds
2005-05-18 19:07     ` Bill Davidsen
2005-05-13 18:38   ` Richard F. Rebel
2005-05-13 19:05     ` Andi Kleen
2005-05-13 21:26       ` Andy Isaacson
2005-05-13 21:59         ` Matt Mackall
2005-05-13 22:47           ` Alan Cox
2005-05-13 23:00             ` Lee Revell
2005-05-13 23:27               ` Dave Jones
2005-05-13 23:38                 ` Lee Revell
2005-05-13 23:44                   ` Dave Jones
2005-05-14  7:37                     ` Lee Revell
2005-05-14 15:33                       ` Andrea Arcangeli
2005-05-15  1:07                         ` Christer Weinigel
2005-05-15  9:48                         ` Andi Kleen
2005-05-14 15:23                   ` Alan Cox
2005-05-14 15:45                     ` andrea
2005-05-15 13:38                       ` Mikulas Patocka
2005-05-16  7:06                         ` andrea
2005-05-14 16:30                     ` Lee Revell
2005-05-14 16:44                       ` Arjan van de Ven
2005-05-14 17:56                         ` Lee Revell
2005-05-14 18:01                           ` Arjan van de Ven
2005-05-14 19:21                             ` Lee Revell
2005-05-14 19:48                               ` Arjan van de Ven
2005-05-14 23:40                                 ` Lee Revell
2005-05-15  7:30                                   ` Arjan van de Ven
2005-05-15 20:41                                     ` Alan Cox
2005-05-15 20:48                                       ` Arjan van de Ven
2005-05-15 21:10                                         ` Lee Revell
2005-05-15 22:55                                           ` Dave Jones
2005-05-15 23:10                                             ` Lee Revell
2005-05-16  7:25                                               ` Arjan van de Ven
2005-05-15  9:37                                   ` Andi Kleen
2005-05-15  3:19                                 ` dean gaudet
2005-05-15 10:01                             ` Andi Kleen
2005-05-15 10:23                               ` 2.6.4 timer and helper functions kernel
2005-05-19  0:38                                 ` George Anzinger
2005-05-15  9:33                           ` Hyper-Threading Vulnerability Adrian Bunk
2005-05-14 17:04                       ` Jindrich Makovicka
2005-05-14 18:27                         ` Lee Revell
2005-05-15  9:58                       ` Andi Kleen
2005-05-14  0:39         ` dean gaudet
2005-05-16 13:41           ` Andrea Arcangeli
2005-05-15  9:43         ` Andi Kleen
2005-05-15 18:42           ` David Schwartz
2005-05-15 18:56             ` Dr. David Alan Gilbert
2005-05-16  7:10           ` Eric W. Biederman
2005-05-16 11:04             ` Andi Kleen
2005-05-16 19:14               ` Eric W. Biederman
2005-05-16 20:05                 ` Valdis.Kletnieks
2005-05-15 14:00         ` Mikulas Patocka
2005-05-15 14:26         ` Andi Kleen
2005-05-13 23:32       ` Paul Jakma
2005-05-14 16:29         ` Paul Jakma
2005-05-13 19:14     ` Jim Crilly
2005-05-13 20:18       ` Barry K. Nathan
2005-05-13 23:14         ` Jim Crilly
2005-05-13 19:16   ` Diego Calleja
2005-05-13 19:42     ` Frank Denis (Jedi/Sector One)
2005-05-15  9:54     ` Andi Kleen
2005-05-15 13:51       ` Mikulas Patocka
2005-05-15 14:12         ` Andi Kleen
2005-05-15 14:21           ` Mikulas Patocka
2005-05-15 14:52           ` Tomasz Torcz
2005-05-15 15:00             ` Disk write cache (Was: Hyper-Threading Vulnerability) Mikulas Patocka
2005-05-15 15:21               ` Gene Heskett
2005-05-15 15:29                 ` Jeff Garzik
2005-05-15 16:27                   ` Disk write cache Kenichi Okuyama
2005-05-15 16:43                     ` Jeff Garzik
2005-05-15 16:50                       ` Kyle Moffett
2005-05-15 16:56                       ` Andi Kleen
2005-05-15 20:44                         ` Andrew Morton
2005-05-15 23:31                           ` Cache based insecurity/CPU cache/Disk Cache Tradeoffs Brian O'Mahoney
2005-05-15 16:58                       ` Disk write cache Mikulas Patocka
2005-05-15 17:20                       ` Kenichi Okuyama
2005-05-16 11:02                       ` Linux does not care for data integrity (was: Disk write cache) Matthias Andree
2005-05-16 11:12                         ` Arjan van de Ven
2005-05-16 11:29                           ` Matthias Andree
2005-05-16 14:02                             ` Arjan van de Ven
2005-05-16 14:48                               ` Matthias Andree
2005-05-16 15:06                                 ` Alan Cox
2005-05-16 15:40                                   ` Matthias Andree
2005-05-16 18:04                                     ` Alan Cox
2005-05-16 19:11                                       ` Linux does not care for data integrity Florian Weimer
2005-05-29 21:02                                   ` Linux does not care for data integrity (was: Disk write cache) Greg Stark
2005-05-29 21:16                                     ` Matthias Andree
2005-05-30  6:04                                       ` Greg Stark
2005-05-30  8:21                                         ` Matthias Andree
2005-06-01 19:02                                       ` Linux does not care for data integrity Bill Davidsen
2005-06-01 22:02                                         ` Matthias Andree
2005-06-02  0:12                                           ` Bill Davidsen
2005-06-02  0:36                                         ` Jeff Garzik
2005-06-02  1:37                                           ` Bill Davidsen
2005-06-02  1:54                                             ` Jeff Garzik
2005-06-02  8:53                                         ` Helge Hafting
2005-06-02 12:00                                           ` Bill Davidsen
2005-06-02 13:33                                             ` Lennart Sorensen
2005-06-04 13:37                                               ` Bill Davidsen
2005-06-04 15:31                                                 ` Bernd Eckenfels
2005-05-16 14:57                           ` Linux does not care for data integrity (was: Disk write cache) Alan Cox
2005-05-16 13:48                         ` Linux does not care for data integrity Mark Lord
2005-05-16 14:59                           ` Matthias Andree
2005-05-16  1:56                   ` Disk write cache (Was: Hyper-Threading Vulnerability) Gene Heskett
2005-05-16  2:11                     ` Jeff Garzik
2005-05-16  2:24                     ` Mikulas Patocka
2005-05-16  3:05                       ` Gene Heskett
2005-05-16  2:32                     ` Mark Lord
2005-05-16  3:08                       ` Gene Heskett
2005-05-16 13:44                         ` Mark Lord
2005-05-18  4:03                       ` Eric D. Mudama
2005-05-15 16:24                 ` Mikulas Patocka
2005-05-16 11:18                   ` Matthias Andree
2005-05-16 14:33                     ` Jeff Garzik
2005-05-16 15:26                       ` Richard B. Johnson
2005-05-16 16:00                         ` [OT] drive behavior on power-off (was: Disk write cache) Matthias Andree
2005-05-16 18:11                       ` Disk write cache (Was: Hyper-Threading Vulnerability) Valdis.Kletnieks
2005-05-16 14:54                     ` Alan Cox
2005-05-17 13:15                       ` Bill Davidsen
2005-05-17 21:41                         ` Kyle Moffett
2005-05-18  4:06                     ` Eric D. Mudama
2005-05-15 21:38                 ` Tomasz Torcz
2005-05-16 14:50               ` Alan Cox
2005-05-15 15:00             ` Hyper-Threading Vulnerability Arjan van de Ven
     [not found] <43Bnu-Ut-9@gated-at.bofh.it>
     [not found] ` <44sLm-3Mg-33@gated-at.bofh.it>
     [not found]   ` <44sUX-42h-11@gated-at.bofh.it>
     [not found]     ` <44teb-4fb-1@gated-at.bofh.it>
     [not found]       ` <44uaj-4Z3-5@gated-at.bofh.it>
     [not found]         ` <44LXu-2W6-15@gated-at.bofh.it>
     [not found]           ` <44OVj-5xS-3@gated-at.bofh.it>
     [not found]             ` <44PRr-6mz-33@gated-at.bofh.it>
2005-05-17  3:29               ` Disk write cache (Was: Hyper-Threading Vulnerability) Robert Hancock
2005-05-18  4:11                 ` Eric D. Mudama
2005-05-18  7:16 Paul Zimmerman
2005-05-18 11:10 ` Richard B. Johnson
2005-05-18 12:47 ` Stephan Wonczak
2005-05-18  9:45 Lincoln Dale (ltd)
2005-05-18 13:48 ` John Stoffel
2005-05-18 22:11 Lincoln Dale (ltd)
