git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [zooko@zooko.com: [Revctrl] colliding md5 hashes of human-meaningful
@ 2005-06-13 19:50 linux
  2005-06-13 20:08 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: linux @ 2005-06-13 19:50 UTC (permalink / raw)
  To: git, torvalds

> So the problem is totally different from the way git uses a hash. In the 
> git model, an attacker by definition cannot control both versions of a 
> file, since if he controls just _one_ version, he doesn't need to do the 
> attack in the first place!

You are insufficiently paranoid, Grasshopper.

The basic attack goes like this:

- I construct two .c files with identical hashes.  One is something
  useful; perhaps a device driver for some piece of hardware that my
  desired target has.  The other is similar, but includes a remote
  root explot.

  (With an n-bit hash and an automated way to make harmless changes
  to source files, I can generate 2^(n/2) variants of each and expect to
  get a match, even in the absence of a better attack.)

- I submit the first one to the Linux kernel.  It's valid and gets
  merged.

- A kernel release, including the "interesting" driver, gets made and
  sprinkled with holy penguin pee.  Signatures, hashes, and all that.

- Through various means (possibly just running a kernel download mirror,
  or possibly by splicing into my target's upstream Internet connection),
  I substitute the malware file for the real source code.

- My target verifies all the hashes and signatures, decides that this "Linus"
  person signing it is trustworthy, and compiles and installs the kernel.

- I walk in my back door and do suitable rude things.


The point is, it *is* possible for an attacker to control both versions of
a file.  The reason he needs to do the attack is that one version looks
legitimate and the other includes a Nasty Surprise.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [zooko@zooko.com: [Revctrl] colliding md5 hashes of human-meaningful
  2005-06-13 19:50 [zooko@zooko.com: [Revctrl] colliding md5 hashes of human-meaningful linux
@ 2005-06-13 20:08 ` Linus Torvalds
  2005-06-13 20:17   ` Jason McMullan
  2005-06-13 21:03   ` linux
  2005-06-13 20:46 ` Junio C Hamano
  2005-06-13 20:52 ` Radoslaw Szkodzinski
  2 siblings, 2 replies; 9+ messages in thread
From: Linus Torvalds @ 2005-06-13 20:08 UTC (permalink / raw)
  To: linux; +Cc: git



On Mon, 13 Jun 2005 linux@horizon.com wrote:
> 
> You are insufficiently paranoid, Grasshopper.

No, I just am not lettign paranoia mean that I sit around shivering all 
day long.

> The basic attack goes like this:
> 
> - I construct two .c files with identical hashes.

Ok, I have a better plan.

 - you learn to fly by flapping your arms fast enough
 - you then learn to pee burning gasoline
 - then, you fly around New York, setting everybody you see on fire, until 
   people make you emperor.

Sounds like a good plan, no?

But perhaps slightly impractical. 

Now, let's go back to your plan. Why do you think your plan is any better 
than mine?

		Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [zooko@zooko.com: [Revctrl] colliding md5 hashes of human-meaningful
  2005-06-13 20:08 ` Linus Torvalds
@ 2005-06-13 20:17   ` Jason McMullan
  2005-06-13 21:03   ` linux
  1 sibling, 0 replies; 9+ messages in thread
From: Jason McMullan @ 2005-06-13 20:17 UTC (permalink / raw)
  To: GIT Mailling list

[-- Attachment #1: Type: text/plain, Size: 458 bytes --]

On Mon, 2005-06-13 at 13:08 -0700, Linus Torvalds wrote:
> Ok, I have a better plan.
> 
>  - you learn to fly by flapping your arms fast enough
>  - you then learn to pee burning gasoline
>  - then, you fly around New York, setting everybody you see on fire,
> until 
>    people make you emperor.

Can't... stop... laughing....


"Cryptology - It's a comedy goldmine!"

-- 
Jason McMullan <jason.mcmullan@timesys.com>
TimeSys Corporation


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [zooko@zooko.com: [Revctrl] colliding md5 hashes of human-meaningful
  2005-06-13 19:50 [zooko@zooko.com: [Revctrl] colliding md5 hashes of human-meaningful linux
  2005-06-13 20:08 ` Linus Torvalds
@ 2005-06-13 20:46 ` Junio C Hamano
  2005-06-13 20:52 ` Radoslaw Szkodzinski
  2 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2005-06-13 20:46 UTC (permalink / raw)
  To: linux; +Cc: git

>> So the problem is totally different from the way git uses a hash. In the 
>> git model, an attacker by definition cannot control both versions of a 
>> file, since if he controls just _one_ version, he doesn't need to do the 
>> attack in the first place!

> You are insufficiently paranoid, Grasshopper.

> The basic attack goes like this:

> - I construct two .c files with identical hashes.  One is something
>   useful; perhaps a device driver for some piece of hardware that my
>   desired target has.  The other is similar, but includes a remote
>   root explot.

>   (With an n-bit hash and an automated way to make harmless changes
>   to source files, I can generate 2^(n/2) variants of each and expect to
>   get a match, even in the absence of a better attack.)

> - I submit the first one to the Linux kernel.  It's valid and gets
>   merged.

I doubt that this part would work in practice.

Wouldn't you have to have some "garbage" in the early part of
that driver source, probably in a C comment block or an
otherwise unused string constant, that serves no apparent
purpose, which is inserted by your "automated harmless changes"
machinery?

Wouldn't that catch people's attention and cause them to
question and reject that patch in the first place?

Wouldn't that mean you do not have control over even _one_
version, let alone _both_ versions?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Revctrl] colliding md5 hashes of human-meaningful
  2005-06-13 19:50 [zooko@zooko.com: [Revctrl] colliding md5 hashes of human-meaningful linux
  2005-06-13 20:08 ` Linus Torvalds
  2005-06-13 20:46 ` Junio C Hamano
@ 2005-06-13 20:52 ` Radoslaw Szkodzinski
  2 siblings, 0 replies; 9+ messages in thread
From: Radoslaw Szkodzinski @ 2005-06-13 20:52 UTC (permalink / raw)
  To: linux; +Cc: git, torvalds

linux@horizon.com wrote:

>>So the problem is totally different from the way git uses a hash. In the 
>>git model, an attacker by definition cannot control both versions of a 
>>file, since if he controls just _one_ version, he doesn't need to do the 
>>attack in the first place!
>>    
>>
>
>You are insufficiently paranoid, Grasshopper.
>
>The basic attack goes like this:
>
>- I construct two .c files with identical hashes.  One is something
>  useful; perhaps a device driver for some piece of hardware that my
>  desired target has.  The other is similar, but includes a remote
>  root explot.
>
>  (With an n-bit hash and an automated way to make harmless changes
>  to source files, I can generate 2^(n/2) variants of each and expect to
>  get a match, even in the absence of a better attack.)
>
>  
>
And you get lots of nonsense in the new file.

>- I submit the first one to the Linux kernel.  It's valid and gets
>  merged.
>
>  
>
And funny as it is, when the hole is found you're busted. Or at least
the first person responsible.
You probably couldn't shadow yourself enough not to get caught.

>- A kernel release, including the "interesting" driver, gets made and
>  sprinkled with holy penguin pee.  Signatures, hashes, and all that.
>
>  
>
Which mean that you can't change your name on the project. See above.

>- Through various means (possibly just running a kernel download mirror,
>  or possibly by splicing into my target's upstream Internet connection),
>  I substitute the malware file for the real source code.
>
>  
>
If you can splice into the connection, you can put there anything you want,
including another kernel and any amount of exploits. Even with SSH.
Ever heard of man-in-the-middle attacks?

With high-grade security you won't be able to splice into the connection,
as it'll be fully encrypted (with HTH key exchange) and/or randomised using things like EFF's Tor.
Then they can check with kernel.org or any other mirror.

>- My target verifies all the hashes and signatures, decides that this "Linus"
>  person signing it is trustworthy, and compiles and installs the kernel.
>  
>
And they're so unforseeing that they don't check the sources of the
drivers they use.
Funny. And if they don't use it, you'll have a problem with enabling
your exploit.
Your best target would be a scheduler, but that's heavily scrutinised.

>- I walk in my back door and do suitable rude things.
>
>  
>
Like going to jail.

>The point is, it *is* possible for an attacker to control both versions of
>a file.  The reason he needs to do the attack is that one version looks
>legitimate and the other includes a Nasty Surprise.
>  
>
It is in theory. Tell someone when you mount such an attack on anybody.

AstralStorm

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [zooko@zooko.com: [Revctrl] colliding md5 hashes of human-meaningful
  2005-06-13 20:08 ` Linus Torvalds
  2005-06-13 20:17   ` Jason McMullan
@ 2005-06-13 21:03   ` linux
  2005-06-13 21:39     ` Linus Torvalds
  1 sibling, 1 reply; 9+ messages in thread
From: linux @ 2005-06-13 21:03 UTC (permalink / raw)
  To: torvalds; +Cc: git, linux

> No, I just am not letting paranoia mean that I sit around shivering all 
> day long.

I'm sorry if I implied that.  I meant "paranoid" in the sense of
"imagining attack"; you were saying there is no way to attack git via
a collision attack on the underlying hash, and I objected.

I agree with you that:
- The attack is still wildly impractical, and
- Anything is better than the unauthenticated TCP we use these days!

>> The basic attack goes like this:
>> 
>> - I construct two .c files with identical hashes.

> Ok, I have a better plan.
>
> - you learn to fly by flapping your arms fast enough
> - you then learn to pee burning gasoline
> - then, you fly around New York, setting everybody you see on fire, until 
>   people make you emperor.
>
> Sounds like a good plan, no?

ROFL!  Oh my.  That's worthy of reprinting.  I was pleased with myself
for making fun of the "what if there's an accidental hash collision"
theory by assuming that kernel development would continue uninterrupted
until the sun went nova, but this is truly masterful scorn.

> But perhaps slightly impractical.

There are just few laws of physics it violates.

Not to mention that New York is still a trifle touchy about the combination
of flying and burning fossil fuels, and this poses problems for step 3.

> Now, let's go back to your plan. Why do you think your plan is any better 
> than mine?

I was trying to point out that a collision attack is possible.  That is,
*if* we assume that someone can has the ability to find a hash collision,
*then* they can use that to break git's authenticity guarantees.

I wasn't addressing the plausibility of the "if" part.  I agree that
requiring the hashed text to be plausible C source makes all current
attacks (including the MD5 ones) irrelevant, and reduces you to straight
brute force, which is quite implausible.

But it *is* a collsion attack, not a preimage attack, and it *is* at
least consistent with all known laws of physics.

I did *not* say, or mean to imply, that there was anything wrong with
git's hashing.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [zooko@zooko.com: [Revctrl] colliding md5 hashes of human-meaningful
  2005-06-13 21:03   ` linux
@ 2005-06-13 21:39     ` Linus Torvalds
  2005-06-13 23:03       ` linux
  0 siblings, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2005-06-13 21:39 UTC (permalink / raw)
  To: linux; +Cc: git



On Mon, 13 Jun 2005 linux@horizon.com wrote:
> 
> > But perhaps slightly impractical.
> 
> There are just few laws of physics it violates.

Yeah, yeah. You avoided a few laws of phsyics of your own.

For example, when you say

  "(With an n-bit hash and an automated way to make harmless changes
   to source files, I can generate 2^(n/2) variants of each and expect to
   get a match, even in the absence of a better attack.)"

you kind of ignore the fact that "n" here is 160, and so you're going to 
be searching for quite a few versions of each. Also, you have to compare 
the sha's of all of those 2**80 versions against each other which is a lot 
of work in itself.

Finally, you have to make sure that al the versions make sense, and that 
people will take them 100% unmodified. 

My plan was more interesting, I feel.

		Linus

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [zooko@zooko.com: [Revctrl] colliding md5 hashes of human-meaningful
  2005-06-13 21:39     ` Linus Torvalds
@ 2005-06-13 23:03       ` linux
  2005-06-14  1:49         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 9+ messages in thread
From: linux @ 2005-06-13 23:03 UTC (permalink / raw)
  To: torvalds; +Cc: git, linux

Oh, for heaven's sake.  This is getting to be entirely too big
an issue.  For the record, I believe:

- There are far easier ways to back-door the public Linux kernel.
  (See the recent "subtly malicious code" contest.)
- There are far easier ways to back-door a particular target's
  kernel.
- There are easier ways than back-dooring their kernel to attack
  a particular target's computer.
- A 2^80 work factor is sufficient protection for quite a few
  years, especially as the processor speed growth curve finally
  seems to be flattening.

All I was trying to point out is that a sqrt(n) hash collision
attack *exists*.  Not that it's practical or anything.

For the quantity of computers required, it would be *cheaper* to start a
factory making my own line of PCI bus master network cards that included
remote memory sniffing features on-board!

> Yeah, yeah. You avoided a few laws of phsyics of your own.
>
> For example, when you say
>
>  "(With an n-bit hash and an automated way to make harmless changes
>   to source files, I can generate 2^(n/2) variants of each and expect to
>   get a match, even in the absence of a better attack.)"
>
> you kind of ignore the fact that "n" here is 160, and so you're going to 
> be searching for quite a few versions of each. Also, you have to compare 
> the sha's of all of those 2**80 versions against each other which is a lot 
> of work in itself.

I am most definitely NOT ignoring it.  That's what makes the attack infeasible.
(But I insist that 2^80 is a hell of a long way from violating any laws
of physics.)

As for the comparison, if the two texts have the same form, there are
well-known ways to reduce the storage requirements by large factors
like 2^40.  For texts of different form, regular Pollard's rho doesn't
work, but maybe something ingenious is possible.

All I was trying to say is that a sqrt(2^160) attack exists.  Remember
the original statement?

>>> So the problem is totally different from the way git uses a hash. In the 
>>> git model, an attacker by definition cannot control both versions of a 
>>> file, since if he controls just _one_ version, he doesn't need to do the 
>>> attack in the first place!
>>>
>>> Put another way: you could use this exact example for a version of git
>>> that uses md5-sums instead of sha1's, but it wouldn't show anything at all 
>>> about a git vulnerability even so.

*That's* what I was responding to.  It basically says "there is no
collision attack on the way git uses the hash", a statement I disagree
with.  I didn't say it was a realistic attack, because the whole reason
that SHA is 160 bits is so that it's resistant to a hash collision attack!


> Finally, you have to make sure that al the versions make sense, and that 
> people will take them 100% unmodified. 

Making them all make sense isn't hard; I can come up with 80 binary
variants on a piece of code by changing order of helper functions,
order of variable declarations, order of statements, comments,
spacing, and whatnot without even having to get into trailing
or redundant whitespace.

Making sure that nobody messes with it is trickier.  That would be
easier with a small patch, but that would in turn make the problem
of generating 2^80 plausible variants harder.

I'm reaching a bit, but hey, it was a quick description, not a
fully-fledged Cunning Plan.

> My plan was more interesting, I feel.

Oh for you, sure.  But as the one you had nominated to carry it
out, I'd like to object!

For plausibility, I prefer mine for manufacturing network cards with
built-in back doors.  There's a chance I could even get the Chinese
government to help fund it.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [zooko@zooko.com: [Revctrl] colliding md5 hashes of human-meaningful
  2005-06-13 23:03       ` linux
@ 2005-06-14  1:49         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 9+ messages in thread
From: Benjamin Herrenschmidt @ 2005-06-14  1:49 UTC (permalink / raw)
  To: linux; +Cc: torvalds, git


> Oh for you, sure.  But as the one you had nominated to carry it
> out, I'd like to object!
> 
> For plausibility, I prefer mine for manufacturing network cards with
> built-in back doors.  There's a chance I could even get the Chinese
> government to help fund it.

Be careful with architectures that have a semi-decent iommu though :)

Ben.



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2005-06-14  1:45 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-06-13 19:50 [zooko@zooko.com: [Revctrl] colliding md5 hashes of human-meaningful linux
2005-06-13 20:08 ` Linus Torvalds
2005-06-13 20:17   ` Jason McMullan
2005-06-13 21:03   ` linux
2005-06-13 21:39     ` Linus Torvalds
2005-06-13 23:03       ` linux
2005-06-14  1:49         ` Benjamin Herrenschmidt
2005-06-13 20:46 ` Junio C Hamano
2005-06-13 20:52 ` Radoslaw Szkodzinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).