* [RFC] Malicously tampering git metadata?
@ 2015-12-16  3:26 Santiago Torres
  2015-12-16  7:20 ` Stefan Beller
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Santiago Torres @ 2015-12-16  3:26 UTC (permalink / raw)
  To: Git

Hello everyone,

I'm Santiago, a PhD student at NYU doing research on secure software
development pipelines. We've been studying different aspects of Git
lately (as it is an integral part of many projects), and we believe
we've found a vulnerability in the way Git structures/signs metadata.

An attacker capable of acting as a man in the middle between a
GitHub server and a developer is able to trick that developer into
merging vulnerable commit objects, or omitting security patches --- even if
all users sign all commit objects. Given that Git metadata is unsigned,
it can be modified to provide incorrect views of a repository to
downstream developers.

An example of a malicious commit merge follows:

1) The attacker controlling or acting as the upstream server identifies
two branches: one on which the unsuspecting developer is working, and
another in which a vulnerable piece of code is located.

2) Branch pointers are modified: the packed-refs file (or refs/heads/*)
is edited so that the master branch points to the vulnerable commit
object. Having made the change, the attacker needs no additional
configuration and simply waits for an unsuspecting developer to
pull.

3) Once a developer pulls, they will be prompted to merge their code
with the new change-set (the vulnerable commit). At worst, this operation
will look like developer negligence. If no conflicts arise, the attack
will go unnoticed.

4) The developer pushes to upstream. All the traffic can be re-routed
back to the original repository. The target branch now contains a
vulnerable piece of code.
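
The steps above can be modelled in a few lines. This is only a toy sketch,
not real Git: the HMAC key stands in for a developer's GPG key, and the
`refs` table, `make_commit`, and `verify` are hypothetical names chosen for
illustration. It shows that repointing a branch leaves every commit
signature valid.

```python
import hashlib
import hmac

# Hypothetical stand-in for a developer's signing key (real Git uses GPG).
DEV_KEY = b"developer-signing-key"

def sign(commit_body: bytes) -> str:
    return hmac.new(DEV_KEY, commit_body, hashlib.sha256).hexdigest()

def make_commit(body: bytes) -> dict:
    return {"body": body, "sig": sign(body)}

def verify(commit: dict) -> bool:
    return hmac.compare_digest(sign(commit["body"]), commit["sig"])

# Two legitimately signed commits that live on two different branches.
stable = make_commit(b"stable code")
experimental = make_commit(b"experimental, vulnerable code")

# Refs are a plain name -> commit table; nothing in it is signed.
refs = {"refs/heads/master": stable, "refs/heads/experimental": experimental}

# Step 2: the server repoints master without touching any commit object.
tampered_refs = dict(refs)
tampered_refs["refs/heads/master"] = experimental

# Step 3: every commit the developer receives still verifies.
all_signatures_valid = all(verify(c) for c in tampered_refs.values())
master_moved = tampered_refs["refs/heads/master"] is not refs["refs/heads/master"]
print(all_signatures_valid, master_moved)
```

Both values come out true: the signatures say nothing about which branch a
commit belongs to.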

We have identified additional attack scenarios for modifying the
metadata that result in an incorrect state of the target repository, and
we are ready to disclose information about other variants of this attack
as well.

We have also designed a backwards-compatible defense mechanism to prevent
attacks based on Git metadata tampering. We implemented a proof of
concept of the scheme and performed timing, stress, and concurrency
tests; our results show that the overhead should be minimal, even in
large software repositories such as the Linux kernel.

We already approached people from CERT and GitHub regarding this attack
scenario, and we'd also like to hear your comments regarding this.

Thanks!
-Santiago.

P.S. We also elaborate more about this attack vector in this document: 
https://drive.google.com/a/nyu.edu/file/d/0B2KBm0fULlS1RDR5UHVESjVua3M/view?usp=sharing


* Re: [RFC] Malicously tampering git metadata?
  2015-12-16  3:26 [RFC] Malicously tampering git metadata? Santiago Torres
@ 2015-12-16  7:20 ` Stefan Beller
  2015-12-18  1:06   ` Santiago Torres
  2015-12-18  4:02 ` Jeff King
  2015-12-18 23:10 ` Theodore Ts'o
  2 siblings, 1 reply; 14+ messages in thread
From: Stefan Beller @ 2015-12-16  7:20 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Git

On Tue, Dec 15, 2015 at 7:26 PM, Santiago Torres <santiago@nyu.edu> wrote:
> Hello everyone,
>
> I'm Santiago, a PhD student at NYU doing research on secure software
> development pipelines. We've been studying different aspects of Git
> lately (as it is an integral part of many projects), and we believe
> we've found a vulnerability in the way Git structures/signs metadata.
>
> An attacker capable of acting as a man in the middle between a
> GitHub server and a developer is able to trick that developer into
> merging vulnerable commit objects, or omitting security patches --- even if
> all users sign all commit objects. Given that Git metadata is unsigned,
> it can be modified to provide incorrect views of a repository to
> downstream developers.
>
> An example of a malicious commit merge follows:
>
> 1) The attacker controlling or acting as the upstream server identifies
> two branches: one on which the unsuspecting developer is working, and
> another in which a vulnerable piece of code is located.
>
> 2) Branch pointers are modified: the packed-refs file (or refs/heads/*)
> is edited so that the master branch points to the vulnerable commit
> object. Having made the change, the attacker needs no additional
> configuration and simply waits for an unsuspecting developer to
> pull.
>
> 3) Once a developer pulls, they will be prompted to merge their code
> with the new change-set (the vulnerable commit). At worst, this operation
> will look like developer negligence. If no conflicts arise, the attack
> will go unnoticed.
>
> 4) The developer pushes to upstream. All the traffic can be re-routed
> back to the original repository. The target branch now contains a
> vulnerable piece of code.
>
> We have identified additional attack scenarios for modifying the
> metadata that result in an incorrect state of the target repository, and
> we are ready to disclose information about other variants of this attack
> as well.
>
> We have also designed a backwards-compatible defense mechanism to prevent
> attacks based on Git metadata tampering. We implemented a proof of
> concept of the scheme and performed timing, stress, and concurrency
> tests; our results show that the overhead should be minimal, even in
> large software repositories such as the Linux kernel.
>
> We already approached people from CERT and GitHub regarding this attack
> scenario, and we'd also like to hear your comments regarding this.

This is what push certificates ought to solve.
The server records every push along with its signed push certificate, so
tampering on the server can be detected by comparing the refs (whether in
packed-refs or as loose refs) against the push certificates.
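
A rough sketch of that comparison, assuming the certificates' signatures
have already been verified and that each one records the ref it updated
and the new object id. The field names and `find_tampered_refs` are
hypothetical, not the actual push certificate format.

```python
def find_tampered_refs(advertised: dict, push_log: list) -> dict:
    """Return refs whose advertised value differs from the last certified push."""
    certified = {}
    for cert in push_log:  # oldest first, so later pushes override earlier ones
        certified[cert["ref"]] = cert["new"]
    return {
        ref: (certified.get(ref), value)
        for ref, value in advertised.items()
        if certified.get(ref) != value
    }

# Hypothetical record of certified pushes kept alongside the repository.
push_log = [
    {"ref": "refs/heads/master", "old": "a1", "new": "b2"},
    {"ref": "refs/heads/experimental", "old": "a1", "new": "c3"},
]

# What a tampered server advertises: master now points at the experimental commit.
advertised = {"refs/heads/master": "c3", "refs/heads/experimental": "c3"}

mismatches = find_tampered_refs(advertised, push_log)
print(mismatches)
```

Only `refs/heads/master` is reported, with the certified id `b2` next to the
advertised id `c3`.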

However, the push certs cannot be obtained via Git itself (for now they are
just stored on the server for later inspection or similar), because to be
really sure, the client would need to learn about these push certificates
out of band.

The model there would be:
* A vulnerable piece of software exists.
* It gets fixed (and the fix is pushed with a signed push)
* the MITM server shows neither the fix (it serves the code from before the
  fix) nor the push certificate thereof
* client still pulls vulnerable code

This model shows that distributing push certs via the server itself may not
be optimal.

Thanks for researching on Git,
Stefan

>
> Thanks!
> -Santiago.
>
> P.S. We also elaborate more about this attack vector in this document:
> https://drive.google.com/a/nyu.edu/file/d/0B2KBm0fULlS1RDR5UHVESjVua3M/view?usp=sharing

* Re: [RFC] Malicously tampering git metadata?
  2015-12-16  7:20 ` Stefan Beller
@ 2015-12-18  1:06   ` Santiago Torres
  2015-12-18  3:55     ` Jeff King
  0 siblings, 1 reply; 14+ messages in thread
From: Santiago Torres @ 2015-12-18  1:06 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git

Hi Stefan, thanks for the insight. 

> This is what push certificates ought to solve.
> The server records every push along with its signed push certificate, so
> tampering on the server can be detected by comparing the refs (whether in
> packed-refs or as loose refs) against the push certificates.

Is there any specification for push certificates? I would like to read
about them, but I can't seem to find documentation anywhere. Are they a
part of git's specification?

> 
> However, the push certs cannot be obtained via Git itself (for now they are
> just stored on the server for later inspection or similar), because to be
> really sure, the client would need to learn about these push certificates
> out of band.

I was thinking that an in-band solution could be integrated as long as
we assume a compromise would result in a complete (irreconcilable)
fork attack; fork attacks aren't subtle and could be detected easily.

> 
> The model there would be:
> * A vulnerable piece of software exists.
> * It gets fixed (and the fix is pushed with a signed push)
> * the MITM server shows neither the fix (it serves the code from before the
>   fix) nor the push certificate thereof
> * client still pulls vulnerable code

Yes, this is a possible attack vector. However, a server could also
present a branch pointer as pointing somewhere else (e.g., point an
experimental branch to an unsigned v1.1 tag). This has other implications:
the code is still pushed to/pulled from upstream, it just ends up somewhere
different.

> 
> This model shows that distributing push certs via the server itself may not
> be optimal.

Yes, it might not be optimal, but it could provide protection against
the attack I just described, since more complex attacks might not be so
subtle. In addition, developers likely coordinate their efforts
through other means, so the lack of a push certificate (withheld
by a server) could raise some yellow flags.

We've made a proof of concept of such a tool (in-band push certificates),
and would like to share the basic design of it here. However, it follows
our threat model: a compromised server that can't introduce malicious
code (thanks to commit signing), but can modify branch pointers and
other unsigned metadata to alter the repository's state.

> 
> Thanks for researching on Git,

Thanks for working on such a great tool :)
-Santiago

> Stefan


* Re: [RFC] Malicously tampering git metadata?
  2015-12-18  1:06   ` Santiago Torres
@ 2015-12-18  3:55     ` Jeff King
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2015-12-18  3:55 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Stefan Beller, Git

On Thu, Dec 17, 2015 at 08:06:36PM -0500, Santiago Torres wrote:

> > This is what push certificates ought to solve.
> > The server records every push along with its signed push certificate, so
> > tampering on the server can be detected by comparing the refs (whether in
> > packed-refs or as loose refs) against the push certificates.
> 
> Is there any specification about push certificates? I would like to read
> about them, but I don't seem to find documentation anywhere. Are they a
> part of git's specification?

Try pack-protocol.txt and protocol-capabilities.txt in the
Documentation/technical directory of the git.git repo.

E.g.:

  https://github.com/git/git/blob/bdfc6b364a51b19efbacbd46fcef5be41a5db50e/Documentation/technical/pack-protocol.txt#L489

-Peff


* Re: [RFC] Malicously tampering git metadata?
  2015-12-16  3:26 [RFC] Malicously tampering git metadata? Santiago Torres
  2015-12-16  7:20 ` Stefan Beller
@ 2015-12-18  4:02 ` Jeff King
  2015-12-18 23:10 ` Theodore Ts'o
  2 siblings, 0 replies; 14+ messages in thread
From: Jeff King @ 2015-12-18  4:02 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Git

On Tue, Dec 15, 2015 at 10:26:39PM -0500, Santiago Torres wrote:

> An example of a malicious commit merge follows:
> 
> 1) The attacker controlling or acting as the upstream server identifies
> two branches: one in which the unsuspecting developer is working on, and
> another in which a vulnerable piece of code is located.

One thing to make clear here: the side branch with the vulnerable code
must be a _new_ vulnerability that was not already part of the "main"
branch the developer is working on. That is, I do not immediately see a
way to resurrect an old vulnerability, because a merge of the old,
broken commit would not result in reintroducing it.

This is more about "there was experimental junk on branch X, and you
tricked some developer into pulling X onto Y, and now Y unexpectedly has
the junk on it". And I agree with Stefan that push-certs are the
intended defense against this.

Of course, in the real world things are much easier. Most projects do
not sign commits at all, let alone use push certs. If developers are
pulling from a compromised server, then you can simply make up whatever
broken commits you want, and there's no way to tell the difference
between them and the real commits.

-Peff


* Re: [RFC] Malicously tampering git metadata?
  2015-12-16  3:26 [RFC] Malicously tampering git metadata? Santiago Torres
  2015-12-16  7:20 ` Stefan Beller
  2015-12-18  4:02 ` Jeff King
@ 2015-12-18 23:10 ` Theodore Ts'o
  2015-12-19 17:30   ` Santiago Torres
  2 siblings, 1 reply; 14+ messages in thread
From: Theodore Ts'o @ 2015-12-18 23:10 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Git

On Tue, Dec 15, 2015 at 10:26:39PM -0500, Santiago Torres wrote:
> 4) The developer pushes to upstream. All the traffic can be re-routed
> back to the original repository. The target branch now contains a
> vulnerable piece of code.

I assume you are assuming here that the "push to upstream" doesn't
involve some kind of human verification?  If someone tried pushing
something like this to Linus, he would be checking the git diff stats
and git commit structure for things that might look like "developer
negligence".  He's been known to complain to subsystem developers in
his own inimitable way when the git commit structure looks suspicious, so
I'm pretty sure he would notice this.

But normally, in that development process, we don't talk about "pushing to
upstream" as much as "requesting a pull".  So it would be useful, when
you describe the attack, to explicitly describe the development workflow
that is vulnerable to your attack.

For example, in my use case, I'm authoritative over changes in fs/ext4.
So when I pull from Linus's repo, I examine (using "gitk fs/ext4") all
commits coming from upstream that modify fs/ext4.  So if someone tries
introducing a change in fs/ext4 coming from "upstream", I will notice.
Then when I request a pull from Linus, the git pull request
describes what commits are new in my tree that are not in his, and
shows the diffstats from upstream.  When Linus verifies my pull, there
are multiple opportunities where he will notice funny business:

a) He would notice that my origin commit is something that is not in
his upstream tree.

b) His diffstat is different from my diffstat (since thanks to the
man-in-the-middle, the conception of what commits are new in the git
pull request will be different from his).

c) His diffstat will show that files outside of fs/ext4 have been
modified, which is a red flag that will merit more close examination.
(And if the attacker had tried to introduce a change in fs/ext4, I
would have noticed when I pulled from the man-in-the-middle git
repo.)

Now, if there is zero checking when the user pushes to upstream, then
yes, there are all sorts of potential problems.  But that's one of the
reasons why it's generally considered a good thing for Linux
developers to use official releases (which can be marked using
GPG-signed git tags) as the origin commit for their work.

So for example, the changes for ext4 that were sent to Linus for v4.4
were based off of v4.3-rc2:

% git tag --verify v4.3-rc2
object 1f93e4a96c9109378204c147b3eec0d0e8100fde
type commit
tag v4.3-rc2
tagger Linus Torvalds <torvalds@linux-foundation.org> 1442784761 -0700

Linux 4.3-rc2
gpg: Signature made Sun 20 Sep 2015 05:32:41 PM EDT using RSA key ID 00411886
gpg: Good signature from "Linus Torvalds <torvalds@linux-foundation.org>" [full]


And the changes which I sent to Linus were also signed by a tag, and
better yet, someone can independently verify this after the fact:

% git show --oneline --show-signature f41683a204ea61568f0fd0804d47c19561f2ee39
f41683a merged tag 'ext4_for_linus_stable'
gpg: Signature made Sun 06 Dec 2015 10:35:27 PM EST using RSA key ID 950D81A3
gpg: Good signature from "Theodore Ts'o <tytso@mit.edu>" [ultimate]
gpg:                 aka "Theodore Ts'o <tytso@debian.org>" [ultimate]
gpg:                 aka "Theodore Ts'o <tytso@google.com>" [ultimate]

They can also verify that the chain of commits that I sent to Linus
were rooted in Linus's signed v4.3-rc2 tag, so this kind of
after-the-fact auditing means that if there *were* funny business, it
could be caught even if Linus slipped up in his checking.
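
A sketch of that "rooted in a signed tag" audit on a toy commit graph. The
commit names and the `parents` table are made up for illustration; against
a real repository, `git merge-base --is-ancestor` answers the same
question.

```python
from collections import deque

def is_rooted_in(parents: dict, tip: str, trusted: str) -> bool:
    """Breadth-first walk of parent links from `tip`, looking for `trusted`."""
    seen, queue = set(), deque([tip])
    while queue:
        commit = queue.popleft()
        if commit == trusted:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(parents.get(commit, []))
    return False

# Hypothetical history: each commit maps to the list of its parents.
parents = {
    "ext4-tip": ["ext4-1"],
    "ext4-1": ["v4.3-rc2"],      # work based on the signed tag's commit
    "v4.3-rc2": ["v4.3-rc1"],
    "rogue-tip": ["rogue-1"],    # history that never touches the signed tag
    "rogue-1": [],
}

legit = is_rooted_in(parents, "ext4-tip", "v4.3-rc2")
rogue = is_rooted_in(parents, "rogue-tip", "v4.3-rc2")
print(legit, rogue)
```

The first chain reaches the trusted commit; the second does not, which is
the kind of discrepancy an after-the-fact audit would flag.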


Now, the crazy behavior where github users randomly and promiscuously
do pushes and pulls without doing any kind of verification may very
well be dangerous.  But so is someone who saves an 80-patch series from
their inbox, and without reading or verifying all of the patches
applies them blindly to their tree using "git am" --- or if they were
using cvs or svn, bulk applied the patches without doing any
verification....

Cheers,

					- Ted


* Re: [RFC] Malicously tampering git metadata?
  2015-12-18 23:10 ` Theodore Ts'o
@ 2015-12-19 17:30   ` Santiago Torres
  2015-12-20  1:28     ` Theodore Ts'o
  0 siblings, 1 reply; 14+ messages in thread
From: Santiago Torres @ 2015-12-19 17:30 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Git

> I assume you are assuming here that the "push to upstream" doesn't
> involve some kind of human verification?  If someone tried pushing
> something like this to Linus, he would be checking the git diff stats
> and git commit structure for things that might look like "developer
> negligence".  He's been known to complain to subsystem developers in
> his own inimitable way when the git commit structure looks suspicious, so
> I'm pretty sure he would notice this.
> 
> For example, in my use case, I'm authoritative over changes in fs/ext4.
> So when I pull from Linus's repo, I examine (using "gitk fs/ext4") all
> commits coming from upstream that modify fs/ext4.  So if someone tries
> introducing a change in fs/ext4 coming from "upstream", I will notice.
> Then when I request a pull from Linus, the git pull request
> describes what commits are new in my tree that are not in his, and
> shows the diffstats from upstream.  When Linus verifies my pull, there
> are multiple opportunities where he will notice funny business:
> 
> a) He would notice that my origin commit is something that is not in
> his upstream tree.
> 
> b) His diffstat is different from my diffstat (since thanks to the
> man-in-the-middle, the conception of what commits are new in the git
> pull request will be different from his).
> 
> c) His diffstat will show that files outside of fs/ext4 have been
> modified, which is a red flag that will merit more close examination.
> (And if the attacker had tried to introduce a change in fs/ext4, I
> would have noticed when I pulled from the man-in-the-middle git
> repo.)

Yes, we've been considering that this kind of attack wouldn't be too
effective against, let's say, dictator-lieutenant workflows.

> 
> Now, the crazy behavior where github users randomly and promiscuously
> do pushes and pulls without doing any kind of verification may very
> well be dangerous. 

Yes, we were mostly familiar with this workflow before starting this
research. I can see how the "github generation" is open to many attacks
of this nature. Would git be interested in integrating a defense that
covers users of this nature (which seems to be a growing userbase)?

> But so is someone who saves an 80-patch series from
> their inbox, and without reading or verifying all of the patches
> applies them blindly to their tree using "git am" --- or if they were
> using cvs or svn, bulk applied the patches without doing any
> verification....

Just out of curiosity, are there known cases of projects in which this
has happened? (I've noticed that both Git and Linux are quite stringent
in their review/merge process, so this wouldn't be the case there.)

> 
> Cheers,

Thanks for the insight!
-Santiago.


* Re: [RFC] Malicously tampering git metadata?
  2015-12-19 17:30   ` Santiago Torres
@ 2015-12-20  1:28     ` Theodore Ts'o
  2016-01-12 18:21       ` Santiago Torres
  0 siblings, 1 reply; 14+ messages in thread
From: Theodore Ts'o @ 2015-12-20  1:28 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Git

On Sat, Dec 19, 2015 at 12:30:18PM -0500, Santiago Torres wrote:
> > Now, the crazy behavior where github users randomly and promiscuously
> > do pushes and pulls without doing any kind of verification may very
> > well be dangerous. 
> 
> Yes, we were mostly familiar with this workflow before starting this
> research. I can see how the "github generation" is open to many attacks
> of this nature. Would git be interested in integrating a defense that
> covers users of this nature (which seems to be a growing userbase)?

One of the interesting challenges is that git is a pretty low-level
tool, and so people have built all sorts of different workflows on top
of it.

For example, at $WORK, we use gerrit, which is a code review tool, so
all git commits that are to be merged into the "upstream" repository
get pushed to a gerrit server, where they go through a code review
process where a second engineer can review the code, request changes,
make comments, or ask questions, and where the git commits can go
through multiple rounds of review / revision before they are finally
accepted (at least one reviewer must give a +2 review, and there must
be no -2 reviews; and there can be automated tools that do build or
regression tests that can give automated -1 or -2 reviews) --- and
where all of the information collected during the code review process
is saved as part of the audit trail for a Sarbanes-Oxley (SOX)
compliance regime.

Other people use github-style workflows, and others use signed tags
with e-mail code reviews, etc.  And I'm sure there must be many others.

So the challenge is that in order to accommodate many possible
workflows, some of which use third-party tools, changes to make git
more secure for one workflow must not get in the way of these other
workflows --- which means that enforcement of new controls for the
"github generation" probably will have to be optional.  But then
people belonging to the "github generation" can also easily turn off
these features.  And as the NSA learned the hard way in Vietnam, if
the tools cause any inconvenience, or are perceived as
constraining legitimate users, security features can have a way of
getting turned off.[1]

[1] A History of US Communications Security, The David G. Boak
lectures, Volume II, "Nestor in Vietnam", pp. 43-44.  (A declassified
version can be found at:
http://www.governmentattic.org/18docs/Hist_US_COMSEC_Boak_NSA_1973u.pdf)

> > But so is someone who saves an 80-patch series from
> > their inbox, and without reading or verifying all of the patches
> > applies them blindly to their tree using "git am" --- or if they were
> > using cvs or svn, bulk applied the patches without doing any
> > verification....
> 
> Just out of curiosity, are there known cases of projects in which this
> has happened? (I've noticed that both Git and Linux are quite stringent
> in their review/merge process, so this wouldn't be the case there.)

I can't point at specific instances, but given that in the "github
generation", people are fine with blindly pulling someone else's
Docker images and running them on their production servers or
workstations, and software installation gets done with "wget
http://example.org | bash" or the equivalent, it probably happens more
often than we might be comfortable with.

I also suspect that a bad guy would find that inserting a
man-in-the-middle server into one of these installation flows is
a much more practical attack in terms of real world
considerations.  :-)

Cheers,

						- Ted


* Re: [RFC] Malicously tampering git metadata?
  2015-12-20  1:28     ` Theodore Ts'o
@ 2016-01-12 18:21       ` Santiago Torres
  2016-01-12 18:39         ` Stefan Beller
  0 siblings, 1 reply; 14+ messages in thread
From: Santiago Torres @ 2016-01-12 18:21 UTC (permalink / raw)
  To: Git

Hello Everyone,

Thanks for the feedback regarding our attack scenario; it certainly shed some
light on the current state of git's metadata protection. We were
pleasantly surprised that attacks of this nature had been considered, yet we
think we can improve on the current mechanisms.

We have been designing an extension that addresses this attack scenario (and
other similar attacks). Although it was not originally based on push
certificates, we feel that it works similarly to them. The principal advantages
over push certificates are:

1) It doesn't require a side channel (although it could support one). We store
    similar information about branch status (push status) in the repository
    itself.

2) It is backwards compatible, as it doesn't modify the existing metadata
    format.

3) Following Ted's email, it could be easily integrated into any git workflow.
    Although some workflows might benefit more than others, it doesn't
    get in the way of any existing workflow that we know of.

4) It covers a broader attack surface (e.g., our malicious-merge scenario).

To keep things simple (we can elaborate in further emails), our solution
basically works by keeping track of developers' pushes in an append-only
file, so that, every time a branch is pushed, the developer signs their version
of the log and their "push entry" (similar to a push certificate). Right now, we
push this log to a separate branch called BSL (for Branch State Log), but
ideally this could be part of the git metadata.

Upon pulling/fetching, this push certificate chain (the BSL) is also fetched
and used to verify that all branches point to a sensible location (i.e.,
the location reported by the last user who pushed/merged). This ensures
that a malicious server can't change the location that branches point to.
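
A minimal sketch of that check, under two loud assumptions: per-entry
signatures are left out for brevity, and the append-only property is
modelled with a hash link between consecutive entries, which is one
possible construction rather than the actual BSL format. All names are
hypothetical.

```python
import hashlib
import json

def entry_digest(entry: dict) -> str:
    payload = json.dumps(entry, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_push(log: list, branch: str, head: str, author: str) -> None:
    # Each entry commits to the previous one, so history cannot be rewritten
    # without breaking the chain.
    prev = entry_digest(log[-1]) if log else ""
    log.append({"branch": branch, "head": head, "author": author, "prev": prev})

def chain_intact(log: list) -> bool:
    expected = ""
    for entry in log:
        if entry["prev"] != expected:
            return False
        expected = entry_digest(entry)
    return True

def refs_match_log(log: list, advertised: dict) -> bool:
    """True if the chain is intact and every branch points where the log says."""
    latest = {}
    for entry in log:
        latest[entry["branch"]] = entry["head"]
    return chain_intact(log) and advertised == latest

bsl = []
append_push(bsl, "master", "b2", "alice")
append_push(bsl, "experimental", "c3", "bob")

honest = refs_match_log(bsl, {"master": "b2", "experimental": "c3"})
tampered = refs_match_log(bsl, {"master": "c3", "experimental": "c3"})
print(honest, tampered)
```

The honest advertisement passes and the repointed master fails, because the
last logged push to master recorded `b2`.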

Furthermore, upon fetching, users with write access also push an entry
indicating that they fetched the BSL. This avoids cases in which a malicious
attacker keeps older versions of the BSL and withholds changes from other
users. This mainly addresses an attack we refer to as an "effort duplication
attack".

For key distribution, we consider it could be done via the BSL itself, where a
user can sign the public key of a newly-added developer to the system as an
entry. To produce a local trust keychain, our tool simply requires an initial
root of trust (say, the main developer's trusted key), and it builds up the
keychain by traversing the BSL. Revocation could be done similarly too.
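
To make the traversal concrete, here is a sketch of how a keychain could be
derived from a root of trust. The entry fields (`signer`, `type`, `key`)
and the rule that a revocation does not cascade to keys the revoked
developer introduced are assumptions made for this example, not part of
the design described above; signature checks are again omitted.

```python
def build_keychain(root_key: str, log: list) -> set:
    """Walk the log in order, trusting only entries made by already-trusted keys."""
    trusted = {root_key}
    for entry in log:
        signer, kind, key = entry["signer"], entry["type"], entry["key"]
        if signer not in trusted:
            continue  # ignore entries from keys that are not (or no longer) trusted
        if kind == "add-key":
            trusted.add(key)
        elif kind == "revoke-key" and key != root_key:
            trusted.discard(key)
    return trusted

# Hypothetical key-management entries, oldest first.
key_log = [
    {"signer": "alice", "type": "add-key", "key": "bob"},
    {"signer": "mallory", "type": "add-key", "key": "eve"},    # untrusted signer
    {"signer": "bob", "type": "add-key", "key": "carol"},
    {"signer": "alice", "type": "revoke-key", "key": "bob"},
]

keychain = build_keychain("alice", key_log)
print(sorted(keychain))
```

Starting from alice's key, the result contains alice and carol: mallory's
entry is ignored, and bob is removed by the later revocation.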

We already have a proof of concept of this scheme, and we have performed timing
and storage measurements, as well as stress tests. We believe integration
should not be hard, and that it is not particularly invasive for any type of
workflow. We wanted to note that our current implementation was designed
with the aim of not changing the Git source code (although we would like to do
that in the future), which makes it a little more network- and
storage-expensive than a native solution.

Here is the overhead of our proof of concept:

    1) Storage overhead increases by only about 1 KB per push entry. Right now,
    since we are storing each entry as a separate file in a separate branch, the
    storage overhead is non-optimal. If this file were stored as git
    metadata, it would only be around 345 bytes (depending on the signature
    scheme).
        
    2) Network overhead is linear, and it adds less than 23 KB per
    pull/fetch/push, mainly because we are opening three different TCP sessions
    with git. If we were able to integrate this into the git protocol,
    network overhead should be really close to the storage overhead times
    the number of unseen push entries.

    3) For "fetch entries", storage and network overhead is minimal, as users
    only need to upload a 4-byte random number per fetch. We consider this cost to
    be negligible (although we'd like to hear the insight of people
    knowledgeable about git).

    4) Timing shouldn't be a concern. Verifying an entry (using 1024-bit DSA with
    libgcrypt) takes about 30 microseconds on an average laptop. Verifying a BSL
    with 46633 entries --- imagine git.git had a push entry for each commit ---
    takes less than 1.28 seconds (assuming the whole BSL is verified upon
    cloning, for example).
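
A back-of-the-envelope check of these figures, reading the 1 KB and
345-byte numbers as per-entry sizes (an assumption) and using decimal
megabytes:

```python
ENTRIES = 46633            # push entries, one per git.git commit (from the text)
VERIFY_SECONDS = 30e-6     # quoted per-entry verification time
NATIVE_ENTRY_BYTES = 345   # quoted size if stored as git metadata (assumed per entry)
POC_ENTRY_BYTES = 1000     # "about 1 KB" per entry in the proof of concept

verify_total = ENTRIES * VERIFY_SECONDS
native_mb = ENTRIES * NATIVE_ENTRY_BYTES / 1e6
poc_mb = ENTRIES * POC_ENTRY_BYTES / 1e6
print(f"{verify_total:.2f} s, {native_mb:.1f} MB native, {poc_mb:.1f} MB proof of concept")
```

At exactly 30 microseconds per entry the total comes to about 1.40 s,
slightly above the 1.28 s quoted, which would correspond to a measured
cost closer to 27 microseconds per entry.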

In general, this solution seems to be achievable, nonintrusive and, overall, a
nice thing to have. We were considering working towards a patch if it sounds
reasonable to the community.

Thanks!
-Santiago.

On Sat, Dec 19, 2015 at 08:28:35PM -0500, Theodore Ts'o wrote:
> On Sat, Dec 19, 2015 at 12:30:18PM -0500, Santiago Torres wrote:
> > > Now, the crazy behavior where github users randomly and promiscuously
> > > do pushes and pull without doing any kind of verification may very
> > > well be dangerous. 
> > 
> > Yes, we were mostly familiar with this workflow before starting this
> > research. I can see how the "github generation" is open to many attacks
> > of this nature. Would git be interested in integrating a defense that
> > covers users of this nature (which seems to be a growing userbase)?
> 
> One of the interesting challenges is that git is a pretty low-level
> tool, and so people have built all sorts of different workflows on top
> of it.
> 
> For example, at $WORK, we use gerrit, which is a code review tool, so
> all git commits that are to be merged into the "upstream" repository
> 0gets pushed to a gerrit server, where it goes through a code review
> process where a second engineer can review the code, request changes,
> make comments, or ask questions, and where the git commits can go
> through multiple rounds of review / revision before they are finally
> accepted (at least one reviewer must give a +2 review, and there must
> be no -2 reviews; and there can be automated tools that do build or
> regression tests that can give automated -1 or -2 reviews) --- and
> where all of the information collected during the code review process
> is saved as part of the audit trail for a Sarbanes-Oxley (SOX)
> compliance regime.
> 
> Other people use github-style workflows, and others use signed tags
> with e-mail code reviews, etc.  And I'm sure there must be many others.
> 
> So the challenge is that in order to accomodate many possible
> workflows, some of which use third-party tools, changes to make git
> more secure for one workflow must not get in the way of these other
> workflows --- which means that enforcement of new controls for the
> "github generation" probably will have to be optional.  But then
> people belonging to the "github generation" can also easily turn off
> these features.  And as the NSA learned the hard way in Vietnam, if
> the tools cause any inconenience, or has the perception of
> constraining legitmate users, security features can have a way of
> getting turned off.[1]
> 
> [1] A History of US Communications Security, The David G. Boak
> lectures, Volume II, "Nestor in Vietname".  pg 43-44.  (A declassified
> version can be found at:
> http://www.governmentattic.org/18docs/Hist_US_COMSEC_Boak_NSA_1973u.pdf)
> 
> > > But so is someone who saves a 80 patch series from
> > > their inbox, and without reading or verifying all of the patches
> > > applies them blindly to their tree using "git am" --- or if they were
> > > using cvs or svn, bulk applied the patches without doing any
> > > verification....
> > 
> > Just out of curiosity, are there known cases of projects in which this
> > has happened (I've noticed that both Git and Linux are quite stringent
> > in their review/merge process so this wouldn't be the case).
> 
> I can't point at specific instances, but given that in the "github
> generation", people are fine with blindly pulling someone else's
> Docker images and running them on their production servers or
> workstations, and where software installation gets done with "wget
> http://example.org | bash" or the equivalent, it's probably more
> often than we might be comfortable with.
> 
> I also suspect that a bad guy would find that inserting a
> man-in-the-middle server into one of these installation flows is
> probably a much more practical attack in terms of real world
> considerations.  :-)
> 
> Cheers,
> 
> 						- Ted

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Malicously tampering git metadata?
  2016-01-12 18:21       ` Santiago Torres
@ 2016-01-12 18:39         ` Stefan Beller
  2016-01-14 17:16           ` Santiago Torres
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Beller @ 2016-01-12 18:39 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Git

On Tue, Jan 12, 2016 at 10:21 AM, Santiago Torres <santiago@nyu.edu> wrote:
> Hello Everyone,
>
> Thanks for the feedback regarding our attack scenario; it certainly shed some
> light on the current state of git's metadata protection. We were
> pleasantly surprised that attacks of this nature were considered, yet we think
> we can improve on the current mechanisms.
>
> We have been designing an extension that addresses this attack scenario (and
> other similar attacks). Although it was not originally based on push
> certificates, we feel that it works similarly to them. The principal advantages
> over push certificates are:
>
> 1) It doesn't require (although it could support it) a side channel. We store similar
>     information about branch status (push status) on the repository itself.
>
> 2) It is backwards compatible, as it doesn't modify the existing metadata
>     format.
>
> 3) Following Ted's email, it could be easily integrated in any git workflow.
>     Although some workflows might benefit more than others, it doesn't
>     get in the way of any existing workflow that we know of.
>
> 4) It covers a broader attack surface (e.g., our malicious-merge scenario).
>
> To keep things simple (we can elaborate in further emails), our solution
> basically works by keeping track of pushes by developers in an append-only
> file, so that, every time a branch is pushed, the developer signs his version of
> the log and his "push entry" (similar to a push certificate). Right now, we
> push this log to a separate branch called BSL (for Branch State Log), but
> ideally this could be part of the git metadata.

Recently in another context (an alternative refs backend) there was a proposal
by Shawn to keep the .git directory versioned by git itself, i.e. having only
loose refs in .git/refs, with a separate repository tracking .git/refs as a
directory structure.

Using that idea of a refs backend, combined with signed tags in the refs
repository, would give you a signed version of the log of possible push entries.

>
> Upon pulling/fetching, this push certificate chain (BSL) is also fetched
> and used to verify whether all branches are pointing to a sensible
> location (i.e., the location reported by the last user who
> pushed/merged). This ensures that a malicious server can't change the
> location to which branches point.
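
The verification step described in the quoted text can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the
proposed implementation: the thread does not give the BSL entry format, so the
field names and the functions append_push_entry, verify_log and
check_advertised_refs are hypothetical, and an HMAC with a per-developer
secret stands in for a real GPG signature.

```python
import hashlib
import hmac
import json

FIELDS = ("signer", "branch", "head", "prev")


def _digest(body):
    # Hash a canonical JSON encoding of the entry body.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def _sign(key, entry_hash):
    # Stand-in for a GPG signature (assumption: HMAC with a per-developer secret).
    return hmac.new(key, entry_hash.encode(), hashlib.sha256).hexdigest()


def append_push_entry(log, key, signer, branch, head):
    # Each entry commits to the previous one, which makes the log append-only.
    prev = log[-1]["entry_hash"] if log else ""
    body = {"signer": signer, "branch": branch, "head": head, "prev": prev}
    entry_hash = _digest(body)
    log.append({**body, "entry_hash": entry_hash, "sig": _sign(key, entry_hash)})


def verify_log(log, keys):
    # Walk the chain and check the links, the hashes and the signatures.
    prev = ""
    for entry in log:
        body = {name: entry[name] for name in FIELDS}
        if entry["prev"] != prev or _digest(body) != entry["entry_hash"]:
            return False
        key = keys.get(entry["signer"])
        if key is None:
            return False
        if not hmac.compare_digest(_sign(key, entry["entry_hash"]), entry["sig"]):
            return False
        prev = entry["entry_hash"]
    return True


def check_advertised_refs(log, keys, advertised):
    # Return the branches whose advertised head differs from the head recorded
    # by the last push entry for that branch (or that have no entry at all).
    if not verify_log(log, keys):
        raise ValueError("BSL chain or signature check failed")
    expected = {}
    for entry in log:
        expected[entry["branch"]] = entry["head"]
    return sorted(b for b, h in advertised.items() if expected.get(b) != h)
```

With such a log, a server that repoints master at a commit nobody pushed there
would be reported by check_advertised_refs, and editing an old entry would make
verify_log fail.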

This is what push certs ought to solve already?
AFAIU the main issue with untrustworthy servers is holding back the latest push.
As Ted said, usually there is a problem in the code and then the fix is pushed,
but the malicious server would not advertise the update, but deliver the old
unfixed version.

This attack cannot be mitigated by having either a side channel (email
announcements) or time outs (state is only good if push cert is newer than
<amount of time>, but this may require empty pushes)
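
For illustration, the time-out rule mentioned above (state is only good if the
push cert is newer than some amount of time) could be checked roughly like
this. It is a minimal sketch: the function name is_state_fresh and its
parameters are hypothetical, and how the certificate timestamp is obtained is
left out.

```python
from datetime import datetime, timedelta, timezone


def is_state_fresh(cert_time, max_age, now=None):
    # The advertised state is accepted only if the newest push certificate is
    # no older than max_age. A quiet project would need periodic empty pushes
    # to keep passing this check.
    if now is None:
        now = datetime.now(timezone.utc)
    return now - cert_time <= max_age
```

A client would call it with the timestamp of the most recent certificate it
could verify, for example is_state_fresh(cert_time, timedelta(days=7)).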


>
> Furthermore, upon fetching, users with write access also push an entry
> indicating that they fetched the BSL. This avoids cases in which a malicious
> attacker keeps older versions of the BSL and withholds changes from other users.

This would make it a "be malicious to all or none" thing? So the attacker
cannot attack a single target IIUC.

I have a bad feeling about repository modifications upon fetching, as software
distribution is a highly asymmetric workflow (the number of fetches is many
orders of magnitude larger than pushes), which may not scale well? (How would
you serialize parallel fetches into the BSL?)

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Malicously tampering git metadata?
  2016-01-12 18:39         ` Stefan Beller
@ 2016-01-14 17:16           ` Santiago Torres
  2016-01-14 17:21             ` Stefan Beller
  0 siblings, 1 reply; 14+ messages in thread
From: Santiago Torres @ 2016-01-14 17:16 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git

Hello Stefan, thanks for your feedback again. 

> This is what push certs ought to solve already?

Yes, they aim to solve the same issue. Unfortunately, push certificates
don't solve all possible scenarios of metadata manipulation (e.g., a
malicious server changing branch pointers to trick a user into merging
unwanted changes).

> AFAIU the main issue with untrustworthy servers is holding back the latest push.
> As Ted said, usually there is a problem in the code and then the fix is pushed,
> but the malicious server would not advertise the update, but deliver the old
> unfixed version.
> 
> This attack cannot be mitigated by having either a side channel (email
> announcements)
> or time outs (state is only good if push cert is newer than <amount of
> time>, but this may
> require empty pushes)
> 

I'm sorry, did you mean to say "can"? 

Yes, this is a possible solution to this issue. The solution I'm
proposing here wouldn't require a side channel though. As long as users'
keys are not compromised, the BSL could either expire (raise an alarm),
or force an irreconcilable fork attack --- the user who submitted the
changes won't be able to push until this change makes it through.

> 
> >
> > Furthermore, upon fetching, users with write access also push an entry
> > indicating that they fetched the BSL. This avoids cases in which a malicious
> > attacker keeps older versions of the BSL and withholds changes from other users.
> 
> This would make it a "be malicious to all or none" thing? So the
> attacker cannot attack
> a single target IIUC.

Yes, this is true. The idea is that any attack on a single target is
easy to identify.

> 
> I have a bad feeling about repository modifications upon fetching as
> software distribution is a highly asymmetric workflow (number of
> fetches is many orders of magnitude larger than pushes), which may
> not scale well? 

Yes, we were worried about the same thing ourselves. The upside is that push
and fetch entries in our scheme are also orders of magnitude smaller;
fetch entries do not need to be signed and they can be as little as four
bytes (they might be gc-able also). 


> (How would you serialize parallel fetches into the BSL?)

Yes, this would imply that locking needs to be performed on the server
side. It is important to note that fetch entries are only relevant for
users who have write access (as only they benefit from them). For
read-only fetches, like an automated build, this feature could be
disabled.
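
As a rough sketch of what serializing concurrent fetch entries could look
like, here is an in-process analogy using a lock. The class FetchLog and its
methods are hypothetical names for illustration; a real server would need a
lock that works across processes, and the actual fetch-entry format is not
specified in this thread.

```python
import threading


class FetchLog:
    def __init__(self):
        self._lock = threading.Lock()
        self._entries = []

    def record_fetch(self, user_id):
        # Only one writer appends at a time, so sequence numbers are unique
        # and stored in order even when fetches arrive concurrently.
        with self._lock:
            seq = len(self._entries)
            self._entries.append((seq, user_id))
            return seq

    def entries(self):
        # Return a copy so callers cannot modify the log behind the lock.
        with self._lock:
            return list(self._entries)
```

Read-only clients, such as an automated build, would simply never call
record_fetch.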

> 
> Thanks,
> Stefan

Thanks again!
-Santiago.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Malicously tampering git metadata?
  2016-01-14 17:16           ` Santiago Torres
@ 2016-01-14 17:21             ` Stefan Beller
  2016-01-22 18:00               ` Santiago Torres
  0 siblings, 1 reply; 14+ messages in thread
From: Stefan Beller @ 2016-01-14 17:21 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Git

On Thu, Jan 14, 2016 at 9:16 AM, Santiago Torres <santiago@nyu.edu> wrote:
> Hello Stefan, thanks for your feedback again.
>
>> This is what push certs ought to solve already?
>
> Yes, they aim to solve the same issue. Unfortunately, push certificates
> don't solve all possible scenarios of metadata manipulation (e.g., a
> malicious server changing branch pointers to trick a user into merging
> unwanted changes).
>
>> AFAIU the main issue with untrustworthy servers is holding back the latest push.
>> As Ted said, usually there is a problem in the code and then the fix is pushed,
>> but the malicious server would not advertise the update, but deliver the old
>> unfixed version.
>>
>> This attack cannot be mitigated by having either a side channel (email
>> announcements)
>> or time outs (state is only good if push cert is newer than <amount of
>> time>, but this may
>> require empty pushes)
>>
>
> I'm sorry, did you mean to say "can"?

Yes, formulating that sentence took a while and I did not proofread it.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Malicously tampering git metadata?
  2016-01-14 17:21             ` Stefan Beller
@ 2016-01-22 18:00               ` Santiago Torres
  2016-01-22 18:51                 ` Stefan Beller
  0 siblings, 1 reply; 14+ messages in thread
From: Santiago Torres @ 2016-01-22 18:00 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Git

On Thu, Jan 14, 2016 at 09:21:28AM -0800, Stefan Beller wrote:
> On Thu, Jan 14, 2016 at 9:16 AM, Santiago Torres <santiago@nyu.edu> wrote:
> > Hello Stefan, thanks for your feedback again.
> >
> >> This is what push certs ought to solve already?
> >
> > Yes, they aim to solve the same issue. Unfortunately, push certificates
> > don't solve all possible scenarios of metadata manipulation (e.g., a
> > malicious server changing branch pointers to trick a user into merging
> > unwanted changes).
> >
> >> AFAIU the main issue with untrustworthy servers is holding back the latest push.
> >> As Ted said, usually there is a problem in the code and then the fix is pushed,
> >> but the malicious server would not advertise the update, but deliver the old
> >> unfixed version.
> >>
> >> This attack cannot be mitigated by having either a side channel (email
> >> announcements)
> >> or time outs (state is only good if push cert is newer than <amount of
> >> time>, but this may
> >> require empty pushes)
> >>
> >
> > I'm sorry, did you mean to say "can"?
> 
> Yes, formulating that sentence took a while and I did not proofread it.

Sorry, Stefan. I didn't mean to come off as rude; I just wanted to make
sure I understood correctly what you were proposing.

Do you have any further insight? I think that, besides supporting
multiple workflows, synchronizing concurrent fetches might be an
issue for our solution.

Thanks a lot!
-Santiago.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [RFC] Malicously tampering git metadata?
  2016-01-22 18:00               ` Santiago Torres
@ 2016-01-22 18:51                 ` Stefan Beller
  0 siblings, 0 replies; 14+ messages in thread
From: Stefan Beller @ 2016-01-22 18:51 UTC (permalink / raw)
  To: Santiago Torres; +Cc: Git

On Fri, Jan 22, 2016 at 10:00 AM, Santiago Torres <santiago@nyu.edu> wrote:
> On Thu, Jan 14, 2016 at 09:21:28AM -0800, Stefan Beller wrote:
>> On Thu, Jan 14, 2016 at 9:16 AM, Santiago Torres <santiago@nyu.edu> wrote:
>> > Hello Stefan, thanks for your feedback again.
>> >
>> >> This is what push certs ought to solve already?
>> >
>> > Yes, they aim to solve the same issue. Unfortunately, push certificates
>> > don't solve all possible scenarios of metadata manipulation (e.g., a
>> > malicious server changing branch pointers to trick a user into merging
>> > unwanted changes).
>> >
>> >> AFAIU the main issue with untrustworthy servers is holding back the latest push.
>> >> As Ted said, usually there is a problem in the code and then the fix is pushed,
>> >> but the malicious server would not advertise the update, but deliver the old
>> >> unfixed version.
>> >>
>> >> This attack cannot be mitigated by having either a side channel (email
>> >> announcements)
>> >> or time outs (state is only good if push cert is newer than <amount of
>> >> time>, but this may
>> >> require empty pushes)
>> >>
>> >
>> > I'm sorry, did you mean to say "can"?
>>
>> Yes, formulating that sentence took a while and I did not proofread it.
>
> Sorry, Stefan. I didn't mean to come off as rude; I just wanted to make
> sure I understood correctly what you were proposing.

Not at all, I just made a typo. :)

>
> Do you have any further insight? I think that, besides supporting
> multiple workflows, synchronizing concurrent fetches might be an
> issue for our solution.

I did not think further about any issues there.

Thanks,
Stefan

>
> Thanks a lot!
> -Santiago.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread

Thread overview: 14+ messages
2015-12-16  3:26 [RFC] Malicously tampering git metadata? Santiago Torres
2015-12-16  7:20 ` Stefan Beller
2015-12-18  1:06   ` Santiago Torres
2015-12-18  3:55     ` Jeff King
2015-12-18  4:02 ` Jeff King
2015-12-18 23:10 ` Theodore Ts'o
2015-12-19 17:30   ` Santiago Torres
2015-12-20  1:28     ` Theodore Ts'o
2016-01-12 18:21       ` Santiago Torres
2016-01-12 18:39         ` Stefan Beller
2016-01-14 17:16           ` Santiago Torres
2016-01-14 17:21             ` Stefan Beller
2016-01-22 18:00               ` Santiago Torres
2016-01-22 18:51                 ` Stefan Beller
