* how to (integrity) verify a whole git repo
From: Christoph Anton Mitterer @ 2020-04-21 4:45 UTC
To: git

Hi.

I couldn't find a definitive answer on the following: how to cryptographically verify the integrity of a whole git repo (i.e. all its commits/blobs/etc. in the history)?

Assume e.g. I have the kernel sources and want to do some bisection. One has also retrieved Linus' and GregKH's keys via some trusted path and assumes that SHA1 is more or less still safe enough ;-)

1) Of course there are git verify-tag and verify-commit, which verify GPG signatures, but these alone check, AFAIU, only the respective tag/commit.

How to check everything else? Is it enough to git fsck --full?

Everything earlier in the history of a verified tag/commit should be cryptographically safe (assuming SHA1 is still secure enough), right?

2) But this of course won't show me anything which is in the repo but not earlier in the history of the tag/commit I've checked, right?! Is there a way to e.g. have everything dropped which is not verifiable via some signed commit/tag?

3) I'd assume that normal operations like checkout/bisect/etc. notice if some SHA1 sum doesn't match. So once I've verified, say, the kernel v5.6 tag, I could check out everything in the history of that and be sure it wasn't modified, right? But of course this wouldn't include e.g. other stable versions, like v5.5.13.

Thanks,
Chris.
* Re: how to (integrity) verify a whole git repo
From: Jonathan Nieder @ 2020-04-21 6:53 UTC
To: Christoph Anton Mitterer; +Cc: git

Hi Christoph,

Christoph Anton Mitterer wrote:

> How to cryptographically verify the integrity of a whole git repo (i.e.
> all its commits/blobs/etc. in the history)?

This happens automatically as part of fetch. When you fetch, the objects' content is transferred over the wire but not their names. The name of each object is a hash of its content. Thus, whenever you address an object by its name, you are using its verified identity.

> Assume e.g. I have the kernel sources and want to do some bisection.
> One has also retrieved Linus' and GregKH's keys via some trusted path
> and assumes that SHA1 is more or less still safe enough ;-)
>
> 1) Of course there are git verify-tag and verify-commit, which verify
> GPG signatures, but these alone check, AFAIU, only the respective
> tag/commit.

Tag and commit object content include the object ids for the objects they reference, so (assuming we are using a strong hash) their name is enough to verify all content reachable from them.

In other words, it's a Merkle tree.

> How to check everything else? Is it enough to git fsck --full?

fsck is helpful for checking that objects are valid --- that they don't reference any objects you don't have, that their format is correct, and so on. So it's good to run (or you can use the transfer.fsckObjects setting to run fsck as part of the clone or fetch operation).

Thanks and hope that helps,
Jonathan
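To make the "name is a hash of its content" point concrete, here is a minimal sketch, assuming a POSIX shell with `sha1sum` available and a SHA-1 repository. For a blob, the name is the SHA-1 of a short header (`blob <size>` followed by a NUL byte) plus the file content. The variable names and the sample content are illustrative only; `git hash-object` is consulted as a cross-check only if git is installed.

```shell
#!/bin/sh
# Content of a hypothetical one-line file (illustrative, not from the thread).
content='hello'

# Size in bytes, including the trailing newline that printf adds below.
size=$(printf '%s\n' "$content" | wc -c | tr -d ' ')

# Object name = SHA-1 over "blob <size>\0" followed by the content.
manual=$(printf 'blob %s\0%s\n' "$size" "$content" | sha1sum | cut -d' ' -f1)
echo "computed by hand: $manual"

# Cross-check against git itself when it is available.
if command -v git >/dev/null 2>&1; then
    from_git=$(printf '%s\n' "$content" | git hash-object --stdin)
    echo "git hash-object:  $from_git"
fi
```

Trees, commits and tags are named the same way over content that embeds the names of the objects they reference, which is what makes the structure a Merkle tree. The fetch-time check Jonathan mentions can be switched on with `git config --global transfer.fsckObjects true`.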
* Re: how to (integrity) verify a whole git repo
From: Christoph Anton Mitterer @ 2020-04-21 14:42 UTC
To: Jonathan Nieder; +Cc: git

Hey Jonathan.

On Mon, 2020-04-20 at 23:53 -0700, Jonathan Nieder wrote:

> This happens automatically as part of fetch. When you fetch, the
> objects' content is transferred over the wire but not their names. The
> name of each object is a hash of its content. Thus, whenever you
> address an object by its name, you are using its verified identity.

Okay, maybe I wasn't clear enough :D (mixing up integrity and authenticity).

I'd guess that what you describe here is that effectively the chain of all SHA1 hashes is computed when one does a fetch, right?

But this alone doesn't guarantee cryptographic authenticity, e.g. as in "that's the kernel sources as released by Linus".

> Tag and commit object content include the object ids for the objects
> they reference, so (assuming we are using a strong hash) their name
> is enough to verify all content reachable from them.
>
> In other words, it's a Merkle tree.

And for (cryptographically) checking the authenticity of that tree, wouldn't I need to verify the signatures on its leaves?

Taking again the kernel as an example: if I clone the repo (or fsck it later), then all I know is that there was no corruption, provided all the tips are correct, since they start the chain of hash sums to all other objects.

But an attacker could have just forged these tips. So for checking authenticity, I need to verify some signatures on them.

Now if I check e.g. Linus' signature on tag v5.6, I should know that everything earlier (in the tree, not chronologically) than that tag is authentic. But not e.g. any commits on top of v5.6 (which aren't either signed themselves or protected by another tag "above" them).

Nor any commits not reachable from v5.6, e.g. later stable patches: anything from above v5.5 (which is again below v5.6) up to v5.5.13, which is not.

So from my understanding, to use only commits that are authenticated by the kernel upstream developers, I'd need to verify all these tips and throw away everything which is not reachable from one of them.

Is that somehow possible?

Thanks,
Chris.
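The "verify all these tips" step can be sketched as a loop over the tags, assuming a POSIX shell and git on the PATH. The helper name `list_unverified_tags` and the throwaway demo repository are hypothetical. Note that `git verify-tag` succeeds for a good signature by any key in the keyring, so this sketch only separates trusted from untrusted tags if the keyring holds nothing but the keys one actually trusts.

```shell
#!/bin/sh
# list_unverified_tags: print every tag for which `git verify-tag` fails
# (no signature, bad signature, or signing key missing from the keyring).
list_unverified_tags() {
    git tag --list | while IFS= read -r tag; do
        if ! git verify-tag "$tag" >/dev/null 2>&1; then
            printf '%s\n' "$tag"
        fi
    done
}

# Demo in a throwaway repository with one annotated but unsigned tag.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git -c user.name=demo -c user.email=demo@example.invalid \
    -c commit.gpgsign=false commit -q --allow-empty -m 'first'
git -c user.name=demo -c user.email=demo@example.invalid \
    -c tag.gpgsign=false tag -a -m 'unsigned release' v0.1

unverified=$(list_unverified_tags)
echo "tags without a good signature: $unverified"
```

In a real kernel clone one would run `list_unverified_tags` directly inside the repository; every tag it prints is a tip that cannot serve as a trust anchor.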
* Re: how to (integrity) verify a whole git repo
From: Konstantin Ryabitsev @ 2020-04-21 16:19 UTC
To: Christoph Anton Mitterer; +Cc: Jonathan Nieder, git

On Tue, Apr 21, 2020 at 04:42:16PM +0200, Christoph Anton Mitterer wrote:

> Taking again the kernel as an example:
> If I clone the repo (or fsck it later), then all I know is that there
> was no corruption, provided all the tips are correct, since they start
> the chain of hash sums to all other objects.

Notably, there is normally only one branch in torvalds/linux.git, and that's "master". So, there's only one tip.

> But an attacker could have just forged these tips.
> So for checking authenticity, I need to verify some signatures on them.
>
> Now if I check e.g. Linus' signature on tag v5.6, I should know that
> everything earlier (in the tree, not chronologically) than that tag is
> authentic.

Yes, verifying a signature on a tag tells you that all commits are bit-for-bit exactly the same as on Linus's workstation where he created the signature.

> But not e.g. any commits on top of v5.6 (which aren't either signed
> themselves or protected by another tag "above" them).

This is mostly true, yes.

> Nor any commits not reachable from v5.6, e.g. later stable patches:
> anything from above v5.5 (which is again below v5.6) up to
> v5.5.13, which is not.

Stable commits would be in the stable tree, and those tags are signed by Greg Kroah-Hartman.

> So from my understanding, to use only commits that are authenticated by
> the kernel upstream developers, I'd need to verify all these tips and
> throw away everything which is not reachable from one of them.
>
> Is that somehow possible?

You probably don't care about commits that arrive between releases, so effectively you are already doing that?
Even if you have loose objects that aren't reachable from your current tip (e.g. you only care about objects in the stable branch linux-5.6.y), it's not like they are going to "poison" your tree, so removing them is just a garbage collection operation at best.

## Minor attestation rant

I would argue that your premise of "authenticity" is wrong. The best that we are currently able to offer is a guarantee that, at the point where the tag was signed, the tree is bit-for-bit identical to the tree the way it exists on Linus Torvalds' (or Greg KH's) workstation.

However, both Linus and Greg merge code from tens of thousands of other contributors and it's important to keep in mind that their tag signatures do not offer any kind of attestation proof of the code's actual authorship or origin. Looking for such proof would be near-impossible -- even if we had a universally accepted mechanism to do cryptographic attestation of all patches and commits, normal maintainer operations would necessarily break this chain:

- maintainers insert their own trailers into commit messages (Signed-off-by, Tested-by, Acked-by, etc).
- maintainers reorder and edit patches that they receive from individual contributors -- for typos, minor stylistic cleanups, extra comments, etc.
- maintainers routinely rebase patches they receive before they can submit them to be merged into mainline.

Full code attestation is possible in projects where all commits are forks and merges -- for example, many Git**b/Gerrit projects could be set up to require full cryptographic attestation of commits, if all operations are forks, pull requests, and merges. However, it would be impossible to force this development paradigm onto the Linux kernel -- it would be extremely disruptive and require massive individual effort to overhaul every maintainer's workflow.
Furthermore, many maintainers would reject this approach because they would disagree about the main premise behind the effort -- that cryptographically signing every commit offers enough tangible benefit to be worth it.

Let me expound on the last point. There are some 15,000 personas who have committed code to the Linux kernel (a persona could be the same person committing code from different commercial entities -- jdoe@google.com vs jdoe@redhat.com). Even if we assume that each commit is signed, we then must have a way to perform some kind of meaningful verification, right?

- Where do we get all the public keys required for such a task?
- How do we handle cases where a key has expired or, worse, has been revoked by the developer? This can't invalidate their past commits, because it's impossible to re-sign those.
- How do we bootstrap distributed trust without relying on someone being a Fundamentally Non-corruptible Person? It's certainly not me -- I have close relatives living under, shall we say, regimes with loose standards when it comes to personal freedoms.
- How much trust should we be putting into cryptographic signatures? Linux developers aren't necessarily that much better about keeping their workstations protected against malicious attacks, so they are just as vulnerable to having their private keys stolen as anyone else.

For this reason, Linux maintainers use either a zero-trust approach, or a last-leg trust approach:

- Submaintainers don't put much trust into *who* wrote the code and review all submissions they receive as potentially containing security bugs (intentional or not); their job is to review the code and pass it up the chain to maintainers.
- If maintainers receive pull requests from submaintainers, then they *may* check cryptographic signatures on the trees they pull.
  I am trying to encourage all maintainers to do this, and I've been working to introduce patch attestation so that maintainers preferring to work with patch series as opposed to pull requests can have similar functionality.
- Linus checks all signatures on trees he pulls from non-kernel.org locations. Unfortunately, I've not been able to convince him that he should check them on stuff he pulls from kernel.org as well (and he has his own reasons for that).

So, all of this is to say that as the person cloning linux.git you are merely the last link in the chain of "trusting the maintainer before you." In your case that maintainer is Linus (or Greg KH), and you have to agree that, in the end, "having a tree that is bit-for-bit identical with what Linus has" is a pretty good assurance that it's as "authentic Linux" as it gets.

-K
* Re: how to (integrity) verify a whole git repo
From: Christoph Anton Mitterer @ 2020-04-23 18:12 UTC
To: Konstantin Ryabitsev; +Cc: git

On Tue, 2020-04-21 at 12:19 -0400, Konstantin Ryabitsev wrote:

> > So from my understanding, to use only commits that are authenticated
> > by the kernel upstream developers, I'd need to verify all these tips
> > and throw away everything which is not reachable from one of them.
> >
> > Is that somehow possible?
>
> You probably don't care about commits that arrive between releases,

No, I guess not.

> so effectively you are already doing that? Even if you have loose
> objects that aren't reachable from your current tip (e.g. you only care
> about objects in the stable branch linux-5.6.y), it's not like they are
> going to "poison" your tree, so removing them is just a garbage
> collection operation at best.

Well, it's clear that any "loose" objects (in the sense of "not part of the history of something that is signed and that I trust") don't poison the tree before whatever I trust... but of course only if one never accidentally uses any of them.

For Linus' kernel and Greg's stable kernel repos this is probably rather unlikely, since they do not contain much which is not signed by one of the two.

But other projects might have many development branches or other stuff which might not be signed at all. When one then "works" on such a repo, one doesn't always want to check whether the current stuff one uses is signed and trusted or not.

> I would argue that your premise of "authenticity" is wrong. The best
> that we are currently able to offer is a guarantee that, at the point
> where the tag was signed, the tree is bit-for-bit identical to the tree
> the way it exists on Linus Torvalds' (or Greg KH's) workstation.

And isn't that already something?
:-D It means, at the least, that no simple MitM was possible, in contrast to when I just git clone git://whatever .

> However, both Linus and Greg merge code from tens of thousands of
> other contributors and it's important to keep in mind that their tag
> signatures do not offer any kind of attestation proof of the code's
> actual authorship or origin.

Sure... but this is the case anyway... and nothing which one could easily change or improve. The best thing in terms of authenticity one can possibly get is being able to have the repo exactly the same as it is considered correct at its canonical upstream.

Everything better would require full trust paths and mutual signing between all participating developers, which would surely be nice to have, but is probably a completely different question.

Also, there are many much smaller projects, where things would be much easier.

> - Submaintainers don't put much trust into *who* wrote the code and
>   review all submissions they receive as potentially containing
>   security bugs (intentional or not); their job is to review the code
>   and pass it up the chain to maintainers.
> - If maintainers receive pull requests from submaintainers, then they
>   *may* check cryptographic signatures on the trees they pull. I am
>   trying to encourage all maintainers to do this, and I've been working
>   to introduce patch attestation so that maintainers preferring to work
>   with patch series as opposed to pull requests can have similar
>   functionality.
> - Linus checks all signatures on trees he pulls from non-kernel.org
>   locations. Unfortunately, I've not been able to convince him that he
>   should check them on stuff he pulls from kernel.org as well (and he
>   has his own reasons for that).

But all this already gives quite some trust in the whole thing.

> So, all of this is to say that as the person cloning linux.git you are
> merely the last link in the chain of "trusting the maintainer before
> you."
> In your case that maintainer is Linus (or Greg KH), and you have
> to agree that, in the end, "having a tree that is bit-for-bit identical
> with what Linus has" is a pretty good assurance that it's as "authentic
> Linux" as it gets.

Exactly... it's at least not much worse (if at all) than taking e.g. my pre-compiled distro kernel, for which the sources are likely not better checked or more securely retrieved than when I clone Linus' git and verify the tags.

My main concern was really to ideally "throw away" everything which wasn't protected by a set of certain keys, so that I wouldn't accidentally use it.

Thanks,
Chris.
* Re: how to (integrity) verify a whole git repo
From: Junio C Hamano @ 2020-04-21 19:14 UTC
To: Christoph Anton Mitterer; +Cc: git

Christoph Anton Mitterer <calestyo@scientia.net> writes:

> How to check everything else? Is it enough to git fsck --full?
>
> Everything earlier in the history of a verified tag/commit should be
> cryptographically safe (assuming SHA1 is still secure enough),
> right?

Correct.

> 2) But this of course won't show me anything which is in the repo but
> not earlier in the history of the tag/commit I've checked, right?!
> Is there a way to e.g. have everything dropped which is not verifiable
> via some signed commit/tag?

You can compute the commits that are not reachable from any of the signed tags.

    git rev-list --all --not $list_tags_and_commits_you_trust_here

will enumerate all the commits that are not reachable from those tags.

But your "have everything dropped" is a fuzzy notion and you must be more precise to define what you want. Imagine this history:

        ----o-----o-----L-----x----x-----x-----x-----x----x HEAD (master)
       /
      /
     /
 ... ------o----o----G

where you have two people you trust (Linus and Greg), HEAD is the tip of your 'master' branch, probably you fetched from Linus, L and G are the two recent tags Linus and Greg signed.

If you enumerate commits that are not reachable from L or G, you'll get all commits that are marked with 'x'. Commits marked with 'o' are reachable from either 'L' or 'G', and you would want to keep them.

Now, you need to define what you mean by "have everything dropped". You can remove commits 'x' but then after that where would your 'master' branch point at? There is no good answer to that question.

What you could do is remove all branches and tags except for the signed tags you trust from your repository and then "git repack" the repository. Then there will be tags that point at L and G but you'd be discarding 'master' (which is not signed) and repack will discard all 'x' in the sample history illustrated above.
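The enumeration step can be tried out on a toy history. This is a minimal sketch, assuming a POSIX shell and git on the PATH; the repository, the `commit` helper and the commit names are hypothetical, and the lightweight tag `L` merely stands in for a signed tag whose signature has already been verified.

```shell
#!/bin/sh
# Build a throwaway history:  o1 -- o2 (tag L) -- x1 -- x2 -- x3 (HEAD)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .

# Helper: create an empty commit with a fixed demo identity.
commit() {
    git -c user.name=demo -c user.email=demo@example.invalid \
        -c commit.gpgsign=false commit -q --allow-empty -m "$1"
}

commit o1
commit o2
git -c tag.gpgsign=false tag L    # stand-in for a signed, verified tag
commit x1
commit x2
commit x3

# Everything reachable from any ref or HEAD, minus what L vouches for.
trusted='L'
untrusted_count=$(git rev-list --count --all --not $trusted)
echo "commits not covered by a trusted tag: $untrusted_count"

# Show the subjects of those commits (the 'x' commits in the picture).
git log --format='%s' --all --not $trusted
```

With several trusted tags, `$trusted` would simply list all of them separated by spaces; every commit printed at the end is one that no trusted tag reaches.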
* Re: how to (integrity) verify a whole git repo
From: Christoph Anton Mitterer @ 2020-04-23 4:02 UTC
To: Junio C Hamano; +Cc: git

Hey Junio.

On Tue, 2020-04-21 at 12:14 -0700, Junio C Hamano wrote:

> You can compute the commits that are not reachable from any of the
> signed tags.
>
>     git rev-list --all --not $list_tags_and_commits_you_trust_here
>
> will enumerate all the commits that are not reachable from those
> tags.

And with reachable you mean: "commits which were not before the commits I trust" ("before" obviously again not in terms of their date, but their position in the tree).

> But your "have everything dropped" is a fuzzy notion and you must be
> more precise to define what you want. Imagine this history:
>
>         ----o-----o-----L-----x----x-----x-----x-----x----x HEAD (master)
>        /
>       /
>      /
>  ... ------o----o----G
>
> where you have two people you trust (Linus and Greg), HEAD is the
> tip of your 'master' branch, probably you fetched from Linus, L and
> G are the two recent tags Linus and Greg signed.
>
> If you enumerate commits that are not reachable from L or G, you'll
> get all commits that are marked with 'x'. Commits marked with 'o'
> are reachable from either 'L' or 'G', and you would want to keep
> them.

That seems to be more or less what I'd want.

> Now, you need to define what you mean by "have everything dropped".
> You can remove commits 'x' but then after that where would your
> 'master' branch point at? There is no good answer to that question.

Hmm, well, naively I'd have said master should point to L, assuming Greg's branch was merged into it and assuming git knows which branch was the one merged into.

Of course that would leave Greg's branch possibly dangling at G. Maybe one could handle such cases like this:

        ----o-----o-----L-----x----x-----x-----x-----x----x HEAD (master)
       /
      /
     /
 ... -----t----o----G

If the former branch name can be determined (from the commit message?), recreate it. If not, the commits from Greg's branch could be either left unreachable or maybe, with some special option, could be pointed at by some newly created branch name, foo-1 or whatever. If Greg's branch contains a commit pointed to by a tag (here named t), at least this would be reachable anyway.

But I guess for the use case I'm thinking about, unreachable commits wouldn't be that much of a problem.

> What you could do is remove all branches and tags except for the
> signed tags you trust from your repository and then "git repack"
> the repository. Then there will be tags that point at L and G but
> you'd be discarding 'master' (which is not signed) and repack will
> discard all 'x' in the sample history illustrated above.

Well, one could probably just manually set master to some reasonable commit, i.e. the one which was likely master anyway at some point in time, until Linus added further commits.

Is there an easy (like for people who don't dream in git ;-) ) and ideally fast way to do all this? I would have guessed that a command which does this more or less out of the box might be quite helpful for security-conscious people.

The scenario shouldn't be so rare:

- one clones a repo, where commits are usually not signed, but tags are
- one has a number of trusted people and can even securely retrieve their keys (in my case, Debian ships Linus' and Greg's keys in the source package of the kernel)
- one needs to work with the repo, including any older states in the history (in my case it's trying to bisect the - for me - showstopper bug: https://bugzilla.kernel.org/show_bug.cgi?id=207245 )
- one doesn't want to use anything which is not signed by trusted people, so basically one wants a repo as if it had just been cloned when all branches/etc. were at the state of a signed tag (or commit).
So I have something like the (stable) kernel repo, which looks a bit like this (with (L) and (G) indicating who signed):

                        x---x---x--- foo
                       /
----o-----v5.5(L)----o----o-----v5.6(L)----x-----x----x master
              \                     \
               \                     o----v5.6.1(G)---o----v5.6.1(G)
                \
                 o----o----v5.5.1(G)---o---o---v5.5.1(G)---x---x

A command like:

    git drop-unsigned-stuff --trusted-key 00411886 --trusted-key 6092693E

would end up in this (and even garbage-collect all unreachable stuff already, unless one uses some special option):

----o-----v5.5(L)----o----o-----v5.6(L) master
              \                     \
               \                     o----v5.6.1(G)---o----v5.6.1(G)
                \
                 o----o----v5.5.1(G)---o---o---v5.5.1(G)

So with that repo, unless I fetch something new, I could be sure that everything I have or could potentially check out was at some time trusted by someone I trust.

In the example above, a branch (foo) which is completely unsigned would consequently be dropped completely.

In earlier days, most projects released their (signed) sources as some tarball... many nowadays just set (and sometimes even sign) some git tag (which is great)... but with the old tarball one could have been sure that everything in it is trusted (if one trusts the signer), which with git is of course less simple. So for such cases I would have liked a simple way to get rid of everything untrusted.

But probably my use case is just too exotic, otherwise git would already have a helper command for it ^^

Cheers,
Chris.
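No `git drop-unsigned-stuff` command exists; the following is only a rough sketch of how the idea could be approximated with existing plumbing, assuming a POSIX shell and git on the PATH. The function names `tag_has_good_signature`, `drop_untrusted_refs` and `demo_trust` are hypothetical. The demo uses a stub predicate in place of real signature checking so that it runs without any GPG keys, and it detaches HEAD at a trusted tag instead of guessing where `master` should point.

```shell
#!/bin/sh
# Real-world predicate: a ref is trusted only if it is a tag that
# `git verify-tag` accepts (good signature, key present in the keyring).
tag_has_good_signature() {
    case "$1" in
        refs/tags/*) git verify-tag "$1" >/dev/null 2>&1 ;;
        *) return 1 ;;
    esac
}

# drop_untrusted_refs PREDICATE ANCHOR
# Detach HEAD at ANCHOR (a trusted tag), delete every ref that PREDICATE
# rejects, then expire reflogs and prune unreachable objects.
drop_untrusted_refs() {
    predicate=$1
    anchor=$2
    git checkout -q --detach "$anchor" || return 1
    refs=$(git for-each-ref --format='%(refname)')
    printf '%s\n' "$refs" | while IFS= read -r ref; do
        [ -n "$ref" ] || continue
        "$predicate" "$ref" || git update-ref -d "$ref"
    done
    git reflog expire --expire=now --all
    git gc --quiet --prune=now
}

# Demo history: o1 -- o2 (tag v1) -- x1 (branch foo) -- x2 (initial branch)
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
commit() {
    git -c user.name=demo -c user.email=demo@example.invalid \
        -c commit.gpgsign=false commit -q --allow-empty -m "$1"
}
commit o1
commit o2
git -c tag.gpgsign=false tag v1   # stand-in for a signed release tag
commit x1
git branch foo                    # unsigned development branch
commit x2

# Stub: pretend only refs/tags/v1 carries a trusted signature.
demo_trust() { [ "$1" = refs/tags/v1 ]; }
drop_untrusted_refs demo_trust v1

remaining_refs=$(git for-each-ref --format='%(refname)')
remaining_commits=$(git rev-list --count --all)
echo "refs kept: $remaining_refs"
echo "commits still reachable: $remaining_commits"
```

In a real clone one would call `drop_untrusted_refs tag_has_good_signature v5.6` (or another verified tag as the anchor). Restricting trust to specific keys, as the `--trusted-key` options suggest, would additionally require inspecting the signer's fingerprint, for example from the `VALIDSIG` line that `git verify-tag --raw` writes to standard error; that part is not shown here. The operation is destructive, so it is best tried on a copy of the repository first.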