* Gitorious should use CRC128 / 256 / 512 instead of SHA-1
@ 2023-01-13 12:59 Hans Petter Selasky
  2023-01-13 13:30 ` Konstantin Khomoutov
  0 siblings, 1 reply; 42+ messages in thread
From: Hans Petter Selasky @ 2023-01-13 12:59 UTC
To: git

Hi,

Currently GIT only supports cryptographic hashes for its commit tags.

That means:

1) It's very difficult to edit the history without also recomputing the hash tags for all commits after the needed change-point, which then means references to a repository are broken.

2) Only a single bit error in the main repository can break everything!

3) Illicit contents may be present in binary blobs, which in the future may need to be removed without warrant, and the only way to do that is by rebasing and force pushing, which will break "everything". It can be everything from child-porn to expired distribution licenses.

Many people think that bit errors cannot happen because the memory uses ECC and the file system uses cryptographic hashes to verify the integrity of the data. But what many people forget about is that when copying data from memory to disk, typically using a DMA channel, data is copied w/o any kind of integrity protection, because the integrity protection is not end-to-end. The integrity protection is only per-link.

Therefore I propose the following changes to GIT.

1) Use a CRC128 / 256 or 512 non-cryptographic based hashing algorithm as default.

2) Add support for a CRC fixup field, which usually is zero, but when merges are needed, it can be non-zero, to allow the hash-tag-value to remain the same! This also allows for easy conversion of existing GIT repositories to the new scheme.

3) All git objects should be uncompressed. CRC-XXX can easily be used to correct multiple bit errors without any performance overhead.

Please CC me. I'm not subscribed to this list.

--HPS

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 12:59 Gitorious should use CRC128 / 256 / 512 instead of SHA-1 Hans Petter Selasky
@ 2023-01-13 13:30 ` Konstantin Khomoutov
  2023-01-13 13:39 ` Hans Petter Selasky
  0 siblings, 1 reply; 42+ messages in thread
From: Konstantin Khomoutov @ 2023-01-13 13:30 UTC
To: git; +Cc: Hans Petter Selasky

On Fri, Jan 13, 2023 at 01:59:44PM +0100, Hans Petter Selasky wrote:

> Currently GIT only supports cryptographic hashes for its commit tags.
[...]

https://github.com/git/git/blob/9bf691b78cf906751e65d65ba0c6ffdcd9a5a12c/Documentation/technical/hash-function-transition.txt

It's not clear why you are referring to Gitorious in your mail's subject and then talking about Git.

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 13:30 ` Konstantin Khomoutov
@ 2023-01-13 13:39 ` Hans Petter Selasky
  2023-01-13 14:21 ` rsbecker
  ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Hans Petter Selasky @ 2023-01-13 13:39 UTC
To: git

On 1/13/23 14:30, Konstantin Khomoutov wrote:
> On Fri, Jan 13, 2023 at 01:59:44PM +0100, Hans Petter Selasky wrote:
>
>> Currently GIT only supports cryptographic hashes for its commit tags.
> [...]
>
> https://github.com/git/git/blob/9bf691b78cf906751e65d65ba0c6ffdcd9a5a12c/Documentation/technical/hash-function-transition.txt
>
> It's not clear why you are referring to Gitorious in your mail's subject and
> then talking about Git.
>

Hi,

I thought that Git was short for Gitorious? My bad.

The document you refer to really highlights my concerns, that a strong cryptographic hash algorithm is the highway to hell.

Do _not_ use a cryptographic hash for Git. Use plain good old CRC hashes.

Just imagine the consequences of finding child porn inside a 10-year old firmware binary blob in the Linux kernel. Will you just ignore it, or will you fix it?

That's why I say, that it must be possible to forge the hashes by default.

--HPS

* RE: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 13:39 ` Hans Petter Selasky
@ 2023-01-13 14:21 ` rsbecker
  2023-01-13 14:42 ` Hans Petter Selasky
  ` (2 more replies)
  2023-01-13 15:30 ` Konstantin Khomoutov
  2023-01-13 15:39 ` Konstantin Ryabitsev
  2 siblings, 3 replies; 42+ messages in thread
From: rsbecker @ 2023-01-13 14:21 UTC
To: 'Hans Petter Selasky', git

On January 13, 2023 8:40 AM, Hans Petter Selasky wrote:
>On 1/13/23 14:30, Konstantin Khomoutov wrote:
>> On Fri, Jan 13, 2023 at 01:59:44PM +0100, Hans Petter Selasky wrote:
>>
>>> Currently GIT only supports cryptographic hashes for its commit tags.
>> [...]
>>
>> https://github.com/git/git/blob/9bf691b78cf906751e65d65ba0c6ffdcd9a5a12c/Documentation/technical/hash-function-transition.txt
>>
>> It's not clear why you are referring to Gitorious in your mail's
>> subject and then talking about Git.
>>
>
>Hi,
>
>I thought that Git was short for Gitorious? My bad.
>
>The document you refer to really highlights my concerns, that a strong
>cryptographic hash algorithm is the highway to hell.
>
>Do _not_ use a cryptographic hash for Git. Use plain good old CRC hashes.
>
>Just imagine the consequences of finding child porn inside a 10-year old firmware
>binary blob in the Linux kernel. Will you just ignore it, or will you fix it?
>
>That's why I say, that it must be possible to forge the hashes by default.

I do not understand the goal of this request. If it is possible to forge hashes, then nothing in a git repository can ever be trusted. Signed content will no longer be verifiable. The whole Merkle tree representing the commit history becomes easily corruptible by hackers and no upstream remote repository can ever be trusted - or someone's own if someone targets a repo with malware that rewrites hashes. Imagine a scenario when malware replaces a blob in a repo and then forges the hash to pretend that the replacement never occurred. Using git as a supply chain audit trail becomes impossible. This is a potential vector for ransomware invading the git ecosystem. This seems like a really fatal path to take for the product.

The advantage of how git functions is that it is possible to mirror or clone repositories, protecting from hardware errors. Repositories exist in distributed form, so there may be hundreds or thousands of copies in case someone's copy is corrupted by a disk or memory write error - so that takes hash reconstruction out of the requirement set. If the git architecture was based on a central repository model only, then this might be a reasonable request, but that is not how git works. If, for instance, a main GitHub repo is somehow corrupted, it can be repaired by a push --force or a clone from a different instance.

Unless I am missing your point.

--Randall

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 14:21 ` rsbecker
@ 2023-01-13 14:42 ` Hans Petter Selasky
  2023-01-13 15:45 ` Konstantin Ryabitsev
  2023-01-13 15:15 ` Hans Petter Selasky
  2023-01-13 17:44 ` Philip Oakley
  2 siblings, 1 reply; 42+ messages in thread
From: Hans Petter Selasky @ 2023-01-13 14:42 UTC
To: rsbecker, git

On 1/13/23 15:21, rsbecker@nexbridge.com wrote:
> On January 13, 2023 8:40 AM, Hans Petter Selasky wrote:
>> On 1/13/23 14:30, Konstantin Khomoutov wrote:
>>> On Fri, Jan 13, 2023 at 01:59:44PM +0100, Hans Petter Selasky wrote:
>>>
>>>> Currently GIT only supports cryptographic hashes for its commit tags.
>>> [...]
>>>
>>> https://github.com/git/git/blob/9bf691b78cf906751e65d65ba0c6ffdcd9a5a12c/Documentation/technical/hash-function-transition.txt
>>>
>>> It's not clear why you are referring to Gitorious in your mail's
>>> subject and then talking about Git.
>>>
>>
>> Hi,
>>
>> I thought that Git was short for Gitorious? My bad.
>>
>> The document you refer to really highlights my concerns, that a strong
>> cryptographic hash algorithm is the highway to hell.
>>
>> Do _not_ use a cryptographic hash for Git. Use plain good old CRC hashes.
>>
>> Just imagine the consequences of finding child porn inside a 10-year old firmware
>> binary blob in the Linux kernel. Will you just ignore it, or will you fix it?
>>
>> That's why I say, that it must be possible to forge the hashes by default.
>

Hi,

> I do not understand the goal of this request. If it is possible to forge hashes, then nothing in a git repository can ever be trusted. Signed content will no longer be verifiable. The whole Merkle tree representing the commit history becomes easily corruptible by hackers and no upstream remote repository can ever be trusted - or someone's own if someone targets a repo with malware that rewrites hashes. Imagine a scenario when malware replaces a blob in a repo and then forges the hash to pretend that the replacement never occurred. Using git as a supply chain audit trail becomes impossible. This is a potential vector for ransomware invading the git ecosystem. This seems like a really fatal path to take for the product.

If a hacker replaces a blob, everyone on the project will see it, because such changes typically generate a commit e-mail.

And then an action will be made to revoke the access of that hacker. Now a clever hacker wouldn't do that. A clever hacker would just flip one bit somewhere in a random blob, looking like a hardware fault, and then force the project to rewind to backups every day, because the repository can no longer be verified.

> The advantage of how git functions is that it is possible to mirror or clone repositories, protecting from hardware errors. Repositories exist in distributed form, so there may be hundreds or thousands of copies in case someone's copy is corrupted by a disk or memory write error - so that takes hash reconstruction out of the requirement set. If the git architecture was based on a central repository model only, then this might be a reasonable request, but that is not how git works. If, for instance, a main GitHub repo is somehow corrupted, it can be repaired by a push --force or a clone from a different instance.
>

There is no advantage from protecting from hardware errors, unless you can recover from them! Cryptographic hash algorithms are not suitable to recover bits. They only tell whether data is OK or not OK, and if there is no backup, you lose it!

It is no solution for big repositories to rewind to backups just because of bit-flips. Such problems should be fixed w/o the need to roll-back, because that stops the entire production!

> it can be repaired by a push --force

Hobby projects can do that, but not big projects like FreeBSD and the Linux kernel.

> Unless I am missing your point.

Yes, a little bit :-)

--HPS

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 14:42 ` Hans Petter Selasky
@ 2023-01-13 15:45 ` Konstantin Ryabitsev
  2023-01-13 15:50 ` Hans Petter Selasky
  2023-01-13 15:54 ` Hans Petter Selasky
  0 siblings, 2 replies; 42+ messages in thread
From: Konstantin Ryabitsev @ 2023-01-13 15:45 UTC
To: Hans Petter Selasky; +Cc: rsbecker, git

On Fri, Jan 13, 2023 at 03:42:48PM +0100, Hans Petter Selasky wrote:
> > I do not understand the goal of this request. If it is possible to forge
> > hashes, then nothing in a git repository can ever be trusted. Signed
> > content will no longer be verifiable. The whole Merkle tree representing
> > the commit history becomes easily corruptible by hackers and no upstream
> > remote repository can ever be trusted - or someone's own if someone
> > targets a repo with malware that rewrites hashes. Imagine a scenario when
> > malware replaces a blob in a repo and then forges the hash to pretend that
> > the replacement never occurred. Using git as a supply chain audit trail
> > becomes impossible. This is a potential vector for ransomware invading the
> > git ecosystem. This seems like a really fatal path to take for the
> > product.
>
> If a hacker replaces a blob, everyone on the project will see it, because
> such changes typically generate a commit e-mail.

I don't think you have a very clear picture of how git works.

> And then an action will be made to revoke the access of that hacker. Now a
> clever hacker wouldn't do that. A clever hacker would just flip one bit
> somewhere in a random blob, looking like a hardware fault, and then force
> the project to rewind to backups every day, because the repository can no
> longer be verified.

That's not how it works at all. If there is a corrupted object, the admins of the repository just put the correct object into place either from a backup or from another copy of the repository. There is no rewinding required.

> There is no advantage from protecting from hardware errors, unless you can
> recover from them! Cryptographic hash algorithms are not suitable to recover
> bits. They only tell whether data is OK or not OK, and if there is no backup,
> you lose it!

This is true about all digital media.

> It is no solution for big repositories to rewind to backups just because
> of bit-flips. Such problems should be fixed w/o the need to roll-back,
> because that stops the entire production!

No it doesn't.

> > it can be repaired by a push --force
>
> Hobby projects can do that, but not big projects like FreeBSD and the Linux
> kernel.

Sure they can, but not due to missing objects (a corrupted object is just a missing object). If, for some reason, Linus ever needs to remove something from linux.git, he will do it and just give a heads-up why and for what reason.

I think you're misunderstanding some of the core principles of git.

-K

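The repair path described in the message above (a corrupted object is treated as missing and replaced with a good copy) works because an object's name is the hash of its contents, so any copy can be checked independently. The following is only a minimal sketch of that idea, assuming the SHA-1 object format and zlib-compressed loose objects; the two dictionaries stand in for two repositories' object stores, and `make_object`, `is_intact`, and `repair` are hypothetical helper names, not git's own API.

```python
import hashlib
import zlib


def make_object(kind, payload):
    """Build a loose object: returns (object id, zlib-compressed bytes)."""
    raw = f"{kind} {len(payload)}".encode() + b"\0" + payload
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def is_intact(oid, stored):
    """True if the stored bytes decompress and hash back to the object id."""
    try:
        raw = zlib.decompress(stored)
    except zlib.error:
        return False
    return hashlib.sha1(raw).hexdigest() == oid


def repair(primary, mirror):
    """Replace damaged objects in `primary` with verified copies from `mirror`."""
    repaired = []
    for oid, stored in list(primary.items()):
        if is_intact(oid, stored):
            continue
        # Only accept the mirror's copy if it verifies against the same id.
        if oid in mirror and is_intact(oid, mirror[oid]):
            primary[oid] = mirror[oid]
            repaired.append(oid)
    return repaired


oid, stored = make_object("blob", b"hello\n")
mirror = {oid: stored}

damaged = bytearray(stored)
damaged[-1] ^= 0x01  # flip a single bit, as a disk or DMA error might
primary = {oid: bytes(damaged)}

was_damaged = not is_intact(oid, primary[oid])
repaired = repair(primary, mirror)
print(was_damaged, repaired == [oid], is_intact(oid, primary[oid]))
```

The hash is not used to reconstruct the flipped bit; it is used to decide which of the available copies is the correct one, which is why no history rewrite is involved.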
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 15:45 ` Konstantin Ryabitsev
@ 2023-01-13 15:50 ` Hans Petter Selasky
  2023-01-13 15:56 ` rsbecker
  2023-01-13 15:54 ` Hans Petter Selasky
  1 sibling, 1 reply; 42+ messages in thread
From: Hans Petter Selasky @ 2023-01-13 15:50 UTC
To: Konstantin Ryabitsev; +Cc: rsbecker, git

On 1/13/23 16:45, Konstantin Ryabitsev wrote:
> I think you're misunderstanding some of the core principles of git.

Maybe, I'm usually commanding git via the terminal.

But if you say you can already edit stuff, why does the commit hash need to be cryptographic? I don't get that part. Yeah, I think of git commits like blockchain.

--HPS

* RE: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 15:50 ` Hans Petter Selasky
@ 2023-01-13 15:56 ` rsbecker
  2023-01-13 16:02 ` Hans Petter Selasky
  0 siblings, 1 reply; 42+ messages in thread
From: rsbecker @ 2023-01-13 15:56 UTC
To: 'Hans Petter Selasky', 'Konstantin Ryabitsev'; +Cc: git

On January 13, 2023 10:50 AM, Hans Petter Selasky wrote:
>On 1/13/23 16:45, Konstantin Ryabitsev wrote:
>> I think you're misunderstanding some of the core principles of git.
>
>Maybe, I'm usually commanding git via the terminal.
>
>But if you say you can already edit stuff, why does the commit hash need to be
>cryptographic? I don't get that part. Yeah, I think of git commits like blockchain.

git is using SHA1/SHA256 (which happen to be coincidentally cryptographic) as message digests with a very low probability of collisions when the hashes are computed. There is never a situation, implied by cryptography, where there is a decode of a git hash. In order to make git a blockchain, you would need to implement central signing authorities, which would require a fork if the signature mechanism changes. The signature mechanism (SSH, GPG) is distinct from hash computation in git's trees, but depends on hash integrity.

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 15:56 ` rsbecker
@ 2023-01-13 16:02 ` Hans Petter Selasky
  0 siblings, 0 replies; 42+ messages in thread
From: Hans Petter Selasky @ 2023-01-13 16:02 UTC
To: rsbecker, 'Konstantin Ryabitsev'; +Cc: git

On 1/13/23 16:56, rsbecker@nexbridge.com wrote:
> git is using SHA1/SHA256 (which happen to be coincidentally cryptographic) as message digests with a very low probability of collisions when the hashes are computed. There is never a situation, implied by cryptography, where there is a decode of a git hash. In order to make git a blockchain, you would need to implement central signing authorities, which would require a fork if the signature mechanism changes. The signature mechanism (SSH, GPG) is distinct from hash computation in git's trees, but depends on hash integrity.

I see. But at the same time any unique enough hash, identifies a specific piece of code or checkout, even though it is not under a specific signing authority. And that is the problem, that authorities may distribute allowed-only-hashes for their hardware ...

--HPS

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 15:45 ` Konstantin Ryabitsev
  2023-01-13 15:50 ` Hans Petter Selasky
@ 2023-01-13 15:54 ` Hans Petter Selasky
  2023-01-13 16:02 ` Konstantin Ryabitsev
  1 sibling, 1 reply; 42+ messages in thread
From: Hans Petter Selasky @ 2023-01-13 15:54 UTC
To: Konstantin Ryabitsev; +Cc: rsbecker, git

On 1/13/23 16:45, Konstantin Ryabitsev wrote:
> If, for some reason, Linus ever needs to remove something
> from linux.git, he will do it and just give a heads-up why and for what
> reason.

This has got to be a joke.

There are 46K forks of Linus Torvalds' Linux kernel on GitHub, and if Linus Torvalds one day decides to do a forced push, it will for sure be a disaster!

--HPS

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 15:54 ` Hans Petter Selasky
@ 2023-01-13 16:02 ` Konstantin Ryabitsev
  2023-01-13 16:06 ` Hans Petter Selasky
  0 siblings, 1 reply; 42+ messages in thread
From: Konstantin Ryabitsev @ 2023-01-13 16:02 UTC
To: Hans Petter Selasky; +Cc: rsbecker, git

On Fri, Jan 13, 2023 at 04:54:39PM +0100, Hans Petter Selasky wrote:
> On 1/13/23 16:45, Konstantin Ryabitsev wrote:
> > If, for some reason, Linus ever needs to remove something
> > from linux.git, he will do it and just give a heads-up why and for what
> > reason.
>
> This has got to be a joke.
>
> There are 46K forks of Linus Torvalds' Linux kernel on GitHub, and if Linus
> Torvalds one day decides to do a forced push, it will for sure be a
> disaster!

No it won't, and I speak from some position of authority on this subject (I'm responsible for git.kernel.org).

If Linus has to alter the history of linux.git, it will for sure be an extraordinary event -- it's never happened yet. However, it will be widely publicised, the reasons for it will be made clear, and everyone will just accept it and move on.

Git history edits occur all the time. Most tooling expects this to occasionally happen and deals with it correctly.

-K

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 16:02 ` Konstantin Ryabitsev
@ 2023-01-13 16:06 ` Hans Petter Selasky
  2023-01-13 16:18 ` Hans Petter Selasky
  2023-01-13 16:27 ` Konstantin Ryabitsev
  0 siblings, 2 replies; 42+ messages in thread
From: Hans Petter Selasky @ 2023-01-13 16:06 UTC
To: Konstantin Ryabitsev; +Cc: rsbecker, git

On 1/13/23 17:02, Konstantin Ryabitsev wrote:
> On Fri, Jan 13, 2023 at 04:54:39PM +0100, Hans Petter Selasky wrote:
>> On 1/13/23 16:45, Konstantin Ryabitsev wrote:
>>> If, for some reason, Linus ever needs to remove something
>>> from linux.git, he will do it and just give a heads-up why and for what
>>> reason.
>>
>> This has got to be a joke.
>>
>> There are 46K forks of Linus Torvalds' Linux kernel on GitHub, and if Linus
>> Torvalds one day decides to do a forced push, it will for sure be a
>> disaster!
>
> No it won't, and I speak from some position of authority on this subject (I'm
> responsible for git.kernel.org).
>
> If Linus has to alter the history of linux.git, it will for sure be an
> extraordinary event -- it's never happened yet. However, it will be widely
> publicised, the reasons for it will be made clear, and everyone will just
> accept it and move on.
>
> Git history edits occur all the time. Most tooling expects this to
> occasionally happen and deals with it correctly.
>

OK, if you say so. Though in my mind 46K rebases of millions of commits seem like a lot of overhead.

However, if history can be edited anyway, why do you need the cryptographic hash algorithm? Why not use a non-cryptographic one?

What's the point? Only so that one party can stay in control?

--HPS

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 16:06 ` Hans Petter Selasky
@ 2023-01-13 16:18 ` Hans Petter Selasky
  2023-01-13 16:36 ` Konstantin Ryabitsev
  2023-01-13 16:27 ` Konstantin Ryabitsev
  1 sibling, 1 reply; 42+ messages in thread
From: Hans Petter Selasky @ 2023-01-13 16:18 UTC
To: Konstantin Ryabitsev; +Cc: rsbecker, git

On 1/13/23 17:06, Hans Petter Selasky wrote:
> What's the point? Only so that one party can stay in control?

Let me phrase it like this:

You clearly believe in the zero-trust principle. I don't.

Why can't git support both beliefs, and it can be configurable somehow then?

--HPS

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 16:18 ` Hans Petter Selasky
@ 2023-01-13 16:36 ` Konstantin Ryabitsev
  2023-01-13 16:44 ` Hans Petter Selasky
  0 siblings, 1 reply; 42+ messages in thread
From: Konstantin Ryabitsev @ 2023-01-13 16:36 UTC
To: Hans Petter Selasky; +Cc: rsbecker, git

On Fri, Jan 13, 2023 at 05:18:40PM +0100, Hans Petter Selasky wrote:
> On 1/13/23 17:06, Hans Petter Selasky wrote:
> > What's the point? Only so that one party can stay in control?
>
> Let me phrase it like this:
>
> You clearly believe in the zero-trust principle. I don't.

I'm not sure what you mean here, but git is certainly not zero-trust. When you clone linux.git from git.kernel.org, you're very much trusting that:

- I (or members of my team) didn't mess with the repository
- Linus (or someone who hacked his laptop) didn't mess with the repository

Git is tamper-evident, not tamper-proof, so by definition it cannot be zero-trust.

> Why can't git support both beliefs, and it can be configurable somehow then?

Well, git is literally built on the concept of unique hashes. It's not possible to make this bit configurable, as it would be a totally different project with entirely different internals. Not saying such a framework doesn't have a reason to exist, but it's not something that can be built on top of git.

-K

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 16:36 ` Konstantin Ryabitsev
@ 2023-01-13 16:44 ` Hans Petter Selasky
  2023-01-13 16:49 ` Konstantin Ryabitsev
  0 siblings, 1 reply; 42+ messages in thread
From: Hans Petter Selasky @ 2023-01-13 16:44 UTC
To: Konstantin Ryabitsev; +Cc: rsbecker, git

On 1/13/23 17:36, Konstantin Ryabitsev wrote:
> I'm not sure what you mean here, but git is certainly not zero-trust. When you
> clone linux.git from git.kernel.org, you're very much trusting that:
>
> - I (or members of my team) didn't mess with the repository
> - Linus (or someone who hacked his laptop) didn't mess with the repository
>
> Git is tamper-evident, not tamper-proof, so by definition it cannot be
> zero-trust.

Hi,

By using a cryptographic hash algorithm, the goal is to avoid tampering you say, like tampering on the internet, ISP, cache node and so on. To me that's clearly a zero-trust thought. You don't trust the guy(s) that put down the infrastructure, nor those that provide that local cache for the GIT repository, only the master repository. SHA-1 gives a certain confidence, that if you checkout XXXXXXX, then you get a likely expected result with reduced possibility of tampering.

Anyone could intercept a CRC protected blob and re-compute the hash and send it on. But not a SHA-1 one.

I on the other hand trust the guys that put down the internet and are providing the cache nodes for GIT.

It's two different world views.

--HPS

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 16:44 ` Hans Petter Selasky
@ 2023-01-13 16:49 ` Konstantin Ryabitsev
  2023-01-13 16:51 ` Hans Petter Selasky
  0 siblings, 1 reply; 42+ messages in thread
From: Konstantin Ryabitsev @ 2023-01-13 16:49 UTC
To: Hans Petter Selasky; +Cc: rsbecker, git

On Fri, Jan 13, 2023 at 05:44:03PM +0100, Hans Petter Selasky wrote:
> By using a cryptographic hash algorithm, the goal is to avoid tampering you
> say, like tampering on the internet, ISP, cache node and so on. To me that's
> clearly a zero-trust thought. You don't trust the guy(s) that put down the
> infrastructure, nor those that provide that local cache for the GIT
> repository, only the master repository. SHA-1 gives a certain confidence,
> that if you checkout XXXXXXX, then you get a likely expected result with
> reduced possibility of tampering.
>
> Anyone could intercept a CRC protected blob and re-compute the hash and send
> it on. But not a SHA-1 one.
>
> I on the other hand trust the guys that put down the internet and are
> providing the cache nodes for GIT.

I admit, I never trust the "guys who put down the internet," so that's a very scary scenario to me (and I would say to pretty much everyone else on this list).

> It's two different world views.

Indeed, werenotalike.gif :)

-K

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 16:49 ` Konstantin Ryabitsev
@ 2023-01-13 16:51 ` Hans Petter Selasky
  0 siblings, 0 replies; 42+ messages in thread
From: Hans Petter Selasky @ 2023-01-13 16:51 UTC
To: Konstantin Ryabitsev; +Cc: rsbecker, git

On 1/13/23 17:49, Konstantin Ryabitsev wrote:
>> It's two different world views.
> Indeed, werenotalike.gif :)

OK, I have no problem with that. Thanks for the discussion.

--HPS

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 16:06 ` Hans Petter Selasky
  2023-01-13 16:18 ` Hans Petter Selasky
@ 2023-01-13 16:27 ` Konstantin Ryabitsev
  2023-01-13 16:30 ` Hans Petter Selasky
  2023-01-13 16:35 ` Hans Petter Selasky
  1 sibling, 2 replies; 42+ messages in thread
From: Konstantin Ryabitsev @ 2023-01-13 16:27 UTC
To: Hans Petter Selasky; +Cc: rsbecker, git

On Fri, Jan 13, 2023 at 05:06:57PM +0100, Hans Petter Selasky wrote:
> OK, if you say so. Though in my mind 46K rebases of millions of commits seem
> like a lot of overhead.

Not to discourage you, but you seem to be making statements without a good understanding of how git works. If there is a history rewrite (even one that for some reason goes back millions of commits) all hash calculations will happen exactly once -- on the system of the person who's rewriting the history. After they push it, it's just a bunch of objects that everyone else merely downloads.

> However, if history can be edited anyway, why do you need the cryptographic
> hash algorithm? Why not use a non-cryptographic one?

The answer is, unhelpfully, "because that's how git works." Every commit is a standalone object that references the previous commit, plus includes hashes of all trees, and those include hashes of all blobs. SHA-1 was picked because of its speed and the fact that it guarantees an extremely low potential for collisions (even better with SHA256). As a side-effect, it's easy to calculate the integrity of the entire tree, including its history, by verifying its hashes (this is what git fsck does).

Hashes aren't really "cryptographic" anyway (they just happen to be used all over the place in cryptography). It's really just a one-way function to reduce content of arbitrary size to a set of bytes of a determined size (and give a relatively high assurance of it being collision-free).

-K

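The chaining described in the message above (commit references its parent and its tree, the tree references its blobs) can be sketched in a few lines. This is an illustration only, assuming the SHA-1 object format with a one-file tree and a made-up author identity; `git_id`, `tree_payload`, `commit_payload`, `two_commit_history`, and the file name `firmware.bin` are hypothetical names chosen for the example.

```python
import hashlib


def git_id(kind, payload):
    """Name an object as SHA-1 over '<kind> <size>', a NUL byte, then the payload."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def tree_payload(filename, blob_id):
    """One-entry tree: '<mode> <name>', a NUL byte, then the raw 20-byte blob id."""
    return b"100644 " + filename.encode() + b"\0" + bytes.fromhex(blob_id)


def commit_payload(tree_id, parent_id, message):
    """Commit text: tree line, optional parent line, identities, blank line, message."""
    ident = "A U Thor <author@example.com> 0 +0000"  # made-up identity
    lines = [f"tree {tree_id}"]
    if parent_id:
        lines.append(f"parent {parent_id}")
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return "\n".join(lines).encode() + b"\n"


def two_commit_history(file_contents):
    """Return (blob, tree, first commit, second commit) ids for a tiny history."""
    blob = git_id("blob", file_contents)
    tree = git_id("tree", tree_payload("firmware.bin", blob))
    first = git_id("commit", commit_payload(tree, None, "add firmware"))
    second = git_id("commit", commit_payload(tree, first, "later change"))
    return blob, tree, first, second


original = two_commit_history(b"original blob\n")
edited = two_commit_history(b"edited blob\n")
for name, before, after in zip(("blob", "tree", "commit 1", "commit 2"), original, edited):
    print(f"{name:9} {before[:12]} -> {after[:12]}")
```

Changing the blob changes every id above it, including the later commit that never touched the file. That is the property being debated in this thread: it is what makes a rewrite visible, and it is also why the rewritten ids are computed once by whoever performs the rewrite.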
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 16:27 ` Konstantin Ryabitsev
@ 2023-01-13 16:30 ` Hans Petter Selasky
  2023-01-13 16:35 ` Hans Petter Selasky
  1 sibling, 0 replies; 42+ messages in thread
From: Hans Petter Selasky @ 2023-01-13 16:30 UTC
To: Konstantin Ryabitsev; +Cc: rsbecker, git

On 1/13/23 17:27, Konstantin Ryabitsev wrote:
> The answer is, unhelpfully, "because that's how git works." Every commit is a
> standalone object that references the previous commit, plus includes hashes of
> all trees, and those include hashes of all blobs. SHA-1 was picked because of
> its speed and the fact that it guarantees an extremely low potential for
> collisions (even better with SHA256). As a side-effect, it's easy to calculate
> the integrity of the entire tree, including its history, by verifying its
> hashes (this is what git fsck does).

Same thing can be said for CRC-XXX. Just some magic CPU instructions and we're good. You don't even need a library.

--HPS

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 16:27 ` Konstantin Ryabitsev
  2023-01-13 16:30 ` Hans Petter Selasky
@ 2023-01-13 16:35 ` Hans Petter Selasky
  2023-01-13 16:41 ` Konstantin Ryabitsev
  1 sibling, 1 reply; 42+ messages in thread
From: Hans Petter Selasky @ 2023-01-13 16:35 UTC
To: Konstantin Ryabitsev; +Cc: rsbecker, git

On 1/13/23 17:27, Konstantin Ryabitsev wrote:
> Not to discourage you, but you seem to be making statements without a good
> understanding of how git works. If there is a history rewrite (even one that
> for some reason goes back millions of commits) all hash calculations will
> happen exactly once -- on the system of the person who's rewriting the
> history. After they push it, it's just a bunch of objects that everyone else
> merely downloads.

If you used CRC, you would not need that, because CRC calculations are "concatenatable", while SHA-1's are not. CRC would just need the first and the last hash, and then you would apply the "difference".

--HPS

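The "concatenatable" property claimed in the message above can be checked directly. The sketch below uses CRC-32 from Python's `zlib` as a stand-in, since the proposed CRC128 / 256 / 512 are not available in the standard library; the sample byte strings and the helper name `xor_bytes` are made up for the illustration. It shows two things: a running CRC can be continued across a concatenation, and the CRC of the XOR of three equal-length messages equals the XOR of their CRCs.

```python
import zlib


def xor_bytes(*chunks):
    """XOR equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


# 1) Incremental: continue a running CRC instead of rehashing from the start.
head, tail = b"tree 0123\n", b"parent 4567\n"
incremental_ok = zlib.crc32(tail, zlib.crc32(head)) == zlib.crc32(head + tail)

# 2) Affine over XOR: holds for any odd number of equal-length messages.
a = bytes(range(64))
b = bytes(reversed(range(64)))
c = bytes((7 * i + 3) % 256 for i in range(64))
affine_ok = zlib.crc32(xor_bytes(a, b, c)) == zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)

print(incremental_ok, affine_ok)
```

The same algebraic structure that allows a "difference" to be applied is what lets anyone construct content with a chosen CRC, which is the concern raised elsewhere in this thread about forged objects.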
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-13 16:35 ` Hans Petter Selasky
@ 2023-01-13 16:41 ` Konstantin Ryabitsev
  2023-01-13 16:45 ` Hans Petter Selasky
  0 siblings, 1 reply; 42+ messages in thread
From: Konstantin Ryabitsev @ 2023-01-13 16:41 UTC
To: Hans Petter Selasky; +Cc: rsbecker, git

On Fri, Jan 13, 2023 at 05:35:57PM +0100, Hans Petter Selasky wrote:
> > Not to discourage you, but you seem to be making statements without a good
> > understanding of how git works. If there is a history rewrite (even one that
> > for some reason goes back millions of commits) all hash calculations will
> > happen exactly once -- on the system of the person who's rewriting the
> > history. After they push it, it's just a bunch of objects that everyone else
> > merely downloads.
>
> If you used CRC, you would not need that, because CRC calculations are
> "concatenatable", while SHA-1's are not. CRC would just need the first and
> the last hash, and then you would apply the "difference".

It doesn't matter how it works behind the scenes as long as the produced hash is not unique (and CRC gives you no assurance of being unique). Git is built on the concept that every object has a unique hash. If this is no longer true, then it's literally no longer git, but is something else. Since we're discussing this on the git list, it's not really a discussion worth having here.

-K

* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1 2023-01-13 16:41 ` Konstantin Ryabitsev @ 2023-01-13 16:45 ` Hans Petter Selasky 0 siblings, 0 replies; 42+ messages in thread From: Hans Petter Selasky @ 2023-01-13 16:45 UTC (permalink / raw) To: Konstantin Ryabitsev; +Cc: rsbecker, git On 1/13/23 17:41, Konstantin Ryabitsev wrote: > It doesn't matter how it works behind the scenes as long as the produced hash > is not unique (and CRC gives you no assurance of being unique). Git is built > on the concept that every object has a unique hash. If this is no longer true, > then it's literally no longer git, but is something else. That's why I say you need a fixup field, in case of collisions. CRC is used plenty all over the place and has good entropy. --HPS ^ permalink raw reply [flat|nested] 42+ messages in thread
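
What a "fixup field" implies can be sketched with CRC-32 and four trailing bytes. The code below is an illustration under stated assumptions, not a proposal for Git's object format: `forge_suffix` is a hypothetical name, the fixup is modelled as 4 bytes appended to the content, and the object contents are invented. It finds the fixup that forces any content to hash to a chosen CRC-32 value by solving a small linear system over GF(2).

```python
import zlib


def forge_suffix(data: bytes, target: int) -> bytes:
    """Return 4 bytes such that crc32(data + suffix) == target."""
    base = zlib.crc32(data + bytes(4))
    # basis maps a pivot bit to (crc difference, suffix bits that cause it)
    basis = {}
    for i in range(32):
        unit = (1 << i).to_bytes(4, "little")
        vec = zlib.crc32(data + unit) ^ base  # effect of suffix bit i
        combo = 1 << i
        while vec:
            pivot = vec.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (vec, combo)
                break
            bvec, bcombo = basis[pivot]
            vec ^= bvec
            combo ^= bcombo
    # Express the required CRC change as a combination of suffix bits.
    need = target ^ base
    patch = 0
    while need:
        bvec, bcombo = basis[need.bit_length() - 1]
        need ^= bvec
        patch ^= bcombo
    return patch.to_bytes(4, "little")


original = b"blob one\n"
tampered = b"blob two, different content\n"
target = zlib.crc32(original + bytes(4))  # original stored with a zero fixup
fixup = forge_suffix(tampered, target)
collides = zlib.crc32(tampered + fixup) == target
```

Two different contents now share one hash value, which is the uniqueness problem raised in the quoted reply.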
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1 2023-01-13 14:21 ` rsbecker 2023-01-13 14:42 ` Hans Petter Selasky @ 2023-01-13 15:15 ` Hans Petter Selasky 2023-01-13 17:44 ` Philip Oakley 2 siblings, 0 replies; 42+ messages in thread From: Hans Petter Selasky @ 2023-01-13 15:15 UTC (permalink / raw) To: rsbecker, git On 1/13/23 15:21, rsbecker@nexbridge.com wrote: > Signed content will no longer be verifiable. The whole Merkle tree representing the commit history becomes easily corruptible by hackers Hi, As a long time open sourcer and hacker, I'm totally against signing software. Is the GIT project going to build the new infrastructure for John Deere's new tractor firmware adventure? It is totally against the values of open source craftsmanship. I don't think any of you crypto-enthusiasts understand how proprietary companies use signed software to keep their power intact. That's also an argument for using a non-crypto hash. --HPS ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1 2023-01-13 14:21 ` rsbecker 2023-01-13 14:42 ` Hans Petter Selasky 2023-01-13 15:15 ` Hans Petter Selasky @ 2023-01-13 17:44 ` Philip Oakley 2 siblings, 0 replies; 42+ messages in thread From: Philip Oakley @ 2023-01-13 17:44 UTC (permalink / raw) To: rsbecker, 'Hans Petter Selasky', git On 13/01/2023 14:21, rsbecker@nexbridge.com wrote: > On January 13, 2023 8:40 AM, Hans Petter Selasky wrote: >> On 1/13/23 14:30, Konstantin Khomoutov wrote: >>> On Fri, Jan 13, 2023 at 01:59:44PM +0100, Hans Petter Selasky wrote: >>> >>>> Currently GIT only supports cryptographic hashes for its commit tags. >>> [...] >>> >>> https://github.com/git/git/blob/9bf691b78cf906751e65d65ba0c6ffdcd9a5a1 >>> 2c/Documentation/technical/hash-function-transition.txt >>> >>> It's not clear why are you referring to Gitorious in your mail's >>> subject and then talk about Git. >>> >> Hi, >> >> I thought that Git was short for Gitorious? My bad. >> >> The document you refer to really highlights my concerns, that a strong >> cryptographic hash algorithm is the highway to hell. >> >> Do _not_ use a cryptographic hash for Git. Use plain good old CRC hashes. >> >> Just imagine the consequences of finding child porn inside a 10-year old firmware >> binary blob in the Linux kernel. Will you just ignore it, or will you fix it? >> >> That's why I say, that it must be possible to forge the hashes by default. > I do not understand the goal of this request. I'd agree about the core need for 'absolute' integrity checking. However we have been here before, but without a way forward. It was the "Subject: [TOPIC 3/17] Obliterate" at Git Contributor Summit, Los Angeles (April 5, 2020). 
https://lore.kernel.org/git/5B2FEA46-A12F-4DE7-A184-E8856EF66248@jramsay.com.au/ Discussion at https://docs.google.com/document/d/15a_MPnKaEPbC92a4jhprlHvkyirDh2CtTtgOxNbnIbA/edit#heading=h.wljwyo3r1m6l The core need I think HPS is referring to is that need to 'obliterate' some blob (which contains the en-mass data), and perhaps some trees, commits and tags, which may also hold objectionable meta data, at least from reference repositories, and at the same time authenticate (if that's the right term) the list of such obliterated objects. It will be a difficult task to carefully cut the fog of misdirection and scares in this arena. It's one of those problem statements whose answer is "42". > If it is possible to forge hashes, then nothing in a git repository can ever be trusted. Signed content will no longer be verifiable. The whole Merkel Tree representing the commit history becomes easily corruptible by hackers and no upstream remote repository can ever be trusted - or someone's own if someone targets a repo with malware that rewrites hashes. Imagine a scenario when malware replaces a blob in a repo and then forges the hash to pretend that the replacement never occurred. Using git as a supply chain audit trail becomes impossible. This is a potential vector for ransomware invading the git ecosystem. This seems like a really fatal path to take for the product. The supply chain audit is (would be) a real problem if the presence of a specific hash is a punishable criminal offence. I suspect it already is in some jurisdictions. > > The advantage of how git functions is that it is possible to mirror or clone repositories, protecting from hardware errors. Repositories exist in distributed form, so there may be hundreds or thousands of copies in case someone's copy is corrupted by a disk or memory write error - so that takes hash reconstruction out of the requirement set. 
If the git architecture was based on a central repository model only, then this might be a reasonable request, but that is not how git works. The law works in mysterious ways it's wonderful ways to demonstrate ;-) Possession of certain artefacts can be a problem, so it is something that is worth careful consideration. We shouldn't let the 'distribution of criminal artefacts' be something 'guaranteed' by Git, despite careful users. > If, for instance, a main GitHub repo is somehow corrupted, it can be repaired by a push --force or a clone from a different instance. > > Unless I am missing your point. > --Randall > The forced replacement of 'redacted' material is already a problem in other domains. We should be able to manage a redaction list for a repository that needs it. All that said, CRC isn't any sort of solution! -- Philip ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1 2023-01-13 13:39 ` Hans Petter Selasky 2023-01-13 14:21 ` rsbecker @ 2023-01-13 15:30 ` Konstantin Khomoutov 2023-01-13 15:39 ` Konstantin Ryabitsev 2 siblings, 0 replies; 42+ messages in thread From: Konstantin Khomoutov @ 2023-01-13 15:30 UTC (permalink / raw) To: git; +Cc: Hans Petter Selasky On Fri, Jan 13, 2023 at 02:39:37PM +0100, Hans Petter Selasky wrote: [...] > > It's not clear why are you referring to Gitorious in your mail's subject and > > then talk about Git. [...] > I thought that Git was short for Gitorious? My bad. No, unless you're late to the Git party ;-) Old-timers do remember Gitorious as a software project [1] which is closely related to Git but was a totally separate project. 1. https://en.wikipedia.org/wiki/Gitorious ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1 2023-01-13 13:39 ` Hans Petter Selasky 2023-01-13 14:21 ` rsbecker 2023-01-13 15:30 ` Konstantin Khomoutov @ 2023-01-13 15:39 ` Konstantin Ryabitsev 2 siblings, 0 replies; 42+ messages in thread From: Konstantin Ryabitsev @ 2023-01-13 15:39 UTC (permalink / raw) To: Hans Petter Selasky; +Cc: git On Fri, Jan 13, 2023 at 02:39:37PM +0100, Hans Petter Selasky wrote: > Just imagine the consequences of finding child porn inside a 10-year old > firmware binary blob in the Linux kernel. Will you just ignore it, or will > you fix it? How do you expect something like this would happen? A much more likely scenario would be someone contributing a binary blob that doesn't actually allow redistribution, and therefore would need to be purged from the repository. When something like this happens, everyone is given a heads-up, the history is rewritten, and everyone moves on. It's a fairly routine procedure -- ask anyone who's ever committed an API key into their repo. Git supports history edits and everyone lives with it just fine -- I think you are under the impression that git is some kind of globally distributed blockchain where any history edit requires a consensus fork. It's not at all the case. -K ^ permalink raw reply [flat|nested] 42+ messages in thread
* Gitorious should use CRC128 / 256 / 512 instead of SHA-1 @ 2023-01-13 13:23 Hans Petter Selasky 2023-01-14 23:59 ` brian m. carlson 2023-01-15 13:53 ` Michal Suchánek 0 siblings, 2 replies; 42+ messages in thread From: Hans Petter Selasky @ 2023-01-13 13:23 UTC (permalink / raw) To: git Hi, Currently GIT only supports cryptographic hashes for its commit tags. That means: 1) It's very difficult to edit the history without also recomputing the hash tags for all commits after the needed change-point, which then means references to a repository is broken. 2) Only a single bit error in the main repository can break everything! 3) Illicit contents may be present in binary blobs, which in the future may be need to be removed without warrant and the only way to do that is by rebasing and force pushing, which will break "everything". It can be everything from child-porn to expired distribution licenses. Many people think that bit errors cannot happen because the memory uses ECC and the file system uses cryptographic hashes to verify the integrity of the data. But what many people forget about is that when copying data from memory to disk, typically using a DMA channel data is copied w/o any kind of integrity protection, because the integrity protection is not end-to-end. The integrity protection is only per-link. Therefore I propose the following changes to GIT. 1) Use a CRC128 / 256 or 512 non-cryptographic based hashing algorithm as default. 2) Add support for a CRC fixup field, which usually is zero, but when merges are needed, it can be non-zero, to allow the hash-tag-value to remain the same! This also allows for easy conversion of existing GIT repositories to the new scheme. 3) All git objects should be uncompressed. CRC-XXX can easily be used to correct multiple bit errors without any performance overhead. --HPS ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1 2023-01-13 13:23 Hans Petter Selasky @ 2023-01-14 23:59 ` brian m. carlson 2023-01-15 3:14 ` Junio C Hamano ` (3 more replies) 2023-01-15 13:53 ` Michal Suchánek 1 sibling, 4 replies; 42+ messages in thread From: brian m. carlson @ 2023-01-14 23:59 UTC (permalink / raw) To: Hans Petter Selasky; +Cc: git [-- Attachment #1: Type: text/plain, Size: 3907 bytes --] On 2023-01-13 at 13:23:59, Hans Petter Selasky wrote: > Hi, > > Currently GIT only supports cryptographic hashes for its commit tags. > > That means: > > 1) It's very difficult to edit the history without also recomputing the hash > tags for all commits after the needed change-point, which then means > references to a repository is broken. This is intentional. Commit and tag signing requires an unbroken Merkle tree-like construction that prevents the history from being modified by signing a single commit or tag. > 2) Only a single bit error in the main repository can break everything! git fsck is designed to detect this, and by default it's run every time the repository is repacked (such as by git gc). But yes, this is a problem, and changing to an algorithm which isn't cryptographically secure won't change that. Prudent users back up data to prevent data loss. > 3) Illicit contents may be present in binary blobs, which in the future may > be need to be removed without warrant and the only way to do that is by > rebasing and force pushing, which will break "everything". It can be > everything from child-porn to expired distribution licenses. This is a problem in every Merkle tree-like system. Most repositories have some sort of code review or access control that prevents people from generally pushing inappropriate content. 
For example, if somebody proposed to push any sort of pornography or other inappropriate content (e.g., a racist screed) to one of my repositories or one of my employer's, I'd refuse to approve or merge such a change, because that wouldn't be appropriate for the repository. I don't feel this is enough of a problem that using a Merkle tree-like construction is a bad idea, given the benefits it offers. > Therefore I propose the following changes to GIT. > > 1) Use a CRC128 / 256 or 512 non-cryptographic based hashing algorithm as > default. As the person who wrote the SHA-256 support, I'm pleased to report that adding a new hash algorithm isn't very difficult anymore. The largest part of the work is updating all the tests. I've tried very hard to make this substantially easier for everyone. However, Git is moving in the direction of stronger cryptographic algorithms, rather than insecure hashing algorithms. I don't think your proposal is a good idea, nor do I think it's likely to be adopted. If it were adopted, the signing of commits and tags would be meaningless, and because it would be trivial to create collisions[0], there would clearly be some pairs of objects which could not be stored. This would make Git much less useful, and it might allow users to attempt to forge or replace content without being detected. That being said, you are free to create your own fork of the code which does so, provided you comply with the terms of the license. > 2) Add support for a CRC fixup field, which usually is zero, but when merges > are needed, it can be non-zero, to allow the hash-tag-value to remain the > same! This also allows for easy conversion of existing GIT repositories to > the new scheme. For the same reason as above, I don't think this is a good idea. > 3) All git objects should be uncompressed. This would dramatically increase the size of most repositories. 
I've easily seen repositories where the uncompressed contents exceed 1 TB in size yet the repository is only double-digit gigabytes, if that. Most people will find the increase in disk usage unacceptable, and I'm certain that includes Git hosters.

[0] CRC is linear and the following relations apply, which makes forgery trivial (see https://en.wikipedia.org/wiki/Cyclic_redundancy_check):

  CRC(x XOR y) = CRC(x) XOR CRC(y) XOR c for some c
  CRC(x XOR y XOR z) = CRC(x) XOR CRC(y) XOR CRC(z)

-- brian m. carlson (he/him or they/them) Toronto, Ontario, CA [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 263 bytes --] ^ permalink raw reply [flat|nested] 42+ messages in thread
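
The two relations in footnote [0] can be checked numerically. The sketch below uses CRC-32 from Python's `zlib` as a stand-in for the wider CRCs proposed in the thread, and assumes x, y and z have equal length, in which case the constant c is the CRC of an all-zero message of that length. The helper names are illustrative only.

```python
import os
import zlib
from functools import reduce
from operator import xor


def xor_bytes(*chunks: bytes) -> bytes:
    """Bytewise XOR of equal-length byte strings."""
    if len({len(chunk) for chunk in chunks}) != 1:
        raise ValueError("all inputs must have the same length")
    return bytes(reduce(xor, column) for column in zip(*chunks))


def check_linearity(length: int = 64) -> bool:
    """Check both relations from the footnote on random inputs."""
    x, y, z = (os.urandom(length) for _ in range(3))
    c = zlib.crc32(bytes(length))  # CRC of the all-zero message
    two = zlib.crc32(xor_bytes(x, y)) == zlib.crc32(x) ^ zlib.crc32(y) ^ c
    three = zlib.crc32(xor_bytes(x, y, z)) == (
        zlib.crc32(x) ^ zlib.crc32(y) ^ zlib.crc32(z)
    )
    return two and three


holds = check_linearity()
```

Because the relations hold for every input, an attacker can predict how any chosen bit flips change the checksum without searching, which is what makes forgery cheap.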
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1 2023-01-14 23:59 ` brian m. carlson @ 2023-01-15 3:14 ` Junio C Hamano 2023-01-15 10:09 ` demerphq ` (2 subsequent siblings) 3 siblings, 0 replies; 42+ messages in thread From: Junio C Hamano @ 2023-01-15 3:14 UTC (permalink / raw) To: brian m. carlson; +Cc: Hans Petter Selasky, git "brian m. carlson" <sandals@crustytoothpaste.net> writes: >> 3) Illicit contents may be present in binary blobs, which in the future may >> be need to be removed without warrant and the only way to do that is by >> rebasing and force pushing, which will break "everything". It can be >> everything from child-porn to expired distribution licenses. > > This is a problem in every Merkle tree-like system. Most repositories > have some sort of code review or access control that prevents people > from generally pushing inappropriate content. For example, if somebody > proposed to push any sort of pornography or other inappropriate content > (e.g., a racist screed) to one of my repositories or one of my > employer's, I'd refuse to approve or merge such a change, because > that wouldn't be appropriate for the repository. > > I don't feel this is enough of a problem that using a Merkle tree-like > construction is a bad idea, given the benefits it offers. While I agree with the primary thrust of your argument, this one is a bit tricky to reason about. External rules change and can declare what has been accepted as appropriate inappropriate on a whim, long after you reviewed the material coming into your history and decided it was perfectly fine, under the then-prevailing definition of what is and isn't appropriate. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1 2023-01-14 23:59 ` brian m. carlson 2023-01-15 3:14 ` Junio C Hamano @ 2023-01-15 10:09 ` demerphq 2023-01-16 7:21 ` Hans Petter Selasky 2023-01-16 7:23 ` Hans Petter Selasky 3 siblings, 0 replies; 42+ messages in thread From: demerphq @ 2023-01-15 10:09 UTC (permalink / raw) To: brian m. carlson, Hans Petter Selasky, git On Sun, 15 Jan 2023 at 01:05, brian m. carlson <sandals@crustytoothpaste.net> wrote: > > This is a problem in every Merkle tree-like system. Most repositories > have some sort of code review or access control that prevents people > from generally pushing inappropriate content. For example, if somebody > proposed to push any sort of pornography or other inappropriate content > (e.g., a racist screed) to one of my repositories or one of my > employer's, I'd refuse to approve or merge such a change, because > that wouldn't be appropriate for the repository. > > I don't feel this is enough of a problem that using a Merkle tree-like > construction is a bad idea, given the benefits it offers. [resend in plain text] It isn't clear to me why this needs to be a problem at all. If the Merkele tree contains data later in its chain that says "replace Object X with Y", provided the replacement mechanism doesn't touch commit objects, only blobs, then you can replace files in the history with other files without altering the commit history. Provided the toolchain validates that it has found a proper "replacement instruction" in the history, it should be possible to safely replace blobs without a full history rewrite. The replacement mechanism could be structured so that you can only "nuke" a file, eg, replace it with a zero byte blob, making it somewhat less open to abuse, or it could allow arbitrary blobs to be mapped to each other. So long as the mapping data is in the commit history it should be as secure as the original mapping no? 
Git could be taught to warn the user "Checking out a rewritten blob X as Y, see 012deadbeef for the rewrite instruction." when it happened. Again, provided this does not touch the *commit* tree, just raw blobs, I dont see why you can't have an object replacement facility. Am I missing something? Yves -- perl -Mre=debug -e "/just|another|perl|hacker/" ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1 2023-01-14 23:59 ` brian m. carlson 2023-01-15 3:14 ` Junio C Hamano 2023-01-15 10:09 ` demerphq @ 2023-01-16 7:21 ` Hans Petter Selasky 2023-01-16 7:23 ` Hans Petter Selasky 3 siblings, 0 replies; 42+ messages in thread From: Hans Petter Selasky @ 2023-01-16 7:21 UTC (permalink / raw) To: brian m. carlson, git On 1/15/23 00:59, brian m. carlson wrote: >> 3) Illicit contents may be present in binary blobs, which in the future may >> be need to be removed without warrant and the only way to do that is by >> rebasing and force pushing, which will break "everything". It can be >> everything from child-porn to expired distribution licenses. > This is a problem in every Merkle tree-like system. Most repositories > have some sort of code review or access control that prevents people > from generally pushing inappropriate content. For example, if somebody > proposed to push any sort of pornography or other inappropriate content > (e.g., a racist screed) to one of my repositories or one of my > employer's, I'd refuse to approve or merge such a change, because > that wouldn't be appropriate for the repository. > > I don't feel this is enough of a problem that using a Merkle tree-like > construction is a bad idea, given the benefits it offers. > Yeah, right. And of course you have all the tools to decode those megabyte big firmware blobs from intel supporting wireless cards all over the place to see what is actually inside there, that they are not using some 3rd party code which licence will expire at some point, and then you need to remove those binaries. --HPS ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1 2023-01-14 23:59 ` brian m. carlson ` (2 preceding siblings ...) 2023-01-16 7:21 ` Hans Petter Selasky @ 2023-01-16 7:23 ` Hans Petter Selasky 2023-01-16 12:34 ` rsbecker 3 siblings, 1 reply; 42+ messages in thread From: Hans Petter Selasky @ 2023-01-16 7:23 UTC (permalink / raw) To: brian m. carlson, git On 1/15/23 00:59, brian m. carlson wrote: > However, Git is moving in the direction of stronger cryptographic > algorithms, rather than insecure hashing algorithms. I don't think your > proposal is a good idea, nor do I think it's likely to be adopted. I disagree. There is no need for signing in a version control system. It just makes it harder to change things, like the right-to-repair. In my eyes there is a high chance of abuse, by vendors that do not want others to flash or edit their device firmwares. --HPS ^ permalink raw reply [flat|nested] 42+ messages in thread
* RE: Gitorious should use CRC128 / 256 / 512 instead of SHA-1 2023-01-16 7:23 ` Hans Petter Selasky @ 2023-01-16 12:34 ` rsbecker 2023-01-16 14:01 ` Hans Petter Selasky 0 siblings, 1 reply; 42+ messages in thread From: rsbecker @ 2023-01-16 12:34 UTC (permalink / raw) To: 'Hans Petter Selasky', 'brian m. carlson', git On January 16, 2023 2:24 AM, Hans Petter Selasky wrote: >On 1/15/23 00:59, brian m. carlson wrote: >> However, Git is moving in the direction of stronger cryptographic >> algorithms, rather than insecure hashing algorithms. I don't think >> your proposal is a good idea, nor do I think it's likely to be adopted. > >I disagree. There is no need for signing in a version control system. It just makes it >harder to change things, like the right-to-repair. In my eyes there is a high chance >of abuse, by vendors that do not want others to flash or edit their device >firmwares. The two matters are completely isolated and distinct. In the OpenSource community, anyone typically has the right to modify. Please refer to the GPLv3, ECLIPSE, and MIT licenses for example. Those are the governing documents that permit modification and define intellectual property rights. Please consult those licenses with regards to right-to-repair statements that have no legal bearing on git or any other GPL-governed software product. In my view, the issue raised is a red herring that keeps getting brought up, which does not contribute positively to this request's discussion, but would presumably increase the hit rate on web searches, to which this reply unfortunately contributes. The assertion of no need for signing can apply to a centralized version control system, like SVN, because users are authenticated centrally, and the contribution can be made definitive without a separate signature, providing no one with root authority on the server hacks the repository. 
In the architecture of a distributed version control system (specifically git for this discussion), there is no evidence of origin of changes because the commit identity is cooperative rather than being enforced by a central authority and hacking the repository by root is detectible. The assertion of signing as abuse of rights is also an opinion that, so far, has no supporting evidence given. Perhaps a paper in a refereed journal might give this position some credibility. My point is that signing is critical in a DVCS and a major function point used by DevOps architects for adopting git in new organizations. In the regulated world, FinTech, FDA, Aviation, etc., signing contributes to the evidence of origin of changes required by PCI and SWIFT (ref: section 6 in each regulation). Without signed tags (which establish the change origins for releases for production use), deployment becomes less certain and less acceptable to the audit community with whom I interact on a regular basis. --Randall ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1 2023-01-16 12:34 ` rsbecker @ 2023-01-16 14:01 ` Hans Petter Selasky 2023-01-16 15:06 ` Junio C Hamano 0 siblings, 1 reply; 42+ messages in thread From: Hans Petter Selasky @ 2023-01-16 14:01 UTC (permalink / raw) To: rsbecker, 'brian m. carlson', git On 1/16/23 13:34, rsbecker@nexbridge.com wrote: > On January 16, 2023 2:24 AM, Hans Petter Selasky wrote: >> On 1/15/23 00:59, brian m. carlson wrote: >>> However, Git is moving in the direction of stronger cryptographic >>> algorithms, rather than insecure hashing algorithms. I don't think >>> your proposal is a good idea, nor do I think it's likely to be adopted. >> >> I disagree. There is no need for signing in a version control system. It just makes it >> harder to change things, like the right-to-repair. In my eyes there is a high chance >> of abuse, by vendors that do no want others to flash or edit their device >> firmwares. > Hi, > The two matters are completely isolated and distinct. In the OpenSource community, anyone typically has the right to modify. Please refer to the GPLv3, ECLIPSE, and MIT licenses for example. Those are the governing documents that permit modification and define intellectual property rights. Please consult those licenses with regards to right-to-repair statements that have no legal bearing on git or any other GPL-governed software product. In my view, the issue raised is a red herring that keeps getting brought up, which does not contribute positively to this request's discussion, but would presumably would increase the hit rate on web searches, to which this reply unfortunately contributes. The use of cryptographic hash tags, allows one party to stay in control of and monetize a project, actually by doing nothing more than rebranding an existing product. 
> The assertion of no need for signing can apply to a centralized version control system, like SVN, because users are authenticated centrally, and the contribution can be made definitive without a separate signature, providing no one with root authority on the server hacks the repository. In the architecture of a distributed version control system (specifically git for this discussion), there is no evidence of origin of changes because the commit identity is cooperative rather than being enforced by a central authority and hacking the repository by root is detectible. The assertion of signing as abuse of rights is also an opinion that, so far, has no supporting evidence given. Perhaps a paper in a refereed journal might give this position some credibility. From what I've read the GPLv3 goes pretty far to also provide flashing rights for software, but what use is that, when flashing the unsigned software on your Samsung phone, for example, some fuse breaks in the hardware, and then you can no longer use certain apps on your phone? > > My point is that signing is critical in a DVCS and a major function point used by DevOps architects for adopting git in new organizations. In the regulated world, FinTech, FDA, Aviation, etc., signing contributes to the evidence of origin of changes required by PCI and SWIFT (ref: section 6 in each regulation). Without signed tags (which the establishes the change origins for releases for production use), deployment becomes less certain and less acceptable to the audit community with whom I interact on a regular basis. > It's very clear to me, that supporting signing straight off the VCS, will not help the opensource and right-to-repair community at all. It's just ripe for abuse, like I say. Hacking is prevented by using a secure copy mechanism between the servers, which you can upgrade separately. You already see the problem, SHA-1 is not good enough to prevent hacking. 
Why not just separate the hacking preventing measures and the needs of a good VCS? --HPS ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1 2023-01-16 14:01 ` Hans Petter Selasky @ 2023-01-16 15:06 ` Junio C Hamano 0 siblings, 0 replies; 42+ messages in thread From: Junio C Hamano @ 2023-01-16 15:06 UTC (permalink / raw) To: Hans Petter Selasky; +Cc: rsbecker, 'brian m. carlson', git Hans Petter Selasky <hps@selasky.org> writes: > From what I've read the GPLv3 goes pretty far to also provide flashing > rights for software, but what use is that, when flashing the unsigned > software on your Samsung phone, for example, some fuse breaks in the > hardware, and then you can no longer use certain apps on your phone? It smells that you are conflating the signing of source material and the sealing of tivoized hardware that use cryptographic signature to tell what binaries are allowed to run on it. The signing implemented by the software we the Git development community build is not about the latter. The source used to build binaries for your tivoized hardware can come from a VCS that is deliberately designed to allow object name collisions, and your build would just be locked out the same unless you have the signing key that pleases the hardware. Use of Git there would not make the story any different, I am afraid. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1 2023-01-13 13:23 Hans Petter Selasky 2023-01-14 23:59 ` brian m. carlson @ 2023-01-15 13:53 ` Michal Suchánek 2023-01-16 7:17 ` Hans Petter Selasky 1 sibling, 1 reply; 42+ messages in thread From: Michal Suchánek @ 2023-01-15 13:53 UTC (permalink / raw) To: Hans Petter Selasky; +Cc: git Hello, On Fri, Jan 13, 2023 at 02:23:59PM +0100, Hans Petter Selasky wrote: > Hi, > > Currently GIT only supports cryptographic hashes for its commit tags. > > That means: > > 1) It's very difficult to edit the history without also recomputing the hash > tags for all commits after the needed change-point, which then means > references to a repository is broken. That also makes it difficult to alter the repository intentionally without anyone noticing. With SHA1 being somewhat weak it may be possible to alter repository content although I am not aware of any practical attacks shown so far. For that reason using stronger hashes is planned in the future. > 2) Only a single bit error in the main repository can break everything! > > 3) Illicit contents may be present in binary blobs, which in the future may > be need to be removed without warrant and the only way to do that is by > rebasing and force pushing, which will break "everything". It can be > everything from child-porn to expired distribution licenses. It's good to avoid spam getting into your repository. If you really need to alter it long into the past you still can. Everyone will notice that you did, and that's an intentional feature. In some situations it is understandably an annoyance but there's so much you can do. At least tags should remain stable. > Many people think that bit errors cannot happen because the memory uses ECC > and the file system uses cryptographic hashes to verify the integrity of the > data. 
But what many people forget about is that when copying data from > memory to disk, typically using a DMA channel data is copied w/o any kind of > integrity protection, because the integrity protection is not end-to-end. > The integrity protection is only per-link. So long as all links have integrity protection it's end-to-end. Integrity checks for CPU caches, buses, and IO protocols do exist. It's not that errors cannot happen; they are just very unlikely. In the very rare case that such an error happens, it is recoverable so long as a non-corrupted version of the object can be supplied by anyone who has a copy of the repository. For old objects this should be your backup system. For new objects the worst case is that the history is rolled back so the missing object is not needed. Thanks Michal ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-15 13:53 ` Michal Suchánek
@ 2023-01-16 7:17 ` Hans Petter Selasky
  2023-01-16 9:13 ` Michal Suchánek
  0 siblings, 1 reply; 42+ messages in thread
From: Hans Petter Selasky @ 2023-01-16 7:17 UTC
To: Michal Suchánek; +Cc: git

On 1/15/23 14:53, Michal Suchánek wrote:
>> Many people think that bit errors cannot happen because the memory uses ECC
>> and the file system uses cryptographic hashes to verify the integrity of the
>> data. But what many people forget about is that when copying data from
>> memory to disk, typically using a DMA channel data is copied w/o any kind of
>> integrity protection, because the integrity protection is not end-to-end.
>> The integrity protection is only per-link.
>
> So long as all links have integrity protection it's end-to-end.
>

Hi Michael,

You clearly don't see what this is about! Unless the same CRC mechanism
is end-to-end, you don't have any good integrity mechanism at all!

Let me try to explain what this is about in very simple words. Because
memcpy() does not copy the ECC CRC values along with the data, it is an
unsafe memory copy mechanism, which may introduce bit-errors without
noticing. It does not help to only have ECC RAM or, for that matter, to
protect the PCI links.

--HPS
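The distinction being argued here can be modelled in a few lines. This is a toy sketch, not a description of how ECC hardware or Git behaves: `copy_with_fault` is a hypothetical stand-in for a copy step that may flip a bit, and CRC-32 from Python's `zlib` stands in for whatever check a link or storage location uses.

```python
import zlib


def copy_with_fault(data: bytes, flip=None) -> bytes:
    """Model one copy step; optionally invert bit number `flip` in transit."""
    buf = bytearray(data)
    if flip is not None:
        buf[flip // 8] ^= 1 << (flip % 8)
    return bytes(buf)


def verify(data: bytes, crc: int) -> bool:
    """Check data against a previously recorded CRC-32."""
    return zlib.crc32(data) == crc


source = b"object contents"
source_crc = zlib.crc32(source)  # computed once, where the data originates

# The copy silently flips one bit on the way to the destination.
arrived = copy_with_fault(source, flip=3)

# Per-location protection: the destination computes a fresh checksum over
# whatever arrived, so the corrupted bytes verify against their own checksum.
local_crc = zlib.crc32(arrived)
per_link_ok = verify(arrived, local_crc)

# End-to-end protection: the checksum made at the source travels with the
# data and is checked at the destination.
end_to_end_ok = verify(arrived, source_crc)

print(per_link_ok)    # True: the corruption goes unnoticed
print(end_to_end_ok)  # False: the corruption is detected
```

The sketch only covers the case where the fault occurs outside every protected location; it says nothing about how likely that is on real hardware.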
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-16 7:17 ` Hans Petter Selasky
@ 2023-01-16 9:13 ` Michal Suchánek
  2023-01-16 9:55 ` Hans Petter Selasky
  0 siblings, 1 reply; 42+ messages in thread
From: Michal Suchánek @ 2023-01-16 9:13 UTC
To: Hans Petter Selasky; +Cc: git

On Mon, Jan 16, 2023 at 08:17:58AM +0100, Hans Petter Selasky wrote:
> On 1/15/23 14:53, Michal Suchánek wrote:
> > > Many people think that bit errors cannot happen because the memory uses ECC
> > > and the file system uses cryptographic hashes to verify the integrity of the
> > > data. But what many people forget about is that when copying data from
> > > memory to disk, typically using a DMA channel data is copied w/o any kind of
> > > integrity protection, because the integrity protection is not end-to-end.
> > > The integrity protection is only per-link.
> >
> > So long as all links have integrity protection it's end-to-end.
> >
>
> Hi Michael,
>
> You clearly don't see what this is about! Only if the same CRC mechanism is
> end-to-end, you don't have any good integrity mechanism at all!
>
> Let me try to explain what this is about in very simple words. Because
> memcpy() does not copy the ECC CRC values along with the data, it is an
> unsafe memory copy mechanism, which may introduce bit-errors without
> noticing. It does not help to only have ECC RAM or for that sake protect the
> PCI links.

The ECC protects against 1-bit errors: so long as only 1 bit is flipped
along that path it is corrected. If you have bigger errors ECC can
sometimes detect them and your system crashes or whatever, and
sometimes they go unnoticed.

It does not make sense to copy around that CRC. It is used to recover
the corrupted bit, and when that data is copied to a new location a new
CRC is calculated that can detect an error in that location.

Copying that checksum around would only accumulate the errors.

Of course, that assumes that the corruption happens only in the cheaper
external long-term storage, and data does not get corrupted as it goes
through your CPU where it is stored only a few CPU cycles at a time.

That is mostly the case, but when you need extreme reliability,
system-level schemes that mitigate this possibility do exist.

Thanks

Michal
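The single-bit correction described above can be illustrated with a Hamming(7,4) code. This is a textbook sketch under the assumption of 4 data bits and 3 parity bits; real ECC memory typically uses wider codes with an extra parity bit so that double-bit errors are detected rather than miscorrected. The function names are made up for this illustration.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions 1..7 hold: p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, syndrome) after correcting at most one flipped bit.

    A nonzero syndrome is the 1-based position the decoder flipped back.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome


data = [1, 0, 1, 1]
word = hamming74_encode(data)

# One flipped bit: the syndrome points at it and the data is recovered.
one_error = list(word)
one_error[4] ^= 1
print(hamming74_decode(one_error)[0] == data)  # True

# Two flipped bits: the decoder "corrects" the wrong position and hands
# back wrong data without any indication that something went wrong.
two_errors = list(word)
two_errors[0] ^= 1
two_errors[5] ^= 1
print(hamming74_decode(two_errors)[0] == data)  # False
```

The second case is the "sometimes they go unnoticed" outcome: without an additional check, a multi-bit error can pass through a single-error-correcting code as if it were valid data.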
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-16 9:13 ` Michal Suchánek
@ 2023-01-16 9:55 ` Hans Petter Selasky
  2023-01-16 12:31 ` rsbecker
  2023-01-16 19:08 ` Michal Suchánek
  0 siblings, 2 replies; 42+ messages in thread
From: Hans Petter Selasky @ 2023-01-16 9:55 UTC
To: Michal Suchánek; +Cc: git

On 1/16/23 10:13, Michal Suchánek wrote:
> when that data is copied to a new location a new
> CRC is calculated that can detect an error in that location.

Yes, that is correct, but what is "copying data"? Are you saying that
copying data is always error free?

--HPS
* RE: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-16 9:55 ` Hans Petter Selasky
@ 2023-01-16 12:31 ` rsbecker
  2023-01-16 14:10 ` Hans Petter Selasky
  2023-01-16 19:08 ` Michal Suchánek
  1 sibling, 1 reply; 42+ messages in thread
From: rsbecker @ 2023-01-16 12:31 UTC
To: 'Hans Petter Selasky', 'Michal Suchánek'
Cc: git, wrights

On January 16, 2023 4:56 AM, Hans Petter Selasky wrote:
>On 1/16/23 10:13, Michal Suchánek wrote:
>> when that data is copied to a new location a new CRC is calculated
>> that can detect an error in that location.
>
>Yes, that is correct, but what is "copying data"? Are you saying that copying data is
>always error free?

Not in all possible computing devices, no. But in certain
high-reliability and mission-critical systems, there are parity checks
and communication mechanisms that verify the integrity of data
transfers memory-to-memory, memory-to-register, over the inter-CPU bus,
and memory-to-disk storage. A corruption on one of my systems would
result in a CPU halt rather than blindly accepting the result, taking
the faulty processor offline until the cause is investigated and the
processor is then reloaded or repaired. This applies to any component,
including disks, CLIMs, DMA, and anything else in the architecture.
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-16 12:31 ` rsbecker
@ 2023-01-16 14:10 ` Hans Petter Selasky
  0 siblings, 0 replies; 42+ messages in thread
From: Hans Petter Selasky @ 2023-01-16 14:10 UTC
To: rsbecker, 'Michal Suchánek'; +Cc: git, wrights

On 1/16/23 13:31, rsbecker@nexbridge.com wrote:
> On January 16, 2023 4:56 AM, Hans Petter Selasky wrote:
>> On 1/16/23 10:13, Michal Suchánek wrote:
>>> when that data is copied to a new location a new CRC is calculated
>>> that can detect an error in that location.
>>
>> Yes, that is correct, but what is "copying data"? Are you saying that copying data is
>> always error free?
>
> Not in all possible computing devices, no. But in certain high-reliability and mission critical systems, there are parity checks and communication mechanisms that verify the integrity of data transfers memory-to-memory, memory-to-register, and over inter-CPU bus, and memory-to-disk-storage checks. The result of a corruption on one of my systems would result in a CPU halt rather than blindly accepting the result, taking the faulty processor offline until the cause is investigated and then reloaded or repaired. This applies to any component, including disks, CLIMs, DMA, and anything else in the architecture.
>

Hi,

It doesn't matter if the system is high-reliability or not. The problem
is exactly the same. If you have a CPU register which you add to
another CPU register, then you need to recompute the parity information
on the destination CPU register. That basically means you always trust
the output of the CPU adder. There is simply no relationship between
input parity and output parity in the linear adder case.

Whenever "parity" information is lost, it opens up the possibility of
irrecoverable errors.

That's why I say that GIT would be better off in that regard with an
end-to-end CRC parity mechanism.

--HPS
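The claim about the adder can be checked on small integers. A minimal sketch, assuming "parity" means the XOR of all bits of a value (`parity` is a helper defined here for illustration): two pairs of inputs with identical parities produce sums with different parities, so the output parity has to be recomputed from the result, whereas under XOR it can be derived from the input parities.

```python
def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") % 2


# Both pairs have input parities (1, 1) ...
pair_a = (1, 1)  # 0b01 + 0b01 = 0b10
pair_b = (1, 2)  # 0b01 + 0b10 = 0b11

# ... yet the parity of the sum differs, because carries are not
# determined by the input parities alone.
print(parity(pair_a[0] + pair_a[1]))  # 1
print(parity(pair_b[0] + pair_b[1]))  # 0

# For XOR the output parity does follow from the input parities.
xor_preserves_parity = all(
    parity(a ^ b) == parity(a) ^ parity(b)
    for a in range(64)
    for b in range(64)
)
print(xor_preserves_parity)  # True
```

This only shows that a simple parity bit cannot be carried through an addition; it does not by itself show that recomputing the check after the operation is unsafe.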
* Re: Gitorious should use CRC128 / 256 / 512 instead of SHA-1
  2023-01-16 9:55 ` Hans Petter Selasky
  2023-01-16 12:31 ` rsbecker
@ 2023-01-16 19:08 ` Michal Suchánek
  1 sibling, 0 replies; 42+ messages in thread
From: Michal Suchánek @ 2023-01-16 19:08 UTC
To: Hans Petter Selasky; +Cc: git

On Mon, Jan 16, 2023 at 10:55:34AM +0100, Hans Petter Selasky wrote:
> On 1/16/23 10:13, Michal Suchánek wrote:
> > when that data is copied to a new location a new
> > CRC is calculated that can detect an error in that location.
>
> Yes, that is correct, but what is "copying data"? Are you saying that
> copying data is always error free?

Maybe you should not cut out the answer to your question?

Thanks

Michal