On 2023-02-01 at 23:37:19, Junio C Hamano wrote:
> "brian m. carlson" writes:
>
> > I don't think a blurb is necessary, but you're basically underscoring
> > the problem, which is that nobody is willing to promise that compression
> > is consistent, but yet people want to rely on that fact.  I'm willing to
> > write and implement a consistent tar spec and to guarantee compatibility
> > with that, but the tension here is that people also want gzip to never
> > change its byte format ever, which frankly seems unrealistic without
> > explicit guarantees.  Maybe the authors will agree to promise that, but
> > it seems unlikely.
>
> Just to step back a bit, where does the distinction between
> guaranteeing the tar format stability and gzip compressed bitstream
> stability come from?  At both levels, the same thing can be
> expressed in multiple different ways, I think, but spelling out how
> exactly the compressor compresses is more involved than spelling out
> how entries in a tar archive is ordered and each entry is expressed,
> or something?

Yes, at least with my understanding about how gzip and compression in
general work.

The tar format (and the pax format which builds on it) can mostly be
restricted by explaining what data is to be included in the pax and tar
headers and how it is to be formatted.  If we say, we will always write
such and such information in the pax header and sort the keys, and we
write such and such information in the tar header, then the format is
completely deterministic, and we can make nice guarantees.

My understanding about how Lempel-Ziv-based compression algorithms work
is that there's a lot more freedom to decide how best to compress things
and that there isn't always a logical obvious choice, but I will admit
my understanding is relatively limited.  If someone thinks we can
effectively succeed in supporting compression more than just relying on
gzip, I would be delighted to be shown to be wrong.
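
As a rough illustration of that difference, here is a minimal Python
sketch.  It is not what Git does: `pax_record` and `pax_header` are
hypothetical helper names, and the sketch uses Python's zlib module
(the zlib container rather than the gzip one, though both wrap DEFLATE)
purely to show that one input has several valid compressed encodings.

```python
import zlib


def pax_record(key, value):
    """Build one pax extended header record: "<len> <key>=<value>\\n".

    The decimal length counts the whole record, including its own digits.
    """
    payload = f" {key}={value}\n".encode("utf-8")
    length = len(payload)
    # Grow the length until it accounts for its own digit count.
    while len(payload) + len(str(length)) != length:
        length = len(payload) + len(str(length))
    return str(length).encode("ascii") + payload


def pax_header(attrs):
    """Concatenate records with keys sorted, so the bytes depend only on
    the contents of attrs and not on insertion order (an assumed rule)."""
    return b"".join(pax_record(k, attrs[k]) for k in sorted(attrs))


# Same attributes in a different order give identical header bytes.
same = pax_header({"path": "a", "mtime": "0"}) == pax_header(
    {"mtime": "0", "path": "a"}
)

# The same input compressed at every zlib level: each result is valid,
# but the compressed bytes are not all identical.
data = b"hello, tar and gzip\n" * 200
encodings = {zlib.compress(data, level) for level in range(10)}
round_trips = all(zlib.decompress(e) == data for e in encodings)

print(same, len(encodings) > 1, round_trips)
```

Fixing the set of keys and their order is enough to pin down the header
bytes, whereas every member of `encodings` decompresses to the same
input, so the format alone does not say which one a compressor must
emit.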

> > That would probably break things, because gzip is GPLv3, and we'd need
> > to ship a much older GPLv2 gzip, which would probably differ from the
> > current behaviour, and might also have some security problems.
>
> Yup, security issues may make bit-for-bit-stability unrealistic.
> IIRC, the last time we had discussion on this topic, we settled
> on stability across the same version of Git (i.e. deterministic
> result)?

Yes, I think that's what we agreed.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA