git.vger.kernel.org archive mirror
* A slight inconvenience with 'git archive --format=tar'
@ 2012-06-13 14:47 Rafał Mużyło
  2012-06-13 17:45 ` Junio C Hamano
  0 siblings, 1 reply; 3+ messages in thread
From: Rafał Mużyło @ 2012-06-13 14:47 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 501 bytes --]

I just stumbled upon this while checking a few mailing lists.
I haven't found any mails about it in the archives yet, so I assume that
none have been written yet.

The problem is described here:
http://sourceforge.net/projects/sevenzip/forums/forum/45798/topic/5322604

Basically, while this is not a problem for GNU tar, the correct checksum
should be computed using unsigned values.

The attached trivial test case shows the difference.

A patch making the change shown in the test case is also attached.


[-- Attachment #2: cksum-test.c --]
[-- Type: text/x-c, Size: 520 bytes --]

#include <stdio.h>
#include <string.h>

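/*
 * Sum the bytes of a NUL-terminated string: as plain char when sign is
 * non-zero, as unsigned char otherwise.
 */
static unsigned int ustar_header_chksum(const void *buffer, int sign)
{
  const char *p = (const char *)buffer;
  const char *end = p + strlen(p);
  unsigned int chksum = 0;

  while (p < end)
  {
    /* where plain char is signed, bytes >= 0x80 are added as negative values */
    if (sign) chksum += *p++; else chksum += (unsigned char)*p++;
  }
  return chksum;
}

int main(void)
{
  /* the non-ASCII bytes are what make the two sums differ */
  const char *teststring = "żółte źrebię";

  printf("%u\n", ustar_header_chksum(teststring, 0));
  printf("%u\n", ustar_header_chksum(teststring, 1));
  return 0;
}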

[-- Attachment #3: git-tar.patch --]
[-- Type: text/plain, Size: 515 bytes --]

--- archive-tar.c	2012-04-26 21:25:49.000000000 +0200
+++ archive-tar.c	2012-06-13 16:43:59.220945967 +0200
@@ -104,11 +104,11 @@ static unsigned int ustar_header_chksum(
 	char *p = (char *)header;
 	unsigned int chksum = 0;
 	while (p < header->chksum)
-		chksum += *p++;
+		chksum += (unsigned char)*p++;
 	chksum += sizeof(header->chksum) * ' ';
 	p += sizeof(header->chksum);
 	while (p < (char *)header + sizeof(struct ustar_header))
-		chksum += *p++;
+		chksum += (unsigned char)*p++;
 	return chksum;
 }
 


* Re: A slight inconvenience with 'git archive --format=tar'
  2012-06-13 14:47 A slight inconvenience with 'git archive --format=tar' Rafał Mużyło
@ 2012-06-13 17:45 ` Junio C Hamano
  2012-06-13 19:58   ` René Scharfe
  0 siblings, 1 reply; 3+ messages in thread
From: Junio C Hamano @ 2012-06-13 17:45 UTC (permalink / raw)
  To: Rafał Mużyło; +Cc: git

Rafał Mużyło <galtgendo@gmail.com> writes:

> I just stumbled upon this while checking a few mailing lists.
> I haven't found any mails about it in the archives yet, so I assume that
> none have been written yet.
>
> The problem is described here:
> http://sourceforge.net/projects/sevenzip/forums/forum/45798/topic/5322604

Thanks.  It sounds a bit more than "slight inconvenience" to me ;-)

-- >8 --
Date: Wed, 13 Jun 2012 10:42:25 -0700
Subject: [PATCH] archive: ustar header checksum is computed unsigned

POSIX.1 (pax) is pretty clear on this:

  The chksum field shall be the ISO/IEC 646:1991 standard IRV
  representation of the octal value of the simple sum of all octets
  in the header logical record. Each octet in the header shall be
  treated as an unsigned value. These values shall be added to an
  unsigned integer, initialized to zero, the precision of which is
  not less than 17 bits. When calculating the checksum, the chksum
  field is treated as if it were all <space> characters.

so is GNU:

  http://www.gnu.org/software/tar/manual/html_node/Checksumming.html

Found by 7zip folks and reported by Rafał Mużyło.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 archive-tar.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index dc91c6b..0ba3f25 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -139,13 +139,13 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword,
 
 static unsigned int ustar_header_chksum(const struct ustar_header *header)
 {
-	const char *p = (const char *)header;
+	const unsigned char *p = (const unsigned char *)header;
 	unsigned int chksum = 0;
-	while (p < header->chksum)
+	while (p < (const unsigned char *)header->chksum)
 		chksum += *p++;
 	chksum += sizeof(header->chksum) * ' ';
 	p += sizeof(header->chksum);
-	while (p < (const char *)header + sizeof(struct ustar_header))
+	while (p < (const unsigned char *)header + sizeof(struct ustar_header))
 		chksum += *p++;
 	return chksum;
 }
-- 
1.7.11.rc3.25.g4c2075b
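
For readers following the POSIX wording quoted in the commit message, here is a minimal sketch of that rule applied to a raw 512-byte header block. It is not the code from archive-tar.c: the function names and macros are made up for illustration, the chksum field position (offset 148, length 8) is taken from the ustar header layout, and storing the value as seven octal digits plus a NUL is an assumption about one common encoding.

```c
#include <stdio.h>
#include <string.h>

#define USTAR_BLOCK_SIZE    512  /* one ustar header logical record */
#define USTAR_CHKSUM_OFFSET 148  /* start of the chksum field */
#define USTAR_CHKSUM_SIZE   8    /* length of the chksum field */

/*
 * Sum every octet of the header as an unsigned value, treating the
 * chksum field itself as if it were filled with spaces.
 */
static unsigned int ustar_checksum(const unsigned char *block)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < USTAR_BLOCK_SIZE; i++) {
		if (i >= USTAR_CHKSUM_OFFSET &&
		    i < USTAR_CHKSUM_OFFSET + USTAR_CHKSUM_SIZE)
			sum += ' ';
		else
			sum += block[i];
	}
	return sum;
}

/*
 * Write the checksum into the header as octal text.
 * Assumption: seven zero-padded octal digits followed by a NUL.
 */
static void ustar_store_checksum(unsigned char *block)
{
	char field[USTAR_CHKSUM_SIZE];

	snprintf(field, sizeof(field), "%07o", ustar_checksum(block));
	memcpy(block + USTAR_CHKSUM_OFFSET, field, sizeof(field));
}
```

Because the chksum field is always counted as spaces, the sum does not change once the field has been filled in, so a reader can recompute it over the finished header and compare it with the stored value.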


* Re: A slight inconvenience with 'git archive --format=tar'
  2012-06-13 17:45 ` Junio C Hamano
@ 2012-06-13 19:58   ` René Scharfe
  0 siblings, 0 replies; 3+ messages in thread
From: René Scharfe @ 2012-06-13 19:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Rafał Mużyło, git

On 13.06.2012 19:45, Junio C Hamano wrote:
> Rafał Mużyło <galtgendo@gmail.com> writes:
>
>> I just stumbled upon this while checking a few mailing lists.
>> I haven't found any mails about it in the archives yet, so I assume that
>> none have been written yet.
>>
>> The problem is described here:
>> http://sourceforge.net/projects/sevenzip/forums/forum/45798/topic/5322604
>
> Thanks.  It sounds a bit more than "slight inconvenience" to me ;-)

Indeed, but two mitigating factors, if you will, are that this only
affects files whose path or link target contains non-ASCII characters,
and -- as Rafał wrote -- that GNU tar silently accepts signed checksums
as well, which makes it a bit difficult to test.

René
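
To illustrate both mitigating factors, here is a small sketch with hypothetical helper names; none of them come from git or GNU tar. For pure ASCII input the signed and unsigned sums are identical, while every byte with the high bit set makes the unsigned sum larger by 256. The last function shows what a lenient reader might do, assuming it simply accepts either value; the actual GNU tar logic is not reproduced here.

```c
#include <stddef.h>

/* Sum the bytes as unsigned octets, as POSIX requires. */
static unsigned int sum_unsigned(const char *s, size_t len)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += (unsigned char)s[i];
	return sum;
}

/*
 * Sum the bytes as signed values, which is what plain char
 * arithmetic does on platforms where char is signed.
 */
static unsigned int sum_signed(const char *s, size_t len)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += (signed char)s[i];
	return sum;
}

/*
 * Hypothetical lenient check: accept a stored checksum that
 * matches either the unsigned or the signed sum.
 */
static int checksum_acceptable(const char *s, size_t len, unsigned int stored)
{
	return stored == sum_unsigned(s, len) || stored == sum_signed(s, len);
}
```

A strict reader would compare only against `sum_unsigned()`, which is why archives containing non-ASCII paths fail there while a lenient reader extracts them without complaint.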

