All of lore.kernel.org
 help / color / mirror / Atom feed
From: Brandon Williams <bmwill@google.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	Jeff Hostetler <git@jeffhostetler.com>, Jeff King <peff@peff.net>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [RFC] 'unsigned long' to 'size_t' conversion
Date: Wed, 6 Dec 2017 10:10:02 -0800	[thread overview]
Message-ID: <20171206181002.GA59217@google.com> (raw)
In-Reply-To: <a660460d-b294-5113-bfaf-d98bcf99bad5@gmail.com>

On 12/06, Derrick Stolee wrote:
> There are several places in Git where we refer to the size of an
> object by an 'unsigned long' instead of a 'size_t'. In 64-bit Linux,
> 'unsigned long' is 8 bytes, but in 64-bit Windows it is 4 bytes.
> 
> The main issue with this conversion is that large objects fail to
> load (they seem to hash and store just fine). For example, the
> following 'blob8gb' is an 8 GB file where the ith byte is equal to i
> % 256:
> 
> $ git hash-object -w --no-filters blob8gb
> 5391939346b98600acc0283dda24649450cec51f
> 
> $ git cat-file -s 5391939346b98600acc0283dda24649450cec51f
> error: bad object header
> fatal: git cat-file: could not get object info
> 
> An existing discussion can be found here:
> https://github.com/git-for-windows/git/issues/1063
> 
> The error message results from unpack_object_header_buffer() which
> had its most-recent meaningful change in 'ea4f9685:
> unpack_object_header_buffer(): clear the size field upon error' (in
> 2011).
> 
> In my opinion, the correct thing to do would be to replace all
> 'unsigned long's that refer to an object size and replace them with
> 'size_t'. However, a simple "git grep 'unsigned long size'" reveals
> 194 results, and there are other permutations of names and pointer
> types all over.
> 
> This conversion would be a significant patch, so I wanted to get the
> community's thoughts on this conversion.
> 
> If there are small, isolated chunks that can be done safely, then
> this may be a good target for a first patch.

I think that an effort like this would definitely be worthwhile.  Much
like the unsigned char[20] -> struct object_id conversion I would think
that the best way to go about such a conversion would be to do it in
small chunks as you've mentioned.  That way you are only causing churn
in hopefully small parts of the code base at a time instead of one
monolithic change that is sure to cause conflicts.

-- 
Brandon Williams

  parent reply	other threads:[~2017-12-06 18:10 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-06 15:08 [RFC] 'unsigned long' to 'size_t' conversion Derrick Stolee
2017-12-06 16:48 ` Thomas Braun
2017-12-06 18:10 ` Brandon Williams [this message]
2017-12-06 21:43 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171206181002.GA59217@google.com \
    --to=bmwill@google.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.