All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Torsten Bögershausen" <tboegi@web.de>
To: Eric Sunshine <sunshine@sunshineco.com>
Cc: Git List <git@vger.kernel.org>, ashishnegi33@gmail.com
Subject: Re: [PATCH 1/1] convert: tighten the safe autocrlf handling
Date: Fri, 24 Nov 2017 19:59:05 +0100	[thread overview]
Message-ID: <20171124185905.GA9736@tor.lan> (raw)
In-Reply-To: <CAPig+cT7=yLUVpmtutmTep5NBbSRNOL17dsOuVvn_Scu7_+p_w@mail.gmail.com>

On Fri, Nov 24, 2017 at 12:24:48PM -0500, Eric Sunshine wrote:
> On Fri, Nov 24, 2017 at 11:14 AM,  <tboegi@web.de> wrote:
> > When a text file had been commited with CRLF and the file is commited
> > again, the CRLF are kept if .gitattributs has "text=auto".
> > This is done by analyzing the content of the blob stored in the index:
> > If a '\r' is found, Git assumes that the blob was commited with CRLF.
> >
> > The simple search for a '\r' does not always work as expected:
> > A file is encoded in UTF-16 with CRLF and commited. Git treats it as binary.
> > Now the content is converted into UTF-8. At the next commit Git treats the
> > file as text, the CRLF should be converted into LF, but isn't.
> >
> > Solution:
> > Replace has_cr_in_index() with has_crlf_in_index(). When no '\r' is found,
> > 0 is returned directly, this is the most common case.
> > If a '\r' is found, the content is analyzed more deeply.
> >
> > Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> > ---
> > diff --git a/convert.c b/convert.c
> > @@ -220,18 +220,27 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
> > -static int has_cr_in_index(const struct index_state *istate, const char *path)
> > +static int has_crlf_in_index(const struct index_state *istate, const char *path)
> >  {
> >         unsigned long sz;
> >         void *data;
> > -       int has_cr;
> > +       const char *crp;
> > +       int has_crlf = 0;
> >
> >         data = read_blob_data_from_index(istate, path, &sz);
> >         if (!data)
> >                 return 0;
> > -       has_cr = memchr(data, '\r', sz) != NULL;
> > +
> > +       crp = memchr(data, '\r', sz);
> > +       if (crp && (crp[1] == '\n')) {
> 
> If I understand correctly, this isn't a NUL-terminated string and it
> might be a binary blob, so if the lone CR in a file resides at the end
> of the file, won't this try looking for LF out-of-bounds? I would have
> expected the conditional to be:
> 
>     if (crp && crp - data + 1 < sz && crp[1] == '\n') {
> 
> or any equivalent variation.
> 

The read_blob_data_from_index() function should always append a '\0',
regardless if the blob is binary or not.

  reply	other threads:[~2017-11-24 18:59 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-14 12:31 Changing encoding of a file : What should happen to CRLF in file ? Ashish Negi
2017-11-14 15:20 ` Torsten Bögershausen
2017-11-14 16:13   ` Ashish Negi
2017-11-14 16:15     ` Ashish Negi
2017-11-14 17:09       ` Torsten Bögershausen
2017-11-15  8:11         ` Ashish Negi
2017-11-15 17:12           ` Torsten Bögershausen
2017-11-15 19:05             ` Ashish Negi
2017-11-16 16:15               ` Torsten Bögershausen
2017-11-23 16:31                 ` Ashish Negi
2017-11-23 20:25                   ` Torsten Bögershausen
2017-11-24  6:37                     ` Ashish Negi
2017-11-14 16:45     ` Torsten Bögershausen
2017-11-24 16:14 ` [PATCH 1/1] convert: tighten the safe autocrlf handling tboegi
2017-11-24 17:24   ` Eric Sunshine
2017-11-24 18:59     ` Torsten Bögershausen [this message]
2017-11-25  3:16   ` Junio C Hamano
2017-11-26 12:20 ` [PATCH v2 " tboegi
2017-12-08 17:46 ` [PATCH v1 1/2] t0027: Don't use git commit <empty-pathspec> tboegi
2017-12-08 18:13   ` Junio C Hamano
2017-12-08 18:21     ` Junio C Hamano
2017-12-08 18:50       ` Torsten Bögershausen
2017-12-08 17:46 ` [PATCH v1 2/2] t0027: Adapt the new MIX tests to Windows tboegi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171124185905.GA9736@tor.lan \
    --to=tboegi@web.de \
    --cc=ashishnegi33@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.