* [PATCH v3 0/1] CRLF-Handling: bug fix around ce_compare_data()
2016-05-15 22:14 ` Junio C Hamano
@ 2016-05-16 15:51 ` tboegi
2016-05-16 16:13 ` Junio C Hamano
2016-05-16 15:51 ` [PATCH v3 1/1] " tboegi
` (6 subsequent siblings)
7 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-16 15:51 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
Changes since v2:
- Only 1 patch, the t6038 needs to go as a separate "series"
has_cr_in_index() uses either
data = read_sha1_file(sha1, &type, &sz);
or
data = read_blob_data_from_cache(path, &sz);
(I like v2 better, since there is a single code to get a sha object
into memory, which later will be replaced by a streaming object)
But in any case, here is v3
Torsten Bögershausen (1):
convert: ce_compare_data() checks for a sha1 of a path
cache.h | 1 +
convert.c | 35 +++++++++++++++++++++--------------
convert.h | 23 +++++++++++++++++++----
read-cache.c | 4 +++-
sha1_file.c | 17 +++++++++++++----
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 0/1] CRLF-Handling: bug fix around ce_compare_data()
2016-05-16 15:51 ` [PATCH v3 0/1] CRLF-Handling: bug fix around ce_compare_data() tboegi
@ 2016-05-16 16:13 ` Junio C Hamano
2016-05-17 4:08 ` Torsten Bögershausen
0 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2016-05-16 16:13 UTC (permalink / raw)
To: tboegi; +Cc: git
tboegi@web.de writes:
> From: Torsten Bögershausen <tboegi@web.de>
>
> Changes since v2:
>
> - Only 1 patch, the t6038 needs to go as a separate "series"
>
> has_cr_in_index() uses either
> data = read_sha1_file(sha1, &type, &sz);
> or
> data = read_blob_data_from_cache(path, &sz);
>
> (I like v2 better, since there is a single code to get a sha object
> into memory, which later will be replaced by a streaming object)
Wait a minute, please. I only asked the reason why you did it that
way and mentioned that the end result seemed equivalent. "The end
result seems equivalent" does not automatically mean "therefore you
must avoid changing the code."
If you still prefer the original code, and your preference is backed
by a solid reasoning, don't change the code to a less preferrable
version. You may have to explain what you wrote in () above clearly
in an updated log message to save other readers from asking the same
question as I asked, though.
Thanks.
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v3 0/1] CRLF-Handling: bug fix around ce_compare_data()
2016-05-16 16:13 ` Junio C Hamano
@ 2016-05-17 4:08 ` Torsten Bögershausen
2016-05-17 16:09 ` [PATCH v1 1/1] t6038; use crlf on all platforms tboegi
` (6 more replies)
0 siblings, 7 replies; 53+ messages in thread
From: Torsten Bögershausen @ 2016-05-17 4:08 UTC (permalink / raw)
To: Junio C Hamano, tboegi; +Cc: git
On 05/16/2016 06:13 PM, Junio C Hamano wrote:
> Wait a minute, please. I only asked the reason why you did it that
> way and mentioned that the end result seemed equivalent. "The end
> result seems equivalent" does not automatically mean "therefore you
> must avoid changing the code."
>
> If you still prefer the original code, and your preference is backed
> by a solid reasoning, don't change the code to a less preferrable
> version. You may have to explain what you wrote in () above clearly
> in an updated log message to save other readers from asking the same
> question as I asked, though.
>
> Thanks.
> --
>
No problem.
I will re-send v4 in some time and pull out the update of t6038 into an
own path
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v1 1/1] t6038; use crlf on all platforms
2016-05-17 4:08 ` Torsten Bögershausen
@ 2016-05-17 16:09 ` tboegi
2016-05-17 18:39 ` Junio C Hamano
2016-05-17 16:41 ` [PATCH v4 0/2] CRLF: ce_compare_data() checks for a sha1 of a path tboegi
` (5 subsequent siblings)
6 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-17 16:09 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
t6038 uses different code, depending if NATIVE_CRLF is set ot not.
When the native line endings are LF, merge.renormalize is not tested very well.
Change the test to always use CRLF by setting core.eol=crlf.
---
Broke the 10/10 series into smaller pieces, this is the update of t6038
t/t6038-merge-text-auto.sh | 37 +++++++++++--------------------------
1 file changed, 11 insertions(+), 26 deletions(-)
diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
index 85c10b0..4dc8c1a 100755
--- a/t/t6038-merge-text-auto.sh
+++ b/t/t6038-merge-text-auto.sh
@@ -18,6 +18,7 @@ test_have_prereq SED_STRIPS_CR && SED_OPTIONS=-b
test_expect_success setup '
git config core.autocrlf false &&
+ git config core.eol crlf &&
echo first line | append_cr >file &&
echo first line >control_file &&
@@ -72,10 +73,8 @@ test_expect_success 'Merge after setting text=auto' '
same line
EOF
- if test_have_prereq NATIVE_CRLF; then
- append_cr <expected >expected.temp &&
- mv expected.temp expected
- fi &&
+ append_cr <expected >expected.temp &&
+ mv expected.temp expected &&
git config merge.renormalize true &&
git rm -fr . &&
rm -f .gitattributes &&
@@ -90,10 +89,8 @@ test_expect_success 'Merge addition of text=auto' '
same line
EOF
- if test_have_prereq NATIVE_CRLF; then
- append_cr <expected >expected.temp &&
- mv expected.temp expected
- fi &&
+ append_cr <expected >expected.temp &&
+ mv expected.temp expected &&
git config merge.renormalize true &&
git rm -fr . &&
rm -f .gitattributes &&
@@ -104,15 +101,9 @@ test_expect_success 'Merge addition of text=auto' '
test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
echo "<<<<<<<" >expected &&
- if test_have_prereq NATIVE_CRLF; then
- echo first line | append_cr >>expected &&
- echo same line | append_cr >>expected &&
- echo ======= | append_cr >>expected
- else
- echo first line >>expected &&
- echo same line >>expected &&
- echo ======= >>expected
- fi &&
+ echo first line | append_cr >>expected &&
+ echo same line | append_cr >>expected &&
+ echo ======= | append_cr >>expected &&
echo first line | append_cr >>expected &&
echo same line | append_cr >>expected &&
echo ">>>>>>>" >>expected &&
@@ -128,15 +119,9 @@ test_expect_success 'Detect LF/CRLF conflict from addition of text=auto' '
echo "<<<<<<<" >expected &&
echo first line | append_cr >>expected &&
echo same line | append_cr >>expected &&
- if test_have_prereq NATIVE_CRLF; then
- echo ======= | append_cr >>expected &&
- echo first line | append_cr >>expected &&
- echo same line | append_cr >>expected
- else
- echo ======= >>expected &&
- echo first line >>expected &&
- echo same line >>expected
- fi &&
+ echo ======= | append_cr >>expected &&
+ echo first line | append_cr >>expected &&
+ echo same line | append_cr >>expected &&
echo ">>>>>>>" >>expected &&
git config merge.renormalize false &&
rm -f .gitattributes &&
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH v1 1/1] t6038; use crlf on all platforms
2016-05-17 16:09 ` [PATCH v1 1/1] t6038; use crlf on all platforms tboegi
@ 2016-05-17 18:39 ` Junio C Hamano
0 siblings, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2016-05-17 18:39 UTC (permalink / raw)
To: tboegi; +Cc: git
tboegi@web.de writes:
> From: Torsten Bögershausen <tboegi@web.de>
>
> t6038 uses different code, depending if NATIVE_CRLF is set ot not.
> When the native line endings are LF, merge.renormalize is not tested very well.
> Change the test to always use CRLF by setting core.eol=crlf.
> ---
> Broke the 10/10 series into smaller pieces, this is the update of t6038
Thanks.
- Missing sign-off.
- "... is not tested very well", which implies "with this change,
it will be tested", is a secondary benefit. The reader need to
agree that, whether the platform native line ending is CRLF or
LF, 'git merge' should behave identically on any platform, as
long as the line ending convention used in the repository is
explicitly set in the same way, before agreeing that this is a
good thing to do in general. And that is a bigger benefit, no?
- But doesn't the same principle apply in the other direction?
When forced to do core.eol=lf, a platform with NATIVE_CRLF should
behave identically to how a platform without NATIVE_CRLF would?
With this patch, we lose test coverage for core.eol=lf case,
which used to be tested on non-NATIVE_CRLF platforms. Isn't that
a concern to us?
> t/t6038-merge-text-auto.sh | 37 +++++++++++--------------------------
> 1 file changed, 11 insertions(+), 26 deletions(-)
>
> diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
> index 85c10b0..4dc8c1a 100755
> --- a/t/t6038-merge-text-auto.sh
> +++ b/t/t6038-merge-text-auto.sh
> @@ -18,6 +18,7 @@ test_have_prereq SED_STRIPS_CR && SED_OPTIONS=-b
>
> test_expect_success setup '
> git config core.autocrlf false &&
> + git config core.eol crlf &&
>
> echo first line | append_cr >file &&
> echo first line >control_file &&
> @@ -72,10 +73,8 @@ test_expect_success 'Merge after setting text=auto' '
> same line
> EOF
>
> - if test_have_prereq NATIVE_CRLF; then
> - append_cr <expected >expected.temp &&
> - mv expected.temp expected
> - fi &&
> + append_cr <expected >expected.temp &&
> + mv expected.temp expected &&
> git config merge.renormalize true &&
> git rm -fr . &&
> rm -f .gitattributes &&
> @@ -90,10 +89,8 @@ test_expect_success 'Merge addition of text=auto' '
> same line
> EOF
>
> - if test_have_prereq NATIVE_CRLF; then
> - append_cr <expected >expected.temp &&
> - mv expected.temp expected
> - fi &&
> + append_cr <expected >expected.temp &&
> + mv expected.temp expected &&
> git config merge.renormalize true &&
> git rm -fr . &&
> rm -f .gitattributes &&
> @@ -104,15 +101,9 @@ test_expect_success 'Merge addition of text=auto' '
>
> test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
> echo "<<<<<<<" >expected &&
> - if test_have_prereq NATIVE_CRLF; then
> - echo first line | append_cr >>expected &&
> - echo same line | append_cr >>expected &&
> - echo ======= | append_cr >>expected
> - else
> - echo first line >>expected &&
> - echo same line >>expected &&
> - echo ======= >>expected
> - fi &&
> + echo first line | append_cr >>expected &&
> + echo same line | append_cr >>expected &&
> + echo ======= | append_cr >>expected &&
> echo first line | append_cr >>expected &&
> echo same line | append_cr >>expected &&
> echo ">>>>>>>" >>expected &&
> @@ -128,15 +119,9 @@ test_expect_success 'Detect LF/CRLF conflict from addition of text=auto' '
> echo "<<<<<<<" >expected &&
> echo first line | append_cr >>expected &&
> echo same line | append_cr >>expected &&
> - if test_have_prereq NATIVE_CRLF; then
> - echo ======= | append_cr >>expected &&
> - echo first line | append_cr >>expected &&
> - echo same line | append_cr >>expected
> - else
> - echo ======= >>expected &&
> - echo first line >>expected &&
> - echo same line >>expected
> - fi &&
> + echo ======= | append_cr >>expected &&
> + echo first line | append_cr >>expected &&
> + echo same line | append_cr >>expected &&
> echo ">>>>>>>" >>expected &&
> git config merge.renormalize false &&
> rm -f .gitattributes &&
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v4 0/2] CRLF: ce_compare_data() checks for a sha1 of a path
2016-05-17 4:08 ` Torsten Bögershausen
2016-05-17 16:09 ` [PATCH v1 1/1] t6038; use crlf on all platforms tboegi
@ 2016-05-17 16:41 ` tboegi
2016-05-17 16:41 ` [PATCH v4 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
` (4 subsequent siblings)
6 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-17 16:41 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
Break up the old 10/10 series about CLRF handling into smaller
series. This is a small bugfix, when merge.renormalize is used
with core.autocrlf (and no attributes are set).
Prepare the refactoring to use the streaming interface.
Torsten Bögershausen (2):
read-cache: factor out get_sha1_from_index() helper
convert: ce_compare_data() checks for a sha1 of a path
cache.h | 4 ++++
convert.c | 34 ++++++++++++++++++++++------------
convert.h | 23 +++++++++++++++++++----
read-cache.c | 33 +++++++++++++++++++++------------
sha1_file.c | 17 +++++++++++++----
5 files changed, 79 insertions(+), 32 deletions(-)
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v4 1/2] read-cache: factor out get_sha1_from_index() helper
2016-05-17 4:08 ` Torsten Bögershausen
2016-05-17 16:09 ` [PATCH v1 1/1] t6038; use crlf on all platforms tboegi
2016-05-17 16:41 ` [PATCH v4 0/2] CRLF: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-17 16:41 ` tboegi
2016-05-17 16:41 ` [PATCH v4 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
` (3 subsequent siblings)
6 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-17 16:41 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
Factor out the retrieval of the sha1 for a given path in
read_blob_data_from_index() into the function get_sha1_from_index().
This will be used in the next commit, when convert.c can do the
analyze for "text=auto" without slurping the whole blob into memory
at once.
Add a wrapper definition get_sha1_from_cache().
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
cache.h | 3 +++
read-cache.c | 29 ++++++++++++++++++-----------
2 files changed, 21 insertions(+), 11 deletions(-)
diff --git a/cache.h b/cache.h
index 160f8e3..15a2a10 100644
--- a/cache.h
+++ b/cache.h
@@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
#define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
#define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
#define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path) get_sha1_from_index (&the_index, (path))
#endif
enum object_type {
@@ -1038,6 +1039,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
}
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
/*
* This internal function is only declared here for the benefit of
* lookup_replace_object(). Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..a3ef967 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2263,13 +2263,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,
void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
{
- int pos, len;
+ const unsigned char *sha1;
unsigned long sz;
enum object_type type;
void *data;
- len = strlen(path);
- pos = index_name_pos(istate, path, len);
+ sha1 = get_sha1_from_index(istate, path);
+ if (!sha1)
+ return NULL;
+ data = read_sha1_file(sha1, &type, &sz);
+ if (!data || type != OBJ_BLOB) {
+ free(data);
+ return NULL;
+ }
+ if (size)
+ *size = sz;
+ return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+ int pos = index_name_pos(istate, path, strlen(path));
if (pos < 0) {
/*
* We might be in the middle of a merge, in which
@@ -2285,14 +2299,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
}
if (pos < 0)
return NULL;
- data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
- if (!data || type != OBJ_BLOB) {
- free(data);
- return NULL;
- }
- if (size)
- *size = sz;
- return data;
+ return (istate->cache[pos]->sha1);
}
void stat_validity_clear(struct stat_validity *sv)
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v4 2/2] convert: ce_compare_data() checks for a sha1 of a path
2016-05-17 4:08 ` Torsten Bögershausen
` (2 preceding siblings ...)
2016-05-17 16:41 ` [PATCH v4 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
@ 2016-05-17 16:41 ` tboegi
2016-05-17 18:58 ` Junio C Hamano
2016-05-19 14:21 ` [PATCH v5 0/2] CRLF: " tboegi
` (2 subsequent siblings)
6 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-17 16:41 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
To compare a file in working tree with the index, convert_to_git() is used,
the the result is hashed and the hash value compared with ce->sha1.
Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
are converted or not: When a CRLF had been in the index before, CRLF in
the working tree are not converted.
While in a merge, a file name in the working tree has different blobs
in the index with different hash values.
Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
the would_convert_crlf_at_commit() looks at the appropriate blob.
has_cr_in_index() does not use read_blob_data_from_cache() any more.
A single function to read data for a give sha value makes it easier to
refactor has_cr_in_index() to use the streaming interface.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
cache.h | 1 +
convert.c | 34 ++++++++++++++++++++++------------
convert.h | 23 +++++++++++++++++++----
read-cache.c | 4 +++-
sha1_file.c | 17 +++++++++++++----
5 files changed, 58 insertions(+), 21 deletions(-)
diff --git a/cache.h b/cache.h
index 15a2a10..a5136c0 100644
--- a/cache.h
+++ b/cache.h
@@ -605,6 +605,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
#define HASH_WRITE_OBJECT 1
#define HASH_FORMAT_CHECK 2
+#define HASH_CE_HAS_SHA1 4
extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
diff --git a/convert.c b/convert.c
index f524b8d..c972d14 100644
--- a/convert.c
+++ b/convert.c
@@ -217,21 +217,28 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
}
}
-static int has_cr_in_index(const char *path)
+static int has_cr_in_index(const char *path, const unsigned char *sha1)
{
unsigned long sz;
void *data;
- int has_cr;
-
- data = read_blob_data_from_cache(path, &sz);
+ int has_cr = 0;
+ enum object_type type;
+ if (!sha1)
+ sha1 = get_sha1_from_cache(path);
+ if (!sha1)
+ return 0;
+ data = read_sha1_file(sha1, &type, &sz);
if (!data)
return 0;
- has_cr = memchr(data, '\r', sz) != NULL;
+ if (type == OBJ_BLOB)
+ has_cr = memchr(data, '\r', sz) != NULL;
+
free(data);
return has_cr;
}
-static int crlf_to_git(const char *path, const char *src, size_t len,
+static int crlf_to_git(const char *path, const unsigned char *sha1,
+ const char *src, size_t len,
struct strbuf *buf,
enum crlf_action crlf_action, enum safe_crlf checksafe)
{
@@ -260,7 +267,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
* If the file in the index has any CR in it, do not convert.
* This is the new safer autocrlf handling.
*/
- if (has_cr_in_index(path))
+ if (has_cr_in_index(path, sha1))
return 0;
}
}
@@ -852,8 +859,9 @@ const char *get_convert_attr_ascii(const char *path)
return "";
}
-int convert_to_git(const char *path, const char *src, size_t len,
- struct strbuf *dst, enum safe_crlf checksafe)
+int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+ const char *src, size_t len,
+ struct strbuf *dst, enum safe_crlf checksafe)
{
int ret = 0;
const char *filter = NULL;
@@ -874,7 +882,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
src = dst->buf;
len = dst->len;
}
- ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+ ret |= crlf_to_git(path, sha1, src, len, dst, ca.crlf_action, checksafe);
if (ret && dst) {
src = dst->buf;
len = dst->len;
@@ -882,7 +890,9 @@ int convert_to_git(const char *path, const char *src, size_t len,
return ret | ident_to_git(path, src, len, dst, ca.ident);
}
-void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
+void convert_to_git_filter_fd(const char *path,
+ const unsigned char *sha1,
+ int fd, struct strbuf *dst,
enum safe_crlf checksafe)
{
struct conv_attrs ca;
@@ -894,7 +904,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
die("%s: clean filter '%s' failed", path, ca.drv->name);
- crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+ crlf_to_git(path, sha1, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
}
diff --git a/convert.h b/convert.h
index ccf436b..1634d37 100644
--- a/convert.h
+++ b/convert.h
@@ -37,8 +37,16 @@ extern const char *get_wt_convert_stats_ascii(const char *path);
extern const char *get_convert_attr_ascii(const char *path);
/* returns 1 if *dst was used */
-extern int convert_to_git(const char *path, const char *src, size_t len,
- struct strbuf *dst, enum safe_crlf checksafe);
+extern int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+ const char *src, size_t len,
+ struct strbuf *dst, enum safe_crlf checksafe);
+
+static inline int convert_to_git(const char *path, const char *src, size_t len,
+ struct strbuf *dst, enum safe_crlf checksafe)
+{
+ return convert_to_git_ce_sha1(path, NULL, src, len, dst, checksafe);
+}
+
extern int convert_to_working_tree(const char *path, const char *src,
size_t len, struct strbuf *dst);
extern int renormalize_buffer(const char *path, const char *src, size_t len,
@@ -47,9 +55,16 @@ static inline int would_convert_to_git(const char *path)
{
return convert_to_git(path, NULL, 0, NULL, 0);
}
+static inline int would_convert_to_git_ce_sha1(const char *path,
+ const unsigned char *sha1)
+{
+ return convert_to_git_ce_sha1(path, sha1, NULL, 0, NULL, 0);
+}
+
/* Precondition: would_convert_to_git_filter_fd(path) == true */
-extern void convert_to_git_filter_fd(const char *path, int fd,
- struct strbuf *dst,
+extern void convert_to_git_filter_fd(const char *path,
+ const unsigned char *sha1,
+ int fd, struct strbuf *dst,
enum safe_crlf checksafe);
extern int would_convert_to_git_filter_fd(const char *path);
diff --git a/read-cache.c b/read-cache.c
index a3ef967..0ebc237 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
if (fd >= 0) {
unsigned char sha1[20];
- if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+ unsigned flags = HASH_CE_HAS_SHA1;
+ memcpy(sha1, ce->sha1, sizeof(sha1));
+ if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
match = hashcmp(sha1, ce->sha1);
/* index_fd() closed the file descriptor already */
}
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..dd013d5 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
{
int ret, re_allocated = 0;
int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
if (!type)
type = OBJ_BLOB;
@@ -3284,8 +3285,11 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
*/
if ((type == OBJ_BLOB) && path) {
struct strbuf nbuf = STRBUF_INIT;
- if (convert_to_git(path, buf, size, &nbuf,
- write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+ if (convert_to_git_ce_sha1(path,
+ valid_sha1 ? sha1 : NULL,
+ buf, size, &nbuf,
+ write_object ? safe_crlf : SAFE_CRLF_FALSE)){
+
buf = strbuf_detach(&nbuf, &size);
re_allocated = 1;
}
@@ -3313,12 +3317,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
{
int ret;
const int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
struct strbuf sbuf = STRBUF_INIT;
assert(path);
assert(would_convert_to_git_filter_fd(path));
- convert_to_git_filter_fd(path, fd, &sbuf,
+ convert_to_git_filter_fd(path,
+ valid_sha1 ? sha1 : NULL,
+ fd, &sbuf,
write_object ? safe_crlf : SAFE_CRLF_FALSE);
if (write_object)
@@ -3396,6 +3403,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
enum object_type type, const char *path, unsigned flags)
{
int ret;
+ const unsigned char *sha1_ce;
+ sha1_ce = flags & HASH_CE_HAS_SHA1 ? sha1 : NULL;
/*
* Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3406,7 +3415,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
else if (!S_ISREG(st->st_mode))
ret = index_pipe(sha1, fd, type, path, flags);
else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
- (path && would_convert_to_git(path)))
+ (path && would_convert_to_git_ce_sha1(path,sha1_ce)))
ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
flags);
else
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH v4 2/2] convert: ce_compare_data() checks for a sha1 of a path
2016-05-17 16:41 ` [PATCH v4 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-17 18:58 ` Junio C Hamano
2016-05-18 4:26 ` Torsten Bögershausen
0 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2016-05-17 18:58 UTC (permalink / raw)
To: tboegi; +Cc: git
tboegi@web.de writes:
> #define HASH_WRITE_OBJECT 1
> #define HASH_FORMAT_CHECK 2
> +#define HASH_CE_HAS_SHA1 4
How does one pronounce the words in this constant? Does it make a
listener understand what this constant means?
/*
* We need a comment around here to say what these two
* parameters mean. I am guessing that (1) if sha1 is not NULL,
* path is ignored and the function inspects if it has CR; (2)
* otherwise it checks the index entry at path and inspects if
* it has CR.
*/
>
> -static int has_cr_in_index(const char *path)
> +static int has_cr_in_index(const char *path, const unsigned char *sha1)
> {
This makes me seriously wonder if it is a good idea to modify this
function like this. (1) means this function is not about IN INDEX
at all!
Perhaps add a "static int blob_has_cr(const unsigned char *sha1)"
and call it from the real caller you added that wants to call this
butchered two-param version that has sha1 is a better solution?
> -static int crlf_to_git(const char *path, const char *src, size_t len,
> +static int crlf_to_git(const char *path, const unsigned char *sha1,
> + const char *src, size_t len,
> struct strbuf *buf,
> enum crlf_action crlf_action, enum safe_crlf checksafe)
> {
> @@ -260,7 +267,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
> * If the file in the index has any CR in it, do not convert.
> * This is the new safer autocrlf handling.
> */
> - if (has_cr_in_index(path))
> + if (has_cr_in_index(path, sha1))
I think this change is too ugly. The new "sha1" parameter is
telling us that "in order to see if the indexed version has CR, do
not look at the index, but look at the contents of this blob".
But wouldn't the result become more understandable if instead we
passed a new parameter "cr-state-for-conversion" that takes UNKNOWN,
HAS_CR, or NO_CR to this function?
In other words, what if, when the caller knows it does not care
what's in the index but wants to instead see if a different blob has
CR, we make it the caller's responsibility to figure it out by
calling blob_has_cr() before calling into this codepath and pass the
result of that check down? When cr-state-for-conversion is UNKNOWN,
this code knows that it needs to call has_cr_in_index(path) to
figure it out itself. Otherwise, it can and should use the
caller-supplied value without asking has_cr_in_index(path).
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v4 2/2] convert: ce_compare_data() checks for a sha1 of a path
2016-05-17 18:58 ` Junio C Hamano
@ 2016-05-18 4:26 ` Torsten Bögershausen
2016-05-18 15:10 ` Torsten Bögershausen
0 siblings, 1 reply; 53+ messages in thread
From: Torsten Bögershausen @ 2016-05-18 4:26 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On 05/17/2016 08:58 PM, Junio C Hamano wrote:
> tboegi@web.de writes:
>
>> #define HASH_WRITE_OBJECT 1
>> #define HASH_FORMAT_CHECK 2
>> +#define HASH_CE_HAS_SHA1 4
>
> How does one pronounce the words in this constant? Does it make a
> listener understand what this constant means?
How about
HASH_USE_SHA1_FROM_CE
or
HASH_CE_HAS_VALID_SHA1
>
>
> /*
> * We need a comment around here to say what these two
> * parameters mean. I am guessing that (1) if sha1 is not NULL,
> * path is ignored and the function inspects if it has CR; (2)
> * otherwise it checks the index entry at path and inspects if
> * it has CR.
> */
>>
>> -static int has_cr_in_index(const char *path)
>> +static int has_cr_in_index(const char *path, const unsigned char *sha1)
>> {
>
> This makes me seriously wonder if it is a good idea to modify this
> function like this. (1) means this function is not about IN INDEX
> at all!
>
> Perhaps add a "static int blob_has_cr(const unsigned char *sha1)"
> and call it from the real caller you added that wants to call this
> butchered two-param version that has sha1 is a better solution?
>
>> -static int crlf_to_git(const char *path, const char *src, size_t len,
>> +static int crlf_to_git(const char *path, const unsigned char *sha1,
>> + const char *src, size_t len,
>> struct strbuf *buf,
>> enum crlf_action crlf_action, enum safe_crlf checksafe)
>> {
>> @@ -260,7 +267,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
>> * If the file in the index has any CR in it, do not convert.
>> * This is the new safer autocrlf handling.
>> */
>> - if (has_cr_in_index(path))
>> + if (has_cr_in_index(path, sha1))
>
> I think this change is too ugly. The new "sha1" parameter is
> telling us that "in order to see if the indexed version has CR, do
> not look at the index, but look at the contents of this blob".
Thanks for some fresh eyes, I guess that
blob_has_cr(sha1)
would make most sense.
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v4 2/2] convert: ce_compare_data() checks for a sha1 of a path
2016-05-18 4:26 ` Torsten Bögershausen
@ 2016-05-18 15:10 ` Torsten Bögershausen
0 siblings, 0 replies; 53+ messages in thread
From: Torsten Bögershausen @ 2016-05-18 15:10 UTC (permalink / raw)
To: Torsten Bögershausen, Junio C Hamano; +Cc: git
On 18.05.16 06:26, Torsten Bögershausen wrote:
> On 05/17/2016 08:58 PM, Junio C Hamano wrote:
>> tboegi@web.de writes:
>>
>>> #define HASH_WRITE_OBJECT 1
>>> #define HASH_FORMAT_CHECK 2
>>> +#define HASH_CE_HAS_SHA1 4
>>
>> How does one pronounce the words in this constant? Does it make a
>> listener understand what this constant means?
> How about
> HASH_USE_SHA1_FROM_CE
> or
> HASH_CE_HAS_VALID_SHA1
or, before I send a new patch,
HASH_USE_SHA_NOT_PATH
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v5 0/2] CRLF: ce_compare_data() checks for a sha1 of a path
2016-05-17 4:08 ` Torsten Bögershausen
` (3 preceding siblings ...)
2016-05-17 16:41 ` [PATCH v4 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-19 14:21 ` tboegi
2016-05-19 23:10 ` Junio C Hamano
2016-05-19 14:21 ` [PATCH v5 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
2016-05-19 14:21 ` [PATCH v5 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
6 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-19 14:21 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
Break up the old 10/10 series about CLRF handling into smaller
series. This is a small bugfix, when merge.renormalize is used
with core.autocrlf (and no attributes are set).
Prepare the refactoring to use the streaming interface.
Changes since v4:
- Rename #define in cache.h into HASH_USE_SHA_NOT_PATH
- convert.c: Rename has_cr_in_index into blob_has_cr()
Better logic when sha1 != NULL,
Adjusted the commit message
Torsten Bögershausen (2):
read-cache: factor out get_sha1_from_index() helper
convert: ce_compare_data() checks for a sha1 of a path
cache.h | 4 ++++
convert.c | 34 ++++++++++++++++++++++------------
convert.h | 23 +++++++++++++++++++----
read-cache.c | 33 +++++++++++++++++++++------------
sha1_file.c | 17 +++++++++++++----
5 files changed, 79 insertions(+), 32 deletions(-)
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v5 0/2] CRLF: ce_compare_data() checks for a sha1 of a path
2016-05-19 14:21 ` [PATCH v5 0/2] CRLF: " tboegi
@ 2016-05-19 23:10 ` Junio C Hamano
0 siblings, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2016-05-19 23:10 UTC (permalink / raw)
To: tboegi; +Cc: git
tboegi@web.de writes:
> From: Torsten Bögershausen <tboegi@web.de>
>
> Break up the old 10/10 series about CLRF handling into smaller
> series. This is a small bugfix, when merge.renormalize is used
> with core.autocrlf (and no attributes are set).
Is it worth protecting the fix with a new test? Or does this flip
an existing "expect-failure" to "expect-success"?
> Prepare the refactoring to use the streaming interface.
> Changes since v4:
> - Rename #define in cache.h into HASH_USE_SHA_NOT_PATH
> - convert.c: Rename has_cr_in_index into blob_has_cr()
> Better logic when sha1 != NULL,
> Adjusted the commit message
>
> Torsten Bögershausen (2):
> read-cache: factor out get_sha1_from_index() helper
> convert: ce_compare_data() checks for a sha1 of a path
>
> cache.h | 4 ++++
> convert.c | 34 ++++++++++++++++++++++------------
> convert.h | 23 +++++++++++++++++++----
> read-cache.c | 33 +++++++++++++++++++++------------
> sha1_file.c | 17 +++++++++++++----
> 5 files changed, 79 insertions(+), 32 deletions(-)
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v5 1/2] read-cache: factor out get_sha1_from_index() helper
2016-05-17 4:08 ` Torsten Bögershausen
` (4 preceding siblings ...)
2016-05-19 14:21 ` [PATCH v5 0/2] CRLF: " tboegi
@ 2016-05-19 14:21 ` tboegi
2016-05-19 14:21 ` [PATCH v5 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
6 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-19 14:21 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
Factor out the retrieval of the sha1 for a given path in
read_blob_data_from_index() into the function get_sha1_from_index().
This will be used in the next commit, when convert.c can do the
analyze for "text=auto" without slurping the whole blob into memory
at once.
Add a wrapper definition get_sha1_from_cache().
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
cache.h | 3 +++
read-cache.c | 29 ++++++++++++++++++-----------
2 files changed, 21 insertions(+), 11 deletions(-)
diff --git a/cache.h b/cache.h
index 160f8e3..15a2a10 100644
--- a/cache.h
+++ b/cache.h
@@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
#define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
#define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
#define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path) get_sha1_from_index (&the_index, (path))
#endif
enum object_type {
@@ -1038,6 +1039,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
}
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
/*
* This internal function is only declared here for the benefit of
* lookup_replace_object(). Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..a3ef967 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2263,13 +2263,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,
void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
{
- int pos, len;
+ const unsigned char *sha1;
unsigned long sz;
enum object_type type;
void *data;
- len = strlen(path);
- pos = index_name_pos(istate, path, len);
+ sha1 = get_sha1_from_index(istate, path);
+ if (!sha1)
+ return NULL;
+ data = read_sha1_file(sha1, &type, &sz);
+ if (!data || type != OBJ_BLOB) {
+ free(data);
+ return NULL;
+ }
+ if (size)
+ *size = sz;
+ return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+ int pos = index_name_pos(istate, path, strlen(path));
if (pos < 0) {
/*
* We might be in the middle of a merge, in which
@@ -2285,14 +2299,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
}
if (pos < 0)
return NULL;
- data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
- if (!data || type != OBJ_BLOB) {
- free(data);
- return NULL;
- }
- if (size)
- *size = sz;
- return data;
+ return (istate->cache[pos]->sha1);
}
void stat_validity_clear(struct stat_validity *sv)
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v5 2/2] convert: ce_compare_data() checks for a sha1 of a path
2016-05-17 4:08 ` Torsten Bögershausen
` (5 preceding siblings ...)
2016-05-19 14:21 ` [PATCH v5 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
@ 2016-05-19 14:21 ` tboegi
2016-05-19 23:03 ` Junio C Hamano
6 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-19 14:21 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
To compare a file in working tree with the index, convert_to_git() is used,
the the result is hashed and the hash value compared with ce->sha1.
Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
are converted or not: When a CRLF had been in the index before, CRLF in
the working tree are not converted.
While in a merge, a file name in the working tree has different blobs
in the index with different hash values.
Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
the would_convert_crlf_at_commit() looks at the appropriate blob.
While at it, rename has_cr_in_index() into blob_has_cr()
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
cache.h | 1 +
convert.c | 34 ++++++++++++++++++++++------------
convert.h | 23 +++++++++++++++++++----
read-cache.c | 4 +++-
sha1_file.c | 17 +++++++++++++----
5 files changed, 58 insertions(+), 21 deletions(-)
diff --git a/cache.h b/cache.h
index 15a2a10..868599e 100644
--- a/cache.h
+++ b/cache.h
@@ -605,6 +605,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
#define HASH_WRITE_OBJECT 1
#define HASH_FORMAT_CHECK 2
+#define HASH_USE_SHA_NOT_PATH 4
extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
diff --git a/convert.c b/convert.c
index f524b8d..92ddfb1 100644
--- a/convert.c
+++ b/convert.c
@@ -217,21 +217,26 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
}
}
-static int has_cr_in_index(const char *path)
+static int blob_has_cr(const unsigned char *sha1)
{
unsigned long sz;
void *data;
- int has_cr;
-
- data = read_blob_data_from_cache(path, &sz);
+ int has_cr = 0;
+ enum object_type type;
+ if (!sha1)
+ return 0;
+ data = read_sha1_file(sha1, &type, &sz);
if (!data)
return 0;
- has_cr = memchr(data, '\r', sz) != NULL;
+ if (type == OBJ_BLOB)
+ has_cr = memchr(data, '\r', sz) != NULL;
+
free(data);
return has_cr;
}
-static int crlf_to_git(const char *path, const char *src, size_t len,
+static int crlf_to_git(const char *path, const unsigned char *sha1,
+ const char *src, size_t len,
struct strbuf *buf,
enum crlf_action crlf_action, enum safe_crlf checksafe)
{
@@ -260,7 +265,9 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
* If the file in the index has any CR in it, do not convert.
* This is the new safer autocrlf handling.
*/
- if (has_cr_in_index(path))
+ if (!sha1)
+ sha1 = get_sha1_from_cache(path);
+ if (blob_has_cr(sha1))
return 0;
}
}
@@ -852,8 +859,9 @@ const char *get_convert_attr_ascii(const char *path)
return "";
}
-int convert_to_git(const char *path, const char *src, size_t len,
- struct strbuf *dst, enum safe_crlf checksafe)
+int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+ const char *src, size_t len,
+ struct strbuf *dst, enum safe_crlf checksafe)
{
int ret = 0;
const char *filter = NULL;
@@ -874,7 +882,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
src = dst->buf;
len = dst->len;
}
- ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+ ret |= crlf_to_git(path, sha1, src, len, dst, ca.crlf_action, checksafe);
if (ret && dst) {
src = dst->buf;
len = dst->len;
@@ -882,7 +890,9 @@ int convert_to_git(const char *path, const char *src, size_t len,
return ret | ident_to_git(path, src, len, dst, ca.ident);
}
-void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
+void convert_to_git_filter_fd(const char *path,
+ const unsigned char *sha1,
+ int fd, struct strbuf *dst,
enum safe_crlf checksafe)
{
struct conv_attrs ca;
@@ -894,7 +904,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
die("%s: clean filter '%s' failed", path, ca.drv->name);
- crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+ crlf_to_git(path, sha1, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
}
diff --git a/convert.h b/convert.h
index ccf436b..1634d37 100644
--- a/convert.h
+++ b/convert.h
@@ -37,8 +37,16 @@ extern const char *get_wt_convert_stats_ascii(const char *path);
extern const char *get_convert_attr_ascii(const char *path);
/* returns 1 if *dst was used */
-extern int convert_to_git(const char *path, const char *src, size_t len,
- struct strbuf *dst, enum safe_crlf checksafe);
+extern int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+ const char *src, size_t len,
+ struct strbuf *dst, enum safe_crlf checksafe);
+
+static inline int convert_to_git(const char *path, const char *src, size_t len,
+ struct strbuf *dst, enum safe_crlf checksafe)
+{
+ return convert_to_git_ce_sha1(path, NULL, src, len, dst, checksafe);
+}
+
extern int convert_to_working_tree(const char *path, const char *src,
size_t len, struct strbuf *dst);
extern int renormalize_buffer(const char *path, const char *src, size_t len,
@@ -47,9 +55,16 @@ static inline int would_convert_to_git(const char *path)
{
return convert_to_git(path, NULL, 0, NULL, 0);
}
+static inline int would_convert_to_git_ce_sha1(const char *path,
+ const unsigned char *sha1)
+{
+ return convert_to_git_ce_sha1(path, sha1, NULL, 0, NULL, 0);
+}
+
/* Precondition: would_convert_to_git_filter_fd(path) == true */
-extern void convert_to_git_filter_fd(const char *path, int fd,
- struct strbuf *dst,
+extern void convert_to_git_filter_fd(const char *path,
+ const unsigned char *sha1,
+ int fd, struct strbuf *dst,
enum safe_crlf checksafe);
extern int would_convert_to_git_filter_fd(const char *path);
diff --git a/read-cache.c b/read-cache.c
index a3ef967..c109b6d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
if (fd >= 0) {
unsigned char sha1[20];
- if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+ unsigned flags = HASH_USE_SHA_NOT_PATH;
+ memcpy(sha1, ce->sha1, sizeof(sha1));
+ if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
match = hashcmp(sha1, ce->sha1);
/* index_fd() closed the file descriptor already */
}
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..1fdbfd3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
{
int ret, re_allocated = 0;
int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
if (!type)
type = OBJ_BLOB;
@@ -3284,8 +3285,11 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
*/
if ((type == OBJ_BLOB) && path) {
struct strbuf nbuf = STRBUF_INIT;
- if (convert_to_git(path, buf, size, &nbuf,
- write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+ if (convert_to_git_ce_sha1(path,
+ valid_sha1 ? sha1 : NULL,
+ buf, size, &nbuf,
+ write_object ? safe_crlf : SAFE_CRLF_FALSE)){
+
buf = strbuf_detach(&nbuf, &size);
re_allocated = 1;
}
@@ -3313,12 +3317,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
{
int ret;
const int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
struct strbuf sbuf = STRBUF_INIT;
assert(path);
assert(would_convert_to_git_filter_fd(path));
- convert_to_git_filter_fd(path, fd, &sbuf,
+ convert_to_git_filter_fd(path,
+ valid_sha1 ? sha1 : NULL,
+ fd, &sbuf,
write_object ? safe_crlf : SAFE_CRLF_FALSE);
if (write_object)
@@ -3396,6 +3403,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
enum object_type type, const char *path, unsigned flags)
{
int ret;
+ const unsigned char *sha1_ce;
+ sha1_ce = flags & HASH_USE_SHA_NOT_PATH ? sha1 : NULL;
/*
* Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3406,7 +3415,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
else if (!S_ISREG(st->st_mode))
ret = index_pipe(sha1, fd, type, path, flags);
else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
- (path && would_convert_to_git(path)))
+ (path && would_convert_to_git_ce_sha1(path,sha1_ce)))
ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
flags);
else
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH v5 2/2] convert: ce_compare_data() checks for a sha1 of a path
2016-05-19 14:21 ` [PATCH v5 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-19 23:03 ` Junio C Hamano
2016-05-20 17:12 ` [PATCH v6 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
` (3 more replies)
0 siblings, 4 replies; 53+ messages in thread
From: Junio C Hamano @ 2016-05-19 23:03 UTC (permalink / raw)
To: tboegi; +Cc: git
tboegi@web.de writes:
> +int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
> + const char *src, size_t len,
> + struct strbuf *dst, enum safe_crlf checksafe)
That's a strange name for the function, as "ce" and "sha1" gives no
hints what they are about.
If I understand correctly:
- convert_to_git() is about converting <src, len> into <dst>, and
"path" is not for reading the contents to be converted. It is
used to tell crlf_to_git() the index entry to pay attention to
when defeating the usual conversion rules due to strange contents
in the index (it also is used to report errors for safe-crlf
check).
- This one adds "sha1", and that is not about the contents to be
converted, either. Like "path", "sha1" is used to tell what blob
to check when disabling the usual conversion rules.
Does the above description help in coming up with a better
description for the ce/sha1 thing? The comment near the code that
uses them reads like so:
/*
* If the file in the index has any CR in it, do not convert.
* This is the new safer autocrlf handling.
*/
What is a good name to call the input to that logic? "This
function, in addition to convert_to_git(), takes an extra parameter,
that tells it an extra piece of information 'X'"; what is X?
At the same time, the parameter "sha1" needs to be renamed to
clarify what object it is and what purpose it serves. "sha1" alone
is an overly generic name, and it does not hint that it may not even
be given to the function, and that it doesn't have anything to do
with the contents <src, len> points at.
Note. Perhaps you wanted _ce_sha1 suffix to tell the readers
that it takes "an object name of the cache entry" that
further affects the conversion? If so the sha1 parameter
should be renamed to match (and make it clear to readers
that is what you meant).
The "sha1" is pretending to be the more important input for the
primary function of this function by being in early part of the
parameter list. This may need to be rethought; we probably should
have done so as part of your previous refactoring of this file.
convert_to_git() takes the data for <path> in <src, len> and gives
result in <dst>, so these four should be its first parameters. The
detail of the way the conversion works may be affected by additional
parameters, e.g. <checksafe> controls if extra warnings are given.
The <sha1> is to influence the conversion logic further to disable
the crlf-to-git conversion by inspecting a blob, and it tells the
function that the blob comes from an unusual place (as opposed to
the index entry at <path>). So it should sit next to checksafe as
an auxiliary input parameter.
The logic implemented by the patch looks correct, but I'd have to
say that the result is an unmaintainable mess. Right now, I can see
what is going on in the new code. I am sure that in 6 months, with
poorly named helper functions and parameters, I will have hard time
recalling how the new code works.
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v6 1/2] read-cache: factor out get_sha1_from_index() helper
2016-05-19 23:03 ` Junio C Hamano
@ 2016-05-20 17:12 ` tboegi
2016-05-20 17:12 ` [PATCH v6 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
` (2 subsequent siblings)
3 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-20 17:12 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
Factor out the retrieval of the sha1 for a given path in
read_blob_data_from_index() into the function get_sha1_from_index().
This will be used in the next commit, when convert.c can do the
analyze for "text=auto" without slurping the whole blob into memory
at once.
Add a wrapper definition get_sha1_from_cache().
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
cache.h | 3 +++
read-cache.c | 29 ++++++++++++++++++-----------
2 files changed, 21 insertions(+), 11 deletions(-)
diff --git a/cache.h b/cache.h
index 160f8e3..15a2a10 100644
--- a/cache.h
+++ b/cache.h
@@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
#define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
#define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
#define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path) get_sha1_from_index (&the_index, (path))
#endif
enum object_type {
@@ -1038,6 +1039,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
}
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
/*
* This internal function is only declared here for the benefit of
* lookup_replace_object(). Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..a3ef967 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2263,13 +2263,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,
void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
{
- int pos, len;
+ const unsigned char *sha1;
unsigned long sz;
enum object_type type;
void *data;
- len = strlen(path);
- pos = index_name_pos(istate, path, len);
+ sha1 = get_sha1_from_index(istate, path);
+ if (!sha1)
+ return NULL;
+ data = read_sha1_file(sha1, &type, &sz);
+ if (!data || type != OBJ_BLOB) {
+ free(data);
+ return NULL;
+ }
+ if (size)
+ *size = sz;
+ return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+ int pos = index_name_pos(istate, path, strlen(path));
if (pos < 0) {
/*
* We might be in the middle of a merge, in which
@@ -2285,14 +2299,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
}
if (pos < 0)
return NULL;
- data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
- if (!data || type != OBJ_BLOB) {
- free(data);
- return NULL;
- }
- if (size)
- *size = sz;
- return data;
+ return (istate->cache[pos]->sha1);
}
void stat_validity_clear(struct stat_validity *sv)
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v6 2/2] convert: ce_compare_data() checks for a sha1 of a path
2016-05-19 23:03 ` Junio C Hamano
2016-05-20 17:12 ` [PATCH v6 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
@ 2016-05-20 17:12 ` tboegi
2016-05-20 17:46 ` Junio C Hamano
2016-05-21 10:01 ` [PATCH v7 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
2016-05-21 10:01 ` [PATCH v7 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
3 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-20 17:12 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
To compare a file in working tree with the index, convert_to_git() is used,
the result is hashed and the hash value compared with ce->sha1.
Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
are converted or not: When a CRLF had been in the index before, CRLF in
the working tree are not converted.
While in a merge, a file name in the working tree has different blobs
in the index with different hash values.
Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
the would_convert_crlf_at_commit() looks at the appropriate blob.
Forward sha1 from ce_compare_data() into convert_to_git().
All other callers use NULL, and the sha1 it is determined from path using
get_sha1_from_cache(path), this is the same handling as before.
Re-order the arguments for convert_to_git() according to their importance:
`src`, `len` and `dst` are the place in memory, where the conversion is done
`path` is the file name to look up the attributes.
`sha1` is needed by the "new safer autocrlf handling".
`checksafe` determines, if a warning is printed or an error is raised.
In the same spirit, forward the sha1 into would_convert_to_git().
While at it, rename has_cr_in_index() into blob_has_cr()
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Changes sinve v6:
decrease the messiness with 12 %
convert_to_git() has a re-ordered parameter list.
Describe whats going on better in the commit msg.
Cleanup: 0 -> SAFE_CRLF_FALSE at some places
---
builtin/apply.c | 3 ++-
builtin/blame.c | 2 +-
cache.h | 1 +
combine-diff.c | 4 +++-
convert.c | 38 +++++++++++++++++++++++++-------------
convert.h | 20 ++++++++++++++------
diff.c | 3 ++-
dir.c | 2 +-
read-cache.c | 4 +++-
sha1_file.c | 17 +++++++++++++----
10 files changed, 65 insertions(+), 29 deletions(-)
diff --git a/builtin/apply.c b/builtin/apply.c
index 8e4da2e..c01654a 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -2140,7 +2140,8 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
case S_IFREG:
if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
return error(_("unable to open or read %s"), path);
- convert_to_git(path, buf->buf, buf->len, buf, 0);
+ convert_to_git(buf->buf, buf->len, buf,
+ path, NULL, SAFE_CRLF_FALSE);
return 0;
default:
return -1;
diff --git a/builtin/blame.c b/builtin/blame.c
index 21f42b0..4a01e20 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2377,7 +2377,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
if (strbuf_read(&buf, 0, 0) < 0)
die_errno("failed to read from stdin");
}
- convert_to_git(path, buf.buf, buf.len, &buf, 0);
+ convert_to_git(buf.buf, buf.len, &buf, path, NULL, SAFE_CRLF_FALSE);
origin->file.ptr = buf.buf;
origin->file.size = buf.len;
pretend_sha1_file(buf.buf, buf.len, OBJ_BLOB, origin->blob_sha1);
diff --git a/cache.h b/cache.h
index 15a2a10..868599e 100644
--- a/cache.h
+++ b/cache.h
@@ -605,6 +605,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
#define HASH_WRITE_OBJECT 1
#define HASH_FORMAT_CHECK 2
+#define HASH_USE_SHA_NOT_PATH 4
extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
diff --git a/combine-diff.c b/combine-diff.c
index 0e1d4b0..cac4c81 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1053,7 +1053,9 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
if (is_file) {
struct strbuf buf = STRBUF_INIT;
- if (convert_to_git(elem->path, result, len, &buf, safe_crlf)) {
+ if (convert_to_git(result, len, &buf,
+ elem->path, NULL,
+ safe_crlf)) {
free(result);
result = strbuf_detach(&buf, &len);
result_size = len;
diff --git a/convert.c b/convert.c
index f524b8d..a58bb26 100644
--- a/convert.c
+++ b/convert.c
@@ -217,21 +217,26 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
}
}
-static int has_cr_in_index(const char *path)
+static int blob_has_cr(const unsigned char *sha1)
{
unsigned long sz;
void *data;
- int has_cr;
-
- data = read_blob_data_from_cache(path, &sz);
+ int has_cr = 0;
+ enum object_type type;
+ if (!sha1)
+ return 0;
+ data = read_sha1_file(sha1, &type, &sz);
if (!data)
return 0;
- has_cr = memchr(data, '\r', sz) != NULL;
+ if (type == OBJ_BLOB)
+ has_cr = memchr(data, '\r', sz) != NULL;
+
free(data);
return has_cr;
}
-static int crlf_to_git(const char *path, const char *src, size_t len,
+static int crlf_to_git(const char *path, const unsigned char *sha1,
+ const char *src, size_t len,
struct strbuf *buf,
enum crlf_action crlf_action, enum safe_crlf checksafe)
{
@@ -260,7 +265,9 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
* If the file in the index has any CR in it, do not convert.
* This is the new safer autocrlf handling.
*/
- if (has_cr_in_index(path))
+ if (!sha1)
+ sha1 = get_sha1_from_cache(path);
+ if (blob_has_cr(sha1))
return 0;
}
}
@@ -852,8 +859,10 @@ const char *get_convert_attr_ascii(const char *path)
return "";
}
-int convert_to_git(const char *path, const char *src, size_t len,
- struct strbuf *dst, enum safe_crlf checksafe)
+int convert_to_git(const char *src, size_t len,
+ struct strbuf *dst,
+ const char *path, const unsigned char *sha1,
+ enum safe_crlf checksafe)
{
int ret = 0;
const char *filter = NULL;
@@ -874,7 +883,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
src = dst->buf;
len = dst->len;
}
- ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+ ret |= crlf_to_git(path, sha1, src, len, dst, ca.crlf_action, checksafe);
if (ret && dst) {
src = dst->buf;
len = dst->len;
@@ -882,7 +891,9 @@ int convert_to_git(const char *path, const char *src, size_t len,
return ret | ident_to_git(path, src, len, dst, ca.ident);
}
-void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
+void convert_to_git_filter_fd(const char *path,
+ const unsigned char *sha1,
+ int fd, struct strbuf *dst,
enum safe_crlf checksafe)
{
struct conv_attrs ca;
@@ -894,7 +905,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
die("%s: clean filter '%s' failed", path, ca.drv->name);
- crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+ crlf_to_git(path, sha1, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
}
@@ -949,7 +960,8 @@ int renormalize_buffer(const char *path, const char *src, size_t len, struct str
src = dst->buf;
len = dst->len;
}
- return ret | convert_to_git(path, src, len, dst, SAFE_CRLF_FALSE);
+ ret |= convert_to_git(src, len, dst, path, NULL, SAFE_CRLF_FALSE);
+ return ret;
}
/*****************************************************************
diff --git a/convert.h b/convert.h
index ccf436b..12fe767 100644
--- a/convert.h
+++ b/convert.h
@@ -37,19 +37,27 @@ extern const char *get_wt_convert_stats_ascii(const char *path);
extern const char *get_convert_attr_ascii(const char *path);
/* returns 1 if *dst was used */
-extern int convert_to_git(const char *path, const char *src, size_t len,
- struct strbuf *dst, enum safe_crlf checksafe);
+extern int convert_to_git(const char *src, size_t len,
+ struct strbuf *dst,
+ const char *path, const unsigned char *sha1,
+ enum safe_crlf checksafe);
+
extern int convert_to_working_tree(const char *path, const char *src,
size_t len, struct strbuf *dst);
extern int renormalize_buffer(const char *path, const char *src, size_t len,
struct strbuf *dst);
-static inline int would_convert_to_git(const char *path)
+
+static inline int would_convert_to_git(const char *path,
+ const unsigned char *sha1)
{
- return convert_to_git(path, NULL, 0, NULL, 0);
+ return convert_to_git(NULL, 0, NULL, path, sha1, SAFE_CRLF_FALSE);
}
+
+
/* Precondition: would_convert_to_git_filter_fd(path) == true */
-extern void convert_to_git_filter_fd(const char *path, int fd,
- struct strbuf *dst,
+extern void convert_to_git_filter_fd(const char *path,
+ const unsigned char *sha1,
+ int fd, struct strbuf *dst,
enum safe_crlf checksafe);
extern int would_convert_to_git_filter_fd(const char *path);
diff --git a/diff.c b/diff.c
index d3734d3..9c00973 100644
--- a/diff.c
+++ b/diff.c
@@ -2810,7 +2810,8 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
/*
* Convert from working tree format to canonical git format
*/
- if (convert_to_git(s->path, s->data, s->size, &buf, crlf_warn)) {
+ if (convert_to_git(s->data, s->size, &buf, s->path, NULL,
+ crlf_warn)) {
size_t size = 0;
munmap(s->data, s->size);
s->should_munmap = 0;
diff --git a/dir.c b/dir.c
index 656f272..5ac379d 100644
--- a/dir.c
+++ b/dir.c
@@ -713,7 +713,7 @@ static int add_excludes(const char *fname, const char *base, int baselen,
(pos = cache_name_pos(fname, strlen(fname))) >= 0 &&
!ce_stage(active_cache[pos]) &&
ce_uptodate(active_cache[pos]) &&
- !would_convert_to_git(fname))
+ !would_convert_to_git(fname, NULL))
hashcpy(sha1_stat->sha1, active_cache[pos]->sha1);
else
hash_sha1_file(buf, size, "blob", sha1_stat->sha1);
diff --git a/read-cache.c b/read-cache.c
index a3ef967..c109b6d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
if (fd >= 0) {
unsigned char sha1[20];
- if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+ unsigned flags = HASH_USE_SHA_NOT_PATH;
+ memcpy(sha1, ce->sha1, sizeof(sha1));
+ if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
match = hashcmp(sha1, ce->sha1);
/* index_fd() closed the file descriptor already */
}
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..48906b0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
{
int ret, re_allocated = 0;
int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
if (!type)
type = OBJ_BLOB;
@@ -3284,8 +3285,11 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
*/
if ((type == OBJ_BLOB) && path) {
struct strbuf nbuf = STRBUF_INIT;
- if (convert_to_git(path, buf, size, &nbuf,
- write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+ if (convert_to_git(
+ buf, size, &nbuf,path,
+ valid_sha1 ? sha1 : NULL,
+ write_object ? safe_crlf : SAFE_CRLF_FALSE)){
+
buf = strbuf_detach(&nbuf, &size);
re_allocated = 1;
}
@@ -3313,12 +3317,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
{
int ret;
const int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
struct strbuf sbuf = STRBUF_INIT;
assert(path);
assert(would_convert_to_git_filter_fd(path));
- convert_to_git_filter_fd(path, fd, &sbuf,
+ convert_to_git_filter_fd(path,
+ valid_sha1 ? sha1 : NULL,
+ fd, &sbuf,
write_object ? safe_crlf : SAFE_CRLF_FALSE);
if (write_object)
@@ -3396,6 +3403,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
enum object_type type, const char *path, unsigned flags)
{
int ret;
+ const unsigned char *sha1_ce;
+ sha1_ce = flags & HASH_USE_SHA_NOT_PATH ? sha1 : NULL;
/*
* Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3406,7 +3415,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
else if (!S_ISREG(st->st_mode))
ret = index_pipe(sha1, fd, type, path, flags);
else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
- (path && would_convert_to_git(path)))
+ (path && would_convert_to_git(path,sha1_ce)))
ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
flags);
else
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH v6 2/2] convert: ce_compare_data() checks for a sha1 of a path
2016-05-20 17:12 ` [PATCH v6 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-20 17:46 ` Junio C Hamano
0 siblings, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2016-05-20 17:46 UTC (permalink / raw)
To: tboegi; +Cc: git
tboegi@web.de writes:
> From: Torsten Bögershausen <tboegi@web.de>
>
> To compare a file in working tree with the index, convert_to_git() is used,
> the result is hashed and the hash value compared with ce->sha1.
>
> Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
> are converted or not: When a CRLF had been in the index before, CRLF in
> the working tree are not converted.
>
> While in a merge, a file name in the working tree has different blobs
> in the index with different hash values.
> Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
> the would_convert_crlf_at_commit() looks at the appropriate blob.
>
> Forward sha1 from ce_compare_data() into convert_to_git().
> All other callers use NULL, and the sha1 it is determined from path using
> get_sha1_from_cache(path), this is the same handling as before.
>
> Re-order the arguments for convert_to_git() according to their importance:
> `src`, `len` and `dst` are the place in memory, where the conversion is done
> `path` is the file name to look up the attributes.
> `sha1` is needed by the "new safer autocrlf handling".
> `checksafe` determines, if a warning is printed or an error is raised.
>
> In the same spirit, forward the sha1 into would_convert_to_git().
>
> While at it, rename has_cr_in_index() into blob_has_cr()
>
> Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> Changes sinve v6:
> decrease the messiness with 12 %
What does that mean????
> convert_to_git() has a re-ordered parameter list.
That is not what I suggested, though. <path, src, len> being
the first three would be fine, and that would be in line with
helpers ${frotz}_to_git() that are used from there.
The primary thing that made me worried was a new parameter with a
bland name "sha1" whose purpose is unclear was added near the
beginning, leading readers to confusion.
Whether you keep <path> at the beginning of move it to later
together with <sha1>, helpers like crlf_to_git() need to be updated
to match. E.g. I would say that this
> - ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
> + ret |= crlf_to_git(path, sha1, src, len, dst, ca.crlf_action, checksafe);
would want to be ordered more like this:
ret |= crlf_to_git(path, src, len, dst,
ca.crlf_action, checksafe, index_blob_sha1);
if you choose to keep the first four intact for convert_to_git(),
that is.
> Describe whats going on better in the commit msg.
The suggestion to rename the parameter was to allow readers of the
code to immediately know what kind of SHA1 it is. They cannot
afford to run "git blame" every time to find the commit to read the
commit log message; for a public function like convert_to_git(),
in-code comment is also necessary.
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v7 1/2] read-cache: factor out get_sha1_from_index() helper
2016-05-19 23:03 ` Junio C Hamano
2016-05-20 17:12 ` [PATCH v6 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
2016-05-20 17:12 ` [PATCH v6 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-21 10:01 ` tboegi
2016-05-21 10:01 ` [PATCH v7 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
3 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-21 10:01 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
Factor out the retrieval of the sha1 for a given path in
read_blob_data_from_index() into the function get_sha1_from_index().
This will be used in the next commit, when convert.c can do the
analyze for "text=auto" without slurping the whole blob into memory
at once.
Add a wrapper definition get_sha1_from_cache().
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
cache.h | 3 +++
read-cache.c | 29 ++++++++++++++++++-----------
2 files changed, 21 insertions(+), 11 deletions(-)
diff --git a/cache.h b/cache.h
index 160f8e3..15a2a10 100644
--- a/cache.h
+++ b/cache.h
@@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
#define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
#define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
#define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path) get_sha1_from_index (&the_index, (path))
#endif
enum object_type {
@@ -1038,6 +1039,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
}
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
/*
* This internal function is only declared here for the benefit of
* lookup_replace_object(). Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..a3ef967 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2263,13 +2263,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,
void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
{
- int pos, len;
+ const unsigned char *sha1;
unsigned long sz;
enum object_type type;
void *data;
- len = strlen(path);
- pos = index_name_pos(istate, path, len);
+ sha1 = get_sha1_from_index(istate, path);
+ if (!sha1)
+ return NULL;
+ data = read_sha1_file(sha1, &type, &sz);
+ if (!data || type != OBJ_BLOB) {
+ free(data);
+ return NULL;
+ }
+ if (size)
+ *size = sz;
+ return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+ int pos = index_name_pos(istate, path, strlen(path));
if (pos < 0) {
/*
* We might be in the middle of a merge, in which
@@ -2285,14 +2299,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
}
if (pos < 0)
return NULL;
- data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
- if (!data || type != OBJ_BLOB) {
- free(data);
- return NULL;
- }
- if (size)
- *size = sz;
- return data;
+ return (istate->cache[pos]->sha1);
}
void stat_validity_clear(struct stat_validity *sv)
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v7 2/2] convert: ce_compare_data() checks for a sha1 of a path
2016-05-19 23:03 ` Junio C Hamano
` (2 preceding siblings ...)
2016-05-21 10:01 ` [PATCH v7 1/2] read-cache: factor out get_sha1_from_index() helper tboegi
@ 2016-05-21 10:01 ` tboegi
2016-05-24 18:36 ` Junio C Hamano
3 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-21 10:01 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
To compare a file in working tree with the index, convert_to_git() is used,
the result is hashed and the hash value compared with ce->sha1.
Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
are converted or not.
The "new safer autocrlf handling" checks if CRLF had been in the index before,
and if, the CRLF in the working tree are not converted.
While in a merge, a file name in the working tree has different blobs
in the index with different hash values.
Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
the would_convert_crlf_at_commit() looks at the appropriate blob.
Add a new parameter index_blob_sha1 to convert_to_git(), and forward the
sha1 from ce_compare_data() into convert_to_git(). Other callers use NULL
for index_blob_sha1, and the sha1 is determined from path
using get_sha1_from_cache(path). This is the same handling as before.
In the same spirit, forward the sha1 into would_convert_to_git().
While at it, rename has_cr_in_index() into blob_has_cr() and replace
0 with SAFE_CRLF_FALSE.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
V6 went into the wrong direction.
V5 -> V7: Adds parameter index_blob_sha1 to (convert_to_git() and would_convert_to_git().
builtin/apply.c | 3 ++-
builtin/blame.c | 2 +-
cache.h | 1 +
combine-diff.c | 3 ++-
convert.c | 34 ++++++++++++++++++++++------------
convert.h | 15 +++++++++++----
diff.c | 3 ++-
dir.c | 2 +-
read-cache.c | 4 +++-
sha1_file.c | 12 +++++++++---
10 files changed, 54 insertions(+), 25 deletions(-)
diff --git a/builtin/apply.c b/builtin/apply.c
index 8e4da2e..0cf9a0a 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -2140,7 +2140,8 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
case S_IFREG:
if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
return error(_("unable to open or read %s"), path);
- convert_to_git(path, buf->buf, buf->len, buf, 0);
+ convert_to_git(path, buf->buf, buf->len, buf,
+ SAFE_CRLF_FALSE, NULL);
return 0;
default:
return -1;
diff --git a/builtin/blame.c b/builtin/blame.c
index 21f42b0..1c523b6 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2377,7 +2377,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
if (strbuf_read(&buf, 0, 0) < 0)
die_errno("failed to read from stdin");
}
- convert_to_git(path, buf.buf, buf.len, &buf, 0);
+ convert_to_git(path, buf.buf, buf.len, &buf, SAFE_CRLF_FALSE, NULL);
origin->file.ptr = buf.buf;
origin->file.size = buf.len;
pretend_sha1_file(buf.buf, buf.len, OBJ_BLOB, origin->blob_sha1);
diff --git a/cache.h b/cache.h
index 15a2a10..868599e 100644
--- a/cache.h
+++ b/cache.h
@@ -605,6 +605,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
#define HASH_WRITE_OBJECT 1
#define HASH_FORMAT_CHECK 2
+#define HASH_USE_SHA_NOT_PATH 4
extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
diff --git a/combine-diff.c b/combine-diff.c
index 0e1d4b0..c4fa884 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1053,7 +1053,8 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
if (is_file) {
struct strbuf buf = STRBUF_INIT;
- if (convert_to_git(elem->path, result, len, &buf, safe_crlf)) {
+ if (convert_to_git(elem->path, result, len,
+ &buf, safe_crlf, NULL)) {
free(result);
result = strbuf_detach(&buf, &len);
result_size = len;
diff --git a/convert.c b/convert.c
index f524b8d..f0eb4ed 100644
--- a/convert.c
+++ b/convert.c
@@ -217,23 +217,28 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
}
}
-static int has_cr_in_index(const char *path)
+static int blob_has_cr(const unsigned char *index_blob_sha1)
{
unsigned long sz;
void *data;
- int has_cr;
-
- data = read_blob_data_from_cache(path, &sz);
+ int has_cr = 0;
+ enum object_type type;
+ if (!index_blob_sha1)
+ return 0;
+ data = read_sha1_file(index_blob_sha1, &type, &sz);
if (!data)
return 0;
- has_cr = memchr(data, '\r', sz) != NULL;
+ if (type == OBJ_BLOB)
+ has_cr = memchr(data, '\r', sz) != NULL;
+
free(data);
return has_cr;
}
static int crlf_to_git(const char *path, const char *src, size_t len,
struct strbuf *buf,
- enum crlf_action crlf_action, enum safe_crlf checksafe)
+ enum crlf_action crlf_action, enum safe_crlf checksafe,
+ const unsigned char *index_blob_sha1)
{
struct text_stat stats;
char *dst;
@@ -260,7 +265,9 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
* If the file in the index has any CR in it, do not convert.
* This is the new safer autocrlf handling.
*/
- if (has_cr_in_index(path))
+ if (!index_blob_sha1)
+ index_blob_sha1 = get_sha1_from_cache(path);
+ if (blob_has_cr(index_blob_sha1))
return 0;
}
}
@@ -853,7 +860,8 @@ const char *get_convert_attr_ascii(const char *path)
}
int convert_to_git(const char *path, const char *src, size_t len,
- struct strbuf *dst, enum safe_crlf checksafe)
+ struct strbuf *dst, enum safe_crlf checksafe,
+ const unsigned char *index_blob_sha1)
{
int ret = 0;
const char *filter = NULL;
@@ -874,7 +882,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
src = dst->buf;
len = dst->len;
}
- ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+ ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe, index_blob_sha1);
if (ret && dst) {
src = dst->buf;
len = dst->len;
@@ -883,7 +891,8 @@ int convert_to_git(const char *path, const char *src, size_t len,
}
void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
- enum safe_crlf checksafe)
+ enum safe_crlf checksafe,
+ const unsigned char *index_blob_sha1)
{
struct conv_attrs ca;
convert_attrs(&ca, path);
@@ -894,7 +903,8 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
die("%s: clean filter '%s' failed", path, ca.drv->name);
- crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+ crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action,
+ checksafe, index_blob_sha1);
ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
}
@@ -949,7 +959,7 @@ int renormalize_buffer(const char *path, const char *src, size_t len, struct str
src = dst->buf;
len = dst->len;
}
- return ret | convert_to_git(path, src, len, dst, SAFE_CRLF_FALSE);
+ return ret | convert_to_git(path, src, len, dst, SAFE_CRLF_FALSE, NULL);
}
/*****************************************************************
diff --git a/convert.h b/convert.h
index ccf436b..60c46b8 100644
--- a/convert.h
+++ b/convert.h
@@ -38,19 +38,26 @@ extern const char *get_convert_attr_ascii(const char *path);
/* returns 1 if *dst was used */
extern int convert_to_git(const char *path, const char *src, size_t len,
- struct strbuf *dst, enum safe_crlf checksafe);
+ struct strbuf *dst, enum safe_crlf checksafe,
+ const unsigned char *index_blob_sha1);
+
extern int convert_to_working_tree(const char *path, const char *src,
size_t len, struct strbuf *dst);
extern int renormalize_buffer(const char *path, const char *src, size_t len,
struct strbuf *dst);
-static inline int would_convert_to_git(const char *path)
+static inline int would_convert_to_git(const char *path,
+ const unsigned char *index_blob_sha1)
{
- return convert_to_git(path, NULL, 0, NULL, 0);
+ return convert_to_git(path, NULL, 0, NULL, SAFE_CRLF_FALSE,
+ index_blob_sha1);
}
+
/* Precondition: would_convert_to_git_filter_fd(path) == true */
extern void convert_to_git_filter_fd(const char *path, int fd,
struct strbuf *dst,
- enum safe_crlf checksafe);
+ enum safe_crlf checksafe,
+ const unsigned char *index_blob_sha1);
+
extern int would_convert_to_git_filter_fd(const char *path);
/*****************************************************************
diff --git a/diff.c b/diff.c
index d3734d3..a8308e0 100644
--- a/diff.c
+++ b/diff.c
@@ -2810,7 +2810,8 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
/*
* Convert from working tree format to canonical git format
*/
- if (convert_to_git(s->path, s->data, s->size, &buf, crlf_warn)) {
+ if (convert_to_git(s->path, s->data, s->size, &buf,
+ crlf_warn, NULL)) {
size_t size = 0;
munmap(s->data, s->size);
s->should_munmap = 0;
diff --git a/dir.c b/dir.c
index 656f272..5ac379d 100644
--- a/dir.c
+++ b/dir.c
@@ -713,7 +713,7 @@ static int add_excludes(const char *fname, const char *base, int baselen,
(pos = cache_name_pos(fname, strlen(fname))) >= 0 &&
!ce_stage(active_cache[pos]) &&
ce_uptodate(active_cache[pos]) &&
- !would_convert_to_git(fname))
+ !would_convert_to_git(fname, NULL))
hashcpy(sha1_stat->sha1, active_cache[pos]->sha1);
else
hash_sha1_file(buf, size, "blob", sha1_stat->sha1);
diff --git a/read-cache.c b/read-cache.c
index a3ef967..c109b6d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
if (fd >= 0) {
unsigned char sha1[20];
- if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+ unsigned flags = HASH_USE_SHA_NOT_PATH;
+ memcpy(sha1, ce->sha1, sizeof(sha1));
+ if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
match = hashcmp(sha1, ce->sha1);
/* index_fd() closed the file descriptor already */
}
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..52e5c6f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
{
int ret, re_allocated = 0;
int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
if (!type)
type = OBJ_BLOB;
@@ -3285,7 +3286,8 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
if ((type == OBJ_BLOB) && path) {
struct strbuf nbuf = STRBUF_INIT;
if (convert_to_git(path, buf, size, &nbuf,
- write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+ write_object ? safe_crlf : SAFE_CRLF_FALSE,
+ valid_sha1 ? sha1 : NULL)) {
buf = strbuf_detach(&nbuf, &size);
re_allocated = 1;
}
@@ -3313,13 +3315,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
{
int ret;
const int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
struct strbuf sbuf = STRBUF_INIT;
assert(path);
assert(would_convert_to_git_filter_fd(path));
convert_to_git_filter_fd(path, fd, &sbuf,
- write_object ? safe_crlf : SAFE_CRLF_FALSE);
+ write_object ? safe_crlf : SAFE_CRLF_FALSE,
+ valid_sha1 ? sha1 : NULL);
if (write_object)
ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
@@ -3396,6 +3400,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
enum object_type type, const char *path, unsigned flags)
{
int ret;
+ const unsigned char *sha1_ce;
+ sha1_ce = flags & HASH_USE_SHA_NOT_PATH ? sha1 : NULL;
/*
* Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3406,7 +3412,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
else if (!S_ISREG(st->st_mode))
ret = index_pipe(sha1, fd, type, path, flags);
else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
- (path && would_convert_to_git(path)))
+ (path && would_convert_to_git(path,sha1_ce)))
ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
flags);
else
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH v7 2/2] convert: ce_compare_data() checks for a sha1 of a path
2016-05-21 10:01 ` [PATCH v7 2/2] convert: ce_compare_data() checks for a sha1 of a path tboegi
@ 2016-05-24 18:36 ` Junio C Hamano
0 siblings, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2016-05-24 18:36 UTC (permalink / raw)
To: tboegi; +Cc: git
tboegi@web.de writes:
> From: Torsten Bögershausen <tboegi@web.de>
>
> To compare a file in working tree with the index, convert_to_git() is used,
> the result is hashed and the hash value compared with ce->sha1.
>
> Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
> are converted or not.
> The "new safer autocrlf handling" checks if CRLF had been in the index before,
> and if, the CRLF in the working tree are not converted.
>
> While in a merge, a file name in the working tree has different blobs
> in the index with different hash values.
> Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
> the would_convert_crlf_at_commit() looks at the appropriate blob.
>
> Add a new parameter index_blob_sha1 to convert_to_git(), and forward the
> sha1 from ce_compare_data() into convert_to_git(). Other callers use NULL
> for index_blob_sha1, and the sha1 is determined from path
> using get_sha1_from_cache(path). This is the same handling as before.
>
> In the same spirit, forward the sha1 into would_convert_to_git().
>
> While at it, rename has_cr_in_index() into blob_has_cr() and replace
> 0 with SAFE_CRLF_FALSE.
>
> Signed-off-by: Torsten Bögershausen <tboegi@web.de>
> ---
Assuming that it is sensible to pass an extra "no, no, do not look
at the index entry at path, but look at this blob to see if the CRLF
conversion must be disabled" parameter all the down the callchain,
this round looks good.
I really hate to say this this late in the reroll cycle, but this
part of the description makes me wonder:
> While in a merge, a file name in the working tree has
> different blobs in the index with different hash values.
> Forwarding ce->sha1 from ce_compare_data() into crlf_to_git()
> makes sure the would_convert_crlf_at_commit() looks at the
> appropriate blob.
When we need content-level merges with help from the end user, we
would need to "convert-to-git" the result of conflict resolution by
the user left in the working tree, and that convert-to-git needs to
take our original contents into account, i.e. "did we have CR in the
blob already? if so, disable the CRLF thing".
But we would always have "our" original contents at stage #2 in the
index in such a case, and would_convert_to_git() eventually calls
into read_blob_data_from_cache(), which knows to read from stage #2
Even if we were in a renaming merge conflict, where they renamed
file F to G while we kept file F as file F, the conflicted working
tree file will be made in path G, and stage #2 of the index for path
G would have our original contents we had at path F.
So it is not clear why this "no, no, look at this blob instead" is
necessary in the first place. What problem does this solve? Is
this a fix for something that can be easily demonstrated with a new
test case?
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v3 1/1] convert: ce_compare_data() checks for a sha1 of a path
2016-05-15 22:14 ` Junio C Hamano
2016-05-16 15:51 ` [PATCH v3 0/1] CRLF-Handling: bug fix around ce_compare_data() tboegi
@ 2016-05-16 15:51 ` tboegi
2016-05-30 17:00 ` [PATCH v1 0/1] t6038-merge-text-auto.sh tboegi
` (5 subsequent siblings)
7 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-16 15:51 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
To compare a file in working tree with the index, convert_to_git() is used,
the the result is hashed and the hash value compared with ce->sha1.
Deep down would_convert_crlf_at_commit() is invoked, to check if CRLF
are converted or not: When a CRLF had been in the index before, CRLF in
the working tree are not converted.
While in a merge, a file name in the working tree has different blobs
in the index with different hash values.
Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
the would_convert_crlf_at_commit() looks at the appropriate blob.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
cache.h | 1 +
convert.c | 35 +++++++++++++++++++++--------------
convert.h | 23 +++++++++++++++++++----
read-cache.c | 4 +++-
sha1_file.c | 17 +++++++++++++----
5 files changed, 57 insertions(+), 23 deletions(-)
diff --git a/cache.h b/cache.h
index 160f8e3..5b25462 100644
--- a/cache.h
+++ b/cache.h
@@ -604,6 +604,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
#define HASH_WRITE_OBJECT 1
#define HASH_FORMAT_CHECK 2
+#define HASH_CE_HAS_SHA1 4
extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
diff --git a/convert.c b/convert.c
index f524b8d..dc86899 100644
--- a/convert.c
+++ b/convert.c
@@ -217,21 +217,25 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
}
}
-static int has_cr_in_index(const char *path)
+static int has_cr_in_index(const char *path, const unsigned char *sha1)
{
unsigned long sz;
void *data;
- int has_cr;
-
- data = read_blob_data_from_cache(path, &sz);
- if (!data)
- return 0;
- has_cr = memchr(data, '\r', sz) != NULL;
+ int has_cr = 0;
+ enum object_type type = OBJ_BLOB;
+ if (sha1)
+ data = read_sha1_file(sha1, &type, &sz);
+ else
+ data = read_blob_data_from_cache(path, &sz);
+
+ if (data)
+ has_cr = memchr(data, '\r', sz) != NULL;
free(data);
return has_cr;
}
-static int crlf_to_git(const char *path, const char *src, size_t len,
+static int crlf_to_git(const char *path, const unsigned char *sha1,
+ const char *src, size_t len,
struct strbuf *buf,
enum crlf_action crlf_action, enum safe_crlf checksafe)
{
@@ -260,7 +264,7 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
* If the file in the index has any CR in it, do not convert.
* This is the new safer autocrlf handling.
*/
- if (has_cr_in_index(path))
+ if (has_cr_in_index(path, sha1))
return 0;
}
}
@@ -852,8 +856,9 @@ const char *get_convert_attr_ascii(const char *path)
return "";
}
-int convert_to_git(const char *path, const char *src, size_t len,
- struct strbuf *dst, enum safe_crlf checksafe)
+int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+ const char *src, size_t len,
+ struct strbuf *dst, enum safe_crlf checksafe)
{
int ret = 0;
const char *filter = NULL;
@@ -874,7 +879,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
src = dst->buf;
len = dst->len;
}
- ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+ ret |= crlf_to_git(path, sha1, src, len, dst, ca.crlf_action, checksafe);
if (ret && dst) {
src = dst->buf;
len = dst->len;
@@ -882,7 +887,9 @@ int convert_to_git(const char *path, const char *src, size_t len,
return ret | ident_to_git(path, src, len, dst, ca.ident);
}
-void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
+void convert_to_git_filter_fd(const char *path,
+ const unsigned char *sha1,
+ int fd, struct strbuf *dst,
enum safe_crlf checksafe)
{
struct conv_attrs ca;
@@ -894,7 +901,7 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
die("%s: clean filter '%s' failed", path, ca.drv->name);
- crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+ crlf_to_git(path, sha1, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
}
diff --git a/convert.h b/convert.h
index ccf436b..1634d37 100644
--- a/convert.h
+++ b/convert.h
@@ -37,8 +37,16 @@ extern const char *get_wt_convert_stats_ascii(const char *path);
extern const char *get_convert_attr_ascii(const char *path);
/* returns 1 if *dst was used */
-extern int convert_to_git(const char *path, const char *src, size_t len,
- struct strbuf *dst, enum safe_crlf checksafe);
+extern int convert_to_git_ce_sha1(const char *path, const unsigned char *sha1,
+ const char *src, size_t len,
+ struct strbuf *dst, enum safe_crlf checksafe);
+
+static inline int convert_to_git(const char *path, const char *src, size_t len,
+ struct strbuf *dst, enum safe_crlf checksafe)
+{
+ return convert_to_git_ce_sha1(path, NULL, src, len, dst, checksafe);
+}
+
extern int convert_to_working_tree(const char *path, const char *src,
size_t len, struct strbuf *dst);
extern int renormalize_buffer(const char *path, const char *src, size_t len,
@@ -47,9 +55,16 @@ static inline int would_convert_to_git(const char *path)
{
return convert_to_git(path, NULL, 0, NULL, 0);
}
+static inline int would_convert_to_git_ce_sha1(const char *path,
+ const unsigned char *sha1)
+{
+ return convert_to_git_ce_sha1(path, sha1, NULL, 0, NULL, 0);
+}
+
/* Precondition: would_convert_to_git_filter_fd(path) == true */
-extern void convert_to_git_filter_fd(const char *path, int fd,
- struct strbuf *dst,
+extern void convert_to_git_filter_fd(const char *path,
+ const unsigned char *sha1,
+ int fd, struct strbuf *dst,
enum safe_crlf checksafe);
extern int would_convert_to_git_filter_fd(const char *path);
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..4b0e3c3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
if (fd >= 0) {
unsigned char sha1[20];
- if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+ unsigned flags = HASH_CE_HAS_SHA1;
+ memcpy(sha1, ce->sha1, sizeof(sha1));
+ if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
match = hashcmp(sha1, ce->sha1);
/* index_fd() closed the file descriptor already */
}
diff --git a/sha1_file.c b/sha1_file.c
index d0f2aa0..dd013d5 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3275,6 +3275,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
{
int ret, re_allocated = 0;
int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
if (!type)
type = OBJ_BLOB;
@@ -3284,8 +3285,11 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
*/
if ((type == OBJ_BLOB) && path) {
struct strbuf nbuf = STRBUF_INIT;
- if (convert_to_git(path, buf, size, &nbuf,
- write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+ if (convert_to_git_ce_sha1(path,
+ valid_sha1 ? sha1 : NULL,
+ buf, size, &nbuf,
+ write_object ? safe_crlf : SAFE_CRLF_FALSE)){
+
buf = strbuf_detach(&nbuf, &size);
re_allocated = 1;
}
@@ -3313,12 +3317,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
{
int ret;
const int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_CE_HAS_SHA1;
struct strbuf sbuf = STRBUF_INIT;
assert(path);
assert(would_convert_to_git_filter_fd(path));
- convert_to_git_filter_fd(path, fd, &sbuf,
+ convert_to_git_filter_fd(path,
+ valid_sha1 ? sha1 : NULL,
+ fd, &sbuf,
write_object ? safe_crlf : SAFE_CRLF_FALSE);
if (write_object)
@@ -3396,6 +3403,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
enum object_type type, const char *path, unsigned flags)
{
int ret;
+ const unsigned char *sha1_ce;
+ sha1_ce = flags & HASH_CE_HAS_SHA1 ? sha1 : NULL;
/*
* Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3406,7 +3415,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
else if (!S_ISREG(st->st_mode))
ret = index_pipe(sha1, fd, type, path, flags);
else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
- (path && would_convert_to_git(path)))
+ (path && would_convert_to_git_ce_sha1(path,sha1_ce)))
ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
flags);
else
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v1 0/1] t6038-merge-text-auto.sh
2016-05-15 22:14 ` Junio C Hamano
2016-05-16 15:51 ` [PATCH v3 0/1] CRLF-Handling: bug fix around ce_compare_data() tboegi
2016-05-16 15:51 ` [PATCH v3 1/1] " tboegi
@ 2016-05-30 17:00 ` tboegi
2016-05-30 18:00 ` Junio C Hamano
2016-05-30 17:00 ` [PATCH v1 1/1] t6038: different eol for "Merge addition of text=auto" tboegi
` (4 subsequent siblings)
7 siblings, 1 reply; 53+ messages in thread
From: tboegi @ 2016-05-30 17:00 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
Split of the old 10/10 series.
This is the update of t6038, which is needed to
motivate the patch
`convert: ce_compare_data() checks for a sha1 of a path`
on top of
`convert: unify the "auto" handling of CRLF`
When files with different eols are merged with
merge.renormalize = true,
it is important to look at the right blob to determine if
the crlf came from the blob or are a result of a coversion.
This is a little bit of a hen-and-egg problem:
The problem comes up after the "unified auto handling".
In theory, it should have been since before:
get_sha1_from_index() says:
* We might be in the middle of a merge, in which
* case we would read stage #2 (ours).
This seams wrong, as in the merge we sometimes need to
look at "theirs".
(But I haven't managed to construct a TC)
t6038-merge-text-auto.sh
Torsten Bögershausen (1):
t6038: different eol for "Merge addition of text=auto"
t/t6038-merge-text-auto.sh | 28 +++++++++++++++++++++++-----
1 file changed, 23 insertions(+), 5 deletions(-)
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v1 0/1] t6038-merge-text-auto.sh
2016-05-30 17:00 ` [PATCH v1 0/1] t6038-merge-text-auto.sh tboegi
@ 2016-05-30 18:00 ` Junio C Hamano
2016-05-30 18:48 ` Junio C Hamano
0 siblings, 1 reply; 53+ messages in thread
From: Junio C Hamano @ 2016-05-30 18:00 UTC (permalink / raw)
To: tboegi; +Cc: git
tboegi@web.de writes:
> This is a little bit of a hen-and-egg problem:
> The problem comes up after the "unified auto handling".
> In theory, it should have been since before:
> get_sha1_from_index() says:
>
> * We might be in the middle of a merge, in which
> * case we would read stage #2 (ours).
>
> This seams wrong, as in the merge we sometimes need to
> look at "theirs".
The two comment you quoted is absolutely the right thing to do.
"In a merge, we sometimes need to look at 'theirs'" is like saying
"When we are dong 'git add', we need to look at what is in the
working tree". It is total red herring.
Step back and think why we even look at what in the index in the
first place; it is to decide if we want or do not want to disable
the automatic CRLF -> LF conversion. And think again the reason why
do we look at the index.
There may be a line with CRLF line endings in the new contents,
whether it came from a merge, cherry-pick, patch application, or
plain-simple "git add" from the working tree. Auto-CRLF usually
says "We want CRLF turned into LF". But the user misconfigured and
for this path the user might not want the conversion take place, in
which case you would disable the conversion. Where do you take that
hint "the user might have misconfigured?" from? By looking at what
the user _started_ her update from. If the state before this "we
need to replace the blob in the index with a new contents, so we
need to hash the new contents to come up with the updated blob"
started contains CRLF already, that may be a hint--if we apply the
CRLF->LF conversion on the original, even if the "new contents" were
identical to what she already had, we would end up changing the blob
with her current configuration. Hence we disable.
Isn't that the reasoning behind that "safe auto-crlf" thing?
The new contents getting integrated into her current state may have
CRLF, and if a merge or a cherry-pick leaves conflicts, they may be
stored in stage #3. But paying attention to that to decide if we
want to disable Auto-CRLF conversion is simply wrong; you should
look at the CRLF in stage #3 as purely something that might need to
be converted, not something that affects the decision if it needs to
be converted, just like you view CRLF in a working tree file when
you do "git add"..
Imagine that you started from a history where somebody recorded a
text file with CRLF in the blob, unconverted. Later the project
decided to express their text with LF to support cross-platform
development better, and sets up the Auto-CRLF. Your user is working
near the tip of that history after the eol correction happened. Now
she gets a pull-request of a branch that forked from an old point in
the history, before the eol correction and full of CRLF. Yes, to
integrate the change being proposed, she needs to look at "theirs";
that's the whole point of a "merge". Why should she revert the eol
correction her history has by getting fooled by the fact that the
update was based on a part of the history before the eol correction?
^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH v1 0/1] t6038-merge-text-auto.sh
2016-05-30 18:00 ` Junio C Hamano
@ 2016-05-30 18:48 ` Junio C Hamano
0 siblings, 0 replies; 53+ messages in thread
From: Junio C Hamano @ 2016-05-30 18:48 UTC (permalink / raw)
To: tboegi; +Cc: git
Junio C Hamano <gitster@pobox.com> writes:
> Imagine that you started from a history where somebody recorded a
> text file with CRLF in the blob, unconverted. Later the project
> decided to express their text with LF to support cross-platform
> development better, and sets up the Auto-CRLF. Your user is working
> near the tip of that history after the eol correction happened. Now
> she gets a pull-request of a branch that forked from an old point in
> the history, before the eol correction and full of CRLF. Yes, to
> integrate the change being proposed, she needs to look at "theirs";
> that's the whole point of a "merge". Why should she revert the eol
> correction her history has by getting fooled by the fact that the
> update was based on a part of the history before the eol correction?
Thinking aloud along this line a bit further, if you really cared
(and I don't feel very strongly, as I think "safe crlf" is an ugly
workaround that lets users keep their misconfiguration by penalizing
their day-to-day operation), you may want to think about doing a
3-way merge of "eol preference" beween all stages.
That is, if the common ancestor in stage #1 and the current version
in stage #2 both have its text in LF, and the data being merged in
stage #3 is in CRLF, you sayr "do not convert; the change being
brought in wants to have CRLF endings, while our history did not
care". Similarly, if the common ancestor in stage #1 and the data
being merged in stage #3 both have CRLF, and your version in stage
#2 has LF, you say "We wanted to fix eol since the side branch
forked, and the side branch predates that fix, so we keep the eol
fix we did since we diverged", i.e. "Do convert".
For doing this, you may want to refactor the codepath that decides
"Auto-CRLF usually wants to turn CRLF in text to LF, but should we
disable that logic now, because the user already has CRLF in the
current one?" into a function that takes a single parameter, `path`,
and returns "Yes, do convert CRLF to LF / No, do not convert"
boolean. Having a low level function that says "What's the blob at
this path in the index" and have the caller run that logic feels
unwieldy if we want to go that route.
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v1 1/1] t6038: different eol for "Merge addition of text=auto"
2016-05-15 22:14 ` Junio C Hamano
` (2 preceding siblings ...)
2016-05-30 17:00 ` [PATCH v1 0/1] t6038-merge-text-auto.sh tboegi
@ 2016-05-30 17:00 ` tboegi
2016-06-07 15:20 ` [PATCH v2 0/3] unified auto CRLF handling, V2 tboegi
` (3 subsequent siblings)
7 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-05-30 17:00 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
t6038 uses different code, depending if NATIVE_CRLF is set ot not.
Enhance the test coverage for cross-platform testing and run
"Merge addition of text=auto" with both lf and crlf as core.eol.
It is important to be run this test with crlf under Linux,
the day when the "unified auto handling will be introduced.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
t/t6038-merge-text-auto.sh | 28 +++++++++++++++++++++++-----
1 file changed, 23 insertions(+), 5 deletions(-)
diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
index 33b77ee..8846f5d 100755
--- a/t/t6038-merge-text-auto.sh
+++ b/t/t6038-merge-text-auto.sh
@@ -91,16 +91,13 @@ test_expect_success 'Merge after setting text=auto' '
compare_files expected file
'
-test_expect_success 'Merge addition of text=auto' '
+test_expect_success 'Merge addition of text=auto eol=LF' '
+ git config core.eol lf &&
cat <<-\EOF >expected &&
first line
same line
EOF
- if test_have_prereq NATIVE_CRLF; then
- append_cr <expected >expected.temp &&
- mv expected.temp expected
- fi &&
git config merge.renormalize true &&
git rm -fr . &&
rm -f .gitattributes &&
@@ -109,7 +106,28 @@ test_expect_success 'Merge addition of text=auto' '
compare_files expected file
'
+test_expect_success 'Merge addition of text=auto eol=CRLF' '
+ git config core.eol crlf &&
+ cat <<-\EOF >expected &&
+ first line
+ same line
+ EOF
+
+ append_cr <expected >expected.temp &&
+ mv expected.temp expected &&
+ git config merge.renormalize true &&
+ git rm -fr . &&
+ rm -f .gitattributes &&
+ git reset --hard b &&
+ echo >&2 "After git reset --hard b" &&
+ git ls-files -s --eol >&2 &&
+ git merge a &&
+ compare_files expected file
+'
+
+
test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
+ git config core.eol native &&
echo "<<<<<<<" >expected &&
if test_have_prereq NATIVE_CRLF; then
echo first line | append_cr >>expected &&
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v2 0/3] unified auto CRLF handling, V2
2016-05-15 22:14 ` Junio C Hamano
` (3 preceding siblings ...)
2016-05-30 17:00 ` [PATCH v1 1/1] t6038: different eol for "Merge addition of text=auto" tboegi
@ 2016-06-07 15:20 ` tboegi
2016-06-07 15:20 ` [PATCH v2 1/3] convert: unify the "auto" handling of CRLF tboegi
` (2 subsequent siblings)
7 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-06-07 15:20 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
unified auto CRLF handling, V2
1/3 is 7/10 of the old 10/10 series
2/3 and
3/3 is a replacement for tb/convert-peek-in-index:
Better commit message, added test case
All in all we are getting closer.
Most of the patches had been send & reviewed earlier,
but a critical review won't hurt.
Torsten Bögershausen (3):
convert: unify the "auto" handling of CRLF
read-cache: factor out get_sha1_from_index() helper
Correct ce_compare_data() in a middle of a merge
Documentation/config.txt | 12 +++----
Documentation/gitattributes.txt | 15 +++++----
builtin/apply.c | 3 +-
builtin/blame.c | 2 +-
cache.h | 4 +++
combine-diff.c | 3 +-
convert.c | 65 +++++++++++++++++++++++-------------
convert.h | 18 +++++++---
diff.c | 3 +-
dir.c | 2 +-
read-cache.c | 33 +++++++++++-------
sha1_file.c | 12 +++++--
t/t0025-crlf-auto.sh | 4 +--
t/t0027-auto-crlf.sh | 32 +++++++++---------
t/t6038-merge-text-auto.sh | 74 ++++++++++++++++++++++++-----------------
15 files changed, 172 insertions(+), 110 deletions(-)
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v2 1/3] convert: unify the "auto" handling of CRLF
2016-05-15 22:14 ` Junio C Hamano
` (4 preceding siblings ...)
2016-06-07 15:20 ` [PATCH v2 0/3] unified auto CRLF handling, V2 tboegi
@ 2016-06-07 15:20 ` tboegi
2016-06-07 15:20 ` [PATCH v2 2/3] read-cache: factor out get_sha1_from_index() helper tboegi
2016-06-07 15:20 ` [PATCH v2 3/3] Correct ce_compare_data() in a middle of a merge tboegi
7 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-06-07 15:20 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
Before this change,
$ echo "* text=auto" >.gitattributes
$ echo "* eol=crlf" >>.gitattributes
would have the same effect as
$ echo "* text" >.gitattributes
$ git config core.eol crlf
Since the 'eol' attribute had higher priority than 'text=auto', this may
corrupt binary files and is not what most users expect to happen.
Make the 'eol' attribute to obey 'text=auto', and now
$ echo "* text=auto" >.gitattributes
$ echo "* eol=crlf" >>.gitattributes
behaves the same as
$ echo "* text=auto" >.gitattributes
$ git config core.eol crlf
In other words,
$ echo "* text=auto eol=crlf" >.gitattributes
has the same effect as
$ git config core.autocrlf true
and
$ echo "* text=auto eol=lf" >.gitattributes
has the same effect as
$ git config core.autocrlf input
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
Documentation/config.txt | 12 +++++-------
Documentation/gitattributes.txt | 15 +++++++++------
convert.c | 42 +++++++++++++++++++++--------------------
convert.h | 3 ++-
t/t0025-crlf-auto.sh | 4 ++--
t/t0027-auto-crlf.sh | 32 +++++++++++++++----------------
t/t6038-merge-text-auto.sh | 23 ++++++++++++++--------
7 files changed, 71 insertions(+), 60 deletions(-)
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 53f00db..208b5de 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -405,13 +405,11 @@ file with mixed line endings would be reported by the `core.safecrlf`
mechanism.
core.autocrlf::
- Setting this variable to "true" is almost the same as setting
- the `text` attribute to "auto" on all files except that text
- files are not guaranteed to be normalized: files that contain
- `CRLF` in the repository will not be touched. Use this
- setting if you want to have `CRLF` line endings in your
- working directory even though the repository does not have
- normalized line endings. This variable can be set to 'input',
+ Setting this variable to "true" is the same as setting
+ the `text` attribute to "auto" on all files and core.eol to "crlf".
+ Set to true if you want to have `CRLF` line endings in your
+ working directory and the repository has LF line endings.
+ This variable can be set to 'input',
in which case no output conversion is performed.
core.symlinks::
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..d7a124b 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -115,6 +115,7 @@ text file is normalized, its line endings are converted to LF in the
repository. To control what line ending style is used in the working
directory, use the `eol` attribute for a single file and the
`core.eol` configuration variable for all text files.
+Note that `core.autocrlf` overrides `core.eol`
Set::
@@ -130,8 +131,9 @@ Unset::
Set to string value "auto"::
When `text` is set to "auto", the path is marked for automatic
- end-of-line normalization. If Git decides that the content is
- text, its line endings are normalized to LF on checkin.
+ end-of-line conversion. If Git decides that the content is
+ text, its line endings are converted to LF on checkin.
+ When the file has been commited with CRLF, no conversion is done.
Unspecified::
@@ -146,7 +148,7 @@ unspecified.
^^^^^
This attribute sets a specific line-ending style to be used in the
-working directory. It enables end-of-line normalization without any
+working directory. It enables end-of-line conversion without any
content checks, effectively setting the `text` attribute.
Set to string value "crlf"::
@@ -186,9 +188,10 @@ the working directory, and prevent .jpg files from being normalized
regardless of their content.
------------------------
+* text=auto
*.txt text
-*.vcproj eol=crlf
-*.sh eol=lf
+*.vcproj text eol=crlf
+*.sh text eol=lf
*.jpg -text
------------------------
@@ -198,7 +201,7 @@ normalization in Git.
If you simply want to have CRLF line endings in your working directory
regardless of the repository you are working with, you can set the
-config variable "core.autocrlf" without changing any attributes.
+config variable "core.autocrlf" without using any attributes.
------------------------
[core]
diff --git a/convert.c b/convert.c
index b1614bf..67d69b5 100644
--- a/convert.c
+++ b/convert.c
@@ -176,7 +176,9 @@ static enum eol output_eol(enum crlf_action crlf_action)
return EOL_LF;
case CRLF_UNDEFINED:
case CRLF_AUTO_CRLF:
+ return EOL_CRLF;
case CRLF_AUTO_INPUT:
+ return EOL_LF;
case CRLF_TEXT:
case CRLF_AUTO:
/* fall through */
@@ -254,17 +256,15 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
if (crlf_action == CRLF_AUTO || crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
if (convert_is_binary(len, &stats))
return 0;
-
- if (crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
- /*
- * If the file in the index has any CR in it, do not convert.
- * This is the new safer autocrlf handling.
- */
- if (has_cr_in_index(path))
- return 0;
- }
+ /*
+ * If the file in the index has any CR in it, do not convert.
+ * This is the new safer autocrlf handling.
+ */
+ if (checksafe == SAFE_CRLF_RENORMALIZE)
+ checksafe = SAFE_CRLF_FALSE;
+ else if (has_cr_in_index(path))
+ return 0;
}
-
check_safe_crlf(path, crlf_action, &stats, checksafe);
/* Optimization: No CRLF? Nothing to convert, regardless. */
@@ -320,12 +320,10 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
return 0;
if (crlf_action == CRLF_AUTO || crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
- if (crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
- /* If we have any CR or CRLF line endings, we do not touch it */
- /* This is the new safer autocrlf-handling */
- if (stats.lonecr || stats.crlf )
- return 0;
- }
+ /* If we have any CR or CRLF line endings, we do not touch it */
+ /* This is the new safer autocrlf-handling */
+ if (stats.lonecr || stats.crlf )
+ return 0;
if (convert_is_binary(len, &stats))
return 0;
@@ -786,7 +784,11 @@ static void convert_attrs(struct conv_attrs *ca, const char *path)
ca->drv = git_path_check_convert(ccheck + 2);
if (ca->crlf_action != CRLF_BINARY) {
enum eol eol_attr = git_path_check_eol(ccheck + 3);
- if (eol_attr == EOL_LF)
+ if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_LF)
+ ca->crlf_action = CRLF_AUTO_INPUT;
+ else if (ca->crlf_action == CRLF_AUTO && eol_attr == EOL_CRLF)
+ ca->crlf_action = CRLF_AUTO_CRLF;
+ else if (eol_attr == EOL_LF)
ca->crlf_action = CRLF_TEXT_INPUT;
else if (eol_attr == EOL_CRLF)
ca->crlf_action = CRLF_TEXT_CRLF;
@@ -845,9 +847,9 @@ const char *get_convert_attr_ascii(const char *path)
case CRLF_AUTO:
return "text=auto";
case CRLF_AUTO_CRLF:
- return "text=auto eol=crlf"; /* This is not supported yet */
+ return "text=auto eol=crlf";
case CRLF_AUTO_INPUT:
- return "text=auto eol=lf"; /* This is not supported yet */
+ return "text=auto eol=lf";
}
return "";
}
@@ -949,7 +951,7 @@ int renormalize_buffer(const char *path, const char *src, size_t len, struct str
src = dst->buf;
len = dst->len;
}
- return ret | convert_to_git(path, src, len, dst, SAFE_CRLF_FALSE);
+ return ret | convert_to_git(path, src, len, dst, SAFE_CRLF_RENORMALIZE);
}
/*****************************************************************
diff --git a/convert.h b/convert.h
index ccf436b..81b6cdf 100644
--- a/convert.h
+++ b/convert.h
@@ -7,7 +7,8 @@
enum safe_crlf {
SAFE_CRLF_FALSE = 0,
SAFE_CRLF_FAIL = 1,
- SAFE_CRLF_WARN = 2
+ SAFE_CRLF_WARN = 2,
+ SAFE_CRLF_RENORMALIZE = 4
};
extern enum safe_crlf safe_crlf;
diff --git a/t/t0025-crlf-auto.sh b/t/t0025-crlf-auto.sh
index c164b46..d0bee08 100755
--- a/t/t0025-crlf-auto.sh
+++ b/t/t0025-crlf-auto.sh
@@ -114,7 +114,7 @@ test_expect_success 'autocrlf=true does not normalize CRLF files' '
test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
'
-test_expect_success 'text=auto, autocrlf=true _does_ normalize CRLF files' '
+test_expect_success 'text=auto, autocrlf=true does not normalize CRLF files' '
rm -f .gitattributes tmp LFonly CRLFonly LFwithNUL &&
git config core.autocrlf true &&
@@ -126,7 +126,7 @@ test_expect_success 'text=auto, autocrlf=true _does_ normalize CRLF files' '
LFonlydiff=$(git diff LFonly) &&
CRLFonlydiff=$(git diff CRLFonly) &&
LFwithNULdiff=$(git diff LFwithNUL) &&
- test -z "$LFonlydiff" -a -n "$CRLFonlydiff" -a -z "$LFwithNULdiff"
+ test -z "$LFonlydiff" -a -z "$CRLFonlydiff" -a -z "$LFwithNULdiff"
'
test_expect_success 'text=auto, autocrlf=true does not normalize binary files' '
diff --git a/t/t0027-auto-crlf.sh b/t/t0027-auto-crlf.sh
index 9372589..8367d0b 100755
--- a/t/t0027-auto-crlf.sh
+++ b/t/t0027-auto-crlf.sh
@@ -175,8 +175,8 @@ attr_ascii () {
text,lf) echo "text eol=lf" ;;
text,crlf) echo "text eol=crlf" ;;
auto,) echo "text=auto" ;;
- auto,lf) echo "text eol=lf" ;;
- auto,crlf) echo "text eol=crlf" ;;
+ auto,lf) echo "text=auto eol=lf" ;;
+ auto,crlf) echo "text=auto eol=crlf" ;;
lf,) echo "text eol=lf" ;;
crlf,) echo "text eol=crlf" ;;
,) echo "" ;;
@@ -397,10 +397,9 @@ commit_chk_wrnNNO "" "" false "" "" "" ""
commit_chk_wrnNNO "" "" true LF_CRLF "" "" "" ""
commit_chk_wrnNNO "" "" input "" "" "" "" ""
-commit_chk_wrnNNO "auto" "" false "$WILC" "$WICL" "$WAMIX" "" ""
-commit_chk_wrnNNO "auto" "" true LF_CRLF "" LF_CRLF "" ""
-commit_chk_wrnNNO "auto" "" input "" CRLF_LF CRLF_LF "" ""
-
+commit_chk_wrnNNO "auto" "" false "$WILC" "" "" "" ""
+commit_chk_wrnNNO "auto" "" true LF_CRLF "" "" "" ""
+commit_chk_wrnNNO "auto" "" input "" "" "" "" ""
for crlf in true false input
do
commit_chk_wrnNNO -text "" $crlf "" "" "" "" ""
@@ -408,8 +407,8 @@ do
commit_chk_wrnNNO -text crlf $crlf "" "" "" "" ""
commit_chk_wrnNNO "" lf $crlf "" CRLF_LF CRLF_LF "" CRLF_LF
commit_chk_wrnNNO "" crlf $crlf LF_CRLF "" LF_CRLF LF_CRLF ""
- commit_chk_wrnNNO auto lf $crlf "" CRLF_LF CRLF_LF "" CRLF_LF
- commit_chk_wrnNNO auto crlf $crlf LF_CRLF "" LF_CRLF LF_CRLF ""
+ commit_chk_wrnNNO auto lf $crlf "" "" "" "" ""
+ commit_chk_wrnNNO auto crlf $crlf LF_CRLF "" "" "" ""
commit_chk_wrnNNO text lf $crlf "" CRLF_LF CRLF_LF "" CRLF_LF
commit_chk_wrnNNO text crlf $crlf LF_CRLF "" LF_CRLF LF_CRLF ""
done
@@ -454,9 +453,9 @@ do
check_in_repo_NNO -text "" $crlf LF CRLF CRLF_mix_LF LF_mix_CR CRLF_nul
check_in_repo_NNO -text lf $crlf LF CRLF CRLF_mix_LF LF_mix_CR CRLF_nul
check_in_repo_NNO -text crlf $crlf LF CRLF CRLF_mix_LF LF_mix_CR CRLF_nul
- check_in_repo_NNO auto "" $crlf LF LF LF LF_mix_CR CRLF_nul
- check_in_repo_NNO auto lf $crlf LF LF LF LF_mix_CR LF_nul
- check_in_repo_NNO auto crlf $crlf LF LF LF LF_mix_CR LF_nul
+ check_in_repo_NNO auto "" $crlf LF CRLF CRLF_mix_LF LF_mix_CR CRLF_nul
+ check_in_repo_NNO auto lf $crlf LF CRLF CRLF_mix_LF LF_mix_CR CRLF_nul
+ check_in_repo_NNO auto crlf $crlf LF CRLF CRLF_mix_LF LF_mix_CR CRLF_nul
check_in_repo_NNO text "" $crlf LF LF LF LF_mix_CR LF_nul
check_in_repo_NNO text lf $crlf LF LF LF LF_mix_CR LF_nul
check_in_repo_NNO text crlf $crlf LF LF LF LF_mix_CR LF_nul
@@ -493,7 +492,8 @@ fi
export CRLF_MIX_LF_CR MIX NL
# Same handling with and without ident
-for id in "" ident
+#for id in "" ident
+for id in ""
do
for ceol in lf crlf native
do
@@ -509,7 +509,7 @@ do
checkout_files text "$id" "crlf" "$crlf" "$ceol" CRLF CRLF CRLF CRLF_mix_CR CRLF_nul
# currently the same as text, eol=XXX
checkout_files auto "$id" "lf" "$crlf" "$ceol" LF CRLF CRLF_mix_LF LF_mix_CR LF_nul
- checkout_files auto "$id" "crlf" "$crlf" "$ceol" CRLF CRLF CRLF CRLF_mix_CR CRLF_nul
+ checkout_files auto "$id" "crlf" "$crlf" "$ceol" CRLF CRLF CRLF_mix_LF LF_mix_CR LF_nul
done
# core.autocrlf false, different core.eol
@@ -517,7 +517,7 @@ do
# core.autocrlf true
checkout_files "" "$id" "" true "$ceol" CRLF CRLF CRLF_mix_LF LF_mix_CR LF_nul
# text: core.autocrlf = true overrides core.eol
- checkout_files auto "$id" "" true "$ceol" CRLF CRLF CRLF LF_mix_CR LF_nul
+ checkout_files auto "$id" "" true "$ceol" CRLF CRLF CRLF_mix_LF LF_mix_CR LF_nul
checkout_files text "$id" "" true "$ceol" CRLF CRLF CRLF CRLF_mix_CR CRLF_nul
# text: core.autocrlf = input overrides core.eol
checkout_files text "$id" "" input "$ceol" LF CRLF CRLF_mix_LF LF_mix_CR LF_nul
@@ -531,8 +531,8 @@ do
checkout_files text "$id" "" false "" $NL CRLF $MIX_CRLF_LF $MIX_LF_CR $LFNUL
checkout_files text "$id" "" false native $NL CRLF $MIX_CRLF_LF $MIX_LF_CR $LFNUL
# auto: core.autocrlf=false and core.eol unset(or native) uses native eol
- checkout_files auto "$id" "" false "" $NL CRLF $MIX_CRLF_LF LF_mix_CR LF_nul
- checkout_files auto "$id" "" false native $NL CRLF $MIX_CRLF_LF LF_mix_CR LF_nul
+ checkout_files auto "$id" "" false "" $NL CRLF CRLF_mix_LF LF_mix_CR LF_nul
+ checkout_files auto "$id" "" false native $NL CRLF CRLF_mix_LF LF_mix_CR LF_nul
done
# Should be the last test case: remove some files from the worktree
diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
index 85c10b0..33b77ee 100755
--- a/t/t6038-merge-text-auto.sh
+++ b/t/t6038-merge-text-auto.sh
@@ -16,6 +16,13 @@ test_description='CRLF merge conflict across text=auto change
test_have_prereq SED_STRIPS_CR && SED_OPTIONS=-b
+compare_files () {
+ tr '\015\000' QN <"$1" >"$1".expect &&
+ tr '\015\000' QN <"$2" >"$2".actual &&
+ test_cmp "$1".expect "$2".actual &&
+ rm "$1".expect "$2".actual
+}
+
test_expect_success setup '
git config core.autocrlf false &&
@@ -30,7 +37,7 @@ test_expect_success setup '
git branch side &&
echo "* text=auto" >.gitattributes &&
- touch file &&
+ echo first line >file &&
git add .gitattributes file &&
test_tick &&
git commit -m "normalize file" &&
@@ -81,7 +88,7 @@ test_expect_success 'Merge after setting text=auto' '
rm -f .gitattributes &&
git reset --hard a &&
git merge b &&
- test_cmp expected file
+ compare_files expected file
'
test_expect_success 'Merge addition of text=auto' '
@@ -99,7 +106,7 @@ test_expect_success 'Merge addition of text=auto' '
rm -f .gitattributes &&
git reset --hard b &&
git merge a &&
- test_cmp expected file
+ compare_files expected file
'
test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
@@ -121,7 +128,7 @@ test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
git reset --hard a &&
test_must_fail git merge b &&
fuzz_conflict file >file.fuzzy &&
- test_cmp expected file.fuzzy
+ compare_files expected file.fuzzy
'
test_expect_success 'Detect LF/CRLF conflict from addition of text=auto' '
@@ -143,7 +150,7 @@ test_expect_success 'Detect LF/CRLF conflict from addition of text=auto' '
git reset --hard b &&
test_must_fail git merge a &&
fuzz_conflict file >file.fuzzy &&
- test_cmp expected file.fuzzy
+ compare_files expected file.fuzzy
'
test_expect_failure 'checkout -m after setting text=auto' '
@@ -158,7 +165,7 @@ test_expect_failure 'checkout -m after setting text=auto' '
git reset --hard initial &&
git checkout a -- . &&
git checkout -m b &&
- test_cmp expected file
+ compare_files expected file
'
test_expect_failure 'checkout -m addition of text=auto' '
@@ -173,7 +180,7 @@ test_expect_failure 'checkout -m addition of text=auto' '
git reset --hard initial &&
git checkout b -- . &&
git checkout -m a &&
- test_cmp expected file
+ compare_files expected file
'
test_expect_failure 'cherry-pick patch from after text=auto was added' '
@@ -187,7 +194,7 @@ test_expect_failure 'cherry-pick patch from after text=auto was added' '
git reset --hard b &&
test_must_fail git cherry-pick a >err 2>&1 &&
grep "[Nn]othing added" err &&
- test_cmp expected file
+ compare_files expected file
'
test_expect_success 'Test delete/normalize conflict' '
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v2 2/3] read-cache: factor out get_sha1_from_index() helper
2016-05-15 22:14 ` Junio C Hamano
` (5 preceding siblings ...)
2016-06-07 15:20 ` [PATCH v2 1/3] convert: unify the "auto" handling of CRLF tboegi
@ 2016-06-07 15:20 ` tboegi
2016-06-07 15:20 ` [PATCH v2 3/3] Correct ce_compare_data() in a middle of a merge tboegi
7 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-06-07 15:20 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
Factor out the retrieval of the sha1 for a given path in
read_blob_data_from_index() into the function get_sha1_from_index().
This will be used in the next commit, when convert.c can do the
analyze for "text=auto" without slurping the whole blob into memory
at once.
Add a wrapper definition get_sha1_from_cache().
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
cache.h | 3 +++
read-cache.c | 29 ++++++++++++++++++-----------
2 files changed, 21 insertions(+), 11 deletions(-)
diff --git a/cache.h b/cache.h
index 6049f86..322ee40 100644
--- a/cache.h
+++ b/cache.h
@@ -379,6 +379,7 @@ extern void free_name_hash(struct index_state *istate);
#define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
#define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
#define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
+#define get_sha1_from_cache(path) get_sha1_from_index (&the_index, (path))
#endif
enum object_type {
@@ -1050,6 +1051,8 @@ static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *
return read_sha1_file_extended(sha1, type, size, LOOKUP_REPLACE_OBJECT);
}
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path);
+
/*
* This internal function is only declared here for the benefit of
* lookup_replace_object(). Please do not call it directly.
diff --git a/read-cache.c b/read-cache.c
index d9fb78b..a3ef967 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2263,13 +2263,27 @@ int index_name_is_other(const struct index_state *istate, const char *name,
void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size)
{
- int pos, len;
+ const unsigned char *sha1;
unsigned long sz;
enum object_type type;
void *data;
- len = strlen(path);
- pos = index_name_pos(istate, path, len);
+ sha1 = get_sha1_from_index(istate, path);
+ if (!sha1)
+ return NULL;
+ data = read_sha1_file(sha1, &type, &sz);
+ if (!data || type != OBJ_BLOB) {
+ free(data);
+ return NULL;
+ }
+ if (size)
+ *size = sz;
+ return data;
+}
+
+const unsigned char *get_sha1_from_index(struct index_state *istate, const char *path)
+{
+ int pos = index_name_pos(istate, path, strlen(path));
if (pos < 0) {
/*
* We might be in the middle of a merge, in which
@@ -2285,14 +2299,7 @@ void *read_blob_data_from_index(struct index_state *istate, const char *path, un
}
if (pos < 0)
return NULL;
- data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
- if (!data || type != OBJ_BLOB) {
- free(data);
- return NULL;
- }
- if (size)
- *size = sz;
- return data;
+ return (istate->cache[pos]->sha1);
}
void stat_validity_clear(struct stat_validity *sv)
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply related [flat|nested] 53+ messages in thread
* [PATCH v2 3/3] Correct ce_compare_data() in a middle of a merge
2016-05-15 22:14 ` Junio C Hamano
` (6 preceding siblings ...)
2016-06-07 15:20 ` [PATCH v2 2/3] read-cache: factor out get_sha1_from_index() helper tboegi
@ 2016-06-07 15:20 ` tboegi
7 siblings, 0 replies; 53+ messages in thread
From: tboegi @ 2016-06-07 15:20 UTC (permalink / raw)
To: git; +Cc: Torsten Bögershausen
From: Torsten Bögershausen <tboegi@web.de>
The following didn't work as expected:
- In a middle of a merge
- merge.renormalize is true,
- .gitattributes = "* text=auto"
- core.eol = crlf
Merge a blob with CRLF "first line\r\nsame line\r\n" and a blob
with LF "first line\nsame line\n".
The expected result of the merge is "first line\nsame line\n".
The content in the working tree is "first line\r\nsame line\r\n",
and ce_compare_data() should find that the content is clean and return 0.
Deep down crlf_to_git() is invoked, to check if CRLF are converted or not.
The "new safer autocrlf handling" calls blob_has_cr().
Instead of using the sha1 of the blob, (CRLF in this example),
the function get_sha1_from_index() is invoked.
get_sha1_from_index() decides to return "ours" when in the middle of
the merge, which is LF.
As a result, the CRLF in the worktree are converted into LF before
the comparison.
The contents of LF and CRLF don't match any more.
The problem is that ce_compare_data() has ce->sha1, but the sha1 is lost
on it's way into blob_has_cr().
Forwarding ce->sha1 from ce_compare_data() into crlf_to_git() makes sure
that blob_has_cr() looks at the appropriate blob.
Add a new parameter index_blob_sha1 to convert_to_git(), and forward the
sha1 from ce_compare_data() into convert_to_git(). Other callers use NULL
for index_blob_sha1, and the sha1 is determined from path
using get_sha1_from_cache(path). This is the same handling as before.
In the same spirit, forward the sha1 into would_convert_to_git().
While at it, rename has_cr_in_index() into blob_has_cr()
and replace 0 with SAFE_CRLF_FALSE.
Add a TC in t6038 to have a test coverage under Linux.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
---
builtin/apply.c | 3 ++-
builtin/blame.c | 2 +-
cache.h | 1 +
combine-diff.c | 3 ++-
convert.c | 43 ++++++++++++++++++++++++++------------
convert.h | 15 ++++++++++----
diff.c | 3 ++-
dir.c | 2 +-
read-cache.c | 4 +++-
sha1_file.c | 12 ++++++++---
t/t6038-merge-text-auto.sh | 51 +++++++++++++++++++++++++---------------------
11 files changed, 90 insertions(+), 49 deletions(-)
diff --git a/builtin/apply.c b/builtin/apply.c
index 8e4da2e..0cf9a0a 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -2140,7 +2140,8 @@ static int read_old_data(struct stat *st, const char *path, struct strbuf *buf)
case S_IFREG:
if (strbuf_read_file(buf, path, st->st_size) != st->st_size)
return error(_("unable to open or read %s"), path);
- convert_to_git(path, buf->buf, buf->len, buf, 0);
+ convert_to_git(path, buf->buf, buf->len, buf,
+ SAFE_CRLF_FALSE, NULL);
return 0;
default:
return -1;
diff --git a/builtin/blame.c b/builtin/blame.c
index 21f42b0..1c523b6 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2377,7 +2377,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
if (strbuf_read(&buf, 0, 0) < 0)
die_errno("failed to read from stdin");
}
- convert_to_git(path, buf.buf, buf.len, &buf, 0);
+ convert_to_git(path, buf.buf, buf.len, &buf, SAFE_CRLF_FALSE, NULL);
origin->file.ptr = buf.buf;
origin->file.size = buf.len;
pretend_sha1_file(buf.buf, buf.len, OBJ_BLOB, origin->blob_sha1);
diff --git a/cache.h b/cache.h
index 322ee40..86f48f9 100644
--- a/cache.h
+++ b/cache.h
@@ -605,6 +605,7 @@ extern int ie_modified(const struct index_state *, const struct cache_entry *, s
#define HASH_WRITE_OBJECT 1
#define HASH_FORMAT_CHECK 2
+#define HASH_USE_SHA_NOT_PATH 4
extern int index_fd(unsigned char *sha1, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
extern int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned flags);
diff --git a/combine-diff.c b/combine-diff.c
index 8f2313d..8b535a7 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -1052,7 +1052,8 @@ static void show_patch_diff(struct combine_diff_path *elem, int num_parent,
if (is_file) {
struct strbuf buf = STRBUF_INIT;
- if (convert_to_git(elem->path, result, len, &buf, safe_crlf)) {
+ if (convert_to_git(elem->path, result, len,
+ &buf, safe_crlf, NULL)) {
free(result);
result = strbuf_detach(&buf, &len);
result_size = len;
diff --git a/convert.c b/convert.c
index 67d69b5..802ee7c 100644
--- a/convert.c
+++ b/convert.c
@@ -219,23 +219,28 @@ static void check_safe_crlf(const char *path, enum crlf_action crlf_action,
}
}
-static int has_cr_in_index(const char *path)
+static int blob_has_cr(const unsigned char *index_blob_sha1)
{
unsigned long sz;
void *data;
- int has_cr;
-
- data = read_blob_data_from_cache(path, &sz);
+ int has_cr = 0;
+ enum object_type type;
+ if (!index_blob_sha1)
+ return 0;
+ data = read_sha1_file(index_blob_sha1, &type, &sz);
if (!data)
return 0;
- has_cr = memchr(data, '\r', sz) != NULL;
+ if (type == OBJ_BLOB)
+ has_cr = memchr(data, '\r', sz) != NULL;
+
free(data);
return has_cr;
}
static int crlf_to_git(const char *path, const char *src, size_t len,
struct strbuf *buf,
- enum crlf_action crlf_action, enum safe_crlf checksafe)
+ enum crlf_action crlf_action, enum safe_crlf checksafe,
+ const unsigned char *index_blob_sha1)
{
struct text_stat stats;
char *dst;
@@ -256,14 +261,23 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
if (crlf_action == CRLF_AUTO || crlf_action == CRLF_AUTO_INPUT || crlf_action == CRLF_AUTO_CRLF) {
if (convert_is_binary(len, &stats))
return 0;
+
/*
* If the file in the index has any CR in it, do not convert.
* This is the new safer autocrlf handling.
*/
if (checksafe == SAFE_CRLF_RENORMALIZE)
checksafe = SAFE_CRLF_FALSE;
- else if (has_cr_in_index(path))
- return 0;
+ else {
+ /*
+ * If the file in the index has any CR in it, do not convert.
+ * This is the new safer autocrlf handling.
+ */
+ if (!index_blob_sha1)
+ index_blob_sha1 = get_sha1_from_cache(path);
+ if (blob_has_cr(index_blob_sha1))
+ return 0;
+ }
}
check_safe_crlf(path, crlf_action, &stats, checksafe);
@@ -855,7 +869,8 @@ const char *get_convert_attr_ascii(const char *path)
}
int convert_to_git(const char *path, const char *src, size_t len,
- struct strbuf *dst, enum safe_crlf checksafe)
+ struct strbuf *dst, enum safe_crlf checksafe,
+ const unsigned char *index_blob_sha1)
{
int ret = 0;
const char *filter = NULL;
@@ -876,7 +891,7 @@ int convert_to_git(const char *path, const char *src, size_t len,
src = dst->buf;
len = dst->len;
}
- ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
+ ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe, index_blob_sha1);
if (ret && dst) {
src = dst->buf;
len = dst->len;
@@ -885,7 +900,8 @@ int convert_to_git(const char *path, const char *src, size_t len,
}
void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
- enum safe_crlf checksafe)
+ enum safe_crlf checksafe,
+ const unsigned char *index_blob_sha1)
{
struct conv_attrs ca;
convert_attrs(&ca, path);
@@ -896,7 +912,8 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
die("%s: clean filter '%s' failed", path, ca.drv->name);
- crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
+ crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action,
+ checksafe, index_blob_sha1);
ident_to_git(path, dst->buf, dst->len, dst, ca.ident);
}
@@ -951,7 +968,7 @@ int renormalize_buffer(const char *path, const char *src, size_t len, struct str
src = dst->buf;
len = dst->len;
}
- return ret | convert_to_git(path, src, len, dst, SAFE_CRLF_RENORMALIZE);
+ return ret | convert_to_git(path, src, len, dst, SAFE_CRLF_RENORMALIZE, NULL);
}
/*****************************************************************
diff --git a/convert.h b/convert.h
index 81b6cdf..8c34dd5 100644
--- a/convert.h
+++ b/convert.h
@@ -39,19 +39,26 @@ extern const char *get_convert_attr_ascii(const char *path);
/* returns 1 if *dst was used */
extern int convert_to_git(const char *path, const char *src, size_t len,
- struct strbuf *dst, enum safe_crlf checksafe);
+ struct strbuf *dst, enum safe_crlf checksafe,
+ const unsigned char *index_blob_sha1);
+
extern int convert_to_working_tree(const char *path, const char *src,
size_t len, struct strbuf *dst);
extern int renormalize_buffer(const char *path, const char *src, size_t len,
struct strbuf *dst);
-static inline int would_convert_to_git(const char *path)
+static inline int would_convert_to_git(const char *path,
+ const unsigned char *index_blob_sha1)
{
- return convert_to_git(path, NULL, 0, NULL, 0);
+ return convert_to_git(path, NULL, 0, NULL, SAFE_CRLF_FALSE,
+ index_blob_sha1);
}
+
/* Precondition: would_convert_to_git_filter_fd(path) == true */
extern void convert_to_git_filter_fd(const char *path, int fd,
struct strbuf *dst,
- enum safe_crlf checksafe);
+ enum safe_crlf checksafe,
+ const unsigned char *index_blob_sha1);
+
extern int would_convert_to_git_filter_fd(const char *path);
/*****************************************************************
diff --git a/diff.c b/diff.c
index d3734d3..a8308e0 100644
--- a/diff.c
+++ b/diff.c
@@ -2810,7 +2810,8 @@ int diff_populate_filespec(struct diff_filespec *s, unsigned int flags)
/*
* Convert from working tree format to canonical git format
*/
- if (convert_to_git(s->path, s->data, s->size, &buf, crlf_warn)) {
+ if (convert_to_git(s->path, s->data, s->size, &buf,
+ crlf_warn, NULL)) {
size_t size = 0;
munmap(s->data, s->size);
s->should_munmap = 0;
diff --git a/dir.c b/dir.c
index 6172b34..66f1ffb 100644
--- a/dir.c
+++ b/dir.c
@@ -712,7 +712,7 @@ static int add_excludes(const char *fname, const char *base, int baselen,
(pos = cache_name_pos(fname, strlen(fname))) >= 0 &&
!ce_stage(active_cache[pos]) &&
ce_uptodate(active_cache[pos]) &&
- !would_convert_to_git(fname))
+ !would_convert_to_git(fname, NULL))
hashcpy(sha1_stat->sha1, active_cache[pos]->sha1);
else
hash_sha1_file(buf, size, "blob", sha1_stat->sha1);
diff --git a/read-cache.c b/read-cache.c
index a3ef967..c109b6d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -163,7 +163,9 @@ static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
if (fd >= 0) {
unsigned char sha1[20];
- if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0))
+ unsigned flags = HASH_USE_SHA_NOT_PATH;
+ memcpy(sha1, ce->sha1, sizeof(sha1));
+ if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, flags))
match = hashcmp(sha1, ce->sha1);
/* index_fd() closed the file descriptor already */
}
diff --git a/sha1_file.c b/sha1_file.c
index d5e1121..10630f0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3273,6 +3273,7 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
{
int ret, re_allocated = 0;
int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
if (!type)
type = OBJ_BLOB;
@@ -3283,7 +3284,8 @@ static int index_mem(unsigned char *sha1, void *buf, size_t size,
if ((type == OBJ_BLOB) && path) {
struct strbuf nbuf = STRBUF_INIT;
if (convert_to_git(path, buf, size, &nbuf,
- write_object ? safe_crlf : SAFE_CRLF_FALSE)) {
+ write_object ? safe_crlf : SAFE_CRLF_FALSE,
+ valid_sha1 ? sha1 : NULL)) {
buf = strbuf_detach(&nbuf, &size);
re_allocated = 1;
}
@@ -3311,13 +3313,15 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd,
{
int ret;
const int write_object = flags & HASH_WRITE_OBJECT;
+ const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
struct strbuf sbuf = STRBUF_INIT;
assert(path);
assert(would_convert_to_git_filter_fd(path));
convert_to_git_filter_fd(path, fd, &sbuf,
- write_object ? safe_crlf : SAFE_CRLF_FALSE);
+ write_object ? safe_crlf : SAFE_CRLF_FALSE,
+ valid_sha1 ? sha1 : NULL);
if (write_object)
ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
@@ -3394,6 +3398,8 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
enum object_type type, const char *path, unsigned flags)
{
int ret;
+ const unsigned char *sha1_ce;
+ sha1_ce = flags & HASH_USE_SHA_NOT_PATH ? sha1 : NULL;
/*
* Call xsize_t() only when needed to avoid potentially unnecessary
@@ -3404,7 +3410,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
else if (!S_ISREG(st->st_mode))
ret = index_pipe(sha1, fd, type, path, flags);
else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
- (path && would_convert_to_git(path)))
+ (path && would_convert_to_git(path,sha1_ce)))
ret = index_core(sha1, fd, xsize_t(st->st_size), type, path,
flags);
else
diff --git a/t/t6038-merge-text-auto.sh b/t/t6038-merge-text-auto.sh
index 33b77ee..5e8d5fa 100755
--- a/t/t6038-merge-text-auto.sh
+++ b/t/t6038-merge-text-auto.sh
@@ -91,16 +91,13 @@ test_expect_success 'Merge after setting text=auto' '
compare_files expected file
'
-test_expect_success 'Merge addition of text=auto' '
+test_expect_success 'Merge addition of text=auto eol=LF' '
+ git config core.eol lf &&
cat <<-\EOF >expected &&
first line
same line
EOF
- if test_have_prereq NATIVE_CRLF; then
- append_cr <expected >expected.temp &&
- mv expected.temp expected
- fi &&
git config merge.renormalize true &&
git rm -fr . &&
rm -f .gitattributes &&
@@ -109,17 +106,31 @@ test_expect_success 'Merge addition of text=auto' '
compare_files expected file
'
+test_expect_success 'Merge addition of text=auto eol=CRLF' '
+ git config core.eol crlf &&
+ cat <<-\EOF >expected &&
+ first line
+ same line
+ EOF
+
+ append_cr <expected >expected.temp &&
+ mv expected.temp expected &&
+ git config merge.renormalize true &&
+ git rm -fr . &&
+ rm -f .gitattributes &&
+ git reset --hard b &&
+ echo >&2 "After git reset --hard b" &&
+ git ls-files -s --eol >&2 &&
+ git merge a &&
+ compare_files expected file
+'
+
test_expect_success 'Detect CRLF/LF conflict after setting text=auto' '
+ git config core.eol native &&
echo "<<<<<<<" >expected &&
- if test_have_prereq NATIVE_CRLF; then
- echo first line | append_cr >>expected &&
- echo same line | append_cr >>expected &&
- echo ======= | append_cr >>expected
- else
- echo first line >>expected &&
- echo same line >>expected &&
- echo ======= >>expected
- fi &&
+ echo first line >>expected &&
+ echo same line >>expected &&
+ echo ======= >>expected &&
echo first line | append_cr >>expected &&
echo same line | append_cr >>expected &&
echo ">>>>>>>" >>expected &&
@@ -135,15 +146,9 @@ test_expect_success 'Detect LF/CRLF conflict from addition of text=auto' '
echo "<<<<<<<" >expected &&
echo first line | append_cr >>expected &&
echo same line | append_cr >>expected &&
- if test_have_prereq NATIVE_CRLF; then
- echo ======= | append_cr >>expected &&
- echo first line | append_cr >>expected &&
- echo same line | append_cr >>expected
- else
- echo ======= >>expected &&
- echo first line >>expected &&
- echo same line >>expected
- fi &&
+ echo ======= >>expected &&
+ echo first line >>expected &&
+ echo same line >>expected &&
echo ">>>>>>>" >>expected &&
git config merge.renormalize false &&
rm -f .gitattributes &&
--
2.0.0.rc1.6318.g0c2c796
^ permalink raw reply related [flat|nested] 53+ messages in thread