All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] builtin/mv.c: use correct type to compute size of an array element
@ 2022-07-07  2:02 Junio C Hamano
  2022-07-07  5:52 ` [PATCH v2] builtin/mv.c: use the MOVE_ARRAY() macro instead of memmove() Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Junio C Hamano @ 2022-07-07  2:02 UTC (permalink / raw)
  To: git

The variables 'source', 'destination', and 'submodule_gitfile' are
all of type "const char **", and an element of such an array is of
"type const char *".

Noticed while running "make contrib/coccinelle/array.cocci.patch".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * There is this rule in the array.cocci file:

        @@
        type T;
        T *dst;
        T *src;
        expression n;
        @@
        (
        - memmove(dst, src, (n) * sizeof(*dst));
        + MOVE_ARRAY(dst, src, n);
        |
        - memmove(dst, src, (n) * sizeof(*src));
        + MOVE_ARRAY(dst, src, n);
        |
        - memmove(dst, src, (n) * sizeof(T));
        + MOVE_ARRAY(dst, src, n);
        )

   but it triggered only for modes[] array among the four parallel
   arrays that are being moved here.

   There are a few tangents.

   * Should we in general use sizeof(TYPE) in these cases, instead
     of the size of the zeroth element, e.g.

 		memmove(source + i, source + i + 1,
			n * sizeof(source[i]));
    
     It would have been caught by the above Coccinelle rule (we are
     taking the size of *dst).

   * Shouldn't we have an array of struct with four members, instead
     of four parallel arrays, e.g.

		struct {
			const char *source;
			const char *destination;
			enum update_mode mode;
			const char *submodule_gitfile;
		} *mv_file;

   The latter question is important to answer before we accept
   Coccinelle-suggested rewrite to use four MOVE_ARRAY() on these
   four parallel arrays on top of this fix.

 builtin/mv.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git c/builtin/mv.c w/builtin/mv.c
index 2a38e2af46..d419e83f0f 100644
--- c/builtin/mv.c
+++ w/builtin/mv.c
@@ -377,13 +377,13 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 		if (--argc > 0) {
 			int n = argc - i;
 			memmove(source + i, source + i + 1,
-				n * sizeof(char *));
+				n * sizeof(const char *));
 			memmove(destination + i, destination + i + 1,
-				n * sizeof(char *));
+				n * sizeof(const char *));
 			memmove(modes + i, modes + i + 1,
 				n * sizeof(enum update_mode));
 			memmove(submodule_gitfile + i, submodule_gitfile + i + 1,
-				n * sizeof(char *));
+				n * sizeof(const char *));
 			i--;
 		}
 	}

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2] builtin/mv.c: use the MOVE_ARRAY() macro instead of memmove()
  2022-07-07  2:02 [PATCH] builtin/mv.c: use correct type to compute size of an array element Junio C Hamano
@ 2022-07-07  5:52 ` Junio C Hamano
  2022-07-10  1:33   ` [PATCH v3] " Junio C Hamano
  2022-07-07 12:11 ` [PATCH] builtin/mv.c: use correct type to compute size of an array element Ævar Arnfjörð Bjarmason
  2022-07-07 18:42 ` Jeff King
  2 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2022-07-07  5:52 UTC (permalink / raw)
  To: git

The variables 'source', 'destination', and 'submodule_gitfile' are
all of type "const char **", and an element of such an array is of
"type const char *", but these memmove() calls were written as if
these variables are of type "char **".

Once these memmove() calls are fixed to use the correct type to
compute the number of bytes to be moved, e.g.

-      memmove(source + i, source + i + 1, n * sizeof(char *));
+      memmove(source + i, source + i + 1, n * sizeof(const char *));

existing contrib/coccinelle/array.cocci rules can recognize them as
candidates for turning into MOVE_ARRAY().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/mv.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/builtin/mv.c b/builtin/mv.c
index 83a465ba83..d599233057 100644
--- a/builtin/mv.c
+++ b/builtin/mv.c
@@ -282,14 +282,12 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 remove_entry:
 		if (--argc > 0) {
 			int n = argc - i;
-			memmove(source + i, source + i + 1,
-				n * sizeof(char *));
-			memmove(destination + i, destination + i + 1,
-				n * sizeof(char *));
+			MOVE_ARRAY(source + i, source + i + 1, n);
+			MOVE_ARRAY(destination + i, destination + i + 1, n);
 			memmove(modes + i, modes + i + 1,
 				n * sizeof(enum update_mode));
-			memmove(submodule_gitfile + i, submodule_gitfile + i + 1,
-				n * sizeof(char *));
+			MOVE_ARRAY(submodule_gitfile + i,
+				   submodule_gitfile + i + 1, n);
 			i--;
 		}
 	}
-- 
2.37.0-211-gafcdf5f063


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] builtin/mv.c: use correct type to compute size of an array element
  2022-07-07  2:02 [PATCH] builtin/mv.c: use correct type to compute size of an array element Junio C Hamano
  2022-07-07  5:52 ` [PATCH v2] builtin/mv.c: use the MOVE_ARRAY() macro instead of memmove() Junio C Hamano
@ 2022-07-07 12:11 ` Ævar Arnfjörð Bjarmason
  2022-07-07 18:10   ` Junio C Hamano
  2022-07-07 18:27   ` [PATCH] builtin/mv.c: use correct type to compute size of an array element René Scharfe
  2022-07-07 18:42 ` Jeff King
  2 siblings, 2 replies; 18+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-07 12:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe


On Wed, Jul 06 2022, Junio C Hamano wrote:

> The variables 'source', 'destination', and 'submodule_gitfile' are
> all of type "const char **", and an element of such an array is of
> "type const char *".
>
> Noticed while running "make contrib/coccinelle/array.cocci.patch".
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>
>  * There is this rule in the array.cocci file:
>
>         @@
>         type T;
>         T *dst;
>         T *src;
>         expression n;
>         @@
>         (
>         - memmove(dst, src, (n) * sizeof(*dst));
>         + MOVE_ARRAY(dst, src, n);
>         |
>         - memmove(dst, src, (n) * sizeof(*src));
>         + MOVE_ARRAY(dst, src, n);
>         |
>         - memmove(dst, src, (n) * sizeof(T));
>         + MOVE_ARRAY(dst, src, n);
>         )
>
>    but it triggered only for modes[] array among the four parallel
>    arrays that are being moved here.
>
>    There are a few tangents.
>
>    * Should we in general use sizeof(TYPE) in these cases, instead
>      of the size of the zeroth element, e.g.
>
>  		memmove(source + i, source + i + 1,
> 			n * sizeof(source[i]));
>     
>      It would have been caught by the above Coccinelle rule (we are
>      taking the size of *dst).
>
>    * Shouldn't we have an array of struct with four members, instead
>      of four parallel arrays, e.g.
>
> 		struct {
> 			const char *source;
> 			const char *destination;
> 			enum update_mode mode;
> 			const char *submodule_gitfile;
> 		} *mv_file;
>
>    The latter question is important to answer before we accept
>    Coccinelle-suggested rewrite to use four MOVE_ARRAY() on these
>    four parallel arrays on top of this fix.

This seems to be the same sort case as noted in 7bd97d6dff3 (git: use
COPY_ARRAY and MOVE_ARRAY in handle_alias(), 2019-09-19).

I tried (going aginst the warnings in that commit about the
non-generality) updating the rules to catch these cases, which yielded
the below. I wonder if we should apply some version of it. I.e. one-off
fix these, and perhaps have the cocci rule BUG() if we see this sort of
code again (the form technically could be used, but it seems all our
uses of it are ones we could convert to the simpler one...).

diff --git a/add-patch.c b/add-patch.c
index 509ca04456b..eff338d3901 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -915,10 +915,9 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
 	file_diff->hunk_nr += splittable_into - 1;
 	ALLOC_GROW(file_diff->hunk, file_diff->hunk_nr, file_diff->hunk_alloc);
 	if (hunk_index + splittable_into < file_diff->hunk_nr)
-		memmove(file_diff->hunk + hunk_index + splittable_into,
-			file_diff->hunk + hunk_index + 1,
-			(file_diff->hunk_nr - hunk_index - splittable_into)
-			* sizeof(*hunk));
+		MOVE_ARRAY(file_diff->hunk + hunk_index + splittable_into,
+			   file_diff->hunk + hunk_index + 1,
+			   file_diff->hunk_nr - hunk_index - splittable_into);
 	hunk = file_diff->hunk + hunk_index;
 	hunk->splittable_into = 1;
 	memset(hunk + 1, 0, (splittable_into - 1) * sizeof(*hunk));
diff --git a/builtin/mv.c b/builtin/mv.c
index 83a465ba831..f6187d1264a 100644
--- a/builtin/mv.c
+++ b/builtin/mv.c
@@ -282,14 +282,11 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 remove_entry:
 		if (--argc > 0) {
 			int n = argc - i;
-			memmove(source + i, source + i + 1,
-				n * sizeof(char *));
-			memmove(destination + i, destination + i + 1,
-				n * sizeof(char *));
-			memmove(modes + i, modes + i + 1,
-				n * sizeof(enum update_mode));
-			memmove(submodule_gitfile + i, submodule_gitfile + i + 1,
-				n * sizeof(char *));
+			MOVE_ARRAY(source + i, source + i + 1, n);
+			MOVE_ARRAY(destination + i, destination + i + 1, n);
+			MOVE_ARRAY(modes + i, modes + i + 1, n);
+			MOVE_ARRAY(submodule_gitfile + i,
+				   submodule_gitfile + i + 1, n);
 			i--;
 		}
 	}
diff --git a/commit.c b/commit.c
index 1fb1b2ea90c..c23d3e3678f 100644
--- a/commit.c
+++ b/commit.c
@@ -151,10 +151,9 @@ int register_commit_graft(struct repository *r, struct commit_graft *graft,
 		   r->parsed_objects->grafts_alloc);
 	r->parsed_objects->grafts_nr++;
 	if (pos < r->parsed_objects->grafts_nr)
-		memmove(r->parsed_objects->grafts + pos + 1,
-			r->parsed_objects->grafts + pos,
-			(r->parsed_objects->grafts_nr - pos - 1) *
-			sizeof(*r->parsed_objects->grafts));
+		MOVE_ARRAY(r->parsed_objects->grafts + pos + 1,
+			   r->parsed_objects->grafts + pos,
+			   r->parsed_objects->grafts_nr - pos - 1);
 	r->parsed_objects->grafts[pos] = graft;
 	unparse_commit(r, &graft->oid);
 	return 0;
diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci
index 9a4f00cb1bd..988ff9a3fda 100644
--- a/contrib/coccinelle/array.cocci
+++ b/contrib/coccinelle/array.cocci
@@ -73,6 +73,15 @@ expression n;
 + MOVE_ARRAY(dst, src, n);
 )
 
+@@
+expression D;
+expression S;
+expression N;
+expression Z;
+@@
+- memmove(D, S, (N) * Z);
++ MOVE_ARRAY(D, S, N);
+
 @@
 type T;
 T *ptr;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] builtin/mv.c: use correct type to compute size of an array element
  2022-07-07 12:11 ` [PATCH] builtin/mv.c: use correct type to compute size of an array element Ævar Arnfjörð Bjarmason
@ 2022-07-07 18:10   ` Junio C Hamano
  2022-07-07 19:11     ` René Scharfe
  2022-07-07 18:27   ` [PATCH] builtin/mv.c: use correct type to compute size of an array element René Scharfe
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2022-07-07 18:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, René Scharfe

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Wed, Jul 06 2022, Junio C Hamano wrote:
>
>> The variables 'source', 'destination', and 'submodule_gitfile' are
>> all of type "const char **", and an element of such an array is of
>> "type const char *".
>>
>> Noticed while running "make contrib/coccinelle/array.cocci.patch".
>>
>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>> ---
>>
>>  * There is this rule in the array.cocci file:
>>
>>         @@
>>         type T;
>>         T *dst;
>>         T *src;
>>         expression n;
>>         @@
>>         (
>>         - memmove(dst, src, (n) * sizeof(*dst));
>>         + MOVE_ARRAY(dst, src, n);
>>         |
>>         - memmove(dst, src, (n) * sizeof(*src));
>>         + MOVE_ARRAY(dst, src, n);
>>         |
>>         - memmove(dst, src, (n) * sizeof(T));
>>         + MOVE_ARRAY(dst, src, n);
>>         )
>>
>>    but it triggered only for modes[] array among the four parallel
>>    arrays that are being moved here.
>>
> This seems to be the same sort case as noted in 7bd97d6dff3 (git: use
> COPY_ARRAY and MOVE_ARRAY in handle_alias(), 2019-09-19).

"This" here means "sizeof(const T) and sizeof(T) are the same and
our Cocci rules do not trigger when a wrong one is used", and I
agree that is exactly the same issue as fixed by 7bd97d6dff3.

> I tried (going aginst the warnings in that commit about the
> non-generality) updating the rules to catch these cases, which yielded
> the below.

> I wonder if we should apply some version of it. I.e. one-off
> fix these, and perhaps have the cocci rule BUG() if we see this sort of
> code again (the form technically could be used, but it seems all our
> uses of it are ones we could convert to the simpler one...).

One off rewrite using an ultra-loose rule, with human vetting the
result, may be a good idea---after all, we are encouraging our
developers to use MOVE_ARRAY() when appropriate, so it does not
exactly matter how you discovered a memmove() can be rewritten
safely to MOVE_ARRAY() as long as the resulting patch is correct.

It is a different story to add a loose automation that is designed
to produce incorrect rewrite, and expects that humans always vet the
outcome.  Given that we run these rules at CI (doesn't it block GGG
submitters?), it is a bad idea.

> diff --git a/add-patch.c b/add-patch.c
> index 509ca04456b..eff338d3901 100644
> --- a/add-patch.c
> +++ b/add-patch.c
> @@ -915,10 +915,9 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
>  	file_diff->hunk_nr += splittable_into - 1;
>  	ALLOC_GROW(file_diff->hunk, file_diff->hunk_nr, file_diff->hunk_alloc);
>  	if (hunk_index + splittable_into < file_diff->hunk_nr)
> -		memmove(file_diff->hunk + hunk_index + splittable_into,
> -			file_diff->hunk + hunk_index + 1,
> -			(file_diff->hunk_nr - hunk_index - splittable_into)
> -			* sizeof(*hunk));
> +		MOVE_ARRAY(file_diff->hunk + hunk_index + splittable_into,
> +			   file_diff->hunk + hunk_index + 1,
> +			   file_diff->hunk_nr - hunk_index - splittable_into);

This does not look all that more readable, unfortunately.

> diff --git a/builtin/mv.c b/builtin/mv.c
> index 83a465ba831..f6187d1264a 100644
> --- a/builtin/mv.c
> +++ b/builtin/mv.c
> @@ -282,14 +282,11 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  remove_entry:
>  		if (--argc > 0) {
>  			int n = argc - i;
> -			memmove(source + i, source + i + 1,
> -				n * sizeof(char *));
> -			memmove(destination + i, destination + i + 1,
> -				n * sizeof(char *));
> -			memmove(modes + i, modes + i + 1,
> -				n * sizeof(enum update_mode));
> -			memmove(submodule_gitfile + i, submodule_gitfile + i + 1,
> -				n * sizeof(char *));
> +			MOVE_ARRAY(source + i, source + i + 1, n);
> +			MOVE_ARRAY(destination + i, destination + i + 1, n);
> +			MOVE_ARRAY(modes + i, modes + i + 1, n);
> +			MOVE_ARRAY(submodule_gitfile + i,
> +				   submodule_gitfile + i + 1, n);

This is exactly what I sent.  I guess the change in this hunk is not
all that different from add-patch.c but somehow it makes it much
easier to read.  Perhaps that is only because these moves are much
simpler (i.e. shift by 1).

> diff --git a/commit.c b/commit.c
> index 1fb1b2ea90c..c23d3e3678f 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -151,10 +151,9 @@ int register_commit_graft(struct repository *r, struct commit_graft *graft,
>  		   r->parsed_objects->grafts_alloc);
>  	r->parsed_objects->grafts_nr++;
>  	if (pos < r->parsed_objects->grafts_nr)
> -		memmove(r->parsed_objects->grafts + pos + 1,
> -			r->parsed_objects->grafts + pos,
> -			(r->parsed_objects->grafts_nr - pos - 1) *
> -			sizeof(*r->parsed_objects->grafts));
> +		MOVE_ARRAY(r->parsed_objects->grafts + pos + 1,
> +			   r->parsed_objects->grafts + pos,
> +			   r->parsed_objects->grafts_nr - pos - 1);

Likewise.

> diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci
> index 9a4f00cb1bd..988ff9a3fda 100644
> --- a/contrib/coccinelle/array.cocci
> +++ b/contrib/coccinelle/array.cocci
> @@ -73,6 +73,15 @@ expression n;
>  + MOVE_ARRAY(dst, src, n);
>  )
>  
> +@@
> +expression D;
> +expression S;
> +expression N;
> +expression Z;
> +@@
> +- memmove(D, S, (N) * Z);
> ++ MOVE_ARRAY(D, S, N);

This is definitely unwelcome, as there is nothing that ensures Z has
the same value as sizeof(D[0]).

While our eyes are on array.cocci, I have a few observations on it.

This is not meant specifically to you, Ævar, but comments by those
more familiar with Coccinelle (and I am sure the bar to pass is
fairly low, as I am not all that familiar) are very much
appreciated.

    @@
    expression dst, src, n, E;
    @@
      memcpy(dst, src, n * sizeof(
    - E[...]
    + *(E)
      ))

This seems to force us prefer sizeof(*(E)) over sizeof(E[i]), when
it is used to compute the byte size of memcpy() operation.  There is
no reason to prefer one over the other, but I presume it is there
only for convenience for the other rules in this file (I recall
vaguely reading somewhere that these rules do not "execute" from top
to bottom, so I wonder how effective it is?).

    @@
    type T;
    T *ptr;
    T[] arr;
    expression E, n;
    @@
    (
      memcpy(ptr, E,
    - n * sizeof(*(ptr))
    + n * sizeof(T)
      )
    |
      memcpy(arr, E,
    - n * sizeof(*(arr))
    + n * sizeof(T)
      )
    |
      memcpy(E, ptr,
    - n * sizeof(*(ptr))
    + n * sizeof(T)
      )
    |
      memcpy(E, arr,
    - n * sizeof(*(arr))
    + n * sizeof(T)
      )
    )

Likewise, but this one is a lot worse.

Taken alone, sizeof(*(ptr)) is far more preferrable than sizeof(T),
because the code will be more maintainable.

    Side Note.  I know builtin/mv.c had this type mismatch between
    the variable and sizeof() from the beginning when 11be42a4 (Make
    git-mv a builtin, 2006-07-26) introduced both the variable
    declaration "const char **source" and memmove() on it, but a
    code that starts out with "char **src" with memmove() that moves
    part of src[] and uses sizeof(char *) to compute the byte size
    of the move would become broken the same way when a later
    developer tightens the declaration to use "const char **src"
    without realizing that they have to update the type used in
    sizeof().

So even though I am guessing that this is to allow the later rules
to worry only about sizeof(T), I am a bit unhappy to see the rule.
If an existing code matched this rule and get rewritten to use
sizeof(T), not sizeof(*(ptr)), but did not match the other rules to
be rewritten to use COPY_ARRAY(), the overall effect would be that
the automation made the code worse.

    @@
    type T;
    T *dst_ptr;
    T *src_ptr;
    T[] dst_arr;
    T[] src_arr;
    expression n;
    @@
    (
    - memcpy(dst_ptr, src_ptr, (n) * sizeof(T))
    + COPY_ARRAY(dst_ptr, src_ptr, n)
    |
    - memcpy(dst_ptr, src_arr, (n) * sizeof(T))
    + COPY_ARRAY(dst_ptr, src_arr, n)
    |
    - memcpy(dst_arr, src_ptr, (n) * sizeof(T))
    + COPY_ARRAY(dst_arr, src_ptr, n)
    |
    - memcpy(dst_arr, src_arr, (n) * sizeof(T))
    + COPY_ARRAY(dst_arr, src_arr, n)
    )

I take it that thanks to the earlier "meh -- between sizeof(*p) and
sizeof(p[0]) there is no reason to prefer one over the other" and
"oh, no, we should prefer sizeof(*p) not sizeof(typeof(*p)) but this
one is the other way around" rules, this one only has to deal with
sizeof(T).

Am I reading it correctly?

    @@
    type T;
    T *dst;
    T *src;
    expression n;
    @@
    (
    - memmove(dst, src, (n) * sizeof(*dst));
    + MOVE_ARRAY(dst, src, n);
    |
    - memmove(dst, src, (n) * sizeof(*src));
    + MOVE_ARRAY(dst, src, n);
    |
    - memmove(dst, src, (n) * sizeof(T));
    + MOVE_ARRAY(dst, src, n);
    )

What I find interesting is that this one seems to be able to do the
necessary rewrite without having to do the "turn everything into
sizeof(T) first" trick.  If this approach works well, I'd rather see
the COPY_ARRAY() done without the first two preliminary rewrite
rules.

I wonder if the pattern in the first rule catches sizeof(dst[0])
instead of sizeof(*dst), though.

    @@
    type T;
    T *ptr;
    expression n;
    @@
    - ptr = xmalloc((n) * sizeof(*ptr));
    + ALLOC_ARRAY(ptr, n);

    @@
    type T;
    T *ptr;
    expression n;
    @@
    - ptr = xmalloc((n) * sizeof(T));
    + ALLOC_ARRAY(ptr, n);

Is it a no-op rewrite if we replace the above two rules with
something like:

.   @@
.   type T;
.   T *ptr;
.   expression n;
.   @@
.   (
.   - ptr = xmalloc((n) * sizeof(*ptr));
.   + ALLOC_ARRAY(ptr, n);
.   |
.   - ptr = xmalloc((n) * sizeof(T));
.   + ALLOC_ARRAY(ptr, n);
.   )

or even following the pattern of the next one ...

.   @@
.   type T;
.   T *ptr;
.   expression n;
.   @@
.   - ptr = xmalloc((n) * \( sizeof(*ptr) \| sizeof(T) \))
.   + ALLOC_ARRAY(ptr, n);

... I have to wonder?  I like the simplicity of this pattern.

    @@
    type T;
    T *ptr;
    expression n != 1;
    @@
    - ptr = xcalloc(n, \( sizeof(*ptr) \| sizeof(T) \) )
    + CALLOC_ARRAY(ptr, n)

And this leaves xcalloc(1, ...) alone because it is a way to get a
cleared block of memory that may not be an array at all.  Shouldn't
we have "n != 1" for xmalloc rule as well, I wonder, if only for
consistency?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] builtin/mv.c: use correct type to compute size of an array element
  2022-07-07 12:11 ` [PATCH] builtin/mv.c: use correct type to compute size of an array element Ævar Arnfjörð Bjarmason
  2022-07-07 18:10   ` Junio C Hamano
@ 2022-07-07 18:27   ` René Scharfe
  1 sibling, 0 replies; 18+ messages in thread
From: René Scharfe @ 2022-07-07 18:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, Junio C Hamano; +Cc: git

Am 07.07.22 um 14:11 schrieb Ævar Arnfjörð Bjarmason:
>
> On Wed, Jul 06 2022, Junio C Hamano wrote:
>
>> The variables 'source', 'destination', and 'submodule_gitfile' are
>> all of type "const char **", and an element of such an array is of
>> "type const char *".
>>
>> Noticed while running "make contrib/coccinelle/array.cocci.patch".
>>
>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>> ---
>>
>>  * There is this rule in the array.cocci file:
>>
>>         @@
>>         type T;
>>         T *dst;
>>         T *src;
>>         expression n;
>>         @@
>>         (
>>         - memmove(dst, src, (n) * sizeof(*dst));
>>         + MOVE_ARRAY(dst, src, n);
>>         |
>>         - memmove(dst, src, (n) * sizeof(*src));
>>         + MOVE_ARRAY(dst, src, n);
>>         |
>>         - memmove(dst, src, (n) * sizeof(T));
>>         + MOVE_ARRAY(dst, src, n);
>>         )
>>
>>    but it triggered only for modes[] array among the four parallel
>>    arrays that are being moved here.
>>
>>    There are a few tangents.
>>
>>    * Should we in general use sizeof(TYPE) in these cases, instead
>>      of the size of the zeroth element, e.g.
>>
>>  		memmove(source + i, source + i + 1,
>> 			n * sizeof(source[i]));
>>
>>      It would have been caught by the above Coccinelle rule (we are
>>      taking the size of *dst).

You mean using normalization rules for memmove like we have for memcpy,
to cover a wider range of combinations without having explicit rules for
each one?  Makes sense -- memcpy and memmove take the same arguments, so
we should treat them the same.

>>
>>    * Shouldn't we have an array of struct with four members, instead
>>      of four parallel arrays, e.g.
>>
>> 		struct {
>> 			const char *source;
>> 			const char *destination;
>> 			enum update_mode mode;
>> 			const char *submodule_gitfile;
>> 		} *mv_file;

Possibly, but that would require restructuring the code.  Currently the
function internal_prefix_pathspec() is used to build the string arrays
source and destination separately.

>>
>>    The latter question is important to answer before we accept
>>    Coccinelle-suggested rewrite to use four MOVE_ARRAY() on these
>>    four parallel arrays on top of this fix.
>
> This seems to be the same sort case as noted in 7bd97d6dff3 (git: use
> COPY_ARRAY and MOVE_ARRAY in handle_alias(), 2019-09-19).
>
> I tried (going aginst the warnings in that commit about the
> non-generality) updating the rules to catch these cases, which yielded
> the below. I wonder if we should apply some version of it. I.e. one-off
> fix these, and perhaps have the cocci rule BUG() if we see this sort of
> code again (the form technically could be used, but it seems all our
> uses of it are ones we could convert to the simpler one...).

Having a loose rule and inspecting its results carefully can work, but
having such a rule published and run by CI pipelines etc. risks adding
defects and false positives.  I'm not sure it's much of a win to come
up with a loose Coccinelle rule instead of grepping and hand-editing if
great care needs to be taken in both cases.

>
> diff --git a/add-patch.c b/add-patch.c
> index 509ca04456b..eff338d3901 100644
> --- a/add-patch.c
> +++ b/add-patch.c
> @@ -915,10 +915,9 @@ static int split_hunk(struct add_p_state *s, struct file_diff *file_diff,
>  	file_diff->hunk_nr += splittable_into - 1;
>  	ALLOC_GROW(file_diff->hunk, file_diff->hunk_nr, file_diff->hunk_alloc);
>  	if (hunk_index + splittable_into < file_diff->hunk_nr)
> -		memmove(file_diff->hunk + hunk_index + splittable_into,
> -			file_diff->hunk + hunk_index + 1,
> -			(file_diff->hunk_nr - hunk_index - splittable_into)
> -			* sizeof(*hunk));
> +		MOVE_ARRAY(file_diff->hunk + hunk_index + splittable_into,
> +			   file_diff->hunk + hunk_index + 1,
> +			   file_diff->hunk_nr - hunk_index - splittable_into);

OK.

>  	hunk = file_diff->hunk + hunk_index;
>  	hunk->splittable_into = 1;
>  	memset(hunk + 1, 0, (splittable_into - 1) * sizeof(*hunk));
> diff --git a/builtin/mv.c b/builtin/mv.c
> index 83a465ba831..f6187d1264a 100644
> --- a/builtin/mv.c
> +++ b/builtin/mv.c
> @@ -282,14 +282,11 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
>  remove_entry:
>  		if (--argc > 0) {
>  			int n = argc - i;
> -			memmove(source + i, source + i + 1,
> -				n * sizeof(char *));
> -			memmove(destination + i, destination + i + 1,
> -				n * sizeof(char *));
> -			memmove(modes + i, modes + i + 1,
> -				n * sizeof(enum update_mode));
> -			memmove(submodule_gitfile + i, submodule_gitfile + i + 1,
> -				n * sizeof(char *));
> +			MOVE_ARRAY(source + i, source + i + 1, n);
> +			MOVE_ARRAY(destination + i, destination + i + 1, n);
> +			MOVE_ARRAY(modes + i, modes + i + 1, n);
> +			MOVE_ARRAY(submodule_gitfile + i,
> +				   submodule_gitfile + i + 1, n);

A simpler approach could be to mark the entries as invalid somehow and
skip them in the final loop instead of compacting the array like that.
We don't shrink the allocation anyway, so there's no need to move half
the entries on average on each such a delete.

>  			i--;
>  		}
>  	}
> diff --git a/commit.c b/commit.c
> index 1fb1b2ea90c..c23d3e3678f 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -151,10 +151,9 @@ int register_commit_graft(struct repository *r, struct commit_graft *graft,
>  		   r->parsed_objects->grafts_alloc);
>  	r->parsed_objects->grafts_nr++;
>  	if (pos < r->parsed_objects->grafts_nr)
> -		memmove(r->parsed_objects->grafts + pos + 1,
> -			r->parsed_objects->grafts + pos,
> -			(r->parsed_objects->grafts_nr - pos - 1) *
> -			sizeof(*r->parsed_objects->grafts));
> +		MOVE_ARRAY(r->parsed_objects->grafts + pos + 1,
> +			   r->parsed_objects->grafts + pos,
> +			   r->parsed_objects->grafts_nr - pos - 1);

Is that -1 correct?  Yes, it's needed because grafts_nr is incremented
already above, before making room and actually adding the the new item.
Confused me for a minute.

>  	r->parsed_objects->grafts[pos] = graft;
>  	unparse_commit(r, &graft->oid);
>  	return 0;
> diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci
> index 9a4f00cb1bd..988ff9a3fda 100644
> --- a/contrib/coccinelle/array.cocci
> +++ b/contrib/coccinelle/array.cocci
> @@ -73,6 +73,15 @@ expression n;
>  + MOVE_ARRAY(dst, src, n);
>  )
>
> +@@
> +expression D;
> +expression S;
> +expression N;
> +expression Z;
> +@@
> +- memmove(D, S, (N) * Z);
> ++ MOVE_ARRAY(D, S, N);
> +
>  @@
>  type T;
>  T *ptr;

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] builtin/mv.c: use correct type to compute size of an array element
  2022-07-07  2:02 [PATCH] builtin/mv.c: use correct type to compute size of an array element Junio C Hamano
  2022-07-07  5:52 ` [PATCH v2] builtin/mv.c: use the MOVE_ARRAY() macro instead of memmove() Junio C Hamano
  2022-07-07 12:11 ` [PATCH] builtin/mv.c: use correct type to compute size of an array element Ævar Arnfjörð Bjarmason
@ 2022-07-07 18:42 ` Jeff King
  2022-07-07 20:25   ` Junio C Hamano
  2 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2022-07-07 18:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Jul 06, 2022 at 07:02:18PM -0700, Junio C Hamano wrote:

>    * Should we in general use sizeof(TYPE) in these cases, instead
>      of the size of the zeroth element, e.g.
> 
>  		memmove(source + i, source + i + 1,
> 			n * sizeof(source[i]));
>     
>      It would have been caught by the above Coccinelle rule (we are
>      taking the size of *dst).

I'm not sure I understand this. As you noted in a later email, using
sizeof(TYPE) is less maintainable if the type of "source" changes. But
later you mention using "*source" instead of "source[i]". I don't think
there is a particular reason to prefer one over the other from the
compiler perspective. I find "*source" more idiomatic (but better still
of course is MOVE_ARRAY, which removes the choice entirely).

>    * Shouldn't we have an array of struct with four members, instead
>      of four parallel arrays, e.g.
> 
> 		struct {
> 			const char *source;
> 			const char *destination;
> 			enum update_mode mode;
> 			const char *submodule_gitfile;
> 		} *mv_file;
> 
>    The latter question is important to answer before we accept
>    Coccinelle-suggested rewrite to use four MOVE_ARRAY() on these
>    four parallel arrays on top of this fix.

I think that would make the code a lot cleaner. But it looks like
"source" and "destination" come from separate calls to
internal_prefix_pathspec().  So you'd have to reconcile that. And
there's some extra trickiness that sometimes "destination" comes from
"dest_path", which _isn't_ always the same size as "source".

So I suspect the code which uses these arrays would be cleaner with a
struct, but the setup may get worse. :)

-Peff

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] builtin/mv.c: use correct type to compute size of an array element
  2022-07-07 18:10   ` Junio C Hamano
@ 2022-07-07 19:11     ` René Scharfe
  2022-07-09  8:16       ` René Scharfe
  0 siblings, 1 reply; 18+ messages in thread
From: René Scharfe @ 2022-07-07 19:11 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason; +Cc: git

Am 07.07.22 um 20:10 schrieb Junio C Hamano:
> While our eyes are on array.cocci, I have a few observations on it.
>
> This is not meant specifically to you, Ævar, but comments by those
> more familiar with Coccinelle (and I am sure the bar to pass is
> fairly low, as I am not all that familiar) are very much
> appreciated.
>
>     @@
>     expression dst, src, n, E;
>     @@
>       memcpy(dst, src, n * sizeof(
>     - E[...]
>     + *(E)
>       ))
>
> This seems to force us prefer sizeof(*(E)) over sizeof(E[i]), when
> it is used to compute the byte size of memcpy() operation.  There is
> no reason to prefer one over the other, but I presume it is there
> only for convenience for the other rules in this file (I recall
> vaguely reading somewhere that these rules do not "execute" from top
> to bottom, so I wonder how effective it is?).

It halves the number of syntax variants to deal with.

>
>     @@
>     type T;
>     T *ptr;
>     T[] arr;
>     expression E, n;
>     @@
>     (
>       memcpy(ptr, E,
>     - n * sizeof(*(ptr))
>     + n * sizeof(T)
>       )
>     |
>       memcpy(arr, E,
>     - n * sizeof(*(arr))
>     + n * sizeof(T)
>       )
>     |
>       memcpy(E, ptr,
>     - n * sizeof(*(ptr))
>     + n * sizeof(T)
>       )
>     |
>       memcpy(E, arr,
>     - n * sizeof(*(arr))
>     + n * sizeof(T)
>       )
>     )
>
> Likewise, but this one is a lot worse.
>
> Taken alone, sizeof(*(ptr)) is far more preferrable than sizeof(T),
> because the code will be more maintainable.
>
>     Side Note.  I know builtin/mv.c had this type mismatch between
>     the variable and sizeof() from the beginning when 11be42a4 (Make
>     git-mv a builtin, 2006-07-26) introduced both the variable
>     declaration "const char **source" and memmove() on it, but a
>     code that starts out with "char **src" with memmove() that moves
>     part of src[] and uses sizeof(char *) to compute the byte size
>     of the move would become broken the same way when a later
>     developer tightens the declaration to use "const char **src"
>     without realizing that they have to update the type used in
>     sizeof().
>
> So even though I am guessing that this is to allow the later rules
> to worry only about sizeof(T), I am a bit unhappy to see the rule.
> If an existing code matched this rule and get rewritten to use
> sizeof(T), not sizeof(*(ptr)), but did not match the other rules to
> be rewritten to use COPY_ARRAY(), the overall effect would be that
> the automation made the code worse.

True.

>
>     @@
>     type T;
>     T *dst_ptr;
>     T *src_ptr;
>     T[] dst_arr;
>     T[] src_arr;
>     expression n;
>     @@
>     (
>     - memcpy(dst_ptr, src_ptr, (n) * sizeof(T))
>     + COPY_ARRAY(dst_ptr, src_ptr, n)
>     |
>     - memcpy(dst_ptr, src_arr, (n) * sizeof(T))
>     + COPY_ARRAY(dst_ptr, src_arr, n)
>     |
>     - memcpy(dst_arr, src_ptr, (n) * sizeof(T))
>     + COPY_ARRAY(dst_arr, src_ptr, n)
>     |
>     - memcpy(dst_arr, src_arr, (n) * sizeof(T))
>     + COPY_ARRAY(dst_arr, src_arr, n)
>     )
>
> I take it that thanks to the earlier "meh -- between sizeof(*p) and
> sizeof(p[0]) there is no reason to prefer one over the other" and
> "oh, no, we should prefer sizeof(*p) not sizeof(typeof(*p)) but this
> one is the other way around" rules, this one only has to deal with
> sizeof(T).
>
> Am I reading it correctly?

Yes.  Without the ugly normalization step in the middle could either
use twelve cases instead of four here or use inline alternatives,
e.g.:

type T;
T *dst_ptr;
T *src_ptr;
T[] dst_arr;
T[] src_arr;
expression n;
@@
(
- memcpy(dst_ptr, src_ptr, (n) * \( sizeof(*(dst_ptr)) \| sizeof(*(src_ptr)) \| sizeof(T) \) )
+ COPY_ARRAY(dst_ptr, src_ptr, n)
|
- memcpy(dst_ptr, src_arr, (n) * \( sizeof(*(dst_ptr)) \| sizeof(*(src_arr)) \| sizeof(T) \) )
+ COPY_ARRAY(dst_ptr, src_arr, n)
|
- memcpy(dst_arr, src_ptr, (n) * \( sizeof(*(dst_arr)) \| sizeof(*(src_ptr)) \| sizeof(T) \) )
+ COPY_ARRAY(dst_arr, src_ptr, n)
|
- memcpy(dst_arr, src_arr, (n) * \( sizeof(*(dst_arr)) \| sizeof(*(src_arr)) \| sizeof(T) \) )
+ COPY_ARRAY(dst_arr, src_arr, n)
)

I seem to remember that rules like this missed some cases, but perhaps
that's no longer an issue with the latest Coccinelle version?

>
>     @@
>     type T;
>     T *dst;
>     T *src;
>     expression n;
>     @@
>     (
>     - memmove(dst, src, (n) * sizeof(*dst));
>     + MOVE_ARRAY(dst, src, n);
>     |
>     - memmove(dst, src, (n) * sizeof(*src));
>     + MOVE_ARRAY(dst, src, n);
>     |
>     - memmove(dst, src, (n) * sizeof(T));
>     + MOVE_ARRAY(dst, src, n);
>     )
>
> What I find interesting is that this one seems to be able to do the
> necessary rewrite without having to do the "turn everything into
> sizeof(T) first" trick.  If this approach works well, I'd rather see
> the COPY_ARRAY() done without the first two preliminary rewrite
> rules.

It doesn't support arrays (T[]).  That doesn't matter in practice
because we don't have such cases (yet?).

>
> I wonder if the pattern in the first rule catches sizeof(dst[0])
> instead of sizeof(*dst), though.

It doesn't.

>
>     @@
>     type T;
>     T *ptr;
>     expression n;
>     @@
>     - ptr = xmalloc((n) * sizeof(*ptr));
>     + ALLOC_ARRAY(ptr, n);
>
>     @@
>     type T;
>     T *ptr;
>     expression n;
>     @@
>     - ptr = xmalloc((n) * sizeof(T));
>     + ALLOC_ARRAY(ptr, n);
>
> Is it a no-op rewrite if we replace the above two rules with
> something like:
>
> .   @@
> .   type T;
> .   T *ptr;
> .   expression n;
> .   @@
> .   (
> .   - ptr = xmalloc((n) * sizeof(*ptr));
> .   + ALLOC_ARRAY(ptr, n);
> .   |
> .   - ptr = xmalloc((n) * sizeof(T));
> .   + ALLOC_ARRAY(ptr, n);
> .   )

I think so.

>
> or even following the pattern of the next one ...
>
> .   @@
> .   type T;
> .   T *ptr;
> .   expression n;
> .   @@
> .   - ptr = xmalloc((n) * \( sizeof(*ptr) \| sizeof(T) \))
> .   + ALLOC_ARRAY(ptr, n);
>
> ... I have to wonder?  I like the simplicity of this pattern.

In theory this is equivalent.

>
>     @@
>     type T;
>     T *ptr;
>     expression n != 1;
>     @@
>     - ptr = xcalloc(n, \( sizeof(*ptr) \| sizeof(T) \) )
>     + CALLOC_ARRAY(ptr, n)
>
> And this leaves xcalloc(1, ...) alone because it is a way to get a
> cleared block of memory that may not be an array at all.  Shouldn't
> we have "n != 1" for xmalloc rule as well, I wonder, if only for
> consistency?

I agree that a single-element array is a bit awkward, so allowing the
explicit sizeof in that case is less iffy.  ALLOC/CALLOC macros for
single items might make that automation more palatable..

René

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] builtin/mv.c: use correct type to compute size of an array element
  2022-07-07 18:42 ` Jeff King
@ 2022-07-07 20:25   ` Junio C Hamano
  0 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2022-07-07 20:25 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Wed, Jul 06, 2022 at 07:02:18PM -0700, Junio C Hamano wrote:
>
>>    * Should we in general use sizeof(TYPE) in these cases, instead
>>      of the size of the zeroth element, e.g.
>> 
>>  		memmove(source + i, source + i + 1,
>> 			n * sizeof(source[i]));
>>     
>>      It would have been caught by the above Coccinelle rule (we are
>>      taking the size of *dst).
>
> I'm not sure I understand this. As you noted in a later email, using
> sizeof(TYPE) is less maintainable if the type of "source" changes.

Sorry for a typo or thinko, whichever one you like ;-)

> But
> later you mention using "*source" instead of "source[i]". I don't think
> there is a particular reason to prefer one over the other from the
> compiler perspective. I find "*source" more idiomatic (but better still
> of course is MOVE_ARRAY, which removes the choice entirely).

Yes, I wrote source[i] there only because I found it somewhat
awkward to write source[0] or *source there, when the moved
(sub)array is from the index 'i' to the end.  *(source + i) would
have matched the intention better but it still is awkward.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] builtin/mv.c: use correct type to compute size of an array element
  2022-07-07 19:11     ` René Scharfe
@ 2022-07-09  8:16       ` René Scharfe
  2022-07-10  5:38         ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: René Scharfe @ 2022-07-09  8:16 UTC (permalink / raw)
  To: Junio C Hamano, Ævar Arnfjörð Bjarmason; +Cc: git

Am 07.07.22 um 21:11 schrieb René Scharfe:
> Am 07.07.22 um 20:10 schrieb Junio C Hamano:
>>     @@
>>     type T;
>>     T *dst_ptr;
>>     T *src_ptr;
>>     T[] dst_arr;
>>     T[] src_arr;
>>     expression n;
>>     @@
>>     (
>>     - memcpy(dst_ptr, src_ptr, (n) * sizeof(T))
>>     + COPY_ARRAY(dst_ptr, src_ptr, n)
>>     |
>>     - memcpy(dst_ptr, src_arr, (n) * sizeof(T))
>>     + COPY_ARRAY(dst_ptr, src_arr, n)
>>     |
>>     - memcpy(dst_arr, src_ptr, (n) * sizeof(T))
>>     + COPY_ARRAY(dst_arr, src_ptr, n)
>>     |
>>     - memcpy(dst_arr, src_arr, (n) * sizeof(T))
>>     + COPY_ARRAY(dst_arr, src_arr, n)
>>     )
>>
>> I take it that thanks to the earlier "meh -- between sizeof(*p) and
>> sizeof(p[0]) there is no reason to prefer one over the other" and
>> "oh, no, we should prefer sizeof(*p) not sizeof(typeof(*p)) but this
>> one is the other way around" rules, this one only has to deal with
>> sizeof(T).
>>
>> Am I reading it correctly?
>
> Yes.  Without the ugly normalization step in the middle could either
> use twelve cases instead of four here or use inline alternatives,
> e.g.:
>
> type T;
> T *dst_ptr;
> T *src_ptr;
> T[] dst_arr;
> T[] src_arr;
> expression n;
> @@
> (
> - memcpy(dst_ptr, src_ptr, (n) * \( sizeof(*(dst_ptr)) \| sizeof(*(src_ptr)) \| sizeof(T) \) )
> + COPY_ARRAY(dst_ptr, src_ptr, n)
> |
> - memcpy(dst_ptr, src_arr, (n) * \( sizeof(*(dst_ptr)) \| sizeof(*(src_arr)) \| sizeof(T) \) )
> + COPY_ARRAY(dst_ptr, src_arr, n)
> |
> - memcpy(dst_arr, src_ptr, (n) * \( sizeof(*(dst_arr)) \| sizeof(*(src_ptr)) \| sizeof(T) \) )
> + COPY_ARRAY(dst_arr, src_ptr, n)
> |
> - memcpy(dst_arr, src_arr, (n) * \( sizeof(*(dst_arr)) \| sizeof(*(src_arr)) \| sizeof(T) \) )
> + COPY_ARRAY(dst_arr, src_arr, n)
> )
>
> I seem to remember that rules like this missed some cases, but perhaps
> that's no longer an issue with the latest Coccinelle version?

Not a problem, it seems; at least Coccinelle 1.1.1 is still able to
recreate the conversions from 45ccef87b3 (use COPY_ARRAY, 2016-09-25)
and 921d49be86 (use COPY_ARRAY for copying arrays, 2019-06-15) with the
patch below, which removes the normalization rules.  It increases the
processing time for array.cocci from 53s to 66s for me, though.  Worth
the increased precision and clarity?

René

---
 contrib/coccinelle/array.cocci | 82 +++++++++++++++++-----------------
 1 file changed, 40 insertions(+), 42 deletions(-)

diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci
index 9a4f00cb1b..aa75937950 100644
--- a/contrib/coccinelle/array.cocci
+++ b/contrib/coccinelle/array.cocci
@@ -1,60 +1,58 @@
 @@
-expression dst, src, n, E;
+type T;
+T *dst_ptr;
+T *src_ptr;
+expression n;
 @@
-  memcpy(dst, src, n * sizeof(
-- E[...]
-+ *(E)
-  ))
+- memcpy(dst_ptr, src_ptr, (n) * \( sizeof(T)
+-                                \| sizeof(*(dst_ptr))
+-                                \| sizeof(*(src_ptr))
+-                                \| sizeof(dst_ptr[...])
+-                                \| sizeof(src_ptr[...])
+-                                \) )
++ COPY_ARRAY(dst_ptr, src_ptr, n)

 @@
 type T;
-T *ptr;
-T[] arr;
-expression E, n;
+T *dst_ptr;
+T[] src_arr;
+expression n;
 @@
-(
-  memcpy(ptr, E,
-- n * sizeof(*(ptr))
-+ n * sizeof(T)
-  )
-|
-  memcpy(arr, E,
-- n * sizeof(*(arr))
-+ n * sizeof(T)
-  )
-|
-  memcpy(E, ptr,
-- n * sizeof(*(ptr))
-+ n * sizeof(T)
-  )
-|
-  memcpy(E, arr,
-- n * sizeof(*(arr))
-+ n * sizeof(T)
-  )
-)
+- memcpy(dst_ptr, src_arr, (n) * \( sizeof(T)
+-                                \| sizeof(*(dst_ptr))
+-                                \| sizeof(*(src_arr))
+-                                \| sizeof(dst_ptr[...])
+-                                \| sizeof(src_arr[...])
+-                                \) )
++ COPY_ARRAY(dst_ptr, src_arr, n)

 @@
 type T;
-T *dst_ptr;
+T[] dst_arr;
 T *src_ptr;
+expression n;
+@@
+- memcpy(dst_arr, src_ptr, (n) * \( sizeof(T)
+-                                \| sizeof(*(dst_arr))
+-                                \| sizeof(*(src_ptr))
+-                                \| sizeof(dst_arr[...])
+-                                \| sizeof(src_ptr[...])
+-                                \) )
++ COPY_ARRAY(dst_arr, src_ptr, n)
+
+@@
+type T;
 T[] dst_arr;
 T[] src_arr;
 expression n;
 @@
-(
-- memcpy(dst_ptr, src_ptr, (n) * sizeof(T))
-+ COPY_ARRAY(dst_ptr, src_ptr, n)
-|
-- memcpy(dst_ptr, src_arr, (n) * sizeof(T))
-+ COPY_ARRAY(dst_ptr, src_arr, n)
-|
-- memcpy(dst_arr, src_ptr, (n) * sizeof(T))
-+ COPY_ARRAY(dst_arr, src_ptr, n)
-|
-- memcpy(dst_arr, src_arr, (n) * sizeof(T))
+- memcpy(dst_arr, src_arr, (n) * \( sizeof(T)
+-                                \| sizeof(*(dst_arr))
+-                                \| sizeof(*(src_arr))
+-                                \| sizeof(dst_arr[...])
+-                                \| sizeof(src_arr[...])
+-                                \) )
 + COPY_ARRAY(dst_arr, src_arr, n)
-)

 @@
 type T;
--
2.37.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v3] builtin/mv.c: use the MOVE_ARRAY() macro instead of memmove()
  2022-07-07  5:52 ` [PATCH v2] builtin/mv.c: use the MOVE_ARRAY() macro instead of memmove() Junio C Hamano
@ 2022-07-10  1:33   ` Junio C Hamano
  2022-07-18 20:30     ` Derrick Stolee
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2022-07-10  1:33 UTC (permalink / raw)
  To: git

The variables 'source', 'destination', and 'submodule_gitfile' are
all of type "const char **", and an element of such an array is of
"type const char *", but these memmove() calls were written as if
these variables are of type "char **".

Once these memmove() calls are fixed to use the correct type to
compute the number of bytes to be moved, e.g.

-      memmove(source + i, source + i + 1, n * sizeof(char *));
+      memmove(source + i, source + i + 1, n * sizeof(const char *));

existing contrib/coccinelle/array.cocci rules can recognize them as
candidates for turning into MOVE_ARRAY().

While at it, use CALLOC_ARRAY() instead of xcalloc() to allocate the
modes[] array that is involved in the change.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * The first hunk is new; with this additional rewrite from
   xcalloc() to CALLOC_ARRAY(), the static-analysis job on 'seen'
   starts passing (cf. https://github.com/git/git/runs/7267146111)

   I think this is now good enough to merge down to 'next' and to
   'master'.

 builtin/mv.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/builtin/mv.c b/builtin/mv.c
index 83a465ba83..859fa5023f 100644
--- a/builtin/mv.c
+++ b/builtin/mv.c
@@ -148,7 +148,8 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 		die(_("index file corrupt"));
 
 	source = internal_prefix_pathspec(prefix, argv, argc, 0);
-	modes = xcalloc(argc, sizeof(enum update_mode));
+	CALLOC_ARRAY(modes, argc);
+
 	/*
 	 * Keep trailing slash, needed to let
 	 * "git mv file no-such-dir/" error out, except in the case
@@ -282,14 +283,11 @@ int cmd_mv(int argc, const char **argv, const char *prefix)
 remove_entry:
 		if (--argc > 0) {
 			int n = argc - i;
-			memmove(source + i, source + i + 1,
-				n * sizeof(char *));
-			memmove(destination + i, destination + i + 1,
-				n * sizeof(char *));
-			memmove(modes + i, modes + i + 1,
-				n * sizeof(enum update_mode));
-			memmove(submodule_gitfile + i, submodule_gitfile + i + 1,
-				n * sizeof(char *));
+			MOVE_ARRAY(source + i, source + i + 1, n);
+			MOVE_ARRAY(destination + i, destination + i + 1, n);
+			MOVE_ARRAY(modes + i, modes + i + 1, n);
+			MOVE_ARRAY(submodule_gitfile + i,
+				   submodule_gitfile + i + 1, n);
 			i--;
 		}
 	}
-- 
2.37.0-238-g7905cdbf0e


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] builtin/mv.c: use correct type to compute size of an array element
  2022-07-09  8:16       ` René Scharfe
@ 2022-07-10  5:38         ` Junio C Hamano
  2022-07-10 10:05           ` [PATCH] cocci: avoid normalization rules for memcpy René Scharfe
  0 siblings, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2022-07-10  5:38 UTC (permalink / raw)
  To: René Scharfe; +Cc: Ævar Arnfjörð Bjarmason, git

René Scharfe <l.s.r@web.de> writes:

> Not a problem, it seems; at least Coccinelle 1.1.1 is still able to
> recreate the conversions from 45ccef87b3 (use COPY_ARRAY, 2016-09-25)
> and 921d49be86 (use COPY_ARRAY for copying arrays, 2019-06-15) with the
> patch below, which removes the normalization rules.  

The result certainly is cleaner and also looks much less error
prone.

> It increases the
> processing time for array.cocci from 53s to 66s for me, though.  Worth
> the increased precision and clarity?

I would say so.  For manual tests that humans stare at their
progress waiting for their completion, every second may matter, but
a check that makes us wait for more than 30 seconds *and* forces us
to be extra careful when vetting its validity is worse than a check
that takes 10 more seconds with much less risk of broken output.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] cocci: avoid normalization rules for memcpy
  2022-07-10  5:38         ` Junio C Hamano
@ 2022-07-10 10:05           ` René Scharfe
  2022-07-10 14:45             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 18+ messages in thread
From: René Scharfe @ 2022-07-10 10:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, git

Some of the rules for using COPY_ARRAY instead of memcpy with sizeof are
intended to reduce the number of sizeof variants to deal with.  They can
have unintended side effects if only they match, but not the one for the
COPY_ARRAY conversion at the end.

Avoid these side effects by instead using a self-contained rule for each
combination of array and pointer for source and destination which lists
all sizeof variants inline.

This lets "make contrib/coccinelle/array.cocci.patch" take 15% longer on
my machine, but gives peace of mind that no incomplete transformation
will be generated.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
Patch generated with --patience for easier reading.

 contrib/coccinelle/array.cocci | 92 +++++++++++++++++-----------------
 1 file changed, 45 insertions(+), 47 deletions(-)

diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci
index 9a4f00cb1b..aa75937950 100644
--- a/contrib/coccinelle/array.cocci
+++ b/contrib/coccinelle/array.cocci
@@ -1,60 +1,58 @@
-@@
-expression dst, src, n, E;
-@@
-  memcpy(dst, src, n * sizeof(
-- E[...]
-+ *(E)
-  ))
-
-@@
-type T;
-T *ptr;
-T[] arr;
-expression E, n;
-@@
-(
-  memcpy(ptr, E,
-- n * sizeof(*(ptr))
-+ n * sizeof(T)
-  )
-|
-  memcpy(arr, E,
-- n * sizeof(*(arr))
-+ n * sizeof(T)
-  )
-|
-  memcpy(E, ptr,
-- n * sizeof(*(ptr))
-+ n * sizeof(T)
-  )
-|
-  memcpy(E, arr,
-- n * sizeof(*(arr))
-+ n * sizeof(T)
-  )
-)
-
 @@
 type T;
 T *dst_ptr;
 T *src_ptr;
-T[] dst_arr;
-T[] src_arr;
 expression n;
 @@
-(
-- memcpy(dst_ptr, src_ptr, (n) * sizeof(T))
+- memcpy(dst_ptr, src_ptr, (n) * \( sizeof(T)
+-                                \| sizeof(*(dst_ptr))
+-                                \| sizeof(*(src_ptr))
+-                                \| sizeof(dst_ptr[...])
+-                                \| sizeof(src_ptr[...])
+-                                \) )
 + COPY_ARRAY(dst_ptr, src_ptr, n)
-|
-- memcpy(dst_ptr, src_arr, (n) * sizeof(T))
+
+@@
+type T;
+T *dst_ptr;
+T[] src_arr;
+expression n;
+@@
+- memcpy(dst_ptr, src_arr, (n) * \( sizeof(T)
+-                                \| sizeof(*(dst_ptr))
+-                                \| sizeof(*(src_arr))
+-                                \| sizeof(dst_ptr[...])
+-                                \| sizeof(src_arr[...])
+-                                \) )
 + COPY_ARRAY(dst_ptr, src_arr, n)
-|
-- memcpy(dst_arr, src_ptr, (n) * sizeof(T))
+
+@@
+type T;
+T[] dst_arr;
+T *src_ptr;
+expression n;
+@@
+- memcpy(dst_arr, src_ptr, (n) * \( sizeof(T)
+-                                \| sizeof(*(dst_arr))
+-                                \| sizeof(*(src_ptr))
+-                                \| sizeof(dst_arr[...])
+-                                \| sizeof(src_ptr[...])
+-                                \) )
 + COPY_ARRAY(dst_arr, src_ptr, n)
-|
-- memcpy(dst_arr, src_arr, (n) * sizeof(T))
+
+@@
+type T;
+T[] dst_arr;
+T[] src_arr;
+expression n;
+@@
+- memcpy(dst_arr, src_arr, (n) * \( sizeof(T)
+-                                \| sizeof(*(dst_arr))
+-                                \| sizeof(*(src_arr))
+-                                \| sizeof(dst_arr[...])
+-                                \| sizeof(src_arr[...])
+-                                \) )
 + COPY_ARRAY(dst_arr, src_arr, n)
-)

 @@
 type T;
--
2.37.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] cocci: avoid normalization rules for memcpy
  2022-07-10 10:05           ` [PATCH] cocci: avoid normalization rules for memcpy René Scharfe
@ 2022-07-10 14:45             ` Ævar Arnfjörð Bjarmason
  2022-07-10 16:32               ` Ævar Arnfjörð Bjarmason
                                 ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-10 14:45 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git


On Sun, Jul 10 2022, René Scharfe wrote:

> Some of the rules for using COPY_ARRAY instead of memcpy with sizeof are
> intended to reduce the number of sizeof variants to deal with.  They can
> have unintended side effects if only they match, but not the one for the
> COPY_ARRAY conversion at the end.

Since ab/cocci-unused is marked for "next" it would be really nice to
have this based on top so we can add tests for these transformations
(the topic adds a way to do that).

But if you don't feel like  doing that this is fine too.

> diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci
> index 9a4f00cb1b..aa75937950 100644
> --- a/contrib/coccinelle/array.cocci
> +++ b/contrib/coccinelle/array.cocci
> @@ -1,60 +1,58 @@
> -@@
> -expression dst, src, n, E;
> -@@
> -  memcpy(dst, src, n * sizeof(
> -- E[...]
> -+ *(E)
> -  ))
> -
> -@@
> -type T;
> -T *ptr;
> -T[] arr;
> -expression E, n;
> -@@
> -(
> -  memcpy(ptr, E,
> -- n * sizeof(*(ptr))
> -+ n * sizeof(T)
> -  )
> -|
> -  memcpy(arr, E,
> -- n * sizeof(*(arr))
> -+ n * sizeof(T)
> -  )
> -|
> -  memcpy(E, ptr,
> -- n * sizeof(*(ptr))
> -+ n * sizeof(T)
> -  )
> -|
> -  memcpy(E, arr,
> -- n * sizeof(*(arr))
> -+ n * sizeof(T)
> -  )
> -)
> -
>  @@
>  type T;
>  T *dst_ptr;
>  T *src_ptr;
> -T[] dst_arr;
> -T[] src_arr;
>  expression n;
>  @@
> -(
> -- memcpy(dst_ptr, src_ptr, (n) * sizeof(T))
> +- memcpy(dst_ptr, src_ptr, (n) * \( sizeof(T)
> +-                                \| sizeof(*(dst_ptr))
> +-                                \| sizeof(*(src_ptr))
> +-                                \| sizeof(dst_ptr[...])
> +-                                \| sizeof(src_ptr[...])
> +-                                \) )
>  + COPY_ARRAY(dst_ptr, src_ptr, n)
> -|
> -- memcpy(dst_ptr, src_arr, (n) * sizeof(T))
> +
> +@@
> +type T;
> +T *dst_ptr;
> +T[] src_arr;
> +expression n;
> +@@
> +- memcpy(dst_ptr, src_arr, (n) * \( sizeof(T)
> +-                                \| sizeof(*(dst_ptr))
> +-                                \| sizeof(*(src_arr))
> +-                                \| sizeof(dst_ptr[...])
> +-                                \| sizeof(src_arr[...])
> +-                                \) )
>  + COPY_ARRAY(dst_ptr, src_arr, n)
> -|
> -- memcpy(dst_arr, src_ptr, (n) * sizeof(T))
> +
> +@@
> +type T;
> +T[] dst_arr;
> +T *src_ptr;
> +expression n;
> +@@
> +- memcpy(dst_arr, src_ptr, (n) * \( sizeof(T)
> +-                                \| sizeof(*(dst_arr))
> +-                                \| sizeof(*(src_ptr))
> +-                                \| sizeof(dst_arr[...])
> +-                                \| sizeof(src_ptr[...])
> +-                                \) )
>  + COPY_ARRAY(dst_arr, src_ptr, n)
> -|
> -- memcpy(dst_arr, src_arr, (n) * sizeof(T))
> +
> +@@
> +type T;
> +T[] dst_arr;
> +T[] src_arr;
> +expression n;
> +@@
> +- memcpy(dst_arr, src_arr, (n) * \( sizeof(T)
> +-                                \| sizeof(*(dst_arr))
> +-                                \| sizeof(*(src_arr))
> +-                                \| sizeof(dst_arr[...])
> +-                                \| sizeof(src_arr[...])
> +-                                \) )
>  + COPY_ARRAY(dst_arr, src_arr, n)
> -)
>
>  @@
>  type T;

Hrm, this seems like a lot of repetition, it's here in the rules you're
editing already, but these repeated "sizeof" make it a lot more verbose.

Isn't there a way to avoid this by simply wrapping this across lines, I
didn't test, but I think you can do this sort of thing in the cocci
grammar:

- memcpy(
- COPY_ARRAY(
  (
  dst_arr
  |
  dst_ptr
  )
  ,
  (
  src_arr
  |
  src_ptr
  )
  ,
  (n) *
-  [your big sizeof alternate here]
  )

I.e. you want to preserve whatever we match in the 1st and 2nd
arguments, but only want to munge part of the 3rd argument. The cocci
grammar can "reach into" lines like that, it doesn't need to be limited
to line-based diffs.

But I didn't try it in this caes, and maybe there's a good reason for
why it can't happen in this case...

I also wonder if that won't be a lot faster, i.e. if you can condense
this all into one rule it won't need to match this N times, but maybe
the overall complexity of the rules makes it come out to the same thing
in the end...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] cocci: avoid normalization rules for memcpy
  2022-07-10 14:45             ` Ævar Arnfjörð Bjarmason
@ 2022-07-10 16:32               ` Ævar Arnfjörð Bjarmason
  2022-07-10 19:30               ` Junio C Hamano
  2022-07-11 17:11               ` René Scharfe
  2 siblings, 0 replies; 18+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-10 16:32 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git


On Sun, Jul 10 2022, Ævar Arnfjörð Bjarmason wrote:

> On Sun, Jul 10 2022, René Scharfe wrote:

> Isn't there a way to avoid this by simply wrapping this across lines, I
> didn't test, but I think you can do this sort of thing in the cocci
> grammar:
>
> - memcpy(
> - COPY_ARRAY(

Oops, typo!

I just manually typed this out, so I meant "+ COPY_ARRAY(" here, in case
it wasn't clear.

To elaborate: Here's an example in the cocci repository:
https://github.com/coccinelle/coccinelle/blob/master/tests/pcim.cocci

You can also use "..." elides, but I didn't check if that works for the
start of an argument list (but it should, I think), here's an example
with it at the end:
https://github.com/coccinelle/coccinelle/blob/master/scripts/coccicheck/cocci/kc.cocci

So I think rules like:

- memcpy(
+ COPY_ARRAY(
  ...,
- A
+ B
  );

Are possible, and something you can use in this case (I have "5.1 Basic
Transformations" from cocci_spatch_grammar.pdf in front of me).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] cocci: avoid normalization rules for memcpy
  2022-07-10 14:45             ` Ævar Arnfjörð Bjarmason
  2022-07-10 16:32               ` Ævar Arnfjörð Bjarmason
@ 2022-07-10 19:30               ` Junio C Hamano
  2022-07-11 17:11               ` René Scharfe
  2 siblings, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2022-07-10 19:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: René Scharfe, git

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> Since ab/cocci-unused is marked for "next" it would be really nice to
> have this based on top so we can add tests for these transformations
> (the topic adds a way to do that).

Please do not follow this mantra in general.

What is already in 'next' cannot be avoided.  What is not yet in
'next' can yield just fine if needed.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] cocci: avoid normalization rules for memcpy
  2022-07-10 14:45             ` Ævar Arnfjörð Bjarmason
  2022-07-10 16:32               ` Ævar Arnfjörð Bjarmason
  2022-07-10 19:30               ` Junio C Hamano
@ 2022-07-11 17:11               ` René Scharfe
  2022-07-11 20:05                 ` Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 18+ messages in thread
From: René Scharfe @ 2022-07-11 17:11 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, git

Am 10.07.22 um 16:45 schrieb Ævar Arnfjörð Bjarmason:
>
> On Sun, Jul 10 2022, René Scharfe wrote:
>
>> Some of the rules for using COPY_ARRAY instead of memcpy with sizeof are
>> intended to reduce the number of sizeof variants to deal with.  They can
>> have unintended side effects if only they match, but not the one for the
>> COPY_ARRAY conversion at the end.
>
> Since ab/cocci-unused is marked for "next" it would be really nice to
> have this based on top so we can add tests for these transformations
> (the topic adds a way to do that).

Testing semantic patches sounds like a good idea.  We can add tests later,
once that feature landed.

>
> But if you don't feel like  doing that this is fine too.
>
>> diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci
>> index 9a4f00cb1b..aa75937950 100644
>> --- a/contrib/coccinelle/array.cocci
>> +++ b/contrib/coccinelle/array.cocci
>> @@ -1,60 +1,58 @@
>> -@@
>> -expression dst, src, n, E;
>> -@@
>> -  memcpy(dst, src, n * sizeof(
>> -- E[...]
>> -+ *(E)
>> -  ))
>> -
>> -@@
>> -type T;
>> -T *ptr;
>> -T[] arr;
>> -expression E, n;
>> -@@
>> -(
>> -  memcpy(ptr, E,
>> -- n * sizeof(*(ptr))
>> -+ n * sizeof(T)
>> -  )
>> -|
>> -  memcpy(arr, E,
>> -- n * sizeof(*(arr))
>> -+ n * sizeof(T)
>> -  )
>> -|
>> -  memcpy(E, ptr,
>> -- n * sizeof(*(ptr))
>> -+ n * sizeof(T)
>> -  )
>> -|
>> -  memcpy(E, arr,
>> -- n * sizeof(*(arr))
>> -+ n * sizeof(T)
>> -  )
>> -)
>> -
>>  @@
>>  type T;
>>  T *dst_ptr;
>>  T *src_ptr;
>> -T[] dst_arr;
>> -T[] src_arr;
>>  expression n;
>>  @@
>> -(
>> -- memcpy(dst_ptr, src_ptr, (n) * sizeof(T))
>> +- memcpy(dst_ptr, src_ptr, (n) * \( sizeof(T)
>> +-                                \| sizeof(*(dst_ptr))
>> +-                                \| sizeof(*(src_ptr))
>> +-                                \| sizeof(dst_ptr[...])
>> +-                                \| sizeof(src_ptr[...])
>> +-                                \) )
>>  + COPY_ARRAY(dst_ptr, src_ptr, n)
>> -|
>> -- memcpy(dst_ptr, src_arr, (n) * sizeof(T))
>> +
>> +@@
>> +type T;
>> +T *dst_ptr;
>> +T[] src_arr;
>> +expression n;
>> +@@
>> +- memcpy(dst_ptr, src_arr, (n) * \( sizeof(T)
>> +-                                \| sizeof(*(dst_ptr))
>> +-                                \| sizeof(*(src_arr))
>> +-                                \| sizeof(dst_ptr[...])
>> +-                                \| sizeof(src_arr[...])
>> +-                                \) )
>>  + COPY_ARRAY(dst_ptr, src_arr, n)
>> -|
>> -- memcpy(dst_arr, src_ptr, (n) * sizeof(T))
>> +
>> +@@
>> +type T;
>> +T[] dst_arr;
>> +T *src_ptr;
>> +expression n;
>> +@@
>> +- memcpy(dst_arr, src_ptr, (n) * \( sizeof(T)
>> +-                                \| sizeof(*(dst_arr))
>> +-                                \| sizeof(*(src_ptr))
>> +-                                \| sizeof(dst_arr[...])
>> +-                                \| sizeof(src_ptr[...])
>> +-                                \) )
>>  + COPY_ARRAY(dst_arr, src_ptr, n)
>> -|
>> -- memcpy(dst_arr, src_arr, (n) * sizeof(T))
>> +
>> +@@
>> +type T;
>> +T[] dst_arr;
>> +T[] src_arr;
>> +expression n;
>> +@@
>> +- memcpy(dst_arr, src_arr, (n) * \( sizeof(T)
>> +-                                \| sizeof(*(dst_arr))
>> +-                                \| sizeof(*(src_arr))
>> +-                                \| sizeof(dst_arr[...])
>> +-                                \| sizeof(src_arr[...])
>> +-                                \) )
>>  + COPY_ARRAY(dst_arr, src_arr, n)
>> -)
>>
>>  @@
>>  type T;
>
> Hrm, this seems like a lot of repetition, it's here in the rules you're
> editing already, but these repeated "sizeof" make it a lot more verbose.
>
> Isn't there a way to avoid this by simply wrapping this across lines, I
> didn't test, but I think you can do this sort of thing in the cocci
> grammar:
>
> - memcpy(
> - COPY_ARRAY(
>   (
>   dst_arr
>   |
>   dst_ptr
>   )
>   ,
>   (
>   src_arr
>   |
>   src_ptr
>   )
>   ,
>   (n) *
> -  [your big sizeof alternate here]
>   )

Hmm, that would match many more combinations, e.g. this one:

void f(int *a, int *b, long n, int c[1]) {
	memcpy(a, b, n * sizeof(*c));
}

The elements of a, b and c have the same type, so replacing c with a
(which a conversion to COPY_ARRAY would do) would produce the same
object code.  I can't come up with a plausible scenario like above and
where a type change of c down the line would cause problems, but I
also can't convince myself that no such thing can exist.  Tricky.

And I can't get it to format the whitespace around the third argument
of COPY_ARRAY nicely in all cases.

And it takes 37% longer on my machine.

But it sure is more compact. :)

@@
type T;
T *dst_ptr;
T *src_ptr;
T[] dst_arr;
T[] src_arr;
expression n;
@@
- memcpy
+ COPY_ARRAY
  (
  \( dst_ptr \| dst_arr \) ,
  \( src_ptr \| src_arr \) ,
- (n) *  \( sizeof(T)
-        \| sizeof(*(dst_ptr))
-        \| sizeof(*(dst_arr))
-        \| sizeof(*(src_ptr))
-        \| sizeof(*(src_arr))
-        \| sizeof(dst_ptr[...])
-        \| sizeof(dst_arr[...])
-        \| sizeof(src_ptr[...])
-        \| sizeof(src_arr[...])
-        \)
+ n
  )

>
> I.e. you want to preserve whatever we match in the 1st and 2nd
> arguments, but only want to munge part of the 3rd argument. The cocci
> grammar can "reach into" lines like that, it doesn't need to be limited
> to line-based diffs.
>
> But I didn't try it in this caes, and maybe there's a good reason for
> why it can't happen in this case...
>
> I also wonder if that won't be a lot faster, i.e. if you can condense
> this all into one rule it won't need to match this N times, but maybe
> the overall complexity of the rules makes it come out to the same thing
> in the end...

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] cocci: avoid normalization rules for memcpy
  2022-07-11 17:11               ` René Scharfe
@ 2022-07-11 20:05                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 18+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-07-11 20:05 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git


On Mon, Jul 11 2022, René Scharfe wrote:

> Am 10.07.22 um 16:45 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Sun, Jul 10 2022, René Scharfe wrote:
>>
>>> Some of the rules for using COPY_ARRAY instead of memcpy with sizeof are
>>> intended to reduce the number of sizeof variants to deal with.  They can
>>> have unintended side effects if only they match, but not the one for the
>>> COPY_ARRAY conversion at the end.
>>
>> Since ab/cocci-unused is marked for "next" it would be really nice to
>> have this based on top so we can add tests for these transformations
>> (the topic adds a way to do that).
>
> Testing semantic patches sounds like a good idea.  We can add tests later,
> once that feature landed.

Okey, I might look at this then...

>>
>> But if you don't feel like  doing that this is fine too.
>>
>>> diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci
>>> index 9a4f00cb1b..aa75937950 100644
>>> --- a/contrib/coccinelle/array.cocci
>>> +++ b/contrib/coccinelle/array.cocci
>>> @@ -1,60 +1,58 @@
>>> -@@
>>> -expression dst, src, n, E;
>>> -@@
>>> -  memcpy(dst, src, n * sizeof(
>>> -- E[...]
>>> -+ *(E)
>>> -  ))
>>> -
>>> -@@
>>> -type T;
>>> -T *ptr;
>>> -T[] arr;
>>> -expression E, n;
>>> -@@
>>> -(
>>> -  memcpy(ptr, E,
>>> -- n * sizeof(*(ptr))
>>> -+ n * sizeof(T)
>>> -  )
>>> -|
>>> -  memcpy(arr, E,
>>> -- n * sizeof(*(arr))
>>> -+ n * sizeof(T)
>>> -  )
>>> -|
>>> -  memcpy(E, ptr,
>>> -- n * sizeof(*(ptr))
>>> -+ n * sizeof(T)
>>> -  )
>>> -|
>>> -  memcpy(E, arr,
>>> -- n * sizeof(*(arr))
>>> -+ n * sizeof(T)
>>> -  )
>>> -)
>>> -
>>>  @@
>>>  type T;
>>>  T *dst_ptr;
>>>  T *src_ptr;
>>> -T[] dst_arr;
>>> -T[] src_arr;
>>>  expression n;
>>>  @@
>>> -(
>>> -- memcpy(dst_ptr, src_ptr, (n) * sizeof(T))
>>> +- memcpy(dst_ptr, src_ptr, (n) * \( sizeof(T)
>>> +-                                \| sizeof(*(dst_ptr))
>>> +-                                \| sizeof(*(src_ptr))
>>> +-                                \| sizeof(dst_ptr[...])
>>> +-                                \| sizeof(src_ptr[...])
>>> +-                                \) )
>>>  + COPY_ARRAY(dst_ptr, src_ptr, n)
>>> -|
>>> -- memcpy(dst_ptr, src_arr, (n) * sizeof(T))
>>> +
>>> +@@
>>> +type T;
>>> +T *dst_ptr;
>>> +T[] src_arr;
>>> +expression n;
>>> +@@
>>> +- memcpy(dst_ptr, src_arr, (n) * \( sizeof(T)
>>> +-                                \| sizeof(*(dst_ptr))
>>> +-                                \| sizeof(*(src_arr))
>>> +-                                \| sizeof(dst_ptr[...])
>>> +-                                \| sizeof(src_arr[...])
>>> +-                                \) )
>>>  + COPY_ARRAY(dst_ptr, src_arr, n)
>>> -|
>>> -- memcpy(dst_arr, src_ptr, (n) * sizeof(T))
>>> +
>>> +@@
>>> +type T;
>>> +T[] dst_arr;
>>> +T *src_ptr;
>>> +expression n;
>>> +@@
>>> +- memcpy(dst_arr, src_ptr, (n) * \( sizeof(T)
>>> +-                                \| sizeof(*(dst_arr))
>>> +-                                \| sizeof(*(src_ptr))
>>> +-                                \| sizeof(dst_arr[...])
>>> +-                                \| sizeof(src_ptr[...])
>>> +-                                \) )
>>>  + COPY_ARRAY(dst_arr, src_ptr, n)
>>> -|
>>> -- memcpy(dst_arr, src_arr, (n) * sizeof(T))
>>> +
>>> +@@
>>> +type T;
>>> +T[] dst_arr;
>>> +T[] src_arr;
>>> +expression n;
>>> +@@
>>> +- memcpy(dst_arr, src_arr, (n) * \( sizeof(T)
>>> +-                                \| sizeof(*(dst_arr))
>>> +-                                \| sizeof(*(src_arr))
>>> +-                                \| sizeof(dst_arr[...])
>>> +-                                \| sizeof(src_arr[...])
>>> +-                                \) )
>>>  + COPY_ARRAY(dst_arr, src_arr, n)
>>> -)
>>>
>>>  @@
>>>  type T;
>>
>> Hrm, this seems like a lot of repetition, it's here in the rules you're
>> editing already, but these repeated "sizeof" make it a lot more verbose.
>>
>> Isn't there a way to avoid this by simply wrapping this across lines, I
>> didn't test, but I think you can do this sort of thing in the cocci
>> grammar:
>>
>> - memcpy(
>> - COPY_ARRAY(
>>   (
>>   dst_arr
>>   |
>>   dst_ptr
>>   )
>>   ,
>>   (
>>   src_arr
>>   |
>>   src_ptr
>>   )
>>   ,
>>   (n) *
>> -  [your big sizeof alternate here]
>>   )
>
> Hmm, that would match many more combinations, e.g. this one:
>
> void f(int *a, int *b, long n, int c[1]) {
> 	memcpy(a, b, n * sizeof(*c));
> }
>
> The elements of a, b and c have the same type, so replacing c with a
> (which a conversion to COPY_ARRAY would do) would produce the same
> object code.  I can't come up with a plausible scenario like above and
> where a type change of c down the line would cause problems, but I
> also can't convince myself that no such thing can exist.  Tricky.
>
> And I can't get it to format the whitespace around the third argument
> of COPY_ARRAY nicely in all cases.
>
> And it takes 37% longer on my machine.
>
> But it sure is more compact. :)
>
> @@
> type T;
> T *dst_ptr;
> T *src_ptr;
> T[] dst_arr;
> T[] src_arr;
> expression n;
> @@
> - memcpy
> + COPY_ARRAY
>   (
>   \( dst_ptr \| dst_arr \) ,
>   \( src_ptr \| src_arr \) ,
> - (n) *  \( sizeof(T)
> -        \| sizeof(*(dst_ptr))
> -        \| sizeof(*(dst_arr))
> -        \| sizeof(*(src_ptr))
> -        \| sizeof(*(src_arr))
> -        \| sizeof(dst_ptr[...])
> -        \| sizeof(dst_arr[...])
> -        \| sizeof(src_ptr[...])
> -        \| sizeof(src_arr[...])
> -        \)
> + n
>   )

..once those land, because cocci debug output is a lot more useful with
target test data :)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] builtin/mv.c: use the MOVE_ARRAY() macro instead of memmove()
  2022-07-10  1:33   ` [PATCH v3] " Junio C Hamano
@ 2022-07-18 20:30     ` Derrick Stolee
  0 siblings, 0 replies; 18+ messages in thread
From: Derrick Stolee @ 2022-07-18 20:30 UTC (permalink / raw)
  To: Junio C Hamano, git

On 7/9/2022 9:33 PM, Junio C Hamano wrote:
> The variables 'source', 'destination', and 'submodule_gitfile' are
> all of type "const char **", and an element of such an array is of
> "type const char *", but these memmove() calls were written as if
> these variables are of type "char **".
> 
> Once these memmove() calls are fixed to use the correct type to
> compute the number of bytes to be moved, e.g.
> 
> -      memmove(source + i, source + i + 1, n * sizeof(char *));
> +      memmove(source + i, source + i + 1, n * sizeof(const char *));
> 
> existing contrib/coccinelle/array.cocci rules can recognize them as
> candidates for turning into MOVE_ARRAY().
> 
> While at it, use CALLOC_ARRAY() instead of xcalloc() to allocate the
> modes[] array that is involved in the change.

I'm super late in noticing this, but these changes will fix the
static-analysis build in the CI build, so the sooner this can make
it to master the better.

Thanks!
-Stolee


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-07-18 20:31 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-07  2:02 [PATCH] builtin/mv.c: use correct type to compute size of an array element Junio C Hamano
2022-07-07  5:52 ` [PATCH v2] builtin/mv.c: use the MOVE_ARRAY() macro instead of memmove() Junio C Hamano
2022-07-10  1:33   ` [PATCH v3] " Junio C Hamano
2022-07-18 20:30     ` Derrick Stolee
2022-07-07 12:11 ` [PATCH] builtin/mv.c: use correct type to compute size of an array element Ævar Arnfjörð Bjarmason
2022-07-07 18:10   ` Junio C Hamano
2022-07-07 19:11     ` René Scharfe
2022-07-09  8:16       ` René Scharfe
2022-07-10  5:38         ` Junio C Hamano
2022-07-10 10:05           ` [PATCH] cocci: avoid normalization rules for memcpy René Scharfe
2022-07-10 14:45             ` Ævar Arnfjörð Bjarmason
2022-07-10 16:32               ` Ævar Arnfjörð Bjarmason
2022-07-10 19:30               ` Junio C Hamano
2022-07-11 17:11               ` René Scharfe
2022-07-11 20:05                 ` Ævar Arnfjörð Bjarmason
2022-07-07 18:27   ` [PATCH] builtin/mv.c: use correct type to compute size of an array element René Scharfe
2022-07-07 18:42 ` Jeff King
2022-07-07 20:25   ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.