git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* coccinelle: adjustments for array.cocci?
@ 2019-11-12 15:08 Markus Elfring
  2019-11-12 18:37 ` René Scharfe
  0 siblings, 1 reply; 41+ messages in thread
From: Markus Elfring @ 2019-11-12 15:08 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

Hello,

I would like to comment implementation details from
the commit 177fbab747da4f58cb2a8ce010b3515c86dd67c9 ("coccinelle: use COPY_ARRAY for copying arrays").


Do you find the following code variant (for the semantic patch language) also useful?

 memcpy(
(       ptr, E, n *
-       sizeof(*(ptr))
+       sizeof(T)
|       arr, E, n *
-       sizeof(*(arr))
+       sizeof(T)
|       E, ptr, n *
-       sizeof(*(ptr))
+       sizeof(T)
|       E, arr, n *
-       sizeof(*(arr))
+       sizeof(T)
)
       )


How do you think about the following SmPL code variant?

-memcpy
+COPY_ARRAY
       (
(       dst_ptr
|       dst_arr
)
       ,
(       src_ptr
|       src_arr
)
       ,
-       (n) * sizeof(T)
+       n
       )


Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-12 15:08 coccinelle: adjustments for array.cocci? Markus Elfring
@ 2019-11-12 18:37 ` René Scharfe
  2019-11-13  2:11   ` Junio C Hamano
                     ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: René Scharfe @ 2019-11-12 18:37 UTC (permalink / raw)
  To: Markus Elfring, git

Am 12.11.19 um 16:08 schrieb Markus Elfring:
> Hello,
>
> I would like to comment implementation details from
> the commit 177fbab747da4f58cb2a8ce010b3515c86dd67c9 ("coccinelle: use COPY_ARRAY for copying arrays").
>
>
> Do you find the following code variant (for the semantic patch language) also useful?
>
>  memcpy(
> (       ptr, E, n *
> -       sizeof(*(ptr))
> +       sizeof(T)
> |       arr, E, n *
> -       sizeof(*(arr))
> +       sizeof(T)
> |       E, ptr, n *
> -       sizeof(*(ptr))
> +       sizeof(T)
> |       E, arr, n *
> -       sizeof(*(arr))
> +       sizeof(T)
> )
>        )

This reduces duplication in the semantic patch, which is nice.  I think
I tried something like that at the time, but found that it failed to
produce some of the cases in 921d49be86 ("use COPY_ARRAY for copying
arrays", 2019-06-15) for some reason.

> How do you think about the following SmPL code variant?
>
> -memcpy
> +COPY_ARRAY
>        (
> (       dst_ptr
> |       dst_arr
> )
>        ,
> (       src_ptr
> |       src_arr
> )
>        ,
> -       (n) * sizeof(T)
> +       n
>        )

This eliminates duplication in the semantic patch, which is good.  It
messes up the indentation of n in some of the cases in 921d49be86 ("use
COPY_ARRAY for copying arrays", 2019-06-15), though.  Hmm, but that can
be cured by duplicating the comma:

   - , (n) * sizeof(T)
   + , n

René

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-12 18:37 ` René Scharfe
@ 2019-11-13  2:11   ` Junio C Hamano
  2019-11-13  8:49     ` Markus Elfring
  2019-11-15 20:37   ` coccinelle: " Markus Elfring
  2019-11-16 16:33   ` Markus Elfring
  2 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-11-13  2:11 UTC (permalink / raw)
  To: René Scharfe; +Cc: Markus Elfring, git

René Scharfe <l.s.r@web.de> writes:

> This reduces duplication in the semantic patch, which is nice.  I think
> I tried something like that at the time, but found that it failed to
> produce some of the cases in 921d49be86 ("use COPY_ARRAY for copying
> arrays", 2019-06-15) for some reason.

Thanks for mentioning.

I too recall that seemingly redundant entries were noticed during
the review and at least back then removing the seemingly redundant
ones caused failures in rewriting.

That is why I am hesitant to touch any patch that says "simplify
cocci rule" making it sound as if simplification is a good thing on
its own.  I have no problem with "we change the rule this way, which
eliminates this false positive / negative, that is demonstrated in
the added tests in t/ directory", though.

Thanks.




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-13  2:11   ` Junio C Hamano
@ 2019-11-13  8:49     ` Markus Elfring
  2019-11-14  2:03       ` Junio C Hamano
  0 siblings, 1 reply; 41+ messages in thread
From: Markus Elfring @ 2019-11-13  8:49 UTC (permalink / raw)
  To: Junio C Hamano, René Scharfe; +Cc: git

> I too recall that seemingly redundant entries were noticed during
> the review and at least back then removing the seemingly redundant
> ones caused failures in rewriting.

I am curious if the redundancy can be reconsidered once more.

Do you refer to open issues around source code reformatting
and pretty-printing together with the Coccinelle software here?

Would you like to achieve any further improvements also in this area?

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-13  8:49     ` Markus Elfring
@ 2019-11-14  2:03       ` Junio C Hamano
  2019-11-14 13:15         ` Markus Elfring
  0 siblings, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-11-14  2:03 UTC (permalink / raw)
  To: Markus Elfring; +Cc: René Scharfe, git

Markus Elfring <Markus.Elfring@web.de> writes:

>> I too recall that seemingly redundant entries were noticed during
>> the review and at least back then removing the seemingly redundant
>> ones caused failures in rewriting.
>
> I am curious if the redundancy can be reconsidered once more.
>
> Do you refer to open issues around source code reformatting
> and pretty-printing together with the Coccinelle software here?

Sorry, I do not follow.  

If you are asking if I am interested in following bleeding edge
Coccinelle development and use this project as a guinea pig to do
so, then the answer is no.  I'd rather see us instead staying on the
trailing edge ;-) to make sure that we use common denominator
features that are known to be available in all widely deployed and
perhaps a bit dated versions that come with popular distros.

And if that means we have to accept inefficient ways to express our
patterns, we are willing to pay for that cost.

So, "the A.cocci file uses a set of inefficient expressions that can
be written more concisely like this, using the bleeding edge version
of the syntax" is not a useful improvement for the purpose of this
project, while "the A.cocci file uses a set of inefficient
expressions that can be written more concisely like this, and all
versions of cocci that is newer than X would understand the
notation.  Even distro D that tends to ship with fairly stale
versions of packages ship version X+n, so this change should be
safe" is very much appreciated.

Thanks.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-14  2:03       ` Junio C Hamano
@ 2019-11-14 13:15         ` Markus Elfring
  2019-11-14 16:41           ` René Scharfe
  0 siblings, 1 reply; 41+ messages in thread
From: Markus Elfring @ 2019-11-14 13:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, git

>>> I too recall that seemingly redundant entries were noticed during
>>> the review and at least back then removing the seemingly redundant
>>> ones caused failures in rewriting.
>>
>> I am curious if the redundancy can be reconsidered once more.
>>
>> Do you refer to open issues around source code reformatting
>> and pretty-printing together with the Coccinelle software here?
>
> Sorry, I do not follow.
>
> If you are asking if I am interested in following bleeding edge
> Coccinelle development and use this project as a guinea pig to do so,

I did not ask this.

You mentioned “failures”. - I became curious then if corresponding software
development challenges can be clarified a bit more.


> then the answer is no.

Such feedback is reasonable.


> I'd rather see us instead staying on the trailing edge ;-)
> to make sure that we use common denominator features that are known
> to be available in all widely deployed and perhaps a bit dated versions
> that come with popular distros.

I find that I am proposing script adjustments within the basic feature set
for the semantic patch language here.
Further fine-tuning will become possible, won't it?

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-14 13:15         ` Markus Elfring
@ 2019-11-14 16:41           ` René Scharfe
  2019-11-14 17:14             ` Markus Elfring
  0 siblings, 1 reply; 41+ messages in thread
From: René Scharfe @ 2019-11-14 16:41 UTC (permalink / raw)
  To: Markus Elfring, Junio C Hamano; +Cc: git

Am 14.11.19 um 14:15 schrieb Markus Elfring:
> You mentioned “failures”. - I became curious then if corresponding software
> development challenges can be clarified a bit more.

Let's try to restore/repeat the pertinent paragraph, with context and
attribution:

Am 13.11.19 um 03:11 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>> Am 12.11.19 um 16:08 schrieb Markus Elfring:
>>>
>>> Do you find the following code variant (for the semantic patch language) also useful?
>>>
>>>  memcpy(
>>> (       ptr, E, n *
>>> -       sizeof(*(ptr))
>>> +       sizeof(T)
>>> |       arr, E, n *
>>> -       sizeof(*(arr))
>>> +       sizeof(T)
>>> |       E, ptr, n *
>>> -       sizeof(*(ptr))
>>> +       sizeof(T)
>>> |       E, arr, n *
>>> -       sizeof(*(arr))
>>> +       sizeof(T)
>>> )
>>>        )
>
>> This reduces duplication in the semantic patch, which is nice.  I think
>> I tried something like that at the time, but found that it failed to
>> produce some of the cases in 921d49be86 ("use COPY_ARRAY for copying
>> arrays", 2019-06-15) for some reason.
> Thanks for mentioning.
>
> I too recall that seemingly redundant entries were noticed during
> the review and at least back then removing the seemingly redundant
> ones caused failures in rewriting.

You can see for yourself by:

 1. applying the patch at the bottom to implement your suggested change,
 2. running "git show 921d49be86 | patch -p1 -R" to undo 921d49be86,
 3. running "make contrib/coccinelle/array.cocci.patch",
 4. running "patch -p1 <contrib/coccinelle/array.cocci.patch",
 5. running "git diff".

If the new version of array.cocci is equivalent to the current one then
that last step should show no difference.  For me, "git diff --stat"
reports, however:

 contrib/coccinelle/array.cocci | 30 ++++++++++++++----------------
 fast-import.c                  |  2 +-
 packfile.c                     |  4 ++--
 pretty.c                       |  4 ++--
 4 files changed, 19 insertions(+), 21 deletions(-)

The changes in array.cocci are expected of course, but the others
indicate that the new version missed transformations that the current
version generated.

René


-- >8 --
diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci
index 46b8d2ee11..e7bcbefcc1 100644
--- a/contrib/coccinelle/array.cocci
+++ b/contrib/coccinelle/array.cocci
@@ -12,27 +12,25 @@ T *ptr;
 T[] arr;
 expression E, n;
 @@
+  memcpy(
 (
-  memcpy(ptr, E,
-- n * sizeof(*(ptr))
-+ n * sizeof(T)
-  )
+  ptr, E, n *
+- sizeof(*(ptr))
++ sizeof(T)
 |
-  memcpy(arr, E,
-- n * sizeof(*(arr))
-+ n * sizeof(T)
-  )
+  arr, E, n *
+- sizeof(*(arr))
++ sizeof(T)
 |
-  memcpy(E, ptr,
-- n * sizeof(*(ptr))
-+ n * sizeof(T)
-  )
+  E, ptr, n *
+- sizeof(*(ptr))
++ sizeof(T)
 |
-  memcpy(E, arr,
-- n * sizeof(*(arr))
-+ n * sizeof(T)
-  )
+  E, arr, n *
+- sizeof(*(arr))
++ sizeof(T)
 )
+  )

 @@
 type T;

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-14 16:41           ` René Scharfe
@ 2019-11-14 17:14             ` Markus Elfring
  2019-11-14 17:46               ` René Scharfe
  0 siblings, 1 reply; 41+ messages in thread
From: Markus Elfring @ 2019-11-14 17:14 UTC (permalink / raw)
  To: René Scharfe, git; +Cc: Junio C Hamano

> If the new version of array.cocci is equivalent to the current one then
> that last step should show no difference.

I hoped it.


>  contrib/coccinelle/array.cocci | 30 ++++++++++++++----------------
>  fast-import.c                  |  2 +-
>  packfile.c                     |  4 ++--
>  pretty.c                       |  4 ++--
>  4 files changed, 19 insertions(+), 21 deletions(-)
>
> The changes in array.cocci are expected of course, but the others
> indicate that the new version missed transformations that the current
> version generated.

Would we like to submit a bug report for the Coccinelle software?

Which version did you try out for the comparison of generated patches?

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-14 17:14             ` Markus Elfring
@ 2019-11-14 17:46               ` René Scharfe
  2019-11-15 11:11                 ` git-coccinelle: " Markus Elfring
  0 siblings, 1 reply; 41+ messages in thread
From: René Scharfe @ 2019-11-14 17:46 UTC (permalink / raw)
  To: Markus Elfring, git; +Cc: Junio C Hamano

Am 14.11.19 um 18:14 schrieb Markus Elfring:
>> If the new version of array.cocci is equivalent to the current one then
>> that last step should show no difference.
>
> I hoped it.
>
>
>>  contrib/coccinelle/array.cocci | 30 ++++++++++++++----------------
>>  fast-import.c                  |  2 +-
>>  packfile.c                     |  4 ++--
>>  pretty.c                       |  4 ++--
>>  4 files changed, 19 insertions(+), 21 deletions(-)
>>
>> The changes in array.cocci are expected of course, but the others
>> indicate that the new version missed transformations that the current
>> version generated.
>
> Would we like to submit a bug report for the Coccinelle software?

Not really, because...

> Which version did you try out for the comparison of generated patches?

... I use the last version of the Debian testing package, 1.0.4.deb-4.
https://tracker.debian.org/pkg/coccinelle says it was removed from
testing recently.  I was actually waiting for a more recent version
like 1.0.8 to be packaged; not sure what's going on there.

Anyway, someone who can reproduce the issue using the latest release
of Coccinelle would be in a better position to file a bug report.

René

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: git-coccinelle: adjustments for array.cocci?
  2019-11-14 17:46               ` René Scharfe
@ 2019-11-15 11:11                 ` Markus Elfring
  2019-11-15 14:20                   ` Markus Elfring
                                     ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Markus Elfring @ 2019-11-15 11:11 UTC (permalink / raw)
  To: Coccinelle; +Cc: git, Junio C Hamano, René Scharfe

> Anyway, someone who can reproduce the issue using the latest release
> of Coccinelle would be in a better position to file a bug report.

Hello,

I repeated the discussed source code transformation approach together
with the software combination “Coccinelle 1.0.8-00004-g842075f7” (OCaml 4.09).
https://github.com/coccinelle/coccinelle/commits/master

1. Yesterday I checked the source files out for the software “Git”
   according to the commit “The first batch post 2.24 cycle”.
   https://github.com/git/git/commit/d9f6f3b6195a0ca35642561e530798ad1469bd41

2. I restored a previous development status by the following command.

   git show 921d49be86 | patch -p1 -R

   See also:
   https://public-inbox.org/git/53346d52-e096-c651-f70a-ce6ca4d82ff9@web.de/

3. I stored a generated patch based on the currently released SmPL script.
   https://github.com/git/git/blob/177fbab747da4f58cb2a8ce010b3515c86dd67c9/contrib/coccinelle/array.cocci

4. I applied the following patch then.

diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci
index 46b8d2ee11..89df184bbd 100644
--- a/contrib/coccinelle/array.cocci
+++ b/contrib/coccinelle/array.cocci
@@ -12,27 +12,21 @@ T *ptr;
 T[] arr;
 expression E, n;
 @@
-(
-  memcpy(ptr, E,
-- n * sizeof(*(ptr))
-+ n * sizeof(T)
-  )
-|
-  memcpy(arr, E,
-- n * sizeof(*(arr))
-+ n * sizeof(T)
-  )
-|
-  memcpy(E, ptr,
-- n * sizeof(*(ptr))
-+ n * sizeof(T)
-  )
-|
-  memcpy(E, arr,
-- n * sizeof(*(arr))
-+ n * sizeof(T)
-  )
+ memcpy(
+(       ptr, E, n *
+-       sizeof(*(ptr))
++       sizeof(T)
+|       arr, E, n *
+-       sizeof(*(arr))
++       sizeof(T)
+|       E, ptr, n *
+-       sizeof(*(ptr))
++       sizeof(T)
+|       E, arr, n *
+-       sizeof(*(arr))
++       sizeof(T)
 )
+       )

 @@
 type T;

   I suggested in this way to move a bit of SmPL code.

5. I stored another generated patch based on the adjusted SmPL script.

6. I performed a corresponding file comparison.

--- array-released.diff	2019-11-14 21:29:11.020576916 +0100
+++ array-reduced1.diff	2019-11-14 21:45:58.931956527 +0100
@@ -6,24 +6,10 @@
  	r->entry_count = t->entry_count;
  	r->delta_depth = t->delta_depth;
 -	memcpy(r->entries,t->entries,t->entry_count*sizeof(t->entries[0]));
-+	COPY_ARRAY(r->entries, t->entries, t->entry_count);
++	memcpy(r->entries,t->entries,t->entry_count*sizeof(*(t->entries)));
  	release_tree_content(t);
  	return r;
  }
-diff -u -p a/pretty.c b/pretty.c
---- a/pretty.c
-+++ b/pretty.c
-@@ -106,8 +106,8 @@ static void setup_commit_formats(void)
- 	commit_formats_len = ARRAY_SIZE(builtin_formats);
- 	builtin_formats_len = commit_formats_len;
- 	ALLOC_GROW(commit_formats, commit_formats_len, commit_formats_alloc);
--	memcpy(commit_formats, builtin_formats,
--	       sizeof(*builtin_formats)*ARRAY_SIZE(builtin_formats));
-+	COPY_ARRAY(commit_formats, builtin_formats,
-+		   ARRAY_SIZE(builtin_formats));
-
- 	git_config(git_pretty_formats_config, NULL);
- }
 diff -u -p a/packfile.c b/packfile.c
 --- a/packfile.c
 +++ b/packfile.c
@@ -36,17 +22,6 @@
  		} else {
  			ALLOC_GROW(poi_stack, poi_stack_nr+1, poi_stack_alloc);
  		}
-@@ -1698,8 +1698,8 @@ void *unpack_entry(struct repository *r,
- 		    && delta_stack == small_delta_stack) {
- 			delta_stack_alloc = alloc_nr(delta_stack_nr);
- 			ALLOC_ARRAY(delta_stack, delta_stack_alloc);
--			memcpy(delta_stack, small_delta_stack,
--			       sizeof(*delta_stack)*delta_stack_nr);
-+			COPY_ARRAY(delta_stack, small_delta_stack,
-+				   delta_stack_nr);
- 		} else {
- 			ALLOC_GROW(delta_stack, delta_stack_nr+1, delta_stack_alloc);
- 		}
 diff -u -p a/compat/regex/regexec.c b/compat/regex/regexec.c
 --- a/compat/regex/regexec.c
 +++ b/compat/regex/regexec.c


How do you think about the differences from this test result?

Regards,
Markus

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: git-coccinelle: adjustments for array.cocci?
  2019-11-15 11:11                 ` git-coccinelle: " Markus Elfring
@ 2019-11-15 14:20                   ` Markus Elfring
  2019-11-15 18:50                   ` Markus Elfring
  2019-11-16 17:57                   ` Julia Lawall
  2 siblings, 0 replies; 41+ messages in thread
From: Markus Elfring @ 2019-11-15 14:20 UTC (permalink / raw)
  To: Coccinelle; +Cc: git, Junio C Hamano, René Scharfe

> --- array-released.diff	2019-11-14 21:29:11.020576916 +0100
> +++ array-reduced1.diff	2019-11-14 21:45:58.931956527 +0100
> @@ -6,24 +6,10 @@
>   	r->entry_count = t->entry_count;
>   	r->delta_depth = t->delta_depth;
>  -	memcpy(r->entries,t->entries,t->entry_count*sizeof(t->entries[0]));
> -+	COPY_ARRAY(r->entries, t->entries, t->entry_count);
> ++	memcpy(r->entries,t->entries,t->entry_count*sizeof(*(t->entries)));
>   	release_tree_content(t);
>   	return r;
>   }

Can another variant for a transformation rule help to clarify unexpected
software behaviour around data processing with the semantic patch language?

@@
expression dst, src, n, E;
type T;
T *ptr;
T[] arr;
@@
  memcpy(
(        dst, src, sizeof(
+                         *(
                            E
-                            [...]
+                          )
                          ) * n
|
        ptr, src, sizeof(
-                        *(ptr)
+                        T
                        ) * n
|       arr, src, sizeof(
-                        *(arr)
+                        T
                        ) * n
|       dst, ptr, sizeof(
-                        *(ptr)
+                        T
                        ) * n
|       dst, arr, sizeof(
-                        *(arr)
+                        T
                        ) * n
)
       )


elfring@Sonne:~/Projekte/git/lokal> spatch contrib/coccinelle/array-test3.cocci fast-import.c
…

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: git-coccinelle: adjustments for array.cocci?
  2019-11-15 11:11                 ` git-coccinelle: " Markus Elfring
  2019-11-15 14:20                   ` Markus Elfring
@ 2019-11-15 18:50                   ` Markus Elfring
  2019-11-16  1:00                     ` [Cocci] " Julia Lawall
  2019-11-16 17:57                   ` Julia Lawall
  2 siblings, 1 reply; 41+ messages in thread
From: Markus Elfring @ 2019-11-15 18:50 UTC (permalink / raw)
  To: Coccinelle, git; +Cc: Junio C Hamano, René Scharfe

> --- array-released.diff	2019-11-14 21:29:11.020576916 +0100
> +++ array-reduced1.diff	2019-11-14 21:45:58.931956527 +0100
> @@ -6,24 +6,10 @@
>   	r->entry_count = t->entry_count;
>   	r->delta_depth = t->delta_depth;
>  -	memcpy(r->entries,t->entries,t->entry_count*sizeof(t->entries[0]));
> -+	COPY_ARRAY(r->entries, t->entries, t->entry_count);
> ++	memcpy(r->entries,t->entries,t->entry_count*sizeof(*(t->entries)));
>   	release_tree_content(t);
>   	return r;
>   }

It took a while to become more aware of software development challenges
for the safe data processing with the semantic patch language also
at such a source code place.
https://github.com/git/git/blob/3edfcc65fdfc708c1c8f1d314885eecf9beb9b67/fast-import.c#L640

I got the impression that the Coccinelle software is occasionally able
to determine from the search specification “sizeof(T)” the corresponding
data type for code like “*(t->entries)”.
But it seems that there are circumstances to consider where the desired
data type was not automatically determined.
Thus the data processing  can become safer by explicitly expressing
the case distinction for the handling of expressions.

Adjusted transformation rule:
@@
type T;
T* dst_ptr, src_ptr;
T[] dst_arr, src_arr;
expression n, x;
@@
-memcpy
+COPY_ARRAY
       (
(       dst_ptr
|       dst_arr
)
       ,
(       src_ptr
|       src_arr
)
       ,
-       (n) * \( sizeof(T) \| sizeof(*(x)) \)
+       n
       )


Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-12 18:37 ` René Scharfe
  2019-11-13  2:11   ` Junio C Hamano
@ 2019-11-15 20:37   ` Markus Elfring
  2019-11-16 21:13     ` René Scharfe
  2019-11-16 16:33   ` Markus Elfring
  2 siblings, 1 reply; 41+ messages in thread
From: Markus Elfring @ 2019-11-15 20:37 UTC (permalink / raw)
  To: René Scharfe, git

> This eliminates duplication in the semantic patch, which is good.

Thanks that you think in such a direction.


> It messes up the indentation of n in some of the cases in 921d49be86 ("use
> COPY_ARRAY for copying arrays", 2019-06-15), though.  Hmm, but that can
> be cured by duplicating the comma:

I have picked up further improvement possibilities for this SmPL script.
Would you like to integrate any of these changes?


@@
expression dst, src, n, E;
@@
 memcpy(dst, src, sizeof(
+                        *(
                           E
-                           [...]
+                         )
                         ) * n
       )

@@
type T;
T *ptr;
T[] arr;
expression E, n;
@@
 memcpy(
(       ptr, E, sizeof(
-                      *(ptr)
+                      T
                      ) * n
|       arr, E, sizeof(
-                      *(arr)
+                      T
                      ) * n
|       E, ptr, sizeof(
-                      *(ptr)
+                      T
                      ) * n
|       E, arr, sizeof(
-                      *(arr)
+                      T
                      ) * n
)
       )

@@
type T;
T* dst_ptr, src_ptr;
T[] dst_arr, src_arr;
expression n, x;
@@
-memcpy
+COPY_ARRAY
       (
(       dst_ptr
|       dst_arr
)
       ,
(       src_ptr
|       src_arr
)
-      , (n) * \( sizeof(T) \| sizeof(*(x)) \)
+      , n
       )

@@
type T;
T* dst, src, ptr;
expression n;
@@
(
-memmove
+MOVE_ARRAY
        (dst, src
-                , (n) * \( sizeof(* \( dst \| src \) ) \| sizeof(T) \)
+                , n
        )
|
-ptr = xmalloc((n) * \( sizeof(*ptr) \| sizeof(T) \))
+ALLOC_ARRAY(ptr, n)
);


Now I observe that the placement of space characters can be a coding style
concern at four places for adjusted lines by the generated patch.
Would you like to clarify remaining issues for pretty-printing
in such use cases?

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Cocci] git-coccinelle: adjustments for array.cocci?
  2019-11-15 18:50                   ` Markus Elfring
@ 2019-11-16  1:00                     ` Julia Lawall
  2019-11-16  6:57                       ` Markus Elfring
  2019-11-16  8:29                       ` Markus Elfring
  0 siblings, 2 replies; 41+ messages in thread
From: Julia Lawall @ 2019-11-16  1:00 UTC (permalink / raw)
  To: Markus Elfring; +Cc: Coccinelle, git, René Scharfe, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 2089 bytes --]



On Fri, 15 Nov 2019, Markus Elfring wrote:

> > --- array-released.diff	2019-11-14 21:29:11.020576916 +0100
> > +++ array-reduced1.diff	2019-11-14 21:45:58.931956527 +0100
> > @@ -6,24 +6,10 @@
> >   	r->entry_count = t->entry_count;
> >   	r->delta_depth = t->delta_depth;
> >  -	memcpy(r->entries,t->entries,t->entry_count*sizeof(t->entries[0]));
> > -+	COPY_ARRAY(r->entries, t->entries, t->entry_count);
> > ++	memcpy(r->entries,t->entries,t->entry_count*sizeof(*(t->entries)));
> >   	release_tree_content(t);
> >   	return r;
> >   }
>
> It took a while to become more aware of software development challenges
> for the safe data processing with the semantic patch language also
> at such a source code place.
> https://github.com/git/git/blob/3edfcc65fdfc708c1c8f1d314885eecf9beb9b67/fast-import.c#L640
>
> I got the impression that the Coccinelle software is occasionally able
> to determine from the search specification “sizeof(T)” the corresponding
> data type for code like “*(t->entries)”.

It can determine the type of t->entries if it has access to the definition
of the type of t.  This type may be in a header file.  If you want
Coccinelle to be able to find this information you can use the option
--all-includes or --recursive-includes.  It will be more efficient with
the option --include-headers-for-types.

julia

> But it seems that there are circumstances to consider where the desired
> data type was not automatically determined.
> Thus the data processing  can become safer by explicitly expressing
> the case distinction for the handling of expressions.
>
> Adjusted transformation rule:
> @@
> type T;
> T* dst_ptr, src_ptr;
> T[] dst_arr, src_arr;
> expression n, x;
> @@
> -memcpy
> +COPY_ARRAY
>        (
> (       dst_ptr
> |       dst_arr
> )
>        ,
> (       src_ptr
> |       src_arr
> )
>        ,
> -       (n) * \( sizeof(T) \| sizeof(*(x)) \)
> +       n
>        )
>
>
> Regards,
> Markus
> _______________________________________________
> Cocci mailing list
> Cocci@systeme.lip6.fr
> https://systeme.lip6.fr/mailman/listinfo/cocci
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Cocci] git-coccinelle: adjustments for array.cocci?
  2019-11-16  1:00                     ` [Cocci] " Julia Lawall
@ 2019-11-16  6:57                       ` Markus Elfring
  2019-11-16  8:29                       ` Markus Elfring
  1 sibling, 0 replies; 41+ messages in thread
From: Markus Elfring @ 2019-11-16  6:57 UTC (permalink / raw)
  To: Julia Lawall; +Cc: Coccinelle, git, René Scharfe, Junio C Hamano

>> https://github.com/git/git/blob/3edfcc65fdfc708c1c8f1d314885eecf9beb9b67/fast-import.c#L640
>>
>> I got the impression that the Coccinelle software is occasionally able
>> to determine from the search specification “sizeof(T)” the corresponding
>> data type for code like “*(t->entries)”.
>
> It can determine the type of t->entries if it has access to the definition
> of the type of t.

Should this type determination always work here because the data structure
“tree_content” for the parameter “t” of the function “grow_tree_content”
is defined in the same source file?
https://github.com/git/git/blob/3edfcc65fdfc708c1c8f1d314885eecf9beb9b67/fast-import.c#L85


>                    This type may be in a header file.  If you want
> Coccinelle to be able to find this information you can use the option
> --all-includes or --recursive-includes.  It will be more efficient with
> the option --include-headers-for-types.

Such information can be more helpful in other situations than the mentioned
test case.


>> But it seems that there are circumstances to consider where the desired
>> data type was not automatically determined.

Would you like to take the presented differences from the discussed
before/after comparison better into account?

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Cocci] git-coccinelle: adjustments for array.cocci?
  2019-11-16  1:00                     ` [Cocci] " Julia Lawall
  2019-11-16  6:57                       ` Markus Elfring
@ 2019-11-16  8:29                       ` Markus Elfring
  1 sibling, 0 replies; 41+ messages in thread
From: Markus Elfring @ 2019-11-16  8:29 UTC (permalink / raw)
  To: Julia Lawall; +Cc: Coccinelle, git, René Scharfe, Junio C Hamano

> It can determine the type of t->entries if it has access to the definition
> of the type of t.

I would like to point another implementation detail out.

Another known function was also an update candidate.
https://github.com/git/git/blob/9a1180fc304ad9831641e5788e9c8d3dfc10ccdd/pretty.c#L90

elfring@Sonne:~/Projekte/git/lokal> spatch contrib/coccinelle/array.cocci pretty.c
…
@@ -106,8 +106,8 @@ static void setup_commit_formats(void)
        commit_formats_len = ARRAY_SIZE(builtin_formats);
        builtin_formats_len = commit_formats_len;
        ALLOC_GROW(commit_formats, commit_formats_len, commit_formats_alloc);
-       memcpy(commit_formats, builtin_formats,
-              sizeof(*builtin_formats)*ARRAY_SIZE(builtin_formats));
+       COPY_ARRAY(commit_formats, builtin_formats,
+                  ARRAY_SIZE(builtin_formats));

        git_config(git_pretty_formats_config, NULL);
 }


This patch generation can work based on the following SmPL code combination.

“…
expression n, x;
…
-      , (n) * \( sizeof(T) \| sizeof(*(x)) \)
…”

The asterisk should refer to a pointer expression within a sizeof operator.
I got informed that the semantic patch language would support such a restriction.

Thus I have tried out to specify the corresponding metavariables in this way.

“…
expression n;
expression* x;
…”

But the shown diff hunk is not regenerated by this SmPL script variant.
How should an array like “builtin_formats” (which is even defined in the same function)
be treated by the Coccinelle software in such use cases?

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-12 18:37 ` René Scharfe
  2019-11-13  2:11   ` Junio C Hamano
  2019-11-15 20:37   ` coccinelle: " Markus Elfring
@ 2019-11-16 16:33   ` Markus Elfring
  2019-11-16 21:38     ` René Scharfe
  2 siblings, 1 reply; 41+ messages in thread
From: Markus Elfring @ 2019-11-16 16:33 UTC (permalink / raw)
  To: René Scharfe, git

> This reduces duplication in the semantic patch, which is nice.  I think
> I tried something like that at the time, but found that it failed to
> produce some of the cases in 921d49be86 ("use COPY_ARRAY for copying
> arrays", 2019-06-15) for some reason.

I propose to integrate an other solution variant.

* How do you think about to delete questionable transformation rules
  together with increasing the usage of nested disjunctions in this script
  for the semantic patch language?

* Can a single transformation rule become sufficient for the discussed
  change pattern?


@@
type T;
T* dst_ptr, src_ptr, ptr;
T[] dst_arr, src_arr;
expression n, x;
@@
(
-memcpy
+COPY_ARRAY
       (
(       dst_ptr
|       dst_arr
)
       ,
(       src_ptr
|       src_arr
)
-      , (n) * \( sizeof(T) \| sizeof( \( *(x) \| x[...] \) ) \)
+      , n
       )
|
-memmove
+MOVE_ARRAY
        (dst_ptr,
         src_ptr
-               , (n) * \( sizeof(* \( dst_ptr \| src_ptr \) ) \| sizeof(T) \)
+               , n
        )
|
-ptr = xmalloc((n) * \( sizeof(*ptr) \| sizeof(T) \))
+ALLOC_ARRAY(ptr, n)
)


Would you like to clarify remaining challenges for pretty-printing
in such use cases?

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Cocci] git-coccinelle: adjustments for array.cocci?
  2019-11-15 11:11                 ` git-coccinelle: " Markus Elfring
  2019-11-15 14:20                   ` Markus Elfring
  2019-11-15 18:50                   ` Markus Elfring
@ 2019-11-16 17:57                   ` Julia Lawall
  2019-11-16 18:29                     ` Markus Elfring
  2 siblings, 1 reply; 41+ messages in thread
From: Julia Lawall @ 2019-11-16 17:57 UTC (permalink / raw)
  To: Markus Elfring; +Cc: Coccinelle, René Scharfe, Junio C Hamano, git

[-- Attachment #1: Type: text/plain, Size: 4506 bytes --]



On Fri, 15 Nov 2019, Markus Elfring wrote:

> > Anyway, someone who can reproduce the issue using the latest release
> > of Coccinelle would be in a better position to file a bug report.
>
> Hello,
>
> I repeated the discussed source code transformation approach together
> with the software combination “Coccinelle 1.0.8-00004-g842075f7” (OCaml 4.09).
> https://github.com/coccinelle/coccinelle/commits/master
>
> 1. Yesterday I checked the source files out for the software “Git”
>    according to the commit “The first batch post 2.24 cycle”.
>    https://github.com/git/git/commit/d9f6f3b6195a0ca35642561e530798ad1469bd41
>
> 2. I restored a previous development status by the following command.
>
>    git show 921d49be86 | patch -p1 -R
>
>    See also:
>    https://public-inbox.org/git/53346d52-e096-c651-f70a-ce6ca4d82ff9@web.de/
>
> 3. I stored a generated patch based on the currently released SmPL script.
>    https://github.com/git/git/blob/177fbab747da4f58cb2a8ce010b3515c86dd67c9/contrib/coccinelle/array.cocci
>
> 4. I applied the following patch then.
>
> diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci
> index 46b8d2ee11..89df184bbd 100644
> --- a/contrib/coccinelle/array.cocci
> +++ b/contrib/coccinelle/array.cocci
> @@ -12,27 +12,21 @@ T *ptr;
>  T[] arr;
>  expression E, n;
>  @@
> -(
> -  memcpy(ptr, E,
> -- n * sizeof(*(ptr))
> -+ n * sizeof(T)
> -  )
> -|
> -  memcpy(arr, E,
> -- n * sizeof(*(arr))
> -+ n * sizeof(T)
> -  )
> -|
> -  memcpy(E, ptr,
> -- n * sizeof(*(ptr))
> -+ n * sizeof(T)
> -  )
> -|
> -  memcpy(E, arr,
> -- n * sizeof(*(arr))
> -+ n * sizeof(T)
> -  )
> + memcpy(
> +(       ptr, E, n *
> +-       sizeof(*(ptr))
> ++       sizeof(T)
> +|       arr, E, n *
> +-       sizeof(*(arr))
> ++       sizeof(T)
> +|       E, ptr, n *
> +-       sizeof(*(ptr))
> ++       sizeof(T)
> +|       E, arr, n *
> +-       sizeof(*(arr))
> ++       sizeof(T)
>  )
> +       )

This seems quite unreadable, in contrast to the original code.

>
>  @@
>  type T;
>
>    I suggested in this way to move a bit of SmPL code.
>
> 5. I stored another generated patch based on the adjusted SmPL script.

No idea what it means to store a patch.

> 6. I performed a corresponding file comparison.
>
> --- array-released.diff	2019-11-14 21:29:11.020576916 +0100
> +++ array-reduced1.diff	2019-11-14 21:45:58.931956527 +0100
> @@ -6,24 +6,10 @@
>   	r->entry_count = t->entry_count;
>   	r->delta_depth = t->delta_depth;
>  -	memcpy(r->entries,t->entries,t->entry_count*sizeof(t->entries[0]));
> -+	COPY_ARRAY(r->entries, t->entries, t->entry_count);
> ++	memcpy(r->entries,t->entries,t->entry_count*sizeof(*(t->entries)));
>   	release_tree_content(t);
>   	return r;
>   }

I have no idea what is being compared here. The COPY_ARRAY thing looks
nice, but doesn't seem to have anything to do with your semantic patch.

julia



> -diff -u -p a/pretty.c b/pretty.c
> ---- a/pretty.c
> -+++ b/pretty.c
> -@@ -106,8 +106,8 @@ static void setup_commit_formats(void)
> - 	commit_formats_len = ARRAY_SIZE(builtin_formats);
> - 	builtin_formats_len = commit_formats_len;
> - 	ALLOC_GROW(commit_formats, commit_formats_len, commit_formats_alloc);
> --	memcpy(commit_formats, builtin_formats,
> --	       sizeof(*builtin_formats)*ARRAY_SIZE(builtin_formats));
> -+	COPY_ARRAY(commit_formats, builtin_formats,
> -+		   ARRAY_SIZE(builtin_formats));
> -
> - 	git_config(git_pretty_formats_config, NULL);
> - }
>  diff -u -p a/packfile.c b/packfile.c
>  --- a/packfile.c
>  +++ b/packfile.c
> @@ -36,17 +22,6 @@
>   		} else {
>   			ALLOC_GROW(poi_stack, poi_stack_nr+1, poi_stack_alloc);
>   		}
> -@@ -1698,8 +1698,8 @@ void *unpack_entry(struct repository *r,
> - 		    && delta_stack == small_delta_stack) {
> - 			delta_stack_alloc = alloc_nr(delta_stack_nr);
> - 			ALLOC_ARRAY(delta_stack, delta_stack_alloc);
> --			memcpy(delta_stack, small_delta_stack,
> --			       sizeof(*delta_stack)*delta_stack_nr);
> -+			COPY_ARRAY(delta_stack, small_delta_stack,
> -+				   delta_stack_nr);
> - 		} else {
> - 			ALLOC_GROW(delta_stack, delta_stack_nr+1, delta_stack_alloc);
> - 		}
>  diff -u -p a/compat/regex/regexec.c b/compat/regex/regexec.c
>  --- a/compat/regex/regexec.c
>  +++ b/compat/regex/regexec.c
>
>
> How do you think about the differences from this test result?
>
> Regards,
> Markus
> _______________________________________________
> Cocci mailing list
> Cocci@systeme.lip6.fr
> https://systeme.lip6.fr/mailman/listinfo/cocci
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [Cocci] git-coccinelle: adjustments for array.cocci?
  2019-11-16 17:57                   ` Julia Lawall
@ 2019-11-16 18:29                     ` Markus Elfring
  0 siblings, 0 replies; 41+ messages in thread
From: Markus Elfring @ 2019-11-16 18:29 UTC (permalink / raw)
  To: Julia Lawall; +Cc: Coccinelle, René Scharfe, Junio C Hamano, git

>> + memcpy(
>> +(       ptr, E, n *
>> +-       sizeof(*(ptr))
>> ++       sizeof(T)
>> +|       arr, E, n *
>> +-       sizeof(*(arr))
>> ++       sizeof(T)
>> +|       E, ptr, n *
>> +-       sizeof(*(ptr))
>> ++       sizeof(T)
>> +|       E, arr, n *
>> +-       sizeof(*(arr))
>> ++       sizeof(T)
>>  )
>> +       )
>
> This seems quite unreadable, in contrast to the original code.

The code formatting can vary for improved applications of SmPL disjunctions.

See also related update suggestions:
* https://public-inbox.org/git/05ab1110-2115-7886-f890-9983caabc52c@web.de/
* https://public-inbox.org/git/75b9417b-14a7-c9c6-25eb-f6e05f340376@web.de/


>> 5. I stored another generated patch based on the adjusted SmPL script.
>
> No idea what it means to store a patch.

I put the output from the program “spatch” into a text file like “array-reduced1.diff”
in a selected directory.


>> 6. I performed a corresponding file comparison.
>>
>> --- array-released.diff	2019-11-14 21:29:11.020576916 +0100
>> +++ array-reduced1.diff	2019-11-14 21:45:58.931956527 +0100
>> @@ -6,24 +6,10 @@
>>   	r->entry_count = t->entry_count;
>>   	r->delta_depth = t->delta_depth;
>>  -	memcpy(r->entries,t->entries,t->entry_count*sizeof(t->entries[0]));
>> -+	COPY_ARRAY(r->entries, t->entries, t->entry_count);
>> ++	memcpy(r->entries,t->entries,t->entry_count*sizeof(*(t->entries)));
>>   	release_tree_content(t);
>>   	return r;
>>   }
>
> I have no idea what is being compared here.

I suggest to take another look at the described steps then.


> The COPY_ARRAY thing looks nice, but doesn't seem to have anything to do
> with your semantic patch.

I find your interpretation of the presented software situation questionable.

* I got the impression in the meantime that my suggestion for a refactoring
  of a specific SmPL disjunction influenced transformation results for
  a subsequent SmPL rule in unexpected ways.

* Other software adjustments and solution variants can trigger further
  development considerations, can't they?

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-15 20:37   ` coccinelle: " Markus Elfring
@ 2019-11-16 21:13     ` René Scharfe
  2019-11-17  7:56       ` Markus Elfring
  0 siblings, 1 reply; 41+ messages in thread
From: René Scharfe @ 2019-11-16 21:13 UTC (permalink / raw)
  To: Markus Elfring; +Cc: git

Am 15.11.19 um 21:37 schrieb Markus Elfring:
>> This eliminates duplication in the semantic patch, which is good.
>
> Thanks that you think in such a direction.
>
>
>> It messes up the indentation of n in some of the cases in 921d49be86 ("use
>> COPY_ARRAY for copying arrays", 2019-06-15), though.  Hmm, but that can
>> be cured by duplicating the comma:
>
> I have picked up further improvement possibilities for this SmPL script.
> Would you like to integrate any of these changes?

Not sure, could you please elaborate on the benefits of each proposed
change?

> @@
> expression dst, src, n, E;
> @@
>  memcpy(dst, src, sizeof(
> +                        *(
>                            E
> -                           [...]
> +                         )
>                          ) * n
>        )

That's longer and looks more complicated to me than what we currently have:

  @@
  expression dst, src, n, E;
  @@
    memcpy(dst, src, n * sizeof(
  - E[...]
  + *(E)
    ))

Avoiding to duplicate E doesn't seem to be worth it.  I can see that
indenting the sizeof parameter and parentheses could improve readability,
though.

> @@
> type T;
> T *ptr;
> T[] arr;
> expression E, n;
> @@
>  memcpy(
> (       ptr, E, sizeof(
> -                      *(ptr)
> +                      T
>                       ) * n
> |       arr, E, sizeof(
> -                      *(arr)
> +                      T
>                       ) * n
> |       E, ptr, sizeof(
> -                      *(ptr)
> +                      T
>                       ) * n
> |       E, arr, sizeof(
> -                      *(arr)
> +                      T
>                       ) * n
> )
>        )

This still fails to regenerate two of the changes from 921d49be86 (use
COPY_ARRAY for copying arrays, 2019-06-15), at least with for me (and
Coccinelle 1.0.4).

> @@
> type T;
> T* dst_ptr, src_ptr;
> T[] dst_arr, src_arr;
> expression n, x;
> @@
> -memcpy
> +COPY_ARRAY
>        (
> (       dst_ptr
> |       dst_arr
> )
>        ,
> (       src_ptr
> |       src_arr
> )
> -      , (n) * \( sizeof(T) \| sizeof(*(x)) \)
> +      , n
>        )

That x could be anything -- it's not tied to the element size of source
or destination.  Such a transformation might change the meaning of the
code, as COPY_ARRAY will use the element size of the destination behind
the scenes.  So that doesn't look safe to me.

> @@
> type T;
> T* dst, src, ptr;
> expression n;
> @@
> (
> -memmove
> +MOVE_ARRAY
>         (dst, src
> -                , (n) * \( sizeof(* \( dst \| src \) ) \| sizeof(T) \)
> +                , n
>         )
> |
> -ptr = xmalloc((n) * \( sizeof(*ptr) \| sizeof(T) \))
> +ALLOC_ARRAY(ptr, n)
> );

memmove/MOVE_ARRAY and xmalloc/ALLOC_ARRAY are quite different; why
would we want to jam transformations for them into the same rule like
this?  The only overlap seems to be n.  Handling memmove/MOVE_ARRAY and
memcpy/COPY_ARRAY together would make more sense, as they take the same
kinds of parameters.

I didn't know that disjunctions can be specified inline using \(, \|
and \), though.  Rules can be much more compact that way.  Mixing
languages like that can also be quite confusing.  Syntax highlighting
could help; https://github.com/ahf/cocci-syntax at least doesn't
show those any different from regular code, though.

> Now I observe that the placement of space characters can be a coding style
> concern at four places for adjusted lines by the generated patch.
> Would you like to clarify remaining issues for pretty-printing
> in such use cases?

Ideally, generated code should adhere to Documentation/CodingGuidelines,
so that it can be accepted without requiring hand-editing.

René

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-16 16:33   ` Markus Elfring
@ 2019-11-16 21:38     ` René Scharfe
  2019-11-17  8:19       ` Markus Elfring
  0 siblings, 1 reply; 41+ messages in thread
From: René Scharfe @ 2019-11-16 21:38 UTC (permalink / raw)
  To: Markus Elfring; +Cc: git

Am 16.11.19 um 17:33 schrieb Markus Elfring:
>> This reduces duplication in the semantic patch, which is nice.  I think
>> I tried something like that at the time, but found that it failed to
>> produce some of the cases in 921d49be86 ("use COPY_ARRAY for copying
>> arrays", 2019-06-15) for some reason.
>
> I propose to integrate an other solution variant.
>
> * How do you think about to delete questionable transformation rules
>   together with increasing the usage of nested disjunctions in this script
>   for the semantic patch language?

Which transformation rules are questionable and why?  Removing broken
or ineffective rules would be very welcome.

Specifying disjunctions inline can make rules shorter, but harder to
understand due to mixing languages.  Perhaps this is a matter of
getting used to it, and syntax highlighting might help a bit.

> * Can a single transformation rule become sufficient for the discussed
>   change pattern?
>
>
> @@
> type T;
> T* dst_ptr, src_ptr, ptr;
> T[] dst_arr, src_arr;
> expression n, x;
> @@
> (
> -memcpy
> +COPY_ARRAY
>        (
> (       dst_ptr
> |       dst_arr
> )
>        ,
> (       src_ptr
> |       src_arr
> )
> -      , (n) * \( sizeof(T) \| sizeof( \( *(x) \| x[...] \) ) \)
> +      , n
>        )
> |
> -memmove
> +MOVE_ARRAY
>         (dst_ptr,
>          src_ptr
> -               , (n) * \( sizeof(* \( dst_ptr \| src_ptr \) ) \| sizeof(T) \)
> +               , n
>         )
> |
> -ptr = xmalloc((n) * \( sizeof(*ptr) \| sizeof(T) \))
> +ALLOC_ARRAY(ptr, n)
> )

memmove/MOVE_ARRAY take the same kind of parameters as
memcpy/COPY_ARRAY, so handling them in the same rule makes sense.
The former could take advantage of the transformations for arrays
that the latter has.

Mixing in the unrelated xmalloc/ALLOC_ARRAY transformation does
not make sense to me, though.

Matching sizeof of anything (with the x) can produce inaccurate
transformations, as mentioned in the other reply I just sent.

> Would you like to clarify remaining challenges for pretty-printing
> in such use cases?

Not sure what you mean here.  Did my other reply answer it?  If it
didn't then please state what's unclear to you.

René

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-16 21:13     ` René Scharfe
@ 2019-11-17  7:56       ` Markus Elfring
  2019-11-17 13:40         ` René Scharfe
  0 siblings, 1 reply; 41+ messages in thread
From: Markus Elfring @ 2019-11-17  7:56 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

>> @@
>> expression dst, src, n, E;
>> @@
>>  memcpy(dst, src, sizeof(
>> +                        *(
>>                            E
>> -                           [...]
>> +                         )
>>                          ) * n
>>        )
>
> That's longer and looks more complicated to me

I point another possibility out to express a change specification
by the means of the semantic patch language.
How would you think about such SmPL code if the indentation
will be reduced?


> than what we currently have:
>   @@
>   expression dst, src, n, E;
>   @@
>     memcpy(dst, src, n * sizeof(
>   - E[...]
>   + *(E)
>     ))
>
> Avoiding to duplicate E doesn't seem to be worth it.

I show other development preferences occasionally.


> I can see that indenting the sizeof parameter and parentheses could
> improve readability, though.

Thanks that you can follow such coding style aspects.


>> @@
>> type T;
>> T *ptr;
>> T[] arr;
>> expression E, n;
>> @@
>>  memcpy(
>> (       ptr, E, sizeof(
>> -                      *(ptr)
>> +                      T
>>                       ) * n
>> |       arr, E, sizeof(
>> -                      *(arr)
>> +                      T
>>                       ) * n
>> |       E, ptr, sizeof(
>> -                      *(ptr)
>> +                      T
>>                       ) * n
>> |       E, arr, sizeof(
>> -                      *(arr)
>> +                      T
>>                       ) * n
>> )
>>        )
>
> This still fails to regenerate two of the changes from 921d49be86
> (use COPY_ARRAY for copying arrays, 2019-06-15), at least with for me
> (and Coccinelle 1.0.4).

Would you become keen to find the reasons out for unexpected data processing
results (also by the software combination “Coccinelle 1.0.8-00004-g842075f7”)
at this place?

But this transformation rule can probably be omitted if the usage
of SmPL disjunctions will be increased in a subsequent rule, can't it?


>> @@
>> type T;
>> T* dst_ptr, src_ptr;
>> T[] dst_arr, src_arr;
>> expression n, x;
>> @@
>> -memcpy
>> +COPY_ARRAY
>>        (
>> (       dst_ptr
>> |       dst_arr
>> )
>>        ,
>> (       src_ptr
>> |       src_arr
>> )
>> -      , (n) * \( sizeof(T) \| sizeof(*(x)) \)
>> +      , n
>>        )
>
> That x could be anything -- it's not tied to the element size of source
> or destination.  Such a transformation might change the meaning of the
> code, as COPY_ARRAY will use the element size of the destination behind
> the scenes.  So that doesn't look safe to me.

Would you like to use the SmPL code “*( \( src_ptr \| src_arr \) )” instead?


>> @@
>> type T;
>> T* dst, src, ptr;
>> expression n;
>> @@
>> (
>> -memmove
>> +MOVE_ARRAY
>>         (dst, src
>> -                , (n) * \( sizeof(* \( dst \| src \) ) \| sizeof(T) \)
>> +                , n
>>         )
>> |
>> -ptr = xmalloc((n) * \( sizeof(*ptr) \| sizeof(T) \))
>> +ALLOC_ARRAY(ptr, n)
>> );
>
> memmove/MOVE_ARRAY and xmalloc/ALLOC_ARRAY are quite different;

These functions provide another programming interface.


> why would we want to jam transformations for them into the same rule
> like this?

Possible nicer run time characteristics by the Coccinelle software.


> The only overlap seems to be n.

These case distinctions can share also the metavariable “T” for the
desired source code deletion.


> Handling memmove/MOVE_ARRAY and memcpy/COPY_ARRAY together would make
> more sense, as they take the same kinds of parameters.

Would you like to adjust the SmPL code in such a design direction?


> I didn't know that disjunctions can be specified inline using \(, \|
> and \), though.  Rules can be much more compact that way.

I hope that more corresponding software improvements can be achieved.


> Mixing languages like that can also be quite confusing.

I agree to this development concern.


>> Now I observe that the placement of space characters can be a coding style
>> concern at four places for adjusted lines by the generated patch.
>> Would you like to clarify remaining issues for pretty-printing
>> in such use cases?
>
> Ideally, generated code should adhere to Documentation/CodingGuidelines,
> so that it can be accepted without requiring hand-editing.

But how does the software situation look like if the original source code
would contain coding style issues?

It seems to be possible to specify SmPL code in a way so that even questionable
code layout would be preserved by an automatic transformation.

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-16 21:38     ` René Scharfe
@ 2019-11-17  8:19       ` Markus Elfring
  2019-11-17 13:40         ` René Scharfe
  0 siblings, 1 reply; 41+ messages in thread
From: Markus Elfring @ 2019-11-17  8:19 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

> Which transformation rules are questionable and why?

It was chosen to transform source code fragments (pointer expressions)
by two SmPL rules so that the search pattern “sizeof(T)” would work
in the third rule.


> Removing broken or ineffective rules would be very welcome.

I suggest to reconsider programming opportunities also by the means of
the semantic patch language.


> Specifying disjunctions inline can make rules shorter, but harder to
> understand due to mixing languages.  Perhaps this is a matter of
> getting used to it, and syntax highlighting might help a bit.

I agree to this view.


> Mixing in the unrelated xmalloc/ALLOC_ARRAY transformation does
> not make sense to me, though.

I propose to increase the sharing (or reuse) of involved metavariables.


> Matching sizeof of anything (with the x) can produce inaccurate
> transformations, as mentioned in the other reply I just sent.

Would you like to apply any further SmPL code fine-tuning?

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-17  7:56       ` Markus Elfring
@ 2019-11-17 13:40         ` René Scharfe
  2019-11-17 18:19           ` Markus Elfring
  0 siblings, 1 reply; 41+ messages in thread
From: René Scharfe @ 2019-11-17 13:40 UTC (permalink / raw)
  To: Markus Elfring; +Cc: git

Am 17.11.19 um 08:56 schrieb Markus Elfring:
>>> @@
>>> expression dst, src, n, E;
>>> @@
>>>  memcpy(dst, src, sizeof(
>>> +                        *(
>>>                            E
>>> -                           [...]
>>> +                         )
>>>                          ) * n
>>>        )
>>
>> That's longer and looks more complicated to me
>
> I point another possibility out to express a change specification
> by the means of the semantic patch language.
> How would you think about such SmPL code if the indentation
> will be reduced?

Whitespace is not what makes the above example more complicated than the
equivalent rule below; separating the pieces of simple expressions does.

>> than what we currently have:
>>   @@
>>   expression dst, src, n, E;
>>   @@
>>     memcpy(dst, src, n * sizeof(
>>   - E[...]
>>   + *(E)
>>     ))

>>> @@
>>> type T;
>>> T *ptr;
>>> T[] arr;
>>> expression E, n;
>>> @@
>>>  memcpy(
>>> (       ptr, E, sizeof(
>>> -                      *(ptr)
>>> +                      T
>>>                       ) * n
>>> |       arr, E, sizeof(
>>> -                      *(arr)
>>> +                      T
>>>                       ) * n
>>> |       E, ptr, sizeof(
>>> -                      *(ptr)
>>> +                      T
>>>                       ) * n
>>> |       E, arr, sizeof(
>>> -                      *(arr)
>>> +                      T
>>>                       ) * n
>>> )
>>>        )
>>
>> This still fails to regenerate two of the changes from 921d49be86
>> (use COPY_ARRAY for copying arrays, 2019-06-15), at least with for me
>> (and Coccinelle 1.0.4).
>
> Would you become keen to find the reasons out for unexpected data processing
> results (also by the software combination “Coccinelle 1.0.8-00004-g842075f7”)
> at this place?

It looks like a bug in Coccinelle to me and I'd like to see it fixed if
that's confirmed, of course.  And I'd like to see Debian pick up a newer
version, preferably containing that fix.  But at least until then our
semantic patches need to work around it.

> But this transformation rule can probably be omitted if the usage
> of SmPL disjunctions will be increased in a subsequent rule, can't it?

Perhaps, but I don't see how.  Do you?

>>> @@
>>> type T;
>>> T* dst_ptr, src_ptr;
>>> T[] dst_arr, src_arr;
>>> expression n, x;
>>> @@
>>> -memcpy
>>> +COPY_ARRAY
>>>        (
>>> (       dst_ptr
>>> |       dst_arr
>>> )
>>>        ,
>>> (       src_ptr
>>> |       src_arr
>>> )
>>> -      , (n) * \( sizeof(T) \| sizeof(*(x)) \)
>>> +      , n
>>>        )
>>
>> That x could be anything -- it's not tied to the element size of source
>> or destination.  Such a transformation might change the meaning of the
>> code, as COPY_ARRAY will use the element size of the destination behind
>> the scenes.  So that doesn't look safe to me.
>
> Would you like to use the SmPL code “*( \( src_ptr \| src_arr \) )” instead?

That leaves out dst_ptr and dst_arr.

And what would it mean to match e.g. this ?

	memcpy(dst_ptr, src_ptr, n * sizeof(*src_arr))

At least the element size would be the same, but I'd rather shy away from
transforming weird cases like this automatically.

>>> @@
>>> type T;
>>> T* dst, src, ptr;
>>> expression n;
>>> @@
>>> (
>>> -memmove
>>> +MOVE_ARRAY
>>>         (dst, src
>>> -                , (n) * \( sizeof(* \( dst \| src \) ) \| sizeof(T) \)
>>> +                , n
>>>         )
>>> |
>>> -ptr = xmalloc((n) * \( sizeof(*ptr) \| sizeof(T) \))
>>> +ALLOC_ARRAY(ptr, n)
>>> );
>>
>> memmove/MOVE_ARRAY and xmalloc/ALLOC_ARRAY are quite different;
>
> These functions provide another programming interface.

Huh, which one specifically?  Here are the signatures of the functions
and macros, for reference:

  void *memmove(void *dest, const void *src, size_t n);
  void *memcpy(void *dest, const void *src, size_t n);

  COPY_ARRAY(dst, src, n)
  MOVE_ARRAY(dst, src, n)

>> why would we want to jam transformations for them into the same rule
>> like this?
>
> Possible nicer run time characteristics by the Coccinelle software.

How much faster is it exactly?

Speedups are good, but I think readability of rules is more important
than coccicheck duration.

>> Handling memmove/MOVE_ARRAY and memcpy/COPY_ARRAY together would make
>> more sense, as they take the same kinds of parameters.
>
> Would you like to adjust the SmPL code in such a design direction?

I can't find any examples in our code base that would be transformed by
a generalized rule.  That reduces my own motivation to tinker with the
existing rules to close to zero.

>>> Now I observe that the placement of space characters can be a coding style
>>> concern at four places for adjusted lines by the generated patch.
>>> Would you like to clarify remaining issues for pretty-printing
>>> in such use cases?
>>
>> Ideally, generated code should adhere to Documentation/CodingGuidelines,
>> so that it can be accepted without requiring hand-editing.
>
> But how does the software situation look like if the original source code
> would contain coding style issues?

The same: Generated code should not add coding style issues.  We can
still use results that need to be polished, but that's a manual step
which reduces the benefits of automation.

> It seems to be possible to specify SmPL code in a way so that even questionable
> code layout would be preserved by an automatic transformation.

That may be acceptable.

René

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-17  8:19       ` Markus Elfring
@ 2019-11-17 13:40         ` René Scharfe
  2019-11-17 18:36           ` Markus Elfring
  2019-11-18 16:10           ` [PATCH] coccinelle: improve array.cocci Markus Elfring
  0 siblings, 2 replies; 41+ messages in thread
From: René Scharfe @ 2019-11-17 13:40 UTC (permalink / raw)
  To: Markus Elfring; +Cc: git

Am 17.11.19 um 09:19 schrieb Markus Elfring:
>> Which transformation rules are questionable and why?
>
> It was chosen to transform source code fragments (pointer expressions)
> by two SmPL rules so that the search pattern “sizeof(T)” would work
> in the third rule.

Ah, right, it would be nice to get rid of those normalization rules,
especially the second one.  I don't see how, though, without either
causing a combinatorial explosion or loosening up the matching too much.

>> Matching sizeof of anything (with the x) can produce inaccurate
>> transformations, as mentioned in the other reply I just sent.
>
> Would you like to apply any further SmPL code fine-tuning?

I guess that's a question for Junio, and his reply in
https://public-inbox.org/git/xmqqa790cyp1.fsf@gitster-ct.c.googlers.com/
seems relevant.

René

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-17 13:40         ` René Scharfe
@ 2019-11-17 18:19           ` Markus Elfring
  2019-11-19 19:14             ` René Scharfe
  0 siblings, 1 reply; 41+ messages in thread
From: Markus Elfring @ 2019-11-17 18:19 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

> Whitespace is not what makes the above example more complicated than the
> equivalent rule below;

A different code layout might help in a better understanding for such
change specifications.


> separating the pieces of simple expressions does.

Will there occasionally be a need to change only the required source code parts?


>>> than what we currently have:
>>>   @@
>>>   expression dst, src, n, E;
>>>   @@
>>>     memcpy(dst, src, n * sizeof(
>>>   - E[...]
>>>   + *(E)
>>>     ))

Are any circumstances to consider where only the essential implementation details
should be touched by an automatic software transformation?


>>>> @@
>>>> type T;
>>>> T *ptr;
>>>> T[] arr;
>>>> expression E, n;
>>>> @@
>>>>  memcpy(
>>>> (       ptr, E, sizeof(
>>>> -                      *(ptr)
>>>> +                      T
>>>>                       ) * n
>>>> |       arr, E, sizeof(
>>>> -                      *(arr)
>>>> +                      T
>>>>                       ) * n
>>>> |       E, ptr, sizeof(
>>>> -                      *(ptr)
>>>> +                      T
>>>>                       ) * n
>>>> |       E, arr, sizeof(
>>>> -                      *(arr)
>>>> +                      T
>>>>                       ) * n
>>>> )
>>>>        )
>>>
>>> This still fails to regenerate two of the changes from 921d49be86
>>> (use COPY_ARRAY for copying arrays, 2019-06-15), at least with for me
>>> (and Coccinelle 1.0.4).
>>
>> Would you become keen to find the reasons out for unexpected data processing
>> results (also by the software combination “Coccinelle 1.0.8-00004-g842075f7”)
>> at this place?
>
> It looks like a bug in Coccinelle to me

We might stumble also on just another (temporary) software limitation.


> and I'd like to see it fixed

Would you like to support corresponding development anyhow?


> if that's confirmed, of course.

I am curious if further feedback will evolve for affected software areas.


> And I'd like to see Debian pick up a newer version, preferably containing that fix.

I assume that you can wait a long time for progress in the software
distribution direction.


> But at least until then our semantic patches need to work around it.

Would another concrete fix for the currently discussed SmPL script
be better than a “workaround”?


>> But this transformation rule can probably be omitted if the usage
>> of SmPL disjunctions will be increased in a subsequent rule, can't it?
>
> Perhaps, but I don't see how.  Do you?

Obviously, yes (in principle according to my proposal from yesterday).
https://public-inbox.org/git/05ab1110-2115-7886-f890-9983caabc52c@web.de/


>> Would you like to use the SmPL code “*( \( src_ptr \| src_arr \) )” instead?
>
> That leaves out dst_ptr and dst_arr.

How many items should finally be filtered in the discussed SmPL disjunction?


> And what would it mean to match e.g. this ?
>
> 	memcpy(dst_ptr, src_ptr, n * sizeof(*src_arr))

The Coccinelle software takes care for commutativity by isomorphisms.
https://github.com/coccinelle/coccinelle/blob/19ee1697bf152d37a78a20cefe148775bf4b0e0d/standard.iso#L241


> At least the element size would be the same, but I'd rather shy away from
> transforming weird cases like this automatically.

Do you mean to specify additional restrictions by SmPL code?


>   void *memmove(void *dest, const void *src, size_t n);
>   void *memcpy(void *dest, const void *src, size_t n);
>
>   COPY_ARRAY(dst, src, n)
>   MOVE_ARRAY(dst, src, n)

Can the replacement of these functions by macro calls be combined further
by improved SmPL code?


>> Possible nicer run time characteristics by the Coccinelle software.
>
> How much faster is it exactly?

The answer will depend on efforts which you would like to invest
in corresponding (representative) measurements.


> Speedups are good, but I think readability of rules is more important
> than coccicheck duration.

I hope that a more pleasing balance can be found for the involved
usability factors.


>> But how does the software situation look like if the original source code
>> would contain coding style issues?
>
> The same: Generated code should not add coding style issues.

Such an expectation is generally nice. - But target conflicts can occur there.


> We can still use results that need to be polished, but that's a manual step
> which reduces the benefits of automation.

I am curious how the software development practice will evolve further.

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-17 13:40         ` René Scharfe
@ 2019-11-17 18:36           ` Markus Elfring
  2019-11-19 19:15             ` René Scharfe
  2019-11-18 16:10           ` [PATCH] coccinelle: improve array.cocci Markus Elfring
  1 sibling, 1 reply; 41+ messages in thread
From: Markus Elfring @ 2019-11-17 18:36 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

>> It was chosen to transform source code fragments (pointer expressions)
>> by two SmPL rules so that the search pattern “sizeof(T)” would work
>> in the third rule.
>
> Ah, right, it would be nice to get rid of those normalization rules,
> especially the second one.

Thanks for such positive feedback.


> I don't see how,

Where is your view too limited at the moment?


> though, without either causing a combinatorial explosion

Growing combinations can become more interesting, can't they?


> or loosening up the matching too much.

I hope that we can achieve another reasonably safe transformation
approach together.

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH] coccinelle: improve array.cocci
  2019-11-17 13:40         ` René Scharfe
  2019-11-17 18:36           ` Markus Elfring
@ 2019-11-18 16:10           ` Markus Elfring
  2019-11-19 19:15             ` René Scharfe
  2020-01-25  8:23             ` Markus Elfring
  1 sibling, 2 replies; 41+ messages in thread
From: Markus Elfring @ 2019-11-18 16:10 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

From: Markus Elfring <elfring@users.sourceforge.net>
Date: Mon, 18 Nov 2019 17:00:37 +0100

This script contained transformation rules for the semantic patch language
which used similar code.

1. Delete two SmPL rules which were used to transform source code fragments
   (pointer expressions) so that the search pattern “sizeof(T)” would work
   in the third rule.
   See also the topic “coccinelle: adjustments for array.cocci?”:
   https://public-inbox.org/git/f28f5fb8-2814-9df5-faf2-7146ed1a1f4d@web.de/

2. Combine the remaining rules by using six SmPL disjunctions.

3. Adjust case distinctions and corresponding metavariables so that
   the desired search for update candidates can be more complete.

4. Increase the precision for the specification of required changes.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
---
 contrib/coccinelle/array.cocci | 100 ++++++---------------------------
 1 file changed, 18 insertions(+), 82 deletions(-)

diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci
index 46b8d2ee11..bcd6ff4793 100644
--- a/contrib/coccinelle/array.cocci
+++ b/contrib/coccinelle/array.cocci
@@ -1,90 +1,26 @@
-@@
-expression dst, src, n, E;
-@@
-  memcpy(dst, src, n * sizeof(
-- E[...]
-+ *(E)
-  ))
-
 @@
 type T;
-T *ptr;
-T[] arr;
-expression E, n;
-@@
-(
-  memcpy(ptr, E,
-- n * sizeof(*(ptr))
-+ n * sizeof(T)
-  )
-|
-  memcpy(arr, E,
-- n * sizeof(*(arr))
-+ n * sizeof(T)
-  )
-|
-  memcpy(E, ptr,
-- n * sizeof(*(ptr))
-+ n * sizeof(T)
-  )
-|
-  memcpy(E, arr,
-- n * sizeof(*(arr))
-+ n * sizeof(T)
-  )
-)
-
-@@
-type T;
-T *dst_ptr;
-T *src_ptr;
-T[] dst_arr;
 T[] src_arr;
-expression n;
+expression n, dst_e, src_e;
+expression* dst_p_e, src_p_e;
 @@
 (
-- memcpy(dst_ptr, src_ptr, (n) * sizeof(T))
-+ COPY_ARRAY(dst_ptr, src_ptr, n)
-|
-- memcpy(dst_ptr, src_arr, (n) * sizeof(T))
-+ COPY_ARRAY(dst_ptr, src_arr, n)
-|
-- memcpy(dst_arr, src_ptr, (n) * sizeof(T))
-+ COPY_ARRAY(dst_arr, src_ptr, n)
-|
-- memcpy(dst_arr, src_arr, (n) * sizeof(T))
-+ COPY_ARRAY(dst_arr, src_arr, n)
-)
-
-@@
-type T;
-T *dst;
-T *src;
-expression n;
-@@
 (
-- memmove(dst, src, (n) * sizeof(*dst));
-+ MOVE_ARRAY(dst, src, n);
-|
-- memmove(dst, src, (n) * sizeof(*src));
-+ MOVE_ARRAY(dst, src, n);
+-memcpy
++COPY_ARRAY
 |
-- memmove(dst, src, (n) * sizeof(T));
-+ MOVE_ARRAY(dst, src, n);
+-memmove
++MOVE_ARRAY
+)
+       (
+        dst_e,
+        src_e
+-       , (n) * \( sizeof(T) \| sizeof( \( *(src_p_e) \| src_e[...] \| src_arr \) ) \)
++       , n
+       )
+|
++ALLOC_ARRAY(
+             dst_p_e
+-                    = xmalloc((n) * \( sizeof( \( *(src_p_e) \| src_e[...] \| src_arr \) ) \| sizeof(T) \))
++            , n)
 )
-
-@@
-type T;
-T *ptr;
-expression n;
-@@
-- ptr = xmalloc((n) * sizeof(*ptr));
-+ ALLOC_ARRAY(ptr, n);
-
-@@
-type T;
-T *ptr;
-expression n;
-@@
-- ptr = xmalloc((n) * sizeof(T));
-+ ALLOC_ARRAY(ptr, n);
--
2.24.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-17 18:19           ` Markus Elfring
@ 2019-11-19 19:14             ` René Scharfe
  2019-11-19 20:21               ` Markus Elfring
  0 siblings, 1 reply; 41+ messages in thread
From: René Scharfe @ 2019-11-19 19:14 UTC (permalink / raw)
  To: Markus Elfring; +Cc: git

Am 17.11.19 um 19:19 schrieb Markus Elfring:
>> Whitespace is not what makes the above example more complicated than the
>> equivalent rule below;
>
> A different code layout might help in a better understanding for such
> change specifications.
>
>
>> separating the pieces of simple expressions does.
>
> Will there occasionally be a need to change only the required source code parts?

Changing parts that don't need to be changed does not make sense to me.
Why do you ask and how does it relate to the example at hand?

>>>> than what we currently have:
>>>>   @@
>>>>   expression dst, src, n, E;
>>>>   @@
>>>>     memcpy(dst, src, n * sizeof(
>>>>   - E[...]
>>>>   + *(E)
>>>>     ))
>
> Are any circumstances to consider where only the essential implementation details
> should be touched by an automatic software transformation?

I don't understand this question.

>> It looks like a bug in Coccinelle to me
>
> We might stumble also on just another (temporary) software limitation.
>
>
>> and I'd like to see it fixed
>
> Would you like to support corresponding development anyhow?

I don't see me learning OCaml in the near future.  Or are you looking
for donations? :)

>> But at least until then our semantic patches need to work around it.
>
> Would another concrete fix for the currently discussed SmPL script
> be better than a “workaround”?

These are different things.  Fixes (repairs) are always welcome.  But
they should not rely on SmPL constructs that only work properly using
unreleased versions of Coccinelle.

>>> Would you like to use the SmPL code “*( \( src_ptr \| src_arr \) )” instead?
>>
>> That leaves out dst_ptr and dst_arr.
>
> How many items should finally be filtered in the discussed SmPL disjunction?

Let's see: dst and src can be pointers or array references, which makes
four combinations.  sizeof could either operate the shared type or on an
element of dst or an element of src.  An element can be accessed either
using dereference (*) or subscript ([]).  That makes five possible
variations for the sizeof, right?  So twenty combinations in total.

>> And what would it mean to match e.g. this ?
>>
>> 	memcpy(dst_ptr, src_ptr, n * sizeof(*src_arr))
>
> The Coccinelle software takes care for commutativity by isomorphisms.
> https://github.com/coccinelle/coccinelle/blob/19ee1697bf152d37a78a20cefe148775bf4b0e0d/standard.iso#L241

OK, but I had a different concern (more below).

>> At least the element size would be the same, but I'd rather shy away from
>> transforming weird cases like this automatically.
>
> Do you mean to specify additional restrictions by SmPL code?

Let's take this silly C fragment as an example:

	char *src = strdup("foo");
	size_t src_len = strlen(src);
	char *dst = malloc(src_len);
	char unrelated[17];
	memcpy(dst, src, src_len * sizeof(*unrelated));

My point is that taking the size of something that is neither source nor
destination is weird enough that it should be left alone by semantic
patches.  Matching should be precise enough to avoid false
transformations.

>>   void *memmove(void *dest, const void *src, size_t n);
>>   void *memcpy(void *dest, const void *src, size_t n);
>>
>>   COPY_ARRAY(dst, src, n)
>>   MOVE_ARRAY(dst, src, n)
>
> Can the replacement of these functions by macro calls be combined further
> by improved SmPL code?

Very likely.

>>> Possible nicer run time characteristics by the Coccinelle software.
>>
>> How much faster is it exactly?
>
> The answer will depend on efforts which you would like to invest
> in corresponding (representative) measurements.

Is that some kind of quantum effect? ;-)

When I try to convince people to apply a patch that is intended to
speed up something, I often use https://github.com/sharkdp/hyperfine
these days.

René

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-17 18:36           ` Markus Elfring
@ 2019-11-19 19:15             ` René Scharfe
  0 siblings, 0 replies; 41+ messages in thread
From: René Scharfe @ 2019-11-19 19:15 UTC (permalink / raw)
  To: Markus Elfring; +Cc: git

Am 17.11.19 um 19:36 schrieb Markus Elfring:
>>> It was chosen to transform source code fragments (pointer expressions)
>>> by two SmPL rules so that the search pattern “sizeof(T)” would work
>>> in the third rule.
>>
>> Ah, right, it would be nice to get rid of those normalization rules,
>> especially the second one.
>
> Thanks for such positive feedback.
>
>
>> I don't see how,
>
> Where is your view too limited at the moment?

I don't know.  Or perhaps it's rather too wide and I worry about
irrelevant details?

>> though, without either causing a combinatorial explosion
>
> Growing combinations can become more interesting, can't they?

Perhaps, but not if they blow up the size of the semantic patch or
the runtime of Coccinelle.

René

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] coccinelle: improve array.cocci
  2019-11-18 16:10           ` [PATCH] coccinelle: improve array.cocci Markus Elfring
@ 2019-11-19 19:15             ` René Scharfe
  2019-11-20  9:01               ` Markus Elfring
  2019-11-22  5:54               ` [PATCH] " Junio C Hamano
  2020-01-25  8:23             ` Markus Elfring
  1 sibling, 2 replies; 41+ messages in thread
From: René Scharfe @ 2019-11-19 19:15 UTC (permalink / raw)
  To: Markus Elfring, git


Am 18.11.19 um 17:10 schrieb Markus Elfring:
> From: Markus Elfring <elfring@users.sourceforge.net>
> Date: Mon, 18 Nov 2019 17:00:37 +0100
>
> This script contained transformation rules for the semantic patch language
> which used similar code.
>
> 1. Delete two SmPL rules which were used to transform source code fragments
>    (pointer expressions) so that the search pattern “sizeof(T)” would work
>    in the third rule.
>    See also the topic “coccinelle: adjustments for array.cocci?”:
>    https://public-inbox.org/git/f28f5fb8-2814-9df5-faf2-7146ed1a1f4d@web.de/
>
> 2. Combine the remaining rules by using six SmPL disjunctions.
>
> 3. Adjust case distinctions and corresponding metavariables so that
>    the desired search for update candidates can be more complete.
>
> 4. Increase the precision for the specification of required changes.
>
> Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
> ---
>  contrib/coccinelle/array.cocci | 100 ++++++---------------------------
>  1 file changed, 18 insertions(+), 82 deletions(-)

The diff is hard to read, so here's the resulting semantic patch:

-- start --
@@
type T;
T[] src_arr;
expression n, dst_e, src_e;
expression* dst_p_e, src_p_e;
@@
(
(
-memcpy
+COPY_ARRAY
|
-memmove
+MOVE_ARRAY
)
       (
        dst_e,
        src_e
-       , (n) * \( sizeof(T) \| sizeof( \( *(src_p_e) \| src_e[...] \| src_arr \) ) \)
+       , n
       )
|
+ALLOC_ARRAY(
             dst_p_e
-                    = xmalloc((n) * \( sizeof( \( *(src_p_e) \| src_e[...] \| src_arr \) ) \| sizeof(T) \))
+            , n)
)
-- end --

I like that COPY_ARRAY and MOVE_ARRAY are handled in the same rule,
as they share the same parameters and do the same -- except that
the latter handles overlaps, while the former may be a bit faster.

And I like that it's short.

I don't like that ALLOC_ARRAY is handled in the same rule, as it is
quite different from the other two macros.

Coccinelle needs significantly longer to apply the new version.
Here are times for master:

Benchmark #1: make contrib/coccinelle/array.cocci.patch
  Time (mean ± σ):     19.314 s ±  0.200 s    [User: 19.065 s, System: 0.224 s]
  Range (min … max):   19.009 s … 19.718 s    10 runs

... and here with the patch applied:

Benchmark #1: make contrib/coccinelle/array.cocci.patch
  Time (mean ± σ):     43.420 s ±  0.490 s    [User: 43.087 s, System: 0.273 s]
  Range (min … max):   42.636 s … 44.359 s    10 runs

The current version checks if source and destination are of the same type,
and whether the sizeof operand is either said type or an element of source
or destination.  The new one does not.  So I don't see claim 4 ("Increase
the precision") fulfilled, quite the opposite rather.  It can produce e.g.
a transformation like this:

 void f(int *dst, char *src, size_t n)
 {
-	memcpy(dst, src, n * sizeof(short));
+	COPY_ARRAY(dst, src, n);
 }

The COPY_ARRAY there effectively expands to:

	memcpy(dst, src, n * sizeof(*dst));

... which is quite different -- if short is 2 bytes wide and int 4 bytes
then we copy twice as many bytes as before.

I think an automatic transformation should only be generated if it is
safe.  It's hard to spot a weird case in a generated patch amid ten
well-behaving ones.

>
> diff --git a/contrib/coccinelle/array.cocci b/contrib/coccinelle/array.cocci
> index 46b8d2ee11..bcd6ff4793 100644
> --- a/contrib/coccinelle/array.cocci
> +++ b/contrib/coccinelle/array.cocci
> @@ -1,90 +1,26 @@
> -@@
> -expression dst, src, n, E;
> -@@
> -  memcpy(dst, src, n * sizeof(
> -- E[...]
> -+ *(E)
> -  ))
> -
>  @@
>  type T;
> -T *ptr;
> -T[] arr;
> -expression E, n;
> -@@
> -(
> -  memcpy(ptr, E,
> -- n * sizeof(*(ptr))
> -+ n * sizeof(T)
> -  )
> -|
> -  memcpy(arr, E,
> -- n * sizeof(*(arr))
> -+ n * sizeof(T)
> -  )
> -|
> -  memcpy(E, ptr,
> -- n * sizeof(*(ptr))
> -+ n * sizeof(T)
> -  )
> -|
> -  memcpy(E, arr,
> -- n * sizeof(*(arr))
> -+ n * sizeof(T)
> -  )
> -)
> -
> -@@
> -type T;
> -T *dst_ptr;
> -T *src_ptr;
> -T[] dst_arr;
>  T[] src_arr;
> -expression n;
> +expression n, dst_e, src_e;
> +expression* dst_p_e, src_p_e;
>  @@
>  (
> -- memcpy(dst_ptr, src_ptr, (n) * sizeof(T))
> -+ COPY_ARRAY(dst_ptr, src_ptr, n)
> -|
> -- memcpy(dst_ptr, src_arr, (n) * sizeof(T))
> -+ COPY_ARRAY(dst_ptr, src_arr, n)
> -|
> -- memcpy(dst_arr, src_ptr, (n) * sizeof(T))
> -+ COPY_ARRAY(dst_arr, src_ptr, n)
> -|
> -- memcpy(dst_arr, src_arr, (n) * sizeof(T))
> -+ COPY_ARRAY(dst_arr, src_arr, n)
> -)
> -
> -@@
> -type T;
> -T *dst;
> -T *src;
> -expression n;
> -@@
>  (
> -- memmove(dst, src, (n) * sizeof(*dst));
> -+ MOVE_ARRAY(dst, src, n);
> -|
> -- memmove(dst, src, (n) * sizeof(*src));
> -+ MOVE_ARRAY(dst, src, n);
> +-memcpy
> ++COPY_ARRAY
>  |
> -- memmove(dst, src, (n) * sizeof(T));
> -+ MOVE_ARRAY(dst, src, n);
> +-memmove
> ++MOVE_ARRAY
> +)
> +       (
> +        dst_e,
> +        src_e
> +-       , (n) * \( sizeof(T) \| sizeof( \( *(src_p_e) \| src_e[...] \| src_arr \) ) \)
> ++       , n
> +       )
> +|
> ++ALLOC_ARRAY(
> +             dst_p_e
> +-                    = xmalloc((n) * \( sizeof( \( *(src_p_e) \| src_e[...] \| src_arr \) ) \| sizeof(T) \))
> ++            , n)
>  )
> -
> -@@
> -type T;
> -T *ptr;
> -expression n;
> -@@
> -- ptr = xmalloc((n) * sizeof(*ptr));
> -+ ALLOC_ARRAY(ptr, n);
> -
> -@@
> -type T;
> -T *ptr;
> -expression n;
> -@@
> -- ptr = xmalloc((n) * sizeof(T));
> -+ ALLOC_ARRAY(ptr, n);
> --
> 2.24.0
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-19 19:14             ` René Scharfe
@ 2019-11-19 20:21               ` Markus Elfring
  2019-11-21 19:01                 ` René Scharfe
  0 siblings, 1 reply; 41+ messages in thread
From: Markus Elfring @ 2019-11-19 20:21 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

>> Will there occasionally be a need to change only the required source code parts?
>
> Changing parts that don't need to be changed does not make sense to me.
> Why do you ask and how does it relate to the example at hand?

How does such feedback fit to the discussed SmPL change specification?

- E[...]
+ *(E)


>> Would you like to support corresponding development anyhow?
>
> I don't see me learning OCaml in the near future.

This can be fine.


> Or are you looking for donations?

Other (software) projects can benefit also from additional resources,
can't they?

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: improve array.cocci
  2019-11-19 19:15             ` René Scharfe
@ 2019-11-20  9:01               ` Markus Elfring
  2019-11-21 19:02                 ` René Scharfe
  2019-11-22  5:54               ` [PATCH] " Junio C Hamano
  1 sibling, 1 reply; 41+ messages in thread
From: Markus Elfring @ 2019-11-20  9:01 UTC (permalink / raw)
  To: René Scharfe, git

> I like that COPY_ARRAY and MOVE_ARRAY are handled in the same rule,
> as they share the same parameters and do the same -- except that
> the latter handles overlaps, while the former may be a bit faster.
>
> And I like that it's short.

Thanks for such positive feedback after our growing discussion.


> I don't like that ALLOC_ARRAY is handled in the same rule, as it is
> quite different from the other two macros.

This case distinction can share a few metavariables with the other
transformation approach, can't it?


> Coccinelle needs significantly longer to apply the new version.

This can happen because of a more complete source code search pattern,
can't it?

The data processing can benefit from parallelisation (if desired.)
https://github.com/coccinelle/coccinelle/blob/66a1118e04a6aaf1acdae89623313c8e05158a8d/docs/manual/spatch_options.tex#L745


> Here are times for master:

The SmPL script execution times can be analysed also directly with
the help of the Coccinelle software by profiling functionality.
https://github.com/coccinelle/coccinelle/blob/66a1118e04a6aaf1acdae89623313c8e05158a8d/docs/manual/spatch_options.tex#L736


> ... and here with the patch applied:
>
> Benchmark #1: make contrib/coccinelle/array.cocci.patch
>   Time (mean ± σ):     43.420 s ±  0.490 s    [User: 43.087 s, System: 0.273 s]

I got an other distribution of run times on my test system.


> The current version checks if source and destination are of the same type,
> and whether the sizeof operand is either said type or an element of source
> or destination.

The specification of metavariables for pointer types has got some consequences.


> The new one does not.

I suggest to use a search for (pointer) expressions instead.
This approach can trigger other consequences then.


> So I don't see claim 4 ("Increase the precision") fulfilled,

I tried to express an adjustment on the change granularity by the plus
and minus characters at the beginning of the lines in the semantic patch.

The SmPL disjunctions provide also more common functionality now.


> quite the opposite rather.

The search for compatible pointers can become even more challenging.


> I think an automatic transformation should only be generated if it is safe.

Different expectations can occur around safety and change convenience.

Would you eventually work with SmPL script variants in parallel according
to different confidence settings?


> It's hard to spot a weird case in a generated patch amid ten
> well-behaving ones.

I can follow also this development concern to some degree.

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: adjustments for array.cocci?
  2019-11-19 20:21               ` Markus Elfring
@ 2019-11-21 19:01                 ` René Scharfe
  0 siblings, 0 replies; 41+ messages in thread
From: René Scharfe @ 2019-11-21 19:01 UTC (permalink / raw)
  To: Markus Elfring; +Cc: git

Am 19.11.19 um 21:21 schrieb Markus Elfring:
>>> Will there occasionally be a need to change only the required source code parts?
>>
>> Changing parts that don't need to be changed does not make sense to me.
>> Why do you ask and how does it relate to the example at hand?
>
> How does such feedback fit to the discussed SmPL change specification?
>
> - E[...]
> + *(E)

The cited fragment is from a rule that normalizes references to array
elements which are fed to sizeof.  It reduces the number of combinations
to consider in the rules following it, but it's not in itself a change
we'd want to apply.

The next helper rule turns sizeof operating on array elements to sizeof
on specific types.  That one is much uglier, as it removes the
information from whence the inferred type came.

In practice none of these helpers transformed any code that wasn't
matched by the final rule for using COPY_ARRAY.  It would be nice to get
rid of them nevertheless, to rule out such side-effects.  I just don't
see a practical way to make do without them, though.

René

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: improve array.cocci
  2019-11-20  9:01               ` Markus Elfring
@ 2019-11-21 19:02                 ` René Scharfe
  2019-11-21 19:44                   ` Markus Elfring
  0 siblings, 1 reply; 41+ messages in thread
From: René Scharfe @ 2019-11-21 19:02 UTC (permalink / raw)
  To: Markus Elfring, git

Am 20.11.19 um 10:01 schrieb Markus Elfring:
>> I don't like that ALLOC_ARRAY is handled in the same rule, as it is
>> quite different from the other two macros.
>
> This case distinction can share a few metavariables with the other
> transformation approach, can't it?

Can it can, but should it?  In my opinion it should not; separate
concerns should get their own rules.  That's easier to manage for
developers.  I suspect it's also easier for Coccinelle to evaluate,
but didn't check.

>> Coccinelle needs significantly longer to apply the new version.
>
> This can happen because of a more complete source code search pattern,
> can't it?

Perhaps.

> The data processing can benefit from parallelisation (if desired.)
> https://github.com/coccinelle/coccinelle/blob/66a1118e04a6aaf1acdae89623313c8e05158a8d/docs/manual/spatch_options.tex#L745

Right.  I use MAKEFLAGS += -j6, which runs six spatch instances in
parallel for the coccicheck make target of Git instead.

>> Here are times for master:
>
> The SmPL script execution times can be analysed also directly with
> the help of the Coccinelle software by profiling functionality.
> https://github.com/coccinelle/coccinelle/blob/66a1118e04a6aaf1acdae89623313c8e05158a8d/docs/manual/spatch_options.tex#L736

OK, so --profile allows to analyze in which of its parts Coccinelle
spends the extra time.

>> The current version checks if source and destination are of the same type,
>> and whether the sizeof operand is either said type or an element of source
>> or destination.
>
> The specification of metavariables for pointer types has got some consequences.
>
>
>> The new one does not.
>
> I suggest to use a search for (pointer) expressions instead.
> This approach can trigger other consequences then.

Why don't we need to check the type?

>> So I don't see claim 4 ("Increase the precision") fulfilled,
>
> I tried to express an adjustment on the change granularity by the plus
> and minus characters at the beginning of the lines in the semantic patch.

Hmm, to me "precision" means to transform exactly those cases that are
intended to be transformed, i.e. to avoid false positives and negatives.
What you seem to mean here I'd rather describe as "reduce duplication".

> The SmPL disjunctions provide also more common functionality now.
>
>
>> quite the opposite rather.
>
> The search for compatible pointers can become even more challenging.

It's what we currently have, in an a clunky way.

>> I think an automatic transformation should only be generated if it is safe.
>
> Different expectations can occur around safety and change convenience.
>
> Would you eventually work with SmPL script variants in parallel according
> to different confidence settings?

Me?  No.  If I can't trust automatic transformations then I don't want
them.  I can already generate bugs fast enough manually, thank you
very much. :)

René

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: improve array.cocci
  2019-11-21 19:02                 ` René Scharfe
@ 2019-11-21 19:44                   ` Markus Elfring
  2019-11-22 15:29                     ` SZEDER Gábor
  0 siblings, 1 reply; 41+ messages in thread
From: Markus Elfring @ 2019-11-21 19:44 UTC (permalink / raw)
  To: René Scharfe, git

>> This case distinction can share a few metavariables with the other
>> transformation approach, can't it?
>
> Can it can, but should it?  In my opinion it should not;

I presented a software design in an other direction.
Some data processing approaches can benefit from sharing common information.


> separate concerns should get their own rules.

Strict separation triggers corresponding consequences.


> That's easier to manage for developers.

This view can be reasonable.


> I suspect it's also easier for Coccinelle to evaluate, but didn't check.

I find such an assumption questionable.


> I use MAKEFLAGS += -j6, which runs six spatch instances in
> parallel for the coccicheck make target of Git instead.

The program “spatch” supports parallelisation also directly by the parameter “--jobs”.
Did you try it out occasionally?


> OK, so --profile allows to analyze in which of its parts Coccinelle
> spends the extra time.

Some information about time distribution will be displayed.


>> I suggest to use a search for (pointer) expressions instead.
>> This approach can trigger other consequences then.
>
> Why don't we need to check the type?

I got the impression that we stumble on a general challenge for generic
source code searches.
How many efforts would we like to invest in solving type safety issues?


>> Would you eventually work with SmPL script variants in parallel according
>> to different confidence settings?
>
> Me?  No.

Such a view can be fine.

But I am also still trying to improve various implementation details
despite of known software limitations.


> If I can't trust automatic transformations then I don't want them.

I need to live with compromises together also with current development tools.


> I can already generate bugs fast enough manually, thank you very much. :)

This is usual.

I hope that specific tools can make our lives occasionally a bit easier.

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH] coccinelle: improve array.cocci
  2019-11-19 19:15             ` René Scharfe
  2019-11-20  9:01               ` Markus Elfring
@ 2019-11-22  5:54               ` Junio C Hamano
  2019-11-22  7:34                 ` Markus Elfring
  1 sibling, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-11-22  5:54 UTC (permalink / raw)
  To: René Scharfe; +Cc: Markus Elfring, git

René Scharfe <l.s.r@web.de> writes:

> The current version checks if source and destination are of the same type,
> and whether the sizeof operand is either said type or an element of source
> or destination.  The new one does not.  So I don't see claim 4 ("Increase
> the precision") fulfilled, quite the opposite rather.  It can produce e.g.
> a transformation like this:
>
>  void f(int *dst, char *src, size_t n)
>  {
> -	memcpy(dst, src, n * sizeof(short));
> +	COPY_ARRAY(dst, src, n);
>  }
>
> The COPY_ARRAY there effectively expands to:
>
> 	memcpy(dst, src, n * sizeof(*dst));
>
> ... which is quite different -- if short is 2 bytes wide and int 4 bytes
> then we copy twice as many bytes as before.
>
> I think an automatic transformation should only be generated if it is
> safe.  It's hard to spot a weird case in a generated patch amid ten
> well-behaving ones.

Nicely said; I agree 100% with you that the priority of this project
is to use these *.cocci transformations in such a way that they are
absolutely safe---so that humans do not have to spend time sifting
the result through to find accidental bad transformations.

And thanks for taking time to very clearly explain why the proposed
rewrite is not something we want to take.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: improve array.cocci
  2019-11-22  5:54               ` [PATCH] " Junio C Hamano
@ 2019-11-22  7:34                 ` Markus Elfring
  0 siblings, 0 replies; 41+ messages in thread
From: Markus Elfring @ 2019-11-22  7:34 UTC (permalink / raw)
  To: Junio C Hamano, René Scharfe; +Cc: git

> Nicely said; I agree 100% with you that the priority of this project
> is to use these *.cocci transformations in such a way that they are
> absolutely safe

Such a goal can be generally desirable.

But I got the impression that there are target conflicts to consider
for the currently discussed SmPL script.
The available transformation approaches show different open issues,
don't they?

Your desire is easier to fulfil for other change patterns.


> ---so that humans do not have to spend time sifting the result through
> to find accidental bad transformations.

Automatic source code analysis contains the usual risk for false positives.
How many efforts would we like to invest in improving corresponding
software solutions?


> And thanks for taking time to very clearly explain why the proposed
> rewrite is not something we want to take.

Would you like to check once more if additional update candidates
will be found in the source files with the presented SmPL script variant?

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: improve array.cocci
  2019-11-21 19:44                   ` Markus Elfring
@ 2019-11-22 15:29                     ` SZEDER Gábor
  2019-11-22 16:17                       ` Markus Elfring
  0 siblings, 1 reply; 41+ messages in thread
From: SZEDER Gábor @ 2019-11-22 15:29 UTC (permalink / raw)
  To: Markus Elfring; +Cc: René Scharfe, git, Jeff King

On Thu, Nov 21, 2019 at 08:44:12PM +0100, Markus Elfring wrote:
> The program “spatch” supports parallelisation also directly by the parameter “--jobs”.
> Did you try it out occasionally?

I did try --jobs on a couple of occasions, and the results always
varied between broken, not working, or downright making things even
slower.


  $ spatch --version
  spatch version 1.0.4 with Python support and with PCRE support
  $ spatch --sp-file contrib/coccinelle/array.cocci --all-includes --patch . --jobs 2 alias.c alloc.c
  init_defs_builtins: /usr/lib/coccinelle/standard.h
  HANDLING: alias.c alloc.c
  Fatal error: exception Sys_error("array: No such file or directory")

This issue seems to be fixed in later versions, but this is the
version what many distros still ship and what is used in our CI
builds, so we do care about 1.0.4.


  $ spatch --version
  spatch version 1.0.8 compiled with OCaml version 4.05.0
  Flags passed to the configure script: [none]
  OCaml scripting support: yes
  Python scripting support: yes
  Syntax of regular expressions: PCRE
  $ /usr/bin/time --format='%e | %M' make contrib/coccinelle/array.cocci.patch
      SPATCH contrib/coccinelle/array.cocci
  102.06 | 129084

Our Makefile recipes run Coccinelle in a sequential loop, one 'spatch'
invocation for each source file by default.  Therefore, merely passing
in '--jobs <N>' doesn't bring any runtime benefits:

  $ /usr/bin/time --format='%e | %M' make SPATCH_FLAGS='--all-includes --patch . --jobs 8' contrib/coccinelle/array.cocci.patch
      SPATCH contrib/coccinelle/array.cocci
  105.31 | 118512

Some time ago we found that invoking 'spatch' with multiple files at
once does bring notable speedup (with 1.0.4), although at the cost of
drastically increased memory footprint, see commit 960154b9c1
(coccicheck: optionally batch spatch invocations, 2019-05-06).  Alas,
trying to use that in the hope that 'spatch' can do more in parallel
if it has more files to process at once doesn't bring any runtime
benefits, either:

  $ /usr/bin/time --format='%e | %M' make SPATCH_FLAGS='--all-includes --patch . --jobs 8' SPATCH_BATCH_SIZE=8 contrib/coccinelle/array.cocci.patch
      SPATCH contrib/coccinelle/array.cocci
  116.27 | 349964

And by further increasing the batch size it just gets notably slower;
also note the order of magnitude higher max memory usage:

  $ /usr/bin/time --format='%e | %M' make SPATCH_FLAGS='--all-includes --patch . --jobs 8' SPATCH_BATCH_SIZE=32 contrib/coccinelle/array.cocci.patch
      SPATCH contrib/coccinelle/array.cocci
  197.70 | 1205784

It appears that batching 'spatch' invocations with 1.0.8 does not
bring the same benefits as with 1.0.4, but brings slowdowns instead...

Anyway, looking at 'ps u -L' output it appears that 'spatch' doesn't
really do any parallel work, and there are only two 'spatch' processes
and no threads despite '--jobs 8':

  szeder    2561  0.4  0.5  36944 21520 pts/0    S+   15:31   0:00 spatch
  szeder    2567 97.1 30.5 1228372 1205332 pts/0 R+   15:31   0:29 spatch


Note that 1.0.8 above was run in a Docker container, while 1.0.4 on
the host.  This may or may not have influenced the runtimes reported
above.  FWIW, 'make -j4 coccicheck' parallelizes just fine even in the
container and with 1.0.8.


A different approach relying on 'make -j' to parallelize 'spatch'
invocations was discussed here:

  https://public-inbox.org/git/20180802115522.16107-1-szeder.dev@gmail.com/T/#u


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: improve array.cocci
  2019-11-22 15:29                     ` SZEDER Gábor
@ 2019-11-22 16:17                       ` Markus Elfring
  0 siblings, 0 replies; 41+ messages in thread
From: Markus Elfring @ 2019-11-22 16:17 UTC (permalink / raw)
  To: Gábor Szeder; +Cc: René Scharfe, Jeff King, git

> I did try --jobs on a couple of occasions,

Thanks for your feedback.


> and the results always varied between broken, not working,

The parallelisation support by the Coccinelle software was questionable
for a while.


> or downright making things even slower.

I wonder about this information.


…
>   spatch version 1.0.4 with Python support and with PCRE support
>   Fatal error: exception Sys_error("array: No such file or directory")

Another bit of background information:
* https://lore.kernel.org/cocci/99082e9d-8047-eee3-68dd-9849868d4a96@users.sourceforge.net/
  https://systeme.lip6.fr/pipermail/cocci/2016-August/003546.html

* https://lore.kernel.org/cocci/alpine.DEB.2.10.1506171707190.2578@hadrien/
  https://systeme.lip6.fr/pipermail/cocci/2015-June/002141.html> Therefore, merely passing in '--jobs <N>' doesn't bring any runtime benefits:

This program parameter should be used for other command variants.

…
> It appears that batching 'spatch' invocations with 1.0.8 does not
> bring the same benefits as with 1.0.4, but brings slowdowns instead...

Would you like to share your experiences also on the Coccinelle mailing list?

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: coccinelle: improve array.cocci
  2019-11-18 16:10           ` [PATCH] coccinelle: improve array.cocci Markus Elfring
  2019-11-19 19:15             ` René Scharfe
@ 2020-01-25  8:23             ` Markus Elfring
  1 sibling, 0 replies; 41+ messages in thread
From: Markus Elfring @ 2020-01-25  8:23 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Brian M. Carlson

> This script contained transformation rules for the semantic patch language
> which used similar code.

How do you think about to adjust any implementation details according to
the shown proposal?

Would you like to continue corresponding code review?

Regards,
Markus

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2020-01-25  8:23 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-12 15:08 coccinelle: adjustments for array.cocci? Markus Elfring
2019-11-12 18:37 ` René Scharfe
2019-11-13  2:11   ` Junio C Hamano
2019-11-13  8:49     ` Markus Elfring
2019-11-14  2:03       ` Junio C Hamano
2019-11-14 13:15         ` Markus Elfring
2019-11-14 16:41           ` René Scharfe
2019-11-14 17:14             ` Markus Elfring
2019-11-14 17:46               ` René Scharfe
2019-11-15 11:11                 ` git-coccinelle: " Markus Elfring
2019-11-15 14:20                   ` Markus Elfring
2019-11-15 18:50                   ` Markus Elfring
2019-11-16  1:00                     ` [Cocci] " Julia Lawall
2019-11-16  6:57                       ` Markus Elfring
2019-11-16  8:29                       ` Markus Elfring
2019-11-16 17:57                   ` Julia Lawall
2019-11-16 18:29                     ` Markus Elfring
2019-11-15 20:37   ` coccinelle: " Markus Elfring
2019-11-16 21:13     ` René Scharfe
2019-11-17  7:56       ` Markus Elfring
2019-11-17 13:40         ` René Scharfe
2019-11-17 18:19           ` Markus Elfring
2019-11-19 19:14             ` René Scharfe
2019-11-19 20:21               ` Markus Elfring
2019-11-21 19:01                 ` René Scharfe
2019-11-16 16:33   ` Markus Elfring
2019-11-16 21:38     ` René Scharfe
2019-11-17  8:19       ` Markus Elfring
2019-11-17 13:40         ` René Scharfe
2019-11-17 18:36           ` Markus Elfring
2019-11-19 19:15             ` René Scharfe
2019-11-18 16:10           ` [PATCH] coccinelle: improve array.cocci Markus Elfring
2019-11-19 19:15             ` René Scharfe
2019-11-20  9:01               ` Markus Elfring
2019-11-21 19:02                 ` René Scharfe
2019-11-21 19:44                   ` Markus Elfring
2019-11-22 15:29                     ` SZEDER Gábor
2019-11-22 16:17                       ` Markus Elfring
2019-11-22  5:54               ` [PATCH] " Junio C Hamano
2019-11-22  7:34                 ` Markus Elfring
2020-01-25  8:23             ` Markus Elfring

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).