* [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows.
@ 2022-03-19 19:10 Eric Wheeler
  2022-03-19 19:39 ` Julia Lawall
  2022-04-03 15:58 ` [cocci] using gcc & clang -MF to reduce spatch work (was: Using `parallel` [...]) Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 21+ messages in thread
From: Eric Wheeler @ 2022-03-19 19:10 UTC (permalink / raw)
  To: cocci

Hi All,

Just a quick tip for others (like me) who are new to Coccinelle:

You might already use this, but if not, here's a hint for doing lots of 
replacements in parallel:

	parallel -j24 spatch --sp-file smpl.cocci {} --in-place ::: *.c

If the SmPL depends on interactions between files then `parallel` won't 
work, but if the changes work per-file then it runs much faster with lots 
of big .c files.

In cases where you do need to run over lots of files and they _do_ 
interact, you might get a Stack Overflow error.  In this case set 
something like this for 1GB of stack space:
	ulimit -s $((1024*1024))

In my case spatch needed 1GB of stack and 2.4GB of RAM.  It took a few 
minutes and finished thousands of replacements!
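
As a small sketch of the arithmetic behind that limit: this assumes a
bash-like shell on Linux, where `ulimit -s` takes its argument in KiB
(other shells may use different units). The variable name `stack_kib` is
only for illustration.

```shell
# ulimit -s is given in KiB here, so 1 GiB = 1024 * 1024 KiB.
stack_kib=$((1024 * 1024))
echo "requested stack: ${stack_kib} KiB"

# Raise the soft limit inside a subshell only, so the calling shell is
# left unchanged. This can fail when the hard limit is lower.
(
  ulimit -s "$stack_kib" 2>/dev/null \
    && echo "soft stack limit now: $(ulimit -s) KiB" \
    || echo "could not raise the stack limit (hard limit: $(ulimit -H -s))"
)
```

Running spatch from the same (sub)shell after the `ulimit` call is what
makes the larger stack apply to it.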

Julia, you might add this to documentation if you think it would be 
useful.

--
Eric Wheeler

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows.
  2022-03-19 19:10 [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows Eric Wheeler
@ 2022-03-19 19:39 ` Julia Lawall
  2022-03-19 22:38   ` Eric Wheeler
  2022-04-03 15:58 ` [cocci] using gcc & clang -MF to reduce spatch work (was: Using `parallel` [...]) Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 21+ messages in thread
From: Julia Lawall @ 2022-03-19 19:39 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: cocci



On Sat, 19 Mar 2022, Eric Wheeler wrote:

> Hi All,
>
> Just a quick tip for others (like me) who are new to Coccinelle:
>
> You might already use this, but if not, here's a hint for doing lots of
> replacements in parallel:
>
> 	parallel -j24 spatch --sp-file smpl.cocci {} --in-place ::: *.c
>
> If the SmPL depends on interactions between files then `parallel` won't
> work, but if the changes work per-file then it runs much faster with lots
> of big .c files.
>
> In cases where you do need to run over lots of files and they _do_
> interact, you might get a Stack Overflow error.  In this case set
> something like this for 1GB of stack space:
> 	ulimit -s $((1024*1024))
>
> In my case spatch needed 1GB of stack and 2.4GB of RAM.  It took a few
> minutes and finished thousands of replacements!
>
> Julia, you might add this to documentation if you think it would be
> useful.

I'm not sure what you are trying to do.  You can give spatch the name of a
directory and the argument -j N for a number of cores N, and it will run
in parallel on the files in the directory.

julia


* Re: [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows.
  2022-03-19 19:39 ` Julia Lawall
@ 2022-03-19 22:38   ` Eric Wheeler
  2022-03-20  6:33     ` Julia Lawall
  2022-03-20  6:47     ` [cocci] Parallel data processing for selected SmPL scripts Markus Elfring
  0 siblings, 2 replies; 21+ messages in thread
From: Eric Wheeler @ 2022-03-19 22:38 UTC (permalink / raw)
  To: Julia Lawall; +Cc: cocci

On Sat, 19 Mar 2022, Julia Lawall wrote:
> On Sat, 19 Mar 2022, Eric Wheeler wrote:
> 
> > Hi All,
> >
> > Just a quick tip for others (like me) who are new to Coccinelle:
> >
> > You might already use this, but if not, here's a hint for doing lots of
> > replacements in parallel:
> >
> > 	parallel -j24 spatch --sp-file smpl.cocci {} --in-place ::: *.c
> >
> > If the SmPL depends on interactions between files then `parallel` won't
> > work, but if the changes work per-file then it runs much faster with lots
> > of big .c files.
> >
> > In cases where you do need to run over lots of files and they _do_
> > interact, you might get a Stack Overflow error.  In this case set
> > something like this for 1GB of stack space:
> > 	ulimit -s $((1024*1024))
> >
> > In my case spatch needed 1GB of stack and 2.4GB of RAM.  It took a few
> > minutes and finished thousands of replacements!
> >
> > Julia, you might add this to documentation if you think it would be
> > useful.
> 
> I'm not sure what you are trying to do.  You can give spatch the name of a
> directory and the argument -j N for a number of cores N, and it will run
> in parallel on the files in the directory.

Oh really? Well, nevermind what I said then.

However, that could be a documentation bug:

I don't see -j in the `spatch --help` output.

--
Eric Wheeler


> 
> julia
> 


* Re: [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows.
  2022-03-19 22:38   ` Eric Wheeler
@ 2022-03-20  6:33     ` Julia Lawall
  2022-03-20 21:18       ` Eric Wheeler
  2022-04-01 20:20       ` Eric Wheeler
  2022-03-20  6:47     ` [cocci] Parallel data processing for selected SmPL scripts Markus Elfring
  1 sibling, 2 replies; 21+ messages in thread
From: Julia Lawall @ 2022-03-20  6:33 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: cocci



On Sat, 19 Mar 2022, Eric Wheeler wrote:

> On Sat, 19 Mar 2022, Julia Lawall wrote:
> > On Sat, 19 Mar 2022, Eric Wheeler wrote:
> >
> > > Hi All,
> > >
> > > Just a quick tip for others (like me) who are new to Coccinelle:
> > >
> > > You might already use this, but if not, here's a hint for doing lots of
> > > replacements in parallel:
> > >
> > > 	parallel -j24 spatch --sp-file smpl.cocci {} --in-place ::: *.c
> > >
> > > If the SmPL depends on interactions between files then `parallel` won't
> > > work, but if the changes work per-file then it runs much faster with lots
> > > of big .c files.
> > >
> > > In cases where you do need to run over lots of files and they _do_
> > > interact, you might get a Stack Overflow error.  In this case set
> > > something like this for 1GB of stack space:
> > > 	ulimit -s $((1024*1024))
> > >
> > > In my case spatch needed 1GB of stack and 2.4GB of RAM.  It took a few
> > > minutes and finished thousands of replacements!
> > >
> > > Julia, you might add this to documentation if you think it would be
> > > useful.
> >
> > I'm not sure what you are trying to do.  You can give spatch the name of a
> > directory and the argument -j N for a number of cores N, and it will run
> > in parallel on the files in the directory.
>
> Oh really? Well, nevermind what I said then.
>
> However, that could be a documentation bug:
>
> I don't see -j in the `spatch --help` output.

It's there, but there are many other options, so you may have missed it:

  --jobs          the number of processes to be used
  -j              the number of processes to be used

julia


* Re: [cocci] Parallel data processing for selected SmPL scripts
  2022-03-19 22:38   ` Eric Wheeler
  2022-03-20  6:33     ` Julia Lawall
@ 2022-03-20  6:47     ` Markus Elfring
  1 sibling, 0 replies; 21+ messages in thread
From: Markus Elfring @ 2022-03-20  6:47 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: cocci


> I don't see -j in the `spatch --help` output.


Did you overlook the following information?

* concurrency

* --jobs          the number of processes to be used


Would you like to achieve any improvements for the affected software documentation?
https://gitlab.inria.fr/coccinelle/coccinelle/-/blob/20fdb67f4b20a242f222337e13091115884cf6bb/docs/spatch.1.in#L453
https://github.com/coccinelle/coccinelle/blob/7e6bf1fb99c80521b83740e81c95dc417ae6a621/docs/spatch.1.in#L453

Regards,
Markus



* Re: [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows.
  2022-03-20  6:33     ` Julia Lawall
@ 2022-03-20 21:18       ` Eric Wheeler
  2022-03-20 21:25         ` Julia Lawall
  2022-04-01 20:20       ` Eric Wheeler
  1 sibling, 1 reply; 21+ messages in thread
From: Eric Wheeler @ 2022-03-20 21:18 UTC (permalink / raw)
  To: Julia Lawall; +Cc: cocci

On Sun, 20 Mar 2022, Julia Lawall wrote:
> On Sat, 19 Mar 2022, Eric Wheeler wrote:
> > On Sat, 19 Mar 2022, Julia Lawall wrote:
> > > On Sat, 19 Mar 2022, Eric Wheeler wrote:
> > >
> > > > Hi All,
> > > >
> > > > Just a quick tip for others (like me) who are new to Coccinelle:
> > > >
> > > > You might already use this, but if not, here's a hint for doing lots of
> > > > replacements in parallel:
> > > >
> > > > 	parallel -j24 spatch --sp-file smpl.cocci {} --in-place ::: *.c
> > > >
> > > > If the SmPL depends on interactions between files then `parallel` won't
> > > > work, but if the changes work per-file then it runs much faster with lots
> > > > of big .c files.
> > > >
> > > > In cases where you do need to run over lots of files and they _do_
> > > > interact, you might get a Stack Overflow error.  In this case set
> > > > something like this for 1GB of stack space:
> > > > 	ulimit -s $((1024*1024))
> > > >
> > > > In my case spatch needed 1GB of stack and 2.4GB of RAM.  It took a few
> > > > minutes and finished thousands of replacements!
> > > >
> > > > Julia, you might add this to documentation if you think it would be
> > > > useful.
> > >
> > > I'm not sure what you are trying to do.  You can give spatch the name of a
> > > directory and the argument -j N for a number of cores N, and it will run
> > > in parallel on the files in the directory.
> >
> > Oh really? Well, nevermind what I said then.
> >
> > However, that could be a documentation bug:
> >
> > I don't see -j in the `spatch --help` output.
> 
> It's there, but there are many other options, so you may have missed it:
> 
>   --jobs          the number of processes to be used
>   -j              the number of processes to be used

hmm... is there a new version or does my OCaml not support the feature?

]# ./spatch.opt 2>&1 | grep jobs
  <no output>

coccinelle-git]# ./spatch.opt --version
spatch version 1.1.1-00106-ge65a6dd compiled with OCaml version 4.05.0
Flags passed to the configure script: --prefix=/usr/local/coccinelle-git/
OCaml scripting support: yes
Python scripting support: yes
Syntax of regular expressions: PCRE

-Eric


* Re: [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows.
  2022-03-20 21:18       ` Eric Wheeler
@ 2022-03-20 21:25         ` Julia Lawall
  2022-03-20 22:15           ` Eric Wheeler
  0 siblings, 1 reply; 21+ messages in thread
From: Julia Lawall @ 2022-03-20 21:25 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: cocci

> hmm... is there a new version or does my OCaml not support the feature?
>
> ]# ./spatch.opt 2>&1 | grep jobs
>   <no output>
>
> coccinelle-git]# ./spatch.opt --version
> spatch version 1.1.1-00106-ge65a6dd compiled with OCaml version 4.05.0
> Flags passed to the configure script: --prefix=/usr/local/coccinelle-git/
> OCaml scripting support: yes
> Python scripting support: yes
> Syntax of regular expressions: PCRE

You have to use the --help option to get the complete set of arguments.

julia


* Re: [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows.
  2022-03-20 21:25         ` Julia Lawall
@ 2022-03-20 22:15           ` Eric Wheeler
  2022-03-20 22:23             ` Julia Lawall
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Wheeler @ 2022-03-20 22:15 UTC (permalink / raw)
  To: Julia Lawall; +Cc: cocci

On Sun, 20 Mar 2022, Julia Lawall wrote:

> > hmm... is there a new version or does my OCaml not support the feature?
> >
> > ]# ./spatch.opt 2>&1 | grep jobs
> >   <no output>
> >
> > coccinelle-git]# ./spatch.opt --version
> > spatch version 1.1.1-00106-ge65a6dd compiled with OCaml version 4.05.0
> > Flags passed to the configure script: --prefix=/usr/local/coccinelle-git/
> > OCaml scripting support: yes
> > Python scripting support: yes
> > Syntax of regular expressions: PCRE
> 
> You have to use the --help option to get the complete set of arguments.

]# ./spatch.opt --help 2>&1|grep jobs
  --jobs          the number of processes to be used

There it is!

I find it strange that -h and --help behave differently. IMHO, -j is 
useful enough that you might consider adding it to the default output.


--
Eric Wheeler


> 
> julia
> 


* Re: [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows.
  2022-03-20 22:15           ` Eric Wheeler
@ 2022-03-20 22:23             ` Julia Lawall
  0 siblings, 0 replies; 21+ messages in thread
From: Julia Lawall @ 2022-03-20 22:23 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: cocci



On Sun, 20 Mar 2022, Eric Wheeler wrote:

> On Sun, 20 Mar 2022, Julia Lawall wrote:
>
> > > hmm... is there a new version or does my OCaml not support the feature?
> > >
> > > ]# ./spatch.opt 2>&1 | grep jobs
> > >   <no output>
> > >
> > > coccinelle-git]# ./spatch.opt --version
> > > spatch version 1.1.1-00106-ge65a6dd compiled with OCaml version 4.05.0
> > > Flags passed to the configure script: --prefix=/usr/local/coccinelle-git/
> > > OCaml scripting support: yes
> > > Python scripting support: yes
> > > Syntax of regular expressions: PCRE
> >
> > You have to use the --help option to get the complete set of arguments.
>
> ]# ./spatch.opt --help 2>&1|grep jobs
>   --jobs          the number of processes to be used
>
> There it is!
>
> I find it strange that -h and --help behave differently. IMHO, -j is
> useful enough that you might consider adding it to the default output.

Maybe...  There are so many options.  I hate to move it out of a category
that it nicely fits into.

julia


* Re: [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows.
  2022-03-20  6:33     ` Julia Lawall
  2022-03-20 21:18       ` Eric Wheeler
@ 2022-04-01 20:20       ` Eric Wheeler
  2022-04-01 20:28         ` Julia Lawall
  2022-04-01 21:10         ` Markus Elfring
  1 sibling, 2 replies; 21+ messages in thread
From: Eric Wheeler @ 2022-04-01 20:20 UTC (permalink / raw)
  To: Julia Lawall; +Cc: cocci

On Sun, 20 Mar 2022, Julia Lawall wrote:
> On Sat, 19 Mar 2022, Eric Wheeler wrote:
> > On Sat, 19 Mar 2022, Julia Lawall wrote:
> > > On Sat, 19 Mar 2022, Eric Wheeler wrote:
> > > > Just a quick tip for others (like me) who are new to Coccinelle:
> > > >
> > > > You might already use this, but if not, here's a hint for doing lots of
> > > > replacements in parallel:
> > > >
> > > > 	parallel -j24 spatch --sp-file smpl.cocci {} --in-place ::: *.c
> > > >
> > > > If the SmPL depends on interactions between files then `parallel` won't
> > > > work, but if the changes work per-file then it runs much faster with lots
> > > > of big .c files.
> > > >
> > > > In cases where you do need to run over lots of files and they _do_
> > > > interact, you might get a Stack Overflow error.  In this case set
> > > > something like this for 1GB of stack space:
> > > > 	ulimit -s $((1024*1024))
> > > >
> > > > In my case spatch needed 1GB of stack and 2.4GB of RAM.  It took a few
> > > > minutes and finished thousands of replacements!
> > > >
> > > > Julia, you might add this to documentation if you think it would be
> > > > useful.
> > >
> > > I'm not sure what you are trying to do.  You can give spatch the name of a
> > > directory and the argument -j N for a number of cores N, and it will run
> > > in parallel on the files in the directory.
> >
>   --jobs          the number of processes to be used
>   -j              the number of processes to be used

Running `top` shows spatch at 100% cpu: it is not parallelizing, though
the rules and files are completely independent of each other (no
inter-SmPL-rule dependencies).

Not sure if this is an spatch bug or not, but it's ~3x faster to use
`parallel` as follows:

]# time parallel -j24 -- spatch --sp-file cocci/printf-refactor.cocci  :::  `ls src/*.c|grep -v resources.c` &> /dev/null
real	0m1.224s

]# time spatch -j24 --sp-file cocci/printf-refactor.cocci `ls src/*.c|grep -v resources.c` &> /dev/null
real	0m4.852s


Here is the SmPL:
	@ p4 @                  
	expression list EL;     
	constant C =~ "BUG";    
	@@                      
	-printf(C, EL);         
	+BUG(C, EL);            
				
	@ p3 @                  
	expression list EL;     
	constant C =~ "warn";   
	@@                      
	-printf(C, EL);         
	+pr_warn(C, EL);        
				
	@ p @                   
	expression list EL;     
	constant C;             
	@@                      
	-printf(C, EL);         
	+pr_info(C, EL);        
				
	@ p2 @                  
	expression list EL;     
	constant C;             
	@@                      
	-fprintf(stderr, C, EL);
	+pr_err(C, EL);         


The .c files are from here:
	https://github.com/KJ7LNW/xnec2c

--
Eric Wheeler


> 
> julia
> 


* Re: [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows.
  2022-04-01 20:20       ` Eric Wheeler
@ 2022-04-01 20:28         ` Julia Lawall
  2022-04-02 17:39           ` Eric Wheeler
  2022-04-01 21:10         ` Markus Elfring
  1 sibling, 1 reply; 21+ messages in thread
From: Julia Lawall @ 2022-04-01 20:28 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: Julia Lawall, cocci



On Fri, 1 Apr 2022, Eric Wheeler wrote:

> On Sun, 20 Mar 2022, Julia Lawall wrote:
> > On Sat, 19 Mar 2022, Eric Wheeler wrote:
> > > On Sat, 19 Mar 2022, Julia Lawall wrote:
> > > > On Sat, 19 Mar 2022, Eric Wheeler wrote:
> > > > > Just a quick tip for others (like me) who are new to Coccinelle:
> > > > >
> > > > > You might already use this, but if not, here's a hint for doing lots of
> > > > > replacements in parallel:
> > > > >
> > > > > 	parallel -j24 spatch --sp-file smpl.cocci {} --in-place ::: *.c
> > > > >
> > > > > If the SmPL depends on interactions between files then `parallel` won't
> > > > > work, but if the changes work per-file then it runs much faster with lots
> > > > > of big .c files.
> > > > >
> > > > > In cases where you do need to run over lots of files and they _do_
> > > > > interact, you might get a Stack Overflow error.  In this case set
> > > > > something like this for 1GB of stack space:
> > > > > 	ulimit -s $((1024*1024))
> > > > >
> > > > > In my case spatch needed 1GB of stack and 2.4GB of RAM.  It took a few
> > > > > minutes and finished thousands of replacements!
> > > > >
> > > > > Julia, you might add this to documentation if you think it would be
> > > > > useful.
> > > >
> > > > I'm not sure what you are trying to do.  You can give spatch the name of a
> > > > directory and the argument -j N for a number of cores N, and it will run
> > > > in parallel on the files in the directory.
> > >
> >   --jobs          the number of processes to be used
> >   -j              the number of processes to be used
>
> Running `top` shows spatch at 100% cpu: it is not parallelizing, though
> the rules and files are completely independent of each other (no
> inter-SmPL-rule dependencies).

I don't understand what you mean by spatch at 100%.  I think you should
have 24 spatches?

On the other hand, you should not put a bunch of files on the command
line.  The intended behavior is for that not to be parallel.  Coccinelle
makes no effort to figure out whether parallelism is useful.

Normally, one runs spatch on a directory, and then it takes care of
working on the different files in parallel.

It seems that you don't want to process resources.c.  Maybe the semantic
patch should be adjusted so that the code in resources.c is not matched?

With python or OCaml you can also cause Coccinelle to exit on a file you
don't want to process.

julia



* Re: [cocci] Using `parallel` to run over lots of .c files
  2022-04-01 20:20       ` Eric Wheeler
  2022-04-01 20:28         ` Julia Lawall
@ 2022-04-01 21:10         ` Markus Elfring
  1 sibling, 0 replies; 21+ messages in thread
From: Markus Elfring @ 2022-04-01 21:10 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: cocci

> Here is the SmPL:
> 	@ p4 @
> 	expression list EL;
> 	constant C =~ "BUG";
> 	@@
> 	-printf(C, EL);
> 	+BUG(C, EL);
>
> 	@ p3 @
> 	expression list EL;
> 	constant C =~ "warn";
> 	@@
> 	-printf(C, EL);
> 	+pr_warn(C, EL);
>
> 	@ p @
> 	expression list EL;
> 	constant C;
> 	@@
> 	-printf(C, EL);
> 	+pr_info(C, EL);
>
> 	@ p2 @
> 	expression list EL;
> 	constant C;
> 	@@
> 	-fprintf(stderr, C, EL);
> 	+pr_err(C, EL);


What do you think about comparing the run time characteristics with a
variant of the semantic patch like the following?


@replacement@
constant C1 =~ "BUG",
         C2 =~ "warn",
         C3 =~ "info",
         C4;
@@
(
-printf
+BUG
 (C1, ...);
|
-printf
+pr_warn
 (C2, ...);
|
-printf
+pr_info
 (C3, ...);
|
-fprintf
+pr_err
 (
-stderr,
 C4, ...
 );
)


Would you like to extend the application of SmPL disjunctions any further?


Regards,
Markus



* Re: [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows.
  2022-04-01 20:28         ` Julia Lawall
@ 2022-04-02 17:39           ` Eric Wheeler
  2022-04-02 17:50             ` Julia Lawall
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Wheeler @ 2022-04-02 17:39 UTC (permalink / raw)
  To: Julia Lawall; +Cc: cocci

On Fri, 1 Apr 2022, Julia Lawall wrote:
> On Fri, 1 Apr 2022, Eric Wheeler wrote:
> > On Sun, 20 Mar 2022, Julia Lawall wrote:
> > > On Sat, 19 Mar 2022, Eric Wheeler wrote:
> > > > On Sat, 19 Mar 2022, Julia Lawall wrote:
> > > > > On Sat, 19 Mar 2022, Eric Wheeler wrote:
> > > > > > Just a quick tip for others (like me) who are new to Coccinelle:
> > > > > >
> > > > > > You might already use this, but if not, here's a hint for doing lots of
> > > > > > replacements in parallel:
> > > > > >
> > > > > > 	parallel -j24 spatch --sp-file smpl.cocci {} --in-place ::: *.c
> > > > > >
> > > > > > If the SmPL depends on interactions between files then `parallel` won't
> > > > > > work, but if the changes work per-file then it runs much faster with lots
> > > > > > of big .c files.
> > > > > >
> > > > > > In cases where you do need to run over lots of files and they _do_
> > > > > > interact, you might get a Stack Overflow error.  In this case set
> > > > > > something like this for 1GB of stack space:
> > > > > > 	ulimit -s $((1024*1024))
> > > > > >
> > > > > > In my case spatch needed 1GB of stack and 2.4GB of RAM.  It took a few
> > > > > > minutes and finished thousands of replacements!
> > > > > >
> > > > > > Julia, you might add this to documentation if you think it would be
> > > > > > useful.
> > > > >
> > > > > I'm not sure what you are trying to do.  You can give spatch the name of a
> > > > > directory and the argument -j N for a number of cores N, and it will run
> > > > > in parallel on the files in the directory.
> > > >
> > >   --jobs          the number of processes to be used
> > >   -j              the number of processes to be used
> >
> > Running `top` shows spatch at 100% cpu: it is not parallelizing, though
> > the rules and files are completely independent of each other (no
> > inter-SmPL-rule dependencies).
> 
> I don't understand what you mean by spatch at 100%.  I think you should
> have 24 spatches?

I meant 100% of a single CPU core; it wasn't running in parallel, but read on:

> On the other hand, you should not put a bunch of files on the command
> line.  The intended behavior is for that not to be parallel.  Coccinelle
> makes no effort to figure out whether parallelism is useful.
> 
> Normally, one runs spatch on a directory, and then it takes care of
> working on the different files in parallel.

Oh, in that case, it is faster to use spatch's -j option and pass a 
directory (see below).  I didn't realize it operated on a directory, I've 
always passed each .c file to the cmdline, usually specifying only the 
files that I know to be affected, and sometimes *.c .

If that could be combined then -j would work in both cases.

Is there a reason that spatch should treat a list of files on the 
cmdline differently than a list of files coming out of a directory?  

Much faster as a directory:

	]# time spatch -j24 --sp-file cocci/printf-refactor.cocci src/
	real	0m0.980s <<<

Slower even with `parallel` on a list of .c files:

	]# time parallel -j24 -- spatch --sp-file cocci/printf-refactor.cocci  :::  `ls src/*.c|grep -v resources.c` &> /dev/null
	real    0m1.224s

	]# time spatch -j24 --sp-file cocci/printf-refactor.cocci `ls src/*.c|grep -v resources.c` &> /dev/null
	real    0m4.852s
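
For reference, a small sketch that turns the three timings above into
ratios. The variable names and the `ratio` helper are made up for
illustration; awk is used only because the shell has no floating-point
arithmetic of its own.

```shell
# Wall-clock times in seconds, copied from the three runs above.
dir_j24=0.980        # spatch -j24 on the src/ directory
parallel_list=1.224  # GNU parallel over a list of .c files
spatch_list=4.852    # spatch -j24 given the same list of .c files

# Print a / b with two decimal places.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", a / b }'; }

echo "file list via spatch vs. parallel:  $(ratio "$spatch_list" "$parallel_list")x"
echo "file list via spatch vs. directory: $(ratio "$spatch_list" "$dir_j24")x"
```

These are single runs of about a second each, so the ratios are only a
rough indication.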


> It seems that you don't want to process resources.c.  Maybe the semantic
> patch should be adjusted so that the code in resources.c is not matched?

Ya, it's 100k of byte stuffing from the gtk resources being written to a
big .c file array.  No need to process it, it's just slow and big.  It is
generated at build time so I deleted it to use src/.

> With python or OCaml you can also cause Coccinelle to exit on a file you
> don't want to process.

When you say "exit", will it skip the file or exit and skip any later 
files?

How do you skip a certain .c file in the patch?

Is there a way to list exclusively which .c files I would like to use? For 
example:

   ]# spatch -j24 --sp-file cocci/printf-refactor.cocci `git grep -l printf`
                                                        ^^^^^^^^^^^^^^^^^^
I tried --use-gitgrep but I'm not sure that it did anything, or at least 
the time didn't change substantially:

	]# time spatch --use-gitgrep -j4 --sp-file cocci/printf-refactor.cocci src/ &> /dev/null
	real	0m1.305s

	]# time spatch -j4 --sp-file cocci/printf-refactor.cocci src/ &> /dev/null
	real	0m1.300s

These are small numbers, and this patch doesn't really need a speed
increase, so this question is really about future patches that may take
longer to process.

-Eric



* Re: [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows.
  2022-04-02 17:39           ` Eric Wheeler
@ 2022-04-02 17:50             ` Julia Lawall
  2022-04-02 20:59               ` Eric Wheeler
  0 siblings, 1 reply; 21+ messages in thread
From: Julia Lawall @ 2022-04-02 17:50 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: cocci

> Oh, in that case, it is faster to use spatch's -j option and pass a
> directory (see below).  I didn't realize it operated on a directory, I've
> always passed each .c file to the cmdline, usually specifying only the
> files that I know to be affected, and sometimes *.c .
>
> If that could be combined then -j would work in both cases.
>
> Is there a reason that spatch should treat a list of files on the
> cmdline differently than a list of files coming out of a directory?

So that you can have semantic patches that cross multiple files.  Ideally,
you would find the files that go together, eg for each device driver, and
then run spatch for each.  There is also a way to make a file with the
names of the files that go together, separated by blank lines.  But that
is a lot of work, and fragile over time.

>
> Much faster as a directory:
>
> 	]# time spatch -j24 --sp-file cocci/printf-refactor.cocci src/
> 	real	0m0.980s <<<
>
> Slower even with `parallel` on a list of .c files:
>
> 	]# time parallel -j24 -- spatch --sp-file cocci/printf-refactor.cocci  :::  `ls src/*.c|grep -v resources.c` &> /dev/null
> 	real    0m1.224s
>
> 	]# time spatch -j24 --sp-file cocci/printf-refactor.cocci `ls src/*.c|grep -v resources.c` &> /dev/null
> 	real    0m4.852s
>
>
> > It seems that you don't want to process resources.c.  Maybe the semantic
> > patch should be adjusted so that the code in resources.c is not matched?
>
> Ya, it's 100k of byte stuffing from the gtk resources being written to a
> big .c file array.  No need to process it, it's just slow and big.  It is
> generated at build time so I deleted it to use src/.
>
> > With python or OCaml you can also cause Coccinelle to exit on a file you
> > don't want to process.
>
> When you say "exit", will it skip the file or exit and skip any later
> files?

Just stop processing of the current file.

> How do you skip a certain .c file in the patch?

@script:ocaml@
@@

if List.mem (List.hd(Coccilib.files())) ["badfile1";"badfile2"]
then Coccilib.exit()

> Is there a way to list exclusively which .c files I would like to use? For
> example:
>
>    ]# spatch -j24 --sp-file cocci/printf-refactor.cocci `git grep -l printf`
>                                                         ^^^^^^^^^^^^^^^^^^

Coccinelle already finds the words in your semantic patch that are
important for a change to occur, and only parses files that contain those
words.
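
The idea of prefiltering by word can be approximated by hand with grep.
This is only an illustration on two throwaway files, not how spatch
implements its own selection; the file names and contents are invented
for the example.

```shell
# Create two tiny C files in a scratch directory; only a.c mentions printf.
tmp=$(mktemp -d)
printf 'int main(void) { printf("hi\\n"); return 0; }\n' > "$tmp/a.c"
printf 'int add(int x, int y) { return x + y; }\n' > "$tmp/b.c"

# -l lists only the files that contain a match; -w requires a whole word,
# so an identifier such as fprintf would not count as printf.
candidates=$(grep -lw printf "$tmp"/*.c)
echo "$candidates"

rm -rf "$tmp"
```

Only the file containing the word is listed, which is the kind of
reduction in parsing work described above.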

> I tried --use-gitgrep but I'm not sure that it did anything, or at least
> the time didn't change substantially:

You can see from the HANDLING and Skipping output what is being parsed and
what is being skipped.

Perhaps your semantic patch is not written in a way that allows spatch to
find out what the important words are.  You can run spatch --parse-cocci
and it will show you the important words at the end.

Note that adding the above script will leave no important words, so all
files will be considered, because the script code applies to all
files.

You can use the --file-groups <file> argument with the files of interest
like this

file1.c

file2.c

file3.c

But in most cases this doesn't seem like a good idea.  The list of files
could change over time.
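
A minimal sketch of generating such a file, assuming the layout shown
above (one file name per group, groups separated by blank lines). The
temporary file and the file1.c .. file3.c names are placeholders.

```shell
# Write each file name followed by a blank line, so that every file
# forms its own group.
groups_file=$(mktemp)
printf '%s\n\n' file1.c file2.c file3.c > "$groups_file"
cat "$groups_file"

# The generated file would then be handed to spatch, for example:
#   spatch --sp-file smpl.cocci --file-groups "$groups_file"
```

Files that must be processed together would instead go on consecutive
lines within one group.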

julia

> 	]# time spatch --use-gitgrep -j4 --sp-file cocci/printf-refactor.cocci src/ &> /dev/null
> 	real	0m1.305s
>
> 	]# time spatch -j4 --sp-file cocci/printf-refactor.cocci src/ &> /dev/null
> 	real	0m1.300s
>
> These are small numbers and don't really need a speed increase for this
> patch, so this question is really about future patches that may take
> longer to process.
>
> -Eric
>
> >
> > julia
> >
> > >
> > > Not sure if this is an spatch bug or not, but it's ~3x faster to use
> > > `parallel` as follows:
> > >
> > > ]# time parallel -j24 -- spatch --sp-file cocci/printf-refactor.cocci  :::  `ls src/*.c|grep -v resources.c` &> /dev/null
> > > real	0m1.224s
> > >
> > > ]# time spatch -j24 --sp-file cocci/printf-refactor.cocci `ls src/*.c|grep -v resources.c` &> /dev/null
> > > real	0m4.852s
> > >
> > >
> > > Here is the SmPL:
> > > 	@ p4 @
> > > 	expression list EL;
> > > 	constant C =~ "BUG";
> > > 	@@
> > > 	-printf(C, EL);
> > > 	+BUG(C, EL);
> > >
> > > 	@ p3 @
> > > 	expression list EL;
> > > 	constant C =~ "warn";
> > > 	@@
> > > 	-printf(C, EL);
> > > 	+pr_warn(C, EL);
> > >
> > > 	@ p @
> > > 	expression list EL;
> > > 	constant C;
> > > 	@@
> > > 	-printf(C, EL);
> > > 	+pr_info(C, EL);
> > >
> > > 	@ p2 @
> > > 	expression list EL;
> > > 	constant C;
> > > 	@@
> > > 	-fprintf(stderr, C, EL);
> > > 	+pr_err(C, EL);
> > >
> > >
> > > The .c files are from here:
> > > 	https://github.com/KJ7LNW/xnec2c
> > >
> > > --
> > > Eric Wheeler
> > >
> > >
> > > >
> > > > julia
> > > >
> > >
> >
>


* Re: [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows.
  2022-04-02 17:50             ` Julia Lawall
@ 2022-04-02 20:59               ` Eric Wheeler
  2022-04-02 21:19                 ` Julia Lawall
  2022-04-03  8:32                 ` [cocci] Using `parallel` to run over lots of .c files Markus Elfring
  0 siblings, 2 replies; 21+ messages in thread
From: Eric Wheeler @ 2022-04-02 20:59 UTC (permalink / raw)
  To: Julia Lawall; +Cc: cocci

> > Is there a reason that spatch should treat a list of files on the
> > cmdline differently than a list of files coming out of a directory?
> 
> So that you can have semantic patches that cross multiple files.  Ideally,
> you would find the files that go together, eg for each device driver, and
> then run spatch for each.  There is also a way to make a file with the
> names of the files that go together, separated by blank lines.  But that
> is a lot of work, and fragile over time.
> 
> > Much faster as a directory:
> >
> > 	]# time spatch -j24 --sp-file cocci/printf-refactor.cocci src/
> > 	real	0m0.980s <<<
> >
> > Slower even with `parallel` on a list of .c files:
> >
> > 	]# time parallel -j24 -- spatch --sp-file cocci/printf-refactor.cocci  :::  `ls src/*.c|grep -v resources.c` &> /dev/null
> > 	real    0m1.224s
> >
> > 	]# time spatch -j24 --sp-file cocci/printf-refactor.cocci `ls src/*.c|grep -v resources.c` &> /dev/null
> > 	real    0m4.852s
> >
> > > It seems that you don't want to process resources.c.  Maybe the semantic
> > > patch should be adjusted so that the code in resource.c is not matched?
> > > With python or OCaml you can also cause Coccinelle to exit on a file you
> > > don't want to process.
> > When you say "exit", will it skip the file or exit and skip any later
> > files?
> Just stop processing of the current file.
> > How do you skip a certain .c file in the patch?
> @script:ocaml@
> @@
> if List.mem (List.hd(Coccilib.files())) ["badfile1";"badfile2"]
> then Coccilib.exit()
> 
> > Is there a way to list exclusively which .c files I would like to use? For
> > example:
> >
> >    ]# spatch -j24 --sp-file cocci/printf-refactor.cocci `git grep -l printf`
> >                                                         ^^^^^^^^^^^^^^^^^^
> 
> Coccinelle already finds the words in your semantic patch that are
> important for a change to occur, and only parses files that contain those
> words.

Ok, then maybe I'm trying to optimize something that doesn't need 
optimized.

> > I tried --use-gitgrep but I'm not sure that it did anything, or at least
> > the time didn't change substantially:
> 
> You can see from the HANDLING and Skipping output what is being parsed and
> what is being skipped.
> 
> Perhaps your semantic patch is not written in a way that allows spatch to
> find out what the important words are.  You can run spatch --parse-cocci
> and it will show you the important words at the end.

It is getting the important words for grepping:
	Grep query printf || stderr || fprintf
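
As a rough mental model only (this is not how spatch implements its
prefilter, and whether it matches whole tokens is an assumption here), the
query above behaves much like a word-wise grep for any of the three words.
`files_with_any_word` is a hypothetical helper and relies on the -r and
--include extensions found in GNU grep:

```shell
# Print the .c files under a directory that contain at least one of the
# given words as a whole word.
files_with_any_word() {
    dir=$1
    shift
    # With -F, a newline-separated pattern list means "match any of these".
    patterns=$(printf '%s\n' "$@")
    grep -rlwF --include='*.c' -e "$patterns" "$dir" | sort
}

# Hypothetical usage, mirroring the query printed by --parse-cocci:
#   files_with_any_word src printf stderr fprintf
```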

> Note that putting the above script will cause there to be no more
> important words and thus all files to be considered, because the script
> code applies to all files.

Interesting.  To confirm my understanding: Did you mean that including 
your @script:ocaml@ example would prevent important word processing?  

> You can use the --file-groups <file> argument with the files of interest
> like this

Should --file-groups work with spatch -jN such that each group is 
processed in parallel?  It seems that --file-groups is the same as putting 
it on the command line, at least in the trivial case of a single group:

Directory is parallel:
	]# time spatch -j4 --sp-file cocci/printf-refactor.cocci src/ &> /dev/null
	real	0m1.063s

File-groups are serial:
	]# git grep -l printf  > f
	]# time spatch -j4 --sp-file cocci/printf-refactor.cocci --file-groups f &> /dev/null
	real	0m4.959s

Same thing, serial, without the temp file "f" using <() since 
--file-groups doesn't appear to seek():
	]# time spatch -j4 --sp-file cocci/printf-refactor.cocci --file-groups <(git grep -l printf) &> /dev/null
	real	0m4.981s

-Eric

> But in most cases this doesn't seem like a good idea.  The list of files
> could change over time.
> 
> julia
> 
> > 	]# time spatch --use-gitgrep -j4 --sp-file cocci/printf-refactor.cocci src/ &> /dev/null
> > 	real	0m1.305s
> >
> > 	]# time spatch -j4 --sp-file cocci/printf-refactor.cocci src/ &> /dev/null
> > 	real	0m1.300s
> >
> > These are small numbers and don't really need a speed increase for this
> > patch, so this question is really about future patches that may take
> > longer to process.
> >
> > -Eric
> >
> > >
> > > julia
> > >
> > > >
> > > > Not sure if this is an spatch bug or not, but it's ~3x faster to use
> > > > `parallel` as follows:
> > > >
> > > > ]# time parallel -j24 -- spatch --sp-file cocci/printf-refactor.cocci  :::  `ls src/*.c|grep -v resources.c` &> /dev/null
> > > > real	0m1.224s
> > > >
> > > > ]# time spatch -j24 --sp-file cocci/printf-refactor.cocci `ls src/*.c|grep -v resources.c` &> /dev/null
> > > > real	0m4.852s
> > > >
> > > >
> > > > Here is the SmPL:
> > > > 	@ p4 @
> > > > 	expression list EL;
> > > > 	constant C =~ "BUG";
> > > > 	@@
> > > > 	-printf(C, EL);
> > > > 	+BUG(C, EL);
> > > >
> > > > 	@ p3 @
> > > > 	expression list EL;
> > > > 	constant C =~ "warn";
> > > > 	@@
> > > > 	-printf(C, EL);
> > > > 	+pr_warn(C, EL);
> > > >
> > > > 	@ p @
> > > > 	expression list EL;
> > > > 	constant C;
> > > > 	@@
> > > > 	-printf(C, EL);
> > > > 	+pr_info(C, EL);
> > > >
> > > > 	@ p2 @
> > > > 	expression list EL;
> > > > 	constant C;
> > > > 	@@
> > > > 	-fprintf(stderr, C, EL);
> > > > 	+pr_err(C, EL);
> > > >
> > > >
> > > > The .c files are from here:
> > > > 	https://github.com/KJ7LNW/xnec2c
> > > >
> > > > --
> > > > Eric Wheeler
> > > >
> > > >
> > > > >
> > > > > julia
> > > > >
> > > >
> > >
> >
> 


* Re: [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows.
  2022-04-02 20:59               ` Eric Wheeler
@ 2022-04-02 21:19                 ` Julia Lawall
  2022-04-02 23:03                   ` Eric Wheeler
  2022-04-03  8:32                 ` [cocci] Using `parallel` to run over lots of .c files Markus Elfring
  1 sibling, 1 reply; 21+ messages in thread
From: Julia Lawall @ 2022-04-02 21:19 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: cocci



On Sat, 2 Apr 2022, Eric Wheeler wrote:

> > > Is there a reason that spatch should treat a list of files on the
> > > cmdline differently than a list of files coming out of a directory?
> >
> > So that you can have semantic patches that cross multiple files.  Ideally,
> > you would find the files that go together, eg for each device driver, and
> > then run spatch for each.  There is also a way to make a file with the
> > names of the files that go together, separated by blank lines.  But that
> > is a lot of work, and fragile over time.
> >
> > > Much faster as a directory:
> > >
> > > 	]# time spatch -j24 --sp-file cocci/printf-refactor.cocci src/
> > > 	real	0m0.980s <<<
> > >
> > > Slower even with `parallel` on a list of .c files:
> > >
> > > 	]# time parallel -j24 -- spatch --sp-file cocci/printf-refactor.cocci  :::  `ls src/*.c|grep -v resources.c` &> /dev/null
> > > 	real    0m1.224s
> > >
> > > 	]# time spatch -j24 --sp-file cocci/printf-refactor.cocci `ls src/*.c|grep -v resources.c` &> /dev/null
> > > 	real    0m4.852s
> > >
> > > > It seems that you don't want to process resources.c.  Maybe the semantic
> > > > patch should be adjusted so that the code in resource.c is not matched?
> > > > With python or OCaml you can also cause Coccinelle to exit on a file you
> > > > don't want to process.
> > > When you say "exit", will it skip the file or exit and skip any later
> > > files?
> > Just stop processing of the current file.
> > > How do you skip a certain .c file in the patch?
> > @script:ocaml@
> > @@
> > if List.mem (List.hd(Coccilib.files())) ["badfile1";"badfile2"]
> > then Coccilib.exit()
> >
> > > Is there a way to list exclusively which .c files I would like to use? For
> > > example:
> > >
> > >    ]# spatch -j24 --sp-file cocci/printf-refactor.cocci `git grep -l printf`
> > >                                                         ^^^^^^^^^^^^^^^^^^
> >
> > Coccinelle already finds the words in your semantic patch that are
> > important for a change to occur, and only parses files that contain those
> > words.
>
> Ok, then maybe I'm trying to optimize something that doesn't need
> optimized.
>
> > > I tried --use-gitgrep but I'm not sure that it did anything, or at least
> > > the time didn't change substantially:
> >
> > You can see from the HANDLING and Skipping output what is being parsed and
> > what is being skipped.
> >
> > Perhaps your semantic patch is not written in a way that allows spatch to
> > find out what the important words are.  You can run spatch --parse-cocci
> > and it will show you the important words at the end.
>
> It is getting the important words for grepping:
> 	Grep query printf || stderr || fprintf
>
> > Note that putting the above script will cause there to be no more
> > important words and thus all files to be considered, because the script
> > code applies to all files.
>
> Interesting.  To confirm my understanding: Did you mean that including
> your @script:ocaml@ example would prevent important word processing?

I believe so.  You can always confirm with --parse-cocci.

>
> > You can use the --file-groups <file> argument with the files of interest
> > like this
>
> Should --file-groups work with spatch -jN such that each group is
> processed in parallel?  It seems that --file-groups is the same as putting
> it on the command line, at least in the trivial case of a single group:

Exactly.  But I suggested to put a line between each file, making one
group per file.
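
A minimal sketch of generating such a file from a plain list of names, one
group per file; `one_group_per_file` is a hypothetical helper, not a
Coccinelle tool:

```shell
# Read file names on stdin and print each one followed by a blank line,
# so that every file forms its own group.
one_group_per_file() {
    while IFS= read -r f; do
        # Ignore empty input lines so they do not create empty groups.
        if [ -n "$f" ]; then
            printf '%s\n\n' "$f"
        fi
    done
}

# Hypothetical usage with the commands from this thread:
#   git grep -l printf | one_group_per_file > groups
#   spatch -j4 --sp-file cocci/printf-refactor.cocci --file-groups groups
```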

julia

> Directory is parallel:
> 	]# time spatch -j4 --sp-file cocci/printf-refactor.cocci src/ &> /dev/null
> 	real	0m1.063s
>
> File-groups are serial:
> 	]# git grep -l printf  > f
> 	]# time spatch -j4 --sp-file cocci/printf-refactor.cocci --file-groups f &> /dev/null
> 	real	0m4.959s
>
> Same thing, serial, without the temp file "f" using <() since
> --file-groups doesn't appear to seek():
> 	]# time spatch -j4 --sp-file cocci/printf-refactor.cocci --file-groups <(git grep -l printf) &> /dev/null
> 	real	0m4.981s
>
> -Eric
>
> > But in most cases this doesn't seem like a good idea.  The list of files
> > could change over time.
> >
> > julia
> >
> > > 	]# time spatch --use-gitgrep -j4 --sp-file cocci/printf-refactor.cocci src/ &> /dev/null
> > > 	real	0m1.305s
> > >
> > > 	]# time spatch -j4 --sp-file cocci/printf-refactor.cocci src/ &> /dev/null
> > > 	real	0m1.300s
> > >
> > > These are small numbers and don't really need a speed increase for this
> > > patch, so this question is really about future patches that may take
> > > longer to process.
> > >
> > > -Eric
> > >
> > > >
> > > > julia
> > > >
> > > > >
> > > > > Not sure if this is an spatch bug or not, but it's ~3x faster to use
> > > > > `parallel` as follows:
> > > > >
> > > > > ]# time parallel -j24 -- spatch --sp-file cocci/printf-refactor.cocci  :::  `ls src/*.c|grep -v resources.c` &> /dev/null
> > > > > real	0m1.224s
> > > > >
> > > > > ]# time spatch -j24 --sp-file cocci/printf-refactor.cocci `ls src/*.c|grep -v resources.c` &> /dev/null
> > > > > real	0m4.852s
> > > > >
> > > > >
> > > > > Here is the SmPL:
> > > > > 	@ p4 @
> > > > > 	expression list EL;
> > > > > 	constant C =~ "BUG";
> > > > > 	@@
> > > > > 	-printf(C, EL);
> > > > > 	+BUG(C, EL);
> > > > >
> > > > > 	@ p3 @
> > > > > 	expression list EL;
> > > > > 	constant C =~ "warn";
> > > > > 	@@
> > > > > 	-printf(C, EL);
> > > > > 	+pr_warn(C, EL);
> > > > >
> > > > > 	@ p @
> > > > > 	expression list EL;
> > > > > 	constant C;
> > > > > 	@@
> > > > > 	-printf(C, EL);
> > > > > 	+pr_info(C, EL);
> > > > >
> > > > > 	@ p2 @
> > > > > 	expression list EL;
> > > > > 	constant C;
> > > > > 	@@
> > > > > 	-fprintf(stderr, C, EL);
> > > > > 	+pr_err(C, EL);
> > > > >
> > > > >
> > > > > The .c files are from here:
> > > > > 	https://github.com/KJ7LNW/xnec2c
> > > > >
> > > > > --
> > > > > Eric Wheeler
> > > > >
> > > > >
> > > > > >
> > > > > > julia
> > > > > >
> > > > >
> > > >
> > >
> >
>


* Re: [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows.
  2022-04-02 21:19                 ` Julia Lawall
@ 2022-04-02 23:03                   ` Eric Wheeler
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Wheeler @ 2022-04-02 23:03 UTC (permalink / raw)
  To: Julia Lawall; +Cc: cocci

On Sat, 2 Apr 2022, Julia Lawall wrote:
> On Sat, 2 Apr 2022, Eric Wheeler wrote:
> > > > Is there a reason that spatch should treat a list of files on the
> > > > cmdline differently than a list of files coming out of a directory?
> > >
> > > So that you can have semantic patches that cross multiple files.  Ideally,
> > > you would find the files that go together, eg for each device driver, and
> > > then run spatch for each.  There is also a way to make a file with the
> > > names of the files that go together, separated by blank lines.  But that
> > > is a lot of work, and fragile over time.
> > >
> > > > Much faster as a directory:
> > > >
> > > > 	]# time spatch -j24 --sp-file cocci/printf-refactor.cocci src/
> > > > 	real	0m0.980s <<<
> > > >
> > > > Slower even with `parallel` on a list of .c files:
> > > >
> > > > 	]# time parallel -j24 -- spatch --sp-file cocci/printf-refactor.cocci  :::  `ls src/*.c|grep -v resources.c` &> /dev/null
> > > > 	real    0m1.224s
> > > >
> > > > 	]# time spatch -j24 --sp-file cocci/printf-refactor.cocci `ls src/*.c|grep -v resources.c` &> /dev/null
> > > > 	real    0m4.852s
> > > >
> > > > > It seems that you don't want to process resources.c.  Maybe the semantic
> > > > > patch should be adjusted so that the code in resource.c is not matched?
> > > > > With python or OCaml you can also cause Coccinelle to exit on a file you
> > > > > don't want to process.
> > > > When you say "exit", will it skip the file or exit and skip any later
> > > > files?
> > > Just stop processing of the current file.
> > > > How do you skip a certain .c file in the patch?
> > > @script:ocaml@
> > > @@
> > > if List.mem (List.hd(Coccilib.files())) ["badfile1";"badfile2"]
> > > then Coccilib.exit()
> > >
> > > > Is there a way to list exclusively which .c files I would like to use? For
> > > > example:
> > > >
> > > >    ]# spatch -j24 --sp-file cocci/printf-refactor.cocci `git grep -l printf`
> > > >                                                         ^^^^^^^^^^^^^^^^^^
> > >
> > > Coccinelle already finds the words in your semantic patch that are
> > > important for a change to occur, and only parses files that contain those
> > > words.
> >
> > Ok, then maybe I'm trying to optimize something that doesn't need
> > optimized.
> >
> > > > I tried --use-gitgrep but I'm not sure that it did anything, or at least
> > > > the time didn't change substantially:
> > >
> > > You can see from the HANDLING and Skipping output what is being parsed and
> > > what is being skipped.
> > >
> > > Perhaps your semantic patch is not written in a way that allows spatch to
> > > find out what the important words are.  You can run spatch --parse-cocci
> > > and it will show you the important words at the end.
> >
> > It is getting the important words for grepping:
> > 	Grep query printf || stderr || fprintf
> >
> > > Note that putting the above script will cause there to be no more
> > > important words and thus all files to be considered, because the script
> > > code applies to all files.
> >
> > Interesting.  To confirm my understanding: Did you mean that including
> > your @script:ocaml@ example would prevent important word processing?
> 
> I believe so.  You can always confirm with --parse-cocci.
> 
> >
> > > You can use the --file-groups <file> argument with the files of interest
> > > like this
> >
> > Should --file-groups work with spatch -jN such that each group is
> > processed in parallel?  It seems that --file-groups is the same as putting
> > it on the command line, at least in the trivial case of a single group:
> 
> Exactly.  But I suggested to put a line between each file, making one
> group per file.

Got it.  Now my understanding is consistent with the output,
and `spatch -jN` is faster than using `parallel`:

	]$ time spatch  -j4 --sp-file cocci/printf-refactor.cocci src/ &>/dev/null
	real	0m6.826s

	]$ time spatch  -j4 --sp-file cocci/printf-refactor.cocci --file-groups <(git grep -l printf | sed 's/$/\n/') &>/dev/null
	real	0m6.767s

	]$ time parallel -j4 spatch --sp-file cocci/printf-refactor.cocci ::: src/*.c &> /dev/null
	real	0m8.295s

Thanks for your help!

--
Eric Wheeler



* Re: [cocci] Using `parallel` to run over lots of .c files
  2022-04-02 20:59               ` Eric Wheeler
  2022-04-02 21:19                 ` Julia Lawall
@ 2022-04-03  8:32                 ` Markus Elfring
  2022-04-06  1:28                   ` Eric Wheeler
  1 sibling, 1 reply; 21+ messages in thread
From: Markus Elfring @ 2022-04-03  8:32 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: cocci

> Ok, then maybe I'm trying to optimize something that doesn't need optimized.


Your understanding of the functionality that the Coccinelle software
supports (and of the corresponding program parameters) is still evolving.

The run time characteristics might be good enough for your source code transformations.

Additional development ideas can be shared for possible adjustments of implementation details,
can't they?


Regards,
Markus



* [cocci] using gcc & clang -MF to reduce spatch work (was: Using `parallel` [...])
  2022-03-19 19:10 [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows Eric Wheeler
  2022-03-19 19:39 ` Julia Lawall
@ 2022-04-03 15:58 ` Ævar Arnfjörð Bjarmason
  2022-04-03 16:27   ` Julia Lawall
  1 sibling, 1 reply; 21+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-04-03 15:58 UTC (permalink / raw)
  To: Eric Wheeler; +Cc: cocci


On Sat, Mar 19 2022, Eric Wheeler wrote:

> Hi All,
>
> Just a quick tip for others (like me) who are new to Coccinelle:
>
> You might already use this, but if not, here's a hint for doing lots of 
> replacements in parallel:
>
> 	parallel -j24 spatch --sp-file smpl.cocci {} --in-place ::: *.c
>
> If the SmPL depends on interactions between files then `parallel` won't 
> work, but if the changes work per-file then it runs much faster with lots 
> of big .c files.
>
> In cases where you do need to run over lots of files and they _do_ 
> interact, you might get a Stack Overflow error.  In this case set 
> something like this for 1GB of stack space:
> 	ulimit -s $((1024*1024))
>
> In my case spatch needed 1GB of stack and 2.4GB of RAM.  It took a few 
> minutes and finished thousands of replacements!
>
> Julia, you might add this to documentation if you think it would be 
> useful.

I'm changing the $subject to cover an alternative approach that may work
for you, and that I've been experimenting with.

If you use GCC or Clang to compile your C code you can use the -MF
option (together with -MD or -MMD) to emit a dependency tree for your
*.c files, i.e. your compiler is already doing all the work that spatch
needs to re-do to find includes.

But more importantly, if you have 100 files and change a *.h only used
by 10 of them, you'll only need to invoke spatch on those 10 files.

I had a question related to this here on-list in
<211116.86h7ccox6b.gmgdl@evledraar.gmail.com>, i.e. you can use
--no-includes and instead feed the list of files to be included to
spatch. If those files come from your dependencies made with -MF you
might be able to assume they're correct.

Well, not if your *.c file is newer than your *.o, as it might have
grown new includes, but in this sub-example you'd make your *.c.cocci-ok
(or whatever) generation depend on the corresponding *.o, or would
otherwise compile first (which is much, much faster than spatch,
especially with ccache).

I don't have any stand-alone example for use, sorry, but basically in
make syntax something like:

	# get dependencies from our last compilation
	-include $(wildcard *.mak)
	# next compilation
	%.o: %.c
		$(CC) ... -MF $@.mak -MT $@ -MT $@.cocci-ok $<
	%.o.cocci-ok: %.o
		spatch ...

I just typed that up now, and there's sure to be syntax errors etc, but
I think you get the idea, i.e. you'll save yourself spatch work by
bootstrapping from your C compilation.
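
To illustrate the idea of only re-running spatch where a changed header
matters, here is a minimal sketch.  It assumes dependency files named
<object>.mak as in the make fragment above, and that each object foo.o was
built from foo.c; `sources_using_header` is a hypothetical helper:

```shell
# Usage: sources_using_header HEADER DEPFILE...
# Print the .c file behind every dependency file that mentions HEADER.
sources_using_header() {
    header=$1
    shift
    for dep in "$@"; do
        # -F: fixed string, -w: whole word, so "foo.h" does not match
        # "myfoo.h" (it still matches "dir/foo.h", which is intended).
        if grep -qwF -e "$header" "$dep"; then
            # foo.o.mak -> foo.c (naming assumption, see above)
            printf '%s\n' "${dep%.o.mak}.c"
        fi
    done
}

# Hypothetical usage:
#   sources_using_header common.h *.o.mak
```

The word match is only an approximation: two headers with the same name in
different directories are not told apart.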

Also note that you don't need to run GNU make, if you build with
something else you can (with a tiny bit of munging) change the make
syntax it emits to just a list of files to depend on (the easiest way to
parse it being to invoke a make one-liner to spew it out for you).
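
As a sketch of that munging without make, assuming a dependency file that
holds a single rule with backslash-continued lines and no colons or spaces
in the paths; `deps_from_mak` is a hypothetical helper:

```shell
# Usage: deps_from_mak DEPFILE
# Print the prerequisites of the (single) rule in DEPFILE, one per line.
deps_from_mak() {
    awk '
        # Drop the trailing backslash of a continued line.
        { sub(/\\$/, "") }
        # Skip the "targets:" part; everything after the first colon
        # is a prerequisite.
        !seen_colon {
            i = index($0, ":")
            if (!i) next
            $0 = substr($0, i + 1)
            seen_colon = 1
        }
        # Fields are whitespace-separated prerequisites.
        { for (k = 1; k <= NF; k++) print $k }
    ' "$1"
}

# Hypothetical usage:
#   deps_from_mak foo.o.mak
```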

I think it would be really useful if spatch could spew out the list of
headers it understood from a given file.

Depending on your compilation (e.g. ifdefs) the two may not map 1:1, but
you could also do this CC trick in reverse, i.e. if you ran spatch first
you could feed the resulting dependency graph from spatch to your C
compiler.

In practice I haven't found reason to worry about that lack of 1:1
mapping.


* Re: [cocci] using gcc & clang -MF to reduce spatch work (was: Using `parallel` [...])
  2022-04-03 15:58 ` [cocci] using gcc & clang -MF to reduce spatch work (was: Using `parallel` [...]) Ævar Arnfjörð Bjarmason
@ 2022-04-03 16:27   ` Julia Lawall
  0 siblings, 0 replies; 21+ messages in thread
From: Julia Lawall @ 2022-04-03 16:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Eric Wheeler, cocci




On Sun, 3 Apr 2022, Ævar Arnfjörð Bjarmason wrote:

>
> On Sat, Mar 19 2022, Eric Wheeler wrote:
>
> > Hi All,
> >
> > Just a quick tip for others (like me) who are new to Coccinelle:
> >
> > You might already use this, but if not, here's a hint for doing lots of
> > replacements in parallel:
> >
> > 	parallel -j24 spatch --sp-file smpl.cocci {} --in-place ::: *.c
> >
> > If the SmPL depends on interactions between files then `parallel` won't
> > work, but if the changes work per-file then it runs much faster with lots
> > of big .c files.
> >
> > In cases where you do need to run over lots of files and they _do_
> > interact, you might get a Stack Overflow error.  In this case set
> > something like this for 1GB of stack space:
> > 	ulimit -s $((1024*1024))
> >
> > In my case spatch needed 1GB of stack and 2.4GB of RAM.  It took a few
> > minutes and finished thousands of replacements!
> >
> > Julia, you might add this to documentation if you think it would be
> > useful.
>
> I'm changing the $subject to cover an alternative approach that may work
> for you, and that I've been experimenting with.
>
> If you use GCC or Clang to compile your C code you can use the -MF
> option (together with -MD or -MMD) to emit a dependency tree for your
> *.c files, i.e. your compiler is already doing all the work that spatch
> needs to re-do to find includes.
>
> But more importantly, if you have 100 files and change a *.h only used
> by 10 of them, you'll only need to invoke spatch on those 10 files.
>
> I had a question related to this here on-list in
> <211116.86h7ccox6b.gmgdl@evledraar.gmail.com>, i.e. you can use
> --no-includes and instead feed the list of files to be included to
> spatch. If those files come from your dependencies made with -MF you
> might be able to assume they're correct.
>
> Well, not if your *.c file is newer than your *.o, as it might have
> grown new includes, but in this sub-example you'd make your *.c.cocci-ok
> (or whatever) generation depend on the corresponding *.o, or would
> otherwise compile first (which is much, much faster than spatch,
> especially with ccache).
>
> I don't have any stand-alone example for use, sorry, but basically in
> make syntax something like:
>
> 	# get dependencies from our last compilation
> 	-include $(wildcard *.mak)
> 	# next compilation
> 	%.o: %.c
> 		$(CC) ... -MF $@.mak -MT $@ -MT $@.cocci-ok $<
> 	%.o.cocci-ok: %.o
> 		spatch ...
>
> I just typed that up now, and there's sure to be syntax errors etc, but
> I think you get the idea, i.e. you'll save yourself spatch work by
> bootstrapping from your C compilation.
>
> Also note that you don't need to run GNU make, if you build with
> something else you can (with a tiny bit of munging) change the make
> syntax it emits to just a list of files to depend on (the easiest way to
> parse it being to invoke a make one-liner to spew it out for you).
>
> I think it would be really useful if spatch could spew out the list of
> headers it understood from a given file.

Maybe the --verbose-includes option will help.

There is also the option --use-patch-diff that will check only the
uncommitted changes in a given directory.

julia

>
> Depending on your compilation (e.g. ifdefs) the two may not map 1:1, but
> you could also do this CC trick in reverse, i.e. if you ran spatch first
> you could feed the resulting dependency graph from spatch to your C
> compiler.
>
> In practice I haven't found reason to worry about that lack of 1:1
> mapping.
>


* Re: [cocci] Using `parallel` to run over lots of .c files
  2022-04-03  8:32                 ` [cocci] Using `parallel` to run over lots of .c files Markus Elfring
@ 2022-04-06  1:28                   ` Eric Wheeler
  0 siblings, 0 replies; 21+ messages in thread
From: Eric Wheeler @ 2022-04-06  1:28 UTC (permalink / raw)
  To: Markus Elfring; +Cc: cocci

On Sun, 3 Apr 2022, Markus Elfring wrote:
> > Ok, then maybe I'm trying to optimize something that doesn't need optimized.
> 
> Your understanding of the functionality that the Coccinelle software
> supports (and of the corresponding program parameters) is still evolving.
> 
> The run time characteristics might be good enough for your source code transformations.
> 
> Additional development ideas can be shared for possible adjustments of implementation details,
> can't they?

Indeed!  Thanks for the encouragement.

--
Eric Wheeler



> 
> 
> Regards,
> Markus
> 
> 


Thread overview: 21+ messages
2022-03-19 19:10 [cocci] Using `parallel` to run over lots of .c files and avoiding stack overflows Eric Wheeler
2022-03-19 19:39 ` Julia Lawall
2022-03-19 22:38   ` Eric Wheeler
2022-03-20  6:33     ` Julia Lawall
2022-03-20 21:18       ` Eric Wheeler
2022-03-20 21:25         ` Julia Lawall
2022-03-20 22:15           ` Eric Wheeler
2022-03-20 22:23             ` Julia Lawall
2022-04-01 20:20       ` Eric Wheeler
2022-04-01 20:28         ` Julia Lawall
2022-04-02 17:39           ` Eric Wheeler
2022-04-02 17:50             ` Julia Lawall
2022-04-02 20:59               ` Eric Wheeler
2022-04-02 21:19                 ` Julia Lawall
2022-04-02 23:03                   ` Eric Wheeler
2022-04-03  8:32                 ` [cocci] Using `parallel` to run over lots of .c files Markus Elfring
2022-04-06  1:28                   ` Eric Wheeler
2022-04-01 21:10         ` Markus Elfring
2022-03-20  6:47     ` [cocci] Parallel data processing for selected SmPL scripts Markus Elfring
2022-04-03 15:58 ` [cocci] using gcc & clang -MF to reduce spatch work (was: Using `parallel` [...]) Ævar Arnfjörð Bjarmason
2022-04-03 16:27   ` Julia Lawall